Leha, Andreas
2018-11-26 11:10:10 UTC
Hi all,
I am interested in assessing the association of a (potential) risk
factor to a (binary) grouping.
I am having trouble with diverging results from modeling one time point
(without random effect) and modeling two time points (with random effect).
When analysing the first time point (base line, BL) only, I get a highly
significant association.
Now, I want to see, whether there is an interaction between time and
risk factor (the risk factor is not constant). But when analysing both
time points, the estimated effect at BL is estimated to be not significant.
Now my simplified questions are:
(1) Is there an association at BL or not?
(2) How should I analyse both time points with this data?
The aim is to look for confounding with other factors. But I'd like to
understand the simple models before moving on.
Below you find a reproducible example and the detailed results.
Any suggestions would be highly appreciated!
Regards,
Andreas
PS: The code / results
---------- cut here --------------------------------------------
library("dplyr")
library("lme4")
library("lmerTest")
## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
library("pastebin")
## ---------------------------------- ##
## load the data ##
## ---------------------------------- ##
dat <- pastebin::get_paste("Xgwgtb7j") %>%
as.character %>%
gsub("\r\n", "", .) %>%
parse(text = .) %>%
eval
## ---------------------------------- ##
## have a look ##
## ---------------------------------- ##
dat
## ,----
## | # A tibble: 475 x 4
## | patient group fu riskfactor
## | <fct> <fct> <fct> <fct>
## | 1 p001 wt BL norisk
## | 2 p002 wt BL norisk
## | 3 p003 wt BL norisk
## | 4 p004 wt BL norisk
## | 5 p005 wt BL norisk
## | 6 p006 wt BL norisk
## | 7 p007 wt BL norisk
## | 8 p008 wt BL norisk
## | 9 p009 wt BL risk
## | 10 p010 wt BL norisk
## | # ... with 465 more rows
## `----
dat %>% str
## ,----
## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 475 obs. of 4 variables:
## | $ patient : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5 6 7
8 9 10 ...
## | $ group : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1 1 ...
## | $ fu : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1 1 ...
## | $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2 2 2
1 2 ...
## `----
## there are 265 patients
## in 2 groups: "wt" and "mut"
## with a dichotomous risk factor ("risk" and "norisk")
## measured at two time points ("BL" and "FU")
dat %>% summary
## ,----
## | patient group fu riskfactor
## | p001 : 2 wt :209 BL:258 risk :205
## | p002 : 2 mut:266 FU:217 norisk:270
## | p003 : 2
## | p004 : 2
## | p005 : 2
## | p006 : 2
## | (Other):463
## `----
## group sizes seem fine
## ---------------------------------------------- ##
## first, we look at the first time point, the BL ##
## ---------------------------------------------- ##
## we build a cross table
tab_bl <-
dat %>%
dplyr::select(group, riskfactor) %>%
table
tab_bl
## ,----
## | riskfactor
## | group risk norisk
## | wt 35 174
## | mut 170 96
## `----
## and we test using fisher:
tab_bl %>% fisher.test
## ,----
## | Fisher's Exact Test for Count Data
## |
## | data: .
## | p-value < 2.2e-16
## | alternative hypothesis: true odds ratio is not equal to 1
## | 95 percent confidence interval:
## | 0.07099792 0.18002325
## | sample estimates:
## | odds ratio
## | 0.1141677
## `----
log(0.114)
## ,----
## | [1] -2.171557
## `----
## so, we get a highly significant association of the riskfactor
## and the group with an log(odds ratio) of -2.2
## we get the same result using logistic regression:
dat %>%
glm(group ~ riskfactor, family = "binomial", data = .) %>%
summary
## ,----
## |
## | Call:
## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
## |
## | Deviance Residuals:
## | Min 1Q Median 3Q Max
## | -1.8802 -0.9374 0.6119 0.6119 1.4381
## |
## | Coefficients:
## | Estimate Std. Error z value Pr(>|z|)
## | (Intercept) 1.5805 0.1856 8.515 <2e-16 ***
## | riskfactornorisk -2.1752 0.2250 -9.668 <2e-16 ***
## | ---
## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | (Dispersion parameter for binomial family taken to be 1)
## |
## | Null deviance: 651.63 on 474 degrees of freedom
## | Residual deviance: 538.83 on 473 degrees of freedom
## | AIC: 542.83
## |
## | Number of Fisher Scoring iterations: 4
## `----
## ------------------------------------------------- ##
## Now, we analyse both time points with interaction ##
## ------------------------------------------------- ##
dat %>%
glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient), family =
"binomial", data = .) %>%
summary
## ,----
## | Generalized linear mixed model fit by maximum likelihood (Laplace
## | Approximation) [glmerMod]
## | Family: binomial ( logit )
## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
## | Data: .
## |
## | AIC BIC logLik deviance df.resid
## | 345.2 366.0 -167.6 335.2 470
## |
## | Scaled residuals:
## | Min 1Q Median 3Q Max
## | -0.095863 -0.058669 0.002278 0.002866 0.007324
## |
## | Random effects:
## | Groups Name Variance Std.Dev.
## | patient (Intercept) 1849 43
## | Number of obs: 475, groups: patient, 265
## |
## | Fixed effects:
## | Estimate Std. Error z value Pr(>|z|)
## | (Intercept) 11.6846 1.3736 8.507 <2e-16 ***
## | riskfactornorisk -1.5919 1.4166 -1.124 0.261
## | fuFU 0.4596 1.9165 0.240 0.810
## | riskfactornorisk:fuFU -0.8183 2.1651 -0.378 0.705
## | ---
## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | Correlation of Fixed Effects:
## | (Intr) rskfct fuFU
## | rskfctrnrsk -0.746
## | fuFU -0.513 0.510
## | rskfctrn:FU 0.478 -0.576 -0.908
## `----
## I get huge variation in the random effects
##
## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
## but one which is not significant
---------- cut here --------------------------------------------
I am interested in assessing the association of a (potential) risk
factor to a (binary) grouping.
I am having trouble with diverging results from modeling one time point
(without random effect) and modeling two time points (with random effect).
When analysing the first time point (base line, BL) only, I get a highly
significant association.
Now, I want to see, whether there is an interaction between time and
risk factor (the risk factor is not constant). But when analysing both
time points, the estimated effect at BL is estimated to be not significant.
Now my simplified questions are:
(1) Is there an association at BL or not?
(2) How should I analyse both time points with this data?
The aim is to look for confounding with other factors. But I'd like to
understand the simple models before moving on.
Below you find a reproducible example and the detailed results.
Any suggestions would be highly appreciated!
Regards,
Andreas
PS: The code / results
---------- cut here --------------------------------------------
library("dplyr")
library("lme4")
library("lmerTest")
## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
library("pastebin")
## ---------------------------------- ##
## load the data ##
## ---------------------------------- ##
dat <- pastebin::get_paste("Xgwgtb7j") %>%
as.character %>%
gsub("\r\n", "", .) %>%
parse(text = .) %>%
eval
## ---------------------------------- ##
## have a look ##
## ---------------------------------- ##
dat
## ,----
## | # A tibble: 475 x 4
## | patient group fu riskfactor
## | <fct> <fct> <fct> <fct>
## | 1 p001 wt BL norisk
## | 2 p002 wt BL norisk
## | 3 p003 wt BL norisk
## | 4 p004 wt BL norisk
## | 5 p005 wt BL norisk
## | 6 p006 wt BL norisk
## | 7 p007 wt BL norisk
## | 8 p008 wt BL norisk
## | 9 p009 wt BL risk
## | 10 p010 wt BL norisk
## | # ... with 465 more rows
## `----
dat %>% str
## ,----
## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 475 obs. of 4 variables:
## | $ patient : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5 6 7
8 9 10 ...
## | $ group : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1 1 ...
## | $ fu : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1 1 ...
## | $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2 2 2
1 2 ...
## `----
## there are 265 patients
## in 2 groups: "wt" and "mut"
## with a dichotomous risk factor ("risk" and "norisk")
## measured at two time points ("BL" and "FU")
dat %>% summary
## ,----
## | patient group fu riskfactor
## | p001 : 2 wt :209 BL:258 risk :205
## | p002 : 2 mut:266 FU:217 norisk:270
## | p003 : 2
## | p004 : 2
## | p005 : 2
## | p006 : 2
## | (Other):463
## `----
## group sizes seem fine
## ---------------------------------------------- ##
## first, we look at the first time point, the BL ##
## ---------------------------------------------- ##
## we build a cross table
tab_bl <-
dat %>%
dplyr::select(group, riskfactor) %>%
table
tab_bl
## ,----
## | riskfactor
## | group risk norisk
## | wt 35 174
## | mut 170 96
## `----
## and we test using fisher:
tab_bl %>% fisher.test
## ,----
## | Fisher's Exact Test for Count Data
## |
## | data: .
## | p-value < 2.2e-16
## | alternative hypothesis: true odds ratio is not equal to 1
## | 95 percent confidence interval:
## | 0.07099792 0.18002325
## | sample estimates:
## | odds ratio
## | 0.1141677
## `----
log(0.114)
## ,----
## | [1] -2.171557
## `----
## so, we get a highly significant association of the riskfactor
## and the group with an log(odds ratio) of -2.2
## we get the same result using logistic regression:
dat %>%
glm(group ~ riskfactor, family = "binomial", data = .) %>%
summary
## ,----
## |
## | Call:
## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
## |
## | Deviance Residuals:
## | Min 1Q Median 3Q Max
## | -1.8802 -0.9374 0.6119 0.6119 1.4381
## |
## | Coefficients:
## | Estimate Std. Error z value Pr(>|z|)
## | (Intercept) 1.5805 0.1856 8.515 <2e-16 ***
## | riskfactornorisk -2.1752 0.2250 -9.668 <2e-16 ***
## | ---
## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | (Dispersion parameter for binomial family taken to be 1)
## |
## | Null deviance: 651.63 on 474 degrees of freedom
## | Residual deviance: 538.83 on 473 degrees of freedom
## | AIC: 542.83
## |
## | Number of Fisher Scoring iterations: 4
## `----
## ------------------------------------------------- ##
## Now, we analyse both time points with interaction ##
## ------------------------------------------------- ##
dat %>%
glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient), family =
"binomial", data = .) %>%
summary
## ,----
## | Generalized linear mixed model fit by maximum likelihood (Laplace
## | Approximation) [glmerMod]
## | Family: binomial ( logit )
## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
## | Data: .
## |
## | AIC BIC logLik deviance df.resid
## | 345.2 366.0 -167.6 335.2 470
## |
## | Scaled residuals:
## | Min 1Q Median 3Q Max
## | -0.095863 -0.058669 0.002278 0.002866 0.007324
## |
## | Random effects:
## | Groups Name Variance Std.Dev.
## | patient (Intercept) 1849 43
## | Number of obs: 475, groups: patient, 265
## |
## | Fixed effects:
## | Estimate Std. Error z value Pr(>|z|)
## | (Intercept) 11.6846 1.3736 8.507 <2e-16 ***
## | riskfactornorisk -1.5919 1.4166 -1.124 0.261
## | fuFU 0.4596 1.9165 0.240 0.810
## | riskfactornorisk:fuFU -0.8183 2.1651 -0.378 0.705
## | ---
## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | Correlation of Fixed Effects:
## | (Intr) rskfct fuFU
## | rskfctrnrsk -0.746
## | fuFU -0.513 0.510
## | rskfctrn:FU 0.478 -0.576 -0.908
## `----
## I get huge variation in the random effects
##
## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
## but one which is not significant
---------- cut here --------------------------------------------