Discussion:
[R-sig-ME] diverging results with and without random effects
Leha, Andreas
2018-11-26 11:10:10 UTC
Permalink
Hi all,

I am interested in assessing the association of a (potential) risk
factor to a (binary) grouping.

I am having trouble with diverging results from modeling one time point
(without random effect) and modeling two time points (with random effect).

When analysing the first time point (base line, BL) only, I get a highly
significant association.
Now, I want to see, whether there is an interaction between time and
risk factor (the risk factor is not constant). But when analysing both
time points, the estimated effect at BL is estimated to be not significant.

Now my simplified questions are:
(1) Is there an association at BL or not?
(2) How should I analyse both time points with this data?

The aim is to look for confounding with other factors. But I'd like to
understand the simple models before moving on.

Below you find a reproducible example and the detailed results.

Any suggestions would be highly appreciated!

Regards,
Andreas



PS: The code / results

---------- cut here --------------------------------------------
library("dplyr")
library("lme4")
library("lmerTest")
## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
library("pastebin")

## ---------------------------------- ##
## load the data ##
## ---------------------------------- ##
dat <- pastebin::get_paste("Xgwgtb7j") %>%
as.character %>%
gsub("\r\n", "", .) %>%
parse(text = .) %>%
eval



## ---------------------------------- ##
## have a look ##
## ---------------------------------- ##
dat
## ,----
## | # A tibble: 475 x 4
## | patient group fu riskfactor
## | <fct> <fct> <fct> <fct>
## | 1 p001 wt BL norisk
## | 2 p002 wt BL norisk
## | 3 p003 wt BL norisk
## | 4 p004 wt BL norisk
## | 5 p005 wt BL norisk
## | 6 p006 wt BL norisk
## | 7 p007 wt BL norisk
## | 8 p008 wt BL norisk
## | 9 p009 wt BL risk
## | 10 p010 wt BL norisk
## | # ... with 465 more rows
## `----
dat %>% str
## ,----
## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 475 obs. of 4 variables:
## | $ patient : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5 6 7
8 9 10 ...
## | $ group : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1 1 ...
## | $ fu : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1 1 ...
## | $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2 2 2
1 2 ...
## `----

## there are 265 patients
## in 2 groups: "wt" and "mut"
## with a dichotomous risk factor ("risk" and "norisk")
## measured at two time points ("BL" and "FU")

dat %>% summary
## ,----
## | patient group fu riskfactor
## | p001 : 2 wt :209 BL:258 risk :205
## | p002 : 2 mut:266 FU:217 norisk:270
## | p003 : 2
## | p004 : 2
## | p005 : 2
## | p006 : 2
## | (Other):463
## `----

## group sizes seem fine



## ---------------------------------------------- ##
## first, we look at the first time point, the BL ##
## ---------------------------------------------- ##

## we build a cross table
tab_bl <-
dat %>%
dplyr::select(group, riskfactor) %>%
table
tab_bl
## ,----
## | riskfactor
## | group risk norisk
## | wt 35 174
## | mut 170 96
## `----

## and we test using fisher:
tab_bl %>% fisher.test
## ,----
## | Fisher's Exact Test for Count Data
## |
## | data: .
## | p-value < 2.2e-16
## | alternative hypothesis: true odds ratio is not equal to 1
## | 95 percent confidence interval:
## | 0.07099792 0.18002325
## | sample estimates:
## | odds ratio
## | 0.1141677
## `----
log(0.114)
## ,----
## | [1] -2.171557
## `----

## so, we get a highly significant association of the riskfactor
## and the group with an log(odds ratio) of -2.2

## we get the same result using logistic regression:
dat %>%
glm(group ~ riskfactor, family = "binomial", data = .) %>%
summary
## ,----
## |
## | Call:
## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
## |
## | Deviance Residuals:
## | Min 1Q Median 3Q Max
## | -1.8802 -0.9374 0.6119 0.6119 1.4381
## |
## | Coefficients:
## | Estimate Std. Error z value Pr(>|z|)
## | (Intercept) 1.5805 0.1856 8.515 <2e-16 ***
## | riskfactornorisk -2.1752 0.2250 -9.668 <2e-16 ***
## | ---
## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | (Dispersion parameter for binomial family taken to be 1)
## |
## | Null deviance: 651.63 on 474 degrees of freedom
## | Residual deviance: 538.83 on 473 degrees of freedom
## | AIC: 542.83
## |
## | Number of Fisher Scoring iterations: 4
## `----



## ------------------------------------------------- ##
## Now, we analyse both time points with interaction ##
## ------------------------------------------------- ##

dat %>%
glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient), family =
"binomial", data = .) %>%
summary
## ,----
## | Generalized linear mixed model fit by maximum likelihood (Laplace
## | Approximation) [glmerMod]
## | Family: binomial ( logit )
## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
## | Data: .
## |
## | AIC BIC logLik deviance df.resid
## | 345.2 366.0 -167.6 335.2 470
## |
## | Scaled residuals:
## | Min 1Q Median 3Q Max
## | -0.095863 -0.058669 0.002278 0.002866 0.007324
## |
## | Random effects:
## | Groups Name Variance Std.Dev.
## | patient (Intercept) 1849 43
## | Number of obs: 475, groups: patient, 265
## |
## | Fixed effects:
## | Estimate Std. Error z value Pr(>|z|)
## | (Intercept) 11.6846 1.3736 8.507 <2e-16 ***
## | riskfactornorisk -1.5919 1.4166 -1.124 0.261
## | fuFU 0.4596 1.9165 0.240 0.810
## | riskfactornorisk:fuFU -0.8183 2.1651 -0.378 0.705
## | ---
## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | Correlation of Fixed Effects:
## | (Intr) rskfct fuFU
## | rskfctrnrsk -0.746
## | fuFU -0.513 0.510
## | rskfctrn:FU 0.478 -0.576 -0.908
## `----

## I get huge variation in the random effects
##
## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
## but one which is not significant
---------- cut here --------------------------------------------
Leha, Andreas
2018-11-26 12:47:57 UTC
Permalink
Hi all,

sent the wrong code (w/o filtering for BL). If you want to look at the
data, please use this code:

---------- cut here --------------------------------------------
library("dplyr")
library("lme4")
library("lmerTest")
## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
library("pastebin")

## ---------------------------------- ##
## load the data ##
## ---------------------------------- ##
dat <- pastebin::get_paste("Xgwgtb7j") %>% as.character %>% gsub("\r\n",
"", .) %>% parse(text = .) %>% eval



## ---------------------------------- ##
## have a look ##
## ---------------------------------- ##
dat
## ,----
## | # A tibble: 475 x 4
## | patient group fu riskfactor
## | <fct> <fct> <fct> <fct>
## | 1 p001 wt BL norisk
## | 2 p002 wt BL norisk
## | 3 p003 wt BL norisk
## | 4 p004 wt BL norisk
## | 5 p005 wt BL norisk
## | 6 p006 wt BL norisk
## | 7 p007 wt BL norisk
## | 8 p008 wt BL norisk
## | 9 p009 wt BL risk
## | 10 p010 wt BL norisk
## | # ... with 465 more rows
## `----
dat %>% str
## ,----
## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 475 obs. of 4 variables:
## | $ patient : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5 6 7
8 9 10 ...
## | $ group : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1 1 ...
## | $ fu : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1 1 ...
## | $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2 2 2
1 2 ...
## `----

## there are 265 patients
## in 2 groups: "wt" and "mut"
## with a dichotomous risk factor ("risk" and "norisk")
## measured at two time points ("BL" and "FU")

dat %>% summary
## ,----
## | patient group fu riskfactor
## | p001 : 2 wt :209 BL:258 risk :205
## | p002 : 2 mut:266 FU:217 norisk:270
## | p003 : 2
## | p004 : 2
## | p005 : 2
## | p006 : 2
## | (Other):463
## `----

## group sizes seem fine



## ---------------------------------------------- ##
## first, we look at the first time point, the BL ##
## ---------------------------------------------- ##

## we build a cross table
tab_bl <-
dat %>%
dplyr::filter(fu == "BL") %>%
dplyr::select(group, riskfactor) %>%
table
tab_bl
## ,----
## | riskfactor
## | group risk norisk
## | wt 22 86
## | mut 87 63
## `----

## and we test using fisher:
tab_bl %>% fisher.test
## ,----
## | Fisher's Exact Test for Count Data
## |
## | data: .
## | p-value = 1.18e-09
## | alternative hypothesis: true odds ratio is not equal to 1
## | 95 percent confidence interval:
## | 0.09986548 0.33817966
## | sample estimates:
## | odds ratio
## | 0.1865377
## `----
log(0.187)
## ,----
## | [1] -1.676647
## `----

## so, we get a highly significant association of the riskfactor
## and the group with an log(odds ratio) of -1.7

## we get the same result using logistic regression:
dat %>%
filter(fu == "BL") %>%
glm(group ~ riskfactor, family = "binomial", data = .) %>%
summary
## ,----
## | Call:
## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
## |
## | Deviance Residuals:
## | Min 1Q Median 3Q Max
## | -1.7890 -1.0484 0.6715 0.6715 1.3121
## |
## | Coefficients:
## | Estimate Std. Error z value Pr(>|z|)
## | (Intercept) 1.3749 0.2386 5.761 8.35e-09 ***
## | riskfactornorisk -1.6861 0.2906 -5.802 6.55e-09 ***
## | ---
## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | (Dispersion parameter for binomial family taken to be 1)
## |
## | Null deviance: 350.80 on 257 degrees of freedom
## | Residual deviance: 312.63 on 256 degrees of freedom
## | AIC: 316.63
## |
## | Number of Fisher Scoring iterations: 4
## `----



## ------------------------------------------------- ##
## Now, we analyse both time points with interaction ##
## ------------------------------------------------- ##

dat %>%
glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient), family =
"binomial", data = .) %>%
summary
## ,----
## | Generalized linear mixed model fit by maximum likelihood (Laplace
## | Approximation) [glmerMod]
## | Family: binomial ( logit )
## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
## | Data: .
## |
## | AIC BIC logLik deviance df.resid
## | 345.2 366.0 -167.6 335.2 470
## |
## | Scaled residuals:
## | Min 1Q Median 3Q Max
## | -0.095863 -0.058669 0.002278 0.002866 0.007324
## |
## | Random effects:
## | Groups Name Variance Std.Dev.
## | patient (Intercept) 1849 43
## | Number of obs: 475, groups: patient, 265
## |
## | Fixed effects:
## | Estimate Std. Error z value Pr(>|z|)
## | (Intercept) 11.6846 1.3736 8.507 <2e-16 ***
## | riskfactornorisk -1.5919 1.4166 -1.124 0.261
## | fuFU 0.4596 1.9165 0.240 0.810
## | riskfactornorisk:fuFU -0.8183 2.1651 -0.378 0.705
## | ---
## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | Correlation of Fixed Effects:
## | (Intr) rskfct fuFU
## | rskfctrnrsk -0.746
## | fuFU -0.513 0.510
## | rskfctrn:FU 0.478 -0.576 -0.908
## `----

## I get huge variation in the random effects
##
## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
## but one which is not significant
---------- cut here --------------------------------------------
Post by Leha, Andreas
Hi all,
I am interested in assessing the association of a (potential) risk
factor to a (binary) grouping.
I am having trouble with diverging results from modeling one time point
(without random effect) and modeling two time points (with random effect).
When analysing the first time point (base line, BL) only, I get a highly
significant association.
Now, I want to see, whether there is an interaction between time and
risk factor (the risk factor is not constant). But when analysing both
time points, the estimated effect at BL is estimated to be not significant.
(1) Is there an association at BL or not?
(2) How should I analyse both time points with this data?
The aim is to look for confounding with other factors. But I'd like to
understand the simple models before moving on.
Below you find a reproducible example and the detailed results.
Any suggestions would be highly appreciated!
Regards,
Andreas
PS: The code / results
---------- cut here --------------------------------------------
library("dplyr")
library("lme4")
library("lmerTest")
## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
library("pastebin")
## ---------------------------------- ##
## load the data ##
## ---------------------------------- ##
dat <- pastebin::get_paste("Xgwgtb7j") %>%
as.character %>%
gsub("\r\n", "", .) %>%
parse(text = .) %>%
eval
## ---------------------------------- ##
## have a look ##
## ---------------------------------- ##
dat
## ,----
## | # A tibble: 475 x 4
## | patient group fu riskfactor
## | <fct> <fct> <fct> <fct>
## | 1 p001 wt BL norisk
## | 2 p002 wt BL norisk
## | 3 p003 wt BL norisk
## | 4 p004 wt BL norisk
## | 5 p005 wt BL norisk
## | 6 p006 wt BL norisk
## | 7 p007 wt BL norisk
## | 8 p008 wt BL norisk
## | 9 p009 wt BL risk
## | 10 p010 wt BL norisk
## | # ... with 465 more rows
## `----
dat %>% str
## ,----
## | $ patient : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5 6 7
8 9 10 ...
## | $ group : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1 1 ...
## | $ fu : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1 1 ...
## | $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2 2 2
1 2 ...
## `----
## there are 265 patients
## in 2 groups: "wt" and "mut"
## with a dichotomous risk factor ("risk" and "norisk")
## measured at two time points ("BL" and "FU")
dat %>% summary
## ,----
## | patient group fu riskfactor
## | p001 : 2 wt :209 BL:258 risk :205
## | p002 : 2 mut:266 FU:217 norisk:270
## | p003 : 2
## | p004 : 2
## | p005 : 2
## | p006 : 2
## | (Other):463
## `----
## group sizes seem fine
## ---------------------------------------------- ##
## first, we look at the first time point, the BL ##
## ---------------------------------------------- ##
## we build a cross table
tab_bl <-
dat %>%
dplyr::select(group, riskfactor) %>%
table
tab_bl
## ,----
## | riskfactor
## | group risk norisk
## | wt 35 174
## | mut 170 96
## `----
tab_bl %>% fisher.test
## ,----
## | Fisher's Exact Test for Count Data
## |
## | data: .
## | p-value < 2.2e-16
## | alternative hypothesis: true odds ratio is not equal to 1
## | 0.07099792 0.18002325
## | odds ratio
## | 0.1141677
## `----
log(0.114)
## ,----
## | [1] -2.171557
## `----
## so, we get a highly significant association of the riskfactor
## and the group with an log(odds ratio) of -2.2
dat %>%
glm(group ~ riskfactor, family = "binomial", data = .) %>%
summary
## ,----
## |
## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
## |
## | Min 1Q Median 3Q Max
## | -1.8802 -0.9374 0.6119 0.6119 1.4381
## |
## | Estimate Std. Error z value Pr(>|z|)
## | (Intercept) 1.5805 0.1856 8.515 <2e-16 ***
## | riskfactornorisk -2.1752 0.2250 -9.668 <2e-16 ***
## | ---
## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | (Dispersion parameter for binomial family taken to be 1)
## |
## | Null deviance: 651.63 on 474 degrees of freedom
## | Residual deviance: 538.83 on 473 degrees of freedom
## | AIC: 542.83
## |
## | Number of Fisher Scoring iterations: 4
## `----
## ------------------------------------------------- ##
## Now, we analyse both time points with interaction ##
## ------------------------------------------------- ##
dat %>%
glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient), family =
"binomial", data = .) %>%
summary
## ,----
## | Generalized linear mixed model fit by maximum likelihood (Laplace
## | Approximation) [glmerMod]
## | Family: binomial ( logit )
## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
## | Data: .
## |
## | AIC BIC logLik deviance df.resid
## | 345.2 366.0 -167.6 335.2 470
## |
## | Min 1Q Median 3Q Max
## | -0.095863 -0.058669 0.002278 0.002866 0.007324
## |
## | Groups Name Variance Std.Dev.
## | patient (Intercept) 1849 43
## | Number of obs: 475, groups: patient, 265
## |
## | Estimate Std. Error z value Pr(>|z|)
## | (Intercept) 11.6846 1.3736 8.507 <2e-16 ***
## | riskfactornorisk -1.5919 1.4166 -1.124 0.261
## | fuFU 0.4596 1.9165 0.240 0.810
## | riskfactornorisk:fuFU -0.8183 2.1651 -0.378 0.705
## | ---
## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | (Intr) rskfct fuFU
## | rskfctrnrsk -0.746
## | fuFU -0.513 0.510
## | rskfctrn:FU 0.478 -0.576 -0.908
## `----
## I get huge variation in the random effects
##
## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
## but one which is not significant
---------- cut here --------------------------------------------
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Dr. Andreas Leha
Head of the 'Core Facility
Medical Biometry and Statistical Bioinformatics'

UNIVERSITY MEDICAL CENTER GÖTTINGEN
GEORG-AUGUST-UNIVERSITÄT
Department of Medical Statistics
Humboldtallee 32
37073 Göttingen
Mailing Address: 37099 Göttingen, Germany
Fax: +49 (0) 551 39-4995
Tel: +49 (0) 551 39-4987
http://www.ams.med.uni-goettingen.de/service-de.shtml
Leha, Andreas
2018-11-26 16:00:53 UTC
Permalink
Dear Thierry,

thanks for looking into this!

So, one solution would be a baysian analysis, right?

Would you have a recommendation for me?

I followed [1] and used

library("blme")
dat %>%
bglmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient),
family = "binomial",
data = .,
fixef.prior = normal(cov = diag(9,4))) %>%
summary

Which runs and gives the following fixed effect estimates:


Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.2598 0.7445 11.094 <2e-16 ***
riskfactornorisk -16.0942 1.3085 -12.300 <2e-16 ***
fuFU 1.0019 1.0047 0.997 0.319
riskfactornorisk:fuFU -1.8675 1.2365 -1.510 0.131


These still do not seem reasonable.

Thanks in advance!

Regards,
Andreas


[1]
https://stats.stackexchange.com/questions/132677/binomial-glmm-with-a-categorical-variable-with-full-successes/132678#132678
Dear Andreas,
This is due to quasi complete separatation. This occurs when all
responses for a specific combination of levels are always TRUE or FALSE.
In your case, you have only two observations per patient. Hence adding
the patient as random effect, guarantees quasi complete separation issues.  
Best regards,
ir. Thierry Onkelinx
Statisticus / Statistician
Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be <http://www.inbo.be>
///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////
<https://www.inbo.be>
Op ma 26 nov. 2018 om 13:48 schreef Leha, Andreas
Hi all,
sent the wrong code (w/o filtering for BL).  If you want to look at the
---------- cut here --------------------------------------------
library("dplyr")
library("lme4")
library("lmerTest")
## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
library("pastebin")
## ---------------------------------- ##
## load the data                      ##
## ---------------------------------- ##
dat <- pastebin::get_paste("Xgwgtb7j") %>% as.character %>% gsub("\r\n",
"", .) %>% parse(text = .) %>% eval
## ---------------------------------- ##
## have a look                        ##
## ---------------------------------- ##
dat
## ,----
## | # A tibble: 475 x 4
## |    patient group fu    riskfactor
## |    <fct>   <fct> <fct> <fct>
## |  1 p001    wt    BL    norisk
## |  2 p002    wt    BL    norisk
## |  3 p003    wt    BL    norisk
## |  4 p004    wt    BL    norisk
## |  5 p005    wt    BL    norisk
## |  6 p006    wt    BL    norisk
## |  7 p007    wt    BL    norisk
## |  8 p008    wt    BL    norisk
## |  9 p009    wt    BL    risk
## | 10 p010    wt    BL    norisk
## | # ... with 465 more rows
## `----
dat %>% str
## ,----
## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  475 obs. of  4
## |  $ patient   : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5 6 7
8 9 10 ...
## |  $ group     : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1
1 ...
## |  $ fu        : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1
1 ...
## |  $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2 2 2
1 2 ...
## `----
## there are 265 patients
## in 2 groups: "wt" and "mut"
## with a dichotomous risk factor ("risk" and "norisk")
## measured at two time points ("BL" and "FU")
dat %>% summary
## ,----
## |     patient    group      fu       riskfactor
## |  p001   :  2   wt :209   BL:258   risk  :205
## |  p002   :  2   mut:266   FU:217   norisk:270
## |  p003   :  2
## |  p004   :  2
## |  p005   :  2
## |  p006   :  2
## |  (Other):463
## `----
## group sizes seem fine
## ---------------------------------------------- ##
## first, we look at the first time point, the BL ##
## ---------------------------------------------- ##
## we build a cross table
tab_bl <-
  dat %>%
  dplyr::filter(fu == "BL") %>%
  dplyr::select(group, riskfactor) %>%
  table
tab_bl
## ,----
## |      riskfactor
## | group risk norisk
## |   wt    22     86
## |   mut   87     63
## `----
tab_bl %>% fisher.test
## ,----
## |    Fisher's Exact Test for Count Data
## |
## | data:  .
## | p-value = 1.18e-09
## | alternative hypothesis: true odds ratio is not equal to 1
## |  0.09986548 0.33817966
## | odds ratio
## |  0.1865377
## `----
log(0.187)
## ,----
## | [1] -1.676647
## `----
## so, we get a highly significant association of the riskfactor
## and the group with an log(odds ratio) of -1.7
dat %>%
  filter(fu == "BL") %>%
  glm(group ~ riskfactor, family = "binomial", data = .) %>%
  summary
## ,----
## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
## |
## |     Min       1Q   Median       3Q      Max
## | -1.7890  -1.0484   0.6715   0.6715   1.3121
## |
## |                  Estimate Std. Error z value Pr(>|z|)
## | (Intercept)        1.3749     0.2386   5.761 8.35e-09 ***
## | riskfactornorisk  -1.6861     0.2906  -5.802 6.55e-09 ***
## | ---
## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | (Dispersion parameter for binomial family taken to be 1)
## |
## |     Null deviance: 350.80  on 257  degrees of freedom
## | Residual deviance: 312.63  on 256  degrees of freedom
## | AIC: 316.63
## |
## | Number of Fisher Scoring iterations: 4
## `----
## ------------------------------------------------- ##
## Now, we analyse both time points with interaction ##
## ------------------------------------------------- ##
dat %>%
  glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient), family =
"binomial", data = .) %>%
  summary
## ,----
## | Generalized linear mixed model fit by maximum likelihood (Laplace
## |   Approximation) [glmerMod]
## |  Family: binomial  ( logit )
## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
## |    Data: .
## |
## |      AIC      BIC   logLik deviance df.resid
## |    345.2    366.0   -167.6    335.2      470
## |
## |       Min        1Q    Median        3Q       Max
## | -0.095863 -0.058669  0.002278  0.002866  0.007324
## |
## |  Groups  Name        Variance Std.Dev.
## |  patient (Intercept) 1849     43
## | Number of obs: 475, groups:  patient, 265
## |
## |                       Estimate Std. Error z value Pr(>|z|)
## | (Intercept)            11.6846     1.3736   8.507   <2e-16 ***
## | riskfactornorisk       -1.5919     1.4166  -1.124    0.261
## | fuFU                    0.4596     1.9165   0.240    0.810
## | riskfactornorisk:fuFU  -0.8183     2.1651  -0.378    0.705
## | ---
## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## |             (Intr) rskfct fuFU
## | rskfctrnrsk -0.746
## | fuFU        -0.513  0.510
## | rskfctrn:FU  0.478 -0.576 -0.908
## `----
## I get huge variation in the random effects
##
## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
## but one which is not significant
---------- cut here --------------------------------------------
Post by Leha, Andreas
Hi all,
I am interested in assessing the association of a (potential) risk
factor to a (binary) grouping.
I am having trouble with diverging results from modeling one time
point
Post by Leha, Andreas
(without random effect) and modeling two time points (with random
effect).
Post by Leha, Andreas
When analysing the first time point (base line, BL) only, I get a
highly
Post by Leha, Andreas
significant association.
Now, I want to see, whether there is an interaction between time and
risk factor (the risk factor is not constant).  But when analysing
both
Post by Leha, Andreas
time points, the estimated effect at BL is estimated to be not
significant.
Post by Leha, Andreas
(1) Is there an association at BL or not?
(2) How should I analyse both time points with this data?
The aim is to look for confounding with other factors.  But I'd
like to
Post by Leha, Andreas
understand the simple models before moving on.
Below you find a reproducible example and the detailed results.
Any suggestions would be highly appreciated!
Regards,
Andreas
PS: The code / results
---------- cut here --------------------------------------------
library("dplyr")
library("lme4")
library("lmerTest")
## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
library("pastebin")
## ---------------------------------- ##
## load the data                      ##
## ---------------------------------- ##
dat <- pastebin::get_paste("Xgwgtb7j") %>%
   as.character %>%
   gsub("\r\n", "", .) %>%
   parse(text = .) %>%
   eval
## ---------------------------------- ##
## have a look                        ##
## ---------------------------------- ##
dat
## ,----
## | # A tibble: 475 x 4
## |    patient group fu    riskfactor
## |    <fct>   <fct> <fct> <fct>
## |  1 p001    wt    BL    norisk
## |  2 p002    wt    BL    norisk
## |  3 p003    wt    BL    norisk
## |  4 p004    wt    BL    norisk
## |  5 p005    wt    BL    norisk
## |  6 p006    wt    BL    norisk
## |  7 p007    wt    BL    norisk
## |  8 p008    wt    BL    norisk
## |  9 p009    wt    BL    risk
## | 10 p010    wt    BL    norisk
## | # ... with 465 more rows
## `----
dat %>% str
## ,----
## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame':        475 obs. of 
## |  $ patient   : Factor w/ 265 levels "p001","p002",..: 1 2 3 4
5 6 7
Post by Leha, Andreas
8 9 10 ...
## |  $ group     : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1
1 1 ...
Post by Leha, Andreas
## |  $ fu        : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1
1 1 ...
Post by Leha, Andreas
## |  $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2
2 2 2
Post by Leha, Andreas
1 2 ...
## `----
## there are 265 patients
## in 2 groups: "wt" and "mut"
## with a dichotomous risk factor ("risk" and "norisk")
## measured at two time points ("BL" and "FU")
dat %>% summary
## ,----
## |     patient    group      fu       riskfactor
## |  p001   :  2   wt :209   BL:258   risk  :205
## |  p002   :  2   mut:266   FU:217   norisk:270
## |  p003   :  2
## |  p004   :  2
## |  p005   :  2
## |  p006   :  2
## |  (Other):463
## `----
## group sizes seem fine
## ---------------------------------------------- ##
## first, we look at the first time point, the BL ##
## ---------------------------------------------- ##
## we build a cross table
tab_bl <-
   dat %>%
   dplyr::select(group, riskfactor) %>%
   table
tab_bl
## ,----
## |      riskfactor
## | group risk norisk
## |   wt    35    174
## |   mut  170     96
## `----
tab_bl %>% fisher.test
## ,----
## |    Fisher's Exact Test for Count Data
## |
## | data:  .
## | p-value < 2.2e-16
## | alternative hypothesis: true odds ratio is not equal to 1
## |  0.07099792 0.18002325
## | odds ratio
## |  0.1141677
## `----
log(0.114)
## ,----
## | [1] -2.171557
## `----
## so, we get a highly significant association of the riskfactor
## and the group with an log(odds ratio) of -2.2
dat %>%
   glm(group ~ riskfactor, family = "binomial", data = .) %>%
   summary
## ,----
## |
## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
## |
## |     Min       1Q   Median       3Q      Max
## | -1.8802  -0.9374   0.6119   0.6119   1.4381
## |
## |                  Estimate Std. Error z value Pr(>|z|)
## | (Intercept)        1.5805     0.1856   8.515   <2e-16 ***
## | riskfactornorisk  -2.1752     0.2250  -9.668   <2e-16 ***
## | ---
## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | (Dispersion parameter for binomial family taken to be 1)
## |
## |     Null deviance: 651.63  on 474  degrees of freedom
## | Residual deviance: 538.83  on 473  degrees of freedom
## | AIC: 542.83
## |
## | Number of Fisher Scoring iterations: 4
## `----
## ------------------------------------------------- ##
## Now, we analyse both time points with interaction ##
## ------------------------------------------------- ##
dat %>%
   glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient),
family =
Post by Leha, Andreas
"binomial", data = .) %>%
   summary
## ,----
## | Generalized linear mixed model fit by maximum likelihood (Laplace
## |   Approximation) [glmerMod]
## |  Family: binomial  ( logit )
## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
## |    Data: .
## |
## |      AIC      BIC   logLik deviance df.resid
## |    345.2    366.0   -167.6    335.2      470
## |
## |       Min        1Q    Median        3Q       Max
## | -0.095863 -0.058669  0.002278  0.002866  0.007324
## |
## |  Groups  Name        Variance Std.Dev.
## |  patient (Intercept) 1849     43
## | Number of obs: 475, groups:  patient, 265
## |
## |                       Estimate Std. Error z value Pr(>|z|)
## | (Intercept)            11.6846     1.3736   8.507   <2e-16 ***
## | riskfactornorisk       -1.5919     1.4166  -1.124    0.261
## | fuFU                    0.4596     1.9165   0.240    0.810
## | riskfactornorisk:fuFU  -0.8183     2.1651  -0.378    0.705
## | ---
## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## |             (Intr) rskfct fuFU
## | rskfctrnrsk -0.746
## | fuFU        -0.513  0.510
## | rskfctrn:FU  0.478 -0.576 -0.908
## `----
## I get huge variation in the random effects
##
## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
## but one which is not significant
---------- cut here --------------------------------------------
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Dr. Andreas Leha
Head of the 'Core Facility
Medical Biometry and Statistical Bioinformatics'
UNIVERSITY MEDICAL CENTER GÖTTINGEN
GEORG-AUGUST-UNIVERSITÄT
Department of Medical Statistics
Humboldtallee 32
37073 Göttingen
Mailing Address: 37099 Göttingen, Germany
Fax: +49 (0) 551 39-4995
Tel: +49 (0) 551 39-4987
http://www.ams.med.uni-goettingen.de/service-de.shtml
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Dr. Andreas Leha
Head of the 'Core Facility
Medical Biometry and Statistical Bioinformatics'

UNIVERSITY MEDICAL CENTER GÖTTINGEN
GEORG-AUGUST-UNIVERSITÄT
Department of Medical Statistics
Humboldtallee 32
37073 Göttingen
Mailing Address: 37099 Göttingen, Germany
Fax: +49 (0) 551 39-4995
Tel: +49 (0) 551 39-4987
http://www.ams.med.uni-goettingen.de/service-de.shtml
Leha, Andreas
2018-11-27 04:30:55 UTC
Permalink
Dear Thierry and all,

Thanks for your continued help here. I am not versed with Bayesian
analyses.

Below is the code I currently use. The priors are basically due to
trial and error until I got expected/reasonable results.

Therefor I would be grateful for some comments on the
(in-)appropriateness of my (quite extreme) parameters.

As cov.prior I used
invwishart(df = 50, scale = diag(0.5, 1))

Thanks in advance!

Regards,
Andreas


PS: The code/results


library("blme")
dat %>%
bglmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient),
family = "binomial",
data = .,
cov.prior = invwishart(df = 50, scale = diag(0.5, 1)),
fixef.prior = normal(cov = diag(9,4))) %>%
summary
## ,----
## | Cov prior : patient ~ invwishart(df = 50, scale = 0.5,
## | posterior.scale = cov, common.scale = TRUE)
## | Fixef prior: normal(sd = c(3, 3, ...), corr = c(0 ...),
## | common.scale = FALSE)
## | Prior dev : 6.2087
## |
## | Generalized linear mixed model fit by maximum likelihood (Laplace
## | Approximation) [bglmerMod]
## | Family: binomial ( logit )
## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
## | Data: .
## |
## | AIC BIC logLik deviance df.resid
## | 540.0 560.8 -265.0 530.0 470
## |
## | Scaled residuals:
## | Min 1Q Median 3Q Max
## | -2.4984 -0.8512 0.3979 0.5038 1.6228
## |
## | Random effects:
## | Groups Name Variance Std.Dev.
## | patient (Intercept) 0.009725 0.09862
## | Number of obs: 475, groups: patient, 265
## |
## | Fixed effects:
## | Estimate Std. Error z value Pr(>|z|)
## | (Intercept) 1.3679 0.2355 5.810 6.26e-09 ***
## | riskfactornorisk -1.6776 0.2868 -5.850 4.91e-09 ***
## | fuFU 0.4718 0.3738 1.262 0.2069
## | riskfactornorisk:fuFU -1.1375 0.4539 -2.506 0.0122 *
## | ---
## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | Correlation of Fixed Effects:
## | (Intr) rskfct fuFU
## | rskfctrnrsk -0.816
## | fuFU -0.617 0.502
## | rskfctrn:FU 0.503 -0.618 -0.817
## `----
Dear Andreas,
You'll need a very informative prior for the random intercept variance
in order to keep the random intercepts reasonable small.
Best regards,
ir. Thierry Onkelinx
Statisticus / Statistician
Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be <http://www.inbo.be>
///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////
<https://www.inbo.be>
Op ma 26 nov. 2018 om 17:00 schreef Leha, Andreas
Dear Thierry,
thanks for looking into this!
So, one solution would be a baysian analysis, right?
Would you have a recommendation for me?
I followed [1] and used
  library("blme")
  dat %>%
    bglmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient),
           family = "binomial",
           data = .,
           fixef.prior = normal(cov = diag(9,4))) %>%
    summary
                        Estimate Std. Error z value Pr(>|z|)
  (Intercept)             8.2598     0.7445  11.094   <2e-16 ***
  riskfactornorisk      -16.0942     1.3085 -12.300   <2e-16 ***
  fuFU                    1.0019     1.0047   0.997    0.319
  riskfactornorisk:fuFU  -1.8675     1.2365  -1.510    0.131
These still do not seem reasonable.
Thanks in advance!
Regards,
Andreas
[1]
https://stats.stackexchange.com/questions/132677/binomial-glmm-with-a-categorical-variable-with-full-successes/132678#132678
Dear Andreas,
This is due to quasi complete separatation. This occurs when all
responses for a specific combination of levels are always TRUE or
FALSE.
In your case, you have only two observations per patient. Hence adding
the patient as random effect, guarantees quasi complete separation
issues.  
Best regards,
ir. Thierry Onkelinx
Statisticus / Statistician
Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be <http://www.inbo.be> <http://www.inbo.be>
///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no
more
than asking him to perform a post-mortem examination: he may be
able to
say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer
does not
ensure that a reasonable answer can be extracted from a given body of
data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////
<https://www.inbo.be>
Op ma 26 nov. 2018 om 13:48 schreef Leha, Andreas
     Hi all,
     sent the wrong code (w/o filtering for BL).  If you want to
look at the
     ---------- cut here --------------------------------------------
     library("dplyr")
     library("lme4")
     library("lmerTest")
     ## install_github("hrbrmstr/pastebin", upgrade_dependencies =
FALSE)
     library("pastebin")
     ## ---------------------------------- ##
     ## load the data                      ##
     ## ---------------------------------- ##
     dat <- pastebin::get_paste("Xgwgtb7j") %>% as.character %>%
gsub("\r\n",
     "", .) %>% parse(text = .) %>% eval
     ## ---------------------------------- ##
     ## have a look                        ##
     ## ---------------------------------- ##
     dat
     ## ,----
     ## | # A tibble: 475 x 4
     ## |    patient group fu    riskfactor
     ## |    <fct>   <fct> <fct> <fct>
     ## |  1 p001    wt    BL    norisk
     ## |  2 p002    wt    BL    norisk
     ## |  3 p003    wt    BL    norisk
     ## |  4 p004    wt    BL    norisk
     ## |  5 p005    wt    BL    norisk
     ## |  6 p006    wt    BL    norisk
     ## |  7 p007    wt    BL    norisk
     ## |  8 p008    wt    BL    norisk
     ## |  9 p009    wt    BL    risk
     ## | 10 p010    wt    BL    norisk
     ## | # ... with 465 more rows
     ## `----
     dat %>% str
     ## ,----
     ## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  475 obs. of  4
     ## |  $ patient   : Factor w/ 265 levels "p001","p002",..: 1 2
3 4 5 6 7
     8 9 10 ...
     ## |  $ group     : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1
1 1 1
     1 ...
     ## |  $ fu        : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1
1 1 1
     1 ...
     ## |  $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2
2 2 2 2 2
     1 2 ...
     ## `----
     ## there are 265 patients
     ## in 2 groups: "wt" and "mut"
     ## with a dichotomous risk factor ("risk" and "norisk")
     ## measured at two time points ("BL" and "FU")
     dat %>% summary
     ## ,----
     ## |     patient    group      fu       riskfactor
     ## |  p001   :  2   wt :209   BL:258   risk  :205
     ## |  p002   :  2   mut:266   FU:217   norisk:270
     ## |  p003   :  2
     ## |  p004   :  2
     ## |  p005   :  2
     ## |  p006   :  2
     ## |  (Other):463
     ## `----
     ## group sizes seem fine
     ## ---------------------------------------------- ##
     ## first, we look at the first time point, the BL ##
     ## ---------------------------------------------- ##
     ## we build a cross table
     tab_bl <-
       dat %>%
       dplyr::filter(fu == "BL") %>%
       dplyr::select(group, riskfactor) %>%
       table
     tab_bl
     ## ,----
     ## |      riskfactor
     ## | group risk norisk
     ## |   wt    22     86
     ## |   mut   87     63
     ## `----
     tab_bl %>% fisher.test
     ## ,----
     ## |    Fisher's Exact Test for Count Data
     ## |
     ## | data:  .
     ## | p-value = 1.18e-09
     ## | alternative hypothesis: true odds ratio is not equal to 1
     ## |  0.09986548 0.33817966
     ## | odds ratio
     ## |  0.1865377
     ## `----
     log(0.187)
     ## ,----
     ## | [1] -1.676647
     ## `----
     ## so, we get a highly significant association of the riskfactor
     ## and the group with an log(odds ratio) of -1.7
     dat %>%
       filter(fu == "BL") %>%
       glm(group ~ riskfactor, family = "binomial", data = .) %>%
       summary
     ## ,----
     ## | glm(formula = group ~ riskfactor, family = "binomial",
data = .)
     ## |
     ## |     Min       1Q   Median       3Q      Max
     ## | -1.7890  -1.0484   0.6715   0.6715   1.3121
     ## |
     ## |                  Estimate Std. Error z value Pr(>|z|)
     ## | (Intercept)        1.3749     0.2386   5.761 8.35e-09 ***
     ## | riskfactornorisk  -1.6861     0.2906  -5.802 6.55e-09 ***
     ## | ---
     ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1
‘ ’ 1
     ## |
     ## | (Dispersion parameter for binomial family taken to be 1)
     ## |
     ## |     Null deviance: 350.80  on 257  degrees of freedom
     ## | Residual deviance: 312.63  on 256  degrees of freedom
     ## | AIC: 316.63
     ## |
     ## | Number of Fisher Scoring iterations: 4
     ## `----
     ## ------------------------------------------------- ##
     ## Now, we analyse both time points with interaction ##
     ## ------------------------------------------------- ##
     dat %>%
       glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient),
family =
     "binomial", data = .) %>%
       summary
     ## ,----
     ## | Generalized linear mixed model fit by maximum likelihood
(Laplace
     ## |   Approximation) [glmerMod]
     ## |  Family: binomial  ( logit )
     ## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 |
patient)
     ## |    Data: .
     ## |
     ## |      AIC      BIC   logLik deviance df.resid
     ## |    345.2    366.0   -167.6    335.2      470
     ## |
     ## |       Min        1Q    Median        3Q       Max
     ## | -0.095863 -0.058669  0.002278  0.002866  0.007324
     ## |
     ## |  Groups  Name        Variance Std.Dev.
     ## |  patient (Intercept) 1849     43
     ## | Number of obs: 475, groups:  patient, 265
     ## |
     ## |                       Estimate Std. Error z value Pr(>|z|)
     ## | (Intercept)            11.6846     1.3736   8.507 
 <2e-16 ***
     ## | riskfactornorisk       -1.5919     1.4166  -1.124    0.261
     ## | fuFU                    0.4596     1.9165   0.240    0.810
     ## | riskfactornorisk:fuFU  -0.8183     2.1651  -0.378    0.705
     ## | ---
     ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1
‘ ’ 1
     ## |
     ## |             (Intr) rskfct fuFU
     ## | rskfctrnrsk -0.746
     ## | fuFU        -0.513  0.510
     ## | rskfctrn:FU  0.478 -0.576 -0.908
     ## `----
     ## I get huge variation in the random effects
     ##
     ## And the risk factor at BL gets an estimated log(odds ratio)
of -1.6
     ## but one which is not significant
     ---------- cut here --------------------------------------------
     > Hi all,
     >
     > I am interested in assessing the association of a
(potential) risk
     > factor to a (binary) grouping.
     >
     > I am having trouble with diverging results from modeling one
time
     point
     > (without random effect) and modeling two time points (with
random
     effect).
     >
     > When analysing the first time point (base line, BL) only, I
get a
     highly
     > significant association.
     > Now, I want to see, whether there is an interaction between
time and
     > risk factor (the risk factor is not constant).  But when
analysing
     both
     > time points, the estimated effect at BL is estimated to be not
     significant.
     >
     > (1) Is there an association at BL or not?
     > (2) How should I analyse both time points with this data?
     >
     > The aim is to look for confounding with other factors.  But I'd
     like to
     > understand the simple models before moving on.
     >
     > Below you find a reproducible example and the detailed results.
     >
     > Any suggestions would be highly appreciated!
     >
     > Regards,
     > Andreas
     >
     >
     >
     > PS: The code / results
     >
     > ---------- cut here --------------------------------------------
     > library("dplyr")
     > library("lme4")
     > library("lmerTest")
     > ## install_github("hrbrmstr/pastebin", upgrade_dependencies
= FALSE)
     > library("pastebin")
     >
     > ## ---------------------------------- ##
     > ## load the data                      ##
     > ## ---------------------------------- ##
     > dat <- pastebin::get_paste("Xgwgtb7j") %>%
     >   as.character %>%
     >   gsub("\r\n", "", .) %>%
     >   parse(text = .) %>%
     >   eval
     >
     >
     >
     > ## ---------------------------------- ##
     > ## have a look                        ##
     > ## ---------------------------------- ##
     > dat
     > ## ,----
     > ## | # A tibble: 475 x 4
     > ## |    patient group fu    riskfactor
     > ## |    <fct>   <fct> <fct> <fct>
     > ## |  1 p001    wt    BL    norisk
     > ## |  2 p002    wt    BL    norisk
     > ## |  3 p003    wt    BL    norisk
     > ## |  4 p004    wt    BL    norisk
     > ## |  5 p005    wt    BL    norisk
     > ## |  6 p006    wt    BL    norisk
     > ## |  7 p007    wt    BL    norisk
     > ## |  8 p008    wt    BL    norisk
     > ## |  9 p009    wt    BL    risk
     > ## | 10 p010    wt    BL    norisk
     > ## | # ... with 465 more rows
     > ## `----
     > dat %>% str
     > ## ,----
     > ## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame':        475
obs. of 
     > ## |  $ patient   : Factor w/ 265 levels "p001","p002",..: 1
2 3 4
     5 6 7
     > 8 9 10 ...
     > ## |  $ group     : Factor w/ 2 levels "wt","mut": 1 1 1 1 1
1 1 1
     1 1 ...
     > ## |  $ fu        : Factor w/ 2 levels "BL","FU": 1 1 1 1 1
1 1 1
     1 1 ...
     > ## |  $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2
2 2 2
     2 2 2
     > 1 2 ...
     > ## `----
     >
     > ## there are 265 patients
     > ## in 2 groups: "wt" and "mut"
     > ## with a dichotomous risk factor ("risk" and "norisk")
     > ## measured at two time points ("BL" and "FU")
     >
     > dat %>% summary
     > ## ,----
     > ## |     patient    group      fu       riskfactor
     > ## |  p001   :  2   wt :209   BL:258   risk  :205
     > ## |  p002   :  2   mut:266   FU:217   norisk:270
     > ## |  p003   :  2
     > ## |  p004   :  2
     > ## |  p005   :  2
     > ## |  p006   :  2
     > ## |  (Other):463
     > ## `----
     >
     > ## group sizes seem fine
     >
     >
     >
     > ## ---------------------------------------------- ##
     > ## first, we look at the first time point, the BL ##
     > ## ---------------------------------------------- ##
     >
     > ## we build a cross table
     > tab_bl <-
     >   dat %>%
     >   dplyr::select(group, riskfactor) %>%
     >   table
     > tab_bl
     > ## ,----
     > ## |      riskfactor
     > ## | group risk norisk
     > ## |   wt    35    174
     > ## |   mut  170     96
     > ## `----
     >
     > tab_bl %>% fisher.test
     > ## ,----
     > ## |    Fisher's Exact Test for Count Data
     > ## |
     > ## | data:  .
     > ## | p-value < 2.2e-16
     > ## | alternative hypothesis: true odds ratio is not equal to 1
     > ## |  0.07099792 0.18002325
     > ## | odds ratio
     > ## |  0.1141677
     > ## `----
     > log(0.114)
     > ## ,----
     > ## | [1] -2.171557
     > ## `----
     >
     > ## so, we get a highly significant association of the riskfactor
     > ## and the group with an log(odds ratio) of -2.2
     >
     > dat %>%
     >   glm(group ~ riskfactor, family = "binomial", data = .) %>%
     >   summary
     > ## ,----
     > ## |
     > ## | glm(formula = group ~ riskfactor, family = "binomial",
data = .)
     > ## |
     > ## |     Min       1Q   Median       3Q      Max
     > ## | -1.8802  -0.9374   0.6119   0.6119   1.4381
     > ## |
     > ## |                  Estimate Std. Error z value Pr(>|z|)
     > ## | (Intercept)        1.5805     0.1856   8.515   <2e-16 ***
     > ## | riskfactornorisk  -2.1752     0.2250  -9.668   <2e-16 ***
     > ## | ---
     > ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’
0.1 ‘ ’ 1
     > ## |
     > ## | (Dispersion parameter for binomial family taken to be 1)
     > ## |
     > ## |     Null deviance: 651.63  on 474  degrees of freedom
     > ## | Residual deviance: 538.83  on 473  degrees of freedom
     > ## | AIC: 542.83
     > ## |
     > ## | Number of Fisher Scoring iterations: 4
     > ## `----
     >
     >
     >
     > ## ------------------------------------------------- ##
     > ## Now, we analyse both time points with interaction ##
     > ## ------------------------------------------------- ##
     >
     > dat %>%
     >   glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient),
     family =
     > "binomial", data = .) %>%
     >   summary
     > ## ,----
     > ## | Generalized linear mixed model fit by maximum
likelihood (Laplace
     > ## |   Approximation) [glmerMod]
     > ## |  Family: binomial  ( logit )
     > ## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 |
patient)
     > ## |    Data: .
     > ## |
     > ## |      AIC      BIC   logLik deviance df.resid
     > ## |    345.2    366.0   -167.6    335.2      470
     > ## |
     > ## |       Min        1Q    Median        3Q       Max
     > ## | -0.095863 -0.058669  0.002278  0.002866  0.007324
     > ## |
     > ## |  Groups  Name        Variance Std.Dev.
     > ## |  patient (Intercept) 1849     43
     > ## | Number of obs: 475, groups:  patient, 265
     > ## |
     > ## |                       Estimate Std. Error z value Pr(>|z|)
     > ## | (Intercept)            11.6846     1.3736   8.507 
 <2e-16 ***
     > ## | riskfactornorisk       -1.5919     1.4166  -1.124    0.261
     > ## | fuFU                    0.4596     1.9165   0.240    0.810
     > ## | riskfactornorisk:fuFU  -0.8183     2.1651  -0.378    0.705
     > ## | ---
     > ## | Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’
0.1 ‘ ’ 1
     > ## |
     > ## |             (Intr) rskfct fuFU
     > ## | rskfctrnrsk -0.746
     > ## | fuFU        -0.513  0.510
     > ## | rskfctrn:FU  0.478 -0.576 -0.908
     > ## `----
     >
     > ## I get huge variation in the random effects
     > ##
     > ## And the risk factor at BL gets an estimated log(odds
ratio) of -1.6
     > ## but one which is not significant
     > ---------- cut here --------------------------------------------
     > _______________________________________________
     > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
     >
     --
     Dr. Andreas Leha
     Head of the 'Core Facility
     Medical Biometry and Statistical Bioinformatics'
     UNIVERSITY MEDICAL CENTER GÖTTINGEN
     GEORG-AUGUST-UNIVERSITÄT
     Department of Medical Statistics
     Humboldtallee 32
     37073 Göttingen
     Mailing Address: 37099 Göttingen, Germany
     Fax: +49 (0) 551 39-4995
     Tel: +49 (0) 551 39-4987
     http://www.ams.med.uni-goettingen.de/service-de.shtml
     _______________________________________________
     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Dr. Andreas Leha
Head of the 'Core Facility
Medical Biometry and Statistical Bioinformatics'
UNIVERSITY MEDICAL CENTER GÖTTINGEN
GEORG-AUGUST-UNIVERSITÄT
Department of Medical Statistics
Humboldtallee 32
37073 Göttingen
Mailing Address: 37099 Göttingen, Germany
Fax: +49 (0) 551 39-4995
Tel: +49 (0) 551 39-4987
http://www.ams.med.uni-goettingen.de/service-de.shtml
--
Dr. Andreas Leha
Head of the 'Core Facility
Medical Biometry and Statistical Bioinformatics'

UNIVERSITY MEDICAL CENTER GÖTTINGEN
GEORG-AUGUST-UNIVERSITÄT
Department of Medical Statistics
Humboldtallee 32
37073 Göttingen
Mailing Address: 37099 Göttingen, Germany
Fax: +49 (0) 551 39-4995
Tel: +49 (0) 551 39-4987
http://www.ams.med.uni-goettingen.de/service-de.shtml

Leha, Andreas
2018-11-25 04:23:33 UTC
Permalink
Hi all,

I am interested in assessing the association of a (potential) risk
factor to a (binary) grouping.

I am having trouble with diverging results from modeling one time point
(without random effect) and modeling two time points (with random effect).

When analysing the first time point (base line, BL) only, I get a highly
significant association.
Now, I want to see, whether there is an interaction between time and
risk factor (the risk factor is not constant). But when analysing both
time points, the estimated effect at BL is estimated to be not significant.

Now my simplified questions are:
(1) Is there an association at BL or not?
(2) How should I analyse both time points with this data?

The aim is to look for confounding with other factors. But I'd like to
understand the simple models before moving on.

Below you find a reproducible example and the detailed results.

Any suggestions would be highly appreciated!

Regards,
Andreas



PS: The code / results

---------- cut here --------------------------------------------
library("dplyr")
library("lme4")
library("lmerTest")
## install_github("hrbrmstr/pastebin", upgrade_dependencies = FALSE)
library("pastebin")

## ---------------------------------- ##
## load the data ##
## ---------------------------------- ##
dat <- pastebin::get_paste("Xgwgtb7j") %>%
as.character %>%
gsub("\r\n", "", .) %>%
parse(text = .) %>%
eval



## ---------------------------------- ##
## have a look ##
## ---------------------------------- ##
dat
## ,----
## | # A tibble: 475 x 4
## | patient group fu riskfactor
## | <fct> <fct> <fct> <fct>
## | 1 p001 wt BL norisk
## | 2 p002 wt BL norisk
## | 3 p003 wt BL norisk
## | 4 p004 wt BL norisk
## | 5 p005 wt BL norisk
## | 6 p006 wt BL norisk
## | 7 p007 wt BL norisk
## | 8 p008 wt BL norisk
## | 9 p009 wt BL risk
## | 10 p010 wt BL norisk
## | # ... with 465 more rows
## `----
dat %>% str
## ,----
## | Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 475 obs. of 4 variables:
## | $ patient : Factor w/ 265 levels "p001","p002",..: 1 2 3 4 5 6 7
8 9 10 ...
## | $ group : Factor w/ 2 levels "wt","mut": 1 1 1 1 1 1 1 1 1 1 ...
## | $ fu : Factor w/ 2 levels "BL","FU": 1 1 1 1 1 1 1 1 1 1 ...
## | $ riskfactor: Factor w/ 2 levels "risk","norisk": 2 2 2 2 2 2 2 2
1 2 ...
## `----

## there are 265 patients
## in 2 groups: "wt" and "mut"
## with a dichotomous risk factor ("risk" and "norisk")
## measured at two time points ("BL" and "FU")

dat %>% summary
## ,----
## | patient group fu riskfactor
## | p001 : 2 wt :209 BL:258 risk :205
## | p002 : 2 mut:266 FU:217 norisk:270
## | p003 : 2
## | p004 : 2
## | p005 : 2
## | p006 : 2
## | (Other):463
## `----

## group sizes seem fine



## ---------------------------------------------- ##
## first, we look at the first time point, the BL ##
## ---------------------------------------------- ##

## we build a cross table
tab_bl <-
dat %>%
dplyr::select(group, riskfactor) %>%
table
tab_bl
## ,----
## | riskfactor
## | group risk norisk
## | wt 35 174
## | mut 170 96
## `----

## and we test using fisher:
tab_bl %>% fisher.test
## ,----
## | Fisher's Exact Test for Count Data
## |
## | data: .
## | p-value < 2.2e-16
## | alternative hypothesis: true odds ratio is not equal to 1
## | 95 percent confidence interval:
## | 0.07099792 0.18002325
## | sample estimates:
## | odds ratio
## | 0.1141677
## `----
log(0.114)
## ,----
## | [1] -2.171557
## `----

## so, we get a highly significant association of the riskfactor
## and the group with an log(odds ratio) of -2.2

## we get the same result using logistic regression:
dat %>%
glm(group ~ riskfactor, family = "binomial", data = .) %>%
summary
## ,----
## |
## | Call:
## | glm(formula = group ~ riskfactor, family = "binomial", data = .)
## |
## | Deviance Residuals:
## | Min 1Q Median 3Q Max
## | -1.8802 -0.9374 0.6119 0.6119 1.4381
## |
## | Coefficients:
## | Estimate Std. Error z value Pr(>|z|)
## | (Intercept) 1.5805 0.1856 8.515 <2e-16 ***
## | riskfactornorisk -2.1752 0.2250 -9.668 <2e-16 ***
## | ---
## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | (Dispersion parameter for binomial family taken to be 1)
## |
## | Null deviance: 651.63 on 474 degrees of freedom
## | Residual deviance: 538.83 on 473 degrees of freedom
## | AIC: 542.83
## |
## | Number of Fisher Scoring iterations: 4
## `----



## ------------------------------------------------- ##
## Now, we analyse both time points with interaction ##
## ------------------------------------------------- ##

dat %>%
glmer(group ~ riskfactor + fu + riskfactor:fu + (1|patient), family =
"binomial", data = .) %>%
summary
## ,----
## | Generalized linear mixed model fit by maximum likelihood (Laplace
## | Approximation) [glmerMod]
## | Family: binomial ( logit )
## | Formula: group ~ riskfactor + fu + riskfactor:fu + (1 | patient)
## | Data: .
## |
## | AIC BIC logLik deviance df.resid
## | 345.2 366.0 -167.6 335.2 470
## |
## | Scaled residuals:
## | Min 1Q Median 3Q Max
## | -0.095863 -0.058669 0.002278 0.002866 0.007324
## |
## | Random effects:
## | Groups Name Variance Std.Dev.
## | patient (Intercept) 1849 43
## | Number of obs: 475, groups: patient, 265
## |
## | Fixed effects:
## | Estimate Std. Error z value Pr(>|z|)
## | (Intercept) 11.6846 1.3736 8.507 <2e-16 ***
## | riskfactornorisk -1.5919 1.4166 -1.124 0.261
## | fuFU 0.4596 1.9165 0.240 0.810
## | riskfactornorisk:fuFU -0.8183 2.1651 -0.378 0.705
## | ---
## | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## |
## | Correlation of Fixed Effects:
## | (Intr) rskfct fuFU
## | rskfctrnrsk -0.746
## | fuFU -0.513 0.510
## | rskfctrn:FU 0.478 -0.576 -0.908
## `----

## I get huge variation in the random effects
##
## And the risk factor at BL gets an estimated log(odds ratio) of -1.6
## but one which is not significant
---------- cut here --------------------------------------------
Loading...