Smith, Desmond
2018-07-09 00:03:05 UTC
Dear All,
I have previously tested a mixed model using lmer. However, since the dependent variable is count data, which is over-dispersed, it seems that I should use a negative binomial generalized linear mixed model (GLMM), such as glmmTMB.
The full data frame is:
dim(example)
[1] 115 7
Here are the first six rows of the full 115 row data frame:
head(example)
fixed1 fixed2 random nested_random counts id log_total_counts
1 0 0 1 1 643 1 12.89582
1001 0 0 2 6 585 1 13.67509
2001 0 0 3 11 846 1 13.94209
3001 0 0 4 16 755 1 13.93056
4001 0 0 5 21 1428 1 13.65672
5001 0 0 6 26 1566 1 13.64421
str(example)
'data.frame': 115 obs. of 7 variables:
$ fixed1 : num 0 0 0 0 0 0 1 1 1 1 ...
$ fixed2 : num 0 0 0 0 0 0 0 0 0 0 ...
$ random : num 1 2 3 4 5 6 1 2 3 4 ...
$ nested_random : num 1 6 11 16 21 26 2 7 12 17 ...
$ counts : int 643 585 846 755 1428 1566 309 719 1542 1446 ...
$ id : int 1 1 1 1 1 1 1 1 1 1 ...
$ log_total_counts: num 12.9 13.7 13.9 13.9 13.7 ...
There are six values of fixed1, plus a zero:
unique(example$fixed1)
[1] 0 1 2 3 4 6
The mean of the six values of fixed1 is
mean(c(1,2,3,4,6))
[1] 3.2
There are four values of fixed2, including zero:
unique(example$fixed2)
[1] 0 25 75 8
The mean of the four values of fixed1 is
mean(unique(example$fixed2))
[1] 27
Here is the lmer model I used:
library(lme4)
m1 <- lmer(counts ~ fixed1*fixed2 + (1|random/nested_random) + offset(log_total_counts), data = example, verbose=FALSE)
Followed by the use of glht for post-hoc analysis:
library(multcomp)
glht_fixed1 <- glht(m1, linfct = c(
"fixed1 == 0",
"fixed1 + 8*fixed1:fixed2 == 0",
"fixed1 + 25*fixed1:fixed2 == 0",
"fixed1 + 75*fixed1:fixed2 == 0",
"fixed1 + (27)*fixed1:fixed2 == 0"))
glht_fixed2 <- glht(m1, linfct = c(
"fixed2 + 1*fixed1:fixed2 == 0",
"fixed2 + 2*fixed1:fixed2 == 0",
"fixed2 + 3*fixed1:fixed2 == 0",
"fixed2 + 4*fixed1:fixed2 == 0",
"fixed2 + 6*fixed1:fixed2 == 0",
"fixed2 + (3.2)*fixed1:fixed2 == 0"))
glht_omni <- glht(m1)
Here is the corresponding negative binomial glmmTMB model:
library(glmmTMB)
m2 <- glmmTMB(counts ~ fixed1*fixed2 + (1|random/nested_random) + offset(log_total_counts), data = example, verbose=FALSE, family="nbinom2")
According to this suggestion by Ben Bolker (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2017q3/025813.html), the best approach to post hoc testing with glmmTMB is to use lsmeans (or its more recent equivalent, emmeans).
After running
source(system.file("other_methods","lsmeans_methods.R",package="glmmTMB"))
I can use lsmeans on the glmmTMB object. For example,
as.glht(emmeans(m1,~(week + 27*week:conc)))
General Linear Hypotheses
Linear Hypotheses:
Estimate
3.11304347826087, 21 == 0 -8.813
But this does not seem correct. What I cannot figure out, for the life of me, is how to correctly carry over the use of linfct to the lsmeans scenario on the glmmTMB. I have read all the manuals and vignettes until I am blue in face (or it feels that way, at least), but I am still at a loss. In my defense (culpability?) I am a statistical tyro, so many apologies if I am asking a question with very obvious answers here.
The glht software and post hoc testing works easily with the glmmADMB package, but the glmmADMB software is 10x slower than glmmTMB. I need to perform multiple runs of this analysis, each with 300,000 examples of the mixed model, so speed is essential.
Many thanks for your suggestions or help!
Cheers,
Des
.
________________________________
UCLA HEALTH SCIENCES IMPORTANT WARNING: This email (and any attachments) is only intended for the use of the person or entity to which it is addressed, and may contain information that is privileged and confidential. You, the recipient, are obligated to maintain it in a safe, secure and confidential manner. Unauthorized redisclosure or failure to maintain confidentiality may subject you to federal and state penalties. If you are not the intended recipient, please immediately notify us by return email, and delete this message from your computer.
[[alternative HTML version deleted]]
I have previously tested a mixed model using lmer. However, since the dependent variable is count data, which is over-dispersed, it seems that I should use a negative binomial generalized linear mixed model (GLMM), such as glmmTMB.
The full data frame is:
dim(example)
[1] 115 7
Here are the first six rows of the full 115 row data frame:
head(example)
fixed1 fixed2 random nested_random counts id log_total_counts
1 0 0 1 1 643 1 12.89582
1001 0 0 2 6 585 1 13.67509
2001 0 0 3 11 846 1 13.94209
3001 0 0 4 16 755 1 13.93056
4001 0 0 5 21 1428 1 13.65672
5001 0 0 6 26 1566 1 13.64421
str(example)
'data.frame': 115 obs. of 7 variables:
$ fixed1 : num 0 0 0 0 0 0 1 1 1 1 ...
$ fixed2 : num 0 0 0 0 0 0 0 0 0 0 ...
$ random : num 1 2 3 4 5 6 1 2 3 4 ...
$ nested_random : num 1 6 11 16 21 26 2 7 12 17 ...
$ counts : int 643 585 846 755 1428 1566 309 719 1542 1446 ...
$ id : int 1 1 1 1 1 1 1 1 1 1 ...
$ log_total_counts: num 12.9 13.7 13.9 13.9 13.7 ...
There are six values of fixed1, plus a zero:
unique(example$fixed1)
[1] 0 1 2 3 4 6
The mean of the six values of fixed1 is
mean(c(1,2,3,4,6))
[1] 3.2
There are four values of fixed2, including zero:
unique(example$fixed2)
[1] 0 25 75 8
The mean of the four values of fixed1 is
mean(unique(example$fixed2))
[1] 27
Here is the lmer model I used:
library(lme4)
m1 <- lmer(counts ~ fixed1*fixed2 + (1|random/nested_random) + offset(log_total_counts), data = example, verbose=FALSE)
Followed by the use of glht for post-hoc analysis:
library(multcomp)
glht_fixed1 <- glht(m1, linfct = c(
"fixed1 == 0",
"fixed1 + 8*fixed1:fixed2 == 0",
"fixed1 + 25*fixed1:fixed2 == 0",
"fixed1 + 75*fixed1:fixed2 == 0",
"fixed1 + (27)*fixed1:fixed2 == 0"))
glht_fixed2 <- glht(m1, linfct = c(
"fixed2 + 1*fixed1:fixed2 == 0",
"fixed2 + 2*fixed1:fixed2 == 0",
"fixed2 + 3*fixed1:fixed2 == 0",
"fixed2 + 4*fixed1:fixed2 == 0",
"fixed2 + 6*fixed1:fixed2 == 0",
"fixed2 + (3.2)*fixed1:fixed2 == 0"))
glht_omni <- glht(m1)
Here is the corresponding negative binomial glmmTMB model:
library(glmmTMB)
m2 <- glmmTMB(counts ~ fixed1*fixed2 + (1|random/nested_random) + offset(log_total_counts), data = example, verbose=FALSE, family="nbinom2")
According to this suggestion by Ben Bolker (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2017q3/025813.html), the best approach to post hoc testing with glmmTMB is to use lsmeans (or its more recent equivalent, emmeans).
After running
source(system.file("other_methods","lsmeans_methods.R",package="glmmTMB"))
I can use lsmeans on the glmmTMB object. For example,
as.glht(emmeans(m1,~(week + 27*week:conc)))
General Linear Hypotheses
Linear Hypotheses:
Estimate
3.11304347826087, 21 == 0 -8.813
But this does not seem correct. What I cannot figure out, for the life of me, is how to correctly carry over the use of linfct to the lsmeans scenario on the glmmTMB. I have read all the manuals and vignettes until I am blue in face (or it feels that way, at least), but I am still at a loss. In my defense (culpability?) I am a statistical tyro, so many apologies if I am asking a question with very obvious answers here.
The glht software and post hoc testing works easily with the glmmADMB package, but the glmmADMB software is 10x slower than glmmTMB. I need to perform multiple runs of this analysis, each with 300,000 examples of the mixed model, so speed is essential.
Many thanks for your suggestions or help!
Cheers,
Des
.
________________________________
UCLA HEALTH SCIENCES IMPORTANT WARNING: This email (and any attachments) is only intended for the use of the person or entity to which it is addressed, and may contain information that is privileged and confidential. You, the recipient, are obligated to maintain it in a safe, secure and confidential manner. Unauthorized redisclosure or failure to maintain confidentiality may subject you to federal and state penalties. If you are not the intended recipient, please immediately notify us by return email, and delete this message from your computer.
[[alternative HTML version deleted]]