Nicolas Flaibani
2018-04-27 20:33:28 UTC
Hi everyone!
I would like to join the discussion of the second topic (Specifying Multiple Random Effects in NLME by Dan) but in my case Im going to talk about the lme4 package, instead of the nmle. I think that the question is still relevant.
Ive had a doubt for a long time about how to investigate the interactions between random and fixed effects. Ive read a lot of forums, papers and helps packages and Ive always concluded that the correct form of testing the interaction between a random and a fixed variable was:
Model1 <- lmer (Y ~ X1 + X2 + (X1 | Random Variable)
However, I found in some forums and personal communications from several researchers that there is another way to investigate the interaction between random and fixed variables and has the following syntax:
Model2 <- lmer (Y ~ X1 + X2 + (1 | Random Variable: X1)
I understand that this syntax
(1|Random Variable/X1) = (1|Random Variable)+(1|Random Variable:X1)
specify a nested structure between the variables and this is not the case of interest.
My particular question is whether the syntax of the Model 2 is correct to test interactions between random and fixed variables. If this model is correct, which are the differences with the syntax of Model 1, since the resulting models are clearly different? Besides, in coincidence with the question of Dan (Are there cases where one vs. the other formulation should absolutely be used? My understanding that for continuous variables, e.g., multiple measurements across multiple days, Days|Object would be the correct syntax. But here we're talking about a factor variable), I ask if one type of syntax should be used if the fixed variables are continuous or there are factors.
If I compare the summary from a model with the latter syntax (model 2), with the summary of the same analysis made with a statistic program (like Statistica), the results are very similar. Thats not the case with the model 1.
For example, if I analyze a morphological trait with the syntax
M2 <- lmer (Wing ~ Temperature * Sex + Temperature + Sex + (1 | Line) + (1 | Line:Sex:Temperature) + (1 | Line:Sex) + (1 | Line:Temperature))
the summary is the following:
Random effects:
Groups Name Variance Std.Dev.
Line:Sex:Temperature (Intercept) 14.6231 3.8240
Line:Temperature (Intercept) 154.7685 12.4406
Line:Sex (Intercept) 0.6947 0.8335
Line (Intercept) 72.5945 8.5202
Residual 180.0664 13.4189
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 501.141 2.268 96.940 221.009 < 2e-16 ***
Temperature25 -57.960 2.699 54.800 -21.473 < 2e-16 ***
SexM -53.639 1.001 96.260 -53.584 < 2e-16 ***
Temperature25:SexM -6.488 1.391 48.300 -4.663 2.49e-05 ***
I found that the function rand() from the lmerTest package gives me the p values of the random effects if I write the model like this:
> rand(M2)
Analysis of Random effects Table:
Chi.sq Chi.DF p.value
Line 4.6152 1 0.03 *
Line:Sex:Temperature 30.8130 1 3e-08 ***
Line:Sex 0.0391 1 0.84
Line:Temperature 112.1539 1 <2e-16 ***
I dont know if this function is reliable because it is not mentioned for testing the significance of the random effects in the page https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html
The summary of the same analysis made with the Statistica is:
Effect
SS
Degr. of
MS
Den.Syn.
Den.Syn.
F
p
Intercept
Fixed
764579151
1
764579151
48,001
12655,91
60412,83
0,000000
Line
Random
608181
48
12670
47,800
6489,64
1,95
0,011254
Sex
Fixed
3138661
1
3138661
48,038
495,62
6332,81
0,000000
Temperature
Fixed
3686660
1
3686660
48,003
6459,62
570,72
0,000000
Line*Sex
Random
23808
48
496
48,000
473,30
1,05
0,435866
Line*Temperature
Random
310413
48
6467
48,000
473,30
13,66
0,000000
Sex*Temperature
Fixed
10075
1
10075
48,040
472,94
21,30
0,000029
Line*Sex*Temperature
Random
22718
48
473
3696,000
167,33
2,83
0,000000
Error
618467
3696
167
But if I write the model with the other syntax:
M1 <- lmer(Wing ~ Temperature * Sex + (Temperature * Sex | Line))
the summary is the following:
REML criterion at convergence: 31440.9
Random effects:
Groups Name Variance Std.Dev. Corr
Line (Intercept) 266.78 16.333
Temperature25 398.27 19.957 -0.60
SexM 41.54 6.446 -0.56 0.46
Temperature25:SexM 61.34 7.832 0.56 -0.61 -0.80
Residual 167.33 12.936
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 501.603 2.371 48.046 211.586 < 2e-16 ***
Temperature25 -58.423 2.911 48.027 -20.070 < 2e-16 ***
SexM -53.659 1.095 47.964 -49.023 < 2e-16 ***
Temperature 25:SexM -6.470 1.393 48.278 -4.644 2.66e-05 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
In addition, if I apply the rand functionŽ for this syntax (M1), it doesnt retourn the whole p-values of the random effects (Do not give me p value for line and line*Temperature*Sex)
Analysis of Random effects Table:
Chi.sq Chi.DF p.value
Temperatura:Línea 0.00e+00 0 1
Sexo:Línea 1.46e-10 0 <2e-16 ***
I really appreciate your time and dedication for answering this questions. Thank you for trying to help us understand a little more about the syntax of these complex models and thus better understand their correct approach.
Thank you very much for your time everyone.
Greetings,
Nicolas
----------------------------------------------------------------------------------------
Message: 2
Date: Wed, 25 Apr 2018 16:11:38 -0400
From: Dan <***@gmail.com>
To: "R-SIG-Mixed-***@R-project.org"
<R-sig-mixed-***@r-project.org>, Ben Bolker <***@gmail.com>
Subject: [R-sig-ME] Specifying Multiple Random Effects in NLME
Message-ID:
<CAET4i1f-SH5zA6xcCxNXc091HCk+snMv+***@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi all:
I am curating an off-list thread about specifying multiple random effects
in NLME.
1. If it's (1|Object) + (1|Object:Coating) that you want then
you should be able to use a nested specification (which nlme *can*
handle relatively easily), i.e. something like
random=a+b+c~1|Object/Coating
Although (Coating|Object) and (1|Object:Coating) both in some sense
represent "interactions" the latter is *much* simpler/more parsimonious.
If you're OK with 1|Object:Coating rather than Coating|Object it
should be *much* faster. If you don't understand the distinction (which
would be absolutely fine and understandable) can we resume the
discussion on r-sig-mixed-models ... ?
-----------
So:
random=a+b+c~Coating|Object
does not fit.
But:
random=a+b+c~Object/Coating
fits.
Can you better explain the distinction here? I have sometimes used the
1|Object:Coating + 1|Object syntax and sometimes the Coating|Object syntax
in other models. My experience/understanding is that the former syntax with
multiple "within subject" variables produces exactly matching output to the
standard "repeated measures ANOVA" with the lmer assumption of compound
symmetry.
Are there cases where one vs. the other formulation should absolutely be
used? My understanding that for continuous variables, e.g., multiple
measurements across multiple days, Days|Object would be the correct syntax.
But here we're talking about a factor variable.
2. I'm trying to read the "random" section for nlme right now but it's
kind of making my head explode (and I think there's a typo: " the same
as the order of the order of the elements in the list"). It *sounds*
like (1) explicitly creating an interaction
ObjCoating=interaction(Object,Coating) and (2) using something like
list(ObjCoating=a~1,Object=b~1,Object=c~1)
should work (grouping factors as names, then [right-hand-side variable
name]~[random effects model], but I'm worried about the phrase "The
order of nesting will be assumed the same as the order of the elements
in the list": what nesting?
-----------
I think that formulation is explicitly in order. I replaced your first
ObjCoating with simply Object, just to test what would happen:
Random effects:
Formula: a ~ 1 | Object
a.(Intercept)
StdDev: 1.305816
Formula: b ~ 1 | Object %in% Object
b.(Intercept)
StdDev: 0.01576521
Formula: c ~ 1 | Object %in% Object %in% Object
c.(Intercept) Residual
StdDev: 2.677883 2.219676
[[alternative HTML version deleted]]
*****
[[alternative HTML version deleted]]
I would like to join the discussion of the second topic (Specifying Multiple Random Effects in NLME by Dan) but in my case Im going to talk about the lme4 package, instead of the nmle. I think that the question is still relevant.
Ive had a doubt for a long time about how to investigate the interactions between random and fixed effects. Ive read a lot of forums, papers and helps packages and Ive always concluded that the correct form of testing the interaction between a random and a fixed variable was:
Model1 <- lmer (Y ~ X1 + X2 + (X1 | Random Variable)
However, I found in some forums and personal communications from several researchers that there is another way to investigate the interaction between random and fixed variables and has the following syntax:
Model2 <- lmer (Y ~ X1 + X2 + (1 | Random Variable: X1)
I understand that this syntax
(1|Random Variable/X1) = (1|Random Variable)+(1|Random Variable:X1)
specify a nested structure between the variables and this is not the case of interest.
My particular question is whether the syntax of the Model 2 is correct to test interactions between random and fixed variables. If this model is correct, which are the differences with the syntax of Model 1, since the resulting models are clearly different? Besides, in coincidence with the question of Dan (Are there cases where one vs. the other formulation should absolutely be used? My understanding that for continuous variables, e.g., multiple measurements across multiple days, Days|Object would be the correct syntax. But here we're talking about a factor variable), I ask if one type of syntax should be used if the fixed variables are continuous or there are factors.
If I compare the summary from a model with the latter syntax (model 2), with the summary of the same analysis made with a statistic program (like Statistica), the results are very similar. Thats not the case with the model 1.
For example, if I analyze a morphological trait with the syntax
M2 <- lmer (Wing ~ Temperature * Sex + Temperature + Sex + (1 | Line) + (1 | Line:Sex:Temperature) + (1 | Line:Sex) + (1 | Line:Temperature))
the summary is the following:
Random effects:
Groups Name Variance Std.Dev.
Line:Sex:Temperature (Intercept) 14.6231 3.8240
Line:Temperature (Intercept) 154.7685 12.4406
Line:Sex (Intercept) 0.6947 0.8335
Line (Intercept) 72.5945 8.5202
Residual 180.0664 13.4189
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 501.141 2.268 96.940 221.009 < 2e-16 ***
Temperature25 -57.960 2.699 54.800 -21.473 < 2e-16 ***
SexM -53.639 1.001 96.260 -53.584 < 2e-16 ***
Temperature25:SexM -6.488 1.391 48.300 -4.663 2.49e-05 ***
I found that the function rand() from the lmerTest package gives me the p values of the random effects if I write the model like this:
> rand(M2)
Analysis of Random effects Table:
Chi.sq Chi.DF p.value
Line 4.6152 1 0.03 *
Line:Sex:Temperature 30.8130 1 3e-08 ***
Line:Sex 0.0391 1 0.84
Line:Temperature 112.1539 1 <2e-16 ***
I dont know if this function is reliable because it is not mentioned for testing the significance of the random effects in the page https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html
The summary of the same analysis made with the Statistica is:
Effect
SS
Degr. of
MS
Den.Syn.
Den.Syn.
F
p
Intercept
Fixed
764579151
1
764579151
48,001
12655,91
60412,83
0,000000
Line
Random
608181
48
12670
47,800
6489,64
1,95
0,011254
Sex
Fixed
3138661
1
3138661
48,038
495,62
6332,81
0,000000
Temperature
Fixed
3686660
1
3686660
48,003
6459,62
570,72
0,000000
Line*Sex
Random
23808
48
496
48,000
473,30
1,05
0,435866
Line*Temperature
Random
310413
48
6467
48,000
473,30
13,66
0,000000
Sex*Temperature
Fixed
10075
1
10075
48,040
472,94
21,30
0,000029
Line*Sex*Temperature
Random
22718
48
473
3696,000
167,33
2,83
0,000000
Error
618467
3696
167
But if I write the model with the other syntax:
M1 <- lmer(Wing ~ Temperature * Sex + (Temperature * Sex | Line))
the summary is the following:
REML criterion at convergence: 31440.9
Random effects:
Groups Name Variance Std.Dev. Corr
Line (Intercept) 266.78 16.333
Temperature25 398.27 19.957 -0.60
SexM 41.54 6.446 -0.56 0.46
Temperature25:SexM 61.34 7.832 0.56 -0.61 -0.80
Residual 167.33 12.936
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 501.603 2.371 48.046 211.586 < 2e-16 ***
Temperature25 -58.423 2.911 48.027 -20.070 < 2e-16 ***
SexM -53.659 1.095 47.964 -49.023 < 2e-16 ***
Temperature 25:SexM -6.470 1.393 48.278 -4.644 2.66e-05 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
In addition, if I apply the rand functionŽ for this syntax (M1), it doesnt retourn the whole p-values of the random effects (Do not give me p value for line and line*Temperature*Sex)
Analysis of Random effects Table:
Chi.sq Chi.DF p.value
Temperatura:Línea 0.00e+00 0 1
Sexo:Línea 1.46e-10 0 <2e-16 ***
I really appreciate your time and dedication for answering this questions. Thank you for trying to help us understand a little more about the syntax of these complex models and thus better understand their correct approach.
Thank you very much for your time everyone.
Greetings,
Nicolas
----------------------------------------------------------------------------------------
Message: 2
Date: Wed, 25 Apr 2018 16:11:38 -0400
From: Dan <***@gmail.com>
To: "R-SIG-Mixed-***@R-project.org"
<R-sig-mixed-***@r-project.org>, Ben Bolker <***@gmail.com>
Subject: [R-sig-ME] Specifying Multiple Random Effects in NLME
Message-ID:
<CAET4i1f-SH5zA6xcCxNXc091HCk+snMv+***@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi all:
I am curating an off-list thread about specifying multiple random effects
in NLME.
1. If it's (1|Object) + (1|Object:Coating) that you want then
you should be able to use a nested specification (which nlme *can*
handle relatively easily), i.e. something like
random=a+b+c~1|Object/Coating
Although (Coating|Object) and (1|Object:Coating) both in some sense
represent "interactions" the latter is *much* simpler/more parsimonious.
If you're OK with 1|Object:Coating rather than Coating|Object it
should be *much* faster. If you don't understand the distinction (which
would be absolutely fine and understandable) can we resume the
discussion on r-sig-mixed-models ... ?
-----------
So:
random=a+b+c~Coating|Object
does not fit.
But:
random=a+b+c~Object/Coating
fits.
Can you better explain the distinction here? I have sometimes used the
1|Object:Coating + 1|Object syntax and sometimes the Coating|Object syntax
in other models. My experience/understanding is that the former syntax with
multiple "within subject" variables produces exactly matching output to the
standard "repeated measures ANOVA" with the lmer assumption of compound
symmetry.
Are there cases where one vs. the other formulation should absolutely be
used? My understanding that for continuous variables, e.g., multiple
measurements across multiple days, Days|Object would be the correct syntax.
But here we're talking about a factor variable.
2. I'm trying to read the "random" section for nlme right now but it's
kind of making my head explode (and I think there's a typo: " the same
as the order of the order of the elements in the list"). It *sounds*
like (1) explicitly creating an interaction
ObjCoating=interaction(Object,Coating) and (2) using something like
list(ObjCoating=a~1,Object=b~1,Object=c~1)
should work (grouping factors as names, then [right-hand-side variable
name]~[random effects model], but I'm worried about the phrase "The
order of nesting will be assumed the same as the order of the elements
in the list": what nesting?
-----------
I think that formulation is explicitly in order. I replaced your first
ObjCoating with simply Object, just to test what would happen:
Random effects:
Formula: a ~ 1 | Object
a.(Intercept)
StdDev: 1.305816
Formula: b ~ 1 | Object %in% Object
b.(Intercept)
StdDev: 0.01576521
Formula: c ~ 1 | Object %in% Object %in% Object
c.(Intercept) Residual
StdDev: 2.677883 2.219676
[[alternative HTML version deleted]]
*****
[[alternative HTML version deleted]]