[R-sig-ME] Removing random intercepts before random slopes
Maarten Jung
2018-08-29 09:21:33 UTC
Dear list,

Does it make sense to remove random intercepts before removing
random slopes (for the same grouping factor)?

Barr et al. (2013, [1]) suggest that a model "missing within-unit
random intercepts is preferable to one missing the critical random
slopes" (p. 276).
However, I wonder whether this procedure makes sense from a
conceptual perspective and whether it is reconcilable with the
principle of marginality.

And, is there any difference between LMMs with categorical and LMMs
with continuous predictors regarding this?

Best regards,
Maarten

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3881361/
Phillip Alday
2018-08-29 10:41:10 UTC
Post by Maarten Jung
And, is there any difference between LMMs with categorical and LMMs
with continuous predictors regarding this?
Absolutely! Consider the trivial case of only one categorical predictor
with dummy coding and no continuous predictors in a fixed-effect model.

Then ~ 0 + cat.pred and ~ 1 + cat.pred produce identical models in some
sense, but in the former each level of the predictor is estimated as an
"absolute" value, while in the latter one level is absorbed into the
intercept and estimated as an "absolute" value, while the other levels
are coded as offsets from that value.

For a really interesting example, try this:

data(Oats, package = "nlme")
summary(lm(yield ~ 1 + Variety, Oats))
summary(lm(yield ~ 0 + Variety, Oats))

Note that the residual error is identical, but all of the summary
statistics -- R^2, F -- are different.

Best,
Phillip
Maarten Jung
2018-08-29 12:07:47 UTC
Sorry, I just realized that I didn't make clear what I was talking about.
I know that ~ 0 + cat.pred and ~ 1 + cat.pred in the fixed-effects part
are just reparameterizations of the same model.
Since I'm working with afex::lmer_alt(), which converts categorical
predictors to numeric covariates (via model.matrix()) by default, I was
talking about removing random intercepts before removing random slopes in
such a model, especially one without correlation parameters [e.g., m1
below], and whether this is conceptually different from removing random
intercepts before removing random slopes in an LMM with continuous
predictors.
That is, I would like to know whether it makes sense in this case, vs.
doesn't make sense in this case but does for continuous predictors, vs.
never makes sense.

# here c1 and c2 represent the two contrasts/numeric covariates defined
# for the three levels of a categorical predictor
m1 <- y ~ 1 + c1 + c2 + (1 + c1 + c2 || cat.pred)

Best,
Maarten

Maarten Jung
2018-08-29 12:13:13 UTC
Sorry, hit the send button too fast:

# here c1 and c2 represent the two contrasts/numeric covariates defined
# for the three levels of a categorical predictor
m1 <- y ~ 1 + c1 + c2 + (1 + c1 + c2 || group)

Jake Westfall
2018-08-29 13:34:15 UTC
Maarten,

Regarding whether it makes conceptual sense to have a model with random
slopes but not random intercepts: I believe the context of this
recommendation is an experiment where the goal is to do a confirmatory test
of whether the associated fixed slope = 0. In that case, as long as the
experiment is fairly balanced, the random slope variance appears in (and
expands) the standard error for the fixed effect of interest, while the
random intercept variance has little or no effect on the standard error
(again, assuming the experiment is close to balanced). So we'd like to keep
the random slopes in the model if possible so that the type 1 error rate
won't exceed the nominal alpha level by too much. But keeping the random
intercepts in the model is less important because it should have little or
no impact on the type 1 error rate either way, although it would be
conceptually strange to have random slopes but not random intercepts. So,
anyway, that's the line of thinking as I understand it, and I don't think
it's crazy.

Jake

Maarten Jung
2018-08-29 15:56:28 UTC
Dear Jake,

Thanks for your answer, it makes sense to me. I think removing the random
intercepts should mostly increase the residual error and thus even
increase the SEs for the fixed effects. Is this correct?

Why exactly would it be conceptually strange to have random slopes but not
random intercepts?
Is it because intercepts often represent some kind of baseline and, say,
subjects will probably have different baselines (and thus a corresponding
variance component estimated as > 0) if their slopes (i.e. effects) vary?
Or is there any other statistical reason why most people remove the random
slopes first?

Best,
Maarten
Maarten Jung
2018-09-01 13:49:58 UTC
Post by Maarten Jung
I think removing the random intercepts should mostly increase the residual
error and thus even increase the SEs for the fixed effects. Is this correct?
FWIW: this quick test with the Machines data seems to support my speculation:

data("Machines", package = "MEMSS")
d <- Machines
xtabs(~ Worker + Machine, d) # balanced

mm <- model.matrix(~ 1 + Machine, d)
c1 <- mm[, 2]
c2 <- mm[, 3]

summary(lmerTest::lmer(score ~ 1 + c1 + c2 + (1 + c1 + c2 | Worker), d))
# Fixed effects:
# Estimate Std. Error df t value Pr(>|t|)
# (Intercept) 52.356 1.681 5.000 31.151 6.4e-07 ***
# c1 7.967 2.421 5.000 3.291 0.021693 *
# c2 13.917 1.540 5.000 9.036 0.000277 ***

summary(lmerTest::lmer(score ~ 1 + c1 + c2 + (0 + c1 + c2 | Worker), d))
### SEs increased:
# Fixed effects:
# Estimate Std. Error df t value Pr(>|t|)
# (Intercept) 52.3556 0.6242 41.0000 83.880 < 2e-16 ***
# c1 7.9667 3.5833 5.3172 2.223 0.073612 .
# c2 13.9167 1.9111 6.2545 7.282 0.000282 ***

summary(lmerTest::lmer(score ~ 1 + c1 + c2 + (1 + c1 + c2 || Worker), d))
# Fixed effects:
# Estimate Std. Error df t value Pr(>|t|)
# (Intercept) 52.356 1.679 5.004 31.188 6.31e-07 ***
# c1 7.967 2.426 5.002 3.284 0.021833 *
# c2 13.917 1.523 5.004 9.137 0.000262 ***

summary(lmerTest::lmer(score ~ 1 + c1 + c2 + (0 + c1 + c2 || Worker), d))
### SEs increased:
# Fixed effects:
# Estimate Std. Error df t value Pr(>|t|)
# (Intercept) 52.3556 0.6242 41.0000 83.880 < 2e-16 ***
# c1 7.9667 3.5833 5.3172 2.223 0.073612 .
# c2 13.9167 1.9111 6.2545 7.282 0.000282 ***
Jake Westfall
2018-09-01 14:52:10 UTC
Hi Maarten,

I should point out that all of this, both what I said and what the Barr et
al. paper said, is contingent on the fixed predictor being
contrast/deviation coded, NOT treatment/dummy coded. This is sort of
mentioned in the Barr paper in footnote 12 (attached to the paragraph you
cited on p. 276), but it's not 100% clear, and I probably should have
mentioned it too.

If you add `options(contrasts=c("contr.helmert", "contr.poly"))` to the top
of your script you'll see the expected results.

The reason the coding matters in this way is that if we're using
contrast/deviation codes and the design is approximately balanced, then
removing the random intercepts is the same as constraining all units to
have the same overall mean response -- visually, this just vertically
shifts each of the unit-specific regression lines (actually planes in this
case) so that they all intersect X = 0 at the same Y -- but this shift
doesn't have much impact on any of the unit-specific slopes, and thus
doesn't change the random slope variance much. Since the random slope
variance enters the standard errors of the fixed slopes while the random
intercept variance does not (because the fixed slopes are effectively
difference scores that subtract out the unit-specific means), the
standard errors are mostly unchanged by this shift.

As you point out, the residual variance does expand a little bit to soak up
some of the ignored random intercept variance. But this has very little
impact on the standard errors because, in the standard error expression,
the residual variance is divided by the total number of observations, so
its contribution to the entire expression is negligible except for tiny
data sets (which to some extent is true of the Machines dataset).

Now on the other hand, if we use treatment/dummy codes, removing the random
intercepts corresponds to a completely different constraint, specifically
we constrain all units to have the same response *in one of the
experimental conditions*, and the random slopes are left to be whatever
they now need to be to fit the other experimental conditions. This can have
a big impact on the unit-specific regression lines, generally increasing
their variance, possibly by a lot, which has a big impact on the standard
errors of the fixed slopes.

This is a lot easier to understand using pictures (and maybe a few
equations) rather than quickly typed words, but it's Saturday morning and I
want to do fun stuff, so... anyway, I hope this helps a little.

Finally, in answer to your follow-up question of why it might be
conceptually strange to have random slopes but not random intercepts, maybe
the image from the Gelman & Hill textbook showing what that implies for the
unit-specific regression lines will make it clearer. I hope you agree
that the middle panel is strange. The image is specifically in a dummy
coding context, but it's not much less strange even if we use
contrast/deviation codes.

[Image attachment (figure from Gelman & Hill) not preserved in this archive.]
Jake
D. Rizopoulos
2018-09-01 18:45:07 UTC
When you're using lmer() or lme(), you don't actually fit the mixed model
directly, but rather the implied marginal model. That is, under maximum
likelihood estimation, you integrate out the random effects, and you obtain
a marginal model of the form

Y ~ N(X beta, Z D Z' + sigma^2 I),

where X is the design matrix for the fixed effects beta, Z the design
matrix of the random effects, D the covariance matrix of the random
effects, and sigma^2 the measurement-error variance.

Hence, it all boils down to what this marginal covariance matrix looks like. Typically, a bottom-up approach is used. That is, you start with random intercepts because they give the simplest structure, corresponding to compound symmetry (i.e., constant correlation over time), and you continue by including random slopes, which allow correlations to decay with the time lag, and possibly even higher-order terms (e.g., with longitudinal data you can often include a spline of your time variable in the random effects). To judge whether you need to include an extra random effect, a likelihood ratio test is typically used.

Now, if you're only interested in the marginal model, that is in the fixed effects and their standard errors, it is perfectly fine to exclude the random intercepts even if you have slopes in your model, because you just see it as a particular (perhaps more parsimonious) choice for your marginal covariance matrix.

However, if you're also interested in producing subject-specific predictions, which use the empirical Bayes estimates of the random effects obtained in a second step, then having slopes without intercepts will look strange in most cases (even though I know a couple of examples in which this makes sense).

I hope it helps.

Best,
Dimitris


Maarten Jung
2018-09-03 23:42:37 UTC
Post by D. Rizopoulos
Now, if you're only interest in the marginal model, that is in the fixed effects and their standard errors, it is perfectly fine to exclude the random intercepts even if you have slopes in your model, because you just see it as a particular (perhaps more parsimonious) choice for your marginal covariance matrix.
This seems to be an interesting point: if one defines the covariance
matrix as in [1], the off-diagonal elements of this matrix should be
zero for a model without correlation parameters (by which I mean the
|| syntax) and without random intercepts, i.e. just (uncorrelated)
random slopes: (0 + c1 + c2 || Worker). This is, of course, a more
parsimonious model compared to (1 + c1 + c2 || Worker); however, from
a conceptual point of view it does not look like an appropriate model
for most situations I can think of.
On the other hand (as Jake pointed out, and I agree), this model
should still be OK if one is only interested in the fixed effects,
their SEs and t values.

[1] https://stats.stackexchange.com/a/348572/136579

Maarten Jung
2018-09-03 23:31:03 UTC
Thank you, Jake, this makes total sense to me and reminds me to choose
contrasts where the intercept corresponds to the overall mean (which seems
to be handy not only in this case...).