Discussion:
[R-sig-ME] Order of terms for random slopes
Stefan Th. Gries
2018-08-29 16:31:00 UTC
Hi all

I have a question about why the order of the variable names in the
random-effects structure of an lmer model leads to different results.
These are the data:

###############
x <- structure(list(OVERLAParcsine = c(0.232077682862713, 0.656060590924923,
0.546850950695944, 0.668742703202372, 0.631058840778021, 0.433445320069886,
0.315193032440724, 0.656060590924923, 0.389796296474261, 0.455598673395823,
0.500654712404588, 0.477995198518952, 0.304692654015398, 0.631058840778021,
0.489290778014116, 0.694498265626556, 0.656060590924923, 0.466765339047296,
0.411516846067488, 0.582364237868743, 0.33630357515398, 0.36826789343664,
0.489290778014116, 0.582364237868743, 0.283794109208328, 0.631058840778021,
0.33630357515398, 0.606505855213087, 0.512089752934148, 0.150568272776686,
0.273393031467473, 0.466765339047296, 0.160690652951911, 0.120289882394788,
0.558600565342801, 0.400631592701372, 0.273393031467473, 0.72081876087009,
0.444492776935819, 0.681553211563117, 0.546850950695944, 0.523598775598299,
0.273393031467473, 0.694498265626556, 0.294226837748982, 0.500654712404588,
0.411516846067488, 0.618728690672251), NAME = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L), .Label = c("Anne", "Aran", "Becky", "Carl", "Dominic",
"Gail", "Joel", "John", "Liz", "Nicole", "Ruth", "Warren"), class = "factor"),
PERSON = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("caretaker",
"child"), class = "factor"), PHASE = c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), class =
"data.frame", row.names = c(NA, -48L))
###############
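
For orientation, here is a minimal sketch (assuming the data frame x as
defined above) of how to inspect the design: each of the 12 NAME levels
contributes one observation per PERSON by PHASE combination, 48
observations in total.

###############
## cross-tabulate the design: one observation per NAME x PERSON x PHASE cell
with(x, table(NAME, PERSON, PHASE))
str(x) # PHASE is stored as an integer, PERSON and NAME as factors
###############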

With the following order of variables in the random-effects structure,
I get convergence warnings,

###############
summary(m1a <- lme4::lmer(OVERLAParcsine ~ 1+PERSON*PHASE +
(1+PERSON+PHASE|NAME), data=x), correlation=F) # warning
logLik(m1a) # 31.89056
###############

but not with this order:

###############
summary(m1b <- lme4::lmer(OVERLAParcsine ~ 1+PERSON*PHASE +
(1+PHASE+PERSON|NAME), data=x), correlation=F) # fine
logLik(m1b) # 31.89128
###############
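
For reference, here is a minimal sketch (assuming m1a and m1b as fitted
above) of how one might check whether the two orderings differ in
anything beyond the last decimals of the log-likelihood:

###############
## compare the two fits beyond the log-likelihoods
all.equal(lme4::fixef(m1a), lme4::fixef(m1b)) # fixed effects
lme4::VarCorr(m1a) # variance components of m1a
lme4::VarCorr(m1b) # variance components of m1b
###############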

Why does the order of the terms in the random-effects structure matter
here, given that PHASE is treated as numeric either way? Thanks for any
input you may have,
STG
Ben Bolker
2018-08-29 17:15:57 UTC
Thanks. This is a known issue: https://github.com/lme4/lme4/issues/449

At the risk of sounding like a stuffy old statistical fart:

- yes, lme4 *should* give an identical fit either way
- it's not terribly surprising that a model with 11 parameters fitted
to 48 observations is numerically unstable ...
- there don't seem to be any _substantive_ differences in the estimates
(a quick check is sketched below) ...
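
For what it's worth, here is a minimal sketch (assuming m1a from the
original post and a recent lme4 in which allFit() is exported; it used
to live in afex) of how one might check that the warning is a numerical
false positive rather than a substantive problem. The 11 parameters are
the 4 fixed effects, the 3 random-effect standard deviations, the 3
random-effect correlations, and the residual standard deviation.

###############
## refit m1a with all optimizers lme4 can find and compare the results;
## if the log-likelihoods and fixed effects agree across optimizers,
## the convergence warning is most likely a false positive
aa <- lme4::allFit(m1a)
ss <- summary(aa)
ss$which.OK # which optimizers completed without an error
ss$llik     # log-likelihoods across optimizers
ss$fixef    # fixed-effect estimates across optimizers
###############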

cheers
Ben Bolker

Stefan Th. Gries
2018-08-29 18:51:46 UTC
Post by Ben Bolker
Thanks. This is a known issue: https://github.com/lme4/lme4/issues/449
Ohh, ok, I had googled a bit on 'order of terms', 'random effects'
etc. but hadn't come across this, sorry.
Post by Ben Bolker
- it's not terribly surprising that a model with 11 parameters fitted to 48 observations is numerically unstable ...
Absolutely, the example is from a workshop and was used only for
didactic purposes, and ...
Post by Ben Bolker
there don't seem to be any _substantive_ differences in the estimate ...
... yes, we only wanted to make sure there wasn't something
superobvious but important we had missed.

Thanks for the quick feedback!
Thierry Onkelinx
2018-08-30 07:14:36 UTC
Dear Stefan,

IMHO you shouldn't use an overfitted model for didactic purposes. Teach
students that they need a data set that is sufficiently large for the
complexity of the model.
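
For what it's worth, here is a minimal sketch (assuming the data frame x
from the original post; these are illustrative alternatives, not a
recommendation for this particular data set) of random-effects
structures that are less demanding for 48 observations:

###############
## option 1: by-NAME random intercepts only
m2 <- lme4::lmer(OVERLAParcsine ~ 1 + PERSON*PHASE + (1|NAME), data = x)

## option 2: uncorrelated random slopes via the double-bar syntax;
## note that || in lme4 only fully separates numeric covariates,
## so the factor PERSON is recoded as a 0/1 dummy first
x$PERSONnum <- as.numeric(x$PERSON) - 1
m3 <- lme4::lmer(OVERLAParcsine ~ 1 + PERSON*PHASE +
                 (1 + PERSONnum + PHASE || NAME), data = x)
###############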

Best regards,


ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
***@inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

