[R-sig-ME] distribution of random effects glmmTMB - covariance structure

Vidal, Tiffany (FWE )

2018-09-06 17:59:05 UTC

I'm unclear about the distributional assumptions regarding the random effects in glmmTMB when using different covariance structures. It is my understanding that the default is an unstructured covariance structure. When estimating a vector of random effects, what is the assumption about the distribution of the factor levels within each grouping? I usually assume normality with a mean of 0 and an estimated variance. This doesn't seem to hold when I look at ranef(mod) for the different grouping variables.

For example:
mod <- glmmTMB(Count ~ us(time + 0|Subject))
or
mod <- glmmTMB(Count ~ diag(time + 0|Subject))

Here, I'm modeling (I think) variability among subjects through time (e.g., a different subject variance in each time step), and assuming that the repeated measures within each individual subject at time t come from some distribution. If the assumed distribution were normal with a mean of 0, I would expect the sum of the Subject BLUPs in each year to be approximately 0, but that doesn't appear to be the case. Any clarification on this would be appreciated.

Thank you,
Tiffany


D. Rizopoulos

2018-09-06 18:42:50 UTC

Logically, the ranef() gives you the empirical Bayes estimates of the
random effects. Note that the distribution (and as a result the variance
and covariances) of these is not the same as the distribution you
specified in the formula of the model. Namely, the distribution you
define is the _prior_ distribution of the random effects, whereas the
empirical Bayes estimates are coming from the posterior of the random
effects.

In math terms, the choice of us() or diag() specifies the distribution
[b] of the random effects, whereas from ranef() you get the modes or
means of the posterior distribution

[b | y] which is proportional to [y | b] * [b],

where y denotes your Count outcome, and [y | b] denotes the distribution
of your outcome given the random effects.
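
Written out, the relationship above can be sketched as follows, assuming the usual setup in which the random effects are multivariate normal with mean zero. The symbols are illustrative and not from the post: \Sigma stands for the covariance matrix whose structure us() or diag() controls, and \hat{b} for the posterior mode that ranef() would report.

```latex
b \sim \mathcal{N}(0, \Sigma), \qquad
p(b \mid y) = \frac{p(y \mid b)\, p(b)}{\int p(y \mid b)\, p(b)\, db}
            \;\propto\; p(y \mid b)\, p(b), \qquad
\hat{b} = \arg\max_{b}\; p(y \mid b)\, p(b).
```

The zero-mean normal assumption applies to p(b); nothing in this expression forces the collection of \hat{b} values to average to zero or to have covariance \Sigma.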

Best,
Dimitris

_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

--
Dimitris Rizopoulos
Professor of Biostatistics
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web (personal): http://www.drizopoulos.com/
Web (work): http://www.erasmusmc.nl/biostatistiek/
Blog: http://iprogn.blogspot.nl/

Ben Bolker

2018-09-06 19:05:23 UTC

Yes.

While the distribution of conditional modes is certainly not assumed
to be exactly N(0,s^2), informally, if the observed distribution of
conditional modes is far from zero-centered Gaussian, I might worry
about misspecification of the model. I know of the existence of a
literature on the diagnosis and effects of model misspecification
(especially of the distribution of conditional modes) in (G)LMMs -- e.g.
go to http://bbolker.github.io/mixedmodels-misc/glmmbib.html and search
for "misspec" -- but I don't know its contents well at all.

(1) adding group-level covariates (to explain some of the non-Normal
among-group variability) can help, if you have any such information

(2) one more question about your random-effect specification. Is time
being treated as categorical or continuous?

- if categorical:
  - if there are n time points, us(time+0|Subject) will have
    n*(n+1)/2 parameters, which could get out of hand (you'll be trying
    to estimate the full variance-covariance matrix among all n
    observations for each subject -- you'll need lots of subjects to
    make this work). Could be worth trying an ar1() model instead?
  - allowing for a *continuous*, fixed effect of time in addition to
    the random effect could help (again, by explaining some of the
    systematic variability)
- if continuous: I'm not sure why you would suppress the intercept
  variation?
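
To make the parameter-count point concrete, here is a small base-R sketch that tabulates how many covariance parameters each structure needs for n time points. The helper name n_cov_params is hypothetical (it is not part of glmmTMB); the counts assume n variances plus n*(n-1)/2 correlations for us(), n variances for diag(), and one variance plus one autocorrelation for ar1().

```r
# Hypothetical helper: number of covariance parameters per grouping term
# for n time points under three covariance structures.
n_cov_params <- function(n, structure = c("us", "diag", "ar1")) {
  structure <- match.arg(structure)
  switch(structure,
         us   = n * (n + 1) / 2,     # n variances + n*(n-1)/2 correlations
         diag = n,                   # n variances, no correlations
         ar1  = rep(2, length(n)))   # one variance + one autocorrelation
}

n_times <- c(3, 5, 10, 20)
counts <- data.frame(
  n    = n_times,
  us   = n_cov_params(n_times, "us"),
  diag = n_cov_params(n_times, "diag"),
  ar1  = n_cov_params(n_times, "ar1")
)
print(counts)
```

With 10 time points, for example, us() needs 55 parameters against 10 for diag() and 2 for ar1(), which is why the unstructured form requires many subjects.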


Vidal, Tiffany (FWE )

2018-09-06 19:15:33 UTC

Thank you. That makes sense regarding the conditional modes. This model is not fully specified yet; it is more of an example for understanding the different covariance structure options in this package. Your replies have been very helpful, and I will consult the materials suggested.

Tiffany


D. Rizopoulos

2018-09-06 19:29:30 UTC

Well, AFAIK checking the normality assumption of the prior distribution
of the random effects using the EB estimates can be problematic. This is
because the EB estimates have different distributions for different
subjects, depending on each subject's fixed- and random-effects design
matrices, and also because shrinkage affects their distribution.
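
Both effects can be illustrated with a minimal base-R simulation. This is a sketch under strong simplifying assumptions that are not from the thread: a Gaussian random-intercept model with known mean and known variance components, so that the EB estimate has the closed form lambda_i * (ybar_i - mu) with lambda_i = sigma_b^2 / (sigma_b^2 + sigma_e^2 / n_i). All variable names and numeric values are made up for illustration.

```r
set.seed(1)

n_subj  <- 200   # number of subjects (illustrative)
sigma_b <- 1.0   # prior SD of the random intercepts
sigma_e <- 2.0   # residual SD
mu      <- 0     # fixed intercept, treated as known here

# Unequal numbers of observations per subject, i.e. different designs
n_obs <- sample(2:10, n_subj, replace = TRUE)

# True random intercepts drawn from the prior N(0, sigma_b^2)
b <- rnorm(n_subj, mean = 0, sd = sigma_b)

# Observed subject means
ybar <- vapply(seq_len(n_subj), function(i) {
  mean(mu + b[i] + rnorm(n_obs[i], mean = 0, sd = sigma_e))
}, numeric(1))

# Closed-form EB estimates: subject-specific shrinkage toward zero
lambda <- sigma_b^2 / (sigma_b^2 + sigma_e^2 / n_obs)
b_hat  <- lambda * (ybar - mu)

cat("prior variance:          ", sigma_b^2, "\n")
cat("variance of true b:      ", var(b), "\n")
cat("variance of EB estimates:", var(b_hat), "\n")
cat("range of shrinkage factors:", range(lambda), "\n")
```

In this setting each EB estimate has variance lambda_i * sigma_b^2, which is smaller than the prior variance and differs between subjects with different n_i, so the pooled estimates are not a sample from the prior even when the model is correct.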

For more on these points, a nice overview is given in Section 7.8 of
the book by Verbeke and Molenberghs (2000), "Linear Mixed Models for
Longitudinal Data", Springer-Verlag.

In any case, if we're talking about linear mixed models, it has been shown
that misspecifying the prior distribution of the random effects has very
little impact on the parameter estimates and standard errors for the fixed
effects.

Best,
Dimitris


Vidal, Tiffany (FWE )

2018-09-06 19:44:45 UTC

Thank you for this discussion and suggested materials. This is making much more sense now, and part of the problem could certainly be that the model is not fully specified yet, but was written to try to understand how the different covariance structures were operating. I'll dig into this more - thanks for the guidance!

Tiffany
