Discussion:
[R-sig-ME] Dependency structure
Yashree Mehta
2018-10-16 17:03:27 UTC
Permalink
Hi,

Is there literature on how to specify the dependency structure between the
random intercept and the statistical noise error term in a random intercept
model?
It would be useful to also know how to implement using R...

Thank you

Yashree

[[alternative HTML version deleted]]
Ben Bolker
2018-10-17 00:34:59 UTC
Permalink
Post by Yashree Mehta
Hi,
Is there literature on how to specify the dependency structure between the
random intercept and the statistical noise error term in a random intercept
model?
It would be useful to also know how to implement using R...
Can you be more specific about what you want? Suppose you have
observations j within groups i, and you have an epsilon_{0,ij} for each
observation (error term) and an epsilon_"1,i} for each group (random
intercept). Typically the epsilon_{0,ij} values are iid with
homogeneous variance sigma_0^2, and epsilon_{1,i} are iid with variance
sigma_1^2. What kind of correlation structure are you looking for?

While we're at it, you previously asked:

===
I am working with a random intercept model. I have the usual "X" vector
of covariates and one id variable which will make up the random
intercept. For example,

Response variable: Production of maize
Covariate: Size of plot
ID variable: Household_ID

I need to acknowledge that there is correlation between the FIXED EFFECT
coefficient of plot size and the estimated random intercept. It is my
model assumption.

Does lme4 assume this correlation or do I have to make changes in the
formula so that it gets considered?
===

The short answer to this one is "no", I think -- I don't know that
there's a way to allow for correlation between fixed effect coefficients
and random intercepts. (This actually seems like a weird question to me;
in the frequentist world, as far as I know, you can only specify
correlation models for *random variables* within the model. In the
context of LMM fitting, I don't think parameters are random effects in
this sense.
Post by Yashree Mehta
Thank you
Yashree
[[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Baud-Bovy Gabriel
2018-10-17 09:06:43 UTC
Permalink
My apologies for sending this post again. I am not sure that it went
through because I did not
receive it myself (though I think my options should have let me receive
it). Also, there was
a line missing due to hurried cut-and-paste in the example below.

Dear all,

I have a dataset with response times of 200 participants measured 36
times in different conditions.
The RTs distribution is right skewed as it is generally the case for
RTs. I would like to avoid transform them
(see Lo & Andrew
(2015)](https://www.frontiersin.org/articles/10.3389/fpsyg.2015.01171/full).
However, I often encounter warning messages with this dataset when using
a GLMM with Gamma
distribution and identity link. The problem is not caused by an over
parametrization as it arises
even in the case in a intercept only model.
fit <- glmer(choice.lat ~ 1 + (1|su), data=dat,
family=Gamma(link="identity"))
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.220808 (tol = 0.001,
component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
fit
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: Gamma ( identity )
Formula: choice.lat ~ 1 + (1 | su)
Data: dat
AIC BIC logLik deviance df.resid
22629.26 22649.78 -11311.63 22623.26 6914
Random effects:
Groups Name Std.Dev.
su (Intercept) 0.8477
Residual 0.5721
Number of obs: 6917, groups: su, 200
Fixed Effects:
(Intercept)
3.044
convergence code 0; 2 optimizer warnings; 0 lme4 warnings

The dispersion parameter of the estimated Gamma distribution is
sigma(fit.glmer)^2 = 0.327
with a population mean (Fixed effect) of circa 3.0 sec which fits
reasonably well the RTs.

After some poking around, I figured out that the problem probably comes
from the
fact that the participants means are NOT distributed normally. In fact,
their distribution
is similarly skewed. Removing the two extreme subjects or using a log
link function "solves" the convergence
problem/ warning messages but I don't like either solution because the
is no valid reason to exclude these subjects
and using a log transform cause theoretical problems when one want to
interpret
effects of more complex models (see Lo & Andrew (2015)).

I have seen many discussions of about warning message/convergence problem (
e.g.
https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html).
The usual advice is to try with different optimizers and/or simplify the
model, which do not help here (they all give
similar warning messages; see below). I have two questions:

1) Can somebody explain intuitively why non-normal random effects lead
to these warning
messages ? I would have naively assumed that it would simply lead
to large a large variance estimate.

2) Is there something within the lme4 framework to tackle this problem?
Should I simply ignore the
warning messages ?

3) What principle approach would you suggest ? I have seen that brms
allow in principle to specify
non-normal random effect (I think that only student is implemented
so far) but I haven't tried yet.

Here is a toy problem that reproduces the issue (it depends on the seed
actually). Depending
on the choice of the dispersion parameter for the REs, it is possible
also to induce convergence
problem.

# artificial data set
nsu <- 200
nrep <- 36
mu <- 3.0 # mean RT
phi.su <- 0.8 # dispersion participants
phi <- 0.3 # dispersion
d <- expand.grid(
su = paste0("S",1:nsu),
rep = 1:nrep,
mu =NA,
rt = NA)

set.seed(125)
# generate 200 gamma distributed participant means
mu.su <- sort(rgamma(nsu, shape=1/phi.su, scale=phi.su*mu))
d$mu <- mu.su[d$su]
# add Gamma noise
d$rt <- rgamma(nsu*nrep,shape=1/phi,scale=phi*d$mu)

# fit intercept GLMM model (gives warning messages)
fit <- glmer(rt ~ 1 + (1|su), data=d, family=Gamma(link="identity"))

# histogram of participant means and response times
par(mfrow=c(1,2))
hist(mu.su,n=50)
hist(d$rt,n=200,xlim=c(0,10))
all_fit(fit)
bobyqa. : [OK]
Nelder_Mead. : [OK]
optimx.nlminb : [OK]
optimx.L-BFGS-B : [OK]
nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
nloptwrap.NLOPT_LN_BOBYQA : [OK]
nmkbw. : [OK]
....
were 13 warnings (use warnings() to see them)
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
... :
Model failed to converge with max|grad| = 0.215844 (tol = 0.001,
component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
... :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
... :
Model failed to converge with max|grad| = 0.220808 (tol = 0.001,
component 1)
4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
... :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
5: In optwrap(optimizer, devfun, start, rho$lower, control = control, ... :
convergence code 1 from optimx
6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
... :
Model failed to converge with max|grad| = 0.215713 (tol = 0.001,
component 1)
7: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
... :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
8: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
... :
Model failed to converge with max|grad| = 0.102562 (tol = 0.001,
component 1)
9: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
... :
Model failed to converge with max|grad| = 0.109779 (tol = 0.001,
component 1)
10: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
control$checkConv, ... :
Model failed to converge with max|grad| = 0.109779 (tol = 0.001,
component 1)
11: In optwrap(optimizer, devfun, start, rho$lower, control = control,
... :
convergence code 2 from nmkbw
12: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
control$checkConv, ... :
Model failed to converge with max|grad| = 0.220811 (tol = 0.001,
component 1)
13: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
control$checkConv, ... :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?


Thank you,

Gabriel
--
---------------------------------------------------------------------
Gabriel Baud-Bovy tel.: (+39) 348 172 4045 (mobile)
UHSR University (+39) 02 2643 3429 (laboratory)
via Olgettina, 58 (+39) 02 9175 1540 (secretary)
20132 Milan, Italy email: gabriel.baud-***@hsr.it
---------------------------------------------------------------------



Rispetta l’ambiente: non stampare questa mail se non è necessario.
Respect the environment: print this email only if necessary.
Ulf Köther
2018-10-17 16:28:30 UTC
Permalink
Dear Gabriel,

as no one of the real professionals stepped in yet, I will try to
provide some thoughts:

(1) In my experience, Gamma GLMMs with an identity link are very buggy
all the time, so normally, when I am confronted with positive continuous
data, I mostly never consider them in the first place...why not use
another link function when using GLMMs, because that's exactly the goal
they were invented for: modeling data that does not belong to a normal
distribution by using an appropriate link function.

(2) I see why you don't want to transform the raw RT data and use an
ordinary LMM, but in my opinion, using a Gamma GLMM with a log-Link is a
valid option. Especially because you can insert the raw RT as a DV and
then model it on the scale of the linear predictor and after that, get
all parameter estimates and (which is much more valuable in my opinion)
the predicted values on the scale of the dependent variable without any
hard work, just use the invert link function (e.g., use the effects
package with lme4 and get the raw RT predictions). The interpretation if
such a model is much easier as you can directly get the predictions on
the scale of the DV.

Is it totally reasonable with regards to power? That's another question
that a real RT specialist should answer...

(2) I know the article you mentioned and skimmed it again. I really
wonder why they do not mention a log-link function as a valid option and
just compare the LMM with the transformed DV against a Gamma GLMM with
the identity link or an inverse Gaussian with identity link. It seems to
my that the Gamma GLMM with a log link is never really discussed. But,
as they also use the inverse Gaussian with the following link function
in lme4:

invfn <- function() {
## link
linkfun <- function(y) -1000/y
## inverse link
linkinv <- function(eta) -1000/eta
## derivative of invlink wrt eta
mu.eta <- function(eta) { 1000/(eta^2) }
valideta <- function(eta) TRUE
link <- "-1000/y"
structure(list(linkfun = linkfun, linkinv = linkinv,
mu.eta = mu.eta, valideta = valideta,
name = link),
class = "link-glm")
}

Why not try this when you want to go along the line the authors suggest?
The parameter estimates in their analyses seem really comparable between
Gamma and inverse Gaussian GLMMs to me. Just have a look at the
supplement! ;-)

Try to play around with your generated data with the log-link function
and with the inverse Gaussian to see which one recovers your parameters
best and go from there...!

Greetings,

Ulf
Post by Baud-Bovy Gabriel
My apologies for sending this post again. I am not sure that it went
through because I did not
receive it myself (though I think my options should have let me receive
it). Also, there was
a line missing due to hurried cut-and-paste in the example below.
Dear all,
I have a dataset with response times of 200 participants measured 36
times in different conditions.
The RTs distribution is right skewed as it is generally the case for
RTs. I would like to avoid transform them
(see Lo & Andrew
(2015)](https://www.frontiersin.org/articles/10.3389/fpsyg.2015.01171/full).
However, I often encounter warning messages with this dataset when using
a GLMM with Gamma
distribution and identity link. The problem is not caused by an over
parametrization as it arises
even in the case in a intercept only model.
fit <- glmer(choice.lat ~ 1 + (1|su), data=dat,
family=Gamma(link="identity"))
Model failed to converge with max|grad| = 0.220808 (tol = 0.001,
component 1)
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
fit
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: Gamma ( identity )
Formula: choice.lat ~ 1 + (1 | su)
Data: dat
AIC BIC logLik deviance df.resid
22629.26 22649.78 -11311.63 22623.26 6914
Groups Name Std.Dev.
su (Intercept) 0.8477
Residual 0.5721
Number of obs: 6917, groups: su, 200
(Intercept)
3.044
convergence code 0; 2 optimizer warnings; 0 lme4 warnings
The dispersion parameter of the estimated Gamma distribution is
sigma(fit.glmer)^2 = 0.327
with a population mean (Fixed effect) of circa 3.0 sec which fits
reasonably well the RTs.
After some poking around, I figured out that the problem probably comes
from the
fact that the participants means are NOT distributed normally. In fact,
their distribution
is similarly skewed. Removing the two extreme subjects or using a log
link function "solves" the convergence
problem/ warning messages but I don't like either solution because the
is no valid reason to exclude these subjects
and using a log transform cause theoretical problems when one want to
interpret
effects of more complex models (see Lo & Andrew (2015)).
I have seen many discussions of about warning message/convergence problem (
e.g.
https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html).
The usual advice is to try with different optimizers and/or simplify the
model, which do not help here (they all give
1) Can somebody explain intuitively why non-normal random effects lead
to these warning
messages ? I would have naively assumed that it would simply lead
to large a large variance estimate.
2) Is there something within the lme4 framework to tackle this problem?
Should I simply ignore the
warning messages ?
3) What principle approach would you suggest ? I have seen that brms
allow in principle to specify
non-normal random effect (I think that only student is implemented
so far) but I haven't tried yet.
Here is a toy problem that reproduces the issue (it depends on the seed
actually). Depending
on the choice of the dispersion parameter for the REs, it is possible
also to induce convergence
problem.
# artificial data set
nsu <- 200
nrep <- 36
mu <- 3.0 # mean RT
phi.su <- 0.8 # dispersion participants
phi <- 0.3 # dispersion
d <- expand.grid(
su = paste0("S",1:nsu),
rep = 1:nrep,
mu =NA,
rt = NA)
set.seed(125)
# generate 200 Gamma distributed participant means
mu.su <- sort(rGamma(nsu, shape=1/phi.su, scale=phi.su*mu))
d$mu <- mu.su[d$su]
# add Gamma noise
d$rt <- rGamma(nsu*nrep,shape=1/phi,scale=phi*d$mu)
# fit intercept GLMM model (gives warning messages)
fit <- glmer(rt ~ 1 + (1|su), data=d, family=Gamma(link="identity"))
# histogram of participant means and response times
par(mfrow=c(1,2))
hist(mu.su,n=50)
hist(d$rt,n=200,xlim=c(0,10))
all_fit(fit)
bobyqa. : [OK]
Nelder_Mead. : [OK]
optimx.nlminb : [OK]
optimx.L-BFGS-B : [OK]
nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
nloptwrap.NLOPT_LN_BOBYQA : [OK]
nmkbw. : [OK]
....
were 13 warnings (use warnings() to see them)
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model failed to converge with max|grad| = 0.215844 (tol = 0.001,
component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model failed to converge with max|grad| = 0.220808 (tol = 0.001,
component 1)
4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
convergence code 1 from optimx
6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model failed to converge with max|grad| = 0.215713 (tol = 0.001,
component 1)
7: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
8: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model failed to converge with max|grad| = 0.102562 (tol = 0.001,
component 1)
9: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model failed to converge with max|grad| = 0.109779 (tol = 0.001,
component 1)
10: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
Model failed to converge with max|grad| = 0.109779 (tol = 0.001,
component 1)
11: In optwrap(optimizer, devfun, start, rho$lower, control = control,
convergence code 2 from nmkbw
12: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
Model failed to converge with max|grad| = 0.220811 (tol = 0.001,
component 1)
13: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
Thank you,
Gabriel
--
---------------------------------------------------------------------
Gabriel Baud-Bovy tel.: (+39) 348 172 4045 (mobile)
UHSR University (+39) 02 2643 3429 (laboratory)
via Olgettina, 58 (+39) 02 9175 1540 (secretary)
---------------------------------------------------------------------
Rispetta l’ambiente: non stampare questa mail se non è necessario.
Respect the environment: print this email only if necessary.
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
_____________________________________________________________________

Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts; Gerichtsstand: Hamburg | www.uke.de
Vorstandsmitglieder: Prof. Dr. Burkhard Göke (Vorsitzender), Prof. Dr. Dr. Uwe Koch-Gromus, Joachim Prölß, Martina Saurin (komm.)
_____________________________________________________________________

SAVE PAPER - THINK BEFORE PRINTING
Baud-Bovy Gabriel
2018-10-18 13:04:36 UTC
Permalink
Dear Ulf,

Thank you for the reply. I think that the approach is very reasonable
but I am
not fully comfortable with it for reasons that I detail below your comments.
Post by Ulf Köther
Dear Gabriel,
as no one of the real professionals stepped in yet, I will try to
(1) In my experience, Gamma GLMMs with an identity link are very buggy
all the time, so normally, when I am confronted with positive continuous
data, I mostly never consider them in the first place...why not use
another link function when using GLMMs, because that's exactly the goal
they were invented for: modeling data that does not belong to a normal
distribution by using an appropriate link function.
One motivation for my post was to ask for insights about why Gamma GLMMS
with identity link
are so difficult to fit even with a very simple intercept only model. 
My guess is that it is related to the
fact that the distribution of the subject's mean are not gaussian but I
would very much appreciate
opinion of the experts.
Post by Ulf Köther
(2) I see why you don't want to transform the raw RT data and use an
ordinary LMM, but in my opinion, using a Gamma GLMM with a log-Link is a
valid option. Especially because you can insert the raw RT as a DV and
then model it on the scale of the linear predictor and after that, get
all parameter estimates and (which is much more valuable in my opinion)
the predicted values on the scale of the dependent variable without any
hard work, just use the invert link function (e.g., use the effects
package with lme4 and get the raw RT predictions). The interpretation if
such a model is much easier as you can directly get the predictions on
the scale of the DV.
Is it totally reasonable with regards to power? That's another question
that a real RT specialist should answer...
The log link solves indeed the fitting issue. However, my reason for not
doing it, which is
discussed in Lo & Andrew 2015 paper, is the problem of "scale dependent"
interactions,
which they discuss in their section on Transform or Not to Transform.
https://www.frontiersin.org/articles/10.3389/fpsyg.2015.01171/full

The "existence" of an such an interaction and, its statistical
significance, depends on
the scale.  If I am not mistaken, being able to transform back the
coefficients on the raw RTs scale is a
separate issue.
Post by Ulf Köther
(2) I know the article you mentioned and skimmed it again. I really
wonder why they do not mention a log-link function as a valid option and
just compare the LMM with the transformed DV against a Gamma GLMM with
the identity link or an inverse Gaussian with identity link. It seems to
my that the Gamma GLMM with a log link is never really discussed. But,
as they also use the inverse Gaussian with the following link function
invfn <- function() {
## link
linkfun <- function(y) -1000/y
## inverse link
linkinv <- function(eta) -1000/eta
## derivative of invlink wrt eta
mu.eta <- function(eta) { 1000/(eta^2) }
valideta <- function(eta) TRUE
link <- "-1000/y"
structure(list(linkfun = linkfun, linkinv = linkinv,
mu.eta = mu.eta, valideta = valideta,
name = link),
class = "link-glm")
}
Why not try this when you want to go along the line the authors suggest?
The parameter estimates in their analyses seem really comparable between
Gamma and inverse Gaussian GLMMs to me. Just have a look at the
supplement! ;-)
While I think that the problem of "scale dependent"  interaction
remains, I probably should explore more
this possibility because the transformed scale might  justified on the
theoretical basis. In fact, I think
that inverse transformation might be justified on basis of the LATER's
model or similar
(https://www.sciencedirect.com/science/article/pii/S0149763415301226).
In this case,
the coefficients represent the rate of accumulation of evidence. My
understanding
of this model is that gaussian noise is added to the rate of
accumulation.  Typically, the inverse
is applied to the RTs and the transformed data are analyzed with LMs or
LMMs.

Lo & Andrew use the inverse link function with Gamma or Inverse Gaussian
distribution but they don't
really discuss the assumptions behind choosing the inverse link
function  and the inverse gaussian. From what I
know, the inverse gaussian distribution is not the reciprocal of the
normal distribution and I see only a
partial  connection between the LATER model and the proposed GLMM model.

I would appreciate any further comment from you and other RT experts on
this issue.

Best,

Gabriel

P.S.  Paul Johnson's wrote an interesting note on Gamma GLM with the
inverse link
(i.e. when modeling relationships of the form mu = a + b/x or 1/mu = a +
b/x).
http://pj.freefaculty.org/guides/stat/Regression-GLM/Gamma/GammaGLM-01.pdf
His point was that using GLM rather than OLS  was important when there
is a transformation
but using Normal  instead of Gamma  distributed dependent variable with
GLM does not
result in disaster.  I think that the question when analyzing RTs is
different since there
is reason to use Gamma without a transformation, a case that is not
discussed in Paul
Johnson's note.
Post by Ulf Köther
Try to play around with your generated data with the log-link function
and with the inverse Gaussian to see which one recovers your parameters
best and go from there...!
Greetings,
Ulf
Post by Baud-Bovy Gabriel
My apologies for sending this post again. I am not sure that it went
through because I did not
receive it myself (though I think my options should have let me receive
it). Also, there was
a line missing due to hurried cut-and-paste in the example below.
Dear all,
I have a dataset with response times of 200 participants measured 36
times in different conditions.
The RTs distribution is right skewed as it is generally the case for
RTs. I would like to avoid transform them
(see Lo & Andrew
(2015)](https://www.frontiersin.org/articles/10.3389/fpsyg.2015.01171/full).
However, I often encounter warning messages with this dataset when using
a GLMM with Gamma
distribution and identity link. The problem is not caused by an over
parametrization as it arises
even in the case in a intercept only model.
fit <- glmer(choice.lat ~ 1 + (1|su), data=dat,
family=Gamma(link="identity"))
Model failed to converge with max|grad| = 0.220808 (tol = 0.001,
component 1)
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
fit
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: Gamma ( identity )
Formula: choice.lat ~ 1 + (1 | su)
Data: dat
AIC BIC logLik deviance df.resid
22629.26 22649.78 -11311.63 22623.26 6914
Groups Name Std.Dev.
su (Intercept) 0.8477
Residual 0.5721
Number of obs: 6917, groups: su, 200
(Intercept)
3.044
convergence code 0; 2 optimizer warnings; 0 lme4 warnings
The dispersion parameter of the estimated Gamma distribution is
sigma(fit.glmer)^2 = 0.327
with a population mean (Fixed effect) of circa 3.0 sec which fits
reasonably well the RTs.
After some poking around, I figured out that the problem probably comes
from the
fact that the participants means are NOT distributed normally. In fact,
their distribution
is similarly skewed. Removing the two extreme subjects or using a log
link function "solves" the convergence
problem/ warning messages but I don't like either solution because the
is no valid reason to exclude these subjects
and using a log transform cause theoretical problems when one want to
interpret
effects of more complex models (see Lo & Andrew (2015)).
I have seen many discussions of about warning message/convergence problem (
e.g.
https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html).
The usual advice is to try with different optimizers and/or simplify the
model, which do not help here (they all give
1) Can somebody explain intuitively why non-normal random effects lead
to these warning
messages ? I would have naively assumed that it would simply lead
to large a large variance estimate.
2) Is there something within the lme4 framework to tackle this problem?
Should I simply ignore the
warning messages ?
3) What principle approach would you suggest ? I have seen that brms
allow in principle to specify
non-normal random effect (I think that only student is implemented
so far) but I haven't tried yet.
Here is a toy problem that reproduces the issue (it depends on the seed
actually). Depending
on the choice of the dispersion parameter for the REs, it is possible
also to induce convergence
problem.
# artificial data set
nsu <- 200
nrep <- 36
mu <- 3.0 # mean RT
phi.su <- 0.8 # dispersion participants
phi <- 0.3 # dispersion
d <- expand.grid(
su = paste0("S",1:nsu),
rep = 1:nrep,
mu =NA,
rt = NA)
set.seed(125)
# generate 200 Gamma distributed participant means
mu.su <- sort(rGamma(nsu, shape=1/phi.su, scale=phi.su*mu))
d$mu <- mu.su[d$su]
# add Gamma noise
d$rt <- rGamma(nsu*nrep,shape=1/phi,scale=phi*d$mu)
# fit intercept GLMM model (gives warning messages)
fit <- glmer(rt ~ 1 + (1|su), data=d, family=Gamma(link="identity"))
# histogram of participant means and response times
par(mfrow=c(1,2))
hist(mu.su,n=50)
hist(d$rt,n=200,xlim=c(0,10))
all_fit(fit)
bobyqa. : [OK]
Nelder_Mead. : [OK]
optimx.nlminb : [OK]
optimx.L-BFGS-B : [OK]
nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
nloptwrap.NLOPT_LN_BOBYQA : [OK]
nmkbw. : [OK]
....
were 13 warnings (use warnings() to see them)
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model failed to converge with max|grad| = 0.215844 (tol = 0.001,
component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model failed to converge with max|grad| = 0.220808 (tol = 0.001,
component 1)
4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
convergence code 1 from optimx
6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model failed to converge with max|grad| = 0.215713 (tol = 0.001,
component 1)
7: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
8: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model failed to converge with max|grad| = 0.102562 (tol = 0.001,
component 1)
9: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
Model failed to converge with max|grad| = 0.109779 (tol = 0.001,
component 1)
10: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
Model failed to converge with max|grad| = 0.109779 (tol = 0.001,
component 1)
11: In optwrap(optimizer, devfun, start, rho$lower, control = control,
convergence code 2 from nmkbw
12: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
Model failed to converge with max|grad| = 0.220811 (tol = 0.001,
component 1)
13: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
Thank you,
Gabriel
--
---------------------------------------------------------------------
Gabriel Baud-Bovy tel.: (+39) 348 172 4045 (mobile)
UHSR University (+39) 02 2643 3429 (laboratory)
via Olgettina, 58 (+39) 02 9175 1540 (secretary)
---------------------------------------------------------------------
Rispetta l’ambiente: non stampare questa mail se non è necessario.
Respect the environment: print this email only if necessary.
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
_____________________________________________________________________
Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts; Gerichtsstand: Hamburg | www.uke.de
Vorstandsmitglieder: Prof. Dr. Burkhard Göke (Vorsitzender), Prof. Dr. Dr. Uwe Koch-Gromus, Joachim Prölß, Martina Saurin (komm.)
_____________________________________________________________________
SAVE PAPER - THINK BEFORE PRINTING
--
---------------------------------------------------------------------
Gabriel Baud-Bovy tel.: (+39) 348 172 4045 (mobile)
UHSR University (+39) 02 2643 3429 (laboratory)
via Olgettina, 58 (+39) 02 9175 1540 (secretary)
20132 Milan, Italy email: gabriel.baud-***@hsr.it
---------------------------------------------------------------------
Emmanuel Curis
2018-10-19 13:28:07 UTC
Permalink
Hello,

Just a personnal opinion considering the point of interaction
depending on the scale. I think this is more an issue of what model do
you believe in and its mathematical formulation, and how to handle
non-linearities, than a problem of link function (or data
transformation), especially if you consider an interaction « simply »
as a product of predictor variables term in a linear model.

That is, considering your RT variable for instance, and two predictors
for sake of simplicity, you simply cannot have simultaneously

RT = µ0 + a * factor1 + b * factor2

and

log RT = µ0' + a' * factor1 + b' * factor2

(unless a = b = 0, or both factor1 and factor2 are indicator
variables, that is associated to binary qualitative predictors)

so when trying to apply a linear model to the scale where the
relationship is not linear, some « spurious » interactions (defined as
product of factor1 and factor2) seem to appear because the model will
add these terms to handle the intrinsic non-linearity of the model
induced by the scale change.

That does not mean the answer is easy, but that using this argument to
avoid using non-linear link functions seems strange unless you have
strong arguments to affirm that the relationship is indeed linear in
the original scale. If you think the model is linear in the log scale,
then it is the model with the linear link that is doubtful, and the
associated (lack of) interaction « spurious ».

My guess would be, in your particular case, that a gamma model with
identity link and linear model on this scale simply does not hold,
because it violates the constraints on the gamma distribution
parameters, and that having random effects with a large variance may
be enough to have cases where the model, even quite simple, gives
irrealistic results. Especially with a Gaussian random effect, that
can have negative values...


On Thu, Oct 18, 2018 at 01:04:36PM +0000, Baud-Bovy Gabriel wrote:
« Dear Ulf,
«
« Thank you for the reply. I think that the approach is very reasonable
« but I am
« not fully comfortable with it for reasons that I detail below your comments.
«
« On 10/17/2018 6:28 PM, Ulf Köther wrote:
« > Dear Gabriel,
« >
« > as no one of the real professionals stepped in yet, I will try to
« > provide some thoughts:
« >
« > (1) In my experience, Gamma GLMMs with an identity link are very buggy
« > all the time, so normally, when I am confronted with positive continuous
« > data, I mostly never consider them in the first place...why not use
« > another link function when using GLMMs, because that's exactly the goal
« > they were invented for: modeling data that does not belong to a normal
« > distribution by using an appropriate link function.
« One motivation for my post was to ask for insights about why Gamma GLMMS
« with identity link
« are so difficult to fit even with a very simple intercept only model. 
« My guess is that it is related to the
« fact that the distribution of the subject's mean are not gaussian but I
« would very much appreciate
« opinion of the experts.
«
« > (2) I see why you don't want to transform the raw RT data and use an
« > ordinary LMM, but in my opinion, using a Gamma GLMM with a log-Link is a
« > valid option. Especially because you can insert the raw RT as a DV and
« > then model it on the scale of the linear predictor and after that, get
« > all parameter estimates and (which is much more valuable in my opinion)
« > the predicted values on the scale of the dependent variable without any
« > hard work, just use the invert link function (e.g., use the effects
« > package with lme4 and get the raw RT predictions). The interpretation if
« > such a model is much easier as you can directly get the predictions on
« > the scale of the DV.
« >
« > Is it totally reasonable with regards to power? That's another question
« > that a real RT specialist should answer...
« The log link solves indeed the fitting issue. However, my reason for not
« doing it, which is
« discussed in Lo & Andrew 2015 paper, is the problem of "scale dependent"
« interactions,
« which they discuss in their section on Transform or Not to Transform.
« https://www.frontiersin.org/articles/10.3389/fpsyg.2015.01171/full
«
« The "existence" of an such an interaction and, its statistical
« significance, depends on
« the scale.  If I am not mistaken, being able to transform back the
« coefficients on the raw RTs scale is a
« separate issue.
«
« >
« > (2) I know the article you mentioned and skimmed it again. I really
« > wonder why they do not mention a log-link function as a valid option and
« > just compare the LMM with the transformed DV against a Gamma GLMM with
« > the identity link or an inverse Gaussian with identity link. It seems to
« > my that the Gamma GLMM with a log link is never really discussed. But,
« > as they also use the inverse Gaussian with the following link function
« > in lme4:
« >
« > invfn <- function() {
« > ## link
« > linkfun <- function(y) -1000/y
« > ## inverse link
« > linkinv <- function(eta) -1000/eta
« > ## derivative of invlink wrt eta
« > mu.eta <- function(eta) { 1000/(eta^2) }
« > valideta <- function(eta) TRUE
« > link <- "-1000/y"
« > structure(list(linkfun = linkfun, linkinv = linkinv,
« > mu.eta = mu.eta, valideta = valideta,
« > name = link),
« > class = "link-glm")
« > }
« >
« > Why not try this when you want to go along the line the authors suggest?
« > The parameter estimates in their analyses seem really comparable between
« > Gamma and inverse Gaussian GLMMs to me. Just have a look at the
« > supplement! ;-)
«
« While I think that the problem of "scale dependent"  interaction
« remains, I probably should explore more
« this possibility because the transformed scale might  justified on the
« theoretical basis. In fact, I think
« that inverse transformation might be justified on basis of the LATER's
« model or similar
« (https://www.sciencedirect.com/science/article/pii/S0149763415301226).
« In this case,
« the coefficients represent the rate of accumulation of evidence. My
« understanding
« of this model is that gaussian noise is added to the rate of
« accumulation.  Typically, the inverse
« is applied to the RTs and the transformed data are analyzed with LMs or
« LMMs.
«
« Lo & Andrew use the inverse link function with Gamma or Inverse Gaussian
« distribution but they don't
« really discuss the assumptions behind choosing the inverse link
« function  and the inverse gaussian. From what I
« know, the inverse gaussian distribution is not the reciprocal of the
« normal distribution and I see only a
« partial  connection between the LATER model and the proposed GLMM model.
«
« I would appreciate any further comment from you and other RT experts on
« this issue.
«
« Best,
«
« Gabriel
«
« P.S.  Paul Johnson's wrote an interesting note on Gamma GLM with the
« inverse link
« (i.e. when modeling relationships of the form mu = a + b/x or 1/mu = a +
« b/x).
« http://pj.freefaculty.org/guides/stat/Regression-GLM/Gamma/GammaGLM-01.pdf
« His point was that using GLM rather than OLS  was important when there
« is a transformation
« but using Normal  instead of Gamma  distributed dependent variable with
« GLM does not
« result in disaster.  I think that the question when analyzing RTs is
« different since there
« is reason to use Gamma without a transformation, a case that is not
« discussed in Paul
« Johnson's note.
«
« > Try to play around with your generated data with the log-link function
« > and with the inverse Gaussian to see which one recovers your parameters
« > best and go from there...!
« >
« > Greetings,
« >
« > Ulf
« >
« >
« >
« >
« > Am 17.10.2018 um 11:06 schrieb Baud-Bovy Gabriel:
« >> My apologies for sending this post again. I am not sure that it went
« >> through because I did not
« >> receive it myself (though I think my options should have let me receive
« >> it). Also, there was
« >> a line missing due to hurried cut-and-paste in the example below.
« >>
« >> Dear all,
« >>
« >> I have a dataset with response times of 200 participants measured 36
« >> times in different conditions.
« >> The RTs distribution is right skewed as it is generally the case for
« >> RTs. I would like to avoid transform them
« >> (see Lo & Andrew
« >> (2015)](https://www.frontiersin.org/articles/10.3389/fpsyg.2015.01171/full).
« >> However, I often encounter warning messages with this dataset when using
« >> a GLMM with Gamma
« >> distribution and identity link. The problem is not caused by an over
« >> parametrization as it arises
« >> even in the case in a intercept only model.
« >>
« >> > fit <- glmer(choice.lat ~ 1 + (1|su), data=dat,
« >> family=Gamma(link="identity"))
« >> Warning messages:
« >> 1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
« >> Model failed to converge with max|grad| = 0.220808 (tol = 0.001,
« >> component 1)
« >> 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
« >> Model is nearly unidentifiable: very large eigenvalue
« >> - Rescale variables?
« >>
« >> > fit
« >> Generalized linear mixed model fit by maximum likelihood (Laplace
« >> Approximation) [glmerMod]
« >> Family: Gamma ( identity )
« >> Formula: choice.lat ~ 1 + (1 | su)
« >> Data: dat
« >> AIC BIC logLik deviance df.resid
« >> 22629.26 22649.78 -11311.63 22623.26 6914
« >> Random effects:
« >> Groups Name Std.Dev.
« >> su (Intercept) 0.8477
« >> Residual 0.5721
« >> Number of obs: 6917, groups: su, 200
« >> Fixed Effects:
« >> (Intercept)
« >> 3.044
« >> convergence code 0; 2 optimizer warnings; 0 lme4 warnings
« >>
« >> The dispersion parameter of the estimated Gamma distribution is
« >> sigma(fit.glmer)^2 = 0.327
« >> with a population mean (Fixed effect) of circa 3.0 sec which fits
« >> reasonably well the RTs.
« >>
« >> After some poking around, I figured out that the problem probably comes
« >> from the
« >> fact that the participants means are NOT distributed normally. In fact,
« >> their distribution
« >> is similarly skewed. Removing the two extreme subjects or using a log
« >> link function "solves" the convergence
« >> problem/ warning messages but I don't like either solution because the
« >> is no valid reason to exclude these subjects
« >> and using a log transform cause theoretical problems when one want to
« >> interpret
« >> effects of more complex models (see Lo & Andrew (2015)).
« >>
« >> I have seen many discussions of about warning message/convergence problem (
« >> e.g.
« >> https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html).
« >> The usual advice is to try with different optimizers and/or simplify the
« >> model, which do not help here (they all give
« >> similar warning messages; see below). I have two questions:
« >>
« >> 1) Can somebody explain intuitively why non-normal random effects lead
« >> to these warning
« >> messages ? I would have naively assumed that it would simply lead
« >> to large a large variance estimate.
« >>
« >> 2) Is there something within the lme4 framework to tackle this problem?
« >> Should I simply ignore the
« >> warning messages ?
« >>
« >> 3) What principle approach would you suggest ? I have seen that brms
« >> allow in principle to specify
« >> non-normal random effect (I think that only student is implemented
« >> so far) but I haven't tried yet.
« >>
« >> Here is a toy problem that reproduces the issue (it depends on the seed
« >> actually). Depending
« >> on the choice of the dispersion parameter for the REs, it is possible
« >> also to induce convergence
« >> problem.
« >>
« >> # artificial data set
« >> nsu <- 200
« >> nrep <- 36
« >> mu <- 3.0 # mean RT
« >> phi.su <- 0.8 # dispersion participants
« >> phi <- 0.3 # dispersion
« >> d <- expand.grid(
« >> su = paste0("S",1:nsu),
« >> rep = 1:nrep,
« >> mu =NA,
« >> rt = NA)
« >>
« >> set.seed(125)
« >> # generate 200 Gamma distributed participant means
« >> mu.su <- sort(rGamma(nsu, shape=1/phi.su, scale=phi.su*mu))
« >> d$mu <- mu.su[d$su]
« >> # add Gamma noise
« >> d$rt <- rGamma(nsu*nrep,shape=1/phi,scale=phi*d$mu)
« >>
« >> # fit intercept GLMM model (gives warning messages)
« >> fit <- glmer(rt ~ 1 + (1|su), data=d, family=Gamma(link="identity"))
« >>
« >> # histogram of participant means and response times
« >> par(mfrow=c(1,2))
« >> hist(mu.su,n=50)
« >> hist(d$rt,n=200,xlim=c(0,10))
« >>
« >> > all_fit(fit)
« >> bobyqa. : [OK]
« >> Nelder_Mead. : [OK]
« >> optimx.nlminb : [OK]
« >> optimx.L-BFGS-B : [OK]
« >> nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
« >> nloptwrap.NLOPT_LN_BOBYQA : [OK]
« >> nmkbw. : [OK]
« >> ....
« >> were 13 warnings (use warnings() to see them)
« >> 1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
« >> ... :
« >> Model failed to converge with max|grad| = 0.215844 (tol = 0.001,
« >> component 1)
« >> 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
« >> ... :
« >> Model is nearly unidentifiable: very large eigenvalue
« >> - Rescale variables?
« >> 3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
« >> ... :
« >> Model failed to converge with max|grad| = 0.220808 (tol = 0.001,
« >> component 1)
« >> 4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
« >> ... :
« >> Model is nearly unidentifiable: very large eigenvalue
« >> - Rescale variables?
« >> 5: In optwrap(optimizer, devfun, start, rho$lower, control = control, ... :
« >> convergence code 1 from optimx
« >> 6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
« >> ... :
« >> Model failed to converge with max|grad| = 0.215713 (tol = 0.001,
« >> component 1)
« >> 7: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
« >> ... :
« >> Model is nearly unidentifiable: very large eigenvalue
« >> - Rescale variables?
« >> 8: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
« >> ... :
« >> Model failed to converge with max|grad| = 0.102562 (tol = 0.001,
« >> component 1)
« >> 9: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
« >> ... :
« >> Model failed to converge with max|grad| = 0.109779 (tol = 0.001,
« >> component 1)
« >> 10: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
« >> control$checkConv, ... :
« >> Model failed to converge with max|grad| = 0.109779 (tol = 0.001,
« >> component 1)
« >> 11: In optwrap(optimizer, devfun, start, rho$lower, control = control,
« >> ... :
« >> convergence code 2 from nmkbw
« >> 12: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
« >> control$checkConv, ... :
« >> Model failed to converge with max|grad| = 0.220811 (tol = 0.001,
« >> component 1)
« >> 13: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
« >> control$checkConv, ... :
« >> Model is nearly unidentifiable: very large eigenvalue
« >> - Rescale variables?
« >>
« >>
« >> Thank you,
« >>
« >> Gabriel
« >>
« >> --
« >> ---------------------------------------------------------------------
« >> Gabriel Baud-Bovy tel.: (+39) 348 172 4045 (mobile)
« >> UHSR University (+39) 02 2643 3429 (laboratory)
« >> via Olgettina, 58 (+39) 02 9175 1540 (secretary)
« >> 20132 Milan, Italy email: gabriel.baud-***@hsr.it
« >> ---------------------------------------------------------------------
« >>
« >>
« >>
« >> Rispetta l’ambiente: non stampare questa mail se non è necessario.
« >> Respect the environment: print this email only if necessary.
« >> _______________________________________________
« >> R-sig-mixed-***@r-project.org mailing list
« >> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
« >>
« > --
« >
« > _____________________________________________________________________
« >
« > Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts; Gerichtsstand: Hamburg | www.uke.de
« > Vorstandsmitglieder: Prof. Dr. Burkhard Göke (Vorsitzender), Prof. Dr. Dr. Uwe Koch-Gromus, Joachim Prölß, Martina Saurin (komm.)
« > _____________________________________________________________________
« >
« > SAVE PAPER - THINK BEFORE PRINTING
«
«
« --
« ---------------------------------------------------------------------
« Gabriel Baud-Bovy tel.: (+39) 348 172 4045 (mobile)
« UHSR University (+39) 02 2643 3429 (laboratory)
« via Olgettina, 58 (+39) 02 9175 1540 (secretary)
« 20132 Milan, Italy email: gabriel.baud-***@hsr.it
« ---------------------------------------------------------------------
«
« _______________________________________________
« R-sig-mixed-***@r-project.org mailing list
« https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Emmanuel CURIS
***@parisdescartes.fr

Page WWW: http://emmanuel.curis.online.fr/index.html
Ben Bolker
2018-10-25 23:45:45 UTC
Permalink
Please keep r-sig-mixed-models in the Cc: .

I don't immediately see any way to allow a correlation between the
intercept terms and the X-covariates, nor even why that would
necessarily make sense. I do recall that there's some stuff in the
causal inference literature (of special interest to economists)
surrounding this assumption, and how one can center covariates by group
to address the issue, but I can't remember where it is/point you to it
at the moment. Perhaps someone else on the mailing list can help ...
Hi Ben,
Thanks for your response. For the second question in your reply to this
email, it is my mistake. I meant to incorporate correlation between the
random intercept and X-covariates, not their beta coefficients. Is there
a way to do that?
Regards
Yashree-
Post by Yashree Mehta
Hi,
Is there literature on how to specify the dependency structure
between the
Post by Yashree Mehta
random intercept and the statistical noise error term in a random
intercept
Post by Yashree Mehta
model?
It would be useful to also know how to implement using R...
  Can you be more specific about what you want?  Suppose you have
observations j within groups i, and you have an epsilon_{0,ij} for each
observation (error term) and an epsilon_"1,i} for each group (random
intercept).  Typically the epsilon_{0,ij} values are iid with
homogeneous variance sigma_0^2, and epsilon_{1,i} are iid with variance
sigma_1^2.  What kind of correlation structure are you looking for?
===
I am working with a random intercept model. I have the usual "X" vector
of covariates and one id variable which will make up the random
intercept. For example,
Response variable: Production of maize
Covariate: Size of plot
ID variable: Household_ID
I need to acknowledge that there is correlation between the FIXED EFFECT
coefficient of plot size and the estimated random intercept. It is my
model assumption.
Does lme4 assume this correlation or do I have to make changes in the
formula so that it gets considered?
===
  The short answer to this one is "no", I think -- I don't know that
there's a way to allow for correlation between fixed effect coefficients
and random intercepts. (This actually seems like a weird question to me;
in the frequentist world, as far as I know, you can only specify
correlation models for *random variables* within the model.  In the
context of LMM fitting, I don't think parameters are random effects in
this sense.
Post by Yashree Mehta
Thank you
Yashree
       [[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Poe, John
2018-10-26 00:30:48 UTC
Permalink
Here's a link to my previous post on the topic of model specification and
endogeneity (correlation between X and the grouping structure). You
basically either do group mean centering or group varying coefficients.
However, those aren't bullet proof depending on your data structure.

Also, i did a thread on twitter a while back that is related and links some
additional readings. I have a link to my advanced multilevel syllabus for
the icpsr program on the Twitter thead and it has a pretty decent reading
list on there.

https://stat.ethz.ch/pipermail/r-sig-mixed-models/2016q4/025150.html

https://twitter.com/DavidPoe223/status/1009485620337692674?s=09
Post by Ben Bolker
Please keep r-sig-mixed-models in the Cc: .
I don't immediately see any way to allow a correlation between the
intercept terms and the X-covariates, nor even why that would
necessarily make sense. I do recall that there's some stuff in the
causal inference literature (of special interest to economists)
surrounding this assumption, and how one can center covariates by group
to address the issue, but I can't remember where it is/point you to it
at the moment. Perhaps someone else on the mailing list can help ...
Hi Ben,
Thanks for your response. For the second question in your reply to this
email, it is my mistake. I meant to incorporate correlation between the
random intercept and X-covariates, not their beta coefficients. Is there
a way to do that?
Regards
Yashree-
Post by Yashree Mehta
Hi,
Is there literature on how to specify the dependency structure
between the
Post by Yashree Mehta
random intercept and the statistical noise error term in a random
intercept
Post by Yashree Mehta
model?
It would be useful to also know how to implement using R...
Can you be more specific about what you want? Suppose you have
observations j within groups i, and you have an epsilon_{0,ij} for
each
observation (error term) and an epsilon_"1,i} for each group (random
intercept). Typically the epsilon_{0,ij} values are iid with
homogeneous variance sigma_0^2, and epsilon_{1,i} are iid with
variance
sigma_1^2. What kind of correlation structure are you looking for?
===
I am working with a random intercept model. I have the usual "X"
vector
of covariates and one id variable which will make up the random
intercept. For example,
Response variable: Production of maize
Covariate: Size of plot
ID variable: Household_ID
I need to acknowledge that there is correlation between the FIXED
EFFECT
coefficient of plot size and the estimated random intercept. It is my
model assumption.
Does lme4 assume this correlation or do I have to make changes in the
formula so that it gets considered?
===
The short answer to this one is "no", I think -- I don't know that
there's a way to allow for correlation between fixed effect
coefficients
and random intercepts. (This actually seems like a weird question to
me;
in the frequentist world, as far as I know, you can only specify
correlation models for *random variables* within the model. In the
context of LMM fitting, I don't think parameters are random effects
in
this sense.
Post by Yashree Mehta
Thank you
Yashree
[[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
Yashree Mehta
2018-11-06 18:11:08 UTC
Permalink
Hi,

Regarding the question on dependency structure, is there a way to allow for
the possibility of the error term and random intercept being correlated? I
need to define the covariance matrix between these two terms and estimate
the values which should go into this matrix.

Thank you

Regards,
Yashree
Post by Yashree Mehta
Post by Yashree Mehta
Hi,
Is there literature on how to specify the dependency structure between
the
Post by Yashree Mehta
random intercept and the statistical noise error term in a random
intercept
Post by Yashree Mehta
model?
It would be useful to also know how to implement using R...
Can you be more specific about what you want? Suppose you have
observations j within groups i, and you have an epsilon_{0,ij} for each
observation (error term) and an epsilon_"1,i} for each group (random
intercept). Typically the epsilon_{0,ij} values are iid with
homogeneous variance sigma_0^2, and epsilon_{1,i} are iid with variance
sigma_1^2. What kind of correlation structure are you looking for?
===
I am working with a random intercept model. I have the usual "X" vector
of covariates and one id variable which will make up the random
intercept. For example,
Response variable: Production of maize
Covariate: Size of plot
ID variable: Household_ID
I need to acknowledge that there is correlation between the FIXED EFFECT
coefficient of plot size and the estimated random intercept. It is my
model assumption.
Does lme4 assume this correlation or do I have to make changes in the
formula so that it gets considered?
===
The short answer to this one is "no", I think -- I don't know that
there's a way to allow for correlation between fixed effect coefficients
and random intercepts. (This actually seems like a weird question to me;
in the frequentist world, as far as I know, you can only specify
correlation models for *random variables* within the model. In the
context of LMM fitting, I don't think parameters are random effects in
this sense.
Post by Yashree Mehta
Thank you
Yashree
[[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
Poe, John
2018-11-06 18:25:12 UTC
Permalink
Just to clarify, you mean that you want to specify a correlation structure
between the individual level error term in the model (also called the
residuals) and the random intercept or group-level error.

This doesn't make a lot of sense to me because the random intercept is
literally the product of a decomposition of the general model's error
structure into the within group (R matrix) and between group (G matrix)
components of the error. They are uncorrelated by construction. The only
way that they could possibly be correlated would be if you had an
exchangability problem in the random effects structure. You could have a
fuzzy boundaries issue like US counties are correlated by space. But you
wouldn't solve that by correlating the lower level error term with the
random intercept. You'd build a group boundary spatial weights matrix and
include it in the model.

I must be missing something in the translation.
Post by Yashree Mehta
Hi,
Regarding the question on dependency structure, is there a way to allow for
the possibility of the error term and random intercept being correlated? I
need to define the covariance matrix between these two terms and estimate
the values which should go into this matrix.
Thank you
Regards,
Yashree
Post by Yashree Mehta
Post by Yashree Mehta
Hi,
Is there literature on how to specify the dependency structure between
the
Post by Yashree Mehta
random intercept and the statistical noise error term in a random
intercept
Post by Yashree Mehta
model?
It would be useful to also know how to implement using R...
Can you be more specific about what you want? Suppose you have
observations j within groups i, and you have an epsilon_{0,ij} for each
observation (error term) and an epsilon_"1,i} for each group (random
intercept). Typically the epsilon_{0,ij} values are iid with
homogeneous variance sigma_0^2, and epsilon_{1,i} are iid with variance
sigma_1^2. What kind of correlation structure are you looking for?
===
I am working with a random intercept model. I have the usual "X" vector
of covariates and one id variable which will make up the random
intercept. For example,
Response variable: Production of maize
Covariate: Size of plot
ID variable: Household_ID
I need to acknowledge that there is correlation between the FIXED EFFECT
coefficient of plot size and the estimated random intercept. It is my
model assumption.
Does lme4 assume this correlation or do I have to make changes in the
formula so that it gets considered?
===
The short answer to this one is "no", I think -- I don't know that
there's a way to allow for correlation between fixed effect coefficients
and random intercepts. (This actually seems like a weird question to me;
in the frequentist world, as far as I know, you can only specify
correlation models for *random variables* within the model. In the
context of LMM fitting, I don't think parameters are random effects in
this sense.
Post by Yashree Mehta
Thank you
Yashree
[[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Thanks,
John


John Poe, Ph.D.
Postdoctoral Scholar / Research Methodologist
Center for Public Health Services & Systems Research
University of Kentucky
www.johndavidpoe.com

[[alternative HTML version deleted]]
Yashree Mehta
2018-11-06 20:05:33 UTC
Permalink
thanks for your reply. I read about prediction theory that in the
application of BLUP, one can try to reparameterize a model with non-zero
variance-covariance matrix between the error term and the random intercept
into an equivalent model containing the random intercept and error term as
uncorrelated. Is this possible?
Post by Poe, John
Just to clarify, you mean that you want to specify a correlation structure
between the individual level error term in the model (also called the
residuals) and the random intercept or group-level error.
This doesn't make a lot of sense to me because the random intercept is
literally the product of a decomposition of the general model's error
structure into the within group (R matrix) and between group (G matrix)
components of the error. They are uncorrelated by construction. The only
way that they could possibly be correlated would be if you had an
exchangability problem in the random effects structure. You could have a
fuzzy boundaries issue like US counties are correlated by space. But you
wouldn't solve that by correlating the lower level error term with the
random intercept. You'd build a group boundary spatial weights matrix and
include it in the model.
I must be missing something in the translation.
Post by Yashree Mehta
Hi,
Regarding the question on dependency structure, is there a way to allow for
the possibility of the error term and random intercept being correlated? I
need to define the covariance matrix between these two terms and estimate
the values which should go into this matrix.
Thank you
Regards,
Yashree
Post by Yashree Mehta
Post by Yashree Mehta
Hi,
Is there literature on how to specify the dependency structure between
the
Post by Yashree Mehta
random intercept and the statistical noise error term in a random
intercept
Post by Yashree Mehta
model?
It would be useful to also know how to implement using R...
Can you be more specific about what you want? Suppose you have
observations j within groups i, and you have an epsilon_{0,ij} for each
observation (error term) and an epsilon_"1,i} for each group (random
intercept). Typically the epsilon_{0,ij} values are iid with
homogeneous variance sigma_0^2, and epsilon_{1,i} are iid with variance
sigma_1^2. What kind of correlation structure are you looking for?
===
I am working with a random intercept model. I have the usual "X" vector
of covariates and one id variable which will make up the random
intercept. For example,
Response variable: Production of maize
Covariate: Size of plot
ID variable: Household_ID
I need to acknowledge that there is correlation between the FIXED EFFECT
coefficient of plot size and the estimated random intercept. It is my
model assumption.
Does lme4 assume this correlation or do I have to make changes in the
formula so that it gets considered?
===
The short answer to this one is "no", I think -- I don't know that
there's a way to allow for correlation between fixed effect coefficients
and random intercepts. (This actually seems like a weird question to me;
in the frequentist world, as far as I know, you can only specify
correlation models for *random variables* within the model. In the
context of LMM fitting, I don't think parameters are random effects in
this sense.
Post by Yashree Mehta
Thank you
Yashree
[[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Thanks,
John
John Poe, Ph.D.
Postdoctoral Scholar / Research Methodologist
Center for Public Health Services & Systems Research
University of Kentucky
www.johndavidpoe.com
[[alternative HTML version deleted]]

Malcolm Fairbrother
2018-10-26 09:02:35 UTC
Permalink
Hi,

John Poe’s response to this just now is very helpful.

Yashree, you may find it helpful to look at some code I wrote to run simulations effectively implementing what John suggests (distinguishing between from within effects). A paper presenting the simulations is here <https://doi.org/10.1017/psrm.2013.24>, and the R code is in the supplementary material linked from that website. I used the “rmsn” function to generate random intercepts that are correlated with a group-level covariate—which I take it is the issue you want to deal with.

Other than using a “REWB” specification (see also https://doi.org/10.1007/s11135-018-0802-x), I’m not sure there’s much you can do.

Hope that helps,
Malcolm


Malcolm Fairbrother

Professor of Sociology, Umeå University<http://www.soc.umu.se/english/>
Researcher, Institute for Futures Studies<https://www.iffs.se/en/>
Sweden



Date: Thu, 25 Oct 2018 19:45:45 -0400
From: Ben Bolker <***@gmail.com<mailto:***@gmail.com>>
To: Yashree Mehta <***@gmail.com<mailto:***@gmail.com>>
Cc: "r-sig-mixed-***@r-project.org<mailto:r-sig-mixed-***@r-project.org>"
<r-sig-mixed-***@r-project.org<mailto:r-sig-mixed-***@r-project.org>>
Subject: Re: [R-sig-ME] Dependency structure
Message-ID: <a1cab4e4-514a-e18e-4e86-***@gmail.com<mailto:a1cab4e4-514a-e18e-4e86-***@gmail.com>>
Content-Type: text/plain; charset="utf-8"


Please keep r-sig-mixed-models in the Cc: .

I don't immediately see any way to allow a correlation between the
intercept terms and the X-covariates, nor even why that would
necessarily make sense. I do recall that there's some stuff in the
causal inference literature (of special interest to economists)
surrounding this assumption, and how one can center covariates by group
to address the issue, but I can't remember where it is/point you to it
at the moment. Perhaps someone else on the mailing list can help ...



On 2018-10-25 12:04 p.m., Yashree Mehta wrote:
Hi Ben,

Thanks for your response. For the second question in your reply to this
email, it is my mistake. I meant to incorporate correlation between the
random intercept and X-covariates, not their beta coefficients. Is there
a way to do that?

Regards
Yashree-

On Wed, Oct 17, 2018 at 2:37 AM Ben Bolker <***@gmail.com<mailto:***@gmail.com>
<mailto:***@gmail.com>> wrote:


Hi,

Is there literature on how to specify the dependency structure
between the
random intercept and the statistical noise error term in a random
intercept
model?
It would be useful to also know how to implement using R...


Can you be more specific about what you want? Suppose you have
observations j within groups i, and you have an epsilon_{0,ij} for each
observation (error term) and an epsilon_"1,i} for each group (random
intercept). Typically the epsilon_{0,ij} values are iid with
homogeneous variance sigma_0^2, and epsilon_{1,i} are iid with variance
sigma_1^2. What kind of correlation structure are you looking for?

While we're at it, you previously asked:

===
I am working with a random intercept model. I have the usual "X" vector
of covariates and one id variable which will make up the random
intercept. For example,

Response variable: Production of maize
Covariate: Size of plot
ID variable: Household_ID

I need to acknowledge that there is correlation between the FIXED EFFECT
coefficient of plot size and the estimated random intercept. It is my
model assumption.

Does lme4 assume this correlation or do I have to make changes in the
formula so that it gets considered?
===

The short answer to this one is "no", I think -- I don't know that
there's a way to allow for correlation between fixed effect coefficients
and random intercepts. (This actually seems like a weird question to me;
in the frequentist world, as far as I know, you can only specify
correlation models for *random variables* within the model. In the
context of LMM fitting, I don't think parameters are random effects in
this sense.

On 2018-10-16 01:03 PM, Yashree Mehta wrote:


Thank you

Yashree

[[alternative HTML version deleted]]


[[alternative HTML version deleted]]
Loading...