[R-sig-ME] Question about continuous distributions in GLMM

Discussion:

Victoria Ortiz

2018-04-26 23:45:28 UTC

I am writing to ask a simple question about distributions for continuous
quantitative variables. We have data for morphological traits in insects, but
they do not fit any distribution we have tried in a GLMM. The design has two
fixed variables and a random one. We are interested in the variance components
of the random variable and its interactions. We tried normal (lme4), gamma
(glmer), lognormal (glmmPQL), Tweedie (glmmTMB) and compound Poisson (cplm).
None of them fits well. In fact, the best model by AIC is the normal one. The
residuals vs. predicted plot and the Q-Q plot look like this:
https://github.com/vicrotas/Repositorio-de-Vicka/issues/1

Given that the fit to normal distribution is not good, we want to know if
there is any other distribution we could try. What else can we do in this
scenario?

On the other hand, to estimate the variance components we used the
following in lmer:

m1 <- lmer(variable ~ fixed_factor_1 * fixed_factor_2 +
  (fixed_factor_1 * fixed_factor_2 || random_factor))

The specific question is whether the double bar ('||') is a good way to
estimate the variance components, or whether there is another way to do it.

Thanks in advance!

Ben Bolker

2018-04-27 20:08:00 UTC

Post by Victoria Ortiz
I am writing to ask a simple question about distributions for continuous
quantitative variables. We have data for morphological traits in insects, but
they do not fit any distribution we have tried in a GLMM. The design has two
fixed variables and a random one. We are interested in the variance components
of the random variable and its interactions. We tried normal (lme4), gamma
(glmer), lognormal (glmmPQL), Tweedie (glmmTMB) and compound Poisson (cplm).
None of them fits well. In fact, the best model by AIC is the normal one. The
residuals vs. predicted plot and the Q-Q plot look like this:
https://github.com/vicrotas/Repositorio-de-Vicka/issues/1

I'm not quite sure what to suggest about the distribution. Since this
looks left-skewed, you might try a power transformation with exponent g > 1
(e.g. x^1.5) to reduce the skew. (That would be applied to the data rather than the
residuals, so might not work perfectly ...) For a rough idea, you could
run a Box-Cox analysis on the residuals.
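
As a rough sketch of the Box-Cox idea above: the snippet below uses MASS::boxcox on an ordinary lm fit with the fixed effects only, which is a simplification (it ignores the random factor and profiles the transformation of the response, not of the residuals). The data frame `dat`, the factors `f1` and `f2`, and the simulated left-skewed `trait` are hypothetical stand-ins, not the poster's data.

```r
library(MASS)  # provides boxcox()

set.seed(1)
# Hypothetical stand-in data: a strictly positive, left-skewed trait
n <- 200
dat <- data.frame(
  f1 = gl(2, n / 2, labels = c("a", "b")),
  f2 = gl(2, 1, n, labels = c("x", "y"))
)
dat$trait <- 5 + 5 * rbeta(n, 5, 1.5) + 0.3 * (dat$f1 == "b")

# Fixed-effects-only fit; Box-Cox requires a positive response
fit0 <- lm(trait ~ f1 * f2, data = dat)

# Profile the log-likelihood over a grid of exponents (no plot)
bc <- boxcox(fit0, lambda = seq(-2, 6, by = 0.1), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)

# Apply the selected transformation (log when lambda is effectively 0)
dat$trait_bc <- if (abs(lambda_hat) < 1e-8) {
  log(dat$trait)
} else {
  (dat$trait^lambda_hat - 1) / lambda_hat
}
```

The transformed column could then be used as the response in the mixed model, and the residual plots checked again.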

Alternatively, if you can figure out a permutation approach that works
(e.g. permutation within and between groups) that could give you a
distribution-robust way to get a p-value.

Post by Victoria Ortiz
Given that the fit to normal distribution is not good, we want to know if
there is any other distribution we could try. What else can we do in this
scenario?
On the other hand, to estimate the variance components we used the
following in lmer:
m1 <- lmer(variable ~ fixed_factor_1 * fixed_factor_2 +
  (fixed_factor_1 * fixed_factor_2 || random_factor))
The specific question is whether the double bar ('||') is a good way to
estimate the variance components, or whether there is another way to do it.

Can you clarify what you mean by "variance components"? Are you
explicitly trying to partition variance, or are you just trying to make
sure that you control for among-group variation?

If your data will support it, I think it would be better to fit the
unstructured variance-covariance matrix; if not, you could try one of
the Bayesian methods (blme, MCMCglmm, brms, rstanarm ...) that would
allow you to regularize/put a prior on the variance-covariance matrix.
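
To make the contrast between the double-bar form and the unstructured variance-covariance matrix concrete, here is a minimal sketch. It uses the `sleepstudy` data that ships with lme4 as a substitute for the poster's data (an assumption made only so the example runs). One caveat from the lme4 documentation: the `||` syntax only removes correlations for numeric predictors, so with factor predictors like those in the original question it does not give independent terms.

```r
library(lme4)

# Unstructured: intercept variance, slope variance and their covariance
m_full <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Double bar: same variances, covariance fixed at zero
m_diag <- lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)

print(VarCorr(m_full))
print(VarCorr(m_diag))

# Fixed effects are identical, so the REML fits can be compared directly
cmp <- anova(m_diag, m_full, refit = FALSE)
print(cmp)
```

If the unstructured fit is singular or fails to converge, that is a sign the data may not support it, which is where the regularizing Bayesian options mentioned above come in.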

Post by Victoria Ortiz
Thanks in advance!

Rune Haubo

2018-05-09 18:41:20 UTC

Post by Victoria Ortiz
Hi,
I'm sorry for the delay in responding; I have had a lot of work.
By "variance components" I mean the partition of the total variance among
the different factors that explain it. Our interest is to quantify the
portion of the variance explained by the different factors, both random and
fixed. Translated to the biology of our data, this means estimating the
genetic, genotype x environment, and environmental components of the total
phenotypic variation for a given trait in a population. In particular, the
objective is to compare these estimates between different populations
analyzed separately.
Additionally, reading other threads on this mailing list, I found that the
classical model for testing the interaction and obtaining the variance
components is:
m2 <- lmer(variable ~ fixed_factor_1 * fixed_factor_2 + (1 | random_factor) +
  (1 | fixed_factor_1:random_factor) + (1 | fixed_factor_2:random_factor) +
  (1 | fixed_factor_1:fixed_factor_2:random_factor))
So, with this model, I can see the partition of the total variance of the
random effects in the summary. Is this right?

Yes, this model will decompose the variance of the response into
variance components for the random effects and the residual variance.
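
A minimal sketch of reading that decomposition out of a fitted model, assuming a balanced design with replicates in every cell. The data are simulated and the names `dat`, `trait`, `f1`, `f2`, `g` and the helper `draw` are hypothetical. The proportions are shares of the random-effect plus residual variance only; variance attributable to the fixed effects is not included.

```r
library(lme4)

set.seed(42)
# Hypothetical balanced design: 12 genotypes x 2 x 2 treatments x 5 replicates
dat <- expand.grid(rep = 1:5, f1 = c("a", "b"), f2 = c("x", "y"),
                   g = paste0("g", 1:12))

# Helper: draw one normal deviate per level of a grouping factor
draw <- function(f, sd) rnorm(nlevels(f), sd = sd)[as.integer(f)]

dat$trait <- 10 + 0.5 * (dat$f1 == "b") +
  draw(dat$g, 1.0) +
  draw(interaction(dat$f1, dat$g), 0.7) +
  draw(interaction(dat$f2, dat$g), 0.5) +
  draw(interaction(dat$f1, dat$f2, dat$g), 0.5) +
  rnorm(nrow(dat), sd = 1.0)

# Same structure as m2 in the post above
m2 <- lmer(trait ~ f1 * f2 + (1 | g) + (1 | f1:g) + (1 | f2:g) +
             (1 | f1:f2:g), data = dat)

# One row per variance component, plus the residual
vc <- as.data.frame(VarCorr(m2))
vc$prop <- vc$vcov / sum(vc$vcov)
print(vc[, c("grp", "vcov", "prop")], digits = 3)
```

With few levels of the random factor, some components may be estimated as zero (a singular fit), so the proportions should be read with that in mind.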

Post by Victoria Ortiz
Finally, if I want p-values for the random effects, I should compare the
full and reduced models sequentially. I also found that another way to do
it is with the 'ranova' function from the lmerTest package, but the results
are very dissimilar. I don't know which analysis I should trust; I think
that in this case the sequential one is correct.

Can you quantify how these approaches are different? If you run
lmerTest::ranova(m2) it should provide (REML) likelihood ratio tests
of the random terms by deleting these from the full model one-by-one.
Note that if the model is fitted with REML (default) the tests are
REML-likelihood ratio tests - otherwise ML likelihood ratio tests.

Perhaps you use anova(m2, reduce_m2) or equivalently anova(m2,
reduce_m2, refit=TRUE) which produce ML likelihood ratio tests while
fitting your model with REML and that is the source of the difference?
[For tests of random effect terms I recommend the REML likelihood
ratio tests produced by lmerTest::ranova over the ML LR tests produced
by anova(m2, reduce_m2, refit=TRUE) but other tools, e.g. package
RLRsim may produce even more accurate tests].
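
The difference described above can be checked with a small sketch. It uses the `Penicillin` data from lme4 as a stand-in (an assumption, chosen only because it has two crossed random terms); `m_full` and `m_red` are illustrative names.

```r
library(lme4)
library(lmerTest)  # ranova(); masks lmer() with its own wrapper

# Full model and a reduced model without the (1 | sample) term, both REML
m_full <- lmer(diameter ~ 1 + (1 | plate) + (1 | sample), data = Penicillin)
m_red  <- lmer(diameter ~ 1 + (1 | plate), data = Penicillin)

# REML likelihood ratio tests, dropping each random term in turn
ra <- ranova(m_full)
print(ra)

# Manual comparison that keeps the REML fits
lrt_reml <- anova(m_full, m_red, refit = FALSE)
print(lrt_reml)

# Default behaviour: both models are refitted with ML before the test
lrt_ml <- anova(m_full, m_red)
print(lrt_ml)
```

The `(1 | sample)` row of `ra` should agree with `lrt_reml`, while `lrt_ml` will generally give a different statistic.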

Cheers
Rune