[R-sig-ME] A theoretical question : usage of AIC and similar information criteria for mixed models

Discussion:

Nik Tuzov

2018-11-01 18:32:21 UTC

Hello all:
I was wondering if you could comment on this theoretical question.
You are probably familiar with the book by Burnham & Anderson:

https://www.amazon.com/Kenneth-Burnham-Selection-Multi-Model-Information-Theoretic/dp/B008UBJ0VQ/ref=sr_1_2?s=books&ie=UTF8&qid=1540931659&sr=1-2&keywords=model+selection+and+multimodel+inference

1) They claim (Section 6.6.1) that AIC and similar information criteria cannot be used to compare models that have different random effects
because the number of effective parameters associated with a random effect is unknown. For instance, if there is a fixed categorical factor with K levels,
then the number of parameters associated with it is (K - 1). If that factor is labeled random, then one can say there is only one parameter,
the corresponding variance component sigma_K. However, most likely the number of "effective" parameters is somewhere between 1 and (K - 1).
Since we don't know what it is, AIC is not computable.

2) On the other hand, I believe any mixed model can be represented as Y = Xb + e, where X describes the fixed effects and e is the vector of error terms
whose variance-covariance matrix is defined by the random effects. Technically speaking, that model has no random effects, and the solution can be obtained using
Generalized Least Squares. Given all that, Burnham & Anderson essentially claim that AIC and similar criteria cannot be computed for GLS.

3) I find the conclusion in 2) hard to believe. In particular, AIC has been routinely used in time series analysis. A simple AR(1) model can be represented as Y = Xb + e
where Var[e] is not diagonal because all of the observations are correlated. If it's OK to use AIC for time series, why is it a problem with
GLS and mixed models?
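
To make points 2) and 3) concrete, here is a minimal Python sketch of the marginal (GLS-style) view for the simplest case, a balanced one-way random-intercept model y_ij = mu + b_j + e_ij. Marginally, Var[e] within a group is s2*I + sb2*J (compound symmetry), so the Gaussian log-likelihood has a closed form and an AIC can be computed by counting mu, sb2 and s2 as three parameters, much as rho is counted in AR(1). The function names and the toy data are hypothetical and not taken from the book or from any package; the closed-form ML estimates assume balanced groups.

```python
import math


def marginal_loglik(groups, mu, sb2, s2):
    """Gaussian log-likelihood of the marginal model with the random
    intercept integrated out: Var within a group = s2*I + sb2*J."""
    ll = 0.0
    for g in groups:
        n = len(g)
        resid = [y - mu for y in g]
        tau = s2 + n * sb2
        # det(s2*I + sb2*J) = s2^(n-1) * (s2 + n*sb2)
        logdet = (n - 1) * math.log(s2) + math.log(tau)
        # Sherman-Morrison inverse: (I - (sb2/tau) * J) / s2
        quad = (sum(r * r for r in resid) - sb2 / tau * sum(resid) ** 2) / s2
        ll -= 0.5 * (n * math.log(2 * math.pi) + logdet + quad)
    return ll


def fit_random_intercept(groups):
    """Closed-form ML fit (balanced data only); returns mu, sb2, s2, AIC."""
    k, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("closed-form ML estimates need balanced groups")
    total = k * n
    mu = sum(sum(g) for g in groups) / total
    means = [sum(g) / n for g in groups]
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ssb = n * sum((m - mu) ** 2 for m in means)
    s2 = ssw / (k * (n - 1))
    tau = ssb / k                      # ML estimate of s2 + n*sb2
    if tau < s2:                       # boundary solution: sb2 = 0
        s2 = tau = (ssw + ssb) / total
    sb2 = (tau - s2) / n
    ll = marginal_loglik(groups, mu, sb2, s2)
    return mu, sb2, s2, -2 * ll + 2 * 3   # parameters: mu, sb2, s2


def fit_iid(groups):
    """Same data, no grouping structure: parameters mu and s2 only."""
    ys = [y for g in groups for y in g]
    mu = sum(ys) / len(ys)
    s2 = sum((y - mu) ** 2 for y in ys) / len(ys)
    ll = marginal_loglik(groups, mu, 0.0, s2)
    return mu, s2, -2 * ll + 2 * 2


# Toy data (made up): 4 groups with 3 observations each.
groups = [[4.1, 5.0, 4.6], [6.2, 6.8, 7.1], [3.0, 2.4, 3.3], [5.5, 5.1, 4.9]]
print("random intercept, marginal AIC:", fit_random_intercept(groups)[3])
print("no grouping, AIC:", fit_iid(groups)[2])
```

Note that this AIC targets prediction for new groups (the population level). It does not answer the question in 1) of how many parameters the K group-level effects themselves are worth.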

Please let me know what you think.

Regards,
Nik Tuzov, PhD

Ben Bolker

2018-11-01 19:48:37 UTC

This is a good question; I think it's harder than you think.

https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#can-i-use-aic-for-mixed-models-how-do-i-count-the-number-of-degrees-of-freedom-for-a-random-effect

has some discussion, including references to the "conditional AIC"
which tries to compute an effective number of parameters based on the
degree of shrinkage. The key question is whether we're trying to
evaluate expected predictive accuracy at the level of the population
(as typically assumed in the time series/GLS literature) or the
individual group (as Burnham and Anderson seem to be assuming).
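
As a rough illustration of "effective number of parameters based on the degree of shrinkage", the sketch below uses the balanced one-way random-intercept model y_ij = mu + b_j + e_ij with the variances sb2 and s2 treated as known (a simplifying assumption; the conditional AIC literature also deals with estimated variances). In that case the fitted values are a linear function of y, and the trace of the corresponding hat matrix is 1 + (K - 1) * shrink, with shrink = n*sb2 / (n*sb2 + s2). The function names are hypothetical, and the count includes the intercept, so it runs from 1 (complete pooling) to K (groups treated as fixed).

```python
def effective_params(k, n, sb2, s2):
    """Closed-form trace of the hat matrix: intercept plus shrunken
    group effects, for k balanced groups of size n."""
    shrink = n * sb2 / (n * sb2 + s2)
    return 1 + (k - 1) * shrink


def shrunken_fitted(groups, sb2, s2):
    """Fitted values mu_hat + b_hat_j: each group mean is pulled toward
    the grand mean by the shrinkage factor (balanced data assumed)."""
    k, n = len(groups), len(groups[0])
    shrink = n * sb2 / (n * sb2 + s2)
    grand = sum(sum(g) for g in groups) / (k * n)
    return [[grand + shrink * (sum(g) / n - grand)] * n for g in groups]


def hat_trace(k, n, sb2, s2):
    """Numerical check: the fit is linear in y, so feeding in the unit
    vector e_i returns column i of the hat matrix; sum the diagonal."""
    trace = 0.0
    for j in range(k):
        for i in range(n):
            unit = [[0.0] * n for _ in range(k)]
            unit[j][i] = 1.0
            trace += shrunken_fitted(unit, sb2, s2)[j][i]
    return trace


# K = 10 groups of n = 5, residual variance 1, increasing group variance.
for sb2 in (0.0, 0.1, 1.0, 100.0):
    print(sb2, effective_params(10, 5, sb2, 1.0), hat_trace(10, 5, sb2, 1.0))
```

With sb2 = 0 the groups are pooled and only the intercept counts; as sb2 grows relative to s2/n the count approaches K, which is what fitting the factor as fixed would use.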


D. Rizopoulos

2018-11-01 21:24:55 UTC

There are several definitions of AIC for mixed models, and it matters at which level you want to do the selection, i.e., for the implied marginal model (fixed effects alone) or the hierarchical model (fixed and random effects).

For the latter, have a look at the cAIC4 package (https://cran.r-project.org/package=cAIC4).

Best,
Dimitris

_______________________________________________
R-sig-mixed-***@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models