Discussion:
[R-sig-ME] MCMCglmm models and (quasi-)complete separation
Jasmin Herden
2018-05-24 12:16:54 UTC
Permalink
Dear fellow R users,

I have recently started using the MCMCglmm R package to analyse some of my
problematic
data which severely suffers from (quasi)complete separation.

I have followed Ben Bolker's suggestions of zero-mean Normal priors on the
fixed effects to analyse such kinds of data.
(https://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html)

My model is:

k<-8 #number of the fixed effects
#Intercept+single effects+interactions

prior.c <- list(B=list(V=diag(9,k), mu=rep(0,k)),
R=list(V=1,fix=1),
G=list(G1=list(V=1, nu=1,alpha.mu=0, alpha.V=1000),
G2=list(V=1,nu=1,alpha.mu=0, alpha.V=1000),
G3=list(V=1,nu=1,alpha.mu=0, alpha.V=1000)))

nsamp <- 10000
THIN <- 900
BURNIN <- 10000
NITT <- BURNIN + THIN*nsamp
model3 = MCMCglmm(survival~
Site*b*c,
random=~x+Field+Field_block,
data=dset,
slice=TRUE,
pl=T,
prior=prior.c,
family="categorical",verbose=FALSE,
nitt=NITT,burnin=BURNIN,thin=THIN)

Survival is a binary value of 0 or 1 and is observed only once per
experimental plant.
Therefore the observation-level variance R is fixed to 1. (As in the linked
example.)

Site, b, and c are two-level categorical variables. x is crossed with Field
and Field_block, but Field_block is nested within Field.

Models are run for each species separately.

My questions are:

a) Many worked examples which I based my own analysis on use the
Gelman-Rubin
criterion where you check the convergence of your model by running it a
number of times and then compare models.

However, I think the MCMCglmm vignette said to start the model running with
overdispersed priors which is definitely not an option for me with the kind
of data I have.

I have tried using the testing for the Gelman-Rubin criterion nonetheless,
but the Gelman diagnostic plots do not show a oscillating line that finally
converges on a value but
rather clines and straigt lines.

b) I am also not quite sure, if the value R is fixed at is appropiate for
all models I run. For some
models, I still get latent variable values bigger than 20, even at very
high numbers of iterations.

c) How do you decide to use family="categorical" (=logit link) or "ordinal"
(=probit link)?
Based on the DIC of the models?

d) For many of my models, the explained variance for the random effects
Field and Field_block are very high; sometimes reaching an upper estimate
of 99%.
I think the problem is that Field_block is not only nested in Field but
that Field is also
nested in the categorical fixed effect Site.
Is my model overparametrized with regard to Field, since I have nearly
complete survival in one of the two levels of Site?

Kind regards,
Jasmin

[[alternative HTML version deleted]]
David Duffy
2018-05-25 06:58:40 UTC
Permalink
Dear Jasmin. I, at least, would need to see some kind of data to understand your comments re separation etc. Does a simpler model run eg dropping field_block and simplifying the fixed effects? And does such a model run in lmer or other programs? Diagnosing a nonidentified model (ie some parameters in your model may not be estimable given your pattern of observations) in MCMC is hard.
________________________________________
From: R-sig-mixed-models [r-sig-mixed-models-***@r-project.org] on behalf of Jasmin Herden [***@gmail.com]
Sent: Thursday, 24 May 2018 10:16 PM
To: r-sig-mixed-***@r-project.org
Subject: Re: [R-sig-ME] MCMCglmm models and (quasi-)complete separation

Dear fellow R users,

I have recently started using the MCMCglmm R package to analyse some of my
problematic
data which severely suffers from (quasi)complete separation.

I have followed Ben Bolker's suggestions of zero-mean Normal priors on the
fixed effects to analyse such kinds of data.
(https://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html)

My model is:

k<-8 #number of the fixed effects
#Intercept+single effects+interactions

prior.c <- list(B=list(V=diag(9,k), mu=rep(0,k)),
R=list(V=1,fix=1),
G=list(G1=list(V=1, nu=1,alpha.mu=0, alpha.V=1000),
G2=list(V=1,nu=1,alpha.mu=0, alpha.V=1000),
G3=list(V=1,nu=1,alpha.mu=0, alpha.V=1000)))

nsamp <- 10000
THIN <- 900
BURNIN <- 10000
NITT <- BURNIN + THIN*nsamp
model3 = MCMCglmm(survival~
Site*b*c,
random=~x+Field+Field_block,
data=dset,
slice=TRUE,
pl=T,
prior=prior.c,
family="categorical",verbose=FALSE,
nitt=NITT,burnin=BURNIN,thin=THIN)

Survival is a binary value of 0 or 1 and is observed only once per
experimental plant.
Therefore the observation-level variance R is fixed to 1. (As in the linked
example.)

Site, b, and c are two-level categorical variables. x is crossed with Field
and Field_block, but Field_block is nested within Field.

Models are run for each species separately.

My questions are:

a) Many worked examples which I based my own analysis on use the
Gelman-Rubin
criterion where you check the convergence of your model by running it a
number of times and then compare models.

However, I think the MCMCglmm vignette said to start the model running with
overdispersed priors which is definitely not an option for me with the kind
of data I have.

I have tried using the testing for the Gelman-Rubin criterion nonetheless,
but the Gelman diagnostic plots do not show a oscillating line that finally
converges on a value but
rather clines and straigt lines.

b) I am also not quite sure, if the value R is fixed at is appropiate for
all models I run. For some
models, I still get latent variable values bigger than 20, even at very
high numbers of iterations.

c) How do you decide to use family="categorical" (=logit link) or "ordinal"
(=probit link)?
Based on the DIC of the models?

d) For many of my models, the explained variance for the random effects
Field and Field_block are very high; sometimes reaching an upper estimate
of 99%.
I think the problem is that Field_block is not only nested in Field but
that Field is also
nested in the categorical fixed effect Site.
Is my model overparametrized with regard to Field, since I have nearly
complete survival in one of the two levels of Site?

Kind regards,
Jasmin

[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-***@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Jasmin Herden
2018-05-25 12:05:33 UTC
Permalink
Dear David,

a)


*Dear Jasmin. I, at least, would need to see some kind of data to
understand your comments re separation etc.*
Here are the first few lines of the data set.


> head(dsetMailingList) Site Field Species PlantID native.neophyte Treatment Seed_family Origin survival_to_harvest Sp_Sf Field_Subplot
355 Potsdam DB Dat Dat10CK neophyte CON
E10 K 1 Dat.E10 DB _ C
356 Potsdam MQ Dat Dat10CK neophyte CON
E10 K 0 Dat.E10 MQ _ B
357 Potsdam GR Dat Dat10CK neophyte CON
E10 K 1 Dat.E10 GR _ B
358 Potsdam GR Dat Dat10CP neophyte CON
10 P 0 Dat.10 GR _ B
359 Potsdam DB Dat Dat10CP neophyte CON
10 P 1 Dat.10 DB _ A
360 Potsdam MQ Dat Dat10CP neophyte CON
10 P 1 Dat.10 MQ _ B


>

b) *Does a simpler model run eg dropping field_block and simplifying the
fixed effects? And does such a model run in lmer or other programs?*

I had originally started with a binomial glmer model, but in most species
this lead to ridiculously large standard errors. These large SE values are
indicative of quasi-complete separation (i.e. nearly complete 0 or 1 for
one factor level). (See also the following link:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q3/004172.html).

I have tried using blglmer as suggested by Ben Bolker in the example
referenced in my first post to the mailing list. In principle, all models
with complete separation looked good with bglmer.
However, my intended approach in glmer was to take the full model and get
likelihood ratio tests and corresponding p-values for the fixed effects by
step-wise model
reduction. Theoretically, this is also possible with bglmer, but Ben Bolker
advised against using bglmer for more than just an overall model check in
several posts. Furthermore, I am not quite sure if or how I need to adjust
the priors in bglmer during step-wise model reduction or if step-wise model
reduction is appropiate for bglmer at all. Ben Bolker recommended using the
MCMCglmm package instead of bglmer for a proper analysis.

Due to my approach of testing all fixed effects (Site, Origin, and
Treatment, and interactions) in my local adaptation study, reducing the
fixed effects is not really an option.

Kind regards,
Jasmin Herden




On Fri, May 25, 2018 at 8:58 AM, David Duffy <***@qimrberghofer.
edu.au> wrote:

> Does a simpler model run eg dropping field_block and simplifying the
> fixed effects? And does such a model run in lmer or other programs?
> Diagnosing a nonidentified model (ie some parameters in your model may not
> be estimable given your pattern of observations) in MCMC is hard.
> ________________________________________
> From: R-sig-mixed-models [r-sig-mixed-models-***@r-project.org] on
> behalf of Jasmin Herden [***@gmail.com]
> Sent: Thursday, 24 May 2018 10:16 PM
> To: r-sig-mixed-***@r-project.org
> Subject: Re: [R-sig-ME] MCMCglmm models and (quasi-)complete separation
>
> Dear fellow R users,
>
> I have recently started using the MCMCglmm R package to analyse some of my
> problematic
> data which severely suffers from (quasi)complete separation.
>
> I have followed Ben Bolker's suggestions of zero-mean Normal priors on the
> fixed effects to analyse such kinds of data.
> (https://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html)
>
> My model is:
>
> k<-8 #number of the fixed effects
> #Intercept+single effects+interactions
>
> prior.c <- list(B=list(V=diag(9,k), mu=rep(0,k)),
> R=list(V=1,fix=1),
> G=list(G1=list(V=1, nu=1,alpha.mu=0, alpha.V=1000),
> G2=list(V=1,nu=1,alpha.mu=0, alpha.V=1000),
> G3=list(V=1,nu=1,alpha.mu=0, alpha.V=1000)))
>
> nsamp <- 10000
> THIN <- 900
> BURNIN <- 10000
> NITT <- BURNIN + THIN*nsamp
> model3 = MCMCglmm(survival~
> Site*b*c,
> random=~x+Field+Field_block,
> data=dset,
> slice=TRUE,
> pl=T,
> prior=prior.c,
> family="categorical",verbose=FALSE,
> nitt=NITT,burnin=BURNIN,thin=THIN)
>
> Survival is a binary value of 0 or 1 and is observed only once per
> experimental plant.
> Therefore the observation-level variance R is fixed to 1. (As in the linked
> example.)
>
> Site, b, and c are two-level categorical variables. x is crossed with Field
> and Field_block, but Field_block is nested within Field.
>
> Models are run for each species separately.
>
> My questions are:
>
> a) Many worked examples which I based my own analysis on use the
> Gelman-Rubin
> criterion where you check the convergence of your model by running it a
> number of times and then compare models.
>
> However, I think the MCMCglmm vignette said to start the model running with
> overdispersed priors which is definitely not an option for me with the kind
> of data I have.
>
> I have tried using the testing for the Gelman-Rubin criterion nonetheless,
> but the Gelman diagnostic plots do not show a oscillating line that finally
> converges on a value but
> rather clines and straigt lines.
>
> b) I am also not quite sure, if the value R is fixed at is appropiate for
> all models I run. For some
> models, I still get latent variable values bigger than 20, even at very
> high numbers of iterations.
>
> c) How do you decide to use family="categorical" (=logit link) or "ordinal"
> (=probit link)?
> Based on the DIC of the models?
>
> d) For many of my models, the explained variance for the random effects
> Field and Field_block are very high; sometimes reaching an upper estimate
> of 99%.
> I think the problem is that Field_block is not only nested in Field but
> that Field is also
> nested in the categorical fixed effect Site.
> Is my model overparametrized with regard to Field, since I have nearly
> complete survival in one of the two levels of Site?
>
> Kind regards,
> Jasmin
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-***@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

[[alternative HTML version deleted]]
Matthew Wolak
2018-05-25 14:48:59 UTC
Permalink
Dear Jasmin,

Have you had a look at the MCMCglmm Course Notes (apologies if you have)? Specifically in Ch. 2 (specifically pp.52-57, of my Nov. 14, 2016 version, if you want to jump right into some of the relevant chunks). That at least gives an example and works through some of the issues you are dealing with. In particular, when Jarrod explains setting a better prior on the fixed effects, he drops the models global intercept, which might be pretty helpful in your case.

Just a couple (~5) additional thoughts inserted to your original email below. Hope this helps!

Sincerely,
Matthew
________________________________________
From: R-sig-mixed-models <r-sig-mixed-models-***@r-project.org> on behalf of Jasmin Herden <***@gmail.com>
Sent: Friday, May 25, 2018 7:05 AM
To: r-sig-mixed-***@r-project.org; ***@qimrberghofer.edu.au
Subject: Re: [R-sig-ME] MCMCglmm models and (quasi-)complete separation

Dear David,

a)


*Dear Jasmin. I, at least, would need to see some kind of data to
understand your comments re separation etc.*
Here are the first few lines of the data set.


> head(dsetMailingList) Site Field Species PlantID native.neophyte Treatment Seed_family Origin survival_to_harvest Sp_Sf Field_Subplot
355 Potsdam DB Dat Dat10CK neophyte CON
E10 K 1 Dat.E10 DB _ C
356 Potsdam MQ Dat Dat10CK neophyte CON
E10 K 0 Dat.E10 MQ _ B
357 Potsdam GR Dat Dat10CK neophyte CON
E10 K 1 Dat.E10 GR _ B
358 Potsdam GR Dat Dat10CP neophyte CON
10 P 0 Dat.10 GR _ B
359 Potsdam DB Dat Dat10CP neophyte CON
10 P 1 Dat.10 DB _ A
360 Potsdam MQ Dat Dat10CP neophyte CON
10 P 1 Dat.10 MQ _ B


>

b) *Does a simpler model run eg dropping field_block and simplifying the
fixed effects? And does such a model run in lmer or other programs?*

I had originally started with a binomial glmer model, but in most species
this lead to ridiculously large standard errors. These large SE values are
indicative of quasi-complete separation (i.e. nearly complete 0 or 1 for
one factor level). (See also the following link:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q3/004172.html).

I have tried using blglmer as suggested by Ben Bolker in the example
referenced in my first post to the mailing list. In principle, all models
with complete separation looked good with bglmer.
However, my intended approach in glmer was to take the full model and get
likelihood ratio tests and corresponding p-values for the fixed effects by
step-wise model
reduction. Theoretically, this is also possible with bglmer, but Ben Bolker
advised against using bglmer for more than just an overall model check in
several posts. Furthermore, I am not quite sure if or how I need to adjust
the priors in bglmer during step-wise model reduction or if step-wise model
reduction is appropiate for bglmer at all. Ben Bolker recommended using the
MCMCglmm package instead of bglmer for a proper analysis.

Due to my approach of testing all fixed effects (Site, Origin, and
Treatment, and interactions) in my local adaptation study, reducing the
fixed effects is not really an option.

Kind regards,
Jasmin Herden




On Fri, May 25, 2018 at 8:58 AM, David Duffy <***@qimrberghofer.
edu.au> wrote:

> Does a simpler model run eg dropping field_block and simplifying the
> fixed effects? And does such a model run in lmer or other programs?
> Diagnosing a nonidentified model (ie some parameters in your model may not
> be estimable given your pattern of observations) in MCMC is hard.
> ________________________________________
> From: R-sig-mixed-models [r-sig-mixed-models-***@r-project.org] on
> behalf of Jasmin Herden [***@gmail.com]
> Sent: Thursday, 24 May 2018 10:16 PM
> To: r-sig-mixed-***@r-project.org
> Subject: Re: [R-sig-ME] MCMCglmm models and (quasi-)complete separation
>
> Dear fellow R users,
>
> I have recently started using the MCMCglmm R package to analyse some of my
> problematic
> data which severely suffers from (quasi)complete separation.
>
> I have followed Ben Bolker's suggestions of zero-mean Normal priors on the
> fixed effects to analyse such kinds of data.
> (https://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html)
>
> My model is:
>
> k<-8 #number of the fixed effects
> #Intercept+single effects+interactions
>

#### <<INSERT>> ####
# So if you removed the global intercept, k<-7

> prior.c <- list(B=list(V=diag(9,k), mu=rep(0,k)),
> R=list(V=1,fix=1),
> G=list(G1=list(V=1, nu=1,alpha.mu=0, alpha.V=1000),
> G2=list(V=1,nu=1,alpha.mu=0, alpha.V=1000),
> G3=list(V=1,nu=1,alpha.mu=0, alpha.V=1000)))
>
> nsamp <- 10000
> THIN <- 900
> BURNIN <- 10000
> NITT <- BURNIN + THIN*nsamp


#### <<INSERT>> ####
# You may want to consider a larger thinning interval (and for trial runs of the model, nsamp~1000 will suffice)
## I can't tell you an exact number, but it could be THIN <- 5000 or 10000
# These specifications also partly depend on the size of your dataset and the number of levels of fixed
## and random effects and the distribution of 0s/1s across these; which we cannot get from `head()`




> model3 = MCMCglmm(survival~
> Site*b*c,
> random=~x+Field+Field_block,
> data=dset,
> slice=TRUE,
> pl=T,
> prior=prior.c,
> family="categorical",verbose=FALSE,
> nitt=NITT,burnin=BURNIN,thin=THIN)
>
> Survival is a binary value of 0 or 1 and is observed only once per
> experimental plant.
> Therefore the observation-level variance R is fixed to 1. (As in the linked
> example.)
>
> Site, b, and c are two-level categorical variables. x is crossed with Field
> and Field_block, but Field_block is nested within Field.
>
> Models are run for each species separately.
>
> My questions are:
>
> a) Many worked examples which I based my own analysis on use the
> Gelman-Rubin
> criterion where you check the convergence of your model by running it a
> number of times and then compare models.
>
> However, I think the MCMCglmm vignette said to start the model running with
> overdispersed priors which is definitely not an option for me with the kind
> of data I have.
>
> I have tried using the testing for the Gelman-Rubin criterion nonetheless,
> but the Gelman diagnostic plots do not show a oscillating line that finally
> converges on a value but
> rather clines and straigt lines.


#### <<INSERT>> ####
# I'm not sure to what part of the MCMCglmm vignette you are referring, but to use this
## Approach with multiple models, you will need to specify starting values.
## Otherwise, each model might proceed along the same chain (if I'm not mistaken) and you just get
## n of the same model/posterior
# To specify starting values:

# I can't remember if it is necessary, but it would be good to set the random seed for each model
m <- 1 #<-- model=1
set.seed(101+m)
#XXX Make sure matches prior in structure
# nfx <- number of liabilities
startN <- list(liab = rnorm(nfx, 0, 12), # Need to specify the liabilities
B = list(V=diag(20,k), mu=rep(0,k)),
R = list(R1 = rIW(diag(1), 15, fix = 1)), # where I have 15, tailor to the values you are getting
G = list(G1 = rIW(diag(1), 15),
G2 = rIW(diag(1), 15),
G3 = rIW(diag(1), 15)))
# Then in MCMCglmm()
## MCMCglmm(...., start = startN)




>
> b) I am also not quite sure, if the value R is fixed at is appropiate for
> all models I run. For some
> models, I still get latent variable values bigger than 20, even at very
> high numbers of iterations.
>


#### <<INSERT>> ####
# fixing R=1 is fairly standard for family="categorical".



> c) How do you decide to use family="categorical" (=logit link) or "ordinal"
> (=probit link)?
> Based on the DIC of the models?
>
> d) For many of my models, the explained variance for the random effects
> Field and Field_block are very high; sometimes reaching an upper estimate
> of 99%.
> I think the problem is that Field_block is not only nested in Field but
> that Field is also
> nested in the categorical fixed effect Site.
> Is my model overparametrized with regard to Field, since I have nearly
> complete survival in one of the two levels of Site?

#### <<INSERT>> ####
# You might need to make sure the model you specify is converging OK, before
## deciding about your model specification based on the parameters it has estimated



>
> Kind regards,
> Jasmin
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-***@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

[[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-***@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Loading...