Jasmin Herden
2018-05-24 12:16:54 UTC
Dear fellow R users,
I have recently started using the MCMCglmm R package to analyse some problematic data that suffer severely from (quasi-)complete separation.
Following Ben Bolker's suggestion, I have placed zero-mean Normal priors on the fixed effects to handle this kind of data
(https://ms.mcmaster.ca/~bolker/R/misc/foxchapter/bolker_chap.html).
My model is:
library(MCMCglmm)

k <- 8  # number of fixed effects: intercept + main effects + interactions of Site*b*c
prior.c <- list(B = list(V = diag(9, k), mu = rep(0, k)),
                R = list(V = 1, fix = 1),
                G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000),
                         G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000),
                         G3 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000)))
nsamp <- 10000
THIN <- 900
BURNIN <- 10000
NITT <- BURNIN + THIN * nsamp
model3 <- MCMCglmm(survival ~ Site * b * c,
                   random = ~ x + Field + Field_block,
                   data = dset,
                   slice = TRUE,
                   pl = TRUE,
                   prior = prior.c,
                   family = "categorical",
                   verbose = FALSE,
                   nitt = NITT, burnin = BURNIN, thin = THIN)
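A quick consistency check on these settings, sketched with a hypothetical toy data frame (`toy`) in place of `dset`: the fixed-effect prior needs one mean and one variance per column of the design matrix for `survival ~ Site*b*c`, and the number of stored samples follows from `nitt`, `burnin` and `thin`.

```r
# Hypothetical toy data standing in for dset: Site, b and c as two-level factors
toy <- expand.grid(Site = c("A", "B"), b = c("lo", "hi"), c = c("no", "yes"))

# Fixed-effect design matrix for Site*b*c:
# intercept + 3 main effects + 3 two-way + 1 three-way interaction
X <- model.matrix(~ Site * b * c, data = toy)
k <- ncol(X)

# Zero-mean Normal prior with variance 9 (SD 3) on every coefficient
B_prior <- list(V = diag(9, k), mu = rep(0, k))

# Iteration arithmetic from the settings above
nsamp  <- 10000
THIN   <- 900
BURNIN <- 10000
NITT   <- BURNIN + THIN * nsamp
n_kept <- (NITT - BURNIN) / THIN   # posterior samples stored after burn-in

k       # 8
NITT    # 9010000
n_kept  # 10000
```

Deriving `k` from `model.matrix()` rather than typing it by hand keeps the prior dimension in step with the formula if terms are added or dropped.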
Survival is binary (0 or 1) and is observed only once per experimental plant, so the observation-level (residual) variance R cannot be estimated from the data and is fixed at 1, as in the linked example.
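One consequence of fixing R at 1 rather than 0 is that the logit-scale fixed effects are conditional on that unit-level residual. The MCMCglmm course notes describe an approximation for converting to population-averaged probabilities: divide the linear predictor by sqrt(1 + c2 * V), with c2 = (16 * sqrt(3) / (15 * pi))^2 and V the sum of the variance components being averaged over. The sketch below assumes that approximation, uses a hypothetical linear predictor `eta = 2`, averages over the units variance only, and compares the result with direct numerical integration.

```r
# Approximation constant for the logit-normal integral (from the MCMCglmm course notes)
c2 <- (16 * sqrt(3) / (15 * pi))^2

# Population-averaged probability for a logit-scale linear predictor `eta`,
# averaging over normal random effects with total variance `total_var`
marginal_prob <- function(eta, total_var) {
  plogis(eta / sqrt(1 + c2 * total_var))
}

eta <- 2                                   # hypothetical linear predictor
approx <- marginal_prob(eta, total_var = 1) # units variance fixed at 1

# Direct numerical integration over e ~ N(0, 1) for comparison
exact <- integrate(function(e) plogis(eta + e) * dnorm(e), -Inf, Inf)$value

round(c(conditional = plogis(eta), approx = approx, exact = exact), 3)
```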
Site, b and c are two-level categorical variables. The grouping factor x is crossed with Field and Field_block, whereas Field_block is nested within Field.
Models are run for each species separately.
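Whether one grouping factor is nested in another can be checked directly from the data: Field_block is nested in Field if every block occurs in exactly one field, and the same test applies to Field within Site. A minimal sketch, using a hypothetical helper `is_nested()` and a hypothetical toy layout (2 sites, 2 fields per site, 2 blocks per field):

```r
# TRUE if every level of `inner` occurs within exactly one level of `outer`
is_nested <- function(inner, outer) {
  tab <- table(inner, outer)
  all(rowSums(tab > 0) == 1)
}

# Hypothetical layout; block labels are unique across fields
toy <- data.frame(
  Site        = rep(c("S1", "S2"), each = 8),
  Field       = rep(c("F1", "F2", "F3", "F4"), each = 4),
  Field_block = rep(paste0("B", 1:8), each = 2)
)

is_nested(toy$Field_block, toy$Field)  # TRUE
is_nested(toy$Field, toy$Site)         # TRUE
is_nested(toy$Site, toy$Field)         # FALSE
```

Because a term such as `random = ~ Field_block` groups observations by factor label, block labels that repeat across fields (for example block "1" in every field) would be treated as crossed rather than nested; giving each block a unique label avoids this.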
My questions are:
a) Many of the worked examples on which I based my analysis check convergence with the Gelman-Rubin criterion, i.e. by running the model several times and comparing the resulting chains.
However, I think the MCMCglmm vignette recommends starting these runs from overdispersed priors, which is definitely not an option with the kind of data I have.
I have tried the Gelman-Rubin diagnostic nonetheless, but the Gelman diagnostic plots do not show a line that oscillates and finally converges on a value; instead they show sloping and straight lines.
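For reference, the Gelman-Rubin statistic compares the variance within each chain with the variance between chains, and it is meant to be computed from several chains started at dispersed starting values (in MCMCglmm these are supplied through the `start` argument, not through the prior). The sketch below implements only the basic, non-split version of the statistic in base R and applies it to simulated draws; `psrf()` is a hypothetical helper, not a package function.

```r
# Basic Gelman-Rubin potential scale reduction factor for one parameter.
# `chains` is an n x m matrix: n post-burn-in iterations from each of m chains.
psrf <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))     # mean within-chain variance
  B <- n * var(colMeans(chains))       # between-chain variance
  var_hat <- (n - 1) / n * W + B / n   # pooled estimate of posterior variance
  sqrt(var_hat / W)
}

set.seed(1)
# Hypothetical draws: four chains sampling the same distribution ...
mixed <- matrix(rnorm(4000), ncol = 4)
# ... and four chains stuck around different values
stuck <- sweep(matrix(rnorm(4000, sd = 0.1), ncol = 4), 2, c(-2, -1, 1, 2), "+")

psrf(mixed)  # close to 1
psrf(stuck)  # far above 1
```

`coda::gelman.diag()` computes a refined version of this statistic (with a degrees-of-freedom correction) from an `mcmc.list`, which could be assembled from the `Sol` or `VCV` components of several separately fitted models, so its values will differ slightly from this basic version.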
b) I am also not sure whether the value at which R is fixed is appropriate for all the models I run. For some models I still get latent variable values larger than 20, even with very large numbers of iterations.
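When the model is fitted with `pl = TRUE`, the sampled latent variables are stored in the `Liab` component of the fitted object, one column per observation. On the logit scale a latent value of 20 already corresponds to a probability within about 2e-9 of 1, which is why such values indicate that the sampler is pushing towards the boundary. A sketch of how the share of extreme values per observation could be tabulated, using a simulated matrix in place of `model3$Liab` and a hypothetical helper `extreme_share()`:

```r
# Share of sampled latent values beyond +/- `limit`, per observation.
# `liab` is a samples x observations matrix, as in model$Liab when pl = TRUE.
extreme_share <- function(liab, limit = 20) {
  colMeans(abs(liab) > limit)
}

1 - plogis(20)   # distance of the inverse-logit of 20 from 1 (about 2e-9)

set.seed(2)
# Hypothetical latent draws: 1000 samples for 5 observations; the last one
# drifts towards very large values, as can happen under separation
liab <- cbind(matrix(rnorm(4000, sd = 2), ncol = 4),
              rnorm(1000, mean = 25, sd = 3))

share <- extreme_share(liab)
round(share, 3)
which(share > 0.05)   # observations exceeding the limit in >5% of samples
```

With the real model this would be called as `extreme_share(as.matrix(model3$Liab))`.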
c) How do you decide between family="categorical" (logit link) and family="ordinal" (probit link)?
Based on the DIC of the models?
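Some background on how the two links relate: after rescaling they describe almost the same curve. The logistic CDF evaluated at 1.702 * x stays within about 0.01 of the standard normal CDF everywhere, so logit-scale coefficients are roughly 1.7 times their probit-scale counterparts and are not directly comparable; the links differ mainly in the extreme tails, where the logistic is heavier. A quick numerical check in base R:

```r
x <- seq(-6, 6, by = 0.001)

# Largest gap between the rescaled logistic CDF and the normal CDF (probit)
max_gap <- max(abs(plogis(1.702 * x) - pnorm(x)))
max_gap

# Far in the upper tail the logistic assigns much more probability
c(logit = 1 - plogis(1.702 * 4), probit = 1 - pnorm(4))
```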
d) For many of my models, the proportion of variance explained by the random effects Field and Field_block is very high, with upper estimates sometimes reaching 99%.
I think the problem is that Field_block is not only nested in Field, but Field is also nested in the categorical fixed effect Site.
Is my model overparametrized with regard to Field, given that survival is nearly complete in one of the two levels of Site?
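On the logit scale, the proportion of latent variance attributable to a random effect is commonly computed as its variance divided by the sum of all variance components (including the units variance fixed at 1) plus pi^2/3, the variance of the standard logistic distribution. Doing this for each posterior sample gives a posterior for the proportion itself. A sketch with simulated draws standing in for `model3$VCV`; the column names `x`, `Field`, `Field_block` and `units` are assumed to match the model formula, and `latent_share()` is a hypothetical helper.

```r
set.seed(3)
n <- 2000
# Hypothetical posterior draws of the variance components
vcv <- cbind(
  x           = rgamma(n, shape = 2, rate = 4),
  Field       = rgamma(n, shape = 2, rate = 0.1),
  Field_block = rgamma(n, shape = 2, rate = 1),
  units       = 1                      # fixed at 1 by the prior
)

# Per-sample proportion of logit-scale latent variance for one component
latent_share <- function(vcv, component) {
  vcv[, component] / (rowSums(vcv) + pi^2 / 3)
}

field_share <- latent_share(vcv, "Field")
quantile(field_share, c(0.025, 0.5, 0.975))
```

With the real model this would be `latent_share(as.matrix(model3$VCV), "Field")`.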
Kind regards,
Jasmin