Hi Meng,
Is it possible for you to model the raw data (the counts of alleles within each individual site/assay/etc) rather than the calculated allele frequency (= proportion*)? If you have access to this data it would simplify your model greatly. I guess the reason for having zeroes is that there were no copies of an allele in a subpopulation (site, etc), giving an estimated proportion of zero? If so, you need a model with a binomial distribution for the responses (e.g. cbind(n.minor.allele, n.major.allele)**, assuming two alleles), and you could have a beta distribution for the allele frequencies. Then the model will see zeroes as being drawn from a binomial distribution with a non-zero allele frequency. This is the Balding-Nichols model, which gives a natural way of gauging the strength of random effects via f, which is an analogue of Wright’s Fst (which measures correlation between alleles in a subpopulation relative to the total population):
p.sub ~ Beta(shape1 = (1 - f) / f * p, shape2 = (1 - f) / f * (1 - p))
…for 0 < f < 1. p is the mean allele frequency, p.sub are the subpopulation (site, assay, etc) allele frequencies.
And the distribution of the number of alleles in subpopulation i is:
n.minor.allele[i] ~ Binom(2n[i], p.sub[i])
...n[i] is the number of diploid individuals in subpopulation i.
Alternatively, I guess there’s nothing to stop you having normally distributed allele frequencies on the logit scale, in which case you could fit the model with glmer.
Best wishes,
Paul
*It’s a stats/population genetics language difference — for statisticians frequency means count, while for population geneticists an allele frequency is a proportion (I have a foot in each camp).
**Pooling alleles in this way, i.e. ignoring the individual level, assumes HWE. If you don’t want to do that you can treat a diploid individual as a level nested within site/assay/etc, and estimate an individual level f, which is analogous to Fis, the inbreeding coefficient.
Post by Meng LiuHi Ben and Guilaume,
Thank you for reply. I am working on a precision experiment design, in
which a sample will be tested using different assay, by different operator
at different site. The measurement is allele frequency of DNA, which is a
continuous proportion outcome. I originally plan to run a beta distribution
random effects model, among which assay, operator and site are all random
factors. However, because I found there are some zeros in the response
data, that's why I am trying to run a zero-inflated beta random effects
model, with random factors in both zero part and non zero part. I.e., we
assume there will be variance from each factor in terms of predicting zero,
and variance from each factor in terms of continuous data. However, the
final research question would be evaluating the total variance contributed
from each factor.I can see here it is more complex for just generalized
linear model because of random effects from two different models. I am
wondering if you have any idea on this or do you know anybody who might
have thoughts on this?
Thank you again for all help!
Best regards,
Meng
Post by Ben BolkerI'm not sure how this (variance decomposition based on a
zero-inflated model) would work.
What is your subject-area/scientific question?
On Wed, Jun 13, 2018 at 4:26 AM, Guillaume Chaumet
Post by Guillaume ChaumetMy bad, I replied to you the first time without including the list.
Regarding your last question, perhaps the list and/or Ben could
provide a more accurate answer than me.
I'm also curious to know how glmmTMB could do that
Hi Guillaume,
Thank you so much for this! I just have another question: for example
if I
Post by Guillaume Chaumethave random factor A and B in both logistic model part and beta model
part,
Post by Guillaume Chaumetthen after I fit the whole model and got variance component estimation
of
Post by Guillaume Chaumetrandom effect for factor A and B for both logistic model part and beta
model
Post by Guillaume Chaumetmodel part, will there be any way to combine variance together? I.e. I
can
Post by Guillaume Chaumetestimate a total variance from factor A, and a total variance from
factor B
Post by Guillaume Chaumet(i.e. only differ by factor, not model)? Something like variance
decomposition but I believe here is more complex as this is a mixture
model.
Post by Guillaume ChaumetThank you again for all your help
Best regards,
Meng
On Sun, Jun 10, 2018 at 11:03 AM, Guillaume Chaumet
https://urldefense.proofpoint.com/v2/url?u=https-3A__cran.r-
2Dproject.org_web_packages_brms_index.html&d=DwIBaQ&c=
clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=
Ij73g98b5MaGitndhxmoIw&m=Uy-z_keMG1SfZG-g8FxVqzfz-Ghl2OHun7TY7tfexwo&s=
Gfi89kd1PSimpIhWBglYPuJRn3_FF_uNBGvzVDvWe4A&e=
Post by Guillaume ChaumetPost by Meng LiuTo whom it may concern,
I am trying to fit a model for a data among which the response value
is
Post by Guillaume ChaumetPost by Meng Liuwithin [0,1). I am thinking about fitting the zeros as a complete separate
category from the non-zero data, i.e. a binomial (Bernoulli) model to "==0
vs >0" and a Beta model to the >0 responses. Also, my data contains
both
Post by Guillaume ChaumetPost by Meng Liunested factors and crossed factors, which means I need to add nested random
effects and crossed random effects to both logistic model part and
beta
Post by Guillaume ChaumetPost by Meng Liumodel model. However, I didn't find any R packages can do exactly
what I
Post by Guillaume ChaumetPost by Meng Liuwant (By far I found gamlss, glmmTMB, zoib but they either can only assume
random zero or they can only fit repeated measures/clustered data but not
nested and crossed design). Therefore, I am wondering if any one
know if
Post by Guillaume ChaumetPost by Meng Liuthere is any available package or function can do this.
Thank you very much for your help!
Best regards
Meng
[[alternative HTML version deleted]]
_______________________________________________
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.
ethz.ch_mailman_listinfo_r-2Dsig-2Dmixed-2Dmodels&d=DwIBaQ&c=
clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=
Ij73g98b5MaGitndhxmoIw&m=Uy-z_keMG1SfZG-g8FxVqzfz-Ghl2OHun7TY7tfexwo&s=
FMNtOORgf7OlXhD5m8VHoGCnuWlt5NLqtXxalxQOhQw&e=
Post by Guillaume Chaumet_______________________________________________
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.
ethz.ch_mailman_listinfo_r-2Dsig-2Dmixed-2Dmodels&d=DwIFaQ&c=
clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=Ij73g98b5MaGitndhxmoIw&m=
2BcoGwnjTH2lMtoXsQy3PazIJQlKthmPnzhLeUAe8Iw&s=
ni4tfiCBB6TzaOnxJfYkjSxo5Ztgo3sx1FyYE-Qc2AY&e=
[[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models