[R-sig-ME] P-value associated to explanatory from glmer binomial family

Discussion:

Mario Garrido

2018-05-23 11:14:59 UTC

Dear lme4-users,
I am trying to get the P-value associated with a glmer model from the
binomial family.
My model is the following:
glmer(Infection.status~origin+ (1|donationID), family=binomial)->q7H

where Infection status is a dummy variable with two levels, infected and
uninfected
I tried to get the P-value associated to the the explanatory variable origin
but I get only the F-value and the degrees of freedom

(aov <- anova(q7H))
Analysis of Variance Table
Df Sum Sq Mean Sq F value
origin 2 5.3061 2.6531 2.6531

I have 2 different questions
1. Am I doing correctly or am I using an incorrect command?

2. with the F-value I get and the df, should I go to test the significance
to a F or Chi-squared table? I guess I should go to the latest since I am
running a binomial test, right?
In case I have to go to an F table, how can I know the numerator and
denominator degrees of freedom?

Thanks in advance

[[alternative HTML version deleted]]

Thierry Onkelinx

2018-05-24 08:55:32 UTC

Permalink

Dear Mario,

Calculating the degrees of freedom of a mixed model is not straightforward.

A workaround would be to use a likelihoodratio test between two nested
models: one with and one without the variable. See the example below.

library(lme4)
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
anova(gm1)
gm0 <- glmer(cbind(incidence, size - incidence) ~ (1 | herd),
data = cbpp, family = binomial)
anova(gm1, gm0)

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
***@inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>

Post by Mario Garrido
Dear lme4-users,
I am trying to get the P-value associated with a glmer model from the
binomial family.
glmer(Infection.status~origin+ (1|donationID), family=binomial)->q7H
where Infection status is a dummy variable with two levels, infected and
uninfected
I tried to get the P-value associated to the the explanatory variable origin
but I get only the F-value and the degrees of freedom
(aov <- anova(q7H))
Analysis of Variance Table
Df Sum Sq Mean Sq F value
origin 2 5.3061 2.6531 2.6531
I have 2 different questions
1. Am I doing correctly or am I using an incorrect command?
2. with the F-value I get and the df, should I go to test the significance
to a F or Chi-squared table? I guess I should go to the latest since I am
running a binomial test, right?
In case I have to go to an F table, how can I know the numerator and
denominator degrees of freedom?
Thanks in advance
[[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

[[alternative HTML version deleted]]

Mario Garrido

2018-05-25 08:13:34 UTC

Permalink

Dear Thierry,
thanks so much for the clarification. After I run the LRT I get those
results

gm1 <- glmer(Mycoplasma~ type+ (1|donor.number), family=binomial)

anova(gm1)

Analysis of Variance Table

Df Sum Sq Mean Sq F value

type 1 3.2124 3.2124 3.2124

gm0<- glmer(Mycoplasma~ 1+ (1|donor.number), family=binomial)

anova(gm1, gm0)

Data: NULL

Models:

gm0: Mycoplasma ~ 1 + (1 | donor.number)

gm1: Mycoplasma ~ type + (1 | donor.number)

Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)

gm0 2 46.492 49.869 -21.246 42.492
gm1 3 44.901 49.968 -19.451 38.901 3.5905 1 0.05811 .

The F-value associated to type, my only explanatory variable, is 3.124, as
anova(gm1) shows above

1. So which values should I take to calculate the P-value associated to the
variable type? 2 and 3 as shows anova(gm1, gm0)
Is like that then?

1-pf(3.2124,2,3)

[1] 0.1795865

or

1-pf(3.2124,3,2)

[1] 0.246378

2. anova(gm1, gm0) give a P-value associated of 0.058, since I have only
one explanatory variable, is not this value the defining the significance
of this variable (the ine that makes the difference between the 2 models)

3. What is in this case the F-value and df provided by anova(gm1)?

Srry, I am a little confused with the results. Thanks!

Post by Thierry Onkelinx
Dear Mario,
Calculating the degrees of freedom of a mixed model is not straightforward.
A workaround would be to use a likelihoodratio test between two nested
models: one with and one without the variable. See the example below.
library(lme4)
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
anova(gm1)
gm0 <- glmer(cbind(incidence, size - incidence) ~ (1 | herd),
data = cbpp, family = binomial)
anova(gm1, gm0)
Best regards,
ir. Thierry Onkelinx
Statisticus / Statistician
Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
Havenlaan 88
<https://maps.google.com/?q=Havenlaan+88&entry=gmail&source=g> bus 73,
1000 Brussel
www.inbo.be
////////////////////////////////////////////////////////////
///////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
////////////////////////////////////////////////////////////
///////////////////////////////
<https://www.inbo.be>

[[alternative HTML version deleted]]

Ben Bolker

2018-05-25 18:45:58 UTC

Permalink

What Thierry is explaining is that F tests don't really work for
GLM(M)s. (F tests are based on the uncertainty of the estimate of the
residual variance; typical (binomial & Poisson) GLMs don't estimate a
residual variance at all. There is a theory of "Bartlett corrections",
which are finite-size corrections for GLMs, but they're not widely
used.)

1. pf(Fstat,df,lower.tail=FALSE) (equivalent to your 1-pf(...)
calculation, but more accurate for small p-values) would be reasonable
*if* the "F statistic" presented by anova() were actually
F-distributed with a known df, but as Thierry said, it's not. The "2"
and "3" given by anova() are the *model* ("numerator") degrees of
freedom, not the *residual* (denominator) df, which are unknown/hard
to compute.

2. 0.058 is indeed the p-value associated with the 'type' variable,
since that's the only difference between the models

3. The df doesn't really exist here. I *think* you can get to the
equivalent F statistic reported in the anova() from the information
given here, but I'd have to think about it for 5 minutes ... but since
you're not going to run an F test anyway, it doesn't matter too much
...

Post by Mario Garrido
Dear Thierry,
thanks so much for the clarification. After I run the LRT I get those
results
gm1 <- glmer(Mycoplasma~ type+ (1|donor.number), family=binomial)
anova(gm1)
Analysis of Variance Table
Df Sum Sq Mean Sq F value
type 1 3.2124 3.2124 3.2124
gm0<- glmer(Mycoplasma~ 1+ (1|donor.number), family=binomial)
anova(gm1, gm0)
Data: NULL
gm0: Mycoplasma ~ 1 + (1 | donor.number)
gm1: Mycoplasma ~ type + (1 | donor.number)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
gm0 2 46.492 49.869 -21.246 42.492
gm1 3 44.901 49.968 -19.451 38.901 3.5905 1 0.05811 .
The F-value associated to type, my only explanatory variable, is 3.124, as
anova(gm1) shows above
1. So which values should I take to calculate the P-value associated to the
variable type? 2 and 3 as shows anova(gm1, gm0)
Is like that then?
1-pf(3.2124,2,3)
[1] 0.1795865
or
1-pf(3.2124,3,2)
[1] 0.246378
2. anova(gm1, gm0) give a P-value associated of 0.058, since I have only
one explanatory variable, is not this value the defining the significance
of this variable (the ine that makes the difference between the 2 models)
3. What is in this case the F-value and df provided by anova(gm1)?
Srry, I am a little confused with the results. Thanks!

[[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models