Anne Lerche
2018-09-21 14:54:21 UTC
Good afternoon,
I have a problem with reporting the significance of B-spline components
in a mixed-effects logistic regression fit in lme4 (prompted by a
reviewer's comment on a paper). After several hours of searching the
literature, forums and the internet more generally, I have not found a
solution and therefore turn to the recipients of this mailing list for
help. (My questions are at the very end of the mail)
I am trying to model the change in the use of a linguistic variable on
the basis of corpus data. My dataset contains the binary dependent
variable (DV, variant "a" or "b" being used), 2 random-effect grouping
variables (RV1 and RV2, both categorical) and three predictors (IV1=time,
IV2=another numeric variable, IV3=a categorical variable with 7 levels).
I wasn't sure if I should attach my (modified) dataset, so I'm trying
to produce an example. Unfortunately, it doesn't give the same results
as my original dataset.
library(lme4)
library(splines)
library(languageR)
df <- dative[dative$Modality == "spoken",]
df <- df[,c("RealizationOfRecipient", "Verb", "Speaker",
"LengthOfTheme", "SemanticClass")]
colnames(df) <- c("DV", "RV1", "RV2", "IV2", "IV3")
set.seed(130)
df$IV1 <- sample(1:13, 2360, replace = TRUE)
My final regression model looks like this (treatment contrast coding):
fin.mod <- glmer(DV~bs(IV1, knots=c(5,9), degree=1)+IV2+IV3+(1|RV1)+(1|RV2),
data=df, family=binomial)
summary(fin.mod, corr=FALSE)
where the effect of IV1 is modelled as a linear B-spline (degree 1)
with 2 interior knots. Anova comparisons (on the original dataset) show
that this model performs significantly better than a) a model that
omits the IV1 spline term (bs(IV1, knots=c(5,9), degree=1)), b) a model
with IV1 as a linear predictor (not using bs), c) a model with the df
of the spline specified instead of the knots (df=3), so that bs
chooses the knots itself, and d) a model with only 2 df (bs(IV1,
df=2, degree=1)). I also ran comparisons with models with quadratic or
cubic splines, and my final model still performs significantly better.
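To make comparisons a) and b) concrete, here is a minimal sketch on the sample data (not my original dataset, so the numbers will differ). The model names mod.noIV1 and mod.linIV1 are only illustrative, and I use nrow(df) in place of the hard-coded 2360. Both reduced models are nested in the final model, so anova() gives a likelihood ratio test of the spline term as a whole.

```r
library(lme4)
library(splines)
library(languageR)

# Rebuild the example data from above (sample data, not the original dataset).
df <- dative[dative$Modality == "spoken", ]
df <- df[, c("RealizationOfRecipient", "Verb", "Speaker",
             "LengthOfTheme", "SemanticClass")]
colnames(df) <- c("DV", "RV1", "RV2", "IV2", "IV3")
set.seed(130)
df$IV1 <- sample(1:13, nrow(df), replace = TRUE)

# Final model: IV1 as a linear B-spline with knots at 5 and 9.
fin.mod <- glmer(DV ~ bs(IV1, knots = c(5, 9), degree = 1) + IV2 + IV3 +
                   (1 | RV1) + (1 | RV2),
                 data = df, family = binomial)

# a) Reduced model without any IV1 term (illustrative name).
mod.noIV1 <- glmer(DV ~ IV2 + IV3 + (1 | RV1) + (1 | RV2),
                   data = df, family = binomial)

# b) Reduced model with IV1 as a plain linear predictor (illustrative name).
mod.linIV1 <- glmer(DV ~ IV1 + IV2 + IV3 + (1 | RV1) + (1 | RV2),
                    data = df, family = binomial)

# Likelihood ratio tests: the first tests all three spline coefficients
# jointly, the second tests the two extra parameters beyond a straight line.
lrt.noIV1 <- anova(mod.noIV1, fin.mod)
lrt.linIV1 <- anova(mod.linIV1, fin.mod)
print(lrt.noIV1)
print(lrt.linIV1)
```

Comparisons c) and d) are left out of the sketch because those models are not nested in the final model in the same way.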
The problem is that I am reporting this final model in a paper, and
one of the reviewers comments that I am reporting a non-significant
effect of IV1 because according to the coefficients table the variable
does not seem to have a significant effect (outlier correction does
not make a big difference):
Fixed effects:
                                      Estimate Std. Error z value Pr(>|z|)
(Intercept)                            0.52473    0.50759   1.034    0.301
bs(IV1, knots = c(5, 9), degree = 1)1 -0.93178    0.59162  -1.575    0.115
bs(IV1, knots = c(5, 9), degree = 1)2  0.69287    0.43018   1.611    0.107
bs(IV1, knots = c(5, 9), degree = 1)3 -0.19389    0.61144  -0.317    0.751
IV2                                    0.47041    0.11615   4.050 5.12e-05 ***
IV3level2                              0.30149    0.53837   0.560    0.575
IV3level3                              0.15682    0.48760   0.322    0.748
IV3level4                             -0.89664    0.18656  -4.806 1.54e-06 ***
IV3level5                             -2.90305    0.68119  -4.262 2.03e-05 ***
IV3level6                             -0.32081    0.29438  -1.090    0.276
IV3level7                             -0.07038    0.87727  -0.080    0.936
(coefficients table of the sample dataset will differ)
I know that the results of anova comparisons and what the coefficients
table shows are two different things (as in the case of IV3, which also
significantly improves model quality when added to the regression even
though only a few levels show significant contrasts).
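To see what the three spline coefficients in the table refer to, the basis that bs() builds for this specification can be printed directly. This is only a sketch under the assumption that IV1 takes the values 1 to 13, as in the sample data; the row and column labels are added purely for readability. As far as I understand, each column is a "hat" function that is 0 at IV1=1 and peaks at 5, 9 and 13 respectively, so each coefficient is the difference in the linear predictor between that point and IV1=1, and the z tests in the table test those single differences rather than the effect of IV1 as a whole.

```r
library(splines)

# Same spline specification as in fin.mod, evaluated on the time points
# 1..13 (assumed range of IV1, as in the sample data).
x <- 1:13
B <- bs(x, knots = c(5, 9), degree = 1)

# One basis column per spline coefficient in the table:
# 2 interior knots + degree 1 = 3 columns.
basis <- round(unclass(B)[, 1:3], 2)
dimnames(basis) <- list(paste0("IV1=", x), paste0("bs", 1:3))
print(basis)
```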
My questions are:
How can I justify reporting my regression model when the regression
table shows only non-significant components for the b-spline term? (Is
it enough to point to the anova comparisons?)
Is it possible to keep only some components of the b-spline (as
suggested here for linear regression:
https://freakonometrics.hypotheses.org/47681)?
Is there a better way of modeling the data? I am not very familiar
with gamm4 or nlme, for example.
Any help is very much appreciated!
Thank you,
Anne
--
Anne Lerche
Institute of British Studies
Leipzig University