[R-sig-ME] pre/post with partial participation

Paul Johnson

2018-11-13 16:11:22 UTC

I have a crazy ANOVA question and would appreciate your advice.

We have a project that did a pre-post measurement, but the
participation in the data collection was haphazard. There are only 19
people that participated pre-post, but there are about 40 that
participated only in the pre phase and 30 that participated in the
post phase.

I don't have the data to show you, but I made some up. I've got
ID=1:19 for the ones who are both in pre and post data samples, and ID
20:59 are in pre only and 60:89 are post only.

In my first example, the data has no true random effect and lmer gets
the correct estimate, the random effect is estimated as 0.00, or close
to it. I find that slight differences in the way I generate the data
(either I get exactly 0.0 or 7 x 10^-13 or similar).

This way of making the data generates a "full pre/post" data set and
then throws away pre and post observations for the missing cases:

set.seed(234234)
dat4 <- data.frame(ID = rep(1:89, 2), x = gl(2, 89, labels = c("pre", "post")))
err <- rnorm(length(dat4$x), 0, 1)
b <- 0
beta <- 4
dat4$y <- ifelse(dat4$x == "pre", 40 + err, 40 + beta + err) + b
## Now omit the
## post measurement for ID 20:59
## pre measurement for ID 60:89
dat4 <- dat4[!(dat4$ID %in% 20:59 & dat4$x == "post"), ]
dat4 <- dat4[!(dat4$ID %in% 60:89 & dat4$x == "pre"), ]

library(lme4)

m1 <- lmer(y ~ x + (1 | ID), dat4)
summary(m1)

Output shows nearly 0 random ID variance:

Linear mixed model fit by REML ['lmerMod']
Formula: y ~ x + (1 | ID)
Data: dat4

REML criterion at convergence: 313.3

Scaled residuals:
Min 1Q Median 3Q Max
-2.4502 -0.6261 -0.0361 0.5753 3.5321

Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 7.342e-13 8.569e-07
Residual 1.043e+00 1.021e+00
Number of obs: 108, groups: ID, 89

Fixed effects:
Estimate Std. Error t value
(Intercept) 39.8946 0.1330 299.99
xpost 4.2518 0.1974 21.54

I thought that was a happy result, the pre/post effect is estimated
reasonably and the estimator does not find a random effect if there is
none.

Then I put in a random effect.
set.seed(234234)
dat5 <- data.frame(ID = rep(1:89, 2), x = gl(2, 89, labels = c("pre", "post")))
err <- rnorm(length(dat5$x), 0, 1)
b <- rep(rnorm(89, 0, 1), 2)
beta <- 4
dat5$y <- ifelse(dat5$x == "pre", 40 + err, 40 + beta + err) + b
dat5 <- dat5[!(dat5$ID %in% 20:59 & dat5$x == "post"), ]
dat5 <- dat5[!(dat5$ID %in% 60:89 & dat5$x == "pre"), ]
m2 <- lmer(y ~ x + (1 | ID), dat5)
summary(m2)

summary(m2)

Linear mixed model fit by REML ['lmerMod']
Formula: y ~ x + (1 | ID)
Data: dat5

REML criterion at convergence: 367.1

Scaled residuals:
Min 1Q Median 3Q Max
-1.8537 -0.3634 -0.0253 0.3611 2.1063

Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 1.2441 1.1154
Residual 0.6632 0.8144
Number of obs: 108, groups: ID, 89

Fixed effects:
Estimate Std. Error t value
(Intercept) 39.9205 0.1704 234.29
xpost 4.1333 0.2069 19.98

This "seems" to work reasonably. What dangers await?

--
Paul E. Johnson http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.