Discussion:
[R-sig-ME] Should I include participants with baseline score only (missing afterwards) in a longitudinal study?
K Imran M
2018-07-31 06:24:46 UTC
Hi everyone,

I did a longitudinal study where I collected functional score at 3
different times (baseline, 1 month after baseline and 3 months after
baseline) from 98 patients. There were 11 patients who died right after
baseline (so they have functional score at baseline only, and they did not
have the scores at 1 month after baseline or 3 months after baseline).

My question is: should I remove these 11 patients from the dataset because
they provide only 1 score?

Next, I ran the nlme::lme function on 2 datasets: the first contained all
98 participants (11 with only 1 score, at baseline), and the second
contained only participants with at least 2 scores (baseline + 1 month,
baseline + 3 months, or baseline + 1 month + 3 months). I noticed that the
lme estimates for the two datasets are slightly different. How can I
explain this?

In the analysis above, I used a random intercept model (participants as the
random effect) with time (baseline, 1 month after baseline and 3 months
after baseline) treated as a factor variable. The covariate is age.

The datasets (edited for privacy) are available at these links:
dat.a (https://drive.google.com/open?id=1jAAFnrUfuTsVQST7EE3vjrh0_71ziAut)
dat.b (https://drive.google.com/open?id=1caGTd6SNnzbHSln84jw9b_lVHhnz7Qij)

And the R code is here:
#######
library(haven)
dat.a <- read_dta("test_complete_data.dta")
dat.b <- read_dta("test_complete_with_at_discharge.dta")

# mixed model
library(nlme)
# random-intercept model: time as a factor, age as covariate, no global intercept
mod.dta.a <- lme(barthel ~ -1 + age + factor(time), random = ~ 1 | id,
                 data = dat.a, na.action = na.omit, method = "ML")
mod.dta.b <- lme(barthel ~ -1 + age + factor(time), random = ~ 1 | id,
                 data = dat.b, na.action = na.omit, method = "ML")

# res
summary(mod.dta.a)
summary(mod.dta.b)
#####


So let me rephrase the questions (let us assume we are not interested in
the mechanism of missingness but purely in the estimation from the mixed
model):
1) Should I include patients who have only 1 measurement in a longitudinal
study in my model?
2) Why do the estimates differ between the dataset with at least 2
measurements per participant and the dataset that also contains
participants with only 1 measurement? A simple explanation would be fine
for me.

I apologize for my lack of math and stat skill. I really appreciate your
time in responding to this question.

Thank you.

Best wishes

Kamarul Imran
Universiti Sains Malaysia

Phillip Alday
2018-07-31 13:34:11 UTC
The model with the additional baseline-only participants will have less
uncertainty about the estimates concerning the baseline. This reduced
uncertainty will help "pin" those values, which may also affect the other
estimates.

As a simple example, think of a line passing through two points. Your
job is to determine the slope of the line, but this is made more
complicated by your not being totally certain about the position of the
two points. If you can reduce the uncertainty in the position of just one
point, then this will still reduce the possible range of slopes and may
even cause your estimate of the slope to tend towards a
particular/different value.

As for your particular inference: I would tend to keep the data in so
that my estimates of function at baseline are as good as possible, even
though the extra data add no direct information about function at 1 or 3
months. The reduction in uncertainty about the location of the baseline is
potentially useful in its own right and may even help give better
estimates of the slope (= difference between baseline and subsequent
measurement) by creating additional constraints.
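
A small simulation may make this concrete. The sketch below is not the real study data: the sample sizes (87 followed, 11 baseline-only) mirror the thread, but the time means (50, 60, 70), the standard deviations, the seed, and the names `simulate_scores`, `dat_all` and `dat_followed` are all made up for illustration. It also assumes the baseline-only participants are an arbitrary subset, i.e. that dropping out is unrelated to the score.

```r
library(nlme)

set.seed(2018)  # arbitrary seed, only for reproducibility

# Simulate a random-intercept data set; the last `n_baseline_only`
# participants keep only their baseline (time 0) row.
# All numeric values below are hypothetical.
simulate_scores <- function(n_followed, n_baseline_only) {
  n <- n_followed + n_baseline_only
  d <- expand.grid(time = c(0, 1, 3), id = seq_len(n))
  subject_effect <- rnorm(n, mean = 0, sd = 10)          # random intercepts
  time_mean <- c(50, 60, 70)[match(d$time, c(0, 1, 3))]  # assumed mean per time
  d$score <- time_mean + subject_effect[d$id] + rnorm(nrow(d), sd = 5)
  d[!(d$id > n_followed & d$time > 0), ]                 # drop follow-up rows
}

dat_all      <- simulate_scores(n_followed = 87, n_baseline_only = 11)
dat_followed <- dat_all[dat_all$id <= 87, ]               # at least 2 scores each

# Same model structure as in the original post, without the age covariate
fit_all <- lme(score ~ -1 + factor(time), random = ~ 1 | id,
               data = dat_all, method = "ML")
fit_followed <- lme(score ~ -1 + factor(time), random = ~ 1 | id,
                    data = dat_followed, method = "ML")

# Side-by-side fixed effects and their standard errors
estimates <- cbind(all_98 = fixef(fit_all), followed_87 = fixef(fit_followed))
std_errors <- cbind(all_98 = sqrt(diag(vcov(fit_all))),
                    followed_87 = sqrt(diag(vcov(fit_followed))))
print(round(estimates, 2))
print(round(std_errors, 2))
```

Because the scores within a participant are correlated through the random intercept, the extra baseline rows can shift the 1-month and 3-month estimates a little as well, not only the baseline estimate.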

Phillip
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Jon Baron
2018-07-31 14:02:22 UTC
This makes sense if those who died after baseline did not differ
systematically from those who survived. But, if there is any reason for
the baseline measure to correlate with longevity, I think it would be
safer to remove the 11 subjects, even at the expense of some
additional error.
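
One rough way to look at this is to compare the baseline scores of those who have only a baseline score with those who were followed up. The sketch below uses base R only; the helper name `baseline_by_followup` and the toy values are hypothetical, and it assumes a long-format data frame in which baseline is the smallest value of `time` and the columns are called `id`, `time` and `barthel` as in the original code.

```r
# Return one row per participant (their baseline row) with a flag
# marking those who contributed only a single non-missing score.
# Assumes baseline is coded as the smallest value of the time column.
baseline_by_followup <- function(dat, id = "id", time = "time",
                                 score = "barthel") {
  obs   <- dat[!is.na(dat[[score]]), ]            # ignore missing scores
  n_obs <- table(obs[[id]])                       # scores per participant
  base  <- obs[obs[[time]] == min(obs[[time]]), ] # baseline rows only
  base$baseline_only <- as.vector(n_obs[as.character(base[[id]])]) == 1
  base
}

# Toy data with made-up values, not the study data
toy <- data.frame(
  id      = c(1, 1, 1, 2, 2, 2, 3, 4, 5, 5),
  time    = c(0, 1, 3, 0, 1, 3, 0, 0, 0, 1),
  barthel = c(60, 70, 80, 55, 65, 75, 30, 35, 50, 60)
)

b <- baseline_by_followup(toy)
baseline_means <- tapply(b$barthel, b$baseline_only, mean)
print(baseline_means)
```

A clear gap between the two group means (or in a boxplot of `barthel` by `baseline_only`) would suggest that the baseline measure is related to who died, which is the situation described above.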

Jon
Phillip Alday
2018-07-31 14:08:45 UTC
Thanks for catching that, Jon. I wasn't paying attention to the full
details of the study.

My implicit assumption was "missing at random". If the data are missing
not at random, then you should either exclude the incomplete cases or
model the missingness explicitly, say with a hurdle model.

Phillip
K Imran M
2018-07-31 16:18:15 UTC
Phillip and Jon,

Very, very helpful explanations and insight. I really appreciate it.

KIM