David Jones
2018-06-11 13:59:24 UTC
I am looking to model quality of life (QOL) as a DV over time. The DV
shows strong negative skew. I am wondering about the best way to
handle this (more detail below). Frequency distribution of QOL and
example code are also at the end of this message.
Many participants just say that their quality of life is great, and
thus there is a ceiling effect with many values clustered at the
highest value. While the distribution resembles y=e^x, I have not been
able to fit a distribution via GLMM that results in normally
distributed and homoskedastic residuals (including gamma and inverse
gaussian). A number of DV transformations have not worked either
(e.g., log, exponential, Box-Cox), in large part because of the large
proportion of values at the maximum level of QOL, which creates a
spike at the end of the distribution. I could try zero-inflated models
by transforming the DV (multiply by -1 and put the starting value at
0), but even then there will still be a disproportionate number of
values clustered at one end.
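To make the reflection concrete, here is a minimal sketch in base R. The scores in it are made-up illustrative values (not my data), and QOL_rev and prop_at_zero are just names I picked for the example.

```r
# Hypothetical QOL scores, chosen only for illustration (ceiling at 91)
QOL <- c(91, 91, 90, 88, 75, 91, 60, 25)

# "Multiply by -1 and put the starting value at 0":
# the ceiling (91) becomes 0, and lower QOL gives larger values
QOL_rev <- max(QOL) - QOL

# Share of observations sitting on the new zero (the former ceiling)
prop_at_zero <- mean(QOL_rev == 0)
```

After reflecting, the spike simply moves from the top of the scale to zero, which is why a zero-inflated model is what comes to mind.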
My question: I am particularly interested in fixed effects parameters
from a longitudinal model, and was thinking of testing these
parameters by using percentile bootstrap CIs via confint(). However,
the residuals from an lmer model are both non-normal and
heteroskedastic - will percentile bootstrap of beta coefficients
address this, or can only the wild bootstrap address these issues (as
it is targeted to residuals)? I have a basic understanding of the
bootstrap but am not an expert regarding its use in linear models.
Many thanks!
# Example lmer code
library(lme4)
model <- lmer(QOL ~ poly(time, 2) + (time | ID), data = dataset, REML = FALSE)
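
For reference, this is roughly the confint() call I have in mind, as a sketch only. The data here are simulated stand-ins (not my real data), and I use a random intercept only so the toy example fits cleanly; nsim = 200 is arbitrary. As far as I understand, confint(method = "boot") in lme4 is built on bootMer(), whose default is a parametric bootstrap, which is part of why I am unsure it helps with non-normal residuals.

```r
library(lme4)

set.seed(1)
# Simulated stand-in data (NOT the real QOL data): 40 people, 5 waves,
# with scores capped at 91 to mimic the ceiling effect
n_id <- 40
n_time <- 5
dataset <- expand.grid(ID = factor(seq_len(n_id)), time = 0:(n_time - 1))
u <- rnorm(n_id, sd = 4)  # person-level random intercepts
dataset$QOL <- pmin(91, 88 + u[as.integer(dataset$ID)] -
                      0.5 * dataset$time + rnorm(nrow(dataset), sd = 3))

# Simplified model: random intercept only (my real model uses (time | ID))
model <- lmer(QOL ~ poly(time, 2) + (1 | ID), data = dataset, REML = FALSE)

# Percentile bootstrap CIs for the fixed effects only (parm = "beta_")
ci <- confint(model, parm = "beta_", method = "boot",
              boot.type = "perc", nsim = 200)
print(ci)
```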
# Frequency distribution
QOL valid_percent
25 0.000308261
30 0.000308261
32 0.000308261
34 0.000616523
38 0.000308261
41 0.000308261
45 0.000308261
46 0.000308261
47 0.000308261
48 0.000616523
49 0.000616523
50 0.000616523
51 0.000308261
52 0.000308261
53 0.001541307
54 0.000616523
55 0.001233046
56 0.000616523
57 0.000924784
58 0.000308261
59 0.000924784
60 0.000924784
61 0.001849568
62 0.001541307
63 0.003082614
64 0.001849568
65 0.00215783
66 0.002466091
67 0.004007398
68 0.002466091
69 0.004007398
70 0.002466091
71 0.003699137
72 0.006781751
73 0.004932183
74 0.006781751
75 0.006165228
76 0.007090012
77 0.007706535
78 0.008631319
79 0.010789149
80 0.015104809
81 0.014488286
82 0.01541307
83 0.020345253
84 0.025893958
85 0.03298397
86 0.036066585
87 0.053020962
88 0.064426634
89 0.080147966
90 0.088779285
91 0.452219482