Discussion:
[R-sig-ME] Running
Jonathan Miller
2018-11-09 21:15:53 UTC
Permalink
Dr. Bolker,

I am a PhD student at NCSU and am struggling with a coding issue. I am
bootstrapping some GLMM predictions in order to determine the
uncertainty associated with their fixed effects. I read your comments on
https://github.com/lme4/lme4/issues/388 and have used code similar to
yours below (b3):

## param, RE, and conditional
b1 <- bootMer(fm1,FUN=sfun1,nsim=100,seed=101)
## param and RE (no conditional)
b2 <- bootMer(fm1,FUN=sfun2,nsim=100,seed=101)
## param only
b3 <- bootMer(fm1,FUN=function(x) predict(x,newdata=test,re.form=~0),
              ## re.form=~0 is equivalent to use.u=FALSE
              nsim=100,seed=101)


It has worked well for me but takes an extremely long time to run. I am
predicting 6 different water-quality (wq) indicators for 1,423 sites, and
the datasets range in size from 3,000 to 25,000 entries each. The small one
runs relatively OK, but the others are extremely slow. In my code (below),
I also want to make more than one prediction (current conditions, possible
future conditions) using the bootstrapping. Using parallel="snow"
doesn't seem to speed things up. I thought of two possibilities, but am
unsure how to implement them.

for (s in 1:1423){
  bi <- bootMer(BI.mod,
                FUN=function(x) predict(x,newdata=pred.sites[s,],re.form=~0,REML=TRUE),
                parallel="snow",nsim=1000,seed=101)
  bi.5 <- bootMer(BI.mod,
                  FUN=function(x) predict(x,newdata=pred.sites.m5[s,],re.form=~0,REML=TRUE),
                  parallel="snow",nsim=1000,seed=101)
}

1) Can I predict from the bootstrapped model using two different datasets at
once to speed things up (i.e., pred.sites and pred.sites.m5)?
2) Can I parallelize the outer loop (1,423 sites) outside of bootMer
(perhaps with doParallel and foreach) and then run bootMer within that
loop? Though I have used foreach before, I find it hard to collect the
results that I really want at the end.

Thank you for your time and any suggestions you might have.

Sincerely,

Jonathan
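
[Editorial sketch, not part of the original message.] On question 1, one
possible approach is to stack both prediction data sets and predict for all
sites inside a single bootMer call, so that each bootstrap refit is reused for
every site and scenario instead of being repeated 1,423 times. The sketch below
assumes lme4's sleepstudy data and a toy model as stand-ins for BI.mod,
pred.sites and pred.sites.m5; the names fm, pred.now, pred.future and both are
hypothetical.

```r
library(lme4)

## Toy stand-ins (hypothetical): a sleepstudy model in place of BI.mod and
## two small data frames in place of pred.sites / pred.sites.m5
fm <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
pred.now    <- data.frame(Days = c(0, 3, 6))   # "current conditions"
pred.future <- data.frame(Days = c(1, 4, 7))   # "management scenario"

## Stack both scenarios so every refit predicts all rows at once
both <- rbind(pred.now, pred.future)
n <- nrow(pred.now)

bb <- bootMer(fm,
              FUN = function(x) predict(x, newdata = both, re.form = ~0),
              nsim = 50, seed = 101)

## bb$t has one row per bootstrap replicate and one column per row of 'both'
cur <- bb$t[, seq_len(n), drop = FALSE]
fut <- bb$t[, n + seq_len(n), drop = FALSE]

## Percentile intervals for the per-site change and for the mean change
diff_ci <- apply(fut - cur, 2, quantile,
                 probs = c(0.025, 0.975), na.rm = TRUE)
mean_diff_ci <- quantile(rowMeans(fut - cur),
                         probs = c(0.025, 0.975), na.rm = TRUE)
```

Because the current and scenario predictions come from the same refits, their
difference can be taken replicate by replicate, which is what the question
about the uncertainty of the mean reduction seems to call for.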

Ben Bolker
2018-11-09 22:17:22 UTC
Permalink
I will give this some thought when I get a chance (hopefully someone
else will give it some thought and find some answers sooner ...) In the
meantime -- do you really need parametric bootstrapping/bootMer to get
the confidence intervals you want? It's quite possible that a simpler
approximation (e.g. assuming that the variation caused by uncertainty in
the top-level random-effects parameters is small relative to other
sources of variability) is adequate, given that you have thousands of
samples ...
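
[Editorial sketch, not part of the original message.] One simpler
approximation of the kind mentioned above is a Wald-type interval that treats
the random-effects variance parameters as known and uses only the covariance
matrix of the fixed effects. The example assumes a toy sleepstudy model; fm,
newd and X are hypothetical names, and the one-sided formula passed to
model.matrix must match the fixed-effects part of the fitted model.

```r
library(lme4)

## Toy model and new data (hypothetical stand-ins)
fm   <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
newd <- data.frame(Days = c(0, 3, 6))

## Fixed-effects design matrix for the new data
X <- model.matrix(~ Days, data = newd)

beta <- fixef(fm)
V    <- as.matrix(vcov(fm))          # covariance matrix of the fixed effects

fit <- drop(X %*% beta)              # population-level predictions
se  <- sqrt(diag(X %*% V %*% t(X)))  # fixed-effects uncertainty only

ci <- cbind(fit = fit, lwr = fit - 1.96 * se, upr = fit + 1.96 * se)
```

For a GLMM these quantities are on the link scale, so the interval end points
would be back-transformed with the inverse link before reporting.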
Ben Bolker
2018-11-09 23:44:27 UTC
Permalink
[please keep r-sig-mixed-models in the Cc: if possible - although I
see it's a judgment call in this case because the e-mail contains both
generally pertinent info (uncertainty of FE small) and a personal-ish
message ...]

Just to be clear, (1) I was suggesting that the uncertainty of the
fixed effects might be *large* with respect to the uncertainty of the
random effects, and largely independent of it; (2) have you already
tried implementing other (approximate, faster) methods for the
uncertainty on a small subset of the sites to convince yourself that you
really need the full PB method?
Post by Jonathan Miller
Thank you. You are right that the uncertainty of the fixed effects is
smaller than the others, but it is of importance for my project. I
appreciate any thoughts you have when you have time to get to it.
Jonathan
Jonathan Miller
2018-11-10 13:56:58 UTC
Permalink
Ben,

I am sorry. I did misunderstand your first email last night. I am using
GLMMs to predict water quality, and my random effects are at the site and
basin level. They do explain a lot of the variance in the models,
especially for "noisy" indicators like turbidity and fecal coliform. In the
project, I am predicting for current conditions as well as potential
management scenarios throughout a region. Initially, I just calculated the
mean difference between these two values (current vs. management scenario)
for the region, but I would like to get an idea of the uncertainty in this
mean reduction. Though the random effects are significant, we are assuming
that when trying to restore a particular site, the random effect at that
site will not change over the course of the restoration. This implies that
the uncertainty of improvement for a given site is mostly affected by the
uncertainty in the fixed effects that are being adjusted for the management
scenarios (i.e., increase in canopy cover, nutrient loadings from
wastewater treatment plants, etc.). I tried to use the predictInterval
function, but it seemed to give me prediction intervals that include the
random effects as well. In essence, they were much larger than the ones I
am getting using:

## param only
b3 <- bootMer(fm1,FUN=function(x) predict(x,newdata=test,re.form=~0),
              ## re.form=~0 is equivalent to use.u=FALSE
              nsim=100,seed=101)

I also used a Cholesky decomposition of the covariance matrix of the fixed
effects to "simulate" the uncertainty of the fixed effects, which gave similar
results. I think bootstrapping is a bit easier to explain in my manuscript,
though, and I thought it might also be easier for coding purposes using
bootMer.
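
[Editorial sketch, not part of the original message.] The Cholesky-based
simulation described above could look roughly like this. It is only an
illustration under assumptions: a toy sleepstudy model stands in for the real
GLMM, the fixed-effect estimates are treated as multivariate normal, and the
names fm, newd, X, beta_sim and pred_sim are hypothetical.

```r
library(lme4)

## Toy model and new data (hypothetical stand-ins)
fm   <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
newd <- data.frame(Days = c(0, 3, 6))
X    <- model.matrix(~ Days, data = newd)   # must match the fixed-effects formula

beta <- fixef(fm)
V    <- as.matrix(vcov(fm))

set.seed(101)
nsim <- 1000
R <- chol(V)                                # upper triangular, t(R) %*% R equals V
Z <- matrix(rnorm(nsim * length(beta)), nrow = nsim)

## Each row of beta_sim is one draw from N(beta, V)
beta_sim <- sweep(Z %*% R, 2, beta, "+")

## nsim x nrow(newd) matrix of simulated population-level predictions
pred_sim <- beta_sim %*% t(X)
ci <- apply(pred_sim, 2, quantile, probs = c(0.025, 0.975))
```

No model is refitted here, so this runs in a fraction of the time of a
parametric bootstrap; the price is that uncertainty in the variance parameters
is ignored.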

It does seem to be working well, but my question was more about why using
parallel="snow" isn't speeding things up, though maybe your concerns about
whether I need PB at all are right as well.
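
[Editorial sketch, not part of the original message.] One thing that may be
worth checking: bootMer's ncpus argument defaults to
getOption("boot.ncpus", 1L), so a call with parallel="snow" and no ncpus
may well run on a single core. The sketch below sets up a cluster explicitly;
it assumes a toy sleepstudy model and two workers, and the names fm, newd and
pfun are hypothetical.

```r
library(lme4)
library(parallel)

## Toy model and new data (hypothetical stand-ins)
fm   <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
newd <- data.frame(Days = c(0, 3, 6))

## Statistic evaluated on each bootstrap refit
pfun <- function(x) predict(x, newdata = newd, re.form = ~0)

## Workers start as empty R sessions: load lme4 and ship 'newd' to them
cl <- makeCluster(2)
clusterEvalQ(cl, library(lme4))
clusterExport(cl, "newd")

bb <- bootMer(fm, FUN = pfun, nsim = 20, seed = 101,
              parallel = "snow", ncpus = 2, cl = cl)

stopCluster(cl)
```

Parallelizing over bootstrap replicates inside one bootMer call, with all sites
predicted at once, is likely to pay off more than wrapping a per-site loop in
foreach, because the expensive step is the model refit rather than the
prediction.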

Thank you,

Jonathan