Discussion:
[R-sig-ME] How to use all the cores while running glmer on a piecewise exponential survival with
Adam Mills-Campisi
2018-08-23 19:18:11 UTC
Permalink
I am estimating a piecewise exponential, mixed-effects survival model with
recurrent events. Each individual in the dataset gets an individual
intercept (we are using a PWP approach). Our full dataset has 10 million
individuals, with 180 million events. I am not sure that there is any
framework which can accommodate data at that size, so we are going to
sample. Our final sample size largely depends on how quickly we can
estimate the model, which brings me to my question: is there a way to
multi-thread the model, or run it across multiple cores? I tried to find
some kind of instruction on the web, and the best lead I could find was a
reference to this listserv. Any help would be greatly appreciated.
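For readers unfamiliar with the setup: a piecewise exponential model can be fit as a Poisson GLMM on person-interval records. The sketch below is illustrative only (the poster's actual code and column names are unknown; `dat`, `id`, `time`, `status`, and the break points are all assumptions), using the survival and lme4 packages:

```r
# Hedged sketch of a piecewise-exponential setup as a Poisson GLMM.
# 'dat' with columns id, time, status is hypothetical.
library(survival)
library(lme4)

# Split each subject's follow-up at chosen break points; each resulting
# row is one (subject x interval) record with its own exposure time.
breaks <- c(0, 30, 90, 180, 365)
long <- survSplit(Surv(time, status) ~ ., data = dat,
                  cut = breaks, episode = "interval")
long$exposure <- long$time - long$tstart

# Piecewise exponential = Poisson regression on the event indicator with
# a log-exposure offset and a per-subject random intercept.
fit <- glmer(status ~ factor(interval) + offset(log(exposure)) + (1 | id),
             data = long, family = poisson)
```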

Doran, Harold
2018-08-23 19:21:58 UTC
Permalink
No. You can change to an improved BLAS, or use Microsoft R, which I have found has some built-in multithreading that is fast for matrix algebra and passes that benefit on to lmer. In my experience, you can improve the computational time of an lmer model with Microsoft R.
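As a quick check on Harold's suggestion: you can see which BLAS/LAPACK your R is linked against and time a dense matrix operation before and after switching. This is a generic diagnostic, not anything specific to the poster's model:

```r
# Which BLAS/LAPACK is this R build using? (Reported by sessionInfo()
# in recent R versions.) An optimized multithreaded BLAS such as
# OpenBLAS or MKL speeds up dense matrix algebra with no code changes.
sessionInfo()

# Benchmark a dense cross-product; compare this timing across BLAS builds.
n <- 2000
m <- matrix(rnorm(n * n), n, n)
system.time(crossprod(m))
```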

Adam Mills-Campisi
2018-08-23 19:30:55 UTC
Permalink
We originally tried to use Stan to estimate the model, but we ran into
performance issues. I assumed that the frequentist approaches would be
faster.

Ben Bolker
2018-08-23 20:16:40 UTC
Permalink
Are the frequentist methods *not* faster? I'd be pretty surprised,
unless you're hitting some terrible memory bottleneck or something.


D. Rizopoulos
2018-08-23 21:27:41 UTC
Permalink
As suggested, one approach could be to split the original big sample into manageable pieces, do the analysis in each piece, and then combine the results.

Geert Molenberghs, Geert Verbeke, and colleagues have worked on this; a relevant recent paper seems to be: https://lirias2repo.kuleuven.be/bitstream/id/470902/
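A minimal sketch of the split-and-combine idea follows. All column names are assumed, and the simple average at the end is only an illustration; it is not the exact partitioned-likelihood estimator of the Molenberghs and Verbeke papers:

```r
# Hedged sketch: fit the model independently on chunks of subjects,
# then combine the estimates. Columns id, status, x, exposure are
# hypothetical; mclapply forks, so this is for Unix-alikes.
library(lme4)
library(parallel)

# Partition subjects (not rows) into chunks so each cluster stays intact.
ids    <- unique(dat$id)
chunks <- split(ids, cut(seq_along(ids), breaks = 8, labels = FALSE))

fits <- mclapply(chunks, function(chunk) {
  glmer(status ~ x + offset(log(exposure)) + (1 | id),
        data = subset(dat, id %in% chunk), family = poisson)
}, mc.cores = 8)

# Naive combination: average the fixed-effect estimates across chunks.
beta_hat <- rowMeans(sapply(fits, fixef))
```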

I hope it helps.

Best,
Dimitris


Adam Mills-Campisi
2018-08-23 22:46:54 UTC
Permalink
Thanks! We are looking into our options. The MixedModels package in Julia
benchmarks at about two orders of magnitude faster than R on a small dataset;
however, I would think a lot of that is just overhead from R. On a model of
this size, the computational times should converge, because everyone is using
the same BLAS libraries. It might be worth further investigation if timing
remains an issue.

Manuel Ramon
2018-08-24 06:15:29 UTC
Permalink
Not sure if this can be useful:
bigglm: "Faster generalised linear models in largeish data"
<https://notstatschat.rbind.io/2018/03/05/faster-generalised-linear-models-in-largeish-data/>
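For reference, a minimal bigglm call might look like the following. Note that bigglm (from the biglm package) fits a fixed-effects GLM only, with no random intercepts, so it is a fast approximation rather than a drop-in replacement for glmer; column names here are assumptions:

```r
# Hedged sketch of bigglm on a large piecewise-exponential data frame.
# bigglm processes the data in chunks, so memory use stays bounded.
library(biglm)

fit <- bigglm(status ~ x + factor(interval) + offset(log(exposure)),
              data = dat, family = poisson(), chunksize = 1e5)
summary(fit)
```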

Manuel


Jonathan Judge
2018-08-24 12:51:24 UTC
Permalink
Have you given INLA a try? That’s a lot of individuals, but as long as they are members of one “group” it might only cost you one hyperparameter, which seems to be the primary limitation for that approach.
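To make Jonathan's point concrete: in INLA an iid random intercept over individuals costs one precision hyperparameter, regardless of how many individuals there are. A hedged sketch (assumed column names; INLA is installed from r-inla.org, not CRAN):

```r
# Hedged sketch of the INLA formulation of a Poisson model with a
# per-individual iid random intercept. Columns status, x, id, exposure
# are hypothetical.
library(INLA)

fit <- inla(status ~ x + f(id, model = "iid"),
            family = "poisson", data = dat,
            E = exposure)   # E supplies the Poisson exposure
summary(fit)
```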

Jonathan


Doran, Harold
2018-08-23 19:23:21 UTC
Permalink
One idea, though, is you can take samples from your very large data set and estimate models on the samples very quickly.

Adam Mills-Campisi
2018-08-23 19:25:19 UTC
Permalink
That's the plan; the real question is how big the samples should be. The
faster we can estimate the model, the bigger the sample can be. If I could
run the model on multiple cores, that would significantly increase the
sample size.
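One practical note on the sampling itself: with a random intercept per individual, the sample should be drawn at the individual level so that each cluster's rows stay together. A tiny sketch with assumed names:

```r
# Hedged sketch: sample whole individuals (all of a subject's interval
# rows), not individual rows, so each random-intercept cluster stays
# intact. 'dat', 'id', and the sample size are hypothetical.
set.seed(1)
sampled_ids <- sample(unique(dat$id), size = 50000)
sub <- subset(dat, id %in% sampled_ids)
```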

Doran, Harold
2018-08-23 19:32:10 UTC
Permalink
Running the model on multiple cores won’t work, because lmer isn’t written that way. One idea I’ve toyed with is to start with a small-ish sample and get results. Plug those in as starting values for your next run, which uses a larger sample but takes fewer steps because you’re closer to the maximum. Repeat until the difference in the parameter estimates from the prior run is less than some tolerance.
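Harold's warm-start idea maps onto glmer's documented `start` argument, which accepts the random-effect parameters (`theta`) and, for nAGQ > 0, the fixed effects (`fixef`). A hedged sketch with assumed column names and sample sizes:

```r
# Hedged sketch of warm-starting glmer on progressively larger samples.
# 'dat' and its columns are hypothetical.
library(lme4)

small <- subset(dat, id %in% sample(unique(dat$id), 10000))
fit1  <- glmer(status ~ x + (1 | id), data = small, family = poisson)

# Re-fit on a larger sample, starting from the small-sample estimates.
big  <- subset(dat, id %in% sample(unique(dat$id), 100000))
fit2 <- glmer(status ~ x + (1 | id), data = big, family = poisson,
              start = list(theta = getME(fit1, "theta"),
                           fixef = fixef(fit1)))
```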


Ben Bolker
2018-08-23 20:08:58 UTC
Permalink
Harold, what do you think of my suggestion (partition the problem into
multiple conditionally independent subsets, evaluate the separate deviances
on workers, and run the top-level optimization on a central 'master'
processor)? Am I missing something, other than that some problems can't
easily be partitioned that way?
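A rough sketch of the scheme Ben describes: when the subsets are conditionally independent, the total deviance is the sum of per-chunk deviances, so the expensive evaluations can run on workers while a single optimizer runs on the master. Everything below (the chunking, the assumed columns) is illustrative, not an existing lme4 interface; it leans on glmer's documented `devFunOnly` argument, and with nAGQ = 0 the returned deviance function takes only the random-effect parameter theta:

```r
# Hedged sketch: sum per-chunk glmer deviance functions inside one
# top-level optimizer. Fork-based parallelism (Unix-alikes only);
# columns status, x, id are hypothetical.
library(lme4)
library(parallel)

chunks  <- split(dat, cut(as.integer(factor(dat$id)), 8, labels = FALSE))
devfuns <- lapply(chunks, function(d)
  glmer(status ~ x + (1 | id), data = d, family = poisson,
        nAGQ = 0, devFunOnly = TRUE))   # deviance as a function of theta

total_dev <- function(theta)
  sum(unlist(mclapply(devfuns, function(f) f(theta), mc.cores = 8)))

# One random-intercept term => a single non-negative theta to optimize.
opt <- optim(par = 1, fn = total_dev, method = "L-BFGS-B", lower = 0)
```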

FWIW I think Doug Bates has pointed out in the past that for simple
(e.g. nested, not crossed) designs, the whole problem can be
reformulated in a more efficient way (of course I can't dig up that
e-mail ...). lme4's strength is that it can handle the complex cases,
and so far no-one has had the time/energy/interest/capability of
implementing any of Doug's "special case" strategies, at least in lme4
-- may be done elsewhere in R, or in Doug's MixedModels.jl ...

cheers
Ben Bolker


Doran, Harold
2018-08-24 14:20:59 UTC
Permalink
@Ben, I like that idea. I've done something similar in some recent work that reduces the dimensionality of the integration and makes the problem easier to compute. I just don't know the current problem well enough to know whether that is feasible here.

But, it's certainly an idea to explore.



-----Original Message-----
From: R-sig-mixed-models <r-sig-mixed-models-***@r-project.org> On Behalf Of Ben Bolker
Sent: Thursday, August 23, 2018 4:09 PM
To: r-sig-mixed-***@r-project.org
Subject: Re: [R-sig-ME] How to use all the cores while running glmer on a piecewise exponential survival with


Harold, what do you think of my suggestion (partition the problem into multiple conditionally independent subsets, evaluate separate deviances on workers, run the top-level optimization on a central 'master' processor)?
Am I missing something (except that some problems can't easily be partitioned that way)?

FWIW I think Doug Bates has pointed out in the past that for simple (e.g. nested, not crossed) designs, the whole problem can be reformulated in a more efficient way (of course I can't dig up that e-mail ...). lme4's strength is that it can handle the complex cases, and so far no-one has had the time/energy/interest/capability of implementing any of Doug's "special case" strategies, at least in lme4
-- may be done elsewhere in R, or in Doug's MixedModels.jl ...

cheers
Ben Bolker
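
Ben's partition suggestion can be illustrated with a toy R sketch (this is not lme4 internals, and the per-block "deviance" below is a stand-in): if the objective decomposes into independent per-block contributions, each evaluation can be farmed out with parallel::mclapply and summed on the master before being handed to the optimizer.

```r
## Toy illustration of a partitioned objective: sum conditionally
## independent block contributions, evaluated in parallel per call.
library(parallel)

block_dev <- function(block, pars) {
  ## stand-in for one block's deviance contribution at parameters 'pars'
  sum((block$y - pars[1])^2)
}

total_dev <- function(pars, blocks, cores = 2) {
  Reduce(`+`, mclapply(blocks, block_dev, pars = pars, mc.cores = cores))
}

blocks <- split(data.frame(y = rnorm(1000)), rep(1:4, each = 250))
opt <- optim(par = 0, fn = total_dev, blocks = blocks, method = "BFGS")
```

The communication cost per objective evaluation is what limits the payoff; for a real mixed model the blocks would be levels of a grouping factor whose contributions are conditionally independent.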


On 2018-08-23 03:32 PM, Doran, Harold wrote:
> Running the model on multiple cores won’t work because lmer isn’t written that way. One idea I’ve toyed with is start with a small-ish sample and get results. Plug those in as starting values to your next run which uses larger sample, but takes fewer steps because you’re closer to the max. Repeat until the difference in the param estimates from prior run is less than some tolerance.
>
>
> From: Adam Mills-Campisi <***@gmail.com>
> Sent: Thursday, August 23, 2018 3:25 PM
> To: Doran, Harold <***@air.org>
> Cc: r-sig-mixed-***@r-project.org
> Subject: Re: [R-sig-ME] How to use all the cores while running glmer
> on a piecewise exponential survival with
>
> That's the plan; the real question is how big the samples should be. The faster we can estimate the model, the bigger the sample can be. If I could run the model on multiple cores, that would significantly increase the sample size.
>
> On Thu, Aug 23, 2018 at 12:23 PM Doran, Harold <***@air.org<mailto:***@air.org>> wrote:
> One idea, though, is you can take samples from your very large data set and estimate models on the samples very quickly.
>
> -----Original Message-----
> From: R-sig-mixed-models <r-sig-mixed-models-***@r-project.org> On Behalf Of Adam Mills-Campisi
> Sent: Thursday, August 23, 2018 3:18 PM
> To: r-sig-mixed-***@r-project.org
> Subject: [R-sig-ME] How to use all the cores while running glmer on a piecewise exponential survival with
D. Rizopoulos
2018-08-24 14:28:18 UTC
Permalink
I think the idea of how to efficiently implement Laplace and adaptive Gauss-Hermite integration for GLMMs with nested random-effects designs is described in https://dx.doi.org/10.1198/106186006X96962



Adam Mills-Campisi
2018-08-24 16:10:49 UTC
Permalink
David Duffy
2018-08-27 02:38:47 UTC
Permalink
Maybe this is old hat, but folks might have seen

https://amstat.tandfonline.com/doi/abs/10.1198/jcgs.2009.06127?journalCode=ucgs20

where they fit Poisson and binomial GLMMs to 33M records. You just have to develop and code a suitable discrete-time model for your data ;) I did download the code for that paper, but never followed through.
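
The usual discrete-time/piecewise-exponential trick is to expand each spell into person-period records and fit a Poisson GLMM with a log-exposure offset. A hedged R sketch, with made-up variable names (time, event, x, id) and assumed interval cut points:

```r
## Expand survival records into person-period format, then fit a
## piecewise exponential model as a Poisson GLMM with an offset.
library(survival)
library(lme4)

cuts <- c(30, 90, 180)  # assumed interval boundaries
pp <- survSplit(Surv(time, event) ~ ., data = dat,
                cut = cuts, episode = "interval")
pp$exposure <- pp$time - pp$tstart  # time at risk within each interval

fit <- glmer(event ~ factor(interval) + x + (1 | id) +
               offset(log(exposure)),
             family = poisson, data = pp)
```

The factor(interval) terms give the piecewise-constant baseline hazard; the random intercept carries the frailty. Note that the expansion multiplies the row count, which matters at 180M events.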

Cheers, David Duffy.
Phillip Alday
2018-08-28 14:08:49 UTC
Permalink
I'm late to the party on this one, but I've been playing around with
this issue by (ab)using brms::brm_multiple():

library("tidyverse")
library("brms")

rstan::rstan_options(auto_write = TRUE)
options(mc.cores = 2)

dat.split <- dat %>%
  select(A, B, C, G) %>%
  group_by(G) %>%
  mutate(slice = sample.int(100, n(), replace = TRUE)) %>%
  ungroup()

dat.split <- split(dat.split, dat.split$slice)

dat.model.split <- brm_multiple(log10(A) ~ 1 + scale(B) * scale(C) + (1 | G),
                                algorithm = "sampling",
                                prior = set_prior("normal(0, 2)", class = "b"),
                                save_all_pars = TRUE,
                                sample_prior = TRUE,
                                family = student(),
                                chains = 2,
                                iter = 2e3,
                                data = dat.split)


This fits models on different subsets and then combines the posteriors.
Maybe a better Bayesian than me can point out any flaws or potential
pitfalls in this approach.

If you use a slightly less automated approach for the split and combine,
you can run multiple chains for multiple models in parallel. For the
automated split-combine with brm_multiple, each model is run in
sequence, although the chains within a given model are run in parallel
if mc.cores > 1.
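
One hand-rolled version of that less automated split-and-combine might look like the sketch below, reusing the dat.split list from Phillip's example and assuming brms::combine_models() to stack the posteriors. This is a sketch only; forked mclapply workers can be memory-hungry with large Stan models:

```r
## Fit one brm() model per data slice in parallel, then pool the draws.
library(brms)
library(parallel)

fits <- mclapply(dat.split, function(d)
  brm(log10(A) ~ 1 + scale(B) * scale(C) + (1 | G),
      family = student(), chains = 2, iter = 2e3, data = d),
  mc.cores = 4)

## check_data = FALSE because the models were fit to different subsets
fit <- combine_models(mlist = fits, check_data = FALSE)
```

This trades brm_multiple's sequential model loop for model-level parallelism, at the cost of doing the bookkeeping (and the caveat about pooling subset posteriors) yourself.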

Best,
Phillip

Ben Bolker
2018-08-23 19:34:50 UTC
Permalink