Discussion:
[R-sig-ME] Question on hierarchical nature and data format using lmer
Bernard Liew
2018-06-25 15:09:58 UTC
Dear Community,

Thank you in advance for the help. My question pertains to a research design as follows:

There are 200 students in total from 4 schools, undergoing different clinical placements in a semester. There are 5 possible clinical placements. Some students have zero placements; others can have up to three, in any combination. Two out of three clinical placements are restricted to some schools, so some clinical placements are nested within schools, while others are crossed across schools.

The response variable is an ordinal Likert-scale measure of "sharing". The predictors are school and placement.

Question to be answered: Do school and clinical placement alter a student's degree of sharing?

Problem 1: data format

The traditional way to format the data is the long "tidy" method. However, because placements are not unique to an individual, how best should one format the data?

Solution 1 (I think): make the placement variable wide, so that instead of one placement predictor I have five placement predictors. This appears to change the research question, though. Is there another solution?

Kind regards,
Bernard


Thierry Onkelinx
2018-06-26 08:37:55 UTC
Dear Bernard,

The typical format is one row of data per observation. If you have one
measurement per student, then you need to have a column per clinical
placement (with a TRUE or FALSE value).
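As a minimal sketch of that layout in base R: the data below are invented for illustration, and the names `placements`, `students`, and the `clinicA` ... `clinicE` columns are assumptions, not the poster's actual data. It turns hypothetical long placement records into one row per student with a TRUE/FALSE column per placement.

```r
# Hypothetical long-format records: one row per student-placement pair.
placements <- data.frame(
  student   = c("s1", "s1", "s2", "s3", "s3", "s3"),
  placement = c("A", "D", "E", "B", "D", "E")
)

# One row per student (the unit of observation), with the single response.
# Student s4 has no placements at all.
students <- data.frame(
  student   = c("s1", "s2", "s3", "s4"),
  programme = c("Nurse", "Medicine", "Physio", "Speech"),
  sharing   = factor(c(3, 2, 5, 4), levels = 1:5, ordered = TRUE)
)

# Add one logical indicator column per clinical placement.
clinics <- c("A", "B", "C", "D", "E")
for (cl in clinics) {
  attended <- placements$student[placements$placement == cl]
  students[[paste0("clinic", cl)]] <- students$student %in% attended
}

print(students)
```

A student with zero placements simply gets FALSE in every indicator column, so no rows are lost.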

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
***@inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Thierry Onkelinx
2018-06-27 15:13:48 UTC
Dear Bernard,

Using a factor both in the fixed effects and the random effects is
nonsense. I wrote a blog post on that topic:
https://www.muscardinus.be/2017/08/fixed-and-random/

Do all programmes have multiple clinics? If programme "physio" has only
clinic "A" and clinic "A" is only used in programme "physio", then it
is impossible to distinguish between the effect of the programme and
the effect of the clinic. Can you illustrate your design as a simple
table (e.g. programme as column and clinic as row)? Note that any HTML
formatting will be removed, so please use plain text formatting only.
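A rough way to check for that kind of confounding is to cross-tabulate clinic against programme. The sketch below uses invented data and assumed column names (`programme`, `clinic`); it flags clinic/programme pairs where the programme uses only that clinic and the clinic is used only by that programme.

```r
# Hypothetical design: one row per student-placement pair.
dat <- data.frame(
  programme = c("physio", "physio", "physio", "nurse", "nurse", "med"),
  clinic    = c("A", "A", "A", "B", "D", "D")
)

# Clinic as row, programme as column, as requested in the post.
design <- table(clinic = dat$clinic, programme = dat$programme)
print(design)

# TRUE where a clinic is used at least once within a programme.
uses <- design > 0

# A pair is fully confounded when the clinic appears in exactly one
# programme AND that programme uses exactly one clinic.
confounded <- uses & outer(rowSums(uses) == 1, colSums(uses) == 1, "&")
print(which(confounded, arr.ind = TRUE))
```

In this toy design only clinic "A" and programme "physio" are flagged; clinic "D" is shared between "med" and "nurse", so those effects are not completely aliased.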

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Post by Bernard Liew
Thanks Thierry,
I thought so. This leads to the next question: how to specify the random effects structure. The predictors are study programme (4 levels: physio, med, nurse, speech) and clinicA, clinicB, clinicC, clinicD, clinicE (each yes/no). Clinics A, B, C are nested in study programme (i.e. some clinics are only offered in some programmes), and D, E are crossed with programme (common to all programmes).
Is the following logical? I am using the ordinal package, but the formula syntax follows that of lmer.
clmm(as.factor(Sharing) ~ Programme + clinicA + clinicB + clinicC + clinicD + clinicE +
       (1 | Programme) + (1 | Programme:clinicA) + (1 | Programme:clinicB) + (1 | Programme:clinicC) +
       (1 | clinicD) + (1 | clinicE) + (1 | Programme:clinicD) + (1 | Programme:clinicE),
     data = dat,
     link = "logit",
     threshold = "equidistant")
Many thanks,
Bernard
Bernard Liew
2018-06-27 15:45:57 UTC
Great Thierry!

I am really exposing my ignorance, but I am also learning a lot. I hope the design matrix shows up well. The research question is simple: do study programme and clinical placements predict a student's degree of sharing (ordinal)?

Programme  A  B  C  D  E
Medicine   N  N  N  Y  Y
Medicine   N  N  N  Y  Y
Medicine   N  N  N  Y  Y
Nurse      Y  Y  N  Y  Y
Nurse      Y  Y  N  Y  Y
Nurse      Y  Y  N  Y  Y
Physio     Y  Y  Y  Y  Y
Physio     Y  Y  Y  Y  Y
Physio     Y  Y  Y  Y  Y
Speech     Y  Y  Y  Y  Y
Speech     Y  Y  Y  Y  Y
Speech     Y  Y  Y  Y  Y


Kind regards,
Bernard
Thierry Onkelinx
2018-06-28 09:04:28 UTC
Dear Bernard,

It seems like you need no random effects at all. Something like
clm(Sharing ~ Programme/Clinic) could do.
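A hedged sketch of a fixed-effects-only fit with the ordinal package follows. It is one possible reading of the suggestion using the wide TRUE/FALSE clinic columns rather than a single `Clinic` factor, so it is not necessarily the exact model meant above. The data are simulated, the response is pure noise, and the clinic availability per programme is an assumption taken from the design table posted earlier in the thread. The simulation ignores the limit of three placements per student.

```r
set.seed(1)
n <- 200

# Simulated students; programme labels follow the posted design table.
dat <- data.frame(
  Programme = factor(sample(c("Medicine", "Nurse", "Physio", "Speech"),
                            n, replace = TRUE))
)

# Assumed availability of clinics per programme (from the design table).
available <- list(
  Medicine = c("D", "E"),
  Nurse    = c("A", "B", "D", "E"),
  Physio   = c("A", "B", "C", "D", "E"),
  Speech   = c("A", "B", "C", "D", "E")
)

# One logical indicator per clinic; a student can only attend offered clinics.
for (cl in c("A", "B", "C", "D", "E")) {
  offered <- unname(vapply(as.character(dat$Programme),
                           function(p) cl %in% available[[p]], logical(1)))
  dat[[paste0("clinic", cl)]] <- offered & (runif(n) < 0.4)
}

# Ordinal response on a 5-point scale (random noise, illustration only).
dat$Sharing <- factor(sample(1:5, n, replace = TRUE), levels = 1:5, ordered = TRUE)

# Cumulative link model without random effects; skipped if ordinal is absent.
if (requireNamespace("ordinal", quietly = TRUE)) {
  fit <- ordinal::clm(
    Sharing ~ Programme + clinicA + clinicB + clinicC + clinicD + clinicE,
    data = dat, link = "logit"
  )
  print(summary(fit))
}
```

Because each student contributes a single row, there is no grouping structure left for a random effect to absorb, which is why plain `clm` rather than `clmm` is used here.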

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician
