Thierry Onkelinx
2018-06-06 14:24:15 UTC
Dear Nicolas,
The cbind(success, failure) notation is used when we aggregate (sum)
the numbers of successes and failures. The data-generating process
behind it is a series of trials, each of which results in either a
success or a failure. Hence the sums will be integers.
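A minimal sketch of that aggregation, using simulated data. The five group labels, the 20 trials per group and the success probability of 0.4 are made-up assumptions. Summing the 0/1 outcomes per group gives integer counts, and the binomial fit on cbind(success, failure) gives the same coefficients as the fit on the individual trials.

```r
# Hypothetical data: 5 groups x 20 Bernoulli trials (1 = success, 0 = failure)
set.seed(1)
trials <- data.frame(
  group   = rep(c("a", "b", "c", "d", "e"), each = 20),
  outcome = rbinom(100, size = 1, prob = 0.4)
)

# Aggregate (sum) the trials per group: the counts are integers by construction
success  <- tapply(trials$outcome, trials$group, sum)
n_trials <- tapply(trials$outcome, trials$group, length)
agg <- data.frame(
  group   = names(success),
  success = as.vector(success),
  failure = as.vector(n_trials - success)
)

# Binomial fit on the aggregated counts, using the cbind(success, failure) notation
fit_agg <- glm(cbind(success, failure) ~ group, family = binomial, data = agg)
# Binomial fit on the individual 0/1 trials
fit_ind <- glm(outcome ~ group, family = binomial, data = trials)

print(agg)
# Both parameterisations estimate the same coefficients
print(cbind(aggregated = coef(fit_agg), individual = coef(fit_ind)))
```

The aggregated form is only a compact way of writing the trial-level model, which is why it expects integer counts.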
We need to know more about your data-generating process in order to
give you sensible advice. Scaling the data by using different units is
wrong. Compare binom.test(c(1, 9)) and binom.test(c(1000, 9000)). Both
yield exactly the same proportion, but their confidence intervals are
very different. Why? Because c(1000, 9000) is much more informative
than c(1, 9).
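The two binom.test() calls above can be run side by side. This sketch only compares the estimated proportion and the width of the default 95% confidence interval of each call.

```r
small <- binom.test(c(1, 9))        # 1 success, 9 failures
large <- binom.test(c(1000, 9000))  # 1000 successes, 9000 failures

# Both calls estimate the same proportion of successes (0.1) ...
print(c(small = unname(small$estimate), large = unname(large$estimate)))

# ... but the confidence interval of the larger sample is far narrower
print(c(small = diff(small$conf.int), large = diff(large$conf.int)))
```

Multiplying the counts by 1000 does not change the proportion. It only pretends that 1000 times more trials were observed.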
Best regards,
ir. Thierry Onkelinx
Statisticus / Statistician
Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
***@inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be
///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////
Dear list,
I have a question regarding GLMMs for proportions fitted with lme4.
Such models are fitted using the binomial family. When I fit such models, I
use cbind(success, failure) on the left-hand side of the formula.
The problem arises when, for example, the data are durations (duration of
success and duration of failure) that are not whole numbers when expressed
in seconds.
When fitting a GLM, one can directly use a variable holding the proportion
of success on the left-hand side of the formula. When trying to do this for
a GLMM, I get the warning "non-integer #successes in a binomial glm!".
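The warning can be reproduced with a minimal sketch that uses an ordinary glm() and made-up non-integer durations in seconds. As far as I know, the message comes from the initialize step of the binomial() family, which lme4's glmer() also evaluates. It is raised when weights times response is not close to an integer. With the default weights of 1, each row's proportion is treated as successes out of a single trial.

```r
# Hypothetical durations (in seconds) of success and failure, with a covariate x
d <- data.frame(
  x       = c(1, 2, 3, 4, 5, 6),
  success = c(3.4, 5.1, 4.8, 8.2, 9.7, 12.3),
  failure = c(12.6, 10.2, 9.5, 7.1, 5.4, 4.9)
)
d$prop <- d$success / (d$success + d$failure)

# Fit the proportion directly and collect any warnings instead of printing them
msgs <- character(0)
fit <- withCallingHandlers(
  glm(prop ~ x, family = binomial, data = d),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(msgs)
```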
To avoid this, biologists I sometimes work with use milliseconds instead of
seconds for the durations of success and failure, but then the associated
tests are too powerful...
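A sketch of why switching from s to ms makes the tests "too powerful". The durations are made-up whole seconds, chosen so that no warning is raised. Multiplying every count by 1000 leaves the estimated coefficients unchanged but divides their standard errors by sqrt(1000). This is the same effect as in binom.test(c(1, 9)) versus binom.test(c(1000, 9000)).

```r
# Hypothetical whole-second durations of success and failure, with a covariate x
d <- data.frame(
  x       = c(1, 2, 3, 4, 5, 6),
  success = c(3, 5, 4, 8, 9, 12),
  failure = c(12, 10, 9, 7, 5, 4)
)

# Durations in seconds
fit_s  <- glm(cbind(success, failure) ~ x, family = binomial, data = d)
# The same durations in milliseconds (every count multiplied by 1000)
fit_ms <- glm(cbind(success * 1000, failure * 1000) ~ x,
              family = binomial, data = d)

# Standard errors of the coefficients
se <- function(fit) sqrt(diag(vcov(fit)))

print(rbind(seconds = coef(fit_s), milliseconds = coef(fit_ms)))  # same estimates
print(se(fit_s) / se(fit_ms))  # ratio is sqrt(1000), about 31.6
```

The choice of time unit alone changes the standard errors, and therefore every test and confidence interval, while the data stay the same.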
I cannot tell whether the warning message is a concern or not.
So my question is: do you think it is better to use ms instead of s, or to
use the proportion directly?
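For the "proportion directly" option, a sketch on the same kind of made-up data. Fitting the proportion with weights equal to the total duration is the same model as cbind(), so it depends on the time unit in exactly the same way. Fitting the proportion without weights counts each row as a single trial, which gives yet other standard errors. Neither variant removes the need to know the data-generating process.

```r
# Hypothetical whole-second durations of success and failure, with a covariate x
d <- data.frame(
  x       = c(1, 2, 3, 4, 5, 6),
  success = c(3, 5, 4, 8, 9, 12),
  failure = c(12, 10, 9, 7, 5, 4)
)
d$total <- d$success + d$failure
d$prop  <- d$success / d$total

# cbind() notation
fit_cbind <- glm(cbind(success, failure) ~ x, family = binomial, data = d)
# Proportion with weights = total duration: identical to the cbind() fit
fit_wprop <- glm(prop ~ x, family = binomial, weights = total, data = d)
# Unweighted proportion: each row counts as one trial (warning suppressed here)
fit_prop  <- suppressWarnings(glm(prop ~ x, family = binomial, data = d))

se <- function(fit) sqrt(diag(vcov(fit)))
print(rbind(cbind = se(fit_cbind), weighted = se(fit_wprop), unweighted = se(fit_prop)))
```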
Thanks in advance for any help you can provide.
Best regards
--
Nicolas Poulin
Research Engineer
Centre de Statistique de Strasbourg (CeStatS)
http://www.math.unistra.fr/CeStatS/
Tel: 03 68 85 0189
IRMA, UMR 7501
Université de Strasbourg and CNRS
7 rue René-Descartes
67084 Strasbourg Cedex
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models