Discussion:
[R-sig-ME] glmmTMB: how to calculate posterior prob. of structural zero?
Aaron Mackey
2018-10-03 18:18:51 UTC
Permalink
I'm happily using glmmTMB to fit zero-inflated count models with my data,
but I'd like to also know which zeroes in my data are more likely (or not)
to be structural vs. expected from the conditional distribution. I know how
to use Bayes formula to calculate the posterior, and predict(zinb,
type="zprob") gives me the prior probabilites for each data point being
structural or not (respecting the zero inflation part of the model), and
the likelihoods for the structural components are 1 (if the data point is a
zero) or 0 (if the data point is not a zero) -- but is there a way to
extract the likelihood for each zero data point with respect to the
conditional part of the model?

thanks,
-Aaron

[[alternative HTML version deleted]]
Ben Bolker
2018-10-03 19:01:09 UTC
Permalink
If you're fitting a zero-inflated Poisson model, it would be
especially easy because the zero probability is simply exp(-lambda), so
I believe the "posterior"(ish) probability that the zero is due to the
structural rather than conditional part of the model would be

P(zero|structural)/P(zero) =
P(zero|structural)/(P(zero_structural)+P(zero_conditional))

or

prob_struc <- function(model) {
strucprob <- predict(model, type="zprob")
condprob <- exp(-predict(model, type="conditional"))
return(strucprob/(strucprob+condprob))
}

For a negative binomial ("nbinom2", or quadratic parameterization) the
zero probability is (k/(k+mu))^k (I think ... e.g.
dnbinom(0,mu=0.5,size=0.5) is sqrt(0.5)). sigma(model) returns the
overdispersion parameter k, so a function as above but with

k <- sigma(model)
condprob <- (k/(k+predict(model,type="conditional")))^k

inserted should do the trick. (Please test these yourself and make
sure they are sensible before proceeding!)
Post by Aaron Mackey
I'm happily using glmmTMB to fit zero-inflated count models with my data,
but I'd like to also know which zeroes in my data are more likely (or not)
to be structural vs. expected from the conditional distribution. I know how
to use Bayes formula to calculate the posterior, and predict(zinb,
type="zprob") gives me the prior probabilites for each data point being
structural or not (respecting the zero inflation part of the model), and
the likelihoods for the structural components are 1 (if the data point is a
zero) or 0 (if the data point is not a zero) -- but is there a way to
extract the likelihood for each zero data point with respect to the
conditional part of the model?
thanks,
-Aaron
[[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Aaron Mackey
2018-10-05 20:07:15 UTC
Permalink
Thanks for this, it seems to provide sensible numbers; but just to confirm,
does predict(model, type="conditional") actually use the Y variable
observed count? Or does it generate the expected log count from the
provided X variables (as in the case when newdata is provided)?

-Aaron
Post by Ben Bolker
If you're fitting a zero-inflated Poisson model, it would be
especially easy because the zero probability is simply exp(-lambda), so
I believe the "posterior"(ish) probability that the zero is due to the
structural rather than conditional part of the model would be
P(zero|structural)/P(zero) =
P(zero|structural)/(P(zero_structural)+P(zero_conditional))
or
prob_struc <- function(model) {
strucprob <- predict(model, type="zprob")
condprob <- exp(-predict(model, type="conditional"))
return(strucprob/(strucprob+condprob))
}
For a negative binomial ("nbinom2", or quadratic parameterization) the
zero probability is (k/(k+mu))^k (I think ... e.g.
dnbinom(0,mu=0.5,size=0.5) is sqrt(0.5)). sigma(model) returns the
overdispersion parameter k, so a function as above but with
k <- sigma(model)
condprob <- (k/(k+predict(model,type="conditional")))^k
inserted should do the trick. (Please test these yourself and make
sure they are sensible before proceeding!)
Post by Aaron Mackey
I'm happily using glmmTMB to fit zero-inflated count models with my data,
but I'd like to also know which zeroes in my data are more likely (or
not)
Post by Aaron Mackey
to be structural vs. expected from the conditional distribution. I know
how
Post by Aaron Mackey
to use Bayes formula to calculate the posterior, and predict(zinb,
type="zprob") gives me the prior probabilites for each data point being
structural or not (respecting the zero inflation part of the model), and
the likelihoods for the structural components are 1 (if the data point
is a
Post by Aaron Mackey
zero) or 0 (if the data point is not a zero) -- but is there a way to
extract the likelihood for each zero data point with respect to the
conditional part of the model?
thanks,
-Aaron
[[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
Ben Bolker
2018-10-05 20:30:15 UTC
Permalink
Thanks for this, it seems to provide sensible numbers; but just to confirm, does predict(model, type="conditional") actually use the Y variable observed count? Or does it generate the expected log count from the provided X variables (as in the case when newdata is provided)?
It generates the expected *count* (not the log count) *of the
conditional part of the model only* (i.e. not including structural
zeros), based on the estimated parameters and a model matrix (either
the one from the model or a newly generated one, if newdata is
specified)
-Aaron
Post by Ben Bolker
If you're fitting a zero-inflated Poisson model, it would be
especially easy because the zero probability is simply exp(-lambda), so
I believe the "posterior"(ish) probability that the zero is due to the
structural rather than conditional part of the model would be
P(zero|structural)/P(zero) =
P(zero|structural)/(P(zero_structural)+P(zero_conditional))
or
prob_struc <- function(model) {
strucprob <- predict(model, type="zprob")
condprob <- exp(-predict(model, type="conditional"))
return(strucprob/(strucprob+condprob))
}
For a negative binomial ("nbinom2", or quadratic parameterization) the
zero probability is (k/(k+mu))^k (I think ... e.g.
dnbinom(0,mu=0.5,size=0.5) is sqrt(0.5)). sigma(model) returns the
overdispersion parameter k, so a function as above but with
k <- sigma(model)
condprob <- (k/(k+predict(model,type="conditional")))^k
inserted should do the trick. (Please test these yourself and make
sure they are sensible before proceeding!)
Post by Aaron Mackey
I'm happily using glmmTMB to fit zero-inflated count models with my data,
but I'd like to also know which zeroes in my data are more likely (or not)
to be structural vs. expected from the conditional distribution. I know how
to use Bayes formula to calculate the posterior, and predict(zinb,
type="zprob") gives me the prior probabilites for each data point being
structural or not (respecting the zero inflation part of the model), and
the likelihoods for the structural components are 1 (if the data point is a
zero) or 0 (if the data point is not a zero) -- but is there a way to
extract the likelihood for each zero data point with respect to the
conditional part of the model?
thanks,
-Aaron
[[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Aaron Mackey
2018-10-06 00:29:53 UTC
Permalink
Right -- so that's not what we need, is it? We need the probability of
having actually seen a zero, given the conditional model (regardless of
what the expected count might be). The probability of seeing a zero in the
other part of the mixture is 1.0, and the prior for both of the two models
is the mixture coefficient (and 1-coef). But we still need the likelihood
of the zero, given the conditional model, not the likelihood of seeing the
expected value.

-Aaron
Post by Aaron Mackey
Thanks for this, it seems to provide sensible numbers; but just to
confirm, does predict(model, type="conditional") actually use the Y
variable observed count? Or does it generate the expected log count from
the provided X variables (as in the case when newdata is provided)?
It generates the expected *count* (not the log count) *of the
conditional part of the model only* (i.e. not including structural
zeros), based on the estimated parameters and a model matrix (either
the one from the model or a newly generated one, if newdata is
specified)
Post by Aaron Mackey
-Aaron
Post by Ben Bolker
If you're fitting a zero-inflated Poisson model, it would be
especially easy because the zero probability is simply exp(-lambda), so
I believe the "posterior"(ish) probability that the zero is due to the
structural rather than conditional part of the model would be
P(zero|structural)/P(zero) =
P(zero|structural)/(P(zero_structural)+P(zero_conditional))
or
prob_struc <- function(model) {
strucprob <- predict(model, type="zprob")
condprob <- exp(-predict(model, type="conditional"))
return(strucprob/(strucprob+condprob))
}
For a negative binomial ("nbinom2", or quadratic parameterization) the
zero probability is (k/(k+mu))^k (I think ... e.g.
dnbinom(0,mu=0.5,size=0.5) is sqrt(0.5)). sigma(model) returns the
overdispersion parameter k, so a function as above but with
k <- sigma(model)
condprob <- (k/(k+predict(model,type="conditional")))^k
inserted should do the trick. (Please test these yourself and make
sure they are sensible before proceeding!)
Post by Aaron Mackey
I'm happily using glmmTMB to fit zero-inflated count models with my
data,
Post by Aaron Mackey
Post by Ben Bolker
Post by Aaron Mackey
but I'd like to also know which zeroes in my data are more likely (or
not)
Post by Aaron Mackey
Post by Ben Bolker
Post by Aaron Mackey
to be structural vs. expected from the conditional distribution. I
know how
Post by Aaron Mackey
Post by Ben Bolker
Post by Aaron Mackey
to use Bayes formula to calculate the posterior, and predict(zinb,
type="zprob") gives me the prior probabilites for each data point
being
Post by Aaron Mackey
Post by Ben Bolker
Post by Aaron Mackey
structural or not (respecting the zero inflation part of the model),
and
Post by Aaron Mackey
Post by Ben Bolker
Post by Aaron Mackey
the likelihoods for the structural components are 1 (if the data
point is a
Post by Aaron Mackey
Post by Ben Bolker
Post by Aaron Mackey
zero) or 0 (if the data point is not a zero) -- but is there a way to
extract the likelihood for each zero data point with respect to the
conditional part of the model?
thanks,
-Aaron
[[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]
Aaron Mackey
2018-10-06 00:33:31 UTC
Permalink
Nevermind, I think I see my mistake -- the conditional expectation gives
the new mean, and with the dispersion you get a P value for having seen
zero, given the mean and dispersion. Sorry to be daft. If I wanted the
probability of some value other than zero, then the formula would contain
that actual value. thanks again! -Aaron
Post by Aaron Mackey
Right -- so that's not what we need, is it? We need the probability of
having actually seen a zero, given the conditional model (regardless of
what the expected count might be). The probability of seeing a zero in the
other part of the mixture is 1.0, and the prior for both of the two models
is the mixture coefficient (and 1-coef). But we still need the likelihood
of the zero, given the conditional model, not the likelihood of seeing the
expected value.
-Aaron
Post by Aaron Mackey
Thanks for this, it seems to provide sensible numbers; but just to
confirm, does predict(model, type="conditional") actually use the Y
variable observed count? Or does it generate the expected log count from
the provided X variables (as in the case when newdata is provided)?
It generates the expected *count* (not the log count) *of the
conditional part of the model only* (i.e. not including structural
zeros), based on the estimated parameters and a model matrix (either
the one from the model or a newly generated one, if newdata is
specified)
Post by Aaron Mackey
-Aaron
Post by Ben Bolker
If you're fitting a zero-inflated Poisson model, it would be
especially easy because the zero probability is simply exp(-lambda), so
I believe the "posterior"(ish) probability that the zero is due to the
structural rather than conditional part of the model would be
P(zero|structural)/P(zero) =
P(zero|structural)/(P(zero_structural)+P(zero_conditional))
or
prob_struc <- function(model) {
strucprob <- predict(model, type="zprob")
condprob <- exp(-predict(model, type="conditional"))
return(strucprob/(strucprob+condprob))
}
For a negative binomial ("nbinom2", or quadratic parameterization) the
zero probability is (k/(k+mu))^k (I think ... e.g.
dnbinom(0,mu=0.5,size=0.5) is sqrt(0.5)). sigma(model) returns the
overdispersion parameter k, so a function as above but with
k <- sigma(model)
condprob <- (k/(k+predict(model,type="conditional")))^k
inserted should do the trick. (Please test these yourself and make
sure they are sensible before proceeding!)
Post by Aaron Mackey
I'm happily using glmmTMB to fit zero-inflated count models with my
data,
Post by Aaron Mackey
Post by Ben Bolker
Post by Aaron Mackey
but I'd like to also know which zeroes in my data are more likely
(or not)
Post by Aaron Mackey
Post by Ben Bolker
Post by Aaron Mackey
to be structural vs. expected from the conditional distribution. I
know how
Post by Aaron Mackey
Post by Ben Bolker
Post by Aaron Mackey
to use Bayes formula to calculate the posterior, and predict(zinb,
type="zprob") gives me the prior probabilites for each data point
being
Post by Aaron Mackey
Post by Ben Bolker
Post by Aaron Mackey
structural or not (respecting the zero inflation part of the model),
and
Post by Aaron Mackey
Post by Ben Bolker
Post by Aaron Mackey
the likelihoods for the structural components are 1 (if the data
point is a
Post by Aaron Mackey
Post by Ben Bolker
Post by Aaron Mackey
zero) or 0 (if the data point is not a zero) -- but is there a way to
extract the likelihood for each zero data point with respect to the
conditional part of the model?
thanks,
-Aaron
[[alternative HTML version deleted]]
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[[alternative HTML version deleted]]

Loading...