Moult data likelihoods and model priors • moultmcmc

The moultmcmc package implements the regression models outlined in Les G. Underhill and Zucchini (1988) and L. G. Underhill, Zucchini, and Summers (1990). In their notation¹, samples consist of $I$ pre-moult birds, $J$ birds in active moult, and $K$ post-moult birds. Birds in each category are observed on days $t = t_1,\ldots,t_I$ ; $u = u_1,\ldots,u_J$ ; $v = v_1,\ldots,v_K$ , respectively. Moult scores for actively moulting birds, where available, are encoded as $y = y_1,\ldots,y_J$ .

Each moult state has a probability of occurrence $\begin{aligned} P(t) &= \Pr\{Y(t)=0\}&= &1-F_T(t)\\ Q(t) &= \Pr\{0<Y(t)<1\}& =&F_T(t)-F_T(t-\tau)\\ R(t) &= \Pr\{Y(t)=1\}&= &1-F_T(t-\tau)\\ \end{aligned}$ Further, assuming a linear progression of the moult indices over time, the probability density of a particular moult score at time $t$ is $f_Y(t)(y)=\tau f_T(t-y\tau),\quad0 < y < 1,$ In moultmcmc the unobserved start date of the study population is assumed to follow a normal distribution with mean $\mu$ and standard deviation $\sigma$ , such that $F_T(t)=\Phi\left(\frac{t-\mu}{\sigma}\right)$ where $\Phi$ is the standard normal distribution function and $f_T(t) = \phi(t) = \frac{1}{\sqrt{2\pi}}\exp\frac{-t^2}{2}$ .

We further assume that $F_T(t)$ has $p$ parameters $\mathbf{\theta} = \theta_1, \theta_2, \ldots, \theta_p$ and for convenience the start date $\mu$ , duration $\tau$ , and population standard deviation of moult $\sigma$ will be elements of $\mathbf{\theta}$ .

Likelihoods

Type 1

Type 1 data consist of observations of categorical moult state (pre-moult, active moult, post-moult) and sampling is representative in all three categories. The likelihood of these observations is $\mathcal{L}(\boldsymbol\theta,t,u,v) = \prod_{i=1}^IP(t_i)\prod_{j = 1}^JQ(u_j)\prod_{k=1}^KR(v_k).$

Type 2

Type 2 data consist of observations of birds in all three moult states (pre-moult, active moult, post-moult). Sampling is representative for all three categories, and for actively moulting birds a sufficiently linear moult index $y$ (e.g. percent feather mass grown) is known. The likelihood of these observations is $\mathcal{L}(\boldsymbol\theta,t,u,y,v) = \prod_{i=1}^IP(t_i)\prod_{j = 1}^Jq(u_j,y_j)\prod_{k=1}^KR(v_k),$ where $q(u_j,y_j) = \tau f_T(u_j-y_j\tau).$

Lumped Type 2

Lumped type 2 data consist of observations of birds in all three moult states (pre-moult, active moult, post-moult), but where the pre-moult and post-moult states cannot be distinguished from each other, yielding $I$ observations of dates $t_i$ on which non-moulting birds were observed. Sampling is representative for all three categories, and for actively moulting birds a sufficiently linear moult index $y$ (e.g. percent feather mass grown) is known. The likelihood of these observations is $\mathcal{L}(\boldsymbol\theta,t,u,y) = \prod_{i=1}^IPR(t_i)\prod_{j = 1}^Jq(u_j,y_j),$ where $q(u_j,y_j) = \tau f_T(u_j-y_j\tau)$ and $PR(t_i)=P(t_i)+R(t_i).$

Type 3

Type 3 data consist of observations of actively moulting birds only, and a sufficiently linear moult index $y$ (e.g. percent feather mass grown) is known for each individual. The likelihood of these observations is $\mathcal{L}(\boldsymbol\theta,u,y) = \prod_{j = 1}^J\frac{q(u_j,y_j)}{Q(u_j)}.$

Type 4

Type 4 data consist of observations of birds in active moult and post-moult only. Sampling is representative for these two categories, and for actively moulting birds a sufficiently linear moult index $y$ (e.g. percent feather mass grown) is known. The likelihood of these observations is $\mathcal{L}(\boldsymbol\theta,u,y,v) = \prod_{j = 1}^J\frac{q(u_j,y_j)}{1-P(u_j)}\prod_{k=1}^K\frac{R(v_k)}{1-P(v_k)}.$

Type 5

Type 5 data consist of observations of birds in pre-moult and active moult. Sampling is representative for these two categories, and for actively moulting birds a sufficiently linear moult index $y$ (e.g. percent feather mass grown) is known. The likelihood of these observations is $\mathcal{L}(\boldsymbol\theta,t,u,y) = \prod_{i=1}^I\frac{P(t_i)}{1-R(t_i)}\prod_{j = 1}^J\frac{q(u_j,y_j)}{1-R(u_j)}.$

Type 1 + 2

As outlined in Les G. Underhill and Zucchini (1988) estimates can also be derived from mixtures of data types. Type 1 + 2 data consist of observations of birds in all three moult states (pre-moult, active moult, post-moult). Sampling is representative for all three categories, but a sufficiently linear moult index $y$ (e.g. percent feather mass grown) is known only for some of the actively moulting birds. This means the sample consist of $I$ pre-moult birds, $J$ birds in active moult with known indices, $L$ birds in active moult without known indices but known capture dates $u'=u'_l,\ldots,u'_L$ , and $K$ post-moult birds. The likelihood of these observations is $\mathcal{L}(\boldsymbol\theta,t,u,y,u',v) = \prod_{i=1}^IP(t_i)\prod_{j = 1}^Jq(u_j,y_j)\prod_{l = 1}^{L}Q(u'_{l})\prod_{k=1}^KR(v_k),$

Recapture models

moultmcmc currently implements a recaptures model which allows for heterogeneity in start dates $\mu$ but assumes a common moult duration $\tau$ . When repeat observations are available an individual’s start date $\mu_n$ then becomes

$\begin{equation} \mu_n = \mu_0 + \mu'_n + \mathbf{x}_\mu\boldsymbol{\beta}_\mu \end{equation}$

where $\boldsymbol{x}_\mu$ is a row vector containing the values of individual-specific predictors (in the same format as $\boldsymbol{X}_\mu$ ), and $\mu'_n$ is an individual-level random effect intercept

$\begin{equation} \mu'_n \sim \mathrm{Normal}(0,\sigma_n) \end{equation}$ where $\sigma_n$ is the individual-specific standard deviation. We can then exploit the linearity assumption and treat observed moult scores as $\begin{equation} y_{ni} \sim \mathrm{Normal}(\mu_0 + \mu'_n + \tau * u_{ni}, \sigma_\tau) \end{equation}$ where $\sigma_\tau$ captures any unmodelled variance in $\tau$ as well as any measurement error in $y$ .

The likelihood for the Type 3-like model for a sample of $J$ birds in active moult without repeated observations, and $N$ birds in active moult with a total of $M$ repeated observations $u'=u'_m,\ldots,u'_M$ and $y'=y'_m,\ldots,y'_M$ then is

$\mathcal{L}(\boldsymbol\theta,u,y,u',y') = \prod_{j = 1}^J\frac{q(u_j,y_j)}{Q(u_j)}\prod_{m = 1}^{M}f(u'_m,y'_m)\prod_{n=1}^N\phi(\mu'_n|0,\sigma_n),$ where $f(u'_m,y'_m)$ follows from above.

Priors

Users have a choice between two set of priors for the intercept terms of the linear predictors on the start date $\mu$ , the duration $\tau$ , and the population standard deviation of the start date $\sigma$ , respectively. By default flat priors are used for $\mu_0$ and $\tau_0$ and a vaguely informative normal prior on $\ln(\sigma_0)$

$\mu_0 \sim \mathrm{Uniform(0,366)}$
$\tau_0 \sim \mathrm{Uniform(0,366)}$
$\ln(\sigma_0) \sim \mathrm{Normal(0,5)}$

In some cases the models sample poorly with these priors, and better convergence can be achieved by setting the argument flat_prior = FALSE. In this case vaguely informative truncated normal priors are used for $\mu_0$ and $\tau_0$ :

$\mu_0 \sim \mathrm{TruncNormal(150,50,0,366)}$
$\tau_0 \sim \mathrm{TruncNormal(100,30,0,366)}$

These priors work well for data from passerines in seasonal environments, i.e. when the sampling occasion data is encoded as days from mid-winter.

For any additional regression coefficients an improper flat prior is used as a default.

References

Underhill, L. G., W. Zucchini, and R. W. Summers. 1990. “A Model for Avian Primary Moult-Data Types Based on Migration Strategies and an Example Using the Redshank Tringa Totanus.” Ibis 132: 118–23. https://doi.org/10.1111/j.1474-919x.1990.tb01024.x.

Underhill, Les G., and Walter Zucchini. 1988. “A Model for Avian Primary Moult.” Ibis 130: 358–72. https://doi.org/10.1111/j.1474-919x.1988.tb00993.x.

Note that in Les G. Underhill and Zucchini (1988) the variable $t$ is doubly defined. It is both a generic variable of time in the model derivation, and denotes the sample dates of pre-moult birds in the data likelihoods↩︎