`moult-likelihoods.Rmd`

The `moultmcmc` package implements the regression models outlined in Underhill and Zucchini (1988) and Underhill, Zucchini, and Summers (1990). In their notation^{1}, samples consist of \(I\) pre-moult birds, \(J\) birds in active moult, and \(K\) post-moult birds. Birds in each category are observed on days \(t = t_1,\ldots,t_I\); \(u = u_1,\ldots,u_J\); \(v = v_1,\ldots,v_K\), respectively. Moult scores for actively moulting birds, where available, are encoded as \(y = y_1,\ldots,y_J\).

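Let \(Y(t)\) denote the moult index of a bird at time \(t\), with \(Y(t)=0\) before moult starts and \(Y(t)=1\) once moult is complete, let \(F_T\) be the distribution function of the moult start date \(T\), and let \(\tau\) be the moult duration. Each moult state then has a probability of occurrence \[
\begin{aligned}
P(t) &= \Pr\{Y(t)=0\} &&= 1-F_T(t)\\
Q(t) &= \Pr\{0<Y(t)<1\} &&= F_T(t)-F_T(t-\tau)\\
R(t) &= \Pr\{Y(t)=1\} &&= F_T(t-\tau)
\end{aligned}
\] so that \(P(t)+Q(t)+R(t)=1\). Further, assuming a linear progression of the moult index over time, the probability density of a particular moult score \(y\) at time \(t\) is \[
f_{Y(t)}(y)=\tau f_T(t-y\tau),\quad 0 < y < 1.
\]

In `moultmcmc` the unobserved start date of the study population is assumed to follow a normal distribution with mean \(\mu\) and standard deviation \(\sigma\), such that \[F_T(t)=\Phi\left(\frac{t-\mu}{\sigma}\right)\] and \[f_T(t) = \frac{1}{\sigma}\phi\left(\frac{t-\mu}{\sigma}\right),\] where \(\Phi\) is the standard normal distribution function and \(\phi(z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right)\) is the standard normal density.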
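As a minimal sketch, the state probabilities and the score density under a normally distributed start date can be written in base R. The function names (`p_pre`, `p_active`, `p_post`, `q_dens`) and the parameter values are illustrative assumptions, not part of the `moultmcmc` API. The sketch takes \(R(t) = F_T(t-\tau)\), so that the three state probabilities sum to one.

```r
# Hypothetical helpers for illustration; not functions exported by moultmcmc.
# mu = mean start date, tau = moult duration, sigma = SD of start dates.
p_pre    <- function(t, mu, tau, sigma) pnorm(t, mu, sigma, lower.tail = FALSE)          # P(t)
p_active <- function(t, mu, tau, sigma) pnorm(t, mu, sigma) - pnorm(t - tau, mu, sigma)  # Q(t)
p_post   <- function(t, mu, tau, sigma) pnorm(t - tau, mu, sigma)                        # R(t)
q_dens   <- function(t, y, mu, tau, sigma) tau * dnorm(t - y * tau, mu, sigma)           # tau * f_T(t - y * tau)

# Made-up parameter values: moult starts around day 150 and lasts 100 days.
mu <- 150; tau <- 100; sigma <- 20
days <- c(100, 150, 200, 250, 300)

state_probs <- cbind(
  pre    = p_pre(days, mu, tau, sigma),
  active = p_active(days, mu, tau, sigma),
  post   = p_post(days, mu, tau, sigma)
)
rownames(state_probs) <- days
round(state_probs, 3)
rowSums(state_probs)  # the three states partition the population on every day

# Integrating the score density over 0 < y < 1 should recover Q(t) numerically.
q_area <- integrate(function(y) q_dens(200, y, mu, tau, sigma), 0, 1)$value
c(integral = q_area, Q = p_active(200, mu, tau, sigma))
```

The last two lines check numerically that \(\int_0^1 \tau f_T(t-y\tau)\,dy = Q(t)\), which is what links the score density to the probability of being in active moult.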

We further assume that \(F_T(t)\) has \(p\) parameters \(\boldsymbol\theta = (\theta_1, \theta_2, \ldots, \theta_p)\) and, for convenience, that the mean start date \(\mu\), the duration \(\tau\), and the population standard deviation of the start date \(\sigma\) are elements of \(\boldsymbol\theta\).

Type 1 data consist of observations of categorical moult state (pre-moult, active moult, post-moult) and sampling is representative in all three categories. The likelihood of these observations is \[ \mathcal{L}(\boldsymbol\theta,t,u,v) = \prod_{i=1}^IP(t_i)\prod_{j = 1}^JQ(u_j)\prod_{k=1}^KR(v_k). \]
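The Type 1 likelihood can be sketched on the log scale as below. `loglik_type1` is a hypothetical helper written for this illustration (it is not how `moultmcmc` evaluates the model internally), the capture days are made up, and \(R(t)\) is taken to be \(F_T(t-\tau)\).

```r
# Hypothetical Type 1 log-likelihood; t, u, v are capture days of pre-moult,
# actively moulting, and post-moult birds respectively.
loglik_type1 <- function(mu, tau, sigma, t, u, v) {
  log_P <- pnorm(t, mu, sigma, lower.tail = FALSE, log.p = TRUE)  # log(1 - F_T(t))
  log_Q <- log(pnorm(u, mu, sigma) - pnorm(u - tau, mu, sigma))   # log(F_T(u) - F_T(u - tau))
  log_R <- pnorm(v - tau, mu, sigma, log.p = TRUE)                # log F_T(v - tau)
  sum(log_P) + sum(log_Q) + sum(log_R)
}

# Made-up capture days, for illustration only.
t <- c(100, 120, 135)
u <- c(160, 190, 220)
v <- c(270, 290, 310)

ll_plausible   <- loglik_type1(mu = 150, tau = 100, sigma = 20, t, u, v)
ll_implausible <- loglik_type1(mu = 200, tau = 100, sigma = 20, t, u, v)
c(plausible = ll_plausible, implausible = ll_implausible)
```

A start date of 150 is compatible with all nine toy observations, whereas a start date of 200 makes the early active and post-moult captures unlikely, so the first log-likelihood is the larger of the two.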

Type 2 data consist of observations of birds in all three moult states (pre-moult, active moult, post-moult). Sampling is representative for all three categories, and for actively moulting birds a sufficiently linear moult index \(y\) (e.g. percent feather mass grown) is known. The likelihood of these observations is \[ \mathcal{L}(\boldsymbol\theta,t,u,y,v) = \prod_{i=1}^IP(t_i)\prod_{j = 1}^Jq(u_j,y_j)\prod_{k=1}^KR(v_k), \] where \(q(u_j,y_j) = \tau f_T(u_j-y_j\tau).\)
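To illustrate how the Type 2 likelihood identifies the parameters, the sketch below simulates representative samples and maximises the log-likelihood with `optim()`. Everything here is an assumption made for illustration: the helper `loglik_type2`, the simulation settings (start date 150, duration 100, standard deviation 20), and the use of maximum likelihood rather than the MCMC sampling that `moultmcmc` performs.

```r
# Hypothetical Type 2 log-likelihood (not the moultmcmc implementation).
loglik_type2 <- function(mu, tau, sigma, t, u, y, v) {
  log_P <- pnorm(t, mu, sigma, lower.tail = FALSE, log.p = TRUE)  # pre-moult
  log_q <- log(tau) + dnorm(u - y * tau, mu, sigma, log = TRUE)   # active, with score
  log_R <- pnorm(v - tau, mu, sigma, log.p = TRUE)                # post-moult
  sum(log_P) + sum(log_q) + sum(log_R)
}

# Simulate birds with normal start dates and linear moult progression.
set.seed(42)
n     <- 300
start <- rnorm(n, mean = 150, sd = 20)  # individual start dates
day   <- runif(n, min = 50, max = 350)  # capture days, independent of moult state
score <- (day - start) / 100            # linear progress over a 100-day moult

active <- score > 0 & score < 1
t <- day[score <= 0]                    # pre-moult captures
u <- day[active]; y <- score[active]    # active moult captures and scores
v <- day[score >= 1]                    # post-moult captures

# Optimise over (mu, log tau, log sigma) so tau and sigma stay positive.
nll <- function(par) -loglik_type2(par[1], exp(par[2]), exp(par[3]), t, u, y, v)
fit <- optim(c(140, log(80), log(30)), nll,
             control = list(parscale = c(100, 1, 1), maxit = 2000))
est <- c(mu = fit$par[1], tau = exp(fit$par[2]), sigma = exp(fit$par[3]))
round(est, 1)
```

With a few hundred birds the estimates should land close to the simulated values; the log transform is just one convenient way to keep the duration and standard deviation positive.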

Lumped type 2 data consist of observations of birds in all three moult states (pre-moult, active moult, post-moult), but where the pre-moult and post-moult states cannot be distinguished from each other, yielding \(I\) observations of dates \(t_i\) on which non-moulting birds were observed. Sampling is representative for all three categories, and for actively moulting birds a sufficiently linear moult index \(y\) (e.g. percent feather mass grown) is known. The likelihood of these observations is \[ \mathcal{L}(\boldsymbol\theta,t,u,y) = \prod_{i=1}^IPR(t_i)\prod_{j = 1}^Jq(u_j,y_j), \] where \(q(u_j,y_j) = \tau f_T(u_j-y_j\tau)\) and \(PR(t_i)=P(t_i)+R(t_i).\)
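Because the three state probabilities sum to one, \(PR(t) = P(t) + R(t) = 1 - Q(t)\), which gives a compact way to code the lumped likelihood. The helper name `loglik_type2_lumped` and the toy data are hypothetical.

```r
# Hypothetical lumped Type 2 log-likelihood; t holds capture days of all
# non-moulting birds (pre- and post-moult pooled).
loglik_type2_lumped <- function(mu, tau, sigma, t, u, y) {
  Q_t    <- pnorm(t, mu, sigma) - pnorm(t - tau, mu, sigma)
  log_PR <- log1p(-Q_t)                                           # log(P + R) = log(1 - Q)
  log_q  <- log(tau) + dnorm(u - y * tau, mu, sigma, log = TRUE)  # active, with score
  sum(log_PR) + sum(log_q)
}

# Made-up data: four non-moulting birds and three scored active birds.
t <- c(100, 120, 280, 300)
u <- c(160, 190, 220)
y <- c(0.1, 0.4, 0.7)
ll_lumped <- loglik_type2_lumped(mu = 150, tau = 100, sigma = 20, t, u, y)

# Direct evaluation of P(t) + R(t) for comparison.
PR_direct <- pnorm(t, 150, 20, lower.tail = FALSE) + pnorm(t - 100, 150, 20)
ll_direct <- sum(log(PR_direct)) + sum(log(100 * dnorm(u - y * 100, 150, 20)))
all.equal(ll_lumped, ll_direct)
```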

Type 3 data consist of observations of actively moulting birds only, and a sufficiently linear moult index \(y\) (e.g. percent feather mass grown) is known for each individual. The likelihood of these observations is \[ \mathcal{L}(\boldsymbol\theta,u,y) = \prod_{j = 1}^J\frac{q(u_j,y_j)}{Q(u_j)}. \]
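In the Type 3 likelihood each term \(q(u_j,y_j)/Q(u_j)\) is the density of the moult score conditional on the bird being in active moult on day \(u_j\). The sketch below codes this and checks numerically that the conditional density integrates to one; `loglik_type3` and the example values are assumptions for illustration.

```r
# Hypothetical Type 3 log-likelihood: only actively moulting birds are sampled.
loglik_type3 <- function(mu, tau, sigma, u, y) {
  log_q <- log(tau) + dnorm(u - y * tau, mu, sigma, log = TRUE)  # joint density of score
  log_Q <- log(pnorm(u, mu, sigma) - pnorm(u - tau, mu, sigma))  # Pr(active on day u)
  sum(log_q - log_Q)
}

# Made-up capture days and scores.
u <- c(160, 190, 220)
y <- c(0.1, 0.4, 0.7)
ll3 <- loglik_type3(mu = 150, tau = 100, sigma = 20, u, y)

# Conditional score density for a bird caught in active moult on day 190.
cond_dens <- function(y) {
  sapply(y, function(yi) exp(loglik_type3(150, 100, 20, u = 190, y = yi)))
}
cond_area <- integrate(cond_dens, 0, 1)$value
cond_area  # should be numerically close to 1
```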

Type 4 data consist of observations of birds in active moult and post-moult only. Sampling is representative for these two categories, and for actively moulting birds a sufficiently linear moult index \(y\) (e.g. percent feather mass grown) is known. The likelihood of these observations is \[ \mathcal{L}(\boldsymbol\theta,u,y,v) = \prod_{j = 1}^J\frac{q(u_j,y_j)}{1-P(u_j)}\prod_{k=1}^K\frac{R(v_k)}{1-P(v_k)}. \]

Type 5 data consist of observations of birds in pre-moult and active moult. Sampling is representative for these two categories, and for actively moulting birds a sufficiently linear moult index \(y\) (e.g. percent feather mass grown) is known. The likelihood of these observations is \[ \mathcal{L}(\boldsymbol\theta,t,u,y) = \prod_{i=1}^I\frac{P(t_i)}{1-R(t_i)}\prod_{j = 1}^J\frac{q(u_j,y_j)}{1-R(u_j)}. \]
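The Type 4 and Type 5 likelihoods differ only in which state is excluded from sampling, and therefore in the normalising probability. A minimal sketch, with hypothetical helper names and made-up data, and with \(1-P(t) = F_T(t)\) and \(1-R(t) = 1-F_T(t-\tau)\):

```r
# Shared hypothetical helper: log of q(u, y) = tau * f_T(u - y * tau).
log_q <- function(u, y, mu, tau, sigma) {
  log(tau) + dnorm(u - y * tau, mu, sigma, log = TRUE)
}

# Type 4: pre-moult birds are not sampled, so condition on 1 - P = F_T(.).
loglik_type4 <- function(mu, tau, sigma, u, y, v) {
  sum(log_q(u, y, mu, tau, sigma) - pnorm(u, mu, sigma, log.p = TRUE)) +
    sum(pnorm(v - tau, mu, sigma, log.p = TRUE) - pnorm(v, mu, sigma, log.p = TRUE))
}

# Type 5: post-moult birds are not sampled, so condition on 1 - R.
loglik_type5 <- function(mu, tau, sigma, t, u, y) {
  log_not_R <- function(x) pnorm(x - tau, mu, sigma, lower.tail = FALSE, log.p = TRUE)
  sum(pnorm(t, mu, sigma, lower.tail = FALSE, log.p = TRUE) - log_not_R(t)) +
    sum(log_q(u, y, mu, tau, sigma) - log_not_R(u))
}

# Made-up data.
t <- c(100, 120, 135)
u <- c(160, 190, 220); y <- c(0.1, 0.4, 0.7)
v <- c(270, 290, 310)

ll4 <- loglik_type4(150, 100, 20, u, y, v)
ll5 <- loglik_type5(150, 100, 20, t, u, y)
c(type4 = ll4, type5 = ll5)
```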

As outlined in Underhill and Zucchini (1988), estimates can also be derived from mixtures of data types. Type 1 + 2 data consist of observations of birds in all three moult states (pre-moult, active moult, post-moult). Sampling is representative for all three categories, but a sufficiently linear moult index \(y\) (e.g. percent feather mass grown) is known only for some of the actively moulting birds. This means the sample consists of \(I\) pre-moult birds, \(J\) birds in active moult with known indices, \(L\) birds in active moult without known indices but with known capture dates \(u'=u'_1,\ldots,u'_L\), and \(K\) post-moult birds. The likelihood of these observations is \[ \mathcal{L}(\boldsymbol\theta,t,u,y,u',v) = \prod_{i=1}^IP(t_i)\prod_{j = 1}^Jq(u_j,y_j)\prod_{l = 1}^{L}Q(u'_{l})\prod_{k=1}^KR(v_k). \]
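The mixed likelihood simply adds a \(Q\) term for every actively moulting bird without a score. In the sketch below, `loglik_type12`, the argument `u_noscore` (standing in for \(u'\)), and the data are hypothetical.

```r
# Hypothetical Type 1 + 2 log-likelihood; u_noscore holds capture days of
# actively moulting birds whose moult index was not recorded.
loglik_type12 <- function(mu, tau, sigma, t, u, y, u_noscore, v) {
  log_P  <- pnorm(t, mu, sigma, lower.tail = FALSE, log.p = TRUE)
  log_q  <- log(tau) + dnorm(u - y * tau, mu, sigma, log = TRUE)
  log_Qn <- log(pnorm(u_noscore, mu, sigma) - pnorm(u_noscore - tau, mu, sigma))
  log_R  <- pnorm(v - tau, mu, sigma, log.p = TRUE)
  sum(log_P) + sum(log_q) + sum(log_Qn) + sum(log_R)
}

# Made-up data.
t <- c(100, 120, 135)
u <- c(160, 190, 220); y <- c(0.1, 0.4, 0.7)
u_noscore <- c(175, 205)
v <- c(270, 290, 310)

ll_mixed <- loglik_type12(150, 100, 20, t, u, y, u_noscore, v)

# With no unscored birds, the empty product contributes zero on the log scale
# and the expression reduces to the Type 2 log-likelihood.
ll_scored_only <- loglik_type12(150, 100, 20, t, u, y, numeric(0), v)
ll_mixed - ll_scored_only  # contribution of the two unscored birds
```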

`moultmcmc` currently implements a recaptures model which allows for heterogeneity in start dates \(\mu\) but assumes a common moult duration \(\tau\). When repeat observations are available, an individual's start date \(\mu_n\) then becomes

\[\begin{equation} \mu_n = \mu_0 + \mu'_n + \boldsymbol{x}_\mu\boldsymbol{\beta}_\mu \end{equation}\]

where \(\boldsymbol{x}_\mu\) is a row vector containing the values of individual-specific predictors (in the same format as \(\boldsymbol{X}_\mu\)), and \(\mu'_n\) is an individual-level random effect intercept,

\[\begin{equation} \mu'_n \sim \mathrm{Normal}(0,\sigma_n) \end{equation}\] where \(\sigma_n\) is the individual-specific standard deviation. We can then exploit the linearity assumption and treat observed moult scores as \[\begin{equation} y_{ni} \sim \mathrm{Normal}(\mu_0 + \mu'_n + \tau u_{ni}, \sigma_\tau) \end{equation}\] where \(\sigma_\tau\) captures any unmodelled variance in \(\tau\) as well as any measurement error in \(y\).

The likelihood for the Type 3-like model for a sample of \(J\) birds in active moult without repeated observations, and \(N\) birds in active moult with a total of \(M\) repeated observations \(u'=u'_1,\ldots,u'_M\) and \(y'=y'_1,\ldots,y'_M\) then is

\[ \mathcal{L}(\boldsymbol\theta,u,y,u',y') = \prod_{j = 1}^J\frac{q(u_j,y_j)}{Q(u_j)}\prod_{m = 1}^{M}f(u'_m,y'_m)\prod_{n=1}^N\phi(\mu'_n|0,\sigma_n), \] where \(f(u'_m,y'_m)\) is the normal density of the repeated moult scores given above.

Users have a choice between two sets of priors for the intercept terms of the linear predictors on the start date \(\mu\), the duration \(\tau\), and the population standard deviation of the start date \(\sigma\), respectively. By default, flat priors are used for \(\mu_0\) and \(\tau_0\), and a vaguely informative normal prior is used for \(\ln(\sigma_0)\):

\(\mu_0 \sim \mathrm{Uniform(0,366)}\)

\(\tau_0 \sim \mathrm{Uniform(0,366)}\)

\(\ln(\sigma_0) \sim \mathrm{Normal(0,5)}\)

In some cases the models sample poorly with these priors, and better convergence can be achieved by setting the argument `flat_prior = FALSE`. In this case vaguely informative truncated normal priors are used for \(\mu_0\) and \(\tau_0\):

\(\mu_0 \sim \mathrm{TruncNormal(150,50,0,366)}\)

\(\tau_0 \sim \mathrm{TruncNormal(100,30,0,366)}\)

These priors work well for data from passerines in seasonal environments when sampling dates are encoded as days from mid-winter.

For any additional regression coefficients an improper flat prior is used as a default.
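The two sets of intercept priors can be compared by writing out their joint log density. This is a sketch under stated assumptions: `dtruncnorm_log` and `log_prior` are hypothetical helpers, \(\mathrm{Normal}(0,5)\) is read as mean 0 and standard deviation 5, and \(\mathrm{TruncNormal}(m,s,a,b)\) is read as a normal with mean \(m\) and standard deviation \(s\) truncated to \([a,b]\). The `flat_prior` argument of the helper only mirrors the name of the package argument.

```r
# Log density of a normal distribution truncated to [lower, upper].
dtruncnorm_log <- function(x, mean, sd, lower, upper) {
  inside     <- x >= lower & x <= upper
  norm_const <- log(pnorm(upper, mean, sd) - pnorm(lower, mean, sd))
  ifelse(inside, dnorm(x, mean, sd, log = TRUE) - norm_const, -Inf)
}

# Joint log prior density of the three intercepts (hypothetical helper).
log_prior <- function(mu0, tau0, log_sigma0, flat_prior = TRUE) {
  lp_sigma <- dnorm(log_sigma0, mean = 0, sd = 5, log = TRUE)
  if (flat_prior) {
    lp_mu  <- dunif(mu0, 0, 366, log = TRUE)
    lp_tau <- dunif(tau0, 0, 366, log = TRUE)
  } else {
    lp_mu  <- dtruncnorm_log(mu0, 150, 50, 0, 366)
    lp_tau <- dtruncnorm_log(tau0, 100, 30, 0, 366)
  }
  lp_mu + lp_tau + lp_sigma
}

lp_flat     <- log_prior(150, 100, log(20), flat_prior = TRUE)
lp_informed <- log_prior(150, 100, log(20), flat_prior = FALSE)
c(flat = lp_flat, informed = lp_informed)

# The truncated normal should integrate to 1 over its support.
trunc_area <- integrate(function(x) exp(dtruncnorm_log(x, 150, 50, 0, 366)), 0, 366)$value
trunc_area
```

At the centre of the informative priors the truncated normal densities exceed the uniform density \(1/366\), so `lp_informed` is larger than `lp_flat` there; far from days 150 and 100 the ordering reverses.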

Underhill, Les G., and Walter Zucchini. 1988. “A Model for Avian Primary Moult.” *Ibis* 130: 358–72. https://doi.org/10.1111/j.1474-919x.1988.tb00993.x.

Underhill, L. G., W. Zucchini, and R. W. Summers. 1990. “A Model for Avian Primary Moult-Data Types Based on Migration Strategies and an Example Using the Redshank *Tringa Totanus*.” *Ibis* 132: 118–23. https://doi.org/10.1111/j.1474-919x.1990.tb01024.x.

^{1} Note that in Underhill and Zucchini (1988) the variable \(t\) is doubly defined: it is both a generic time variable in the model derivation and the sample dates of pre-moult birds in the data likelihoods.