5 Categorical Random Interactions

In Chapter 4 we looked at a simple random-effect specification where a single effect is associated with each level of a categorical predictor and the effects are assumed to be identically and independently distributed. In the next three chapters we will cover more complicated random-effect specifications. In this Chapter we will start by exploring models where an interaction is formed between two categorical predictors, one which (usually) has few levels with main effects fitted as fixed and the other which (usually) has many levels and would normally be fitted as random. We will also cover related models where the random effects associated with two or more sets of categorical predictors (including residual effects) are correlated. This includes multi-membership models as a special case.

It is probably easiest to start with an example. The data set $\texttt{sigma}$ contains data on how well fruit flies (Drosophila melanogaster) transmit sigma virus to their offspring (Carpenter et al. 2012).

data(sigma)
head(sigma)

##   infected not_infected  virus line
## 1        2           38    USA 391A
## 2        0            8    USA 391A
## 3        3           50 France 392A
## 4       17           48 France 556A
## 5       10           16 France 556A
## 6        1           14 Greece 392A

The number of flies in a vial that are infected with the virus ($\texttt{infected}$) or not ($\texttt{not_infected}$) are recorded. The mean number of flies per vial is 29 although the range is large (1 - 96 ). All the flies in a vial are the offspring of parents infected with one of five strains ($\texttt{strain}$) of sigma virus, a virus that is transmitted to offspring through eggs and sperm. The parents belong to one of 67 genetically distinct lines ($\texttt{line}$). Let’s start with a simple binomial GLMM that we covered in Chapter 4, but to keep things manageable we’ll only analyse data from three strains ($\texttt{France}$, $\texttt{Greece}$ and $\texttt{Spain}$, named after their country of origin). Note that we will specify the prior using prior generator functions (Section 4.7) and so this specification can be recycled in all models. See Section @ref(VCVprior-r–sec) for why we have specified a scaled-$F_{1,2}$ distribution rather than the scaled-$F_{1,1}$ distribution used when estimating scalar variances (Section 4.6).

sigma_small<-subset(sigma, virus%in%c("France","Greece", "Spain"))

prior.sigma=list(R=IW(1, 0.002), G=F(2, 1000))

m.sigma.1<-MCMCglmm(cbind(infected, not_infected)~virus, random=~line, data=sigma_small, family="binomial", prior=prior.sigma)

summary(m.sigma.1)

## 
##  Iterations = 3001:12991
##  Thinning interval  = 10
##  Sample size  = 1000 
## 
##  DIC: 16739.41 
## 
##  G-structure:  ~line
## 
##      post.mean l-95% CI u-95% CI eff.samp
## line     3.954    2.328    5.649    850.3
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units     2.686     2.25    3.099    661.5
## 
##  Location effects: cbind(infected, not_infected) ~ virus 
## 
##             post.mean l-95% CI u-95% CI eff.samp  pMCMC    
## (Intercept)  -0.66374 -1.22050 -0.08718    717.3  0.022 *  
## virusGreece  -1.73961 -2.10652 -1.39823   1000.0 <0.001 ***
## virusSpain    0.53186  0.16149  0.89971   1000.0  0.004 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For the base-line virus, $\texttt{France}$, the estimate of the median probability of infection is 0.34, obtained by applying the plogis function to the intercept (Section 3.6.2). However, the viruses have substantially different infection rates with the median odds of infection for $\texttt{Greece}$ being only 0.18 times that of $\texttt{France}$, whereas $\texttt{Spain}$ has an odds of infection 1.73 times greater (obtained by exponentiating the coefficients).

There is also substantial variation across lines in how well they transmit the virus (posterior mean for $\sigma^2_\texttt{line}$= 3.95) and substantial overdispersion ($\sigma^2_\texttt{units}$= 2.69). However, this model contains the implicit assumption that the increased odds of infection associated with a line is constant over viruses. For example, the first level of $\texttt{line}$ is $\texttt{101A}$. If this line’s effect is $u^{(\texttt{line})}_1$ then the linear predictor for all observations of that line will include $u^{(\texttt{line})}_1$ and the odds of infection is expected to change by a factor $\textrm{exp}(u^{(\texttt{line})}_1)$ irrespective of the virus. However, we may expect there to be genetic interactions between the virus and the fly such that the infectivity of a line differs depending on the virus. We could add a simple interaction between line and variance and treat these ($67\times 3 = 201$) interaction effects as random:

m.sigma.2<-MCMCglmm(cbind(infected, not_infected)~virus, random=~line+virus:line, data=sigma_small, family="binomial", prior=prior.sigma)

summary(m.sigma.2)

## 
##  Iterations = 3001:12991
##  Thinning interval  = 10
##  Sample size  = 1000 
## 
##  DIC: 16731.28 
## 
##  G-structure:  ~line
## 
##      post.mean l-95% CI u-95% CI eff.samp
## line     3.615    2.043    5.274    315.9
## 
##                ~virus:line
## 
##            post.mean l-95% CI u-95% CI eff.samp
## virus:line    0.6537   0.2215     1.09    679.3
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units     2.317    1.907    2.725    741.8
## 
##  Location effects: cbind(infected, not_infected) ~ virus 
## 
##             post.mean l-95% CI u-95% CI eff.samp  pMCMC    
## (Intercept)   -0.6611  -1.2598  -0.0904     1000  0.020 *  
## virusGreece   -1.7384  -2.1970  -1.2575     1000 <0.001 ***
## virusSpain     0.5568   0.1203   1.0217     1000  0.016 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

While the variance in the interaction effects ($\sigma^2_\texttt{virus:line}=$ 0.65) is substantially smaller than the main $\texttt{line}$ effects, the lower 95% credible interval is well removed from zero suggesting that interactions are present. This model still has implicit assumptions baked into it about how lines vary in their infectivity for a given virus, and how lines covary in their infectivity between viruses. To avoid complications later, I will - with some abuse of notation - call the main $\texttt{line}$ effects in this model $u^{(\texttt{m-line})}$ and their variance $\sigma^2_\texttt{m-line}$, with a posterior mean of 3.61. The aggregate effect of $\texttt{line}$ $i$ infected with $\texttt{virus}$ $j$ is then given by:

\[ u^{(\texttt{line})}_{ij} = u^{(\texttt{m-line})}_i+u^{(\texttt{virus:line})}_{ij}\]

If we took the linear predictor for two observations made on the same line in the same virus they would both share the same two random effects and therefore have the same $u^{(\texttt{line})}_{ij}$. Consequently, the covariance in their linear predictor is the variance of $u^{(\texttt{line})}$: $\sigma^2_\texttt{m-line}+\sigma^2_\texttt{virus:line}$. Since these two variances are assumed constant over viruses, we are implicitly assuming that the variance between-lines is the same for all viruses. If we took the linear predictor for two observations made on the same line but in different viruses they would both share the same $u^{(\texttt{m-line})}$ effect but have different interaction effects ($u^{(\texttt{virus:line})}$). Consequently, the covariance in their linear predictor is the variance of $u^{(\texttt{m-line})}$ only: $\sigma^2_\texttt{m-line}$. Again, $\sigma^2_\texttt{m-line}$ is assumed constant over viruses and so this implies that the between-line covariance is also constant: the covariance between lines infected with $\texttt{France}$ versus $\texttt{Spain}$ is the same as the covariance between lines infected with $\texttt{France}$ versus $\texttt{Greece}$ or $\texttt{Spain}$ versus $\texttt{Greece}$. This constancy is also reflected in the between-line correlation:

\[r_{\texttt{line}}= \frac{\sigma^2_\texttt{m-line}}{\sigma^2_\texttt{m-line}+\sigma^2_\texttt{virus:line}}\]

which has a posterior mean of 0.84 and a relatively tight 95% credible interval of 0.73 - 0.95. Since $\sigma^2_\texttt{m-line}$ is a variance, and therefore constrained to be positive, we are also implicitly assuming that the covariance and correlation are both constant and positive: a line that shows greater infectivity with $\texttt{France}$ is not expected, on average, to have reduced infectivity with $\texttt{Spain}$.

We can think of summarising this information in a $3\times 3$ between-line covariance matrix (the meaning of the colours will become apparent soon):

\[{\bf V}_{{\color{red}{\texttt{line}}}}= \left[ \begin{array}{ccc} \sigma^{2}_{\color{blue}{\texttt{France}}}&\sigma_{\color{blue}{\texttt{France}, \texttt{Spain}}}&\sigma_{\color{blue}{\texttt{France}, \texttt{Greece}}}\\ \sigma_{\color{blue}{\texttt{France}, \texttt{Spain}}}&\sigma^{2}_{\color{blue}{\texttt{Spain}}}&\sigma_{\color{blue}{\texttt{Spain}, \texttt{Greece}}}\\ \sigma_{\color{blue}{\texttt{France}, \texttt{Greece}}}&\sigma_{\color{blue}{\texttt{Spain}, \texttt{Greece}}}&\sigma^{2}_{\color{blue}{\texttt{Greece}}}\\ \end{array} \right]\]

The diagonal elements gives us the variance in $\texttt{line}$ effects for each virus and the off-diagonal elements gives us the covariance in $\texttt{line}$ effects between pairs of viruses. This covariance matrix has 6 (co)variances, but in model m.sigma.2 we assumed that these (co)variances could be parametrised using only two parameters: all diagonal elements are equal to $\sigma^2_\texttt{m-line}+\sigma^2_\texttt{virus:line}$ and all off-diagonal elements are equal to $\sigma^2_\texttt{m-line}$. If we wish to relax this assumption we need to understand how variance functions, such as $\texttt{us}$ and $\texttt{idh}$, work.

5.1 Variance Structures

We could refit our first model m.sigma.2 using the random effect specifications:

random=~us(1):line

random=~idh(1):line

and this would give exactly the same answer as the model specified by ~line. The term inside the brackets is a model formula and is interpreted exactly how you would interpret any R formula expect the intercept is only fitted if it is explicitly defined (as here - remember a 1 in a formula stands in for ‘intercept’). These formula are therefore fitting an intercept which is interacted with the random effects. We can get a representation of the interaction for the first few levels of line (101A, 109A, 129A, 142A, 153A):

Across the top, we have the original $\texttt{line}$ effects in red, and along the side we have the term defined by the variance structure formula (just the intercept in this case). The interaction forms a new set of factors. Although they have different names from the original $\texttt{line}$ effects, it is clear that there is a one to one mapping between the original and the new factor levels and the models are therefore equivalent. For more complex interactions this is not the case. For example, if we fit $\texttt{virus}$ in the variance structure model (i.e. us(virus):line or idh(virus):line) we get¹⁵:

\[\begin{array}{c|rrrrrc} &{\color{red}{\texttt{101A}}}&{\color{red}{\texttt{109A}}}&{\color{red}{\texttt{129A}}}&{\color{red}{\texttt{142A}}}&{\color{red}{\texttt{153A}}}&\dots\\ \hline {\color{blue}{\texttt{France}}}&{\color{blue}{\texttt{France}}}.{\color{red}{\texttt{101A}}}&{\color{blue}{\texttt{France}}}.{\color{red}{\texttt{109A}}}&{\color{blue}{\texttt{France}}}.{\color{red}{\texttt{129A}}}&{\color{blue}{\texttt{France}}}.{\color{red}{\texttt{142A}}}&{\color{blue}{\texttt{France}}}.{\color{red}{\texttt{153A}}}&\dots\\ {\color{blue}{\texttt{Spain}}}&{\color{blue}{\texttt{Spain}}}.{\color{red}{\texttt{101A}}}&{\color{blue}{\texttt{Spain}}}.{\color{red}{\texttt{109A}}}&{\color{blue}{\texttt{Spain}}}.{\color{red}{\texttt{129A}}}&{\color{blue}{\texttt{Spain}}}.{\color{red}{\texttt{142A}}}&{\color{blue}{\texttt{Spain}}}.{\color{red}{\texttt{153A}}}&\dots\\ {\color{blue}{\texttt{Greece}}}&{\color{blue}{\texttt{Greece}}}.{\color{red}{\texttt{101A}}}&{\color{blue}{\texttt{Greece}}}.{\color{red}{\texttt{109A}}}&{\color{blue}{\texttt{Greece}}}.{\color{red}{\texttt{129A}}}&{\color{blue}{\texttt{Greece}}}.{\color{red}{\texttt{142A}}}&{\color{blue}{\texttt{Greece}}}.{\color{red}{\texttt{153A}}}&\dots\\ \end{array}\]

which creates three times as many random effects, one associated with observations for each $\texttt{virus}$ for each each $\texttt{line}$.

5.1.1 $\texttt{idh}$ Variance Structure

The different variance functions make different assumptions about how the effects are distributed within and across rows. First, we may want to allow the variance in the effects to be different for each row of factors; i.e. does $\texttt{line}$ explain different amounts of variation depending on the $\texttt{virus}$. We can fit this model using the idh function:

m.sigma.3<-MCMCglmm(cbind(infected, not_infected)~virus, random=~idh(virus):line, data=sigma_small, family="binomial", prior=prior.sigma)

In the simpler models we have fitted so far, each random effect term (terms separated by + in the random formula) specified a single variance (e.g. $\sigma^2_\texttt{virus:line}$) and the prior specification was relatively simple and covered in Sections 2.6 and 4.6. Prior specifications for covariance matrices are much trickier and are covered in Section 5.3. For now, simply note that we have recycled the same prior generator function that we used in the simpler models, despite the the prior specification requiring $3\times 3$ matrices. If we take a look at the model summary:

summary(m.sigma.3)

## 
##  Iterations = 3001:12991
##  Thinning interval  = 10
##  Sample size  = 1000 
## 
##  DIC: 16741.54 
## 
##  G-structure:  ~idh(virus):line
## 
##                  post.mean l-95% CI u-95% CI eff.samp
## virusFrance.line     3.980    2.005    6.307    599.3
## virusGreece.line     4.214    2.181    6.830    496.5
## virusSpain.line      3.687    2.093    5.750    639.6
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units     2.314    1.923    2.722    697.1
## 
##  Location effects: cbind(infected, not_infected) ~ virus 
## 
##             post.mean l-95% CI u-95% CI eff.samp pMCMC   
## (Intercept)  -0.69857 -1.26637 -0.08768    399.0 0.018 * 
## virusGreece  -1.62506 -2.44844 -0.77185   1000.0 0.004 **
## virusSpain    0.62382 -0.10118  1.48195    570.2 0.116   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

we see that three parameters are summarised for ~idh(virus):line: the variance in line effects for each virus. While the $\texttt{idh}$ variance function allows the variances to be different across the viruses it assumes that $\texttt{line}$ effects for one virus are independent of the $\texttt{line}$ effects for a different virus. In terms of our coloured table of effects $\texttt{idh}$ assumes effects in different rows are independently distributed, but it relaxes the assumption that they are identically distributed - they can have different variances. We can represent this structure in terms of a $3\times3$ covariance matrix:

\[{\bf V}_{{\color{red}{\texttt{line}}}}= \left[ \begin{array}{ccc} \sigma^{2}_{\color{blue}{\texttt{France}}}&0&0\\ 0&\sigma^{2}_{\color{blue}{\texttt{Spain}}}&0\\ 0&0&\sigma^{2}_{\color{blue}{\texttt{Greece}}}\\ \end{array} \right]\]

We can extract the posterior means for each variance and place them into a matrix:

Vline.idh<-diag(colMeans(m.sigma.3$VCV)[1:3])
colnames(Vline.idh)<-rownames(Vline.idh)<-c("France", "Spain", "Greece")
Vline.idh

##          France    Spain   Greece
## France 3.979824 0.000000 0.000000
## Spain  0.000000 4.214103 0.000000
## Greece 0.000000 0.000000 3.687059

Because the $\texttt{line}$ effects are assumed to be multivariate normal we can also represent their distribution in terms of an ellipsoid (you can rotate the image):

plotsubspace(Vline.idh, axes.lab=TRUE)

Widget 5.1: Ellipsoid that circumscribes 95% of the expected $\texttt{line}$ effects as estimated in model m.sigma.3. This can be thought of as a scatter plot of the $\texttt{line}$ effects between each virus, if the $\texttt{line}$ effects could be directly measured. Because the covariances of the $\texttt{line}$ effects between each virus were set to zero, and the variance of the $\texttt{line}$ effects are quite similar for each virus, the ellipsoid is almost spherical.

Although we have allowed the variance in $\texttt{line}$ effects to be different across the viruses, we have used the default specification for the residual structure (rcov=~units). $\texttt{MCMCglmm}$ augments the original data-frame with a column called $\texttt{units}$ which is a factor that has unique levels for each row. This is covered in greater depth when discussing multi-response models (Chapter 7) but for the single-response model used here we can simply think of $\texttt{units}$ as indexing residuals. Having a constant residual variance for each virus but allowing the between-line variance to vary is somewhat dangerous: if the residual variances did vary then this may be reflected to some degree in the estimates of the between-line variances. Allowing the residual variances to also vary overs viruses is straightforward:

m.sigma.4<-MCMCglmm(cbind(infected, not_infected)~virus, random=~idh(virus):line, rcov=~idh(virus):units, data=sigma_small, family="binomial", prior=prior.sigma)

summary(m.sigma.4)

## 
##  Iterations = 3001:12991
##  Thinning interval  = 10
##  Sample size  = 1000 
## 
##  DIC: 16739.59 
## 
##  G-structure:  ~idh(virus):line
## 
##                  post.mean l-95% CI u-95% CI eff.samp
## virusFrance.line     4.107    1.953    6.485    373.3
## virusGreece.line     4.259    2.278    6.863    517.6
## virusSpain.line      3.752    1.946    5.575    622.3
## 
##  R-structure:  ~idh(virus):units
## 
##                   post.mean l-95% CI u-95% CI eff.samp
## virusFrance.units     1.760    1.193    2.411    580.1
## virusGreece.units     2.515    1.709    3.311    855.9
## virusSpain.units      2.654    1.927    3.358    999.0
## 
##  Location effects: cbind(infected, not_infected) ~ virus 
## 
##             post.mean l-95% CI u-95% CI eff.samp  pMCMC    
## (Intercept)  -0.68250 -1.26166 -0.09061     1000  0.014 *  
## virusGreece  -1.68520 -2.48699 -0.75300     1000 <0.001 ***
## virusSpain    0.59606 -0.17928  1.46033     1146  0.156    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As with the variance in $\texttt{line}$ effects, the residual variances don’t appear to be dramatically different across viruses.

5.1.2 $\texttt{us}$ Variance Structure

The $\texttt{idh}$ structure allowed the variance in $\texttt{line}$ effects to be different for different viruses, but it assumed that the effect of a line for one virus is uncorrelated with the effect of that line in another virus. However, the strong correlation (0.84) in $\texttt{line}$ effects estimated from model m.sigma.2 suggests that this assumption is unlikely to be reasonable. We can relax this assumption by using the us function which estimates the fully parameterised matrix seen earlier:

m.sigma.5<-MCMCglmm(cbind(infected, not_infected)~virus, random=~us(virus):line, rcov=~idh(virus):units, data=sigma_small, family="binomial", prior=prior.sigma)

summary(m.sigma.5)

## 
##  Iterations = 3001:12991
##  Thinning interval  = 10
##  Sample size  = 1000 
## 
##  DIC: 16734.1 
## 
##  G-structure:  ~us(virus):line
## 
##                              post.mean l-95% CI u-95% CI eff.samp
## virusFrance:virusFrance.line     4.043    2.179    5.942    560.2
## virusGreece:virusFrance.line     3.621    2.018    5.732    413.6
## virusSpain:virusFrance.line      3.011    1.639    4.666    688.7
## virusFrance:virusGreece.line     3.621    2.018    5.732    413.6
## virusGreece:virusGreece.line     4.996    2.374    7.843    332.0
## virusSpain:virusGreece.line      3.494    1.759    5.394    439.0
## virusFrance:virusSpain.line      3.011    1.639    4.666    688.7
## virusGreece:virusSpain.line      3.494    1.759    5.394    439.0
## virusSpain:virusSpain.line       3.878    2.154    5.756    778.4
## 
##  R-structure:  ~idh(virus):units
## 
##                   post.mean l-95% CI u-95% CI eff.samp
## virusFrance.units     1.767    1.155    2.416    715.0
## virusGreece.units     2.491    1.811    3.323    643.8
## virusSpain.units      2.627    1.981    3.302    799.8
## 
##  Location effects: cbind(infected, not_infected) ~ virus 
## 
##             post.mean  l-95% CI  u-95% CI eff.samp  pMCMC    
## (Intercept) -0.625816 -1.181138 -0.057590   1000.0  0.014 *  
## virusGreece -1.839004 -2.341864 -1.271311    838.3 <0.001 ***
## virusSpain   0.504100  0.004488  1.018975   1000.0  0.046 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The rows displayed for the G-structure correspond to the 9 elements of the $3\times 3$ covariance matrix (displayed column-wise). As before, we can arrange the posterior mean estimates in to a matrix

Vline.us<-matrix(colMeans(m.sigma.5$VCV)[1:9], 3, 3)
colnames(Vline.us)<-rownames(Vline.us)<-c("France", "Spain", "Greece")
Vline.us

##          France    Spain   Greece
## France 4.042588 3.620713 3.011193
## Spain  3.620713 4.995818 3.493868
## Greece 3.011193 3.493868 3.878277

We can see that the covariances are large, and like the variances, relatively constant. If we visualise this distribution we can see it is quite different from that obtained from model $\texttt{m.sigma.4}$ (or model $\texttt{m.sigma.3}$) where the covariances are assumed to be zero.

plotsubspace(Vline.us, axes.lab=TRUE)

Widget 5.2: Ellipsoid that circumscribes 95% of the expected $\texttt{line}$ effects as estimated in model $\texttt{m.sigma.5}$. This can be thought of as a scatter plot of the $\texttt{line}$ effects between virus, if the $\texttt{line}$ effects could be directly measured. The correlations of the $\texttt{line}$ effects between viruses are large, and the variance in $\texttt{line}$ effects are roughly equal in magnitude across viruses. Consequently the orientation of the major axis of the ellipsoid is tending towards $45^{o}$ relative to the figure axes.

We can use the function posterior.cor to convert the posterior samples of the (co)variance matrix into posterior samples of the correlation matrix. Doing so we can see that the estimated correlations are indeed large and relatively invariant.

rline.us<-matrix(colMeans(posterior.cor(m.sigma.5$VCV[,1:9])), 3, 3)
colnames(rline.us)<-rownames(rline.us)<-c("France", "Spain", "Greece")
rline.us

##           France     Spain    Greece
## France 1.0000000 0.8102356 0.7627693
## Spain  0.8102356 1.0000000 0.7983696
## Greece 0.7627693 0.7983696 1.0000000

In this particular example, it looks like the simpler two-parameter model implemented in model m.sigma.2 may do a good job of representing the covariance matrix:

Vline.2<-matrix(mean(m.sigma.2$VCV[,"line"]), 3, 3)
diag(Vline.2)<-diag(Vline.2)+mean(m.sigma.2$VCV[,"virus:line"])
colnames(Vline.2)<-rownames(Vline.2)<-c("France", "Spain", "Greece")
Vline.2

##          France    Spain   Greece
## France 4.268622 3.614878 3.614878
## Spain  3.614878 4.268622 3.614878
## Greece 3.614878 3.614878 4.268622

and we can see this if we visualise the two distributions simultaneously:

plotsubspace(Vline.us, Vline.2, axes.lab=TRUE, shadeCA=FALSE)

Widget 5.3: Ellipsoids that circumscribes 95% of the expected $\texttt{line}$ effects as estimated in model $\texttt{m.sigma.5}$ where all (co)variance parameters are estimated (red wireframe) and model $\texttt{m.sigma.2}$ where the variances across viruses, and the correlations between viruses, are assumed constant (solid blue).

Note that when using a $\texttt{us}$ structure for the $\texttt{line}$ effects in model $\texttt{m.sigma.6}$ we retained the $\texttt{idh}$ structure for the residuals. Since the flies in each vial (indexed by $\texttt{units}$ since they correspond to a row of the data-frame) are all exposed to the same virus, the residual covariances between viruses cannot be estimated and so can be set to zero. In Chapter 7 we will look at multi-response models where this generally isn’t the case.

5.1.3 Other Variance Structures

In some cases, additional restrictions need to be placed on a covariance matrix. In Section 3.6.3 we used the prior argument $\texttt{fix}$ to force the residual variance to be fixed at a specified value in a Bernoulli GLM. When a variance structure defines a (co)variance matrix, $\texttt{fix}$ specifies a diagonal element of the matrix that splits the matrix into four quadrants. For example, imagine we have the $5\times 5$ matrix specified in the $\texttt{V}$ element of the prior:

\[\texttt{V}= \left[ \begin{array}{cc|ccc} 1&0.5&0.5&0&0\\ 0.5&1&0.5&0&0\\ \hline 0.5&0.5&2&0&0\\ 0&0&0&2&1\\ 0&0&0&1&3\\ \end{array} \right]\]

If $\texttt{fix}=3$ then the matrix is split into four quardants with the lower right quadrant starting on the 3rd diagonal element. This quadrant will be fixed at the value specified in $\texttt{V}$ but all remaining (co)variance parameters will be estimated (if a $\texttt{us}$ structure is used).

Sometimes constraints are required that can not be achieved by using $\texttt{fix}$ and cannot be expressed in terms of the (re)parameterisations discussed above. Table 5.1 provides a summary of available variance functions and $\texttt{ante}$ structures - which give the greatest flexibility - are covered in Section 6.5).

Table 5.1: Different random effect specifications in $\texttt{MCMCglmm}$ (with equivalent $\texttt{lmer}$ syntax when possible) with fixed parameters in red and estimated parameters in black. $\texttt{virus}$ is a factor with three levels so the resulting covariance and correlation matrices are $3\times 3$. For the `cors(virus):line` structure, the prior specification had `fix=2` which forces the last diagonal block (starting from position 2,2) to be a correlation matrix rather than a covariance matrix. $\texttt{fix}$ can also be specified for other structures but then it fixes the last diagonal block to what ever is specified in the $\texttt{V}$ element of the prior. $\texttt{ante}$ structures are not covered here but in Section 6.5
MCMCglmm	lmer	nparameters	Variance	Correlation
`line`	`(1\|line)`	1	$\left[\begin{array}{ccc}V&V&V\\V&V&V\\V&V&V\\ \end{array}\right]$	$\left[\begin{array}{ccc}{\color{red}{1}}&{\color{red}{1}}&{\color{red}{1}}\\{\color{red}{1}}&{\color{red}{1}}&{\color{red}{1}}\\{\color{red}{1}}&{\color{red}{1}}&{\color{red}{1}}\\ \end{array}\right]$
`us(virus):line`	`(virus\|line)`	6	$\left[\begin{array}{ccc}V_{1,1}&C_{1,2}&C_{1,3}\\C_{2,1}&V_{2,2}&C_{2,3}\\C_{3,1}&C_{3,2}&V_{3,3}\\ \end{array}\right]$	$\left[\begin{array}{ccc}{\color{red}{1}}&r_{1,2}&r_{1,3}\\r_{2,1}&{\color{red}{1}}&r_{2,3}\\r_{3,1}&r_{3,2}&{\color{red}{1}}\\ \end{array}\right]$
`virus:line`	`(1\|virus:line)`	1	$\left[\begin{array}{ccc}V&{\color{red}{0}}&{\color{red}{0}}\\{\color{red}{0}}&V&{\color{red}{0}}\\{\color{red}{0}}&{\color{red}{0}}&V\\ \end{array}\right]$	$\left[\begin{array}{ccc}{\color{red}{1}}&{\color{red}{0}}&{\color{red}{0}}\\{\color{red}{0}}&{\color{red}{1}}&{\color{red}{0}}\\{\color{red}{0}}&{\color{red}{0}}&{\color{red}{1}}\\ \end{array}\right]$
`line+virus:line`	`(1\|line)+(1\|virus:line)`	2	$\left[\begin{array}{ccc}V_1+V_2&V_1&V_1\\V_1&V_1+V_2&V_1\\V_1&V_1&V_1+V_2\\ \end{array}\right]$	$\left[\begin{array}{ccc}{\color{red}{1}}&r&r\\r&{\color{red}{1}}&r\\r&r&{\color{red}{1}}\\ \end{array}\right]$
`idh(virus):line`		3	$\left[\begin{array}{ccc}V_{1,1}&{\color{red}{0}}&{\color{red}{0}}\\{\color{red}{0}}&V_{2,2}&{\color{red}{0}}\\{\color{red}{0}}&{\color{red}{0}}&V_{3,3}\\ \end{array}\right]$	$\left[\begin{array}{ccc}{\color{red}{1}}&{\color{red}{0}}&{\color{red}{0}}\\{\color{red}{0}}&{\color{red}{1}}&{\color{red}{0}}\\{\color{red}{0}}&{\color{red}{0}}&{\color{red}{1}}\\ \end{array}\right]$
`corg(virus):line`		3	$\left[\begin{array}{ccc}{\color{red}{1}}&r_{1,2}&r_{1,3}\\r_{2,1}&{\color{red}{1}}&r_{2,3}\\r_{3,1}&r_{3,2}&{\color{red}{1}}\\ \end{array}\right]$	$\left[\begin{array}{ccc}{\color{red}{1}}&r_{1,2}&r_{1,3}\\r_{2,1}&{\color{red}{1}}&r_{2,3}\\r_{3,1}&r_{3,2}&{\color{red}{1}}\\ \end{array}\right]$
`corgh(virus):line`		3	$\left[\begin{array}{ccc}{\color{red}{V_{1,1}}}&r_{1,2}{\color{red}{\sqrt{V_{1,1}V_{2,2}}}}&r_{1,3}{\color{red}{\sqrt{V_{1,1}V_{3,3}}}}\\r_{2,1}{\color{red}{\sqrt{V_{2,2}V_{1,1}}}}&{\color{red}{V_{2,2}}}&r_{2,3}{\color{red}{\sqrt{V_{2,2}V_{3,3}}}}\\r_{3,1}{\color{red}{\sqrt{V_{3,3}V_{1,1}}}}&r_{3,2}{\color{red}{\sqrt{V_{3,3}V_{2,2}}}}&{\color{red}{V_{3,3}}}\\ \end{array}\right]$	$\left[\begin{array}{ccc}{\color{red}{1}}&r_{1,2}&r_{1,3}\\r_{2,1}&{\color{red}{1}}&r_{2,3}\\r_{3,1}&r_{3,2}&{\color{red}{1}}\\ \end{array}\right]$
`cors(virus):line`		4	$\left[\begin{array}{ccc}V_{1,1}&C_{1,2}&C_{1,3}\\C_{2,1}&{\color{red}{1}}&r_{2,3}\\C_{3,1}&r_{3,2}&{\color{red}{1}}\\ \end{array}\right]$	$\left[\begin{array}{ccc}{\color{red}{1}}&r_{1,2}&r_{1,3}\\r_{2,1}&{\color{red}{1}}&r_{2,3}\\r_{3,1}&r_{3,2}&{\color{red}{1}}\\ \end{array}\right]$

5.2 Linking Functions

In some models we would like to estimate covariances between sets of random effects (or random effects and residuals) in a way that is conceptually very similar to that discussed in Section 5.1 using variance structures. However, the way that we would like the (co)variances to be set up cannot be achieved by forming interactions. This happens when the predictors associated with the two or more sets of random effects are in different columns of the data frame, despite sharing the same set of levels, and we need some way to link them. Again, this is perhaps best illustrated through an example.

5.2.1 $\texttt{str}$: covariances between random terms

My mum is a great mum. She’s also a super grandmother to my children, and I wonder whether these two things are correlated?¹⁶ Let’s say we could measure the well-being for many children over several generations, and importantly we have multiple children from the same mother so we can estimate $\texttt{mother}$ effects. As importantly, some of these mothers go onto to be grandmothers of their children’s children. If we also have measures of well-being for these children we can also estimate $\texttt{grandmother}$ effects. However, it’s not clear how we could specify a covariance between these effects in the same way that we did using variance structures. Our hypothetical data-frame may look like this:

##           y mother grandmother
## 1 -2.612746      q           u
## 2 -1.200558      h           c
## 3  2.101922      r           k
## 4  1.374406      b           x
## 5 -4.339486      u           i
## 6 -1.358210      o           t

with $\texttt{y}$ being the children’s well-being score and the letters being the names of the child’s mother and grandmother. I have ensured that the $\texttt{mother}$ and $\texttt{grandmother}$ columns have the same factor levels using the $\texttt{levels}$ argument of the function $\texttt{as.factor}$. It is not necessary that all mothers appear as grandmothers, or vice versa, but of course some mothers must appear as grandmothers if we wish to estimate the correlation in their effects on a child’s well-being. The linking function $\texttt{str}$ links two or more sets of random effects and assumes a fully unstructured covariance matrix for their distribution. It may help to visualise the random effects that the term str(mother+grandmother) is specifying:

\[\begin{array}{c|rrrrc} &{\color{red}{\texttt{a}}}&{\color{red}{\texttt{b}}}&{\color{red}{\texttt{c}}}&{\color{red}{\texttt{d}}}&\dots\\ \hline {\color{blue}{\texttt{mother}}}&{\color{blue}{\texttt{mother}}}.{\color{red}{\texttt{a}}}&{\color{blue}{\texttt{mother}}}.{\color{red}{\texttt{b}}}&{\color{blue}{\texttt{mother}}}.{\color{red}{\texttt{c}}}&{\color{blue}{\texttt{mother}}}.{\color{red}{\texttt{d}}}&\dots\\ {\color{blue}{\texttt{grandmother}}}&{\color{blue}{\texttt{grandmother}}}.{\color{red}{\texttt{a}}}&{\color{blue}{\texttt{grandmother}}}.{\color{red}{\texttt{b}}}&{\color{blue}{\texttt{grandmother}}}.{\color{red}{\texttt{c}}}&{\color{blue}{\texttt{grandmother}}}.{\color{red}{\texttt{d}}}&\dots\\ \end{array}\]

We can seen that this is conceptually identical to our sigma virus example but rather than specifying the classifying variable inside $\texttt{idh()}$ or $\texttt{us()}$, as we could for $\texttt{virus}$, the classifying variable is actually the column names of the terms specified in $\texttt{str()}$ .

prior.str=list(R=IW(1, 0.002), G=F(1,1000))

m.str<-MCMCglmm(y~1, random=~str(mother+grandmother), data=data.nana, prior=prior.str)

The model output shows that a $2\times 2$ covariance matrix has been estimated for the $\texttt{mother}$/$\texttt{grandmother}$ effects.

summary(m.str)

## 
##  Iterations = 3001:12991
##  Thinning interval  = 10
##  Sample size  = 1000 
## 
##  DIC: 2081.763 
## 
##  G-structure:  ~str(mother + grandmother)
## 
##                         post.mean l-95% CI u-95% CI eff.samp
## mother.mother              2.2118   1.0797   3.8169    819.6
## grandmother.mother         0.1213  -0.4134   0.7509   1093.0
## mother.grandmother         0.1213  -0.4134   0.7509   1093.0
## grandmother.grandmother    0.6925   0.2314   1.2873   1000.0
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units     3.455    3.039    3.929     1000
## 
##  Location effects: y ~ 1 
## 
##             post.mean l-95% CI u-95% CI eff.samp pMCMC
## (Intercept)   0.05767 -0.61336  0.70451     1000  0.85

When simulating the data I assumed the variance in mother effects was 2, the variance in grandmother effects was 1 and the covariance between them was 0.5 (giving a correlation of 0.35). The data set contained few (grand)mothers and so the estimates are uncertain, but the credible intervals overlap the true values.

5.2.2 $\texttt{mm}$: multi-membership models

Using $\texttt{str}$ allowed us to link random effects associated with categorical predictors appearing in different columns of our data frame. The effects defined by different columns were allowed to have different variances (the variance in mother effects was different from the variance in grandmother effects) and the correlation was estimated. In some cases we would like to treat the effects defined by the different columns as the same effects when their levels agree. In the $\texttt{str}$ example we did not want to do this because being a mother is not the same thing as being a grandmother. However, imagine that each of our children has five friends:

##            y friend.1 friend.2 friend.3 friend.4 friend.5
## 1 -1.3268672        L        G        M        B        J
## 2  1.2606695        M        T        N        G        X
## 3 -2.6914660        E        A        Y        B        T
## 4 -0.1975649        Q        A        I        L        J
## 5  0.4686364        S        B        Q        E        X
## 6 -2.6165464        M        D        K        O        N

Friend B appears as a friend to multiple children and we would like to model that friend’s effect on the focal child’s well-being irrespective of whether B appears as $\texttt{friend.1}$, $\texttt{friend.2}$ … or $\texttt{friend.5}$ as the numbering is arbitrary. The linking function $\texttt{mm}$ allows us to specify this.

prior.mm=list(R=IW(1, 0.002), G=F(1,1000))

m.mm<-MCMCglmm(y~1, random=~mm(friend.1+friend.2+friend.3+friend.4+friend.5), data=data.friends, prior=prior.mm)

summary(m.mm)

## 
##  Iterations = 3001:12991
##  Thinning interval  = 10
##  Sample size  = 1000 
## 
##  DIC: 1842.966 
## 
##  G-structure:  ~mm(friend.1 + friend.2 + friend.3 + friend.4 + friend.5)
## 
##                                              post.mean l-95% CI u-95% CI
## friend.1+friend.2+friend.3+friend.4+friend.5    0.1395  0.05475   0.2269
##                                              eff.samp
## friend.1+friend.2+friend.3+friend.4+friend.5      796
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units     2.235    1.917    2.495    729.4
## 
##  Location effects: y ~ 1 
## 
##             post.mean l-95% CI u-95% CI eff.samp pMCMC
## (Intercept)   -0.2512  -0.9245   0.4907     1000 0.484

The variance in $\texttt{friend}$ effects was set to 0.1 in the simulated data. While small compared to the simulated residual variance (2) the aggregate effect of all friends is five times this: 0.5. Note that if children varied in how many friends they had, the number of $\texttt{friend.#}$ columns can be set to the maximum and children with fewer friends than the maximum can have $\texttt{NA}$ in any redundant columns. For example, lets say the second child in our data-frame only had four friends - $\texttt{friend}$ G is not, in fact, a friend of this child. We would then just set this cell to $\texttt{NA}$:

head(data.friends)

##            y friend.1 friend.2 friend.3 friend.4 friend.5
## 1 -1.3268672        L        G        M        B        J
## 2  1.2606695        M        T        N     <NA>        X
## 3 -2.6914660        E        A        Y        B        T
## 4 -0.1975649        Q        A        I        L        J
## 5  0.4686364        S        B        Q        E        X
## 6 -2.6165464        M        D        K        O        N

When fitting the model, $\texttt{MCMCglmm}$ will issue the warning:

missing values in random predictors

but this can be ignored.

5.2.3 $\texttt{covu}$: covariances between random and residual terms

Occasionally, we need to link a set of random effects with a set of residuals. This most commonly occurs in multi-response models (Chapter 7) when we have repeated measures for one response but only single measures for another response. I will leave the main discussion of this topic to then (Section 7.4). However, to give some idea of how this works, imagine that the final random term in the random formula defines the set of random effects $[{\bf u}^{(1)}, {\bf u}^{(2)}, {\bf u}^{(3)}]$ (for example the $\texttt{virus}$ by $\texttt{line}$ effects discussed above). Imagine also that the first residual term in the rcov formula defines the residuals $[{\bf e}^{(1)}, {\bf e}^{(2)}]$. By specifying $\texttt{covu=TRUE}$ in the prior for the first residual term a $5\times 5$ covariance matrix is estimated rather than a $3\times 3$ covariance matrix for the random effects and a $2\times 2$ covariance matrix for the residuals.

5.2.4 $\texttt{theta_scale}$: scaled linear predictor

Occasionally, we need to force a common relationship between two sets of (multiple) random effects. As with $\texttt{covu}$ models, this most commonly occurs in multi-response models (Chapter 7) and so will be covered in detail there (Section ??). For now, we can express the idea compactly by imagining two sets random effects $\left\{{\bf u}^{(1)}, {\bf u}^{(2)}\right\}$ and $\left\{{\bf u}^{(3)}, {\bf u}^{(4)}\right\}$. The $\texttt{theta_scale}$ argument allows the user to force the constraint $\left\{{\bf u}^{(3)}, {\bf u}^{(4)}\right\}$=$\left\{b{\bf u}^{(1)}, b{\bf u}^{(2)}\right\}$ with $b$ estimated. Fixed effects can also be added to the set to be scaled. for example, $\left\{{\bf \beta}^{(B)}, {\bf u}^{(3)}, {\bf u}^{(4)}\right\}$=$\left\{b{\bf \beta}^{(A)}, b{\bf u}^{(1)}, b{\bf u}^{(2)}\right\}$.

5.3 Priors for Covariance Matrices

When fitting a $k$ dimensional (co)variance matrix, $\texttt{V}$ and $\texttt{alpha.V}$ must specify $k$ dimensional (co)variance matrices and $\texttt{alpha.mu}$ must specify a vector of length $k$. The degrees-of-freedom, $\texttt{nu}$, however, remains scalar. By using prior generators to specify the prior (Section 4.7) we hid this complexity. If we want to understand exactly what the prior generators are specifying in this context we cannot avoid this complexity. In addition, if we want to add arguments such as $\texttt{fix}$ or $\texttt{covu}$ to the prior, prior generators cannot be used.

What follows is important, but also hard. For those short of time, patience or expertise, I would recommend using the prior generator F(2,1000) for random-effect covariance matrices. However, you may need to adjust the scale if the standard deviation of the data differs greatly from one (Section 4.6).

5.3.1 Marginal Priors for Variances

The joint distribution of all $k(k-1)/2$ elements of the covariance matrix is hard to characterise or visualise (Tokuda et al. 2025). We will start by exploring a simpler problem - given some prior specification for a $k$ dimensional (co)variance matrix what is the marginal prior distribution of each variance? Luckily, the marginal distributions have the same form as they have in the scalar case but with some rescaling of parameters. We will designate the parameters of the marginal distribution using asterisks. The marginal distribution is characterised by the scalar parameters $\texttt{V}^{\ast}$ and $\texttt{nu}^{\ast}$, and if $F$ priors are used, then $\texttt{alpha.mu}^{\ast}$ and $\texttt{alpha.V}^{\ast}$. The properties of these (marginal) distributions are fully covered in Sections 2.6 and 4.6, respectively.

Before continuing, it should be pointed out that what follows in the next paragraph does not hold when an $\texttt{idh}$ structure is used. $\texttt{idh}$ structures set all correlations to zero and simply estimate a set of $k$ variances, where $k$ is the number of parameters specified by the $\texttt{idh}$ model. For example, in the sigma virus example we used idh(virus):line and $k=3$ since there were three levels of $\texttt{virus}$. The prior specification still requires $k\times k$ ($3\times 3$) matrices for $\texttt{V}$ and $\texttt{alpha.V}$ but the off-diagonal elements are ignored, and the prior for the $i^\textrm{th}$ variance is simply $\texttt{V}^{\ast}=\texttt{V[i,i]}$, $\texttt{nu}^\ast=\texttt{nu}$, $\texttt{alpha.mu}^\ast=\texttt{alpha.mu[i]}$ and $\texttt{alpha.V}^\ast=\texttt{alpha.V[i,i]}$¹⁷. Forcing all variances to have the same degrees-of-freedom might be too restrictive, but at least with $\texttt{idh}$ structures this can be relaxed using an alternative parameterisation¹⁸. This luxury isn’t afforded to other variance structures for which covariances or correlation are estimated.

For structures where a full covariance matrix is to be inferred (for example by using a $\texttt{us}$ structure) a useful result is that the prior for the $i^\textrm{th}$ variance has parameters $\texttt{V}^{\ast}=\frac{\texttt{nu}}{\texttt{nu}^{\ast}}\texttt{V[i,i]}$, $\texttt{nu}^{\ast}=\texttt{nu}-k+1$, $\texttt{alpha.mu}^\ast=\texttt{alpha.mu[i]}$ and $\texttt{alpha.V}^\ast=\texttt{alpha.V[i,i]}$. Consequently, we can choose a $\texttt{V}$ and $\texttt{nu}$ that allows us to specify one of the priors discussed in Sections 2.6 and 4.6 for a single variance. For example, in Section 2.6 we saw that an inverse-gamma prior with a shape and scale of 0.001 is equivalent to the scalar inverse-Wishart with $\texttt{V}^{\ast}=1$ and $\texttt{nu}^{\ast}=0.002$. Consequently, we can place this prior on the $i^\textrm{th}$ variance by setting $\texttt{nu}=k-1+0.002$ such that $\texttt{nu}^{\ast}=0.002$. Having $\texttt{V[i,i]}=\texttt{nu}^{\ast}/\texttt{nu}=0.002/(k-1+0.002)$ then results in $\texttt{V}^{\ast}=1$. An improper prior that is flat for the variance requires $\texttt{V}^{\ast}=0$ and $\texttt{nu}^{\ast}=-2$ which can be achieved by setting $\texttt{nu}=k-3$ and $\texttt{V[i,i]}=0$. In Section 4.6 we noted that scaled $F$ priors for variances (scaled folded-$t$ priors for standard deviations) can have better properties than the inverse-Wishart. As in Section 4.6 we can set $\texttt{alpha.mu}$ to be a vector of zeros and work with the scaled central $F$ and scaled half-$t$. The marginal distribution for the $i^\textrm{th}$ variance then follows a $F_{1, \texttt{nu}^{\ast}}$ distribution with scale equal to $\texttt{V}^{\ast}*\texttt{alpha.V[i,i]}$. We can therefore chose $\texttt{V}$, $\texttt{nu}$ and $\texttt{alpha.V}$ to specify one of the priors discussed in Section 2.6 for a single variance or standard deviation. For example, by setting $\texttt{nu}=k$ we have $\texttt{nu}^{\ast}=1$ generating a scaled $F_{1,1}$ prior for the variance (or half-Cauchy for the standard deviation). We noted in Section 4.6 that the prior specification in $\texttt{MCMCglmm}$ is redundant and we can set the scale by setting $\texttt{V}$ to one and controlling the scale through $\texttt{alpha.V}$. This can be achieved in the multivariate case by setting $\texttt{V[i,i]}=1/k$ such that $\texttt{V}^{\ast}=1$ and then altering $\texttt{alpha.V[i,i]}$ as before.

The rules for rescaling can be hard to remember and fiddly to implement. In order to simplify the process of prior specification, priors can also be specified using the prior generator functions detailed in Section 4.7. When the random term defines a (co)variance matrix, these prior generators specify a prior for which the marginal prior distributions for the variance have the specified distribution. In all cases the off-diagonal elements of $\texttt{V}$ and $\texttt{alpha.V}$ are set to zero. If we want to see the prior specification generated by a prior generator we can use the function resolve_prior which also requires the dimension of the (co)variance matrix ($\texttt{k}$) and the type of variance structure ($\texttt{vtype}$). For example, to have scaled-$F_{1,1}$ marginal priors for the variances with a scale of 1,000, the specification for a $3\times 3$ unstructured (co)variance matrix is:

resolve_prior(F(1,1000), k=3, vtype="us")

## $V
##           [,1]      [,2]      [,3]
## [1,] 0.3333333 0.0000000 0.0000000
## [2,] 0.0000000 0.3333333 0.0000000
## [3,] 0.0000000 0.0000000 0.3333333
## 
## $nu
## [1] 3
## 
## $alpha.mu
## [1] 0 0 0
## 
## $alpha.V
##      [,1] [,2] [,3]
## [1,] 1000    0    0
## [2,]    0 1000    0
## [3,]    0    0 1000

5.3.2 Marginal Priors for Covariances and Correlations

Above, we looked at the prior specifications that resulted in marginal priors for the variances which are well understood. What do the marginal priors for the covariances and correlations look like under these prior specifications? While the marginal distribution for a covariance does not have an easy form, the marginal distribution of a correlations for an inverse-Wishart with diagonal $\texttt{V}$ is a beta distribution transformed to the interval $[-1, 1]$ (Box and Tiao 1973; Barnard, McCulloch, and Meng 2000). The shape and scale of the beta are equal to $(\texttt{nu}-k+1)/2$ which is the marginal degrees-of-freedom $\texttt{nu}^{\ast}$ used in the prior generator function¹⁹. The beta is flat when $\texttt{nu}^{\ast}=2$ since the shape=scale=1. As $\texttt{nu}^{\ast}$ becomes smaller the distribution becomes more and more U-shaped with a lot of prior density close to -1 and 1. As $\texttt{nu}^{\ast}$ increases beyond $2$ this distribution becomes more constrained around zero (Figure 5.1). Since the working parameters in parameter expansion cancel for the correlation, the distribution of the correlations is the same as that under the inverse-Wishart (although the distribution of the covariances will differ).

$Prior marginal density for a correlation when $\texttt{V}$ is diagonal. The density is a beta distribution over the interval $[-1,1]$ with shape=scale=$\texttt{nu}^{\ast}/2$. $\texttt{nu}^{st}/2$ is the marginal degrees of freedom for a variance: $\texttt{nu}-k+1$.$

Figure 5.1: Prior marginal density for a correlation when $\texttt{V}$ is diagonal. The density is a beta distribution over the interval $[-1,1]$ with shape=scale=$\texttt{nu}^{\ast}/2$. $\texttt{nu}^{st}/2$ is the marginal degrees of freedom for a variance: $\texttt{nu}-k+1$.

With simple random effects models, where only variances need to be estimated, I usually use a scaled $F_{1,1}$ prior with large scale. In the past, I also specified priors for covariance matrices that had a marginal $F_{1,1}$ prior. However, since this implies $\texttt{nu}^{\ast}$=1 (and $\texttt{nu}=k$) this may push the posterior correlation away from zero, especially if the data have support for correlations that are large in magnitude. Using an $F_{1,2}$ marginal prior for the variance is flat for the correlation but it does mean we’ve switched from a half-Cauchy (half-$t_1$) to a half-$t_2$ for the standard deviation. However, with a large scale ($\sqrt{1000}=$ 31.6 for the standard deviation), this is unlikely to have a large effect (Figure 5.2).

$Marginal prior densities for the standard deviation using the prior generator functions $\texttt{F(1,1000)}$, $\texttt{F(2,1000)}$ and $\texttt{F(3,1000)}$. $\texttt{F(2,1000)}$ results in a flat prior for the correlation. Note that the legend refers to these in terms of the marginal density for the standard deviations (ie. half-$t$) and that the half-$t_1$ is the Cauchy. Also, while the x-axis runs from zero to 31, the posterior distributions for the standard deviation in $\texttt{line}$ effects lie in the range 1.2-3.5. By doing this, the figure is directly comparable to Figure \@ref(fig:prior-compare).$

Figure 5.2: Marginal prior densities for the standard deviation using the prior generator functions $\texttt{F(1,1000)}$, $\texttt{F(2,1000)}$ and $\texttt{F(3,1000)}$. $\texttt{F(2,1000)}$ results in a flat prior for the correlation. Note that the legend refers to these in terms of the marginal density for the standard deviations (ie. half-$t$) and that the half-$t_1$ is the Cauchy. Also, while the x-axis runs from zero to 31, the posterior distributions for the standard deviation in $\texttt{line}$ effects lie in the range 1.2-3.5. By doing this, the figure is directly comparable to Figure 4.7.

Let’s rerun model m.sigma.5 with an $F_{1,1}$ prior of the same scale to see what happens:

prior.sigma.6=list(R=IW(1, 0.002), G=F(1, 1000))

m.sigma.6<-MCMCglmm(cbind(infected, not_infected)~virus, random=~us(virus):line, rcov=~idh(virus):units, data=sigma_small, family="binomial", prior=prior.sigma.6)

The variance in $\texttt{line}$ effects over viruses remain largely unchanged

$MCMC traces for the variance in $\texttt{line}$ effects for the three viruses when the marginal prior for the variances are scaled (1000) $F_{1,2}$ (black - model $\texttt{m.sigma.5}$) or $F_{1,1}$ (red- model $\texttt{m.sigma.6}$) distributions. The $F_{1,2}$ prior is flat for the correlation in $\texttt{line}$ effects across viruses.$

Figure 5.3: MCMC traces for the variance in $\texttt{line}$ effects for the three viruses when the marginal prior for the variances are scaled (1000) $F_{1,2}$ (black - model $\texttt{m.sigma.5}$) or $F_{1,1}$ (red- model $\texttt{m.sigma.6}$) distributions. The $F_{1,2}$ prior is flat for the correlation in $\texttt{line}$ effects across viruses.

The correlations in $\texttt{line}$ effects between viruses are nudged a little further from zero under the new prior but the effect is rather subtle (there’s also some auto-correlation in chains so they should be run for longer than the number of default iterations).

$MCMC traces for the correlation in $\texttt{line}$ effects between the three viruses when the marginal prior for the variances are scaled (1000) $$F_{1,2}$ (black - model $\texttt{m.sigma.5}$) or $F_{1,1}$ (red- model $\texttt{m.sigma.6}$) distributions. The $F_{1,2}$ prior is flat for the correlation in $\texttt{line}$ effects across viruses.$

Figure 5.4: MCMC traces for the correlation in $\texttt{line}$ effects between the three viruses when the marginal prior for the variances are scaled (1000) $$F_{1,2}$ (black - model $\texttt{m.sigma.5}$) or $F_{1,1}$ (red- model $\texttt{m.sigma.6}$) distributions. The $F_{1,2}$ prior is flat for the correlation in $\texttt{line}$ effects across viruses.

In contrast, altering the degrees-of-freedom in the inverse-Wishart (ie. not using parameter-expanded $F$-priors) to achieve a better prior distribution for the correlation can have a much stronger influence on the variances and standard deviations 5.5.

$Marginal prior densities for the standard deviation using the prior generator functions $\texttt{IW(1,0.002)}$, $\texttt{IW(1,1)}$ and $\texttt{IW(1,2)}$, $\texttt{IW(1,3)}$ and $\texttt{IW(1,10)}$. $\texttt{IW(1,2)}$ results in a flat prior for the correlation. Note that the legend refers to these in terms of the marginal density for the variance (ie. inverse-gamma). Also, while the x-axis runs from zero to 31, the posterior distributions for the standard deviation in $\texttt{line}$ effects lie in the range 1.2-3.5. By doing this, the figure is directly comparable to Figure \@ref(fig:prior-compare).$

Figure 5.5: Marginal prior densities for the standard deviation using the prior generator functions $\texttt{IW(1,0.002)}$, $\texttt{IW(1,1)}$ and $\texttt{IW(1,2)}$, $\texttt{IW(1,3)}$ and $\texttt{IW(1,10)}$. $\texttt{IW(1,2)}$ results in a flat prior for the correlation. Note that the legend refers to these in terms of the marginal density for the variance (ie. inverse-gamma). Also, while the x-axis runs from zero to 31, the posterior distributions for the standard deviation in $\texttt{line}$ effects lie in the range 1.2-3.5. By doing this, the figure is directly comparable to Figure 4.7.

5.3.3 Full joint prior

We have concentrated on the marginal priors, since these are relatively easy to understand and visualise. However, by concentrating on the marginal distributions we may miss aspects of the joint prior distribution that is important. Characterising the joint distribution is hard, particular under parameter expansion where the full probability density function probably doesn’t have a closed form solution. However, we can simulate from these priors using the function rprior and use the visualisation tools in the library $\texttt{VisCov}$ (Tokuda et al. 2025).

prior.line<-resolve_prior(prior.sigma$G, k=3, vtype="us")

VCV<-rprior(2000, prior=prior.line)
# simulate from prior for line (co)variance matrix 

mat<- apply(VCV, 1, matrix, nrow=3, simplify=FALSE)
# coerce to list

CovPlotData = VisCov("User defined distribution", param=list(prob = 0.5, mat = mat))

$Visualisation of the prior distribution used for the $3\times 3$ (co)variance matrix of $\texttt{line}$ effects from model $\texttt{m.sigma.5}$ where a parameter expanded prior was used with marginal $F{1,2}$ priors on the variances with a scale of a 1,000 (half-$t_2$ prior on the standard deviation with scale $\sqrt{1000}\approx 32$). The prior is flat for the correlations. See @Tokuda.2025 for discussion of $\texttt{VisCov}$ plots.$

Figure 5.6: Visualisation of the prior distribution used for the $3\times 3$ (co)variance matrix of $\texttt{line}$ effects from model $\texttt{m.sigma.5}$ where a parameter expanded prior was used with marginal $F{1,2}$ priors on the variances with a scale of a 1,000 (half-$t_2$ prior on the standard deviation with scale $\sqrt{1000}\approx 32$). The prior is flat for the correlations. See Tokuda et al. (2025) for discussion of $\texttt{VisCov}$ plots.

We can also visualise the inverse-Wishart prior with $\texttt{nu}=4$ which is flat for the correlations since $\texttt{nu}^{\ast}=2$. We can see the dependencies in the posterior are stronger, particularly between the correlations and the standard deviations (Figure 5.7).

prior.IW<-resolve_prior(IW(1,2), k=3, vtype="us")
# inverse-Wishart prior with marginal inverse-gamma(1,1) prior on variances

VCV<-rprior(2000, prior=prior.IW)
# simulate from prior for line (co)variance matrix 

mat<- apply(VCV, 1, matrix, nrow=3, simplify=FALSE)
# coerce to list

CovPlotData = VisCov("User defined distribution", param=list(prob = 0.5, mat = mat))

$Visualisation of the prior distribution for a $3\times 3$ (co)variance matrix using an inverse-Wishart distribution with $\texttt{n}=4$ and $\texttt{V}={\bf I}\frac{1}{2}$. This prior is flat for the correlations ($\texttt{nu}^{\ast}=2$) but the marginal prior for the variances is inverse-gamma with shape and scale equal to one. See @Tokuda.2025 for discussion of $\texttt{VisCov}$ plots.$

Figure 5.7: Visualisation of the prior distribution for a $3\times 3$ (co)variance matrix using an inverse-Wishart distribution with $\texttt{n}=4$ and $\texttt{V}={\bf I}\frac{1}{2}$. This prior is flat for the correlations ($\texttt{nu}^{\ast}=2$) but the marginal prior for the variances is inverse-gamma with shape and scale equal to one. See Tokuda et al. (2025) for discussion of $\texttt{VisCov}$ plots.

References

Barnard, J., R. McCulloch, and X. L. Meng. 2000. “Modeling Covariance Matrices in Terms of Standard Deviations and Correlations, with Application to Shrinkage.” Statistica Sinica 10 (4): 1281–311.

Box, G. E. P., and G. C. Tiao. 1973. Bayesian Inference in Statistical Analysis. New York: John Wiley & Sons.

Carpenter, Jennifer A, Jarrod D Hadfield, Jenny Bangham, and Francis M Jiggins. 2012. “Specific Interactions Between Host and Parasite Genotypes Do Not Act as a Constraint on the Evolution of Antiviral Resistance in Drosophila.” Evolution 66 (4): 1114–25.

Tokuda, Tomoki, Ben Goodrich, Iven Van Mechelen, Andrew Gelman, and Francis Tuerlinckx. 2025. “Visualizing Distributions of Covariance Matrices.” Journal of Data Science, Statistics, and Visualisation 5 (7).

Remember that a global intercept is not fitted by default for variance structure models, and the model formula is essentially ~virus-1. To add the global intercept, us(1+virus):line could be fitted but this can be harder to interpret because the effects are then France, Spain-France and Greece-France. If a $\texttt{us}$ structure is fitted, the two models are equivalent reparameterisations of each other although the priors have to be modified accordingly. This is not the case if the variance function is $\texttt{idh}$. In this case the virus-specific variances are allowed to vary as before, but a constant covariance equal to $\sigma^{2}_{\color{blue}{\texttt{France}}}$ is also assumed↩︎
If anyone has an actual data set that could serve as an example, please get in touch. The code was developed for estimating a direct-maternal genetic correlation in an animal model (Chapter 8) but as an example it is too complicated.↩︎
In versions $<$ 2.05 priors on each variance of an $\texttt{idh}$ structure were distributed as $IW\left(\texttt{nu}^{\ast}\texttt{=nu-dim(V)+1},\ \texttt{V}^{\ast}=\texttt{V[1,1]}\right)$ but this was neither intuitive or completely consistent with the marginal prior from a full-covariance matrix (since $\texttt{V}^{\ast}$ should be $\texttt{V[1,1]}\frac{\texttt{nu}}{\texttt{nu}^{\ast}}$).↩︎
In models m.sigma.3 (and m.sigma.4) we fitted the $\texttt{idh}$ term idh(virus):line. We could also have set this model up as the much more ugly idh(at.level(virus, "France")):line+idh(at.level(virus, "Spain")):line+idh(at.level(virus, "Greece")):line. The function at.level returns a vector with ones where the first term ($\texttt{virus}$) is equal to the second term (e.g. $\texttt{"France"}$) and zero otherwise. In this context, it essentially just fits $\texttt{line}$ effects on observations made on the $\texttt{France}$ virus. The prior specification would require a prior on each of the three variances and a different $\texttt{nu}$ could be specified for each.↩︎
When correlation matrices are estimated using $\texttt{corg}$ and $\texttt{corgh}$ structures only $\texttt{nu}$ determines the prior density - $\texttt{V}$ is ignored. The marginal density for the correlations are the same as those given for the $inverse-Wishart$ and marginal $F$-priors.↩︎

MCMCglmm	lmer	nparameters	Variance	Correlation
`line`	`(1\|line)`	1	\(\left[\begin{array}{ccc}V&V&V\\V&V&V\\V&V&V\\ \end{array}\right]\)	\(\left[\begin{array}{ccc}{\color{red}{1}}&{\color{red}{1}}&{\color{red}{1}}\\{\color{red}{1}}&{\color{red}{1}}&{\color{red}{1}}\\{\color{red}{1}}&{\color{red}{1}}&{\color{red}{1}}\\ \end{array}\right]\)
`us(virus):line`	`(virus\|line)`	6	\(\left[\begin{array}{ccc}V_{1,1}&C_{1,2}&C_{1,3}\\C_{2,1}&V_{2,2}&C_{2,3}\\C_{3,1}&C_{3,2}&V_{3,3}\\ \end{array}\right]\)	\(\left[\begin{array}{ccc}{\color{red}{1}}&r_{1,2}&r_{1,3}\\r_{2,1}&{\color{red}{1}}&r_{2,3}\\r_{3,1}&r_{3,2}&{\color{red}{1}}\\ \end{array}\right]\)
`virus:line`	`(1\|virus:line)`	1	\(\left[\begin{array}{ccc}V&{\color{red}{0}}&{\color{red}{0}}\\{\color{red}{0}}&V&{\color{red}{0}}\\{\color{red}{0}}&{\color{red}{0}}&V\\ \end{array}\right]\)	\(\left[\begin{array}{ccc}{\color{red}{1}}&{\color{red}{0}}&{\color{red}{0}}\\{\color{red}{0}}&{\color{red}{1}}&{\color{red}{0}}\\{\color{red}{0}}&{\color{red}{0}}&{\color{red}{1}}\\ \end{array}\right]\)
`line+virus:line`	`(1\|line)+(1\|virus:line)`	2	\(\left[\begin{array}{ccc}V_1+V_2&V_1&V_1\\V_1&V_1+V_2&V_1\\V_1&V_1&V_1+V_2\\ \end{array}\right]\)	\(\left[\begin{array}{ccc}{\color{red}{1}}&r&r\\r&{\color{red}{1}}&r\\r&r&{\color{red}{1}}\\ \end{array}\right]\)
`idh(virus):line`		3	\(\left[\begin{array}{ccc}V_{1,1}&{\color{red}{0}}&{\color{red}{0}}\\{\color{red}{0}}&V_{2,2}&{\color{red}{0}}\\{\color{red}{0}}&{\color{red}{0}}&V_{3,3}\\ \end{array}\right]\)	\(\left[\begin{array}{ccc}{\color{red}{1}}&{\color{red}{0}}&{\color{red}{0}}\\{\color{red}{0}}&{\color{red}{1}}&{\color{red}{0}}\\{\color{red}{0}}&{\color{red}{0}}&{\color{red}{1}}\\ \end{array}\right]\)
`corg(virus):line`		3	\(\left[\begin{array}{ccc}{\color{red}{1}}&r_{1,2}&r_{1,3}\\r_{2,1}&{\color{red}{1}}&r_{2,3}\\r_{3,1}&r_{3,2}&{\color{red}{1}}\\ \end{array}\right]\)	\(\left[\begin{array}{ccc}{\color{red}{1}}&r_{1,2}&r_{1,3}\\r_{2,1}&{\color{red}{1}}&r_{2,3}\\r_{3,1}&r_{3,2}&{\color{red}{1}}\\ \end{array}\right]\)
`corgh(virus):line`		3	\(\left[\begin{array}{ccc}{\color{red}{V_{1,1}}}&r_{1,2}{\color{red}{\sqrt{V_{1,1}V_{2,2}}}}&r_{1,3}{\color{red}{\sqrt{V_{1,1}V_{3,3}}}}\\r_{2,1}{\color{red}{\sqrt{V_{2,2}V_{1,1}}}}&{\color{red}{V_{2,2}}}&r_{2,3}{\color{red}{\sqrt{V_{2,2}V_{3,3}}}}\\r_{3,1}{\color{red}{\sqrt{V_{3,3}V_{1,1}}}}&r_{3,2}{\color{red}{\sqrt{V_{3,3}V_{2,2}}}}&{\color{red}{V_{3,3}}}\\ \end{array}\right]\)	\(\left[\begin{array}{ccc}{\color{red}{1}}&r_{1,2}&r_{1,3}\\r_{2,1}&{\color{red}{1}}&r_{2,3}\\r_{3,1}&r_{3,2}&{\color{red}{1}}\\ \end{array}\right]\)
`cors(virus):line`		4	\(\left[\begin{array}{ccc}V_{1,1}&C_{1,2}&C_{1,3}\\C_{2,1}&{\color{red}{1}}&r_{2,3}\\C_{3,1}&r_{3,2}&{\color{red}{1}}\\ \end{array}\right]\)	\(\left[\begin{array}{ccc}{\color{red}{1}}&r_{1,2}&r_{1,3}\\r_{2,1}&{\color{red}{1}}&r_{2,3}\\r_{3,1}&r_{3,2}&{\color{red}{1}}\\ \end{array}\right]\)

MCMCglmm Course Notes

5 Categorical Random Interactions

5.1 Variance Structures

5.1.1 \(\texttt{idh}\) Variance Structure

5.1.2 \(\texttt{us}\) Variance Structure

5.1.3 Other Variance Structures

5.2 Linking Functions

5.2.1 \(\texttt{str}\): covariances between random terms

5.2.2 \(\texttt{mm}\): multi-membership models

5.2.3 \(\texttt{covu}\): covariances between random and residual terms

5.2.4 \(\texttt{theta_scale}\): scaled linear predictor

5.3 Priors for Covariance Matrices

5.3.1 Marginal Priors for Variances

5.3.2 Marginal Priors for Covariances and Correlations

5.3.3 Full joint prior

References