10 Path Analysis & Antedependence Structures

There are many situations where it would seem reasonable to put some aspect of a response variable in as a predictor, and the only thing that stops us is some (often vague) notion that this is a bad thing to do from a statistical point of view. The approach appears to have a long history in economics but I came across the idea in a paper written by Gianola and Sorensen (2004). The notation of this section, and indeed the sampling strategy employed in MCMCglmm is derived from this paper.

10.1 Path Anlaysis

\(\boldsymbol{\Psi}^{(l)}\) is a square matrix of dimension \(N \times N\) where \(\Psi_{i,j}^{(l)}=k\) sets up the equation \(y_{i} = \lambda_{l}ky_{j} \dots\). In matrix terms:

\[\boldsymbol{\Lambda}{\bf y} = {\bf y} - \sum_{l}\boldsymbol{\Psi}^{(l)}{\bf y}{\lambda_{l}}\]

Having \(\boldsymbol{\Psi} = \left[\boldsymbol{\Psi}^{(1)}\ \boldsymbol{\Psi}^{(2)}\ \dots \boldsymbol{\Psi}^{(L-1)}\ \boldsymbol{\Psi}^{(L)}\right]\) we have

\[\begin{array}{rl} \boldsymbol{\Lambda}{\bf y} =& {\bf y}-\boldsymbol{\Psi}\left({\bf I}_{L} \otimes {\bf y} \right)\boldsymbol{\Lambda}\\ =& {\bf y}-\boldsymbol{\Psi}\left(\boldsymbol{\Lambda}\otimes {\bf I}_{N}\right) {\bf y}\\ \end{array} \label{rs-Eq} \tag{10.1}\]

where \(\boldsymbol{\Lambda} = \left[\lambda_{1}\ \lambda_{2} \dots \lambda_{L-1}\ \lambda_{L} \right]^{\top}\), and \(\boldsymbol{\Lambda}={\bf I}_{N}-\boldsymbol{\Psi}\left(\boldsymbol{\Lambda}\otimes {\bf I}_{N}\right)\)

Each \(\boldsymbol{\Psi}^{(l)}\) can be formed using the function sir which takes two formulae. \(\boldsymbol{\Psi} = {\bf X}_{1}{\bf X}_{2}^{\top}\) where \({\bf X}_{1}\) and \({\bf X}_{2}\) are the model matrices defined by the formulae (with intercept removed). \({\bf X}_{1}\) and \({\bf X}_{2}^{\top}\) have to be conformable, and although this could be achieved in many ways, one way to ensure this is to have categorical predictors in each which have common factor levels. To give a concrete example, lets take a sample of individuals measured a variable number of times for 2 traits:

id <- sample(1:100, 100, replace = T)
y1 <- rnorm(100)
y2 <- rnorm(100)
y <- c(y1, y2)
trait <- gl(2, 100)

Lets then imagine that each of these individuals interacts with another randomly chosen individual - indexed in the vector id1

id1 <- sample(id, 100, replace = T)
id <- as.factor(c(id, id))
id1 <- factor(c(id1, id1), levels = levels(id))

we will adopt a recursive model where by the phenotypes of individuals in the id1 vector affect those in the id vector:

Psi <- sir(~id1, ~id)

we can see that the first record for individual id[1]=83 is directly affected by individual id1[1]=61’s traits:

Psi[1, which(id == id1[1])]
##  38  83 138 183 
##   1   1   1   1

i.e individual id1[1]=61 has 4 records.

We can build on this simple model by stating that only trait 2 affects trait 1:

Psi <- sir(~id1:at.level(trait, 1), ~id:at.level(trait, 2))
Psi[c(1, 101), which(id == id1[1])]
##     38 83 138 183
## 1    0  0   1   1
## 101  0  0   0   0

or that trait 2 affect both trait 2 and trait 1:

Psi <- sir(~id1, ~id:at.level(trait, 2))
Psi[c(1, 101), which(id == id1[1])]
##     38 83 138 183
## 1    0  0   1   1
## 101  0  0   1   1
my.data <- data.frame(y1 = y1, y2 = y2, id = id[1:100], id1 = id1[1:100], x = rnorm(100))
m1 <- MCMCglmm(y1 ~ x + sir(~id1, ~id) + y2, data = my.data)

One problem is that \({\bf e}^{\star}\) the residual vector that appears in the likelihood for the latent variable does not have a simple (block) diagonal structure when (as in the case above) the elements of the response vector that are regressed on each other are not grouped in the R-structure:

\[{\bf e}^{\star} \sim N\left({\boldsymbol{\mathbf{0}}}, \boldsymbol{\Lambda}^{-1}{\bf R}\boldsymbol{\Lambda}^{-\top}\right)\]

Consequently, analyses that involve latent variables (i.e. non-Gaussian data, or analyses that have incomplete records for determining the R-structure) are currently not implemented in MCMCglmm.

The path function is a way of specifying path models that are less general than those formed by sir but are simple enough to allow updating of the latent variables associated with non-Gaussian data. Imagine a residual structure is fitted where the \(N\) observations are grouped into \(n\) blocks of \(k\). For instance this might be \(k\) different characteristics measured in \(n\) individuals. A path model may be entertained whereby an individual’s characteristics only affect their own characteristics rather than anyone else’s. In this case, \(\boldsymbol{\Psi}^{(l)}=\boldsymbol{\Psi}^{(l)}\otimes{\bf I}_{n}\) is block diagonal, and \(\boldsymbol{\Psi}=\boldsymbol{\Psi}\otimes{\bf I}_{n}\) where \(\boldsymbol{\Psi} = \left[\boldsymbol{\Psi}^{(1)}, \boldsymbol{\Psi}^{(2)} \dots \boldsymbol{\Psi}^{(L)}\right]\). Consequently,

\[\begin{array}{rl} \boldsymbol{\Lambda}=&{\bf I}_{N}-\boldsymbol{\Psi}\left(\boldsymbol{\Lambda}\otimes {\bf I}_{N}\right)\\ =&{\bf I}_{N}-\left(\boldsymbol{\Psi}\otimes{\bf I}_{n}\right)\left(\boldsymbol{\Lambda}\otimes {\bf I}_{N}\right)\\ =&\left({\bf I}_{k}\otimes{\bf I}_{n}\right)-\left(\boldsymbol{\Psi}\otimes{\bf I}_{n}\right)\left(\boldsymbol{\Lambda}\otimes {\bf I}_{k}\otimes{\bf I}_{n}\right)\\ =&\left({\bf I}_{k}-\boldsymbol{\Psi}\left(\boldsymbol{\Lambda}\otimes {\bf I}_{k}\right)\right)\otimes{\bf I}_{n}\\ \end{array}\]

and so \(\boldsymbol{\Lambda}^{-1}=\left({\bf I}_{k}-\boldsymbol{\Psi}\left(\boldsymbol{\Lambda}\otimes {\bf I}_{k}\right)\right)^{-1}\otimes{\bf I}_{n}\) and \(|\boldsymbol{\Lambda}| = |{\bf I}_{k}-\boldsymbol{\Psi}\left(\boldsymbol{\Lambda}\otimes {\bf I}_{k}\right)|^{n}\)

Mean-centring responses can help mixing, because \(\boldsymbol{\theta}\) and \(\boldsymbol{\lambda}\) are not sampled in a block. (Jarrod - I guess when \(|\boldsymbol{\Lambda}|=1\) this could be detected and updating occur in a block?)

10.2 Antedependence

An \(n\times n\) unstructured covariance matrix can be reparameterised in terms of regression coefficients and residual variances from a set of \(n\) nested multiple regressions. For example, for \(n=3\) the following 3 multiple regressions can be defined:

\[\begin{array}{rl} u_{3} =& u_{2}\beta_{3|2}+u_{1}\beta_{3|1}+e_{u_{3}}\\ u_{2} =& u_{1}\beta_{2|1}+e_{u_{2}}\\ u_{1} =& e_{u_{1}}\\ \end{array}\]

Arranging the regression coefficients and residual (‘innovation’) variances into a lower triangular matrix and diagonal matrix respectively:

\[{\bf L}_{u}= \left[ \begin{array}{ccc} 1&0&0\\ -\beta_{2|1}&1&0\\ -\beta_{3|1}&-\beta_{3|2}&1\\ \end{array} \right]\]

and

\[{\bf D}_{u}= \left[ \begin{array}{ccc} \sigma^{2}_{e_{u_{1}}}&0&0\\ 0&\sigma^{2}_{e_{u_{2}}}&0\\ 0&0&\sigma^{2}_{e_{u_{3}}}\\ \end{array} \right]\]

gives \[{\bf V}_{u} = {\bf L}_{u}{\bf D}_{u}{\bf L}_{u}^{\top}\]

Rather than fit the saturated model (in this case all 3 regression coefficients) \(k^{th}\) order antedependence models seek to model \({\bf V}_{u}\) whilst constraining the regression coefficients in \({\bf L}_{u}\) to be zero if they are on sub-diagonals. For example, a first order antedependence model would set the regression coefficients in the second off-diagonal (i.e. \(\beta_{3|1}\)) to zero, but estimate those in the first sub-diagonal (i.e. \(\beta_{2|1}\) and \(\beta_{3|2}\)). For a \(3\times3\) matrix, a second order antedependence model would fit a fully unstructured covariance matrix. In terms of Gibbs sampling this parameterisation is less efficient because \({\bf V}_{u}\) is sampled in two blocks (the regression coefficients followed by the innovation variances) rather than in a single block from the inverse Wishart. However, more flexible conjugate prior specifications are possible by placing multivariate normal priors on the regression coefficients and independent inverse Wishart priors on the innovation variances. By constraining arbitrary regression coefficients to be zero in a fully unstructured model allows any fully recursive path model to be constructed for a set of random effects.

10.3 Scaling

The path analyses described above essentially allow elements of the response vector to be regressed on each other. Regressing an observation on itself would seem like a peculiar thing to do, although with a little work we can show that by doing this we can allow two sets of observations to conform to the same model except for a difference in scale.

References

Gianola, D., and D. Sorensen. 2004. “Quantitative Genetic Models for Describing Simultaneous and Recursive Relationships Between Phenotypes.” Genetics 167 (3): 1407–24.