10 Path Analysis

In previous chapters, variables could be neatly divided into predictor variables and a response variable (or variables, in the multi-response models covered in Chapter 7). However, sometimes we may want to allow observations of a response variable to serve as predictors for other observations. The \(\texttt{sir}\) function can be used to specify linear dependencies between elements of the response that are arbitrarily complex in their pattern. However, the \(\texttt{sir}\) function is hard to use and can only be used for Gaussian responses without missing values. The simpler \(\texttt{path}\) function only allows dependencies to exist between observations within a residual block but non-Gaussian repose variables are permitted. However, it is important to realise that when the response is non-Gaussian, dependencies are defined at the level of the latent variable, not the observation.

Before discussing these models it will be useful to determine if they are indeed required, as very often they can be set up as a standard multivariate analysis without \(\texttt{path}\)/\(\texttt{sir}\) terms or even a series of univariate analyses. .

10.1 \(\texttt{path}\)

data(mtcars)

m.path<-MCMCglmm(cbind(mpg, hp)~trait-1+at.level(trait, 1):(gear+cyl+disp+carb+am+wt)+at.level(trait, 2):(cyl+disp+carb)+path(2, 1, 2), rcov=~idh(trait):units, data=mtcars, family=rep("gaussian", 2),prior=list(R=list(V=diag(2),nu=0.002)), longer=50)

10.2 \(\texttt{sir}\)

#library(sand) 
#data(lazega)           
#A<-as_adjacency_matrix(lazega, sparse = TRUE)                             

\(\boldsymbol{\Psi}^{(l)}\) is a square matrix of dimension \(N \times N\) where \(\Psi_{i,j}^{(l)}=k\) sets up the equation \(y_{i} = \lambda_{l}ky_{j} \dots\). In matrix terms:

\[\boldsymbol{\Lambda}{\bf y} = {\bf y} - \sum_{l}\boldsymbol{\Psi}^{(l)}{\bf y}{\lambda_{l}}\]

Having \(\boldsymbol{\Psi} = \left[\boldsymbol{\Psi}^{(1)}\ \boldsymbol{\Psi}^{(2)}\ \dots \boldsymbol{\Psi}^{(L-1)}\ \boldsymbol{\Psi}^{(L)}\right]\) we have

\[\begin{array}{rl} \boldsymbol{\Lambda}{\bf y} =& {\bf y}-\boldsymbol{\Psi}\left({\bf I}_{L} \otimes {\bf y} \right)\boldsymbol{\Lambda}\\ =& {\bf y}-\boldsymbol{\Psi}\left(\boldsymbol{\Lambda}\otimes {\bf I}_{N}\right) {\bf y}\\ \end{array} \label{rs-Eq} \tag{10.1}\]

where \(\boldsymbol{\Lambda} = \left[\lambda_{1}\ \lambda_{2} \dots \lambda_{L-1}\ \lambda_{L} \right]^{\top}\), and \(\boldsymbol{\Lambda}={\bf I}_{N}-\boldsymbol{\Psi}\left(\boldsymbol{\Lambda}\otimes {\bf I}_{N}\right)\)

Each \(\boldsymbol{\Psi}^{(l)}\) can be formed using the function sir which takes two formulae. \(\boldsymbol{\Psi} = {\bf X}_{1}{\bf X}_{2}^{\top}\) where \({\bf X}_{1}\) and \({\bf X}_{2}\) are the model matrices defined by the formulae (with intercept removed). \({\bf X}_{1}\) and \({\bf X}_{2}^{\top}\) have to be conformable, and although this could be achieved in many ways, one way to ensure this is to have categorical predictors in each which have common factor levels. To give a concrete example, lets take a sample of individuals measured a variable number of times for 2 traits:

id<-sample(1:100, 100, replace=T)
y1<-rnorm(100)
y2<-rnorm(100)
y<-c(y1,y2)
trait<-gl(2,100)

Lets then imagine that each of these individuals interacts with another randomly chosen individual - indexed in the vector id1

id1<-sample(id, 100, replace=T)
id<-as.factor(c(id,id))
id1<-factor(c(id1,id1),levels=levels(id))

we will adopt a recursive model where by the phenotypes of individuals in the id1 vector affect those in the id vector:

Psi<-MCMCglmm::sir(~id1, ~id)

we can see that the first record for individual id[1]=68 is directly affected by individual id1[1]=64’s traits:

Psi[1,which(id==id1[1])]
##   8  51  88 108 151 188 
##   1   1   1   1   1   1

i.e individual id1[1]=64 has 6 records.

We can build on this simple model by stating that only trait 2 affects trait 1:

Psi<-MCMCglmm::sir(~id1:at.level(trait,1), ~id:at.level(trait,2))
Psi[c(1,101),which(id==id1[1])]
##     8 51 88 108 151 188
## 1   0  0  0   1   1   1
## 101 0  0  0   0   0   0

or that trait 2 affect both trait 2 and trait 1:

Psi<-MCMCglmm::sir(~id1, ~id:at.level(trait,2))
Psi[c(1,101),which(id==id1[1])]
##     8 51 88 108 151 188
## 1   0  0  0   1   1   1
## 101 0  0  0   1   1   1
my.data<-data.frame(y1=y1, y2=y2, id=id[1:100], id1=id1[1:100], x=rnorm(100))
m1<-MCMCglmm(y1~x+sir(~id1, ~id)+y2, data=my.data)

One problem is that \({\bf e}^{\star}\) the residual vector that appears in the likelihood for the latent variable does not have a simple (block) diagonal structure when (as in the case above) the elements of the response vector that are regressed on each other are not grouped in the R-structure:

\[{\bf e}^{\star} \sim N\left({\boldsymbol{\mathbf{0}}}, \boldsymbol{\Lambda}^{-1}{\bf R}\boldsymbol{\Lambda}^{-\top}\right)\]

Consequently, analyses that involve latent variables (i.e. non-Gaussian data, or analyses that have incomplete records for determining the R-structure) are currently not implemented in MCMCglmm.

The path function is a way of specifying path models that are less general than those formed by sir but are simple enough to allow updating of the latent variables associated with non-Gaussian data. Imagine a residual structure is fitted where the \(N\) observations are grouped into \(n\) blocks of \(k\). For instance this might be \(k\) different characteristics measured in \(n\) individuals. A path model may be entertained whereby an individual’s characteristics only affect their own characteristics rather than anyone else’s. In this case, \(\boldsymbol{\Psi}^{(l)}=\boldsymbol{\Psi}^{(l)}\otimes{\bf I}_{n}\) is block diagonal, and \(\boldsymbol{\Psi}=\boldsymbol{\Psi}\otimes{\bf I}_{n}\) where \(\boldsymbol{\Psi} = \left[\boldsymbol{\Psi}^{(1)}, \boldsymbol{\Psi}^{(2)} \dots \boldsymbol{\Psi}^{(L)}\right]\). Consequently,

\[\begin{array}{rl} \boldsymbol{\Lambda}=&{\bf I}_{N}-\boldsymbol{\Psi}\left(\boldsymbol{\Lambda}\otimes {\bf I}_{N}\right)\\ =&{\bf I}_{N}-\left(\boldsymbol{\Psi}\otimes{\bf I}_{n}\right)\left(\boldsymbol{\Lambda}\otimes {\bf I}_{N}\right)\\ =&\left({\bf I}_{k}\otimes{\bf I}_{n}\right)-\left(\boldsymbol{\Psi}\otimes{\bf I}_{n}\right)\left(\boldsymbol{\Lambda}\otimes {\bf I}_{k}\otimes{\bf I}_{n}\right)\\ =&\left({\bf I}_{k}-\boldsymbol{\Psi}\left(\boldsymbol{\Lambda}\otimes {\bf I}_{k}\right)\right)\otimes{\bf I}_{n}\\ \end{array}\]

and so \(\boldsymbol{\Lambda}^{-1}=\left({\bf I}_{k}-\boldsymbol{\Psi}\left(\boldsymbol{\Lambda}\otimes {\bf I}_{k}\right)\right)^{-1}\otimes{\bf I}_{n}\) and \(|\boldsymbol{\Lambda}| = |{\bf I}_{k}-\boldsymbol{\Psi}\left(\boldsymbol{\Lambda}\otimes {\bf I}_{k}\right)|^{n}\)

Mean-centring responses can help mixing, because \(\boldsymbol{\theta}\) and \(\boldsymbol{\lambda}\) are not sampled in a block. (Jarrod - I guess when \(|\boldsymbol{\Lambda}|=1\) this could be detected and updating occur in a block?)

10.3 Antedependence

An \(n\times n\) unstructured covariance matrix can be reparameterised in terms of regression coefficients and residual variances from a set of \(n\) nested multiple regressions. For example, for \(n=3\) the following 3 multiple regressions can be defined:

\[\begin{array}{rl} u_{3} =& u_{2}\beta_{3|2}+u_{1}\beta_{3|1}+e_{u_{3}}\\ u_{2} =& u_{1}\beta_{2|1}+e_{u_{2}}\\ u_{1} =& e_{u_{1}}\\ \end{array}\]

Arranging the regression coefficients and residual (‘innovation’) variances into a lower triangular matrix and diagonal matrix respectively:

\[{\bf L}_{u}= \left[ \begin{array}{ccc} 1&0&0\\ -\beta_{2|1}&1&0\\ -\beta_{3|1}&-\beta_{3|2}&1\\ \end{array} \right]\]

and

\[{\bf D}_{u}= \left[ \begin{array}{ccc} \sigma^{2}_{e_{u_{1}}}&0&0\\ 0&\sigma^{2}_{e_{u_{2}}}&0\\ 0&0&\sigma^{2}_{e_{u_{3}}}\\ \end{array} \right]\]

gives \[{\bf V}_{u} = {\bf L}_{u}{\bf D}_{u}{\bf L}_{u}^{\top}\]

Rather than fit the saturated model (in this case all 3 regression coefficients) \(k^{th}\) order antedependence models seek to model \({\bf V}_{u}\) whilst constraining the regression coefficients in \({\bf L}_{u}\) to be zero if they are on sub-diagonals. For example, a first order antedependence model would set the regression coefficients in the second off-diagonal (i.e. \(\beta_{3|1}\)) to zero, but estimate those in the first sub-diagonal (i.e. \(\beta_{2|1}\) and \(\beta_{3|2}\)). For a \(3\times3\) matrix, a second order antedependence model would fit a fully unstructured covariance matrix. In terms of Gibbs sampling this parameterisation is less efficient because \({\bf V}_{u}\) is sampled in two blocks (the regression coefficients followed by the innovation variances) rather than in a single block from the inverse Wishart. However, more flexible conjugate prior specifications are possible by placing multivariate normal priors on the regression coefficients and independent inverse Wishart priors on the innovation variances. By constraining arbitrary regression coefficients to be zero in a fully unstructured model allows any fully recursive path model to be constructed for a set of random effects.

10.4 Scaling

The path analyses described above essentially allow elements of the response vector to be regressed on each other. Regressing an observation on itself would seem like a peculiar thing to do, although with a little work we can show that by doing this we can allow two sets of observations to conform to the same model except for a difference in scale.