Welcome to our Knowledge Base

Covariance Selection

You are here:
< last topic

Covariance Selection

Mean reversion is a classic trading strategy, which assumes that a stock’s price will tend to move to the average price over time. It is a classic indicator of predictability in financial markets.

Traditionally, people use cointegration or canonical correlation analysis to obtain the optimal mean reversion portfolio. However, there are some shortcomings of these methods:

  • The mean reverting portfolios they identify include every asset in the time series analyzed. It involves considerable transaction costs.
  • Less interpretability of the resulting portfolio.
  • Optimally mean reverting portfolios often behave like noise and sometimes vary well inside bid-ask spreads, hence do not form meaningful statistical arbitrage opportunities.

Hence, we want to form portfolios with maximum mean reversion while constraining the number of assets in these portfolios. The covariance selection method is included in this article.

Algorithms for extracting sparse mean reverting portfolios

The asset prices follow a stationary VAR process with

\(S_t = S_{t-1}A + Z_t\)

where \(S_{t-1}\) is the lagged portfolio process, \(A \in R^{n\times n}\) and \(Z_t\) is a vector of i.i.d. \(Z_t\) is the noise term following \(\mathcal{N}(0,\Sigma)\), where \(\Sigma \in S^n\), and it is independent of \(S_{t-1}\). Assume \(S_t\) has zero mean. After computation we get:

\(\hat{A} = (S^T_{t-1}S_{t-1})^{-1}S^T_{t-1}S_t\)

Here are two techniques to get good approximate solutions:

1. Greedy Search

Call \(I_k\) the support of the solution vector \(x\) given \(k > 0\)

\(I_k = \{i \in [1,n]: x_i \neq 0\}\)

by construction \(|I_k| \le k\).

Suppose we have a good approximate solution with support set \(I_k\) given by:

\(x_k = \underset{\{x\in R^n:x_{I^c_k}=0\}}{argmax}\frac{x^TAx}{x^TBx}\)

where \(I_k^c\) is the complement of the set \(I_k\). We seek to add one variable with index \(i_{k+1}\) to the set \(I_k\) to produce the largest increase in predictability by scanning each of the remaining indices in \(I_k^c\). The index \(i_{k+1}\) is then given by:

\(i_{k+1} = \underset{i \in I_k^c}{argmax}\underset{\{x\in R^n:x_{J_i}=0\}}{max}\frac{x^TAx}{x^TBx},\) where \(J_i = I_k^c \setminus \{i\}\)

which amounts to solving  generalized eigenvalue problems of size . We then define:

\(I_{k+1} = I_k\cup \{i_{k+1}\}\)

And repeat the procedure until \(k=n\).

2. Semidefinite Relaxation Techniques

Transfer the original question to the weak convex constraint question:

Maximize \(Tr(AY)\)

Subject to \(1^T|Y|1 – kz \le 0\)

\(Tr(Y) – z = 0\)

\(Tr(BY) = 1\)

\(Y \succeq 0\)

where \(Y = \frac{X}{Tr(BX)}\) and \(z = \frac{1}{Tr(BX)}\).

The computational complexity of this relaxation is significantly higher than that of the greedy search algorithm.

Parameter Estimation

Both \(\Gamma\) and \(A\) estimates suffer from the stability issues. We can penalize the covariance estimation using a multiple of the norm of \(\Gamma\) to stabilize the estimation.

1. Covariance Selection

We estimate the covariance matrix  by maximum likelihood:

\(\underset{X}{max}\ log\ det X – Tr(\Sigma X)-\rho Card(X)\)

where \(Card(X)\) is the cardinality of \(X\), i.e. the number of nonzero coefficients in \(X\) and \(\rho > 0\) is a parameter controlling the tradeoff between likelihood and structure.

The advantages of this process are:

  • Improves the stability of this estimation procedure by implicitly reducing the number of parameters
  • Directly highlights structure in the underlying model.

However, the cardinality penalty makes this problem very hard to solve numerically. So we improve it by:

\(\underset{X}{max}\ log\ det X – Tr(\Sigma X)-\rho \sum_{i,j=1}^n |X_{ij}|\)

2. Estimating Structured VAR Models

(1) Endogenous dependence models

Assume that the conditional dependence structure of the assets \(S_t\) is purely endogenous, with the VAR model \(S_t = S_{t-1}A + Z_t\) where \(Z_t \sim \mathcal{N}(0,\sigma I)\) and choosing \(\sigma\) small enough, we can get \(A\) as a matrix square root of \((I-\sigma\Gamma^{-1})\).

(2) Exogenous dependence models

Modify the previous \(\hat{A} = (S^T_{t-1}S_{t-1})^{-1}S^T_{t-1}S_t\) estimation in order to get a spare model matrix \(A\), we get the columns of \(A\) by solving:

\(a_i = \underset{x}{argmax}||S_{it} – S_{t-1}x||^2 + \gamma ||x||_1\)

3. Canonical Decomposition with Penalized Estimation

We want to combine covariance selection and the penalized regression to extract information on the support of the canonical portfolios.

We first estimate a sparse inverse covariance matrix by solving the improved covariance selection formula, setting \(\rho\) large enough. We then check if penalized estimates of \(A\) share some clusters with the graph of \(\Gamma^{-1}\). After this preprocessing step, we use the greedy search or semidefinite relaxation algorithms to search these clusters of variables for optimal mean reverting portfolios.


  1. So far only the simple models with priori recognize that a few variables have economic significance can be recovered by the sparse canonical decomposition.
  2. Currently we have no procedure for deriving simple bounds on suboptimality for the greedy algorithm.


  1. “O. Banerjee, L. E. Ghaoui and A. d’Aspremont, “Model Selection Through Sparse Maximum Likelihood Estimation for multivariate Gaussian or Binary Data,” Journal of Machine Learning Research, 9, pp. 485-516, March 2008.”
  2. “A. d’Aspremont, “Identifying Small Mean Reverting Portfolios”, 2008.”
Was this article helpful?
0 out of 5 stars
5 Stars 0%
4 Stars 0%
3 Stars 0%
2 Stars 0%
1 Stars 0%
How can we improve this article?
Table of Contents

Leave a Reply

Your email address will not be published. Required fields are marked *