As usual, our starting point is a random experiment with probability measure \(\P\) on an underlying sample space. Unless otherwise noted, we assume that all expected values mentioned in this section exist. Suppose now that \(X\) and \(Y\) are real-valued random variables for the experiment with means \(\E(X)\), \(\E(Y)\) and variances \(\var(X)\), \(\var(Y)\), respectively.
The covariance of \((X, Y)\) is defined by \[ \cov(X, Y) = \E\left(\left[X - \E(X)\right]\left[Y - \E(Y)\right]\right) \] and, assuming the variances are positive, the correlation of \((X, Y)\) is defined by \[ \cor(X, Y) = \frac{\cov(X, Y)}{\sd(X) \sd(Y)} \]
Correlation is a scaled version of covariance; note that the two parameters always have the same sign (positive, negative, or 0). When the sign is positive, the variables are said to be positively correlated; when the sign is negative, the variables are said to be negatively correlated; and when the sign is 0, the variables are said to be uncorrelated. Note also that correlation is dimensionless, since the numerator and denominator have the same physical units, namely the product of the units of \(X\) and \(Y\).
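The two definitions can be evaluated directly for a small discrete distribution. The following is a minimal sketch, not a general-purpose routine: the joint probability mass function `pmf` and the helper names `expect`, `covariance`, and `correlation` are assumptions made up for illustration, and exact rational arithmetic is used so that the covariance is free of rounding error.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical joint pmf of (X, Y): maps (x, y) to P(X = x, Y = y).
pmf = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 1): Fraction(1, 4),
    (1, 2): Fraction(1, 4),
}

def expect(g, pmf):
    """E[g(X, Y)] for a finite joint pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

def covariance(pmf):
    """cov(X, Y) = E([X - E(X)][Y - E(Y)]), straight from the definition."""
    mu = expect(lambda x, y: x, pmf)
    nu = expect(lambda x, y: y, pmf)
    return expect(lambda x, y: (x - mu) * (y - nu), pmf)

def correlation(pmf):
    """cor(X, Y) = cov(X, Y) / [sd(X) sd(Y)]; assumes both variances are positive."""
    mu = expect(lambda x, y: x, pmf)
    nu = expect(lambda x, y: y, pmf)
    var_x = expect(lambda x, y: (x - mu) ** 2, pmf)
    var_y = expect(lambda x, y: (y - nu) ** 2, pmf)
    return float(covariance(pmf)) / sqrt(float(var_x * var_y))

cov_xy = covariance(pmf)   # exact Fraction
cor_xy = correlation(pmf)  # float, since a square root is involved
```

For this particular distribution the covariance and the correlation are both positive, so the variables are positively correlated in the sense described above.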
As these terms suggest, covariance and correlation measure a certain kind of dependence between the variables. One of our goals is a deep understanding of this dependence. As a start, note that \(\left(\E(X), \E(Y)\right)\) is the center of the joint distribution of \((X, Y)\), and the vertical and horizontal lines through this point separate \(\R^2\) into four quadrants. The function \((x, y) \mapsto \left[x - \E(X)\right]\left[y - \E(Y)\right]\) is positive on the first and third of these quadrants and negative on the second and fourth.
A joint distribution with \( \left(\E(X), \E(Y)\right) \) as the center of mass
Properties of Covariance
The following theorems give some basic properties of covariance. The main tool that we will need is the fact that expected value is a linear operation. Other important properties will be derived below, in the subsection on the best linear predictor. As usual, be sure to try the proofs yourself before reading the ones in the text.
Our first result is a formula that is better than the definition for computational purposes.
\(\cov(X, Y) = \E(X Y) - \E(X) \, \E(Y)\).
Let \( \mu = \E(X) \) and \( \nu = \E(Y) \). Then
\[ \cov(X, Y) = \E\left[(X - \mu)(Y - \nu)\right] = \E(X Y - \mu Y - \nu X + \mu \nu) = \E(X Y) - \mu \E(Y) - \nu \E(X) + \mu \nu = \E(X Y) - \mu \nu \]
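As a quick sanity check of the computational formula, the sketch below evaluates the covariance both ways for an assumed example that is not taken from the text: two fair dice, with \(X\) the first score and \(Y\) the sum of the two scores. The names `outcomes`, `expect`, `X`, and `Y` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two fair dice; each of the 36 outcomes (a, b) is equally likely.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))

def expect(g):
    """E[g] under the uniform distribution on the 36 outcomes."""
    return sum(g(a, b) * p for a, b in outcomes)

def X(a, b):
    return a          # first score

def Y(a, b):
    return a + b      # sum of the two scores

mu, nu = expect(X), expect(Y)

# Definition: E([X - mu][Y - nu])
cov_definition = expect(lambda a, b: (X(a, b) - mu) * (Y(a, b) - nu))

# Computational formula: E(XY) - E(X) E(Y)
cov_shortcut = expect(lambda a, b: X(a, b) * Y(a, b)) - mu * nu
```

Both expressions give the same exact rational value, as the result requires. The computational formula needs only the three expected values \(\E(X Y)\), \(\E(X)\), and \(\E(Y)\), which is why it tends to be more convenient in hand calculations.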
By the previous result, we see that \(X\) and \(Y\) are uncorrelated if and only if \(\E(X Y) = \E(X) \E(Y)\). In particular, if \(X\) and \(Y\) are independent, then they are uncorrelated. However, the converse fails with a passion: an exercise below gives an example of two variables that are functionally related (the strongest form of dependence), yet uncorrelated. The computational exercises give other examples of dependent yet uncorrelated variables. Note also that if one of the variables has mean 0, then the covariance is simply the expected product.
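To make the failure of the converse concrete, here is a small sketch under an assumed setup, which is not necessarily the example used in the exercises: \(X\) is uniformly distributed on \(\{-1, 0, 1\}\) and \(Y = X^2\). The variable names are illustrative.

```python
from fractions import Fraction

# Assumed example: X uniform on {-1, 0, 1}, and Y = X^2 (a function of X).
support = [-1, 0, 1]
p = Fraction(1, 3)

e_x = sum(x * p for x in support)            # E(X)
e_y = sum(x ** 2 * p for x in support)       # E(Y) = E(X^2)
e_xy = sum(x * x ** 2 * p for x in support)  # E(XY) = E(X^3)

cov_xy = e_xy - e_x * e_y                    # computational formula

# Dependence check: compare P(X = 0, Y = 0) with P(X = 0) P(Y = 0).
# Y = 0 exactly when X = 0, so all three probabilities equal 1/3.
p_joint = p
p_product = p * p
```

Here `cov_xy` is 0 even though \(Y\) is completely determined by \(X\), and the joint probability `p_joint` differs from the product `p_product`, so the variables are uncorrelated but not independent.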
Trivially, covariance is a symmetric operation.