The Pearson correlation coefficient is the most widely used. It measures the strength of the linear relationship between normally distributed variables. When the variables are not normally distributed or the relationship between the variables is not linear, it may be more appropriate to use the Spearman rank correlation method.
All Answers (106)
Yuanzhang Li · Walter Reed Army Institute of Research
Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations. The form of the definition involves a "product moment", that is, the mean (the first moment about the origin) of the product of the mean-adjusted random variables; hence the modifier product-moment in the name.
If the covariance and variances of the two vairables exist, the Pearson's correlation coefficient can be extimated. No normal distribution assmption is needed.
In the case of the bivariate normal distribution, the sample correlation coefficient is the maximum likelihood estimate of the population correlation coefficient, and
is asymptotically unbiased and efficient. which roughly means that it is impossible to construct a more accurate estimate than the sample correlation coefficient if the data are normal and the sample size is moderate or large.
For non-normal populations, the sample correlation coefficient remains approximately unbiased, but may not be efficient. The sample correlation coefficient is a consistent estimator of the population correlation coefficient as long as the sample means, variances, and covariance are consistent (which is guaranteed when the law of large numbers can be applied).
The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the ranked variables. If the data has outliers, a few values are far away from others, use Spearman correlation coefficient .
Note that this method should not be used in cases where the data set is truncated; that is, when the Spearman correlation coefficient is desired for the top X records (whether by pre-change rank or post-change rank, or both), the user should use the Pearson correlation coefficient formula .