Correlation Explained [TimeWeb]
The correlation between two variables is the degree to which there is a 'linear relationship' between them. Correlation is usually expressed as a 'coefficient' which measures the strength of that linear relationship between the variables.
Two main methods of calculating correlations are Spearman's Rank Correlation Coefficient and Pearson's (or the Product-Moment) Correlation Coefficient.
There is an illustration available of the correlation of variables.
The Correlation Coefficient (r)
Scatter plots (or x-y charts) and regression lines can provide a general picture of the correlation between two sets of variables. Often though, you will need to give a more precise measurement of the degree of correlation. Using a correlation coefficient is the way to produce a mathematical measure of correlation.
More detailed help on Spearman's Rank Correlation and Pearson's Correlation Coefficient is available in the illustration section. What we want to do here is go through the steps that Excel follows to calculate r, the correlation coefficient. This is included for interest, as many of you will be using Excel to arrive at coefficients of correlation. A review of the two main correlation methods is also included as a recap. Don't forget that TimeWeb has an Excel Guide available in the reference section. This is accessible at any time from the navigation bar on the left-hand side.
- Firstly, you transform the scores in each of the sets of data into z-scores. Remember that a z-score is a measure of how far any particular score is from the mean of the entire set, and that the units of z-scores are standard deviations. So a z-score of 2.5 means that the value is 2.5 standard deviations above the mean; a z-score of -2.5 means that the value falls 2.5 standard deviations below the mean.
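Only the first step is listed above. As a minimal sketch, assuming the remaining steps follow the standard z-score form of Pearson's r (multiply the paired z-scores, add up the products, and divide by n - 1), the calculation might look like this in Python. The function names are illustrative and are not Excel functions; the sample standard deviation is an assumption, although using the population form throughout and dividing by n gives the same r.

```python
from statistics import mean, stdev


def z_scores(values):
    """Convert raw scores to z-scores: distance from the mean in standard deviations."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1); an assumption, see lead-in
    # Note: s is zero if every value is identical, which raises ZeroDivisionError.
    return [(v - m) / s for v in values]


def pearson_r(xs, ys):
    """Pearson's r as the sum of paired z-score products divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("both data sets must contain the same number of scores")
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical scores, invented purely for illustration
entry_test = [12, 15, 9, 20, 17]
final_marks = [48, 55, 40, 70, 61]
print(round(pearson_r(entry_test, final_marks), 3))
```

For real data, the result should agree with Excel's CORREL worksheet function up to rounding.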
Interpretation and significance of the coefficient
The following general categories give a quick way of interpreting the size of a calculated r value (ignoring its sign):
0.0 to 0.2 Very weak to negligible correlation
0.2 to 0.4 Weak, low correlation (not very significant)
0.4 to 0.7 Moderate correlation
0.7 to 0.9 Strong, high correlation
0.9 to 1.0 Very strong correlation
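The categories above can be sketched as a small lookup function. This is only an illustration: the name describe_strength is hypothetical, and because the bands share their boundary values (0.2, 0.4 and so on), the sketch assumes each boundary belongs to the higher band. A negative r is judged by its size, so the sign is dropped first.

```python
def describe_strength(r):
    """Return the rule-of-thumb category for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the strength depends on the size of r, not its direction
    if size < 0.2:
        return "Very weak to negligible correlation"
    if size < 0.4:
        return "Weak, low correlation"
    if size < 0.7:
        return "Moderate correlation"
    if size < 0.9:
        return "Strong, high correlation"
    return "Very strong correlation"


print(describe_strength(0.28))   # Weak, low correlation
print(describe_strength(-0.76))  # Strong, high correlation
```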
One useful rule of thumb for estimating the importance of the r value is to calculate the square of the correlation coefficient, that is, r². Expressed as a percentage, the squared result gives a rough measure of how much of the variation in one variable is accounted for by its linear relationship with the other.
For example, suppose that you have data on school performance and want to test how well children's writing ability at entry to the school predicts their success in GCSE-level work.
Let's say that you find that the correlation between the writing test and the GCSE performance is 0.28.
If we square this value, we get 0.0784, or about 78 in 1000, or 7.8 per cent. On this basis, we can say that about 7.8 per cent of the variation in the learners' GCSE performance is accounted for by the writing skills they had at the very start. You would be forced to conclude that overall school performance is not very dependent upon the writing skills the students have upon entry.
Suppose now we administer another diagnostic test at the very start of the course, this time measuring reading ability. We find that the correlation between the results of the reading test (on entry) and the final GCSE marks on leaving secondary school is 0.76 (r = 0.76). If we square 0.76, we get 0.58, or 58 per cent.
We can then say that about 58 per cent of the variation in the students' success in school is accounted for by the reading skills they possessed (and which we measured) when they started. Bear in mind that this describes how closely the two sets of marks move together; on its own it does not prove that one directly causes the other.
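The arithmetic in the two examples can be checked with a short Python sketch. The helper name variance_explained_percent is made up for illustration; it simply squares r and expresses the result as a percentage rounded to one decimal place.

```python
def variance_explained_percent(r):
    """Square r and express the result as a percentage, rounded to one decimal place."""
    return round(r ** 2 * 100, 1)


# The two correlations quoted in the examples above
for label, r in [("writing test", 0.28), ("reading test", 0.76)]:
    print(f"{label}: r = {r}, r squared = {r ** 2:.4f}, "
          f"about {variance_explained_percent(r)} per cent")
```

This gives 0.0784 (7.8 per cent) for the writing test and 0.5776 (57.8 per cent, rounded to 58 in the text) for the reading test.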
Correlation analysis is a useful tool for investigating all sorts of claims. We are often told that one particular characteristic, skill, or value has a direct and important effect upon another: the number of years spent in education and the expected level of income at 35 years of age, for instance; frequency of smoking and incidence of heart disease; success in a particular test and achievement in another aspect of life; and so on.
Correlation analysis enables us to test such claims and to provide some quantifiable measure to them.
There is an illustration available of Excel's correlation function to show how you can use Excel to calculate these values.
Submitted by ahargrave on Mon, 11/02/2002 - 13:00