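We introduced the sum of squares (SS) notation earlier in the course. It will make these formulas much easier to work with.
$$SS(x) = \sum x^2 - \frac{(\sum x)^2}{n}$$
$$SS(y) = \sum y^2 - \frac{(\sum y)^2}{n}$$
$$SS(xy) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
Notice these all follow the same pattern: SS(x) could be written as
$$SS(x) = \sum xx - \frac{(\sum x)(\sum x)}{n}$$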
Also note that
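As a rough illustration of the SS notation, here is a minimal Python sketch. It assumes the usual computational definitions, SS(x) = Σx² − (Σx)²/n and SS(xy) = Σxy − (Σx)(Σy)/n. The function name `ss` and the five data points are made up for the example and are not part of the course material.

```python
def ss(x, y=None):
    """Return SS(x) when y is omitted, otherwise SS(xy).

    Both cases share one pattern: sum of products minus
    (product of sums) / n. SS(x) is simply SS(xx).
    """
    if y is None:
        y = x
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of values")
    n = len(x)
    return sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n


# Hypothetical sample data, chosen only to keep the arithmetic easy.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ss_x = ss(x)      # 55 - 15*15/5
ss_y = ss(y)      # 86 - 20*20/5
ss_xy = ss(x, y)  # 66 - 15*20/5
print(ss_x, ss_y, ss_xy)
```

Writing one function for all three quantities mirrors the observation that they share a single pattern.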
Pearson's Correlation Coefficient
This is a measure of linear correlation. The population parameter is denoted by the Greek letter rho and the sample statistic is denoted by the Roman letter r. Here are some properties of r:
- r only measures the strength of a linear relationship. There are other kinds of relationships besides linear.
- r is always between -1 and 1 inclusive. -1 means perfect negative linear correlation and +1 means perfect positive linear correlation.
- r has the same sign as the slope of the regression (best fit) line
- r does not change if the independent (x) and dependent (y) variables are interchanged
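- r does not change if the scale on either variable is changed. You may add or subtract a value, or multiply or divide by a positive value, for all the x-values or y-values without changing the value of r. (Multiplying or dividing one variable by a negative value reverses the sign of r.)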
- When rho = 0, the test statistic computed from r has a Student's t distribution with n-2 degrees of freedom
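Several of the properties listed above can be checked numerically. The sketch below is only an illustration: the helper `pearson_r` is a hypothetical name, it uses the deviation-from-the-mean form of the formula, and the data values are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample linear correlation coefficient, deviation-from-mean form."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical sample data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)

# Interchanging x and y leaves r unchanged.
r_swapped = pearson_r(y, x)

# Shifting, or rescaling by a positive value, leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [b / 2 - 7 for b in y])

# Multiplying one variable by a negative value reverses the sign of r.
r_flipped = pearson_r(x, [-b for b in y])

# Points that lie exactly on a rising line give r = +1.
r_perfect = pearson_r(x, [3 * a + 1 for a in x])

print(r, r_swapped, r_rescaled, r_flipped, r_perfect)
```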
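Here is the formula for r. Don't worry about it; we won't be finding it this way.
$$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$$
This formula can be simplified through some simple algebra and then some substitutions using the SS notation discussed earlier.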
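If you divide the numerator and denominator by n, then you get something that is hopefully starting to look familiar.
$$r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$$
Each of these values has been seen before in the Sum of Squares notation section. So, the linear correlation coefficient can be written in terms of sums of squares.
$$r = \frac{SS(xy)}{\sqrt{SS(x)\,SS(y)}}$$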
This is the formula that we would be using for calculating the linear correlation coefficient if we were doing it by hand. Luckily for us, the TI-82 has this calculation built into it, and we won't have to do it by hand at all.
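For anyone who wants to see the hand calculation spelled out, here is a small Python sketch of it. It assumes the sum-of-squares form r = SS(xy) / sqrt(SS(x) SS(y)) and compares it with the unsimplified formula. The function names and data are illustrative only.

```python
from math import sqrt


def r_from_ss(x, y):
    """Linear correlation coefficient via the sum-of-squares form."""
    n = len(x)
    ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
    ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return ss_xy / sqrt(ss_x * ss_y)


def r_from_raw_sums(x, y):
    """Same coefficient from the unsimplified formula (before dividing by n)."""
    n = len(x)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    denominator = sqrt(n * sum(a * a for a in x) - sum(x) ** 2) * sqrt(
        n * sum(b * b for b in y) - sum(y) ** 2
    )
    return numerator / denominator


# Hypothetical sample data: SS(x) = 10, SS(y) = 6, SS(xy) = 6.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_ss = r_from_ss(x, y)
r_raw = r_from_raw_sums(x, y)
print(round(r_ss, 4))  # 0.7746
```

Both functions should agree up to floating-point rounding, since one formula is just the other with numerator and denominator divided by n.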
The claim we will be testing is "There is significant linear correlation". The Greek letter for r is rho, so the parameter used for linear correlation is rho.
- H0: rho = 0
- H1: rho ≠ 0
When the null hypothesis is true, the test statistic has a t distribution with n-2 degrees of freedom, and it is given by:
$$t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$$
Now, there are n-2 degrees of freedom this time. This is a difference from before. As an over-simplification, you subtract one degree of freedom for each variable, and since there are 2 variables, the degrees of freedom are n-2.
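This doesn't look like our usual pattern for a test statistic, (observed - expected) / (standard error). But if you consider that the standard error for r is
$$s_r = \sqrt{\frac{1 - r^2}{n - 2}}$$
the formula for the test statistic is
$$t = \frac{r - \rho}{s_r}$$
with rho = 0 under the null hypothesis, which does look like the pattern we're looking for.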
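The test statistic can be sketched in Python as follows. This assumes the standard error of r is sqrt((1 - r²) / (n - 2)) and that the hypothesized value of rho is 0. The function name `correlation_t_test`, the sample values, and the decision step are illustrative assumptions, and the critical value is taken from a t table rather than computed.

```python
from math import sqrt


def correlation_t_test(r, n):
    """Return (t, df, standard_error) for testing H0: rho = 0."""
    if n <= 2:
        raise ValueError("need more than 2 data pairs for n - 2 degrees of freedom")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    standard_error = sqrt((1 - r * r) / df)
    # (observed - expected) / standard error, with expected rho = 0
    t = (r - 0) / standard_error
    return t, df, standard_error


# Hypothetical sample: r = sqrt(0.6) (about 0.7746) from n = 5 data pairs.
r = sqrt(0.6)
t, df, se = correlation_t_test(r, 5)

# Two-tailed critical value for alpha = 0.05 with 3 degrees of freedom,
# read from a t table (not computed here).
critical_value = 3.182
reject_h0 = abs(t) > critical_value

print(round(t, 3), df, reject_h0)
```

With only five data pairs, even a fairly large r falls short of the critical value, so this sample would not show significant linear correlation at the 5% level.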