A data model explicitly describes a relationship between predictor and response variables. Linear regression fits a data model that is linear in the model coefficients. The most common type of linear regression is a least-squares fit, which can fit both lines and polynomials, among other linear models.
Before you model the relationship between pairs of quantities, it is a good idea to perform correlation analysis to establish if a linear relationship exists between these quantities. Be aware that variables can have nonlinear relationships, which correlation analysis cannot detect. For more information, see Linear Correlation.
The MATLAB® Basic Fitting UI helps you to fit your data, so you can calculate model coefficients and plot the model on top of the data. For an example, see Example: Using Basic Fitting UI. You also can use the MATLAB polyfit and polyval functions to fit your data to a model that is linear in the coefficients. For an example, see Programmatic Fitting.
If you need to fit data with a nonlinear model, transform the variables to make the relationship linear. Alternatively, try to fit a nonlinear function directly using the Statistics and Machine Learning Toolbox™ nlinfit function, the Optimization Toolbox™ lsqcurvefit function, or functions in the Curve Fitting Toolbox™.
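As a sketch of the transformation approach, an exponential relationship y = a·e^(b·x) becomes linear in the coefficients after taking the logarithm of y, so polyfit can recover b and log(a). The data values here are illustrative, not from the documentation:

```matlab
% Hypothetical example: linearize y = a*exp(b*x) by fitting log(y) against x
x = (0:0.5:5)';
y = 2*exp(0.5*x);              % synthetic data with a = 2, b = 0.5
p = polyfit(x, log(y), 1);     % p(1) recovers b, p(2) recovers log(a)
```

Here p(1) is approximately 0.5 and exp(p(2)) is approximately 2, recovering the original parameters.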
This topic explains how to:
Use correlation analysis to determine whether two quantities are related, which justifies fitting the data.
Fit a linear model to the data.
Evaluate the goodness of fit by plotting residuals and looking for patterns.
Calculate the goodness-of-fit measures R² and adjusted R².
Residuals and Goodness of Fit
Residuals are the difference between the observed values of the response (dependent) variable and the values that a model predicts. When you fit a model that is appropriate for your data, the residuals approximate independent random errors. That is, the distribution of residuals ought not to exhibit a discernible pattern.
Producing a fit using a linear model requires minimizing the sum of the squares of the residuals. This minimization yields what is called a least-squares fit. You can gain insight into the "goodness" of a fit by visually examining a plot of the residuals. If the residual plot has a pattern (that is, residual data points do not appear to have a random scatter), the pattern indicates that the model does not properly fit the data.
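A minimal sketch of this check, using synthetic data (the values and noise model are illustrative assumptions, not from the documentation):

```matlab
% Hypothetical example: fit a line, then plot residuals to look for patterns
x = (1:20)';
y = 3*x + 2 + randn(20,1);     % noisy linear data (illustrative)
p = polyfit(x, y, 1);          % first-degree polynomial (linear) fit
resid = y - polyval(p, x);     % residuals: observed minus predicted
plot(x, resid, 'o')
title('Residuals should scatter randomly about zero')
```

For a well-fitting model, the plotted residuals show no trend or curvature; a systematic shape would suggest trying a different model.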
Evaluate each fit you make in the context of your data. For example, if your goal of fitting the data is to extract coefficients that have physical meaning, then it is important that your model reflect the physics of the data. Understanding what your data represents, how it was measured, and how it is modeled is important when evaluating the goodness of fit.
One measure of goodness of fit is the coefficient of determination, or R² (pronounced R-square). This statistic indicates how closely values you obtain from fitting a model match the dependent variable the model is intended to predict. Statisticians often define R² using the residual variance from a fitted model:
R² = 1 − SSresid / SStotal
SSresid is the sum of the squared residuals from the regression. SStotal is the sum of the squared differences from the mean of the dependent variable (total sum of squares). Both are positive scalars.
To learn how to compute R² when you use the Basic Fitting tool, see Derive R², the Coefficient of Determination. To learn more about calculating the R² statistic and its multivariate generalization, continue reading this topic.
Example: Computing R² from Polynomial Fits
You can derive R² from the coefficients of a polynomial regression to determine how much variance in y a linear model explains, as the following example describes:
Create two variables, x and y, from the first two columns of the count variable in the data file count.dat:
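The code for this step is not shown above; a minimal version, assuming count.dat is the example data file that ships with MATLAB, might look like:

```matlab
load count.dat          % loads the matrix count from the data file
x = count(:,1);         % predictor: first column
y = count(:,2);         % response: second column
```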
Use polyfit to compute a linear regression that predicts y from x:
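A sketch of this call (the third argument is the polynomial degree, so 1 requests a linear fit):

```matlab
p = polyfit(x, y, 1)    % p(1) is the slope, p(2) is the intercept
```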
p(1) is the slope and p(2) is the intercept of the linear predictor. You can also obtain regression coefficients using the Basic Fitting UI.
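Continuing the example, one way to apply the R² formula given above to the fitted coefficients is sketched below; the variable names are illustrative. Note that (length(y)−1)*var(y) equals the total sum of squares, because var normalizes by the number of observations minus 1:

```matlab
yfit = polyval(p, x);                 % predicted y at each x
yresid = y - yfit;                    % residuals
SSresid = sum(yresid.^2);             % sum of squared residuals
SStotal = (length(y)-1) * var(y);     % total sum of squares
rsq = 1 - SSresid/SStotal             % coefficient of determination
```

A value of rsq close to 1 indicates that the linear model explains most of the variance in y.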