curvefit.com. Guide to nonlinear regression.




curvefit.com was created by GraphPad Software, Inc. Send comments or questions to the author of these pages, Dr. Harvey Motulsky, president of GraphPad Software.

In April 2003, GraphPad released Prism 4 and published Fitting Models to Biological Data using Linear and Nonlinear Regression. This book includes all the information that comprises curvefit.com, and much more. You can read this book as a pdf file.

How good is the fit?

Sum-of-squares from nonlinear regression

The sum-of-squares (SS) is the sum of the squares of the vertical distances of the points from the curve. Nonlinear regression works by varying the values of the parameters to minimize the sum-of-squares. SS is expressed in the square of the units used for the Y values.
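The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `sum_of_squares` and the data values are hypothetical, and the predicted values are assumed to come from a curve evaluated at the same X values as the data.

```python
def sum_of_squares(y_observed, y_predicted):
    """Sum of the squared vertical distances between the points and the curve."""
    if len(y_observed) != len(y_predicted):
        raise ValueError("y_observed and y_predicted must have the same length")
    return sum((obs - pred) ** 2 for obs, pred in zip(y_observed, y_predicted))


# Hypothetical data: observed Y values, and the curve's Y values at the same X.
y_observed = [2.0, 4.1, 5.9, 8.2]
y_predicted = [2.1, 4.0, 6.0, 8.0]

ss = sum_of_squares(y_observed, y_predicted)
print(f"SS = {ss:.2f} (in squared Y units)")
```

A fitting routine would call a function like this repeatedly, adjusting the parameters until SS stops decreasing.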

If you chose to weight the values and minimize the relative distance squared, Prism reports both the absolute sum-of-squares (defined above) and the relative sum-of-squares, which is minimized.
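The difference between the two quantities can be illustrated with a small sketch. It assumes that the "relative distance" is the residual divided by the observed Y value; some implementations divide by the Y value of the curve instead, so treat this as one possible reading rather than Prism's exact computation. The function names and data are hypothetical.

```python
def absolute_sum_of_squares(y_observed, y_predicted):
    """Unweighted sum of squared residuals."""
    return sum((o - p) ** 2 for o, p in zip(y_observed, y_predicted))


def relative_sum_of_squares(y_observed, y_predicted):
    """Sum of squared relative residuals.

    Assumption: each residual is scaled by the observed Y value.
    """
    return sum(((o - p) / o) ** 2 for o, p in zip(y_observed, y_predicted))


# Hypothetical data spanning two orders of magnitude, each point off by 10%.
y_observed = [10.0, 100.0, 1000.0]
y_predicted = [11.0, 90.0, 1100.0]

abs_ss = absolute_sum_of_squares(y_observed, y_predicted)
rel_ss = relative_sum_of_squares(y_observed, y_predicted)
print(f"absolute SS = {abs_ss:.1f}, relative SS = {rel_ss:.3f}")
```

With these numbers the absolute sum-of-squares is dominated by the point with the largest Y value, while each point contributes equally to the relative sum-of-squares.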

Sy.x

The value sy.x is the standard deviation of the vertical distances of the points from the curve. Since the distances of the points from the curve are called residuals, sy.x is the standard deviation of the residuals. Its value is expressed in the same units as Y. Some programs call this quantity se.

Prism calculates sy.x from the sum-of-squares (SS) and degrees of freedom (df, equal to number of data points minus the number of parameters fit) as:

$$ s_{y.x} = \sqrt{\frac{SS}{df}} $$
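As a minimal sketch of this calculation, assuming the usual form sy.x = sqrt(SS/df), the following Python function takes the sum-of-squares, the number of data points, and the number of fitted parameters. The function name `sy_x` and the example numbers are hypothetical.

```python
import math


def sy_x(ss, n_points, n_parameters):
    """Standard deviation of the residuals, computed as sqrt(SS / df)."""
    df = n_points - n_parameters  # degrees of freedom
    if df <= 0:
        raise ValueError("need more data points than fitted parameters")
    return math.sqrt(ss / df)


# Hypothetical fit: 12 data points, 3 fitted parameters, SS = 36.0.
result = sy_x(36.0, 12, 3)
print(f"sy.x = {result:.2f} (same units as Y)")
```

Because the divisor is df rather than the number of points, fitting more parameters to the same data increases sy.x unless SS drops enough to compensate.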

R2 from nonlinear regression

The value R2 quantifies goodness of fit. It is normally a fraction between 0.0 and 1.0, and has no units. Higher values indicate that the model fits the data better. You can interpret R2 from nonlinear regression very much like you interpret r2 from linear regression. By tradition, statisticians use uppercase (R2) for the results of nonlinear and multiple regression and lowercase (r2) for the results of linear regression, but this is a distinction without a difference.

Tip: Don't make the mistake of using R2 as your main criterion for whether a fit is reasonable. A high R2 tells you that the curve came very close to the points. That doesn't mean the fit is "good" in other ways. The best-fit values of the parameters may have values that make no sense (for example, negative rate constants) or the confidence intervals may be very wide.

When R2 equals 0.0, the best-fit curve fits the data no better than a horizontal line going through the mean of all Y values. In this case, knowing X does not help you predict Y. When R2=1.0, all points lie exactly on the curve with no scatter. If you know X you can calculate Y exactly. You can think of R2 as the fraction of the total variance of Y that is explained by the model (equation).

R2 is computed from the sum of the squares of the distances of the points from the best-fit curve determined by nonlinear regression. This sum-of-squares value is called SSreg, which is in the units of the Y-axis squared. To turn this value into a unitless fraction, it is normalized to the sum of the squares of the distances of the points from a horizontal line through the mean of all Y values. This value is called SStot. If the curve fits the data well, SSreg will be much smaller than SStot.

The figure below illustrates the calculation of R2. Both panels show the same data and best-fit curve. The left panel also shows a horizontal line at the mean of all Y values, and vertical lines showing how far each point is from the mean of all Y values. The sum of the squares of these distances (SStot) equals 62735. The right panel shows the vertical distance of each point from the best-fit curve. The sum of the squares of these distances (SSreg) equals 4165.

R2 is calculated using this equation.

$$ R^2 = 1.0 - \frac{SS_{reg}}{SS_{tot}} $$
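To make the arithmetic concrete, here is a short sketch that applies R2 = 1.0 - SSreg/SStot to the two sums-of-squares quoted for the figure (SSreg = 4165, SStot = 62735). The function name `r_squared` is hypothetical.

```python
def r_squared(ss_reg, ss_tot):
    """R2 = 1.0 - SSreg / SStot.

    ss_reg: sum of squared distances of the points from the best-fit curve.
    ss_tot: sum of squared distances of the points from the mean of Y.
    """
    if ss_tot <= 0:
        raise ValueError("ss_tot must be positive (Y values must not all be equal)")
    return 1.0 - ss_reg / ss_tot


# Values quoted in the text for the example figure.
r2 = r_squared(4165.0, 62735.0)
print(f"R2 = {r2:.4f}")
```

For these values R2 is about 0.93, so the curve leaves only a small part of the total variance of Y unexplained.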

Note that R2 is not really the square of anything. If SSreg is larger than SStot, R2 will be negative. While it is surprising to see something called "squared" have a negative value, it is not impossible (since R2 is not actually the square of R). R2 will be negative when the best-fit curve fits the data worse than a horizontal line at the mean Y value. This could happen if you pick an inappropriate model, or fix a parameter to an inappropriate constant value (for example, if you fix the Hill slope of a dose-response curve to 1.0 when the curve goes downhill).
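A negative R2 is easy to reproduce. The sketch below uses hypothetical data that go downhill and a hypothetical "fitted" curve that is forced to go uphill, standing in for a model with an inappropriately fixed parameter. It assumes R2 = 1.0 - SSreg/SStot as described above.

```python
def r_squared_from_data(y_observed, y_predicted):
    """Compute R2 = 1.0 - SSreg / SStot directly from data and curve values."""
    mean_y = sum(y_observed) / len(y_observed)
    # Squared distances of the points from the curve.
    ss_reg = sum((o - p) ** 2 for o, p in zip(y_observed, y_predicted))
    # Squared distances of the points from a horizontal line at the mean of Y.
    ss_tot = sum((o - mean_y) ** 2 for o in y_observed)
    return 1.0 - ss_reg / ss_tot


# Hypothetical data that decrease with X.
y_observed = [10.0, 8.0, 6.0, 4.0, 2.0]
# Hypothetical curve constrained to increase with X.
y_predicted = [2.0, 4.0, 6.0, 8.0, 10.0]

r2_bad = r_squared_from_data(y_observed, y_predicted)
print(f"R2 = {r2_bad}")
```

Here SSreg (160) is four times SStot (40), so the curve is a worse description of the data than a horizontal line at the mean, and R2 is negative.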

Even if you choose weighted nonlinear regression, Prism still computes R2 using the formula above. In other words, it computes R2 from the unweighted sum-of-squares.

If you want to compare the fit of two equations, don't just compare R2 values. See "Comparing the fits of two models".



All contents copyright © 1999 by GraphPad Software, Inc. All rights reserved.