NEWTON, Ask A Scientist!
Least Squares
Name: Joshua
Status: student	
Age:  N/A
Location: N/A
Country: N/A
Date: N/A


Question:
What is the principle of least squares?



Replies:
Least squares, the method most commonly used in "regression analysis", is a computational procedure for fitting an equation to a set of experimental data points, (Xi,Yi). The criterion of the "best" fit is that the sum of the squares of the differences between each observed value, Yi, and the value calculated by the fitting equation at the same Xi, Ycalc(Xi), i.e.:

SUM [Yi - Ycalc(Xi)]^2

is a minimum. Squaring makes every term non-negative, so positive and negative differences cannot cancel one another. If the fitting equation contains 'j' adjustable parameters, i.e. Ycalc(Xi; a1, a2, ..., aj), then the minimum in the sum of squared differences occurs where:

dSUM/da1 = dSUM/da2 = ... = dSUM/daj = 0

where the quantities dSUM/da1, dSUM/da2, ..., dSUM/daj are the first (partial) derivatives of the sum with respect to the parameters a1, a2, ..., aj.
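For the simplest case, a straight line Ycalc(X) = a1 + a2*X, setting the two derivatives to zero gives two linear equations that can be solved directly. The Python sketch below does this. The function names fit_line and sum_of_squares and the sample data are made up for illustration only.

```python
def fit_line(xs, ys):
    """Return (a1, a2) that minimize SUM [Yi - (a1 + a2*Xi)]^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (X, Y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Solving dSUM/da1 = 0 and dSUM/da2 = 0 for a straight line gives:
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    a2 = sxy / sxx               # slope
    a1 = y_mean - a2 * x_mean    # intercept
    return a1, a2


def sum_of_squares(xs, ys, a1, a2):
    """The quantity being minimized: SUM [Yi - Ycalc(Xi)]^2."""
    return sum((y - (a1 + a2 * x)) ** 2 for x, y in zip(xs, ys))


# Invented sample data, roughly following Y = 2*X
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a1, a2 = fit_line(xs, ys)
best = sum_of_squares(xs, ys, a1, a2)
print(f"Ycalc(X) = {a1:.3f} + {a2:.3f}*X, SUM = {best:.4f}")

# Nudging either parameter away from the fitted value makes SUM larger.
print(sum_of_squares(xs, ys, a1 + 0.1, a2) > best)
print(sum_of_squares(xs, ys, a1, a2 + 0.1) > best)
```

For these numbers the fitted line has an intercept of about 0.05 and a slope of about 1.99.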

This is the "in principle" part. The actual formulas for calculating "the best" values of a1, a2, ..., aj can be messy depending upon the equation chosen for Ycalc(Xi), but don't let that obscure what is happening "in principle". Most data analysis programs, even those on hand held calculators, "crunch" the numbers for you.

There are several things that need to be kept in mind:
1. The SUM of squares of the differences being a minimum is not the only criterion of "best fit"; however, it is the most commonly used one.
2. The individual data points (Xi, Yi) can be given different "weights" if there is reason to expect some of the data to be more reliable than others. This means "counting" some data points more than others.
3. Most importantly, inherent in the procedure is the condition that there is no error in the Xi. That is, the independent variable is assumed to be known exactly. This may or may not be true.
A consequence of this last condition is the following: if you reverse the independent and dependent variables, that is, if you fit (Yi, Xi) instead of (Xi, Yi), you generally don't obtain the same fitting equation, because in the case of (Yi, Xi) the condition is that the Yi's are known exactly, instead of the Xi's.
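Points 2 and 3 can be tried out with a small Python sketch for a straight-line fit. The function fit_line, its optional weights argument, and the data are hypothetical, chosen only to illustrate the ideas above.

```python
def fit_line(xs, ys, weights=None):
    """Return (a1, a2) minimizing SUM Wi*[Yi - (a1 + a2*Xi)]^2.

    With weights=None every point counts equally (Wi = 1).
    """
    if weights is None:
        weights = [1.0] * len(xs)
    sw = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, xs)) / sw
    y_mean = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean)
              for w, x, y in zip(weights, xs, ys))
    a2 = sxy / sxx               # slope
    a1 = y_mean - a2 * x_mean    # intercept
    return a1, a2


# Invented, deliberately scattered data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.5, 5.0, 8.5, 9.0]

# Point 2: counting the last point ten times as much changes the line.
print("unweighted:", fit_line(xs, ys))
print("weighted:  ", fit_line(xs, ys, weights=[1, 1, 1, 1, 10]))

# Point 3: fit Y against X, then X against Y.
a1, a2 = fit_line(xs, ys)   # Y = a1 + a2*X, the Xi taken as exact
b1, b2 = fit_line(ys, xs)   # X = b1 + b2*Y, the Yi taken as exact

# Rearranged to the form Y = ..., the reversed fit has slope 1/b2.
slope_from_reversed = 1.0 / b2
print("slope of Y on X:             ", a2)
print("slope implied by X on Y fit: ", slope_from_reversed)
```

The two slopes coincide only when the data points lie exactly on a straight line; with scatter in the data they differ.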

There are a number of statistical tests that can be applied to determine how good the "fit" is, but that is beyond what can be done here. One word of caution, however: a good "fit" does not mean that X "caused" Y. It only means that X and Y are correlated with one another. Correlation is not causation!

There are many web sites that discuss least squares (or regression) in varying degrees of sophistication. This one: http://ite.pubs.informs.org/Vol1No1/ErkutIngolfsson/ErkutIngolfsson.pdf gives a pretty good explanation as well as a number of other links.

Vince Calder




NEWTON is an electronic community for Science, Math, and Computer Science K-12 Educators, sponsored and operated by Argonne National Laboratory's Educational Programs, Andrew Skipor, Ph.D., Head of Educational Programs.

Update: June 2012