

Least Squares and Other Powers
Name: Bill
Status: educator
Age: N/A
Location: N/A
Country: N/A
Date: 6/22/2005
Question:
The method of least squares ensures that the selected
so-called best-fit line is the one such that the sum of the squares of
the y-distances from all the points in the scatter of points to the
best-fit line is less than for any other line of the form chosen to
fit the scatter. Is this just because it is convenient to use
squares? What about 4th powers? The math is very difficult if not
impossible, but would we get a different line? If we did, which line
would be a better fit? It seems arbitrary, even though it is
convenient and probably is good enough for statistical purposes.
Replies:
Least squares has several things to recommend it: it is simple;
it is insensitive to the sign of the difference between a data point
and a fit point; it varies smoothly as the data-fit difference goes
through zero; its sensitivity to data-fit differences does not change
too drastically as those differences grow; and people have studied least
squares more thoroughly than other goodness-of-fit indicators, so a
body of knowledge has grown up around it.
You could also use the absolute magnitude of the data-fit difference,
but this has a kink at zero, so you cannot use the derivative of the
goodness-of-fit measure in your fitting algorithm.
You could also use 4th powers, as you note. In this case you would get a
different best fit, because least-4th-powers penalizes large data-fit
differences more severely than least-squares does.
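To illustrate that difference, here is a rough sketch in Python. The data are made up (the last point is a deliberate outlier), and the brute-force grid search is only a stand-in for a real optimizer; it is not how fitting is done in practice.

```python
# Compare a least-squares line with a least-4th-powers line on a small
# hypothetical data set containing one outlier.

def fit(xs, ys, power, steps=200):
    # Crude grid search over slope a and intercept b in [-10, 10],
    # minimizing sum |y - (a*x + b)|**power.  A sketch only.
    best = None
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            a, b = i / 20.0, j / 20.0
            cost = sum(abs(y - (a * x + b)) ** power
                       for x, y in zip(xs, ys))
            if best is None or cost < best[0]:
                best = (cost, a, b)
    return best[1], best[2]

xs = [0, 1, 2, 3, 4]
ys = [0.1, 1.0, 2.1, 2.9, 8.0]   # last point is an outlier

a2, b2 = fit(xs, ys, 2)   # least squares
a4, b4 = fit(xs, ys, 4)   # least 4th powers
# The two lines generally differ: the 4th-power criterion penalizes the
# outlier's large residual much more severely, so it pulls the fit
# toward that point more strongly than least squares does.
```

Raising the power further exaggerates this effect; in the limit of very large powers the fit approaches the "minimax" line, which cares only about the single worst residual.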
Tim Mooney
"Least squares" is a subset of a more general method of fitting data
to an equation called "regression analysis". It is a concept that is
easier to understand "in principle" than it is to actually derive the
specific appropriate formulas. It does require some understanding of
calculus, however. The method starts from this observation: if there is a set
of data (xi, yi) and a function Y(x), there exist residuals (yi - Y(xi)) which
can be added up: S = SUM |yi - Y(xi)|, where the vertical bars | |
mean the absolute value of the difference. Regression analysis asks the
question: if there are adjustable parameters in the function Y(x), how do
I select values of these adjustable parameters so that the sum of the
residuals S is a minimum? The reason that the absolute difference
is not used often is that the absolute value function has a discontinuous
derivative (it has a "kink" at x = 0), whereas the square of the difference is a
continuous function with continuous derivatives. The values of the
adjustable parameters that minimize the sum are calculated by
taking the partial derivatives of the sum with respect to each of
the adjustable parameters and setting each derivative equal to zero. This
results in "n" algebraic equations in the "n" adjustable
parameters. These are soluble "in principle" even though the actual
equations to be solved can be quite messy. The restrictions on Y(x) are quite
general, and the choice of what to minimize:
SUM |yi - Y(xi)|, or SUM (yi - Y(xi))^2, or SUM (yi - Y(xi))^4, is also quite
general, though any odd power (yi - Y(xi))^n has the same problem as the
plain differences: negative residuals cancel positive ones. Even the
experimental points (xi, yi) may
be weighted differently. None of these options pose difficulties "in
principle", but it may be very involved to find the desired values of the
adjustable parameters. So your question about minimizing SUM (yi - Y(xi))^4
poses no problem "in principle". What it does do is give greater "weight" to
points that differ from the function Y(x), but in some cases that may be
exactly what you want to do.
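As a concrete sketch of the derivative procedure described above: for the straight-line model Y(x) = a*x + b, setting the partial derivatives of SUM (yi - Y(xi))^2 with respect to a and b equal to zero yields two linear "normal equations" in the two parameters, which can be solved directly. (The example data below are made up for illustration.)

```python
# Least-squares line via the normal equations.
# d/da SUM (yi - a*xi - b)^2 = 0  ->  a*SUM(xi^2) + b*SUM(xi) = SUM(xi*yi)
# d/db SUM (yi - a*xi - b)^2 = 0  ->  a*SUM(xi)   + b*n       = SUM(yi)

def least_squares_line(xs, ys):
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Solve the two linear equations above for a and b.
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

a, b = least_squares_line([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0])
# This data lies exactly on y = 2x + 1, so a = 2.0 and b = 1.0.
```

Squaring is what makes this solvable in closed form: the derivative equations are linear in a and b. With 4th powers the same procedure gives cubic equations in the parameters, which is why that fit is messier, though still soluble "in principle".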
Vince Calder
 
Update: June 2012

