Least Squares and Other Powers

```
Name: Bill
Status: educator
Age: N/A
Location: N/A
Country: N/A
Date: 6/22/2005
```

Question:
The method of least squares ensures that the selected so-called best fit line is the one for which the sum of the squares of the y-distances from all the points in the scatter to the line is less than for any other line of the chosen form. Is this just because it is convenient to use squares? What about 4th powers? The math is very difficult if not impossible, but would we get a different line? If we did, which line would be a better fit? It seems arbitrary, even though it is convenient and probably good enough for statistical purposes.

Replies:
Least squares has several things to recommend it: it is simple; it is insensitive to the sign of the difference between a data point and a fit point; it varies smoothly as the data-fit difference goes through zero; its sensitivity to data-fit differences does not change too drastically as those differences grow; and people have studied least squares more thoroughly than other goodness-of-fit indicators, so a body of knowledge has grown up around it.

You could also use the absolute magnitude of the data-fit difference, but this has a kink at zero, so you cannot use the derivative of the difference in your fitting algorithm. You could also use 4th powers, as you note. In this case you would get a different best fit, because least 4th powers penalizes large data-fit differences more severely than least squares does.

Tim Mooney

"Least squares" is a subset of a more general method of fitting data to an equation called "regression analysis". It is a concept that is easier to understand "in principle" than it is to actually derive the specific appropriate formulas for. It does require some understanding of calculus, however.
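[Editor's note] The point that least 4th powers penalizes large residuals more severely can be illustrated numerically. The sketch below is not from the original exchange: the data set (with one deliberate outlier) and the brute-force grid search are made up for illustration. It fits a line y = m*x + b by minimizing the sum of |residual|^p for p = 2 and p = 4, and shows that the two criteria pick different lines, with the p = 4 line dragged further toward the outlier.

```python
# Illustrative sketch (data and method are this note's own, not the repliers'):
# fit y = m*x + b by minimizing sum(|y_i - (m*x_i + b)|**p) over a grid of
# (m, b) values, for p = 2 (least squares) and p = 4 (least 4th powers).

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 10.0]   # last point is an outlier

def loss(m, b, p):
    """Sum of |y_i - (m*x_i + b)|**p over all data points."""
    return sum(abs(y - (m * x + b)) ** p for x, y in zip(xs, ys))

def grid_fit(p):
    """Crude grid search: minimize loss over m in [0, 3], b in [-3, 1]."""
    best = None
    for i in range(301):                  # m = 0.00, 0.01, ..., 3.00
        m = i * 0.01
        for j in range(401):              # b = -3.00, -2.99, ..., 1.00
            b = -3.0 + j * 0.01
            value = loss(m, b, p)
            if best is None or value < best[0]:
                best = (value, m, b)
    return best[1], best[2]

m2, b2 = grid_fit(2)   # least squares (closed form gives m = 2.2, b = -1.2)
m4, b4 = grid_fit(4)   # least 4th powers: a steeper line, pulled by the outlier
print(f"p=2: y = {m2:.2f}x + {b2:.2f}")
print(f"p=4: y = {m4:.2f}x + {b4:.2f}")
```

The grid search is deliberately crude; it avoids calculus entirely, which is why it works for any power p, kink or no kink. A derivative-based method would be far faster but needs the smoothness that the replies discuss.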
The method asks the following question. Given a set of data points (x_i, y_i) and a function Y(x), there exist residuals y_i - Y(x_i), which can be summed:

S = SUM over i of |y_i - Y(x_i)|,

where the vertical bars | | denote the absolute value of the difference. Regression analysis then asks: if there are adjustable parameters in the function Y(x), how do I select values of these adjustable parameters so that the sum of the residuals S is a minimum?

The reason the direct absolute-value sum is not often used is that the absolute value function has a "kink" at zero: it is continuous, but its derivative is not. The square of the difference, by contrast, is a smooth function with continuous derivatives. The values of the adjustable parameters that result in a minimum sum are calculated by taking the partial derivative of the sum with respect to each of the adjustable parameters and setting each derivative equal to zero. This results in "n" algebraic equations in the "n" adjustable parameters. These are soluble "in principle", even though the actual equations to be solved may be quite messy.

The restrictions on Y(x) are quite general, and so is the choice of what to minimize: S_1 = SUM |y_i - Y(x_i)|, S_2 = SUM (y_i - Y(x_i))^2, or S_4 = SUM (y_i - Y(x_i))^4 are all legitimate. (Any odd power used without the absolute value has the same problem as the plain sum of residuals: negative differences can cancel positive ones.) The experimental points (x_i, y_i) may even be weighted differently. None of these options poses difficulties "in principle", but finding the desired values of the adjustable parameters may be very involved. So your question about minimizing S_4 poses no problem "in principle". What it does do is give greater "weight" to points that differ most from the function Y(x), but in some cases that may be exactly what you want to do.

Vince Calder
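[Editor's note] For the simplest case, Y(x) = m*x + b with two adjustable parameters, the recipe above can be carried out explicitly: setting the two partial derivatives of S_2 to zero gives two linear "normal equations" in m and b, which a 2x2 solve handles directly. The sketch below is an illustration of that step (the data values are made up, chosen to lie exactly on a known line so the answer is checkable).

```python
# Least squares for Y(x) = m*x + b, following the recipe in the reply above:
#   S2(m, b) = sum_i (y_i - m*x_i - b)^2
#   dS2/dm = -2 * sum_i x_i * (y_i - m*x_i - b) = 0
#   dS2/db = -2 * sum_i       (y_i - m*x_i - b) = 0
# Rearranging gives the two "normal equations":
#   m*sum(x^2) + b*sum(x) = sum(x*y)
#   m*sum(x)   + b*n      = sum(y)

def fit_line(xs, ys):
    """Solve the 2x2 normal equations for slope m and intercept b."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # nonzero unless all x_i are equal
    m = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return m, b

# Made-up data lying exactly on y = 3x + 1, so the fit should recover it:
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 4.0, 7.0, 10.0, 13.0]
m, b = fit_line(xs, ys)
print(m, b)  # 3.0 1.0
```

With more adjustable parameters (say a polynomial fit), the same recipe yields n linear equations in n unknowns; this is what "soluble in principle, but messy" refers to.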

NEWTON is an electronic community for Science, Math, and Computer Science K-12 Educators, sponsored and operated by Argonne National Laboratory's Educational Programs, Andrew Skipor, Ph.D., Head of Educational Programs.

For assistance with NEWTON contact a System Operator (help@newton.dep.anl.gov), or at Argonne's Educational Programs