A calculus of the absurd

22.7.9 The method of least squares

There’s a fairly intuitive notion that the orthogonal (well, “perpendicular”) line minimises the distance between a point and a line.

(-tikz- diagram)

I had a surprising amount of trouble trying to get a nice little right-angle symbol onto the diagram (and ultimately failed), but the line connecting the point to the other line is in fact perpendicular to that line, and it realises the shortest distance.

Usually the question of how to minimise or maximise a quantity requires some analysis (usually differentiation); here, however, we are lucky to have a case where we can minimise things using only linear algebra!

The key idea in least squares is that when we have a system of linear equations \(Ax = b\), sometimes we cannot find a solution to the system (which sucks), but it would be nice to have an approximate solution. That is, our problem is how to find an \(x_0\) such that

\begin{equation} Ax_0 \approx b, \end{equation}

i.e. such that \(x_0\) is the next best thing to a solution to \(Ax = b\). How exactly we should define “best” actually does matter (for example, it turns out that in the “real world” error in experiments tends to follow a certain kind of statistical distribution: the normal distribution), but let us be driven by what seems simplest, and say that whatever minimises the “distance” between \(Ax_0\) and \(b\) is best; that is, we seek to find the \(x_0\) which minimises

\begin{equation} \norm{Ax_0 - b}. \end{equation}

Note that we are essentially trying to find the vector in \(R(A)\) (the range of \(A\)) which is closest to \(b\); the intuition from the diagram above suggests that this closest vector is the orthogonal projection of \(b\) onto \(R(A)\).
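To make this concrete, here is a small numerical sketch in Python with NumPy (the matrix \(A\) and vector \(b\) below are made-up illustrative data, not taken from the text). It uses one standard way of computing the minimiser, the so-called normal equations \(A^{T}A x_0 = A^{T}b\), which express exactly the orthogonality idea above: the residual \(Ax_0 - b\) is perpendicular to everything in \(R(A)\).

```python
import numpy as np

# An overdetermined system: three equations, two unknowns, so Ax = b
# has no exact solution. (A and b are made-up data for illustration.)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# The least-squares solution x0 minimises ||Ax - b||. When A has full
# column rank, A^T A is invertible and we can solve the normal
# equations A^T A x0 = A^T b directly.
x0 = np.linalg.solve(A.T @ A, A.T @ b)

# NumPy's built-in least-squares routine finds the same x0.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x0, x_lstsq)

# The residual Ax0 - b is orthogonal to every column of A,
# i.e. orthogonal to R(A), matching the geometric picture.
residual = A @ x0 - b
assert np.allclose(A.T @ residual, 0.0)
```

The assertions check both that the normal-equations answer agrees with `np.linalg.lstsq` and that the residual really is orthogonal to the range of \(A\).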