31.1 QR decomposition
31.1.1 Definition and some uses
The QR decomposition is a very powerful matrix factorisation. The idea is that for any matrix $A$ we can find two matrices such that
$$A = QR$$
where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix.
Example 31.1.1
Consider the least squares problem of finding the value of $x$ which minimises $\|Ax - b\|_2$. If we have a QR decomposition for $A$ we can then write
$$\begin{aligned}
\|Ax - b\|_2 &= \|QRx - b\|_2 && (31.2) \\
&= \|Q^T(QRx - b)\|_2 && (31.3) \\
&= \|Rx - Q^T b\|_2 && (31.4)
\end{aligned}$$
where (31.3) holds because multiplying by the orthogonal matrix $Q^T$ preserves the 2-norm. Minimising $\|Ax - b\|_2$ therefore reduces to a problem involving only the triangular matrix $R$, which back substitution handles cheaply.
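To make this concrete, here is a minimal sketch in Python (NumPy/SciPy with made-up example data; `np.linalg.qr`, `solve_triangular` and `np.linalg.lstsq` are standard library calls, everything else is illustrative):

```python
import numpy as np
from scipy.linalg import solve_triangular

# A made-up overdetermined system: 5 equations, 2 unknowns.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
b = rng.standard_normal(5)

# Reduced QR: Q is 5x2 with orthonormal columns, R is 2x2 upper triangular.
Q, R = np.linalg.qr(A)

# The minimiser of ||Ax - b||_2 satisfies Rx = Q^T b, an upper
# triangular system we can solve by back substitution.
x = solve_triangular(R, Q.T @ b)

# Sanity check against NumPy's built-in least squares solver.
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```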
31.1.2 Householder transformations
We already know one method for computing a QR decomposition: Gram-Schmidt. Unfortunately, while Gram-Schmidt is a fantastic theoretical tool, it is very numerically unstable. There are alternative methods which are more suitable for computational applications.
One of these uses Householder transformations. The idea here is that we will apply a series of transformations which zero out everything below the diagonal in our matrix. In algebraic terms we will be carrying out the operation
$$Q_n \cdots Q_2 Q_1 A = R$$
and if all the $Q_i$s are orthogonal this means that we can find
$$A = Q_1^T Q_2^T \cdots Q_n^T R = QR$$
which is a QR decomposition where
$$Q = Q_1^T Q_2^T \cdots Q_n^T$$
because the inverse of an orthogonal matrix is also orthogonal, and the product of orthogonal matrices is also orthogonal.
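To spell out that last point: if $Q_i$ and $Q_j$ are orthogonal then
$$\left(Q_i^T Q_j^T\right)^T \left(Q_i^T Q_j^T\right) = Q_j Q_i Q_i^T Q_j^T = Q_j Q_j^T = I$$
so the product is itself orthogonal, and the same argument extends to the whole product $Q_1^T Q_2^T \cdots Q_n^T$ by induction.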
The idea is that the first transformation, $Q_1$, will carry out the following transformation (in matrix terms):
$$Q_1 \begin{pmatrix} \times & \times & \times \\ \times & \times & \times \\ \times & \times & \times \\ \times & \times & \times \end{pmatrix} = \begin{pmatrix} \times & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \end{pmatrix}$$
The second, $Q_2$, will carry out
$$Q_2 \begin{pmatrix} \times & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \end{pmatrix} = \begin{pmatrix} \times & \times & \times \\ 0 & \times & \times \\ 0 & 0 & \times \\ 0 & 0 & \times \end{pmatrix}$$
and so on, until we get a triangular matrix.
The matrices $Q_2, Q_3, \ldots$ can be interpreted as Householder reflections for smaller matrices. The $k$th transformation should only affect the sub-matrix formed by excluding the first $k-1$ rows and columns. This means that $Q_2$ will look like
$$Q_2 = \begin{pmatrix} 1 & 0 \\ 0 & H \end{pmatrix}$$
where $H$ is a Householder reflection for the smaller square matrix formed by eliminating the first row and column of our original matrix. We can continue to apply this recursively to find the remaining matrices.
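As a sketch of that block structure (the helper name and shapes here are mine, purely for illustration), embedding a smaller reflection $H$ into a full-size $Q_k$ might look like:

```python
import numpy as np

def embed_reflector(H, n):
    """Embed an (n-k) x (n-k) reflection H into an n x n orthogonal
    matrix that leaves the first k rows and columns untouched."""
    k = n - H.shape[0]
    Q = np.eye(n)
    Q[k:, k:] = H  # H only acts on the trailing sub-matrix
    return Q
```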
Let us consider how $Q_1$ will work - in matrix terms we will get something along the lines of
$$Q_1 \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \times \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
so we want $Q_1$ to satisfy two very important constraints:

• $Q_1$ must be orthogonal

• the column vector $Q_1 x$ should have all zero entries (the only exception being the first, which can be anything)
Orthogonal transformations are generally rotations or reflections. In this case we will use reflections. It is also possible to use rotations, as we will see in the next section.
If we consider this diagram (you might want to draw your own as mine is uniquely terrible) we can see that we want
$$Q_1 x = \|x\| e_1;$$
we need to reflect $x$ to $\|x\| e_1$ (rather than just $e_1$ or some other value) so that we preserve the size of the vector (and thus have an orthogonal transformation).
Consider the dotted line, which we can call $H$ (for hyperplane, a reminder that this also works in higher dimensions). The orthogonal projection of $x$ onto $v$ (the normal of $H$) is given by
$$P_v x = \frac{vv^T}{v^T v} x$$
and while we're at it we should define $v$, which is
$$v = \pm\|x\| e_1 - x$$
and if you are wondering why there is a $\pm$ there, more on that later (it's about numerical stability). The important thing is that either sign works. The diagram uses $+$, but we could also use $-$, except that we'd be on the negative $x_1$-axis and my drawing skills aren't that good.

Therefore we have the complementary projector
$$P = I - \frac{vv^T}{v^T v}$$
which projects onto $H$ (the dotted line). This is actually pretty helpful because it gets us half the way to $\|x\| e_1$. To get the rest of the way there we just need to multiply the projection step by $2$, i.e. use
$$Q_1 = I - 2\frac{vv^T}{v^T v} \qquad (31.15)$$
Thus to find each $Q_k$ we have to compute the corresponding $v$, and then use Equation 31.15.
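Putting the pieces together, here is a minimal Householder QR sketch in Python (dense and deliberately unoptimised; a serious implementation would store the $v$ vectors instead of accumulating $Q$ explicitly):

```python
import numpy as np

def householder_qr(A):
    """QR decomposition via Householder reflections."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(min(m, n)):
        x = R[k:, k]
        v = x.copy()
        # Sign chosen to avoid cancellation -- the numerical stability
        # point behind the +/- above. This v is the negation of the
        # text's v for one sign choice; since v enters quadratically
        # in Equation 31.15, the reflector is unchanged.
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            continue  # column is already zero below the diagonal
        v /= norm_v
        # Apply Q_k = I - 2 v v^T (Equation 31.15) to the trailing block.
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, R

# Quick check on a made-up matrix.
A = np.random.default_rng(1).standard_normal((4, 3))
Q, R = householder_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(4))
assert np.allclose(np.tril(R, -1), 0)
```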
31.1.3 Givens rotations
These are like Householder reflections, but more fiddly. Personally I prefer Householder reflections.
The idea with a Givens rotation is that instead of zeroing out all the entries in each column at the same time, we do this one by one. Visually this goes something like
$$\begin{pmatrix} \times & \times & \times \\ \times & \times & \times \\ \times & \times & \times \\ \times & \times & \times \end{pmatrix} \to \begin{pmatrix} \times & \times & \times \\ \times & \times & \times \\ \times & \times & \times \\ 0 & \times & \times \end{pmatrix} \to \begin{pmatrix} \times & \times & \times \\ \times & \times & \times \\ 0 & \times & \times \\ 0 & \times & \times \end{pmatrix} \to \cdots \to \begin{pmatrix} \times & \times & \times \\ 0 & \times & \times \\ 0 & 0 & \times \\ 0 & 0 & 0 \end{pmatrix}$$
Therefore the question is really how we can find a matrix $G$ which satisfies
$$G \begin{pmatrix} \times \\ \times \\ \vdots \\ \times \\ \times \end{pmatrix} = \begin{pmatrix} \times \\ \times \\ \vdots \\ \times \\ 0 \end{pmatrix}$$
This problem looks pretty hard, but we can simplify it by first considering the 2d problem; we seek $G_2$ such that
$$G_2 \begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix}$$
where $r = \sqrt{p^2 + q^2}$ so that $G_2$ is an orthogonal transformation. This is just the linear system
$$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix}$$
where we can assume the special structure for $G_2$ because we want $G_2$ to be a rotation. Therefore we want our solution to satisfy
$$\begin{aligned}
cp + sq &= r && (31.47) \\
-sp + cq &= 0 && (31.48)
\end{aligned}$$
which means that we need
$$c = \frac{p}{\sqrt{p^2 + q^2}}, \qquad s = \frac{q}{\sqrt{p^2 + q^2}}.$$
Here we have to be a bit careful because $p$ and $q$ are squared, so $p^2 + q^2$ can get quite big and is thus susceptible to overflow error. There are good ways to compute this but let's not get distracted.
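For what it's worth, one common trick is to delegate $\sqrt{p^2 + q^2}$ to `hypot`, which is designed to avoid exactly this overflow. A tiny sketch (the function name `givens_cs` is mine):

```python
import math

def givens_cs(p, q):
    """Return (c, s) for the 2d Givens rotation, overflow-safely."""
    r = math.hypot(p, q)  # sqrt(p*p + q*q) without forming p*p + q*q
    if r == 0.0:
        return 1.0, 0.0   # nothing to rotate
    return p / r, q / r
```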
We can generalise this to the $n$-dimensional case. Let $G(i, j, \theta)$ represent the transform for an $n \times n$ matrix. For a given (pun entirely unintended) vector $x$ we know
$$G(i, j, \theta)\, x = y$$
and we are interested in leaving all the basis vectors preserved except the $i$th and $j$th ones, which means we need $G(i, j, \theta)\, e_k = e_k$ where $k \neq i, j$. For the $i$ and $j$ terms we want
$$y_i = c x_i + s x_j = r, \qquad y_j = -s x_i + c x_j = 0 \qquad (31.65)$$
(taking $p = x_i$ and $q = x_j$ in the 2d problem above) so that the transform gives us
$$G(i, j, \theta)\, x = (x_1, \ldots, x_{i-1}, r, x_{i+1}, \ldots, x_{j-1}, 0, x_{j+1}, \ldots, x_n)^T$$
which applies the rotation as required.
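A minimal sketch of the full Givens QR in Python (rather than forming the $n \times n$ matrix $G(i, j, \theta)$, it applies the $2 \times 2$ rotation to the two affected rows in place; all the names here are mine):

```python
import numpy as np

def givens_qr(A):
    """QR decomposition by zeroing sub-diagonal entries one at a time."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for col in range(n):
        for row in range(m - 1, col, -1):
            p, q = R[col, col], R[row, col]
            r = np.hypot(p, q)
            if r == 0.0:
                continue
            c, s = p / r, q / r
            G = np.array([[c, s], [-s, c]])
            # Only rows `col` and `row` of R change...
            R[[col, row], :] = G @ R[[col, row], :]
            # ...and only the matching columns of Q.
            Q[:, [col, row]] = Q[:, [col, row]] @ G.T
    return Q, R

A = np.random.default_rng(2).standard_normal((4, 3))
Q, R = givens_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(np.tril(R, -1), 0)
```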