31.1 QR decomposition

31.1.1 Definition and some uses

The QR decomposition is a very powerful matrix factorisation. The idea is that for any matrix $A\in M_{m\times n}(\mathbb{R})$ we can find two matrices $Q,R$ such that

A=QR (31.1)

where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix.
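As a quick numerical sanity check, here is a minimal sketch using NumPy's built-in np.linalg.qr (with mode="complete" so that $Q$ is the square orthogonal matrix described above):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# mode="complete" returns the square (m x m) orthogonal Q and the m x n upper-triangular R.
Q, R = np.linalg.qr(A, mode="complete")

print(np.allclose(Q @ R, A))              # True: A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))    # True: Q is orthogonal
print(np.allclose(np.tril(R, -1), 0))     # True: R is upper triangular
```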

Example 31.1.1

Consider the least squares problem of finding the value of $x$ which minimises $\left\lVert Ax-b\right\rVert_{2}$. If we have a QR decomposition for $A$ we can then write

\left\lVert Ax-b\right\rVert_{2}=\left\lVert QRx-b\right\rVert_{2} (31.2)
=\left\lVert Q(Rx-Q^{T}b)\right\rVert_{2} (31.3)
=\left\lVert Rx-Q^{T}b\right\rVert_{2} (31.4)

where the second step uses $QQ^{T}=I$ for the square orthogonal $Q$, and the last step uses the fact that multiplying by an orthogonal matrix does not change the 2-norm. Since $R$ is triangular, the minimiser can then be found with a simple triangular solve.
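Here is a minimal NumPy sketch of this idea; the reduced factorisation returned by np.linalg.qr keeps exactly the rows of $R$ (and columns of $Q$) needed for the triangular solve:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))      # an overdetermined system
b = rng.normal(size=6)

# Reduced QR: Q is 6x3 with orthonormal columns, R is 3x3 and upper triangular.
Q, R = np.linalg.qr(A)

# Minimise ||Ax - b||_2 by solving the triangular system R x = Q^T b.
x = np.linalg.solve(R, Q.T @ b)

# The result matches NumPy's own least-squares routine.
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
```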

31.1.2 Householder transformations

We already know one method for computing a QR decomposition: Gram-Schmidt. Unfortunately, while Gram-Schmidt is a fantastic theoretical tool, it is numerically unstable. There are alternative methods which are better suited to computational applications.

One of these uses Householder transformations. The idea here is that we apply a series of transformations which zero out everything below the diagonal of our matrix. In algebraic terms we will be carrying out the operation

T_{n}\dots T_{3}T_{2}T_{1}A=R (31.5)

and if all the $T_{i}$ are orthogonal this means that we can find

A=(T_{1}^{-1}\dots T_{n-1}^{-1}T_{n}^{-1})R (31.6)

which is a QR decomposition where

Q=T_{1}^{-1}\dots T_{n-1}^{-1}T_{n}^{-1} (31.7)

because the inverse of an orthogonal matrix is also orthogonal, and the product of orthogonal matrices is also orthogonal.
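In fact the inverses come for free: for an orthogonal matrix $T_{i}^{-1}=T_{i}^{T}$, so $Q=T_{1}^{T}T_{2}^{T}\cdots T_{n}^{T}$, and the Householder reflections constructed below are also symmetric, so for them $T_{i}^{-1}=T_{i}$.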

The idea is that the first $T$ will carry out the following transformation (in matrix terms):

\begin{pmatrix}\times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\end{pmatrix}\mapsto\begin{pmatrix}\times&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\end{pmatrix} (31.8)

The second $T$ will carry out

\begin{pmatrix}\times&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\end{pmatrix}\mapsto\begin{pmatrix}\times&\times&\times&\times\\ 0&\times&\times&\times\\ 0&0&\times&\times\\ 0&0&\times&\times\\ 0&0&\times&\times\\ 0&0&\times&\times\end{pmatrix} (31.9)

and so on, until we get a triangular matrix.

The matrices $T_{2},T_{3},\dots,T_{n-1}$ can be interpreted as Householder reflections for smaller matrices. The $i$th transformation should only affect the sub-matrix formed by excluding the first $i-1$ rows and columns. This means that $T_{2}$ will look like

\left[\begin{array}{c|c}1&\begin{array}{cccc}0&0&0&0\end{array}\\ \hline\begin{array}{c}0\\ 0\\ 0\\ 0\end{array}&T_{(n-1)\times(n-1)}\end{array}\right] (31.10)

where $T_{(n-1)\times(n-1)}$ is a Householder reflection for the smaller square matrix formed by eliminating the first row and column of our original matrix. We can continue to apply this recursively to find the remaining matrices.

Let us consider how $T_{1}$ will work. In matrix terms we will get something along the lines of

T\begin{pmatrix}A_{1}&A_{2}&\dots&A_{n}\end{pmatrix}=\begin{pmatrix}TA_{1}&TA_{2}&\dots&TA_{n}\end{pmatrix} (31.11)

so we want $T$ to satisfy two very important constraints:

  • be orthogonal

  • the column vector $TA_{1}$ should have all zero entries except possibly the first, which can be anything

Orthogonal transformations are generally rotations or reflections. In this case we will use reflections. It is also possible to use rotations, as we will see in the next section.

If we consider this diagram (you might want to draw your own as mine is uniquely terrible) we can see that we want

TA_{1}=\left\lVert A_{1}\right\rVert e_{1} (31.12)

we need to reflect to $\left\lVert A_{1}\right\rVert e_{1}$ (rather than just $e_{1}$ or some other value) so that we preserve the size of the vector (and thus have an orthogonal transformation).

[Diagram: $A_{1}$ reflected across the dotted hyperplane onto $\lVert A_{1}\rVert e_{1}$, with $v$ the vector joining them.]

Consider the dotted line, which we can call $H$ (for hyperplane, a reminder that this also works in higher dimensions). The matrix which projects orthogonally onto $v$ is

P=\frac{vv^{T}}{\left\lVert v\right\rVert^{2}}

and while we’re at it we should define $v$, which is

v:=\pm\left\lVert A_{1}\right\rVert e_{1}-A_{1} (31.13)

and if you are wondering why there is a $\pm$ there, more on that later (it's about numerical stability). The important thing is that either sign works. The diagram uses $v=\left\lVert A_{1}\right\rVert e_{1}-A_{1}$, but we could also use $-\left\lVert A_{1}\right\rVert e_{1}-A_{1}$, except that we'd be on the negative $e_{1}$-axis and my drawing skills aren't that good.

Therefore we have the complementary projector

T_{1/2}=I-\frac{vv^{T}}{\left\lVert v\right\rVert^{2}} (31.14)

which projects onto $H$ (the dotted line). This is actually pretty helpful because it gets us half the way to $\left\lVert A_{1}\right\rVert e_{1}$. To get the rest of the way there we just need to double the $vv^{T}$ term, i.e. use

T=I-2\frac{vv^{T}}{\left\lVert v\right\rVert^{2}} (31.15)
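As a quick numerical check of Equation 31.15 (a hedged sketch; the vector a below simply plays the role of $A_{1}$):

```python
import numpy as np

a = np.array([3.0, 1.0, 2.0])     # stands in for the column A_1
e1 = np.zeros_like(a)
e1[0] = 1.0

v = np.linalg.norm(a) * e1 - a                        # Equation 31.13 (with the + sign)
T = np.eye(len(a)) - 2.0 * np.outer(v, v) / (v @ v)   # Equation 31.15

print(np.allclose(T @ a, np.linalg.norm(a) * e1))     # True: A_1 is sent to ||A_1|| e_1
print(np.allclose(T @ T.T, np.eye(len(a))))           # True: T is orthogonal
```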

Thus to find $T_{i}$ we have to compute the corresponding $v$, and then use Equation 31.15.
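Putting the pieces together, here is a hedged NumPy sketch of the full procedure (householder_qr is just an illustrative name). At step $i$ we build $v$ from the $i$th column of the trailing submatrix and apply $T_{i}$ without forming it explicitly; the sign of $v$ is chosen to avoid cancellation, which is equivalent, up to an overall sign of $v$, to the $\pm$ in Equation 31.13. Since each reflection is symmetric and orthogonal, $Q$ is accumulated simply by multiplying the reflections back on in order.

```python
import numpy as np

def householder_qr(A):
    """QR decomposition via Householder reflections (illustrative sketch)."""
    R = A.astype(float)          # work on a copy; this becomes the triangular factor
    m, n = R.shape
    Q = np.eye(m)
    for i in range(min(n, m - 1)):
        x = R[i:, i]                                   # i-th column of the trailing block
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])   # the +/- choice, picked to avoid cancellation
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            continue                                   # column is already zero
        v /= norm_v                                    # now T = I - 2 v v^T  (Equation 31.15)
        R[i:, :] -= 2.0 * np.outer(v, v @ R[i:, :])    # apply T_i to the trailing rows of R
        Q[:, i:] -= 2.0 * np.outer(Q[:, i:] @ v, v)    # accumulate Q = T_1 T_2 ... (each T_i is its own inverse)
    return Q, R

A = np.random.default_rng(1).normal(size=(5, 3))
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A))              # True: A = QR
print(np.allclose(np.tril(R, -1), 0))     # True: R is upper triangular
```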

31.1.3 Givens rotations

These are like Householder reflections, but more fiddly. Personally I prefer Householder reflections.

The idea with a Givens rotation is that instead of zeroing out all the entries in each column at the same time, we do this one by one. Visually this goes something like

\begin{pmatrix}\times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\end{pmatrix}\mapsto_{1}\begin{pmatrix}\times&\times&\times&\times\\ 0&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\end{pmatrix} (31.28)
\mapsto_{2}\begin{pmatrix}\times&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\end{pmatrix} (31.35)
\dots (31.36)
\mapsto_{n-1}\begin{pmatrix}\times&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\end{pmatrix} (31.43)

Therefore the question is really how we can find a matrix $T$ which satisfies

T\begin{pmatrix}v_{1}\\ v_{2}\\ \vdots\\ v_{n}\end{pmatrix}=\begin{pmatrix}t_{1}\\ t_{2}\\ \vdots\\ t_{k-1}\\ 0\\ t_{k+1}\\ \vdots\\ t_{n}\end{pmatrix} (31.44)

This problem looks pretty hard, but we can simplify it by first considering the 2d problem; we seek $T_{2}$ such that

T_{2}\begin{pmatrix}v_{1}\\ v_{2}\end{pmatrix}=\begin{pmatrix}r\\ 0\end{pmatrix} (31.45)

where $r=\sqrt{v_{1}^{2}+v_{2}^{2}}$ so that $T_{2}$ is an orthogonal transformation. This is just the linear system

\begin{pmatrix}c&-s\\ s&c\end{pmatrix}\begin{pmatrix}v_{1}\\ v_{2}\end{pmatrix}=\begin{pmatrix}r\\ 0\end{pmatrix} (31.46)

where we can assume the special structure for $T_{2}$ because we want $T_{2}$ to be a rotation. Therefore we want our solution to satisfy

v_{1}c-v_{2}s=r (31.47)
v_{2}c+v_{1}s=0 (31.48)

which means that we need

c=v_{1}/r\text{ and }s=-v_{2}/r. (31.49)

Here we have to be a bit careful: although $r=\sqrt{v_{1}^{2}+v_{2}^{2}}$ may be perfectly representable, the intermediate squares $v_{1}^{2}$ and $v_{2}^{2}$ can be quite big and thus susceptible to overflow. There are good ways to compute this, but let's not get distracted.
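For the curious, here is one such way as a hedged sketch: np.hypot evaluates $\sqrt{v_{1}^{2}+v_{2}^{2}}$ while guarding against the overflow just mentioned, and givens_coeffs is just an illustrative name.

```python
import numpy as np

def givens_coeffs(v1, v2):
    """Return (c, s, r) such that [[c, -s], [s, c]] @ [v1, v2] == [r, 0]."""
    # np.hypot computes sqrt(v1**2 + v2**2) while guarding against overflow.
    r = np.hypot(v1, v2)
    if r == 0.0:
        return 1.0, 0.0, 0.0     # nothing to rotate
    return v1 / r, -v2 / r, r    # Equation 31.49

c, s, r = givens_coeffs(3.0, 4.0)
print(np.allclose(np.array([[c, -s], [s, c]]) @ np.array([3.0, 4.0]), [r, 0.0]))  # True
```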

We can generalise this to the $n$-dimensional case. Let $T_{n}$ represent the transform for an $n\times n$ matrix. For a given (pun entirely unintended) vector, writing $T_{i}$ for the $i$th column of $T_{n}$, we know

T_{n}v=v_{1}T_{1}+v_{2}T_{2}+\dots+v_{n}T_{n} (31.50)

and we are interested in leaving all the entries unchanged except those in positions $k$ and $k+1$ (one of which we want to zero out), which means we need the columns $T_{i}=e_{i}$ for $i\notin\{k,k+1\}$. For the $v_{k}$ and $v_{k+1}$ terms we want

v_{k}\begin{pmatrix}0\\ \vdots\\ c\\ s\\ \vdots\\ 0\end{pmatrix}+v_{k+1}\begin{pmatrix}0\\ \vdots\\ -s\\ c\\ \vdots\\ 0\end{pmatrix} (31.65)

where the $c$ and $s$ entries occupy positions $k$ and $k+1$, so that these two terms together contribute

\begin{pmatrix}0\\ \vdots\\ cv_{k}-sv_{k+1}\\ sv_{k}+cv_{k+1}\\ \vdots\\ 0\end{pmatrix} (31.66)

which is exactly $T_{2}$ applied to $(v_{k},v_{k+1})^{T}$, sitting in positions $k$ and $k+1$; together with the unchanged contributions $v_{i}e_{i}$ from the other columns, this applies the rotation exactly where required.
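Finally, a hedged NumPy sketch of the whole Givens-rotation QR, zeroing out the sub-diagonal entries one at a time exactly as in the pictures above (givens_qr is an illustrative name, not a standard API):

```python
import numpy as np

def givens_qr(A):
    """QR decomposition via Givens rotations (illustrative sketch)."""
    R = A.astype(float)                      # work on a copy; becomes the triangular factor
    m, n = R.shape
    Q = np.eye(m)
    for j in range(n):                       # zero out column j below the diagonal
        for i in range(m - 1, j, -1):        # eliminate R[i, j] against row i-1
            a, b = R[i - 1, j], R[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue                     # entry already zero
            c, s = a / r, -b / r             # as in Equation 31.49
            G = np.array([[c, -s],
                          [s,  c]])
            # Only rows i-1 and i of R are affected by this rotation.
            R[[i - 1, i], :] = G @ R[[i - 1, i], :]
            # Accumulate Q; the inverse of a rotation is its transpose.
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T
    return Q, R

A = np.random.default_rng(2).normal(size=(4, 3))
Q, R = givens_qr(A)
print(np.allclose(Q @ R, A))              # True: A = QR
print(np.allclose(np.tril(R, -1), 0))     # True: R is upper triangular
```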