31.1 QR decomposition

31.1.1 Definition and some uses

The QR decomposition is a very powerful matrix factorisation. The idea is that for any matrix $A\in M_{m\times n}(\mathbb{R})$ we can find two matrices $Q,R$ such that

A=QR

(31.1)

where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix.

Example 31.1.1

Consider the least squares problem of finding the value of $x$ which minimises $||Ax-b||_{2}$ . If we have a $QR$ decomposition for $A$ we can then write

$\displaystyle\left\lVert Ax-b\right\rVert_{2}$	$\displaystyle=\left\lVert QRx-b\right\rVert_{2}$	(31.2)
	$\displaystyle=\left\lVert Q(Rx-Q^{T}b)\right\rVert_{2}$	(31.3)
	$\displaystyle=\left\lVert Rx-Q^{T}b\right\rVert_{2}$	(31.4)

31.1.2 Householder transformations

We already know one method for computing a $QR$ decomposition; Gram-Schmitt. Unfortunately, Gram-Schmitt is a fantastic theoretical tool, but very numerically unstable. There are alternative methods which are more suitable for computational applications.

One of these uses Householder transformations. The idea here is that we will apply a series of transformations which zero out everything in our matrix. In algebraic terms we will be carrying out the operation

T_{n}...T_{3}T_{2}T_{1}A=R

(31.5)

and if all the $T$ s are orthogonal this means that we can find

A=(T_{1}^{-1}...T_{n-1}^{-1}T_{n}^{-1})R

(31.6)

which is a $QR$ decomposition where

Q=T_{1}^{-1}...T_{n-1}^{-1}T_{n}^{-1}

(31.7)

because the inverse of an orthogonal matrix is also orthogonal, and the product of orthogonal matrices is also orthogonal.

The idea is that the first $T$ will carry out the following transformation (in matrix terms).

\begin{pmatrix}\times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\end{pmatrix}\mapsto\begin{pmatrix}\times&\times&% \times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\end{pmatrix}

(31.8)

The second $T$ will carry out

\begin{pmatrix}\times&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\end{pmatrix}\mapsto\begin{pmatrix}\times&\times&\times&% \times\\ 0&\times&\times&\times\\ 0&0&\times&\times\\ 0&0&\times&\times\\ 0&0&\times&\times\\ 0&0&\times&\times\end{pmatrix}

(31.9)

and so on, until we get a triangular matrix.

The matrices, $T_{2},T_{3},...,T_{n-1}$ can be interpreted as Householder reflections for smaller matrices. The $i$ th transformation should only affect the sub-matrix formed by excluding the first $i-1$ rows and columns. This means that $T_{1}$ will look like

\left[\begin{array}[]{c|c}1&\begin{array}[]{cccc}0&0&0&0\end{array}\\ \hline\cr\begin{array}[]{c}0\\ 0\\ 0\\ 0\end{array}&T_{(n-1)\times T_{(}n-1)}\end{array}\right]

(31.10)

where $T_{(n-1)\times(n-1)}$ is a Householder reflection for the smaller square matrix formed by eliminating the first row and column of our original matrix. We can continue to apply this recursively to find the remaining matrices.

Let us consider how $T_{1}$ will work - in matrix terms we will get something along the lines of

T\begin{pmatrix}A_{1}&A_{2}&...&A_{n}\end{pmatrix}=\begin{pmatrix}TA_{1}&TA_{2% }&...&TA_{n}\end{pmatrix}

(31.11)

so we want $T$ to satisfy two very important constraints

•

be orthogonal
•

the column vector $TA_{1}$ should have all zero entries (the only exception being the first, which can be anything)

Orthogonal transformations are generally rotations or reflections. In this case we will use reflections. It is also possible to use rotations, as we will see in the next section.

If we consider this diagram (you might want to draw your own as mine is uniquely terrible) we can see that we want

TA_{1}=\left\lVert A_{1}\right\rVert e_{1}

(31.12)

we need to reflect to $\left\lVert A_{1}\right\rVert e_{1}$ (rather than just $e_{1}$ or some other value) so that we preserve the size of the vector (and thus have an orthogonal transformation).

Consider the dotted line, which we can call $H$ (for hyperplane, a reminder that this also works in higher dimensions). The orthogonal projection of $A_{1}$ onto $v$ is given by

P=\frac{vv^{T}}{\left\lVert v\right\rVert}

and while we’re at it we should define $v$ , which is

v:=\pm\left\lVert A_{1}\right\rVert e_{1}-A_{1}

(31.13)

and if you are wondering why there is a $\pm$ there, more on that later (it’s about numerical stability). The important thing is that either works. The diagram uses $v=\left\lVert A_{1}\right\rVert e_{1}-A_{1}$ , but we could also use $-\left\lVert A_{1}\right\rVert e_{1}-A_{1}$ , except that we’d be on the negative $e_{1}$ -axis and by drawing skills aren’t that good.

Therefore we have the complementary projector

T_{1/2}=I-\frac{vv^{T}}{\left\lVert v\right\rVert}

(31.14)

which projects onto $H$ (the dotted line). This is actually pretty helpful because it gets us half the way to $\left\lVert A_{1}\right\rVert e_{1}$ . To get the rest of the way there we just need to multiply by $2$ , i.e. use

T=I-2\frac{vv^{T}}{\left\lVert v\right\rVert}

(31.15)

Thus to find $T_{i}$ we have to compute the corresponding $v$ , and then use Equation 31.15.

31.1.3 Givens rotations

These are like Householder reflections, but more fiddly. Personally I prefer Householder reflections.

The idea with a Givens rotation is that instead of zeroing out all the entries in each vector at the same time, we do this one by one. Visually this goes something like

$\displaystyle\begin{pmatrix}\times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\end{pmatrix}$	$\displaystyle\mapsto_{1}\begin{pmatrix}\times&\times&\times&\times\\ 0&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\end{pmatrix}$	(31.28)
	$\displaystyle\mapsto_{2}\begin{pmatrix}\times&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\end{pmatrix}$	(31.35)
	$\displaystyle...$	(31.36)
	$\displaystyle\mapsto_{n-1}\begin{pmatrix}\times&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\end{pmatrix}$	(31.43)

Therefore the question is really how we can find a matrix $T$ which satisfies

T\begin{pmatrix}v_{1}\\ v_{2}\\ ...\\ v_{n}\end{pmatrix}=\begin{pmatrix}t_{1}\\ t_{2}\\ ...\\ t_{k-1}\\ 0\\ t_{k+1}\\ ...\\ t_{n}\end{pmatrix}

(31.44)

This problem looks pretty hard, but we can simplify it by first consider the $2$ d problem; we seek $T$ such that

T_{2}\begin{pmatrix}v_{1}\\ v_{2}\end{pmatrix}=\begin{pmatrix}r\\ 0\end{pmatrix}

(31.45)

where $r=\sqrt{v_{1}^{2}+v_{2}^{2}}$ so that $T$ is an orthogonal transformation. This is just the linear system

\begin{pmatrix}c&-s\\ s&c\end{pmatrix}\begin{pmatrix}v_{1}\\ v_{2}\end{pmatrix}=\begin{pmatrix}r\\ 0\end{pmatrix}

(31.46)

where we can assume the special structure for $T$ because we want $T$ to be a rotation. Therefore we want our solution to satisfy

	$\displaystyle v_{1}c-v_{2}s=r$		(31.47)
	$\displaystyle v_{2}c+v_{1}s=0$		(31.48)

which means that we need

c=v_{1}/r\text{ and }s=-v_{2}/r.

(31.49)

Here we have to be a bit careful because $r=\sqrt{v_{1}^{2}+v_{2}^{2}}$ and $r^{2}$ is quite big and thus susceptible to overflow error. There are good ways to compute this but let’s not get distracted.

We can generalise this to the $n$ -dimensional case. Let $T_{n}$ represent the transform for an $n\times n$ matrix. For a given (pun entirely unintended) vector we know

T_{n}v=v_{1}T_{1}+v_{2}T_{2}+...+v_{n}T_{n}

(31.50)

and we are interested in leaving all the vectors unpreserved except the $k$ th one, which means we need $T_{i}=e_{i}$ where $i\notin\{k,k+1\}$ . For the $v_{k}$ and $v_{k+1}$ terms we want

\displaystyle v_{k}\begin{pmatrix}0\\ 0\\ ...\\ c\\ -s\\ ...\\ 0\end{pmatrix}+v_{k+1}\begin{pmatrix}0\\ 0\\ ...\\ s\\ c\\ ...\\ 0\end{pmatrix}

(31.65)

so that the transform gives us

\begin{pmatrix}0\\ 0\\ ...\\ T_{2}v_{k}\\ T_{2}v_{k+1}\\ ...\\ 0\end{pmatrix}

(31.66)

which applies the rotation as required.