19.1 Systems of linear equations

19.1.1 Gaussian elimination

A system of linear equations is something of the form

\alpha_{1,1}x_{1}+\alpha_{1,2}x_{2}+\dots+\alpha_{1,n}x_{n}=b_{1} (19.1)
\alpha_{2,1}x_{1}+\alpha_{2,2}x_{2}+\dots+\alpha_{2,n}x_{n}=b_{2} (19.2)
\vdots (19.3)
\alpha_{m,1}x_{1}+\alpha_{m,2}x_{2}+\dots+\alpha_{m,n}x_{n}=b_{m} (19.4)

This is pretty abstract (it describes any system of linear equations). The rather obvious question is what’s what?

A concrete example will probably help.

12x+y=12 (19.6)
x+14y=24 (19.7)
  • The $\alpha_{i,j}$ (for all integers $i$ and $j$ such that $1\leqq i\leqq m$ and $1\leqq j\leqq n$) are the coefficients of the system of linear equations. In the case of the concrete example, $\alpha_{1,1}=12$, $\alpha_{1,2}=1$, $\alpha_{2,1}=1$ and $\alpha_{2,2}=14$. The coefficients are constants and are fixed for the given system.

  • The $x_{1},x_{2},\dots,x_{n}$ are the \say{variables}. We do not know the values of these (we are trying to solve the system for them), so we denote them using letters. In the case of the concrete example, $x_{1}=x$ and $x_{2}=y$ (don't be confused by the fact that $x_{1}$ and $x$ look similar - they're definitely different variables).

  • The $b_{1},b_{2},\dots,b_{m}$ are the \say{constant terms}. These are known values, fixed for the system in question (we are not solving for them).

How do we solve a system of linear algebraic equations? Well, in the case of the concrete example, we can just try using algebra. Of course, $2\times 2$ systems are not the most complex systems to solve (imagine solving a $10\times 10$ system!). We would like a systematic approach to solving these equations which always works, and which, if the equations can't be solved, allows us to easily work out why.

The key idea behind Gaussian elimination is that a single equation of the form $ax=b$ can be solved relatively easily by dividing through by $a$ (if $a=0$ and $b\neq 0$ then the equation reads $0=b$, which contradicts $b\neq 0$, so the equation has no solutions), giving

x=\frac{b}{a} (19.8)
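
As a minimal sketch (not from the text; the function name solve_single is my own, and the $b=0$ branch is an added note - the text only discusses $a=0$ with $b\neq 0$), the single-equation case looks like this in Python:

def solve_single(a, b):
    """Solve the single equation a*x = b for x."""
    if a != 0:
        return b / a                    # the unique solution x = b/a
    if b == 0:
        return "every x is a solution"  # 0*x = 0 holds for all x
    return "no solution"                # 0*x = b with b != 0 is impossible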

If we have two equations, how can we turn them into one such equation? Take the concrete example; currently there are two equations in two variables, and we'd like one equation in one variable. From the system, we pick one variable (either $x$ or $y$) and eliminate it (this is effectively the approach taught in most secondary school maths curricula).

12x+y=12 (19.9)
x+14y=24 (19.10)

For this system (the concrete example system), we can eliminate $x$ from the second equation as follows. Take $12x+y=12$ and divide it through by $12$, which gives

x+\frac{y}{12}=1 (19.11)

Then we can subtract this equation from the other equation (left-hand side from left-hand side, and right-hand side from right-hand side), which gives

x+14y-\left(x+\frac{y}{12}\right)=24-1 (19.12)

Therefore

y\left(14-\frac{1}{12}\right)=23 (19.13)

So $y=\frac{276}{167}$.
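
As a quick sanity check, here is a short Python sketch (my own) which redoes the elimination above with exact rational arithmetic:

from fractions import Fraction

# The system 12x + y = 12, x + 14y = 24.
# Dividing the first equation by 12 gives x + y/12 = 1;
# subtracting it from the second gives y*(14 - 1/12) = 24 - 1.
y = Fraction(24 - 1) / (14 - Fraction(1, 12))
x = (12 - y) / 12                        # back-substitute into 12x + y = 12

print(y)                                 # 276/167
print(x)                                 # 144/167
print(12*x + y == 12, x + 14*y == 24)    # True True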

The concrete example wasn’t too bad, so let’s try to think about a more general case, that of the system

ax+by=m (19.14)
cx+dy=n (19.15)

If we proceed by using the previous method (of solving for $y$ by eliminating $x$), we can unfortunately run into trouble! We have an $ax$ in the first equation, and ideally we'd like a $cx$ (so that we can cancel the $cx$ in the other equation by subtracting). We can proceed by multiplying the first equation by $\frac{c}{a}$, which gives

cx+\frac{bcy}{a}=\frac{cm}{a} (19.16)

Then by subtracting this from the second equation we obtain a linear equation which only involves $y$ (and thus can be relatively easily solved). But what if $a=0$? Then $\frac{c}{a}=\frac{c}{0}$, which is not defined! Consider the case where $a=0$ and all the other coefficients (that is, $b,c,d$) are non-zero. We can write this system as

by=m (19.17)
cx+dy=n (19.18)

It hopefully seems obvious that we already have an equation in terms of $y$ alone (namely $by=m$), so we can solve for $y$ and then substitute the result into the second equation to find $x$.

There's another case which is a bit of a pain; if the two equations are multiples of one another (e.g. $ax+by=c$ and $2ax+2by=2c$), then we cannot find a unique solution (we can find infinitely many solutions: all the points on the straight line $y=\frac{1}{b}\left[-ax+c\right]$, assuming $b\neq 0$). The case where we cannot find a solution at all is when we have an equation of the form $0x+0y=c$ where $c\neq 0$, which is clearly false.

Therefore, we can solve a $2\times 2$ system in every case (or deduce that it cannot be solved)!
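
The following Python sketch (my own; the function name solve_2x2 is an assumption, not anything standard) walks through the full $2\times 2$ case discussed above, including the row swap when $a=0$ and the degenerate cases:

def solve_2x2(a, b, c, d, m, n):
    """Solve a*x + b*y = m, c*x + d*y = n, or report why we cannot."""
    if a == 0:
        if c == 0:
            # Neither equation involves x, so x is a free variable
            # (or, if the equations in y are inconsistent, there is no solution).
            return "x is free (or the system is inconsistent)"
        # Swap the two equations so that the first pivot is non-zero.
        a, b, m, c, d, n = c, d, n, a, b, m
    # Eliminate x from the second equation.
    factor = c / a
    d2 = d - factor * b
    n2 = n - factor * m
    if d2 == 0:
        return "infinitely many solutions" if n2 == 0 else "no solution"
    y = n2 / d2
    x = (m - b * y) / a
    return x, y

print(solve_2x2(12, 1, 1, 14, 12, 24))   # (x, y) ~ (0.8623, 1.6527), i.e. (144/167, 276/167)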

How do we solve a $3\times 3$ system of equations? Here's the core principle underlying Gaussian elimination - we reduce the $3\times 3$ system to a $2\times 2$ system, solve that, and then use the result to finish solving the $3\times 3$ system. Consider the case

a_{1}x+b_{1}y+c_{1}z=d_{1} (19.19)
a_{2}x+b_{2}y+c_{2}z=d_{2} (19.20)
a_{3}x+b_{3}y+c_{3}z=d_{3} (19.21)

Who doesn't love a good spoiler? Here's how we can decompose this $3\times 3$ case into a $2\times 2$ case:

\begin{array}{c|ccc}
a_{1}x+ & b_{1}y+ & c_{1}z & =d_{1}\\ \hline
a_{2}x+ & b_{2}y+ & c_{2}z & =d_{2}\\
a_{3}x+ & b_{3}y+ & c_{3}z & =d_{3}
\end{array} (19.22)

In the bottom right-hand corner, we have a $2\times 2$ system. Or we would, if we could remove $a_{2}x$ from the second equation and $a_{3}x$ from the third equation. How do we do this? Subtraction!

  • If $a_{1}\neq 0$ then we can just compute $\frac{a_{2}}{a_{1}}$ and $\frac{a_{3}}{a_{1}}$ and subtract the first equation multiplied by each value from the second and third equations (respectively).

  • If $a_{1}=0$ then we can't eliminate the other $x$s from the equations. What we can instead do, though, is swap two rows. If either of the other coefficients is non-zero (i.e. $a_{2}\neq 0$ or $a_{3}\neq 0$), we just swap that row with the first row (which one doesn't matter). In the case where $a_{1}=a_{2}=a_{3}=0$, we already just have equations in terms of $y$ and $z$, so $x$ is a \say{free variable} (i.e. $x$ can be anything).

Then we have a $2\times 2$ system in terms of $y$ and $z$. After we solve this system (if it does have a solution) we are left with the equation

a_{1}x+b_{1}y+c_{1}z=d_{1} (19.23)

This is the easier case (from earlier), as we know $y$ and $z$, and we can therefore compute $x$ in terms of them.

We can solve arbitrarily large $n\times n$ systems by applying this principle (reduce the system to an $(n-1)\times(n-1)$ system and keep doing this until we obtain a $1\times 1$ system, which we can solve and then use to solve for the other variables).
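
A sketch of this procedure in Python (my own, using plain lists of floats rather than any particular matrix library): eliminate the first column (swapping rows if the pivot is zero), which leaves an $(n-1)\times(n-1)$ system, repeat, and then solve the resulting triangular system by back substitution.

def gaussian_eliminate(A, b):
    """Solve the square system A x = b by Gaussian elimination.

    A is a list of rows, b a list of constants; both are modified in place.
    Raises ValueError if no pivot can be found (so there is no unique solution).
    """
    n = len(A)
    for k in range(n):
        # Find a row with a non-zero entry in column k and swap it into place.
        pivot = next((r for r in range(k, n) if A[r][k] != 0), None)
        if pivot is None:
            raise ValueError("no unique solution: no pivot in column %d" % k)
        A[k], A[pivot] = A[pivot], A[k]
        b[k], b[pivot] = b[pivot], b[k]
        # Subtract multiples of row k to clear column k below the diagonal.
        for r in range(k + 1, n):
            factor = A[r][k] / A[k][k]
            A[r] = [A[r][j] - factor * A[k][j] for j in range(n)]
            b[r] -= factor * b[k]
    # Back substitution on the now upper-triangular system.
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (b[k] - sum(A[k][j] * x[j] for j in range(k + 1, n))) / A[k][k]
    return x

print(gaussian_eliminate([[12.0, 1.0], [1.0, 14.0]], [12.0, 24.0]))   # [~0.8623, ~1.6527]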

It’s worth doing a few examples to really understand the method in full. As always, there are some great questions on https://madasmaths.com.

19.1.2 Elementary matrices

An \say{elementary matrix} is a matrix which can be obtained by performing exactly one of the elementary row operations (last seen in Section 19.1.1), once, on the identity matrix.

For example, let us consider the case of the $2\times 2$ identity matrix

I=\begin{pmatrix}1&0\\ 0&1\end{pmatrix} (19.24)

If we swap the two rows, we obtain a new matrix

P=\begin{pmatrix}0&1\\ 1&0\end{pmatrix} (19.25)

Why the letter $P$? Because this is a $2\times 2$ permutation matrix - if we apply it to another matrix it will permute (i.e. reorder) the rows of that matrix, for example

\begin{pmatrix}0&1\\ 1&0\end{pmatrix}\begin{pmatrix}a&b\\ c&d\end{pmatrix}=\begin{pmatrix}c&d\\ a&b\end{pmatrix} (19.26)
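
A small NumPy check of equation (19.26), with concrete numbers standing in for $a,b,c,d$ (an illustrative sketch, not part of the original text):

import numpy as np

P = np.array([[0, 1],
              [1, 0]])
A = np.array([[1, 2],
              [3, 4]])     # stands in for ((a, b), (c, d))

print(P @ A)   # [[3 4], [1 2]] - the rows of A have been swapped
print(A @ P)   # [[2 1], [4 3]] - multiplying on the right permutes columns instead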

We can codify the fact that elementary matrices and elementary row operations are kind of equivalent using this theorem.

Theorem 19.1.1

If $E$ is an elementary matrix (which by definition was obtained by performing one of the three elementary row operations a single time on $I$), then multiplying any matrix $A$ on the left by $E$ gives the same matrix as the one we would obtain by performing that same row operation directly on $A$.

To prove this, we need to consider all the elementary matrices. The first class (perhaps the easiest) to consider is the elementary matrix which we obtain by swapping two rows of the identity matrix - say rows $x$ and $y$. To prove that multiplying this with another matrix, $A$, gives the same result as just swapping the corresponding two rows of $A$, we can first devise a formula for $E_{i,j}$ (i.e. a formula for every element in the matrix).

E_{i,j}=\begin{cases}1&\text{if $i=j$ and $i\neq x$ and $i\neq y$}\\ 1&\text{if ($i=x$ and $j=y$) or ($i=y$ and $j=x$)}\\ 0&\text{otherwise}\end{cases} (19.27)

Then, we can consider the result of multiplying $A$ by $E$, i.e. (note: here $n$ refers to the number of columns in $E$ and rows in $A$)

(EA)_{i,j}=\sum_{1\leqq r\leqq n}\Big[E_{i,r}A_{r,j}\Big] (19.28)

From here we can consider a set of exhaustive cases.

  • If $i=x$ then we have that $E_{x,r}=1$ if and only if $r=y$, and $E_{x,r}=0$ in all other cases. This means that the sum is equal to $E_{x,y}A_{y,j}=A_{y,j}$, as this is the only non-zero member of the sum (i.e. the $j$th item of the $y$th row of $A$ is now the $j$th item of the $x$th row of $EA$ - the rows have been swapped, which is what we wanted to show).

  • We now get the case where $i=y$ for free, as we can apply the exact same logic as for the case $i=x$ by swapping $x$ and $y$ (i.e. this case is true by symmetry).

  • In all other cases of $i$ we know that $E_{i,r}=1$ if and only if $r=i$. Therefore, the only non-zero member of the sum is $E_{i,i}A_{i,j}=A_{i,j}$. That is, all the other rows in $EA$ are the same as the corresponding rows in $A$ (they have remained unchanged, which is what would have happened had we just applied the elementary row operation to $A$ directly).

This means that the theorem is true for any $E$ obtained by swapping two rows.
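
To make the case analysis above concrete, here is a small sketch (my own; swap_matrix is a made-up name, and rows are 0-indexed) which builds $E$ directly from formula (19.27) and checks that $EA$ really is $A$ with the two chosen rows swapped:

import numpy as np

def swap_matrix(n, x, y):
    """Elementary matrix swapping rows x and y of an n x n identity, per (19.27)."""
    E = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j and i != x and i != y:
                E[i, j] = 1
            elif (i == x and j == y) or (i == y and j == x):
                E[i, j] = 1
    return E

A = np.arange(9).reshape(3, 3)
E = swap_matrix(3, 0, 2)
swapped = A.copy()
swapped[[0, 2]] = swapped[[2, 0]]         # swap rows 0 and 2 of A directly
print(np.array_equal(E @ A, swapped))     # True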

TODO: prove the theorem for the other two kinds of elementary matrix (they can be shown analogously)

19.1.3 LU decomposition

In general, systems of linear equations which are triangular (that is, where all the elements of the coefficient matrix either above or below the diagonal are equal to 0) are substantially easier to solve than those which are not (because we can simply use backwards or forwards substitution).

Unfortunately, more often than not coefficient matrices do not come in triangular form (as you've probably noticed, reality is messy). It would, however, be nice if we could reduce them to triangular matrices (if you're thinking \say{of course, we can just use Gaussian elimination} (Section 19.1.1), then you are correct)! However, Gaussian elimination is useful for solving an equation in the form

Ax=b (19.29)

It works very well when we only have one $b$, but sometimes we have more than one $b$ we might want to solve for. What if we want to solve the equation $Ax=b$ for every $b\in\{b_{1},b_{2},\dots,b_{n}\}$? (One principle which seems to hold in a lot of mathematics is that if you consider enough absurd scenarios, eventually you will discover something interesting, e.g. a principle which can be applied to non-absurd scenarios.) Here's a neat idea: if we can write $A=LU$ (where $L$ and $U$ are both triangular matrices), then we can write our system equivalently as

(LU)x=b (19.30)

And because matrix multiplication is associative, this is the same as

L(Ux)=b (19.31)

As $Ux$ will be a column vector, if we set $y=Ux$, then we can build a system

Ly=b (19.32)

We can solve this system using forwards or backwards substitution (forwards if $L$ is lower-triangular, backwards if it is upper-triangular). Then, once we know the value of $y$, we can solve a second equation,

Ux=y (19.33)

which gives us the value of $x$ (which is what we wanted). Note that because of how we defined $y$, we have $y=Ux$, so this $x$ really does satisfy the original equation: $Ax=L(Ux)=Ly=b$.
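
A sketch of the two-stage solve in Python (my own; it assumes $L$ is lower-triangular and $U$ is upper-triangular, both with non-zero diagonal entries):

def forward_substitute(L, b):
    """Solve L y = b where L is lower-triangular."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    return y

def backward_substitute(U, y):
    """Solve U x = y where U is upper-triangular."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

def lu_solve(L, U, b):
    """Solve (L U) x = b by solving L y = b and then U x = y."""
    return backward_substitute(U, forward_substitute(L, b))

# The concrete example from earlier, factored as A = LU with A = [[12, 1], [1, 14]].
L = [[1.0, 0.0], [1/12, 1.0]]
U = [[12.0, 1.0], [0.0, 14 - 1/12]]
print(lu_solve(L, U, [12.0, 24.0]))   # [~0.8623, ~1.6527], i.e. (144/167, 276/167)

Note that the expensive part (the factorisation) does not involve $b$ at all, which is why the same $L$ and $U$ can be reused for every right-hand side.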

19.1.4 Performing LU decomposition

This is all very well, but how do we split $A$ into $LU=A$ (where $L$ is lower-triangular and $U$ is upper-triangular)? Well, we can obtain an upper-triangular matrix by using Gaussian elimination, but what about the lower-triangular matrix? We can represent the series of elementary row operations which we need to perform in order to reduce $A$ to an upper-triangular matrix as $E_{n}E_{n-1}\dots E_{1}$. We also know that these operations have an inverse (the set of matrices which "undo" all the elementary row operations which we have applied)! If we are lucky, we didn't have to swap any rows, and this inverse is a lower-triangular matrix. If not, then we cannot write $A=LU$ where $L$ is a lower-triangular matrix, but we can write $PA=LU$. Why?

After Gaussian elimination, if we carried out steps represented by the matrix $E$ (that is, $E=E_{n}E_{n-1}\dots E_{1}$), then we have

A=IA (19.34)
 =(E^{-1}E)A (as any matrix multiplied by its inverse is $I$) (19.35)
 =E^{-1}(EA) (as matrix multiplication is associative) (19.36)

Now we have an equation in the form

A=E^{-1}(EA) (19.37)

Unfortunately, $E^{-1}$ is not necessarily a lower-triangular matrix (we considered the case where it is above), but what we can do is rearrange some of its rows to make it into one, i.e. we have a matrix $P$ which swaps rows of a matrix (a permutation matrix). In this case,

PA=(PE^{-1})(EA) (19.38)

That is, we have

PA=LU (19.39)

where $L=PE^{-1}$ and $U=EA$.
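
Here is a rough Python sketch (my own, not a production routine) of computing such a decomposition: it records the elimination multipliers in $L$, the row swaps in $P$, and leaves the eliminated matrix in $U$, so that $PA=LU$. In practice one would usually call a library routine such as scipy.linalg.lu instead.

import numpy as np

def lu_decompose(A):
    """Return P, L, U with P @ A == L @ U (partial pivoting)."""
    A = A.astype(float)
    n = A.shape[0]
    P = np.eye(n)
    L = np.eye(n)
    U = A.copy()
    for k in range(n - 1):
        # Swap in the row with the largest entry in column k (partial pivoting).
        pivot = k + np.argmax(np.abs(U[k:, k]))
        if U[pivot, k] == 0:
            continue                               # nothing to eliminate in this column
        if pivot != k:
            U[[k, pivot]] = U[[pivot, k]]
            P[[k, pivot]] = P[[pivot, k]]
            L[[k, pivot], :k] = L[[pivot, k], :k]  # keep earlier multipliers aligned
        # Record the multipliers in L and eliminate below the pivot in U.
        for r in range(k + 1, n):
            L[r, k] = U[r, k] / U[k, k]
            U[r, k:] -= L[r, k] * U[k, k:]
    return P, L, U

A = np.array([[0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0],
              [2.0, 1.0, 1.0]])
P, L, U = lu_decompose(A)
print(np.allclose(P @ A, L @ U))   # True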