19.1 Systems of linear equations

19.1.1 Gaussian elimination

A system of linear equations is a collection of equations of the form

\alpha_{1,1}x_{1}+\alpha_{1,2}x_{2}+\dots+\alpha_{1,n}x_{n}=b_{1} (19.1)
\alpha_{2,1}x_{1}+\alpha_{2,2}x_{2}+\dots+\alpha_{2,n}x_{n}=b_{2} (19.2)
\vdots (19.3)
\alpha_{m,1}x_{1}+\alpha_{m,2}x_{2}+\dots+\alpha_{m,n}x_{n}=b_{m} (19.4)

This is pretty abstract (it describes any system of linear equations). The rather obvious question is: what's what?

A concrete example will probably help.

12x+y=12 (19.6)
x+14y=24 (19.7)
  • The $\alpha_{i,j}$ (for all integers $i$ and $j$ such that $1\leqq i\leqq m$ and $1\leqq j\leqq n$) are the coefficients of the system of linear equations. In the case of the concrete example, $\alpha_{1,1}=12$, $\alpha_{1,2}=1$, $\alpha_{2,1}=1$ and $\alpha_{2,2}=14$. The coefficients are constants and are fixed for the given system.

  • The $x_{1},x_{2},\dots,x_{n}$ are the \say{variables}. We do not know the values of these (we are trying to solve the system for them) and thus we denote them using letters. In the case of the concrete example, $x_{1}=x$ and $x_{2}=y$ (don't be confused by the fact that $x_{1}$ and $x$ look similar - they're definitely different variables).

  • The $b_{1},b_{2},\dots,b_{m}$ are the \say{constant terms}. These are known values, fixed for the system in question.

How do we solve a system of linear algebraic equations? Well, in the case of the concrete example, we can just try using algebra. Of course, $2\times 2$ systems are not the most complex systems to solve (imagine solving a $10\times 10$ system!). We would like a systematic approach which always works, and which, if the equations can't be solved, allows us to easily work out why.

The key idea behind Gaussian elimination is that a single equation in the form $ax=b$ can be solved relatively easily by dividing through by $a$ (if $a=0$ and $b\neq 0$ then the equation reads $0=b$ with $b\neq 0$, which is false, so the equation has no solutions), giving

x=\frac{b}{a} (19.8)

If we have two equations, how can we turn them into one such equation? Take the concrete example; currently there are two equations in two variables, and we'd like one equation in one variable. From the system, we should pick one variable (either $x$ or $y$) and choose to eliminate it (this is effectively the approach taught in most secondary school maths curricula).

12x+y=12 (19.9)
x+14y=24 (19.10)

For this system (the concrete example system), we can eliminate $x$ from the second equation as follows. Take $12x+y=12$ and divide it through by $12$, which gives

x+\frac{y}{12}=1 (19.11)

Then we can just subtract this equation from the other equation (left-hand side from left-hand side, right-hand side from right-hand side), which gives

x+14y-\left(x+\frac{y}{12}\right)=24-1 (19.12)

y\left(14-\frac{1}{12}\right)=23 (19.13)

So $y=\frac{276}{167}$.
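The elimination steps above can be checked numerically. Here is a short Python sketch using exact rational arithmetic from the standard library (so no rounding error creeps in); the variable names are this example's own.

```python
from fractions import Fraction

# The concrete system: 12x + y = 12 and x + 14y = 24.
# Step 1: divide the first equation by 12 to get x + y/12 = 1.
# Step 2: subtract it from the second equation to eliminate x.
# Step 3: solve for y, then back-substitute to recover x.
a11, a12, b1 = Fraction(12), Fraction(1), Fraction(12)
a21, a22, b2 = Fraction(1), Fraction(14), Fraction(24)

scaled_a12, scaled_b1 = a12 / a11, b1 / a11   # x + y/12 = 1

y = (b2 - scaled_b1) / (a22 - scaled_a12)     # (14 - 1/12) y = 23
x = (b1 - a12 * y) / a11                      # back-substitution

print(y)   # 276/167, matching the calculation above
print(x)   # 144/167
```

Substituting these values back into both original equations confirms the solution.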

The concrete example wasn't too bad, so let's try to think about a more general case, that of the system

ax+by=m (19.14)
cx+dy=n (19.15)

If we proceed by using the previous method (solving for $y$ by eliminating $x$) we can unfortunately run into trouble! We have an $ax$ in the first equation, and ideally we'd like a $cx$ (so that we can eliminate $cx$ from the other equation). We can proceed by multiplying the first equation by $\frac{c}{a}$, which gives

cx+\frac{bcy}{a}=\frac{cm}{a} (19.16)

Then by subtracting this from the second equation we obtain a linear equation which only involves $y$ (and thus can be relatively easily solved). But what if $a=0$? Then $\frac{c}{a}=\frac{c}{0}$, which is not defined! This case is essentially the one where $a=0$ and all the other coefficients (that is $b,c,d$) are non-zero. We can write this system as

by=m (19.17)
cx+dy=n (19.18)

It hopefully seems obvious that we already have an equation in terms of $y$ alone (namely $by=m$).

There's another case which is a bit of a pain; if the two equations are multiples of one another (e.g. $ax+by=c$ and $2ax+2by=2c$), then we cannot find a unique solution (we instead find infinitely many solutions: all the points on the straight line $y=\frac{1}{b}\left[-ax+c\right]$). The case where we cannot find a solution at all is when we have an equation in the form $0x+0y=c$ where $c\neq 0$, which is clearly false.

Therefore, we can solve a $2\times 2$ system in every case (or deduce that it cannot be solved)!
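This case analysis can be sketched as a small Python function. The function name and the string return values for the degenerate cases are this example's own convention, not standard terminology.

```python
from fractions import Fraction

def solve_2x2(a, b, m, c, d, n):
    """Solve ax + by = m, cx + dy = n by elimination.

    Returns (x, y) when there is a unique solution, or the strings
    "infinite" / "none" in the degenerate cases discussed above.
    """
    if a == 0:
        # Can't divide by a, so swap the two equations first.
        a, b, m, c, d, n = c, d, n, a, b, m
    if a == 0:
        raise ValueError("neither equation mentions x")
    # Multiply the first equation by c/a and subtract it from the
    # second, eliminating x: (d - bc/a) y = n - cm/a.
    d2 = Fraction(d) - Fraction(b) * c / a
    n2 = Fraction(n) - Fraction(m) * c / a
    if d2 == 0:
        return "infinite" if n2 == 0 else "none"
    y = n2 / d2
    x = (Fraction(m) - b * y) / a
    return x, y
```

For instance, `solve_2x2(12, 1, 12, 1, 14, 24)` recovers the solution of the concrete example, while passing two proportional equations returns `"infinite"`.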

How do we solve a $3\times 3$ system of equations? Here's the core principle underlying Gaussian elimination - we reduce the $3\times 3$ system to a $2\times 2$ system, solve that, and use the result to solve the full system. Consider the case

a_{1}x+b_{1}y+c_{1}z=d_{1} (19.19)
a_{2}x+b_{2}y+c_{2}z=d_{2} (19.20)
a_{3}x+b_{3}y+c_{3}z=d_{3} (19.21)

Who doesn't love a good spoiler? Here's how we can decompose this $3\times 3$ case into a $2\times 2$ case:

\begin{array}{c|ccc}a_{1}x+&b_{1}y+&c_{1}z&=d_{1}\\ \hline a_{2}x+&b_{2}y+&c_{2}z&=d_{2}\\ a_{3}x+&b_{3}y+&c_{3}z&=d_{3}\end{array} (19.22)

In the bottom right-hand corner, we have a $2\times 2$ system. Or we would, if we could remove $a_{2}x$ from the second equation and $a_{3}x$ from the third equation. How do we do this? Subtraction!

  • If $a_{1}\neq 0$ then we can just compute $\frac{a_{2}}{a_{1}}$ and $\frac{a_{3}}{a_{1}}$ and subtract the first equation multiplied by each value from the second equation and third equation (respectively).

  • If $a_{1}=0$ then we can't use the first equation to eliminate the other $x$s. What we can instead do, though, is swap two rows. If either of the other coefficients is non-zero (i.e. $a_{2}\neq 0$ or $a_{3}\neq 0$), we just swap that row with the first row (if both are non-zero, which one we pick doesn't matter). In the case where $a_{1}=a_{2}=a_{3}=0$, no equation mentions $x$ at all, so $x$ is a \say{free variable} (i.e. $x$ can be anything).

Then we have a $2\times 2$ system in terms of $y$ and $z$. After we solve this system (if it does have a solution) we then have an equation

a_{1}x+b_{1}y+c_{1}z=d_{1} (19.23)

This is the easier case (from earlier), as we know $y$ and $z$, and we can therefore compute $x$ in terms of them.

We can solve arbitrarily large $n\times n$ systems by applying this principle (reduce the system to an $(n-1)\times(n-1)$ system and keep doing this until we obtain a $1\times 1$ system, which we can solve and then use to solve for the other variables).
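The full $n\times n$ procedure (eliminate, recurse on the smaller system, back-substitute) can be sketched as follows. This is a minimal Python implementation assuming the system has a unique solution; `gaussian_solve` is a name invented for this example.

```python
from fractions import Fraction

def gaussian_solve(A, b):
    """Solve Ax = b by Gaussian elimination with row swaps,
    assuming a unique solution exists (raises ValueError otherwise)."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    b = [Fraction(v) for v in b]
    for k in range(n):
        # If the pivot is zero, swap in a row below with a non-zero entry.
        pivot = next((r for r in range(k, n) if A[r][k] != 0), None)
        if pivot is None:
            raise ValueError("no unique solution")
        A[k], A[pivot] = A[pivot], A[k]
        b[k], b[pivot] = b[pivot], b[k]
        # Eliminate x_k from every row below, leaving an
        # (n-k-1) x (n-k-1) system in the bottom-right corner.
        for r in range(k + 1, n):
            factor = A[r][k] / A[k][k]
            A[r] = [arj - factor * akj for arj, akj in zip(A[r], A[k])]
            b[r] -= factor * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [Fraction(0)] * n
    for k in reversed(range(n)):
        x[k] = (b[k] - sum(A[k][j] * x[j] for j in range(k + 1, n))) / A[k][k]
    return x
```

Running it on the concrete example, `gaussian_solve([[12, 1], [1, 14]], [12, 24])`, reproduces $x=\frac{144}{167}$, $y=\frac{276}{167}$.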

It’s worth doing a few examples to really understand the method in full. As always, there are some great questions on https://madasmaths.com.

19.1.2 Elementary matrices

An \say{elementary matrix} is a matrix which can be obtained by performing exactly one of the elementary row operations (as used in Section 19.1.1), once, on the identity matrix.

For example, let us consider the case of the $2\times 2$ identity matrix

I=\begin{pmatrix}1&0\\ 0&1\end{pmatrix} (19.24)

If we swap the two rows, we obtain a new matrix

P=\begin{pmatrix}0&1\\ 1&0\end{pmatrix} (19.25)

Why the letter $P$? Because this is a $2\times 2$ permutation matrix - if we apply it to another matrix it will permute (i.e. reorder) the rows of that matrix, for example

\begin{pmatrix}0&1\\ 1&0\end{pmatrix}\begin{pmatrix}a&b\\ c&d\end{pmatrix}=\begin{pmatrix}c&d\\ a&b\end{pmatrix} (19.26)
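We can check equation (19.26) with a tiny amount of Python; the naive `matmul` helper is written just for this example, and a numeric matrix stands in for the symbolic one.

```python
def matmul(X, Y):
    """Naive matrix product, enough to check the claim above."""
    return [[sum(X[i][r] * Y[r][j] for r in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

P = [[0, 1],
     [1, 0]]           # the permutation matrix from (19.25)
A = [[1, 2],
     [3, 4]]           # stands in for the symbolic matrix in (19.26)

print(matmul(P, A))    # [[3, 4], [1, 2]] - the rows have been swapped
```

Note that applying $P$ twice restores the original matrix, since swapping the same two rows twice undoes the swap.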

We can codify the fact that elementary matrices and elementary row operations are kind of equivalent using this theorem.

Theorem 19.1.1

If $E$ is an elementary matrix (which by definition was obtained by performing one of the three fundamental row operations a single time on $I$), then for any matrix $A$ the product $EA$ is the same as the matrix we would obtain by performing that elementary row operation directly on $A$.

To prove this, we need to consider all the elementary matrices. The first class (perhaps the easiest) to consider is the elementary matrix obtained by swapping rows $x$ and $y$ of the identity matrix. To prove that multiplying this with another matrix, $A$, gives the same result as just swapping rows $x$ and $y$ of $A$, we first devise a formula for $E_{i,j}$ (i.e. a formula for every element in the matrix).

E_{i,j}=\begin{cases}1&\text{if $i=j$ and $i\neq x$ and $i\neq y$}\\ 1&\text{if ($i=x$ and $j=y$) or ($i=y$ and $j=x$)}\\ 0&\text{otherwise}\end{cases} (19.27)

Then, we can consider the result of multiplying $A$ by $E$, i.e.

(EA)_{i,j}=\sum_{1\leqq r\leqq n}\Big[E_{i,r}A_{r,j}\Big] (19.28)

(Here $n$ refers to the number of columns in $E$ and rows in $A$.)

From here we can consider a set of exhaustive cases.

  • If $i=x$ then we have that $E_{x,r}=1$ if and only if $r=y$, and $E_{x,r}=0$ in all other cases. This means that the sum is equal to $E_{x,y}A_{y,j}=A_{y,j}$, as this is the only non-zero term of the sum (i.e. the $j$th item of the $y$th row is now the $j$th item of the $x$th row - the positions have been swapped, which is what we wanted to show).

  • We now get the case where $i=y$ for free, as we can apply the exact same logic as for the case $i=x$ by swapping $x$ and $y$ (i.e. this case is true by symmetry).

  • In all other cases of $i$ we know that $E_{i,r}=1$ if and only if $r=i$. Therefore, the only non-zero term of the sum is $E_{i,i}A_{i,j}=A_{i,j}$. That is, all the other rows in $EA$ are equal to the corresponding rows in $A$ (they have remained unchanged, which is what would have happened had we applied the row swap to $A$ directly).

This means that the theorem is true for any $E$ obtained by swapping two rows.
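The case analysis above can also be checked computationally: build $E$ directly from formula (19.27) and compare $EA$ with the result of swapping the rows by hand. This sketch uses 0-indexed rows, unlike the 1-indexed notation in the text, and the helper names are invented for the example.

```python
def swap_matrix(n, x, y):
    """The elementary matrix of (19.27): an n x n identity matrix
    whose rows x and y have been swapped (0-indexed here)."""
    return [[1 if ((i == j and i not in (x, y))
                   or (i == x and j == y)
                   or (i == y and j == x)) else 0
             for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Naive matrix product."""
    return [[sum(X[i][r] * Y[r][j] for r in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2], [3, 4], [5, 6]]
E = swap_matrix(3, 0, 2)

# Multiplying by E on the left gives the same result as swapping
# rows 0 and 2 of A directly.
swapped = matmul(E, A)
print(swapped)   # [[5, 6], [3, 4], [1, 2]]
```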

TODO: prove the other theorems (they can be shown analogously)

19.1.3 LU decomposition

In general, systems of linear equations which are triangular (that is, all the elements in the coefficient matrix either above or below the diagonal are equal to 0) are substantially easier to solve than those which are not (because we can simply use backwards or forwards substitution).

Unfortunately, more often than not coefficient matrices do not come in triangular form (as you've probably noticed, reality is messy). It would, however, be nice if we could reduce them to triangular matrices (if you're thinking \say{of course, we can just use Gaussian elimination (Section 19.1.1)}, then you are correct)! However, Gaussian elimination is useful for solving an equation in the form

Ax=b (19.29)

It works very well when we only have one $b$, but sometimes we have more than one $b$ we might want to solve for. What if we want to solve the equation $Ax=b$ for every $b\in\{b_{1},b_{2},\dots,b_{n}\}$? (One principle which seems to hold in a lot of mathematics is that if you consider enough absurd scenarios, eventually you will discover something interesting, e.g. a principle which can be applied to non-absurd scenarios.) Here's a neat idea; if we can write $A=LU$ (where $L$ and $U$ are both triangular matrices), then we can write our system equivalently as

(LU)x=b (19.30)

And because matrix multiplication is associative, this is the same as

L(Ux)=b (19.31)

As $Ux$ will be a column vector, if we set $y=Ux$, then we can build a system

Ly=b (19.32)

We can solve this system using forwards or backwards substitution (depending on whether $L$ is lower-triangular or upper-triangular). Then, once we know the value of $y$, we can solve a second equation,

Ux=y (19.33)

Which gives us the value of $x$ (which is what we wanted). Note that this works precisely because of how we defined $y$: $y=Ux$.
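Assuming $L$ is lower-triangular and $U$ upper-triangular (the usual convention), the two-stage solve can be sketched in Python; the particular $L$, $U$ and $b$ below are a made-up example pair.

```python
from fractions import Fraction

def forward_sub(L, b):
    """Solve Ly = b for lower-triangular L by forward substitution."""
    y = []
    for i in range(len(L)):
        y.append((Fraction(b[i]) - sum(L[i][j] * y[j] for j in range(i)))
                 / L[i][i])
    return y

def back_sub(U, y):
    """Solve Ux = y for upper-triangular U by backward substitution."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (Fraction(y[i]) - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# Example pair (invented for this sketch): here LU equals the
# matrix [[2, 4], [1, 5]], so the two solves below solve Ax = b.
L = [[1, 0], [Fraction(1, 2), 1]]
U = [[2, 4], [0, 3]]
b = [6, 8]

x = back_sub(U, forward_sub(L, b))
print(x)   # [Fraction(-1, 3), Fraction(5, 3)]
```

Each solve costs only $O(n^{2})$ operations, which is why factoring once and reusing $L$ and $U$ for many right-hand sides $b$ pays off.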

19.1.4 Performing LU decomposition

This is all very well, but how do we split $A$ into $A=LU$ (where $L$ is lower-triangular and $U$ is upper-triangular)? Well, we can obtain an upper-triangular matrix by using Gaussian elimination, but what about the lower-triangular matrix? We can represent the series of elementary row operations which we need to perform in order to reduce $A$ to an upper-triangular matrix as $E_{n}E_{n-1}\dots E_{1}$. We also know that these operations have inverses (the matrices which \say{undo} the elementary row operations which we have applied)! If we are lucky, we didn't have to swap any rows, and the product of these inverses is a lower-triangular matrix. If not, then we cannot write $A=LU$ where $L$ is a lower-triangular matrix, but we can write $PA=LU$. Why?

After Gaussian elimination, if we carried out steps represented by the matrix $E$, then we have

A=IA (19.34)
=(E^{-1}E)A \quad\text{as any matrix multiplied by its inverse is } I (19.35)
=E^{-1}(EA) \quad\text{as matrix multiplication is associative} (19.36)

Now we have an equation in the form

A=E^{-1}(EA) (19.37)

Unfortunately, in general $E^{-1}$ is not a lower-triangular matrix (we considered the case where it is above), but what we can do is rearrange some of its rows to make it into one, i.e. we have a matrix $P$ which swaps rows (a \say{permutation matrix}). In this case,

PA=P(E^{-1})(EA) (19.38)

That is, we have

PA=LU (19.39)

where $L=PE^{-1}$ and $U=EA$.
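For the lucky case where no row swaps are needed (so $P=I$), the decomposition can be computed by simply recording the elimination multipliers: the multiplier used to clear each entry below a pivot becomes the corresponding entry of $L$. A minimal Python sketch follows (the function name is invented for this example; handling row swaps would additionally require tracking $P$).

```python
from fractions import Fraction

def lu_decompose(A):
    """Factor A = LU by Gaussian elimination, recording each
    elimination multiplier as an entry of L.  A sketch for the
    case where no zero pivots arise (i.e. P = I)."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(1) if i == j else Fraction(0) for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for r in range(k + 1, n):
            # The multiplier used to eliminate U[r][k] goes into L.
            L[r][k] = U[r][k] / U[k][k]
            U[r] = [urj - L[r][k] * ukj for urj, ukj in zip(U[r], U[k])]
    return L, U

L, U = lu_decompose([[2, 4], [1, 5]])
print(L)   # unit lower-triangular
print(U)   # upper-triangular
```

Multiplying the two factors back together recovers the original matrix, which is the defining property of the decomposition.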