A calculus of the absurd

18.2 Systems of linear equations

18.2.1 Gaussian elimination

A system of linear equations is something in the form

\begin{align} & \alpha _{1,1} x_1 + \alpha _{1,2} x_2 + ... + \alpha _{1,n} x_n = b_1 \\ & \alpha _{2,1} x_1 + \alpha _{2,2} x_2 + ... + \alpha _{2,n} x_n = b_2 \\ & ... \\ & \alpha _{m,1} x_1 + \alpha _{m,2} x_2 + ... + \alpha _{m,n} x_n = b_m \\ \end{align}

This is pretty abstract (it describes any system of linear equations). The rather obvious question is what’s what?

A concrete example will probably help

\begin{align} & 12x + y = 12 \\ & x + 14 y = 24 \end{align}

  • • The \(\alpha _{i, j}\) (for all integer \(i\) and \(j\) such that \(1\leqq i \leqq n\)) are the coefficients of the system of linear equations. In the case of the concrete example, \(\alpha _{1, 1} = 12\), \(\alpha _{1, 2} = 1\), \(\alpha _{2, 1} = 1\) and \(\alpha _{2, 2} = 14\). The coefficients are constants and are fixed for the given system.

  • • The \(x_1, x_2, ..., x_n\) are the “variables”. We do not know the values of these (we are trying to solve the system for them) and thus we denote them using letters, as we don’t know which values they are. In the case of the concrete values, \(x_1 = x\) and \(x_1 = y\) (don’t be confused by the fact that \(x_1\) and \(x\) look similar - they’re definitely different variables).

  • • The \(b_1, b_2, ..., b_m\) are the “constant terms”. We are trying to solve for these. They are fixed for the system in question.

How do we solve a system of linear algebraic equations? Well, in the case of the concrete example, we can just try using algebra. Of course, \(2 \times 2\) systems are not the most complex systems to solve (imagine solving a \(10 \times 10\) system!) We would like a systematic approach to solving these equations which always works, and if the equations can’t be solved allows us to easily work out why.

The key idea behind Gaussian elimination is that a single equation in the form \(ax = b\) can be solved relatively easily by dividing through by \(a\) (if \(a = 0\) and \(b \ne 0\) then by definition of equality \(0 \ne 0\) so the equation has no solutions), giving

\begin{equation} x = \frac {b}{a} \end{equation}

If we have two equations, how can we turn them into one such equation? Take the concrete example; currently there are two equations in two variables, and we’d like one equation in one variable. From the system, we should pick one variable (either \(x\) or \(y\)) and choose to eliminate it (this is effectively the approach taught in most secondary school maths curricula).

\begin{align} & 12x + y = 12 \\ & x + 14 y = 24 \end{align}

For this system (the concrete example system), we can eliminate \(x\) from the second system as follows. Take \(12x + y = 12\) and divide it through by \(12\), which gives that

\begin{equation} x + \frac {y}{12} = 1 \end{equation}

Then we can just subtract this equation from both sides of the other equation, which gives that

\begin{equation} x + 14 y - (x + \frac {y}{12}) = 24 - 1 \end{equation}


\begin{equation} y \left (14 - \frac {1}{12}\right ) = 23 \end{equation}

So \(y = \frac {276}{167}\).

The concrete example wasn’t too bad, so let’s try to think about a more general case, that of the system

\begin{align} & ax + by = m \\ & cx + dy = n \end{align}

If we proceed by using the previous method (of solving for \(y\) by eliminating) \(x\) we can unfortunately run into trouble! We have an \(ax\) in the first equation, and ideally we’d like a \(-cx\) (so that we can eliminate \(cx\) from the other equation). We can proceed by multiplying by \(\frac {c}{a}\), which gives that

\begin{align} cx + \frac {bcy}{a} = \frac {cm}{a} \end{align}

Then by subtracting from the second equation we obtain a linear equation which only involves \(y\) (and thus can be relatively easily solved). But what if \(a=0\)? Then \(\frac {c}{a} = \frac {c}{0}\) which is not defined! This case is essentially the one where \(a = 0\) and all the other coefficients (that is \(b, c, d\)) are not equal to \(0\). We can write this system as

\begin{align} & & by & = m \\ & cx + & dy & = n \end{align}

It hopefully seems obvious that we already have a system in terms of \(y\) (that is that \(by = m\)).

There’s another case which is a bit of a pain; if the two equations are multiples of one another (e.g. \(ax + by = c\) and \(2ax + 2by = 2c\)), then we cannot find a unique solution (we can find infinitely many solutions; all the points on the straight line \(y = \frac {1}{b} \left [-ax + c\right ]\)). The case where we cannot find a solution is if we have an equation in the form \(0x + 0b = c\) where \(c \ne 0\), which is clearly false.

Therefore, we can solve a \(2 \times 2\) system in every case (or deduce that it cannot be solved)!

How do we solve a \(3 \times 3\) system of equations? Here’s the core principle underlying Gaussian elimination - we solve a \(2\times 2\) system and use this to solve for the \(3 \times 3\) system. Consider the case

\begin{align} & a_1 x + b_1 y + c_1 z = d_1 \\ & a_2 x + b_2 y + c_2 z = d_2 \\ & a_3 x + b_3 y + c_3 z = d_3 \end{align}

Who doesn’t love a good spoiler? Here’s how we can decompose this \(3 \times 3\) case into a \(2 \times 2\) case:

\begin{equation} \begin{array}[]{c|ccc} a_1 x + & b_1 y + & c_1 z & = d_1 \\ \hline a_2 x + & b_2 y + & c_2 z & = d_2 \\ a_3 x + & b_3 y + & c_3 z & = d_3 \par \end {array} \end{equation}

In the bottom right-hand corner, we have a \(2 \times 2\) system. Or we would, if we could remove \(a_2x\) from the second equation and \(a_3x\) from the third system of equations. How do we do this? Subtraction!

  • • If \(a_1 \ne 0\) then we can just compute \(\frac {a_2}{a_1}\) and \(\frac {a_3}{a_1}\) and subtract the first equation multiplied by each value from the second equation and third equation (respectively).

  • • If \(a_1 = 0\) then we can’t eliminate the other \(x\)s from the equations. What we can instead do, though, is swap two rows. If either of the coefficients is non-zero (i.e. \(a_1 \ne 0\) or \(a_3 \ne 0\)), we just swap that row with the first row (which one doesn’t matter). In the case where \(a_1 = a_2 = a_3 \ne 0\), we already just have an equation in terms of \(y\) and \(z\), so \(x\) is a “free” variable (i.e. \(x\) can be anything).

Then we have a \(2 \times 2\) system in terms of \(y\) and \(z\). After we solve this equation (if it does have a solution) we then have an equation

\begin{equation} a_1x + b_1 y + c_1 y = d_1 \end{equation}

This is the easier case (from earlier), as we know \(y\) and \(z\), and we can therefore compute \(x_1\) in terms of them.

We can solve for arbitrarily large \(n \times n\) systems by applying this principle (reduce it to an \((n-1) \times (n-1)\) system and keep doing this until we obtain a \(1 \times 1\) system, which we can solve and then use this to solve for the other variables).

TODO: most general case