19.1 Systems of linear equations
19.1.1 Gaussian elimination
A system of linear equations is something in the form
This is pretty abstract (it describes any system of linear equations). The rather obvious question is what’s what?
A concrete example will probably help
The (for all integer and such that ) are the coefficients of the system of linear equations. In the case of the concrete example, , , and . The coefficients are constants and are fixed for the given system.
The are the \sayvariables. We do not know the values of these (we are trying to solve the system for them) and thus we denote them using letters, as we don’t know which values they are. In the case of the concrete values, and (don’t be confused by the fact that and look similar - they’re definitely different variables).
The are the \sayconstant terms. We are trying to solve for these. They are fixed for the system in question.
How do we solve a system of linear algebraic equations? Well, in the case of the concrete example, we can just try using algebra. Of course, systems are not the most complex systems to solve (imagine solving a system!) We would like a systematic approach to solving these equations which always works, and if the equations can’t be solved allows us to easily work out why.
The key idea behind Gaussian elimination is that a single equation in the form can be solved relatively easily by dividing through by (if and then by definition of equality so the equation has no solutions), giving
If we have two equations, how can we turn them into one such equation? Take the concrete example; currently there are two equations in two variables, and we’d like one equation in one variable. From the system, we should pick one variable (either or ) and choose to eliminate it (this is effectively the approach taught in most secondary school maths curricula).
For this system (the concrete example system), we can eliminate from the second system as follows. Take and divide it through by , which gives that
Then we can just subtract this equation from both sides of the other equation, which gives that
The concrete example wasn’t too bad, so let’s try to think about a more general case, that of the system
If we proceed by using the previous method (of solving for by eliminating) we can unfortunately run into trouble! We have an in the first equation, and ideally we’d like a (so that we can eliminate from the other equation). We can proceed by multiplying by , which gives that
Then by subtracting from the second equation we obtain a linear equation which only involves (and thus can be relatively easily solved). But what if ? Then which is not defined! This case is essentially the one where and all the other coefficients (that is ) are not equal to . We can write this system as
It hopefully seems obvious that we already have a system in terms of (that is that ).
There’s another case which is a bit of a pain; if the two equations are multiples of one another (e.g. and ), then we cannot find a unique solution (we can find infinitely many solutions; all the points on the straight line ). The case where we cannot find a solution is if we have an equation in the form where , which is clearly false.
Therefore, we can solve a system in every case (or deduce that it cannot be solved)!
How do we solve a system of equations? Here’s the core principle underlying Gaussian elimination - we solve a system and use this to solve for the system. Consider the case
Who doesn’t love a good spoiler? Here’s how we can decompose this case into a case:
In the bottom right-hand corner, we have a system. Or we would, if we could remove from the second equation and from the third system of equations. How do we do this? Subtraction!
If then we can just compute and and subtract the first equation multiplied by each value from the second equation and third equation (respectively).
If then we can’t eliminate the other s from the equations. What we can instead do, though, is swap two rows. If either of the coefficients is non-zero (i.e. or ), we just swap that row with the first row (which one doesn’t matter). In the case where , we already just have an equation in terms of and , so is a \sayfree variable (i.e. can be anything).
Then we have a system in terms of and . After we solve this equation (if it does have a solution) we then have an equation
This is the easier case (from earlier), as we know and , and we can therefore compute in terms of them.
We can solve for arbitrarily large systems by applying this principle (reduce it to an system and keep doing this until we obtain a system, which we can solve and then use this to solve for the other variables).
It’s worth doing a few examples to really understand the method in full. As always, there are some great questions on https://madasmaths.com.
19.1.2 Elementary matrices
An \sayelementary matrix is a matrix which can be obtained by performing exactly one of the elementary row operations (last seen in Section 19.1.2), once on the identity matrix.
For example, let us consider the case of the identity matrix
If we swap the two rows, we obtain a new matrix
Why the letter ? Because this a permutation matrix - if we apply it to another matrix it will permute (i.e. reorder) the rows of that matrix, for example
We can codify the fact that elementary matrices and elementary row operations are kind of equivalent using this theorem.
If is an elementary matrix (which by definition was obtained by performing one of the three fundamental row operations a single time on ) then when we multiply any matrix by the resulting matrix is the same as the matrix we would obtain by directly performing an elementary row operation to this matrix.
To prove this, we need to consider all the elementary matrices. The first class (perhaps the easiest) to consider, is the elementary matrix which we obtain by swapping two rows of the identity matrix. To prove that multiplying this with another matrix, , gives the same result as just swapping two rows of , we first can first devise a formula for (i.e. a formula for every element in the matrix).
Then, we can consider the result of multiplying by , i.e.11 1 Note: here refers to the number of columns in and rows in
From here we can consider a set of exhaustive cases.
If then we have that if and only if and in all other cases. This means that the sum is equal to , as this is the only non-zero member of the sum (i.e. the jth item of the yth row is now the jth item of the xth row - the positions have been swapped which is what we wanted to show).
We now get the case where for free, as we can apply the exact same logic as for the case by swapping and (i.e. this case is true by symmetry).
In all other cases of we know that if and only if , so when . Therefore, the only non-zero member of the sum is . That is, all the other rows in are equivalent to all the other rows in (they have remained unchanged, which is what would have happened had we just applied the elementary row operations to directly).
This means that the theorem is true for any where we swap two rows.
TODO: prove the other theorems (they can be shown analogously)
19.1.3 LU decomposition
In general, systems of linear equations which are triangular22 2 That is, all the elements in coefficient matrix either above or below the diagonal are all equal to are substantially easier to solve than those which are not (because we can simply use backwards or forwards substitution).
Unfortunately, however, more often than not coefficient matrices do not come in triangular form33 3 As you’ve probably noticed, reality is messy.. It would, however, be nice if we could reduce them to triangular matrices (if you’re thinking \sayof course, we can just use Gaussian elimination (Section 19.1.1), then you are correct)! However, Gaussian elimination is useful for solving an equation in the form
It works very well when we only have one , but sometimes we have more than one we might want to solve for. What if we want to solve the equation for every44 4 One principle which seems to hold in a lot of mathematics is that if you consider enough absurd scenarios, eventually you will discover something interesting (e.g. a principle which can be applied to non-absurd scenarios) . Here’s a neat idea; if we can write (where and are both triangular matrices), then we can write our system equivalently as
And because matrix multiplication is associative, this is the same as
As will be a column vector, if we set , then we can build a system
We can solve this system using forwards or backwards substitution (depending on whether is upper-triangular or lower-triangular). Then, once we know the value of , we can solve a second equation,
Which gives us the value of (which is what we wanted). Note that because of how we defined , .
19.1.4 Performing LU decomposition
This is all very well, but how do we split into (where is lower-triangular and is upper-triangular)? Well, we can obtain an upper-triangular matrix by using Gaussian elimination, but what about the lower-triangular matrix? We can represent the series of elementary row operations which we need to perform in order to write as an upper-triangular matrix as . We also know that these operations have an inverse (the set of matrices which "undo" all the elementary row operations which we have applied)! If we are lucky, we didn’t have to swap any rows, and this inverse is a lower-triangular matrix. If not, then we cannot write where is a lower-triangular matrix, but we can write . Why?
After Gaussian elimination, if we carried out steps represented by the matrix , then we have
|As any matrix multiplied by its inverse is||(19.35)|
|As matrix multiplication is associative||(19.36)|
Now we have an equation in the form
Unfortunately, is not a lower-triangular matrix (we considered the case where it is above), but what we can do is rearrange some of the rows to make it into one, i.e. we have a matrix which swaps rows (a \saytransposition matrix) of a matrix. In this case,
That is, we have
Where and .