19.6 Inner product spaces
TODO
19.6.1 Inner products
19.6.2 Norms
Induced norms
Every inner product “induces” a norm, which is to say that if we have an inner product, we can define the norm
$\left\lVert\mathbf{x}\right\rVert=\sqrt{\langle\mathbf{x},\mathbf{x}\rangle}.$
To show that this actually is a norm it is necessary and sufficient to show that it satisfies the four norm axioms.

•
Homogeneity. Consider $\left\lVert\alpha\mathbf{x}\right\rVert$, for which
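$\displaystyle\left\lVert\alpha\mathbf{x}\right\rVert$  $\displaystyle=\sqrt{\langle\alpha\mathbf{x},\alpha\mathbf{x}\rangle}$  (19.146)
$\displaystyle=\sqrt{\alpha\langle\mathbf{x},\alpha\mathbf{x}\rangle}$  (19.147)
$\displaystyle=\sqrt{\alpha\overline{\langle\alpha\mathbf{x},\mathbf{x}\rangle}}$  (19.148)
$\displaystyle=\sqrt{\alpha\overline{\alpha}\overline{\langle\mathbf{x},\mathbf{x}\rangle}}$  (19.149)
$\displaystyle=\sqrt{\left\lvert\alpha\right\rvert^{2}\langle\mathbf{x},\mathbf{x}\rangle}$  (19.150)
Here the last step uses $\alpha\overline{\alpha}=\left\lvert\alpha\right\rvert^{2}$ and the fact that $\langle\mathbf{x},\mathbf{x}\rangle$ is real, so it equals its own conjugate. Taking the square root gives $\left\lVert\alpha\mathbf{x}\right\rVert=\left\lvert\alpha\right\rvert\left\lVert\mathbf{x}\right\rVert$, as required.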
•
TODO: other proofs
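As a quick numerical sanity check of homogeneity, here is a minimal sketch. It assumes the standard inner product on $\mathbb{C}^{n}$ (linear in the first argument, conjugate-linear in the second); the names `inner` and `induced_norm` are labels chosen for this sketch and are not part of the text.

```python
import math

def inner(x, y):
    """Standard inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def induced_norm(x):
    """Norm induced by `inner`. <x, x> is real and non-negative, so we keep its real part."""
    return math.sqrt(inner(x, x).real)

x = [1 + 2j, 3 - 1j]
alpha = 2 - 3j
scaled = [alpha * a for a in x]

lhs = induced_norm(scaled)           # ||alpha x||
rhs = abs(alpha) * induced_norm(x)   # |alpha| ||x||
print(math.isclose(lhs, rhs))
```

A check like this cannot replace the proof, but it is a cheap way to catch a slip such as writing $\left\lvert\alpha\right\rvert$ where $\left\lvert\alpha\right\rvert^{2}$ belongs under the square root.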
Norms which are not induced norms
We will pretend these do not exist (but be aware that they do most definitely exist)!!!
19.6.3 Orthogonality
Hopefully you know what perpendicular means. We would like to generalise this notion to a more abstract setting; we can say that
Definition 19.6.1
Let V be an inner product space over a field F. Two vectors are orthogonal if and only if their inner product is zero; that is, if we let $x,y\in\textsf{V}$, then $x$ and $y$ are orthogonal if and only if
$\langle x,y\rangle=0.$
We can denote this as $x\bot y$ (read “$x$ and $y$ are orthogonal”).
This is not the most exciting definition, but it is a very useful one!
Example 19.6.1
Consider the vector space $\textsf{V}=\mathbb{R}^{2}$. The two vectors
$\begin{pmatrix}1\\ 0\end{pmatrix}\quad\text{and}\quad\begin{pmatrix}0\\ 1\end{pmatrix}$
are orthogonal with respect to the dot product.
We can just apply the definition; recall that
$\displaystyle\begin{pmatrix}1\\ 0\end{pmatrix}\cdot\begin{pmatrix}0\\ 1\end{pmatrix}$  $\displaystyle=1\times 0+0\times 1$  (19.157)  
$\displaystyle=0$  (19.158) 
And therefore, the two vectors are orthogonal.
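The same check can be written as a few lines of code. This is a minimal sketch that assumes the dot product on $\mathbb{R}^{n}$ as the inner product; `dot` and `is_orthogonal` are names made up for the illustration, and the tolerance `tol` is an assumption added to cope with floating-point inputs.

```python
def dot(x, y):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))

def is_orthogonal(x, y, tol=1e-12):
    """True when the inner product of x and y is zero (up to the tolerance)."""
    return abs(dot(x, y)) <= tol

e1 = (1, 0)
e2 = (0, 1)
print(is_orthogonal(e1, e2))           # the pair from the example
print(is_orthogonal((1, 1), (1, -1)))  # another orthogonal pair
print(is_orthogonal((1, 1), (1, 0)))   # not orthogonal: the dot product is 1
```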
19.6.4 Some useful properties orthogonal vectors possess.
Theorem 19.6.1
Let $A$ be a set of nonzero pairwise orthogonal vectors (that is, any two distinct vectors in $A$ are orthogonal, as defined in 19.6.1); then this set is linearly independent.
todo: proof
19.6.5 Orthonormal bases
Theorem 19.6.2
Let $V$ be a finite-dimensional inner product space, and let $\beta=\{v_{1},v_{2},...,v_{n}\}$ be an orthogonal basis for this space. Then, (spoiler alert) we know that for all $x\in V$ that
We can prove this as follows. First, as $\beta$ is a basis, we know that for some scalars $a_{1},a_{2},...,a_{n}$
$x=\sum_{1\leqq i\leqq n}a_{i}v_{i}.$
We now want to find the values of $a_{i}$ (for any $i$). We know that for all $j$
$\displaystyle\langle x,v_{j}\rangle$  $\displaystyle=\left\langle\sum_{1\leqq i\leqq n}a_{i}v_{i},v_{j}\right\rangle$  (19.161)  
$\displaystyle=\sum_{1\leqq i\leqq n}a_{i}\langle v_{i},v_{j}\rangle$  (19.162) 
Then as $\beta$ is an orthogonal basis, we know that all the $\langle v_{i},v_{j}\rangle$ terms with $i\neq j$ are zero, so only the $i=j$ term of the sum survives.
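Since only the $i=j$ term survives, the sum collapses to $\langle x,v_{j}\rangle=a_{j}\langle v_{j},v_{j}\rangle$, so $a_{j}=\langle x,v_{j}\rangle/\langle v_{j},v_{j}\rangle$ (which is just $\langle x,v_{j}\rangle$ when the basis is orthonormal). The following is a small sketch of that computation, assuming the real dot product and exact rational arithmetic; `coordinates` is a hypothetical helper name.

```python
from fractions import Fraction

def dot(x, y):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))

def coordinates(x, orthogonal_basis):
    """Coefficients a_j = <x, v_j> / <v_j, v_j> of x in an orthogonal basis."""
    return [Fraction(dot(x, v)) / dot(v, v) for v in orthogonal_basis]

basis = [(1, 1), (1, -1)]   # orthogonal, but not orthonormal
x = (3, 5)
coords = coordinates(x, basis)

# Rebuild x from its coefficients to confirm the expansion.
rebuilt = [sum(a * v[i] for a, v in zip(coords, basis)) for i in range(len(x))]
print(coords, rebuilt)
```

No linear system has to be solved here: each coefficient comes from a single inner product, which is the practical payoff of working with an orthogonal basis.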
19.6.6 Gram-Schmidt orthonormalisation
Gram-Schmidt orthonormalisation provides a useful way to turn a set of linearly independent vectors into a set of orthogonal vectors.
The key idea is that we build our set of vectors inductively. Suppose our set of vectors is $S$ (a finite subset of some vector space $V$). Then we will order our set (doesn’t matter how, any ordering will do) as $s_{1},...,s_{|S|}$ and start to build sets $S^{\prime}_{1}$, $S^{\prime}_{2},...,S^{\prime}_{|S|}$ such that $S^{\prime}_{k}$ is an orthogonal set of $k$ vectors which satisfies
$\operatorname{span}(S^{\prime}_{k})=\operatorname{span}(\{s_{1},...,s_{k}\}).$
Clearly the main thing which is missing here is the step which takes us from $S^{\prime}_{k}$ to $S^{\prime}_{k+1}$. There are a lot of ways to find this step, for example:

•
Consider specific examples of linearly independent vectors in well-known vector spaces (for example $\mathbb{R}^{2}$) and guess the formula (pun entirely unintended) for performing this orthonormalisation process.

•
Try to write a proof for our method and through this try to fill in the actual method.
I will try for the latter, because I think it is an approach which is much more fun. We will start by creating $S^{\prime}_{1}$ by simply selecting the first element of $S$, which is by itself an orthogonal set.
Now, suppose that we have constructed $S^{\prime}_{k}$. We would like to find a way to build $S^{\prime}_{k+1}$. Clearly we should add $s_{k+1}$ to this set, the question of course is how. We need our new vector, say $s^{\prime}_{k+1}$ to be such that
$\displaystyle\langle s^{\prime}_{k+1},s^{\prime}_{j}\rangle=0$  $\displaystyle\forall j,1\leqq j\leqq k$  (19.164) 
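One standard way to satisfy condition (19.164) is to subtract from $s_{k+1}$ its component along each $s^{\prime}_{j}$ already constructed, that is $s^{\prime}_{k+1}=s_{k+1}-\sum_{1\leqq j\leqq k}\frac{\langle s_{k+1},s^{\prime}_{j}\rangle}{\langle s^{\prime}_{j},s^{\prime}_{j}\rangle}s^{\prime}_{j}$. Below is a minimal sketch of this step, assuming the real dot product as the inner product, linearly independent input vectors, and exact rational arithmetic; the function name `gram_schmidt` is a label chosen for this sketch.

```python
from fractions import Fraction

def dot(x, y):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Turn linearly independent vectors into an orthogonal list, one vector at a time."""
    orthogonal = []
    for s in vectors:
        s = [Fraction(c) for c in s]
        new = list(s)
        for q in orthogonal:
            # Remove the component of s along the already-built vector q.
            coeff = dot(s, q) / dot(q, q)
            new = [n - coeff * c for n, c in zip(new, q)]
        orthogonal.append(new)
    return orthogonal

S = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
S_prime = gram_schmidt(S)
for vec in S_prime:
    print(vec)
```

The output is orthogonal rather than orthonormal; dividing each vector by its norm would finish the job, at the cost of leaving exact rational arithmetic because of the square roots. If the inputs were linearly dependent, some `dot(q, q)` would be zero and the division would fail, which is why independence is assumed.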
19.6.7 Orthogonal complement
Definition 19.6.2
Let V be an inner product space, and $E$ be a subspace of V. We define the orthogonal complement of $E$ as the set
$E^{\bot}=\{x\in\textsf{V}:\langle x,e\rangle=0\text{ for all }e\in E\}.$
Theorem 19.6.3
Let $V$ be an inner product space, and $W$ be a subspace of $V$. In this case wouldn’t it be nice if $W\cap W^{\bot}=\{0\}$?
Yes, it would. We can prove separately that $W\cap W^{\bot}\subseteq\{0\}$ and $\{0\}\subseteq W\cap W^{\bot}$.

•
$\{0\}\subseteq W\cap W^{\bot}$. This direction is the easier one, because we know both $W$ and $W^{\bot}$ are subspaces and therefore they both contain at least the $0$ vector.

•
$W\cap W^{\bot}\subseteq\{0\}$. Let us suppose that $x\in W$ and $x\in W^{\bot}$, and suppose for a contradiction that $x\notin\{0\}$, i.e. $x\neq 0$. Because $x\in W^{\bot}$ we know that for all $w\in W$ that $\langle x,w\rangle=0$ (this follows directly from the definition of $W^{\bot}$). As it is also true that $x\in W$, we can take $w=x$, and therefore also
$\langle x,x\rangle=0$  (19.166)
which is true if and only if $x=0$. This contradicts our earlier assumption that $x\neq 0$, and thus this direction is true.
Therefore, the theorem is true.
$\Box$
Theorem 19.6.4 (The very important resolving theorem)
Let $V$ be a vector space, of which $W$ and $W^{\bot}$ are subspaces, then for every vector $y\in V$ there exist unique vectors $u,z$ such that
$y=u+z,\quad u\in W,\quad z\in W^{\bot}.$
Theorem 19.6.5
Let $V$ be a vector space, of which $W$ is a subspace. Then
$\dim(W)+\dim(W^{\bot})=\dim(V).$
This theorem follows mostly from the definitions. First, let $\dim(W)=k$ and $\dim(W^{\bot})=m$, then our goal is to show that $k+m=\dim(V)$.
First, fix a basis $\alpha=\{w_{1},...,w_{k}\}$ for $W$ and a basis for $W^{\bot}$, $\beta=\{w_{k+1},...,w_{k+m}\}$. Then we will prove that $\alpha\cup\beta$ is a basis for $V$.

•
Generating. Let $x\in V$, then $x=v_{1}+v_{2}$ for $v_{1}\in W$ and $v_{2}\in W^{\bot}$ using
19.6.8 Orthogonal projection
If you’ve done any physics (shivers) then you’ve probably come across the idea of “resolving forces”. If you haven’t, then the basic idea is that if we have some vector $v$ in $\mathbb{R}^{2}$, then for example we can split it into two components: a perpendicular one and a parallel one.
One way we can do this (which is quite natural) is to resolve any vector parallel and perpendicular to the “axes”; for example, we could have the vector in the diagram below, which can be resolved into components parallel to $\boldsymbol{\hat{\textbf{i}}}$ and $\boldsymbol{\hat{\textbf{j}}}$ (note that $\boldsymbol{\hat{\textbf{i}}}$ and $\boldsymbol{\hat{\textbf{j}}}$ are orthogonal).
But perpendicular and parallel to what? Usually in secondary school mathematics this is not very well-defined, but we can now use some of our previous definitions to define this notion of “splitting up” a vector and generalise it to vector spaces where we don’t have a ready geometric interpretation.
This definition encodes a lot of the intuitive notions about orthogonality and perpendicularity.
Definition 19.6.3
Let V be a vector space, and $E$ be a subspace of $V$.
We say that w is the orthogonal projection of v onto $E$ if

1.
The vector w is in $E$.

2.
The vector obtained by subtracting w from v is orthogonal to all the vectors in $E$ (which we can write as $\textbf{v}-\textbf{w}\bot E$)
A key property is that the orthogonal projection is the vector in $E$ which is closest to v. This makes the orthogonal projection useful in optimisation problems!
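To make the two conditions of the definition concrete, here is a minimal sketch that computes a projection and checks both of them. It assumes the real dot product, and it assumes that $E$ is handed to us as an orthogonal basis, so that the projection is the sum of the components of v along each basis vector; `project` and `dist_sq` are hypothetical helper names.

```python
from fractions import Fraction

def dot(x, y):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))

def project(v, orthogonal_basis):
    """Orthogonal projection of v onto the span of an orthogonal basis."""
    w = [Fraction(0)] * len(v)
    for e in orthogonal_basis:
        coeff = Fraction(dot(v, e)) / dot(e, e)
        w = [wi + coeff * ei for wi, ei in zip(w, e)]
    return w

def dist_sq(x, y):
    """Squared distance between x and y."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

basis_E = [(1, 1, 0), (0, 0, 1)]   # an orthogonal basis of a plane E in R^3
v = (3, 1, 5)
w = project(v, basis_E)
residual = [a - b for a, b in zip(v, w)]

print(w)                                      # condition 1: w lies in E by construction
print([dot(residual, e) for e in basis_E])    # condition 2: v - w is orthogonal to E
print(dist_sq(v, w), dist_sq(v, (1, 1, 4)))   # w is at least as close to v as another point of E
```

Checking orthogonality against the basis vectors is enough, because a vector orthogonal to every basis vector of $E$ is orthogonal to every linear combination of them.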
Theorem 19.6.6
Let $V$ be a vector space, and $W$ be a subspace of $V$. Let $v\in V$, in which case the orthogonal projection of $v$ onto $W$ minimises the distance between $v$ and $W$ (which we define as the distance between $v$ and the closest vector in $W$).
To prove this, we will need the Pythagorean theorem. Let $v\in V$ and let $w$ be an arbitrary vector in $W$. Then we will define $p$ to be the orthogonal projection of $v$ onto $W$. We can write the squared distance between $v$ and $w$ as $\lVert v-w\rVert^{2}$. Our goal is to show that this is greater than or equal to $\lVert v-p\rVert^{2}$ (note that in general it is always nicer to work with the distance squared, and this is all good and well because distance is never negative). Then we apply the trusty trick of adding zero (i.e. an object and its inverse), in this case $p$, which gives us
$\displaystyle\lVert v-w\rVert^{2}$  $\displaystyle=\lVert v-p+p-w\rVert^{2}$  (19.169) 
Then note that $v-p\in W^{\bot}$ (by the definition of the orthogonal projection) and that $p-w\in W$ (as both $p$ and $w$ are in $W$, which is a subspace). To this, we can apply Pythagoras’ theorem (just when you thought you’d escaped school geometry, it comes back to bite!), which tells us that
$\lVert v-w\rVert^{2}=\lVert v-p\rVert^{2}+\lVert p-w\rVert^{2}\geqq\lVert v-p\rVert^{2},$
where the inequality holds because $\lVert p-w\rVert^{2}\geqq 0$.
Now that we have defined the object, we can ask some questions that (at least to me) it makes sense to ask. For example, we can ask if the orthogonal projection always exists! This seems to be intuitively true, but how do we know? Let us prove this too.
19.6.9 The method of least squares
There’s a fairly intuitive notion that the orthogonal (well, “perpendicular”) line minimises the distance between points.
I had a surprising amount of trouble trying to get a nice little right angle symbol onto the diagram (and ultimately failed), but the line connecting the point to the other line is in fact perpendicular to the line and also the shortest distance.
Usually the question of how to minimise or maximise a quantity requires some analysis (usually differentiation); however, here we are lucky to have a case where we can minimise things using only linear algebra!
The key idea in least squares is that when we have a system of linear equations $Ax=b$, sometimes we cannot find a solution to our system of linear equations (which sucks), but it would be nice to have an approximate solution. That is, our problem is how to find an $x_{0}$ such that
is the next best thing to a solution to $Ax_{0}=b$. How exactly we should define optimal actually does matter (for example, it turns out that in the “real world” error in experiments tends to follow a certain kind of statistical distribution; the normal distribution), but let us be driven by what seems simplest, and say that whatever minimises the “distance” between $Ax_{0}$ and $b$ is best, that is we seek to find $x_{0}$ which minimises
$\lVert Ax_{0}-b\rVert.$
Note that we are essentially trying to find the vector in $R(A)$ (the range of $A$) which is closest to $b$.
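Because the closest vector in $R(A)$ is the orthogonal projection of $b$ onto $R(A)$, the residual $b-Ax_{0}$ must be orthogonal to every column of $A$. Writing that condition out gives the system $A^{T}Ax_{0}=A^{T}b$, usually called the normal equations; this route is one standard technique and is an addition here, not something derived in the text above. The sketch below assumes a real matrix with exactly two linearly independent columns, so the system is $2\times 2$ and can be solved by Cramer's rule; `least_squares_two_columns` is a made-up name.

```python
from fractions import Fraction

def dot(x, y):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))

def least_squares_two_columns(A, b):
    """Minimise ||A x - b|| for a matrix A with two linearly independent columns."""
    c1 = [Fraction(row[0]) for row in A]
    c2 = [Fraction(row[1]) for row in A]
    b = [Fraction(v) for v in b]
    # Entries of A^T A and A^T b (the normal equations, a 2x2 system).
    g11, g12, g22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
    r1, r2 = dot(c1, b), dot(c2, b)
    det = g11 * g22 - g12 * g12   # nonzero when the columns are independent
    x1 = (r1 * g22 - g12 * r2) / det
    x2 = (g11 * r2 - g12 * r1) / det
    return [x1, x2]

# Three equations, two unknowns: no exact solution exists.
A = [(1, 0), (1, 1), (1, 2)]
b = (6, 0, 0)
x0 = least_squares_two_columns(A, b)

Ax0 = [row[0] * x0[0] + row[1] * x0[1] for row in A]
residual = [bi - ai for bi, ai in zip(b, Ax0)]
print(x0, residual)
```

In practice one would normally call a library routine rather than form $A^{T}A$ by hand, but the hand-rolled version makes the link with orthogonal projection visible: the residual it produces is orthogonal to both columns of $A$.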