19.6 Inner product spaces

TODO

19.6.1 Inner products

19.6.2 Norms

Induced norms

Every inner product \say{induces} a norm, which is to say that if we have an inner product, we can define the norm

\left\lVert\mathbf{x}\right\rVert=\sqrt{\langle\mathbf{x},\mathbf{x}\rangle} (19.145)

To show that this actually is a norm, it suffices to show that it satisfies the four norm axioms.

  • Homogeneity. Consider \left\lVert\alpha\mathbf{x}\right\rVert, for which

    \left\lVert\alpha\mathbf{x}\right\rVert =\sqrt{\langle\alpha\mathbf{x},\alpha\mathbf{x}\rangle} (19.146)
    =\sqrt{\alpha\langle\mathbf{x},\alpha\mathbf{x}\rangle} (19.147)
    =\sqrt{\alpha\overline{\langle\alpha\mathbf{x},\mathbf{x}\rangle}} (19.148)
    =\sqrt{\alpha\overline{\alpha}\,\overline{\langle\mathbf{x},\mathbf{x}\rangle}} (19.149)
    =\sqrt{\left\lvert\alpha\right\rvert^{2}\langle\mathbf{x},\mathbf{x}\rangle} (19.150)
    =\left\lvert\alpha\right\rvert\left\lVert\mathbf{x}\right\rVert,

    where we have used conjugate symmetry, the fact that \alpha\overline{\alpha}=\left\lvert\alpha\right\rvert^{2}, and the fact that \langle\mathbf{x},\mathbf{x}\rangle is real (so it equals its own conjugate).
  • TODO: other proofs
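As a quick sanity check on the induced norm (19.145) and the homogeneity property derived above, here is a small numerical sketch in Python with NumPy; the weighted inner product on \mathbb{R}^{3} is made up purely for illustration.

    import numpy as np

    # A made-up weighted inner product on R^3: <x, y> = sum_i w_i * x_i * y_i
    # (any strictly positive weights give a valid inner product).
    w = np.array([1.0, 2.0, 3.0])

    def inner(x, y):
        return np.sum(w * x * y)

    def induced_norm(x):
        # The induced norm of equation (19.145): ||x|| = sqrt(<x, x>)
        return np.sqrt(inner(x, x))

    x = np.array([1.0, -2.0, 0.5])
    alpha = -3.7

    # Homogeneity: ||alpha x|| should equal |alpha| ||x||
    print(np.isclose(induced_norm(alpha * x), abs(alpha) * induced_norm(x)))  # True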

Norms which are not induced norms

We will pretend these do not exist (but be aware that they most definitely do)!

19.6.3 Orthogonality

Hopefully you know what perpendicular means. We would like to generalise this notion to a more abstract setting; we can say that

Definition 19.6.1

Let V be an inner product space over a field F. Then two vectors are orthogonal if and only if their inner product is zero; that is, if we let x,y\in\textsf{V}

\langle x,y\rangle=0\iff\text{$x$ and $y$ are orthogonal} (19.151)

We can denote this as x\bot y (read \say{$x$ and $y$ are orthogonal}).

This is not the most exciting definition, but it is a very useful one!

Example 19.6.1

Consider the vector space \textsf{V}=\mathbb{R}^{2}. The two vectors

\begin{pmatrix}1\\ 0\end{pmatrix},\begin{pmatrix}0\\ 1\end{pmatrix} (19.152)

are orthogonal with respect to the dot product.

We can just apply the definition; recall that

\begin{pmatrix}1\\ 0\end{pmatrix}\cdot\begin{pmatrix}0\\ 1\end{pmatrix} =1\times 0+0\times 1 (19.157)
=0 (19.158)

And therefore, the two vectors are orthogonal.
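The same check can be carried out numerically; the sketch below (Python with NumPy) also tests a second, made-up pair of vectors that happens to be orthogonal with respect to the dot product.

    import numpy as np

    # The two basis vectors from the example above
    e1 = np.array([1.0, 0.0])
    e2 = np.array([0.0, 1.0])
    print(np.dot(e1, e2))  # 0.0, so e1 and e2 are orthogonal

    # A less obvious (made-up) pair which is also orthogonal
    u = np.array([1.0, 2.0])
    v = np.array([-2.0, 1.0])
    print(np.dot(u, v))    # 1*(-2) + 2*1 = 0.0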

19.6.4 Some useful properties orthogonal vectors possess

Theorem 19.6.1

Let A be a set of non-zero pairwise orthogonal vectors (that is, any two distinct vectors in the set are orthogonal, as defined in Definition 19.6.1); then this set is linearly independent.

TODO: proof
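In the meantime, here is a sketch of the standard argument. Suppose that for scalars c_{1},...,c_{n} and distinct vectors v_{1},...,v_{n}\in A we have

c_{1}v_{1}+c_{2}v_{2}+...+c_{n}v_{n}=0

Taking the inner product of both sides with v_{j} and using pairwise orthogonality kills every term except the j-th, leaving c_{j}\langle v_{j},v_{j}\rangle=0. As v_{j}\neq 0 we have \langle v_{j},v_{j}\rangle\neq 0, so c_{j}=0; since j was arbitrary, every coefficient is zero and the set is linearly independent.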

19.6.5 Orthonormal bases

Theorem 19.6.2

Let V be a finite-dimensional inner product space, and let \beta=\{v_{1},v_{2},...,v_{n}\} be an orthogonal basis for this vector space. Then, (spoiler alert) we know that for all x\in V

x=\sum_{1\leqq k\leqq n}\frac{\langle x,v_{k}\rangle}{||v_{k}||^{2}}v_{k} (19.159)

We can prove this as follows. First, as \beta is a basis, we know that for some scalars a_{1},a_{2},...,a_{n}

x=\sum_{1\leqq i\leqq n}a_{i}v_{i} (19.160)

We now want to find the values of aia_{i} (for any ii). We know that for all jj

\langle x,v_{j}\rangle =\left\langle\sum_{1\leqq i\leqq n}a_{i}v_{i},v_{j}\right\rangle (19.161)
=\sum_{1\leqq i\leqq n}a_{i}\langle v_{i},v_{j}\rangle (19.162)

Then, as \beta is an orthogonal basis, all the \langle v_{i},v_{j}\rangle terms with i\neq j are zero, so the sum collapses to a_{j}\langle v_{j},v_{j}\rangle=a_{j}||v_{j}||^{2}. Rearranging gives a_{j}=\frac{\langle x,v_{j}\rangle}{||v_{j}||^{2}}, which is exactly the coefficient in Equation 19.159.
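To see Equation 19.159 in action, here is a small NumPy sketch; the orthogonal basis of \mathbb{R}^{2} below is made up for illustration, and the inner product is the ordinary dot product.

    import numpy as np

    # A made-up orthogonal (not orthonormal) basis of R^2
    v1 = np.array([1.0, 1.0])
    v2 = np.array([1.0, -1.0])
    basis = [v1, v2]

    x = np.array([3.0, 5.0])

    # Coefficients from equation (19.159): a_k = <x, v_k> / ||v_k||^2
    coeffs = [np.dot(x, v) / np.dot(v, v) for v in basis]
    reconstruction = sum(a * v for a, v in zip(coeffs, basis))

    print(coeffs)                          # coefficients 4.0 and -1.0
    print(np.allclose(reconstruction, x))  # True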

19.6.6 Gram-Schmidt orthonormalisation

Gram-Schmidt orthonormalisation provides a useful way to turn a set of linearly independent vectors into a set of orthogonal vectors with the same span (normalising each vector then gives an orthonormal set).

The key idea is that we build our set of vectors inductively: if our set of vectors is S (a finite subset of some vector space V), then we will order our set (it doesn’t matter how, any ordering will do) and start to build sets S^{\prime}_{1},S^{\prime}_{2},...,S^{\prime}_{|S|} such that S^{\prime}_{k} is an orthogonal set of k vectors which satisfies

\operatorname{span}(\text{first $k$ vectors in $S$})=\operatorname{span}(S^{\prime}_{k}). (19.163)

Clearly the main thing which is missing here is the step which takes us from S^{\prime}_{k} to S^{\prime}_{k+1}. There are a lot of ways to find this step:

  • Consider specific examples of linearly independent vectors in well-known vector spaces (for example \mathbb{R}^{2}) and guess the formula (pun entirely unintended) for performing this orthonormalisation process.

  • Try to write a proof for our method and through this try to fill in the actual method.

I will try for the latter, because I think it is an approach which is much more fun. We will start by creating S^{\prime}_{1} by simply taking the first element of S; a single non-zero vector is by itself an orthogonal set.

Now, suppose that we have constructed S^{\prime}_{k}. We would like to find a way to build S^{\prime}_{k+1}. Clearly we should add something built from s_{k+1} to this set; the question, of course, is how. We need our new vector, say s^{\prime}_{k+1}, to be such that

\langle s^{\prime}_{k+1},s^{\prime}_{j}\rangle=0\qquad\forall j,1\leqq j\leqq k (19.164)
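The requirement (19.164) is met by the standard Gram-Schmidt step, which subtracts from s_{k+1} its component along each vector built so far:

s^{\prime}_{k+1}=s_{k+1}-\sum_{1\leqq j\leqq k}\frac{\langle s_{k+1},s^{\prime}_{j}\rangle}{\langle s^{\prime}_{j},s^{\prime}_{j}\rangle}s^{\prime}_{j}

Here is a minimal sketch of the process in Python with NumPy, assuming the dot product on \mathbb{R}^{n} and a linearly independent input set (normalising each output vector would then give an orthonormal set).

    import numpy as np

    def gram_schmidt(vectors):
        """Orthogonalise a list of linearly independent vectors (dot product on R^n)."""
        ortho = []
        for s in vectors:
            # Subtract the components of s along the vectors built so far
            s_prime = s - sum(np.dot(s, q) / np.dot(q, q) * q for q in ortho)
            ortho.append(s_prime)
        return ortho

    S = [np.array([1.0, 1.0, 0.0]),
         np.array([1.0, 0.0, 1.0]),
         np.array([0.0, 1.0, 1.0])]

    S_prime = gram_schmidt(S)
    # Every pair of distinct output vectors should have dot product (approximately) zero
    print(all(np.isclose(np.dot(a, b), 0.0)
              for i, a in enumerate(S_prime) for b in S_prime[i + 1:]))  # True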

19.6.7 Orthogonal complement

Definition 19.6.2

Let V be an inner product space, and E be a subspace of V. We define the orthogonal complement of E as the set

E^{\bot}=\{x\in\textsf{V}:\forall e\in E\ \big(x\bot e\big)\} (19.165)
Theorem 19.6.3

Let V be an inner product space, and W be a subspace of V. In this case wouldn’t it be nice if W\cap W^{\bot}=\{0\}?

Yes, it would. We can prove separately that W\cap W^{\bot}\subseteq\{0\} and \{0\}\subseteq W\cap W^{\bot}.

  • \{0\}\subseteq W\cap W^{\bot}. This direction is the easier one, because we know both W and W^{\bot} are subspaces and therefore they both contain at least the 0 vector.

  • W\cap W^{\bot}\subseteq\{0\}. Let us suppose that x\in W and x\in W^{\bot}, and, to argue by contradiction, that x\notin\{0\}, i.e. x\neq 0. Because x\in W^{\bot} we know that \langle x,w\rangle=0 for all w\in W (this follows directly from the definition of W^{\bot}). As it is also true that x\in W, it follows that

    \langle x,x\rangle=0 (19.166)

    which is true if and only if x=0, which contradicts our earlier assumption that x\neq 0, and thus this direction is true.

Therefore, the theorem is true.

\Box

Theorem 19.6.4 (The very important resolving theorem)

Let V be an inner product space, of which W and W^{\bot} are subspaces; then for every vector y\in V there exist unique vectors u\in W and z\in W^{\bot} such that

y=u+z (19.167)
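As a concrete illustration of the resolving theorem, here is a NumPy sketch; the subspace W of \mathbb{R}^{3} and the vector y are made up, and W is described by an orthonormal basis so that the component in W is easy to compute.

    import numpy as np

    # A made-up subspace W of R^3, spanned by an orthonormal basis {q1, q2}
    q1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
    q2 = np.array([0.0, 0.0, 1.0])

    y = np.array([2.0, -1.0, 4.0])

    u = sum(np.dot(y, q) * q for q in (q1, q2))  # the piece of y lying in W
    z = y - u                                    # the leftover piece, which lies in W-perp

    print(u, z)  # u = [0.5, 0.5, 4.0], z = [1.5, -1.5, 0.0]
    print(np.isclose(np.dot(z, q1), 0.0), np.isclose(np.dot(z, q2), 0.0))  # True True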
Theorem 19.6.5

Let V be a finite-dimensional inner product space, of which W is a subspace. Then

\dim(W)+\dim(W^{\bot})=\dim(V). (19.168)

This theorem follows mostly from the definitions. First, let \dim(W)=k and \dim(W^{\bot})=m; then our goal is to show that k+m=\dim(V).

Next, fix a basis \alpha=\{w_{1},...,w_{k}\} for W and a basis \beta=\{w_{k+1},...,w_{k+m}\} for W^{\bot}. Then we will prove that \alpha\cup\beta is a basis for V.

  • Generating. Let x\in V; then x=v_{1}+v_{2} for some v_{1}\in W and v_{2}\in W^{\bot}, using Theorem 19.6.4. Writing v_{1} as a linear combination of the vectors in \alpha and v_{2} as a linear combination of the vectors in \beta expresses x as a linear combination of vectors in \alpha\cup\beta, so \alpha\cup\beta generates V. (Linear independence then follows using the fact that W\cap W^{\bot}=\{0\}, Theorem 19.6.3.)
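A numerical sanity check of Equation 19.168 (a NumPy sketch under the dot product, with W taken to be the column space of a made-up random matrix):

    import numpy as np

    n = 5
    rng = np.random.default_rng(0)
    # W is the column space of a random 5x3 matrix (so dim W = 3 almost surely);
    # with the dot product, W-perp consists of the vectors orthogonal to every
    # column of A, i.e. the null space of A transpose.
    A = rng.standard_normal((n, 3))
    dim_W = np.linalg.matrix_rank(A)

    # An orthonormal basis of the null space of A^T, read off from the SVD of A^T
    U, s, Vt = np.linalg.svd(A.T)
    null_basis = Vt[np.sum(s > 1e-10):]   # rows of Vt beyond rank(A^T) span W-perp
    dim_W_perp = null_basis.shape[0]

    print(dim_W, dim_W_perp, dim_W + dim_W_perp == n)  # 3 2 True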

19.6.8 Orthogonal projection

If you’ve done any physics (shivers) then you’ve probably come across the idea of \say{resolving forces}. If you haven’t, then the basic idea is that if we have some vector v in \mathbb{R}^{2}, then for example we can split it into two components: a perpendicular one and a parallel one.

One way we can do this (which is quite natural) is to resolve any vector parallel and perpendicular to the \say{axes}; for example, we could have the vector in the diagram below, which can be resolved into a component parallel to \boldsymbol{\hat{\textbf{i}}} and one parallel to \boldsymbol{\hat{\textbf{j}}} (note that \boldsymbol{\hat{\textbf{i}}} and \boldsymbol{\hat{\textbf{j}}} are orthogonal).

[Diagram: a vector resolved into its \boldsymbol{\hat{\textbf{i}}} and \boldsymbol{\hat{\textbf{j}}} components.]

But perpendicular and parallel to what? Usually in secondary school mathematics this is not very well-defined, but we can now use some of our previous definitions to define this notion of \say{splitting up a vector} and generalise it to vector spaces where we don’t have a ready geometric interpretation.

This definition encodes a lot of the intuitive notions about orthogonality and perpendicularity.

Definition 19.6.3

Let V be an inner product space, and E be a subspace of V.

We say that w is the orthogonal projection of v onto E if

  1. The vector w is in E.

  2. The vector obtained by subtracting w from v is orthogonal to all the vectors in E (which we can write as \textbf{v}-\textbf{w}\bot E).

A very key property is that the orthogonal projection of v onto E is the vector in E which is closest to v. This makes the orthogonal projection useful in optimisation problems!

Theorem 19.6.6

Let V be an inner product space, and W be a subspace of V. Let v\in V; then the orthogonal projection of v onto W minimises the distance between v and W (which we define as the distance between v and the closest vector in W).

To prove this, we will need the Pythagorean theorem. Let v\in V and let w be an arbitrary vector in W. Then we will define p to be the orthogonal projection of v onto W. We can write the squared distance between v and w as ||v-w||^{2}. Our goal is to show that this is greater than or equal to ||v-p||^{2} (note that in general it is always nicer to work with the distance squared, and this is all good and well because distance is never negative). Then we apply the trusty trick of adding zero (i.e. an object and its inverse), in this case -p+p, which gives us

||v-w||^{2} =||v-p+p-w||^{2} (19.169)

Then note that v-p\in W^{\bot} and that p-w\in W (as both p and w are in W, which is a subspace). To this, we can apply Pythagoras’ theorem (just when you thought you’d escaped school geometry, it comes back to bite!), which tells us that

||v-p+p-w||^{2}=||v-p||^{2}+||p-w||^{2}\geqq||v-p||^{2}

and hence ||v-w||\geqq||v-p||; no vector in W is closer to v than p is, which is what we wanted to show.

Now that we have defined the object, we can ask some questions that (at least to me) it makes sense to ask. For example, we can ask if the orthogonal projection always exists! This seems to be intuitively true, but how do we know? In the finite-dimensional case, existence follows from Theorem 19.6.4: writing v=u+z with u\in W and z\in W^{\bot}, the vector u lies in W and v-u=z is orthogonal to everything in W, so u satisfies both conditions in Definition 19.6.3 and is the orthogonal projection of v onto W.
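To make the projection concrete, here is a small Python/NumPy sketch using Definition 19.6.3; the subspace E of \mathbb{R}^{3} (given by an orthogonal basis) and the vector v are made up for illustration, and the inner product is the dot product.

    import numpy as np

    def orthogonal_projection(v, basis):
        # Project v onto span(basis), where basis is a list of pairwise-orthogonal vectors
        return sum(np.dot(v, b) / np.dot(b, b) * b for b in basis)

    # A made-up subspace E of R^3, spanned by two orthogonal vectors
    E = [np.array([1.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])]
    v = np.array([3.0, 1.0, 2.0])

    w = orthogonal_projection(v, E)
    print(w)  # [2. 2. 2.]
    # Both conditions of Definition 19.6.3: w is in E (by construction), and v - w is
    # orthogonal to every basis vector of E, hence to all of E.
    print(all(np.isclose(np.dot(v - w, b), 0.0) for b in E))  # True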

19.6.9 The method of least squares

There’s a fairly intuitive notion that the orthogonal (well, \say{perpendicular}) line minimises the distance between a point and a line.

[Diagram: a point and a line, joined by the perpendicular segment, which is the line of shortest distance.]

I had a surprising amount of trouble trying to get a nice little right-angle symbol onto the diagram (and ultimately failed), but the line connecting the point to the other line is in fact perpendicular to it and also gives the shortest distance.

Usually the question of how to minimise or maximise a quantity requires some analysis (usually differentiation); however, here we are lucky to have a case where we can minimise things using only linear algebra!

The key idea in least squares is that when we have a system of linear equations Ax=b, sometimes we cannot find a solution (which sucks), but it would be nice to have an approximate one. That is, our problem is how to find an x_{0} such that

Ax_{0}\approx b (19.170)

is the next best thing to an exact solution of Ax=b. How exactly we should define \say{optimal} actually does matter (for example, it turns out that in the \say{real world} error in experiments tends to follow a certain kind of statistical distribution: the normal distribution), but let us be driven by what seems simplest, and say that whatever minimises the \say{distance} between Ax_{0} and b is best; that is, we seek to find the x_{0} which minimises

\left\lVert Ax_{0}-b\right\rVert (19.171)

Note that we are essentially trying to find the vector in R(A) (the range of A) which is closest to b.
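A short NumPy sketch of the idea, with a made-up overdetermined system: np.linalg.lstsq finds the x_{0} minimising \left\lVert Ax_{0}-b\right\rVert, and because Ax_{0} is then the orthogonal projection of b onto R(A), the residual Ax_{0}-b is orthogonal to the columns of A, which is why the same x_{0} also solves the normal equations A^{T}Ax=A^{T}b.

    import numpy as np

    # A made-up overdetermined system: 4 equations, 2 unknowns, no exact solution
    A = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])
    b = np.array([0.0, 1.0, 1.0, 3.0])

    # Least-squares solution: minimises ||A x - b||
    x0, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)

    # Equivalent route via the normal equations A^T A x = A^T b
    x0_normal = np.linalg.solve(A.T @ A, A.T @ b)

    print(x0, x0_normal)               # the two agree
    print(np.linalg.norm(A @ x0 - b))  # the minimal distance between A x and b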