## Least squares and pseudo-inverses

Posted: 5th February 2014 by seanmathmodelguy in Lectures

To appreciate the connections between solutions of the system $$Ax=b$$ and least squares, we begin with two illustrative examples.

###### Overdetermined systems:

$$A$$ is $$m \times n$$ with $$m > n$$. In this case there are more equations than unknowns, $$A^{\top}A$$ is $$n\times n$$ and $$AA^{\top}$$ is $$m\times m$$. The connection with the pseudo-inverse is that
$x = (A^{\top}A)^{-1}A^{\top}b = A^+b$ is the particular $$x$$ that minimizes $$\|Ax-b\|$$.
Example: Solve the system $$(2\ 3\ 4\ 6)^\top(x_1) = (4\ 6\ 8\ 10)^\top.$$
Using $$A^{\top}A = (2\ 3\ 4\ 6)(2\ 3\ 4\ 6)^\top=65$$,
$x = (A^{\top}A)^{-1}A^{\top}b = \frac{1}{65}\begin{pmatrix}2 &3 &4 &6\end{pmatrix}\begin{pmatrix}4\\6\\8\\10\end{pmatrix} = \frac{118}{65}.$ Turning to the least squares problem, $$\|Ax - b\|^2 = (2x-4)^2+(3x-6)^2+(4x-8)^2+(6x-10)^2$$. Setting the derivative with respect to $$x$$ to zero gives $$2(2x-4)\cdot 2+2(3x-6)\cdot 3+2(4x-8)\cdot 4+2(6x-10)\cdot 6 = 0$$ so that
$x = \frac{8+18+32+60}{4+9+16+36} = \frac{118}{65}.$
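A quick numerical sketch of this example (variable names are mine; `np.linalg.pinv` computes the Moore-Penrose pseudo-inverse via the SVD) confirms the value $$118/65$$:

```python
import numpy as np

# Overdetermined system: A is 4x1, b has 4 entries
A = np.array([[2.0], [3.0], [4.0], [6.0]])
b = np.array([4.0, 6.0, 8.0, 10.0])

# Least-squares solution via the pseudo-inverse: x = A^+ b
x = np.linalg.pinv(A) @ b
print(x[0], 118 / 65)  # both approximately 1.8153846
```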

###### Underdetermined systems:

$$A$$ is $$m \times n$$ with $$m < n$$. In this case there are more unknowns than equations, and the connection with the pseudo-inverse is that $x = A^{\top}(AA^{\top})^{-1}b = A^+b$ is the particular $$x$$ that minimizes $$\|x\|$$ amongst all of the possible solutions.
Example: Solve the system $$(1\ -1\ 0)(x_1\ x_2\ x_3)^\top = (2).$$
In this case we find $$AA^{\top} = (1\ -1\ 0)(1\ -1\ 0)^\top = 2$$ and
$x = A^{\top}(AA^{\top})^{-1}b = \begin{pmatrix}1\\-1\\0\end{pmatrix}\frac{1}{2}2 = \begin{pmatrix}1\\-1\\0\end{pmatrix}.$

To see the connection with the least squares problem, the system is row reduced (it is already row reduced) and we can choose $$x_1$$ as the pivot with $$x_2, x_3$$ as free parameters. Letting $$x_2 = s$$ and $$x_3 = t$$ we have $$x_1 = 2+s$$ so that the general solution is
$x = \begin{pmatrix}2+s\\ s\\ t\end{pmatrix} = \begin{pmatrix}2\\0\\0\end{pmatrix} + s\begin{pmatrix}1\\1\\0\end{pmatrix} + t\begin{pmatrix}0\\0\\1\end{pmatrix}$ for any $$s,t\in\mathbb{R}$$. Considering $$\|x\|$$, we have $$\|x\|^2 = (2+s)^2 + s^2 + t^2$$, which is minimized when $$2(2+s)+2s = 0$$ and $$2t = 0$$, i.e. $$s = -1,\ t = 0$$, giving $$x = (1\ -1\ 0)^\top$$ as before.
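The minimum-norm property can be checked numerically (a sketch with NumPy; the comparison solution is the particular solution with $$s = t = 0$$):

```python
import numpy as np

# Underdetermined system: one equation, three unknowns
A = np.array([[1.0, -1.0, 0.0]])
b = np.array([2.0])

# The pseudo-inverse picks the minimum-norm solution
x = np.linalg.pinv(A) @ b
print(x)  # approximately (1, -1, 0)

# Any other solution has a larger norm, e.g. s = t = 0 gives (2, 0, 0)
x_other = np.array([2.0, 0.0, 0.0])
print(np.linalg.norm(x), np.linalg.norm(x_other))
```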

###### Symmetric matrices:

Decomposing the structure of the matrix $$A$$ can help in understanding the resulting solutions. In the case of a symmetric matrix, the eigenvectors form an orthogonal set, which allows one to expand $$A = UDU^\top$$, where $$U$$ is an orthogonal matrix $$(U^{-1}=U^\top)$$ with the normalized eigenvectors as columns and $$D$$ is a diagonal matrix with the corresponding eigenvalues as the diagonal elements. In general the eigenvalues can be any real numbers, but if we require a non-negative definite matrix then the eigenvalues must all be greater than or equal to zero.

Example: Write $$A_1 = \begin{pmatrix}2 &1\\ 1 &2\end{pmatrix}$$ in the form $$A_1 = UDU^\top$$.
A quick calculation gives $$\lambda_1 = 3$$ with corresponding eigenvector $$\mathbf{\xi}^{(1)} = \frac{1}{\sqrt{2}}(1\ 1)^\top$$ and a second pair $$\lambda_2 = 1, \mathbf{\xi}^{(2)} = \frac{1}{\sqrt{2}}(1\ -1)^\top.$$ This gives the decomposition $A_1 = \frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ 1 &-1\end{pmatrix}\begin{pmatrix}3 &0\\ 0 &1\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ 1 &-1\end{pmatrix}^\top.$ Using this decomposition the inverse of a matrix is easily computed by replacing the diagonal elements of $$D$$ with their reciprocals so that for example $A_1^{-1} = \frac{1}{3}\begin{pmatrix}2& -1\\ -1& 2\end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ 1 &-1\end{pmatrix}\begin{pmatrix}{\small\frac{1}{3}} &0\\ 0 &1\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ 1 &-1\end{pmatrix}^\top.$
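This decomposition-and-reciprocate recipe is easy to verify numerically; a sketch using NumPy's `eigh` (which returns the eigenvalues of a symmetric matrix in ascending order with orthonormal eigenvectors as columns):

```python
import numpy as np

A1 = np.array([[2.0, 1.0], [1.0, 2.0]])

# Spectral decomposition A1 = U D U^T
evals, U = np.linalg.eigh(A1)
print(evals)  # eigenvalues 1 and 3, in ascending order

# Reconstruct A1 from its eigendecomposition
assert np.allclose(U @ np.diag(evals) @ U.T, A1)

# Invert by taking reciprocals of the eigenvalues
A1_inv = U @ np.diag(1.0 / evals) @ U.T
assert np.allclose(A1_inv, np.linalg.inv(A1))
```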

What happens if one of the eigenvalues is zero? This will not affect the decomposition of $$A$$; in fact, if we decompose $$A_2 = \begin{pmatrix}1&-1\\-1&1\end{pmatrix}$$ with eigenvalues $$\lambda_1 = 2, \lambda_2 = 0$$ and corresponding eigenvectors $$\xi^{(1)} = \frac{1}{\sqrt{2}}(1\ -1)^\top$$, $$\xi^{(2)} = \frac{1}{\sqrt{2}}(1\ 1)^\top$$ then $A_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ -1 &1\end{pmatrix}\begin{pmatrix}2 &0\\ 0 &0\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ -1 &1\end{pmatrix}^\top.$ Notice that this matrix is not invertible since one of the eigenvalues is zero. But what if we took the reciprocal of only the nonzero diagonal elements to form $A_3 = \frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ -1 &1\end{pmatrix}\begin{pmatrix}{\small\frac{1}{2}} &0\\ 0 &0\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\ -1 &1\end{pmatrix}^\top = \frac{1}{4}\begin{pmatrix}1& -1\\ -1 &1\end{pmatrix}?$ Of course $$A_3$$ is not the inverse of $$A_2$$; that would be very surprising, since $$A_2$$ is not invertible. So what then is $$A_3$$? It is the pseudo-inverse of $$A_2$$ (and $$A_2$$ is, in turn, the pseudo-inverse of $$A_3$$). This construction can be generalized to non-square matrices by constructing the SVD of the matrix.
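The reciprocate-the-nonzero-eigenvalues construction can be sketched in NumPy and compared against the SVD-based `np.linalg.pinv` (the tolerance `1e-12` for deciding which eigenvalues are zero is my own choice):

```python
import numpy as np

A2 = np.array([[1.0, -1.0], [-1.0, 1.0]])
evals, U = np.linalg.eigh(A2)  # eigenvalues 0 and 2, up to rounding

# Reciprocate only the nonzero eigenvalues; zeros stay zero
d_plus = np.array([1.0 / l if abs(l) > 1e-12 else 0.0 for l in evals])
A3 = U @ np.diag(d_plus) @ U.T

# A3 agrees with NumPy's pseudo-inverse of A2 ...
assert np.allclose(A3, np.linalg.pinv(A2))
# ... but A2 @ A3 is not the identity, so A3 is not an inverse
print(A2 @ A3)
```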

Here is how the pseudo-inverse is connected to the solution of a least-squares problem. If the linear system $$Ax = b$$ has any solutions, then they have the form $x = A^+ b + \left(I - A^+ A\right)\xi$ for some arbitrary vector $$\xi$$. Multiplying on the left by $$A$$ and using the property $$A A^+ A = A$$ of the pseudo-inverse gives $Ax = A A^+ b + \left(A - A A^+ A\right)\xi = A A^+ b = b.$ So for any solution to exist we need the admissibility condition $$AA^+ b = b$$. For linear systems $$A x = b$$ with non-unique solutions, as in the underdetermined case, the pseudo-inverse may be used to construct the solution of minimum Euclidean norm $$\|x\|$$ among all solutions: if $$A x = b$$ is admissible ($$AA^+ b = b$$), the vector $$y = A^+b$$ is a solution and satisfies $$\|y\| \le \|x\|$$ for all solutions $$x$$.
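The admissibility condition is easy to probe numerically; a sketch with $$A_2$$ from above, trying one admissible right-hand side ($$b_2 = -b_1$$) and one inadmissible one (the particular vectors are my own choices):

```python
import numpy as np

A2 = np.array([[1.0, -1.0], [-1.0, 1.0]])
A2_plus = np.linalg.pinv(A2)

# A A^+ A = A is a defining Moore-Penrose property
assert np.allclose(A2 @ A2_plus @ A2, A2)

# Admissible b: satisfies b2 = -b1, so A A^+ b = b
b_ok = np.array([3.0, -3.0])
print(np.allclose(A2 @ A2_plus @ b_ok, b_ok))   # a solution exists

# Inadmissible b: A A^+ b != b, so no solution exists
b_bad = np.array([1.0, 1.0])
print(np.allclose(A2 @ A2_plus @ b_bad, b_bad))
```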

One final example should tie this all together.
Example: Consider finding the solution to $$A_2x = b$$ that minimizes $$\|x\|$$.
Row reducing $$A_2x = b$$ with $$b = (b_1\ b_2)^\top$$ reveals that $$b_2=-b_1$$ is required for a solution to exist, and
$\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}b_1\\0\end{pmatrix} + s\begin{pmatrix}1\\1\end{pmatrix}$ for any $$s\in\mathbb{R}$$. Continuing, $$\|x\|^2 = (b_1+s)^2+s^2$$ which is minimized when $$2(b_1+s)+2s=0$$ or when $$s = -\frac{b_1}{2}$$ so that the solution is
$\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}b_1\\0\end{pmatrix} - \frac{b_1}{2}\begin{pmatrix}1\\1\end{pmatrix}=\frac{b_1}{2}\begin{pmatrix}1\\-1\end{pmatrix}.$ Using the pseudo-inverse of $$A_2$$, namely $$A_2^+ = A_3$$, the admissibility condition $$A_2A_2^+ b = b$$ simplifies to $$b_2 = -b_1$$ and the solution is
$\begin{pmatrix}x_1\\x_2\end{pmatrix} = A_2^+b = \frac{1}{4}\begin{pmatrix}1&-1\\-1&1\end{pmatrix}\begin{pmatrix}b_1\\b_2\end{pmatrix}=\frac{1}{4}\begin{pmatrix}b_1-b_2\\-b_1+b_2\end{pmatrix}=\frac{b_1}{2}\begin{pmatrix}1\\-1\end{pmatrix}.$
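The final example can be checked numerically as well (a sketch; the value $$b_1 = 5$$ is an arbitrary choice):

```python
import numpy as np

A2 = np.array([[1.0, -1.0], [-1.0, 1.0]])
b1 = 5.0
b = np.array([b1, -b1])  # admissible right-hand side: b2 = -b1

# Minimum-norm solution x = A2^+ b = (b1/2)(1, -1)
x = np.linalg.pinv(A2) @ b
print(x)

# It solves the system and has smaller norm than e.g. (b1, 0)
assert np.allclose(A2 @ x, b)
assert np.linalg.norm(x) < np.linalg.norm([b1, 0.0])
```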

### Summary

At the beginning of this post, the Moore-Penrose pseudo-inverse generalized the idea of an inverse to non-square matrices via the formulas $$(A^{\top}A)^{-1}A^{\top}$$ and $$A^{\top}(AA^{\top})^{-1}$$; a second notion of pseudo-inverse then arose for symmetric matrices that have at least one zero eigenvalue. This second notion can be generalized (using the SVD) to non-square matrices and to matrices that are not symmetric, where the eigenvectors are not guaranteed to form an orthonormal set. In all cases, the pseudo-inverse is implicitly tied to the notion of finding solutions with minimal norm.