(Please replace the erroneous posting from yesterday, Nov. 28, with this corrected version.)

Here is a very short course in Linear Algebra. The Singular Value Decomposition provides a natural basis for Gil Strang's Four Fundamental Subspaces.

Screen shot from Gil Strang MIT/MathWorks video lecture, "The Big Picture of Linear Algebra".

Gil Strang tells me that he began to think about linear algebra in terms of four fundamental subspaces in the 1970's when he wrote the first edition of his textbook, *Introduction to Linear Algebra*. The fifth edition, which was published last May, features the spaces on the cover.

The concept is a centerpiece in his video lectures for MIT course 18.06. It even found its way into the new video series about ordinary differential equations that he and I made for MIT and MathWorks. His paper included in the notes for 18.06 is referenced below.

Suppose that $A$ is an $m$ -by- $n$ matrix that maps vectors in $R^n$ to vectors in $R^m$. The four fundamental subspaces associated with $A$, two in $R^n$ and two in $R^m$, are:

- column space of $A$, the set of all $y$ in $R^m$ resulting from $y = Ax$,
- row space of $A$, the set of all $x$ in $R^n$ resulting from $x = A^Ty$,
- null space of $A$, the set of all $x$ in $R^n$ for which $Ax = 0$,
- left null space of $A$, the set of all $y$ in $R^m$ for which $A^T y = 0$.

The row space and the null space are *orthogonal* to each other and span all of $R^n$. The column space and the left null space are also *orthogonal* to each other and span all of $R^m$.

The *dimension* of a subspace is the number of linearly independent vectors required to span that space. *The Fundamental Theorem of Linear Algebra* is

- The dimension of the row space is equal to the dimension of the column space.

In other words, the number of linearly independent rows is equal to the number of linearly independent columns. This may seem obvious, but it is actually a subtle fact that requires proof.

The *rank* of a matrix is this number of linearly independent rows or columns.
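Combined with the orthogonal splittings above, the theorem pins down all four dimensions at once. Writing $r$ for the rank,

$$ \dim(\mbox{row space}) = \dim(\mbox{column space}) = r $$

$$ \dim(\mbox{null space}) = n - r, \ \ \dim(\mbox{left null space}) = m - r $$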

The natural bases for the four fundamental subspaces are provided by the SVD, the Singular Value Decomposition, of $A$.

$$ A = U \Sigma V^T $$

The matrices $U$ and $V$ are *orthogonal*.

$$ U^T U = I_m, \ \ V^T V = I_n $$

You can think of orthogonal matrices as multidimensional generalizations of two dimensional rotations. The matrix $\Sigma$ is *diagonal*, so its only nonzero elements are on the main diagonal.
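As a quick sanity check on the rotation analogy, here is a small pure-Python sketch (not part of the original MATLAB session) verifying that a 2-by-2 rotation matrix $G$ satisfies $G^T G = I$:

```python
import math

def rot(theta):
    """2-by-2 rotation matrix as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

G = rot(0.7)
GtG = matmul(transpose(G), G)
# G'G should be the 2-by-2 identity, up to roundoff
for i in range(2):
    for j in range(2):
        target = 1.0 if i == j else 0.0
        assert abs(GtG[i][j] - target) < 1e-12
```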

The shape and size of these matrices are important. The matrix $A$ is rectangular, say with $m$ rows and $n$ columns; $U$ is square, with the same number of rows as $A$; $V$ is also square, with the same number of columns as $A$; and $\Sigma$ is the same size as $A$. Here is a picture of this equation when $A$ is tall and skinny, so $m > n$. The diagonal elements of $\Sigma$ are the *singular values*, shown as blue dots. All of the other elements of $\Sigma$ are zero.

The signs and the ordering of the columns in $U$ and $V$ can always be taken so that the singular values are nonnegative and arranged in decreasing order.

For any diagonal matrix like $\Sigma$, it is clear that the rank, which is the number of independent rows or columns, is just the number of nonzero diagonal elements.

In MATLAB, the SVD is computed by the statement

[U,Sigma,V] = svd(A)

With inexact floating point computation, it is appropriate to take the rank to be the number of *nonnegligible* diagonal elements. So the function

r = rank(A)

counts the number of singular values larger than a tolerance.
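The same tolerance test can be sketched in a few lines of pure Python. The helper name `numerical_rank` is my own; MATLAB's `rank` uses a tolerance of roughly `max(size(A))*eps(norm(A))`, which this approximates:

```python
import sys

def numerical_rank(singular_values, m, n):
    """Count singular values above a tolerance, in the spirit of
    MATLAB's rank: tol = max(m, n) * eps * (largest singular value)."""
    if not singular_values:
        return 0
    eps = sys.float_info.epsilon
    tol = max(m, n) * eps * max(singular_values)
    return sum(1 for s in singular_values if s > tol)

# Singular values of the rank-1 outer-product example later in this post:
assert numerical_rank([15.8114, 0.0], 2, 2) == 1
# Values at roundoff level do not count toward the rank:
assert numerical_rank([2.0, 1e-17], 2, 2) == 1
assert numerical_rank([2.0, 1.0, 0.5], 3, 3) == 3
```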

Multiply both sides of $A = U\Sigma V^T $ on the right by $V$. Since $V^T V = I$, we find

$$ AV = U\Sigma $$

Here is the picture. I've drawn a green line after column $r$ to show the rank. The only nonzero elements of $\Sigma$, the singular values, are the blue dots.

Write out this equation column by column.

$$ Av_j = \sigma_j u_j, \ \ j = 1,...,r $$

$$ Av_j = 0, \ \ j = r+1,...,n $$

This says that $A$ maps the first $r$ columns of $V$ onto nonzero multiples of the first $r$ columns of $U$ and maps the remaining columns of $V$ onto zero.

- $U(:,1:r)$ spans the column space.
- $V(:,r+1:n)$ spans the null space.

Transpose the equation $A = U\Sigma V^T $ and multiply both sides on the right by $U$. Since $U^T U = I$, we find

$$ A^T U = V \Sigma^T $$

Here's the picture, with the green line at the rank.

Write this out column by column.

$$ A^T u_j = \sigma_j v_j, \ \ j = 1,...,r $$

$$ A^T u_j = 0, \ \ j = r+1,...,m $$

This says that $A^T$ maps the first $r$ columns of $U$ onto nonzero multiples of the first $r$ columns of $V$ and maps the remaining columns of $U$ onto zero.

- $V(:,1:r)$ spans the row space.
- $U(:,r+1:m)$ spans the left nullspace.

Here is an example involving lines in two dimensions, so $m = n = 2$. Start with these vectors.

```
u = [-3 4]'
v = [1 3]'
```

```
u =
    -3
     4
v =
     1
     3
```

The matrix $A$ is their outer product.

A = u*v'

```
A =
    -3    -9
     4    12
```

Compute the SVD.

[U,Sigma,V] = svd(A)

```
U =
   -0.6000    0.8000
    0.8000    0.6000
Sigma =
   15.8114         0
         0         0
V =
    0.3162   -0.9487
    0.9487    0.3162
```

As expected, $\Sigma$ has only one nonzero singular value, so the rank is $r = 1$.

The first left and right singular vectors are our starting vectors, normalized to have unit length.

```
ubar = u/norm(u)
vbar = v/norm(v)
```

```
ubar =
   -0.6000
    0.8000
vbar =
    0.3162
    0.9487
```

The columns of $A$ are proportional to each other, and to $\bar{u}$. So the column space is just the line generated by multiples of either column and $\bar{u}$ is the normalized basis vector for the column space. The columns of $A^T$ are proportional to each other, and to $\bar{v}$. So $\bar{v}$ is the normalized basis vector for the row space.

The only nonzero singular value is the product of the normalizing factors.

sigma = norm(u)*norm(v)

sigma = 15.8114

The second right and left singular vectors, that is the second columns of $V$ and $U$, provide bases for the null spaces of $A$ and $A^T$.
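The arithmetic of this example is simple enough to replay in a few lines of pure Python (a sketch, independent of the MATLAB session above): $\sigma = \|u\| \|v\| = 5\sqrt{10}$, and $A$ maps $\bar{v}$ onto $\sigma \bar{u}$ while annihilating the vector perpendicular to $\bar{v}$.

```python
import math

u = [-3.0, 4.0]
v = [1.0, 3.0]

# A = u * v' (outer product), so A has rank 1
A = [[ui * vj for vj in v] for ui in u]       # [[-3, -9], [4, 12]]

norm = lambda x: math.hypot(x[0], x[1])
matvec = lambda M, x: [M[0][0]*x[0] + M[0][1]*x[1],
                       M[1][0]*x[0] + M[1][1]*x[1]]

sigma = norm(u) * norm(v)                      # 5 * sqrt(10) = 15.8114...
ubar = [ui / norm(u) for ui in u]              # [-0.6, 0.8]
vbar = [vi / norm(v) for vi in v]              # [0.3162..., 0.9487...]

# A maps vbar onto sigma * ubar (the first singular pair)
Av = matvec(A, vbar)
assert all(abs(Av[i] - sigma * ubar[i]) < 1e-12 for i in range(2))

# A maps the vector perpendicular to vbar onto zero (the null space)
vperp = [-vbar[1], vbar[0]]
Az = matvec(A, vperp)
assert all(abs(z) < 1e-12 for z in Az)
```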

Here is the picture.

Gilbert Strang, "The Four Fundamental Subspaces: 4 Lines", undated notes for MIT course 18.06, <http://web.mit.edu/18.06/www/Essays/newpaper_ver3.pdf>

Gilbert Strang, *Introduction to Linear Algebra*, Wellesley-Cambridge Press, fifth edition, 2016, x+574 pages, <http://bookstore.siam.org/wc14>

Gilbert Strang, "The Fundamental Theorem of Linear Algebra", *The American Mathematical Monthly*, Vol. 100, No. 9. (Nov., 1993), pp. 848-855, <http://www.jstor.org/stable/2324660?seq=1#page_scan_tab_contents>, also available at <http://www.souravsengupta.com/cds2016/lectures/Strang_Paper1.pdf>

Get the MATLAB code

Published with MATLAB® R2016b


My favorite ordinary differential equation provides a good test of ODE solvers, both numeric and symbolic. It also provides a nice illustration of the underlying existence theory and error analysis.

Here is one of my favorite ordinary differential equations. The dot means differentiation with respect to $t$. The equation does not express $\dot{y}$ directly, so it is implicit.

$$ \dot{y}^2 + y^2 = 1 $$

We don't need a computer to verify that the analytic solution is a combination of $\sin{t}$ and $\cos{t}$. But the constants $y=1$ and $y=-1$ are also solutions.

Our ODE solvers require an explicit equation, but when we solve this equation for $\dot{y}$ we get two solutions, differing in sign.

$$ \dot{y} = \pm \sqrt{1-y^2} $$

That $\pm$ sign makes this interesting.

Let's plunge ahead with the $+$ sign. Specify the equation, an interval for $t$ and an initial condition.

```
f = @(t,y) sqrt(1-y^2)
format short
tspan = [0 pi]
y0 = 0
```

```
f =
    @(t,y)sqrt(1-y^2)
tspan =
         0    3.1416
y0 =
     0
```

An attempt to compute a numeric solution with `ode45` runs into trouble when $y$ gets slightly larger than $1$ and the square root of a negative number generates an imaginary result. So temporarily use a shorter $t$ interval.

```
try
   ode45(f,tspan,y0);
catch
   ode45(f,[0 1.47],y0)
end
pi_axis([-0.1 1.1])
```

```
Error using odeplot (line 63)
Error updating the ODEPLOT window. Solution data may have been corrupted.
Argument Y cannot be complex.
Error in ode45 (line 435)
```

Change the equation so that it provides a guard against the square root of negative numbers.

```
f = @(t,y) sqrt(1-min(y,1)^2)
ode45(f,tspan,y0);
pi_axis([-0.1 1.1])
```

f = @(t,y)sqrt(1-min(y,1)^2)

This is the correct solution, $\sin{t}$. For a while. But when $y$ reaches $1$ it has to stay there because its derivative is always nonnegative.
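The stuck-at-one behavior is not peculiar to `ode45`. Here is a pure-Python sketch with a fixed-step classical Runge-Kutta integrator (the `rk4` helper is my own, not from the post): with the guarded right-hand side the solution tracks $\sin{t}$, reaches $1$ near $t = \pi/2$, and then stays flat.

```python
import math

def f(y):
    # Guarded right-hand side: clamp y at 1 before the square root
    return math.sqrt(1.0 - min(y, 1.0) ** 2)

def rk4(f, y0, t_end, n_steps=3000):
    """Classical fixed-step fourth-order Runge-Kutta for y' = f(y)."""
    h = t_end / n_steps
    y = y0
    for _ in range(n_steps):
        k1 = f(y)
        k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2)
        k4 = f(y + h * k3)
        y += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

# Before y reaches 1 the solution is sin(t) ...
assert abs(rk4(f, 0.0, 1.0) - math.sin(1.0)) < 1e-6
# ... but past t = pi/2 the numerical solution is stuck at 1
assert abs(rk4(f, 0.0, 3.0) - 1.0) < 1e-3
```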

If we want to continue with $\sin{t}$ beyond $\pi/2$ we have to switch to the minus sign.

```
f = @(t,y) (1-2*double(t>pi/2))*real(sqrt(1-y.^2))
ode45(f,tspan,y0)
pi_axis([-0.1 1.1])
```

f = @(t,y)(1-2*double(t>pi/2))*real(sqrt(1-y.^2))

We're solving the autonomous differential equation,

$$ \dot{y} = f(y) $$

where

$$ f(y) = \sqrt{1-y^2} $$

An important quantity in both the existence theory for ODEs and the error analysis for numeric methods is the *Jacobian*.

$$ J = \frac{\partial{f}}{\partial{y}} $$

In this case $J$ is 1-by-1.

$$ J = -\frac{y}{\sqrt{1 - y^2}} $$

When $y$ approaches $\pm 1$ the Jacobian approaches infinity.

Existence and uniqueness theorems for ODEs rely on a bounded Jacobian (or a bounded Lipschitz constant). The derivations of numerical methods rely on a bounded Jacobian. Error control in modern ODE software relies on a bounded Jacobian. All of these are up in the air with this example.

We can see the lack of uniqueness happening when we change the initial condition to $y(0) = -1$. The Jacobian is infinite at the start and both the constant $y(t) = -1$ and the trig function $y(t) = -\cos{t}$ are solutions. A small perturbation in the initial condition switches the numerical solution from one to the other.

```
f = @(t,y) sqrt(1-max(min(y,1),-1)^2)
y0 = -1
ode45(f,tspan,y0);
pi_axis([-1.2 0])
hold on
format long
y0 = -.99999999
ode45(f,tspan,y0);
pi_axis([-1.1 1.1])
hold off
```

```
f =
    @(t,y)sqrt(1-max(min(y,1),-1)^2)
y0 =
    -1
y0 =
  -0.999999990000000
```

I was curious to see how the Symbolic Toolbox handles this equation with this initial condition. There is good news and surprising news.

```
syms y(t)
ode = diff(y)^2+y^2==1
init = y(0)==-1
ysym = dsolve(ode,init)
```

```
ode(t) =
diff(y(t), t)^2 + y(t)^2 == 1
init =
y(0) == -1
ysym =
          -1
 -cosh(t*1i)
```

I am pleased to see the two solutions confirmed, but I am surprised to see the second one expressed as a hyperbolic trig function of an imaginary argument.

Simplification generates a more familiar form.

simplify(ysym)

```
ans =
     -1
-cos(t)
```

Let's return to the original implicit equation and differentiate the entire equation with respect to $t$.

$$ \frac{d}{dt} \left( \dot{y}^2 + y^2 = 1 \right) $$

$$ 2 \dot{y} \ddot{y} + 2 y \dot{y} = 0 $$

We can divide through this equation by $2 \dot{y}$, provided that ${\dot{y}}$ is nonzero. This eliminates the constant solutions and produces an old friend, the harmonic oscillator.

$$ \ddot{y} + y = 0 $$

Our ODE solvers need the second order equation to be expressed as a first order system.

$$ \dot{y_1} = y_2, \ \ \dot{y_2} = -y_1 $$

Now there is no trouble computing $\sin{t}$ and $-\cos{t}$.

```
f = @(t,y)[y(2); -y(1)]
y0 = [0; -1]
ode23(f,tspan,y0)
pi_axis([-1 1])
```

```
f =
    @(t,y)[y(2);-y(1)]
y0 =
     0
    -1
```
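For comparison, a pure-Python fixed-step RK4 on the same first-order system (a sketch, not the `ode23` run) reproduces $\sin{t}$ and $-\cos{t}$ from the initial condition $(0, -1)$:

```python
import math

def f(y):
    # First-order system for y'' + y = 0:  y1' = y2,  y2' = -y1
    return [y[1], -y[0]]

def rk4_system(f, y, t_end, n_steps=3000):
    """Fixed-step classical RK4 for an autonomous first-order system."""
    h = t_end / n_steps
    for _ in range(n_steps):
        k1 = f(y)
        k2 = f([y[i] + 0.5 * h * k1[i] for i in range(2)])
        k3 = f([y[i] + 0.5 * h * k2[i] for i in range(2)])
        k4 = f([y[i] + h * k3[i] for i in range(2)])
        y = [y[i] + (h / 6.0) * (k1[i] + 2*k2[i] + 2*k3[i] + k4[i])
             for i in range(2)]
    return y

y = rk4_system(f, [0.0, -1.0], math.pi)
# y1(t) = sin(t), y2(t) = -cos(t); at t = pi: sin(pi) = 0, -cos(pi) = 1
assert abs(y[0]) < 1e-6
assert abs(y[1] - 1.0) < 1e-6
```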

I have been using this little function to label the x-axis with multiples of $\pi$ and scale the y-axis.

```
type pi_axis
```

```
function pi_axis(yaxis)
   axis([0 pi yaxis])
   set(gca,'xtick',0:pi/4:pi, ...
       'xticklabels',{0 '\pi/4' '\pi/2' '3\pi/4' '\pi'})
end
```


Published with MATLAB® R2016a

I am launching a project that I call *Cleve's Laboratory*. My most recent Cleve's Corner column in MathWorks News & Notes is about the project. The MATLAB app itself is available for download from the MATLAB Central File Exchange.... read more >>

I am launching a project that I call *Cleve's Laboratory*. My most recent Cleve's Corner column in MathWorks News & Notes is about the project. The MATLAB app itself is available for download from the MATLAB Central File Exchange.

*Cleve’s Laboratory* collects much of the work I have done over the last several years in one place. The experiments come from this blog, my two online books, *Cleve’s Corner* columns in News & Notes, and new work.

Each experiment centers on an interactive “app” that allows experimenters to try out the ideas for themselves. Several of the experiments, including `hello_world`, `klock`, and `biorhythms`, are designed to introduce newcomers to MATLAB.

This figure shows snapshots of some of the graphical entry pages. Even these are live pages driven by MATLAB. The wave in the first icon moves every time you open the page. The clock reads the correct time. The Sudoku puzzle changes. The double pendulum swings.

Here are the apps in the current release. If you have been following this blog, you might recognize some of them.

- logo_wave
- biorhythms
- klock
- lifex
- fibonacci
- fern
- house_mult
- t_puzzle
- tictactoe
- flame
- predprey
- mandelbrot
- durerperm
- waterwave
- expshow
- sudoku
- orbits
- golden_spiral
- walker
- hello_world
- blackjack
- swinger
- eigsvdapp
- censusapp
- touchtone
- pdeapp
- interpapp
- waves
- tumbling_box


Published with MATLAB® R2016a

This is a follow-up to my previous follow-up, posted several days ago. A very careful reader, Bruno Bazzano, contributed a comment pointing out what he called "a small typo" in my code for the classic Gram-Schmidt algorithm. It is more than a small typo, it is a serious blunder. I must correct the code, then do more careful experiments and reword my conclusions.

Bruno noticed that in the inner loop of my classic Gram-Schmidt function, the expression `Q(:,k-1)` should be `Q(:,1:k-1)`. I had dropped the two characters, " `1:` ". I need to orthogonalize against all the previous vectors, not just one of them. Here is the corrected code.

```
type gs
```

```
function [Q,R] = gs(X)
% Classical Gram-Schmidt.  [Q,R] = gs(X);
% G. W. Stewart, "Matrix Algorithms, Volume 1", SIAM, 1998.
   [n,p] = size(X);
   Q = zeros(n,p);
   R = zeros(p,p);
   for k = 1:p
      Q(:,k) = X(:,k);
      if k ~= 1
         R(1:k-1,k) = Q(:,1:k-1)'*Q(:,k);
         Q(:,k) = Q(:,k) - Q(:,1:k-1)*R(1:k-1,k);
      end
      R(k,k) = norm(Q(:,k));
      Q(:,k) = Q(:,k)/R(k,k);
   end
end
```
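Transcribed into pure Python (a sketch assuming only the standard library; this `gs` returns just `Q`), the corrected inner loop computes all the projection coefficients for column $k$ against the original column and subtracts them in one sweep, ranging over *all* previous columns:

```python
import math

def gs(X):
    """Classical Gram-Schmidt on the columns of X (X is a list of rows).
    Pure-Python sketch of the corrected algorithm; R is omitted."""
    n, p = len(X), len(X[0])
    Q = [[0.0] * p for _ in range(n)]
    for k in range(p):
        q0 = [X[i][k] for i in range(n)]
        # Corrected: project against columns 0..k-1, not just column k-1
        r = [sum(Q[i][j] * q0[i] for i in range(n)) for j in range(k)]
        q = [q0[i] - sum(r[j] * Q[i][j] for j in range(k))
             for i in range(n)]
        nrm = math.sqrt(sum(qi * qi for qi in q))
        for i in range(n):
            Q[i][k] = q[i] / nrm
    return Q

# Q'Q should be close to the identity for a well conditioned input
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]
Q = gs(X)
for a in range(2):
    for b in range(2):
        dot = sum(Q[i][a] * Q[i][b] for i in range(3))
        assert abs(dot - (1.0 if a == b else 0.0)) < 1e-12
```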

By the way, this points out the importance of publishing codes.

For various matrices `X`, let's again check how the three algorithms perform on two tasks, accuracy and orthogonality. How close is `Q*R` to `X`? And how close is `Q'*Q` to `I`? I have modified the `compare` function in last week's post to return `ortherr`.

First, try a well conditioned matrix, a magic square of odd order.

n = 7; X = magic(n)

```
X =
    30    39    48     1    10    19    28
    38    47     7     9    18    27    29
    46     6     8    17    26    35    37
     5    14    16    25    34    36    45
    13    15    24    33    42    44     4
    21    23    32    41    43     3    12
    22    31    40    49     2    11    20
```

Check its condition.

kappa = condest(X)

kappa = 8.5681

Do the comparison.

compare(X);

```
                  Classic     Modified   Householder
QR error         1.52e-16     6.09e-17     5.68e-16
Orthogonality    1.87e-15     1.53e-15     1.96e-15
```

The `orthogonality` value in the `Classic` column is now at roundoff level. All three algorithms do well with both accuracy and orthogonality on this matrix.

I want to do two experiments with matrices that are poorly conditioned, but not exactly singular. First, Hilbert matrices of increasing order.

```
kappa = zeros(8,1);
ortherr = zeros(8,3);
for n = 1:8
   X = hilb(n);
   kappa(n) = condest(X);
   ortherr(n,:) = compare(X);
end
```

Plot the results with a logarithmic scale.

```
semilogy([ortherr eps*kappa eps*kappa.^2],'o-')
axis([0 9 10^(-16) 10^5])
legend({'gs','mgs','house','eps*kappa','eps*kappa^2'}, ...
   'location','northwest')
title('Loss of orthogonality')
xlabel('n')
```

The loss of orthogonality of GS is worse than the other two and roughly varies like the square of the condition number, `kappa`. The loss of orthogonality of MGS varies like `kappa` to the first power. Householder does well with orthogonality.

Another set of nonsingular, but poorly conditioned, matrices, generated by adding a small value to the diagonal of the magic square of order 8.

```
M = magic(8);
I = eye(8);
for k = 1:8
   s = 10^(-k);
   X = M + s*I;
   kappa(k) = condest(X);
   ortherr(k,:) = compare(X);
end
```

Plot the results with a logarithmic scale.

```
semilogy([ortherr eps*kappa eps*kappa.^2],'o-')
axis([0 9 10^(-16) 10^5])
legend({'gs','mgs','house','eps*kappa','eps*kappa^2'}, ...
   'location','northwest')
title('Loss of orthogonality')
xlabel('log10(1/s)')
```

Again, the orthogonality of GS varies like `kappa^2`, and the orthogonality of MGS varies like `kappa`.

Finally, an exactly singular matrix, a magic square of even order.

n = 8; X = magic(n); compare(X);

```
                  Classic     Modified   Householder
QR error         4.78e-17     8.54e-17     4.85e-16
Orthogonality    5.17e+00     2.16e+00     1.30e-15
```

All three algorithms do well with accuracy. Both Gram-Schmidts fail completely with orthogonality. Householder still does well with orthogonality.

All three of these algorithms provide `Q` and `R` that do a good job of reproducing the data `X`, that is

`Q*R` is always close to `X` for all three algorithms.

On the other hand, their behavior is very different when it comes to producing orthogonality.

- Classic Gram-Schmidt. Loss of orthogonality can be much worse than the other two when `X` is badly conditioned.
- Modified Gram-Schmidt. `Q'*Q` varies like the condition of `X` and fails completely when `X` is singular.
- Householder triangularization. `Q'*Q` is always close to `I`.

G. W. Stewart, *Matrix Algorithms: Volume 1: Basic Decompositions*, SIAM, xix+458, 1998. <http://epubs.siam.org/doi/book/10.1137/1.9781611971408>

Ake Bjorck, *Numerical Methods for Least Squares Problems*, SIAM, xvii+407, 1996. <http://epubs.siam.org/doi/book/10.1137/1.9781611971484>


Published with MATLAB® R2016a

This is a follow-up to my previous post. Classical Gram-Schmidt and Modified Gram-Schmidt are two algorithms for orthogonalizing a set of vectors. Householder elementary reflectors can be used for the same task. The three algorithms have very different roundoff error properties.

As I did in my previous post, I am using Pete Stewart's book *Matrix Algorithms, Volume I: Basic Decompositions*. His pseudocode is MATLAB ready.

The classic Gram-Schmidt algorithm is the first thing you might think of for producing an orthogonal set of vectors. For each vector in your data set, remove its projection onto the data set, normalize what is left, and include it in the orthogonal set. Here is the code. `X` is the original set of vectors, `Q` is the resulting set of orthogonal vectors, and `R` is the set of coefficients, organized into an upper triangular matrix.

```
type gs
```

```
function [Q,R] = gs(X)
% Classical Gram-Schmidt.  [Q,R] = gs(X);
% G. W. Stewart, "Matrix Algorithms, Volume 1", SIAM, 1998.
   [n,p] = size(X);
   Q = zeros(n,p);
   R = zeros(p,p);
   for k = 1:p
      Q(:,k) = X(:,k);
      if k ~= 1
         R(1:k-1,k) = Q(:,k-1)'*Q(:,k);
         Q(:,k) = Q(:,k) - Q(:,1:k-1)*R(1:k-1,k);
      end
      R(k,k) = norm(Q(:,k));
      Q(:,k) = Q(:,k)/R(k,k);
   end
end
```

This is a rather different algorithm, not just a simple modification of classical Gram-Schmidt. The idea is to orthogonalize against the emerging set of vectors instead of against the original set. There are two variants, a column-oriented one and a row-oriented one. They produce the same results, in different order. Here is the column version,

```
type mgs
```

```
function [Q,R] = mgs(X)
% Modified Gram-Schmidt.  [Q,R] = mgs(X);
% G. W. Stewart, "Matrix Algorithms, Volume 1", SIAM, 1998.
   [n,p] = size(X);
   Q = zeros(n,p);
   R = zeros(p,p);
   for k = 1:p
      Q(:,k) = X(:,k);
      for i = 1:k-1
         R(i,k) = Q(:,i)'*Q(:,k);
         Q(:,k) = Q(:,k) - R(i,k)*Q(:,i);
      end
      R(k,k) = norm(Q(:,k));
      Q(:,k) = Q(:,k)/R(k,k);
   end
end
```
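The distinction is easiest to see in a pure-Python sketch (standard library only, my own transcription): in modified Gram-Schmidt each coefficient is computed from the column *as it currently stands*, not from the original column. On a mildly ill conditioned Lauchli-style matrix, `Q'*Q` stays close to `I`.

```python
import math

def mgs(X):
    """Modified Gram-Schmidt on the columns of X (X is a list of rows).
    Each inner product uses the partially orthogonalized column."""
    n, p = len(X), len(X[0])
    Q = [[0.0] * p for _ in range(n)]
    for k in range(p):
        q = [X[i][k] for i in range(n)]
        for j in range(k):
            # Inner product against the CURRENT, already-updated q
            r = sum(Q[i][j] * q[i] for i in range(n))
            q = [q[i] - r * Q[i][j] for i in range(n)]
        nrm = math.sqrt(sum(qi * qi for qi in q))
        for i in range(n):
            Q[i][k] = q[i] / nrm
    return Q

# A Lauchli-like matrix: nearly parallel columns, condition ~ 1e4
X = [[1.0, 1.0, 1.0],
     [1e-4, 0.0, 0.0],
     [0.0, 1e-4, 0.0],
     [0.0, 0.0, 1e-4]]
Q = mgs(X)
orth_err = max(abs(sum(Q[i][a] * Q[i][b] for i in range(4))
                   - (1.0 if a == b else 0.0))
               for a in range(3) for b in range(3))
assert orth_err < 1e-8   # MGS loses orthogonality like kappa, not kappa^2
```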

Compute `R` by applying Householder reflections to `X` a column at a time. Do not actually compute `Q`, just save the vectors that generate the reflections. See the description and codes from my previous post.

```
type house_qr
```

```
function [U,R] = house_qr(A)
% Householder reflections for QR decomposition.
% [U,R] = house_qr(A) returns
% U, the reflector generators for use by house_apply.
% R, the upper triangular factor.
   H = @(u,x) x - u*(u'*x);
   [m,n] = size(A);
   U = zeros(m,n);
   R = A;
   for j = 1:min(m,n)
      u = house_gen(R(j:m,j));
      U(j:m,j) = u;
      R(j:m,j:n) = H(u,R(j:m,j:n));
      R(j+1:m,j) = 0;
   end
end
```
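For readers without the companion `house_gen` and `house_apply` functions, here is a self-contained pure-Python sketch of Householder QR (my own minimal version, using the common reflector convention $H = I - 2uu^T/u^Tu$ with $u = x + \mbox{sign}(x_1)\|x\|e_1$, a slightly different scaling from the MATLAB code above):

```python
import math

def house_qr(A):
    """Householder QR sketch: returns R (upper triangular) and the
    list of reflector vectors. Reflector: H = I - 2*u*u'/(u'*u)."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    us = []
    for j in range(min(m, n)):
        x = [R[i][j] for i in range(j, m)]
        nrm = math.sqrt(sum(xi * xi for xi in x))
        u = x[:]
        u[0] += math.copysign(nrm, x[0] if x[0] != 0 else 1.0)
        uu = sum(ui * ui for ui in u)
        if uu > 0:
            for col in range(j, n):
                s = sum(u[i] * R[j + i][col] for i in range(len(u)))
                for i in range(len(u)):
                    R[j + i][col] -= 2.0 * s * u[i] / uu
        us.append(u)
        for i in range(j + 1, m):
            R[i][j] = 0.0
    return R, us

A = [[12.0, -51.0], [6.0, 167.0], [-4.0, 24.0]]
R, _ = house_qr(A)
# R is upper triangular; |R(1,1)| equals the norm of A's first column, 14
assert abs(abs(R[0][0]) - 14.0) < 1e-12
assert R[1][0] == 0.0 and R[2][0] == 0.0
```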

For various matrices `X`, let's check how the three algorithms perform on two tasks, accuracy and orthogonality. How close is `Q*R` to `X`? And how close is `Q'*Q` to `I`?

```
type compare.m
```

```
function compare(X)
% compare(X).  Compare three QR decompositions.
   I = eye(size(X));
   qrerr = zeros(1,3);
   ortherr = zeros(1,3);

   %% Classic Gram-Schmidt
   [Q,R] = gs(X);
   qrerr(1) = norm(Q*R-X,inf)/norm(X,inf);
   ortherr(1) = norm(Q'*Q-I,inf);

   %% Modified Gram-Schmidt
   [Q,R] = mgs(X);
   qrerr(2) = norm(Q*R-X,inf)/norm(X,inf);
   ortherr(2) = norm(Q'*Q-I,inf);

   %% Householder QR Decomposition
   [U,R] = house_qr(X);
   QR = house_apply(U,R);
   QQ = house_apply_transpose(U,house_apply(U,I));
   qrerr(3) = norm(QR-X,inf)/norm(X,inf);
   ortherr(3) = norm(QQ-I,inf);

   %% Report results
   fprintf('\n                  Classic     Modified   Householder\n')
   fprintf('QR error       %10.2e   %10.2e   %10.2e\n',qrerr)
   fprintf('Orthogonality  %10.2e   %10.2e   %10.2e\n',ortherr)
```

First, try a well conditioned matrix, a magic square of odd order.

n = 7; X = magic(n)

```
X =
    30    39    48     1    10    19    28
    38    47     7     9    18    27    29
    46     6     8    17    26    35    37
     5    14    16    25    34    36    45
    13    15    24    33    42    44     4
    21    23    32    41    43     3    12
    22    31    40    49     2    11    20
```

Check its condition.

kappa = condest(X)

kappa = 8.5681

Do the comparison.

compare(X);

```
                  Classic     Modified   Householder
QR error         1.73e-16     6.09e-17     5.68e-16
Orthogonality    3.20e+00     1.53e-15     1.96e-15
```

All three algorithms do well with accuracy, but classic Gram-Schmidt fails with orthogonality.

Next, try a Hilbert matrix that is poorly conditioned, but not exactly singular.

n = 7; X = hilb(n)

```
X =
    1.0000    0.5000    0.3333    0.2500    0.2000    0.1667    0.1429
    0.5000    0.3333    0.2500    0.2000    0.1667    0.1429    0.1250
    0.3333    0.2500    0.2000    0.1667    0.1429    0.1250    0.1111
    0.2500    0.2000    0.1667    0.1429    0.1250    0.1111    0.1000
    0.2000    0.1667    0.1429    0.1250    0.1111    0.1000    0.0909
    0.1667    0.1429    0.1250    0.1111    0.1000    0.0909    0.0833
    0.1429    0.1250    0.1111    0.1000    0.0909    0.0833    0.0769
```

Check its condition.

kappa = condest(X)

kappa = 9.8519e+08

Do the comparison.

compare(X)

```
                  Classic     Modified   Householder
QR error         5.35e-17     5.35e-17     8.03e-16
Orthogonality    5.21e+00     1.22e-08     1.67e-15
```

All three algorithms do well with accuracy. Classic Gram-Schmidt fails completely with orthogonality. The orthogonality of MGS depends upon `kappa`. Householder does well with orthogonality.

Finally, an exactly singular matrix, a magic square of even order.

n = 8; X = magic(n)

```
X =
    64     2     3    61    60     6     7    57
     9    55    54    12    13    51    50    16
    17    47    46    20    21    43    42    24
    40    26    27    37    36    30    31    33
    32    34    35    29    28    38    39    25
    41    23    22    44    45    19    18    48
    49    15    14    52    53    11    10    56
     8    58    59     5     4    62    63     1
```

Check its rank.

rankX = rank(X)

rankX = 3

Do the comparison.

compare(X)

```
                  Classic     Modified   Householder
QR error         1.43e-16     8.54e-17     4.85e-16
Orthogonality    5.41e+00     2.16e+00     1.30e-15
```

Again, all three algorithms do well with accuracy. Both Gram-Schmidts fail completely with orthogonality. Householder still does well with orthogonality.

All three of these algorithms provide `Q` and `R` that do a good job of reproducing the data `X`, that is

`Q*R` is always close to `X` for all three algorithms.

On the other hand, their behavior is very different when it comes to producing orthogonality.

- Classic Gram-Schmidt. `Q'*Q` is almost never close to `I`.
- Modified Gram-Schmidt. `Q'*Q` depends upon the condition of `X` and fails completely when `X` is singular.
- Householder triangularization. `Q'*Q` is always close to `I`.

G. W. Stewart, *Matrix Algorithms: Volume 1: Basic Decompositions*, SIAM, xix+458, 1998. <http://epubs.siam.org/doi/book/10.1137/1.9781611971408>


Published with MATLAB® R2016a

The QR decomposition is often the first step in algorithms for solving many different matrix problems, including linear systems, eigenvalues, and singular values. Householder reflections are the preferred tool for computing the QR decomposition.

Alston Householder (1904-1993) is one of the pioneers of modern numerical linear algebra. He was a member of the mathematics division of Oak Ridge National Laboratory for over 20 years, from 1946 until 1969, and was also a Professor at the University of Tennessee.

Alston served as President of both SIAM and ACM. He introduced what he called *elementary Hermitian matrices* in a paper in the Journal of the ACM in 1958.

Alston was head of the organizing committee for the Gatlinburg Conferences on Numerical Algebra. A photo of the 1964 committee is available in your MATLAB `demos` directory.

```
load gatlin
image(X)
colormap(map)
```

The Gatlinburg Conferences are now called the Householder Conferences. I wrote about them in MATLAB News & Notes.

My colleague and friend G. W. Stewart is a Distinguished University Professor Emeritus at the Department of Computer Science, University of Maryland. Everybody knows him as "Pete". He has never been able to satisfactorily explain the origins of "Pete" to me. It somehow goes back through his father to his grandfather and maybe great grandfather, who were also nicknamed "Pete".

Pete was Householder's Ph.D. student at the University of Tennessee.

Pete has written several books on numerical linear algebra. The reference for my blog today is his book "Matrix Algorithms, Volume I: Basic Decompositions", published by SIAM. His pseudocode is MATLAB ready.

The QR decomposition expresses a matrix as the product of an orthogonal matrix and an upper triangular matrix. The letter Q is a substitute for the letter O from "orthogonal" and the letter R is from "right", an alternative for "upper".

The decomposition is available explicitly from the MATLAB function `qr`. But, more importantly, the decomposition is part of the fundamental MATLAB linear equation solver denoted by backslash, " `\` ", as well as both the `eig` and `svd` functions for dense matrices.

For any matrix $A$ we write

$$ A = QR $$

Operations based upon orthogonal matrices are very desirable in numeric computation because they do not magnify errors, either those inherited from the underlying data, or those introduced by floating point arithmetic.
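The no-magnification property rests on the fact that an orthogonal $Q$ preserves length, $\|Qx\| = \|x\|$. A quick numerical check, sketched here in Python with NumPy (any environment would do):

```python
import numpy as np

rng = np.random.default_rng(1)

# Manufacture a random orthogonal Q from a QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
x = rng.standard_normal(5)

# Multiplication by Q changes direction but not length, so errors
# carried in x are rotated, never amplified.
print(np.linalg.norm(Q @ x), np.linalg.norm(x))
```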

A nifty example, taken from the *Wikipedia* page on "QR decomposition", is unusual because $A$, $R$, and a renormalization of $Q$ all have integer entries.

A = [12 -51 4; 6 167 -68; -4 24 -41]
[Q,R] = qr(A)

A =
    12   -51     4
     6   167   -68
    -4    24   -41

Q =
   -0.8571    0.3943    0.3314
   -0.4286   -0.9029   -0.0343
    0.2857   -0.1714    0.9429

R =
  -14.0000  -21.0000   14.0000
         0 -175.0000   70.0000
         0         0  -35.0000

Scale $Q$ to get integers.

cQ = 175*Q

cQ =
  -150.0000   69.0000   58.0000
   -75.0000 -158.0000   -6.0000
    50.0000  -30.0000  165.0000

Let's check that $Q$ and $R$ reproduce $A$.

QR = Q*R

QR =
   12.0000  -51.0000    4.0000
    6.0000  167.0000  -68.0000
   -4.0000   24.0000  -41.0000

A Householder reflection is characterized by a vector $u$, which, following Pete's convention, is normalized to have

$$ ||u|| = \sqrt{2} $$

It is usual to define a Householder reflection by a matrix. But I want to define it in a different way, by a MATLAB anonymous function.

H = @(u,x) x - u*(u'*x);

This emphasizes the way in which the reflection should be computed. For any vector $x$, find its projection onto $u$ and then subtract that multiple of $u$ from $x$. In the following picture $u_p$ is a line perpendicular to $u$. In more dimensions $u_p$ would be the plane through the origin perpendicular to $u$. With the $\sqrt{2}$ normalization of $u$, the effect of $H$ is to transform any vector $x$ to its mirror image $Hx$ on the other side of this plane. (Vectors in the plane are left there by $H$.)

house_pic
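The mirror-image property is easy to verify numerically. Here is a sketch in Python with NumPy of the same anonymous function: with the $\sqrt{2}$ normalization, reflecting preserves length, and reflecting twice restores the original vector.

```python
import numpy as np

# Python analog of the MATLAB anonymous function H = @(u,x) x - u*(u'*x).
# Assumes norm(u) = sqrt(2).
H = lambda u, x: x - u * (u @ x)

rng = np.random.default_rng(0)
v = rng.standard_normal(4)
u = np.sqrt(2) * v / np.linalg.norm(v)   # normalize so norm(u) = sqrt(2)
x = rng.standard_normal(4)

print(np.linalg.norm(H(u, x)), np.linalg.norm(x))  # same length
print(np.allclose(H(u, H(u, x)), x))               # H is its own inverse
```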

Where do we get the vector `u` that characterizes the reflection? We want to operate on the columns of a matrix to introduce zeros below the diagonal. Let's begin with the first column, call it `x`. We want to find `u` so that `Hx = H(u,x)` is zero below the first element. And, since the reflection is to preserve length, the only nonzero element in `Hx` should have `abs(Hx(1)) == norm(x)`.

Start with `x` normalized to have length one.

u = x/norm(x)

Now add `+1` or `-1` to `u(1)`, choosing the sign to match the sign of `u(1)`. The other choice involves cancellation and is less desirable numerically.

u(1) = u(1) + sign(u(1))

Finally, scale `u` by `sqrt(abs(u(1)))`.

u = u/sqrt(abs(u(1)))

It turns out that this makes `norm(u)` equal to `sqrt(2)`. And a bit more algebra shows that the resulting reflection zeros out all but the first element of `x`.
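The recipe can be checked numerically. Here is my own translation of the three steps into Python with NumPy, applied to the first column of the *Wikipedia* example:

```python
import numpy as np

def house_gen(x):
    # Follow the recipe: normalize, push u(1) away from zero with a
    # matching sign (avoiding cancellation), then rescale.
    u = x / np.linalg.norm(x)
    u[0] += np.copysign(1.0, u[0])
    return u / np.sqrt(abs(u[0]))

x = np.array([12.0, 6.0, -4.0])   # norm(x) = 14
u = house_gen(x)
Hx = x - u * (u @ x)

print(np.linalg.norm(u))          # sqrt(2)
print(Hx)                         # [-14, ~0, ~0]
```

`np.copysign(1.0, 0.0)` is `+1`, which matches the modified sign convention `sign(0) = 1` used in the MATLAB listing below.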

The figure above illustrates the situation because `Hx` has only one nonzero component.

Here is the code, including the case where `x` is already all zero.

```
type house_gen
```

```
function u = house_gen(x)
% u = house_gen(x)
% Generate Householder reflection.
% u = house_gen(x) returns u with norm(u) = sqrt(2), and
% H(u,x) = x - u*(u'*x) = -+ norm(x)*e_1.

    % Modify the sign function so that sign(0) = 1.
    sig = @(u) sign(u) + (u==0);

    nu = norm(x);
    if nu ~= 0
        u = x/nu;
        u(1) = u(1) + sig(u(1));
        u = u/sqrt(abs(u(1)));
    else
        u = x;
        u(1) = sqrt(2);
    end
end
```

Let's try this on the first column of the *Wikipedia* example.

x = [12 6 -4]'

x =
    12
     6
    -4

The vector `u` generated is

u = house_gen(x)

u =
    1.3628
    0.3145
   -0.2097

The resulting reflection has the desired effect.

Hx = H(u,x)

Hx =
  -14.0000
   -0.0000
    0.0000

`Hx(1)` is equal to `-norm(x)` and the other elements of `Hx` are zero.

Our anonymous function can be represented by a matrix. This is the usual way of defining Householder reflections.

I = eye(3);
M = H(u,I)

M =
   -0.8571   -0.4286    0.2857
   -0.4286    0.9011    0.0659
    0.2857    0.0659    0.9560

In general

$$M = I - uu'$$

Multiplication by this matrix produces the same reflection as the anonymous function, but requires $n^2$ operations, instead of $2n$.

Recall that Gaussian elimination can be described by a sequence of matrix multiplications, but it is actually carried out by subtracting multiples of rows from other rows. There is an anonymous function for this elimination operation, but it is not so easy to express the pivoting.

We are now ready to compute the QR decomposition. Pass over the columns of the input matrix, using Householder reflections to introduce zeros below the diagonal. This produces the R matrix. It is inefficient and usually not necessary to actually compute Q. Just save the `u` 's.

Here is where the way we have expressed the anonymous function is important. When `x` is a vector, `u'*x` is a scalar and we could have written it in front of `u`, like this.

H = @(u,x) x - (u'*x)*u;

But we want to apply `H` to several columns of a matrix at once, so we have written `(u'*x)` after `u`, like this,

H = @(u,x) x - u*(u'*x);
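To see why the ordering matters, here is a sketch in Python with NumPy showing that the single outer-product form reflects all the columns of a matrix in one sweep:

```python
import numpy as np

# With x a matrix X, u'X is a row vector and the outer product
# u*(u'X) updates every column at once.
H = lambda u, X: X - np.outer(u, u @ X)

rng = np.random.default_rng(2)
v = rng.standard_normal(3)
u = np.sqrt(2) * v / np.linalg.norm(v)   # norm(u) = sqrt(2)
X = rng.standard_normal((3, 4))

# Same result as reflecting each column separately.
cols = np.column_stack([X[:, j] - u * (u @ X[:, j]) for j in range(4)])
print(np.allclose(H(u, X), cols))
```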

```
type house_qr
```

```
function [R,U] = house_qr(A)
% Householder reflections for QR decomposition.
% [R,U] = house_qr(A) returns
% R, the upper triangular factor, and
% U, the reflector generators for use by house_apply.

    H = @(u,x) x - u*(u'*x);
    [m,n] = size(A);
    U = zeros(m,n);
    R = A;
    for j = 1:min(m,n)
        u = house_gen(R(j:m,j));
        U(j:m,j) = u;
        R(j:m,j:n) = H(u,R(j:m,j:n));
        R(j+1:m,j) = 0;
    end
end
```

Use a magic square as an example.

A = magic(6)

A =
    35     1     6    26    19    24
     3    32     7    21    23    25
    31     9     2    22    27    20
     8    28    33    17    10    15
    30     5    34    12    14    16
     4    36    29    13    18    11

Compute its decomposition.

[R,U] = house_qr(A)

R =
  -56.3471  -16.4693  -30.0459  -39.0969  -38.0321  -38.6710
         0  -54.2196  -34.8797  -23.1669  -25.2609  -23.2963
         0         0   32.4907   -8.9182  -11.2895   -7.9245
         0         0         0   -7.6283    3.9114   -7.4339
         0         0         0         0   -3.4197   -6.8393
         0         0         0         0         0   -0.0000

U =
    1.2732         0         0         0         0         0
    0.0418    1.2568         0         0         0         0
    0.4321    0.0451   -1.1661         0         0         0
    0.1115    0.3884    0.4557    1.0739         0         0
    0.4182   -0.0108    0.5942   -0.6455    1.0796         0
    0.0558    0.5171    0.2819   -0.6558   -0.9135    1.4142

The fact that `R(6,6)` is equal to zero tells us that the magic square of order 6 has rank less than 6 and so is singular.

```
type house_apply
type house_apply_transpose
```

```
function Z = house_apply(U,X)
% Apply Householder reflections.
% Z = house_apply(U,X), with U from house_qr,
% computes Q*X without actually computing Q.

    H = @(u,x) x - u*(u'*x);
    Z = X;
    [~,n] = size(U);
    for j = n:-1:1
        Z = H(U(:,j),Z);
    end
end

function Z = house_apply_transpose(U,X)
% Apply Householder transposed reflections.
% Z = house_apply_transpose(U,X), with U from house_qr,
% computes Q'*X without actually computing Q'.

    H = @(u,x) x - u*(u'*x);
    Z = X;
    [~,n] = size(U);
    for j = 1:n
        Z = H(U(:,j),Z);
    end
end
```

We can use `house_apply` to get the matrix $Q$ of the QR decomposition by applying the transformations to the identity matrix.

I = eye(size(U));
Q = house_apply(U,I)

Q =
   -0.6211    0.1702   -0.2070   -0.4998    0.2062    0.5000
   -0.0532   -0.5740   -0.4500   -0.2106   -0.6487   -0.0000
   -0.5502    0.0011   -0.4460    0.4537    0.2062   -0.5000
   -0.1420   -0.4733    0.3763   -0.5034    0.3329   -0.5000
   -0.5324    0.0695    0.6287    0.2096   -0.5220   -0.0000
   -0.0710   -0.6424    0.1373    0.4501    0.3329    0.5000

Check that `Q` is orthogonal.

QQ = Q'*Q

QQ =
    1.0000   -0.0000    0.0000    0.0000   -0.0000   -0.0000
   -0.0000    1.0000   -0.0000   -0.0000   -0.0000    0.0000
    0.0000   -0.0000    1.0000   -0.0000   -0.0000    0.0000
    0.0000   -0.0000   -0.0000    1.0000   -0.0000   -0.0000
   -0.0000   -0.0000   -0.0000   -0.0000    1.0000    0.0000
   -0.0000    0.0000    0.0000   -0.0000    0.0000    1.0000

And that `Q*R` regenerates our magic square.

QR = Q*R

QR =
   35.0000    1.0000    6.0000   26.0000   19.0000   24.0000
    3.0000   32.0000    7.0000   21.0000   23.0000   25.0000
   31.0000    9.0000    2.0000   22.0000   27.0000   20.0000
    8.0000   28.0000   33.0000   17.0000   10.0000   15.0000
   30.0000    5.0000   34.0000   12.0000   14.0000   16.0000
    4.0000   36.0000   29.0000   13.0000   18.0000   11.0000

Alston S. Householder, "Unitary Triangularization of a Nonsymmetric Matrix", Journal of the ACM 5, 339-342, 1958. <http://dl.acm.org/citation.cfm?doid=320941.320947>

G. W. Stewart, *Matrix Algorithms: Volume 1: Basic Decompositions*, SIAM, xix+458, 1998. <http://epubs.siam.org/doi/book/10.1137/1.9781611971408>



A new app employs transformations of a graphic depicting a house to demonstrate matrix multiplication.

The house in the animated gif has been featured in Gil Strang's textbook, *Introduction to Linear Algebra*. A quilt inspired by the house is on the cover of the third edition.

You can rotate the house with your mouse by clicking and dragging the peak of the roof. You can stretch the house horizontally by clicking and dragging either side. You can stretch the house vertically by dragging the floor.

The motion is effected by matrix multiplication. Rotating the roof through an angle $\theta$ defines a 2-by-2 orthogonal matrix $U$.

$$ U = \left( \begin{array}{rr} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{array} \right) $$

Moving a side defines a horizontal scaling $\sigma_1$. Moving the floor defines a vertical scaling $\sigma_2$. Together they form the diagonal scaling matrix $S$.

$$ S = \left( \begin{array}{rr} \sigma_1 & 0 \\ 0 & \sigma_2 \end{array} \right) $$

The resulting 2-by-2 transformation matrix $A$ is the product of $U$ and $S$.

$$A = U S $$

Watch how the three matrices change as you manipulate the house. The scale factor $\sigma_1$ operates on the first column of $U$, while $\sigma_2$ scales the second.
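The bookkeeping behind that observation is just a matrix product. A sketch in Python with NumPy (the angle and scale factors are arbitrary choices of mine):

```python
import numpy as np

theta = 0.3
U = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
S = np.diag([2.0, 0.5])        # sigma_1 = 2, sigma_2 = 0.5

A = U @ S

# sigma_1 scales the first column of U, sigma_2 the second.
print(np.allclose(A[:, 0], 2.0 * U[:, 0]))
print(np.allclose(A[:, 1], 0.5 * U[:, 1]))
```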

The code for this app is available here.



MATLAB Central is celebrating its 15th birthday this fall. In honor of the occasion, MathWorks bloggers are reminiscing about their first involvement with the Web site. My first contribution to the File Exchange was not MATLAB software, but rather a collection of documents that I called the Pentium Papers. I saved this material in November and December of 1994 when I was deeply involved in the Intel Pentium Floating Point Division Affair.

Pentium key chain. Image thanks to Thomas Johansson, thomas@cpucollection.se.

*Bad companies are destroyed by crises;
Good companies survive them;
Great companies are improved by them.*

–Andy Grove, Chairman, Intel Corp., December 1994

As I said in my blog post in 2013, “The Pentium division bug episode in the fall of 1994 was a defining moment for the MathWorks, for the Internet, for Intel Corporation, and for me personally.”

In the fall of 1994 the Internet was not anything like it is today. The World Wide Web was in its infancy. All of the connections were text based. The Mosaic graphical browser was only a few months old and few people knew about it. Internet Explorer did not yet exist and Google was four years in the future. We did have email. Some of us used FTP, File Transfer Protocol. And the social media of the day were the text only news groups, which had names like `comp.soft-sys.matlab` and `comp.soft-sys.intel`.

But calling the news groups “social media” is a stretch. The term “social media” didn’t exist in 1994. And almost all of the participants in the news groups were geeks and nerds in universities and industrial research labs.

Thomas Nicely, a Professor of Mathematics at Lynchburg College in Virginia, did research on prime numbers, especially *twin primes*. For his computational experiments he employed several IBM PCs running Intel 486 processors. In the summer of 1994 he added a PC with the new Pentium chip. Much to his surprise, he found the Pentium gave a different result for the reciprocal of one of his primes.

On October 30, Nicely emailed several other Pentium users that he had discovered a bug in the Pentium’s floating point arithmetic. The news soon reached Terje Mathisen, a computer jock at Norsk Hydro in Oslo, Norway, who had written about the accuracy of Intel’s transcendental functions. Mathisen confirmed the bug and, on November 3, posted a test program to `comp.soft-sys.intel`. A day later Andreas Kaiser in Germany used Mathisen’s program to post a list of numbers whose reciprocals were being computed to what appeared to be only single precision accuracy.

Tim Coe designed floating point hardware for an aerospace contractor in California. He didn’t have access to a Pentium machine. But from Kaiser’s list of erroneous reciprocals he was able to reverse engineer Intel’s division algorithm. He expected that certain divisions, involving quantities with specific bit patterns, would produce results that were much less accurate than even single precision. He drove to a local computer store, ran a calculator program on a Pentium in the showroom, and confirmed his prediction.

I first heard about the FDIV bug (FDIV is the mnemonic for Floating point Division on x86 processors) in early November from an email list for floating point arithmetic. It appeared to be a single/double precision hardware glitch. Annoying, but not surprising. I started to follow `comp.soft-sys.intel` anyway. But when Coe posted his results, and when MathWorks tech support got a couple of queries asking how this would affect MATLAB, I got seriously interested.

On November 15, 1994, I made the first of what would become several postings to both `comp.soft-sys.intel` and `comp.soft-sys.matlab`, summarizing what I knew. I pointed out that the relative error in one of Coe’s examples is $6.1 \cdot 10^{-5}$. This is ten orders of magnitude larger than what we expect from MATLAB, or any other scientific computation using IEEE double precision. The error might not occur very often, but when it does, it can be very significant.

Within the next week the Net became very active. Several participants contacted newspapers and TV stations. Two engineers at the Jet Propulsion Laboratory convinced JPL to issue a press release announcing that they were no longer purchasing Pentium-based PCs. Reporters seeking more information found my posting and redistributed it.

On November 22 CNN sent a TV crew to MathWorks, interviewed me, and then led off their evening *Moneyline* with a story about Intel’s troubles. I spent the next day fielding phone calls from other reporters. A number of newspapers, including the *New York Times* and the *San Jose Mercury News*, ran stories on the 24th. The headline of the front page story in the *Boston Globe* was “Sorry, Wrong Number”.

As a corporation, Intel did not know how to react and handled the situation badly. They had little experience dealing with the public. Their customers were computer manufacturers, not individual computer users. They had no experience whatsoever dealing with the Internet. Their first response came in the form of a canned statement that could be faxed back to anyone contacting tech support via fax. This FAXBACK document was soon posted to the Net, but not by Intel. I’ve included a copy in the Pentium Papers.

Intel’s statement said that they had already discovered the FDIV “flaw” themselves and had fixed it in recent releases of the Pentium chip. They claimed that it would occur only once in 9 billion divide operations and that the average spreadsheet user would encounter it only once in 27,000 years of use. They provided an 800 telephone number to call for anyone doing “prime number generation or other complex mathematics”. They would interview callers and offer to replace the chip for anyone they thought really required error-free divisions.

Needless to say, this aggressive non-apology only enflamed the criticism of Intel on the Net. Nobody believed their claims about the frequency of occurrence. The frequencies came from a study Intel had made, but initially refused to release. As far as I was concerned, the problem was not the likelihood of encountering the FDIV bug, it was the fact that we had to worry about it at all.

Tim Coe, Terje Mathisen and I devised a scheme where a Pentium FDIV hardware instruction could be replaced by less than a dozen lines of software that insured all divisions were done correctly. The idea is that the divisor of each prospective division operation would be checked for the presence of certain bit patterns in the floating point fraction that made it “at risk”. When an at risk divisor and the corresponding dividend are both scaled by 15/16, the quotient remains unchanged, but the operation can then be done safely by the FDIV instruction.
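Schematically, the workaround looked something like the following Python sketch. The `at_risk` predicate here is only a placeholder of mine — the real test examined specific bit patterns in the divisor's floating point fraction, which I am not reproducing. The test division uses a dividend/divisor pair often quoted in accounts of the bug.

```python
def at_risk(y):
    # Placeholder: the actual workaround inspected particular bit
    # patterns in the binary fraction of the divisor.  Here we always
    # say "yes" just to exercise the scaling path.
    return True

def safe_divide(x, y):
    # Scaling both operands by 15/16 leaves the quotient unchanged,
    # but moves an at-risk divisor to a safe bit pattern before the
    # hardware divide is issued.
    if at_risk(y):
        x, y = x * (15.0 / 16.0), y * (15.0 / 16.0)
    return x / y

print(safe_divide(4195835.0, 3145727.0))
```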

We wanted to make our workaround widely available. Intel contacted us and, along with Peter Tang, a computer scientist who was then at Argonne and who has been a consultant to Intel, we began to work, via conference calls, with a group at Intel. It was our intention to provide the workaround to compiler writers and major software vendors, and to announce its availability on the Net. The workaround macro would replace FDIV in all PC software being developed. (It was more complicated — functions like “mod” and “rem” and a few transcendental functions like “atan” that had Pentium hardware support were also involved.)

MATLAB was the proof-of-concept for the software workaround. We built and released a special “Pentium Aware” MATLAB. Its documentation says

MATLAB detects, and optionally repairs, erroneous arithmetic results produced by Intel’s Pentium processor. Erroneous results are infrequent, but can occur in many MATLAB operations and functions. … When an erroneous result is detected, MATLAB prints a message. … Options exist to suppress the printing of the messages, count the number of occurrences, and suppress the corrections altogether.

Our public relations firm had sent out a press release with the headline

THE MATHWORKS DEVELOPS FIX FOR THE INTEL PENTIUM(tm) FLOATING POINT ERROR

At the time, MathWorks had just reached its 10th anniversary. The company name was barely known in the industries we served, and completely unheard of by the public generally. So when a press release arrived saying this obscure little company in Massachusetts had fixed the Pentium bug, it created quite a stir. I got dozens more phone calls.

I now have a folder of hundreds of press clippings from all over the world that resulted from the Pentium affair.

On December 12th IBM issued its own study. IBM had several reasons to get involved. They had invented, or at least named, the IBM PC, and one division was selling PCs employing the Pentium. Another division was developing and manufacturing its own chip, the PowerPC.

The IBM study claimed that typical spreadsheet calculations were likely to generate numbers with the “at risk” bit patterns and so FDIV errors were much more likely to occur than Intel claimed. The study also cleverly multiplied their predicted likelihood of an individual spreadsheet user encountering an error by an estimated total number of spreadsheet users worldwide to conclude “on a typical day a large number of people are making mistakes in their computation without realizing it.”

IBM announced they were suspending production of Pentium-based PCs.

Within hours of the IBM announcement, Intel’s stock price dropped 10 points. A week later Intel issued an apology and announced a no-questions-asked return policy on Pentium chips. They set up a network of service centers to handle the replacements and allocated $475 million to pay for the replacement program.

Months later, very few actual requests for replacements had been made. We had learned that encountering the FDIV error was, in fact, very unlikely. But more important, for most people, Intel’s apology was enough.

As all this was happening, I saved some of the postings from the `comp.soft-sys.intel` news groups, a few newspaper stories and a few other contemporary documents. I would respond to email requests for information with “If you have access to the Internet, you can download my Pentium Papers using anonymous ftp from `ftp.mathworks.com`.”

As the MathWorks Web site developed there was usually a note somewhere about how to get to the Pentium Papers. When we started MATLAB Central File Exchange I moved the collection there as my first contribution.

If you open the `.txt` files in the Pentium Papers with Microsoft Word, it will break up the long lines and produce a readable file.


Jim Sanderson has had a fascinating professional life. He was my PhD student in math at the University of New Mexico in the 1970s. He spent almost 20 years as a computational scientist at Los Alamos National Laboratory, working on the lab’s supercomputers. He then developed an interest in ecology, went back to school, and is now the world’s leading authority on the preservation of small wild cats around the world.

<http://globalwildlife.org/about-gwc/team/jim-sanderson-ph-d>

One of the great success stories of modern numerical linear algebra is the QR algorithm with the Francis shift for computing all the eigenvalues of a symmetric tridiagonal matrix. Jim Wilkinson showed that the algorithm was superior to all its competitors in the 1960s. So it was a key component of the Wilkinson and Reinsch *Handbook*, of *EISPACK*, and ultimately of the first MATLAB. It survives today as one of the primary routines in LAPACK and MATLAB.

Wilkinson established the algorithm’s global convergence and its asymptotic cubic convergence rate. But, curiously, he never analyzed its roundoff error behavior. This was Jim Sanderson’s PhD thesis. At the time Jim was writing his thesis, the academic computer science community was beginning to be interested in the notion of proving algorithms correct. Jim found it necessary to reorder the operations in Wilkinson’s implementation, but with this modification he was able to prove that the algorithm was correct. It was guaranteed to be globally, and cubically, convergent even in the presence of floating point roundoff errors.

After grad school, Jim joined the staff of the Los Alamos National Laboratory, located in the high desert mountains of northern New Mexico. Starting with the Manhattan Project, the Lab had originally developed the nation’s atomic and nuclear weapons. But with the limits imposed by international treaties, the Lab’s mission evolved from developing new weapons to maintaining existing stockpiles.

I’m not exactly sure what Jim did during his almost 20 years at Los Alamos. He is not allowed to tell me in much detail. At first it involved image processing. Later it involved large computer programs, very likely written in Fortran, run on whatever were the world’s fastest supercomputers of the day, that implemented partial differential equation models of complex nuclear reactions.

During his time at Los Alamos, Jim developed a hobby, nature photography. He says it was esoteric, artistic stuff, with photos of fallen leaves on the ground and dark clouds in a storm.

Eliot Porter was one of the first photographers to exhibit and publish color photographs of nature. He and his family eventually established a ranch in Tesuque, New Mexico, not far from Los Alamos. Jim came to work under Eliot and travel with him on several photography expeditions. Porter had a strong influence on Jim’s interest in nature photography and conservation.

In 1995 Jim left the Labs and, in middle age, returned to undergrad studies, this time majoring in biology, again at the University of New Mexico. After graduating for the second time from UNM, Jim went on to the highly regarded graduate program in Wildlife Ecology and Conservation at the University of Florida.

After completing grad school in Ecology at UFL, Jim joined Conservation International, working as a landscape ecologist.

By this time Jim was interested in small, wild cats. Big cats — lions, tigers, cheetahs, leopards — get most of the world’s attention. But there are over two dozen smaller wild cats, most of which we’ve never heard of and many of which are endangered. These cats have been an important focus in Jim’s second career.

Jim formed the Small Cat Conservation Alliance in 2006. That organization evolved into the Small Wild Cat Conservation Foundation in 2008.

One of the first cats to get Jim’s attention was the Andean Mountain Cat. Jim was the first scientist to confirm this cat’s existence. It lives above 14,000 feet in the Andes of Chile and Bolivia. Before Jim’s work it was known only as a mythical creature that threatened the villagers’ chickens.

Andean mountain cat. Photo by Jim Sanderson, Small Wild Cat Conservation Foundation.

Jim took this portrait of the cat in 1998. He says he brought all of his nature photography skills to bear when he took this shot. The photo was featured in an ad for Canon cameras that ran in the National Geographic magazine and that earned a sizable contribution to the Small Cat Conservation Alliance.

A few years ago Jim was on a long flight back from South America. He noticed a guy across the aisle doing some sort of mathematical looking puzzle in the airline magazine and asked him what it was. The guy introduced Jim (who had apparently been living away from the rest of civilization) to Sudoku. Jim returned to his seat, and to his laptop, and wrote a Fortran (more evidence of his lifestyle) program to play Sudoku.

When shown the solution to the puzzle, the guy across the aisle, not realizing that Jim had written an entire puzzle-solving program, was not impressed. “Took you a long time”, was the terse reaction. No more conversation there.

Jim told me about the incident some time later. I had a different reaction. “Time you learned to use MATLAB.”

I had been avoiding Sudoku because I was afraid I would get sucked in. To this day I’ve never done a puzzle by hand. But the program I wrote to demonstrate MATLAB to Jim is the same program that I’ve described in Cleve’s Corner, MathWorks News & Notes. I got sucked in in a different way.

Jared Diamond is a Professor of Geography and Physiology at UCLA. He is best known for the 1998 Pulitzer Prize winning book “Guns, Germs, and Steel: The Fates of Human Societies”. A documentary based on the book was produced by the National Geographic Society and broadcast on PBS.

Before writing about the great issues that made him famous, one of Diamond’s first interests in science was birds. How do various species of birds distribute themselves across strings of islands? Is the distribution random? Is it different from what it would be if the species did not interact?

In 1975, in a 102-page paper, “Assembly of species communities”, Diamond introduced the concepts of *competitive exclusion* and *checkerboard distributions*. He suggests that species composition on a particular island is due to minimization of unutilized resources. The resulting distributions observed across an archipelago are alternating patterns that he called *checkerboards*. The data supporting his hypothesis came from his own extensive personal observations in the Bismarck Archipelago near Papua New Guinea in the western Pacific Ocean, as well as that of other visiting observers and natives.

In 1979, Edward Connor and Daniel Simberloff set off a controversy that continues to this day when they published “Assembly of species communities: chance or competition?” They pointed out that Diamond’s observed patterns had not been tested against appropriate null hypotheses. They suggested that the observed patterns could not be distinguished from patterns resulting from random distributions of the species across the subject area.

Sanderson became interested in the controversy when he read the papers while he was in school. He figured he could settle whether the patterns were random with some computer simulations. He needed to generate random matrices while maintaining constraints on the row and column sums. One approach involving the Knight’s Tour proved to be too slow to generate the $10^6$ random matrices required in this situation and faster alternatives were developed.
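The constraint-preserving shuffles can be sketched as follows. This is my own illustration in Python with NumPy of the standard 2-by-2 “checkerboard swap” idea, not Jim’s actual Fortran: flipping a submatrix equal to `[[1,0],[0,1]]` or `[[0,1],[1,0]]` moves the 1s to the other diagonal and so leaves every row sum and column sum fixed.

```python
import numpy as np

def checkerboard_shuffle(M, nswaps=1000, seed=0):
    # Randomize a 0-1 matrix while preserving its row and column sums
    # by repeatedly flipping 2x2 checkerboard submatrices.
    rng = np.random.default_rng(seed)
    M = M.copy()
    m, n = M.shape
    for _ in range(nswaps):
        r = rng.choice(m, 2, replace=False)
        c = rng.choice(n, 2, replace=False)
        sub = M[np.ix_(r, c)]
        if (sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0]
                and sub[0, 0] != sub[0, 1]):
            M[np.ix_(r, c)] = 1 - sub   # flip the checkerboard
    return M

# Toy presence/absence matrix: rows are species, columns are islands.
M0 = np.array([[1, 1, 0, 0],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [0, 0, 1, 1]])
M1 = checkerboard_shuffle(M0)
print(M1.sum(axis=1), M1.sum(axis=0))   # margins match M0
```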

Jim’s simulations strongly supported Diamond’s thesis. Stuart Pimm, now a professor at Duke, had been a professor at Florida, knew Diamond, and sent Jim’s work to Diamond. A paper by the three of them, James Sanderson, Jared Diamond, and Stuart Pimm, “Pairwise Co-existence of Bismarck and Solomon Landbird Species”, was finally published in 2009. The paper criticized Connor and Simberloff’s methodology and conclusions in no uncertain terms.

Connor and Simberloff, together with Michael Collins, responded with “The checkered history of checkerboard distributions” in 2013. They claim that “Few, if any, ‘true checkerboards’ exist in these archipelagoes that could possibly have been influenced by competitive interactions.”

A pair of back-to-back notes published in December 2015 continue the interchange. One is by Diamond, Pimm, and Sanderson. “The checkered history of checkerboard distributions: comment”. The follow-up comes from Connor, Collins, and Simberloff. “The checkered history of checkerboard distributions: reply”. One of the arguments is over convex hulls on maps.

Jim and Stuart Pimm have written a book describing the whole affair, at least as of last year, that is accessible to a wider audience.

James G. Sanderson and Stuart L. Pimm, *Patterns in Nature: The Analysis of Species Co-occurrences*, The University of Chicago Press, 205 pages, 2015.

In the preface to the book they say, “Some of what happened thereafter was simply ugly, involving one of the most violent clashes of personalities in recent ecological history.” They quote Professor R. J. Putman of the University of Glasgow, even as far back as 1994, that the debate was “almost unprecedented in the apparent entrenchment and hostility of the opposing camps.”

Jim recently joined a conservation organization based in Austin that was established by some friends of his, Global Wildlife Conservation, <http://globalwildlife.org>. He is their new Director of Wild Cat Conservation.

