
A headline in the *New York Times* at the end of 2016 said "Growth of U.S. Population Is at Slowest Pace Since 1937". This prompted me to revisit an old chestnut about fitting and extrapolating census data. In the process I have added a couple of nonlinear fits, namely the logistic curve and the double exponential Gompertz model.

This experiment is older than MATLAB. It started as an exercise in *Computer Methods for Mathematical Computations*, by Forsythe, Malcolm and Moler, published by Prentice-Hall 40 years ago. We were using Fortran back then. The data set has been updated every ten years since. Today, MATLAB graphics makes it easier to vary the parameters and see the results, but the underlying mathematical principles are unchanged:

- Using polynomials of even modest degree to predict the future by extrapolating data is a risky business.

Recall that the famous computational scientist Yogi Berra said

- "It's tough to make predictions, especially about the future."

The data are from the decennial census of the United States for the years 1900 to 2010. The population is in millions.

| Year | Population (millions) |
|------|----------------------:|
| 1900 | 75.995 |
| 1910 | 91.972 |
| 1920 | 105.711 |
| 1930 | 123.203 |
| 1940 | 131.669 |
| 1950 | 150.697 |
| 1960 | 179.323 |
| 1970 | 203.212 |
| 1980 | 226.505 |
| 1990 | 249.633 |
| 2000 | 281.422 |
| 2010 | 308.746 |

The task is to extrapolate beyond 2010. Let's see how an extrapolation of just six years to 2016 matches the Census Bureau announcement. Before you read any further, pause and make your own guess.

Here is the opening screen of the January 2017 edition of my `censusapp`, which is included in Cleve's Laboratory. The plus and minus buttons change the extrapolation year in the title. If you go beyond 2030, the plot zooms out.

The pull-down menu offers these models. Forty years ago we had only polynomials.

```
models = {'census data','polynomial','pchip','spline', ...
    'exponential','logistic','gompertz'}'
```

```
models =
  7×1 cell array
    'census data'
    'polynomial'
    'pchip'
    'spline'
    'exponential'
    'logistic'
    'gompertz'
```

The Census Bureau news release that prompted the story in *The Times* said the official population in 2016 was 323.1 million. That was on Census Day, which is April 1 of each year. The Census Bureau also provides a dynamic Population Clock that operates continuously. But let's stick with the 323.1 number.

Polynomials like to wiggle. Constrained to match data in a particular interval, they go crazy outside that interval. Today, there are 12 data points. The app lets you vary the polynomial degree between 0 and 11. Polynomials with degree less than 11 approximate the data in a least squares sense. The polynomial of degree 11 interpolates the data exactly. As the degree increases, the approximation of the data becomes more accurate, but the behavior beyond 2010 (or before 1900) becomes more violent. Here are degrees 2, 7, 9, and 11, superimposed on one plot.

The quadratic fit is the best behaved. When evaluated at year 2016, it misses the target by six million. Of course, there is no reason to believe that the US population grows like a second degree polynomial in time.

The interpolating polynomial of degree 11 tries to escape even before it gets to 2010, and it goes negative late in 2014.
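These polynomial experiments are easy to reproduce outside MATLAB. Here is a sketch in Python with NumPy; the centering and scaling of the years is my own choice to keep the fitting problem well conditioned, not necessarily what `censusapp` does internally.

```python
import numpy as np

# Decennial U.S. census population, in millions (1900-2010).
year = np.arange(1900, 2011, 10)
pop = np.array([75.995, 91.972, 105.711, 123.203, 131.669, 150.697,
                179.323, 203.212, 226.505, 249.633, 281.422, 308.746])

# Center and scale the years so high-degree fits stay well conditioned.
t = (year - 1955) / 55.0
t2016 = (2016 - 1955) / 55.0

# Least squares quadratic fit, extrapolated to 2016.
p2 = np.polyfit(t, pop, 2)
p2016 = np.polyval(p2, t2016)
print(f"quadratic fit at 2016: {p2016:.1f} million")

# Degree 11 interpolates all 12 points exactly...
p11 = np.polyfit(t, pop, 11)
print("interpolation error at the nodes:",
      np.abs(np.polyval(p11, t) - pop).max())
# ...but, as described above, it misbehaves outside 1900-2010.
print(f"degree 11 at 2016: {np.polyval(p11, t2016):.1f} million")
```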

MATLAB has two piecewise cubic interpolating polynomials. The classic `spline` is smooth because it has two continuous derivatives. Its competitor `pchip` sacrifices a continuous second derivative to preserve shape and avoid overshoots. I blogged about splines and pchips a few years ago.

Neither is intended for extrapolation, but we will do it anyway. Their behavior beyond the interval is determined by their end conditions. The classic `spline` uses the so-called *not-a-knot* condition. It is actually a single cubic in the last two subintervals. That cubic is also used for extrapolation beyond the endpoint. `pchip` uses just the last three data points to create a different shape-preserving cubic for use in the last subinterval and beyond.

Let's zoom in on the two. Both are predicting a decreasing rate of growth beyond 2010, just as the Census Bureau is observing. `pchip` gets lucky and comes within 0.2 million of the announcement for 2016.
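SciPy implements the same two interpolants (`CubicSpline` uses not-a-knot end conditions by default), so the extrapolation experiment can be repeated there; the exact values depend on the end conditions described above, and this is only a sketch of that check.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator, CubicSpline

year = np.arange(1900, 2011, 10)
pop = np.array([75.995, 91.972, 105.711, 123.203, 131.669, 150.697,
                179.323, 203.212, 226.505, 249.633, 281.422, 308.746])

pchip = PchipInterpolator(year, pop)   # shape-preserving piecewise cubic
spline = CubicSpline(year, pop)        # not-a-knot end conditions

# Both evaluate their final cubic piece beyond 2010.
print(f"pchip  at 2016: {float(pchip(2016)):.1f} million")
print(f"spline at 2016: {float(spline(2016)):.1f} million")
```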

As I said, there is no good reason to model population growth by a polynomial, piecewise or not. But because the rate of growth can be expected to be proportional to the size of the population, there is good reason to use an exponential.

$$ p(t) \approx a e^{bt} $$

There have been many proposals for ways to modify this model to avoid its unbounded growth. I have just added two of these to `censusapp`. One is the logistic model.

$$ p(t) \approx \frac{a}{1+b e^{-ct}} $$

And the other is the Gompertz double exponential model, named after Benjamin Gompertz, a 19th century self-educated British mathematician and astronomer.

$$ p(t) \approx a e^{-b e^{-ct}} $$

In both of these models the growth is limited because the approximating function approaches $a$ as $t$ approaches infinity.

In all three of the exponential models, the parameters $a$, $b$, and possibly $c$, appear nonlinearly. In principle, I could use `lsqcurvefit` to search in two or three dimensions for a least squares fit to the census data. But I have an alternative. By taking one or two logarithms, I have a *separable least squares* model where at most one parameter, $a$, appears nonlinearly.

For the exponential model, take one logarithm.

$$ \log{p} \approx \log{a} + bt $$

Fit the logarithm of the data by a straight line and then exponentiate the result. No search is required.
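In Python the whole exponential fit is two lines: a straight-line fit to the logarithm of the population, followed by an exponential. (The extrapolated 2016 value overshoots the 323.1 announcement, one more reason to distrust the pure exponential.)

```python
import numpy as np

year = np.arange(1900, 2011, 10)
pop = np.array([75.995, 91.972, 105.711, 123.203, 131.669, 150.697,
                179.323, 203.212, 226.505, 249.633, 281.422, 308.746])
t = (year - 1900).astype(float)

# log(p) ~ log(a) + b t is linear in log(a) and b: no search required.
b, loga = np.polyfit(t, np.log(pop), 1)
a = np.exp(loga)

p2016 = a * np.exp(b * (2016 - 1900))
print(f"a = {a:.1f}, b = {b:.4f} per year")
print(f"exponential fit at 2016: {p2016:.1f} million")
```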

For the logistic model, take one logarithm.

$$ \log{(a/p-1)} \approx \log{b} - ct $$

For any value of $a$, the parameters $\log{b}$ and $c$ appear linearly and can be found without a search. So use a one-dimensional minimizer to search for $a$. I could use `fminbnd`, or its textbook version, `fmintx`, from *Numerical Methods with MATLAB*.
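Here is one way to sketch that separable search in Python; `minimize_scalar` stands in for `fminbnd`, and because my inner fit works with the log-transformed data, the recovered limit `a` need not match the app's value exactly.

```python
import numpy as np
from scipy.optimize import minimize_scalar

year = np.arange(1900, 2011, 10)
pop = np.array([75.995, 91.972, 105.711, 123.203, 131.669, 150.697,
                179.323, 203.212, 226.505, 249.633, 281.422, 308.746])
t = (year - 1900).astype(float)

def logistic_sse(a):
    # For fixed a, log(a/p - 1) ~ log(b) - c t is an ordinary linear fit.
    slope, intercept = np.polyfit(t, np.log(a / pop - 1), 1)
    b, c = np.exp(intercept), -slope
    fit = a / (1 + b * np.exp(-c * t))
    return np.sum((fit - pop) ** 2)

# One-dimensional search for the limiting population a (> max observation).
res = minimize_scalar(logistic_sse, bounds=(pop.max() + 1, 5000),
                      method='bounded')
print(f"logistic limit a = {res.x:.1f} million")
```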

For the Gompertz model, take two logarithms.

$$ \log{\log{a/p}} \approx \log{b} - ct $$

Again, do a one-dimensional search for the minimizing $a$, solving for $\log{b}$ and $c$ at each step.
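The Gompertz search is the same sketch with two logarithms instead of one; again, my simplified objective may not land exactly on the app's fitted value.

```python
import numpy as np
from scipy.optimize import minimize_scalar

year = np.arange(1900, 2011, 10)
pop = np.array([75.995, 91.972, 105.711, 123.203, 131.669, 150.697,
                179.323, 203.212, 226.505, 249.633, 281.422, 308.746])
t = (year - 1900).astype(float)

def gompertz_sse(a):
    # For fixed a, log(log(a/p)) ~ log(b) - c t is linear in log(b) and c.
    slope, intercept = np.polyfit(t, np.log(np.log(a / pop)), 1)
    b, c = np.exp(intercept), -slope
    fit = a * np.exp(-b * np.exp(-c * t))
    return np.sum((fit - pop) ** 2)

res = minimize_scalar(gompertz_sse, bounds=(pop.max() + 1, 20000),
                      method='bounded')
print(f"Gompertz limit a = {res.x:.1f} million")
```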

Here are the three resulting fits, extrapolated over more than 200 years to the year 2250. The pure exponential model reaches 5 billion people by that time, and is growing ever faster. I think that's unreasonable.

The value of $a$ in the Gompertz fit turns out to be 4309.6, so the population would be capped at 4.3 billion. But it will have reached only 1.5 billion two hundred years from now. Again unlikely.

The value of $a$ in the logistic fit turns out to be 756.4, so the predicted US population will slightly more than double over the next two hundred years. Despite the Census Bureau's observation that our rate of growth has slowed recently, we are not yet even halfway to our ultimate population limit. I'll let you be the judge of that prediction.

I have recently updated Cleve's Laboratory in the MATLAB Central file exchange. One of the updates changes the name of `censusgui` to `censusapp` and adds the two exponential models. If you do install this new version of the Laboratory, you can answer the following question.

The fit generated by `pchip` defines a cubic for use beyond the year 2000 that predicts the population will reach a maximum in the not too distant future and decrease after that. What is that maximum and when does it happen?


Published with MATLAB® R2017a


I don't recall where I found this seasonal fractal. And I can't explain how it works. So please submit a comment if you can shed any light on either of these questions.

I have featured a fractal at this time of the year in two previous years. I see Christmas trees or perhaps a holly decoration.

All these figures are obtained by varying the parameters in a formula that generates complex numbers $z$ from the partial sums.

$$ z = \sum_n{\exp{\left( (\phi n^p+\sigma) \pi i \right)}} $$

A vector of points in the complex plane is produced by taking $n$ to be a vector of consecutive integers and using the MATLAB cumulative summation function `cumsum` to compute the partial sums. There are 8600 points in the figure above and 100,000 points in the figures below.

The default value of the parameter $\phi$ is my old friend the `golden ratio`.

$$ \phi = \frac{1+\sqrt{5}}{2} $$

In previous posts I've taken $\phi$ to be other rational and irrational numbers, but today I am sticking to this value.

The parameter $\sigma$ controls the angular orientation. Taking $\sigma$ near $1/8$ makes the large Christmas tree vertical.

While trying to understand how this thing works, I've varied the power $p$ from its usual value of 2 and taken hundreds of thousands of points. This produces today's pictures. Different values of $p$ produce wildly different results.

For a real variable $x$, the expression $\exp (x \pi i)$ is periodic and lies on the unit circle in the complex plane. So we are plotting the cumulative sum of values taken from around the unit circle. At first glance, this appears to be a complex valued random number generator. But it is a lousy generator because we can see Christmas trees in the output.
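The construction is easy to check numerically; here is its core in Python (the full MATLAB version appears below). Every term has unit modulus, so each partial sum moves exactly one unit away from the previous one.

```python
import numpy as np

phi = (1 + np.sqrt(5)) / 2      # golden ratio
sigma = 1 / 8
p = 2
n = np.arange(0, 8601, dtype=float)

w = np.exp((phi * n**p + sigma) * np.pi * 1j)   # points on the unit circle
z = np.cumsum(w)                                # the plotted partial sums

print("largest deviation of |w| from 1:", np.abs(np.abs(w) - 1).max())
print("consecutive points one unit apart:",
      np.allclose(np.abs(np.diff(z)), 1))
```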

```
type greetings_gifs
```

```
function greetings_gifs
% Generate animated season's greeting gifs.

   % Generate the fractal
   phi = (1+sqrt(5))/2;
   s = 1/8;
   n = 100000;
   for gifnum = 1:4
      switch gifnum
         case 1, p = 2;
         case 2, p = 2/3;
         case 3, p = 1.25;
         case 4, p = 4;
      end
      w = exp((phi*(0:n).^p+s)*pi*1i);
      z = cumsum(w);

      % Find local extrema
      ks = extrema(z);

      % Make an animated gif
      plotit(z,p,ks,gifnum)
   end % gifnum

   % ------------------------

   function ks = extrema(z)
      n = length(z)-1;
      m = n/40;
      ks = [];
      for j = 0:m:n-m
         zj = z(j+1:j+m);
         w = zj - mean(zj);
         k = find(abs(w) == max(abs(w))) + j;
         ks = [ks k];
      end
   end % extrema

   function plotit(z,p,ks,gifnum)
   % Make an animated gif
      shg
      plot(z,'.')
      axis square
      ax = axis;
      gif_frame(['greetings' int2str(gifnum) '.gif'])
      clf
      axis(ax)
      axis square
      gif_frame
      for j = 1:length(ks)
         k = ks(j);
         plot(z(1:k),'k.','markersize',0.5)
         axis(ax)
         axis square
         hold on
         plot(z(ks(1:j)),'g.','markersize',18)
         plot(z(k),'r.','markersize',24)
         hold off
         title(sprintf('p = %4.2f',p))
         gif_frame
      end
      gif_frame(5)
      gif_frame('reset')
   end % plotit

end % greetings_gifs
```

*Happy New Year.*


Published with MATLAB® R2016a


When I was in high school in the 1950's, I didn't know anything about matrices. But I nearly encountered one when I wrote a paper for my physics class about the color scheme that allowed new color TV broadcasts to be compatible with existing black-and-white TV receivers.

During my high school years in Utah, I worked part-time and summers as a cameraman and projectionist at what was then called KTVT, Channel 4, the local NBC affiliate in Salt Lake City. The TV cameras at the time employed vacuum tubes, not transistors. They were the size of suitcases, weighed over 100 pounds, and were mounted on dollies.

The first coast-to-coast TV broadcast in color was the Tournament of Roses parade on January 1, 1954. KTVT had a prototype color receiver, perhaps the only one in Utah at the time. We set up bleachers in the studio and invited guests to see the new technology.

The NTSC compatible color television standard, adopted in 1953, allows color information to be encoded in a broadcast in such a way that existing black-and-white receivers will still function without modification. The standard remained in use in the US until broadcasters ceased analog transmissions in 2009.

I wrote a paper for my high school physics class about the NTSC compatible color scheme. It involves a *luminance-chrominance* conversion between the RGB, red-green-blue, signal used in color studio equipment and home receivers, and a YIQ encoding. Y represents the brightness, or *luma*, of the signal, and is used by itself for black-and-white. The letters I and Q stand for *in-phase* and *quadrature*. These two components combine with the brightness to produce color.

Here are the conversion formulas that I had in my high school paper.

$$ Y = 0.2989 R + 0.5870 G + 0.1140 B $$

$$ I = 0.5959 R - 0.2744 G - 0.3216 B $$

$$ Q = 0.2115 R - 0.5229 G + 0.3114 B $$

I now know how to write the conversion formulas in matrix notation. If $x$ is the 3-vector containing RGB and $y$ is the 3-vector of YIQ, then

$$ y = A x $$

where $A$ is the matrix

$$ A = \left[ \begin{array}{rrr} 0.2989 & 0.5870 & 0.1140 \\ 0.5959 & -0.2744 & -0.3216 \\ 0.2115 & -0.5229 & 0.3114 \end{array} \right] $$

In retrospect, this was my very first matrix.

Of course, conversion at the receiving end from YIQ to RGB is done by

$$ x = A^{-1} y $$

Fast forward 25 years so we can use MATLAB.

```
A = [0.2989  0.5870  0.1140
     0.5959 -0.2744 -0.3216
     0.2115 -0.5229  0.3114]
T = inv(A)
```

```
A =
    0.2989    0.5870    0.1140
    0.5959   -0.2744   -0.3216
    0.2115   -0.5229    0.3114
T =
    1.0002    0.9560    0.6211
    1.0001   -0.2720   -0.6470
    1.0000   -1.1060    1.7030
```

The first column of $A^{-1}$ is all 1's, so each of R, G and B gets a full dose of Y.
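That observation is quick to verify outside MATLAB as well; here is a sketch in Python, assuming only the matrix from my high school paper.

```python
import numpy as np

# RGB -> YIQ conversion matrix from the NTSC standard.
A = np.array([[0.2989,  0.5870,  0.1140],
              [0.5959, -0.2744, -0.3216],
              [0.2115, -0.5229,  0.3114]])
T = np.linalg.inv(A)

print(T)
print("first column of inv(A):", T[:, 0])   # each entry within 2e-4 of 1
```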

Although I had nothing to do with putting them there, these two matrices are central to the functions `ntsc2rgb` and `rgb2ntsc` in the `colorspaces` folder of the Image Processing Toolbox. In the following code `A` is a 3D full color RGB image.

Here is the core of the NTSC to RGB function. The image is reshaped to a tall, skinny 2D array and the transformation applied on the right by the transpose of `T`.

```
dbtype 36:36 ntsc2rgb
dbtype 41:43 ntsc2rgb
dbtype 48:48 ntsc2rgb
```

```
36    T = [1.0 0.956 0.621; 1.0 -0.272 -0.647; 1.0 -1.106 1.703];
41    m = size(A,1);
42    n = size(A,2);
43    A = reshape(A(:),m*n,3)*T';
48    A = reshape(A,m,n,3);
```

The conversion from RGB to NTSC uses *matrix division* on the right by `T` transpose. This may be the only place in MATLAB where the forward slash matrix operator appears.

```
dbtype 30:31 rgb2ntsc
dbtype 35:35 rgb2ntsc
```

```
30    T = [1.0 0.956 0.621; 1.0 -0.272 -0.647; 1.0 -1.106 1.703].';
31    [so(1),so(2),thirdD] = size(A);
35    A = reshape(reshape(A,so(1)*so(2),thirdD)/T,so(1),so(2),thirdD);
```

Since analog NTSC broadcasts are history, these conversions are probably not very common any more. But the single formula

$$ Y = 0.2989 R + 0.5870 G + 0.1140 B $$

is the basis for the `rgb2gray` function.
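In other words, grayscale conversion is just a weighted inner product with the first row of $A$. A minimal sketch in Python (the helper `rgb2gray_pixel` is my own, not the toolbox function):

```python
import numpy as np

def rgb2gray_pixel(rgb):
    # Y = 0.2989 R + 0.5870 G + 0.1140 B, the first row of the NTSC matrix.
    w = np.array([0.2989, 0.5870, 0.1140])
    return float(np.dot(rgb, w))

print(rgb2gray_pixel([1.0, 1.0, 1.0]))   # white maps to ~0.9999, nearly 1
print(rgb2gray_pixel([0.0, 1.0, 0.0]))   # green reads brighter than...
print(rgb2gray_pixel([0.0, 0.0, 1.0]))   # ...blue, matching perception
```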

NBC introduced a peacock logo to promote the debut of its "in living color" broadcasts. The peacock has lost a few feathers over the years, but it still provides an excellent color test pattern.

(NBC SVG by FOX 52 [Public domain], via Wikimedia Commons)

Here are the red, green and blue components. The bright portions represent high values. All three have relatively bright backgrounds and bright peacocks in the center; these combine to produce white. The R component has bright feathers on the left to contribute to the yellow, orange and red feathers in the full logo. The G component has bright components on the bottom to contribute to the yellow and green in the full. And B has bright feathers on the upper right to give blue and purple.

Convert to YIQ. The Y component is the grayscale rendition to drive old black-and-white TVs. The I component goes from orange to blue and the Q component goes from purple to green.

Talking about color, here is one of my favorite graphics. Let's start with an animated .gif of `colorcubes(3)`, which contains 27 cubelets. If you run `colorcubes` yourself, you will drive the rotation with your mouse.

You can see white at one vertex and black at the opposite vertex. Red, a bright green, and blue are at three vertices, while their complements -- cyan, magenta, and yellow -- oppose them. Actually, the green vertex is too bright. It is often called "lime". A darker green is along the bottom edge, halfway between lime and black. Here's a little quiz: What color is the hidden cubelet in the center?

Now a look at the construction of `colorcubes(5)`.

Here is the `colorcubes` code.

```
type colorcubes
```

```
function colorcubes(n,w)
% COLORCUBES  A cube of cubes in the RGB color space.
%   COLORCUBES, with no arguments, shows 5^3 = 125 cubes with
%   colors equally spaced in the RGB color space.
%   COLORCUBES(n) shows n-by-n-by-n colors.
%   COLORCUBES(2) shows 8 colors: R, G, B, C, M, Y, W, K (black).
%   COLORCUBES(n,w) uses cubes of width w.  Default is w = 0.85.
%   Rotate the cube with the mouse or arrow keys.

%   Copyright 2016 The MathWorks, Inc.

   if nargin < 1, n = 5; end
   if nargin < 2, w = 0.85; end
   initgraphics(n)
   [x,y,z] = cube(w);
   m = n-1;
   for i = m:-1:0
      for j = m:-1:0
         for k = 0:m
            r = k/m;
            g = 1-j/m;
            b = 1-i/m;
            surface(i+x,j+y,k+z, ...
               'facecolor',[r g b], ...
               'facelighting','gouraud');
            drawnow
         end %k
      end %j
   end %i

   % ------------------------

   % INITGRAPHICS  Initialize the colorcubes axis.
   % INITGRAPHICS(n) for n-by-n-by-n display.
   function initgraphics(n)
      clf reset
      shg
      set(gcf,'color','white')
      axis([0 n 0 n 0 n]);
      axis off
      axis vis3d
      rotate3d on
   end %initgraphics

   function [x,y,z] = cube(w)
   % CUBE  Coordinates of the faces of a cube.
   %   [x,y,z] = cube(w); surface(x,y,z)
   %   plots a cube of width w.
      u = [0 0; 0 0; w w; w w];
      v = [0 w; 0 w; 0 w; 0 w];
      z = [w w; 0 0; 0 0; w w];
      s = [nan nan];
      x = [u; s; v];
      y = [v; s; u];
      z = [z; s; w-z];
   end %cube

end % colorcubes
```


Published with MATLAB® R2016a


(Please replace the erroneous posting from yesterday, Nov. 28, with this corrected version.)

Here is a very short course in Linear Algebra. The Singular Value Decomposition provides a natural basis for Gil Strang's Four Fundamental Subspaces.

Screen shot from Gil Strang MIT/MathWorks video lecture, "The Big Picture of Linear Algebra".

Gil Strang tells me that he began to think about linear algebra in terms of four fundamental subspaces in the 1970's when he wrote the first edition of his textbook, *Introduction to Linear Algebra*. The fifth edition, which was published last May, features the spaces on the cover.

The concept is a centerpiece in his video lectures for MIT course 18.06. It even found its way into the new video series about ordinary differential equations that he and I made for MIT and MathWorks. His paper included in the notes for 18.06 is referenced below.

Suppose that $A$ is an $m$ -by- $n$ matrix that maps vectors in $R^n$ to vectors in $R^m$. The four fundamental subspaces associated with $A$, two in $R^n$ and two in $R^m$, are:

- column space of $A$, the set of all $y$ in $R^m$ resulting from $y = Ax$,
- row space of $A$, the set of all $x$ in $R^n$ resulting from $x = A^Ty$,
- null space of $A$, the set of all $x$ in $R^n$ for which $Ax = 0$,
- left null space of $A$, the set of all $y$ in $R^m$ for which $A^T y = 0$.

The row space and the null space are *orthogonal* to each other and span all of $R^n$. The column space and the left null space are also *orthogonal* to each other and span all of $R^m$.

The *dimension* of a subspace is the number of linearly independent vectors required to span that space. *The Fundamental Theorem of Linear Algebra* is

- The dimension of the row space is equal to the dimension of the column space.

In other words, the number of linearly independent rows is equal to the number of linearly independent columns. This may seem obvious, but it is actually a subtle fact that requires proof.

The *rank* of a matrix is this number of linearly independent rows or columns.

The natural bases for the four fundamental subspaces are provided by the SVD, the Singular Value Decomposition, of $A$.

$$ A = U \Sigma V^T $$

The matrices $U$ and $V$ are *orthogonal*.

$$ U^T U = I_m, \ \ V^T V = I_n $$

You can think of orthogonal matrices as multidimensional generalizations of two dimensional rotations. The matrix $\Sigma$ is *diagonal*, so its only nonzero elements are on the main diagonal.

The shape and size of these matrices are important. The matrix $A$ is rectangular, say with $m$ rows and $n$ columns; $U$ is square, with the same number of rows as $A$; $V$ is also square, with the same number of columns as $A$; and $\Sigma$ is the same size as $A$. Here is a picture of this equation when $A$ is tall and skinny, so $m > n$. The diagonal elements of $\Sigma$ are the *singular values*, shown as blue dots. All of the other elements of $\Sigma$ are zero.

The signs and the ordering of the columns in $U$ and $V$ can always be taken so that the singular values are nonnegative and arranged in decreasing order.

For any diagonal matrix like $\Sigma$, it is clear that the rank, which is the number of independent rows or columns, is just the number of nonzero diagonal elements.

In MATLAB, the SVD is computed by the statement.

[U,Sigma,V] = svd(A)

With inexact floating point computation, it is appropriate to take the rank to be the number of *nonnegligible* diagonal elements. So the function

r = rank(A)

counts the number of singular values larger than a tolerance.
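The same computation can be sketched in Python with NumPy, on a matrix whose rank we know by construction:

```python
import numpy as np

u = np.array([-3.0, 4.0])
v = np.array([1.0, 3.0])
A = np.outer(u, v)                  # a rank-one outer product

s = np.linalg.svd(A, compute_uv=False)
tol = max(A.shape) * np.finfo(float).eps * s[0]
r = int(np.sum(s > tol))            # count nonnegligible singular values

print("singular values:", s)        # 15.8114 and (numerically) zero
print("rank:", r)
```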

Multiply both sides of $A = U\Sigma V^T $ on the right by $V$. Since $V^T V = I$, we find

$$ AV = U\Sigma $$

Here is the picture. I've drawn a green line after column $r$ to show the rank. The only nonzero elements of $\Sigma$, the singular values, are the blue dots.

Write out this equation column by column.

$$ Av_j = \sigma_j u_j, \ \ j = 1,...,r $$

$$ Av_j = 0, \ \ j = r+1,...,n $$

This says that $A$ maps the first $r$ columns of $V$ onto nonzero multiples of the first $r$ columns of $U$ and maps the remaining columns of $V$ onto zero.

- $U(:,1:r)$ spans the column space.
- $V(:,r+1:n)$ spans the null space.

Transpose the equation $A = U\Sigma V^T $ and multiply both sides on the right by $U$. Since $U^T U = I$, we find

$$ A^T U = V \Sigma^T $$

Here's the picture, with the green line at the rank.

Write this out column by column.

$$ A^T u_j = \sigma_j v_j, \ \ j = 1,...,r $$

$$ A^T u_j = 0, \ \ j = r+1,...,m $$

This says that $A^T$ maps the first $r$ columns of $U$ onto nonzero multiples of the first $r$ columns of $V$ and maps the remaining columns of $U$ onto zero.

- $V(:,1:r)$ spans the row space.
- $U(:,r+1:m)$ spans the left nullspace.
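These four claims are easy to verify numerically. A sketch with NumPy, using a random matrix whose rank $r$ is known by construction:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 5, 3, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank r

U, s, Vt = np.linalg.svd(A)
V = Vt.T

# Columns of A lie in the span of U(:,1:r) ...
P = U[:, :r] @ U[:, :r].T
print("column space check:", np.abs(P @ A - A).max())
# ... V(:,r+1:n) is annihilated by A, and U(:,r+1:m) by A'.
print("null space check:", np.abs(A @ V[:, r:]).max())
print("left null space check:", np.abs(A.T @ U[:, r:]).max())
```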

Here is an example involving lines in two dimensions, so $m = n = 2$. Start with these vectors.

```
u = [-3 4]'
v = [1 3]'
```

```
u =
    -3
     4
v =
     1
     3
```

The matrix $A$ is their outer product.

A = u*v'

```
A =
    -3    -9
     4    12
```

Compute the SVD.

[U,Sigma,V] = svd(A)

```
U =
   -0.6000    0.8000
    0.8000    0.6000
Sigma =
   15.8114         0
         0         0
V =
    0.3162   -0.9487
    0.9487    0.3162
```

As expected, $\Sigma$ has only one nonzero singular value, so the rank is $r = 1$.

The first left and right singular vectors are our starting vectors, normalized to have unit length.

```
ubar = u/norm(u)
vbar = v/norm(v)
```

```
ubar =
   -0.6000
    0.8000
vbar =
    0.3162
    0.9487
```

The columns of $A$ are proportional to each other, and to $\bar{u}$. So the column space is just the line generated by multiples of either column and $\bar{u}$ is the normalized basis vector for the column space. The columns of $A^T$ are proportional to each other, and to $\bar{v}$. So $\bar{v}$ is the normalized basis vector for the row space.

The only nonzero singular value is the product of the normalizing factors.

sigma = norm(u)*norm(v)

sigma = 15.8114

The second right and left singular vectors, that is the second columns of $V$ and $U$, provide bases for the null spaces of $A$ and $A^T$.

Here is the picture.

Gilbert Strang, "The Four Fundamental Subspaces: 4 Lines", undated notes for MIT course 18.06, <http://web.mit.edu/18.06/www/Essays/newpaper_ver3.pdf>

Gilbert Strang, *Introduction to Linear Algebra*, Wellesley-Cambridge Press, fifth edition, 2016, x+574 pages, <http://bookstore.siam.org/wc14>

Gilbert Strang, "The Fundamental Theorem of Linear Algebra", *The American Mathematical Monthly*, Vol. 100, No. 9. (Nov., 1993), pp. 848-855, <http://www.jstor.org/stable/2324660?seq=1#page_scan_tab_contents>, also available at <http://www.souravsengupta.com/cds2016/lectures/Strang_Paper1.pdf>


Published with MATLAB® R2016b



My favorite ordinary differential equation provides a good test of ODE solvers, both numeric and symbolic. It also provides a nice illustration of the underlying existence theory and error analysis.

Here is one of my favorite ordinary differential equations. The dot means differentiation with respect to $t$. The equation does not express $\dot{y}$ directly, so it is implicit.

$$ \dot{y}^2 + y^2 = 1 $$

We don't need a computer to verify that the analytic solution is a combination of $\sin{t}$ and $\cos{t}$. But the constants $y=1$ and $y=-1$ are also solutions.

Our ODE solvers require an explicit equation, but when we solve this equation for $\dot{y}$ we get two solutions, differing in sign.

$$ \dot{y} = \pm \sqrt{1-y^2} $$

That $\pm$ sign makes this interesting.

Let's plunge ahead with the $+$ sign. Specify the equation, an interval for $t$ and an initial condition.

```
f = @(t,y) sqrt(1-y^2)
format short
tspan = [0 pi]
y0 = 0
```

```
f = @(t,y)sqrt(1-y^2)
tspan =
        0    3.1416
y0 =
     0
```

An attempt to compute a numeric solution with `ode45` runs into trouble when $y$ gets slightly larger than $1$ and the square root of a negative number generates an imaginary result. So temporarily use a shorter $t$ interval.

```
try
    ode45(f,tspan,y0);
catch
    ode45(f,[0 1.47],y0)
end
pi_axis([-0.1 1.1])
```

```
Error using odeplot (line 63)
Error updating the ODEPLOT window. Solution data may have been corrupted.
Argument Y cannot be complex.
Error in ode45 (line 435)
```

Change the equation so that it provides a guard against the square root of negative numbers.

```
f = @(t,y) sqrt(1-min(y,1)^2)
ode45(f,tspan,y0);
pi_axis([-0.1 1.1])
```

```
f = @(t,y)sqrt(1-min(y,1)^2)
```

This is the correct solution, $\sin{t}$. For a while. But when $y$ reaches $1$ it has to stay there because its derivative is always nonnegative.

If we want to continue with $\sin{t}$ beyond $\pi/2$ we have to switch to the minus sign.

```
f = @(t,y) (1-2*double(t>pi/2))*real(sqrt(1-y.^2))
ode45(f,tspan,y0)
pi_axis([-0.1 1.1])
```

```
f = @(t,y)(1-2*double(t>pi/2))*real(sqrt(1-y.^2))
```
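The same computation can be sketched with SciPy's `solve_ivp`. This is a hedged translation, not the post's code: the sign flips at $t = \pi/2$, and `real()` is imitated by clamping the argument of the square root at zero.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Flip the sign of the right-hand side at t = pi/2; clamp the sqrt
# argument at 0 in place of MATLAB's real().
def f(t, y):
    s = 1.0 if t <= np.pi / 2 else -1.0
    return [s * np.sqrt(max(1.0 - y[0] ** 2, 0.0))]

sol = solve_ivp(f, [0, np.pi], [0.0], rtol=1e-8, atol=1e-10,
                dense_output=True)

# Compare with sin(t) away from the troublesome point t = pi/2,
# where the square-root singularity makes error control delicate.
t = np.linspace(0.0, 1.4, 50)
err = np.max(np.abs(sol.sol(t)[0] - np.sin(t)))
```

Away from the singularity the solver tracks $\sin{t}$ closely; near $y = 1$ the error estimates are no longer trustworthy, which is exactly the point of this example.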

We're solving the autonomous differential equation,

$$ \dot{y} = f(y) $$

where

$$ f(y) = \sqrt{1-y^2} $$

An important quantity in both the existence theory for ODEs and the error analysis for numeric methods is the *Jacobian*.

$$ J = \frac{\partial{f}}{\partial{y}} $$

In this case $J$ is 1-by-1.

$$ J = -\frac{y}{\sqrt{1 - y^2}} $$

When $y$ approaches $\pm 1$ the Jacobian approaches infinity.

Existence and uniqueness theorems for ODEs rely on a bounded Jacobian (or a bounded Lipschitz constant). The derivations of numerical methods rely on a bounded Jacobian. Error control in modern ODE software relies on a bounded Jacobian. All of these are up in the air with this example.

We can see the lack of uniqueness happening when we change the initial condition to $y(0) = -1$. The Jacobian is infinite at the start and both the constant $y(t) = -1$ and the trig function $y(t) = -\cos{t}$ are solutions. A small perturbation in the initial condition switches the numerical solution from one to the other.

```
f = @(t,y) sqrt(1-max(min(y,1),-1)^2)
y0 = -1
ode45(f,tspan,y0);
pi_axis([-1.2 0])
hold on
format long
y0 = -.99999999
ode45(f,tspan,y0);
pi_axis([-1.1 1.1])
hold off
```

```
f = @(t,y)sqrt(1-max(min(y,1),-1)^2)
y0 =
    -1
y0 =
  -0.999999990000000
```
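A hedged SciPy sketch of the same perturbation experiment. With $y(0) = -1$ exactly, the right-hand side is zero and the solver stays on the constant solution; with a perturbation of $10^{-8}$ it switches to (a slightly shifted copy of) $-\cos{t}$.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Guarded right-hand side, as in the post: clamp y to [-1, 1]
# before taking the square root.
def f(t, y):
    yc = min(max(y[0], -1.0), 1.0)
    return [np.sqrt(1.0 - yc ** 2)]

tspan = [0, np.pi]
exact = solve_ivp(f, tspan, [-1.0], rtol=1e-8, atol=1e-10)
perturbed = solve_ivp(f, tspan, [-1.0 + 1e-8], rtol=1e-8, atol=1e-10)

y_exact_end = exact.y[0, -1]      # stays on the constant solution y = -1
y_pert_end = perturbed.y[0, -1]   # climbs toward 1, like -cos(t)
```

The two final values differ by nearly 2, even though the initial conditions differ by $10^{-8}$: the infinite Jacobian destroys continuous dependence on initial data.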

I was curious to see how the Symbolic Toolbox handles this equation with this initial condition. There is good news and surprising news.

```
syms y(t)
ode = diff(y)^2+y^2==1
init = y(0)==-1
ysym = dsolve(ode,init)
```

```
ode(t) = diff(y(t), t)^2 + y(t)^2 == 1
init = y(0) == -1
ysym =
  -1
 -cosh(t*1i)
```

I am pleased to see the two solutions confirmed, but I am surprised to see the second one expressed as a hyperbolic trig function of an imaginary argument.

Simplification generates a more familiar form.

```
simplify(ysym)
```

```
ans =
  -1
 -cos(t)
```
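A quick numerical check, using NumPy instead of the Symbolic Toolbox, confirms both statements: $-\cosh{(it)}$ is $-\cos{t}$ in disguise, and both reported solutions satisfy the original equation.

```python
import numpy as np

t = np.linspace(0.0, np.pi, 201)

# dsolve's second solution, -cosh(t*1i), agrees with -cos(t):
gap = np.max(np.abs(-np.cosh(1j * t) - (-np.cos(t))))

# Both reported solutions satisfy ydot^2 + y^2 = 1:
res_const = 0.0**2 + (-1.0)**2 - 1.0                          # y = -1
res_trig = np.max(np.abs(np.sin(t)**2 + np.cos(t)**2 - 1.0))  # y = -cos(t)
```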

Let's return to the original implicit equation and differentiate the entire equation with respect to $t$.

$$ \frac{d}{dt} \left( \dot{y}^2 + y^2 = 1 \right) $$

$$ 2 \dot{y} \ddot{y} + 2 y \dot{y} = 0 $$

We can divide through this equation by $2 \dot{y}$, provided that ${\dot{y}}$ is nonzero. This eliminates the constant solutions and produces an old friend, the harmonic oscillator.

$$ \ddot{y} + y = 0 $$

Our ODE solvers need the second order equation to be expressed as a first order system.

$$ \dot{y_1} = y_2, \ \ \dot{y_2} = -y_1 $$

Now there is no trouble computing $\sin{t}$ and $-\cos{t}$.

```
f = @(t,y)[y(2); -y(1)]
y0 = [0; -1]
ode23(f,tspan,y0)
pi_axis([-1 1])
```

```
f = @(t,y)[y(2);-y(1)]
y0 =
     0
    -1
```

I have been using this little function to label the x-axis with multiples of $\pi$ and scale the y-axis.

```
type pi_axis
```

```
function pi_axis(yaxis)
    axis([0 pi yaxis])
    set(gca,'xtick',0:pi/4:pi, ...
        'xticklabels',{0 '\pi/4' '\pi/2' '3\pi/4' '\pi'})
end
```


Published with MATLAB® R2016a


I am launching a project that I call *Cleve's Laboratory*. My most recent Cleve's Corner column in MathWorks News & Notes is about the project. The MATLAB app itself is available for download from the MATLAB Central File Exchange.

*Cleve’s Laboratory* collects much of the work I have done over the last several years in one place. The experiments come from this blog, my two online books, *Cleve’s Corner* columns in News & Notes, and new work.

Each experiment centers on an interactive “app” that allows experimenters to try out the ideas for themselves. Several of the experiments, including `hello_world`, `klock`, and `biorhythms`, are designed to introduce newcomers to MATLAB.

This figure shows snapshots of some of the graphical entry pages. Even these are live pages driven by MATLAB. The wave in the first icon moves every time you open the page. The clock reads the correct time. The Sudoku puzzle changes. The double pendulum swings.

Here are the apps in the current release. If you have been following this blog, you might recognize some of them.

- logo_wave
- biorhythms
- klock
- lifex
- fibonacci
- fern
- house_mult
- t_puzzle
- tictactoe
- flame
- predprey
- mandelbrot
- durerperm
- waterwave
- expshow
- sudoku
- orbits
- golden_spiral
- walker
- hello_world
- blackjack
- swinger
- eigsvdapp
- censusapp
- touchtone
- pdeapp
- interpapp
- waves
- tumbling_box


Published with MATLAB® R2016a


This is a follow-up to my previous follow-up, posted several days ago. A very careful reader, Bruno Bazzano, contributed a comment pointing out what he called "a small typo" in my code for the classic Gram-Schmidt algorithm. It is more than a small typo, it is a serious blunder. I must correct the code, then do more careful experiments and reword my conclusions.

Bruno noticed that in the inner loop of my classic Gram-Schmidt function, the expression `Q(:,k-1)` should be `Q(:,1:k-1)`. I had dropped the two characters, " `1:` ". I need to orthogonalize against all the previous vectors, not just one of them. Here is the corrected code.

```
type gs
```

```
function [Q,R] = gs(X)
% Classical Gram-Schmidt.  [Q,R] = gs(X);
% G. W. Stewart, "Matrix Algorithms, Volume 1", SIAM, 1998.
    [n,p] = size(X);
    Q = zeros(n,p);
    R = zeros(p,p);
    for k = 1:p
        Q(:,k) = X(:,k);
        if k ~= 1
            R(1:k-1,k) = Q(:,1:k-1)'*Q(:,k);
            Q(:,k) = Q(:,k) - Q(:,1:k-1)*R(1:k-1,k);
        end
        R(k,k) = norm(Q(:,k));
        Q(:,k) = Q(:,k)/R(k,k);
    end
end
```
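For readers outside MATLAB, here is a hedged NumPy sketch of the same corrected algorithm; the well-conditioned test matrix is my own, not one from the post.

```python
import numpy as np

def gs(X):
    # Classical Gram-Schmidt with the corrected inner step: column k is
    # orthogonalized against *all* previous columns Q[:, :k] at once.
    n, p = X.shape
    Q = np.zeros((n, p))
    R = np.zeros((p, p))
    for k in range(p):
        Q[:, k] = X[:, k]
        R[:k, k] = Q[:, :k].T @ Q[:, k]
        Q[:, k] -= Q[:, :k] @ R[:k, k]
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
    return Q, R

# A small well-conditioned test matrix (eigenvalues 1 and 3.5, so kappa = 3.5).
X = np.eye(5) + 0.5 * np.ones((5, 5))
Q, R = gs(X)
```

On a well-conditioned matrix like this one, both `Q@R - X` and `Q.T@Q - I` are at roundoff level.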

By the way, this points out the importance of publishing code.

For various matrices `X`, let's again check how the three algorithms perform on two tasks, accuracy and orthogonality. How close is `Q*R` to `X`? And how close is `Q'*Q` to `I`? I have modified the `compare` function from last week's post to return `ortherr`.

First, try a well conditioned matrix, a magic square of odd order.

```
n = 7;
X = magic(n)
```

```
X =
    30    39    48     1    10    19    28
    38    47     7     9    18    27    29
    46     6     8    17    26    35    37
     5    14    16    25    34    36    45
    13    15    24    33    42    44     4
    21    23    32    41    43     3    12
    22    31    40    49     2    11    20
```

Check its condition.

```
kappa = condest(X)
```

```
kappa =
    8.5681
```

Do the comparison.

```
compare(X);
```

```
                  Classic     Modified    Householder
QR error         1.52e-16     6.09e-17     5.68e-16
Orthogonality    1.87e-15     1.53e-15     1.96e-15
```

The `orthogonality` value in the `Classic` column is now at roundoff level. All three algorithms do well with both accuracy and orthogonality on this matrix.

I want to do two experiments with matrices that are poorly conditioned, but not exactly singular. First, Hilbert matrices of increasing order.

```
kappa = zeros(8,1);
ortherr = zeros(8,3);
for n = 1:8
    X = hilb(n);
    kappa(n) = condest(X);
    ortherr(n,:) = compare(X);
end
```

Plot the results with a logarithmic scale.

```
semilogy([ortherr eps*kappa eps*kappa.^2],'o-')
axis([0 9 10^(-16) 10^5])
legend({'gs','mgs','house','eps*kappa','eps*kappa^2'}, ...
    'location','northwest')
title('Loss of orthogonality')
xlabel('n')
```

The loss of orthogonality of GS is worse than the other two and roughly varies like the square of the condition number, `kappa`. The loss of orthogonality of MGS varies like `kappa` to the first power. Householder does well with orthogonality.
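The experiment translates readily. Below is a hedged NumPy version that builds `hilb(8)` directly and uses `numpy.linalg.qr` (which calls LAPACK's Householder-based factorization) as the third method; the helper names `cgs`, `mgs`, and `orth_err` are my own.

```python
import numpy as np

def cgs(X):
    # Classical Gram-Schmidt (corrected): project against all previous columns.
    n, p = X.shape
    Q, R = np.zeros((n, p)), np.zeros((p, p))
    for k in range(p):
        Q[:, k] = X[:, k]
        R[:k, k] = Q[:, :k].T @ Q[:, k]
        Q[:, k] -= Q[:, :k] @ R[:k, k]
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
    return Q

def mgs(X):
    # Modified Gram-Schmidt: orthogonalize against the emerging columns.
    n, p = X.shape
    Q, R = np.zeros((n, p)), np.zeros((p, p))
    for k in range(p):
        Q[:, k] = X[:, k]
        for i in range(k):
            R[i, k] = Q[:, i] @ Q[:, k]
            Q[:, k] -= R[i, k] * Q[:, i]
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
    return Q

def orth_err(Q):
    return np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]), np.inf)

n = 8
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)  # hilb(n)
kappa = np.linalg.cond(H)

e_gs = orth_err(cgs(H))
e_mgs = orth_err(mgs(H))
e_house = orth_err(np.linalg.qr(H)[0])  # LAPACK uses Householder reflections
```

The three errors line up as the plot suggests: Householder at roundoff level, MGS roughly `eps*kappa`, and classic GS much worse, consistent with `eps*kappa^2`.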

Another set of nonsingular, but poorly conditioned, matrices is generated by adding a small value to the diagonal of the magic square of order 8.

```
M = magic(8);
I = eye(8);
for k = 1:8
    s = 10^(-k);
    X = M + s*I;
    kappa(k) = condest(X);
    ortherr(k,:) = compare(X);
end
```

Plot the results with a logarithmic scale.

```
semilogy([ortherr eps*kappa eps*kappa.^2],'o-')
axis([0 9 10^(-16) 10^5])
legend({'gs','mgs','house','eps*kappa','eps*kappa^2'}, ...
    'location','northwest')
title('Loss of orthogonality')
xlabel('log10(1/s)')
```

Again, the orthogonality of GS varies like `kappa^2`, and the orthogonality of MGS varies like `kappa`.

Finally, an exactly singular matrix, a magic square of even order.

```
n = 8;
X = magic(n);
compare(X);
```

```
                  Classic     Modified    Householder
QR error         4.78e-17     8.54e-17     4.85e-16
Orthogonality    5.17e+00     2.16e+00     1.30e-15
```

All three algorithms do well with accuracy. Both Gram-Schmidts fail completely with orthogonality. Householder still does well with orthogonality.

All three of these algorithms provide `Q` and `R` that do a good job of reproducing the data `X`, that is

`Q*R` is always close to `X` for all three algorithms.

On the other hand, their behavior is very different when it comes to producing orthogonality.

- Classic Gram-Schmidt. Loss of orthogonality can be much worse than the other two when `X` is badly conditioned.
- Modified Gram-Schmidt. `Q'*Q` varies like the condition of `X` and fails completely when `X` is singular.
- Householder triangularization. `Q'*Q` is always close to `I`.

G. W. Stewart, *Matrix Algorithms: Volume 1: Basic Decompositions*, SIAM, xix+458, 1998. <http://epubs.siam.org/doi/book/10.1137/1.9781611971408>

Åke Björck, *Numerical Methods for Least Squares Problems*, SIAM, xvii+407, 1996. <http://epubs.siam.org/doi/book/10.1137/1.9781611971484>


Published with MATLAB® R2016a


This is a follow-up to my previous post. Classical Gram-Schmidt and Modified Gram-Schmidt are two algorithms for orthogonalizing a set of vectors. Householder elementary reflectors can be used for the same task. The three algorithms have very different roundoff error properties.

As I did in my previous post, I am using Pete Stewart's book *Matrix Algorithms, Volume I: Basic Decompositions*. His pseudocode is MATLAB ready.

The classic Gram-Schmidt algorithm is the first thing you might think of for producing an orthogonal set of vectors. For each vector in your data set, remove its projection onto the data set, normalize what is left, and include it in the orthogonal set. Here is the code. `X` is the original set of vectors, `Q` is the resulting set of orthogonal vectors, and `R` is the set of coefficients, organized into an upper triangular matrix.

```
type gs
```

```
function [Q,R] = gs(X)
% Classical Gram-Schmidt.  [Q,R] = gs(X);
% G. W. Stewart, "Matrix Algorithms, Volume 1", SIAM, 1998.
    [n,p] = size(X);
    Q = zeros(n,p);
    R = zeros(p,p);
    for k = 1:p
        Q(:,k) = X(:,k);
        if k ~= 1
            R(1:k-1,k) = Q(:,k-1)'*Q(:,k);
            Q(:,k) = Q(:,k) - Q(:,1:k-1)*R(1:k-1,k);
        end
        R(k,k) = norm(Q(:,k));
        Q(:,k) = Q(:,k)/R(k,k);
    end
end
```

This is a rather different algorithm, not just a simple modification of classical Gram-Schmidt. The idea is to orthogonalize against the emerging set of vectors instead of against the original set. There are two variants, a column-oriented one and a row-oriented one. They produce the same results, in different order. Here is the column version.

```
type mgs
```

```
function [Q,R] = mgs(X)
% Modified Gram-Schmidt.  [Q,R] = mgs(X);
% G. W. Stewart, "Matrix Algorithms, Volume 1", SIAM, 1998.
    [n,p] = size(X);
    Q = zeros(n,p);
    R = zeros(p,p);
    for k = 1:p
        Q(:,k) = X(:,k);
        for i = 1:k-1
            R(i,k) = Q(:,i)'*Q(:,k);
            Q(:,k) = Q(:,k) - R(i,k)*Q(:,i);
        end
        R(k,k) = norm(Q(:,k));
        Q(:,k) = Q(:,k)/R(k,k);
    end
end
```
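The column-oriented update can be sketched in NumPy as follows (a hedged translation of the MATLAB code above; the test matrix is my own well-conditioned example):

```python
import numpy as np

def mgs(X):
    # Column-oriented Modified Gram-Schmidt: each new column is
    # orthogonalized against the emerging Q columns one at a time.
    n, p = X.shape
    Q = np.zeros((n, p))
    R = np.zeros((p, p))
    for k in range(p):
        Q[:, k] = X[:, k]
        for i in range(k):
            R[i, k] = Q[:, i] @ Q[:, k]
            Q[:, k] -= R[i, k] * Q[:, i]
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
    return Q, R

X = np.eye(5) + 0.5 * np.ones((5, 5))   # well conditioned (kappa = 3.5)
Q, R = mgs(X)
```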

Compute `R` by applying Householder reflections to `X` a column at a time. Do not actually compute `Q`, just save the vectors that generate the reflections. See the description and codes from my previous post.

```
type house_qr
```

```
function [U,R] = house_qr(A)
% Householder reflections for QR decomposition.
% [U,R] = house_qr(A) returns
%    U, the reflector generators for use by house_apply.
%    R, the upper triangular factor.
    H = @(u,x) x - u*(u'*x);
    [m,n] = size(A);
    U = zeros(m,n);
    R = A;
    for j = 1:min(m,n)
        u = house_gen(R(j:m,j));
        U(j:m,j) = u;
        R(j:m,j:n) = H(u,R(j:m,j:n));
        R(j+1:m,j) = 0;
    end
end
```

For various matrices `X`, let's check how the three algorithms perform on two tasks, accuracy and orthogonality. How close is `Q*R` to `X`? And how close is `Q'*Q` to `I`?

```
type compare.m
```

```
function compare(X)
% compare(X).  Compare three QR decompositions.
    I = eye(size(X));
    qrerr = zeros(1,3);
    ortherr = zeros(1,3);

    %% Classic Gram Schmidt
    [Q,R] = gs(X);
    qrerr(1) = norm(Q*R-X,inf)/norm(X,inf);
    ortherr(1) = norm(Q'*Q-I,inf);

    %% Modified Gram Schmidt
    [Q,R] = mgs(X);
    qrerr(2) = norm(Q*R-X,inf)/norm(X,inf);
    ortherr(2) = norm(Q'*Q-I,inf);

    %% Householder QR Decomposition
    [U,R] = house_qr(X);
    QR = house_apply(U,R);
    QQ = house_apply_transpose(U,house_apply(U,I));
    qrerr(3) = norm(QR-X,inf)/norm(X,inf);
    ortherr(3) = norm(QQ-I,inf);

    %% Report results
    fprintf('\n                  Classic     Modified    Householder\n')
    fprintf('QR error       %10.2e   %10.2e   %10.2e\n',qrerr)
    fprintf('Orthogonality  %10.2e   %10.2e   %10.2e\n',ortherr)
```

First, try a well conditioned matrix, a magic square of odd order.

```
n = 7;
X = magic(n)
```

```
X =
    30    39    48     1    10    19    28
    38    47     7     9    18    27    29
    46     6     8    17    26    35    37
     5    14    16    25    34    36    45
    13    15    24    33    42    44     4
    21    23    32    41    43     3    12
    22    31    40    49     2    11    20
```

Check its condition.

```
kappa = condest(X)
```

```
kappa =
    8.5681
```

Do the comparison.

```
compare(X);
```

```
                  Classic     Modified    Householder
QR error         1.73e-16     6.09e-17     5.68e-16
Orthogonality    3.20e+00     1.53e-15     1.96e-15
```

All three algorithms do well with accuracy, but classic Gram-Schmidt fails with orthogonality.

Next, try a Hilbert matrix that is poorly conditioned, but not exactly singular.

```
n = 7;
X = hilb(n)
```

```
X =
    1.0000    0.5000    0.3333    0.2500    0.2000    0.1667    0.1429
    0.5000    0.3333    0.2500    0.2000    0.1667    0.1429    0.1250
    0.3333    0.2500    0.2000    0.1667    0.1429    0.1250    0.1111
    0.2500    0.2000    0.1667    0.1429    0.1250    0.1111    0.1000
    0.2000    0.1667    0.1429    0.1250    0.1111    0.1000    0.0909
    0.1667    0.1429    0.1250    0.1111    0.1000    0.0909    0.0833
    0.1429    0.1250    0.1111    0.1000    0.0909    0.0833    0.0769
```

Check its condition.

```
kappa = condest(X)
```

```
kappa =
   9.8519e+08
```

Do the comparison.

```
compare(X)
```

```
                  Classic     Modified    Householder
QR error         5.35e-17     5.35e-17     8.03e-16
Orthogonality    5.21e+00     1.22e-08     1.67e-15
```

All three algorithms do well with accuracy. Classic Gram-Schmidt fails completely with orthogonality. The orthogonality of MGS depends upon `kappa`. Householder does well with orthogonality.

Finally, an exactly singular matrix, a magic square of even order.

```
n = 8;
X = magic(n)
```

```
X =
    64     2     3    61    60     6     7    57
     9    55    54    12    13    51    50    16
    17    47    46    20    21    43    42    24
    40    26    27    37    36    30    31    33
    32    34    35    29    28    38    39    25
    41    23    22    44    45    19    18    48
    49    15    14    52    53    11    10    56
     8    58    59     5     4    62    63     1
```

Check its rank.

```
rankX = rank(X)
```

```
rankX =
     3
```

Do the comparison.

```
compare(X)
```

```
                  Classic     Modified    Householder
QR error         1.43e-16     8.54e-17     4.85e-16
Orthogonality    5.41e+00     2.16e+00     1.30e-15
```

Again, all three algorithms do well with accuracy. Both Gram-Schmidts fail completely with orthogonality. Householder still does well with orthogonality.

All three of these algorithms provide `Q` and `R` that do a good job of reproducing the data `X`, that is

`Q*R` is always close to `X` for all three algorithms.

On the other hand, their behavior is very different when it comes to producing orthogonality.

- Classic Gram-Schmidt. `Q'*Q` is almost never close to `I`.
- Modified Gram-Schmidt. `Q'*Q` depends upon the condition of `X` and fails completely when `X` is singular.
- Householder triangularization. `Q'*Q` is always close to `I`.

G. W. Stewart, *Matrix Algorithms: Volume 1: Basic Decompositions*, SIAM, xix+458, 1998. <http://epubs.siam.org/doi/book/10.1137/1.9781611971408>


Published with MATLAB® R2016a


The QR decomposition is often the first step in algorithms for solving many different matrix problems, including linear systems, eigenvalues, and singular values. Householder reflections are the preferred tool for computing the QR decomposition.

Alston Householder (1904-1993) is one of the pioneers of modern numerical linear algebra. He was a member of the mathematics division of Oak Ridge National Laboratory for over 20 years, from 1946 until 1969, and was also a Professor at the University of Tennessee.

Alston served as President of both SIAM and ACM. He introduced what he called *elementary Hermitian matrices* in a paper in the Journal of the ACM in 1958.

Alston was head of the organizing committee for the Gatlinburg Conferences on Numerical Algebra. A photo of the 1964 committee is available in your MATLAB `demos` directory.

```
load gatlin
image(X)
colormap(map)
```

The Gatlinburg Conferences are now called the Householder Conferences. I wrote about them in MATLAB News & Notes.

My colleague and friend G. W. Stewart is a Distinguished University Professor Emeritus at the Department of Computer Science, University of Maryland. Everybody knows him as "Pete". He has never been able to satisfactorily explain the origins of "Pete" to me. It somehow goes back through his father to his grandfather and maybe great grandfather, who were also nicknamed "Pete".

Pete was Householder's Ph.D. student at the University of Tennessee.

Pete has written several books on numerical linear algebra. The reference for my blog today is his book "Matrix Algorithms, Volume I: Basic Decompositions", published by SIAM. His pseudocode is MATLAB ready.

The QR decomposition expresses a matrix as the product of an orthogonal matrix and an upper triangular matrix. The letter Q is a substitute for the letter O from "orthogonal" and the letter R is from "right", an alternative for "upper".

The decomposition is available explicitly from the MATLAB function `qr`. But, more importantly, the decomposition is part of the fundamental MATLAB linear equation solver denoted by backslash, " `\` ", as well as both the `eig` and `svd` functions for dense matrices.

For any matrix $A$ we write

$$ A = QR $$

Operations based upon orthogonal matrices are very desirable in numeric computation because they do not magnify errors, either those inherited from the underlying data, or those introduced by floating point arithmetic.

A nifty example, taken from the *Wikipedia* page on "QR decomposition", is unusual because $A$, $R$, and a renormalization of $Q$ all have integer entries.

```
A = [12 -51 4; 6 167 -68; -4 24 -41]
[Q,R] = qr(A)
```

```
A =
    12   -51     4
     6   167   -68
    -4    24   -41
Q =
   -0.8571    0.3943    0.3314
   -0.4286   -0.9029   -0.0343
    0.2857   -0.1714    0.9429
R =
  -14.0000  -21.0000   14.0000
         0 -175.0000   70.0000
         0         0  -35.0000
```

Scale $Q$ to get integers.

```
cQ = 175*Q
```

```
cQ =
 -150.0000   69.0000   58.0000
  -75.0000 -158.0000   -6.0000
   50.0000  -30.0000  165.0000
```

Let's check that $Q$ and $R$ reproduce $A$.

```
QR = Q*R
```

```
QR =
   12.0000  -51.0000    4.0000
    6.0000  167.0000  -68.0000
   -4.0000   24.0000  -41.0000
```
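For comparison, the same checks can be run with NumPy's built-in QR, which also uses Householder reflections via LAPACK. This is a hedged sketch; note that `numpy.linalg.qr` may choose different column signs than MATLAB's `qr`, so the integer property of `175*Q` is checked up to those sign choices.

```python
import numpy as np

A = np.array([[12.0, -51.0, 4.0],
              [6.0, 167.0, -68.0],
              [-4.0, 24.0, -41.0]])

Q, R = np.linalg.qr(A)   # LAPACK Householder QR

recon = np.linalg.norm(Q @ R - A)          # Q*R reproduces A
orth = np.linalg.norm(Q.T @ Q - np.eye(3)) # Q is orthogonal

# 175*Q has integer entries (up to the factorization's sign choices).
cQ = 175 * Q
int_gap = np.max(np.abs(cQ - np.round(cQ)))
```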

A Householder reflection is characterized by a vector $u$, which, following Pete's convention, is normalized to have

$$ ||u|| = \sqrt{2} $$

It is usual to define a Householder reflection by a matrix. But I want to define it in a different way, by a MATLAB anonymous function.

H = @(u,x) x - u*(u'*x);

This emphasizes the way in which the reflection should be computed. For any vector $x$, find its projection onto $u$ and then subtract that multiple of $u$ from $x$. In the following picture $u_p$ is a line perpendicular to $u$. In more dimensions $u_p$ would be the plane through the origin perpendicular to $u$. With the $\sqrt{2}$ normalization of $u$, the effect of $H$ is to transform any vector $x$ to its mirror image $Hx$ on the other side of this plane. (Vectors in the plane are left there by $H$.)

house_pic

Where do we get the vector `u` that characterizes the reflection? We want to operate on the columns of a matrix to introduce zeros below the diagonal. Let's begin with the first column, call it `x`. We want to find `u` so that `Hx = H(u,x)` is zero below the first element. And, since the reflection is to preserve length, the only nonzero element in `Hx` should have `abs(Hx(1)) == norm(x)`.

Start with `x` normalized to have length one.

u = x/norm(x)

Now add `+1` or `-1` to `u(1)`, choosing the sign to match. The other choice involves cancellation and is less desirable numerically.

u(1) = u(1) + sign(u(1))

Finally, scale `u` by `sqrt(abs(u(1)))`.

u = u/sqrt(abs(u(1)))

It turns out that this makes `norm(u)` equal to `sqrt(2)`. And a bit more algebra shows that the resulting reflection zeros out all but the first element of `x`.

The figure above illustrates the situation because `Hx` has only one nonzero component.

Here is the code, including the case where `x` is already all zero.

```
type house_gen
```

```
function u = house_gen(x)
% u = house_gen(x)
% Generate Householder reflection.
% u = house_gen(x) returns u with norm(u) = sqrt(2), and
% H(u,x) = x - u*(u'*x) = -+ norm(x)*e_1.

    % Modify the sign function so that sign(0) = 1.
    sig = @(u) sign(u) + (u==0);

    nu = norm(x);
    if nu ~= 0
        u = x/nu;
        u(1) = u(1) + sig(u(1));
        u = u/sqrt(abs(u(1)));
    else
        u = x;
        u(1) = sqrt(2);
    end
end
```

Let's try this on the first column of the *Wikipedia* example.

```
x = [12 6 -4]'
```

```
x =
    12
     6
    -4
```

The vector `u` generated is

```
u = house_gen(x)
```

```
u =
    1.3628
    0.3145
   -0.2097
```

The resulting reflection has the desired effect.

```
Hx = H(u,x)
```

```
Hx =
  -14.0000
   -0.0000
    0.0000
```

`Hx(1)` is equal to `-norm(x)` and the other elements of `Hx` are zero.
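The generator and the reflection can be sketched in NumPy as follows (a hedged translation of the MATLAB recipe, not the post's code verbatim):

```python
import numpy as np

def house_gen(x):
    # Normalize, add +-1 to the first component (sign chosen to avoid
    # cancellation), then scale so that norm(u) = sqrt(2).
    x = np.asarray(x, dtype=float)
    nu = np.linalg.norm(x)
    u = x.copy()
    if nu == 0.0:
        u[0] = np.sqrt(2.0)
        return u
    u /= nu
    u[0] += 1.0 if u[0] >= 0 else -1.0   # sign(0) treated as +1
    return u / np.sqrt(abs(u[0]))

H = lambda u, x: x - u * (u @ x)   # reflection applied to a single vector

x = np.array([12.0, 6.0, -4.0])
u = house_gen(x)
Hx = H(u, x)
```

As in the MATLAB run, `Hx` comes out as `[-14, 0, 0]` to roundoff, and `u` has norm `sqrt(2)`.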

Our anonymous function can be represented by a matrix. This is the usual way of defining Householder reflections.

```
I = eye(3);
M = H(u,I)
```

```
M =
   -0.8571   -0.4286    0.2857
   -0.4286    0.9011    0.0659
    0.2857    0.0659    0.9560
```

In general

$$M = I - uu'$$

Multiplication by this matrix produces the same reflection as the anonymous function, but requires $n^2$ operations, instead of $2n$.

Recall that Gaussian elimination can be described by a sequence of matrix multiplications, but it is actually carried out by subtracting multiples of rows from other rows. There is an anonymous function for this elimination operation, but it is not so easy to express the pivoting.

We are now ready to compute the QR decomposition. Pass over the columns of the input matrix, using Householder reflections to introduce zeros below the diagonal. This produces the R matrix. It is inefficient and usually not necessary to actually compute Q. Just save the `u`'s.

Here is where the way we have expressed the anonymous function is important. When `x` is a vector, `u'*x` is a scalar and we could have written it in front of `u`, like this.

H = @(u,x) x - (u'*x)*u;

But we want to apply `H` to several columns of a matrix at once, so we have written `(u'*x)` after `u`, like this,

H = @(u,x) x - u*(u'*x);

```
type house_qr
```

```
function [R,U] = house_qr(A)
% Householder reflections for QR decomposition.
% [R,U] = house_qr(A) returns
%    R, the upper triangular factor, and
%    U, the reflector generators for use by house_apply.
    H = @(u,x) x - u*(u'*x);
    [m,n] = size(A);
    U = zeros(m,n);
    R = A;
    for j = 1:min(m,n)
        u = house_gen(R(j:m,j));
        U(j:m,j) = u;
        R(j:m,j:n) = H(u,R(j:m,j:n));
        R(j+1:m,j) = 0;
    end
end
```

Use a magic square as an example.

```
A = magic(6)
```

```
A =
    35     1     6    26    19    24
     3    32     7    21    23    25
    31     9     2    22    27    20
     8    28    33    17    10    15
    30     5    34    12    14    16
     4    36    29    13    18    11
```

Compute its decomposition.

```
[R,U] = house_qr(A)
```

```
R =
  -56.3471  -16.4693  -30.0459  -39.0969  -38.0321  -38.6710
         0  -54.2196  -34.8797  -23.1669  -25.2609  -23.2963
         0         0   32.4907   -8.9182  -11.2895   -7.9245
         0         0         0   -7.6283    3.9114   -7.4339
         0         0         0         0   -3.4197   -6.8393
         0         0         0         0         0   -0.0000
U =
    1.2732         0         0         0         0         0
    0.0418    1.2568         0         0         0         0
    0.4321    0.0451   -1.1661         0         0         0
    0.1115    0.3884    0.4557    1.0739         0         0
    0.4182   -0.0108    0.5942   -0.6455    1.0796         0
    0.0558    0.5171    0.2819   -0.6558   -0.9135    1.4142
```

The fact that `R(6,6)` is equal to zero tells us that the magic square of order 6 has rank less than 6 and so is singular.

```
type house_apply
type house_apply_transpose
```

```
function Z = house_apply(U,X)
% Apply Householder reflections.
% Z = house_apply(U,X), with U from house_qr,
% computes Q*X without actually computing Q.
    H = @(u,x) x - u*(u'*x);
    Z = X;
    [~,n] = size(U);
    for j = n:-1:1
        Z = H(U(:,j),Z);
    end
end

function Z = house_apply_transpose(U,X)
% Apply Householder transposed reflections.
% Z = house_apply_transpose(U,X), with U from house_qr,
% computes Q'*X without actually computing Q'.
    H = @(u,x) x - u*(u'*x);
    Z = X;
    [~,n] = size(U);
    for j = 1:n
        Z = H(U(:,j),Z);
    end
end
```
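Putting the pieces together, here is a hedged NumPy sketch of `house_gen`, `house_qr`, and `house_apply`; the 6-by-6 test matrix is my own, since NumPy has no `magic`.

```python
import numpy as np

def house_gen(x):
    # Generate u with norm(u) = sqrt(2) so x reflects onto -+norm(x)*e_1.
    x = np.asarray(x, dtype=float)
    nu = np.linalg.norm(x)
    u = x.copy()
    if nu == 0.0:
        u[0] = np.sqrt(2.0)
        return u
    u /= nu
    u[0] += 1.0 if u[0] >= 0 else -1.0   # sign(0) treated as +1
    return u / np.sqrt(abs(u[0]))

def house_qr(A):
    # R and the reflector generators U; Q is never formed.
    H = lambda u, Z: Z - np.outer(u, u @ Z)
    m, n = A.shape
    U = np.zeros((m, n))
    R = np.asarray(A, dtype=float).copy()
    for j in range(min(m, n)):
        u = house_gen(R[j:, j])
        U[j:, j] = u
        R[j:, j:] = H(u, R[j:, j:])
        R[j + 1:, j] = 0.0
    return R, U

def house_apply(U, X):
    # Compute Q @ X by applying the reflections in reverse order.
    H = lambda u, Z: Z - np.outer(u, u @ Z)
    Z = np.asarray(X, dtype=float).copy()
    for j in range(U.shape[1] - 1, -1, -1):
        Z = H(U[:, j], Z)
    return Z

A = np.arange(1.0, 37.0).reshape(6, 6)   # any 6-by-6 test matrix
R, U = house_qr(A)
Q = house_apply(U, np.eye(6))
```

Applying the saved reflections to the identity recovers `Q`, and `Q @ R` reproduces `A` even though this test matrix is rank deficient.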

We can use `house_apply` to get the matrix $Q$ of the QR decomposition by applying the transformations to the identity matrix.

```
I = eye(size(U));
Q = house_apply(U,I)
```

```
Q =
   -0.6211    0.1702   -0.2070   -0.4998    0.2062    0.5000
   -0.0532   -0.5740   -0.4500   -0.2106   -0.6487   -0.0000
   -0.5502    0.0011   -0.4460    0.4537    0.2062   -0.5000
   -0.1420   -0.4733    0.3763   -0.5034    0.3329   -0.5000
   -0.5324    0.0695    0.6287    0.2096   -0.5220   -0.0000
   -0.0710   -0.6424    0.1373    0.4501    0.3329    0.5000
```

Check that `Q` is orthogonal.

```
QQ = Q'*Q
```

```
QQ =
    1.0000   -0.0000    0.0000    0.0000   -0.0000   -0.0000
   -0.0000    1.0000   -0.0000   -0.0000   -0.0000    0.0000
    0.0000   -0.0000    1.0000   -0.0000   -0.0000    0.0000
    0.0000   -0.0000   -0.0000    1.0000   -0.0000   -0.0000
   -0.0000   -0.0000   -0.0000   -0.0000    1.0000    0.0000
   -0.0000    0.0000    0.0000   -0.0000    0.0000    1.0000
```

And that `Q*R` regenerates our magic square.

```
QR = Q*R
```

```
QR =
   35.0000    1.0000    6.0000   26.0000   19.0000   24.0000
    3.0000   32.0000    7.0000   21.0000   23.0000   25.0000
   31.0000    9.0000    2.0000   22.0000   27.0000   20.0000
    8.0000   28.0000   33.0000   17.0000   10.0000   15.0000
   30.0000    5.0000   34.0000   12.0000   14.0000   16.0000
    4.0000   36.0000   29.0000   13.0000   18.0000   11.0000
```

Alston S. Householder, "Unitary Triangularization of a Nonsymmetric Matrix", *Journal of the ACM* 5, 339-342, 1958. <http://dl.acm.org/citation.cfm?doid=320941.320947>

G. W. Stewart, *Matrix Algorithms: Volume 1: Basic Decompositions*, SIAM, xix+458, 1998. <http://epubs.siam.org/doi/book/10.1137/1.9781611971408>


Published with MATLAB® R2016a