The hypergeometric distribution is a probability distribution with parameters *N*, *M*, and *n*. Suppose you have an urn containing *N* balls, *M* of them red and the remaining *N* − *M* blue, and you select *n* balls at a time. The hypergeometric distribution gives the probability of selecting *k* red balls.
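For concreteness, here is a short sketch checking the pmf against SciPy. Note that SciPy's `hypergeom` uses its own parameter names (total population, number of success states, number of draws), which don't line up letter-for-letter with the *N*, *M*, *n* above.

```python
from math import comb
from scipy.stats import hypergeom

# N balls total, M red, n drawn; P(k red) = C(M,k) C(N-M, n-k) / C(N, n)
N, M, n, k = 10, 4, 5, 2
direct = comb(M, k) * comb(N - M, n - k) / comb(N, n)

# SciPy's hypergeom takes (M=population size, n=success states, N=draws),
# so the arguments below are deliberately reordered
p = hypergeom.pmf(k, N, M, n)
assert abs(p - direct) < 1e-12
```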

The **probability generating function** for a discrete distribution is the series formed by summing, over each outcome *k*, the product of the probability of *k* and *x*^{k}. So the probability generating function for a hypergeometric distribution is given by

The summation is over all integers, but the terms are only non-zero for *k* between max(0, *n* − (*N* − *M*)) and min(*M*, *n*) inclusive. (This may be more general than the definition of binomial coefficients you’ve seen before. If so, see these notes on the general definition of binomial coefficients.)

It turns out that *f* is a **hypergeometric function** of *x* because it can be written as a **hypergeometric series**. (Strictly speaking, *f* is a constant multiple of a hypergeometric function. More on that in a moment.)

A hypergeometric function is defined by a pattern in its power series coefficients. The hypergeometric function *F*(*a*, *b*; *c*; *x*) has the power series

where (*n*)_{k} is the *k*th rising power of *n*. It’s a variation on the factorial: start with *n* and multiply *k* consecutive *increasing* integers. (*n*)_{0} is an empty product, so it equals 1. (*n*)_{1} = *n*, (*n*)_{2} = *n*(*n*+1), etc.
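The rising power is also known as the Pochhammer symbol, and SciPy exposes it as `poch`. A quick check of the values above:

```python
from scipy.special import poch

# (n)_k = n (n+1) ... (n+k-1): k increasing factors starting at n
assert abs(poch(5, 0) - 1) < 1e-9            # empty product
assert abs(poch(5, 1) - 5) < 1e-9
assert abs(poch(5, 2) - 5 * 6) < 1e-9        # n(n+1)
assert abs(poch(3, 4) - 3 * 4 * 5 * 6) < 1e-9
```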

If the ratio of the (*k*+1)st term to the *k*th term in a power series is a rational function of *k*, then the series is (a multiple of) a hypergeometric series, and you can read the parameters of the hypergeometric series off that rational function. This ratio for our probability generating function works out to be

and so the corresponding hypergeometric function is *F*(−*M*, −*n*; *N* − *M* − *n* + 1; *x*). The constant term of a hypergeometric function is always 1, so evaluating our probability generating function at 0 tells us the constant multiplying *F*(−*M*, −*n*; *N* − *M* − *n* + 1; *x*). Now

and so

The hypergeometric series above gives the original hypergeometric function as defined by Gauss, and may be the most common form in applications. But the definition has been extended to allow any number of rising powers in the numerator and denominator of the coefficients. The classical hypergeometric function of Gauss is denoted _{2}*F*_{1} because it has two rising powers on top and one on bottom. In general, the hypergeometric function _{p}*F*_{q} has *p* rising powers in the numerator and *q* rising powers in the denominator.
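As a numerical sanity check, here is a sketch using mpmath. The constant below is the value of the probability generating function at 0, i.e. the probability of drawing no red balls, and a pgf must equal 1 at *x* = 1. The sample parameter values are hypothetical.

```python
from math import comb
from mpmath import hyp2f1

# Example parameters: N balls, M red, n drawn (hypothetical values)
N, M, n = 10, 4, 3

# The constant is the pgf at 0, i.e. P(no red balls) = C(N-M, n) / C(N, n)
const = comb(N - M, n) / comb(N, n)

# pgf(x) = const * 2F1(-M, -n; N-M-n+1; x); a pgf must equal 1 at x = 1
val = const * hyp2f1(-M, -n, N - M - n + 1, 1)
assert abs(val - 1) < 1e-12
```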

The CDF of a hypergeometric distribution turns out to be a more general hypergeometric function:

where *a* = 1, *b* = *k*+1-*M*, *c* = *k*+1-*n*, *d = k*+2, and *e* = *N*+*k*+2-*M*–*n*.

Thanks to Jan Galkowski for suggesting this topic via a comment on an earlier post, Hypergeometric bootstrapping.

Every field has its own structure, its tropes, its traditions. Someone unfamiliar with the field can be overwhelmed, not having the framework that an insider has to understand things. They may think something is completely original when in fact the original portion is small.

In college I used to browse the journals in the math library and be completely overwhelmed. I didn’t learn until later that usually very little in a journal article is original, and even the original part isn’t *that* original. There’s a typical structure for a paper in PDEs, for example, just as there are typical structures for romantic comedies, symphonies, or space operas. A paper in partial differential equations might look like this:

- Motivation / previous work
- Weak formulation of PDE
- Craft function spaces and PDE as operator
- A priori estimates imply operator properties
- Well posedness results
- Regularity

An expert knows these structures. They know what’s boilerplate, what’s new, and just how new the new part is. When I wrote something up for my PhD advisor I remember him saying “You know what I find most interesting?” and pointing to one inequality. The part he found interesting, the *only* part he found interesting, was not that special from my perspective. It was all hard work for me, but only one part of it stood out as slightly original to him. An expert in partial differential equations sees a PDE paper the way a professional musician listens to another musician or the way a chess master sees a chess board.

While a math journal article may look totally incomprehensible, an expert in that specialization might see 10% of it as somewhat new. An interesting contrast to this is the “*abc* conjecture.” Three and a half years ago Shinichi Mochizuki proposed a proof of this conjecture. But his approach is so entirely idiosyncratic that nobody has been able to understand it. Even after a recent conference held for the sole purpose of penetrating this proof, nobody but Mochizuki really understands it. So even though most original research is not that original, once in a while something really new comes out.

Sometimes consultants working in software development will ask me to help out with some mathematical part of their projects. And sometimes math/stat folks will ask me to help out with some computational part of their projects.

I started my consulting business three years ago. Since then I’ve gotten to know a few other consultants well. This lets me offer a broader range of services to a client by bringing in other people, and sometimes it helps me find projects.

If you’re a consultant and interested in working together, please send me an email introducing yourself. I’m most interested in meeting consultants who have some overlap with what I do but who also have complementary strengths.

Unfortunately polylogarithms are defined in several slightly different and incompatible ways. I’ll start by following An Atlas of Functions and then mention differences in A&S, SciPy, and Mathematica.

According to *Atlas*,

Polylogarithms are themselves special cases of Lerch’s function. Also known as Jonquière’s functions (Ernest Jean Philippe Fauque de Jonquières, 1820–1901, French naval officer and mathematician), they appear in the Feynman diagrams of particle physics.

The idea is to introduce an extra parameter ν in the power series for natural log:

When ν = 1 we get the ordinary logarithm, i.e. polyln_{1} = log. Then polyln_{2} is the dilogarithm diln, and polyln_{3} is the trilogarithm triln.

One advantage of the definition in *Atlas* is that the logarithm is a special case of the polylogarithm. Other conventions don’t have this property.

The venerable A&S defines dilogarithms in a way that’s equivalent to the negative of the definition above and does not define polylogarithms of any other order. SciPy’s special function library follows A&S. SciPy uses the name `spence` for the dilogarithm for reasons we’ll get to shortly.

Mathematica has the function `PolyLog[ν, x]` that evaluates to the sum of *x*^{k}/*k*^{ν} over all positive integers *k*. So polyln_{ν} above corresponds to `-PolyLog[ν, -x]` in Mathematica. Matlab’s `polylog` is the same as Mathematica’s `PolyLog`.

**Spence’s integral** is the function of *x* given by the integral

and equals diln(*x*). Note that the SciPy function `spence` returns the negative of the integral above.
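Here is a sketch checking numerically how these conventions line up, using `mpmath.polylog` for Li_{ν} (Mathematica's `PolyLog`) and SciPy's `spence`. The identity spence(*x*) = Li_{2}(1 − *x*) is my reading of SciPy's convention.

```python
from mpmath import mp, polylog, pi, log
from scipy.special import spence

mp.dps = 25

# SciPy's spence(x) is the dilogarithm Li_2(1 - x)
assert abs(spence(0.0) - float(pi**2 / 6)) < 1e-12   # Li_2(1) = pi^2/6
assert abs(spence(0.5) - float(polylog(2, 0.5))) < 1e-12

# Atlas's polyln_nu(x) = -PolyLog[nu, -x]; at nu = 1 it is the ordinary log
assert abs(float(-polylog(1, -0.25)) - float(log(1.25))) < 1e-12
```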

The **Lerch function** mentioned above is named for Mathias Lerch (1860–1922) and is defined by the integral

The connection with polylogarithms is easier to see from the series expansion:

The connection with polylogarithms is then

Note that the Lerch function also generalizes the Hurwitz zeta function, which in turn generalizes the Riemann zeta function. When *x* = 1, the Lerch function reduces to ζ(ν, *u*).

Here’s one solution.

A[cglmrstu]|B[aehikr]?|C[adeflmnorsu]?|D[bsy]|E[rsu]|F[elmr]?|G[ade]|H[efgos]?|I[nr]?|Kr?|L[airuv]|M[dgnot]|N[abdeiop]?|Os?|P[abdmortu]?|R[abefghnu]|S[bcegimnr]?|T[abcehilm]|U(u[opst])?|V|W|Xe|Yb?|Z[nr]

Here’s the same expression in more readable form:

/ A[cglmrstu] | B[aehikr]? | C[adeflmnorsu]? | D[bsy] | E[rsu] | F[elmr]? | G[ade] | H[efgos]? | I[nr]? | Kr? | L[airuv] | M[dgnot] | N[abdeiop]? | Os? | P[abdmortu]? | R[abefghnu] | S[bcegimnr]? | T[abcehilm] | U(u[opst])? | V | W | Xe | Yb? | Z[nr] /x

The `/x` option in Perl says to ignore white space. Other regular expression implementations have something similar. Python has two such options, `X` for similarity with Perl and `VERBOSE` for readability. Both have the same behavior.

The regular expression says that a chemical element symbol may start with A, followed by c, g, l, m, r, s, t, or u; or a B, optionally followed by a, e, h, i, k, or r; or …

The most complicated part of the regex is the part for symbols starting with U. There’s Uranium, whose symbol is simply U, and there are the elements that have temporary names based on their atomic numbers: Ununtrium, Ununpentium, Ununseptium, and Ununoctium. These are just Latin for one-one-three, one-one-five, one-one-seven, and one-one-eight. The symbols are U, Uut, Uup, Uus, and Uuo. The regex `U(u[opst])?` can be read “U, optionally followed by u and one of o, p, s, or t.”

Note that the regex will match any string that contains a chemical element symbol, but it could match more. For example, it would match “I’ve never been to Boston in the fall” because that string contains B, the symbol for boron. Exercise: Modify the regex to *only* match chemical element symbols.
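One way to do the exercise, sketched in Python: compile the alternation with `re.VERBOSE` and use `fullmatch`, so the entire string must be a symbol.

```python
import re

# The element-symbol alternation from above, written with re.VERBOSE
pattern = re.compile(r"""
      A[cglmrstu] | B[aehikr]? | C[adeflmnorsu]? | D[bsy] | E[rsu]
    | F[elmr]?    | G[ade]     | H[efgos]?       | I[nr]? | Kr?
    | L[airuv]    | M[dgnot]   | N[abdeiop]?     | Os?    | P[abdmortu]?
    | R[abefghnu] | S[bcegimnr]? | T[abcehilm]   | U(u[opst])?
    | V | W | Xe | Yb? | Z[nr]
""", re.VERBOSE)

# fullmatch requires the whole string to be a symbol
assert pattern.fullmatch("He")
assert pattern.fullmatch("Uut")
assert pattern.fullmatch("Boston") is None   # contains B, but isn't a symbol
assert pattern.search("Boston")              # search still finds the B
```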

There may be clever ways to use fewer characters at the cost of being more obfuscated. But this is for fun anyway, so we’ll indulge in a little regex golf.

There are five elements whose symbols start with I or Z: I, In, Ir, Zn, and Zr. You could write `[IZ][nr]` to match four of these. The regex `I|[IZ][nr]` would represent all five with 10 characters, while `I[nr]?|Z[nr]` uses 12. Two characters saved! Can you cut out any more?

- Notes on regular expressions in Python, PowerShell, R, Mathematica, and C++
- Twitter account with daily regex tips

We can take the mean value property from the continuous case as the definition in the **discrete** case. A function on a graph is harmonic at a node *x* if *f*(*x*) equals the average of *f*‘s values on the nodes connected to *x*. A function is said to have a singularity at points of a graph where it is not harmonic.

A non-constant function on a finite connected graph cannot be harmonic at every node. It has to have at least two singularities, namely the points where it takes on its maximum and its minimum. This is analogous to Liouville’s theorem for (continuous) harmonic functions: a bounded harmonic function on all of Euclidean space must be constant. Some infinite graphs have non-constant functions that are harmonic everywhere. For example, linear functions on the integers are harmonic.

In the continuous case, we want to find solutions to Laplace’s equation satisfying a boundary condition. We want to find a function that has the specified values on the boundary and is harmonic in the interior. With the right regularity conditions, this solution exists and is unique.

We can do the same for discrete harmonic functions. If we specify the values of a function on some non-empty subset of nodes, there is a unique discrete harmonic function that extends the function to the rest of a graph.
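This boundary-value problem is easy to set up as a linear system. Here is a sketch in Python (the function name and graph encoding are my own, not from the post): at each interior node we impose *f*(*v*) − mean of neighbors = 0, and at each boundary node *f*(*v*) equals the prescribed value.

```python
import numpy as np

def harmonic_extension(neighbors, boundary):
    """Extend boundary values to a function harmonic at interior nodes.

    neighbors: dict node -> list of adjacent nodes (a connected graph)
    boundary:  dict node -> prescribed value (non-empty subset of nodes)
    """
    nodes = sorted(neighbors)
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    A, b = np.zeros((n, n)), np.zeros(n)
    for v in nodes:
        i = idx[v]
        A[i, i] = 1.0
        if v in boundary:
            b[i] = boundary[v]                    # f(v) is prescribed
        else:
            for u in neighbors[v]:                # f(v) = mean of neighbors
                A[i, idx[u]] = -1.0 / len(neighbors[v])
    x = np.linalg.solve(A, b)
    return {v: x[idx[v]] for v in nodes}

# Path 0-1-2-3-4 with f(0) = 0 and f(4) = 4; harmonic means each interior
# value is the average of its two neighbors, so the extension is linear.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
f = harmonic_extension(path, {0: 0.0, 4: 4.0})
assert all(abs(f[v] - v) < 1e-9 for v in path)
```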

Discrete harmonic functions come up in applications such as random walks and electrical networks.

**Related**: Can you hear the shape of a network?


**JC**: Ray, could you start by saying a little about yourself?

**RR**: Sure. My name is Ray Rilling, and I am the Director of Technology at Putnam Plastics.

My initial training was cellular biology with an emphasis on epidemiology, but I decided to move to a more applied (and lucrative) field. I’ve been in medical plastics for the last 23 years.

When I saw that epidemiology was a crowded field, I pursued substantial amounts of additional coursework that allowed me to get into the medical engineering plastics space. In lots of ways I think this pick-and-choose approach was much better than a traditional bio-medical engineering degree would have been. It allowed me to focus on exactly what I needed to learn.

**JC**: Thanks. And tell me a bit about where you work.

**RR**: Putnam Plastics is a 30-year-old medical plastics manufacturer, specializing in polymer-based medical components (i.e. plastic tubing for medical usage).

**JC**: I did some consulting work for a company that manufactures stents, so I imagine your companies have a few things in common. What is your day-to-day mathematical toolkit?

**RR**: We cross a lot of boundaries and there are many layers of computational needs. The majority of the computations are basic algebraic formulas to calculate simple things like annular tooling draw down ratios or the ultimate tensile strength of a tube. From there, you may find that calculating the burst pressure of a composite requires solving a linear system.

The most challenging computations that we perform are for our complex polymer extrusion tools. These are based on Computational Fluid Dynamics (CFD) and Finite or Boundary Element Methods (FEM / BEM). Many complexities emerge because polymer flow is compressible, non-isothermal, and viscoelastic in nature. It does not allow us to use conventional Navier-Stokes or other common equations. In all reality, we rely on proprietary equations to point us in the right direction. No model has ever provided a perfect solution. We combine our experience and expertise with the applied calculations.

The most complex of the computations are performed with third party software packages and often require the use of complex meshing, materials testing for inputs, and can even require us to create modified code to suit the specific application. We work with some very creative minds and there is never a shortage of new challenges.

Aside from the design engineering mathematics, we think statistically about our results. This can mean analyzing process capability, equivalency, and even product risk. This statistical analysis can be intertwined with the design engineering aspects and deal with a range of biological and medical constraints (i.e. devices might need to handle liquid nitrogen or remain in bodies for years or decades).

One of my mentors used to say, “The best equation is teamwork.” Teamwork is a sort of expectation, a crowd-sourced form of expert opinion. It allows you to focus on what the real variables are.

Some calculations can take weeks, so having a good appreciation of how to approach the challenge is important. You can get away with not knowing what’s under the hood. But that’s dangerous. It is much better to get things done more quickly, and with better understanding. Especially when a client is waiting on results.

**JC**: What is some math you wish you learned more of?

**RR**: Vector and tensor-based physics. As the bar in the medical device industry increases, so do the expectations of the products that we make. Being able to take a large number of variables or inputs and transform them into a workable model is extremely valuable. My dream is to be able to reliably calculate extrusion tooling, product failure modes, and performance properties before I start cutting steel. But in our time and age, we still rely on some development iterations and calculate what we can.

I also wish I had more applied materials science. When I am developing something in the lab, sometimes the product does not perform the way you want it to. Every time I start learning something new, like applied surface chemistry or the effects of radiation on polymers, I think of 100 things that I could have done in previous projects.

**JC**: Did you learn any math you haven’t had much use for yet? I’ve used most of the math I learned, though sometimes there’s been a gap of twenty years between learning something and applying it.

**RR**: I actually use pretty much all of the math I’ve learned. But that’s likely because I was able to pick and choose because I went back and did supplemental studies. Even the epidemiology is useful.

**JC**: I wanted to pick up on what you said earlier about physical models. Could you say more about the interplay of physical and mathematical models at Putnam?

**RR**: The models are never completely right. We use them as a tool. They often fail in three key ways:

- We find additional variables we need to account for or constrain. In many cases our first attempt failed because the product did not perform properly in an unaccounted for way. At the end of the day it might take many iterations before we have all of the performance properties that are needed.
- We are translating soft or subjective numbers from physicians or engineers. A model will provide concrete numbers. It is often important to create a baseline product that can take the customer’s subjective description and correlate it to a model.
- We need to additionally constrain for cost effectiveness (i.e. minimize the amount of expensive wire in a catheter).

Those three trade offs mean that we are never able to just take a model print out and run with it. There always has to be some back and forth.

**JC**: It’s interesting that you consider shortcomings besides inaccuracy as failures. Someone new to applying math in the real world might think of your first point but not the other two. The latter two aren’t failures from an academic perspective, but they certainly are from a business perspective.

* * *

**Other interviews**:

- Eric Floehr on weather forecast accuracy
- Sir Michael Atiyah on mathematics
- Rick Richter, CIO of Food for the Hungry

Hypergeometric functions satisfy a **lot** of identities, so you can bootstrap one such function into many more. That’s one reason they’re so useful in applications. For this post, I want to focus on just three formulas, the so-called linear transformation formulas that relate one hypergeometric function to another. (There are more linear formulas that relate one hypergeometric function to a combination of two others. I may consider those in a future post, but not today.)

The classical hypergeometric function has three parameters. The linear transformations discussed above relate the function with parameters (*a*, *b*, *c*) to those with parameters

- (*c* − *a*, *c* − *b*, *c*)
- (*a*, *c* − *b*, *c*)
- (*b*, *c* − *a*, *c*)

Here are the linear transformations in detail:

The three transformations above are the ones listed in Handbook of Mathematical Functions by Abramowitz and Stegun. How many more transformations can we create by combining them? To answer this, we represent each of the transformations with a matrix. The transformations correspond to multiplying the matrix by the column vector (*a*, *b*, *c*).

The question of how many transformations we can come up with can be recast as asking for the order of the group generated by *A*, *B*, and *C*. (I’m jumping ahead a little by presuming these matrices generate a group. Conceivably they don’t have inverses, but in fact they do. The inverses of *A*, *B*, and *C* are *A*, *B*, and *AC* respectively.)

A little experimentation reveals that there are eight transformations: *I, A*, *B*, *C*, *D* = *AB*, *E* = *AC*, *F* = *BC*, and *G* = *CB*. So in addition to the identity *I*, the do-nothing transformation, we have found four more: *D*, *E*, *F*, and *G*. These last four correspond to taking (*a*, *b*, *c*) to

- (*c* − *a*, *b*, *c*)
- (*c* − *b*, *a*, *c*)
- (*b*, *a*, *c*)
- (*c* − *b*, *c* − *a*, *c*)

This means that if we have software to compute the hypergeometric function with one fixed set of parameters, we can bootstrap that into computing the hypergeometric function with up to seven more sets of parameters. (Some of the seven new combinations could be repetitive if, for example, *a* = *b*.)

We have uncovered a group of order 8, and there are only five groups of that size. We should be able to find a familiar group isomorphic to our group of transformations.

The five groups of order 8 are Z8 (cyclic group), Z4×Z2 (direct product), E8 (elementary Abelian group), D8 (dihedral group), and the quaternion group. The first three are Abelian, and our group is not, so our group must be isomorphic to either the quaternion group or the dihedral group. The quaternion group has only 1 element of order 2, and the dihedral group has 5 elements of order 2. Our group has five elements of order 2 (*A*, *B*, *D*, *F*, *G*) and so it must be isomorphic to the dihedral group of order 8, the rotations and reflections of a square. (Some authors call this D8, because the group has eight elements. Others call it D4, because it is the group based on a 4-sided figure.)
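The counting argument can be verified numerically. The sketch below represents each transformation as a matrix acting on the column vector (*a*, *b*, *c*); the matrices are my reconstruction from the parameter maps listed above, since the post's images of *A*, *B*, and *C* aren't reproduced here.

```python
import numpy as np
from itertools import product

# Matrices acting on the column vector (a, b, c). These are reconstructed
# from the parameter maps in the text, so treat them as an assumption.
I3 = np.eye(3, dtype=int)
A = np.array([[-1, 0, 1], [0, -1, 1], [0, 0, 1]])  # (a,b,c) -> (c-a, c-b, c)
B = np.array([[ 1, 0, 0], [0, -1, 1], [0, 0, 1]])  # (a,b,c) -> (a, c-b, c)
C = np.array([[ 0, 1, 0], [-1, 0, 1], [0, 0, 1]])  # (a,b,c) -> (b, c-a, c)

# Close {I, A, B, C} under matrix multiplication
group = {m.tobytes(): m for m in (I3, A, B, C)}
while True:
    new = {}
    for x, y in product(group.values(), repeat=2):
        z = x @ y
        if z.tobytes() not in group:
            new[z.tobytes()] = z
    if not new:
        break
    group.update(new)

assert len(group) == 8   # eight transformations, as claimed

# Five elements of order 2 => dihedral, not quaternion
order2 = sum(1 for m in group.values()
             if not np.array_equal(m, I3) and np.array_equal(m @ m, I3))
assert order2 == 5
```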


Ten years before Kac asked about hearing the shape of a drum, Günthard and Primas asked the analogous question about graphs: can you hear the shape of a graph? That is, is a graph determined by its eigenvalues? At the time the answer was believed to be yes, but a year later a counterexample was found [1]. So the next natural question is when *can* you hear the shape of a graph, i.e. under what conditions is a graph determined by its eigenvalues? It depends on which matrix you’re taking the eigenvalues of, but under some conditions some matrix spectra uniquely determine graphs. We don’t know in general how common it is for spectra to uniquely determine graphs.

Here are two graphs that have the same adjacency matrix spectra, first published in [2]:

Both have adjacency spectra [-2, 0, 0, 0, 2]. But the graphs are not cospectral as far as the Laplacian is concerned. Their Laplace spectra are [0, 0, 2, 2, 4] and [0, 1, 1, 1, 5] respectively. We could tell that the Laplace spectra would be different before computing them because the second smallest Laplace eigenvalue is positive if and only if a graph is connected.
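The figures aren't reproduced here, but a classic pair with exactly the spectra quoted is the star K_{1,4} and the union of a 4-cycle with an isolated vertex. Assuming that's the example, here is a sketch verifying the claim with NumPy:

```python
import numpy as np

def spectra(A):
    return np.round(np.linalg.eigvalsh(A), 6)   # sorted eigenvalues

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

# Star K_{1,4}: center 0 joined to 1..4
star = np.zeros((5, 5))
star[0, 1:] = star[1:, 0] = 1

# 4-cycle on vertices 0..3, plus an isolated vertex 4
cycle = np.zeros((5, 5))
for i in range(4):
    cycle[i, (i + 1) % 4] = cycle[(i + 1) % 4, i] = 1

# Same adjacency spectrum: [-2, 0, 0, 0, 2]
assert np.allclose(spectra(star), spectra(cycle))

# But different Laplacian spectra: [0,1,1,1,5] vs [0,0,2,2,4]
assert not np.allclose(spectra(laplacian(star)), spectra(laplacian(cycle)))
```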

The graphs below are cospectral for the adjacency, Laplacian, and signless Laplacian matrices.

Both graphs have the same number of nodes and edges, and every node has degree 4 in both graphs. But the graph on the left contains more triangles than the one on the right, so they cannot be isomorphic.

One way to test whether two graphs are isomorphic is to compute their spectra. If the spectra are different, the graphs are not isomorphic. So spectral analysis gives a way to show that two graphs are *not* isomorphic in polynomial time, though the test may be inconclusive.

If two graphs do have the same spectra, what is the probability that they are isomorphic? In [1] the authors answer this question empirically for graphs of order up to 11. Very roughly, there’s about an 80% chance graphs with the same adjacency matrix spectrum are isomorphic. The chances go up to 90% for the Laplacian and 95% for the signless Laplacian.

* * *

[1] Edwin R. van Dam, Willem H. Haemers. Which graphs are determined by their spectrum? Linear Algebra and its Applications 373 (2003) 241–272

[2] D. M. Cvetković, Graphs and their spectra, Univ. Beograd. Publ. Elektrotehn. Fak. Ser. Mat. Fiz. 354–356 (1971) 1–50.

The continuous versions of the Fourier and Laplace transforms are given as follows.

Fourier transform:

Laplace transform:

The Fourier transform is defined several ways, and I actually prefer the convention that puts a factor of 2π in the exponential, but the convention above makes the analogy with the Laplace transform simpler. There are two differences between the Fourier and Laplace transforms. The Laplace transform integrates over only half the real line, compared to the entire real line for Fourier. But a variation on the Laplace transform, the **bilateral Laplace transform**, integrates over the entire real line. The bilateral Laplace transform at *s* is simply the Fourier transform at *x* = *is*. And of course the same is true for the (one-sided) Laplace transform if the function *f* is only non-zero for positive values.

I’ve encountered the Fourier transform more in application, and the Laplace transform more in teaching. This is not to say the Laplace transform isn’t used in practice; it certainly is used in applications. But the two transforms serve similar purposes, and the Laplace transform is easier to teach. Because the factor exp(-*sx*) decays rapidly, the integral defining the Laplace transform converges for functions where the integral defining the Fourier transform would not. Such functions may still have Fourier transforms, but the transforms require distribution theory whereas the Laplace transforms can be computed using basic calculus.
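As a quick numerical illustration of computing a Laplace transform (a sketch; the helper name is mine), the Laplace transform of exp(−*x*) is 1/(*s* + 1), which basic calculus confirms:

```python
import numpy as np
from scipy.integrate import quad

def laplace(f, s):
    """Numerically evaluate the Laplace transform of f at s (a sketch)."""
    val, _ = quad(lambda x: f(x) * np.exp(-s * x), 0, np.inf)
    return val

# The Laplace transform of exp(-x) is 1/(s + 1)
for s in (0.5, 1.0, 2.0):
    assert abs(laplace(lambda x: np.exp(-x), s) - 1 / (s + 1)) < 1e-8
```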

There’s more difference between the discrete versions of the Fourier and Laplace transforms than between the continuous versions.

The discrete Fourier transform (DFT) approximates the integral defining the (continuous) Fourier transform with a finite sum. It discretizes the integral and truncates its domain. The discrete Laplace transform is an infinite sum. It discretizes the integral defining the Laplace transform, but it does *not* truncate the domain. Given a step size η > 0, the discrete Laplace transform of *f* is the sum of *f*(*n*η) exp(−*sn*η) over all non-negative integers *n*.

The discrete Laplace transform isn’t “as discrete” as the discrete Fourier transform. The latter takes a finite sequence and returns a finite sequence. The former evaluates a function at an infinite number of points and produces a continuous function.

The discrete Laplace transform is used in applications such as signal processing, as well as in the theory of analytic functions.

If η = 1 and *z* = exp(-*s*), the discrete Laplace transform becomes the **z-transform** of the values of *f* at non-negative integers. And if we replace *z* with 1/*z*, or equivalently set *z* = exp(*s*) instead of *z* = exp(-*s*), we get the **generating function** of the values of *f* at non-negative integers.

z-transforms are common in digital signal processing, while generating functions are common in combinatorics. They are essentially the same thing.
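A sketch of the connection, assuming the discrete Laplace transform is the sum of *f*(*n*η) exp(−*sn*η) over non-negative *n* (consistent with the z-transform identification above): for *f* ≡ 1 and η = 1 the transform is a geometric series, i.e. the generating function 1/(1 − *z*) at *z* = exp(−*s*).

```python
import math

def discrete_laplace(f, s, eta=1.0, terms=2000):
    # Partial sum of f(n*eta) exp(-s*n*eta); the tail is negligible here
    return sum(f(n * eta) * math.exp(-s * n * eta) for n in range(terms))

s = 1.0
z = math.exp(-s)
# For f = 1, eta = 1: the z-transform / generating function of 1, 1, 1, ...
assert abs(discrete_laplace(lambda x: 1.0, s) - 1 / (1 - z)) < 1e-9
```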

“Reproducible” and “randomized” don’t seem to go together. If something was unpredictable the first time, shouldn’t it be unpredictable if you start over and run it again? As is often the case, we want incompatible things.

But the combination of reproducible and random can be reconciled. Why would we want a randomized controlled trial (RCT) to be random, and why would we want it to be reproducible?

**One of the purposes** in randomized experiments is the hope of scattering complicating factors evenly between two groups. For example, one way to test two drugs on 1000 people would be to gather 1000 people and give the first drug to all the men and the second to all the women. But maybe a person’s sex has something to do with how the drug acts. If we randomize between two groups, it’s likely that about the same number of men and women will be in each group.

The example of sex as a factor is oversimplified because there’s reason to suspect *a priori* that sex might make a difference in how a drug performs. The bigger problem is that factors we can’t anticipate or control may matter, and we’d like them scattered evenly between the two treatment groups. If we knew what the factors were, we could assure that they’re evenly split between the groups. The hope is that randomization will do that for us with things we’re unaware of. For this purpose we don’t need a process that is “truly random,” whatever that means, but a process that matches our expectations of how randomness should behave. So a pseudorandom number generator (PRNG) is fine. No need, for example, to randomize using some physical source of randomness like radioactive decay.

**Another purpose** in randomization is for the assignments to be unpredictable. We want a physician, for example, to enroll patients on a clinical trial without knowing what treatment they will receive. Otherwise there could be a bias, presumably unconscious, against assigning patients with poor prognosis if the physicians know the next treatment will be the one they hope or believe is better. Note here that the randomization only has to be unpredictable from the perspective of the people participating in and conducting the trial. The assignments could be predictable, in principle, by someone *not* involved in the study.

And why would you want randomization assignments to be **reproducible**? One reason would be to test whether randomization software is working correctly. Another might be to satisfy a regulatory agency or some other oversight group. Still another reason might be to defend your randomization in a lawsuit. A physical random number generator, such as using the time down to the millisecond at which the randomization is conducted, would achieve random assignments and unpredictability, but not reproducibility.

Computer algorithms for generating random numbers (technically pseudo-random numbers) can achieve reproducibility, practically random allocation, and unpredictability. The randomization outcomes are predictable, and hence reproducible, to someone with access to the random number generator and its state, but unpredictable in practice to those involved in the trial. The internal state of the random number generator has to be saved between assignments and passed back into the randomization software each time.
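Here is a sketch in Python of the save-and-restore idea using the standard library's generator; the article's small 64-bit generator isn't reproduced here, so this is illustrative only.

```python
import random

rng = random.Random(20160101)      # fixed seed => reproducible sequence
state = rng.getstate()             # save the generator state

first = [rng.choice("AB") for _ in range(10)]

rng.setstate(state)                # restore the state ...
second = [rng.choice("AB") for _ in range(10)]

assert first == second             # ... and the assignments repeat exactly
```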

Random number generators such as the Mersenne Twister have good statistical properties, but they also carry a large amount of state. The random number generator described here has very small state, 64 bits, and so storing and returning the state is simple. If you needed to generate a trillion random samples, the Mersenne Twister would be preferable, but since RCTs usually have fewer than a trillion subjects, the RNG in the article is perfectly fine. I have run the Dieharder random number generator quality tests on this generator and it performs quite well.


Image by Ilmicrofono Oggiono, licensed under Creative Commons

The Joukowsky transformation is given by *w* = (*z* + 1/*z*)/2, where *z* is a complex variable. Despite its simplicity, it’s interesting to look at in detail.

Let *z* = *r* exp(*i*θ) and let *w* = *u* + *iv* be its image. Writing the Joukowsky transformation in terms of its real and imaginary parts makes it easier to understand how it transforms lines and circles.

We can see how circles are mapped by holding *r* constant and letting θ vary. The unit circle gets crushed to the interval [-1, 1] on the real axis, traversed twice. Circles of radius ρ ≠ 1 get mapped to ellipses

where

Next we hold θ constant and let *r* vary. If sin θ = 0 and cos θ > 0 then the image is [1, ∞). If sin θ = 0 and cos θ < 0 then the image is (-∞, -1]. In both cases the image is traced out twice. The image of the line with cos θ = 0 is the vertical axis. Otherwise lines through the origin are mapped to hyperbolas with equation

If (*z* + 1/*z*)/2 = *w* then *z*^{2} − 2*wz* + 1 = 0. The discriminant of this equation is 4(*w*^{2} − 1) and so the Joukowsky transformation is 2-to-1 unless *w* = ±1, in which case *z* = ±1. The product of the two roots of a quadratic equation equals the constant term divided by the leading coefficient. In this case, the product is 1. This says the Joukowsky transformation is 1-to-1 in any region that doesn’t contain both *z* and 1/*z*. This is the case for the interior or exterior of the unit circle, or for the upper or lower half planes. In any of those four regions, one can invert the Joukowsky transformation by solving a quadratic equation and choosing the correct root.
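In the exterior of the unit circle, for example, inversion amounts to solving the quadratic and keeping the root with |*z*| > 1. A sketch (function names are mine):

```python
import cmath

def joukowsky(z):
    return (z + 1 / z) / 2

def joukowsky_inverse_exterior(w):
    # The roots of z^2 - 2wz + 1 = 0 are z and 1/z; keep the one
    # outside the unit circle
    r = cmath.sqrt(w * w - 1)
    z1, z2 = w + r, w - r
    return z1 if abs(z1) > 1 else z2

z = 2 + 1j                               # a point outside the unit circle
w = joukowsky(z)
assert abs(joukowsky_inverse_exterior(w) - z) < 1e-12
```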

The birthday problem makes a nice party trick, but generalizations of the problem come up frequently in applications. I wrote in the previous post how it comes up in seeding distributed Monte Carlo simulations. In computer science, it’s a concern in hashing algorithms.

If you have a set of *N* things to choose from, such as *N* = 365 birthdays, and take *r* samples, the probability that all *r* samples are unique is

and the probability that at least two of the samples are the same is 1 – *p*. (This assumes that *N* is at least as big as *r*. Otherwise the denominator is undefined, but in that case we know *p* is 0.)

With moderately large values of *N* and *r* the formula is likely to overflow if implemented directly. So as usual the trick is to use logarithms to avoid overflow or underflow. Here’s how you could compute the probability above in Python. SciPy doesn’t have a log factorial function, but does have a log gamma function, so we use that instead.

    import numpy as np
    from scipy.special import gammaln

    def prob_unique(N, r):
        # compute N! / ((N-r)! N^r) on the log scale to avoid overflow
        return np.exp(gammaln(N+1) - gammaln(N-r+1) - r*np.log(N))

I got a call one time to take a look at randomization software that wasn’t randomizing. My first thought was that the software was working as designed, and that the users were just seeing a long run. Long sequences of the same assignment are more likely than you think. You might argue, for example, that the chances of flipping five heads in a row would be (1/2)^{5} = 1/32, but that underestimates the probability because a run could start at any time. The chances that the *first* five flips are heads would indeed be 1/32. But the probability of seeing five heads in a row any time during a series of flips is higher.

Most of the times that I’ve been asked to look at randomization software that “couldn’t be right,” the software was fine. But in this case, there wasn’t simply a long run of random results that happened to be equal. The results were truly constant. At least for some users. Some users would get different results from time to time, but others would get the same result every single time.

The problem turned out to be how the software set the seed in its random number generator. When the program started up it asked the user “Enter a number.” No indication of what kind of number or for what purpose. This number, unbeknownst to the user, was being used as the random number generator seed. Some users would enter the same number every time, and get the same randomization result, every time. Others would use more whimsy in selecting numbers and get varied output.

How do you seed a random number generator in a situation like this? A better solution than asking the user for a number would be to seed the generator with the current time, though that has drawbacks too. I write about that in another post.

A more subtle problem I’ve seen with random number generator seeding is spawning multiple processes that each generate random numbers. In a well-intentioned attempt to give each process a unique seed, the developers ended up virtually assuring that many of the processes would have exactly the same seed.

If you parallelize a huge simulation by spreading out multiple copies, but two of the processes use the same seed, then their results will be identical. Throwing out the redundant simulation would reduce your number of samples, but not noticing and keeping the redundant output would be worse because it would cause you to underestimate the amount of variation.

To avoid duplicate seeds, the developers used a random number generator to assign the RNG seeds for each process. Sounds reasonable. Randomly assigned RNG seeds give you even more random goodness. Except they don’t.

The developers had run into a variation on the famous birthday problem. In a room of 23 people, there’s a 50% chance that two people share the same birthday. And with 50 people, the chances go up to 97%. It’s not *certain* that two people will have the same birthday until you have 367 people in the room, but the chances approach 1 faster than you might think.

Applying the analog of the birthday problem to the RNG seeds explains why the project was launching processes with the same seed. Suppose you seed each process with an unsigned 16-bit integer. That means there are 65,536 possible seeds. Now suppose you launch 1,000 processes. With 65 times as many possible seeds as processes, surely every process should get its own seed, right? Not at all. There’s a 99.95% chance that two processes will have the same seed.
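The 99.95% figure can be computed with the same log-gamma trick used in the birthday calculation above. (This is a sketch; the function name is mine.)

```python
import numpy as np
from scipy.special import gammaln

def prob_collision(N, r):
    """Probability that at least two of r random draws from N values match."""
    return 1 - np.exp(gammaln(N+1) - gammaln(N-r+1) - r*np.log(N))

# 1,000 processes, each seeded with a random unsigned 16-bit integer
p = prob_collision(2**16, 1000)
print(round(p, 4))   # about 0.9995
```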

In this case it would have been better to seed each process with *sequential* seeds: give the first process seed 1, the second seed 2, etc. The *seeds* don’t have to be random; they just have to be unique. If you’re using a good random number generator, the outputs of 1,000 processes seeded with 1, 2, 3, …, 1000 will be independent.

If you need help with randomization, please contact me. I can help you avoid subtle errors and have randomization procedures that will stand up to scrutiny.

If *G* has *r* distinct eigenvalues—whether of the adjacency matrix *A*, Laplacian *L*, or signless Laplacian *Q*—then its diameter *d* is at most *r* – 1. (Definitions of these matrices here.)

If *G* has *n* vertices then we have a lower bound on the diameter: *d* ≥ 4/(*n*λ_{2}), where λ_{2} is the algebraic connectivity, the second smallest eigenvalue of the graph Laplacian.

Let Δ be the maximum degree of *G*. Then we also have the following upper bound:

How well do these bounds work? Let’s look at a random graph with 50 vertices.

The diameter of this graph is 3. The highest vertex degree is 22.

Each of the three matrices *A*, *L*, and *Q* has 50 distinct eigenvalues. So that says the diameter should be at most 49, which it certainly is.

The lower bound based on algebraic connectivity is 0.0129, and the upper bound is 32.

Altogether, we estimated the diameter, whose true value is 3, to be between 0.0129 and 32. Which is correct, but not very tight.
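The post doesn’t include its code, so here’s a sketch of checking both eigenvalue bounds on a random graph. The graph parameters are guesses; the exact numbers will differ from those above.

```python
import networkx as nx
import numpy as np

# hypothetical stand-in for the post's random graph on 50 vertices
rng_seed = 0
G = nx.erdos_renyi_graph(50, 0.2, seed=rng_seed)
while not nx.is_connected(G):        # diameter is only defined if connected
    rng_seed += 1
    G = nx.erdos_renyi_graph(50, 0.2, seed=rng_seed)

n = G.number_of_nodes()
d = nx.diameter(G)

L = nx.laplacian_matrix(G).toarray().astype(float)
eigenvalues = np.linalg.eigvalsh(L)
lambda2 = eigenvalues[1]             # algebraic connectivity

lower = 4 / (n * lambda2)            # d >= 4/(n lambda_2)
r = len(set(np.round(eigenvalues, 6)))   # number of distinct eigenvalues
upper = r - 1                        # d <= r - 1

print(f"diameter {d}, bounds [{lower:.4f}, {upper}]")
assert lower <= d <= upper
```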

Maybe the bounds weren’t very good because the diameter was small. Let’s look at a graph with a much bigger diameter, a cycle with 50 nodes.

Now this graph has diameter 25. The maximum node degree is 2, because every node has degree 2.

Each of the three matrices *A*, *L*, and *Q* has 26 distinct eigenvalues, which predicts that the graph diameter will be no more than 25. In this case, the upper bound is exact.

The algebraic connectivity gives a lower bound of 5.0727 for the diameter and an upper bound of 180. So while these bounds are correct, they are not especially tight either.

Source: Handbook of Linear Algebra

**Related**:

Let *G* be a graph with *n* vertices and *m* edges and let *L* be the Laplacian matrix of *G*. Then the sum of the eigenvalues of *L* is 2*m*. (The diagonal of *L* contains the degrees of each node, so its trace is twice the number of edges. And the trace of a matrix is the sum of its eigenvalues.)

Let λ_{i} be the eigenvalues of *L*. Then we have

with equality if and only if *G* is regular. [reference]

We can try this out by starting with a dodecahedron graph. It’s a regular graph, since each vertex has three edges. There are a total of 30 edges and the eigenvalues sum to 60. The right-most expression is 60 as expected.

Next remove one of the edges from the graph. The result bears a remarkable resemblance to Iron Man.

Now there are 29 edges, so 2*m* = 58. The right-most expression above is now 58.31. It’s bigger than 58 because the new graph is not regular. But as expected, it’s close to 58 since the graph is close to regular.
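The numbers 60 and 58.31 are consistent with the bound 2*m* = Σλ_{i} ≤ √(*n*(Σλ_{i}^{2} – Σλ_{i})), with equality iff *G* is regular. Here’s a sketch verifying both examples under that reading of the inequality:

```python
import networkx as nx
import numpy as np

G = nx.dodecahedral_graph()          # 20 vertices, 30 edges, 3-regular
L = nx.laplacian_matrix(G).toarray().astype(float)
eig = np.linalg.eigvalsh(L)

m = G.number_of_edges()
n = G.number_of_nodes()
assert abs(eig.sum() - 2*m) < 1e-8   # eigenvalues sum to 2m = 60

# assumed form of the inequality: 2m <= sqrt(n (sum lambda_i^2 - sum lambda_i))
rhs = np.sqrt(n * (np.sum(eig**2) - np.sum(eig)))
print(round(rhs, 2))                 # equality (60.0) for the regular graph

G.remove_edge(0, 1)                  # remove one edge; no longer regular
eig = np.linalg.eigvalsh(nx.laplacian_matrix(G).toarray().astype(float))
print(round(np.sqrt(n * (np.sum(eig**2) - np.sum(eig))), 2))   # 58.31
```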

What are the next areas of pure math to find widespread application? Some people are saying algebraic topology and category theory.

[I saw a cartoon to this effect the other day but I can’t find it. If I remember correctly, someone was standing on a hill labeled “algebraic topology” and looking over at hills in the distance labeled with traditional areas of applied math. Differential equations, Fourier analysis, or things like that. If anybody can find that cartoon, please let me know.]

The big idea behind algebraic topology is to turn topological problems, which are hard, into algebraic problems, which are easier. For example, you can associate a group with a space, the *fundamental group*, by looking at equivalence classes of loops. If two spaces have different fundamental groups, they can’t be topologically equivalent. The converse generally isn’t true: having the same fundamental group does not prove two spaces are equivalent. There’s some loss of information going from topology to algebra, which is a good thing. As long as information you need isn’t lost, you get a simpler problem to work with.

Fundamental groups are easy to visualize, but hard to compute. Fundamental groups are the lowest dimensional case of homotopy groups, and higher dimensional homotopy groups are even harder to compute. Homology groups, on the other hand, are a little harder to visualize but much easier to compute. Applied topology, at least at this point, is applied *algebraic* topology, and more specifically applied *homology* because homology is practical to compute.

People like Robert Ghrist are using homology to study, among other things, sensor networks. You start with a point cloud, such as the location of sensors, and thicken the points until they fuse into spaces that have interesting homology. This is the basic idea of *persistent homology*. You’re looking for homology that persists over some range of thickening. As the amount of thickening increases, you may go through different ranges with different topology. The homology of these spaces tells you something about the structure of the underlying problem. This information might then be used as features in a machine learning algorithm. Topological invariants might prove to be useful features for classification or clustering, for example.

Most applications of topology that I’ve seen have used persistent homology. But there may be entirely different ways to apply algebraic topology that no one is looking at yet.

Category theory has been getting a lot of buzz, especially in computer science. One of the first ideas in category theory is to focus on how objects interact with each other, not on their internal structure. This should sound very familiar to computer scientists: focus on interface, not implementation. That suggests that category theory might be useful in computer science. Sometimes the connection between category theory and computer science is quite explicit, as in functional programming. Haskell, for example, has several ideas from category theory explicit in the language: monads, natural transformations, etc.

Outside of computer science, applications of category theory are less direct. Category theory can guide you to ask the right questions, and to avoid common errors. The mathematical term “category” was borrowed from philosophy for good reason. Mathematicians seek to avoid categorical errors, just as Aristotle and Kant did. I think of category theory as analogous to dimensional analysis in engineering or type checking in software development, a tool for finding and avoiding errors.

I used to be very skeptical of applications of category theory. I’m still skeptical, though not as much. I’ve seen category theory used as a smoke screen, and I’ve seen it put to real use. More about my experience with category theory here.

* * *

Topology illustration from Barcodes: The persistent topology of data by Robert Ghrist.

Category theory diagram from Category theory for scientists by David Spivak.

I don’t think he’s simply saying see if you can do everything 20 times faster. If you estimate something will take ten days, it probably will take more than half a day. We’re better at estimating things on the scale of days than on the scale of years.

If you expect to finish a project in ten days, you’re probably going to go about it the way you’ve approached similar projects before. There’s not a lot of time for other options. But there are a lot of ways to go about a decade-long project.

Since Thiel is well known for being skeptical of college education, I imagine one of the things he had in mind was starting a company in six months rather than going to college, getting an entry level job, then leaving to start your company.

As I wrote in an earlier post, some things can’t be done slowly.

Some projects can only be done so slowly. If you send up a rocket at half of escape velocity, it’s not going to take twice as long to get where you want it to go. It’s going to take infinitely longer.

Some projects have to be done quickly if they are going to be done at all. Software projects are usually like this. If a software project is expected to take two years, I bet it’ll take five, if it’s not cancelled before then. You have to deliver software faster than the requirements change. Of course this isn’t unique to software. To be successful, you have to deliver any project before your motivation or your opportunity go away.

First, reciprocals of integer powers:

Then reciprocals of odd powers:

Both are easy to prove since they’re geometric series.
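Presumably the two identities are Σ_{n ≥ 1} 1/φ^{n} = φ and Σ_{n ≥ 0} 1/φ^{2n+1} = 1, both following from the geometric series formula together with φ^{2} = φ + 1. A quick numerical check:

```python
# Numerical check: the reciprocals of all positive integer powers of the
# golden ratio sum to phi itself, and the reciprocals of the odd powers
# sum to 1. Truncating at 200 terms leaves a negligible tail.
phi = (1 + 5**0.5) / 2

all_powers = sum(phi**-n for n in range(1, 200))
odd_powers = sum(phi**-n for n in range(1, 200, 2))

assert abs(all_powers - phi) < 1e-12
assert abs(odd_powers - 1) < 1e-12
```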

More posts related to the golden ratio:

For example, the 0-core of a graph is simply the entire graph since every vertex has at least zero edges; a vertex can’t have a negative number of edges. The 1-core of a graph contains the vertices that are connected to other vertices. You form the 1-core by throwing away the 0-shell, the set of isolated vertices. You move from the 1-core to the 2-core by removing nodes of degree 1 until everything that’s left has degree at least 2.

NB: You do not form the *k*-core simply by discarding nodes of degree less than *k*. For example, consider the star graph below.

If you peel off the nodes of degree 1, eventually you have nothing left. There is no 2-core, even though the original graph had a node of degree 5. Or going back to the definition, there is no subgraph such that every vertex has degree at least 2, and so no maximal such subgraph. The central node only has degree 5 because it is connected to nodes of degree 1.
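Here’s the star graph example in NetworkX (a sketch; the original post may have used different tooling):

```python
import networkx as nx

# Star graph: one hub connected to 5 leaves, as in the figure above
G = nx.star_graph(5)                 # hub has degree 5, leaves have degree 1

# Peeling off degree-1 nodes repeatedly leaves nothing: the 2-core is
# empty, even though the hub has degree 5.
two_core = nx.k_core(G, k=2)
assert two_core.number_of_nodes() == 0

# Core numbers: every vertex, including the hub, is in the 1-shell
assert set(nx.core_number(G).values()) == {1}
```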

The *k*-cores of a complete graph in which every vertex has degree *n* are simply the entire graph, for *k* ≤ *n*. The *k*-shells are empty for *k* < *n*.

Let’s see what the *k*-cores and *k*-shells look like for some more complex graphs. First, let’s go back to the post looking at the overlap in keywords. The vertices of the graph are the 200 search terms that have led visitors to the site at least twice. We connect two search terms if they share a word. The 56-core of the graph has 57 vertices, and the 57-core is empty. So the 56-core is a complete graph. Here’s the distribution of the *k*-shell sizes.

Next let’s look at a random graph. As in the post on spectra of random graphs, we look at an Erdős–Rényi graph *G*_{n,p}. First we start with *n* = 1000 and *p* = 0.01. So we start with 1000 nodes, and there’s a 0.01 probability that any particular pair of nodes is connected. Here are the core sizes for one such random graph:

[1000 1000 999 999 990 970 914 793]

So the 7-core has 793 nodes, then the 8-core is empty. I ran this ten times and the 7-core sizes ranged from size 709 to 849. But every time the 8-core was empty. The largest value of *k* such that the *k*-core is non-empty is called the **degeneracy** of the graph, so we could say the degeneracy was 7 every time.

I reran with *n* = 2000. The degeneracy was 14 every time. The 14-cores were in the range 1723 to 1810.

Finally I set *n* = 4000 and ran again. The degeneracy was 29 three times and 30 seven times. The final core sizes ranged from 3451 to 3714.
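A sketch of how these core sizes can be computed with NetworkX; the exact numbers vary from run to run, as noted above.

```python
import networkx as nx

n, p = 1000, 0.01
G = nx.erdos_renyi_graph(n, p, seed=1)

core = nx.core_number(G)             # core number of each vertex
degeneracy = max(core.values())      # largest k with a non-empty k-core

# size of the k-core for k = 0, 1, ..., degeneracy
core_sizes = [sum(1 for c in core.values() if c >= k)
              for k in range(degeneracy + 1)]
print(core_sizes, "degeneracy:", degeneracy)
```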

The experiments above suggest that the *k*-core size abruptly drops to zero, at a predictable value of *k*, and with a fairly predictable core size. There are papers on *k*-core size of random graphs, but I have not yet read any. (I have one in my to-read queue that someone let me know about after posting this article.)

* * *

For most of the last 500 years, the largest known prime has been a Mersenne prime, a number of the form 2^{p} – 1 where *p* is itself prime. Such numbers are not always prime. For example, 2^{11} − 1 = 2047 = 23 × 89.

Euclid proved circa 300 BC that if *M* is a Mersenne prime, then *M*(*M* + 1)/2 is a perfect number, i.e. it is equal to the sum of its divisors less than itself. Euler proved two millennia later that all even perfect numbers must have this form. Since no odd perfect numbers have been discovered, the discovery of a new Mersenne prime implies the discovery of a new perfect number.

Using the same methods as in the posts from last year, we can say some things about how *P* appears in various bases.

In binary, *P* is written as a string of 74,207,281 ones.

In base four, *P* is a 1 followed by 37,103,640 threes.

In base eight, *P* is a 1 followed by 24,735,760 sevens.

In base 16, *P* is a 1 followed by 18,551,820 F’s.

In base 10, *P* has 22,338,618 digits, the first of which is 3 and the last of which is 1.
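These base representations can be verified from the exponent *p* = 74,207,281 alone, without ever constructing the 22-million-digit number. (A sketch; the arithmetic is standard.)

```python
import math

p = 74207281                  # P = 2**p - 1

# Number of decimal digits: floor(p * log10(2)) + 1
digits = math.floor(p * math.log10(2)) + 1
assert digits == 22338618

# First decimal digit from the fractional part of p*log10(2)
frac = p * math.log10(2) - math.floor(p * math.log10(2))
assert int(10**frac) == 3

# Last decimal digit: 2**p mod 10 cycles 2, 4, 8, 6; p % 4 == 1, so
# 2**p ends in 2 and P = 2**p - 1 ends in 1.
assert pow(2, p, 10) - 1 == 1

# In base 2**k, with p = q*k + 1, P is a 1 followed by q copies of 2**k - 1
assert divmod(p, 2) == (37103640, 1)   # base 4: 1 then 37,103,640 threes
assert divmod(p, 3) == (24735760, 1)   # base 8: 1 then 24,735,760 sevens
assert divmod(p, 4) == (18551820, 1)   # base 16: 1 then 18,551,820 F's
```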

**Related post**: Five interesting things about Mersenne primes

In this example there are two connected bipartite components. There is a variation on the graph Laplacian called the **signless Laplacian** whose eigenvalues can tell you how many bipartite components a graph has.

The graph Laplacian is *L* = *D* – *A* where *D* is the diagonal matrix whose entries are the degrees of each node and *A* is the adjacency matrix. The signless Laplacian is *Q* = *D* + *A*. Since *D* and *A* have only non-negative entries, so does *Q*.

The smallest eigenvalue of the Laplacian *L* is always zero, and the second smallest eigenvalue is positive if and only if the graph is connected. The smallest eigenvalue of the signless Laplacian *Q* is positive if and only if the graph is connected and bipartite. The multiplicity of zero as an eigenvalue equals the number of connected bipartite components.
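The eigenvalue counting is easy to check on stand-in graphs: two disjoint even cycles, which are bipartite, versus an odd cycle, which is not. (The example graphs in the post aren’t reproduced here; the helper function is mine.)

```python
import networkx as nx
import numpy as np

def signless_laplacian_eigs(G):
    """Eigenvalues of Q = D + A for graph G."""
    A = nx.to_numpy_array(G)
    D = np.diag(A.sum(axis=1))
    return np.linalg.eigvalsh(D + A)

# Two disjoint 6-cycles: two connected bipartite components
G = nx.disjoint_union(nx.cycle_graph(6), nx.cycle_graph(6))
eigs = signless_laplacian_eigs(G)
num_zero = np.sum(np.isclose(eigs, 0, atol=1e-8))
assert num_zero == 2   # one zero eigenvalue per connected bipartite component

# An odd cycle is not bipartite: its signless Laplacian has no zero eigenvalue
H = nx.cycle_graph(5)
assert not np.any(np.isclose(signless_laplacian_eigs(H), 0, atol=1e-8))
```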

The graph above has two connected bipartite components. Zero is a double eigenvalue of the signless Laplacian, and the next eigenvalue is 0.2603.

Next we connect two of the blue nodes. Now the graph is connected and bipartite.

Zero is now a single eigenvalue, and the second eigenvalue is 0.0968. This second eigenvalue is positive because there is only one component now, but it is small because the graph can almost be separated into two bipartite components.

Next we remove the edge connecting the two components, but we draw an edge between two of the green vertices.

Now there are two components, but only one of them is bipartite. The smallest eigenvalue is zero, with multiplicity 1, and the second eigenvalue is 0.2015.

Next we repeat the examples above with more edges. Now each component of the graph is a complete bipartite graph.

The smallest eigenvalue is 0, with multiplicity 2. But now the next eigenvalue is 3. This value is larger than before because the components are better connected.

As before, we now join two of the blue nodes.

Now there is one bipartite component. The smallest eigenvalue is 0 with multiplicity 1. The second eigenvalue is 0.2103.

Finally, we remove the edge connecting the two components and connect two of the green vertices.

Now the smallest eigenvalue is 0 with multiplicity 1. While there are two components, there’s only one bipartite component. The next eigenvalue is 0.4812.

**Related posts**:

Then the spectrum of the adjacency matrix *A* of the graph bounces around the spectrum of the expected value of *A*. This post will illustrate this to give an idea how close the two spectra are.

The adjacency matrix *A* for our random graph has zeros along the diagonal because the model doesn’t allow loops connecting a vertex with itself. The rest of the entries are 1 with probability *p* and 0 with probability 1-*p*. The entries above the diagonal are independent of each other, but the entries below the diagonal are determined by the entries above by symmetry. (An edge between *i* and *j* is also an edge between *j* and *i*.)

The expected value of *A* is a matrix with 0’s along the diagonal and *p*‘s everywhere else. This matrix has one eigenvalue of *p*(*n* – 1) and (*n* – 1) eigenvalues of –*p*. Let’s see how close the spectra of a dozen random graphs come to the spectra of the expected adjacency matrix. As above, *n* = 100 and *p* = 0.1.

Because the eigenvalues are close together, I jittered each sample horizontally to make the values easier to see. Notice that the largest eigenvalue in each sample is around 10. The expected value is *p*(*n* – 1) = 9.9, so the values are close to their expected value.

The other eigenvalues seem to be centered around 0. This is consistent with the expected value of –*p* = -0.1. The most obvious thing about the smaller eigenvalues is not their mean but their range. How far might the eigenvalues be from their expected value? Fan Chung and Mary Radcliffe give the following estimate in their paper On the Spectra of General Random Graphs.

For *G*_{n,p}, if *p* > 8 log(√2 *n*)/9*n*, then with probability at least 1 – 1/*n* we have

‖*A* – *p*(*J* – *I*)‖ ≤ √(8*np* log(√2 *n*))

Here *J* is the matrix of all 1’s and *I* is the identity matrix. In our case the theorem says that with probability at least 0.99, we expect the eigenvalues of our random graph to be within 19.903 of their expected value. This is a pessimistic bound since no value is much more than 5 away from its expected value.
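Here’s a sketch of checking the bound numerically for *n* = 100 and *p* = 0.1. The simulation details are mine, not from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 0.1

# Symmetric 0/1 adjacency matrix with zero diagonal
upper = (rng.random((n, n)) < p).astype(float)
A = np.triu(upper, k=1)
A = A + A.T

expected = p * (np.ones((n, n)) - np.eye(n))    # E[A] = p(J - I)

# Bound: ||A - E[A]|| <= sqrt(8 n p log(sqrt(2) n)), w.p. at least 1 - 1/n
bound = np.sqrt(8 * n * p * np.log(np.sqrt(2) * n))
deviation = np.linalg.norm(A - expected, 2)     # spectral norm

print(round(bound, 3), round(deviation, 3))
assert deviation <= bound
```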

*a* = *m*^{2} – *n*^{2}, *b* = 2*mn*, *c* = *m*^{2} + *n*^{2}.

Now what if we look at matrices with integer entries such that *A*^{2} + *B*^{2} = *C*^{2}? Are there any? Yes, here’s an example:

How can you find matrix-valued Pythagorean triples? One way is to create diagonal matrices with Pythagorean triple integers. Take a list of *n* Pythagorean triples (*a*_{i}, *b*_{i}, *c*_{i}) and create diagonal matrices *A*, *B*, and *C* with the *a*_{i} along the diagonal of *A*, *b*_{i} along the diagonal of *B*, and *c*_{i} along the diagonal of *C*.

But that seems cheap. We’re really not using any properties of matrix multiplication other than the fact that you can square a diagonal matrix by squaring its elements. One way to create more interesting matrix Pythagorean triples is to start with the diagonal matrices described above and conjugate them by some matrix *Q* with integer entries such that *Q*^{-1} also has integer entries. Here’s why that works: (*QAQ*^{-1})^{2} + (*QBQ*^{-1})^{2} = *QA*^{2}*Q*^{-1} + *QB*^{2}*Q*^{-1} = *Q*(*A*^{2} + *B*^{2})*Q*^{-1} = *QC*^{2}*Q*^{-1} = (*QCQ*^{-1})^{2}.

The example above was created by starting with the Pythagorean triples (3, 4, 5) and (5, 12, 13), creating diagonal matrices, and conjugating by

This gives a way to create lots of matrix Pythagorean triples, but there may be triples that this procedure leaves out.
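Here’s a sketch of the construction with the triples (3, 4, 5) and (5, 12, 13) and a hypothetical choice of *Q*. Any integer matrix with determinant ±1 works, since its inverse is then also integral.

```python
import numpy as np

# Diagonal matrices from the Pythagorean triples (3, 4, 5) and (5, 12, 13)
A = np.diag([3, 5])
B = np.diag([4, 12])
C = np.diag([5, 13])

# Conjugation preserves A^2 + B^2 = C^2, since (QAQ^-1)^2 = Q A^2 Q^-1 etc.
Q = np.array([[2, 1], [1, 1]])           # det Q = 1
Qinv = np.array([[1, -1], [-1, 2]])      # integer inverse of Q
assert np.array_equal(Q @ Qinv, np.eye(2, dtype=int))

A2, B2, C2 = (Q @ M @ Qinv for M in (A, B, C))
assert np.array_equal(A2 @ A2 + B2 @ B2, C2 @ C2)
print(A2, B2, C2, sep="\n")
```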

To do this, I created a graph where the nodes are the keywords and edges join nodes that share a word. (A “keyword” is actually a set of words, whatever someone types into a search.) The result was a solid gray blob because the nodes are so densely connected.

I thinned the graph by first only looking at keywords that have occurred at least twice. Then I made my connection criterion a little stronger: I take the union of the individual words in two keywords and create a connection if at least 1/3 of these words appear in both keywords.

(You can click on the image to see a larger version.)

The area of each red dot is proportional to the number of times someone has come to the site using that keyword. You can see that there’s long-tail behavior: while some dots are much larger than others, there are a lot of little dots.

In case you’re curious, these are the top keywords:

- pest control
- ri
- pest control ri
- pest control rhode island,
- plants that keep bees away
- poisonous snakes in ri
- plants that keep bees and wasps away
- plants to keep bees away
- plants that keep wasps away
- pest control companies

The keywords were filtered a little before making the graph. Some of the top keywords contained “debug.” These may have been people using the term to refer to eliminating pests from their property, but more likely they were people who knew the company name. By removing “debug” from each of the keywords, the results focus on people searching for the company’s services rather than the company itself.

Spectral coordinates are a natural way to draw a graph because they are determined by the properties of the graph, not arbitrary aesthetic choices. Construct the Laplacian matrix and let *x* and *y* be the eigenvectors associated with the second and third eigenvalues. (The smallest eigenvalue is always zero and has an eigenvector of all 1’s. The second and third eigenvalues and eigenvectors are the first to contain information about a graph.) The spectral coordinates of the *i*th node are the *i*th components of *x* and *y*.

We illustrate this with a graph constructed from a dodecahedron, a regular solid with twenty vertices and twelve pentagonal faces. You can make a dodecahedron from a soccer ball by connecting the centers of all the white hexagons. Here’s one I made from Zometool pieces for a previous post:

Although we’re thinking of this graph as sitting in three dimensions, the nodes being the corners of pentagons etc., the graph simply says which vertices are connected to each other. But from this information, we can construct the graph Laplacian and use it to assign plane coordinates to each point. And fortunately, this produces a nice picture:

Here’s how that image was created using Python’s NetworkX library.

    import networkx as nx
    import matplotlib.pyplot as plt
    from scipy.linalg import eigh

    # Read in graph and compute the Laplacian L
    ...

    # Laplacian matrices are real and symmetric, so we can use eigh,
    # the variation on eig specialized for Hermitian matrices.
    w, v = eigh(L)  # w = eigenvalues, v = eigenvectors
    x = v[:, 1]
    y = v[:, 2]
    spectral_coordinates = {i: (x[i], y[i]) for i in range(n)}

    G = nx.Graph()
    G.add_edges_from(graph)
    nx.draw(G, pos=spectral_coordinates)
    plt.show()

**Update**: After posting this I discovered that NetworkX has a method `draw_spectral` that will compute the spectral coordinates for you.

**Related**:

If you don’t use all the dominoes you can make an actual square, no extra row to ignore:

I found the design of these magic squares in The Zen of Magic Squares, Circles, and Stars by Cliff Pickover. He credits Mathematical recreations and essays by Ball and Coxeter.

While the most common set of dominoes goes up to double six, there are other possibilities. I made the images above by pulling the double six subset out of a set of double nine dominoes. Is it possible to make magic squares like the first photo above out of a full set of dominoes up to double *n* for other values of *n*? Not if *n* is odd, but perhaps if *n* is even.

You can at least arrange the dominoes into a square. A set of double *n* dominoes has (*n* + 1)(*n* + 2)/2 pieces, so (*n* + 1)(*n* + 2) squares. With a row of *n* + 1 blanks on the bottom, this leaves (*n* + 1)^{2} squares in the magic square. The total number of points in a set of double *n* dominoes is *n*(*n* + 1)(*n* + 2)/2, so each row, column, and diagonal would have to sum to *n*(*n* + 2)/2. This means you cannot arrange a set of double *n* dominoes into a magic square if *n* is odd. So, for example, it won’t work for double nine dominoes. Is it possible when *n* is even? Certainly if *n* = 6, but what about other values of *n*?
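The arithmetic in the last paragraph is easy to put into code (the function name is mine):

```python
def domino_magic_sum(n):
    """Piece count, total pips, and required row sum for a double-n set.

    A double-n set has (n+1)(n+2)/2 dominoes, hence (n+1)(n+2) squares.
    Setting aside a row of n+1 blanks leaves an (n+1) x (n+1) square.
    Total pips: n(n+1)(n+2)/2, so each of the n+1 rows must sum to
    n(n+2)/2, which is an integer only when n is even.
    """
    pieces = (n + 1) * (n + 2) // 2
    total_pips = n * pieces
    magic_sum = total_pips / (n + 1)
    return pieces, total_pips, magic_sum

assert domino_magic_sum(6) == (28, 168, 24.0)   # double six: row sum 24
assert domino_magic_sum(9)[2] != int(domino_magic_sum(9)[2])  # odd n fails
```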


- Was this data collected in a proper way?
- Does common sense apply here, or is there something subtle going on?
- What conclusions can we draw from the data?
- Is this analysis routine or is there something unusual about it?
- How much confidence can we place in the conclusions?
- How can you explain this to a jury?
- Did opposing counsel analyze their data properly?

**Related**:

The matrix representing the inverse of the DFT is the conjugate of the DFT matrix (divided by *N*, but we’re only looking at phase here, so we can ignore this rescaling). The image below displays the DFT matrix on the left and its inverse on the right.

Taking the conjugate amounts to making all the clocks run backward.

The DFT is often called the FFT. Strictly speaking, the FFT is an algorithm for computing the DFT. Nobody computes a DFT by multiplying by the DFT matrix, because the FFT is faster. The DFT matrix has a lot of special structure, which the FFT takes advantage of to compute the product faster than using ordinary matrix multiplication.
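The relationship between the DFT matrix, its conjugate, and the FFT can be checked directly. Here’s a sketch with *N* = 12, one matrix entry per clock position.

```python
import numpy as np

N = 12   # 12 x 12 DFT matrix, one "clock" per hour

# DFT matrix: F[j, k] = exp(-2 pi i j k / N)
j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
F = np.exp(-2j * np.pi * j * k / N)

# The inverse is the conjugate of F divided by N: the clocks run backward
Finv = np.conj(F) / N
assert np.allclose(F @ Finv, np.eye(N))

# Multiplying by F agrees with the FFT, which computes the product faster
x = np.random.default_rng(1).standard_normal(N)
assert np.allclose(F @ x, np.fft.fft(x))
```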

By the way, there are Unicode characters for clock times on the hour, U+1F550 through U+1F55B. I created the image above by writing a script that put the right characters in a table. The colors have HSL values where H is proportional to the angle and S = L = 0.8.

**Related**: Digital signal processing and time series analysis


For a complete graph on *n* vertices, all the eigenvalues except the first equal *n*. The eigenvalues of the Laplacian of a graph with *n* vertices are always less than or equal to *n*, so this says the complete graph has the largest possible eigenvalue. This makes sense in light of the previous post: adding edges increases the eigenvalues. When you add as many edges as possible, you get the largest possible eigenvalues.

Conversely, if all the non-zero eigenvalues of the graph Laplacian are equal, the graph is complete.

Next we look at the case of a star with *n*-1 vertices connected to a central vertex. The smallest eigenvalue is zero, as always, and the largest is *n*. The ones in the middle are all 1. In particular, the second eigenvalue, the algebraic connectivity, is 1. It’s positive because the graph is connected, but it’s not large because the graph is not well connected: if you remove the central vertex it becomes completely disconnected.

Finally, we look at a cyclical graph, a ring with *n* vertices. Here the eigenvalues are 2 – 2 cos(2π*k*/*n*) where 0 ≤ *k* ≤ *n*/2. In particular, the smallest non-zero eigenvalue is 2 – 2 cos(2π/*n*), and so as *n* increases, the algebraic connectivity approaches zero. This is curious since the topology doesn’t change at all. But from a graph theory perspective, a big ring is less connected than a small ring. In a big ring, each vertex is connected to a small proportion of the vertices, and the average distance between vertices is larger.
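All three spectra are easy to confirm numerically. A sketch using NetworkX (the helper function is mine):

```python
import networkx as nx
import numpy as np

def laplacian_spectrum(G):
    """Eigenvalues of the graph Laplacian, sorted in increasing order."""
    L = nx.laplacian_matrix(G).toarray().astype(float)
    return np.sort(np.linalg.eigvalsh(L))

n = 10

# Complete graph: one zero eigenvalue, the rest all equal n
eig = laplacian_spectrum(nx.complete_graph(n))
assert np.allclose(eig, [0] + [n]*(n-1))

# Star: 0, then n-2 ones, then n
eig = laplacian_spectrum(nx.star_graph(n - 1))   # n vertices total
assert np.allclose(eig, [0] + [1]*(n-2) + [n])

# Cycle: algebraic connectivity is 2 - 2 cos(2 pi / n)
eig = laplacian_spectrum(nx.cycle_graph(n))
assert np.isclose(eig[1], 2 - 2*np.cos(2*np.pi/n))
```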

**Related**: Help understanding complex networks