You can find most of the links from previous Wednesday posts on one page by going to technical notes from the navigation menu at the top of the site.


Graphene is an allotrope of carbon arranged in a hexagonal crystal lattice one atom thick. A grapheme, or more fully, a grapheme cluster string, is a single user-visible character, which in turn may be several characters (codepoints) long. For example … a “ȫ” is a single grapheme but one, two, or even three characters, depending on normalization.

In case the character ȫ doesn’t display correctly for you, here it is:

First, *graphene* has little to do with *grapheme*, but it’s geeky fun to include it anyway. (Both are related to writing. A grapheme has to do with how characters are written, and the word *graphene* comes from graphite, the “lead” in pencils. The origin of *grapheme* has nothing to do with graphene but was an analogy to *phoneme*.)

Second, the example shows how complicated the details of Unicode can get. The Perl code below expands on the details of the comment about ways to represent ȫ.

This demonstrates that the character `.` in regular expressions matches any single character, but `\X` matches any single grapheme. (Well, almost. The character `.` usually matches any character *except a newline*, though this can be modified via optional switches. But `\X` matches any grapheme including newline characters.)

```perl
use v5.14;                 # enables say
binmode STDOUT, ':utf8';

# U+022B, o with diaeresis and macron
my $a = "\x{22B}";

# U+00F6 U+0304, (o with diaeresis) + macron
my $b = "\x{F6}\x{304}";

# o U+0308 U+0304, o + diaeresis + macron
my $c = "o\x{308}\x{304}";

my @versions = ($a, $b, $c);

# All versions display the same.
say @versions;

# The versions have length 1, 2, and 3.
# Only $a contains one character and so matches .
say map {length $_ if /^.$/} @versions;

# All versions consist of one grapheme.
say map {length $_ if /^\X$/} @versions;
```
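For readers who don’t use Perl, here is a rough Python analogue — a sketch using only the standard library. Python’s built-in `re` module has no `\X`, so this only shows the lengths and that all three strings collapse to the same grapheme under normalization.

```python
import unicodedata

a = "\u022b"        # U+022B, precomposed o with diaeresis and macron
b = "\u00f6\u0304"  # U+00F6 + U+0304, (o with diaeresis) + combining macron
c = "o\u0308\u0304" # o + combining diaeresis + combining macron

print([len(s) for s in (a, b, c)])  # [1, 2, 3]

# NFC normalization composes all three into the same single character.
print(len({unicodedata.normalize("NFC", s) for s in (a, b, c)}))  # 1
```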

If the only criticism is that something is too easy or “OK for beginners” then maybe it’s a threat to people who invested a lot of work learning to do things the old way.

The problem with the “OK for beginners” put-down is that everyone is a beginner sometimes. Professionals are often beginners because they’re routinely trying out new things. And being easier for beginners doesn’t exclude the possibility of being easier for professionals too.

Sometimes we assume that harder must be better. I know I do. For example, when I first used Windows, it was so much easier than Unix that I assumed Unix must be better for reasons I couldn’t articulate. I had invested so much work learning to use the Unix command line, it must have been worth it. (There are indeed advantages to doing some things from the command line, but not the work I was doing at the time.)

There often are advantages to doing things the hard way, but something isn’t necessarily better *because* it’s hard. The easiest tool to pick up may not be the best tool for long-term use, but then again it might be.

Most of the time you want to *add* the easy tool to your toolbox, not take the old one out. Just because you can use specialized professional tools doesn’t mean that you always have to.

**Related post**: Don’t be a technical masochist

- CRMSimulator is used to design CRM trials, dose-finding based only on toxicity outcomes.
- BMA-CRMSimulator is a variation on CRMSimulator using Bayesian model averaging.
- EffTox is used for dose-finding based on toxicity and efficacy outcomes.
- TTEConduct and TTEDesigner are for safety monitoring of single-arm trials with time-to-event outcomes.
- Multc Lean monitors two binary outcomes, efficacy and toxicity.
- Adaptive Randomization is for multiple-arm trials using outcome-adaptive randomization.
- More software from MD Anderson

**Last week’s resource post**: Stand-alone numerical code


Since your goal is to find the best dose, it seems natural to compare dose-finding methods by how often they find the best dose. This is what is most often done in the clinical trial literature. But this seemingly natural criterion is actually artificial.

Suppose a trial is testing doses of 100, 200, 300, and 400 milligrams of some new drug. Suppose further that on some scale of goodness, these doses rank 0.1, 0.2, 0.5, and 0.51. (Of course these goodness scores are unknown; the point of the trial is to estimate them. But you might make up some values for simulation, pretending with half your brain that these are the true values and pretending with the other half that you don’t know what they are.)

Now suppose you’re evaluating two clinical trial designs, running simulations to see how each performs. The first design picks the 400 mg dose, the best dose, 20% of the time and picks the 300 mg dose, the second best dose, 50% of the time. The second design picks each dose with equal probability. The latter design picks the *best* dose more often, but it picks a *good* dose less often.

In this scenario, the two largest doses are essentially equally good; it hardly matters how often a method distinguishes between them. The first method picks one of the two good doses 70% of the time while the second method picks one of the two good doses only 50% of the time.

This example was exaggerated to make a point: obviously it doesn’t matter how often a method can pick the better of two very similar doses, not when it very often picks a bad dose. But there are less obvious situations that are quantitatively different but qualitatively the same.

The goal is actually to find a good dose. Finding the absolute best dose is impossible. The most you could hope for is that a method finds with high probability the best of the four *arbitrarily chosen doses* under consideration. Maybe the *best* dose is 350 mg, 843 mg, or some other dose not under consideration.

A simple way to make evaluating dose-finding methods less arbitrary would be to estimate the benefit to patients. Finding the best dose is only a matter of curiosity in itself unless you consider how that information is used. Knowing the best dose is important because you want to treat future patients as effectively as you can. (And patients in the trial itself as well, if it is an adaptive trial.)

Suppose the measure of goodness in the scenario above is probability of successful treatment and that 1,000 patients will be treated at the dose level picked by the trial. Under the first design, there’s a 20% chance that 51% of the future patients will be treated successfully, and a 50% chance that 50% will be. The expected number of successful treatments from the two best doses is 352. Under the second design, the corresponding number is 252.5.

(To simplify the example above, I didn’t say how often the first design picks each of the two lowest doses. But the first design will result in at least 382 expected successes and the second design 327.5.)
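The arithmetic above is easy to check. The success probabilities below are the made-up “goodness” scores from the example, and 1,000 is the assumed number of future patients.

```python
# Expected successful treatments among 1,000 future patients,
# counting only the contributions of the two best doses.
success = {100: 0.10, 200: 0.20, 300: 0.50, 400: 0.51}
horizon = 1000

# Design 1: picks 400 mg 20% of the time and 300 mg 50% of the time.
design1 = (0.20 * success[400] + 0.50 * success[300]) * horizon

# Design 2: picks each of the four doses with probability 1/4.
design2 = (0.25 * success[400] + 0.25 * success[300]) * horizon

print(round(design1, 1), round(design2, 1))  # 352.0 252.5
```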

You never know how many future patients will be treated according to the outcome of a clinical trial, but there must be some implicit estimate. If this estimate is zero, the trial is not worth conducting. In the example given here, the estimate of 1,000 future patients is irrelevant: the future patient horizon cancels out in a comparison of the two methods. The patient horizon matters when you want to include the benefit to patients in the trial itself. The patient horizon serves as a way to weigh the interests of current versus future patients, an ethically difficult comparison usually left implicit.

Albert Einstein, 1954

The opposite of an idiot would not be someone who is wise, but someone who takes too much outside input, someone who passively lets others make all his decisions or who puts every decision to a vote.

An idiot lives only in his own world; the opposite of an idiot has no world of his own. Both are foolish, but I think the Internet encourages more of the latter kind of foolishness. It’s not turning people into idiots, it’s turning them into the opposite.

Maybe the company is good at attracting investors. Maybe they have enormous economies of scale that overcome their diseconomies of scale. Maybe they’ve lobbied politicians to protect the company from competition. (That’s a form of success, though not an honorable one.) Maybe they serve a market well, but you don’t see it because you’re not part of that market.

The code is available in Python, C++, and C# versions. It could easily be translated into other languages since it hardly uses any language-specific features.

I wrote these functions for projects where you don’t have a numerical library available or would like to minimize dependencies. If you have access to a numerical library, such as SciPy in Python, then by all means use it (although SciPy is missing some of the random number generators provided here). In C++ and especially C#, it’s harder to find some of this functionality.

**Last week**: Code Project articles

**Next week**: Clinical trial software

As the number of steps *N* goes to infinity, the probability that the proportion of your time in positive territory is less than *x* approaches 2 arcsin(√*x*)/π. The arcsine term gives this rule its name, the arcsine law.

Here’s a little Python script to illustrate the arcsine law.

```python
import random
from numpy import arcsin, pi, sqrt

def step():
    u = random.random()
    return 1 if u < 0.5 else -1

M = 1000  # outer loop
N = 1000  # inner loop
x = 0.3   # Use any 0 < x < 1 you'd like.

outer_count = 0
for _ in range(M):
    position = 0
    inner_count = 0
    for __ in range(N):
        position += step()
        if position > 0:
            inner_count += 1
    if inner_count/N < x:
        outer_count += 1

print(outer_count/M)
print(2*arcsin(sqrt(x))/pi)
```

Aleksandr Khinchin proved that for almost all real numbers *x*, as *n* → ∞ the geometric means converge. Not only that, they converge to the same constant, known as Khinchin’s constant, 2.685452001…. (“Almost all” here means in the sense of measure theory: the set of real numbers that are exceptions to Khinchin’s theorem has measure zero.)

To get an idea how fast this convergence is, let’s start by looking at the continued fraction expansion of π. In Sage, we can type

```python
continued_fraction(RealField(100)(pi))
```

to get the continued fraction coefficients

[3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, 1, 1, 15, 3]

for π to 100 decimal places. The geometric mean of these coefficients is 2.84777288486, which only matches Khinchin’s constant to 1 significant figure.

Let’s try choosing random numbers and working with more decimal places.

There may be a more direct way to find geometric means in Sage, but here’s a function I wrote. It leaves off any leading zeros that would cause the geometric mean to be zero.

```python
from numpy import exp, mean, log

def geometric_mean(x):
    return exp(mean([log(k) for k in x if k > 0]))
```

Now let’s find 10 random numbers to 1,000 decimal places.

```python
for _ in range(10):
    r = RealField(1000).random_element(0, 1)
    print(geometric_mean(continued_fraction(r)))
```

This produced

```
2.66169890535
2.62280675227
2.61146463641
2.58515620064
2.58396664032
2.78152297661
2.55950338205
2.86878898900
2.70852612496
2.52689450535
```

Three of these agree with Khinchin’s constant to two significant figures but the rest agree only to one. Apparently the convergence is very slow.

If we go back to π, this time looking out 10,000 decimal places, we get a little closer:

```python
print(geometric_mean(continued_fraction(RealField(10000)(pi))))
```

produces 2.67104567579, which differs from Khinchin’s constant by about 0.5%.

It was not a matter of everything in mathematics collapsing in on itself, with one branch turning out to have been merely a recapitulation of another under a different guise. Rather, the principle was that every sufficiently beautiful mathematical system was rich enough to mirror in part — and sometimes in a complex and distorted fashion — every other sufficiently beautiful system. Nothing became sterile and redundant, nothing proved to have been a waste of time, but everything was shown to be magnificently intertwined.

In nature, the optimum is almost always in the middle somewhere. Distrust assertions that the optimum is at an extreme point.

When I first read this I immediately thought of several examples where theory said that an optimum was at an extreme, but experience said otherwise.

Linear programming (LP) says the opposite of Akin’s law. The optimal point for a linear objective function subject to linear constraints is *always* at an extreme point. The constraints form a many-sided shape—you could think of it something like a diamond—and the optimal point will always be at one of the corners.
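A brute-force illustration of the LP claim, with a made-up objective and constraints (this evaluates the objective on a grid rather than using an LP solver): maximize 2*x* + 3*y* over the polygon *x* + *y* ≤ 4, 0 ≤ *x* ≤ 3, 0 ≤ *y* ≤ 2. The maximum lands exactly on a corner.

```python
def feasible(x, y):
    return x + y <= 4 and 0 <= x <= 3 and 0 <= y <= 2

def objective(x, y):
    return 2 * x + 3 * y

# Evaluate the linear objective on a fine grid of feasible points.
best = max((objective(i / 100, j / 100), i / 100, j / 100)
           for i in range(301) for j in range(201)
           if feasible(i / 100, j / 100))

print(best)  # (10.0, 2.0, 2.0): the maximum is at the vertex (2, 2)
```

No interior or edge point beats the corner (2, 2), as LP theory predicts for any linear objective.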

Nature is not linear, though it is often approximately linear over some useful range. One way to read Akin’s law is to say that even when something is approximately linear in the middle, there’s enough non-linearity at the extremes to pull the optimal points back from the edges. Or said another way, when you have an optimal value at an extreme point, there may be some unrealistic aspect of your model that pushed the optimal point out to the boundary.

**Related post**: Data calls the model’s bluff

But “maximum tolerated dose” implies a degree of personalization that rarely exists in clinical trials. Phase I chemotherapy trials don’t try to find the maximum dose that any particular patient can tolerate. They try to find a dose that is toxic to a certain percentage of the trial participants, say 30%. (This rate may seem high, but it’s typical. It’s not far from the toxicity rate implicit in the so-called 3+3 rule or from the explicit rate given in many CRM (“continual reassessment method”) designs.)

It’s tempting to think of “30% toxicity rate” as meaning that each patient experiences a 30% toxic reaction. But that’s not what it means. It means that each patient has a 30% chance of a toxicity, however toxicity is defined in a particular trial. If toxicity were defined as kidney failure, for example, then 30% toxicity rate means that each patient has a 30% probability of kidney failure, not that they should expect a 30% reduction in kidney function.

**Related posts**:

It’s hard to imagine how investors could abandon something as large and expensive as a shopping mall. And yet it must have been a sensible decision. If anyone disagreed, they could buy the abandoned mall on the belief that they could make a profit.

The idea that you should stick to something just because you’ve invested in it goes by many names: sunk cost fallacy, escalation of commitment, gambler’s ruin, etc. If further investment will simply lose more money, there’s no economic reason to continue investing, regardless of how much money you’ve spent. (There may be non-economic reasons. You may have a moral obligation to fulfill a commitment or to clean up a mess you’ve made.)

Most of us have not faced the temptation to keep investing in an unprofitable shopping mall, but everyone is tempted by the sunk cost fallacy in other forms: finishing a novel you don’t enjoy reading, holding on to hopelessly tangled software, trying to earn a living with a skill people no longer wish to pay for, etc.

According to Peter Drucker, “It cannot be said often enough that one should not postpone; one abandons.”

The first step in a growth policy is not to decide where and how to grow.

It is to decide what to abandon. In order to grow, a business must have a systematic policy to get rid of the outgrown, the obsolete, the unproductive.

It’s usually more obvious what someone else should abandon than what we should abandon. Smart businesses turn to outside consultants for such advice. Smart individuals turn to trusted friends. An objective observer without our emotional investment can see things more clearly than we can.

***

This post started out as a shorter post on Google+.

Photo above by Steve Petrucelli via flickr

Pitfalls in Random Number Generation includes several lessons learned the hard way.

Simple Random Number Generation is a random number generator written in C# based on George Marsaglia’s MWC algorithm.

Finding probability distribution parameters from percentiles

Avoiding Overflow, Underflow, and Loss of Precision explains why the most obvious method for evaluating mathematical functions may not work. The article includes C++ source code for evaluating some functions that come up in statistics (particularly logistic regression) that could have problems if naïvely implemented.

An introduction to numerical programming in C#

Five tips for floating point programming gives five of the most important things someone needs to know when working with floating point numbers.

Optimizing a function of one variable with Brent’s method.

Fast numerical integration using the double-exponential transform method. Optimally efficient numerical integration for analytic functions over a finite interval.

Three methods for root-finding in C#

Getting started with the SciPy (Scientific Python) library

Numerical integration with Simpson’s rule

Filling in the gaps: simple interpolation discusses linear interpolation and inverse interpolation, and gives some suggestions for what to do next if linear interpolation isn’t adequate.

Automated Extract and Build from Team System using PowerShell explains a PowerShell script to automatically extract and build Visual Studio projects from Visual Studio Team System (VSTS) version control.

PowerShell Script for Reviewing Text Show to Users is a tool for finding errors in prose displayed to users that might not be exposed during testing.

Monitoring unreliable scheduled tasks describes a simple program for monitoring legacy processes.

Calculating percentiles in memory-bound applications gives an algorithm and C++ code for calculating percentiles of a list too large to fit into memory.

Quick Start for C++ TR1 Regular Expressions answers 10 of the first questions that are likely to come to mind when someone wants to use the new regular expression support in C++.

**Last week**: Miscellaneous math notes

**Next week**: Stand-alone numerical code

Why specifically Perl regular expressions? Because Perl has the most powerful support for regular expressions (strictly speaking, “pattern matching”). Other languages offer “Perl compatible” regular expressions, though the degree of compatibility varies and is always less than complete.

I imagine more people have ruled England than have mastered the whole of the Perl language. But it’s possible to use Perl for regular expression processing without learning too much of the wider language.

Looking through Andrew Gelman and Jennifer Hill’s regression book, I noticed a justification for natural logarithms I hadn’t thought about before.

We prefer natural logs (that is, logarithms base *e*) because, as described above, coefficients on the natural-log scale are directly interpretable as approximate proportional differences: with a coefficient of 0.06, a difference of 1 in *x* corresponds to an approximate 6% difference in *y*, and so forth.

This is because

exp(*x*) ≈ 1 + *x*

for small values of *x* based on a Taylor series expansion. So in Gelman and Hill’s example, a difference of 0.06 on a natural log scale corresponds to roughly multiplying by 1.06 on the original scale, i.e. a 6% increase.

The Taylor series expansion for exponents of 10 is not so tidy:

10^{x} ≈ 1 + 2.302585 *x*

where 2.302585 is the numerical value of the natural log of 10. This means that a change of 0.01 on a log_{10} scale corresponds to an increase of about 2.3% on the original scale.
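Both approximations are easy to check numerically — the exact factors are close to the stated percentage changes:

```python
from math import exp

# A natural-log coefficient of 0.06 corresponds to roughly a 6% increase.
print(exp(0.06))   # 1.0618..., about 6%

# A change of 0.01 on a log10 scale corresponds to roughly a 2.3% increase.
print(10 ** 0.01)  # 1.0233..., about 2.3%
```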

**Related post**: Approximation relating lg, ln, and log10

Probability and statistics:

- How to test a random number generator
- Predictive probabilities for normal outcomes
- One-arm binary predictive probability
- Relating two definitions of expectation
- Illustrating the error in the delta method
- Relating the error function erf and Φ
- Inverse gamma distribution
- Negative binomial distribution
- Upper and lower bounds for the normal distribution function
- Canonical example of Bayes’ theorem in detail
- Functions of regular variation
- Student-t as a mixture of normals

Other math:

- Chebyshev polynomials
- Richard Stanley’s twelvefold way (combinatorics)
- Hypergeometric functions
- Outline of Laplace transforms
- Navier-Stokes equations
- Picking the step size for numerical ODEs
- Orthogonal polynomials
- Multi-index notation
- The *pqr* theorem for seminorms

See also journal articles and technical reports.

**Last week**: Probability approximations

**Next week**: Code Project articles

Peter Norvig calculated frequencies for all pairs of letters based on the corpus Google extracted from millions of books. He gives a table that will show you the frequency of a bigram when you mouse over it. I scraped his HTML page to create a simple CSV version of the data. My file lists bigrams, frequencies to three decimal places, and the raw counts: bigram_frequencies.csv. The file is sorted in decreasing order of frequency.
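Working with the CSV is straightforward. Here is a sketch of filtering it for the rarest bigrams; the sample rows below are made up for illustration, but the column layout (bigram, percentage frequency to three decimal places, raw count) follows the description above.

```python
import csv
import io

# Made-up sample standing in for bigram_frequencies.csv.
sample = io.StringIO(
    "TH,3.556,100272945963\n"
    "ZZ,0.003,1108228\n"
    "QE,0.000,1354883\n"
)
rows = [(bg, float(pct), int(count)) for bg, pct, count in csv.reader(sample)]

# Bigrams whose percentage frequency rounds to zero at three decimal places.
rare = [bg for bg, pct, count in rows if pct < 0.0005]
print(rare)  # ['QE']
```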

The Emacs key-chord module lets you bind pairs of letters to Emacs commands. For example, if you map a command to `jk`, that command will execute whenever you type `j` and `k` in quick succession. In that case, if you want the literal sequence “jk” to appear in a file, pause between typing the `j` and the `k`. This may sound like a bad idea, but I haven’t run into any problems using it. It allows you to execute your most frequently used commands very quickly. Also, there’s no danger of conflict since neither basic Emacs nor any of its common packages use key chords.

The table below gives bigrams whose percentage frequency rounds to zero at three decimal places. See the data file for details.

Since Q is always followed by U in native English words, it’s safe to combine Q with any other letter. (If you need to type Qatar, just pause a little after typing the Q.) It’s also safe to use any consonant after J and most consonants after Z. (It’s rare for a consonant to follow Z, but not quite rare enough to round to zero. ZH and ZL occur with 0.001% frequency, ZY 0.002% and ZZ 0.003%.)

Double letters make especially convenient key chords since they’re easy to type quickly. JJ, KK, QQ, VV, WW, and YY all have frequency rounding to zero. HH and UU have frequency 0.001% and AA, XX, and ZZ have frequency 0.003%.

Note that the discussion above does not distinguish upper and lower case letters in counting frequencies, but Emacs key chords are case-sensitive. You could make a key chord out of any pair of capital letters unless you like to shout in online discussions, use a lot of acronyms, or write old-school FORTRAN.

**Update** (2 Feb 2015):

This post only considered ordered bigrams. But Emacs key chords are unordered, combinations of keys pressed at or near the same time. This means, for example, that `qe` would not be a good key chord because although QE is a rare bigram, EQ is not (0.057%). The file unordered_bigram_frequencies.csv gives the combined frequencies of bigrams and their reverse (except for double letters, in which case it simply gives the frequency).

Combinations of J and a consonant are still mostly good key chords except for JB (0.023%), JN (0.011%), and JD (0.005%).

Combinations of Q and a consonant are also good key chords except for QS (0.007%), QN (0.006%), and QC (0.005%). And although O is a vowel, QO still makes a good key chord (0.001%).

]]>If your points cluster along one of the coordinate axes, then projecting them to that axis will show the full width of the data. But if your points cluster along one of the diagonal directions, the projection along every coordinate axis will be a tiny smudge near the origin. There are a lot more diagonal directions than coordinate directions, 2^{N} versus *N*, and so there are a lot of orientations of your points that could be missed by every coordinate projection.

Here’s the math behind the loose statements above. The diagonal directions are of the form (±1, ±1, …, ±1). A unit vector in one of these directions has the form (1/√*N*)(±1, ±1, …, ±1) and so its inner product with any of the coordinate basis vectors is ±1/√*N*, which goes to zero as *N* gets large. Said another way, taking a set of points along a diagonal and projecting it to a coordinate axis divides its width by √*N*.
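A quick numerical check of the √*N* shrinkage (the dimension and number of sample points are arbitrary choices for this sketch):

```python
import numpy as np

N = 100
ts = np.linspace(-1, 1, 1001)       # parameter along the diagonal
diag = np.ones(N) / np.sqrt(N)      # unit vector along a diagonal
points = np.outer(ts, diag)         # points spread along that diagonal

width_along_diag = np.ptp(points @ diag)  # full width: 2.0
width_on_axis = np.ptp(points[:, 0])      # 2/sqrt(N) = 0.2

print(width_along_diag, width_on_axis)
```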

There are some things you may need to learn not for the content itself but for the confidence boost. Maybe you need to learn them so you can confidently say you didn’t need to. Also, some things you need to learn before you can see uses for them. (More on that theme here.)

I’ve learned several things backward in the sense of learning the advanced material before the elementary. For example, I studied PDEs in graduate school before having mastered the typical undergraduate differential equation curriculum. That nagged at me. I kept thinking I might find some use for the undergrad tricks. When I had a chance to teach the undergrad course a couple times, I increased my confidence. I also convinced myself that I didn’t need that material after all.

My experience with statistics was similar. I was writing research articles in statistics before I learned some of the introductory material. Once again the opportunity to teach the introductory material increased my confidence. The material wasn’t particularly useful, but the experience of having taught it was.

**Related post**: Psychological encapsulation

[1] See Yeats’ poem The Second Coming:

…

The best lack all conviction, while the worst

Are full of passionate intensity.

…


Do we even need probability approximations anymore? They’re not as necessary for numerical computation as they once were, but they remain vital for understanding the behavior of probability distributions and for theoretical calculations.

Textbooks often leave out details such as quantifying the error when discussing approximations. The following pages are notes I wrote to fill in some of these details when I was teaching.

- Error in the normal approximation to the binomial distribution
- Error in the normal approximation to the gamma distribution
- Error in the normal approximation to the Poisson distribution
- Error in the normal approximation to the t distribution
- Error in the Poisson approximation to the binomial distribution
- Error in the normal approximation to the beta distribution
- Camp-Paulson normal approximation to the binomial distribution
- Diagram of probability distribution relationships
- Relative error in normal approximations

See also blog posts tagged Probability and statistics and the Twitter account ProbFact.

**Last week**: Numerical computing resources

**Next week**: Miscellaneous math notes

There are three ways Bayesian posterior probability calculations can degrade with more data:

- Polynomial approximation
- Missing the spike
- Underflow

Elementary numerical integration algorithms, such as Gaussian quadrature, are based on polynomial approximations. The method aims to exactly integrate a polynomial that approximates the integrand. But likelihood functions are not approximately polynomial, and they become less like polynomials when they contain more data. They become more like a normal density, asymptotically flat in the tails, something no polynomial can do. With better integration techniques, the integration accuracy will *improve* with more data rather than degrade.

With more data, the posterior distribution becomes more concentrated. This means that a naive approach to integration might entirely miss the part of the integrand where nearly all the mass is concentrated. You need to make sure your integration method is putting its effort where the action is. Fortunately, it’s easy to estimate where the mode should be.

The third problem is that software calculating the likelihood function can underflow with even a moderate amount of data. The usual solution is to work with the logarithm of the likelihood function, but with numerical integration the solution isn’t quite that simple. You need to integrate the likelihood function itself, not its logarithm. I describe how to deal with this situation in Avoiding underflow in Bayesian computations.
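Here is a sketch of both the problem and the standard fix. The numbers (2,000 Bernoulli observations, a uniform grid over θ) are made up for illustration: the likelihood itself underflows, but the log likelihood is perfectly representable, and subtracting the maximum log likelihood before exponentiating keeps the integrand in range.

```python
import numpy as np

n = 2000
print(0.5 ** n)         # 0.0 -- the likelihood underflows
print(n * np.log(0.5))  # about -1386.29 -- the log likelihood is fine

# To integrate the likelihood itself, factor out exp(max log likelihood):
#   integral of L(theta) = exp(l_max) * integral of exp(l(theta) - l_max)
theta = np.linspace(0.001, 0.999, 999)
k = 1000  # successes out of n trials
loglik = k * np.log(theta) + (n - k) * np.log(1 - theta)
scaled = np.exp(loglik - loglik.max())  # peak is exactly 1; far tails
                                        # harmlessly flush to zero
print(scaled.max())                     # 1.0
```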

***

If you’d like help with statistical computation, let’s talk.

**Update** (27 Jan 2015): You may want to skip the post below and use ace-window. I plan to use it rather than my solution below. Also, see Sacha Chua’s video on window management, including using ace-window.

You can use `C-x o` to move the cursor to the “other” window. That works great when you only have two windows: `C-x o` toggles between them.

When you have more windows, `C-x o` moves to the next window in top-to-bottom, left-to-right order. So if you have five windows, for example, you have to go to the “other” window up to four times to move to each of the windows.

The cursor is in the A window. Repeatedly executing `C-x o` will move the cursor to C, B, and D.

There are more direct commands:

- `windmove-up`
- `windmove-down`
- `windmove-left`
- `windmove-right`

You might map these to Control or Alt plus the arrow keys to have a memorable key sequence to navigate windows. However, such keybindings may conflict with existing keybindings, particularly in `org-mode`.

You can move windows around with the buffer-move package. (Technically the windows stay fixed and the buffer contents move around.)

Since the various Control arrow keys and Alt arrow keys are spoken for in my configuration, I thought I’d create Super and Hyper keys so that Super-up would map to `windmove-up` etc.

Several articles recommend mapping the Windows key to Super and the Menu key to Hyper on Windows. The former is problematic because so many of the Windows key combinations are already used by the OS. I did get the latter to work by adding the following to the branch of my `init.el` that runs on Windows.

```elisp
(setq w32-pass-apps-to-system nil
      w32-apps-modifier 'hyper)

(global-set-key (kbd "<H-up>")    'windmove-up)
(global-set-key (kbd "<H-down>")  'windmove-down)
(global-set-key (kbd "<H-left>")  'windmove-left)
(global-set-key (kbd "<H-right>") 'windmove-right)
```

I haven’t figured out yet how to make Super or Hyper keys work on Linux. Suggestions for setting up Super and Hyper on Linux or Windows would be much appreciated.

I’ve done consulting on the side throughout my career, and I planned to ramp up my consulting to the point that I could gradually transition into doing it full time. That never happened. I had to quit my day job before I had the **time** and **credibility** to find projects.

When you have a job and you tell someone you can work 10 hours a week on their project, working evenings and weekends, you sound like an amateur. But when you have your own business and tell someone you can only allocate 10 hours a week to their project, you sound like you’re in high demand and they’re glad you could squeeze them in.

When I left MD Anderson, I had one small consulting project lined up. I had been working on a second project, but it ended sooner than expected. (I was an expert witness on a case that settled out of court.) The result was that I started my consulting career with little work, and I imagine that’s common.

As soon as I announced on my blog that I was going out on my own, someone from a pharmaceutical company contacted me saying he was hoping I’d quit my job because he had a project for me, helping improve his company’s statistical software development. First day, first lead. This is going to be easy! Even though he was eager to get started, it was months before the project started and months more before I got paid.

In general, the larger a client is, the longer it takes to get projects started, and the longer they sit on invoices. (Google is an exception to the latter; they pay invoices fairly quickly.) Small clients negotiate projects quickly and pay quickly. They can help with your cash flow while you’re waiting on bigger clients.

This is a corollary to the discussion above. You might not have much income for the first few months, so you need several months’ living expenses in the bank before you start.

If you’re thinking about going out on your own and would like to talk about it, give me a call or send me an email. My contact information is listed here.


- Stand-alone code for numerical computing
- Accurately computing running variance
- IEEE floating-point exceptions in C++
- Double exponential integration
- Math.h in POSIX, ISO, and Visual Studio

See also the Twitter account SciPyTip and numerical programming articles I’ve written for Code Project.
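The running-variance link above refers to the one-pass, numerically stable approach usually attributed to Welford. A minimal Python sketch of that idea (my own illustration, not code from the linked post):

```python
def running_stats(xs):
    """Welford's online algorithm: one pass, numerically stable.

    Accumulates the mean and the sum of squared deviations (M2)
    incrementally, avoiding the catastrophic cancellation of the
    naive sum-of-squares formula.
    """
    n = 0
    mean = 0.0
    M2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        M2 += delta * (x - mean)
    variance = M2 / (n - 1) if n > 1 else 0.0  # sample variance
    return mean, variance
```

The point of the incremental update is that `M2` is always a sum of small non-negative terms, whereas the textbook formula subtracts two large, nearly equal quantities.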

**Last week**: Regular expressions

**Next week**: Probability approximations

In an all-too-familiar trade-off,

the result of striving for ultimate simplicity is intolerable complexity; to eliminate *too-long proofs* we find ourselves “hopelessly lost” among the *too-long definitions*. [emphasis added]

It’s as if there’s some sort of conservation of complexity, but not quite in the sense of a physical conservation law. Conservation of momentum, for example, means that if one part of a system loses 5 units of momentum, other parts of the system have to absorb exactly 5 units of momentum. But perceived complexity is psychological, not physical, and the accounting is not the same. By moving complexity around we might increase or decrease the overall complexity.

The opening quote suggests that complexity is an optimization problem, not an accounting problem. It also suggests that driving the complexity of one part of a system to its minimum may disproportionately increase the complexity of another part. Striving for the simplest possible proofs, for example, could make the definitions much harder to digest. There’s a similar dynamic in programming languages and programs.

Larry Wall said that Scheme is a beautiful programming language, but every Scheme program is ugly. Perl, on the other hand, is ugly, but it lets you write beautiful programs. Scheme can be simple because it requires libraries and applications to implement functionality that is built into more complex languages. I had similar thoughts about COM. It was an elegant object system that led to hideous programs.

Scheme is a minimalist programming language, and COM is a minimalist object framework. By and large the software development community prefers complex languages and frameworks in hopes of writing smaller programs. Additional complexity in languages and frameworks isn’t felt as strongly as additional complexity in application code. (Until something breaks. Then you might have to explore parts of the language or framework that you had blissfully ignored before.)

The opening quote deals specifically with the complexity of theorems and proofs. In context, the author was saying that the price of Grothendieck’s elegant proofs was a daunting edifice of definitions. (More on that here.) Grothendieck may have taken this to extremes, but many mathematicians agree with the general approach of pushing complexity out of theorems and into definitions. Michael Spivak defends this approach in the preface to his book Calculus on Manifolds.

… the proof of [Stokes’] theorem is, in the mathematician’s sense, an utter triviality — a straight-forward calculation. On the other hand, even the statement of this triviality cannot be understood without a horde of definitions …

There are good reasons why the theorems should all be easy and the definitions hard. As the evolution of Stokes’ theorem revealed, a single simple principle can masquerade as several difficult results; the proofs of many theorems involve merely stripping away the disguise. The definitions, on the other hand, serve a twofold purpose: they are rigorous replacements for vague notions, and machinery for elegant proofs. [emphasis added]
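The generalized Stokes theorem Spivak has in mind does fit on a single line; all the difficulty hides in the definitions behind the symbols:

```latex
\int_M d\omega = \int_{\partial M} \omega
```

Here M is an oriented n-dimensional manifold with boundary ∂M, ω is an (n−1)-form, and d is the exterior derivative. Making each of those phrases precise is the “horde of definitions” the quote refers to; once they are in place, the proof is a short calculation.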

Mathematicians like to push complexity into definitions, just as software developers like to push complexity into languages and frameworks. Both strategies can make life easier on professionals while making it harder on beginners.

**Related post**: A little simplicity goes a long way

- Regular expressions in PowerShell and Perl
- Regular expressions in Python
- Regular expressions in R
- Regular expressions in Mathematica
- C++ TR1 regular expressions

See also blog posts tagged regular expressions and the RegexTip Twitter account.
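A detail that comes up in several of the posts above is that `.` ordinarily matches any character *except* a newline, and each language has a switch to change that. A small Python illustration of this default and its override:

```python
import re

s = "a\nb"

# By default, . does not match a newline, so the pattern fails.
assert re.fullmatch(r"a.b", s) is None

# With re.DOTALL, . matches any character including newline.
assert re.fullmatch(r"a.b", s, re.DOTALL) is not None
```

Perl spells the same switch `/s`, and other languages have analogous flags; the underlying behavior of `.` is the same across them.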

**Last week**: Probability resources

**Next week**: Numerical computing resources