Here's the trick: every boss has a weakness. After you beat Fire-man with your regular gun, you earn a fire weapon. This makes your upcoming fight with Ice-man easier, which helps defeat the next boss, and so on.

So why's this special?

*In Mega Man, you look forward to encountering more bosses.*

Every level is a chance to permanently upgrade your abilities, not a grind you're trying to survive.

Think about Tetris: would you look forward to a variety of new shapes appearing?

Heck no. Tetris can be fun in a "survive hordes of incoming zombies" sort of way, but in terms of learning, it's a frustrating, Sisyphean task. Every new piece is something to move beyond, not a learning opportunity. It's a test to find your breaking point.

In Mega Man, the game gets easier the more bosses you've defeated. It's specifically designed for you to improve over time. Guess which game has 10+ sequels?

Math learning can follow the Mega Man pattern. If we want to beat "Dr. Euler", we need to beat his henchmen and master their weapons:

- Rad-man, once defeated, lets you think with radians, not just degrees
- Power-man lets you understand the base and power of an exponent
- I-man lets you unlock the rotation of imaginary numbers
- Pi-man lets you think in cyclic patterns
- Multi-man lets you understand how quantities transform each other
- E-man helps you visualize continuous change
- And finally, Dr. Euler lets you understand the role of imaginary exponents: e^{ix} = cos(x) + i·sin(x)

After defeating the henchmen — truly understanding them and mimicking their abilities — Dr. Euler becomes defeatable. And after that, Boss Fourier. Then Captain Convolution.

There are many more challenges on the horizon… and that's great! Every formula, once mastered, is a power to use.

When learning, I ask: "Did I internalize the concept so much I look *forward* to seeing it?" Learned ideas become allies, a decoder key to help unlock future equations.

I constantly seek analogies for learning because my understanding has improved most from perspective shifts, not from studying specific concepts.

Despite years of math classes, I lacked intuitions on e, i, pi, radians, logs, exponents… (the list goes on). Math class involved grinding through Tetris levels, moving *past* the concepts and not absorbing them fully. Imaginary numbers were *not* friends I looked forward to seeing in an equation.

An Aha! moment helped me see learning as a set of additive skills we could internalize, and the process and challenges became something to look forward to. I hope the same shift happens for you. Wouldn't it be great to look forward to adding new abilities to your arsenal?

Happy math.

There are two goals when studying a result:

- **Verify that** a statement is true
- **Understand why** a statement is true

There's a tendency to put the goals in opposition, assuming concepts are either "easily understood but wrong" or "difficult to understand yet correct".

It's like a restaurant that believes in having taste *or* nutrition, but not both. Why choose?

Our goal is a deep intuition for correct things. And it's ok to start with an understood "sorta-true" concept and refine it to an understood "very-true" version:

Correctness isn't binary. Our standards for a valid proof have evolved over the years, and in a century our notion of what's true may seem embarrassingly primitive. That's ok: let's get a decent understanding and work to make it better.

We have to balance two roles: the safety inspector who makes sure the food is safe, and the customer who wants to enjoy the meal. In my head I think about "inspection mode" and "tasting mode": the secret is inspecting things that already taste good.

The Pythagorean theorem is usually introduced as a statement about *triangles*. A common proof is a visual rearrangement, like this:

This is nutritious and correct, but not tasty to me. It seems like a special case, an optical illusion: with *just* the right shape, things can be re-arranged.

A tastier proof is that the Pythagorean Theorem is really about the nature of 2d area. A big shape, when split, yields two smaller shapes. The total area must be the same:

The split-apart area can come from a triangle, circle, or cardboard cutout of Thomas Jefferson. It doesn't matter: two pieces, when cut from a larger one, must have the same total area.

Aha!

This intuition can then be refined into a more formal statement.

Here's a trickier example: Euler's Formula.

It's a baffling statement, and here's the common justification:

It's crisp and concise, but unsatisfying to even other math fans:

I agree: it's a bunch of symbols that happen to line up. Here's a tastier version:

- e^x represents continuous growth (interest earning interest, which earns interest…)
- sin(x) and cos(x) represent vertical and horizontal directions
- i represents rotation

If we create "continuous rotation" (e^{ix}) then we move in a circle, which can be separated into horizontal and vertical components (cos(x) and i sin(x)).

Again, this intuition can be sharpened further:

- e^x can be seen as an infinite series, starting with an initial value (1), the interest it earns (x), the interest that earns (x^2/2!), and so on.
- Sine (and cosine) are infinite series based on an initial impulse, which creates a restoring force, which creates a restoring force, and so on. This is like interest that earns interest in the opposite direction (and why sine oscillates without going to infinity: its motion opposes itself).
- Plugging i into e^x means we earn "imaginary interest" (i), which earns "imaginary imaginary interest" (-1), which earns "imaginary imaginary imaginary interest" (-i), and so on. Some of the interest opposes the previous terms, and we can collect them into patterns matching sine and cosine.
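As a numeric sketch (variable names are my own), we can sum the series terms (ix)^n / n! and watch the real parts collect into cosine and the imaginary parts into sine:

```javascript
// Partial sums of e^(ix): terms (ix)^n / n!, where powers of i cycle 1, i, -1, -i.
// Real-part terms collect into cos(x); imaginary-part terms collect into sin(x).
const x = 1;
const cycle = [[1, 0], [0, 1], [-1, 0], [0, -1]]; // i^n as (real, imaginary)
let real = 0, imag = 0, term = 1; // term holds x^n / n!
for (let n = 0; n < 20; n++) {
  const [re, im] = cycle[n % 4];
  real += re * term;
  imag += im * term;
  term *= x / (n + 1);
}
// real approaches cos(1), imag approaches sin(1)
```

Twenty terms is plenty: the factorial in the denominator makes later terms vanishingly small.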

Rather than staring at a dry proof and trying to understand it directly, get a rough intuition (ADEPT method) and then see if the proof makes sense. It's a bit of math inception, where we try to *understand* the verification step, not simply *verify* the verification step.

Happy math.

William Thurston (Fields Medal Winner) wrote a great essay, *On Proof and Progress in Mathematics*. It's full of ideas I found interesting:

- The question for mathematicians is: "How do mathematicians advance human understanding of mathematics?"

For instance, when Appel and Haken completed a proof of the 4-color map theorem using a massive automatic computation, it evoked much controversy. I interpret the controversy as having little to do with doubt people had as to the veracity of the theorem or the correctness of the proof. Rather, it reflected a continuing desire for human understanding of a proof, in addition to knowledge that the theorem is true...They discover by this kind of experience that what they really want is usually not some collection of “answers”—what they want is understanding.

- We're never done explaining a concept:

We may think we know all there is to say about a certain subject, but new insights are around the corner. Furthermore, one person’s clear mental image is another person’s intimidation.

- On the role of intuition:

Personally, I put a lot of effort into “listening” to my intuitions and associations, and building them into metaphors and connections. This involves a kind of simultaneous quieting and focusing of my mind. Words, logic, and detailed pictures rattling around can inhibit intuitions and associations.

- The "emperor's clothes" problem in math happens even for professionals:

Nonetheless, most of the audience at an average colloquium talk gets little of value from it. Perhaps they are lost within the first 5 minutes, yet sit silently through the remaining 55 minutes. Or perhaps they quickly lose interest because the speaker plunges into technical details without presenting any reason to investigate them. At the end of the talk, the few mathematicians who are close to the field of the speaker ask a question or two to avoid embarrassment.

A further issue is that people sometimes need or want an accepted and validated result not in order to learn it, but so that they can quote it and rely on it.

- On the difference between everyday explanations and technical ones:

Why is there such a big expansion from the informal discussion to the talk to the paper? One-on-one, people use wide channels of communication that go far beyond formal mathematical language. They use gestures, they draw pictures and diagrams, they make sound effects and use body language...In papers, people are still more formal. Writers translate their ideas into symbols and logic, and readers try to translate back.

It’s like a new toaster that comes with a 16-page manual. If you already understand toasters and if the toaster looks like previous toasters you’ve encountered, you might just plug it in and see if it works, rather than first reading all the details in the manual.

- On what motivates us to do math:

What motivates people to do mathematics? There is a real joy in doing mathematics, in learning ways of thinking that explain and organize and simplify. One can feel this joy discovering new mathematics, rediscovering old mathematics, learning a way of thinking from a person or text, or finding a new way to explain or to view an old mathematical structure.

I love the "aha!" moments when a concept clicks. People willingly seek out mysteries and puzzles (movies where we don't know the ending, games like Tetris). Math is an experience with similar emotional payoffs when approached correctly.

You'll spin yourself dizzy trying to reconcile the ideas. So just rename it:

**How about "forwards" and "backwards" numbers?**

The invention of the number line *vastly* improved our understanding. What is "backwards from backwards"? It's forwards again! (I.e., why negative x negative = positive — more visual arithmetic.)

What is "halfway backwards"? Sideways!

The wrong words limit our thoughts.

That this subject [imaginary numbers] has hitherto been surrounded by mysterious obscurity, is to be attributed largely to an ill adapted notation. If, for example, +1, -1, and the square root of -1 had been called direct, inverse and lateral units, instead of positive, negative and imaginary (or even impossible), such an obscurity would have been out of the question. - Carl Gauss

When we want a "count" of something, we aren't being specific enough. We're using too general a name.

There are two types of counts:

- The points that determine the boundaries
- The spans between the boundaries

Assuming there's a single, universal "count" sets us up for off-by-one errors. Was that a "point count" or a "span count"?

"There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors." - Phil Karlton

Even the name "off by one" isn't that helpful. That's the symptom, but what's the root cause? The "Fencepost error" helps identify why the issue happened.
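A tiny sketch of the two counts (the numbers are my own): a 100-meter fence with a post every 10 meters has one more boundary point than it has spans.

```javascript
// Fencepost example: spans are the sections between boundaries,
// points are the boundaries themselves. The two counts differ by one.
const length = 100;
const spacing = 10;
const spanCount = length / spacing;  // 10 sections of fence
const pointCount = spanCount + 1;    // 11 posts
```

Naming which count you want, up front, is what prevents the off-by-one.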

How should we describe an angle? There are two ways to see it:

- Degrees, the *swivel* an observer went through to follow an object
- Radians, the *distance* the object moved on its path

When physics formulas (sine, cosine, etc.) ask for "angle in radians" they mean "distance the object moved". Aha!

(The laws of motion don't particularly care how much you, the observer, had to tilt your head. Sorry.)
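A quick sketch of the two viewpoints (the numbers are my own): radians are the distance traveled along the arc, per unit of radius.

```javascript
// An object travels halfway around a circle of radius 2.
const radius = 2;
const arcLength = Math.PI * radius;                     // half the circumference
const angleInRadians = arcLength / radius;              // distance moved per unit radius: PI
const angleInDegrees = angleInRadians * 180 / Math.PI;  // the observer's swivel: 180
```

Same motion, two descriptions: the formulas care about the distance, not the swivel.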

Integrals are usually described as the inverse of differentiation, finding the area under the curve, and so on.

How about this renaming: integrals are fancy multiplication.

That's it. You try to multiply two quantities, but you can't — one of the critters is scurrying around — so you use an integral.

"If I drive an unwavering 30mph for 3 hours, would just multiply it out. But since my speed changes, I'll integrate."

Think "fancy multiplication" not "inverse of differentiation". (Unless you're actually solving differential equations. In that case, have fun.)

Most formulas are named after their inventor. It's good to give credit, but it's not helpful for the student. For the Pythagorean Theorem, there are a few alternate names we can try:

- Triangle Theorem: Given two sides of a right triangle, we can know the third.
- Distance Theorem: Get the distance between points in any number of dimensions. (By imagining they're on a sequence of triangles.)
- Tradeoff Theorem: Find the tradeoff as you move in any direction (how much "x distance" you give up to gain "y distance"). Using this, we can follow the gradient for the optimal direction.

Math gets so much easier with the right phrase in your head. Just use the "distance theorem" for distance, the "tradeoff theorem" to find the best direction to move.

Even if you forget the formula, you know how it's applied. Otherwise, you have the phrase "Pythagorean Theorem" without the understanding of why you'd need it.
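The "Distance Theorem" reading can be sketched directly (the function name is mine): sum the squared differences in each dimension, then take the square root.

```javascript
// Distance between two points in any number of dimensions,
// by imagining them on a chain of right triangles.
function distance(p, q) {
  const sumOfSquares = p.reduce((sum, value, i) => sum + (value - q[i]) ** 2, 0);
  return Math.sqrt(sumOfSquares);
}

distance([0, 0], [3, 4]);       // 5 (the classic 3-4-5 right triangle)
distance([1, 2, 3], [4, 6, 3]); // 5 (the same idea in three dimensions)
```

With the right name in your head, you reach for this whenever "distance" comes up, no triangle in sight.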

A super-common question is "Why is the number e important"? It's all in the name.

e is the "universal growth constant" like c is the "speed of light constant". c is perfect speed (can't improve it further), e^x is perfect growth (can't compound it further).

e^x is what we see when compounding 100% with no delay. Why 100%? Symmetry, baby: our growth rate matches our current amount. (Similarly, we do trig on the unit circle, use 1.0 as our base increment for counting, and so on.)

Once we have perfection, we can modify it for our scenario. Maybe we're growing for more time periods, or at a different rate — that's fine, just modify e^x.

Naming e "Euler's constant" isn't descriptive. The "universal growth constant" is more helpful.

I'm on a linear algebra kick. Instead of "matrix multiplication" (a drab description of what we're doing), how about "running data through operations" (or "running a spreadsheet")?

One matrix represents the operations, one matrix represents the data, and we are running the data through a pipeline.

Yes, in *some* cases you can think about "linearly transforming a vector space", but often we're just transforming data.

And if you have an error? Instead of "You're multiplying it wrong" (gee, thanks) how about "The operations expect different-sized data." Ah! I know what to fix.
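Here's a minimal sketch of the "running data through operations" view (the names and error message are my own): each row of `ops` is an operation, each column of `data` is a data point, and mismatched sizes produce the friendlier error.

```javascript
// Run each data point (a column of `data`) through each operation (a row of `ops`).
function runPipeline(ops, data) {
  if (ops[0].length !== data.length) {
    throw new Error("The operations expect different-sized data.");
  }
  return ops.map(op =>
    data[0].map((_, col) =>
      op.reduce((sum, coeff, row) => sum + coeff * data[row][col], 0)
    )
  );
}

// One operation [3 4 5] applied to one data point (3; 4; 5):
runPipeline([[3, 4, 5]], [[3], [4], [5]]); // [[50]], the dot product
```

Same arithmetic as matrix multiplication, but the mental model is a pipeline, not a prism.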

They never tell you how much easier math gets when you create your own names for things. I use whatever analogies, metaphors, or plain-English descriptions help. They may feel silly, but it's a much better feeling than confusion.

There's an ancient concept that knowing the name of a thing gives you power over it. It seems to be true for math.

A famous mathematician had a great quote on naming:

"Mathematics is the art of giving the same name to different things." -Henri Poincare

Math often finds what things have in common (two birds, two fish => "twoness"). But to internalize a concept, we need several ways to describe the same thing. Use as many names as it takes: each is an antibiotic that can treat our confusion, and maybe one will stick.

I have an intuition cheatsheet with rewordings of many concepts.

Hope you enjoy it!

**1) Matrix multiplication scales/rotates/skews a geometric plane.**

This is useful when first learning about vectors: vectors go in, new ones come out. Unfortunately, this can lead to an over-reliance on geometric visualization.

If 20 families are coming to your BBQ, how do you estimate the hotdogs you need? (*Hrm… 20 families, call it 3 people per family, 2 hotdogs each… about 20 * 3 * 2 = 120 hotdogs.*)

You probably don't think "Oh, I need the volume of an invitation-familysize-hunger prism!". With large matrices I don't think about 500-dimensional vectors, just data to be modified.

**2) Matrix multiplication composes linear operations.**

This is the technically accurate definition: yes, matrix multiplication results in a new matrix that composes the original functions. However, sometimes the matrix being operated on is not a linear operation, but a set of vectors or data points. We need another intuition for what's happening.

I'll put a programmer's viewpoint into the ring:

**3) Matrix multiplication is about information flow, converting data to code and back.**

I think of linear algebra as "math spreadsheets" (if you're new to linear algebra, read this intro):

- We store information in various spreadsheets ("matrices")
- Some of the data are seen as functions to apply, others as data points to use
- We can swap between the vector and function interpretation as needed

Sometimes I'll think of data as geometric vectors, and sometimes I'll see a matrix as composing functions. But mostly I think about information flowing through a system. (Some purists cringe at reducing beautiful algebraic structures into frumpy spreadsheets; I sleep OK at night.)

Take your favorite recipe. If you interpret the words as *instructions*, you'll end up with a pie, muffin, cake, etc.

If you interpret the words as *data*, the text is prose that can be tweaked:

- Convert measurements to metric units
- Swap ingredients due to allergies
- Adjust for altitude or different equipment

The result is a new recipe, which can be further tweaked, or executed as instructions to make a different pie, muffin, cake, etc. (Compilers treat a program as text, modify it, and eventually output "instructions" — which could be text for another layer.)

That's Linear Algebra. We take raw information like "3 4 5" and treat it as a vector or function, depending on how it's written:

By convention, a vertical column is usually a vector, and a horizontal row is typically a function:

- `[3; 4; 5]` means `x = (3, 4, 5)`. Here, `x` is a vector of data (I'm using `;` to separate each row).
- `[3 4 5]` means `f(a, b, c) = 3a + 4b + 5c`. This is a function taking three inputs and returning a single result.

And the aha! moment: data is code, code is data!

The row containing a horizontal function could really be three data points (each with a single element). The vertical column of data could really be three distinct functions, each taking a single parameter.

Ah. This is getting neat: depending on the desired outcome, we can combine data and code in a different order.

The matrix transpose swaps rows and columns. Here's what it means in practice.

If `x` was a column vector with 3 entries (`[3; 4; 5]`), then `x'` is:

- A function taking 3 arguments (`[3 4 5]`)
- Or, `x'` can still remain a data vector, but as three separate entries. The transpose "split it up".

Similarly, if `f = [3 4 5]` is our row vector, then `f'` can mean:

- A single data vector, in a vertical column.
- Or, `f'` is separated into three functions (each taking a single input).

Let's use this in practice.

When we see `x' * x` we mean: `x'` (as a single function) is working on `x` (a single vector). The result is the **dot product**. In other words, we've applied the data to itself.

When we see `x * x'` we mean `x` (as a set of functions) is working on `x'` (a set of individual data points). The result is a grid where we've applied each function to each data point. Here, we've mixed the data with itself in every possible permutation.

I think of `xx` as `x(x)`. It's the "function x" working on the "vector x". (This helps compute the covariance matrix, a measure of self-similarity in the data.)
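The "permutation grid" from `x * x'` can be sketched directly (variable names are mine): three single-input functions (3a, 4a, 5a), each applied to each of the three data points (3, 4, 5).

```javascript
const x = [3, 4, 5];
// Each function coefficient meets each data point: a 3x3 grid.
const grid = x.map(coeff => x.map(point => coeff * point));
// [[ 9, 12, 15],
//  [12, 16, 20],
//  [15, 20, 25]]
```

Compare with the dot product, which collapses the same pairings into a single number (9 + 16 + 25 = 50).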

Phew! How does this help us? When we see an equation like this (from the Machine Learning class):

I now have an instant feel of what's happening. In the first equation, we're treating theta (which is normally a set of data parameters) as a function, and passing in x as an argument. This should give us a single value.

More complex derivations like this:

can be worked through. In some cases it gets tricky because we store the data as rows (not columns) in the matrix, but now I have much better tools to follow along. You can start estimating when you'll get a single value, or when you'll get a "permutation grid" as a result.

Geometric scaling and linear composition have their place, but here I want to think about information. "The information in x is becoming a function, and we're passing it to itself as a parameter."

Long story short, don't get locked into a single intuition. Multiplication evolved from repeated addition, to scaling (decimals), to rotations (imaginary numbers), to "applying" one number to another (integrals), and so on. Why not the same for matrix multiplication?

Happy math.

You may be curious why we can't use the other combinations, like `x x` or `x' x'`. Simply put, the parameters don't line up: we'd have functions expecting 3 inputs only being passed a single parameter, or functions expecting single inputs getting passed 3.

The dot product `x' * x` could be seen as the following javascript command:

`(function(a,b,c){ return 3*a + 4*b + 5*c; })(3,4,5)`

We define a function of 3 arguments and pass it the 3 parameters. This returns 50 (the dot product).

The math notation is super-compact, so we can simply write (in Octave/Matlab):

```
>> [3 4 5] * [3 4 5]'
ans = 50
```

(Remember that `[3 4 5]` is the function and `[3; 4; 5]` or `[3 4 5]'` is how we'd write the data vector.)

This article came about from a TODO in my class notes:

I wanted to explain to myself — in plain English — why we wanted `x' x` and not the reverse. Now, in plain English: We're treating the information as a function, and passing the same info as the parameter.