Pat Shaughnessy

Compiling Ruby To Machine Language

2025-11-17T00:00:00Z

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished.

Here’s an excerpt from the completely new content for Chapter 4, about YJIT and ZJIT. I’m still finishing this up… so this content is fresh off the page! It’s been a lot of fun for me to learn about how JIT compilers work and to brush up on my Rust skills as well. And it’s very exciting to see all the impressive work the Ruby team at Shopify and other contributors have done to improve Ruby’s runtime performance.

Chapter 4: Compiling Ruby To Machine Language

Interpreting vs. Compiling Ruby Code	4
Yet Another JIT (YJIT)	6
Virtual Machines and Actual Machines	6
Counting Method and Block Calls	8
YJIT Blocks	8
YJIT Branch Stubs	10
Executing YJIT Blocks and Branches	11
Deferred Compilation	12
Regenerating a YJIT Branch	12
YJIT Guards	14
Adding Two Integers Using Machine Language	15
Experiment 4-1: Which Code Does YJIT Optimize?	18
How YJIT Recompiles Code	22
Finding a Block Version	22
Saving Multiple Block Versions	24
ZJIT, Ruby’s Next Generation JIT	26
Counting Method and Block Calls	27
ZJIT Blocks	29
Method Based JIT	31
Rust Inside of Ruby	33
Experiment 4-2: Reading ZJIT HIR and LIR	35
Summary	37

Counting Method and Block Calls

To find hot spots, YJIT counts how many times your program calls each function or block. When this count reaches a certain threshold, YJIT stops your program and converts that section of code into machine language. Later Ruby will execute the machine language version instead of the original YARV instructions.

To keep track of these counts, YJIT saves an internal counter nearby the YARV instruction sequence for each function or block.

Figure 4-5: YJIT saves information adjacent to each set of YARV instructions

Figure 4-5 shows the YARV instruction sequence the main Ruby compiler created for the sum += i block at (3) in Listing 4-1. At the top, above the YARV instructions, Figure 4-5 shows two YJIT related values: jit_entry and jit_entry_calls. As we’ll see in a moment, jit_entry starts as a null value but will later hold a pointer to the machine language instructions YJIT produces for this Ruby block. Below jit_entry, Figure 4-5 also shows jit_entry_calls, YJIT’s internal counter.

Each time the program in Listing 4-1 calls this block, YJIT increments the value of jit_entry_calls. Since the range at (1) in Listing 4-1 spans from 1 through 40, this counter will start at zero and increase by 1 each time Range#each calls the block at (3).

When the jit_entry_calls reaches a particular threshold, YJIT will compile the YARV instructions into machine language. By default for small Ruby programs YJIT in Ruby 3.5 uses a threshold of 30. Larger programs, like Ruby on Rails web applications, will use a larger threshold value of 120. (You can also change the threshold by passing —yjit-call-threshold when you run your Ruby program.)

YJIT Blocks

While compiling your Ruby program, YJIT saves the machine language instructions it creates into YJIT blocks. YJIT blocks, which are distinct from Ruby blocks, each contain a sequence of machine language instructions for a range of corresponding YARV instructions. By grouping YARV instructions and compiling each group into a YJIT block, YJIT can produce more optimized code that is tailored to your program’s behavior and avoid compiling code that your program doesn’t need.

As we’ll see next, a single YJIT block doesn’t correspond to a Ruby function or block. YJIT blocks instead represent smaller sections of code: individual YARV instructions or a small range of YARV instructions. Each Ruby function or block typically consists of several YJIT blocks.

Let’s see how this works for our example. After the program in Listing 4-1 executes the Ruby block at (3) 29 times, YJIT will increment the jit_entry_calls counter again, just before Ruby runs the block for the 30th time. Since jit_entry_calls reaches the threshold value of 30, YJIT triggers the compilation process.

YJIT compiles the first YARV instruction getlocal_WC_1 and saves machine language instructions that perform the same work as getlocal_WC_1 into a new YJIT block:

Figure 4-6: Creating a YJIT block

On the left side, Figure 4-6 shows the YARV instructions for the sum += i Ruby block. On the right, Figure 4-6 shows the new YJIT block corresponding to getlocal_WC_1.

Next, the YJIT compiler continues and compiles the second YARV instruction from the left side of Figure 4-7: getlocal_WC_0 at index 2.

Figure 4-7: Appending to a YJIT block

On the left side, Figure 4-7 shows the same YARV instructions for the sum += i Ruby block that we saw above in Figure 4-6. But now the two dotted arrows indicate that the YJIT block on the right contains the machine language instructions equivalent to both getlocal_WC_1 and getlocal_WC_0.

Let’s take a look inside this new block. YJIT compiles or translates the Ruby YARV instructions into machine language instructions. In this example, running on my Mac laptop, YJIT writes the following machine language instructions into this new block:

Figure 4-8: The contents of one YJIT block

Figure 4-8 shows a closer view of the new YJIT block that appeared on the right side of Figures 4-6 and 4-7. Inside the block, Figure 4-8 shows the assembly language acronyms corresponding to the ARM64 machine language instructions that YJIT generated for the two YARV instructions shown on the left. The YARV instructions on the left are: getlocal_WC_1, which loads a value from a local variable located in the previous stack frame and saves it on the YARV stack, and getlocal_WC_0, which loads a local variable from the current stack from and also saves it on the YARV stack. The machine language instructions on the right side of Figure 4-8 perform the same task, loading these values into registers on my M1 microprocessor: x1 and x9. If you’re curious and would like to learn more about what the machine language instructions mean and how they work, the section “Adding Two Integers Using Machine Language” discusses the instructions for this example in more detail.

YJIT Branch Stubs

Next, YJIT continues down the sequence of YARV instructions and compiles the opt_plus YARV instruction at index 4 in Figures 4-6 and 4-7. But this time, YJIT runs into a problem: It doesn’t know the type of the addition arguments. That is, will opt_plus add two integers? Or two strings, floating point numbers, or some other types?

Machine language is very specific. To add two 64-bit integers on an M1 microprocessor, YJIT could use the adds assembly language instruction. But adding two floating point numbers would require different instructions. And, of course, adding or concatenating two strings is an entirely different operation altogether.

In order for YJIT to know which machine language instructions to save into the YJIT block for opt_plus, YJIT needs to know exactly what type of values the Ruby program might ever add at (3) in Listing 4-1. You and I can tell by reading Listing 4-1 that the Ruby code is adding integers. We know right away that the sum += 1 block at (3) is always adding one integer to another. But YJIT doesn’t know this.

YJIT uses a clever trick to solve this problem. Instead of analyzing the entire program ahead of time to determine all of the possible types of values the opt_plus YARV instruction might ever need to add, YJIT simply waits until the block runs and observes which types the program actually passes in.

YJIT uses branch stubs to achieve this wait-and-see compile behavior, as shown in Figure 4-9.

Figure 4-9: A YJIT block, branch and stub

Figure 4-9 shows the YARV instructions on the left, and the YJIT block for indexes 0000-0002 on the right. But note the bottom right corner of Figure 4-7, which shows an arrow pointing down from the block to a box labeled stub. This arrow represents a YJIT branch. Since this new branch doesn’t point to a block yet, YJIT sets up the branch to point to a branch stub instead.

YARV’s Internal Stack and Your Ruby Stack

2025-11-10T00:00:00Z

The content of Chapter 3, about the YARV virtual machine, hasn't changed much since 2014. However, I did update all of the diagrams to account for some new values YARV now saves inside of each stack frame. And some of the common YARV instructions were renamed as well. I also moved some content that was previously part of Chapter 4 here into Chapter 3. Right now I'm rewriting Chapter 4 from scratch, describing Ruby's new JIT compilers.

Chapter 3: How Ruby Executes Your Code

YARV’s Internal Stack and Your Ruby Stack	4
Stepping Through How Ruby Executes a Simple Script	5
Executing a Call to a Block	8
Taking a Close Look at a YARV Instruction	10
Local and Dynamic Access of Ruby Variables	12
Local Variable Access	12
Method Arguments Are Treated Like Local Variables	15
Dynamic Variable Access	16
Climbing The Environment Pointer Ladder In C	22
Experiment 3-1: Exploring Special Variables	23
Controlling the Flow of Execution	27
How Ruby Executes an if Statement	27
Jumping from One Scope to Another	29
Catch Tables	30
Other Uses for Catch Tables	33
Experiment 3-2: Testing How Ruby Implements For Loops Internally	34
Summary	35

YARV’s Internal Stack and Your Ruby Stack

As we’ll see in a moment, YARV uses a stack internally to track intermediate values, arguments, and return values. YARV is a stack-oriented virtual machine.

In addition to its own internal stack, YARV keeps track of your Ruby program’s call stack, recording which methods call which other methods, functions, blocks, lambdas, and so on. In fact, YARV is not just a stack machine—it’s a double-stack machine! It has to track the arguments and return values not only for its own internal instructions but also for your Ruby program.

Figure 3-1 shows YARV’s basic registers and internal stack.

Figure 3-1: Some of YARV’s internal registers, including the program counter and stack pointer

YARV’s internal stack is on the left. The SP label is the stack pointer, or the location of the top of the stack. On the right are the instructions that YARV is executing. PC is the program counter, or the location of the current instruction.

You can see the YARV instructions that Ruby compiled from the puts 2+2 example on the right side of Figure 3-1. YARV stores both the SP and PC registers in a C structure called rb_control_frame_t, along with the current value of Ruby’s self variable and some other values not shown here.

At the same time, YARV maintains another stack of these rb_control_frame_t structures, as shown in Figure 3-2.

Figure 3-2: YARV keeps track of your Ruby call stack using a series of rb_control_frame_t structures.

This second stack of rb_control_frame_t structures represents the path that YARV has taken through your Ruby program, and YARV’s current location. In other words, this is your Ruby call stack—what you would see if you ran puts caller.

The CFP pointer indicates the current frame pointer. Each stack frame in your Ruby program stack contains, in turn, a different value for the self, PC, and SP registers, as shown in Figure 3-1. Ruby also keeps track of type of code running at each level in your Ruby call stack, indicated by the “[BLOCK]”, “[METHOD]” notation in Figure 3-2.

Stepping Through How Ruby Executes a Simple Script

In order to help you understand this a bit better, here are a couple of examples. I’ll begin with the simple 2+2 example from Chapters 1 and 2, shown again in Listing 3-1.

puts 2+2

Listing 3-1: A one-line Ruby program that we’ll execute as an example

This one-line Ruby script doesn’t have a Ruby call stack, so I’ll focus on the internal YARV stack for now. Figure 3-3 shows how YARV will execute this script, beginning with the first instruction, putself.

Figure 3-3: On the left is YARV’s internal stack, and on the right is the compiled version of my puts 2+2 program.

As you can see in Figure 3-3, YARV starts the program counter (PC) at the first instruction, and initially the stack is empty. Now YARV executes the putself instruction, and pushes the current value of self onto the stack, as shown in Figure 3-4.

Figure 3-4: putself pushes the top self value onto the stack

Because this simple script contains no Ruby objects or classes, the self pointer is set to the default top self object. This is an instance of the Object class that Ruby automatically creates when YARV starts. It serves as the receiver for method calls and the container for instance variables in the top-level scope. The top self object contains a single, predefined to_s method, which returns the string “main.” You can call this method by running the following command in the console:

$ ruby -e 'puts self'
=> main

YARV will use this self value on the stack when it executes the opt_send_without_block instruction: self is the receiver of the puts method because I didn’t specify a receiver for this method call.

Next, YARV executes putobject 2. It pushes the numeric value 2 onto the stack and increments the PC again, as shown in Figure 3-5.

Figure 3-5: Ruby pushes the value 2 onto the stack, the receiver of the + method.

This is the first step of the receiver (arguments) operation pattern described in “How Ruby Compiles a Simple Script” on page 34. First, Ruby pushes the receiver onto the internal YARV stack. In this example, the Fixnum object 2 is the receiver of the message/method +, which takes a single argument, also a 2. Next, Ruby pushes the argument 2, as shown in Figure 3-6.

Figure 3-6: Ruby pushes another value 2 onto the stack, the argument of the + method.

Finally, Ruby executes the + operation. In this case, opt_plus is an optimized instruction that will add two values: the receiver and the argument, as shown in Figure 3-7.

Figure 3-7: Figure 3-7: The opt_plus instruction calculates 2 + 2 = 4.

As you can see in Figure 3-7, the opt_plus instruction leaves the result, 4, at the top of the stack. Now Ruby is perfectly positioned to execute the puts function call: The receiver self is first on the stack, and the single argument, 4, is at the top of the stack. (I’ll describe how method lookup works in Chapter 6.)

Next, Figure 3-8 shows what happens when Ruby executes the puts method call. As you can see, the opt_send_without_block instruction leaves the return value, nil, at the top of the stack. Finally, Ruby executes the last instruction, leave, which finishes the execution of our simple, one-line Ruby program. Of course, when Ruby executes the puts call, the C code implementing the puts function will actually display the value 4 in the console output.

Figure 3-8: Ruby calls the puts method on the top self object.

Compiling a Call to a Block

2025-11-03T00:00:00Z

This week's excerpt is from Chapter 2, about Ruby's compiler. Whenever I think about it, I'm always suprised that Ruby has a compiler like C, Java or any other programming language. The only difference is that we don't normally interact with Ruby's compiler directly.

The developers who contributed Ruby's new parser, Prism, also had to rewrite the Ruby compiler because Prism now produces a completely different, redesigned abstract syntax tree (AST). Chapter 2's outline is more or less the same as it was in 2014, but I redrew all of the diagrams and updated much of the text to match the new AST nodes and other changes for Prism.

Chapter 2: Compilation

The Ruby Compiler	4
Ruby 3.2 Introduces a Just-In-Time (JIT) Compiler	5
How Ruby Compiles a Simple Script	6
Scope AST Nodes	7
Compiling a Simple AST	8
Compiling a Call to a Block	12
How Ruby Iterates Through the AST	16
Experiment 2-1: Displaying YARV Instructions	19
The Local Table	21
Compiling Optional Arguments	23
Compiling Keyword Arguments	24
Unnamed Local Variables	25
Experiment 2-2: Displaying the Local Table	28
Summary	30

Compiling a Call to a Block

Next, let’s compile my 10.times do example from Listing 1-1 in Chapter 1 (see Listing 2-2).

10.times do |n|
  puts n
end

Listing 2-2: A simple script that calls a block (repeated from Listing 1-1)

Notice that this example contains a block parameter to the times method. This is interesting because it will give us a chance to see how the Ruby compiler handles blocks. Figure 2-13 shows the AST for the 10.times do example again.

Figure 2-13: A simplified view of the AST for the call to 10.times, passing a block

The left side of Figure 2-13 shows the AST for the 10.times function call: the call node and the receiver 10, represented by integer node. On the right, Figure 2-13 shows the beginning of the AST for the block: do |n| puts n end, represented by the block node. You can see Ruby has added a scope node on both sides, since there are two lexical scopes in Listing 2-2: the top level and the block. Let’s break down how Ruby compiles the main portion of the script shown on the left of Figure 2-13. As before, Ruby starts with the first PM_NODE_SCOPE and creates a new snippet of YARV instructions, as shown in Figure 2-14.

Figure 2-14: Each PM_SCOPE_NODE is compiled into a new snippet of YARV instructions.

Next, Ruby steps down the AST nodes to PM_CALL_NODE, as shown in Figure 2-15.

Figure 2-15: Ruby stepping through an AST

At this point, there is still no code generated, but notice in Figure 2-13 that two arrows lead from PM_CALL_NODE: one to PM_INTEGER_NODE, which represents the 10 in the 10.times call, and another to the inner block. Ruby will first continue down the AST to the integer node and compile the 10.times method call. The resulting YARV code, following the same receiver-arguments-message pattern we saw in Figures 2-7 through 2-11, is shown in Figure 2-16.

Figure 2-16: Ruby compiles the 10.times method call.

Notice that the new YARV instructions shown in Figure 2-16 push the receiver (the integer object 10) onto the stack first, after which Ruby generates an instruction to execute the times method call. But notice, too, the block in <main> argument in the send instruction. This indicates that the method call also contains a block argument: do |n| puts n end. In this example, the arrow from PM_CALL_NODE to the second PM_SCOPE_NODE has caused the Ruby compiler to include this block argument. Ruby continues by compiling the inner block, beginning with the second PM_CALL_NODE shown at right in Figure 2-13. Figure 2-17 shows what the AST for that inner block looks like.

Figure 2-17: The branch of the AST for the contents of the block

Notice Ruby inserted a scope node at the top of this branch of the AST also. Figure 2-17 shows the scope node contains two values: argc=1 and locals: [n]. These values were empty in the parent scope node, but Ruby set them here to indicate the presence of the block parameter n. From a relatively high level, Figure 2-18 shows how Ruby compiles the inner block.

Figure 2-18: How Ruby compiles a call to a block

You can see the parent PM_NODE_SCOPE at the top, along with the YARV code from Figure 2-16. And below that Figure 2-18 shows the the inner scope node for the block, along with the YARV instructions for the block’s call to puts n. Later in this chapter we’ll learn how Ruby handles parameters and local variables, like n in this example; why Ruby generates these instructions for puts n. The key point for now is that Ruby compiles each distinct scope in your Ruby program—methods, blocks, classes, or modules, for example—into a separate snippet of YARV instructions.

Parsing: How Ruby Understands Your Code

2025-10-27T00:00:00Z

Update: I’ve made a lot of progress so far this year. I had time to completely rewrite Chapters 1 and 2, which cover Ruby’s new Prism parser and the Ruby compiler which now handles the Prism AST. I also updated Chapter 3 about YARV and right now I’m working on rewriting Chapter 4 which will cover YJIT and possibly other Ruby JIT compilers.

Here’s an excerpt from the new version of Chapter 1. Many thanks to Kevin Newton, who reviewed the content about Prism and had a number of corrections and great suggestions. Also thanks to Douglas Eichelberger who had some great feedback as well.

I’ll post more excerpts from Chapters 2, 3 and 4 in the coming weeks. Thanks for everyone’s interest in Ruby Under a Microscope!

Chapter 1: Tokenization And Parsing

Tokens: The Words That Make Up the Ruby Language	3
Which Words Are Reserved Words?	8
Experiment 1-1: Using Prism to Tokenize Different Ruby Scripts	10
Parsing: How Ruby Understands Your Code	13
Identifying Tokens	14
Parsing Subexpressions Recursively	16
Comparing Tokens	17
Operator Precedence	19
Left and Right Associative Operators	24
Binding Powers	28
Experiment 1-2: Using Prism to Parse Different Ruby Scripts	30
Summary	33

Parsing: How Ruby Understands Your Code

Once Ruby converts your code into a series of tokens, what does it do next? How does it actually understand and run your program? Does Ruby simply step through the tokens and execute each one in order?

No. Your code still has a long way to go before Ruby can run it. The next step on its journey through Ruby is called parsing, where words or tokens are grouped into sentences or phrases that make sense to Ruby. When parsing, Ruby takes into account the order of operations, methods, blocks, and other larger code structures.

Ruby’s parsing engine defines Ruby’s syntax rules. Reading in tokens, Ruby matches the token types and the order the tokens appear with a large series of patterns. These patterns, indeed, are the heart and soul of the Ruby language. How we write a function call, how we define a method using the def keyword, how we write classes and modules - the patterns Ruby looks for define the language.

Ruby’s parse algorithm has three high level steps:

Identify: First, Ruby identifies what the next token represents. Ruby does this by comparing the token’s type - and possibly the types of the following tokens - with a large series of patterns. If one pattern matches, Ruby understands what your code means. If not, Ruby emits a syntax error.
Recurse: Secondly, Ruby calls itself. Each value in one of the syntax patterns can itself be a subexpression - a smaller program that Ruby needs to parse. To do this, Ruby calls itself recursively.
Compare: Third, Ruby compares the current token with the next token to determine which has a higher precedence. This comparison leads Ruby down a specific path, processing the tokens in a certain order.

Let’s break down these ideas further, by following Ruby through the “Hello World” program. Afterwards, we’ll look at a second, slightly more complicated example.

puts "Hello World"

Listing 1-11: Hello World in Ruby

As we saw in the previous section, Ruby first converts the text in this code file into tokens. For Hello World, Ruby’s tokenizer produces these five tokens:

Figure 1-14: Hello World Tokenized

To make the following diagrams simpler, let’s redraw these tokens in a more compact format:

Figure 1-15: Hello World tokens in a more compact format

Using a single gray line of text, Figure 1-15 shows the five tokens from Figure 1-14 in a more compact format. First, PM_TOKEN_IDENTIFIER represents the word “puts” from the beginning of the program. Next, three tokens make up the string literal value: PM_TOKEN_STRING_BEGIN for the first double quote, followed by PM_TOKEN_STRING_VALUE for the words Hello and World, and PM_TOKEN_STRING_END represents the second quote. Finally, the program ends with PM_TOKEN_EOF to mark the end of the source code file.

Now let’s follow Ruby as it processes the Hello World example using the three steps: identify, recurse and compare.

Identifying Tokens

First, identify. How does Ruby understand what the first token, PM_TOKEN_IDENTIFIER, means?

Figure 1-16: Parsing the first token

Figure 1-16 represents the state of Ruby’s parser when it starts to parse this code. At this moment, Ruby is just getting started by inspecting the puts identifier. One of the patterns Ruby looks for matches the identifier; but what does this identifier mean? Ruby knows puts could be a local variable, or it could be the name of a function to call. Since there are no local variables defined in this program, Ruby determines that the puts identifier represents a function the program is calling. (It’s also possible that the program is about to create a new local variable like this: puts = "Hello World". If that were the case, Ruby would see the assignment operator next and parse things differently.)

What happens next? After matching the token to the function call pattern, Ruby records this match in a data structure called an abstract syntax tree (AST). Ruby and most other programming languages use ASTs to record the results of parsing tokens like this. As we’ll see, the AST’s tree structure is well suited for holding the nested, recursive structure of computer programs.

Figure 1-17: The first AST node

Figure 1-17 shows the first node Ruby saves in the AST tree. In a moment, Ruby will begin to add more nodes to the AST.

Before proceeding to the next token, let’s imagine the syntax pattern for a function call:

function-name ( argument1, argument2, argument3, etc. )

Although in Ruby the parentheses are optional, so this pattern also applies:

function-name argument1, argument2, argument3, etc.

NOTE
The original version of the Ruby parser used patterns or grammar rules like this directly with a tool called a parser generator. However, starting with Ruby 3.3, Ruby uses a new parser called Prism, which detects these patterns directly using hand written C code.

After parsing the first token, Ruby inspects the second token. According to the function call pattern, Ruby knows the second token might represent the first argument to the function call. But, how many arguments are there? And what is each argument? The program in Listing 1-11 is very simple, but it could have instead printed a very complex expression - the arguments to puts could have run on for many lines and used hundreds of tokens.

Parsing Subexpressions

Second, recurse. To parse each of the arguments to puts, Ruby has to call itself.

Figure 1-18: Parsing the second token

Figure 1-18 shows two levels of the Ruby parser’s call stack; the top line shows Ruby parsing the puts identifier token, and matching the function call pattern. The second line shows how Ruby called itself to parse the second token, PM_TOKEN_STRING_BEGIN, the leading quote of the string literal. Think of these lines as the backtrace of the Ruby parser.

Figure 1-18 also shows a value 14 on the right side. While calling itself recursively, Ruby passes in a numeric value called the binding power. We’ll return to this later.

Now that Ruby has called itself, Ruby starts the 3-step process all over again: identify, recurse and compare. This time, Ruby has to identify what the PM_TOKEN_STRING_BEGIN token means. This token always indicates the start of a string value. In this example PM_TOKEN_STRING_BEGIN represents the double quote that appears after puts. But the same token might represent a single quote or one of the other ways you can write a string in Ruby, for example using %Q or %q.

Ruby’s new parser, Prism, next parses the string contents directly by processing the following two tokens:

Figure 1-19: Parsing the third and fourth tokens

In this example, Ruby’s parser is done after finding the PM_TOKEN_STRING_END token and can continue to the next step. More complex strings - strings that contain interpolated values using #{} for example - might have required Ruby to call itself yet again to process more nested expressions. But for the simple "Hello World" string Ruby is done.

To record the string value, Ruby creates a new AST node called PM_STRING_NODE.

Figure 1-20: Two AST nodes

Figure 1-20 shows two AST nodes Ruby has created so far: the call node created earlier, and now a new string node.

Ruby’s parser is a recursive descent parser. This Computer Science term describes parsers that resemble the grammar or syntax rules of the programs they parse, and call themselves recursively in a top-down manner as they process nested structures. Many modern programming languages today use this general approach.

Write Barriers

2025-02-18T00:00:00Z

Ruby’s garbage collection implementation is complex and confusing - yet powerful and beautiful at the same time. Chapter 13 summarizes and explains advanced aspects of Ruby’s garbage collector. I love learning about the difficult bits at a deep enough level to be able to explain them to someone else. In RUM I often show snippets from Ruby's C source code in gray callouts. These sections are for readers who already understand C syntax, or who would like to learn it. These can give you sign posts in case you ever decide to read Ruby’s source code for yourself.

Chapter 13: Incremental and Generational Garbage Collection

Incremental GC	3
Marking in More Detail	4
Stop The World Garbage Collection	8
Incremental Garbage Collection	10
How Ruby Implements Incremental GC	11
Write Barriers	12
A Write Barrier in Action	14
Experiment 13-1: What happens if GC can’t free enough slots?	15
Generational GC	22
Keeping Track of an Object’s Age	23
How Marking and Sweeping Work Together	24
Seeing Generational GC In Action	26
Write Barriers Again	29
The Remember Set and Unprotected Values	30
Garbage Collection Bitmaps	31
Major and Minor GCs	34
Experiment 13-2: Major and Minor GCs	35
Summary	40

Write Barriers

Write barriers warn Ruby’s garbage collection algorithm whenever your program creates a new object that might have to be marked. You can think of the barriers as small boxes that surround arrays, hashes and other data structures you might have in your program. For example, in Listing 13-1 Ruby places a write barrier around arr, the array that contains all of the new objects:

Figure 13-9: A Write Barrier

Figure 13-9 repeats Figure 13-6, which showed the array arr, just after Ruby removed it from the mark stack, along with all of the elements of arr, still on the stack. However, on the left the dotted rectangle represents a writer barrier around this array. (In reality, there’s nothing special about this array; Ruby uses write barriers for all arrays, hashes and other data structures.)

The write barrier, as the name implies, jumps into action whenever your program writes to the array inside. In this example, the barrier will be triggered when the program runs the line of code at (3) in Listing 13-1 inside the loop:

(3) arr.push(Object.new)

If your program was running between incremental GC steps when Ruby added a new element to the array, Ruby would move the array back on to the mark stack:

Figure 13-10: Triggering a Write Barrier Pushes the Array Back on the Mark Stack

Figure 13-10 shows the array arr on the left, inside of a write barrier. The line of code just above, arr.push, triggers the write barrier because it writes to the array’s contents. This triggers the write barrier, which moves the array arr back on to the mark stack, shown on the right. Now during the next GC step, Ruby will process the array’s children again, even if it had processed them earlier. This allows Ruby to find and mark the new object just added to the array.

The mark stack is how Ruby remembers its place inside of the GC algorithm, between one incremental GC and the next. Whenever an incremental GC step starts, Ruby continues to mark the objects present on the mark stack that were pending from last time or that were modified in the meantime.

A Write Barrier in Action

Whenever your program modifies an array, hash or other value protected by write barriers, Ruby calls this C code internally. rb_gc_impl_writebarrier_remember at (1) in Listing 13-2 pushes a modified object back on to the mark stack, as shown above in Figure 13-10:

(1) void rb_gc_impl_writebarrier_remember(void *objspace_ptr, VALUE obj)
    {
        rb_objspace_t *objspace = objspace_ptr;
    
        gc_report(1, objspace, "rb_gc_writebarrier_remember: %s\n", rb_obj_info(obj));
    
(2)     if (is_incremental_marking(objspace)) {
(3)         if (RVALUE_BLACK_P(objspace, obj)) {
(4)             gc_grey(objspace, obj);
            }
        }
        else {
(5)         if (RVALUE_OLD_P(objspace, obj)) {
                rgengc_remember(objspace, obj);
            }
        }
    }

Listing 13-2: The rb_gc_writebarrier_remember function

Ruby first checks at (2) whether it is currently in the midst of an incremental garbage collection. If it is, Ruby continues to the line at (3), and checks whether the given object was already completely processed: whether Ruby marked it and all of its children. (Recall the non-inclusive term “black” from the tricolor GC algorithm.) For example, in Figure 13-9 the arr value was completed processed, since Ruby marked it and also moved all of its children on to the mark stack, removing arr from the mark stack.

If Ruby did already mark the value, and if Ruby already pushed the value’s children on to the mark stack, then Ruby knows it needs to process the value again - since your program modified it. In this case, Ruby continues on to the line at (4) and calls gc_grey, which pushes the value back on to the mark stack. Later Ruby will iterate over its children again, pushing each of them back on to the mark stack.

Looking ahead to the next section, Generational GC on page 22, write barriers use the code in the else clause at (5) to handle “old” values. We’ll cover Generational GC next. But first, Experiment 13-1 will take a look at incremental GC in action.

Using Different Size Pools

2025-02-11T00:00:00Z

The Ruby team has done a tremendous amount of work over the past decade on Ruby's garbage collection (GC) implementation. In fact, Ruby's new GC is one of the key reasons Ruby 3 is so much faster than Ruby 2. To bring all of this work to light, I decided to rewrite Chapter 12 from scratch, covering garbage collection in Ruby more accurately and in more depth. But then, after a few months, I realized I had gotten carried away and wrote too much material for one chapter. So the updated book will have two new chapters on garbage collection: one on garbage collection basics and a second new chapter on incremental and generational garbage collection. Here's a small excerpt.

Chapter 12: Garbage Collection Basics

Where Do Ruby Values Live?	4
The RVALUE Structure	7
RVALUE Written in C	8
The Free List	9
Embedded Values	10
Size Pools	12
Experiment 12-1: Using Different Size Pools	14
Cleaning Up Unused Values	19
Filling Up a Page	19
Marking	22
Sweeping	24
How Ruby Frees An Object	28
Experiment 12-2: When Does Ruby Collect Your Garbage?	29
Summary	25

Experiment 12-1: Using Different Size Pools

Ruby 3.2 and later provides a way to see statistics about size pools, the GC.stat_heap class method. GC.stat_heap returns a hash as shown in Listing 12-5.

require 'pp'
pp GC.stat_heap

{0 =>
  {slot_size: 40,
   heap_eden_pages: 10,
   heap_eden_slots: 16374,
   total_allocated_pages: 10,
   force_major_gc_count: 0,
   force_incremental_marking_finish_count: 0,
   total_allocated_objects: 26378,
   total_freed_objects: 10231},
 1 =>
  {slot_size: 80,

Etc…

Listing 12-5: Calling GC.stat_heap

Listing 12-5 shows the output of GC.stat_heap for Size Pool 0, which includes the slot size (:slot_size=>40), the number of pages Ruby has allocated so far for this size (:heap_eden_pages=>10) and the total number of slots allocated across all of these pages (:heap_eden_slots=>16374). The output of GC.stat_heap continues on to show the same statistics for the other size pools.

We can use GC.stat_heap to investigate how Ruby uses size pools as we allocate more and more objects. Listing 12-6 shows a Ruby program that allocates arrays of varying sizes, and then records the output from GC.stat_heap.

(1) CAPACITY_ITER = 100
    ALLOCATE_ITER = 10000

(2) all = []

(3) CAPACITY_ITER.times do |capa|

(4)   ALLOCATE_ITER.times do |i|
        all << Array.new(capa)
      end

(5)   total_slots_by_size_pool = []
      GC.stat_heap.each do |size_pool, stats|
        total_slots_by_size_pool[size_pool] = stats[:heap_eden_slots]
      end

      print "#{capa}, "
      print "#{total_slots_by_size_pool[0]}, "
      print "#{total_slots_by_size_pool[1]}, "
      print "#{total_slots_by_size_pool[2]}, "
      print "#{total_slots_by_size_pool[3]}, "
      puts  "#{total_slots_by_size_pool[4]}"

    end

Listing 12-6: Detecting Which Size Pool Ruby Uses for Arrays of Varying Sizes

This program contains an inner loop and an outer loop. The outer loop at (3) in Listing 12-6 iterates over arrays of different capacities, from 1 up to 100 (CAPACITY_ITER). For each array capacity, the program creates 10,000 (ALLOCATE_ITER) array objects of that size using the inner loop (4). Note the program saves all of the new arrays into a single array called all, created at (2). This insures that Ruby doesn’t free all of our new arrays by running a garbage collection.

After creating 10,000 arrays of the given capacity, the program saves the heap_eden_slots value from the return value of GC.stat_heap for all of the size pools at (5), and then prints out the results at (6).

Running the code in Listing 12-6 produces this output:

0, 22923, 6548, 2861, 611, 306
1, 34385, 6548, 2861, 611, 306
2, 44209, 6548, 2861, 611, 306
3, 54033, 6548, 2861, 611, 306
4, 54033, 13914, 2861, 611, 306
5, 54033, 23735, 2861, 611, 306
6, 54033, 34374, 2861, 611, 306
7, 54033, 44195, 2861, 611, 306
8, 54033, 54016, 2861, 611, 306
9, 54033, 54016, 11446, 611, 306
10, 54033, 54016, 21257, 611, 306
11, 54033, 54016, 31477, 611, 306

Etc…

Listing 12-7: The Output From Running the Program in Listing 12-6

The output in Listing 12-7 shows how many slots Ruby has allocated in total after each iteration of the outer, array capacity loop. For example:

0, 22923, 6548, 2861, 611, 306

… shows that after allocating 10,000 empty arrays, Ruby has now uses a total of 22923 slots for Size Pool 0, 6548 for Size Pool One, 2861 for Size Pool Two, etc. If you try running this, you will see slightly different values.

Plotting these values, we can see which pool Ruby uses for the new arrays of various capacities:

Figure 12-10: Allocating Slots for Arrays of Various Sizes - Size Pools 0, 1 and 2, Ruby 3.4.1

Each bar in Figure 12-10 represents values from a line of output from Listing 12-7. For example, the first bar on the far left plots this output:

0, 22923, 6548, 2861, 611, 306

The first value, 0, is the position of each bar on the x-axis, while the bar’s color segments display the following three values: The dark grey bar at the bottom left corner represents Size Pool 0 (22923), the lighter bar above it shows Size Pool 1 (6548), and the lightest, top bar shows Size Pool 2 (2861).

Moving to the right, each successive bar shows the values for different array capacities. Looking over the entire graph, we can see the following pattern in Figure 12-10:

The dark grey bars for Size Pool 0 at the bottom of Figure 12-10 increase in size from capacity 0 through capacity 3, and then remain the same height after that for capacities 4 and greater.
The medium grey bars for Size Pool 1 have the same height from capacity 0 through 3, but then increase from capacity 4 through 8. From capacity 9 and onward, the medium grey bars have the same height again.
The light grey bars at the top for Size Pool 2 remain small for capacities 0 through 8, and then increase in size steadily for capacities 9 through 18. Then the remain unchanged after that.

Figure 12-10 shows Ruby saves new arrays using the following size pools:

Capacity Range	Size Pool
0-3			0
4-8			1
9-18			2

Plotting the entire output from Listing 12-7 up to capacity=100, we see the following:

Figure 12-11: Complete Output from Listing 12-7, Ruby 3.4.1

Figure 12-11 shows an interesting pattern. Ruby uses size pools 0 through 4 to save arrays with capacities from 0 up to 78 in a similar way:

Capacity Range	Size Pool
0-3			0
4-8			1
9-18			2
19-38			3
39-78			4

For each capacity range, the length of the corresponding bars grows steadily, and then stops growing.

However, once we started to save large arrays with capacities of 79 or more elements, Ruby saved them in the original Size Pool Zero again. This indicates that Ruby stopped embedding the array elements in the size pool entirely, and instead allocated a new, separate memory segment to save the elements. For these large arrays, small 40 byte RVALUE slots in Size Pool Zero were sufficient, because they each contained a pointer to the array data, and not the embedded array data itself.

Figure 12-12: A Large Array Saving Its Elements In A Separate Memory Segment

Figure 12-12 shows how large arrays, arrays which contain 79 or more elements, do not save their elements inside of the Array structure, but instead save a pointer (ptr) which contains the location of a separate memory segment that holds the array elements.

One key detail of this experiment was in Listing 12-6 at (2): the all array. The inner loop just below in Listing 12-6 at (4) saved each new array into the all array. This meant all the new arrays were in fact still being used and Ruby’s garbage collector could not reclaim their memory. Without this line of code, we would not have seen the total number of slots continually increase, preventing us from discovering which slots Ruby saved the arrays into.

But how did Ruby’s garbage collector know this, exactly? How does Ruby identify which values are used and unused by our programs? And how does it reclaim memory from the unused values? Let’s take a look at Ruby’s garbage collection process next.

Inserting One New Element into Hashes of Varying Sizes

2025-02-04T00:00:00Z

RUM includes a series of “experiments:” simple code snippets that show evidence the book’s explanations are accurate. One of the first experiments I wrote back in 2013, Experiment 7-2 is a fun way to see exactly when Ruby increases the number of bins in a hash table. The experiments in RUM are a great way to see for yourself how Ruby works. They also keep me honest; in fact, I ran this code again recently using Ruby 3.4.1 and saw different results than what I expected!

Chapter 7: The Hash Table: The Workhorse Of Ruby Internals

Hash Tables in Ruby	3
Saving a Value in a Hash Table	3
Retrieving a Value from a Hash Table	5
Experiment 7-1: Retrieving a Value from Hashes of Varying Sizes	7
How Hash Tables Expand to Accommodate More Values	9
Hash Collisions and Open Addressing	9
Searching For an Element Using Open Addressing	11
Expanding a Hash Table	14
How Does Ruby Decide How Many Entries And Bins To Use?	15
Experiment 7-2: Inserting One New Element into Hashes of Varying Sizes	17
Optimization for Small Hashes	20
How Does Ruby Convert A Packed Hash Into A Hash Table?	22
How Ruby Implements Hash Functions	23
Experiment 7-3: Using Objects as Keys in a Hash	25
Summary	30

Experiment 7-2: Inserting One New Element into Hashes of Varying Sizes

One way to test whether this rehashing, or redistribution, of entries really occurs when Ruby expands a hash is to measure the amount of time Ruby takes to save one new element into existing hashes of different sizes. As we add more elements to the same hash, we should eventually see evidence that Ruby is taking extra time to rehash the elements.

The code for this experiment is shown in Listing 7-3.

require 'benchmark'

(1) 100.times do |size|

    hashes = []
(2) 10000.times do
      hash = {}
      (1..size).each do
        hash[rand] = rand
      end 
      hashes << hash
    end 

    GC.disable

    Benchmark.bm do |bench|
      bench.report("adding element number #{size+1}") do
        10000.times do |n| 
(3)       hashes[n][size] = rand
        end 
      end 
    end 

    GC.enable

  end

Listing 7-3: Adding one more element to hashes of different sizes

At (1) the outer loop iterates over hash sizes from 0 to 100, and at (2) the inner loop creates 10,000 hashes of the given size. After disabling garbage collection, this experiment uses the benchmark library to measure how long it takes Ruby to insert a single new value at (3) into all 10,000 hashes of the given size. The results are surprising! Figure 7-13 shows the results for Ruby 3.4.1.

Figure 7-13: Time to add 10,000 key/value pairs vs. hash size (Ruby 3.4.1)

Interpreting these data values from left to right, we see the following:

It takes about 0.6 ms to insert the first element into an empty hash (10,000 times).
As the hash size increases from 2 to 8, the amount of time required to insert a new element is about the same: 0.6ms more or less.
Inserting the 9th key/value pair takes much longer, about 2ms for 10,000 hashes!
Next, as the hash size increases from 10 up to 16, once again Ruby can insert new elements quickly, between 0.6ms and 0.7ms (10,000 times).
A huge spike! It takes almost 3.1ms to insert the 17th element.
And then once again starting with the 18th element, the time to insert each element reduced to around 0.7ms-0.8.ms.
A 3rd spike appears when Ruby inserts the 33rd element: almost 5ms.
The graph once again flattens and returns to around 0.7-0.8ms per element (10,000 times).
And a 4th spike appears when Ruby inserts the 65th element: almost 6ms.

What’s going on here? Well, Ruby spends the extra time required to insert that 17th key/value pair expanding the hash table: reallocating the entries array from 16 to 32 entries, and the bin array from 32 to 64 bins, and then reassigning the st_table_entry structures to the new bin array. Ruby expands the entries and bins arrays a second time again after inserting the 33rd entry, this from from 32 to 64 entries and 64 to 128 bins. (Recall the st_features table, shown on page 15, used powers of 2 to determine these array sizes.)

The smaller spike on the 9th insert in this figure is curious. While not as pronounced as the spike at the 17th element, this smaller spike is nonetheless noticeable. As it turns out, Ruby contains another optimization that speeds up hash access even more for small hashes that contain less than 9 elements.

Updating Ruby Under a Microscope

2025-01-28T00:00:00Z

Ruby stores much of its own internal data in hash tables.

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while to finish. Leave a comment or drop me a line and I'll email you when it's finished.

In the meantime, here’s an excerpt from a rewrite of Chapter 7 about hash tables. Vladimir Makarov and the Ruby team redesigned Ruby’s hash table implementation back in 2015 to use open addressing, shortly after I published the first edition of RUM. This seemed like a good place to start.

Chapter 7: The Hash Table: The Workhorse Of Ruby Internals

Hash Tables in Ruby	3
Saving a Value in a Hash Table	3
Retrieving a Value from a Hash Table	5
Experiment 7-1: Retrieving a Value from Hashes of Varying Sizes	7
How Hash Tables Expand to Accommodate More Values	9
Hash Collisions and Open Addressing	9
Searching For an Element Using Open Addressing	11
Expanding a Hash Table	14
How Does Ruby Decide How Many Entries And Bins To Use?	15
Experiment 7-2: Inserting One New Element into Hashes of Varying Sizes	17
Optimization for Small Hashes	20
How Does Ruby Convert A Packed Hash Into A Hash Table?	22
How Ruby Implements Hash Functions	23
Experiment 7-3: Using Objects as Keys in a Hash	25
Summary	30

Hash Tables in Ruby

Hash tables are a commonly used, well-known, age-old concept in computer science. They organize values into groups, or bins, based on an integer value calculated from each value — a hash. When you need to find a value, you can figure out which bin it’s in by recalculating its hash value, thus speeding up the search.

Every time you write a method, Ruby creates an entry in a hash table.

Saving a Value in a Hash Table

Figure 7-1 shows a single hash object and its hash table.

Figure 7-1: A Ruby hash object with an empty hash table

On the left is the RHash (short for Ruby hash) structure. On the right, you see the hash table used by this hash, represented by the st_table structure. This C structure contains the basic information about the hash table, including the number of entries saved in the table and pointers to the entries and bins. Each RHash structure contains a pointer to a corresponding st_table structure. For many hashes, Ruby initially creates 32 entries and 64 bins. (Hashes with 8 or fewer entries work somewhat differently; see “Optimization for Small Hashes” on page 20.) The best way to understand how a hash table works is by stepping through an example. Suppose I add a new key/value to a hash called my_hash:

my_hash[:key] = "value"

While executing this line of code, Ruby saves the key and value into the first entry, as shown in Figure 7-2.

Figure 7-2: A Ruby hash object containing a single value

Here you can see the new key/value pair inside the first entry slot, called an st_table_entry in Ruby’s C source code. Ruby saves the keys and values in the entries array in the order you add them. This makes it easy for Ruby to return them back to you in the same order. Also see that Ruby saved an entry index of 0 in the third bin, number 2. Ruby did this by taking the given key — in this example, the symbol :key — and passing it to an internal hash function that returns a pseudorandom integer:

some_value = internal_hash_function(:key)

Next, Ruby takes the hash value — in this example, some_value — and calculates the modulus by the number of bins, which is the remainder after dividing by the number of bins.

some_value % 64 = 2

NOTE: In Figure 7-2, I assume that the actual hash value for :key divided by 64 leaves a remainder of 2. Later in this chapter, I’ll explore in more detail the hash functions that Ruby actually uses. We’ll see how using 64 bins (a power of 2) speeds up this calculation.

Now let’s add a second element to the hash:

my_hash[:key2] = "value2"

This time let’s imagine that the hash value of :key2 divided by 64 yields a remainder of 5.

internal_hash_function(:key2) % 64 = 5

Figure 7-3 shows that Ruby fills in a second st_table_entry structure in the entries array, and the entry index 1 in bin number 5, the sixth bin.

Figure 7-3: A Ruby hash object containing two values

Retrieving a Value from a Hash Table

The benefit of using a hash table becomes clear when you ask Ruby to retrieve the value for a given key. For example:

p my_hash[:key]
 => "value"

If Ruby had saved all of the keys and values in an array or linked list, it would have to iterate over all the elements in that array or list, looking for :key. This might take a very long time, depending on the number of elements. But using a hash table, Ruby can jump straight to the key it needs to find by recalculating the hash value for that key. To recalculate the hash value for a particular key, Ruby simply calls the hash function again: some_value = internal_hash_function(:key).

Then, it redivides the hash value by the number of bins to get the remainder, or the modulus. some_value % 64 = 2 At this point, Ruby knows to look in bin number 2 and finds the entry index 0, and in turn finds the entry with the key of :key in entry number 0 or the first entry slot. Ruby can later find the value for :key2 by repeating the same hash calculation internal_hash_function(:key2) % 64 = 5.

Figure 7-4: Finding Values in a Hash Table

Figure 7-4 explains how Ruby would find these two values in the hash table: On the left side, the first arrow starts from the third bin (bin index 2 = internal_hash_function(:key) % 64) and points to the first key/value pair, at index 0. To the right, the second arrow starts from the sixth bin (bin index 5 = internal_hash_function(:key2) % 64) and points to the second key/value pair, at index 1.

NOTE: The C library used by Ruby to implement hash tables was written in the 1980s by Peter Moore from the University of California, Berkeley. Later in 2015 Vladimir Makarov rewrote the hash table code, using a data structure which saves the entry and bin arrays in contiguous memory segments. By saving all the entries and bins nearby each other in memory, modern CPUs are able to cache all of the data in the hash table more efficiently, speeding up the overall process. You can find Makarov’s hash table code in the C code files st.c and include/ruby/st.h. All of the function and structure names in that code use the naming convention st_. The definition of the RHash structure that represents every Ruby Hash object is in the include/ruby/ruby.h file. Along with RHash, this file contains all of the other primary object structures used in the Ruby source code: RString, RArray, RValue, and so on.

LLVM IR: The Esperanto of Computer Languages

2022-02-19T00:00:00Z

Esperanto grammar is logical and self
consistent, designed to be easy to learn.
via Wikimedia Commons

I empathize for people who have to learn English as a foreign language. English grammar is inconsistent, arbitrary and hard to master. English spelling is even worse. I sometimes find myself apologizing for my language’s shortcomings. But learning any foreign language as an adult is very difficult.

Esperanto, an “artificial language,” is different. Invented by Ludwik Zamenhof in 1873, Esperanto has a vocabulary and grammar that are logical and consistent, designed to be easier to learn. Zamenhof intended Esperanto to become the universal second language.

Computers have to learn foreign languages too. Every time you compile and run a program, your compiler translates your code into a foreign language: the native machine language that runs on your target platform. Compilers should have been called translators. And compilers struggle with the same things we do: inconsistent grammar and vocabulary, and other peculiarities of the target platform.

Recently, however, more and more compilers translate your code to an artificial machine language. They produce a simpler, more consistent, more powerful machine language that doesn’t actually run on any machine. This artificial machine language, LLVM IR, makes writing compilers simpler and reading the code compilers produce simpler too.

LLVM IR is becoming the universal second language for compilers.

One Line of LLVM IR

The Low Level Virtual Machine (LLVM) project had the novel idea of inventing a virtual machine that was easy for compiler engineers to use as a target platform. The LLVM team designed a special instruction set called intermediate representation (IR). New, modern languages such as Rust, Swift, Clang-based versions of C and many others, first translate your code to LLVM IR. Then they use the LLVM framework to convert the IR into actual machine language for any target platform LLVM supports:

LLVM is great for compilers. Compiler engineers don’t have to worry about the detailed instruction set of each platform, and LLVM optimizes your code for whatever platform you choose automatically. And LLVM is also great for people like me who are interested in what machine language instructions look like and how CPUs execute them. LLVM instructions are much easier to follow than real machine instructions. Let’s take a look at one!

Here’s a line of LLVM IR I generated from a simple Crystal program:

%57 = call %"Array(Int32)"* @"*Array(Int32)@Array(T)::unsafe_build:Array(Int32)"(i32 610, i32 2), !dbg !89

Wait a minute! This isn’t simple or easy to follow at all! What am I talking about here? At first glance, this does look confusing. But as we’ll see, most of the confusing syntax is related to Crystal, not LLVM. Studying this line of code will reveal more about Crystal than it will about LLVM.

The rest of this article will unpack and explain what this line of code means. It looks complex, but is actually quite simple.

The Call Instruction

The instruction above is a function call in LLVM IR. To produce this code, I wrote a small Crystal program and then translated it using this command:

$ crystal build array_example.cr --emit llvm-ir

The --emit option directed Crystal to generate a file called array_example.ll, which contains the line above along with thousands of other lines. We’ll get to the Crystal code in a minute. But for now, how do I get started understanding what the LLVM code means?

The LLVM Language Reference Manual has documentation for call and all of the other LLVM IR instructions. Here’s the syntax for call:

<result> = [tail | musttail | notail ] call [fast-math flags] [cconv] [ret attrs] [addrspace(<num>)]
         <ty>|<fnty> <fnptrval>(<function args>) [fn attrs] [ operand bundles ]

My example call instruction doesn’t use many of these options. Removing the unused options, I can see the actual, basic syntax of call:

<result> = call <ty> <fnptrval>(<function args>)

In order from left to right, these values are:

<result> which register to save the result in
<ty> the type of the return value
<fnptrval> a pointer to the function to call
<function args> the arguments to pass to that function

What does all of this mean, exactly? Let’s find out!

A CPU With Infinite Registers

Starting on the left and moving right, let’s step through the call instruction:

The token %57 to the left of the equals sign tells LLVM where to save the return value of the function call that follows. This isn’t a normal variable; %57 is an LLVM “register.”

Registers are physical circuits located on microprocessor chips used to save intermediate values. Saving a value in a CPU register is much faster than saving a value in memory, since the register is located on the same chip as the rest of the microprocessor. Saving a value in RAM memory, on the other hand, requires transmitting that value from one chip to another and is much slower, relatively speaking. Unfortunately, each CPU has a limited number of registers available, and so compilers have to decide which values are used frequently enough to warrant saving in nearby registers, and which other values can be moved out to more distant memory.

Unlike the limited number of registers available on a real CPU, the imaginary LLVM microprocessor has an infinite number of them. Because of this, compilers that target LLVM can simply save values to a register whenever they would like. There’s no need to find an available register, or to move an existing value out of a register first before using it for something else. Busy work that normal machine language code can’t avoid.

In this program, the Crystal compiler had already saved 56 other values in “registers” and so for this line of LLVM IR, Crystal simply used the next register, number 57.

LLVM Structure Types

Moving left to right, LLVM call instructions next indicate the type of the function call’s return value:

This name of this type, Array(Int32), is generated by the Crystal compiler, not by LLVM. That is, this is a type from my Crystal program. It could have been anything, and indeed other compilers that target LLVM will generate completely different type names.

The example Crystal program I used to generate this LLVM code was:

arr = [12345, 67890]
puts arr[1]

When I compiled this program, Crystal generated the call instruction above, which returns a pointer to the new array, arr. Since arr is an array containing integers, Crystal uses a generic type Array(Int32).

Machine languages that target real machines only support hardware types that machine supports. For example, Intel x86 assembly language allows you to save integers of different widths, 16, 32 or 64 bits for example, and an Intel x86 CPU has registers designed to hold values of each of these sizes.

LLVM IR is more powerful. It supports “structure types,” similar to a C structure or an object in a language like Crystal or Swift. Here the %"…" syntax indicates the name inside the quotes is the name of a structure type. And the asterisk which follows, like in C, indicates the type of the return value of my function call is a pointer to this structure.

My example LLVM program defines the type Array(Int32) like this:

%"Array(Int32)" = type { i32, i32, i32, i32, i32* }

Structure types allow LLVM IR programs to create pointers to structures or objects, and to access any of the values inside each object. That makes writing a compiler much easier. In my example, the call instruction returns a pointer to an object which contains 4 32-bit integer values, followed by a pointer to other 32 integer values. But what are all of these integer values? Above I said this function call was returning a new array - how can that be the case?

LLVM itself has no idea, and no opinion on the matter. To understand what these values are, and what they have to do with the array in my program, we need to learn more about the Crystal compiler that generated this LLVM IR code.

Reading the Crystal standard library, we can see Crystal implements arrays like this:

class Array(T)
include Indexable::Mutable(T)
include Comparable(Array)

# Size of an Array that we consider small to do linear scans or other optimizations.
private SMALL_ARRAY_SIZE = 16 

# The size of this array.
@size : Int32

# The capacity of `@buffer`.
# Note that, because `@buffer` moves on shift, the actual
# capacity (the allocated memory) starts at `@buffer - @offset_to_buffer`.
# The actual capacity is also given by the `remaining_capacity` internal method.
@capacity : Int32

# Offset to the buffer that was originally allocated, and which needs to
# be reallocated on resize. On shift this value gets increased, together with
# `@buffer`. To reach the root buffer you have to do `@buffer - @offset_to_buffer`,
# and this is also provided by the `root_buffer` internal method.
@offset_to_buffer : Int32 = 0

# The buffer where elements start.
@buffer : Pointer(T)

# In 64 bits the Array is composed then by:
# - type_id            : Int32   # 4 bytes -|
# - size               : Int32   # 4 bytes  |- packed as 8 bytes
#
# - capacity           : Int32   # 4 bytes -|
# - offset_to_buffer   : Int32   # 4 bytes  |- packed as 8 bytes
#
# - buffer             : Pointer # 8 bytes  |- another 8 bytes

The comments above are very illustrative and complete - the Crystal team took the time to document their standard library and explain not only how to use each class, like Array(T), but how they are implemented internally.

In this case, we can see the four i32 values inside the Array(Int32) LLVM structure type hold the size and capacity off the array, among other things. And the i32* value is a pointer to the actual contents of the array.

Functions

The target of the call instruction appears next, after the return type:

This is quite a mouthful! What sort of function is this?

There are two steps to understanding this: First, the @"…" syntax. This is simply a global identifier in this LLVM program. So my call instruction is just calling a global function. In LLVM programs, all functions are global; there is no concept of a class, module or similar groupings of code.

But what in the world does that crazy identifier mean?

LLVM ignores this complex name. For LLVM this is just a name like foo or bar. But for Crystal, the name has much more significance. Crystal encoded a lot of information into this one name. Crystal can do this because the LLVM code isn’t intended for anyone to read directly. Crystal has created a “mangled name,” meaning the original version of the function to call is there but it’s been mangled or rewritten in a confusing manner.

Crystal rewrites function names to ensure they are unique. In Crystal, like in many other statically typed languages, functions with different argument types or return value types are actually different functions. So in Crystal if I write:

def foo(a : Int32)
puts "Int: #{a}"
end

def foo(a : String)
puts "String: #{a}"
end

foo(123)
#=> Int: 123
foo("123")
#=> String: 123

…I have two separate, different functions both called foo. The type of the parameter a distinguishes one from the other.

Crystal generates unique function names by encoding the arguments, return value and type of the receiver into the into the function name string, making it quite complex. Let’s break it down:

Array(Int32)@Array(T) - this is the type of the receiver. That means the unsafe_build function is actually a method on the Array(T) generic class. And in this case, the receiver is an array holding 32 bit integers, the Array(Int32) class. Crystal includes both names in the mangled function name.
unsafe_build - this is the function Crystal is calling.
Int32 - these are the function’s parameter types. In this case, Crystal is passing in a single integer, so we just see one Int32 type.
Array(Int32) - this is the return value type, a new array containing integers.

As I discussed in my last post, the Crystal compiler internally rewrites my array literal expression [12345, 67890] into code that creates and initializes a new array object:

__temp_621 = ::Array(typeof(12345, 67890)).unsafe_build(2)
__temp_622 = __temp_621.to_unsafe
__temp_622[0] = 12345
__temp_622[1] = 67890
__temp_621

In this expanded code, Crystal calls unsafe_build and passes in 2, the required capacity of the new array. And to distinguish this use of unsafe_build from other unsafe_build functions that might exist in my program, the compiler generated the mangled name we see above.

Arguments

Finally, after the function name the LLVM IR instruction shows the arguments for the function call:

LLVM IR uses parentheses, like most languages, to enclose the arguments to a function call. And the types precede each value: 610 is a 32-bit integer and 2 is also a 32-bit integer.

But wait a minute! We saw just above the expanded Crystal code for generating the array literal passes a single value, 2, into the call to unsafe_build. And looking at the mangled function name above, we also see there is a single i32 parameter to the function call.

But reading the LLVM IR code we can see a second value is also passed in: 610. What in the world does 610 mean? I don’t have 610 elements in my new array, and 610 is not one of the array elements. So what is going on here?

Crystal is an object oriented language, meaning that each function is optionally associated with a class. In OOP parlance, we say that we are “sending a message” to a “receiver.” In this case, unsafe_build is the message, and ::Array(typeof(12345, 67890)) is the receiver. In fact, this function is really a class method. We are calling unsafe_build on the Array(Int32) class, not on an instance of one array.

Regardless, LLVM IR does’t support classes or instance methods or class methods. In LLVM IR, we only have simple, global functions. And indeed, the LLVM virtual machine doesn’t care what these arguments are or what they mean. LLVM doesn’t encode the meaning or purpose of each argument; it just does what the Crystal compiler tells it to do.

But Crystal, on the other hand, has to implement object oriented behavior somehow. Specifically, the unsafe_build function needs to behave differently depending on which class it was called for, depending on what the receiver is. For example:

::Array(typeof(12345, 67890)).unsafe_build(2)

… has to return an array of two integers. While:

::Array(typeof("abc", "def")).unsafe_build(2)

…has to return an array of two strings. How does this work in the LLVM IR code?

To implement object oriented behavior, Crystal passes the receiver as a hidden, special argument to the function call:

This receiver argument is a reference or pointer to the receiver’s object, and is normally known as self. Here 610 is a reference or tag corresponding to the Array(Int32) class, the receiver. And 2 is the actual argument to the unsafe_build method.

Reading the LLVM IR code, we’ve learned that Crystal secretly passes a hidden self argument to every method call to an object. Then inside each method, the code has access to self, to the object instance that code is running for. Some languages, like Rust, require us to pass self explicitly in each method call; in Crystal this behavior is automatic and hidden.

Learning How Compilers Work

LLVM IR is a simple language designed for compiler engineers. I think of it like a blank slate for them to write on. Most LLVM instructions are quite simple and easy to understand; as we saw above, understanding the basic syntax of the call instruction wasn’t hard at all.

The hard part was understanding how the Crystal compiler, which targets LLVM IR, generates code. The LLVM syntax itself was easy to follow; it was the Crystal language’s implementation that was harder to understand.

And this is the real reason to learn about LLVM IR syntax. If you take the time to learn how LLVM instructions work, then you can start to read the code your favorite language’s compiler generates. And once you can do that, you can learn more about how your favorite compiler works, and what your programs actually do when you run them.

Visiting an Abstract Syntax Tree

2022-01-22T00:00:00Z

Joshua Tree National Park (via: Wikimedia Commons)

In my last post, I explored how Crystal parsed a simple program and produced a data structure called an abstract syntax tree (AST). But what does Crystal do with the AST? Why bother going to such lengths to create it?

After Crystal parses my code, it repeatedly steps through all the entries or nodes in the AST and builds up a description of the intended meaning and behavior of my code. This process is known as semantic analysis. Later, Crystal will use this description to convert my program into a machine language executable.

But what does this description contain? What does it really mean for a compiler to understand anything? Let’s pack our bags and visit an abstract syntax tree with Crystal to find out.

The Visitor Pattern

Imagine several tourists visiting a famous tree: Each of them sees the same tree in a different way. The tree doesn’t change, but the perspective of each person looking at it is different. They each take a different photo, or remember different details.

In Computer Science this separation of the data structure (the tree) from the algorithms using it (the tourists) is known as the visitor pattern. This technique allows compilers and other programs to run multiple algorithms on the same data structure without making a mess.

The visitor pattern calls for two functions: accept and visit. First, a node in the data structure “accepts” a visitor:

After accepting a visitor, the node turns around and calls the visit method on Visitor:

The visit method implements whatever algorithm that visitor is interested in.

This seems kind of pointless… why use accept at all? We could just call visit directly. The key is that, after calling the visitor and passing itself, the node passes the visitor to each of its children, recursively:

And then the visitor can visit each of the child nodes also. The Visitor class doesn’t necessarily need to know anything about how to navigate the node data structure. And more and more visitor classes can implement new algorithms without changing the underlying data structure and breaking each other.

The Visitor Pattern in the Crystal Compiler

In order to understand what my code means, Crystal reads through my program’s AST over and over again using different visitors. Each algorithm looks for certain syntax, records information about the types and objects my code uses or possibly even transforms my code into a different form.

A photo I took in 2018 of Angel Oak,
a 400-500 year old tree in South Carolina.

Crystal implements the basics of the visitor pattern in visitor.cr, inside the superclass of all AST nodes:

class ASTNode
  def accept(visitor)
    if visitor.visit_any self
      if visitor.visit self
        accept_children visitor
      end
      visitor.end_visit self
      visitor.end_visit_any self
    end
  end
end

Each subclass of ASTNode implements its own version of accept_children.

My Tiny Crystal Program

To get a sense of how the visitor pattern works inside of Crystal, let’s look at one line of code from my last post:

arr = [12345, 67890]

As I explained last month, the Crystal parser generates this AST tree fragment:

Once the parser is finished and has created this small tree, the Crystal compiler steps through it a number of different times, looking for classes, variables, type declarations, etc. Each of these passes through the AST is performed by a different visitor class: TopLevelVisitor, InstanceVarsInitializerVisitor or ClassVarsInitializerVisitor among many others.

The most important visitor class the Crystal compiler uses is called simply MainVisitor. You can find the code for MainVisitor in main_visitor.cr:

# This is the main visitor of the program, ran after types have been declared
# and their type declarations (like `@x : Int32`) have been processed.
#
# This visits the "main" code of the program and resolves calls, instantiates
# methods and visits them, recursively, with other MainVisitors.
#
# The visitor keeps track of a method's variables (or the main program, split into
# several files, in case of top-level code). It keeps track both of the type of a
# variable at a single point (stored in @vars) and the combined type of all assignments
# to it (in @meta_vars).
#
# Call resolution logic is in `Call#recalculate`, where method lookup is done.
class MainVisitor < SemanticVisitor

Since Crystal supports typed parameters and method overloading, the visitor class implements a different visit method for each type of node that it visits, for example:

class MainVisitor < SemanticVisitor
  def visit(node : Assign)
  def visit(node : Var)
  def visit(node : ArrayLiteral)
  def visit(node : NumberLiteral)

Etc…

Now let’s look at three examples of what the MainVisitor class does with my code: identifying variables, assigning types and expanding array literals. The Crystal compiler is much too complex to describe in a single blog post, but hopefully I can give you glimpse into the sort of work Crystal does during semantic analysis.

Identifying Variables

Obviously, my example code creates and initializes a variable called arr:

arr = [12345, 67890]

But how does Crystal identify this variable’s name and value? What does it do with arr?

The MainVisitor class starts to process my code by visiting the root node of this branch of my AST, the Assign node:

As you can see, earlier during the parsing phrase Crystal had saved the target variable and value of this assign statement in the child AST nodes. The target variable, arr, appears in the Var node, and the value to assign is an ArrayLiteral node. The MainVisitor now knows I declared a new variable, called arr, in the current lexical scope. Since my program has no classes, methods or any other lexical scopes, Crystal saves this variable in a table of variables for the top level program:

Actually, to be more accurate, there will always be many other variables in this table along with arr. All Crystal programs automatically include the standard library, so Crystal also saves all of the top level variables from the standard library in this table.

In a more normal program, there will be many lexical scopes for different method and class or module definitions, and MainVisitor will save each variable in the corresponding table.

Assigning Types

Probably the most important function of MainVisitor is to assign a type to each value in my program. The simplest example of this is when MainVisitor visits a NumberLiteral node:

Looking at the size of the numeric value, Crystal determines the type should be Int32. Crystal then saves this type right inside of the NumberLiteral node:

Strictly speaking, this violates the visitor pattern because the visitors shouldn’t be modifying the data structure they visit. But the type of each node, the type of each programming construct in my program, is really an integral part of that node. In this case the MainVisitor is really just completing each node. It’s not changing the structure of the AST in this case… although as we’ll see in a minute the MainVisitor does this for other nodes!

Type Inference

Sometimes type values can’t be determined from the intrinsic value of an AST node. Often the type of a node is determined by other nodes in the AST.

Recall my example line of code is:

arr = [12345, 67890]

Here Crystal automatically sets the type of the arr variable to the type of the array literal expression: Array(Int32). In Computer Science, this is known as type inference. Because Crystal can automatically determine the type of arr, I don’t need to declare it explicitly by writing something more complicated like this:

arr = uninitialized Array(Int32)
arr = [12345, 67890]

Type inference allows me to write concise, clean code with fewer type annotations. Most modern, statically typed languages such as Swift, Rust, Julia, Kotlin, etc., use type inference in the same way as Crystal. Even newer versions of Java or C++ use type inference.

The Crystal compiler implements type inference when the MainVisitor encounters an Assign AST node, what we saw above.

After encountering the Assign node, Crystal recursively processes one of the two child nodes, the ArrayLiteral value, and its child nodes. When this process finishes, Crystal knows the type of the ArrayLiteral node is Array(Int32):

I’ll take a closer look at how Crystal processes the ArrayLiteral node next. But for now, once Crystal has the type of the ArrayLiteral node it copies that type over to the Var node and sets its type also:

But Crystal does something else interesting here: It sets up a dependency between the two AST nodes: it “binds” the variable to the value:

This binding dependency allows Crystal to later update the type of the arr variable whenever necessary. In this case the value [12345, 67890] will always have the same type, but I suspect that sometimes the Crystal compiler can update types during semantic analysis. In this way if the Crystal compiler ever changed its mind about the type of some value, it can easy update the types of any dependent values. I also suspect Crystal uses these type dependency connections to produce error messages whenever you pass an incorrect type to some function, for example. These are just guesses, however; if anyone from the Crystal team knows exactly what these type bindings are used for let me know.

Update: Ary Borenszweig explained that sometimes the Crystal compiler updates the type of variables based on how they are used. He posted an interesting example on The Crystal Programming Language Forum.

Expanding an Array Literal

So far we’ve seen Crystal set the type of the NumberLiteral node to Int32, and we’ve seen Crystal assign arr a type of Array(Int32). But how did Crystal determine the type of the array literal [12345, 67890]?

This is where things get even more complicated. Sometimes during semantic analysis the Crystal compiler completely rewrites parts of your code, replacing it with something else. This happens even with my simple example. When visiting the ArrayLiteral node, the MainVisitor expands this simple line of code into something more complex:

__temp_621 = ::Array(typeof(12345, 67890)).unsafe_build(2)
__temp_622 = __temp_621.to_unsafe
__temp_622[0] = 12345
__temp_622[1] = 67890
__temp_621

Reading this, you can see how later my compiled program will create the new array. First Crystal creates an empty array with a capacity of 2, and an element type of Int32. typeof(12345, 67890) returns the type (or multiple types inside a union type) found in the given set of values, in this case just Int32. Later Crystal sets the two elements in the array just by assigning them.

Crystal achieves this by replacing part of my program’s AST with a new branch:

For clarity, I’m not drawing the AST nodes for the inner assign operations, only the first line:

__temp_621 = ::Array(typeof(12345, 67890)).unsafe_build(2)

Putting It All Together

With this new, updated AST we can see exactly how Crystal determines the type of my variable arr. Starting at the root of my AST, MainVisitor visits all of the AST nodes in this order in a series of recursive calls:

And it determines the types of each of these nodes as it returns from the recursive calls:

Some interesting details here that I don’t understand completely or have space to explain here:

The TypeOf node calculates a common union type using a type formula. In this example, it just returns Int32 because both elements of my array, 12345 and 67890, are simple 32 bit integers.
I believe the Generic node refers to a Crystal generic class via the Path node shown above, in this example Array(T). When MainVisitor processes the Generic node, it sets T to the type Int32, arriving at the type Array(Int32).class.
The Call node looks up the method my code is calling (unsafe_build) and uses the type from that method’s return value. I didn’t have time to explore how method lookup works in Crystal, however, so I’m not sure about this.

Scratching the Surface

Today we looked at a tiny piece of what the Crystal compiler can do. There are many more types of AST nodes, each of which the MainVisitor class handles differently. And there are many different visitor classes also, beyond MainVisitor. When analyzing a more complex program Crystal has to understand class and module definitions, instance and class variables, type annotations, different lexical scopes, macros, and much, much more. Crystal will need all of this information later, during the code generation phase, the next step that follows semantic analysis.

But I hope this article gave you a sense of what sort of work a compiler has to do in order to understand your code. As you can see, for a statically typed language like Crystal the compiler spends much of its time identifying all of the types in your code, and determining which programming constructs or AST nodes have which types.

Next time I’ll look at code generation: Now that Crystal has identified the variables, function calls and types in my code it is ready to generate the machine language code needed to execute my program. To do that, it will leverage the LLVM framework.