<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>Chris Hulbert is Splinter Software</title>
 <link href="http://www.splinter.com.au/atom.xml" rel="self"/>
 <link href="http://www.splinter.com.au/"/>
 <updated>2025-12-20T18:23:24+11:00</updated>
 <id>http://www.splinter.com.au</id>
 <author>
   <name>Chris Hulbert</name>
 </author>
 
 
 <entry>
   <title>Commander Keen 4-6 file formats</title>
   <link href="http://www.splinter.com.au/2025/12/20/commander-keen-4-6-file-formats/"/>
   <updated>2025-12-20T00:00:00+11:00</updated>
   <id>http://www.splinter.com.au/2025/12/20/commander-keen-4-6-file-formats/commander-keen-4-6-file-formats</id>
   <content type="html">&lt;p&gt;&lt;img src=&quot;/images/2025/keenformats.png&quot; alt=&quot;Keen 4 Map&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Recently I made &lt;a href=&quot;https://github.com/chrishulbert/dopefish-decoder&quot;&gt;Dopefish Decoder&lt;/a&gt;, a Rust tool for dumping the graphics from a very old-school id Software game: Commander Keen 4-6. It was a bit of work (fun work, though) combining information from various sources to figure out how to read it all, so here are the formats in rough EBNF! Further explanations for the more complex elements follow afterwards.&lt;/p&gt;

&lt;h2 id=&quot;files&quot;&gt;Files&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Graphics:
    &lt;ul&gt;
      &lt;li&gt;graph_head&lt;/li&gt;
      &lt;li&gt;graph_dict&lt;/li&gt;
      &lt;li&gt;egagraph&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Maps:
    &lt;ul&gt;
      &lt;li&gt;map_head&lt;/li&gt;
      &lt;li&gt;gamemaps&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;non-file-files&quot;&gt;Non-file files&lt;/h2&gt;

&lt;p&gt;The map_head/graph_head/graph_dict “files” are actually present inside the game executable. Having said that, in many mods they are their own separate files. To get them, the executable first needs to be &lt;a href=&quot;https://github.com/chrishulbert/dopefish-decoder/tree/main?tab=readme-ov-file#decompressing-exe-files&quot;&gt;decompressed&lt;/a&gt;, then &lt;a href=&quot;https://github.com/chrishulbert/dopefish-decoder/blob/main/src/versions.rs&quot;&gt;these offsets&lt;/a&gt; are used to extract them.&lt;/p&gt;

&lt;h2 id=&quot;ebnf&quot;&gt;EBNF&lt;/h2&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// All multi-byte ints are little-endian.

graph_head = { graph offset }, graph length
graph length = 3 byte int // Matches length of egagraph file.
graph offset = 3 byte int 

graph_dict = { huffman node }
huffman node = node side, node side // Left, right.
node side = node value, node type
node value = byte
node type = leaf | node // Byte: 0 = leaf, else = node.

egagraph = { chunk }
egagraph = unmasked picture table chunk with header,
    masked picture table chunk with header,
    sprite table chunk with header,
    font a chunk with header,
    font b chunk with header,
    font c chunk with header,
    { unmasked picture chunk with header }, // Count from unmasked picture table.
    { masked picture chunk with header }, // Count from masked picture table.
    { sprite chunk with header }, // Count from sprite table.
    unmasked 8x8 tiles chunk without header, // One chunk for all tiles.
    masked 8x8 tiles chunk without header, // One chunk for all tiles.
    { unmasked 16x16 tile chunk without header },
    { masked 16x16 tile chunk without header },
    { text etc }
chunk without header = huffman encoded chunk
chunk with header = chunk decompressed length, huffman encoded chunk
chunk decompressed length = 4 byte int

picture table = { picture table entry }
picture table entry = width_pixels_divided_by_8, height_pixels
width_pixels_divided_by_8 = 2 byte int
height_pixels = 2 byte int

sprite table = { sprite table entry } // 18 bytes each.
sprite table entry = width_div_by_8, // All are 2 byte ints.
    height,
    x offset,
    y offset,
    clip left,
    clip top,
    clip right,
    clip bottom,
    shifts

image = picture | tile | sprite
unmasked image = red plane, green plane, blue plane, intensity plane
masked image = red plane, green plane, blue plane, intensity plane, mask plane

map_head = rlew key, { map header offset }
rlew key = 2 bytes
map header offset = 4 byte int // 0 means no map in this slot.

gamemaps = &quot;TED5v1.0&quot;, { map }
map = map planes, map header
map header = background plane offset, // 38 bytes.
    foreground plane offset,
    sprite plane offset,
    background plane length,
    foreground plane length,
    sprite plane length,
    tile count width,
    tile count height,
    map name
plane offset = 4 byte int
plane length = 2 byte int
tile count = 2 byte int
map name = 16 bytes asciiz
map planes = background carmackized plane,
    foreground carmackized plane,
    sprite carmackized plane
carmackized plane = carmackized decompressed length, carmackized data
carmackized decompressed length = 2 byte int
carmackized data = carmack compressed(rlew plane)
rlew plane = rlew decompressed length, rlew data
rlew decompressed length = 2 byte int
rlew data = rlew compressed(decompressed plane)
decompressed plane = { map plane row }
map plane row = { map plane element }
map plane element = 2 byte int
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
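
&lt;p&gt;The 3 byte ints in graph_head are unusual enough to deserve an example. Here’s a hedged Rust sketch (my own helper, not necessarily how Dopefish Decoder does it) of reading them; note the final entry parsed is the graph length:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// Sketch: parse graph_head as a sequence of 3 byte little-endian ints.
// The last one is the graph length; the rest are chunk offsets.
fn read_3_byte_ints(bytes: &amp;amp;[u8]) -&amp;gt; Vec&amp;lt;u32&amp;gt; {
    bytes
        .chunks_exact(3)
        .map(|c| u32::from(c[0]) | (u32::from(c[1]) &amp;lt;&amp;lt; 8) | (u32::from(c[2]) &amp;lt;&amp;lt; 16))
        .collect()
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;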

&lt;h2 id=&quot;image-planes&quot;&gt;Image planes&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Images are stored in EGA planes.&lt;/li&gt;
  &lt;li&gt;Data is one whole-image plane, then the next plane, and so on.&lt;/li&gt;
  &lt;li&gt;Thus each pixel is represented 4-5 times across the data, once per plane.&lt;/li&gt;
  &lt;li&gt;Masked image planes: RGBIM.&lt;/li&gt;
  &lt;li&gt;Red, Green, Blue, Intensity, Mask.&lt;/li&gt;
  &lt;li&gt;Unmasked image planes: RGBI.&lt;/li&gt;
  &lt;li&gt;When the mask bit = 1, it is a transparent pixel.&lt;/li&gt;
  &lt;li&gt;Each pixel in a plane is represented by 1 bit.&lt;/li&gt;
  &lt;li&gt;Inside each byte, pixels run left to right from the most significant bit, so 0x80 is the leftmost pixel.&lt;/li&gt;
  &lt;li&gt;All widths are multiples of 8 so you don’t have to worry about rows starting mid-byte.&lt;/li&gt;
&lt;/ul&gt;
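
&lt;p&gt;To make the planar layout concrete, here’s a hedged Rust sketch (a hypothetical helper, not taken from the tool) that combines the four unmasked plane bits for one pixel into the standard 4-bit IRGB EGA palette index:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// Given the byte from each unmasked plane (R, G, B, I) covering the
// same 8 pixels, return the EGA palette index (0-15) for pixel x (0-7).
// 0x80 is the leftmost pixel, matching the bit order described above.
fn ega_index(r: u8, g: u8, b: u8, i: u8, x: u8) -&amp;gt; u8 {
    let bit = |byte: u8| (byte &amp;gt;&amp;gt; (7 - x)) &amp;amp; 1;
    bit(i) &amp;lt;&amp;lt; 3 | bit(r) &amp;lt;&amp;lt; 2 | bit(g) &amp;lt;&amp;lt; 1 | bit(b)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;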

&lt;h2 id=&quot;map-elements&quot;&gt;Map elements&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Map plane elements are ints representing which tile is displayed at that position.&lt;/li&gt;
  &lt;li&gt;Background plane corresponds to the unmasked tiles.&lt;/li&gt;
  &lt;li&gt;Foreground plane corresponds to the masked tiles.&lt;/li&gt;
  &lt;li&gt;Foreground plane elements are optional: 0 means no element at that position, so the first masked tile is represented by the value 1. In other words, subtract 1 from a non-zero value to get the tile index.&lt;/li&gt;
  &lt;li&gt;The background plane always has an element, so the above -1 adjustment does not apply.&lt;/li&gt;
&lt;/ul&gt;
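
&lt;p&gt;That off-by-one rule is easy to get wrong, so here’s a hedged sketch of it (a hypothetical helper of my own) in Rust:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// Foreground (masked) plane: 0 = no tile, otherwise value - 1 indexes
// the masked 16x16 tiles. Background elements index tiles directly.
fn foreground_tile_index(element: u16) -&amp;gt; Option&amp;lt;usize&amp;gt; {
    match element {
        0 =&amp;gt; None,
        n =&amp;gt; Some(usize::from(n) - 1),
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;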

&lt;h2 id=&quot;rlew--huffman--carmackization&quot;&gt;RLEW / Huffman / Carmackization&lt;/h2&gt;

&lt;p&gt;These compression techniques are big topics, far too complex for EBNF, and out of scope for an article like this.&lt;/p&gt;

&lt;p&gt;They are probably best described in code, which also has links to further reading. Hopefully the following code is readable enough to communicate the how-to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/chrishulbert/dopefish-decoder/blob/main/src/rlew.rs&quot;&gt;rlew.rs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/chrishulbert/dopefish-decoder/blob/main/src/huffman.rs&quot;&gt;huffman.rs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/chrishulbert/dopefish-decoder/blob/main/src/carmackization.rs&quot;&gt;carmackization.rs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
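
&lt;p&gt;RLEW is the simplest of the three, though, so here’s a taste: a hedged Rust sketch of the expansion side (my own code, simpler than the linked rlew.rs), assuming the usual id convention that the tag word is followed by a count word and a value word:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// Expand RLEW data: a stream of 16-bit words, where the tag word
// (the rlew key from map_head) means &quot;repeat value, count times&quot;.
fn rlew_expand(words: &amp;amp;[u16], tag: u16) -&amp;gt; Vec&amp;lt;u16&amp;gt; {
    let mut out = Vec::new();
    let mut iter = words.iter().copied();
    while let Some(w) = iter.next() {
        if w == tag {
            // The tag word is followed by a count word and a value word.
            let count = iter.next().unwrap_or(0);
            let value = iter.next().unwrap_or(0);
            for _ in 0..count {
                out.push(value);
            }
        } else {
            out.push(w);
        }
    }
    out
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;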

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;I know this is the most random topic imaginable. Still, thanks for reading, I pinky promise this was written by a human, not AI, hope you found this fascinating if not useful, at least a tiny bit, God bless!&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Cloudflare Rust Analysis</title>
   <link href="http://www.splinter.com.au/2025/12/05/cloudflare-rust-analysis/"/>
   <updated>2025-12-05T00:00:00+11:00</updated>
   <id>http://www.splinter.com.au/2025/12/05/cloudflare-rust-analysis/cloudflare-rust-analysis</id>
   <content type="html">&lt;p&gt;&lt;img src=&quot;/images/2025/cloud.png&quot; alt=&quot;Angry Cloud from Commander Keen 4&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A few weeks ago, there was a huge Cloudflare outage that knocked out half the internet for a while. As someone who has written a fair bit of Rust in my spare time (23KLOC according to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cloc&lt;/code&gt; over the last few years), I couldn’t resist the urge to add some constructive thoughts to the discussion around the Rust code that was identified for the outage.&lt;/p&gt;

&lt;p&gt;And I’m not going full Rust-Evangelism-Strike-Force here, as my pro-Swift conclusion will attest. Basically I’d just like to take this outage as an opportunity to recommend a couple of tricks for writing safer Rust code.&lt;/p&gt;

&lt;h2 id=&quot;the-culprit&quot;&gt;The culprit&lt;/h2&gt;

&lt;p&gt;So, here’s the culprit according to &lt;a href=&quot;https://blog.cloudflare.com/18-november-2025-outage/#memory-preallocation&quot;&gt;Cloudflare’s postmortem&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;pub fn fetch_features(
        &amp;amp;mut self,
        input: &amp;amp;dyn BotsInput,
        features: &amp;amp;mut Features,
) -&amp;gt; Result&amp;lt;(), (ErrorFlags, i32)&amp;gt; {
    features.checksum &amp;amp;= 0xffff_ffff_0000_0000;
    features.checksum |= u64::from(self.config.checksum);
    let (feature_values, _) = features
        .append_with_names(&amp;amp;self.config.feature_names)
        .unwrap();
    ...
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Apparently it processes new configuration, and crashed at the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unwrap&lt;/code&gt; because configuration with too many features was passed in.&lt;/p&gt;

&lt;h2 id=&quot;code-review&quot;&gt;Code Review&lt;/h2&gt;

&lt;p&gt;Keep in mind that I’m not seeing the greater context of this function, so the following may be affected by that, but here are my thoughts re the above code:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;It returns a Result, with nothing for the success case, and a combo of ErrorFlags and an i32 for the failure case.&lt;/li&gt;
  &lt;li&gt;The presence of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;amp;dyn&lt;/code&gt; for input indicates this uses dynamic dispatch, which means this isn’t intended as high-performance code. Which makes sense if this is just for loading configuration. Given that, they could have simply used &lt;a href=&quot;https://docs.rs/anyhow/latest/anyhow/&quot;&gt;anyhow&lt;/a&gt;’s all-purpose Result to make their lives simpler instead of this complex tuple for the error generic.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unwrap()&lt;/code&gt; is called. This is the big red flag, and something that should generally only be done in code that you are happy to have panic, e.g. command line utilities, but less so for services. Swift’s equivalent is the force-unwrap operator &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;!&lt;/code&gt;. When Swift was new, it was explained that the ! was chosen because it signifies danger, and stands out like a sore thumb in code reviews to encourage thorough examination. Rust’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unwrap&lt;/code&gt; isn’t as obvious at review time, and thus can sneak through unnoticed.&lt;/li&gt;
  &lt;li&gt;Since we’re already in a function that returns Result, it would be more idiomatic to use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;?&lt;/code&gt; after the call to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;append_with_names&lt;/code&gt;, so that this function would hot-potato the error to the caller, instead of panicking.&lt;/li&gt;
  &lt;li&gt;If &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;append_with_names&lt;/code&gt; returns an Option not a Result, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ok_or(..)?&lt;/code&gt; would be a tidy option.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;alternative&quot;&gt;Alternative&lt;/h2&gt;

&lt;p&gt;Here I’ve changed the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fetch_features&lt;/code&gt; function to be safer, with a couple options for how to gracefully handle this if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;append_with_names&lt;/code&gt; returns either a Result or an Option (it isn’t clear which it is from Cloudflare’s snippet, so I’ve done both). Note that I’ve also added some boilerplate around all this to keep the fetch_features code as similar as possible, but also commented out some stuff that’s less relevant.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;fn main() {
    let mut fetcher = Fetcher::new();
    let mut features = Features::new();
    if let Err(e) = fetcher.fetch_features(&amp;amp;mut features) {
        // ... Gracefully handle the error here without panicking ...
        eprintln!(&quot;Error gracefully handled: {:#?}&quot;, e);
        return
    }
}

enum FeatureName {
    Foo,
    Bar,
}

struct Fetcher {
    feature_names: Vec&amp;lt;FeatureName&amp;gt;,
}

impl Fetcher {
    fn new() -&amp;gt; Self {
        Fetcher { feature_names: vec![] }
    }
    
    // This is the function Cloudflare said caused the outage:
    fn fetch_features(
        &amp;amp;mut self,
        // input: &amp;amp;dyn BotsInput,
        features: &amp;amp;mut Features,
    ) -&amp;gt; Result&amp;lt;(), (ErrorFlags, i32)&amp;gt; {
        // features.checksum &amp;amp;= 0xffff_ffff_0000_0000;
        // features.checksum |= u64::from(self.config.checksum);
        
        // If append_with_names returns a Result,
        // the question mark operator is safer than unwrap:
        let (feature_values, _) = features
            .append_with_names_result(&amp;amp;self.feature_names)?;
        
        // If append_with_names returns Option,
        // ok_or converts to a result, which forces you to be
        // explicit about what error is relevant,
        // which is then safely unwrapped using the question mark operator.
        let (feature_values, _) = features
            .append_with_names_option(&amp;amp;self.feature_names)
            .ok_or((ErrorFlags::AppendWithNamesFailed, -1))?;
        
        Ok(())
    }
}

#[derive(Debug)]
enum ErrorFlags {
    AppendWithNamesFailed,
    TooManyFeatures,
}

struct Features {
}

impl Features {
    fn new() -&amp;gt; Self {
        Features {}
    }
    
    // This is for if it returns a Result:
    fn append_with_names_result(
        &amp;amp;mut self,
        names: &amp;amp;[FeatureName],
    ) -&amp;gt; Result&amp;lt;(i32, i32), (ErrorFlags, i32)&amp;gt; {
        if names.len() &amp;gt; 200 { // Config is too big!
            Err((ErrorFlags::TooManyFeatures, -1))
        } else {
            Ok((42, 42))
        }
    }

    // This is for if it returns an Option:
    fn append_with_names_option(
        &amp;amp;mut self,
        names: &amp;amp;[FeatureName],
    ) -&amp;gt; Option&amp;lt;(i32, i32)&amp;gt; {
        if names.len() &amp;gt; 200 { // Config is too big!
            None
        } else {
            Some((42, 42))
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Feel free to paste this into the &lt;a href=&quot;https://play.rust-lang.org&quot;&gt;Rust Playground&lt;/a&gt; and see if you have better suggestions :)&lt;/p&gt;

&lt;h2 id=&quot;suggestions&quot;&gt;Suggestions&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Instead of unwrap, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;?&lt;/code&gt; operator is a great option, particularly if you are already in a function that returns a Result, so please take advantage of such a situation.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ok_or&lt;/code&gt; is a great way to safely unwrap Options inside a Result function. It forces you to think about ‘what error should I return if there’s no value here?’.&lt;/li&gt;
  &lt;li&gt;Consider Swift! The exclamation point operator is a great way of drawing attention to danger in a code review, which is a fantastic piece of language ergonomics.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;If anyone from Cloudflare is reading this, I hope this critique does not come across as unkind, much of my code is not amazingly bulletproof either! And kudos to Cloudflare for allowing us to see some of their code in the postmortem :)&lt;/p&gt;

&lt;p&gt;Thanks for reading, I pinky promise this was written by a human, not AI, hope you found this useful, at least a tiny bit, God bless!&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Rust Compilation: Sequoia vs Tahoe</title>
   <link href="http://www.splinter.com.au/2025/12/04/sequoia-vs-tahoe/"/>
   <updated>2025-12-04T00:00:00+11:00</updated>
   <id>http://www.splinter.com.au/2025/12/04/sequoia-vs-tahoe/sequoia-vs-tahoe</id>
   <content type="html">&lt;p&gt;&lt;img src=&quot;/images/2025/isleoffire.png&quot; alt=&quot;Sequoia vs Tahoe&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Are you curious to know if upgrading from macOS Sequoia to Tahoe will affect compilation speeds? Everyone seems to be piling onto the anti-Tahoe bandwagon, so I thought I’d add some anecdata to the anecdotes going around.&lt;/p&gt;

&lt;p&gt;Note that I have two identical laptops, the only difference is that one has Tahoe:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Mac                   macOS         Time (lower is better)
---                   -----         -----
2025 M2 Air 16GB RAM  Sequoia 15.6  361.54s
2025 M2 Air 16GB RAM  Tahoe 26.1    360.88s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;My core point is: Tahoe isn’t slower in my (admittedly simplistic) Rust compilation benchmark. It’s &lt;a href=&quot;https://www.youtube.com/watch?v=hou0lU8WMgo&quot;&gt;technically&lt;/a&gt; 0.2% faster, but that’s statistically insignificant.&lt;/p&gt;

&lt;p&gt;I’ve also thrown a few other Macs I had lying around into the mix, to add some colour to the conversation:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;2022 M1 Studio Ultra   Sequoia 15.6.1  512.63s
2025 M4 Air, 16GB RAM  Sequoia 15.6    378.13s
2022 M2 Air, 8GB RAM   Sequoia         343.97s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Note that all Macs are ‘base models’ of their generation.&lt;/p&gt;

&lt;h2 id=&quot;benchmark-details&quot;&gt;Benchmark details&lt;/h2&gt;

&lt;p&gt;So, this benchmark is, as mentioned above, admittedly simple. I recently wrote a &lt;a href=&quot;https://www.reddit.com/r/rustjerk/comments/av5pog/higherres_rust_evangelism_strike_force_image/&quot;&gt;Rust&lt;/a&gt; tool to &lt;a href=&quot;https://github.com/chrishulbert/dopefish-decoder&quot;&gt;extract the sprites and maps from the Commander Keen episodes&lt;/a&gt;, and this benchmark times how long it takes to compile its 16 source files from scratch 400 times. Despite its simplicity, the two identical-hardware Macs scored within 0.2% of each other, so it is at least consistent.&lt;/p&gt;

&lt;p&gt;If you’d like to repeat it:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Fresh install of macOS if possible&lt;/li&gt;
  &lt;li&gt;Install default Rust via rustup.rs&lt;/li&gt;
  &lt;li&gt;My Macs were running rustc 1.91.1&lt;/li&gt;
  &lt;li&gt;Install homebrew via brew.sh&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;git clone https://github.com/chrishulbert/dopefish-decoder.git&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Do your best to ensure other things aren’t running in the background&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;make bench&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conspiracy-theory&quot;&gt;Conspiracy theory!&lt;/h2&gt;

&lt;p&gt;It’s surprising that the M4 doesn’t trounce the M2s! I wonder if Apple is actually putting M4 chips into the 2025 batch of “M2” laptops that have been updated to have 16GB RAM. Given the RAM is integrated with the CPU, maybe it was just simpler for them to put M4 chips in, rather than dust off the M2 designs, add more RAM, and restart the production line? And maybe they just didn’t bother to throttle them in some way. Maybe?&lt;/p&gt;

&lt;p&gt;Alternatively… perhaps this was just a poor benchmark? After all, my older M2 somehow came out fastest. But the performance consistency between the two identical laptops is remarkably tight, indicating at least some level of accuracy. My M4 also has a corporate security rootkit installed, which may slow things down. Lots to think about.&lt;/p&gt;

&lt;h2 id=&quot;ultra&quot;&gt;Ultra&lt;/h2&gt;

&lt;p&gt;It’s unfortunate to see the M1 Ultra taking a lot longer than the others. I guess the M1 is showing its age! I can see why Apple’s rumoured to have given up on the Mac Pro: by the time the Ultra team has managed to release an Mn Ultra, the Mn+1 Max is out and faster. If I were to make any recommendations here, I’d say forget previous-gen Ultras and instead buy the latest-gen Studio Max. Perhaps Ultra will become more relevant once the yearly pace of improvement in M processors slows down.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;So there you have it: Benchmarking is hard. Kudos to &lt;a href=&quot;https://www.youtube.com/@GamersNexus&quot;&gt;those who arguably do it well.&lt;/a&gt; If nothing else though, I wouldn’t be too worried about Tahoe slowing things down, it’s a perfectly &lt;a href=&quot;https://simpsons.fandom.com/wiki/Elizabeth_Hoover&quot;&gt;cromulent &lt;del&gt;word&lt;/del&gt;&lt;/a&gt; operating system. Thanks for reading, I pinky promise this was written by a human, not AI, hope you found this fascinating, at least a tiny bit, God bless!&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Better React Native devex through Expo Go</title>
   <link href="http://www.splinter.com.au/2025/09/05/react-native-expo-go-devex/"/>
   <updated>2025-09-05T00:00:00+10:00</updated>
   <id>http://www.splinter.com.au/2025/09/05/react-native-expo-go-devex/react-native-expo-go-devex</id>
   <content type="html">&lt;p&gt;&lt;img src=&quot;/images/2025/devex.jpg&quot; alt=&quot;Devex&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Having worked with React Native projects on and off for years now, I’ve come to appreciate that there are significant productivity and developer experience (devex) gains on the table that tend to be derailed the moment a native library is added to the mix. But what if you could keep that productivity flowing?&lt;/p&gt;

&lt;p&gt;Most people (somewhat rightly) think of Expo Go as the training wheels that nobody uses for serious React Native development. But you’re probably like me: the vast majority of daily work is simple Create-Read-Update-Delete (CRUD!) data manipulation. And what if, for that daily work, we didn’t need to fight with getting Xcode or Android Studio to compile, code sign, deal with cocoapods, ruby, gradle, and so on? What if most of your team didn’t even need to install Xcode/Studio at all? I believe this strategy can be beneficial for keeping you and your team productive, isolating all the pain of the native integration to the CI builds.&lt;/p&gt;

&lt;p&gt;So, how to get to this point? Some thoughts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;When considering libraries, ask yourself ‘is this pure JS or native?’. For instance, when evaluating options for a feature flag library, you could use FooFlags (not a real product) or LaunchDarkly. FooFlags has a React Native library that wraps native code, whereas LaunchDarkly’s is pure JS. You should use the pure-JS one, because that gets you one step closer to being able to do your daily work in Expo Go.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Sometimes, companies release newer versions of their libraries that are pure JS. LaunchDarkly did this in the last year or two: their older library was native + JS shim, but their newer one is pure JS. In cases like these, you can upgrade to the latest pure JS one to make your life easier.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;If you have an unavoidably native component, you can wrap it in a pure-JS component that shows a placeholder. If this is a part of the app that you don’t need to work on very often, this can be a great way of having your cake and eating it too: Have native components for part of the app, yet still be able to spend most of your productive workday zipping along with Expo Go.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;If you have native modules, you can shim them to perform no-ops (or whatever is reasonable) when in Expo Go. I’ll demonstrate some strategies for achieving these last two points next:&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;your-native-expo-modules&quot;&gt;Your native expo modules&lt;/h2&gt;

&lt;p&gt;If you’ve made your own native module, you can ‘shim’ it out in such a way that it does nothing when run in the Expo Go environment. To do so, as an example, I modify the generated &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;modules/my-foo-module/src/MyFooModule.ts&lt;/code&gt; file as follows:&lt;/p&gt;

&lt;div class=&quot;language-tsx highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;NativeModule&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;requireNativeModule&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;expo&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;Constants&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;ExecutionEnvironment&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;expo-constants&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;EventSubscription&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;expo-modules-core&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;MyFooModuleEvents&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;./MyFooModule.types&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;kr&quot;&gt;declare&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyFooModule&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;NativeModule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;MyFooModuleEvents&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;PI&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;number&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;getValueSync&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;setValueAsync&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Promise&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;void&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;nf&quot;&gt;doSomething&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;void&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;requireOrMock&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;MyFooModule&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;Constants&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;executionEnvironment&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;ExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;StoreClient&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Expo Go:&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;c1&quot;&gt;// My stuff, mocked:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;PI&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;3.141&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;getValueSync&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;function &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;string&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;dl&quot;&gt;&apos;&apos;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;setValueAsync&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;function &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kr&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;Promise&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;void&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{},&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;doSomething&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;function &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{},&lt;/span&gt;

      &lt;span class=&quot;c1&quot;&gt;// Generic expo module stuff:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;addListener&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;EventName&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;keyof&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;MyFooModuleEvents&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;&amp;gt;&lt;/span&gt;(
        eventName: EventName,
        listener: MyFooModuleEvents[EventName]): EventSubscription &lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;
          &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;remove&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{}&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;,
      removeListener: function &lt;span class=&quot;p&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;EventName&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;keyof&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;MyFooModuleEvents&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;&amp;gt;&lt;/span&gt;(
        eventName: EventName,
        listener: MyFooModuleEvents[EventName]): void &lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;,
      removeAllListeners: function (
        eventName: keyof MyFooModuleEvents): void &lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;,
      emit: function &lt;span class=&quot;p&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;EventName&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;keyof&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;MyFooModuleEvents&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;&amp;gt;&lt;/span&gt;(
        eventName: EventName,
        ...args: Parameters&lt;span class=&quot;p&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;MyFooModuleEvents&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;EventName&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;&amp;gt;&lt;/span&gt;): void &lt;span class=&quot;si&quot;&gt;{}&lt;/span&gt;,
      listenerCount: function &lt;span class=&quot;p&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;EventName&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;keyof&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;MyFooModuleEvents&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;&amp;gt;&lt;/span&gt;(
        eventName: EventName): number &lt;span class=&quot;si&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;
    } 
  } else &lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;requireNativeModule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;MyFooModule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;MyFooModule&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;
}
export default requireOrMock();
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;imported-library-components&quot;&gt;Imported library components&lt;/h2&gt;

&lt;p&gt;In our case, we use a native library for VOIP calling. We only have one component that uses this library, so I’ve added a ‘wrapper’ component that replaces our component with a placeholder when we’re using Expo Go. The wrapper works as follows:&lt;/p&gt;

&lt;div class=&quot;language-tsx highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;Constants&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;ExecutionEnvironment&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;expo-constants&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;Text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;View&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;react-native&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;MyComponentProps&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;./MyComponent&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// This wraps a MyComponent in such a way it is not instantiated for Expo Go.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;export&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;default&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;MyComponentWrapper&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;MyComponentProps&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;Constants&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;executionEnvironment&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;ExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;StoreClient&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Expo Go:&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;View&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;style&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;flex&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;justifyContent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;center&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;alignItems&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;center&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;&amp;gt;&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;&amp;gt;&lt;/span&gt;This is disabled while using Expo Go&lt;span class=&quot;p&quot;&gt;&amp;lt;/&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;&amp;gt;&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;&amp;lt;/&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;View&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;&amp;gt;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// Production:&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;MyComponent&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;require&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;./MyComponent&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// Lazy import.&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;MyComponent&lt;/span&gt; &lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;&amp;gt;&amp;lt;/&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;MyComponent&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This wrapper has the same props as the actual component, so everywhere the component was previously used, you can simply use the wrapper instead.&lt;/p&gt;

&lt;p&gt;For this to work, you have to edit MyComponent.tsx and export its props like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;export interface MyComponentProps { ...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;Hope you find this helpful! I strongly recommend using Expo Go for the sake of your team’s productivity if possible, and with the above tips, I think it is reasonably achievable. Thanks for reading, I pinky promise this was written by a human, not AI, hope you found this fascinating, at least a tiny bit, God bless!&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>The Maths of FM Synthesis</title>
   <link href="http://www.splinter.com.au/2024/10/09/maths-of-fm-synthesis/"/>
   <updated>2024-10-09T00:00:00+11:00</updated>
   <id>http://www.splinter.com.au/2024/10/09/maths-of-fm-synthesis/maths-of-fm-synthesis</id>
   <content type="html">&lt;p&gt;&lt;img src=&quot;/images/2024/fm.jpg&quot; alt=&quot;FM Synthesis&quot; /&gt;&lt;/p&gt;

&lt;p&gt;FM Synthesis is an old-school way of generating musical instrument sounds, initially popularised by the &lt;a href=&quot;https://en.wikipedia.org/wiki/Ad_Lib,_Inc.&quot;&gt;Adlib&lt;/a&gt; and SoundBlaster PC sound cards in the late ’80s (and, of course, in piano keyboards). Here’s an &lt;a href=&quot;https://chiptune.app/?play=Game%20MIDI%2FDescent%202%20(PC%E2%88%95DOS%2C%201996)%2FFM%2FD2-Descent-FM.mid&quot;&gt;example of what FM Synth music sounded like in games&lt;/a&gt;. Ahh the nostalgia.&lt;/p&gt;

&lt;p&gt;A friend who is a school music teacher found that his students all use the same identical samples for instruments for their creations. So I created &lt;a href=&quot;https://chrishulbert.github.io/you-synth/&quot;&gt;YouSynth, a web app that allows you to create any instrument you like using a basic form of FM synthesis, and download that instrument as a WAV file you can use anywhere, as well as play around with it using an attached MIDI keyboard. Please check it out!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So as to not leave out the maths teachers, I thought I’d write an article about how the maths for FM synthesis works! I think it’s fascinating, hopefully you might too. My dream is that maybe a maths teacher somewhere would use this as an interesting demonstration of applied maths to pique their students’ interest :)&lt;/p&gt;

&lt;h2 id=&quot;formula&quot;&gt;Formula&lt;/h2&gt;

&lt;p&gt;To start with, here’s the gist of it - for each sample, the value is:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;sin(
    carrierFrequency * time * 2 * pi
    +
    sin(modulatorFrequency * time * 2 * pi) * modulatorEnvelope
) * carrierEnvelope
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now let’s break that down.&lt;/p&gt;

&lt;h2 id=&quot;carrier-frequency&quot;&gt;Carrier frequency&lt;/h2&gt;

&lt;p&gt;The carrier frequency is the fundamental frequency of the note.
Eg for A4, it’s 440 Hz.
For Middle C, aka C4, it’s ~261.6 Hz.&lt;/p&gt;

&lt;p&gt;For each note you go up (including sharps), the frequency is multiplied by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2^(1/12)&lt;/code&gt;.
The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1/12&lt;/code&gt; is because there are 12 semitones in each octave when including the sharps.
The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2^&lt;/code&gt; is because frequencies double with each octave. Eg A4 is 440 Hz, and A5 is 880 Hz.&lt;/p&gt;

&lt;p&gt;When working with MIDI, each note gets a number representation: C4=60, C#4=61, D4=62, etc.
To convert from a midi note to a frequency, the formula is: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;440 * 2 ^ ((midiNote - 69) / 12)&lt;/code&gt;.&lt;/p&gt;
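&lt;p&gt;For example, plugging a few MIDI note numbers into that formula:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;A4 = 69: 440 * 2 ^ ((69 - 69) / 12) = 440 Hz
C4 = 60: 440 * 2 ^ ((60 - 69) / 12) = ~261.6 Hz (Middle C)
A5 = 81: 440 * 2 ^ ((81 - 69) / 12) = 880 Hz
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;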

&lt;h2 id=&quot;time&quot;&gt;Time&lt;/h2&gt;

&lt;p&gt;The time in the above formula is in seconds since the note started playing.
Since you’d typically be generating samples at a rate of 44100 or 48000 Hz, to convert from
the sample number to the time, this formula applies: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;time = sample / sampleRate&lt;/code&gt;.&lt;/p&gt;
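&lt;p&gt;For example, at a 44100 Hz sample rate, sample number 22050 corresponds to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;22050 / 44100 = 0.5&lt;/code&gt; seconds into the note.&lt;/p&gt;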

&lt;h2 id=&quot;pi&quot;&gt;Pi&lt;/h2&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2 * pi&lt;/code&gt; is necessary because sin repeats its output every multiple of 2 * pi on its input.
An interesting aside: some credible mathematicians argue that tau (2 * pi) should be taught to students
instead of pi, because pi so often needs doubling before use that the doubled value arguably deserves to be
the famous constant. See the &lt;a href=&quot;https://tauday.com/tau-manifesto&quot;&gt;Tau manifesto&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;modulator-frequency&quot;&gt;Modulator frequency&lt;/h2&gt;

&lt;p&gt;The modulator is the waveform that ‘modulates’ the fundamental frequency. Think of it as the &lt;a href=&quot;https://guitar.fandom.com/wiki/Whammy_bar&quot;&gt;whammy bar&lt;/a&gt; on a guitar being wiggled up and down quickly.&lt;/p&gt;

&lt;p&gt;Typically the modulator frequency is a whole-number multiple or fraction of the fundamental frequency. Eg for a fundamental of 440 Hz, the following modulator frequencies all sound ‘nice’: 110 (440/4), 146.7 (440/3), 220 (440/2), 440, 880, 1320, etc.&lt;/p&gt;

&lt;h2 id=&quot;envelopes&quot;&gt;Envelopes&lt;/h2&gt;

&lt;p&gt;The envelopes control the amplitude/volume of the carrier and modulator over time. A typical envelope starts at zero, rises quickly to 100%, falls to a sustained volume of perhaps 50% where it remains while the piano key is held, then gradually returns to 0 once the key is released.&lt;/p&gt;

&lt;p&gt;A common strategy is the &lt;a href=&quot;https://en.wikipedia.org/wiki/Envelope_(music)#ADSR&quot;&gt;ADSR envelope&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;During the attack stage: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amplitude = time / attackDuration&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;During decay stage: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amplitude = 1 - (time - attackDuration) / decayDuration * (1 - sustainAmplitude)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;During sustain stage: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amplitude = sustainAmplitude&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;During release stage: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;amplitude = sustainAmplitude * (1 - releasingTime / releaseDuration)&lt;/code&gt;, where releasingTime is the time since the key was released.&lt;/p&gt;
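&lt;p&gt;Putting the four stages together, the envelope can be sketched as a single function. This is only pseudocode: the durations and sustainAmplitude are the envelope&#8217;s settings, and releasingTime is assumed to be the time since the key was released (the release fades linearly from the sustain level to zero):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;function adsrAmplitude(time, keyIsHeld, releasingTime) {
    if keyIsHeld {
        if time &amp;lt; attackDuration {
            return time / attackDuration
        } else if time &amp;lt; attackDuration + decayDuration {
            return 1 - (time - attackDuration) / decayDuration * (1 - sustainAmplitude)
        } else {
            return sustainAmplitude
        }
    } else {
        return max(0, sustainAmplitude * (1 - releasingTime / releaseDuration))
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;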

&lt;h2 id=&quot;other-waves&quot;&gt;Other waves&lt;/h2&gt;

&lt;p&gt;To make more interesting sounds, other waveforms besides sine waves can be used.
Some common ones are square, triangle, and sawtooth.
Here are their formulae, each of which repeats every multiple of 1 on the input:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Sine &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;= sin(x * 2 * pi)&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Square &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;= 4 * floor(x) - 2 * floor(2 * x) + 1&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Triangle &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;= 2 * abs(2 * (x + 0.25 - floor(x + 0.75))) - 1&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Sawtooth &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;= 2 * (x - floor(x + 0.5))&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
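&lt;p&gt;Tying it all together, generating all the samples for one note might look like the following pseudocode, where the two envelope functions are ADSR envelopes with their own settings:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;for sample in 0 to totalSamples {
    time = sample / sampleRate
    modulation = sin(modulatorFrequency * time * 2 * pi) * modulatorEnvelope(time)
    samples[sample] = sin(carrierFrequency * time * 2 * pi + modulation) * carrierEnvelope(time)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;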

&lt;p&gt;So there you have it, the maths behind basic FM Synthesis. Thanks for reading, hope you found this fascinating, at least a tiny bit, God bless!&lt;/p&gt;

&lt;p&gt;Photo by Vackground on Unsplash&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Neural Networks from scratch #4: Training layers of neurons, backpropagation with pseudocode and a Rust demo</title>
   <link href="http://www.splinter.com.au/2024/07/10/neural-networks-4/"/>
   <updated>2024-07-10T00:00:00+10:00</updated>
   <id>http://www.splinter.com.au/2024/07/10/neural-networks-4/neural-networks-4</id>
   <content type="html">&lt;p&gt;&lt;img src=&quot;/images/2024/layers.jpg&quot; alt=&quot;Training layers of neurons&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Hi all, here’s the fourth on my series on neural networks / machine learning / AI from scratch. In the previous articles &lt;a href=&quot;/2024/03/10/neural-networks-1/index.html&quot;&gt;(please read them first!)&lt;/a&gt;, I explained how a single neuron works, then how to calculate the gradient of its weight and bias, and how you can use that gradient to train the neuron. In this article, I’ll explain how to determine the gradients when you have many layers of many neurons, and use those gradients to train the neural net.&lt;/p&gt;

&lt;p&gt;In my previous articles in this series, I used spreadsheets to make the maths easier to follow along. Unfortunately I don’t think I’ll be able to demonstrate this topic in a spreadsheet (I think it’d get out of hand), so I’ll keep it in code. I hope you can still follow along!&lt;/p&gt;

&lt;h2 id=&quot;data-model&quot;&gt;Data model&lt;/h2&gt;

&lt;p&gt;Pardon my pseudocode:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;class Net {
    layers: [Layer]
}

class Layer {
    neurons: [Neuron]
}

class Neuron {
    value: float
    bias: float
    weights: [float]
    activation_gradient: float
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;Layers:&lt;/em&gt; The neural net is made up of multiple layers. The first one in the array is the input layer, the last one is the output layer.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Neurons:&lt;/em&gt; The neurons that make up a layer. Each layer will typically have different numbers of neurons.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Value:&lt;/em&gt; The output of each neuron.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Bias:&lt;/em&gt; The bias of each neuron.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Weights:&lt;/em&gt; Input weights for each neuron. This array’s size will be the number of inputs to this layer. For the first layer, this will be the number of inputs (aka features) to the neural net. For subsequent layers, this will be the count of neurons in the previous layer.&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Activation Gradient:&lt;/em&gt; These are the gradients of each neuron, chained to the latter layers via the magic of calculus. This is also equal to the gradient of the bias. Reading my second article in this series may help you understand what this gradient means :)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;highish-level-explanation&quot;&gt;High(ish) level explanation&lt;/h2&gt;

&lt;p&gt;What we’re trying to achieve here is to use calculus to determine the ‘gradient’ of every bias and every weight in this neural net. In order to do this, we have to ‘back propagate’ these gradients from the back to the front of the ‘layers’ array.&lt;/p&gt;

&lt;p&gt;Concretely - if, say, we had 3 layers: we’d figure out the gradients of the activation functions of layers[2], then use those values to calculate the gradients of layers[1], and then layers[0].&lt;/p&gt;

&lt;p&gt;Once we have the gradients of the activation functions for each neuron in each layer, it’s easy to figure out the gradient of the weights and bias for each neuron.&lt;/p&gt;

&lt;p&gt;And, as demonstrated in my previous article, once we have the gradients, we can ‘nudge’ the weights and biases in the direction that their gradients say, thus train the neural net.&lt;/p&gt;

&lt;h2 id=&quot;steps&quot;&gt;Steps&lt;/h2&gt;

&lt;p&gt;Training and determining the gradients go hand-in-hand, as you need the inputs to calculate the values of each neuron in the net, and you need the targets (aka desired outputs) to determine the gradients. Thus it’s a three-step process:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Forward pass (calculate each Neuron.value)&lt;/li&gt;
  &lt;li&gt;Backpropagation (calculate each Neuron.activation_gradient)&lt;/li&gt;
  &lt;li&gt;Train the weights and biases (adjust each Neuron.bias and Neuron.weights)&lt;/li&gt;
&lt;/ul&gt;
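&lt;p&gt;At the top level, these three steps are simply repeated over the training data. In pseudocode, where the three functions are the passes detailed in this article:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;for iteration in 0 to training_iterations {
    (inputs, targets) = pick an example from the training data
    forward_pass(net, inputs)
    backward_pass(net, targets)
    training_pass(net, inputs)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;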

&lt;h2 id=&quot;forward-pass&quot;&gt;Forward pass&lt;/h2&gt;

&lt;p&gt;This pass fills in the ‘value’ fields.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The first layer’s neurons must have the same number of weights as the number of inputs.&lt;/li&gt;
  &lt;li&gt;Each neuron’s value is calculated as tanh(bias + sum(weights * inputs)).&lt;/li&gt;
  &lt;li&gt;Since tanh is used as the activation function, this neural net can only work with inputs, outputs, and targets in the range -1 to +1.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Forward pass pseudocode:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;for layer in layers, first to last {
    if this is the first layer {
        for neuron in layer.neurons {
            total = neuron.bias
            for weight in neuron.weights {
                total += weight * inputs[weight_index]
            }
            neuron.value = tanh(total)
        }
    } else {
        previous_layer = layers[layer_index - 1]
        for neuron in layer.neurons {
            total = neuron.bias
            for weight in neuron.weights {
                total += weight * previous_layer.neurons[weight_index].value
            }
            neuron.value = tanh(total)
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;backward-pass-aka-backpropagation&quot;&gt;Backward pass (aka backpropagation)&lt;/h2&gt;

&lt;p&gt;This fills in the ‘activation_gradient’ fields.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Note that when iterating the layers here, you must go last to first.&lt;/li&gt;
  &lt;li&gt;The ‘targets’ are the array of output value(s) from the training data.&lt;/li&gt;
  &lt;li&gt;The last layer must have the same number of neurons as the number of targets.&lt;/li&gt;
  &lt;li&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(1 - value^2) * ...&lt;/code&gt; terms apply the chain rule: since value = tanh(total), and the derivative of tanh(x) is 1 - tanh(x)^2, the derivative of each neuron’s activation is 1 - value^2.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Backward pass pseudocode:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;for layer in reversed layers, last to first {
    if this is the last layer {
        for neuron in layer.neurons {
            neuron.activation_gradient =
                (1 - neuron.value^2) *
                (neuron.value - targets[neuron_index])
        }
    } else {
        next_layer = layers[layer_index + 1]
        for this_layer_neuron in layer.neurons {
            next_layer_gradient_sum = 0
            for next_layer_neuron in next_layer.neurons {
                next_layer_gradient_sum +=
                    next_layer_neuron.activation_gradient * 
                    next_layer_neuron.weights[this_layer_neuron_index]
            }
            this_layer_neuron.activation_gradient =
                (1 - this_layer_neuron.value^2) *
                next_layer_gradient_sum
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;training-pass&quot;&gt;Training pass&lt;/h2&gt;

&lt;p&gt;Now that you have the gradients, you can adjust the biases/weights to train the net to perform better.&lt;/p&gt;

&lt;p&gt;I’ll skim over this as it’s covered in my earlier articles in this series. The gist of it is that, for each neuron, the gradient is calculated for the bias and every weight, and the bias/weights are adjusted a little to ‘descend the gradient’. Perhaps my pseudocode might make more sense:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;learning_rate = 0.01 // Aka 1%
for layer in layers {
    if this is the first layer {
        for neuron in layer.neurons {
            neuron.bias -= neuron.activation_gradient * learning_rate
            for weight in neuron.weights {
                gradient_for_this_weight = inputs[weight_index] *
                    neuron.activation_gradient
                weight -= gradient_for_this_weight * learning_rate
            }
        }
    } else {
        previous_layer = layers[layer_index - 1]
        for neuron in layer.neurons {
            neuron.bias -= neuron.activation_gradient * learning_rate
            for weight in neuron.weights {
                gradient_for_this_weight =
                    previous_layer.neurons[weight_index].value *
                    neuron.activation_gradient
                weight -= gradient_for_this_weight * learning_rate
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;rust-demo&quot;&gt;Rust demo&lt;/h2&gt;

&lt;p&gt;Because I’m a Rust tragic, here’s a demo. It’s kinda long, sorry, not sorry. It was fun to write :)&lt;/p&gt;

&lt;p&gt;This trains a neural network to calculate the area and circumference of a rectangle, given the width and height as inputs.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Width and height are scaled to the range 0.1 to 1, because that sits within the -1 to +1 range that the tanh activation function outputs.&lt;/li&gt;
  &lt;li&gt;Target values are also scaled to be in the range that tanh supports.&lt;/li&gt;
  &lt;li&gt;Initial biases and weights are randomly assigned.&lt;/li&gt;
&lt;/ul&gt;
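&lt;p&gt;As a sketch of the scaling (assuming, for illustration, that widths and heights range from 1 to 10):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;scaled_width  = width / 10            // 1..10 becomes 0.1..1
scaled_height = height / 10           // 1..10 becomes 0.1..1
scaled_area   = width * height / 100  // max area of 100 becomes at most 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;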

&lt;p&gt;🦀🦀🦀&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;use rand::Rng;

struct Net {
    layers: Vec&amp;lt;Layer&amp;gt;,
}

struct Layer {
    neurons: Vec&amp;lt;Neuron&amp;gt;,
}

struct Neuron {
    value: f64,
    bias: f64,
    weights: Vec&amp;lt;f64&amp;gt;,
    activation_gradient: f64
}

const LEARNING_RATE: f64 = 0.001;

fn main() {
    let mut rng = rand::thread_rng();

    // Make a 3,3,2 neural net that inputs the width and height of a rectangle,
    // and outputs the area and circumference.
    let mut net = Net {
        layers: vec![
            Layer { // First layer has 2 weights to suit the 2 inputs.
                neurons: vec![
                    Neuron {
                        value: 0.,
                        bias: rng.gen_range(-1. .. 1.),
                        weights: vec![
                            rng.gen_range(-1. .. 1.),
                            rng.gen_range(-1. .. 1.),
                        ],
                        activation_gradient: 0.,
                    },
                    Neuron {
                        value: 0.,
                        bias: rng.gen_range(-1. .. 1.),
                        weights: vec![
                            rng.gen_range(-1. .. 1.),
                            rng.gen_range(-1. .. 1.),
                        ],
                        activation_gradient: 0.,
                    },
                    Neuron {
                        value: 0.,
                        bias: rng.gen_range(-1. .. 1.),
                        weights: vec![
                            rng.gen_range(-1. .. 1.),
                            rng.gen_range(-1. .. 1.),
                        ],
                        activation_gradient: 0.,
                    },
                ],
            },
            Layer { // Second layer neurons have the same number of weights as the previous layer has neurons.
                neurons: vec![
                    Neuron {
                        value: 0.,
                        bias: rng.gen_range(-1. .. 1.),
                        weights: vec![
                            rng.gen_range(-1. .. 1.),
                            rng.gen_range(-1. .. 1.),
                            rng.gen_range(-1. .. 1.),
                        ],
                        activation_gradient: 0.,
                    },
                    Neuron {
                        value: 0.,
                        bias: rng.gen_range(-1. .. 1.),
                        weights: vec![
                            rng.gen_range(-1. .. 1.),
                            rng.gen_range(-1. .. 1.),
                            rng.gen_range(-1. .. 1.),
                        ],
                        activation_gradient: 0.,
                    },
                    Neuron {
                        value: 0.,
                        bias: rng.gen_range(-1. .. 1.),
                        weights: vec![
                            rng.gen_range(-1. .. 1.),
                            rng.gen_range(-1. .. 1.),
                            rng.gen_range(-1. .. 1.),
                        ],
                        activation_gradient: 0.,
                    },
                ],
            },
            Layer { // Last layer has 2 neurons to suit 2 outputs.
                neurons: vec![
                    Neuron {
                        value: 0.,
                        bias: rng.gen_range(-1. .. 1.),
                        weights: vec![
                            rng.gen_range(-1. .. 1.),
                            rng.gen_range(-1. .. 1.),
                            rng.gen_range(-1. .. 1.),
                        ],
                        activation_gradient: 0.,
                    },
                    Neuron {
                        value: 0.,
                        bias: rng.gen_range(-1. .. 1.),
                        weights: vec![
                            rng.gen_range(-1. .. 1.),
                            rng.gen_range(-1. .. 1.),
                            rng.gen_range(-1. .. 1.),
                        ],
                        activation_gradient: 0.,
                    },
                ],
            },
        ],
    };

    // Train.
    let mut cumulative_error_counter: i64 = 0; // These vars are for averaging the errors.
    let mut area_error_percent_sum: f64 = 0.;
    let mut circumference_error_percent_sum: f64 = 0.;
    for training_iteration in 0..100_000_000 {
        // Inputs:
        let width: f64 = rng.gen_range(0.1 .. 1.);
        let height: f64 = rng.gen_range(0.1 .. 1.);
        let inputs: Vec&amp;lt;f64&amp;gt; = vec![width, height];

        // Targets (eg desired outputs):
        let area = width * height;
        let circumference_scaled = (height * 2. + width * 2.) * 0.25; // Scaled by 0.25 so it&apos;ll always be in range 0..1.
        let targets: Vec&amp;lt;f64&amp;gt; = vec![area, circumference_scaled];

        // Forward pass!
        for layer_index in 0..net.layers.len() {
            if layer_index == 0 {
                let layer = &amp;amp;mut net.layers[layer_index];
                for neuron in &amp;amp;mut layer.neurons {
                    let mut total = neuron.bias;
                    for (weight_index, weight) in neuron.weights.iter().enumerate() {
                        total += weight * inputs[weight_index];
                    }
                    neuron.value = total.tanh();
                }
            } else {
                // Workaround for Rust not allowing you to borrow two different vec elements simultaneously.
                let previous_layer: &amp;amp;Layer;
                unsafe { previous_layer = &amp;amp; *net.layers.as_ptr().add(layer_index - 1) }
                let layer = &amp;amp;mut net.layers[layer_index];
                for neuron in &amp;amp;mut layer.neurons {
                    let mut total = neuron.bias;
                    for (weight_index, weight) in neuron.weights.iter().enumerate() {
                        total += weight * previous_layer.neurons[weight_index].value;
                    }
                    neuron.value = total.tanh();
                }
            }
        }

        // Let&apos;s check the results!
        let outputs: Vec&amp;lt;f64&amp;gt; = net.layers.last().unwrap().neurons
            .iter().map(|n| n.value).collect();
        let area_error_percent = (targets[0] - outputs[0]).abs() / targets[0] * 100.;
        let circumference_error_percent = (targets[1] - outputs[1]).abs() / targets[1] * 100.;
        area_error_percent_sum += area_error_percent;
        circumference_error_percent_sum += circumference_error_percent;
        cumulative_error_counter += 1;
        if training_iteration % 10_000_000 == 0 {
            println!(&quot;Iteration {} errors: area {:.3}%, circumference: {:.3}% (smaller = better)&quot;,
                training_iteration,
                area_error_percent_sum / cumulative_error_counter as f64,
                circumference_error_percent_sum / cumulative_error_counter as f64);
            area_error_percent_sum = 0.;
            circumference_error_percent_sum = 0.;
            cumulative_error_counter = 0;
        }

        // Backward pass! (aka backpropagation)
        let layers_len = net.layers.len();
        for layer_index in (0..layers_len).rev() { // Reverse the order.
            if layer_index == layers_len - 1 { // Last layer.
                let layer = &amp;amp;mut net.layers[layer_index];
                for (neuron_index, neuron) in layer.neurons.iter_mut().enumerate() {
                    neuron.activation_gradient =
                        (1. - neuron.value * neuron.value) *
                        (neuron.value - targets[neuron_index]);
                }
            } else {
                // Workaround for Rust not allowing you to borrow two different vec elements simultaneously.
                let next_layer: &amp;amp;Layer;
                unsafe { next_layer = &amp;amp; *net.layers.as_ptr().add(layer_index + 1) }
                let layer = &amp;amp;mut net.layers[layer_index];
                for (this_layer_neuron_index, this_layer_neuron) in layer.neurons.iter_mut().enumerate() {
                    let mut next_layer_gradient_sum: f64 = 0.;
                    for next_layer_neuron in &amp;amp;next_layer.neurons {
                        next_layer_gradient_sum +=
                            next_layer_neuron.activation_gradient * 
                            next_layer_neuron.weights[this_layer_neuron_index];
                    }
                    this_layer_neuron.activation_gradient =
                        (1. - this_layer_neuron.value * this_layer_neuron.value) *
                        next_layer_gradient_sum;
                }
            }
        }

        // Training pass!
        for layer_index in 0..net.layers.len() {
            if layer_index == 0 {
                let layer = &amp;amp;mut net.layers[layer_index];
                for neuron in &amp;amp;mut layer.neurons {
                    neuron.bias -= neuron.activation_gradient * LEARNING_RATE;
                    for (weight_index, weight) in neuron.weights.iter_mut().enumerate() {
                        let gradient_for_this_weight =
                            inputs[weight_index] *
                            neuron.activation_gradient;
                        *weight -= gradient_for_this_weight * LEARNING_RATE;
                    }
                }
            } else {
                // Workaround for Rust not allowing you to borrow two different vec elements simultaneously.
                let previous_layer: &amp;amp;Layer;
                unsafe { previous_layer = &amp;amp; *net.layers.as_ptr().add(layer_index - 1) }
                let layer = &amp;amp;mut net.layers[layer_index];
                for neuron in &amp;amp;mut layer.neurons {
                    neuron.bias -= neuron.activation_gradient * LEARNING_RATE;
                    for (weight_index, weight) in neuron.weights.iter_mut().enumerate() {
                        let gradient_for_this_weight =
                            previous_layer.neurons[weight_index].value *
                            neuron.activation_gradient;
                        *weight -= gradient_for_this_weight * LEARNING_RATE;
                    }
                }
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Which outputs:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Iteration 0 errors: area 223.106%, circumference: 13.175% (smaller = better)
Iteration 10000000 errors: area 17.861%, circumference: 1.123% (smaller = better)
Iteration 20000000 errors: area 14.656%, circumference: 0.790% (smaller = better)
Iteration 30000000 errors: area 14.516%, circumference: 0.698% (smaller = better)
Iteration 40000000 errors: area 6.359%, circumference: 0.882% (smaller = better)
Iteration 50000000 errors: area 2.966%, circumference: 0.875% (smaller = better)
Iteration 60000000 errors: area 2.769%, circumference: 0.807% (smaller = better)
Iteration 70000000 errors: area 2.600%, circumference: 0.698% (smaller = better)
Iteration 80000000 errors: area 2.401%, circumference: 0.573% (smaller = better)
Iteration 90000000 errors: area 2.166%, circumference: 0.468% (smaller = better)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can see the error percentage drop as it ‘learns’ to calculate the area and circumference of a rectangle. Magic!&lt;/p&gt;

&lt;p&gt;Thanks for reading, hope you found this helpful, at least a tiny bit, God bless!&lt;/p&gt;

&lt;p&gt;Photo by Jonas Hensel on Unsplash&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Previewable SwiftUI ViewModels</title>
   <link href="http://www.splinter.com.au/2024/05/16/previewable-swiftui-viewmodels/"/>
   <updated>2024-05-16T00:00:00+10:00</updated>
   <id>http://www.splinter.com.au/2024/05/16/previewable-swiftui-viewmodels/previewable-swiftui-viewmodels</id>
   <content type="html">&lt;p&gt;&lt;img src=&quot;/images/2024/viewmodels.jpg&quot; alt=&quot;Previewable SwiftUI ViewModels&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Hi all, I’d like to talk about a way to setup your ViewModels in SwiftUI to make previews easy:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;A)&lt;/strong&gt; Decouple your ViewModels from your Views.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;B)&lt;/strong&gt; Replace your ViewModel when previewing.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;C)&lt;/strong&gt; Easily inject any ViewState content when previewing.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;D)&lt;/strong&gt; Test your ViewModels without needing a View, instead testing their ViewState.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’ve used a variant of this (simplified a little here) with a big team before, so I know it’s battle-proven. But of course this may be more helpful as a starting point for you, too.&lt;/p&gt;

&lt;p&gt;The general idea is this: Have a ‘ViewModel’ protocol, and make your Views have a generic constraint to accept any ViewModel that uses that view’s specific state/events, and use a preview viewmodel that adheres to the protocol.&lt;/p&gt;

&lt;h2 id=&quot;one-time-boilerplate&quot;&gt;One-time boilerplate&lt;/h2&gt;

&lt;p&gt;So here’s the generic ViewModel protocol that every screen will re-use.
ViewEvent is typically an enum, used by the View to send e.g. button presses to the ViewModel.
ViewState is the struct used to push the loaded/loading/error/whatever state to the View.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;protocol ViewModel&amp;lt;ViewEvent, ViewState&amp;gt;: ObservableObject {
    associatedtype ViewEvent
    associatedtype ViewState

    // For communication in the VM -&amp;gt; View direction:
    var viewState: ViewState { get set }

    // For communication in the View -&amp;gt; VM direction:
    func handle(event: ViewEvent)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Somewhere you’ll have a ‘preview’ viewmodel.
This is declared once and used by all screens you want to preview.
I’m a fan of putting your preview code in a conditional compilation statement.
Note that this allows you to inject any viewstate you like.
Is ‘preview view’ a tautology? Should this be called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PreviewModel&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PreViewModel&lt;/code&gt;? Flip a coin to decide…&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#if targetEnvironment(simulator)
class PreviewViewModel&amp;lt;ViewEvent, ViewState&amp;gt;: ViewModel {
    @Published var viewState: ViewState

    init(viewState: ViewState) {
        self.viewState = viewState
    }

    func handle(event: ViewEvent) {
        print(&quot;Event: \(event)&quot;)
    }
}
#endif
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;view&quot;&gt;View&lt;/h2&gt;

&lt;p&gt;Before I show the view, I’ll introduce the event and states.
Firstly the event enum, this is the single ‘pipe’ via which the View calls through to the ViewModel (aspirationally… 2-way bindings sidestep this).
You will likely have associated values on some of these, eg the id of which row was pressed, that kind of thing:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;enum FooViewEvent {
    case hello
    case goodbye
    case present
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next is the ViewState. This controls what is displayed.
Typically you might have a loading/loaded/error enum in here, among other things.
Notice there’s an ‘xIsPresented’ var here that is used in a 2-way-binding later for modal presentation:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;struct FooViewState: Equatable {
    var text: String
    var sheetIsPresented: Bool = false
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Ok, now the state and event are out of the way, here’s how a view might look.
Note the gnarly generic clause up the top, this is the trickiest part of this whole technique to be honest.
Basically it’s saying ‘I can accept any ViewModel that uses this particular screen’s event/state’.
Also note the 2-way binding for the modal sheet: even though this somewhat side-steps the idea of piping all input/output through the event/state concept, it’s very SwiftUI-idiomatic to use these bindings, so I don’t want to be overly rigid and make life difficult: we want to avoid ‘cutting against the grain’ when working with SwiftUI. So, yeah, this isn’t architecturally pure, but it is productive!&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;struct FooView&amp;lt;VM: ViewModel&amp;gt;: View
where VM.ViewEvent == FooViewEvent,
      VM.ViewState == FooViewState
{
    @StateObject var viewModel: VM

    var body: some View {
        VStack {
            Text(viewModel.viewState.text)
            Button(&quot;Hello&quot;) {
                viewModel.handle(event: .hello)
            }
            Button(&quot;Goodbye&quot;) {
                viewModel.handle(event: .goodbye)
            }
            Button(&quot;Present modal sheet&quot;) {
                viewModel.handle(event: .present)
            }
        }
        .sheet(isPresented: $viewModel.viewState.sheetIsPresented) {
            Text(&quot;This is a modal sheet!&quot;)
                .presentationDetents([.medium])
                .presentationDragIndicator(.visible)
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;viewmodel&quot;&gt;ViewModel&lt;/h2&gt;

&lt;p&gt;Last but not least is the ViewModel for this screen.
Note that because viewState is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@Published&lt;/code&gt;, and ViewModel is a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@StateObject&lt;/code&gt;, any updates to viewState are magically automatically applied to the View. It’s really simple, no Combine required!
Also note the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xIsPresented&lt;/code&gt; is trivial to set to true to present something, far simpler than using some form of router which I fear can be convoluted.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;class FooViewModel: ViewModel {
    @Published var viewState: FooViewState

    init() {
        viewState = FooViewState(
            text: &quot;Nothing has happened yet.&quot;
        )
    }

    func handle(event: FooViewEvent) {
        switch event {
        case .hello:
            viewState.text = &quot;👋&quot;
        case .goodbye:
            viewState.text = &quot;😢&quot;
        case .present:
            viewState.sheetIsPresented = true
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;previews&quot;&gt;Previews&lt;/h2&gt;

&lt;p&gt;At the bottom of the view file you’ll want your previews.
By using the PreviewViewModel you can inject whatever ViewState you like:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#if targetEnvironment(simulator)
#Preview {
    FooView(
        viewModel: PreviewViewModel(
            viewState: FooViewState(
                text: &quot;This is a preview!&quot;
            )
        )
    )
}
#endif
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I hope this helps you use SwiftUI in a preview-friendly way! SwiftUI without previews is the pits…&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://gist.github.com/chrishulbert/9a21635a581e044f86e3ccc1d56010a6&quot;&gt;The source for this is on this github gist here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading, hope you found this helpful, at least a tiny bit, God bless!&lt;/p&gt;

&lt;p&gt;Photo by Yahya Gopalani on Unsplash
Font by Khurasan on Dafont&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Neural Networks explained with spreadsheets, 3: Training a single neuron</title>
   <link href="http://www.splinter.com.au/2024/04/22/neural-networks-3/"/>
   <updated>2024-04-22T00:00:00+10:00</updated>
   <id>http://www.splinter.com.au/2024/04/22/neural-networks-3/neural-networks-3</id>
   <content type="html">&lt;p&gt;&lt;img src=&quot;/images/2024/training.jpg&quot; alt=&quot;Training a single neuron&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Hi all, here’s the third in my series on neural networks / machine learning / AI from scratch. In the previous articles &lt;a href=&quot;/2024/03/10/neural-networks-1/index.html&quot;&gt;(please read them first!)&lt;/a&gt;, I explained how a single neuron works, and how to calculate the gradients of its weight and bias. In this article, I’ll explain how you can use those gradients to train the neuron.&lt;/p&gt;

&lt;h2 id=&quot;spreadsheet&quot;&gt;Spreadsheet&lt;/h2&gt;

&lt;p&gt;I recommend opening this spreadsheet in a separate tab, and viewing it as you read this post which explains the
maths: &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1nSrsC1W1A_BQlJDi3nSpJ9bqJCWjRicNoX30O2aPVNY/edit?usp=sharing&quot;&gt;Single neuron training&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In case the linked spreadsheet is lost to posterity, here it is in slightly less well-formatted form
(note: for brevity’s sake, I’ve shortened references such as B2 to simply ‘B’ when referring to a column in the same row):&lt;/p&gt;

&lt;div class=&quot;my_spreadsheet_table_is_next&quot;&gt;&lt;/div&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;A&lt;/th&gt;
      &lt;th&gt;B&lt;/th&gt;
      &lt;th&gt;C&lt;/th&gt;
      &lt;th&gt;D&lt;/th&gt;
      &lt;th&gt;E&lt;/th&gt;
      &lt;th&gt;F&lt;/th&gt;
      &lt;th&gt;G&lt;/th&gt;
      &lt;th&gt;H&lt;/th&gt;
      &lt;th&gt;I&lt;/th&gt;
      &lt;th&gt;J&lt;/th&gt;
      &lt;th&gt;K&lt;/th&gt;
      &lt;th&gt;L&lt;/th&gt;
      &lt;th&gt;M&lt;/th&gt;
      &lt;th&gt;N&lt;/th&gt;
      &lt;th&gt;O&lt;/th&gt;
      &lt;th&gt;P&lt;/th&gt;
      &lt;th&gt;Q&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;th&gt;Learning rate&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Training&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Neuron&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Outputs&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;th&gt;0.1&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;In&lt;/th&gt;
      &lt;th&gt;Out&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Input&lt;/th&gt;
      &lt;th&gt;Weight&lt;/th&gt;
      &lt;th&gt;Weight gradient&lt;/th&gt;
      &lt;th&gt;Bias&lt;/th&gt;
      &lt;th&gt;Bias gradient&lt;/th&gt;
      &lt;th&gt;Net&lt;/th&gt;
      &lt;th&gt;Output&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Target&lt;/th&gt;
      &lt;th&gt;Attempt&lt;/th&gt;
      &lt;th&gt;Error&lt;/th&gt;
      &lt;th&gt;Loss&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;0.01&lt;/td&gt;
      &lt;td&gt;0.1 (C*10)&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;0.01 (C)&lt;/td&gt;
      &lt;td&gt;0.5&lt;/td&gt;
      &lt;td&gt;J * F&lt;/td&gt;
      &lt;td&gt;0.5&lt;/td&gt;
      &lt;td&gt;P * (1-L²)&lt;/td&gt;
      &lt;td&gt;F*G+I&lt;/td&gt;
      &lt;td&gt;Tanh(K)&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;0.1 (D)&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;L-N&lt;/td&gt;
      &lt;td&gt;P² / 2&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;4&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;0.01&lt;/td&gt;
      &lt;td&gt;0.1 (C*10)&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;0.01 (C)&lt;/td&gt;
      &lt;td&gt;G3 - H3 * LEARNING_RATE&lt;/td&gt;
      &lt;td&gt;J * F&lt;/td&gt;
      &lt;td&gt;I3 - J3 * LEARNING_RATE&lt;/td&gt;
      &lt;td&gt;P * (1-L²)&lt;/td&gt;
      &lt;td&gt;F*G+I&lt;/td&gt;
      &lt;td&gt;Tanh(K)&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;0.1 (D)&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;L-N&lt;/td&gt;
      &lt;td&gt;P² / 2&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;5&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;0.01&lt;/td&gt;
      &lt;td&gt;0.1 (C*10)&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;0.01 (C)&lt;/td&gt;
      &lt;td&gt;G4 - H4 * LEARNING_RATE&lt;/td&gt;
      &lt;td&gt;J * F&lt;/td&gt;
      &lt;td&gt;I4 - J4 * LEARNING_RATE&lt;/td&gt;
      &lt;td&gt;P * (1-L²)&lt;/td&gt;
      &lt;td&gt;F*G+I&lt;/td&gt;
      &lt;td&gt;Tanh(K)&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;0.1 (D)&lt;/td&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;L-N&lt;/td&gt;
      &lt;td&gt;P² / 2&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;style&gt;
    div.my_spreadsheet_table_is_next + table td,th {
        padding: 0.1em;
        border: 1px solid #000;
    }
&lt;/style&gt;

&lt;h2 id=&quot;high-level-explanation&quot;&gt;High level explanation&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Note: “Parameters” is the umbrella term for “weights and biases”.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Row 3 starts with any old values for the parameters.&lt;/li&gt;
  &lt;li&gt;Row 4 optimises the parameters a little to decrease the error.&lt;/li&gt;
  &lt;li&gt;Rows 5–1000 repeat this optimisation process, aka ‘gradient descent’.&lt;/li&gt;
  &lt;li&gt;Eventually the optimised parameters will produce the output we want!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;detailed-explanation&quot;&gt;Detailed explanation&lt;/h2&gt;

&lt;p&gt;A2 is the ‘learning rate’. This governs how much we ‘nudge’ our weight/bias each iteration. In this example it’s 10%, higher than the more common 0.1%–1%.&lt;/p&gt;

&lt;p&gt;Columns C-D are the ‘training data’. In this example we want to train the neuron to multiply by 10.&lt;/p&gt;

&lt;p&gt;Columns F-L are the neuron maths, as covered by my earlier articles. The two gradients in particular are tricky and important: They dictate which direction the bias/weight should respectively be ‘nudged’ to decrease the error.&lt;/p&gt;

&lt;p&gt;Columns N-Q are the outputs, and useful for producing the neat graph you’ll hopefully see in the actual spreadsheet, which demonstrates how the error decreases over the iterations.&lt;/p&gt;

&lt;p&gt;Row 3 is the initial data. At this point in a real implementation we would typically choose random values for the initial bias and weight, however I’ve chosen 0.5 to start with because it’s a nice round number.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🧨💣💥 Rows 4+ are the same as row 3, except that the parameters have some of their gradient subtracted each time. (this is the important bit)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Incidentally, this might help explain why training a NN uses far more computation than running it: all those gradient calculations, repeated over many iterations of training data.&lt;/p&gt;

&lt;p&gt;And there you have it, that’s how to use the gradients to train a single neuron. Next I’ll explain how to calculate the gradients for a network of them!&lt;/p&gt;

&lt;h2 id=&quot;rust-demo&quot;&gt;Rust demo&lt;/h2&gt;

&lt;p&gt;Because I’m a Rust tragic, here’s a demo:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;const LEARNING_RATE: f64 = 0.01;
const TRAINING_INPUT: f64 = 0.01;
const TRAINING_OUTPUT: f64 = 0.1;

fn main() {
    // Initial parameters.
    let mut weight: f64 = 0.5;
    let mut bias: f64 = 0.5;

    // Train.
    for _ in 0..100_000 {
        let net = TRAINING_INPUT * weight + bias;
        let output = net.tanh();
        let error = output - TRAINING_OUTPUT;
        let loss = error * error / 2.;
        let bias_gradient = error * (1. - output * output);
        let weight_gradient = bias_gradient * TRAINING_INPUT;
        weight -= weight_gradient * LEARNING_RATE;
        bias -= bias_gradient * LEARNING_RATE;
    }

    // Use the trained parameters:
    let trained_net = TRAINING_INPUT * weight + bias;
    let trained_output = trained_net.tanh();
    println!(&quot;Trained output: {}&quot;, trained_output);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Which outputs:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Trained output: 0.1000000000000007
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Which matches the training output nicely!&lt;/p&gt;

&lt;p&gt;Thanks for reading, hope you found this helpful, at least a tiny bit, God bless!&lt;/p&gt;

&lt;p&gt;Photo by Eugene Golovesov on Unsplash&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Neural Networks explained with spreadsheets, 2: Gradients for a single neuron</title>
   <link href="http://www.splinter.com.au/2024/03/20/neural-networks-2/"/>
   <updated>2024-03-20T00:00:00+11:00</updated>
   <id>http://www.splinter.com.au/2024/03/20/neural-networks-2/neural-networks-2</id>
   <content type="html">&lt;p&gt;&lt;img src=&quot;/images/2024/gradients.jpg&quot; alt=&quot;Gradients for a single neuron&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Hi all, here’s the second in my series on neural networks / machine learning / AI from scratch. In the previous article &lt;a href=&quot;/2024/03/10/neural-networks-1/index.html&quot;&gt;(please read it first!)&lt;/a&gt;, I explained how a single neuron works. In this article, I’ll explain how you can determine the ‘gradients’ of that neuron, in other words how much effect the weight and bias each have on the final ‘loss’, using some high-school calculus. This is a prerequisite for training, which I’ll cover later.&lt;/p&gt;

&lt;h2 id=&quot;spreadsheet&quot;&gt;Spreadsheet&lt;/h2&gt;

&lt;p&gt;I recommend opening this spreadsheet in a separate tab, and viewing it as you read this post which explains the
maths: &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1LPj7aTkAUWww4hIQIpqcL_iNi0FYz1yWEriR3ivTQdw/edit?usp=sharing&quot;&gt;Single neuron gradients&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In case the linked spreadsheet is lost to posterity, here it is in slightly less well-formatted form
(note: for brevity’s sake, I’ve shortened references such as B2 to simply ‘B’ when referring to a column in the same row):&lt;/p&gt;

&lt;div class=&quot;my_spreadsheet_table_is_next&quot;&gt;&lt;/div&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;A&lt;/th&gt;
      &lt;th&gt;B&lt;/th&gt;
      &lt;th&gt;C&lt;/th&gt;
      &lt;th&gt;D&lt;/th&gt;
      &lt;th&gt;E&lt;/th&gt;
      &lt;th&gt;F&lt;/th&gt;
      &lt;th&gt;G&lt;/th&gt;
      &lt;th&gt;H&lt;/th&gt;
      &lt;th&gt;I&lt;/th&gt;
      &lt;th&gt;J&lt;/th&gt;
      &lt;th&gt;K&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Input&lt;/th&gt;
      &lt;th&gt;Weight&lt;/th&gt;
      &lt;th&gt;Bias&lt;/th&gt;
      &lt;th&gt;Net&lt;/th&gt;
      &lt;th&gt;Output&lt;/th&gt;
      &lt;th&gt;Target&lt;/th&gt;
      &lt;th&gt;Error&lt;/th&gt;
      &lt;th&gt;Loss&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;Neuron maths:&lt;/td&gt;
      &lt;td&gt;0.4&lt;/td&gt;
      &lt;td&gt;0.5&lt;/td&gt;
      &lt;td&gt;0.6&lt;/td&gt;
      &lt;td&gt;0.8 (B*C+D)&lt;/td&gt;
      &lt;td&gt;0.664 (tanh(E))&lt;/td&gt;
      &lt;td&gt;0.7&lt;/td&gt;
      &lt;td&gt;-0.035963 (F-G)&lt;/td&gt;
      &lt;td&gt;0.0006467 (H^2 / 2)&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;3&lt;/td&gt;
      &lt;td&gt;Real local gradients:&lt;/td&gt;
      &lt;td&gt;0.5 (C2)&lt;/td&gt;
      &lt;td&gt;0.4 (B2)&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;0.5591 (1-F2^2)&lt;/td&gt;
      &lt;td&gt;-0.036 (H2)&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;4&lt;/td&gt;
      &lt;td&gt;Real global gradients:&lt;/td&gt;
      &lt;td&gt;-0.0101 (B3*E)&lt;/td&gt;
      &lt;td&gt;-0.0080 (C3*E)&lt;/td&gt;
      &lt;td&gt;-0.0201 (E)&lt;/td&gt;
      &lt;td&gt;-0.0201 (E3*F)&lt;/td&gt;
      &lt;td&gt;-0.036 (F3)&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;5&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;Faux gradient&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;6&lt;/td&gt;
      &lt;td&gt;Faux gradient of ‘output’:&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;0.66414 (F2+Tiny)&lt;/td&gt;
      &lt;td&gt;0.7&lt;/td&gt;
      &lt;td&gt;-0.035863 (F-G)&lt;/td&gt;
      &lt;td&gt;0.0006431 (H^2 / 2)&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;-0.0359 ((I - I2)/Tiny)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;7&lt;/td&gt;
      &lt;td&gt;Faux gradient of ‘net’:&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;0.8001 (E2+Tiny)&lt;/td&gt;
      &lt;td&gt;0.66409 (tanh(E))&lt;/td&gt;
      &lt;td&gt;0.7&lt;/td&gt;
      &lt;td&gt;-0.035907 (F-G)&lt;/td&gt;
      &lt;td&gt;0.0006447 (H^2 / 2)&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;-0.0201 ((I - I2)/Tiny)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;8&lt;/td&gt;
      &lt;td&gt;Faux gradient of ‘bias’:&lt;/td&gt;
      &lt;td&gt;0.4&lt;/td&gt;
      &lt;td&gt;0.5&lt;/td&gt;
      &lt;td&gt;0.6001 (D2+Tiny)&lt;/td&gt;
      &lt;td&gt;0.8001 (B*C+D)&lt;/td&gt;
      &lt;td&gt;0.66409 (tanh(E))&lt;/td&gt;
      &lt;td&gt;0.7&lt;/td&gt;
      &lt;td&gt;-0.035907 (F-G)&lt;/td&gt;
      &lt;td&gt;0.0006447 (H^2 / 2)&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;-0.0201 ((I - I2)/Tiny)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;9&lt;/td&gt;
      &lt;td&gt;Faux gradient of ‘weight’:&lt;/td&gt;
      &lt;td&gt;0.4&lt;/td&gt;
      &lt;td&gt;0.5001 (C2+Tiny)&lt;/td&gt;
      &lt;td&gt;0.6&lt;/td&gt;
      &lt;td&gt;0.80004 (B*C+D)&lt;/td&gt;
      &lt;td&gt;0.66406 (tanh(E))&lt;/td&gt;
      &lt;td&gt;0.7&lt;/td&gt;
      &lt;td&gt;-0.035941 (F-G)&lt;/td&gt;
      &lt;td&gt;0.0006459 (H^2 / 2)&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;-0.0080 ((I - I2)/Tiny)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;Faux gradient of ‘input’:&lt;/td&gt;
      &lt;td&gt;0.4001 (B2+Tiny)&lt;/td&gt;
      &lt;td&gt;0.5&lt;/td&gt;
      &lt;td&gt;0.6&lt;/td&gt;
      &lt;td&gt;0.80005 (B*C+D)&lt;/td&gt;
      &lt;td&gt;0.66406 (tanh(E))&lt;/td&gt;
      &lt;td&gt;0.7&lt;/td&gt;
      &lt;td&gt;-0.035935 (F-G)&lt;/td&gt;
      &lt;td&gt;0.0006457 (H^2 / 2)&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt;-0.0100 ((I - I2)/Tiny)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Tiny&lt;/td&gt;
      &lt;td&gt;0.0001&lt;/td&gt;
      &lt;td&gt;Moved down here to help with readability&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;style&gt;
    div.my_spreadsheet_table_is_next + table td,th {
        padding: 0.3em;
        border: 1px solid #000;
    }
&lt;/style&gt;

&lt;h2 id=&quot;what-is-the-gradient&quot;&gt;What is the gradient?&lt;/h2&gt;

&lt;p&gt;Firstly: what is the gradient? It is also known as the slope or derivative of a function (or, in the analogy below, a velocity).&lt;/p&gt;

&lt;p&gt;For a simple example, consider tides in a river mouth:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;At high tide (maximum position), the water is still (0 velocity).&lt;/li&gt;
  &lt;li&gt;Then, half-way from high to low tide (0 position), the water is rushing out and the level is falling fastest (maximum negative velocity).
This is the time when the waves are biggest and my friend almost drowned the other day on his jet ski, but that’s a story for another day!&lt;/li&gt;
  &lt;li&gt;Then, at low tide (minimum position), the water is still again (0 velocity).&lt;/li&gt;
  &lt;li&gt;Then, half-way from low to high tide (0 position again), the water is rushing in and the level is rising fastest (maximum positive velocity).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this analogy, the height of the water is the position (like the values for the weights, bias, net, output, or loss),
and the velocity of the water is the &lt;em&gt;gradient&lt;/em&gt; (or derivative, or slope). Figuring out that gradient is what this article is all about.&lt;/p&gt;

&lt;p&gt;For a more thorough explanation of gradients, &lt;a href=&quot;https://en.wikipedia.org/wiki/Slope#Calculus&quot;&gt;check out Wikipedia&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;why-do-we-want-to-know-the-gradients&quot;&gt;Why do we want to know the gradients?&lt;/h2&gt;

&lt;p&gt;The reason we want the gradients of a neuron’s weight(s) and bias, is that we can use them to figure out whether we need to nudge
their values up or down a bit or leave them as-is, in order to get an output that’s closer to the target during training.&lt;/p&gt;

&lt;h2 id=&quot;faking-a-gradient&quot;&gt;Faking a gradient&lt;/h2&gt;

&lt;p&gt;You can fake a gradient by comparing the result of an equation vs the result when adding a tiny amount to the input.
These faux gradients are helpful for verifying our calculus later.&lt;/p&gt;

&lt;p&gt;Here’s the general way to fake a gradient:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Faux gradient of f(x) = ( f(x + tiny) - f(x) ) / tiny
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
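&lt;p&gt;As a quick sketch of that idea in Rust (using an example function of my own, x^2, rather than anything from the neuron):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// Faux (finite-difference) gradient of any f at x: (f(x + tiny) - f(x)) / tiny.
fn faux_gradient(f: impl Fn(f64) -&amp;gt; f64, x: f64) -&amp;gt; f64 {
    let tiny = 0.0001;
    (f(x + tiny) - f(x)) / tiny
}

fn main() {
    // Calculus says the gradient of x^2 at x = 3 is 2x = 6; the faux version gets close.
    println!(&quot;{:.3}&quot;, faux_gradient(|x| x * x, 3.0));
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;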

&lt;p&gt;To make it more specific to our neuron:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Faux gradient of how weight affects output = (
    tanh(input * (weight + tiny) + bias) -
    tanh(input * weight + bias)
) / tiny
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Or the big kahuna, run all the way through the loss function:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Faux gradient of how bias affects loss = (
    (tanh(input * weight + (bias + tiny)) - target)^2 / 2 
    -
    (tanh(input * weight + bias) - target)^2 / 2
) / tiny
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
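&lt;p&gt;Here’s a small Rust sketch of that bias example, plugging in the spreadsheet’s values (input 0.4, weight 0.5, bias 0.6, target 0.7):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;fn loss(input: f64, weight: f64, bias: f64, target: f64) -&amp;gt; f64 {
    let output = (input * weight + bias).tanh();
    let error = output - target;
    error * error / 2.0
}

fn main() {
    let tiny = 0.0001;
    // Nudge only the bias, and see how much the loss moves per unit of nudge.
    let faux = (loss(0.4, 0.5, 0.6 + tiny, 0.7) - loss(0.4, 0.5, 0.6, 0.7)) / tiny;
    println!(&quot;Faux bias gradient: {:.4}&quot;, faux); // Matches the spreadsheet: -0.0201
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;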

&lt;p&gt;Please note that the loss function has changed since the previous article (it now has a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/ 2&lt;/code&gt;) - this makes the calculus simpler.&lt;/p&gt;

&lt;p&gt;You can look at rows 6 through 10 in the spreadsheet to see how these faux gradients are calculated. In columns B to I, various
things have the tiny value added to them, to see how this affects the final ‘loss’. For instance, on row 6, you can see I’m adding
the tiny value to the output, feeding that through to the loss function, and then applying &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(loss with tiny - loss without tiny) / tiny&lt;/code&gt; to
calculate the faux gradient. The rest of these faux gradients are similar.&lt;/p&gt;

&lt;h2 id=&quot;real-gradients-with-calculus&quot;&gt;Real gradients with calculus&lt;/h2&gt;

&lt;p&gt;Let’s use calculus to calculate the real gradients. Firstly we need to calculate the ‘local’ gradients. See row 3 in the spreadsheet as you follow along:&lt;/p&gt;

&lt;p&gt;What is a local gradient? Since all our calculations are performed in stages (eg net &amp;gt; output &amp;gt; error &amp;gt; loss), a local gradient is how much impact changes in one stage have on the next stage.&lt;/p&gt;

&lt;p&gt;A better maths teacher than I would be able to explain how we arrive at these, but here are the formulas:&lt;/p&gt;

&lt;h3 id=&quot;local-gradient-equations&quot;&gt;Local gradient equations&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;(Note when I say ‘the gradient of Y with respect to X’ it means that X is the input/earlier stage, Y is the output/later stage, and it roughly means
‘if you nudge X, what impact will that have on Y?’.)&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Input (gradient of Net with respect to Input) = Weight (see B3)&lt;/li&gt;
  &lt;li&gt;Weight (gradient of Net with respect to Weight) = Input (see C3)&lt;/li&gt;
  &lt;li&gt;Bias (gradient of Net with respect to Bias) = 1 (see D3)&lt;/li&gt;
  &lt;li&gt;Net (gradient of Output with respect to Net) = 1 - Output^2 (see E3)&lt;/li&gt;
  &lt;li&gt;Output (gradient of Error with respect to Output) = 1&lt;/li&gt;
  &lt;li&gt;Error (gradient of Loss with respect to Error) = Error (this is where the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/ 2&lt;/code&gt; in our loss helps) (see F3, which holds these last two multiplied together)&lt;/li&gt;
&lt;/ul&gt;
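&lt;p&gt;The Net one is the least obvious (it’s the derivative of tanh), so here’s a quick Rust sanity check of mine, comparing it against a faux gradient at the spreadsheet’s net value of 0.8:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;fn main() {
    let net: f64 = 0.8; // E2 in the spreadsheet.
    let tiny = 0.0001;
    // Faux gradient: nudge the net, see how much the output moves.
    let faux = ((net + tiny).tanh() - net.tanh()) / tiny;
    // The claimed local gradient: 1 - Output^2.
    let output = net.tanh();
    let real = 1.0 - output * output;
    println!(&quot;faux: {:.4}, real: {:.4}&quot;, faux, real); // Both are about 0.559 (E3).
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;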

&lt;h3 id=&quot;global-gradients&quot;&gt;Global gradients&lt;/h3&gt;

&lt;p&gt;Next we need to combine the gradients using the calculus ‘chain rule’, so that we can get the impacts of each variable on the loss.&lt;/p&gt;

&lt;p&gt;These are calculated in reverse order (this is why it is called &lt;em&gt;back&lt;/em&gt;propagation) because most of these rely on the next step’s gradient.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Output (gradient of Loss with respect to Output) = Error (see F4)&lt;/li&gt;
  &lt;li&gt;Net (gradient of Loss with respect to Net) = (1 - Output^2) * Output global gradient (see E4)&lt;/li&gt;
  &lt;li&gt;Bias (gradient of Loss with respect to Bias) = Net global gradient (see D4)&lt;/li&gt;
  &lt;li&gt;Weight (gradient of Loss with respect to Weight) = Input * Net global gradient (see C4)&lt;/li&gt;
  &lt;li&gt;Input (gradient of Loss with respect to Input) = Weight * Net global gradient (see B4)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You may like to compare these with the respective faux gradients and see that they are (roughly) the same.&lt;/p&gt;

&lt;p&gt;And there you have it: the gradients for a single neuron. Next I’ll explain how to use these gradients for training!&lt;/p&gt;

&lt;h2 id=&quot;unnecessary-rust-implementation&quot;&gt;Unnecessary Rust implementation&lt;/h2&gt;

&lt;p&gt;Just for the hell of it, here’s an implementation in Rust:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;struct Neuron {
    input: f32,
    weight: f32,
    bias: f32,
    target: f32,
}

impl Neuron {
    fn net(&amp;amp;self) -&amp;gt; f32 {
        self.input * self.weight + self.bias
    }
    fn output(&amp;amp;self) -&amp;gt; f32 {
        self.net().tanh()
    }
    fn error(&amp;amp;self) -&amp;gt; f32 {
        self.output() - self.target
    }
    fn loss(&amp;amp;self) -&amp;gt; f32 {
        let e = self.error();
        e * e / 2.
    }
    // Gradient of loss with respect to output (the ‘output global gradient’).
    fn output_gradient(&amp;amp;self) -&amp;gt; f32 {
        self.error()
    }
    // Chain rule: local tanh derivative (1 - output^2) times the output gradient.
    fn net_gradient(&amp;amp;self) -&amp;gt; f32 {
        let o = self.output();
        let net_local_derivative = 1. - o * o;
        net_local_derivative * self.output_gradient()
    }
    // The bias’s local gradient is 1, so this is just the net gradient.
    fn bias_gradient(&amp;amp;self) -&amp;gt; f32 {
        self.net_gradient()
    }
    // The weight’s local gradient is the input.
    fn weight_gradient(&amp;amp;self) -&amp;gt; f32 {
        self.input * self.net_gradient()
    }
}

fn main() {
    let neuron = Neuron {
        input: 0.4,
        weight: 0.5,
        bias: 0.6,
        target: 0.7,
    };
    println!(&quot;Weight gradient: {:.4}&quot;, neuron.weight_gradient());
    println!(&quot;Bias gradient: {:.4}&quot;, neuron.bias_gradient());
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Which outputs:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Weight gradient: -0.0080
Bias gradient: -0.0201
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Which matches the spreadsheet nicely!&lt;/p&gt;

&lt;p&gt;Thanks for reading, hope you found this helpful, at least a tiny bit, God bless!&lt;/p&gt;

&lt;p&gt;Photo by Chinnu Indrakumar on Unsplash&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Neural Networks explained with spreadsheets, 1: A single neuron</title>
   <link href="http://www.splinter.com.au/2024/03/10/neural-networks-1/"/>
   <updated>2024-03-10T00:00:00+11:00</updated>
   <id>http://www.splinter.com.au/2024/03/10/neural-networks-1/neural-networks-1</id>
   <content type="html">&lt;p&gt;&lt;img src=&quot;/images/2024/neuron.jpg&quot; alt=&quot;Neural Networks explained with spreadsheets - 1&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Hi all, I’d like to do a series on neural networks (or machine learning, or AI), starting from the very basics, not using any frameworks. This is inspired by &lt;a href=&quot;https://www.youtube.com/watch?v=VMj-3S1tku0&quot;&gt;Andrej Karpathy’s intro video here&lt;/a&gt; so perhaps consider watching that (if you can find a few hours spare!). I seem to be writing about maths a lot lately, which gave me an idea: everyone understands spreadsheets (Excel / Pages / Google Sheets), so I’m going to use them to (hopefully!) make the maths clearer.&lt;/p&gt;

&lt;h2 id=&quot;explanation&quot;&gt;Explanation&lt;/h2&gt;

&lt;p&gt;I want to make ‘what is a neuron’ concrete in some way, to give you a ‘scaffold’ to build your learning on; I believe that helps.&lt;/p&gt;

&lt;p&gt;So: say you want to define a mathematical formula for ‘how much is a rectangular block of land worth’. It has two inputs: width and length. It might look like this:&lt;/p&gt;

&lt;p&gt;Land price($) = width(m) * length(m) * 200 + 100000&lt;/p&gt;

&lt;p&gt;You could call this a function: value(s) in, value out. A machine learning neuron is just one of these: It takes some input(s), does some maths with them, and outputs a value. And a massive grid of these neurons all connected together can achieve surprisingly complex results.&lt;/p&gt;
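&lt;p&gt;In Rust, that land-price function might look like this (my own illustration of the formula above):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// Values in, value out: the land-price formula as a function.
fn land_price(width_m: f64, length_m: f64) -&amp;gt; f64 {
    width_m * length_m * 200.0 + 100000.0
}

fn main() {
    // A 20m x 30m block: 20 * 30 * 200 + 100000 = 220000.
    println!(&quot;${}&quot;, land_price(20.0, 30.0));
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;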

&lt;h2 id=&quot;maths&quot;&gt;Maths&lt;/h2&gt;

&lt;p&gt;Here’s how the maths behind a single neuron works. There’s not much to it:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Net input = Input 1 * Weight 1  +  Input 2 * Weight 2  +  Bias
Output = tanh(Net input)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;spreadsheet&quot;&gt;Spreadsheet&lt;/h2&gt;

&lt;p&gt;Please click &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1wRXAgKiUSwi3ty9zs1K3exoImrgJfSWxyLKXZr8Ba1E/edit?usp=sharing&quot;&gt;here to see the above in spreadsheet form&lt;/a&gt;. I tried embedding a nice JS spreadsheet but it didn’t work on mobile, thus the google sheets link. In case that doesn’t work, it looks like so:&lt;/p&gt;

&lt;div class=&quot;my_spreadsheet_table_is_next&quot;&gt;&lt;/div&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;A&lt;/th&gt;
      &lt;th&gt;B&lt;/th&gt;
      &lt;th&gt;C&lt;/th&gt;
      &lt;th&gt;D&lt;/th&gt;
      &lt;th&gt;E&lt;/th&gt;
      &lt;th&gt;F&lt;/th&gt;
      &lt;th&gt;G&lt;/th&gt;
      &lt;th&gt;H&lt;/th&gt;
      &lt;th&gt;I&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;th&gt;Input 1&lt;/th&gt;
      &lt;th&gt;Input 2&lt;/th&gt;
      &lt;th&gt;Weight 1&lt;/th&gt;
      &lt;th&gt;Weight 2&lt;/th&gt;
      &lt;th&gt;Bias&lt;/th&gt;
      &lt;th&gt;Net&lt;/th&gt;
      &lt;th&gt;Output&lt;/th&gt;
      &lt;th&gt;Target&lt;/th&gt;
      &lt;th&gt;Loss&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;0.9&lt;/td&gt;
      &lt;td&gt;0.8&lt;/td&gt;
      &lt;td&gt;0.7&lt;/td&gt;
      &lt;td&gt;0.6&lt;/td&gt;
      &lt;td&gt;0.5&lt;/td&gt;
      &lt;td&gt;=A2 * C2 + B2 * D2 + E2&lt;/td&gt;
      &lt;td&gt;=tanh(F2)&lt;/td&gt;
      &lt;td&gt;0.8&lt;/td&gt;
      &lt;td&gt;=(G2 - H2)^2&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;style&gt;
    div.my_spreadsheet_table_is_next + table td,th {
        padding: 0.3em;
        border: 1px solid #000;
    }
&lt;/style&gt;

&lt;h2 id=&quot;explanation-1&quot;&gt;Explanation&lt;/h2&gt;

&lt;p&gt;You may be wondering what ‘tanh’ is. It’s a &lt;a href=&quot;https://en.wikipedia.org/wiki/Hyperbolic_functions&quot;&gt;hyperbolic tangent&lt;/a&gt;, which neatly squashes the net and spits out a value between -1 and 1. This is called the ‘activation function’ - there are other options (eg the logistic function) that can be used instead.&lt;/p&gt;
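&lt;p&gt;To see that squashing in action, here’s a tiny Rust demo of mine (not part of the spreadsheet) feeding a range of net values through tanh:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;fn main() {
    for net in [-10.0f64, -1.0, 0.0, 1.0, 10.0] {
        // No matter how extreme the net gets, the output stays within -1..1.
        println!(&quot;tanh({}) = {:.3}&quot;, net, net.tanh());
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;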

&lt;p&gt;Initial values for the weights and bias are random numbers in the range -1..1. They are tweaked in the learning process, which I’ll explain in an upcoming article. The collection of weights and biases is also called the &lt;em&gt;parameters&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Loss is used to calculate how ‘good’ a neural network is at calculating the desired target. It will always be zero or positive, and the closer to zero the better. In this simple example, it is calculated as the square of the output-vs-target delta:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;loss = (output - target)^2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;obligatory-rust&quot;&gt;Obligatory Rust&lt;/h2&gt;

&lt;p&gt;Because I enjoy fooling around with Rust, here’s a little demo, perhaps this will solidify the concepts from a developer’s perspective:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;struct Neuron {
    input1: f32,
    input2: f32,
    weight1: f32,
    weight2: f32,
    bias: f32,
}

impl Neuron {
    fn net(&amp;amp;self) -&amp;gt; f32 {
        self.input1 * self.weight1 +
        self.input2 * self.weight2 + 
        self.bias
    }
    fn output(&amp;amp;self) -&amp;gt; f32 {
        self.net().tanh()
    }
    fn loss(&amp;amp;self, target: f32) -&amp;gt; f32 {
        let delta = target - self.output();
        delta * delta
    }
}

fn main() {
    let neuron = Neuron {
        input1: 0.1,
        input2: 0.2,
        weight1: 0.3,
        weight2: 0.4,
        bias: 0.5,
    };
    println!(&quot;Net: {:.3}&quot;, neuron.net());
    println!(&quot;Output: {:.3}&quot;, neuron.output());
    println!(&quot;Loss: {:.3}&quot;, neuron.loss(0.5));
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href=&quot;/2024/03/20/neural-networks-2/index.html&quot;&gt;My next article explains how to calculate the gradients of the inputs and weights, as a prerequisite for adjusting them when learning.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading, hope you found this helpful, at least a tiny bit, God bless!&lt;/p&gt;

&lt;p&gt;Photo by Josh Riemer on Unsplash&lt;/p&gt;
</content>
 </entry>
 
 
</feed>