scvalex.net

On zram swap and zswap

2026-05-05T00:00:00Z

I recently converted all my machines from zram swap to zswap. In this post I go over the differences between the two and why zswap is almost certainly better for any general use-case.

This excellent post by Chris Down is what got me to change the zram swap setup I’ve been using for 5 years on both desktops and servers. Specifically, the discussion about behaviour under memory pressure buried in the middle of the post is what convinced me.

That said, I think the above post errs on the side of being too technical. I suspect the author was trying to be comprehensive in order to establish ethos and preempt questions, but the result is an argument cluttered by details.

I am going to make the same argument here—that basically everybody should use zswap instead of zram swap—using simpler language and glossing over technical nuances. Read Chris Down’s post for the nitty gritty details.

Linux memory

Linux has several kinds of memory:

Free memory. This is unused memory that can be allocated to programs right now.
Anonymous/program memory. This is memory allocated by programs and the kernel on the stack and on the heap. This is what people mean by Used memory.
Cache. These are bytes that apps have recently read from disk, bytes that are buffered before being written to disk, but also the executable code of programs and mmaps. So, basically everything for which a copy exists on the disk. The cache can be “evicted” to free up memory.
Swap. These are pages of Anonymous/program memory that the kernel decided aren’t being used at the moment, so it has “swapped” them out. If the system has a swap partition or swap file, swapping out means writing to disk and swapping in means reading from disk. If the system has zram swap, then swapping just means (de)compressing pages in memory.

We can see the different kinds of memory graphically in a system monitor like btop:

btop showing 30.9 GiB of main memory and 24.2 GiB of swap. Of the main memory, 9.8 GiB are used, 21.1 GiB are available, 9.5 GiB are cached, and 10.3 GiB are free.
Of swap, 7.6 GiB are used and 16.6 GiB are free.

The above is a screenshot from my laptop which has been running for 3 days. The important observation is that swap is being used even though the system has plenty of free memory.

We can see which processes have pages in swap with smem:

$ smem --autosize -k -c "swap command" -s swap -r | sed -e 's#/nix/store/[^/]*/##' | head -n 20
   Swap Command
   2.8G bin/rust-analyzer
 418.0M hx
 193.9M usr/lib/zotero-bin-7.0.27/zotero-bin -app /nix/store/8i6idm7gf4wccmna246a004b8q7wa3z5-zotero-7.0.27/usr/lib/zotero-bin-7.0.27/app
 177.0M /run/current-system/sw/bin/harper-ls --stdio
 162.1M bin/node /nix/store/5kfn1vqwxr11micrqxw2klvl0bz6f9zg-tailwindcss-language-server-0.14.28/bin/tailwindcss-language-server --stdio
 161.6M /run/current-system/sw/bin/firefox
 158.8M bin/.thunderbird-wrapped_ --name thunderbird
 150.8M /run/current-system/sw/bin/birdtray
 135.8M ./target/release/dis dev
 118.6M bin/elisa
 116.0M libexec/electron/electron /nix/store/wfjrp90gxm2jcfq0vwbza81lywwjs6wq-signal-desktop-8.6.1/share/signal-desktop/app.a
 115.1M opt/Discord/.Discord-wrapped --type=zygote --no-zygote-sandbox
 100.0M bin/plasmashell --no-respawn
  98.6M lib/marksman/marksman server
  91.9M /proc/self/exe --type=renderer --crashpad-handler-pid=3340 --enable-crash-reporter=c8233cca-57ab-49cf-a7c9-87ac7d5730bc,no_channel --user-data-dir=/home/scvalex/.config/discord --standard
  81.7M /run/current-system/sw/bin/neochat
  79.5M bin/tailwindcss --minify --no-autoprefixer -o styles_out/r/screen3.css -i styles/screen3.css --watch
  61.5M lib/firefox/firefox -contentproc -isForBrowser -prefsHandle 0:36709 -prefMapHandle 1:294101 -jsInitHandle 2:156120 -parentBuildID
  61.1M /run/current-system/sw/bin/Discord --enable-speech-dispatcher
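If smem isn't installed, a rough stand-in is to read the VmSwap field that Linux exposes in each /proc/<pid>/status file. The sketch below assumes a Linux /proc layout. Its numbers may differ somewhat from smem's, which are derived from /proc/<pid>/smaps. It also prints the short process name rather than the full command line.

```python
import os
import re

# Matches lines like "VmSwap:\t    2048 kB" in /proc/<pid>/status (units are KiB).
VMSWAP_RE = re.compile(r"^VmSwap:\s+(\d+)\s+kB$", re.MULTILINE)
NAME_RE = re.compile(r"^Name:\s+(.*)$", re.MULTILINE)


def parse_vmswap_kib(status_text: str) -> int:
    """Return the VmSwap value (KiB) from a status dump, or 0 if absent."""
    match = VMSWAP_RE.search(status_text)
    return int(match.group(1)) if match else 0


def top_swappers(proc_root: str = "/proc", limit: int = 20) -> list[tuple[int, str]]:
    """List (swap_kib, process_name) pairs, largest first."""
    if not os.path.isdir(proc_root):
        return []  # not on Linux, or /proc isn't mounted
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries like /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                text = f.read()
        except OSError:
            continue  # the process exited or isn't readable
        swap_kib = parse_vmswap_kib(text)  # kernel threads have no VmSwap line
        if swap_kib > 0:
            name_match = NAME_RE.search(text)
            results.append((swap_kib, name_match.group(1) if name_match else "?"))
    results.sort(reverse=True)
    return results[:limit]


if __name__ == "__main__":
    for swap_kib, name in top_swappers():
        print(f"{swap_kib / 1024:8.1f}M {name}")
```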

Half the swap usage comes from my editor (Helix, rust-analyzer, TailwindCSS, marksman, and harper-ls). I think what happened is that when I opened my blog to write this post, I also opened some Rust source file. This triggered rust-analyzer to load all the type information into memory, but since I'm not doing any Rust coding, that memory was never touched again. The kernel noticed and swapped it out.

This is the behaviour we want. There’s no point in keeping stuff in Used memory if it isn’t being actively used. It’s better to swap out cold pages, so that the system has more Free memory it could use for other programs or for Cache. The last bit is particularly important. Since disk IO is 1000x slower than memory, it’s better to fill memory with disk cache than with cold program data.

The above is true under normal circumstances, but it’s even more important if the system is under memory pressure. If I started a big build that needed 15 GB of memory, the kernel would first use the 10 GB of Free memory, then evict 5 GB of Cache to make room.

However, if I didn’t have any Swap, then I’d only have 3 GB of Free, so the kernel would evict the entire 10 GB of Cache, then OOM kill another 2 GB of programs. Having programs OOM’d is bad, but not having any Cache and forcing the build to fully write every build artifact to disk only for it to be immediately read back is worse.
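The arithmetic in the last two paragraphs can be written as a deliberately simplistic model. The new allocation is served from Free memory first, then by evicting Cache, and programs are OOM-killed for whatever is left. The real kernel reclaims cache and anonymous memory concurrently, so treat this only as a way to check the numbers, which are rounded from the btop screenshot.

```python
def satisfy_demand(demand_gb: float, free_gb: float, cache_gb: float) -> dict[str, float]:
    """Toy model: take memory from Free first, then evict Cache,
    and OOM-kill programs for whatever is still missing."""
    from_free = min(demand_gb, free_gb)
    remaining = demand_gb - from_free
    evicted_cache = min(remaining, cache_gb)
    remaining -= evicted_cache
    return {"from_free": from_free, "evicted_cache": evicted_cache, "oom_killed": remaining}


# With swap: cold pages are already swapped out, leaving ~10 GB Free and ~10 GB Cache.
print(satisfy_demand(15, free_gb=10, cache_gb=10))
# Without swap: the ~7 GB of cold pages stay resident, leaving only ~3 GB Free.
print(satisfy_demand(15, free_gb=3, cache_gb=10))
```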

When under memory pressure, something is going to get evicted to disk. It’s better to swap out cold program memory than to reduce disk cache.

If you're like me and remember the 2000s when memory was scarce and "swapping" meant that your system would slow to a crawl while the disk made angry noises, the world has changed. Running out of memory and putting the kernel in a situation where it has to swap out the big process that's using 100% of the CPU is still bad, but that's not what happens most of the time. Usually, the kernel will be swapping out massive Electron apps like Signal and Discord, which slowly leak memory, or programs that haven't been touched in a while. Also, SSDs are 20x faster than spinning disks, so the penalty for swapping is a lot lower. Also also, disk space is really cheap now, so there's genuinely no reason not to just give the system a few tens of gigs of swap.

`zram` swap

We’ve established that swap is good, but what kind of swap is better? Let’s consider swap on zram first. To quote the kernel docs:

The zram module creates RAM-based block devices named /dev/zramN (N = 0, 1, …). Pages written to these disks are compressed and stored in memory itself. These disks allow very fast I/O and compression provides good amounts of memory savings. Some of the use cases include /tmp storage, use as swap disks, various caches under /var and maybe many more. :)

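To use normal swap, we format a partition such as /dev/sda2 with mkswap /dev/sda2 and call swapon /dev/sda2 to activate it. To use swap on zram, we first create a /dev/zram0 device with modprobe zram, set its size by writing to /sys/block/zram0/disksize, format it with mkswap /dev/zram0, and then call swapon /dev/zram0.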
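Spelled out, the zram steps look roughly like the sketch below. It only prints the commands unless dry_run is set to False, which needs root. The 8G size, the zstd algorithm, and priority 100 are illustrative assumptions, and zstd in particular must be compiled into the kernel. According to the kernel's zram documentation, the compression algorithm has to be chosen before disksize is written.

```python
import shlex
import subprocess

ZRAM_DEV = "/dev/zram0"
ZRAM_SYSFS = "/sys/block/zram0"


def zram_swap_steps(size: str = "8G", algorithm: str = "zstd", priority: int = 100) -> list[list[str]]:
    """Commands that turn /dev/zram0 into a swap device (illustrative values)."""
    return [
        ["modprobe", "zram"],  # creates /dev/zram0 by default
        # The compression algorithm must be picked before the size is set.
        ["sh", "-c", f"echo {algorithm} > {ZRAM_SYSFS}/comp_algorithm"],
        ["sh", "-c", f"echo {size} > {ZRAM_SYSFS}/disksize"],
        ["mkswap", ZRAM_DEV],  # write a swap signature to the device
        # A higher priority makes the kernel use zram before any disk swap.
        ["swapon", "--priority", str(priority), ZRAM_DEV],
    ]


def run(steps: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them (as root) when dry_run is False."""
    for cmd in steps:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(zram_swap_steps())
```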

Swap on zram is the closest real thing to the Download More RAM joke. However, zram is not RAM because programs can’t use it directly. As in, there’s no way to run Firefox directly from zram. Firefox always runs from normal memory, but just like how some of its pages can be swapped out to disk, they can also be swapped out to a different part of memory which happens to be compressed. The pages then need to be copied and decompressed if Firefox accesses them.

The benefits of swap on zram over swap on disk are that zram is much faster, since it's in memory, and that it's always available, since we don't need a special partition or file. But being fast isn't very useful because this memory is by definition infrequently accessed. For example, being able to swap in my dormant 3 GB rust-analyzer half a second faster isn't really valuable to me. And not needing a pre-existing file or partition is a feature, but we can just create the file or partition in most cases.

The big downside of swap on zram is that it still uses memory. Assuming an optimistic compression ratio of 2:1, my laptop would now be using 4 GB more memory with zram. That’s paying 12% of my total memory for benefits I don’t find particularly compelling.

So, because swap is good, swap on zram is better than no swap at all, but we can easily do better with swap on disk.

`zswap`

As we’ve seen, swap on zram has gotchas. By contrast, zswap is just normal swap fronted by a compressed in-memory cache. To quote the kernel docs:

Zswap is a lightweight compressed cache for swap pages. It takes pages that are in the process of being swapped out and attempts to compress them into a dynamically allocated RAM-based memory pool. Zswap basically trades CPU cycles for potentially reduced swap I/O. This trade-off can also result in a significant performance improvement if reads from the compressed cache are faster than reads from a swap device.

This is just the swap on zram trick, except the kernel knows that the compressed things are swap pages, so it can apply specialized logic, such as writing pages out to disk once they've been cold for long enough.

In other words, zswap implements a proper memory hierarchy:

Frequently accessed pages are stored in RAM,
Infrequently accessed pages are stored in zswap's compressed RAM, and
Cold pages are swapped to disk.
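To make the three tiers concrete, here is a toy model in which each tier evicts its least recently used page to the next one down. It only illustrates the idea. The real zswap works on a compressed memory budget (see its max_pool_percent parameter) rather than page counts, and the class and method names here are made up.

```python
from collections import OrderedDict


class ToyHierarchy:
    """Toy RAM -> compressed pool -> disk model with LRU eviction per tier.
    Not the kernel's algorithm; just the shape of the idea."""

    def __init__(self, ram_pages: int, pool_pages: int):
        self.ram = OrderedDict()   # hot pages, least recently used first
        self.pool = OrderedDict()  # compressed pages, least recently used first
        self.disk = set()          # pages written out to the swap device
        self.ram_pages = ram_pages
        self.pool_pages = pool_pages

    def touch(self, page: str) -> str:
        """Access a page and report which tier it was found in."""
        if page in self.ram:
            self.ram.move_to_end(page)
            return "ram"
        if page in self.pool:
            del self.pool[page]
            where = "pool"  # cheap: decompress from memory
        elif page in self.disk:
            self.disk.remove(page)
            where = "disk"  # expensive: read back from the swap device
        else:
            where = "new"
        self._insert_ram(page)
        return where

    def _insert_ram(self, page: str) -> None:
        self.ram[page] = True
        while len(self.ram) > self.ram_pages:
            cold, _ = self.ram.popitem(last=False)  # coldest RAM page
            self.pool[cold] = True                  # compress it into the pool
        while len(self.pool) > self.pool_pages:
            colder, _ = self.pool.popitem(last=False)  # coldest pool page
            self.disk.add(colder)                      # write it back to disk
```

For example, with room for two pages in RAM and one in the pool, touching a, b, c, d in order leaves c and d in RAM, b compressed in the pool, and a on disk.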

So, zswap has the benefits of both swap on disk and swap on zram. When we have swap on disk, we should enable zswap as well.
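zswap can be switched on at boot with the zswap.enabled=1 kernel parameter, or at runtime by writing Y to /sys/module/zswap/parameters/enabled as root. Whether it is on by default depends on how the kernel was built. The sketch below just reads the module parameters back, so you can check what a machine is actually doing. It assumes Linux's sysfs layout and returns an empty dict when zswap isn't available.

```python
from pathlib import Path

ZSWAP_PARAMS = "/sys/module/zswap/parameters"


def zswap_settings(params_dir: str = ZSWAP_PARAMS) -> dict[str, str]:
    """Read zswap's module parameters (enabled, compressor, max_pool_percent, ...)."""
    root = Path(params_dir)
    if not root.is_dir():
        return {}  # zswap not built into this kernel, or not Linux
    settings = {}
    for param in sorted(root.iterdir()):
        try:
            settings[param.name] = param.read_text().strip()
        except OSError:
            continue  # skip parameters we aren't allowed to read
    return settings


def zswap_enabled(params_dir: str = ZSWAP_PARAMS) -> bool:
    """The enabled parameter reads back as Y or N."""
    return zswap_settings(params_dir).get("enabled") == "Y"


if __name__ == "__main__":
    for name, value in zswap_settings().items():
        print(f"{name} = {value}")
```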

Conclusion

TL;DR For any general use-case, have a swap partition or swap file and enable zswap. Don’t bother with swap on zram unless you really can’t have swap on disk.

I’ve been saying “general use-case” in this post, but I think it’s basically every use-case in 2026. The usual example where swap on zram is better is an embedded device running on flash memory that degrades quickly if thrashed. As Chris Down’s post points out, you can still get disk thrashing from the cache being evicted under memory pressure, so zram is not a panacea. I would add that the Raspberry Pi 5 has been out since 2023 and has NVMe support, so I don’t think the “crappy flash memory” use-case is going to be a concern for much longer even in the embedded space.

What the hell is a decibel?

2026-04-25T00:00:00Z

Decibels come up often in digital audio and I've always found them to be confusing. In this post, I try to explain what decibels are and, more importantly, how they're used in practice.


A unit of nothing

I think the most striking thing about decibels is how alien they are compared to other units of measurement we encounter in the real world.

For example, we often talk about centimeters. We can convert centimeters to millimeters or kilometers. We can also convert centimeters to inches or miles. However, whatever we convert them to, there is always a unit of length left: we can never convert a centimeter to a plain number.

Decibels look like centimeters. There's an SI prefix, "deci", followed by a thing, "bel". One would reasonably assume a "bel" is a unit of loudness or something of the sort, but no, decibels are actually just plain numbers: with the amplitude formula below, one decibel is approximately $1.122$ . This means we can evaluate a decibel value to a plain number. For example, $6 dB \approx 1.995$ .

Since decibels are plain numbers, we can reasonably tack on other units of measurement. For example, my height is about $45.25 dB cm$ , which the amplitude formula evaluates to roughly 183 cm. This is how decibels are usually used in practice, with the unit of measurement implied by the context. This is why the sound of breathing is 10 dB, the sound of a car is 50 dB, but the volume slider in every audio editing application goes from -60 dB to 0 dB. Although the numbers all measure loudness, they don't fit together. This is because the first pair are in dB_SPL and the second pair are in dBFS.

To add to the confusion, there are two different formulas for decibels and context determines which one to use.

Power formula. This applies to power values such as energy use in an amplifier.
- To evaluate $x dB$ to a number use: $10^{\frac{x}{10}}$ .
- To compute the decibel value of the number $x$ use: $10 \times \log_{10} (x)$ .
Amplitude formula. This applies to the loudness of sound and digital audio.
- To evaluate $x dB$ to a number use: $10^{\frac{x}{20}}$ .
- To compute the decibel value of the number $x$ use: $20 \times \log_{10} (x)$ .
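The four formulas translate directly into code. This is a minimal sketch, and the function names are my own.

```python
import math


def db_to_amplitude(db: float) -> float:
    """Amplitude formula: x dB -> 10^(x/20)."""
    return 10 ** (db / 20)


def amplitude_to_db(x: float) -> float:
    """Amplitude formula: x -> 20 * log10(x). Undefined for x <= 0."""
    return 20 * math.log10(x)


def db_to_power(db: float) -> float:
    """Power formula: x dB -> 10^(x/10)."""
    return 10 ** (db / 10)


def power_to_db(x: float) -> float:
    """Power formula: x -> 10 * log10(x). Undefined for x <= 0."""
    return 10 * math.log10(x)


print(round(db_to_amplitude(1), 3))   # 1.122, one decibel as a plain number
print(round(db_to_amplitude(6), 3))   # 1.995
print(round(db_to_amplitude(45.25)))  # 183, the height example in cm
print(db_to_power(10))                # 10.0
```

Because adding decibels multiplies the evaluated numbers, db_to_amplitude(20 + 20) comes out to 100, which is 10 × 10.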

In the context of digital audio, the rule of thumb is that positive dB values refer to physical sound pressure and any dB values with a sign in front of them (e.g. +6 dB, -12 dB) refer to digital sound volume. Both cases use the amplitude formula.

Sound pressure

The human ear is basically a spiral tube filled with liquid, one end of which is covered by a membrane. When the membrane moves, it displaces the liquid, and tiny hairs on the sides of the tube turn this motion into nerve impulses, which we perceive as sound.

Anatomy of the ear
(source: Bruce Blaus via Wikipedia)

The critical bit is that the eardrum is moved by tiny changes in atmospheric pressure. “Quiet” means the changes in pressure are small. “Loud” means the changes in pressure are big.

The unit of measurement for pressure is the pascal (Pa). The problem is that 1 Pa roughly translates to somebody screaming directly into your ear, so it's not a good baseline in the context of human hearing. Instead, the sound pressure level of 0 dB_SPL is defined as 20 μPa, which is roughly the quietest sound a person can hear.

Some examples of real world sound pressure levels are below. We use the amplitude formula to convert between numbers in Pascals and decibels. The rule of thumb for real world sound is that a difference of 6 dB is twice the pressure, 10 dB is three times the pressure, and 20 dB is ten times the pressure.

Sound	Level (dB SPL)	Pressure
Pain	120 dB	20 Pa
Jet engine at 100m	110 dB	6.3 Pa
Jackhammer	100 dB	2 Pa
Petrol car travelling at 30 kph	67-70 dB	45-63 mPa
Normal conversation	40-60 dB	2-20 mPa
Calm room	20-30 dB	200-630 μPa
Leaf rustling, light breathing	10 dB	63 μPa
Threshold of hearing	0 dB	20 μPa
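The table can be reproduced from the amplitude formula, using 20 μPa as the reference value for 0 dB_SPL. A sketch:

```python
import math

P_REF_PA = 20e-6  # 0 dB SPL is defined as 20 micropascals


def spl_to_pascals(db_spl: float) -> float:
    """Sound pressure level (dB SPL) -> pressure in pascals, via the amplitude formula."""
    return P_REF_PA * 10 ** (db_spl / 20)


def pascals_to_spl(pa: float) -> float:
    """Pressure in pascals -> sound pressure level in dB SPL."""
    return 20 * math.log10(pa / P_REF_PA)


for level in (120, 110, 100, 70, 67, 60, 40, 30, 20, 10, 0):
    print(f"{level:>3} dB SPL = {spl_to_pascals(level):.3g} Pa")
```

Going the other way, pascals_to_spl(1.0) gives about 94 dB, which is the "screaming into your ear" level mentioned above.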

The dB scale is logarithmic, so adding dB values is equivalent to multiplying the evaluated numbers. For example, $20 dB$ translates to $10$ times the pressure, but $20 + 20 = 40 dB$ translates to $10 \times 10 = 100$ times the pressure. It’s convenient to use decibels when talking about sound pressure because numbers on the $0 \dots 120 dB$ scale are much easier to express than numbers on the $0.00002 \dots 20 Pa$ scale.

Digital audio

Our model for digital audio is that we have a loudspeaker with a membrane. We send floating point numbers in the [-1.0, 1.0] range to the loudspeaker to control the position of the membrane. The membrane is pulled fully back at -1.0 and pushed fully forward at 1.0. If we send 48,000 of these "samples" per second, the speaker produces sound.

We adjust the volume of the sound by scaling the samples by some factor in the [0.0, +inf] range. For example, if we scale by 0.5, the speaker membrane moves half as much, so the sound is quieter. If we scale by 4.0, the speaker membrane moves much more, so the sound is louder. The final numbers have to be in the [-1.0, 1.0] range, so scaling by 4.0 only makes sense if we know the peaks in our samples are less than 0.25.

The difficulty with volume is that scaling by 0.5 doesn’t lower the perceived volume by half because our ears interpret changes in pressure as logarithmic. We have higher resolution for quiet sounds than for loud ones. This makes evolutionary sense since it’s more important to hear the slightly increased rustling of the tall grass as the saber-tooth tiger is sneaking up on you than being able to distinguish between loud and super-loud.

This effect is easy to see with an example. The two players below emit 440 Hz sine waves. The volume adjustments for both go from 0.01 to 1.00. The difference between them is that the first slider is linear, so half-way is about 0.5, but the second slider is logarithmic, so half-way is 0.1. Turn up the volume on your computer and headset for this example.

If you turned up the volume before trying the example, you’ll have noticed that there’s a point where the sound is so loud that making it louder doesn’t make it feel any louder. For me, that point is about 0.6 which is 60% of the linear slider, but 90% of the log slider. Pressing both play buttons sends the sounds in stereo—the left/linear channel obviously stays saturated for longer than the right/log channel. In other words, almost half of the linear slider is “too loud”, so the UI space is wasted.
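One way to implement the two sliders is sketched below. It assumes the logarithmic slider maps positions geometrically between 0.01 and 1.00, so equal slider movements give equal steps in dB. Under that assumption, the half-way gain is 0.1. A gain of 0.6 sits at about 89% of the log slider and about 60% of the linear one.

```python
import math

MIN_GAIN, MAX_GAIN = 0.01, 1.00  # slider range from the example


def linear_slider(position: float) -> float:
    """Map a slider position in [0, 1] to a gain in equal steps."""
    return MIN_GAIN + position * (MAX_GAIN - MIN_GAIN)


def log_slider(position: float) -> float:
    """Map a slider position in [0, 1] to a gain in equal ratios (equal dB steps)."""
    return MIN_GAIN * (MAX_GAIN / MIN_GAIN) ** position


def log_slider_position(gain: float) -> float:
    """Inverse of log_slider: where on the slider a given gain sits."""
    return math.log(gain / MIN_GAIN) / math.log(MAX_GAIN / MIN_GAIN)


print(linear_slider(0.5), log_slider(0.5))
print(log_slider_position(0.6))
```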

When converting a scaling factor to decibels in digital audio, we use the amplitude formula ( $20 \times \log_{10} (x)$ ). There isn't really an implied unit of measurement for these decibels since they represent pure numbers in a computer, so they're called decibels relative to full scale (dBFS).

Since the numbers are frequently in the [0.0, 1.0] range, we often see negative decibel values like $- 60 dB$ and $0 dB$ . There’s no way to represent 0 as decibels, but $- 60 dB = 10^{\frac{- 60}{20}} = 10^{- 3} = 0.001$ is so quiet it might as well be 0. And $0 dB = 10^{\frac{0}{20}} = 1.0$ .

The rule of thumb for digital audio is that 6 dB means 2x and -6 dB means 0.5x. So, increasing the volume by 6 dB just means scaling all the samples by 2.0. If somebody says to cap the peaks in the sound to -6 dB, they’re saying to make sure no samples have values outside the [-0.5, 0.5] range.

Since decibels are logarithms, adding them together translates into multiplying the scaling factors. So, a gain effect of +6 dB followed by a gain of +18 dB has the cumulative effect of 24 dB of gain. Since $24 = 4 \times 6$ and 6 dB means 2x scaling, this translates to a cumulative scaling of $2^{4} = 16$ .
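Applying gain in dBFS and checking peaks against a ceiling takes only a few lines. Here is a sketch that treats samples as a plain list of floats. A real audio pipeline would use buffers or arrays instead.

```python
def db_to_gain(db: float) -> float:
    """dBFS value -> linear scale factor, via the amplitude formula."""
    return 10 ** (db / 20)


def apply_gain(samples: list[float], db: float) -> list[float]:
    """Scale every sample by the factor that corresponds to `db`."""
    factor = db_to_gain(db)
    return [s * factor for s in samples]


def peaks_within(samples: list[float], ceiling_db: float) -> bool:
    """True if no sample's magnitude exceeds the ceiling (e.g. -6 dB is ~0.5)."""
    limit = db_to_gain(ceiling_db)
    return all(abs(s) <= limit for s in samples)


samples = [0.01, -0.02, 0.05, -0.04]
louder = apply_gain(apply_gain(samples, 6), 18)  # +6 dB, then +18 dB
print([round(s, 3) for s in louder])             # each sample scaled by ~15.85
print(peaks_within(louder, 0))                   # still inside [-1.0, 1.0]
print(peaks_within(louder, -6))                  # but above the -6 dB ceiling
```

The exact factor for 24 dB is $10^{24/20} \approx 15.85$, which the 6 dB ≈ 2x rule of thumb rounds to 16. Here the loudest input sample (0.05) ends up at about 0.79. That is still inside [-1.0, 1.0] but above the -6 dB ceiling of roughly 0.5.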

Next up

That’s it for decibels. When we see a graph with a scale from -60 dB to 0 dB, we now know that this means it goes from practically 0.0 to 1.0 and that the smaller values are over-emphasized to match how human hearing works. We also know that +6 dB means scale the numbers by 2x and that keeping our sound peaks below -12 dB just means ensuring the absolute numbers don’t exceed 0.25. Finally, if we hear someone say babies scream at 100 dB, we know that’s just a completely different unit of measurement.

Next, we’ll go back to our trusty sine waves and see how to break apart a sound into sines at different volumes.

A load balancer on every host

2026-04-21T00:00:00Z

I’ve started running HAProxy on every machine in my fleet. This neatly solves the problem of connecting to services in my Kubernetes cluster, as well as making it possible to have nice URLs for local services running on weird ports.


Problem statement

There are two problems to solve here. The first is that I want to be able to connect to any one of the multiple servers running some service. For example, there are three Kubernetes masters running in my cluster listening on addresses like 10.10.0.18:6443 and I want my admin tooling to connect to whichever of them is available. This needs to work from every machine in the fleet and not just my workstation.

The second problem is that I would like to have clean URLs for services running on ports other than 80. For example, I would like to use http://livebook.local for the Livebook server listening on http://127.0.0.1:30123. Getting a nice name is easy; getting rid of the port is harder.

DNS “load balancing”

The first problem is obviously a load balancing problem. Before we solve it properly, let’s look at something that doesn’t work, namely DNS load balancing.

When I first encountered this, I thought “Oh, I’ll just use /etc/hosts!”

10.10.0.16 fsn-qpr-kube3.wg kube.eu1
10.10.0.17 fsn-qpr-kube2.wg kube.eu1
10.10.0.18 fsn-qpr-kube1.wg kube.eu1

Excerpt from /etc/hosts. This does not work.

Each of the hosts has a unique name associated with its IP address, but they all also share the common name kube.eu1. My thinking was that I could now use https://kube.eu1:6443 as the address of the master, and tools would randomly pick one of the IPs and fall back to another if one is unreachable.

What happened in practice is that the system resolver picked one of the three IPs and always returned that. So, tools would just stickily bind to one of the masters and if that one was down, the tool wouldn’t work.

My next thought was to use a real DNS server. For example, I could add these internal addresses to a public subdomain like kube.scvalex.net. DNS queries would now return all three IPs. Unfortunately, this also doesn't work because tools would have to be smart enough to try out multiple addresses and fall back gracefully. I know I've never written a program that does this, so I wouldn't want to rely on others putting in the effort.
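For illustration, this is roughly what every client would have to do with a multi-address answer: try each address in turn and return the first connection that succeeds. Some standard libraries already do this for a single hostname (Python's socket.create_connection tries every address getaddrinfo returns), but plenty of tools don't. The function name is hypothetical, and the list of addresses would come from the DNS lookup.

```python
import socket


def connect_any(addresses: list[tuple[str, int]], timeout: float = 2.0) -> socket.socket:
    """Try each (host, port) in order and return the first connected socket."""
    last_error = None
    for host, port in addresses:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as err:  # refused, unreachable, timed out, ...
            last_error = err
    raise ConnectionError(f"all {len(addresses)} addresses failed") from last_error
```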

Load balancer

Enough beating around the bush—we need a load balancer. The standard LB setup looks something like this:

A normal load balancer setup

We have three servers: srv-0, srv-1, and srv-2 (currently down). Sitting in front of them are two LBs: lb-0 and lb-1. When “Bob” wants to connect to service srv, he looks it up in DNS and gets back lb-0 as its address. “Bob” then connects to lb-0 which proxies the request to one of srv-0 or srv-1. If any of the servers want to connect to service srv (e.g. because we’re running both Kubernetes masters and nodes on the same hosts), they do the same thing as “Bob” and go through the LB. Since the LBs are continuously checking the health of their backends, they know that srv-2 is currently down and won’t route any traffic to it.

When we upgrade any of the servers, we can just take them down and rely on the LB health checks to ensure clients don’t see too much of a disruption.

When we upgrade an LB, we first have to update DNS to point to the other one. Then we wait for the update to fully propagate and for all clients to switch to the second LB. Then we can finally take down the first LB.

The above is a lot of work. I don’t want to maintain two extra servers, I don’t want to wait for DNS updates to propagate, and I most definitely don’t want a complicated upgrade procedure. If you’re like me and have a mostly static infrastructure, then we can do away with the separate LB servers and just put an LB on each host.

A load balancer on every host

In this mesh setup, we have an LB on every client and on every server. When we want to reach a service from any host, we connect to the LB listening on localhost, which then proxies the request to the correct server. The LBs on all the hosts continuously health-check the backend servers, so they all know if any servers go down.

HAProxy

For our load balancer, we're going to use HAProxy because it's been around for a long time, has an absurd number of configuration options, and works flawlessly for what we're trying to do. It's also pretty lightweight, using only about 50 MB of memory after running for several days.

The first problem we were trying to solve was access to the three Kubernetes masters. These listen on multiple addresses like 10.10.0.18:6443 and we want them to be reachable with a single address.

global
  log /dev/log local0
  daemon
defaults
  log global
  timeout connect 5s
  timeout client 1m
  timeout server 1m
frontend kube-eu1
  bind 127.0.0.100:6443
  mode tcp
  option tcplog
  use_backend kube-eu1_backend
backend kube-eu1_backend
  balance roundrobin
  mode tcp
  server server0 10.10.0.18:6443 check
  server server1 10.10.0.17:6443 check
  server server2 10.10.0.16:6443 check

Excerpt from haproxy.conf

The backends use TLS, so we use HAProxy's mode tcp to pass the encrypted traffic through untouched and avoid having to deal with extra certificates. Technically, TCP is the default mode, so we don't need to set it explicitly, but I find this makes the configs more symmetric with the HTTP ones we'll see later. We also turn on option tcplog to get more detailed log lines.

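The server server0 10.10.0.18:6443 check lines list the backend server addresses. The check option enables periodic TCP health checks against each server. How often these run is controlled by the server's inter parameter (2 seconds by default). The timeout options at the beginning of the file apply mainly to the proxied connections themselves.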
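Conceptually, balance roundrobin combined with check behaves like the toy below: rotate through the servers and skip any whose last health check failed. This sketches the idea and is not how HAProxy is implemented. The class is made up, and the health checks themselves (TCP connect attempts) are left to the caller.

```python
import itertools


class RoundRobin:
    """Toy round-robin balancer that skips servers marked down by health checks."""

    def __init__(self, servers: list[str]):
        self.servers = servers
        self.healthy = {s: True for s in servers}
        self._cycle = itertools.cycle(servers)

    def mark(self, server: str, up: bool) -> None:
        """Record the result of a health check for one server."""
        self.healthy[server] = up

    def pick(self) -> str:
        """Return the next healthy server in rotation."""
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if self.healthy[server]:
                return server
        raise RuntimeError("no healthy backends")
```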

One interesting bit is the address HAProxy is configured to listen on: 127.0.0.100:6443. This works because, at least on NixOS, the lo interface is configured to have address 127.0.0.1/8. The /8 means that any address which starts with 127 works, so we pick 127.0.0.100 to be HAProxy’s address on every host.

Our second problem was getting nice URLs for services running on weird ports. The example was a Livebook server listening on 127.0.0.1:30123. A nice URL means port 80 and a nice name. We do port 80 now and leave the nice name for the next section.

frontend http
  bind 127.0.0.100:80
  mode http
  option httplog
  use_backend %[req.hdr(host),lower]

backend livebook.local
  balance roundrobin
  mode http
  server server0 127.0.0.1:30123 check

Excerpt from haproxy.conf

We use mode http and option httplog here to get better logging.

The use_backend %[req.hdr(host),lower] bit tells HAProxy to extract the value of the Host header, lowercase it, and look for a backend with the same name. So, if we type http://livebook.local in a browser, this sets Host: livebook.local in the request, which then makes HAProxy route it to the backend named livebook.local.
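The Host-based routing rule boils down to a lookup keyed on the lowercased Host header. Here is a sketch with a hypothetical backend table mirroring the examples in this post:

```python
from typing import Optional

BACKENDS = {
    "livebook.local": ["127.0.0.1:30123"],
    "warrior.local": ["10.10.0.18:30180", "10.10.0.17:30180", "10.10.0.16:30180"],
}


def select_backend(headers: dict[str, str], backends: dict[str, list[str]]) -> Optional[str]:
    """Mimic use_backend %[req.hdr(host),lower]: lowercase the Host header
    and use it as the backend name. Returns None if nothing matches."""
    # HTTP header names are case-insensitive, so find "Host" in any casing.
    host = next((v for k, v in headers.items() if k.lower() == "host"), None)
    if host is None:
        return None
    name = host.lower()
    return name if name in backends else None
```

The value is used verbatim apart from case, so a Host header carrying an explicit port, such as livebook.local:8080, would not match either backend.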

We use a similar configuration to access HTTP services in the Kubernetes cluster. For example, I have my Traefik gateway listening on port 30180. It’s configured to route requests for warrior.local to the ArchiveTeam Warrior service running somewhere in the cluster. To redirect requests from any host outside the cluster into it, we add the following to the HAProxy config:

backend warrior.local
  balance roundrobin
  mode http
  server server0 10.10.0.18:30180 check
  server server1 10.10.0.17:30180 check
  server server2 10.10.0.16:30180 check

Excerpt from haproxy.conf

We don't need another frontend definition because the existing one already selects the backend section based on the Host header.

`/etc/hosts`

With HAProxy set up, we can now reach the Kubernetes masters and HTTP services by hitting 127.0.0.100:6443 and 127.0.0.100:80 respectively. Now we need names for these. We could set up a DNS server, but it's simpler to use /etc/hosts in this case:

127.0.0.100 kube.eu1 warrior.local livebook.local

Excerpt from /etc/hosts

I know I said earlier that /etc/hosts is weird, but it works as expected as long as no name has more than one IP address assigned to it. Assigning multiple names to the same address is fine.

With this in place, we can access the Kubernetes master at https://kube.eu1:6443, the local Livebook server at http://livebook.local, and the Warrior in the cluster at http://warrior.local, which is what we set out to do.

NixOS module

The setup is complete, but it’s a bit annoying to have to keep haproxy.conf and /etc/hosts in sync, so I wrote a NixOS module to automate things: ab-local-proxy.nix. I am not going to try to upstream this, but you’re welcome to copy it.

Using the module looks like this:

ab.services.localProxy = {
  address = "127.0.0.100";
  tcpMappings = [
    {
      name = "kube-eu1";
      hostname = "kube.eu1";
      port = 6443;
      backends = ["10.10.0.16:6443" "10.10.0.17:6443" "10.10.0.18:6443"];
    }
  ];
  httpMappings = [
    {
      hostname = "warrior.local";
      backends = ["10.10.0.16:30180" "10.10.0.17:30180"  "10.10.0.18:30180"];
    }
    {
      hostname = "livebook.local";
      backends = [ "127.0.0.1:30123" ];
    }
  ];
};

Example usage of ab-local-proxy.nix
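The core of what such a module automates is rendering both files from one list of mappings. The Python sketch below is a hypothetical stand-in, not ab-local-proxy.nix itself. The input shape copies the Nix example above, and the output follows the haproxy.conf excerpts from earlier.

```python
def render(address: str, tcp: list[dict], http: list[dict]) -> tuple[str, str]:
    """Render (haproxy.conf fragment, /etc/hosts line) from proxy mappings."""
    conf = []
    for m in tcp:
        conf += [
            f"frontend {m['name']}",
            f"  bind {address}:{m['port']}",
            "  mode tcp",
            "  option tcplog",
            f"  use_backend {m['name']}_backend",
            f"backend {m['name']}_backend",
            "  balance roundrobin",
            "  mode tcp",
        ]
        conf += [f"  server server{i} {b} check" for i, b in enumerate(m["backends"])]
    if http:
        # One shared HTTP frontend picks the backend from the Host header.
        conf += [
            "frontend http",
            f"  bind {address}:80",
            "  mode http",
            "  option httplog",
            "  use_backend %[req.hdr(host),lower]",
        ]
    for m in http:
        conf += [f"backend {m['hostname']}", "  balance roundrobin", "  mode http"]
        conf += [f"  server server{i} {b} check" for i, b in enumerate(m["backends"])]
    # Every name points at the single proxy address, so /etc/hosts stays simple.
    names = [m["hostname"] for m in tcp] + [m["hostname"] for m in http]
    hosts = f"{address} {' '.join(names)}"
    return "\n".join(conf) + "\n", hosts + "\n"
```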

Looking back

The idea of doing something like this has been bouncing around in my head for years, but I balked at the thought of running “a heavy load balancer” on every host. Instead, I kept trying and failing to get IPVS to work in a nice way. My mistake was conceptualizing HAProxy as “heavy” when it really isn’t.

I’m currently in the process of cycling the VMs my Kubernetes cluster runs on and it’s so refreshing to see a clear log of “I see this backend going down” and “I see this backend is back up”. It gives me confidence to make changes at a much faster pace than I would’ve previously allowed myself.

Building the grandma videoconf

2026-04-05T00:00:00Z

Grandma wants to talk to her grandkids and also see them. The problem is that grandma cannot interact with modern technology at all. No keyboard, no mouse, no touchscreen—the solution has to be fully automated. Let’s build this.

Front view

Back view


Hardware discussion

Like everything else with this project, the hardware has to be outwardly simple. The ideal would be a tablet, except that modern tablets require far too much user interaction.

For example, the previous solution was an Android tablet with Signal installed. Grandma’s main problem was that she could not press the smallish “answer” button on incoming calls. But even if we fixed that with an app that’s more user-friendly than Signal, the other problems would still remain:

The tablet sometimes pops up notification dialogs about updates and other things. Grandma cannot read them and cannot even see the "Ok" button to dismiss them.
The tablet sometimes has a lag in turning on its screen. This leads to grandma pressing on the power button for too long until the tablet turns off. Then the tablet takes minutes to start back up.
Grandma sometimes presses the volume buttons when picking up the tablet and mutes it. The buttons are then too small for her to see or feel out easily.

I considered using a tablet and replacing the stock Android with something else, but everything in the mobile space seems to be very "special" for lack of a better word. I would rather use the same tooling I use for everything else, so the device we're building has to be a PC.

The canonical "small PC" is a Raspberry Pi. The problem with Raspberry Pis is that they're mobile-adjacent and therefore special in their own ways. For example, here is the NixOS Wiki page on the Raspberry Pi 4. It lists a lot of little things you need to configure to get everything working. It's a lot of friction.

The bigger problem with Raspberry Pis is that they run off of SD cards by default. If you leave one turned on for long enough, you discover that the SD card degrades over about 2-3 months to the point where the system won’t boot. It turns out ext4 on an SD card is a bad idea. You can try using something like F2FS, but now you have to figure out how to make a bootable SD card with that, so again, friction.

Raspberry Pi 5s are much better at not being special. You can even boot them from NVMe flash drives! But you have to buy a separate NVMe shield, and an NVMe drive, not to mention the power adapter. Oh, and you probably want a heatsink and a fan too.

Hardware choices

If I were building something for myself, a Raspberry Pi 5 is probably what I would use. However, what we’re building here needs to work without intervention for months at a time, so a “stick” form-factor PC is the way to go. There aren’t many choices if we want a recent-ish processor, but this DreamQuest one fits the bill. It features an Intel N95 released in 2023, 12 GB of RAM, and a SATA SSD. It has 4 USB type-A ports, 2 type-C ports (one of which is the power port), an Ethernet port, 2 HDMI ports, and even a headphone jack. Weight-wise, it feels a bit heavier than a mobile phone.

DreamQuest stick PC compared to a mobile phone

For the monitor, we want something small-ish so that it fits on grandma's coffee table. We also want it to be light enough that she can move it around. I went with the Raspberry Pi monitor. It's 15.6 inches, which is a bit big, but it's USB powered so it doesn't need an extra power cable, and it weighs only about 1 kg. It has built-in loudspeakers, so that's one less component to worry about. It has a flap on the back, which means it doesn't need a dedicated stand.

Raspberry Pi monitor

The remaining components are any USB webcam and a bunch of cables. I used some I had lying around. The PC cost £200, the monitor was £100, a new webcam would’ve cost about £50, and let’s say the cables were £50. This brings the cost of the whole setup to £400 which is about the same as a mid-range tablet.

Base system

With the hardware settled, we now turn to software. As a reminder, the primary goal is for the setup to be completely hands free—grandma can’t interact with the device in any meaningful way, so there must never be a “System update ready. Install?” prompt. In other words, we need everything to be remotely manageable.

Since I expect I’ll have to build this several times, a secondary goal of mine is for the setup to be repeatable. As such, we’re using NixOS and Home Manager because this allows us to configure everything through files. We do remote deployments with Colmena because it’s what I’m familiar with (not that it matters much since all the NixOS deployment tools are essentially interchangeable).

The full configuration is available here. Below, we’re only going to look at the bits that make this device different from a regular desktop.

First, we need to log in automatically to a graphical environment. We do this with getty’s autologin, plus a login-shell hook that starts Sway on the first virtual terminal:

services.getty = {
  autologinUser = "auto";
  autologinOnce = false;
};
environment.loginShellInit = ''
  [[ "$(tty)" == /dev/tty1 ]] && exec dbus-run-session sway
'';

Excerpt from dor-qws-vid1.nix

For the window manager, I picked Sway because we can configure it through a single file, it’s well supported, and it’s a Wayland compositor, so we don’t have to deal with any X11 jank. The configuration is basically the default one with the status bar commented out, window borders turned off, and some programs started automatically.

programs.sway.enable = true;
home-manager.users.auto =
  { ... }:
  {
    # …
    home.file.sway-config = {
      source = ./dor-qws-vid1/sway-config;
      target = ".config/sway/config";
    };
  };

Excerpt from dor-qws-vid1.nix

# … default config

# bar {
# }

default_border none

output * bg /home/auto/Documents/wallpapers/forest-1.jpg fill
exec /run/current-system/sw/bin/wpaperd -d

exec wayvnc 127.0.0.1 5900

include /etc/sway/config.d/*

Excerpt from ~/.config/sway/config

We set the background to a nice photo of a forest and we start wpaperd to change it every few minutes.

environment.systemPackages = with pkgs; [
  # …
  wpaperd
];
home-manager.users.auto =
  { ... }:
  {
    # …
    home.file.wpaperd-config = {
      source = ./dor-qws-vid1/wpaperd-config.toml;
      target = ".config/wpaperd/config.toml";
    };
  };

Excerpt from dor-qws-vid1.nix

[default]
path = "/home/auto/Documents/wallpapers"
duration = "5m"
transition-time = 500

~/.config/wpaperd/config.toml

Next, we want to be able to control screen brightness programmatically so that grandma doesn’t have to find the physical buttons on the back of the monitor. The way to do this is with ddcutil and it’s a bit arcane, but it does work consistently.

environment.systemPackages = with pkgs; [
  # …
  ddcutil
];

# Brightness control
# https://wiki.nixos.org/wiki/Backlight#Via_ddcutil
# Min: `ddcutil --bus=0 setvcp 10 0`
# Max: `ddcutil --bus=0 setvcp 10 60`
hardware.i2c.enable = true;

Excerpt from dor-qws-vid1.nix

A command like ddcutil --bus=0 setvcp 10 60 sets the brightness to 60, which is the maximum on this screen. The --bus parameter has to be determined experimentally, but ddcutil detect will usually report it. The setvcp 10 bit selects VCP feature 0x10, the standard brightness control. The last number is the brightness value, and on this screen only values between 0 and 60 seem to have any effect.
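To avoid remembering the 0 to 60 quirk, one option is a small wrapper that takes a percentage and scales it into the range this screen responds to. Below is a minimal Python sketch. It assumes bus 0 and a maximum useful value of 60, as observed above; these are properties of this particular monitor, not ddcutil rules. The function names are hypothetical, and the wrapper only shells out to the same ddcutil setvcp 10 command.

```python
import subprocess

# Assumptions taken from observing this particular screen, not from ddcutil:
# the monitor sits on I2C bus 0 and ignores brightness values above 60.
DDC_BUS = 0
MAX_RAW_BRIGHTNESS = 60


def brightness_command(percent: int, bus: int = DDC_BUS) -> list[str]:
    """Build a ddcutil command that sets VCP feature 0x10 (brightness)."""
    percent = max(0, min(100, percent))  # clamp user input to 0..100
    raw = round(percent * MAX_RAW_BRIGHTNESS / 100)
    return ["ddcutil", f"--bus={bus}", "setvcp", "10", str(raw)]


def set_brightness(percent: int) -> None:
    # check=True raises CalledProcessError if ddcutil exits with an error.
    subprocess.run(brightness_command(percent), check=True)
```

For example, set_brightness(50) would run ddcutil --bus=0 setvcp 10 30.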

Finally, we want to turn off the screen at night so that it doesn’t disturb grandma’s sleep. We can do this by sending Sway a command every morning and every evening.

let
  swaymsgService =
    { time, cmd }:
    {
      serviceConfig = {
        Type = "oneshot";
        WorkingDirectory = "/home/auto/";
      };
      startAt = "*-*-* ${time}";
      script = ''
        if [[ "$(whoami)" = "auto" ]]; then
          export SWAYSOCK="$(ls /run/user/1000/sway-*)"
          ${pkgs.sway}/bin/swaymsg "${cmd}"
        else
          echo "Not auto user"
        fi
      '';
    };
in
{
  # …
  # Turn off screen during the night
  #
  # IMPORTANT: The timers need to be manually `systemctl enable`'d for
  # them to actually start.
  systemd.user.services.turn-off-screen = swaymsgService {
    time = "20:00:00";
    cmd = "output * power off";
  };
  systemd.user.services.turn-on-screen = swaymsgService {
    time = "08:00:00";
    cmd = "output * power on";
  };
}

Excerpt from dor-qws-vid1.nix

Networking

With the base system in place, we can focus on networking. The most important requirement here is for the machine to connect to the Internet out of the box and for us to be able to ssh into it regardless of what NAT it’s placed behind.

First, we configure networkd to use the wired connection if it’s available.

networking.useNetworkd = true;
systemd.network.enable = true;
systemd.network.networks."10-lan" = {
  matchConfig.Name = "enp1s0";
  networkConfig.DHCP = "yes";
};
# Don't stall the boot if a cable isn't connected.
systemd.network.wait-online.enable = false;

Excerpt from dor-qws-vid1.nix

Next, we enable NetworkManager for wireless connectivity. We have to use nmtui to pre-configure the WiFi password. This could be done with flat files, but since this part is going to be different for every iteration of this device, I’m not going to bother.

# Use `nmtui` to configure the Wi-Fi networks.
networking.networkmanager = {
  enable = true;
  wifi = {
    powersave = false;
  };
};

Excerpt from dor-qws-vid1.nix

SSH

Now comes the important bit: we need a way to ssh into this machine. We could set up a VPN, but in my experience, VPNs tend to break unexpectedly when combined with residential networking. Instead, we’ll use plain old ssh reverse tunnels.

The way this works is that our videoconf device ssh’s into a “middleman” host. When we want to ssh into the videoconf, we first ssh into the middleman, then ssh from there into the videoconf.

The OpenSSH option for reverse tunnels is -R. We basically want the videoconf to be running ssh -N -R 32022:localhost:2022 middleman at all times. In this command, 2022 is the videoconf’s ssh port and 32022 is an arbitrary free port on the middleman host.

The key part of the above is “at all times”, so we need several more configuration options:

We want to connect even if the middleman host gets rebuilt with a new SSH keypair. So, we set UserKnownHostsFile to /dev/null and StrictHostKeyChecking to no.
If the videoconf’s connection to the middleman is interrupted in any way, we want to close the tunnel and then recreate it. The videoconf can detect this happening with the ServerAliveInterval and ServerAliveCountMax options.
We also want the middleman host to detect if the connection drops, so we add ClientAliveInterval and ClientAliveCountMax to its OpenSSH config.
We want the ssh command to exit if the tunnel fails somehow (e.g. if the port is already bound on the middleman host). This is what the ExitOnForwardFailure option does.

So, the full command looks like:

# ssh \
    -o "UserKnownHostsFile /dev/null" \
    -o "StrictHostKeyChecking no" \
    -o "ServerAliveInterval 30" \
    -o "ServerAliveCountMax 3" \
    -o "ExitOnForwardFailure yes" \
    -o "ControlMaster no" \
    -N \
    -i /root/.ssh/id_ed25519 \
    -p 2022 \
    -R 32022:localhost:2022 \
    mid@mid.abstractbinary.org

This is the complete command. We'll use the autossh-ng module to run it automatically and restart it whenever it exits.
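Conceptually, keeping the tunnel up "at all times" means running the command above in a loop and restarting it whenever it exits. The Python sketch below illustrates that idea. It is not how autossh-ng is implemented, and the function names tunnel_command and supervise are hypothetical. As a rule of thumb from the ssh_config documentation, ServerAliveInterval 30 with ServerAliveCountMax 3 means ssh gives up on an unresponsive middleman after roughly 30 × 3 = 90 seconds. At that point the loop starts a fresh connection.

```python
import subprocess
import time


def tunnel_command(remote_port: int, local_port: int, destination: str,
                   identity: str = "/root/.ssh/id_ed25519",
                   ssh_port: int = 2022) -> list[str]:
    """Build the reverse-tunnel ssh command shown above as an argv list."""
    options = {
        "UserKnownHostsFile": "/dev/null",
        "StrictHostKeyChecking": "no",
        "ServerAliveInterval": "30",
        "ServerAliveCountMax": "3",
        "ExitOnForwardFailure": "yes",
        "ControlMaster": "no",
    }
    cmd = ["ssh"]
    for key, value in options.items():
        cmd += ["-o", f"{key} {value}"]
    cmd += ["-N", "-i", identity, "-p", str(ssh_port),
            "-R", f"{remote_port}:localhost:{local_port}", destination]
    return cmd


def supervise(cmd: list[str], max_delay: float = 60.0) -> None:
    """Run cmd forever, restarting it whenever it exits (the autossh idea)."""
    delay = 1.0
    while True:
        started = time.monotonic()
        subprocess.run(cmd)  # blocks until ssh exits for any reason
        # If the tunnel stayed up for a while, reset the backoff; otherwise
        # back off exponentially so a broken middleman isn't hammered.
        if time.monotonic() - started > max_delay:
            delay = 1.0
        else:
            delay = min(delay * 2, max_delay)
        time.sleep(delay)
```

Usage would be supervise(tunnel_command(32022, 2022, "mid@mid.abstractbinary.org")), run as root. The backoff is a design choice of this sketch. It only matters when the middleman is down for a long time.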

Putting this all together, we first add a new mid user to the middleman host:

users.users.mid = {
  isSystemUser = true;
  createHome = true;
  home = "/home/mid";
  group = "mid";
  openssh.authorizedKeys.keyFiles =
    config.users.users.root.openssh.authorizedKeys.keyFiles ++ cfg.extraKeyFiles;

  # Unless set, the default shell is `nologin` which allows tunnels
  # to be formed, but doesn't allow the user to login or run commands.
  # shell = "${pkgs.coreutils}/bin/true";
};
users.groups.mid = { };
services.openssh.settings = {
  ClientAliveInterval = 30;
  ClientAliveCountMax = 3;
};

Excerpt from ab-middleman.nix

Then, we configure services.autossh-ng on the videoconf to create the tunnel at boot and recreate it when it fails:

services.autossh-ng.sessions = {
  # …
  forward-ssh = {
    user = "root";
    destination = "mid@mid.abstractbinary.org";
    extraArguments = "-i /root/.ssh/id_ed25519 -p 2022 -o \"ControlMaster no\" -R 32022:localhost:2022";
    hostKeyChecking = false;
    knownHostsFile = "/dev/null";
  };
};

Finally, we use the ProxyJump setting in our ~/.ssh/config to tell SSH to go via the middleman:

Host via-mid-dor-qws-vid1
  ProxyJump mid.abstractbinary.org:2022
  HostName localhost
  Port 32022
  User root

~/.ssh/config

Now, we can just use ssh via-mid-dor-qws-vid1 to connect to the videoconf. The fact that this is going through an intermediate host is abstracted from us.

VNC

With the above in place, we can reliably ssh into the videoconf. Strictly speaking, this is enough, but I’ve done enough tech support for older relatives to know that it is very useful to be able to see their screens.

Our two options are VNC and Remote Desktop (RDP). The latter is the more modern protocol and generally works better over bad connections, but all the RDP servers available in NixOS seem like a hassle to set up. So, we just use wayvnc.

It’s so easy to set up that we’ve already started it in the Sway section:

exec wayvnc 127.0.0.1 5900

Excerpt from ~/.config/sway/config

With the above, wayvnc is listening on port 5900 on localhost on the videoconf. To access this port, we set up another SSH reverse tunnel. It’s the same code as in the previous section, except that it forwards local port 5900 (exposed as 35900 on the middleman) instead of 2022:

services.autossh-ng.sessions = {
  # …
  forward-vnc = {
    user = "root";
    destination = "mid@mid.abstractbinary.org";
    extraArguments = "-i /root/.ssh/id_ed25519 -p 2022 -o \"ControlMaster no\" -R 35900:localhost:5900";
    hostKeyChecking = false;
    knownHostsFile = "/dev/null";
  };
};

Excerpt from dor-qws-vid1.nix

Now, we can just use a VNC client like KRDC to connect. The important KRDC settings are “Connect via SSH tunnel” and “Tunnel via loopback address”.

KRDC settings

Video conferencing

Finally, we get to the actual video conferencing bit. We use Jitsi Meet because it has never let me down and it runs in a web browser.

As a reminder, the solution we’re building has to be fully automatic. The flow “we call grandma and grandma presses button to answer” doesn’t work because grandma can’t reliably press a button.

Instead, the plan is to ssh into the videoconf and run a Selenium script that starts up Firefox, goes to Jitsi Meet, and joins a specific meeting.

At the system level, we need Firefox, geckodriver, and Selenium installed:

environment.systemPackages = with pkgs; [
  # …
  geckodriver
  (python3.withPackages (
    python-pkgs: with python-pkgs; [
      selenium
    ]
  ))
];
programs.firefox.enable = true;

Excerpt from dor-qws-vid1.nix

Then, we write a simple script to join a meeting. The only trickiness is that we need WAYLAND_DISPLAY set for Firefox to actually start, and we need to configure some permissions in the profile to allow webcam and microphone access.

#!/usr/bin/env python3

from selenium import webdriver
from selenium.webdriver.common.by import By
import os
import argparse

def main():
    parser = argparse.ArgumentParser("join-jitsi-meeting")
    parser.add_argument("meeting", help="Meeting code to join")
    args = parser.parse_args()
    print("Connecting to jitsi meeting '%s'…" % (args.meeting,))

    os.environ["WAYLAND_DISPLAY"] = "wayland-1"
    options = webdriver.FirefoxOptions()
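    # Selenium passes extra Firefox command-line arguments via add_argument().
    options.add_argument("-profile")
    options.add_argument("/home/auto/.config/mozilla/firefox/zfegmmhu.default/")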
    options.preferences["media.navigator.enabled"] = True
    options.preferences["permissions.default.microphone"] = 1
    options.preferences["permissions.default.camera"] = 1
    # options.binary_location = "/run/current-system/sw/bin/firefox"
    driver = webdriver.Firefox(options=options)

    driver.get("https://meet.jit.si")

    # https://www.selenium.dev/documentation/webdriver/elements/

    driver.implicitly_wait(1.0)
    room_text_box = driver.find_element(by=By.ID, value="enter_room_field")
    enter_room_button = driver.find_element(by=By.ID, value="enter_room_button")
    room_text_box.send_keys(args.meeting)
    enter_room_button.click()

    driver.implicitly_wait(1.0)
    name_text_box = driver.find_element(by=By.ID, value="premeeting-name-input")
    join_button = driver.find_element(by=By.XPATH, value="//div[@aria-label='Join meeting']")
    name_text_box.send_keys("Petra")
    join_button.click()

if __name__ == "__main__":
    main()

join-jitsi-meeting.py

This script will likely break whenever Jitsi change their website, but we have it deployed through Home Manager, so it will be easy to fix:

home-manager.users.auto =
  { ... }:
  {
    # …
    home.file.join_jitsi_meeting = {
      source = ./dor-qws-vid1/join-jitsi-meeting.py;
      target = "scripts/join-jitsi-meeting.py";
    };
  };

Excerpt from dor-qws-vid1.nix

The last piece of the puzzle is to make the videoconf ring. I just grabbed a phone ring sound file from Pixabay and we can play it with paplay telephone-ring.ogg.

Looking back

That’s it—we built a video conferencing device for grandma. We can manage it remotely, see its screen, and do video calls, all without grandma having to press any buttons.

Combining sine waves

2026-02-18T00:00:00Z

In the first post in this series, we generated sine waves. In this post, we combine sine waves together to explore harmonics, dissonance, and the 12-tone system used in Western music.


Waveform graphs

As a reminder, our model of a loudspeaker is a device that takes floating point numbers in the [-1.0, 1.0] range as inputs. Sending it -1.0 pulls the membrane as far as it goes, and sending it +1.0 pushes the membrane as far as possible. We produce sound by pushing and pulling the membrane.

Web Audio defaults to a sample rate of 48,000 samples per second, so we need to send that many floats every second. For performance reasons, instead of sending the numbers one by one, we send buffers of many numbers a few times a second. We’re using buffers of 1200 samples, or 0.025 seconds of sound, in these examples.

An example buffer would look like this, except with far more numbers:

[0.0, 0.38, 0.71, 0.92, 1.0, 0.92, 0.71, 0.38, 0.0, -0.38, ..]
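Generating such a buffer takes a single formula: sample i of a sine at frequency f is sin(2π · f · i / 48000), scaled by the volume. Here is a minimal Python sketch, assuming the 48,000 Hz rate and 1200-sample buffers above; the function name is hypothetical. The numbers in the example buffer match a 3000 Hz sine at that rate (exactly 16 samples per cycle), rounded to two decimals.

```python
import math

SAMPLE_RATE = 48_000  # samples per second
BUFFER_SIZE = 1_200   # 1200 / 48000 = 0.025 seconds of sound per buffer


def sine_buffer(frequency: float, volume: float = 1.0, start: int = 0,
                size: int = BUFFER_SIZE,
                sample_rate: int = SAMPLE_RATE) -> list[float]:
    """Return `size` membrane positions in [-volume, volume] for a sine wave.

    `start` is the index of the first sample, so consecutive buffers can be
    generated back to back without a jump between them.
    """
    return [
        volume * math.sin(2 * math.pi * frequency * (start + i) / sample_rate)
        for i in range(size)
    ]
```

Consecutive buffers would be sine_buffer(f, start=0), sine_buffer(f, start=1200), and so on. Passing the running sample index keeps the phase continuous, which avoids audible clicks at buffer boundaries.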

If we were to just plot these membrane-position numbers, we’d get a waveform graph. Below is the sine wave generator from the previous post, except now we also show the waveform it is generating.

The points in the displayed waveform are the actual values sent to the speaker, so this is not some sort of idealized representation.

If we increase the frequency of the sine wave, we get more ups-and-downs. Decrease the frequency and we get fewer ups-and-downs. Change the volume and we see the waveform’s height change in response (or more likely, the scale of the graph change).

Harmonics

Sine waves sound clean, but also unnatural. We will never hear a sine wave in nature—it’s just too idealized. That said, something does come close: a vibrating string produces a sine wave at a base frequency and then other weaker sine waves at integer-multiple frequencies.

For example, plucking a string tuned to 220 Hz produces a sine wave at 220 Hz, then increasingly weaker ones at 440 Hz, 660 Hz, 880 Hz, and so on. These integer-multiple frequencies are called harmonics. We start counting at one, so 220 Hz is the first harmonic, 440 Hz is the second, and so on.

Below is our sine wave generator, but with harmonics enabled:

If we turn off the first harmonic and activate just the second, we see that it’s just a sine wave with twice as many ups-and-downs. Let’s activate both first and second harmonics and look at the result.

When we add them together, they’re both going up initially, so the resulting wave goes up faster. The second harmonic then starts going down while the first is still going up, so they fight each other, and the result doesn’t reach as high. Both then go down, leading to a faster down curve. Then the second starts going up while the first is still going down and this creates a hump. Both then go down at the same time. This process creates the asymmetrical big-hump-little-hump pattern. Adding more harmonics adds more humps.

More harmonics make the sound “fuller”. Activating both even and odd harmonics makes it sound like an ideal vibrating string. Activating just the odd-numbered harmonics makes it sound like an ideal wind instrument. It also makes the waveform look like a bunch of cats waiting to be fed.
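A harmonic generator like the one above can be sketched in a few lines of Python. The post only says the higher harmonics get "increasingly weaker". This sketch assumes the n-th harmonic has amplitude 1/n, a common textbook choice that is not necessarily what the interactive generator uses. It combines the harmonics as a weighted average so that every sample stays inside [-1.0, 1.0]. All names are hypothetical.

```python
import math

SAMPLE_RATE = 48_000


def harmonic_numbers(count: int, odd_only: bool = False) -> list[int]:
    """The first `count` of 1, 2, 3, ... or of 1, 3, 5, ..."""
    step = 2 if odd_only else 1
    return [1 + step * i for i in range(count)]


def harmonic_frequencies(base: float, count: int,
                         odd_only: bool = False) -> list[float]:
    """Frequencies of the selected harmonics; the 1st harmonic is `base`."""
    return [n * base for n in harmonic_numbers(count, odd_only)]


def harmonic_buffer(base: float, count: int, odd_only: bool = False,
                    size: int = 1_200,
                    sample_rate: int = SAMPLE_RATE) -> list[float]:
    """Weighted average of the harmonics, assuming the n-th has amplitude 1/n.

    Dividing by the total weight keeps every sample inside [-1.0, 1.0].
    """
    numbers = harmonic_numbers(count, odd_only)
    weights = [1 / n for n in numbers]
    total = sum(weights)
    return [
        sum(w * math.sin(2 * math.pi * n * base * i / sample_rate)
            for n, w in zip(numbers, weights)) / total
        for i in range(size)
    ]
```

With these 1/n weights, adding more harmonics makes the waveform approach a sawtooth wave. Keeping only the odd harmonics makes it approach a square wave.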

Mixing sounds

We’re talking about combining sine waves, but this raises the question: how do we actually implement adding two sounds together?

Suppose we have the buffer for a sine wave:

[0.0, 0.38, 0.71, 0.92, 1.0, 0.92, 0.71, 0.38, 0.0, ..]

We want to add its second harmonic to it:

[0.0, 0.71, 1.0, 0.71, 0.0, -0.71, -1.0, -0.71, 0.0, ..]

How do we combine these two buffers into one? In these examples, we do the simple thing and just average them out:

[0.0, 0.54, 0.85, 0.82, 0.5, 0.11, -0.15, -0.16, 0.0, ..]

Adding the numbers pair-wise has the property that opposite motions of the speaker membrane cancel each other out as they should. The problem with addition is that we might get numbers outside the [-1.0, 1.0] range, so we need to scale down by some factor. We use the arithmetic average because it’s easy to code and easy to reason about.
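The averaging mixer described above amounts to a pair-wise sum divided by the number of inputs. Here is a minimal Python sketch (the name mix is hypothetical). It assumes every input buffer has the same length and lies in [-1.0, 1.0], so the result does too.

```python
def mix(*buffers: list[float]) -> list[float]:
    """Combine equally long buffers by averaging them sample by sample."""
    if not buffers:
        raise ValueError("need at least one buffer")
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("buffers must have the same length")
    # zip(*buffers) yields one tuple per sample position across all inputs.
    return [sum(samples) / len(buffers) for samples in zip(*buffers)]
```

Feeding it the sine buffer and its second harmonic from above gives the averaged buffer shown, up to rounding. Dividing by the number of inputs is the simplest safe scale factor. It is also what makes the mix quieter than it "should" be.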

Averaging sounds like this isn’t “correct” in that it usually makes the result sound quieter than it should. Fixing this isn’t easy because the human ear is the product of hundreds of millions of years of evolution. It is excellent at hearing the big cat sneak up on you in the tall grass, but it is only mediocre at hearing tones accurately. It perceives sound volume non-linearly and, worse yet, it perceives some frequencies as louder than others at the same amplitude: low frequencies in particular sound quieter than mid-range ones. If you want to learn more about this, the keyword to search for is “LUFS”.

Dissonance

On the topic of the human ear evolving to do things other than hearing tones properly, there are certain combinations of sounds we all find irritating. We call this dissonance. It’s easier to show than to explain, so here are two of our sine wave generators:

If we press both play buttons above, we hear two sine waves at the same time. The frequencies 294 and 440 Hz have a ratio of about 1.5 (440 / 294 ≈ 1.497). This is the first time we’ve heard two sines that are not an integer multiple of each other. The combination sounds pretty good.

Now let’s lower the 440 towards 294. We notice two things:

The waveform graph starts showing a repeating inflate-deflate pattern. This becomes really obvious at around 320.
The sound develops a “beating” quality to it. The closer the frequencies of the two sines, the slower the beating becomes. I personally really don’t like the combination of 294 with 290, 295, or 300.
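The beating follows from the sum-to-product identity sin A + sin B = 2 sin((A + B)/2) cos((A − B)/2). For two equal-volume sines at f1 and f2, the sum is a tone at their mean frequency whose loudness follows |2 cos(π(f1 − f2)t)|. That envelope peaks |f1 − f2| times per second, which is why closer frequencies beat more slowly. The Python sketch below checks the identity numerically; the function names are hypothetical.

```python
import math


def beat_frequency(f1: float, f2: float) -> float:
    """Loudness swells per second when two equal-volume sines are mixed."""
    return abs(f1 - f2)


def envelope(f1: float, f2: float, t: float) -> float:
    """Slowly varying amplitude of sin(2*pi*f1*t) + sin(2*pi*f2*t).

    By the sum-to-product identity the sum equals
    2 * cos(pi*(f1 - f2)*t) * sin(pi*(f1 + f2)*t): a tone at the mean
    frequency whose loudness follows |2 * cos(pi*(f1 - f2)*t)|.
    """
    return abs(2 * math.cos(math.pi * (f1 - f2) * t))
```

For 294 and 300 Hz that is 6 swells per second, and for 294 and 295 Hz just one.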

Now let’s activate harmonics 1 through 4 on both generators. Every additional harmonic adds another higher frequency beating to the combined sound making it more irritating.

Now let’s increase the frequency of the second generator back up to 440. The beating goes away for the most part and the combination sounds fine. I wouldn’t exactly call it pleasant sounding, but I also wouldn’t call it irritating.

It’s also interesting to hear how the “niceness” of the sound doesn’t simply increase linearly with the frequency difference. For example, I like 294+440 much more than 294+430 or 294+450. What’s going on here is that the 3rd harmonic of 294 is at 882 Hz and the 2nd harmonic of 430 is at 860 Hz, and that’s close enough together that it creates the irritating beating sound. This effect goes away if we turn off the 3rd harmonic in the generators.
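Finding such clashes is a matter of comparing every harmonic of one tone with every harmonic of the other. Below is a small Python sketch that reports pairs closer together than a threshold. The 30 Hz cut-off is an arbitrary value chosen for illustration, not a perceptual constant, and the function name is hypothetical. The gap it reports is also the rate at which that pair of harmonics beats.

```python
def close_harmonics(f1: float, f2: float, count: int = 4,
                    threshold: float = 30.0) -> list[tuple[int, int, float]]:
    """Harmonic pairs (n1 of f1, n2 of f2) less than `threshold` Hz apart.

    Returns (n1, n2, gap_in_hz) tuples, checking harmonics 1..count of each.
    """
    pairs = []
    for n1 in range(1, count + 1):
        for n2 in range(1, count + 1):
            gap = abs(n1 * f1 - n2 * f2)
            if gap < threshold:
                pairs.append((n1, n2, gap))
    return pairs
```

For 294 and 430 Hz with four harmonics each, the only close pair is the 3rd harmonic of 294 against the 2nd of 430, 22 Hz apart. For 294 and 450 Hz, the same pair lands 18 Hz apart.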

The short of it is that we don’t like hearing combinations of sine waves at very close frequencies. It’s also the case that we don’t like it when combinations have harmonics at very close frequencies. Studies have found that this is pretty universal among humans.

That’s all we’re going to say about dissonance here. If you want to learn more, Minute Physics has an excellent video on The Physics of Dissonance (YT link).

12 tones

We’ve been identifying sine waves by their frequencies. This is precise, but it’s awkward in the same way that talking about colors using their RGB hexcodes is awkward. As in, we can probably figure out what #0069a8 and #53eafd are, but it’s much clearer if we say “blue” and “light blue”. The words also tell us something about the relationship between the two that is not obvious from the hexcodes.

Instead of frequencies, let’s start using scientific pitch notation. This is what most people and tools have been using for the last century.

First, we split the entire sound spectrum into ranges such that each range is twice as large as the previous one. For historical reasons, these ranges are called octaves. In the simulator below, the columns are the octaves. For example, octave 3 spans 130.81 Hz to 261.63 Hz and octave 4 spans 261.63 Hz to 523.25 Hz.

Next, we chop each octave into 12 equally large parts. For historical reasons, we call these semitones. Since each octave is twice as large as the previous one, we have to do the split into semitones geometrically as well. So, the frequency of each semitone is $\sqrt[12]{2}$ times bigger than the previous one. The semitones are the rows in the simulator below.

Next, we give each semitone a name. For historical reasons, some of these are written as “notes”: C, D, E, F, G, A, and B. The others are written as “notes with accidentals”: C#, D#, F#, G#, and A#.

Finally, we can uniquely identify any tone by its note, accidental if present, and octave. Some important tones are:

C4 at 261.63 Hz, also called “middle C” because it sits near the middle of a piano keyboard,
A4 at 440 Hz, which is the reference frequency and also the A string on a violin, and
A#4 at 466.16 Hz, which is the “trust me bro, I can tune a violin by ear and this is definitely what A4 sounds like” tone.
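The frequencies listed above follow from counting semitones away from the A4 = 440 Hz reference and multiplying by $\sqrt[12]{2}$ once per semitone. Here is a minimal Python sketch; the names are hypothetical, and it only understands the sharp spellings used in this post (no flats).

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
SEMITONE_RATIO = 2 ** (1 / 12)  # each semitone is this many times the previous


def note_frequency(name: str, octave: int, a4: float = 440.0) -> float:
    """Frequency in Hz of a note in scientific pitch notation, e.g. ("C", 4)."""
    # Each octave is 12 semitones; within an octave, count from A's position.
    semitones_from_a4 = ((octave - 4) * 12
                         + NOTE_NAMES.index(name) - NOTE_NAMES.index("A"))
    return a4 * SEMITONE_RATIO ** semitones_from_a4
```

For example, C4 is 9 semitones below A4, giving 440 × 2^(-9/12) ≈ 261.63 Hz. A#4 is one semitone above, giving about 466.16 Hz.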

Most of the above notation is arbitrary—the only fundamental aspect is that since string and wind harmonics are multiples of each other, everything else also has to be in terms of multiples. As we saw in the section on dissonance, it’s very easy to get two sine waves that sound irritating when played together, especially when harmonics are involved. So, I’d say the primary purpose of music notation is to ensure everybody is playing the same frequencies and no dissonance slips in accidentally.

That said, splitting every octave into 12 semitones has some nice properties. For example, I knew before writing the dissonance section that 294 and 440 Hz would sound good together because they’re D4 and A4 which are a “perfect fifth”. Every pair of notes that are a fifth apart sounds good.
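A quick check of why the fifth works: it spans 7 semitones, so its frequency ratio in this system is $2^{7/12}$, just below the simple 3:2 ratio. As a result, the 3rd harmonic of D4 almost coincides with the 2nd harmonic of A4:

$$
\frac{f_{\mathrm{A4}}}{f_{\mathrm{D4}}} = 2^{7/12} \approx 1.4983 \approx \frac{3}{2},
\qquad
3 f_{\mathrm{D4}} = 3 \times 440 \cdot 2^{-7/12}\,\mathrm{Hz} \approx 880.99\,\mathrm{Hz} \approx 2 f_{\mathrm{A4}} = 880\,\mathrm{Hz}.
$$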

Next up

That’s it for now. In this post, we visualized sounds as waveform graphs, we introduced the idea of harmonics to talk about real strings, we saw how some combinations of sine waves sound dissonant, and we introduced a language to make talking about all this easier.

In other words, this post was all about combining sine waves in different ways. Soon, we’ll do the opposite and see how any sound can be decomposed into sine waves through the Fourier transform.

But first, we need to understand what decibels are because they’ll come up as the unit of measurement on every graph going forward.