On Code, and Other Things

Skipping Optional Fields in Prisma 5.x

2025-04-26T00:00:00+00:00

Introduction

I use Prisma ORM for database access in my TypeScript projects. I recently ran into an issue where I needed to skip optional fields in an upsert query. Setting undefined is not allowed in Prisma to avoid unexpected results. The prescribed solution - to set Prisma.skip - does not work in the version I’m using. Upgrading Prisma was not an option as I was in the middle of a feature.

So here’s what worked for me.

TL;DR: I wrote a function that builds args objects containing only the fields that are present in the incoming data object.

The schema

model PSP {
  id           Int      @id @default(autoincrement())
  userId       String
  pageId       String   @unique
  pageTitle    String
  companyURL   String?
  supportEmail String?
}

The last two fields are optional.

The query

A Prisma query for upserting in the case where none of the fields are optional would look like:

    const pspData = ....//Data object with values from the user
    ....

    const psp = await db.pSP.upsert({
        where: {
            pageId: pspData.pageId,
            userId: pspData.userId
        },
        create: {
            pageId: pspData.pageId,
            userId: pspData.userId,
            pageTitle: pspData.pageTitle,
            companyURL: pspData.companyURL,//Optional
            supportEmail: pspData.supportEmail,//Optional
        },
        update: {
            pageTitle: pspData.pageTitle,
            companyURL: pspData.companyURL,//Optional
            supportEmail: pspData.supportEmail,//Optional
        }
    });

Obviously, this does not work when any of the optional fields is undefined, as Prisma does not allow that.

The first approach is a brute force way where I created the args objects based on which fields were present in the incoming data object.

const getCreateUpdateArgs = (update: PSPUpdate): { createArgs: any, updateArgs: any } => {
    const createArgs: any = {}
    const updateArgs: any = {}
    if (update.pageTitle) {
        createArgs.pageTitle = update.pageTitle;
        updateArgs.pageTitle = update.pageTitle;
    }
    if (update.companyURL) {
        createArgs.companyURL = update.companyURL;
        updateArgs.companyURL = update.companyURL;
    }
    if (update.supportEmail) {
        createArgs.supportEmail = update.supportEmail;
        updateArgs.supportEmail = update.supportEmail;
    }
    createArgs.pageId = update.pageId;
    createArgs.userId = update.userId;

    return { createArgs, updateArgs };
}

I then used the function’s return values in the upsert query.

    const { createArgs, updateArgs } = getCreateUpdateArgs(pspData);

    const psp = await db.pSP.upsert({
        where: {
            pageId: pspData.pageId,
            userId: pspData.userId
        },
        create: {
            ...createArgs
        },
        update: {
            ...updateArgs
        }
    });

This works, but I did not like the idea of checking each field in the incoming data object. My JavaScript skills are not great, so after a bit of searching this is what I came up with:

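// Assumption: immutables lists the key fields that are set on create
// but must never change on update.
const immutables = ['pageId', 'userId'];

const getCreateUpdateArgs2 = (update: PSPUpdate): { createArgs: any, updateArgs: any } => {
    const createArgs: any = {}
    const updateArgs: any = {}

    if (typeof update !== 'object' || update === null) {
        throw new Error("Invalid update object");//Unexpected
    }

    if (Object.getOwnPropertyNames(update).length === 0) {
        return { createArgs, updateArgs };
    }

    Object.entries(update).forEach(([key, value]) => {
        // Truthiness check: skips undefined and null, but also '' and 0.
        if (!value) {
            return;
        }
        // Every present field goes into create; immutable keys stay out of update.
        createArgs[key] = value;
        if (!immutables.includes(key)) {
            updateArgs[key] = value;
        }
    });

    return { createArgs, updateArgs };
}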

This assumes a few things:

That there are no nested objects.
There are no fields we want to exclude from the update other than those listed in immutables.

I’m sure there are edge cases I’m not considering, but it works for now.

Summarizing SRE/Ops Podcasts Using an LLM

2025-02-03T00:00:00+00:00

Introduction

There are plenty of good SRE/Ops related podcasts out there. I follow a few of them and listen to episodes whose titles sound interesting. The problem with podcasts is that some episodes focus on one topic, and other episodes deal with a host of topics. In between there is filler and things that are not relevant to the topic but are necessary to carry on a conversation. Spending 30-60 minutes listening to podcasts is not always a great use of time.

A while ago I decided to create a tool that summarizes podcasts for me using an LLM. If I find the summary interesting enough or there is something that I want to learn about, I go and listen to the entire episode. Such a tool might be useful to others also, so I made a website for it - https://www.srenews.info.

I encourage you to listen to the complete episodes if you find a summary interesting - they are linked from each summary page.

My personal favorites include Google’s SRE Prodcast, Incidentally Reliable, and Slight Reliability.

Architecture

The architecture is pretty simple.

Behind the scenes:

The feed checker is not automatic yet. I run it manually for now. The code runs on Cloudflare Workers.
The checker triggers a TypeScript program which fetches YouTube metadata and the transcript. I chose TypeScript because Python packages are not available on Cloudflare Workers as of this writing; otherwise Python is my first choice for such things.
The transcript is fed into OpenAI which generates a summary.
The summary and the metadata are used to generate a post based on a Hugo template and pushed to Git.
Netlify deploys the site automatically on Git push.

So far the only costs I’m incurring for this are for the domain name and OpenAI’s API - and it’s worth it.

Today I Learned - How to Add Different Passport Bearer Auth Methods for Different Routes

2025-01-23T00:00:00+00:00

I have a route in ExpressJS that is protected by Passport bearer auth. The docs have a straightforward example which works if that is the only strategy you need.

import passport from 'passport';
import { Strategy } from 'passport-http-bearer';

passport.use(new Strategy(
    function (token: any, cb: any) {
        validateToken(token).then(userid => {
            if (userid) {
                return cb(null, userid)
            } else {
                return cb(null, false)
            }
        }).catch(error => {
            console.log("[EXT_ROUTES]Failed bearer auth", error);
            return cb(error);
        })
    }
));

which is used later like this:

app.use('/api/v1/ext',
    passport.authenticate('bearer', { session: false }),
    externalRoutes,
)

Now I need to add a route that is protected by a different bearer auth strategy. The official docs are not clear on how to do this.

It turns out that the string ‘bearer’ in the passport.authenticate call is the name under which the strategy is registered, and ‘bearer’ is simply the default name. passport.use accepts an optional name as its first argument, so defining a second strategy becomes:

passport.use("cached-bearer", new Strategy(
    function (token, cb) {
        validateCachedToken(token).then(userid => {
            if (userid) {
                return cb(null, userid);
            } else {
                return cb(null, false);
            }
        }).catch(error => {
            console.log("[EXT_ROUTES]Failed cached bearer auth", error);
            return cb(error);
        })
    }
));

and it can be used as:

app.use('/api/v1/inbound',
    passport.authenticate('cached-bearer', { session: false }),
    inboundRoutes,
);

This Stack Overflow answer pointed me towards the solution.

It’s curious how a lot of official docs don’t handle anything beyond the simplest cases, and also don’t explain the basics. The docs are the first thing I look at when doing something with a new library, and it’s often a struggle when they are not sufficient.

Today I Learned - How to Recover a GCP Instance With 0 Boot Disk Space

2025-01-06T00:00:00+00:00

I have a GCP instance that I bring up periodically using instance schedules to run database backups. My database provider has backups of its own but I have an additional backup in place. The GCP instance on boot runs a user script which:

Creates a database dump into a temporary file
Compresses the dump file with tar and gzip
Uploads it to a secure bucket

If this backup fails, I get alerted on Slack.

The Slack alert fired today. I checked the instance and it seemed to have booted up and gone down as expected. I booted it up manually, tried to SSH in, and got a public key error. Attempting to SSH from the browser in the GCP cloud console also failed with the same error.

I enabled serial port logging - and the logs showed that the instance booted up but failed to write the temporary backup files to disk due to lack of space.

Increasing the boot disk size seemed like one option. However, increasing the disk size from the GCP console only grows the disk, not the partition. So I needed a way to either grow the partition or delete some files to free up space. But since I could not log in to the instance, I could not do either.

The generally prescribed option here is to:

Create a snapshot of the boot disk
Create a new disk from the snapshot with a larger size
Boot with the new disk (or create a new instance with the new disk)

There seemed to be a shortcut - which was to “detach the boot disk” from the instance and attach it to a new temporary instance, and then:

Boot into the temporary instance
Delete files from the disk
Delete the temporary instance
Reattach the disk to the original instance

This worked.

The reason behind the disk going out of space was that my backup script was not deleting the temporary files after the backup was complete, which I fixed.
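One way to make this kind of cleanup automatic is to create the temporary files inside a directory that is removed when the backup finishes, whether it succeeds or fails. The following is a minimal Python sketch of that idea, not the actual backup script: the `dump` and `upload` callables are hypothetical stand-ins for the real database dump and the bucket upload.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import Callable


def run_backup(dump: Callable[[Path], None], upload: Callable[[Path], None]) -> None:
    """Dump, compress, and upload; temporary files are removed even on failure.

    `dump` and `upload` are hypothetical stand-ins for the real database
    dump command and the bucket upload.
    """
    # TemporaryDirectory deletes the directory and everything in it on exit,
    # whether the block finishes normally or raises.
    with tempfile.TemporaryDirectory(prefix="db-backup-") as tmp:
        dump_file = Path(tmp) / "dump.sql"
        archive = Path(tmp) / "dump.tar.gz"

        dump(dump_file)
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(dump_file, arcname=dump_file.name)
        upload(archive)
```

Because the cleanup is tied to the `with` block rather than to a final step in the script, a failed upload can no longer leave old dumps behind to fill the boot disk.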

On Customer Service

2024-11-17T00:00:00+00:00

This is a ruminative post, and I won’t be using bullet points.

I also learnt the difference between ruminative and ruminating while writing this.

Recently, I got one of my early customers for my SaaS IncidentHub. In our back and forth during the trial period, they happened to mention that one of the deciding factors for them in choosing to go with IncidentHub against a similar product was our “excellent support”.

But what exactly is support? For a software product, or any other product?

For me personally, the key element of support is being meaningfully responsive.

Being responsive is to let the customer know that I am aware of whatever it is that they need help with. Once that initial assertion is made, it is completely up to me - and not the customer - to let the customer know about progress with their problems, or any further roadblocks, until they are satisfied. Even if the end result is that I cannot solve their problem, I have to let them know.

I want to go back a bit since I’m musing anyway. Who is a customer? Are all users - paying and non-paying - also customers?

A customer is somebody to whom I am providing something - a product, a service, an assurance that something will be done in a specific time period. By this definition, if I’m in a team, all my team members are my customers. If I have a product that users are using, they are my customers.

My thoughts on customer service have been shaped by various people throughout my career, and I wanted to capture these thoughts in one place for two reasons. One - to provide clarity to myself. Two - to express gratefulness to those people even though I won’t be naming names.

My first job out of college was in a middleware company called Pramati. The product team I joined was building a J2EE appserver - which incidentally was also the first to be J2EE 1.3 certified in the world. Someday I would love to talk about that in another post. I don’t know what piece of luck landed me in that team after several failed interviews in other companies. I joined an experienced engineering team along with a few other fresh-out-of-college folks. It was a new, exciting experience for all of us. Perhaps what set the tone for the rest of my career was the way we used to work. A focus on technical excellence founded on depth of understanding, a friendly, easy-going atmosphere, and a company led by a visionary CEO and a co-founder who were far ahead of their time.

It was a small team. Engineering would often work closely with customer support. Every customer was valuable. We got to see firsthand how the support team interacted with customers, and we sometimes got on calls ourselves. Now that I look back at it, this experience - so early on in my career - created an awareness that many engineers seem to lack. It’s an eye-opening moment when you see the impact - bad or good - that your code has in an actual customer deployment.

It would still take me many years before I understood the full value of what I had learnt then.

Years later, in a different team, a boss taught me that the commonly understood definition of customer as somebody who pays you for your product or service is severely restrictive. A customer is actually anybody who depends on you for something, and thus includes your team members. My boss was trying to make us work better as a team - and I think he succeeded to some extent.

The importance of engineering folks working with the support team - or even as part of the support team for a few weeks - cannot be overstated. I think it should be part of every junior engineer’s training.

Today I Learned - How to Set Up GitHub Actions for a Node Monorepo

2024-06-27T00:00:00+00:00

A monorepo is a repository with more than one logical project. I was trying to set up GitHub Actions to build my repository, which has two distinct projects inside it. GitHub Actions runs your YAML workflow configuration on events such as a push. It turns out that setting this up for a monorepo is not so straightforward.

This is what my project structure looks like

project-root
  - frontend
    - package-lock.json
    - src/
    - ... 
  - backend
    - package-lock.json
    - src/
    - ... 

The autogenerated node.js.yml looks like this

name: Node.js CI

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [14.x, 16.x, 18.x]
        # See supported Node.js release schedule at https://nodejs.org/en/about/releases/
    steps:
    - uses: actions/checkout@v4
    - name: Use Node.js ${{ matrix.node-version }}
      uses: actions/setup-node@v3
      with:
        node-version: ${{ matrix.node-version }}
        cache: 'npm'
    - run: npm ci
    - run: npm run build --if-present

How do I specify that the jobs have to run inside each subdirectory? And there is no top level package-lock.json (which is where some of the proposed solutions break)

A combination of SO and Medium posts and a GitHub issue comment helped me to get to a working configuration. The key points are

Use the “matrix” keyword to declare the equivalent of “run this for d in directories” where directories is the list of your subdirectories
Specify a working-directory under defaults, with the value being $d from the matrix
Specify the cache-dependency-path with the same $d/package-lock.json
Add an npm install before doing anything

The final working YAML looks like this

# This workflow will do a clean installation of node dependencies, cache/restore them, build the source code and run tests across different versions of node
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-nodejs

name: Node.js CI

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix: { dir: ['./backend', './frontend'], node-version: ['20.x'] }
        # See supported Node.js release schedule at https://nodejs.org/en/about/releases/
    defaults:
      run:
        working-directory: ${{ matrix.dir }}
    steps:
    - uses: actions/checkout@v4
    - name: Use Node.js ${{ matrix.node-version }}
      uses: actions/setup-node@v3
      with:
        node-version: ${{ matrix.node-version }}
        cache: 'npm'
        cache-dependency-path: '${{ matrix.dir }}/package-lock.json'
    - run: npm install
    - run: npm ci
    - run: npm run build --if-present
#    - run: npm run test

Note that I did not have to include the node-version in the matrix - it is just to illustrate the syntax.

Thank you for reading, and do reach out via comments or on Twitter if you want to chat or share your thoughts.

Today I Learned - Disabling Object Listing for Google Storage Buckets with Public Access

2024-06-22T00:00:00+00:00

Using a Google Cloud Storage (GCS) bucket for static storage is a very easy way to serve static content over HTTPS. For this to work, public access has to be enabled on the bucket’s objects. The access should be read only at the public level and can be set using one of Google IAM’s predefined roles.

At first glance, the role for this would seem to be Storage Object Viewer, and that’s what I went with when setting up a bucket recently to serve images. However, this role also exposes a listing of the bucket contents as XML, which is not something you want.

It turns out that the appropriate role is Storage Legacy Object Reader. The difference between the roles can be seen in their permissions.

Storage Object Viewer has both list and get permissions (storage.objects.list and storage.objects.get), whereas Storage Legacy Object Reader has only the get permission (storage.objects.get).


Software Defined Networking - A Short Introduction

2024-04-19T00:00:00+00:00

Programmable Networks

Let’s take a simple example. To set up a simple network switch for your home network, you have to connect it using LAN cables to your router, and then to the individual devices - laptops, desktops, your NAS, and so on. You “define” the routes between the devices using cables and sockets. Switches are “smart” in the sense they learn routes to the devices connected to them. You cannot change the way the routes are defined - your switch has 6 or 8 or more ports which it knows about, and how to route packets between the devices. The routing logic is hardcoded in the device.

But what if you wanted to change the routing logic to something custom? And what if you had multiple switches, and you wanted to control the routing logic from one place? Traditional networking devices won’t allow you to do this.

Why Should You, an Ops Engineer, Know Anything About SDN?

The entire foundation of a cloud is based on the virtualization of resources - storage, compute, and network. The concepts of SDN make it easier to enable network virtualization. Clouds are fundamentally multi-tenanted, and having an idea of how things work behind the scenes can give us a better appreciation of what goes on under the hood. As a cloud user or admin, you don’t need to know how SDN works because it’s never exposed to you. Nevertheless, it can be fascinating especially if you are interested in networking, like I am.

The Control Plane and the Data Plane

Before going ahead we need to understand these two terms. There are two primary functions where data transfer is concerned - transferring the data itself and communicating the routing messages that define how the data should be transferred.

Control plane - The part of a networking infrastructure that decides how to handle traffic.

Data plane - The part of a networking infrastructure that forwards traffic according to the control plane’s decisions.

In addition, there is a “controller” which sits on top of the control plane and is the gateway to configure the data plane devices. A single software controller can control multiple data planes using APIs.

A Bit of History

Networking equipment used to be configurable using vendor-specific interfaces on individual devices. The evolution of SDN was driven by several factors, and happened in roughly three stages, as described in Feamster et al’s paper [1].

Active networks

Active networks were an initiative where network devices (we will refer to each one as a node) exposed internal resources using a network API. Packets passing through a node could undergo custom processing (in the data plane). This was born out of frustration with long timeframes to deploy new network services, the difficulty of customizing the network for specific applications, and researchers’ desire for the ability to experiment at a large scale.

Control and data plane separation

As commodity hardware became cheaper and more powerful, and ISPs had to manage larger networks, research projects started to focus on programmability of the control plane rather than the data plane (which had been the focus of active networking). Visibility into the entire network also became a requirement. There was some initial skepticism because separating the planes meant that the controller could fail independently of the forwarding devices (e.g. the control plane failing while the data plane continued to work), but similar problems already existed in existing hardware.

The OpenFlow API

OpenFlow is both a specification of the data plane functionality and a protocol between the controllers and the data plane devices in a setup where these two planes are separated. OpenFlow allowed packet forwarding rules to match on much more than the destination IP address, which is all that traditional devices matched on.

So what identifies a network as software-defined?

Separate control and data planes.
The control plane is a centralized controller or set of controllers that can view and control the entire network or networks and is implemented as software that can run on commodity hardware.
Data plane devices are “dumb” forwarding devices.
Well-known (public) interfaces exist between the control plane devices (controllers) and the data plane devices.
Other software can program the network using the SDN controllers to suit their needs.

The OpenFlow Protocol

The initial OpenFlow spec was born out of the idea that the already existing flow tables (or access control lists) in network devices could be used to describe newer packet forwarding behaviour [2]. There was also a need to divide (“slice”) the flow tables so that researchers could run experiments on production networks without impacting real-world traffic.

After separating the planes, admins could program the control plane from any operating system remotely, and thus define the flow tables the way they wanted without programming the device itself.
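To make the idea of a flow table concrete, here is a minimal Python sketch of how a controller might install rules and how a switch might look them up. It is a toy model, not the OpenFlow wire protocol or any real controller API: the field names (`ip_dst`, `tcp_dst`), the action strings, and the `FlowRule` and `FlowTable` classes are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlowRule:
    priority: int
    match: Dict[str, str]  # header field -> required value; absent fields are wildcards
    action: str            # illustrative action string, e.g. "output:2" or "drop"


@dataclass
class FlowTable:
    rules: List[FlowRule] = field(default_factory=list)

    def install(self, rule: FlowRule) -> None:
        """Called by the (simulated) controller to program the switch."""
        self.rules.append(rule)
        # Keep the highest priority first so lookup can stop at the first match.
        self.rules.sort(key=lambda r: r.priority, reverse=True)

    def lookup(self, packet: Dict[str, str]) -> Optional[str]:
        """Return the action of the highest-priority matching rule, if any."""
        for rule in self.rules:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return None  # table miss; what happens next depends on the switch setup


table = FlowTable()
table.install(FlowRule(10, {"ip_dst": "10.0.0.2"}, "output:2"))
table.install(FlowRule(20, {"ip_dst": "10.0.0.2", "tcp_dst": "22"}, "drop"))

ssh_packet = {"ip_src": "10.0.0.1", "ip_dst": "10.0.0.2", "tcp_dst": "22"}
web_packet = {"ip_src": "10.0.0.1", "ip_dst": "10.0.0.2", "tcp_dst": "80"}
print(table.lookup(ssh_packet))  # drop
print(table.lookup(web_packet))  # output:2
```

The second rule matches on a TCP port as well as the destination IP, which is the kind of forwarding decision a destination-only lookup cannot express.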

Virtualizing the Network

Network virtualization (NV) allows one or more virtual networks to exist on top of a shared physical network. Virtual networks predate the idea of SDN. The concept is related to SDN because SDN enables NV more easily.

If you use any cloud provider, you already use network virtualization. A VPC in AWS or GCP is just a virtual network, private to you, built on top of a shared physical one.

Set Up a Virtual Network on Your Own

Mininet is software that can create one or more virtual networks on a single machine, letting you play around with various network topologies on your laptop. The network devices are emulated in software.

You can run Mininet

From the CLI
Programmatically
Using miniedit - a rudimentary GUI. I don’t recommend this as it’s easy to trip on some bugs.

We’ll take the programmatic approach.

Installing Mininet is simple using your package manager if you’re on Linux.

On Debian-based distros, run

sudo apt install mininet

Although you can create complex topologies (Ring, Tree, and so on), we will create simple host and switch-based ones here. You can refer to the Mininet docs for more information.

A Simple Two-Host Network Topology

Source: https://github.com/talonx/mininet-demo/blob/main/basic_topology.py

from mininet.topo import Topo
from mininet.net import Mininet
from mininet.cli import CLI

class Basic(Topo):

    def __init__(self):
        Topo.__init__(self)

        h1 = self.addHost("h1")
        h2 = self.addHost("h2")

        s1 = self.addSwitch("s1")

        # Link each host to the switch. The bw/delay/loss options are only
        # enforced with a traffic-control link class, e.g. Mininet(topo=topo, link=TCLink).
        self.addLink(h1, s1, bw=1, delay="10ms", loss=0, max_queue_size=1000, use_htb=True)
        self.addLink(h2, s1, bw=1, delay="10ms", loss=0, max_queue_size=1000, use_htb=True)

topo = Basic()
net = Mininet(topo=topo) # Uses the default reference controller
net.start()
CLI(net)
net.stop()

This defines two hosts h1 and h2 and a switch connecting them using the addLink method. This forms the core of any Mininet topology definition.

There are different ways to start the network simulator, but the easiest is to invoke

net.start()

and then move into interactive mode with

CLI(net)

which will drop you into a command prompt (mininet>) where you can run various commands to interact with your virtual network.

Run the Python script by invoking

sudo python3 basic_topology.py

The dump command shows you the hosts and devices

mininet> dump

Note that Mininet created a default SDN controller since we did not provide one. This is sufficient for simple topologies.

Now try pinging h2 from h1

mininet> h1 ping h2 -c 3
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=4.45 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.280 ms
64 bytes from 10.0.0.2: icmp_seq=3 ttl=64 time=0.081 ms

--- 10.0.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2023ms
rtt min/avg/max/mdev = 0.081/1.602/4.445/2.011 ms

Type quit or exit to exit from the CLI.

Running Commands Inside the “Hosts”

Source: https://github.com/talonx/mininet-demo/blob/main/basic_topology_run_cmd.py

from mininet.topo import Topo
from mininet.net import Mininet

class Basic(Topo):

    def __init__(self):
        Topo.__init__(self)

        h1 = self.addHost("h1")
        h2 = self.addHost("h2")

        s1 = self.addSwitch("s1")

        self.addLink(h1, s1, bw=1, delay="10ms", loss=0, max_queue_size=1000, use_htb=True)
        self.addLink(h2, s1, bw=1, delay="10ms", loss=0, max_queue_size=1000, use_htb=True)

topo = Basic()
net = Mininet(topo=topo) # Uses the default reference controller
net.start()

# Look up the host by name and run a shell command inside it
h1 = net.get("h1")
res = h1.cmd("route -n")
print(res)

net.stop()

This illustrates running a command from inside one of the hosts.

A Multi-Switch Topology

Source: https://github.com/talonx/mininet-demo/blob/main/multi_switch.py

from mininet.topo import Topo
from mininet.net import Mininet
from mininet.cli import CLI

class Lan(Topo):

    def __init__(self):
        Topo.__init__(self)

        s1 = self.addSwitch("s1")
        s2 = self.addSwitch("s2")
        h1 = self.addHost("h1")
        h2 = self.addHost("h2")

        # Link host 1 to switch 1
        self.addLink(h1, s1, bw=1, delay="10ms", loss=0, max_queue_size=1000, use_htb=True)
        # Link host 2 to switch 2
        self.addLink(h2, s2, bw=1, delay="10ms", loss=0, max_queue_size=1000, use_htb=True)
        # Link switch 1 to switch 2
        self.addLink(s1, s2, bw=1, delay="10ms", loss=0, max_queue_size=1000, use_htb=True)

topo = Lan()
net = Mininet(topo=topo) # Uses the default reference controller
net.start()
CLI(net)
net.stop()

This illustrates two switches, with one host connected to each. Check the links created once you’re inside the CLI

mininet> links
h1-eth0<->s1-eth1 (OK OK)
h2-eth0<->s2-eth1 (OK OK)
s1-eth2<->s2-eth2 (OK OK)

And then ping h2 from h1 (they are not connected to the same switch)

mininet> h1 ping h2 -c 3
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.077 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.077 ms
64 bytes from 10.0.0.2: icmp_seq=3 ttl=64 time=0.080 ms

--- 10.0.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2025ms
rtt min/avg/max/mdev = 0.077/0.078/0.080/0.001 ms

Mininet uses the reference implementation of an SDN controller if you don’t specify anything.

There are other software controllers available

Pox - https://github.com/noxrepo/pox
Ryu - https://ryu-sdn.org/
ONOS - https://opennetworking.org/onos/
Open vSwitch - https://www.openvswitch.org/ (a software switch that these controllers can manage, rather than a controller itself)
OpenDaylight - https://www.opendaylight.org/

References

[1] The Road to SDN - An intellectual history of programmable networks - https://queue.acm.org/detail.cfm?id=2560327
[2] Cloud Native Data Center Networking: Architecture, Protocols, and Tools - Dinesh G. Dutt - https://www.oreilly.com/library/view/cloud-native-data/9781492045595/
[3] Foundations of Modern Networking - William Stallings - https://www.oreilly.com/library/view/foundations-of-modern/9780134175478/


How the Domain Name System Uses Anycast for Low Latency

2024-02-29T00:00:00+00:00

In this article, I will explore what Anycast is in internetworking and how it is used to reduce latency.

Anycast is a concept that involves a group of servers that share the same IP address, and the server that is closest to the client gets to serve the request. This definition raises some questions — won’t there be IP conflicts? How is “closest” determined? How does the request reach the “closest” server?

We will take the example of DNS throughout this article. All major DNS providers use Anycast.

First, let’s look at how DNS resolution works.

The Mechanics of DNS Resolution

A request for the domain google.com from your browser goes to a DNS resolver, which resolves it to an IP address. This resolution happens by querying nameservers recursively. Why recursively? Each query in the recursive process resolves one part of the domain and the process starts from the tail-end.

The resolver is usually on your laptop (e.g. unbound or systemd-resolved on Linux). It contacts one of the 13 root nameservers first. From there, it fetches the IP of the nameserver that knows about the TLD (top-level domain), in this case, .com. Next, it contacts the .com nameserver to ask who knows about google.com. The response is the IP of another nameserver, called an authoritative nameserver. The authoritative nameserver responds with the IP address of google.com. If we had queried for a subdomain of google.com (www.google.com or images.google.com) the query would have continued similarly.
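The walk from the root down to the authoritative nameserver can be sketched in a few lines of Python. This is a toy model with no real DNS traffic: the server names, the referral tables, and the addresses (taken from the 192.0.2.0/24 documentation range) are all made up for illustration.

```python
from typing import Dict, List, Tuple

# Toy data: each nameserver maps a zone suffix to the next server to ask.
# All server names and addresses here are invented for illustration.
REFERRALS: Dict[str, Dict[str, str]] = {
    "root-ns": {"com": "com-tld-ns"},
    "com-tld-ns": {"google.com": "google-auth-ns"},
}
# The authoritative server maps full names to their final addresses.
ANSWERS: Dict[str, Dict[str, str]] = {
    "google-auth-ns": {"google.com": "192.0.2.10", "www.google.com": "192.0.2.11"},
}


def resolve(name: str, start: str = "root-ns") -> Tuple[str, List[str]]:
    """Resolve `name`, returning the address and the servers contacted in order."""
    labels = name.split(".")
    server = start
    path = [server]
    while server not in ANSWERS:
        # Try suffixes starting from the tail end: "com", then "google.com", ...
        for i in range(len(labels) - 1, -1, -1):
            suffix = ".".join(labels[i:])
            if suffix in REFERRALS[server]:
                server = REFERRALS[server][suffix]
                path.append(server)
                break
        else:
            raise LookupError(f"{server} has no referral for {name}")
    return ANSWERS[server][name], path


address, servers = resolve("www.google.com")
print(address)  # 192.0.2.11
print(servers)  # ['root-ns', 'com-tld-ns', 'google-auth-ns']
```

Each entry in `servers` is a step where a different nameserver has to answer, which is exactly where the question of which physical machine responds comes up.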

What’s important to note here is the server that responds at each step.

What if one or all of the 13 root nameservers, or the other servers in the chain, were down or unreachable because of hardware failure, a damaged undersea cable, or a Distributed Denial-of-Service (DDoS) attack?

In reality, the 13 servers are 13 IP addresses, each backed by multiple actual servers. So are the TLD and authoritative nameservers. So which server actually responds to our DNS resolver query? That is where Anycast comes in.

We have to dive a bit into how routing works on the internet to understand this.

Routing on the Internet

The client (the resolver) gets the IP address of a root nameserver and it sends out a query — let’s call it Q. How is Q routed?

The internet is made up of Autonomous Systems (ASs) — blocks of network owned by different entities, many of them ISPs. Each AS knows how to route packets within its own network, and it advertises the network prefixes to which it can route. These prefixes include its own network prefix as well as other ASs it connects to and to which it can forward packets. Routers in one AS announce these advertisements using the Border Gateway Protocol (BGP) to routers in other ASs. This is how routers know how to send a packet originating in one AS to its destination. BGP is used for most of the routing on the internet.

Our Q will also be routed using BGP. If we had just one physical server for a root nameserver IP (let’s say 198.41.0.4), Q would be routed through various ASs until it reached the root nameserver. BGP would use its shortest path algorithm to send the packet to 198.41.0.4.

But what if

The client and 198.41.0.4 are far away (as measured by BGP).
There is a damaged undersea cable, further lengthening the path Q has to take.
198.41.0.4’s data center has a power failure. Q will never receive a response. Even if 198.41.0.4 were replicated behind a load balancer across multiple data centers, a single network disruption could still make it unreachable.

Anycast is used to mitigate such issues.

Anycast in a Nutshell

Multiple servers in different locations (ASs) announce the same address (198.41.0.4 in this example) to their routers. This is possible because internet routes for a particular prefix can originate from multiple ASs. BGP uses this information to create routes. When Q reaches the first router in its own AS, that router picks the shortest path it knows to 198.41.0.4. The router might have multiple paths to 198.41.0.4, which in reality lead to different servers, but it treats them as different routes to the same endpoint. Depending on the client’s location, and thus the path, the actual server behind 198.41.0.4 might be different. The packet gets routed to the closest server that answers to 198.41.0.4. A client in a different location might get routed to another server that also answers to 198.41.0.4.
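To make this concrete, here is a small sketch that works out which site each network ends up reaching when two sites announce the same address. It assumes "closest" simply means the fewest hops in a made-up topology; the node names (`client-eu`, `site-frankfurt`, and so on) and the `catchments` function are hypothetical.

```typescript
// Toy anycast model: every node reaches whichever announcing site is the
// fewest hops away. The topology below is invented for illustration.
type Catchment = { site: string; hops: number };

const links: Array<[string, string]> = [
  ["client-eu", "transit-1"],
  ["transit-1", "site-frankfurt"],
  ["transit-1", "transit-2"],
  ["transit-2", "site-tokyo"],
  ["client-asia", "transit-2"],
];

function buildGraph(edges: Array<[string, string]>): Map<string, string[]> {
  const graph = new Map<string, string[]>();
  for (const [a, b] of edges) {
    graph.set(a, [...(graph.get(a) ?? []), b]);
    graph.set(b, [...(graph.get(b) ?? []), a]);
  }
  return graph;
}

// Breadth-first search started from all announcing sites at once.
function catchments(graph: Map<string, string[]>, announcingSites: string[]): Map<string, Catchment> {
  const reached = new Map<string, Catchment>();
  const queue: string[] = [];
  for (const site of announcingSites) {
    reached.set(site, { site, hops: 0 });
    queue.push(site);
  }
  while (queue.length > 0) {
    const current = queue.shift()!;
    const { site, hops } = reached.get(current)!;
    for (const next of graph.get(current) ?? []) {
      if (reached.has(next)) continue; // already claimed by a closer site
      reached.set(next, { site, hops: hops + 1 });
      queue.push(next);
    }
  }
  return reached;
}

const graph = buildGraph(links);
const normal = catchments(graph, ["site-frankfurt", "site-tokyo"]);
const frankfurtDown = catchments(graph, ["site-tokyo"]);
console.log(normal.get("client-eu"), normal.get("client-asia"));
console.log(frankfurtDown.get("client-eu"));
```

In the second run the Frankfurt site stops announcing, and the European client is still served, just by a site that is further away.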

The servers are geographically distributed, which helps to prevent or minimize disruptions from outages.

If you think about this for a moment, some things stand out:

- All servers that answer to 198.41.0.4 must have the same information.
- The admin of 198.41.0.4 should be able to announce the same IP address into the routing system at multiple points.

Anycast is not a different protocol, does not need any different hardware, and does not require any special capabilities.

Anycast for Stateful Applications

The original RFC describing Anycast raised some interesting points.

What stops a packet from reaching multiple servers since the Internet Protocol (IP) is allowed to misroute and duplicate packets?

Imagine that Q reaches its target but the ACK is lost. The packet will get redelivered, but will it reach the same server as before or a different server? What if BGP’s shortest path algorithm determines a new path in the time between the redelivery attempts because of a change? Does it even matter?

At the network level, these issues matter for stateful protocols like TCP. A TCP connection will be reset if a routing change sends its packets to a different server mid-connection.

DNS queries mostly use UDP, so these issues don’t arise. However, if other applications using UDP attempt to maintain stateful connections, they might run into such issues when using Anycast. The RFC goes on to say:

“The obvious solutions to these issues are to require applications which wish to maintain state to learn the unicast address of their peer on the first exchange of UDP datagrams or during the first TCP connection and use the unicast address in future conversations.”
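The RFC's suggestion can be sketched with a small simulation. Everything here is an assumption made for illustration: the `send` function, the `anycastTarget` variable standing in for a routing decision, and the addresses (from documentation ranges). The only point is that state kept on one server survives a routing change if the client switches to that server's unicast address after the first exchange.

```typescript
// Toy simulation: learn the peer's unicast address on first contact, then
// keep using it. Addresses and helper names are invented for illustration.
type Server = { unicast: string; sessions: Map<string, number> };

const ANYCAST_ADDRESS = "192.0.2.53";

const servers: Server[] = [
  { unicast: "203.0.113.1", sessions: new Map() },
  { unicast: "203.0.113.2", sessions: new Map() },
];

// Stand-in for routing: index of the server the anycast address reaches now.
let anycastTarget = 0;

// Delivers one message and returns who replied plus that server's
// per-session message count (the "state" we want to preserve).
function send(destination: string, sessionId: string): { from: string; count: number } {
  const server =
    destination === ANYCAST_ADDRESS
      ? servers[anycastTarget]
      : servers.find((s) => s.unicast === destination);
  if (!server) throw new Error(`no server reachable at ${destination}`);
  const count = (server.sessions.get(sessionId) ?? 0) + 1;
  server.sessions.set(sessionId, count);
  return { from: server.unicast, count };
}

// First exchange goes to the anycast address; the reply reveals the unicast one.
const first = send(ANYCAST_ADDRESS, "session-1");

// A routing change now points the anycast address at a different server.
anycastTarget = 1;

// Following up via the learnt unicast address reaches the same server.
const viaUnicast = send(first.from, "session-1");
// Following up via anycast lands on a server that has never seen the session.
const viaAnycast = send(ANYCAST_ADDRESS, "session-1");

console.log(viaUnicast, viaAnycast);
```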

What about the application level? Can we use Anycast?

We can, in theory, but it would restrict application capabilities and raise new challenges:

- Transactions will have to be short-lived enough to get routed to the same server.
- Servers will have to synchronize data between themselves.

But wait, many Content Delivery Networks (CDNs) also use Anycast for data transfer over HTTP. HTTP is a stateless protocol but uses TCP, which is stateful. So how does it work?

CDNs mostly serve short-lived, static content that can be served with a single request, so this issue is of low importance in such cases. Some studies also indicate that servers rarely “switch” mid-connection due to routing changes (PDF link). For longer-lived connections, some services use the address reached via Anycast only to redirect the client to a nearby server, possibly co-located in the same data center. All subsequent communication happens with that server.

Anycast is not a load-balancing mechanism. Also, BGP’s path selection relies on the AS_PATH attribute, preferring the path that traverses the fewest ASs. It does not take into account network delays or capacity.
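A tiny example of why this matters, using invented numbers: two candidate routes to the same anycast address, where the one with the shorter AS path happens to be much slower. The AS numbers, latencies, and function names are all hypothetical.

```typescript
// Two made-up candidate routes to the same anycast address.
type CandidateRoute = { asPath: number[]; latencyMs: number };

const candidateRoutes: CandidateRoute[] = [
  { asPath: [64501, 64500], latencyMs: 180 },       // 2 ASs, distant site
  { asPath: [64502, 64503, 64510], latencyMs: 25 }, // 3 ASs, nearby site
];

// What a path-length comparison would choose.
const pickShortestAsPath = (routes: CandidateRoute[]): CandidateRoute =>
  routes.reduce((best, r) => (r.asPath.length < best.asPath.length ? r : best));

// What a latency-aware choice would look like (BGP does not do this).
const pickLowestLatency = (routes: CandidateRoute[]): CandidateRoute =>
  routes.reduce((best, r) => (r.latencyMs < best.latencyMs ? r : best));

const chosenByAsPath = pickShortestAsPath(candidateRoutes);
const chosenByLatency = pickLowestLatency(candidateRoutes);
console.log(chosenByAsPath.latencyMs, chosenByLatency.latencyMs);
```

Here the route with fewer ASs wins even though the other one would be several times faster, so the "closest" server in BGP terms is not always the fastest one.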

Anycast in DDoS Mitigation

Anycast is also used to mitigate DDoS attacks. Eliminating a single point of failure improves the resiliency of the service. In the case of DNS, the root nameservers are replicated. During an attack, the bulk of the DDoS traffic can be confined to the sites closest to the attackers, so the attack does not take down the entire service.

Anycast remains a key mechanism for reducing latency in global internet services like DNS and CDNs, and most big providers run it in production.

References

List of Anycast-related RFCs
- Host Anycasting Service https://www.rfc-editor.org/info/rfc1546
- Distributing Authoritative Name Servers via Shared Unicast Addresses https://www.rfc-editor.org/info/rfc3258
- Operation of Anycast Services https://www.rfc-editor.org/info/rfc4786
- Architectural Considerations of IP Anycast https://www.rfc-editor.org/info/rfc7094
Submarine Cable Map https://www.submarinecablemap.com/

Thank you for reading, and do reach out via comments or on Twitter if you want to chat or share your thoughts.

Some New Tech I Learnt in 2023

2024-02-05T00:00:00+00:00

I took a career break in the middle of last year to pursue some of my other interests - both technical and non-technical.

In the process I ended up tinkering with quite a few new things in tech, and I’ve made some of them part of my ongoing and future work.

- I explored the fast.ai API using Jeremy Howard’s fantastic set of videos. I completed most of Part 1 of his Practical Deep Learning course. I learnt some of the fast.ai API, a few deep learning concepts, and how to use Jupyter notebooks on Kaggle.
- I wrote a Twitter bot that tweets classical music trivia. This was just for fun. It’s built with Python, Google Cloud Functions, and Firebase. The thing does not cost me a single penny as it runs entirely on GCP’s free tier. I learnt about Twitter bot types, Twitter’s auth API, and a bit of Firebase.
- While working on a frontend for a personal project, I had to try out Hotwire, which a friend had mentioned. I did a few tutorials which used Ruby on Rails (RoR). My RoR knowledge is essentially zero, but Hotwire is not tied to RoR - it can be used with any backend.
- For the same project I had to use Java Template Engine. It’s a good thing that it comes with Spring Boot integration.
- Spring Boot persistence with JPA/Hibernate. This was for a personal project that I’m still working on. JPA is a beast, and I’m thinking of switching to something simpler, if it exists.
- Rust. This is a whole new way of thinking. It is also a lot of struggling with the compiler for the first few months.