<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/">

    <channel>
        <title>Martin Kleppmann's blog</title>
        <atom:link href="http://martin.kleppmann.com/feed.rss" rel="self" type="application/rss+xml" />
        <link>http://martin.kleppmann.com/</link>
        <description></description>
        <lastBuildDate>Tue, 24 Feb 2026 17:01:34 GMT</lastBuildDate>
        <language>en</language>
        <sy:updatePeriod>hourly</sy:updatePeriod>
        <sy:updateFrequency>1</sy:updateFrequency>

        
        
            <item>
                <title>Prediction: AI will make formal verification go mainstream</title>
                <link>http://martin.kleppmann.com/2025/12/08/ai-formal-verification.html</link>
                <comments>http://martin.kleppmann.com/2025/12/08/ai-formal-verification.html#disqus_thread</comments>
                <pubDate>Mon, 08 Dec 2025 00:00:00 +0000</pubDate>
                <dc:creator>Martin Kleppmann</dc:creator>
                
                    <guid isPermaLink="true">http://martin.kleppmann.com/2025/12/08/ai-formal-verification.html</guid>
                
                <description><![CDATA[ Much has been said about the effects that AI will have on software development, but there is an angle I haven’t seen talked about: I believe that AI will bring formal verification, which for decades has been a bit of a fringe pursuit, into the software engineering mainstream. Proof assistants... ]]></description>
                <content:encoded><![CDATA[
                    <p>Much has been said about the effects that AI will have on software development, but there is an
angle I haven’t seen talked about: I believe that AI will bring formal verification, which for
decades has been a bit of a fringe pursuit, into the software engineering mainstream.</p>

<p>Proof assistants and proof-oriented programming languages such as <a href="https://rocq-prover.org/">Rocq</a>,
<a href="https://isabelle.in.tum.de/">Isabelle</a>, <a href="https://lean-lang.org/">Lean</a>,
<a href="https://fstar-lang.org/">F*</a>, and <a href="https://agda.readthedocs.io/">Agda</a> have been around for a long
time. They make it possible to write a formal specification that some piece of code is supposed to
satisfy, and then mathematically prove that the code <em>always</em> satisfies that spec (even on weird
edge cases that you didn’t think of testing). These tools have been used to develop some large
formally verified software systems, such as an <a href="https://sel4.systems/">operating system kernel</a>,
a <a href="https://compcert.org/">C compiler</a>, and a
<a href="https://project-everest.github.io/">cryptographic protocol stack</a>.</p>
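<p>To make this concrete, here is a toy example of my own (not taken from any of the projects above) of what such a machine-checked guarantee looks like in Lean 4:</p>

```lean
-- A toy function and a spec it must satisfy. The `theorem` is the spec;
-- Lean's kernel checks the proof, so the property holds for *every* n,
-- not just the inputs we happened to test.
def double (n : Nat) : Nat := n + n

theorem double_even (n : Nat) : ∃ k, double n = 2 * k :=
  ⟨n, (Nat.two_mul n).symm⟩
```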

<p>At present, formal verification is mostly used by research projects, and it is
<a href="https://hillelwayne.com/post/why-dont-people-use-formal-methods/">uncommon</a> for industrial software
engineers to use formal methods (even those working on classic high-assurance software such as
medical devices and aircraft). The reason is that writing those proofs is both very difficult
(requiring PhD-level training) and very laborious.</p>

<p>For example, as of 2009, the formally verified seL4 microkernel consisted of 8,700 lines of C code,
but proving it correct required 20 person-years and
<a href="https://www.sigops.org/s/conferences/sosp/2009/papers/klein-sosp09.pdf">200,000 lines</a> of Isabelle
code – or 23 lines of proof and half a person-day for every single line of implementation. Moreover,
there are maybe a few hundred people in the world (wild guess) who know how to write such proofs,
since it requires a lot of arcane knowledge about the proof system.</p>

<p>To put it in simple economic terms: for most systems, the expected cost of bugs is lower than the
expected cost of using the proof techniques that would eliminate those bugs. Part of the reason is
perhaps that bugs are a negative externality: it’s not the software developer who bears the cost of
the bugs, but the users. But even if the software developer were to bear the cost, formal
verification is simply very hard and expensive.</p>

<p>At least, that was the case until recently. Now, LLM-based coding assistants are getting pretty good
not only at writing implementation code, but also at
<a href="https://www.nature.com/articles/s41586-025-09833-y">writing</a>
<a href="https://www.galois.com/articles/claude-can-sometimes-prove-it">proof scripts</a> in
<a href="https://arxiv.org/pdf/2503.14183v1">various languages</a>. At present, a human with specialist
expertise still has to guide the process, but it’s not hard to extrapolate and imagine that process
becoming fully automated in the next few years. And when that happens, it will totally change the
economics of formal verification.</p>

<p>If formal verification becomes vastly cheaper, then we can afford to verify much more software. But
on top of that, AI also creates a <em>need</em> to formally verify more software: rather than having humans
review AI-generated code, I’d much rather have the AI prove to me that the code it has generated is
correct. If it can do that, I’ll take AI-generated code over handcrafted code (with all its
artisanal bugs) any day!</p>

<p>In fact, I would argue that writing proof scripts is one of the best applications for LLMs. It
doesn’t matter if they hallucinate nonsense, because the proof checker will reject any invalid proof
and force the AI agent to retry. The proof checker is a small amount of code that is itself
verified, making it virtually impossible to sneak an invalid proof past the checker.</p>
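<p>The generate-and-check loop described above can be sketched as follows. This is a hypothetical illustration, not a real system: <code>ask_llm</code> and <code>check_proof</code> stand in for an actual model and an actual proof checker.</p>

```python
def verified_codegen(spec, ask_llm, check_proof, max_attempts=10):
    """Keep asking the model for code + proof until the checker accepts.

    ask_llm(spec, feedback) -> (code, proof): hypothetical model call;
    check_proof(spec, code, proof) -> (ok, feedback): the trusted gatekeeper.
    """
    feedback = None
    for _ in range(max_attempts):
        # The model may hallucinate an invalid proof here...
        code, proof = ask_llm(spec, feedback)
        # ...but the checker rejects it and the feedback drives a retry.
        ok, feedback = check_proof(spec, code, proof)
        if ok:
            return code, proof  # accepted: code provably satisfies spec
    raise RuntimeError("no checker-accepted proof found")
```

The key design point is that only <code>check_proof</code> needs to be trusted; the model itself can be arbitrarily unreliable.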

<p>That doesn’t mean software will suddenly be bug-free. As the verification process itself becomes
automated, the challenge will move to correctly defining the specification: that is, how do you know
that the properties that were proved are actually the properties that you cared about? Reading and
writing such formal specifications still requires expertise and careful thought. But writing the
spec is vastly easier and quicker than writing the proof by hand, so this is progress.</p>

<p>I could also imagine AI agents helping with the process of writing the specifications, translating
between formal language and natural language. Here there is the potential for subtleties to be lost
in translation, but this seems like a manageable risk.</p>

<p>I find it exciting to think that we could just specify in a high-level, declarative way the
properties that we want some piece of code to have, and then vibe code the implementation along
with a proof that it satisfies the specification. That would totally change the nature of software
development: we wouldn’t even need to bother looking at the AI-generated code any more, just like we
don’t bother looking at the machine code generated by a compiler.</p>

<p>In summary:</p>

<ol>
  <li>formal verification is about to become vastly cheaper;</li>
  <li>AI-generated code needs formal verification so that we can skip human review and still be sure that it works;</li>
  <li>the precision of formal verification counteracts the imprecise and probabilistic nature of LLMs.</li>
</ol>

<p>These three things taken together mean formal verification is likely to go mainstream in the foreseeable
future. I suspect that soon the limiting factor will not be the technology, but the culture change
required for people to realise that formal methods have become viable in practice.</p>

<p><strong>Updates:</strong></p>

<ul>
  <li><a href="https://arxiv.org/abs/2509.22908">This paper</a> from September 2025 coins the term “vericoding” (in
contrast to “vibecoding”) to describe using LLMs to generate formally verified code, and presents
some benchmark results for several languages.</li>
  <li>A few startups are on the case: I’ve heard reports that
<a href="https://aristotle.harmonic.fun/">Harmonic’s Aristotle</a>,
<a href="https://logicalintelligence.com/">Logical Intelligence</a>, and
<a href="https://arxiv.org/pdf/2504.21801">DeepSeek-Prover-V2</a>
are getting pretty good at writing Lean proofs.</li>
</ul>

                ]]></content:encoded>
            </item>
        
            <item>
                <title>Pudding: user discovery for anonymity networks</title>
                <link>http://martin.kleppmann.com/2024/07/05/pudding-user-discovery-anonymity-networks.html</link>
                <comments>http://martin.kleppmann.com/2024/07/05/pudding-user-discovery-anonymity-networks.html#disqus_thread</comments>
                <pubDate>Fri, 05 Jul 2024 00:00:00 +0000</pubDate>
                <dc:creator>Martin Kleppmann</dc:creator>
                
                    <guid isPermaLink="true">http://martin.kleppmann.com/2024/07/05/pudding-user-discovery-anonymity-networks.html</guid>
                
                <description><![CDATA[ I’d like to introduce an exciting new research paper I worked on! It’s about a system called Pudding, and it was presented by Ceren at the IEEE Symposium on Security and Privacy, one of the top academic conferences on computer security, in May. Daniel and Alastair also worked on this... ]]></description>
                <content:encoded><![CDATA[
                    <p>I’d like to introduce an exciting new research paper I worked on! It’s about a system called
<a href="https://arxiv.org/abs/2311.10825">Pudding</a>, and it was presented by
<a href="https://twitter.com/ckocaogullar1">Ceren</a> at the
<a href="https://sp2024.ieee-security.org">IEEE Symposium on Security and Privacy</a>, one of the top academic
conferences on computer security, in May. <a href="https://www.danielhugenroth.com/">Daniel</a> and
<a href="https://www.cl.cam.ac.uk/~arb33/">Alastair</a> also worked on this project. Ceren’s presentation
<a href="https://www.youtube.com/watch?v=EEUdslTwYZ8">is now available</a>:</p>

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/EEUdslTwYZ8?si=8XNlKyv_Y5HZdGou" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<p>Let me briefly explain what the paper is about.</p>

<p>Anonymity systems allow internet users to hide who is communicating with whom – for example, think of
a whistleblower talking to a journalist, or a group of activists organising protests against their
repressive regime. <a href="https://www.torproject.org/">Tor</a> is the most popular anonymity network;
<a href="https://nymtech.net/">Nym</a> is a more recent design with stronger security (and incidentally, one of
the better cryptocurrency applications I’ve seen). Nym is based on a research system called
<a href="https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/piotrowska">Loopix</a>.</p>

<p>The trouble with these anonymity networks is that if you want to contact someone, you need to know
their public key, and sometimes a bunch of other information as well. In the case of Tor, this is
encoded in an “<a href="https://community.torproject.org/onion-services/">onion service</a>” URL, which is an
unreadable sequence of random letters and numbers (sometimes service operators use brute force to
pick a public key so that the first few letters of the hostname spell out the name of the service,
but the rest remains random). In Nym, it’s an
<a href="https://nymtech.net/docs/architecture/addressing-system.html">even longer base58 string</a>. How are
users supposed to find the correct key for the person they’re trying to contact? If they send the
key via a non-anonymous channel or query a server, they leak the information of who is talking to
whom, which defeats the entire purpose of the anonymity network.</p>

<p>Having to manually exchange public keys is a huge step backwards in terms of usability. A big part
of why WhatsApp and Signal succeeded in bringing end-to-end encryption to billions of users, while
PGP failed, is that today’s secure messaging apps allow you to find your friends using only a phone
number or some other friendly username, while PGP encouraged
<a href="https://en.wikipedia.org/wiki/Key_signing_party">weird, nerdy, in-person meetings</a> for exchanging keys.</p>

<p>Pudding brings friendly usernames to the Loopix/Nym anonymity networks, so that users don’t have to
deal with long random strings. We used email addresses rather than phone numbers, for reasons
explained in the paper, but the idea is the same. The challenge is providing the username lookup in
a way that doesn’t leak who is talking to whom. In fact, Pudding even goes further and hides whether
a given username is registered to the network or not.</p>

<p>If you’re wondering how this work on anonymity relates to my other work on
<a href="https://crdt.tech/">CRDTs</a>/<a href="https://www.inkandswitch.com/local-first/">local-first software</a>: I see
anonymity networks as one possible transport layer on top of which we can build decentralised
collaboration software. Not all collaboration apps will need the metadata privacy of an anonymity
network, but it’s nice to be able to support high-risk users, such as investigative journalists, who
do have strong security needs.</p>

<p>If you want to learn more, please <a href="https://www.youtube.com/watch?v=EEUdslTwYZ8">watch the talk</a>,
<a href="https://arxiv.org/abs/2311.10825">read the paper</a>, or
<a href="https://github.com/ckocaogullar/pudding-protocol">check out the source code</a>! Just note that the
implementation is a research prototype and not fit for production use. We’re hoping that Nym might
officially adopt something like Pudding in the future.</p>

                ]]></content:encoded>
            </item>
        
            <item>
                <title>2023 year in review</title>
                <link>http://martin.kleppmann.com/2024/01/04/year-in-review.html</link>
                <comments>http://martin.kleppmann.com/2024/01/04/year-in-review.html#disqus_thread</comments>
                <pubDate>Thu, 04 Jan 2024 00:00:00 +0000</pubDate>
                <dc:creator>Martin Kleppmann</dc:creator>
                
                    <guid isPermaLink="true">http://martin.kleppmann.com/2024/01/04/year-in-review.html</guid>
                
                <description><![CDATA[ A lot has happened in the last year, so I thought it would be good to write up a review. My biggest change in 2023 was that my wife and I had a baby! This has brought a mixture of joys and frustrations, but overall it has been very good.... ]]></description>
                <content:encoded><![CDATA[
                    <p>A lot has happened in the last year, so I thought it would be good to write up a review.</p>

<p>My biggest change in 2023 was that my wife and I had a baby! This has brought a mixture of joys and frustrations, but overall it has been very good. I took three months of full-time parental leave after the birth, and since going back to work I’ve been sharing the parenting responsibilities with my partner. Family has therefore been my top priority, but I won’t talk much about family things in this post, since I prefer to keep it private. Lots of work things happened as well:</p>

<h2 id="new-job">New job!</h2>

<p>As of January 2024 I have a new job as <a href="https://www.cst.cam.ac.uk/news/new-associate-professor-computer-security-and-privacy">Associate Professor in Cambridge</a>! Unlike all my previous academic positions, which were fixed-term contracts of a few years, this is a permanent position. A huge number of people apply for this sort of position, and so I feel very fortunate that my colleagues had faith in my work and decided to choose me.</p>

<p>(Technically, I have to pass a 5-year probation period before the position becomes permanent, but I’m told that this is mostly a formality, and nothing like the problematic tenure-track system in the US.)</p>

<p>I’ve arranged to work part-time (65%) for the first year on the job, so that I can do a greater share of the parenting duties until our child goes to nursery (which we’re hoping will be in approximately a year’s time). Partly for this reason I’ve not been given any teaching duties for this academic year. However, I’ve been asked to offer a new master’s module for next year, which will take some effort to prepare. I’m planning to do it on cryptographic protocols.</p>

<p>I had only started my previous job at TU Munich in October 2022, so it’s a bit strange to leave again after just over a year. However, Cambridge is better for us for family reasons, and Cambridge was offering a permanent position whereas my job at TU Munich was fixed-term, so it made sense to move back to Cambridge.</p>

<p>The biggest downside of moving is that I have lost the <a href="https://portal.volkswagenstiftung.de/search/projectDetails.do?siteLanguage=en&amp;ref=9B116">grant</a> that brought me to Munich in the first place (since that grant requires me to be at a German university). That’s a shame, because it was a lot of money – enough for two PhD students and a postdoc for several years. One of my first activities in Cambridge will therefore be to start applying for new grants. C’est la vie (académique).</p>

<h2 id="research-papers-and-projects">Research papers and projects</h2>

<p>I had one big paper acceptance in 2023: our article “<a href="https://arxiv.org/abs/2311.10825">Pudding: Private User Discovery in Anonymity Networks</a>” (with <a href="https://www.linkedin.com/in/cerenkocaogullar/">Ceren Kocaoğullar</a>, <a href="https://www.danielhugenroth.com/">Daniel Hugenroth</a>, and <a href="https://www.cl.cam.ac.uk/~arb33/">Alastair Beresford</a>) was accepted at the <a href="https://sp2024.ieee-security.org/">IEEE Symposium on Security and Privacy</a>, which will take place in May 2024. This paper solves a problem with the <a href="https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/piotrowska">Loopix</a>/<a href="https://nymtech.net/">Nym</a> anonymity network: previously you had to somehow find out someone’s public key in order to contact them on the network, and our work makes it possible to contact people via a short, friendly username instead (while preserving the security properties of the anonymity network).</p>

<p><a href="https://mattweidner.com/">Matthew Weidner</a> and I went through several iterations of our paper “<a href="https://arxiv.org/abs/2305.00583">The Art of the Fugue: Minimizing Interleaving in Collaborative Text Editing</a>”. The latest version is currently under submission at a journal, and a preprint is <a href="https://arxiv.org/abs/2305.00583">available on arxiv</a>. This paper tackles a problem in many collaborative text editing algorithms: when different users insert text at the same place in a document (especially while working offline), the algorithms may mix up text from the different users. Our paper shows how to solve this problem.</p>

<p>The paper “<a href="https://www.inkandswitch.com/upwelling/">Upwelling: Combining Real-time Collaboration with Version Control for Writers</a>” (with <a href="https://okdistribute.xyz/">Rae McKelvey</a>, <a href="https://jenson.org/">Scott Jenson</a>, <a href="https://bumble.blue/">Eileen Wagner</a>, and <a href="https://bcook.ca/">Blaine Cook</a>) appeared on the <a href="https://www.inkandswitch.com/">Ink &amp; Switch website</a> in March. We also submitted it to an academic conference, but it was rejected, so we’re just keeping it as a web article. The paper describes a prototype rich text editor that combines Google-Docs-style real-time collaboration with Git-style version control features (branching, merging, diffing, and editing history).</p>

<p>My master’s student <a href="https://liangrunda.com/">Liangrun Da</a> published “<a href="https://arxiv.org/abs/2311.14007">Extending JSON CRDT with move operations</a>”, a report from a research project he did with me in 2023. The goal of this project was to develop a move operation for Automerge, which could be used to reorder items in a list, or to move a subtree of a JSON document to a different location in the tree. The algorithm is not yet fully implemented within Automerge, but we’re hoping to get there this year.</p>

<p>My other master’s student Leo Stewen’s report “<a href="https://github.com/TUM-DSE/research-work-archive/blob/main/archive/2023/summer/docs/gr_stewen_undo_and_redo_support_for_replicated_registers.pdf">Undo and Redo Support for Replicated Registers</a>” describes another algorithm prototype for Automerge – this one aiming to add support for undo and redo. This also turns out to not be entirely straightforward, especially when you consider the interaction with all the other features of Automerge.</p>

<h2 id="industrial-collaborations-automerge-and-bluesky">Industrial collaborations: Automerge and Bluesky</h2>

<p>I’ve continued my long-standing collaboration with <a href="https://www.inkandswitch.com/">Ink &amp; Switch</a>, in particular around the <a href="https://automerge.org/">Automerge</a> open-source project. <a href="https://www.memoryandthought.me/">Alex Good</a>, who is funded by <a href="https://github.com/sponsors/automerge">Automerge sponsors</a> and my <a href="https://www.patreon.com/martinkl">Patreon supporters</a>, works full-time to maintain the project for our industrial users, while several others at Ink &amp; Switch and in the open source community have been making valuable contributions. I’ve moved into an advisory role and haven’t been writing any actual code for the project lately.</p>

<p>The two biggest milestones for Automerge in 2023 were:</p>

<ul>
  <li>The release of <a href="https://automerge.org/blog/automerge-2/">Automerge 2.0</a>, the rewrite of the original JavaScript code base in Rust. This has enabled huge performance improvements, and also made Automerge available on many more platforms: we compile Rust to WebAssembly and have a TypeScript/JavaScript <a href="https://automerge.org/docs/the_js_packages/">wrapper</a> for web browsers and node, but we can also compile Rust to a native library and call it from <a href="https://github.com/automerge/automerge/tree/main/rust/automerge-c">C</a>, <a href="https://github.com/automerge/automerge-go">Go</a>, <a href="https://automerge.org/automerge-swift/documentation/automerge/">Swift/iOS</a>, <a href="https://github.com/automerge/automerge-java">Java/Android</a>, and others. The idea is to implement the hairy, performance-critical CRDT logic once in Rust, and then to have wrapper APIs for all common programming languages that all share the same data format and interoperate.</li>
  <li>Whereas Automerge itself is only an in-memory data structure library with no I/O, <a href="https://automerge.org/blog/2023/11/06/automerge-repo/">Automerge-Repo</a> now provides out-of-the-box integrations with persistent storage (e.g. IndexedDB in a browser, and the filesystem in native apps) and with network protocols (e.g. WebSocket). Moreover, Automerge-Repo provides <a href="https://github.com/automerge/automerge-repo">integrations</a> with frontend libraries (e.g. React and Svelte). Previously app developers had to figure out all of this for themselves, so Automerge-Repo is a huge step forward in terms of making it easier to build applications on top of Automerge.</li>
</ul>

<p>My other ongoing industrial collaboration is with <a href="https://blueskyweb.org/">Bluesky</a>, a decentralised social network/protocol. Bluesky has had a tremendously successful year: launched into private beta in early 2023, it has grown to <a href="https://bsky.jazco.dev/stats">3 million users</a> by the end of the year. I’ve been advising the team since the beginning (they started development about two years ago) on topics around scalability, protocol design, architecture, and security.</p>

<p>I also helped them write a research paper describing the Bluesky architecture and comparing it to other decentralised social protocols; we’ll be publishing that paper sometime in the next few months. I personally think Bluesky and the underlying <a href="https://atproto.com/">AT Protocol</a> do many things much better than the alternatives, such as Mastodon/ActivityPub, and they have a real chance of becoming a mainstream Twitter successor. Bluesky wants to come out of private beta and open up public federation early this year; it’s going to be an exciting time.</p>

<p>I still have some Bluesky invitation codes to give out. If I know you personally, feel free to send me an email and I’ll send you a code. (Sorry, I don’t have enough codes to give out to people I don’t know.)</p>

<h2 id="events-conferences-workshops">Events, conferences, workshops</h2>

<p>I co-organised three events last year:</p>

<ul>
  <li>The first <a href="https://soft.vub.ac.be/dare23/">summer school on Distributed and Replicated Environments</a> (DARE) in Brussels, Belgium. We had 40 master’s and PhD students from all over Europe, and a few from further afield as well. I gave four hours of lectures (plus lots more time spent in informal conversations), and I think we succeeded in getting the students excited about research in distributed systems. One of the attending master’s students is now applying to do a PhD with me in Cambridge.</li>
  <li>An <a href="https://lu.ma/localfirstswunconf-stlouis">unconference on local-first software</a> in St. Louis, MO, USA the day after Strange Loop. We had space for about 100 people and the event sold out surprisingly quickly. Sadly I couldn’t be there because I caught covid at the summer school, but my co-organisers told me that there were excellent discussions among the attendees. Notes and photos from the event have been collected in <a href="https://github.com/LoFiUnconf/stlouis2023">this Git repository</a>.</li>
  <li>The <a href="https://2023.splashcon.org/home/plf-2023">Programming Local-First Software</a> (PLF) workshop at SPLASH 2023 in Cascais, Portugal. This event aims to bring together industrial practitioners with researchers in the area of programming language design to discuss ways of improving how local-first software is developed. The event included a keynote by <a href="https://github.com/expede">Brooklyn Zelenka</a>, and we had 15 submissions from which we were able to build an interesting and varied programme of talks.</li>
</ul>

<p>I also gave several public talks in 2023:</p>

<ul>
  <li>At the <a href="/2023/06/29/goto-amsterdam.html">GOTO Amsterdam conference</a> in June (<a href="https://www.youtube.com/watch?v=VJ_GeNfZXrQ">recording</a>) I gave a talk introducing Automerge and local-first software to an audience of industrial software engineers, and I repeated the talk at the <a href="/2023/06/28/amsterdam-elixir.html">Amsterdam Elixir meetup</a>.</li>
  <li>At the <a href="/2023/09/22/strange-loop.html">Strange Loop conference</a> in September (<a href="https://www.youtube.com/watch?v=Mr0a5KyD6BU">recording</a>) I spoke about the research we’ve done over the last few years on collaborative text editing, especially bringing together real-time collaboration with Git-style version control: diffing, branching, and merging (featuring <a href="https://www.inkandswitch.com/upwelling/">Upwelling</a>, <a href="https://automerge.org/">Automerge</a>, and <a href="https://www.inkandswitch.com/peritext/">Peritext</a>). I had to give the talk remotely and I couldn’t see or hear the room, but I’m told that it was full, with standing room only.</li>
  <li>At the <a href="/2023/10/19/kastel-distinguished-lecture.html">KASTEL Distinguished Lecture Series</a> in Karlsruhe, Germany (<a href="https://www.youtube.com/watch?v=VKHBRU3cKXw">recording</a>) I spoke about the security challenges that arise when you try making collaboration software peer-to-peer, and you have to make it work even though you don’t know who you can trust.</li>
  <li>At the <a href="/2023/09/27/acm-tech-talks.html">ACM Tech Talks</a> series (<a href="https://www.youtube.com/watch?v=VJ_GeNfZXrQ">recording</a>) I gave a repeat of my GOTO Amsterdam talk, and there was a lively Q&amp;A session afterwards with lots of good questions. There was a good turnout: around 400 people watched the talk live.</li>
  <li>At the IETF <a href="https://datatracker.ietf.org/meeting/118/session/dinrg">Decentralization of the Internet Research Group</a> I gave a talk about local-first software. My collaborators and I have been discussing that we would like to eventually develop open standards for the protocols around local-first software (right now it’s still too early, so this would be something to consider once they have matured a bit). I’m hoping that this talk might be the beginning of a process of engagement that could eventually lead to such a standardisation effort.</li>
</ul>

<h2 id="designing-data-intensive-applications">Designing Data-Intensive Applications</h2>

<p><a href="https://dataintensive.net/">My book</a> continues to sell well, with now over 230,000 copies sold, and reviews continue to be very positive. However, it is gradually showing its age – it was published in 2017, but I wrote the first few chapters around 2014/15, so they are now almost a decade old. Moreover, I have learnt a lot in the meantime, and there are quite a few things in the book that I would now say differently.</p>

<p>For that reason, I have been working on a second edition that brings the book up-to-date. However, my progress has been very slow, as I’ve had to fit in the research and writing for the second edition alongside my various other work and family commitments. I actually already agreed to do the second edition with O’Reilly in 2021, and the full manuscript was supposed to be complete by January 2023. Well… that didn’t quite happen as planned.</p>

<p>In fact, I only properly started writing in 2023, and so far I’ve only completed the revision of the first three chapters. I’m much happier with the revised version, but it takes a lot of time to do such thorough revisions, so I’m not even going to try to give an updated completion date. I’d much rather take the time to make it good, however long it takes, than rush to meet some artificial deadline. And I’m in the lucky situation where I can get away with such a stance.</p>

<p>In case you’re wondering what’s changing in the second edition: I’m keeping the high-level structure and topics quite similar, but I’m rewriting a lot of the actual text to be easier to follow and more nuanced. I also collected a lot of reference material over the years (books, papers, blog posts, etc.); a large part of my time is spent reading that material and incorporating it into the narrative.</p>

<p>The biggest technological change since the first edition is probably that hosted cloud services are now a much bigger thing than they were a decade ago, and the resulting rise of “cloud-native” architecture. Other things: NoSQL as a buzzword is dead (though many of its ideas have been absorbed into mainstream systems), MapReduce is dead (replaced by cloud data warehouses, data lakes, and things like Spark), and GDPR arrived (though the degree to which it is influencing data systems architecture is still somewhat open).</p>

<h2 id="local-first-is-taking-off">Local-first is taking off</h2>

<p>Together with some colleagues from Ink &amp; Switch I coined the term <a href="https://www.inkandswitch.com/local-first/">“local-first”</a> in 2019 to describe the type of software we wanted to enable with Automerge and related projects. Initially the term was mostly used by ourselves and our direct collaborators, but in 2023 we have seen the idea catching on much more widely:</p>

<ul>
  <li>More people have been writing about local-first, including <a href="https://www.wired.com/story/the-cloud-is-a-prison-can-the-local-first-software-movement-set-us-free/">WIRED magazine in August</a>, a <a href="https://bricolage.io/some-notes-on-local-first-development/">blog post by Kyle Mathews</a> in September, and an <a href="https://lwn.net/Articles/902463/">LWN.net article</a> last year. These articles capture some of the excitement surrounding local-first software.</li>
  <li>A website and Discord server on <a href="https://localfirstweb.dev/">local-first web development</a> was set up by members of the community in February 2023, and now has over 1,600 members. To date this community has organised ten online meetups, each with several speakers who are working in the area.</li>
  <li>Besides the aforementioned <a href="https://lu.ma/localfirstswunconf-stlouis">local-first unconference</a> in St. Louis and the <a href="https://2023.splashcon.org/home/plf-2023">programming local-first workshop</a> in Cascais that I co-organised, there were also in-person local-first meetups in <a href="https://lu.ma/6mux94ll">Berlin</a> and <a href="https://guild.host/events/localfirst-software-dkh284">London</a> that were organised independently by community members.</li>
  <li>Local-first appears prominently in the October 2022 edition of the <a href="https://www.thoughtworks.com/content/dam/thoughtworks/documents/radar/2022/10/tr_technology_radar_vol_27_en.pdf">Thoughtworks Technology Radar</a>, an influential publication in enterprise software development circles.</li>
  <li>We’ve seen at least a dozen products and startups advertising themselves as “local-first” on their websites, including for example <a href="https://anytype.io/">Anytype</a>, <a href="https://fission.codes/">Fission</a>, <a href="https://replicache.dev/">Replicache</a>, <a href="https://mycelial.com/">Mycelial</a>, <a href="https://electric-sql.com/">ElectricSQL</a>, <a href="https://odd.dev/">Odd.dev</a>, <a href="https://tinybase.org/">TinyBase</a>, <a href="https://aphrodite.sh/">Aphrodite</a>, <a href="https://dxos.org/">DXOS</a>, <a href="https://github.com/orbitdb/orbit-db">OrbitDB</a>, <a href="https://p2panda.org/">p2panda</a>, <a href="https://socketsupply.co/guides/">Socket Supply</a>, and <a href="https://kde.org/for/travelers/">KDE Itinerary</a>.</li>
  <li>In academia the idea is also catching on: our original local-first article now has around <a href="https://scholar.google.com/scholar?cites=792121589490097600&amp;as_sdt=2005&amp;sciodt=0,5&amp;hl=en">100 citations</a> according to Google Scholar, 15 of which even use the term “local-first” in the paper title.</li>
</ul>

<p>It’s exciting that so many people are buying into the idea. Over the coming years I hope we will continue to grow this community, and realise the advantages of the local-first approach in a broader range of software.</p>

                ]]></content:encoded>
            </item>
        
            <item>
                <title>Verifying distributed systems with Isabelle/HOL</title>
                <link>http://martin.kleppmann.com/2022/10/12/verifying-distributed-systems-isabelle.html</link>
                <comments>http://martin.kleppmann.com/2022/10/12/verifying-distributed-systems-isabelle.html#disqus_thread</comments>
                <pubDate>Wed, 12 Oct 2022 00:00:00 +0000</pubDate>
                <dc:creator>Martin Kleppmann</dc:creator>
                
                    <guid isPermaLink="true">http://martin.kleppmann.com/2022/10/12/verifying-distributed-systems-isabelle.html</guid>
                
                <description><![CDATA[ This post also appears on Larry Paulson’s blog. We use distributed systems every day in the form of internet services. These systems are very useful, but also challenging to implement because networks are unpredictable. Whenever you send a message over the network, it is likely to arrive quite quickly, but... ]]></description>
                <content:encoded><![CDATA[
                    <p><em>This post also appears on <a href="https://lawrencecpaulson.github.io/2022/10/12/verifying-distributed-systems-isabelle.html">Larry Paulson’s blog</a>.</em></p>

<p>We use distributed systems every day in the form of internet services. These systems are very useful, but also challenging to implement because networks are unpredictable. Whenever you send a message over the network, it is likely to arrive quite quickly, but it’s possible that it might be delayed for a long time, or never arrive, or arrive several times.</p>

<p>When you send a request to another process and don’t receive a response, you have no idea what happened: was the request lost, or has the other process crashed, or was the response lost? Or maybe nothing was lost at all, but a message has simply been delayed and may yet arrive. There is no way of knowing what happened, because unreliable message-passing is the only way that processes can communicate.</p>

<p>Distributed algorithms work with this model of unreliable communication and build stronger guarantees on top of it. Examples of such stronger guarantees include database transactions and replication (maintaining copies of some data on multiple machines so that the data is not lost if one machine fails).</p>

<p>Unfortunately, distributed algorithms are notoriously difficult to reason about, because they must uphold their guarantees regardless of the order in which messages are delivered, and even when some messages are lost or some processes crash. Many algorithms are very subtle, and informal reasoning is not sufficient for ensuring that they are correct. Moreover, the number of possible permutations and interleavings of concurrent activities quickly becomes too great for model-checkers to test exhaustively. For this reason, formal proofs of correctness are valuable for distributed algorithms.</p>

<h2 id="modelling-a-distributed-system-in-isabellehol">Modelling a distributed system in Isabelle/HOL</h2>

<p>In this blog post we will explore how to use the Isabelle/HOL proof assistant to formally verify a number of distributed algorithms. Isabelle/HOL does not have any built-in support for distributed computing, but fortunately it is quite straightforward to model a distributed system using structures that Isabelle/HOL provides: functions, lists, and sets.</p>

<p>First, we assume each process (or <em>node</em>) in the system has a unique identifier, which could simply be an integer or a string. Depending on the algorithm, the set of process IDs in the system may be fixed and known, or unknown and unbounded (the latter is appropriate for systems where processes can join and leave over time).</p>

<p>The execution of the algorithm then proceeds in discrete time steps. In each time step, an event occurs at one of the processes, and this event could be one of three things: receiving a message sent by another process, receiving user input, or the elapsing of a timeout.</p>

<figure class="highlight"><pre><code class="language-ml" data-lang="ml"><span class="kr">datatype</span> <span class="p">(</span><span class="nd">'proc</span><span class="p">,</span> <span class="nd">'msg</span><span class="p">,</span> <span class="nd">'val</span><span class="p">)</span> <span class="kt">event</span>
  <span class="p">=</span> <span class="nc">Receive</span> <span class="p">(</span><span class="n">msg_sender</span><span class="p">:</span> <span class="nd">'proc</span><span class="p">)</span> <span class="p">(</span><span class="n">recv_msg</span><span class="p">:</span> <span class="nd">'msg</span><span class="p">)</span>
  <span class="p">|</span> <span class="nc">Request</span> <span class="nd">'val</span>
  <span class="p">|</span> <span class="nc">Timeout</span></code></pre></figure>

<p>Triggered by one of these events, the process executes a function that may update its own state, and may send messages to other processes. A message sent in one time step may be received at any future time step, or may never be received at all.</p>

<p>Each process has a local state that is not shared with any other process. This state has a fixed initial value at the beginning of the execution, and is updated only when that process executes a step. One process cannot read the state of another process, but we can describe the state of the entire system as the collection of all the processes’ individual states:</p>

<p><img src="/2022/10/time-steps.png" width="550" height="275" alt="Illustration of several processes executing steps, one at a time" /></p>

<h2 id="why-a-linear-sequence-of-time-steps-is-sufficient">Why a linear sequence of time steps is sufficient</h2>

<p>Even though in reality processes may run in parallel, we do not need to model this parallelism since the only communication between processes is by sending and receiving messages, and we can assume that a process finishes processing one event before starting to process the next event. Every parallel execution is therefore equivalent to some linear sequence of execution steps. Other formalisations of distributed systems, such as the <a href="https://lamport.azurewebsites.net/tla/tla.html">TLA+ language</a>, also use such a linear sequence of steps.</p>

<p>We do not make any assumptions about which time step is executed by which process. It is possible that the processes fairly take turns to run, but it is equally possible for one process to execute a million steps while another process does nothing at all. By avoiding assumptions about process activity we ensure that the algorithm works correctly regardless of the timing in the system. For example, a process that is temporarily disconnected from the network is modelled simply by a process that does not experience any receive-message events, even while the other processes continue sending and receiving messages.</p>

<p>In this model, a process crash is represented simply by a process that executes no more steps after some point in time; there is no need for a crash to be explicitly represented. If we want to allow processes to recover from a crash, we can add a fourth type of event that models a process restarting after a crash. When executing such a crash-recovery event, a process deletes any parts of its local state that are stored in volatile memory, but preserves those parts of its state that are in stable storage (on disk) and hence survive the crash.</p>
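<p>As a toy illustration of this idea (my own sketch in Python, not code from the post; the event names are made up), the process state can be split into a durable part and a volatile part, and the crash-recovery event simply resets the volatile part:</p>

```python
# Hypothetical sketch: a process state is a pair (durable, volatile),
# where only the durable part survives a crash.

def handle_event(state, event):
    durable, volatile = state
    if event == "recover":
        # After a crash, only state in stable storage (on disk) survives;
        # the volatile (in-memory) part is reset to its initial value.
        return (durable, 0)
    elif event == "increment":
        # A normal step may update both parts of the state.
        return (durable + 1, volatile + 1)
    return state

state = (0, 0)  # initial state
for ev in ["increment", "increment", "recover", "increment"]:
    state = handle_event(state, ev)
# state == (3, 1): the durable counter kept every increment,
# while the volatile counter was reset by the recovery event.
```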

<p>When reasoning about safety properties of algorithms, it is best not to assume anything about which process executes in which time step, since that ensures the algorithm can tolerate arbitrary message delays. If we wanted to reason about liveness (for example, that an algorithm eventually terminates), we would have to make some fairness assumptions, e.g. that every non-crashed process eventually executes a step. However, in our proofs so far we have only focussed on safety properties.</p>

<p><img src="/2022/10/system-model.png" width="550" height="412" alt="System model: linear sequence of time steps; at each step, one process handles an event" /></p>

<p>We can now express a distributed algorithm as the <em>step function</em>, which takes three arguments: the ID of the process executing the current time step, the current local state of that process, and the event that has occurred (message receipt, user input, timeout, or crash recovery). The return value consists of the new state for that process, and a set of messages to send to other processes (each message tagged with the ID of the recipient process).</p>

<figure class="highlight"><pre><code class="language-isabelle" data-lang="isabelle"><span class="k">type_synonym</span><span class="w"> </span><span class="o">(</span><span class="nv">'proc</span><span class="o">,</span><span class="w"> </span><span class="nv">'state</span><span class="o">,</span><span class="w"> </span><span class="nv">'msg</span><span class="o">,</span><span class="w"> </span><span class="nv">'val</span><span class="o">)</span><span class="w"> </span><span class="n">step_func</span><span class="w"> </span><span class="o">=</span><span class="w">
  </span>‹'proc ⇒ 'state ⇒ ('proc, 'msg, 'val) event ⇒
  ('state × ('proc × 'msg) set)›</code></pre></figure>

<p>The current state of a process at one time step equals the new state after the previous step by the same process (or the initial state if there is no previous step). Assuming the step function is deterministic, we can now encode any execution of the system as a list of (processID, event) pairs indicating the series of events that occurred, and at which process they happened. The final state of the system is obtained by calling the step function one event at a time.</p>
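<p>To make this concrete, here is a minimal Python sketch of the execution model (this is my own illustration, not code from the post; names like <code>run</code> and <code>counter_step</code> are made up). It folds a step function over a list of (processID, event) pairs, accumulating the per-process states and the set of messages sent so far:</p>

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical sketch of the execution model described above. A step
# function maps (process ID, local state, event) to (new state, set of
# (recipient, message) pairs to send).
ProcId = str
StepFunc = Callable[[ProcId, int, tuple], Tuple[int, Set[Tuple[ProcId, object]]]]

def run(step: StepFunc, init: int,
        events: List[Tuple[ProcId, tuple]]) -> Tuple[Dict[ProcId, int], set]:
    """Fold the step function over a list of (processID, event) pairs."""
    states: Dict[ProcId, int] = {}
    msgs: set = set()  # all (sender, recipient, message) triples sent so far
    for proc, event in events:
        state = states.get(proc, init)  # initial state if no previous step
        new_state, sent = step(proc, state, event)
        states[proc] = new_state
        msgs |= {(proc, recpt, m) for (recpt, m) in sent}
    return states, msgs

# Example step function: count user requests; send no messages.
def counter_step(proc, state, event):
    if event[0] == "request":
        return state + 1, set()
    return state, set()

states, msgs = run(counter_step, 0,
                   [("a", ("request",)), ("a", ("request",)), ("b", ("request",))])
# states == {"a": 2, "b": 1}; msgs == set()
```

<p>Because the step function is deterministic, the list of (processID, event) pairs alone determines the final state of the whole system.</p>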

<h2 id="defining-what-may-happen">Defining what may happen</h2>

<p>To prove a distributed algorithm correct, we need to show that it produces a correct result in every possible execution, i.e. for every possible list of (processID, event) pairs. But which executions are possible? There is only really one thing we can safely assume: if a message is received by a process, then that message must have been sent to that process. In other words, we assume the network does not fabricate messages out of thin air, and one process cannot impersonate another process. (In a public network where an attacker can inject fake packets, we would have to cryptographically authenticate the messages to ensure this property, but let’s leave that out of scope for now.)</p>

<p>Therefore, the only assumption we will make is that if a message is received in some time step, then it must have been sent in a previous time step. However, we will allow messages to be lost, reordered, or received multiple times. Let’s encode this assumption in Isabelle/HOL.</p>

<p>First, we define a function that tells us whether a single event is possible: <code class="language-plaintext highlighter-rouge">(valid_event evt proc msgs)</code> returns <code class="language-plaintext highlighter-rouge">true</code> if event <code class="language-plaintext highlighter-rouge">evt</code> is allowed to occur at process <code class="language-plaintext highlighter-rouge">proc</code> in a system in which <code class="language-plaintext highlighter-rouge">msgs</code> is the set of all messages that have been sent so far. <code class="language-plaintext highlighter-rouge">msgs</code> is a set of (sender, recipient, message) triples. We define that a <code class="language-plaintext highlighter-rouge">Receive</code> event is allowed to occur iff the received message is in <code class="language-plaintext highlighter-rouge">msgs</code>, and <code class="language-plaintext highlighter-rouge">Request</code> or <code class="language-plaintext highlighter-rouge">Timeout</code> events are allowed to happen anytime.</p>

<figure class="highlight"><pre><code class="language-isabelle" data-lang="isabelle"><span class="k">fun</span><span class="w"> </span><span class="n">valid_event</span><span class="w"> </span><span class="o">::</span><span class="w"> </span>‹('proc, 'msg, 'val) event ⇒ 'proc ⇒
                    ('proc × 'proc × 'msg) set ⇒ bool›<span class="w">
</span><span class="kp">where</span><span class="w">
  </span>‹valid_event (Receive sender msg) recpt msgs =
    ((sender, recpt, msg) ∈ msgs)›<span class="w"> </span><span class="o">|</span><span class="w">
  </span>‹valid_event (Request _) _ _ = True›<span class="w"> </span><span class="o">|</span><span class="w">
  </span>‹valid_event Timeout _ _ = True›</code></pre></figure>

<p>Next, we define the set of all possible event sequences. For this we use an inductive predicate in Isabelle: <code class="language-plaintext highlighter-rouge">(execute step init procs events msgs states)</code> returns true if <code class="language-plaintext highlighter-rouge">events</code> is a valid sequence of events in an execution of the algorithm where <code class="language-plaintext highlighter-rouge">step</code> is the step function, <code class="language-plaintext highlighter-rouge">init</code> is the initial state of each process, and <code class="language-plaintext highlighter-rouge">procs</code> is the set of all processes in the system (which might be infinite if we want to allow any number of processes). The last two arguments keep track of the execution state: <code class="language-plaintext highlighter-rouge">msgs</code> is the set of all messages sent so far, and <code class="language-plaintext highlighter-rouge">states</code> is a map from process ID to the state of that process.</p>

<figure class="highlight"><pre><code class="language-isabelle" data-lang="isabelle"><span class="k">inductive</span><span class="w"> </span><span class="n">execute</span><span class="w"> </span><span class="o">::</span><span class="w">
  </span>‹('proc, 'state, 'msg, 'val) step_func ⇒ ('proc ⇒ 'state) ⇒
   'proc set ⇒ ('proc × ('proc, 'msg, 'val) event) list ⇒
   ('proc × 'proc × 'msg) set ⇒ ('proc ⇒ 'state) ⇒ bool›<span class="w">
</span><span class="kp">where</span><span class="w">
  </span>‹execute step init procs [] {} init›<span class="w"> </span><span class="o">|</span><span class="w">
  </span>‹⟦execute step init procs events msgs states;
    proc ∈ procs;
    valid_event event proc msgs;
    step proc (states proc) event = (new_state, sent);
    events' = events @ [(proc, event)];
    msgs' = msgs ∪ {m. ∃(recpt, msg) ∈ sent.
                       m = (proc, recpt, msg)};
    states' = states (proc := new_state)
   ⟧ ⟹ execute step init procs events' msgs' states'›</code></pre></figure>

<p>This definition states that the empty list of events is valid when the system is in the initial state and no messages have been sent. Moreover, if <code class="language-plaintext highlighter-rouge">events</code> is a valid sequence of events so far, and <code class="language-plaintext highlighter-rouge">event</code> is allowed in the current state, then we can invoke the step function, add any messages it sends to <code class="language-plaintext highlighter-rouge">msgs</code>, update the state of the appropriate process, and the result is another valid sequence of events.</p>

<p>And that’s all we need to model the distributed system!</p>

<h2 id="proving-an-algorithm-correct">Proving an algorithm correct</h2>

<p>Now we can take some algorithm (defined by its step function and initial state) and prove that for all possible lists of events, some property <em>P</em> holds. Since we do not fix a maximum number of time steps, there is an infinite number of possible lists of events. But that’s not a problem, since we can use induction over lists to prove <em>P</em>.</p>

<p><img src="/2022/10/induction.png" width="550" height="292" alt="The Isabelle/HOL induction principle over lists" /></p>

<p>We use the <code class="language-plaintext highlighter-rouge">List.rev_induct</code> induction rule in Isabelle/HOL. It requires showing that:</p>

<ol>
  <li>the property <em>P</em> is true for the empty list (i.e. for a system in the initial state, which has not executed any time steps); and</li>
  <li>if the property <em>P</em> is true for some execution, and we add one more time step to the end of the execution, then <em>P</em> still holds after that time step.</li>
</ol>

<p>In other words, we prove that <em>P</em> is an invariant over all possible states of the whole system. In Isabelle, that proof looks roughly like this (where <code class="language-plaintext highlighter-rouge">step</code>, <code class="language-plaintext highlighter-rouge">init</code>, and <code class="language-plaintext highlighter-rouge">procs</code> are appropriately defined):</p>

<figure class="highlight"><pre><code class="language-isabelle" data-lang="isabelle"><span class="kn">theorem</span><span class="w"> </span><span class="n">prove_invariant</span><span class="o">:</span><span class="w">
  </span><span class="kp">assumes</span><span class="w"> </span>‹execute step init procs events msgs states›<span class="w">
  </span><span class="kp">shows</span><span class="w"> </span>‹some_invariant states›<span class="w">
</span><span class="k">using</span><span class="w"> </span><span class="n">assms</span><span class="w"> </span><span class="k">proof</span><span class="w"> </span><span class="o">(</span><span class="n">induction</span><span class="w"> </span><span class="n">events</span><span class="w"> </span><span class="n">arbitrary</span><span class="o">:</span><span class="w"> </span><span class="n">msgs</span><span class="w"> </span><span class="n">states</span><span class="w">
                   </span><span class="n">rule</span><span class="o">:</span><span class="w"> </span><span class="n">List</span><span class="ow">.</span><span class="n">rev_induct</span><span class="o">)</span><span class="w">
  </span><span class="k">case</span><span class="w"> </span><span class="n">Nil</span><span class="w">
  </span><span class="k">then</span><span class="w"> </span><span class="k">show</span><span class="w"> </span>‹some_invariant states›<span class="w"> </span><span class="gr">sorry</span><span class="w">
</span><span class="k">next</span><span class="w">
  </span><span class="k">case</span><span class="w"> </span><span class="o">(</span><span class="n">snoc</span><span class="w"> </span><span class="n">event</span><span class="w"> </span><span class="n">events</span><span class="o">)</span><span class="w">
  </span><span class="k">then</span><span class="w"> </span><span class="k">show</span><span class="w"> </span><span class="o">?</span><span class="k">case</span><span class="w"> </span><span class="gr">sorry</span><span class="w">
</span><span class="k">qed</span></code></pre></figure>

<p>The real challenge in verifying distributed algorithms is to come up with the right invariant that is both true and also implies the properties you want your algorithm to have. Unfortunately, designing this invariant has to be done manually. However, once you have a candidate invariant, Isabelle is very helpful for checking whether it is correct and whether it is strong enough to meet your goals.</p>

<p>For more detail on how to prove the correctness of a simple consensus algorithm in this model, I recorded a <a href="https://www.youtube.com/watch?v=Uav5jWHNghY">2-hour video lecture</a> that runs through a demo from first principles (no prior Isabelle experience required). The <a href="https://gist.github.com/ept/b6872fc541a68a321a26198b53b3896b">Isabelle code of the demo</a> is also available.</p>

<iframe width="550" height="315" src="https://www.youtube-nocookie.com/embed/7w4KC6i9Yac" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>

<p>If you want to work on this kind of thing, I will soon be looking for a PhD student to work with me on formalising distributed algorithms in Isabelle, based at <a href="https://www.in.tum.de/en/in/cover-page/">TU Munich</a>. If this sounds like something you want to do, please <a href="https://martin.kleppmann.com/contact.html">get in touch</a>!</p>

                ]]></content:encoded>
            </item>
        
            <item>
                <title>Book Review: The Future of Fusion Energy</title>
                <link>http://martin.kleppmann.com/2022/01/03/future-of-fusion-energy.html</link>
                <comments>http://martin.kleppmann.com/2022/01/03/future-of-fusion-energy.html#disqus_thread</comments>
                <pubDate>Mon, 03 Jan 2022 00:00:00 +0000</pubDate>
                <dc:creator>Martin Kleppmann</dc:creator>
                
                    <guid isPermaLink="true">http://martin.kleppmann.com/2022/01/03/future-of-fusion-energy.html</guid>
                
                <description><![CDATA[ I give a five-star ⭐️⭐️⭐️⭐️⭐️ rating to the following book: Jason Parisi and Justin Ball. The Future of Fusion Energy. World Scientific, 2019. ISBN 978-1-78634-749-7. Available from Amazon US, Amazon UK, and many other retailers. I came to this book looking for answers to questions such as: Is there still... ]]></description>
                <content:encoded><![CDATA[
                    <p>I give a five-star ⭐️⭐️⭐️⭐️⭐️ rating to the following book:</p>

<p>Jason Parisi and Justin Ball. <em>The Future of Fusion Energy</em>. World Scientific, 2019. ISBN 978-1-78634-749-7. Available from <a href="https://amzn.to/3sUypW6">Amazon US</a>, <a href="https://amzn.to/3eHCpkB">Amazon UK</a>, and many other retailers.</p>

<p style="text-align:center"><img src="/2022/01/fusion-book.jpg" alt="Cover of the book 'The Future of Fusion Energy'" width="70%" /></p>

<p>I came to this book looking for answers to questions such as: Is there still hope that a fusion power plant will ever be viable? If so, what exactly are the main obstacles on the way there? Why has progress in this field been so slow? And what should I make of the various startups claiming to have a fusion power plant just round the corner?</p>

<p>The book provides an excellent, detailed answer to these questions, and more. It’s the best kind of popular science book: you don’t need a physics degree to read it, but it doesn’t fob you off with oversimplified hand-waving either; all of the core arguments are convincingly backed up with evidence. There are some equations, but they are not necessary for understanding the book: as long as you know the difference between an electron, a proton, and a neutron, you’ll be able to follow it.</p>

<p>The book is clear about which constraints on fusion energy are fundamental limits of nature, and which constraints can be overcome with better technology. It offers optimism that fusion power is possible, highlighting the most promising paths to getting there, while remaining honest about the open problems that are yet to be solved. My take-away was that core problems, such as plasma turbulence, are very difficult, but likely solvable with more brainpower and experiments.</p>

<p>The book also provides compelling arguments in favour of fusion: not only the obvious case of providing cheap energy without carbon emissions or seasonal variation, but also that compared to fission, there is much less risk that the technology will facilitate the proliferation of nuclear weapons.</p>

<p>The need to transition away from fossil fuels is so urgent that we can’t afford to wait for fusion — renewables and fission are still crucial. However, for the medium to long term, fusion offers optimism. From about 1970 to 2000, fusion research made very impressive progress, with the <a href="https://en.wikipedia.org/wiki/Lawson_criterion">key performance metric</a> doubling every 1.8 years — faster even than Moore’s law, and getting pretty close to the point where the fusion reaction is self-sustaining without having to continually feed in external energy (the dotted line labelled “ignition” on the following diagram)!</p>

<p><img src="/2022/01/fusion-progress.jpg" alt="Figure 4.25 from the book. The x axis shows years from 1965 to 2030; the y axis shows the 'triple product' performance metric of various experimental reactors on a log scale. From about 1970 to 2000, progress follows a straight line on the log scale, i.e. exponential improvement. In the late 1990s it comes within less than an order of magnitude of 'ignition', which is where the fusion reaction becomes self-sustaining." width="100%" /></p>

<p style="text-align:center"><small><i>Figure 4.25 from the book. Note the log scale on the y axis, so the straight line from 1970 to 2000 is actually exponential growth.</i></small></p>

<p>Since 2000, progress has stalled, and the book argues that this is primarily because research in the field has been under-funded, not because of any particular fundamental limit. Of course, anybody can claim that more money will solve their problems, but in this case I’m inclined to believe it. What changed in 2000 is that the fusion research community started putting all their eggs in one basket (<a href="https://en.wikipedia.org/wiki/ITER">ITER</a>), because there wasn’t the money for multiple baskets. More money would allow more parallel experiments to explore different approaches and see which ones work better.</p>

<p>Investment in fusion research is small compared with investment in renewables and fission R&amp;D, and tiny compared to things like agricultural and fossil fuel subsidies. Even if it’s not guaranteed that fusion will work, given the potentially transformative nature of cheap, climate-friendly energy to human civilisation, it seems well worth putting some more money into it and giving it our best shot (in addition to faster ways of getting off fossil fuels, such as renewables, of course).</p>

<p>I won’t try to summarise the technical details of the book, but if you are interested in them, I can assure you that you will find this book worthwhile.</p>

                ]]></content:encoded>
            </item>
        
            <item>
                <title>Several podcast interviews</title>
                <link>http://martin.kleppmann.com/2021/09/01/podcast-interviews.html</link>
                <comments>http://martin.kleppmann.com/2021/09/01/podcast-interviews.html#disqus_thread</comments>
                <pubDate>Wed, 01 Sep 2021 00:00:00 +0000</pubDate>
                <dc:creator>Martin Kleppmann</dc:creator>
                
                    <guid isPermaLink="true">http://martin.kleppmann.com/2021/09/01/podcast-interviews.html</guid>
                
                <description><![CDATA[ I regularly get asked to give interviews on the topics that I work on, especially for podcasts. To make them easier to find for anybody who’s interested, I thought I would make a list. They touch on a range of different topics, although there is also some overlap so I... ]]></description>
                <content:encoded><![CDATA[
                    <!--
Also got asked to go on https://podcast.sustainoss.org/
and https://www.dataengineeringpodcast.com/
-->

<p>I regularly get asked to give interviews on the topics that I work on, especially for podcasts.
To make them easier to find for anybody who’s interested, I thought I would make a list.
They touch on a range of different topics, although there is also some overlap so I wouldn’t
recommend listening to them all in a row!</p>

<p>(By the way, if you want a list of conference talks I have given, I have a
<a href="https://www.youtube.com/playlist?list=PLeKd45zvjcDHJxge6VtYUAbYnvd_VNQCx">YouTube playlist</a> for that.)</p>

<p>Here’s a list of interviews I’ve given as of September 2021:</p>

<ul>
  <li>
    <p>Interview with <a href="https://www.wix.engineering/">Wix Engineering</a>, in which we discuss my book, the
state of Automerge, the convergence of streaming systems and databases, Kafka’s move to replace
ZooKeeper with their own Raft implementation, the impact of my research, and more.
Recorded 16 June 2021, published 26 August 2021.
<a href="https://www.youtube.com/watch?v=jtK7LOcP76s">Video</a>,
<a href="https://www.wix.engineering/post/wix-engineering-tech-interviews-martin-kleppmann-natan-silnitsky">transcript</a>.</p>
  </li>
  <li>
    <p>Interview with the <a href="https://museapp.com/podcast/">Metamuse podcast</a>, in which we discuss local-first
software: how the concept has evolved since we <a href="https://www.inkandswitch.com/local-first.html">first articulated it</a>,
and where it’s heading in the future.
Recorded 17 August 2021, published 14 October 2021.
<a href="https://museapp.com/podcast/41-local-first-software/">Episode link</a></p>
  </li>
  <li>
    <p>Interview with the <a href="https://www.torocloud.com/podcast">Coding over Cocktails podcast</a>, in which we
discuss making systems scalable, how data systems have evolved over the years, and local-first
software. Recorded 26 August 2021, published 30 August 2021.
<a href="https://www.torocloud.com/podcast/designing-data-intensive-applications-martin-kleppmann">Episode link and transcript</a>,
<a href="https://soundcloud.com/codingovercocktails/designing-data-intensive-applications-with-martin-kleppman">Soundcloud</a>,
<a href="https://podcasts.apple.com/ph/podcast/designing-data-intensive-applications-with-martin/id1531450276?i=1000533284011">iTunes</a>,
<a href="https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5zb3VuZGNsb3VkLmNvbS91c2Vycy9zb3VuZGNsb3VkOnVzZXJzOjg3MjM0NTQxNi9zb3VuZHMucnNz/episode/dGFnOnNvdW5kY2xvdWQsMjAxMDp0cmFja3MvMTExMzg4MDIxNg?sa=X&amp;ved=0CAUQkfYCahcKEwjo-NOKhdjyAhUAAAAAHQAAAAAQAQ">Google Play</a>.</p>
  </li>
  <li>
    <p>Interview with the <a href="https://programming.love/">Programming Love</a> podcast, in which we discuss
peer-to-peer systems for collaboration, CRDTs and conflict resolution, undo and other challenges
of collaboration software, my <a href="/2018/10/17/kafka-summit.html">“Is Kafka a Database?” talk</a>, and more.
Recorded 9 July 2020, published 19 October 2020.
<a href="https://programming.love/programming-love-with-martin-kleppmann/">Episode link</a>,
<a href="https://podcasts.apple.com/us/podcast/programming-love-with-martin-kleppmann/id1518407590?i=1000495317576">Apple Podcasts</a>,
<a href="https://open.spotify.com/episode/7oc4i8h0LaFUx5l8ghJOOD">Spotify</a>,
<a href="https://www.stitcher.com/show/programming-love/episode/programming-love-with-martin-kleppmann-78699629">Stitcher</a>.</p>
  </li>
  <li>
    <p>Interview with <a href="https://medium.com/csr-tales">CSR (Computer Science Research) Tales</a>, in which we
discuss formally proving the correctness of distributed systems, and verifying CRDTs in particular.
Published 30 July 2019.
<a href="https://medium.com/csr-tales/csrtale-13-formal-verification-of-strong-eventual-consistency-1cc0af942e64">Transcript</a>.</p>
  </li>
  <li>
    <p>Interview with the <a href="https://hydraconf.com/">Hydra conference</a> on seeing through technology hype,
the CAP theorem, decentralisation, proving the correctness of CRDTs, event-based systems, and
personal growth. Recorded 3 June 2019, published 27 June 2019.
<a href="https://medium.com/@hydraconference/the-big-interview-with-martin-kleppmann-figuring-out-the-future-of-distributed-data-systems-28a680d99ae6">Transcript</a>.</p>
  </li>
  <li>
    <p>Interview with the <a href="https://codepodcast.com/">Code Podcast</a>, in which we talked in depth about
my research on CRDTs, what they can and cannot do, and how we deal with time in distributed systems.
Recorded 4 September 2018. I’m not sure this episode was ever published.</p>
  </li>
  <li>
<p>Interview for an internal podcast at Booz Allen Hamilton. Recorded 9 April 2018. I don’t think it
was ever made publicly available.</p>
  </li>
  <li>
    <p>Interview with the <a href="https://www.investedinvestor.com/index">Invested Investor podcast</a>, in which
we talked about my startup career before I got into academia, selling two companies, going
through Y Combinator, moving to Silicon Valley, and all that jazz.
Recorded 20 November 2017, published 24 January 2018.
<a href="https://www.investedinvestor.com/articles/2018/1/23/martin-kleppmann">Episode link</a>,
<a href="https://audioboom.com/posts/6621031-martin-kleppmann-to-silicon-valley-and-back-again-with-two-exits-along-the-way">Audioboom</a>,
<a href="https://www.investedinvestor.com/martin-kleppmann-transcription">Transcript</a>.</p>
  </li>
  <li>
    <p>First interview with <a href="https://softwareengineeringdaily.com/">Software Engineering Daily</a>, in which
we talk about data-intensive applications, the CAP theorem, scalability, data models, data formats,
the challenges of distributed systems, and ideas for the future.
Recorded 20 April 2017, published 2 May 2017.
I am told that this was the most popular episode ever of this podcast!
<a href="https://softwareengineeringdaily.com/2017/05/02/data-intensive-applications-with-martin-kleppmann/">Episode link</a>,
<a href="http://traffic.libsyn.com/sedaily/dataintensive_edited_fixed.mp3">Download</a>,
<a href="http://softwareengineeringdaily.com/wp-content/uploads/2017/05/SEDT15-Data-Intensive-Apps.pdf">Transcript</a>.</p>
  </li>
  <li>
    <p>Second interview with <a href="https://softwareengineeringdaily.com/">Software Engineering Daily</a>, in which
we talk about decentralisation, CRDTs, blockchains, consensus, concurrency, and how to make CRDTs
work in practice. Recorded 15 November 2017, published 8 December 2017.
<a href="https://softwareengineeringdaily.com/2017/12/08/decentralized-objects-with-martin-kleppman/">Episode link</a>,
<a href="http://traffic.libsyn.com/sedaily/CRDTs_Decentralized_Files.mp3">Download</a>,
<a href="https://softwareengineeringdaily.com/wp-content/uploads/2017/12/SED477-CRDTs-Decentralized-Files.pdf">Transcript</a>.</p>
  </li>
  <li>
    <p>Interview with <a href="https://advancetechmedia.org/">Advance Tech Podcast</a>, in which we discuss a wide
range of topics: my past life in startups, security and decentralisation, event streaming systems,
data consistency, and formal verification.
Recorded and published 27 October 2017.
<a href="https://advancetechmedia.org/episode-008-martin-kleppmann/">Episode link</a>.</p>
  </li>
  <li>
    <p>Interview with <a href="https://www.infoq.com/">InfoQ</a> about log-based messaging, stream processing, and
change data capture. Recorded 24 April 2015, published 28 June 2015.
<a href="https://www.infoq.com/interviews/kleppmann-data-infrastructure-logs-crdt/">Video and transcript</a>.</p>
  </li>
</ul>

<p><strong>Update — later additions:</strong></p>

<ul>
  <li><a href="https://nurkiewicz.com/70">Short 4-minute episode on CRDTs</a> (April 2022)</li>
  <li><a href="https://www.youtube.com/watch?v=sMRpv0fBJLU">Interview with Russian reading group {между скобок} or {between brackets}</a> (July 2022)</li>
</ul>

                ]]></content:encoded>
            </item>
        
            <item>
                <title>It's time to say goodbye to the GPL</title>
                <link>http://martin.kleppmann.com/2021/04/14/goodbye-gpl.html</link>
                <comments>http://martin.kleppmann.com/2021/04/14/goodbye-gpl.html#disqus_thread</comments>
                <pubDate>Wed, 14 Apr 2021 00:00:00 +0000</pubDate>
                <dc:creator>Martin Kleppmann</dc:creator>
                
                    <guid isPermaLink="true">http://martin.kleppmann.com/2021/04/14/goodbye-gpl.html</guid>
                
                <description><![CDATA[ The trigger for this post is the reinstating of Richard Stallman, a very problematic character, to the board of the Free Software Foundation (FSF). I am appalled by this move, and join others in the call for his removal. This occasion has caused me to reevaluate the position of the... ]]></description>
                <content:encoded><![CDATA[
                    <p>The trigger for this post is the
<a href="https://www.fsf.org/news/statement-of-fsf-board-on-election-of-richard-stallman">reinstating</a>
of Richard Stallman, a very <a href="https://rms-open-letter.github.io/">problematic character</a>, to the
board of the <a href="https://www.fsf.org/">Free Software Foundation</a> (FSF). I am appalled by this move, and
join others in the call for his removal.</p>

<p>This occasion has caused me to reevaluate the position of the FSF in computing. It is the steward of
the GNU project (a part of Linux distributions,
<a href="https://www.gnu.org/gnu/incorrect-quotation.en.html">loosely speaking</a>), and of a family of
software licenses centred around the
<a href="https://en.wikipedia.org/wiki/GNU_General_Public_License">GNU General Public License</a> (GPL). These
efforts are unfortunately tainted by Stallman’s behaviour. However, this is not what I actually want
to talk about today.</p>

<p>In this post I argue that we should move away from the GPL and related licenses (LGPL, AGPL), for
reasons that have nothing to do with Stallman, but simply because I think they have failed to
achieve their purpose, and they are more trouble than they are worth.</p>

<p>First, brief background: the defining feature of the GPL family of licenses is the concept of
<a href="https://en.wikipedia.org/wiki/Copyleft">copyleft</a>, which states (roughly) that if you take some
GPL-licensed code and modify it or build upon it, you must also make your modifications/extensions
(known as a “<a href="https://en.wikipedia.org/wiki/Derivative_work">derivative work</a>”) freely available
under the same license. This has the effect that the GPL’ed source code cannot be incorporated into
closed-source software. At first glance, this seems like a great idea. So what is the problem?</p>

<h2 id="the-enemy-has-changed">The enemy has changed</h2>

<p>In the 1980s and 1990s, when the GPL was written, the enemy of the free software movement was
Microsoft and other companies that sold closed-source (“proprietary”) software. The GPL was
intended to disrupt this business model for two main reasons:

<ol>
  <li>Closed-source software cannot easily be modified by users; you can take it or leave it, but you
cannot adapt it to your own needs. To counteract this, the GPL was designed to force companies to
release the source code of their software, so that users of the software could study it, modify
it, compile and use their modified version, and thus have the freedom to customise their
computing devices to their needs.</li>
  <li>Moreover, the GPL was motivated by a desire for fairness: if you write some software in your spare
time and release it for free, it’s understandable that you don’t want others to profit from your
work without giving something back to the community. Forcing derivative works to be open source
ensures at least some baseline of “giving back”.</li>
</ol>

<p>While this made sense in 1990, I think the world has changed, and closed-source software is no
longer the main problem. <strong>In the 2020s, the enemy of freedom in computing is cloud software</strong> (aka
software as a service/SaaS, aka web apps) – i.e. software that runs primarily on the vendor’s
servers, with all your data also stored on those servers. Examples include Google Docs, Trello,
Slack, Figma, Notion, and many others.</p>

<p>This cloud software may have a client-side component (a mobile app, or the JavaScript running in
your web browser), but it only works in conjunction with the vendor’s server. And there are lots of
problems with cloud software:</p>

<ul>
  <li>If the company providing the cloud software goes out of business or decides to
<a href="https://killedbygoogle.com/">discontinue a product</a>, the software stops working, and you are
locked out of the documents and data you created with that software. This is an especially common
problem with software made by a startup, which may get
<a href="https://ourincrediblejourney.tumblr.com/">acquired by a bigger company</a> that has no interest in
continuing to maintain the startup’s product.</li>
  <li>Google and other cloud services may
<a href="https://twitter.com/Demilogic/status/1358661840402845696">suddenly suspend your account</a> with no
warning and <a href="https://www.paullimitless.com/google-account-suspended-no-reason-given/">no recourse</a>,
for example if an automated system thinks you have violated its terms of service. Even if your own
behaviour has been faultless, someone else may have hacked into your account and used it to send
malware or phishing emails without your knowledge, triggering a terms of service violation. Thus,
you could suddenly find yourself permanently locked out of every document you ever created on
Google Docs or another app.</li>
  <li>With software that runs on your own computer, even if the software vendor goes bust, you can
continue running it forever (in a VM/emulator if it’s no longer compatible with your OS, and
assuming it doesn’t need to contact a server for a license check). For example, the
Internet Archive has a collection of
<a href="https://archive.org/details/softwarelibrary">over 100,000 historical software titles</a> that you
can run in an emulator inside your web browser! In contrast, if cloud software gets shut down,
there is no way for you to preserve it, because you never had a copy of the server-side software,
neither as source code nor in compiled form.</li>
  <li>The 1990s problem of not being able to customise or extend software you use is aggravated further
in cloud software. With closed-source software that runs on your own computer, at least someone
could reverse-engineer the file format it uses to store its data, so that you could load it into
alternative software (think pre-<a href="https://en.wikipedia.org/wiki/Office_Open_XML">OOXML</a> Microsoft
Office file formats, or Photoshop files before the
<a href="https://www.adobe.com/devnet-apps/photoshop/fileformatashtml/">spec</a> was published). With cloud
software, not even that is possible, since the data is only stored in the cloud, not in files on
your own computer.</li>
</ul>

<p>If all software were free and open source, these problems would all be solved. However, making the
source code available is not actually necessary to solve the problems with cloud software; even
closed-source software avoids the aforementioned problems, as long as it is running on your own
computer rather than the vendor’s cloud server. Note that the Internet Archive is able to keep
historical software working without ever having its source code: for purposes of preservation,
running the compiled machine code in an emulator is just fine. Maybe having the source code would
make it a little easier, but it’s not crucial. The important thing is having a copy of the software
<strong>at all</strong>.</p>

<h2 id="local-first-software">Local-first software</h2>

<p>My collaborators and I have previously argued for
<a href="https://www.inkandswitch.com/local-first.html">local-first software</a>, which is a response to these
problems with cloud software. Local-first software runs on your own computer, and stores its data on
your local hard drive, while also retaining the convenience of cloud software, such as real-time
collaboration and syncing your data across all of your devices. It is nice for local-first software
to also be open source, but this is not necessary: 90% of its benefits apply equally to
closed-source local-first software.</p>

<p>Cloud software, not closed-source software, is the real threat to software freedom, because the harm
from being suddenly locked out of all of your data at the whim of a cloud provider is much greater
than the harm from not being able to view and modify the source code of your software. For that
reason, it is much more important and pressing that we make local-first software ubiquitous. If, in
that process, we can also make more software open-source, then that would be nice, but that is less
critical. Focus on the biggest and most urgent challenges first.</p>

<h2 id="legal-tools-to-promote-software-freedom">Legal tools to promote software freedom</h2>

<p>Copyleft software licenses are a legal tool that attempts to force more software vendors to release
their source code. In particular, the
<a href="https://en.wikipedia.org/wiki/Affero_General_Public_License">AGPL</a> is an attempt to force providers
of cloud services to release the source of their server-side software. However, this hasn’t really
worked: most vendors of cloud software simply refuse to use AGPL-licensed software, and either use
a different implementation with a more permissive license, or re-implement the necessary
functionality themselves, or
<a href="https://www.elastic.co/pricing/faq/licensing">buy a commercial license</a> that comes without the
copyleft clauses. I don’t think the license has caused any source code to become available that
wouldn’t have been open source anyway.</p>

<p>As a legal tool to promote greater software freedom, I believe copyleft software licenses have
largely failed, since they have done nothing to stop the rise of cloud software, and probably not
done much to increase the share of software whose source is available. Open source software has
become very successful, but much of this success is in projects with non-copyleft licenses (e.g.
Apache, MIT, or BSD licenses), and even in the GPL-licensed projects (e.g. Linux) I am skeptical
that the copyleft aspect was really an important factor in the project’s success.</p>

<p>I believe a much more promising legal tool to promote software freedom is in government regulation.
For example, the GDPR includes a
<a href="https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/individual-rights/right-to-data-portability/">right to data portability</a>,
which means that users must be able to move their data from one service to another. Existing
implementations of portability, such as
<a href="https://en.wikipedia.org/wiki/Google_Takeout">Google Takeout</a>, are quite rudimentary (what can you
really do with a big zip archive of JSON files?), but we can lobby regulators to
<a href="https://interoperability.news/">push for better portability/interoperability</a>, e.g. requiring
real-time bidirectional sync of your data between two apps by competing providers.</p>

<p>Another promising route I see is pushing
<a href="https://joinup.ec.europa.eu/sites/default/files/document/2011-12/OSS-procurement-guideline%20-final.pdf">public-sector procurement to prefer open source, local-first software</a>
over closed-source cloud software. This creates a positive incentive for businesses to develop and
maintain high-quality open source software, in a way that copyleft clauses do not.</p>

<p>You might argue that a software license is something that an individual developer can control,
whereas governmental regulation and public policy is a much bigger issue outside of any one
individual’s power. Yes, but how much impact can you really have by choosing a software license?
Anyone who doesn’t like your license can simply choose not to use your software, in which case your
power is zero. Effective change comes from collective action on big issues, not from one person’s
little open source side project choosing one license over another.</p>

<h2 id="other-problems-with-gpl-family-licenses">Other problems with GPL-family licenses</h2>

<p>You can force a company to make the source code of a GPL-derived software project available, but
you cannot force them to be good citizens of the open source community (e.g. continuing to maintain
the features they have added, fixing bugs, helping other contributors, providing good documentation,
participating in project governance). What good is source code that is just “thrown over the wall”
without genuine engagement in the open source project? At best it’s worthless, and at worst it’s
harmful because it shifts the burden of maintenance to other contributors of the project.</p>

<p>We need people to be good contributors to the open source community, and this is achieved by setting
up the right incentives and by being welcoming, not by software licenses.</p>

<p>Finally, a practical problem of GPL-family licenses is their
<a href="http://gplv3.fsf.org/wiki/index.php/Compatible_licenses">incompatibility with other widely-used licenses</a>,
making it difficult to use certain combinations of libraries in the same project and unnecessarily
fragmenting the open source ecosystem. Maybe it would be worth putting up with this problem if the
GPL had other strong advantages, but as I have explained, I don’t think those advantages exist.</p>

<h2 id="conclusion">Conclusion</h2>

<p>The GPL and other copyleft licenses are not bad; I just think they’re pointless. They have practical
problems, and they are tainted by the behaviour of the FSF, but most importantly, I do not believe
they have been an effective contributor to software freedom. The only real use for copyleft nowadays
is by commercial software vendors
(<a href="https://www.mongodb.com/licensing/server-side-public-license/faq">MongoDB</a>,
<a href="https://www.elastic.co/pricing/faq/licensing">Elastic</a>) who want to stop Amazon from providing
their software as a service – which is fine, but it’s motivated purely by business concerns, not by
software freedom.</p>

<p>Open source software has been tremendously successful, and it has come a long way since the origins
of the free software movement born from 1990s anti-Microsoft sentiment. I will acknowledge that the
FSF was instrumental in getting this all started. However, 30 years on, the ecosystem has changed,
but the FSF has failed to keep up, and has
<a href="https://r0ml.medium.com/free-software-an-idea-whose-time-has-passed-6570c1d8218a">become more and more out of touch</a>.
It has failed to establish a coherent response to cloud software and other recent threats to
software freedom, and it just continues to rehash tired old arguments from decades ago. Now, by
reinstating Stallman and dismissing the concerns about him, the FSF is
<a href="https://lu.is/blog/2021/04/07/values-centered-npos-with-kmaher/">actively harming</a> the cause of
free software. We must distance ourselves from the FSF and their worldview.</p>

<p>For all these reasons, I think it no longer makes sense to cling on to the GPL and copyleft. Let
them go. Instead, I would encourage you to adopt a permissive license for your projects (e.g.
<a href="https://opensource.org/licenses/MIT">MIT</a>, <a href="https://opensource.org/licenses/BSD-2-Clause">BSD</a>,
<a href="https://opensource.org/licenses/Apache-2.0">Apache 2.0</a>), and then focus your energies on the
things that will really make a difference to software freedom:
<a href="https://www.inkandswitch.com/local-first.html">counteracting</a> the monopolising effects of cloud
software, developing sustainable business models that allow open source software to thrive, and
pushing for regulation that prioritises the interests of software users over the interests of
vendors.</p>

<p><em>Thank you to <a href="https://ramcq.net/">Rob McQueen</a> for feedback on a draft of this post.</em></p>

<p><em>Update: <a href="https://twitter.com/lexi_lambda/status/1295426437583982592">related Twitter thread by Alexis King</a></em></p>

                ]]></content:encoded>
            </item>
        
            <item>
                <title>Building the future of computing, with your help</title>
                <link>http://martin.kleppmann.com/2021/02/23/patreon.html</link>
                <comments>http://martin.kleppmann.com/2021/02/23/patreon.html#disqus_thread</comments>
                <pubDate>Tue, 23 Feb 2021 00:00:00 +0000</pubDate>
                <dc:creator>Martin Kleppmann</dc:creator>
                
                    <guid isPermaLink="true">http://martin.kleppmann.com/2021/02/23/patreon.html</guid>
                
                <description><![CDATA[ For the last five or six years, since I bid goodbye to the startup scene and Silicon Valley, I have been increasingly working in public. I have written a book, given around 100 talks (many of which are available on YouTube), published over 20 research papers (all freely available from... ]]></description>
                <content:encoded><![CDATA[
                    <p>For the last five or six years, since I bid goodbye to the startup scene and Silicon Valley, I have
been increasingly working in public. I have <a href="https://dataintensive.net/">written a book</a>,
given <a href="https://martin.kleppmann.com/talks.html">around 100 talks</a> (many of which are
<a href="https://www.youtube.com/playlist?list=PLeKd45zvjcDHJxge6VtYUAbYnvd_VNQCx">available on YouTube</a>),
published <a href="https://martin.kleppmann.com/#publications">over 20 research papers</a>
(all freely available from my website), and released and maintained
<a href="https://github.com/ept">some open source projects</a>.
Just a few months ago I released a new undergraduate-level course on distributed systems, consisting of
<a href="https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_HdUFe97RItdiB">7 hours of video lectures</a> and
<a href="https://www.cl.cam.ac.uk/teaching/2021/ConcDisSys/dist-sys-notes.pdf">87 pages of notes</a> and
exercises, all free; in student evaluation at the <a href="https://www.cst.cam.ac.uk/">University of Cambridge</a>,
over 80% rated my lectures and notes as “excellent”.</p>

<p>I love doing first-rate work and making it broadly available. In fact, apart from my book, I give
everything away for free, because I want to be able to reach and help the broadest possible set of
people. And even my book is very cheap compared to the value that many people get out of it (just
<a href="https://dataintensive.net/buy.html">read the reviews</a>).</p>

<p>Of course, nobody goes into academia because of the money (or the job security of untenured posts,
for that matter). I would probably be earning five times my current salary if I had stayed in
industry. But I have absolutely no regrets about taking that pay cut: I love the freedom to work on
whatever I find interesting, and the freedom to publish everything so that others can use it. If you
have found any of my talks, writing, or code useful, then you have also benefitted from the freedom
that I enjoy.</p>

<p>Of course, like everybody else, I have bills to pay. At the moment I’m employed at the
<a href="https://www.cst.cam.ac.uk/">University of Cambridge</a> on a fixed-term contract, funded by
a charitable research grant. This grant gives me wonderful freedom to pursue my research and make it
publicly available, but it’s a fixed amount of money, and once it runs out, my job disappears in
a puff of smoke. This sort of grant is not renewable, no matter how amazing the work it has
enabled. I can try applying for follow-on grants from other funders, but this takes a lot of time
and has a low chance of success.</p>

<p>Therefore I am setting up crowdfunding through <a href="https://www.patreon.com/martinkl">Patreon</a>, in the
hope of establishing a sustainable basic income that will allow me to continue my work of research
and teaching long-term. I want to continue making most of my work freely available, so that the
maximum number of people can benefit from it.</p>

<h2 id="why-support-me">Why support me?</h2>

<p>I am offering <a href="https://www.patreon.com/martinkl">three membership tiers</a> for anyone who wants to support my work:</p>

<ol>
  <li>At the lowest tier, you will get regular news about new things I am working on, and exclusive
early access to drafts and work-in-progress. Keep your finger on the pulse of new research as it
is happening. I will also send you some nice stickers (once I’ve got them printed).</li>
  <li>At the middle tier, you will additionally be invited to participate in an exclusive community
with other supporters and myself, with both live and asynchronous discussions. I hope to
cultivate thoughtful, high-quality exchange of ideas with likeminded people in this community.</li>
  <li>At the highest tier, you get all the aforementioned benefits, plus the ability to influence my
direction when I’m choosing what to work on next. Not saying I will definitely do what you want;
also not saying that I will only take input from paying supporters (I still welcome ideas from
everyone). However, I will consult and engage with supporters at this tier to get your opinions.
I will also acknowledge you in any papers and books I write, making your name permanently etched
into the scientific literature.</li>
</ol>

<p>However, the biggest benefit is that by supporting me on Patreon you are enabling the creation of
future work: that is, new thinking, writing, talks, and code that would not be created if I had to
spend my time writing grant proposals or working for some company instead. If I have to go and get
a job somewhere, you will mostly hear me giving bland talks promoting the technology of whatever
company I happen to work for. Being independent allows me to pick topics that I find interesting and
important (such as <a href="https://www.youtube.com/watch?v=5ZjhNTM8XU8">database transactions</a>,
<a href="https://www.youtube.com/watch?v=Uav5jWHNghY">formal verification</a>,
<a href="https://www.youtube.com/watch?v=B5NULPSiOGw">CRDTs</a>, or
<a href="https://martin.kleppmann.com/papers/curve25519.pdf">elliptic curve cryptography</a>),
and present them in an accessible and neutral way.</p>

<p>I will continue making most of my work publicly available for free (except for books): even if you
cannot afford to be a Patreon supporter, it will still be available to you. Patreon supporters
simply get earlier access, plus the warm fuzzy feeling of knowing that you enabled the creation of
new work that, without your support, may never have existed. Supporting me on Patreon is <em>not
a donation</em>: it is an investment in future work that will hopefully be valuable to you.</p>

<p>If you have found my work useful – for example, if you have applied ideas from my talks in your
work, or if my book helped you get a job – then I would be delighted to welcome you as a
<a href="https://www.patreon.com/martinkl">supporter</a>! And if your company uses my book for training
engineers, please find out how your company can support me: even my highest supporter tier is a tiny
amount of money for a company that uses my work to improve the skills of their staff.  I only get
around $2 to $5 for every copy of my book that is sold; if you’re getting a lot more value than this
out of it, it would only be fair of you to <a href="https://www.patreon.com/martinkl">support me more substantially</a>.</p>

<p>If you cannot contribute financially, worry not. I equally appreciate your support in the form of
contributions to the open source community, discussing interesting ideas with me, and sharing useful
material with others. I will continue to engage with you and answer your questions, regardless of
whether you are a paying supporter. And most things I produce will continue to be free, so that
everyone can benefit from them.</p>

<h2 id="planned-work">Planned work</h2>

<p>Keep in mind that when you support me, you are not buying a product. You don’t know exactly what
you’re going to get, because I don’t know exactly what I am going to do in advance either. That’s
why it’s called research – it’s open-ended, and part of its purpose is to go down unexpected
rabbit-holes if they seem important! You are funding a person because this person has done good work
in the past, and is likely to continue doing good work in the future.</p>

<p>I do have a lot of plans, though. At a high level, I am hoping to do these things over the next few years:</p>

<ul>
  <li>Write another book to complement <a href="https://dataintensive.net/">Designing Data-Intensive Applications</a>;</li>
  <li>Develop the foundational technologies to enable the
<a href="https://www.cl.cam.ac.uk/research/dtg/trve/">next generation of collaboration software</a> (such as
Google Docs), in a way that does not require
<a href="https://www.inkandswitch.com/local-first.html">giving Google all of our data</a>;</li>
  <li>Continue writing research papers, blog posts, and giving talks/making videos on distributed
systems and related topics.</li>
</ul>

<p>There is no concrete timescale for these things; most likely I will work on several of them in
tandem, as I have been doing over the last several years.</p>

<p>Part of this story is creating educational content on topics that I find important, and part is
a vision for the future of collaborative computing, which my collaborators and I are realising in
the form of <a href="https://github.com/automerge/automerge">Automerge</a>, an open source project. Our vision
is articulated in the essay-cum-manifesto on
<a href="https://www.inkandswitch.com/local-first.html">local-first software</a>, which I suggest you read if
you haven’t already.</p>

<h2 id="research-philosophy">Research philosophy</h2>

<p>For me it is important to have this mixture of research, open source software development, and
teaching (through speaking and writing), because all of these activities feed off each other.
I don’t want to just work on open source without doing research, because that only leads to
incremental improvements, no fundamental breakthroughs. I don’t want to just do research without
applying it, because that would mean losing touch with reality. And I don’t want to just be
a YouTuber or writer without doing original research, because I would run out of ideas and my
content would get stale and boring; good teaching requires actively working in the area.</p>

<p>This interaction was articulated wonderfully by
<a href="https://amturing.acm.org/award_winners/gray_3649936.cfm">Turing award winner Jim Gray</a>:</p>

<blockquote>
  <p>I aspire to be a scholar of computer science. All fields of scholarship, from religion to
medicine, emphasize three aspects: meditation, teaching and service. Meditation (called research
by scientists) is the official part of research. But, teaching (writing papers, explaining your
ideas, and transferring technology) and service (making computer systems and helping people use
them) are also major aspects of the scholarly process. They keep the scholar in touch with
reality.</p>

  <p>— <a href="http://jimgray.azurewebsites.net/papers/critiqueofibm%27scsresearch.pdf">Jim Gray, 1980</a></p>
</blockquote>

<p>(That’s from Gray’s letter of resignation from IBM. The whole letter is a fascinating read if you’re
into computing history. At the time Gray was working on
<a href="https://people.eecs.berkeley.edu/~brewer/cs262/SystemR.pdf">System R</a>, the precursor of all
relational databases we use today. It’s fair to say that his work has had a huge impact.)</p>

<p>Another aspect of my research philosophy is that good work rarely happens with one person alone, but
through collaboration with other good people. Quoting Jim Gray again:</p>

<blockquote>
  <p>Computer science is an empirical and multi-disciplinary field. The aspect of it that I work on,
computer systems, requires lots of good people, time and equipment to produce anything of
interest. Projects of five or ten people working for five or ten years seem to be about the right
scale. More modest projects are unable to attack significant problems. More ambitious projects
have unclear goals and have management problems.</p>
</blockquote>

<p>You might be wondering: even if I get enough Patreon funding to cover my own living expenses, it
seems unlikely that I will be able to crowdfund a team of five to ten people. Fortunately, I have
found over the last years that collaboration does not require all team members to be funded out of
the same purse. I constantly collaborate with people without being responsible for their payroll.
In open source, it is common for contributors to a project to be employed by several different
organisations, and indeed such diversity makes projects better and more resilient.</p>

<p>I work closely with the <a href="https://www.inkandswitch.com/">Ink &amp; Switch lab</a>, who have their own
funding. Some of my collaborators are PhD students who have their own stipends, or research fellows
who have their own grants. We come together because of our common interests, and because nobody is
trying to profit from the others. We have a vision of the future that we want to realise, and the
funding just lets us pay the bills as we work towards the greater goal.</p>

<p>Of course, if my Patreon ends up being successful and generates more money than I need for my own
living expenses, I will use it to help fund collaborators. I am not aiming to recreate the lavish
Silicon Valley engineering salary that I left behind; I just want to do good work without having to
spend a lot of time chasing grants.</p>

<h2 id="alternatives-to-crowdfunding">Alternatives to crowdfunding</h2>

<p>Before moving to <a href="https://www.patreon.com/martinkl">Patreon</a> I considered several alternatives:</p>

<ul>
  <li>Academic jobs and fellowships? It’s difficult to get a stable position at a research-focussed
university. Both jobs and funding are fiercely competitive (hundreds of applicants for one place),
and they require a strong track record of publications. Unfortunately, there is a
<a href="https://cacm.acm.org/blogs/blog-cacm/248824-how-objective-is-peer-review/fulltext">large degree of randomness</a>
in the choice of papers that get accepted to top-tier publication venues. I am still interested in
an academic career, but it seems unwise to put all my eggs in this uncertain basket. Oh, and due to
the pandemic my current university has a hiring freeze anyway, so no jobs anytime soon.</li>
  <li>Founding a startup? Been there, <a href="https://www.crunchbase.com/person/martin-kleppmann">done that</a>
(twice). A startup is a great way of productising technology on a 1–2 year time scale, but it also
needs fast growth and/or a strong revenue model. My current work does not fit that model since it
focusses on foundational technologies with a longer time-scale (the 5–10 years mentioned by Jim
Gray), and it aims for public benefit rather than private profit.</li>
  <li>Getting a job at someone else’s company? I want to be free to choose what to work on based on what
I believe is important, not whatever happens to suit a company’s agenda. I also want to be free to
publish that work openly. Not many companies are willing to support such positions long-term.</li>
  <li>Consulting work and training? I could spend a fraction of my time helping companies solve problems
within my area of expertise, or running training workshops. However, this type of income can
fluctuate wildly, and generating a steady stream of clients is a lot of work and very distracting.
It’s difficult to make consulting compatible with the deep thinking and long-term view required
for research.</li>
  <li>Becoming a professional author? I have been able to draw a reasonable income from
<a href="https://martin.kleppmann.com/2020/09/29/is-book-writing-worth-it.html">royalties for sales of my book</a>.
However, I have no idea how long those sales will last, and I have no idea whether any future book
I write will sell similarly well. Given this unpredictability, it seems unwise to bet on royalties
as my only income. Moreover, book-writing is only one of several things I do, and I believe the
other things generate value too. I believe my funding situation should reflect that.</li>
</ul>

<p>With crowdfunding, I hope to not only generate a steady income stream, but also build a community of
people who are excited about the same topics as me, and who are invested in making these ideas
a reality. It is an opportunity for me to share early-stage work with enthusiasts, and to improve
that work through feedback from the community. And it is an opportunity for you to get an insider
view of the research process as we build the future of computing.</p>

<p>If you believe in our vision for
<a href="https://www.inkandswitch.com/local-first.html">a better future of collaborative computing</a>, or if
you want to see more high-quality educational materials for computer science, then why not head over
to Patreon and <a href="https://www.patreon.com/martinkl">pledge your support</a>? It will make a huge
difference. Thank you!</p>

                ]]></content:encoded>
            </item>
        
            <item>
                <title>Decentralised content moderation</title>
                <link>http://martin.kleppmann.com/2021/01/13/decentralised-content-moderation.html</link>
                <comments>http://martin.kleppmann.com/2021/01/13/decentralised-content-moderation.html#disqus_thread</comments>
                <pubDate>Wed, 13 Jan 2021 00:00:00 +0000</pubDate>
                <dc:creator>Martin Kleppmann</dc:creator>
                
                    <guid isPermaLink="true">http://martin.kleppmann.com/2021/01/13/decentralised-content-moderation.html</guid>
                
                <description><![CDATA[ Who is doing interesting work on decentralised content moderation? With Donald Trump suspended from Twitter and Facebook, and Parler kicked off AWS, there is renewed discussion about what sort of speech is acceptable online, and how it should be enforced. Let me say up front that I believe that these... ]]></description>
                <content:encoded><![CDATA[
                    <p><strong>Who is doing interesting work on decentralised content moderation?</strong></p>

<p>With Donald Trump suspended from Twitter and Facebook, and
<a href="https://en.wikipedia.org/wiki/Parler">Parler</a> kicked off AWS, there is renewed discussion about
what sort of speech is acceptable online, and how it should be enforced. Let me say up front that
I believe that these bans were justified. However, they do raise questions that need to be
discussed, especially within the technology community.</p>

<p>As many have already pointed out, Twitter, Facebook and Amazon are corporations that are free to
enforce their terms of service in whatever way they see fit, within the bounds of applicable law
(e.g. anti-discrimination legislation). However, we should also realise that <em>almost all</em> social
media, the public spaces of the digital realm, are in fact privately owned spaces subject to
a corporation’s terms of service. There is currently no viable, non-corporate alternative space that
we could all move to. For better or for worse, Mark Zuckerberg, Jack Dorsey, and Jeff Bezos (and
their underlings) are, for now, the arbiters of what can and cannot be said online.</p>

<p>This situation draws attention to the <a href="https://redecentralize.org/">decentralised web community</a>,
a catch-all for a broad set of projects that are aiming to reduce the degree of centralised
corporate control in the digital sphere. This includes self-hosted/federated social networks such as
<a href="https://joinmastodon.org/">Mastodon</a> and <a href="https://diasporafoundation.org/">Diaspora</a>, peer-to-peer
social networks such as <a href="https://scuttlebutt.nz/">Scuttlebutt</a>, and miscellaneous blockchain
projects. The exact aims and technicalities of those projects are not important for this post.
I will start by focussing on one particular design goal that is mentioned by many decentralised web
projects, and that is <em>censorship resistance</em>.</p>

<h2 id="censorship-resistance">Censorship resistance</h2>

<p>When we think of censorship, we think of totalitarian states exercising violent control over their
population, crushing dissent and stifling the press. Against such an adversary, technologies that
provide censorship resistance seem like a positive step forward, since they promote individual
liberty and human rights.</p>

<p>However, often the adversary is not a totalitarian state, but other users. Censorship resistance
means that anybody can say anything, without suffering consequences. And unfortunately there are
a lot of people out there who say and do rather horrible things. Thus, as soon as
a censorship-resistant social network becomes sufficiently popular, I expect that it will be filled
with messages from spammers, neo-nazis, and child pornographers (or any other type of content that
you consider despicable). One person’s freedom from violence is another person’s censorship, and
thus, a system that emphasises censorship resistance will inevitably invite violence against some
people.</p>

<p>I fear that many decentralised web projects are designed for censorship resistance not so much
because they deliberately want to become hubs for neo-nazis, but rather out of a kind of naive
utopian belief that more speech is always better. But I think we have learnt in the last decade that
this is not the case. If we want technologies to help build the type of society that we want to live
in, then certain abusive types of behaviour must be restricted. Thus, content moderation is needed.</p>

<h2 id="the-difficulty-of-content-moderation">The difficulty of content moderation</h2>

<p>If we want to declare some types of content as unacceptable, we need a process for distinguishing
between acceptable and unacceptable material. But this is difficult. Where do you draw the line
between healthy scepticism and harmful conspiracy theory? Where do you draw the line between healthy
satire, using exaggeration for comic effect, and harmful misinformation? Between legitimate
disagreement and harassment? Between honest misunderstanding and malicious misrepresentation?</p>

<p>With all of these, some cases will be very clearly on one side or the other of the dividing line,
but there will always be a large grey area of cases that are unclear and a matter of subjective
interpretation. “<a href="https://en.wikipedia.org/wiki/I_know_it_when_I_see_it">I know it when I see it</a>”
is difficult to generalise into a rule that can be applied objectively and consistently; and without
objectivity and consistency, moderation can easily degenerate into a situation where one group of
people forces their opinions on everyone else, like it or not.</p>

<p>In a service that is used around the world, there will be cultural differences on what is considered
acceptable or not. Maybe one culture is sensitive about nudity and tolerant of depictions of
violence, while another culture is liberal about nudity and sensitive about violence. One person’s
terrorist is another person’s freedom fighter. There is no single, globally agreed standard of what
is or is not considered acceptable.</p>

<p>Nevertheless, it is possible to come to agreement. For example, Wikipedia editors successfully
manage to agree on what should and should not be included in Wikipedia articles, even those on
contentious subjects. I won’t say that this process is perfect: Wikipedia editors are predominantly
white, male, and from the Anglo-American cultural sphere, so there is bound to be bias in their
editorial decisions. I haven’t participated in this community, but I assume the process of coming to
agreement is sometimes messy and will not make everybody happy.</p>

<p>Moreover, being an encyclopaedia, Wikipedia is focussed on widely accepted facts backed by evidence.
Attempting to moderate social media in the same way as Wikipedia would make it joyless, with no room
for satire, comedy, experimental art, or many of the other things that make it interesting and
humane. Nevertheless, Wikipedia is an interesting example of decentralised content moderation that
is not controlled by a private entity.</p>

<p>Another example is federated social networks such as Mastodon or Diaspora. Here, each individual
server administrator has the authority to
<a href="https://docs.joinmastodon.org/admin/moderation/">set the rules for the users of their server</a>, but
they have no control over activity on other servers (other than to block another server entirely).
Despite the decentralised architecture, there is a
<a href="https://arxiv.org/pdf/1909.05801.pdf">trend towards centralisation</a> (10% of Mastodon instances
account for almost half the users), leaving a lot of power in the hands of a small number of server
administrators. If these social networks are to go more mainstream, I expect these effects to be
amplified.</p>

<h2 id="filter-bubbles">Filter bubbles</h2>

<p>One form of social media is private chat for small groups, as provided e.g. by WhatsApp, Signal, or
even email. Here, when you post a message to a group, the only people who can see it are members of
that group. In this setting, not much content moderation is needed: group members can kick out other
members if they say things considered unacceptable. If one group says things that another group
considers objectionable, that’s no problem, because the two groups can’t see each other’s
conversations anyway. If one user is harassing another, the victim can block the harasser. Thus,
private groups are comparatively easy to deal with.</p>

<p>The situation is harder with social media that is public (anyone can read) and open (anyone can join
a conversation), or when the groups are very large. Twitter is an example of this model (and
Facebook to some degree, depending on your privacy settings). When anybody can write a message that
you will see (e.g. a reply to something you posted publicly), the door is opened to harassment and
abuse.</p>

<p>One response might be to retreat into our filter bubbles. For example, we could say that you see
only messages posted by your immediate friends and friends-of-friends. I am pretty sure that there
are no neo-nazis among my direct friends, and probably none among my second-degree network either, so such
a rule would shield me from extremist content of one sort, at least.</p>

<p>It is also possible for users to collaborate on creating filters. For example,
<a href="https://github.com/freebsdgirl/ggautoblocker">ggautoblocker</a> was a tool to block abusive Twitter
accounts during <a href="https://en.wikipedia.org/wiki/Gamergate_controversy">GamerGate</a>, a 2014
misogynistic harassment campaign that
<a href="https://www.theguardian.com/technology/2016/dec/01/gamergate-alt-right-hate-trump">foreshadowed</a>
the rise of the alt-right and Trumpism. In the absence of central moderation by Twitter, victims of
this harassment could use this tool to automatically block a large number of harmful users so that
they wouldn’t have to see the abusive messages.</p>

<p>Of course, even though such filtering saves you from having to see things you don’t like, it doesn’t
stop the objectionable content from existing. Moreover, other people may have the opposite sort of
filter bubble in which they see <em>lots</em> of extremist content, causing them to become radicalised.
Personalised filters also stop us from seeing alternative (valid) opinions that would help broaden
our worldview and enable better mutual understanding of different groups in society.</p>

<p>Thus, subjective filtering of who sees what, such as blocking users, is an important part of
reducing harm on social media, but by itself it is not sufficient. It is also necessary to uphold
minimum standards on what can be posted at all, for example by requiring a baseline of civility and
truthfulness.</p>

<h2 id="democratic-content-moderation">Democratic content moderation</h2>

<p>I previously argued that there is no universally agreed standard of acceptability of content; and
yet, we must somehow keep the standard of discourse high enough that it does not become intolerable
for those involved, and to minimise the harms e.g. from harassment, radicalisation, and incitement
of violence. How do we solve this contradiction? Leaving the power in the hands of a small number of
tech company CEOs, or any other small and unelected group of people, does not seem like a good
long-term solution.</p>

<p>A purely technical solution does not exist either, since code cannot make value judgements about
what sort of behaviour is acceptable. It seems like some kind of democratic process is the only
viable long-term solution here, perhaps supported by some technological mechanisms, such as
AI/machine learning to flag potentially abusive material. But what might this democratic process
look like?</p>

<p>Moderation should not be so heavy-handed that it drowns out legitimate disagreement. Disagreement
need not always be polite; indeed,
<a href="https://everydayfeminism.com/2015/12/tone-policing-and-privilege/">tone policing</a> should not be
a means of silencing legitimate complaints. On the other hand, aggressive criticism may quickly flip
into the realm of harassment, and it may be unclear when exactly this line has been crossed.
Sometimes it may be appropriate to take into account the power relationships between the people
involved, and hold the privileged and powerful to a higher standard than the oppressed and
disadvantaged, since otherwise the system may end up reinforcing existing imbalances. But there are
no hard and fast rules here, and much depends on the context and background of the people involved.</p>

<p>This example indicates that the moderation process needs to embed ethical principles and values. One
way of doing this would be to have a board of moderation overseers that is elected by the user base.
In their manifesto, candidates for this board can articulate the principles and values that they
will bring to the job. Different candidates may choose to represent people with different world
views, such as conservatives and liberals. Having a diverse set of opinions and cultures represented
on such a board would both legitimise its authority and improve the quality of its decision-making.
In time, maybe even parties and factions may emerge, which I would regard as a democratic success.</p>

<p>Facebook employs
<a href="https://bhr.stern.nyu.edu/tech-content-moderation-june-2020">around 15,000 content moderators</a>, and
by all accounts it’s
<a href="https://www.theverge.com/2019/2/25/18229714/cognizant-facebook-content-moderator-interviews-trauma-working-conditions-arizona">a horrible job.</a>
Who would want to do it? On the other hand, 15,000 is a tiny number compared to Facebook’s user
count. Rather than concentrating all the content moderation work on a comparatively small number of
moderators, maybe every user should have to do a stint at moderation from time to time as part of
their conditions for using a service? Precedents for this sort of thing exist: in a number of
countries, individuals may be called to jury duty to help decide criminal cases; and researchers are
regularly asked to review articles written by their peers. These things are not great fun either,
but we do them for the sake of the civic system that we all benefit from.</p>

<p>Moderators with differing political views may disagree on whether a certain piece of content is
acceptable or not. In cases of such disagreement, additional people can be brought in, hopefully
allowing the question to be settled through debate. If no agreement can be found, the matter can be
escalated to the elected board, which has the final say and which uses the experience to set
guidelines for future moderation.</p>

<h2 id="implications-for-decentralised-technologies">Implications for decentralised technologies</h2>

<p>In decentralised social media, I believe that ultimately it should be the users themselves who
decide what is acceptable or not. This governance will have to take place through some human process
of debate and deliberation, although technical tools and some degree of automation may be able to
support the process and make it more efficient. Rather than simplistic censorship resistance, or
giving administrators dictatorial powers, we should work towards ethical principles, democratic
control, and accountability.</p>

<p>I realise that my proposals are probably naive and smack of “computer scientist finally discovers
why the humanities are important”. Therefore, if you know of any work that is relevant to this topic
and can help technological systems learn from centuries of experience in democracy in the civil
society, please send it to me — I am keen to learn more. Moreover, if there is existing work in the
decentralised web community on enabling this kind of grassroots democracy, I would love to hear
about it too.</p>

<p>You can find me on Twitter <a href="https://twitter.com/martinkl">@martinkl</a>, or contact me by email
(firstname at lastname dot com). I will update this post with interesting things that are sent to
me.</p>

<h2 id="updates-related-work">Updates: related work</h2>

<p>Here are some related projects that have been pointed out to me since this post was published. I
have not vetted them, so don’t take this as an endorsement.</p>

<ul>
  <li>The <a href="https://oversightboard.com/">Facebook/Instagram Oversight Board</a> is quite close to what
I have in mind, and it has <a href="https://oversightboard.com/news/226612455899839-oversight-board-upholds-former-president-trump-s-suspension-finds-facebook-failed-to-impose-proper-penalty/">upheld</a>
the suspension of Trump’s account.</li>
  <li>The recently launched
<a href="https://news.mit.edu/2021/center-constructive-communication-0113">MIT Center for Constructive Communication</a>
is an ambitious effort in this area.</li>
  <li>“<a href="https://foundation.mozilla.org/en/blog/fellow-research-decentralized-web-hate/">The Decentralized Web of Hate</a>”
is a detailed report by <a href="http://emmibevensee.com/">Emmi Bevensee</a> on use of decentralised
technologies by extremists.</li>
  <li><a href="https://homes.cs.washington.edu/~axz/publications.html">Amy X. Zhang</a> and her collaborators have
done a lot of research on moderation.</li>
  <li><a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4005326">Evelyn Douek argues</a> that it’s not sufficient to
view content moderation as lots of individual decisions on individual pieces of content, but that accountability
requires a new form of institution that provides a dynamic, continuous governance structure.</li>
  <li><a href="https://twitter.com/arcalinea">Jay Graber</a> recently published a comprehensive
<a href="https://twitter.com/arcalinea/status/1352316972654944257">report comparing decentralised social protocols</a>, and a
<a href="https://jaygraber.medium.com/designing-decentralized-moderation-a76430a8eab">blog post</a>
on decentralised content moderation.</li>
  <li><a href="https://twitter.com/weschow">Wes Chow</a> has written a
<a href="https://medium.com/@wesc/opportunities-in-the-design-of-decentralized-social-networks-d66cce42d74b">thoughtful and nunanced article</a>
on decentralised content moderation, with lots of references to further reading at the end.</li>
  <li>A few <a href="https://twitter.com/xmal/status/1349413781953273857">people</a>
<a href="https://twitter.com/weschow/status/1349417270179737604">mentioned</a> Slashdot, Reddit, and Stack Overflow
as successful examples of community-run moderation.</li>
  <li>On the other hand, J. Nathan Matias <a href="https://twitter.com/natematias/status/1496318787712344067">is skeptical</a>
that volunteers will be able to handle the challenges of content moderation at scale, since Facebook reportedly
spends $500m a year on it.</li>
  <li><a href="https://cblgh.org/articles/trustnet.html">Trustnet</a> is a way of computing numerical scores for
the degree of trust in individual users, based on the social graph.</li>
  <li><a href="https://matrix.org/">Matrix</a>, a federated messaging system, is
<a href="https://matrix.org/blog/2020/10/19/combating-abuse-in-matrix-without-backdoors">working on</a> a
decentralised, subjective reputation system.</li>
  <li><a href="https://freenetproject.org/">Freenet</a> has a web-of-trust-based, decentralised
<a href="https://www.draketo.de/english/freenet/friendly-communication-with-anonymity">user reputation system</a>
(see also this <a href="https://github.com/xor-freenet/plugin-WebOfTrust/blob/master/developer-documentation/core-developers-manual/OadSFfF-version1.2-non-print-edition.pdf">Bachelor’s thesis</a>).</li>
  <li><a href="https://www.waivlength.io/">Waivlength</a> is exploring a <a href="https://waivlengthdev.medium.com/jury-duty-a-decentralised-moderation-model-for-governing-a-social-media-platform-b675b558dd6d">governance approach inspired by jury duty</a>.</li>
  <li><a href="https://github.com/Freechains/README">Freechains</a> is a peer-to-peer content distribution
protocol with an embedded user reputation system.</li>
  <li><a href="https://github.com/Murmuration-Labs/songbird-decentralized-moderation">Songbird</a> is a sketch of a
decentralised moderation system for IPFS.</li>
  <li><a href="https://cabal.chat/">Cabal</a> allows users to
<a href="https://twitter.com/substack/status/1349471659653124098">subscribe</a> to other users’ moderation
actions, such as blocking and hiding posts.</li>
  <li>An app called <a href="https://kc-fantastic-app.medium.com/decentralized-content-moderation-on-fantastic-app-3768989ced19">Fantastic</a>
is exploring mechanisms for moderation.</li>
  <li>Felix Dietze’s <a href="https://github.com/fdietze/notes/blob/master/felix_dietze_master_thesis_2015.pdf">2015 master’s thesis</a>
explores community-run moderation. He is also working on
<a href="https://felix.unote.io/hacker-news-scores">ranking</a>
<a href="https://github.com/fdietze/downvote-scoring">algorithms</a>
for news aggregators.</li>
  <li>Twitter is trialling <a href="https://blog.twitter.com/en_us/topics/product/2021/introducing-birdwatch-a-community-based-approach-to-misinformation.html">Birdwatch</a>,
a crowdsourced effort to tackle misinformation.</li>
  <li><a href="https://blog.coinbase.com/coinbases-philosophy-on-account-removal-and-content-moderation-c80d1aa452b7">Coinbase’s approach</a>
is to ban only content that is illegal in jurisdictions where they operate, or content that is
<a href="https://en.wikipedia.org/wiki/United_States_free_speech_exceptions">not considered protected speech</a>
under the U.S. First Amendment.</li>
</ul>

                ]]></content:encoded>
            </item>
        
            <item>
                <title>Using Bloom filters to efficiently synchronise hash graphs</title>
                <link>http://martin.kleppmann.com/2020/12/02/bloom-filter-hash-graph-sync.html</link>
                <comments>http://martin.kleppmann.com/2020/12/02/bloom-filter-hash-graph-sync.html#disqus_thread</comments>
                <pubDate>Wed, 02 Dec 2020 00:00:00 +0000</pubDate>
                <dc:creator>Martin Kleppmann</dc:creator>
                
                    <guid isPermaLink="true">http://martin.kleppmann.com/2020/12/02/bloom-filter-hash-graph-sync.html</guid>
                
                <description><![CDATA[ This blog post uses MathJax to render mathematics. You need JavaScript enabled for MathJax to work. In some recent research, Heidi and I needed to solve the following problem. Say you want to sync a hash graph, such as a Git repository, between two nodes. In Git, each commit is... ]]></description>
                <content:encoded><![CDATA[
                    <p><em>This blog post uses <a href="https://www.mathjax.org/">MathJax</a> to render mathematics. You need JavaScript enabled for MathJax to work.</em></p>

<p>In some recent research, <a href="http://heidihoward.co.uk/">Heidi</a> and I needed to solve the following problem.
Say you want to sync a hash graph, such as a Git repository, between two nodes.
In Git, each commit is identified by its hash, and a commit may include the hashes of predecessor commits (a commit may include more than one hash if it’s a merge commit).
We want to figure out the minimal set of commits that the two nodes need to send to each other in order to make their graphs the same.</p>

<p>You might wonder: isn’t this a solved problem?
Git has to do this every time you do <code class="language-plaintext highlighter-rouge">git pull</code> or <code class="language-plaintext highlighter-rouge">git push</code>!
You’re right, and some cases are easy, but other cases are a bit trickier.
What’s more, the algorithm used by Git is not particularly well-documented, and in any case we think that we can do better.</p>

<p>For example, say we have two nodes, and each has one of the following two hash graphs (circles are commits, arrows indicate one commit referencing the hash of another).
The blue part (commit A and those to the left of it) is shared between the two graphs, while the dark grey and light grey parts exist in only one of the two graphs.</p>

<p><a href="/2020/12/hash-dag.png"><img src="/2020/12/hash-dag.png" width="550" height="258" alt="Illustration of two hash graphs" /></a></p>

<p>We want to reconcile the two nodes’ states so that one node sends all of the dark-grey-coloured commits, the other sends all of the light-grey-coloured commits, and both end up with the following graph:</p>

<p><a href="/2020/12/hash-dag2.png"><img src="/2020/12/hash-dag2.png" width="550" height="143" alt="Hash graph after reconciliation" /></a></p>

<p>How do we efficiently figure out which commits the two nodes need to send to each other?</p>

<h2 id="traversing-the-graph">Traversing the graph</h2>

<p>First, some terminology.
Let’s say commit A is a <em>predecessor</em> of commit B if B references the hash of A, or if there is some chain of hash references from B leading to A.
If A is a predecessor of B, then B is a <em>successor</em> of A.
Finally, define the <em>heads</em> of the graph to be those commits that have no successors.
In the example above, the heads are B, C, and D.
(This is slightly different from how Git defines <code class="language-plaintext highlighter-rouge">HEAD</code>.)</p>
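<p>In code, this terminology might look as follows (a toy sketch, assuming a hash graph is represented as a mapping from each commit hash to the list of predecessor hashes it references — real systems would store full content-addressed commit objects):</p>

```python
# A toy hash graph: maps each commit hash to the hashes it references.
graph = {
    "P": [],     # initial commit, no predecessors
    "A": ["P"],  # A references P, so P is a predecessor of A
    "B": ["A"],  # B and C both reference A,
    "C": ["A"],  # so they are successors of A
}

def heads(graph):
    """The commits that no other commit references, i.e. that have no successors."""
    referenced = {pred for preds in graph.values() for pred in preds}
    return {commit for commit in graph if commit not in referenced}

print(sorted(heads(graph)))  # ['B', 'C']
```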

<p>The reconciliation algorithm is easy if it’s a “fast-forward” situation: that is, if one node’s heads are commits that the other node already has.
In that case, one node sends the other the hashes of its heads, and the other node replies with all commits that are successors of the first node’s heads.
However, the situation is trickier in the example above, where one node’s heads B and C are unknown to the other node, and likewise head D is unknown to the first node.</p>

<p>In order to reconcile the two graphs, we want to figure out which commits are the latest common predecessors of both graphs’ heads (also known as <em>common ancestors</em>, marked A in the example), and then the nodes can send each other all commits that are successors of the common predecessors.</p>

<p>As a first attempt, we can try this: the two nodes send each other their heads; if those contain any unknown predecessor hashes, they request those, and repeat until all hashes resolve to known commits.
Thus, the nodes gradually work their way from the heads towards the common predecessors.
This works, but it is slow if your graph contains long chains of commits, since the number of round trips required equals the length of the longest path from a head to a common predecessor.</p>
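<p>This first attempt can be sketched as a simulation (illustrative only: here the “remote” node is just another dictionary, whereas in reality each fetch would be a network round trip):</p>

```python
def reconcile_naive(local, remote):
    """Simulate the first-attempt traversal: fetch the remote's heads, then
    repeatedly request any referenced commits we don't yet know, one round
    trip per iteration. Both arguments map commit hash -> predecessor hashes."""
    referenced = {p for preds in remote.values() for p in preds}
    frontier = [h for h in remote if h not in referenced]  # the remote's heads
    round_trips = 0
    while frontier:
        round_trips += 1  # one request/response cycle
        fetched = {h: remote[h] for h in frontier}
        local.update(fetched)
        # Next frontier: predecessor hashes we still don't know about.
        frontier = [p for preds in fetched.values()
                    for p in preds if p not in local]
    return round_trips

local = {"P": []}
remote = {"P": [], "A": ["P"], "B": ["A"], "C": ["B"]}
print(reconcile_naive(local, remote))  # 3 round trips for a 3-commit chain
```

<p>On a linear chain, the loop iterates once per commit between the head and the common predecessor, which is exactly the round-trip cost described above.</p>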

<p>The “smart” transfer protocol used by Git essentially <a href="https://www.git-scm.com/docs/http-protocol">works like this</a>, except that it sends 32 hashes at a time in order to reduce the number of round trips.
Why 32? Who knows.
It’s a trade-off: send more hashes to reduce the number of round trips, but each request/response is bigger.
Presumably they decided that 32 was a reasonable compromise between latency and bandwidth.</p>

<p>Recent versions of Git also support an experimental <a href="https://github.com/git/git/commit/42cc7485a2ec49ecc440c921d2eb0cae4da80549">“skipping” algorithm</a>, which can be enabled using the <a href="https://git-scm.com/docs/git-config#Documentation/git-config.txt-fetchnegotiationAlgorithm"><code class="language-plaintext highlighter-rouge">fetch.negotiationAlgorithm</code> config option</a>.
Rather than moving forward by a fixed number of predecessors in each round trip, this algorithm allows some commits to be skipped, so that it reaches the common predecessors faster.
The skip size grows similarly to the Fibonacci sequence (i.e. exponentially) with each round trip.
This reduces the number of round trips to \(O(\log n)\), but you can end up overshooting the common predecessors, and thus the protocol may end up unnecessarily transmitting commits that the other node already has.</p>

<h2 id="bloom-filters-to-the-rescue">Bloom filters to the rescue</h2>

<p>In our new paper draft, which we are <a href="https://arxiv.org/abs/2012.00472">making available on arXiv today</a>, Heidi and I propose a different algorithm for performing this kind of reconciliation.
It is quite simple if you know how <a href="https://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</a> work.</p>

<p>In addition to sending the hashes of their heads, each node constructs a Bloom filter containing the hashes of the commits that it knows about.
In our prototype, we allocate 10 bits (1.25 bytes) per commit.
This number can be adjusted, but note that it is a lot more compact than sending the full 20-byte (for SHA-1, used by Git) or 32-byte (for SHA-256, which is more secure) hash for each commit.
Moreover, we keep track of the heads from the last time we reconciled our state with a particular node, and then the Bloom filter only needs to include commits that were added since the last reconciliation.</p>

<p>When a node receives such a Bloom filter, it checks its own commit hashes to see whether they appear in the filter.
Any commit whose hash does not appear in the Bloom filter can immediately be sent to the other node, along with all of its successors, since we can be sure that the other node does not know about those commits.
For any commits whose hash does appear in the Bloom filter, it is likely that the other node knows about that commit, but due to false positives it is possible that the other node actually does not know about those commits.</p>
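<p>A minimal sketch of this exchange (the parameters match the prototype described above, but the way bit positions are derived from each hash is an illustrative choice, not necessarily what our implementation does):</p>

```python
import hashlib

BITS_PER_ITEM = 10  # as in the prototype described above
NUM_HASHES = 7      # k: number of bits set per item

def bloom_indices(commit_hash, m):
    # Derive k bit positions from a SHA-256 digest of the commit hash
    # (an illustrative construction).
    digest = hashlib.sha256(commit_hash.encode()).digest()
    return [int.from_bytes(digest[4*i:4*i + 4], "big") % m
            for i in range(NUM_HASHES)]

def make_filter(commits):
    m = BITS_PER_ITEM * len(commits)
    bits = bytearray(m)  # one byte per bit, for readability
    for c in commits:
        for i in bloom_indices(c, m):
            bits[i] = 1
    return bits

def probably_contains(bits, commit_hash):
    return all(bits[i] for i in bloom_indices(commit_hash, len(bits)))

# One node sends a filter of its commits; the other checks its own commits.
their_filter = make_filter({"commitA", "commitB"})
ours = {"commitA", "commitB", "commitC", "commitD"}
definitely_missing = {c for c in ours if not probably_contains(their_filter, c)}
# definitely_missing is a subset of {"commitC", "commitD"}: false positives
# may cause some of those to be (wrongly) presumed known, but commits the
# sender has are never reported as missing.
```

<p>Because Bloom filters have no false negatives, everything in <code class="language-plaintext highlighter-rouge">definitely_missing</code> is genuinely unknown to the sender of the filter; false positives only cause some missing commits to be wrongly presumed known, which the follow-up round trips then repair.</p>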

<p>After receiving all the commits that did not appear in the Bloom filter, we check whether we know all of their predecessor hashes.
If any are missing, we request them in a separate round trip using the same graph traversal algorithm as before.
Due to the way the false positive probabilities work, the probability of requiring \(n\) round trips decreases exponentially as \(n\) grows.
For example, you might have a 1% chance of requiring two round trips, a 0.01% chance of requiring three round trips, a 0.0001% chance of requiring four round trips, and so on.
Almost all reconciliations complete in one round trip.</p>

<p>Unlike the skipping algorithm used by Git, our algorithm never unnecessarily sends any commits that the other side already has, and the Bloom filters are very compact, even for large commit histories.</p>

<h2 id="practical-relevance">Practical relevance</h2>

<p>In the paper we also prove that this algorithm allows nodes to sync their state even in the presence of arbitrarily many malicious nodes, making it immune to <a href="https://en.wikipedia.org/wiki/Sybil_attack">Sybil attacks</a>.
We then go on to prove a theorem that shows which types of applications can and cannot be implemented in this Sybil-immune way, without requiring any Sybil countermeasures such as <a href="https://en.wikipedia.org/wiki/Proof_of_work">proof-of-work</a> or the centralised control of <a href="https://arxiv.org/pdf/1711.03936.pdf">permissioned blockchains</a>.</p>

<p>All of this is directly relevant for <a href="https://www.inkandswitch.com/local-first.html">local-first</a> peer-to-peer applications in which apps running on different devices need to sync up their state without necessarily trusting each other or relying on any trusted servers.
I assume it’s also relevant for <a href="https://www.swirlds.com/downloads/SWIRLDS-TR-2016-01.pdf">blockchains that use hash graphs</a>, but I don’t know much about them.
So, syncing a Git commit history is just one of many possible use cases – I just used it because most developers will be at least roughly familiar with it!</p>

<p>The details of the algorithm and the theorems are in the <a href="https://arxiv.org/abs/2012.00472">paper</a>, so I won’t repeat them here.
Instead, I will briefly mention a few interesting things that didn’t make it into the paper.</p>

<h2 id="why-bloom-filters">Why Bloom filters?</h2>

<p>One thing you might be wondering: rather than creating a Bloom filter with 10 bits per commit, can we not just truncate the commit hashes to 10 bits and send those instead?
That would use the same amount of network bandwidth, and intuitively it may seem like it should be equivalent.</p>

<p>However, that is not the case: Bloom filters perform vastly better than truncated hashes.
I will use a small amount of probability theory to explain why.</p>

<p>Say we have a hash graph containing \(n\) distinct items, and we want to use \(b\) bits per item (so the total size of the data structure is \(m=bn\) bits).
If we are using truncated hashes, there are \(2^b\) possible values for each \(b\)-bit hash.
Thus, given two independently chosen, uniformly distributed hashes, the probability that they are the same is \(2^{-b}\).</p>

<p>If we have \(n\) uniformly distributed hashes, the probability that they are all different from a given \(b\)-bit hash is \((1-2^{-b})^n\).
The false positive probability is therefore the probability that a given \(b\)-bit hash equals one or more of the \(n\) hashes:</p>

<p>\[ P(\text{false positive in truncated hashes}) = 1 - (1 - 2^{-b})^n \]</p>

<p>On the other hand, with a Bloom filter, we start out with all \(m\) bits set to zero, and then for each item, we set \(k\) bits to one.
After one uniformly distributed bit-setting operation, the probability that a given bit is zero is \(1 - 1/m\).
Thus, after \(kn\) bit-setting operations, the probability that a given bit is still zero is \((1 - 1/m)^{kn}\).</p>

<p>A Bloom filter has a false positive when we check \(k\) bits for some item and they are all one, even though that item was not in the set.
The probability of this happening is</p>

<p>\[ P(\text{false positive in Bloom filter}) = (1 - (1 - 1/m)^{kn})^k \]</p>

<p>It’s not obvious from those expressions which of the two is better, so I plotted the false positive probabilities of truncated hashes and Bloom filters for varying numbers of items \(n\), and with parameters \(b=10\), \(k=7\), \(m=bn\):</p>

<p><a href="/2020/12/false-pos.png"><img src="/2020/12/false-pos.png" width="550" height="200" alt="Plot of false positive probability for truncated hashes and Bloom filters" /></a></p>

<p>For a Bloom filter, as long as we grow the size of the filter proportionally to the number of items (here we have 10 bits per item), the false positive probability remains pretty much constant at about 0.8%.
But truncated hashes of the same size behave much worse, and with more than about 1,000 items the false positive probability exceeds 50%.</p>
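<p>The plotted behaviour is easy to verify numerically from the two formulas (a quick sanity check, not code from the paper):</p>

```python
# False-positive probabilities with b = 10 bits per item and k = 7,
# matching the parameters used in the plot above.
def fp_truncated(n, b=10):
    return 1 - (1 - 2**-b)**n

def fp_bloom(n, b=10, k=7):
    m = b * n  # total filter size grows linearly with the number of items
    return (1 - (1 - 1/m)**(k*n))**k

for n in (100, 1000, 10000):
    print(n, fp_truncated(n), fp_bloom(n))
```

<p>With these parameters the Bloom filter’s false positive probability stays close to 0.8% regardless of \(n\), while the truncated-hash probability climbs past 50% as \(n\) approaches a thousand items.</p>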

<p>The reason for this: with 10-bit truncated hashes there are only 1,024 possible hash values, and if we have 1,000 different items, then most of those 1,024 possible values are already taken.
With truncated hashes, if we wanted to keep the false positive probability constant, we would have to use more bits per item as the number of items grows, so the total size of the data structure would grow faster than linearly in the number of items.</p>

<p>Viewing it like this, it is quite remarkable that Bloom filters work as well as they do, using only a constant number of bits per item!</p>

<h2 id="further-details">Further details</h2>

<p>The Bloom filter false positive formula given above is the one that is commonly quoted, but it’s actually not quite correct.
To be precise, it is a <a href="https://www.sciencedirect.com/science/article/abs/pii/S0020019008001579">lower bound</a> on the exact false positive probability (<a href="https://git.gnunet.org/bibliography.git/plain/docs/FalsepositiverateBloomFilter2008Bose.pdf">open access paper</a>).</p>

<p>Out of curiosity I wrote a <a href="https://gist.github.com/ept/83b91aa07e2495c86ddd8c364a8cfbc7">little Python script</a> that calculates the false positive probability for truncated hashes, Bloom filters using the approximate formula, and Bloom filters using the exact formula.
Fortunately, for the parameter values we are interested in, the difference between approximate and exact probability is very small.
The <a href="https://gist.github.com/ept/83b91aa07e2495c86ddd8c364a8cfbc7">gist</a> also contains a <a href="http://www.gnuplot.info/">Gnuplot</a> script to produce the graph above.</p>

<p><a href="https://twitter.com/pvh">Peter</a> suggested that a <a href="https://en.wikipedia.org/wiki/Cuckoo_filter">Cuckoo filter</a> may perform even better than a Bloom filter, but we haven’t looked into that yet.
To be honest, the Bloom filter approach already works so well, and it’s so simple, that I’m not sure the added complexity of a more sophisticated data structure would really be worth it.</p>

<p>That’s all for today.
Our paper is at <a href="https://arxiv.org/abs/2012.00472">arxiv.org/abs/2012.00472</a>.
Hope you found this interesting, and please let us know if you end up using the algorithm!</p>

                ]]></content:encoded>
            </item>
        
    </channel>
</rss>
