The Pragmatic Engineer

The Pulse: What can we learn from Bun’s rapid Rust rewrite with AI?

Ivan Klaric — Thu, 16 Jul 2026 16:50:20 GMT

Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover Big Tech and startups through the lens of senior engineers and engineering leaders. Today, we cover one out of four topics of last week's The Pulse issue. Full subscribers received the article below seven days ago. If you’ve been forwarded this email, you can subscribe here.

Last week in San Francisco, I met Jarred Sumner, creator of the JavaScript runtime Bun, and was keen to learn more about the rewrite of Bun from Zig to Rust. But at the time, Jarred didn’t want to say too much, as the tool used for the migration, Fable, was out of action due to the US government imposing export controls.

Jarred and I at Anthropic’s HQ, last week

Fortunately, the situation is now resolved and Fable is available globally, and Jarred has published a detailed post about the project. Before we get into the migration, some context:

Bun is a complex project, with lots of production software depending on it. Bun itself does many things:

JavaScript, TypeScript and CSS transpiling, minifying and bundling
A test runner
A package manager (npm-compatible)
Other things: module resolution, a WebSocket client, Node.js implementations and many modules

Today, Bun has 22 million monthly downloads, and software like Claude Code and OpenCode depends on it, while hosting providers like Vercel, Railway and DigitalOcean offer first-party support for Bun.

Why a rewrite?

Zig is not a memory safe language, and memory-related bugs occurred continuously. Jarred lists memory-related bugs in the latest version of Bun: memory leaks, crashes due to memory issues, heap-out-of-bounds writes, and so on. This was after the Bun team patched the Zig compiler to reduce memory-related issues, and put end-to-end memory leak tests in place. As Jarred says:

“Our bugfix list felt bad and I was tired of going to sleep worrying about crashes in Bun. I don't blame Zig for that - other users of Zig don't have the bugs we had, and mixing GC with manually-managed memory is an uncommon enough thing for software to need that no language really designs for it. (...)

For Bun, correctly handling the lifetimes of garbage-collected values and manually-managed values has been a major source of stability issues - most often small memory leaks and occasionally crashes. Every memory allocation has to be meticulously reviewed. Where do these bytes get freed? How do we ensure it only gets freed once? Did we check for JavaScript exceptions properly? Is this garbage-collected pointer visible to the conservative stack scanner? Is this garbage collected memory or manually managed memory?”

Moving to a memory-safe, yet performant language could eliminate such errors, and Rust is one such language that fitted the bill. Jarred:

“A large percentage of bugs from that list are use-after-free, double-free, and "forgot to free" in an error path. In safe Rust, these are compiler errors and RAII-like automatic cleanup with Drop. Compiler errors are a better feedback loop than a style guide.”

However, doing a full rewrite in Rust has always been a terrible idea. Or at least, it used to be, because of how unbearably long it would have taken:

There are two problems with rewrites: they take too long, and they take waaaay too long. A dev who has done rewrites probably knows how things tend to go:

Make an educated guess about how long it will take; say, nine months.
Nine months later, there’s still another ~6 months to go because new functionality is added to the original codebase, and now that new functionality needs to be added in!
By 15 months in, there’s still months left to go for the same reason!
In the end, you manage to mandate a “feature freeze” for two months and finish the rewrite in ~18 months, if lucky. The original nine-month estimate can end up taking 2+ years.

Jarred likened rewriting Bun in Rust to this:

“Historically, rewrites are a terrible idea. Excluding comments, Bun is 535,496 lines of Zig.

A rewrite in another language would take a small team of engineers a full year.

A year of zero user-facing impact is not a realistic option we could consider. So, enforcement through code-style to fix stability issues was our best bet, and was our plan when we added Rust-inspired smart pointers to Bun's codebase.

But honestly, I didn't want to do it. Homegrown smart pointers offer worse ergonomics than Rust, with none of the guarantees.

What if, instead, I spend a week testing if Anthropic's new model [Fable] can rewrite Bun in Rust?”

Rewriting Bun with Fable

Unsurprisingly, the rewrite was not as simple as typing a prompt like: “Claude, rewrite Bun in Rust. Make zero mistakes.” Instead, this is how Jarred did it:

Step #1: Prep work. Three hours of intense prep work with Claude, explained Jarred:

“Before writing any code, I spent about 3 hours talking to Claude about how to map patterns from our Zig codebase closely to Rust. Claude serialized this discussion into a PORTING.md document, which ended up on Hacker News [as the Zig → Rust porting guide]”

This guide is a 600-line file with instructions like:

Ground rules:

No tokio, rayon, hyper, async-trait, futures. No std::fs, std::net, std::process. Bun owns its event loop and syscalls. (Rust core/std slice, iter, mem, fmt, and core::ffi are fine — only the I/O-touching modules are banned.)
No async fn. Everything is callbacks + state machines, same as the Zig.
Borrow-checker reshaping is allowed. When matching Zig flow yields overlapping &mut, capture the needed scalar (.len(), index) into a local, drop the borrow, then re-borrow. Do NOT reach for raw pointers just to silence borrowck; leave // PORT NOTE: reshaped for borrowck so Phase B diff readers aren't confused.

It’s a series of instructions that makes sense to someone who’s an expert in Rust. If you want to learn more, we cover Rust basics and why Rust is different, with Alice Ryhl.

Step #2: Trial run + adversarial review. Jarred asked Claude to rewrite three of the 1,448 files. After the rewrite, he ran two separate adversarial reviews with Claude to critique the result, in sessions separate from the one in which Claude made the changes.

Step #3: split up the work across 64 AI agents. Jarred split up the job so that agents worked on files independent from one another, in parallel.

Step #4: iron out issues with the run (~1 day). When Jarred attempted to run all this, agents kept getting in each other’s way:

“I asked Claude to loop the workflow on all 1,448 .zig files, and about 2 minutes in, one Claude ran git stash before committing. Another ran git stash pop. And then git reset HEAD --hard. They were stepping on each other! And if I put each Claude into a separate worktree, I would run out of disk space because Bun's git repository is too big and eventually the changes will need to be compiled and seen together.

So, I asked Claude to edit the workflow to instruct Claude to never run git stash or git reset or any git command that doesn't commit a specific file at once. No cargo either. No slow commands at all.

Then, Claude resumed the workflows. And it was working! Too slowly, so I split it into just 4 workflow shards each with their own worktree (4 worktrees total), each running 16 Claudes committing and pushing files.”
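Jarred enforced the "no git stash, no git reset" rule by instructing the agents in the workflow prompt. A mechanical guard is one possible alternative, sketched below in Python. This is purely illustrative and not something the Bun post describes: the function name and the exact allowlist (only `git add` or `git commit` on one explicit path, plus `git push`) are assumptions.

```python
import shlex

def is_allowed_git_command(command: str) -> bool:
    """Hypothetical allowlist: single-file add/commit, or push. Everything else is refused."""
    args = shlex.split(command)
    if len(args) < 2 or args[0] != "git":
        return False
    subcommand = args[1]
    if subcommand == "push":
        return True
    if subcommand in {"add", "commit"}:
        # Require an explicit "--" separator followed by exactly one file path.
        if "--" not in args:
            return False
        paths = args[args.index("--") + 1:]
        return len(paths) == 1 and paths[0] not in {".", "*"}
    # stash, reset, and any other subcommand fall through to a refusal.
    return False

print(is_allowed_git_command("git stash"))                                    # False
print(is_allowed_git_command("git reset HEAD --hard"))                        # False
print(is_allowed_git_command("git commit -m 'port lexer' -- src/lexer.rs"))   # True
```

An allowlist is used rather than a blocklist so that any command nobody thought of is refused by default.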

Step #5: have it run and wait ~2 days. The parallel agents went to work, and completed the rewrite of 535,496 lines of Zig code over the course of two days. Each change was checked by two adversarial reviews before being committed.
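To get a feel for the numbers, here is a rough Python sketch of dealing 1,448 files across 4 worktree shards with 16 agents each, so that every file has exactly one owner. The round-robin scheme, the function name and the placeholder file names are assumptions for illustration. The post does not say how files were actually assigned, and this sketch ignores dependencies between files.

```python
def plan_shards(files, shards=4, workers_per_shard=16):
    """Deal files round-robin across shards x workers so no file has two owners."""
    total_workers = shards * workers_per_shard
    plan = [[[] for _ in range(workers_per_shard)] for _ in range(shards)]
    for index, path in enumerate(sorted(files)):
        worker = index % total_workers
        plan[worker // workers_per_shard][worker % workers_per_shard].append(path)
    return plan

files = [f"src/file_{i:04d}.zig" for i in range(1448)]  # placeholder names
plan = plan_shards(files)
sizes = [len(worker) for shard in plan for worker in shard]
print(len(sizes), min(sizes), max(sizes))  # 64 22 23
```

With 64 agents, each one ends up owning only 22 or 23 files.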

Step #7: fix ~16,000 compiler errors (~12 hours). The rewrite was completed, but nothing compiled. Going crate-by-crate (‘crate’ is Rust’s concept of a top-level compilation unit), Jarred had Claude fix compiler errors. This alone would be a herculean task for an engineer, but not for Claude:

“Fixing the cyclical dependencies revealed about 16,000 compiler errors. A massive number for 1 human, but not a crazy number for 64 Claude’s at once.

To maximize parallelism, the workflow looped over each crate.

For each crate, run cargo check, group the output by file and save the errors to a file
Fix all the compiler errors within that crate
2 adversarial reviewers for the crate's changes
1 fixer applies the fixes”
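The "group the output by file" step can be sketched roughly as follows. This is a minimal Python illustration, not the script Jarred used: it parses the human-readable diagnostics that rustc prints (an `error...` headline followed by a `-->` location line), and the sample output and function name are made up for the example.

```python
import re
from collections import defaultdict

# Primary location line printed under a rustc diagnostic, e.g. "  --> src/lib.rs:10:5"
LOCATION = re.compile(r"^\s*--> (?P<path>.+?):(?P<line>\d+):(?P<col>\d+)\s*$")

def group_errors_by_file(cargo_output: str) -> dict:
    """Group error headlines by the file of their primary location."""
    grouped = defaultdict(list)
    pending = None  # headline of an error whose location has not been seen yet
    for line in cargo_output.splitlines():
        if line.startswith("error"):
            pending = line.strip()
        elif pending is not None:
            match = LOCATION.match(line)
            if match:
                grouped[match["path"]].append(f"{match['line']}: {pending}")
                pending = None
    return dict(grouped)

# Hypothetical sample of `cargo check` output
SAMPLE = """\
error[E0425]: cannot find value `len` in this scope
 --> src/parser.rs:12:9
error[E0308]: mismatched types
 --> src/lexer.rs:40:17
error[E0499]: cannot borrow `buf` as mutable more than once at a time
 --> src/parser.rs:88:5
"""

for path, errors in group_errors_by_file(SAMPLE).items():
    print(path, len(errors))
```

A sturdier version would probably read `cargo check --message-format=json` instead of scraping text, but the grouping idea is the same: one bucket of errors per file, so each agent can be handed a bucket.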

Visualizing fixing of errors, one by one, done by the agents. Source: Anthropic

The enjoyable thing about this phase of the migration was that the agents ran from midnight until 11:30am, fixing compiler errors on their own – while Jarred and the team were getting some sleep.

Step #8: run tests locally (~2 days). Bun has a large test suite. The next step was to get these tests to run without compilation errors.

Step #9: get the test suite to pass CI (~3 days). Once the tests were running (and failing), the next step was to fix the code, so that the tests could pass. This took about three days.

Step #10: Done in 11 days! After all the tests passed and Jarred verified that everything worked as expected, he merged the changes. The whole process took 11 days, from planning to the finish.

The rewrite: porting ~550K lines of code, in 6,500 commits, over 11 days, with 64 agents

How repeatable is this process?

The rewrite cost a whopping $165,000 at API pricing. At Fable’s API prices, that covers 5.9 billion uncached input tokens, 690 million output tokens, and 72 billion cached input token reads. Anthropic sells API tokens at a margin, so the rewrite’s actual cost to the company was lower. Still, it’s a large amount: the equivalent of the annual base salary for a software engineer at a mid-tier company in the US!

But then again, could an engineer have done all this work in a year? Probably not, and Mitchell Hashimoto says the same:

“On the cost, I think $165,000 at API pricing for Fable (didn’t verify) is an incredible deal. There’s absolutely no way an engineer with that salary would’ve been able to achieve the milestones Claude did in 11 days. No way. (Even if you break it down to N engineers paid $165K total in 11 days it doesn’t math out)

This does, however, also reconfirm my own biases which is that Fable in particular is most excellent at hard, focused tasks with clear reward functions. I’ve been tweeting about this recently.”

What if AI enables rewrites and migrations that wouldn’t have been considered before? The idea of rewriting Bun in Rust without AI was impractical, admits Jarred:

“By hand, I think this would've taken three engineers with full context on the codebase about a year, during which time we wouldn't be able to improve Node.js compatibility, fix bugs, fix security issues or implement new features. We never would've done that. The realistic alternative was to do nothing and keep fixing the bugs at the top of this post forever.”

A rewrite or migration taking months or years is why so many of these projects never happen. Let’s set aside the cost for a minute and consider this question: if AI can shorten a one-year rewrite to less than two weeks, would you do it?

If the answer is “hell, yes,” a blueprint for how to do it now exists in the form of the Bun migration. There are some caveats not detailed in the post, though:

You need an engineer who is very motivated and knows the codebase very well
You need an extremely robust test suite, so when the test suite passes, you know it works
You need to be willing to invest a lot in tokens, not knowing how well it all will work

In fairness, #3 is the weakest point because we know LLMs are pretty good at “mundane” work like code migrations. With a good test suite (#2) and a motivated engineer to iron out things (#1), you’ll more likely than not succeed.

The remaining question is how much can be spent. It will likely not be $165K: costs can be reduced with a simpler project, or by being thoughtful about model usage. For example, do high-level planning with the most expensive model, and use cheaper ones for coding and review tasks.

Migrations with AI are surely speeding up, but only when projects are well-engineered like Bun’s has been.

Read the full issue of The Pulse this excerpt is from, or check out the latest The Pulse from today. Today’s issue covers:

Grok’s CLI uploaded all your local files to the cloud, then got caught.
New trend: concern about massive increase in code review load.
Are more devs at enterprises upset about enterprise pricing by AI labs – and does it matter?
Linux creator: AI “clearly useful.”


The Pulse: Interesting AI coding stats from Cursor

Gergely Orosz — Thu, 09 Jul 2026 17:20:34 GMT

Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover Big Tech and startups through the lens of senior engineers and engineering leaders. Today, we cover one out of four topics from a past The Pulse issue. Full subscribers received the article below five weeks ago. If you’ve been forwarded this email, you can subscribe here.

Cursor has just released a new report based on two years of its aggregated usage data, and there are some interesting findings:

Power users generate 10x as many lines of code vs the median

Source: Cursor

The median dev using Cursor (the p50) generates about 700 lines of code per week with it, while for the 90th percentile, it’s closer to 9,000 lines.

Top 1% of users create incredible volume of code

The p99 data is pretty stunning:

The top 1% of Cursor users (p99) vs the top 10% (p90)

The top 1% of users generate around 30-40K lines of code per week! That’s the equivalent of what ~45 “median” devs generate in the same period.

It’s worth asking how these top 1% of users are different. Are they writing a lot more greenfield code, do they have a bias for not using libraries, are they tokenmaxxing to get to the top of leaderboards? Do they generate 45x as many bugs, and importantly: are they adding a lot of business value with the software they ship?

Cursor consumes 10x more input tokens than it generates in output tokens

This is surprising: 90% of Cursor’s token usage is input tokens! This means that most of the tokens used are for reading the existing codebase and documentation. Outputting of code is a minority usage:

Input tokens (Cursor reading the codebase) is the bulk of token usage

In some ways, this usage makes sense: as devs, we have always spent far more time reading code than typing it out. The “10:1 read-to-write” ratio is a classic. Here’s Robert C. Martin (aka “Uncle Bob”) sharing this observation in 2008, in his book, Clean Code:

“Indeed, the ratio of time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code… [Therefore] making it easy to read, makes it easier to write.”

I find it amusing that we’re now seeing this 10:1 read / write ratio for token usage with AI agents!

Input tokens become the main AI token cost

Input tokens are priced at a fraction of output tokens: for example, Opus 4.7 charges 5x more for output tokens than for input tokens ($5 per 1 million input tokens and $25 per 1 million output tokens). Still, thanks to input tokens dominating token usage, Cursor is seeing input tokens account for closer to 70% of the cost of AI coding agents:

Input tokens dominate Cursor costs
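A quick back-of-the-envelope check shows why cheap input tokens can still dominate the bill. The Python sketch below uses only the figures quoted above: the Opus 4.7 prices ($5 and $25 per million tokens) and a 90/10 input-to-output split. It ignores cache pricing and the mix of models, so it is a rough sanity check under those assumptions, not Cursor’s calculation.

```python
# Prices quoted in the text for Opus 4.7, in USD per 1 million tokens
INPUT_PRICE_PER_M = 5.0
OUTPUT_PRICE_PER_M = 25.0

def input_cost_share(input_tokens: int, output_tokens: int) -> float:
    """Fraction of total token spend that goes to input tokens."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost / (input_cost + output_cost)

# 90% input tokens, 10% output tokens, as in the Cursor report
share = input_cost_share(90_000_000, 10_000_000)
print(f"{share:.0%}")  # 64%
```

Even with output tokens priced 5x higher, a 9:1 volume ratio puts roughly two-thirds of the spend on input, which is in the same ballpark as the figure Cursor reports.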

Without caching context, token cost would be 10x higher

Cursor does smart caching of context, to avoid re-sending old context as new input tokens. When taking cache usage into account, Cursor only spends 0.6% of tokens on output tokens. The remaining 99% is split between cache read (90%), cache write (2.5%), and input tokens (7%):

Output tokens are only 0.6% of token usage when considering cache reads & writes

I wonder if context reuse and caching will be a key AI efficiency component in the future? AI tokens are expensive to generate, so any form of reuse will make a lot of sense, especially in workflows like coding where a lot of existing context is reused.

Of course, Cursor sharing this detail also makes sense, as they remind everyone that building an efficient AI agent harness is far from trivial. Indeed, if you roll your own agent harness, you also need to put an efficient caching layer in place to match the efficiency of tools like Cursor.

Opus is the most expensive model & could hurt Anthropic

At the time of publishing, Opus 4.7 was still considered the most capable coding model. However, it’s also very expensive, and Cursor’s own data shows it’s close to 10x more expensive than its own Composer 2.5 model:

Opus 4.7 is twice as expensive as GPT-5.5 & nearly 10x more than Composer 2.5

It’s significant that Cursor compares the cost of a single agent request; it’s not a direct token-to-token comparison. And it’s worth noting this benchmark is being shared by Cursor, which has an incentive for its Composer model to appear the lowest-cost.

Still, assuming you can get similar-enough results with a 10x cheaper model, it is a saving that’s hard to ignore, especially for mid-sized and above companies. I would not be surprised if more tech companies find ways for devs to use less capable – but cheaper – models for less critical work.

More expensive models result in higher acceptance rates

An interesting metric Cursor shares is cost-per-line-added, per model:

This metric is a more realistic cost because it correlates to output: “smart” models that are expensive, but which produce code that is frequently accepted, are penalized by the cost-per-agent-request metric, but they’re not here.

Indeed, Opus 4.7 has the same cost-per-line-accepted as GPT-5.5, despite costing twice as much per agent request. In this comparison, Cursor’s Composer model is “only” 5x as efficient.

Missing from both lists are Google’s Gemini models, a strange omission by Cursor. I reached out to Cursor and they told me that Gemini was left out simply because they see very little usage of this model on their platform, similar to the sparsely used Grok model.

Almost half of AI changes accepted without manual review by devs

I’ve left the most interesting part of this report to last: in just a month, the share of devs using Cursor who let AI agents create commits without a manual step has gone from 10% to around 40%. These devs no longer personally check the code:

The jump correlates with Opus 4.7 and GPT-5.5 being released, and around the time when many devs seem to have concluded that writing code by hand is dying after experiencing this generation of models’ capability at generating code.

Check out the full report from Cursor for more details. Thanks to the team for releasing this data!

Read the full issue of The Pulse this excerpt is from, or check out the latest The Pulse from today. Today's issue covers:

Bun’s Rust rewrite with Fable: what can we learn?
Anthropic’s Fable, OpenAI’s GPT-5.6 Sol, Cursor’s Grok 4.5, Meta’s Muse
North Korean hackers keep trying to infiltrate full-remote companies
Industry Pulse: Meta’s key logging exposed sensitive data, massive cuts at Xbox, Meta could not buy enough AI capacity from Google, Qualcomm acquires Modular, and memory price hikes hit Apple products.

The Pulse: a new trend, smart model routing

Gergely Orosz — Thu, 02 Jul 2026 18:46:24 GMT

Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover Big Tech and startups through the lens of senior engineers and engineering leaders. Today, we cover one out of four topics from a previous The Pulse issue. Full subscribers received the article below three weeks ago. If you’ve been forwarded this email, you can subscribe here.

Two weeks ago, I covered a trend of companies trying to reduce spending on AI within their engineering departments. While talking to my sources about this, one head of engineering at a larger company told me that they wished there was an ‘intelligent’ router that picks the right model for the right task.

The reason for such a wish is clear; prices for tokens vary greatly per model, and there can easily be a 10-20x difference between a cheap, average model, and a state-of-the-art one.

I did some digging into whether any solutions like this currently exist because the benefits look obvious, and what I found is listed below. Usual disclaimer: I have no affiliation with these vendors, and have not been paid to mention any of them!

Vendors:

Factory Router: automatically selecting the right model per session, claiming 20-25% cost savings. More details.
Not Diamond: auto-selection of coding models, claiming around 30% cost savings. Used by OpenRouter, under the hood. More details.
Vercel AI gateway. Hundreds of AI models, smart routing and billing in one place. More details.
Prism by Augment Code. Choosing the “best” model automatically for coding tasks. More details.
Model Router by Morph. An API to suggest model selection for a prompt, based on a list of models. More details
Weave router: a token router that works inside Codex, Claude Code and Cursor. “Hard” requests stay on frontier models, while “easy” ones go to open source ones. More details

AI gateways with routing built in. API gateways are popular ways to use LLMs in workplaces.

OpenRouter: comes with “auto router” functionality where, after analyzing the prompt, the best model is selected. Uses Not Diamond under the hood. More details
Kilo Gateway: routes requests to the model considered the best value for its price. Supports using your own model keys, and using the service only as a router. More details
Requestly.ai: automatically routes requests to the right model based on cost, latency, and availability, with lots of configuration options. More details
LiteLLM: define routing rules that automatically select the best model, based on input content with the “auto routing” functionality. The setup is more manual, but you get more control than with many other AI gateways. More details
Envoy AI Gateway: an open source gateway that offers some routing configuration, though it feels that the routing engine focuses more on availability, not cost optimization and smart model routing. More details
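To make the idea concrete, here is a deliberately naive Python sketch of routing: “hard” requests go to a frontier model, and “easy” ones go to a cheaper one. It is not how any of the vendors above work. The model names, prices, keyword list and length threshold are all hypothetical placeholders, and real routers typically use trained classifiers rather than keyword matching.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    usd_per_m_output: float  # illustrative price, not a real quote

# Hypothetical model tiers; names and prices are placeholders.
CHEAP = Model("open-model-small", 1.0)
FRONTIER = Model("frontier-model", 25.0)

# Assumed hints that a request needs the stronger model.
HARD_HINTS = ("refactor", "migrate", "race condition", "architecture", "debug")

def route(prompt: str, max_easy_chars: int = 400) -> Model:
    """Send long or hard-looking prompts to the frontier model, the rest to the cheap one."""
    text = prompt.lower()
    looks_hard = len(prompt) > max_easy_chars or any(hint in text for hint in HARD_HINTS)
    return FRONTIER if looks_hard else CHEAP

print(route("Rename this variable to user_id").name)            # open-model-small
print(route("Debug the race condition in the scheduler").name)  # frontier-model
```

The savings come from how often a request takes the cheap branch, which is why the quality of the “is this hard?” decision matters much more than the plumbing around it.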

Cursor and GitHub Copilot also have an “Auto” mode that selects the model automatically. For Cursor, it’s a fixed-price model where any savings made are for Cursor: they are not passed on to customers, but the model is cheaper than most others. For Copilot, the Auto mode results in intelligent model selection – but I’ve not heard much positive feedback about this mode from the few devs I asked about it. For Pro plans, Copilot supports pretty old models: GPT-5.5 and Opus 4.8 are not available. These are, however, available on the Pro+ and above plans.

Demand seems to be extremely high for intelligent routing. I asked Matan Grinberg, cofounder and CEO at Factory AI, who told me:

“Demand has been off the charts, especially from the enterprise [from large companies.] I’ve met with practically every bank CEO since we launched this offering, because they want a layer to control spend, while still generating high-quality code.

Pretty much everyone in tech is starting to see that open models are often sufficient. We’re seeing open model usage strictly increasing the last six months. My guess is that hosted open models are sufficient in performance for around 60% of coding-related work, in terms of token spend.”

It feels to me that “intelligent routing” will become table stakes, and so we can expect pretty much all AI vendors to build some version of it, and many new vendors to offer this kind of functionality.

If you know of any additional vendors not listed, you can add a comment on the original The Pulse article, and see more options there.

Read the full issue of The Pulse that this excerpt was from, or check out all The Pulse issues.

Pollen tried to remove my article about CEO Callum Negus-Fancey and CTO Bradley Wright, and Google is assisting with it

Gergely Orosz — Sun, 28 Jun 2026 00:40:25 GMT

In 2022, I wrote about the damning fall of events tech company Pollen. The short of it:

Pollen seemed to have pulled off the improbable feat of building a business in the notoriously low margin industry of events, surviving Covid-19, and building a solid software engineering organization. In April this year, the company announced it had raised another $150M in fresh funding.

But just three weeks later, Pollen laid off about 200 people, a third of staff. Leadership assured employees all was well. However, from that point on, things got worse. Leadership later pulled the plug on Slack, employees were not paid wages, pension contributions went missing, and vendors were not paid. Some vendors took matters into their own hands; on 9 August 2022, JIRA was suspended when Atlassian tired of the company’s failure to pay.

On 10 August 2022, Pollen went bankrupt, collapsing into administration.

The article reflected badly on Pollen's founder, Callum Negus-Fancey. He was ultimately responsible for lying to staff, not paying salaries, the missing pension contributions, and the unpaid health insurance for US employees. The story was so bad that the BBC created a documentary titled Crashed: $800M Festival Fail.

And then there was the $3.2M double charge for customers, manually initiated by CTO Bradley Wright, detailed extensively in the documentary Crashed: $800M Festival Fail. That double charge would have been trivial to reverse, but the reversal never happened, customers never got their money back, and the postmortem of the incident was never released to staff.

Four years later, Pollen and Callum Negus-Fancey are attempting to erase this shameful story from the public record. The article is my original writing, and thus I am the copyright holder of it. So imagine my surprise when I was notified that Google removed the article from its search results thanks to a copyright infringement claim it received:

It seems that anyone can file a bogus copyright claim to get an article they don't like removed from Google's search index. This happened in this case. I have no information on who filed the copyright claim. Even less so on who claims to be the copyright owner? Because I am the only possible copyright owner!

And Google has gone ahead and removed my article about Pollen's shameful collapse from its search results.

I have the option to appeal, which I have done.

Google's copyright removal system is clearly being abused, to a comical degree. Someone doesn't like that I went into extreme detail about the events at Pollen - all of which are facts. And, for some reason, bogus copyright requests can be weaponized to remove information like this from Google's search index.

I managed to find the bogus DMCA complaint submission, after Google removed my site from search results. It is absolute BS: it claims that my original article is a copy of a New York Post article. Which is absolute nonsense!

This "Ellie Piee" claimed that this 1998 article titled Band Leader Hits Winning Chord was copied by my article Inside Pollen’s Collapse: “$200M Raised” but Staff Unpaid - Exclusive. The two do not even share a single sentence!

The fake DMCA is made by a fake profile from a country with zero inhabitants. The removal requests by this "Ellie Piee" are made from the country called Bouvet Island, an uninhabited Norwegian dependent territory in the South Atlantic/Southern Ocean near Antarctica. It has zero inhabitants, and is referred to as the "world's most remote island."

Bouvet Island. No inhabitants, and yet Google accepted a fake DMCA takedown request from a fake person claiming to reside here. What a joke

Why does Google allow fraudulent DMCA notices to be filed with no penalty? My own speculation is that either Pollen, or its former CEO Callum Negus-Fancey, or its cofounder and COO Liam Negus-Fancey, or someone else related to the company, hired a reputation firm to remove Pollen articles from Google. This firm then files the most bogus requests under fake names supposedly residing in uninhabited regions of the world, and Google complies.

I never thought I would have to revisit the shameful history of Pollen, but someone at the company felt the need to prompt me to do so.

Lawsuits are still ongoing against Pollen, by the way. Now that someone from Pollen tried to erase the record of this story, I got a bit of renewed interest in what has happened since. In California, the lawsuit Tayler Ulmer vs Pollen is still in progress, summarized as:

Tayler Ulmer and five other named former employees, on behalf of themselves and “all similarly situated employees” claim to have been laid off without paid wages and benefits, plus claiming possible fraud
The filing says that Pollen executives Callum Negus‑Fancey, Liam Negus‑Fancey, and James Ellis are personally liable in this lawsuit
The lawsuit wants to reclaim unpaid wages, unpaid severance, restoration of lost 401(k) contributions, and a ruling that all the named entities and individuals are jointly liable, including successor entities, so employees can collect regardless of how Pollen shuffled assets and dissolved subsidiaries

I am wishing best of luck to the claimants - former Pollen employees - and we will see how the judge rules in this lawsuit. The more Pollen wants to silence me writing about this, the more I'll likely pay attention.

Pollen executives should have read what the Streisand effect means!

Reliability fail: No automated zone failover for Coinbase’s global trading service

Gergely Orosz — Tue, 23 Jun 2026 16:30:59 GMT

Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover Big Tech and startups through the lens of senior engineers and engineering leaders. Today, we cover one out of four topics from this past The Pulse issue. Full subscribers received the article below two weeks ago. If you’ve been forwarded this email, you can subscribe here.

On the evening of Thursday, 7 May, trading at Coinbase went offline and stayed that way for nearly 10 hours (!!). Customers could not buy, sell, deposit, receive, or withdraw. Basically, the core services of Coinbase were unavailable.

The outage coincided with a regional AWS outage. But no other company suffered a global outage; the most I observed was a few infra companies like Datadog noting that some regions had issues, and were failing over to a healthy region.

It’s weird that Coinbase – a $40B company! – told customers to monitor AWS’s status pages for recovery. This made it pretty clear that the company fully depends on a single AWS zone. Unusually, Coinbase deleted this information from its status page, but I got a screenshot first:

Out in the open: Coinbase shifts blame for outage to a cloud provider

Coinbase later confirmed that it does indeed have a single-availability zone dependency. From its postmortem:

“Our matching engine was pinned to a single building. The Coinbase Exchange matching engine runs as a Raft-based replicated cluster inside an AWS Cluster Placement Group. We make this choice deliberately. A matching engine that meets the latency and throughput demands of a serious market cannot tolerate inter-zone network hops between voting cluster members. The physics of distributed consensus and the economics of running a fair, liquid order book point to the same answer, which is co-location.”

A quick recap on the difference between an availability zone (AZ) and region:

Availability zone: One or more data centers (in the case of AWS, it is usually several data centers) located close enough to have low latency between them. Data centers in different AZs must be independently resilient. In the same AZ, there is no such requirement.
Region: Within AWS, this consists of at least three isolated, physically separated AZs, usually 10-30 miles apart. It’s unlikely they’ll go down simultaneously, even in extreme circumstances.

From deepdive, Three Cloud Providers, Three Outages, Three Different Responses

Coinbase is saying that running from more than one availability zone (what it calls a “building”) would introduce too much latency to its product. This makes sense for low-latency activities like trading. But what about preparing for a failover as and when the AZ goes down? After all, an AZ is not guaranteed to have high uptime!

Turns out, Coinbase did not prepare for an AZ failover. Also from its postmortem (emphasis mine):

“We lacked an automated ability to fail over to another availability zone. When AWS terminated EC2 instances inside our placement group at 9:29 PM ET, three of five matching-engine nodes went down and we lost quorum. There was no automated cross-zone failover. Recovery required an emergency code change shipped during the incident to remove a startup assumption that all five cluster nodes were resolvable, the creation of a new node group outside the impaired placement group, and a careful sequence to restore a 3-of-5 quorum. This allowed us to reopen markets: first cancel-only, then auction mode, and finally full trading.”
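To make the quorum arithmetic in the postmortem concrete, here is a minimal Python sketch. It assumes standard majority-based voting, as Raft uses, and the function names are hypothetical: they are not taken from Coinbase's code. The two startup checks only illustrate the kind of assumption the postmortem says had to be removed during the incident.

```python
def quorum_size(voters: int) -> int:
    """Smallest majority of a voting cluster: floor(n / 2) + 1."""
    return voters // 2 + 1


def has_quorum(healthy: int, voters: int) -> bool:
    """True if enough voting members are up to form a majority."""
    return healthy >= quorum_size(voters)


def can_start_strict(resolvable: int, voters: int) -> bool:
    """Hypothetical startup check that insists every member is resolvable."""
    return resolvable == voters


def can_start_with_quorum(resolvable: int, voters: int) -> bool:
    """Hypothetical relaxed check: a majority of members is enough to start."""
    return has_quorum(resolvable, voters)


VOTERS = 5
print(quorum_size(VOTERS))               # 3
print(has_quorum(VOTERS - 3, VOTERS))    # False: only 2 of 5 nodes remain
print(can_start_strict(3, VOTERS))       # False
print(can_start_with_quorum(3, VOTERS))  # True
```

With five voters, losing three nodes leaves two, which is below the majority of three, so the cluster cannot make progress until at least one more voter is restored.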

Having no automated failovers is incredibly amateurish for an operation of Coinbase’s scale. Coinbase moves about 5.2 trillion dollars per year, and is valued at around $40B. The outage interrupted around $7 billion-worth of financial activity, based on my napkin math.

Back in 2016, Uber was valued at roughly as much as Coinbase, and handled circa $40-50B yearly. It had two data centers on the east and west coasts, and operated more as if it ran out of two zones. I worked at Uber at the time and there were regular failover drills to another data center (another region), in preparation should a region go down. Uber’s business, in terms of the financial figures, was a fraction of Coinbase’s!

My impression of Coinbase’s engineering culture has sunk after this incident, and it’s almost comical that CEO Brian Armstrong is boasting that non-technical teams now ship production code, thanks to AI. This feels like the wrong thing to focus on when Coinbase’s infrastructure basics seem to be in far worse shape in 2026 than Uber’s were a decade ago in 2016!

It seems Coinbase did not learn lessons after getting burned by previous regional AWS outages. In October 2025, the company suffered a three-hour-long global trading outage due to issues with AWS’s DynamoDB service. Following that outage, Coinbase engineering said (emphasis mine):

“To be better prepared in the future, we are exploring all options, including reviewing our regional deployment strategy to implement immediate and long-term fixes to reduce the impact of these types of outages.”

That process of reviewing the regional deployment strategy evidently missed or ignored the risk of a single-zone dependency of the heart of the business, with no cross-zone failover.

Read the full The Pulse issue.

The Pulse: a trend of trying to cut back on AI spend within eng departments?

Gergely Orosz — Thu, 11 Jun 2026 16:31:42 GMT

Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover Big Tech and startups through the lens of senior engineers and engineering leaders. Today, we cover one out of four topics from The Pulse issue from two weeks ago. Full subscribers received the article below fourteen days ago. If you’ve been forwarded this email, you can subscribe here.

This edition of The Pulse is interesting because, a week after the original was sent out, OpenAI CEO Sam Altman also said that AI budgeting is a huge issue for some companies – echoing findings from this analysis.

In mid-May, Uber president, Andrew McDonald, was on the Rapid Response podcast for a conversation about the ridesharing giant with host Bob Safian, who raised the lack of hoped-for efficiency leverage from AI, citing the language learning app, Duolingo.

“When you hear companies talking about 25% of code commits over the last quarter were AI-driven, or how their token usage went from X to Y percentage of employees: all these numbers are amazing. I think it’s a massive transformation of society”, McDonald said.

“But, then you go and you talk to your senior engineering leaders, and you’re asking: “how many projects that were “on the cutting room floor” got moved above the line [of being done] because of the productivity gains? Because 25% of our code commits were via Claude Code last quarter.”

That link [of improved productivity thanks to AI] is not there yet. I mean, maybe implicitly there’s more that is getting shipped, but it’s very hard to draw a line between one of those stats and more useful consumer features.

Over the coming quarters and years, maybe that will become clearer. But today it’s hard, even if some of the underlying metrics are trending in a really astronomical direction.

Our CTO, Praveen, went viral because he said in an interview that we had blown through our AI budget for 2026 and it was the middle of March. We’re going to have to start talking about token consumption and the associated cost versus headcount, and making tradeoffs on that as an engineering organization.

If you’re not able to draw a direct line to [how many] useful features and functionality you’re shipping to your users, that tradeoff [on AI spend] becomes harder to justify because AI is not free.

If you’re just a user [of AI tools] sitting there and coming up with interesting use cases, and you don’t pay the bill, it can feel [like AI is free]. But somebody’s paying the bill”.

My hunch is that pretty much every company is asking, or will soon ask, questions about the massive growth in AI spend, starting with AI coding tools. I talked with a few folks at larger and smaller companies about it:

OpenCode: customer demand for optimizing spend is spiking. Yesterday, on the podcast episode with OpenCode creator, Dax Raad, he said demand for OpenCode’s hosted inference service (OpenCode Zen) surpassed all expectations because larger companies want cheaper, but still capable, AI models. He revealed that over the past month, every single inbound enterprise request was about optimizing spend. So, there’s some widespread concern about AI bills.
Companies with cutting-edge AI bite the bullet with model routing. I talked with a CTO and a Head of Engineering at two cutting-edge tech companies. They also do not have an obvious return on investment (ROI) as yet. Still, they feel they have no choice but to pay the “intelligence premium” for state-of-the-art models or increase the number of bugs shipped. To reduce costs, both are considering “smart” model routing based on use case and prompt. These places pay top-of-market for the best engineers, so similarly, there are expectations of access to the best tools and models.
DoorDash: More knowledge-sharing sessions and responsibility for devs. The leading food delivery company gives responsibility for spending to devs: everyone has a high monthly token usage limit. To exceed it, you need to justify why, and also share the plan for being more efficient next month. Many regular in-house knowledge-sharing sessions are about efficient AI use.
Traditional company: monthly limits and dumb-model downgrades. One month ago, one of the largest retirement-savings companies in the US updated its AI usage policy for all devs, a current engineer told me, imposing a monthly GitHub Copilot token limit. Once the limit is used up, devs must use the less capable “0x” models on Copilot, which are not charged extra: GPT‑5 mini, GPT‑4.1, and Grok Code Fast 1.
Startups: signing up for multiple Claude / Codex Max subscriptions. I talked with several smaller startups that are generating meaningful revenue, and don’t want to pay expensive API prices. So, they’ve made it a practice for devs to get subsidized Claude Code Max or Codex Max subscriptions.
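The routing-and-downgrade pattern that several of these companies describe could be sketched roughly as follows. This is an illustration only: the tier names, prices, and task labels are made up, and none of the companies mentioned has shared its actual routing logic.

```python
from dataclasses import dataclass

# Hypothetical model tiers with made-up prices in USD per million tokens.
PRICE_PER_MTOK = {"frontier": 15.0, "budget": 1.0}

# Assumption: these task labels mark work that justifies the strongest model.
FRONTIER_TASKS = {"architecture", "debugging", "security-review"}


@dataclass
class DevBudget:
    monthly_limit_usd: float
    spent_usd: float = 0.0

    def remaining(self) -> float:
        return max(self.monthly_limit_usd - self.spent_usd, 0.0)


def pick_model(task: str, est_tokens: int, budget: DevBudget) -> str:
    """Route by use case first, then downgrade if the budget cannot cover it."""
    if task not in FRONTIER_TASKS:
        return "budget"
    cost = est_tokens / 1_000_000 * PRICE_PER_MTOK["frontier"]
    return "frontier" if cost <= budget.remaining() else "budget"


def record_usage(model: str, tokens: int, budget: DevBudget) -> float:
    """Charge the tokens used against the developer's monthly budget."""
    cost = tokens / 1_000_000 * PRICE_PER_MTOK[model]
    budget.spent_usd += cost
    return cost


budget = DevBudget(monthly_limit_usd=10.0)
first = pick_model("debugging", 400_000, budget)       # fits within the budget
record_usage(first, 400_000, budget)
second = pick_model("debugging", 400_000, budget)      # no longer fits: downgraded
third = pick_model("rename-variable", 50_000, budget)  # simple task: cheap tier
print(first, second, third)  # frontier budget budget
```

A real router would also need a way to classify prompts and to estimate token counts up front, which is where most of the difficulty is likely to be.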

There’s a new bottom-up focus on AI efficiency. Most tech companies do a variety of internal knowledge-sharing things like regular team demos, lunch-and-learn sessions, and engineering all-hands. I’ve been noticing more AI efficiency-focused sessions in the past couple of months, coming from engineers: no top-down mandate!

Engineering all-hands, CTOs, and even CEOs have started to raise concerns about increasing AI token costs, and now more engineers are experimenting with cheaper models for simpler tasks, model routing, more efficient token usage, etc.

I’d expect that during the next performance review and promotion cycles, engineers who helped save on token costs might be rewarded, like two years ago, when engineering teams were rewarded for saving on third-party vendor bills.

For an engineer, the best way to show impact in your work is to translate it to money: revenue generated, or costs saved. With AI spending as high as (or higher than) on observability, it should be straightforward to show massive savings with smart optimizations. There’s a touch of irony in how any savings – for which there might be promotions and pay rises – will come from the places that actually did the rocketing spending.

Read the full issue in the previous The Pulse. Or check out this week's The Pulse: Did Anthropic’s new model just boost rival Codex’s market share?

The Pulse: Antigravity 2.0 takes ‘IDE’ out of its new IDE

Gergely Orosz — Thu, 11 Jun 2026 16:22:16 GMT

Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover Big Tech and startups through the lens of senior engineers and engineering leaders. Today, we cover one out of four topics from The Pulse issue from 21 May 2026. Full subscribers received the article below three weeks ago. If you’ve been forwarded this email, you can subscribe here.

Yesterday, Google launched a full redesign of its flagship AI IDE, Antigravity 2.0. The “original” Antigravity came out in November 2025, as pretty much a clone of Windsurf, the IDE whose team Google acquired for $2.4B last July.

Google has turned Antigravity into two distinct applications, “Antigravity IDE” (its former incarnation) and “Antigravity 2.0”. This new version itself resembles a clone of Codex’s desktop app. When you install Antigravity 2.0, there are two different applications. From Google’s launch post:

“If you already have installed the Antigravity IDE, when that application next updates, it will automatically update to Antigravity 2.0. At this point, you will be asked if you would like to still keep the Antigravity IDE, which is recommended for developers.”

My sense is that the team at Google may have struggled to decide whether to keep supporting “original” Antigravity while investing in the Codex-like experience, and so kept both. Whatever the reasoning, it has created confusing naming, and it feels to me like the team’s true focus is 2.0.

Big change: Antigravity 2.0 throws out the IDE and adds a conversational interface

The upgrade feels rushed, sloppy, and poorly thought out. I had Antigravity on my machine, and installed Antigravity 2.0 separately. I wanted to use them side-by-side, but when I tapped “Restart to update” on Antigravity 1.0, it upgraded itself to Antigravity 2.0 (the non-IDE version). Suddenly, I had two applications with different names, but neither is an IDE:

Testing times: Two apps but no IDE version on my machine, due to lack of testing

Google has introduced an “Agent Manager” concept that feels unintuitive, and perhaps suitably, its creator struggles to explain it (emphasis mine):

“When we launched the Google Antigravity IDE in November 2025, there was no agent-first GUI surface in the market. We wanted to prove that such a surface worked, at least for software development. So, while the core of the Antigravity IDE was a familiar agent-powered IDE, we introduced the Agent Manager, a second surface that stripped away much of the “IDE” UI. This allowed users to focus on the agent conversations themselves, the artifacts the agents produced, and multi-agent management.

Even without this separation, we have been pleasantly surprised how many people have adopted the Agent Manager in the Antigravity IDE for such non-development tasks, but it is not particularly intuitive”.

The “Agent Manager” is basically a way to launch several agents, and the most intuitive interfaces for doing so are inside Claude and Codex desktop apps and Claude CoWork. Antigravity 2.0 copies them by starting new agent tasks on the right-hand side of the UI, and keeping track of them.

Google looks indecisive about what to do with the IDE part of Antigravity. The release announcement suggests they’ll keep on confusing users (emphasis mine):

“Although Antigravity 2.0 is the future, we won’t disrupt your workflows right away. For now, both the Antigravity IDE application itself and the Agent Manager in the Antigravity IDE will remain available. In an upcoming release, we will remove the Agent Manager from the Antigravity IDE, turning the IDE into a purely agent-powered IDE.”

Basically, the Antigravity IDE (not “the future” in Google’s vision) will become more limited over time. It’s unclear what a “purely” agent-powered IDE will be once agentic functionality is removed, especially as Antigravity IDE is not the future, as per Google.

Not only that, but the announcement also encourages devs to use Antigravity 2.0 with other IDEs! From the launch post (emphasis mine):

“We recommend dual-wielding Antigravity 2.0 with your IDE of choice, whether it is the Antigravity IDE or otherwise. Googlers have already been dual wielding Antigravity 2.0 with a whole host of IDEs! We will have compatible extensions and plugins into other popular IDEs shortly”.

To me, this suggests Google will retire Antigravity IDE and recommend VS Code, JetBrains, Cursor, or Zed, with Antigravity. Then again, why would Cursor or Zed support Antigravity? The messaging is extremely confusing: Google’s still the king of opacity.

Feedback on Antigravity 2.0 has been negative due to bugs, poor UX and model support, more bugs, and eating up Gemini token quotas rapidly. Antigravity does not support state-of-the-art Anthropic or OpenAI models (no Opus 4.7 or GPT 5.5). Not supporting OpenAI’s models like this is sensible as they’re competitors, but Google is an investor in Anthropic, so not supporting Opus 4.7 (while supporting the legacy 4.6 model) is a bit odd.

Models which Antigravity 2.0 supports

Gemini 3.5 Flash is Google’s cutting-edge model, but it gets lots of complaints from devs for editing files without asking, and seems like an inefficient model. Another common complaint is that Antigravity uses up the $100/month Ultra subscription daily quota in minutes. Basically, it seems like a poor-quality product that wasn’t polished due to lack of time or inclination.

In context, it’s embarrassing for there to be a “Codex” folder in the launch video if it suggests that Google’s own Antigravity devs are using Codex for day-to-day work. It also suggests that the launch video was not reviewed properly, otherwise this obvious detail would presumably have been caught and fixed:

Codex folder in Documents suggests Antigravity devs are users of it. Source: Antigravity 2.0 launch video

To upset devs even more, Google is replacing its open source Gemini CLI with the closed source Antigravity CLI. There are a few issues with this move:

Antigravity CLI does not support Google’s own Agent Client Protocol (ACP), used for programmatic control, primarily for IDE and other developer tool integrations. This is a protocol which IDEs like JetBrains and Zed have adopted, so Antigravity CLI becomes incompatible with them
Google offers no migration path from Gemini CLI settings/skills/MCPs into Antigravity. Figure it out on your own!
Devs using Gemini models are forced to move as Google has removed support for the Gemini 3.5 Flash model from Gemini CLI. It can only be used from Antigravity CLI. Clearly, this was done to force a move. Why not offer a migration path?

My sense is the Antigravity team is moving fast, breaking things, and shipping a broken product. It feels like the Antigravity 2.0 and Antigravity CLI products have been rushed to meet the annual Google conference (Google I/O) deadline, this week. Google deprecates existing products to attempt to get users to switch to the new version. But the new one is broken.

What’s changed? Manu Cornet penned this cartoon in 2011

And this is a big reason why I don’t believe Google will become a serious player in the dev tools space – not even with AI dev tools. Every six to twelve months they remind devs who onboarded to their dev tools that it was a mistake to do so. I would expect the majority of Gemini CLI and Antigravity users to go and try products from other vendors – be that Cursor, Anthropic, OpenAI, GitHub, or others – and for few to stick around after their workflows are broken.

The Pulse: Forward deployed engineering heats up again

Gergely Orosz — Sun, 24 May 2026 20:35:11 GMT

Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover Big Tech and startups through the lens of senior engineers and engineering leaders. Today, we cover one out of five topics from last week’s The Pulse issue. Full subscribers received the article below seven days ago. If you’ve been forwarded this email, you can subscribe here.

Last August, we covered a sudden trend of high demand for forward deployed engineers (FDEs), and now there are signs demand is increasing further.

Google: FDE recruitment spike

Google is doubling down on FDEs and making the interview process much simpler. Google Cloud CEO, Thomas Kurian, has announced a new, AI-focused organization within the Go-To-Market team, and is hiring a bunch of FDEs for it.

I’m hearing the hiring process has been shortened from 4-6 interviews held over the course of weeks, to as few as two interviews in just two days. It looks like Google is unusually eager (desperate?) to fill this job.

OpenAI outsources FDE hiring spree

On Monday (11 May), OpenAI announced The OpenAI Deployment Company, a standalone entity funded by $4 billion of private equity from TPG, Advent, and others at a $14B valuation. It appears OpenAI is not an investor and holds a partner role.

The announcement mentions FDEs and says their job will be to “work closely with business leaders, operators, and frontline teams to identify where AI can make the biggest impact, redesign organizational infrastructure and critical workflows around it, and turn those gains into durable systems”.

Based on that, the FDEs will play an important role in OpenAI’s enterprise sales activity by ensuring the company’s AI systems work and deliver value for customers. Outsourcing this to the new Deployment Company should also free up OpenAI to focus on developing better AI models, while the partner company and its FDEs take care of the customer-facing side of things.

In a related development, OpenAI has acquired Tomoro, a UK-headquartered AI company founded in 2023, which employs 150 FDEs across the UK, Asia, and Australia. Tomoro is the first acquisition of the OpenAI Deployment Company.

Anthropic plans outsourced FDE recruitment

Anthropic is doing the same by creating its own distinct FDE consulting company. Last Monday (4 May), Anthropic issued an unusually hand-wavy announcement about the new business without a name and with few investment details mentioned.

Investors are Anthropic, Blackstone, Hellman & Friedman, and Goldman Sachs, and the new business will work with “mid-sized companies across sectors to bring Claude into their most important operations.”

Anthropic’s approach seems to be the same as OpenAI’s: create a standalone company with external funding, in which FDEs integrate Claude into enterprises that will then presumably start purchasing more Claude tokens than ever.

FDE or a consultant?

These FDE roles seem very similar to those of an external consultant or a systems integrator. A year ago, I talked with FDEs at OpenAI and Ramp whose jobs seemed a genuine mix of three things: platform engineering (with an FDE contributing back to the platform), software engineering (building new solutions), and solutions engineering (integrating into customers’ services).

The FDE role as I visualized it in mid-2025

But today, it looks like the role is about to become indistinguishable from a solutions architect or consultant, especially given that these new FDE jobs are in quasi-external companies and separate organizations from where AI products are built.

The reality of the FDE role: an AI-focused solutions architect or consultant

Job adverts are increasingly clear about the role, but it still helps to read between the lines. Here’s one for an FDE at Google Cloud. At first glance, it’s impressive (emphasis mine):

“You are an embedded builder who bridges the gap between frontier AI products and production-grade reality within customers. Unlike traditional advisory roles, you function as an “innovator-builder,” moving beyond high-level architecture to code, debug, and jointly ship bespoke agentic solutions directly within the customer’s environment. Your role is designed for high-agency engineers with a founder’s mindset. You will address blockers to production, including solving the integration complexities, data readiness issues, and state-management challenges that prevent AI from reaching enterprise-grade maturity. By embedding with strategic accounts, you serve a dual purpose: providing “white glove” deployment of complex AI systems and acting as a critical feedback loop, transforming real-world field insights into Google Cloud’s future product roadmap.”

Translated into plain English:

You are a contractor who codes at a customer’s office. The actual job is around 25% coding-related, 50% integration/plumbing, and 25% meetings and customer hand-holding. Anything else will be assorted admin and internal process-related stuff.

Here’s what I reckon some of the terms in Google’s job advert will add up to on the job:

“Founder’s mindset”. No one will provide a spec, and scope creep is your problem to deal with. If your project doesn’t ship, that’s also your problem
“High-agency”. There are no resources besides your own
“White glove”. Do not say “no” to anything the customer suggests, even when they should probably listen to your feedback about whatever it is
“Critical feedback loop transforming real-world field insights into Google Cloud’s future product roadmap”. You will file tickets and a few PMs at Google may read some of them

But in all fairness, this FDE job looks like a great fit for some folks:

Those at the early-career stage who want Google on their resumes, but who might struggle to land a software engineering job with the tech giant
Those who enjoy shipping end-to-end, can work well with ambiguity (“founder’s mindset” is spot on!) and will own outcomes

On the other hand, I suspect this FDE role will not be a good fit for those who:

Like to build well-engineered systems and value the time to do it well
Like building greenfield systems
Prefer longer-term projects and working with other software engineers

In the cases of OpenAI and Anthropic, the outsourcing of FDEs is even clearer. Google at least hires FDEs to the company, and they will be issued some stock as part of their compensation package. But at OpenAI and Anthropic, new FDEs will be hired to a standalone company, and if they get stock, it will likely not be OpenAI or Anthropic stock. So, if OpenAI or Anthropic benefit greatly from FDEs’ work, then the FDEs won’t see the upside!

Putting it more simply: FDEs hired in these external companies will not be seen as “core”. If they were, then the companies would hire more FDEs, as in the past.

Opportunity for new grads?

As mentioned above, the new FDE roles could be a great opportunity for early-career software engineers entering the industry, according to Box CEO, Aaron Levie:

“If I were a college career counselor or in career services, I’d quickly be figuring out how to get students to understand these forward deployed engineer jobs exist and how to get them.

The requirements are a mix of deep technical skills, often CS majors or minors. You must be great at understanding problem solving, how to have systems thinking, and have a strong business acumen. The kicker, of course, is to make sure you’re very deep in AI agents; you need to have fluency in coding agents, MCP, CLIs, Skills, and so on.

Hundreds (thousands?) of technology companies will be hiring for these roles, same with any consulting and IT services company, and the vast majority of mid-size and large enterprises will be hiring for this talent internally as well.”

Historically, tech consultancies hired many new grads for consultant roles, which are not so attractive to experienced engineers, but are great, real-world, paid learning opportunities for more junior ones. With product companies hiring fewer new grads, FDE roles will increasingly be the ones new grads have a chance of getting.

All things considered, I expect demand for FDE roles to increase, industry-wide. They speed up AI rollouts, which several parties have an interest in doing:

AI labs: the faster that AI solutions roll out, the more revenue they make!
AI vendors: any company selling AI products will, similarly, want FDEs to help integrate the software with customers, so they can sell more
Non-AI companies: these will want to hire FDEs for an “AI transformation” and to integrate AI into workflows and products
Non-AI vendors: even SaaS companies that don’t sell AI products will be able to close larger clients if they hire FDEs who can roll out their software faster, and for more use cases, inside enterprises they work with.

FDE was the hottest tech role in 2025 and this trend seems set to continue this year. Demand for this role is high and rising, but it’s likely to stay unattractive to experienced devs for whom being a consultant may feel like a step down – especially after you’ve learned to love building products!

Read the full issue of last week’s The Pulse, or check out this week’s The Pulse. This week’s issue covers:

Antigravity 2.0 takes the ‘IDE’ out of its new IDE. Feedback about the redesigned IDE is overwhelmingly negative due to bugs, poor UX & model support, and eating through Gemini token quotas. Also: a clue that Antigravity’s own devs use other tools for their work?
Why is Google’s product ecosystem chaotic? The range of products on display at the Google I/O conference made a messy, incoherent impression. But Google’s “let a thousand flowers bloom” approach might be giving the search giant an underrated advantage in the AI race that no other Big Tech giant has.
Meta cuts 8,000 jobs. Morale is very low inside the social media giant as thousands lose their jobs, just as revenue and profits hit record levels. Meanwhile, those assigned to dull data labeling work are spared the axe.
Industry pulse. Anthropic pays $15B/year for SpaceX compute, SpaceX’s financials and IPO filing, more woes for GitHub, court dismisses Elon Musk’s “hypocritical” OpenAI lawsuit, and Spain may stop blocking its internet during La Liga football games.
How to get a job at a frontier lab in 2026. A Distinguished Engineer at Google recommends focusing on developing particular skills

Read the full The Pulse.

Google Cloud deletes Australian trading fund’s infra

Gergely Orosz — Wed, 20 May 2026 08:31:08 GMT

A $124B fund in Australia would have lost all data stored with Google Cloud, had they not relied on a third-party backup. A rare blunder from GCP, where regional replication did not stop the deletion – and a just as rare statement from Google Cloud’s CEO taking the blame.

The below is an excerpt from The Pulse #93: OpenAI makes Google dance, originally published on 16 May, 2024. I am republishing it because on 20 May 2026 Google Cloud has done it again: they took offline cloud infra provider Railway by blocking Railway's Google Cloud account. The below is a warning: if you are on GCP, have a plan B, should GCP delete your account and data, or block your account.

Following on from Google Cloud being a distant third among cloud providers, a recent event could cement its reputation as the least reliable of the top three.

UniSuper is one of the largest retirement savings funds in Australia (known there as a “superannuation fund”), used by 615,000 citizens. It has $124B of assets under management.

On 29 April, the service suffered an outage. Members could not log into their online accounts, or manage their funds until two weeks later, on 15 May.

The reason was that Google Cloud accidentally deleted UniSuper’s subscription, which also deleted all data associated with the subscription. UniSuper had set up replication across two regions in Google Cloud to protect from a regional failure, but Google Cloud deleted the replica as well!

UniSuper could only avoid data loss thanks to having a backup on another service provider outside Google. In a surprising admission, UniSuper would have lost all data with Google due to the failure of the cloud provider. Basically, UniSuper not trusting Google’s replication across two regions turned out to be a 100% correct assumption. Whoever pushed through the decision to spend additional resources on “a backup in case Google fails” saved the day at the retirement fund.

The incident is incredibly embarrassing for Google. UniSuper seems to have forced Google Cloud’s hand by issuing a joint statement with Google Cloud CEO Thomas Kurian, in which Google Cloud takes all the blame for this failure. In my experience, the situation is rarely this black-and-white, as it usually takes two parties to cause such a major outage. I would not be shocked if it turned out UniSuper’s staff played a role in this failure, but Google Cloud made enough mistakes that the press release could dump all blame on it. I asked Google Cloud if the press release really was a joint release, and if they had more to add. The company confirmed the press release is correct and added nothing else.

Whoever was at fault, two weeks of downtime is still very long for a major fund. As I understand, the damage to UniSuper is mainly reputational because the funds are safe and secure.

Users could not see their balances for a few weeks, and were told Google Cloud had messed things up. This means there are up to 615,000 Australians in whose minds UniSuper and Google Cloud are indelibly linked with unreliability.

I keep seeing that Google Cloud has no apparent strategy for what it wants its cloud to offer. A few months ago, we dived into how AWS, Azure and GCP respond to regional outages, and I concluded it’s hard to see a strategy at GCP beyond following processes, while doing the least impressive job of all three cloud providers. It’s hard to gain market share if you remain the slowest to respond to regional outages, and the provider for whom a zone outage takes down a region, or which loses all customers’ data, despite regional replication, by deleting it.

This incident is a reminder you shouldn’t fully trust your cloud provider. UniSuper was smart to have backups elsewhere for its data in Google Cloud. And while it’s tempting to point fingers at Google Cloud, there are no definite assurances that another vendor would not make a similarly unprecedented mistake in the future!

The learning is that if you have really valuable data, keep a backup somewhere else. If you use any cloud provider, use another cloud, on-prem backups, or something else.

Read the full The Pulse issue

The Pulse: Did capacity shortages turn Anthropic hostile to devs?

Gergely Orosz — Thu, 14 May 2026 16:10:59 GMT

Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover Big Tech and startups through the lens of senior engineers and engineering leaders. Today, we cover one out of five topics from last week’s The Pulse issue. Full subscribers received the article below seven days ago. If you’ve been forwarded this email, you can subscribe here.

Last week, we reported on Anthropic seemingly being on a speed run to break devs’ goodwill by silently “nerfing” Claude Code, banning corporate accounts without warning, and a weird growth experiment involving revoking Claude Code and then restoring it. This week, a dev on the $20/month Pro plan had Claude Code removed just days into their subscription:

Claude Code turned out to be a trial for seven days for some paying customers. Source: Jaime Geiger

This week, Anthropic announced a big data center expansion and the relaxing of previous usage limitations, while Elon Musk’s SpaceX / xAI (a single company after a merger) is renting its complete Colossus 1 data center to Anthropic. From the announcement:

“Colossus 1 features over 220,000 NVIDIA GPUs, including dense deployments of H100, H200, and next-generation GB200 accelerators. The cluster delivers extreme parallel performance for large language models, multimodal systems, scientific simulations, and generative AI at frontier scale.

Anthropic plans to use this additional compute to directly improve capacity for Claude Pro and Claude Max subscribers.”

In parallel with this release, Anthropic announced:

Doubling Claude Code’s current 5-hour limits for Pro, Max, Team, and seat-based Enterprise plans
Removing peak hours limit reduction on Claude Code for Pro and Max plans
Substantially raising API rate limits for Opus models

Is it possible that capacity issues are what led Anthropic to make Claude worse? It’s confirmed the company has struggled with capacity for months. Conveniently, Claude Code being “nerfed” led to lower compute load, while removing Claude Code access from cheap plans could look like rate limiting. Even the banning of corporate accounts could be seen as scaling back at a time when the business has struggled to serve existing growth. Yesterday (6 May), at the Code with Claude event hosted by Anthropic, CEO Dario Amodei said:

“We originally planned for 10x growth, and we’ve seen something more like 80x growth in revenue and usage over the last period of time.”

SpaceX / xAI renting a good chunk of its capacity to Anthropic is ironic, considering that xAI (Musk’s AI startup) builds Grok, a frontier model and direct rival of Claude. Also, in January, Anthropic banned xAI developers from Claude. As covered at the time:

“It’s common for an AI lab to not allow another AI lab to use its model, like at OpenAI, Anthropic, and Google. On the other side, there’s also the pertinent question of why a leading AI lab would even want to use a rival for its own day-to-day work?

Turns out, xAI (Elon Musk’s AI lab) was relying on Cursor to write code, which we know because they got cut off.”

Anthropic likely banned xAI to stop Claude from being potentially distilled while xAI tried to improve Grok’s coding capability. Meanwhile, Musk called Anthropic “misanthropic and evil” earlier this year, and said the new tenant “hates Western civilization”. But both parties seem happy to put that behind them and strike a deal, so perhaps there’s something else at play.

Could SpaceX / xAI be checking out of the frontier-AI model wars? Leasing a good chunk of its data center capacity might suggest that. SpaceX / xAI has two data centers: Colossus 1 and Colossus 2. Colossus 1 represents somewhere around 45% of current SpaceX / xAI capacity, and 20-25% of planned total capacity.

Giving up as much capacity as this might indicate a lack of demand, or capacity sitting idle. It also means Grok is losing out in market share to Claude, ChatGPT, and other leading models. In February’s AI tooling survey we found scarce mention of Grok, which lagged in usage behind open models like DeepSeek and Qwen.

To be fair, unlike Anthropic and OpenAI, Grok never had a B2C nor B2B business that took off. The biggest consumer use case for Grok seems to be its integration into the social media platform, X; at least, I don’t know of any tech company using the model for serious work.

“The enemy of my enemy is my friend”, says the maxim, and if there’s one company Musk hates, it’s OpenAI. He is currently suing OpenAI, claiming it betrayed its founding nonprofit mission to develop safe AGI for humanity’s benefit by shifting to a profit-driven model backed by Microsoft. Musk also claims that despite investing about $40M, he has no ownership of the company.

He wants $150B in damages, the removal of Sam Altman and Greg Brockman, and for OpenAI to return to a full nonprofit, as per when he invested in the company. We covered more about OpenAI’s own ethical challenges between nonprofit and for-profit right after the firing of Sam Altman in 2023, in the deepdive What is OpenAI, really?

Similarly, Anthropic may well have an issue with OpenAI, if CEO Dario Amodei’s failure to join hands with Sam Altman while sharing a stage with the Prime Minister of India earlier this year is anything to go by.

(Most) AI leaders join hands at the AI Impact Summit with India’s Prime Minister. Source: Fortune

Capacity issues hurting Anthropic would benefit OpenAI, and so by offering significant capacity to Anthropic, Musk is making it harder for OpenAI to win the market. That would be ironic, given he’s a former investor.

Read the full issue of last week’s The Pulse, or check out this week’s The Pulse. This week’s issue covers:

Forward deployed engineering heats up again. Massive demand for the role at Google, OpenAI, and Anthropic. The latest version of the FDE role looks like the consultant / solution architect role done by many early-junior engineers.
Why are layoffs spiking? Tech job cuts are higher than at any time since early 2023, for various reasons: smaller teams prompt reorgs and reduce the need for middle management. Meanwhile, poorly performing companies make layoffs without the influence of AI.
New trend: self-reporting 100% AI generated code at Microsoft. With mid-year performance reviews looming, some managers advise their reports to claim they use AI for everything.
Industry Pulse. Tokenmaxxing at Amazon, too, SaaS companies grow faster than before – perhaps partly due to AI, Bun rewritten in Rust with AI works well, Anthropic overtakes OpenAI in enterprise spend, and more.
Vibe coding & agentic engineering get uncomfortably close. A relatable observation by software engineer, Simon Willison, about reviewing AI agents’ code less than would be ideal.

TechPays has been acquired by Levels.fyi

Gergely Orosz — Tue, 12 May 2026 16:06:08 GMT

tl;dr: TechPays is joining Levels.fyi, so the leading tech salary site in Europe gets the love and care it deserves. Thanks to Zsombor for building this project with me for so many years.

Pay transparency has always been an issue in tech, especially in Europe. For a while, I assumed that the most that a senior+ software engineer could make in London or Amsterdam would be in the realm of £100K / €100K. Once you reach that level, you've made it. You’re now at the very top of the market! Or are you?

So when I was making £93K in London, working as a principal engineer at Skyscanner in 2016, I was not expecting that I could be compensated meaningfully better. Pay surveys kept confirming that I was well above the median, and into the 90th percentile of pay grades.

Imagine my surprise when I got an offer from Uber, in Amsterdam, that effectively doubled my compensation, into the realm of around €220-250K ($260-295K). By year four, I made €283K ($332K):

My total compensation at Uber, per year, 2016-2019. Blue is base salary, yellow is equity, green is cash bonus. Note how by year 5 (2020), my compensation dropped to below year 2, thanks to hitting my 4-year vesting cliff for the initial equity grant.

It felt like I had discovered a "secret, upper-tier" of the market that no one else knew about. When I became a manager at Uber, and started hiring for my team, several strong software engineers were hesitant to move forward with the process, because they assumed that they were at the very top of the market – but they still made ~half of what we would have offered! I had no way of telling them "your data is wrong, this place pays a lot more!" and so several of them just never bothered to interview, assuming the biggest raise they would get would be 5-10%, when they could have potentially doubled their compensation…

I saw first-hand that not having good compensation information works against us, developers, and decided to try and change this. I collected data points from close to 200 engineers working in the Netherlands, and explained that there's a third, "hidden" tier of compensation in The Trimodal Nature of Software Engineering Salaries in the Netherlands and Europe.

After the success of the article, I decided to "open source" compensation data points I collected, and thus TechPays was born:

TechPays

I built this site together with Zsombor Erdődy-Nagy. We paid attention to support compensation anonymization, capture freelancer compensation, and break down how compensation packages were put together. We've received so many heart-warming stories on how you've been able to negotiate better compensation packages, thanks to having access to this information.

Knowing that we're making a difference kept us going for a few years, as a side project. However, over time, both Zsombor and I got busier with other projects. For me, it was The Pragmatic Engineer taking up more of my time. We wanted to find a way to keep TechPays running, and get the care it deserves.

Levels.fyi will be taking over operating TechPays, taking the learnings about European compensation packages, and integrating them into their global pay transparency platform. I've known Levels.fyi founders Zuhayeer and Zaheer for years, and we share a drive to make compensation as transparent as possible, across the tech industry.

With TechPays, there are no changes: you get to browse the data, as before. And expect even more, high-quality data points on Levels.fyi, for Europe, and globally.

To get more details on compensation, check out Levels.fyi. And read the Trimodal nature of tech compensation in the US, UK and India, based on Levels.fyi data points:

From the deepdive The trimodal nature of tech compensation in the US, UK and India

The Pulse: AI load breaks GitHub – why not other vendors?

Gergely Orosz — Thu, 07 May 2026 17:33:18 GMT

Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover Big Tech and startups through the lens of senior engineers and engineering leaders. Today, we cover one out of four topics from last week’s The Pulse issue. Full subscribers received the article below seven days ago. If you’ve been forwarded this email, you can subscribe here.

GitHub’s reliability has been beyond unacceptable recently: last month, third party measurements pinned it at one nine (right at 90%). This month, reliability has been down to zero nines – 86% – as per a third-party tracker, and last week, things got even worse: a frankly embarrassing data integrity incident, more outages, and a partial explanation from GitHub, eventually.

Data integrity incident

Last Thursday (23 April), this happened: PRs merged via the merge queue using the squash merge method produced incorrect merge commits, when the merge group contained more than one PR. Commits were reverted from subsequent merges: basically, commits were “lost” in the code that was merged!

Thanks to a bug GitHub introduced, the service broke its integrity promise that pull requests would be merged as expected when using squash merge, which is a technique typically used to merge multiple small commits into a single, meaningful commit. This is a big deal, as data integrity promises are some of the most important ones a service like GitHub makes.

A total of 2,092 pull requests were impacted, and companies hit by the outage included Modal and Zipline. Effectively, GitHub pushed a bunch of work on affected customers who had to manually untangle and recover lost commits, which GitHub could offer zero assistance with.

Customers had to manually go through their git history and restore missing code. After following manual recovery steps (reverting the squash commit and re-applying commits one by one), all commits should have been recovered.

GitHub later emailed the list of affected commits to customers, but it’s odd that GitHub executives seemed to downplay the nature of this outage. After all, an outage that messes with data integrity is a much bigger deal than something like a fall in availability where no data is corrupted.

Can Duruk, software engineer at Modal, was unhappy about GitHub’s muted response to the outage:

“The COO going out of their way to find a huge denominator to make the impact appear small feels very dishonest; versus a sincere apology about how this invalidates their entire promise to their customers. We had to dig into their status page about this to even realize they just casually f***ed up our repo.”

Outages don’t stop

On Monday (27 April), pull requests and issues disappeared from GitHub’s web UI:

Pull requests go missing. Source: Mario Zechner

Issues also not to be found. Source: David Cramer

This had to do with an Elasticsearch outage on GitHub’s backend: the cluster became overloaded and went down. So, while pull requests, issues, and projects didn’t vanish altogether, they also didn’t show up during the 6-hour-long outage.

There were other outages this week:

Some pull requests not showing up (Tuesday, 28 April)
Problems with some GitHub Actions (the same day)
Incomplete pull requests in repositories (Wednesday, 29 April)

Also on Tuesday (28 April), security firm Wiz disclosed a critical security issue, where a bad actor could get access to all repositories on GitHub and GitHub Enterprise server by using only a git push command. GitHub fixed the issue on GitHub.com within six hours, but GitHub Enterprise servers that were not updated remain vulnerable.

Famous open source contributor quits GitHub in frustration

On Tuesday, Mitchell Hashimoto, founder of HashiCorp and creator of Ghostty, announced that GitHub was unfit for professional work and that he was moving Ghostty, the open source terminal that’s his main focus, off the platform. Mitchell’s reasoning was dead simple: being on GitHub makes him unproductive (emphasis mine):

“The past month I’ve kept a journal where I put an “X” next to every date where a GitHub outage has negatively impacted my ability to work. Almost every day has an X. On the day I am writing this post, I’ve been unable to do any PR review for ~2 hours because there is a GitHub Actions outage. This is no longer a place for serious work if it just blocks you out for hours per day, every day.

It’s not a fun place for me to be anymore. I want to be there, but it doesn’t want me to be there. I want to get work done and it doesn’t want me to get work done. I want to ship software and it doesn’t want me to ship software.

I want it to be better, but I also want to code. And I can’t code with GitHub anymore. I’m sorry. After 18 years, I’ve got to go. I’d love to come back one day, but this will have to be predicated on real results and improvements, not words and promises.”

Mitchell’s experience suggests that GitHub’s official status page is inaccurate from the point of view of a heavy user like himself. The third-party “missing GitHub status page” is likely to be a better estimate: it puts GitHub’s reliability at zero nines, with 85.51% uptime. That means that a part of GitHub was down for roughly 3.5 hours per day, on average, for the last 90 days (!!)

Reliability woes: GitHub “not a place for serious work.” Source: The Missing GitHub Status Page
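
The conversion from an uptime percentage to hours of downtime is simple arithmetic, sketched below in Python. The function names are my own, not the tracker's, and the calculation assumes downtime is spread evenly across each 24-hour day, which real incidents never are.

```python
HOURS_PER_DAY = 24.0


def daily_downtime_hours(uptime_percent: float) -> float:
    """Average hours per day of downtime implied by an uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_DAY


def count_nines(uptime_percent: float) -> int:
    """Number of 'nines' of availability: 90% is one, 99% is two, and so on."""
    thresholds = [90.0, 99.0, 99.9, 99.99, 99.999]
    return sum(1 for t in thresholds if uptime_percent >= t)


if __name__ == "__main__":
    uptime = 85.51  # figure quoted from the third-party tracker
    print(f"{count_nines(uptime)} nines, "
          f"{daily_downtime_hours(uptime):.2f} hours of downtime per day")
```

For 85.51% uptime, this works out to zero nines and about 3.5 hours per day during which at least one part of the service was degraded.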

Mitchell’s complaint sounds straightforward:

As a professional software engineer, it’s important to have tools that help you get work done
For months, GitHub has got in the way of his work on open source projects via a flood of outages
It makes no sense to use a product unfit for professional work
As GitHub shows no signs of improvement, it’s worthwhile to move to a different solution which just works

CTO blames AI agent-fuelled load spike

GitHub CTO, Vlad Fedorov, shared an update on why reliability has been terrible for months at GitHub. He identified the load from agents being much bigger than expected as the culprit. Charts illustrating this were shared by GitHub:

This chart looks eye-catching – but there’s just one tiny issue: no Y axis! So, while it tells the story of the load going up slowly and then very fast, we’re not told by how much. However, I managed to get data from GitHub, and below is the chart showing the actual load increase over two years:

A load increase of ~3.5x, spread across two years, doesn’t seem so brutal at first glance. It is nothing like a load increase of 10x in a month, and a good chunk of it occurred in recent months. So, why can’t GitHub handle it? In a blog post, Fedorov said:

“A pull request can touch Git storage, mergeability checks, branch protection, GitHub Actions, search, notifications, permissions, webhooks, APIs, background jobs, caches, and databases. At large scale, small inefficiencies compound: queues deepen, cache misses become database load, indexes fall behind, retries amplify traffic, and one slow dependency can affect several product experiences.”
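
One of the compounding effects Fedorov lists, retries amplifying traffic, can be illustrated with a little arithmetic. This is a generic sketch and not a description of GitHub's systems: the failure rates, retry counts and number of layers are made-up inputs, and it assumes every attempt fails independently with the same probability.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Expected number of calls sent for one logical request.

    Attempt k+1 is only made if the first k attempts all failed,
    so the expectation is 1 + p + p^2 + ... + p^max_retries.
    """
    return sum(failure_rate ** k for k in range(max_retries + 1))


def worst_case_fanout(max_retries: int, layers: int) -> int:
    """Upper bound on calls reaching the deepest service when every
    layer in a call chain retries on its own and every call fails."""
    return (max_retries + 1) ** layers


if __name__ == "__main__":
    for p in (0.01, 0.5):  # healthy vs. struggling dependency (illustrative)
        print(f"failure rate {p:.0%}: "
              f"{expected_attempts(p, max_retries=3):.3f} calls per request")
    print("worst case, 3 layers with 3 retries each:",
          worst_case_fanout(max_retries=3, layers=3))
```

With three retries, a dependency failing 1% of the time sees barely any extra traffic, but one failing half the time receives almost double the calls, at exactly the moment it can least afford them. Stacking retries across layers makes the upper bound grow exponentially.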

Here’s how the per-second load numbers from January 2023 and today compare:

GitHub took 15 years to achieve the 2023 numbers, and maybe it expected to continue growing in a comparable way in the future. If so, some engineering decisions about long-term infrastructure improvements would have been made obsolete by the arrival of AI agents.

To add to GitHub’s challenges, the company is in the midst of a migration from its own data centers → Azure. In October last year, GitHub started to move over to Azure – a project expected to take 12 months – because it already had constraints on its own data center capacity.

Such large-scale infrastructure migrations are hard enough when the load on a service is relatively stable; just making sure nothing breaks takes a lot of effort. But moving at a time when load is spiking means that bugs can cause more visible outages. Of course, GitHub can secure a lot more compute capacity on Azure, now they know what to expect.

But other major companies prepared for a 10x increase in infra load, so why not Microsoft / GitHub? A year ago, I did research on how Big Tech was preparing to respond to the impact of AI on their business. Google was improving its internal systems to accommodate for a 10x increase in load. As we covered in The Pragmatic Engineer, in July last year:

“Google is preparing for 10x more code to be shipped. A former Google Site Reliability Engineer (SRE) told me:

“What I’m hearing from SRE friends is that they are preparing for 10x the lines of code making their way into production.”

If any company has data on the likely impact of AI tools, it’s Google. 10x as much code generated will likely also mean 10x more: code review, deployments, feature flags, source control footprint and, perhaps, even bugs and outages, if not handled with care.”

Predicted enormous load increases were not secret knowledge within the industry, yet it seems GitHub was blissfully ignorant of their potential size. According to Vlad, GitHub did eventually plan for a need to increase capacity by 10x, but this was in October 2025, months later. By February 2026, the company had raised that expectation to 30x. He wrote:

“We started executing our plan to increase GitHub’s capacity by 10X in October 2025 with a goal of substantially improving reliability and failover. By February 2026, it was clear that we needed to design for a future that requires 30X today’s scale.”

There’s also the question of whether GitHub miscalculated how much time it had to prepare for explosive load growth, and whether it was caught off guard when that growth materialized months sooner than expected at the start of this year.

Given GitHub only started to prepare for a major load increase in October, its current problems are unsurprising. At the scale of GitHub, it’s common enough for each team owning a service to plan a year ahead on how much load their service will have, and hardware resources like storage, VMs, and networking are allocated accordingly. Load planning can account for up to half of the preparations, and when reality doesn’t conform to plans, some systems can struggle to scale up.

So, on one hand, dealing with a 3.5x increase in load over 2 years should not be such a big deal for most services; especially not ones which can be horizontally scaled (when there’s not much state, and scaling is achieved simply by adding new nodes.) But GitHub probably stores a lot more state with pull requests, workflows, projects, etc. This probably makes scaling more tricky when it comes to databases and systems running workflows.
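
To put the ~3.5x figure in perspective, here is a small Python sketch that converts a total growth multiple into the constant per-period rate that would produce it. The helper name is mine, and the result is only an average: since a good chunk of GitHub's increase landed in recent months, the peak monthly growth must have been well above it.

```python
def average_growth_rate(total_multiple: float, periods: int) -> float:
    """Constant per-period growth rate that compounds to total_multiple."""
    return total_multiple ** (1.0 / periods) - 1.0


if __name__ == "__main__":
    monthly = average_growth_rate(3.5, periods=24)  # ~3.5x over two years
    yearly = average_growth_rate(3.5, periods=2)
    print(f"average monthly growth: {monthly:.1%}")
    print(f"average yearly growth:  {yearly:.1%}")
    # For comparison, 10x in a single month is 900% growth in that month.
    print(f"10x in one month:       {average_growth_rate(10.0, periods=1):.0%}")
```

Spread evenly, 3.5x over 24 months is a little over 5% per month, or a bit under 90% per year.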

GitHub also has 18 years of tech debt on its hands, and thousands of staff to align as “organizational overhead.” As its service load grows faster than before, responding is harder due to all that accumulated “debt”:

Tech debt: many systems at the company are 10+ years old and are likely patched up, making them more difficult and risky to change
Organizational debt: around 4,000 people work at GitHub, of whom 1,000 are engineers. Teams have dependencies with each other, and even seemingly simple work can require dozens of engineers to work together
Customer expectations: GitHub cannot break customer workflows, even if doing so would mean changes to systems happen faster

GitHub finds itself in the ‘innovator’s dilemma’: the company became successful because it built developer workflows that made sense, pre-AI, and it used to be able to accurately forecast service load changes. But now that engineering teams’ workflows include AI agents, GitHub’s own workflows are not necessarily the best fit, and the company failed to forecast service-level changes.

Other vendors floored by AI load? Not really

One thing that doesn’t add up about the situation is that other vendors who are presumably experiencing similar load spikes don’t appear to be suffering from reliability issues as much. Vercel, Linear, Resend, Railway, Sentry, and other infra providers see record-level growth thanks to AI, but keep up with the load.

Yes, it’s true that AI vendors like Anthropic, OpenAI, and Cursor have some reliability issues, but it’s not at the scale of GitHub’s. GitHub’s direct competitors, GitLab and Bitbucket, presumably see load going up similarly, but they’re not going down as much.

An obvious question is how much of GitHub’s pain is self-inflicted? With Microsoft as owner, it has more resources at its disposal than any competitor or startup, and yet failed to predict load increases and is too big to respond with the nimbleness of a startup.

It’s undeniable that solving for a major load increase is a hard challenge; it’s when the difference between average and standout engineering teams is apparent. GitHub hasn’t been responding like a world-class engineering org.

GitHub alternatives?

Every regular user of GitHub feels the pain of ongoing outages. As a dev, you can either hope Microsoft will eventually improve reliability, or seek alternatives. As covered above, Mitchell has chosen to quit and is currently deciding where to take Ghostty.

The obvious alternatives are GitHub’s biggest competitors, GitLab and Bitbucket. Each offers Git hosting, and neither comes with the uptime woes that GitHub is suffering from.

Self-hosted solutions are also an option, like self-hosting your git repo, or going with a self-hosted forge like Forgejo, which is an open source, local-first GitHub alternative.

I also suspect that, soon enough, we’ll see startups offering GitHub-like code hosting capabilities, while offering more robust uptime and being architected to handle the 30x-or-more scale which GitHub hopes one day to support.

Read the full issue of last week’s The Pulse, or check out this week’s The Pulse. This week’s issue covers:

Did Anthropic turn hostile on devs because capacity was running low?
Amazon finally allows Claude Code and Codex usage
Meta forcefully assigns engineers to data labelling ahead of job cuts
New trend: small “AI-forward” teams
Industry Pulse: why Meta tracks employees’ computer activity, OpenAI starts to move off Datadog, Apple lets slip it uses Claude Code, GitHub → Xbox transfers at Microsoft, VS Code inserted “co-authored by Copilot” even when Copilot did nothing, analysis of the Coinbase layoffs

The Pulse: token spend breaks budgets – what next?

Gergely Orosz — Thu, 30 Apr 2026 14:52:36 GMT

Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover Big Tech and startups through the lens of senior engineers and engineering leaders. Today, we cover one out of three topics from last week’s The Pulse issue. Full subscribers received the article below seven days ago. If you’ve been forwarded this email, you can subscribe here.

Last week, we covered the slightly perverse trend of “tokenmaxxing” across the industry, where devs run agents with the sole aim of boosting their personal “token stats” in an effort to rank higher on internal token leaderboards, and not be seen as a Luddite who doesn’t use AI tools enough compared to peers.

This week, I spoke with a software engineer at a large company and another at a seed-stage place. Both shared almost identical stories: at their latest all-hands, company leadership expressed concerns about the fast-rising costs of tokens. At both places, token spend has increased by ~10x in the last six months – with no signs of slowing down.

I wanted to find out about this trend, so I talked to devs at 15 businesses. Below is what I learned about what’s happening in workplaces of all sizes. Names are anonymized.

Large companies

Setting the default model to a cheaper one: 10,000+ person SaaS company, offices on all continents

Inside a large SaaS company, most devs use an internal background coding tool. This tool defaults to Claude Sonnet, which is the cheaper Claude model. Model selection is not persisted, so devs who prefer working with Opus, for instance, must reselect it on every subsequent startup.

This tool supports all major frontier models such as Sonnet, Opus, GPT, and Gemini. Devs at the company whom I talked to are very heavy users of the tool and have not encountered usage limitations.

Fintech company, US, Series D, ~8,000 people. Staff engineer:

“The cost in token spend is off the charts – and leadership has shared this trend with us. They have not said anything beyond showing growth in spend, and mentioning that this won’t be sustainable. So, nothing specific yet, but my sense is that something will have to change. Limits or prioritizing cheaper models, cutting back on hiring? Who knows.”

Infra company, US, publicly traded, ~5,000 people. Engineering Director:

“We’re monitoring but not restricting. We are spot checking the heaviest users, but we are seeing the business cases working out.

We are offering some guidance on model selection - e.g., turn off the new high-effort setting in Claude. Some users are trying open source models – but open source model usage is a bottom-up initiative, not a top-down one.”

Information technology, US, 10,000+ people. Director of Engineering:

“We have already had to raise our API budget limits multiple times in April. We recently switched to a much higher-effort level for Claude, which significantly increased the cost per PR.

One reason for the cost spike is using state-of-the-art models for demanding tasks. We are using that high-effort setting even for fairly trivial tasks that could have been handled by much cheaper models, or even by lower-effort Claude loops. Despite a few of us pointing this out, leadership has basically said budget is not the concern right now.

I sense that the budget increase has not been forecasted, and we’re in for a reckoning. I suspect the attitude changes once finance and other cost-conscious parts of the org realize we are spending hundreds of dollars per day, per highly-engaged developer. For now, fear of missing out and not wanting to fall behind seems to be outweighing cost discipline.”

Games studio, US+Europe, ~5,000 people. Senior developer:

“What budget increase? It’s very hard to get a budget for AI here! Claude Code is still not rolled out because $200/month/dev is seen as too high a cost. I talk with people at startups where $1,000/month in spending is totally normal, and it’s night and day here.”

Fintech company, US+Europe, late stage, ~5,000 people. Staff engineer:

“Some developers are now spending $500 a day (!!) on Claude Code. Practically speaking, this means that employee costs have doubled. Productivity has increased, in my view, but now the bottleneck is code reviews. AI can spit out code quite quickly, but we still have human reviews in place. Leadership encourages using AI for code review, but my team will not blindly trust AI.

The push from AI is coming from the top. This year’s performance review had a section on AI, rating devs by how well they used AI, so this is another reason everyone just uses it as much as they can.”

Mid-sized companies

SaaS industry, US, ~2,000 people. Dev Productivity Lead:

“Model routing helped keep our costs growing less dramatically. For example, changing the default model reduced cost by 30%. This is our strategy with AI spend, summarized:

Short term: spend, spend, spend! Experiment and use whatever models make sense.
Measure the impact. Measure key outcomes and report on spend, monthly.
When spend vs results diverge: adjust. When our spend increases dramatically, but outcomes don’t follow: see what we can do to adjust the delta. More spend should mean better outcomes. If not, we are doing something wrong.”
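
As a rough illustration of what model routing can look like, here is a minimal Python sketch. Everything in it is hypothetical: the tier names, the relative prices, the difficulty score and its threshold are placeholders of mine, not this company's setup or any vendor's pricing.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical label, not a real model ID
    cost_per_1k_tokens: float  # made-up relative price


CHEAP = ModelTier("cheap-default", 1.0)
FRONTIER = ModelTier("frontier", 5.0)


def route(task_difficulty: float,
          user_override: Optional[ModelTier] = None) -> ModelTier:
    """Use the cheap tier by default; escalate only for hard tasks
    or when the developer explicitly picks a tier."""
    if user_override is not None:
        return user_override
    return FRONTIER if task_difficulty >= 0.8 else CHEAP


def batch_cost(difficulties: Sequence[float],
               tokens_per_task_k: float = 10.0) -> float:
    """Total cost of a batch of tasks under the routing rule above."""
    return sum(route(d).cost_per_1k_tokens * tokens_per_task_k
               for d in difficulties)


if __name__ == "__main__":
    tasks = [0.1, 0.2, 0.9]  # illustrative difficulty scores in [0, 1]
    print("routed cost:", batch_cost(tasks))
    print("all-frontier cost:", len(tasks) * FRONTIER.cost_per_1k_tokens * 10.0)
```

The design choice that matters is the default: developers can still opt in to the expensive tier, but routine work lands on the cheaper one unless someone decides otherwise.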

Finance industry, US, ~2,000 people. VP of AI:

“We have Cursor and Claude Desktop, both of which have around 800-1,200 total users. Token usage is growing somewhat unexpectedly. Estimates are being adjusted on the fly; the initial plan to have strict limits (say, $100 per user) is breaking when reality hits, and people exhaust them in 3-5 working days.

Using expensive models is a problem. In regards to Cursor, many devs are defaulting to the most expensive models without realizing that going with Opus gives single percentage gains in intelligence compared to Sonnet, for example, while exhausting their budgets almost immediately.

We are working on blocking/managing out the most expensive models [with Cursor], as going into thousands of dollars per user, per month is not sustainable on our scale. Cursor is a good partner and we’re working with them to switch to a “pooled spend” model where heavy users can tap into a pool of extra spend.

Claude is a similar story. We were at $100 of Claude Desktop limit for everyone, but as we are moving forward, I can see that we would need to go much higher, especially for business-critical use cases.”
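
The “pooled spend” idea mentioned in this quote can be sketched as a per-user allowance backed by a shared overflow pool. This is my own minimal Python illustration of the concept, not how Cursor or this company implements it; the class, its fields and the pool size are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PooledBudget:
    per_user_limit: float   # e.g. the $100 per-user limit from the quote
    pool_remaining: float   # shared extra spend; the size is an assumption
    spent: Dict[str, float] = field(default_factory=dict)

    def charge(self, user: str, amount: float) -> bool:
        """Record spend; return False if allowance plus pool cannot cover it."""
        used = self.spent.get(user, 0.0)
        # Portion of this charge that still fits in the user's own allowance.
        own_room = max(self.per_user_limit - used, 0.0)
        from_pool = max(amount - own_room, 0.0)
        if from_pool > self.pool_remaining:
            return False  # rejected: nothing is recorded
        self.pool_remaining -= from_pool
        self.spent[user] = used + amount
        return True


if __name__ == "__main__":
    budget = PooledBudget(per_user_limit=100.0, pool_remaining=50.0)
    print(budget.charge("alice", 80.0))   # within her own limit
    print(budget.charge("alice", 40.0))   # $20 own limit + $20 from the pool
    print(budget.charge("bob", 140.0))    # needs $40 from a pool holding $30
    print(budget.pool_remaining)
```

Compared with a strict per-user cap, light users' unused headroom does not help anyone here, but heavy users can keep working until the shared pool is drained.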

Infra company, US, late-stage, ~700 people. Founder:

“We haven’t had much of an issue. Most folks police themselves for runaway costs; for example, we had someone hit like $10K in a week because they messed up caching, but it was caught and they corrected their harness.

For the most part, we don’t see our high-end folks spending more than ~$1K/week. Now, to be clear, this is not a small amount! BUT it’s already a small subset of the population.

We’re just factoring it into engineering costs at this point: if it’s, say, $2K/month per employee, that’s $24K per year.

Who cares, then, when engineers already cost $200-400K/year in cash comp? Okay, so what if it’s $5K/month. That’s $60K/year.

Our bet is that token costs will stabilize and we’ll eventually end up with local-ish models.

Now, it could be five years before they stabilize, but overall, spend today isn’t that insane to me.

There’s a lot of people who are just dumb about it, but most legit execs push back on this. Take the Ralph loops or other insanity where someone spends $1K/day, $5K/week or stuff like this. That’s all just people being fools thinking they’re doing “R&D,” or somehow that they’re smarter than everyone else, but they’re just producing junk that never ships or is not useful.

We saw a bit of “stupid overspend” in the first couple months, but that’s all gone now. Costs could go up even more if we would “crack the whip” in wanting to see even more output, but we’re not doing that.”
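
The founder's back-of-the-envelope math can be checked with a short sketch. The function name and the idea of expressing token spend as a share of cash compensation are illustrative assumptions; the dollar figures are the ones quoted above.

```python
def annual_token_share(monthly_spend: float, annual_comp: float) -> float:
    """Return yearly token spend as a fraction of yearly cash compensation."""
    return (monthly_spend * 12) / annual_comp


# Figures from the quote: $2K or $5K per month, against $200K-$400K cash comp.
for monthly in (2_000, 5_000):
    for comp in (200_000, 400_000):
        share = annual_token_share(monthly, comp)
        print(f"${monthly:,}/month vs ${comp:,}/year comp: {share:.0%}")
```

On these numbers, token spend lands between 6% and 30% of cash compensation, depending on which ends of the two ranges are paired.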

Healthcare industry, US, ~500 people. Senior engineering manager:

“We are not holding back on spend, and have a monthly spend leaderboard. And we WANT devs to spend more on tokens! For example, one of my engineers spent $1,400 on a long Claude Code session in a single day.

We are seeing massive leverage, and we do more with the same number of people. This is why we are okay with our spending spiking. Our traffic is growing more than 10x, year-on-year, and we have managed to keep things running with the same team, and these AI tools.

Engineering is now blocked on Product and Design – which never happened before! This is how fast execution has become. We now have Staff+ engineers writing Product PRDs so we can move faster.

I’ve been in tech for close to 15 years and I never saw dramatic change like this. I just came back after a 3-month break, and every single thing is different in my day! I feel these AI agents are the biggest change in the industry since high-level languages became widespread.”

E-commerce company, US & Europe, ~2,000 devs. Head of Engineering:

“The increase in spend is INSANE. It’s about usage going up, with no signs of stopping. Usage is off the charts.

We currently do not have limits in place, and are not pausing now. Our CEO is AI-pilled and won’t let us slow down.

We do buy tokens at a discount. They start from 5% and go up with usage with the vendors we use (the usual suspects.)

We don’t let devs use anything lower than Opus 4.7 for coding. Cheaper models might work better, but a slight error pushed to prod would result in hours of toil.”

Small companies

Series A, US, ~50 people. Principal Engineer:

“About 15 devs are heavy users of AI and costs are rising very fast. Almost everyone uses Claude and Claude Code. We are considering four potential options:

1. Increase AI budget, and start measuring more. Continue doing what we are, but allow devs to use more tokens instead of hiring limits. The precise ROI is hard to quantify, but we’ll start to measure and track both AI adoption and impact.
2. Optimize token consumption. Use cheaper models for simpler tasks, review token usage, and see where we can cut usage. Downside: this approach could become one with diminishing returns, fast.
3. Integrate more AI providers in the company. Find wrappers to abstract LLMs. The problem is: how do you replace Claude Code, for instance?
4. Pivot to local models: such as Kimi, Qwen, and so on. The problem is it’s a big investment in high-end hardware or cloud GPUs. Upside: it offers better long-term cost control, once done.

We are likely to go with option #1: increase spend BUT maintain momentum and put the right measurements in place. We can do #2, #3 and #4 later. But if we kill AI usage momentum inside the company, the outcome will probably be worse.”

AI infra, US, seed stage, ~15 people. Founder:

“We saw a 15x increase in 6 months:

Six months ago our spend per developer was ~$200/month
Today, it’s around $3,000/developer/month, for our seven devs

We’re not slowing usage, especially as we are building an AI infra product. The increase was much faster than expected, though.”
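
To put the 15x figure in perspective, here is a minimal sketch that derives the growth multiple, an equivalent month-over-month rate, and the team's total monthly bill. The constant compounding rate is an assumption made for illustration; the founder only gave the start and end points.

```python
def growth_summary(start: float, end: float, months: int, devs: int) -> dict:
    """Summarize per-developer spend growth over a period."""
    multiple = end / start
    # Constant month-over-month rate that would compound to the same multiple
    # (an illustrative assumption; real growth was likely uneven).
    monthly_rate = multiple ** (1 / months) - 1
    return {
        "multiple": multiple,
        "monthly_rate": monthly_rate,
        "team_monthly_spend": end * devs,
    }


summary = growth_summary(start=200, end=3_000, months=6, devs=7)
print(f"{summary['multiple']:.0f}x growth")
print(f"{summary['monthly_rate']:.0%} equivalent monthly growth")
print(f"${summary['team_monthly_spend']:,.0f} per month for the team")
```

Under that assumption, spend grew by roughly 57% every month, and seven developers at $3,000 each come to $21,000 per month.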

Small, bootstrapped company, Europe. Founding engineer:

“Our current strategy in dealing with the increase in costs is to switch to a cheaper model; unfortunately, from Opus to Sonnet in our case. That said, Sonnet is quite decent.”

How businesses manage token spend

Regardless of company size, there seem to be two strategies for how companies deal with increased spending. A summary:

Strategy #1: “let it rip and start measuring.” Around half of respondents say AI spend is rising dramatically, and they have decided not to curb it. They want devs to use AI wherever it makes sense, so that it helps their work as much as possible.

However, because the cost is rising dramatically, these companies are now starting to measure usage and attempting to measure the impact of their AI tools.

There are a few companies where the impact already seems to be very positive. Smaller startups whose business is exploding in number of customers, load, and revenue see that they don’t need to hire more staff, because existing engineers can keep supporting the growth with AI tools.

Strategy #2: curb spending. Commonly mentioned cost-saving approaches:

Use cheaper models for simpler tasks
Set default models to less capable ones
Set a spending cap and make it hard for engineers to exceed it, or require consent for doing so

Most companies using strategy #1 have briefly considered this approach but rejected it, because they see it as optimizing for the wrong thing: cutting costs before the productivity impact of using state-of-the-art tools is even known!

Discounts exist when the spend is in the millions of dollars. I asked several people if they are getting discounts from vendors when buying tokens at scale. There were no exact numbers, but this is what I gathered in aggregate about possible custom agreements:

Cursor: open to discounts above a few million dollars in spend. Companies have negotiated discounts with Cursor after crossing $1M of spending. Some companies negotiated tiered discounts from this level, starting at 5% and going higher as their spend goes up.
Anthropic: no discounts. I talked with companies spending $5M+ per year on Claude which have received no discounts. If Anthropic offers discounts, it will likely be at a much higher tier.
All discounts are custom, so try to negotiate – it’s free! Pricing discounts are on a per-customer basis, and highly custom. The easiest way to see if a discount is available is to ask the vendors!

—-

Read the full issue of last week’s The Pulse, or check out this week’s The Pulse. This week’s issue covers:

Load from AI breaks GitHub – but why not other vendors? GitHub’s reliability is less than one nine, and getting worse. Prolific open source contributor, Mitchell Hashimoto, is quitting GitHub because he thinks it’s not suited for professional work. GitHub’s leadership blames the 3.5x increase in service load as the cause of degradation – or it might be self-inflicted.
Anthropic’s speedrun to destroy trust. Anthropic could do no wrong until recently, but in the past month, that’s all changed. Silently nerfing Claude Code, banning companies from Claude, and baffling price rises all add to a sense that Anthropic is in its “extraction” era of generating more revenue for the same or worse service.
Industry pulse. Dramatic price increases at GitHub Copilot, explosive growth at Codex, Google scrambling to build a good coding model, Cursor might be bought by SpaceX, AI agent deletes car business, and more.
Mitchell Hashimoto & the “building block economy.” Ghostty’s creator finds that open source “building blocks” are the best way to win massive adoption by software components – but it’s got harder to build a business on top of open building blocks.

The Pulse: ‘Tokenmaxxing’ as a weird new trend

Gergely Orosz — Thu, 23 Apr 2026 16:55:40 GMT

Inside Meta, an engineer created a “token leaderboard” that ranks employees by token usage. Last week, The Information reported:

“Employees at Meta Platforms who want to show off their AI superuser chops are competing on an internal leaderboard for status as a “Session Immortal”— or, even better, “Token Legend.”

The rankings, set up by a Meta employee on its intranet using company data, measure how many tokens — the units of data processed by AI models — employees are burning through. Dubbed “Claudeonomics” after the flagship product of AI startup Anthropic, the leaderboard aggregates AI usage from more than 85,000 Meta employees, listing the top 250 power users.

The practice is emblematic of Silicon Valley’s newest form of conspicuous consumption, known as “tokenmaxxing,” which has turned token usage into a benchmark for productivity and a competitive measure of who is most AI native. Workers are maximizing their prompts, coding sessions and the number of agents working in parallel to climb internal rankings at Meta and other companies and demonstrate their value as AI automates functions such as coding.”

I spoke with a few engineers at Meta about what’s happening, and this is what they said:

Massive waste. Plenty of devs are running an OpenClaw-like internal agent that burns massive amounts of tokens for little to no outcome.
Outages caused by AI overuse. A dev mentioned that some SEVs were caused by what looked like careless AI code generation; almost like a dev behind the SEV was more concerned with churning out massive amounts of code with AI than with product quality.
Gamified leaderboard. Those at the top of the leaderboard produce throwaway, wasteful work. This is painfully clear to anyone who checks Trajectories (AI prompts), which can be viewed.

As per The Information, Meta employees used a total of 60.2 trillion AI tokens (!!) in 30 days. If this was charged at Anthropic’s API prices, it would cost $900M. Of course, Meta is likely purchasing tokens at a discount, but that could still come in at $100M+ – in large part from senseless “tokenmaxxing”.
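
The two dollar figures imply a blended price per token, which a short sketch makes explicit. Treating all tokens as one blended category is an assumption for illustration: the reported total does not break tokens down into input, output, or cached tokens, which are priced differently.

```python
def implied_price_per_million(total_cost_usd: float, total_tokens: float) -> float:
    """Blended price per one million tokens implied by a total bill."""
    return total_cost_usd / total_tokens * 1_000_000


TOKENS_30_DAYS = 60.2e12  # 60.2 trillion tokens, as reported

# $900M is the estimate at API list prices; $100M is the guessed discounted floor.
list_price = implied_price_per_million(900e6, TOKENS_30_DAYS)
discounted = implied_price_per_million(100e6, TOKENS_30_DAYS)

print(f"List price:  ${list_price:.2f} per million tokens")
print(f"Discounted:  ${discounted:.2f} per million tokens")
```

At the reported figures, this works out to roughly $14.95 per million tokens at list price, and about $1.66 per million at the $100M lower bound.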

After backlash on social media, Meta abolished the internal leaderboard last week. One day after The Information revealed details about the incredible tokenmaxxing numbers, I confirmed that Meta has taken down its leaderboard; perhaps they realized that the incentive created enormous and unnecessary waste. If so, it’s a bit surprising that it took media coverage for the social media giant to reach that conclusion.

One engineer at Meta told me they think Meta had a different goal with the token leaderboard. A long-tenured engineer suspects increasing AI usage actually was the real goal. They said:

“Putting a leaderboard in place was always going to incentivize much more AI usage. And more AI usage means producing a lot more real-world traces. These traces can then be used to train Meta’s next-generation coding model better.

I believe this was the goal, even if no one said it out loud.

It’s an expensive way to generate data for training, but if any company has the means to do so, it’s Meta.”

Microsoft: full-force tokenmaxxing

Similarly, Microsoft has had an internal token leaderboard like Meta’s since January, and it started pretty well, as I reported back at the time: there’s an internal token dashboard that displays the individuals who use the most tokens in order to promote the use of tokens and experimentation with LLMs. At the Windows maker, this leaderboard is interesting:

Very senior engineers – distinguished-level folks – are in the top 5 across the whole company, despite the fact that this group generally wrote little code in the past.
VP-level folks make the top 10 and top 20, despite often being in meetings for most of the day and rarely writing code.

However, what starts as a metric for performance reviews or promotions can quickly become a target for devs. I talked with a software engineer at the Windows maker who admitted they’re full-on “tokenmaxxing” – not to get on the leaderboard, but rather because they don’t want to be seen as using too few tokens:

“We have internal dashboards and metrics tracking AI usage, token usage, percentage of code written by AI vs hand-written code.

I am conscious of not wanting to be seen as “uses too little AI,” and I’m not ashamed to say I need to do tokenmaxxing to do this. Things I do to inflate my token usage metrics:

Ask AI questions about the code already in the documentation. The AI pulls up the documentation, processes it, and gives me results 10x slower, but while burning lots of tokens. I could use “readthedocs” [an internal product], but then my token numbers would be lower
Ask the AI to prototype a feature that I have no intention of working on. Prompt it a few more times, then throw the whole thing away
Default to always using the agent, even when I know I could do the work by hand much faster. Then watch it fail”

This engineer is relatively new at the company, so is concerned about job security, and is playing this game to avoid being tagged as insufficiently “AI-native” by burning far more tokens than necessary.

Salesforce: burning tokens to hit “minimum” & “ideal” targets

Elsewhere, Salesforce has created “tokenmaxxing” incentives, as well. Talking with an engineer there, I learned that the company built two tools that effectively incentivize excessive spending on tokens:

“Minimum” incentives with a tracking tool. There’s a Mac widget that shows your own spend, updated every 15 minutes. It also displays minimum expected spend. Last week, the target was $100 on Claude Code, and $70 on Cursor.
Showing everyone’s spend. A web-based tool to see the token spend of any colleague. It’s used to check where team mates’ usage is at.
“Maximum” spend limits that can be exceeded. Up to a week ago, there was also a maximum monthly limit of $250 for Claude Code and $170 for Cursor. However, this can be exceeded with the simple press of a button if the limit is reached. I’ve learned that last week, some engineering organisations at Salesforce had their “maximum” limit removed in order to “remove any friction from the development process.”

The message Salesforce sends to staff is clear: “use a minimum of $170/month tokens or be flagged.” Who wants to get flagged for using too few tokens? The outcome is somewhat wasteful token spend:

Burning tokens for nothing. Devs ask Claude or Cursor: “build me X,” where X is a project or product with nothing to do with their work, and not something they’d ever ship. It’s just a way to burn tokens
Calibrating token spend to be above average. Plenty of devs browse peers’ token spend to figure out the slightly-above average point, then use the tokens needed to hit that mark

Shopify: an example of how to avoid tokenmaxxing

The first-ever token leaderboard that I’m aware of was built by Shopify in 2025. And it worked well! Last June, the Head of Engineering at Shopify, Farhan Thawar, told me on The Pragmatic Engineer Podcast:

“We have a leaderboard where we actively celebrate the people who use the most tokens because we want to make sure they are [celebrated] if they’re doing great work with AI.

[And for the top people on the leaderboard,] I want to see why they spent say $1,000 a month in credits for Cursor. Maybe that’s because they’re building something great and they have an agent workforce underneath them!”

I asked Farhan for details on how it’s gone since. Here’s what he told me:

“We have since renamed the token leaderboard to usage dashboard: for obvious reasons, as we don’t want to encourage “competing” to make it to the top of this board. We have token spend on our internal wiki profile as well as on the usage dashboard.

We also have circuit breakers to catch “runaway agents.” So if personal spend spikes within a day, we can cut off access immediately, and you can renew if the usage spike was deliberate, or if it was a runaway agent. The circuit breaker worked well for us: we’ve not only caught runaway agents, but found bugs in our infra this way!”

Shopify’s approach seems to have worked for a few reasons:

The usage dashboard served as a “push” for devs to use AI tools, early-on. Last year, devs were mostly experimenting with AI tools because they were not as performant as today. The usage dashboard encouraged developers to try new tools, and highlighted power users.
Circuit breakers helped. Cutting off spend when usage spikes helped catch “runaway agents.”
High usage is looked at. Farhan checks in with top-spending individuals to understand the use cases. Any tokenmaxxing would likely have been spotted at this stage, which would have been a bit embarrassing for the user!
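
The circuit breaker idea can be sketched in a few lines. This is a minimal illustration and not Shopify's implementation: the class name, the fixed daily dollar limit, and the manual `renew` step are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SpendCircuitBreaker:
    """Cuts off a user's access once their spend within a day exceeds a limit.

    Hypothetical sketch: the daily limit and reset behavior are assumptions.
    """

    daily_limit_usd: float
    spend_today: dict = field(default_factory=dict)  # user -> dollars spent today
    tripped: set = field(default_factory=set)        # users currently cut off

    def record(self, user: str, cost_usd: float) -> bool:
        """Record a request's cost; return True if the request is allowed."""
        if user in self.tripped:
            return False
        self.spend_today[user] = self.spend_today.get(user, 0.0) + cost_usd
        if self.spend_today[user] > self.daily_limit_usd:
            # Spike detected: cut off access until someone renews it.
            self.tripped.add(user)
            return False
        return True

    def renew(self, user: str) -> None:
        """Restore access after a human confirms the spike was understood."""
        self.tripped.discard(user)
        self.spend_today[user] = 0.0


breaker = SpendCircuitBreaker(daily_limit_usd=500)
print(breaker.record("alice", 200))  # True: within the limit
print(breaker.record("alice", 400))  # False: 600 > 500, the breaker trips
print(breaker.record("alice", 1))    # False: still cut off
breaker.renew("alice")
print(breaker.record("alice", 1))    # True: access restored
```

The design choice worth noting is that the breaker fails closed: once tripped, every request is refused until a person renews access, which is what forces a conversation about whether the spike was deliberate or a runaway agent.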

One more interesting learning Farhan shared with me: it’s more interesting to not look at “who spent the most in overall token cost?” but instead, “whose tokens cost the most?” Devs who generate tokens that come out as expensive have turned out to do in-depth work that was interesting to learn about!

Tokenmaxxing: great for AI vendors, bad for everyone else

I see very few rational reasons why incentivizing tokenmaxxing makes sense for any company. It results in increasing AI spend – by a lot! – in return for little to no value. Heck, in some cases it actually incentivises slower work – as shown by devs using the AI to answer questions when documentation is readily available – and encourages ‘busywork’ where devs prompt projects that they don’t even want to ship. Tokenmaxxing seems to push devs to focus on stuff that makes no difference to a business.

It feels to me that a good part of the industry is using token count numbers similarly to how the lines-of-code-produced metric was used years ago. There was a time when the number of lines written daily or monthly was an important metric in programmer productivity, until it became clear that it’s a terrible thing to focus on. A lines-of-code metric can easily be gamed by writing boilerplate or throwaway code. Also, the best developers are not necessarily those who write the most code; they’re the ones who solve hard problems for the business quickly and reliably with – or without – code!

Similarly, the number of tokens a dev generates can easily be gamed, and if this metric is measured then devs will indeed game it. But doing so generates a massive accompanying AI bill!

—-

Read the full issue of last week’s The Pulse, or check out this week’s The Pulse. This week’s issue covers:

New trend: token spend breaks budgets – what next? In the past 2-3 months, spending on AI agents has exploded at many tech companies, and the ramifications of this are starting to dawn on engineering leaders. We’ve sourced details from 15 companies, including the different ways they are coping with this realization.
New trend: more AI vendors can’t keep up with demand. Related to massively increased spending, GitHub Copilot and Anthropic are starting to limit less-profitable individual users, so they can serve business users whose spend has easily 10x’d in the last few months. The exception is OpenAI and Codex.
Morale at Meta hits all-time low? Business is booming but devs at Meta are furious and worried due to looming layoffs, and an invasive tracking program rolled out to all US employees.

The Pulse: is GitHub still best for AI-native development?

Gergely Orosz — Fri, 03 Apr 2026 15:03:38 GMT

We’re used to highly reliable systems which target four-nines of availability (99.99%, meaning about 52 minutes of downtime per year), and for it to be embarrassing to barely hit three nines (around 9 hours of downtime per year.) And yet, in the past month, GitHub’s reliability is down to one nine!

Here’s data from the third-party, “missing GitHub status page”, which was built after GitHub stopped updating its own status page due to terrible availability. Recently, things have looked poor:

GitHub down at one nine. Source: The Missing GitHub Status Page

This means that in every 30 days, GitHub had issues on 3 days; equivalently, issues or degradations for about 2.4 hours daily (around 10% of the time).
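
The relationship between "nines" and downtime is simple arithmetic, sketched below for a 365-day year. The function name is illustrative.

```python
def downtime_per_year_hours(availability_pct: float) -> float:
    """Hours of downtime per 365-day year at a given availability percentage."""
    return (1 - availability_pct / 100) * 365 * 24


# Four nines, three nines, and one nine.
for pct in (99.99, 99.9, 90.0):
    hours = downtime_per_year_hours(pct)
    print(f"{pct}% -> {hours:.2f} hours/year ({hours * 60:.0f} minutes)")

# One nine expressed per day: 10% of 24 hours.
print(f"90% -> {(1 - 90 / 100) * 24:.1f} hours of issues per day")
```

Four nines allow roughly 53 minutes of downtime per year and three nines about 8.8 hours, while one nine allows 876 hours, or 36.5 days, which works out to about 2.4 hours per day.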

GitHub seems unable to keep up with the massive increase in infra load from agents. One software engineer built a clever website called “Claude’s Code” that tracks Claude Code bot contributions across GitHub. Growth in the past three months has been enormous:

Load from Claude Code has 6x’d in 3 months. Source: Claude’s Code

Stream of GitHub outages from infra overload

GitHub’s CTO, Vladimir Fedorov, addressed availability issues in a blog post and covered three major incidents:

2 February: security policies unintentionally blocked access to virtual machine metadata
9 February: a database cluster got overloaded
5 March: writes failed on a Redis cluster

Software engineer Lorin Hochstein did a helpful analysis of these outages and the CTO’s response, and has interesting observations:

Saturation: the database cluster incident (9 Feb) was a case of the database getting saturated, due to higher-than-expected usage. Databases are harder to scale up than stateless services. GitHub also underestimated how much additional traffic there would be.
Failover + telemetry gap: the 2 Feb incident was a combination of an infra issue in one region failing over to a healthy region, and making things worse with a telemetry gap (incorrect security policies were applied in the new regions which blocked access to VM metadata)
Failover + configuration issue: the 5 March incident was uncannily similar: after a failover, a configuration issue blocked writes on a Redis cluster

It is certainly nice to get details from GitHub on these outages. It feels to me that infra strains are causing more infra issues → they trigger constraints faster → failovers are not as smooth as they should be. Could it be because GitHub keeps changing their existing systems?

Startup shows GitHub how it’s done

While GitHub struggles to keep up with the increase in load from AI agents generating more code and pull requests, a new startup called Pierre Computer claims to have built an “AI-native” solution for AI agents pushing code, which scales far beyond what GitHub can do. Pierre was founded by Jacob Thornton, formerly an engineer at Coinbase, Medium, and Twitter, and co-creator of the once very popular Bootstrap CSS library.

Here’s what Pierre supports, which GitHub does not:

“In October [2025], Github shared they were averaging ~230 new repos per minute.

Last week we [at Pierre Computer] hit a sustained peak of > 15,000 repos per minute for 3 hours.

And in the last 30 days customers have created > 9M repos”

These are incredible numbers – if also self-reported – and something that GitHub clearly cannot get close to, at least not today! There are few details about customers, while the product – called Code.storage – seems to be in closed beta.
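
A quick sketch helps put the quoted figures side by side. The constant names are illustrative, and the Pierre numbers are quoted as lower bounds ("> 15,000", "> 9M"), so the derived values are lower bounds too. Note that the headline comparison sets a peak rate against an average rate.

```python
GITHUB_AVG_PER_MIN = 230         # GitHub's reported average, October 2025
PIERRE_PEAK_PER_MIN = 15_000     # Pierre's self-reported sustained peak
PEAK_HOURS = 3                   # duration of that peak
PIERRE_30_DAY_REPOS = 9_000_000  # Pierre's self-reported 30-day total

# Peak rate relative to GitHub's average rate.
peak_multiple = PIERRE_PEAK_PER_MIN / GITHUB_AVG_PER_MIN

# Repos created during the three-hour peak alone.
repos_during_peak = PIERRE_PEAK_PER_MIN * 60 * PEAK_HOURS

# Pierre's average rate across the full 30 days.
pierre_avg_per_min = PIERRE_30_DAY_REPOS / (30 * 24 * 60)

print(f"Peak vs GitHub average: {peak_multiple:.0f}x")
print(f"Repos during peak: {repos_during_peak:,}")
print(f"Pierre 30-day average: {pierre_avg_per_min:.0f} repos/minute")
```

The peak is about 65x GitHub's October average and accounts for at least 2.7M of the 9M repos. Averaged over 30 days, Pierre's rate is about 208 repos per minute, so it is the burst capacity, rather than the sustained average, that stands out in these numbers.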

Still, this is the type of “git for AI agents” that GitHub has failed to build, and the type of infrastructure it needs badly.

Has GitHub lost focus and purpose?

GitHub’s reliability issues are acute enough that, if they persist, teams will start giving alternatives from small startups such as Pierre a try, or perhaps even consider self-hosting Git. But how did the largest Git host in the world neglect its customers, and fail to prepare its infra for an increase in code commits and pull requests?

Mitchell Hashimoto, founder of Ghostty, and a heavy user of GitHub himself, had advice on what he would do if he was in charge of GitHub, after growing frustrated with the state of its core offering. He writes (emphasis mine):

“Here’s what I’d do if I was in charge of GitHub, in order:

1. Establish a North Star plan around being critical infrastructure for agentic code lifecycles and determine a set of ways to measure that.

2. Fire everyone who works on or advocates for Copilot and shut it down. It’s not about the people, I’m sure there’s many talented people; you’re just working at the wrong company.

3. Buy Pierre and launch agentic repo hosting as the first agentic product. Repos would be separate from the legacy web product to start, since they’re likely burdened with legacy cross product interactions.

4. Re-evaluate all product lines and initiatives against the new North Star. I suspect 50% get cut (to make room for different ones).

The big idea is all agentic interactions should critically rely on GitHub APIs. Code review should be agentic but the labs should be building that into GH (not bolted in through GHA like today, real first class platform primitives). GH should absolutely launch an agent chat primitive, agent mailboxes are obviously good. GH should be a platform and not an agent itself.

This is going to be very obviously lacking since I only have external ideas to work off of and have no idea how GitHub internals are working, what their KPIs are or what North Star they define, etc.

But, with imperfect information, this is what I’d do.”

My sense is that GitHub has three concurrent problems:

GitHub and Copilot are entangled with Microsoft’s internal politics. GitHub’s Copilot in 2021 was the first massively successful “AI product.” Microsoft took the “Copilot” brand and used it across all of their product lines, creating low-quality AI integrations. Simultaneously, internal Microsoft orgs like Azure and Microsoft AI were trying to get their hands on GitHub, which is one of the most positive developer brands at Microsoft.
GitHub has no leader, seemingly by design. GitHub’s last CEO was Thomas Dohmke, who stepped down voluntarily, and Microsoft never backfilled the CEO role; instead carrying out a reorg to make GitHub part of Microsoft’s AI group and stripping its independence. It seems the “Microsoft AI” side won that battle.
GitHub has no focus, and is stuck chasing Copilot as a revenue source. GitHub has no CEO and is caught up in internal politics, so, what can GitHub teams do? The safest bet is to increase revenue and the best way to do that is by investing more into GitHub Copilot, and ignoring long-term issues like reliability.

I agree with Mitchell: GitHub has no “North Star” and we see a large org being dysfunctional. That lack of vision – and CEO – is hitting hard:

GitHub Copilot went from being the most-used AI agent in 2021 to being overtaken by Claude Code, and it is soon to be overtaken by Cursor.
As a platform, GitHub has no vision for how to evolve to support AI agents. Sure, GitHub has an MCP server, but it has no “AI-native git platform” that can handle the massive load AI agents generate.
GitHub keeps shipping small features and improvements without direction. For example, in October 2025, they started to work on stacked diffs. However, when it ships, the stacked diffs workflow might be mostly obsolete – at least with AI agents!

It’s easy to win a market when you do one thing better than anyone else in the world. Right now, GitHub is doing too many things and doing a subpar job with Copilot, its platform, and AI infra.

Read the full issue of last week’s The Pulse, or check out this week’s The Pulse.

Catch up with recent The Pragmatic Engineer issues:

Scaling Uber with Thuan Pham (Uber’s first CTO — podcast). We went into topics like scaling Uber from constant outages to global infrastructure, the shift to microservices and platform teams, and how AI is reshaping engineering.
Building WhatsApp with Jean Lee (podcast): Jean Lee, engineer #19 at WhatsApp, on scaling the app with a tiny team, the Facebook acquisition, and what it reveals about the future of engineering.
What will the Staff Engineer role look like in 2027 and beyond? What happens to the Staff engineer role when agents write more code? Actually, they could be more in demand than ever!