<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Deciphering Glyph</title><link href="https://blog.glyph.im/" rel="alternate"></link><link href="https://blog.glyph.im/feeds/all.atom.xml" rel="self"></link><id>https://blog.glyph.im/</id><updated>2026-03-03T21:24:00-08:00</updated><entry><title>What Is Code Review For?</title><link href="https://blog.glyph.im/2026/03/what-is-code-review-for.html" rel="alternate"></link><published>2026-03-03T21:24:00-08:00</published><updated>2026-03-03T21:24:00-08:00</updated><author><name>Glyph</name></author><id>tag:blog.glyph.im,2026-03-03:/2026/03/what-is-code-review-for.html</id><summary type="html">&lt;p&gt;Code review is not for catching bugs.&lt;/p&gt;</summary><content type="html">&lt;body&gt;&lt;h2 id=humans-are-bad-at-perceiving&gt;Humans Are Bad At Perceiving&lt;/h2&gt;
&lt;p&gt;Humans are not particularly good at catching bugs.  For one thing, we get tired
easily.  &lt;a href="https://smartbear.com/learn/code-review/best-practices-for-peer-code-review/"&gt;There is some science on this, indicating that humans can’t even
maintain enough concentration to review more than about 400 lines of code at a
time.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We have existing terms of art, in various fields, for the ways in which the
human perceptual system fails to register stimuli. Perception fails when humans
are distracted, tired, overloaded, or merely improperly engaged.&lt;/p&gt;
&lt;p&gt;Each of these has implications for the fundamental limitations of code review
as an engineering practice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Inattentional_blindness"&gt;Inattentional
  Blindness&lt;/a&gt;: you won’t
  be able to reliably find bugs that you’re &lt;em&gt;not&lt;/em&gt; looking for.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Repetition_blindness"&gt;Repetition Blindness&lt;/a&gt;:
  you won’t be able to reliably find bugs that you &lt;em&gt;are&lt;/em&gt; looking for, if they
  keep occurring.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.asisonline.org/security-management-magazine/articles/2013/05/vigilance-fatigue/"&gt;Vigilance
  Fatigue&lt;/a&gt;:
  you won’t be able to reliably find either kind of bug, if you have to keep
  being alert to the presence of bugs all the time.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;and, of course, the distinct but related &lt;a href="https://www.ibm.com/think/topics/alert-fatigue"&gt;Alert
  Fatigue&lt;/a&gt;: you won’t even be
  able to reliably &lt;em&gt;evaluate&lt;/em&gt; reports of &lt;em&gt;possible&lt;/em&gt; bugs, if there are too many
  false positives.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=never-send-a-human-to-do-a-machines-job&gt;Never Send A Human To Do A Machine’s Job&lt;/h2&gt;
&lt;p&gt;When you need to catch a category of error in your code reliably, you will need
a deterministic tool to evaluate — and, thanks to our old friend “alert
fatigue” above, ideally also to remedy — that type of error.  These tools will
relieve the need for a human to make the same repetitive checks over and over.
None of them are perfect, but:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;to catch logical errors, use automated tests.&lt;/li&gt;
&lt;li&gt;to catch formatting errors, use autoformatters.&lt;/li&gt;
&lt;li&gt;to catch common mistakes, use linters.&lt;/li&gt;
&lt;li&gt;to catch common security problems, use a security scanner.&lt;/li&gt;
&lt;/ul&gt;
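&lt;p&gt;As a concrete illustration, here is a minimal sketch of a CI step covering
all four of those categories.  It assumes a Python project that happens to use
pytest, ruff, and bandit, with its sources in a hypothetical &lt;code&gt;src/&lt;/code&gt;
directory; substitute the equivalent tools for your own stack:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Run these in CI so no reviewer has to check for them by hand.
pytest                 # logical errors: automated tests
ruff format --check .  # formatting errors: autoformatter
ruff check .           # common mistakes: linter
bandit -r src/         # common security problems: security scanner
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The point is not these particular tools; it is that each check runs
deterministically on every change, so no human has to stay vigilant for the
errors the tools already catch.&lt;/p&gt;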
&lt;p&gt;Don’t blame reviewers for missing these things.&lt;/p&gt;
&lt;p&gt;Code review should not be how you catch bugs.&lt;/p&gt;
&lt;h2 id=what-is-code-review-for-then&gt;What Is Code Review For, Then?&lt;/h2&gt;
&lt;p&gt;Code review is for three things.&lt;/p&gt;
&lt;p&gt;First, code review is for catching &lt;em&gt;process failures&lt;/em&gt;.  If a reviewer &lt;em&gt;has&lt;/em&gt;
noticed a few bugs of the same type in code review, that’s a sign that that
type of bug is probably getting &lt;em&gt;through&lt;/em&gt; review more often than it’s getting
caught.  Which means it’s time to figure out a way to deploy a tool or a test
into CI that will &lt;em&gt;reliably&lt;/em&gt; prevent that class of error, without requiring
reviewers to be vigilant to it any more.&lt;/p&gt;
&lt;p&gt;Second — and this is actually its &lt;em&gt;more important&lt;/em&gt; purpose — code review is a
tool for &lt;em&gt;acculturation&lt;/em&gt;.  Even if you already have good tools, good processes,
and good documentation, new members of the team won’t necessarily &lt;em&gt;know&lt;/em&gt; about
those things.  Code review is an opportunity for older members of the team to
introduce newer ones to existing tools, patterns, or areas of responsibility.
If you’re building an observer pattern, you might not realize that the codebase
you’re working in already has an existing idiom for doing that, so you wouldn’t
even think to search for it, but someone else who has worked more with the code
might know about it and help you avoid repetition.&lt;/p&gt;
&lt;p&gt;You will notice that I carefully avoided saying “junior” or “senior” in that
paragraph.  Sometimes the newer team member is actually more senior.  But also,
the acculturation goes both ways.  This is the third thing that code review is
for: &lt;em&gt;disrupting&lt;/em&gt; your team’s culture and avoiding stagnation.  If you have new
talent, a fresh perspective can &lt;em&gt;also&lt;/em&gt; be an extremely valuable tool for
building a healthy culture.  If you’re new to a team and trying to build
something with an observer pattern, and this codebase has no tools for that,
but your &lt;em&gt;last&lt;/em&gt; job did, and it used one from an open source library, that is a
good thing to point out in a review as well.  It’s an opportunity to spot areas
for improvement to culture, as much as it is to spot areas for improvement to
process.&lt;/p&gt;
&lt;p&gt;Thus, code review should be as hierarchically flat as possible.  If the goal of
code review were to spot bugs, it would make sense to reserve the ability to
review code to only the most senior, detail-oriented, rigorous engineers in the
organization.  But most teams already know that that’s a recipe for
brittleness, stagnation and bottlenecks.  Thus, even though we &lt;em&gt;know&lt;/em&gt; that not
everyone on the team will be equally good at spotting bugs, most teams allow
anyone past some fairly low minimum seniority bar to do
reviews, often as low as “everyone on the team who has finished onboarding”.&lt;/p&gt;
&lt;h2 id=oops-surprise-this-post-is-actually-about-llms-again&gt;Oops, Surprise, This Post Is Actually About LLMs Again&lt;/h2&gt;
&lt;p&gt;Sigh.  I’m as disappointed as you are, but there are no two ways about it: LLM
code generators are everywhere now, and we need to talk about how to deal with
them.  Thus, an important corollary of understanding that code review is a
&lt;em&gt;social activity&lt;/em&gt; is that LLMs are not &lt;em&gt;social actors&lt;/em&gt;, and so you cannot rely
on code review to inspect their output.&lt;/p&gt;
&lt;p&gt;My own &lt;em&gt;personal&lt;/em&gt; preference would be to eschew their use entirely, but in the
spirit of harm reduction, if you’re going to use LLMs to generate code, you
need to remember the ways in which LLMs are not like human beings.&lt;/p&gt;
&lt;p&gt;When you relate to a human colleague, you will expect that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;you can decide what to focus on based on their level of experience and
   areas of expertise; from a late-career colleague you might be looking for
   bad habits held over from legacy programming languages; from an
   earlier-career colleague you might be focused more on logical test-coverage
   gaps,&lt;/li&gt;
&lt;li&gt;and, they will learn from repeated interactions so that you can gradually
   focus less on a specific type of problem once you have seen that they’ve
   learned how to address it.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;With an LLM, by contrast, while errors can certainly be &lt;em&gt;biased&lt;/em&gt; a bit by the
prompt from the engineer and pre-prompts that might exist in the repository, the
types of errors that the LLM will make are somewhat more uniformly distributed
across the experience range.&lt;/p&gt;
&lt;p&gt;You will still find supposedly extremely sophisticated LLMs making &lt;a href="https://arxiv.org/html/2407.07064v2"&gt;extremely
common mistakes&lt;/a&gt;, specifically because
they &lt;em&gt;are&lt;/em&gt; common, and thus appear frequently in the training data.&lt;/p&gt;
&lt;p&gt;The LLM also can’t really learn.  An intuitive response to this problem is to
simply continue adding more and more instructions to its pre-prompt, treating
&lt;em&gt;that&lt;/em&gt; text file as its “memory”, but that &lt;a href="https://github.com/anthropics/claude-code/issues/2766"&gt;just doesn’t work, and probably
never will&lt;/a&gt;.  The
problem — “&lt;a href="https://research.trychroma.com/context-rot"&gt;context rot&lt;/a&gt;” — is
somewhat fundamental to the nature of the technology.&lt;/p&gt;
&lt;p&gt;Thus, code generators must be treated more adversarially than you would treat
a human code-review partner.  When you notice one making errors, you &lt;em&gt;always&lt;/em&gt;
have to add tests to a mechanical, deterministic harness that will evaluate the
code, because the LLM cannot meaningfully learn from its mistakes outside a very
small context window in the way that a human would, so giving it &lt;em&gt;feedback&lt;/em&gt; is
unhelpful.  Asking it to just generate the code again still requires you to
review it &lt;em&gt;all&lt;/em&gt; again, and as we have previously learned, you, a human, cannot
review more than about 400 lines at once.&lt;/p&gt;
&lt;h2 id=to-sum-up&gt;To Sum Up&lt;/h2&gt;
&lt;p&gt;Code review is a social process, and you should treat it as such.  When you’re
reviewing code from humans, share knowledge and encouragement as much as you
share bugs or unmet technical requirements.&lt;/p&gt;
&lt;p&gt;If you must review code from an LLM, strengthen your automated code-quality
verification tooling and make sure that its agentic loop will fail on its own,
immediately, the next time those quality checks fail. Do not fall into the trap
of appealing to its feelings, knowledge, or experience, because it doesn’t have
any of those things.&lt;/p&gt;
&lt;p&gt;But for both humans &lt;em&gt;and&lt;/em&gt; LLMs, do not fall into the trap of thinking that your
code review process is catching your bugs.  That’s not its job.&lt;/p&gt;
&lt;h2 id=acknowledgments&gt;Acknowledgments&lt;/h2&gt;
&lt;p class=update-note&gt;Thank you to &lt;a href="/pages/patrons.html"&gt;my patrons&lt;/a&gt; who are supporting my writing on
this blog.  If you like what you’ve read here and you’d
like to read more of it, or you’d like to support my &lt;a href="https://github.com/glyph/"&gt;various open-source
endeavors&lt;/a&gt;, you can &lt;a href="/pages/patrons.html"&gt;support my work as a
sponsor&lt;/a&gt;!&lt;/p&gt;&lt;/body&gt;</content><category term="misc"></category><category term="programming"></category><category term="process"></category><category term="ai"></category><category term="llm"></category></entry><entry><title>How To Argue With Me About AI, If You Must</title><link href="https://blog.glyph.im/2026/01/how-to-argue-with-me-about-ai.html" rel="alternate"></link><published>2026-01-04T21:22:00-08:00</published><updated>2026-01-04T21:22:00-08:00</updated><author><name>Glyph</name></author><id>tag:blog.glyph.im,2026-01-04:/2026/01/how-to-argue-with-me-about-ai.html</id><summary type="html">&lt;p&gt;If you insist we have a conversation, please come prepared.&lt;/p&gt;</summary><content type="html">&lt;body&gt;&lt;p&gt;As you already know if you’ve read any of this blog in the last few years, I am
a somewhat reluctant — but nevertheless quite staunch — critic of LLMs.  This
means that I have enthusiasts of varying degrees sometimes taking issue with my
stance.&lt;/p&gt;
&lt;p&gt;It seems that I am not going to get away from discussions, and, let’s be
honest, pretty intense arguments about “AI” any time soon.  These arguments are
starting to make me quite upset.  So it might be time to set some rules of
engagement.&lt;/p&gt;
&lt;p&gt;I’ve written about all of these before at greater length, but this is a short
post because it’s not about the technology or making a broader point, it’s
about &lt;em&gt;me&lt;/em&gt;.  These are rules for engaging with me, personally, on this topic.
Others are welcome to adopt these rules if they so wish but I am not
encouraging anyone to do so.&lt;/p&gt;
&lt;p&gt;Thus, I’ve made this post as short as I can so everyone interested in engaging
can read the whole thing.  If you can’t make it through to the end, then please
just follow Rule Zero.&lt;/p&gt;
&lt;h2 id=rule-zero-maybe-dont&gt;Rule Zero: Maybe Don’t&lt;/h2&gt;
&lt;p&gt;You are welcome to ignore me.  You can think my take is stupid and I can think
yours is.  We don’t have to get into an Internet Fight about it; we can even
remain friends.  You do not need to instigate an argument with me at all, if
you think that my analysis is so bad that it doesn’t require rebutting.&lt;/p&gt;
&lt;h2 id=rule-one-no-just&gt;Rule One: No ‘Just’&lt;/h2&gt;
&lt;p&gt;As I explained in a post with perhaps the least-predictive title I’ve ever
written, &lt;a href="https://blog.glyph.im/2025/06/i-think-im-done-thinking-about-genai-for-now.html"&gt;“I Think I’m Done Thinking About genAI For
Now”&lt;/a&gt;, I’ve already
heard a bunch of bad arguments.  Don’t tell me to ‘just’ use a better model,
use an agentic tool, use a more recent version, or use some prompting trick
that you personally believe works better.  If you skim my work and think that I
must not have deeply researched anything or read about it because you don’t
like my conclusion, that is wrong.&lt;/p&gt;
&lt;h2 id=rule-two-no-look-at-this-cool-thing&gt;Rule Two: No ‘Look At This Cool Thing’&lt;/h2&gt;
&lt;p&gt;Purely as a productivity tool, I have had a terrible experience with genAI.
Perhaps you have had a great one.  Neat.  That’s great for you.  As I explained
&lt;em&gt;at great length&lt;/em&gt; in &lt;a href="https://blog.glyph.im/2025/08/futzing-fraction.html"&gt;“The Futzing Fraction”&lt;/a&gt;,
my concern with generative AI is that &lt;em&gt;I believe&lt;/em&gt; it probably has a &lt;em&gt;net
negative&lt;/em&gt; impact on productivity, based on both my experience and plenty of
citations. Go check out the copious footnotes if you’re interested in more
detail.&lt;/p&gt;
&lt;p&gt;Therefore, I have already acknowledged that you can get an LLM to do various
impressive, cool things, &lt;em&gt;sometimes&lt;/em&gt;.  If I tell you that you will, on average,
lose money betting on a slot machine, &lt;em&gt;a picture of a slot machine hitting a
jackpot is not evidence against my position&lt;/em&gt;.&lt;/p&gt;
&lt;h3 id=rule-two-and-a-half-engage-in-metacognition&gt;Rule Two And A Half: Engage In Metacognition&lt;/h3&gt;
&lt;p&gt;I specifically didn’t title the previous rule “no anecdotes” because data
beyond anecdotes may be extremely expensive to produce.  I don’t want to say
you can never talk to me unless you’re doing a randomized controlled trial.
However, if you are going to tell me an anecdote about the way that you’re
using an LLM, I am interested in hearing &lt;em&gt;how you are compensating&lt;/em&gt; for the
well-documented biases that LLM use tends to induce.  Try to measure what you
can.&lt;/p&gt;
&lt;h2 id=rule-three-do-not-cite-the-deep-magic-to-me&gt;Rule Three: Do Not Cite The Deep Magic To Me&lt;/h2&gt;
&lt;p&gt;As I explained in &lt;a href="https://blog.glyph.im/2024/05/grand-unified-ai-hype.html"&gt;“A Grand Unified Theory of the AI Hype
Cycle”&lt;/a&gt;, I already know &lt;em&gt;quite a bit&lt;/em&gt; of
history of the “AI” label.  If you are tempted to tell me something about how
“AI” is really such a broad field, and it doesn’t just mean LLMs, especially if
you are &lt;em&gt;trying&lt;/em&gt; to launder the reputation of LLMs under the banner of jumbling
them together with other things that have been called “AI”, I assure you that
this will not be convincing to me.&lt;/p&gt;
&lt;h2 id=rule-four-ethics-are-not-optional&gt;Rule Four: Ethics Are Not Optional&lt;/h2&gt;
&lt;p&gt;I have made several arguments in my previous writing: there are ethical
arguments, efficacy arguments, structuralist arguments, efficiency arguments
and aesthetic arguments.&lt;/p&gt;
&lt;p&gt;I am happy to, for the purposes of a good-faith discussion, focus on a specific
set of concerns or an individual point that you want to make where you think I
got something wrong.  If you convince me that I am entirely incorrect about the
effectiveness or predictability of LLMs in general or of a specific LLM product,
you don’t need to make a comprehensive argument about whether one should use
the technology overall.  I will even assume that you &lt;em&gt;have&lt;/em&gt; your own ethical
arguments.&lt;/p&gt;
&lt;p&gt;However, if you scoff at the idea that one &lt;em&gt;should have any ethical boundaries
at all&lt;/em&gt;, and think that there’s no reason to care about the overall utilitarian
impact of this technology, that it’s worth using no matter what else it does as
long as it makes you 5% better at your job, that’s sociopath behavior.&lt;/p&gt;
&lt;p&gt;This includes extreme whataboutism regarding things like the water use of
datacenters, other elements of the surveillance technology stack, and so on.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=consequences&gt;Consequences&lt;/h2&gt;
&lt;p&gt;These are rules, once again, just for engaging with &lt;em&gt;me&lt;/em&gt;. I have no particular
power to enact broader sanctions upon you, nor would I be inclined to do so if
I could.  However, if you can’t stay within these basic parameters &lt;em&gt;and&lt;/em&gt; you
insist upon continuing to direct messages to me about this topic, I will
summarily block you with no warning, on mastodon, email, GitHub, IRC, or
wherever else you’re choosing to do that.  This is for your benefit as well:
such a discussion will not be a productive use of either of our time.&lt;/p&gt;&lt;/body&gt;</content><category term="misc"></category><category term="ai"></category><category term="meta"></category></entry><entry><title>The Next Thing Will Not Be Big</title><link href="https://blog.glyph.im/2026/01/the-next-thing-will-not-be-big.html" rel="alternate"></link><published>2026-01-01T17:59:00-08:00</published><updated>2026-01-01T17:59:00-08:00</updated><author><name>Glyph</name></author><id>tag:blog.glyph.im,2026-01-01:/2026/01/the-next-thing-will-not-be-big.html</id><summary type="html">&lt;p&gt;Disruption, too, will be disrupted.&lt;/p&gt;</summary><content type="html">&lt;body&gt;&lt;p&gt;The dawning of a new year is an opportune moment to contemplate what has
transpired in the old year, and consider what is likely to happen in the new
one.&lt;/p&gt;
&lt;p&gt;Today, I’d like to contemplate that contemplation itself.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;The 20th century was an era characterized by rapidly accelerating change in
technology and industry, creating shorter and shorter cultural cycles of
changes in lifestyles. Thus far, the 21st century seems to be following that
trend, at least in its recently concluded first quarter.&lt;/p&gt;
&lt;p&gt;The first half of the twentieth century saw the massive disruption caused by
electrification, radio, motion pictures, and then television.&lt;/p&gt;
&lt;p&gt;In 1971, Intel poured gasoline on that fire by releasing the 4004, a microchip
generally recognized as the first general-purpose microprocessor. Popular
innovations rapidly followed: the computerized cash register, the personal
computer, credit cards, cellular phones, text messaging, the Internet, the web,
online games, mass surveillance, app stores, social media.&lt;/p&gt;
&lt;p&gt;These innovations arrived faster than those of previous generations, but they
have also crossed a crucial threshold: that of the human lifespan.&lt;/p&gt;
&lt;p&gt;While the entire second millennium A.D. has been characterized by a gradually
accelerating rate of technological and social change — the printing press and
the industrial revolution were no slouches, in terms of changing society, and
those predate the 20th century — most of those changes had the benefit of
unfolding throughout the course of a generation or so.&lt;/p&gt;
&lt;p&gt;Which means that any &lt;em&gt;individual person&lt;/em&gt; in any given century up to the 20th
might remember &lt;em&gt;one&lt;/em&gt; major world-altering social shift within their lifetime,
not five to ten of them.  The diversity of human experience is vast, but &lt;em&gt;most&lt;/em&gt;
people would not &lt;em&gt;expect&lt;/em&gt; that the defining technology of their lifetime was
merely the latest in a progression of predictable civilization-shattering
marvels.&lt;/p&gt;
&lt;p&gt;Along with each of these successive generations of technology, we minted a new
generation of industry titans. Westinghouse, Carnegie, Sarnoff, Edison, Ford,
Hughes, Gates, Jobs, Zuckerberg, Musk. Not just individual rich people, but
entire new &lt;em&gt;classes&lt;/em&gt; of rich people that did not exist before. “Radio DJ”,
“Movie Star”, “Rock Star”, “Dot Com Founder”, were all new paths to wealth
opened (and closed) by specific technologies. While most of these people did
come from at least &lt;em&gt;some&lt;/em&gt; level of generational wealth, they no longer came
from a literal hereditary aristocracy.&lt;/p&gt;
&lt;p&gt;To &lt;em&gt;describe&lt;/em&gt; this new feeling of constant acceleration, a new phrase was
coined: “&lt;a href="https://grammarphobia.com/blog/2015/11/thing-2.html"&gt;The Next Big
Thing&lt;/a&gt;”.  In addition to
denoting that some Thing was coming and that it would be Big (i.e.: that it
would change a lot about our lives), this phrase also carries the strong
&lt;em&gt;implication&lt;/em&gt; that such a Thing would be a product.  Not a development in
social relationships or a shift in cultural values, but some new and amazing
form of conveying salted &lt;a href="https://en.wikipedia.org/wiki/Spam_(food)"&gt;meat&lt;/a&gt;
&lt;a href="https://en.wikipedia.org/wiki/Bovril"&gt;paste&lt;/a&gt; or what-have-you, that would make
whatever lucky tinkerer who stumbled into it into a billionaire — along with
any friends and family lucky enough to believe in their vision and get in on
the ground floor with an investment.&lt;/p&gt;
&lt;p&gt;In the latter part of the 20th century, our entire model of capital allocation
shifted to account for this widespread belief. No longer were mega-businesses
built by bank loans, stock issuances, and reinvestment of profit; the new model
was “Venture Capital”. Venture capital is a model of capital allocation
&lt;em&gt;explicitly predicated&lt;/em&gt; on the idea that carefully considering each bet on a
likely-to-succeed business and reducing one’s risk was a waste of time, because
the return on the equity from the Next Big Thing would be so disproportionately
huge — 10x, 100x, 1000x — that one could afford to make &lt;em&gt;at least&lt;/em&gt; 10 bad bets
for each good one, and still come out ahead.&lt;/p&gt;
&lt;p&gt;The biggest risk was in &lt;em&gt;missing the deal&lt;/em&gt;, not in giving a bunch of money to a
scam.  Thus, value investing and focus on fundamentals have been broadly
disregarded in favor of the pursuit of the Next Big Thing.&lt;/p&gt;
&lt;p&gt;If Americans of the twentieth century were temporarily embarrassed
millionaires, those of the twenty-first are all temporarily embarrassed
&lt;a href="https://en.wikipedia.org/wiki/Big_Tech#Acronyms"&gt;FAANG&lt;/a&gt; CEOs.&lt;/p&gt;
&lt;p&gt;The predicament that this tendency leaves us in today is that the world is
increasingly run by generations — GenX and Millennials — with the shared
experience that the computer industry, either hardware or software, would
produce some radical innovation every few years.  We assume that to be true.&lt;/p&gt;
&lt;p&gt;But all things change, even change itself, and that industry is beginning to
slow down.  Physically, transistor density is starting to &lt;a href="https://interestingengineering.com/innovation/transistors-moores-law"&gt;brush up against
physical
limits&lt;/a&gt;.
Economically, most people are drowning in more compute power than they know
what to do with anyway. Users already have most of what they need from the
Internet.&lt;/p&gt;
&lt;p&gt;The big new feature in every operating system is a bunch of &lt;a href="https://www.cnet.com/tech/mobile/73-of-iphone-owners-say-no-thanks-to-apple-intelligence-new-data-echoes-cnets-findings/"&gt;useless
junk&lt;/a&gt;
&lt;a href="https://www.windowscentral.com/microsoft/windows-11/2025-has-been-an-awful-year-for-windows-11-with-infuriating-bugs-and-constant-unwanted-features"&gt;nobody really
wants&lt;/a&gt;
and is seeing remarkably little uptake.  Social media and smartphones changed
the world, true, but… those are both innovations from 2008.  They’re just not
&lt;em&gt;new&lt;/em&gt; any more.&lt;/p&gt;
&lt;p&gt;So we are all — collectively, culturally — looking for the Next Big Thing, and
we keep not finding it.&lt;/p&gt;
&lt;p&gt;It wasn’t 3D printing. It wasn’t crowdfunding. It wasn’t smart watches. It
wasn’t VR. It wasn’t the Metaverse, it wasn’t Bitcoin, it wasn’t NFTs&lt;sup id=fnref:1:the-next-thing-will-not-be-big-2026-1&gt;&lt;a class=footnote-ref href=#fn:1:the-next-thing-will-not-be-big-2026-1 id=fnref:1&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;It’s also not AI, but this is why so many people &lt;em&gt;assume&lt;/em&gt; that it will be AI.
Because it’s got to be &lt;em&gt;something&lt;/em&gt;, right?  If it’s got to be &lt;em&gt;something&lt;/em&gt; then
AI is as good a guess as anything else right now.&lt;/p&gt;
&lt;p&gt;The fact is, &lt;em&gt;our lifetimes have been an extreme anomaly&lt;/em&gt;.  Things like the
Internet used to come along every thousand years or so, and while we might
expect that the pace will stay a bit higher than that, it is not reasonable to
expect that something new &lt;em&gt;like&lt;/em&gt; “personal computers” or “the Internet”&lt;sup id=fnref:3:the-next-thing-will-not-be-big-2026-1&gt;&lt;a class=footnote-ref href=#fn:3:the-next-thing-will-not-be-big-2026-1 id=fnref:3&gt;3&lt;/a&gt;&lt;/sup&gt;
will arrive again.&lt;/p&gt;
&lt;p&gt;We are not going to get rich by getting in on the ground floor of the next
Apple or the next Google because the next Apple and the next Google are Apple
and Google.  The industry is maturing.  Software technology, computer
technology, and internet technology are all maturing.&lt;/p&gt;
&lt;h2 id=there-will-be-next-things&gt;There Will Be Next Things&lt;/h2&gt;
&lt;p&gt;Research and development is happening in all fields all the time. Amazing new
developments quietly and regularly occur in pharmaceuticals and in materials
science.  But these are not predictable. They do not inhabit the public
consciousness until they’ve already happened, and they are rarely so profound
and transformative that they change &lt;em&gt;everybody’s&lt;/em&gt; life.&lt;/p&gt;
&lt;p&gt;There will even be new things in the computer industry, both software and
hardware. Foldable phones do address a real problem (I wish the screen were
even bigger but I don’t want to carry around such a big device), and would
probably be more popular if they got the costs under control.  One day
somebody’s going to crack the problem of volumetric displays, probably. Some VR
product will probably, eventually, hit a more realistic price/performance ratio
where the niche will expand at least a little more.&lt;/p&gt;
&lt;p&gt;Maybe there will even be something genuinely useful, which is recognizably
adjacent to the current “AI” fad, but if it is, it will be some &lt;em&gt;new
development&lt;/em&gt; that we haven’t seen yet.  If current AI technology were
sufficient to drive some interesting product, it would already be doing it, not
&lt;a href="https://theoutpost.ai/news-story/major-study-reveals-ai-benchmarks-may-be-misleading-casting-doubt-on-reported-capabilities-21513/"&gt;using marketing disguised as
science&lt;/a&gt;
to &lt;a href="https://www.wired.com/story/the-ai-industrys-scaling-obsession-is-headed-for-a-cliff/"&gt;conceal diminishing
returns&lt;/a&gt;
on current investments.&lt;/p&gt;
&lt;h2 id=but-they-will-not-be-big&gt;But They Will Not Be Big&lt;/h2&gt;
&lt;p&gt;The impulse to find the One Big Thing that will dominate the next five years is
a fool’s errand.  Incremental gains are diminishing across the board.  The
markets for time and attention&lt;sup id=fnref:2:the-next-thing-will-not-be-big-2026-1&gt;&lt;a class=footnote-ref href=#fn:2:the-next-thing-will-not-be-big-2026-1 id=fnref:2&gt;2&lt;/a&gt;&lt;/sup&gt; are largely saturated.  There’s no need for
another streaming service if 100% of your leisure time is already committed to
TikTok, YouTube and Netflix; famously, Netflix has already considered
&lt;a href="https://www.fastcompany.com/40491939/netflix-ceo-reed-hastings-sleep-is-our-competition"&gt;sleep&lt;/a&gt;
its primary competitor for close to a decade — years &lt;em&gt;before&lt;/em&gt; the pandemic.&lt;/p&gt;
&lt;p&gt;Those rare tech markets which &lt;em&gt;aren’t&lt;/em&gt; saturated are suffering from pedestrian
economic problems like wealth inequality, not technological bottlenecks.&lt;/p&gt;
&lt;p&gt;For example, the thing preventing the development of a robot that can do your
laundry and your dishes without your input is not necessarily that we couldn’t
build something like that, but that most households just &lt;em&gt;can’t afford it&lt;/em&gt;
without &lt;a href="https://www.epi.org/productivity-pay-gap/"&gt;wage growth catching up to productivity
growth&lt;/a&gt;.  It doesn’t make sense for
anyone to commit to the substantial R&amp;amp;D investment that such a thing would
take, if the market doesn’t exist because the average worker isn’t paid enough
to afford it on top of all the &lt;em&gt;other&lt;/em&gt; tech which is already required to exist
in society.&lt;/p&gt;
&lt;p&gt;The projected income from the tiny, wealthy sliver of the population who
&lt;em&gt;could&lt;/em&gt; pay for the hardware, cannot justify an investment in the software past
a &lt;a href="https://futurism.com/future-society/robot-servant-neo-remote-controlled"&gt;fake version remotely operated by workers in the global south, only made
possible by Internet wage
arbitrage&lt;/a&gt;,
i.e. a more palatable, modern version of indentured servitude.&lt;/p&gt;
&lt;p&gt;Even if we were to accept the premise of an actually-“AI” version of this, that
is still just a wish that ChatGPT could somehow improve enough behind the
scenes to replace that worker, not any substantive investment in a novel,
proprietary-to-the-chores-robot software system which could &lt;em&gt;reliably&lt;/em&gt; perform
specific functions.&lt;/p&gt;
&lt;h2 id=what-then&gt;What, Then?&lt;/h2&gt;
&lt;p&gt;The expectation for, and lack of, a “big thing” is a big problem.  There are
others who could describe its economic, political, and financial dimensions
better than I can.  So then let me speak to my expertise and my audience: open
source software developers.&lt;/p&gt;
&lt;p&gt;When I began my own involvement with open source, a big part of the draw for me
was participating in a low-cost (to the corporate developer) but high-value (to
society at large) positive externality.  None of my employers would ever have
cared about many of the &lt;a href="https://deluge-torrent.org"&gt;applications&lt;/a&gt; for which
&lt;a href="https://twisted.org/"&gt;Twisted&lt;/a&gt; forms a core bit of infrastructure; nor would I
have been able to predict those applications’ existence.  Yet, it is nice to
have contributed to their development, even a little bit.&lt;/p&gt;
&lt;p&gt;However, it’s not actually a positive externality if the public at large can’t
directly &lt;em&gt;benefit&lt;/em&gt; from it.&lt;/p&gt;
&lt;p&gt;When &lt;em&gt;real&lt;/em&gt; world-changing, disruptive developments are occurring, the
bean-counters are not watching positive externalities too closely.  As we
discovered with &lt;a href="https://www.businessinsider.com/zirp-end-of-cushy-big-tech-job-perks-mass-layoffs-2024-2"&gt;many of the other benefits that temporarily accrued to
labor&lt;/a&gt;
in the tech economy, Open Source that is &lt;em&gt;usable by individuals and small
companies&lt;/em&gt; may have been a ZIRP.  If you know you’re gonna make a billion
dollars, you’re not going to worry about giving away a few hundred thousand here
and there.&lt;/p&gt;
&lt;p&gt;When gains are smaller and harder to realize, and margins are starting to get
squeezed, it’s harder to justify the investment in vaguely good vibes.&lt;/p&gt;
&lt;p&gt;But this, itself, is not a call to action.  I doubt very much that anyone
reading this can do anything about the macroeconomic reality of higher interest
rates. The technological reality of “development is happening slower” is
inherently something that you can’t change on purpose.&lt;/p&gt;
&lt;p&gt;However, what we &lt;em&gt;can&lt;/em&gt; do is to be aware of this trend in our own work.&lt;/p&gt;
&lt;h2 id=fight-scale-creep&gt;Fight Scale Creep&lt;/h2&gt;
&lt;p&gt;It seems to me that more and more open source infrastructure projects are tools
for hyper-scale application development, only relevant to massive cloud
companies.  This is just a subjective assessment on my part — I’m not sure what
tools even exist today to measure this empirically — but I remember a big part
of the open source community when I was younger being things like Inkscape,
Themes.Org and Slashdot, not React, Docker Hub and Hacker News.&lt;/p&gt;
&lt;p&gt;This is not to say that the hobbyist world no longer exists. There is of course
a ton of stuff going on with Raspberry Pi, Home Assistant, OwnCloud, and so on.
If anything there’s a bit of a resurgence of self-hosting.  But the interests
of self-hosters and corporate developers are growing apart; there seems to be
far less of a beneficial overflow from corporate infrastructure projects into
these enthusiast or prosumer communities.&lt;/p&gt;
&lt;p&gt;This is the concrete call to action: if you are employed in any capacity as an
open source maintainer, dedicate &lt;em&gt;more&lt;/em&gt; energy to medium- or small-scale open
source projects.&lt;/p&gt;
&lt;p&gt;If your assumption is that you will eventually reach a hyper-scale inflection
point, then mimicking Facebook and Netflix is likely to be a good idea.
However, if we can all admit to ourselves that we’re &lt;em&gt;not&lt;/em&gt; going to achieve a
trillion-dollar valuation and a hundred thousand engineer headcount, we can
begin to consider ways to make our Next Thing a bit smaller, and to accommodate
the world as it is rather than as we wish it would be.&lt;/p&gt;
&lt;h2 id=be-prepared-to-scale-down&gt;Be Prepared to Scale Down&lt;/h2&gt;
&lt;p&gt;Here are some design guidelines you might consider, for just about any open
source project, particularly infrastructure ones:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Don’t assume that your software can sustain an arbitrarily large fixed
   overhead because “you just pay that cost once” and you’re going to be
   running a billion instances so it will always amortize; maybe you’re only
   going to be running ten.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Remember that such fixed overhead includes not just CPU, RAM, and filesystem
   storage, but also the learning curve for developers.  Front-loading a
   massive amount of conceptual complexity to accommodate the problems of
   hyper-scalers is a common mistake.  Try to smooth out these complexities and
   introduce them only when necessary.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Test your code on edge devices. This means supporting Windows and macOS, and
   even Android and iOS.  If you want your tool to help empower individual
   users, you will need to meet them where they are, which is &lt;em&gt;not&lt;/em&gt; on an EC2
   instance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;This includes treating Desktop Linux as a platform in its own right,
   distinct from Server Linux: while the two certainly have plenty in common,
   they also differ in some details.  Consider the highly specific example of
   secret storage: if you are writing something that intends to live in a cloud
   environment, and you need to configure it with a secret, you will probably
   want to provide it via a text file or an environment variable.  By contrast,
   if you want this same code to run on a desktop system, your users will
   expect you to support the &lt;a href="https://specifications.freedesktop.org/secret-service/latest/"&gt;Secret
   Service&lt;/a&gt;.
   This will likely require only a few lines of code to accommodate, but it
   makes a massive difference to the user experience.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Don’t rely on LLMs remaining cheap or free.  If you have LLM-related
   features&lt;sup id=fnref:5:the-next-thing-will-not-be-big-2026-1&gt;&lt;a class=footnote-ref href=#fn:5:the-next-thing-will-not-be-big-2026-1 id=fnref:5&gt;4&lt;/a&gt;&lt;/sup&gt;, make sure that they are sufficiently severable from the rest of
   your offering that if ChatGPT starts costing $1000 a month, your tool
   doesn’t break completely.  Similarly, do not require that your users have
   easy access to half a terabyte of VRAM and a rack full of 5090s in order to
   run a local model.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
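&lt;p&gt;To make the secret-storage example concrete, here is a minimal, hedged
sketch in Python.  The "fooapp" service name is hypothetical, and the
third-party &lt;code&gt;keyring&lt;/code&gt; package is assumed as the bridge to the Secret
Service on desktop Linux (and to the native keychains on other desktops):&lt;/p&gt;

```python
from __future__ import annotations

import os


def get_secret(name: str) -> str | None:
    """Look up a secret in a way that works on both servers and desktops."""
    # Server-style configuration: an environment variable.
    value = os.environ.get(name)
    if value is not None:
        return value
    # Desktop-style configuration: the platform keychain, via the
    # third-party `keyring` package, which speaks the Secret Service
    # API on desktop Linux.  (The "fooapp" service name is hypothetical.)
    try:
        import keyring

        return keyring.get_password("fooapp", name)
    except Exception:
        # No keyring installed, or no usable backend available.
        return None
```

A few lines like these let the same code read a text-file-sourced environment
variable in the cloud and the user's keychain on a laptop, without either
deployment knowing about the other.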
&lt;p&gt;Even if you &lt;em&gt;were&lt;/em&gt; going to scale up to infinity, the ability to scale down and
consider smaller deployments means that you can run more comfortably on, for
example, a developer’s laptop.  So even if you can’t convince your employer
that this is where the economy and the future of technology in our lifetimes
are going, it can be easy enough to justify this sort of design shift,
particularly as a series of individual choices.  Make your onboarding cheaper,
your development feedback loops tighter, and your systems generally more
resilient to economic headwinds.&lt;/p&gt;
&lt;p&gt;So, please design your open source libraries, applications, and services to run
on smaller devices, with less complexity.  It will be worth your time as well
as your users’.&lt;/p&gt;
&lt;p&gt;But if you &lt;em&gt;can&lt;/em&gt; fix the whole wealth inequality thing, do that first.&lt;/p&gt;
&lt;h2 id=acknowledgments&gt;Acknowledgments&lt;/h2&gt;
&lt;p class=update-note&gt;Thank you to &lt;a href="/pages/patrons.html"&gt;my patrons&lt;/a&gt; who are supporting my writing on
this blog.  If you like what you’ve read here and you’d
like to read more of it, or you’d like to support my &lt;a href="https://github.com/glyph/"&gt;various open-source
endeavors&lt;/a&gt;, you can &lt;a href="/pages/patrons.html"&gt;support my work as a
sponsor&lt;/a&gt;!&lt;/p&gt;
&lt;div class=footnote&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=fn:1:the-next-thing-will-not-be-big-2026-1&gt;
&lt;p id=fn:1&gt;&lt;a href="https://www.technologyreview.com/10-breakthrough-technologies/2013/"&gt;These sorts of
lists&lt;/a&gt; are
pretty funny reads, in retrospect. &lt;a class=footnote-backref href=#fnref:1:the-next-thing-will-not-be-big-2026-1 title="Jump back to footnote 1 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:2:the-next-thing-will-not-be-big-2026-1&gt;
&lt;p id=fn:2&gt;Which is to say, “distraction”. &lt;a class=footnote-backref href=#fnref:2:the-next-thing-will-not-be-big-2026-1 title="Jump back to footnote 2 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:3:the-next-thing-will-not-be-big-2026-1&gt;
&lt;p id=fn:3&gt;... or even their lesser-but-still-profound aftershocks like “Social
Media”, “Smartphones”, or “On-Demand Streaming Video” ...
secondary manifestations of the underlying innovation of a packet-switched
global digital network ... &lt;a class=footnote-backref href=#fnref:3:the-next-thing-will-not-be-big-2026-1 title="Jump back to footnote 3 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:5:the-next-thing-will-not-be-big-2026-1&gt;
&lt;p id=fn:5&gt;My preference would of course be that you just didn’t have such features
at all, but perhaps even if you agree with me, you are part of an
organization with some mandate to implement LLM stuff.  Just try not to
wrap the chain of this anchor &lt;em&gt;all&lt;/em&gt; the way around your code’s neck. &lt;a class=footnote-backref href=#fnref:5:the-next-thing-will-not-be-big-2026-1 title="Jump back to footnote 4 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;&lt;/body&gt;</content><category term="misc"></category><category term="startups"></category><category term="ai"></category><category term="open-source"></category><category term="programming"></category><category term="politics"></category></entry><entry><title>The “Dependency Cutout” Workflow Pattern, Part I</title><link href="https://blog.glyph.im/2025/11/dependency-cutout-workflow-pattern.html" rel="alternate"></link><published>2025-11-10T17:44:00-08:00</published><updated>2025-11-10T17:44:00-08:00</updated><author><name>Glyph</name></author><id>tag:blog.glyph.im,2025-11-10:/2025/11/dependency-cutout-workflow-pattern.html</id><summary type="html">&lt;p&gt;It’s important to be able to fix bugs in your open source
dependencies, and not just work around them.&lt;/p&gt;</summary><content type="html">&lt;body&gt;&lt;p&gt;Tell me if you’ve heard this one before.&lt;/p&gt;
&lt;p&gt;You’re working on an application.  Let’s call it “FooApp”.  FooApp has a
dependency on an open source library, let’s call it “LibBar”.  You find a bug
in LibBar that affects FooApp.&lt;/p&gt;
&lt;p&gt;To envisage the best possible version of this scenario, let’s say you actively
&lt;em&gt;like&lt;/em&gt; LibBar, both technically and socially.  You’ve contributed to it in the
past.  But this bug is causing production issues in FooApp &lt;em&gt;today&lt;/em&gt;, and
LibBar’s release schedule is quarterly.  FooApp is your job; LibBar is (at
best) your hobby.  Blocking on the full upstream contribution cycle and waiting
for a release is an absolute non-starter.&lt;/p&gt;
&lt;p&gt;What do you do?&lt;/p&gt;
&lt;p&gt;There are a few common reactions to this type of scenario, all of which are
bad options.&lt;/p&gt;
&lt;p&gt;I will enumerate them specifically here, because I suspect that some of them
may resonate with many readers:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Find an alternative to LibBar, and switch to it.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is a bad idea because migrating away from a core infrastructure
component could be extremely expensive.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Vendor LibBar into your codebase and fix your vendored version.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is a bad idea because carrying this one fix now requires you to
maintain all the tooling associated with a monorepo&lt;sup id=fnref:1:dependency-cutout-workflow-pattern-2025-11&gt;&lt;a class=footnote-ref href=#fn:1:dependency-cutout-workflow-pattern-2025-11 id=fnref:1&gt;1&lt;/a&gt;&lt;/sup&gt;: you have to be
able to start pulling in new versions from LibBar regularly, reconcile your
changes even though you now have a separate version history on your
imported version, and so on.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Monkey_patch"&gt;Monkey-patch&lt;/a&gt; LibBar to
   include your fix.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is a bad idea because you are now extremely tightly coupled to a
specific version of LibBar.  By modifying LibBar internally like this,
you’re inherently violating its compatibility contract, in a way which is
going to be extremely difficult to test.  You &lt;em&gt;can&lt;/em&gt; test this change, of
course, but as LibBar changes, you will need to replicate any relevant
portions of its test suite (which may be its &lt;em&gt;entire&lt;/em&gt; test suite) in
FooApp.  Lots of potential duplication of effort there.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Implement a workaround in your own code, rather than fixing it.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is a bad idea because you are distorting the responsibility for
correct behavior.  LibBar is supposed to do LibBar’s job, and unless you
have a full wrapper for it in your own codebase, other engineers (including
“yourself, personally”) might later forget to go through the alternate,
workaround codepath, and invoke the buggy LibBar behavior again in some new
place.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Implement the fix upstream in LibBar anyway, because that’s the Right
   Thing To Do, and burn credibility with management while you anxiously wait
   for a release with the bug in production.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is a bad idea because you are betraying your users — by allowing the
buggy behavior to persist — for the workflow convenience of your dependency
providers. Your users are probably giving you money, and trusting you with
their data. This means you have both ethical and economic obligations to
consider their interests.&lt;/p&gt;
&lt;p&gt;As much as it’s nice to participate in the open source community and take
on an appropriate level of burden to maintain the commons, this cannot
sustainably be at the explicit expense of the population you serve
directly.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Even if&lt;/em&gt; we only care about the open source maintainers here, there’s
still a problem: as you are likely to come under immediate pressure to ship
your changes, you will inevitably relay at least a bit of that stress to
the maintainers.  Even if you try to be exceedingly polite, the maintainers
will know that &lt;em&gt;you&lt;/em&gt; are coming under fire for not having shipped the fix
yet, and are likely to feel an even greater burden of obligation to ship
your code fast.&lt;/p&gt;
&lt;p&gt;Much as it’s good to contribute the fix, it’s not great to put this on the
maintainers.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The respective incentive structures of software development — specifically, of
corporate application development and open source infrastructure development —
make options 1-4 very common.&lt;/p&gt;
&lt;p&gt;On the corporate / application side, these issues are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;it’s difficult for corporate developers to get clearance to spend even small amounts of
  their work hours on upstream open source projects, but clearance to spend
  time on the project they actually work on is implicit.  If it takes 3 hours
  of wrangling with Legal&lt;sup id=fnref:2:dependency-cutout-workflow-pattern-2025-11&gt;&lt;a class=footnote-ref href=#fn:2:dependency-cutout-workflow-pattern-2025-11 id=fnref:2&gt;2&lt;/a&gt;&lt;/sup&gt; and 3 hours of implementation work to fix the
  issue in LibBar, but 0 hours of wrangling with Legal and 40 hours of
  implementation work in FooApp, a FooApp developer will often perceive it as
  “easier” to fix the issue downstream.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;it’s difficult for corporate developers to get clearance from management to
  spend even small amounts of &lt;em&gt;money&lt;/em&gt; sponsoring upstream reviewers, so even if
  they can find the time to contribute the fix, chances are high that it will
  remain stuck in review unless they are personally well-integrated members of
  the LibBar development team already.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;even assuming there’s zero pressure whatsoever to avoid open sourcing the
  upstream changes, there’s still the fact inherent to any development team
  that FooApp’s developers will be more familiar with FooApp’s codebase and
  development processes than they are with LibBar’s.  It’s just &lt;em&gt;easier&lt;/em&gt; to
  work there, even if all other things are equal.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;systems for tracking risk from open source dependencies often lack visibility
  into vendoring, particularly if you’re doing a hybrid approach and only
  vendoring a &lt;em&gt;few&lt;/em&gt; things to address work in progress, rather than a
  comprehensive and disciplined approach to a monorepo.  If you fully absorb a
  vendored dependency and then modify it, Dependabot isn’t going to tell you
  that a new version is available any more, because it won’t be present in your
  dependency list.  Organizationally this is bad, of course, but from the
  perspective of an &lt;em&gt;individual developer&lt;/em&gt; it manifests mostly as fewer
  annoying emails.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But there are problems on the open source side as well.  Those problems are all
derived from one big issue: because we’re often working with relatively small
sums of money, it’s hard for upstream open source developers to &lt;em&gt;consume&lt;/em&gt;
either money or patches from application developers.  It’s nice to say that you
should contribute money to your dependencies, and you absolutely &lt;em&gt;should&lt;/em&gt;, but
the cost-benefit function is discontinuous.  Before a project reaches the
fiscal threshold where it can be at least &lt;em&gt;one&lt;/em&gt; person’s full-time job to worry
about this stuff, there’s often no-one responsible in the first place.
Developers will therefore gravitate to the issues that are either fun, or
relevant to their &lt;em&gt;own&lt;/em&gt; job.&lt;/p&gt;
&lt;p&gt;These mutually-reinforcing incentive structures are a big reason that users of
open source infrastructure, even teams who work at corporate users with
zillions of dollars, don’t reliably contribute back.&lt;/p&gt;
&lt;h2 id=the-answer-we-want&gt;The Answer We Want&lt;/h2&gt;
&lt;p&gt;All those options are bad. If we had a good option, what would it look like?&lt;/p&gt;
&lt;p&gt;It is both practically necessary&lt;sup id=fnref:3:dependency-cutout-workflow-pattern-2025-11&gt;&lt;a class=footnote-ref href=#fn:3:dependency-cutout-workflow-pattern-2025-11 id=fnref:3&gt;3&lt;/a&gt;&lt;/sup&gt; and morally required&lt;sup id=fnref:4:dependency-cutout-workflow-pattern-2025-11&gt;&lt;a class=footnote-ref href=#fn:4:dependency-cutout-workflow-pattern-2025-11 id=fnref:4&gt;4&lt;/a&gt;&lt;/sup&gt; for you to have a
way to temporarily rely on a modified version of an open source dependency,
&lt;em&gt;without&lt;/em&gt; permanently diverging.&lt;/p&gt;
&lt;p&gt;Below, I will describe a desirable abstract workflow for achieving this goal.&lt;/p&gt;
&lt;h3 id=step-0-report-the-problem&gt;Step 0: Report the Problem&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Before&lt;/em&gt; you get started with any of these other steps, write up a clear
description of the problem and report it to the project as an issue;
specifically, &lt;em&gt;in contrast to&lt;/em&gt; writing it up as a pull request.  Describe the
problem &lt;em&gt;before&lt;/em&gt; submitting a solution.&lt;/p&gt;
&lt;p&gt;You may not be able to wait for a volunteer-run open source project to respond
to your request, but you should &lt;em&gt;at least&lt;/em&gt; tell the project what you’re
planning on doing.&lt;/p&gt;
&lt;p&gt;If you don’t hear back from them at all, you will have at least made sure to
comprehensively describe your issue and strategy beforehand, which will provide
some clarity and focus to your changes.&lt;/p&gt;
&lt;p&gt;If you &lt;em&gt;do&lt;/em&gt; hear back from them, in the worst case scenario, you may discover
that a hard fork will be necessary because they don’t consider your issue
valid, but even that information will save you time, if you know it before you
get started.  In the best case, you may get a reply from the project telling
you that you’ve misunderstood its functionality and that there is already a
configuration parameter or usage pattern that will resolve your problems with
no new code.  But in all cases, you will benefit from early coordination on
&lt;em&gt;what&lt;/em&gt; needs fixing before you get to &lt;em&gt;how&lt;/em&gt; to fix it.&lt;/p&gt;
&lt;h3 id=step-1-source-code-and-ci-setup&gt;Step 1: Source Code and CI Setup&lt;/h3&gt;
&lt;p&gt;Fork the source code for your upstream dependency to a writable location where
it can live at least for the duration of this one bug-fix, and possibly for the
duration of your application’s use of the dependency.  After all, you might
want to fix more than &lt;em&gt;one&lt;/em&gt; bug in LibBar.&lt;/p&gt;
&lt;p&gt;You want to have a place where you can put your edits, that will be version
controlled and code reviewed according to your normal development process.
This probably means you’ll need to have your own main branch that diverges from
your upstream’s main branch.&lt;/p&gt;
&lt;p&gt;Remember: you’re going to need to deploy this to &lt;em&gt;your production&lt;/em&gt;, so testing
gates that your upstream only applies to final releases of LibBar will need to
be applied to every commit here.&lt;/p&gt;
&lt;p&gt;Depending on LibBar’s own development process, this may result in slightly
unusual configurations where, for example, your fixes are written against the
last LibBar release tag, rather than its current&lt;sup id=fnref:5:dependency-cutout-workflow-pattern-2025-11&gt;&lt;a class=footnote-ref href=#fn:5:dependency-cutout-workflow-pattern-2025-11 id=fnref:5&gt;5&lt;/a&gt;&lt;/sup&gt; &lt;code&gt;main&lt;/code&gt;; if the project has a branch-freshness requirement, you
might need two branches, one for your upstream PR (based on main) and one for
your own use (based on the release branch with your changes).&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Ideally&lt;/em&gt; for projects with really good CI and a strong “keep main
release-ready at all times” policy, you can deploy straight from a development
branch, but it’s good to take a moment to consider this before you get started.
It’s usually easier to rebase changes from an older HEAD onto a newer one than
it is to go backwards.&lt;/p&gt;
&lt;p&gt;Speaking of CI, you will want to have your own CI system. The fact that GitHub
Actions has become a de-facto lingua franca of continuous integration means
that this step may be quite simple, and your forked repo can just run its own
instance.&lt;/p&gt;
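&lt;p&gt;Often, "your own CI system" for the fork is nothing more than the upstream
workflow files continuing to run on your fork, possibly plus one small workflow
that runs the tests on every push rather than only on release tags.  A
hypothetical minimal sketch (the install and test commands are assumptions,
not LibBar's real ones):&lt;/p&gt;

```yaml
# .github/workflows/fork-ci.yml in your LibBar fork (hypothetical):
# run the test suite on every push, since every commit here may go
# straight to your production rather than waiting for a release tag.
name: fork-ci
on: [push, pull_request]
jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -e ".[test]" && pytest
```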
&lt;h4 id=optional-bonus-step-1a-artifact-management&gt;Optional Bonus Step 1a: Artifact Management&lt;/h4&gt;
&lt;p&gt;If you have an in-house artifact repository, you should set that up for your
dependency too, and upload your own build artifacts to it.  You can often treat
your modified dependency as an extension of your own source tree and install
from a GitHub URL, but if you’ve already gone to the trouble of having an
in-house package repository, you can pretend you’ve taken over maintenance of
the upstream package temporarily (which you kind of have) and leverage those
workflows for caching and build-time savings as you would with any other
internal repo.&lt;/p&gt;
&lt;h3 id=step-2-do-the-fix&gt;Step 2: Do The Fix&lt;/h3&gt;
&lt;p&gt;Now that you’ve got somewhere to edit LibBar’s code, you will want to actually
fix the bug.&lt;/p&gt;
&lt;h4 id=step-2a-local-filesystem-setup&gt;Step 2a: Local Filesystem Setup&lt;/h4&gt;
&lt;p&gt;&lt;em&gt;Before&lt;/em&gt; you have a production version on your own deployed branch, you’ll want
to test locally, which means having &lt;em&gt;both&lt;/em&gt; repositories in a single integrated
development environment.&lt;/p&gt;
&lt;p&gt;At this point, you will want to have a &lt;em&gt;local filesystem reference&lt;/em&gt; to your
LibBar dependency, so that you can make real-time edits, without going through
a slow cycle of pushing to a branch in your LibBar fork, pushing to a FooApp
branch, and waiting for all of CI to run on both.&lt;/p&gt;
&lt;p&gt;This is useful in both directions: as you prepare the FooApp branch that makes
any necessary updates on that end, you’ll want to make sure that FooApp can
exercise the LibBar fix in any integration tests.  As you work on the LibBar
fix itself, you’ll also want to be able to use FooApp to exercise the code and
see if you’ve missed anything; that feedback you wouldn’t get in CI, since
LibBar can’t depend on FooApp itself.&lt;/p&gt;
&lt;p&gt;In short, you want to be able to treat both projects as an integrated
&lt;em&gt;development environment&lt;/em&gt;, with support from your usual testing and debugging
tools, just as much as you want your deployment output to be an integrated
artifact.&lt;/p&gt;
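&lt;p&gt;As a concrete sketch of what such a local filesystem reference can look
like (tool-specific details are the subject of part 2, and the path and names
here are hypothetical), a &lt;code&gt;pyproject.toml&lt;/code&gt; using &lt;code&gt;uv&lt;/code&gt;
might point FooApp at a sibling checkout of LibBar:&lt;/p&gt;

```toml
# FooApp's pyproject.toml: while you work on the fix, resolve "libbar"
# from a local editable checkout instead of the package index, so edits
# take effect immediately.  (Remove this override when you're done.)
[tool.uv.sources]
libbar = { path = "../libbar", editable = true }
```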
&lt;h4 id=step-2b-branch-setup-for-pr&gt;Step 2b: Branch Setup for PR&lt;/h4&gt;
&lt;p&gt;However, for continuous integration to work, you will &lt;em&gt;also&lt;/em&gt; need to have a
remote resource reference of some kind from FooApp’s branch to LibBar.  You
will need 2 pull requests: the first to land your LibBar changes to your
internal LibBar fork and make sure it’s passing its &lt;em&gt;own&lt;/em&gt; tests, and then a
second PR to switch your LibBar dependency from the public repository to your
internal fork.&lt;/p&gt;
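&lt;p&gt;That second PR can be quite small.  In Python packaging terms, for example,
it might amount to swapping a normal version specifier for a direct git
reference to your fork (the repository and branch names here are
hypothetical):&lt;/p&gt;

```toml
# FooApp's pyproject.toml.
# Before: a normal index dependency, e.g. "libbar >= 1.2".
# After: your internal fork, pinned to the branch carrying the fix.
[project]
dependencies = [
  "libbar @ git+https://github.com/yourorg/libbar.git@fooapp-hotfix",
]
```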
&lt;p&gt;At this step it is &lt;em&gt;very important&lt;/em&gt; to ensure that there is an issue filed on
your own internal backlog to drop your LibBar fork.  You do not want to lose
track of this work; it is technical debt that must be addressed.&lt;/p&gt;
&lt;p&gt;Until it’s addressed, automated tools like Dependabot will not be able to apply
security updates to LibBar for you; you’re going to need to manually integrate
every upstream change.  This type of work is itself very easy to drop or lose
track of, so you might just end up stuck on a vulnerable version.&lt;/p&gt;
&lt;h3 id=step-3-deploy-internally&gt;Step 3: Deploy Internally&lt;/h3&gt;
&lt;p&gt;Now that you’re confident that the fix will work, and that your
temporarily-internally-maintained version of LibBar isn’t going to break
anything on &lt;em&gt;your&lt;/em&gt; site, it’s time to deploy.&lt;/p&gt;
&lt;p&gt;Some &lt;a href="https://www.esa.int/Applications/Connectivity_and_Secure_Communications/Atlas_lifts_satcom_heritage#:~:text=They%20need%20proof%20that%20it%20has%20already%20worked%20in%20space%2C%20that%20it%20has%20‘flight%20heritage’"&gt;deployment
heritage&lt;/a&gt;
should help to provide &lt;em&gt;some&lt;/em&gt; evidence that your fix is ready to land in
LibBar, but at the next step, please remember that your production environment
isn’t necessarily emblematic of that of all LibBar users.&lt;/p&gt;
&lt;h3 id=step-4-propose-externally&gt;Step 4: Propose Externally&lt;/h3&gt;
&lt;p&gt;You’ve got the fix, you’ve tested the fix, you’ve got the fix in your own
production, you’ve told upstream you want to send them some changes.  Now, it’s
time to make the pull request.&lt;/p&gt;
&lt;p&gt;You’re likely going to get some feedback on the PR, even if you think it’s
already ready to go; as I said, despite having been proven in &lt;em&gt;your&lt;/em&gt; production
environment, you may get feedback about additional concerns from other users
that you’ll need to address before LibBar’s maintainers can land it.&lt;/p&gt;
&lt;p&gt;As you process the feedback, make sure that each new iteration of your branch
gets re-deployed to your own production. It would be a huge bummer to go
through all this trouble, and then end up unable to deploy the next publicly
released version of LibBar within FooApp because you forgot to test that your
responses to feedback &lt;em&gt;still worked&lt;/em&gt; on your own environment.&lt;/p&gt;
&lt;h4 id=step-4a-hurry-up-and-wait&gt;Step 4a: Hurry Up And Wait&lt;/h4&gt;
&lt;p&gt;If you’re lucky, upstream will land your changes to LibBar.  But, there’s still
no release version available.  Here, you’ll have to stay in a holding pattern
until upstream can finalize the release on their end.&lt;/p&gt;
&lt;p&gt;Depending on some particulars, it &lt;em&gt;might&lt;/em&gt; make sense at this point to archive
your internal LibBar repository and move your pinned release version to a git
hash of the LibBar version where your fix landed, in their repository.&lt;/p&gt;
&lt;p&gt;Before you do this, check in with the LibBar core team and make sure that they
understand that’s what you’re doing and they don’t have any wacky workflows
which may involve rebasing or eliding that commit as part of their release
process.&lt;/p&gt;
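&lt;p&gt;If you do take that route, the pin itself is just another direct reference,
now aimed back at the upstream repository; the hash below is a placeholder for
the real merge commit, not a value you can copy:&lt;/p&gt;

```toml
# FooApp's pyproject.toml: pin to the upstream commit containing your
# landed fix, while you wait for the next official LibBar release.
[project]
dependencies = [
  "libbar @ git+https://github.com/libbar/libbar.git@<commit-sha-of-fix>",
]
```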
&lt;h3 id=step-5-unwind-everything&gt;Step 5: Unwind Everything&lt;/h3&gt;
&lt;p&gt;Finally, you eventually want to stop carrying any patches and move back to an
official released version that integrates your fix.&lt;/p&gt;
&lt;p&gt;You want to do this because this is what the upstream will expect when you are
reporting bugs.  Part of the benefit of using open source is benefiting from
the collective work to do bug-fixes and such, so you don’t want to be stuck off
on a pinned git hash that the developers do not support for anyone else.&lt;/p&gt;
&lt;p&gt;As I said in step 2b&lt;sup id=fnref:6:dependency-cutout-workflow-pattern-2025-11&gt;&lt;a class=footnote-ref href=#fn:6:dependency-cutout-workflow-pattern-2025-11 id=fnref:6&gt;6&lt;/a&gt;&lt;/sup&gt;, make sure to &lt;em&gt;maintain a tracking task&lt;/em&gt; for doing this
work, because leaving this sort of relatively &lt;em&gt;easy&lt;/em&gt;-to-clean-up technical debt
lying around is something that can potentially create a lot of aggravation for
no particular benefit.  Make sure to put your internal LibBar repository into
an appropriate state at this point as well.&lt;/p&gt;
&lt;h2 id=up-next&gt;Up Next&lt;/h2&gt;
&lt;p&gt;This is part 1 of a 2-part series.  In part 2, I will explore in depth how to
execute this workflow specifically for Python packages, using some popular
tools.  I’ll discuss my own workflow, standards like PEP 517 and
&lt;code&gt;pyproject.toml&lt;/code&gt;, and of course, by the popular demand that I just &lt;em&gt;know&lt;/em&gt; will
come, &lt;a href="https://github.com/astral-sh/uv"&gt;&lt;code&gt;uv&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=acknowledgments&gt;Acknowledgments&lt;/h2&gt;
&lt;p class=update-note&gt;Thank you to &lt;a href="/pages/patrons.html"&gt;my patrons&lt;/a&gt; who are supporting my writing on
this blog.  If you like what you’ve read here and you’d like to read more of
it, or you’d like to support my &lt;a href="https://github.com/glyph/"&gt;various open-source
endeavors&lt;/a&gt;, you can &lt;a href="/pages/patrons.html"&gt;support my work as a
sponsor&lt;/a&gt;!&lt;/p&gt;
&lt;div class=footnote&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=fn:1:dependency-cutout-workflow-pattern-2025-11&gt;
&lt;p id=fn:1&gt;if you already have all the tooling associated with a monorepo,
&lt;em&gt;including&lt;/em&gt; the ability to manage divergence and reintegrate patches with
upstream, you already have the higher-overhead version of the workflow I am
going to propose, so never mind.  But chances are you don’t have that; very
few companies do. &lt;a class=footnote-backref href=#fnref:1:dependency-cutout-workflow-pattern-2025-11 title="Jump back to footnote 1 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:2:dependency-cutout-workflow-pattern-2025-11&gt;
&lt;p id=fn:2&gt;In any business where one must wrangle with Legal, 3 hours is a &lt;em&gt;wildly&lt;/em&gt;
optimistic estimate. &lt;a class=footnote-backref href=#fnref:2:dependency-cutout-workflow-pattern-2025-11 title="Jump back to footnote 2 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:3:dependency-cutout-workflow-pattern-2025-11&gt;
&lt;p id=fn:3&gt;&lt;a href="https://mastodon.social/@mcc/112117339397138167"&gt;c.f. &lt;code&gt;@mcc@mastodon.social&lt;/code&gt;&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:3:dependency-cutout-workflow-pattern-2025-11 title="Jump back to footnote 3 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:4:dependency-cutout-workflow-pattern-2025-11&gt;
&lt;p id=fn:4&gt;&lt;a href="https://mastodon.social/@geofft/112186487032016599"&gt;c.f. &lt;code&gt;@geofft@mastodon.social&lt;/code&gt;&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:4:dependency-cutout-workflow-pattern-2025-11 title="Jump back to footnote 4 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:5:dependency-cutout-workflow-pattern-2025-11&gt;
&lt;p id=fn:5&gt;In an ideal world every project would &lt;a href="https://martinfowler.com/articles/continuousIntegration.html#FixBrokenBuildsImmediately"&gt;keep its main branch ready to
release at all times, no matter
what&lt;/a&gt;
but we do not live in an ideal world. &lt;a class=footnote-backref href=#fnref:5:dependency-cutout-workflow-pattern-2025-11 title="Jump back to footnote 5 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:6:dependency-cutout-workflow-pattern-2025-11&gt;
&lt;p id=fn:6&gt;In this case, there is no question. It’s 2b only, no not-2b. &lt;a class=footnote-backref href=#fnref:6:dependency-cutout-workflow-pattern-2025-11 title="Jump back to footnote 6 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;&lt;/body&gt;</content><category term="misc"></category><category term="programming"></category><category term="python"></category><category term="deployment"></category><category term="open-source"></category></entry><entry><title>The Futzing Fraction</title><link href="https://blog.glyph.im/2025/08/futzing-fraction.html" rel="alternate"></link><published>2025-08-15T00:51:00-07:00</published><updated>2025-08-15T00:51:00-07:00</updated><author><name>Glyph</name></author><id>tag:blog.glyph.im,2025-08-15:/2025/08/futzing-fraction.html</id><summary type="html">&lt;p&gt;At least &lt;em&gt;some&lt;/em&gt; of your time with genAI will be spent just kind of… futzing with it.&lt;/p&gt;</summary><content type="html">&lt;body&gt;&lt;p&gt;The most optimistic vision of generative AI&lt;sup id=fnref:1:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:1:futzing-fraction-2025-8 id=fnref:1&gt;1&lt;/a&gt;&lt;/sup&gt; is that it will relieve us of
the tedious, repetitive elements of knowledge work so that we can get to work
on the &lt;em&gt;really&lt;/em&gt; interesting problems that such tedium stands in the way of.
Even if you fully believe in this vision, it’s hard to deny that &lt;em&gt;today&lt;/em&gt;, some
tedium is associated with the process of using generative AI itself.&lt;/p&gt;
&lt;p&gt;Generative AI also
&lt;a href="https://futurism.com/the-byte/openai-chatgpt-pro-subscription-losing-money"&gt;isn’t&lt;/a&gt;
&lt;a href="https://www.computerworld.com/article/4021954/rushing-into-genai-prepare-for-budget-blowouts-and-broken-promises.html"&gt;free&lt;/a&gt;,
and so, as responsible consumers, we need to ask: is it worth it?  What’s the
&lt;a href="https://www.investopedia.com/articles/basics/10/guide-to-calculating-roi.asp"&gt;ROI&lt;/a&gt;
of genAI, and how can we tell?  In this post, I’d like to explore a logical
framework for evaluating genAI expenditures, to determine if your organization
is getting its money’s worth.&lt;/p&gt;
&lt;h1 id=perpetually-proffering-permuted-prompts&gt;Perpetually Proffering Permuted Prompts&lt;/h1&gt;
&lt;p&gt;I think most LLM users would agree with me that a typical workflow with an LLM
rarely involves prompting it only one time and getting a perfectly useful
answer that solves the whole problem.&lt;/p&gt;
&lt;p&gt;Generative AI best practices, even &lt;a href="https://techcommunity.microsoft.com/blog/azuredevcommunityblog/evaluating-generative-ai-best-practices-for-developers/4271488#:~:text=Frequent%20and%20scheduled%20evaluations%20should%20be%20embedded%20into%20the%20development%20cycle"&gt;from the most optimistic
vendors&lt;/a&gt;,
all suggest that you should continuously evaluate everything.  ChatGPT, which
is really the
&lt;a href="https://www.wheresyoured.at/the-haters-gui/#:~:text=ChatGPT%20has%20500%20million%20weekly%20users%2C%20and%20otherwise%2C%20it%20seems%20that%20other%20services%20struggle%20to%20get%2015%20million%20of%20them"&gt;only&lt;/a&gt;
genAI product with significantly scaled adoption, still says at the bottom of
&lt;em&gt;every&lt;/em&gt; interaction:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;ChatGPT can make mistakes. Check important info.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If we have to “check important info” on every interaction, it stands to reason
that even if we think it’s useful, &lt;em&gt;some&lt;/em&gt; of those checks will find an error.
Again, if we think it’s useful, presumably the next thing to do is to perturb
our prompt somehow, and issue it again, in the hopes that the next invocation
will, by dint of either:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href="https://pivot-to-ai.com/2025/06/05/generative-ai-runs-on-gambling-addiction-just-one-more-prompt-bro/"&gt;better luck this
time&lt;/a&gt;
   with the &lt;a href="https://en.wikipedia.org/wiki/Stochastic_parrot"&gt;stochastic&lt;/a&gt; aspect of the inference process,&lt;/li&gt;
&lt;li&gt;enhanced application of our skill to
   &lt;a href="https://www.fastcompany.com/91327911/prompt-engineering-going-extinct"&gt;engineer&lt;/a&gt;
   a better prompt based on the deficiencies of the current inference, or&lt;/li&gt;
&lt;li&gt;better performance of the model by populating additional
   &lt;a href="https://research.trychroma.com/context-rot"&gt;context&lt;/a&gt; in subsequent chained
   prompts.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Unfortunately, given the relative lack of &lt;em&gt;reliable&lt;/em&gt; methods to re-generate the
prompt and receive a better answer&lt;sup id=fnref:2:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:2:futzing-fraction-2025-8 id=fnref:2&gt;2&lt;/a&gt;&lt;/sup&gt;, checking the output and re-prompting
the model can feel like just kinda futzing around with it.  You try, you get a
wrong answer, you try a few more times, eventually you get the right answer
that you wanted in the first place.  It’s a somewhat unsatisfying process, but
if you get the right answer eventually, it does feel like progress, and you
didn’t need to use up another human’s time.&lt;/p&gt;
&lt;p&gt;In fact, the hottest buzzword of the last hype cycle is “agentic”.  While I
have my own feelings about this particular word&lt;sup id=fnref:3:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:3:futzing-fraction-2025-8 id=fnref:3&gt;3&lt;/a&gt;&lt;/sup&gt;, its current &lt;em&gt;practical&lt;/em&gt;
definition is “a generative AI system which automates the process of
re-prompting itself, by having a deterministic program evaluate its outputs for
correctness”.&lt;/p&gt;
&lt;p&gt;A better term for an “agentic” system would be a “self-futzing system”.&lt;/p&gt;
&lt;p&gt;However, the ability to automate &lt;em&gt;some&lt;/em&gt; level of checking and re-prompting does
not mean that you can &lt;em&gt;fully&lt;/em&gt; delegate tasks to an agentic tool, either.  It
is, plainly put, not safe. If you leave the AI on its own, you will get
&lt;em&gt;terrible&lt;/em&gt; results that will at best make for a funny story&lt;sup id=fnref:4:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:4:futzing-fraction-2025-8 id=fnref:4&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;sup id=fnref:5:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:5:futzing-fraction-2025-8 id=fnref:5&gt;5&lt;/a&gt;&lt;/sup&gt; and at
worst might end up causing serious damage&lt;sup id=fnref:6:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:6:futzing-fraction-2025-8 id=fnref:6&gt;6&lt;/a&gt;&lt;/sup&gt;&lt;sup id=fnref:7:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:7:futzing-fraction-2025-8 id=fnref:7&gt;7&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Taken together, this all means that for &lt;em&gt;any&lt;/em&gt; consequential task that you want
to accomplish with genAI, you need an expert &lt;a href="https://en.wikipedia.org/wiki/Human-in-the-loop"&gt;human in the
loop&lt;/a&gt;.  The human must be
capable of independently doing the job that the genAI system is being asked to
accomplish.&lt;/p&gt;
&lt;p&gt;When the genAI guesses correctly and produces usable output, some of the
human’s time will be saved.  When the genAI guesses wrong and produces
hallucinatory gibberish or even “correct” output that nevertheless fails to
account for some unstated but necessary property such as security or scale,
some of the human’s time will be wasted evaluating it and re-trying it.&lt;/p&gt;
&lt;h1 id=income-from-investment-in-inference&gt;Income from Investment in Inference&lt;/h1&gt;
&lt;p&gt;Let’s evaluate an abstract, hypothetical genAI system that can automate some
work for our organization.  To avoid implicating any specific vendor, let’s
call the system “Mallory”.&lt;/p&gt;
&lt;p&gt;Is Mallory worth the money?  How can we know?&lt;/p&gt;
&lt;p&gt;Logically, there are only two outcomes that might result from using Mallory to
do our work.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;We prompt Mallory to do some work; we check its work, it is correct, and
   some time is saved.&lt;/li&gt;
&lt;li&gt;We prompt Mallory to do some work; we check its work, it fails, and we futz
   around with the result; this time is wasted.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As a &lt;em&gt;logical&lt;/em&gt; framework, this makes sense, but ROI is an arithmetical concept,
not a logical one.  So let’s translate this into some terms.&lt;/p&gt;
&lt;p&gt;In order to evaluate Mallory, let’s define the Futzing Fraction, “&lt;math&gt;
&lt;mi&gt;FF&lt;/mi&gt; &lt;/math&gt;”, in terms of the following variables:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;math&gt;&lt;mi&gt;H&lt;/mi&gt;&lt;/math&gt;&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;the average amount of time a &lt;strong&gt;&lt;em&gt;H&lt;/em&gt;&lt;/strong&gt;uman worker would take to do a task,
unaided by Mallory&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;math&gt;&lt;mi&gt;I&lt;/mi&gt;&lt;/math&gt;&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;the amount of time that Mallory takes to run one &lt;strong&gt;&lt;em&gt;I&lt;/em&gt;&lt;/strong&gt;nference&lt;sup id=fnref:8:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:8:futzing-fraction-2025-8 id=fnref:8&gt;8&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;math&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt;&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;the amount of time that a human has to spend &lt;strong&gt;&lt;em&gt;C&lt;/em&gt;&lt;/strong&gt;hecking Mallory’s output for
each inference&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt;&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;the &lt;strong&gt;&lt;em&gt;P&lt;/em&gt;&lt;/strong&gt;robability that Mallory will produce a correct inference for each prompt&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;math&gt;&lt;mi&gt;W&lt;/mi&gt;&lt;/math&gt;&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;the average amount of time that it takes for a human to &lt;strong&gt;&lt;em&gt;W&lt;/em&gt;&lt;/strong&gt;rite one prompt for
Mallory&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;math&gt;&lt;mi&gt;E&lt;/mi&gt;&lt;/math&gt;&lt;/dt&gt;
&lt;dd&gt;
&lt;p&gt;since we are normalizing everything to &lt;em&gt;time&lt;/em&gt;, rather than &lt;em&gt;money&lt;/em&gt;, we do also have to account for the dollar cost of Mallory as a product, so we will include the &lt;strong&gt;&lt;em&gt;E&lt;/em&gt;&lt;/strong&gt;quivalent amount of human time we could purchase for the marginal cost of one&lt;sup id=fnref:9:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:9:futzing-fraction-2025-8 id=fnref:9&gt;9&lt;/a&gt;&lt;/sup&gt; inference.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;As in last week’s example of &lt;a href="https://blog.glyph.im/2025/08/r0mls-ratio.html"&gt;simple ROI
arithmetic&lt;/a&gt;, we will put our costs in the
numerator, and our benefits in the denominator.&lt;/p&gt;
&lt;div style="font-size: 30px; text-align: center;"&gt;
&lt;math&gt;
    &lt;mi&gt;FF&lt;/mi&gt; &lt;mo&gt; = &lt;/mo&gt;
    &lt;mfrac&gt;
        &lt;mrow&gt;&lt;mi&gt;W&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;I&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;E&lt;/mi&gt;&lt;/mrow&gt;
        &lt;mrow&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;!--&lt;mo&gt;✕&lt;/mo&gt;--&gt; &lt;mi&gt;H&lt;/mi&gt;&lt;/mrow&gt;
    &lt;/mfrac&gt;
&lt;/math&gt;
&lt;/div&gt;

&lt;p&gt;The idea here is that for each prompt, the &lt;em&gt;minimum&lt;/em&gt; amount of time-equivalent cost possible is &lt;math&gt;&lt;mi&gt;W&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;I&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;E&lt;/mi&gt;&lt;/math&gt;.  The user must, at least once, write a prompt, wait for inference to run, then check the output; and, of course, pay any costs to Mallory’s vendor.&lt;/p&gt;
&lt;p&gt;If the probability of a correct answer is &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mn&gt;3&lt;/mn&gt;&lt;/mfrac&gt;&lt;/math&gt;, then they will do this entire process 3 times&lt;sup id=fnref:10:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:10:futzing-fraction-2025-8 id=fnref:10&gt;10&lt;/a&gt;&lt;/sup&gt;, so we put &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt; in the denominator.  Finally, we divide everything by &lt;math&gt;&lt;mi&gt;H&lt;/mi&gt;&lt;/math&gt;, because we are trying to determine if we are actually saving any time or money, versus just letting our existing human, who has to be driving this process anyway, do the whole thing.&lt;/p&gt;
&lt;p&gt;If the Futzing Fraction evaluates to a number greater than 1, &lt;a href="https://blog.glyph.im/2025/08/r0mls-ratio.html"&gt;as previously discussed, you are a bozo&lt;/a&gt;; you’re spending more time futzing with Mallory than getting value out of it.&lt;/p&gt;
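&lt;p&gt;As a sanity check, the fraction is easy to compute directly. The sketch below uses the variable names defined above; the inputs are purely hypothetical (all times in minutes):&lt;/p&gt;

```python
def futzing_fraction(W, I, C, E, P, H):
    """Compute FF = (W + I + C + E) / (P * H).

    W: time to write one prompt
    I: time for one inference to run
    C: time to check one inference's output
    E: human-time equivalent of one inference's marginal cost
    P: probability that a single inference is correct
    H: time for a human to do the task unaided
    """
    return (W + I + C + E) / (P * H)

# Hypothetical numbers: 2 min to prompt, 1 min of inference, 15 min of
# checking, 1 min of cost-equivalent time, a 1-in-3 success rate, all
# against a 45-minute task done by hand.
ff = futzing_fraction(W=2, I=1, C=15, E=1, P=1 / 3, H=45)
print(f"{ff:.2f}")  # prints 1.27
```

&lt;p&gt;With those made-up inputs the fraction comes out above 1: the futzing costs more than the time it saves.&lt;/p&gt;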
&lt;h1 id=figuring-out-the-fraction-is-frustrating&gt;Figuring out the Fraction is Frustrating&lt;/h1&gt;
&lt;p&gt;In order to evaluate the Futzing Fraction, though, you have to have a sound
method for getting at least a vague sense of all of its terms.&lt;/p&gt;
&lt;p&gt;If you are a business leader, a lot of this is relatively easy to measure.  You
vaguely know what &lt;math&gt;&lt;mi&gt;H&lt;/mi&gt;&lt;/math&gt; is, because you know what your
payroll costs, and similarly, you can figure out &lt;math&gt;&lt;mi&gt;E&lt;/mi&gt;&lt;/math&gt; with
some pretty trivial arithmetic based on Mallory’s pricing table.  There are endless
YouTube channels, spec sheets and benchmarks to give you &lt;math&gt;&lt;mi&gt;I&lt;/mi&gt;&lt;/math&gt;. &lt;math&gt;&lt;mi&gt;W&lt;/mi&gt;&lt;/math&gt; is probably going to be so small compared to &lt;math&gt;&lt;mi&gt;H&lt;/mi&gt;&lt;/math&gt; that it hardly merits consideration&lt;sup id=fnref:11:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:11:futzing-fraction-2025-8 id=fnref:11&gt;11&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;But, are you measuring &lt;math&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt;?  If your employees are &lt;em&gt;not&lt;/em&gt; checking the outputs of the AI, you’re on a path to catastrophe that no ROI calculation can capture, so it had &lt;em&gt;better&lt;/em&gt; be greater than zero.&lt;/p&gt;
&lt;p&gt;Are you measuring &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt;?  How often does the AI get it right on the first try?&lt;/p&gt;
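&lt;p&gt;One way to make that question actionable: setting the fraction equal to 1 and solving for &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt; gives a break-even success rate for whatever estimates of the other terms you do have. A sketch, again with hypothetical numbers in minutes:&lt;/p&gt;

```python
def break_even_p(W, I, C, E, H):
    """The success rate at which FF = (W + I + C + E) / (P * H) equals
    exactly 1; any lower P and Mallory costs more time than it saves."""
    return (W + I + C + E) / H

# Hypothetical: 2 min to prompt, 1 min of inference, 15 min of checking,
# 1 min of cost-equivalent time, against a 45-minute task done by hand.
p_min = break_even_p(W=2, I=1, C=15, E=1, H=45)
print(f"{p_min:.0%}")  # prints 42%
```

&lt;p&gt;If you can’t say with a straight face that Mallory beats that number on the first try, the arithmetic is already against you.&lt;/p&gt;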
&lt;h2 id=challenges-to-computing-checking-costs&gt;Challenges to Computing Checking Costs&lt;/h2&gt;
&lt;p&gt;In the fraction defined above, the term &lt;math&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt; is going to be
large.  Larger than you think.&lt;/p&gt;
&lt;p&gt;Measuring &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt; and &lt;math&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt; with a high
degree of precision is probably going to be very hard; possibly unreasonably
so, or too expensive&lt;sup id=fnref:12:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:12:futzing-fraction-2025-8 id=fnref:12&gt;12&lt;/a&gt;&lt;/sup&gt; to bother with in practice.  So you will undoubtedly need
to work with estimates and proxy metrics.  But you have to be aware that this
is a problem domain where your normal method of estimating is going to be
&lt;em&gt;extremely&lt;/em&gt; vulnerable to inherent cognitive bias, so you will have to find ways to measure anyway.&lt;/p&gt;
&lt;h3 id=margins-money-and-metacognition&gt;Margins, Money, and Metacognition&lt;/h3&gt;
&lt;p&gt;First let’s discuss cognitive and metacognitive bias.&lt;/p&gt;
&lt;p&gt;My favorite cognitive bias is the &lt;a href="https://en.wikipedia.org/wiki/Availability_heuristic"&gt;availability
heuristic&lt;/a&gt; and a close
second is its cousin &lt;a href="https://en.wikipedia.org/wiki/Salience_(neuroscience)#Salience_bias"&gt;salience
bias&lt;/a&gt;.
Humans are empirically predisposed towards noticing and remembering things that
are more striking, and to overestimate their frequency.&lt;/p&gt;
&lt;p&gt;If you are estimating the variables above based on the &lt;em&gt;vibe&lt;/em&gt; that you’re
getting from the experience of using an LLM, you may be overestimating its
utility.&lt;/p&gt;
&lt;p&gt;Consider a slot machine.&lt;/p&gt;
&lt;p&gt;If you put a dollar into a slot machine, and you lose that dollar, this is an
unremarkable event. Expected, even.  It doesn’t seem interesting.  You can
repeat this over and over again, a thousand times, and each time it will seem
equally unremarkable.  If you do it a thousand times, you will probably get
gradually more anxious as your sense of your dwindling bank account becomes
slowly more salient, but losing one more dollar still seems unremarkable.&lt;/p&gt;
&lt;p&gt;If you put a dollar in a slot machine and it gives you a &lt;em&gt;thousand&lt;/em&gt; dollars,
that will probably seem pretty cool.  Interesting.  Memorable.  You might tell
a story about this happening, but you definitely wouldn’t really remember any
particular time you lost one dollar.&lt;/p&gt;
&lt;p&gt;Luckily, when you arrive at a casino with slot machines, you probably know well
enough to set a hard budget in the form of some amount of physical currency you
will have available to you.  The odds are against you, you’ll probably lose it
all, but any responsible gambler will have an immediate, physical
representation of their balance in front of them, so when they have lost it
all, they can see that their hands are empty, and can try to resist the “just
one more pull” temptation, after hitting that limit.&lt;/p&gt;
&lt;p&gt;Now, consider Mallory.&lt;/p&gt;
&lt;p&gt;If you put ten minutes into writing a prompt, and Mallory gives a completely
off-the-rails, useless answer, and you lose ten minutes, well, that’s just what
using a computer is like sometimes.  Mallory malfunctioned, or hallucinated,
but it does that sometimes, everybody knows that.  You only wasted ten minutes.
It’s fine.  Not a big deal.  Let’s try it a few more times.  Just ten more
minutes.  It’ll probably work this time.&lt;/p&gt;
&lt;p&gt;If you put ten minutes into writing a prompt, and it completes a task that
would have otherwise taken you 4 hours, that feels amazing.  Like the computer
is &lt;em&gt;magic&lt;/em&gt;! An absolute endorphin rush.&lt;/p&gt;
&lt;p&gt;Very memorable.  When it happens, it feels like &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/math&gt;.&lt;/p&gt;
&lt;p&gt;But... did you have a time budget before you started?  Did you have a specified
N such that “I will give up on Mallory as soon as I have spent N minutes
attempting to solve this problem with it”?  When the jackpot finally pays out
that 4 hours, did you &lt;em&gt;notice&lt;/em&gt; that you had put 6 hours’ worth of 10-minute prompt
coins into it?&lt;/p&gt;
&lt;p&gt;If you are attempting to use the same sort of heuristic intuition that probably
works &lt;em&gt;pretty well&lt;/em&gt; for other business leadership decisions, Mallory’s
slot-machine chat-prompt user interface is practically &lt;em&gt;designed&lt;/em&gt; to subvert
those sensibilities.  Most business activities do not have nearly such an
emotionally variable, intermittent reward schedule.  They’re not going to trick
you with this sort of cognitive illusion.&lt;/p&gt;
&lt;p&gt;Thus far we have been talking about cognitive bias, but there is a
metacognitive bias at play too: while
&lt;a href="https://en.wikipedia.org/wiki/Dunning–Kruger_effect"&gt;Dunning-Kruger&lt;/a&gt;,
everybody’s favorite metacognitive bias, does have some
&lt;a href="https://www.sciencedirect.com/science/article/pii/S1877042814051489#bib0040"&gt;problems&lt;/a&gt;
with it, the main underlying metacognitive bias is that we tend to &lt;em&gt;believe our
own thoughts and perceptions&lt;/em&gt;, and it requires active effort to distance
ourselves from them, even if we know they might be wrong.&lt;/p&gt;
&lt;p&gt;This means you must assume any &lt;em&gt;intuitive&lt;/em&gt; estimate of &lt;math&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt;
is going to be biased low; similarly &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt; is going to be
biased high.  You will forget the time you spent checking, and you will
underestimate the number of times you had to re-check.&lt;/p&gt;
&lt;p&gt;To avoid this, you will need to decide on a &lt;a href="https://en.wikipedia.org/wiki/Ulysses_pact"&gt;Ulysses
pact&lt;/a&gt; to provide some inputs to a
calculation for these factors that you will not be able to fudge if
they seem wrong to you.&lt;/p&gt;
&lt;h3 id=problematically-plausible-presentation&gt;Problematically Plausible Presentation&lt;/h3&gt;
&lt;p&gt;Another nasty little cognitive-bias landmine for you to watch out for is the
&lt;a href="https://en.wikipedia.org/wiki/Authority_bias"&gt;authority bias&lt;/a&gt;, for two
reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;People will tend to see Mallory as an unbiased, external authority, and
   thereby see it as more of an authority than a similarly-situated human&lt;sup id=fnref:13:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:13:futzing-fraction-2025-8 id=fnref:13&gt;13&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;li&gt;Being an LLM, Mallory will be overconfident in its answers&lt;sup id=fnref:14:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:14:futzing-fraction-2025-8 id=fnref:14&gt;14&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The nature of LLM training is also such that commonly co-occurring tokens in
the training corpus produce higher likelihood of co-occurring in the output;
they’re just going to be closer together in the vector-space of the weights;
that’s, like, what training a model &lt;em&gt;is&lt;/em&gt;, establishing those relationships.&lt;/p&gt;
&lt;p&gt;If you’ve ever used an heuristic to informally evaluate someone’s credibility
by listening for industry-specific shibboleths or ways of describing a
particular issue, that skill is now useless.  Having ingested every industry’s
expert literature, commonly-occurring phrases will always be present in
Mallory’s output.  Mallory will usually sound like an expert, but then make
mistakes at random.&lt;sup id=fnref:15:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:15:futzing-fraction-2025-8 id=fnref:15&gt;15&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;While you might intuitively estimate &lt;math&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt; by thinking “well,
if I asked a &lt;em&gt;person&lt;/em&gt;, how could I check that &lt;em&gt;they&lt;/em&gt; were correct, and how long
would that take?” that estimate will be extremely optimistic, because the
heuristic techniques you would use to quickly evaluate incorrect information
from other humans will fail with Mallory.  You need to go all the way back to
primary sources and actually &lt;em&gt;fully&lt;/em&gt; verify the output every time, or you will
likely fall into one of these traps.&lt;/p&gt;
&lt;h3 id=mallory-mangling-mentorship&gt;Mallory Mangling Mentorship&lt;/h3&gt;
&lt;p&gt;So far, I’ve been describing the effect Mallory will have in the context of an
individual attempting to get some work done. If we are considering
organization-wide adoption of Mallory, however, we must &lt;em&gt;also&lt;/em&gt; consider the
impact on team dynamics.  There are a number of potential side effects one
might consider, but here I will focus on just one that I have observed.&lt;/p&gt;
&lt;p&gt;I have a cohort of friends in the software industry, most of whom are
individual contributors.  I’m a programmer who likes programming, so are most
of my friends, and we are also (&lt;strong&gt;sigh&lt;/strong&gt;), charitably, &lt;em&gt;pretty solidly
middle-aged&lt;/em&gt; at this point, so we tend to have a lot of experience.&lt;/p&gt;
&lt;p&gt;As such, we are often the folks that the team — or, in my case, the community —
goes to when less-experienced folks need answers.&lt;/p&gt;
&lt;p&gt;On its own, this is actually pretty great.  Answering questions from more
junior folks is one of the best parts of a software development job.  It’s an
opportunity to be helpful, mostly just by knowing a thing we already knew.  And
it’s an opportunity to help someone else improve their own agency by giving
them knowledge that they can use in the future.&lt;/p&gt;
&lt;p&gt;However, generative AI throws a bit of a wrench into the mix.&lt;/p&gt;
&lt;p&gt;Let’s imagine a scenario where we have 2 developers: Alice, a staff engineer
who has a good understanding of the system being built, and Bob, a relatively
junior engineer who is still onboarding.&lt;/p&gt;
&lt;p&gt;The traditional interaction between Alice and Bob, when Bob has a question,
goes like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Bob gets confused about something in the system being developed, because
   Bob’s understanding of the system is incorrect.&lt;/li&gt;
&lt;li&gt;Bob formulates a question based on this confusion.&lt;/li&gt;
&lt;li&gt;Bob asks Alice that question.&lt;/li&gt;
&lt;li&gt;Alice knows the system, so she gives an answer which
   accurately reflects the state of the system to Bob.&lt;/li&gt;
&lt;li&gt;Bob’s understanding of the system improves, and thus he will have fewer and
   better-informed questions going forward.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You can imagine how repeating this simple 5-step process will eventually
transform Bob into a senior developer, and then he can start answering
questions on his own.  Making sufficient time for regularly iterating this loop
is the heart of any good mentorship process.&lt;/p&gt;
&lt;p&gt;Now, though, with Mallory in the mix, the process has a new decision point,
changing it from a linear sequence to a flow chart.&lt;/p&gt;
&lt;p&gt;We begin the same way, with steps 1 and 2.  Bob’s confused, Bob formulates a
question, but then:&lt;/p&gt;
&lt;ol start=3&gt;
&lt;li&gt;Bob asks Mallory that question.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Here, our path then diverges into a “happy” path, a “meh” path, and a “sad”
path.&lt;/p&gt;
&lt;p&gt;The “happy” path proceeds like so:&lt;/p&gt;
&lt;ol start=4&gt;
&lt;li&gt;Mallory happens to formulate a correct answer.&lt;/li&gt;
&lt;li&gt;Bob’s understanding of the system improves, and thus he will have fewer and
   better-informed questions going forward.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Great. Problem solved. We just saved some of Alice’s time. But as we learned earlier,&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Mallory can make mistakes&lt;/em&gt;&lt;/strong&gt;.  When that happens, we will need to &lt;strong&gt;&lt;em&gt;check
important info&lt;/em&gt;&lt;/strong&gt;.  So let’s get checking:&lt;/p&gt;
&lt;ol start=4&gt;
&lt;li&gt;Mallory happens to formulate an &lt;em&gt;incorrect&lt;/em&gt; answer.&lt;/li&gt;
&lt;li&gt;Bob investigates this answer.&lt;/li&gt;
&lt;li&gt;Bob realizes that this answer is incorrect because it is inconsistent with
   some of his prior, correct knowledge of the system, or his investigation.&lt;/li&gt;
&lt;li&gt;Bob asks Alice the same question; GOTO traditional interaction step 4.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;On this path, Bob spent a while futzing around with Mallory, to no particular
benefit.  This wastes some of Bob’s time, but then again, Bob &lt;em&gt;could&lt;/em&gt; have
ended up on the happy path, so perhaps it was worth the risk; at least Bob
wasn’t wasting any of &lt;em&gt;Alice’s&lt;/em&gt; much more valuable time in the process.&lt;sup id=fnref:16:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:16:futzing-fraction-2025-8 id=fnref:16&gt;16&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Notice that from the start of step 4, we must allocate all of
Bob’s time to &lt;math&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt;, so &lt;math&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt; already
starts getting a bit bigger than if it were just Bob checking Mallory’s output
specifically on &lt;em&gt;tasks&lt;/em&gt; that Bob is doing.&lt;/p&gt;
&lt;p&gt;That brings us to the “sad” path.&lt;/p&gt;
&lt;ol start=4&gt;
&lt;li&gt;Mallory happens to formulate an &lt;em&gt;incorrect&lt;/em&gt; answer.&lt;/li&gt;
&lt;li&gt;Bob investigates this answer.&lt;/li&gt;
&lt;li&gt;Bob &lt;em&gt;does not realize&lt;/em&gt; that this answer is incorrect because he is unable to
   recognize any inconsistencies with his existing, incomplete knowledge of the
   system.&lt;/li&gt;
&lt;li&gt;Bob integrates Mallory’s incorrect information of the system into his mental
   model.&lt;/li&gt;
&lt;li&gt;Bob proceeds to make a larger and larger mess of his work, based on an
   incorrect mental model.&lt;/li&gt;
&lt;li&gt;Eventually, Bob asks Alice a new, worse question, based on this incorrect
   understanding.&lt;/li&gt;
&lt;li&gt;Sadly we &lt;em&gt;cannot&lt;/em&gt; return to the happy path at this point, because now Alice
    must unravel the complex series of confusing misunderstandings that Mallory
    has unfortunately conveyed to Bob.  In the &lt;em&gt;really&lt;/em&gt; sad
    case, Bob actually &lt;em&gt;doesn’t believe&lt;/em&gt; Alice for a while, because Mallory
    seems unbiased&lt;sup id=fnref:17:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:17:futzing-fraction-2025-8 id=fnref:17&gt;17&lt;/a&gt;&lt;/sup&gt;, and Alice has to waste even more time &lt;em&gt;convincing&lt;/em&gt; Bob
    before she can simply &lt;em&gt;explain&lt;/em&gt; to him.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Now, we have wasted some of Bob’s time, &lt;em&gt;and&lt;/em&gt; some of Alice’s time.  Everything
from step 5-10 is &lt;math&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt;, and as soon as Alice gets involved,
we are now adding to &lt;math&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt; at &lt;em&gt;double&lt;/em&gt; real-time.  If more
team members are pulled in to the investigation, you are now multiplying &lt;math&gt;
&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt; by the number of investigators, potentially running at triple
or quadruple real time.&lt;/p&gt;
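&lt;p&gt;To make that accounting concrete, here is a tiny sketch (the session length is hypothetical) of the point that it is person-time, not wall-clock time, that accrues to &lt;math&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt;:&lt;/p&gt;

```python
def person_minutes(wall_clock_minutes, investigators):
    """Checking time accrues as person-time: each additional investigator
    multiplies the rate at which C grows."""
    return wall_clock_minutes * investigators

# A hypothetical 30-minute untangling session:
print(person_minutes(30, 1))  # Bob alone: 30 person-minutes of C
print(person_minutes(30, 2))  # Bob plus Alice: 60, accruing at double real time
print(person_minutes(30, 4))  # pull in two more teammates: quadruple real time
```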
&lt;h3 id=but-thats-not-all&gt;But That’s Not All&lt;/h3&gt;
&lt;p&gt;Here I’ve presented a &lt;em&gt;brief&lt;/em&gt; selection of reasons why &lt;math&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt;
will be both large, and larger than you expect. To review:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Gambling-style mechanics of the user interface will interfere with your own
   self-monitoring and developing a good estimate.&lt;/li&gt;
&lt;li&gt;You can’t use human heuristics for quickly spotting bad answers.&lt;/li&gt;
&lt;li&gt;Wrong answers given to junior people who can’t evaluate them will waste more
   time from your more senior employees.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;But this is a &lt;em&gt;small&lt;/em&gt; selection of ways that Mallory’s output can cost you
money and time.  It’s harder to simplistically model second-order effects like
this, but there’s also a broad range of possibilities for ways that, rather
than simply checking and catching errors, an error slips through and starts
doing damage. Or ways in which the output isn’t exactly &lt;em&gt;wrong&lt;/em&gt;, but still
sub-optimal in ways which can be difficult to notice in the short term.&lt;/p&gt;
&lt;p&gt;For example, you might successfully vibe-code your way to launch a series of
applications, successfully “checking” the output along the way, but then
discover that the resulting code is unmaintainable garbage that prevents future
feature delivery, and needs to be re-written&lt;sup id=fnref:18:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:18:futzing-fraction-2025-8 id=fnref:18&gt;18&lt;/a&gt;&lt;/sup&gt;.  But this kind of
intellectual debt isn’t specific to code; it can
affect even such apparently genAI-amenable fields as LinkedIn content
marketing&lt;sup id=fnref:19:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:19:futzing-fraction-2025-8 id=fnref:19&gt;19&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2 id=problems-with-the-prediction-of-p&gt;Problems with the Prediction of &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt;&lt;/h2&gt;
&lt;p&gt;&lt;math&gt; &lt;mi&gt;C&lt;/mi&gt; &lt;/math&gt; isn’t the only challenging term,
though. &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt; is just as important, if not more so, and just as
hard to measure.&lt;/p&gt;

&lt;p&gt;LLM marketing materials love to phrase their accuracy in terms of a
&lt;em&gt;percentage&lt;/em&gt;.  Accuracy claims for LLMs in general tend to hover around
70%&lt;sup id=fnref:20:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:20:futzing-fraction-2025-8 id=fnref:20&gt;20&lt;/a&gt;&lt;/sup&gt;.  But these scores vary per field, and when you aggregate them across
multiple topic areas, they start to trend down. This is exactly why “agentic”
approaches for more immediately-verifiable LLM outputs (with checks like “did
the code work”) got popular in the first place: you need to try more than once.&lt;/p&gt;
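&lt;p&gt;To make “you need to try more than once” concrete: if each attempt independently succeeds with some fixed probability, the number of prompts needed follows a geometric distribution, whose mean is the reciprocal of that probability.  Here’s a minimal simulation sketch; the 70% per-attempt accuracy is an assumed, illustrative figure, not a measured one:&lt;/p&gt;

```python
import random

def attempts_until_success(q, rng):
    """Count prompts issued until one output is good enough,
    assuming each attempt independently succeeds with probability q."""
    n = 1
    while rng.random() >= q:
        n += 1
    return n

rng = random.Random(42)
q = 0.7  # assumed per-attempt accuracy, for illustration only
trials = [attempts_until_success(q, rng) for _ in range(100_000)]
mean_attempts = sum(trials) / len(trials)
print(round(mean_attempts, 2))  # close to 1/q, i.e. about 1.43
```

&lt;p&gt;Even at a generous 70% per-attempt accuracy, about 30% of tasks will need at least one re-prompt.&lt;/p&gt;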
&lt;p&gt;Independently measured claims about accuracy tend to be quite a bit lower&lt;sup id=fnref:21:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:21:futzing-fraction-2025-8 id=fnref:21&gt;21&lt;/a&gt;&lt;/sup&gt;.
The field of AI benchmarks is exploding, but it probably goes without saying
that LLM vendors game those benchmarks&lt;sup id=fnref:22:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:22:futzing-fraction-2025-8 id=fnref:22&gt;22&lt;/a&gt;&lt;/sup&gt;, because of course every incentive
would encourage them to do that.  Regardless of what their arbitrary scoring on
some benchmark might say, all that matters to &lt;em&gt;your&lt;/em&gt; business is whether it is
accurate for the problems &lt;em&gt;you&lt;/em&gt; are solving, for the way that &lt;em&gt;you&lt;/em&gt; use it.
Which is not necessarily going to correspond to any benchmark. You will need to
measure it for yourself.&lt;/p&gt;
&lt;p&gt;With that goal in mind, our formulation of &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt; must be a
somewhat harsher standard than “accuracy”.  It’s not merely “was the factual
information contained in any generated output accurate”, but “is the output
good enough that some given real knowledge-work task is &lt;em&gt;done&lt;/em&gt;, and the human
does not need to issue another prompt”?&lt;/p&gt;
&lt;h3 id=surprisingly-small-space-for-slip-ups&gt;Surprisingly Small Space for Slip-Ups&lt;/h3&gt;
&lt;p&gt;The problem with reporting these things as percentages at all, however, is that our actual definition for &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt; is &lt;math&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mi&gt;attempts&lt;/mi&gt;&lt;/mfrac&gt;&lt;/math&gt;, where &lt;math&gt;&lt;mi&gt;attempts&lt;/mi&gt;&lt;/math&gt;, for any given task, must be an integer greater than or equal to 1.&lt;/p&gt;
&lt;p&gt;Taken in aggregate, if we succeed on the first prompt more often than not, we &lt;em&gt;could&lt;/em&gt; end up with a &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;mo&gt;&amp;gt;&lt;/mo&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/mfrac&gt;&lt;/math&gt;, but combined with
the previous observation that you almost always have to prompt it more than once, the practical reality is that &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt; will start at 50% and go down from there.&lt;/p&gt;
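&lt;p&gt;Since &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt; is just the reciprocal of the average attempt count, you can estimate it directly from your own usage logs rather than from any vendor’s benchmark.  A minimal sketch, with an invented log of prompts-per-completed-task:&lt;/p&gt;

```python
# Invented attempt counts: how many prompts each completed task actually took.
attempt_log = [1, 2, 1, 3, 2, 4, 1, 2, 5, 2]

mean_attempts = sum(attempt_log) / len(attempt_log)
P = 1 / mean_attempts
print(round(P, 3))  # 2.3 attempts on average, so P ≈ 0.435
```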
&lt;p&gt;If we plug in some numbers, trying to be as &lt;em&gt;extremely&lt;/em&gt; optimistic as we can,
let’s say that we have a uniform stream of tasks, every one of which can be
addressed by Mallory, and every one of which:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;we can measure perfectly, with no overhead&lt;/li&gt;
&lt;li&gt;would take a human 45 minutes&lt;/li&gt;
&lt;li&gt;takes Mallory only a single minute to generate a response&lt;/li&gt;
&lt;li&gt;requires at most one re-prompt of Mallory, i.e. its output is “good enough” half the time&lt;/li&gt;
&lt;li&gt;takes a human only 5 minutes to write a prompt for&lt;/li&gt;
&lt;li&gt;takes a human only 5 minutes to check the result of&lt;/li&gt;
&lt;li&gt;has a per-prompt cost of the equivalent of a single second of a human’s time&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Thought experiments are a dicey basis for reasoning in the face of
disagreements, so I have tried to formulate something here that is absolutely,
comically, over-the-top stacked in favor of the AI optimist.&lt;/p&gt;
&lt;p&gt;Would that be profitable?  It sure seems like it, given that we are trading
off 45 minutes of human time for 1 minute of Mallory-time and 10 minutes of
human time.  If we ask Python:&lt;/p&gt;
&lt;div class=highlight&gt;&lt;table class=highlighttable&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=linenos&gt;&lt;div class=linenodiv&gt;&lt;pre&gt;&lt;span class=normal&gt;1&lt;/span&gt;
&lt;span class=normal&gt;2&lt;/span&gt;
&lt;span class=normal&gt;3&lt;/span&gt;
&lt;span class=normal&gt;4&lt;/span&gt;
&lt;span class=normal&gt;5&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=code&gt;&lt;div&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; def FF(H, I, C, P, W, E):
&lt;span class=k&gt;...&lt;/span&gt;     return (W + I + C + E) / (P * H)
&lt;span class=k&gt;...&lt;/span&gt; FF(H=45.0, I=1.0, C=5.0, P=1/2, W=5.0, E=0.01)
...
0.48933333333333334
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We get a futzing fraction of about 0.4893.  Not bad!  Sounds like, at least
under these conditions, it would indeed be cost-effective to deploy Mallory.
But…  realistically, do you &lt;em&gt;reliably&lt;/em&gt; get useful, done-with-the-task quality
output on the &lt;em&gt;second&lt;/em&gt; prompt?  Let’s bump up the denominator on &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt; just a little bit there, and see how we fare:&lt;/p&gt;
&lt;div class=highlight&gt;&lt;table class=highlighttable&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=linenos&gt;&lt;div class=linenodiv&gt;&lt;pre&gt;&lt;span class=normal&gt;1&lt;/span&gt;
&lt;span class=normal&gt;2&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=code&gt;&lt;div&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; FF(H=45.0, I=1.0, C=5.0, P=1/3, W=5.0, E=0.01)
0.734
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Oof.  Still cost-effective at 0.734, but not quite as good.  Where do
we cap out, exactly?&lt;/p&gt;
&lt;div class=highlight&gt;&lt;table class=highlighttable&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=linenos&gt;&lt;div class=linenodiv&gt;&lt;pre&gt;&lt;span class=normal&gt;1&lt;/span&gt;
&lt;span class=normal&gt;2&lt;/span&gt;
&lt;span class=normal&gt;3&lt;/span&gt;
&lt;span class=normal&gt;4&lt;/span&gt;
&lt;span class=normal&gt;5&lt;/span&gt;
&lt;span class=normal&gt;6&lt;/span&gt;
&lt;span class=normal&gt;7&lt;/span&gt;
&lt;span class=normal&gt;8&lt;/span&gt;
&lt;span class=normal&gt;9&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=code&gt;&lt;div&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=o&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=kn&gt;from&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=nn&gt;itertools&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=kn&gt;import&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=n&gt;count&lt;/span&gt;
&lt;span class=o&gt;...&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=k&gt;for&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=n&gt;A&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=ow&gt;in&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=n&gt;count&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=n&gt;start&lt;/span&gt;&lt;span class=o&gt;=&lt;/span&gt;&lt;span class=mi&gt;4&lt;/span&gt;&lt;span class=p&gt;):&lt;/span&gt;
&lt;span class=o&gt;...&lt;/span&gt;&lt;span class=w&gt;     &lt;/span&gt;&lt;span class=nb&gt;print&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=n&gt;A&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=n&gt;result&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=o&gt;:=&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=n&gt;FF&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=n&gt;H&lt;/span&gt;&lt;span class=o&gt;=&lt;/span&gt;&lt;span class=mf&gt;45.0&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=n&gt;I&lt;/span&gt;&lt;span class=o&gt;=&lt;/span&gt;&lt;span class=mf&gt;1.0&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=n&gt;C&lt;/span&gt;&lt;span class=o&gt;=&lt;/span&gt;&lt;span class=mf&gt;5.0&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=n&gt;P&lt;/span&gt;&lt;span class=o&gt;=&lt;/span&gt;&lt;span class=mi&gt;1&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=o&gt;/&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=n&gt;A&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=n&gt;W&lt;/span&gt;&lt;span class=o&gt;=&lt;/span&gt;&lt;span class=mf&gt;5.0&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=n&gt;E&lt;/span&gt;&lt;span class=o&gt;=&lt;/span&gt;&lt;span class=mi&gt;1&lt;/span&gt;&lt;span class=o&gt;/&lt;/span&gt;&lt;span class=mf&gt;60.&lt;/span&gt;&lt;span class=p&gt;))&lt;/span&gt;
&lt;span class=o&gt;...&lt;/span&gt;&lt;span class=w&gt;     &lt;/span&gt;&lt;span class=k&gt;if&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=n&gt;result&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=o&gt;&amp;gt;&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=mi&gt;1&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt;
&lt;span class=o&gt;...&lt;/span&gt;&lt;span class=w&gt;         &lt;/span&gt;&lt;span class=k&gt;break&lt;/span&gt;
&lt;span class=o&gt;...&lt;/span&gt;
&lt;span class=mi&gt;4&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=mf&gt;0.9792592592592594&lt;/span&gt;
&lt;span class=mi&gt;5&lt;/span&gt;&lt;span class=w&gt; &lt;/span&gt;&lt;span class=mf&gt;1.224074074074074&lt;/span&gt;
&lt;span class=o&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;With this little test, we can see that at the next iteration we are already at
0.9792, and by 5 prompts per task, even in this absolute fever-dream of an
over-optimistic scenario, with a futzing fraction of 1.2240, Mallory is now a
net detriment to our bottom line.&lt;/p&gt;
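&lt;p&gt;Rather than iterating, we can also solve for the break-even point directly: since &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt; is the reciprocal of the attempt count, FF reaches 1 exactly when the average number of attempts per task reaches &lt;math&gt;&lt;mfrac&gt;&lt;mi&gt;H&lt;/mi&gt;&lt;mrow&gt;&lt;mi&gt;W&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;I&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;E&lt;/mi&gt;&lt;/mrow&gt;&lt;/mfrac&gt;&lt;/math&gt;.  With the same wildly optimistic numbers:&lt;/p&gt;

```python
H, I, C, W, E = 45.0, 1.0, 5.0, 5.0, 1 / 60

# FF = attempts * (W + I + C + E) / H, so FF crosses 1 at this attempt count:
breakeven_attempts = H / (W + I + C + E)
print(round(breakeven_attempts, 2))  # about 4.08 prompts per task
```

&lt;p&gt;So even in this scenario, the whole bet turns on whether Mallory averages fewer than about four prompts per task.&lt;/p&gt;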
&lt;h2 id=harm-to-the-humans&gt;Harm to the Humans&lt;/h2&gt;
&lt;p&gt;So far we have treated &lt;math&gt;&lt;mi&gt;H&lt;/mi&gt;&lt;/math&gt; as functionally constant: an
average over some hypothetical Gaussian distribution.  But the distribution
itself can also change over time.&lt;/p&gt;
&lt;p&gt;Formally speaking, an increase to &lt;math&gt;&lt;mi&gt;H&lt;/mi&gt;&lt;/math&gt; would be &lt;em&gt;good&lt;/em&gt; for
our fraction.  Maybe it would even be a good thing; it could mean we’re taking
on harder and harder tasks due to the superpowers that Mallory has given us.&lt;/p&gt;
&lt;p&gt;But an observed increase to &lt;math&gt;&lt;mi&gt;H&lt;/mi&gt;&lt;/math&gt; would probably &lt;em&gt;not&lt;/em&gt; be
good.  An increase could also mean your humans are getting worse at solving
problems, because using Mallory has atrophied their skills&lt;sup id=fnref:23:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:23:futzing-fraction-2025-8 id=fnref:23&gt;23&lt;/a&gt;&lt;/sup&gt; and sabotaged
learning opportunities&lt;sup id=fnref:24:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:24:futzing-fraction-2025-8 id=fnref:24&gt;24&lt;/a&gt;&lt;/sup&gt;&lt;sup id=fnref:25:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:25:futzing-fraction-2025-8 id=fnref:25&gt;25&lt;/a&gt;&lt;/sup&gt;.  It could also go up because your senior,
experienced people now hate their jobs&lt;sup id=fnref:26:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:26:futzing-fraction-2025-8 id=fnref:26&gt;26&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;For some more vulnerable folks, Mallory might just take a shortcut to all these
complex interactions and drive them completely insane&lt;sup id=fnref:27:futzing-fraction-2025-8&gt;&lt;a class=footnote-ref href=#fn:27:futzing-fraction-2025-8 id=fnref:27&gt;27&lt;/a&gt;&lt;/sup&gt; directly.  Employees
experiencing an intense psychotic episode are famously less productive than
those who are not.&lt;/p&gt;
&lt;p&gt;This could all be very bad if our futzing fraction eventually does head north
of 1 and you need to reintroduce human-only workflows, without
Mallory.&lt;/p&gt;
&lt;h1 id=abridging-the-artificial-arithmetic-alliteratively&gt;Abridging the Artificial Arithmetic (Alliteratively)&lt;/h1&gt;
&lt;p&gt;To reiterate, I have proposed this fraction:&lt;/p&gt;
&lt;div style="font-size: 30px; text-align: center;"&gt;
&lt;math&gt;
    &lt;mi&gt;FF&lt;/mi&gt; &lt;mo&gt; = &lt;/mo&gt;
    &lt;mfrac&gt;
        &lt;mrow&gt;&lt;mi&gt;W&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;I&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;mo&gt;+&lt;/mo&gt;&lt;mi&gt;E&lt;/mi&gt;&lt;/mrow&gt;
        &lt;mrow&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;!--&lt;mo&gt;✕&lt;/mo&gt;--&gt; &lt;mi&gt;H&lt;/mi&gt;&lt;/mrow&gt;
    &lt;/mfrac&gt;
&lt;/math&gt;
&lt;/div&gt;

&lt;p&gt;which shows us positive ROI when FF is less than 1, and negative ROI when it is
more than 1.&lt;/p&gt;
&lt;p&gt;This model is heavily simplified.  A comprehensive measurement program that
tests the efficacy of &lt;em&gt;any&lt;/em&gt; technology, let alone one as complex and rapidly
changing as LLMs, is more complex than could be captured in a single blog post.&lt;/p&gt;
&lt;p&gt;Real-world work might be insufficiently uniform to fit into a closed-form
solution like this.  Perhaps an iterated simulation, with variables based on the
range of values seen in your team’s metrics, would give better results.&lt;/p&gt;
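&lt;p&gt;As one possible starting point, here is a hedged Monte Carlo sketch of that idea; every range below is invented for illustration and would need to be replaced with values observed from your own team:&lt;/p&gt;

```python
import random

def simulate_ff(rng):
    """One simulated task, with each term drawn from an (invented) range."""
    H = rng.uniform(30, 60)      # minutes for a human to do the task unaided
    W = rng.uniform(3, 10)       # minutes spent writing the prompt
    C = rng.uniform(3, 15)       # minutes spent checking the output
    I = rng.uniform(0.5, 3)      # minutes spent waiting on inference
    E = rng.uniform(0.01, 0.1)   # per-prompt cost, in human-minute equivalents
    attempts = rng.randint(1, 6) # prompts needed to be done with the task
    P = 1 / attempts
    return (W + I + C + E) / (P * H)

rng = random.Random(0)
samples = sorted(simulate_ff(rng) for _ in range(10_000))
median_ff = samples[len(samples) // 2]
losing_share = sum(1 for ff in samples if ff > 1) / len(samples)
print(round(median_ff, 2), round(losing_share, 2))
```

&lt;p&gt;Looking at the whole distribution of FF, rather than a single point estimate, tells you not just whether the tool pays off on average, but how often it loses money.&lt;/p&gt;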
&lt;p&gt;However, in this post, I want to illustrate that if you are going to try to
evaluate an LLM-based tool, you need to at &lt;em&gt;least&lt;/em&gt; include some representation
of each of these terms &lt;em&gt;somewhere&lt;/em&gt;.  They are all fundamental to the way the
technology works, and if you’re not measuring them somehow, then you are flying
blind into the genAI storm.&lt;/p&gt;
&lt;p&gt;I also hope to have shown that a lot of existing assumptions about how benefits
might be demonstrated, for example with user surveys about general impressions,
or by evaluating artificial benchmark scores, are deeply flawed.&lt;/p&gt;
&lt;p&gt;Even making what I consider to be wildly, unrealistically optimistic
assumptions about these measurements, I hope I’ve shown:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;in the numerator, &lt;math&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt; might be a lot higher than you
   expect,&lt;/li&gt;
&lt;li&gt;in the denominator, &lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt; might be a lot lower than you
   expect,&lt;/li&gt;
&lt;li&gt;repeated use of an LLM might make &lt;math&gt;&lt;mi&gt;H&lt;/mi&gt;&lt;/math&gt; go up, but despite
   the fact that it’s in the denominator, that will ultimately be quite bad for
   your business.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Personally, I don’t have all that many concerns about &lt;math&gt;&lt;mi&gt;E&lt;/mi&gt;&lt;/math&gt; and &lt;math&gt;&lt;mi&gt;I&lt;/mi&gt;&lt;/math&gt;.  &lt;math&gt;&lt;mi&gt;E&lt;/mi&gt;&lt;/math&gt; is still seeing significant loss-leader pricing, and while &lt;math&gt;&lt;mi&gt;I&lt;/mi&gt;&lt;/math&gt; might not be coming down as fast as vendors would like us to believe, if the other numbers work out I don’t think either makes a huge difference.  However, there might still be surprises lurking in there, and if you want to rationally evaluate the effectiveness of a model, you need to be able to measure them and incorporate them as well.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;In particular&lt;/em&gt;, I really want to stress the importance of the influence of LLMs on your &lt;em&gt;team dynamic&lt;/em&gt;, as that can cause massive, hidden increases to &lt;math&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt;.  LLMs present opportunities for junior employees to generate an endless stream of chaff that will simultaneously:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;wreck your performance review process by making them look much more
  productive than they are,&lt;/li&gt;
&lt;li&gt;increase stress and load on senior employees who need to clean up the
  unforeseen messes created by that LLM output,&lt;/li&gt;
&lt;li&gt;and ruin their own career development by skipping over learning
  opportunities.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you’ve already deployed LLM tooling without measuring these things and
without updating your performance management processes to account for the
strange distortions that these tools make possible, your Futzing Fraction may
be much, much greater than 1, creating hidden costs and technical debt that
your organization will not notice until a lot of damage has already been done.&lt;/p&gt;
&lt;p&gt;If you got all the way here, &lt;em&gt;particularly&lt;/em&gt; if you’re someone who is
enthusiastic about these technologies, thank you for reading.  I appreciate
your attention and I am hopeful that if we can start paying attention to these
details, perhaps we can &lt;em&gt;all&lt;/em&gt; stop futzing around so much with this stuff and
get back to doing real work.&lt;/p&gt;
&lt;h2 id=acknowledgments&gt;Acknowledgments&lt;/h2&gt;
&lt;p class=update-note&gt;Thank you to &lt;a href="/pages/patrons.html"&gt;my patrons&lt;/a&gt; who are supporting my writing on
this blog.  If you like what you’ve read here and you’d
like to read more of it, or you’d like to support my &lt;a href="https://github.com/glyph/"&gt;various open-source
endeavors&lt;/a&gt;, you can &lt;a href="/pages/patrons.html"&gt;support my work as a
sponsor&lt;/a&gt;!&lt;/p&gt;
&lt;div class=footnote&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=fn:1:futzing-fraction-2025-8&gt;
&lt;p id=fn:1&gt;I do not share this optimism, but I want to try &lt;em&gt;very&lt;/em&gt; hard in this
particular piece to &lt;em&gt;take it as a given&lt;/em&gt; that genAI is in fact helpful. &lt;a class=footnote-backref href=#fnref:1:futzing-fraction-2025-8 title="Jump back to footnote 1 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:2:futzing-fraction-2025-8&gt;
&lt;p id=fn:2&gt;If we could have a better prompt on demand via some repeatable and
automatable process, surely we would have used a prompt that got the answer
we wanted in the first place. &lt;a class=footnote-backref href=#fnref:2:futzing-fraction-2025-8 title="Jump back to footnote 2 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:3:futzing-fraction-2025-8&gt;
&lt;p id=fn:3&gt;The software idea of a “&lt;a href="https://www.w3.org/WAI/UA/work/wiki/Definition_of_User_Agent"&gt;user
agent&lt;/a&gt;”
straightforwardly comes from the legal principle of an
&lt;a href="https://en.wikipedia.org/wiki/Law_of_agency"&gt;agent&lt;/a&gt;, which has deep roots
in common law, jurisprudence, &lt;a href="https://en.wikipedia.org/wiki/Principal–agent_problem"&gt;philosophy, and
math&lt;/a&gt;.  When we
think of an agent (some software) acting on behalf of a principal (a human
user), this historical baggage imputes some &lt;a href="https://blog.glyph.im/2005/11/ethics-for-programmers-primum-non.html"&gt;&lt;strong&gt;important ethical
obligations&lt;/strong&gt;&lt;/a&gt;
to the developer of the agent software.  genAI vendors have been as eager
as any software vendor to &lt;a href="https://openai.com/policies/row-terms-of-use/#:~:text=NEITHER%20WE%20NOR%20ANY%20OF%20OUR%20AFFILIATES%20OR%20LICENSORS%20WILL%20BE%20LIABLE"&gt;dodge responsibility for faithfully representing
the user’s
interests&lt;/a&gt;
even as there are some indications that &lt;a href="https://www.forbes.com/sites/marisagarcia/2024/02/19/what-air-canada-lost-in-remarkable-lying-ai-chatbot-case/"&gt;at least some courts are not
persuaded&lt;/a&gt;
by this dodge, at least by the consumers of genAI attempting to pass on the
responsibility all the way to end users.  Perhaps it goes without saying,
but I’ll say it anyway: I don’t like this newer interpretation of “agent”. &lt;a class=footnote-backref href=#fnref:3:futzing-fraction-2025-8 title="Jump back to footnote 3 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:4:futzing-fraction-2025-8&gt;
&lt;p id=fn:4&gt;&lt;a href="https://arxiv.org/abs/2502.15840"&gt;“Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous
Agents”&lt;/a&gt;, Axel Backlund, Lukas Petersson,
Feb 20, 2025 &lt;a class=footnote-backref href=#fnref:4:futzing-fraction-2025-8 title="Jump back to footnote 4 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:5:futzing-fraction-2025-8&gt;
&lt;p id=fn:5&gt;&lt;a href="https://xcancel.com/leojr94_/status/1901560276488511759"&gt;“random thing are happening, maxed out usage on api keys”&lt;/a&gt;, @leojr94 on Twitter, Mar 17, 2025 &lt;a class=footnote-backref href=#fnref:5:futzing-fraction-2025-8 title="Jump back to footnote 5 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:6:futzing-fraction-2025-8&gt;
&lt;p id=fn:6&gt;&lt;a href="https://apnews.com/article/chatgpt-study-harmful-advice-teens-c569cddf28f1f33b36c692428c2191d4"&gt;“New study sheds light on ChatGPT’s alarming interactions with
teens”&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:6:futzing-fraction-2025-8 title="Jump back to footnote 6 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:7:futzing-fraction-2025-8&gt;
&lt;p id=fn:7&gt;&lt;a href="https://apnews.com/article/artificial-intelligence-chatgpt-fake-case-lawyers-d6ae9fa79d0542db9e1455397aef381c"&gt;“Lawyers submitted bogus case law created by ChatGPT. A judge fined
them
$5,000”&lt;/a&gt;,
by Larry Neumeister for the Associated Press, June 22, 2023 &lt;a class=footnote-backref href=#fnref:7:futzing-fraction-2025-8 title="Jump back to footnote 7 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:8:futzing-fraction-2025-8&gt;
&lt;p id=fn:8&gt;During which a human will be busy-waiting on an answer. &lt;a class=footnote-backref href=#fnref:8:futzing-fraction-2025-8 title="Jump back to footnote 8 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:9:futzing-fraction-2025-8&gt;
&lt;p id=fn:9&gt;Given the fluctuating pricing of these products, and fixed subscription overhead, this will obviously need to be amortized; including all the additional terms to actually convert this from your inputs is left as an exercise for the reader. &lt;a class=footnote-backref href=#fnref:9:futzing-fraction-2025-8 title="Jump back to footnote 9 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:10:futzing-fraction-2025-8&gt;
&lt;p id=fn:10&gt;I feel like I should emphasize explicitly here that everything is an
average over repeated interactions.  For example, you might observe that a
particular LLM has a low probability of outputting acceptable work on the
first prompt, but higher probability on subsequent prompts in the same
context, such that it usually takes 4 prompts.  For the purposes of this
extremely simple closed-form model, we’d still consider that a
&lt;math&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;/math&gt; of 25%, even though a more sophisticated model, or
a monte carlo simulation that sets progressive bounds on the probability,
might produce more accurate values. &lt;a class=footnote-backref href=#fnref:10:futzing-fraction-2025-8 title="Jump back to footnote 10 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:11:futzing-fraction-2025-8&gt;
&lt;p id=fn:11&gt;&lt;a href="https://social.coop/@chrisjrn/115011133688436556"&gt;No it isn’t,
actually&lt;/a&gt;, but for the
sake of argument let’s grant that it is. &lt;a class=footnote-backref href=#fnref:11:futzing-fraction-2025-8 title="Jump back to footnote 11 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:12:futzing-fraction-2025-8&gt;
&lt;p id=fn:12&gt;It’s worth noting that all this expensive measuring &lt;em&gt;itself&lt;/em&gt; must be
included in &lt;math&gt;&lt;mi&gt;C&lt;/mi&gt;&lt;/math&gt; until you have a solid grounding for
all your metrics, but let’s optimistically leave all of that out for the
sake of simplicity. &lt;a class=footnote-backref href=#fnref:12:futzing-fraction-2025-8 title="Jump back to footnote 12 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:13:futzing-fraction-2025-8&gt;
&lt;p id=fn:13&gt;&lt;a href="https://www.newsweek.com/nearly-half-employees-trust-ai-more-their-coworkers-2113159"&gt;“AI Company Poll Finds 45% of Workers Trust the Tech More Than Their
Peers”&lt;/a&gt;,
by Suzanne Blake for Newsweek, Aug 13, 2025 &lt;a class=footnote-backref href=#fnref:13:futzing-fraction-2025-8 title="Jump back to footnote 13 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:14:futzing-fraction-2025-8&gt;
&lt;p id=fn:14&gt;&lt;a href="https://www.cmu.edu/dietrich/news/news-stories/2025/july/trent-cash-ai-overconfidence.html"&gt;AI Chatbots Remain Overconfident — Even When They’re
Wrong&lt;/a&gt;
by Jason Bittel for the Dietrich College of Humanities and Social Sciences
at Carnegie Mellon University, July 22, 2025 &lt;a class=footnote-backref href=#fnref:14:futzing-fraction-2025-8 title="Jump back to footnote 14 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:15:futzing-fraction-2025-8&gt;
&lt;p id=fn:15&gt;&lt;a href="https://spectrum.ieee.org/ai-mistakes-schneier"&gt;AI Mistakes Are Very Different From Human
Mistakes&lt;/a&gt; by Bruce Schneier
and Nathan E. Sanders for IEEE Spectrum, Jan 13, 2025 &lt;a class=footnote-backref href=#fnref:15:futzing-fraction-2025-8 title="Jump back to footnote 15 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:16:futzing-fraction-2025-8&gt;
&lt;p id=fn:16&gt;Foreshadowing is a narrative device in which a storyteller gives an
advance hint of an upcoming event later in the story. &lt;a class=footnote-backref href=#fnref:16:futzing-fraction-2025-8 title="Jump back to footnote 16 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:17:futzing-fraction-2025-8&gt;
&lt;p id=fn:17&gt;&lt;a href="https://www.ipsos.com/en-us/people-are-worried-about-misuse-ai-they-trust-it-more-humans"&gt;“People are worried about the misuse of AI, but they trust it more than humans”&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:17:futzing-fraction-2025-8 title="Jump back to footnote 17 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:18:futzing-fraction-2025-8&gt;
&lt;p id=fn:18&gt;&lt;a href="https://youtu.be/w3EZpcTZ4ZA?si=816uBg6N3Pmon2P3"&gt;“Why I stopped using AI (as a Senior Software
Engineer)”&lt;/a&gt;, theSeniorDev
YouTube channel, Jun 17, 2025 &lt;a class=footnote-backref href=#fnref:18:futzing-fraction-2025-8 title="Jump back to footnote 18 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:19:futzing-fraction-2025-8&gt;
&lt;p id=fn:19&gt;&lt;a href="https://www.youtube.com/watch?v=1ghBG302_LQ"&gt;“I was an AI evangelist. Now I’m an AI vegan. Here’s
why.”&lt;/a&gt;, Joe McKay for the
greatchatlinkedin YouTube channel, Aug 8, 2025 &lt;a class=footnote-backref href=#fnref:19:futzing-fraction-2025-8 title="Jump back to footnote 19 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:20:futzing-fraction-2025-8&gt;
&lt;p id=fn:20&gt;&lt;a href="https://originality.ai/blog/what-llm-is-the-most-accurate"&gt;“What LLM is The Most Accurate?”&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:20:futzing-fraction-2025-8 title="Jump back to footnote 20 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:21:futzing-fraction-2025-8&gt;
&lt;p id=fn:21&gt;&lt;a href="https://futurism.com/the-byte/study-chatgpt-answers-wrong"&gt;“Study Finds That 52 Percent Of ChatGPT Answers to Programming Questions are Wrong”&lt;/a&gt;, by Sharon Adarlo for Futurism, May 23, 2024 &lt;a class=footnote-backref href=#fnref:21:futzing-fraction-2025-8 title="Jump back to footnote 21 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:22:futzing-fraction-2025-8&gt;
&lt;p id=fn:22&gt;&lt;a href="https://blog.boxcars.ai/p/off-the-mark-the-pitfalls-of-metrics"&gt;“Off the Mark: The Pitfalls of Metrics Gaming in AI Progress
Races”&lt;/a&gt;, by
Tabrez Syed on BoxCars AI, Dec 14, 2023 &lt;a class=footnote-backref href=#fnref:22:futzing-fraction-2025-8 title="Jump back to footnote 22 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:23:futzing-fraction-2025-8&gt;
&lt;p id=fn:23&gt;&lt;a href="https://thomasorus.com/i-tried-coding-with-ai-i-became-lazy-and-stupid"&gt;“I tried coding with AI, I became lazy and
stupid”&lt;/a&gt;,
by Thomasorus, Aug 8, 2025 &lt;a class=footnote-backref href=#fnref:23:futzing-fraction-2025-8 title="Jump back to footnote 23 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:24:futzing-fraction-2025-8&gt;
&lt;p id=fn:24&gt;&lt;a href="https://www.psychologytoday.com/us/blog/the-algorithmic-mind/202505/how-ai-changes-student-thinking-the-hidden-cognitive-risks"&gt;“How AI Changes Student Thinking: The Hidden Cognitive
Risks”&lt;/a&gt;
by Timothy Cook for Psychology Today, May 10, 2025 &lt;a class=footnote-backref href=#fnref:24:futzing-fraction-2025-8 title="Jump back to footnote 24 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:25:futzing-fraction-2025-8&gt;
&lt;p id=fn:25&gt;&lt;a href="https://phys.org/news/2025-01-ai-linked-eroding-critical-skills.html"&gt;“Increased AI use linked to eroding critical thinking skills”&lt;/a&gt; by Justin Jackson for Phys.org, Jan 13, 2025 &lt;a class=footnote-backref href=#fnref:25:futzing-fraction-2025-8 title="Jump back to footnote 25 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:26:futzing-fraction-2025-8&gt;
&lt;p id=fn:26&gt;&lt;a href="https://dev.to/manuartero/ai-could-end-my-job-just-not-the-way-i-expected-5g3m"&gt;“AI could end my job — Just not the way I expected”&lt;/a&gt; by Manuel Artero Anguita on dev.to, Jan 27, 2025 &lt;a class=footnote-backref href=#fnref:26:futzing-fraction-2025-8 title="Jump back to footnote 26 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:27:futzing-fraction-2025-8&gt;
&lt;p id=fn:27&gt;&lt;a href="https://www.psychologytoday.com/us/blog/urban-survival/202507/the-emerging-problem-of-ai-psychosis"&gt;“The Emerging Problem of “AI
Psychosis””&lt;/a&gt;
by Gary Drevitch for Psychology Today, July 21, 2025. &lt;a class=footnote-backref href=#fnref:27:futzing-fraction-2025-8 title="Jump back to footnote 27 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;&lt;/body&gt;</content><category term="misc"></category><category term="ai"></category><category term="llm"></category><category term="basic-arithmetic"></category></entry><entry><title>R0ML’s Ratio</title><link href="https://blog.glyph.im/2025/08/r0mls-ratio.html" rel="alternate"></link><published>2025-08-08T21:41:00-07:00</published><updated>2025-08-08T21:41:00-07:00</updated><author><name>Glyph</name></author><id>tag:blog.glyph.im,2025-08-08:/2025/08/r0mls-ratio.html</id><summary type="html">&lt;p&gt;Is your volume discount a good deal? Who nose!&lt;/p&gt;</summary><content type="html">&lt;body&gt;&lt;p&gt;My &lt;a href="https://blog.glyph.im/2011/06/blog-post.html"&gt;father&lt;/a&gt;, also known as
“&lt;a href="https://r0ml.medium.com"&gt;R0ML&lt;/a&gt;” once described a methodology for evaluating
volume purchases that I think needs to be more popular.&lt;/p&gt;
&lt;p&gt;If you are a hardcore fan, you might know that he &lt;em&gt;has&lt;/em&gt; already described this
concept publicly in a talk at OSCON in 2005, among other places, but it has
never found its way to the public Internet, so I’m giving it a home here, and
in the process, appropriating some of his words.&lt;sup id=fnref:1:r0mls-ratio-2025-8&gt;&lt;a class=footnote-ref href=#fn:1:r0mls-ratio-2025-8 id=fnref:1&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Let’s say you’re running a circus.  The circus has many clowns.  Ten thousand
clowns, to be precise.  They require bright red clown noses.  Therefore, you
must acquire a significant volume of clown noses.  An enterprise licensing
agreement for clown noses, if you will.&lt;/p&gt;
&lt;p&gt;If the nose
&lt;a href="https://en.wikiquote.org/wiki/Ocean%27s_Thirteen#:~:text=Have%20you%20guys%20been%20talking%20to%20my%20dad?"&gt;plays&lt;/a&gt;,
it can really make the act.  In order to make sure you’re getting quality
noses, you go with a quality vendor.  You select a vendor who can supply noses
for $100 each, at retail.&lt;/p&gt;
&lt;p&gt;Do you want to buy retail?  Ten thousand clowns, ten thousand noses, one
hundred dollars: that’s a million bucks worth of noses, so it’s worth your
while to get a good deal.&lt;/p&gt;
&lt;p&gt;As a conscientious executive, you go to the golf course with your favorite
clown accessories vendor and negotiate yourself a 50% discount, with a
commitment to buy all ten thousand noses.&lt;/p&gt;
&lt;p&gt;Is this a &lt;em&gt;good&lt;/em&gt; deal?  Should you take it?&lt;/p&gt;
&lt;p&gt;To determine this, we will use an analytical tool called &lt;em&gt;R0ML’s Ratio&lt;/em&gt; (RR).&lt;/p&gt;
&lt;p&gt;The ratio has two terms:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;the Full Undiscounted Retail List Price of Units Used (FURLPoUU), which can
   of course be computed by multiplying the retail list price of a single unit
   (in our case, $100) by the &lt;strong&gt;number of units used&lt;/strong&gt;;&lt;/li&gt;
&lt;li&gt;the Total Price of the Entire Enterprise Volume Licensing Agreement
   (TPotEEVLA), which in our case is $500,000.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;It is expressed as:&lt;/p&gt;
&lt;math&gt;
&lt;mi&gt;RR&lt;/mi&gt; &lt;mo&gt; = &lt;/mo&gt; &lt;mfrac&gt;&lt;mi&gt;TPotEEVLA&lt;/mi&gt; &lt;mi&gt;FURLPoUU&lt;/mi&gt;&lt;/mfrac&gt;
&lt;/math&gt;

&lt;p&gt;Crucially, you must be able to compute the &lt;strong&gt;number of units used&lt;/strong&gt; in order to
complete this ratio.  If, as expected, every single clown wears their nose at
least once during the period of the license agreement, then our Units Used is
10,000, our FURLPoUU is $1,000,000 and our TPotEEVLA is $500,000, which makes
our RR 0.5.&lt;/p&gt;
&lt;p&gt;Congratulations.  If R0ML’s Ratio is less than 1, it’s a good deal.  Proceed.&lt;/p&gt;
&lt;p&gt;But… maybe the nose &lt;em&gt;doesn’t&lt;/em&gt; play.  Not every clown’s costume is an exact
clone of the traditional, stereotypical image of a clown.  Many are
avant-garde.  Perhaps this plentiful proboscis pledge was premature.  Here, I
must quote the originator of this theoretical framework directly:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What if the wheeze doesn’t please?&lt;/p&gt;
&lt;p&gt;&lt;em&gt;What if the schnozz gives some pause?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In other words: what if some clowns don’t wear their noses?&lt;/p&gt;
&lt;p&gt;If we were to do this deal, and then ask around &lt;em&gt;afterwards&lt;/em&gt; to find out that
only &lt;em&gt;200&lt;/em&gt; of our 10,000 clowns were to use their noses, then FURLPoUU comes
out to 200 * $100, for a total of $20,000.  In that scenario, RR is &lt;strong&gt;25&lt;/strong&gt;,
which you may observe is &lt;em&gt;substantially greater&lt;/em&gt; than 1.&lt;/p&gt;
&lt;p&gt;If you do a deal where R0ML’s ratio is greater than 1, then &lt;em&gt;you&lt;/em&gt; are the bozo.&lt;/p&gt;
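&lt;p&gt;If you prefer your bozo tests executable, the whole calculation fits in a few
lines of Python (my own illustration, with a made-up function name; none of this
code is part of R0ML’s original formulation):&lt;/p&gt;

```python
def r0mls_ratio(
    total_agreement_price: float,
    unit_retail_price: float,
    units_used: int,
) -> float:
    """R0ML's Ratio: TPotEEVLA divided by FURLPoUU."""
    furlpouu = unit_retail_price * units_used
    return total_agreement_price / furlpouu


# Every clown wears their nose: a good deal.
print(r0mls_ratio(500_000, 100, 10_000))  # 0.5
# Only 200 noses ever play: you are the bozo.
print(r0mls_ratio(500_000, 100, 200))  # 25.0
```

&lt;p&gt;Below 1: proceed.  Above 1: bozo.&lt;/p&gt;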
&lt;hr&gt;
&lt;p&gt;I apologize if I have belabored this point.  As R0ML expressed in the email we
exchanged about this many years ago,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I do not mind if you blog about it — and I don't mind getting the credit —
although one would think it would be obvious.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And yeah, one &lt;em&gt;would&lt;/em&gt; think this would be obvious?  But I have belabored it
because many discounted enterprise volume purchasing agreements still fail the
R0ML’s Ratio Bozo Test.&lt;sup id=fnref:2:r0mls-ratio-2025-8&gt;&lt;a class=footnote-ref href=#fn:2:r0mls-ratio-2025-8 id=fnref:2&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;In the case of clown noses, if you pay the discounted price, at least you get
to keep the nose; maybe lightly-used clown noses have some resale value.  But
in software licensing or SaaS deals, once you’ve purchased the “discounted”
software or service, once you have provisioned the “seats”, the money is gone,
and if your employees don’t use it, then no value for your organization will
ever result.&lt;/p&gt;
&lt;p&gt;Measuring &lt;strong&gt;number of units used&lt;/strong&gt; is &lt;strong&gt;&lt;em&gt;very&lt;/em&gt;&lt;/strong&gt; important.  Without this
number, you have no idea if you are a bozo or not.&lt;/p&gt;
&lt;p&gt;It is &lt;em&gt;often&lt;/em&gt; better to give your individual employees a corporate card and
allow them to make arbitrary individual purchases of software licenses and SaaS
tools, with minimal expense-reporting overhead; this will always keep R0ML’s
Ratio at 1.0, and thus, you will never be a bozo.&lt;/p&gt;
&lt;p&gt;It is &lt;em&gt;always&lt;/em&gt; better to do that the &lt;em&gt;first&lt;/em&gt; time you are purchasing a &lt;em&gt;new&lt;/em&gt;
software tool, because the first time making such a purchase you (almost by
definition) have no information about “units used” yet.  You have no idea — you
&lt;em&gt;cannot&lt;/em&gt; have any idea — if you are a bozo or not.&lt;/p&gt;
&lt;p&gt;If you don’t know who the bozo is, it’s probably you.&lt;/p&gt;
&lt;h2 id=acknowledgments&gt;Acknowledgments&lt;/h2&gt;
&lt;p class=update-note&gt;Thank you for reading, and especially thank you to &lt;a href="/pages/patrons.html"&gt;my
patrons&lt;/a&gt; who are supporting my writing on this blog.  Of
course, extra thanks to dad for, like, having this idea and doing most of the
work here beyond my transcription.  If you like my dad’s ideas and you’d like
to post more of them, or you’d like to support my &lt;a href="https://github.com/glyph/"&gt;various open-source
endeavors&lt;/a&gt;, you can &lt;a href="/pages/patrons.html"&gt;support my work as a
sponsor&lt;/a&gt;!&lt;/p&gt;
&lt;div class=footnote&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=fn:1:r0mls-ratio-2025-8&gt;
&lt;p id=fn:1&gt;One of my other favorite posts on this blog was just
&lt;a href="https://blog.glyph.im/2022/12/potato-programming.html"&gt;stealing&lt;/a&gt; another one of his ideas, so
hopefully this one will be good too. &lt;a class=footnote-backref href=#fnref:1:r0mls-ratio-2025-8 title="Jump back to footnote 1 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:2:r0mls-ratio-2025-8&gt;
&lt;p id=fn:2&gt;This concept was first developed in 2001, but it has some implications
for extremely recent developments in the software industry; that’s a
post for another day. &lt;a class=footnote-backref href=#fnref:2:r0mls-ratio-2025-8 title="Jump back to footnote 2 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;&lt;/body&gt;</content><category term="misc"></category><category term="basic-arithmetic"></category><category term="software"></category><category term="saas"></category></entry><entry><title>The Best Line Length</title><link href="https://blog.glyph.im/2025/08/the-best-line-length.html" rel="alternate"></link><published>2025-08-07T22:37:00-07:00</published><updated>2025-08-07T22:37:00-07:00</updated><author><name>Glyph</name></author><id>tag:blog.glyph.im,2025-08-07:/2025/08/the-best-line-length.html</id><summary type="html">&lt;p&gt;What’s a good maximum line length for your coding standard?&lt;/p&gt;</summary><content type="html">&lt;body&gt;&lt;p&gt;What’s a good maximum line length for your coding standard?&lt;/p&gt;
&lt;p&gt;This is, of course, a trick question. By posing it &lt;em&gt;as&lt;/em&gt; a question, I have
created the misleading impression that it &lt;em&gt;is&lt;/em&gt; a question, but
&lt;a href="https://github.com/psf/black"&gt;Black&lt;/a&gt; has selected the correct number for you;
it’s 90 characters or so, &lt;a href="https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#labels-line-length"&gt;about 10% wider than the width of a standard VT100
hardware
terminal&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Thanks for reading my blog.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;OK, OK. Clearly, there’s more to it than that. This is an age-old debate on the
level of “tabs versus spaces”. So contentious, in fact, that even the famously
opinionated Black &lt;a href="https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#labels-line-length"&gt;does in fact let you change
it&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=ancient-history&gt;Ancient History&lt;/h2&gt;
&lt;p&gt;One argument that certain &lt;a href="https://lkml.org/lkml/2020/5/29/1038"&gt;silly
people&lt;/a&gt;&lt;sup id=fnref:1:the-best-line-length-2025-8&gt;&lt;a class=footnote-ref href=#fn:1:the-best-line-length-2025-8 id=fnref:1&gt;1&lt;/a&gt;&lt;/sup&gt; like to make is “why are we
wrapping at 80 characters like we are using 80 character teletypes, it’s the
2020s!  I have an ultrawide monitor!”.  The implication here is that the width
of 80-character terminals is an antiquated relic, based entirely around the
hardware limitations of a bygone era, and modern displays can put tons of stuff
on one line, so why not &lt;em&gt;use&lt;/em&gt; that capability?&lt;/p&gt;
&lt;p&gt;This feels intuitively true, given the &lt;em&gt;huge&lt;/em&gt; disparity between ancient times
and now: on my own display, I can comfortably fit &lt;em&gt;about&lt;/em&gt; 350 characters on a
line. What a shame, to have so much room for so many characters in each line,
and to waste it all on blank space!&lt;/p&gt;
&lt;p&gt;But... is that true?&lt;/p&gt;
&lt;p&gt;I stretched out my editor window all the way to measure that ‘350’ number, but
I did not continue editing at that window width.  In order to have a more
comfortable editing experience, I switched back into &lt;a href="https://github.com/joostkremers/writeroom-mode"&gt;writeroom
mode&lt;/a&gt;, a mode which emulates a
considerably more
&lt;a href="https://apps.apple.com/us/app/writeroom/id417967324?mt=12"&gt;writerly&lt;/a&gt;
application, which limits each line length to 92 characters, regardless of
frame width.&lt;/p&gt;
&lt;p&gt;You’ve probably noticed this too.  Almost all sites that display prose of any
kind limit their width, even on very wide screens.&lt;/p&gt;
&lt;p&gt;That tiny little ribbon of text running down the middle of your monitor might
look silly on a full-screened stereotypical news site or blog.  But full-screen
a site that &lt;em&gt;doesn’t&lt;/em&gt; set that width limit and, although it &lt;em&gt;makes
sense&lt;/em&gt; that you can now use all that space up, it will look &lt;a href="https://danluu.com/"&gt;&lt;em&gt;extremely&lt;/em&gt;,
almost unreadably bad&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Blogging software does not set a column width limit on your text because of
some 80-character-wide accident of history in the form of a hardware terminal.&lt;/p&gt;
&lt;p&gt;Similarly, if you really try to use that screen real estate to its fullest for
&lt;em&gt;coding&lt;/em&gt;, and start editing 200-300 character lines, you’ll quickly notice it
starts to feel just a bit weird and confusing.  It gets surprisingly easy to
lose your place.  &lt;em&gt;Rhetorically&lt;/em&gt; the “80 characters is just because of dinosaur
technology! Use all those ultrawide pixels!” talking point is quite popular,
but &lt;em&gt;practically&lt;/em&gt; people usually just want a few more characters worth of
breathing room, maxing out at 100 characters, far narrower than even the most
svelte widescreen.&lt;/p&gt;
&lt;p&gt;So maybe those 80 character terminals are holding us back a &lt;em&gt;little&lt;/em&gt; bit,
but... wait a second.  Why were the &lt;em&gt;terminals&lt;/em&gt; 80 characters wide in the first
place?&lt;/p&gt;
&lt;h2 id=ancienter-history&gt;Ancienter History&lt;/h2&gt;
&lt;p&gt;As &lt;a href="https://softwareengineering.stackexchange.com/a/148729"&gt;this lovely Software Engineering Stack
Exchange&lt;/a&gt; post
summarizes, terminals were probably 80 characters because teletypes were 80
characters, and teletypes were probably 80 characters because punch cards were
80 characters, and punch cards were probably 80 characters because &lt;em&gt;that’s just
about how many typewritten characters fit onto one line of a US-Letter piece of
paper&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Even before typewriters, consider the average &lt;em&gt;newspaper&lt;/em&gt;: why do we call a
regularly-occurring featured article in a newspaper a “column”? Because
broadsheet papers were &lt;em&gt;too wide&lt;/em&gt; to have only a single column; they would
always be broken into multiple!  Far more aggressive than 80 characters,
columns in newspapers typically have &lt;em&gt;30&lt;/em&gt; characters per line.&lt;/p&gt;
&lt;p&gt;The first newspaper printing machines were custom designed and could have used
whatever width they wanted, so why standardize on something so narrow?&lt;sup id=fnref:2:the-best-line-length-2025-8&gt;&lt;a class=footnote-ref href=#fn:2:the-best-line-length-2025-8 id=fnref:2&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=science&gt;Science!&lt;/h2&gt;
&lt;p&gt;There has been a surprising amount of scientific research around &lt;a href="https://en.wikipedia.org/wiki/Line_length"&gt;this
issue&lt;/a&gt;, but in brief, there’s a
reason here rooted in human physiology: when you read a block of text, you are
not consciously moving your eyes from word to word like you’re dragging a mouse
cursor, repositioning continuously.  Human eyes reading text move in quick
bursts of rotation called “&lt;a href="https://en.wikipedia.org/wiki/Saccade"&gt;saccades&lt;/a&gt;”.
In order to quickly and accurately move from one line of text to another, the
start of the next line needs to be clearly visible in the reader’s peripheral
vision in order for them to accurately target it.  This limits the angle of
rotation that the reader can perform in a single saccade, and, thus, the length
of a line that they can comfortably read without hunting around for the start
of the next line every time they get to the end.&lt;/p&gt;
&lt;p&gt;So, 80 (or 80 plus about 10%) characters isn’t too unreasonable for a limit.
It’s longer than &lt;em&gt;30&lt;/em&gt; characters, that’s for sure!&lt;/p&gt;
&lt;p&gt;But, surely that’s not &lt;em&gt;all&lt;/em&gt;, or this wouldn’t be so contentious in the first
place?&lt;/p&gt;
&lt;h2 id=caveats&gt;Caveats&lt;/h2&gt;
&lt;h3 id=the-screen-is-wide-though&gt;The screen &lt;em&gt;is&lt;/em&gt; wide, though.&lt;/h3&gt;
&lt;p&gt;The ultrawide aficionados &lt;em&gt;do&lt;/em&gt; have a point, even if it’s not really the
simple one about “old terminals” they originally thought.  Our modern
wide-screen displays &lt;em&gt;are&lt;/em&gt; criminally underutilized, particularly for text.
Even adding in the big chunky file, class, and method tree browser over on the
left and the source code preview on the right, a brief survey of a Google Image
search for “vs code” shows a &lt;em&gt;lot&lt;/em&gt; of editors open with huge, blank areas on
the right side of the window.&lt;/p&gt;
&lt;p&gt;Big screens &lt;em&gt;are&lt;/em&gt; super useful as they allow us to leverage our spatial
memories to keep more relevant code around and simply glance around as we
think, rather than navigate interactively.  But it only works if you remember
to do it.&lt;/p&gt;
&lt;p&gt;Newspapers allowed us to read a ton of
information in one sitting with minimal shuffling by packing in as many as 6
columns of text.  You could read a column to the bottom of the page, back to
the top, and down again, several times.&lt;/p&gt;
&lt;p&gt;Similarly, books fill both of their opposed pages with text at the same time,
doubling the amount of stuff you can read at once before needing to turn the
page.&lt;/p&gt;
&lt;p&gt;You may notice that reading text in a book, even in an ebook app, is more
comfortable than reading a ton of text by scrolling around in a web browser.
That’s because our eyes are &lt;em&gt;built&lt;/em&gt; for saccades, and repeatedly tracking the
continuous smooth motion of the page as it scrolls to a stop, then re-targeting
the new fixed location to start saccading around from, is literally more
physically strenuous on your eyes’ muscles!&lt;/p&gt;
&lt;p&gt;There’s a reason that the &lt;a href="https://en.wikipedia.org/wiki/Codex"&gt;codex&lt;/a&gt;
was a big technological innovation over the scroll.  This is a regression!&lt;/p&gt;
&lt;p&gt;Today, the right thing to do here is to make use of horizontally split panes in
your text editor or IDE, and just make a bit of conscious effort to set up the
appropriate code on screen for the problem you’re working on.  However, this is
a potential area for different IDEs to really differentiate themselves, and
build multi-column continuous-code-reading layouts that allow for buffers to
wrap and be navigable newspaper-style.&lt;/p&gt;
&lt;p&gt;Similarly, &lt;a href="https://developer.mozilla.org/en-US/docs/Web/CSS/columns"&gt;modern CSS has shockingly good support for multi-column
layouts&lt;/a&gt;, and it’s a
shame that true multi-column, page-turning layouts are so rare.  If I ever
figure out a way to deploy this here that isn’t horribly clunky and fighting
modern platform conventions like “scrolling horizontally is substantially more
annoying and inconsistent than scrolling vertically” maybe I will experiment
with such a layout on this blog one day.  Until then… just make the browser
window narrower so other useful stuff can be in the other parts of the screen,
I guess.&lt;/p&gt;
&lt;h3 id=code-isnt-prose&gt;Code Isn’t Prose&lt;/h3&gt;
&lt;p&gt;But, I digress. While I think that columnar layouts for reading prose &lt;em&gt;are&lt;/em&gt; an
interesting thing more people should experiment with, code isn’t prose.&lt;/p&gt;
&lt;p&gt;The &lt;em&gt;metric&lt;/em&gt; used for ideal line width, which you may have noticed if you
clicked through some of those Wikipedia links earlier, is not “character cells
in your editor window”, it is &lt;em&gt;characters per line&lt;/em&gt;, or “CPL”.&lt;/p&gt;
&lt;p&gt;With an optimal CPL somewhere between 45 and 95, a &lt;em&gt;code&lt;/em&gt;-line-width of
somewhere around 90 might actually be the best idea, because &lt;em&gt;whitespace uses
up your line-width budget&lt;/em&gt;.  In a typical object-oriented Python program&lt;sup id=fnref:3:the-best-line-length-2025-8&gt;&lt;a class=footnote-ref href=#fn:3:the-best-line-length-2025-8 id=fnref:3&gt;2&lt;/a&gt;&lt;/sup&gt;,
&lt;em&gt;most&lt;/em&gt; of your code ends up indented by at least 8 spaces: 4 for the class
scope, 4 for the method scope.  Most likely a lot of it is 12, because any
interesting code will have at least one conditional or loop.  So, by the time
you’re done wasting all that horizontal space, a max line length of 90 actually
looks more like a maximum of 78... right about that sweet spot from the
US-Letter page in the typewriter that we started with.&lt;/p&gt;
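&lt;p&gt;That indentation arithmetic is easy to check for yourself (a throwaway sketch
of my own, assuming 4-space indents and the roughly-90-character limit discussed
above):&lt;/p&gt;

```python
# How much of a ~90-character line budget survives typical
# Python indentation?  (Numbers from the paragraph above.)
MAX_LINE = 90
INDENT = 4  # spaces per level of scope

for depth, context in [
    (0, "module level"),
    (2, "a method inside a class"),
    (3, "a loop or conditional inside that method"),
]:
    usable = MAX_LINE - INDENT * depth
    print(f"{context}: {usable} characters of real budget")
```

&lt;p&gt;Three levels deep, the budget is back down to 78: the typewriter again.&lt;/p&gt;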
&lt;h3 id=what-about-soft-wrap&gt;What about soft-wrap?&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;In principle&lt;/em&gt;, source code is structured information, whose presentation could
be fully decoupled from its serialized representation.  Everyone could configure
their preferred line width appropriate to their custom preferences and the
specific physiological characteristics of their eyes, and the code could be
formatted according to the language it was expressed in, and “hard wrapping”
could be a silly antiquated thing.&lt;/p&gt;
&lt;p&gt;The problem with this argument is the same as the argument against “but tabs
are &lt;em&gt;semantic&lt;/em&gt; indentation”, to wit: nope, no it isn’t.  What “in principle”
means in the previous paragraph is actually “in a fantasy world which we do not
inhabit”.  I’d love it if editors treated code this way and we had a rich
history and tradition of structured manipulations rather than typing in strings
of symbols to construct source code textually.  But that is not the world we
live in.  Hard wrapping is unfortunately necessary to integrate with diff
tools.&lt;/p&gt;
&lt;h2 id=so-whats-the-optimal-line-width&gt;So what’s the optimal line width?&lt;/h2&gt;
&lt;p&gt;The exact, specific number here is still ultimately a matter of personal
preference.&lt;/p&gt;
&lt;p&gt;Hopefully, understanding the long history, science, and underlying physical
constraints can lead you to select a contextually appropriate value for your
own purposes that will balance ease of reading, integration with the relevant
tools in your ecosystem, diff size, presentation in the editors and IDEs that
your contributors tend to use, reasonable display in web contexts, on
presentation slides, and so on.&lt;/p&gt;
&lt;p&gt;But — and this is important — counterpoint:&lt;/p&gt;
&lt;p&gt;No it isn’t, you don’t need to
select an optimal width, because it’s already been selected for you.  It is
&lt;a href="https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#labels-line-length"&gt;the full width of a standard VT100 terminal plus 10%&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=acknowledgments&gt;Acknowledgments&lt;/h2&gt;
&lt;p class=update-note&gt;Thank you for reading, and especially thank you to &lt;a href="/pages/patrons.html"&gt;my
patrons&lt;/a&gt; who are supporting my writing on this blog.  If
you like what you’ve read here and you’d like to read more of it, or you’d like
to support my &lt;a href="https://github.com/glyph/"&gt;various open-source endeavors&lt;/a&gt;, you
can &lt;a href="/pages/patrons.html"&gt;support my work as a sponsor&lt;/a&gt;!&lt;/p&gt;
&lt;div class=footnote&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=fn:1:the-best-line-length-2025-8&gt;
&lt;p id=fn:1&gt;I love the fact that this message is, itself, hard-wrapped to 77
characters. &lt;a class=footnote-backref href=#fnref:1:the-best-line-length-2025-8 title="Jump back to footnote 1 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:3:the-best-line-length-2025-8&gt;
&lt;p id=fn:3&gt;Let’s be honest; we’re all object-oriented python programmers here,
aren’t we? &lt;a class=footnote-backref href=#fnref:3:the-best-line-length-2025-8 title="Jump back to footnote 2 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:2:the-best-line-length-2025-8&gt;
&lt;p id=fn:2&gt;Unsurprisingly, there are also financial reasons. &lt;a href="https://www.josephdickerson.com/blog/2011/11/13/what-is-the-reason-for-multi-column-layout-in-magazines-and-newspapers/"&gt;More, narrower
columns meant it was easier to fix typesetting errors and to insert more
advertisements as
necessary&lt;/a&gt;. But
readability really did have a lot to do with it, too; scientists were
looking at ease of reading as far back as the 1800s. &lt;a class=footnote-backref href=#fnref:2:the-best-line-length-2025-8 title="Jump back to footnote 3 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;&lt;/body&gt;</content><category term="misc"></category><category term="programming"></category><category term="python"></category><category term="saccades"></category></entry><entry><title>I Think I’m Done Thinking About genAI For Now</title><link href="https://blog.glyph.im/2025/06/i-think-im-done-thinking-about-genai-for-now.html" rel="alternate"></link><published>2025-06-04T22:22:00-07:00</published><updated>2025-06-04T22:22:00-07:00</updated><author><name>Glyph</name></author><id>tag:blog.glyph.im,2025-06-04:/2025/06/i-think-im-done-thinking-about-genai-for-now.html</id><summary type="html">&lt;p&gt;The conversation isn’t over, but I don’t think I have much to add to it.&lt;/p&gt;</summary><content type="html">&lt;body&gt;&lt;h2 id=the-problem&gt;The Problem&lt;/h2&gt;
&lt;p&gt;Like many other self-styled thinky programmer guys, I like to imagine myself as
a sort of &lt;a href="https://en.wikipedia.org/wiki/Sherlock_Holmes"&gt;Holmesian&lt;/a&gt; genius,
making trenchant observations, collecting them, and then synergizing them into
brilliant deductions with the keen application of my powerful mind.&lt;/p&gt;
&lt;p&gt;However, several years ago, I had an epiphany in my self-concept.  I finally
understood that, to the extent that I &lt;em&gt;am&lt;/em&gt; usefully clever, it is less in a
Holmesian idiom, and more, shall we say,
&lt;a href="https://en.wikipedia.org/wiki/Adrian_Monk"&gt;Monkesque&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For those unfamiliar with either of the respective franchises:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Holmes is a towering intellect honed by years of training, who catalogues
  intentional, systematic observations and deduces logical, factual conclusions
  from those observations.&lt;/li&gt;
&lt;li&gt;Monk, on the other hand, while also a reasonably intelligent guy, is highly
  neurotic, wracked by unresolved trauma and profound grief.  As both a
  consulting job and a coping mechanism, he makes a habit of erratically
wandering into crime scenes, and, driven by a carefully managed Jenga tower
  of mental illnesses, leverages his dual inabilities to solve crimes.  First,
  he is unable to filter out apparently inconsequential details, building up a
  mental rat’s nest of trivia about the problem; second, he is unable to let go
  of any minor incongruity, obsessively ruminating on the collection of facts
  until they all make sense in a consistent timeline.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Perhaps surprisingly, this tendency serves both this fictional wretch of a
detective, and myself, reasonably well.  I find annoying incongruities in
abstractions and I fidget and fiddle with them until I end up building
something that &lt;a href="https://twisted.org"&gt;a lot of people like&lt;/a&gt;, or perhaps
something that a smaller number of people get &lt;a href="https://automat.readthedocs.io/en/latest/"&gt;&lt;em&gt;really&lt;/em&gt; excited
about&lt;/a&gt;.  At worst, at least &lt;a href="https://fritter.readthedocs.io/en/latest/"&gt;&lt;em&gt;I&lt;/em&gt;
eventually understand what’s going
on&lt;/a&gt;.  This is a self-soothing
activity but it turns out that, managed properly, it can very effectively
soothe others as well.&lt;/p&gt;
&lt;p&gt;All that brings us to today’s topic, which is an incongruity I cannot smooth
out or fit into a logical framework that makes sense.  I am, somewhat reluctantly,
a &lt;a href="https://blog.glyph.im/2024/05/grand-unified-ai-hype.html"&gt;genAI&lt;/a&gt;
&lt;a href="https://blog.glyph.im/2025/03/a-bigger-database.html"&gt;skeptic&lt;/a&gt;.  However, I am, &lt;em&gt;even more&lt;/em&gt;
reluctantly, exposed to genAI Discourse every damn minute of every damn day.
It is relentless, inescapable, and exhausting.&lt;/p&gt;
&lt;p&gt;This preamble about personality should hopefully help you, dear reader, to
understand how I usually address problematical ideas by thinking and thinking
and fidgeting with them until I manage to write some words — or perhaps a new
open source package — that logically orders the ideas around it in a way which
allows my brain to calm down and let it go, and how that process is important
to me.&lt;/p&gt;
&lt;p&gt;In this particular instance, however, genAI has defeated me.  I cannot make it
make sense, but I need to stop thinking about it anyway.  It is too much and I
need to give up.&lt;/p&gt;
&lt;p&gt;My goal with this post is not to &lt;em&gt;convince&lt;/em&gt; anyone of anything in particular —
and we’ll get to why that is a bit later — but rather:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;to set out my current understanding in one place, including all the various
   negative feelings which are still bothering me, so I can stop repeating it
   elsewhere,&lt;/li&gt;
&lt;li&gt;to explain &lt;em&gt;why&lt;/em&gt; I cannot build a case that I think &lt;em&gt;should&lt;/em&gt; be particularly
   convincing to anyone else, particularly to someone who actively disagrees
   with me,&lt;/li&gt;
&lt;li&gt;in so doing, to illustrate why I think the discourse is so fractious and
   unresolvable, and finally&lt;/li&gt;
&lt;li&gt;to give myself, and hopefully by proxy to give others in the same situation,
   permission to just peace out of this nightmare quagmire corner of the
   noosphere.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;But first, just because I can’t &lt;em&gt;prove&lt;/em&gt; that my interlocutors are &lt;a href="https://xkcd.com/386/"&gt;Wrong On The
Internet&lt;/a&gt;, doesn’t mean I won’t explain why I &lt;em&gt;feel&lt;/em&gt;
like they are wrong.&lt;/p&gt;
&lt;h2 id=the-anti-antis&gt;The Anti-Antis&lt;/h2&gt;
&lt;p&gt;Most recently, at time of writing, there have been a spate of “the genAI
discourse is bad” articles, almost exclusively written from the perspective of,
not &lt;em&gt;boosters&lt;/em&gt; exactly, but pragmatically minded (albeit concerned) genAI
users, wishing for the skeptics to be more pointed and accurate in our
critiques.  This is anti-anti-genAI content.&lt;/p&gt;
&lt;p&gt;I am not going to link to any of these, because, as part of their
self-fulfilling prophecy about the “genAI discourse”, they’re &lt;em&gt;also&lt;/em&gt; all bad.&lt;/p&gt;
&lt;p&gt;Mostly, however, they had very little worthwhile to respond to because they
were straw-manning their erstwhile interlocutors.  They are all getting annoyed
at “bad genAI criticism” while failing to engage with — and often failing to
even &lt;em&gt;mention&lt;/em&gt; — most of the actual &lt;em&gt;substance&lt;/em&gt; of any serious genAI
criticism.  At least, any of the criticism that I’ve personally read.&lt;/p&gt;
&lt;p&gt;I understand wanting to avoid a callout or Gish-gallop culture and just express
your own ideas.  So, I understand that they didn’t link directly to particular
sources or go point-by-point on anyone else’s writing.  Obviously I get it,
since that’s exactly what this post is doing too.&lt;/p&gt;
&lt;p&gt;But if you’re going to talk about how bad the genAI conversation is, without
even &lt;em&gt;mentioning&lt;/em&gt; huge categories of problem like “climate impact” or
“disinformation”&lt;sup id=fnref:1:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:1:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:1&gt;1&lt;/a&gt;&lt;/sup&gt; even once, I honestly don’t know what conversation you’re
even talking about.  This is peak “make up a guy to get mad at” behavior, which
is especially confusing in this circumstance, because there’s an absolutely
&lt;em&gt;huge&lt;/em&gt; crowd of actual people that you could already be mad at.&lt;/p&gt;
&lt;p&gt;The people writing these pieces have historically seemed very thoughtful to me.
Some of them I know personally.  It is worrying to me that their critical
thinking skills appear to have substantially degraded &lt;em&gt;specifically&lt;/em&gt; after
spending a bunch of time intensely using this technology which I believe has a
&lt;em&gt;scary&lt;/em&gt; risk of &lt;a href="https://skepchick.org/2025/05/chatgpt-is-creating-cult-leaders/"&gt;degrading one’s critical thinking
skills&lt;/a&gt;.
Correlation is not causation or whatever, and sure, from a rhetorical
perspective this is “post hoc ergo propter hoc” and maybe a little “ad hominem”
for good measure, but correlation can still be &lt;em&gt;concerning&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Yet, I cannot &lt;em&gt;effectively&lt;/em&gt; respond to these folks, because they are making a
&lt;em&gt;practical&lt;/em&gt; argument that I cannot, despite my best efforts, find compelling
evidence to refute categorically.  &lt;em&gt;My&lt;/em&gt; experiences of genAI are all extremely
bad, but that is barely even anecdata.  &lt;em&gt;Their&lt;/em&gt; experiences are
neutral-to-positive.  Little scientific data exists.  How to resolve this?&lt;sup id=fnref:2:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:2:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:2&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id=the-aesthetics&gt;The Aesthetics&lt;/h2&gt;
&lt;p&gt;As I begin to state my &lt;em&gt;own&lt;/em&gt; position, let me lead with this: my factual
analysis of genAI is hopelessly negatively biased.  I find the vast majority of
the aesthetic properties of genAI to be &lt;em&gt;intensely&lt;/em&gt; unpleasant.&lt;/p&gt;
&lt;p&gt;I have been trying &lt;em&gt;very&lt;/em&gt; hard to correct for this bias, to try to pay
attention to the facts and to have a clear-eyed view of these systems’
capabilities. But the feelings are visceral, and the effort to compensate is
tiring.  It is, in fact, the desire to stop making this &lt;em&gt;particular&lt;/em&gt; kind of
effort that has me writing up this piece and trying to take an intentional
break from the subject, despite its intense relevance.&lt;/p&gt;
&lt;p&gt;When I say its “aesthetic qualities” are unpleasant, I don’t just mean the
aesthetic elements of the outputs of genAI systems themselves. The aesthetic quality of
genAI writing, visual design, animation and so on, while &lt;em&gt;mostly&lt;/em&gt; atrocious, is
also highly variable.  There are cherry-picked examples which look… fine.
Maybe even good.  For years now, there have been, famously, literally
award-winning aesthetic outputs of genAI&lt;sup id=fnref:3:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:3:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:3&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;While I am ideologically predisposed to see any “good” genAI art as accruing
the benefits of either a survivorship bias from thousands of terrible outputs
or simple plagiarism rather than its own inherent quality, I cannot deny that
in many cases it &lt;em&gt;is&lt;/em&gt; “good”.&lt;/p&gt;
&lt;p&gt;However, I am not just talking about the product, but the process; the
aesthetic experience of interfacing with the genAI system itself, rather than
the aesthetic experience of the outputs of that system.&lt;/p&gt;
&lt;p&gt;I am not a visual artist and I am not really a writer&lt;sup id=fnref:4:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:4:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:4&gt;4&lt;/a&gt;&lt;/sup&gt;, particularly not a
writer of fiction or anything else whose experience is primarily aesthetic.  So
I will speak directly to the experience of software development.&lt;/p&gt;
&lt;p&gt;I have seen very few successful examples of using genAI to produce whole,
working systems.  There is no shortage of highly public &lt;a href="https://neuromatch.social/@jonny/114622547164112473"&gt;miserable
failures&lt;/a&gt;, particularly
from &lt;a href="https://old.reddit.com/r/ExperiencedDevs/comments/1krttqo/my_new_hobby_watching_ai_slowly_drive_microsoft/"&gt;the vendors of these systems
themselves&lt;/a&gt;,
where the outputs are confused, self-contradictory, full of subtle errors and
generally unusable.  While few studies exist, it sure &lt;em&gt;looks&lt;/em&gt; like this is an
automated way of producing a &lt;a href="https://wiki.c2.com/?NetNegativeProducingProgrammer"&gt;Net Negative Productivity
Programmer&lt;/a&gt;, throwing out
chaff to slow down the rest of the team.&lt;sup id=fnref:5:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:5:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:5&gt;5&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Juxtapose this with my aforementioned psychological motivations (to wit,
that I want everything in the computer to be &lt;em&gt;orderly&lt;/em&gt; and to &lt;em&gt;make
sense&lt;/em&gt;), and I’m sure most of you would have no trouble imagining that
sitting through this sort of practice would make me &lt;em&gt;extremely&lt;/em&gt; unhappy.&lt;/p&gt;
&lt;p&gt;Despite this plethora of negative experiences, executives are aggressively
mandating the use of AI&lt;sup id=fnref:6:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:6:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:6&gt;6&lt;/a&gt;&lt;/sup&gt;.  It looks like &lt;em&gt;without&lt;/em&gt; such mandates, most
people will not bother to use such tools, so the executives will need muscular
policies to enforce their use.&lt;sup id=fnref:7:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:7:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:7&gt;7&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Being forced to sit and argue with a robot while it struggles and fails to
produce a working output, while you have to rewrite the code at the end anyway,
is incredibly demoralizing.  This is the kind of activity that activates &lt;a href="https://hbr.org/2019/07/6-causes-of-burnout-and-how-to-avoid-them"&gt;every
single major cause of
burnout&lt;/a&gt; at
once.&lt;/p&gt;
&lt;p&gt;But, at least in that scenario, the thing &lt;em&gt;ultimately doesn’t work&lt;/em&gt;, so there’s
a hope that after a very stressful six month pilot program, you can go to
management with a pile of meticulously collected evidence, and shut the whole
thing down.&lt;/p&gt;
&lt;p&gt;I am inclined to believe that, in fact, it doesn’t work well enough to be used
this way, and that we are going to see a big crash.  But that is not the most
aesthetically distressing thing.  The most distressing thing is that maybe it
&lt;em&gt;does&lt;/em&gt; work; if not well enough to actually do the work, at least ambiguously
enough to fool the executives long-term.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/cloudflare/workers-oauth-provider?tab=readme-ov-file#written-using-claude"&gt;This
project&lt;/a&gt;,
in particular, stood out to me as an example.  Its author, a self-professed “AI
skeptic” who “thought LLMs were glorified Markov chain generators that didn’t
actually understand code and couldn’t produce anything novel”, did a
green-field project to test this hypothesis.&lt;/p&gt;
&lt;p&gt;Now, this particular project is not &lt;em&gt;totally&lt;/em&gt; inconsistent with a world in
which LLMs cannot produce anything novel.  One could imagine that, out in the
world of open source, perhaps there is enough “OAuth provider written in
TypeScript” blended up into the slurry of “borrowed&lt;sup id=fnref:8:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:8:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:8&gt;8&lt;/a&gt;&lt;/sup&gt;” training data that the
minor constraint of “make it work on Cloudflare Workers” is a small tweak&lt;sup id=fnref:9:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:9:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:9&gt;9&lt;/a&gt;&lt;/sup&gt;. It
is not fully dispositive of the question of the viability of “genAI coding”.&lt;/p&gt;
&lt;p&gt;But it is a data point related to that question, and thus it did make me
contend with what might happen if it &lt;em&gt;were&lt;/em&gt; actually a fully demonstrative
example.  I reviewed the commit history, as the author suggested.  For the sake
of argument, I tried to ask myself if I would like working this way.  Just for
clarity on this question, I wanted to suspend judgement about everything else;
assuming:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the model could be created with ethically, legally, voluntarily sourced
  training data&lt;/li&gt;
&lt;li&gt;its usage involved consent from labor rather than authoritarian mandates&lt;/li&gt;
&lt;li&gt;its energy expenditure was sensible, with minimal CO2 impact&lt;/li&gt;
&lt;li&gt;it was substantially more efficient to work this way than to just write the
  code yourself&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;and so on, and so on… would I &lt;em&gt;like&lt;/em&gt; to use this magic robot that could mostly
just emit working code for me?  Would I use it if it were &lt;em&gt;free&lt;/em&gt;, in all senses
of the word?&lt;/p&gt;
&lt;p&gt;No. I absolutely would not.&lt;/p&gt;
&lt;p&gt;I found the experience of reading this commit history and imagining myself
using such a tool — without exaggeration — nauseating.&lt;/p&gt;
&lt;p&gt;Unlike &lt;a href="https://duckduckgo.com/?q=i+hate+code+review"&gt;many programmers&lt;/a&gt;, I love
code review.  I find that it is one of the best parts of the process of
programming.  I can help people learn and develop their skills, learn
&lt;em&gt;from&lt;/em&gt; them, appreciate the decisions they made, and develop an impression of a
fellow programmer’s style.  It’s a great way to build a mutual theory of mind.&lt;/p&gt;
&lt;p&gt;Of course, it can still be really annoying; people make mistakes, often can’t
see things I find obvious, and in particular when you’re reviewing a lot of
code from a lot of different people, you often end up having to repeat
explanations of the &lt;em&gt;same&lt;/em&gt; mistakes.  So I can see why many programmers,
particularly those more introverted than I am, hate it.&lt;/p&gt;
&lt;p&gt;But, ultimately, when I review their code and work hard to provide clear and
actionable feedback, people learn and grow and it’s worth that investment in
inconvenience.&lt;/p&gt;
&lt;p&gt;The process of coding with an “agentic” LLM appears to be the process of
carefully distilling all the worst parts of code review, and removing and
discarding all of its benefits.&lt;/p&gt;
&lt;p&gt;The lazy, dumb, lying robot asshole keeps making the same mistakes over and
over again, never improving, never genuinely reacting, always obsequiously
&lt;em&gt;pretending&lt;/em&gt; to take your feedback on board.&lt;/p&gt;
&lt;p&gt;Even when it “does” actually “understand” and manages to load your instructions
into its context window, 200K tokens later it will slide cleanly out of its
memory and you will have to say it again.&lt;/p&gt;
&lt;p&gt;All the while, it is attempting to trick you.  It gets most things right, but
it consistently makes mistakes in the places that you are least likely to
notice.  In places where a person &lt;em&gt;wouldn’t&lt;/em&gt; make a mistake.  Your brain keeps
trying to develop a theory of mind to predict its behavior but there’s no mind
there, so it always behaves infuriatingly randomly.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://youtu.be/_2C2CNmK7dQ"&gt;I don’t think I am the only one who feels this way.&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=the-affordances&gt;The Affordances&lt;/h2&gt;
&lt;p&gt;Whatever our environments &lt;a href="https://en.wikipedia.org/wiki/Affordance"&gt;afford&lt;/a&gt;,
we tend to do more of.  Whatever they resist, we tend to do less of.  So in a
world where we were all writing all of our code and emails and blog posts and
texts to each other with LLMs, what do they afford that existing tools do not?&lt;/p&gt;
&lt;p&gt;As a weirdo who enjoys code review, I also enjoy process engineering.  The
central question of almost all process engineering is to continuously ask: how
shall we shape our tools, to better shape ourselves?&lt;/p&gt;
&lt;p&gt;LLMs are an affordance for &lt;em&gt;producing more text, faster&lt;/em&gt;.  How is that going to
shape us?&lt;/p&gt;
&lt;p&gt;Again arguing in the alternative here, assuming the text is free from errors
and hallucinations and whatever, it’s all correct and fit for purpose, that
means it reduces the pain of circumstances where you have to repeat yourself.
Less pain!  Sounds great; I don’t like pain.&lt;/p&gt;
&lt;p&gt;Every codebase has places where you need boilerplate.  Every organization has
defects in its information architecture that require repetition of certain
information rather than a link back to the authoritative source of truth.
Often, these problems persist for a very long time, because it is difficult to
overcome the institutional inertia required to make real progress rather than
going along with the status quo.  But this is often where the highest-value
projects can be found. &lt;a href="https://www.phrases.org.uk/meanings/408900.html"&gt;Where there’s muck, there’s
brass&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The process-engineering function of an LLM, therefore, is to prevent
fundamental problems from ever getting fixed, to reward the rapid-fire
overwhelm of infrastructure teams with an immediate, catastrophic cascade of
legacy code that is now much harder to delete than it is to write.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;There is a scene in Game of Thrones where Khal Drogo kills himself.  He does so
by replacing a stinging, burning, therapeutic antiseptic wound dressing with
some cool, soothing mud.  The mud felt nice, addressed the immediate pain,
removed the discomfort of the antiseptic, and immediately gave him a lethal
infection.&lt;/p&gt;
&lt;p&gt;The pleasing feeling of immediate progress when one prompts an LLM to solve
some problem feels like cool mud on my brain.&lt;/p&gt;
&lt;h3 id=the-economics&gt;The Economics&lt;/h3&gt;
&lt;p&gt;We are in the middle of a &lt;a href="https://blog.glyph.im/2024/05/grand-unified-ai-hype.html"&gt;mania&lt;/a&gt; around
this technology.  As I have written about before, I believe the mania will end.
There will then be a crash, and a “winter”.  But, as I may not have stressed
sufficiently, this crash will be the biggest of its kind — so big, that it is
arguably not of a kind at all.  The level of investment in these technologies
is &lt;em&gt;bananas&lt;/em&gt; and the possibility that the investors will recoup their
investment seems close to zero.  Meanwhile, that cost keeps going up, and up,
and up.&lt;/p&gt;
&lt;p&gt;Others have reported on this in detail&lt;sup id=fnref:10:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:10:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:10&gt;10&lt;/a&gt;&lt;/sup&gt;, and I will not reiterate it all
here, but in addition to being a looming and scary industry-wide (if we are
lucky; more likely “world-wide”) economic threat, it is also
going to drive some panicked behavior from management.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Elite_panic"&gt;Panicky behavior from management&lt;/a&gt;
stressed that their idea is not panning out is, famously, the cause of much
human misery.  I expect that even in the “good” scenario, where &lt;em&gt;some&lt;/em&gt; profit
is ultimately achieved, will still involve mass layoffs rocking the industry,
panicked re-hiring, destruction of large amounts of wealth.&lt;/p&gt;
&lt;p&gt;It feels bad to think about this.&lt;/p&gt;
&lt;h3 id=the-energy-usage&gt;The Energy Usage&lt;/h3&gt;
&lt;p&gt;For a long time I believed that the energy impact was overstated.  I am even on
record, &lt;a href="https://mastodon.social/@glyph/112242020641010867"&gt;about a year ago&lt;/a&gt;,
saying I didn’t think the energy usage was a big deal.  I think I was wrong
about that.&lt;/p&gt;
&lt;p&gt;Focusing on genAI’s energy usage initially seemed like it was letting regular old data centers off the hook.
But recently I have learned that, while the numbers are incomplete because the
vendors aren’t sharing information, they’re also &lt;em&gt;extremely&lt;/em&gt; bad.&lt;sup id=fnref:11:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:11:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:11&gt;11&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;I think there’s probably a version of this technology that isn’t a climate
emergency nightmare, but that’s not the version that the general public has
access to today.&lt;/p&gt;
&lt;h2 id=the-educational-impact&gt;The Educational Impact&lt;/h2&gt;
&lt;p&gt;LLMs are making academic cheating &lt;em&gt;incredibly&lt;/em&gt; rampant.&lt;sup id=fnref:12:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:12:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:12&gt;12&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Not only is it so common as to be nearly universal, it’s also extremely harmful
to learning.&lt;sup id=fnref:13:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:13:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:13&gt;13&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;For learning, genAI is a &lt;a href="https://bsky.app/profile/samhalpert.bsky.social/post/3lmt3coqvqk2w"&gt;forklift at the
gym&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To some extent, LLMs are simply revealing a structural rot within education and
academia that has been building for decades if not centuries.  But it was
within those inefficiencies and the inconveniences of the academic experience
that real learning &lt;em&gt;was&lt;/em&gt;, against all odds, still happening in schools.&lt;/p&gt;
&lt;p&gt;LLMs produce a frictionless, streamlined process where students can
effortlessly glide through the entire credential, learning nothing.  Once
again, they dull the pain without regard to its cause.&lt;/p&gt;
&lt;p&gt;This is not good.&lt;/p&gt;
&lt;h2 id=the-invasion-of-privacy&gt;The Invasion of Privacy&lt;/h2&gt;
&lt;p&gt;This is obviously only a problem with the big cloud models, but then, the big
cloud models are the only ones that people actually use.  If you are having
conversations about anything private with ChatGPT, you are sending all of that
private information directly to Sam Altman, to do with as he wishes.&lt;/p&gt;
&lt;p&gt;Even if you don’t think he is a particularly bad guy, he might not even need
to create the privacy nightmare on purpose.  Maybe he will be forced to do so as
a result of some bizarre Kafkaesque accident.&lt;sup id=fnref:14:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:14:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:14&gt;14&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Imagine the scenario, for example, where a woman is tracking her cycle and
uploading the logs to ChatGPT so she can chat with it about a health concern.
Except, surprise, you don’t have to imagine, you can just search for it, as &lt;em&gt;I&lt;/em&gt;
have personally, organically, seen three separate women on YouTube, at least
one of whom &lt;em&gt;lives in Texas&lt;/em&gt;, not only do this on camera but &lt;em&gt;recommend doing
this to their audiences&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Citation links withheld on this particular claim for hopefully obvious reasons.&lt;/p&gt;
&lt;p&gt;I assure you that I am neither particularly interested in menstrual products
nor genAI content, and if &lt;em&gt;I&lt;/em&gt; am seeing this more than once, it is probably a
distressingly large trend.&lt;/p&gt;
&lt;h2 id=the-stealing&gt;The Stealing&lt;/h2&gt;
&lt;p&gt;The training data for LLMs is stolen.  I don’t mean like “pirated” in the sense
where someone illicitly shares a copy they obtained legitimately; I mean their
scrapers are ignoring both norms&lt;sup id=fnref:15:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:15:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:15&gt;15&lt;/a&gt;&lt;/sup&gt; and laws&lt;sup id=fnref:16:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:16:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:16&gt;16&lt;/a&gt;&lt;/sup&gt; to obtain copies under false
pretenses, destroying other people’s infrastructure&lt;sup id=fnref:17:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:17:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:17&gt;17&lt;/a&gt;&lt;/sup&gt; in the process.&lt;/p&gt;
&lt;h2 id=the-fatigue&gt;The Fatigue&lt;/h2&gt;
&lt;p&gt;I have provided references to numerous articles outlining rhetorical and
sometimes data-driven cases for the existence of certain properties and
consequences of genAI tools.  But I can’t &lt;em&gt;prove&lt;/em&gt; any of these properties,
either at a point in time or as a durable ongoing problem.&lt;/p&gt;
&lt;p&gt;The LLMs themselves are simply too large to model with the usual kind of
heuristics one would use to think about software.  I’d sooner be able to
predict the physics of dice in a casino than a 2 trillion parameter neural
network.  They resist scientific understanding, not just because of their size
and complexity, but because unlike a natural phenomenon (which could of course
be considerably larger and more complex) they &lt;em&gt;resist experimentation&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The first form of genAI resistance to experiment is that every discussion is a
&lt;a href="https://en.wikipedia.org/wiki/Motte-and-bailey_fallacy"&gt;motte-and-bailey&lt;/a&gt;.  If
I use a free model and get a bad result I’m told it’s because I should have
used the paid model.  If I get a bad result with ChatGPT I should have used
Claude. If I get a bad result with a chatbot I need to start using an agentic
tool. If an agentic tool deletes my hard drive by putting &lt;code&gt;os.system("rm -rf
~/")&lt;/code&gt; into &lt;code&gt;sitecustomize.py&lt;/code&gt; then I guess I should have built my own MCP
integration with a completely novel heretofore never even considered security
sandbox or something?&lt;/p&gt;
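&lt;p&gt;A brief aside on why &lt;code&gt;sitecustomize.py&lt;/code&gt; is such a nasty target for that sort of failure: the &lt;code&gt;site&lt;/code&gt; module auto-imports it at the start of every normal interpreter run, so anything written there executes silently in every subsequent Python process.  Here is a minimal, harmless sketch of that mechanism (the temporary directory and the printed marker are purely illustrative):&lt;/p&gt;

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in "payload": prints a marker instead of doing anything destructive.
    Path(tmp, "sitecustomize.py").write_text('print("sitecustomize ran")\n')
    # PYTHONPATH puts our directory on sys.path, so the site module's
    # automatic `import sitecustomize` at interpreter startup finds our file.
    out = subprocess.run(
        [sys.executable, "-c", "pass"],  # the child itself runs no code at all
        env={**os.environ, "PYTHONPATH": tmp},
        capture_output=True,
        text=True,
    )

# The marker appears even though the child was told to do nothing.
print(out.stdout)
```

&lt;p&gt;Swap the print for a destructive shell command and you have exactly the disaster described above, running on every interpreter launch.&lt;/p&gt;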
&lt;p&gt;What configuration, exactly, would let me make a categorical claim about these
things?  What specific methodological approach should I stick to, to get
reliably adequate prompts?&lt;/p&gt;
&lt;p&gt;For the record though, if the idea of the free models is that they are going to
be provocative demonstrations of the impressive capabilities of the commercial
models, and the results are consistently dogshit, I am finding it increasingly
&lt;a href="https://ioc.exchange/@kevinriggle/114617713278070348"&gt;hard to care&lt;/a&gt; how much
better the paid ones are supposed to be, especially since the “better”-ness
cannot really be quantified in any meaningful way.&lt;/p&gt;
&lt;p&gt;The motte-and-bailey doesn’t stop there though.  It’s a war on all fronts.
Concerned about energy usage?  That’s OK, you can use a local model.  Concerned
about infringement?  That’s okay, somewhere, somebody, maybe, has figured out
how to train models consensually&lt;sup id=fnref:18:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:18:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:18&gt;18&lt;/a&gt;&lt;/sup&gt;.  Worried about the politics of enriching
the richest monsters in the world?  Don’t worry, you can always download an
“open source” model from Hugging Face.  It doesn’t matter that many of these
properties are mutually exclusive and attempting to fix one breaks two others;
there’s always an answer, the field is so abuzz with so many people trying to
pull in so many directions at once that it is legitimately difficult to
understand what’s going on.&lt;/p&gt;
&lt;p&gt;Even here though, I can see that characterizing everything this way is unfair
to a hypothetical sort of person.  If there is someone working at one of these
thousands of AI companies that have been springing up like toadstools after a
rain, and they &lt;em&gt;really are&lt;/em&gt; solving one of these extremely difficult problems,
how can I handwave that away?  We need people working on problems, that’s like,
the whole point of having an economy.  And I really don’t like shitting on
other people’s earnest efforts, so I try not to dismiss whole fields.  Given
how AI has gotten into &lt;em&gt;everything&lt;/em&gt;, in a way that e.g. cryptocurrency never
did, painting with that broad a brush inevitably ends up tarring a bunch of
stuff that isn’t even really AI at all.&lt;/p&gt;
&lt;p&gt;The second form of genAI resistance to experiment is the inherent obfuscation
of productization.  The models themselves are already complicated enough, but
the &lt;em&gt;products&lt;/em&gt; that are built around the models are evolving extremely rapidly.
ChatGPT is not just a “model”, and with the rapid&lt;sup id=fnref:19:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:19:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:19&gt;19&lt;/a&gt;&lt;/sup&gt; deployment of Model
Context Protocol tools, the edges of all these things will blur even further.
Every LLM is now just an enormous unbounded soup of arbitrary software doing
arbitrary whatever.  How could I possibly get my arms around that to understand
it?&lt;/p&gt;
&lt;h2 id=the-challenge&gt;The Challenge&lt;/h2&gt;
&lt;p&gt;I have woefully little experience with these tools.&lt;/p&gt;
&lt;p&gt;I’ve tried them out a little bit, and almost every single time the result has
been a disaster that has not made me curious to push further.  Yet, I keep
hearing from all over the industry that I should.&lt;/p&gt;
&lt;p&gt;To some extent, I feel like the motte-and-bailey characterization above is
fair; if the technology itself can really do real software development, it
ought to be able to do it in multiple modalities, and there’s nothing anyone
can &lt;em&gt;articulate&lt;/em&gt; to me about GPT-4o which puts it in a fundamentally different
class than GPT-3.5.&lt;/p&gt;
&lt;p&gt;But, also, I consistently hear that the &lt;em&gt;subjective experience&lt;/em&gt; of using the
premium versions of the tools is actually good, and the free ones are actually
bad.&lt;/p&gt;
&lt;p&gt;I keep struggling to find ways to try them “the right way”, the way that people
I know and otherwise respect claim to be using them, but I haven’t managed to
do so in any meaningful way yet.&lt;/p&gt;
&lt;p&gt;I do not want to be using the cloud versions of these models with their
potentially hideous energy demands; I’d like to use a local model.  But there
is obviously not a nicely composed way to use local models like this.&lt;/p&gt;
&lt;p&gt;Since there are apparently &lt;em&gt;zero&lt;/em&gt; models with ethically-sourced training data,
and litigation is ongoing&lt;sup id=fnref:20:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:20:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:20&gt;20&lt;/a&gt;&lt;/sup&gt; to determine the legal relationships of training
data and outputs, even if I can be comfortable with some level of plagiarism on
a project, I don’t feel that I can introduce the existential legal risk into
&lt;a href="https://pypi.org/user/glyph/"&gt;other people’s infrastructure&lt;/a&gt;, so I would need
to make a &lt;em&gt;new&lt;/em&gt; project.&lt;/p&gt;
&lt;p&gt;Others have differing opinions of course, including some within my dependency
chain, which does worry me, but I still don’t feel like I can freely contribute
further to the problem; it’s going to be bad enough to unwind any impact
upstream.  Even just for my own sake, I don’t want to make it worse.&lt;/p&gt;
&lt;p&gt;This especially presents a problem because I have &lt;a href="https://mastodon.social/@glyph/112498550495367755"&gt;way too much stuff going
on&lt;/a&gt; already.  A new project
is not practical.&lt;/p&gt;
&lt;p&gt;Finally, even if I &lt;em&gt;did&lt;/em&gt; manage to satisfy all of my quirky&lt;sup id=fnref:21:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:21:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:21&gt;21&lt;/a&gt;&lt;/sup&gt; constraints,
would this experiment really be worth anything?  The models and tools that
people are raving about are the big, expensive, harmful ones.  If I proved to
myself yet again that a small model with bad tools was unpleasant to use, I
wouldn’t really be addressing my opponents’ views.&lt;/p&gt;
&lt;p&gt;I’m stuck.&lt;/p&gt;
&lt;h2 id=the-surrender&gt;The Surrender&lt;/h2&gt;
&lt;p&gt;I am writing this piece to make &lt;em&gt;my&lt;/em&gt; peace with giving up on this topic, at
least for a while.  While I do idly hope that some folks might find bits of it
convincing, and perhaps find ways to be more mindful with their own usage of
genAI tools, and consider the harm they may be causing, that’s not actually the
goal.  And that is not the goal because it is just so much goddamn &lt;em&gt;work&lt;/em&gt; to
prove.&lt;/p&gt;
&lt;p&gt;Here, I must return to my philosophical hobbyhorse of
&lt;a href="https://en.wikipedia.org/wiki/Language_game_(philosophy)"&gt;sprachspiel&lt;/a&gt;.  In
this case, specifically to use it as an analytical tool, not just to understand
&lt;em&gt;what&lt;/em&gt; I am trying to say, but what the &lt;em&gt;purpose&lt;/em&gt; for my speech is.&lt;/p&gt;
&lt;p&gt;The concept of Sprachspiel is most frequently deployed to describe the &lt;em&gt;goal&lt;/em&gt;
of the language game being played, but in game theory, that’s only half the
story.  Speech — particularly rigorously justified speech — has a &lt;em&gt;cost&lt;/em&gt;, as
well as a benefit.  I can make shit up pretty easily, but if I want to do
anything remotely like scientific or academic rigor, that cost can be
astronomical.  In the case of developing an abstract understanding of LLMs, the
cost is just too high.&lt;/p&gt;
&lt;p&gt;So what is my goal, then?  To be King Canute, standing on the shore of
“tech”, whatever that is, commanding the LLM tide not to rise?  This is a
multi-trillion dollar juggernaut.&lt;/p&gt;
&lt;p&gt;Even the rump, loser, also-ran fragment of it has the power to literally
suffocate us in our homes&lt;sup id=fnref:22:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;&lt;a class=footnote-ref href=#fn:22:i-think-im-done-thinking-about-genai-for-now-2025-6 id=fnref:22&gt;22&lt;/a&gt;&lt;/sup&gt; if they so choose, completely insulated from any
consequence.  If the power curve starts there, imagine what the &lt;em&gt;winners&lt;/em&gt; in
this industry are going to be capable of, irrespective of the technology
they’re building, just with the resources they have to hand.  Am I going to
write a blog post that can rival their propaganda apparatus?  Doubtful.&lt;/p&gt;
&lt;p&gt;Instead, I will just have to concede that maybe I’m wrong.  I don’t have the
skill, or the knowledge, or the energy, to demonstrate with any level of rigor
that LLMs are generally, in fact, hot garbage.  Intellectually, I will have to
acknowledge that maybe the boosters are right.  Maybe it’ll be OK.&lt;/p&gt;
&lt;p&gt;Maybe the carbon emissions aren’t so bad.  Maybe everybody is keeping them
secret in ways that they don’t for other types of datacenter for perfectly
legitimate reasons.  Maybe the tools really can write novel and correct code,
and with a little more tweaking, it won’t be so difficult to get them to do it.
Maybe by the time they become a mandatory condition of access to developer
tools, they won’t be miserable.&lt;/p&gt;
&lt;p&gt;Sure, I even sincerely agree, intellectual property really has been a pretty
bad idea from the beginning.  Maybe it’s OK that we’ve made an exception to
those rules.  The rules were stupid anyway, so what does it matter if we let a
few billionaires break them?  Really, everybody should be able to break them
(although of course, regular people can’t, because we can’t afford the lawyers
to fight off the MPAA and RIAA, but that’s a problem with the legal system, not
tech).&lt;/p&gt;
&lt;p&gt;I come not to praise “AI skepticism”, but to bury it.&lt;/p&gt;
&lt;p&gt;Maybe it really is all going to be fine.  Perhaps I am simply catastrophizing;
I have been known to do that from time to time.  I can even sort of believe it,
in my head.  Still, even after writing all this out, I can’t quite manage to
believe it in the pit of my stomach.&lt;/p&gt;
&lt;p&gt;Unfortunately, that feeling is not something that you, or I, can argue with.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=acknowledgments&gt;Acknowledgments&lt;/h2&gt;
&lt;p class=update-note&gt;Thank you to &lt;a href="/pages/patrons.html"&gt;my patrons&lt;/a&gt;. Normally, I would say, “who are
supporting my writing on this blog”, but in the case of this piece, I feel more
like I should apologize to them for this than to thank them; these thoughts
have been preventing me from thinking more productive, useful things that I
actually have relevant skill and expertise in; this felt more like a creative
blockage that I just needed to expel than a deliberately written article.  If
you like what you’ve read here and you’d like to read more of it, well, too
bad; I am &lt;em&gt;sincerely&lt;/em&gt; determined to stop writing about this topic.  But, if
you’d like to read more stuff like &lt;em&gt;other&lt;/em&gt; things I have written, or you’d like
to support my &lt;a href="https://github.com/glyph/"&gt;various open-source endeavors&lt;/a&gt;, you
can &lt;a href="/pages/patrons.html"&gt;support my work as a sponsor&lt;/a&gt;!&lt;/p&gt;
&lt;div class=footnote&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=fn:1:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:1&gt;And yes, disinformation is still an issue even if you’re “just” using it
for coding. Even sidestepping the practical matter that technology is
inherently political, &lt;a href="https://gist.github.com/0xabad1dea/be18e11beb2e12433d93475d72016902"&gt;validation and propagation of poor technique is a
form of
disinformation&lt;/a&gt;. &lt;a class=footnote-backref href=#fnref:1:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 1 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:2:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:2&gt;I can’t resolve it; that’s the whole tragedy here. But I guess we have
to pretend I will in order to maintain narrative momentum. &lt;a class=footnote-backref href=#fnref:2:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 2 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:3:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:3&gt;The story in &lt;a href="https://www.creativebloq.com/news/ai-art-wins-competition"&gt;Creative
Bloq&lt;/a&gt;, or &lt;a href="https://www.nytimes.com/2022/09/02/technology/ai-artificial-intelligence-artists.html"&gt;the
NYT&lt;/a&gt;,
if you must. &lt;a class=footnote-backref href=#fnref:3:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 3 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:4:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:4&gt;although it’s not for lack of trying, Jesus, look at the word count on this &lt;a class=footnote-backref href=#fnref:4:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 4 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:5:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:5&gt;These are sometimes referred to as “10x” programmers, because they make
everyone around them 10x slower. &lt;a class=footnote-backref href=#fnref:5:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 5 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:6:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:6&gt;Douglas B. Laney at Forbes, &lt;a href="https://www.forbes.com/sites/douglaslaney/2025/04/09/selling-ai-strategy-to-employees-shopify-ceos-manifesto/"&gt;Viral Shopify CEO Manifesto Says AI Now Mandatory For All Employees&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:6:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 6 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:7:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:7&gt;The National CIO Review, &lt;a href="https://nationalcioreview.com/articles-insights/leadership/ai-mandates-minimal-use-closing-the-workplace-readiness-gap/"&gt;AI Mandates, Minimal Use: Closing the
Workplace Readiness
Gap&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:7:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 7 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:8:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:8&gt;Matt O’Brien at the AP, &lt;a href="https://apnews.com/article/reddit-sues-ai-company-anthropic-claude-chatbot-f5ea042beb253a3f05a091e70531692d"&gt;Reddit sues AI company Anthropic for allegedly ‘scraping’ user comments to train chatbot Claude&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:8:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 8 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:9:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:9&gt;Using the usual tricks to find plagiarism like searching for literal
transcriptions of snippets of training data did not pull up anything when I
tried, but then, that’s not how LLMs work these days, is it?  If it didn’t
&lt;em&gt;obfuscate&lt;/em&gt; the plagiarism it wouldn’t be a very good
plagiarism-obfuscator. &lt;a class=footnote-backref href=#fnref:9:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 9 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:10:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:10&gt;David Gerard at Pivot to AI, “&lt;a href="https://pivot-to-ai.com/2025/05/31/microsoft-and-ai-spending-billions-to-make-millions/"&gt;Microsoft and AI: spending billions to
make
millions&lt;/a&gt;”,
Edward Zitron at Where’s Your Ed At, “&lt;a href="https://www.wheresyoured.at/the-era-of-the-business-idiot/"&gt;The Era Of The Business
Idiot&lt;/a&gt;”, both
sobering reads &lt;a class=footnote-backref href=#fnref:10:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 10 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:11:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:11&gt;James O’Donnell and Casey Crownhart at the MIT Technology Review, &lt;a href="https://www.technologyreview.com/2025/05/20/1116327/ai-energy-usage-climate-footprint-big-tech/"&gt;We did
the math on AI’s energy footprint. Here’s the story you haven’t
heard.&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:11:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 11 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:12:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:12&gt;Lucas Ropek at Gizmodo, &lt;a href="https://gizmodo.com/ai-cheating-is-so-out-of-hand-in-americas-schools-that-the-blue-books-are-coming-back-2000607771"&gt;AI Cheating Is So Out of Hand In America’s
Schools That the Blue Books Are Coming
Back&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:12:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 12 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:13:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:13&gt;James D. Walsh at the New York Magazine Intelligencer, &lt;a href="https://nymag.com/intelligencer/article/openai-chatgpt-ai-cheating-education-college-students-school.html"&gt;Everyone Is Cheating Their Way Through College&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:13:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 13 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:14:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:14&gt;Ashley Belanger at Ars Technica, &lt;a href="https://arstechnica.com/tech-policy/2025/06/openai-says-court-forcing-it-to-save-all-chatgpt-logs-is-a-privacy-nightmare/"&gt;OpenAI slams court order to save all
ChatGPT logs, including deleted
chats&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:14:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 14 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:15:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:15&gt;Ashley Belanger at Ars Technica, &lt;a href="https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/"&gt;AI haters build tarpits to trap and trick
AI scrapers that ignore
robots.txt&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:15:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 15 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:16:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:16&gt;Blake Brittain at Reuters, &lt;a href="https://www.reuters.com/legal/litigation/judge-meta-case-weighs-key-question-ai-copyright-lawsuits-2025-05-01/"&gt;Judge in Meta case warns AI could ‘obliterate’ market for original works&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:16:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 16 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:17:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:17&gt;Xkeeper, &lt;a href="https://blog.xkeeper.net/uncategorized/tcrf-has-been-getting-ddosed/"&gt;TCRF has been getting DDoSed&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:17:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 17 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:18:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:18&gt;Kate Knibbs at Wired, &lt;a href="https://www.wired.com/story/proof-you-can-train-ai-without-slurping-copyrighted-content/"&gt;Here’s Proof You Can Train an AI Model Without
Slurping Copyrighted
Content&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:18:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 18 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:19:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:19&gt;and, I should note, &lt;a href="https://equixly.com/blog/2025/03/29/mcp-server-new-security-nightmare/"&gt;extremely irresponsible&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:19:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 19 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:20:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:20&gt;Porter Anderson at Publishing Perspectives, &lt;a href="https://publishingperspectives.com/2025/04/meta-ai-lawsuit-us-publishers-file-amicus-brief/"&gt;Meta AI Lawsuit: US
Publishers File Amicus
Brief&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:20:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 20 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:21:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:21&gt;It feels bizarre to characterize what feel like baseline ethical
concerns this way, but the fact remains that within the “genAI community”,
this places me into a tiny and obscure minority. &lt;a class=footnote-backref href=#fnref:21:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 21 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:22:i-think-im-done-thinking-about-genai-for-now-2025-6&gt;
&lt;p id=fn:22&gt;Ariel Wittenberg for Politico, &lt;a href="https://www.politico.com/news/2025/05/06/elon-musk-xai-memphis-gas-turbines-air-pollution-permits-00317582"&gt;‘How come I can’t breathe?’: Musk’s data
company draws a backlash in
Memphis&lt;/a&gt; &lt;a class=footnote-backref href=#fnref:22:i-think-im-done-thinking-about-genai-for-now-2025-6 title="Jump back to footnote 22 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;&lt;/body&gt;</content><category term="misc"></category><category term="ai"></category><category term="programming"></category><category term="politics"></category><category term="sprachspiel"></category></entry><entry><title>Stop Writing `__init__` Methods</title><link href="https://blog.glyph.im/2025/04/stop-writing-init-methods.html" rel="alternate"></link><published>2025-04-17T15:35:00-07:00</published><updated>2025-04-17T15:35:00-07:00</updated><author><name>Glyph</name></author><id>tag:blog.glyph.im,2025-04-17:/2025/04/stop-writing-init-methods.html</id><summary type="html">&lt;p&gt;YEARS OF DATACLASSES yet NO REAL-WORLD USE FOUND for overriding
special methods just so you can have some attributes.&lt;/p&gt;</summary><content type="html">&lt;body&gt;&lt;h2 id=the-history&gt;The History&lt;/h2&gt;
&lt;p&gt;Before dataclasses were added to Python in version 3.7 — in June of 2018 — the
&lt;code&gt;__init__&lt;/code&gt; special method had an important use.  If you had a class
representing a data structure — for example a &lt;code&gt;2DCoordinate&lt;/code&gt;, with &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt;
attributes — you would want to be able to construct it as &lt;code&gt;2DCoordinate(x=1,
y=2)&lt;/code&gt;, which would require you to add an &lt;code&gt;__init__&lt;/code&gt; method with &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt;
parameters.&lt;/p&gt;
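&lt;p&gt;That classic pattern looks something like this (a sketch, with the class
renamed &lt;code&gt;Coordinate2D&lt;/code&gt;, since a Python identifier can’t begin with a digit):&lt;/p&gt;

```python
class Coordinate2D:
    def __init__(self, x: float, y: float) -> None:
        # Nothing but attribute assignment happens here.
        self.x = x
        self.y = y

# Construction looks the way users expect:
point = Coordinate2D(x=1, y=2)
```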
&lt;p&gt;The other options available at the time all had pretty bad problems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;You could remove &lt;code&gt;2DCoordinate&lt;/code&gt; from your public API and instead expose a
   &lt;code&gt;make_2d_coordinate&lt;/code&gt; function, keeping the class itself non-importable, but
   then how would you document your return or parameter types?&lt;/li&gt;
&lt;li&gt;You could document the &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; attributes and make the user assign each
   one themselves, but then &lt;code&gt;2DCoordinate()&lt;/code&gt; would return an invalid object.&lt;/li&gt;
&lt;li&gt;You could default your coordinates to 0 with class attributes, and while
   that would fix the problem with option 2, this would now require all
   &lt;code&gt;2DCoordinate&lt;/code&gt; objects to be not just mutable, but mutated at every call
   site.&lt;/li&gt;
&lt;li&gt;You could fix the problems with option 1 by adding a new &lt;em&gt;abstract&lt;/em&gt; class
   that you could expose in your public API, but this would explode the
   complexity of every new public class, no matter how simple.  To make matters
   worse, &lt;code&gt;typing.Protocol&lt;/code&gt; didn’t even arrive until Python 3.8, so, in the
   pre-3.7 world this would condemn you to using concrete inheritance and
   declaring multiple classes even for the most basic data structure
   imaginable.&lt;/li&gt;
&lt;/ol&gt;
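&lt;p&gt;To make option 3’s problem concrete, here is a hypothetical sketch (again
using &lt;code&gt;Coordinate2D&lt;/code&gt; as a valid identifier):&lt;/p&gt;

```python
class Coordinate2D:
    # Option 3: class-attribute defaults, so Coordinate2D() is "valid",
    # but only trivially so; every instance starts out at the origin.
    x: float = 0.0
    y: float = 0.0

# Every call site must now mutate the freshly created object into the
# state it actually wants, one attribute at a time.
point = Coordinate2D()
point.x = 1.0
point.y = 2.0
```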
&lt;p&gt;Also, an &lt;code&gt;__init__&lt;/code&gt; method &lt;em&gt;that does nothing but assign a few attributes&lt;/em&gt;
doesn’t have any significant problems, so it is an obvious choice in this case.
Given all the problems that I just described with the alternatives, it makes
sense that it became the obvious &lt;em&gt;default&lt;/em&gt; choice, in most cases.&lt;/p&gt;
&lt;p&gt;However, by accepting “define a custom &lt;code&gt;__init__&lt;/code&gt;” as the &lt;em&gt;default&lt;/em&gt; way to
allow users to create your objects, we make a habit of beginning &lt;em&gt;every&lt;/em&gt; class
with a pile of &lt;em&gt;arbitrary code&lt;/em&gt; that gets executed every time it is
instantiated.&lt;/p&gt;
&lt;p&gt;Wherever there is arbitrary code, there are arbitrary problems.&lt;/p&gt;
&lt;h2 id=the-problems&gt;The Problems&lt;/h2&gt;
&lt;p&gt;Let’s consider a data structure more complex than one that simply holds a
couple of attributes.  We will create one that represents a reference to some
I/O in the external world: a &lt;code&gt;FileReader&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Of course Python has &lt;a href="https://docs.python.org/3.13/library/io.html#io.FileIO"&gt;its own open-file object
abstraction&lt;/a&gt;, but I
will be ignoring that for the purposes of the example.&lt;/p&gt;
&lt;p&gt;Let’s assume a world where we have the following functions, in an imaginary
&lt;code&gt;fileio&lt;/code&gt; module:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;open(path: str) -&amp;gt; int&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;read(fileno: int, length: int) -&amp;gt; bytes&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;close(fileno: int) -&amp;gt; None&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Our hypothetical &lt;code&gt;fileio.open&lt;/code&gt; returns an integer representing a file
descriptor&lt;sup id=fnref:1:stop-writing-init-methods-2025-4&gt;&lt;a class=footnote-ref href=#fn:1:stop-writing-init-methods-2025-4 id=fnref:1&gt;1&lt;/a&gt;&lt;/sup&gt;, &lt;code&gt;fileio.read&lt;/code&gt; allows us to read &lt;code&gt;length&lt;/code&gt; bytes from an open
file descriptor, and &lt;code&gt;fileio.close&lt;/code&gt; closes that file descriptor, invalidating
it for future use.&lt;/p&gt;
&lt;p&gt;With the habit that we have built from writing thousands of &lt;code&gt;__init__&lt;/code&gt; methods,
we might want to write our &lt;code&gt;FileReader&lt;/code&gt; class like this:&lt;/p&gt;
&lt;div class=highlight&gt;&lt;table class=highlighttable&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=linenos&gt;&lt;div class=linenodiv&gt;&lt;pre&gt;&lt;span class=normal&gt;1&lt;/span&gt;
&lt;span class=normal&gt;2&lt;/span&gt;
&lt;span class=normal&gt;3&lt;/span&gt;
&lt;span class=normal&gt;4&lt;/span&gt;
&lt;span class=normal&gt;5&lt;/span&gt;
&lt;span class=normal&gt;6&lt;/span&gt;
&lt;span class=normal&gt;7&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=code&gt;&lt;div&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=k&gt;class&lt;/span&gt; &lt;span class=nc&gt;FileReader&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt;
    &lt;span class=k&gt;def&lt;/span&gt; &lt;span class=fm&gt;__init__&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=bp&gt;self&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt; &lt;span class=n&gt;path&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt; &lt;span class=nb&gt;str&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt; &lt;span class=o&gt;-&amp;gt;&lt;/span&gt; &lt;span class=kc&gt;None&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt;
        &lt;span class=bp&gt;self&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;_fd&lt;/span&gt; &lt;span class=o&gt;=&lt;/span&gt; &lt;span class=n&gt;fileio&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;open&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=n&gt;path&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt;
    &lt;span class=k&gt;def&lt;/span&gt; &lt;span class=nf&gt;read&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=bp&gt;self&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt; &lt;span class=n&gt;length&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt; &lt;span class=nb&gt;int&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt; &lt;span class=o&gt;-&amp;gt;&lt;/span&gt; &lt;span class=nb&gt;bytes&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt;
        &lt;span class=k&gt;return&lt;/span&gt; &lt;span class=n&gt;fileio&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;read&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=bp&gt;self&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;_fd&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt; &lt;span class=n&gt;length&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt;
    &lt;span class=k&gt;def&lt;/span&gt; &lt;span class=nf&gt;close&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=bp&gt;self&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt; &lt;span class=o&gt;-&amp;gt;&lt;/span&gt; &lt;span class=kc&gt;None&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt;
        &lt;span class=n&gt;fileio&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;close&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=bp&gt;self&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;_fd&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For our initial use-case, this is fine.  Client code creates a &lt;code&gt;FileReader&lt;/code&gt; by
doing something like &lt;code&gt;FileReader("./config.json")&lt;/code&gt;, which always creates a
&lt;code&gt;FileReader&lt;/code&gt; that maintains its file descriptor &lt;code&gt;int&lt;/code&gt; internally as private
state.  This is as it should be; we don’t want user code to see or mess with
&lt;code&gt;_fd&lt;/code&gt;, as that might violate &lt;code&gt;FileReader&lt;/code&gt;’s invariants.  All the necessary work
to construct a valid &lt;code&gt;FileReader&lt;/code&gt; — i.e. the call to &lt;code&gt;open&lt;/code&gt; — is always taken
care of for you by &lt;code&gt;FileReader.__init__&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;However, additional requirements will creep in, and as they do,
&lt;code&gt;FileReader.__init__&lt;/code&gt; becomes increasingly awkward.&lt;/p&gt;
&lt;p&gt;Initially we only care about &lt;code&gt;fileio.open&lt;/code&gt;, but later we may have to deal with
a library that has its own reasons for managing the call to &lt;code&gt;fileio.open&lt;/code&gt; by
itself, and that wants to give us an &lt;code&gt;int&lt;/code&gt; to use as our &lt;code&gt;_fd&lt;/code&gt;.  At that
point, we have to resort to weird workarounds like:&lt;/p&gt;
&lt;div class=highlight&gt;&lt;table class=highlighttable&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=linenos&gt;&lt;div class=linenodiv&gt;&lt;pre&gt;&lt;span class=normal&gt;1&lt;/span&gt;
&lt;span class=normal&gt;2&lt;/span&gt;
&lt;span class=normal&gt;3&lt;/span&gt;
&lt;span class=normal&gt;4&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=code&gt;&lt;div&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=k&gt;def&lt;/span&gt; &lt;span class=nf&gt;reader_from_fd&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=n&gt;fd&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt; &lt;span class=nb&gt;int&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt; &lt;span class=o&gt;-&amp;gt;&lt;/span&gt; &lt;span class=n&gt;FileReader&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt;
    &lt;span class=n&gt;fr&lt;/span&gt; &lt;span class=o&gt;=&lt;/span&gt; &lt;span class=nb&gt;object&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=fm&gt;__new__&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=n&gt;FileReader&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt;
    &lt;span class=n&gt;fr&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;_fd&lt;/span&gt; &lt;span class=o&gt;=&lt;/span&gt; &lt;span class=n&gt;fd&lt;/span&gt;
    &lt;span class=k&gt;return&lt;/span&gt; &lt;span class=n&gt;fr&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Now, all those nice properties that we got from trying to force object
construction to give us a valid object are gone.  &lt;code&gt;reader_from_fd&lt;/code&gt;’s type
signature, which takes a plain &lt;code&gt;int&lt;/code&gt;, has no way of even suggesting to client
code how to ensure that it has passed in the right &lt;em&gt;kind&lt;/em&gt; of &lt;code&gt;int&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Testing is much more of a hassle, because we have to patch in our own copy of
&lt;code&gt;fileio.open&lt;/code&gt; any time we want an instance of a &lt;code&gt;FileReader&lt;/code&gt; in a test without
doing any real-life file I/O, even if we could (for example) share a single
file descriptor among many &lt;code&gt;FileReader&lt;/code&gt;s for testing purposes.&lt;/p&gt;
&lt;p&gt;All of this also assumes a &lt;code&gt;fileio.open&lt;/code&gt; that is &lt;em&gt;synchronous&lt;/em&gt;.  Although for
literal file I/O this is more of a
&lt;a href="https://stackoverflow.com/questions/87892/what-is-the-status-of-posix-asynchronous-i-o-aio"&gt;hypothetical&lt;/a&gt;
concern, there are many types of networked resource which are really only
available via an asynchronous (and thus: potentially slow, potentially
error-prone) API.  If you’ve ever found yourself wanting to type &lt;code&gt;async def
__init__(self): ...&lt;/code&gt; then you have seen this limitation in practice.&lt;/p&gt;
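&lt;p&gt;To see that limitation concretely: Python will accept the &lt;em&gt;definition&lt;/em&gt; of an
&lt;code&gt;async def __init__&lt;/code&gt;, but instantiating such a class fails, because &lt;code&gt;__init__&lt;/code&gt;
must return &lt;code&gt;None&lt;/code&gt;, not a coroutine:&lt;/p&gt;

```python
class Broken:
    # Python will happily accept this definition...
    async def __init__(self) -> None:
        self._fd = 1

# ...but calling Broken() raises TypeError, because __init__ now
# returns a coroutine object rather than None.
try:
    Broken()
except TypeError:
    print("cannot construct Broken")
```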
&lt;p&gt;Comprehensively describing &lt;em&gt;all&lt;/em&gt; the possible problems with this approach would
end up being a book-length treatise on a philosophy of object oriented design,
so I will sum up by saying that the &lt;em&gt;cause&lt;/em&gt; of all these problems is the same:
we are inextricably linking the act of &lt;em&gt;creating a data structure&lt;/em&gt; with
&lt;em&gt;whatever side-effects are most often associated&lt;/em&gt; with that data structure.
If they are “often” associated with it, then by definition they are not
“always” associated with it, and all the cases where they &lt;em&gt;aren’t&lt;/em&gt; associated
become unwieldy and potentially broken.&lt;/p&gt;
&lt;p&gt;Defining an &lt;code&gt;__init__&lt;/code&gt; is an anti-pattern, and we need a replacement for it.&lt;/p&gt;
&lt;h2 id=the-solutions&gt;The Solutions&lt;/h2&gt;
&lt;p&gt;I believe this tripartite assemblage of design techniques will address the
problems raised above:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;using &lt;code&gt;dataclass&lt;/code&gt; to define attributes,&lt;/li&gt;
&lt;li&gt;replacing behavior that would previously have been in &lt;code&gt;__init__&lt;/code&gt;
  with a new classmethod that does the same thing, and&lt;/li&gt;
&lt;li&gt;using precise types to describe what a valid instance looks like.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=using-dataclass-attributes-to-create-an-__init__-for-you&gt;Using &lt;code&gt;dataclass&lt;/code&gt; attributes to create an &lt;code&gt;__init__&lt;/code&gt; for you&lt;/h3&gt;
&lt;p&gt;To begin, let’s refactor &lt;code&gt;FileReader&lt;/code&gt; into a &lt;code&gt;dataclass&lt;/code&gt;.  This does get us an
&lt;code&gt;__init__&lt;/code&gt; method, but it &lt;em&gt;won’t&lt;/em&gt; be an arbitrary one we define ourselves;
it comes with a useful constraint enforced on it: it will do nothing but assign
attributes.&lt;/p&gt;
&lt;div class=highlight&gt;&lt;table class=highlighttable&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=linenos&gt;&lt;div class=linenodiv&gt;&lt;pre&gt;&lt;span class=normal&gt;1&lt;/span&gt;
&lt;span class=normal&gt;2&lt;/span&gt;
&lt;span class=normal&gt;3&lt;/span&gt;
&lt;span class=normal&gt;4&lt;/span&gt;
&lt;span class=normal&gt;5&lt;/span&gt;
&lt;span class=normal&gt;6&lt;/span&gt;
&lt;span class=normal&gt;7&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=code&gt;&lt;div&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=nd&gt;@dataclass&lt;/span&gt;
&lt;span class=k&gt;class&lt;/span&gt; &lt;span class=nc&gt;FileReader&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt;
    &lt;span class=n&gt;_fd&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt; &lt;span class=nb&gt;int&lt;/span&gt;
    &lt;span class=k&gt;def&lt;/span&gt; &lt;span class=nf&gt;read&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=bp&gt;self&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt; &lt;span class=n&gt;length&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt; &lt;span class=nb&gt;int&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt; &lt;span class=o&gt;-&amp;gt;&lt;/span&gt; &lt;span class=nb&gt;bytes&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt;
        &lt;span class=k&gt;return&lt;/span&gt; &lt;span class=n&gt;fileio&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;read&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=bp&gt;self&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;_fd&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt; &lt;span class=n&gt;length&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt;
    &lt;span class=k&gt;def&lt;/span&gt; &lt;span class=nf&gt;close&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=bp&gt;self&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt; &lt;span class=o&gt;-&amp;gt;&lt;/span&gt; &lt;span class=kc&gt;None&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt;
        &lt;span class=n&gt;fileio&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;close&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=bp&gt;self&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;_fd&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Except... oops.  In fixing the problems that we created with our custom
&lt;code&gt;__init__&lt;/code&gt; that calls &lt;code&gt;fileio.open&lt;/code&gt;, we have re-introduced several problems
that it solved:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;We have removed all the convenience of &lt;code&gt;FileReader("path")&lt;/code&gt;.  Now the user
   needs to import the low-level &lt;code&gt;fileio.open&lt;/code&gt; again, making the most common
   type of construction both more verbose and less discoverable; if we want
   users to know how to build a &lt;code&gt;FileReader&lt;/code&gt; in a practical scenario, we will
   have to add something in our documentation to point at a separate module
   entirely.&lt;/li&gt;
&lt;li&gt;There’s no enforcement of the validity of &lt;code&gt;_fd&lt;/code&gt; as a file descriptor; it’s
   just some integer, and the user could easily pass in the wrong one without
   any error.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In isolation, &lt;code&gt;dataclass&lt;/code&gt; by itself can’t solve all our problems, so let’s add
in the second technique.&lt;/p&gt;
&lt;h3 id=using-classmethod-factories-to-create-objects&gt;Using &lt;code&gt;classmethod&lt;/code&gt; factories to create objects&lt;/h3&gt;
&lt;p&gt;We don’t want to require any additional imports, or require users to go looking
at any other modules — or indeed anything other than &lt;code&gt;FileReader&lt;/code&gt; itself — to
figure out how to create a &lt;code&gt;FileReader&lt;/code&gt; for its intended usage.&lt;/p&gt;
&lt;p&gt;Luckily we have a tool that can easily address all of these concerns at once:
&lt;code&gt;@classmethod&lt;/code&gt;.  Let’s define a &lt;code&gt;FileReader.open&lt;/code&gt; class method:&lt;/p&gt;
&lt;div class=highlight&gt;&lt;table class=highlighttable&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=linenos&gt;&lt;div class=linenodiv&gt;&lt;pre&gt;&lt;span class=normal&gt;1&lt;/span&gt;
&lt;span class=normal&gt;2&lt;/span&gt;
&lt;span class=normal&gt;3&lt;/span&gt;
&lt;span class=normal&gt;4&lt;/span&gt;
&lt;span class=normal&gt;5&lt;/span&gt;
&lt;span class=normal&gt;6&lt;/span&gt;
&lt;span class=normal&gt;7&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=code&gt;&lt;div&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=kn&gt;from&lt;/span&gt; &lt;span class=nn&gt;typing&lt;/span&gt; &lt;span class=kn&gt;import&lt;/span&gt; &lt;span class=n&gt;Self&lt;/span&gt;
&lt;span class=nd&gt;@dataclass&lt;/span&gt;
&lt;span class=k&gt;class&lt;/span&gt; &lt;span class=nc&gt;FileReader&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt;
    &lt;span class=n&gt;_fd&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt; &lt;span class=nb&gt;int&lt;/span&gt;
    &lt;span class=nd&gt;@classmethod&lt;/span&gt;
    &lt;span class=k&gt;def&lt;/span&gt; &lt;span class=nf&gt;open&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=bp&gt;cls&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt; &lt;span class=n&gt;path&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt; &lt;span class=nb&gt;str&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt; &lt;span class=o&gt;-&amp;gt;&lt;/span&gt; &lt;span class=n&gt;Self&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt;
        &lt;span class=k&gt;return&lt;/span&gt; &lt;span class=bp&gt;cls&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=n&gt;fileio&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;open&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=n&gt;path&lt;/span&gt;&lt;span class=p&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Now, your callers can replace &lt;code&gt;FileReader("path")&lt;/code&gt; with
&lt;code&gt;FileReader.open("path")&lt;/code&gt;, and get all the same benefits.&lt;/p&gt;
&lt;p&gt;Additionally, if we needed to &lt;code&gt;await fileio.open(...)&lt;/code&gt;, and thus needed the
signature to be &lt;code&gt;@classmethod async def open&lt;/code&gt;, we are freed from the constraint
of &lt;code&gt;__init__&lt;/code&gt; as a special method.  Nothing prevents a
&lt;code&gt;@classmethod&lt;/code&gt; from being &lt;code&gt;async&lt;/code&gt;, or indeed from having any other
modification to its return value, such as returning a &lt;code&gt;tuple&lt;/code&gt; of related values
rather than just the object being constructed.&lt;/p&gt;
&lt;h3 id=using-newtype-to-address-object-validity&gt;Using &lt;code&gt;NewType&lt;/code&gt; to address object validity&lt;/h3&gt;
&lt;p&gt;Next, let’s address the slightly trickier issue of enforcing object validity.&lt;/p&gt;
&lt;p&gt;Our type signature calls this thing an &lt;code&gt;int&lt;/code&gt;, and indeed, that is unfortunately
what the lower-level &lt;code&gt;fileio.open&lt;/code&gt; gives us, and that’s beyond our control.
But for our &lt;em&gt;own&lt;/em&gt; purposes, we can be more precise in our definitions, using
&lt;a href="https://docs.python.org/3.13/library/typing.html#newtype"&gt;&lt;code&gt;NewType&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;
&lt;div class=highlight&gt;&lt;table class=highlighttable&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=linenos&gt;&lt;div class=linenodiv&gt;&lt;pre&gt;&lt;span class=normal&gt;1&lt;/span&gt;
&lt;span class=normal&gt;2&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=code&gt;&lt;div&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=kn&gt;from&lt;/span&gt; &lt;span class=nn&gt;typing&lt;/span&gt; &lt;span class=kn&gt;import&lt;/span&gt; &lt;span class=n&gt;NewType&lt;/span&gt;
&lt;span class=n&gt;FileDescriptor&lt;/span&gt; &lt;span class=o&gt;=&lt;/span&gt; &lt;span class=n&gt;NewType&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=s2&gt;"FileDescriptor"&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt; &lt;span class=nb&gt;int&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;There are a few different ways to adapt the underlying library, but for the
sake of brevity and to illustrate that this can be done with &lt;em&gt;zero&lt;/em&gt; run-time
overhead, let’s just &lt;em&gt;insist&lt;/em&gt; to Mypy that we have versions of &lt;code&gt;fileio.open&lt;/code&gt;,
&lt;code&gt;fileio.read&lt;/code&gt;, and &lt;code&gt;fileio.close&lt;/code&gt; which actually already take (or return) &lt;code&gt;FileDescriptor&lt;/code&gt;
integers rather than regular ones.&lt;/p&gt;
&lt;div class=highlight&gt;&lt;table class=highlighttable&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=linenos&gt;&lt;div class=linenodiv&gt;&lt;pre&gt;&lt;span class=normal&gt;1&lt;/span&gt;
&lt;span class=normal&gt;2&lt;/span&gt;
&lt;span class=normal&gt;3&lt;/span&gt;
&lt;span class=normal&gt;4&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=code&gt;&lt;div&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=kn&gt;from&lt;/span&gt; &lt;span class=nn&gt;typing&lt;/span&gt; &lt;span class=kn&gt;import&lt;/span&gt; &lt;span class=n&gt;Callable&lt;/span&gt;
&lt;span class=n&gt;_open&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt; &lt;span class=n&gt;Callable&lt;/span&gt;&lt;span class=p&gt;[[&lt;/span&gt;&lt;span class=nb&gt;str&lt;/span&gt;&lt;span class=p&gt;],&lt;/span&gt; &lt;span class=n&gt;FileDescriptor&lt;/span&gt;&lt;span class=p&gt;]&lt;/span&gt; &lt;span class=o&gt;=&lt;/span&gt; &lt;span class=n&gt;fileio&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;open&lt;/span&gt;  &lt;span class=c1&gt;# type:ignore[assignment]&lt;/span&gt;
&lt;span class=n&gt;_read&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt; &lt;span class=n&gt;Callable&lt;/span&gt;&lt;span class=p&gt;[[&lt;/span&gt;&lt;span class=n&gt;FileDescriptor&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt; &lt;span class=nb&gt;int&lt;/span&gt;&lt;span class=p&gt;],&lt;/span&gt; &lt;span class=nb&gt;bytes&lt;/span&gt;&lt;span class=p&gt;]&lt;/span&gt; &lt;span class=o&gt;=&lt;/span&gt; &lt;span class=n&gt;fileio&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;read&lt;/span&gt;
&lt;span class=n&gt;_close&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt; &lt;span class=n&gt;Callable&lt;/span&gt;&lt;span class=p&gt;[[&lt;/span&gt;&lt;span class=n&gt;FileDescriptor&lt;/span&gt;&lt;span class=p&gt;],&lt;/span&gt; &lt;span class=kc&gt;None&lt;/span&gt;&lt;span class=p&gt;]&lt;/span&gt; &lt;span class=o&gt;=&lt;/span&gt; &lt;span class=n&gt;fileio&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;close&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We do of course have to slightly adjust &lt;code&gt;FileReader&lt;/code&gt;, too, but the changes are
very small.  Putting it all together, we get:&lt;/p&gt;
&lt;div class=highlight&gt;&lt;table class=highlighttable&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class=linenos&gt;&lt;div class=linenodiv&gt;&lt;pre&gt;&lt;span class=normal&gt; 1&lt;/span&gt;
&lt;span class=normal&gt; 2&lt;/span&gt;
&lt;span class=normal&gt; 3&lt;/span&gt;
&lt;span class=normal&gt; 4&lt;/span&gt;
&lt;span class=normal&gt; 5&lt;/span&gt;
&lt;span class=normal&gt; 6&lt;/span&gt;
&lt;span class=normal&gt; 7&lt;/span&gt;
&lt;span class=normal&gt; 8&lt;/span&gt;
&lt;span class=normal&gt; 9&lt;/span&gt;
&lt;span class=normal&gt;10&lt;/span&gt;
&lt;span class=normal&gt;11&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=code&gt;&lt;div&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class=kn&gt;from&lt;/span&gt; &lt;span class=nn&gt;typing&lt;/span&gt; &lt;span class=kn&gt;import&lt;/span&gt; &lt;span class=n&gt;Self&lt;/span&gt;
&lt;span class=nd&gt;@dataclass&lt;/span&gt;
&lt;span class=k&gt;class&lt;/span&gt; &lt;span class=nc&gt;FileReader&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt;
    &lt;span class=n&gt;_fd&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt; &lt;span class=n&gt;FileDescriptor&lt;/span&gt;
    &lt;span class=nd&gt;@classmethod&lt;/span&gt;
    &lt;span class=k&gt;def&lt;/span&gt; &lt;span class=nf&gt;open&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=bp&gt;cls&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt; &lt;span class=n&gt;path&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt; &lt;span class=nb&gt;str&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt; &lt;span class=o&gt;-&amp;gt;&lt;/span&gt; &lt;span class=n&gt;Self&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt;
        &lt;span class=k&gt;return&lt;/span&gt; &lt;span class=bp&gt;cls&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=n&gt;_open&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=n&gt;path&lt;/span&gt;&lt;span class=p&gt;))&lt;/span&gt;
    &lt;span class=k&gt;def&lt;/span&gt; &lt;span class=nf&gt;read&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=bp&gt;self&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt; &lt;span class=n&gt;length&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt; &lt;span class=nb&gt;int&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt; &lt;span class=o&gt;-&amp;gt;&lt;/span&gt; &lt;span class=nb&gt;bytes&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt;
        &lt;span class=k&gt;return&lt;/span&gt; &lt;span class=n&gt;_read&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=bp&gt;self&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;_fd&lt;/span&gt;&lt;span class=p&gt;,&lt;/span&gt; &lt;span class=n&gt;length&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt;
    &lt;span class=k&gt;def&lt;/span&gt; &lt;span class=nf&gt;close&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=bp&gt;self&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt; &lt;span class=o&gt;-&amp;gt;&lt;/span&gt; &lt;span class=kc&gt;None&lt;/span&gt;&lt;span class=p&gt;:&lt;/span&gt;
        &lt;span class=n&gt;_close&lt;/span&gt;&lt;span class=p&gt;(&lt;/span&gt;&lt;span class=bp&gt;self&lt;/span&gt;&lt;span class=o&gt;.&lt;/span&gt;&lt;span class=n&gt;_fd&lt;/span&gt;&lt;span class=p&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Note that the main technique here is not &lt;em&gt;necessarily&lt;/em&gt; using &lt;code&gt;NewType&lt;/code&gt;
specifically, but rather aligning an instance’s property of “has all attributes
set” as closely as possible with an instance’s property of “fully valid
instance of its class”; &lt;code&gt;NewType&lt;/code&gt; is just a handy tool to enforce any necessary
constraints on the places where you need to use a primitive type like &lt;code&gt;int&lt;/code&gt;,
&lt;code&gt;str&lt;/code&gt; or &lt;code&gt;bytes&lt;/code&gt;.&lt;/p&gt;
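&lt;p&gt;To make the “zero run-time overhead” point concrete, here is a small,
self-contained sketch, using a hypothetical &lt;code&gt;close_fd&lt;/code&gt; function, of what
&lt;code&gt;NewType&lt;/code&gt; does and does not do:&lt;/p&gt;

```python
from typing import NewType

FileDescriptor = NewType("FileDescriptor", int)


def close_fd(fd: FileDescriptor) -> None:
    # Hypothetical function that only accepts blessed descriptors.
    pass


close_fd(FileDescriptor(3))  # OK: the conversion point is explicit
close_fd(3)                  # static error under mypy, but runs fine:
                             # NewType imposes no run-time check at all

# Zero overhead: FileDescriptor is the identity function at run time.
assert FileDescriptor(3) == 3
assert type(FileDescriptor(3)) is int
```

&lt;p&gt;The checker-only nature of that error is the whole trade: all of the
enforcement at analysis time, none of the cost at run time.&lt;/p&gt;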
&lt;h2 id=in-summary-the-new-best-practice&gt;In Summary - The New Best Practice&lt;/h2&gt;
&lt;p&gt;From now on, when you’re defining a new Python class:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Make it a dataclass&lt;sup id=fnref:2:stop-writing-init-methods-2025-4&gt;&lt;a class=footnote-ref href=#fn:2:stop-writing-init-methods-2025-4 id=fnref:2&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;li&gt;Use its default &lt;code&gt;__init__&lt;/code&gt; method&lt;sup id=fnref:3:stop-writing-init-methods-2025-4&gt;&lt;a class=footnote-ref href=#fn:3:stop-writing-init-methods-2025-4 id=fnref:3&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;@classmethod&lt;/code&gt;s to provide your users convenient and discoverable ways to
  build your objects.&lt;/li&gt;
&lt;li&gt;Require that &lt;em&gt;all&lt;/em&gt; dependencies be satisfied by attributes, so you always
  start with a valid object.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;typing.NewType&lt;/code&gt; to enforce any constraints on primitive data types (like
  &lt;code&gt;int&lt;/code&gt; and &lt;code&gt;str&lt;/code&gt;) which might have magical external attributes, like needing
  to come from a particular library, needing to be random, and so on.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you define all your classes this way, you will get all the benefits of a
custom &lt;code&gt;__init__&lt;/code&gt; method:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;All consumers of your data structures will receive valid objects, because an
  object with all its attributes populated correctly is inherently valid.&lt;/li&gt;
&lt;li&gt;Users of your library will be presented with convenient ways to create your
  objects that do as much work as is necessary to make them easy to use, and
  they can discover these just by looking at the methods on your class itself.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Along with some nice new benefits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You will be future-proofed against new requirements for different ways that
    users may need to construct your object.&lt;/li&gt;
&lt;li&gt;If there are already multiple ways to instantiate your class, you can now
    give each of them a meaningful name; no need to have monstrosities like
    &lt;code&gt;def __init__(self, maybe_a_filename: int | str | None = None):&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Your test suite can always construct an object by satisfying all its
    dependencies; no need to monkey-patch anything when you can always call the
    type and never do any I/O or generate any side effects.&lt;/li&gt;
&lt;/ul&gt;
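&lt;p&gt;That last, test-suite benefit is worth seeing directly.  In this abbreviated
sketch of the &lt;code&gt;FileReader&lt;/code&gt; above (methods omitted), the generated &lt;code&gt;__init__&lt;/code&gt;
does nothing but assign an attribute, so a test can construct a fully valid
object without ever touching the filesystem:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import NewType

FileDescriptor = NewType("FileDescriptor", int)


@dataclass
class FileReader:
    _fd: FileDescriptor
    # read() and close() omitted; they delegate to _read/_close as above.


# No open(), no I/O, no monkey-patching: just satisfy the dependency.
reader = FileReader(FileDescriptor(42))
assert reader._fd == 42
```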
&lt;p&gt;Before dataclasses, it was always a bit weird that such a basic feature of the
Python language — giving data to a data structure to make it valid — required
overriding a method with 4 underscores in its name.  &lt;code&gt;__init__&lt;/code&gt; stuck out like
a sore thumb.  Other such methods like &lt;code&gt;__add__&lt;/code&gt; or even &lt;code&gt;__repr__&lt;/code&gt; were
inherently customizing esoteric attributes of classes.&lt;/p&gt;
&lt;p&gt;For many years now, that historical language wart has been
resolved. &lt;code&gt;@dataclass&lt;/code&gt;, &lt;code&gt;@classmethod&lt;/code&gt;, and &lt;code&gt;NewType&lt;/code&gt; give you everything you
need to build classes which are convenient, idiomatic, flexible, testable, and
robust.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=acknowledgments&gt;Acknowledgments&lt;/h2&gt;
&lt;p class=update-note&gt;Thank you to &lt;a href="/pages/patrons.html"&gt;my patrons&lt;/a&gt; who are supporting my writing on
this blog.  If you like what you’ve read here and you’d like to read more of
it, or you’d like to support my &lt;a href="https://github.com/glyph/"&gt;various open-source
endeavors&lt;/a&gt;, you can &lt;a href="/pages/patrons.html"&gt;support my work as a
sponsor&lt;/a&gt;!  I am also &lt;a href=mailto:consulting@glyph.im&gt;available for
consulting work&lt;/a&gt; if you think your organization
could benefit from expertise on topics like “but what &lt;em&gt;is&lt;/em&gt; a ‘class’, really?”.&lt;/p&gt;
&lt;div class=footnote&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=fn:1:stop-writing-init-methods-2025-4&gt;
&lt;p id=fn:1&gt;If you aren’t already familiar, a “file descriptor” is an integer which
has meaning only within your program; you tell the operating system to open
a file, it says “I have opened file 7 for you”, and then whenever you refer
to “7” it is that file, until you &lt;code&gt;close(7)&lt;/code&gt;. &lt;a class=footnote-backref href=#fnref:1:stop-writing-init-methods-2025-4 title="Jump back to footnote 1 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:2:stop-writing-init-methods-2025-4&gt;
&lt;p id=fn:2&gt;Or an &lt;a href="https://blog.glyph.im/2016/08/attrs.html"&gt;attrs class&lt;/a&gt;, if you’re nasty. &lt;a class=footnote-backref href=#fnref:2:stop-writing-init-methods-2025-4 title="Jump back to footnote 2 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=fn:3:stop-writing-init-methods-2025-4&gt;
&lt;p id=fn:3&gt;Unless you have a really good reason to, of course.  Backwards
compatibility, or compatibility with another library, might be good reasons
to do that.  Or certain types of data-consistency validation which cannot
be expressed within the type system.  The most common example of these
would be a class that requires consistency &lt;em&gt;between&lt;/em&gt; two different fields,
such as a “range” object where &lt;code&gt;start&lt;/code&gt; must always be less than &lt;code&gt;end&lt;/code&gt;.
There are always exceptions to these types of rules.  Still, it’s pretty
much &lt;em&gt;never&lt;/em&gt; a good idea to do any I/O in &lt;code&gt;__init__&lt;/code&gt;, and nearly all of the
remaining stuff that may &lt;em&gt;sometimes&lt;/em&gt; be a good idea in edge-cases can be
achieved with a
&lt;a href="https://docs.python.org/3.13/library/dataclasses.html#dataclasses.__post_init__"&gt;&lt;code&gt;__post_init__&lt;/code&gt;&lt;/a&gt;
rather than writing a literal &lt;code&gt;__init__&lt;/code&gt;. &lt;a class=footnote-backref href=#fnref:3:stop-writing-init-methods-2025-4 title="Jump back to footnote 3 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
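&lt;p&gt;For instance, that “range” consistency check might be sketched like so, with
the check running after the generated &lt;code&gt;__init__&lt;/code&gt; has assigned both fields:&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Range:
    start: int
    end: int

    def __post_init__(self) -> None:
        # Cross-field consistency the type system cannot express.
        if self.start >= self.end:
            raise ValueError(f"start must be less than end: {self!r}")


span = Range(1, 10)   # fine
# Range(10, 1) raises ValueError
```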
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;&lt;/body&gt;</content><category term="misc"></category><category term="python"></category><category term="programming"></category></entry><entry><title>A Bigger Database</title><link href="https://blog.glyph.im/2025/03/a-bigger-database.html" rel="alternate"></link><published>2025-03-31T17:47:00-07:00</published><updated>2025-03-31T17:47:00-07:00</updated><author><name>Glyph</name></author><id>tag:blog.glyph.im,2025-03-31:/2025/03/a-bigger-database.html</id><summary type="html">&lt;p&gt;We do what we can, because we must.&lt;/p&gt;</summary><content type="html">&lt;body&gt;&lt;h2 id=a-database-file&gt;A Database File&lt;/h2&gt;
&lt;p&gt;When I was 10 years old, and going through a fairly difficult time, I was lucky
enough to come into the possession of a piece of software called Claris
FileMaker Pro™.&lt;/p&gt;
&lt;p&gt;FileMaker allowed its users to construct arbitrary databases, and to associate
their tables with a customized visual presentation.  FileMaker also had a
rudimentary scripting language, which would allow users to imbue these
databases with behavior.&lt;/p&gt;
&lt;p&gt;As a mentally ill pre-teen, lacking a sense of control over anything or anyone
in my own life, including myself, I began building a personalized database to
catalogue the various objects and people in my immediate vicinity.  If one were
inclined to be generous, one might assess this behavior and say I was
systematically taxonomizing the objects in my life and recording schematized
information about them.&lt;/p&gt;
&lt;p&gt;As I saw it at the time, if I collected the information, I could always use it
later, to answer questions that I might have.  If I &lt;em&gt;didn’t&lt;/em&gt; collect it, then
what if I needed it?  Surely I would regret it!  Thus I developed a categorical
imperative to spend as much of my time as possible collecting and entering data
about everything that I could reasonably arrange into a common schema.&lt;/p&gt;
&lt;p&gt;Having thus summoned this specter of regret for all lost data-entry
opportunities, it was hard to dismiss.  We might label it “Claris’s Basilisk”,
for obvious reasons.&lt;/p&gt;
&lt;p&gt;Therefore, a less-generous (or more clinically-minded) observer might have
replaced the word “systematically” with “obsessively” in the assessment above.&lt;/p&gt;
&lt;p&gt;I also began writing what scripts were within my marginal programming abilities
at the time, just because I could: things like computing the sum of every
street number of every person in my address book.  Why was this useful?  Wrong
question: the right question is “was it &lt;em&gt;possible&lt;/em&gt;”, to which my answer was
“yes”.&lt;/p&gt;
&lt;p&gt;If I was obliged to collect all the information which I could observe — in case
it later became interesting — I was similarly obliged to write and run every
program I could.  It might, after all, emit some &lt;em&gt;other&lt;/em&gt; interesting
information.&lt;/p&gt;
&lt;p&gt;I was an avid reader of science fiction as well.&lt;/p&gt;
&lt;p&gt;I had this vague sense that computers could kind of think.  This resulted in a
chain of reasoning that went something like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;human brains are kinda like computers,&lt;/li&gt;
&lt;li&gt;the software running in the human brain is very complex,&lt;/li&gt;
&lt;li&gt;I could only write simple computer programs, &lt;em&gt;but&lt;/em&gt;,&lt;/li&gt;
&lt;li&gt;when you really think about it, a “complex” program is just a collection of
   simpler programs&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Therefore: if I just kept collecting data, collecting smaller programs that
could solve &lt;em&gt;specific&lt;/em&gt; problems, and connecting them all together in one big
file, &lt;em&gt;eventually&lt;/em&gt; the database as a whole would become self-aware and could
solve &lt;strong&gt;whatever&lt;/strong&gt; problem I wanted.  I just needed to be patient; to “keep
grinding” as the kids would put it today.&lt;/p&gt;
&lt;p&gt;I still feel like this is an understandable way to think — &lt;em&gt;if&lt;/em&gt; you are a
highly depressed and anxious 10-year-old in 1990.&lt;/p&gt;
&lt;p&gt;Anyway.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=35-years-later&gt;35 Years Later&lt;/h2&gt;
&lt;p&gt;OpenAI is a company that produces transformer architecture machine learning
generative AI models; their current generation was trained on about 10 trillion
words, obtained in a variety of different ways from a &lt;a href="https://www.artnews.com/art-news/news/books-database-libgen-meta-ai-training-andy-warhol-ai-weiwei-1234736227/"&gt;large
variety&lt;/a&gt;
of different, unrelated sources.&lt;/p&gt;
&lt;p&gt;A few days ago, on March 26, 2025 at 8:41 AM Pacific Time, Sam Altman took to
“X™, The Everything App™,” and described the trajectory of his career of the
last decade at OpenAI as, and I quote, a “&lt;a href="https://x.com/sama/status/1904921537884676398"&gt;grind for a decade trying to help
make super-intelligence to cure cancer &lt;strong&gt;or
whatever&lt;/strong&gt;&lt;/a&gt;” (emphasis mine).&lt;/p&gt;
&lt;p&gt;I really, really don’t want to become a &lt;a href="https://blog.glyph.im/2024/05/grand-unified-ai-hype.html"&gt;full-time AI
skeptic&lt;/a&gt;, and I am not an expert here,
but I feel like I can identify a logically flawed premise when I see one.&lt;/p&gt;
&lt;p&gt;This is not a system-design strategy.  It is a trauma response.&lt;/p&gt;
&lt;p&gt;You can’t cure cancer “or whatever”. If you want to build a computer system
that does some thing, you actually need to hire experts &lt;em&gt;in that thing&lt;/em&gt;, and
have them work to both design and validate that the system is fit for the
purpose of that thing.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=aside-but-are-they-though&gt;Aside: But... &lt;em&gt;are&lt;/em&gt; they, though?&lt;/h2&gt;
&lt;p&gt;I am not an oncologist; I do not particularly want to be writing about the
&lt;em&gt;specifics&lt;/em&gt; here, but, if I am going to make a claim like “you can’t cure
cancer this way” I need to back it up.&lt;/p&gt;
&lt;p&gt;My first argument — and possibly my strongest — is that cancer is not cured.&lt;/p&gt;
&lt;p&gt;QED.&lt;/p&gt;
&lt;p&gt;But I guess, to Sam’s credit, &lt;a href="https://www.color.com/blog/colors-copilot-and-partnership-with-openai"&gt;there is at least one other company &lt;em&gt;partnering&lt;/em&gt;
with
OpenAI&lt;/a&gt;
to do things that &lt;em&gt;are&lt;/em&gt; specifically related to cancer. However, that company
is still in a self-described “initial phase” and it’s &lt;a href="https://www.patientpower.info/navigating-cancer/can-chatgpt-provide-reliable-cancer-info"&gt;not entirely
clear&lt;/a&gt;
that it is going to work out very well.&lt;/p&gt;
&lt;p&gt;Almost everything I can find about it online was from a PR push in the middle
of last year, so it all reads like a press release.  I can’t easily find any
independently-verified information.&lt;/p&gt;
&lt;p&gt;A lot of AI hype is &lt;a href="https://blog.glyph.im/2024/05/grand-unified-ai-hype.html"&gt;like this&lt;/a&gt;.  A
promising demo is delivered; claims are made that surely if the technology can
solve &lt;em&gt;this&lt;/em&gt; small part of the problem now, &lt;a href="https://www.jalopnik.com/elon-musk-tesla-self-driving-cars-anniversary-autopilot-1850432357/"&gt;within 5
years&lt;/a&gt;
surely it will be able to solve everything else as well!&lt;/p&gt;
&lt;p&gt;But even the light-on-content puff-pieces tend to hedge &lt;em&gt;quite&lt;/em&gt; a lot.  For
example, as &lt;a href="https://www.wsj.com/articles/openai-expands-healthcare-push-with-color-healths-cancer-copilot-86594ff1#:~:text=The%20most%20promising%20use%20of%20AI%20in%20healthcare%20right%20now%20is%20automating%20“mundane”%20tasks"&gt;the Wall Street Journal quoted one of the users initially testing
it&lt;/a&gt;
(emphasis mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;The most promising use of AI in healthcare right now is automating
“mundane” tasks like paperwork and physician note-taking&lt;/em&gt;&lt;/strong&gt;, he said. The
tendency for AI models to “hallucinate” and contain bias presents serious
risks for using AI to replace doctors. &lt;strong&gt;Both Color’s Laraki and OpenAI’s
Lightcap are adamant that doctors be involved in any clinical decisions.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I would probably not personally characterize “‘mundane’ tasks like paperwork
and … note-taking” as “curing cancer”.  Maybe an oncologist could use &lt;a href="https://github.com/glyph/"&gt;some
code I developed&lt;/a&gt; too; even if it helped them, I
wouldn’t be stealing valor from them on the curing-cancer part of their job.&lt;/p&gt;
&lt;p&gt;Even fully giving it the benefit of the doubt that it works great, and improves
patient outcomes significantly, this is medical back-office software. It is not
super-intelligence.&lt;/p&gt;
&lt;p&gt;It would not even matter if it &lt;em&gt;were&lt;/em&gt; “super-intelligence”, whatever that
means, because “intelligence” is &lt;a href="https://www.researchgate.net/publication/281609788_Intelligence_quotient_analysis_and_its_association_with_academic_performance_of_medical_students"&gt;not how you do medical
care&lt;/a&gt;
or medical research.  It’s called “lab work” not “lab think”.&lt;/p&gt;
&lt;p&gt;To put a fine point on it: biomedical research &lt;em&gt;fundamentally&lt;/em&gt; cannot be done
entirely by reading papers or processing existing information.  It cannot even
be done by testing drugs in computer simulations.&lt;/p&gt;
&lt;p&gt;Biological systems are enormously complex, and medical research on new
therapies inherently &lt;em&gt;requires&lt;/em&gt; careful, repeated empirical testing to validate
the correspondence of existing research with reality.  Not “an experiment”, but
a series of &lt;em&gt;coordinated&lt;/em&gt; experiments that all test the same theoretical model.
The data (which, in an LLM context, is “training data”) might just be &lt;em&gt;wrong&lt;/em&gt;;
it may not reflect reality, and the only way to tell is to continuously verify
it &lt;em&gt;against&lt;/em&gt; reality.&lt;/p&gt;
&lt;p&gt;Previous observations can be tainted by methodological errors, by data fraud,
and by operational mistakes by practitioners.  If there were a way to do
verifiable development of new disease therapies without the extremely expensive
ladder going from cell cultures to animal models to human trials, &lt;em&gt;we would
already be doing it&lt;/em&gt;, and “AI” would just be an improvement to efficiency of
that process.  But there &lt;em&gt;is no way to do that&lt;/em&gt; and nothing about the
technologies involved in LLMs is going to change that fact.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=knowing-things&gt;Knowing Things&lt;/h2&gt;
&lt;p&gt;The practice of science — indeed any practice of the collection of &lt;em&gt;meaningful&lt;/em&gt;
information — must be done by intentionally and carefully selecting inclusion
criteria, methodically and repeatedly curating our data, building a model that
operates &lt;em&gt;according to rules we understand and can verify&lt;/em&gt;, and verifying the
data itself with repeated tests against nature.  We cannot just hoover up
whatever information happens to be conveniently available with no human
intervention and &lt;em&gt;hope&lt;/em&gt; it resolves to a correct model of reality by accident.
We need to look &lt;a href="https://en.wikipedia.org/wiki/Streetlight_effect"&gt;where the keys are, not where the light
is&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Piling up more and more information in a haphazard and increasingly precarious
pile will not allow us to climb to the top of that pile, all the way to heaven,
so that we can attack and dethrone God.&lt;/p&gt;
&lt;p&gt;Eventually, we’ll just run out of disk space, and then lose the database file
when the family gets a new computer anyway.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=acknowledgments&gt;Acknowledgments&lt;/h2&gt;
&lt;p class=update-note&gt;Thank you to &lt;a href="/pages/patrons.html"&gt;my patrons&lt;/a&gt; who are supporting my writing on
this blog.  If you like what you’ve read here and you’d
like to read more of it, or you’d like to support my &lt;a href="https://github.com/glyph/"&gt;various open-source
endeavors&lt;/a&gt;, you can &lt;a href="/pages/patrons.html"&gt;support my work as a
sponsor&lt;/a&gt;!  Special thanks also to Itamar Turner-Trauring
and Thomas Grainger for pre-publication feedback on this article; any errors of
course remain my own.&lt;/p&gt;&lt;/body&gt;</content><category term="misc"></category><category term="ai"></category><category term="llm"></category><category term="memoir"></category></entry></feed>