<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:thoughtbot="https://thoughtbot.com/feeds/">
  <title>Giant Robots Smashing Into Other Giant Robots</title>
  <subtitle>Written by thoughtbot, your expert partner for design and development.
</subtitle>
  <id>https://robots.thoughtbot.com/</id>
  <link href="https://thoughtbot.com/blog"/>
  <link href="https://feed.thoughtbot.com" rel="self"/>
  <updated>2026-06-01T00:00:00+00:00</updated>
  <author>
    <name>thoughtbot</name>
  </author>
<entry>
  <title>The Four Signals of AI Observability</title>
  <link rel="alternate" href="https://thoughtbot.com/blog/the-four-signals-of-ai-observability"/>
  <author>
    <name>Matheus Sales</name>
  </author>
  <id>https://thoughtbot.com/blog/the-four-signals-of-ai-observability</id>
  <published>2026-06-01T00:00:00+00:00</published>
  <updated>2026-05-29T18:05:52Z</updated>
  <content type="html">&lt;p&gt;A few months ago we shipped a chat experience to production. Users ask a
question, our app routes it through an LLM, the model calls a few internal
tools, and an answer comes back.&lt;/p&gt;

&lt;p&gt;It worked. Sort of.&lt;/p&gt;

&lt;p&gt;When the model answered well, we had no idea why. When it answered badly, we had
no idea either. The model was a black box attached to our app, and our best
debugging tool was reading logs and guessing.&lt;/p&gt;

&lt;p&gt;We realized our app could not answer a very normal operational question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Show us every chat where the user said the answer was bad, group them by which
version of the system prompt was loaded, and let us read the whole
conversation, including which tools the model called.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It’s the AI equivalent of “show me every 500 error on this endpoint after deploy X.”
But our app couldn’t answer it.&lt;/p&gt;

&lt;p&gt;That was the trigger to stop looking for a smarter model and start looking to
add an observability layer. We ended up using &lt;a href="https://langfuse.com/"&gt;Langfuse&lt;/a&gt;, but the specific vendor
matters less than the capabilities. Helicone, Arize Phoenix, LangSmith, and
Braintrust all solve versions of the same problem.&lt;/p&gt;

&lt;p&gt;After a couple of months of iteration, we noticed that the things we needed came
in four flavors. I call them the four signals that every AI feature needs to
emit about itself.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A version on every prompt.&lt;/strong&gt; Which exact words did the model see today?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A trace shaped like the actual work.&lt;/strong&gt; What did it call, in what order,
with what arguments?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A score from the user.&lt;/strong&gt; Did the human like the result?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A score from another model.&lt;/strong&gt; When the human is quiet, who is grading?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Of course we can build an AI feature without all four. We just can’t improve it on purpose.&lt;/p&gt;
&lt;h2 id="a-version-on-every-prompt"&gt;
  
    A version on every prompt
  
&lt;/h2&gt;

&lt;p&gt;The first thing we did was move every prompt out of the code and into a
versioned store the app fetches at runtime.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The code never references a version. It asks for a label.&lt;/span&gt;
&lt;span class="n"&gt;template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;PromptRepo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;name: &lt;/span&gt;&lt;span class="s2"&gt;"classify_question"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;label: &lt;/span&gt;&lt;span class="s2"&gt;"production"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# A human moves "production" between versions in the Langfuse UI.&lt;/span&gt;
&lt;span class="c1"&gt;# Promotion is a click. Rollback is a click. No deploy.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The first time we rolled back a bad prompt by clicking a button instead of reverting a PR and waiting for CI, we knew this was the right shape.&lt;/p&gt;

&lt;p&gt;Once prompts became content, the people closest to the problem became the people writing the prompts.
The feedback loop got much shorter, and the quality went up.&lt;/p&gt;
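&lt;p&gt;To make the label idea concrete, here is a minimal sketch of a store where a label is only a pointer to a version. The &lt;code&gt;PromptStore&lt;/code&gt; class and its methods are hypothetical. They are not the Langfuse API, and a real store would persist versions somewhere durable.&lt;/p&gt;

```ruby
# Hypothetical in-memory prompt store. It is not the Langfuse API, only
# an illustration of resolving a prompt through a movable label.
class PromptStore
  def initialize
    @versions = Hash.new { |hash, name| hash[name] = [] } # name => [text, ...]
    @labels = {}                                          # [name, label] => version number
  end

  # Saves a new immutable version and returns its 1-based number.
  def publish(name, text)
    @versions[name].push(text)
    @versions[name].length
  end

  # Points a label at a version. Promotion and rollback are the same call.
  def move_label(name, label, version)
    unless version.between?(1, @versions[name].length)
      raise ArgumentError, "unknown version #{version}"
    end
    @labels[[name, label]] = version
  end

  # Resolves a label to the version and text it currently points at.
  def fetch(name, label:)
    version = @labels.fetch([name, label])
    { name: name, version: version, text: @versions[name][version - 1] }
  end
end

store = PromptStore.new
v1 = store.publish("classify_question", "Classify the question into a category.")
v2 = store.publish("classify_question", "Classify the question. Answer with one word.")

store.move_label("classify_question", "production", v2) # promote
store.move_label("classify_question", "production", v1) # roll back, no deploy
current = store.fetch("classify_question", label: "production")
```

&lt;p&gt;Because callers only ever ask for the label, moving it changes what they receive without touching application code.&lt;/p&gt;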
&lt;h2 id="a-trace-shaped-like-the-actual-work"&gt;
  
    A trace shaped like the actual work
  
&lt;/h2&gt;

&lt;p&gt;A chat is not a single call. It is a small program. Classify the question, load the
right prompt, call a tool or two, then compose an answer.&lt;/p&gt;

&lt;p&gt;If your trace is one row, you only know that something happened. If your trace
is a tree of calls, you know what actually happened, and you have a database of
decisions the model made.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Before: one log line, no shape
[INFO] chat_completed user_id=123 duration_ms=4200 tokens=1840

# After: a tree of decisions
trace: "chat"
  span:       load-prompt                  (version=production:v12)
  generation: classify-question            (model=haiku, category="billing")
  generation: compose-answer
    span:       tool-call.lookup_invoice   (200ms)
    span:       tool-call.lookup_customer  (180ms)
  generation: final-response               (model=sonnet, 1.2k tokens)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Each node carries the prompt name and version, the model id, token usage, and a
set of metadata fields we control: the customer it ran for, the category the
question was classified as, which tools ran, and whether the conversation was new.&lt;/p&gt;

&lt;p&gt;That metadata is the part that turned out to matter most.&lt;/p&gt;

&lt;p&gt;The first time we filtered traces to “every chat in scope X where a
particular tool ran and the user said the answer was bad”, we had a small
realization. The trace list was not a log anymore. It was a queryable database
of decisions the model made.&lt;/p&gt;

&lt;p&gt;The rule we would write on a sticky note: &lt;strong&gt;tag your traces with the dimensions
you will want to filter on later&lt;/strong&gt;. It is cheap up front and impossible to
backfill once you wish you had it.&lt;/p&gt;
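&lt;p&gt;As a rough illustration of why the tags matter, the sketch below models a trace as a small tree with metadata and then filters on it. The &lt;code&gt;Node&lt;/code&gt; and &lt;code&gt;Trace&lt;/code&gt; structs and every field name are assumptions made up for this example. They do not mirror any vendor’s schema.&lt;/p&gt;

```ruby
# Hypothetical trace model. The field names are illustrative only.
Node = Struct.new(:kind, :name, :metadata, :children, keyword_init: true) do
  # Returns this node followed by all of its descendants, depth first.
  def flatten_nodes
    [self] + children.flat_map { |child| child.flatten_nodes }
  end
end

Trace = Struct.new(:id, :metadata, :root, keyword_init: true)

# Small helper so the sample data below stays readable.
def node(kind, name, metadata = {}, children = [])
  Node.new(kind: kind, name: name, metadata: metadata, children: children)
end

traces = [
  Trace.new(
    id: "t1",
    metadata: { category: "billing", prompt_version: 12, new_conversation: true },
    root: node(:trace, "chat", {}, [
      node(:span, "load-prompt"),
      node(:generation, "classify-question"),
      node(:generation, "compose-answer", {}, [
        node(:span, "tool-call.lookup_invoice"),
        node(:span, "tool-call.lookup_customer")
      ])
    ])
  ),
  Trace.new(
    id: "t2",
    metadata: { category: "shipping", prompt_version: 12, new_conversation: false },
    root: node(:trace, "chat", {}, [node(:generation, "compose-answer")])
  )
]

# Every billing chat in which the lookup_invoice tool ran.
matches = traces.select do |trace|
  in_scope = trace.metadata[:category] == "billing"
  ran_tool = trace.root.flatten_nodes.any? { |n| n.name == "tool-call.lookup_invoice" }
  in_scope and ran_tool
end
```

&lt;p&gt;The filter only works because the category was written onto the trace when it was created.&lt;/p&gt;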
&lt;h2 id="a-score-from-the-user"&gt;
  
    A score from the user
  
&lt;/h2&gt;

&lt;p&gt;Every assistant message in the UI has a thumbs up and a thumbs down. When a user
clicks one, we save a row and post it back to the observability tool as a score
on the trace.&lt;/p&gt;

&lt;p&gt;A thumbs-down on its own isn’t actionable. A thumbs-down attached to a trace tells
you what the model saw, what it called, which prompt version produced it, and what category the
question fell into. Now you can ask: are downvotes concentrated in one category? On one prompt version? After one specific tool call?&lt;/p&gt;

&lt;p&gt;You should review downvoted traces. It takes time, and sometimes they’re noise: the user wanted something we don’t support
or hit thumbs-down by accident. But maybe one in ten is a real signal, and that’s the one that turns into a prompt change,
a new tool, or a bug fix.&lt;/p&gt;

&lt;p&gt;The point of all this plumbing is one new query.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Show us every trace a user labeled bad.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once you can run that query and read the entire conversation that produced it
(prompt version, tool calls, model, latency, everything), you stop the guessing
game.&lt;/p&gt;
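&lt;p&gt;Here is a small sketch of that query, assuming scores are stored as rows that point at a trace id. The row shapes and the &lt;code&gt;user_feedback&lt;/code&gt; score name are hypothetical. In practice the rows would come from the observability tool rather than from literals.&lt;/p&gt;

```ruby
# Hypothetical trace metadata keyed by trace id.
traces = {
  "t1" => { prompt_version: 11, category: "billing" },
  "t2" => { prompt_version: 12, category: "billing" },
  "t3" => { prompt_version: 12, category: "shipping" },
  "t4" => { prompt_version: 12, category: "billing" }
}

# One row per thumbs click: 1 is thumbs up, 0 is thumbs down.
scores = [
  { trace_id: "t1", name: "user_feedback", value: 1 },
  { trace_id: "t2", name: "user_feedback", value: 0 },
  { trace_id: "t3", name: "user_feedback", value: 1 },
  { trace_id: "t4", name: "user_feedback", value: 0 }
]

# "Show us every trace a user labeled bad."
bad_ids = scores.select { |s| s[:name] == "user_feedback" }
                .select { |s| s[:value].zero? }
                .map { |s| s[:trace_id] }

# Are the downvotes concentrated on one prompt version or category?
downvotes_by_version = bad_ids.map { |id| traces.fetch(id)[:prompt_version] }.tally
downvotes_by_category = bad_ids.map { |id| traces.fetch(id)[:category] }.tally
```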
&lt;h2 id="a-score-from-another-model"&gt;
  
    A score from another model
  
&lt;/h2&gt;

&lt;p&gt;Human feedback is useful but rare. Most users do not click anything.&lt;/p&gt;

&lt;p&gt;So we added a second model to grade the first one. A background job pulls
finished chats, runs them through a separate “judge” prompt (versioned and
labeled in the same store as the production prompts), and writes the result
back as a score on the same trace.&lt;/p&gt;

&lt;p&gt;Now the trace carries two streams of judgment. When the user and the judge
agree, our judge is in sync with real users. When they disagree, that is the
most interesting trace in the system. Either way, the judge runs on every chat,
so a regression shows up the same day we ship the prompt that caused it, not a
week later when somebody complains.&lt;/p&gt;

&lt;p&gt;Our judge scores things like factuality, instruction-following, completeness,
hallucination, and whether the assistant actually used the right internal context.&lt;/p&gt;

&lt;p&gt;We underestimated this one. A judge that catches a regression before it ships
is worth more than a faster or smarter model. It is the only signal that scales
when nobody is clicking thumbs.&lt;/p&gt;

&lt;p&gt;The lesson we had to learn: the judge is just a prompt. It can be wrong. It
needs versioning and a Playground and a rollback button, exactly like a
user-facing prompt.&lt;/p&gt;
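&lt;p&gt;One way to check whether the judge can be trusted is to compare it with the users on the traces both of them graded. The sketch below assumes each stream reduces to a good or bad verdict per trace id. The data and variable names are invented for illustration.&lt;/p&gt;

```ruby
# Hypothetical verdicts keyed by trace id. 1 means good, 0 means bad.
# The judge covers every chat; users only click on some of them.
judge_scores = { "t1" => 1, "t2" => 0, "t3" => 1, "t4" => 1 }
user_scores  = { "t2" => 0, "t4" => 0 }

# Traces that both graded, split by whether the two verdicts match.
both = judge_scores.keys.select { |id| user_scores.key?(id) }
agreed, disagreed = both.partition { |id| judge_scores[id] == user_scores[id] }

# Share of doubly graded traces where the judge matched the user.
agreement_rate = both.empty? ? nil : agreed.length.fdiv(both.length)
```

&lt;p&gt;The ids in &lt;code&gt;disagreed&lt;/code&gt; are the traces worth reading first, and a falling agreement rate is a hint that the judge prompt itself needs a new version.&lt;/p&gt;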

&lt;figure&gt;
  &lt;img src="https://images.thoughtbot.com/8exq7pktql71m95hlwd2jd0m457q_diagram.png" alt="A diagram showing the four signals of AI observability: prompt version, trace, user score, and judge score."&gt;
  &lt;figcaption style="text-align:center;"&gt;
    Each signal writes back to the same trace. That’s the whole trick.
  &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="four-signals-one-idea"&gt;
  
    Four signals, one idea
  
&lt;/h2&gt;

&lt;p&gt;The four signals overlap, and that’s on purpose. The prompt version shows up on
the trace. The user score attaches to the trace. The judge score attaches to
the trace too. They are not really four separate things. They are the same idea
viewed from four different angles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make the AI feature observable, then you can change it on purpose.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For a while I treated AI features like a different category of software: less debuggable,
less testable, less under our control. An AI feature is software. It has inputs, makes decisions, produces outputs,
and can be observed like anything else.&lt;/p&gt;

&lt;p&gt;What changes once you have all four isn’t that the model gets smarter. It’s that you stop hoping. You ship a prompt
change knowing the judge will tell you if it regressed. You read a downvote knowing you can replay the exact conversation
that produced it. You promote a new prompt to production knowing you can roll it back in one click if it breaks.&lt;/p&gt;

&lt;p&gt;The model is the engine. The observability layer is the dashboard. You can drive without one. You just can’t drive on purpose.&lt;/p&gt;

&lt;aside class="related-articles"&gt;&lt;h2&gt;If you enjoyed this post, you might also like:&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thoughtbot.com/blog/how-to-use-chatgpt-to-find-custom-software-consultants"&gt;How to Use ChatGPT to Find Custom Software Consultants&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thoughtbot.com/blog/using-machine-learning-to-answer-questions-from-internal-documentation"&gt;Using Machine Learning to Answer Questions from Internal Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thoughtbot.com/blog/priority-determines-product"&gt;Priority Determines Product&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/aside&gt;
</content>
  <summary>Treat your AI feature like software you can watch, not a model you hope works.</summary>
  <thoughtbot:auto_social_share>true</thoughtbot:auto_social_share>
</entry>
<entry>
  <title>Can you really launch a tech business with a no-code app builder?</title>
  <link rel="alternate" href="https://thoughtbot.com/blog/can-you-really-launch-a-tech-business-with-a-no-code-app-builder"/>
  <author>
    <name>Michelle Taute</name>
  </author>
  <id>https://thoughtbot.com/blog/can-you-really-launch-a-tech-business-with-a-no-code-app-builder</id>
  <published>2026-05-29T00:00:00+00:00</published>
  <updated>2026-05-26T17:44:15Z</updated>
  <content type="html">&lt;p&gt;Sometimes AI really does feel like magic, and right now just about any no-code app builder provides that magical experience for entrepreneurs with big ideas but no coding skills.&lt;/p&gt;

&lt;p&gt;You can spend half an hour typing a few details about your app into Lovable and create a simple, functional app with a polished user interface. Then add six new features over the weekend without ever hiring a developer or talking to a real user.&lt;/p&gt;

&lt;p&gt;Besides Lovable, there’s Bolt, Base44, Bubble, Replit and others. These AI app generators allow just about anyone to build software by describing it through text-based prompts. No technical knowledge, &lt;a href="https://thoughtbot.com/blog/using-a-design-sprint-to-find-focus-for-an-ai-solution"&gt;design sprint&lt;/a&gt; or VC funding required. Many even allow you to export the code, so you can continue your project outside the tool.&lt;/p&gt;

&lt;p&gt;It’s exciting to mock up ideas so quickly, but can you launch a real business with a no-code app builder? Are non-developers really creating high-quality, production-ready code?&lt;/p&gt;
&lt;h2 id="making-money-with-ai-built-apps"&gt;
  
    Making money with AI-built apps
  
&lt;/h2&gt;

&lt;p&gt;Startup culture loves to celebrate the outliers, the big, against-the-odds success stories. Lovable’s own ad campaign hypes ShiftNex, a healthcare staffing platform that hit $1 million annual recurring revenue (ARR) in five months, and Plinq, a background check app for dating built in 45 days that achieved $465,000 in ARR.&lt;/p&gt;

&lt;p&gt;Not to be left out, Replit publicizes GEN AIPI, an AI education platform with training courses, payments, certifications and an admin system that was originally built in just three days with no dev team. The business achieved $180,000 in revenue in the first six weeks.&lt;/p&gt;

&lt;p&gt;But the ad campaigns don’t include all the entrepreneurs who encountered big (or even impassable) roadblocks while chasing similar results. There’s often a point of diminishing returns with no-code app builders. At first, there’s instant gratification, but eventually, it becomes slower, harder and sometimes impossible to refine existing features or add new ones.&lt;/p&gt;
&lt;h2 id="are-ai-generated-apps-production-ready"&gt;
  
    Are AI-generated apps production ready?
  
&lt;/h2&gt;

&lt;p&gt;Unless you’re in a highly regulated industry, you can push software created with a no-code app builder to production. But that’s just the first challenge on the road to a stable and profitable tech business. Will performance hold up when you hit 1,000 or 10,000 users? Can you easily add new features? When do you add an engineering team? &lt;/p&gt;

&lt;p&gt;AI app generators allow you to bring an &lt;a href="https://thoughtbot.com/blog/how-to-launch-a-lovable-mvp-in-2026"&gt;MVP&lt;/a&gt; to life quickly, but scaling a successful, long-term tech business is much tougher. It’s nearly impossible for non-technical founders to evaluate potential risks in a code base or to even know what risks to look for in the first place. A big one is data security: How hard would it be for a bad actor to tap into sensitive data or user information?&lt;/p&gt;

&lt;p&gt;And even when a founder does identify a bug or issue, it can be hard for an AI app builder to solve. If you already have users and things start breaking, it can easily become a hair-on-fire emergency. One that might leave traditional developers trying to get up to speed on thousands of lines of code with no context.&lt;/p&gt;

&lt;p&gt;Then there’s an even bigger issue. Entrepreneurs are often so focused on what they can build that they forget to think about what they should build. With traditional development barriers gone, it’s easy to skip the strategic work of identifying a real problem to solve for real users. We’ve honed an entire &lt;a href="https://thoughtbot.com/services/shaping-sprint"&gt;Shaping Sprint&lt;/a&gt; process to work with founders on solidifying a product strategy and direction.&lt;/p&gt;

&lt;p&gt;Despite these concerns, no-code app builders may be a fit for small businesses in industries with relatively low volume, regulation and security risk. We’re just at the beginning of AI app builders evolving and growing in capability, so it’s impossible to know exactly what the future holds. &lt;/p&gt;
&lt;h2 id="no-code-app-builders-20"&gt;
  
    No-code app builders 2.0
  
&lt;/h2&gt;

&lt;p&gt;Right now, most software created by no-code app builders amounts to a prototype: useful for learning, but often difficult to scale into a successful long-term business. The current generation of tools is optimized for speed and instant gratification, but not necessarily for helping founders build the right product or make thoughtful product decisions along the way.&lt;/p&gt;

&lt;p&gt;We think there’s an opportunity for the next generation of AI product tools to evolve beyond pure “vibe coding.”&lt;/p&gt;

&lt;p&gt;Not just generating interfaces and features faster, but helping founders and teams:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Think through product direction&lt;/li&gt;
&lt;li&gt;Validate assumptions&lt;/li&gt;
&lt;li&gt;Prioritize the right problems&lt;/li&gt;
&lt;li&gt;And move from idea to real product more intentionally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s part of why our team has been experimenting publicly with new workflows and AI-assisted product design approaches through our &lt;a href="https://thoughtbot.com/blog/going-beyond-vibe-coding-with-readysetgo"&gt;ReadySetGo initiative&lt;/a&gt; and weekly &lt;a href="https://www.youtube.com/watch?v=gZv7-hOSHD8&amp;amp;list=PL8tzorAO7s0jaDFZYPAtR_AIHgD5J3a7d"&gt;AI in Focus livestream series&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We’re still early in exploring what this category could become, but one thing already feels clear: as AI lowers the barrier to building software, the ability to identify the &lt;i&gt;right&lt;/i&gt; thing to build may become even more important.&lt;/p&gt;

&lt;aside class="related-articles"&gt;&lt;h2&gt;If you enjoyed this post, you might also like:&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thoughtbot.com/blog/how-to-use-chatgpt-to-find-custom-software-consultants"&gt;How to Use ChatGPT to Find Custom Software Consultants&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thoughtbot.com/blog/from-idea-to-impact-the-role-of-rapid-prototyping-in-agetech"&gt;From idea to impact: The role of rapid prototyping in AgeTech&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thoughtbot.com/blog/using-machine-learning-to-answer-questions-from-internal-documentation"&gt;Using Machine Learning to Answer Questions from Internal Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/aside&gt;
</content>
  <summary>AI app builders promise to turn anyone into a founder overnight, but is the code actually production-ready? We look at the success stories, technical risks and a path forward.</summary>
  <thoughtbot:auto_social_share>true</thoughtbot:auto_social_share>
</entry>
<entry>
  <title>Giant Robots Podcast Ep 612:  Do fish drink?</title>
  <link rel="alternate" href="https://thoughtbot.com/blog/giant-robots-podcast-ep-612-do-fish-drink"/>
  <author>
    <name>Chad Pytel, Will Larry &amp;amp; Sami Birnbaum</name>
  </author>
  <id>https://thoughtbot.com/blog/giant-robots-podcast-ep-612-do-fish-drink</id>
  <published>2026-05-28T00:00:00+00:00</published>
  <updated>2026-05-28T14:12:14Z</updated>
  <content type="html">The Giant Robots trio are back to discuss the development of thoughtbot’s ReadySetGo app, and whether AI might be causing developers to go backwards.</content>
  <summary>The Giant Robots trio are back to discuss the development of thoughtbot’s ReadySetGo app, and whether AI might be causing developers to go backwards.</summary>
  <thoughtbot:auto_social_share>false</thoughtbot:auto_social_share>
</entry>
<entry>
  <title>This week in #dev (May 15, 2026)</title>
  <link rel="alternate" href="https://thoughtbot.com/blog/this-week-in-dev-may-15-2026"/>
  <author>
    <name>thoughtbot</name>
  </author>
  <id>https://thoughtbot.com/blog/this-week-in-dev-may-15-2026</id>
  <published>2026-05-28T00:00:00+00:00</published>
  <updated>2026-05-26T14:17:29Z</updated>
  <content type="html">&lt;p&gt;Welcome to another edition of &lt;a href="https://thoughtbot.com/blog/tags/this-week-in-dev"&gt;This Week in #dev&lt;/a&gt;, a series of posts
where we bring some of our most interesting Slack conversations to the public.&lt;/p&gt;
&lt;h2 id="alternative-text-for-css-generated-content"&gt;
  
    Alternative Text for CSS-Generated Content
  
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://thoughtbot.com/blog/authors/matheus-richard"&gt;Matheus Richard&lt;/a&gt; learned that the CSS &lt;code&gt;content&lt;/code&gt; property accepts
alternative text for screen readers, separated by a &lt;code&gt;/&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="nc"&gt;.warning&lt;/span&gt;&lt;span class="nd"&gt;::before&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;"⚠️"&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="s1"&gt;"Warning"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Without the alt text, assistive technology either reads out the emoji name or
skips it entirely. More details in &lt;a href="https://www.stefanjudis.com/today-i-learned/css-content-property-accepts-alternative-text/"&gt;Stefan Judis’ article&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="a-faster-ui-for-large-github-diffs"&gt;
  
    A Faster UI for Large GitHub Diffs
  
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://thoughtbot.com/blog/authors/matheus-richard"&gt;Matheus Richard&lt;/a&gt; shares &lt;a href="https://diffshub.com"&gt;diffshub&lt;/a&gt;, a tool that renders PR
diffs GitHub struggles with. It’s a drop-in replacement: swap &lt;code&gt;github.com&lt;/code&gt; for
&lt;code&gt;diffshub.com&lt;/code&gt; in any PR URL, like
&lt;a href="https://diffshub.com/oven-sh/bun/pull/30412"&gt;https://diffshub.com/oven-sh/bun/pull/30412&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="aube-a-new-javascript-package-manager"&gt;
  
    Aube, a New JavaScript Package Manager
  
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://thoughtbot.com/blog/authors/jared-turner"&gt;Jared Turner&lt;/a&gt; shares &lt;a href="https://aube.en.dev"&gt;Aube&lt;/a&gt;, a JavaScript package manager
from the creator of Mise. It’s pitched as fast, compatible with existing
lockfiles, and security-focused, including a 24-hour cooldown before newly
published versions can be installed.&lt;/p&gt;
&lt;h2 id="thanks"&gt;
  
    Thanks
  
&lt;/h2&gt;

&lt;p&gt;This edition was brought to you by &lt;a href="https://thoughtbot.com/blog/authors/jared-turner"&gt;Jared Turner&lt;/a&gt; and &lt;a href="https://thoughtbot.com/blog/authors/matheus-richard"&gt;Matheus
Richard&lt;/a&gt;. Thanks to all contributors! 🎉&lt;/p&gt;

&lt;aside class="related-articles"&gt;&lt;h2&gt;If you enjoyed this post, you might also like:&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thoughtbot.com/blog/this-week-in-dev-jan-26-2024"&gt;This Week in #dev (Jan 26, 2024)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thoughtbot.com/blog/this-week-in-open-source-6-30"&gt;This Week in Open Source (June 30, 2023)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thoughtbot.com/blog/this-week-in-dev-feb-9-2024"&gt;This Week in #dev (Feb 9, 2024)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/aside&gt;
</content>
  <summary>Highlights of what happened in our #dev channel on Slack this week.
</summary>
  <thoughtbot:auto_social_share>true</thoughtbot:auto_social_share>
</entry>
<entry>
  <title>Lost, forgotten, and unfamiliar HTML</title>
  <link rel="alternate" href="https://thoughtbot.com/blog/lost-forgotten-and-unfamiliar-html"/>
  <author>
    <name>Dave Iverson</name>
  </author>
  <id>https://thoughtbot.com/blog/lost-forgotten-and-unfamiliar-html</id>
  <published>2026-05-27T00:00:00+00:00</published>
  <updated>2026-05-22T21:54:31Z</updated>
  <content type="html">&lt;p&gt;I ran &lt;a href="https://html-validate.org/"&gt;HTML-validate&lt;/a&gt;, &lt;a href="https://github.com/dequelabs/axe-core"&gt;Axe core&lt;/a&gt;, and a Claude prompt against a new website I’m building, and they caught a bunch of stuff I missed! This gave me a chance to remember the easily overlooked bits of building a website. And I visited a few dark corners of the HTML spec I hadn’t been to yet!&lt;/p&gt;
&lt;h2 id="data-attributes-should-be-lowercase"&gt;
  
    Data attributes should be lowercase
  
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;data-dialogOpen&lt;/code&gt; is invalid - it should be &lt;code&gt;data-dialogopen&lt;/code&gt;.
But did you know that &lt;a href="https://html.spec.whatwg.org/multipage/dom.html#custom-data-attribute"&gt;&lt;em&gt;all&lt;/em&gt; HTML attribute names get automatically lowercased&lt;/a&gt;? I didn’t.&lt;/p&gt;

&lt;p&gt;HTTP header names are also case-insensitive, &lt;a href="https://datatracker.ietf.org/doc/html/rfc9113#section-8.2.1"&gt;except in HTTP/2&lt;/a&gt;, where they MUST be lowercase.&lt;/p&gt;
&lt;h2 id="invalid-id-attributes"&gt;
  
    Invalid id attributes
  
&lt;/h2&gt;

&lt;p&gt;I learned that in HTML5, &lt;a href="https://html.spec.whatwg.org/multipage/dom.html#the-id-attribute"&gt;an &lt;code&gt;id&lt;/code&gt; can be &lt;em&gt;anything&lt;/em&gt;&lt;/a&gt; as long as it’s at least 1 character with no whitespace (and it’s unique). &lt;code&gt;id="_0$!11"&lt;/code&gt; is totally valid and I think even emojis are ok!&lt;/p&gt;

&lt;p&gt;However, &lt;a href="https://www.w3.org/TR/html4/types.html#type-id"&gt;in HTML4, &lt;code&gt;id&lt;/code&gt;s need to start with a letter&lt;/a&gt; and can only contain letters, numbers, and a few punctuation symbols. So it’s probably best not to go too wild. Backwards compatibility is nice.&lt;/p&gt;

&lt;p&gt;Oh, and the uniqueness requirement? &lt;code&gt;id&lt;/code&gt;s inside iFrames only need to be unique within their document. Otherwise, imagine how tricky it would be to iFrame in an arbitrary page.&lt;/p&gt;
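&lt;p&gt;The two sets of rules are easy to compare side by side. This is a minimal sketch with hypothetical helper names. It checks only the character rules from the two linked specs and leaves uniqueness within a document as a separate concern.&lt;/p&gt;

```ruby
# HTML Living Standard: at least one character, no ASCII whitespace.
def valid_html5_id?(id)
  id.match?(/\A[^ \t\n\f\r]+\z/)
end

# HTML 4.01 ID tokens: a letter, then letters, digits, "-", "_", ":" or ".".
def valid_html4_id?(id)
  id.match?(/\A[A-Za-z][A-Za-z0-9\-_:.]*\z/)
end

# Compare a few candidate ids under both rule sets.
checks = ["_0$!11", "main-nav", "has space"].map do |id|
  [id, { html5: valid_html5_id?(id), html4: valid_html4_id?(id) }]
end.to_h
```

&lt;p&gt;An id that passes the stricter HTML4 pattern also passes the HTML5 one, which is one reason to stick to letters, digits, and hyphens.&lt;/p&gt;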
&lt;h2 id="redundant-for-attributes"&gt;
  
    Redundant for attributes
  
&lt;/h2&gt;

&lt;p&gt;A bit of a nitpick: when you label an input by putting it inside a label, the &lt;code&gt;for&lt;/code&gt; attribute is redundant. When the input is outside the label, you definitely need that &lt;code&gt;for&lt;/code&gt;!&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!-- Rails-style: no `for=""` needed --&amp;gt;
&amp;lt;label&amp;gt;
  Username &amp;lt;input type="text" name="username" /&amp;gt;
&amp;lt;/label&amp;gt;

&amp;lt;!-- non-Rails-style: don't forget the `for=""`! --&amp;gt;
&amp;lt;label for="username"&amp;gt;Username&amp;lt;/label&amp;gt;
&amp;lt;input type="text" name="username" id="username" /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Some reasons that thoughtbot prefers inputs inside labels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it reduces the need for an extra wrapper div&lt;/li&gt;
&lt;li&gt;since the label is clickable, this often results in a bigger click/tap area&lt;/li&gt;
&lt;li&gt;you don’t need to generate unique IDs for inputs&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="extra-whitespace-in-a-textarea"&gt;
  
    Extra whitespace in a textarea
  
&lt;/h2&gt;

&lt;p&gt;Claude spotted this one: I accidentally had a blank space inside a textarea.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;textarea name="explain"&amp;gt; &amp;lt;/textarea&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;An easy mistake to make and kind of annoying to an end user, especially because the stray space counts as a value, so the &lt;code&gt;required&lt;/code&gt; validation passes even when the user typed nothing. I wish one of my automated scanners had caught it.&lt;/p&gt;
&lt;h2 id="false-positive-aria-label-misuse"&gt;
  
    False positive: aria-label misuse
  
&lt;/h2&gt;

&lt;p&gt;HTML-validate told me that using the &lt;code&gt;aria-label&lt;/code&gt; attribute on &lt;code&gt;&amp;lt;search&amp;gt;&lt;/code&gt; is invalid. Nope - I was using it correctly!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.w3.org/WAI/ARIA/apg/patterns/landmarks/examples/search.html"&gt;The W3C explicitly recommends it&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If a page includes more than one search landmark, each should have a unique label.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;search aria-label="Site-wide"&amp;gt;
  &amp;lt;form&amp;gt;
    ...
  &amp;lt;/form&amp;gt;
&amp;lt;/search&amp;gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="https://gitlab.com/html-validate/html-validate/-/work_items/359"&gt;I filed a bug report.&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="iframes-with-unique-names"&gt;
  
    iFrames with unique names
  
&lt;/h2&gt;

&lt;p&gt;I had trouble with this one, but I’m glad Axe caught it because it’s genuinely useful for screen reader users.&lt;/p&gt;

&lt;p&gt;Every iFrame needs a title, and those titles should be unique so they can be differentiated. But also, landmarks INSIDE the iFrames must be unique across the entire page, including the parent document.&lt;/p&gt;

&lt;p&gt;I had 3 iFrames on a page, all with &lt;code&gt;&amp;lt;main aria-label="Component Example"&amp;gt;&lt;/code&gt;. Sure enough, when I opened VoiceOver it read out the same landmark 3 times:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Component Example main&lt;/li&gt;
&lt;li&gt;Component Example main&lt;/li&gt;
&lt;li&gt;Component Example main&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not a great experience.&lt;/p&gt;

&lt;p&gt;First, I tried to fix it by removing the &lt;code&gt;aria-label&lt;/code&gt;s, but Axe warned me that the document had multiple &lt;code&gt;&amp;lt;main&amp;gt;&lt;/code&gt;s without unique labels. I had to refactor how the iFrames were generated so that each one had both a unique title and a unique &lt;code&gt;&amp;lt;main&amp;gt;&lt;/code&gt; label.&lt;/p&gt;
&lt;h2 id="color-contrast-issues"&gt;
  
    Color contrast issues
  
&lt;/h2&gt;

&lt;p&gt;Automated scanners are the best at finding contrast issues. I happened to have a link state that used a slightly-too-light purple on white. It didn’t pass WCAG’s minimum contrast levels. Easy for me to miss, but troublesome for someone with reduced vision.&lt;/p&gt;
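&lt;p&gt;The check a scanner runs here is plain arithmetic. This sketch follows the WCAG 2.x definitions of relative luminance and contrast ratio. The helper names are made up, and the purple is a hypothetical stand-in, not the color from my site.&lt;/p&gt;

```ruby
# Relative luminance as defined in WCAG 2.x, from a "#rrggbb" string.
def relative_luminance(hex)
  channels = hex.delete("#").scan(/../).map do |pair|
    channel = pair.to_i(16) / 255.0
    channel > 0.03928 ? ((channel + 0.055) / 1.055)**2.4 : channel / 12.92
  end
  red, green, blue = channels
  0.2126 * red + 0.7152 * green + 0.0722 * blue
end

# Contrast ratio: lighter luminance over darker, each offset by 0.05.
def contrast_ratio(foreground, background)
  luminances = [relative_luminance(foreground), relative_luminance(background)]
  (luminances.max + 0.05) / (luminances.min + 0.05)
end

# WCAG AA asks for at least 4.5:1 for normal-size text.
def passes_aa?(foreground, background)
  contrast_ratio(foreground, background) >= 4.5
end

light_purple = "#b388ff" # hypothetical color, chosen only for illustration
link_ok = passes_aa?(light_purple, "#ffffff")
```

&lt;p&gt;Large text has a lower AA threshold of 3:1, so a color that fails for body copy can still be fine for a big heading.&lt;/p&gt;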
&lt;h2 id="keyboard-accessible-overflow-scrolling"&gt;
  
    Keyboard-accessible overflow scrolling
  
&lt;/h2&gt;

&lt;p&gt;This was a new one for me! &lt;a href="https://dequeuniversity.com/rules/axe/4.11/scrollable-region-focusable?application=playwright"&gt;Axe tells me&lt;/a&gt; that when a region scrolls using &lt;code&gt;overflow: scroll&lt;/code&gt; or similar, it must contain a focusable element. This seems to be a Safari-specific bug.&lt;/p&gt;

&lt;p&gt;I tested with Safari and confirmed that it’s true: using the keyboard I was unable to scroll down to see the cut-off content.&lt;/p&gt;

&lt;p&gt;The simplest solution is to add &lt;code&gt;tabindex="0"&lt;/code&gt; to an element inside the scrolling region.&lt;/p&gt;
&lt;h2 id="forgotten-svgs"&gt;
  
    Forgotten SVGs
  
&lt;/h2&gt;

&lt;p&gt;I’m constantly forgetting to check that SVGs have the right label and role. With images it’s easy: just make sure you’ve got an &lt;code&gt;alt&lt;/code&gt; attribute. But inline SVGs can be either decorative or informative.&lt;/p&gt;

&lt;p&gt;Decorative SVGs must use &lt;code&gt;aria-hidden="true"&lt;/code&gt; to keep them out of the accessibility tree.&lt;/p&gt;

&lt;p&gt;Informative ones must use &lt;code&gt;role="img"&lt;/code&gt; and NEED a &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt; element to serve the same function as &lt;code&gt;alt&lt;/code&gt; text. And since not all screen readers catch the &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt; element, you usually want to associate it with the &lt;code&gt;&amp;lt;svg&amp;gt;&lt;/code&gt; element using &lt;code&gt;aria-labelledby&lt;/code&gt;. And if the SVG contains multiple images, text blocks, or interactivity, there’s even more to consider.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.w3.org/TR/graphics-aria-1.0/#role_definitions"&gt;I dug into the WAI-ARIA rabbit hole&lt;/a&gt; and learned that maybe some of my SVGs could be &lt;code&gt;role="graphics-symbol"&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A graphical object used to convey a simple meaning or category, where the meaning is more important than the particular visual appearance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Axe missed all this, but Claude caught it. I wonder if there’s an automated scanner that could help me out.&lt;/p&gt;
&lt;h2 id="explain-your-asterisks"&gt;
  
    Explain your asterisks
  
&lt;/h2&gt;

&lt;p&gt;If you’re going to denote required inputs using an asterisk &lt;code&gt;*&lt;/code&gt; in the label, &lt;a href="https://www.w3.org/WAI/WCAG22/Techniques/html/H90#description"&gt;you’d better provide a legend that explains it&lt;/a&gt;. Even better, replace the asterisk with &lt;code&gt;(required)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Oops, thanks for the reminder, Claude. I added an explainer to the form:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;small&amp;gt;* asterisks denote required fields&amp;lt;/small&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="punctuation-as-labels"&gt;
  
    Punctuation as labels
  
&lt;/h2&gt;

&lt;p&gt;I built a pagination component that looked like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt; 1 … 45 46 47 … 104 &amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Claude reminded me that when a screen reader reads out those angle brackets and ellipses, it’s going to sound weird. I opened VoiceOver and, sure enough, it sounded weird.&lt;/p&gt;

&lt;p&gt;I followed &lt;a href="https://github.com/ddnexus/pagy"&gt;Pagy’s&lt;/a&gt; example: the ellipses get &lt;code&gt;role="separator"&lt;/code&gt; and the buttons get &lt;code&gt;aria-label="Next"&lt;/code&gt;/&lt;code&gt;aria-label="Previous"&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id="table-header-cell-scopes"&gt;
  
    Table header cell scopes
  
&lt;/h2&gt;

&lt;p&gt;A blind spot for me: I didn’t know about the &lt;code&gt;scope&lt;/code&gt; attribute. &lt;a href="https://www.w3.org/WAI/WCAG22/Techniques/html/H63"&gt;WCAG recommends&lt;/a&gt; using &lt;code&gt;scope="col"&lt;/code&gt; on table header &lt;code&gt;&amp;lt;th&amp;gt;&lt;/code&gt; cells to associate them with their column, and &lt;code&gt;&amp;lt;th scope="row"&amp;gt;&lt;/code&gt; for table body cells that identify the subject of the row.&lt;/p&gt;

&lt;p&gt;Probably more useful for complex tables than simple ones. I’ll have to remember this.&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Thank goodness for automated scanners and the people who maintain them[^2]! The stuff I build is better for it. I was impressed by the bugs Claude caught, even though it surely wasn’t comparable to an accessibility audit by a real person.&lt;/p&gt;

&lt;p&gt;[^1] My prompt: “You are an accessibility expert. Please review all the pages on this site and create a table of accessibility and WCAG violations”&lt;/p&gt;

&lt;p&gt;[^2] By the way: thoughtbot maintains &lt;a href="https://github.com/thoughtbot/capybara_accessibility_audit"&gt;CapybaraAccessibilityAudit&lt;/a&gt; which uses Axe under the hood!&lt;/p&gt;

&lt;aside class="related-articles"&gt;&lt;h2&gt;If you enjoyed this post, you might also like:&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thoughtbot.com/blog/debugging-why-your-specs-have-slowed-down"&gt;Debugging Why Your Specs Have Slowed Down&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thoughtbot.com/blog/automating-barcode-scanner-tests-with-capybara"&gt;Automating barcode scanner tests with Capybara&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thoughtbot.com/blog/theme-based-iterations"&gt;Theme-Based Iterations&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/aside&gt;
</content>
  <summary>Automated scans taught me about some web stuff I forgot or never even knew.</summary>
  <thoughtbot:auto_social_share>true</thoughtbot:auto_social_share>
</entry>
<entry>
  <title>The Bike Shed Ep 500: Celebrating with past hosts</title>
  <link rel="alternate" href="https://thoughtbot.com/blog/the-bike-shed-ep-500-celebrating-with-past-hosts"/>
  <author>
    <name>Stephanie Viccari, Joël Quenneville, Chris Toomey, Stephanie Minn, Sally Hall &amp; Aji Slater</name>
  </author>
  <id>https://thoughtbot.com/blog/the-bike-shed-ep-500-celebrating-with-past-hosts</id>
  <published>2026-05-26T00:00:00+00:00</published>
  <updated>2026-06-01T18:27:11Z</updated>
  <content type="html">The Bike Shed celebrates its 500th episode with hosts new and old as they reflect on the show’s history and ask, what’s new in your world?</content>
  <summary>The Bike Shed celebrates its 500th episode with hosts new and old as they reflect on the show’s history and ask, what’s new in your world?</summary>
  <thoughtbot:auto_social_share>false</thoughtbot:auto_social_share>
</entry>
<entry>
  <title>Why Duck Typer?</title>
  <link rel="alternate" href="https://thoughtbot.com/blog/why-duck-typer"/>
  <author>
    <name>Thiago Araújo Silva</name>
  </author>
  <id>https://thoughtbot.com/blog/why-duck-typer</id>
  <published>2026-05-26T00:00:00+00:00</published>
  <updated>2026-05-21T18:49:29Z</updated>
  <content type="html">&lt;p&gt;&lt;a href="https://github.com/thoughtbot/duck_typer"&gt;Duck Typer&lt;/a&gt; is a Ruby gem that &lt;a href="https://thoughtbot.com/blog/meet-duck-typer-your-new-duck-typing-friend"&gt;validates interface
compatibility&lt;/a&gt; across polymorphic classes sharing the same
role, so they can be used interchangeably. It detects and clearly
reports interface drift directly in your test suite.&lt;/p&gt;

&lt;p&gt;Since Duck Typer launched, there’s been some discussion about the
validity of interface testing. In this post, I want to make the case
for it.&lt;/p&gt;
&lt;h2 id="quotinterface-tests-are-fragile-so-you-shouldn39t-write-themquot"&gt;
  
    “Interface tests are fragile, so you shouldn’t write them”
  
&lt;/h2&gt;

&lt;p&gt;That’s not true without context. How is your test suite structured?
What do you test? Obviously, if you write only
interface tests like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_interfaces_match&lt;/span&gt;
  &lt;span class="n"&gt;assert_interfaces_match&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="no"&gt;StripeProcessor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;PaypalProcessor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With no behavior tests to accompany it, that quote &lt;em&gt;will&lt;/em&gt; be true.
Why? Because you’re not testing actual code behavior. Alone, Duck Typer
tests are fragile. So why should you still write them?&lt;/p&gt;
&lt;h2 id="quotbut-i-already-have-behavior-tests-that-catch-mismatchesquot"&gt;
  
    “But I already have behavior tests that catch mismatches”
  
&lt;/h2&gt;

&lt;p&gt;You do, and they will catch mismatches eventually, assuming you have
good test coverage. The problem is &lt;em&gt;how&lt;/em&gt; they catch them. A behavior
test will blow up with a &lt;code&gt;NoMethodError&lt;/code&gt; or an &lt;code&gt;ArgumentError&lt;/code&gt;,
but nothing about that tells you it’s an interface problem across
a group of classes. You have to figure that out yourself, then
work backwards to find which class drifted and what changed.&lt;/p&gt;

&lt;p&gt;Duck Typer short-circuits that investigation. It tells you &lt;em&gt;what&lt;/em&gt;
drifted and &lt;em&gt;where&lt;/em&gt;, in a single message, before you ever hit a
behavioral failure:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Expected StripeProcessor and PaypalProcessor to implement compatible
interfaces, but the following method signatures differ:

StripeProcessor: refund(transaction_id)
PaypalProcessor: refund not defined
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There’s also a sharper version of this objection: “You can remove the
implementation and the test still passes, so it’s not a good test.”
That’s true, and it’s by design. Duck Typer checks shape, not behavior.
It explicitly marks that a set of classes is expected to evolve
together, and when one changes, the failure makes it clear. That’s a
different job than verifying correctness, and both are worth doing.&lt;/p&gt;
&lt;h2 id="it39s-about-quality-of-life"&gt;
  
    It’s about quality of life
  
&lt;/h2&gt;

&lt;p&gt;At thoughtbot, we’ve always valued testing UX and clear error reporting. &lt;em&gt;We
care about the details&lt;/em&gt;. For example, this is a style of test generally
not encouraged here:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt; &lt;span class="n"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;post_1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;post_2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;post_3&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Assume that the &lt;code&gt;post&lt;/code&gt; objects are complex Active Record instances. Can
you imagine what the error message will look like if one object has
differences? It will dump a huge blob of text that incurs overhead to
parse. What are we really testing there? That we’re getting the right
objects! Instead, we can use named identifiers to make error reporting
more actionable and crystal clear:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="n"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:title&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt; &lt;span class="n"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;"Post 1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Post 2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Post 3"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Duck Typer applies the same principle to interface errors. Without
it, you only get generic Ruby errors that say nothing about
interface drift across classes. With Duck Typer, you also get a
clear, targeted failure:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Expected StripeProcessor and BraintreeProcessor to implement compatible
interfaces, but the following method signatures differ:

StripeProcessor: charge(amount, currency:)
BraintreeProcessor: charge(amount, currency:, description:)

StripeProcessor: refund(transaction_id)
BraintreeProcessor: refund(transaction_id, amount)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="it-communicates-design-intent-as-actionable-errors"&gt;
  
    It communicates design intent as actionable errors
  
&lt;/h2&gt;

&lt;p&gt;I wish Ruby had interfaces. As I said in the &lt;a href="https://thoughtbot.com/blog/meet-duck-typer-your-new-duck-typing-friend"&gt;introductory
post&lt;/a&gt;, I want to be alerted to interface drift because it’s
a great developer experience feature.&lt;/p&gt;

&lt;p&gt;It’s not always obvious when classes are supposed to be used
interchangeably. A clear error message communicates which classes
share a role and what shape their interfaces should have.&lt;/p&gt;

&lt;p&gt;What if you join a legacy project where the original developers left a
long time ago? Duck Typer would be super helpful there too.&lt;/p&gt;

&lt;p&gt;A concrete example: Null Objects. You add a &lt;code&gt;deactivate&lt;/code&gt; method to
&lt;code&gt;User&lt;/code&gt;, and your behavior tests for &lt;code&gt;User&lt;/code&gt; pass. But &lt;code&gt;NullUser&lt;/code&gt;, which
is supposed to be interchangeable with &lt;code&gt;User&lt;/code&gt;, silently drifts because
nobody remembered to update it. Behavior tests on &lt;code&gt;User&lt;/code&gt; won’t catch
that. Duck Typer will, immediately, because it treats those classes as a
group that must stay in sync. It also reminds you to write
the actual behavior test for &lt;code&gt;NullUser#deactivate&lt;/code&gt;.&lt;/p&gt;
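
&lt;p&gt;To make the drift concrete, here is a minimal plain-Ruby sketch of the kind of shape comparison involved. It is not Duck Typer’s implementation: the class bodies and the &lt;code&gt;public_api&lt;/code&gt; helper are hypothetical, written only to show how a missing method can be detected without running any behavior.&lt;/p&gt;

```ruby
# Hypothetical stand-ins for the classes in the example above.
class User
  def name
    "Ada"
  end

  def deactivate
    :deactivated
  end
end

class NullUser
  def name
    "Guest"
  end
end

# Public instance methods defined directly on the class (no inherited ones).
# Hypothetical helper, not part of Duck Typer.
def public_api(klass)
  klass.public_instance_methods(false).sort
end

# Anything User exposes that NullUser does not is interface drift.
missing = public_api(User) - public_api(NullUser)
puts "NullUser is missing: #{missing.inspect}"
```

&lt;p&gt;The comparison never calls &lt;code&gt;deactivate&lt;/code&gt;, which is the point: it checks shape, so it fails as soon as the two classes stop matching.&lt;/p&gt;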

&lt;p&gt;As a developer who loves targeted feedback, that is right up my alley.&lt;/p&gt;
&lt;h2 id="it-helps-you-think-about-design"&gt;
  
    It helps you think about design
  
&lt;/h2&gt;

&lt;p&gt;Let’s say that introducing a &lt;code&gt;do_stuff&lt;/code&gt; public method in
&lt;code&gt;StripeProcessor&lt;/code&gt; is the easiest way to accomplish a goal. You add it,
but get a test failure like the following:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Expected StripeProcessor and PaypalProcessor to implement compatible
interfaces, but the following method signatures differ:

StripeProcessor: do_stuff(data)
PaypalProcessor: do_stuff not defined
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That message doesn’t just report interface drift. It actually asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why are you doing that? A public method in StripeProcessor should
also exist in the other processors.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most likely, your &lt;code&gt;do_stuff&lt;/code&gt; method is not in the right place. Maybe it
belongs in a collaborator object, or maybe it should be a private method
that isn’t part of the public interface at all.&lt;/p&gt;

&lt;p&gt;The same applies to differing method parameters; if you introduce a
parameter in one class but it is not needed in another class from the
same interface, you are &lt;em&gt;probably&lt;/em&gt; doing something wrong.&lt;/p&gt;
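
&lt;p&gt;Parameter drift can be inspected with Ruby’s own reflection. The sketch below is an illustration under assumptions, not how Duck Typer works internally: the processor classes are empty stand-ins and &lt;code&gt;signature&lt;/code&gt; is a hypothetical helper built on &lt;code&gt;Method#parameters&lt;/code&gt;.&lt;/p&gt;

```ruby
# Empty stand-ins mirroring the signatures from the failure message earlier.
class StripeProcessor
  def charge(amount, currency:)
  end
end

class BraintreeProcessor
  def charge(amount, currency:, description:)
  end
end

# Hypothetical helper: returns the parameter list as [kind, name] pairs.
def signature(klass, method_name)
  klass.instance_method(method_name).parameters
end

stripe = signature(StripeProcessor, :charge)
braintree = signature(BraintreeProcessor, :charge)

# Parameters that exist in one class but not in the other.
extra = braintree - stripe
puts "Only in BraintreeProcessor#charge: #{extra.inspect}"
```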
&lt;h2 id="quotbut-that39s-just-like-shoulda-matchersquot"&gt;
  
    “But that’s just like shoulda-matchers”
  
&lt;/h2&gt;

&lt;p&gt;Not quite. &lt;a href="https://github.com/thoughtbot/shoulda-matchers"&gt;Shoulda Matchers&lt;/a&gt; are great for shortening
TDD feedback loops when working with Rails conventions. They verify a
single object’s declarations: does this model &lt;code&gt;have_many :posts&lt;/code&gt;? Does
it &lt;code&gt;validate_presence_of :email&lt;/code&gt;? That’s inward-facing: one object, one
declaration.&lt;/p&gt;

&lt;p&gt;In fact, once the code has enough behavior coverage, you could delete
the shoulda-matchers tests entirely. They’ve done their job.&lt;/p&gt;

&lt;p&gt;Duck Typer is cross-cutting. It checks whether a &lt;em&gt;group&lt;/em&gt; of objects
agrees on a shared interface. The question isn’t “does &lt;code&gt;StripeProcessor&lt;/code&gt;
have a &lt;code&gt;charge&lt;/code&gt; method?” but “do &lt;code&gt;StripeProcessor&lt;/code&gt;, &lt;code&gt;PaypalProcessor&lt;/code&gt;,
and &lt;code&gt;BraintreeProcessor&lt;/code&gt; all define &lt;code&gt;charge&lt;/code&gt; with the same signature?”
That’s a fundamentally different concern, and one that single-object
matchers can’t express.&lt;/p&gt;
&lt;h2 id="quotbut-it-doesn39t-catch-errors-in-productionquot"&gt;
  
    “But it doesn’t catch errors in production”
  
&lt;/h2&gt;

&lt;p&gt;Some prefer an approach where the interface is validated at class load
time: declare the contract, and if a class doesn’t conform, raise a
&lt;code&gt;RuntimeError&lt;/code&gt; immediately. That way, mismatches surface as errors in
production rather than only in tests.&lt;/p&gt;
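
&lt;p&gt;For reference, a load-time check of that kind might look roughly like the sketch below. &lt;code&gt;PaymentContract&lt;/code&gt;, its &lt;code&gt;REQUIRED&lt;/code&gt; list, and &lt;code&gt;verify!&lt;/code&gt; are hypothetical names invented for this illustration; they do not come from any particular library.&lt;/p&gt;

```ruby
# Hypothetical runtime contract, for illustration only.
module PaymentContract
  REQUIRED = [:charge, :refund]

  # Raises a RuntimeError when the class lacks a required public method.
  def self.verify!(klass)
    missing = REQUIRED - klass.public_instance_methods(false)
    raise "#{klass} is missing #{missing.inspect}" unless missing.empty?
    klass
  end
end

# Stand-in class that forgot to implement refund.
class PaypalProcessor
  def charge(amount, currency:)
  end
end

# In a real app this call would run when the class is loaded;
# here the error is rescued so the message can be shown.
begin
  PaymentContract.verify!(PaypalProcessor)
rescue RuntimeError => e
  error_message = e.message
end

puts error_message
```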

&lt;p&gt;That’s a valid approach, although not exactly great. In a typed
language, an interface mismatch would never be deployed because the
code wouldn’t compile. Ruby doesn’t have a compiler, but it has
its own equivalent: the test suite. And guess what inhibits bad
deployments in Ruby projects? In all my years working with Ruby,
I’ve &lt;em&gt;never&lt;/em&gt; seen a project without a CI pipeline. If tests fail,
your code doesn’t get deployed. In practice, the safety net is
the same.&lt;/p&gt;

&lt;p&gt;On top of that, runtime checks add metaprogramming to your
production code, and you’d still need tests to verify the setup
is correct.&lt;/p&gt;

&lt;p&gt;That’s why Duck Typer deliberately stays in the test suite: it’s
Ruby’s natural place to enforce constraints like this, and your
implementation stays clean, without workarounds that try to mimic
static typing at runtime.&lt;/p&gt;

&lt;p&gt;If you want static or runtime guarantees, tools like
&lt;a href="https://sorbet.org"&gt;Sorbet&lt;/a&gt; or &lt;a href="https://github.com/ruby/rbs"&gt;RBS&lt;/a&gt;
take a fundamentally different approach to the same problem and you
wouldn’t need Duck Typer. That said, Duck Typer gives you some of those
benefits with a fraction of the effort, at least when it comes to
interfaces.&lt;/p&gt;
&lt;h2 id="wrapping-up"&gt;
  
    Wrapping up
  
&lt;/h2&gt;

&lt;p&gt;Duck Typer won’t replace your behavior tests, and it was never meant to.
It’s a small, focused tool that gives you targeted feedback when
interfaces drift. It’s usually a one-liner to add, has no runtime
dependencies, and lives only in your test environment. If you value
clear error messages and care about keeping polymorphic classes in
sync, &lt;a href="https://github.com/thoughtbot/duck_typer"&gt;give it a try&lt;/a&gt;.&lt;/p&gt;

&lt;aside class="related-articles"&gt;&lt;h2&gt;If you enjoyed this post, you might also like:&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thoughtbot.com/blog/waiting-for-a-factory-bot"&gt;Waiting For a Factory~~Girl~~Bot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thoughtbot.com/blog/don-t-inline-rescue-in-ruby"&gt;Don’t Inline-Rescue in Ruby&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thoughtbot.com/blog/your-flaky-tests-might-be-time-dependent"&gt;Your flaky tests might be time dependent&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/aside&gt;
</content>
  <summary>Some say interface tests are fragile and shouldn't be written. I disagree. Here's why I think they're worth writing.
</summary>
  <thoughtbot:auto_social_share>true</thoughtbot:auto_social_share>
</entry>
</feed>
