Matt Collins

What to Look For in an AI Agent Orchestration Platform in 2026

Matt Collins — Thu, 02 Apr 2026 11:26:34 +0000

We’re at an interesting but confusing point in time.

AI agents are good enough to be useful but setting them up can be frustratingly hard.

And setting them up to be both useful and secure? Good luck with that.

I’ve been trying to figure out how best to set up my own AI agents and still haven’t found an approach that I’m entirely happy with.

A lot of what you need comes down to what we might call ‘agent orchestration’ and ‘agent harnesses’ (if we define that a little more broadly than is becoming common now).

The growing jumble of possibilities includes agent harnesses such as OpenClaw, Claude Code, Claude Cowork and Hermes as well as more orchestration-focussed products such as n8n, CrewAI, Zapier and Paperclip.

Here are some of the aspects I currently see as important in any agent orchestration / harness setup:

1. Triggers

Chat interfaces are great but I don’t want to have to trigger my agents manually every time I want them to do something. I want the platform to support, at least, the following different triggers:

Scheduled: I want some actions to happen on a regular basis, e.g. every day or every week.
On receipt of an inbound event: e.g. receipt of an email or WhatsApp message (it should ideally be easy to hook up different inbound channels).
Manually: Sometimes I’m doing something ad hoc or developing a new agent or workflow. In those cases, I do still want to be able to trigger things manually.
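
One way to picture the three trigger types above is as a single tagged union, so the same agent entry point can handle all of them. This is only a sketch: the `Trigger` type, its field names and `describeTrigger` are hypothetical and not taken from any particular platform.

```typescript
// Hypothetical trigger model: all names and fields are illustrative assumptions.
type Trigger =
  | { kind: "scheduled"; cron: string } // e.g. "0 9 * * *" for every day at 09:00
  | { kind: "inbound"; channel: "email" | "whatsapp"; from: string }
  | { kind: "manual"; requestedBy: string };

// Turn any trigger into the opening context line handed to an agent run.
function describeTrigger(t: Trigger): string {
  switch (t.kind) {
    case "scheduled":
      return `Scheduled run (cron: ${t.cron})`;
    case "inbound":
      return `Inbound ${t.channel} message from ${t.from}`;
    case "manual":
      return `Manual run requested by ${t.requestedBy}`;
  }
}
```

Adding a new inbound channel then means extending the `channel` union rather than writing a separate entry point.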

2. Availability and Responsiveness

24×7 availability: I want the agents to be able to respond to me and/or other triggers 24×7 and to not be reliant on my laptop being on.

Fast responses: And when I’m interacting with it (e.g. via a chat interface) I want it to respond quickly.

3. Cost Effectiveness

Low marginal cost per workflow: The cost overhead of each extra agentic workflow I set up should be little more than the extra tokens it uses.
Subscription-friendly: Ideally, I should be able to take advantage of the cheap tokens available through subscriptions such as Claude Pro/Max and ChatGPT Plus/Pro.
Cost visibility: It should be easy for me to see how much I’m spending.
API cost efficiency: The platform should use LLM APIs effectively to avoid unnecessary costs. For example:
- Model choice: It should be possible to use cheaper models where they are good enough for a task.
- Cache-friendliness: It should be possible to take advantage of the cost benefits of caching.
- Token efficiency: The platform should be efficient in the number of tokens it uses for tasks.
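
As a rough illustration of model choice and cost visibility, a platform could route simple tasks to a cheaper model and estimate spend from token counts. The model names and prices below are made up for the example; real prices vary by provider and change often.

```typescript
// Hypothetical prices in USD per million tokens. These are assumptions,
// not real provider prices.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

const PRICES: Record<string, ModelPrice> = {
  "small-model": { inputPerMTok: 0.5, outputPerMTok: 2 },
  "large-model": { inputPerMTok: 5, outputPerMTok: 20 },
};

// Estimate the cost of one LLM call from its token counts.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (!price) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1e6;
}

// Route tasks that don't need deep reasoning to the cheaper model.
function pickModel(task: { needsDeepReasoning: boolean }): string {
  return task.needsDeepReasoning ? "large-model" : "small-model";
}
```

With these made-up prices, a task using one million input tokens and half a million output tokens would cost ten times as much on the larger model, which is the kind of difference I'd want a platform to surface.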

4. Security

The platform should help in maintaining a reasonable level of security. There are many aspects to this, but some that I think may be particularly important:

Minimise agents’ access to secrets irrelevant to their task: Agents shouldn’t, for example, be able to scan your laptop and dig out important credentials from .env or .zshrc files or use Gmail to read all the emails you’ve ever sent or received. Ideally, they shouldn’t have access to any 3rd party credentials at all and should only have access to the specific information they need to perform their tasks well.

Platform security: It should be easy to keep the platform itself up to date and free from known security issues. For example, I don’t want to worry that my agent harness is running on a server that hasn’t had security patches applied for the last year. And it’s no use keeping important credentials away from agents if the place we’re keeping them is insecure.

5. Support for Popular Patterns

Some patterns have emerged around AI agents that people seem to be finding useful. The platform should provide good support for those patterns.

For example:

Filesystem/shell use
Skills
MCP
Subagents
Code execution

6. Flexibility

The ecosystem of models, libraries, agent harnesses, etc. is messy and evolving very quickly.

The platform should make it easy to try existing things out and be flexible enough to work with whatever new things emerge.

7. Configuration Management

As far as possible I want to be able to easily understand my current setup: how agents are configured (perhaps including important aspects of their ‘memories’), what triggers are in place, etc.

And, ideally, I want good ways to track that configuration over time; perhaps using software-style version control that allows me to see what’s changed and undo changes that are proving problematic.
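
To sketch what I mean: if agent configuration is kept as plain data, it can live in version control and be compared between versions. The `AgentConfig` shape and `changedFields` helper here are hypothetical, not something any particular platform provides.

```typescript
// Hypothetical agent config kept as plain data so it can be version-controlled.
interface AgentConfig {
  model: string;
  schedule: string;
  tools: string[];
}

// List the fields that differ between two versions of a config.
function changedFields(before: AgentConfig, after: AgentConfig): string[] {
  const keys: (keyof AgentConfig)[] = ["model", "schedule", "tools"];
  return keys.filter((k) => JSON.stringify(before[k]) !== JSON.stringify(after[k]));
}
```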

8. Simplicity

Ideally, the platform should be simple to use so that I don’t need to spend a lot of time trying to understand it, troubleshooting it, etc.

9. Good Long-Term Outlook

Whatever platform I use, I want it to have a good chance of being around and well-supported for the long term to minimise the chances I’ll have the pain of switching over to another platform in six months’ or a year’s time.

Closing Thoughts

This list reflects my rough current thoughts on what’s important in an agent orchestration platform.

I haven’t tried to be comprehensive with my list. In particular, I haven’t tried to cover things that I’m sure would be important in a more corporate context.

That said, I hope these thoughts are helpful to you if you’re also trying to figure out how to set up your AI agents.

I’d love to connect with more people working with AI agents. If that’s you, you can find me on X here.

The post What to Look For in an AI Agent Orchestration Platform in 2026 first appeared on Matt Collins.

First Impressions of Cloudflare’s Code Mode for Building AI Agents

Matt Collins — Wed, 19 Nov 2025 00:31:16 +0000

Cloudflare’s ‘Code Mode’ got some attention in the AI developer community recently, thanks to a popular blog post about it.

It’s based on the ‘CodeACT’ pattern for AI agents, which is something I’ve been interested in for a while now.

These are my initial first-hand impressions of using it.

What is CodeACT?

Most LLM-based AI agents work by having an LLM ‘call tools’ by generating JSON. In these, the agent harness interprets that JSON and decides what code to run (e.g. calling out to a weather API.)

In CodeACT, the LLM generates code instead of JSON. The agent harness executes that code. The code may involve the equivalent of what the tool calls would have done (e.g. calling out to a weather API.)

Generating code can potentially be more effective than generating JSON for a number of reasons:

Flexibility: Code can easily express loops, conditional logic, and the dynamic composition of multiple tools, none of which a single JSON tool call can express.
Code fluency: Since there is so much code in LLMs’ training data, they may be better at generating code than generating the equivalent structured JSON.
Task success improvement: The approach can potentially result in higher success rates than JSON-based approaches.
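
To make the flexibility point concrete, here is a sketch contrasting the two styles for the task "find the warmest of three cities". The `getTemperature` tool and its canned data are hypothetical.

```typescript
// Hypothetical tool with canned data, standing in for a real weather API.
const CANNED: Record<string, number> = { London: 11, Madrid: 19, Oslo: 4 };

function getTemperature(city: string): number {
  const t = CANNED[city];
  if (t === undefined) throw new Error(`No data for ${city}`);
  return t;
}

// JSON style: the LLM emits one call like this per round trip, and the
// comparison has to happen back in the model after three separate calls.
const jsonToolCall = { name: "getTemperature", arguments: { city: "London" } };

// Code style: the LLM emits something like this function instead, so the
// loop and the comparison run in a single execution step.
function warmestCity(cities: string[]): string {
  let best: string | undefined;
  for (const city of cities) {
    if (best === undefined || getTemperature(city) > getTemperature(best)) {
      best = city;
    }
  }
  if (best === undefined) throw new Error("No cities given");
  return best;
}
```

Here `warmestCity(["London", "Madrid", "Oslo"])` makes the tool calls and the comparison in one go, where the JSON approach would need a round trip to the LLM for each city.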

That’s all quite enticing.

But there are downsides, too:

Increased security risks: Executing arbitrary code opens potential attack surfaces and requires stringent safety controls.
Greater fragility: Generated code could crash the agent without straightforward fallback or recovery mechanisms.
Sandboxing requirements: Careful containment is needed to avoid infinite loops, resource exhaustion, or unintended system impact.
Challenging debugging and monitoring: Diagnosing and recovering from code errors can be more complicated than handling misformatted JSON.
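
As a small illustration of the fragility point, a harness can wrap execution of generated code so that errors and timeouts come back as data for the agent loop rather than crashing it. This is a minimal sketch with hypothetical names (`runGuarded`, `RunResult`). Note that a timer like this cannot interrupt a synchronous infinite loop; that needs real isolation such as a separate process or sandbox.

```typescript
type RunResult = { ok: true; value: unknown } | { ok: false; error: string };

// Run untrusted async work with a time limit, converting failures into a
// result the agent loop can feed back to the LLM.
async function runGuarded(work: () => Promise<unknown>, timeoutMs: number): Promise<RunResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    const value = await Promise.race([work(), timeout]);
    return { ok: true, value };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  } finally {
    // Always clear the timer so it doesn't keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```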

Where Does Cloudflare Code Mode Come In?

Cloudflare’s Code Mode is a library for building CodeACT-inspired agents on top of Cloudflare’s Workers platform.

Why is it interesting?

Firstly, it makes it easy to execute code generated by your LLM in a very lightweight sandboxed environment (in this case a Cloudflare Worker).

Secondly, it provides ‘bindings’ which let you pass functionality into the sandbox’s environment in the form of functions that can be called from within the sandbox but that execute outside of it.

This seems like a great fit for executing code that relies on secrets that you don’t want to expose to the LLM and for restricting what access to outside systems you want the agent to have.

For example, for a customer support agent responding to a ticket from a given customer, you might provide a binding giving read-only access to all support tickets related to that customer. This would allow the agent to fetch useful context without it seeing either (a) credentials for accessing the customer support system as a whole, or (b) tickets related to other customers.
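
Here is a rough sketch of that customer support idea in plain TypeScript. It is not Cloudflare's bindings API: `fetchTickets`, `makeTicketBinding` and the ticket data are all hypothetical, and a closure in the same process is not a real security boundary. With Code Mode, the point is that the function body executes outside the sandbox.

```typescript
interface Ticket {
  id: number;
  customerId: string;
  subject: string;
}

// Stand-in for the real support system. In practice this would be an API
// call authenticated with apiKey. All names and data here are hypothetical.
const ALL_TICKETS: Ticket[] = [
  { id: 1, customerId: "cust_a", subject: "Login fails" },
  { id: 2, customerId: "cust_b", subject: "Refund request" },
  { id: 3, customerId: "cust_a", subject: "Invoice missing" },
];

function fetchTickets(apiKey: string, customerId: string): Ticket[] {
  if (apiKey !== "secret-key") throw new Error("Unauthorized");
  return ALL_TICKETS.filter((t) => t.customerId === customerId);
}

// Build the object handed to the sandbox. The key and customer ID are
// captured in the closure, so generated code can call listTickets() but is
// never given the credential and cannot ask for another customer's tickets.
function makeTicketBinding(apiKey: string, customerId: string) {
  return Object.freeze({
    listTickets: (): readonly Ticket[] => fetchTickets(apiKey, customerId),
  });
}
```

The agent's code would only ever see the object returned by `makeTicketBinding`, with a single read-only function on it.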

Let’s See Code Mode in Action!

I set up a simple Code Mode agent using sample code provided by Cloudflare.

If you’d like to take a look or try it yourself, you can find the code here.

You can define tools in a tools object, something like this:

import { tool, jsonSchema } from 'ai';

export const tools = {
  getWeather: tool({
    description: 'Returns a silly canned weather string.',
    inputSchema: jsonSchema({
      type: 'object',
      properties: {},
      additionalProperties: false
    }),
    outputSchema: jsonSchema({
      type: 'string'
    }),
    execute: () => "It's cold. Brrrrrrrr!"
  })
};

To better understand how Code Mode works, let’s look at what happens when we ask the agent about the weather…

1. First Iteration of Agent Loop

The LLM is given the user’s prompt (in this case, “how’s the weather today?”) and told it has access to a ‘codemode’ tool that can return a canned weather string:

{
  "model": "gpt-5-nano",
  "input": [
    {
      "role": "developer",
      "content": "You are a helpful assistant. You have access to the \"codemode\" tool that can do different things:

- Returns a silly canned weather string.

If the user asks to do anything that can be achieved by the codemode tool, then simply pass over control to it by giving it a simple function description. Don't be too verbose."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "how's the weather today?"
        }
      ]
    }
  ],
  "tools": [
    {
      "type": "function",
      "name": "codemode",
      "description": "codemode: a tool that can generate code to achieve a goal",
      "parameters": {
        "type": "object",
        "properties": {
          "functionDescription": {
            "type": "string"
          }
        },
        "required": ["functionDescription"],
        "additionalProperties": false,
        "$schema": "http://json-schema.org/draft-07/schema#"
      },
      "strict": false
    }
  ],
  "tool_choice": "auto",
  "stream": true
}

The harness parses the response from the first LLM call and finds a call to the 'codemode' tool with the argument "Return a silly canned weather string."

Note that, so far, this is standard tool-calling agent stuff.

‘Calling the code mode tool’ is where things get interesting.

2. Code Generation

To ‘call the code mode tool,’ first we need to generate some code, then we need to execute that code in a suitably sandboxed environment and return a result.

We insert the function description provided by the first LLM into the prompt below.

Note that we tell the LLM about the getWeather function that it can use.

The LLM in this case isn’t given any tools; it is just asked to generate suitable code:

{
  "model": "gpt-4.1",
  "input": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "You are a code generating machine.

In addition to regular javascript, you can also use the following functions:

interface GetWeatherInput {}
export type GetWeatherOutput = string

declare const codemode: {
/*
Returns a silly canned weather string.
*/
    getWeather: (input: GetWeatherInput) => Promise<GetWeatherOutput>;
}

Respond only with the code, nothing else. Output javascript code.

Generate an async function that achieves the goal. This async function doesn't accept any arguments.

Here is user input: Return a silly canned weather string."
        }
      ]
    }
  ],
  "text": {
    "format": {
      "type": "json_schema",
      "strict": false,
      "name": "response",
      "schema": {
        "type": "object",
        "properties": {
          "code": { "type": "string" }
        },
        "required": ["code"],
        "additionalProperties": false,
        "$schema": "http://json-schema.org/draft-07/schema#"
      }
    }
  }
}

3. Sandboxed Code Execution

The Code Mode harness extracts the code provided by the LLM in its response to the second call above:

async function getSillyWeatherString() {
  const weather = await codemode.getWeather({});
  return weather;
}

It executes it within a Cloudflare Worker.

When it executes codemode.getWeather() it makes use of the binding provided to the Cloudflare Worker. The code defined for the getWeather tool is executed outside of the worker (but still within Cloudflare’s infrastructure).

The result of "It's cold. Brrrrrrrr!" is passed back into the worker in response and assigned to the ‘weather’ variable.

The same value is then returned back to the agent harness (via "return weather;")
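
To show the data flow in this step, here is a toy version that runs the same generated code string with a stand-in `codemode` object. It uses the JavaScript `Function` constructor, which is emphatically not a sandbox and is not how Code Mode executes code; the `runGenerated` helper is my own hypothetical name.

```typescript
// The code string returned by the code-generation call shown earlier.
const generated = `async function getSillyWeatherString() {
  const weather = await codemode.getWeather({});
  return weather;
}`;

// Stand-in for the binding: the only capability the generated code receives.
const codemode = {
  getWeather: async (_input: object): Promise<string> => "It's cold. Brrrrrrrr!",
};

// WARNING: the Function constructor is NOT a sandbox. It is used here only
// to show the data flow; Code Mode runs the code in an isolated Worker.
async function runGenerated(code: string, bindings: typeof codemode): Promise<unknown> {
  // Wrap the function declaration in parentheses so it can be called at once.
  const run = new Function("codemode", `return (${code})();`);
  return run(bindings);
}
```

Calling `runGenerated(generated, codemode)` resolves to the canned weather string, mirroring what the harness gets back from the Worker.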

From here on we’re back to a standard LLM-in-a-loop agent mechanism.

4. Second Iteration of Agent Loop

Having completed the ‘tool call’, the harness now runs a new iteration of the main agent loop.

It calls the LLM much as in the first call, but this time with extra messages representing the tool call from the first LLM response and the result of that tool call:

{
  "model": "gpt-5-nano",
  "input": [
    {
      "role": "developer",
      "content": "You are a helpful assistant. You have access to the \"codemode\" tool that can do different things:

- Returns a silly canned weather string.

If the user asks to do anything that be achievable by the codemode tool, then simply pass over control to it by giving it a simple function description. Don't be too verbose."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "how's the weather today?"
        }
      ]
    },
    {
      "type": "item_reference",
      "id": "rs_0c6286506bb0ecd000691ca8b6b06881a3a7e6ce44822039e5"
    },
    {
      "type": "item_reference",
      "id": "fc_0c6286506bb0ecd000691ca8b86b5081a39a78381020daee0d"
    },
    {
      "type": "function_call_output",
      "call_id": "call_mKI1Yw7jLIeyqi4U9jKBZmyc",
      "output": "{\"code\":\"async function getSillyWeatherString() {\\n  const weather = await codemode.getWeather({});\\n  return weather;\\n}\",\"result\":\"It's cold. Brrrrrrrr!\"}"
    }
  ],
  "tools": [
    {
      "type": "function",
      "name": "codemode",
      "description": "codemode: a tool that can generate code to achieve a goal",
      "parameters": {
        "type": "object",
        "properties": {
          "functionDescription": {
            "type": "string"
          }
        },
        "required": ["functionDescription"],
        "additionalProperties": false,
        "$schema": "http://json-schema.org/draft-07/schema#"
      },
      "strict": false
    }
  ],
  "tool_choice": "auto",
  "stream": true
}

This time, the LLM returns some text (“It’s cold. Brrrrrrrr!”) and no tool calls.

To the harness, this means the agent loop is complete.

The harness duly returns the text to the user.
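
The overall loop described in these four steps can be sketched as follows. The `Llm` and `ToolRunner` types are hypothetical stand-ins for the real LLM API call and the code-generation-plus-execution step; this is not Code Mode's actual implementation.

```typescript
type LlmReply =
  | { type: "tool_call"; name: string; args: { functionDescription: string } }
  | { type: "text"; text: string };

type Message = { role: "user" | "assistant" | "tool"; content: string };

// Hypothetical interfaces: a real harness would call an LLM API and a
// sandboxed code runner here.
type Llm = (messages: Message[]) => Promise<LlmReply>;
type ToolRunner = (functionDescription: string) => Promise<string>;

async function agentLoop(llm: Llm, runTool: ToolRunner, prompt: string, maxTurns = 5): Promise<string> {
  const messages: Message[] = [{ role: "user", content: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await llm(messages);
    // Text with no tool call means the loop is complete.
    if (reply.type === "text") return reply.text;
    // Otherwise record the tool call and its result, then go round again.
    messages.push({ role: "assistant", content: JSON.stringify(reply) });
    messages.push({ role: "tool", content: await runTool(reply.args.functionDescription) });
  }
  throw new Error(`No final answer after ${maxTurns} turns`);
}
```

The `maxTurns` cap is my own addition so that a model that keeps calling tools can't loop forever.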

Observations

Two-Step Code Generation

It’s interesting that the harness doesn’t try to get the first LLM call to generate code; it just tries to get a description of the function, then uses that description in a subsequent LLM prompt to generate code.

I guess this separation slightly simplifies the job each LLM call has to do, which may make the overall result more reliable.

On the other hand, this could have a couple of downsides:

The second LLM only sees the target function description provided by the first LLM rather than the full context of what is desired. I imagine this could result in code that isn’t quite as suitable.
Making two sequential LLM calls rather than one entails more latency.

I would be interested to learn whether Cloudflare considered a one-step approach at all and how other CodeACT-inspired implementations have approached this.

Model Choice

I was a little surprised that the LLM call to generate code used a different model (gpt-4.1) from the one specified in my code and used for the agent loop (gpt-5-nano). That currently seems to be hardcoded into the Code Mode code, but perhaps will be made configurable in the future.

Prompts

I was perhaps equally surprised by the wordings of the LLM prompts that Code Mode uses for the main agent loop and to generate code. They both seem quite vague about what, exactly, they want the LLM to do.

Closing Thoughts

Cloudflare’s Code Mode seems a very interesting option for running CodeACT-style agents in the cloud, given (i) how lightweight (as I understand it) the V8 isolates are that their workers run within, compared to other providers’ sandbox technologies, (ii) their support for bindings, and (iii) Cloudflare being a large company that can be trusted to be around for a while.

Earlier in the year I tried out Hugging Face’s smolagents library which offers ‘CodeAgents’ which are also based on the CodeACT pattern. That’s a Python-based option that is also worth a look.

Have you tried Code Mode, smolagents or similar offerings from other providers?

If so, I’d be curious to know how you’ve got on!

The post First Impressions of Cloudflare’s Code Mode for Building AI Agents first appeared on Matt Collins.

AI Splurge

Matt Collins — Tue, 30 Sep 2025 15:16:19 +0000

Heard of ‘AI slop’?

There’s something else I’m worried about these days and I’m calling it “AI splurge.”

What is AI Splurge?

AI splurge is the output of AI coding agents when they get overzealous and create more code than you need.

It can take many forms, but here are a few I’ve seen a lot:

Unrequested extra options added to a command line tool.
Defensive code to handle cases that should really result in loud errors.
Over-engineered code ready for a future that will likely never arrive.
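
Here is a made-up illustration of the second form. Both function names are hypothetical; the first is the kind of defensive code I mean, and the second is what I would usually rather have.

```typescript
// Splurge: silently papers over bad input, so bugs upstream go unnoticed.
function parsePortSplurge(value: unknown): number {
  try {
    const n = Number(value ?? 8080);
    return Number.isInteger(n) && n > 0 ? n : 8080;
  } catch {
    return 8080;
  }
}

// Leaner: accept only what is valid and fail loudly otherwise.
function parsePort(value: string): number {
  const n = Number(value);
  if (!Number.isInteger(n) || n < 1 || n > 65535) {
    throw new Error(`Invalid port: "${value}"`);
  }
  return n;
}
```

Given the input `"abc"`, the first version quietly returns 8080 while the second throws, which is what makes a configuration mistake visible.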

AI splurge needlessly complicates your codebase and makes your system harder to understand and work with (for humans and AI.)

In some ways AI splurge is even worse than outright bugs because it’s more likely to go unnoticed and, if unaddressed, will build up relentlessly over time into a monster that will be hard to tame.

How Can You Avoid AI Splurge?

1. Code Review: You can try to review all the changes that your AI agent is proposing. Perhaps have an AI reviewer take a look first, with instructions to flag up any unnecessary complexity.

2. Steer Your Agent: You can also include instructions in your AGENTS.md file or equivalent to steer the agent away from splurging in the first place.

For example:

Write the smallest change that solves the stated problem.

This won’t be 100% effective but should help.

3. Periodic Rewrites: A more radical approach may be to periodically rewrite your system from scratch. This may sound crazy, but it’s something that may become increasingly feasible with ideas such as spec-driven development.

What’s the Future of AI Splurge?

Hopefully AI splurge will be a short-lived phenomenon that will fade away as AI agents get more powerful and, presumably, better at keeping systems simple.

I’m optimistic.

For now, though, beware AI splurge!

The post AI Splurge first appeared on Matt Collins.

AI Hallucinations? The Solution Is More AI, Not Less

Matt Collins — Thu, 27 Feb 2025 17:30:19 +0000

I’ve heard a few people say that, since generative AI systems can hallucinate, we’re always going to have to double-check their work in great detail if we want to be confident it’s correct.

I disagree.

Why?

Because AI systems can do much of this checking themselves.

And, when the cost/benefit tradeoff is right, I believe that’s exactly what we’ll get them to do.

How Do Humans Check Documents?

Imagine the kind of checking you might do as a human reviewer if you wanted to check that an article was accurate:

Where the article cites online sources, you might go and check those sources to make sure any factual claims in the article are backed up by those sources.
Where the article doesn’t cite sources, you might try to find appropriate sources to corroborate or refute specific claims.
Where the article makes inferences, you might check that those inferences are reasonable.

AI Can Check Documents, Too!

It turns out that all these things can be done pretty well with the help of current LLMs and the kind of online content retrieval that tools such as Deep Research are built around. (I recently prototyped something along these lines myself, using some web search APIs.)

If an LLM hallucinates a claim in the middle of an article it is generating, it’s entirely possible that that same LLM, if asked to focus specifically on that claim (and provided the same source materials), can identify the claim as problematic.
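
A per-claim checking loop along these lines could look something like the sketch below. It is not the prototype I built: the `Checker` interface, with its `search` and `judge` functions, is a hypothetical stand-in for a web search API and a focused LLM call, and extracting the claims from the article is assumed to have happened already.

```typescript
type Verdict = "supported" | "refuted" | "unverified";

// Hypothetical dependencies: a web search call and an LLM judgement call.
interface Checker {
  search: (claim: string) => Promise<string[]>;
  judge: (claim: string, sources: string[]) => Promise<Verdict>;
}

// Check each claim in isolation, so the model focuses on one claim at a time.
async function checkClaims(claims: string[], deps: Checker): Promise<Map<string, Verdict>> {
  const results = new Map<string, Verdict>();
  for (const claim of claims) {
    const sources = await deps.search(claim);
    // With no sources there is nothing to judge the claim against.
    const verdict = sources.length === 0 ? "unverified" : await deps.judge(claim, sources);
    results.set(claim, verdict);
  }
  return results;
}
```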

We’ll Get AI To Check Documents (When It’s Worth It)

I don’t think this kind of mechanism has been built into tools much so far but I suspect it’s coming.

LLM costs are declining steeply and there are, surely, cases where documents are important enough to pay extra for increased accuracy.

Conclusion

I suspect that, very soon, we’ll start seeing tools along the lines of Deep Research that (perhaps for a premium price) incorporate AI-powered fact checking in order to provide an extra level of trustworthiness.

So the good news is that, despite what you may have heard, you won’t need to carefully check everything your AI tools write for you.

The post AI Hallucinations? The Solution Is More AI, Not Less first appeared on Matt Collins.

How to Build an AI Fact Checker

Matt Collins — Thu, 27 Feb 2025 16:31:27 +0000

Journalists sometimes get things wrong.

Wouldn’t it be nice if AI could help us spot when they do?

I recently prototyped an LLM-powered fact checker (using web search APIs) to see if it could do just that.

It looked promising!

In case you’re interested in learning more, I gave a talk to the MLOps London community about it. Here are the slides.

The post How to Build an AI Fact Checker first appeared on Matt Collins.

How to Use AI in Software Product Development Today

Matt Collins — Fri, 31 Jan 2025 14:16:54 +0000

It is now clear: AI is transforming how software is developed.

Anyone leading software development teams needs to be keeping a close eye on this to understand the tools’ rapidly evolving capabilities as well as their limitations and risks.

Here’s a snapshot (as of January 2025) of how I think AI tools currently fit into software development for teams working on established products and codebases.

High Priority

Have a clear company policy on use of AI

Your team needs to be clear about what they’re allowed (and not allowed) to do with AI tools.

If your company doesn’t already have a clear policy around this, then draw one up or push for one.

AI tools can increasingly offer significant productivity boosts (and more) but can carry risks of sending sensitive data to 3rd parties. Company leaders should be clear and intentional about any tradeoffs they’re making around this.

Provide access to AI tools

If at all possible, have a company-approved way for your team to be able to access a broad range of up-to-date AI services, including:

APIs to LLMs (e.g. via OpenAI, AWS, Azure or GCP)
At least one chat interface (ChatGPT, Google Gemini or similar) [could be a self-hosted wrapper of an LLM API if necessary]
An AI code editor such as Cursor or GitHub Copilot

(Some of the remaining points will only be possible if you have the above.)

Encourage experimentation with AI tools

Since this is such a rapidly-moving space, probably the most important thing is to encourage your team to be experimenting regularly with how AI can help in your particular context.

Have a clear policy
- As mentioned above, make it as easy as possible for people to know what’s allowed and what’s not. Be explicit about the most obvious use cases and make it easy for people to get clarity on others.
Eliminate concerns about trivial costs
- Consider allocating an ‘experimentation budget’.
- Consider dedicated ‘experimentation’ API keys with capped usage limits.
- Encourage people to expense things they’ve tested themselves.
Champion experimentation
- Lead by example: share your own experiments with AI tools.
- Encourage people to share their own experiments and findings.
- Consider including ‘sharing learnings from experimentation with AI’ in some or all individuals’ development plans.

Lean into AI code editors (Cursor, GitHub Copilot or similar)

Some developers remain skeptical of these tools (saying they produce low quality code), but others are finding them very useful (particularly with models since Claude 3.5 Sonnet.)

Personally, I think these tools are already very useful and, since they’re only getting better, it makes sense for developers to be familiar with these tools and to be sharing tips on getting the most from them with your particular codebase(s).

Be mindful of quality, however. These tools can make it faster to generate good code but, for now at least, they can also quickly produce a lot of bad code. So human oversight is needed. And you need to figure out what level of oversight is appropriate in your environment in different situations.

The ‘autocomplete’ features of these editors are, I think, a fairly safe way to get a productivity boost if developers are responsible for reading code they generate in this way and you maintain good automated test coverage alongside this.

More ‘agentic’ modes of operation where you prompt the AI to go off and make changes to multiple areas of the code can work well in some situations. They need much more care, though, not least because they can easily produce large sets of (potentially low quality) changes. Developing code this way entails a very different workflow; one which, as an industry, we’re only just starting to figure out.

Discuss the team’s use of AI

Given how quickly the landscape of AI software development tools (and practices for working with them) is evolving, you probably want to be regularly sharing learnings and discussing the tradeoffs of different approaches within your team(s). Hopefully that’s happening organically. If not, you may want to raise the topic from time to time.

You may want to have an evolving team policy or guidelines around how, as a team, you’re currently choosing to use AI. (If so, make sure it’s reviewed and updated every few months, though!)

Encourage use of ChatGPT or similar for general productivity

There’s a reason why ChatGPT was one of the fastest-growing products of all time – AI chat interfaces are incredibly useful. Some people will use them a lot; some people may not for now. That’s okay. You want to get people experimenting with these tools and learning when to reach for them and how best to incorporate them into the non-coding aspects of their work.

Medium Priority

Experiment with front-end focussed coding agents

Tools such as V0 and Replit Agent can be useful in some cases for prototyping. For example, a product manager could potentially create quick prototypes of a UI with minimal help from a developer.

Equally, tools like this could be useful in rapidly developing small internal tools (or parts of them) where quality isn’t critical.

Experiment with auxiliary AI tools

AI tools are rapidly popping up to help with other specific aspects of software development. For example, AI tools that can review code for security issues.

There’s too much to cover here in detail but I think it’s worth experimenting with any such tools that look well-aligned with your current priorities.

Low Priority

Be aware of end-to-end coding agents

There are also more ambitious coding agents such as Devin and OpenHands. These are designed to perform larger tasks, including front-end and back-end, but are fairly unreliable. I would keep an eye on developments here but not be in a rush to do anything with them for now.

What Do You Think?

What have I missed here?

Is there anything you disagree with?

And are there tools or new ways of working that you and your team are finding especially valuable?

The post How to Use AI in Software Product Development Today first appeared on Matt Collins.

How AI Helped Me Create a #1 Product Hunt Tool in Hours

Matt Collins — Thu, 09 Jan 2025 15:41:02 +0000

What if AI could help you generate ideas for products and build them?

That’s what happened when I used AI tools to create Flowdrafter—a writing app that solves a common problem for writers. (It even became Product Hunt’s #1 productivity tool of the week!)

Can AI Coding Agents Help with Marketing?

My wife is a writing coach and I’m frequently looking for ways to help her promote her business.

Seeing how easy it has become lately to create small tools using AI coding agents such as V0, I thought it might be worth experimenting with some tool-based content marketing (i.e. creating free tools as a way to bring relevant people to her site.)

But what tools should I create?

AI Product Idea Generation

I asked Claude to list common problems writers face.

One of Claude’s suggestions stood out:

“I keep editing as I write, and it’s taking me forever to make progress. I’ll spend an entire writing session perfecting one paragraph.”
— Claude.ai (pretending to be a writer.)

This seemed like a nice problem to try to tackle!

My memory of the next part of the process is a bit hazy and, going back, I couldn’t find much in my AI chat records but I believe I looked into Reddit discussions for more context and used ChatGPT to brainstorm app ideas.

At some point, the idea for a writing app that (perhaps weirdly!) disables editing came together.

(I and/or ChatGPT may very well have been inspired by an existing tool or Reddit thread. If you’re aware of such a tool, please let me know as I’d love to mention it here.)

Rapid AI Prototyping

I used V0, an AI app development tool, to quickly create a prototype. Within minutes, I had something basic but functional. I then used the Cursor AI editor to tweak and refine the app further.

Getting it live took a bit longer. I hosted it on Vercel and set it up on a subdomain of my wife’s main domain. You can check it out here: https://flowdrafter.pickupyourpen.com/.

In all, it probably took me a total of a few hours to get everything in place.

Launching to Unexpected Success

I launched Flowdrafter on Product Hunt, not expecting much. I’ve worked on other products for considerably longer that didn’t get much attention there. But to my surprise, Flowdrafter quickly became the most upvoted product of the day!

[Update: It ended up getting pipped to the top spot in the closing minutes of the day but still got a very creditable #2 spot and claimed #1 productivity app of the week.]

Results

If you remember, I got started on this as a way to promote my wife’s writing coaching service.

How effective was it?

Let’s take a look.

So far, the tool (which is on a subdomain of her main domain) has accumulated 65 backlinks (mostly as a result of doing well on Product Hunt).

“So what?” you may ask.

Before this, my wife’s site was languishing on around page 4 of the Google search results for her target phrase of “writing coach”.

Now? It’s in the middle of the first page. Much better!

As well as winning Product Hunt’s #1 Productivity App of the week, Flowdrafter was featured in The Neuron (an email newsletter going to 500k+ subscribers) as one of its ‘Treats To Try’.

It’s more incidental, but my Reddit write-up about the experience has received 162K views so far.

This led to it being picked up on X (by Deedy Das of Menlo Ventures).

And on LinkedIn (by Kieran Flanagan of HubSpot).

And, as of 9th January 2025, it’s still pinned to the top of the ClaudeAI subreddit as a community highlight (featuring my wife’s face, which she and I find rather amusing!).

What Made This Work?

Flowdrafter Addressed a Real Problem: Claude correctly identified spending too much time editing as a common problem.
It Was Simple: The concept of the app was simple, making it easy to understand.
It Was Quirky: The app addresses the problem by restricting what users can do, which probably came as a surprise and seemed to appeal to people.
AI Made it Easy to Try: Critically, tools like V0 and Cursor let me build and launch the app quickly. In the past it would have taken me much longer and, consequently, I almost certainly wouldn’t have done it at all.

Lessons Learned

AI Coding Agents Can be Great for MVPs: AI tools such as V0 can do a lot of the work in creating simple front-end apps such as this. (And they’re only going to get better!)
AI Is a Great Creative Partner: As well as being great for coding, AI can help with product ideas too.

What’s Next?

It’s been fun to see the unexpected interest in Flowdrafter on Product Hunt but people liking the concept is not the same as them using it on a regular basis. I wasn’t planning to develop it further, but its popularity means I’ll be thinking a bit more about that now.

In any case, if you haven’t tried AI coding agents like V0 yet, I highly recommend doing so. They’re going to be getting more and more powerful and I’ll certainly be continuing to experiment to see what they can do!

The post How AI Helped Me Create a #1 Product Hunt Tool in Hours first appeared on Matt Collins.

FastHTML: The Perfect Framework for Simple AI-Powered Web Apps?

Matt Collins — Fri, 20 Sep 2024 14:44:41 +0000

I heard about FastHTML a few months back and was immediately intrigued by its pitch:

“Modern web applications in pure Python

Built on solid web foundations, not the latest fads – with FastHTML you can get started on anything from simple dashboards to scalable web applications in minutes.”
https://www.fastht.ml/

This sounded like something I could use!

But I’m wary of new frameworks. Few of them become popular enough long-term to really be worth your time. Was this just another one that people might get excited about, but that would ultimately fade away?

I’ve been using Django as my go-to platform for building LLM-based applications for a while now and it’s amazing. (I’m a fan of Ruby on Rails, too, but switched to Python for AI-centric projects.) With so many people using it, you can be confident of finding built-in support or 3rd-party libraries and documentation for any common task.

But Django also feels quite cumbersome for small applications. A more lightweight option with a simple way to achieve some basic front-end effects could be a handy tool to have available.

I was keen to give this new framework a go.

Well – I’ve finally had a chance to do that. Here are some quick notes on how I found it.

Getting Started

Possibly the best thing about FastHTML is how easy it is to get started. It really is very straightforward to follow their instructions and get something up and running locally. It probably took me a few minutes. And as it hooks into HTMX out of the box, you can set up some handy and effective frontend UI effects very easily.

Building an MVP

This was a very simple app without any authentication or a relational database to interact with, so there were lots of common web app tasks that I didn’t have to go through.

I just needed to show a page with a simple form, respond to form submissions, do some back-end processing, and call out to some other services – nothing too challenging. FastHTML handled the web UI aspects perfectly in exactly the lightweight way I’d hoped.

Getting Ready for Production

Moving to production, things mostly went smoothly but I did hit one limitation with FastHTML. I was using TailwindCSS for styling and using Tailwind to generate streamlined versions of my main CSS file with just the styles I needed. Doing this reduced the page load size by 330 kB, which wasn’t critical here but was an interesting experiment. Anyhow, it meant the CSS file was frequently being updated as I tweaked the styling.

In production, browsers would cache an old version of the CSS file and end up with incorrect styling. This is a well-known problem and frameworks such as Rails and Django have out-of-the-box solutions to address it. At the moment, FastHTML doesn’t seem to. It’s not a major issue, but an example of the kind of hassle that you can largely avoid with more mature frameworks.
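One common workaround for stale cached CSS is to add a version marker derived from the file’s content to its URL, so the URL changes whenever the file does. The snippet below is a minimal sketch of that idea using only the Python standard library. The function name versioned_url and the ?v= query parameter are my own choices for illustration, not part of FastHTML.

```python
import hashlib


def versioned_url(url: str, content: bytes, length: int = 10) -> str:
    """Append a short hash of the file content to the URL as a query string."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    return f"{url}?v={digest}"


# Hypothetical usage: the href changes whenever the CSS content changes,
# so browsers fetch the new file instead of reusing a stale cached copy.
old_css = b"body { color: black; }"
new_css = b"body { color: navy; }"
print(versioned_url("/static/main.css", old_css))
print(versioned_url("/static/main.css", new_css))
```

In a real app you would read the bytes from the generated CSS file at startup and use the resulting URL in the stylesheet link.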

Deployment

Deploying my FastHTML app was fairly straightforward, though it did require a bit of figuring out on my part. I got my app running on DigitalOcean’s App Platform after a bit of research into their ‘app specs’ and making appropriate tweaks. If the FastHTML framework continues to gain in popularity, I expect hosting companies will make framework-specific tweaks and provide documentation that will make the process even more streamlined.

Conclusions

If you’re familiar with Python and want to build a fairly simple ‘GPT wrapper’ kind of web app, then I highly recommend taking a look at FastHTML. It’s great for creating MVPs of that kind of thing.

If you’re looking to build something a bit more complicated, then I’d be wary for now – you might like to wait a bit for the framework and the ecosystem around it to mature as I suspect you’ll hit limitations somewhere or another.

All in all, though, FastHTML is looking very promising. I think the philosophy behind it makes a lot of sense and, judging by the activity around it already, it seems like plenty of other people feel likewise. Congratulations to Jeremy Howard (the main person behind it) and everyone else who’s contributed to it so far!

P.S. In case you’d like to see the little app I created, here it is!

The post FastHTML: The Perfect Framework for Simple AI-Powered Web Apps? first appeared on Matt Collins.

5 Ways to Make an Agentic AI System More Reliable

Matt Collins — Thu, 18 Apr 2024 10:13:15 +0000

Lots of people are starting to explore what can be done with agent-based AI systems. It feels like there’s huge potential in them. But getting an agentic system to work reliably enough to use in practice often turns out to be very hard.

Here are a few techniques that can help with reliability.

Technique 1: Optimise The Agent’s Tools

The performance of agents depends heavily on the tools they have available. Give them better tools (or improve the ones they’ve got) and, naturally, they’re likely to perform better.

For example:

If your agent is doing online research, instead of giving it access to a generic search result API, give it access to one designed specifically for LLMs (e.g. one of the ones described here).
If your agent only needs certain fields returned by an API, filter the other fields out.
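As a rough illustration of the field-filtering idea, here is a small sketch that trims an API response before it reaches the agent. The helper keep_fields and the field names in the sample record are made up for this example.

```python
def keep_fields(record: dict, wanted: set) -> dict:
    """Return a copy of `record` containing only the keys in `wanted`."""
    return {key: value for key, value in record.items() if key in wanted}


# Hypothetical search result; the field names are made up for illustration.
raw_result = {
    "title": "FastHTML docs",
    "url": "https://www.fastht.ml/",
    "snippet": "Modern web applications in pure Python",
    "tracking_id": "abc123",
    "rank_debug": {"score": 0.87},
}

# Only the fields the agent actually needs are passed on.
trimmed = keep_fields(raw_result, {"title", "url", "snippet"})
print(trimmed)
```

Fewer fields means fewer tokens and less irrelevant detail for the agent to get distracted by.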

Technique 2: Add a Retry Mechanism

Let’s suppose that, 20% of the time, your agent produces a result that can, in an automated way, be identified as poor. If latency isn’t a concern, you can check each result and ask the agent to try again if necessary. (You may need to give the agent some information about its previous attempts so that it can avoid repeating itself.) This can quickly and significantly increase the reliability of the overall system.
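Here is a minimal sketch of such a retry loop. The names run_with_retries, stub_agent and long_enough are hypothetical, and the stub stands in for a real agent call. As a rough guide, if each attempt failed independently 20% of the time, three attempts would all fail only about 0.8% of the time (0.2 × 0.2 × 0.2), though in practice failures are often correlated, so the real gain may be smaller.

```python
from typing import Callable, List, Optional


def run_with_retries(
    agent: Callable[[List[str]], str],
    is_acceptable: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Call `agent` until its result passes the check or attempts run out.

    The agent receives the list of earlier rejected results so it can
    avoid repeating them.
    """
    rejected: List[str] = []
    for _ in range(max_attempts):
        result = agent(rejected)
        if is_acceptable(result):
            return result
        rejected.append(result)
    return None  # every attempt failed the check


# Stub agent for illustration: poor on the first try, fine afterwards.
def stub_agent(rejected: List[str]) -> str:
    return "too short" if not rejected else "a longer, more detailed answer"


# Stand-in for whatever automated quality check fits your use case.
def long_enough(result: str) -> bool:
    return len(result) > 20


print(run_with_retries(stub_agent, long_enough))
```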

Technique 3: Generate Extra Responses and Pick the Best

If your system involves generating a set of N items but some of those items can be weak, consider generating slightly more than N items, automatically scoring them, then picking and using the highest-scoring N.
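A sketch of this over-generate-and-select approach, assuming you already have some way to generate candidates and score them automatically. The names generate_and_pick and stub_generate, and the use of length as a score, are placeholders for illustration.

```python
import heapq
from typing import Callable, List


def generate_and_pick(
    generate: Callable[[int], List[str]],
    score: Callable[[str], float],
    n: int,
    extra: int = 2,
) -> List[str]:
    """Generate n + extra candidates and keep the n highest-scoring ones."""
    candidates = generate(n + extra)
    return heapq.nlargest(n, candidates, key=score)


# Stub generator for illustration; a real one would call your LLM or agent.
POOL = ["ok", "a strong headline", "meh", "another solid option", "x"]


def stub_generate(count: int) -> List[str]:
    return POOL[:count]


# Length is used as a stand-in score; swap in your own quality metric.
best = generate_and_pick(stub_generate, score=len, n=3, extra=2)
print(best)
```

How many extra items to generate depends on how often weak items appear and how much the extra generation costs.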

Technique 4: Eliminate the Agent

AI agents are appealing precisely because they can figure things out in an iterative, exploratory way. But the flipside of that open-endedness is that they can be very unreliable. You probably don’t want to hear this, but your best bet for now may actually be to eliminate your agent if possible.

How?

It depends on your situation but, if you’re asking an agent to complete a particular sort of task, try thinking through the steps of how you’d complete that task yourself. See if you can find a fixed sequence of steps that will do the job in the vast majority of cases and that you can automate (using LLMs if appropriate).
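To make the idea concrete, here is a tiny sketch of a fixed sequence of steps replacing an open-ended agent. The step functions are hypothetical stand-ins, and summarise is a placeholder where an LLM call might go.

```python
from typing import Any, Callable, Iterable


def run_pipeline(steps: Iterable[Callable[[Any], Any]], initial: Any) -> Any:
    """Run each step in order, feeding each step's output into the next."""
    value = initial
    for step in steps:
        value = step(value)
    return value


# Hypothetical steps for illustration only.
def extract_text(raw: str) -> str:
    return raw.strip()


def summarise(text: str) -> str:
    # Placeholder for an LLM call: here we just keep the first sentence.
    return text.split(".")[0] + "."


def format_reply(summary: str) -> str:
    return f"Summary: {summary}"


reply = run_pipeline(
    [extract_text, summarise, format_reply],
    "  Agents are flexible. They are also unreliable.  ",
)
print(reply)
```

Because the order of steps never varies, each one can be tested and monitored on its own.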

Technique 5: Avoid the Agent in Certain (Important) Cases

Sometimes eliminating your agent entirely isn’t possible (though it’s worth thinking hard about). Even in these situations, you may have some important cases that can be handled using a fixed sequence of steps that is more reliable than the agent would be. You may be able to classify incoming requests and route them to either your (relatively reliable) fixed sequence of steps or to your (relatively unreliable) agent. The overall reliability of your system can be improved as a result.
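The routing idea might look something like the sketch below. The classifier here is a simple keyword check and the handlers are stubs. All of the names (route, classify, handle_refund, stub_agent) are assumptions for illustration, and a real classifier could itself be an LLM call.

```python
from typing import Callable, Dict


def route(
    request: str,
    classify: Callable[[str], str],
    fixed_handlers: Dict[str, Callable[[str], str]],
    agent: Callable[[str], str],
) -> str:
    """Send known request types to a fixed handler, everything else to the agent."""
    label = classify(request)
    handler = fixed_handlers.get(label)
    if handler is not None:
        return handler(request)
    return agent(request)


# Hypothetical classifier: a keyword check standing in for something smarter.
def classify(request: str) -> str:
    return "refund" if "refund" in request.lower() else "other"


# Fixed, predictable path for an important case.
def handle_refund(request: str) -> str:
    return "fixed: refund workflow started"


# Stub for the open-ended (less reliable) agent.
def stub_agent(request: str) -> str:
    return f"agent: working on '{request}'"


handlers = {"refund": handle_refund}
print(route("I want a refund please", classify, handlers, stub_agent))
print(route("Plan my product launch", classify, handlers, stub_agent))
```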

Final Words

Agent-based AI systems have a lot of promise but can be very unreliable and pose challenges that can be unfamiliar if you’re used to more traditional, deterministic software development. Dealing with that unreliability often becomes a key challenge and a focus of your engineering.

I hope the techniques I’ve outlined here give you some useful ideas and help you a little on your journey.

Let me know what other techniques you’re finding useful when developing AI-based software systems.

The post 5 Ways to Make an Agentic AI System More Reliable first appeared on Matt Collins.

What the 2023 Hollywood Writers’ Agreement Says About the Use of AI in Screenwriting

Matt Collins — Wed, 27 Sep 2023 07:52:05 +0000

As of 27th September 2023, the Writers Guild of America (WGA) has ended its strike, having reached a tentative agreement with the big studios that will now be voted on by the WGA’s members.

The deal has some key terms about AI and could be a sign of what will be agreed with actors, who are also currently on strike.

Here’s how the WGA summarises the AI-related terms:

5. Artificial Intelligence

We have established regulations for the use of artificial intelligence (“AI”) on MBA-covered projects in the following ways:

AI can’t write or rewrite literary material, and AI-generated material will not be considered source material under the MBA, meaning that AI-generated material can’t be used to undermine a writer’s credit or separated rights.

A writer can choose to use AI when performing writing services, if the company consents and provided that the writer follows applicable company policies, but the company can’t require the writer to use AI software (e.g., ChatGPT) when performing writing services.

The Company must disclose to the writer if any materials given to the writer have been generated by AI or incorporate AI-generated material.

The WGA reserves the right to assert that exploitation of writers’ material to train AI is prohibited by MBA or other law.

https://www.wgacontract2023.org/the-campaign/summary-of-the-2023-wga-mba

And here’s the full text of the relevant clauses from the tentative agreement:

ARTICLE 72 GENERATIVE ARTIFICIAL INTELLIGENCE

A. The parties acknowledge that definitions of generative artificial intelligence
(‘GAI’) vary, but agree that the term generally refers to a subset of artificial
intelligence that learns patterns from data and produces content, including written
material, based on those patterns, and may employ algorithmic methods (e.g.,
ChatGPT, Llama, MidJourney, Dall-E). It does not include ‘traditional AI’
technologies such as those used in CGI and VFX and those programmed to
perform operational and analytical functions.

B. The Companies agree that because neither traditional AI nor GAI is a person,
neither is a ‘writer’ or ‘professional writer’ as defined in Articles 1.B.1.a.,
1.B.1.b., 1.C.1.a. and 1.C.1.b. of this MBA, and, therefore, written material
produced by traditional AI or GAI shall not be considered literary material under
this or any prior MBA.

C. Should a Company furnish a writer with written material produced by GAI which
has not been previously published or exploited, and instruct the writer to use the
GAI-produced material as the basis for writing literary material:

1. The Company shall disclose to that writer that the written material was
produced by GAI.

2. The GAI-produced written material shall not be considered assigned
material for purposes of determining the writer’s compensation.

3. The GAI-produced written material shall not be considered source material
for purposes of determining writing credit.

4. The GAI-produced written material shall not be the basis for disqualifying
a writer from eligibility for separated rights.

This subparagraph C. also applies when a writer, with the consent of the
Company, uses GAI in the course of preparing literary material. Company agrees
that it will not publish or exploit GAI written material for the purposes of evading
this provision.

When a writer, with the consent of the Company, uses GAI in the course of
preparing written material or incorporates GAI-produced material in written
material, such written material shall be considered literary material and not
material ‘produced’ by GAI.

The following examples illustrate application of this subparagraph C.:

EXAMPLE 1:

Company furnishes Writer A with written material substantially in the
form of a screenplay produced by GAI which has not been previously
published or exploited and assigns no other materials. Company instructs
Writer A to rewrite the GAI-produced written material. Company must
pay Writer A no less than the minimum compensation for a screenplay
under Article 13.A.1.a.(2), as well as no less than the amount specified in
Article 13.A.1.a.(9), ‘Additional Compensation Screenplay – No Assigned
Material.’ The GAI-produced written material is not considered source
material when determining writing credit to Writer A and will not
disqualify Writer A from eligibility for separated rights.

Company later assigns the screenplay rewritten by Writer A to Writer B
and instructs Writer B to rewrite the screenplay rewritten by Writer A.
Company must pay Writer B no less than the minimum compensation for a
rewrite under Article 13.A.1.a.(3). Writer A’s rewritten screenplay must be
considered when determining writing credit to Writer B and eligibility for
separated rights.

EXAMPLE 2:

Company furnishes Writer A with written material substantially in the
form of a story produced by GAI which has not been previously published
or exploited and assigns no other materials. Company instructs Writer A to
write a teleplay based on the GAI-produced written material. Company
must pay Writer A no less than the minimum compensation for a story and
teleplay. The GAI-produced story is not considered source material when
determining writing credit to Writer A and will not disqualify Writer A
from eligibility for separated rights.

Company later assigns the teleplay written by Writer A to Writer B and
instructs Writer B to rewrite the teleplay written by Writer A. Company
must pay Writer B no less than the minimum compensation for a rewrite.
Writer A’s teleplay must be considered when determining writing credit to
Writer B and eligibility for separated rights.

D. A writer will be required to adhere to the Company’s policies regarding the use of
GAI (e.g., policies related to ethics, privacy, security, copyrightability or other
protection of intellectual property rights). Any purchase of literary material from
a professional writer is also subject to such policies. A writer must obtain the
Company’s consent before using GAI. The Company retains the right to reject the
use of GAI, including the right to reject a use of GAI that could adversely affect
the copyrightability or exploitation of the work.

E. A Company may not require, as a condition of employment, that a writer use a
GAI program which generates written material that would otherwise be ‘literary
material’ (as defined in Article 1.A.5.) if written by a writer (as defined in Article
1.B.1.a. and Article 1.C.1.a.) (e.g., a Company may not require a writer to use
ChatGPT to write literary material). The preceding sentence does not prohibit a
Company from requiring a writer to use a GAI program that does not generate
written material, such as a GAI program that detects potential copyright
infringement or plagiarism.

F. The parties acknowledge that the legal landscape around the use of GAI is
uncertain and rapidly developing and each party is reserving all rights relating
thereto unless otherwise expressly addressed in this Article 72. For example,
nothing in this Article 72 restricts any writer who has retained reserved rights
under Article 16.B., or the WGA on behalf of any such writer, from asserting that
the exploitation of their literary material to train, inform, or in any other way
develop GAI software or systems, is within such rights and is not otherwise
permitted under applicable law.

G. Each Company agrees to meet with the Guild during the term of this Agreement at
least semi-annually at the request of the Guild and subject to appropriate
confidentiality agreements to discuss and review information related to the
Company’s use and intended use of GAI in motion picture development and
production. The foregoing provision shall not be construed to waive any right of
the Guild under the National Labor Relations Act, including but not limited to the
right to seek information necessary and relevant to the administration and enforcement of this Article 72.
https://www.wgacontract2023.org/wgacontract/files/memorandum-of-agreement-for-the-2023-wga-theatrical-and-television-basic-agreement.pdf

The post What the 2023 Hollywood Writers’ Agreement Says About the Use of AI in Screenwriting first appeared on Matt Collins.