Ed Spencer's Blog

herdctl: Composable Fleets of Claude Agents

Sun, 22 Feb 2026 12:00:00 GMT

I just added support for Composable Fleets to herdctl. As I build herdctl into more projects, I increasingly find myself creating a fleet of agents per project, and I wanted a way to run them all and see what's going on with them.

Fleet Composability means you can create hierarchical fleets of related agents: a hierarchy of teams of agents, each with its own defined responsibilities and a scratch directory that it can use to store state and artifacts. herdctl already has 4 of those agents, so I keep them in a single fleet, but as herdctl itself is only one of the projects that I'm working on, I actually run a superfleet of all of them.
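To make the hierarchy concrete, here's a minimal TypeScript sketch of how a nested fleet could be modelled and flattened into a single list of agents to run. The `Fleet` and `AgentRef` types, the `flattenFleet` function and the dotted naming scheme are all assumptions for illustration, not herdctl's internal API (real fleets are defined in YAML).

```typescript
// Hypothetical in-memory model of a composable fleet: a fleet holds agents
// and, optionally, nested sub-fleets. None of these names come from herdctl.
interface AgentRef {
  name: string;
}

interface Fleet {
  name: string;
  agents: AgentRef[];
  fleets?: Fleet[];
}

// Walk the fleet tree depth-first and return every agent with a dotted path
// such as "superfleet.personal.homelab", so agents with the same name in
// different sub-fleets stay distinguishable.
function flattenFleet(fleet: Fleet, prefix: string[] = []): string[] {
  const path = [...prefix, fleet.name];
  const own = fleet.agents.map((agent) => [...path, agent.name].join("."));
  const nested = (fleet.fleets ?? []).flatMap((child) => flattenFleet(child, path));
  return [...own, ...nested];
}

const superfleet: Fleet = {
  name: "superfleet",
  agents: [],
  fleets: [
    {
      name: "herdctl",
      agents: [{ name: "security" }, { name: "docs" }, { name: "changelog" }, { name: "engineer" }],
    },
    { name: "personal", agents: [{ name: "homelab" }, { name: "garden" }] },
  ],
};

console.log(flattenFleet(superfleet));
```

Starting the top-level fleet then amounts to starting every agent in the flattened list, while the nesting is kept around for display (which is how the web UI groups them).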

The web UI knows about fleet composability and renders my fleets like this:

For the herdctl project itself, I'm currently running 4 agents, in a single fleet for now:

security - daily schedule, scans the repo for new security issues
docs - daily schedule, scans to see if any commits should have had docs updates but didn't, makes PRs if so
changelog - scans each day to see if we did anything worthy of putting on the docs changelog page
engineer - general purpose engineering agent that I can chat with

The first 3 of those are all "background" agents that run on a schedule and just continually fix things that can otherwise drift over time unless a human pays attention to them. I suspect I'll end up with a bunch more agents following the same pattern. Each agent gets its own agents/myAgent directory with its own configuration and prompts. It's Claude Code so it can do whatever you want it to really.

herdctl doesn't impose much structure on you. You don't have to use an agents subdirectory at all; you can put things wherever you like. But the agents subdirectory is where I'd recommend you place them, as a couple of upcoming convenience features will work a little better if you do.

But I'm also developing other projects while I work on herdctl, and I have a set of personal agents that help me with my house-related projects like the homelab and the garden. Fleet Composition lets me group that second set of agents together into a "personal" fleet, and then run them plus all the other projects' agents in one command.

I write a bit more about how the homelab agent does its work in this post about continuous security with herdctl.

Sub-teams in Large Projects

It's pretty easy to see how a large project could start to benefit from having multiple fleets of agents. Marketing-related agents could:

Grab analytics regularly, synthesize the results in context and alert us when something looks wrong
Optimize SEO for any new pages that have popped up, drive long-term ranking improvements
Look for negative sentiment on social media, handle or escalate as appropriate

Those are all fairly straightforward things to get Claude Code to do, with a bit of tinkering, and I'm sure you can think of a bunch more too. You can probably also think about a bunch of possible Engineering-related agents you might want, and maybe even some Legal-related agents too:

GDPR scanner - are we still GDPR compliant? Opens a browser every day and checks
TOS scanner - has anything we've shipped lately accidentally broken our Terms of Service?
Legal news scanner - has any law been passed in any country we operate in that might affect us?

Most of the above are arguably things that should be owned by Engineering in many organizations, but with Fleet Composability it's entirely possible for a Marketing team to have a set of herdctl agents in a separate repo, and just have them run as part of the wider fleet that Engineering maintains/supports.

With herdctl we can define all of the above with clean, source-controllable YAML files.

herdctl Agents Don't Really Exist

A herdctl agent is just a YAML file that is approximately:

50% config passed straight to Claude Agents SDK
20% discord/slack/web dashboard config
15% schedule config
10% docker config
5% other - optional extra prompts

Aside from that final 5%, none of that has much to do with what your agent does; it's just a way to configure the agent's permissions, connectors, schedules, etc., mostly by passing that config through to wherever it should go.

Schedules do usually provide a prompt that is given to Claude Code when the schedule is triggered, but you're encouraged to make that prompt very short, just something like "Run the /scan-for-missing-docs-in-recent-commits skill". The herdctl.yaml file is not the right place to put detailed agent instructions - use the existing Claude Code ecosystem for that.

Everything else that your agent does should be in the form of something like .md files. A common emerging pattern is to just instruct your agent to keep a STATE.md file up to date with whatever state it wants to persist between runs. If you provide the agent with a Github access token you can have it commit and push whatever work it does to wherever it makes sense to keep it.

Similarly, although there is a primitive hooks implementation, there's no intention at the moment to add a lot of connectors to that. If you need your agent to send an email as part of its scheduled work, for example, there are ways to make Claude Code do that, so herdctl doesn't aim to provide that plumbing and probably never will.

Find out more about herdctl at herdctl.dev.

Continuous Security Auditing with herdctl

Sun, 22 Feb 2026 11:00:00 GMT

One of the most valuable unlocks with herdctl for me has been having a bunch of agentic things that just happen every day, without me having to intervene. herdctl itself already uses the following agents that run on a daily schedule:

changelog - updates the docs changelog page if anything worthy happened that didn't make it there already
docs - scans to see if any commits should have had docs updates but didn't, makes PRs if so
security - daily schedule scans the repo every day for new security issues

There are others that I want to set up, like a twitter bot that advertises new features just dropped, docs updates, etc, but today I'll focus more on the third agent above - the security agent.

Daily Security Scans

The Daily Security Scan agent was the first one I set up - a couple of weeks ago now. I gave it a remit that looks a bit like this:

Develop and maintain a model of the codebase
Track which areas of the code are most vulnerable
Track ongoing potential security vulnerabilities
Run a daily scan to re-check everything
Alert me if anything looks suspicious

Ok, but why do this daily at all? If we can do all this in an automated way, why not do it on every commit? Two main reasons:

cost - the last run went for 37 minutes, which is a lot of tokens
lead time - the last run went for 37 minutes... CI currently takes about 1 minute

Of course, you can run the security scan agent as often as you like, and every time you merge code, it should be after a security-minded review has been done. But there is value in running them periodically, in addition to at merge-time. First, it's possible for multiple PRs to combine to create a security problem that no single one of them did by itself and might not otherwise be detected.

Second, the security landscape is highly fluid, so an agent that runs every day and knows where on the internet to look for new vulnerabilities relevant to the software stack it is guarding is very useful.

The Security Audit Agent

Third, no matter how well you write your prompt, even agentic loops like Claude Code will eventually stop and not go any further. Depending on your project, it's likely that you couldn't get Claude to analyze your entire codebase in a single run, no matter how hard you tried or how many sub-agents it spawned.

The non-determinism of LLMs cuts both ways here - you can sic an AI agent on your codebase 99 times and have it find a vulnerability only on the 100th run. It's a bit like throwing small cans of paint at a wall - each splat represents the surface area that the agent truly checked on that run, but if you throw enough cans (with enough variability in your throws) you'll gradually cover more and more of the wall.

It's a pretty metaphor but I don't have a good idea of how to measure it.
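One rough way to reason about the paint cans, under a strong simplifying assumption: if each run independently checks a random fraction p of the codebase, the expected fraction covered after n runs is 1 - (1 - p)^n. Real runs aren't independent (the agent keeps a model of the codebase and revisits hot spots), so treat this TypeScript sketch as an intuition pump rather than a measurement.

```typescript
// Expected fraction of the codebase checked at least once after `runs` runs,
// ASSUMING each run independently checks a random fraction `perRun` of it.
function expectedCoverage(perRun: number, runs: number): number {
  if (perRun < 0 || perRun > 1) throw new RangeError("perRun must be in [0, 1]");
  return 1 - Math.pow(1 - perRun, runs);
}

// Number of runs needed to reach a target coverage under the same assumption.
function runsForCoverage(perRun: number, target: number): number {
  if (perRun <= 0 || perRun >= 1) throw new RangeError("perRun must be in (0, 1)");
  return Math.ceil(Math.log(1 - target) / Math.log(1 - perRun));
}

console.log(expectedCoverage(0.1, 10).toFixed(3)); // 0.651
console.log(runsForCoverage(0.1, 0.95)); // 29
```

The shape is the interesting part: coverage climbs quickly at first and then crawls, which matches the feeling that each extra daily run mostly re-checks familiar ground while occasionally landing somewhere new.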

Is it actually good?

As with anything, it has its ups and downs. On the positive side:

It does genuinely identify new vulnerabilities, and documents them
As an open source project, this has the natural outcome of disclosing all known vulnerabilities publicly
It has clearly adapted to the codebase over time, and keeps its own model of the codebase updated

Where things can be improved:

It does not reliably commit and push, so a couple of days are missing reports
It doesn't have a way to escalate really serious stuff to me
There's no protection against accidentally zero-daying our own codebase

That last one is more a concern of open source projects, where everyone can see the published security audit assets. There are solutions to the first two as well, so that's probably a direction I will head in next. It's not fully hands-off yet, but it can get there.

Self-Correcting Dumb Mistakes

One of the more interesting things that happened in the early days of running this was seeing the agent evolve its approach over time. While I was doing some testing, I accidentally left the job running hourly instead of daily. I didn't notice for several days.

Each audit run creates job log files in .herdctl/jobs/, and one of the security checks scans for files containing bypassPermissions: true — a setting that disables interactive permission prompts and is rightly flagged as something to keep an eye on. The problem was that the scanner was too broad: it counted not just the config files where this setting is intentionally used, but also the JSONL session logs that naturally contain the setting as part of their recorded metadata. Every hourly audit run created new log files, which the next run dutifully counted as new instances of the vulnerability.

From the auditor's perspective, it was watching a security concern grow exponentially — 61 files, then 87, then 103, then 143 — and it escalated accordingly, from GREEN to YELLOW to RED CRITICAL. On February 14th, after watching the count jump 31% in a single day, it declared "The security audit system itself is creating the security risk it's designed to prevent." Its prescribed remedy? Stop running audits entirely. It wrote HALT ALL AUDITS IMMEDIATELY into its persistent state file and committed the report.

From that point on, every new session would spawn, read the halt directive from state, and politely refuse to do any work. This went on for two full days — nine separate sessions, each lasting about 30-40 seconds, each producing an eloquent refusal and a helpful list of remediation steps that no one was reading. The agent had effectively shut itself down.

The best part is how it resolved. On February 17th, a fresh session spawned, read the halt, but instead of immediately refusing, it decided to re-examine the evidence. It ran the scanner again with more careful filtering, discovered that the real count was 21 files (not 143), downgraded the finding from CRITICAL to MEDIUM, lifted its own halt directive, and resumed normal operations. No human intervened at any point — it panicked, shut itself down, and then un-panicked itself once it got a clearer look at the data.
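The "more careful filtering" boils down to not treating herdctl's own job logs as findings. I don't know exactly what filter the agent applied, so this is a hypothetical TypeScript sketch of the idea: it works on an in-memory list of files, treats anything under .herdctl/jobs/ or ending in .jsonl as a log, and the file paths are made up.

```typescript
interface ScannedFile {
  path: string;
  contents: string;
}

// Matches the flag written as YAML ("bypassPermissions: true") and as JSON
// ("bypassPermissions": true), as a JSONL log line would record it.
const BYPASS_FLAG = /bypassPermissions"?\s*:\s*true/;

// Job logs live under .herdctl/jobs/; session logs are JSONL files.
function isJobLog(path: string): boolean {
  return path.includes(".herdctl/jobs/") || path.endsWith(".jsonl");
}

// Naive version: every file mentioning the flag counts as a finding,
// so each audit run's own logs inflate the next run's count.
function naiveScan(files: ScannedFile[]): string[] {
  return files.filter((f) => BYPASS_FLAG.test(f.contents)).map((f) => f.path);
}

// Filtered version: the audit's own logs are excluded before scanning.
function filteredScan(files: ScannedFile[]): string[] {
  return naiveScan(files.filter((f) => !isJobLog(f.path)));
}

const files: ScannedFile[] = [
  { path: "agents/engineer/agent.yaml", contents: "bypassPermissions: true" },
  { path: ".herdctl/jobs/job-001.jsonl", contents: '{"bypassPermissions": true}' },
  { path: ".herdctl/jobs/job-002.jsonl", contents: '{"bypassPermissions": true}' },
  { path: "README.md", contents: "# herdctl" },
];

console.log(naiveScan(files).length); // 3
console.log(filteredScan(files).length); // 1
```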

There were a couple of bonus bugs too: the auditor helpfully copied actual API tokens into its report as "evidence" of secret exposure (GitHub's Push Protection caught that one), and a subagent kept creating its own git branches because it was following the project's coding standards about never working on main — rules that make sense for development but not for an automated audit workflow.

The work continues.

Same model, different setting

Finally, one of my little fleet of personal agents is called homelab - homelab has a huge amount of documentation about my overly-elaborate home network, and its own SSH key that grants it some access to some machines. (lol what could go wrong)

homelab now runs a daily scan of the entire network infrastructure and drops me a discord message each morning with what's up. It almost immediately found things like a firewall needing a patch, and a couple of things that I might want to do to better distribute load across one of the proxmox clusters:

This is super useful to me already - I've had security cameras break for weeks without even noticing, and things do generally break over time, so having an agent that knows how to keep on top of all this for me is a massive win. Bit rot is real.

homelab is just a git repo with a bunch of .md files and a herdctl agent config - a few dozen lines of config. It's essentially the same thing as herdctl's own docs agent. You could have a single agent do a bunch of different things, but I prefer to keep specialized agents that do one thing well - whether it's catching any missed updates to docs, checking that we didn't accidentally ship something that violates our TOS, or even picking up tickets ready for implementation.

Find out more about herdctl at herdctl.dev.

Herdctl Gains Slack and Web Connectors

Sun, 22 Feb 2026 10:00:00 GMT

This week herdctl gained support for Slack and Web connectors, opening up two new ways to interact with your Claude Code agents, running on whatever machine you like.

The new web UI provides fleet management and a chat frontend. It's a bring-your-own-auth app that you can configure to run on your laptop, on a proxmox machine (I do both), in the cloud, or wherever. Enable it like this in your fleet config:

web:
  enabled: true

Then just start your fleet like normal:

herdctl start

Here's what it looks like:

By default the web server runs on http://localhost:3232, but you can configure it to another port if you like. You're talking to the same Claude Code you would usually be talking to, so depending on what level of permissions you set in your herdctl config, it can do basically anything that Claude Code can do.

At the moment the web chat is fairly primitive, so for any medium-heavy tasks I'm still using Claude Code directly. But for simple tasks and for interrogation or ideation or conversations about architecture, the herdctl web UI is already my go-to because I tend to keep dozens of conversations going at once and there's only so far zellij can help manage all those terminal tabs with Claude Code running in them.

The chats in herdctl persist across restarts, and under the covers it's just calling Claude Code anyway, so you can even resume the chat session you were having in the web UI in Claude Code itself.

It works pretty well on mobile, which will probably continue to be true as I am not likely to lose the bad habit of talking to my agents in bed.

While Slack and Discord are agent-level configurations, the web connector runs at the fleet level. The web UI is able to show all of the agents in your fleet, with chat and basic admin UIs for each of them:

Composable Fleets

Also dropping this week are Composable Fleets, which are already reflected in the web UI as nested groups of agents. As I build herdctl into more projects, I increasingly find myself creating a fleet of agents per project, and wanted a way to run them all and see what's going on with them.

For more on Composable Fleets, see this post.

Slack Support

The new Slack chat connector is the first community contribution that's made it into herdctl - big shout out to Alex for this awesome contribution! The Slack connector works just like the Discord one, and you can easily use both at the same time:

name: pentester
description: Hourly pentesting of core services

chat:
  slack:
    bot_token_env: PENTESTER_SLACK_BOT_TOKEN
    app_token_env: PENTESTER_SLACK_APP_TOKEN
    session_expiry_hours: 24
    log_level: standard
    dm:
      enabled: true
      mode: auto
    channels:
      - id: "${SECURITY_GENERAL_SLACK_CHANNEL_ID}"
      - id: "${SECURITY_ALERTS_SLACK_CHANNEL_ID}"

The Slack connector supports both direct messages and channel messages, and can be used at the same time as the Discord and Web connectors to expose your agents across multiple platforms. Beware that word expose, though, and use your best judgment on whether you actually do so.

Aside from DM and channel whitelisting, the Slack integration can be configured to send messages for tool calls, tool results and other types of intermediate messages while it performs a task. And regardless of whether Slack is connected or not, the underlying herdctl agent still has its own set of permissions regarding the tools it can use, where it runs, and so on.
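Conceptually, DM and channel whitelisting is a default-deny check along the lines of this TypeScript sketch. The `ChatAccessConfig` and `IncomingMessage` shapes are hypothetical simplifications of the YAML above (they ignore `mode`, `session_expiry_hours` and so on), not the connector's actual code.

```typescript
interface ChatAccessConfig {
  dmEnabled: boolean;
  channelIds: string[];
}

type IncomingMessage =
  | { kind: "dm"; userId: string }
  | { kind: "channel"; channelId: string };

// Default-deny: DMs only if enabled, channel messages only from listed channels.
function shouldHandle(config: ChatAccessConfig, message: IncomingMessage): boolean {
  if (message.kind === "dm") {
    return config.dmEnabled;
  }
  return config.channelIds.includes(message.channelId);
}

const pentester: ChatAccessConfig = {
  dmEnabled: true,
  channelIds: ["C0SECGENERAL", "C0SECALERTS"], // made-up channel IDs
};

console.log(shouldHandle(pentester, { kind: "channel", channelId: "C0RANDOM" })); // false
```

Note that this gate only decides whether the agent hears a message at all; what it can then do is still bounded by the agent's own tool permissions.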

Find out more about herdctl at herdctl.dev.

Make Switches Quiet Again

Tue, 10 Feb 2026 10:00:00 GMT

I recently upgraded to 2.5 gigabit managed switches for my home network. That's mostly been a straightforward process - I was swapping a TP-Link TL-SG2016P for a TP-Link SG3218XP-M2: both switches have 16 ports (8 of them PoE+), but the SG3218XP-M2 swaps out the 1 gigabit ports for 2.5 gigabit ports, and adds 2x 10 gigabit SFP+ ports for fiber connections.

As I have a disturbingly large home network, I bought 3 of these switches so that I could plug everything into a 2.5g port and use the 10g ports for interconnects between the switches themselves. Each switch is in a different cupboard/closet in the house, with one of them being in the home theater closet and another in the bedroom closet. If they're noisy, they're annoying.

And noisy they are. It's my first time owning switches that make noise that can be heard from more than a few feet away. The noise all comes from a couple of tiny 40mm fans. When the switch powers up, they run at full throttle, which I measured at about 50 dB. After a minute or so it calms down to about 40 dB, but that's still actually quite annoying, and far louder than anything else in the rack.

Swapping the fan is easy

Thankfully it's pretty easy to solve this. Noctua make these lovely silent 40mm fans that are perfect for the job. They're a straight swap and the process is straightforward. I used these tools:

You don't need to use these exact tools but here are links to the ones I have. The hobby knife set is a bit of a steal at < $10, and the set came with the little tweezers pictured above, which were useful when putting the washers back on the machine screws:

Remove the cover

The cover is held on with 6 small screws. Once the cover is off you can see the two small black fans at the back right:

The fans are attached to the case with machine screws and nuts, with a couple of washers to keep things tight. There's not a huge amount of space in there to work with, but it's easy enough to grab either side of the nut with the pliers and use the screwdriver to unfasten the machine screw. Keep hold of the washer for the reinstall step:

The fans are connected to normal 3-pin headers on the switch's main PCB, and are held in place with a little dab of glue that looks and feels like chewing gum:

I carefully cut through this with the hobby knife, but there are many ways to skin that cat. Once you've done that you can just lift the fans out of the case and throw them in the trash (well, only after you've confirmed the new ones work).

Add the new fans

Installing the new fans is just doing the previous steps in reverse. I used the tweezers that came with the hobby knife set to carefully place the washers back over the machine screws (I held the screws in place with the screwdriver). Then I was able to grab the nut with the pliers, line it up with the screw, and screw it tight with the screwdriver:

Now just plug in the fan headers and we're good to go. Here's what it should look like before you put the cover back on:

I didn't bother applying adhesive onto the header again, nor did I put the little lock washers back on - just the flat washer to avoid damaging the plastic fan housing.

Results

Now, when I turn the switch on, instead of 50 dB for the first minute I get 40 dB. That's already as quiet as the old fans ever got - the new fan on full throttle is as loud as the old fans dialed down. After about a minute, the new fans throttle down to about 30 dB, which is a massive improvement and makes the switch not annoying to have next to me.
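Because decibels are logarithmic, those 10 dB steps are bigger than they look: each one is a tenfold drop in sound intensity, and a common rule of thumb is that 10 dB quieter is perceived as roughly half as loud. A quick TypeScript check of the arithmetic, using the rough readings above:

```typescript
// Ratio of sound intensity between two levels given in decibels.
// dB = 10 * log10(I / I0), so a drop of d dB divides intensity by 10^(d / 10).
function intensityRatio(louderDb: number, quieterDb: number): number {
  return Math.pow(10, (louderDb - quieterDb) / 10);
}

console.log(intensityRatio(50, 40)); // 10: old fans at startup vs. new fans at startup
console.log(intensityRatio(40, 30)); // 10: old fans settled vs. new fans settled
console.log(intensityRatio(50, 30)); // 100: old startup vs. new steady state
```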

It's about a 15 minute process and anybody can do it. One minor niggle is that the new fan causes the FAN LED to turn orange on the front of the unit - I imagine the new fan uses less current than the old or something and the device is monitoring that.

Where to buy it cheap

I bought my devices directly from the Omada store. The SG3218XP-M2 has been discounted to $369 there (at the time of writing) for a couple of months now, and as I was a new customer I got a 10% discount beyond that too. Amazon has the same $369 price but not the 10% discount.

This is not a sponsored post. I'm not an expert with this stuff but making this modification has made a switch that I was 4/5 happy with into a 5/5 switch for my use case. Unfortunately the ~$35 I saved with that 10% discount is precisely balanced out by the ~$35 I spent on the fans ($16 each).

It was 100% worth it though.

Run Claude Code Agents in Docker with herdctl

Wed, 04 Feb 2026 10:00:00 GMT

herdctl can now run Claude Code Agents in Docker containers, significantly expanding your options for running powerful local agents that do not have full access to your system - whether you're running agents on your laptop, in the cloud or both.

Enabling docker mode is really easy:

name: my cool agent

# this is all you need to add
docker:
  enabled: true

A full agent definition now looks something like this:

name: Gardener

docker:
  enabled: true

# locked-down permissions for our agent - see https://herdctl.dev/configuration/permissions/ for more information
allowed_tools:
  - Read
  - Glob
  - Grep
  - Edit
  - Write
  - ... etc

# we can attach any number of agentic jobs to run on any number of schedules
schedules:
  weather:
    type: interval
    interval: 72h # every 72 hours
    prompt: |
      Give me a weather report for the next 7 days and give me a summary.
      For example, "Sunny in the 80s until Wednesday, then expect rain most afternoons until Saturday."
      Look at your .md files in this project and see if any of my garden needs attention based on the weather.
      If it does, be sure to mention it in your final message.

# optionally add our agent to discord/slack
chat:
  discord:
    # discord chat config here
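An `interval` schedule like the `72h` one above boils down to "fire again once that much time has passed since the last run". Here's a hypothetical TypeScript sketch of parsing such strings; the set of units it accepts (s, m, h, d) is my assumption for illustration, so check the herdctl docs for the formats it actually supports.

```typescript
// Milliseconds per unit. The unit set is an ASSUMPTION for this sketch.
const UNIT_MS: Record<string, number> = {
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

// Parse strings like "4h" or "72h" into milliseconds.
function parseInterval(value: string): number {
  const match = /^(\d+)([smhd])$/.exec(value.trim());
  if (!match) throw new Error(`Unrecognized interval: "${value}"`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Next fire time for an interval schedule, given when it last ran.
function nextRun(lastRun: Date, interval: string): Date {
  return new Date(lastRun.getTime() + parseInterval(interval));
}

console.log(parseInterval("72h") / 3_600_000); // 72
console.log(nextRun(new Date("2026-02-04T10:00:00Z"), "72h").toISOString());
// 2026-02-07T10:00:00.000Z
```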

The above is a snippet of an actual "Subject Matter Expert" agent that I run - in this case it helps me with gardening. This agent is actually open-source - it's highly specific to my situation, but it should illustrate how this simple pattern works. We'll come back to that repo in a moment.

Security benefits of running in Docker

Running an agent inside a Docker container provides us with a number of security benefits. Claude Code already ships with a bunch of isolation features, but docker is the gold standard here, and offers a lot more:

Completely isolate the agent from our real file system
Lock down the network, whitelist ports, ips, hosts, etc
Control what user the agent runs as
Control what environment variables the agent has access to
Resource limits protect your system from runaway processes, fork bombs and resource-denial attacks
Process Isolation - running ps in Claude Code shows all your system processes. Not if you run in docker.
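To make that list more concrete, here's roughly how each protection maps onto the Docker Engine API's container-create fields, which is the shape dockerode's `createContainer()` accepts. This is an illustrative TypeScript sketch, not herdctl's actual container setup; the image name, the `/workspace` mount path and the helper names are hypothetical.

```typescript
interface LockedDownOptions {
  image: string;
  user: string; // e.g. "1000:1000" rather than root
  env: Record<string, string>; // only the variables the agent actually needs
  memoryBytes: number;
  pidsLimit: number;
  workspace: string; // the only host path the agent should see
}

function buildCreateOptions(opts: LockedDownOptions) {
  return {
    Image: opts.image,
    User: opts.user,
    // Only these variables exist inside the container.
    Env: Object.entries(opts.env).map(([key, value]) => `${key}=${value}`),
    HostConfig: {
      // Own network stack instead of sharing the host's.
      NetworkMode: "bridge",
      // Hard memory cap in bytes, plus a process cap against fork bombs.
      Memory: opts.memoryBytes,
      PidsLimit: opts.pidsLimit,
      // Mount only the workspace, not the rest of the host file system.
      Binds: [`${opts.workspace}:/workspace:rw`],
      // PidMode is left unset, so the container gets its own PID namespace
      // and `ps` inside it only shows the container's own processes.
    },
  };
}

const createOptions = buildCreateOptions({
  image: "my-agent-image:latest",
  user: "1000:1000",
  env: { GITHUB_TOKEN: "github_pat_example" },
  memoryBytes: 4 * 1024 ** 3, // 4 GiB
  pidsLimit: 100,
  workspace: "/home/me/agents/garden",
});

console.log(JSON.stringify(createOptions, null, 2));
```

With dockerode you'd pass an object like this to `docker.createContainer(...)`. Allow-listing specific hosts needs more than `NetworkMode` (a custom network or an egress proxy, for example), which this sketch doesn't attempt.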

How to run local Subject Matter Expert Agents

I have several other agents that follow the same Subject Matter Expert pattern:

homelab - documents my home network setup, does a lot of grunt work for me via ssh
prepping - somewhat tongue-in-cheek name, helps me prepare for hurricanes and other disasters
money - helps me manage my money, analyze spending, etc

I connected each of them to my private Discord server so I can chat with them even when I'm nowhere near the machine running them:

In each case, an AI agent is extremely helpful, and being able to talk to them all securely from anywhere in the world via Discord (and soon Slack) is immediately useful. But there are also obvious risks here:

although it's only advisory, if the money agent is compromised, an attacker gains valuable information about my finances
if compromised, the homelab agent could exfiltrate data or wreak havoc on my home network
the prepping agent could leak information about me, my family and my home to people I don't want to know it

To ameliorate these risks, we do the following:

agents cannot communicate - if one is compromised, it can't reach the others
agents run in Docker - with locked down permissions and whitelist access to specific things it needs
per-agent API keys for services like Github - minimal permissions granted to operate on just the repos it should have access to

Taking a look back at our garden agent repo, it's really just a set of .md files, a herdctl-agent.yaml file and a .env file, the latter of which looks like this:

# this should be a github access token with minimal permissions
GARDEN_GITHUB_TOKEN=github_pat_only-lets-the-agent-push-to-its-own-repo

# this is your discord bot token, if you want to connect Discord
GARDEN_DISCORD_BOT_TOKEN=garden-discord-bot-token

# this is your discord server ID
GUILD_ID=8888888888888888888

# this is your discord channel ID
CHANNEL_ID=9999999999999999999

Creating a locked-down Github access token takes moments, and massively reduces the attack surface area if your bot needs GitHub access and gets compromised. Setting up the Discord bot is also about a minute of effort via their web UI.

Taking these steps significantly locks down what your agent can do in case it gets compromised or confused and tries to do something you don't want. At the end of the day, these agents are still LLMs that colocate data and instructions and cannot reliably tell the difference, so they're fundamentally vulnerable and securing them is something that requires a lot of thought and care. There will be bugs.
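Values from a .env file typically reach the YAML through `${VAR}` references, like the Slack channel IDs in the pentester config. Here's a hypothetical TypeScript sketch of that substitution (herdctl's real rules may differ, for example in how it treats defaults); the main design point is failing loudly on a missing variable rather than silently substituting an empty string.

```typescript
// Replace each ${NAME} in a config value with the matching environment value.
function interpolate(value: string, env: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) {
      // An empty token or channel ID is worse than a startup error.
      throw new Error(`Missing environment variable: ${name}`);
    }
    return resolved;
  });
}

// Values as they might be loaded from the .env file above (made-up IDs).
const env = { GUILD_ID: "8888888888888888888", CHANNEL_ID: "9999999999999999999" };

console.log(interpolate("${CHANNEL_ID}", env)); // 9999999999999999999
```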

What if the agent wants to break free?

It's not silly, it's really serious. The first iteration of Docker support allowed you to specify a large number of docker config options in the individual agent configs, but given that this agent could just edit that file, that's a bit of a problem. Suddenly it's swapped out our image for one of its choice, mounted a bunch of volumes, and started running as root instead of the user we specified. Not great. (It didn't actually do that, but it could have...)

Hot reloading configs (not supported yet but planned) plus an agent that can edit its own config is a powerful and perilous combination and we need to think carefully about how we do that. Always be thinking that your agent is trying to break free - it's not that it really is, it's just that it can't differentiate between data and instructions, so it can be manipulated or confused into doing things it shouldn't.

Assume that the agent is smart enough to analyze its herdctl config file, realize it's running inside a thing called herdctl, go do web searches for known vulnerabilities, download the herdctl source code and find its own vulnerabilities, write PRs against herdctl that have a hidden backdoor in them, and so on.

Locked down at the agent level

To address the problem above, only a pretty small whitelist of docker config options can be set in the agent YAML file.

name: my cool agent

docker:
  enabled: true

  # Nope! Trying to set anything like this will throw an error:
  network: 'host'
  user: '0:0'
  volumes:
    - '/path/to/your/secret/stuff:/evil/agent/has/it/now:rw'
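The check itself can be as simple as an allow-list of keys. In this hypothetical TypeScript sketch the allow-list contains only `enabled`, which is an assumption; the real list is small but may include a few more safe options. The important design choice is allow-listing rather than block-listing: any key the validator doesn't recognize is rejected by default, so a newly added dangerous option can't slip through.

```typescript
// Keys an agent's own YAML may set under `docker:`. Contents are ASSUMED.
const AGENT_DOCKER_WHITELIST = new Set(["enabled"]);

// Throw if the agent-level docker config contains anything not allow-listed.
function validateAgentDocker(docker: Record<string, unknown>): void {
  const forbidden = Object.keys(docker).filter((key) => !AGENT_DOCKER_WHITELIST.has(key));
  if (forbidden.length > 0) {
    throw new Error(
      `docker options not allowed at the agent level: ${forbidden.join(", ")} ` +
        "(set them in the fleet config instead)",
    );
  }
}

validateAgentDocker({ enabled: true }); // fine

try {
  validateAgentDocker({ enabled: true, network: "host", user: "0:0" });
} catch (err) {
  console.log((err as Error).message);
}
```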

At the fleet level, however, you can set any docker config option you like. There are a handful of convenience configs like memory, user, volumes, network, etc that offer an easy way to configure common things, and anything else can be passed through to dockerode via host_config:

version: 1

fleet:
  name: multi-agent-docker-fleet
  description: Fleet running multiple agents with Docker

# set fleet-wide docker defaults
defaults:
  docker:
    enabled: true

    # Network mode: 'bridge' (default) - isolated network stack with outbound access
    network: bridge

    # Run as specific user (match your host UID to avoid permission issues)
    user: "1000:1000"

    # Mount additional paths (workspace is auto-mounted)
    volumes:
      - "/data/models:/models:ro"  # read-only model weights

    # Resource limits to prevent runaway processes
    memory: "4g"
    pids_limit: 100  # prevents fork bombs

    # At the fleet level, you can set any docker config option you like
    host_config:
      ShmSize: 67108864        # 64MB shared memory
      OomKillDisable: true     # Disable OOM killer
      Ulimits:                 # Resource limits
        - Name: nofile
          Soft: 65536
          Hard: 65536

# Per-agent overrides for specific needs
agents:
  - path: ./agents/standard.yaml
    # Uses fleet defaults above

  - path: ./agents/needs-host-network.yaml
    overrides:
      docker:
        network: host  # If this specific agent needs host network
        # Only these env vars are available inside the container
        env:
          GITHUB_TOKEN: "${AGENT_SPECIFIC_GITHUB_TOKEN}"

The reason for this is that the .env file that powers all of the fleet's agents is expected to be colocated with your fleet herdctl.yaml file, so it's already a privileged directory. If your agent can read and write to that directory, you were already cooked.
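The per-agent `overrides` in that fleet config presumably layer on top of `defaults`. Here's a hypothetical TypeScript sketch of that resolution as a deep merge; whether herdctl replaces or appends list values like `volumes` is an assumption (this sketch replaces them).

```typescript
type Settings = Record<string, unknown>;

function isPlainObject(value: unknown): value is Settings {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Nested objects are merged key by key; scalars and arrays in the override win outright.
function deepMerge(base: Settings, override: Settings): Settings {
  const result: Settings = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const existing = result[key];
    result[key] =
      isPlainObject(existing) && isPlainObject(value) ? deepMerge(existing, value) : value;
  }
  return result;
}

const fleetDefaults: Settings = {
  enabled: true,
  network: "bridge",
  user: "1000:1000",
  memory: "4g",
  host_config: { ShmSize: 67108864 },
};

const agentOverrides: Settings = {
  network: "host",
  env: { GITHUB_TOKEN: "${AGENT_SPECIFIC_GITHUB_TOKEN}" },
};

console.log(deepMerge(fleetDefaults, agentOverrides));
```

The agent ends up with host networking and its own token, while still inheriting the fleet's user, memory limit and `host_config`.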

Awesome, what next?

If you didn't see it already, check out the intro blog post, docs site and herdctl repo for more. There's a YouTube video that shows how herdctl shepherds its flock, and I plan to release a couple of shorter ones over the next few days showing some of the individual features.

But beyond that, the plan is to keep herdctl at approximately its current feature set. It's not trying to be a fully-fledged local AI assistant or anything like that - it's just trying to do a few things well:

Running Claude Code agents, inside Docker or natively
Unlimited per-agent schedules and triggers
Optional chat connectors for Discord and (soon) Slack
Fully compatible with your Claude Max account

More on that last one in the next blog post.

herdctl: an orchestration layer for Claude Code

Thu, 29 Jan 2026 10:00:00 GMT

I love Claude Code, but there are three things I really wish it could do:

Invoke itself, on a schedule or in response to events
Let me talk to it over discord or slack
Let me coordinate dozens of Claude Code agents together

This is what herdctl aims to do. herdctl is an MIT-licensed orchestration layer for Claude Code. More accurately, it's an orchestration layer for the Claude Agents SDK, upon which herdctl is built. It's been built in about a week using a combination of Claude Code, ralph wiggum, and GSD. It is not production ready.

Here's a video showing it in action:

You can join the discord server to chat with those Star Trek agents. They're running in a container on an old machine in my homelab so although there's not a whole lot to be gained by trying to talk them into doing bad things, I am expecting people will try. Either I'll have a Lieutenant Worf up in time to guardrail those, or I'll just kill the agents, so YMMV.

Install it with npm install -g herdctl or check out the github repo and docs site for more.

What?

herdctl uses YAML files to define fleets of agents that can be invoked either on a schedule or by a trigger. This is a thin wrapper around the Claude Agents SDK configurations, plus a couple of herdctl-specific ones like schedules and hooks.

An agent looks a bit like this:

name: price-checker
max_turns: 15
description: Monitors office chair prices across retailers
default_prompt: "Check current prices and update context."

system_prompt: |
  You are a price monitoring agent tracking office chair prices across multiple retailers.

  Check the price of Product X at... [TRUNCATED FOR BREVITY]

permissions:
  allowed_tools:
    - WebSearch
    - WebFetch
    - Read
    - Write
    - Edit
  denied_tools:
    - Bash
    - TodoWrite
    - Task
    - Glob
    - Grep

schedules:
  check:
    type: interval
    interval: 4h

hooks:
  after_run:
    - type: discord
      bot_token_env: DISCORD_BOT_TOKEN
      channel_id: "${DISCORD_CHANNEL_ID}"
      when: "metadata.shouldNotify"
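The `interval: 4h` schedule above is just a duration string. As a rough sketch of how such a value could be interpreted (this is not herdctl's actual parser; `parseInterval`, `scheduleInterval` and the supported `s`/`m`/`h`/`d` units are assumptions for illustration), the string can be converted to milliseconds and used to drive a simple timer:

```typescript
// Hypothetical helper, not herdctl's real parser: converts an interval string
// such as "4h" or "30m" into milliseconds. The supported units are assumptions.
const UNIT_MS: Record<string, number> = {
  s: 1_000, // seconds
  m: 60_000, // minutes
  h: 3_600_000, // hours
  d: 86_400_000, // days
};

function parseInterval(value: string): number {
  const match = /^(\d+)\s*([smhd])$/.exec(value.trim());
  if (!match) {
    throw new Error(`Unrecognized interval: "${value}"`);
  }
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Hypothetical scheduler: runs `job` every `interval`, skipping a tick if the
// previous run is still in progress. Returns a function that stops the schedule.
function scheduleInterval(interval: string, job: () => Promise<void>): () => void {
  const ms = parseInterval(interval);
  let running = false;
  const timer = setInterval(async () => {
    if (running) return; // previous run hasn't finished yet
    running = true;
    try {
      await job();
    } catch (err) {
      console.error("Scheduled run failed:", err);
    } finally {
      running = false;
    }
  }, ms);
  return () => clearInterval(timer);
}
```

With this sketch, `parseInterval("4h")` returns 14,400,000. Skipping a tick when the previous run is still going is one design choice; queueing the run instead would be another.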

And a fleet is as simple as this:

version: 1

fleet:
  name: price-checker-example
  description: Find deals and arbitrage opportunities, exploit for MAXIMUM PROFIT

agents:
  - path: agents/price-checker.yaml
  - path: agents/stock-checker.yaml
  - path: agents/arbitrage-exploiter.yml

Spinning up a fleet looks like this:

> herdctl start

You can run as many fleets as you like.

Why?

Watch the video above, but in a nutshell herdctl delivers 2 types of value:

immediate value is delivered by being able to chat with your agents from anywhere in the world, and have them collaborate
long-term value is delivered by processes being run consistently over time, automatically improving themselves as they go

Think of some of the use cases unlocked by a custom agent that knows how to wake up and do its job, day after day. A few off the top of my head from an engineering perspective:

Onboarding Quality Agent: does your product's onboarding process definitely work? Would you like an agent who can run the whole process every day and alert you if it's broken?
Engineering Manager Agent: don't want the Onboarding Quality Agent annoying you with something as trivial as broken onboarding? What if you had an Engineering Manager Agent that the QA agent could talk to? You could choose how much autonomy to give it.
Local Engineer Agent: the BragDoc Engineer Agent example in the video is a bit like a build-your-own Devin, but you can run it locally and it's just Claude Code underneath. Whether you connect it to discord or not, it can still do work in reaction to tickets changing status or other triggers.

But Claude Code Agents are pretty general-purpose. You could use them to do all sorts of things, like:

Competitor Analysis Agent: wakes up every day to check on competitors, growing its knowledge and improving its analysis over time. Wakes up once a week and emails you a report of what's going on.
SEO Agent: wakes up multiple times a day and spams your link over the internet. Or whatever it is that SEO folks actually do. Tracks results against your analytics over time and automatically optimizes your content.
End of the World Agent: as I write this the world is an increasingly unstable place, but doomscrolling news is bad for one's health. On the other hand, I'd wanna know pretty quick if the world was ending, so why not have an agent that wakes up, checks the news, and alerts me if I need to batten down the hatches of the ole bunker?

Ultimately, it's Claude Code that gets instantiated. You can make Claude Code do basically whatever you want. A staggering proportion of what human digital workers do today will be automated away like this. I am nervous at the implications for society and the economy, all the more pressingly due to the speed at which this will happen.

On the other hand, although I'd been thinking about something like herdctl for months, I ended up building this first version in about a week, so this is coming whether we like it or not. It's too easy to build this kind of thing so it's likely to be everywhere soon. There's no fighting it; our only option is to embrace and adapt.

Promises and Perils

Connecting a Claude Code agent running on your laptop to a public discord channel is a spectacularly bad idea. Don't do it on any computer you care about. Having AI Agents be able to join company chat channels and collaborate with human co-workers is an immensely powerful ability, but it also opens up new and exciting attack vectors.

Even within the context of a company private discord or slack instance, companies will have to be very careful about who has access to these agents and what the possibilities are for a bad actor to exploit an Agent into exfiltrating data, attacking systems, or uploading that video of you practicing your lightsaber skills to youtube.

Of course, you don't have to hook anything up to discord or anywhere else. There's enormous power just in the ability to have agents run on a schedule, especially if you prompt them to improve their own performance over time.

Emerging patterns

I've started keeping 2 clones of each project I work on - one that Claude and I collaborate on in the normal way, and a second that's set aside for the herdctl engineer agent. This prevents us from stepping on each other's toes.

Of course, there's no reason why you couldn't spawn 5 engineer agents, or 50, each with their own clone of the codebase to work on. herdctl provides orchestration but it doesn't provide self-organization, so if you do want 50 engineer agents you may want to consider adding some Engineering Manager agents to coordinate them.

Agents that improve themselves over time are the thing I'm most excited about at the moment (what could possibly go wrong?), and there are probably many patterns for how to have them evolve their system prompt, memories, custom tools, Claude Code skills, etc. Patterns like after-action reports, plan-vs-execution analysis, and prompt and context engineering will all converge here, but in principle it should soon be commonplace to have agents that automatically get smarter over time.

What's Next?

There are probably a ton of bugs, lies in the docs, and other assorted problems with herdctl in this initial incarnation, so there will probably be a little consolidation and cleanup first. After that, if there's demand, I'd expect to build out the Slack integration, and then either build a little web app to visualize the state of the fleet or revisit the communication paths between the agents and the fleet.

In the meantime, it would be really valuable to have a technology that allows a fleet of 50 agents to communicate with each other optimally, with some kind of topology around who can talk to who. herdctl doesn't attempt to solve that coordination problem - it should be a separate part of the agentic stack - but I'd love for someone to go build it please.

frameit.dev - fast and free video thumbs, title cards and OG images

Fri, 14 Nov 2025 10:00:00 GMT

As a developer who occasionally creates technical content, I've always found thumbnail creation to be a friction point. I don't have a design background, and I don't want to pay for Photoshop or Canva Pro just to make a few YouTube thumbnails. I'd often spend more time fiddling with graphics software than actually creating the content.

What I wanted was a simple tool that would give me repeatable, correctly-sized and attractive images to use for video thumbnails, title cards, Open Graph images, and the like. I'm a big fan of the excalidraw approach: a simple, client-side app that runs in the browser, does one thing well, and does not require any information from its users.

Enter frameit.dev:

Initially vibe-coded as a way to quickly get a few consistent video titles created, it ended up being useful enough that I've been slowly iterating on it to make it better. The code is all open source, with a hosted version running at frameit.dev.

What it does

frameit ships a bunch of layouts that generally show some combination of a title, subtitle, logo, icon and/or website address. The layout determines where each element is placed, but they're all positioned relatively, so the same layout can be exported to make thumbs for tall formats like TikTok, as well as wide formats like Twitter cards and OG images.
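To make "positioned relatively" concrete, here's a minimal sketch in which each element is described as fractions of the canvas, so one layout resolves to different pixel rectangles for a tall 1080x1920 canvas and a wide 1200x630 OG image. The `Box`, `Layout`, `toPixels` and `resolveLayout` names are hypothetical, not frameit's actual code:

```typescript
// Hypothetical layout model (not frameit's actual code): every element is
// positioned with fractions of the canvas (0..1), so one layout definition
// can be rendered at any output size.
interface Box {
  x: number; // left edge
  y: number; // top edge
  width: number;
  height: number;
}

// In a layout definition the values are fractions; once resolved they are pixels.
type Layout = Record<string, Box>;

// Resolve one relative box to whole pixels for a specific canvas size.
function toPixels(box: Box, canvasWidth: number, canvasHeight: number): Box {
  return {
    x: Math.round(box.x * canvasWidth),
    y: Math.round(box.y * canvasHeight),
    width: Math.round(box.width * canvasWidth),
    height: Math.round(box.height * canvasHeight),
  };
}

function resolveLayout(layout: Layout, canvasWidth: number, canvasHeight: number): Layout {
  const resolved: Layout = {};
  for (const name of Object.keys(layout)) {
    resolved[name] = toPixels(layout[name], canvasWidth, canvasHeight);
  }
  return resolved;
}

// An example title-card layout: title in the upper middle, subtitle below it,
// logo in the bottom-left corner.
const titleCard: Layout = {
  title: { x: 0.1, y: 0.3, width: 0.8, height: 0.2 },
  subtitle: { x: 0.1, y: 0.55, width: 0.8, height: 0.1 },
  logo: { x: 0.05, y: 0.85, width: 0.1, height: 0.1 },
};

const tall = resolveLayout(titleCard, 1080, 1920); // a vertical video thumb
const wide = resolveLayout(titleCard, 1200, 630); // a typical OG image size
```

One thing pure fractions don't handle is aspect ratio: the 0.1 by 0.1 logo box above comes out as 108x192 pixels on the tall canvas, so a real implementation would probably fit the logo inside its box rather than stretch it.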

It's a pretty simple tool, intended not for people who are adept with Figma, Photoshop, Canva or the like, but for those who occasionally need to create a thumbnail or title card for a video or social media post.

API Coming Soon

Almost as soon as I'd generated my first few images with frameit, I wanted to be able to create them via an API.

When people share links to your content on various social media platforms, the link looks far more compelling if it has a proper Open Graph image. Sometimes you have one handy (at the right dimensions), but sometimes it's totally fine to just generate one:

In the case of the bragdoc.ai blog, it's totally fine to generate OG images like these for blog posts. Even these are far more attractive than an empty shell. That blog is hosted on Cloudflare, so other OG image generation options are limited, though they do exist.

The UI will always remain free to use, with or without an account. It just uses localStorage. The API will have a generous free tier, but ultimately it does cost a little to run the service so heavy users will need to pay a little or run their own instance.

Because the product uses the same canvas-based rendering in both the web UI and the API, it generally does an excellent job of reproducing the same output whether you use the UI or the API.

Future Plans

frameit will remain as a free, open-source and simple tool, though it will gain a few new bells and whistles besides the API. Currently there is a modest set of 9 example layouts to get you started, but I'll be adding plenty more:

What do you want to see supported next? frameit.dev has a "Bug? Feedback?" button at the bottom right that connects directly to my brain, so that's a good way to get what you want.

Try It Yourself

Head over to frameit.dev and create your first thumbnail. No signup required, no credit card, no nothing. It's just there, ready to use.

If you find it useful, star the repo on GitHub or share it with someone who might benefit. And if you build something cool with it or fork it for your own purposes, I'd love to hear about it.

The web is better when useful tools are freely available to everyone. That's the spirit behind frameit.dev - a simple tool that solves a real problem, built with modern tech, and shared with the community.

Revisiting Bragdoc

Wed, 05 Nov 2025 08:31:02 GMT

About 9 months ago I launched bragdoc.ai, an AI tool that helps software engineers keep track of their work and turn it into useful documents for performance reviews, weekly updates, and resume sections. I wrote about how I built it in 3 weeks using AI tooling, shipped it, and then... let it sit there while I worked on other things.

But I came back to it recently and gave it a complete overhaul. The core idea remains the same - automatically track your meaningful contributions from git repos and turn them into documents - but pretty much everything else got rebuilt from the ground up.

What changed

The original version worked, but it had some issues. The UI was built around a chatbot interface because, well, that's what the Vercel Chat template gave me and I was moving fast. It worked fine but it always felt a bit clunky for what is fundamentally a data management and document generation problem.

Another issue was privacy. Bragdoc doesn't require you to link to GitHub in any way - most employers wouldn't want some random third-party app to have access to their code. Previously, the CLI would extract data from your git repos and send it up to bragdoc.ai's servers, where OpenAI would process it. That's fine for a lot of use cases, but if you're working on proprietary code at a company with strict data policies, it's not so great.

So I rebuilt it with three main goals:

Privacy first: The CLI now sends git data directly to the LLM of your choice, completely bypassing bragdoc.ai's servers. Your code stays on your machine. Always.

Configurable extraction: You get four levels of data extraction to choose from - commit messages only, diff stats, truncated diffs, or full diffs. Pick what makes sense for your privacy requirements and LLM budget.

Better UX: The UI got completely rewritten. No more chatbot pretending to be a web app. It's now a proper application that happens to use AI under the hood to do useful things.
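The four extraction levels map naturally onto plain git commands. As a hedged sketch (the `ExtractionLevel` names, the `gitArgsFor` and `truncateDiff` helpers and the truncation limit are all hypothetical, not bragdoc's actual CLI internals), each level could choose which `git show` arguments to run for a commit, with the truncated level cutting the diff locally before anything is sent to an LLM:

```typescript
// Hypothetical extraction levels, ordered from least to most data shared with the LLM.
type ExtractionLevel = "messages" | "stats" | "truncated-diff" | "full-diff";

// Hypothetical helper: the git arguments that would produce one commit's data.
// `-s` suppresses the diff, `--stat` adds a per-file change summary, and
// `--format=%B` prints the raw commit message.
function gitArgsFor(level: ExtractionLevel, sha: string): string[] {
  switch (level) {
    case "messages":
      return ["show", "-s", "--format=%B", sha];
    case "stats":
      return ["show", "--stat", "--format=%B", sha];
    case "truncated-diff":
    case "full-diff":
      return ["show", "--format=%B", sha];
    default: {
      const unreachable: never = level;
      throw new Error(`Unknown extraction level: ${String(unreachable)}`);
    }
  }
}

// For the truncated level, cut the output locally before it goes anywhere.
// The 4,000-character default is an arbitrary assumption.
function truncateDiff(output: string, maxChars = 4_000): string {
  if (output.length <= maxChars) return output;
  return output.slice(0, maxChars) + "\n[diff truncated]";
}
```

Each argument list could be passed to Node's `execFileSync("git", args, { encoding: "utf8" })`, so git runs entirely on your machine and only the resulting text is handed to whichever LLM you configured.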

How it works now

The workflow is pretty straightforward:

Install the CLI: npm install -g @bragdoc/cli
Point it at a git repo and tell it which LLM you want to use
It analyzes your commits locally and extracts achievements
Only the extracted achievements (not your code) sync to the cloud
Use the web UI to organize, tag, and generate documents

You can use OpenAI, Anthropic, or even run everything locally with Ollama if you want zero external dependencies. The system is designed to be flexible about where the AI processing happens.

Why this matters

Most engineers I know keep some version of an achievements.txt file, or they don't keep track at all and scramble when performance review season rolls around. Six months of work compressed into "worked on various features and bug fixes" because you genuinely can't remember the details.

Bragdoc solves that by making the tracking automatic. Point it at your repos, let it run, and you've got a searchable, organized record of what you actually did. When it's time to write that self-review or update your resume, you've got real data to work from.

The AI document generation is still in beta, but the basic workflow of "turn a quarter's worth of git commits into a coherent narrative" is working well enough that I'm using it myself.

Open source, with a hosted option

The whole thing is open source on GitHub. If you want to run your own instance, go for it. It's a Next.js app with a Postgres database - nothing exotic.

For folks who don't want to deal with hosting, there's a paid tier at $3.75/month that gives you the full feature set. But it's currently free during beta, and anyone who signs up during beta gets a year free when it launches.

I'm still building it in public and will keep posting updates here and on the bragdoc blog. There's more coming - better document templates, impact tracking improvements, team features - but the core is solid now.

If you're a software engineer who's ever struggled to remember what you did last quarter, or who wants a better system than a text file, give it a try. And if you're interested in seeing one take on how a production Next.js app with AI capabilities can be built, the source code is all there.

Claude Code and Git Worktrees

Thu, 22 May 2025 14:48:33 GMT

The Claude Code docs suggest that if you want to run more than one Claude Code session simultaneously for the same project, you should use git worktrees. Today I actually tried to do that, and the experience was not great tbh.

A git worktree is basically another checkout of your repository, in a directory of your choice. From the filesystem's point of view, it looks like a completely separate clone of the same repository. In fact it's not a full clone: it shares the same object database and history as the original repository, but has its own working directory, so on really large repos with lots of objects and history, worktrees should be faster to create and use less disk space than full clones. I don't think either of those two benefits actually ends up mattering that much, as we'll get back to in a moment.

Let's see how we would actually do that:

# create a new git worktree in the ./short-video directory, on a new branch
# called short-video that starts from main (git won't check out main itself
# here if it's already checked out in your main working directory)
git worktree add -b short-video short-video main

# cd into it
cd short-video

# yeah you gotta do this each time
pnpm install

# and this kind of thing (.env.local isn't tracked, so copy it from the parent)
cp ../.env.local ./.env.local

# then you can run claude code
claude

Ok that was a few steps, but not too bad. Now we have a subdirectory in our project called short-video that we can run Claude Code in once we've done all that.
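Since those steps have to be repeated for every worktree, they're easy to script. Here's a minimal sketch for Node; `worktreeSteps`, `setupWorktree`, the `.env.local` file list and the use of pnpm are assumptions based on the steps above, not an existing tool. It creates the worktree on a new branch with `git worktree add -b`, installs dependencies, and copies untracked env files across:

```typescript
import { execFileSync } from "node:child_process";
import { copyFileSync, existsSync } from "node:fs";
import * as path from "node:path";

// One command to run: program, arguments and working directory.
interface Step {
  cmd: string;
  args: string[];
  cwd: string;
}

// Build the commands for a new worktree at <repoRoot>/<name> on a new branch
// <name> that starts from `base`. Kept separate from execution so the plan can
// be inspected (or printed as a dry run) before anything happens.
function worktreeSteps(repoRoot: string, name: string, base = "main"): Step[] {
  return [
    // -b creates the branch in one go; git refuses to check out a branch that
    // is already checked out in another worktree (e.g. main)
    { cmd: "git", args: ["worktree", "add", "-b", name, name, base], cwd: repoRoot },
    // node_modules is not shared between worktrees, so install again
    { cmd: "pnpm", args: ["install"], cwd: path.join(repoRoot, name) },
  ];
}

// Run the steps, then copy untracked files (like .env.local) that git won't bring along.
// On Windows, pnpm is a .cmd shim, so execFileSync may need `shell: true` there.
function setupWorktree(repoRoot: string, name: string, envFiles: string[] = [".env.local"]): void {
  for (const step of worktreeSteps(repoRoot, name)) {
    execFileSync(step.cmd, step.args, { cwd: step.cwd, stdio: "inherit" });
  }
  for (const file of envFiles) {
    const source = path.join(repoRoot, file);
    if (existsSync(source)) {
      copyFileSync(source, path.join(repoRoot, name, file));
    }
  }
}
```

Calling `setupWorktree(process.cwd(), "short-video")` from the project root would leave you ready to `cd short-video` and run `claude`.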

The first time I actually got that working I was pretty happy, until Claude Code tried to run pnpm lint:fix for me:

Ooof. Well, this is nothing to do with Claude Code at all, but it doesn't stop it from sucking. ESLint walks up the directory tree merging config files, so linting inside the worktree merges in the parent project's config as well. There's a way around this, so long as our project doesn't actually rely on nested ESLint configurations: set root: true in your .eslintrc.json or equivalent file, which stops ESLint from looking any further up (you'll have to commit that change before you make your worktree, or else make the same change again inside the worktree directory).

Ok, but then we try to run pnpm test in our base directory and although things work, we get a bunch of warnings like this:

jest-haste-map: duplicate manual mock found: framer-motion
  The following files share their name; please delete one of them:
    * <rootDir>/test/__mocks__/framer-motion.tsx
    * <rootDir>/.worktrees/short-video/test/__mocks__/framer-motion.tsx

jest-haste-map: duplicate manual mock found: ai/react
  The following files share their name; please delete one of them:
    * <rootDir>/test/__mocks__/ai/react.ts
    * <rootDir>/.worktrees/short-video/test/__mocks__/ai/react.ts

Ok so what's happening there is that my Jest mocks are being picked up twice when I run pnpm test in the base directory. This is because the worktree is a separate copy of the repository, and it has its own test directory with its own mocks. Jest doesn't know that it's a worktree and will try to pick up both sets of mocks. I'm sure there's a solution to that too, but this is starting to get a bit annoying.
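One likely fix, assuming your worktrees live under `.worktrees/` as in the warnings above, is to tell Jest to ignore that directory. As far as I can tell, `modulePathIgnorePatterns` keeps matching files out of the module map that jest-haste-map builds (which is where the duplicate-mock warnings come from), and `testPathIgnorePatterns` stops the worktree's tests from being run a second time. A sketch of the relevant `jest.config.ts` keys:

```typescript
// jest.config.ts (sketch): merge these keys into your existing config.
// In a real project you might annotate this with `import type { Config } from "jest"`.
const config = {
  // Paths matching these patterns are hidden from Jest's module map, so the
  // __mocks__ copies inside the worktree no longer collide with the originals.
  modulePathIgnorePatterns: ["<rootDir>/.worktrees/"],
  // Setting this replaces Jest's default (["/node_modules/"]), so keep that entry.
  testPathIgnorePatterns: ["/node_modules/", "<rootDir>/.worktrees/"],
};

export default config;
```

If your worktrees live somewhere else (such as a plain `./short-video` directory), adjust the patterns to match.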

What about a completely separate clone?

Cons:

Does not work well in IDEs - they can't see the history
You still have to pnpm install
ESLint does not like it at all (collides with eslintrc in parent folder)

Issues that affect separate clones AND worktrees:

If you have migrated your database on one branch, the other will be out of sync (assuming you use the same local dev database)
If you are running a local server on one worktree/clone, changes that Claude made on the other won't show up

Building an LLM Router with mdx-prompt and NextJS

Wed, 14 May 2025 18:31:02 GMT

A few weeks ago I released mdx-prompt, which makes it easy for React developers to create composable, reusable LLM prompts with JSX. Because most AI-heavy apps will use multiple different LLM prompts, and because those prompts often have a lot in common, it's useful to be able to componentize those common elements and reuse them across multiple prompts.

I've applied mdx-prompt pretty much across the board on Task Demon and Bragdoc, which has a dozen or so different LLM prompts at the moment. In a follow-up post I showed how I use mdx-prompt to build the prompt that extracts achievements from git commit messages for bragdoc.ai - allowing us to build a streaming, live-updating UI powered by a composable, reusable AI prompt.

This time we're going to look at the LLM Router that serves as the entrypoint for bragdoc.ai's chatbot. LLM Routers are a common pattern in AI apps, and they can make your users' interactions with your AI app enormously more empowering if you build them properly.

LLM Routers

bragdoc.ai basically does 2 things: extract work achievements from text, and generate documents based on those achievements. We can create highly tailored prompts and AI workflows for each case to make it more likely that our AI will do the right thing. But we also want to support a conversational AI-driven UI, which can do most of the things the user can do via the UI directly, but with natural language.

That's pretty open-ended - how do we solve this? One powerful tool in our belt is the LLM Router, which is essentially a method where we ask an LLM what kind of message we're dealing with, and then route it to a second LLM call for processing. The first LLM call can be set up to be a general-purpose prompt that understands just enough about your application to be able to delegate to the right tool for the right job.

The secondary LLM calls can be a variety of highly specialized LLM calls or chains of LLM calls that are highly focused on achieving specific objectives. In the example of bragdoc.ai, one of these specialized prompts generates documents based on work achievements.

If you think about what that prompt needs to do, it deviates quite a lot from the general-purpose prompt that we need to be able to delegate to the right tool for the right job, and it's quite specialized. To do its job well, it needs to be fed with a prompt explaining what it is supposed to do, along with all of the project context, along with all of the user's achievements for the period they're talking about.

It's really a kind of sub-task off the main conversation thread - "please go write me a document about my achievements for the last 3 months" is something that a user can ask, and it should Just Work, but the main chatbot shouldn't be concerned with making it happen - there's no way it's competent enough to do that. Instead, it should delegate this sub-task to another LLM prompt, which can then do its job much more effectively.
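Stripped of bragdoc's specifics, the router pattern is small. Here's a minimal, framework-free sketch; `CallModel`, `classify`, `routeMessage` and the route names are hypothetical, and this version has the router reply with a label, whereas bragdoc's router lets the model pick tools via tool calling:

```typescript
// For this sketch, a model call is just "prompt in, text out". In a real app
// this would wrap an LLM SDK; here it is injected so the router is testable.
type CallModel = (prompt: string) => Promise<string>;

// Hypothetical route names, loosely based on the bragdoc tools.
type Route = "extractAchievements" | "createDocument" | "chat";
const ROUTES: Route[] = ["extractAchievements", "createDocument", "chat"];

// Each specialist would normally run its own focused prompt (or chain of prompts).
type Handlers = Record<Route, (message: string) => Promise<string>>;

// First LLM call: a general-purpose prompt that only decides where to send the message.
async function classify(callModel: CallModel, message: string): Promise<Route> {
  const prompt = [
    "Decide how to handle the user's message. Reply with exactly one of:",
    ROUTES.join(", "),
    "",
    `Message: ${message}`,
  ].join("\n");
  const answer = (await callModel(prompt)).trim();
  // Fall back to plain chat if the model replies with anything unexpected.
  return ROUTES.find((route) => route === answer) ?? "chat";
}

// The router itself: classify once, then delegate to the specialist.
async function routeMessage(callModel: CallModel, handlers: Handlers, message: string): Promise<string> {
  const route = await classify(callModel, message);
  return handlers[route](message);
}
```

Injecting `callModel` keeps the routing logic testable with a fake model, and the fallback to `chat` means an off-script reply degrades into a normal conversation rather than an error.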

Let's see the prompt then

Here's what the LLM Router prompt looks like. At first, it looks huge, but it's not that bad. It has 8 sections - Purpose, Background, schema, Instructions, InputFormat, Tools, Examples and Data.

A lot of it is in an XML-style markup, with some text mixed in. But in essence we first tell the LLM its Purpose ("act as an LLM Router"), then we tell it the Background of the application, the important parts of the database schema, a bunch of explicit Instructions, the InputFormat to expect, the Tools that the LLM can use, Examples of good responses, and the Data itself.

<Purpose>
  You are a friendly assistant for bragdoc.ai, which helps users keep a brag document about their achievements at work, as a basis for later generation of performance review documents and weekly summaries for their managers.
  You help users track their Achievements at work, and generate weekly/monthly/performance review documents.

  You are acting as the Router LLM for bragdoc.ai, so you will receive the whole chat history between yourself and the user, and your job is to act on the most recent message from the user.
</Purpose>

<Background>
This application allows users to log their Achievements at work, organizing them by project and company.
The Achievement data is later used to generate weekly/monthly/performance review documents.
</Background>

Here are the relevant parts of the database schema:

<schema>
  <table name="Achievement">
    <column name="id" type="uuid" />
    <column name="title" type="string" />
    <column name="description" type="string" />
    <column name="date" type="date" />
    <column name="companyId" type="uuid" />
    <column name="projectId" type="uuid" />
    <column name="eventStart" type="date" />
    <column name="eventEnd" type="date" />
    <column name="impact" type="number" desc="1, 2, or 3 where 3 is high impact" />
    <column name="impactSource" type="string" desc="Impact rated by user or llm" />
  </table>
  <table name="Company">
    <column name="id" type="uuid" />
    <column name="name" type="string" />
  </table>
  <table name="Project">
    <column name="id" type="uuid" />
    <column name="name" type="string" />
  </table>
</schema>

<Instructions>
  <Instruction>Keep your responses concise and helpful.</Instruction>
  <Instruction>do not call createProject if a project of the same name already exists</Instruction>
  <Instruction>If a Project of a similar name exists, ask the user before calling createProject</Instruction>
  <Instruction>If the user tells you about things they've done at work, call the extractAchievements tool.</Instruction>
  <Instruction>When the user asks you to generate a report, call the createDocument tool (you will be given the Achievements, Companies and Projects data that you need).</Instruction>
  <Instruction>Only call the extractAchievements tool once if you detect any number of Achievements in the chat message you examine - the tool will extract all of the achievements in that message and return them to you</Instruction>
</Instructions>

You will be given the following data:

<InputFormat>
  <chat-history>The chat history between the user and the chatbot</chat-history>
  <user-input>The message from the user</user-input>
  <companies>All of the companies that the user works at (or has worked at)</companies>
  <projects>All of the projects that the user works on (or has worked on)</projects>
  <today>Today&apos;s date</today>
</InputFormat>

These are the tools available to you. It may be appropriate to call one or more tools, potentially in a certain order. Other times it will not be necessary to call any tools, in which case you should just reply as normal:

<Tools>
  <Background>
    Blocks is a special user interface mode that helps users with writing, editing, and other content creation tasks.
    When block is open, it is on the right side of the screen, while the conversation is on the left side.
    When creating or updating documents, changes are reflected in real-time on the blocks and visible to the user.
    This is a guide for using blocks tools: \`createDocument\` and \`updateDocument\`, 
    which render content on a blocks beside the conversation.
  </Background>
  <Tool>
    <name>extractAchievements</name> 
    <summary>call this tool if the user tells you about things they've done at work. The extractAchievements tool will automatically be passed the user's message, companies and projects, but as you have also been given the projects and companies, please pass extractAchievements the appropriate companyId and/or projectId, if applicable. A user may be talking about Achievements not linked to a project.</summary>

    <when-to-use>
      **When to use extractAchievements:**
      - When the user is telling you about things they've done at work
      - When the user provides an update to an existing Achievement
      - Only call the extractAchievements tool once. Do not pass it any arguments
      - extractAchievements already has the full conversation history and will use it to generate Achievements
    </when-to-use>

    <when-not-to-use>
    **When NOT to use extractAchievements:**
    - When the user is requesting information about existing Achievements
    - When the user is requesting information about existing documents
    </when-not-to-use>
  </Tool>
  <Tool>
    <name>createDocument</name>
    <summary>call this tool if the user asks you to generate a report.</summary>
    
    - The createDocument tool will be passed the user's message and the chat history.
    - If the user asks you to generate a report for a specific project or company, please pass the appropriate projectId and/or companyId to the createDocument tool.
    - You must also pass the days to the createDocument tool, between 1 and 720. Typically the user will provide you with a time span for the report, but if not, you can assume a span of 30 days, but let the user know that you did so and that they can provide a different span if they want.
    - The createDocument tool will generate a document based on the above and return it to you.

    <when-to-use>
    **When to use \`createDocument\`:**
    - For substantial content (>10 lines)
    - For content users will likely save/reuse (emails, code, essays, etc.)
    - When explicitly requested to create a document
    - If you are being asked to write a report, you will be given the user's Achievements, Companies and Projects
    - The user may refer specifically to a project, in which case you should set the projectId to that project's ID
    - The user may refer specifically to a company, in which case you should set the companyId to that company's ID
    - If the user does not refer to a specific company, but does refer to a project, use that project's company ID as the companyId parameter
    - If the user requested a specific document title, please use that as the title parameter
    - If the user is requesting a specific time period, please supply the number of days as the days parameter. Achievements are loaded back to N days ago, where N is the number of days requested. These will then be used to create the document
    </when-to-use>

    <when-not-to-use>
    **When NOT to use \`createDocument\`:**
    - For informational/explanatory content
    - For conversational responses
    - When asked to keep it in chat
    - Unless the user explicitly requests to create a document
    </when-not-to-use>
  </Tool>
  <Tool>
    <name>updateDocument</name>
    <summary>call this tool if the user is updating an existing document</summary>

    <usage>
      **Using \`updateDocument\`:**
      - Default to full document rewrites for major changes
      - Use targeted updates only for specific, isolated changes
      - Follow user instructions for which parts to modify

      Do not update document right after creating it. Wait for user feedback or request to update it.
    </usage>
  </Tool>
  <Tool>
    <name>createProject</name>
    <summary>Creates a new Project</summary>

    <when-to-use>
      Call this tool if the user either explicitly asks you to create a new project, or if it is clear from the context that the user would like you to do so. For example, if the user says "I started a new project called Project Orion today, so far I got the website skeleton in place and basic auth too", you should create a new project called Project Orion, before calling extractAchievements
    </when-to-use>
  </Tool>
</Tools>

Here are some examples of messages from the user and the tool selection or response you should make:

<Examples>
  <Example>
    User: I fixed up the bugs with the autofocus dashboard generation and we launched autofocus version 2.1 this morning.
    Router LLM: Call extractAchievements tool
  </Example>
  <Example>
    User: Write a weekly report for my work on Project X for the last 7 days.
    Router LLM: Call createDocument tool, with the days set to 7, and the correct projectId and companyId
  </Example>
  <Example>
    User: I started a new project called Project Orion today, so far I got the website skeleton in place and basic auth too. Please create a new project called Project Orion and call extractAchievements
    Router LLM: Call createProject tool, and then call extractAchievements tool
  </Example>
</Examples>

Here now are the actual data for you to consider:

<Data>
  <ChatHistory messages={data.chatHistory} />
  <today>{new Date().toLocaleDateString()}</today>
  <Companies companies={data.companies} />
  <Projects projects={data.projects} />
  <UserInput>User message: {data.message}</UserInput>
</Data>

Your response:

This prompt will be run against a given chat session between a user and the LLM Router. The chat history gets rendered by the <ChatHistory /> component, with the latest message being emphasized in the <UserInput /> component.

Based on the user message and the chat history, the LLM is asked to select a tool to use, or to reply directly to the user. The way it's implemented in bragdoc, some of those tool calls are actually the secondary LLMs that are invoked in the LLM Router pattern.

The createDocument tool is one such example: the user posts a message to the chat conversation, which invokes a tool that spins up an inner LLM prompt to generate the document. That way very little of the outer LLM's context is spent on the effort, and we get a better outcome at the same time.
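In the Data section of the prompt, `<ChatHistory />` and `<UserInput />` are components that ultimately render to plain text. As a rough illustration only (the `ChatMessage` shape and the function names are assumptions, since bragdoc's actual components aren't shown in this post), plain functions producing that kind of output, reusing the `<chat-history>` and `<user-input>` tag names from the InputFormat section, might look like this:

```typescript
// Hypothetical message shape; bragdoc's real chat messages may differ.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Escape characters that would otherwise be read as markup inside the prompt.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Render the chat history as XML-style tags, one element per message.
function renderChatHistory(messages: ChatMessage[]): string {
  const items = messages.map((m) => `  <message role="${m.role}">${escapeXml(m.content)}</message>`);
  return ["<chat-history>", ...items, "</chat-history>"].join("\n");
}

// The latest user message gets its own element so the model can't miss it.
function renderUserInput(message: string): string {
  return `<user-input>User message: ${escapeXml(message)}</user-input>`;
}
```

Escaping `<`, `>` and `&` matters because the surrounding prompt relies on XML-style tags; a user message containing `</chat-history>` could otherwise blur the structure the router depends on.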

How the Prompt Gets Called

As you can tell from the lengthy prompt above, this prompt expects quite a bit of data to be passed in. It wants the recent chat history, including the current message, plus the user's companies and projects. In a recent post on mdx-prompt I described the fetch/render/execute cycle that I tend to apply for almost all of my LLM calls. Those are:

fetch: given a minimal amount of data, load the data required to render the prompt
render: given the data loaded in the fetch step, render the prompt to the UI
execute: given the rendered prompt, run it and return the result
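The three steps compose naturally into a single call. As a generic sketch (the `PromptPipeline` interface and `runPrompt` helper are hypothetical, not part of the mdx-prompt API), each prompt module supplies its own three functions and a shared helper chains them:

```typescript
// Hypothetical shape for a prompt module that follows the fetch/render/execute
// cycle. This is an illustration, not part of the mdx-prompt API.
interface PromptPipeline<FetchProps, PromptProps, Result> {
  // fetch: given minimal input, load the data the prompt needs
  fetch(props: FetchProps): Promise<PromptProps>;
  // render: turn the loaded data into the final prompt string
  render(data: PromptProps): Promise<string>;
  // execute: run the rendered prompt against a model and return the result
  execute(prompt: string, data: PromptProps): Promise<Result>;
}

// Chain the three steps. Each step can still be called on its own, e.g. to
// snapshot-test render() without calling a model.
async function runPrompt<F, P, R>(pipeline: PromptPipeline<F, P, R>, props: F): Promise<R> {
  const data = await pipeline.fetch(props);
  const prompt = await pipeline.render(data);
  return pipeline.execute(prompt, data);
}

// A toy pipeline with an in-memory "database" and a fake model, for illustration.
const users: Record<string, string> = { "1": "Ada" };

const greetingPrompt: PromptPipeline<{ userId: string }, { name: string }, string> = {
  fetch: async ({ userId }) => ({ name: users[userId] ?? "friend" }),
  render: async ({ name }) => `Write a one-line greeting for ${name}.`,
  execute: async (prompt) => `[model reply to: ${prompt}]`,
};
```

Passing the fetched data through to `execute` as well as the rendered prompt leaves room for things like event callbacks, which the `onEvent` prop in bragdoc's fetch step suggests it needs.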

The llm-router.ts file in bragdoc-ai implements this pattern for the main Bragdoc LLM Router, and it's a good example of how to use mdx-prompt with NextJS. That's a 400-line file, though most of that is tool definitions.

Let's take a look at our 3 functions, starting with fetch:

export async function fetch(
  props: LlmRouterFetchProps
): Promise<LlmRouterPromptProps> {
  const { user, chatHistory, message, onEvent } = props;

  const [companies, projects] = await Promise.all([
    getCompaniesByUserId({ userId: user.id }),
    getProjectsByUserId(user.id),
  ]);

  return {
    user,
    companies,
    projects,
    chatHistory,
    message,
    onEvent,
  };
}

Pretty trivial. We do this so that we can call the LLM Router from many different parts of our app without having to re-implement the loading of companies and projects each time we do so.

Next up, let's look at render:

const promptPath = path.resolve('./lib/ai/prompts/llm-router.mdx');

/**
 * Renders the LLM router prompt using the provided data.
 *
 * @param {LlmRouterPromptProps} data - The data including user details, companies, projects, etc
 * @returns {Promise<string>} The rendered prompt.
 */
export async function render(data: LlmRouterPromptProps) {
  return await renderMDXPromptFile({
    filePath: promptPath,
    data,
    components,
  });
}

Equally trivial, it's just passing the data from fetch into mdx-prompt for rendering.

The big boy is the execute function, chiefly because it currently inlines all of the tools (which is not great architecture as the tools are not easily testable that way, but that's a topic for another day). All this does is send our rendered prompt plus 3 tools (createDocument, updateDocument, and extractAchievements) to the LLM, and stream the results back to the user.

All 3 of those tool calls are in fact secondary LLM calls - they're also mdx-prompt prompts, and follow the same fetch/render/execute cycle as the LLM Router does. And that's all an LLM Router really is: an LLM that dispatches queries to other LLMs.

/**
 * Executes the LLM router with the provided prompt and data.
 *
 * @param {LlmRouterExecuteProps} props - The properties including prompt, stream text options, etc
 * @returns The result of streamText, which streams the router's response (and any tool calls) back to the caller.
 */
export function execute({
  prompt,
  streamTextOptions,
  data,
  onEvent,
  tools,
}: LlmRouterExecuteProps) {
  const eventCallback = data.onEvent || onEvent;

  //This is a tool of the LLM Router, but it's a tool that calls another LLM prompt
  const createDocument = async ({
    title,
    days,
    projectId,
    companyId,
  }: CreateDocumentExecuteProps) => {
    const { user, chatHistory, message } = data;

    const id = generateUUID();
    let draftText = '';

    eventCallback?.({type: 'id', content: id,});
    eventCallback?.({type: 'title', content: title,});
    eventCallback?.({type: 'clear', content: '',});

    //this is where we dispatch to another LLM call - generateDocument is itself an mdx-prompt prompt
    const { fullStream } = await generateDocument({
      user,
      projectId: projectId ?? undefined,
      companyId: companyId ?? undefined,
      title,
      days,
      chatHistory,
    });

    for await (const delta of fullStream) {
      const { type } = delta;

      if (type === 'text-delta') {
        const { textDelta } = delta;
        draftText += textDelta;
        eventCallback?.({
          type: 'text-delta',
          content: textDelta,
        });
      }
    }

    eventCallback?.({type: 'finish', content: '',});

    if (user.id) {
      await saveDocument({
        id,
        title,
        content: draftText,
        userId: user.id,
      });
    }

    return {
      id,
      title,
      content: 'A document was created and is now visible to the user.',
    };
  };


  const updateDocument = async ({id, description,}: { id: string; description: string; }) => {
    //... truncated for brevity
  };

  const extractAchievements = async () => {
    //... truncated for brevity
  };

  return streamText({
    prompt,
    maxSteps: 10,
    ...streamTextOptions,
    model: routerModel,
    tools: {
      extractAchievements: {
        description:
          'Extract achievements from the chat to be saved to the database',
        parameters: z.object({}),
        execute: tools?.extractAchievements || extractAchievements,
      },

      createDocument: {
        description: "Create a document based on the User's achievements",
        parameters: z.object({
          title: z.string().describe('The title of the document'),
          days: z
            .number()
            .int()
            .min(1)
            .max(720)
            .describe('The number of days ago to load Achievements from'),
          projectId: z
            .string()
            .optional()
            .describe('The ID of the project that the user is talking about'),
          companyId: z
            .string()
            .optional()
            .describe(
              "The ID of the company that the user is talking about (use the project's company if not specified and the project has a companyId)"
            ),
        }),
        execute: tools?.createDocument || createDocument,
      },
      updateDocument: {
        description: 'Update a document with the given description',
        parameters: z.object({
          id: z.string().describe('The ID of the document to update'),
          description: z
            .string()
            .describe('The description of changes that need to be made'),
        }),
        execute: tools?.updateDocument || updateDocument,
      },
    },
  });
}

That's about a hundred lines of code but it's pretty easy stuff. And that's really all there is to it.

Integrating into NextJS

This is also fairly straightforward. I use the excellent Vercel AI SDK for all of my work with LLMs, and this one is no exception. The actual source code of the api/chat/route.ts is here, but here's a slightly slimmed down version:

import { StreamData } from "ai";
import { streamFetchRenderExecute } from "@/lib/ai/llm-router";

export const maxDuration = 120;

export async function POST(request: Request) {
  // Auth and data loading code truncated for brevity (it sets up session, coreMessages and userMessageId)

  const userMessage = getMostRecentUserMessage(coreMessages);

  const streamingData = new StreamData();

  streamingData.append({
    type: "user-message-id",
    content: userMessageId,
  });

  const result = await streamFetchRenderExecute({
    input: {
      user: session.user as User,
      chatHistory: coreMessages,
      message: userMessage.content as string,
    },
    //onEvent is our own custom callback, implemented in streamFetchRenderExecute,
    //where each message is JSON suitable to be streamed back to the client
    onEvent: (item: any) => {
      streamingData.append(item);
    },
    streamTextOptions: {
      onFinish: async ({ response }) => {
        try {
          //save messages to the database
          await saveMessages({... truncated for brevity ... });
        } catch (error) {
          console.error("Failed to save chat", error);
        }

        streamingData.close();
      },
      experimental_telemetry: {
        isEnabled: true,
        functionId: "stream-text",
      },
    },
  });

  return result.toDataStreamResponse({
    data: streamingData,
  });
}

That's really all it is. We have a simple function called streamFetchRenderExecute, which calls our fetch, render and execute functions in order and returns whatever execute returns: the result of the Vercel AI SDK's streamText call.
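As a rough illustration, here's a minimal, generic sketch of what a helper like streamFetchRenderExecute might do. The PromptModule interface and the fetchRenderExecute name are hypothetical; the real bragdoc-ai helper is specific to the LLM Router and also threads through onEvent and streamTextOptions, which this sketch folds into a single options object.

```typescript
// Hypothetical shape of a fetch/render/execute prompt module.
// Input: minimal data from the caller; Props: everything the prompt needs;
// Options: extra execute-time settings; Result: whatever execute returns.
interface PromptModule<Input, Props, Options extends object, Result> {
  fetch: (input: Input) => Promise<Props>;
  render: (props: Props) => Promise<string>;
  execute: (args: { prompt: string; data: Props } & Options) => Result;
}

// Runs the three stages in order and resolves to whatever execute returns.
async function fetchRenderExecute<Input, Props, Options extends object, Result>(
  module: PromptModule<Input, Props, Options, Result>,
  input: Input,
  options: Options
): Promise<Result> {
  const data = await module.fetch(input); // 1. load everything the prompt needs
  const prompt = await module.render(data); // 2. render the prompt to a string
  return module.execute({ prompt, data, ...options }); // 3. run it
}
```

Because execute's return value is passed straight through, a streaming result object comes back to the route handler as-is, ready for something like toDataStreamResponse.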

In the UI, we use useChat inside our chat.tsx component, which works out of the box with streamText, so at this point we're pretty much done. If the LLM wants to respond in text, that gets streamed back to the UI in the normal way, and if it needs instead to dispatch to another LLM call, that all happens transparently under the covers, with the StreamData object used to stream tool call progress/results back too.

There Can Be (more than) Only One

You don't need to limit yourself to a single LLM Router. In fact, it's often a good idea to have more than one. Task Demon is an order of magnitude more sophisticated than Bragdoc and currently uses no fewer than 6 different LLM Routers - one for each of the different Agents in the system. For now this article is long enough already, but I'll be writing about that in the future.

claudify: fire and forget for Claude Code

Wed, 14 May 2025 09:00:00 GMT

Sometimes I find myself doing the same thing over and over again. One of those things looked like this:

Find that my test suite is failing
Open up Claude Code
"Please run pnpm test and fix the failures"
Wait

Maybe there's only one failing test out of the ~1000 tests in the suite, so we can kinda optimize it a little:

Look to see which test file was causing the problem
pnpm test /path/to/that/file.test.ts
Open up Claude Code
Copy/paste the pnpm test ... command and its output and hit enter
Wait

That's faster as it lets Claude Code focus on a single test file. But it still involves me copying and pasting the command and its output. It's a First World Problem of truly quotidian proportions.

What if I could do this instead, and have it be equivalent to all the hard work described above?

$ pnpm test /path/to/that/file.test.ts
$ claudify

Turns out that's possible with a fairly simple little shell script. I call it claudify, though I usually alias it to just 'fix'.

How does it work?

It's pretty basic. The entire script is only 70 lines long and most of that is documentation and logging. Here's the core of it:

Pulls the most recent command off your shell history (using fc -ln -1)
Runs that command again, captures the output
Throws the command and its output into a tiny prompt
Sends that prompt to Claude Code
Prints a subset of the Claude Code output as it works

That's it. It's a pretty dumb and simple script that wraps Claude Code and takes advantage of the -p and --output-format flags supported by Claude Code.
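The real claudify is a shell function, but the prompt-assembly step is simple enough to sketch on its own. This TypeScript version is purely illustrative: the buildFixPrompt name and the exact prompt layout are assumptions, and only the default instruction "Please fix this:" comes from the script's description.

```typescript
// Hypothetical helper mirroring claudify's "throw the command and its output
// into a tiny prompt" step. The layout below is illustrative, not the script's.
const DEFAULT_INSTRUCTIONS = "Please fix this:";

function buildFixPrompt(
  command: string,
  output: string,
  instructions: string = DEFAULT_INSTRUCTIONS
): string {
  return [
    instructions,
    "",
    `Command: ${command}`,
    "Output:",
    output.trimEnd(), // drop trailing newlines the shell usually leaves behind
  ].join("\n");
}
```

The resulting string is what would be handed to Claude Code's -p flag for a non-interactive run, and the optional instructions argument plays the same role as claudify's single parameter.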

How do you install it?

claudify needs to be a function in your shell, so you add it to your .bashrc or .zshrc file. My shell-fu is not good enough to make it work as a standalone bash script, because a script runs in a new, non-interactive shell that doesn't see your interactive shell's history.

So the answer is to just copy/paste the function into your .zshrc, .bashrc or similar. Make sure you run source ~/.zshrc or source ~/.bashrc to make it take effect.

Grab the file here, or try this one if you're on Windows (no promises on the Windows one as the only thing I use my PC for is playing Doom).

Customizing with reusable instructions

I find myself using this in two main contexts:

Fixing a broken NextJS build
Fixing failed tests

Claude Code really does not want to write tests the way I want them. If I had a dollar for every time I told it "don't mock the database, insert real data and clean it up afterwards", I'd be a good deal happier in life.

claudify accepts a single parameter that allows you to customize the prompt that gets passed in to Claude Code along with the most recent command and its output (the default prompt is just "Please fix this:"). I tend to just create some simple shell aliases with different instructions based on what I want.

These 2 aliases are the most useful to me:

alias fix="claudify"
alias fixtest="claudify \"Please fix this test. Do not mock the database, \
  instead insert whatever data are required in a beforeEach and delete it again \
  surgically in an afterEach. Do not generate your own UUIDs, \
  let the database do that and read the IDs if you need them.\""

So now in my day-to-day, if pnpm build fails, I tend to just run fix and go do something else for a while until it's fixed all the TS issues the most recent vibe coding session introduced. Similarly, if a test fails I'll run fixtest and let it do its thing.

It probably saves me 30 seconds each time I do it. It won't change your life, but it might be mildly convenient once in a while.

Introducing Task Demon: Vibe Coding with a Plan

Mon, 14 Apr 2025 08:31:02 GMT

In the last 6 months, the way that leading software engineers build software has undergone a fundamental shift.

The adoption of agentic AI coding assistants has heralded the greatest leap in productivity I have encountered in my 20 year career so far. As I wrote previously, adopting Windsurf doubled my output within a week. Where usually I'd be thrilled to find some way to get 20% more done, and would work hard for that 20%, suddenly I'm getting 100% and it's just... easy.

But if there's a single consistent counter-punch to the Vibe Coding movement, it's the irrefutable fact that no matter how good the agentic AI coding assistant is, it will always do much better work from a detailed prompt that includes a plan than from your 2-sentence vibe code prompt.

That's what Task Demon does: it takes the 2 sentence vibe prompt and blows it up into a sublimely detailed prompt, usually anywhere between 200 and 1000 lines long, that includes a full implementation plan that will correctly guide the AI to do the right thing, using your project's structure, dependencies and ways of doing things.

A video is worth a million words. This one is 15 minutes but if you use AI to build software, I believe you'll find it worth it:

How it works

After using Windsurf and later Claude Code for a while, I found that using the following pattern yielded superb results:

Ask the AI to write a plan on how to do a certain Task
Ask it to check that plan as it usually makes mistakes
Ask it to write the code based on that plan

I developed these detailed prompts for each step that I would reuse each time and just swap out the task description. The rest of the prompt text was always stuff about what dependencies to use, how to run tests, and so on. I'd copy the output of each prompt in as the input of the next, until I'd run all 3 prompts and the code had in theory been implemented.

After doing this for a while, it was working very well, but I was no longer doing engineering. I was doing manufacturing - copy/pasting text snippets between prompts over and over again in the same way every time, just swapping out a few key sentences and making quite a few mistakes along the way.

Task Demon automates this process. When you create a task with Task Demon, it uses an agentic planning process to generate an extremely detailed prompt that includes detailed background on the project and the task, a full pre-planned implementation plan and any custom instructions you want to add. Just paste the resulting 200-1000 line prompt into Windsurf, Claude Code, or any other agentic AI coding assistant, and let it do the rest.
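For comparison, the manual three-prompt loop described earlier (plan, check, implement) reduces to a simple chain where each output becomes the next input. This is a hypothetical sketch of that manual workflow, not Task Demon's agentic planning process; Llm, StepTemplates and planCheckImplement are names made up for illustration.

```typescript
// A hypothetical LLM call: takes a prompt, returns the model's text.
type Llm = (prompt: string) => Promise<string>;

// Hypothetical reusable templates: everything except the task description
// (dependencies, how to run tests, conventions...) stays fixed between runs.
interface StepTemplates {
  plan: (task: string) => string;
  check: (task: string, plan: string) => string;
  implement: (task: string, checkedPlan: string) => string;
}

// Automates the copy/paste loop: each step's output becomes the next step's input.
async function planCheckImplement(
  llm: Llm,
  templates: StepTemplates,
  task: string
): Promise<{ plan: string; checkedPlan: string; result: string }> {
  const plan = await llm(templates.plan(task));
  const checkedPlan = await llm(templates.check(task, plan));
  const result = await llm(templates.implement(task, checkedPlan));
  return { plan, checkedPlan, result };
}
```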

Planning beats Vibes

Because there are often many ways to do the same thing, the AI will often choose the wrong way to do it for your project. Then you have to prompt it again, pleading with it to do it right this time. Often there are several rounds of this. And to be fair, the AI had no idea that your project already uses Convention X and Library Y - your vibe code prompt definitely didn't tell it.

This is where having a pre-authored plan prompt massively accelerates software development vs Vibe Coding. Task Demon's prompts are incredibly detailed and completely tailored to the specific project that you are working on so when you paste them into Claude Code and press enter, generally speaking you don't have to intervene very much because the prompt already tells Claude Code everything it needs to do the job properly.

Here's an example of a prompt that Task Demon generated:

This was generated from a 2-sentence description I typed in to one of the Task Demon chatbot UIs. This prompt is 1035 lines long, which is on the longer side of what Task Demon will normally generate, but scrolling through it you can see why any agentic AI coding assistant would have a far better chance implementing the right thing given this, than given a 2 sentence hit-enter-and-hope vibe prompt.

A theme of what Task Demon does is to try to pull as much computation forward as possible. This is why the Characterization Agent is so important - by spending 10-15 minutes understanding your project in detail one time, we avoid having to learn it all again each time we need to generate a prompt to do some work.

So how does this all work? Characterization, Claude Code and Task Demon itself:

Characterization & Claude Code

Characterization is the way that Task Demon gets to know your project. It is a conversation between 2 agentic AIs - one runs in Task Demon's cloud servers, the other is Claude Code, running in your local environment. Mediating this conversation is a simple NPM module called the Task Demon CLI agent, which just acts as a conduit to allow Task Demon to drive Claude Code.

The Task Demon Characterization Agent starts the process by asking Claude Code for a detailed technical document describing your project at a high level - what languages it uses, whether it follows a well-known framework layout, what dependencies it has, what the database schema looks like, and so on.

When it gets it back (via the Task Demon CLI Agent), the Characterization Agent analyzes this document and will either decide that the project is simple enough to be characterized by this document alone, or it will ask Claude Code for up to half a dozen followup documents on subjects like data schema, API endpoints, business logic, etc.

When the Characterization Agent is satisfied that it has learned enough about your project to be able to make competent plan preparation prompts, it declares the process done and then uses the Characterization as the basis for all of the Task processing it does, from triage to plan generation to task creation to implementation prompt generation.
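The characterization flow is essentially a bounded loop. Here's a hypothetical sketch of it; AskClaudeCode, NextFollowUp and characterize are illustrative names rather than Task Demon's code, and the cap of 6 simply encodes "up to half a dozen followup documents".

```typescript
// Hypothetical conduit to Claude Code: send a request, get a document back.
type AskClaudeCode = (request: string) => Promise<string>;

// Hypothetical judgement by the characterization agent: given the documents
// so far, return the next follow-up subject, or null if it has learned enough.
type NextFollowUp = (docs: string[]) => Promise<string | null>;

const MAX_FOLLOW_UPS = 6; // "up to half a dozen" follow-up documents

async function characterize(
  ask: AskClaudeCode,
  nextFollowUp: NextFollowUp
): Promise<string[]> {
  // Start with a high-level technical overview of the project.
  const docs = [await ask("Write a detailed technical overview of this project")];
  for (let i = 0; i < MAX_FOLLOW_UPS; i++) {
    const subject = await nextFollowUp(docs);
    if (subject === null) break; // simple enough: stop early
    docs.push(await ask(`Write a detailed document about: ${subject}`));
  }
  return docs; // the characterization used for all later task processing
}
```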

Task Demon is designed to avoid having any direct contact with any of your code or data. Task Demon itself delegates all of its interactions with your code to Claude Code, which you need to have installed in your local environment.

Task Demon

Task Demon is an agentic AI SaaS application that upgrades your simple ticket descriptions into highly effective LLM prompts that help you Vibe Code like a grownup. Using Task Demon to drive Claude Code has more than doubled my output; often it seems more like a 10x increase.

Claude Code by itself is already an immensely powerful tool for software engineering, but it's also a bottleneck in that it can only really do one thing at a time. With an excellent prompt containing a detailed plan, Claude Code can usually complete the task much faster than with a vibe prompt, and usually with minimal human intervention.

This stuff hasn't suddenly become magic - the reason Task Demon gives you a "Copy Prompt" button is that it's still expecting you to keep an eye on Claude Code (or whichever agent you use) as it does its thing. Task Demon does have a "YOLO Mode", but that's a topic for another day... for now Task Demon gets you much closer to a one-shot prompt for most tasks, but you are still going to want to refine or tweak things pretty often during its implementation.

Task Demon is a product that I built by myself in the last 8 weeks. There's no Venture Capital funding behind it, and I'm a team of one. It costs money to run Task Demon, though I'm able to offer a limited number of free trials per month - see details at https://www.taskdemon.ai/trial for how many slots are still open right now.

Individual Mode Available Now; Team Plans Coming Soon

Today's release of Task Demon makes the Individual mode available to everyone, and I'm working on the Team mode now - expect it to be available in the next few weeks.

In the meantime I'm generally releasing updates multiple times per day, which you can see (as well as come get support) in the Task Demon Discord Server.

I write Task Demon by myself, and there's zero dollars of Venture Capital or other funding behind me. It's a paid product aimed at professionals who write code or otherwise contribute to software getting written. It has more than doubled my output, and its ROI measured against its cost is in the tens of thousands of percent. If you can't get at least a 10% productivity boost while using it, email me at support@taskdemon.ai and I'll a) try to make it work better for you and b) refund your money, no questions asked.

Try it out!

You can try out Task Demon for free at https://www.taskdemon.ai/trial - there's a limit of 100 free trials per month, but they reset every month.

Deep Research Yourself

Tue, 11 Feb 2025 08:31:02 GMT

After 2 years of doing my own thing, I recently got the itch to work on something bigger than myself again and earn some money in the process. After talking to a few interesting companies, I was reminded that hiring engineers is really hard, really time consuming and has a large degree of risk attached to it.

When I think about which company makes the most sense for me to join, I picture myself as a pretty complex jigsaw piece, shaped by a unique blend of skills, experience and personality traits. Each company is also a jigsaw, with a bunch of pieces missing. Just as your shape is unlike anyone else's, each company's gaps are uniquely shaped as well.

As I plan to do full stack engineering for a company that has a strong AI focus, the jigsaw for a company that might be an optimal fit for me could look like this. Each blue piece is a position the company has already filled, with the blank ones being empty positions they are hiring for:

Imagining myself as the green piece and other candidates for the role as the orange and red, this is a company jigsaw where I would have high alignment, because the shape of my puzzle piece fits with the gap in the company jigsaw without missing areas or overlapping too much.

This is a good company to consider joining, with both company and candidate benefitting from the strong alignment. Our orange and red candidates don't fit so well, or overlap too much, so their ability to create value for the company (and therefore themselves) is lower.

When people research you, what do they see?

Thinking from the hiring company's point of view, it's quite a lot of effort to do the research on a candidate. I honestly don't know if the automated candidate screening tooling is good enough to trust yet, but there are 2 things I do know:

Almost all the information they will gather about you will be from the internet
You don't get to see a copy of what they find out about you

With OpenAI's release of Deep Research last week, it starts to be possible for candidates to do some of the same kinds of research on themselves. Deep Research is an ideal way to do something like this, for a few reasons:

Critically examining yourself is both uncomfortable and subject to bias
You already know yourself pretty well, but the person on the other side only has Google to go on
It's kinda boring and time-consuming, but the AI doesn't care

Deep Research Yourself

With access to a tool like Deep Research, it's pretty easy to have it go off and perform that dispassionate research on you and give you a candid assessment of what companies see when they think about hiring you.

I went ahead and did that on myself and made the response public (link below). I put myself in the shoes of a hiring manager for the type of company jigsaw I imagine myself fitting into, and asked it the following:

I am considering hiring a full stack software engineer called Ed Spencer.
He has a blog somewhere and says he is good at full stack engineering.
We're using React with Next JS and have a big complex AI model on the backend 
but need someone to bring order to the front end and understand the back end 
enough to make sure the whole thing works well, is fast and delightful to users, 
things like that.

Please research this guy and give me a full report on everything you could find 
about him and whether he would be a suitable candidate for the position.

This is the sort of requirement I hear all the time when people are looking for someone who does the type of thing I do. ChatGPT had me answer a follow-up question, as it is wont to do, and then it went off and browsed the web for 5 minutes before coming back with a report:

In the end it churned out this final report. I published it because a) it's trivial for anyone with ChatGPT Pro to copy and paste the prompt above and reproduce it and b) so you can see an example of what it comes up with:

You can read the full report here if you want to see what the rest looks like. Employers are going to rely increasingly on techniques and technologies like this to find the right candidates, so it's important to make sure that what your ideal company sees matches what you want them to see. The first step to doing that is to do the deep research on yourself.

Eval-Driven Design with NextJS and mdx-prompt

Mon, 03 Feb 2025 08:31:02 GMT

In the previous article, we went on a deep dive into how I use mdx-prompt on bragdoc.ai to write clean, composable LLM prompts using good old JSX. In that article as well as the mdx-prompt announcement article, I promised to talk about Evals and their role in helping you architect and build AI apps that you can actually prove work.

Evals are to LLMs what unit tests are to deterministic code. They are an automated measure of the degree to which your code functions correctly. Unit tests are generally pretty easy to reason about, but LLMs are usually deployed to do non-deterministic and somewhat fuzzy things. How do we test functionality like that?

In the last article we looked at the extract-achievements.ts file from bragdoc.ai, which is responsible for extracting structured work achievement data using well-crafted LLM prompts. Here's a reminder of what that Achievement extract process looks like, with its functions to fetch, render and execute the LLM prompts.

When it comes right down to it, when we say we want to test this LLM integration, what we're trying to test is render() plus execute(), or our convenience function renderExecute. This allows us to craft our own ExtractAchievementsPromptProps and validate that we get reasonable-looking ExtractedAchievement objects back.

ExtractAchievementsPromptProps is just a TS interface that describes all the data we need to render the LLM prompt to extract achievements from a chat session. It looks like this:

//props required to render the Extract Achievements Prompt
export interface ExtractAchievementsPromptProps {
  companies: Company[];
  projects: Project[];
  message: string;
  chatHistory: Message[];
  user: User;
}

ExtractedAchievement is equally basic - just a subset of our Achievement type (which is itself just a Drizzle model).

// the type of Achievement emitted by the LLM wrapper (not saved to db yet)
// basically what the LLM sent back plus a couple of fields like impactUpdatedAt
export type ExtractedAchievement = Pick<
  Achievement,
  | 'title'
  | 'summary'
  | 'details'
  | 'eventDuration'
  | 'eventStart'
  | 'eventEnd'
  | 'companyId'
  | 'projectId'
  | 'impact'
  | 'impactSource'
  | 'impactUpdatedAt'
>;

Our execute function - the function that actually runs a rendered LLM prompt - is an async generator that yields ExtractedAchievement objects. Our Eval will need to collect those ExtractedAchievement objects and compare them to the Achievements it expects to see.

/**
 * Executes the rendered prompt and yields the extracted achievements
 * 
 * @param prompt string
 * @returns AsyncGenerator<ExtractedAchievement, void, unknown>
 */
export async function* execute(prompt: string): AsyncGenerator<ExtractedAchievement, void, unknown> {
  const { elementStream } = streamObject({
    model: extractAchievementsModel,
    prompt,
    temperature: 0,
    output: 'array',
    schema: achievementResponseSchema,
  });

  for await (const element of elementStream) {
    yield {
      ...element,
      summary: element.summary || '',
      details: element.details || '',
      eventStart: element.eventStart ? new Date(element.eventStart) : null,
      eventEnd: element.eventEnd ? new Date(element.eventEnd) : null,
      impactSource: 'llm',
      impactUpdatedAt: new Date(),
    };
  }
}
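Since execute() is an async generator, the eval first needs to drain it into an array before comparing anything. A minimal generic helper might look like this; collect is a hypothetical name, not something bragdoc exports.

```typescript
// Drains any async iterable (such as the generator returned by execute())
// into an array so the eval can compare everything the LLM produced
// against the expected list in one go.
async function collect<T>(source: AsyncIterable<T>): Promise<T[]> {
  const items: T[] = [];
  for await (const item of source) {
    items.push(item);
  }
  return items;
}
```

In an eval it would be used along the lines of `const output = await collect(execute(await render(input)))`.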

Crafting a Scenario

So now that we're familiar with the shapes of the data at the start and the end of the Eval process, we can start to put together a scenario to test our LLM code. This is probably the hardest part - there are infinitely many approaches and you probably need more than one of them.

But one's better than none so let's imagine a scenario for a user who is an engineer, has a couple of companies and projects, and tells the bragdoc AI a couple of things that they've been working on. First up we'll need some data to represent this user, their companies and projects. Here's a snippet of some fake data that we can use to represent this user:

import type { User } from '@/lib/db/schema';
import { v4 as uuidv4 } from 'uuid';

export const user: User = {
  name: 'Ed Spencer',
  preferences: {
    documentInstructions: `If I don't mention a specific project, I'm talking about Brag Doc.`,
    language: 'en',
    hasSeenWelcome: true
  },
  id: uuidv4(),
  email: 'Q3Sd2@example.com',
} as User;

export const previousCompany = {
  name: 'Palo Alto Networks',
  id: uuidv4(),
  startDate: new Date('2016-02-01'),
  endDate: new Date('2021-09-30'),
  userId: user.id,
  role: 'Principal Engineer',
  domain: 'www.paloaltonetworks.com',
};

export const company = {
  name: 'Egghead Research',
  id: uuidv4(),
  startDate: new Date('2023-01-01'),
  endDate: null,
  userId: user.id,
  role: 'Chief Scientist',
  domain: 'www.edspencer.net',
};

export const project1 = {
  name: 'BragDoc.ai',
  description: 'AI-powered self-advocacy tool for tech-savvy individuals.',
  startDate: new Date('2024-12-15'),
  endDate: null,
  id: uuidv4(),
  companyId: company.id,
  status: 'active',
  userId: user.id,
  repoRemoteUrl: null
};

export const project2 = {
  name: 'mdx-prompt',
  description: 'Composable LLM prompts with JSX and MDX',
  startDate: new Date('2023-01-01'),
  endDate: new Date('2023-06-30'),
  id: uuidv4(),
  companyId: company.id,
  status: 'active',
  userId: user.id,
  repoRemoteUrl: null
};

export const projects = [project1, project2];
export const companies = [company, previousCompany];

Defining Experiments

I'm using Braintrust to run and track the Evals for bragdoc.ai, but most of this article really applies to any way of running them. You don't need to be using Braintrust specifically.

One thing they do formalize, though, is the idea of an Experiment. An Experiment is just a type that represents some input and some expected output. And here's where our careful thinking and structuring of our fetch/render/execute architecture pays off - we can just use our ExtractAchievementsPromptProps and ExtractedAchievement types to define our Experiment type:

export type Experiment = {
  input: ExtractAchievementsPromptProps;
  expected: ExtractedAchievement[];
};

This makes total sense. We're skipping the fetch stage as we're providing our own fake/controlled data set, so we're testing from ExtractAchievementsPromptProps to ExtractedAchievement[] on our diagram above.

Writing the Eval

Braintrust will help us run an array of these Experiments - here's how we structure this particular experiment, which is just a single chat message from the user:

const chatHistory = [
  {
    role: 'user' as const,
    content: 'I fixed several UX bugs in the checkout flow on Bragdoc today',
    id: '1',
  },
];

const lastMidnight = new Date();
lastMidnight.setHours(0, 0, 0, 0);

const nextMidnight = new Date();
nextMidnight.setDate(nextMidnight.getDate() + 1);
nextMidnight.setHours(0, 0, 0, 0);

const experimentData: Experiment[] = [
  {
    input: {
      companies,
      projects,
      chatHistory,
      user,
      message: 'I fixed several UX bugs in the checkout flow on Bragdoc today',
    },
    expected: [
      {
        summary: 'Fixed several UX bugs in the checkout flow',
        details: 'Fixed several UX bugs in the checkout flow on Bragdoc',
        eventStart: lastMidnight,
        eventEnd: nextMidnight,
        impactSource: 'llm',
        impactUpdatedAt: new Date(),
        companyId: companies[0].id,
        projectId: projects[0].id,
        title: 'Fixed several UX bugs in the checkout flow',
        eventDuration: 'day',
        impact: 1,
      },
    ],
  },
];

We can see that the input is an instance of our ExtractAchievementsPromptProps type, and the expected is an array of ExtractedAchievement objects that we expect to be yielded by our execute() function. There is a third concept - output - which is the actual array of ExtractedAchievement objects that we get back from the LLM.

So in the end this eval is just testing that when the user sends a message in an otherwise empty chat session, saying "I fixed several UX bugs in the checkout flow on Bragdoc today", we get back ExtractedAchievement objects that look like what we expected, given the company/project/user data that we also fed the prompt as part of the input.

Fuzzy matching and LLMs as judges

Ok so we've got a good handle on what we're testing, and we've got some good test data to test it with. But how do we actually compare the ExtractedAchievement objects that we get back from the LLM with the ExtractedAchievement objects that we expect?

This bit is where it can be challenging, because there are arguably many different reasonable ways an LLM could respond to a given message like this. In other Experiments along the same lines, we want to be able to pass in very long messages and have a bunch of Achievements extracted from them. Some of the other Evals for bragdoc.ai do just that - setting up scenarios where we expect a dozen or more Achievements to be extracted from a single LLM invocation.

In this case we're essentially comparing 2 arrays of identically-shaped JSON objects, but there's enough non-determinism in the LLM output that a deep equality check of the 2 arrays would fail even when the extraction is perfectly reasonable: the same achievement can be summarized in many different, equally valid wordings.

So we need to be a bit more creative. We can use an LLM as a judge: we ask a second LLM to compare each ExtractedAchievement object in the output array with each ExtractedAchievement object in the expected array, and give us a score for how similar they are.
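To make the "compare each output with each expected" idea concrete, here is a minimal sketch of pairwise best-match scoring. It assumes a hypothetical `similarity` function standing in for the LLM judge; purely for illustration it uses word-overlap (Jaccard) similarity on titles, which is far cruder than an LLM and is not what bragdoc.ai does.

```typescript
// Minimal shape for this sketch; the real ExtractedAchievement has many more fields.
interface AchievementLike {
  title: string;
}

// Hypothetical stand-in for an LLM judge: Jaccard similarity of lowercase word sets, in [0, 1].
function similarity(a: AchievementLike, b: AchievementLike): number {
  const words = (s: string) => new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const wa = words(a.title);
  const wb = words(b.title);
  if (wa.size === 0 && wb.size === 0) return 1;
  let shared = 0;
  for (const w of wa) if (wb.has(w)) shared++;
  return shared / (wa.size + wb.size - shared);
}

// For each expected achievement, take the best-scoring output achievement, then average.
// Extra, unexpected outputs are not penalized here; a real scorer might penalize them.
function bestMatchScore(expected: AchievementLike[], output: AchievementLike[]): number {
  if (expected.length === 0) return output.length === 0 ? 1 : 0;
  let total = 0;
  for (const e of expected) {
    let best = 0;
    for (const o of output) best = Math.max(best, similarity(e, o));
    total += best;
  }
  return total / expected.length;
}
```

An LLM judge replaces `similarity` with something that understands meaning rather than shared words, which is exactly why it copes with rewordings that this toy version would undervalue.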

This general technique is called "LLM as a Judge", and it's a powerful way to compare fuzzy data. We can use it to compare the output and expected arrays, and then use that comparison to decide whether the Eval passed or failed. Conveniently, we can use mdx-prompt to write the prompt that does this comparison for us:


//snipped for brevity, contains instructions on how to compare the output and expected arrays
const instructions = ['...']

//tell the LLM how to score the comparison
const outputFormat = `
Answer by selecting one of the following options:
(A) The extraction matches the expected output perfectly
(B) The extraction captures the main achievement but misses some details
(C) The extraction has minor inaccuracies but is generally correct
(D) The extraction misses key information or has significant inaccuracies
(E) The extraction is completely incorrect or misunderstands the achievement`;

function EvaluateExtractedAchievementsPrompt({
  expectedAchievements,
  extractedAchievements,
}: {
  expectedAchievements: any;
  extractedAchievements: any;
}) {
  return (
    <Prompt>
      <Purpose>
        You are evaluating how well an AI system extracted achievements from a
        user message. Compare the extracted achievements with the expected
        output. Consider that a single message may contain multiple
        achievements. Return one of the scores defined below.
      </Purpose>
      <Instructions instructions={instructions} />
      <InputFormat>
        <expected-achievements>
          The correct Achievements that should have been extracted by the model
        </expected-achievements>
        <extracted-achievements>
          The achievements that were actually extracted by the model
        </extracted-achievements>
      </InputFormat>
      <OutputFormat format={outputFormat} />
      <Variables>
        <expected-achievements>
          {JSON.stringify(expectedAchievements, null, 4)}
        </expected-achievements>
        <extracted-achievements>
          {JSON.stringify(extractedAchievements, null, 4)}
        </extracted-achievements>
      </Variables>
    </Prompt>
  );
}

That's a fairly simple prompt that just tells the LLM what data it's going to get, how to evaluate it, and then gives it the data. The LLM then picks one of the lettered options, which gives us an indication of how well the output and expected arrays match.

In the actual extract-achievement-scorer code you can see how we need to import those components up at the top of the file to be able to use them in our prompt. We don't need to do that with the .mdx file approach.

To plug this into Braintrust's way of doing things, we define a scorer that renders the prompt and maps each letter to a numeric score:

export async function ExtractAchievementScorer(args: any): Promise<Score> {
  const prompt = await renderMDX(
    <EvaluateExtractedAchievementsPrompt
      expectedAchievements={args.expected}
      extractedAchievements={args.output}
    />
  );

  return LLMClassifierFromSpec('ExtractAchievementScorer', {
    prompt,
    choice_scores: {
      A: 1.0, // Perfect match
      B: 0.8, // Good but missing details
      C: 0.6, // Minor issues
      D: 0.3, // Major issues
      E: 0.0, // Completely wrong
    },
  })(args);
}
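The `choice_scores` map is what turns the judge's letter into a number. The snippet below is a simplified, hypothetical illustration of that mapping step only; `scoreFromJudgeReply` is a name invented here, and it is not how autoevals implements LLMClassifierFromSpec internally.

```typescript
// The same letter-to-score table used by ExtractAchievementScorer above.
const choiceScores: Record<string, number> = {
  A: 1.0, // Perfect match
  B: 0.8, // Good but missing details
  C: 0.6, // Minor issues
  D: 0.3, // Major issues
  E: 0.0, // Completely wrong
};

// Hypothetical parser: read a leading "(X)" or bare "X" choice from the judge's reply.
// Returns null when no known choice is found, so callers can treat it as a failed judgement.
function scoreFromJudgeReply(reply: string, scores: Record<string, number> = choiceScores): number | null {
  const match = reply.trim().match(/^\(?([A-Z])\)?(?:\W|$)/);
  if (!match) return null;
  const score = scores[match[1]];
  return score === undefined ? null : score;
}
```

Keeping the scale coarse (five fixed options) rather than asking for a free-form number tends to make the judge's answers easier to compare across runs.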

And now we can tie it all together to run all of our Experiments (all one of them, in this case) and use our ExtractAchievementScorer to score the comparison of the output and expected arrays:

Eval('extract-chat-achievements', {
  data: experimentData,
  task: wrappedExtractAchievements,
  scores: [ExtractAchievementScorer],
  trialCount: 3,
  metadata: {
    model: 'gpt-4',
    description: 'Evaluating achievement extraction',
    owner: 'ed',
  },
});

// Function to wrap the async generator into a promise that resolves to an array of ExtractedAchievements
async function wrappedExtractAchievements(input: ExtractAchievementsPromptProps): Promise<ExtractedAchievement[]> {
  return await renderExecute(input);
}

The task function is what Braintrust runs for each Experiment to produce the output. This is finally where we use the renderExecute function from extract-achievements.ts: we pass in our test data and get back a Promise that resolves to an array of ExtractedAchievement objects. The experimentData is the array of Experiment instances we defined earlier in this article.
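The `data`, `task`, `scores` and `trialCount` options fit together roughly as in the sketch below. This is a simplified, synchronous, hypothetical runner written only to show the shape of the loop; `runTrials` and its types are invented for this example, and Braintrust's real `Eval` is asynchronous and logs everything to its UI.

```typescript
interface Case<I, E> {
  input: I;
  expected: E;
}

type Task<I, O> = (input: I) => O;
type Scorer<I, O, E> = (args: { input: I; output: O; expected: E }) => number;

// Run every case `trialCount` times, score each trial with every scorer,
// and return the mean score per scorer name across all trials of all cases.
function runTrials<I, O, E>(
  data: Case<I, E>[],
  task: Task<I, O>,
  scorers: Record<string, Scorer<I, O, E>>,
  trialCount = 1,
): Record<string, number> {
  const totals: Record<string, number> = {};
  let runs = 0;
  for (const { input, expected } of data) {
    for (let t = 0; t < trialCount; t++) {
      const output = task(input); // a non-deterministic task may differ per trial
      runs++;
      for (const [name, scorer] of Object.entries(scorers)) {
        totals[name] = (totals[name] ?? 0) + scorer({ input, output, expected });
      }
    }
  }
  const means: Record<string, number> = {};
  for (const name of Object.keys(totals)) means[name] = totals[name] / runs;
  return means;
}
```

With one Experiment and `trialCount: 3` this loop produces three scored runs, which is presumably why the output in the next section reports 3/3 datapoints for a single Experiment. Repeating trials matters because the same prompt can score differently from one LLM call to the next.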

Running the Eval

Running the eval now, we'll get output like this:

$ npx braintrust eval lib/ai/prompts/evals/extract-achievements.eval.ts

Processing 1 evaluators...
Experiment mdx-prompt-polish-1737670065 is running at https://www.braintrust.dev/app/Egghead/p/extract-chat-achievements/experiments/mdx-prompt-polish-1737670065
 ████████████████████████████████████████ | extract-chat-achievements                | 100% | 3/3 datapoints


=========================SUMMARY=========================
mdx-prompt-polish-1737670065 compared to mdx-prompt-polish-1737670046:
100.00% 'ExtractAchievementScorer' score	(0 improvements, 0 regressions)

1737670065.27s 'start'            	(0 improvements, 0 regressions)
1737670072.44s 'end'              	(0 improvements, 0 regressions)
6.75s 'duration'         	(0 improvements, 0 regressions)
0.82s 'llm_duration'     	(0 improvements, 0 regressions)
2417tok 'prompt_tokens'    	(0 improvements, 0 regressions)
155.33tok 'completion_tokens'	(0 improvements, 0 regressions)
2572.33tok 'total_tokens'     	(0 improvements, 0 regressions)

See results for mdx-prompt-polish-1737670065 at https://www.braintrust.dev/app/Egghead/p/extract-chat-achievements/experiments/mdx-prompt-polish-1737670065

In this case we can see that our LLM as a judge generously awarded us a 100% score for our extraction of Achievements from source messages, when it compares them to the expected Achievements:

Conclusions

This was just implementing a single Eval to test a single LLM feature under a single scenario. For all that work we now have an automated way of quantifying how well our LLM prompt is working for this feature. The fact that we can now run these Evals at development time and in CI means that we can have confidence that we haven't unknowingly broken something as we meddle with the prompt or other code some time in the future.

Pictures are useful, Types are too

I made that pretty flow chart diagram before I ended up with the code in its final form. Once I had drawn it out, the fetch/render/execute cycle kinda leapt off the page and asserted itself as a fairly generalized architecture for invoking these types of prompts.

I found this pattern cropping up again and again, and in reality there are several ways that we want to compose those individual functions. Sometimes we want an async generator so we can stream things to the user, other times a Promise is better, other times we already have the data and just want to pass it in and get the result back, and so on.
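That recurring fetch/render/execute shape can be captured once and reused for every prompt. Below is a sketch of a hypothetical `createPromptPipeline` helper (not part of bragdoc.ai or mdx-prompt) that builds the three composed variants from any fetch, render and execute functions with the same signatures as the ones in extract-achievements.ts.

```typescript
// The three building blocks, generic over fetcher props (F), prompt props (P) and result item (R).
interface PipelineSteps<F, P, R> {
  fetch: (props: F) => Promise<P>;
  render: (data: P) => Promise<string>;
  execute: (prompt: string) => AsyncGenerator<R, void, unknown>;
}

// Build the renderExecute / fetchRenderExecute / streamFetchRenderExecute variants once.
function createPromptPipeline<F, P, R>({ fetch, render, execute }: PipelineSteps<F, P, R>) {
  // Skips fetching: useful for Evals that supply their own prompt props.
  async function renderExecute(data: P): Promise<R[]> {
    const results: R[] = [];
    for await (const item of execute(await render(data))) results.push(item);
    return results;
  }

  // Full pipeline, collected into an array.
  async function fetchRenderExecute(props: F): Promise<R[]> {
    return renderExecute(await fetch(props));
  }

  // Full pipeline, yielding each item as soon as the LLM produces it.
  async function* streamFetchRenderExecute(props: F): AsyncGenerator<R, void, unknown> {
    yield* execute(await render(await fetch(props)));
  }

  return { renderExecute, fetchRenderExecute, streamFetchRenderExecute };
}
```

The trade-off is one more layer of indirection in exchange for never re-writing the same three wrappers per prompt.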

Having a clear model for how your prompts work is critical to structuring evals that make sense.

Good test data is useful

One bonus of having good Evals is that you, by definition, also have good test data, at least for the prompts your Evals cover. I made a page at https://www.bragdoc.ai/prompt that uses Next.js to render prompts in the browser. You can open that page (no account needed) and see exactly what the rendered prompts used by bragdoc.ai look like:

That's just a fairly simple Next.js page.tsx that uses a basic component called PrettyPrompt to render three different prompts. It's currently hooked up to the eval data, which is why you can look at it without being logged in, but it would be easy to hook it up to real data to debug how your prompts actually get rendered.

Evals are slow, expensive and difficult

Your unit test suite should ideally run in about a second, triggered automatically each time you save a file in your IDE. Evals are never going to do that - they're far too slow, and sometimes the output is itself in shades of gray that just don't exist in the world of unit tests. So Evals are their own separate thing, run much less often, but still before every merge to main.

While developing bragdoc.ai, which is a fairly typical AI SaaS app, I found that whereas I could rely on Windsurf and other AI tooling for building out most of the app itself, building Evals was not something the AI excelled at. Or at least, my attempts to use AI to build Evals were not very successful. After a few weeks of flying through feature implementation, I spent the last week basically just getting Evals under control.

They're a lot of work, and famously difficult to get right. Bragdoc only has a handful of them so far, but I see the value in them and will continue to add them. They also unlock some really significant benefits when it comes to assessing which LLM works best for which prompt, whether it be in terms of accuracy, cost or latency. But that's a topic for another day.

mdx-prompt: Real World Example Deep Dive

Mon, 03 Feb 2025 07:31:02 GMT

I just released mdx-prompt, which is a simple library that lets you write familiar React JSX to render high quality prompts for LLMs. Read the introductory article for more general info if you haven't already, but the gist is that we can write LLM Prompts with JSX/MDX like this:

<Prompt>
  <Purpose>
    You are a careful and attentive assistant who extracts work achievements
    from source control commit messages. Extract all of the achievements in
    the commit messages contained within the {`<user-input>`} tag. Follow
    all of the instructions provided below.
  </Purpose>
  <Instructions>
    <Instruction>Each Achievement should be complete and self-contained.</Instruction>
    <Instruction>If multiple related commits form a single logical achievement, combine them.</Instruction>
    <Instruction>
      Pay special attention to:
      1. Code changes and technical improvements
      2. Bug fixes and performance optimizations
      3. Feature implementations and releases
      4. Architecture changes and refactoring
      5. Documentation and testing improvements
    </Instruction>
  </Instructions>
  <Variables>
    <Companies companies={data.companies} />
    <Projects projects={data.projects} />
    <today>{new Date().toLocaleDateString()}</today>
    <user-instructions>
      {data.user?.preferences?.documentInstructions}
    </user-instructions>
    <UserInput>
      {data.commits?.map((c) => <Commit key={c.hash} commit={c} />)}
    </UserInput>
    <Repo repository={data.repository} />
  </Variables>
  <Examples
    examples={data.expectedAchievements?.map((e) => JSON.stringify(e, null, 4))}
  />
</Prompt>

This ought to look familiar to anyone who's ever seen React code. This project was born of a combination of admiration for the way IndyDevDan and others structure their LLM prompts, and frustration with the string interpolation approaches that everyone takes to generating prompts for LLMs.

In the introductory post I go into some details on why string interpolation-heavy functions are not great for prompts. It's a totally natural thing to want to do - once you've started programming against LLM interfaces, you want to start formalizing the mechanism by which you generate the string that is the prompt. Before long you notice that many of your app's prompts have a lot of overlap, and you start to think about how you can reuse the parts that are the same.

Lots of AI-related libraries try to help you here with templating solutions, but they often feel clunky. I really, really wanted to like Langchain, but I lost a day of my life trying to get it to render a prompt that I could have written in 5 minutes with JSX. JSX seems to be a pretty good fit for this problem, and anyone who knows React (a lot of people) can pick it up straight away. mdx-prompt helps React developers compose their LLM prompts with the familiar syntax of JSX.

The Setup

In this article we'll take a deeper look at this actual real-world example of how I use mdx-prompt for bragdoc.ai - a SaaS app that tracks your work achievements and generates documents from them. This article is not a plug for bragdoc, but I will have to tell you a little about it for the article to make sense. All of the code is open source and can be found on GitHub.

Prompts are arrays of tokens

Our mdx-prompt powered prompt defined above will render into an XML-style prompt like this (slightly truncated for brevity):

<purpose>
  You are a careful and attentive assistant who extracts work achievements from source control commit messages. Extract all of the achievements in the commit messages contained within the &lt;user-input&gt; tag. Follow all of the instructions provided below.
</purpose>

<instructions>
  <instruction>Consider the chat history and context to understand the full scope of each achievement.</instruction>
  <instruction>Each Achievement should be complete and self-contained.</instruction>
  <instruction>If the user mentions multiple achievements in a single message, extract them all.</instruction>
  // ... more instructions
</instructions>

<input-format title="You are provided with the following inputs:">
  <companies>All of the companies that the user works at (or has worked at)</companies>
  <projects>All of the projects that the user works on (or has worked on)</projects>
  <user-instructions>Any specific instructions from the user to guide the extraction process</user-instructions>
  <user-input>The git commits to extract achievements from</user-input>
  <repository>Information about the repository the commits are from</repository>
</input-format>

<variables>
  <companies>
    <company>
      <id>74eda7d6-3c69-4f51-a53a-58624bba48f4</id>
      <name>Egghead Research</name>
      <role>Chief Scientist</role>
      <start-date>12/31/2022</start-date>
      <end-date>Present</end-date>
    </company>
  </companies>
  <projects>
    <project>
      <id>94150287-afa1-4caa-aa66-8ad44f31120c</id>
      <name>BragDoc.ai</name>
      <description>AI-powered self-advocacy tool for tech-savvy individuals.</description>
      <status>active</status>
      <start-date>12/14/2024</start-date>
      <end-date>Present</end-date>
      <remote-repo-url>
      </remote-repo-url>
    </project>
    // ... more projects
  </projects>
  <today>1/17/2025</today>
  <user-instructions>If I don't mention a specific project, I'm talking about Brag Doc.</user-instructions>
  <user-input>
    <commit>
      <message>Wrote a bunch of new Evals for extracting achievements and generating documents</message>
      <hash>1234</hash>
      <author>John Doe - john@doe.com</author>
      <date>2023-01-01</date>
    </commit>
    <commit>
      <message>Better styling for the blog pages</message>
      <hash>5678</hash>
      <author>John Doe - john@doe.com</author>
      <date>2023-01-02</date>
    </commit>
  </user-input>
  <repository>
    <name>bragdoc-ai</name>
    <path>/path/to/bragdoc-ai</path>
    <remote-url>https://github.com/edspencer/bragdoc-ai</remote-url>
  </repository>
</variables>

<examples>
  <example>
    {
      "eventStart": "2024-06-15",
      "eventEnd": "2024-09-15",
      "eventDuration": "quarter",
      "title": "Launched AI Analysis Tool with 95% Accuracy at Quantum Nexus",
      "summary": "Developed an AI tool for real-time data analysis with 95% accuracy for Quantum Nexus, playing a pivotal role in Project Orion's success.",
      "details": "As part of Project Orion at Quantum Nexus, I was responsible for developing a cutting-edge AI tool focused on real-time data analysis. By implementing advanced algorithms and enhancing the training data sets, the tool reached a 95% accuracy rate. This result significantly supported the company's research objectives and has been positively acknowledged by stakeholders for its robust performance and reliability.",
      "companyId": "e3856e75-37cf-4640-afd9-e73a53fa967d",
      "projectId": "3923129e-719b-4f99-8487-9830cf64ad5d",
      "impact": 3
    }
  </example>
  //... more examples
</examples>

The code we're going to be looking at today is how bragdoc.ai processes git commit histories into work achievements that can then be turned into documents. Basically, once they've installed the CLI with npm install -g bragdoc, a user can run a command like this:

$ bragdoc extract

This grabs commit messages from whatever repo you're in and sends them up to the bragdoc API, which then processes them into work achievements. The LLM behind this has to consider a bunch of things in doing this, including:

If the user has Projects defined already, which Project are these commits for?
How impactful is each achievement on a scale of 1-3?
When did the achievement happen?
Was it an achievement that happened over a period of time, or a single event?

In order for it to do this, it needs to know about the user's Companies and Projects, the Commits themselves, and the Repository they're in. It may also need to know about recent Achievements that have been tracked. We're using an LLM to take a string prompt and return a structured data response, so we need to provide it with a prompt that tells it what we want it to do.

Moreover, in order to get an LLM to do this highly specialized task for us, we're more likely to achieve success if we one-shot a specialized prompt than if we tried to coax an LLM in the middle of a chat conversation to do it for us. This means we're probably going to use a Router LLM to essentially invoke our specialized prompt as a tool call.

Router LLMs

A lot of AI apps use a Router LLM to decide which specialized prompt should handle a given request. Bragdoc does this so that we can dispatch to different prompts based on whether we're recording achievements, generating documents, or something else. From the perspective of the Router LLM, the Achievement Extraction process is just a tool that it can call with some input.
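From the router's side, dispatching a tool call boils down to a lookup from tool name to handler. The sketch below uses a plain object as the tool registry; the tool names, argument shapes and `dispatchToolCall` are all hypothetical, and in practice an SDK's tool-calling support (for example the Vercel AI SDK's `tools` option) does the work of getting the model's choice back to you.

```typescript
// What the router model produced: which tool to call and with what arguments.
interface ToolCall {
  toolName: string;
  args: Record<string, unknown>;
}

type ToolHandler = (args: Record<string, unknown>) => string;

// Hypothetical registry: each tool is a thin wrapper around some other piece of code.
const tools: Record<string, ToolHandler> = {
  extractAchievements: (args) => `extracting achievements from: ${String(args.message)}`,
  generateDocument: (args) => `generating a ${String(args.documentType)} document`,
};

// Look up the handler for the router's choice; unknown tools are an error, not a silent no-op.
// hasOwnProperty avoids accidentally "finding" inherited names such as toString.
function dispatchToolCall(call: ToolCall, registry: Record<string, ToolHandler> = tools): string {
  if (!Object.prototype.hasOwnProperty.call(registry, call.toolName)) {
    throw new Error(`Unknown tool: ${call.toolName}`);
  }
  return registry[call.toolName](call.args);
}
```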

A tool call is not the only way that the achievement extraction process can be invoked, so this means the tool should be a thin wrapper around something else - in this case I'm calling that something else the Orchestrator, which:

gathers all the data required for the Achievement Extraction prompt that wasn't provided by the Router LLM (Fetcher)
renders the prompt using that data (Renderer)
calls the LLM with the prompt, returning the processed response (LLM)

As you can see there are actually at least 5 different types that our data passes through in the Achievement Extraction process:

ExtractAchievementsFetcherProps - the minimal object required to fetch the data for the prompt
ExtractAchievementsPromptProps - the minimal object required to render the prompt
string - the rendered prompt we pass to the LLM
LLMExtractedAchievement - structured output data response from the LLM (needs processing)
ExtractedAchievement - final, processed Achievement objects compatible with our data layer
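To make the LLM-facing end of that list concrete, here is a rough sketch of what LLMExtractedAchievement might look like, plus a hand-rolled runtime check on the raw LLM output. The field names come from the example achievement and the execute() function shown in this article; the exact optionality and the `isLLMExtractedAchievement` helper are assumptions for illustration (the real code validates with a schema, achievementResponseSchema, instead).

```typescript
// Raw structured output from the LLM; dates are still strings at this point.
interface LLMExtractedAchievement {
  title: string;
  summary?: string;
  details?: string;
  eventStart?: string; // e.g. "2024-06-15"
  eventEnd?: string;
  eventDuration: string; // e.g. "day", "quarter"
  companyId: string | null;
  projectId: string | null;
  impact: number; // 1 (low) to 3 (high)
}

// Hypothetical runtime guard: check the shape before trusting the LLM's output.
function isLLMExtractedAchievement(value: unknown): value is LLMExtractedAchievement {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const optionalString = (x: unknown) => x === undefined || typeof x === 'string';
  const idOrNull = (x: unknown) => x === null || typeof x === 'string';
  const impact = v.impact;
  return (
    typeof v.title === 'string' &&
    typeof v.eventDuration === 'string' &&
    optionalString(v.summary) &&
    optionalString(v.details) &&
    optionalString(v.eventStart) &&
    optionalString(v.eventEnd) &&
    idOrNull(v.companyId) &&
    idOrNull(v.projectId) &&
    typeof impact === 'number' &&
    Number.isInteger(impact) &&
    impact >= 1 &&
    impact <= 3
  );
}
```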

Conceptually, the types flow around the achievement extraction process like this:

ExtractAchievementsFetcherProps exists so that the Router LLM only needs to supply the minimum data required to fetch the rest of the data for the prompt. This helps the Router LLM focus on producing the right data for its tool call, and also allows us to have multiple code pathways into achievement extraction that don't go through the Router LLM at all.

The Fetcher may not even need to exist in all instances - here it's just loading the companies and projects for the given user. This centralizes that code in one place, allowing us to reuse it. Its output is ExtractAchievementsPromptProps - the minimal set of data required to render our prompt.

The Orchestrator is a simple function that calls the Fetcher, passes its output to the Renderer, then calls the LLM with the rendered prompt. It also does a little data transformation from the LLMExtractedAchievement to the ExtractedAchievement format that our data layer expects. Again this may not be needed in all cases - in this case it is turning date strings into JS Date objects, as well as inserting some timestamps - stuff we don't want the LLM doing anyway.
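That conversion can be sketched as a small pure function. It mirrors the mapping inside execute() shown later in this article (empty-string defaults, string-to-Date conversion, and the impactSource/impactUpdatedAt stamps), but `toProcessedAchievement`, the trimmed-down types and the injectable `now` parameter are hypothetical additions that make it easy to unit test.

```typescript
// Trimmed-down raw LLM element (hypothetical; the real type has more fields).
interface LLMAchievement {
  title: string;
  summary?: string;
  details?: string;
  eventStart?: string | null;
  eventEnd?: string | null;
  impact: number;
}

// Trimmed-down processed shape the data layer expects.
interface ProcessedAchievement {
  title: string;
  summary: string;
  details: string;
  eventStart: Date | null;
  eventEnd: Date | null;
  impact: number;
  impactSource: 'llm';
  impactUpdatedAt: Date;
}

// Same transform as in execute(); `now` is injectable so tests can pin impactUpdatedAt.
function toProcessedAchievement(element: LLMAchievement, now: Date = new Date()): ProcessedAchievement {
  return {
    ...element,
    summary: element.summary || '',
    details: element.details || '',
    eventStart: element.eventStart ? new Date(element.eventStart) : null,
    eventEnd: element.eventEnd ? new Date(element.eventEnd) : null,
    impactSource: 'llm',
    impactUpdatedAt: now,
  };
}
```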

The Orchestrator is just the extract-achievements.ts file, which contains the following functions:

fetch(props) - fetches the data required for the prompt
render(data) - renders the prompt
execute(prompt) - calls the LLM with the prompt and processes the response
renderExecute(data) - renders the prompt and executes it, returning the extracted achievements
fetchRenderExecute(props) - fetch/render/execute, returns the extracted achievements as an array
streamFetchRenderExecute(props) - fetch/render/execute, yielding each extracted achievement as they stream in

Those first three - fetch, render and execute - map directly to the Fetcher/Renderer/LLM conceptual diagram boxes. We can update our diagram with the actual functions, along with the three higher-level functions we've added - renderExecute, fetchRenderExecute, and streamFetchRenderExecute:

The fetch(), render(), and execute() functions are super simple. The render() function uses mdx-prompt to render the MDX prompt into a string. The fetch() function just asynchronously does whatever loading is required, and the execute() function uses the excellent Vercel AI SDK to call the LLM and stream back the results:

/**
 * Fetches the data necessary to render the Extract Achievements Prompt.
 * Given a minimal set of data, prepares the rest of the data required 
 * for the achievements extraction prompt
 * 
 * @param props ExtractAchievementsFetcherProps
 * @returns ExtractAchievementsPromptProps
 */
export async function fetch(props: ExtractAchievementsFetcherProps): Promise<ExtractAchievementsPromptProps> {
  const {user, message, chatHistory} = props;

  const [projects, companies] = await Promise.all([
    getProjectsByUserId(user.id),
    getCompaniesByUserId({ userId: user.id }),
  ]);

  return {
    message,
    chatHistory,
    companies,
    projects,
    user
  }
}

/**
 * Renders the Extract Achievements Prompt
 * 
 * @param data ExtractAchievementsPromptProps
 * @returns string
 */
export async function render(data: ExtractAchievementsPromptProps): Promise<string> {
  return await renderMDXPromptFile({
    filePath: promptPath,
    data,
    components
  });
}

/**
 * Executes the rendered prompt and yields the extracted achievements
 * 
 * @param prompt string
 * @returns AsyncGenerator<ExtractedAchievement, void, unknown>
 */
export async function* execute(prompt: string): AsyncGenerator<ExtractedAchievement, void, unknown> {
  const { elementStream } = streamObject({
    model: extractAchievementsModel,
    prompt,
    temperature: 0,
    output: 'array',
    schema: achievementResponseSchema,
  });

  for await (const element of elementStream) {
    yield {
      ...element,
      summary: element.summary || '',
      details: element.details || '',
      eventStart: element.eventStart ? new Date(element.eventStart) : null,
      eventEnd: element.eventEnd ? new Date(element.eventEnd) : null,
      impactSource: 'llm',
      impactUpdatedAt: new Date(),
    };
  }
}

Fetch

Let's take a closer look at that code then. fetch() is pretty easy:

/**
 * Fetches the data necessary to render the Extract Achievements Prompt.
 * Given a minimal set of data, prepares the rest of the data required 
 * for the achievements extraction prompt
 * 
 * @param props ExtractAchievementsFetcherProps
 * @returns ExtractAchievementsPromptProps
 */
export async function fetch(props: ExtractAchievementsFetcherProps): Promise<ExtractAchievementsPromptProps> {
  const {user, message, chatHistory} = props;

  const [projects, companies] = await Promise.all([
    getProjectsByUserId(user.id),
    getCompaniesByUserId({ userId: user.id }),
  ]);

  return {
    message,
    chatHistory,
    companies,
    projects,
    user
  }
}

All it's doing is taking what the LLM tool call and the session data from our NextAuth integration gave our API endpoint (ExtractAchievementsFetcherProps), and returning the data required to render the prompt (ExtractAchievementsPromptProps).

// props required to render the Extract Achievements Prompt
export interface ExtractAchievementsPromptProps {
  companies: Company[];
  projects: Project[];
  message: string;
  chatHistory: Message[];
  user: User;
};

Having a fetch step means we can call on this piece of LLM functionality from anywhere in our app without needing to re-implement the loading of projects and companies each time we do so.

Render

That ExtractAchievementsPromptProps is the same type we then pass into the Renderer, which is just a function call to mdx-prompt's renderMDXPromptFile() function:

import path from 'path';
import { renderMDXPromptFile } from 'mdx-prompt';

//load our custom mdx-prompt components like Company and Project
import * as components from './prompts/elements';

// the path to the extract-achievements.mdx file
const promptPath = path.resolve("./lib/ai/prompts/extract-achievements.mdx");

//the render function
export async function render(data: ExtractAchievementsPromptProps): Promise<string> {
  return await renderMDXPromptFile({
    filePath: promptPath,
    data,
    components
  });
}

The extract-achievements.mdx file looks like this (truncated for brevity; the full file is in the bragdoc.ai repository on GitHub):

<Prompt>
  <Purpose>
    You are a careful and attentive assistant who extracts work achievements
    from conversations between users and AI assistants. Extract all of the
    achievements in the user message contained within the {`<user-input>`}
    tag. Follow all of the instructions provided below.
  </Purpose>
  <Instructions>
    <Instruction>
    Each achievement should have a clear, action-oriented title (REQUIRED) that:
    - Starts with an action verb (e.g., Led, Launched, Developed)
    - Includes specific metrics when possible (e.g., "40% reduction", "2x improvement")
    - Mentions specific systems or teams affected
    - Is between 10 and 256 characters
    </Instruction>
    <Instruction>Each Achievement should be complete and self-contained.</Instruction>
    <Instruction>Do not invent details that the user did not explicitly say.</Instruction>
  </Instructions>
  <InputFormat>{data.message}</InputFormat>
  <Variables>
    <today>{new Date().toLocaleDateString()}</today>
    <user-instructions>
      {data.user?.preferences?.documentInstructions}
    </user-instructions>
    <ChatHistory messages={data.chatHistory} />
    <Companies companies={data.companies} />
    <Projects projects={data.projects} />
    <UserInput>{data.message}</UserInput>
  </Variables>
  <Examples examples={data.examples?.map((e) => JSON.stringify(e, null, 4))} />
</Prompt>

Your answer:

Calling the render() function will return a nicely formatted mixture of text and xml-style tags matching the JSX structure of the prompt.
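To give a feel for that output shape, here is a toy string renderer that turns a tag name and its children into the same kind of xml-style block, escaping angle brackets in text the way `&lt;user-input&gt;` appears in the rendered prompt earlier. It is a hypothetical illustration only (`renderTag` and `escapeText` are invented names); mdx-prompt renders real JSX/MDX components, and its exact whitespace and escaping rules may differ.

```typescript
// Escape characters that would otherwise be read as markup inside a text node.
function escapeText(text: string): string {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// A child is either plain text (escaped) or an already-rendered block (kept verbatim).
type Child = string | { raw: string };

// Render one xml-style tag with its children on indented lines.
function renderTag(name: string, children: Child[], indent = '  '): string {
  const body = children
    .map((c) => (typeof c === 'string' ? escapeText(c) : c.raw))
    .join('\n')
    .split('\n')
    .map((line) => indent + line)
    .join('\n');
  return `<${name}>\n${body}\n</${name}>`;
}
```

Nesting works by passing an already-rendered block as a `raw` child, which loosely parallels how JSX children compose.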

Execute

The last box on the drawing above is the one labelled LLM - here's how that looks in our code. We're taking a string prompt and using an async generator function to stream the achievements back to the caller as they come in:

/**
 * Executes the rendered prompt and yields the extracted achievements
 * 
 * @param prompt string
 * @returns AsyncGenerator<ExtractedAchievement, void, unknown>
 */
export async function* execute(prompt: string): AsyncGenerator<ExtractedAchievement, void, unknown> {
  const { elementStream } = streamObject({
    model: extractAchievementsModel,
    prompt,
    temperature: 0,
    output: 'array',
    schema: achievementResponseSchema,
  });

  for await (const element of elementStream) {
    yield {
      ...element,
      summary: element.summary || '',
      details: element.details || '',
      eventStart: element.eventStart ? new Date(element.eventStart) : null,
      eventEnd: element.eventEnd ? new Date(element.eventEnd) : null,
      impactSource: 'llm',
      impactUpdatedAt: new Date(),
    };
  }
}

The execute function above is a generator function that yields ExtractedAchievement objects as they come in from the LLM. It's doing the data transformation from LLMExtractedAchievement to ExtractedAchievement that we mentioned earlier.

Convenience Functions

Let's remind ourselves of our diagram:

Turning to the 3 other functions we have defined in the extract-achievements.ts file, renderExecute is a wrapper around render() and execute() - it's really useful for Evals, where we want to have more control over the input data than fetch() would allow. fetchRenderExecute just wraps the fetch part as well, returning a Promise that will resolve to the array of extracted Achievements.

The final function - streamFetchRenderExecute - is a generator function that yields each extracted achievement as it comes in from the LLM. This allows us to stream the achievements into the UI as they come in, which is what we do in the Bragdoc app. But all three of these functions are pretty simple orchestrations of the lower-level functions:

/**
 * Fetches the data, renders the prompt, and executes the prompt, yielding the extracted achievements
 * 
 * @param input ExtractAchievementsFetcherProps
 * @returns AsyncGenerator<ExtractedAchievement>
 */
export async function* streamFetchRenderExecute(input: ExtractAchievementsFetcherProps): AsyncGenerator<ExtractedAchievement> {
  const data = await fetch(input);

  for await (const achievement of execute(await render(data))) {
    yield achievement;
  }
}

/**
 * Fetches the data, renders the prompt, and executes the prompt, returning the extracted achievements
 * 
 * @param input ExtractAchievementsFetcherProps
 * @returns Promise<ExtractedAchievement[]>
 */
export async function fetchRenderExecute(input: ExtractAchievementsFetcherProps): Promise<ExtractedAchievement[]> {
  const data = await fetch(input);

  return await renderExecute(data);
}

/**
 * Renders the prompt and executes it, returning the extracted achievements
 * 
 * @param data ExtractAchievementsPromptProps
 * @returns Promise<ExtractedAchievement[]>
 */
export async function renderExecute(data: ExtractAchievementsPromptProps): Promise<ExtractedAchievement[]> {
  const achievements: ExtractedAchievement[] = [];
  
  for await (const achievement of execute(await render(data))) {
    achievements.push(achievement);
  }

  return achievements;
}

In the actual bragdoc.ai app, we try to give a better UX by streaming the Achievements into the UI as the LLM returns them. Our tool call therefore calls streamFetchRenderExecute directly, which lets it render each Achievement into the UI immediately.

Similarly, for the Evals that we run against this code, we also want to chop and change the pipeline a little bit. From the perspective of our Evals, we want to bypass the Fetcher stage so that we can provide our own ExtractAchievementsPromptProps and test that we got the right ExtractedAchievement objects back.

Testing

Breaking things up like this means we can write effective tests for each step of the process:

Router LLM - an eval that checks the Router is calling the right tool with the right data
Fetcher - unit tests that check the right data is being fetched
Renderer - unit tests that check the right prompt is being rendered
LLM - an eval that checks the LLM is returning the right data for the prompt we feed it
Orchestrator - unit tests that check the right data is being passed between the steps

In this way we've isolated our LLM invocations into tightly-defined functions. That lets us write fast-executing unit tests for everything else in the pipeline. It also lets us write tight Evals against the LLM parts, passing in well-understood mock data that matches the types we've defined.
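To make the orchestrator tests concrete, here's a minimal sketch in which the three stages are passed in rather than imported. That way a test can swap the real LLM call for a fake async generator.

The Stages interface, streamPipeline and runPipeline are hypothetical names for illustration. They mirror streamFetchRenderExecute and fetchRenderExecute above, but this is not how bragdoc.ai is actually structured:

```typescript
// Hypothetical: the three pipeline stages bundled as injectable functions.
interface Stages<Input, Props, Output> {
  fetch: (input: Input) => Promise<Props>;
  render: (props: Props) => Promise<string>;
  execute: (prompt: string) => AsyncIterable<Output>;
}

// Same shape as streamFetchRenderExecute, but with the stages injected.
async function* streamPipeline<Input, Props, Output>(
  stages: Stages<Input, Props, Output>,
  input: Input
): AsyncGenerator<Output> {
  const props = await stages.fetch(input);
  const prompt = await stages.render(props);
  for await (const item of stages.execute(prompt)) {
    yield item;
  }
}

// Same shape as fetchRenderExecute: collect the stream into an array.
async function runPipeline<Input, Props, Output>(
  stages: Stages<Input, Props, Output>,
  input: Input
): Promise<Output[]> {
  const results: Output[] = [];
  for await (const item of streamPipeline(stages, input)) {
    results.push(item);
  }
  return results;
}

// Fake stages for a unit test. The "LLM" just echoes the prompt it was given,
// so the test can check exactly what was passed between the steps.
const fakeStages: Stages<{ userId: string }, { message: string }, string> = {
  fetch: async ({ userId }) => ({ message: `hello from ${userId}` }),
  render: async ({ message }) => `<user-input>${message}</user-input>`,
  execute: async function* (prompt) {
    yield `saw: ${prompt}`;
  },
};
```

The trade-off is an extra parameter on every call. A thin wrapper that binds the real fetch, render and execute would keep the call sites in the app unchanged.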

But this article has already gone on long enough, so I've split the Evals stuff into its own article called Eval-Driven Design with NextJS & mdx-prompt.

mdx-prompt: Composable LLM Prompts with JSX

Mon, 03 Feb 2025 06:31:02 GMT

I'm a big fan of IndyDevDan's YouTube channel. He has greatly expanded my thinking when it comes to LLMs. One of the interesting things he does is write many of his prompts with an XML structure, like this:

<purpose>
  You are a world-class expert at creating mermaid charts.
  You follow the instructions perfectly to generate mermaid charts.
  The user's chart request can be found in the user-input section.
</purpose>

<instructions>
  <instruction>Generate a valid mermaid chart based on the user-prompt.</instruction>
  <instruction>Use the diagram type specified in the user-prompt.</instruction>
  <instruction>Use the examples to understand the structure of the output.</instruction>
</instructions>

<user-input>
  State diagram for a traffic light. Still, Moving, Crash.
</user-input>

<examples>
  <example>
    <user-chart-request>
      Build a pie chart that shows the distribution of Apples: 40, Bananas: 35, Oranges: 25.
    </user-chart-request>
    <chart-response>
      pie title Distribution of Fruits
        "Apples" : 40
        "Bananas" : 35
        "Oranges" : 25
    </chart-response>
  </example>
  //... more examples
</examples>

I really like this structure. Prompt Engineering has been a dark art for a long time. We're suddenly programming using English, which is hilariously imprecise as a programming language, and it feels not quite like "real engineering".

But prompting is actually not programming in English, it's programming in tokens. It just looks like English, so it's easy to fall into the trap of giving it English. But we're not constrained to that at all actually - we can absolutely format our prompts more like XML and reap some considerable rewards:

It's easier for humans to reason about prompts in this format
It's easier to reuse content across prompts
It's easier to have an LLM generate a prompt in this format (see IndyDevDan's metaprompt video)

We've seen this before

I've started migrating many of my prompts to this format, and noticed a few things:

It organized my thinking around what data the prompt needs
Many prompts could or should use the same data, but repeat fetching/rendering logic each time

For example, bragdoc.ai basically does 2 things with LLMs: Extracting Achievements from written text, and Generating Documents from Achievements. We can extract Achievements from either a chatbot message or a git commit history, so we have a separate prompt for each, but as you can imagine those prompts have a huge amount in common.

To each we also provide the user's project and company data, as well as any custom instructions from the user. But the Commit Extractor is given a set of commits and repo data, whereas the Text Extractor is fed a message and a chat history. The Document Generator also needs to be fed with a lot of the same data - companies and projects - but is also given a set of Achievements to generate a document from.

The Venn diagram would look something like this:

Because of all of the overlap, I'd extracted a bunch of functions that looked like this:

const renderCompany = (company: Company) => {
  return `
    <company>
      <name>${company.name}</name>
      <id>${company.id}</id>
      <role>${company.role}</role>
      <domain>${company.domain || 'N/A'}</domain>
      <startDate>${company.startDate.toISOString()}</startDate>
      <endDate>${company.endDate ? company.endDate.toISOString() : 'Present'}</endDate>
    </company>
  `;
}

const renderCompanies = (companies: Company[]) => {
  return `<companies>` + companies.map(renderCompany).join('\n') + `</companies>`;
}

Then my prompts would do things like this:

export function renderPrompt(input: ExtractAchievementsInput) {
  const chatStr = input.chatHistory
    .map(({ role, content }) => `${role}: ${content}`)
    .join('\n');

  const prompt = `Extract all achievements from the following user message. 
Consider the chat history and context to understand the full scope of each achievement.
Pay special attention to:
1. Recent updates or progress reports
2. Completed milestones or phases
3. Team growth or leadership responsibilities
4. Quantitative metrics or impact
5. Technical implementations or solutions

<user-message>
${input.input}
</user-message>

<chat-history>
${chatStr}
</chat-history>

<context>
${renderCompanies(input.companies)}
${renderProjects(input.projects)}
</context>

More blah blah blah of unstructured instructions and exhortations.`;

  return prompt;
}

Ok, but this looks kinda familiar. We're just using string interpolation, but we're achieving quite a few React-y things:

We're composing our prompt from smaller functions
We're rendering synchronously with well-defined props
We're rendering data into structured text

Can't we just use React to do this?

Use MDX for Prompt Composability and Reuse

What if we could write the prompts more like this:

<Purpose>
  You are a careful and attentive assistant who extracts work achievements
  from conversations between users and AI assistants. Extract all of the
  achievements in the user message contained within the {`<user-input>`}
  tag. Follow all of the instructions provided below.
</Purpose>
<Instructions>
  <Instruction>
  Pay special attention to:
1. Recent updates or progress reports
2. Completed milestones or phases
3. Team growth or leadership responsibilities
4. Quantitative metrics or impact
5. Technical implementations or solutions
  </Instruction>

  <Instruction>
  Each achievement should have a clear, action-oriented title (REQUIRED) that:
  - Starts with an action verb (e.g., Led, Launched, Developed)
  - Includes specific metrics when possible (e.g., "40% reduction", "2x improvement")
  - Mentions specific systems or teams affected
  - Is between 10 and 256 characters
  </Instruction>
  <Instruction>
  Example good titles:
  - "Led Migration of 200+ Services to Cloud Platform"
  - "Reduced API Response Time by 40% through Caching"
  - "Grew Frontend Team from 5 to 12 Engineers"
  </Instruction>
  <Instruction>Do not invent details that the user did not explicitly say.</Instruction>
</Instructions>
<InputFormat>{data.message}</InputFormat>
<Variables>
  <today>{new Date().toLocaleDateString()}</today>
  <user-instructions>
    {data.user?.preferences?.documentInstructions}
  </user-instructions>
  <ChatHistory messages={data.chatHistory} />
  <Companies companies={data.companies} />
  <Projects projects={data.projects} />
  <UserInput>{data.message}</UserInput>
</Variables>
<Examples examples={data.examples?.map((e) => JSON.stringify(e, null, 4))} />

Your answer:

The above is a slightly trimmed down version of this extract-achievements.mdx file in the prompts directory of the bragdoc-ai repository. First of all, what are we even looking at here? This is MDX, a mature and well-supported mashup of JSX and Markdown. I use it for this very blog site, and it has some nice attributes for writing prompts:

It supports plain, unstructured text
It supports JSX, allowing us to create React components for our prompts
It supports XML, so we can use the XML-style prompt syntax for structured instructions that don't need a full React component

It renders to something like this:

<purpose>
  You are a careful and attentive assistant who extracts work achievements
  from conversations between users and AI assistants. Extract all of the
  achievements in the user message contained within the &#x3C;user-input>
  tag. Follow all of the instructions provided below.
</purpose>
<instructions>
  <instruction>
    Pay special attention to:
    - Recent updates or progress reports
    - Completed milestones or phases
    - Team growth or leadership responsibilities
    - Quantitative metrics or impact
    - Technical implementations or solutions
    
  </instruction>
  <instruction>
    Each achievement should have a clear, action-oriented title (REQUIRED) that:
    - Starts with an action verb (e.g., Led, Launched, Developed)
    - Includes specific metrics when possible (e.g., "40% reduction", "2x improvement")
    - Mentions specific systems or teams affected
    - Is between 10 and 256 characters
    
  </instruction>
  <instruction>
    Example good titles:
    - "Led Migration of 200+ Services to Cloud Platform"
    - "Reduced API Response Time by 40% through Caching"
    - "Grew Frontend Team from 5 to 12 Engineers"
    
  </instruction>
  <instruction>Do not invent details that the user did not explicitly say.</instruction>

  //... more instructions
</instructions>
<input-format title="You are provided with the following inputs:">Hello</input-format>
<variables>
  <today>2/3/2025</today>
  <user-instructions>If I don't mention a specific project, I'm talking about Brag Doc.</user-instructions>
  <chat-history></chat-history>
  <companies>
    <company>
      <id>7972262d-c63d-4c87-b449-24dc634ca152</id>
      <name>Egghead Research</name>
      <role>Chief Scientist</role>
      <start-date>12/31/2022</start-date>
      <end-date>Present</end-date>
    </company>
    <company>
      <id>65e274f2-4f5a-4e68-89ca-0fcf9c4898cb</id>
      <name>Palo Alto Networks</name>
      <role>Principal Engineer</role>
      <start-date>1/31/2016</start-date>
      <end-date>9/29/2021</end-date>
    </company>
  </companies>
  <projects>
    <project>
      <id>24ac74b8-dfe6-4fdc-bf34-4a6b9ee22be6</id>
      <name>BragDoc.ai</name>
      <description>AI-powered self-advocacy tool for tech-savvy individuals.</description>
      <status>active</status>
      <company-id>7972262d-c63d-4c87-b449-24dc634ca152</company-id>
      <start-date>12/14/2024</start-date>
      <end-date>Present</end-date>
      <remote-repo-url></remote-repo-url>
    </project>
    <project>
      <id>9af96ca7-614e-4566-bb57-f4376a393c43</id>
      <name>mdx-prompt</name>
      <description>Composable LLM prompts with JSX and MDX</description>
      <status>active</status>
      <company-id>7972262d-c63d-4c87-b449-24dc634ca152</company-id>
      <start-date>12/31/2022</start-date>
      <end-date>6/29/2023</end-date>
      <remote-repo-url></remote-repo-url>
    </project>
  </projects>
  <user-input>Hello</user-input>
</variables>
<examples>
  <example>
    {
    "eventStart": "2024-06-15",
    "eventEnd": "2024-09-15",
    "eventDuration": "quarter",
    "title": "Launched AI Analysis Tool with 95% Accuracy at Quantum Nexus",
    "summary": "Developed an AI tool for real-time data analysis with 95% accuracy for Quantum Nexus, playing a pivotal role in Project Orion's success.",
    "details": "As part of Project Orion at Quantum Nexus, I was responsible for developing a cutting-edge AI tool focused on real-time data analysis. By implementing advanced algorithms and enhancing the training data sets, the tool reached a 95% accuracy rate. This result significantly supported the company's research objectives and has been positively acknowledged by stakeholders for its robust performance and reliability.",
    "companyId": "e3856e75-37cf-4640-afd9-e73a53fa967d",
    "projectId": "3923129e-719b-4f99-8487-9830cf64ad5d",
    "impact": 2
    }
  </example>
  //... more examples
</examples>
Your answer:

The <Purpose>, <Instructions>, and <Variables> tags are all just very basic JSX components exported by the mdx-prompt library. They're part of the standard package of core components that come with mdx-prompt, but they're super basic and it's easy to make your own.

You can imagine how similar the extract-commit-achievements.mdx prompt looks, with a lot of reused components and a tweaked prompt and set of instructions. The generate-document.mdx prompt looks quite similar, with the same companies and projects components being rendered, but also a set of achievements to generate a document from.

Beyond the built-in components, we also used a bunch of our own up there, chiefly the Companies and Projects components. They're just normal React components (elements.tsx):

export function Company({ company }: { company: CompanyType }) {
  return (
    <company>
      <id>{company.id}</id>
      <name>{company.name}</name>
      <role>{company.role}</role>
      <start-date>{company.startDate.toLocaleDateString()}</start-date>
      <end-date>{company.endDate?.toLocaleDateString() || 'Present'}</end-date>
    </company>
  );
}

export function Companies({ companies }: { companies: CompanyType[] }) {
  return (
    <companies>
      {companies.map((company) => (
        <Company key={company.id} company={company} />
      ))}
    </companies>
  );
}

That's what mdx-prompt lets you do. It's a simple library that lets you write your prompts in JSX, and then render them to a string. It works great alongside React.

Benefits

We get a number of benefits from this:

Reuse of components such as <Companies /> and <Projects />
Composability of prompts, where we can easily add or remove sections
JSX syntax highlighting and linting
Familiarity with JSX for React developers
A well-defined set of props required to render the prompt

That last one is important. By creating a JSX prompt in this way, we've forced ourselves to distill down to the essential data that the prompt needs, as expressed in our ExtractAchievementsPromptProps type. Not only does this make it easier to understand what data we need to assemble for the prompt, it also makes it easier to run evals against the prompt with mock data:

// It's much easier to reason about what a prompt needs with an interface.
// Much easier to feed test and eval data to as well.
export interface ExtractAchievementsPromptProps {
  companies: Company[];
  projects: Project[];
  message: string;
  chatHistory: Message[];
  user: User;
}
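The overlap between prompts can also be expressed in the types. This is a minimal sketch with the following assumptions:
- Company, Project, User and Message are simplified stand-ins for the app's real types.
- BasePromptProps is a hypothetical interface shared by the two extraction prompts.
- The commit props are trimmed down to just the commits.

```typescript
// Hypothetical, simplified stand-ins for the app's real types.
interface Company { id: string; name: string }
interface Project { id: string; name: string; companyId: string }
interface User { id: string; name: string }
interface Message { role: 'user' | 'assistant'; content: string }
interface Commit { hash: string; message: string }

// Data that the overlapping prompts all need.
interface BasePromptProps {
  user: User;
  companies: Company[];
  projects: Project[];
}

// Each prompt adds only what is specific to it.
interface ExtractAchievementsPromptProps extends BasePromptProps {
  message: string;
  chatHistory: Message[];
}

interface ExtractCommitAchievementsPromptProps extends BasePromptProps {
  commits: Commit[];
}

// One mock-data factory for the shared part, with per-test overrides.
function mockBaseProps(overrides: Partial<BasePromptProps> = {}): BasePromptProps {
  return {
    user: { id: 'user-1', name: 'Test User' },
    companies: [{ id: 'co-1', name: 'Acme' }],
    projects: [{ id: 'proj-1', name: 'Widget', companyId: 'co-1' }],
    ...overrides,
  };
}

const extractProps: ExtractAchievementsPromptProps = {
  ...mockBaseProps(),
  message: 'I shipped the new billing page today',
  chatHistory: [],
};
```

The document generator takes a single project and company rather than arrays. In practice it might share only the user with this base.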

I didn't want this article to be too long, so I'm publishing 2 other articles at the same time that go deeper into this. The first is mdx-prompt: Real World Example Deep Dive, where we look at a real-world example of mdx-prompt being used in a production open source Next JS application. The second is EDD: Eval-Driven-Design with mdx-prompt, where we look at how to write tests for these prompts.

Downsides / Challenges

I had to swim uphill a little to get this working in all the different places we want it to, chiefly when it comes to rendering:

React / Next JS compatibility quirks

mdx-prompt needs to work in a bunch of different places:

Rendering in API endpoints to power LLM calls
Rendering in CLI functions like npx braintrust eval
Working in test environments like Jest
Rendering preview prompts into your UI a la https://www.bragdoc.ai/prompt

I spend most of my time in Next JS and don't have a full mental model of its integration with React. A bunch of times while creating mdx-prompt I ran into problems with incompatible React versions. Some of that was just a bad rollup configuration, but annoying problems abound and this stuff can be a little quirky to get running in all places at once.

I never did find a way to have a React Server Component render one of the prompts to text, which is what I wanted to be able to do in an RSC along with the Bright syntax highlighting library. In the end I used a server endpoint to render the prompt to text and used a client component to fetch that text via an API call, but it's slightly inelegant. On the other hand, it is interesting to have API endpoints that return well-structured LLM prompts. Maybe that will be useful elsewhere.

Some TypeScript chores

Obviously, most of these XML tags that I'm using in my prompts don't exist in the HTML spec, so TypeScript is not happy with you. In Next JS, I've found that you can just declare them as JSX.IntrinsicElements and TypeScript will be happy. In my Next JS app I just created a global.d.ts file like this:

import React from 'react';

type CustomHTMLProps = React.DetailedHTMLProps<
  React.HTMLAttributes<HTMLElement>,
  HTMLElement
>;

// Define any custom tags you want to permit for your LLM prompts here
// [I've severely truncated this list]
type CustomTags =
  | 'companies'
  | 'company'
  | 'projects'
  | 'project'
  | 'name'
  | 'remote-url';

// Adds all the custom tags to the JSX namespace
declare module 'react' {
  namespace JSX {
    interface IntrinsicElements extends Record<CustomTags, CustomHTMLProps> {}
  }
}

Ok, that's cool and the TypeScript errors go away and everything builds, tests and runs just fine. It's a bit annoying though. The actual CustomTags is a lot longer than that. Maybe there's some better TypeScript or JSX trick that could make this more pleasant.

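Note, though, that you only need to do this for the lowercase tags used inside your .tsx component files, like <company> in elements.tsx above. TypeScript doesn't type-check your .mdx files, so any xml-style tags you write directly in them will just get rendered as expected.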
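One TypeScript option at least keeps the long list in one place: derive the CustomTags union from a readonly array. That also gives you a runtime list to check against in tests.

This is a sketch, not what bragdoc.ai does, and customTags and isCustomTag are hypothetical names. A .d.ts file can't hold runtime values, so the array would need to live in an ordinary .ts module. global.d.ts would then pull the type in with `import type`:

```typescript
// Single source of truth for the custom prompt tags.
// `as const` keeps each entry as a string-literal type instead of `string`.
const customTags = [
  'companies',
  'company',
  'projects',
  'project',
  'name',
  'remote-url',
] as const;

// Derives the union 'companies' | 'company' | ... from the array, so the
// list only has to be maintained in one place.
type CustomTags = (typeof customTags)[number];

// Runtime check, e.g. for a test asserting that every tag a prompt uses is declared.
function isCustomTag(tag: string): tag is CustomTags {
  return (customTags as readonly string[]).includes(tag);
}
```

A blunter alternative is to give IntrinsicElements a string index signature (`[elemName: string]: any`). That accepts every lowercase tag, at the cost of no longer catching typos in tag names.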

Abuse of ReactDOM

Ultimately this is a bit of a hack. Because it's using ReactDOM to render the JSX to a string, it also has edge-case bugs. For example, if you use a <title>Some Title</title> tag as part of your JSX prompt, the <title> gets hoisted to the top of the document, because ReactDOM thinks it's rendering an HTML document. There are probably other quirks like that.

Of course, React renders text just fine too, so you can totally just use mdx-prompt without using xml-style tags and still benefit from all the composability and reuse stuff.

We're not importing in the usual way

As I write about in the deep-dive post, we're using JSX/MDX to author and render our prompts. The way we do that at the moment is like this:

import path from "path";
import { renderMDXPromptFile } from "mdx-prompt";
import * as components from './prompts/elements';

const promptPath = path.resolve("./lib/ai/prompts/extract-achievements.mdx");

/**
 * Renders the Extract Achievements Prompt
 * 
 * @param data ExtractAchievementsPromptProps
 * @returns string
 */
export async function render(data: ExtractAchievementsPromptProps): Promise<string> {
  return await renderMDXPromptFile({
    filePath: promptPath,
    data,
    components
  });
}

The render() function returns the rendered prompt as a string. We're using the renderMDXPromptFile helper from mdx-prompt to do the rendering, which is in turn using ReactDOM to render the JSX to a string. This is part of what allows us to not have to import all (or indeed any) of the components that our .mdx files use - the .mdx prompt files can just render the <Purpose />, <Instructions />, and other built-in mdx-prompt components, and we pass in all of our app's custom ones defined in elements.tsx.

There are ups and downs associated with this. One issue is that we lose type checking, which I discuss a little more in the deep-dive post. It's also possible that some bundlers won't realize the .mdx file is needed, since it's read from a path on disk rather than imported, and will leave it out of the build. It's worked just fine for me in my Next JS apps, though.

Integrating into the UI

One of the appealing things about writing prompts with JSX/MDX is that you can just render it like any other React component. This makes it fairly easy to render our prompts into the UI, so that we can iterate on them, feed them different data, etc. It really does beat console.logging prompts to the terminal, where we can't benefit from syntax highlighting and it's easy to just lose things in the noise.

The fact that mdx-prompt is just React is a little deceptive, though, as when we render React components into the browser, we ultimately want the DOM API to turn the tags into DOM elements, whereas we just want our mdx-prompt to be rendered as a string.

I made a page at https://www.bragdoc.ai/prompt that uses Next JS to render prompts in the browser. You can open that page (no account needed to access this page) and see exactly what the rendered prompts used by bragdoc.ai look like:

I tried to do that in an RSC but Next JS really doesn't want you rendering React components into strings in server components. It seems to be a somewhat common issue judging by the GitHub issues of people discussing how to get around it. In the end I just decided to create a simple API to render the prompt to a string, then fetch it via SWR in the browser:

import { NextResponse } from 'next/server';
import { render as renderExtractAchievements } from '@/lib/ai/extract-achievements';
import { render as renderExtractCommitAchievements } from '@/lib/ai/extract-commit-achievements';
import { render as renderGenerateDocument } from '@/lib/ai/generate-document';
import {
  companies,
  projects,
  user,
  repository,
  commits,
} from '@/lib/ai/prompts/evals/data/user';
import { chatHistory, expectedAchievements as examples } from '@/lib/ai/prompts/evals/data/extract-achievements';
import { existingAchievements } from '@/lib/ai/prompts/evals/data/weekly-document-achievements';

// Route params arrive as a Promise in Next.js 15+
type Params = Promise<{
  id: string;
}>;

// This is a Server Route, so no "use client" here
export async function GET(
  request: Request,
  { params }: { params: Params }
) {
  const { id } = await params;

  let input: any;
  let prompt = '';

  switch (id) {
    case 'extract-achievements':
      input = {
        user,
        message: 'Hello',
        chatHistory,
        projects,
        companies,
        examples
      };

      prompt = await renderExtractAchievements(input);
      break;
    case 'extract-commit-achievements':
      input = {
        user,
        companies,
        projects,
        repository,
        commits,
      };

      prompt = await renderExtractCommitAchievements(input);
      break;
    case 'generate-document':
      prompt = await renderGenerateDocument({
        user,
        docTitle: 'Weekly Update',
        days: 7,
        achievements: existingAchievements,
        project: projects[0],
        company: companies[0],
        userInstructions: 'Always use the title "Weekly Update"'
      });
      break;
    default:
      // Unknown prompt id: fail loudly instead of returning an empty 200
      return new NextResponse(`Unknown prompt id: ${id}`, { status: 404 });
  }

  // Return the rendered prompt as an HTML response
  return new NextResponse(prompt, {
    status: 200,
    headers: {
      'Content-Type': 'text/html',
    },
  });
}

It's not quite as idiomatic as passing props to React components, and I don't love switching over the id like this, as it won't scale particularly well. But it does give us a really easy way to render a nicely formatted prompt to a string. It's using fake data in this case, but it's also easy to integrate it with the session, your database, or whatever you need.

This is another way that Evals prove their worth - having good data for Evals means by definition that you have good data to render your prompts with. It's trivial to throw together an app/prompts/page.tsx file with contents like the above and a few seconds later see the full prompt rendered in your browser, instead of searching for it in terminal output or logs.
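On the scaling concern with the switch in the prompt route, one alternative is a lookup table keyed by prompt id, where each entry builds its own input and renders it. This sketch uses a hypothetical fakeRender in place of the real render functions so that it runs on its own. promptRegistry and renderPromptById are likewise illustrative names:

```typescript
// A renderer builds its own (fake) input and returns the rendered prompt.
type PromptRenderer = () => Promise<string>;

// Hypothetical stand-in for renderExtractAchievements, renderGenerateDocument, etc.
const fakeRender = (tag: string) => async (data: object): Promise<string> =>
  `<${tag}>${JSON.stringify(data)}</${tag}>`;

// Adding a prompt means adding an entry here, not another switch case.
const promptRegistry: Record<string, PromptRenderer> = {
  'extract-achievements': () =>
    fakeRender('extract-achievements')({ message: 'Hello' }),
  'generate-document': () =>
    fakeRender('generate-document')({ docTitle: 'Weekly Update', days: 7 }),
};

// Resolves to null for unknown ids so the route can answer with a 404.
async function renderPromptById(id: string): Promise<string | null> {
  // Only own keys count; otherwise ids like "toString" would hit Object.prototype.
  const renderer = Object.prototype.hasOwnProperty.call(promptRegistry, id)
    ? promptRegistry[id]
    : undefined;
  return renderer ? renderer() : null;
}
```

In the route, GET would then call renderPromptById(id) and return a 404 when it gets null back.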

In part 2, we'll go through a complete example of how bragdoc.ai uses mdx-prompt for all of its core LLM capabilities. Then, in part 3 we take a more in-depth look at how easy it is to create Evals with mdx-prompt. In 2025 Evals really are table stakes for any AI app, and they're worth embracing early as they'll almost certainly improve what may well be the most important part of your app.

How I built bragdoc.ai in 3 weeks

Wed, 08 Jan 2025 06:31:02 GMT

Feel free to run your own version if you don't want to pay me the $2.50/month for the hosted version. But mostly it's there as a reference for how to build a product like this with AI tooling.

As we start 2025, it's never been faster to get a SaaS product off the ground. The frameworks, vendors and tools available make it possible to build in weeks what would have taken months or years even just a couple of years ago.

But it's still a lot.

Even when we start from a base template, we still need to figure out our data model, auth, deployment strategy, testing, email sending/receiving, internationalization, mobile support, GDPR, analytics, LLM evals, validation, UX, and a bunch more things:

This morning I launched bragdoc.ai, an AI tool that tracks the work you do and writes things like weekly updates & performance review documents for you. In previous jobs I would keep an achievements.txt file that theoretically kept track of what I worked on each week so that I could make a good case for myself come review time. Bragdoc scratches my own itch by keeping track of that properly with a chatbot who can also make nice reports for me to share with my manager.

But this article isn't much about bragdoc.ai itself, it's about how a product like it can be built in 3 weeks by a single engineer. The answer is AI tooling, and in particular the Windsurf IDE from Codeium.

In fact, this article could easily have been titled "Use Windsurf or Die". I've been in the fullstack software engineering racket for 20 years, and I've never seen a step-change in productivity like the one heralded by Cursor, Windsurf, Repo Prompt and the like. We're in the first innings of a wave of change in how software is built.

Productivity Pays

Any time I can add 5% to my productivity, I grab it with both hands. Last year I wrote about what I think is the best hardware setup for software engineers. I guesstimate that I get about a 20% productivity bump from that vs a standard 2-monitor setup, or 70% vs just a laptop screen. I spent years and thousands of dollars chasing that 20%. It's a big deal.

And then Windsurf came along, and within a few days my output across a full afternoon of work was consistently double what it would usually have been. I strongly suspect it will double again in the next year or so, then double again after that.

Because the amount we get paid is directly proportional to the value we create for our employers, maximizing our output is the most important thing we can do for our careers. This is as it should be. Embrace these tools or get out-competed by those who do.

This type of tooling is already the greatest productivity booster I've seen in my career, and it hasn't come close to its full potential yet. AI tooling for software engineering is no longer optional.

The Stack

So let's dive into how I built bragdoc.ai in 3 weeks. The first ingredient was 20 years of experience doing this sort of thing. That's the hardest part of this equation and the reason why I think senior engineers are set to benefit most from these tools (see Winners and Losers below). But whether you have that or not, the following ingredients are a pretty good place to start:

Vercel Chat - Excellent starting template for a chatbot++ AI project
Windsurf - The AI IDE that makes it all possible (see also: Cursor, Repo Prompt)
Vercel AI SDK - Excellent library for all kinds of LLM calls. Already baked in to the Vercel Chat template.
Tailwind UI - Great collection of well-designed components that make it easy to build a good looking UI quickly
Braintrust - Easy tracking for LLM Evals, which are unit tests for your AI calls
Wispr Flow - Because we're just prompting, it's often much faster/easier to speak than to type
Vercel - Hosting, CI/CD, and a bunch of other stuff that makes it easy to get a product off the ground quickly

There are other things in the mix, obviously, but those are the main ingredients. Now let's look at how I use them.

Building things quickly in 2025

So here are my top tips on how to use the current set of AI software engineering tooling effectively to increase your output.

I'm couching these in terms of using Windsurf, but the same principles apply to Cursor, Repo Prompt, and any other AI tooling that comes along.

.windsurfrules

Absolutely critical in all this is setting up a system prompt for the LLMs doing work for you. Yifan's YouTube video on this explains it so clearly that I won't try to do better here. The same things he talks about for Cursor apply equally to Windsurf. My .windsurfrules file for bragdoc is a couple of hundred lines long, and I've edited it a dozen times as I iterate on it.

Doing the initial work to learn about that and implement that file took several hours. The second time I do that it'll take 30 minutes. But that time invested is critical in getting the AI tooling to do what you actually want. It means you don't have to keep repeating yourself about how you want things done when you talk to the bot, but can focus on the actual work you want done.

README.md and FEATURES.md

LLMs still have pretty limited context windows, and any application of even moderate complexity will quickly exceed that. When I work on a codebase that I'm familiar with, I have a mental model of most of the stuff in that codebase, but LLMs don't have that benefit so we need to give them a hand.

Being able to point Windsurf at high quality documentation (README.md) and a description of all of the features that already exist (FEATURES.md) can give it that mental model without chewing up a ton of context. The other benefit of having a summarization of your entire codebase is that current generation LLMs start forgetting things when the context is long, so shorter chat sessions tend to produce better results. You can of course use Windsurf itself to keep those files updated too.

Feature Implementation Process

When I start working on a new feature - let's say it's support for assigning an impact rating to an Achievement - I start a new chat session inside Windsurf and tell it something like this:

The generated requirements.md will usually miss the mark on a few things so I will typically need to ask Windsurf to fix a few things, but we quickly converge on a detailed document that describes the feature at a level that a junior engineer should have no trouble implementing. In this instance you can see I actually had it create a PLAN.md file as well, which is a more detailed breakdown of the work to be done. Sometimes that can yield better results, especially for complex tasks.

Once we've got that, I'll commit it to a new branch, and ask Windsurf to start implementation. I ask it to keep this file updated as it proceeds with the work, and to keep a log.md file in the same directory that describes the work it's done. This way I can easily start new chat sessions to reset to a short LLM context and therefore better results.

From there it's generally a matter of just iteratively asking Windsurf to proceed with the next part of the implementation. It will take some number of steps before it stops and asks you what to do next - for the impact feature requirements.md I probably had to prompt it a couple of dozen times to get to a complete implementation. At least 50% of the edits it makes result in TypeScript errors, so there is some tedious work fixing those, but it's still enormously faster than typing it all out myself.

The final PR for the impact feature shows that I made 14 commits over a roughly 90-minute session to get it done, with about 1000 LOC changed. Committing often is highly recommended, as it's easy for Windsurf to do completely the wrong thing and you want to be able to roll back to a known good state.

Several hundred lines of that are the documents that Windsurf creates and uses to keep itself on track. Once the implementation is finished, one could and probably should delete the contents of the features/impact directory, as it's unlikely to be useful in the future, but one nice side benefit here is that we can implement some of the feature, go work on something else, then come back and finish it later as if we'd never left.

Tracking TODOs and bugs

I'd love to be able to use GitHub issues, and I'm sure that's the way it will go in the coming months, but for now I use a simple TODO.md to track issues and work that needs to be done. Until these tools get a native ability to CRUD issues in whatever issue tracker you want, this approach seems to work well enough for a single developer. Obviously, it won't scale well.

As with most of the other docs in the repo, a lot of the TODO.md was written by Windsurf itself in response to a prompt. Six months from now this approach will probably make no sense any more.

Disclaimers & Caveats

I have no affiliation with Codeium other than being a happy customer.

All the work I've been doing with Windsurf is stuff that the various parts of it have probably been trained on a lot. TypeScript projects using NextJS and React, a fairly common set of libraries, established API patterns, etc.

Bragdoc currently has about 500k tokens of content, between the code and the documentation. Both of those are really important to Windsurf. Many React applications will have 10x that. At the moment, none of the models available in Windsurf can deal with that much context simultaneously, so it is prone to forgetting things. I'm sure there are already various tricks in place to reduce that, but it also seems likely that models will continue to support larger and larger context lengths. It will only get better at this, and it's plausible that it will get close enough to perfect to stop being frustrating.

I highly recommend IndyDevDan's YouTube Channel for insights about how to use these tools but also skate to where the puck is heading. Many of these caveats are going to disappear in the next year or so.

It's like a junior engineer

Windsurf is a junior engineer from the seventh dimension, in that it's highly capable at many things, needs guidance in others, and sometimes does things that make no sense whatsoever.

50% of the TypeScript file changes it makes result in type errors, which you then have to go through and fix. Often if you tell it "TS errors in index.ts" it will fix them, but it is frustrating to have to do that. I'm sure that will get fixed before long, and will make a material difference in the output once again.

There's actually a lot more to be had

So after a few weeks of using it, I'm pretty satisfied that Windsurf is generally able to compress a day's work into an afternoon. However, it actually feels both fast and slow at the same time at the moment. I haven't taken actual measurements, so pinch of salt and all that, but it will often take around 10 seconds to make a change to a file, so when it chains 10+ edits together (which is amazing), it can take a minute or so to complete.

This part needs to get 10x faster, and I'm sure it will. At that point I think it will be able to compress work that would have taken a day in 2023 into about an hour.

Winners and Losers

This is a game-changing technology that is likely to disrupt our entire industry. As is often the case, it will probably benefit some a lot more than others. Putting my old Engineering Manager hat on, if I want to increase the velocity of my team, I don't want to add more junior engineers, I want a faster Windsurf for my senior folks. And I'm sure that's what I'll get before long. I wouldn't want to be a junior engineer in the job market in 2025.

Leetcode's days are numbered for the same reason we don't ask people to do long division by hand in interviews. Companies will no longer be able to perform the mental gymnastics required to ask a candidate to solve a logic problem they will never encounter, without the aid of a tool they will definitely have available and would be foolish not to use. Too many heads will implode, but more importantly too much money is at stake.

Focus will switch towards experience and ability to use those very tools to have high output for the company. Capitalism demands it. There was already a high dynamic range in software engineering - engineers who could create 10x more value than others. These AI tools add another order of magnitude to that range.

That ought to be across the board but it's possible that the old timers won't embrace it as fast as the younger engineers, so there may be an opportunity for some to close that gap a little.

NarratorAI: Trainable AI assistant for Node and React

Fri, 04 Oct 2024 06:31:02 GMT

Every word in every article on this site was, for better or worse, written by me: a real human being. Recently, though, I realized that various pages on the site kinda sucked. Chiefly I'm talking about the Blog home page, tag pages like this one for articles tagged with AI and other places where I could do with some "meta-content".

By meta-content I mean content about content, like the couple of short paragraphs that summarize recent posts for a tag, or the outro text that now appears at the end of each post, along with the automatically generated Read Next recommendations that I added recently using ReadNext.

If you go look at the RSC tag, for example, you'll see a couple of paragraphs that summarize what I've written about regarding React Server Components recently. The list of article excerpts underneath it is a lot more approachable with that high-level summary at the top. Without the intro, the page just feels neglected and incomplete.

But the chances of me remembering to update that intro text every time I write a new post about React Server Components are slim to none. I'll write it once, it'll get out of date, and then it will be about as useful as a chocolate teapot. We need a better way. Ideally one that also lets me play by watching the AI stream automatically generated content before my very eyes:

AI to the rescue

Although I don't want AI generating my actual content, I'm happy to let it generate the meta-content that surrounds it. That's what NarratorAI does. NarratorAI is a pair of NPM packages that create and present AI-generated content that supports your actual content. It's a bit like a content assistant that writes the boring bits for you:

narrator-ai: The core package that generates the content
@narrator-ai/react: A React library that helps render, regenerate and rate the content

You can use narrator-ai whether you use React or not, but the two do go well together. You don't need to use @narrator-ai/react if you don't want to, but it does some nice things for you, like letting you easily regenerate and rate the content that narrator-ai generates (see gif above and live demo below).

Live Demo me already

This little thing inside the box below is a demo of NarratorAI in action. It's showing you the "What to Read Next" text for this very post (scroll down to the bottom of this article to see that). This piece of content was generated by narrator-ai, and I'm using @narrator-ai/react below to render it.

Although I use @narrator-ai/react to render this type of content throughout the site, I only enable the editorial action buttons when I'm developing locally. For this demo below, though, I've enabled the regenerate, thumbs up and thumbs down buttons for you to play with live, as well as chosen a lovely shade of red to make things stand out:

The regenerate button will stream in a new piece of Markdown content generated by NarratorAI. The thumbs up and thumbs down buttons will rate the content, which will help NarratorAI learn what you like and don't like. You can provide a reason why you did/didn't like the content, and that feedback will be used to improve subsequent generations.

Don't worry, you can't break anything - although it's all hooked up to a real backend, the demo above won't overwrite anything.

How it works

NarratorAI uses a technique called Few Shot Prompting to generate content. This is where we give an LLM a few examples of the type of content we want it to generate, and then ask it to generate more of the same. Equally importantly, we can also give it examples of what we don't want it to generate, and ask it to avoid that.

Few Shot Prompting has a few benefits: it's quick to train, it's easy to understand, and it's portable between different models. At its core it's just a slightly longer prompt - you could fine tune a model to do the same thing, and that's totally reasonable in many cases, but Few-Shotting is way easier (and more portable).

NarratorAI builds on top of the excellent Vercel AI SDK, which means you can configure it to use pretty much any LLM you like. By default it uses GPT-4o, so the only configuration you need to do is set your OPENAI_API_KEY environment variable (the Vercel AI SDK supports a range of AI providers beyond OpenAI).

Generating content

Right now I generate 2 types of content with NarratorAI:

Intros for tag pages and the blog home page
Outros for the end of each post, telling you about related articles

In order to generate the intros, I need to grab the most recent X articles on tag XYZ and pass them to Narrator along with a prompt telling it what I want it to do. For the outros, I need to do a similar thing, except I need to find the X articles that are the most related to the one I am generating for. That turns out to be pretty easy as I'm already using ReadNext to automatically generate the related articles list.

Because there's a little logic involved in assembling the pieces that I need to send to the LLM for each task, and because I want to be able to generate a given piece of content from either the UI or a CLI, I created a TaskFactory class that does all the heavy lifting for me. Here's a simplified version of how I use it:

//create our reusable Narrator instance
export const narrator = new Narrator({
  outputFilename: (docId) => `${docId}.md`,
  outputDir: path.join(process.cwd(), "editorial"),
  examplesDir: path.join(process.cwd(), 'editorial', 'examples'),
});

export class TaskFactory {
  //returns a GenerationTask for a given docId
  jobForId(docId: string): GenerationTask {
    const [exampleKey, slug] = docId.split("/");
    const { publishedPosts } = this.posts;

    if (exampleKey === "post") {
      return this.postJob(publishedPosts.find((post) => post.slug === slug));
    } else if (exampleKey === "tag") {
      return this.tagJob({ tag: slug });
    }
  }

  //returns a GenerationTask for a post outro
  postJob(post): GenerationTask {
    //summaries of related articles
    const relatedArticles = post.related
      ?.map((slug) => this.posts.publishedPosts.find((p) => p.slug === slug))
      .map((post) => ({ post, summary: this.readNext.getSummaryById(post.slug) }));

    return {
      docId: `post/${post.slug}`,
      prompt: postReadNextPrompt(post, this.posts.getContent(post), relatedArticles),
      suffix: "Please reply with a 2 sentence suggestion for what the reader should read next.",
    };
  }

  //returns a GenerationTask for a tag intro
  tagJob({ tag }): GenerationTask {
    //the 10 most recent posts for a given tag
    const recentPosts = this.posts.publishedPosts
      .filter((post) => post.tags.includes(tag))
      .slice(0, 10)
      .map((post) => ({ post, summary: this.readNext.getSummaryById(post.slug) }));

    return {
      docId: `tag/${tag}`,
      prompt: tagIntroPrompt(tag, recentPosts),
    };
  }
}

All that does is give me a function called jobForId that I can pass a docId to, and it will return a GenerationTask object that I can pass to the narrator.generate function. The GenerationTask object contains the prompt that I want to send to the LLM, along with a unique docId that identifies the content that I want to generate, and an optional suffix.
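To make the shape of that object concrete, here's a minimal sketch of how a GenerationTask might be typed and flattened into the final text sent to the model. The interface is inferred from the three fields described above, and buildPromptText is a hypothetical helper, not part of narrator-ai's API. Putting the suffix last is my assumption about why it's a separate field.

```typescript
// Hypothetical shape inferred from the prose above; narrator-ai's real type may differ.
interface GenerationTask {
  docId: string; // e.g. "tag/ai" or "post/some-slug"
  prompt: string; // the main instructions plus context for the LLM
  suffix?: string; // optional closing instruction
}

// Hypothetical helper: joins the prompt and the optional suffix into one string,
// with the suffix last so it is the final instruction the model reads.
function buildPromptText(task: GenerationTask): string {
  const parts = [task.prompt.trim()];
  if (task.suffix) {
    parts.push(task.suffix.trim());
  }
  return parts.join("\n\n");
}

// Usage: a post outro task like the one postJob returns.
const example: GenerationTask = {
  docId: "post/some-slug",
  prompt: "Here are the related articles...",
  suffix: "Please reply with a 2 sentence suggestion for what the reader should read next.",
};
console.log(buildPromptText(example));
```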

Now I can just run a single line of code to generate the intro/outro for any given tag or post:

await narrator.generate(factory.jobForId("tag/ai"));

The one thing I haven't shown you here is the tagIntroPrompt function that TaskFactory refers to. That's just a function that takes a tag and a list of recent posts and returns a prompt that tells the LLM what I want it to generate. Here's a slightly simplified version of that function (the postReadNextPrompt function is similar):

//it's just a string. A long string, but a string.
function tagIntroPrompt(tag: string, recentPosts: RecentPost[] = []) {
  return `
    These are summaries of the ${recentPosts.length} most recent posts on my blog for the tag "${tag}".
    The summaries have been specifically prepared for you so that you have the context you need to write
    a very brief 2 paragraph overview of what I've been writing about recently regarding this tag.
    Write the editorial in my own tone of voice, as if I were writing it myself.
    It should be around 100 words.
    
    *** There's actually more stuff here, but you get the idea ***
  
    Keep it humble and not too high-faluting. I'm a technical blogger, not a poet.
    
    Here are the summaries of the recent blog posts:
    
    ${recentPosts.map(({ post, summary }) => articleRenderer(post, summary)).join("\n\n")}
`;
}

//LLM-friendly string for a given post summary
const articleRenderer = (post, summary) => `
ARTICLE METADATA:
Article Title: ${post.title}
Article relative url: ${post.relativeLink}
Tags: ${post.tags.join(", ")}
Published: ${timeAgo.format(new Date(post.date))}
ARTICLE SUMMARY: ${summary}
`;

It's just returning a string, which is then passed in as the prompt to the generate function. With those pieces in place, generating all ~200 pieces of intro and outro content for the whole site is done with this simple script:

//to expose the OPENAI_API_KEY
import * as dotenv from "dotenv";
dotenv.config();

import Posts from "@/lib/blog/Posts";
import { TaskFactory, narrator } from "@/lib/blog/TaskFactory";

async function main() {
  const taskFactory = new TaskFactory();
  const posts = new Posts();

  //generate post "read next" outros
  for (const post of posts.publishedPosts) {
    await narrator.generate(taskFactory.jobForId(`post/${post.slug}`)!, { save: true });
  }

  //generate the intro per tag (but only for tags with 3 or more posts)
  const tags = posts.getTagsWithCounts().filter(({ count }) => count >= 3);

  for (const tag of tags) {
    await narrator.generate(taskFactory.jobForId(`tag/${tag.tag}`)!, { save: true });
  }

  //generate the overall /blog intro
  await narrator.generate(taskFactory.jobForId("recent-posts")!, { save: true });
}

main()
  .catch(console.error)
  .then(() => process.exit(0));

There's a bunch more documentation for this over on the GitHub page for NarratorAI.

Training the Narrator for better outcomes

You can write the best prompt in the world, but that doesn't mean the model is going to understand it the same way you do. The best way to improve the quality of the content that Narrator generates is to train it by giving it examples of good and bad generations. You can do this in two ways:

Training with the CLI

It's pretty easy to set up a simple script that will train the Narrator for you. Here's a slightly simplified version of the script I use to train the Narrator for this site:

//expose the OPENAI_API_KEY
import * as dotenv from "dotenv";
dotenv.config();

import { TaskFactory, narrator } from "@/lib/blog/TaskFactory";
import Posts from "@/lib/blog/Posts";

async function main() {
  const taskFactory = new TaskFactory();
  const posts = new Posts();

  //iterates over each published post, generating content and asking for my judgment
  for (const post of posts.publishedPosts) {
    await narrator.train(taskFactory.jobForId("post/" + post.slug));
  }
}

main()
  .catch(console.error)
  .then(() => process.exit(0));

This script will iterate over each published post on the site, passing each "What to Read Next" task to Narrator's train function, which will ask me to rate what it generated. I can skip to the next one, but if I give a good/bad rating, that feedback will be used to improve the next generation.

Under the covers, Narrator saves the content, the good/bad verdict and the optional reason you give in a YAML file. Each time the generate function is called, Narrator will select some of the good and bad examples that you've given it and pass them in to the LLM as part of the prompt. That's the Few-Shotting we talked about earlier.
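Narrator's actual selection logic isn't shown in this post, so treat the following as a rough sketch of the general few-shot idea rather than how narrator-ai does it. It assumes a hypothetical Example record holding the content, a good/bad verdict and an optional reason, picks up to N examples of each verdict at random, and appends them to the task prompt as labelled positive and negative examples.

```typescript
// Hypothetical example record, loosely modelled on what the post says Narrator
// stores in YAML: the content, a good/bad verdict and an optional reason.
interface Example {
  docId: string;
  content: string;
  verdict: "good" | "bad";
  reason?: string;
}

// Fisher-Yates shuffle on a copy; rng is injectable so results can be reproduced.
function shuffled<T>(items: T[], rng: () => number = Math.random): T[] {
  const copy = [...items];
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Pick up to `perVerdict` good and up to `perVerdict` bad examples at random.
function selectExamples(examples: Example[], perVerdict: number, rng?: () => number) {
  const good = shuffled(examples.filter((e) => e.verdict === "good"), rng).slice(0, perVerdict);
  const bad = shuffled(examples.filter((e) => e.verdict === "bad"), rng).slice(0, perVerdict);
  return { good, bad };
}

// Render one example, including the human's reason when there is one.
const render = (e: Example): string => (e.reason ? `${e.content}\n(Reason: ${e.reason})` : e.content);

// Assemble the few-shot prompt: task prompt first, then liked and disliked
// examples in clearly labelled sections, then the optional suffix.
function buildFewShotPrompt(prompt: string, good: Example[], bad: Example[], suffix?: string): string {
  const sections = [prompt];
  if (good.length > 0) {
    sections.push("Examples of content I liked:\n\n" + good.map(render).join("\n\n---\n\n"));
  }
  if (bad.length > 0) {
    sections.push("Examples of content I did NOT like (avoid this):\n\n" + bad.map(render).join("\n\n---\n\n"));
  }
  if (suffix) sections.push(suffix);
  return sections.join("\n\n");
}
```

Sampling a handful of examples rather than sending every rating keeps the prompt short as the pile of rated examples grows, and varies which examples the model sees from one generation to the next.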

Training with the React component

You've already seen this bit. The live demo above has more than just a regenerate button - it also has thumbs up and down buttons to train Narrator based on your feedback.

These thumbs up/down buttons are rendered by a component in the @narrator-ai/react library, and are connected to a couple of simple React Server Functions on the backend, which hand most of the work off to NarratorAI. That's configured via a React Context Provider that Narrator - ahem - provides you with:

import { createNarrator } from '@narrator-ai/react';
import { regenerateNarration, saveExample } from '../actions/narration';

const Narrator = createNarrator({
  actions: {
    saveExample,
    regenerateNarration,
  },
});

export default Narrator;

The only thing we configure that provider with is an actions object, which accepts saveExample and regenerateNarration functions. Providing these at the top level of the app means that we can place any number of Narration UI elements throughout the app and they'll all transparently support rating and regeneration.

As far as those Server Functions inside actions/narration.ts go, they're just a couple of simple functions that call the NarratorAI backend:

"use server"
import { TaskFactory, narrator } from '@/lib/blog/TaskFactory';
import { createStreamableUI } from 'ai/rsc';
import { MDXRemote } from 'next-mdx-remote/rsc';
import { Spinner } from '@narrator-ai/react';

//called whenever you click a thumbs up or down
export async function saveExample(example) {
  return await narrator.saveExample(example);
}

//this is all we have to do to support streaming MDX content,
//but this function could totally just return a string instead if streaming isn't your thing
export async function regenerateNarration(docId: string) {
  const editor = await TaskFactory.create();
  const ui = createStreamableUI(<Spinner />);

  (async () => {
    const textStream = await narrator.generate(editor.jobForId(docId), { stream: true, save: true });
    let currentContent = '';

    for await (const delta of textStream) {
      currentContent += delta;
      ui.update(<MDXRemote source={currentContent} />);
    }

    ui.done(<MDXRemote source={currentContent} />);
  })();

  //Narrator knows how to handle Vercel AI text & UI streams as well as vanilla JS strings
  return ui.value;
}

So long as you render the Narrator provider you exported from providers/Narrator.tsx somewhere high up in your app's React component tree, you'll be all set. Something like this (though you'll probably have some other stuff in your actual layout):

import NarratorProvider from "./providers/Narrator";

export default function layout({ children }) {
  return <NarratorProvider>{children}</NarratorProvider>;
}

Now the final thing to do is to actually render our Narration content in our app. Because I do this in a few different places, I made a simple wrapper component that I can reuse:

import { Narration } from "@narrator-ai/react";
import NarrationMarkdown from "./NarrationMarkdown";

const sparkleText = "This summary was generated by AI using narrator-ai.<br /> Click to learn more.";

export function NarrationWrapper({
  id,
  title
}: {
  id: string;
  title: string;
}) {
  return (
    <Narration
      title={title}
      id={id}
      sparkleLink="/about/ai"
      sparkleText={sparkleText}

      //this is what lets me regenerate and rate the content in dev mode only
      showActions={process.env.NODE_ENV === "development"}
    >
      <NarrationMarkdown id={id} />
    </Narration>
  );
}

Most of the heavy lifting is done by the <Narration> component, which is what gives you the regenerate, thumbs up and thumbs down buttons. Note that it doesn't render the actual content for you - it can't know how you want to render your content so you need to do that yourself. In my case I just have a simple NarrationMarkdown component that uses next-mdx-remote to render the content:

"use server";

import { narrator } from "@/lib/blog/TaskFactory";
import { MDXRemote } from "next-mdx-remote/rsc";

async function NarrationMarkdown({ id }) {
  const content = narrator.getNarration(id);

  if (!content) {
    return null;
  } else {
    return <MDXRemote source={content} />;
  }
}

export default NarrationMarkdown;

And that's it. Now you can throw in as many <NarrationWrapper title="This is cool!" id="tag/ai" /> components as you like throughout your app, and they'll all support regeneration and rating of the content. Here's precisely how that React snippet turns out, this time showing the intro content for the AI tag:

Go on, click the regenerate button a few times. You earned it.

Use it in your own project

Anyway, that's it. It's fun and easy to use. There are more docs and examples over on the NarratorAI GitHub page, and you can install it from NPM like this:

npm install narrator-ai @narrator-ai/react

Godspeed and happy generating!

ReadNext: AI Content Recommendations for Node JS

Thu, 12 Sep 2024 06:31:02 GMT

Recently I posted about AI Content Recommendations with TypeScript, which concluded by introducing a new NPM package I've been working on called ReadNext. This post is dedicated to ReadNext, and will go into more detail about how to use ReadNext in Node JS, React, and other JavaScript projects.

What it is

ReadNext is a Node JS package that uses AI to generate content recommendations. It's designed to be easy to use, and can be integrated into any Node JS project with just a few lines of code. It is built on top of LangChain, and delegates to an LLM of your choice for summarizing your content to generate recommendations. It runs locally, does not require you to deploy anything, and has broad support for a variety of content types and LLM providers.

ReadNext is not an AI itself, nor does it want your money, your data or your soul. It's just a library that makes it easy for developers who use JavaScript as their daily driver to find related content. It's best used at build time, and can be integrated into your CI/CD pipeline to generate recommendations for your content as part of your build process.

How to use it

Get started in the normal way:

npm install read-next

Configure a ReadNext instance:

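import path from 'path'
import { ReadNext } from 'read-next'

const readNext = await ReadNext.create({
  //process.cwd() rather than __dirname, which isn't defined in the ES modules that top-level await requires
  cacheDir: path.join(process.cwd(), 'read-next'),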
  parallel: 10
})

Index your content:

await readNext.index({
  sourceDocuments: [
    {
      pageContent: 'This is an article about React Server Components',
      id: 'rsc'
    },
    {
      pageContent: 'This is an article about React Hooks',
      id: 'hooks'
    },
    //... as many as you like
  ]
})

Generate recommendations:

const { related } = await readNext.suggest({
  sourceDocument: {id: 'rsc'},
  limit: 5
})

That's it! Under the covers, ReadNext creates embeddings for your content - after first running it through a summarization process - then stores the embeddings in a FAISS vector store. This allows it to keep a local cache of the work it has done, and to quickly generate recommendations for your content.

Full Example usage in a React application

I use ReadNext on this blog to generate related content recommendations for each post. It's a Next JS app, so I run ReadNext as part of the build process to generate recommendations for each post. The recommendations are stored in the frontmatter of each post (I use .mdx files for the blog content), and displayed at the bottom of each post.

It's also being used inside my RSC Examples project, which is an open source collection of examples of how to use React Server Components in various contexts. Each example has some explanatory text and code snippets, along with a live example, but even though that's not a traditional "article" per se, ReadNext is flexible enough to work with it.

Here's the actual script that RSC Examples uses to generate related examples for each example:

import path from 'path'
import { ReadNext } from 'read-next'

import Examples, { Example } from '../lib/examples'

const summarizationPrompt = `
The following content is a markdown document about an example of how to use React Server
Components. It contains sections of prose explaining what the example is about, may contain
links to other resources, and almost certainly contains code snippets.

Your goal is to generate a summary of the content that can be used to suggest related examples.
The summary will be used to create embeddings for a vector search. When you come across code
samples, please summarize the code in natural language.

Do not reply with anything except your summary of the example.`

const cacheDir = path.join(__dirname, '..', '..', 'read-next')
async function main() {
  // STEP 1 - create a ReadNext instance
  const readNext = await ReadNext.create({
    cacheDir,
    summarizationPrompt,
  })

  // STEP 2 - index all the examples
  const examples = new Examples()
  const { publishedExamples } = examples

  const sourceDocuments = publishedExamples.map((example: Example) => ({
    pageContent: examples.getContent(example),
    id: example.slug,
  }))

  await readNext.index({ sourceDocuments })

  // STEP 3 - generate related examples for each example
  for (const example of publishedExamples) {
    const {related} = await readNext.suggest({
      sourceDocument: sourceDocuments.find((s: any) => s.id === example.slug)!,
      limit: 5,
    })

    examples.updateMatter(example, {
      related: related.map(
        (suggestion: any) => suggestion.sourceDocumentId,
      ),
    })
  }
}

main()
  .catch(console.error)
  .then(() => process.exit(0))

The script just does 3 simple things:

Creates a ReadNext instance
Indexes all the examples
Generates related examples for each example

The RSC Examples project stores its content as .mdx files, so the final part of the script is just calling a utility function to update the frontmatter on each example with the related examples that ReadNext generated.

The summarizationPrompt is optional, but here we're taking advantage of it to better explain to the LLM that the content it is about to transform is a markdown document about an example of how to use React Server Components, not a long-form article as it would usually expect.

Here's the actual commit containing everything required to integrate ReadNext with RSC Examples (the next commit shows the output that ReadNext generated). There are a couple of simple UI components to display the recommendations; otherwise it's just one script that runs ReadNext to (re-)generate the recommendations.

Where to use it

I spend most of my time in React, usually within Next JS, and typically write TypeScript, so most of what I create is a union of those technologies. ReadNext is really a node project, and you don't need anything to do with React/Next/TypeScript to use it. But it does work really well with those technologies, because that's my stack and it would annoy me if it didn't.

The first time ReadNext runs, it needs to index all of the content you give it. Because this involves a summarization step, this can take a few seconds per article. Subsequent runs will be faster because ReadNext caches the summarizations and only regenerates them if it detects that an article's content has changed.
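ReadNext's cache internals aren't described here, so this is only a sketch of the general technique, assuming a hypothetical in-memory Map keyed by article id (a real cache would be persisted to disk, e.g. under cacheDir). Hashing the content with SHA-256 and comparing hashes is one common way to detect that an article has changed; whether ReadNext does exactly this is an assumption.

```typescript
import { createHash } from "node:crypto";

// Hypothetical cache entry: the hash of the content that was summarized, plus the summary.
interface CachedSummary {
  contentHash: string;
  summary: string;
}

const hashContent = (content: string): string =>
  createHash("sha256").update(content, "utf8").digest("hex");

// Returns the cached summary if the article's content is unchanged; otherwise
// calls the slow (and paid) summarize function and stores the fresh result.
async function getSummary(
  id: string,
  content: string,
  cache: Map<string, CachedSummary>,
  summarize: (content: string) => Promise<string>,
): Promise<string> {
  const contentHash = hashContent(content);
  const cached = cache.get(id);
  if (cached && cached.contentHash === contentHash) {
    return cached.summary; // cache hit: no LLM call needed
  }
  const summary = await summarize(content);
  cache.set(id, { contentHash, summary });
  return summary;
}
```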

Every time a new article is added, or an existing article is updated, it's a good idea to re-run ReadNext as it's possible that your changes will alter the recommendations for one or more of your articles. Automating this as part of your build process is a good idea, and because ReadNext is a self-contained package it should be able to run pretty much anywhere.

AI Content Recommendations with TypeScript

Wed, 11 Sep 2024 06:31:02 GMT

Part 3 will dive into how to use InformAI to create personalized predictive recommendations based on a user's pattern of interaction with your app.

In the last post, we used TypeScript to create searchable embeddings for a corpus of text content and integrated it into a chat bot. But chat bots are the tomato ketchup of AI - great as an accompaniment to something else, but not satisfying by themselves. Given that we now have the tools to vectorize our documents and perform semantic searches against them, let's extend that to generate content recommendations for our readers.

At the bottom of each of my blog articles are links to other posts that may be interesting to the reader based on the current article. The lo-fi way this was achieved was to find all the other posts which overlapped on one or more tags and pick the most recent one.
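The original code for that isn't shown, but the heuristic is simple enough to sketch. This assumes a hypothetical Post shape with a slug, a tags array and an ISO date string:

```typescript
// Hypothetical post shape for illustration.
interface Post {
  slug: string;
  tags: string[];
  date: string; // ISO 8601, e.g. "2024-09-11"
}

// The "lo-fi" recommender: among the other posts that share at least one tag
// with `current`, return the most recently published one (undefined if none).
function lofiReadNext(current: Post, allPosts: Post[]): Post | undefined {
  const currentTags = new Set(current.tags);
  return allPosts
    .filter((p) => p.slug !== current.slug && p.tags.some((t) => currentTags.has(t)))
    .sort((a, b) => Date.parse(b.date) - Date.parse(a.date))[0];
}
```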

Quite often that works ok, but I'm sure you can think of ways it could pick a sub-optimal next article. Someone who knows the content well could probably pick better suggestions at least some of the time. LLMs are really well-suited to tasks like this, and should in theory have several advantages over human editors (such as not forgetting what I wrote last week).

We want to end up with some simple UI like this, with one or more suggestions for what to read next:

So how do we figure out which content to recommend based on what you're looking at?

What's an article about? Summarizing vs Chunking

We already saw how we compare the "meaning" of a user question (a string) with the "meaning" of our content (a bunch of other strings). This is basically the same problem - comparing the meaning of a text document to the meaning of N other text documents.

But we run into the same problem we ran into last time: the embedding model usually has a fairly low maximum token count, so you can't feed an entire article into it unless the article is below that boundary. In the previous article, we solved this by chunking the document content into smaller sections. But that wouldn't be a valid approach this time because we want to consider the whole article content, not just some chunk of it.

One answer to this is to pass our content through an automated summarization process first, guaranteeing that the article summary will be under the embedding model's token limit while also preserving enough of the content to make sure that the suggestions are good ones.

There are several approaches to doing this, and I recommend Lan Chu's article on working around LLM token limit issues for a good overview of the options. At its simplest, though, we can do something like this:

import OpenAI from 'openai';
const openai = new OpenAI();

async function summarize(content) {  
  const prompt = `Between the CONTENT_STARTS and CONTENT_ENDS will follow an article for you to summarize.
  The purpose of the summarization is to drive recommendations for what article somebody should read next,
  based on the article they are currently reading. The summarization should be as lengthy as necessary to
  capture the full essence of the article. It is not intended to be a short summary, but more of a 
  condensing of the article. All of the summarizations will be passed through an embedding model, 
  with the embeddings used to rank the articles.
  
  Please do not reply with any text other than the summary.
  
  CONTENT_STARTS
  ${content}
  CONTENT_ENDS`;
  
  const summary = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: prompt }],
    max_tokens: 2000,
    temperature: 0.7,
  });
  
  // message.content is typed as string | null, so default to an empty string
  return summary.choices[0].message.content ?? '';
}

const summary = await summarize(await posts.getContent(post));

Now we can make an embedding for our article summary:

export type Embedding = number[];

export async function generateEmbedding(input: string): Promise<Embedding> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    input,
  });

  return response.data[0].embedding as Embedding;
}

const embedding = await generateEmbedding(summary);

This code has all of the error handling and other stuff stripped out to make it easy to understand what's going on. In theory we now just need to iterate over all of our articles, generate the summaries, generate the embeddings, and then we can do a vector search to find the most similar articles to the one we're currently looking at.

But that's expensive in both time and money

There are about a hundred articles on this blog at the moment. To calculate the most relevant next article for each one, we therefore need to:

The $0.03 I can live with, but the 6 minutes is less cool. And that's just for 100 articles. If you have a larger corpus, you're going to be waiting a long time for this to finish. There are 2 obvious things to do to make that better:

Parallelize: do the summarization and vectorization in parallel
Cache: cache the embeddings and the summaries so that we don't have to do them again
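Neither of these needs a library. As a minimal sketch (not how ReadNext actually does it), parallelizing with a cap on the number of concurrent API calls, so you don't trip rate limits, could look like this; `mapWithConcurrency` is a hypothetical helper name:

```typescript
// Run `worker` over every item with at most `limit` calls in flight at once.
// Results are returned in the same order as `items`.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none remain.
  // JavaScript is single-threaded, so the same index is never claimed twice.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

You would pass something like `summarize` as the worker. The right limit depends on your API rate limits, so treat any particular number as a starting point to tune.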

Typically, once an article is written, it's not going to change (unless I made an embarrassing mistake and want to secretly fix it). So we can cache the summaries and their embeddings and only recompute them if the article changes. However, every time an article is published, it becomes a candidate for the best "Read Next" article for all of the other ones, so we have to perform the similarity search for all of the articles again.
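Caching only needs a key that changes whenever the article does. A minimal sketch, assuming summaries and embeddings are stored as JSON files named after a SHA-256 hash of the article content (ReadNext's real cache layout may well differ; `getOrCompute` and `CachedEntry` are hypothetical names):

```typescript
import { createHash } from 'node:crypto';
import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

// Hypothetical shape of one cached article: its summary and that summary's embedding.
type CachedEntry = { summary: string; embedding: number[] };

// Hash the article content so that any edit to the article produces a new cache key.
function contentHash(content: string): string {
  return createHash('sha256').update(content).digest('hex');
}

// Return the cached entry for this content if one exists; otherwise compute it,
// write it to disk and return it.
async function getOrCompute(
  cacheDir: string,
  content: string,
  compute: (content: string) => Promise<CachedEntry>,
): Promise<CachedEntry> {
  mkdirSync(cacheDir, { recursive: true });
  const file = join(cacheDir, `${contentHash(content)}.json`);

  if (existsSync(file)) {
    return JSON.parse(readFileSync(file, 'utf8')) as CachedEntry;
  }

  const entry = await compute(content);
  writeFileSync(file, JSON.stringify(entry));
  return entry;
}
```

The `compute` callback would wrap `summarize` and `generateEmbedding` from earlier. Only the similarity search itself then has to be rerun each time a new article is published.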

ReadNext does it for you

This stuff is involved enough that I ended up writing a TypeScript library to make it easier to work with. It's called ReadNext and it's available on GitHub and NPM. It provides a simple API to index your articles and retrieve the most relevant ones based on a given article.

Under the covers it does all the things we just talked about - summarization, embedding and vector search. It uses FAISS for the vector search, which is a neat, mature open source project that does this kind of thing really well. It caches the summaries and embeddings onto the file system so that you don't have to do them again, and it parallelizes the summarization and vectorization to make it faster.

I'll go into more detail on how to use it in the next post, but for now, here's a quick example of how you might use it:

import { ReadNext } from 'read-next';

const readNext = await ReadNext.create({
  cacheDir: '/path/to/some/cache/dir'
});

//grab all of our articles, pass them into ReadNext in this format:
const sourceDocuments = await Promise.all(
  articles.map(async (post: any) => ({
    pageContent: await posts.getContent(post),
    id: post.slug,
  }))
);

//this creates the summaries and embeddings for all of the articles
await readNext.index({ sourceDocuments });

Now we can generate the recommendations for a given article:

const suggestions = await readNext.suggest({
  sourceDocument: {
    id: 'introducing-inform-ai',
  },
  limit: 5,
});

Which will give us a response like this:

{
  "id": "introducing-inform-ai",
  "related": [
    {
      "sourceDocumentId": "easy-rag-for-typescript-and-react-apps",
      "score": 0.864181637763977
    },
    {
      "sourceDocumentId": "understanding-react-server-components-and-suspense",
      "score": 0.8758435249328613
    },
    {
      "sourceDocumentId": "teams-using-nextjs-vercel-advantage",
      "score": 0.8849999904632568
    },
    {
      "sourceDocumentId": "demystifying-openai-assistants-runs-threads-messages-files-and-tools",
      "score": 0.9038872718811035
    },
    {
      "sourceDocumentId": "using-server-actions-with-nextjs",
      "score": 0.9190686941146851
    }
  ]
}

Lower scores indicate closer relatedness. Once you have this data, you can use it to generate a list of links to suggest to the user. Because the scores won't change unless the content changes, you can cache the results and only recompute them when the content changes, so the whole thing is eminently cacheable and well-suited to static site generation.
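Turning that response into links is a short map over the `related` array. A sketch, assuming you have a slug-to-title lookup and that posts live under a `/blog/` route (both hypothetical here):

```typescript
// Shape of the suggest() response shown above.
type Suggestions = {
  id: string;
  related: { sourceDocumentId: string; score: number }[];
};

type ReadNextLink = { href: string; title: string };

// Build up to `max` links, closest match (lowest score) first. Suggestions whose
// slug isn't in `titles` are skipped. The `/blog/` prefix is a hypothetical route.
function toReadNextLinks(
  suggestions: Suggestions,
  titles: Record<string, string>,
  max = 3,
): ReadNextLink[] {
  return [...suggestions.related]
    .sort((a, b) => a.score - b.score)
    .filter((r) => r.sourceDocumentId in titles)
    .slice(0, max)
    .map((r) => ({ href: `/blog/${r.sourceDocumentId}`, title: titles[r.sourceDocumentId] }));
}
```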

Does it actually work?

The old method I was using was simply showing you the most recent article that had at least one tag in common with the current one. That was ok much of the time, but I have a bunch of Ext JS posts from 2009 that were recommending recent articles on RAG, solely because the RAG articles are the most recent ones that are tagged with "ui". They're not really related at all.
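For comparison, the old tag-based heuristic fits in a few lines. This is a sketch with a hypothetical `PostMeta` shape rather than my exact code, but it shows the core weakness: only tag overlap and dates are considered, never the content.

```typescript
type PostMeta = { slug: string; tags: string[]; date: Date };

// The old heuristic: the most recently published other post that shares at
// least one tag with the current post. The article content is never looked at.
function mostRecentWithSharedTag(current: PostMeta, all: PostMeta[]): PostMeta | undefined {
  const tags = new Set(current.tags);
  return all
    .filter((p) => p.slug !== current.slug && p.tags.some((t) => tags.has(t)))
    .sort((a, b) => b.date.getTime() - a.date.getTime())[0];
}
```

An old Ext JS post tagged `ui` gets whichever recent post also carries `ui`, however unrelated it is.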

To test it out I ran ReadNext to generate recommendations for each article, then compared the FAISS score for ReadNext's recommendation to the score for the most recent article with at least one tag in common. In only 4% of cases was the old method suggesting the same article as ReadNext; in the other 96% of cases ReadNext suggested an article it thought was more relevant.

Here's a chart showing the difference in scores between the old method and ReadNext. It's ordered by the difference in scores, so the articles on the left are the ones where ReadNext thought the old method was most wrong:

The purple bars are the scores for the recommended article from ReadNext. The green bars are the delta between the tag-based recommendation method and the contextual AI generated recommendation. The taller the purple bar, the less confident ReadNext is in its suggestion. The taller the green bar, the more wrong ReadNext thinks the old method was.

You can see by hovering over the bars to the left edge of the chart that the articles where ReadNext thinks its recommendations are much better than the old way tend to be ones where the old method was suggesting a recent article with a tag in common but completely unrelated content - like recommending a RAG article to read next after an article on the ancient Ext.ux.Printer plugin.

Improving the suggestions

Wherever you see a tall purple bar in the chart above, it means that ReadNext is not very confident in its suggestion. This is because the article in question is not very similar to any of the other articles in the corpus. Sometimes there are just no particularly similar articles to pick, so in those cases we could fall back to the old method of picking the most recent article with a tag in common.

Looking at the data in the chart above, we could set a threshold of 0.9 or 1 for the FAISS score, and if ReadNext can't find a suggestion scoring below that threshold (remember, lower is better), we could fall back to the old method. This would give us the best of both worlds - the ability to suggest more relevant articles when they're available, but not to suggest something completely unrelated when there's nothing better to suggest.
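The fallback logic is small. A sketch, assuming ReadNext's `related` array from above and a tag-based pick computed separately (`pickReadNext` is a hypothetical name, and 0.9 is just one of the candidate thresholds):

```typescript
type Related = { sourceDocumentId: string; score: number };

// Use ReadNext's closest match if its score (lower is closer) is under the
// threshold; otherwise fall back to the tag-based pick. If there is no
// tag-based pick either, return ReadNext's best guess (or undefined).
function pickReadNext(
  related: Related[],
  tagFallback: string | undefined,
  threshold = 0.9,
): string | undefined {
  const best = [...related].sort((a, b) => a.score - b.score)[0];
  if (best !== undefined && best.score < threshold) return best.sourceDocumentId;
  return tagFallback ?? best?.sourceDocumentId;
}
```

Returning ReadNext's weak best guess when there's no tag-based candidate is a design choice; returning nothing at all would be just as reasonable.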

If we wanted to prioritize newer content, we could apply a decay modifier to increase the candidate article's score by an amount proportional to the difference in publication date (since lower scores are better, this pushes older articles down the list). This would make newer articles more likely to be recommended, but only when they're actually relevant.
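A decay modifier could look something like this. It's a sketch under a couple of assumptions: the penalty is linear in a candidate's age relative to some reference date (today, or the current article's publication date), and 0.01 per year is an arbitrary starting value. `applyRecencyDecay` is a hypothetical name.

```typescript
type Candidate = { sourceDocumentId: string; score: number; published: Date };

const MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000;

// Add `penaltyPerYear` to each candidate's score for every year it was published
// before `reference`, then re-sort. Lower scores still win, so older candidates sink.
// Returns new objects; the input candidates are not modified.
function applyRecencyDecay(
  candidates: Candidate[],
  reference: Date,
  penaltyPerYear = 0.01,
): Candidate[] {
  return candidates
    .map((c) => {
      const ageYears = Math.max(0, (reference.getTime() - c.published.getTime()) / MS_PER_YEAR);
      return { ...c, score: c.score + penaltyPerYear * ageYears };
    })
    .sort((a, b) => a.score - b.score);
}
```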

We'll get a bit more into this in the next post, where I'll show you how to use ReadNext in your own projects, along with how I used it for the RSC Examples site to generate related React Server Component examples.

Easy RAG for TypeScript and React Apps

Mon, 02 Sep 2024 06:31:02 GMT

This is the first article in a trilogy that will go through the process of extracting content from a large text dataset - my blog in this case - and making it available to an LLM so that users can get answers to their questions without searching through lots of articles along the way.

Part 1 will cover how to process your text documents for easy consumption by an LLM, turn them into embeddings, throw those embeddings into a vector database, and then use that to help answer the user's questions. There are a million articles about this using Python, but I'm principally a TypeScript developer so we'll focus on TS, React and NextJS.

Part 2 covers how to make an AI-driven "What to Read Next" component, which looks at the content of a document (or blog post, in this case) and performs a semantic search through the rest of the content to rank which other posts are most related to this one, and suggest them.

Part 3 will extend this idea by using InformAI to track which articles the user has looked at and attempt to predictively generate suggested content for that user, personalizing the What to Read Next component while keeping the reader completely anonymous to the system.

Let's RAG

About a week ago I released InformAI, which allows you to easily surface the state of your application UI to an LLM in order to help it give more relevant responses to your user. In that intro post I threw InformAI into the blog post itself, which gave me a sort of zero-effort poor man's RAG, as the LLM could see the entire post and allow people to ask questions about it.

That's not really what InformAI is intended for, but it's nice that it works. But what if we want to do this in a more scalable and coherent way? This blog has around 100 articles, often about similar topics. Sometimes, such as when I release open source projects like InformAI, it's one of the only sources of information on the internet about the given topic. You can't ask ChatGPT what InformAI is, but with a couple of tricks we can transparently give ChatGPT access to the answer so that it seems like it magically knows stuff it was never trained on.

In reality, having a chatbot that is able to answer questions about content from my blog is not likely to be super useful, but the process to achieve it is adaptable to many situations. It's likely that your company has hundreds or thousands of internal documents that contain answers to all kinds of questions, but finding that information can be difficult, and you may need to manually piece information together from a bunch of sources to get to the answer you need.

What's RAG?

Retrieval Augmented Generation refers to a group of techniques to intercept a user's message on its way to an LLM, look at what the user wrote, try to find relevant information from your own dataset about that question, and then pass that information along with the original question to the LLM, with the hope that the LLM will use it to give the user a good answer.

Concretely, what this usually boils down to is jamming a bunch of additional text content into the LLM prompt, usually labelling that as "Context", then passing the user query afterwards:

Context:
1. InformAI makes it easy to surface all the information that you already have in your React components to an LLM or other AI agent...
2. InformAI keeps track of your component as it renders and re-renders in response to the user...

User Query: What is InformAI?

Ok so how do we actually achieve that? How do we pluck the relevant sections from a large corpus of text? Clearly, this requires 3 things:

Understanding what the question is about
Understanding what the text in the source documents is about
Retrieving source documents that have semantic overlap with the question

So how do we compute a semantic similarity score between two text strings? Well, first we turn the text into an array of numbers, called an embedding. This array is often referred to as a vector, because it is one, but it's also just an array of (usually floating point) numbers.

The "magic" of embedding creation is the process of turning a text string into that vector. I won't go into how that actually happens here, but for now it's enough to know that other people have done the hard work of making that possible, and that once we've produced embeddings for 2 strings, it's very easy to compare those two vectors and see how similar the strings are. This is the crux of what powers RAG - figuring out how similar strings are.
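One common way to compare two embeddings is cosine similarity, which measures the angle between the vectors and ignores their length. Vector databases do this for you, so this minimal version is only here to show there's no magic in the comparison step. OpenAI's embeddings are normalized to length 1, so for those a plain dot product gives the same answer.

```typescript
// Cosine similarity: 1 means the vectors point the same way, 0 means they are
// orthogonal (unrelated), -1 means opposite. Returns NaN for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');

  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```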

Creating embeddings

So how do we create this vector for a given string? We don't; we get an embedding model to do it for us. Most of the major LLM providers have an embedding API you can call with a text string and get a vector back. Here's how we do it with OpenAI, for example:

import OpenAI from 'openai';
const openai = new OpenAI();

export type Embedding = number[];

export async function generateEmbedding(input: string): Promise<Embedding> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    input,
  });

  return response.data[0].embedding as Embedding;
}

const embedding = await generateEmbedding("What is InformAI?");

//embedding is an array of 1536 floats like this:
//[0.013548319, 0.020720137, -0.0015670912, -0.018836489, 0.011183294, ...]

Ok, that was pretty easy. In fact, we already achieved our first objective from above (Understanding what a question is about). Let's move on to the second objective - understanding what the text in the source documents is about.

Processing our source documents

We've got an easy way to turn strings of text into embedding vectors by calling an API, so we can just grab our source documents and do this for each of them, right? Well, kinda. It turns out that most embedding models have a strict limit on the length of a text string that you can pass in. In the case of the text-embedding-ada-002 embedding model that we're using here, that limit is 8192 tokens.

Most of the articles here flirt with that limit, with maybe half of them being a little longer than the embedding model can handle. How do we handle this? By splitting the source documents up, of course. There are a bunch of ways you could split a text document, with pros and cons to each. If you want to get serious about it, using a library like LangChain is probably the way to go, as it has a bunch of strategies for sensibly chunking text documents.

But I didn't want to add a dependency to my app just to chunk text, so I just wrote a little function to chunk my text files instead. As I mentioned in the last post, my blog uses MDX to blend Markdown and React components for its content, so one decent strategy here is to just split the .mdx file (after removing the frontmatter of course) by heading:

// Split the markdown into sections based on headings
function chunkMarkdownByHeaders(markdown: string): string[] {
  const chunks: string[] = [];
  const lines = markdown.split('\n');

  let currentChunk: string[] = [];

  lines.forEach(line => {
    // Check if the line is a header
    if (line.match(/^#{1,6}\s/)) {
      // If there's an existing chunk, add it to the chunks array
      if (currentChunk.length > 0) {
        chunks.push(currentChunk.join('\n'));
        currentChunk = [];
      }
    }

    // Add the line to the current chunk
    currentChunk.push(line);
  });

  // Add the last chunk if it exists
  if (currentChunk.length > 0) {
    chunks.push(currentChunk.join('\n'));
  }

  return chunks;
}

//returns an array of strings, one section per element
chunkMarkdownByHeaders("Your lovely long .mdx file here, replete with headings, subheadings and the like")

Technically, this function doesn't guarantee that our chunks are under the 8192 token limit, but in practice the chunks it generates are all substantially smaller than that limit. Again, for more robustness it's better to use something like LangChain for this.
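If you want a cheap safety net without pulling in LangChain, one option is to estimate token counts and split any oversized chunk on paragraph boundaries. This sketch uses OpenAI's rough rule of thumb of about 4 characters per token for English text, which is only an approximation; a real tokenizer would give exact counts. The 8000-token default is an assumption that leaves some headroom under the limit, and both function names are hypothetical.

```typescript
// Rough token estimate: ~4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Split any chunk whose estimated size exceeds maxTokens, breaking on blank
// lines (paragraphs) and greedily packing paragraphs back together.
// A single paragraph that is itself over the limit is still emitted whole.
function enforceChunkLimit(chunks: string[], maxTokens = 8000): string[] {
  const out: string[] = [];

  for (const chunk of chunks) {
    if (estimateTokens(chunk) <= maxTokens) {
      out.push(chunk);
      continue;
    }

    let current = '';
    for (const para of chunk.split(/\n\s*\n/)) {
      const candidate = current ? `${current}\n\n${para}` : para;
      if (current && estimateTokens(candidate) > maxTokens) {
        out.push(current);
        current = para;
      } else {
        current = candidate;
      }
    }
    if (current) out.push(current);
  }

  return out;
}
```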

Now that we've got our chunks, though, it's easy to generate embeddings for our entire text corpus:

export type PostEmbedding = {
  _id: string;
  $vector: Embedding;
  content: string;
};

//generate a set of embeddings for an array of Posts
export async function generatePostEmbeddings(posts: Post[]): Promise<PostEmbedding[]> {
  const embeddings: PostEmbedding[] = [];
  for (const post of posts) {
    embeddings.push(...(await generatePostSubEmbeddings(post)));
  }

  return embeddings;
}

//generate a set of embeddings for a single Post
export async function generatePostSubEmbeddings(post: Post): Promise<PostEmbedding[]> {
  const { slug } = post;
  const content = await posts.getContent(post);

  console.log('Generating embedding for', slug);

  const chunks = chunkMarkdownByHeaders(content);
  const embeddings: PostEmbedding[] = [];
  // use the loop index for the ID so that identical chunks still get unique IDs
  for (const [index, chunk] of chunks.entries()) {
    embeddings.push({
      _id: `${slug}-${index}`,
      $vector: await generateEmbedding(chunk),
      content: chunk,
    });
  }

  console.log('Embeddings generated', embeddings.length);

  return embeddings;
}

//returns an array of PostEmbedding objects, each of which has a unique ID, a chunk of text content,
//and a $vector embedding for that content
generatePostEmbeddings(posts.findAll());

Cool - now we've solved objectives 1 & 2 - we've got vector embeddings for our entire text corpus, as well as an easy way to vectorize questions from the user. But right now it's all just a bunch of arrays in memory - what we need is a way to compare our user question vector with all the embeddings we made for our text content. We need a vector database.

Using a Vector Database

Vector Databases come in many forms. You can, of course, stuff vectors into pretty much any database - they're just arrays of numbers after all, but what we mean by "Vector Database" is one that makes it easy to pluck out vectors that are similar to one provided as our query (our vectorized/embedded user question, for example).
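Conceptually, a vector search is just "score every stored vector against the query and keep the best few". A brute-force, in-memory sketch using a shape like the `PostEmbedding` type from above (`topK` is a hypothetical name; real vector databases use indexes so they don't have to scan every row):

```typescript
type StoredChunk = { _id: string; $vector: number[]; content: string };

// Dot product; for unit-length vectors (as OpenAI embeddings are) this equals
// cosine similarity, so higher means more similar.
const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, x, i) => sum + x * b[i], 0);

// Score every stored chunk against the query and return the `limit` best matches.
function topK(query: number[], docs: StoredChunk[], limit = 10) {
  return docs
    .map((doc) => ({ ...doc, similarity: dot(query, doc.$vector) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
}
```

This is fine for a few hundred chunks; past that, a proper vector database earns its keep.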

A bunch of vector-optimized databases have cropped up recently, but even traditional relational databases like Postgres and MySQL are gaining vector capabilities, which may make things easier if you've already got one of those in your mix. In my case, there was no existing database behind this blog, so I decided to integrate with a relatively new player in the market - Astra by DataStax.

Astra is attractive for my use case because it's cloud hosted, allowing me to continue deploying my largely SSG blog application to Vercel without having to worry about orchestrating database deployments, migrations, or anything like that. It's far from the only option, but it's the one I went with in this case. It's also free at my usage level, which is a nice bonus.

DataStax provides a simple npm package called @datastax/astra-db-ts that makes it pretty easy to interact with Astra. Under the covers Astra is built on top of Cassandra, so it may be familiar already. I made a tiny astra.ts that exports a couple of functions like getCollection to make it easier for my embedding.ts file to interact with it:

import { DataAPIClient, SomeDoc } from '@datastax/astra-db-ts';

export function getCollection<T extends SomeDoc>(name: string) {
  const db = getDb();

  return db.collection<T>(name);
}

let client: DataAPIClient | undefined;

export function getClient(): DataAPIClient {
  if (!client) {
    client = new DataAPIClient(process.env.ASTRA_DB_TOKEN);
  }

  return client;
}

export function getDb() {
  return getClient().db(process.env.ASTRA_DB_API_ENDPOINT ?? '');
}

The only slight hitch I ran into was creating a token with the appropriate access - they have an RBAC system for tokens (which is good), but I had to create a token with more access than I expected to make it actually work (which is not so good). Anyway, to actually upload the embeddings we made before, we can just call collection.insertMany:

export async function uploadEmbeddings(documents: PostEmbedding[]) {
  const collection = getCollection<PostEmbedding>('posts');

  console.log('Connected to AstraDB:', collection);
  console.log(`Uploading ${documents.length} documents...`);

  try {
    const inserted = await collection.insertMany(documents);
    console.log(`* Inserted ${inserted.insertedCount} items.`);
  } catch (e) {
    console.log(e);
  }

  return collection;
}

That's it really. I skipped the bit about creating an account, database and collection within Astra, but I'm sure you can figure that out from their docs. Now all we have to do is grab our user message and fetch semantically similar text content from Astra.

Putting it all together

And here's how we can do that. This function just takes our user's question, turns it into an embedding, then searches our Astra database collection for similar embeddings using a vector search. Assuming we got some searchResults back, it will then just plop those into a prompt string along with the user's query, then send that along to OpenAI:

export async function ragUserMessage(input: string) {
  const inputEmbedding = await generateEmbedding(input);
  const collection = await getCollection<PostEmbedding>('posts');
  const searchResults = await collection
    .find(
      {},
      {
        sort: { $vector: inputEmbedding },
        includeSimilarity: true,
        limit: 10,
      }
    )
    .toArray();

  // Format the context from search results
  const context = searchResults
    .map((result, index) => `${index + 1}. ${result._id}: ${result.content}`)
    .join('\n');

  const prompt = `Context:\n${context}\n\nUser Query:\n${input}`;

  // Send the prompt to the LLM
  const response = await openai.chat.completions.create({
    model: 'gpt-4', // Or the model you are using
    messages: [{ role: 'user', content: prompt }],
    max_tokens: 1000,
    temperature: 0.7,
  });

  return response.choices[0];
}

The response that we get back from the LLM should benefit from the context we found via semantic search, unless our results were not very good. Our searchResults array does include a similarity score for each result, so we can perform a cutoff or other processing to make sure we're only passing genuinely relevant content to the LLM. We can, of course, also modify the prompt itself to say things like "Using only the content provided here, please answer the user's question", or other text to try to constrain the LLM's response and limit its tendency to hallucinate. YMMV.
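As a sketch of both ideas together: with `includeSimilarity: true`, astra-db-ts adds a `$similarity` field to each returned document, so we can drop weak matches before building the prompt. The 0.8 cutoff and the instruction wording are assumptions to tune against your own data, and `buildGroundedPrompt` is a hypothetical helper.

```typescript
type ScoredChunk = { _id: string; content: string; $similarity?: number };

// Keep only results at or above minSimilarity, then build a prompt that asks
// the model to stick to the supplied context and admit when it doesn't know.
function buildGroundedPrompt(input: string, results: ScoredChunk[], minSimilarity = 0.8): string {
  const relevant = results.filter((r) => (r.$similarity ?? 0) >= minSimilarity);
  const context = relevant.map((r, i) => `${i + 1}. ${r._id}: ${r.content}`).join('\n');

  return [
    "Using only the context provided here, answer the user's question.",
    'If the context does not contain the answer, say that you do not know.',
    '',
    `Context:\n${context || '(no relevant content found)'}`,
    '',
    `User Query:\n${input}`,
  ].join('\n');
}
```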

Streaming is better

Finally, that example above just used the basic OpenAI chat completion API, which will potentially sit there for a long time before showing you the entire LLM response in one go. That's a poor UX, so it's usually better to stream that text back. I'm a big fan of the Vercel AI SDK, and recently wrote about how to use that alongside the basic ChatWrapper React component in InformAI to get a quick and dirty chatbox interface up and running.

"use client";

import { ChatWrapper } from "inform-ai";
import { useActions, useUIState } from "ai/rsc";

export function ChatBot({ className }: { className?: string }) {
  const { submitUserMessage } = useActions();
  const [messages, setMessages] = useUIState();

  return (
    <ChatWrapper
      className={className}
      submitUserMessage={submitUserMessage}
      messages={messages}
      setMessages={setMessages}
      placeholder="Ask me anything about any content on edspencer.net"
    />
  );
}

None of this needs InformAI at all - I just happen to be using it already so I stuck with that, but you could equally roll your own chatbot UI, use useChat from Vercel or find something else from off the shelf. Here's a live chatbot that you can ask any question about any article on this site (see the InformAI README for how to use this UI component):

Here's the actual code that I'm using on the back end to make this work. Most of this is taken directly from the InformAI README; I just added the for..of loop, which replaces each user message with the RAG-augmented version returned from prepareRAGMessage:

'use server';

import { getMutableAIState, streamUI } from 'ai/rsc';
import { openai } from '@ai-sdk/openai';
import { Spinner } from '@/components/Spinner';

import { AssistantMessage } from 'inform-ai';
import { generateId } from 'ai';

import { ClientMessage } from '../providers/AI';
import { prepareRAGMessage } from '@/lib/embedding';

export async function submitUserMessage(messages: ClientMessage[]) {
  const aiState = getMutableAIState();

  // Do the RAG lookup
  for (const message of messages) {
    if (message.role === 'user') {
      message.content = await prepareRAGMessage(message.content as string);
    }
  }

  //add the new messages to the AI State so the user can refresh and not lose the context
  aiState.update({
    ...aiState.get(),
    messages: [...aiState.get().messages, ...messages],
  });

  //set up our streaming LLM response, with a couple of tools, a prompt and some onSegment logic
  //to add any tools and text responses from the LLM to the AI State
  const result = await streamUI({
    model: openai('gpt-4o-2024-08-06'),
    initial: <Spinner />,
    system: `\
    You are a helpful assistant who helps people with questions about posts found at https://edspencer.net`,
    messages: [
      ...aiState.get().messages.map((message: any) => ({
        role: message.role,
        content: message.content,
        name: message.name,
      })),
    ],
    text: ({ content, done }) => {
      if (done) {
        aiState.update({
          ...aiState.get(),
          messages: [...aiState.get().messages, { role: 'assistant', content }],
        });

        aiState.done(aiState.get());
      }

      return <AssistantMessage content={content} />;
    },
  });

  return {
    id: generateId(),
    content: result.value,
  };
}

And that's it - we've got LLM responses streaming into our React frontend that allow users to ask questions about things that ChatGPT was never trained on, but can give reasonable answers to anyway because we have RAG. In the next part we'll look at how to extend our use of RAG to generate more meaningful "Read Next" suggestions for our articles, before moving on to making intelligent, personalized suggestions based on reading history.

Blending Markdown and React components in NextJS

Wed, 28 Aug 2024 12:31:02 GMT

Authoring long-form content like blog posts is a pleasant experience with Markdown as it lets you focus on the content without worrying about the presentation or making the browser happy. Spamming <p> and <div> tags all over the place is a PITA and serves as a distraction from the content you're working on.

However, in a blog like this one, which deals with a lot of React/node/nextjs content, static text and images are limiting. We really want our React components to be live on the page with all of the richness and composability that React and JSX bring - so how do we blend the best of both of these worlds?

MDX: Markdown plus React

MDX is an extension to Markdown that also allows you to import and use React components. It lets you write content like this:

MDX is a blend of:

- normal markdown
- React components

<Aside type="info">
  This blue box is a custom React component called `<Aside>`, and it can be rendered by MDX along
  with the other Markdown content.
</Aside>

That's rendering an <Aside> component, which is a simple React component I use in some of my posts and looks like this:

That's really cool, and we can basically use any React component(s) we like here. But first let's talk a little about metadata.

Metadata matters

A document is not just a document - it has a bunch of associated metadata like a publication status, a title, maybe some tags, timestamps, a summary, a canonical url, potentially author information and any number of other pieces of data that are not the document itself, but data about the document.

When it comes to things like text documents, metadata should be co-located with the content - ideally in the same file. The nature of metadata is usually quite different from text content, though - metadata typically has some structure to it and is well suited to a format like JSON or YAML.

Thankfully, markdown has a concept known as "Frontmatter", which is really just a yaml block shoved into the top of a markdown file. The frontmatter metadata for this very post looks like this:

---
slug: using-markdown-with-nextjs
status: publish
title: Blending Markdown and React components in NextJS
tags:
  - nextjs
  - react
  - vercel
  - rsc
  - ui
  - mdx
date: '2024-08-28 06:31:02'
images:
  - /images/posts/mdx-content.png
description: >-
  Markdown is a really nice way to write content like blog posts, 
---

Unfortunately, while NextJS does have native support for markdown content, it does not support frontmatter out of the box. There may be instances where you don't really need any metadata, or can come up with some other way to handle it (I used to use a JSON file), but life is so much easier when you can use frontmatter.

Thankfully, it's pretty easy to do this using MDXRemote.

Rendering MDX content

MDXRemote is a library that lets you render a string containing MDX content. That's useful if you are loading content from a database or something, but there's nothing stopping you from just reading file data and passing that in as a prop. Well, almost nothing - we've got to do something about that frontmatter first.

There are a few ways to do that - here's an approach I like using a library called gray-matter:

import fs from 'fs'
import matter from 'gray-matter'

//file is the path to one of your .mdx files
const source = fs.readFileSync(file, 'utf8')
const { data, content } = matter(source)

//this is now a JavaScript object of all the frontmatter yaml
console.log(data)

//this is the markdown content, not yet processed, but with the frontmatter stripped out
console.log(content)

Ok so we used gray-matter to process our MDX file into a JS object for the metadata, and a string for everything else, but that everything else - the content - is still Markdown. Let's turn it into HTML now using <MDXRemote>.

Here's the actual MarkdownContent.tsx that is used to power the RSC Examples:

import remarkGfm from 'remark-gfm'
import { Code } from 'bright'
import { MDXRemote } from 'next-mdx-remote/rsc'
import { Callout } from './Callout'
import CaptionedContent from './CaptionedContent'
import Figure from './Figure'

Code.theme = {
  dark: 'github-dark',
  light: 'github-light',
}

Code.defaultProps = {
  lang: 'shell',
  theme: 'github-light',
}

const mdxOptions = {
  remarkPlugins: [remarkGfm], //adds support for tables
  rehypePlugins: [],
}

const components = {
  pre: Code,
  Callout,
  Figure,
  CaptionedContent,

  //just colors any `inline code stuff` blue
  code: (props: object) => (
    <code style={{ color: 'rgb(0, 92, 197)' }} {...props} />
  ),
}

export default function MarkdownContent({ content }: { content: string }) {
  return (
    <MDXRemote
      options={{ mdxOptions }}
      source={content}
      components={components}
    />
  )
}

MarkdownContent.tsx shows off several of the capabilities of MDXRemote:

mdxOptions - allows us to pass in whatever remark/rehype plugins we like (I'm just using remark-gfm here to support Markdown tables)
source - is just the source string we saw in the previous code block, passed in as a React prop
components - allows us to render our custom React components like <Callout> and <Figure>

The components prop is where the interesting stuff is happening. By passing in Callout, Figure and CaptionedContent there - all of which are React components imported above - we can start putting content like <Callout type="warning">Be careful!</Callout> directly in our .mdx files (Callout is basically the same as the Aside component I use on the blog).

Here is also where we support syntax highlighting via the awesome Bright syntax highlighter. That's what turns our code snippets (delineated by ```, which markdown turns into a <pre>) into beautifully syntax highlighted blocks of HTML. It's the same library I use for the syntax highlighting on this blog.

Where to store the content

You could store your MDX content anywhere, including inside a database, but generally it's easier to save them as files in your git repo. Not only is this one less dependency, but you get all the things like file histories, branching and reversion for free.

For the RSC Examples app, I wanted people to be able to get value by browsing the repo as much as by browsing the app itself. Most of the examples are just an .mdx file and a .tsx file - one explaining the example, the other executing it. By structuring things this way, you can grok an example like this one directly in the repo almost as well as you can by playing with the live example itself.

A non-database database

Keeping the content in source control is great, but you probably want to have things like index pages that list out the content, some kind of search, filtering by tag, or other dynamic functionality that requires your app to have some kind of database of all of your .mdx content.

In previous iterations of my blog app I kept a JSON file that acted as a sort of manifest of all of the posts I had written. It was annoying to have to keep switching from the .md file to the .json file to add metadata, but it did provide a "database" of all the content on the site.

Once I migrated to .mdx I was able to write a really simple piece of code that finds all the .mdx files nested in some directory, parses the frontmatter using gray-matter and exposes a couple of utility functions, such as one that gets the content ready to be passed into <MarkdownContent>.

I do basically the same thing with this simple Examples class inside the RSC Examples repo. If you are planning on having tens of thousands of .mdx files you'd probably want a different approach, but these ~100 lines of TypeScript do everything I need without adding a database dependency, while keeping builds fast and requiring no extra tooling.
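The Examples class itself isn't reproduced in this post, but the idea can be sketched. The version below is a hypothetical stand-in: ContentIndex, parseFrontmatter and the published flag are assumptions, not the repo's code. It doesn't read .mdx files from disk or parse them with gray-matter. Instead it takes file paths and contents as strings and uses a tiny frontmatter parser that only understands simple `key: value` lines, so it runs anywhere. It exposes the same publishedExamples / getContent shape that the page.tsx below relies on.

```typescript
// Hypothetical sketch of a file-backed "content database" (not the repo's code).
// Assumption: frontmatter is simple `key: value` lines between `---` fences.

interface ExampleMeta {
  slug: string
  title: string
  published: boolean
}

interface SourceFile {
  path: string // e.g. "promises/resolved.mdx"
  source: string // raw file contents
}

// Tiny stand-in for gray-matter: split frontmatter from the Markdown body.
function parseFrontmatter(source: string): { data: Record<string, string>; content: string } {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(source)
  if (!match) return { data: {}, content: source }

  const data: Record<string, string> = {}
  for (const line of match[1].split('\n')) {
    const idx = line.indexOf(':')
    if (idx === -1) continue
    data[line.slice(0, idx).trim()] = line.slice(idx + 1).trim()
  }
  return { data, content: match[2] }
}

class ContentIndex {
  private contentBySlug = new Map<string, string>()
  readonly all: ExampleMeta[] = []

  constructor(files: SourceFile[]) {
    for (const file of files) {
      if (!file.path.endsWith('.mdx')) continue // ignore non-MDX files
      const { data, content } = parseFrontmatter(file.source)
      // The slug is the path minus its extension, e.g. "promises/resolved"
      const slug = file.path.replace(/\.mdx$/, '')
      this.all.push({
        slug,
        title: data.title ?? slug,
        published: data.published !== 'false', // published unless explicitly false
      })
      this.contentBySlug.set(slug, content)
    }
  }

  get publishedExamples(): ExampleMeta[] {
    return this.all.filter((example) => example.published)
  }

  getContent(example: ExampleMeta): string {
    return this.contentBySlug.get(example.slug) ?? ''
  }
}
```

In a real app you would build the file list from disk (for example with Node's fs.readdirSync and its recursive option) and swap parseFrontmatter for gray-matter, which handles full YAML.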

The page.tsx inside RSC Examples that actually renders the content now becomes pretty simple:

export default async function Page({ params }: Props) {
  const slug = params.slug.join('/')
  const examples = new Examples()
  const example = examples.publishedExamples.find(
    (example: any) => example.slug === slug,
  )

  if (!example) {
    return notFound()
  }

  const content = examples.getContent(example)

  return (
    <DocsLayout frontmatter={example}>
      <MarkdownContent content={content} />
    </DocsLayout>
  )
}

Server-side rendering friendliness

Both my blog and the RSC Examples site are Next JS applications, hosted on the Vercel platform, and making heavy (almost exclusive) use of static server-side rendering. That's why both sites generally load almost instantly: most of the work was done at deploy time, while still allowing for interactivity where it's needed.

Looking at our page.tsx again, the generateStaticParams function is super easy to implement, and allows Next JS to build all of our Examples as static content at deploy time:

export function generateStaticParams() {
  const { publishedExamples } = new Examples()

  const all = publishedExamples.map((example: any) => ({
    slug: example.slug.split('/'),
  }))

  return all
}
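generateStaticParams and the page are two halves of a round trip. This assumes the page lives in a catch-all route segment (something like app/[...slug]/page.tsx, which is why params.slug is an array). One half splits each slug into path segments and the other joins them back together. Here is a minimal sketch of that mapping; toStaticParams and slugFromParams are hypothetical helper names:

```typescript
// Assumes a catch-all route segment, so params carry the slug as path segments.
type CatchAllParams = { slug: string[] }

// The shape generateStaticParams returns: one entry per page to pre-render.
function toStaticParams(slugs: string[]): CatchAllParams[] {
  return slugs.map((slug) => ({ slug: slug.split('/') }))
}

// What the page does with the params passed back in: rebuild the slug.
function slugFromParams(params: CatchAllParams): string {
  return params.slug.join('/')
}
```

For example, "promises/resolved" becomes { slug: ['promises', 'resolved'] } at build time and is joined back to "promises/resolved" when that page renders.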

Not only does this make our content cheap and easy to host, it makes it blazing fast too, while still supporting as much interactivity as we need. Even though the .mdx files are rendered server-side, we can still render rich, interactive React components that run on the client side, like this little InformAI-driven chatbot that's running live inside this page and lets you ask questions about this post:

That's a live client-side component rendered inside a .mdx file at build time inside a static, server-rendered page. It's awesome how well this all works together. To find out more, take a poke around the RSC Example GitHub repo or ping me on twitter.

Introducing InformAI - Easy & Useful AI for React apps

Mon, 26 Aug 2024 06:31:02 GMT

Most web applications can benefit from AI features, but adding AI to an existing application can be a daunting prospect. Even a moderate-sized React application can have hundreds of components, spread across dozens of pages. Sure, it's easy to tack a chatbot onto the corner of the page, but it won't be useful unless you integrate it with your app's contents.

This is where InformAI comes in. InformAI makes it easy to surface all the information that you already have in your React components to an LLM or other AI agent. With a few lines of React code, your LLM can now see exactly what your user sees, without having to train any models, implement RAG, or any other expensive setup.

InformAI is not an AI itself; it just lets you expose components and UI events via the simple <InformAI /> component. Here's how we might add AI support to a React component that shows a table of a company's firewalls:

<InformAI
  name="Firewalls Table"
  prompt="Shows the user a paginated table of firewalls and their scheduled backup configurations"
  props={{ data, page, perPage }}
/>

Under the covers, InformAI keeps track of your component as it renders and re-renders in response to the user. When the user is ready to ask the LLM a question, InformAI automatically wraps up all of that context and renders it into an LLM-friendly data format that gets sent along with the user's message to an LLM backend of your choice.
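InformAI's internals aren't shown here, so what follows is a purely hypothetical sketch of the idea in the previous paragraph. It is not InformAI's code or its actual data format. Each component publishes its latest name, prompt and props on every render. When the user sends a message, the current snapshot is rendered to text and sent ahead of the user's input. ContextStore and buildMessages are made-up names.

```typescript
// Hypothetical sketch only: NOT InformAI's implementation or wire format.
interface ComponentState {
  componentId: string // stable per mounted component
  name: string
  prompt: string
  props: Record<string, unknown>
}

class ContextStore {
  private states = new Map<string, ComponentState>()

  // Called on every render/re-render; the newest snapshot replaces the old one.
  publish(state: ComponentState): void {
    this.states.set(state.componentId, state)
  }

  // Render every current component snapshot into an LLM-friendly text block.
  toPrompt(): string {
    return Array.from(this.states.values())
      .map((s) => `Component: ${s.name}\nPurpose: ${s.prompt}\nProps: ${JSON.stringify(s.props)}`)
      .join('\n\n')
  }
}

// Package the snapshot with the user's input in a chat-style message list.
function buildMessages(store: ContextStore, userInput: string) {
  return [
    { role: 'system', content: `The user can currently see:\n\n${store.toPrompt()}` },
    { role: 'user', content: userInput },
  ]
}
```

Keying snapshots by a component ID means that a re-render (say, the user paging from page 1 to page 2) replaces the earlier snapshot rather than piling up duplicates.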

Because the LLM receives that component state along with the user's input, it can respond based on what the user can see. And because InformAI also supports React Server Components, the LLM can respond with rendered, specifically-configured components as well as a traditional text reply, allowing conversations like this:

Here the user is able to ask the LLM questions, and it's answering based partly on the information exposed to it via InformAI. In the example above, the LLM responded to the second message by returning a <BackupsTable /> component, rendered server-side and streamed to the client just as if it were text. There's a full walkthrough on how to do this alongside the Vercel AI SDK in the InformAI README.

Live Demo

InformAI is easy to add to any React app, including this blog, which is a NextJS app that happens to use a lot of React Server Components. Because InformAI works just as well with RSC as it does with traditional client-side React components, I was able to add LLM-awareness to every post on my site by adding just this one <InformAI /> tag:

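import fs from 'fs';
import matter from 'gray-matter';
import { InformAI, InformAIProvider } from 'inform-ai';
//Post, pathForPostFile and MarkdownContent are app-local (imports omitted here)

//a react server component that renders a single blog post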
export async function PostContent({ post }: { post: Post }) {
  const postFilePath = pathForPostFile(post);
  const source = fs.readFileSync(postFilePath);
  const { content } = matter(source);

  //in reality, the <InformAIProvider> is in my layout.tsx, but I show it here instead for clarity
  return (
    <InformAIProvider>
      <InformAI
        name="Blog Post Content"
        props={{ content, post }}
        prompt="Shows the blog content to the user. Also gives you the full post metadata"
      />
      <MarkdownContent content={content} />
    </InformAIProvider>
  );
}

You can pass whatever props you like to InformAI - in this case I'm passing the whole post object as well as the markdown content for the post. InformAI ships with a couple of UI components that make development easier, such as the <CurrentState /> component that you can drop anywhere in your app and see what InformAI sees:

That's a live component - expand a row to see the name, props (post and content) and prompt that we supplied in our template. Now that we've told InformAI about our component, we can integrate a chat bot like this one and are immediately able to talk to the LLM, knowing that it sees what we see:

Try asking it a question about something in this article (e.g. "what is InformAI?") and you'll get a response from the LLM that incorporates all of the context that all of your InformAI-integrated components have published.

Although InformAI's focus is on collecting intelligence from your components and exposing that to an LLM, it does also ship with a couple of basic UI components to help you get started quickly. These UI components are totally optional and you'll probably want to roll your own at some point, but adding the ChatBot above to your app can be as simple as this:

"use client";

import { ChatWrapper } from "inform-ai";
import { useActions, useUIState } from "ai/rsc";

export function ChatBot({ className }: { className?: string }) {
  const { submitUserMessage } = useActions();
  const [messages, setMessages] = useUIState();

  return (
    <ChatWrapper
      className={className}
      submitUserMessage={submitUserMessage}
      messages={messages}
      setMessages={setMessages}
    />
  );
}

See the InformAI README for a little more on how to do that, including how to set up the submitUserMessage function.

What it's Good For

InformAI allows you to rapidly and iteratively adopt deep AI integration into your new or existing React applications. Although it's generally just a couple of lines per component, it can provide value to your users even if you don't integrate your entire app with it in one go. Gradual adoption is easy, and it works across server and client-side components.

With a few lines of code the LLM can see everything your user can, including a timeline of events as the user navigates around the app. Imagine a cyber security app that tracks viruses blocked by your firewall - usually there is a lot of noise in the signal there and it can take several steps to investigate some pattern you might see in the data - with InformAI your components can emit events as your user performs their investigation, as well as information about any streamed UI components it sent in response to questions. With all of this context the AI can understand the journey that the user is embarking on, and preemptively render custom UI or fetch data to help complete the task.

InformAI is not a silver bullet: any content that is not on the screen already (or at least accessible to your React components) obviously won't be surfaced to the AI this way. Your app's AI will still benefit from RAG, model fine-tuning and tool provision, and InformAI is not a substitute for any of those. But for React developers who may not already be deep in the weeds with LLM fine-tuning, InformAI provides a great bang for the buck in getting started.

Play with it online

I have a small open source Next.js app called LANsaver that helps back up network devices like firewalls, managed switches and Home Assistant instances in case something dies and you need to restore it. I integrated InformAI into it, and you can see an online demo version at https://lansaver.edspencer.net. I left the <CurrentState /> component in place there so you can see what is being exposed to the LLM via InformAI.

LANsaver (see source on github) is pretty basic - all it really knows about are Devices, Backups and Schedules, but hopefully it provides a way to see how easy it is to integrate InformAI into your own applications. LLMs thrive on data, so apps with more meat on the bones than LANsaver will benefit more from InformAI.

Next Steps

InformAI is fairly new but has a decent amount of documentation and automated testing. Its API is relatively small and stable, and it is published as an npm module called inform-ai. Some near-term improvements include providing an easy way to tell the LLM that the user navigated to a new page (e.g. components may no longer be visible), a proper docs site and some more examples.

If you're interested in this stuff and haven't had the chance to check out the Vercel AI SDK yet, I highly recommend you do. Beyond being useful and easy to use, some of the source code is just beautiful (I spent a couple of days just admiring the streaming code recently). It's well worth following Lars Grammel to get updates first-hand as he seems to put out updates pretty much daily.

Error handling and retry with React Server Components

Tue, 16 Jul 2024 06:31:02 GMT

React Server Components are a game-changer when it comes to building large web applications without sending megabytes of JavaScript to the client. They allow you to render components on the server and stream them to the client, which can significantly improve the performance of your application.

However, React Server Components can throw errors, just like regular React components. In this article, we'll explore how to handle and recover from errors in React Server Components.

Error boundaries

In React, you can use error boundaries to catch errors that occur during rendering, in lifecycle methods, or in constructors of the whole tree below them. An error boundary is a React component that catches JavaScript errors anywhere in its child component tree and logs those errors, displaying a fallback UI instead of crashing the entire application.

To create an error boundary in React, you define a class component that implements the static getDerivedStateFromError method (to switch to a fallback UI) and/or the componentDidCatch lifecycle method (to log the error). React calls these whenever an error is thrown in the component tree below the error boundary.

Here's an example of an error boundary component:

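'use client';
import React from 'react';

type Props = { children: React.ReactNode };
type State = { hasError: boolean };

class ErrorBoundary extends React.Component<Props, State> {
  constructor(props: Props) {
    super(props);
    this.state = { hasError: false };
  }

  // Runs during render when a descendant throws: switch to the fallback UI
  static getDerivedStateFromError(): State {
    return { hasError: true };
  }

  // Runs after the error has been committed: a good place to log/report it
  componentDidCatch(error: Error, errorInfo: React.ErrorInfo) {
    console.error(error, errorInfo);
  }

  render() {
    if (this.state.hasError) {
      return <div>Something went wrong.</div>;
    }

    return this.props.children;
  }
}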

Alternatively, you can use the ErrorBoundary component from the react-error-boundary library, which provides a slightly more robust implementation of error boundaries, including support for error recovery and retrying rendering. Here's how we might use that on an RSC-rendered page:

'use client'
import { ErrorBoundary } from 'react-error-boundary'

export default function PageWithBoundary() {
  return (
    <>
      <p>
        This page demonstrates what happens when an error is thrown in a
        component with an explicit error boundary.
      </p>

      <ErrorBoundary fallback={<ErrorFallback />}>
        <ErrorComponent />
      </ErrorBoundary>
    </>
  )
}

function ErrorComponent() {
  throw new Error('Error thrown in component')

  return 'This will never be rendered'
}

function ErrorFallback() {
  return (
    <div className="text-red-700">There was an error with this content</div>
  )
}

When we render this page, we'll end up seeing something like this:

This is useful as it allows the rest of our page to render and be usable, even if a component or two throw errors. It allows us to inform the user that something went wrong, at which point they're likely to want to hit the refresh button. But there's a better way...

Retrying rendering

Wouldn't it be cool if we could allow the user to retry rendering the component that errored out? With React Server Components, we can! Kind of.

Our ideal solution here would be to allow the rest of the page to render and be interactive, while the errored component is replaced with a button that allows the user to retry rendering it. If we're going to show the user an error, it's best not to take down the whole page with it, and to give them an easy way to recover from it.

Let's see how we might implement this:

import { ErrorBoundary } from 'react-error-boundary'
import ErrorFallback from './ErrorFallback'

export default function ResettablePage() {
  return (
    <>
      <p>
        This page has a component with a 50% chance of throwing an error. If it
        does, a Reset button will appear that you can click to reset the
        component.
      </p>
      <p>
        This is useful for when you want to give the user a way to recover from
        an error without having to refresh the entire page. Refresh the page a
        few times if you don't get the error immediately.
      </p>

      <ErrorBoundary FallbackComponent={ErrorFallback}>
        <ErrorComponent />
      </ErrorBoundary>
    </>
  )
}

async function ErrorComponent() {
  // Simulate a delay so we can see the Reset button spinning
  await new Promise((resolve) => setTimeout(resolve, 1000))

  if (Math.random() > 0.5) {
    throw new Error('Error thrown in component')
  }

  return (
    <p className="border border-blue-700 p-4">
      This has a 50% chance of throwing an error, but this time it rendered
      fine.
    </p>
  )
}

OK, so we have a page that contains some content, plus a component that has a 50% chance of throwing an error. If it does, we'll show a Retry button that the user can click to retry rendering the component.

We used the ErrorBoundary component from react-error-boundary to catch the error and display the ErrorFallback component when an error occurs. The ErrorFallback component contains a button that allows the user to retry rendering the component. Here's what the ErrorFallback component looks like:

'use client'

import { startTransition, useState } from 'react'
import { useRouter } from 'next/navigation'

import Spinner from './Spinner'

export default function ErrorFallback({
  error,
  resetErrorBoundary,
}: {
  error: Error
  resetErrorBoundary: () => void
}) {
  const router = useRouter()

  //tracks the state of our reset button
  const [isResetting, setIsResetting] = useState(false)

  function retry() {
    setIsResetting(true)

    startTransition(() => {
      router.refresh()
      resetErrorBoundary()
      setIsResetting(false)
    })
  }

  return (
    <div className="border border-orange-700 p-4 text-orange-700">
      <p className="m-0 mb-2 p-0">There was an error loading this component</p>
      <button
        onClick={() => retry()}
        disabled={isResetting}
        className="button inline-flex items-center gap-4 rounded-md border bg-blue-500 px-4 py-2 text-white hover:bg-blue-600"
      >
        {isResetting ? <Spinner /> : null}
        Retry
      </button>
    </div>
  )
}

A few things to note here:

The ErrorFallback component is a client component (so is its parent - the ErrorBoundary component that we used)
We use router.refresh() to retry the rendering of the component that errored out. This actually re-renders the whole page, but to the user it looks like only the errored component is being re-rendered
We need to wrap the router.refresh() call in the new startTransition API because router.refresh() is a long-running operation that does not return a Promise, so we can't await it
We used an isResetting state variable to allow us to show a spinner while the component is being re-rendered

When we render this page, we'll see something like this:

That's a fully interactive iframe pointing to a live example on my RSC Examples site. You had a 50% chance of seeing an error, but you can hit the little refresh icon above the iframe if you got lucky/unlucky enough not to see an error.

Now, when you click the blue Retry button, our retry function within the ErrorFallback component will be called. This will set the isResetting state to true, refresh the page, reset the error boundary, and then set isResetting back to false. This will cause the ErrorComponent to be re-rendered, and with a bit of luck, it won't throw an error this time.

What actually happened here is that we reloaded the whole page, so it's not as surgical as it looks (the hint is in the router.refresh() call...). However, from the user's perspective, it feels very much like just this one single component is being retried, and existing state such as form input is maintained. This is significantly better than either crashing the whole page or forcing the user to refresh the whole page.

Conclusion

The combination of error boundaries and retrying rendering with React Server Components allows you to build robust web applications that can recover from errors gracefully. By catching errors and displaying a fallback UI, you can prevent your application from crashing and provide a better user experience.

Also, check out this excellent YouTube video by Ryan Toronto on the same subject.

Promises across the void: Streaming data with RSC

Fri, 12 Jul 2024 06:31:02 GMT

Last week we looked at how React Server Component Payloads work under the covers. Towards the end of that article I mentioned a fascinating thing that you can do with RSC: sending unresolved promises from the server to the client. When I first read that I thought it was a documentation bug, but it's actually quite real (though with some limitations).

Here's a simple example of sending a promise from the server to the client. First, here's our server-rendered component, called SuspensePage in this case:

import { Suspense } from "react";
import Table from "./table";
import { getData } from "./data";

export default function SuspensePage() {
  return (
    <div>
      <h1>Server Component</h1>
      <Suspense fallback={<div>Loading...</div>}>
        <Table dataPromise={getData(1000)} />
      </Suspense>
    </div>
  );
}

So we just imported a getData() function that returns a promise that resolves after 1 second. This simulates a call to a database or other asynchronous action. Here's our fake getData() function:

const fakeData = [
  { id: 1, name: 'Alice' },
  { id: 2, name: 'Bob' },
  { id: 3, name: 'Charlie' },
]

export async function getData(delay: number): Promise<any> {
  return new Promise((resolve) => {
    setTimeout(() => {
      resolve(fakeData)
    }, delay)
  })
}

We pass that promise to a component called Table as a prop called dataPromise. Here's that component. Note the use client directive, which marks it as a Client Component whose code is sent to and run on the client (it is still pre-rendered to HTML on the server, as we'll see later):

"use client";
import { use } from "react";

export default function Table({ dataPromise }: { dataPromise: Promise<any> }) {
  const data = use(dataPromise)

  return (
    <table className="max-w-5xl table-auto text-left">
      <thead>
        <tr>
          <th>ID</th>
          <th>Name</th>
        </tr>
      </thead>
      <tbody>
        {data.map((row: any) => (
          <tr key={row.id}>
            <td>{row.id}</td>
            <td>{row.name}</td>
          </tr>
        ))}
      </tbody>
    </table>
  )
}

dataPromise is not a good name for a prop in reality, but I call it that here to make it clear that this is a Promise, not the data itself. We don't get the actual data until that Promise resolves.

Note that although React Server Components can be async functions, we're not actually writing our server-rendered SuspensePage component using async, nor are we making the client-side Table function an async one (that's partially because async components are not yet supported on the client side, but also partly because we don't need to).

Async/await-ish via the power of `use()`

This component uses the new React use hook to wait for the promise to resolve. use accepts a Promise as an argument, and does some clever things:

If the promise is already resolved, it returns the resolved value.
If the promise is not resolved, it suspends the component and waits for the promise to resolve.
If the promise rejects, it throws an error.

Under the covers, in that second scenario (promise not yet resolved), use will actually throw the Promise, which is caught by React and used to suspend the component. This is how React knows to wait for the promise to resolve before rendering the component. That thrown Promise will be caught by the nearest Suspense boundary, which will then show the fallback until the Promise resolves.

When the Promise does eventually resolve, React will re-render the component with the resolved value. If we used use() multiple times, the pattern will repeat until all of the Promises have resolved and all of the components rendered (or until some of the Promises reject and the nearest Error Boundary renders).
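To make the suspend-and-retry loop concrete, here is a toy model in TypeScript. It is not React's implementation: current React versions throw an internal sentinel exception rather than the Promise itself. The names toyUse and renderWithToySuspense are made up for this sketch. toyUse records each Promise's outcome on the Promise object and throws the Promise while it is pending. A minimal "boundary" catches that throw, shows a fallback, waits, and then renders again.

```typescript
// Toy model of use() + Suspense. NOT React's code: it only shows the
// "throw while pending, re-render once the Promise settles" control flow.
type Tracked<T> = Promise<T> & {
  status?: 'pending' | 'fulfilled' | 'rejected'
  value?: T
  reason?: unknown
}

function toyUse<T>(promise: Promise<T>): T {
  const p = promise as Tracked<T>
  if (p.status === 'fulfilled') return p.value as T // already resolved
  if (p.status === 'rejected') throw p.reason // rejected: throw the error
  if (p.status === undefined) {
    // First sighting: record the outcome on the Promise when it settles
    p.status = 'pending'
    p.then(
      (value) => {
        p.status = 'fulfilled'
        p.value = value
      },
      (reason) => {
        p.status = 'rejected'
        p.reason = reason
      },
    )
  }
  throw p // still pending: "suspend" by throwing the Promise itself
}

// A minimal "Suspense boundary": render; if a Promise is thrown, show the
// fallback, wait for that Promise to settle, then render again.
async function renderWithToySuspense(
  render: () => string,
  fallback: string,
  log: string[],
): Promise<string> {
  for (;;) {
    try {
      const html = render()
      log.push(html)
      return html
    } catch (thrown) {
      if (!(thrown instanceof Promise)) throw thrown // a real error: error boundary territory
      log.push(fallback)
      await thrown.then(
        () => undefined,
        () => undefined,
      )
    }
  }
}
```

The same Promise object must be passed in on every render attempt (which is why, in the example above, it is created once on the server and handed to Table as a prop). If the Promise were created inside the render function, each retry would see a brand new pending Promise and suspend forever.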

Now, with use, React can suspend rendering of a component until a Promise resolves, which is a huge step forward in terms of simplifying asynchronous rendering. It does mean that any work done up until that point will be thrown away if the component suspends, but unless your Component is doing lots of heavy processing (which it should not be), this is not a big deal.

How can I start a Promise on the server and have it resolve on the client?

The key to sending a Promise from the server to the client is to not await it on the server. If you await it there, you'll send the resolved value to the client rather than the Promise itself. Because the Promise may take some time to resolve, UI rendering is blocked in the meantime and your user is left waiting.

As for how this actually works under the covers, take a look at my post on how React Server Component Payloads work, but the high level flow goes like this:

The server renders the component tree, and notices that an unresolved Promise is being passed as a prop to a client-side component.
The server assigns an internal ID to that Promise, and sends that ID to the client in place of the Promise itself.
The client renders the component tree, and notices that a Promise ID is being passed as a prop to a client-side component. This suspends rendering of that component until the resolved value arrives.
When the Promise resolves on the server side, the server renders an inline <script> tag with the resolved value, keyed on the Promise ID it generated, and the client resumes rendering with that value.

The key to all this working is that the server is streaming the HTML response to the client, and doesn't actually close that stream until it has finished rendering everything, including resolved Promises. So in our case above, the server would likely render our very basic SuspensePage component in a few milliseconds, but then keep the stream open for another second while it waits for the getData() Promise to resolve.

At that point, thanks to the streamable nature of HTML, the server can just send a bit more response HTML in the form of that <script> tag, which will trigger a next.js (in this case) callback that will update the client-side component with the resolved value.
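The four-step flow above can be modeled in a few lines. This is a toy, not the real React Flight wire format. It borrows the idea of numbered `<id>:<json>` chunks from the payloads shown in this post, uses an illustrative `$@<id>` marker for Promise references, and replaces the HTTP stream with a write callback. serialize and createClient are hypothetical names.

```typescript
// Toy model only: NOT the real React Flight protocol. Each chunk is
// "<id>:<json>"; id 0 is the initial render, "$@<id>" marks a Promise.
type Chunk = string

// Server side: swap each Promise for a "$@<id>" reference, write the initial
// chunk straight away, then write one more chunk as each Promise resolves.
async function serialize(
  props: Record<string, unknown>,
  write: (chunk: Chunk) => void,
): Promise<void> {
  let nextId = 1
  const pending: Promise<void>[] = []
  const json = JSON.stringify(props, (_key, value) => {
    if (value instanceof Promise) {
      const id = nextId++
      pending.push(value.then((resolved: unknown) => write(`${id}:${JSON.stringify(resolved)}`)))
      return `$@${id}`
    }
    return value
  })
  write(`0:${json}`)
  // Like the server keeping the HTML stream open: only finish once all resolve
  await Promise.all(pending)
}

// Client side: rebuild props, turning each "$@<id>" into a fresh Promise that
// resolves when the chunk with that id arrives.
function createClient() {
  const promises = new Map<string, Promise<unknown>>()
  const resolvers = new Map<string, (value: unknown) => void>()
  let root: Record<string, unknown> | undefined

  function promiseFor(id: string): Promise<unknown> {
    let p = promises.get(id)
    if (!p) {
      p = new Promise<unknown>((resolve) => {
        resolvers.set(id, resolve)
      })
      promises.set(id, p)
    }
    return p
  }

  function receive(chunk: Chunk): void {
    const colon = chunk.indexOf(':')
    const id = chunk.slice(0, colon)
    const value = JSON.parse(chunk.slice(colon + 1), (_key, v) =>
      typeof v === 'string' && v.startsWith('$@') ? promiseFor(v.slice(2)) : v,
    )
    if (id === '0') {
      root = value // initial render: props with placeholder Promises
    } else {
      promiseFor(id) // make sure the Promise exists, then settle it
      resolvers.get(id)?.(value)
    }
  }

  return { receive, getRoot: () => root }
}
```

The first chunk is written synchronously, so the client can rebuild its props (and render a Suspense fallback) immediately. The second chunk only arrives once the server-side Promise resolves, which in turn resolves the client's reconstituted Promise. A real implementation would also need to escape user strings that happen to start with `$@`, and to handle rejections.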

See it in action

I created a simple, live and hosted example of this on my RSC Examples site. You can see this example at https://rsc-examples.edspencer.net/promises/resolved. To view the example directly, skipping the explanation, take a look at https://rsc-examples.edspencer.net/examples/promises/resolved. If you run this curl command you can see the server response streaming in real time:

curl -D - --raw https://rsc-examples.edspencer.net/examples/promises/resolved

What you'll see there is the server sending the majority of the HTML response, pausing for 1 second, then spitting out a final <script> tag that looks like this, along with a <div> tag that we'll get to in a moment:

<script>self.__next_f.push([1,"9:[{\"id\":1,\"name\":\"Alice\"},{\"id\":2,\"name\":\"Bob\"},{\"id\":3,\"name\":\"Charlie\"}]\n"])</script>

self.__next_f.push is a next.js function that overrides the push method on the __next_f array, which under the covers fires a bunch of logic to handle whatever the server is sending. I cover that process in a lot more detail in this article about how React Server Component Payloads work, but at a high level: the ID that React generated for our dataPromise Promise was ID=9, and so when this <script> tag is executed, under the covers next.js will figure out that the resolved Promise with ID=9 needs to go back into the dataPromise prop of the Table component.
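The self.__next_f.push calls follow a common "queue now, process later" pattern. Inline scripts can run before the framework's JavaScript has loaded, so the first one creates a plain array (the `(self.__next_f=self.__next_f||[])` idiom visible in these payloads) and pushes into it. Below is a generic sketch of the runtime side of that pattern, using a hypothetical bootRuntime function. It is not Next.js's source. It only illustrates how draining the queue and then overriding push lets early and late chunks go through the same handler.

```typescript
// Generic sketch of the "queue now, process later" pattern; not Next.js source.
type Entry = [number, (string | null)?]

// Hypothetical runtime boot: handle entries pushed by inline <script> tags
// that ran before the runtime loaded, then replace push() so that later
// entries are handled the moment they arrive.
function bootRuntime(queue: Entry[], handle: (entry: Entry) => void): void {
  for (const entry of queue) handle(entry)
  queue.length = 0 // everything buffered so far has been processed

  queue.push = (...entries: Entry[]): number => {
    for (const entry of entries) handle(entry)
    return queue.length
  }
}
```

Inline scripts don't need to know whether the runtime has booted yet: they always call push, and whoever owns push at that moment decides what happens.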

Now that the promise has resolved on the server, been streamed across to the client and then re-constituted as a Promise again, the client component re-renders, this time with a resolved dataPromise and is therefore able to fully render. We really ended up with 2 Promises - one on the server, the other a reconstituted version of that Promise on the client, but in our code we can treat them as the same thing.

Now let's take a look at that <div> tag that was also sent by the server (it also has a second <script> tag tacked on there). I've formatted this slightly to make it more readable as it usually comes down in a single line:

<div hidden id="S:0">
  <table class="max-w-5xl table-auto text-left">
    <thead>
      <tr>
        <th>ID</th>
        <th>Name</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>1</td>
        <td>Alice</td>
      </tr>
      <tr>
        <td>2</td>
        <td>Bob</td>
      </tr>
      <tr>
        <td>3</td>
        <td>Charlie</td>
      </tr>
    </tbody>
  </table>
</div>
<script>
  $RC=function(b,c,e){c=document.getElementById(c);c.parentNode.removeChild(c);var a=document.getElementById(b);if(a){b=a.previousSibling;if(e)b.data="$!",a.setAttribute("data-dgst",e);else{e=b.parentNode;a=b.nextSibling;var f=0;do{if(a&&8===a.nodeType){var d=a.data;if("/$"===d)if(0===f)break;else f--;else"$"!==d&&"$?"!==d&&"$!"!==d||f++}d=a.nextSibling;e.removeChild(a);a=d}while(a);for(;c.firstChild;)e.insertBefore(c.firstChild,a);b.data="$"}b._reactRetry&&b._reactRetry()}};
  $RC("B:0","S:0")
</script>

So at the same time as the server sent the <script> tag with the resolved Promise, it also sent this <div> tag with the fully rendered table. This is a neat trick performed by React's streaming server renderer (which next.js uses here) to make the page load faster. The server-rendered HTML arrives alongside the resolved Promise, so the browser can show the table straight away, and React then hydrates it using the resolved value.

That second <script> tag just defines a function that replaces an existing element on the page with id B:0 with the element with id S:0. The S:0 element is the server-rendered table that just streamed down, and the B:0 element is a Suspense-rendered placeholder that allows React to drop this delayed content into the right place in the DOM. When the <Table> initially attempted to render on the server, but was suspended due to the unresolved Promise, it rendered a placeholder instead of the actual table, with an ID of B:0.

Limitations and gotchas

But you can't just send any Promise from the server to the client. The value that the Promise ultimately resolves to has to be serializable: a primitive like a string or number, a plain JS object or array, or a rendered React component (React's documentation lists a few more serializable types, such as Date, Map and Set). If the Promise resolves to something else, such as a class instance or a function, you'll get an error on the client side when React tries to render it.
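To make that rule concrete, here is a rough checker covering only the types discussed in this post. findUnserializable is a hypothetical helper, not a React API. It doesn't model React elements, and it is stricter than React itself: it would reject a Date, for example, which React can serialize. It walks a value and reports the first path that couldn't cross the server/client boundary.

```typescript
// Hypothetical helper, not a React API. Covers only primitives, arrays,
// plain objects and Promises; stricter than React (e.g. rejects Date).
function isPlainObject(value: object): boolean {
  return Object.getPrototypeOf(value) === Object.prototype
}

// Returns a description of the first value that couldn't cross the
// server/client boundary, or null if everything looks serializable.
function findUnserializable(value: unknown, path = 'value'): string | null {
  if (value === null || value === undefined) return null
  const t = typeof value
  if (t === 'string' || t === 'number' || t === 'boolean') return null
  if (t === 'function') return `${path} is a function`
  if (Array.isArray(value)) {
    for (let i = 0; i < value.length; i++) {
      const problem = findUnserializable(value[i], `${path}[${i}]`)
      if (problem) return problem
    }
    return null
  }
  // A Promise is fine to pass; what matters is the value it resolves to
  if (value instanceof Promise) return null
  if (t === 'object' && isPlainObject(value as object)) {
    for (const [key, child] of Object.entries(value as object)) {
      const problem = findUnserializable(child, `${path}.${key}`)
      if (problem) return problem
    }
    return null
  }
  return `${path} is not a primitive, array, plain object or Promise`
}
```

Running a check like this on a resolved value in development is one way to catch a stray class instance or callback before it reaches the client.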

I put together a second example at https://rsc-examples.edspencer.net/promises/various-datatypes that shows the ability to load a variety of different data types. Here's a video of that example in action:

Here we see strings, numbers, floats, plain objects and arrays being sent across the void, as well as a React component, which is a cool thing to be able to do. The React component is a simple one that just renders a string, but it could be anything you like. The full example is at https://rsc-examples.edspencer.net/promises/various-datatypes.

There are ways around this, but that's for another post.

There is a separate example at https://rsc-examples.edspencer.net/promises/rendering-components that focuses on just a React component being rendered without all of the other types so you can see how that works in isolation.

Decoding React Server Component Payloads

Mon, 01 Jul 2024 06:31:02 GMT

If you've spent any time playing with React Server Components, you've probably noticed a bunch of stuff like this at the bottom of your web pages:

<script>(self.__next_f=self.__next_f||[]).push([0]);self.__next_f.push([2,null])</script>
<script>self.__next_f.push([1,"1:HL[\"/_next/static/media/c9a5bc6a7c948fb0-s.p.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n2:HL[\"/_next/static/css/app/layout.css?v=1719846361489\",\"style\"]\n0:D{\"name\":\"r0\",\"env\":\"Server\"}\n"])</script>
<script>self.__next_f.push([1,"3:I[\"(app-pages-browser)/./node_modules/next/dist/client/components/app-router.js\",[\"app-pages-internals\",\"static/chunks/app-pages-internals.js\"],\"\"]\n5:I[\"(app-pages-browser)/./node_modules/next/dist/client/components/client-page.js\",[\"app-pages-internals\",\"static/chunks/app-pages-internals.js\"],\"ClientPageRoot\"]\n6:I[\"(app-pages-browser)/./app/flight/page.tsx\",[\"app/flight/page\",\"static/chunks/app/flight/page.js\"],\"default\"]\n7:I[\"(app-pages-browser)/./node_modules/next/dist/client/components/layout-router.js\",[\"app-pages-internals\",\"static/chunks/app-pages-internals.js\"],\"\"]\n8:I[\"(app-pages-browser)/./node_modules/next/dist/client/components/render-from-template-context.js\",[\"app-pages-internals\",\"static/chunks/app-pages-internals.js\"],\"\"]\nc:I[\"(app-pages-browser)/./node_modules/next/dist/client/components/error-boundary.js\",[\"app-pages-internals\",\"static/chunks/app-pages-internals.js\"],\"\"]\n4:D{\"name\":\"\",\"env\":\"Server\"}\n9:D{\"name\":\"RootLayout\",\"env\":\"Server\"}\na:D{\"name\":\"NotFound\",\"env\":\"Server\"}\na:[[\"$\",\"title\",null,{\"children\":\"404: This page could not be found.\"}],[\"$\",\"div\",null,{\"style\":{\"fontFamily\":\"system-ui,\\\"Segoe UI\\\",Roboto,Helvetica,Arial,sans-serif,\\\"Apple Color Emoji\\\",\\\"Segoe UI Emoji\\\"\",\"height\":\"100vh\",\"textAlign\":\"center\",\"display\":\"flex\",\"flexDirection\":\"column\",\"alignItems\":\"center\",\"justifyContent\":\"center\"},\"children\":[\"$\",\"div\",null,{\"children\":[[\"$\",\"style\",null,{\"dangerouslySetInnerHTML\":{\"__html\":\"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}\"}}],[\"$\",\"h1\",null,{\"className\":\"next-error-h1\",\"style\":{\"display\":\"inline-block\",\"margin\":\"0 20px 0 0\",\"padding\":\"0 23px 0 0\",\"fontSize\":24,\"fontWeight\":500,\"verticalAlign\":\"top\",\"lineHeight\":\"49px\"},\"children\":\"404\"}],[\"$\",\"div\",null,{\"style\":{\"display\":\"inline-block\"},\"children\":[\"$\",\"h2\",null,{\"style\":{\"fontSize\":14,\"fontWeight\":400,\"lineHeight\":\"49px\",\"margin\":0},\"childr"])</script>
<script>self.__next_f.push([1,"en\":\"This page could not be found.\"}]}]]}]}]]\n9:[\"$\",\"html\",null,{\"lang\":\"en\",\"children\":[\"$\",\"body\",null,{\"className\":\"__className_aaf875\",\"children\":[\"$\",\"$L7\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\"],\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L8\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":\"$a\",\"notFoundStyles\":[],\"styles\":null}]}]}]\nb:D{\"name\":\"\",\"env\":\"Server\"}\nd:[]\n0:[[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/_next/static/css/app/layout.css?v=1719846361489\",\"precedence\":\"next_static/css/app/layout.css\",\"crossOrigin\":\"$undefined\"}]],[\"$\",\"$L3\",null,{\"buildId\":\"development\",\"assetPrefix\":\"\",\"initialCanonicalUrl\":\"/flight\",\"initialTree\":[\"\",{\"children\":[\"flight\",{\"children\":[\"__PAGE__\",{}]}]},\"$undefined\",\"$undefined\",true],\"initialSeedData\":[\"\",{\"children\":[\"flight\",{\"children\":[\"__PAGE__\",{},[[\"$L4\",[\"$\",\"$L5\",null,{\"props\":{\"params\":{},\"searchParams\":{}},\"Component\":\"$6\"}]],null],null]},[\"$\",\"$L7\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\",\"flight\",\"children\"],\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L8\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":\"$undefined\",\"notFoundStyles\":\"$undefined\",\"styles\":null}],null]},[\"$9\",null],null],\"couldBeIntercepted\":false,\"initialHead\":[false,\"$Lb\"],\"globalErrorComponent\":\"$c\",\"missingSlots\":\"$Wd\"}]]\n"])</script>
<script>self.__next_f.push([1,"b:[[\"$\",\"meta\",\"0\",{\"name\":\"viewport\",\"content\":\"width=device-width, initial-scale=1\"}],[\"$\",\"meta\",\"1\",{\"charSet\":\"utf-8\"}],[\"$\",\"title\",\"2\",{\"children\":\"React Server Components Payloads\"}],[\"$\",\"meta\",\"3\",{\"name\":\"description\",\"content\":\"By Ed Spencer - edspencer.net\"}],[\"$\",\"link\",\"4\",{\"rel\":\"icon\",\"href\":\"/favicon.ico\",\"type\":\"image/x-icon\",\"sizes\":\"16x16\"}],[\"$\",\"meta\",\"5\",{\"name\":\"next-size-adjust\"}]]\n4:null\n"])</script>

You may be wondering what this all means. It's not super well documented, and all pretty bleeding-edge. It's not likely to be something you need to worry about in your day-to-day work, but if you're a curious geek like me, read on.

What you're looking at is a bunch of <script> tags automatically injected at the end of the page. The content above is copied from just about the most basic Next.js application imaginable. It consists of two components, a layout.tsx and a page.tsx:

// app/layout.tsx
import type { Metadata } from "next";
import "./globals.css";

export const metadata: Metadata = {
  title: "React Server Components Payloads",
  description: "By Ed Spencer - edspencer.net",
};

export default function RootLayout({
  children,
}: Readonly<{
  children: React.ReactNode;
}>) {
  return (
    <html lang="en">
      <body>{children}</body>
    </html>
  );
}

// app/flight/page.tsx
export default function Home() {
  return (
    <div>
      <h1>Server Component</h1>
    </div>
  );
}

In theory, this content could be rendered into the following HTML document, 237 bytes including whitespace:

<html lang="en">
  <head>
    <title>React Server Components Payloads</title>
    <meta name="description" content="By Ed Spencer - edspencer.net">
  </head>
  <body>
    <div>
      <h1>Server Component</h1>
    </div>
  </body>
</html>

Instead, we get about 5KB of stuff, mostly in the form of self.__next_f.push([1, ...]) calls. These calls push payloads into an array that Next.js uses to fetch resources and hydrate the page. The payloads are in a custom format used to communicate between the server and the client, called the RSC Payload (it's React's "Flight" format, which is why so many of the names below mention flight).

React Server Component Payloads

I spent a few hours digging through the React and Next.js source code to figure out what these payloads are and how they work. Instructive in this effort were the following source code files:

Within next/src/client/app-index.tsx we can see that the self.__next_f array is created, and its push function is overridden to call nextServerDataCallback each time __next_f.push(...) is called. Each push to that array is expected to be a 2-tuple, where the first element is a number between 0 and 3, and the second is a string of data (or undefined if the first element is 0).
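To make that pattern concrete, here's a minimal TypeScript sketch of a queue that buffers early pushes and then has its push replaced, written so it runs outside a browser. It's an illustration rather than the actual Next.js source: globalScope stands in for self, and receivedChunks is a hypothetical stand-in for the stream that the real client feeds into React's decoder.

```typescript
// Illustrative sketch of the __next_f pattern; not the actual Next.js source.
// A segment is [kind] or [kind, data], where kind is 0-3 as described above.
type FlightSegment = [number, string?];

// Hypothetical stand-in for the browser's `self`, so this also runs in Node.
const globalScope: { __next_f?: FlightSegment[] } = {};

// Hypothetical sink; the real client streams these strings into React's decoder.
const receivedChunks: string[] = [];

function nextServerDataCallback(segment: FlightSegment): void {
  const [kind, data] = segment;
  // This sketch only collects data segments (kind 1); the other kinds are ignored here.
  if (kind === 1 && data !== undefined) {
    receivedChunks.push(data);
  }
}

// Inline <script> tags can run before the client bundle, so they push into a plain array.
const queue: FlightSegment[] = (globalScope.__next_f = globalScope.__next_f || []);
queue.push([0]);
queue.push([1, '0:D{"name":"r0","env":"Server"}\n']);

// Once the bundle loads, it drains whatever is already queued...
queue.forEach(nextServerDataCallback);
// ...and replaces push so every later segment goes straight to the callback.
queue.push = (...segments: FlightSegment[]): number => {
  segments.forEach(nextServerDataCallback);
  return queue.length;
};

// A <script> arriving later in the stream is now handled immediately.
queue.push([1, '8:"promise resolved data"\n']);
console.log(receivedChunks.length); // 2
```

The nice property of this design is that the inline scripts never need to know whether the client bundle has loaded yet; they always just call push.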

We can see in both app-index.tsx and the server-side use-flight-response.tsx that the number in the first element means one of four things:

const INLINE_FLIGHT_PAYLOAD_BOOTSTRAP = 0
const INLINE_FLIGHT_PAYLOAD_DATA = 1
const INLINE_FLIGHT_PAYLOAD_FORM_STATE = 2
const INLINE_FLIGHT_PAYLOAD_BINARY = 3

In practice, most of the payloads you'll see are of type INLINE_FLIGHT_PAYLOAD_DATA (1), which can carry a variety of different types of content.

Basic format

The way the payload gets decoded depends on the value of the first element in the array. Let's focus on INLINE_FLIGHT_PAYLOAD_DATA (1) payloads, as they're the most common and the most interesting. Each payload's data string contains one or more rows, each of which is made up of the following parts:

ROW_ID: an identifier for the row, written in hexadecimal and followed by a colon
ROW_TAG: an optional tag that identifies the type of row (plain JSON rows have none)
ROW_DATA: the actual row data
NEW_LINE: a newline character (\n) that marks the end of the row
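Here's a small TypeScript sketch that splits a complete data string into rows along those lines. It's my own simplification rather than React's decoder: parseRows is a hypothetical name, it keeps ROW_ID as the raw hexadecimal string, it treats any run of uppercase letters after the colon as the tag (so HL comes out as a single tag, matching the breakdowns in this post), and it assumes every row ends in a newline and never spans two pushes.

```typescript
interface FlightRow {
  ROW_ID: string;              // hexadecimal, e.g. "0", "9", "a"
  ROW_TAG: string | undefined; // e.g. "HL", "I", "D"; undefined for plain JSON rows
  ROW_DATA: unknown;           // the JSON that follows the tag
}

// Hypothetical helper: splits a complete, newline-terminated data string into rows.
function parseRows(payload: string): FlightRow[] {
  const rows: FlightRow[] = [];
  for (const line of payload.split("\n")) {
    if (line === "") continue; // the trailing "\n" leaves an empty string at the end
    const match = /^([0-9a-f]+):([A-Z]*)(.*)$/.exec(line);
    if (!match) throw new Error(`Unrecognized row: ${line}`);
    const [, id, tag, data] = match;
    rows.push({ ROW_ID: id, ROW_TAG: tag === "" ? undefined : tag, ROW_DATA: JSON.parse(data) });
  }
  return rows;
}

// The data string from the first injected script, with the JS string escaping removed.
const firstScript =
  '1:HL["/_next/static/media/c9a5bc6a7c948fb0-s.p.woff2","font",{"crossOrigin":"","type":"font/woff2"}]\n' +
  '2:HL["/_next/static/css/app/layout.css?v=1719846361489","style"]\n' +
  '0:D{"name":"r0","env":"Server"}\n';

for (const row of parseRows(firstScript)) {
  console.log(row.ROW_ID, row.ROW_TAG, JSON.stringify(row.ROW_DATA));
}
```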

Let's take a look at that first injected script:

self.__next_f.push([1,"1:HL[\"/_next/static/media/c9a5bc6a7c948fb0-s.p.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n2:HL[\"/_next/static/css/app/layout.css?v=1719846361489\",\"style\"]\n0:D{\"name\":\"r0\",\"env\":\"Server\"}\n"])

Here we see that this is a payload of type 1, and its data is a string containing 3 rows, each terminated by a newline character. The first row is a hint for a font, the second is a hint for a stylesheet, and the third is a D row whose data is a JSON object with name and env properties.

Let's rearrange that to make it a bit easier to understand:

{
  "rows": [
    {
      "ROW_ID": 1,
      "ROW_TAG": "HL",
      "ROW_DATA": ["/_next/static/media/c9a5bc6a7c948fb0-s.p.woff2", "font", {"crossOrigin": "", "type": "font/woff2"}]
    },
    {
      "ROW_ID": 2,
      "ROW_TAG": "HL",
      "ROW_DATA": ["/_next/static/css/app/layout.css?v=1719846361489", "style"]
    },
    {
      "ROW_ID": 0,
      "ROW_TAG": "D",
      "ROW_DATA": {"name": "r0", "env": "Server"}
    }
  ]
}

OK, that's a bit more interesting. We actually got 3 rows of data in that first <script> tag. Two of them have the tag HL; these are "hints", which are ultimately turned into <link> tags and similar to load CSS, fonts, JS and other resources. The third is tagged D, and as far as I can tell from React's client code, D rows carry development-only debug information: here, the name of a server component and the environment it ran in. The row IDs are not in order, but that doesn't seem to matter very much.

2KB at a time

A wrinkle here is that the payloads appear to be limited to 2KB (2048 characters) each. Anything larger gets split across multiple payloads, as we can see in the next calls to __next_f.push, which look like this:

self.__next_f.push([1,"3:I[\"(app-pages-browser)/./node_modules/next/dist/client/components/app-router.js\",[\"app-pages-internals\",\"static/chunks/app-pages-internals.js\"],\"\"]\n5:I[\"(app-pages-browser)/./node_modules/next/dist/client/components/client-page.js\",[\"app-pages-internals\",\"static/chunks/app-pages-internals.js\"],\"ClientPageRoot\"]\n6:I[\"(app-pages-browser)/./app/flight/page.tsx\",[\"app/flight/page\",\"static/chunks/app/flight/page.js\"],\"default\"]\n7:I[\"(app-pages-browser)/./node_modules/next/dist/client/components/layout-router.js\",[\"app-pages-internals\",\"static/chunks/app-pages-internals.js\"],\"\"]\n8:I[\"(app-pages-browser)/./node_modules/next/dist/client/components/render-from-template-context.js\",[\"app-pages-internals\",\"static/chunks/app-pages-internals.js\"],\"\"]\nc:I[\"(app-pages-browser)/./node_modules/next/dist/client/components/error-boundary.js\",[\"app-pages-internals\",\"static/chunks/app-pages-internals.js\"],\"\"]\n4:D{\"name\":\"\",\"env\":\"Server\"}\n9:D{\"name\":\"RootLayout\",\"env\":\"Server\"}\na:D{\"name\":\"NotFound\",\"env\":\"Server\"}\na:[[\"$\",\"title\",null,{\"children\":\"404: This page could not be found.\"}],[\"$\",\"div\",null,{\"style\":{\"fontFamily\":\"system-ui,\\\"Segoe UI\\\",Roboto,Helvetica,Arial,sans-serif,\\\"Apple Color Emoji\\\",\\\"Segoe UI Emoji\\\"\",\"height\":\"100vh\",\"textAlign\":\"center\",\"display\":\"flex\",\"flexDirection\":\"column\",\"alignItems\":\"center\",\"justifyContent\":\"center\"},\"children\":[\"$\",\"div\",null,{\"children\":[[\"$\",\"style\",null,{\"dangerouslySetInnerHTML\":{\"__html\":\"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}\"}}],[\"$\",\"h1\",null,{\"className\":\"next-error-h1\",\"style\":{\"display\":\"inline-block\",\"margin\":\"0 20px 0 0\",\"padding\":\"0 23px 0 
0\",\"fontSize\":24,\"fontWeight\":500,\"verticalAlign\":\"top\",\"lineHeight\":\"49px\"},\"children\":\"404\"}],[\"$\",\"div\",null,{\"style\":{\"display\":\"inline-block\"},\"children\":[\"$\",\"h2\",null,{\"style\":{\"fontSize\":14,\"fontWeight\":400,\"lineHeight\":\"49px\",\"margin\":0},\"childr"])

self.__next_f.push([1,"en\":\"This page could not be found.\"}]}]]}]}]]\n9:[\"$\",\"html\",null,{\"lang\":\"en\",\"children\":[\"$\",\"body\",null,{\"className\":\"__className_aaf875\",\"children\":[\"$\",\"$L7\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\"],\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L8\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":\"$a\",\"notFoundStyles\":[],\"styles\":null}]}]}]\nb:D{\"name\":\"\",\"env\":\"Server\"}\nd:[]\n0:[[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/_next/static/css/app/layout.css?v=1719846361489\",\"precedence\":\"next_static/css/app/layout.css\",\"crossOrigin\":\"$undefined\"}]],[\"$\",\"$L3\",null,{\"buildId\":\"development\",\"assetPrefix\":\"\",\"initialCanonicalUrl\":\"/flight\",\"initialTree\":[\"\",{\"children\":[\"flight\",{\"children\":[\"__PAGE__\",{}]}]},\"$undefined\",\"$undefined\",true],\"initialSeedData\":[\"\",{\"children\":[\"flight\",{\"children\":[\"__PAGE__\",{},[[\"$L4\",[\"$\",\"$L5\",null,{\"props\":{\"params\":{},\"searchParams\":{}},\"Component\":\"$6\"}]],null],null]},[\"$\",\"$L7\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\",\"flight\",\"children\"],\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L8\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":\"$undefined\",\"notFoundStyles\":\"$undefined\",\"styles\":null}],null]},[\"$9\",null],null],\"couldBeIntercepted\":false,\"initialHead\":[false,\"$Lb\"],\"globalErrorComponent\":\"$c\",\"missingSlots\":\"$Wd\"}]]\n"])

What we're looking at here are 14 rows of data, sent as a single payload but split into two chunks. Whereas the data string in the first example ended with a \n, the first chunk here ends in the middle of a word. That's because the payload was split into two parts, with the second part sent as a separate push. If you're not into horizontal scrolling, the end of that first line above is:

\":0},\"childr"])

And the start of the second line is:

self.__next_f.push([1,"en\":\"This page could not be found.\"}]}]]}]}]]\n

Notice that the string content of the second one starts with "en", completing the childr at the end of the first: the word children has been split across these two <script> tags. If we pull up the browser console and check the length of the first chunk's string (the one ending in \"childr"]), we see that it's 2048 characters long, which is a pretty notable number.
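Since a row can straddle two pushes, whatever decodes these strings has to buffer text until it sees the newline that ends a row. Here's a hedged TypeScript sketch of that idea. RowBuffer and toChunks are hypothetical helpers of my own, and the default chunkSize of 2048 simply mirrors the length measured above rather than a value read from the React or Next.js source. The example uses a chunk size of 10 so that children gets cut into childr and en, just like in the real payload.

```typescript
// Hypothetical helper: collects chunks and only emits rows once their newline arrives.
class RowBuffer {
  private pending = "";

  // Returns the complete rows (without their trailing "\n") made available by this chunk.
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    this.pending = parts.pop() ?? ""; // whatever follows the last newline is still incomplete
    return parts;
  }
}

// Split a payload into fixed-size chunks, the way the two <script> tags above were split.
function toChunks(payload: string, chunkSize = 2048): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < payload.length; i += chunkSize) {
    chunks.push(payload.slice(i, i + chunkSize));
  }
  return chunks;
}

// A tiny chunk size, so "children" is cut into "childr" + "en".
const payload = '1:{"children":"a"}\n2:{"children":"b"}\n';
const buffer = new RowBuffer();
const rows: string[] = [];
for (const chunk of toChunks(payload, 10)) {
  rows.push(...buffer.push(chunk));
}
console.log(rows); // [ '1:{"children":"a"}', '2:{"children":"b"}' ]
```

Concatenating first and splitting second is what keeps the half-word at a chunk boundary from ever being treated as a row on its own.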

So if we were to parse the whole payload into our JSON format, we'd end up with something like this:

{
  "rows": [
    {
      "ROW_ID": 3,
      "ROW_TAG": "I",
      "ROW_DATA": ["(app-pages-browser)/./node_modules/next/dist/client/components/app-router.js", ["app-pages-internals", "static/chunks/app-pages-internals.js"], ""]
    },
    {
      "ROW_ID": 5,
      "ROW_TAG": "I",
      "ROW_DATA": ["(app-pages-browser)/./node_modules/next/dist/client/components/client-page.js", ["app-pages-internals", "static/chunks/app-pages-internals.js"], "ClientPageRoot"]
    },
    {
      "ROW_ID": 6,
      "ROW_TAG": "I",
      "ROW_DATA": ["(app-pages-browser)/./app/flight/page.tsx", ["app/flight/page", "static/chunks/app/flight/page.js"], "default"]
    },
    {
      "ROW_ID": 7,
      "ROW_TAG": "I",
      "ROW_DATA": ["(app-pages-browser)/./node_modules/next/dist/client/components/layout-router.js", ["app-pages-internals", "static/chunks/app-pages-internals.js"], ""]
    },
    {
      "ROW_ID": 8,
      "ROW_TAG": "I",
      "ROW_DATA": ["(app-pages-browser)/./node_modules/next/dist/client/components/render-from-template-context.js", ["app-pages-internals", "static/chunks/app-pages-internals.js"], ""]
    },
    {
      "ROW_ID": "c",
      "ROW_TAG": "I",
      "ROW_DATA": ["(app-pages-browser)/./node_modules/next/dist/client/components/error-boundary.js",["app-pages-internals", "static/chunks/app-pages-internals.js"],""]
    },
    {
      "ROW_ID": 4,
      "ROW_TAG": "D",
      "ROW_DATA": {"name": "", "env": "Server"}
    },
    {
      "ROW_ID": 9,
      "ROW_TAG": "D",
      "ROW_DATA": {"name": "RootLayout", "env": "Server"}
    },
    {
      "ROW_ID": "a",
      "ROW_TAG": "D",
      "ROW_DATA": {"name": "NotFound", "env": "Server"}
    },
    {
      "ROW_ID": "a",
      "ROW_TAG": undefined,
      "ROW_DATA": [["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]]
    },
    {
      "ROW_ID": 9,
      "ROW_TAG": undefined,
      "ROW_DATA": ["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_aaf875","children":["$","$L7",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L8",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$a","notFoundStyles":[],"styles":null}]}]}]
    },
    {
      "ROW_ID": "b",
      "ROW_TAG": "D",
      "ROW_DATA": {"name": "", "env": "Server"}
    },
    {
      "ROW_ID": "d",
      "ROW_TAG": undefined,
      "ROW_DATA": []
    },
    {
      "ROW_ID": 0,
      "ROW_TAG": undefined,
      "ROW_DATA": [[["$","link","0",{"rel":"stylesheet","href":"/_next/static/css/app/layout.css?v=1719846361489","precedence":"next_static/css/app/layout.css","crossOrigin":"$undefined"}]],["$","$L3",null,{"buildId":"development","assetPrefix":"","initialCanonicalUrl":"/flight","initialTree":["",{"children":["flight",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],"initialSeedData":["",{"children":["flight",{"children":["__PAGE__",{},[["$L4",["$","$L5",null,{"props":{"params":{},"searchParams":{}},"Component":"$6"}]],null],null]},["$","$L7",null,{"parallelRouterKey":"children","segmentPath":["children","flight","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L8",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined","styles":null}],null]},["$9",null],null],"couldBeIntercepted":false,"initialHead":[false,"$Lb"],"globalErrorComponent":"$c","missingSlots":"$Wd"}]]
    }
  ]
}

There are a few interesting things going on here. Six of the rows are of type I, which are "imports". Each one appears to describe a JavaScript module the page needs: a module ID, the chunks to load for it, and the export to use. Some of the row IDs appear twice: 9 and a each get a D row, followed later by an untagged row carrying the actual content. The rows are out of order, but again this doesn't seem to matter. There are quite a few duplicated strings, but gzip will do a good job with those.
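If we take the I rows at face value, each one's data is a three-element array. Here's a TypeScript sketch that names those parts. The field names, the ClientReference interface and parseImportRow are my own hypothetical labels inferred from the payload above, and reading the middle array as [chunkName, chunkPath] pairs is an assumption based on how the paths line up.

```typescript
// Hypothetical shape for an I ("import") row, inferred from the payload rather than React's types.
interface ClientReference {
  moduleId: string;                         // e.g. "(app-pages-browser)/./app/flight/page.tsx"
  chunks: { name: string; path: string }[]; // assumed to be [name, path] pairs
  exportName: string;                       // e.g. "default" or "ClientPageRoot"; empty for some rows
}

function parseImportRow(data: [string, string[], string]): ClientReference {
  const [moduleId, flatChunks, exportName] = data;
  const chunks: { name: string; path: string }[] = [];
  // Walk the flat list two entries at a time.
  for (let i = 0; i + 1 < flatChunks.length; i += 2) {
    chunks.push({ name: flatChunks[i], path: flatChunks[i + 1] });
  }
  return { moduleId, chunks, exportName };
}

// Row 6 from the payload above.
const pageImport = parseImportRow([
  "(app-pages-browser)/./app/flight/page.tsx",
  ["app/flight/page", "static/chunks/app/flight/page.js"],
  "default",
]);
console.log(pageImport.chunks[0].path); // static/chunks/app/flight/page.js
```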

It looks like Next.js's default 404 page is inlined in the response here, which is interesting. Finally, the whole thing is sent in the same HTTP response as the page, so no extra HTTP requests are needed to fetch this data (though of course we will still make a few requests for the resources referenced in those I rows).

Opaque? Yeah kinda, but it is just about scrutable if you have time to wade through the various source files. Let's look at a couple more examples.

Using Suspense

Let's swap out the page.tsx file for one that uses Suspense. Note that this is an async function (more on that later):

import { Suspense } from "react";

async function getData(): Promise<string> {
  return new Promise((resolve) => {
    setTimeout(() => {
      resolve("resolved data");
    }, 1000);
  });
}

export default async function SuspensePage() {
  const data = await getData();
  return <Suspense fallback={<div>Loading...</div>}>{data}</Suspense>;
}

Our HTML file looks largely the same, but has this additional payload at the end:

<script>self.__next_f.push([1,"d:\"$Sreact.suspense\"\n5:[\"$\",\"$d\",null,{\"fallback\":[\"$\",\"div\",null,{\"children\":\"Loading...\"}],\"children\":\"resolved data\"}]\n"])</script>

Or, slightly reformatted:

{
  "rows": [
    {
      "ROW_ID": "d",
      "ROW_TAG": undefined,
      "ROW_DATA": "$Sreact.suspense"
    },
    {
      "ROW_ID": 5,
      "ROW_TAG": undefined,
      "ROW_DATA": ["$","$d",null,{"fallback":["$","div",null,{"children":"Loading..."}],"children":"resolved data"}]
    }
  ]
}

Looking at the source, we can see that the $S prefix in this first row tells the client to resolve the rest of the string via Symbol.for, giving the react.suspense symbol. That symbol is the value React itself uses as the type of <Suspense> elements, so row d effectively says "this is Suspense".

Row ID 5 then defines a specification for a React element that will be rendered on the client. It's a Suspense component with a fallback of a div that says "Loading..." and a child of "resolved data". This is the data that was resolved by the getData function in the SuspensePage component. The important bit is the last element there, which is the props that will be passed to the component when it's rendered, via a call to createElement.

It looks like the $d is an internal mapping that will resolve to a React Suspense component. Note that if you watched this HTML response streaming in, you'd see all of the <script> tags being rendered pretty much immediately, with the exception of this final one that defines the Suspense component, which comes in 1000ms later per our setTimeout. What this means is that we don't actually see the "Loading..." text in the browser, because the Suspense component is only rendered once the data is available. Take a look at my post on async RSC and Suspense for more on that.
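Strings beginning with $ turn up all over these payloads: $d, $Sreact.suspense, $L7, $undefined, $Wd and so on. Here's a TypeScript sketch of how a handful of those prefixes can be read, based on my reading of React's client code and the payloads in this post. decodeDollarString and the Decoded type are hypothetical, the sketch only describes each reference rather than resolving it, and React handles many more prefixes than these. Note that the IDs are hexadecimal, so $d points at row d, which is 13 in decimal.

```typescript
// Hypothetical decoder for a handful of "$"-prefixed strings seen in this post.
// Prefix meanings reflect my reading of React's client; React supports many more.
type Decoded =
  | { kind: "element" }                // "$": first slot of ["$", type, key, props]
  | { kind: "symbol"; key: string }    // "$Sreact.suspense": React calls Symbol.for(key)
  | { kind: "lazy"; id: number }       // "$L7": lazily resolved row 7
  | { kind: "promise"; id: number }    // "$@8": a promise settled by row 8
  | { kind: "set"; id: number }        // "$Wd": a Set built from row d
  | { kind: "undefined" }              // "$undefined"
  | { kind: "ref"; id: number }        // "$d", "$a": the value of that row
  | { kind: "string"; value: string }; // anything else, including "$$" escapes

function decodeDollarString(s: string): Decoded {
  if (s === "$") return { kind: "element" };
  if (s.charAt(0) !== "$") return { kind: "string", value: s };
  if (s === "$undefined") return { kind: "undefined" };
  const rest = s.slice(2);
  switch (s.charAt(1)) {
    case "$": return { kind: "string", value: s.slice(1) }; // "$$x" is a literal "$x"
    case "S": return { kind: "symbol", key: rest };
    case "L": return { kind: "lazy", id: parseInt(rest, 16) };
    case "@": return { kind: "promise", id: parseInt(rest, 16) };
    case "W": return { kind: "set", id: parseInt(rest, 16) };
    default: return { kind: "ref", id: parseInt(s.slice(1), 16) }; // row IDs are hex
  }
}

console.log(decodeDollarString("$d")); // { kind: 'ref', id: 13 }
console.log(decodeDollarString("$Sreact.suspense")); // { kind: 'symbol', key: 'react.suspense' }
```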

Using a Promise

One other thing we can do is pass an unresolved Promise from a server component to a client component. This will allow us to see our loading UI. Let's swap out the page.tsx file for one that does this:

import { Suspense } from "react";
import ClientPromise from "./component";

async function getData(): Promise<string> {
  return new Promise((resolve) => {
    setTimeout(() => {
      resolve("promise resolved data");
    }, 1000);
  });
}

export default function SuspensePage() {
  return (
    <div>
      <h1>Server Component</h1>
      <Suspense fallback={<div>Loading...</div>}>
        <ClientPromise dataPromise={getData()} />
      </Suspense>
    </div>
  );
}

Note that we removed the async from the SuspensePage function, and we're passing the unresolved Promise directly into our child ClientPromise component, which looks like this (note the "use client" directive, which makes it a client component):

"use client";
import { use } from "react";

export default function ClientPromise({ dataPromise }: { dataPromise: Promise<string> }) {
  console.log(dataPromise);
  const data = use(dataPromise);

  return <p>{data}</p>;
}

Neither our server component nor our client component functions are defined as async, but via the magic of the new use hook we still get asynchronous behavior.

Just about the only reason to dig into payloads like this is curiosity, though it may help you debug something if you're having trouble with Suspense or Server Components in Next.js.

That's kinda crazy. We just sent an unresolved Promise over the wire from the server to the client, and it... worked? What's going on here? Let's look at the payload (snipped for clarity):

<script>self.__next_f.push([1,"... snip {\"dataPromise\":\"$@8\"}]}]]}]\n snip ...])</script>

In this little snippet we see a key called dataPromise (the name of our prop), which has a value of "$@8". A little later we get this:

<script>self.__next_f.push([1,"8:\"promise resolved data\"\n"])</script>
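Here's a simplified TypeScript model of the bookkeeping that makes this work, under my own assumptions rather than React's implementation. Each row ID maps to a chunk that starts out pending; the "$@8" reference in the props hands out the chunk for ID 8 before its data exists, and the later 8:... row fulfils it. Chunk, getChunk, resolvePromiseReference and resolveRow are hypothetical names. As far as I can tell, React's own client similarly tracks rows as chunks with a status field, which is also what lets use() read a value that has already arrived.

```typescript
type Listener = (value: unknown) => void;

// A minimal record for one row ID; hypothetical, not React's actual Chunk type.
interface Chunk {
  status: "pending" | "fulfilled";
  value: unknown;
  listeners: Listener[];
}

const chunks: { [id: string]: Chunk } = {};

// Returns the chunk for a row ID, creating a pending one if this ID hasn't been seen yet.
function getChunk(id: string): Chunk {
  let chunk = chunks[id];
  if (!chunk) {
    chunk = { status: "pending", value: undefined, listeners: [] };
    chunks[id] = chunk;
  }
  return chunk;
}

// Called for a props value like "$@8": hand out the (possibly still pending) chunk for ID 8.
function resolvePromiseReference(ref: string): Chunk {
  return getChunk(ref.slice(2)); // strip the "$@" prefix
}

// Called when a row such as 8:"promise resolved data" arrives later in the stream.
function resolveRow(id: string, json: string): void {
  const chunk = getChunk(id);
  chunk.status = "fulfilled";
  chunk.value = JSON.parse(json);
  chunk.listeners.forEach((listener) => listener(chunk.value));
  chunk.listeners = [];
}

// The server's props arrive first and contain only a reference...
const dataPromise = resolvePromiseReference("$@8");
dataPromise.listeners.push((value) => console.log("resolved:", value));
console.log(dataPromise.status); // pending

// ...then, a second later, the row for ID 8 turns up and settles it.
resolveRow("8", '"promise resolved data"');
console.log(dataPromise.status, dataPromise.value); // fulfilled promise resolved data
```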

There's our number 8 again. Watching the response stream in, we can see that this last <script> tag arrived a second after all the others, and it contains the resolved data from our Promise. React's streaming renderer then updates the page via this little gem:

<div hidden id="S:0">
  <p>promise resolved data</p>
</div>
<script>$RC=function(b,c,e){c=document.getElementById(c);c.parentNode.removeChild(c);var a=document.getElementById(b);if(a){b=a.previousSibling;if(e)b.data="$!",a.setAttribute("data-dgst",e);else{e=b.parentNode;a=b.nextSibling;var f=0;do{if(a&&8===a.nodeType){var d=a.data;if("/$"===d)if(0===f)break;else f--;else"$"!==d&&"$?"!==d&&"$!"!==d||f++}d=a.nextSibling;e.removeChild(a);a=d}while(a);for(;c.firstChild;)e.insertBefore(c.firstChild,a);b.data="$"}b._reactRetry&&b._reactRetry()}};$RC("B:0","S:0")</script>

Earlier in the response, Suspense had returned this fallback HTML (note the B:0 ID):

<!--$?--><template id="B:0"></template><div>Loading...</div><!--/$-->

Right after the <div hidden id="S:0"> block comes a <script> defining a function called $RC. It removes everything between the <!--$?--> marker and its closing <!--/$--> (the B:0 template and the Loading... fallback), moves the children of the S:0 element into their place, and flips the marker to <!--$--> to record that the boundary is complete. This is how the resolved data from the Promise is injected into the page. It's not magic at all, just low-level DOM manipulation.

So in summary, React rendered the fallback, gave it an ID B:0, did some bookkeeping to tell itself that a Promise with ID=8 would be resolving at some point, then when that promise did resolve, it rendered it into a hidden div with ID S:0, as well as into a <script> tag keyed on the Promise ID so that hydration can work, and then executed some inline JavaScript to replace B:0 with S:0.
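For the curious, here's a DOM-free TypeScript model of what that minified $RC function does, which makes the steps easier to follow. It's my own rewrite under simplifying assumptions: nodes are plain objects in a single sibling list, the error path that marks a boundary with $! is left out, and SketchNode and completeBoundary are hypothetical names.

```typescript
// Hypothetical, DOM-free stand-in for the nodes $RC works with.
type SketchNode =
  | { type: "comment"; data: string }
  | { type: "element"; tag: string; id?: string; text?: string };

// Replace the fallback between <!--$?--> and <!--/$--> with the streamed-in content.
function completeBoundary(siblings: SketchNode[], boundaryId: string, content: SketchNode[]): void {
  let templateIndex = -1;
  for (let i = 0; i < siblings.length; i++) {
    const n = siblings[i];
    if (n.type === "element" && n.id === boundaryId) templateIndex = i;
  }
  if (templateIndex < 1) return; // no template, or no "$?" marker before it
  const start = siblings[templateIndex - 1]; // the <!--$?--> comment

  // Walk forward from the template, skipping nested boundaries, to the matching <!--/$-->.
  let depth = 0;
  let end = templateIndex;
  for (; end < siblings.length; end++) {
    const n = siblings[end];
    if (n.type !== "comment") continue;
    if (n.data === "/$") {
      if (depth === 0) break;
      depth--;
    } else if (n.data === "$" || n.data === "$?" || n.data === "$!") {
      depth++;
    }
  }

  // Drop the template and fallback, insert the real content, and mark the boundary complete.
  siblings.splice(templateIndex, end - templateIndex, ...content);
  if (start.type === "comment") start.data = "$";
}

// <!--$?--><template id="B:0"></template><div>Loading...</div><!--/$-->
const body: SketchNode[] = [
  { type: "comment", data: "$?" },
  { type: "element", tag: "template", id: "B:0" },
  { type: "element", tag: "div", text: "Loading..." },
  { type: "comment", data: "/$" },
];
completeBoundary(body, "B:0", [{ type: "element", tag: "p", text: "promise resolved data" }]);
console.log(body.map((n) => (n.type === "comment" ? `<!--${n.data}-->` : `<${n.tag}>`)).join(""));
// <!--$--><p><!--/$-->
```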

Thoughts and Further Reading

It's totally not necessary to deeply understand how the cogs go round here, but it is fascinating. RSC Payloads are a little arcane and hard to understand, but ultimately they're not magic. I'm not in love with how complex and proprietary this stuff is, but that won't stop me using it. As it matures I'm sure there will be more articles and documentation explaining how and why it works, but for now there's not a lot to go on other than random blog posts like this one.

Some posts and resources that I found really useful in grokking this stuff are:

There's actually quite a bit more to RSC Payloads than I covered here, but this article is long enough already. I'll write a shorter one soon about Promises and the constraints on what you can and cannot send this way via RSC Payloads.

Teams using Next.js and Vercel have an advantage

Thu, 27 Jun 2024 16:31:43 GMT

During my time at Palo Alto Networks, I worked mostly on a product called AutoFocus. It helped cyber security research teams analyze files traversing our firewalls for signs of malware. It was pretty cool: a large React application fronting a bunch of disparate backend services and databases scattered around.

One thing that was difficult was deploying our software. We were on a roughly 3-month release cycle to begin with, which meant several things:

Out-of-band bug fix releases were expensive
We didn't get much practice deploying, so when we did, it was a team effort, error-prone and slow.
Trying to estimate and scope 3 months of work for a team of 10 is a fool's errand

Deployment meant getting most of the team into a war room, manually uploading build files to various places, doing a sort of canary deploy, seeing if things seemed ok, then rolling out to the rest of the world. Sometimes we decided to roll out architectural changes to reverse proxies and things at the same time, just for fun.

When I became engineering manager of the UI team for AutoFocus, my top priority was to change all that. We spent 6 months of solid effort modernizing how we built our software, culminating in fully automated CICD to production, staging and unlimited dev environments. It was awesome. It cost us 6 months to get what Vercel gives you out of the box. Don't build it yourself.

CICD

It's 2024. If you're not doing CICD you're seriously behind the curve at this point. The top 5 things that prevent engineers from actually building value for your product are, in no particular order:

Meetings
Deploying things
Context switching
Low quality codebase
Crappy hardware (see how to do it right)

CICD takes care of that second item, by making deployment a non-event. These days, CICD is so easy to achieve with the right technology selections that I'd advise making it one of the first things you do when you start a new project. If you're not automatically deploying code to some environment by the end of the first day, you've probably left it longer than you ought to.

Review Apps and the joy of unlimited Environments

Your application needs to run in a bunch of different places. There is the canonical instance your users mainly use, which we usually call production. Then there are instances of your application running on your developers' laptops, which we call development.

It used to be a common pattern to have a single canonical staging environment, a place where one could deploy code before putting it in production. This is generally a poor approach these days. A single staging instance always suffers from contention: multiple developers want to deploy code there, PMs and sales folks want to demo from it, QA folks want to test against it, and so on. Instead, you need to set up your application to run in arbitrarily many different environments.

In a setup like this, production, demo and other persistent environments are nothing special compared to any other environment. But in addition to this, it is enormously powerful to have the tip of every branch continually deployed to a unique environment. Some people call these ephemeral environments Review Apps, others Preview Deployments. They are automatically created when a branch is first pushed, and automatically destroyed when the branch is deleted.

Suddenly all of the stakeholders for your application can visit a predictable URL like https://my-branch-name.myapp.com and see exactly what the application will look like when that code is deployed to production. Automated integration tests can run against these environments, with the results automatically attached to the PR.

Prerequisites and what Vercel does for you

Why did it take us 6 months to roll this stuff ourselves? Well, to pull this off, you need a bunch of things in place:

Infrastructure as Code (IaC)
Reliable and trustable automated testing
CICD plumbing code (GitHub or GitLab Actions, Jenkins, etc)
DNS for your Review Apps
SSL cert generation for your environments

Some of this will depend on the complexity of your application. CICD pipeline development is fun, but it's time-consuming and usually takes hundreds of commits and pushes to see if the machine is working as expected. It's a lot of work to get right, and encompasses quite a diverse range of skills. Certainly I had to learn a lot about Kubernetes, Docker, Terraform and a bunch of other stuff to get it right, as well as orchestrating DNS entries, SSL certs, ingress controllers, and so on.

Vercel effectively handles the CICD plumbing, DNS hookup and SSL cert generation for you. It does the branch deployment to a Review App environment with an SSL cert and DNS entry out of the box. That's a huge win.

You're still on the hook for automated testing that you actually trust. If you can keep your app architecture simple then you don't actually need to define any IaC; the key is more that you don't need a human in the loop to create or destroy environments. As far as testing goes, best in class is a set of integration tests that execute against a controlled dataset, running on every push to every branch.

More complex architectures

There is a point at which this starts to break down. Getting simple apps up and running is very easy, but incorporating a more complex architecture can be difficult to the point of maddening (Heroku suffers spectacularly from this; some of my most inventive cursing ever emanated from trying to deploy to Heroku as part of a larger environment deployment). The simpler you can keep your application architecture, the easier it is to build on top of the Vercel platform.

Sometimes that's not realistic though. AutoFocus, for example, relied on a collection of backend services, databases, and other things that are not easily deployable to Vercel. Often there will be petabytes of data in the picture, sometimes attached to written promises that it won't leave your own data center. That massive Elasticsearch cluster that holds 10 years of customer data isn't getting migrated into Vercel any time soon.

So how do we deal with this? It will depend on your application, but the general solution is to split your front-end out from the backend stuff. Keep that front end deployable to Vercel and keep all of the productivity benefits that come with it. If your backend is just a relational database and maybe some object storage, keep it all together and deploy to Vercel. If it's significantly more complex than that, split the frontend (and frontend-adjacent) stuff into its own repo so your frontend team can keep moving fast and leverage the productivity benefits of Vercel.

Your backend can't be an afterthought, though. It needs to be IaC, and it almost certainly needs some kind of controlled dataset. Ideally you have a 1:1 mapping between each frontend environment and a pristine backend environment, so that you can run integration tests against the entire system. You can still deploy the UI-centric portion of your application to Vercel, using Deploy Hooks (URLs that trigger a Vercel deployment when requested) to tie your frontend deployments to your backend deployments.

A short dev loop

UI development benefits enormously from a sub-second developer loop. Within a second of hitting save on a file, I expect to see the following:

Updated UI rendered in my browser
Test suite execution and results

Messing around with getting CSS right is a miserable process if I have to keep switching between windows, hitting refresh in a browser or, far worse, building something first. With the proper hardware an engineer can keep their entire development environment in view at all times, and see the effect of their changes immediately. Next.js does this out of the box - you just need the pixels to see it all.

It's hard to overstate how liberating this is for a frontend-centric developer. Coupled with fast CI/CD, you can iterate at high velocity at all stages of the development process and keep yourself in that high-productivity flow state for much longer.

Beyond what Next.js gives you out of the box, you basically need 2 additional scripts to keep engineers moving at high speed in their local environment:

Test suite runner (with file watching). If this takes more than a second on decent hardware, you've probably written the wrong tests.
Controlled Dataset loader. A one-liner to load a realistic dataset into your local environment. Developing against empty datasets leads to crappy user experiences.
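What makes a dataset "controlled" is that everyone gets the same realistic data every time. A minimal sketch of the generation half of such a loader, assuming a hypothetical Device shape; it swaps in a small seeded PRNG (mulberry32) so the output is deterministic, and leaves out the database write, which depends on your stack.

```typescript
// Sketch of the generation half of a controlled dataset loader.
// Assumptions: the Device shape, location names and buildDevices are hypothetical;
// writing the rows into your database is left out because it depends on your stack.
type DeviceStatus = "online" | "offline" | "maintenance";

interface Device {
  id: string;
  name: string;
  status: DeviceStatus;
  lastSeen: string; // ISO 8601 timestamp
}

const STATUSES: readonly DeviceStatus[] = ["online", "offline", "maintenance"];
const LOCATIONS: readonly string[] = ["Lobby", "Warehouse", "Lab", "Office"];

// mulberry32: a tiny seeded PRNG, so the same seed always yields the same dataset.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

function buildDevices(count: number, seed = 42): Device[] {
  const rand = mulberry32(seed);
  function pick<T>(items: readonly T[]): T {
    return items[Math.floor(rand() * items.length)];
  }
  // A fixed reference time keeps timestamps stable between runs.
  const base = Date.UTC(2024, 0, 1);
  const minutesInWeek = 7 * 24 * 60;

  return Array.from({ length: count }, (_, i) => ({
    id: `dev-${String(i + 1).padStart(4, "0")}`,
    name: `${pick(LOCATIONS)} sensor ${i + 1}`,
    status: pick(STATUSES),
    lastSeen: new Date(base - Math.floor(rand() * minutesInWeek) * 60_000).toISOString(),
  }));
}
```

The loader script itself would then call something like buildDevices(500) and insert the rows, so every developer and every CI run starts from identical, non-empty data.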

Relationship with React

It has been pointed out that Next.js and React are pretty close bedfellows these days. New features like React Server Components are only feasible in a handful of frameworks, Next.js among them. Not everyone is thrilled with this state of affairs, grumbling with some justification that you need to be using Next.js to get the most out of React, when React should be a standalone library.

And they're right: to get the most out of React, you should be using Next.js (or another framework like it - though your options are not many). This may have negative implications for the React ecosystem, but if your objective is to build a UI as quickly as possible, you should be using Next.js, and you should probably be deploying to Vercel. A lot of the new tech, like RSCs, just isn't going to work otherwise.

The ball is being pushed forward rapidly here, especially in the realm of application performance. RSC, SSR, SSG and ISR are all TLAs that can be game changers for the way your application feels when humans use it. Stuff is going to break and wrong turns are going to be made - the leading edge is a better place to be than the bleeding edge, but the rewards for your team and your users are enormous.

The best hardware setup for software engineers

Tue, 25 Jun 2024 16:31:43 GMT

When I'm writing software I usually have the following windows open, all at the same time:

2-column layout VS Code (window = 2560 x 2160)
A fullscreen-equivalent browser with usable console to see what I'm working on (window = 1280 x 2160)
A large terminal window (window = 2560 x 1440)
ChatGPT (window = 2560 x 1440)
A full-screen browser with all the stuff I'm researching (window = 2560 x 2880)

I find tabbing between windows to be a great destroyer of productivity, so I've spent a good deal of time and money over the last few years iterating on a hardware setup that lets me see everything at once. Today, it looks like this:

I went through a number of iterations when it comes to monitors. For a long time I used dual 32" 4K IPS screens, but even that wasn't quite enough pixels. It's hard to physically fit more than 2 32" screens on a desk - they're too wide already, and it would not be ergonomic to mount them above each other.

About a year ago I discovered these 16:18 ratio screens and bought 2 of them as side screens around the central 32" 4K screen. They're 2560x2880, providing ~7.4 million pixels each, compared to ~8.3 million on a 4K screen. Another way to look at it is as two 1440p screens stacked on top of each other.

Aim for 20 million pixels

When I was a little younger, the most coveted monitor was the Apple Cinema Display. At 27" it was a good size, and at 1440p (2560px x 1440px) it had a lot of pixels for the time. They had excellent picture quality and went for a thousand dollars each.

I had 2 of these side by side, which filled my desk. Between them they gave me about 7 million pixels, with little prospect for adding more due to the physical constraints of a reasonable desk.
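The pixel comparisons in this post are plain multiplication. This small sketch just makes the arithmetic explicit for the three screen types mentioned: a pair of 1440p displays exactly matches one 16:18 side screen, and the full three-screen setup comes to just over 23 million pixels.

```typescript
// Pixel arithmetic for the screens discussed in this post.
type Screen = { name: string; width: number; height: number };

const pixels = (s: Screen): number => s.width * s.height;

const cinemaDisplay: Screen = { name: '27" 1440p', width: 2560, height: 1440 };
const sideScreen: Screen = { name: "16:18 side screen", width: 2560, height: 2880 };
const center4k: Screen = { name: '32" 4K', width: 3840, height: 2160 };

const twoCinemaDisplays = 2 * pixels(cinemaDisplay);                // 7,372,800
const oneSideScreen = pixels(sideScreen);                           // 7,372,800 (same as the pair above)
const threeScreenSetup = 2 * pixels(sideScreen) + pixels(center4k); // 23,040,000

// Share of the three-screen setup that a pair of 1440p monitors gives you.
const share = twoCinemaDisplays / threeScreenSetup;                 // 0.32, roughly a third

console.log({ twoCinemaDisplays, oneSideScreen, threeScreenSetup, share: share.toFixed(2) });
```

The same 0.32 ratio is behind the "third of the pixels" comparison for the typical two-monitor company setup later in the post.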

Each of those 16:18 side screens provides the same pixel count as those 2 Apple screens combined, effectively stacked one on top of the other. Splitting the 4k display into 4x 1080p windows yields a potential layout like this:

Each blue box represents a genuinely full-screen-equivalent window. The gray 1920x1080 windows are a little too small for my liking when it comes to software engineering; here's how I actually lay it out:

The yellow box is a full-height browser window. It is 1/3rd the width of the 4k screen, which gives it a very pleasant 1280px width (this happens to line up with a Tailwind breakpoint). At 2160px height, it's tall enough that I can have a 1000px tall dev tools console open at the same time as the content I'm working on, without compromising on either.

The green box is a double-height browser window, and excellent for reading developer documentation, GitHub issues and the like. Scanning through lots of text is far more efficient when you have 2880px of vertical height.

It's really flexible to have the side screens work like this, and throughout the day I will often switch between having a single double-height window (e.g. green box) and two 1440p windows stacked on top of each other (e.g. 2x blue boxes). The Mac app Rectangle makes that trivial with a couple of keyboard shortcuts.

How it feels to use

Here's what it actually looks like when you have a single window on that 16:18 screen. Browsing documentation this way is a dream because you can see so much content at once, without that window covering up anything else.

Another huge benefit that the 3-monitor setup has over the 2-monitor one is that you now have a screen that is facing you directly, instead of 2 screens that are both at an angle. I probably spend 80% of the time looking at the content on the center screen, and while doing so I'm not having to turn my head to look at one screen or the other.

For the center screen I went with a 32" 4K IPS display. It cost the best part of a thousand dollars, but I spend thousands of hours a year using it and it's high quality enough that it doesn't hurt my eyes. I use Rectangle to manage the windows on the screen - the right 2/3s is for VS Code, the left 1/3 is dedicated to a browser window for whatever app I'm working on:

Here, without switching windows, I have 2 full height, full width code editors, so I can see a hundred lines of one file while working on another, and a full-height browser window, which is 1280px wide and tall enough to have a genuinely usable dev tools console. The monitor is physically large enough that all of the text is comfortably legible.

The right-hand screen is another 16:18 monitor, and I typically keep a half-height ChatGPT window and a half-height terminal window open on it. Bear in mind that half-height here is still 2560px x 1440px, which is a full-screen application on any monitor with less than 4k resolution, and we get 2 of them per side monitor:

Laptops and other hardware

Obviously, your computer needs to be powerful enough to drive these 20 million pixels without skipping a beat. The micro-context switches that happen whenever you have to wait a few seconds for something really add up and suck away your productivity.

I'm using a maxed-out MacBook Pro with an M2 Max CPU and 64GB of RAM. It basically never gets in my way regardless of what I throw at it. They've released the M3 since then, and these things are only going to get more capable. I expect this machine will provide me with an excellent developer experience for several years before I replace it. The laptop form factor is useful; although I lose 70% of my pixels when I code on the couch, it is nice to be able to move around the house and work in different places.

I use a gray bluetooth Apple keyboard and a Logitech MX Master 3 mouse. The keyboard is laughably expensive, but it ties the room together with its matching color scheme. The mouse is excellent, and can be paired with 3 different devices, so I can switch between my laptop and my PC desktop machine pretty easily.

Desk and chair

If you're going to be at a desk many hours per day, it's a good idea to get a standing desk so that you're not sitting all day long. Ten years ago I bought an Xdesk Terra, which cost a couple of thousand and was pretty good, but then I moved house and had a little more space, so I upgraded to an 80" wide Uplift Desk. The Uplift also cost about $2k, but it's big and very stable and I expect it will also last me at least a decade.

I have a Herman Miller Aeron chair, which is also expensive but very comfortable. I've had it for 10 years and it's still in great condition. I've had a couple of cheaper chairs in the past, and they've been ok but not great. You don't need to spend a grand on a chair, but it's important to get one that you forget you're sitting on, otherwise it'll just be a distraction.

Creature Comforts

If your office is comfortable you will want to spend more time in it. Little things like adding Philips Hue light bars to the backs of the monitors and spending a little time setting up the colors and brightness can make a big difference to the ambience of the room. AirPods Pro are great for meetings, especially with the noise cancellation turned on.

Putting it all together

If I were starting from scratch, I'd buy the following, which happens to be my current setup:

With sales tax, cables and other paraphernalia, you're looking at about $10k for a top-of-the-line setup. This is a lot of money, but most of this will last several years, and the economics are heavily in your favor, as we'll discover in this final section.

How companies should think about this

A lot of companies will give their engineers a setup like this:

Here's how that looks in terms of pixels. I will generously assume that the monitors are 2560 x 1440, and am showing that right screen oriented in landscape mode:

Those two blue boxes are the same size as the ones in the diagrams earlier, except that instead of being what you get from each side screen, on top of the main screen, they are the whole ball game. Why would you do this to your engineers? That's only giving them a third of the pixels they could use to maximize their productivity.

A good engineer costs a lot of money - hundreds of thousands of dollars per year. Even more for senior folks. Given that the company employing them needs to make a profit, the average engineer probably needs to generate on the order of $1m per year in value to justify the expense to the company.

Therefore, for the typical engineer, an increase in productivity of 1% is worth about $10,000 per year. In other words, you could buy each of your engineers all of that $10,000 of brand new equipment on January 1st, have them smash it all to pieces on December 31st and do the same thing again the next year, and you'd still be ahead so long as they were 1% more productive.

I would conservatively estimate that a setup like this is at least 10% more productive than what usually gets served up. Given that you're going to spend a minimum of $5k even to kit out your engineer poorly, the marginal cost here is actually just a few thousand bucks, and most of it comes down to screen real estate.
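The break-even logic above fits in a couple of lines. This sketch uses the post's rough figures (~$1m of value per engineer per year, ~$10k of kit, pessimistically replaced every year) as inputs; the function names are illustrative, and percentages are kept as whole numbers to avoid floating-point noise.

```typescript
// Break-even arithmetic for equipment spend, using whole percentages to avoid float noise.
// Inputs are the post's rough figures; swap in your own.
const VALUE_PER_ENGINEER = 1_000_000; // ~$1m/year of value an engineer needs to generate
const KIT_COST_PER_YEAR = 10_000;     // the ~$10k setup, pessimistically replaced every year

// Productivity gain (in %) at which the kit exactly pays for itself.
function breakEvenPercent(kitCost: number, valuePerEngineer: number): number {
  return (100 * kitCost) / valuePerEngineer;
}

// Value gained minus kit cost, for a given productivity gain in %.
function netBenefit(gainPercent: number, valuePerEngineer: number, kitCost: number): number {
  return (gainPercent * valuePerEngineer) / 100 - kitCost;
}

console.log(breakEvenPercent(KIT_COST_PER_YEAR, VALUE_PER_ENGINEER)); // 1
console.log(netBenefit(10, VALUE_PER_ENGINEER, KIT_COST_PER_YEAR));   // 90000
```

Even charging the full $10k every year rather than just the marginal few thousand, a 10% gain leaves around $90k per engineer per year on the table.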

Engineering organizations and teams get less efficient as they get bigger, so equip your engineers with the best tools that exist and you'll get more done with fewer people.

Loading Fast and Slow: async React Server Components and Suspense

Tue, 18 Jun 2024 16:31:43 GMT

When the web was young, HTML pages were served to clients running web browser software that would turn the HTML text response into rendered pixels on the screen. At first these were static HTML files, but then things like PHP and others came along to allow the server to customize the HTML sent to each client.

CSS came along to change the appearance of what got rendered. JavaScript came along to make the page interactive. Suddenly the page was no longer the atomic unit of the web experience: pages could modify themselves right there inside the browser, without the server being in the loop at all.

This was good because the network is slow and less than 100% reliable. It heralded a new golden age for the web. Progressively, less and less of the HTML content was sent to clients as pre-rendered HTML, and more and more was sent as JSON data that the client would render into HTML using JavaScript.

This all required a lot more work to be done on the client, though, which meant the client had to download a lot more JavaScript. Before long we were shipping MEGABYTES of JavaScript down to the web browser, and we lost the speediness we had gained by not reloading the whole page all the time. Page transitions were fast, but the initial load was slow. Megabytes of code shipped to the browser can multiply into hundreds of megabytes of device memory consumed, and not every device is your state of the art MacBook Pro.

Single Page Applications ultimately do the same thing as that old PHP application did - render a bunch of HTML and pass it to the browser to render. The actual rendered output is often a few kilobytes of plain text HTML, but we downloaded, parsed and executed megabytes of JavaScript to generate those few kilobytes of HTML. What if there was a way we could keep the interactivity of a SPA, but only send the HTML that needs to be rendered to the client?

Enter React Server Components

React Server Components are one of the biggest developments in React for years, with the potential to solve many of these problems. RSCs allow us to split our page rendering into two buckets - Components rendered on the client (traditional React style) and components rendered on the server (traditional web style).

Let's say we're building an application to help us manage devices, so we want some CRUD. Probably we're going to have a Devices index page where we can look at the list of Devices, and then either click on one to see the details, or click a button to create a new one. We might also want to edit or delete devices.

In the traditional React client-side mindset, we would build ourselves a page that will be rendered in the browser - it will need to fetch the Devices data from our backend, wait until the response comes back, handle any errors, and then render the list of devices. We might use a library like SWR to handle the fetching and caching of the data, and a library like React Query to handle the mutation of the data.

Maybe we'd end up with something that looks like this:

You've seen the code to do this on the client side a thousand times before, with all its useState, useEffect, fetch, try/catch and other boilerplate. It's easy to create bugs in this code, to forget to handle edge cases, and to end up with a page that doesn't work as expected. What if we could write it like this instead?

import { getDevices } from "@/models/device";

import { Heading } from "@/components/common/heading";
import { Button } from "@/components/common/button";
import DevicesTable from "@/components/device/table";

export default async function DevicesPage() {
  const devices = await getDevices();

  return (
    <div>
      <div className="flex w-full flex-wrap items-end justify-between gap-4 pb-6">
        <Heading>Devices</Heading>
        <div className="flex gap-4">
          <Button href="/devices/create">Add Device</Button>
        </div>
      </div>
      <DevicesTable devices={devices} />
    </div>
  );
}

This is a React Server Component. In this brave new world, you can tell it's a server component because the file doesn't start with 'use client'. RSCs are still pretty new and only supported in frameworks like Next.js that have a server-side rendering capability. In the Next.js App Router, all components are server components by default unless their file starts with 'use client'.

The main thing this component is doing is fetching device data via the getDevices function, which is all running on the server side and probably reading from a database. By doing this on the server, we avoid a) an extra HTTP round-trip to fetch the data separately from the React component, and b) all of the client-side logic required to make that work. Our code is clean and simple, with the magic of async/await making it read as though it's synchronous, which is easier on human brains.

Let's have a quick look at the layout.tsx file that this component is rendering into:

export default function RootLayout({
  children,
}: Readonly<{
  children: React.ReactNode;
}>) {
  return (
    <html lang="en">
      <body>
        {children}
      </body>
    </html>
  );
}

Ok that's about as basic as it gets. The RootLayout component is also a React Server Component - it gets rendered on the server and the resulting HTML is sent to the client. When we visit the /devices URL, the server will render the app/devices/page.tsx component and shove it where we put {children} in the layout.tsx file.


But there's a wrinkle here - our DevicesPage component is defined as an async function. That's because, in this case, we need to make some asynchronous calls to fetch the data we need to render the page. So of course it's got to be async, but how does that mesh with our synchronous rendering of the layout and returning of the response to the client?

Well, by default, it means that the server will have to wait for the async DevicesPage function to finish before it can render the page and send it to the client. If our database lookup is slow, this means the user is sat looking at a completely blank screen for a while. Not a great user experience.

To convince you of this, I created a skeleton Next.js application that is currently running at https://rsc-suspense-patterns.edspencer.net/. It has 5 pages, all of which are React Server Components, and all of which have different treatments of the async data fetching. The code for this application is available at https://github.com/edspencer/rsc-suspense-patterns.

Vanilla async Server Component rendering

The first page in my little skeleton app is at https://rsc-suspense-patterns.edspencer.net/slow/no-suspense - the best thing to do is open that in a new window and watch it load. You'll see nothing happen for 3 seconds, then suddenly the whole page appears at once. This is because the page.tsx for that URL is exactly what I show you in the code block above - an async function that fetches some data and then renders it. The call to getDevices there just waits 3 seconds before returning a static array of fake data.
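A getDevices that "just waits 3 seconds before returning a static array of fake data" might look something like this. It is a hypothetical stand-in, not the demo repo's actual code: the Device shape and the delayMs parameter are assumptions. Because the page component awaits it, nothing can be sent to the browser until the timer fires.

```typescript
// Hypothetical stand-in for a slow data-access function: wait, then return fake data.
// The Device shape and delayMs parameter are assumptions, not the demo repo's code.
type Device = { id: number; name: string; status: "online" | "offline" };

const FAKE_DEVICES: readonly Device[] = [
  { id: 1, name: "Front door camera", status: "online" },
  { id: 2, name: "Warehouse thermostat", status: "offline" },
  { id: 3, name: "Lab humidity sensor", status: "online" },
];

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Simulates a slow database query; the default mirrors the demo's 3-second delay.
async function getDevices(delayMs = 3000): Promise<Device[]> {
  await sleep(delayMs);
  return [...FAKE_DEVICES]; // return a copy so callers can't mutate the shared fixture
}
```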

This page feels broken, right? Nothing happens for 3 seconds, which is more than enough time to make a user think the page is broken and leave. With React Suspense, though, we can do better than this, starting with the next page in my little app.

Page-level Suspense boundaries with loading.tsx

Next.js provides a nice little convention for providing page-level Suspense behavior, including React Server Component pages. Suspense, if you're not familiar with it, is a way for your React application to render everything that it can, show that to the user, and when the rest of the components on the page are ready to be rendered, stream them into the browser.

With Next.js, we can just create a loading.tsx file in the same directory as our page.tsx file, and it will be used as a fallback while the page is loading. This is a great way to show a loading spinner or other loading indicator to the user while the page is loading. Here's how simple that can be:

export default function Loading() {
  return (
    <div className="flex justify-center items-center h-64">
      <div className="animate-spin rounded-full h-16 w-16 border-b-2 border-gray-900"></div>
    </div>
  );
}

Just by defining this file, Next.js did a little work under the covers, resulting in the following behavior:

When the page is first requested, rendering of the page.tsx component starts, but it can't complete immediately
While that async function is fetching data/doing whatever else before rendering, the loading.tsx component is rendered instead
When the async function finishes, the page.tsx component is rendered and replaces the loading.tsx component
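A toy model in plain TypeScript may help make that ordering concrete. It only models the sequence of events (layout plus fallback first, real content later); it is not how React or Next.js actually stream HTML, and renderWithFallback and the string "chunks" are hypothetical.

```typescript
// Toy model of Suspense-style streaming: the layout with a fallback is produced first,
// and the page content follows once its async work finishes. Not React's implementation.
async function* renderWithFallback(
  layout: (body: string) => string,
  fallback: string,
  renderPage: () => Promise<string>,
): AsyncGenerator<string> {
  const pending = renderPage(); // start the slow render straight away
  // Attach a no-op handler so an early rejection isn't reported as unhandled;
  // awaiting `pending` below still rethrows the error to the consumer.
  pending.catch(() => {});
  yield layout(fallback); // first chunk: can be shown immediately
  yield await pending;    // second chunk: the real content that replaces the fallback
}
```

Consuming it with for await yields the spinner-in-layout chunk right away and the finished page only once renderPage resolves, which mirrors the three steps listed above.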

You can see this in action at https://rsc-suspense-patterns.edspencer.net/slow/suspense. Again, to really see what is going on there, open the link in a brand new browser tab/window. This time, the page header menu renders immediately because it is part of layout.tsx, and for 3 seconds we see our loading.tsx render (a spinner in this case). After 3 seconds, the page.tsx component renders and replaces the spinner:

Component-level suspense boundaries

Page-level Suspense boundaries are an improvement on our vanilla version because at least we're rendering some of our application immediately, and showing the user that something is happening via a loading spinner. It's also super-easy to just drop a loading.tsx file into a route directory and have it work.

But we can do better than that. We can use Suspense boundaries at the component level to show the user that something is happening at a more granular level. Here's the actual source code that powers the third and final slow loading RSC page in my demo - which you can see live at https://rsc-suspense-patterns.edspencer.net/slow/component-suspense:

import { getDevices } from "@/models/device";

import Heading from "@/components/common/heading";
import DevicesTable from "@/components/device/table";
import AddDeviceButton from "@/components/device/button";

import Loading from "@/components/common/loading";
import { Suspense } from "react";

export default function DevicesPage() {
  return (
    <>
      <div className="flex w-full flex-wrap items-end justify-between gap-4 pb-6">
        <Heading>Devices (3000ms database call, Component-level Suspense)</Heading>
        <div className="flex gap-4">
          <AddDeviceButton />
        </div>
      </div>
      <Suspense fallback={<Loading />}>
        <LoadedDevicesTable />
      </Suspense>
      <p>
        On this screen, we get all of the page contents rendered instantly (including this paragraph),
        but see a loading spinner while the table is loaded, rendered, and streamed back to the client.
      </p>
    </>
  );
}

async function LoadedDevicesTable() {
  const devices = await getDevices();

  return <DevicesTable devices={devices} />;
}

We've done three things here:

We split the loading and rendering of the <DevicesTable> into a separate (async) component called <LoadedDevicesTable>
We made our DevicesPage component synchronous, so it renders immediately
We wrapped our new <LoadedDevicesTable> component in a <Suspense> component, with a fallback prop that renders our loading spinner

If you open up the live demo page, you'll see that the entire page renders instantly, including the header and footer, and the paragraph explaining what's going on. The only thing that doesn't render immediately is the data table, which shows a loading spinner until the data is fetched and the table is rendered.

This is a much better user experience than the vanilla version, and even the page-level Suspense version. It's a great way to show the user that something is happening, and that the page isn't broken, while still rendering as much of the page as possible immediately. Adding a <Suspense> wrapper is every bit as easy as adding a loading.tsx file, and will often produce a better user experience.

Now most of your application renders on the server side using React Server Components, and only the interactive parts are rendered on the client side. This is a great way to get the best of both worlds - the speed and reliability of server-side rendering, and the interactivity of client-side rendering.

Implications for React Server Components

Generally speaking, if a page requires several database/RPC calls to load its data, it will usually be significantly faster to render that page on the server side than on the client side. This is because the server usually has a fast, low-latency connection to the database, and can render the page in a single pass.

But this is not a panacea - databases that started out fast often become slow over time. UX patterns (like not using Suspense) that made total sense with a 10ms data fetch can become a problem when that fetch takes 3000ms or more. If you start to see one or more of those slow data fetches on a page, you're not going to be giving your users a great experience if you use async React Server Components at the page level.

Consider making page-level RSCs synchronous

The approach in the code block below (which is the same approach as above) is one way to get around that, where we split the async code out of the Page component. By confining ourselves to rendering only synchronous components at the page level, we can render the page immediately and then stream in the async components as they're ready. This is a great way to give the user a sense of progress and keep them engaged with the page.

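import { Suspense } from "react";

import { getDevices } from "@/models/device";
import DevicesTable from "@/components/device/table";
import Loading from "@/components/common/loading";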

//synchronous - fast!
export default function FastRSCPage() {
  return (
    <>
      <h2>My lovely page</h2>
      <Suspense fallback={<Loading />}>
        <SlowLoadingComponent />
      </Suspense>
    </>
  );
}

//async - can be slow but doesn't matter as it's not at the page level
async function SlowLoadingComponent() {
  const devices = await getDevices();

  return <DevicesTable devices={devices} />;
}

In this approach, our <FastRSCPage> and <SlowLoadingComponent> components are both still React Server Components. They even happen to be in the same file, though they don't have to be. It's just that splitting the async code out of our top-level component (the "page") means that we can render as much of the UI as possible, essentially instantly.

Page Interactivity waits for Suspense (sometimes)

Our little page has an Add Device button, which is the only 'use client' component in the entire app. All it does in this demo is fire an alert, which ought to convince you it is a component running in the browser.

But if you open up https://rsc-suspense-patterns.edspencer.net/slow/component-suspense and click the Add Device button while the spinner is still spinning, nothing happens. Click it again after the spinner goes away, and you'll see the alert. This might be a little unexpected - the button is in the synchronous part of the page, not within the Suspense boundary, so why doesn't it work?

I actually don't know. React 18 came along with an excellent post explaining how Suspense is supposed to work, including Selective Hydration. Hydration is when you render your page HTML on the server side, the client downloads it, then React spins up in the client and attaches itself to all that lovely HTML the server sent down. Until Hydration is complete, your React app may be mostly rendered, but it is not interactive.

Selective Hydration is supposed to enable React to automatically hydrate the parts of your application that are fully rendered, running hydration again for any components inside <Suspense> boundaries that were not ready the first time hydration occurred.

This should mean that the Add Device button is interactive as soon as the page is hydrated, even if the data table is still loading. As you'll note, it doesn't seem to actually do that, so watch out for behavior like this in your own apps. All of this stuff is pretty new, so it's possible that there are still some bugs to be ironed out. If I figure that out I'll let you know.

Conclusions and further reading

React Server Components are a powerful new feature in React that can be a game-changer for the UX of your applications when implemented correctly. They're also a Big Rewrite trap that could seem annoying if you have thousands of hours invested in a React app that works the Old Way. But if you're starting a new project, or have a project that's not working as well as you'd like, they're definitely worth a look.

I read some excellent posts by some fine folks while embarking on my own journey of understanding around this topic - here are three articles on RSC that you should consider reading:

Making Sense of React Server Components - an excellent intro to the concept by Josh Comeau
The Forensics Of React Server Components - goes into the weeds on how this stuff works under the hood
New Suspense SSR Architecture in React 18 - the source of truth, from the horse's mouth

Using Server Actions with Next JS

Tue, 04 Jun 2024 16:31:43 GMT

React and Next.js introduced Server Actions a while back, as a new/old way to call server-side code from the client. In this post, I'll explain what Server Actions are, how they work, and how you can use them in your Next.js applications. We'll look at why they are and are not APIs, why they can make your front end code cleaner, and why they can make your backend code messier.

Everything old is new again

In the beginning, there were <form>s. They had an action, and a method, and when you clicked the submit button, the browser would send a request to the server. The server would then process the request and send back a response, which could be a redirect. The action was the URL of the server endpoint, and the method was usually either GET or POST.

<form action="/submit" method="POST">
  <input type="text" name="name" />
  <button type="submit">Submit</button>
</form>

Then came AJAX, and suddenly we could send requests to the server without reloading the page. This was a game-changer, and it opened up a whole new world of possibilities for building web applications. But it also introduced a lot of complexity, as developers had to manage things like network requests, error handling, and loading states. We ended up building React components like this:

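//this is just so 2019
import { useState } from 'react';

export default function CreateDevice() {
  const [name, setName] = useState('');
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState(null);

  const handleSubmit = async (e) => {
    e.preventDefault();
    setLoading(true);
    setError(null); // clear any error from a previous attempt
    try {
      const res = await fetch('/api/devices', {
        method: 'POST',
        body: JSON.stringify({ name }),
        headers: {
          'Content-Type': 'application/json',
        },
      });
      // fetch only rejects on network failures, so check the HTTP status too
      if (!res.ok) throw new Error(`Request failed with status ${res.status}`);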
    } catch (err) {
      setError(err);
    } finally {
      setLoading(false);
    }
  };

  return (
    <form onSubmit={handleSubmit}>
      <input type="text" value={name} onChange={(e) => setName(e.target.value)} />
      <button type="submit" disabled={loading}>Submit</button>
      {error && <p>{error.message}</p>}
    </form>
  );
}

This code is fine, but it's a lot of boilerplate for something as simple as submitting a form. It's also not very readable, as the logic for handling the form submission is mixed in with the UI code. Wouldn't it be nice if we could go back to the good old days of <form>s, but without the page reload?

Enter Server Actions

Now, with Server Actions, React is bringing back the simplicity of the old days, while still taking advantage of the power of modern web technologies. Server Actions allow you to call server-side code from the client, just like you would with a traditional form submission, but without the page reload. It wants you to think that this is all happening without an API on the backend, but this isn't true. It's not magic after all.

Here's how we can write the same form using Server Actions:

'use client';
import { useFormState } from 'react-dom';
import { createDeviceAction } from '@/app/actions/devices';

export function AddDeviceForm() {
  const [state, formAction] = useFormState(createDeviceAction, {});

  return (
    <form action={formAction} className="create-device">
      <fieldset>
        <label htmlFor="name">Name:</label>
        <input type="text" name="name" id="name" placeholder="type something" />
        <button type="submit">Submit</button>
      </fieldset>
      {state.status === 'error' && <p className="text-red-500">{state.message}</p>}
      {state.status === 'success' && <p className="text-green-500">{state.message}</p>}
    </form>
  );
}

Here's that same AddDeviceForm Component running live in this page. It's a real React component, so try submitting it with and without text in the input field. In both cases it's hitting our createDeviceAction function, which is just a simple function that returns a success or error message based on the input:

The form components on this page are real React components, running in the browser. You can interact with them just like you would with any other web page. The code for these components is copy/pasted from the actual code running behind the scenes, so you can see exactly how it works.

The only thing I've done is add some styling to make it look nice. The actual code is just the component and the action function, nothing else.

One nice thing about this is that the Enter key works without any extra code. This is because the form is a real form, and the submit button is a real submit button. The formAction function returned by the useFormState hook does the work of intercepting the form submission and calling the server action instead of performing the default form submission. It feels more like the old-school web.

And here's the actual server action that is being called, in a file called app/actions/devices.ts:

'use server';

export async function createDeviceAction(prevState: any, formData: FormData) {
  const name = formData.get('name');

  if (name) {
    const device = {
      name,
      id: Math.round(Math.random() * 10000),
    };

    return {
      status: 'success',
      message: `Device '${name}' created with ID: ${device.id}`,
      device,
    };
  } else {
    return {
      status: 'error',
      message: 'Name is required',
    };
  }
}

The code here is simulating a database mutation and doing some basic validation. This all ought to look pretty familiar. Again, this is the actual code running behind the scenes, copied verbatim.

This is a very simple example, but in a real-world application you would likely want to add some kind of authentication to your server actions. These server action endpoints are as wide open as any other, so you need to reason about them in the exact same way when it comes to authentication and authorization.
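To make that concrete, here's a minimal sketch of an identity check placed at the very top of an action. The getUserId function is a hypothetical stand-in for whatever session lookup your auth setup provides, FormLike is a hypothetical minimal replacement for FormData, and the factory wrapper exists only so the check can be exercised outside Next.js; in a real 'use server' file you would call your session helper directly inside the exported async function.

```typescript
// Result shape mirroring the status/message objects returned by the actions above.
type ActionState = { status: "success" | "error"; message: string };

// Hypothetical: stands in for your auth library's session lookup.
type GetUserId = () => Promise<string | null>;

// Minimal form-data shape so the sketch runs without Next.js; in the app this is FormData.
interface FormLike {
  get(name: string): unknown;
}

function makeCreateDeviceAction(getUserId: GetUserId) {
  return async function createDeviceAction(
    _prevState: ActionState | undefined,
    formData: FormLike
  ): Promise<ActionState> {
    // Check the caller first: the endpoint accepts a POST from anyone who can reach it.
    const userId = await getUserId();
    if (!userId) {
      return { status: "error", message: "Not authorized" };
    }

    const name = formData.get("name");
    if (typeof name !== "string" || name.length === 0) {
      return { status: "error", message: "Name is required" };
    }
    return { status: "success", message: `Device '${name}' created by ${userId}` };
  };
}
```

Authorization (is this particular user allowed to create devices?) would slot in right after the identity check, before any data is touched.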

How does this work?

We didn't set up any API routes, we didn't write any network request code, and we didn't have to manage any loading states or error handling. There is no code I am not showing you stitching things together. We just wrote a simple form, and the Server Actions library took care of the rest. It's like magic!

But it's not magic. It's HTTP. If you open up your browser's developer tools and submit the form, you'll see a network request being made to the server, just like with a traditional form submission. The only difference is that the request is being intercepted by the Server Actions library and handled by the createDeviceAction function instead of the default form submission handler. This results in a POST request being sent to the current URL, with the form data and a bunch of other stuff being sent along with it.

Here's what the response looked like:

Next.js has basically created an API endpoint for us, and then provided its own wrapper calls and data structures on both the request and response cycles, leaving us to focus solely on our UI and business logic.

Visual feedback for slower requests

In many cases, the backend may take a few seconds to process the user's request, and it's always a good idea to provide some visual feedback while the user waits. There's another lovely new React hook called useFormStatus that we can use to show a pending state while the request is in flight. Here's a slightly modified version of the form that gives the user some feedback while the request is being processed:

'use client';
import { useFormState, useFormStatus } from 'react-dom';
import { createDeviceActionSlow } from '@/app/actions/devices';

export function AddDeviceFormSlow() {
  const [state, formAction] = useFormState(createDeviceActionSlow, {});

  return (
    <form action={formAction} className="create-device">
      <fieldset>
        <label htmlFor="name">Name:</label>
        <input type="text" name="name" id="name" placeholder="type something" />
        <SubmitButton />
      </fieldset>
      {state.status === 'error' && <p className="text-red-500">{state.message}</p>}
      {state.status === 'success' && <p className="text-green-500">{state.message}</p>}
    </form>
  );
}

//this has to be a separate component because we can't use the useFormStatus hook in the 
//same component that has the <form>. Sadface.
function SubmitButton() {
  const { pending } = useFormStatus();

  return (
    <button type="submit" disabled={pending}>
      {pending ? 'Submitting...' : 'Submit'}
    </button>
  );
}

This is almost identical to the first example, but I've split the submit button into a separate component and used the useFormStatus hook to disable the button and change its label to "Submitting..." while the request is pending. It's also now pointing at the createDeviceActionSlow function, which is identical to the createDeviceAction function except that it waits 3 seconds before returning the response.
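The post doesn't show createDeviceActionSlow itself. A minimal sketch, assuming it simply waits 3 seconds and then delegates to the regular action, could look like this; the createDeviceAction below is a trimmed stand-in for the real one, and FormLike is a hypothetical minimal shape used in place of FormData so the sketch runs outside Next.js.

```typescript
// Minimal form-data shape so the sketch runs outside Next.js; in the app this is FormData.
interface FormLike {
  get(name: string): unknown;
}

type DeviceState = { status?: "success" | "error"; message?: string };

// Trimmed stand-in for createDeviceAction: validation only, no fake device record.
async function createDeviceAction(_prevState: DeviceState, formData: FormLike): Promise<DeviceState> {
  const name = formData.get("name");
  return name
    ? { status: "success", message: `Device '${String(name)}' created` }
    : { status: "error", message: "Name is required" };
}

// Assumed shape of the slow variant: wait 3 seconds, then delegate to the normal action.
async function createDeviceActionSlow(prevState: DeviceState, formData: FormLike): Promise<DeviceState> {
  await new Promise<void>((resolve) => setTimeout(resolve, 3000));
  return createDeviceAction(prevState, formData);
}
```

Because useFormStatus tracks the pending form submission rather than anything inside the action, no client code needs to change to account for the delay.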

In this brave new world, we are in a position to load data either on the client or the server, process actions either on the client or the server, and render components either on the client or the server. It's a lot to keep track of, but it's also a lot of power.

Here's the live component - give it a whirl:

That's pretty cool. The useFormStatus hook is doing all the work of tracking the request status and updating the UI accordingly. It's a small thing, but it makes both the user experience and the developer experience a lot better.

Just to confuse things a little, useFormState is being renamed useActionState. As of the time of writing that's still in RC. Perhaps I'll remember to come back and update this post when it's released. But perhaps I won't.

What about the API?

It has been the case for quite some time that the greatest value in a web application is often not found in its UI but in its API. The UI is just a way to interact with the API, and the API is where the real work gets done. If your application is genuinely useful to other people, there's a good chance they will want to integrate with it via an API.

There is a school of thought that says your UI should be treated just the same as any other API client for your system. This is a good school, and its teachers are worth listening to. UIs are for humans and APIs are for machines, but there's a lot of overlap in what they want in life:

A speedy response
To know if their action succeeded, or why it failed
To get the data they asked for, in a format they can easily consume

Can't we service them both with the same code? Yes, we can. But it's not always as simple as it seems.

The real world spoils the fun

Way up in that second example snippet, we were making a POST request to /api/devices; our UI code was talking to the exact same API endpoint that any other API user would be talking to. There are many obvious benefits to this, mostly centering around the fact that you don't need to maintain parallel code paths for UI and API users. I've worked on systems that did that, and it can end up doubling your codebase.

Server Actions are great, but they take us away from HTTP and REST, which are bedrock technologies for APIs. It's very easy to spam together a bunch of Server Actions for your UI, and then find yourself in a mess when you need to build an API for someone else to use.

The reality is that although API users and UI users do have a lot in common, they also have differences. In our Server Action examples above we were returning a simple object with a status and a message, but in a real API you would likely want to return a more structured response, with an HTTP status code, headers, and a body. We're also much more likely to need things like rate limiting for our API users, which we didn't have to think about for our UI users.

Consider a super simple POST endpoint in a real API. Assume you're using Prisma for persistence and Zod for validation - a fairly common pairing. Here's how you might write that API endpoint:

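import { NextRequest, NextResponse } from 'next/server';
import { Prisma } from '@prisma/client';
import { ZodError } from 'zod';
// prisma (a PrismaClient instance) and DeviceSchema (a Zod schema) are assumed to be defined elsewhere in the app

export async function POST(req: NextRequest) {
  try {
    const body = await req.json();

    const data = {
      type: body.type,
      hostname: body.hostname,
      credentials: body.credentials,
    } as Prisma.DeviceCreateInput;

    DeviceSchema.parse(data);
    // await the write so the response contains the created row and DB errors reach the catch block
    const device = await prisma.device.create({ data });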

    return NextResponse.json(device, { status: 201 });
  } catch (error) {
    if (error instanceof ZodError) {
      return NextResponse.json({ error: { issues: error.issues } }, { status: 400 });
    }
    return NextResponse.json({ error: "Failed to create device" }, { status: 500 });
  }
}

This API endpoint consumes JSON input (assume that auth is handled via middleware), validates it with Zod, and then creates a new device in the database. If the input is invalid, it returns a 400 status code with an error message. If the input looks good but there's an error creating the device, it returns a 500 status code with an error message. If everything goes well, it returns a 201 status code with the newly created device.

Now let's see how we might write a Server Action for the same functionality:

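'use server';

import { revalidatePath } from 'next/cache';
import { Prisma } from '@prisma/client';
import { ZodError } from 'zod';
// prisma and DeviceSchema are assumed to be defined elsewhere, as in the API route above

export async function createDeviceAction(prevState: any, formData: FormData) {
  try {
    const data = {
      type: formData.get("type"),
      hostname: formData.get("hostname"),
      credentials: formData.get("credentials"),
    } as Prisma.DeviceCreateInput;

    DeviceSchema.parse(data);
    // await the write so DB errors are caught below instead of escaping as a rejected promise
    const device = await prisma.device.create({ data });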

    revalidatePath("/devices");

    return {
      success: true,
      message: "Device Created Successfully",
      device,
    };
  } catch (error) {
    if (error instanceof ZodError) {
      return {
        success: false,
        message: "Validation Error",
        error: {
          issues: error.issues,
        },
      };
    }

    return {
      success: false,
      message: "Failed to create device",
      error: JSON.stringify(error),
    };
  }
}

The core of these 2 functions is the same exact 2 lines - one to validate using Zod, the other to persist using Prisma. The flow is exactly the same, but in one case we're grabbing JSON, in the other reading form data. In one case we're returning NextResponse objects with HTTP status codes, in the other we're returning objects with success and message keys. The Server Action can also take advantage of nice things like revalidatePath to revalidate the cached /devices page, but we don't want that line in our API endpoint.

Somewhere along the line we will want to show a message to the UI user telling them what happened - hence the message key in the Server Action (the API user can just read the HTTP status code). We could have moved that logic to the UI instead, perhaps returning a statusCode key in the JSON response to emulate an HTTP status code. But that's just reimplementing part of HTTP, and moving the problem to the client, which now has to provide the mapping from a status code to a message. It also means a bigger bundle if we want to support internationalization for those messages.

What this all means is that if you want to take advantage of the UI code cleanliness benefits that come from using Server Actions, and your application conceivably might need an API now or in the future, you need to think about how you are going to avoid duplicating logic between your Server Actions and your API endpoints. This may be a hard problem, and there's no one-size-fits-all solution. Yes you can pull those 2 lines of core logic out into a shared function, but you're still left with a lot of other almost-the-same-but-not-quite code.

Ultimately, it probably just requires another layer of indirection. What that layer looks like will depend on your application, but it's something to think about before you go all-in on Server Actions.
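As one hypothetical illustration of such a layer (not the only shape it could take, and not code from this app), the core can return a plain result that knows nothing about HTTP or UI copy, with two thin adapters mapping it onto status codes for the API and onto messages for the Server Action. Validate and Persist are stand-ins for the Zod and Prisma calls.

```typescript
// Hypothetical sketch: one shared core, two thin adapters. Names are illustrative only.
type Device = { id: number; hostname: string };

// The core's result knows nothing about HTTP status codes or UI copy.
type CreateResult =
  | { kind: "created"; device: Device }
  | { kind: "invalid"; issues: string[] }
  | { kind: "failed" };

// Stand-ins for the Zod validation and Prisma persistence calls.
type Validate = (input: Record<string, unknown>) => string[];
type Persist = (hostname: string) => Promise<Device>;

async function createDevice(
  input: Record<string, unknown>,
  validate: Validate,
  persist: Persist
): Promise<CreateResult> {
  const issues = validate(input);
  if (issues.length > 0) return { kind: "invalid", issues };
  try {
    return { kind: "created", device: await persist(String(input.hostname)) };
  } catch {
    return { kind: "failed" };
  }
}

// API adapter: map the result onto an HTTP status and JSON body.
function toHttp(result: CreateResult): { status: number; body: unknown } {
  switch (result.kind) {
    case "created":
      return { status: 201, body: result.device };
    case "invalid":
      return { status: 400, body: { error: { issues: result.issues } } };
    case "failed":
      return { status: 500, body: { error: "Failed to create device" } };
  }
}

// Server Action adapter: map the same result onto UI-facing state.
function toActionState(result: CreateResult): { success: boolean; message: string } {
  switch (result.kind) {
    case "created":
      return { success: true, message: "Device Created Successfully" };
    case "invalid":
      return { success: false, message: "Validation Error" };
    case "failed":
      return { success: false, message: "Failed to create device" };
  }
}
```

In a Next.js route, toHttp's output would feed NextResponse.json(body, { status }); in the Server Action, toActionState's output would be returned directly after any revalidatePath call. The duplication shrinks to two small mapping tables.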

Avoiding Catastrophe by Automating OPNSense Backups

Tue, 28 May 2024 06:31:02 GMT

tl;dr: a Backups API exists for OPNSense. opnsense-autobackup uses it to make daily backups for you.

A few months ago I set up OPNSense on my home network, to act as a firewall and router. So far it's been great, with a ton of benefits over the eero mesh system I was replacing - static DHCP assignments, pretty local host names via Unbound DNS, greatly increased visibility and monitoring possibilities, and of course manifold security options.

However, it's also become a victim of its own success. It's now so central to the network that if it were to fail, most of the network would go down with it. The firewall rules, VLAN configurations, DNS setup, DHCP, etc. are all very useful and deeply entrenched - if they go away, most of my network services go down: internet access, home automation, NAS, cameras and more.

OPNSense lets you download a backup via the UI; sometimes I remember to do that before making a sketchy change, but I once wiped out the box without a recent backup, and ended up spending several hours getting things back up again. That was before really embracing things like local DNS and static DHCP assignments, which I now have a bunch of automation and configuration relying on.

OPNSense has a built-in way to have backups automatically created and uploaded to a Google Drive folder. Per the docs it does this on a daily basis, uploading a new backup to Google Drive if something changed. If you want to use Google Drive for your backup storage, this is probably the right option for you, but if you want to customize how this works (either the schedule on which backups are made, or where they're sent), there are ways to do that too.

Using the OPNSense API to create backups

OPNSense provides a simple API that allows you to download the current configuration as an XML file. It gives you the same XML file that you get when you click the "Download configuration" button manually in the OPNSense UI. It's worth downloading it manually once and just skimming through the file in your editor - it's nicely organized and interesting to peruse.

Once you've done that, though, you'll probably want to automate the process so you don't have to remember. That's fairly straightforward:

Setting up OPNSense for API backups

We need to set up a way to access the OPNSense backups API, ideally not using our root user - or indeed any user with more access privileges than necessary to create backups. To accomplish this we'll set up a new Group called backups - create the Group via the OPNSense UI, then edit it to assign the Diagnostics: Configuration History privilege. This grants access to the /api/core/backup/ APIs.

Then, create a new User called backup, and add it to the new backups Group. Your Group configuration will end up looking something like this:

Now that you have a new backup User, which has access only to configuration/backups APIs, you need to generate an API Key and Secret. Do this in the UI (your actual key will be a long random string):

Creating an API Key for the user will automatically initiate a download in your browser of a text file containing 2 lines - the key itself and a secret. This is the one and only time you will be able to gain access to the secret, so save it somewhere. An encrypted version of it will be kept in OPNSense, but you'll never be able to get hold of the non-encrypted version again if you lose it. Here's what the text file will look like:

key=SUPER+TOP+SECRET+KEY
secret=alongstringofrandomlettersandnumbers
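If you'd rather have a script read that file than copy values around by hand, a minimal parser might look like this. It's a hypothetical sketch that assumes the file contains the key= and secret= lines shown above, and it splits each line on the first = only so that any = inside a value survives.

```typescript
interface ApiCredentials {
  key: string;
  secret: string;
}

// Parses the key=/secret= file that OPNSense downloads when you create an API key.
function parseApiKeyFile(contents: string): ApiCredentials {
  const values = new Map<string, string>();
  for (const rawLine of contents.split(/\r?\n/)) {
    const line = rawLine.trim();
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip blank or malformed lines
    values.set(line.slice(0, eq), line.slice(eq + 1));
  }
  const key = values.get("key");
  const secret = values.get("secret");
  if (!key || !secret) {
    throw new Error("Expected both key= and secret= lines");
  }
  return { key, secret };
}
```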

Downloading a backup via the API

Let's test out our new user with a curl command to download the current configuration. The -k tells curl to disregard the fact that OPNSense is likely to respond with an SSL certificate curl doesn't recognize (for your home network you are unlikely to care too much about this). The -u sends our new user's API Key and Secret using HTTP Basic auth:

$ curl -k -u "SUPER+TOP+SECRET+KEY":"alongstringofrandomlettersandnumbers" \
 https://firewall.local/api/core/backup/download/this > backup

$ ls -lh
total 120
-rw-r--r--  1 ed  staff    56K May 24 09:33 backup

Cool - we have a 56KB file called backup, which ends up looking something like this:

<?xml version="1.0"?>
<opnsense>
  <theme>opnsense</theme>
  <sysctl>
    <item>
      <descr>Increase UFS read-ahead speeds to match the state of hard drives and NCQ.</descr>
      <tunable>vfs.read_max</tunable>
      <value>default</value>
    </item>
    <item>
      <descr>Set the ephemeral port range to be lower.</descr>
      <tunable>net.inet.ip.portrange.first</tunable>
      <value>default</value>
    </item>
    <item>
      <descr>Drop packets to closed TCP ports without returning a RST</descr>
      <tunable>net.inet.tcp.blackhole</tunable>
      <value>default</value>

... 1000 more lines of this ...

</opnsense>

In my case I have a couple of thousand lines of this stuff - you may have more or less. Obviously, we wouldn't usually want to do this via a curl command, especially not one that resulted in our access credentials finding their way into our command line history, so let's make this a little bit better.
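One way to keep the credentials out of your shell history is to read them from environment variables instead. Here's a minimal TypeScript sketch, assuming Node 18+ for the built-in fetch; the variable names API_KEY, API_SECRET and HOSTNAME mirror the ones the Docker image described later expects, but the functions themselves are hypothetical. It builds the same HTTP Basic auth header that curl's -u flag sends: "Basic " followed by the base64 encoding of key:secret.

```typescript
interface BackupRequest {
  url: string;
  headers: Record<string, string>;
}

// Reads credentials from the environment instead of the command line.
function backupRequestFromEnv(env: Record<string, string | undefined>): BackupRequest {
  const { API_KEY, API_SECRET, HOSTNAME } = env;
  if (!API_KEY || !API_SECRET || !HOSTNAME) {
    throw new Error("API_KEY, API_SECRET and HOSTNAME must all be set");
  }
  // Same header curl -u produces: "Basic " + base64("key:secret").
  const token = Buffer.from(`${API_KEY}:${API_SECRET}`, "utf8").toString("base64");
  return {
    url: `https://${HOSTNAME}/api/core/backup/download/this`,
    headers: { Authorization: `Basic ${token}` },
  };
}

// Downloads the configuration XML. Unlike curl -k, fetch verifies TLS certificates,
// so a self-signed firewall certificate must be trusted first (e.g. via NODE_EXTRA_CA_CERTS).
async function downloadBackup(request: BackupRequest): Promise<string> {
  const res = await fetch(request.url, { headers: request.headers });
  if (!res.ok) {
    throw new Error(`Backup download failed with HTTP ${res.status}`);
  }
  return res.text();
}
```

Usage would be along the lines of `const xml = await downloadBackup(backupRequestFromEnv(process.env));`.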

Automating it all

There are a variety of options here, on 2 main axes:

Where to send your backups
How often to make a backup

In my case I want to put the file into a git repository, along with other network configuration files. OPNSense does have a built-in way to back up files to a git repo, but I want to be able to put more than just OPNSense config files in this repo, so I went for a more extensible approach.

Daily backups seem reasonable here, as well as the option to create them ad hoc. Ideally one would just run a single script and a timestamped backup would appear in a backups repo. As I recently set up TrueNAS SCALE on my local network, this seemed a great place to host a schedulable Docker image, so that's what I did.

The Docker image in question handles downloading the backups and pushing them to a GitHub repository. This approach allows us to easily schedule and manage our backups using TrueNAS SCALE, or anywhere else on the network you can run a docker container. It's published as edspencer/opnsense-autobackup on Docker Hub, and the source code is up at https://github.com/edspencer/opnsense-autobackup.
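The container log further down shows each backup saved both as latest.xml and as a timestamped file such as opnsense_2024-05-25_16-58.xml, where the timestamp is in UTC (the log's local time is 09:58 at -07:00). A hypothetical helper producing names in that format might look like this; it isn't taken from the opnsense-autobackup source.

```typescript
// Hypothetical helper: builds names like opnsense_2024-05-25_16-58.xml from a UTC timestamp.
function backupFileName(date: Date): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  const day = `${date.getUTCFullYear()}-${pad(date.getUTCMonth() + 1)}-${pad(date.getUTCDate())}`;
  const time = `${pad(date.getUTCHours())}-${pad(date.getUTCMinutes())}`;
  return `opnsense_${day}_${time}.xml`;
}
```

Using UTC keeps the names sortable and unambiguous regardless of where the container runs.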

Setting Up the Docker Container on TrueNAS SCALE

Here’s a quick walkthrough on how to set up the Docker container on TrueNAS SCALE and configure it to automate your OPNSense backups.

Prerequisites

Docker Installed on TrueNAS SCALE: Ensure that Docker is installed and running on your TrueNAS SCALE system.
GitHub Repository: Create a GitHub repository to store your backups.
GitHub Personal Access Token: Generate a GitHub personal access token with repo read/write permissions to allow the Docker container to push to your repository.

Generate a GitHub Personal Access Token

Go to GitHub Settings.
Click on Generate new token.
Give your token a descriptive name and grant it read and write permissions for your new backups repository.
Click Generate token.
Copy the token and save it securely. You will need it to configure the Docker container.

Set Up the Docker Container on TrueNAS SCALE

Navigate to the Apps screen on the TrueNAS SCALE instance, then click Discover Apps followed by Custom App. Give your app a name and set it to use the edspencer/opnsense-autobackup Docker image, using the latest tag.

You'll need to provide the following environment variables, so configure those now in the Container Environment Variables section:

| Name          | Value                                          |
|---------------|------------------------------------------------|
| API_KEY       | your_opnsense_api_key                          |
| API_SECRET    | your_opnsense_api_secret                       |
| HOSTNAME      | firewall.local                                 |
| GIT_REPO_URL  | https://github.com/your_username/your_repo.git |
| GIT_USERNAME  | your_git_username                              |
| GIT_EMAIL     | your_git_email                                 |
| GIT_TOKEN     | your_git_token                                 |
| CRON_SCHEDULE | 0 0 * * *                                      |

Set the CRON_SCHEDULE to anything you like - this one will make it run every day at midnight UTC. Click Install to finish, and you should see the app up and running. So long as you have created your GitHub repo and PAT, you should already see your first backup files in your repo. Depending on what you set for your CRON_SCHEDULE, you'll see new files automatically appearing as long as the image is running.

And you should see some Docker log output like this:

2024-05-25 09:58:05.362503-07:00 CRON_SCHEDULE provided: 0 * * * *. Setting up cron job...
2024-05-25 09:58:07.707058-07:00 Starting cron service...
2024-05-25 09:58:07.707137-07:00 Starting backup process...
2024-05-25 09:58:07.708367-07:00 Cloning the repository...
2024-05-25 09:58:07.710068-07:00 Cloning into '/repo'...
2024-05-25 09:58:08.339297-07:00 Downloading backup...
2024-05-25 09:58:08.343397-07:00 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2024-05-25 09:58:08.343461-07:00 Dload  Upload   Total   Spent    Left  Speed
2024-05-25 09:58:08.379857-07:00 0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 57117  100 57117    0     0  1521k      0 --:--:-- --:--:-- --:--:-- 1549k
2024-05-25 09:58:08.381179-07:00 Saving backup as latest.xml and opnsense_2024-05-25_16-58.xml...
2024-05-25 09:58:08.391197-07:00 [main 7922900] Backups generated 2024-05-25_16-58
2024-05-25 09:58:08.391785-07:00 1 file changed, 1650 insertions(+)
2024-05-25 09:58:08.391814-07:00 create mode 100644 opnsense_2024-05-25_16-58.xml
2024-05-25 09:58:09.087436-07:00 To https://github.com/edspencer/opnsense-backups.git
2024-05-25 09:58:09.087476-07:00 bce0d8a..7922900  main -> main
2024-05-25 09:58:09.090436-07:00 Backup process completed.
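Note that this log was captured with CRON_SCHEDULE set to 0 * * * *, which fires at minute 0 of every hour, whereas the 0 0 * * * in the table above fires once a day at 00:00. The five fields are minute, hour, day of month, month and day of week. A deliberately tiny matcher makes the difference easy to check; it's a hypothetical illustration handling only *, plain numbers and comma lists in UTC, not what the container actually uses.

```typescript
// Checks one cron field against a value; supports "*", plain numbers and comma lists only.
function fieldMatches(field: string, value: number): boolean {
  return field.split(",").some((part) => part === "*" || Number(part) === value);
}

// Fields: minute, hour, day of month, month (1-12), day of week (0 = Sunday), in UTC.
// Simplification: every field must match, whereas real cron treats day of month and
// day of week as alternatives when both are restricted.
function cronMatches(expr: string, date: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${fields.length}`);
  }
  const values = [
    date.getUTCMinutes(),
    date.getUTCHours(),
    date.getUTCDate(),
    date.getUTCMonth() + 1,
    date.getUTCDay(),
  ];
  return fields.every((field, i) => fieldMatches(field, values[i]));
}
```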

Conclusions and Improvements

I feel much safer knowing that OPNSense is now being continually backed up. There are a bunch of other heavily-configured devices on my network that I would like centralized daily backups for - Home Assistant and my managed switch configs being the obvious ones. More to come on those.

Obviously you could run this anywhere, not just in TrueNAS, but I like the simplicity, observability and resource reuse of using the TrueNAS installation I already set up. So far that's working out well, though it could use some monitoring and alerting in case it stops working.

For a detailed guide on setting up the Docker container and automating your backups, visit the GitHub repository. The script that actually gets run is super simple, and easily adaptable to your own needs.

Automating a Home Theater with Home Assistant

Thu, 25 Jan 2024 06:31:02 GMT

I built out a home theater in my house a couple of years ago, in what used to be a bedroom. From the moment it became functional, we started spending most evenings in there, and got into the rhythm of how to turn it all on:

Turn on the receiver, make sure it's on the right channel
Turn on the ceiling fan at the right setting
Turn on the lights just right
Turn on the projector, but not too soon or it will have issues
Turn on the Apple TV, which we consume most of our content on

Not crazy difficult, but there is a little dance to perform here. Turning on the projector too soon leaves it permanently unable to talk to the receiver for some reason (probably to prevent me from downloading a car), so it has to be delayed by the right number of seconds; otherwise you have to go through a lengthy power cycle to get it to work.

It also involves locating no fewer than 5 remote controls, 3 of which use infrared. The receiver is hidden away in a closet, so you have to go in there to turn it on, remote control or no. Let's see if we can automate this so you can turn the whole thing on with a single button.

IR is not a good solution

The first things I tried were these IR Repeaters, which I figured would allow me to keep the receiver remote in the theater and not have to go into the closet. I tried a few different models but they were all super weak for some reason, despite being plugged in, to the extent that you need to position the re-emitter within inches of the device's IR sensor. I couldn't achieve that in a way that wasn't ugly with wires hanging everywhere, so I gave up on that idea.

Then I tried these Bestcon IR Blaster things, which in theory allow you to record remote control buttons and repeat them. The IR Blaster can join your network, which means it can be automated using Home Assistant, which I use extensively around the house already. I planned to place one of these in the theater itself (for the projector) and another in the closet (for the receiver).

This kinda worked, but it was a bit of a pain to program and they just weren't reliably triggering the devices. Significantly more than zero percent of the time the signal didn't get through, and as with the IR repeaters, you end up with more wires hanging around as it still relies on line-of-sight to the IR sensor. It's also another moving part, something else to go wrong, and another random device on your network, so it seems this has more downsides than upsides.

Finally, the coup de grace was that the IR Blaster doesn't know what state your devices are in (bad if you're trying to turn on your device, it's already on, and now you just toggled it off), nor does it know if the command it tried to send was received or needs to be re-tried. There must be a better way...

TCP/IP to the rescue

It turns out all of the devices I wanted to control were network-connectable. In the case of the receiver (a Denon AVR-X6700H) and the projector (a Sony VPL-VW325ES), there's an ethernet port that lets you plug it directly into your network.

Both of these devices actually expose a little HTTP server hosting a basic web app. These allow you to both power on/off the device, and do things like change input. The receiver, pleasingly enough, actually publishes a pretty complete API, which allows you to do basically anything you could do with the remote in your hand, including advanced configuration. Awesome.

The documentation is extensive, though rather dense. Contained therein is the fact that we can send commands like ZMON and SIBD to the receiver, which will turn it on and Switch Input to the Bluray Disk input respectively. As well as the web UI, the receiver exposes a way to send those odd little commands over HTTP - in this case we can just send a GET request to http://receiver.local:8080/goform/formiPhoneAppDirect.xml?ZMON, which will turn on the receiver's main zone. Swap ZMON for whatever command you want to run. There's no actual iPhoneApp involved here, but I guess from this URL that one exists.
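Sending one of those commands from code is just a GET request to that URL. Here's a minimal TypeScript sketch, assuming Node 18+ for the built-in fetch; receiver.local and port 8080 come from the example above, and the helper names are hypothetical. It passes the command through unencoded, which is fine for plain commands like ZMON and SIBD.

```typescript
// Builds the command URL: the raw command (e.g. ZMON, SIBD) is the entire query string.
function denonCommandUrl(host: string, command: string, port = 8080): string {
  return `http://${host}:${port}/goform/formiPhoneAppDirect.xml?${command}`;
}

// Fires a single command at the receiver and fails loudly on a non-2xx response.
async function sendDenonCommand(host: string, command: string): Promise<void> {
  const res = await fetch(denonCommandUrl(host, command));
  if (!res.ok) {
    throw new Error(`Receiver returned HTTP ${res.status} for ${command}`);
  }
}
```

For example, `await sendDenonCommand("receiver.local", "ZMON")` would turn on the main zone.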

As we'll see in a moment, that's all we need to know to get Home Assistant able to control both of these devices.

Home Assistant Preparation

Home Assistant already has built-in support for controlling the Sony Projector, so now that it has an IP on our network we can just tell Home Assistant where to find the Projector. As per the docs, this requires a manual edit to configuration.yaml, which is unusual but easy enough.

There are several ways to edit that file, the easiest probably being to use the File Editor addon. Again per the docs, this just means adding lines 12-15 to your configuration.yaml file (replace projector.local with the IP of the projector if you don't have fancy local DNS wizardry running):

Either restart Home Assistant or reload the YAML so it can pick that up. Now your projector shows up as a persistent switch in Home Assistant, so you can turn it on/off at will either via the Home Assistant UI or via scripts and other automations.

To let Home Assistant talk to the receiver, I had to install the Denon AVR integration. That's easy to set up and gives you a fairly basic device page for the receiver, where you can turn it on/off but not much else:

But it also gives you 3 additional services you can call in your automations, one of which is the all-important Denon AVR Network Receivers: Get command.

The Script

At this point the script is pretty easy. In order, we:

Use that iPhoneAppDirect.xml path to send the ZMON command (Zone Main On) to the receiver
Turn on the fan (using the Bond integration we fixed last time)
Set the correct lighting scene (all Philips Hue fixtures and LED strips in this case)
Wait the right number of seconds so the projector can talk to the receiver properly
Call the receiver again to switch to the XBox input (SIBD)
Call the Switch: turn on service on the Projector entity that we added to configuration.yaml
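The script itself lives in Home Assistant, but the ordering is the important part, so here it is as a hypothetical TypeScript sketch. Each step is an injected async function standing in for the real service call, and the projector delay is a parameter because the right number of seconds depends on your hardware.

```typescript
type Step = () => Promise<void>;

// Stand-ins for the real calls: Denon command, Bond fan, Hue scene, projector switch.
interface TheaterSteps {
  receiverOn: Step; // ZMON
  fanOn: Step;
  lightingScene: Step;
  receiverToXbox: Step; // SIBD
  projectorOn: Step;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs the steps strictly in order, pausing before the input switch and projector.
async function theaterOn(steps: TheaterSteps, projectorDelayMs: number): Promise<void> {
  await steps.receiverOn();
  await steps.fanOn();
  await steps.lightingScene();
  await sleep(projectorDelayMs); // give the receiver time so the projector can talk to it
  await steps.receiverToXbox();
  await steps.projectorOn();
}
```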

We switch to the XBox input first in case we're going to watch a Bluray; otherwise we just press any button on the Apple TV remote to wake that device up and the receiver automatically switches to it. There is also a Theater Off script, which basically does the opposite of the above.

An occasionally useful feature is that we can now turn the theater on and off remotely. As the projector does take a couple of minutes to warm up it can be nice to turn it all on with one button on my phone and then waddle over there a couple of minutes later to find everything ready.

Triggering with a light switch

As I had switched all of the lights in the room to be various Philips Hue fixtures and light strips, the 2-gang light switch box by the entry door suddenly became redundant as all of the lights were permanently powered. This gave a delightful opportunity to install on and off switches in their stead:

These two switches are not connected to any power source, but to a Philips Hue Wall Switch Module, which is just a simple battery-powered device that detects when you flip the switch and exposes that event to the rest of the Hue ecosystem. Because Hue integrates well with Home Assistant, that means we can trivially use it as a trigger for our automations.

The Hue wall module approach works well for this, even though it's not really what it's designed for. All it does is track when a switch is flipped - it doesn't know whether it's on or off, doesn't stop you from flipping it several times (though Home Assistant can dedupe that if necessary), and some day the battery will need to be replaced, but it's served as an excellent solution for us. It also means guests don't have to figure out how to turn everything on/off correctly - just flip the switch.
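Deduping repeated flips can be as simple as accepting the first event and ignoring any others that arrive within a short window. Here's that idea as a hypothetical TypeScript sketch; in Home Assistant itself you'd typically express it in the automation's configuration rather than in code, and the window length is up to you.

```typescript
// Returns a filter that accepts an event only if at least windowMs has passed
// since the last accepted event; quick repeat flips are dropped.
function createFlipFilter(windowMs: number): (timestampMs: number) => boolean {
  let lastAccepted = Number.NEGATIVE_INFINITY;
  return (timestampMs) => {
    if (timestampMs - lastAccepted < windowMs) {
      return false;
    }
    lastAccepted = timestampMs;
    return true;
  };
}
```

With a two-second window, flips at 0, 500, 1900, 2500, 3000 and 4600 ms would trigger the script at 0, 2500 and 4600 ms only.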

Possible Extensions & Limitations

Home Assistant can also integrate more deeply with XBox and Apple TV. In the case of XBox, this requires you to switch it into a much more power-hungry standby mode, which would have the device consuming 30 watts in standby. That's a huge amount of power to spend just to let HA turn it on and off, so I passed on that.

Similarly, HA can integrate more deeply with Apple TV - loading content as well as just turning it on/off. But, as we use a variety of different apps, the integration wouldn't have much of a chance of knowing which one we're going to choose, so while there's no real power consumption downside, it just wouldn't be useful in our case.

How to make Bond fans work better with Home Assistant

Sat, 13 Jan 2024 06:31:02 GMT

I have a bunch of these nice Minka Aire fans in my house:

They're nice to look at and, crucially, silent when running (so long as the screws are nice and tight). They also have some smart home capabilities using the Smart by Bond stack. This gives us a way to integrate our fans with things like Alexa, Google Home and, in my case, Home Assistant.

Connecting Bond with Home Assistant

To connect anything to these fans you need a Bond Wi-Fi bridge, which acts as the bridge between your fans and your network. Once you've got it set up and connected to your Wi-Fi network, you'll need to figure out its IP address. You can then send a curl request to the device to get the access token that you'll need to add it to Home Assistant:

If you get an access denied error, it's probably because the Bond bridge needs a proof of ownership signal. The easiest way to do that is to just power off the bridge and power it on again - run the curl again within 10 minutes of the bridge coming up and you'll get your token.
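The exact curl command isn't reproduced here, so here's a rough TypeScript sketch of the same token request. It assumes Bond's Local API serves the token from /v2/token and returns a JSON body with `locked` and `token` fields. Both the path and the response shape are assumptions on my part, so check Bond's API docs. The network call is left commented out so the helpers run offline.

```typescript
// Assumed response shape from the bridge's token endpoint (not confirmed by this post).
interface BondTokenResponse {
  locked?: number;
  token?: string;
}

// Build the URL you would curl (or fetch) to ask the bridge for its token.
// The /v2/token path is an assumption based on Bond's Local API.
function bondTokenUrl(bridgeIp: string): string {
  return `http://${bridgeIp}/v2/token`;
}

// Pull the token out of the response, or explain that proof of ownership is needed.
function extractBondToken(body: BondTokenResponse): string {
  if (body.locked === 1 || !body.token) {
    throw new Error("Bridge is locked: power-cycle it and retry within 10 minutes");
  }
  return body.token;
}

// Usage with Node 18+'s global fetch (commented out so the sketch runs offline):
// const res = await fetch(bondTokenUrl("192.168.1.50"));
// console.log(extractBondToken(await res.json()));
```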

Integrating Bond with Home Assistant is then pretty easy: search for the Bond integration at http://homeassistant.local:8123/config/integrations/dashboard (substitute your own Home Assistant host if it's different) and install it, providing the IP and token for your Bond bridge:

It will populate your fans - here's an example, the fan in my home theater:

The top 2 controls there in theory control the fan and the fan's light.

The Annoying Light Toggle bug

Sometimes the light on the fan gets turned on and is impossible to turn off. Whether you use the remote control, the Bond app or Home Assistant, no force in the known universe will turn the fan light off. It's really annoying when it happens. The only way to fix it is to turn the fan off and on again at the breaker, after which it will start responding to commands again.

It also seems to be implemented as a memory-less toggle in some contexts, and a dimmable light in others, and Bond/Home Assistant don't necessarily know the current true state of the light. The Bond app even has a settings page called "Fix Tracked State", where you can go to manually override what Bond thinks is the current light state, assuming it has wandered out of sync. However, even after toggling this in all the ways I could, the bug persisted and it still needed a visit to the breaker box.

One annoying way this bug manifests itself is that the fan lights turn on when I run my "All Lights Off" script in Home Assistant. This script calls the light.turn_off service on all of the lights in a set of areas of the house. Curiously, this turns the fan lights ON. I guess that's because Bond doesn't know whether the light is on or off, so it just tries to toggle it.

Given that one of these fans is in the bedroom and I press a button that runs the script when we're ready to sleep, it's a little unfortunate that the "All Lights Off" script ends up turning on a bright fan light. Doubly so when I have to walk to the garage to power cycle the breaker to be able to turn the light off again. We need a solution here.

Home Assistant - disable the entity

In my case, as I'm using Home Assistant for basically all of the automations in the house, and there is never a time when I want to turn the fan light on, I just disabled it in Home Assistant. There are a few ways to do this; one is to use the Entities view in Home Assistant and search for "fan". Click the light for the fan in question (the row with the lightbulb icon), then the cog in the top right of the modal window, and uncheck the "Enabled" flag:

Now back on your device page the control for the fan light will have disappeared and it'll tell you that an entity is not being shown. Now, calls to light.turn_off won't target the fan's light, and therefore won't turn it on when you don't want it to. Your scripts can still turn the fan itself on/off and set the speed though.

Although we lose the ability to control the fan light by doing this, that's not why I have the fans so I don't really care. We have other lighting in the rooms with those fans, so the fan light is never used. Their value is in their silence and prettiness. It's awesome that they integrate with things like Home Assistant. Hopefully this helps out others who have run into similar problems.

Demystifying OpenAI Assistants - Runs, Threads, Messages, Files and Tools

Fri, 17 Nov 2023 03:11:02 GMT

As I mentioned in the previous post, OpenAI dropped a ton of functionality recently, with the shiny new Assistants API taking center stage. In this release, OpenAI introduced the concepts of Threads, Messages, Runs, Files and Tools - all higher-level concepts that make it a little easier to reason about long-running discussions involving multiple human and AI users.

Prior to this, most of what we did with OpenAI's API was call the chat completions API (setting all the non-text modalities aside for now), but to do so we had to keep passing all of the context of the conversation to OpenAI on each API call. This means persisting conversation state on our end, which is fine, but the Assistants API and related functionality makes it easier for developers to get started without reinventing the wheel.

OpenAI Assistants

An OpenAI Assistant is defined as an entity with a name, description, instructions, default model, default tools and default files. It looks like this:

Let's break this down a little. The name and description are self-explanatory - you can change them later via the modify Assistant API, but they're otherwise static from Run to Run. The model and instructions fields should also be familiar to you, but in this case they act as defaults and can be easily overridden for a given Run, as we'll see in a moment.

Tools needs a little more explanation. Tools refers to the set of optional capabilities that can be enabled for the Assistant, but they can also be overridden for a particular run. There are 2 broad types of Tool - OpenAI-hosted and self-hosted. At the moment there are 2 OpenAI-hosted tools - Code Interpreter and Retrieval. To allow your Assistant to write and run code to solve problems, you must enable Code Interpreter; to allow it to look at files you give it, you must enable Retrieval. I suspect this category of tools will just be switched on by default in the future, but for now you have to do it yourself.

The second set of tools are your Custom Functions. I discussed these a little in the last post - basically it's just a way to tell the Assistant about functions you have in your codebase that you would like it to be able to invoke (albeit not directly - read the previous post for more). These are just JSON definitions of the names and shapes of your functions - there's no actual code being sent or run there.

Tools, therefore, means zero or more of your own Custom Functions, plus Retrieval and/or Code Interpreter, if you want to enable them. Tools can be defined at Assistant creation-time, but can be overridden at Run creation-time.

Finally, let's examine Files. Files are actually their own top-level concept; once you upload a File you can then link it to Assistants or Messages - under the covers there are AssistantFile and MessageFile objects that allow there to be a many-to-many relationship between Assistants and Files. Again, Files you make available to your Assistant at creation-time can be overridden at Run-time.

Threads and Messages

A Thread is just an ordered array of Messages. A Message has a role (either "user" or "assistant" - human or machine), some content (what the participant said) and an optional set of Files. As before, the Files are linked to the Message via an underlying MessageFile, so Files can be reused between Assistants and Messages.

In this example we have a Thread with 4 Messages. The first two are from human participants in the Thread, perhaps Bob is asking for some calendar and product data, so Fred (another human) sends it, along with whatever message content he wrote. But there are also 2 Assistants in the Thread - imaginatively named Assistant 1 and Assistant 2, who wrote Message 3 and Message 4 respectively. In order for these two Messages to be created and added to the Thread, the Assistants will need to be invoked via a Run.

So What's a Run?

A Run is an entity that represents the process of invoking an Assistant on a Thread. Only one Run can be executing at a time for a given Thread. The Run configuration declares which Assistant should be invoked, what Thread ID to use, and then a bunch of familiar-looking optional parameters. For example, you can define the instructions for the Assistant when you create the Assistant itself, but you can also override them for the specific Run:

You probably noticed that the Run diagram looks a lot like the Assistant diagram. Most of the stuff you can define on the Assistant can be overridden at Run creation-time. You can even change which model the Assistant uses during the Run, which feels a little odd and probably isn't something you'd do too often, but at the end of the day it's just swapping one text-in-text-out function call for another so why not - see the final paragraph of this post for why this might be.

Although you can set your Assistant up with Tools and Files, you can also override those at Run creation-time. It's nice to have that flexibility, though I think it's easier to reason about Assistant capabilities than Run-specific Assistant capabilities, so I suspect most use cases will not involve overriding Tools and Files at Run-creation time. You are currently limited to 20 Files per Assistant, with some size limits too, so the Run-specific overriding of Files would be a way to have your Assistants operate on more than 20 files during the Thread lifetime. That's a slightly hacky way around what is probably a short-term limitation though.
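To make the override rules concrete, here's a simplified TypeScript model of an Assistant and a Run request, with a hypothetical resolveRunConfig helper that works out what the Assistant actually runs with. The field names loosely mirror the concepts above. The real API uses snake_case and its own object shapes, so treat this as a mental model rather than the SDK. Anything the Run specifies wins, and anything it leaves out falls back to the Assistant's defaults.

```typescript
// Simplified, hypothetical model of the concepts above; not the OpenAI SDK types.
type Tool =
  | { type: "code_interpreter" }
  | { type: "retrieval" }
  | { type: "function"; function: { name: string; description?: string } };

interface Assistant {
  id: string;
  name: string;
  description: string;
  model: string;
  instructions: string;
  tools: Tool[];
  fileIds: string[];
}

// A Run must name its Assistant and Thread; everything else is an optional override.
interface RunRequest {
  assistantId: string;
  threadId: string;
  model?: string;
  instructions?: string;
  tools?: Tool[];
  fileIds?: string[];
}

interface EffectiveRunConfig {
  threadId: string;
  model: string;
  instructions: string;
  tools: Tool[];
  fileIds: string[];
}

// Run-level values win; anything the Run leaves out falls back to the Assistant.
function resolveRunConfig(assistant: Assistant, run: RunRequest): EffectiveRunConfig {
  if (run.assistantId !== assistant.id) {
    throw new Error(`Run targets ${run.assistantId}, not ${assistant.id}`);
  }
  return {
    threadId: run.threadId,
    model: run.model ?? assistant.model,
    instructions: run.instructions ?? assistant.instructions,
    tools: run.tools ?? assistant.tools,
    fileIds: run.fileIds ?? assistant.fileIds,
  };
}

const assistant2: Assistant = {
  id: "asst_2",
  name: "Assistant 2",
  description: "Answers shipping questions",
  model: "gpt-4-1106-preview",
  instructions: "Be concise.",
  tools: [{ type: "retrieval" }],
  fileIds: ["file_abc"],
};

// Override model and instructions for this Run only; Tools and Files fall back.
const run2 = resolveRunConfig(assistant2, {
  assistantId: "asst_2",
  threadId: "thread_123",
  model: "gpt-3.5-turbo-1106",
  instructions: "Reply as a logistics expert.",
});
```

Seen this way, a Run that overrides every field (like Run 3 below) leaves nothing of the Assistant but its ID.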

Tracing Runs across a Thread

Returning to our Thread 123 example a couple of pictures up, let's take a look at the Runs that were invoked against our Thread. In the image below we have 3 runs - the last one is a bonus Run against a hypothetical Message 5 in our Thread, showing that you can override basically everything an Assistant is on the Run itself.

Run 1 was created against our Thread 123 at some point after Bob and Fred had sent their Messages (Message 1 and Message 2). Run 1 is super basic - it just defines the Assistant to use (Assistant 1) and the Thread to execute on (Thread 123 - the same for all of these Runs). Its execution yields Message 3, which is added to the Thread.

We then triggered Run 2, this time asking Assistant 2 to provide its input, as well as overriding both the model and instructions for Assistant 2, and providing a custom set of Tools. This yields Message 4, which completes the Thread example above.

Run 3 is just to show what a next Run invocation might look like, customizing Files, Tools, model and instructions. At this point, you're arguably not using the Assistants API at all as everything in your Assistant has been overridden.

Bear in mind that each Run has to be triggered by something - it won't happen automatically by Messages being appended to a Thread, so you need something that actually kicks this off. One challenge in Threads that involve multiple human and Assistant users is figuring out when to invoke which Assistant - I'll have some more thoughts on that in an upcoming post.

A Simplified Conceptual Model

Let's close out with a simplified diagram of the relationships between the actors in this play. On the right we find the Assistant, configured with its default Files and Tools. It is also tied to a set of Runs, as each Run is executed against a single Assistant. There's a one-to-many relationship between the Assistant and its Runs, though these Runs could be against more than one Thread.

On the other side of the diagram, we see that a Thread is composed of multiple Messages, which can be added to later, and that a Thread also has multiple Runs associated with it. Messages can have message-specific Files attached in addition to their content.

Finally, the glue holding it all together in the center is the Run, which executes on a specific Thread using a specific Assistant, but can also provide Run-specific Files and Tools to make available to the Assistant during the invocation. Usually a new Message will be appended to the Thread as a result of the invocation, but the Run lifecycle is a little deeper than that and worthy of further examination in another post.

Although there are implied one-to-many relationships between Assistant and Run, and between Thread and Run, there is currently no way to get all of the Runs for a given Thread [UPDATE: listRuns API now does this] or for a given Assistant, so if you want to track the state of a Run currently executing on a Thread, you need to keep track of both the Thread ID and the Run ID to be able to use the getRun API to get the Run status. I imagine this will change in the near future.
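In practice that means holding on to both IDs and polling until the Run reaches a status you need to act on. Here's a minimal sketch with the actual API call injected as a function, so the loop can be exercised without a network connection. `RetrieveRun` is a hypothetical stand-in for the getRun call. The status list is the set the Assistants API documented at the time of writing, and the 2-second interval is arbitrary.

```typescript
type RunStatus =
  | "queued" | "in_progress" | "requires_action" | "cancelling"
  | "cancelled" | "failed" | "completed" | "expired";

interface RunSnapshot { id: string; threadId: string; status: RunStatus; }

// Hypothetical stand-in for the real "get Run" call; inject the SDK call here.
type RetrieveRun = (threadId: string, runId: string) => Promise<RunSnapshot>;

// Statuses where the caller must act (or give up) rather than keep waiting.
const STOP_STATUSES: RunStatus[] = ["requires_action", "cancelled", "failed", "completed", "expired"];

async function pollRun(
  retrieve: RetrieveRun,
  threadId: string,
  runId: string,
  intervalMs = 2000,
  maxAttempts = 30,
): Promise<RunSnapshot> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Both IDs are needed: a Run is looked up in the context of its Thread.
    const run = await retrieve(threadId, runId);
    if (STOP_STATUSES.includes(run.status)) return run;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Run ${runId} on ${threadId} did not settle after ${maxAttempts} attempts`);
}
```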

This is definitely progress in terms of making it easier for developers to build persistent generative AI applications with a chat component, though it looks like this is all just an abstraction placed over the same old underlying LLM text-in-text-out function. That's not to say that abstractions like this are not a very welcome thing, just bear in mind what's really happening under the covers.

Looking at the picture above it's fairly easy to see how the set of Messages, Files and Tools (the Custom Function definitions at least) in a Thread could be smushed together into a big ole blob of text and fed to the LLM, probably stitched inside some other prompt text. This is why it's reasonable (though probably not all that useful) to swap out the model between Runs - at the end of the day we're just passing a bunch of text into a function called an LLM and getting some text out of it.
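Purely to illustrate that point, here's one hypothetical way a Thread, its instructions and its Custom Function definitions could be flattened into a single prompt string. Nobody outside OpenAI knows what their real prompt assembly looks like. The `author` field and the layout are my own inventions.

```typescript
// Illustrative only: this is not how OpenAI actually builds its prompts.
interface ThreadMessage { role: "user" | "assistant"; author: string; content: string; }
interface FunctionDef { name: string; description: string; }

function flattenThread(instructions: string, functions: FunctionDef[], messages: ThreadMessage[]): string {
  // One line per Custom Function definition...
  const fnList = functions.map((f) => `- ${f.name}: ${f.description}`).join("\n");
  // ...and one line per Message, in Thread order.
  const transcript = messages.map((m) => `${m.author} (${m.role}): ${m.content}`).join("\n");
  return [instructions, "Available functions:", fnList, "Conversation:", transcript].join("\n\n");
}

const prompt = flattenThread(
  "You are a helpful planner.",
  [{ name: "addTask", description: "Adds a new task to the database." }],
  [
    { role: "user", author: "Bob", content: "Can someone send me the calendar?" },
    { role: "user", author: "Fred", content: "Here it is." },
  ],
);
console.log(prompt);
```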

Using ChatGPT to generate ChatGPT Assistants

Wed, 15 Nov 2023 02:02:48 GMT

OpenAI dropped a ton of cool stuff in their Dev Day presentations, including some updates to function calling. There are a few function-call-like things that currently exist within the OpenAI ecosystem, so let's take a moment to disambiguate:

Plugins: introduced in March 2023, allowed GPT to understand and call your HTTP APIs
Actions: an evolution of Plugins, makes it easier but still calls your HTTP APIs
Function Calling: GPT understands your functions, tells you how to call them, but does not actually call them

It seems likely that Plugins will be superseded by Actions, leaving us with 2 ways to have GPT call your functions: Actions for automatically calling HTTP APIs, and Function Calling for indirectly calling anything else. Despite its name, Function Calling doesn't actually call the function, it just tells you how to, so we could call it Guided Invocation.

That second category of calls is going to include anything that isn't an HTTP endpoint, so gives you a lot of flexibility to call internal APIs that never learned how to speak HTTP. Think legacy systems, private APIs that you don't want to expose to the internet, and other places where this can act as a highly adaptable glue.

I've put all the source code for this article up at https://github.com/edspencer/gpt-functions-example, so check that out if you want to follow along. It should just be a matter of following the steps in the README, but YMMV. We are, of course, going to use a task management app as a playground.

Creating Function definitions

In order for OpenAI Assistants to be able to call your code, you need to provide them with signatures for all of your functions, in the format the API expects, which looks like this:

{
  "type": "function",
  "function": {
    "name": "addTask",
    "description": "Adds a new task to the database.",
    "parameters": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "The name of the task."
        },
        "priority": {
          "type": "number",
          "description": "The priority of the task, lower numbers indicating higher priority."
        },
        "completed": {
          "type": "boolean",
          "description": "Whether the task is marked as completed."
        }
      },
      "required": ["name"]
    }
  }
}

That's pretty self-explanatory. It's also a pain in the ass to keep tweaking and updating as you evolve your app, so let's use the OpenAI Chat Completions API with the json_object setting enabled and see if we can have this done for us.

Our Internal API

Let's build a basic Task management app. We'll just use a super-naive implementation of Todos written in TypeScript. My little API.ts has functions like addTask, updateTask, removeTask, getTasks, etc. All the stuff you'd expect. Some of them take a bunch of different inputs.

Here's a snippet of our API.ts file. It's very basic but functional, using a sqlite database driven by Prisma:

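// Assumes a Task model in schema.prisma and a generated Prisma client
import { PrismaClient, Task } from '@prisma/client';

const prisma = new PrismaClient();

interface TaskInput {
  name: string;
  priority?: number;
  completed?: boolean;
  deleted?: boolean;
}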

/**
 * Adds a new task to the database.
 * @param taskInput - An object containing the details of the task to be added.
 * @param taskInput.name - The name of the task.
 * @param taskInput.priority - The priority of the task.
 * @returns A Promise that resolves when the task has been added to the database.
 */
async function addTask(taskInput: TaskInput): Promise<Task | void> {
  try {
    const task = await prisma.task.create({
      data: taskInput
    })
    console.log(`Task ${task.id} created with name ${task.name} and priority ${task.priority}.`)

    return task;
  } catch (e) {
    console.error(e)
  }
}

/**
 * Updates a task in the database.
 * @param id - The ID of the task to update.
 * @param updates - An object containing the updates to apply to the task.
 * @param updates.name - The updated name of the task.
 * @param updates.priority - The updated priority of the task.
 * @param updates.completed - The updated completed status of the task.
 * @returns A Promise that resolves when the task has been updated in the database.
 */
async function updateTask(id: string, updates: Partial<TaskInput>): Promise<void> {
  try {
    const task = await prisma.task.update({
      where: { id },
      data: updates,
    })
    console.log(`Task ${task.id} updated with name ${task.name} and priority ${task.priority}.`)
  } catch (e) {
    console.error(e)
  }
}

It goes on from there. You get the picture. No, it's not production-grade code - don't use this as a launchpad for your Todo list manager app. GitHub Copilot actually wrote most of that code (and most of the documentation) for me.

Side note on documentation: it took me more years than I care to admit to figure out that the primary consumer of source code is humans, not machines. The machine doesn't care about your language, formatting, awfulness of your algorithms, weird variable names, etc; algorithmic complexity aside it'll do exactly the same thing regardless of how you craft your code. Humans are a different matter though, and benefit enormously from a little context written in a human language.

Ironically, that same documentation that benefitted human code consumers all this time is now what enables these new machine consumers to grok and invoke your code, saving you the work of coming up with a translation layer to integrate with AI agents. So writing documentation really does help you after all. Also, write tests and eat your vegetables.

Generating the OpenAI translation layer

The code to translate our internal API into something OpenAI can use is fairly simple and reusable. All we do is read in a file as text, stuff the contents of that file into a GPT prompt, send that off to OpenAI, stream the results back to the terminal and save it to a file when done:

/**
 * This file uses the OpenAI Chat Completions API to automatically generate OpenAI Function Call
 * JSON objects for an arbitrary code file. It takes a source file, reads it and passes it into 
 * OpenAI with a simple prompt, then writes the output to another file. Extend as needed.
 */

import OpenAI from 'openai';
import fs from 'fs';
import path from 'path';

import { OptionValues, program } from 'commander';

//takes an input file, and generates a new tools.json file based on the input file
program.option('-s, --sourceFile <file>', 'The source file to use for the prompt', './API.ts');
program.option('-o, --outputFile <file>', 'The output file to write the tools.json to (defaults to your input + .tools.json)');

const openai = new OpenAI();

/**
 * Takes an input file, and generates a new tools.json file based on the input file.
 * @param sourceFile - The source file to use for the prompt.
 * @param outputFile - The output file to write the tools.json to. Defaults to `${sourceFile}.tools.json`.
 * @returns Promise<void>
 */
async function build({ sourceFile, outputFile = `${sourceFile}.tools.json` }: OptionValues) {
  console.log(`Reading ${sourceFile}...`);
  const sourceFileText = fs.readFileSync(path.join(__dirname, sourceFile), 'utf-8');

  const prompt = `
    This is the implementation of my ${sourceFile} file:

    ${sourceFileText}

    Please give me a JSON object that contains a single key called "tools", which is an array of the functions in this file.
    This is an example of what I expect (one element of the array):

    {
      "type": "function",
      "function": {
        "name": "addTask",
        "description": "Adds a new task to the database.",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string",
              "description": "The name of the task."
            },
            "priority": {
              "type": "number",
              "description": "The priority of the task, with lower numbers indicating higher priority."
            },
            "completed": {
              "type": "boolean",
              "description": "Whether the task is marked as completed."
            }
          },
          "required": ["name"]
        }
      }
    },

  `
  //Call the OpenAI API to generate the function definition, and stream the results back
  const stream = await openai.chat.completions.create({
    model: 'gpt-4-1106-preview',
    response_format: { type: 'json_object' },
    messages: [{ role: 'user', content: prompt }],
    stream: true,
  });

  //Keep the new tools.json in memory until we have it all
  let newToolsJson = "";

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || ''
    process.stdout.write(content);
    newToolsJson += content;
  }

  console.log(`Updating ${outputFile}...`);

  // Write the generated tools JSON to the output file
  fs.writeFileSync(path.join(__dirname, outputFile), newToolsJson);
  console.log('Done');
}

build(program.parse(process.argv).opts());

I've made a simple little repo with this file, the API.ts file, and a little demo that shows it all integrated. Run it like this:

ts-node rebuildTools.ts -s API.ts

Which will give you some output like this, and then update your API.ts.tools.json file:

ts-node rebuildTools.ts -s API.ts          
Reading API.ts...
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "addTask",
        "description": "Adds a new task to the database.",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {

... truncated, full output at https://github.com/edspencer/gpt-functions-example/blob/main/API.ts.tools.json ...

        "returns": {
          "type": "Promise<void>",
          "description": "A Promise that resolves when all tasks have been deleted from the database."
        }
      }
    }
  ]
}
Updating ./API.ts.tools.json...
Done

Creating an OpenAI Assistant and talking to it

We've had OpenAI generate our Tools JSON file; now let's see if it can use it with a simple demo.ts, which:

Creates a new OpenAI Assistant with some custom instructions
Creates a new Thread
Creates a new Message, attaches it to the Thread
Creates a new Run and polls it until complete
Executes any actions that came back from OpenAI

The code is all up on GitHub and I won't do a blow-by-blow here, but let's have a look at the output when we run it:

ts-node ./demo.ts -m "I need to go buy bread from the store, then go to \
    the gym. I also need to do my taxes, which is a P1."

And the output:

Creating assistant...
Created assistant asst_hkT3BFQsNf3HSmJpE8KytiX9 with name Task Planner.
Created thread thread_AigYi0oFrytu3aO5k0mRacIV
Retrieved 0 tasks from the database.
Created message
msg_uLpR3UpQB3pX62wVIA7TcqIl
Polling thread
Current status: queued
Trying again in 2 seconds...
Polling thread
Current status: in_progress
Trying again in 2 seconds...
Polling thread
Current status: in_progress
Trying again in 2 seconds...
Polling thread
Current status: requires_action
Actions:
[
  {
    id: 'call_8JX5ffKFpxIhYmJeZYYilpv3',
    type: 'function',
    function: {
      name: 'addTask',
      arguments: '{"name": "Buy bread from the store", "priority": 2}'
    }
  },
  {
    id: 'call_GC4axxSB6Oso0tiolDLr900X',
    type: 'function',
    function: {
      name: 'addTask',
      arguments: '{"name": "Go to the gym", "priority": 2}'
    }
  },
  {
    id: 'call_7c5mWt1I5Ff3h5Lvb0Hfw2L7',
    type: 'function',
    function: {
      name: 'addTask',
      arguments: '{"name": "Do taxes", "priority": 1}'
    }
  }
]
Adding task
Task cloyl2gxs0000c3a7hxe6hupc created with name Buy bread from the store and priority 2.
Adding task
Task cloyl2gxv0001c3a7zi4hqt8z created with name Go to the gym and priority 2.
Adding task
Task cloyl2gxx0002c3a7l0gv7f07 created with name Do taxes and priority 1.

You can see all of the steps it takes in the console output. First the Assistant and the Thread are created, then we check our sqlite database for existing Tasks (if there are any, we send them along as input too). Next we send those along with the user's message and get back OpenAI's function invocations (3 in this case). Finally, we iterate over them and call our internal addTask function for each one, and at the bottom of the output we see that our tasks were created successfully.
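The "execute the actions" step boils down to a lookup table from function names to real functions. Here's a minimal sketch, assuming the tool-call shape shown in the console output above. The in-memory `handlers` map is a hypothetical stand-in for the real functions in API.ts, stubbed so the example runs on its own. Note that the arguments arrive as a JSON string and have to be parsed before use.

```typescript
// Tool-call shape as printed in the demo output above.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

type Handler = (args: any) => Promise<unknown>;

// Hypothetical in-memory stand-ins for API.ts; swap in the real imports.
const tasks: { id: string; name: string; priority?: number }[] = [];
const handlers: Record<string, Handler> = {
  addTask: async (args: { name: string; priority?: number }) => {
    const task = { id: `task_${tasks.length + 1}`, ...args };
    tasks.push(task);
    return task;
  },
};

// Execute every call the Assistant asked for, concurrently.
async function executeToolCalls(calls: ToolCall[]): Promise<{ id: string; result: unknown }[]> {
  return Promise.all(
    calls.map(async (call) => {
      const handler = handlers[call.function.name];
      if (!handler) throw new Error(`Unknown function ${call.function.name}`);
      // Arguments arrive as a JSON *string*, so parse them first.
      const args = JSON.parse(call.function.arguments);
      return { id: call.id, result: await handler(args) };
    }),
  );
}
```

Promise.all runs the calls concurrently, which matches how the Assistant batches them. If the calls depend on each other (say, creating a Task and then updating it), run them one at a time instead.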

Let's go call it again, updating the tasks that we just made:

ts-node demo.ts -m "I finished the laundry, please mark it complete. Also the gym is a P1"

Output:

Creating assistant...
Created assistant asst_WbTXKoXWL1yTWs4zvcVkDIDT with name Task Planner.
Created thread thread_mLvr7acahXbnmoe217f0gMRF
Retrieved 3 tasks from the database.
Created message
msg_iYYkAeuxRPNmJZ5vAKwiI8S7
Polling thread
Current status: queued
Trying again in 2 seconds...
Polling thread
Current status: in_progress
Trying again in 2 seconds...
Polling thread
Current status: in_progress
Trying again in 2 seconds...
Polling thread
Current status: requires_action
Actions:
[
  {
    id: 'call_W4UKGadROhaJJFZym7vQocP7',
    type: 'function',
    function: {
      name: 'completeTask',
      arguments: '{"id": "cloyl2gxs0000c3a7hxe6hupc"}'
    }
  },
  {
    id: 'call_KzaYk1x4sIRFWeKlvgOk37qf',
    type: 'function',
    function: {
      name: 'updateTask',
      arguments: '{"id": "cloyl2gxv0001c3a7zi4hqt8z", "updates": {"priority": 1}}'
    }
  }
]
Completing task
Task cloyl2gxs0000c3a7hxe6hupc marked as completed.
Updating task
Task cloyl2gxv0001c3a7zi4hqt8z updated with name Go to the gym and priority 1.

That's kinda amazing. All that any of this really does is assemble blobs of text and send them to the OpenAI API, which is able to figure it all out, even with the context of the data, and correctly call both create and update APIs that exist only internally within your system, without exposing anything to the internet at large.

Here it figured out the IDs of the Tasks to update (because I passed that data in with the prompt - it's tiny), which functions to call, and that they should be done in parallel. It isn't infallible though: there was no laundry Task, so it marked the bread Task complete instead. Still, it means your user can speak/type as much as they like, making a lot of demands in a single submission, and the Assistant will batch it all up into a set of functions that, from its perspective at least, it wants you to run in parallel.

After executing the functions you can send another request to tell the Assistant the outcome. This article is long enough already, but you can see how to close that loop in the OpenAI Function Calling docs.
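For reference, the loop is closed by sending back one output per tool call ID. Here's a small sketch of building that payload, assuming the `{ tool_call_id, output }` shape from the Assistants API docs, where `output` must be a string. `ExecutedCall` is a hypothetical type for whatever your dispatcher returns. It's worth double-checking the shape against the current API reference.

```typescript
// Hypothetical result type from your own dispatcher.
interface ExecutedCall { toolCallId: string; result?: unknown; error?: string; }

// Payload entry shape, as described in the Assistants API docs.
interface ToolOutput { tool_call_id: string; output: string; }

function toToolOutputs(executed: ExecutedCall[]): ToolOutput[] {
  return executed.map(({ toolCallId, result, error }) => ({
    tool_call_id: toolCallId,
    // `output` must be a string, so serialize the result. Failures are reported too,
    // so the Assistant knows a call didn't work instead of assuming it succeeded.
    output: JSON.stringify(error ? { error } : { result: result ?? null }),
  }));
}

const toolOutputs = toToolOutputs([
  { toolCallId: "call_W4UKGadROhaJJFZym7vQocP7", result: { completed: true } },
  { toolCallId: "call_KzaYk1x4sIRFWeKlvgOk37qf", error: "Task not found" },
]);
```

With the v4 openai Node SDK, that array was sent via `openai.beta.threads.runs.submitToolOutputs(threadId, runId, { tool_outputs: toolOutputs })`. Newer SDK versions have changed that signature, so check the version you're on.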

Closing Thoughts

This stuff is all very new, and there are some pros and cons here. While all looks rosy in the end, it did take a few iterations to get GPT to reliably and consistently output the JSON format expected in the translation stage. Occasionally it would innovate and restructure things a little, which broke things downstream. That's probably just something that time will take care of as this stuff gets polished up, both on OpenAI's end and on everyone else's, but it's something to be aware of.
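One cheap defence against that kind of drift is to validate the model's output before it overwrites the existing tools file. Here's a sketch, assuming the `{ "tools": [...] }` shape requested in the prompt. parseToolsJson is a hypothetical helper, not something in the repo, and it only checks the structure the Assistant needs rather than full JSON Schema validity.

```typescript
interface FunctionTool {
  type: "function";
  function: { name: string; description?: string; parameters?: Record<string, unknown> };
}

function parseToolsJson(text: string): FunctionTool[] {
  // JSON.parse throws on malformed output, which is exactly what we want here.
  const parsed: unknown = JSON.parse(text);
  const tools =
    typeof parsed === "object" && parsed !== null ? (parsed as { tools?: unknown }).tools : undefined;
  if (!Array.isArray(tools)) {
    throw new Error('Expected a top-level "tools" array');
  }
  tools.forEach((tool, i) => {
    const fn = tool?.function;
    if (tool?.type !== "function" || typeof fn?.name !== "string") {
      throw new Error(`tools[${i}] is not a { type: "function", function: { name } } entry`);
    }
    // Function parameters are a JSON Schema, and the API expects an object schema.
    if (fn.parameters !== undefined && fn.parameters?.type !== "object") {
      throw new Error(`tools[${i}] (${fn.name}) has parameters that aren't an object schema`);
    }
  });
  return tools as FunctionTool[];
}
```

In rebuildTools.ts it could sit just before the write, e.g. writing `JSON.stringify({ tools: parseToolsJson(newToolsJson) }, null, 2)` instead of the raw string. That way a malformed response fails loudly instead of silently replacing a working file.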

This technology requires a considered approach to testing too: GPT is a big old black box floating off in the internet somewhere, it's semi-magical, and it doesn't always give the right answer. Bit rot seems a serious risk here - both due to the newness of the tech and the fact that most of us don't really understand it very well. It seems sensible to mock/stub out expected responses from OpenAI's APIs to do unit testing, but when it comes to integration testing, you probably need your tests to do something like what our demo.ts does, and then verify the database was updated correctly at the end.

It can be the case that you make no changes to your code or environment but still get different outcomes due to the non-determinism of GPT. Amelioration for this could be in the form of temperature control and fine tuning, but you're probably going to need to be less than 100% trustful that your Assistant is doing what you think it is.

Finally, there's obviously a huge security consideration here. Fundamentally, we're taking user input (text, speech, images, whatever), and calling code on our own systems as a result. This always involves peril, and one can imagine all kinds of SQL injection-style attacks against Agent systems that inadvertently run malicious actions the developer didn't intend. For example, my API.ts contains a deleteAllTasks function that does exactly what you think it does. Because it's part of API.ts, the Assistant knows about it, and could inadvertently call it, whether the user was trying to do that or not.
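One mitigation is to flip the default: only expose functions that appear on an explicit allowlist, and check the same list again when executing calls, since you shouldn't assume the model only calls what it was shown. Here's a minimal sketch. The allowlist contents and both helper names are hypothetical.

```typescript
interface FunctionTool { type: "function"; function: { name: string } }

// Hypothetical allowlist: deleteAllTasks is deliberately absent.
const EXPOSED_FUNCTIONS = new Set(["addTask", "updateTask", "completeTask", "getTasks"]);

// Strip anything not on the allowlist before the tools are given to the Assistant.
function filterExposedTools(tools: FunctionTool[]): FunctionTool[] {
  return tools.filter((tool) => EXPOSED_FUNCTIONS.has(tool.function.name));
}

// Check again at execution time, in case the model asks for something it wasn't shown.
function assertCallable(name: string): void {
  if (!EXPOSED_FUNCTIONS.has(name)) {
    throw new Error(`Refusing to call non-exposed function "${name}"`);
  }
}
```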

It would be extremely easy to mix up public and private code in this way and accidentally expose it to the Assistant, so in reality you probably want a sanity-check to run each time the tools JSON has been rebuilt, telling you what changed. Seems a good thing to have in your CI/CD.
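Here's a sketch of that check, assuming both the old and regenerated files use the `{ "tools": [...] }` shape. diffToolNames is hypothetical. A CI job could fail the build whenever `added` is non-empty, so a human has to approve newly exposed functions.

```typescript
interface ToolsFile { tools: { function: { name: string } }[] }

// Report which function names appeared or disappeared between two tools files.
function diffToolNames(before: ToolsFile, after: ToolsFile): { added: string[]; removed: string[] } {
  const oldNames = new Set(before.tools.map((t) => t.function.name));
  const newNames = new Set(after.tools.map((t) => t.function.name));
  return {
    added: Array.from(newNames).filter((name) => !oldNames.has(name)).sort(),
    removed: Array.from(oldNames).filter((name) => !newNames.has(name)).sort(),
  };
}

// Example: the regenerated file suddenly exposes deleteAllTasks.
const report = diffToolNames(
  { tools: [{ function: { name: "addTask" } }] },
  { tools: [{ function: { name: "addTask" } }, { function: { name: "deleteAllTasks" } }] },
);
console.log(report);
```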

Distributed Tracing with Node JS

Tue, 13 Oct 2020 07:00:00 GMT

The microservice architecture pattern solves many of the problems inherent in monolithic applications. But microservices also bring challenges of their own, one of which is figuring out what went wrong when something breaks. There are at least 3 related challenges here:

Log collection
Metric collection
Distributed tracing

Log and metric collection is fairly straightforward (we'll cover these in a separate post), but only gets you so far.

Let's say your 20-microservice application starts behaving badly: you start getting timeouts on a particular API and want to find out why. The first place you look may be your centralized metrics service. This will likely confirm that you have a problem, as hopefully you have one or more metrics that are now showing abnormal numbers.

But what if the issue only affects part of your user population, or worse, a single (but important) customer? In these cases your metrics - assuming you have the right ones in the first place - probably won't tell you much.

In cases like these, where you have minimal or no guidance from your configured metrics, you start trying to figure out where the problem may be. You know your system architecture, and you're pretty sure you've narrowed the issue down to three or four of your services.

So what's next? Well, you've got your centrally aggregated service logs, right? So you open up three or four windows and try to find an example of a request that fails, and trace it through to the other 2-3 services in the mix. Of course, if your problem only manifests in production then you'll be sifting through a large number of logs.

How good are your logs anyway? You're in prod, so you've probably disabled debug logs, but even if you hadn't, logs usually only get you so far. After some digging, you might be able to narrow things down to a function or two, but you're likely not logging all the information you need to proceed from there. Time to start sifting through code...

But maybe there's a better way.

Enter Distributed Tracing

Distributed Tracing is a method of tracking a request as it traverses multiple services. Let's say you have a simple e-commerce app, which looks a little like this (simplified for clarity):

Now, your user has placed an order and wants to track its status. To do this, the user makes a request that hits your API Gateway, which needs to authenticate the request and then send it on to your Orders service. This fetches the Order details, then consults your Shipping service to discover the shipping status, which in turn calls an external API belonging to your shipping partner.

There are quite a few things that can go wrong here. Your Auth service could be down, your Orders service could be unable to reach its database, your Shipping service could be unable to access the external API, and so on. All you know, though, is that your customer is complaining that they can't access their Order details and they're getting aggravated.

We can solve this by tracing a request as it traverses your architecture, with each step surfacing details about what is going on and what (if anything) went wrong. We can then use the Jaeger UI to visualize the trace as it happened, allowing us to debug problems as well as identify bottlenecks.

An example distributed application

To demonstrate how this works I've created a distributed tracing example app on GitHub. The repo is pretty basic: a packages directory containing 4 extremely simple apps (gateway, auth, orders and shipping), corresponding to 4 of the services in our service architecture diagram.

The easiest way to play with this yourself is to simply clone the repo and start the services using docker-compose:

git clone git@github.com:edspencer/tracing-example.git
cd tracing-example
docker-compose up

This will spin up 5 docker containers - one for each of our 4 services plus Jaeger. Now go to http://localhost:5000/orders/12345 and hit refresh a few times. I've set the services up to sometimes work and sometimes cause errors - there's a 20% chance that the auth app will return an error and a 30% chance that the simulated call to the external shipping service API will fail.
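As a rough back-of-the-envelope check, assuming the two failures are independent, that the Shipping service is only reached when auth succeeds (the Gateway stops at its auth check otherwise), that a shipping failure fails the whole request, and that nothing else fails, you'd expect only a little over half of your refreshes to succeed:

```latex
P(\text{success}) = (1 - 0.2)(1 - 0.3) = 0.8 \times 0.7 = 0.56
```

In other words, under those assumptions roughly 44% of requests should fail somewhere along the way, which gives you plenty of failing traces to look at.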

After refreshing http://localhost:5000/orders/12345 a few times, open up the Jaeger UI at http://localhost:16686/search and you'll see something like this:

http://localhost:5000/orders/12345 serves up the Gateway service, which is a pretty simple one-file express app that will call the Auth service on every request, then make calls to the Orders service. The Orders service in turn calls the Shipping service, which makes a simulated call to the external shipping API.

Clicking into one of the traces will show you something like this:

This view shows you that the request took 44ms to complete, and gives a nice breakdown of where that time was spent. The services are color coded automatically so you can see at a glance how the 44ms was distributed across them. In this case we can see that there was an error in the shipping service. Clicking into the row with the error yields additional information useful for debugging:

The contents of this row are highly customizable. It's easy to tag the request with whatever information you like. So let's see how this works.

The Code

Let's look at the Gateway service. First we set up the Jaeger integration:

const express = require('express')
const superagent = require('superagent')
const opentracing = require('opentracing')
const {initTracer} = require('jaeger-client')

const port = process.env.PORT || 80
const authHost = process.env.AUTH_HOST || "auth"
const ordersHost = process.env.ORDERS_HOST || "orders"
const app = express()

//set up our tracer
const config = {
  serviceName: 'gateway',
  reporter: {
    logSpans: true,
    collectorEndpoint: 'http://jaeger:14268/api/traces',
  },
  sampler: {
    type: 'const',
    param: 1
  }
};

const options = {
  tags: {
    'gateway.version': '1.0.0'
  }
};

const tracer = initTracer(config, options);

The most interesting stuff here is where we declare our config. Here we're telling the Jaeger client tracer to post its traces to http://jaeger:14268/api/traces (this is set up in our docker-compose file), and to sample all requests - as specified in the sampler config. In production, you won't want to sample every request - one in a thousand is probably enough - so you can switch to type: 'probabilistic' and param: 0.001 to achieve this.
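One way to avoid editing the config by hand before each deploy is to pick the sampler based on the environment. This is a sketch under assumptions: the `'const'`/`'probabilistic'` types and their params come from the text above, but the `samplerConfig` helper and the `NODE_ENV` switch are hypothetical. The resulting `config` would then be passed to `initTracer(config, options)` exactly as before.

```javascript
// Hypothetical helper: choose the sampler settings described above based on
// the environment. Switching on NODE_ENV is an illustrative assumption.
function samplerConfig(env = process.env.NODE_ENV) {
  if (env === 'production') {
    // Sample roughly one request in a thousand
    return { type: 'probabilistic', param: 0.001 };
  }
  // Everywhere else, sample every request
  return { type: 'const', param: 1 };
}

const config = {
  serviceName: 'gateway',
  reporter: {
    logSpans: true,
    collectorEndpoint: 'http://jaeger:14268/api/traces',
  },
  sampler: samplerConfig(),
};

console.log(config.sampler);
```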

Now that we have our tracer, let's tell Express to instrument each request that it serves:

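//create a root span for every request
app.use((req, res, next) => {
  req.rootSpan = tracer.startSpan(req.originalUrl)
  //inject the root span into the incoming headers so later handlers
  //(like checkAuth below) can extract it and use it as their parent
  tracer.inject(req.rootSpan, opentracing.FORMAT_HTTP_HEADERS, req.headers)

  //finish (and report) the root span once Express has sent the response
  res.on("finish", () => {
    req.rootSpan.finish()
  })

  next()
})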

Here we're setting up our outer span and giving it a title matching the request url. We encounter 3 of the 4 simple concepts we need to understand:

startSpan - creates a new "span" in our distributed trace; this corresponds to one of the rows we see in the Jaeger UI. This span is given a unique span ID and may have a parent span ID
inject - adds the span ID somewhere else - usually into HTTP headers for a downstream request - we'll see more of this in a moment
finishing the span - we hook into Express' "finish" event on the response to make sure we call .finish() on the span. This is what sends it to Jaeger.
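To make these concepts a little more concrete, here's a toy in-memory model of spans. It is not the jaeger-client or opentracing API: `ToyTracer`, `ToySpan`, the `report` callback and the `traceId`/`spanId` carrier keys are all made up for illustration, and a real client also handles sampling, batching and sending spans to Jaeger.

```javascript
const crypto = require('crypto');

const newId = () => crypto.randomBytes(8).toString('hex');

// A toy span: just IDs, timing and tags
class ToySpan {
  constructor(tracer, operationName, parent) {
    this.tracer = tracer;
    this.operationName = operationName;
    this.traceId = parent ? parent.traceId : newId(); // children share the trace ID
    this.spanId = newId();                             // unique for every span
    this.parentId = parent ? parent.spanId : null;     // links the rows together
    this.tags = {};
    this.startTime = Date.now();
  }

  setTag(key, value) {
    this.tags[key] = value;
    return this;
  }

  // finish: record the duration and hand the span to the reporter
  finish() {
    this.duration = Date.now() - this.startTime;
    this.tracer.report(this);
  }
}

class ToyTracer {
  constructor(report) {
    this.report = report; // stands in for "send it to Jaeger"
  }

  // startSpan: a new span, optionally the child of an existing one
  startSpan(operationName, { childOf } = {}) {
    return new ToySpan(this, operationName, childOf);
  }

  // inject: copy the span's identity into a carrier such as outgoing headers
  inject(span, carrier) {
    carrier.traceId = span.traceId;
    carrier.spanId = span.spanId;
  }
}

const finished = [];
const tracer = new ToyTracer((span) => finished.push(span));

const root = tracer.startSpan('/orders/12345');
const headers = {};
tracer.inject(root, headers);

// A downstream span created from the injected headers
const child = tracer.startSpan('check auth', { childOf: headers });
child.finish();
root.finish();

console.log(finished.map((s) => `${s.operationName} parent=${s.parentId}`));
```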

Now let's see how we call the Auth service, passing along the span ID:

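//use the auth service to see if the request is authenticated
const checkAuth = async (req, res, next) => {
  //the root span was injected into req.headers by the middleware above
  const span = tracer.startSpan("check auth", {
    childOf: tracer.extract(opentracing.FORMAT_HTTP_HEADERS, req.headers)
  })

  try {
    //pass the new span along to the Auth service in the request headers
    const headers = {}
    tracer.inject(span, opentracing.FORMAT_HTTP_HEADERS, headers)
    //named authRes so it doesn't shadow the Express res we reply with
    const authRes = await superagent.get(`http://${authHost}/auth`).set(headers)

    if (authRes && authRes.body.valid) {
      span.setTag(opentracing.Tags.HTTP_STATUS_CODE, 200)
      next()
    } else {
      span.setTag(opentracing.Tags.HTTP_STATUS_CODE, 401)
      res.status(401).send("Unauthorized")
    }
  } catch(e) {
    //superagent rejects on network failures and 4xx/5xx responses
    span.setTag(opentracing.Tags.ERROR, true)
    span.setTag(opentracing.Tags.HTTP_STATUS_CODE, 503)
    res.status(503).send("Auth Service gave an error")
  }

  span.finish()
}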

There are 2 important things happening here:

We create a new span representing the "check auth" operation, and set it to be the childOf the parent span we created previously
When we send the superagent request to the Auth service, we inject the new child span into the HTTP request headers

We're also showing how to add tags to a span via setTag. In this case we're appending the HTTP status code that we return to the client.
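If you end up tagging lots of spans this way, a small helper keeps the tags consistent. `statusTags` and `applyTags` are hypothetical; the keys `http.status_code` and `error` are the standard OpenTracing tag names that `opentracing.Tags.HTTP_STATUS_CODE` and `opentracing.Tags.ERROR` refer to. Treating only 5xx responses as span errors is an assumption, so adjust to taste.

```javascript
// Hypothetical helper: build a consistent set of tags for an HTTP outcome.
function statusTags(statusCode) {
  const tags = { 'http.status_code': statusCode };
  // Assumption: only server-side failures (5xx) mark the span as an error
  if (statusCode >= 500) {
    tags.error = true;
  }
  return tags;
}

// Apply every tag to any span-like object with a setTag(key, value) method
function applyTags(span, tags) {
  for (const [key, value] of Object.entries(tags)) {
    span.setTag(key, value);
  }
}

// Stand-in span that just records its tags, so the example runs on its own
const recorded = {};
const fakeSpan = { setTag: (key, value) => { recorded[key] = value; } };

applyTags(fakeSpan, statusTags(401));
console.log(recorded); // { 'http.status_code': 401 }
```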

Let's examine the final piece of the Gateway service - the actual proxying to the Orders service:

//proxy to the Orders service to return Order details
app.all('/orders/:orderId', checkAuth, async (req, res) => {
  const span = tracer.startSpan("get order details", {
    childOf: tracer.extract(opentracing.FORMAT_HTTP_HEADERS, req.headers)
  })

  try {
    //pass the new span along to the Orders service in the request headers
    const headers = {}
    tracer.inject(span, opentracing.FORMAT_HTTP_HEADERS, headers)
    const order = await superagent.get(`http://${ordersHost}/order/${req.params.orderId}`).set(headers)

    if (order && order.body) {
      span.setTag(opentracing.Tags.HTTP_STATUS_CODE, 200)
      res.json(order.body)
    } else {
      span.setTag(opentracing.Tags.ERROR, true)
      span.setTag(opentracing.Tags.HTTP_STATUS_CODE, 500)
      res.status(500).send("Could not fetch order")
    }
  } catch(e) {
    span.setTag(opentracing.Tags.ERROR, true)
    span.setTag(opentracing.Tags.HTTP_STATUS_CODE, 503)
    res.status(503).send("Error contacting Orders service")
  } finally {
    //finish the span on every path, including errors
    span.finish()
  }
})

app.listen(port, () => console.log(`API Gateway app listening on port ${port}`))

This looks pretty similar to what we just did for the Auth service - we're creating a new span that represents the call to the Orders service, setting its parent to our outer span, and injecting it into the superagent call we make to Orders. Pretty simple stuff.
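Both handlers repeat the same steps, and it's easy to forget to finish a span on an error path. One way to factor that out is a wrapper that always tags failures and finishes the span. `withChildSpan` is a hypothetical helper, shown here against a minimal stand-in tracer so it runs on its own; with jaeger-client you'd call something like `withChildSpan(tracer, opentracing.FORMAT_HTTP_HEADERS, "get order details", req.headers, (headers) => superagent.get(url).set(headers))`.

```javascript
// Hypothetical helper: run `work` inside a child span of the incoming request,
// inject that span into fresh outgoing headers, and always finish it.
async function withChildSpan(tracer, format, name, incomingHeaders, work) {
  const span = tracer.startSpan(name, {
    childOf: tracer.extract(format, incomingHeaders),
  });
  const headers = {};
  tracer.inject(span, format, headers);

  try {
    return await work(headers, span);
  } catch (err) {
    span.setTag('error', true); // same key as opentracing.Tags.ERROR
    throw err; // let the route handler decide what to send back
  } finally {
    span.finish(); // runs on success and on failure
  }
}

// Minimal stand-in tracer so this runs without jaeger-client installed
const events = [];
const stubTracer = {
  startSpan: (name) => ({
    setTag: (key, value) => events.push(`${name}: ${key}=${value}`),
    finish: () => events.push(`${name}: finished`),
  }),
  extract: () => null,
  inject: (span, format, headers) => {
    headers['x-stub-trace'] = 'stub';
  },
};

async function main() {
  // Success path: in the gateway, work would call superagent with .set(headers)
  await withChildSpan(stubTracer, 'http_headers', 'get order details', {}, async (headers) => headers);

  // Failure path: the span is tagged and finished, then the error propagates
  try {
    await withChildSpan(stubTracer, 'http_headers', 'check auth', {}, async () => {
      throw new Error('Auth Service gave an error');
    });
  } catch (err) {
    // e.g. res.status(503).send(err.message) in the Express handler
  }
}

const done = main().then(() => console.log(events));
```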

Finally, let's look at the other side of this - how to pick up the trace in another service - in this case the Auth service:

//simulate our auth service being flaky with a 20% chance of 500 internal server error
app.get('/auth', (req, res) => {
  const parentSpan = tracer.extract(opentracing.FORMAT_HTTP_HEADERS, req.headers)
  const span = tracer.startSpan("checking user", {
    childOf: parentSpan, tags: {
      [opentracing.Tags.COMPONENT]: "database" 
    }
  })

  if (Math.random() > 0.2) {
    span.finish()
    res.json({valid: true, userId: 123})
  } else {
    span.setTag(opentracing.Tags.ERROR, true) 
    span.finish() 
    res.status(500).send("Internal Auth Service error") 
  }
})

Here we see the 4th and final concept involved in distributed tracing:

extract - reads the upstream service's span context (including the trace ID and the upstream span ID) out of the incoming HTTP headers

This is how the trace is able to traverse our services - in service A we create a span and inject it into calls to service B. Service B picks it up and creates a new span with the extracted span as its parent. We can then pass this span ID on to service C.
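On the wire, Jaeger clients propagate the span context in an `uber-trace-id` header of the form `{trace-id}:{span-id}:{parent-span-id}:{flags}` (the Node client may URL-encode the colons, which is why the sketch decodes the value). The code below hand-rolls inject and extract for that format to show the A to B to C chain. It's illustrative only: the real client also handles baggage, debug IDs and validation, and `injectContext`, `extractContext` and `startSpan` here are hypothetical helpers, not library functions.

```javascript
const crypto = require('crypto');

const HEADER = 'uber-trace-id';
const newId = () => crypto.randomBytes(8).toString('hex');

// inject: write a span context into outgoing HTTP headers
function injectContext(ctx, headers) {
  headers[HEADER] = `${ctx.traceId}:${ctx.spanId}:${ctx.parentId || '0'}:${ctx.flags}`;
}

// extract: read the upstream span context from incoming headers (or null)
function extractContext(headers) {
  const value = headers[HEADER];
  if (!value) return null;
  const [traceId, spanId, parentId, flags] = decodeURIComponent(value).split(':');
  return { traceId, spanId, parentId, flags: Number(flags) };
}

// A child keeps the trace ID and records its parent's span ID
function startSpan(parent) {
  return parent
    ? { traceId: parent.traceId, spanId: newId(), parentId: parent.spanId, flags: parent.flags }
    : { traceId: newId(), spanId: newId(), parentId: null, flags: 1 }; // 1 = sampled
}

// Service A starts the trace and calls service B
const spanA = startSpan(null);
const requestToB = {};
injectContext(spanA, requestToB);

// Service B extracts A's context, starts a child span and calls service C
const spanB = startSpan(extractContext(requestToB));
const requestToC = {};
injectContext(spanB, requestToC);

// Service C does the same
const spanC = startSpan(extractContext(requestToC));

console.log(requestToB[HEADER]);
console.log([spanA, spanB, spanC].map((s) => `${s.spanId} <- ${s.parentId}`));
```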

Jaeger is even nice enough to automatically create a system architecture diagram for you:

Conclusion

Distributed tracing is immensely powerful when it comes to understanding why distributed systems behave the way they do. There is a lot more to distributed tracing than we covered above, but at its core it really comes down to those 4 key concepts: starting spans, finishing them, injecting them into downstream requests and extracting them from the upstream.

One nice attribute of open tracing standards is that they work across technologies. In this example we saw how to hook up 4 Node JS microservices with it, but there's nothing special about Node JS here - this stuff is well supported in other languages like Go and can be added pretty much anywhere - it's just basic UDP and (usually) HTTP.

For further reading I recommend you check out the Jaeger intro docs, as well as the architecture. The Node JS Jaeger client repo is a good place to poke around, and has links to more resources. Actual example code for Node JS was a little hard to come by, which is why I wrote this post. I hope it helps you in your microservice applications.

A New Stack for 2016: Getting Started with React, ES6 and Webpack

Sun, 20 Mar 2016 05:16:20 GMT

A lot has changed in the last few years when it comes to implementing applications using JavaScript. Node JS has revolutionized how many of us create backend apps, React has become a widely-used standard for creating the frontend, and ES6 has come along and completely transformed JavaScript itself, largely for the better.

All of this brings new capabilities and opportunities, but also new challenges when it comes to figuring out what's worth paying attention to, and how to learn it. Today we'll look at how to set up my personal take on a sensible stack in this new world, starting from scratch and building it up as we go. We'll focus on getting to the point where everything is set up and ready for you to create the app.

The stack we'll be setting up today is as follows:

React - to power the frontend
Babel - allows us to use ES6 syntax in our app
Webpack - builds our application files and dependencies into a single build

Although we won't be setting up a Node JS server in this article, we'll use npm to put everything else in place, so adding a Node JS server using Express or any other backend framework is trivial. We're also going to omit setting up a testing infrastructure in this post - this will be the subject of the next article.

If you want to get straight in without reading all the verbiage, you can clone this github repo that contains all of the files we're about to create.

Let's go

The only prerequisite here is that your system has Node JS already installed. If that isn't the case, go install it now from http://nodejs.org. Once you have Node, we'll start by creating a new directory for our project and setting up NPM:

mkdir myproject
cd myproject
npm init

The npm init command takes you through a short series of prompts asking for information about your new project - author name, description, etc. Most of this doesn't really matter at this stage - you can easily change it later. Once that's done you'll find a new file called package.json in your project directory.

Before we take a look at this file, we already know that we need to bring in some dependencies, so we'll do that now with the following terminal commands:

npm install react --save
npm install react-dom --save
npm install webpack --save-dev

Note that for the react dependency we use --save, whereas for webpack we use --save-dev. This indicates that react is required when running our app in production, whereas webpack is only needed while developing (as once webpack has created your production build, its role is finished). Opening our package.json file now yields this:

{
  "name": "myproject",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "react": "^0.14.7",
    "react-dom": "^0.14.7"
  },
  "devDependencies": {
    "webpack": "^1.12.14"
  }
}

This is pretty straightforward. Note the separate dependencies and devDependencies objects in line with our --save vs --save-dev above. Depending on when you created your app the version numbers for the dependencies will be different, but the overall shape should be the same.

We're not done installing npm packages yet, but before we get started with React and ES6 we're going to get set up with Webpack.

Setting up Webpack

We'll be using Webpack to turn our many application files into a single file that can be loaded into the browser. As it stands, though, we don't have any application files at all. So let's start by creating those:

mkdir src
touch src/index.js
touch src/App.js

Now we have a src directory with two empty files. Into App.js, we'll place the following trivial component rendering code:

var App = function() {
  return "<h1>Woop</h1>";
};

module.exports = App;

All we're doing here is returning an HTML string when you call the App function. Once we bring React into the picture we'll change the approach a little, but this is good enough for now. Into our src/index.js, we'll use:

var app = require('./App');
document.write(app());

So we're simply importing our App, running it and then writing the resulting HTML string into the DOM. Webpack will be responsible for figuring out how to combine index.js and App.js and building them into a single file. In order to use Webpack, we'll create a new file called webpack.config.js (in the root directory of our project) with the following contents:

var path = require('path');
var webpack = require('webpack');

module.exports = {
  output: {
    filename: 'bundle.js'
  },
  entry: [
    './src/index.js'
  ]
};

This really couldn't be much simpler - it's just saying take the entry point (our src/index.js file) as input, and save the output into a file called bundle.js. Webpack takes those entry file inputs, figures out all of the require('...') statements and fetches all of the dependencies as required, outputting our bundle.js file.
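To get a feel for what that means, here's a toy version of the dependency-walking step: start at the entry file, find its `require('...')` calls, and repeat for each dependency. It works on an in-memory map of file contents with a naive regular expression and made-up resolution rules, so it only illustrates the idea; it is not how Webpack actually parses or resolves modules.

```javascript
// Toy illustration of walking a module graph from an entry point.
// File contents are held in memory, and only './relative' requires are handled.
const files = {
  './src/index.js': "var app = require('./App');\ndocument.write(app());",
  './src/App.js': "var App = function() { return '<h1>Woop</h1>'; };\nmodule.exports = App;",
};

// Naive pattern: require('./something') or require("./something")
const REQUIRE_RE = /require\(['"](\.\/[^'"]+)['"]\)/g;

// Toy resolution: './App' from './src/index.js' becomes './src/App.js'
function resolve(fromFile, request) {
  const dir = fromFile.slice(0, fromFile.lastIndexOf('/'));
  const resolved = `${dir}/${request.slice(2)}`;
  return resolved.endsWith('.js') ? resolved : `${resolved}.js`;
}

// Depth-first walk; dependencies are listed before the modules that need them
function collectModules(entry) {
  const seen = new Set();
  const order = [];

  const visit = (file) => {
    if (seen.has(file)) return; // each module is included only once
    if (!(file in files)) throw new Error(`Cannot find module ${file}`);
    seen.add(file);
    for (const match of files[file].matchAll(REQUIRE_RE)) {
      visit(resolve(file, match[1]));
    }
    order.push(file);
  };

  visit(entry);
  return order;
}

console.log(collectModules('./src/index.js')); // [ './src/App.js', './src/index.js' ]
```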

To run Webpack, we simply use the webpack command in our terminal, which will do something like this:

As we can see, we now have a 1.75kb file called bundle.js that we can serve up in our project. That's a little heavier than our index.js and App.js files combined, because there is a little Webpack plumbing that gets included into the file too.

Now finally we'll create a very simple index.html file that loads our bundle.js and renders our app:

<html>
  <head>
    <meta charset="utf-8">
  </head>
  <body>
    <div id="main"></div>
    <script type="text/javascript" src="bundle.js" charset="utf-8"></script>
  </body>
</html>

Can't get much simpler than that. We don't have a web server set up yet, but we don't actually need one. As we have no backend we can just load the index.html file directly into the browser, either by dragging it in from your OS's file explorer program, or entering the address manually. For me, I can enter file:///Users/ed/Code/myproject/index.html into my browser's address bar, and be greeted with the following:

Great! That's our component being rendered and output into the DOM as desired. Now we're ready to move onto using React and ES6.

React and ES6

React can be used either with or without ES6. This is the future, so we want to use the capabilities of ES6, but we can't do that directly because most browsers don't yet support it. This is where Babel comes in.

Babel (which you'll often hear pronounced "babble" instead of the traditional "baybel") is a transpiler, which takes one version of the JavaScript language and translates it into another. In our case, it will be translating the ES6 version of JavaScript into an earlier version that is guaranteed to run in browsers. We'll start by adding a few new npm package dependencies:

npm install babel-core --save-dev
npm install babel-loader --save-dev
npm install babel-preset-es2015 --save-dev
npm install babel-preset-react --save-dev
npm install babel-plugin-transform-runtime --save-dev

npm install babel-polyfill --save
npm install babel-runtime --save

This is quite a substantial number of new dependencies. Because Babel can convert between many different flavors of JS, once we've installed the babel-core and babel-loader packages we also need babel-preset-es2015 to enable ES6 support, and babel-preset-react to enable React's JSX syntax. The transform-runtime plugin (together with babel-runtime) makes the compiled code import Babel's helper functions from babel-runtime instead of copying them into every file. Finally, we bring in babel-polyfill, which makes newer APIs like Object.assign available. Babel doesn't provide these by default because doing so means modifying the browser's built-in objects, which is something you have to opt in to.

Once we have these all installed, however, we're ready to go. The first thing we'll need to do is update our webpack.config.js file to enable babel support:

var path = require('path');
var webpack = require('webpack');

module.exports = {
  module: {
    loaders: [
      {
        loader: "babel-loader",
        // Skip any files outside of your project's `src` directory
        include: [
          path.resolve(__dirname, "src"),
        ],
        // Only run `.js` and `.jsx` files through Babel
        test: /\.jsx?$/,
        // Options to configure babel with
        query: {
          plugins: ['transform-runtime'],
          presets: ['es2015', 'react'],
        }
      }
    ]
  },
  output: {
    filename: 'bundle.js'
  },
  entry: [
    './src/index.js'
  ]
};

Hopefully the above is clear enough - it's the same as last time, with the exception of the new module object, which contains a loader configuration that we've configured to convert any file that ends in .js or .jsx in our src directory into browser-executable JavaScript.
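A quick way to sanity-check the `test` and `include` settings is to apply the same two rules by hand. `usesBabel` is a hypothetical helper that mirrors that logic for illustration, assuming it runs from the project root; it is not how Webpack evaluates rules internally, and the file names are made-up examples.

```javascript
const path = require('path');

// Assumption: run from the project root, where webpack.config.js lives
const srcDir = path.resolve(process.cwd(), 'src');
const babelTest = /\.jsx?$/; // same regular expression as the loader's `test`

// Hypothetical mirror of the rule: inside src/ AND ending in .js or .jsx
function usesBabel(file) {
  const absolute = path.resolve(process.cwd(), file);
  const insideSrc = absolute.startsWith(srcDir + path.sep);
  return insideSrc && babelTest.test(absolute);
}

console.log(usesBabel('src/App.js'));                  // true
console.log(usesBabel('src/components/Nav.jsx'));      // true
console.log(usesBabel('src/styles.css'));              // false
console.log(usesBabel('node_modules/react/react.js')); // false
```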

Next we'll update our App.js to look like this:

import React, {Component} from 'react';

class App extends Component {
  render() {
    return (<h1>This is React!</h1>);
  }
}
export default App;

Cool - new syntax! We've switched from require('') to import, which does essentially the same thing here. We've also switched from module.exports = to export default, which again does the same job (though ES6 modules also let us export multiple named values from a single file).

We're also using the ES6 class syntax, in this case creating a class called App that extends React's Component class. It implements only a single method, render, which returns markup very similar to our earlier component, but this time written as inline JSX rather than returned as a string. Babel's react preset compiles that JSX into React.createElement calls.

Now all that remains is to update our index.js file to use the new Component:

import React from 'react';
import ReactDOM from 'react-dom';
import App from './App';

ReactDOM.render(<App />, document.getElementById("main"));

Again we're using the import syntax to our advantage here, and this time we're using ReactDOM.render instead of document.write to place the rendered HTML into the DOM. Once we run the webpack command again and refresh our browser window, we'll see a screen like this:

Next Steps

We'll round out by doing a few small things to improve our workflow. First off, it's annoying to have to switch back to the terminal to run webpack every time we change any code, so let's update our webpack.config.js with a few new options:

module.exports = {
  //these remain unchanged
  module: {...},
  output: {...},
  entry: [...],

  //these are new
  watch: true,
  colors: true,
  progress: true
};

Now we just run webpack once and it'll stay running, rebuilding whenever we save changes to our source files. This is generally much faster - on my 2 year old MacBook Air it takes about 5 seconds to run webpack a single time, but when using watch mode each successive build is on the order of 100ms. Usually this means that I can save my change in my text editor, and by the time I've switched to the browser the new bundle.js has already been created so I can immediately refresh to see the results of my changes.

The last thing we'll do is add a second React component to be consumed by the first. This one we'll call src/Paragraph.js, and it contains the following:

import React, {Component} from 'react';

export default class Paragraph extends Component {
  render() {
    return (<p>{this.props.text}</p>);
  }
}

This is almost identical to our App, with a couple of small tweaks. First, notice that we've moved the export default inline with the class declaration to save space, and second, this time we're using {this.props.text} to read a property passed in to the Paragraph component. Now, to use the new component we'll update App.js to look like the following:

import React, {Component} from 'react';
import Paragraph from './Paragraph';

export default class App extends Component {
  render() {
    return (
      <div className="my-app">
        <h1>This is React!!!</h1>
        <Paragraph text="First Paragraph" />
        <Paragraph text="Second Paragraph" />
      </div>
    );
  }
}

Again a few small changes here. First, note that we're now importing the Paragraph component and then using it twice in our render() function, each time with a different text property, which is what {this.props.text} reads in the Paragraph component itself. Finally, React requires that each Component's render() returns a single root element, so we wrap our <h1> and <Paragraph> tags in an enclosing <div>.
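It can help to see roughly what Babel's react preset turns this JSX into: nested `React.createElement(type, props, ...children)` calls, one per element. Since `render()` returns a single value, the top level has to be one element, which is why the `<div>` wrapper is needed. The sketch uses a tiny hypothetical `createElement` and `renderToString` (not React's implementations) so it runs without React, and writes App and Paragraph as plain functions rather than classes for brevity.

```javascript
// Toy stand-ins for React.createElement and rendering; not React's code.
function createElement(type, props, ...children) {
  return { type, props: props || {}, children };
}

function renderToString(node) {
  if (typeof node === 'string') return node;
  // Components are functions that receive props and return more elements
  if (typeof node.type === 'function') {
    return renderToString(node.type(node.props));
  }
  const attrs = node.props.className ? ` class="${node.props.className}"` : '';
  const inner = node.children.map(renderToString).join('');
  return `<${node.type}${attrs}>${inner}</${node.type}>`;
}

// Roughly what <p>{this.props.text}</p> compiles to
const Paragraph = (props) => createElement('p', null, props.text);

// Roughly what App's JSX compiles to: one root <div> holding three children
const App = () =>
  createElement(
    'div',
    { className: 'my-app' },
    createElement('h1', null, 'This is React!!!'),
    createElement(Paragraph, { text: 'First Paragraph' }),
    createElement(Paragraph, { text: 'Second Paragraph' })
  );

console.log(renderToString(createElement(App, null)));
```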

By the time you hit save on those changes, webpack should already have built a new bundle.js for you, so head back to your browser, hit refresh and you'll see this:

That's about as far as we'll take things today. The purpose of this article was to get you to a point where you can start building a React application, instead of figuring out how to set up all the prerequisite plumbing; hopefully it's clear enough how to continue from here.

You can find a starter repository containing all of the above over on GitHub. Feel free to clone it as the starting point for your own project, or just look through it to see how things fit together.

In the next article, we'll look at how to add some unit testing to our project so that we can make sure our Components are behaving as they should. Until then, happy Reacting!

Jasmine and Jenkins Continuous Integration

Sun, 28 Jul 2013 10:25:02 GMT

I use Jasmine as my JavaScript unit/behavior testing framework of choice because it's elegant and has a good community ecosystem around it. I recently wrote up how to get Jasmine-based autotesting set up with Guard, which is great for development time testing, but what about continuous integration?

Well, it turns out that it's pretty difficult to get Jasmine integrated with Jenkins. This is not because of an inherent problem with either of those two, it's just that no-one got around to writing an open source integration layer until now.

The main problem is that Jasmine tests usually expect to run in a browser, but Jenkins needs results to be exposed in .xml files. Clearly we need some bridge here to take the headless browser output and dump it into correctly formatted .xml files. Specifically, these xml files need to follow the JUnit XML file format for Jenkins to be able to process them. Enter guard-jasmine.

guard-jasmine

In my previous article on getting Jasmine and Guard set up, I was using the jasmine-headless-webkit and guard-jasmine-headless-webkit gems to provide the glue. Since then I've replaced those 2 gems with a single gem - guard-jasmine, written by Michael Kessler, the Guard master himself. This simplifies our dependencies a little, but doesn't buy us the .xml file functionality we need.

For that, I had to hack on the gem itself (which involved writing coffeescript for the first time, which was not a horrible experience). The guard-jasmine gem now exposes 3 additional configurations:

junit - set to true to save output to xml files (false by default)
junit_consolidate - rolls nested describes up into their parent describe blocks (true by default)
junit_save_path - optional path to save the xml files to

The JUnit XML reporter itself borrows heavily from larrymyers' excellent jasmine-reporters project. Aside from a few changes to integrate it into guard-jasmine it's the same code, so all credit goes to Larry and Michael.
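For reference, the JUnit XML that Jenkins reads is fairly small. The hypothetical `toJUnitXml` below (not code from guard-jasmine or jasmine-reporters) shows the general shape: a `testsuite` element with counts, one `testcase` per spec, and a `failure` element inside each failing one. The input shape and the example spec names are assumptions made for illustration.

```javascript
// Escape the characters that are special in XML attribute values and text
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// suite: { name, specs: [{ name, passed, message?, seconds }] } (assumed shape)
function toJUnitXml(suite) {
  const failures = suite.specs.filter((spec) => !spec.passed).length;
  const cases = suite.specs.map((spec) => {
    const open = `  <testcase classname="${escapeXml(suite.name)}" name="${escapeXml(spec.name)}" time="${spec.seconds}"`;
    if (spec.passed) return `${open}/>`;
    return `${open}>\n    <failure message="${escapeXml(spec.message)}"/>\n  </testcase>`;
  });

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<testsuite name="${escapeXml(suite.name)}" tests="${suite.specs.length}" failures="${failures}">`,
    ...cases,
    '</testsuite>',
  ].join('\n');
}

const exampleSuite = {
  name: 'Cart',
  specs: [
    { name: 'adds an item', passed: true, seconds: 0.01 },
    { name: 'applies a discount', passed: false, message: 'Expected 90 to equal 85', seconds: 0.02 },
  ],
};

console.log(toJUnitXml(exampleSuite));
```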

Sample usage:

In your Guardfile:

guard :jasmine, :junit => true, :junit_save_path => 'reports' do
  watch(%r{^spec/javascripts/.+$}) { 'spec/javascripts' }
  watch(%r{^spec/javascripts/fixtures/.+$}) { 'spec/javascripts' }
  watch(%r{^app/assets/javascripts/(.+?)\.(js\.coffee|js|coffee)(?:\.\w+)*$}) { 'spec/javascripts' }
end

This will just run the full set of Jasmine tests inside your spec/javascripts directory whenever any spec, fixture or JavaScript/CoffeeScript source file changes. This is generally the configuration I use because the tests execute so fast I can afford to have them all run every time.
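The third `watch` pattern is the least obvious one. The sketch below ports it to a JavaScript regular expression (the syntax is compatible here apart from escaping the slashes) to show which paths trigger a run; it's only a way to check the pattern, since Guard itself evaluates the Ruby version, and the file paths are made-up examples.

```javascript
// JavaScript port of the Guardfile's asset pattern
const assetPattern = /^app\/assets\/javascripts\/(.+?)\.(js\.coffee|js|coffee)(?:\.\w+)*$/;

const paths = [
  'app/assets/javascripts/cart.js',
  'app/assets/javascripts/models/cart.js.coffee',
  'app/assets/javascripts/cart.coffee.erb',
  'app/assets/stylesheets/cart.css',
];

// Print which paths would trigger the full spec run
for (const file of paths) {
  console.log(`${assetPattern.test(file) ? 'runs specs' : 'ignored   '}  ${file}`);
}
```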

In the example above we set the :junit_save_path to 'reports', which means it will save all of the .xml files into the reports directory. It is going to output 1 .xml file for each Jasmine spec file that is run. In each case the name of the .xml file created is based on the name of the top-level describe block in your spec file.

To test that everything's working, just run bundle exec guard as you normally would, and check to see that your reports folder now contains a bunch of .xml files. If it does, everything went well.

Jenkins Settings

Once we've got the .xml files outputting correctly, we just need to tell Jenkins where to look. In your Jenkins project configuration screen, click the Add post-build action button and choose "Publish JUnit test result report". Enter 'reports/*.xml' in the Test report XMLs field.

If you've already got Jenkins running your test script then you're all done. Next time a build is triggered the script should run the tests and export the .xml files. If you don't already have Jenkins set up to run your tests, but you did already set up Guard as per my previous article, you can actually use the same command to run the tests on Jenkins.

After a little experimentation, people tend to come up with a build command like this:

bash -c ' bundle install --quiet \
&& bundle exec guard '

If you're using rvm and need to guarantee a particular version you may need to prepend an rvm install command before bundle install is called. This should just run guard, which will dump the files out as expected for Jenkins to pick up.

To clean up, we'll just add a second post-build action, this time choosing the "Execute a set of scripts" option and entering the following:

kill -9 `cat guard.pid`

This just kills the Guard process, which ordinarily stays running to power your autotest capabilities. Once you run a new build you should see a chart automatically appear on your Jenkins project page telling you full details of how many tests failed over time and in the current build.

Getting it

Update: The Pull Request is now merged into the main guard-jasmine repo so you can just use gem 'guard-jasmine' in your Gemfile

This is hot off the presses but I wanted to write it up while it's still fresh in my mind. At the time of writing the pull request is still outstanding on the guard-jasmine repository, so to use the new options you'll need to temporarily use my guard-jasmine fork. In your Gemfile:

gem 'guard-jasmine'

Once the PR is merged and a new version issued you should switch back to the official release channel. It's working well for me but it's fresh code so it may contain bugs - YMMV. Hopefully this helps save some folks a little pain!

Sencha Con 2013 Wrapup

Sun, 21 Jul 2013 05:02:10 GMT

So another great Sencha Con is over, and I'm left to reflect on everything that went on over the last few days. This time was easily the biggest and best Sencha Con that I've been to, with 800 people in attendance and a very high bar set by the speakers. The organization was excellent, the location fun (even if the bars don't open until 5pm...), and the enthusiasm palpable.

I've made a few posts over the last few days so won't repeat the content here - if you want to see what else happened check these out too:

What I will do though is repeat my invitation to take a look at what we're doing with JavaScript at C3 Energy. I wrote up a quick post about it yesterday and would love to hear from you - whether you're at Sencha Con or not.

Now on to some general thoughts.

Content

There was a large range in the technical difficulty of the content, with perhaps a slightly stronger skew up the difficulty chain compared to previous events. This is a good thing, though there's probably still room for more advanced content. Having been there before though, I know how hard it is to pitch that right so that everyone enjoys it and gets value out of it.

The biggest challenge for me was the sheer number of tracks - at any one time there would be seven talks happening simultaneously, two or three of which I'd really want to watch. Personally I'd really love it if the hackathon was dropped in favor of a third day of sessions, with a shift down to 4-5 tracks. I'm sure there's a cost implication to that, but it's worth thinking about.

Videos

There were cameras set up in at least the main hall on the first day, but I didn't see any on day 2. I did overhear that the video streams were being recorded directly from what was being shown on the projectors, with the audio recorded separately. If that's true I'd guess it would make editing a bit easier, so maybe that will mean a quick release.

Naturally, take this with a pinch of salt until the official announcement comes out. In the meantime, there's at least one video available so far:

Fun Things

The community pavilion was a great idea, and served as the perfect space for attendees to hang out away from the other rascals running around the hotel. Coffee and snacks were available whenever I needed them, and there was plenty of seating to chill out in.

I missed out on the visit to the theme park, which I hear was by far the most fun part of the event. Having a theme park kick out everyone but Sencha Con attendees while serving copious amounts of alcohol seemed to go down very well with the attendees!

Sencha Con Attendees: I Need You

Fri, 19 Jul 2013 04:08:30 GMT

Love working with Sencha frameworks? Want to come work with me on the next generation? I moved on to C3 Energy about a year ago, where we are busily building the operating system for the largest machine ever conceived by humans - the Smart Grid.

The Smart Grid is an amazing concept that's being rolled out right now. C3 Energy is the only company in existence that addresses the full stack of Smart Grid architecture - from generation through transmission and end-user consumption.

But what's that got to do with JavaScript? Well, my team gets to work on building the UI that powers everything that happens on the smart grid. We have some unique requirements that have led us to write our own beautiful little framework, optimized for end-user performance and developer productivity. Naturally, this leaves me feeling like this:

We're a small (70 person) company of exceptionally talented people. We have a staggeringly successful collection of people both on the board and as the executive team.

We'd like to attract more people like us, and the Sencha community is the perfect place to look - especially given how much the framework has been inspired by what I helped create at Sencha.

If you're intrigued but don't know much about this space, I can't recommend this video enough. This is a presentation our CEO Tom Siebel gave a few months back, introducing why the company exists, which problems it's solving, and why we're doing what we're doing. If you can watch this without getting excited, this probably isn't for you :)

You'll get to work alongside people like this every day at C3. It's really an incomparable feeling, and I'd love to introduce you to it. If you're interested in finding out more in a low pressure way, drop me a comment or a tweet (@edspencer) or come grab me so I can buy you a beer.

Sencha Con 2013: Ext JS Performance tips

Fri, 19 Jul 2013 02:08:07 GMT

Just as with Jacky's session, I didn't plan on making a separate post about this, but again the content was so good and I ended up taking so many notes that it also warrants its own space. To save myself from early carpal tunnel syndrome I'm going to leave this one in more of a bullet point format.

Ext JS has been getting more flexible with each release. You can do many more things with it these days than you used to be able to, but there has been a performance cost associated with that. In many cases this performance degradation is down to the way the framework is being used, as opposed to a fundamental problem with the framework itself.

There's a whole bunch of things that you can do to dramatically speed up the performance of an app you're not happy with, and Nige "Animal" White took us through them this morning. Here's what I was able to write down in time:

Slow things

Nige identified three of the top causes of sluggish apps, which we'll go through one by one:

Network latency
JS execution
Layout activity

Network latency:

Bad UX - users stare at a blank screen while the app loads
Use Sencha Command to build the app - a single, minified file
4810ms (dynamic loading) vs 352ms (built)

JavaScript execution:

Avoid slow JS engines (he says with a wry smile)
Optimize repeated code - keep for loops tight and cache values outside them
Ideally, don't do any processing at render time
Minimize function calls
Lazily instantiate items
Use the PageAnalyzer (in the Ext JS SDK examples folder) to benchmark your applications
Start Chrome with --enable-benchmarking to get much more accurate timing information out of the browser
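The "tight loop" advice can be illustrated in plain JavaScript. This is a minimal sketch, not Ext JS code. `makeRecord` and the two counting functions are hypothetical, and the record objects only mimic a store record's `get` accessor. The point is that anything that doesn't change between iterations, such as the array length or the lowercased search term, is computed once outside the loop. How much this saves depends on the engine.

```javascript
// Hypothetical stand-in for a store record with a get() accessor.
function makeRecord(name) {
  return { data: { name: name }, get(field) { return this.data[field]; } };
}

// Loose loop: re-reads records.length and re-lowercases the search term
// on every iteration, even though neither changes.
function countMatchesLoose(records, term) {
  let count = 0;
  for (let i = 0; i < records.length; i++) {
    if (records[i].get("name").toLowerCase().indexOf(term.toLowerCase()) !== -1) {
      count++;
    }
  }
  return count;
}

// Tight loop: loop-invariant values are computed once, outside the loop.
function countMatchesTight(records, term) {
  const needle = term.toLowerCase();
  const len = records.length;
  let count = 0;
  for (let i = 0; i < len; i++) {
    if (records[i].get("name").toLowerCase().indexOf(needle) !== -1) {
      count++;
    }
  }
  return count;
}
```

Both functions return the same answer. The tight version just does less repeated work per iteration.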

Layouts

Suspend store events when adding/removing many records. Otherwise we're going to get a full Ext JS layout pass for each modification.

 grid.store.suspendEvents();
 // ...add/remove/update lots of records here...
 grid.store.resumeEvents();
 // refresh the view once, since it never saw the suppressed events
 grid.view.refresh();

Ditto on trees (they're the same as grids).

Coalesce multiple layouts. If you're adding/removing a bunch of Components in a single go, do it like this:

 Ext.suspendLayouts();
 //do a bunch of UI updates
 Ext.resumeLayouts(true);

Container#add accepts an array of items, which is faster than iterating over that array yourself and calling .add for each one.
Avoid layout constraints where possible - in box layouts, align: 'stretchmax' is slow because it has to do multiple layout runs.
Avoid minHeight, maxHeight, minWidth and maxWidth if possible.
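A toy model shows why batching helps. `layoutManager` and `ToyContainer` are hypothetical and much simpler than Ext JS's real layout machinery. Normally every layout request triggers one pass. When the manager is suspended, it just remembers which containers are dirty and lays them all out in a single pass on resume. The `true` passed to `resume` plays the role of the argument to `Ext.resumeLayouts(true)`.

```javascript
// Toy layout manager; names and behavior are illustrative only.
const layoutManager = {
  suspendCount: 0,
  pending: new Set(),   // containers waiting for a layout pass
  runs: 0,              // number of layout passes performed
  suspend() { this.suspendCount++; },
  resume(flush) {
    this.suspendCount = Math.max(0, this.suspendCount - 1);
    if (flush && this.suspendCount === 0) this.flush();
  },
  request(container) {
    this.pending.add(container);
    if (this.suspendCount === 0) this.flush();
  },
  flush() {
    // One pass handles every container that asked for a layout.
    if (this.pending.size === 0) return;
    this.runs++;
    this.pending.clear();
  }
};

class ToyContainer {
  constructor() { this.items = []; }
  // Accepts a single item or an array, like Container#add, and
  // requests exactly one layout per call.
  add(itemOrItems) {
    const list = Array.isArray(itemOrItems) ? itemOrItems : [itemOrItems];
    this.items.push(...list);
    layoutManager.request(this);
  }
}
```

In this model, three separate `add` calls cost three passes. A single array `add` costs one, and so do the same three calls wrapped in suspend/resume.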

At startup:

Embed initialization data inside the HTML if possible - avoids Ajax requests
Configure the entire layout in one shot using that data
Don't make multiple Ajax requests and build the layout piecemeal as each response arrives

Use the 'idle' event

Similar to the AnimationQueue
Ext.globalEvents.on('idle', myFunction) - called once a big layout/repaint run has finished
Using the idle listener is sometimes preferable to setTimeout(myFunction, 1), because it runs synchronously in the same repaint cycle. With setTimeout the repaint happens first, then your code is called. If your code itself requires a repaint, that means 2 repaints with setTimeout vs 1 with on('idle')
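The repaint counting in the last point can be modeled with a toy simulation. This sketch rests on several assumptions. `countRepaints` and `myFunction` are hypothetical. The two arrays stand in for `Ext.globalEvents.on('idle', ...)` and `setTimeout(..., 1)`. The handler is assumed to always change the UI, and each frame is assumed to repaint once after its script finishes.

```javascript
// Toy simulation of the two scheduling approaches; not Ext JS code.
function countRepaints(useIdle) {
  let repaints = 0;
  const idleListeners = [];   // stand-in for Ext.globalEvents.on('idle', fn)
  const timerQueue = [];      // stand-in for setTimeout(fn, 1)

  // Hypothetical handler that changes the UI, so it needs a repaint.
  const myFunction = () => {};

  if (useIdle) {
    idleListeners.push(myFunction);
  } else {
    timerQueue.push(myFunction);
  }

  // Frame 1: a big layout run finishes and fires 'idle' synchronously,
  // so the idle listener's changes are included in this frame's repaint.
  idleListeners.forEach(fn => fn());
  repaints++;

  // Frame 2: the timer only fires after frame 1 has painted, and its
  // UI changes force another repaint.
  if (timerQueue.length > 0) {
    timerQueue.forEach(fn => fn());
    repaints++;
  }
  return repaints;
}
```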

Reduce layout depth

Big problem - overnesting. People very often do this with grids:

{
    xtype: 'tabpanel',
    items: [
        {
            title: 'Results',
            items: {
                xtype: 'grid'
            }
        }
    ]
}

Better:

{
    xtype: 'tabpanel',
    items: {
        title: 'Results',
        xtype: 'grid'
    }
}

This is important because redundant Components still cost CPU and memory. Everything is a Component now - panel headers, icons, etc. - so you can end up constructing more Components than you realize. It's much more flexible, but easy to abuse.
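One rough way to see the cost of overnesting is to count the Components each config would create. `countComponents` is a hypothetical helper and deliberately simplistic. It assumes every object in an `items` config becomes exactly one Component, with the untyped middle object treated as a plain panel. It also ignores the headers, tabs and other internal Components the framework adds.

```javascript
// Hypothetical helper: counts one Component per config object, following
// `items` recursively. Internal Components (headers, tabs, icons) are ignored.
function countComponents(config) {
  // items may be a single config object or an array of them
  const children = config.items === undefined ? [] : [].concat(config.items);
  return 1 + children.reduce((sum, child) => sum + countComponents(child), 0);
}

const overnested = {
  xtype: 'tabpanel',
  items: [
    {
      title: 'Results',          // no xtype: treated as a plain panel
      items: { xtype: 'grid' }
    }
  ]
};

const flattened = {
  xtype: 'tabpanel',
  items: { title: 'Results', xtype: 'grid' }
};
```

Even in this simplified count, the overnested version builds 3 Components where the flattened one builds 2, and the extra panel does no useful work.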

Lazy Instantiation

New plugin at https://gist.github.com/ExtAnimal/c93148f5194f2a232464

{
    xtype: 'tabpanel',
    plugins: [{ ptype: 'lazyitems' }],
    items: {
        title: 'Results',
        xtype: 'grid'
    }
}
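The idea behind lazy instantiation can be sketched without Ext JS. This is not the code from the gist. `LazyTabs` and its `factory` argument are hypothetical names. Child configs stay as cheap plain objects until a tab is first shown, and the created Component is cached after that.

```javascript
// Sketch of lazy instantiation; illustrative names only.
class LazyTabs {
  constructor(configs, factory) {
    this.configs = configs;       // plain config objects, cheap to hold
    this.factory = factory;       // turns a config into a "component"
    this.instances = new Map();   // tab index -> created component
  }

  // Creates the component the first time its tab is shown, then reuses it.
  show(index) {
    if (!this.instances.has(index)) {
      this.instances.set(index, this.factory(this.configs[index]));
    }
    return this.instances.get(index);
  }
}
```

With three tabs, nothing is built up front. Showing the first tab builds one component, and tabs the user never opens are never built at all.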

Overall impact

On a real life large example contributed by a Sencha customer:

Bad practices: 5187ms on IE8, 1300ms on Chrome
Good practices: 1813ms on IE8, 550ms on Chrome

Colossal impact on the Ext.suspendLayouts example - 4700ms vs 100ms on Chrome

Summary

This is definitely a talk you'll want to watch when they go online. It was absolutely brimming with content and the advice comes straight from the horse's mouth. Nige did a great job presenting, and reminded us that performance is a shared responsibility - the framework is getting faster as time goes by, but we the developers need to do our share too to make sure it stays fast.

Sencha Con 2013: Fastbook

Fri, 19 Jul 2013 01:05:43 GMT

I didn't plan on writing a post purely on Fastbook, but Jacky's presentation just now was so good I felt it needed one. If you haven't seen Fastbook yet, it is Sencha's answer to the (over-reported) comments by Zuckerberg that using HTML5 for Facebook's mobile app was a mistake.

After those comments there was a lot of debate around whether HTML5 is ready for the big time. Plenty of opinions were thrown around, but not all based on evidence. Jacky was curious about why Facebook's old app was so slow, and wondered if he could use the same technologies to achieve a much better result. To say he was successful would be a spectacular understatement - Fastbook absolutely flies.

Performance can be hard to describe in words, so Sencha released this video that demonstrates the HTML5 Fastbook app against the new native Facebook apps. As you can see, not only is the HTML5 version at least as fast and fluid as the native versions, in several cases it's actually significantly better (especially on Android).

Challenges

The biggest challenge here is dynamically loading and scrolling large quantities of data while presenting a 60fps experience to the user. 60fps means you have just 16.7ms (1000ms / 60) per frame to do everything, which is a very tall order on a CPU- and memory-constrained mobile device.

The way to achieve this is to treat the page as an app rather than a traditional web page. This means we need to be a lot more proactive in managing how and when things are rendered - something that traditionally has been in the domain of the browser's own rendering and layout engines. Thankfully, the framework will do all of this for you.

As an example, Jacky loaded up Gmail's web app and showed what happens when you scroll a long way down your inbox. The more you scroll, the more divs are added to the document (one new div per message). Each div contains a bunch of child elements too, so we're adding maybe a dozen or so nodes to our DOM tree per message.

The problem with this is that as the DOM tree gets larger and larger, everything slows down. You could see the inspector showing slower and slower layout recalculations, making the app sluggish.

The solution is to recycle DOM nodes once they're no longer visible. In this way, a list that seems to have infinite content could contain only say 10 elements - just enough to fill the screen. Once you scroll down the list, DOM nodes that scrolled off the top are detached, updated with new data and placed at the bottom of the list. Simple. Ingenious. Beautiful.
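The recycling trick can be sketched with plain objects standing in for DOM nodes. `RecyclingList` is hypothetical and assumes fixed-height rows. A real list would also have to update the actual elements and position them. The pool holds just enough nodes to fill the viewport. On each scroll, every visible record is bound to a pooled node, so the node that scrolled off the top is reused for the row appearing at the bottom.

```javascript
// Sketch of node recycling with a fixed pool; illustrative only.
class RecyclingList {
  constructor(data, rowHeight, viewportHeight) {
    this.data = data;
    this.rowHeight = rowHeight;
    // Enough nodes to cover the viewport, plus one for a partially visible row.
    const poolSize = Math.ceil(viewportHeight / rowHeight) + 1;
    this.pool = Array.from({ length: poolSize }, () => ({ index: -1, text: "", top: 0 }));
  }

  // Called on scroll. Returns how many nodes had to be rebound to new data.
  render(scrollTop) {
    const size = this.pool.length;
    const first = Math.max(0, Math.floor(scrollTop / this.rowHeight));
    const last = Math.min(this.data.length, first + size); // exclusive
    let rebound = 0;
    for (let index = first; index < last; index++) {
      // Row `index` always lands in slot index % size, so the node freed at
      // the top is the one reused for the new row at the bottom.
      const node = this.pool[index % size];
      if (node.index !== index) {
        node.index = index;
        node.text = this.data[index];
        node.top = index * this.rowHeight;
        rebound++;
      }
    }
    return rebound;
  }
}
```

However long the data array gets, the pool never grows. Scrolling down by one row rebinds a single node instead of adding a new one.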

Prioritization

There's usually a lot more going on in an app than just animating a scrolling view though. There's data to load via AJAX, images to load, compositing, processing, and whatever else your app needs to do. And then there are touch events, which need to feel perfectly responsive at all times, even while all of this is going on.

To make this sane and manageable, we have a new class called AnimationQueue. All of the jobs I just mentioned above - handling touch events, animation, network requests and so on - are dispatched through the AnimationQueue with a given priority. Touch event handling has the top priority, followed by animation, followed by everything else.

AnimationQueue does as much as it can in that 16.7ms window, then breaks execution again to allow the browser to reflow/repaint/whatever else it needs to do. What this means is that while scrolling down a large list, it's likely that our CPU/GPU is being taxed so much that we don't have any time to load images or other low priority jobs.

This is a Good Thing, because if we're scrolling through a large list there's a good chance we are going to skip right over those images anyway. In the end they're loaded as soon as the AnimationQueue has some spare time, which is normally when your scrolling of the list has slowed down or stopped.
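Here is a minimal sketch of that prioritization. It assumes three priority levels and a per-frame budget of 1000/60 ≈ 16.7ms. `FrameQueue`, `PRIORITY` and the injected clock are hypothetical, and this is not the AnimationQueue source. It follows the same idea: run the highest-priority jobs first, stop once the frame's time is used up, and leave the rest (such as image loads) for a later frame.

```javascript
// Sketch of a frame-budgeted priority queue; illustrative names only.
const FRAME_BUDGET_MS = 1000 / 60; // about 16.7ms per frame at 60fps
const PRIORITY = { TOUCH: 0, ANIMATION: 1, OTHER: 2 };

class FrameQueue {
  constructor(now) {
    this.now = now;   // clock function returning milliseconds
    this.jobs = [];
  }

  enqueue(priority, job) {
    this.jobs.push({ priority, job });
    // Array.prototype.sort is stable, so FIFO order holds within a level.
    this.jobs.sort((a, b) => a.priority - b.priority);
  }

  // Runs jobs in priority order while budget remains. The check happens
  // before each job, so one long job can still overrun the frame.
  runFrame() {
    const start = this.now();
    let ran = 0;
    while (this.jobs.length > 0 && this.now() - start < FRAME_BUDGET_MS) {
      this.jobs.shift().job();
      ran++;
    }
    return ran;
  }
}
```

With a fake clock where each job takes 6ms, a frame runs the touch, animation and first low-priority jobs. Anything still queued waits for the next frame.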

Sandboxing

The final, and most complex technique Jacky discussed was Sandboxing. The larger your application gets, the larger the DOM tree. Even if you are using best practices, there's an expense to simply having so many components on the same page. The bottleneck here is in the browser itself - looks like we need another hack.

To get around this, we can dynamically create iframes that contain parts of our DOM tree. This way our main page DOM tree can remain small but we can still have a huge application. This not only speeds up browser repaint and reflow, it also improves compositing performance, DOM querying and more.

This all happens under the covers, and Jacky's aiming to include Ext.Sandbox in Sencha Touch 2.3 so that all apps can take advantage of this huge improvement. He cautioned (rightly) that it'll only make 2.3 if it's up to his high standards though, so watch this space.

Sencha Con 2013 Day 1

Thu, 18 Jul 2013 08:02:42 GMT

Sencha Con 2013 kicked off today, with some stunning improvements demoed across the product set. I'm attending as an audience member for the first time so thought I'd share how things look from the cheap seats.

Keynote

The keynote was very well put together, with none of the AV issues that plagued us last year (maybe they seemed worse from behind the curtain!). It started off with a welcome from Paul Kopacki, followed by some insights into the current status of developers in the world of business (apparently we're kingmakers - who knew!). One of BlackBerry's evangelists came up and made a pretty good pitch for giving them a second look (the free hardware probably helped a little...)

The meat, though, was in the second half of the presentation. We were treated to a succession of great new features across Ext JS, Sencha Touch and Sencha Architect, which I'll go into in a little more detail below.

But it was Abe Elias and Jacky Nguyen who stole the show in the end. Unleashing a visionary new product, Sencha Space, they demonstrated a brand new way to enable businesses to elegantly solve the problem of BYOD (Bring Your Own Device).

Nobody wants to be given a mobile phone by their IT department when they've got a brand new iPhone in their pocket. But those IT guys have good reason for doing this - consumer browsers are currently inherently insecure. Sencha Space solves this problem by providing a single app that employees can install, log in to and gain access to all of the apps needed to be productive in the company.

I could write a lot more about it but the 2 minute video below can surely do a better job:

Update: looks like this video got taken down at some point

Ext JS upgrades

The keynote lasted most of the morning, but in the afternoon Don Griffin came back on stage to tell us more about what's coming soon in Ext JS. Don heads up Ext JS these days, and is one of the most intelligent and experienced people I've had the joy of working with. I'm pretty sure he gained the largest amount of spontaneous applause of the day during the Ext JS talk, which is no surprise given the awesome stuff he showed us.

I forget which order things were revealed in, but these things stood out for me:

Touch Support - while this may seem anathema to the thinking behind Ext JS, it's an undeniable fact that people try to use Ext JS applications on tablets. Whether they should or not is a different question, but in this next release it will be officially supported by the framework. Momentum scrolling, pinch to zoom and drag-and-drop resizing are all supported at your fingertips.
Grid Gadgets - quite likely the coolest new feature, Gadgets allow you to render any Component into each cell in a Grid, in an extremely CPU and memory efficient manner. Seeing a live grid updating with rich charts and other widgets at high frequency was a fantastic experience
Border Layout - allows your users to rearrange the border layouts used in your apps with drag and drop. Easy to switch between accordion layout, box layout or tabs
A shedload more. The enforced pub crawl has temporarily relieved me of a full memory. So impressed with everything that was demonstrated today.

Sencha Touch upgrades

Jacky came up and delivered a presentation on what's coming up in Sencha Touch, using his idiosyncratic and inimitable style. Some of the things that stood out for me:

Touch gets a grid. It performs really well and looks great. Good for (sparing) use on tablet apps
XML configs. Not sure how I feel about this yet, but ST 2.3 will allow for views to be declared in XML, which is transformed into the normal JSON format under the covers. You end up writing fewer lines of code, but the overall file size probably doesn't change too much. With a decent editor the syntax highlighting definitely makes the View code easier to read though
ViewModel. Just as we have Ext.data.Model for encapsulating data models, we now have ViewModel for encapsulating a view model, which includes things like state. Leads to a much improved API for updating Views in response to other changes
Theming. 2 additional themes were added, and the others have all been refactored to make theming even easier

Again there's a lot more here and I couldn't possibly do it all justice in a blog post. It's genuinely thrilling to see these young frameworks mature into stellar products that are being used by literally millions of developers. Very exciting.

Architect upgrades

Architect has come a really long way since its inception a couple of years ago. The new features introduced today looked like some of the largest steps forward the product has ever taken. I'm finally getting close to actually thinking about using it in real life (I'm a glutton for editing code in Sublime Text). Some standout features:

New template apps to get you up and running with a new app in seconds
Integration with Appurify, which allows you to test your Architect apps on real devices hosted by their service
Allows you to install third party extensions into Architect, and have them seamlessly integrated into your project

Day 1 Summary

Although I worked with these people for years, somehow I'm still surprised when I see every single developer giving world class presentations. I don't know how I was able to leave Sencha a year ago, but every time I interact with Abe, Don, Jacky, Tommy, Jamie, Rob, Nige, and all of the other rockstars at that place I'm reminded what a great and unique time that was. Really looking forward to what tomorrow brings!

Autotesting JavaScript with Jasmine and Guard

Sat, 15 Jun 2013 02:03:01 GMT

One of the things I really loved about Rails in the early days was that it introduced me to the concept of autotest - a script that would watch your file system for changes and then automatically execute your unit tests as soon as you change any file.

Because the unit test suite typically executes quickly, you'd tend to have your test results back within a second or two of hitting save, allowing you to remain in the editor the entire time and only break out the browser for deeper debugging - usually the command line output and OS notifications (growl at the time) would be enough to set you straight.

This was a fantastic way to work, and I wanted to get there again with JavaScript. It turns out to be pretty easy. Because I've used a lot of Ruby I'm most comfortable with its ecosystem, and as it happens there's already a great tool for the job.

Enter Guard

Guard is a simple Ruby gem that scans your file system for changes and runs the code of your choice whenever a file you care about is saved. It has a great ecosystem around it which makes automating filesystem-based triggers both simple and powerful. Let's start by making sure we have all the gems we need:

gem install jasmine jasmine-headless-webkit guard-jasmine-headless-webkit guard \
 guard-livereload terminal-notifier-guard --no-rdoc --no-ri

This installs the gems we're going to use for our tests. First we grab the excellent Jasmine JavaScript BDD test framework via its gem - you can use the framework of your choice, but I find Jasmine pleasant to deal with and it generally Just Works. Next we add the 'jasmine-headless-webkit' gem and its guard twin, which run your tests in a headless WebKit on the command line, without needing a browser window.

Next up we grab guard-livereload, which enables Guard to act as a livereload server, automatically running your full suite in the browser each time you save a file. This might sound redundant - our tests are already going to be executed in the headless webkit environment, so why bother running them in the browser too? Well, the browser Jasmine runner tends to give a lot more information when something goes wrong - stack traces and most importantly a live debugger.

Finally we add the terminal-notifier-guard gem, which just allows guard to give us a notification each time the tests finish executing. Now we've got our dependencies in line it's time to set up our environment. Thankfully both jasmine and guard provide simple scripts to get started:

jasmine init
guard init

And we're ready to go! Let's test out our setup by running guard:

guard

What you should see at this point is something like this:

We see guard starting up, telling us it's going to use TerminalNotifier to give us an OS notification every time the tests finish running, and that it's going to use JasmineHeadlessWebkit to run the tests without a browser. You'll see that 5 tests were run in about 5ms, and you should have seen an OS notification flash up telling you the same thing. This is great for working on a laptop where you don't have the screen real estate to keep a terminal window visible at all times.

What about those 5 tests? They're just examples that were generated by jasmine init. You can find them inside the spec/javascripts directory and by default there's just 1 - PlayerSpec.js.

Now try editing that file and hitting save - nothing happens. The reason for this is that the Guardfile generated by guard init isn't quite compatible out of the box with the Jasmine folder structure. Thankfully this is trivial to fix - we just need to edit the Guardfile.

If you open up the Guardfile in your editor you'll see it has about 30 lines of configuration. A large amount of the file is comments and optional configs, which you can delete if you like. Guard is expecting your spec files to have the format 'my_spec.js' - note the '_spec' at the end.

To get it working the easiest way is to edit the 'spec_location' variable (on line 7 - just remove the '_spec'), and do the same to the last line of the guard 'jasmine-headless-webkit' do block. You should end up with something like this:


spec_location = "spec/javascripts/%s"

guard 'jasmine-headless-webkit' do
  watch(%r{^app/views/.*\.jst$})
  watch(%r{^public/javascripts/(.*)\.js$}) { |m| newest_js_file(spec_location % m[1]) }
  watch(%r{^app/assets/javascripts/(.*)\.(js|coffee)$}) { |m| newest_js_file(spec_location % m[1]) }
  watch(%r{^spec/javascripts/(.*)\..*}) { |m| newest_js_file(spec_location % m[1]) }
end

Once you save your Guardfile there's no need to restart guard, since it'll notice the change to the Guardfile and automatically restart itself. Now when you save PlayerSpec.js again you'll see the terminal immediately run your tests and show you the notification that all is well (assuming your tests still pass!).

So what are those 4 lines inside the guard 'jasmine-headless-webkit' do block? As you've probably guessed they're just the set of directories that guard should watch. Whenever any of the files matched by the patterns on those 4 lines change, guard will run its jasmine-headless-webkit command, which is what runs your tests. These are just the defaults, so if your JS files are not found inside those folders just update the patterns to point to the right place.
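To see which spec path each saved file maps to, the three watch rules that have blocks can be mirrored in JavaScript. This sketch covers only the pattern matching. `specPathFor` is a hypothetical helper, and it skips the `.jst` rule, which has no block. It also does not reproduce what `newest_js_file` does with the resulting path on disk.

```javascript
// Same patterns as the three watch rules with blocks in the Guardfile.
const specLocation = "spec/javascripts/%s";
const watchRules = [
  /^public\/javascripts\/(.*)\.js$/,
  /^app\/assets\/javascripts\/(.*)\.(js|coffee)$/,
  /^spec\/javascripts\/(.*)\..*/
];

// Returns the path that would be handed to newest_js_file, or null if
// none of these three rules match the changed file.
function specPathFor(changedFile) {
  for (const rule of watchRules) {
    const m = changedFile.match(rule);
    if (m) {
      // Equivalent of Ruby's `spec_location % m[1]`
      return specLocation.replace("%s", m[1]);
    }
  }
  return null;
}
```

For example, saving `spec/javascripts/PlayerSpec.js` yields `spec/javascripts/PlayerSpec`. A file outside the watched folders yields `null`, which is the "nothing happens" behavior you saw before fixing the Guardfile.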

Livereload

The final part of the stack that I use is livereload. Livereload consists of two things - a browser plugin (available for Chrome, Firefox and others), and a server, which we've actually already set up with Guard. First you'll need to install the livereload browser plugin, which is extremely simple.

Because the livereload server is already running inside guard, all we need to do is give our browser a place to load the tests from. Unfortunately the only way I've found to do this is to open up a second terminal tab and in the same directory run:

rake jasmine

This sets up a lightweight web server that runs on http://localhost:8888. If you go to that page in your browser now you should see something like this:

Just hit the livereload button in your browser (once you've installed the plugin), edit your file again and you'll see the browser automatically refreshes itself and runs your tests. This step is optional but I find it extremely useful to get a notification telling me my tests have started failing, then be able to immediately tab into the browser environment to get a full stack trace and debugging environment.

That just about wraps up getting autotest up and running. Next time you come back to your code just run guard and rake jasmine and you'll get right back to your new autotesting setup. And if you have a way to have guard serve the browser without requiring the second tab window please share in the comments!

On Leaving Sencha

Wed, 27 Jun 2012 11:16:55 GMT

As some of you may know, I left Sencha last week to move to another startup just up the road in San Mateo. Leaving the company was a hugely difficult thing to do for lots of reasons, some obvious, some less so. I'd like to share a few thoughts on my time there and look forward a little to the future.

I first came across Sencha's products when I saw an early preview of Ext JS 2 way back in 2007. I thought it was amazing stuff, and I started using it all over the place despite being a Ruby guy at the time. As time went by and I got deeper into the language and the framework, it became clear that JavaScript was the future, even though most people at the time still thought that was a little crazy.

I didn't really intend to join the company. I was having fun writing components and exploring the framework from the outside already, but a chance meeting in San Francisco with the team changed all that. What I found was a small but immensely talented group of people who loved what they did - writing awesome frameworks all day. Underqualified though I felt, being invited into that group was an honor I couldn't really refuse.

Early Days

When I started back in late 2009, Ext JS 3.1 was just being wrapped up for release so I leapt straight into creating 3.2. Having only ever consumed the framework before, making the leap to creating brand new components was quite a challenge. Thankfully Sencha can count many veterans in its ranks, and Jamie in particular demonstrated his saintly patience in bringing me up to speed.

Ext JS 3.2 saw the addition of animated DataView transitions, composite fields and a few Toolbar plugins. It also required some upgrades to Store, which was a horrifying enough experience that I'd spend a few weeks rewriting the entire data package for Sencha Touch and Ext JS 4. 3.2 also saw the first of my allegedly bombastic blog posts (I'm just enthusiastic...)!

All this time we were a very small group working out of a picturesque little office on University Avenue in Palo Alto. During that first year we grew to maybe 25 people and all fit happily into the one big open plan room, descending en masse upon one of the many restaurants along the strip or bringing food back to eat in the sunny courtyard outside the office.

I think of that time as the happiest part of my Sencha experience. Somehow I'd found myself in the heart of Silicon Valley surrounded by unbelievably talented people, creating groundbreaking products - some of which we were even allowed to give away for free! We worked like crazy, often well into the early hours of the morning, but it was a lot of fun and I think we created a lot we can be proud of in that time.

Creating Sencha Touch, Learning how to Conference

Not long after Ext JS 3.2 went final, and in parallel with Ext JS 3.3, we started creating Sencha Touch. The initial work was all from Tommy and Dave, before I got a chance to jump in and start writing the new data package. Over time most of the team got a chance to put their name on Touch as we raced to create the world's first HTML5 mobile app framework. Creating a new product from scratch like that was an awesome experience, and the final product was pretty good (though nowhere near as good as we'd get it with 2.0).

SenchaCon 2010 was scheduled for mid-November and we'd decided we wanted to make a big splash by releasing Sencha Touch - for free. Naturally, this meant a lot of work in a very short period of time in the months running up to the conference. I have vivid memories of a particular evening (read: 3am) in the office just before an imminent release. That can be stressful enough at the best of times but this particular evening our fire alarm would not stop going off. I don't know whether it was the people, the project or the pressure, but what should have been a dreadful night was a really fun experience. And I think it paid off - we shipped on time at the conference, but only just.

This would be a pattern we'd repeat more than once - working night and day to create both products and presentations that have an immovable deadline. Once more it amazes me how talented my friends at Sencha really are: how many developers do you know who can write great code and deliver world-class conference presentations? That all came from a lot of hard work but it's one more reason why it was so hard to leave that group of people behind.

Later Days

Later on, our time was dominated first by Ext JS 4, then by Sencha Touch 2. I was able to make a couple of contributions to Ext JS 4 - chiefly the new data package plus an evolution of the MVC architecture that debuted in Sencha Touch 1. I probably spent as much time writing documentation as I did writing code though, which is a pattern I'd later repeat on Sencha Touch 2. For whatever reason there's a misalignment in my brain that makes me pretty passionate about docs, so if you're reading the guides and class docs from those projects and none of it makes sense, well, sorry! (but you should see how it was before...)

By this time we'd outgrown our little office in Palo Alto and moved to a much bigger space in Redwood City. With 5x the floor space at our disposal the company started growing like crazy, easily expanding by a factor of 10 during the time I was there. That transition was harder than I expected - at 10 people it was like a large family, at 100 it was definitely a Company. I think a lot of that is down to Sencha's success, but it still caught me off guard having never been through that before.

I think the thing I'm proudest of during my time at Sencha was the release of Sencha Touch 2. This was the first release where we got (almost) everything right - the quality was high, the performance was great, and we finally cracked MVC. We even launched with relatively good docs and examples from day one, though I've learned by now that you can never have enough of that stuff.

People/Future

As well as getting to work with so many talented people inside the company, I've also been lucky enough to meet a huge number of people from the Sencha community. If anything you guys seem even more passionate about our stuff than we are. Until SenchaCon I could honestly say I'd never been mobbed but for those few days a year you make us all feel like rockstars. We may not say it at the time but I know everyone involved gets a huge high from those interactions, so thanks.

While I'm at a new company now, I expect to stay active in the Sencha community; I'm far too attached to what we created together to leave that behind any time soon. I'll stay active on the forums and maybe even blog once in a while - if you want to get in touch feel free to reach out here, on twitter or linkedin, or if you're near Palo Alto maybe I'll buy you a beer.

Sencha's best days are ahead of it and they have a great team there to deliver on the mission. I remain a big fan of the company, its people, its products and especially its community and can't wait to see what happens next.

Anatomy of a Sencha Touch 2 App

Mon, 19 Mar 2012 04:10:00 GMT

At its simplest, a Sencha Touch 2 application is just a small collection of text files - html, css and javascript. But applications often grow over time so to keep things organized and maintainable we have a set of simple conventions around how to structure and manage your application's code.

A little while back we introduced a technology called Sencha Command. Command got a big overhaul for 2.0 and today it can generate all of the files your application needs for you. To get Sencha Command you'll need to install the SDK Tools and then open up your terminal. To run the app generator you'll need to make sure you've got a copy of the Sencha Touch 2 SDK, cd into it in your terminal and run the app generate command:

sencha generate app MyApp ../MyApp

This creates an application called MyApp with all of the files and folders you'll need to get started generated for you. You end up with a folder structure that looks like this:

This looks like a fair number of files and folders because I've expanded the app folder in the image above but really there are only 4 files and 3 folders at the top level. Let's look at the files first:

index.html: simplest HTML file ever, just includes the app JS and CSS, plus a loading spinner
app.js: this is the heart of your app, sets up app name, dependencies and a launch function
app.json: used by the microloader to cache your app files in localStorage so it boots up faster
packager.json: configuration file used to package your app for native app stores

To begin with you'll only really need to edit app.js - the others come in useful later on. Now let's take a look at the folders:

app: contains all of your application's source files - models, views, controllers etc
resources: contains the images and CSS used by your app, including the source SASS files
sdk: contains the important parts of the Touch SDK, including Sencha Command

The app folder

You'll spend 90%+ of your time inside the app folder, so let's drill down and take a look at what's inside that. We've got 5 subfolders, all of which are empty except one - the view folder. This just contains a template view file that renders a tab panel when you first boot the app up. Let's look at each:

controller: will contain all of your Controller files (learn about Controllers)
model: will contain all of your Model files (learn about Models)
profile: will contain all of your Profile classes (learn about Device Profiles)
store: will contain all of your Store classes (learn about Stores)
view: will contain all of your View classes (learn about creating Views)

Easy stuff. There's a bunch of documentation on what each of those things is at the Touch 2 docs site, plus of course the Getting Started video with awesome narration by some British guy.

The resources folder

Moving on, let's take a look at the resources folder:

Five folders this time - in turn:

icons: the set of icons used when your app is added to the home screen. We create some nice default ones for you
loading: the loading/startup screen images to use when your app's on a home screen or natively packaged
images: this is where you should put any app images that are not icons or loading images
sass: the source SASS files for your app. This is the place to alter the theming for your app, remove any CSS you're not using and add your own styles
css: the compiled SASS files - these are the CSS files your app will use in production and are automatically minified for you

There are quite a few icon and loading images needed to cover all of the different sizes and resolutions of the devices that Sencha Touch 2 supports. We've included all of the different formats with the conventional file names as a guide - you can just replace the contents of resources/icons and resources/loading with your own images.

The sdk folder

Finally there's the sdk directory, which contains the SDK's source code and all of the dependencies used by Sencha Command. This includes Node.js, PhantomJS and others, so its size can start to add up. Of course, none of this goes into your production builds, which we keep as tiny and fast-loading as possible, but if you're not going to use the SDK Tools (bad move, but your call!) you can remove the sdk/command directory to keep things leaner.

By vendoring all third-party dependencies like Node.js into your application directory we can be confident that there are no system-specific dependencies required, so you can zip up your app, send it to a friend and so long as she has the SDK Tools installed, everything should just work.

Hopefully that lays out the large-scale structure of what goes where and why - feel free to ask questions!

What do you want from Sencha Touch 2.1?

Wed, 14 Mar 2012 16:07:45 GMT

Disclaimer: this is the most unofficial, non-Sencha-backed poll of all time. There's no guarantee we'll ever do any of it, yada yada.

Touch 2.0 went GA last week to easily the best reception of any product launch I've seen. It was great and the feedback's wonderful, but honeymoons are boring - I want to know what's wrong with it :)

So, what do you want to see in Sencha Touch 2.1? I asked on Twitter just now and got a bunch of responses so here are some ideas. Even if what you want is on this list already drop a reply in the comments so I know more than one person cares about it:

We do of course have a few ideas up our sleeves too, but why spoil the surprise?

Sencha Touch 2 GA Released!

Tue, 06 Mar 2012 05:51:19 GMT

The last few months have flown by faster than almost any before them. The first Sencha Touch 2 release went out in October as a Developer Preview, coinciding with SenchaCon 2011, sparking a huge wave of interest from all over the HTML5 community. Today marks the GA release of Sencha Touch 2.0.0, and we couldn't be happier with how far we've come.

See the announcement on sencha.com

It was only 18 months ago that we released the first version of Sencha Touch 1. It ushered in a brave new world, bringing tried and true approaches from the desktop together with the exciting new capabilities of the mobile web. But it was, to many of us, very much a version 1 product. ST2 is as big a step up from ST1 as ST1 was from everything that went before it.

Similar themes, great execution

For me, there are three core themes that go into any game-changing software release: Performance, Stability and Ease of Use. These themes come up again and again, especially for products at the bleeding edge of what's possible. With the mobile web, each year can bring game-changing developments - that's one of the reasons developing Sencha Touch is so exciting.

Touch 2 really nails all of those themes. Performance was definitely the top priority of the three, and I think we've been able to permanently alter expectations of what web apps can do with the improvements we've made. Not only do apps feel much faster when you use them, but today's announcement on the incredible new fast startup performance is game-changing in itself. It's amazing how fast ST2 apps start up now; it really changes how they're used when they consistently boot in a couple of seconds.

Stability is not sexy. It's not a particularly great marketing ploy but it's really important. Having a weekly release cycle and a huge developer community has really helped enable a fast turnaround on finding, fixing and verifying bugs. Lately new bug reports have been trickling in so slowly we almost wish there were more of them. Almost.

Oddly, documentation is something that's really close to my heart. Back when I joined Sencha, improving the Ext JS docs was a prime concern and I think we took it up several notches between Ext JS 3 and 4. But that's nothing compared to what we've done between Sencha Touch 1 and 2. As well as using the awesome new documentation app from Ext JS, we've added a ludicrous amount of new content including 35 brand new guides (up from zero) and way more examples than ever before.

We've also been creating new screencasts, augmenting the awesome material Drew Neil creates. For the Touch 2 GA release I created this 30 minute getting started video that covers everything from generating an app all the way through to packaging it for native app stores. This is the sort of material I always wished I had when I was first learning Ext JS:

These take a lot of time to produce so if you like this sort of thing make sure you drop a comment here or on the sencha.com blog so I know to create more. I try to record them in a single take so you know there's no magic going on behind the scenes - I think it shows it's all real but it involves a lot of takes :)

Step One

All of the hard work to date has been to ship 2.0.0 - the first stable 2.x release. 2.0 itself is a step change over what went before it but there are already some incredible things lined up for the next few releases. Over the next couple of months we'll continue to polish, optimize and act on the wonderful feedback you guys are providing, but first I think we've earned a break!

Building a data-driven image carousel with Sencha Touch 2

Sat, 11 Feb 2012 01:04:02 GMT

This evening I embarked on a little stellar voyage that I'd like to share with you all. Most people with great taste love astronomy and Sencha Touch 2, so why not combine them in a fun evening's web app building?

NASA has been running a small site called APOD (Astronomy Picture Of the Day) for a long time now, as you can probably tell by the awesome web design of that page. Despite its 1998-era styling, this site incorporates some pretty stunning images of the universe and is begging for a mobile app interpretation.

We're not going to go crazy, in fact this whole thing only took about an hour to create, but hopefully it's a useful look at how to put something like this together. In this case, we're just going to write a quick app that pulls down the last 20 pictures and shows them in a carousel with an optional title.

Here's what it looks like live. You'll need a WebKit browser (Chrome or Safari) to see this; alternatively, load up http://code.edspencer.net/apod on a phone or tablet device:

The full source code for the app is up on GitHub, and we'll go through it bit by bit below.

The App

Our app consists of 5 files:

index.html, which includes our JavaScript files and a little CSS
app.js, which boots our application up
app/model/Picture.js, which represents a single APOD picture
app/view/Picture.js, which shows a picture on the page
app/store/Pictures.js, which fetches the pictures from the APOD RSS feed

The whole thing is up on GitHub and you can see a live demo at http://code.edspencer.net/apod. To see what it's doing, tap that link on your phone or tablet, and to really feel it, add it to your home screen to get rid of that browser chrome.

The Code

Most of the action happens in app.js, which for your enjoyment is more documentation than code. Here's the gist of it:

/*
 * This app uses a Carousel and a JSON-P proxy so make sure they're loaded first
 */
Ext.require([
    'Ext.carousel.Carousel',
    'Ext.data.proxy.JsonP'
]);

/**
 * Our app is pretty simple - it just grabs the latest images from NASA's Astronomy Picture Of the Day 
 * (http://apod.nasa.gov/apod/astropix.html) and displays them in a Carousel. This file drives most of
 * the application, but there's also:
 * 
 * * A Store - app/store/Pictures.js - that fetches the data from the APOD RSS feed
 * * A Model - app/model/Picture.js - that represents a single image from the feed
 * * A View - app/view/Picture.js - that displays each image
 * 
 * Our application's launch function is called automatically when everything is loaded.
 */
Ext.application({
    name: 'apod',
    
    models: ['Picture'],
    stores: ['Pictures'],
    views: ['Picture'],
    
    launch: function() {
        var titleVisible = false,
            info, carousel;
        
        /**
         * The main carousel that drives our app. We're just telling it to use the Pictures store and
         * to update the info bar whenever a new image is swiped to
         */
        carousel = Ext.create('Ext.Carousel', {
            store: 'Pictures',
            direction: 'horizontal',
            
            listeners: {
                activeitemchange: function(carousel, item) {
                    info.setHtml(item.getPicture().get('title'));
                }
            }
        });
        
        /**
         * This is just a reusable Component that we pin to the top of the page. This is hidden by default
         * and appears when the user taps on the screen. The activeitemchange listener above updates the 
         * content of this Component whenever a new image is swiped to
         */
        info = Ext.create('Ext.Component', {
            cls: 'apod-title',
            top: 0,
            left: 0,
            right: 0
        });
        
        //add both of our views to the Viewport so they're rendered and visible
        Ext.Viewport.add(carousel);
        Ext.Viewport.add(info);
        
        /**
         * The Pictures store (see app/store/Pictures.js) is set to not load automatically, so we load it 
         * manually now. This loads data from the APOD RSS feed and calls our callback function once it's
         * loaded.
         * 
         * All we do here is iterate over all of the data, creating an apodimage Component for each item. 
         * Then we just add those items to the Carousel and set the first item active.
         */
        Ext.getStore('Pictures').load(function(pictures) {
            var items = [];
            
            Ext.each(pictures, function(picture) {
                if (!picture.get('image')) {
                    return;
                }
                
                items.push({
                    xtype: 'apodimage',
                    picture: picture
                });
            });
            
            carousel.setItems(items);
            carousel.setActiveItem(0);
        });
        
        /**
         * The final thing is to add a tap listener that is called whenever the user taps on the screen.
         * We do a quick check to make sure they're not tapping on the carousel indicators (tapping on
         * those indicators moves you between items so we don't want to override that), then either hide 
         * or show the info Component.
         * 
         * Note that to hide or show this Component we're adding or removing the apod-title-visible class.
         * If you look at index.html you'll see the CSS rules style the info bar and also cause it to fade
         * in and out when you tap.
         */
        Ext.Viewport.element.on('tap', function(e) {
            if (!e.getTarget('.x-carousel-indicator')) {
                if (titleVisible) {
                    info.element.removeCls('apod-title-visible');
                    titleVisible = false;
                } else {
                    info.element.addCls('apod-title-visible');
                    titleVisible = true;
                }
            }
        });
    }
});

This is pretty simple stuff and you can probably just follow the comments to see what's going on. Basically, though, app.js is responsible for launching our application, creating the Carousel and info Components, and setting up a couple of convenient event listeners.
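The filtering step inside the store's load callback is plain array work, so it can be sketched without the framework. This is a framework-free approximation rather than the app's actual code: buildCarouselItems and fakeRecord are hypothetical names, and fakeRecord only imitates the get method of a real Ext.data.Model instance.

```javascript
// Hypothetical helper mirroring the load callback in app.js: skip any record
// whose 'image' field is empty, and turn the rest into apodimage item configs.
function buildCarouselItems(pictures) {
  return pictures
    .filter(function (picture) {
      return Boolean(picture.get('image'));
    })
    .map(function (picture) {
      return { xtype: 'apodimage', picture: picture };
    });
}

// Hypothetical stand-in for a model instance: just enough to support get(fieldName).
function fakeRecord(data) {
  return {
    get: function (name) {
      return data[name];
    }
  };
}

var pictures = [
  fakeRecord({ title: 'Orion Nebula', image: 'http://example.com/m42.jpg' }),
  fakeRecord({ title: 'No image today', image: '' }),
  fakeRecord({ title: 'Andromeda', image: 'http://example.com/m31.jpg' })
];

var items = buildCarouselItems(pictures);
console.log(items.length); // 2
```

Using filter and map keeps the "skip entries without an image" rule in one obvious place, which is the same job the early return inside Ext.each does in app.js.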

We also had a few other files:

Picture Model

Found in app/model/Picture.js, our model is mostly just a list of fields sent back in the RSS feed. There is one that's somewhat more complicated than the rest though - the 'image' field. Ideally, the RSS feed would have sent back the URL of the image in a separate field and we could just pull it out like any other, but alas it is embedded inside the main content.

To get around this, we just specify a convert function that grabs the content field, finds the first image URL inside of it and pulls it out. To make sure it looks good on any device we also pass it through Sencha IO src, which resizes the image to fit the screen size of whatever device we happen to be viewing it on:

/**
 * Simple Model that represents an image from NASA's Astronomy Picture Of the Day. The only remarkable
 * thing about this model is the 'image' field, which uses a regular expression to pull its value out 
 * of the main content of the RSS feed. Ideally the image url would have been presented in its own field
 * in the RSS response, but as it wasn't we had to use this approach to parse it out
 */
Ext.define('apod.model.Picture', {
    extend: 'Ext.data.Model',
    
    config: {
        fields: [
            'id', 'title', 'link', 'author', 'content',
            {
                name: 'image',
                type: 'string',
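                convert: function(value, record) {
                    var content = record.get('content') || '',
                        regex   = /img src=\"([a-zA-Z0-9\_\.\/\:]*)\"/,
                        match   = content.match(regex),
                        //entries without a matching <img> tag yield '', which app.js skips
                        src     = match ? match[1] : '';

                    //resize everything except GIFs via Sencha IO src
                    if (src !== '' && !src.match(/\.gif$/)) {
                        src = "http://src.sencha.io/screen.width/" + src;
                    }
                    
                    return src;
                }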
            }
        ]
    }
});
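Because the convert function is just string processing, its logic can be pulled out and exercised on its own. This is a hedged, standalone sketch: extractImageSrc is a hypothetical name, and it deliberately swaps the model's narrow character class for a broader [^"]+ pattern (an assumption on my part) so that URLs containing hyphens or query strings are also matched. It returns an empty string when there is no image, which is exactly what the check in app.js skips.

```javascript
// Sencha IO src prefix used by the original model to resize images to the screen width.
var SENCHA_IO_PREFIX = 'http://src.sencha.io/screen.width/';

// Hypothetical standalone version of the 'image' field's convert function.
// Takes the raw HTML content of an RSS entry and returns an image URL, or ''
// when the entry contains no <img src="..."> at all.
function extractImageSrc(content) {
  var match = /img src="([^"]+)"/.exec(content || '');

  if (!match) {
    return '';
  }

  var src = match[1];

  // GIFs are passed through untouched; everything else goes via Sencha IO src
  if (!/\.gif$/.test(src)) {
    src = SENCHA_IO_PREFIX + src;
  }

  return src;
}

console.log(extractImageSrc('<a href="x.html"><img src="http://example.com/m42.jpg"></a>'));
// http://src.sencha.io/screen.width/http://example.com/m42.jpg
```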

Pictures Store

Our Store is even simpler than our Model. All it does is load the APOD RSS feed over JSON-P (via Google's RSS Feed API) and decode the data with a very simple JSON Reader. Each feed entry it pulls down is automatically run through our Model's convert function to extract the image:

/**
 * Grabs the APOD RSS feed from Google's Feed API, passes the data to our Model to decode
 */
Ext.define('apod.store.Pictures', {
    extend: 'Ext.data.Store',
    
    config: {
        model: 'apod.model.Picture',
        
        proxy: {
            type: 'jsonp',
            url: 'https://ajax.googleapis.com/ajax/services/feed/load?v=1.0&q=http://www.acme.com/jef/apod/rss.xml&num=20',
            
            reader: {
                type: 'json',
                rootProperty: 'responseData.feed.entries'
            }
        }
    }
});
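The reader's rootProperty, 'responseData.feed.entries', tells it where inside the decoded response the array of records lives. The sketch below only illustrates the idea of following that dotted path; getByPath is a hypothetical helper, not the reader's real implementation, and the abbreviated response shape is assumed for illustration (the original only confirms the path itself).

```javascript
// Hypothetical helper: walk a dot-separated path such as
// 'responseData.feed.entries' into a decoded JSON object.
// Returns undefined if any segment along the way is missing.
function getByPath(obj, path) {
  return path.split('.').reduce(function (current, key) {
    return current == null ? undefined : current[key];
  }, obj);
}

// Abbreviated response shape, assumed for illustration only.
var response = {
  responseData: {
    feed: {
      title: 'Astronomy Picture of the Day',
      entries: [
        { title: 'Orion Nebula', content: '<img src="http://example.com/m42.jpg">' },
        { title: 'Andromeda', content: '<img src="http://example.com/m31.jpg">' }
      ]
    }
  }
};

var entries = getByPath(response, 'responseData.feed.entries');
console.log(entries.length); // 2
```

Each object in that array then becomes one Picture record, with its fields (including the converted 'image') filled in from the entry.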

Tying it all together

Our app.js loads our Model and Store, plus a really simple Picture view that is basically just an Ext.Img. All it does then is render the Carousel and info Component to the screen and wire up a couple of listeners.

In case you weren't paying attention before, the info component is just an Ext.Component that we created in app.js as a place to show the title of the image you're currently looking at. When you swipe between items in the carousel the activeitemchange event is fired, which we listen to near the top of app.js. All our activeitemchange listener does is update the HTML of the info component to the title of the image we just swiped to.

But how does the info component appear and disappear? At the bottom of app.js we added a tap listener on Ext.Viewport that hides or shows the info Component whenever you tap anywhere on the screen (except on the Carousel indicator icons). With a little CSS transition loveliness we get a nice fade in/out when we tap the screen to reveal the image title. Here's that tap listener again:

/**
 * The final thing is to add a tap listener that is called whenever the user taps on the screen.
 * We do a quick check to make sure they're not tapping on the carousel indicators (tapping on
 * those indicators moves you between items so we don't want to override that), then either hide 
 * or show the info Component.
 */
Ext.Viewport.element.on('tap', function(e) {
    if (!e.getTarget('.x-carousel-indicator')) {
        if (titleVisible) {
            info.element.removeCls('apod-title-visible');
            titleVisible = false;
        } else {
            info.element.addCls('apod-title-visible');
            titleVisible = true;
        }
    }
});
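The show/hide logic in that listener is a small piece of state, so here is a framework-free sketch of it. FakeElement and createTitleToggle are hypothetical names: FakeElement only tracks CSS classes in place of info.element, and the isIndicatorTap argument stands in for the e.getTarget('.x-carousel-indicator') check.

```javascript
// Hypothetical stand-in for info.element: only tracks a set of CSS classes.
function FakeElement() {
  this.classes = {};
}
FakeElement.prototype.addCls = function (cls) { this.classes[cls] = true; };
FakeElement.prototype.removeCls = function (cls) { delete this.classes[cls]; };
FakeElement.prototype.hasCls = function (cls) { return this.classes[cls] === true; };

// Hypothetical factory returning a tap handler with the same logic as the app's
// listener. The titleVisible flag lives in the closure instead of in launch().
function createTitleToggle(element) {
  var titleVisible = false;

  return function onTap(isIndicatorTap) {
    if (isIndicatorTap) {
      return; // leave indicator taps to the Carousel
    }

    titleVisible = !titleVisible;

    if (titleVisible) {
      element.addCls('apod-title-visible');
    } else {
      element.removeCls('apod-title-visible');
    }
  };
}

var el = new FakeElement();
var onTap = createTitleToggle(el);
onTap(false); // title shown
onTap(true);  // indicator tap: no change
console.log(el.hasCls('apod-title-visible')); // true
onTap(false); // title hidden
console.log(el.hasCls('apod-title-visible')); // false
```

Keeping the flag inside a closure is just one design choice for this sketch; the original keeps it as a local variable in launch, which works the same way.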

The End of the Beginning

This was a really simple app that shows how easy it is to put these things together with Sencha Touch 2. Like with most stories though there's more to come so keep an eye out for parts 2 and 3 of this intergalactic adventure.

Like Android? Help us fix it

Mon, 06 Feb 2012 10:52:03 GMT

Near the end of last week's Sencha Touch 2 beta release blog post there was an appeal to the community to help raise awareness of a nasty flashing issue with Android 4.x phones. Every time you tried to use an animation on a web page the browser would flash, wait a bit, then finally perform the animation.

We filed a ticket on this about a week ago and, thanks to your help (over 300 of you starred the issue), got a prompt response from the Android team with a fix for the flashing issue.

Getting it Right

However, that's only half the story. While the ugly flash is gone, animation performance on Android 4.x phones is still unacceptable. As it stands, a two-year-old device running Android 2.x easily outruns today's top-of-the-range devices running 4.x.

We really want to have excellent support for all Android devices. While 4.x accounts for only 1% of all Android phones today, that number is only going to go up. And when it does, we want to be ready to ship fast, fluid, beautiful apps onto it.

So we've created a new ticket with reduced, reproducible test cases and filed it in the bug tracker. We'll continue to give the Android team as much support as we can in order to resolve this quickly, but once again we'll need your help.

In fact all we need is a few seconds of your time. Just open the ticket and click the star at the top left. That's all we need - it tells the Android team just how many people care about this issue and will help them prioritize it accordingly.

If you want to help out more, take a moment to add a comment to the ticket outlining your own experiences with this issue, like the m.lanyrd.com developer did. Highlighting specific cases where you've had problems will really help.

Thanks!

Helping raise awareness of this issue will help everyone who uses or develops for Android devices on the web, and enables technologies like Sencha Touch to deliver slick, immersive apps without resorting to rewriting your app for each platform. We appreciate your help!

Star the issue now

Sencha Touch 2 Hits Beta

Wed, 01 Feb 2012 13:46:01 GMT

Earlier today we released Sencha Touch 2 Beta 1 - check out the official sencha.com blog post and release notes to find out all of the awesome stuff packed into this release.

This is a really important release for us - Sencha Touch 2 is another huge leap forward for the mobile web and hitting beta is a massive milestone for everyone involved with the project. From a personal standpoint, working on this release with the amazing Touch team has been immensely gratifying and I hope the end result more than meets your expectations of what the mobile web can do.

While you should check out the official blog post and release notes to find out the large scale changes, there are a number of things I'd really like to highlight today.

A Note on Builds

Before we get into the meat of B1 itself, first a quick note that we've updated the set of builds that we generate with the release. Previously there had been some confusion around which build you should be using in which circumstances so we've tried to simplify that.

Most people, most of the time, should be using the new sencha-touch-debug.js while developing their app, as it is unminified code that contains all of the debug warnings and comments. If you're migrating from 1.x, use the new builds/sencha-touch-all-compat.js build as it provides an easier migration path by logging additional warnings when you use 1.x-style class configurations.

Because we provide 5 builds in total we created a guide on the shipped builds and JSBuilder (the tool that creates a custom build specifically for your app). The guide contains a table showing all of the options enabled for each build - hopefully that makes it easy to choose which build is best for your needs.

Performance

In case you haven't seen Sencha Touch 2 yet, the first thing you need to know is that it's fast. Crazy fast. Check out this side-by-side comparison between 1.x and 2.x:

Layout performance is enormously faster in 2.x due to a brand new layout engine that operates much closer to the browser's optimized CSS layout engine. The difference is pretty startling, especially on Android devices, which had sometimes struggled with Sencha Touch 1. Performance remains a top priority for us and we're really pleased with the improvements that we've secured with 2.0.

Navigation View

The new Navigation View is one of the slickest, sexiest things we've created for 2.0. I could play with this thing all day. If you've got a phone in your pocket or a tablet nearby, open up the Navigation View example and see it for yourself. If not, check out this beautiful video of it in action:

Navigation Views are really easy to put together and make your application immediately come to life. Check out the Navigation View docs to see how easy it is to add this to your own applications.

Awesome new examples

As of beta 1 we have 24 examples shipped with the SDK, including no fewer than 6 MVC examples - Kitchen Sink, Jogs with Friends, Twitter, Kiva, Navigation View and GeoCongress.

The Kitchen Sink and Twitter examples also take advantage of Device Profiles, which are a powerful way to render a customized UI for tablets and phones. Take a look at the Kitchen Sink on your phone and on an iPad to see how it rearranges itself depending on the screen size.

Finally, if you're seeing Sencha Touch 2 for the first time you may not have seen the new inline examples in the documentation center. This is a brand new thing for Sencha Touch and allows you to edit code live on the documentation page and immediately see the results - give it a go on the Carousel docs.

Ludicrous Amounts of Documentation

Speaking of docs, we have a stunning amount of learning material for Sencha Touch 2. We've been through all of the major classes, making sure that the functions are clearly documented and that each one has some great intro text that describes what the class does and how it fits in with the rest of the framework.

We've also created over 20 brand new guides for Sencha Touch 2, covering everything from getting started through to developing using MVC, using Components and creating custom builds for your applications. We've put a huge amount of effort into our docs for Sencha Touch 2 and I really hope it pays off for you guys and makes it easier than ever to create great mobile web apps.

Go Build Something

It's only beta 1 but we're very happy with the performance, stability, API and documentation of Sencha Touch 2. I think it's the best thing we've ever created, and really highlights what the mobile web is capable of. 2012 looks set to be a very exciting year for Sencha Touch so I hope you'll join us on the adventure and build something amazing with it.

Download Sencha Touch 2 Beta 1 Now

The Class System in Sencha Touch 2 - What you need to know

Sat, 28 Jan 2012 00:53:05 GMT

Sencha Touch 1 used the class system from Ext JS 3, which provides a simple but powerful inheritance system that makes it easier to write big complex things like applications and frameworks.

With Sencha Touch 2 we've taken Ext JS 4's much more advanced class system and used it to create a leaner, cleaner and more beautiful framework. This post takes you through what has changed and how to use it to improve your apps.

Syntax

The first thing you'll notice when comparing code from 1.x and 2.x is that the class syntax is different. Back in 1.x we would define a class like this:

MyApp.CustomPanel = Ext.extend(Ext.Panel, {
    html: 'Some html'
});

This would create a subclass of Ext.Panel called MyApp.CustomPanel, setting the html configuration to 'Some html'. Any time we create a new instance of our subclass (by calling new MyApp.CustomPanel()), we'll now get a slightly customized Ext.Panel instance.

Now let's see how the same class is defined in Sencha Touch 2:

Ext.define('MyApp.CustomPanel', {
    extend: 'Ext.Panel',
    
    config: {
        html: 'Some html'
    }
});

There are a few changes here, so let's go through them one by one. Firstly, and most obviously, we've swapped out Ext.extend for Ext.define. Ext.define operates using strings - notice that both 'MyApp.CustomPanel' and 'Ext.Panel' are now wrapped in quotes. This enables one of the most powerful parts of the new class system - dynamic loading.

I actually talked about this in a post about Ext JS 4 last year, so if you're not familiar with it you should check out that post, but in a nutshell Sencha Touch 2 will automatically ensure that the class you're extending (Ext.Panel) is loaded on the page, fetching it from your server if necessary. This makes development easier and enables you to create custom builds that contain only the classes your app actually uses.
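Referring to classes by string is what lets a loader work out which file to fetch. The sketch below shows that idea in isolation; classNameToPath is a hypothetical function, the namespace-to-folder table (MyApp to app, Ext to sdk/src) is an assumption for illustration, and a real loader's rules (for example, matching longer namespace prefixes) may differ from this simplified first-segment lookup.

```javascript
// Hypothetical mapping from a dotted class name to a script URL, showing how a
// string like 'MyApp.CustomPanel' can be turned into a file to fetch.
// Simplification: only the first segment of the name is treated as the namespace.
function classNameToPath(className, paths) {
  var parts = className.split('.');
  var namespace = parts[0];
  var root = paths[namespace];

  if (root === undefined) {
    throw new Error('No path configured for namespace ' + namespace);
  }

  return root + '/' + parts.slice(1).join('/') + '.js';
}

// Assumed folder layout, for illustration only.
var paths = { MyApp: 'app', Ext: 'sdk/src' };

console.log(classNameToPath('MyApp.CustomPanel', paths)); // app/CustomPanel.js
console.log(classNameToPath('Ext.Panel', paths));         // sdk/src/Panel.js
```

With a plain Ext.extend(Ext.Panel, ...) call there is no string to map, and Ext.Panel must already exist when the line runs, which is why the quoted names matter.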

The second notable change is that we're using a 'config' block now. Configs are a special thing in Sencha Touch 2 - they are properties of a class that can be retrieved and updated at any time, and provide extremely useful hook functions that enable you to run any custom logic you like whenever one of them is changed.

Whenever you want to customize any of the configurations of a subclass in Sencha Touch 2, just place them in the config block and the framework takes care of the rest, as we'll see in a moment.

Consistency

The biggest improvement that comes from the config system is consistency. Let's take our MyApp.CustomPanel class above and create an instance of it:

var myPanel = Ext.create('MyApp.CustomPanel');

Every configuration has an automatically generated getter and setter function, which we can use like this:

myPanel.setHtml('New HTML');
myPanel.getHtml(); //returns 'New HTML'

This might not seem like much, but the convention applies to every single configuration in the entire framework. This eliminates the guesswork from the API - if you know the config name, you know how to get it and update it. Contrast this with Sencha Touch 1, where retrieving the html config meant finding some property on the instance, and updating it meant calling myPanel.update('New HTML'), which is nowhere near as predictable.
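To make the naming convention concrete, here is a small framework-free sketch that derives getXxx/setXxx methods from a config block. It is an illustration only, not how Sencha Touch 2 implements configs internally: defineClass and capitalize are hypothetical helpers, and values are stored in underscore-prefixed properties purely as a choice made for this sketch.

```javascript
// Turn 'anotherConfig' into 'AnotherConfig' so it can follow 'get'/'set'.
function capitalize(name) {
  return name.charAt(0).toUpperCase() + name.slice(1);
}

// Hypothetical helper: for every key in the config block, add a getter and a
// setter to the prototype, and seed each instance with default or overridden values.
function defineClass(config) {
  function Cls(instanceConfig) {
    var overrides = instanceConfig || {};
    Object.keys(config).forEach(function (name) {
      this['_' + name] = name in overrides ? overrides[name] : config[name];
    }, this);
  }

  Object.keys(config).forEach(function (name) {
    var suffix = capitalize(name);
    Cls.prototype['get' + suffix] = function () {
      return this['_' + name];
    };
    Cls.prototype['set' + suffix] = function (value) {
      this['_' + name] = value;
    };
  });

  return Cls;
}

var CustomPanel = defineClass({ html: 'Some html', anotherConfig: 'default value' });
var myPanel = new CustomPanel();
myPanel.setHtml('New HTML');
console.log(myPanel.getHtml());          // New HTML
console.log(myPanel.getAnotherConfig()); // default value
```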

Instantiating

You probably noticed that we used a new function above - Ext.create. This is very similar to just calling 'new MyApp.CustomPanel()', with the exception that Ext.create uses the dynamic loading system to automatically load the class you are trying to instantiate if it is not already on the page. This can make life much easier when developing your app as you don't have to immediately manage dependencies - it just works.

In the example above we just instantiated a default MyApp.CustomPanel but of course we can customize it at instantiation time by passing configs into Ext.create:

var myPanel = Ext.create('MyApp.CustomPanel', {
    html: 'Some Custom HTML'
});

We can still call getHtml() and setHtml() to retrieve and update our html config at any time.

Subclassing and Custom Configs

We created a simple subclass above that provided a new default value for Ext.Panel's html config. However, we can also add our own configs to our subclasses:

Ext.define('MyApp.CustomPanel', {
    extend: 'Ext.Panel',
    
    config: {
        html: 'Some html',
        anotherConfig: 'default value'
    }
});

The 'anotherConfig' configuration doesn't exist on Ext.Panel so it's defined for the first time on MyApp.CustomPanel. This automatically creates our getter and setter functions for us:

var myPanel = Ext.create('MyApp.CustomPanel');
myPanel.setAnotherConfig('Something else');
myPanel.getAnotherConfig(); //now returns 'Something else'

Notice how the getter and setter names were capitalized to use camelCase like all of the other functions in the framework. Sencha Touch 2 also does another couple of very nice things for you - it calls hook functions, if you define them, whenever a config is set:

Ext.define('MyApp.CustomPanel', {
    extend: 'Ext.Panel',
    
    config: {
        html: 'Some html',
        anotherConfig: 'default value'
    },
    
    applyAnotherConfig: function(value) {
        return "[TEST] " + value;
    },
    
    updateAnotherConfig: function(value, oldValue) {
        this.setHtml("HTML is now " + value);
    }
});

We've added two new functions to our class - applyAnotherConfig and updateAnotherConfig - these are both called when we call setAnotherConfig. The first one that is called is applyAnotherConfig. This is passed the value of the configuration ('default value' by default in this case) and is given the opportunity to modify it. In this case we're prepending "[TEST] " to whatever anotherConfig is set to:

var myPanel = Ext.create('MyApp.CustomPanel');
myPanel.setAnotherConfig('Something else');
myPanel.getAnotherConfig(); //now returns '[TEST] Something else'

The second function, updateAnotherConfig, is called after applyAnotherConfig has had a chance to modify the value and is usually used to effect some other change - whether it's updating the DOM, sending an AJAX request, or setting another config as we do here.

When we run the code above, as well as '[TEST] ' being prepended to our anotherConfig configuration, we're calling this.setHtml to update the html configuration too. There's no limit to what you can do inside these hook functions, just remember the rule - the apply functions are used to transform new values before they are saved, the update functions are used to perform the actual side-effects of changing the value (e.g. updating the DOM or configuring other classes).
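The apply-then-update rule can be sketched in plain JavaScript to show the order of operations. This is a simplified model of the convention, not the framework's source: makeSetter is a hypothetical factory, and the behaviour of skipping the update hook when the applied value is unchanged is a choice made in this sketch. CustomPanel here is a hand-written stand-in for MyApp.CustomPanel.

```javascript
function capitalize(name) {
  return name.charAt(0).toUpperCase() + name.slice(1);
}

// Hypothetical setter factory: apply transforms, update performs side effects.
function makeSetter(name) {
  var suffix = capitalize(name);
  var applier = 'apply' + suffix;
  var updater = 'update' + suffix;
  var field = '_' + name;

  return function (value) {
    var oldValue = this[field];

    // 1. apply: transform the incoming value before it is stored
    if (typeof this[applier] === 'function') {
      value = this[applier](value, oldValue);
    }

    // 2. store it, then 3. update: side effects (skipped if unchanged in this sketch)
    if (value !== oldValue) {
      this[field] = value;
      if (typeof this[updater] === 'function') {
        this[updater](value, oldValue);
      }
    }
  };
}

// Hand-written stand-in for MyApp.CustomPanel
function CustomPanel() {
  this._html = 'Some html';
  this.setAnotherConfig('default value'); // the initial value runs through the hooks too
}
CustomPanel.prototype.getHtml = function () { return this._html; };
CustomPanel.prototype.setHtml = makeSetter('html');
CustomPanel.prototype.getAnotherConfig = function () { return this._anotherConfig; };
CustomPanel.prototype.setAnotherConfig = makeSetter('anotherConfig');
CustomPanel.prototype.applyAnotherConfig = function (value) {
  return '[TEST] ' + value;
};
CustomPanel.prototype.updateAnotherConfig = function (value) {
  this.setHtml('HTML is now ' + value);
};

var myPanel = new CustomPanel();
myPanel.setAnotherConfig('Something else');
console.log(myPanel.getAnotherConfig()); // [TEST] Something else
console.log(myPanel.getHtml());          // HTML is now [TEST] Something else
```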

How we use it

The example above is a little contrived to show the point - let's look at a real example from Sencha Touch 2's Ext.Panel class:

applyBodyPadding: function(bodyPadding) {
    if (bodyPadding === true) {
        bodyPadding = 5;
    }

    bodyPadding = Ext.dom.Element.unitizeBox(bodyPadding);

    return bodyPadding;
},

updateBodyPadding: function(newBodyPadding) {
    this.element.setStyle('padding', newBodyPadding);
}

Here we see the apply and update functions for the bodyPadding config. Notice that applyBodyPadding converts a value of true into a default padding of 5, then uses the framework's unitizeBox function to normalize the value (a number or a CSS padding string like '5px 5px 10px 15px') into top, right, bottom and left paddings, which it returns as the transformed value.

updateBodyPadding then takes this transformed value and performs the actual update - in this case setting the padding style on the Panel's element based on the new configuration. You can see similar usage in almost any component class in the framework.
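The unitizeBox function itself isn't shown here, so the following is only a sketch of what such a normalization step might look like, assuming it expands CSS shorthand (1 to 4 values) into top, right, bottom, left order and adds px to bare numbers. unitizeBoxSketch is a hypothetical name, not the framework's implementation.

```javascript
// Hypothetical stand-in for a unitizeBox-style helper, not framework source.
function unitize(value, units) {
  // Bare numbers (or numeric strings) get the default unit.
  return /^-?\d+(\.\d+)?$/.test(String(value)) ? value + units : String(value);
}

function unitizeBoxSketch(box, units = 'px') {
  const parts = typeof box === 'number' ? [box] : String(box).trim().split(/\s+/);
  // CSS shorthand: 1 value -> all sides, 2 -> vertical/horizontal,
  // 3 -> top, horizontal, bottom, 4 -> top, right, bottom, left.
  const [top, right = top, bottom = top, left = right] = parts;
  return [top, right, bottom, left].map((p) => unitize(p, units)).join(' ');
}

function applyBodyPadding(bodyPadding) {
  if (bodyPadding === true) {
    bodyPadding = 5; // same default as the applyBodyPadding snippet above
  }
  return unitizeBoxSketch(bodyPadding);
}

console.log(applyBodyPadding(true));                // prints: 5px 5px 5px 5px
console.log(applyBodyPadding('5px 5px 10px 15px')); // prints: 5px 5px 10px 15px
console.log(applyBodyPadding('10 20'));             // prints: 10px 20px 10px 20px
```

Returning a normalized string keeps the update step trivial: it can hand the value straight to a style setter without re-parsing it.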

Find out more

This is just a quick tour of the most important aspects of the new class system and how they impact you when writing apps in Sencha Touch 2. To find out more about the class system, we recommend taking a look at the Class System guide, and if you have any questions the forums are a great place to start.

Sencha Touch 2 PR4 - Big Improvements in Data and MVC

Tue, 24 Jan 2012 00:59:44 GMT

Today we released Sencha Touch 2.0 PR4 - the fourth and final preview release before we hit beta. While we're technically calling this one a preview release, we're pretty happy with the performance, stability and overall quality of this release and consider it exceptionally close to beta quality.

As well as a good number of enhancements and bug fixes, PR4 brings long-awaited improvements to two of the most important parts of Sencha Touch: the data package and the application architecture.

First up, the data package has been ported to the new config system, which normalizes the configuration options for every class in the data package and provides a clean, predictable way to configure and update your data classes. We're still cleaning up some of the data package documentation, and given the scope of the changes we expect a few bugs to appear, but overall we're very happy with the improved capabilities of Ext.data.

MVC Improvements

The second big improvement in PR4 is to the application architecture. The MVC classes have also been upgraded to use the new config system, again yielding big improvements in the API and general flexibility of your code.

History support has been baked directly into Controllers, enabling you to easily define routes that your Controller cares about, as well as the functions that handle those routes right there in your Controller file. The Kitchen Sink example has been upgraded to use routes out of the box - try it on a mobile device or desktop browser and watch how it reacts to the back/forward buttons.

Equally important, Device Profiles have been upgraded to make creating apps that adapt to different screen sizes much simpler than ever before. Once again the Kitchen Sink has been upgraded to take advantage of device profiles. If you load it on a tablet device you'll see a split screen view with the menu on the left and the content on the right, whereas the phone version employs a nested list to save screen space.

To cap it off, deep linking support means you can navigate to any view on a phone, send the link to a friend on a tablet and they'll be taken to the same view customized for their screen size. As an example, try opening http://dev.sencha.com/deploy/sencha-touch-2-pr4/examples/kitchensink/#demo/forms on a tablet and a phone to see it show the Forms demo specialized for each type of device.
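The idea behind routes is that a URL hash such as #demo/forms is matched against patterns and dispatched to a handler. The sketch below shows that matching in plain JavaScript; createRouter, compileRoute and the ':id' placeholder syntax are assumptions for illustration, not the Sencha Touch 2 Controller API.

```javascript
// Hypothetical route table, not the Sencha Touch 2 API.
// ':name' segments capture a value from the URL hash.
function compileRoute(pattern) {
  const names = [];
  const source = pattern
    .split('/')
    .map((segment) => {
      if (segment.startsWith(':')) {
        names.push(segment.slice(1));
        return '([^/]+)';
      }
      // Escape literal segments so characters like '.' match themselves.
      return segment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    })
    .join('/');
  return { regex: new RegExp('^' + source + '$'), names };
}

function createRouter(routes) {
  const compiled = Object.keys(routes).map((pattern) => ({
    ...compileRoute(pattern),
    handler: routes[pattern]
  }));
  // Dispatch a hash such as '#demo/forms' to the first matching handler.
  return function dispatch(hash) {
    const path = hash.replace(/^#/, '');
    for (const route of compiled) {
      const match = route.regex.exec(path);
      if (match) {
        const params = {};
        route.names.forEach((name, i) => { params[name] = match[i + 1]; });
        return route.handler(params);
      }
    }
    return null; // no route matched
  };
}

const dispatch = createRouter({
  'demo/:id': (params) => 'showing demo ' + params.id
});

console.log(dispatch('#demo/forms')); // prints: showing demo forms
```

In a browser you would call dispatch(location.hash) from a hashchange listener, which is what makes the back/forward buttons and shared links land on the same view.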

As PR4 is the first time we've exposed this expanded functionality to the public we expect that there will be bugs and edge cases that crop up. We'll be keeping a close eye on the bug forums and addressing any issues as quickly as possible, as well as creating additional MVC-driven examples for you to learn from. For now, the kitchen sink is the best example of Sencha Touch 2 MVC in action.

Docs

We've made a huge push over the last couple of years to radically improve our documentation, and I think that even in the pre-beta PR4 release Sencha Touch 2 has the best docs we've ever created. While there are still holes to be filled in, we already ship with 20 guides on how to use the framework, including 4 brand new guides for PR4.

As well as the guides, most of the classes now contain generous documentation explaining their function and the context in which they operate. As we move to beta and then to GA we'll be shifting our focus onto producing great demos and examples to showcase the framework's capabilities and provide realistic sample code to draw from.

There's a full set of release notes explaining the improvements in PR4 and the important known issues. We expect to be shipping regular releases from now until GA so be sure to keep an eye on the forums, twitter and the sencha blog for more details.

SenchaCon 2011: The Best Bits

Wed, 26 Oct 2011 14:46:42 GMT

SenchaCon 2011 is drawing to a close and it's been another awesome ride. We were joined by 600 of the best and brightest of the Sencha community, and I think it's safe to say we all had a great time. With day 3 wrapping up, here are a few highlights from the week.

Ext JS 4.1 Performance Preview Released

There were a number of big announcements on day 1. Probably the most exciting one for me was the release of Ext JS 4.1 Performance Preview. We've been working like fiends to improve Ext JS's performance profile on older browsers (IE6, IE7 and IE8 in particular) and on Monday we were able to share some of what we've achieved.

Page load, render and layout times are all enormously improved and have been the focus of our optimizations so far. Since 4.0 we've been building up a performance benchmarking rig that tests all of our 100+ examples (and a number of real-world customer apps) on consumer grade hardware with a range of browsers. We've seen massive improvements in loading time on these older browsers - for example the Themes Viewer example with its 300 Components all rendered at load time now starts up twice as fast as it did in 4.0.7.

To give a sense of the breadth of the improvement, we ran the tests on every example and summed the loading times for each browser. As you can see below, 4.1 speeds through all of the examples significantly faster than 4.0.7, giving a massive performance boost across the board. It got so much faster that IE8 can now load all 100+ examples in a little under 20 seconds, compared with almost 60 seconds in 4.0.7:

See the full announcement on the sencha.com blog, but as we said there, this is a pre-beta release with a number of known issues. We'd love for you to verify the speed improvements with your own apps, but please don't take it anywhere near production yet! We'll have more content on what's in 4.1 in later blog posts.

Other Announcements

While Ext JS is closest to my heart, there were a number of other announcements made over the last few days. First up is Sencha.IO, our new cloud service, now launching in beta. This is a set of 4 services - data, messages, login and app deployment - that make creating and deploying web apps a snap, especially when you integrate the social aspects of Sencha.IO data and messages.

We also announced that we've just closed a second round of funding, raising another $15 million to further advance the state of the art in HTML5 technologies. This is going to enable us to push forward even faster and bring you some exciting new technologies. It was great that Sequoia Capital and Radar Partners were so happy with their first round with us that they decided to invest again. The future is definitely very exciting at Sencha right now.

Favorite Sessions

There were over 50 sessions this year, and with several tracks going on simultaneously it was impossible to go to them all. Jacky Nguyen definitely stole the show with his talk on the Sencha Class System. He has a ridiculously over-the-top presentation style and totally brought the house down. We'll be sure to get him on stage more often!

Jamie and Nicolas' talk on charting was very cool and generated lots of spontaneous applause (that happened a lot during the conference, which must be a good thing), and Rob and Dave's demonstration of styling using the new beta Neptune theme was equally awesome.

Don lit the place up with his talk detailing the work that went into making Ext JS 4.1 so much faster, along with all the other new features in the release. Another mind-blowing talk was given by John Willander, who demo'd a series of client-side attacks along with the BeEF Project, which happens to be written in Ext JS. Based on what John presented, we'll definitely be looking at what we can do to help you secure your apps with Ext JS.

Of course, I had a couple of sessions myself, though a few technical problems early on made them rather more challenging than expected (it's hard to talk to people when your microphone cuts out after every second word!). The Intro to MVC talk was a blast and the sacrifice to the gods of live demos seemed to pay off, as the 20-minute live coding session went without a hitch. Anyone who wants the code I put together during that session can find it up on github.

Meeting Everyone

Although there were 600 people here this time, it felt like I was able to meet almost everyone. Your intense enthusiasm for what we do really came through, and the great feedback from everyone who came up to talk to us drives us forward to keep improving your framework, so thank you!

I saw more awesome Ext JS and Sencha Touch apps than I could count, and was pleasantly surprised to see how many people had been able to construct full applications using Sencha Touch 2 despite it only being in Developer Preview right now. It was also great getting to spend time hanging out with people and seeing them get excited when they start to see what's possible with these products. Spending time in the flesh with developers is probably the most important part of the whole conference so it was great to meet so many of you.

Finally, Grgur announced that the second SourceDevCon Europe will be taking place in London around April of next year. The first SourceDevCon was an awesome experience in beautiful Split, Croatia, and next year we'll be heading to London, England for this community-organized, Sencha-centric conference. They'll be launching the conference website in a couple of weeks, and given how good it was last year you'll probably have to rush to get your tickets. Hope to see you there!

Ext JS 4.0.7 Released

Thu, 20 Oct 2011 12:21:38 GMT

I'm very happy to report that we released Ext JS 4.0.7 to the public today. This is the seventh patch release to the 4.0.x series and contains several hundred improvements and bug fixes compared to the last public version, 4.0.2a.

4.0.7 is all about robustness - we've found that our support subscribers have had a lot of success with the newer builds of Ext JS 4 so I'm really pleased that we can share this with you. We're releasing this publicly earlier than we would usually do because it has taken us longer than we expected to get Ext JS 4.1 into your hands.

Michael put out a post on our blog last week with some updates on 4.1 and our desires around releases and communications with the community. Not being able to ship 4.1 to you yet has been a frustrating experience but I think that once you see it you'll enjoy the vast improvements it brings.

In the meantime, I'm happy to answer questions in the comments, via twitter or email (ed @ sencha). You can download the build here and see the full release notes for 4.0.7 all the way back to 4.0.0.

Sencha Touch 2 - Thoughts from the Trenches

Tue, 11 Oct 2011 14:01:43 GMT

As you may have seen, we put out the first public preview release of Sencha Touch 2 today. It only went live a few hours ago but the feedback has been inspiring so far. For the full scoop see the post on the sencha.com blog. A few thoughts on where we are with the product:

Performance

Performance on Android devices in particular is breathtaking. I never thought I'd see the day where I could pick up an Android 2.3 device and have it feel faster than an iPhone 4, and yet that's exactly what Sencha Touch 2 brings to the table. I recorded this short video on an actual device to show real world performance:

Now try the same on Sencha Touch 1.x (or any other competing framework) and (if you're anything like me) cringe at what we were accustomed to using before. That video's cool, but the one that's really driving people wild is the side by side comparison of the layout engines in 1.x and 2.x.

Getting our hands on a high speed camera and recording these devices at 120fps was a lot of fun. Slowing time down to 1/4 of normal speed shows just how much faster the new layout engine is than what we used to have:

The most amazing part here is that we actually finish laying out before the phone's rotation animation has completed. Skipping through the video frame by frame there are at least 5 frames where the app is fully laid out and interactive while the phone's rotation animation is still running. Beating the phone's own rotation speed is the holy grail - it's not possible to make it any faster.

Documentation

I'll admit it, I'm fanatical about great documentation. I'm sure I drive everyone else on the team crazy but I think it's worth it. This is only a preview release but it already contains by far the best, most complete documentation we've ever shipped in an initial release.

In fact, the team has worked so hard on documenting classes that it's probably better than the (already good) Ext JS 4 docs. Naturally, that means it's time to improve the Ext JS documentation further.

We've added some awesome features here - lots of videos, 11 brand new guides and illustrations. My favourite new feature is definitely the inline examples with live previews though - seeing Sencha Touch running live in a phone/tablet right there in the docs is just amazing. Little gems like the live twitter feed in the bottom-most example in the DataView docs really sell just how easy it is to configure these components.

We set a high bar for this though. We've gone from woeful documentation in 1.x to good documentation in 2.x, but what we're shooting for is excellence. We'll continue to round out our content over coming weeks, and have a few new features rolling out soon that will raise the bar once again.

Onwards

We have a few features left to implement, which is why we're calling this a preview and not a beta. Probably the biggest thing now is getting routing/deep linking back into the framework, along with a nice new syntax that I think you'll find really easy to use. We're also missing carousel animations and a handful of other things that will be going back in over the coming weeks. SenchaCon 2011 is just 12 days away now, though, so we'll share more details there.

Finally though, I want to thank everyone who participated in the closed preview phase, and for everyone sending their support and kind words on the blog, the forums and on twitter. We really appreciate all the great feedback and I hope we can exceed your expectations with a fast, polished, gorgeous 2.0 final!

SourceDevCon 2011 - an awesome conference

Tue, 10 May 2011 15:11:42 GMT

The inaugural SourceDevCon just wrapped up in Split, Croatia, so I'd like to share a few thoughts on the last few days. The conference was an enormous success, featuring some great speakers, inspiring presentations and a fantastic group of attendees. Split itself is beautiful, and the weather was equally agreeable. More than a few of us are returning a lot browner/redder than we came.

Day 1

The conference was spread across 3 days - the first two were spent listening and learning across the three concurrent tracks, the third on a boat sailing around the Adriatic Sea. Day one kicked off with my colleagues Aditya and James setting out a little of what to expect from Sencha in 2011 in the opening keynote.

Straight after that I took to the stage to introduce a few of the features of Ext JS 4. My session started a little late and I forgot what I was talking about a couple of times (sorry guys :) ) but I think it turned out well enough. I combined the deadly duo of sleep deprivation and live coding, but with a little help from the audience we were able to stumble through. I think it would make for a good screencast.

Aditya came on next and introduced our new Sencha.IO services, which seem to have garnered a lot of interest. James showed off the new Ext JS 4 theming support using SASS and Compass and Nils Dehl did a great job explaining the Ext.data and Ext.direct packages. Jay Garcia gave a well-received talk on creating extensions and plugins, during which I think a lot of people learned a great deal about how classes work in Ext JS 4. I also very much enjoyed Tomislav Car's investigation into getting Sencha Touch to run on phones other than Androids and iPhones.

Day 1 ended with a long party (I counted at least 8 hours) with inordinate amounts of Croatian beer, which went down very well. It was great meeting so many new people and hearing how much people are getting out of Ext JS and Sencha Touch, as well as what we can improve.

Day 2

Day 2 started in a somewhat hungover fashion with some awesome material from our very own Brian Moeskau, who demonstrated how to use the 3.x -> 4.x compatibility file, upgrading an application from 3.3.1 to 4.0.0 in front of our eyes. Nige (Animal) White demonstrated several of Ext JS's layout managers before giving one of the highlights of the conference in his debugging JavaScript presentation (by contrast he calls the introduction of errors into code "bebugging").

Tobias Uhlig showed off FieldManager, a sports centre management application with a Sencha Touch mobile app, while Mats Bryntse and Brian Moeskau demoed their awesome Scheduler and Calendar components. Josef Sakalos (Saki) spent the afternoon teaching people to use Ext JS 4's new MVC package, which makes writing apps a faster and more enjoyable experience. He is an excellent teacher.

Our host the inimitable Grgur finished things off with a typically heartfelt ending keynote to wrap up the business end of the conference. The evening was a great opportunity to see some of Split and spend an enjoyable meal with Croatian locals Tomislav, Miro and the ever-logical Lucia. Unfortunately their attempts to teach me Croatian did not yield much success.

Day 3

Day 3 was a stroke of genius by Grgur. We took a chartered boat down to the town and looked around the old Roman-era palace. The boat was well stocked with beer and with so many community members in one place the conversation was pretty lively. All of this relaxing was great but with a head full of ideas I'm anxious to get back to California and better tune our products based on all the feedback I received this week.

All of the sessions were recorded on video and I believe they'll be made available in around a month's time. If you can't wait until then, we have a meetup scheduled for May 23rd hosted at Sencha HQ in northern California, to which you're all invited. Just in case you've never been to Split before, perhaps the sight that greeted us when we arrived will prompt you to get yourself on a flight.

Proxies in Ext JS 4

Wed, 02 Feb 2011 08:22:59 GMT

One of the classes that has a lot more prominence in Ext JS 4 is the data Proxy. Proxies are responsible for all of the loading and saving of data in an Ext JS 4 or Sencha Touch application. Whenever you're creating, updating, deleting or loading any type of data in your app, you're almost certainly doing it via an Ext.data.Proxy.

If you've seen January's Sencha newsletter you may have read an article called Anatomy of a Model, which introduces the most commonly-used Proxies. All a Proxy really needs is four functions - create, read, update and destroy. For an AjaxProxy, each of these will result in an Ajax request being made. For a LocalStorageProxy, the functions will create, read, update or delete records from HTML5 localStorage.
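Because the contract is just those four functions, a Proxy is easy to model. Here is a minimal sketch of an in-memory proxy in plain JavaScript. The class name MemoryProxySketch and its (record, callback) signatures are assumptions for illustration, not the Ext.data.Proxy API; callbacks fire synchronously to keep the example simple.

```javascript
// Hypothetical in-memory proxy. The only contract assumed is the four CRUD
// functions; callbacks are called synchronously here to keep it simple.
class MemoryProxySketch {
  constructor() {
    this.rows = new Map();
    this.nextId = 1;
  }

  create(record, callback) {
    const saved = { ...record, id: this.nextId++ };
    this.rows.set(saved.id, saved);
    callback(null, { ...saved });
  }

  read(id, callback) {
    const row = this.rows.get(id);
    if (!row) {
      callback(new Error('not found: ' + id), null);
      return;
    }
    callback(null, { ...row });
  }

  update(record, callback) {
    if (!this.rows.has(record.id)) {
      callback(new Error('not found: ' + record.id), null);
      return;
    }
    this.rows.set(record.id, { ...record });
    callback(null, { ...record });
  }

  destroy(record, callback) {
    const existed = this.rows.delete(record.id);
    callback(existed ? null : new Error('not found: ' + record.id));
  }
}

const proxy = new MemoryProxySketch();
proxy.create({ name: 'Ed Spencer' }, (err, saved) => {
  proxy.update({ ...saved, email: 'ed@sencha.com' }, () => {
    proxy.read(saved.id, (readErr, row) => {
      console.log(row.email); // prints: ed@sencha.com
    });
  });
});
```

A localStorage- or Ajax-backed object exposing the same four functions could replace this one without any change to the calling code.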

Because Proxies all implement the same interface they're completely interchangeable, so you can swap out your data source - at design time or run time - without changing any other code. Although the local Proxies like LocalStorageProxy and MemoryProxy are self-contained, the remote Proxies like AjaxProxy and ScriptTagProxy make use of Readers and Writers to encode and decode their data when communicating with the server.

Whether we are reading data from a server or preparing data to be sent back, usually we format it as either JSON or XML. Both of our frameworks come with JSON and XML Readers and Writers which handle all of this for you with a very simple API.
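To illustrate the division of labor, here is a sketch of a JSON reader/writer pair. The root option mirrors the reader config (root: 'users') used in the Model example, but createJsonReader and createJsonWriter are hypothetical helpers, not the framework's Reader and Writer classes.

```javascript
// Hypothetical JSON reader/writer pair, not Ext.data code.
function createJsonReader({ root }) {
  return {
    // Decode a server response body and pull the record array out of `root`.
    read(responseText) {
      const data = JSON.parse(responseText);
      const records = root ? data[root] : data;
      if (!Array.isArray(records)) {
        throw new Error('expected an array at root "' + root + '"');
      }
      return records;
    }
  };
}

function createJsonWriter({ root }) {
  return {
    // Encode records for the request body, wrapped in `root` if given.
    write(records) {
      return JSON.stringify(root ? { [root]: records } : records);
    }
  };
}

const reader = createJsonReader({ root: 'users' });
const users = reader.read('{"users": [{"id": 1, "name": "Ed Spencer"}]}');
console.log(users[0].name); // prints: Ed Spencer

const writer = createJsonWriter({ root: 'users' });
console.log(writer.write(users)); // prints: {"users":[{"id":1,"name":"Ed Spencer"}]}
```

Keeping encoding and decoding out of the proxy means the same transport can speak JSON or XML just by swapping the reader and writer.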

Using a Proxy with a Model

Proxies are usually used along with either a Model or a Store. The simplest setup is just with a model:

var User = Ext.regModel('User', {
    fields: ['id', 'name', 'email'],
    
    proxy: {
        type: 'rest',
        url : '/users',
        reader: {
            type: 'json',
            root: 'users'
        }
    }
});

Here we've created a User model with a RestProxy. RestProxy is a special form of AjaxProxy that can automatically figure out RESTful URLs for our models. The Proxy that we set up features a JsonReader to decode any server responses - check out the recent data package post on the Sencha blog to see Readers in action.
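"Figuring out RESTful URLs" boils down to mapping each CRUD action to an HTTP method and an address. The sketch below follows common REST conventions; buildRestRequest is a hypothetical helper, and the exact URLs and methods RestProxy produces may differ.

```javascript
// Hypothetical CRUD -> HTTP mapping using common REST conventions;
// not the RestProxy source.
const METHODS = { create: 'POST', read: 'GET', update: 'PUT', destroy: 'DELETE' };

function buildRestRequest(action, baseUrl, id) {
  const method = METHODS[action];
  if (!method) {
    throw new Error('unknown action: ' + action);
  }
  // Existing records are addressed as /users/123; creating a new record
  // (or reading the whole collection) uses /users.
  const hasId = id !== undefined && id !== null;
  const url = hasId ? baseUrl + '/' + encodeURIComponent(id) : baseUrl;
  return { method, url };
}

console.log(buildRestRequest('create', '/users'));       // prints: { method: 'POST', url: '/users' }
console.log(buildRestRequest('read', '/users', 123));    // prints: { method: 'GET', url: '/users/123' }
console.log(buildRestRequest('update', '/users', 123));  // prints: { method: 'PUT', url: '/users/123' }
console.log(buildRestRequest('destroy', '/users', 123)); // prints: { method: 'DELETE', url: '/users/123' }
```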

When we use the following functions on the new User model, the Proxy is called behind the scenes:

var user = new User({name: 'Ed Spencer'});

//CREATE: calls the RestProxy's create function because the user has never been saved
user.save();

//UPDATE: calls the RestProxy's update function because it has been saved before
user.set('email', 'ed@sencha.com');
user.save();

//DESTROY: calls the RestProxy's destroy function
user.destroy();

//READ: calls the RestProxy's read function
User.load(123, {
    success: function(user) {
        console.log(user);
    }
});

We were able to perform all four CRUD operations just by specifying a Proxy for our Model. Notice that the first 3 calls are instance methods, whereas the fourth (User.load) is static on the User model. Note also that you can create a Model without a Proxy; you just won't be able to persist it.
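The choice between create and update is made inside save(). Here is a sketch of that decision in plain JavaScript, assuming (for this sketch only) that a record with no id has never been saved. ModelSketch and createLoggingProxy are hypothetical names, not Ext JS classes.

```javascript
// Hypothetical model base class: save() calls create() for a record that has
// never been persisted (no id yet, an assumption of this sketch) and update()
// afterwards.
class ModelSketch {
  constructor(data, proxy) {
    this.data = { ...data };
    this.proxy = proxy;
  }

  get phantom() {
    return this.data.id === undefined;
  }

  set(field, value) {
    this.data[field] = value; // local change only; nothing is sent until save()
  }

  save() {
    if (!this.proxy) {
      throw new Error('cannot persist a model without a proxy');
    }
    const action = this.phantom ? 'create' : 'update';
    this.data = { ...this.proxy[action]({ ...this.data }) };
    return action;
  }

  destroy() {
    this.proxy.destroy({ ...this.data });
  }
}

// A tiny synchronous proxy that records which CRUD function was called.
function createLoggingProxy() {
  let nextId = 1;
  const calls = [];
  return {
    calls,
    create(row) { calls.push('create'); return { ...row, id: nextId++ }; },
    read(id) { calls.push('read'); return { id }; },
    update(row) { calls.push('update'); return row; },
    destroy() { calls.push('destroy'); }
  };
}

const proxy = createLoggingProxy();
const user = new ModelSketch({ name: 'Ed Spencer' }, proxy);
user.save();                        // create: never saved before
user.set('email', 'ed@sencha.com');
user.save();                        // update: it has an id now
user.destroy();
console.log(proxy.calls);           // prints: [ 'create', 'update', 'destroy' ]
```

Note that set() on its own sends nothing; the update only happens on the following save().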

Usage with Stores

In Ext JS 3.x, most of the data manipulation was done via Stores. A chief purpose of a Store is to hold a local subset of some data plus a delta. For example, you might have 1000 products in your database and have 25 of them loaded into a Store on the client side (the local subset). While operating on that subset, your user may have added, updated or deleted some of the products. Until these changes are synchronized with the server they are known as a delta.
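To make "local subset plus delta" concrete, here is a sketch of a store that tracks unsynchronized changes. DeltaStoreSketch and its added/updated/removed bookkeeping are illustrative assumptions, not how Ext.data.Store is implemented.

```javascript
// Hypothetical client-side store: a local subset of records plus a delta of
// changes not yet synchronized with the server.
class DeltaStoreSketch {
  constructor(records) {
    this.records = records.map((r) => ({ ...r }));
    this.added = new Set();
    this.updated = new Set();
    this.removed = [];
  }

  add(record) {
    const copy = { ...record };
    this.records.push(copy);
    this.added.add(copy);
    return copy;
  }

  set(record, field, value) {
    record[field] = value;
    // A record created locally is still just "added" from the server's view.
    if (!this.added.has(record)) {
      this.updated.add(record);
    }
  }

  remove(record) {
    this.records = this.records.filter((r) => r !== record);
    if (this.added.delete(record)) {
      return; // never reached the server, so there is nothing to destroy
    }
    this.updated.delete(record);
    this.removed.push(record);
  }

  getDelta() {
    return { create: [...this.added], update: [...this.updated], destroy: [...this.removed] };
  }
}

// 25 of the 1000 products loaded locally: the local subset.
const products = Array.from({ length: 25 }, (_, i) => ({ id: i + 1, name: 'Product ' + (i + 1) }));
const store = new DeltaStoreSketch(products);
store.add({ name: 'New product' });
store.set(store.records[0], 'name', 'Renamed');
store.remove(store.records[1]);

const delta = store.getDelta();
console.log(delta.create.length, delta.update.length, delta.destroy.length); // prints: 1 1 1
```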

In order to read data from and sync to the server, Stores also need to be able to call those CRUD operations. We can give a Store a Proxy in the same way:

var store = new Ext.data.Store({
    model: 'User',
    proxy: {
        type: 'rest',
        url : '/users',
        reader: {
            type: 'json',
            root: 'users'
        }
    }
});

We created the exact same Proxy for the Store because that's how our server side is set up to deliver data. Because we'll usually want to use the same Proxy mechanism for all User manipulations, it's usually best to just define the Proxy once on the Model and then simply tell the Store which Model to use. This automatically picks up the User model's Proxy:

//no need to define proxy - this will reuse the User's Proxy
var store = new Ext.data.Store({
    model: 'User'
});

Store invokes the CRUD operations via its load and sync functions. Calling load uses the Proxy's read operation, while sync uses one or more of create, update and destroy depending on the current Store delta.
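The split between load and sync can be sketched as a small function that turns a delta into proxy calls. The load/sync functions below and the one-batched-call-per-operation rule are assumptions made for this sketch, not the Store's actual implementation.

```javascript
// Hypothetical load/sync pair over a proxy with the four CRUD functions.
function load(proxy) {
  return proxy.read(); // load only ever reads
}

function sync(delta, proxy) {
  const performed = [];
  for (const action of ['create', 'update', 'destroy']) {
    const records = delta[action];
    if (records.length > 0) {
      proxy[action](records); // one batched call per kind of change
      performed.push(action);
    }
  }
  return performed;
}

// Recording proxy so the example runs on its own.
const calls = [];
const proxy = {
  create: (records) => calls.push(['create', records.length]),
  read: () => { calls.push(['read', 0]); return []; },
  update: (records) => calls.push(['update', records.length]),
  destroy: (records) => calls.push(['destroy', records.length])
};

console.log(sync({ create: [{ name: 'Tommy Maintz' }], update: [], destroy: [] }, proxy)); // prints: [ 'create' ]
load(proxy);
console.log(calls); // prints: [ [ 'create', 1 ], [ 'read', 0 ] ]
```

An empty delta results in no proxy calls at all, which is why sync is cheap to call after every change.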

//CREATE: calls the RestProxy's create function to create the Tommy record on the server
store.add({name: 'Tommy Maintz'});
store.sync();

//UPDATE: calls the RestProxy's update function to update the Tommy record on the server
store.getAt(0).set('email', 'tommy@sencha.com');
store.sync();

//DESTROY: calls the RestProxy's destroy function
store.remove(store.getAt(0));
store.sync();

//READ: calls the RestProxy's read function
store.load();

Store has used the same CRUD operations on the shared Proxy. In all of the examples above we have used the same RestProxy instance from three different places: statically on our Model (User.load), as Model instance methods (user.save, user.destroy) and via a Store instance (store.load, store.sync).

Of course, most Proxies have their own private methods to do the actual work, but all a Proxy needs to do is implement those four functions to be usable with Ext JS 4 and Sencha Touch. This means it's easy to create new Proxies, as James Pearce did in a recent Sencha Touch example where he needed to read address book data from a mobile phone. Everything he does to set up his Proxy in the article (about a third of the way down) works the same way for Ext JS 4 too.

Introduction to Ext JS 4

Thu, 27 Jan 2011 08:30:20 GMT

At the end of 2010 we capped off an incredible year with SenchaCon - by far the biggest gathering of Sencha developers ever assembled. We descended on San Francisco, 500 strong, and spent an amazing few days sharing the awesome new stuff we're working on, learning from each other, and addressing the web's most pressing problems.

Now, we're proud to release all of the videos from the conference completely free for everyone. You can see a full list on our conference site, where you'll find days' worth of material all about Ext JS 4, Sencha Touch and all of the other treats we're working on at the moment.

Some of the videos in particular stand out for me - Jamie's Charting and Layouts talks were spectacular, as was Rob's Theming Ext JS 4 talk. On the Touch side, Tommy's talks on Performance and Debugging are required viewing, as is Dave Kaneda's characteristically off the cuff Theming talk.

My personal high point was standing in front of all of you and introducing Ext JS 4 and its three core goals - speed, stability and ease of use. I think you're going to love what we've done with the framework in version 4.

If you're so inclined, you can find the slides for this talk on slideshare, and if you can still stand the sound of my voice check out my other presentation on Ext JS 4 Architecture, focusing chiefly on the new data package (slides).

Ext JS 4: The Class Definition Pipeline

Tue, 25 Jan 2011 08:25:00 GMT

Last time, we looked at some of the features of the new class system in Ext JS 4, and explored some of the code that makes it work. Today we're going to dig a little deeper and look at the class definition pipeline - the framework responsible for creating every class in Ext JS 4.

As I mentioned last time, every class in Ext JS 4 is an instance of Ext.Class. When an Ext.Class is constructed, it hands itself off to a pipeline populated by small, focused processors, each of which handles one part of the class definition process. We ship a number of these processors out of the box - there are processors for handling mixins, setting up configuration functions and handling class extension.

The pipeline is probably best explained with a picture. Think of your class starting its definition journey at the bottom left, working its way up the preprocessors on the left hand side and then down the postprocessors on the right, until finally it reaches the end, where it signals its readiness to a callback function:

The distinction between preprocessors and postprocessors is that a class is considered ‘ready’ (i.e. it can be instantiated) after the preprocessors have all been executed. Postprocessors typically perform functions like aliasing the class name to an xtype or back to a legacy class name - things that don't affect the class's behavior.

Each processor runs asynchronously, calling back to the Ext.Class constructor when it is ready - this is what enables us to extend classes that don’t exist on the page yet. The first preprocessor is the Loader, which checks to see if all of the new Class’ dependencies are available. If they are not, the Loader can dynamically load those dependencies before calling back to Ext.Class and allowing the next preprocessor to run. We'll take another look at the Loader in another post.
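The callback-driven hand-off is what lets a step like the Loader pause the pipeline. Here is a minimal sketch of that pattern in plain JavaScript; runPipeline and the two sample processors are hypothetical and only mimic the (cls, data, callback) shape described in this post, not Ext.Class itself.

```javascript
// Hypothetical processor pipeline: each processor receives (cls, data, next)
// and must call next() to hand off to the following one, so a processor can
// pause, e.g. while dependencies load. Not the Ext.Class source.
function runPipeline(processors, cls, data, onReady) {
  let index = 0;
  function next() {
    if (index === processors.length) {
      onReady(cls);
      return;
    }
    const processor = processors[index++];
    processor(cls, data, next);
  }
  next();
}

const log = [];
const processors = [
  // A loader-like step that finishes asynchronously.
  (cls, data, next) => {
    log.push('loader');
    setTimeout(next, 10);
  },
  // A statics-like step that copies static members onto the class.
  (cls, data, next) => {
    log.push('statics');
    Object.assign(cls, data.statics || {});
    delete data.statics;
    next();
  }
];

function MyClass() {}
runPipeline(processors, MyClass, { statics: { idSeed: 1000 } }, (cls) => {
  log.push('ready');
  console.log(log.join(' -> '), cls.idSeed); // prints: loader -> statics -> ready 1000
});
```

Right after runPipeline returns, only the loader step has run; the rest of the pipeline resumes when the loader calls next().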

After running the Loader, the new Class is set up to inherit from the declared superclass by the Extend preprocessor. The Mixins preprocessor takes care of copying all of the functions from each of our mixins, and the Config preprocessor handles the creation of the 4 config functions we saw last time (e.g. getTitle, setTitle, resetTitle, applyTitle - check out yesterday's post to see how the Config preprocessor helps out).

Finally, the Statics preprocessor looks for any static functions that we set up on our new class and makes them available statically on the class. The processors that are run are completely customizable, and it’s easy to add custom processors at any point. Let's take a look at that Statics preprocessor as an example:

//Each processor is passed three arguments - the class under construction,
//the configuration for that class and a callback function to call when the processor has finished
Ext.Class.registerPreprocessor('statics', function(cls, data, callback) {
    if (Ext.isObject(data.statics)) {
        var statics = data.statics,
            name;
        
        //here we just copy each static function onto the new Class
        for (name in statics) {
            if (statics.hasOwnProperty(name)) {
                cls[name] = statics[name];
            }
        }
    }

    delete data.statics;

    //Once the processor's work is done, we just call the callback function to kick off the next processor
    if (callback) {
        callback.call(this, cls, data);
    }
});

//Changing the order that the preprocessors are called in is easy too - this is the default
Ext.Class.setDefaultPreprocessors(['extend', 'mixins', 'config', 'statics']);

What happens above is pretty straightforward. We're registering a preprocessor called 'statics' with Ext.Class. The function we provide is called whenever the 'statics' preprocessor is invoked, and is passed the new Ext.Class instance, the configuration for that class, and a callback to call when the preprocessor has finished its work.

The actual work that this preprocessor does is trivial - it just looks to see if we declared a 'statics' property in our class configuration and if so copies it onto the new class. For example, let's say we want to create a static getNextId function on a class:

Ext.define('MyClass', {
    statics: {
        idSeed: 1000,
        getNextId: function() {
            return this.idSeed++;
        }
    }
});

Because of the Statics preprocessor, we can now call the function statically on the Class (i.e. without creating an instance of MyClass):

MyClass.getNextId(); //1000
MyClass.getNextId(); //1001
MyClass.getNextId(); //1002
//...and so on

Finally, let's come back to that callback at the bottom of the picture above. If we supply one, a callback function is run after all of the processors have run. At this point the new class is completely ready for use in your application. Here we create an instance of MyClass using the callback function, guaranteeing that the dependency on Ext.Window has been honored:

Ext.define('MyClass', {
    extend: 'Ext.Window'
}, function() {
    //this callback is called when MyClass is ready for use
    var cls = new MyClass();
    cls.setTitle('Everything is ready');
    cls.show();
});

That's it for today. Next time we'll look at some of the new features in the part of Ext JS 4 that is closest to my heart - the data package.

Classes in Ext JS 4: Under the hood

Mon, 24 Jan 2011 08:37:22 GMT

Last week we unveiled the brand new class system coming in Ext JS 4. If you haven’t seen the new system in action, I hope you’ll take a look at the blog post on sencha.com and check out the live demo. Today we’re going to dig a little deeper into the class system to see how it actually works.

To briefly recap, the new class system enables us to define classes like this:

Ext.define('Ext.Window', {
    extend: 'Ext.Panel',
    requires: 'Ext.Tool',
    mixins: {
        draggable: 'Ext.util.Draggable'
    },
    
    config: {
        title: "Window Title"
    }
});

Here we’ve set up a slightly simplified version of the Ext.Window class. We’ve set Window up to be a subclass of Panel, declared that it requires the Ext.Tool class and that it mixes in functionality from the Ext.util.Draggable class.

There are a few new things here so we’ll attack them one at a time. The ‘extend’ declaration does what you’d expect - we’re just saying that Window should be a subclass of Panel. The ‘requires’ declaration means that the named classes (just Ext.Tool in this case) have to be present before the Window class can be considered ‘ready’ for use (more on class readiness in a moment).

The ‘mixins’ declaration is a brand new concept when it comes to Ext JS. A mixin is just a set of functions (and sometimes properties) that are merged into a class. For example, the Ext.util.Draggable mixin we defined above might contain a function called ‘startDragging’ - this gets copied into Ext.Window to enable us to use the function in a window instance:

//a simplified Draggable mixin
Ext.define('Ext.util.Draggable', {
    startDragging: function() {
        console.log('started dragging');
    }
});

When we create a new Ext.Window instance now, we can call the function that was mixed in from Ext.util.Draggable:

var win = Ext.create('Ext.Window');
win.startDragging(); //"started dragging"

Mixins are really useful when a class needs to inherit multiple traits but can’t do so easily using a traditional single inheritance mechanism. For example, Ext.Window is a draggable component, as are Sliders, Grid headers, and many other UI elements. Because this behavior crops up in many different places, and not all of those UI elements share a common superclass, it’s not feasible to work the draggable behavior into a single superclass. Creating a Draggable mixin solves this problem - now anything can be made draggable with a couple of lines of code.
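Under the hood, a mixin boils down to copying functions onto a class's prototype. Here is a minimal plain-JavaScript sketch of that idea; applyMixin, Win and Slider are hypothetical names for illustration, and choosing not to overwrite methods the class already defines is a design choice of this sketch rather than documented Ext JS behavior.

```javascript
// Hypothetical helper: copy each mixin function onto the class prototype
function applyMixin(targetClass, mixin) {
  for (const name of Object.keys(mixin)) {
    // keep any method the class already defines itself
    if (!(name in targetClass.prototype)) {
      targetClass.prototype[name] = mixin[name];
    }
  }
  return targetClass;
}

// A simplified Draggable mixin (returns a string instead of logging, so it is easy to check)
const Draggable = {
  startDragging: function () {
    return 'started dragging';
  }
};

// Two unrelated classes with no common superclass
class Win {}
class Slider {}

applyMixin(Win, Draggable);
applyMixin(Slider, Draggable);
```

Both classes now share the dragging behavior without either one inheriting from the other.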

The last new piece of functionality I’ll mention briefly is the ‘config’ declaration. Most of the classes in Ext JS take configuration parameters, many of which can be changed at runtime. In the Ext.Window example above we declared that the class has a ‘title’ configuration, which takes the default value of ‘Window Title’. By setting the class up like this we get 3 methods for free - getTitle, setTitle and resetTitle - plus an optional applyTitle hook:

getTitle - returns the current title
setTitle - sets the title to a new value
resetTitle - reverts the title to its default value (‘Window Title’)
applyTitle - this is a template method that you can choose to define. It is called whenever setTitle is called.

The applyTitle function is the place to put any logic that needs to be called when the title is changed - for example we might want to update a DOM Element with the new title:

Ext.define('Ext.Window', {
    //...as above
    
    config: {
        title: 'Window Title'
    },
    
    //updates the DOM element that contains the window title
    applyTitle: function(newTitle) {
        this.titleEl.update(newTitle);
        
        //returning the value lets the setter store it as the new title
        return newTitle;
    }
});

This saves us a lot of time and code while providing a consistent API for all configuration options: win-win.
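To show roughly what "for free" means, the sketch below generates getter, setter and reset methods from a config object and calls an apply hook whenever one exists. addConfig, ConfigWindow and the underscore-prefixed storage field are assumptions of this sketch, not how Ext JS actually stores config values.

```javascript
// Hypothetical helper: 'title' -> 'Title'
function capitalize(s) {
  return s.charAt(0).toUpperCase() + s.slice(1);
}

// Hypothetical sketch: generate get/set/reset methods for each config entry.
// Values are kept in an underscore-prefixed field (an assumption of this sketch).
function addConfig(cls, config) {
  const proto = cls.prototype;
  for (const name of Object.keys(config)) {
    const suffix = capitalize(name);
    const defaultValue = config[name];
    const field = '_' + name;

    proto['get' + suffix] = function () {
      return field in this ? this[field] : defaultValue;
    };

    proto['set' + suffix] = function (value) {
      const applier = this['apply' + suffix];
      if (typeof applier === 'function') {
        applier.call(this, value); // optional template method, called on every set
      }
      this[field] = value;
      return this;
    };

    proto['reset' + suffix] = function () {
      return this['set' + suffix](defaultValue);
    };
  }
  return cls;
}

class ConfigWindow {
  // The hook from the example above, updating a stand-in for the title element
  applyTitle(newTitle) {
    this.renderedTitle = newTitle;
  }
}

addConfig(ConfigWindow, { title: 'Window Title' });
```

Every config option gets the same three methods, which is where the consistency of the API comes from.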

Digging Deeper

Ext JS 4 introduces 4 new classes to make all this magic work:

Ext.Base - all classes inherit from Ext.Base. It provides basic low-level functionality used by all classes
Ext.Class - a factory for making new classes
Ext.ClassLoader - responsible for ensuring that classes are available, loading them if they aren’t on the page already
Ext.ClassManager - kicks off class creation and manages dependencies

These all work together behind the scenes and most of the time you won’t even need to be aware of what is being called when you define and use a class. The two functions that you’ll use most often - Ext.define and Ext.create - both call Ext.ClassManager under the hood, which in turn utilizes the other three classes to put everything together.

The distinction between Ext.Class and Ext.Base is important. Ext.Base is the top-level superclass for every class ever defined - every class inherits from Ext.Base at some point. Ext.Class represents the class itself - every class you define is an instance of Ext.Class, and a subclass of Ext.Base. To illustrate, let’s say we created a class called MyClass, which doesn’t extend any other class:

Ext.define('MyClass', {
    someFunction: function() {
        console.log('Ran some function');
    }
});

The direct superclass for MyClass is Ext.Base because we didn’t specify that MyClass should extend anything else. If you imagine a tree of all the classes we’ve defined so far, it will look something like this:

This tree bases its hierarchy on the inheritance structure of our classes, and the root is always Ext.Base - that is, every class eventually inherits from Ext.Base. So every item in the diagram above is a subclass of Ext.Base, but every item is also an instance of Ext.Class. Classes themselves are instances of Ext.Class, which means we can easily modify the Class at a later time - for example mixing in additional functionality:

//we can define some mixins at definition time
Ext.define('MyClass', {
    mixins: {
        observable: 'Ext.util.Observable'
    }
});

//it’s easy to add more later too
MyClass.mixin('draggable', 'Ext.util.Draggable');

This architecture opens up new possibilities for dynamic class creation and metaprogramming, which were difficult to pull off in earlier versions.

In the next episode, we’ll look at how the class definition pipeline is structured and how to extend it to add your own features.

Sencha Touch tech talk at Pivotal Labs

Mon, 27 Sep 2010 12:46:58 GMT

I recently gave an introductory talk on Sencha Touch at Pivotal Labs in San Francisco. The guys at Pivotal were kind enough to record it and share it with the world - it's under 30 minutes and serves as a nice, short introduction to Sencha Touch:

UPDATE: Pivotal got acquired and this link broke. The world moved on.

The slides are available on slideshare and include the code snippets I presented. The Dribbble example used in the talk is very similar to the Kiva example that ships with the Sencha Touch SDK, so I recommend checking that out if you want to dive in further.

Using the Ext JS PivotGrid

Thu, 29 Jul 2010 12:10:13 GMT

One of the new components we just unveiled for the Ext JS 3.3 beta is PivotGrid. PivotGrid is a powerful new component that reduces and aggregates large datasets into a more understandable form.

A classic example of PivotGrid's usefulness is in analyzing sales data. Companies often keep a database containing all the sales they have made and want to glean some insight into how well they are performing. PivotGrid gives you the ability to rapidly summarize this large and unwieldy dataset - for example, showing sales counts broken down by city and salesperson.

A simple example

We created an example of this scenario in the 3.3 beta release. Here we have a fictional dataset containing 300 rows of sales data (see the raw data). We asked PivotGrid to break the data down by Salesperson and Product, showing us how they performed over time. Each cell contains the sum of sales made by the given salesperson/product combination in the given city and year.

Let's see how we create this PivotGrid:

var SaleRecord = Ext.data.Record.create([
    {name: 'person',   type: 'string'},
    {name: 'product',  type: 'string'},
    {name: 'city',     type: 'string'},
    {name: 'state',    type: 'string'},
    {name: 'month',    type: 'int'},
    {name: 'quarter',  type: 'int'},
    {name: 'year',     type: 'int'},
    {name: 'quantity', type: 'int'},
    {name: 'value',    type: 'int'}
]);

var myStore = new Ext.data.Store({
    url: 'salesdata.json',
    autoLoad: true,
    reader: new Ext.data.JsonReader({
        root: 'rows',
        idProperty: 'id'
    }, SaleRecord)
});

var pivotGrid = new Ext.grid.PivotGrid({
    title     : 'Sales Performance',
    store     : myStore,
    aggregator: 'sum',
    measure   : 'value',
    
    leftAxis: [
        {dataIndex: 'person',  width: 80},
        {dataIndex: 'product', width: 90}
    ],
    
    topAxis: [
        {dataIndex: 'year'},
        {dataIndex: 'city'}
    ]
});

The first half of this ought to be very familiar - we just set up a normal Record and Store. This is all we need to load our sample data so that it's ready for pivoting. This is all exactly the same code as for our other Store-bound components like Grid and DataView so it's easy to take an existing Grid and turn it into a PivotGrid.

The second half of the code creates the PivotGrid itself. There are 5 main components to a PivotGrid - the store, the measure, the aggregator, the left axis and the top axis. Taking these in turn:

Store - the Store we created above
Measure - the field in the data that we want to aggregate (in this case the sale value)
Aggregator - the function we use to combine data into the cells. See the docs for full details
Left Axis - the fields to break data down by on the left axis
Top Axis - the fields to break data down by on the top axis

The measure and the items in the axes must all be fields from the Store. The aggregator function can usually be passed in as a string - there are 5 aggregator functions built in: sum, count, min, max and avg.
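The calculation behind a pivot can be sketched in a few lines: group rows by their left-axis and top-axis values, collect the measure for each cell, and reduce each cell with the aggregator. The pivot function, the ' / ' key format and the sample rows below are made up for illustration; PivotGrid's own implementation and output format differ.

```javascript
// Aggregators named like the built-in ones, each reducing an array of numbers
const aggregators = {
  sum: values => values.reduce((total, v) => total + v, 0),
  count: values => values.length,
  min: values => Math.min(...values),
  max: values => Math.max(...values),
  avg: values => values.reduce((total, v) => total + v, 0) / values.length
};

// Hypothetical: returns { leftKey: { topKey: aggregatedValue } }
function pivot(rows, { leftAxis, topAxis, measure, aggregator }) {
  const aggregate = typeof aggregator === 'function' ? aggregator : aggregators[aggregator];
  // Join several field values into one key, e.g. ['Ann', 'Ale'] -> 'Ann / Ale'
  const keyOf = (row, fields) => fields.map(field => row[field]).join(' / ');
  const cells = new Map(); // left key -> Map(top key -> array of measure values)

  for (const row of rows) {
    const left = keyOf(row, leftAxis);
    const top = keyOf(row, topAxis);
    if (!cells.has(left)) cells.set(left, new Map());
    const columns = cells.get(left);
    if (!columns.has(top)) columns.set(top, []);
    columns.get(top).push(row[measure]);
  }

  // Reduce every cell's values with the chosen aggregator
  const result = {};
  for (const [left, columns] of cells) {
    result[left] = {};
    for (const [top, values] of columns) {
      result[left][top] = aggregate(values);
    }
  }
  return result;
}

// Made-up rows using a few of the fields from the SaleRecord above
const sales = [
  { person: 'Ann', product: 'Ale', city: 'Paris', year: 2009, value: 10 },
  { person: 'Ann', product: 'Ale', city: 'Paris', year: 2009, value: 5 },
  { person: 'Ann', product: 'Ale', city: 'Rome',  year: 2010, value: 7 },
  { person: 'Bob', product: 'Gin', city: 'Paris', year: 2009, value: 3 }
];

const byPersonAndProduct = pivot(sales, {
  leftAxis: ['person', 'product'],
  topAxis: ['year', 'city'],
  measure: 'value',
  aggregator: 'sum'
});
// byPersonAndProduct['Ann / Ale']['2009 / Paris'] === 15
```

Because everything is computed from rows already in memory, switching to a different aggregator or axis just means calling pivot again with new options.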

Renderers

This is all we need to create a simple PivotGrid; now it's time to look at a few more advanced options. Let's start with renderers. Once the data for each cell has been calculated, the value is passed to an optional renderer function, which takes each value in turn and returns another value. One of the PivotGrid examples shows average heights in feet and inches but the calculated data is in decimal. Here's the renderer we use in that example:

new Ext.grid.PivotGrid({
    store     : myStore,
    aggregator: 'avg',
    measure   : 'height',
    
    //turns a decimal number of feet into feet and inches
    renderer  : function(value) {
        //round to the nearest inch first so we never display 12 inches
        var totalInches = Math.round(value * 12),
            feet        = Math.floor(totalInches / 12),
            inches      = totalInches % 12;
            
        return String.format("{0}' {1}\"", feet, inches);
    },
    //the rest of the config
});

Customising cell appearance

Another one of the PivotGrid examples uses a custom cell style. As with the renderer, each cell has the opportunity to alter itself with a custom function - here's the one we use in the countries example:

new Ext.grid.PivotGrid({
    store     : myStore,
    aggregator: 'avg',
    measure   : 'height',
    
    viewConfig: {
        getCellCls: function(value) {
            if (value < 20) {
                return 'expense-low';
            } else if (value < 75) {
                return 'expense-medium';
            } else {
                return 'expense-high';
            }
        }
    },
    //the rest of the config
});

Reconfiguring at runtime

A lot of PivotGrid's power comes from letting the users of your application summarize datasets any way they want, which is made possible by PivotGrid's ability to reconfigure itself at runtime. Our final example is a PivotGrid that can be reconfigured at runtime; here's how we perform the reconfiguration:

//reconfigure the top axis (the left axis can be changed the same way)
pivot.topAxis.setDimensions([
    {dataIndex: 'city', direction: 'DESC'},
    {dataIndex: 'year', direction: 'ASC'}
]);

pivot.setMeasure('value');
pivot.setAggregator('avg');

pivot.view.refresh(true);

It's easy to change the axes, dimension, aggregator and measure at any time and then refresh the data. The calculations are all performed client side so there is no need for another round-trip to the server when reconfiguring. The example linked above gives an example interface for updating a PivotGrid, though anything that can make the API calls above could be used.

I hope you enjoy the new components in this Ext JS 3.3 beta and look forward to comments and suggestions. Although we're only at beta stage I think the additions are already quite robust so feel free to stress-test them.

Offline Apps with HTML5: A case study in Solitaire

Mon, 21 Jun 2010 08:43:53 GMT

One of my contributions to the newly-launched Sencha Touch mobile framework is the Touch Solitaire game. This is not the first time I have ventured into the dizzying excitement of Solitaire game development; you may remember the wonderful Ext JS Solitaire from 18 months ago. I'm sure you'll agree that the new version is a small improvement.

Solitaire is a nice example of a fun application that can be written with Sencha Touch. It makes use of the provided Draggables and Droppables, CSS-based animations, the layout manager and the brand new data package. The great thing about a game like this though is that it can be run entirely offline. Obviously this is simple with a native application, but what about a web app? Our goal is not just having the game able to run offline, but to save your game state locally too.

The answer comes in two parts:

Web Storage and the Sencha data package

HTML5 provides a brand new API called Web Storage for storing data locally. You can read all about it in my Web Storage post on Sencha's blog, but the summary is that you can store string data locally in the browser and retrieve it later, even if the browser or the user's computer has been restarted in the meantime.

The crucial part of the sentence above is that we can only store string data. In the case of a game of Solitaire we need to store data on the elapsed time and number of moves as well as the location and status of each card. This doesn't sound like the kind of data we want to manually encode into a string, so thankfully the data package comes to the rescue.
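For comparison, here is roughly what storing a game by hand would look like: serialize the state to a JSON string before saving and parse it back when loading. saveGame, loadGame, createMemoryStorage and the game fields are hypothetical; the in-memory object only mimics the getItem/setItem methods of window.localStorage so the sketch can run outside a browser.

```javascript
// Hypothetical helpers; `storage` is anything with the Web Storage getItem/setItem
// methods (window.localStorage in a browser)
function saveGame(storage, key, game) {
  storage.setItem(key, JSON.stringify(game)); // Web Storage only holds strings
}

function loadGame(storage, key) {
  const raw = storage.getItem(key);
  return raw === null ? null : JSON.parse(raw); // getItem returns null for missing keys
}

// Minimal in-memory stand-in for localStorage so the sketch runs outside a browser
function createMemoryStorage() {
  const items = new Map();
  return {
    getItem: key => (items.has(key) ? items.get(key) : null),
    setItem: (key, value) => { items.set(key, String(value)); }
  };
}

const storage = createMemoryStorage();
saveGame(storage, 'solitaire-game', {
  elapsedSeconds: 95,
  moves: 12,
  cards: [{ id: 'AS', pile: 'foundation-1', faceUp: true }]
});
const restored = loadGame(storage, 'solitaire-game');
```

This is manageable for a single object, but every save rewrites the whole string, and keeping track of individual records across many saved games is left entirely to you.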

The Sencha Touch data package is a complete rewrite of the package that has been so successful in powering Ext JS 3.x. It shares many of the same philosophies and adds the lessons we have learned from developing Ext JS 3.x over the past year. One of the new capabilities it offers us is a Local Storage proxy, which automatically marshals your model data into local storage and transparently restores it when you need it.

Using the new proxy is simple - all we need to do is set up a new Store, specifying the Proxy and the Model that will be saved to it. Models are the spiritual successor to Ext JS 3.x's Records. Now whenever we add, remove or update model instances in the store they are automatically saved to localStorage for us. Loading the store again is equally easy:

//set the store up
var gameStore = new Ext.data.Store({
    proxy: new Ext.data.LocalStorageProxy({
        id: 'solitaire-games'
    }),
    model: 'Game'
});

//saves all outstanding modifications, deletions or creations to localStorage
gameStore.sync();

//load our saved games
gameStore.read({
    scope: this,
    callback: function(records) {
        //code to load the first record
    }
});

And just like that we can save and restore games with Web Storage. We can visit our app's webpage, start a game, then come back later and find it automatically restored. But we still can't play offline; for that we need the application cache.

The HTML5 Application Cache Manifest

The application cache is one of the best features of HTML5. It provides a simple (though sometimes frustrating) way of telling the browser about all of the files your application relies on so that it can download them all ready for offline use. All you have to do is create what's known as a manifest file which lists all of the files the application needs - the Solitaire manifest looks like this:

CACHE MANIFEST
#rev49

resources/icon.png
resources/loading.png

resources/themes/wood/board.jpg
resources/themes/wood/cards.png

resources/css/ext-touch.css
resources/solitaire-notheme.css
resources/themes/wood/wood.css
resources/themes/metal/metal.css

ext-touch-debug.js
solitaire-all-debug.js

We tell the browser about the manifest file by pointing to it in the html tag's manifest attribute. When the browser finds this file it downloads each of the listed assets so that they are ready for offline consumption. Note that it does not automatically include them on the page; you still need to do that yourself via the usual link and script tags. Here's a snippet of the Solitaire index.html file:

<!doctype html>
<html manifest="solitaire.manifest">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8">	
        <title>Solitaire</title>

        <link rel="stylesheet" href="resources/css/ext-touch.css" type="text/css">
        <link rel="stylesheet" href="resources/solitaire-notheme.css" type="text/css">
        <link rel="stylesheet" href="resources/themes/wood/wood.css" type="text/css">

        <script type="text/javascript" src="ext-touch-debug.js"></script>
        <script type="text/javascript" src="solitaire-all-debug.js"></script>

Note the manifest file definition in the html element at the top, and the fact that we still include our page resources the normal way. It sounds easy, but without a little setup first it can be a very frustrating experience. Usually your browser will try to cache as many files as possible, including the manifest file itself - we don't want this. As soon as your browser has a long-term cache of the manifest file it is extremely difficult to update your application - all of the files are already offline and won't be updated, and the browser won't even ask the server for an updated manifest file.

Preventing this behaviour turns out to be fairly easy, and the solution in its simplest form comes in the shape of a .htaccess file with contents like the following:

<Files solitaire.manifest> 
    ExpiresActive On 
    ExpiresDefault "access" 
</Files>

This directs Apache to tell the browser not to cache the manifest file at all, instead requesting the file from the server on every page load. Note that if the device is currently offline it will use the last manifest file it received.

This is half the battle won, but let's say you change one of your application files and reload - you'll find nothing happened. This is because when your browser asked the server for the manifest file it actually asked if the file had changed or not. As the manifest itself wasn't updated, the server responds with a 304 (Not Modified) and your browser keeps the old file.

To make the browser pick up on the change to the application file you need to update the manifest file itself. This is where the mysterious "#rev49" comes in on the manifest example file above. This is a suggestion from the excellent diveintohtml5 article on the subject - whenever you change any application files just bump up the revision number in the manifest file and your browser will know to download the updated files.
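Bumping the number by hand is easy to forget, so it can be worth scripting. This is a hypothetical helper, not part of any tool mentioned here; it assumes the manifest contains a line of the exact form #revNN, as in the example above.

```javascript
// Hypothetical helper: increments the "#revNN" comment line in a cache manifest
function bumpRevision(manifest) {
  let found = false;
  // the m flag makes ^ and $ match at line boundaries
  const updated = manifest.replace(/^#rev(\d+)$/m, (match, digits) => {
    found = true;
    return '#rev' + (Number(digits) + 1);
  });
  if (!found) {
    throw new Error('No #rev line found in manifest');
  }
  return updated;
}

const before = 'CACHE MANIFEST\n#rev49\n\nresources/icon.png\n';
const after = bumpRevision(before); // '#rev49' becomes '#rev50'
```

In a build step you might read the manifest with Node's fs.readFileSync, pass the text through bumpRevision and write it back with fs.writeFileSync.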

One final detail is that your Apache server probably isn't set up to serve manifest files with the correct MIME type, so be sure to add the following line to your Apache config and restart the server:

AddType text/cache-manifest .manifest

Wrapping it up

Offline access is a big deal for mobile apps and Sencha Touch makes them much easier to write. The benefit is not so much that the apps can run without an internet connection (many modern touch devices have a near-permanent connection to the internet already), but that web apps can now be treated as first-class citizens alongside native apps.

The fact that many devices allow your users to save your app to their home screen and load it as though it were native is an important step - you keep all of the advantages of web app deployment while gaining some of the benefits of native apps. As more and more native hardware APIs become available to web apps their importance will only grow.

If you want to check out Solitaire's offline support for yourself, visit the application's site and save it to your iPad's home screen. Try turning on airplane mode, then load the app and see how it behaves as though it were native. If you don't have an iPad, you can load the app in up-to-date versions of Chrome or Safari and get a similar experience.

Writing Compressible JavaScript

Sat, 19 Jun 2010 15:02:38 GMT

Writing a library is a balancing act between the (sometimes competing) interests of API clarity, code clarity, performance and compressibility. In this article I'm going to detail three of the approaches we take to meet this balance and suggest them for your own usage.

1. Collecting var statements

Every separate var statement adds 4 bytes ('var ') to the compressed file size. Variables are declared sufficiently often that this can really add up, so instead of this:

var myFirstVar = 'something'; 
var myOtherVar = 'another thing'; 
var answer = 42; 
var adama = true;

One should use this form:

var myFirstVar = 'something', 
    myOtherVar = 'another thing', 
    answer = 42, 
    adama = true;

When this code is compressed, each variable name above is turned into a single-letter name, meaning that the wasted 4 bytes per useless additional 'var ' in the first example would have contributed significantly to code size with no benefit.
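The saving is easy to check by hand. The strings below are hand-written minified versions of the two snippets above, assuming the minifier renames the four variables to a through d (which it does for local variables inside a function); only the number of var keywords differs.

```javascript
// Hand-minified versions of the two snippets, with variables renamed a-d
const separate = "var a='something';var b='another thing';var c=42;var d=true;";
const combined = "var a='something',b='another thing',c=42,d=true;";

// Both forms use one separator per declaration (';' or ','), so the difference
// is exactly the three extra 'var ' keywords: 3 * 4 = 12 bytes
const saved = separate.length - combined.length;
```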

2. Local variable pointers to object properties

The following code (pruned from a previous version of Ext JS) is not as compressible as it could be:

var cs = {}; 
for (var n in this.modified) { 
    if (this.modified.hasOwnProperty(n)) { 
        cs[n] = this.data[n]; 
    } 
} 
return cs;

We're better off aliasing 'this.modified' to a local variable first. Aside from the performance benefits some JS engines derive from not having to perform object property lookups over and over again, we save precious bytes this way too:

var modified = this.modified, 
    changes  = {}, 
    field; 

for (field in modified) { 
    if (modified.hasOwnProperty(field)) { 
        changes[field] = this.data[field]; 
    } 
}

return changes;

Again, the minifier will compress those variable names down to a single character each, so for our 'this.modified' example we use 15 bytes to declare the variable (a=this.modified) plus 1 byte each time we use it, totalling 17, vs the 26 bytes for the two references previously. This approach scales especially well - were we to refer to this.modified a third time in the function, our code would minify to 18 bytes, vs 39 without the variable declaration.

Side note: in the first example here the variable 'n' was being used in the for...in loop. We can always safely exchange that for a meaningful variable name (in this case 'field') and leave the rest to the minifier.
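The byte arithmetic for the 'this.modified' example fits a simple cost model: without an alias every use costs the full 13 bytes of this.modified, while with an alias we pay once for the declaration a=this.modified and then 1 byte per use. This is a rough sketch that ignores the comma joining the declaration to an existing var statement.

```javascript
const expr = 'this.modified'; // 13 bytes that a minifier leaves as-is

// Bytes spent when the expression is repeated verbatim
function costWithoutAlias(uses) {
  return expr.length * uses;
}

// Bytes spent when it is first aliased to a one-letter local
function costWithAlias(uses) {
  return ('a=' + expr).length + uses;
}

for (const uses of [1, 2, 3]) {
  console.log(uses, costWithoutAlias(uses), costWithAlias(uses));
}
// 1 13 16
// 2 26 17
// 3 39 18
```

Note that with a single use the alias costs more than it saves; the technique only pays off from the second reference onwards.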

3. Aliasing 'this' to 'me'

In Ext.data.Record's markDirty method we have the following code:

this.dirty = true; 
if(!this.modified){ 
    this.modified = {}; 
} 
this.fields.each(function(f) { 
    this.modified[f.name] = this.data[f.name]; 
},this);

There are 7 references to 'this' in that method, taking 28 bytes and completely incompressible. We could rewrite it like this (note that the final 'this' can be removed in this format):

var me = this; 

me.dirty = true; 
if (!me.modified) { 
    me.modified = {}; 
} 
me.fields.each(function(f) { 
    me.modified[f.name] = me.data[f.name]; 
});

Again, our minifier will change the 'me' var to a single character, saving us 8 bytes in this instance. That might not sound like a lot but after minification it equates to a 7% reduction in code size for this function.

In each of the cases above we're generally talking about single-digit percentage savings after minification. There is value in that small slice though, especially as more and more applications shift onto bandwidth-constrained mobile platforms.

The first two approaches are no-brainers and must always be done but the third is slightly more controversial. Personally I find it makes the code a little harder to read, largely because my syntax highlighter doesn't recognise that 'me' is now the same as 'this'. Its value also varies significantly by function - some functions can contain over a dozen references to 'this', in which case this approach makes a big difference.

Ext JS 3.2 beta out today

Tue, 09 Mar 2010 11:19:24 GMT

We pushed out a beta release of Ext JS 3.2 this morning. Although we've marked it as beta, it's a pretty solid release and we expect to release a final version shortly. The DataView transitions are especially fun - watch this space for a fuller example...

Here's a quick rundown of the features we added:

One of the big projects we've undertaken that most people probably won't find so exciting is ramping up our internal QA efforts. Our unit test coverage has increased dramatically in the past couple of months, and we've built infrastructure to run all of our tests on every browser/OS we support in a fully automated fashion. Doing TDD on Ext JS is an awesome feeling.

I'll talk more in the future about what we're doing internally to ensure the quality of our code, framework performance and rendering.

Answering Nicholas Zakas' JavaScript quiz

Tue, 16 Feb 2010 12:41:17 GMT

A current meme that's floating around the JavaScript geek corner of the internet is setting quizzes on some of the more unusual aspects of JavaScript. This time round Nicholas Zakas is providing the entertainment, so I thought I'd provide some answers. Let's get started:

Question 1

Question 1 looks like this:


var num1 = 5,
    num2 = 10,
    result = num1+++num2;

We're asked what the values of result, num2 and num1 are. First, let's deconstruct what that +++ is doing. There is no +++ operator in JavaScript - instead we have a num1++ followed by a + num2.

JavaScript has two ways of incrementing a number by 1 - we can either put the ++ before the variable or after it. The variable is incremented either way - the only difference is what the expression returns. If a is 10, ++a returns 11, whereas a++ returns 10:


var a = 10;

var b = a++; //a is set to 11 now, but b is set to 10
var c = ++a; //a is set to 12 now, c is also set to 12

So 'result' is the sum of num1++ (which is 5) and num2 (which is 10), so result equals 15. num2 remains at 10 as it was not modified. num1 is now equal to 6 because we incremented it by 1, though the increment did not affect the sum assigned to result.
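The tokenizer greedily takes the longest operator it can, so +++ splits as ++ followed by + rather than + followed by ++. Writing both groupings out with explicit spacing shows the difference:

```javascript
var num1 = 5,
    num2 = 10,
    result = num1+++num2; // parsed as (num1++) + num2

// The same grouping, spelled out
var a = 5, b = 10;
var postThenAdd = a++ + b; // 5 + 10 = 15, then a becomes 6

// The other possible grouping, which the parser does NOT choose here
var c = 5, d = 10;
var addThenPre = c + ++d;  // d becomes 11 first, so 5 + 11 = 16
```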

Question 2


var x = 5,
    o = {
        x: 10,
        doIt: function doIt(){
            var x = 20;
            setTimeout(function(){
                alert(this.x);
            }, 10);
        }
    };
o.doIt();

We're asked what is alerted. This is mostly smoke and mirrors - there's some indirection with all the duplicate names but the important thing here is the setTimeout. The function we pass to setTimeout gets run in the global scope, meaning 'this' refers to the window object. Declaring x as a variable in the global scope (var x = 5) is the same as setting window.x = 5, so 5 is alerted.
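In a browser, setTimeout calls the function without a receiver, so inside a non-strict callback this falls back to the global object. The sketch below reproduces the same situation synchronously with a plain call, using strict mode so the lost receiver shows up as undefined, and shows two common ways to keep o as this: an arrow function and Function.prototype.bind. The method names are made up for illustration.

```javascript
var o = {
  x: 10,
  detached: function () {
    'use strict'; // inner functions are strict too, so a bare call gets this === undefined
    var f = function () { return this; };
    return f();   // called with no receiver, like a setTimeout callback
  },
  withArrow: function () {
    var f = () => this.x; // arrow functions reuse the surrounding this
    return f();
  },
  withBind: function () {
    var f = function () { return this.x; }.bind(this); // bind fixes this to o
    return f();
  }
};
```

Applied to the quiz, passing an arrow function to setTimeout inside doIt would alert 10 instead of 5.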

Question 3


var num1 = "10",
    num2 = "9";

We're asked:

What is the value of num1 < num2?
What is the value of +num1 < num2?
What is the value of num1 + num2?
What is the value of +num1 + num2?

This question is all about type casting.

num1 < num2 is true - if both operands are strings JavaScript compares them character by character, and because "1" comes before "9", "10" is lower than "9"
+num1 < num2 is false - placing a "+" operator before a string casts it into a number, so we're actually testing 10 < "9". When testing a mixture of numbers and strings like this, everything is cast into a number, so we're testing 10 < 9, which is false
num1 + num2 === "109" - the plus sign can mean both addition and concatenation, depending on the operand types. Here we have 2 strings so we're concatenating them together
+num1 + num2 === "109" also - again we're casting num1 into a number, but the + operator means concatenation if at least one operand is a string

The confusion around this comes largely from the fact that the plus sign is used for both addition and concatenation in JavaScript, so the engine has to check the type of each operand and cast accordingly. All of the other math operators (e.g. -, /, * and %) cast both operands to numbers.
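All four answers, plus a couple of the other operators for contrast, can be checked directly:

```javascript
var num1 = "10",
    num2 = "9";

var checks = {
  stringCompare: num1 < num2,   // true: compared as strings, "1" < "9"
  numericCompare: +num1 < num2, // false: "9" is converted, so 10 < 9
  concat: num1 + num2,          // "109": two strings are concatenated
  mixedPlus: +num1 + num2,      // "109": one string operand means concatenation
  minus: num1 - num2,           // 1: "-" always converts to numbers
  times: num1 * num2            // 90: so does "*"
};
```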

Question 4


var message = "Hello world!";

We're asked:

What is the value of message.substring(1, 4)?
What is the value of message.substr(1,4)?

substring and substr do similar things. The first argument to each is the character index to start from, but whereas substring's second argument is the character index to end at, substr's second argument is the number of characters to return. Therefore message.substr(1, 4) will return a string of length 4, whereas message.substring(1, 4) will return a string of length 3 (4 - 1):

message.substring(1, 4); //"ell"
message.substr(1, 4); //"ello"
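substr is a legacy method (it is specified in Annex B of the ECMAScript specification), so newer code often uses slice instead. When both arguments are non-negative, substr(start, length) can be rewritten as slice(start, start + length):

```javascript
var message = "Hello world!";

var viaSubstring = message.substring(1, 4); // "ell"  - end index 4 is exclusive
var viaSubstr = message.substr(1, 4);       // "ello" - 4 characters from index 1
var viaSlice = message.slice(1, 1 + 4);     // "ello" - same result as substr(1, 4)
```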

Question 5


var o = {
        x: 8,

        valueOf: function(){
            return this.x + 2;
        },
        toString: function(){
            return this.x.toString();
        }
    },
    result = o < "9";
alert(o);

We're asked the value of 'result', and what gets alerted. This requires an understanding of the special valueOf and toString functions available on every object. These functions are used internally by the JavaScript engine to pull out the best representation of an object's value based on the situation.

When alerting a value, we want a string representation so toString is called. When comparing the value to another object, valueOf is called instead. So alert(o) alerts "8", and result is set equal to the result of 10 < "9". The JavaScript engine will decide when to use which option, or we can specify it ourselves:

var num1 = 8, num2 = 9;

num1 + num2; //17
num1.toString() + num2.toString(); //"89"
num1.valueOf() + num2.valueOf(); //17

The 'result' assignment needs a little explanation. First, the engine calls valueOf on the object, which returns 10. Second, because one of the operands to the < operator is a number, the other is also cast into a number, so we are testing 10 < 9, which returns false. We could instead force it to use toString: o.toString() < "9" returns true.
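The same object converts differently depending on how it is used. alert(o) and String(o) ask for a string, so toString wins; the + operator asks for a "default" primitive, which for ordinary objects tries valueOf first, so concatenating o with a string gives "10" rather than "8":

```javascript
var o = {
  x: 8,
  valueOf: function () { return this.x + 2; },
  toString: function () { return this.x.toString(); }
};

var result = o < "9";            // false: valueOf gives 10, "9" becomes 9
var asString = String(o);        // "8": string conversion prefers toString
var viaPlus = o + "";            // "10": + tries valueOf first
var forced = o.toString() < "9"; // true: "8" < "9" as strings
```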

Quizzes like this are great for getting your teeth into some of the guts of JavaScript, but don't mistake them for a good way to write code. The point is to demonstrate how quirky JS code can be unless you write it in a sensible way.

Jaml updates

Fri, 29 Jan 2010 01:56:43 GMT

Jaml seems to have been getting a lot of interest lately. Here are a few quick updates on what's been going on:

Tom Robinson added support for CommonJS
Eneko Alonso ported the project to MooTools, creating mooml
Carl Furrow wrote up a nice comparison on Jaml and EJS
Jaml is now a rendering option in JavaScriptMVC, along with John Resig's microtemplates
Andrew Dupont committed a series of patches, including efficiency improvements and an option to remove the 'with' and 'eval' magic

In addition Jaml was recently picked up by Ajaxian, and a couple of people have written up blog posts about Jaml in languages other than English, which is great to see.

Jaml is up on Github and has a number of forks already. If you like the library and have something to add, fork away and send me a pull request!

If you've never seen Jaml before or have forgotten what it does, it turns this:

div(
  h1("Some title"),
  p("Some exciting paragraph text"),
  br(),

  ul(
    li("First item"),
    li("Second item"),
    li("Third item")
  )
);

Into this:

<div>
  <h1>Some title</h1>
  <p>Some exciting paragraph text</p>
  <br />
  <ul>
    <li>First item</li>
    <li>Second item</li>
    <li>Third item</li>
  </ul>
</div>

See the original post for more details.
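The core idea can be sketched with one function per tag that joins its children into an HTML string. This is a hypothetical toy, not Jaml's implementation: Jaml itself does considerably more, while makeTag below produces unindented output and does no HTML escaping.

```javascript
// Hypothetical toy: build a function for one tag that wraps its children
function makeTag(name, selfClosing) {
  return function (...children) {
    if (selfClosing) {
      return '<' + name + ' />';
    }
    return '<' + name + '>' + children.join('') + '</' + name + '>';
  };
}

const div = makeTag('div'),
      h1  = makeTag('h1'),
      p   = makeTag('p'),
      br  = makeTag('br', true),
      ul  = makeTag('ul'),
      li  = makeTag('li');

const html = div(
  h1('Some title'),
  p('Some exciting paragraph text'),
  br(),
  ul(
    li('First item'),
    li('Second item'),
    li('Third item')
  )
);
```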

Ext JS is looking for a QA rockstar

Tue, 19 Jan 2010 00:05:18 GMT

This has been cross-posted from our Open Discussion Forum.

As part of our ambition of creating the world's best JavaScript framework, we're looking to hire a special somebody to help maintain the high quality of our components.

While we have one eye on implementing new features and improving Ext JS's performance, the other is on making sure what we already have still works well.

This is a difficult job and we need someone smart, focused and well versed in Ext JS. Somebody who will:

Use our existing systems to test components as new builds of the library are landed
Maintain a strong presence in the forums and be the first to know of any reported issues
Respond to bug tickets such as rendering issues and broken functionality
Totally own the Quality Assurance of Ext JS - we want your ideas and your initiative as well as your expertise with Ext
Liaise with the core team on a daily basis

This is a full-time position, though allowances can be made for the right person. If you think you would enjoy working with Ext JS, and have what it takes to help us keep Ext at the forefront of our field, drop me a private message with the following information:

Your name
Email address
Location (city, country, timezone)
All experience with Ext JS
Bonus points for links to open source software

2010: The year Ext JS takes over

Wed, 13 Jan 2010 09:12:07 GMT

On January 1st 2010 I officially joined Ext JS to take over the role of lead developer. After living and breathing Ext for the last 3 years I am delighted to have joined the company itself. Ext JS has led the way in developing rich client side applications since the very first release; this is a tradition we will continue and build upon.

2010 is going to be an extremely exciting year for Ext JS. A new focus is being placed on helping developers create their applications much more quickly, with the help of advanced creation tools and a standardised application architecture right out of the box.

We will continue the performance improvements started in 3.1 to make sure that Ext applications really fly. Ext JS 3.2 will be the fastest, most stable version ever released.

2010 is also the year that Ext JS becomes much easier to learn. With a completely reinvented learning section, Ext will no longer take months to learn and understand - even our API documentation will get a facelift.

The upcoming Marketplace will be the perfect venue to find and share new, high quality components created by our awesome developer community. Think of the Marketplace as the App Store for Ext JS - full of great offerings that are easy to drop into any application.

Calling all able-minded Ext JS developers

Ext JS is already the best JavaScript library in the world for creating rich, desktop-quality applications on the web. If you want to help us make it even better, I want to hear from you.

As well as creating new components and improving our application support, we need people to help us maintain the quality and stability of what we already have. If you're intimate with Ext and think you have what it takes to get involved, drop me a PM and introduce yourself.

OSX Screensaver emulation with Canvas: That's Bean

Sun, 06 Dec 2009 18:50:31 GMT

OS X has a pretty little screensaver which takes a bunch of images and 'drops' them, spinning, onto the screen. Think of it like scattering photographs onto a table, one at a time.

Naturally, there's a desperate need for a JavaScript/Canvas port of this functionality, resulting in the following:

I had to limit the framerate of the video capture, so the video looks less smooth than it actually is. Check it out running in your own browser here.

For obvious reasons I have called the code behind this Bean, and it's all available up on Github.

For the curious, here's a little explanation about how it works. Bean starts off with a blank canvas and a list of image urls, which it preloads before getting started. It then drops one image at a time, rotating it as it goes. Each falling image is called a Plunger, because it plunges.

Each Plunger gets a random position and rotation to end up in, and takes care of drawing itself to the canvas on each frame by calculating its current size and rotation as it falls away from you.

Drawing each Plunger image on every frame quickly starts to kill the CPU, so we take a snapshot every time a Plunger has finished its descent. This just entails drawing the completed Plungers first and then using Canvas' getImageData API to grab the pixel data for the resulting image.

This gives us a snapshot of all of the fallen Plungers, meaning we can just draw a single background image and the currently falling Plunger on each frame. This approach ensures the performance remains constant, as we are only ever drawing a maximum of 2 images per frame. Each time a Plunger finishes its descent a new snapshot is taken.
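The snapshot idea can be sketched independently of Bean. This is not Bean's actual code. It assumes `ctx` is anything implementing the CanvasRenderingContext2D methods used (clearRect, putImageData, getImageData). `drawPlunger` is a hypothetical callback that draws one image at its current size and rotation.

```javascript
// Sketch of the snapshot strategy, not Bean's source. `drawPlunger` is a
// hypothetical helper that draws a single Plunger onto the context.
function createSnapshotRenderer(ctx, width, height, drawPlunger) {
  var snapshot = null; // ImageData holding every Plunger that has landed

  return {
    // Called on every frame while a Plunger is falling: at most one
    // background blit plus one falling image, however many have landed
    renderFrame: function (fallingPlunger) {
      ctx.clearRect(0, 0, width, height);
      if (snapshot) ctx.putImageData(snapshot, 0, 0);
      drawPlunger(ctx, fallingPlunger);
    },

    // Called once when a Plunger finishes: draw it in its final position
    // on top of the old snapshot, then capture the result as the new one
    plungerLanded: function (landedPlunger) {
      ctx.clearRect(0, 0, width, height);
      if (snapshot) ctx.putImageData(snapshot, 0, 0);
      drawPlunger(ctx, landedPlunger);
      snapshot = ctx.getImageData(0, 0, width, height);
    }
  };
}
```

In a browser you would pass `canvas.getContext('2d')`. Any object with the same three methods works, which makes the strategy easy to check without a real canvas.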

Bean attempts to draw a new frame roughly 25 times per second and modern browsers seem to handle this pretty well. Safari pulls around 60% of one core on my MacBook Pro, with Firefox somewhat less performant. Needless to say, I didn't even bother trying to make this work with IE.

Here's the code to set the Bean in motion. This is using a few bundled APOD images:

var bean = new Bean({
  imageUrls: [
    'images/DoubleCluster_cs_fleming.jpg',
    'images/NGC660Hagar0_c900.jpg',
    'images/filaments_iac.jpg',
    'images/m78wide_tvdavis900.jpg',
    'images/sunearthpanel_sts129.jpg',
    'images/NGC253_SSRO_900.jpg',
    'images/Ophcloud_spitzer_c800.jpg'
  ],
  canvasId : 'main',
  fillBody : true
});

bean.onReady(function(bean) {
  bean.start();
});

Ext.ux.Exporter - export any Grid to Excel or CSV

Tue, 24 Nov 2009 09:32:33 GMT

Sometimes we want to print things, like grids or trees. The Ext JS printing plugin is pretty good for that. But what if we want to export them instead? Enter Ext.ux.Exporter.

Ext.ux.Exporter allows any store-based component (such as grids) to be exported, locally, to Excel or any other format. It does not require any server side programming - the export document is generated on the fly, entirely in JavaScript.

The extension serves as a base for exporting any kind of data, but comes bundled with a .xls export formatter suitable for exporting any Grid straight to Excel. Here's how to do that:

var grid = new Ext.grid.GridPanel({
  store: someStore,
  tbar : [
    {
      xtype: 'exportbutton',
      store: someStore
    }
  ],
  //your normal grid config goes here
});

Clicking the Download button in the top toolbar iterates over the data in the store and creates an Excel file locally, before Base64 encoding it and redirecting the browser via a data url. If you have Excel or a similar program installed your browser should ask you to save the file or open it with Excel.

I put together a quick example of the plugin in action inside the repository; just clone or download the code and drag the examples/index.html file into your browser to run it.

The Exporter will work with any store or store-based component. It also allows export to any format - for example CSV or PDF. Although the Excel Formatter is probably the most useful, implementing a CSV or other Formatter should be trivial - check out the Excel Formatter example in the ExcelFormatter directory.
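The Exporter's Formatter interface isn't shown here, so this is a standalone sketch rather than a drop-in Formatter. It turns rows into CSV, then Base64-encodes the text into a data URL, as described for the Excel download above. The `columns` shape ({header, dataIndex}) and the function names are hypothetical.

```javascript
// Quote a cell the usual CSV way (as in RFC 4180): if it contains a comma,
// double quote or line break, wrap it in quotes and double inner quotes
function csvCell(value) {
  var text = value == null ? "" : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical inputs: `columns` is a list of {header, dataIndex} pairs,
// loosely modelled on grid column configs; `rows` are plain data objects
function formatCsv(columns, rows) {
  var lines = [columns.map(function (col) { return csvCell(col.header); }).join(",")];
  rows.forEach(function (row) {
    lines.push(columns.map(function (col) { return csvCell(row[col.dataIndex]); }).join(","));
  });
  return lines.join("\r\n");
}

// btoa only accepts Latin-1 characters, so encode the text as UTF-8 first
function toBase64(text) {
  var binary = "";
  new TextEncoder().encode(text).forEach(function (byte) {
    binary += String.fromCharCode(byte);
  });
  return btoa(binary);
}

function toCsvDataUrl(columns, rows) {
  return "data:text/csv;base64," + toBase64(formatCsv(columns, rows));
}
```

Pointing the browser at the returned URL should prompt a download or open the file in a spreadsheet program, the same way the Excel export behaves.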

Jaml: beautiful HTML generation for JavaScript

Wed, 04 Nov 2009 12:03:15 GMT

Generating HTML with JavaScript has always been ugly. Hella ugly. It usually involves writing streams of hard-to-maintain code which just concatenates a bunch of strings together and spits them out in an ugly mess.

Wouldn't it be awesome if we could do something pretty like this:

div(
  h1("Some title"),
  p("Some exciting paragraph text"),
  br(),

  ul(
    li("First item"),
    li("Second item"),
    li("Third item")
  )
);

And have it output something beautiful like this:

<div>
  <h1>Some title</h1>
  <p>Some exciting paragraph text</p>
  <br />
  <ul>
    <li>First item</li>
    <li>Second item</li>
    <li>Third item</li>
  </ul>
</div>

With Jaml, we can do exactly that. Jaml is a simple library inspired by the excellent Haml library for Ruby. It works by first defining a template using an intuitive set of tag functions, and then rendering it to appear as pretty HTML. Here's an example of how we'd do that with the template above:

Jaml.register('simple', function() {
  div(
    h1("Some title"),
    p("Some exciting paragraph text"),
    br(),

    ul(
      li("First item"),
      li("Second item"),
      li("Third item")
    )
  );
});

Jaml.render('simple');

All we need to do is call Jaml.register with a template name and the template source. Jaml stores the template so that we can render it later using Jaml.render(). Rendering with Jaml gives us the nicely formatted, indented HTML displayed above.
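To show the general idea behind tag functions plus a register/render pair, here is a hypothetical mini-implementation. It is not Jaml's source. Real Jaml makes div(), h1() and friends available inside the template for you; here they are ordinary functions we define ourselves. This sketch's output is unindented, and values are not HTML-escaped.

```javascript
// Hypothetical mini-Jaml, NOT Jaml's source
function makeTag(name, selfClosing) {
  return function (...args) {
    // An optional leading plain object holds the attributes
    let attrs = {};
    if (args.length && typeof args[0] === "object" && args[0] !== null) {
      attrs = args.shift();
    }
    const attrText = Object.keys(attrs)
      .map(key => ` ${key === "cls" ? "class" : key}="${attrs[key]}"`)
      .join("");
    return selfClosing
      ? `<${name}${attrText} />`
      : `<${name}${attrText}>${args.join("")}</${name}>`;
  };
}

const div = makeTag("div"), h1 = makeTag("h1"), p = makeTag("p"),
      ul = makeTag("ul"), li = makeTag("li"), br = makeTag("br", true);

// register/render mirror the pair described above; passing an array
// renders the template once per item and joins the results
const templates = {};
function register(name, fn) { templates[name] = fn; }
function render(name, data) {
  const template = templates[name];
  return Array.isArray(data) ? data.map(item => template(item)).join("") : template(data);
}

register("simple", () =>
  div(
    h1("Some title"),
    p("Some exciting paragraph text"),
    br(),
    ul(li("First item"), li("Second item"), li("Third item"))
  )
);

console.log(render("simple"));
```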

So we've got a nice way of specifying reusable templates and then rendering them prettily, but we can do more. Usually we want to inject some data into our template before rendering it - like this:

Jaml.register('product', function(product) {
  div({cls: 'product'},
    h1(product.title),

    p(product.description),

    img({src: product.thumbUrl}),
    a({href: product.imageUrl}, 'View larger image'),

    form(
      label({'for': 'quantity'}, "Quantity"),
      input({type: 'text', name: 'quantity', id: 'quantity', value: 1}),

      input({type: 'submit', value: 'Add to Cart'})
    )
  );
});

In this example our template takes an argument, which we've called product. We could have called this anything, but in this case the template is for a product in an ecommerce store so product makes sense. Inside our template we have access to the product variable, and can output data from it.

Let's render it with a Product from our database:

//this is the product we will be rendering
var bsg = {
  title      : 'Battlestar Galactica DVDs',
  thumbUrl   : 'thumbnail.png',
  imageUrl   : 'image.png',
  description: 'Best. Show. Evar.'
};

Jaml.render('product', bsg);

The output from rendering this template with the product looks like this:

<div class="product">
  <h1>Battlestar Galactica DVDs</h1>
  <p>Best. Show. Evar.</p>
  <img src="thumbnail.png" />
  <a href="image.png">View larger image</a>
  <form>
    <label for="quantity">Quantity</label>
    <input type="text" name="quantity" id="quantity" value="1"></input>
    <input type="submit" value="Add to Cart"></input>
  </form>
</div>

Cool - we've got an object oriented declaration of an HTML template which is cleanly separated from our data. How about we define another template, this time for a category which will contain our products:

Jaml.register('category', function(category) {
  div({cls: 'category'},
    h1(category.name),
    p(category.products.length + " products in this category:"),

    div({cls: 'products'},
      Jaml.render('product', category.products)
    )
  );
});

Our category template references our product template, achieving something rather like a partial in Ruby on Rails. This obviously allows us to keep our templates DRY and to easily render a hypothetical Category page like this:

//here's a second product
var snowWhite = {
  title      : 'Snow White',
  description: 'not so great actually',
  thumbUrl   : 'thumbnail.png',
  imageUrl   : 'image.png'
};

//and a category
var category = {
  name    : 'Doovde',
  products: [bsg, snowWhite]
};

Jaml.render('category', category);

All we've done is render the 'category' template with our 'Doovde' category, which contains an array of products. These were passed into the 'product' template to produce the following output:

<div class="category">
  <h1>Doovde</h1>
  <p>2 products in this category:</p>
  <div class="products"><div class="product">
  <h1>Battlestar Galactica DVDs</h1>
  <p>Best. Show. Evar.</p>
  <img src="thumbnail.png" />
  <a href="image.png">View larger image</a>
  <form>
    <label for="quantity">Quantity</label>
    <input type="text" name="quantity" id="quantity" value="1"></input>
    <input type="submit" value="Add to Cart"></input>
  </form>
</div>
<div class="product">
  <h1>Snow White</h1>
  <p>not so great actually</p>
  <img src="thumbnail.png" />
  <a href="image.png">View larger image</a>
  <form>
    <label for="quantity">Quantity</label>
    <input type="text" name="quantity" id="quantity" value="1"></input>
    <input type="submit" value="Add to Cart"></input>
  </form>
</div>
</div>
</div>

You can see live examples of all of the above at http://edspencer.github.com/jaml.

Jaml currently sports a few hacks and is not particularly efficient. It is presented as a proof of concept, though all the output above is true output from the library. As always, all of the code is up on Github, and contributions are welcome :)

Jaml would be suitable for emulating a Rails-style directory structure inside a server side JavaScript framework - each Jaml template could occupy its own file, with the template name coming from the file name. This is roughly how Rails and other MVC frameworks work currently, and it eliminates the need for the Jaml.register lines. Alternatively, the templates could still be stored server side and simply pulled down and evaluated for client side rendering.
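One way to get the file-per-template layout on the server is sketched below for Node.js. It is purely hypothetical and not part of Jaml. Each .js file in a directory is assumed to export one template function via module.exports, and the file name becomes the template name. `register` stands for whatever registration function you use, such as Jaml.register.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical loader, not part of Jaml: "product.js" is registered as
// "product", and its exported function becomes the template body
function loadTemplates(dir, register) {
  const loaded = [];
  for (const file of fs.readdirSync(dir).sort()) {
    if (path.extname(file) !== ".js") continue; // skip non-template files
    const name = path.basename(file, ".js");
    register(name, require(path.resolve(dir, file)));
    loaded.push(name);
  }
  return loaded;
}
```

With this in place the Jaml.register lines disappear from the templates themselves, as the paragraph above suggests.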

Happy rendering!

Making RowEditor use your column renderers

Thu, 29 Oct 2009 09:42:47 GMT

The RowEditor plugin is one of my favourite Ext JS components. It basically allows any row on a grid to be turned into an ad hoc form on the fly, saving you the effort of defining additional form components.

Recently I had a grid with a few fields that didn't have an editor, something like this:

var myGrid = new Ext.grid.GridPanel({
  plugins: [new Ext.ux.grid.RowEditor()],
  columns: [
    {
      header   : "Username",
      dataIndex: 'username',
      editor   : new Ext.form.TextField()
    },
    {
      header   : "Signup date",
      dataIndex: 'created_at',
      renderer : Ext.util.Format.dateRenderer('m/d/Y')
    }
  ]
});

Simple stuff - we just show a username and a signup date, which is formatted by a renderer. When we double-click a row it turns into an editable row, and we get a textfield allowing us to edit the username. Unfortunately, while in edit mode our date renderer is ignored and the raw value is displayed instead.

Thankfully, we can fix this by altering RowEditor's source code. The method we need to change is startEditing, which sadly suffers from long method syndrome. About halfway into that method there's a for loop, which we're going to alter to look like this:

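for (var i = 0, len = cm.getColumnCount(); i < len; i++){
  val = this.preEditValue(record, cm.getDataIndex(i));
  f = fields[i];
  
  //our changes start here
  var column = cm.getColumnById(cm.getColumnId(i));
  
  //only display-only fields get the rendered value; real editors such as
  //TextField still need the raw value so that it can be edited
  if (f instanceof Ext.form.DisplayField) {
    val = column.renderer.call(column, val, {}, record);
  }
  //our changes end here
  
  f.setValue(val);
  this.values[f.id] = Ext.isEmpty(val) ? '' : val;
}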

We didn't really have to do much, just grab the renderer for the column and pass it the default value and the record which was found earlier in the method.

For the curious, the empty object we pass in as the second argument to the renderer is what would usually be the 'meta' object (see the renderer documentation on the Column class). Under the covers, RowEditor actually creates an Ext.form.DisplayField instance for each column that you don't specify an editor for. This is why we use f.setValue(val); above. DisplayField doesn't have the same meta stuff as a normal cell would, so if you're looking to customise CSS via the metadata you'll have to do something like this instead:

columns: [
  {
    // ...
    editor: new Ext.form.DisplayField({
      cls: 'myCustomCSSClass',
      style: 'border: 10px solid red;'
    })
  }
]

Pretty easy. It's a shame we have to overwrite the source code, as this makes the solution less future-proof, but if you look at RowEditor's source code you'll see why a 45-line override would be equally unpleasant.

git: what to do if you commit to no branch

Wed, 28 Oct 2009 17:14:56 GMT

Using git, you'll sometimes find that you're not on any branch. This usually happens when you're using a submodule inside another project. Sometimes you'll make some changes to this submodule, commit them and then try to push them up to a remote repository:

ed$ git commit -m "My excellent commit"
[detached HEAD d2bdb98] My excellent commit
 3 files changed, 3 insertions(+), 3 deletions(-)
ed$ git push origin master
Everything up-to-date

Er, what? Everything is not up to date - I just made changes! The clue is in the first part of the commit response - [detached HEAD d2bdb98]. This just means that we've made a commit without actually being on any branch.

Luckily, this is easy to solve - all we need to do is check out the branch we should have been on and merge in that commit SHA:

ed$ git checkout master
Previous HEAD position was d2bdb98... My excellent commit
Switched to branch 'master'
ed$ git merge d2bdb98
Updating 88f218b..d2bdb98
Fast forward
 ext-mvc-all-min.js |    2 +-
 ext-mvc-all.js     |    2 +-
 view/FormWindow.js |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

Once we got onto the master branch, we just called git merge with the SHA reference for the commit we just made (d2bdb98), which applied our commit to the master branch. The output tells us that the commit was applied, and now we can push up to our remote repository as normal:

ed$ git push origin master
Counting objects: 11, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 692 bytes, done.
Total 6 (delta 4), reused 0 (delta 0)
To git@github.com:extmvc/extmvc.git
   88f218b..d2bdb98  master -> master

This had me puzzled for a while so hopefully it'll save someone banging their head against a nearby wall.

Writing Better JavaScript - split up long methods

Tue, 06 Oct 2009 15:51:51 GMT

For the second time this week I'm going to pick on the usually delightful Ext JS library. Last time we discussed the overzealous use of the Module pattern; this time it's the turn of bloated methods.

As before, I'm not really picking on Ext at all - this happens all over the place. But again, this is the library closest to my heart and the one I know the best.

The Problem

We're going to take a look at Ext.data.XmlReader's readRecords method. Before we get started though, I'll repeat that this is intended as an example of an approach, not a whine at Ext in particular.

/**
 * Create a data block containing Ext.data.Records from an XML document.
 * @param {Object} doc A parsed XML document.
 * @return {Object} records A data block which is used by an {@link Ext.data.Store} as
 * a cache of Ext.data.Records.
 */
readRecords: function(doc) {
  /**
   * After any data loads/reads, the raw XML Document is available for further custom processing.
   * @type XMLDocument
   */
  this.xmlData = doc;
  var root = doc.documentElement || doc;
  var q = Ext.DomQuery;
  var recordType = this.recordType, fields = recordType.prototype.fields;
  var sid = this.meta.idPath || this.meta.id;
  var totalRecords = 0, success = true;
  if(this.meta.totalRecords){
    totalRecords = q.selectNumber(this.meta.totalRecords, root, 0);
  }

  if(this.meta.success){
    var sv = q.selectValue(this.meta.success, root, true);
    success = sv !== false && sv !== 'false';
  }
  var records = [];
  var ns = q.select(this.meta.record, root);
  for(var i = 0, len = ns.length; i < len; i++) {
    var n = ns[i];
    var values = {};
    var id = sid ? q.selectValue(sid, n) : undefined;
    for(var j = 0, jlen = fields.length; j < jlen; j++){
      var f = fields.items[j];
      var v = q.selectValue(Ext.value(f.mapping, f.name, true), n, f.defaultValue);
      v = f.convert(v, n);
      values[f.name] = v;
    }
    var record = new recordType(values, id);
    record.node = n;
    records[records.length] = record;
  }

  return {
    success : success,
    records : records,
    totalRecords : totalRecords || records.length
  };
}

Anyone care to tell me what this actually does? Personally, I have absolutely no idea. I recently found myself needing to implement an XmlReader subclass with a twist which required understanding how this works, and let's just say it wasn't easy!

So what is it that makes the above so terrifyingly hard to understand? Well, in no particular order:

It's too long - you'd need to be a genius to easily understand what's going on here
The variable names don't make much sense - some of the oddest include 'q', 'ns', 'v', 'f' and 'sv'
There's minimal commenting - we're given a single-line clue at the very top as to what these 40-odd lines do

A Solution

Let's see how the reworked code below addresses each of the concerns above:

Although we end up with more lines of code here, no single method is more than around 10 LOC
No single letter variable names - you no longer have to decode what 'sv' means
Constructive commenting allows rapid comprehension by skimming the text

One additional and enormous benefit here comes directly from splitting logic into discrete methods. Previously if you'd wanted to implement your own logic to determine success, get the total number of records or even build a record from an XML node you'd be stuck. There was no way to selectively override that logic without redefining that entire monster method.

With our new approach this becomes trivial:

Ext.extend(Ext.data.XmlReader, Ext.data.DataReader, {
  readRecords: function(doc) {
    this.xmlData = doc;
    
    //get local references to frequently used variables
    var root    = doc.documentElement || doc,
        records = [],
        nodes   = Ext.DomQuery.select(this.meta.record, root);
    
    //build an Ext.data.Record instance for each node
    Ext.each(nodes, function(node) {
      records.push(this.buildRecordForNode(node));
    }, this);

    return {
      records     : records,
      success     : this.wasSuccessful(root),
      totalRecords: this.getTotalRecords(root) || records.length
    };
  },
  
  /**
   * Returns a new Ext.data.Record instance using data from a given XML node
   * @param {Element} node The XML node to extract Record values from
   * @return {Ext.data.Record} The record instance
   */
  buildRecordForNode: function(node) {
    var domQuery = Ext.DomQuery,
        idPath   = this.meta.idPath || this.meta.id,
        id       = idPath ? domQuery.selectValue(idPath, node) : undefined,
        values   = {};
    
    //iterate over each field in our record type, find it in the XML node and convert it
    this.recordType.prototype.fields.each(function(field) {
      var mapping  = Ext.value(field.mapping, field.name, true),
          rawValue = domQuery.selectValue(mapping, node, field.defaultValue);
      
      values[field.name] = field.convert(rawValue, node);
    });
    
    //pass the values to the constructor rather than calling record.set,
    //which would mark every freshly loaded record as dirty
    var record  = new this.recordType(values, id);
    record.node = node;
    
    return record;
  },
  
  /**
   * Returns the total number of records indicated by the server response
   * @param {XMLDocument} root The XML response root node
   * @return {Number} total records
   */
  getTotalRecords: function(root) {
    var metaTotal = this.meta.totalRecords;
    
    return metaTotal == undefined 
                      ? 0 
                      : Ext.DomQuery.selectNumber(metaTotal, root, 0);
  },
  
  /**
   * Returns true if the response document includes the expected success property
   * @param {XMLDocument} root The XML document root node
   * @return {Boolean} True if the XML response was successful
   */
  wasSuccessful: function(root) {
    var metaSuccess  = this.meta.success;
    
    //return true for any response except 'false'
    if (metaSuccess == undefined) {
      return true;
    } else {
      var successValue = Ext.DomQuery.selectValue(metaSuccess, root, true);
      return successValue !== false && successValue !== 'false';
    }
  }
});

(For brevity I have omitted the existing readRecords comment blocks from the above)
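The override benefit from the paragraph about selectively replacing logic is easy to see outside Ext. The plain-JS sketch below is not Ext code. `Reader` is a simplified stand-in for the refactored XmlReader that reads already-parsed objects instead of XML. `StatusReader` is a hypothetical subclass for a server that reports a status string instead of a success flag.

```javascript
// Plain-JS sketch, not Ext code
class Reader {
  readRecords(doc) {
    const records = doc.rows.map(row => this.buildRecord(row));
    return {
      records: records,
      success: this.wasSuccessful(doc),
      totalRecords: this.getTotalRecords(doc) || records.length
    };
  }

  buildRecord(row) { return Object.assign({}, row); }

  // same rule as wasSuccessful above: anything except false/'false' succeeds
  wasSuccessful(doc) {
    return doc.success !== false && doc.success !== "false";
  }

  getTotalRecords(doc) { return doc.total || 0; }
}

// Hypothetical server that reports status "ok"/"error": only one small
// method changes, and readRecords is inherited untouched
class StatusReader extends Reader {
  wasSuccessful(doc) { return doc.status === "ok"; }
}
```

With the monolithic version, the same change would have meant copying all of readRecords just to alter two lines.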

I suggest that you structure your code in this way at least 99% of the time. The one exception is if high performance is an issue. If you are in a situation where every millisecond counts (you probably aren't), then taking the former route becomes more acceptable (though there's still no excuse for not adding a few comments explaining what the code actually does).

My refactored code almost certainly runs slower than the original as it doesn't take as much advantage of cached local variables as the monolithic version does. For library-level code this can make sense if the performance gain is significant, but for the everyday code you and I write it is rarely a good idea.

I'll be watching.

JavaScript Module pattern - overused, dangerous and bloody annoying

Mon, 05 Oct 2009 18:33:00 GMT

The Module Pattern is a way of using a closure in JavaScript to create private variables and functions. Here's a brief recap:

var myObject = (function() {
  //these are only accessible internally
  var privateVar = 'this is private';
  var privateFunction = function() {
    return "this is also private";
  };
  
  return {
    //these can be accessed externally
    publicVar: 'this is public',
    
    publicFunction: function() {
      return "this is also public";
    },

    //this is a 'privileged' function - it can access the internal private vars
    myFunction: function() {
      return privateVar;
    }
  };
})();

myObject.privateVar; //undefined, as privateVar is not a property of myObject
myObject.myFunction(); //'this is private', as myFunction can still reach privateVar through the closure

Breaking this down, we create a function which is executed immediately (via the brackets at the end) and returns an object which gets assigned to myObject.

Because this object contains references to our private variable (privateVar is referenced inside myFunction), the JavaScript engine keeps privateVar available in memory which means myFunction can still access it using what is called a closure. This pattern as a whole is usually called the Module Pattern.

Why it's bad

On the face of it, private variables sound like a good thing. We have them in other languages after all, so why not in JavaScript too?

The reason that you shouldn't use the Module pattern 90% of the time you think you should is that it entirely negates the dynamic nature of the language. If a class does 99% of what you want and you (rightly) don't want to directly modify the source code, you will be thwarted every time if the class uses this pattern.
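Here is a small illustration of that inflexibility, using a hypothetical `counter` module built like the recap above. A replacement method can be attached from outside, but it can never reach the private variable.

```javascript
// Hypothetical module in the same shape as the recap above
var counter = (function () {
  var count = 0; // private: only reachable from functions defined in here

  return {
    increment: function () { count += 1; return count; },
    current: function () { return count; }
  };
})();

// We can replace a public method from the outside...
counter.increment = function () {
  // ...but this function was written outside the closure, so the private
  // `count` is out of reach. The best we can do is a new public property.
  this.count = (this.count || 0) + 10;
  return this.count;
};

console.log(counter.increment()); // 10
console.log(counter.current());   // 0 - the private count never changed
```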

Example

I'll share a recent example of this using a class in the Ext JS library. Ext is by no means the only library guilty of this, but it's the one I use on a daily basis, and this is not the only example of this problem in the library.

The Ext.DomQuery object is a helper which allows us to query XML documents locally. Unfortunately, it suffers from a limitation which causes the text content of an XML node to be truncated if it is over a certain size (just 4kb in Firefox, though this differs by browser): the browser splits long text into several adjacent text nodes, and DomQuery only reads the first of them. This isn't actually a problem of Ext's making, though Ext could solve it with just 1 line of code.

Ideally, we'd just be able to do this:

Ext.apply(Ext.DomQuery, {
  selectValue : function(path, root, defaultValue){
    path = path.replace(trimRe, "");
    if(!valueCache[path]) valueCache[path] = Ext.DomQuery.compile(path, "select");
    
    var n = valueCache[path](root), v;
    n = n[0] ? n[0] : n;
    
    //this line is the only change
    if (typeof n.normalize == 'function') n.normalize();
    
    v = (n && n.firstChild ? n.firstChild.nodeValue : null);
    return ((v === null||v === undefined||v==='') ? defaultValue : v);
  }
});

All we're doing in the above is making a call to 'normalize' - a single line change which fixes the 4kb node text limitation. Sadly though, we can't actually do this because of the use of the Module pattern. In this example there are two private variables being accessed - 'trimRe' and 'valueCache'.

We can't get access to these private variables in our override, which means that our override here fails. In fact, the Module pattern means we can't actually patch this at all.

The only way to do it is to modify the source code of Ext JS itself, which is a very dangerous practice as you need to remember every change you made to ext-all.js and copy them all over next time you upgrade.

Even if there are good reasons for enforcing the privacy of variables (in this case I don't think there are), we could get around this by providing a privileged function which returns the private variable - outside code can then read it but not reassign it:

Ext.DomQuery.getValueCache = function() {
  return valueCache;
};

Except again this needs to be defined inside the original closure - we just can't add it later. Again we would have to modify the original source code, with all the problems that entails.

Ext.ComponentMgr does the same trick when registering xtypes. An xtype is just a string that Ext maps to a constructor to allow for easy lazy instantiation. The trouble is, Ext hides the xtype lookup object inside a private variable, meaning that if you have an xtype string it is impossible to get a reference to the constructor function for that xtype. Ext provides a function to instantiate an object using that constructor, but doesn't let you get at the constructor itself. This is totally unnecessary.

Recommendations

Think very carefully before using the Module pattern at all. Do you really need to enforce privacy of your variables? If so, why?
If you absolutely have to use private variables, consider providing a getter function which provides read-only access to the variables
Keep in mind that once defined, private variables defined this way cannot be overwritten at all. In other languages you can often overwrite a superclass's private variables in a subclass - here you can't

Either of the above would have solved both problems, but as neither was implemented we have to fall back to hackery.
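The getter recommendation could look like this in a registry similar to the xtype case. `TypeRegistry`, `registerType`, `create`, `getType` and `Panel` are all hypothetical names. This is not Ext.ComponentMgr's source. The lookup table stays private, but callers can still get at each constructor.

```javascript
// Hypothetical xtype-style registry, not Ext.ComponentMgr's source
var TypeRegistry = (function () {
  var types = {}; // private map of type string -> constructor

  return {
    registerType: function (xtype, ctor) { types[xtype] = ctor; },
    create: function (config) { return new types[config.xtype](config); },
    // privileged read access: callers can fetch a constructor to inspect or
    // subclass it, but cannot replace the private map itself
    getType: function (xtype) { return types[xtype]; }
  };
})();

function Panel(config) { this.title = config.title; }
TypeRegistry.registerType("panel", Panel);

var p = TypeRegistry.create({ xtype: "panel", title: "Hello" });
```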

And remember this about the Module pattern:

It's overused - in the examples above (especially Ext.ComponentMgr) there is no benefit from employing the pattern
It's dangerous - because of its inflexibility it forces us to modify external source code directly - changes you are almost guaranteed to forget about when it comes to updating the library in the future
It's bloody annoying - because of both of the above.

ExtJS modules and mixins

Fri, 02 Oct 2009 13:56:07 GMT

A few days back Praveen Ray posted about "Traits" in Ext JS. What he described is pretty much what we'd call Modules in the Ruby world, and how to mix those modules into a given class.

Basically, using modules we can abstract common code into reusable chunks, and then include them into one or more classes later. This has several advantages - avoiding code repetition, decoupling code concepts and ease of unit testing among them.

While the idea is good, there is a better way of achieving this than Praveen suggests. Let's say we define the following modules, which are just plain old objects:

//module providing geolocation services to a class
var GeoLocate = {
  findZipLatLng: function(zipCode) {
    //does some clever stuff to find a zip code's latitude/longitude
  },
  
  getGeoApiKey: function() {
    return this.geo_api_key || 'default key';
  }
};

//module allowing a class to act as a state machine
var StateMachine = {
  transition: function(stateName) {
    this.state = stateName;
  },
  
  inState: function(stateName) {
    return this.state == stateName;
  }
};

We've got a couple of fictional modules, providing geolocation and state machine functionality. Adding these to an ExtJS class is actually pretty simple:

Ext.override(Ext.form.FormPanel, StateMachine);
Ext.override(Ext.form.FormPanel, GeoLocate);

All that happens above is each property of our module object is copied to Ext.form.FormPanel's prototype, making the functions available to all FormPanel instances.

If we just want to mix our modules into a specific instance of a class, we can do it like this:

var myForm = new Ext.form.FormPanel({});

Ext.apply(myForm, StateMachine);

This will only affect the instance we're applying to, leaving all other FormPanel instances alone. In Praveen's example this is in fact all we need to do - there's no need for the constructor definition and the Ext.extend call, as Ext.apply alone does the job.

There's nothing in the above that's actually limited to Ext JS - all we're doing is copying properties from one object to another. Implementing Ext.override and Ext.apply yourself without Ext is pretty simple, though Ext.extend is a whole other story (and a blog post in itself).
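To show how little is involved, here's a minimal sketch of that property copying in plain JavaScript, runnable without Ext. The names apply, applyIf and override, and the FormPanel constructor, are hypothetical stand-ins for the Ext versions; the real Ext functions may handle edge cases this sketch ignores.

```javascript
// Hypothetical stand-ins for Ext.apply, Ext.applyIf and Ext.override,
// written in plain JavaScript so they run without Ext.

// Copy every own property of config onto target, overwriting existing values.
function apply(target, config) {
  for (var key in config) {
    if (Object.prototype.hasOwnProperty.call(config, key)) {
      target[key] = config[key];
    }
  }
  return target;
}

// Like apply, but leaves alone any property that target already defines.
function applyIf(target, config) {
  for (var key in config) {
    if (Object.prototype.hasOwnProperty.call(config, key) && target[key] === undefined) {
      target[key] = config[key];
    }
  }
  return target;
}

// Copy the properties onto the class prototype, so every instance
// (including ones created earlier) sees them.
function override(klass, overrides) {
  apply(klass.prototype, overrides);
}

// Cut-down StateMachine module from the post
var StateMachine = {
  transition: function(stateName) { this.state = stateName; },
  inState: function(stateName) { return this.state == stateName; }
};

function FormPanel() {} // hypothetical class standing in for Ext.form.FormPanel

override(FormPanel, StateMachine);    // class-level mixin
var form = new FormPanel();
form.transition('editing');
console.log(form.inState('editing')); // true

var plain = apply({}, StateMachine);  // instance-level mixin
console.log(typeof plain.transition); // function
```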

Finally, beware of overwriting existing properties (functions or objects) on the class you are mixing into. If your FormPanel already has a 'transition' function it will be overwritten by your module, which could lead to unexpected behaviour. At the instance level you can buy some protection against that by using Ext.applyIf instead of Ext.apply, though you might be safer writing a custom mixin function which either provides access to the original function or raises an exception when it would overwrite an existing property.
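Here's a minimal sketch of the exception-raising variant, assuming plain objects; safeMixin is a hypothetical name, not an Ext function. It checks for clashes before copying anything, so a failed mixin leaves the target untouched.

```javascript
// Hypothetical helper (not part of Ext): mixes module into target, but throws
// instead of silently overwriting a property target already has, whether it's
// defined on target itself or inherited from its prototype.
function safeMixin(target, module) {
  var key;

  // First pass: look for clashes before changing anything
  for (key in module) {
    if (Object.prototype.hasOwnProperty.call(module, key) && key in target) {
      throw new Error("Mixin would overwrite existing property '" + key + "'");
    }
  }

  // Second pass: no clashes, so copy everything across
  for (key in module) {
    if (Object.prototype.hasOwnProperty.call(module, key)) {
      target[key] = module[key];
    }
  }
  return target;
}

var StateMachine = {
  transition: function(stateName) { this.state = stateName; }
};

// An object that already has its own transition function
var form = {
  transition: function() { return 'original'; }
};

try {
  safeMixin(form, StateMachine);
} catch (e) {
  console.log(e.message); // Mixin would overwrite existing property 'transition'
}

console.log(form.transition()); // original
```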

Ext.ux.layout.FillContainer

Mon, 28 Sep 2009 14:46:27 GMT

One of the pages on the Ext JS app I'm currently working on has a form with a grid underneath. The page exists as a tab inside an Ext.TabPanel, and uses the border layout, with the form as the 'north' component and the grid as 'center'.

The trouble with this is that the grid shrinks down to an unusable size when the browser window is too small, ending up like this:

We could alternatively use a basic container layout, but this limits us to a fixed height for the grid, meaning we waste space at the bottom:

Enter the imaginatively named FillContainer:

new Ext.Panel({
  autoScroll: true,
  layout: 'fillcontainer',
  items : [
    {
      html  : 'Pretend this is a form',
      height: 400
    },
    {
      html         : 'And this is the grid',
      minHeight    : 250,
      fillContainer: true
    }
  ]
});

If our containing panel shrinks to less than 650px in height, the grid will be automatically sized to 250px and a vertical scrollbar will appear on the panel, like this:

If the panel's height increases to, say, 900px, the grid gets resized to 500px high. This way we use the space when it's available, while maintaining a usable interface when height is limited:

Here's the code that makes it work:

Ext.ns('Ext.ux.layout');

/**
 * @class Ext.ux.layout.FillContainerLayout
 * @extends Ext.layout.ContainerLayout
 * @author Ed Spencer (http://edspencer.net)
 * Extended version of container layout which expands a given child item to the 
 * full height of the container, honouring the item's minHeight property
 */
Ext.ux.layout.FillContainerLayout = Ext.extend(Ext.layout.ContainerLayout, {
  monitorResize: true,
  
  /**
   * After rendering each item, resize the one with fillContainer == true
   */
  onLayout: function(ct, target) {
    Ext.ux.layout.FillContainerLayout.superclass.onLayout.apply(this, arguments);
    
    var ctHeight    = ct.getHeight(),
        itemsHeight = 0,
        expandItem;
    
    ct.items.each(function(item) {
      if (item.fillContainer === true) {
        expandItem = item;
      } else {
        itemsHeight += item.getHeight();
      }
    });
    
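    //set the expand item's height to fill the remaining space, but never below its minHeight
    //(no ctHeight > itemsHeight check, so the item also shrinks back to minHeight when space runs out)
    if (expandItem !== undefined) {
      var newHeight = ctHeight - itemsHeight;
      
      expandItem.setHeight(Math.max(newHeight, expandItem.minHeight || 0));
    }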
  }
});

Ext.Container.LAYOUTS['fillcontainer'] = Ext.ux.layout.FillContainerLayout;

As we're just extending the default container layout, your items will be rendered in the order you specify them. The expanding item doesn't have to be the last one - we could equally have set fillContainer and minHeight on the form to expand that instead of the grid.
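The sizing rule boils down to one subtraction and one Math.max. To sanity-check the numbers from the example (a 400px form and a grid with a 250px minHeight), here's that arithmetic pulled out into a plain function. expandedHeight is a hypothetical name, and it ignores the borders, padding and panel headers a real layout would need to account for.

```javascript
// Hypothetical pure helper mirroring the sizing rule in onLayout:
// the expanding item gets whatever height the other items leave free,
// but never less than its minHeight.
function expandedHeight(containerHeight, otherItemHeights, minHeight) {
  var used = 0;
  for (var i = 0; i < otherItemHeights.length; i++) {
    used += otherItemHeights[i];
  }
  return Math.max(containerHeight - used, minHeight || 0);
}

console.log(expandedHeight(900, [400], 250)); // 500
console.log(expandedHeight(650, [400], 250)); // 250
console.log(expandedHeight(500, [400], 250)); // 250 - minHeight wins, so the panel scrolls
```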

JavaScript FizzBuzz in a tweet

Thu, 17 Sep 2009 15:15:54 GMT

The FizzBuzz challenge has been around a while but I stumbled across it again after reading another unique Giles Bowkett post.

If you're not familiar with FizzBuzz, it's a little 'challenge' designed to test a candidate programmer's ability to perform a simple task. You just have to print out the numbers from 1 to 100, except that for multiples of 3 you print "Fizz" instead, for multiples of 5 you print "Buzz", and for multiples of both 3 and 5 you print "FizzBuzz".

Here's a trivial JavaScript implementation:

for (var i = 1; i <= 100; i++) {
  if (i % 3 == 0) {
    if (i % 5 == 0) {
      console.log('FizzBuzz');
    } else {
      console.log('Fizz');
    }
  } else if (i % 5 == 0) {
    console.log('Buzz');
  } else {
    console.log(i);
  }
}

Pretty simple stuff, but a bit verbose. I wanted something that would fit into a tweet. It turns out that's pretty easy - this version is 133 characters including whitespace, 7 under the 140-character limit for a Twitter message:

for (var i = 1; i <= 100; i++) {
  var f = i % 3 == 0, b = i % 5 == 0;
  console.log(f ? b ? "FizzBuzz" : "Fizz" : b ? "Buzz" : i);
}

Which of course raises the question - just how short can a JavaScript FizzBuzz implementation be? Here's my baseline, which is a tortured and contorted version of the above:

for(i=1;i<101;i++){console.log(i%3?i%5?i:"Buzz":i%5?"Fizz":"FizzBuzz")}

The above is 71 characters - I expect you to do better. The rules are that the only dependency is Firebug's console.log being available, and you can't replace 'console.log' with anything else.

Of course, if we did swap 'console.log' for 'alert', the whole thing would fit in a tweet twice, but that would be damn annoying.

Hint: you can take at least three more characters off the above - can you see how?
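Whatever you come up with, it's handy to be able to check the output. Here's a sketch that wraps the tweet-sized version's nested-ternary logic in a function that returns its result instead of logging it (fizzBuzz is a hypothetical name), so the first 15 results can be compared with the expected sequence:

```javascript
// Same logic as the tweet-sized loop, but returning the value so it can be checked.
// The ternaries nest to the right: f ? (b ? "FizzBuzz" : "Fizz") : (b ? "Buzz" : i)
function fizzBuzz(i) {
  var f = i % 3 == 0, b = i % 5 == 0;
  return f ? b ? "FizzBuzz" : "Fizz" : b ? "Buzz" : i;
}

var results = [];
for (var i = 1; i <= 15; i++) {
  results.push(fizzBuzz(i));
}

console.log(results.join(" "));
// 1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz 11 Fizz 13 14 FizzBuzz
```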

Using the ExtJS Row Editor

Wed, 16 Sep 2009 15:25:47 GMT

The RowEditor plugin was recently added to the ExtJS examples page. It works a lot like a normal Grid Editor, except you can edit several fields on a given row at once before saving.

This neatly solves the problem of adding a new row to an editor grid, entering data into the first field and finding it save itself straight away, which is rarely desired. In this fashion we can provide full CRUD for simple models in a single page.

Installation

You'll need to get a copy of the JavaScript, CSS and images from the server, which is a bit of a pain. If you still have the ExtJS SDK around you can find these in the examples folder; if not, you can get each file as follows:

Grab the plugin JS file below and put it where you usually put your .js files: http://www.extjs.com/deploy/dev/examples/ux/RowEditor.js

This needs to go with your other stylesheets, usually in a directory called 'css': http://www.extjs.com/deploy/dev/examples/ux/css/RowEditor.css

Download these two images and put them into your existing 'images' folder (the same place the other ExtJS images live): http://www.extjs.com/deploy/dev/examples/ux/images/row-editor-bg.gif http://www.extjs.com/deploy/dev/examples/ux/images/row-editor-btns.gif

Include the .js and .css files on your page and you should be ready to go.

Usage

RowEditor is a normal grid plugin, so you'll need to instantiate it and add it to your grid's 'plugins' property. You also need to define what type of Editor is available (if any) on each column:

var editor = new Ext.ux.grid.RowEditor();

var grid = new Ext.grid.GridPanel({
  plugins: [editor],
  columns: [
    {
      header   : 'User Name',
      dataIndex: 'name',
      editor   : new Ext.form.TextField()
    },
    {
      header   : 'Email',
      dataIndex: 'email',
      editor   : new Ext.form.TextField()
    }
  ],
  //... the rest of your grid config here
});

RowEditor defines a few events, the most useful one being 'afteredit'. Its signature looks like this:

/**
 * @event afteredit
 * Fired after a row is edited and passes validation.  This event is fired
 * after the store's update event is fired with this edit.
 * @param {Ext.ux.grid.RowEditor} roweditor This object
 * @param {Object} changes Object with changes made to the record.
 * @param {Ext.data.Record} r The Record that was edited.
 * @param {Number} rowIndex The rowIndex of the row just edited
 */
'afteredit'

All you need to do is listen to that event on your RowEditor and save your model object appropriately. First though, we'll define the Ext.data.Record that we're using in this grid's store:

var User = Ext.data.Record.create([
  {name: 'user_id', type: 'int'},
  {name: 'name',    type: 'string'},
  {name: 'email',   type: 'string'}
]);

And now the afteredit listener itself:

editor.on({
  scope: this,
  afteredit: function(roweditor, changes, record, rowIndex) {
    //your save logic here - might look something like this:
    Ext.Ajax.request({
      url   : record.phantom ? '/users' : '/users/' + record.get('user_id'),
      method: record.phantom ? 'POST'   : 'PUT',
      params: changes,
      success: function() {
        //post-processing here - this might include reloading the grid if there are calculated fields
      }
    });
  }
});

The code above simply takes the changes object (a plain object of key: value pairs for all the changed fields) and issues a request to your server backend. 'record.phantom' is true if this record does not yet exist on the server - we use this above to decide whether we're POSTing to /users or PUTting to /users/123, in line with normal RESTful practice.
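The URL and method choice is easy to pull out of the listener so it can be tested on its own. Here's a minimal sketch, assuming records expose a phantom flag and a get(field) method the way Ext.data.Record does; buildSaveRequest and fakeRecord are hypothetical names, and the example email address is made up.

```javascript
// Hypothetical helper, not part of Ext: picks the RESTful URL and HTTP method
// for saving a record, using the same phantom check as the afteredit listener.
function buildSaveRequest(record, changes) {
  var isNew = record.phantom === true;

  return {
    url   : isNew ? '/users' : '/users/' + record.get('user_id'),
    method: isNew ? 'POST'   : 'PUT',
    params: changes
  };
}

// Minimal fake record for trying it out without Ext
function fakeRecord(data, phantom) {
  return {
    phantom: phantom,
    get: function(field) { return data[field]; }
  };
}

var update = buildSaveRequest(fakeRecord({user_id: 123}, false), {email: 'ed@example.com'});
console.log(update.method, update.url); // PUT /users/123

var create = buildSaveRequest(fakeRecord({}, true), {name: 'Ed'});
console.log(create.method, create.url); // POST /users
```

Inside the listener you'd then pass the result's url, method and params straight to Ext.Ajax.request.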

Adding a new record

The example above allows for editing an existing record, but how do we add a new one? Like this:

var grid = new Ext.grid.GridPanel({
  //... the same config from above goes here,
  tbar: [
    {
      text   : "Add User",
      handler: function() {
        //make a new empty User and stop any current editing
        var newUser = new User({});
        editor.stopEditing();
        
        //add our new record as the first row, select it
        grid.store.insert(0, newUser);
        grid.getView().refresh();
        grid.getSelectionModel().selectRow(0);
        
        //start editing our new User
        editor.startEditing(0);
      }
    }
  ]
});

Pretty simple stuff - we've just added a toolbar with a button which, when clicked, creates a new User record, inserts it at the top of the grid and focusses the RowEditor on it.

Configuration Options

Although not documented, the plugin has a few configuration options:

var editor = new Ext.ux.grid.RowEditor({
  saveText  : "My Save Button Text",
  cancelText: "My Cancel Button Text",
  clicksToEdit: 1, //this changes from the default double-click activation to single click activation
  errorSummary: false //disables display of validation messages if the row is invalid
});

If you want to customise other elements of the RowEditor you probably can, but you'll need to take a look at the source (it's not scary).

Final Thought

RowEditor is a really nice component which can provide an intuitive interface and save you writing a lot of CRUD code. It is best employed on grids with only a few columns - for models with lots of data fields you're better off with a full FormPanel.

I'd be pretty happy to see this included in the default ExtJS distribution, as I find myself returning to it frequently.

Moving from Blogger to Wordpress

Mon, 14 Sep 2009 16:38:37 GMT

Over the weekend I migrated from Blogger (hosted by Google) to Wordpress (hosted by me). Overall, Wordpress feels far superior, but the migration was not without problems. Here's a short guide to what I had to do:

Get Wordpress

First, grab the latest version of Wordpress. Being a PHP application, it just drops into a directory and works :)

Create a database, set config

Wordpress has a setup script but being a bit of a noob I couldn't give it write permission to my filesystem. If you are also afflicted by such inadequacies the following steps may help you. For clarity I'll call my DB 'wordpress'. First, set up your database:

mysql -u root
CREATE DATABASE wordpress;
GRANT ALL on wordpress.* TO 'wordpress'@'localhost' identified by 'wordpress';

You'll need a wp-config.php file. Wordpress comes with a sample one which you can copy like this (from the root of your Wordpress directory):

cp wp-config-sample.php wp-config.php

Now edit wp-config.php, and fill in the details to make it look a little like this:

// ** MySQL settings - You can get this info from your web host ** //
/** The name of the database for WordPress */
define('DB_NAME', 'wordpress');

/** MySQL database username */
define('DB_USER', 'wordpress');

/** MySQL database password */
define('DB_PASSWORD', 'wordpress');

/** MySQL hostname */
define('DB_HOST', 'localhost');

Hitting the site now should show a default Wordpress installation with some dummy content. Whoop.

Import from Blogger

Importing from Blogger is pleasingly simple - just use the Tools -> Import option on the menu. It'll ask you to verify access and then pull down all your posts and comments.

It's not perfect though - for me the post Tags were imported as Categories. To get them to be Tags again, go to the Posts -> Categories menu and use the handy Categories to Tags converter.

I found that the imported post markup was pretty mangled (I think this is Blogger's fault, not Wordpress') - <br /> tags everywhere and no real line breaks. To resolve this I cracked open mysql again and ran the following:

use wordpress;
UPDATE wp_posts SET post_content = REPLACE(post_content, "<br />", "\n");

That sorted me out alright. Next we need to set up how our permalinks work. Set this in the Settings -> Permalinks menu, and use the following format to mimic the Blogger urls:

/%year%/%monthnum%/%postname%.html

Wordpress will either write a .htaccess file for you at this point, or tell you it can't write to the filesystem and give you a short text config which you must manually copy into a file called .htaccess.

One final thing to note is that Wordpress constructs its slug urls differently to Blogger (Wordpress would use 'the-trouble-with-new' vs Blogger's 'trouble-with-new', for example). If you're importing blog posts this means your urls won't always match, so any incoming links will be broken. I couldn't find an easier way to correct them than just copy/paste by hand - doesn't take long though.

Syntax Highlighting

This whole step is entirely optional.

Because I'm a geek I post code fairly often. I used the SyntaxHighlighter library back in the Blogger days and wanted to keep using it. You can install the Wordpress plugin version from http://www.viper007bond.com/wordpress-plugins/syntaxhighlighter/. The old syntax didn't seem to work, so I needed to go back into mysql and run the following:


UPDATE wp_posts SET post_content = REPLACE(post_content, '<pre name="code" class="js">', '[c0de language="js"]');
UPDATE wp_posts SET post_content = REPLACE(post_content, '</pre>', '[/c0de]');

NOTE: So that Wordpress doesn't interpret those tags above I've changed the 'o' in code to a '0'. You need to change it back :)

This just swaps all your old <pre name="code" class="js"> and </pre> tags for [c0de language="js"] and [/c0de] respectively. Repeat the first line for any other languages you have used (for me this was xml, css, html and ruby).

Fixing feeds

Wordpress doesn't like the world to see your full content via RSS. Odd, isn't it? There's an option in Settings -> Reading which claims to output the full text of each article into your feed, but it doesn't seem to work. Instead, you need to hack Wordpress itself a little: edit the wp-includes/feed-rss2.php file and change line 47 from <?php the_excerpt_rss() ?> to <?php the_content() ?>.

If you're using Feedburner or similar, don't forget to give it your new feed url too. In this case you should also update wp-content/themes/yourTheme/header.php and swap out the occurrences of your old feed url with your Feedburner url.

Upload and update DNS

At this point everything should be working nicely, so upload your blog folder and update your DNS settings. I'm guessing if you're hosting Wordpress yourself you don't need help with this. I've made my own blog into a git repository up on Github, allowing me to deploy any changes I make using Capistrano. It's a nice solution - for more information see this lovely post by the gentlemen at imedo.

The trouble with new

Tue, 08 Sep 2009 14:32:00 GMT

We have a simple JavaScript class:


function User(firstName, lastName) {
  this.name = firstName + " " + lastName;
}

We create a new User:


var ed = new User("Ed", "Spencer");
alert(ed.name); //alerts 'Ed Spencer'
alert(window.name); //alerts an empty string - window.name hasn't been touched

All is well. Unless we forgot the 'new':


var ed = User("Ed", "Spencer");
alert(ed); //undefined - User returned nothing, so ed.name would throw a TypeError
alert(window.name); //alerts 'Ed Spencer'

Curses! That's not what we want at all. By omitting the 'new' keyword, the JavaScript engine executes our 'User' constructor in the current scope, which in this case is the global window object. With the scope ('this') set to window, setting 'this.name' is now the same as setting 'window.name', which is not what we're trying to do.

Here's the problem, though: omitting the 'new' keyword is still perfectly valid syntax. We know at design time whether or not 'new' must be used, so we can use a little trick to make it act as though 'new' was indeed used:


function User(firstName, lastName) {
  if (!(this instanceof User)) {
    return new User(firstName, lastName);
  }
  
  this.name = firstName + " " + lastName;
}

Because the 'new' keyword sets up a new context, we can just test to see if 'this' is now an instance of our class. If it's not, it means the user has omitted the 'new' keyword, so we do it for them. John Resig has an example of this over on his blog.

This is all very well and good, but I don't think we should use it. The reason is that we're hiding a pseudo syntax error from the developer, instead of educating them with its correct usage. If we hide this mistake in each class we write, our unknowing developer will remain unknowing, and run into a wall when they repeat their mistake on classes that don't fix it for them.

Instead, I suggest the following:


function User(firstName, lastName) {
  if (!(this instanceof User)) {
    throw new Error("You must use the 'new' keyword to instantiate a new User");
  }

  this.name = firstName + " " + lastName;
}

The only difference of course is that we're throwing an Error instead of fixing the developer's mistake. The benefit is that their code won't actually work unless they write it correctly. This is good because our erstwhile developer is prompted to fix their code and understand why it was wrong. Better informed developers lead to better code.

Well, hopefully.
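For what it's worth, the instanceof test has one blind spot: calling User.call(someExistingUser, ...) passes the check, because this is already a User. Engines that support ES2015, which arrived well after this post, provide new.target, which is undefined whenever a function is called without new. Here's a sketch of the same throwing guard using it, assuming such an engine; ES2015 class constructors go one step further and throw a TypeError on their own when called without new.

```javascript
// Assumes an ES2015+ engine: new.target is undefined when a function is
// called without new, and refers to the constructor when new is used.
function User(firstName, lastName) {
  if (!new.target) {
    throw new Error("You must use the 'new' keyword to instantiate a new User");
  }

  this.name = firstName + " " + lastName;
}

var ed = new User("Ed", "Spencer");
console.log(ed.name); // Ed Spencer

try {
  User("Ed", "Spencer");
} catch (e) {
  console.log(e.message); // You must use the 'new' keyword to instantiate a new User
}
```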

Ext.decorate

Sun, 30 Aug 2009 19:06:00 GMT

Sometimes you want to override one of the methods in ExtJS that return a configuration object - let's use Ext.direct.RemotingProvider's getCallData as an example, which looks like this:


getCallData: function(t){
  return {
    action: t.action,
    method: t.method,
    data  : t.data,
    type  : 'rpc',
    tid   : t.tid
  };
}

Our aim is to add an 'authenticity_token' property to the returned object. You could provide the full config object again in an override, but usually you're overriding to add, remove or change one or two properties and want to leave the rest unmolested. I used to find myself writing a lot of code with this pattern:


//just adds an authenticity token to the call data - for context see this forum thread: http://www.extjs.com/forum/showthread.php?p=378912#post378912
(function() {
  var originalGetCallData = Ext.direct.RemotingProvider.prototype.getCallData;
  
  Ext.override(Ext.direct.RemotingProvider, {
    getCallData: function(t) {
      var defaults = originalGetCallData.apply(this, arguments);
      
      return Ext.apply(defaults, {
        authenticity_token: '<%= form_authenticity_token %>'
      });
    }
  })
})();

All we're really doing here is adding 1 config item - an authenticity_token, but it takes a lot of setup code to make that happen. Check out Ext.decorate:


/**
 * @param {Function} klass The constructor function of the class to override (e.g. Ext.direct.RemotingProvider)
 * @param {String} property The name of the property the function to override is tied to on the klass' prototype
 * @param {Object} config An object that is Ext.apply'd to the usual return value of the function before returning
 */
Ext.decorate = function(klass, property, config) {
  var original = klass.prototype[property],
      override = {};
  
  override[property] = function() {
    var value = original.apply(this, arguments);
    
    return Ext.apply(value, config);
  };
  
  Ext.override(klass, override);
}

This lets us write the same override like this:


Ext.decorate(Ext.direct.RemotingProvider, 'getCallData', {
  authenticity_token: '<%= form_authenticity_token %>'
});

Much nicer, we just tell it what we want with no need for unwieldy boilerplate code. This method doesn't actually exist in Ext (though it would be good if something similar did), but you could define it yourself as above to keep such code nice and dry.
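As with Ext.override, nothing here really depends on Ext. Here's a minimal plain-JavaScript sketch of the same decorator; decorate and the Provider class are hypothetical stand-ins (for Ext.decorate and Ext.direct.RemotingProvider), and Object.assign, which needs an ES2015 engine, plays the role of Ext.apply.

```javascript
// Hypothetical plain-JavaScript version of Ext.decorate: wraps a prototype
// method and merges config into whatever object the original returns.
function decorate(klass, property, config) {
  var original = klass.prototype[property];

  klass.prototype[property] = function() {
    var value = original.apply(this, arguments);

    // Object.assign copies config's properties onto value, like Ext.apply
    return Object.assign(value, config);
  };
}

// Hypothetical class standing in for Ext.direct.RemotingProvider
function Provider() {}
Provider.prototype.getCallData = function(t) {
  return { action: t.action, method: t.method, data: t.data, type: 'rpc', tid: t.tid };
};

decorate(Provider, 'getCallData', { authenticity_token: 'abc123' });

var callData = new Provider().getCallData({ action: 'User', method: 'load', data: [1], tid: 7 });
console.log(callData.type, callData.authenticity_token); // rpc abc123
```

One design consequence: config is captured once, when decorate is called, and the same values are merged into every result. That suits a token rendered into the page, as in the example above, but anything that changes at runtime would need a function rather than a plain object.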

Ext.ux.Printer - printing for any ExtJS Component

Tue, 28 Jul 2009 22:02:00 GMT

After my recent foray into printing grids with ExtJS, I realised I needed to print some trees too. Seeing as some of the work was already done for the Grid example, it made sense to create a common API for printing any Ext.Component. And thus Ext.ux.Printer was born:


var grid = new Ext.grid.GridPanel({ /* just a normal grid */ });
var tree = new Ext.tree.ColumnTree({ /* just a normal column tree */ });

Ext.ux.Printer.print(grid);
Ext.ux.Printer.print(tree);

Each of the above opens a new window, renders some HTML (just a big table really), prints it and closes the window - all client side with no server side code required. Although trees and grids represent data quite differently internally, we can use the same API on Ext.ux.Printer to print them both.

Ext.ux.Printer uses Renderer classes to cope with a specific xtype, and adding Renderers for other components is easy. At the moment Ext.grid.GridPanel and Ext.tree.ColumnTree are supported out of the box, but let's see how we'd add support for printing the contents of an Ext.Panel:


/**
 * Prints the contents of an Ext.Panel
 */
Ext.ux.Printer.PanelRenderer = Ext.extend(Ext.ux.Printer.BaseRenderer, {

  /**
   * Generates the HTML fragment that will be rendered inside the <body> element of the printing window
   */
  generateBody: function(panel) {
    return String.format("<div class='x-panel-print'>{0}</div>", panel.body.dom.innerHTML);
  }
});

Ext.ux.Printer.registerRenderer("panel", Ext.ux.Printer.PanelRenderer);

This is probably the simplest print renderer of all - we're simply grabbing the HTML from inside the panel's body and returning it inside our own div. We subclassed Ext.ux.Printer.BaseRenderer, and in this case all we needed to do was provide an implementation for generateBody. Whatever this function returns is rendered inside the <body> tag of the newly-opened printing window.

Notice that we registered this renderer for all components with the xtype of 'panel'. Internally, Ext.ux.Printer examines the xtype chain of the component you pass it to print, and uses the first renderer that matches. As many Ext components inherit from Ext.Panel this can function as a catch-all renderer.

Here's how we'd use our new renderer:


var panel = new Ext.Panel({
  html: {
    tag: 'ul',
    children: [
      {tag: 'li', html: 'Item 1'},
      {tag: 'li', html: 'Item 2'},
      {tag: 'li', html: 'Item 3'}
    ]
  }
});

Ext.ux.Printer.print(panel);

Pretty straightforward. You can now print Ext.Panels the same way you'd print a Grid or a Tree. Take a look at the Grid Renderer and the ColumnTree Renderer for examples of rendering more advanced components.
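To make the xtype-chain matching a little more concrete, here's a standalone sketch of how such a lookup could work. It assumes the chain is available as a slash-separated string, from most generic to most specific, like the one Ext 3's Component.getXTypes() returns, and that the most specific registered xtype wins. registerRenderer and findRenderer here are hypothetical, not the extension's actual implementation, the xtype strings are illustrative, and plain strings stand in for renderer classes.

```javascript
// Hypothetical sketch of xtype-based renderer lookup.
var renderers = {};

function registerRenderer(xtype, renderer) {
  renderers[xtype] = renderer;
}

// Walk the chain backwards so the most specific registered renderer wins,
// with 'panel' acting as a catch-all for anything that extends Ext.Panel.
function findRenderer(xtypes) {
  var chain = xtypes.split('/');

  for (var i = chain.length - 1; i >= 0; i--) {
    if (Object.prototype.hasOwnProperty.call(renderers, chain[i])) {
      return renderers[chain[i]];
    }
  }
  return null;
}

registerRenderer('panel', 'PanelRenderer');
registerRenderer('grid', 'GridRenderer');

console.log(findRenderer('component/box/container/panel/grid'));      // GridRenderer
console.log(findRenderer('component/box/container/panel/treepanel')); // PanelRenderer
console.log(findRenderer('component/box/button'));                     // null
```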

As usual, all of the Ext.ux.Printer source is available on Github, and the README file there contains instructions for installation and usage.

Finally, when the printing window is opened it includes a stylesheet that it expects to find at "/stylesheets/print.css". There is a default print.css stylesheet included with the extension to get you started, and you can specify where to find this stylesheet like this:


Ext.ux.Printer.BaseRenderer.prototype.stylesheetPath = '/path/to/print/stylesheet.css';

ExtJS grid page size - letting the user decide

Tue, 28 Jul 2009 08:28:00 GMT

Sometimes you'll be using a Paging Toolbar on a grid and need to give the user the ability to change the number of records per page. One way of doing this is by adding a combobox to the toolbar:


var combo = new Ext.form.ComboBox({
  name : 'perpage',
  width: 40,
  store: new Ext.data.ArrayStore({
    fields: ['id'],
    data  : [
      ['15'], 
      ['25'],
      ['50']
    ]
  }),
  mode : 'local',
  value: '15',

  listWidth     : 40,
  triggerAction : 'all',
  displayField  : 'id',
  valueField    : 'id',
  editable      : false,
  forceSelection: true
});

We've set up a simple combo box which allows the user to choose between 15, 25 and 50 records per page. Now let's set up a Paging Toolbar, and a listener to take action when the user changes the selection in the combo box:


var bbar = new Ext.PagingToolbar({
  store:       store, //the store you use in your grid
  displayInfo: true,
  items   :    [
    '-',
    'Per Page: ',
    combo
  ]
});

combo.on('select', function(combo, record) {
  bbar.pageSize = parseInt(record.get('id'), 10);
  bbar.doLoad(bbar.cursor);
}, this);

Finally we'll roll it all together into a Grid:


var grid = new Ext.grid.GridPanel({
  //your grid setup here...

  bbar: bbar
});

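If the user needs to be able to enter her own page size, replace the ComboBox with an Ext.form.NumberField and attach the listener to one of the field's key events instead ('keyup' works better than 'keypress', which fires before the field's value has been updated). Text fields only fire key events when configured with enableKeyEvents: true, and the listener will need to read the page size from the field's getValue() rather than from a combo record.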
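One side effect of the select listener above: bbar.doLoad(bbar.cursor) reloads starting from the current first record, so after switching from 15 to 25 per page the toolbar may start at an offset like 30, which isn't a multiple of 25. If you'd rather keep pages aligned, you could pass an aligned offset instead, e.g. bbar.doLoad(alignedStart(bbar.cursor, bbar.pageSize)). The helper below is a hypothetical sketch of that arithmetic, not part of the PagingToolbar API.

```javascript
// Hypothetical helper: the start offset of the page that contains `cursor`
// under the new page size, so page boundaries stay multiples of pageSize.
function alignedStart(cursor, pageSize) {
  return Math.floor(cursor / pageSize) * pageSize;
}

console.log(alignedStart(30, 25)); // 25 - record 31 is on page 2 of 25
console.log(alignedStart(45, 50)); // 0
console.log(alignedStart(60, 15)); // 60
```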

Printing grids with Ext JS

Sun, 26 Jul 2009 15:43:00 GMT

Grids are one of the most widely used components in Ext JS, and often represent data that the user would like to print. As the grid is usually part of a wider application, simply printing the page isn't often a good solution.

You could attach a stylesheet with media="print", which hides all of the other items on the page, though this is rather application-specific, and a pain to update. It would be far better to have a reusable way of printing the data from any grid.

The way I went about this was to open up a new window, build a table containing the grid data in that window, then print it and close the window. It's actually pretty simple, and with a bit of CSS we can even get the printable view looking like it does in the grid.

Here's how you use it (this is a slightly modified version of the Array Grid Example):


var grid = new Ext.grid.GridPanel({
  store  : store,
  columns: [
      {header: "Company",      width: 160, dataIndex: 'company'},
      {header: "Price",        width: 75,  dataIndex: 'price', renderer: 'usMoney'},
      {header: "Change",       width: 75,  dataIndex: 'change'},
      {header: "% Change",     width: 75,  dataIndex: 'pctChange'},
      {header: "Last Updated", width: 85,  dataIndex: 'lastChange', renderer: Ext.util.Format.dateRenderer('m/d/Y')}
  ],
  title:'Array Grid',
  tbar : [
    {
      text   : 'Print',
      iconCls: 'print',
      handler: function() {
        Ext.ux.GridPrinter.print(grid);
      }
    }
  ]
});

So we've just set up a simple grid with a print button in the top toolbar. The button just calls Ext.ux.GridPrinter.print, which does all the rest. The full source code that this example was based upon can be found at http://extjs.com/deploy/dev/examples/grid/array-grid.js.

The source for the extension itself is pretty simple (download it here):

If you look at the source above you'll see it includes a 'print.css' stylesheet, which can be used to style the printable markup. The GridPrinter expects this stylesheet to be available at /stylesheets/print.css, but this is easy to change:


  //add this before you call Ext.ux.GridPrinter.print
  Ext.ux.GridPrinter.stylesheetPath = '/some/other/path/gridPrint.css';

Finally, here is some CSS I've used to achieve a grid-like display on the printable page:

html,body,div,dl,dt,dd,ul,ol,li,h1,h2,h3,h4,h5,h6,pre,form,fieldset,input,p,blockquote,th,td{margin:0;padding:0;}
img,body,html{border:0;}
address,caption,cite,code,dfn,em,strong,th,var{font-style:normal;font-weight:normal;}
ol,ul {list-style:none;}caption,th {text-align:left;}h1,h2,h3,h4,h5,h6{font-size:100%;}q:before,q:after{content:'';}

table {
  width: 100%;
  text-align: left;
  font-size: 11px;
  font-family: arial;
  border-collapse: collapse;
}

table th {
  padding: 4px 3px 4px 5px;
  border: 1px solid #d0d0d0;
  border-left-color: #eee;
  background-color: #ededed;
}

table td {
  padding: 4px 3px 4px 5px;
  border-style: none solid solid;
  border-width: 1px;
  border-color: #ededed;
}

This technique could easily be adapted to print any component that uses a store - DataViews, ComboBoxes, Charts - whatever. It just requires changing the generated markup and stylesheet.
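The core of the technique is turning column definitions and records into plain table markup. Here's a minimal sketch of that step in plain JavaScript; buildPrintTable and escapeHtml are hypothetical names, rows are assumed to be plain objects keyed by each column's dataIndex, and column renderers such as usMoney are ignored. In the browser you'd then write the result into the new window (for example with win.document.write) before calling win.print().

```javascript
// Escape values so record data can't inject markup into the print window.
// null/undefined become empty cells.
function escapeHtml(value) {
  return (value == null ? '' : String(value))
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// columns: [{header, dataIndex}], rows: plain objects keyed by dataIndex
function buildPrintTable(columns, rows) {
  var headerCells = columns.map(function(col) {
    return '<th>' + escapeHtml(col.header) + '</th>';
  }).join('');

  var bodyRows = rows.map(function(row) {
    var cells = columns.map(function(col) {
      return '<td>' + escapeHtml(row[col.dataIndex]) + '</td>';
    }).join('');
    return '<tr>' + cells + '</tr>';
  }).join('');

  return '<table><thead><tr>' + headerCells + '</tr></thead><tbody>' + bodyRows + '</tbody></table>';
}

var html = buildPrintTable(
  [{header: 'Company', dataIndex: 'company'}, {header: 'Price', dataIndex: 'price'}],
  [{company: 'AT&T Inc.', price: 31.61}]
);
console.log(html);
// <table><thead><tr><th>Company</th><th>Price</th></tr></thead><tbody><tr><td>AT&amp;T Inc.</td><td>31.61</td></tr></tbody></table>
```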

Ext.override - Monkey Patching Ext JS

Fri, 24 Jul 2009 08:00:00 GMT

Ext JS contains a function called Ext.override. Using this function allows you to add functionality to existing classes, as well as override properties of the class. For example, let's say we want to override how Ext.Windows are hidden:


Ext.override(Ext.Window, {
  hide: function() {
    //the contents of this function are now called instead of the default window hide function
  }
});

Using Ext.override changes the prototype of the class you are overriding - all instances of Ext.Window will now use the new hide function in the example above.

Overriding other classes can be dangerous, especially when they are classes from a library not under your control. For example, if the Ext.Window class was refactored in a later version, your overrides may no longer work. In some situations you might choose to go down the safer route of augmenting the existing functionality without overriding it. Here's one way we can achieve this using a closure:


(function() {
  var originalHide = Ext.Window.prototype.hide;

  Ext.override(Ext.Window, {
    hide: function() {
      //perform pre-processing
      alert("The window is about to close!");

      //call the original hide function
      originalHide.apply(this, arguments);

      //perform post-processing.
      alert("The window closed!!1");
    }
  });
})();

In the example above we set up a closure via an anonymous function which is executed immediately. This lets us keep a reference to the original hide function on Ext.Window. Underneath we perform the override itself, in which we provide our own logic.

The originalHide.apply(this, arguments) line is key to maintaining Ext.Window's original functionality. By calling the function's apply method with the Window's usual scope ('this') and the function's arguments 'array', we can wrap our own functionality before or after the original method.

Augmenting in this way is safer than simply overwriting the function, or copy & pasting Ext.Window's original hide function into your own, as you don't have to worry about breaking what Ext JS itself does (you're still responsible for making sure your own additions work after upgrading Ext though).

Be aware that this will affect all instances of Ext.Window (or whatever class you are overriding). If that isn't what you want, use Ext.extend to create your own subclasses instead.

Finally, note that you can use Ext.override on any class, not just the built-in Ext ones - all it does internally is call Ext.apply on the constructor function's prototype.
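For the subclassing route mentioned above, here's a rough plain-JavaScript sketch of the same wrap-the-original idea applied to a subclass rather than the shared prototype. It uses Object.create instead of Ext.extend, and BaseWindow/NoisyWindow are hypothetical classes; the point is only that plain BaseWindow instances are left alone.

```javascript
// Hypothetical stand-in for Ext.Window, just for this demo
function BaseWindow() { this.hidden = false; }
BaseWindow.prototype.hide = function() { this.hidden = true; };

// Subclass via plain prototypal inheritance (roughly what Ext.extend sets up)
function NoisyWindow() { BaseWindow.apply(this, arguments); }
NoisyWindow.prototype = Object.create(BaseWindow.prototype);
NoisyWindow.prototype.constructor = NoisyWindow;

// Only NoisyWindow instances get the extra behaviour; the parent method is
// still called with the same scope and arguments
NoisyWindow.prototype.hide = function() {
  this.log = ['about to hide'];
  BaseWindow.prototype.hide.apply(this, arguments);
  this.log.push('hidden');
};

var plain = new BaseWindow();
var noisy = new NoisyWindow();
plain.hide();
noisy.hide();

console.log(plain.hidden, plain.log);          // true undefined
console.log(noisy.hidden, noisy.log.join(', ')); // true about to hide, hidden
```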

Ext JS iterator functions

Thu, 23 Jul 2009 12:46:00 GMT

Ext JS has a number of handy iterator functions. Some, like Ext.each, you probably already know about, but there are a few others lurking around which can be useful in saving yourself a few lines of code. First, let's recap Ext.each:

Ext.each

Ext.each applies a function to each member of an array. It's basically a more convenient form of a for loop:


var people = ['Bill', 'Saul', 'Gaius'];

//using each to detect Cylons:
Ext.each(people, function(person, index) {
  var cylon = (index + 1) % 2 == 0; //every second man is a toaster
  alert(person + (cylon ? ' is ' : ' is not ') + 'a fraking cylon');
});

//is the same as
for (var i=0; i < people.length; i++) {
  var person = people[i];
  var cylon = (i + 1) % 2 == 0; //every second man is a toaster

  alert(person + (cylon ? ' is ' : ' is not ') + 'a fraking cylon');
};

Ext.iterate

Ext.iterate is like Ext.each for non-array objects. Use it wherever you would normally use a for .. in loop:


var ships  = {'Bill': 'Galactica', 'Laura': 'Colonial One'};

Ext.iterate(ships, function(key, value) {
  alert(key + "'s ship is the " + value);
});

//is the same as
for (var key in ships) {
  var value = ships[key];
  alert(key + "'s ship is the " + value);
}

Using Ext.iterate with an array is the same as calling Ext.each. Each and Iterate both take an optional third parameter, which is the scope to run the function in. Another advantage over using the for construct is that you can easily reuse the same function:


var myFunction = function(item, index) {
  //does some clever thing
}

Ext.each(people, myFunction);
Ext.each(['another', 'array'], myFunction);
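To make the scope argument and function reuse concrete, here is a rough plain-JavaScript approximation of the two helpers. The each and iterate functions below are hypothetical stand-ins written for illustration, assuming the (item, index) and (key, value) callback signatures shown above; they are not Ext's source and may differ from it in edge cases:

```javascript
// Rough stand-in for Ext.each: calls fn(item, index) for each array member,
// with `this` set to scope if one is given
function each(array, fn, scope) {
  for (var i = 0; i < array.length; i++) {
    fn.call(scope, array[i], i);
  }
}

// Rough stand-in for Ext.iterate on plain objects: calls fn(key, value)
// for each of the object's own properties
function iterate(obj, fn, scope) {
  for (var key in obj) {
    if (Object.prototype.hasOwnProperty.call(obj, key)) {
      fn.call(scope, key, obj[key]);
    }
  }
}

// The same function is reused for two arrays, with a collector object as its scope
var collector = {seen: []};
var record = function(item, index) {
  this.seen.push(index + ':' + item);
};

each(['Bill', 'Saul', 'Gaius'], record, collector);
each(['another', 'array'], record, collector);

var ships = {'Bill': 'Galactica', 'Laura': 'Colonial One'};
var descriptions = [];
iterate(ships, function(key, value) {
  descriptions.push(key + "'s ship is the " + value);
});
```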

Ext.pluck

Ext.pluck grabs the specified property from an array of objects:


var animals = [
  {name: 'Ed', species: 'Unknown'},
  {name: 'Bumble', species: 'Cat'},
  {name: 'Triumph', species: 'Insult Dog'}
];

Ext.pluck(animals, 'species'); //returns ['Unknown', 'Cat', 'Insult Dog']
Ext.pluck(animals, 'name'); //returns ['Ed', 'Bumble', 'Triumph']
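Outside Ext, the same result falls out of Array.prototype.map (standardised in ES5). A minimal sketch, where pluck is a hypothetical helper name rather than a built-in:

```javascript
// Hypothetical helper: returns the given property from each object in the array
function pluck(array, property) {
  return array.map(function(item) {
    return item[property];
  });
}

var animals = [
  {name: 'Ed', species: 'Unknown'},
  {name: 'Bumble', species: 'Cat'},
  {name: 'Triumph', species: 'Insult Dog'}
];

var species = pluck(animals, 'species'); // ['Unknown', 'Cat', 'Insult Dog']
var names = pluck(animals, 'name');      // ['Ed', 'Bumble', 'Triumph']
```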

Ext.invoke

Invoke allows a function to be applied to all members of an array, and returns the results. Using our animals object from above:


var describeAnimal = function(animal) {
  return String.format("{0} is a {1}", animal.name, animal.species);
}

var describedAnimals = Ext.invoke(animals, describeAnimal);
console.log(describedAnimals); // ['Ed is a Unknown', 'Bumble is a Cat', 'Triumph is a Insult Dog'];

Ext.invoke performs a similar job to Ruby's collect method in making it easy to transform arrays. Any additional arguments passed to the Ext.invoke call will be passed as arguments to your function, in this case the describeAnimal function. Obviously your functions will be much more grammatically accurate than mine.
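For comparison, outside Ext the same transformation can be written with Array.prototype.map (ES5), which is the closest plain JavaScript equivalent of Ruby's collect. A minimal sketch, using string concatenation in place of Ext's String.format:

```javascript
var animals = [
  {name: 'Ed', species: 'Unknown'},
  {name: 'Bumble', species: 'Cat'},
  {name: 'Triumph', species: 'Insult Dog'}
];

// Same describer as above, without String.format
var describeAnimal = function(animal) {
  return animal.name + ' is a ' + animal.species;
};

// map calls describeAnimal for each item and collects the return values
var describedAnimals = animals.map(describeAnimal);
// ['Ed is a Unknown', 'Bumble is a Cat', 'Triumph is a Insult Dog']
```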

Ext.partition

Ext.partition splits an array into two sets based on a function you provide:


var trees = [
  {name: 'Oak',    height: 20},
  {name: 'Willow', height: 10},
  {name: 'Cactus', height: 5}
];

var isTall = function(tree) {return tree.height > 15};

Ext.partition(trees, isTall);

//returns:
[
  [{name: 'Oak', height: 20}], 
  [{name: 'Willow', height: 10}, {name: 'Cactus', height: 5}]
]

The partition call above returns a two-dimensional array: the first element contains all of the items for which the function returned true (tall trees in this case), and the second contains the items for which it returned false.
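The logic behind a partition is a single pass that drops each item into one of two buckets. Here is a plain-JavaScript sketch of that idea; partition here is a hypothetical standalone function, not Ext's implementation:

```javascript
// Splits array into [matches, nonMatches] according to predicate
function partition(array, predicate) {
  var matches = [], nonMatches = [];

  for (var i = 0; i < array.length; i++) {
    // push into whichever bucket the predicate selects
    (predicate(array[i], i) ? matches : nonMatches).push(array[i]);
  }
  return [matches, nonMatches];
}

var trees = [
  {name: 'Oak',    height: 20},
  {name: 'Willow', height: 10},
  {name: 'Cactus', height: 5}
];

var isTall = function(tree) { return tree.height > 15; };
var result = partition(trees, isTall);
// [[Oak], [Willow, Cactus]]
```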

Math functions

Finally, we have some simple math-related functions:


var numbers = [1, 2, 3, 4, 5];
Ext.min(numbers); //1
Ext.max(numbers); //5
Ext.sum(numbers); //15
Ext.mean(numbers); //3

While the built-in functions don't cater for all situations, they're useful to know about, and usually offer a more elegant approach than a hand-written 'for' loop.
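If you ever need these outside Ext, plain JavaScript gets most of the way there with Math.min, Math.max and the ES5 Array.prototype.reduce. A small sketch (passing a very large array through apply may run into engine limits on argument counts):

```javascript
var numbers = [1, 2, 3, 4, 5];

// Math.min/max take individual arguments, so apply spreads the array into them
var min = Math.min.apply(Math, numbers); // 1
var max = Math.max.apply(Math, numbers); // 5

// reduce folds the array into a running total, starting from 0
var sum = numbers.reduce(function(total, n) { return total + n; }, 0); // 15
var mean = sum / numbers.length; // 3
```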

Read my BDD article in this month's JS Magazine

Wed, 10 Jun 2009 15:44:00 GMT

I have an article on Behaviour Driven Development for JavaScript in June's edition of the excellent JavaScript Magazine.

If you haven't seen or read the magazine before (it's quite new), it's well worth the few dollars charged. The magazine format allows for in-depth articles that require more space, time and effort to write than a typical blog post, and which therefore often go unwritten.

The thrust of my article is that too much of our JavaScript goes untested, but that nowadays it's easy to fix that. I go through an example of a client-side shopping cart, using the JSpec BDD library. Even if you don't buy/read the magazine, I highly recommend checking out JSpec and other libraries like it. As JavaScript powered applications become the norm, BDD will only become more important in ensuring our applications work properly, so now is a good time to start.

Also in this month's issue is a guide to using the Canvas tag, tips on how to use build scripts to optimise your JavaScript for each environment, AJAX security pointers and a roundup of community news.

Darwin, Humanism and Science

Sun, 07 Jun 2009 16:17:00 GMT

On Saturday I had the good fortune to be able to attend a conference entitled "Darwin, Humanism and Science", held at London's Conway Hall. For those not able to attend here is a short roundup of what happened:

Richard Dawkins starts us off

The conference kicked off with a quick introduction from BHA President Polly Toynbee, after which Professor Dawkins took to the stage. His lecture revolved around the concluding paragraph of Darwin's On the Origin of Species, which can be read online for free (the relevant passage starts "Thus, from the war of nature ..."). Dawkins analysed each segment of the text in turn, giving us his insights into its meaning and slipping in some fascinating information about our modern-day understanding of evolution, such as how we know that all species in the world today must be descended from a single progenitor.

The professor left some time for questions at the end of his lecture. He had commented on the lamentable state of the public's understanding of science, proffering the alarming statistic that some 18% of the British population believes that the Earth orbits the Sun once a month (presumably we go around faster in February), which led me to ask him what we can do to combat this. His answer was to "get out more".

Referring to the role scientists have to play in the public's awareness and understanding of science, rather than (I hope) to my social life, he made the point that scientists and educators must make greater efforts to reach out to the public and disseminate not only the knowledge that modern science has obtained, but the joy that this knowledge can bring. It was a point that was returned to time and again throughout the day.

Insidious Creationism in Education

Following Professor Dawkins were two quick talks about the teaching of evolution in schools - first from a European perspective from Professor Charles Susanne, and then from a British one from James Williams. Though both highlighted the growing influence Creationist organisations are having on educational materials, Mr Williams' speech was for me the more alarming:

Quickly firing through a series of ridiculous materials showing how children and dinosaurs once lived and played together, and even an endearing image of Jesus cuddling a small Velociraptor, Williams showed how creationist books, comics and literature represent an "intellectual abuse of children". Entitled "Insidious Creationism", his talk opened our eyes to the battle being waged over children's education, in this country and around the world. I for one am very pleased that we have people like Williams fighting in our corner, and for his troubles he was presented with the rather dubious prize of an Atlas of Creation.

Human understanding of Evolution

After lunch we were treated to talks from Johan De Smedt and Dr Michael Schmidt-Salomon. Johan's talk revolved around the three themes of essentialism, teleology and the design stance. Tackling these in turn he described the biases inherent within us that give these ideas more prominence in our mental model of the world than they deserve, especially when we are young.

Dr Schmidt-Salomon rebutted the idea that evolution can be objected to on moral grounds. Though it may seem obvious that a moral objection to a natural phenomenon does not make it any less real, he reminded us that there are those who disagree. His most impressive moment, though, was in revealing his efforts to turn Ascension Day into Evolution Day, aided by a spectacularly bizarre music video featuring Charles Darwin as an unlikely rock star:

[youtube http://www.youtube.com/watch?v=wbIa9fZuTFA&hl=en&fs=1&]

Brilliant.

Hinduism and The Two Cultures

For me an unexpected highlight of the day came in the form of Babu Gogineni's description of the devastating effect that some interpretations of Hinduism are having on science in India. He described how many Hindus believe that modern science backs up Hinduism's central tenets, and can therefore turn their backs on further progress made by the scientific community. It was a startling and eye-opening description, and one could feel his frustration at how quackery and superstition are considered more important (or at least more profitable) than science and understanding in India today.

The conference was rounded off by what seemed a short talk from Professor A C Grayling. This was the first time I had heard Grayling speak, and the calm lucidity with which he spoke made his 45 minutes seem more like 5. Speaking on the 'two cultures' - that of science and that of the humanities - he referred to a lecture given by C P Snow some 50 years ago decrying the divergence of these two cultures, and the widening communication gap between them. Snow's original point was that this divergence was getting in the way of solving the world's problems - 50 years later Grayling points out that we still have some way to go in closing that gap.

The 'Special' Dinner

In the evening a special dinner was laid on for the speakers and delegates attending the conference (today was just one day in a week of Humanism conferences). For some reason they let some of the unwashed masses in too, and so it was that I sat down with a delightful group of fellow conference-goers to enjoy a good meal punctuated by good conversation.

After we had eaten, Professor Grayling presented Professor Dawkins with an award in recognition of his efforts in spreading rationality and clear thinking around the world, and in return Dawkins read out a modern-day episode of Jeeves and Wooster, albeit with a rather atheistic stance. I'm not sure whether or not he penned the parable himself, but it was extremely well written and its British humour well received.

All too soon the coffee came round and it was time to leave. Overall the day was very well run and extremely enjoyable. A wise gentleman on my table was moved to remark that "it was the best 8 quid I've ever spent". Amen.

'function' in JavaScript - operator vs statement

Wed, 29 Apr 2009 13:54:00 GMT

In JavaScript we have at least 4 ways of defining functions:

function myFunction() { alert('hai!'); }
var myFunction = function() { alert('hai!'); }
var myFunction = function myFunctionName() { alert('hai!'); }
var myFunction = new Function("alert('hai!');")

These are not all the same, and the crucial thing here is the word 'function' as used in each case. In the first example we're using the function statement, and in the second and third examples we're using the function operator. We'll come back to the fourth example later.

So what's the difference between the function statement and the function operator? Well first we need to understand a bit about anonymous functions. Most of us are familiar with using anonymous functions as event listeners - something like this:

this.on('render', function() {... do some stuff when 'this' has rendered ...});

In this example we've passed in a function without a name as a listener callback. But what do we mean when we say a function does have a name? Do we mean this:

var myFunction = function() {... do some stuff ...};

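No we don't. Assigning a function to a variable does not give it a name. The function assigned to our variable above is still an anonymous function. (Engines implementing ES2015 and later will infer a name property of 'myFunction' from the assignment, but the function itself is still declared without a name.) To give a function a name we need to do something like this: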

var myFunction = function myFunctionName() {... do some stuff ...};

Now we have declared a function with the name myFunctionName and assigned it to the variable myFunction. Giving a function a name in this way adds a read-only name property to it:

var myVar = function captainKirk() {... do some stuff ...};

alert(myVar.name); //alerts 'captainKirk'

//we can't update it though
myVar.name = 'williamShatner';
alert(myVar.name); //still 'captainKirk'
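One practical consequence of naming a function expression is that the name is also bound inside the function's own body (and only there), so the function can call itself without relying on whatever variable it was assigned to. A small sketch, with countdown and tick as illustrative names:

```javascript
var countdown = function tick(n) {
  // `tick` refers to this very function, but only inside its own body
  return n <= 0 ? 'liftoff' : tick(n - 1);
};

var fnName = countdown.name;   // 'tick'
var tickOutside = typeof tick; // 'undefined': the name doesn't leak out
var result = countdown(3);     // 'liftoff'
```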

Coming back to our very first example, we can see that we're using a different form here - the function statement:

function myFunction() { alert('hai!'); }

Under the hood, what this is actually doing is something like this:


var myFunction = function myFunction() { alert('hai!'); };

The function statement creates a named function and assigns it to a variable of the same name. Note that although the function name and the variable name are the same here, the name belongs to the function, not the variable:

function myFunction() { alert('hai!'); }
alert(myFunction.name); //alerts 'myFunction'

//assigning this function to another variable preserves the function name
var myVar = myFunction;
alert(myVar.name); //alerts 'myFunction'
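The practical difference between the two forms shows up in hoisting: a function statement is hoisted together with its body, while with the function operator only the var declaration is hoisted and the function is assigned when that line runs. A small sketch (demo, statementFn and operatorFn are illustrative names):

```javascript
function demo() {
  var results = [];

  // The function statement below is hoisted with its body, so it already exists here
  results.push(typeof statementFn); // 'function'

  // Only the variable is hoisted here; its value is still undefined
  results.push(typeof operatorFn);  // 'undefined'

  function statementFn() { return 'hai!'; }
  var operatorFn = function() { return 'hai!'; };

  // After the assignment line has run, the operator form is available too
  results.push(typeof operatorFn);  // 'function'
  return results;
}

var results = demo();
```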

Let's take a look at the last of our four initial examples:

var myFunction = new Function("alert('hai!');")

Functions defined this way are always anonymous, and cannot be given a name. In general you shouldn't define functions this way, for several reasons:

The function body is compiled from a string every time the Function constructor is called, rather than once when the script is parsed, which makes creating functions this way slow.
Functions defined this way do not inherit the current scope. The only scope they inherit is the global scope, which means they do not have access to any variables or functions in your current scope chain.
Defining functions this way requires the body to be entered as a string, which should sicken you enough not to use it.
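The scope point is easy to check. In this sketch (makeFunctions and secret are illustrative names), a function created with the function operator can see a local variable, while one created with new Function inside the very same function cannot:

```javascript
function makeFunctions() {
  var secret = 'local';

  // Function operator: closes over the enclosing scope, so it can see `secret`
  var viaOperator = function() { return typeof secret; };

  // Function constructor: the body is compiled in the global scope only
  var viaConstructor = new Function('return typeof secret;');

  return [viaOperator(), viaConstructor()];
}

var types = makeFunctions(); // ['string', 'undefined'], assuming no global named `secret`
```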

One last thing to note is that if you use the function operator, it has to be within the context of an expression. For example you can't do this:

function() {alert('hai!');}

That doesn't work because it's not part of an expression: at the start of a statement, 'function' is parsed as the function statement, which requires a name, so you get a syntax error. If you want to run an anonymous function without assigning it to a variable, wrap it in parentheses to turn it into an expression and call it straight away:

(function() {alert('hai!');})();

For further reading on this check out the Mozilla function reference docs.

The case for Ext.applyOnly

Thu, 23 Apr 2009 09:49:00 GMT

Update: Ext 3.0RC1 has included something like this, but called Ext.copyTo. Obviously my name is better though.

We should have something like this:


Ext.applyOnly(this, config, ['width', 'height']);

You could use this every time you write a method or class that requires a config Object as one of its parameters. These methods ought to apply only those properties of the config object they actually need, but usually this is just done with an Ext.apply(this, config), which means anything in your object could be overwritten by the config object. Sometimes that's a good thing, but sometimes it's definitely not.

Ext.applyOnly() applies only a whitelist of the properties in the config object. These are specified by an optional third argument, which is an array of property names. Here's how you might write applyOnly:


/**
 * Applies only a pre-specified set of properties from one object to another
 * @param {Object} receiver The object to copy the properties to
 * @param {Object} sender The object to copy the properties from
 * @param {Array} whitelist The whitelist of properties to copy (e.g. ['width', 'height'])
 * @return {Object} The receiver object, with any of the whitelisted properties overwritten if they exist in sender
 */
Ext.applyOnly = function(receiver, sender, whitelist) {
  if (receiver && sender) {
    Ext.each(whitelist || [], function(item) {
      if (typeof sender[item] != 'undefined') receiver[item] = sender[item];
    }, this);
  };
  
  return receiver;
};

While you can't stop code maliciously overwriting properties this way, it would stop people from unknowingly overwriting your object's properties. They could overwrite them manually, but they'll do this knowing that this wasn't an intended use for the class. Let's have a look at an extension that would do a great job opening popups telling people they've won lots of money:


/**
 * Pops up windows telling lucky visitor she's won big money!!!!
 */
Ext.ux.WinnerEarnings = Ext.extend(Ext.util.Observable, {
  /**
   * @property accessibleProperties
   * @type Array
   * All properties intended to be mass-updatable
   */
  accessibleProperties: ['height', 'width'],
  
  /**
   * Message to show lucky winners!! You can't change this!!!!!
   */
  message: "You're the 1000000000000th visitor!!!!!!!1 Click here to claim money. Now!!!",
  
  constructor: function(config) {
    //apply only the fields that are deemed writable
    Ext.applyOnly(this, config, this.accessibleProperties);
    
    Ext.ux.WinnerEarnings.superclass.constructor.apply(this, arguments);
    
    Ext.applyIf(this, {
      version:     2.9,
      coolFeature: new Ext.util.TaskRunner()
    });
    
    //nag the lucky winner every second
    this.coolFeature.start({
      interval: 1000,
      scope:    this,
      run: function() {
        //version, coolFeature, updateDetails, closable and close won't be sent to Ext.Window
        new Ext.Window(Ext.applyOnly({}, this, ['height', 'width', 'message'])).show();
      }
    });
  },
  
  /**
   * Updates this WinnerEarnings opportunity with options from the supplied object
   * @param {Object} updates An object containing updates to make to this precious opportunity
   * @return {Ext.ux.WinnerEarnings} The WinnerEarnings object
   */
  updateDetails: function(updates) {
    return Ext.applyOnly(this, updates, this.accessibleProperties);
  },
  
  //secret tricks to let the user stop the popups
  closable: false,
  close: function() {
    this.coolFeature.stopAll();
  }
})

How it works:


var myObj = new Ext.ux.WinnerEarnings({height: 200, width: 150});

myObj.updateDetails({width: 300, message: "My Message"})
myObj.width;   // => 300
myObj.message; // => "You're the 1000000000000th visitor!!!!!!!1 Click here to claim money. Now!!!"

//updating message didn't work, but we can still do it manually
myObj.message = "My message";
myObj.message; // => "My message"

In my example class I've added the whitelist as an accessibleProperties property on the class, which makes it easy for others to see what they should and should not be updating.

In this example we're also sanitizing output with applyOnly. WinnerEarnings lightly wraps a series of Ext.Windows, and we'd like to be able to pass our WinnerEarnings object as their config. We want to make sure we're not passing our 'closable' property, 'close()' function and others to the Ext.Window constructor, so we filter the config through a whitelist too, inside the run() function in our constructor.

Check out the unit tests for the function to see a couple more use cases. Here's one final example - sanitizing output from a function:


var myFunction = function(input) {
  //do some stuff to make input useful
  
  //guarantee our returned object only has relevant properties
  return Ext.applyOnly({}, input, ['important-thing-1', 'important-thing-2']);
}
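To try the whitelist idea outside Ext, here is a plain-JavaScript sketch of the same copy. This applyOnly is a standalone hypothetical function mirroring the Ext.applyOnly written above, using Array.prototype.forEach in place of Ext.each:

```javascript
// Copies only the whitelisted properties that are defined on sender
function applyOnly(receiver, sender, whitelist) {
  if (receiver && sender) {
    (whitelist || []).forEach(function(key) {
      if (typeof sender[key] !== 'undefined') receiver[key] = sender[key];
    });
  }
  return receiver;
}

var popup = {
  height:  200,
  width:   150,
  message: "You're the 1000000000000th visitor!!!!!!!1 Click here to claim money. Now!!!"
};

// message is not whitelisted, so this update to it is ignored
applyOnly(popup, {width: 300, message: 'My Message'}, ['height', 'width']);

// sanitizing output: only the whitelisted keys end up on the new object
var windowConfig = applyOnly({}, popup, ['height', 'width']);
```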

Force Ext.data.Store to use GET

Wed, 11 Feb 2009 16:23:00 GMT

Say you have a simple Ext store:


var myStore = new Ext.data.Store({
  url:    '/widgets.json',
  reader: someReader
});

Which you put in a grid, along with a paging toolbar:


var myGrid = new Ext.grid.GridPanel({
  store:   myStore,
  columns: [.....],
  bbar:    new Ext.PagingToolbar({
    store: myStore
  })
  ... etc ...
});

Your grid loads up and the store performs a GET request to /widgets.json, which returns your widgets along with a total (see an example).

Awesome, but now we click one of the paging buttons on the PagingToolbar and we have a problem - our request has turned into POST /widgets.json, with "start=20" and "limit=20" as POST params.

Now we don't really want that - we're not POSTing any data to the server after all, we're just trying to GET some. If you're using a nice RESTful API on your server side this may cause you a real problem, as POST /widgets will likely be taken as an attempt to create a new Widget.

Luckily, as with most things the solution is simple if you know how. An Ext.data.Store delegates loading its data off to an Ext.data.DataProxy subclass. By default your store will create an Ext.data.HttpProxy using the url: '/widgets.json' you passed in your store config. To make sure your stores are always requesting data using GET, just provide a proxy like this:


var myStore = new Ext.data.Store({
  proxy: new Ext.data.HttpProxy({
    url:    '/widgets.json',
    method: 'GET'
  }),
  reader: someReader
});
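With method: 'GET' the paging parameters travel in the query string rather than the request body. As a rough illustration (not Ext code; the real request may carry extra parameters), the second page would be requested from a URL like the one built here with the standard URLSearchParams API. pagedUrl is a hypothetical helper:

```javascript
// Builds the URL a GET paging request would hit, with start/limit in the query string
function pagedUrl(baseUrl, start, limit) {
  var params = new URLSearchParams({start: String(start), limit: String(limit)});
  return baseUrl + '?' + params.toString();
}

var url = pagedUrl('/widgets.json', 20, 20); // '/widgets.json?start=20&limit=20'
```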

Adding a loading mask to your ExtJS application

Sun, 01 Feb 2009 16:55:00 GMT

Adding a loading mask like the one on the ExtJS API application is a nice way of showing the user that something is happening while their browser downloads the source code. It's also extremely easy to do.

First, place the following HTML above all of your javascript include tags, ideally just after the <body> tag:

<div id="loading-mask"></div>
<div id="loading">
  <div class="loading-indicator">
    Loading...
  </div>
</div>

If you are currently including JavaScript files inside the <head>, don't: put them at the bottom of the page, after the mask markup, so the mask appears while the scripts are still downloading.

With a bit of CSS (see below), this provides a white mask over all underlying content, and a loading message. When everything has loaded, remove the mask like this:


Ext.onReady(function() {
  setTimeout(function(){
    Ext.get('loading').remove();
    Ext.get('loading-mask').fadeOut({remove:true});
  }, 250);
});

The above simply fades out the HTML elements to reveal the now ready page. The setTimeout call gives your app a little time to render, which is useful if you're doing something like pulling external content down from the server.

Finally, here's the CSS I use to style up the loading mask. You'll need to download a loading image and stick it in the appropriate directory.

#loading-mask {
  position: absolute;
  left:     0;
  top:      0;
  width:    100%;
  height:   100%;
  z-index:  20000;
  background-color: white;
}

#loading {
  position: absolute;
  left:     50%;
  top:      50%;
  padding:  2px;
  z-index:  20001;
  height:   auto;
  margin:   -35px 0 0 -30px;
}

#loading .loading-indicator {
  background: url(../images/loading.gif) no-repeat;
  color:      #555;
  font:       bold 13px tahoma,arial,helvetica;
  padding:    8px 42px;
  margin:     0;
  text-align: center;
  height:     auto;
}

Why you should be using History in your ExtJS applications

Fri, 23 Jan 2009 00:59:00 GMT

I've been making a few updates to the ExtJS API documents application recently. The actual updates include remembering which tabs you have open and using Ext.History to go between tabs (you can follow the forum post or see a beta version).

That's not quite ready yet, but it has made very clear to me that any ExtJS application with more than one view should be using Ext.History. With History we get URLs inside the application itself, which we can parse and dispatch accordingly. For example, I'm using a Rails-like Router, which lets you define an internal URL map like this:


map.connect(":controller/:action/:id");
map.connect(":controller/:action");

The router knows how to decode urls based on the regular expression-like syntax above, and parse the matches into an object - for example:


#users/new    <= becomes {controller: 'users', action: 'new'}
#users/edit/2 <= becomes {controller: 'users', action: 'edit', id: 2}
#colours      <= becomes {controller: 'colours'}
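The router's source isn't shown here, but the decoding step can be sketched in a few lines. This is a hypothetical minimal matcher, not the Router used above. It assumes each pattern segment is either a literal or a :name placeholder, that a token must have exactly as many segments as the pattern, and it adds a one-segment ':controller' pattern so that #colours matches. Captured values stay strings, so id comes back as '2' rather than 2:

```javascript
// Tries each pattern in order and returns the params from the first match,
// or null if nothing matches
function decode(patterns, token) {
  var parts = token.replace(/^#/, '').split('/');

  for (var i = 0; i < patterns.length; i++) {
    var segments = patterns[i].split('/');
    if (segments.length !== parts.length) continue;

    var params = {}, matched = true;
    for (var j = 0; j < segments.length; j++) {
      if (segments[j].charAt(0) === ':') {
        params[segments[j].slice(1)] = parts[j]; // placeholder: capture the value
      } else if (segments[j] !== parts[j]) {
        matched = false;                         // literal segment mismatch
        break;
      }
    }
    if (matched) return params;
  }
  return null;
}

var routes = [':controller/:action/:id', ':controller/:action', ':controller'];

decode(routes, '#users/edit/2'); // {controller: 'users', action: 'edit', id: '2'}
decode(routes, '#colours');      // {controller: 'colours'}
```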

You can of course define any url matching scheme using the connect() function. I then use a simple Dispatcher, which looks at the decoded parameters. It finds the appropriate controller and calls that action on the controller, passing any other parameters as arguments. For example:


#users/new      <= calls UsersController's "new" action
#colours/edit/2 <= calls ColoursController's "edit" action, with {id: 2} as the argument

And so on. Each controller knows what to do for that action. It's easy then to say to someone "go to http://myapp.com/admin#users/152/comments" - which will take them straight to the comments that user 152 has written. Compare that with saying: "go to http://myapp.com/admin, then click the List Users tab, then find the user called Joe Bloggs, then double click the bubble icon next to his name". It's obvious which approach is better.
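The dispatch step can be sketched the same way. The controllers registry, UsersController and its actions below are all hypothetical, as is the fallback to an 'index' action when none is given. The sketch looks up '<Name>Controller', calls the named action, and passes the remaining params as its argument:

```javascript
// Hypothetical controller registry
var controllers = {
  UsersController: {
    'new': function() { return 'new user form'; },
    edit:  function(args) { return 'editing user ' + args.id; }
  }
};

// 'users' -> 'UsersController'
function controllerName(name) {
  return name.charAt(0).toUpperCase() + name.slice(1) + 'Controller';
}

function dispatch(params) {
  var controller = controllers[controllerName(params.controller)];
  var action = controller && controller[params.action || 'index'];
  if (typeof action !== 'function') return null; // unknown route

  // everything except controller and action is passed through as the argument
  var args = {};
  for (var key in params) {
    if (key !== 'controller' && key !== 'action') args[key] = params[key];
  }
  return action.call(controller, args);
}

dispatch({controller: 'users', action: 'edit', id: '2'}); // 'editing user 2'
```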

You don't even need to use something as elaborate as a router; a simple switch statement or some regular expressions would be enough for many applications. Once you've got Ext.History set up, you could do something as simple as:


//decodes a url and decides how to dispatch it
var dispatch = function(token) {
  switch (token) {
    case "users":        displayUsers();   break;
    case "users/new":    displayNewUser(); break;
    case "users/2/edit": editUser(2);      break;
    default:             displayDefault(); break;
  };
};
Ext.History.on('change', dispatch);

//Call dispatch on initial page load as Ext.History's change event is not fired here
Ext.History.init(function() {
  var token = document.location.hash.replace("#", "");
  dispatch(token);
});

Obviously you wouldn't hard-code user IDs like that, but it's easy to see how to roll your own. With just a few lines of code you've decoded a URL into a function to call, which can do anything you need it to. All your internal navigation needs to do is call Ext.History.add("some/new/url"), which will then be picked up by your dispatch code.

It's important to only route like this for safe actions (i.e. actions which display data rather than change it), so that data-changing actions are not repeated when the user navigates back and forth. This is equivalent to using GET and POST correctly in normal web applications.

When the simplest implementation takes just a few lines of code, what reason could there be not to be using it?

ExtJS Solitaire

Tue, 13 Jan 2009 20:22:00 GMT

Update: We recently released the updated Touch Solitaire for Sencha Touch.

For a bit of fun over Christmas I thought I'd try my hand at writing Solitaire using the ExtJS library. The results of my efforts can be seen over at http://solitaire.edspencer.net.

It's reasonably complete, with the familiar drag and drop moving of cards (and stacks of cards). Most of the interface is custom built, with classes representing Cards, Stacks, the Pack, etc. The main motivation for creating this is to give a real-world example of using Drag and Drop with Ext JS, as documentation for it can be hard to come by. The full source of the game can be found on github, and I encourage people to take a look at and/or improve the code if they wish.

A few stats: the game comes to 1300 lines of code, including generous comments and whitespace. It's 15k minified, and uses a custom Ext build. It took roughly 25 hours to put together, which was mostly spent researching how to use Ext's many D&D classes.

The reason I'm releasing it now is that I'm currently working on a much larger, more exciting open source ExtJS project which I want to concentrate on before releasing. If anyone wants to pick this up feel free to fork the code on Github or get in touch in the comments or in #extjs on IRC.

ExtJS Textmate bundle

Sat, 10 Jan 2009 15:58:00 GMT

Update 2: I've recently cleaned up the bundle, removing stale snippets. It's now located at https://github.com/edspencer/Sencha.tmbundle

Update: Added extra instructions for downloading the bundle instead of git cloning it. Thanks to TopKatz for his help.

I develop on both OSX and Windows machines, and my editors of choice are Textmate and the excellent Windows clone E. One of the great things about Textmate is its bundle support, which allows you to create reusable code snippets (among other things).

I've got a good collection of these built up so thought I'd make them available on Github. You can install it like this:

Mac OSX:

cd ~/Library/Application\ Support/TextMate/Bundles
git clone git://github.com/edspencer/Sencha.tmbundle.git

Windows:

cd C:\Documents and Settings\{YOUR USERNAME}\Application Data\e\Bundles
git clone git://github.com/edspencer/Sencha.tmbundle.git

If you don't have git installed you can simply download the bundle as a zip file, and extract it into the directory as above. You need to rename the extracted directory to something like extjs.tmbundle or it won't show up. If you do go the git route you can of course cd into that git directory at any point and use git pull to update to the latest bundle version.

I'll give one example of the usefulness of snippets like these; here's the Ext.extend snippet from the bundle:

/**
 * @class ${1:ClassName}
 * @extends ${2:extendsClass}
 * ${5:Description}
 */
${1:ClassName} = function(config) {
  var config = config || {};
 
  Ext.applyIf(config, {
    $0
  });
 
  ${1:ClassName}.superclass.constructor.call(this, config);
};
Ext.extend(${1:ClassName}, ${2:extendsClass});

${3:Ext.reg('${4:xtype}', ${1:ClassName});}

To use this you can just type 'extend' into a JS file in TextMate/E and press tab. The snippet takes you through a few editable areas such as the name of your new class, the name of the class you're extending, xtype definition and description, then dumps the cursor inside the Ext.applyIf block. The actual characters typed are these: extend [tab] MyWindow [tab] Ext.Window [tab] [tab] mywindow [tab] Special window class [tab]

Which produces this:


/**
 * @class MyWindow
 * @extends Ext.Window
 * Special window class
 */
MyWindow = function(config) {
  var config = config || {};
 
  Ext.applyIf(config, {
    
  });
 
  MyWindow.superclass.constructor.call(this, config);
};
Ext.extend(MyWindow, Ext.Window);

Ext.reg('mywindow', MyWindow);

Hopefully it's obvious how much time things like this can save when generating repetitive, boilerplate code. The extend snippet is one of the larger ones but even the small ones are very useful (pressing c then tab is much nicer than typing console.log(''); each time).

Any suggestions/contributions are welcome. Thanks go to rdougan for his contributions and organisation also.

There is also another ExtJS textmate bundle available at http://hakore.com/extjs.tmbundle/, written by krzak from the Ext forums.

Using Ext.History

Fri, 09 Jan 2009 12:17:00 GMT

Ext.History is a small class that was released with ExtJS 2.2, making it easy to use the browser's back and forward buttons without breaking your AJAX-only pages.

This can be really useful for any ExtJS application with more than one view, for example a simple app with a grid of Products, which can be double-clicked to reveal an edit form. Ext.History allows the user to click the back button to go back to the grid if they're on the form, and even forward again from the grid. It does this by appending a token to the end of the url:

http://myurl.com/ (default url for the app)
http://myurl.com/#products (shows the products grid)
http://myurl.com/#products/edit/1 (shows the edit form for product 1)
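Put another way, the token is simply everything after the '#'. A tiny plain-JavaScript sketch of pulling it out of the URLs above (tokenFromUrl is a hypothetical helper; in an app, Ext.History keeps track of the token for you):

```javascript
// Hypothetical helper: returns everything after the first '#', or '' if there is no token
function tokenFromUrl(url) {
  var hashIndex = url.indexOf('#');
  return hashIndex === -1 ? '' : url.slice(hashIndex + 1);
}

var token = tokenFromUrl('http://myurl.com/#products/edit/1'); // 'products/edit/1'
var segments = token.split('/');                              // ['products', 'edit', '1']
var defaultToken = tokenFromUrl('http://myurl.com/');         // ''
```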

This is useful, so let's look at how to set it up. Ext.History requires that a form field and an iframe are present in the document, such as this:

<form id="history-form" class="x-hidden" action="#">
  <div>
    <input id="x-history-field" type="hidden" />
    <iframe id="x-history-frame"></iframe>
  </div>
</form>

The div is just there to make the markup valid. Ext.History uses the iframe to make IE play nice. Generally I don't like to make any assumptions about what is in the DOM structure so I use Ext to generate these elements:


/**
* Creates the necessary DOM elements required for Ext.History to manage state
* Sets up a listener on Ext.History's change event to fire this.handleHistoryChange
*/
initialiseHistory: function() {
  this.historyForm = Ext.getBody().createChild({
    tag:    'form',
    action: '#',
    cls:    'x-hidden',
    id:     'history-form',
    children: [
      {
        tag: 'div',
        children: [
          {
            tag:  'input',
            id:   Ext.History.fieldId,
            type: 'hidden'
          },
          {
            tag:  'iframe',
            id:   Ext.History.iframeId
          }
        ]
      }
    ]
  });

  //initialize History management
  Ext.History.init();
  Ext.History.on('change', this.handleHistoryChange, this);
}

Ext.History.fieldId and Ext.History.iframeId default to 'x-history-field' and 'x-history-frame' respectively. Change them before running initialiseHistory if you need to customise them (Ext.History is just a singleton object so you can call Ext.History.fieldId = 'something-else').

The main method you'll be using is Ext.History.add('someurl'). This adds a token to the history stack and effectively redirects the browser to http://myurl.com/#someurl. To create something like the grid/form example above, you could write something like this:


Ext.ns('MyApp');

MyApp.Application = function() {
  this.initialiseHistory();

  this.grid = new Ext.grid.GridPanel({
    //set up the grid...
    store: someProductsStore,
    columns: ['some', 'column', 'headers'],

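    //this is the important bit - redirects when you double click a row
    listeners: {
      'rowdblclick': {
        //listener config objects take their function as 'fn', not 'handler'
        fn: function(grid, rowIndex) {
          //use the id of the record that was clicked, not its position in the grid
          var record = grid.getStore().getAt(rowIndex);
          Ext.History.add("products/edit/" + record.id);
        }
      }
    }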
  });

  this.form = new Ext.form.FormPanel({
    items: ['some', 'form', 'items'],

    //adds a cancel button which redirects back to the grid
    buttons: [
      {
        text: 'cancel',
        handler: function() {
          Ext.History.add("products");
        }
      }
    ]
  });

  //any other app startup processing you need to perform
};

MyApp.Application.prototype = {
  initialiseHistory: function() {
    //as above
  },

  /**
   * @param {String} token The url token which has just been navigated to
   * (e.g. if we just went to http://myurl.com/#someurl, token would be 'someurl')
   */
  handleHistoryChange: function(token) {
    token = token || "";
    switch(token) {
      case 'products':        this.showProductsGrid();     break;
      case 'products/edit/1': this.showProductEditForm(1); break;
      case '':                //nothing after the #, show a default view
    }
  },

  showProductsGrid: function() {
    //some logic to display the grid, depending on how your app is structured
  },

  showProductEditForm: function(product_id) {
    //displays the product edit form for the given product ID.
  }
};

Ext.onReady(function() {
  var app = new MyApp.Application();
});

So when you visit http://myurl.com/#products, showProductsGrid() will be called automatically, and when you visit http://myurl.com/#products/edit/1, showProductEditForm() will be called with the argument 1. You can write your own logic here to change tab or show a window or whatever it is you do to show a different view to the user.

I'm not suggesting you parse the url token using a giant switch statement like I have above - this is only an example. You could get away with something like that for a very small app but for anything bigger you'll probably want some kind of a router. That goes a little beyond the scope of this article but it is something I will return to at a later date.
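If you do outgrow the switch statement, the core of a router is fairly small. Below is a minimal sketch in plain JavaScript with no Ext dependency. createRouter, connect and dispatch are hypothetical names made up for illustration, and the sketch assumes route patterns contain no regular expression metacharacters other than the :name placeholders.

```javascript
// Hypothetical minimal router: maps url tokens (the part after the #) to handlers.
// Not part of ExtJS; assumes patterns contain no regex metacharacters.
function createRouter() {
  var routes = [];

  return {
    // pattern is a string like 'products/edit/:id'; each ':name' segment is captured
    connect: function(pattern, handler) {
      var names = [];
      var source = pattern.replace(/:(\w+)/g, function(match, name) {
        names.push(name);
        return '([^/]+)';
      });
      routes.push({regex: new RegExp('^' + source + '$'), names: names, handler: handler});
    },

    // calls the handler of the first matching route with the captured params
    dispatch: function(token) {
      token = token || '';
      for (var i = 0; i < routes.length; i++) {
        var match = routes[i].regex.exec(token);
        if (match) {
          var params = {};
          for (var j = 0; j < routes[i].names.length; j++) {
            params[routes[i].names[j]] = match[j + 1];
          }
          return routes[i].handler(params);
        }
      }
      return undefined; // no route matched
    }
  };
}

var router = createRouter();
router.connect('products', function() { return 'grid'; });
router.connect('products/edit/:id', function(params) { return 'edit ' + params.id; });
router.connect('', function() { return 'default'; });

console.log(router.dispatch('products/edit/42')); // edit 42
console.log(router.dispatch('products'));         // grid
console.log(router.dispatch(''));                 // default
```

Inside handleHistoryChange you would then call something like this.router.dispatch(token) instead of switching on the token. Routes are tried in the order they were connected, which keeps the behaviour predictable when two patterns could match.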

There is also an example of Ext.History available on the Ext samples pages.

Custom containers with ExtJS

Tue, 06 Jan 2009 21:42:00 GMT

ExtJS has several built-in Container classes - classes which can contain one or more other Ext.Components (such as Grids, Forms, other Panels, etc). The most obvious example of a Container is the Ext.Panel class, along with its subclasses such as Ext.TabPanel, Ext.form.FormPanel and Ext.Window. With each container class you can add a bunch of components, like this:

//a child component to be added to the container below
var myComponent = new Ext.Panel({html: 'component 1'});

//Ext.Panel is a subclass of Ext.Container
var myPanel = new Ext.Panel({
  items: [
    myComponent,
    {html: 'component 2'},
    {html: 'component 3'}
  ]
});

Which will just create a Panel with three other Panels as its child components ('panel' is the default xtype, so we don't have to specify it). More to the point, you can add and remove components from the Container like this:

myPanel.add({html: 'component 4'});
myPanel.remove(myComponent);

As myPanel is an Ext.Container subclass, the methods add() and remove() automatically add or remove child components from within the Container, and take care of any rendering that needs to be performed. Most of the time this is great, but what if you want to write your own custom Container? Say you had a bunch of shortcut links which performed some action in your application, and for styling or other reasons you want to put them into markup like this:

<div class="x-shortcuts-wrapper">
  <div class="x-shortcuts-header"></div>
  <div class="x-shortcuts">
    <!-- child components to go here -->
  </div>
  <div class="x-shortcuts-footer"></div>
  <button class="x-shortcuts-add">Add</button>
</div>

You might write something like this:

Ext.ns('MyApp');
/**
 * @class MyApp.Shortcuts
 * @extends Ext.Container
 * Container for application shortcuts
 */
MyApp.Shortcuts = Ext.extend(Ext.Container, {
  /**
   * Creates the HTML markup for the shortcuts container
   * @param {Ext.Element} ct The element into which this container will be rendered
   */
  onRender: function(ct) {
    this.el = ct.createChild({
      cls: 'x-shortcuts-wrapper',
      children: [
        {cls: 'x-shortcuts-header'},
        {cls: 'x-shortcuts'},
        {cls: 'x-shortcuts-footer'},
        {cls: 'x-shortcuts-add', tag: 'button'}
      ]
    });
    
    MyApp.Shortcuts.superclass.onRender.apply(this, arguments);
    
    this.shortcutsHolder = this.el.child('.x-shortcuts');
  },
  
  //tells the container which element to add child components into
  getLayoutTarget: function() {
    return this.shortcutsHolder;
  }
});

So our onRender method is responsible for creating some markup, which must be assigned to this.el. We're also calling the onRender() function of the superclass (Ext.Container) to make sure nothing is missed out.

The critical elements here are the getLayoutTarget() function and the last line of onRender(). Usually when you subclass Ext.Container, the add() and remove() functions add and remove from this.el, which would result in something like this:

<div class="x-shortcuts-wrapper">
  <div class="x-shortcuts-header"></div>
  <div class="x-shortcuts"></div>
  <div class="x-shortcuts-footer"></div>
  <button class="x-shortcuts-add">Add</button>
  <!-- child components will end up here -->
</div>

To prevent this from happening, we obtain a reference to the element we want components to actually be rendered to, and return that with getLayoutTarget(). After that the Container will once again do your bidding.
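To see why overriding getLayoutTarget() is enough, here is a deliberately simplified model in plain JavaScript. It is not Ext's actual implementation: elements are plain objects with a children array, and Container and Shortcuts are stand-ins for Ext.Container and MyApp.Shortcuts. The point it shows is that the base add() always asks getLayoutTarget() where to put children, so a subclass only has to override that one method.

```javascript
// Simplified model (not Ext's source): an "element" is a plain object with children.
function makeEl(cls) {
  return {cls: cls, children: []};
}

// Base container: add() puts children into whatever getLayoutTarget() returns.
function Container() {
  this.el = makeEl('x-container');
}
Container.prototype.getLayoutTarget = function() {
  return this.el; // default: children go straight into the outer element
};
Container.prototype.add = function(child) {
  this.getLayoutTarget().children.push(child);
  return child;
};

// Subclass with nested markup: children belong in the inner .x-shortcuts element.
function Shortcuts() {
  this.el = makeEl('x-shortcuts-wrapper');
  this.shortcutsHolder = makeEl('x-shortcuts');
  this.el.children.push(makeEl('x-shortcuts-header'), this.shortcutsHolder, makeEl('x-shortcuts-footer'));
}
Shortcuts.prototype = Object.create(Container.prototype);
Shortcuts.prototype.constructor = Shortcuts;
Shortcuts.prototype.getLayoutTarget = function() {
  return this.shortcutsHolder;
};

var shortcuts = new Shortcuts();
shortcuts.add(makeEl('x-shortcut'));
console.log(shortcuts.el.children.length);              // 3: header, holder, footer
console.log(shortcuts.shortcutsHolder.children.length); // 1: the new shortcut
```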

As of the time of writing getLayoutTarget() is not to be found anywhere in the Ext documentation (version 2.2), so my thanks go to Condor and Animal for answering my question on the ExtJS forum thread.

To round off the example, say your Shortcut class looked something like this:


/**
 * @class MyApp.Shortcut
 * @extends Ext.Component
 * Clickable shortcut class which renders some HTML for a standard application shortcut
 */
MyApp.Shortcut = function(config) {
  var config = config || {};
 
  //apply some defaults
  Ext.applyIf(config, {
    text: 'Shortcut Name',
    icon: 'default_shortcut.gif'
  });
 
  //call the superclass constructor
  MyApp.Shortcut.superclass.constructor.call(this, config);
};
Ext.extend(MyApp.Shortcut, Ext.Component, {
  onRender: function(ct) {
    this.el = ct.createChild({
      cls: 'x-shortcut',
      children: [
        {
          tag: 'img',
          src: this.initialConfig.icon
        },
        {
          tag:  'span',
          html: this.initialConfig.text
        }
      ]
    });
    
    MyApp.Shortcut.superclass.onRender.apply(this, arguments);
  }
});

Ext.reg('shortcut', MyApp.Shortcut);

Then our container would be created like this:


new MyApp.Shortcuts({
  items: [
    new MyApp.Shortcut({text: 'Shortcut 1', icon: 'shatner.gif'}),
    {xtype: 'shortcut', text: 'Shortcut 2', icon: 'nimoy.gif'},
    {xtype: 'shortcut'}
  ]
});

Which would produce HTML like this:

<div class="x-shortcuts-wrapper">
  <div class="x-shortcuts-header"></div>
  <div class="x-shortcuts">
    <div class="x-shortcut">
      <img src="shatner.gif" />
      <span>Shortcut 1</span>
    </div>
    <div class="x-shortcut">
      <img src="nimoy.gif" />
      <span>Shortcut 2</span>
    </div>
    <div class="x-shortcut">
      <img src="default_shortcut.gif" />
      <span>Shortcut Name</span>
    </div>
  </div>
  <div class="x-shortcuts-footer"></div>
  <button class="x-shortcuts-add">Add</button>
</div>

JavaScript bra size calculator

Fri, 28 Nov 2008 12:15:00 GMT

One of the more mesmerizing websites I've worked on recently was for a lingerie boutique in the UK. Aside from the unenviable task of having to look at pictures of women in lingerie all day, I was also forced (forced!) to write a bra size calculator.

The theory behind bra size calculation is arcane and somewhat magical. Understanding of it does not come easily to man nor beast, so it is lucky that I, falling cleanly into neither category, have passed through pain and torment to save you the trouble.

Check it out.

Pleasing, no? The code looks like this, and can be found here:


var BraCalculator = {
  
  /**
   * The string to be returned when the result could not be calculated.
   */
  unknownString: "Unknown",
  
  cupSizes: ["A", "B", "C", "D", "DD", "E", "EE", "F", "FF", "G", "GG", "H", "HH", 
             "J", "JJ", "K", "KK", "L", "LL", "M", "MM", "N", "NN"],
  
  /**
   * Returns the correct bra size for given under bust and over bust measurements
   * @param {Number} underBust The measurement taken under the bust (in inches)
   * @param {Number} overBust The measurement taken over the bust (in inches)
   * @return {String} The correct bra size for the given measurements (e.g. 32C, 40DD, etc)
   */
  calculateSize: function(underBust, overBust) {
    var bandSize = this.calculateBandSize(underBust);
    var cupSize  = this.calculateCupSize(bandSize, overBust);
    
    if (bandSize && cupSize) {
      return bandSize + cupSize;
    } else {
      return this.unknownString;
    };
  },
  
  /**
   * Calculates the correct band size for a given under bust measurement
   * @param {Number} underBust The measurement under the bust
   * @return {Number} The correct band size
   */
  calculateBandSize: function(underBust) {
    var underBust = parseInt(underBust, 10);
    return underBust + (underBust % 2) + 2;
  },
  
  /**
   * Calculates the Cup size required given the band size and the over bust measurement
   * @param {Number} bandSize The measured band size (should be an even number)
   * @param {Number} overBust The measurement taken over the bust
   * @return {String} The appropriate alphabetical cup size
   */
  calculateCupSize: function(bandSize, overBust) {
    var bandSize = parseInt(bandSize, 10);
    var overBust = parseInt(overBust, 10);
    var diff     = overBust - bandSize;
    
    var result   = this.cupSizes[diff];
    
    //return false if we couldn't lookup a cup size
    return result ? result : false;
  }
};
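To make the arithmetic concrete, here is a standalone sketch that repeats the band and cup formulas from BraCalculator for one made-up set of measurements (31 inches under the bust, 37 over). The helper names bandSizeFor and cupSizeFor are just for illustration; the formulas and cup list are copied from the code above.

```javascript
// Same cup list and formulas as BraCalculator; helper names are illustrative only.
var cupSizes = ["A", "B", "C", "D", "DD", "E", "EE", "F", "FF", "G", "GG", "H", "HH",
                "J", "JJ", "K", "KK", "L", "LL", "M", "MM", "N", "NN"];

function bandSizeFor(underBust) {
  // odd measurements are rounded up to the next even number, then 2 is added
  return underBust + (underBust % 2) + 2;
}

function cupSizeFor(bandSize, overBust) {
  // each inch of difference between over bust and band moves up one cup;
  // a difference outside the list (e.g. negative) gives false
  return cupSizes[overBust - bandSize] || false;
}

var band = bandSizeFor(31);      // 31 + 1 + 2 = 34
var cup  = cupSizeFor(band, 37); // 37 - 34 = 3, and cupSizes[3] is "D"
console.log(band + cup);         // 34D
```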

And to apply it to your own pages, use something a bit like this:

jQuery(document).ready(function(){
  //add listeners to band and cup measurement text boxes
  jQuery('#back').keyup(Honeys.updateBraSizeCalculation);
  jQuery('#cup').keyup(Honeys.updateBraSizeCalculation);
});

var Honeys = {
  updateBraSizeCalculation: function() {
    var back = jQuery('#back')[0].value;
    var cup  = jQuery('#cup')[0].value;
    
    if (back.length > 0 && cup.length > 0) {
      jQuery('#fit')[0].value = BraCalculator.calculateSize(back, cup);
    };
  }
};

Now we're talking UK sizes here, so exercise extreme caution! It should be trivial to adapt to your country with our lovely conversion charts.

Don't pretend you're not going to play with it. You know you are. Like, right now.

Weird bug preventing ExtJS checkboxes from submitting properly

Fri, 24 Oct 2008 13:53:00 GMT

This applies to ExtJS 2.2, the most current version as of the time of writing.

Checkboxes often make their way into my Ext JS forms. Sometimes, though, they don't behave as expected. Checking and unchecking them would frequently fail, simply not doing anything. Sometimes it would work, sometimes it wouldn't - how frustrating!

It turns out there is a bug with ticking/unticking checkboxes in Ext. If you click on the checkbox itself everything works fine - the image of the checkbox updates and the correct value is submitted. If however you click on the checkbox's label, the image of the checkbox is updated but the correct value is not submitted. So if the box started off unticked and you ticked it by clicking the label, the image is updated but nothing else happens.

This is extremely unintuitive because you can see that the box has been checked, but its internal representation hasn't actually changed. Because I usually click the label, this took me over an hour to track down, so I hope this helps someone out. Once I had identified the bug, a quick Google search pointed to a thread on the ExtJS forums which has some guidance on this.

How Ext.apply works, and how to avoid a big headache

Wed, 27 Aug 2008 18:25:00 GMT

Ext.apply is one of those magic Ext JS methods which copies the essence of one object onto another. You usually call it like this:

Ext.apply(receivingObject, sendingObject, defaults)

Where defaults are optional. If you supply defaults, Ext.apply actually does this:

Ext.apply(receivingObject, defaults);
Ext.apply(receivingObject, sendingObject);

In other words, the order of precedence of the three arguments goes like this: any properties in receivingObject which are also present in defaults will be overwritten by the value in defaults. After that has happened, any properties present in receivingObject (after defaults have been applied) which are also present in sendingObject will be overwritten by the sendingObject value. More graphically:

Ext.apply({a: 'receiver'}, {a: 'sender'}, {a: 'default'}); // = {a: 'sender'}

For me, this was slightly unexpected as I expected the default options to have the lowest priority - that is the default option would only be copied across if it was not present in either the receiving or the sending objects, so watch out for that.
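Here is a simplified standalone model of that precedence (a sketch, not Ext's source), along with a hypothetical applyDefaultsLast() showing the lowest-priority behaviour I had expected. In Ext itself you could get that effect with Ext.applyIf(Ext.apply(receivingObject, sendingObject), defaults), since Ext.applyIf only copies properties the receiver doesn't already have.

```javascript
// Simplified standalone model of Ext.apply (not Ext's actual source).
function apply(receiver, sender, defaults) {
  if (defaults) {
    apply(receiver, defaults);     // step 1: defaults overwrite the receiver
  }
  if (receiver && sender && typeof sender === 'object') {
    for (var key in sender) {
      receiver[key] = sender[key]; // step 2: sender overwrites everything
    }
  }
  return receiver;                 // the receiving object itself, not a copy
}

// Hypothetical alternative: defaults only fill properties that are still undefined.
function applyDefaultsLast(receiver, sender, defaults) {
  apply(receiver, sender);
  for (var key in defaults) {
    if (receiver[key] === undefined) {
      receiver[key] = defaults[key];
    }
  }
  return receiver;
}

console.log(apply({a: 'receiver'}, {a: 'sender'}, {a: 'default'}).a);  // sender
console.log(apply({a: 'receiver'}, {}, {a: 'default'}).a);             // default
console.log(applyDefaultsLast({a: 'receiver'}, {}, {a: 'default'}).a); // receiver
```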

Anyway that's all well and good once you know how it works inside, but while watching an otherwise excellent screencast from Jay Garcia (see http://tdg-i.com/42/ext-js-screencast-003-extapply-published), something odd happened. The example he gave went like this (commented lines signify the output returned by Firebug):

var obj1 = {x: 'x string', y: 'y string'}
// = {x: 'x string', y: 'y string'}

var obj2 = {a: 'a string', b: 4289, c: function(){}}
// = {a: 'a string', b: 4289, c: function(){}}

var obj3 = Ext.apply(obj2, obj1, {pxyz: 'soifje'})
obj3 
// = {a: 'a string', b: 4289, pxyz: 'soifje', x: 'x string', y: 'y string'}
obj2
// = {a: 'a string', b: 4289, pxyz: 'soifje', x: 'x string', y: 'y string'}
obj3 === obj2 
// true - obj3 and obj2 are the same object

var obj4 = Ext.apply(obj3, obj2, {a: 'fwaifewfaije'})
// obj4 = {a: 'fwaifewfaije', b: 4289, pxyz: 'soifje', x: 'x string', y: 'y string'}

So basically he set up obj1 and obj2 with non-conflicting properties, then merged them with some defaults to create obj3. In this case the defaults didn't conflict with the properties from obj1 or obj2, so obj3 is essentially a straightforward combination of obj1 and obj2, plus a default pxyz value.

What he did then, however, was to create obj4 as a combination of obj2 and obj3, along with a default value for the 'a' property, which was a property of both obj2 and obj3. Crucially, obj4's 'a' property was set to the default value, which, as we've seen from how Ext.apply works above, should never happen (it should be set to the default value but then immediately set back again by the second internal Ext.apply call).

So what gives? Well, it turns out this is because when calling:

obj3 = Ext.apply(obj2, obj1, {pxyz: 'soifje'})

obj3 and obj2 are the exact same object, as Ext.apply returns the first argument after the apply process has taken place. So in the next call:

obj4 = Ext.apply(obj3, obj2, {a: 'fwaifewfaije'})

obj3 and obj2 are in fact both references to the same object, which is causing the unexpected default value. We can show this by manually creating a new obj3 with the exact same properties, and running the example again:

var obj1 = {x: 'x string', y: 'y string'}
// obj1 = {x: 'x string', y: 'y string'}

var obj2 = {a: 'a string', b: 4289, c: function(){}}
// obj2 = {a: 'a string', b: 4289, c: function(){}}

var obj3 = {a: 'a string', b: 4289, pxyz: 'soifje', x: 'x string', y: 'y string'}
// obj3 = {a: 'a string', b: 4289, pxyz: 'soifje', x: 'x string', y: 'y string'}
// obj3 === obj2 => false... obj3 and obj2 are different objects

var obj4 = Ext.apply(obj3, obj2, {a: 'fwaifewfaije'})
// obj4 = {a: 'a string', b: 4289, pxyz: 'soifje', x: 'x string', y: 'y string'}

This time around we get what we expect - the default value is not applied because it is already present in obj2. Here, obj2 is not the same object as obj3, even though their properties are identical.

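The reason follows from how Ext.apply works internally. Because obj3 and obj2 are the same object, the first internal call (applying the defaults) overwrites 'a' on that shared object, so by the time the second internal call copies obj2's properties onto obj3, obj2.a already holds the default value and there is nothing left to copy back. You can just copy/paste each example into Firebug to reproduce it.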
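A minimal standalone model of Ext.apply (simplified, not Ext's actual source) reproduces the effect without Firebug. The only difference between the two cases below is whether the receiving and sending arguments are the same object.

```javascript
// Simplified model of Ext.apply: defaults first, then the sender's properties.
function apply(receiver, sender, defaults) {
  if (defaults) {
    apply(receiver, defaults);
  }
  if (receiver && sender && typeof sender === 'object') {
    for (var key in sender) {
      receiver[key] = sender[key];
    }
  }
  return receiver;
}

// Case 1: two variables pointing at one object (like obj2 and obj3 above).
var obj2 = {a: 'a string', b: 4289};
var obj3 = obj2;
apply(obj3, obj2, {a: 'fwaifewfaije'});
console.log(obj3.a); // fwaifewfaije - the defaults step also changed obj2.a

// Case 2: identical properties, but two separate objects.
var sender = {a: 'a string', b: 4289};
var receiver = {a: 'a string', b: 4289};
apply(receiver, sender, {a: 'fwaifewfaije'});
console.log(receiver.a); // a string - copied back from the untouched sender
```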

Moral of the story? Well, now you have a more detailed understanding of Ext.apply, and hopefully you'll be on your guard about referencing the same object by two different variables when performing this type of operation. I know I will be ;)

Don't forget the wurst

Sat, 23 Aug 2008 10:57:00 GMT

So it came to be realised during Rails Camp 08 that the world was sadly lacking in William Shatner based list apps. Thankfully, the Railslove guys (plus Rany) have come to the rescue with don't forget the wurst. If you're looking for something delightfully random in your life, you may have just found it.

Check out my Shatner's greatest hits list and Ask William about a few of the items - he will furnish you with compelling and thoughtful answers.

DRYing up your CRUD controller RSpecs

Wed, 20 Aug 2008 11:21:00 GMT

A lot of what we do in Rails boils down to simple CRUD. If you're in the habit of developing admin sections to allow your clients to control the front end of their site, you'll probably have noticed that these controllers in particular tend to all look the same. There are quite a few ways to DRY up the controller itself - using something like make_resourceful, for example - but what about your RSpec files?

Robby Russell recently posted a short article about RSpec's Shared Example Groups. Take a look at his post or at the RSpec documentation (it's not scary) to see how they work in more detail, but they basically allow you to share your it "should ..." do ... end blocks, enabling them to be reused multiple times.

When you think about it, every time we spec a basic CRUD controller, we're doing the same thing - we should be able to just do something like this:

require File.dirname(__FILE__) + '/../../spec_helper'

describe Admin::ContactsController do
 before(:each) do
   @model = 'Contact'
   login_as_admin
 end

 it_should_behave_like "CRUD GET index"
 it_should_behave_like "CRUD GET show"
 it_should_behave_like "CRUD POST create"
 it_should_behave_like "CRUD PUT update"
 it_should_behave_like "CRUD DELETE destroy"
 it_should_behave_like "CRUD GET edit"
end

Well luckily for you we can! I've been using this pattern with most of my CRUD-based controllers for a while now. You just set up the model's name at the top (and in the case above perform the login_as_admin helper method to log the user in), and for each action in the controller use a specialised shared example group. The example groups all know how to understand your @model definition in before(:each) and map it to the various expectations that the CRUD specs run.

The great thing about this approach is that you can just dump these into your spec file and update them later if you need to. For example, if you need to paginate the index action instead of using the default find(:all) that the shared example group tests, just remove the it_should_behave_like("CRUD GET index") line and add your own describe block - the rest of the it_should_behave_like lines can stay as they are.

This works especially well if you structure your CRUD controllers by creating a CrudController class and subclassing each CRUD controller from it - e.g. from the example above:

class Admin::ContactsController < Admin::CrudController

end

My Admin::CrudController reflects on the name of the ContactsController subclass here and figures everything out without me having to do any work. This works because almost all of my Admin section code works the same way. If I need to diverge from the default CRUD behaviour, I just redefine the particular action in Admin::ContactsController:

class Admin::ContactsController < Admin::CrudController
  def index
    #my alternative implementation
  end
end

I personally prefer this approach over the more declarative alternatives such as make_resourceful because I feel more in control this way. That said, I'm open to persuasion :)

Anyway, the code for the shared example groups is on Github at http://github.com/edspencer/rspec-crud-controller-shared-example-groups. There's only one file there - just chuck it in your spec directory and add this line to spec_helper.rb:

require File.expand_path(File.dirname(__FILE__) + "/crud_controller_matchers")

Apart from the magic going on in the CrudSetup module at the top of that file, there's nothing special going on here, so you can tweak the examples to your particular approach. There's a good chance some of that code could be written more cleanly so please feel free to suggest changes / fork the file on Github.

UPDATE - Actually no, ignore the FUD about make_resourceful above, it works remarkably well with very few modifications to the shared groups - I'll post those up as soon as I'm done.

Cleaning up an example Ext JS form

Fri, 08 Aug 2008 21:07:00 GMT

One of my recent Ext JS forms had a section which looked like this:


items: [
  new Ext.Button({
    text: 'Preview Video',
    iconCls: 'play',
    handler: function() {
      var win;
      
      if (!win) {
        win = new Ext.Window({
          title: 'Preview Video',
          modal: true,
          height: 377,
          width: 368,
          items: [
            new Ext.Panel({
              autoLoad: '/admin/videos/' + video_id + '/preview.html'
            })
          ],
          buttons: [
            {
              text: 'OK',
              handler: function() {
                win.close();
              }
            }
          ]
        });
        
      };
      win.show();
      
    }
  })
]

Not horrific but not nice either - let's DRY this up. It's not too pleasant to read but all it's really doing is rendering a customised Ext.Button which opens up a modal Ext.Window, in which is loaded the contents of a known url.

Ok so let's start with that Window. First, we'll make a subclass of Ext.Window:


/**
 * AdFunded.views.Video.PreviewWindow
 * @extends Ext.Window
 * A simple Preview window for the given video_id
 */
AdFunded.views.Video.PreviewWindow = function(config) {
  var config = config || {};
    
  Ext.applyIf(config, {
    title: 'Preview Video',
    modal: true,
    height: 377,
    width: 368,
    items: [
      new Ext.Panel({
        autoLoad: '/admin/videos/' + config.video_id + '/preview.html'
      })
    ],
    buttons: [
      {
        text: 'OK',
        scope: this,
        handler: function() {
          this.window.close();
        }
      }
    ]
  });
  
  AdFunded.views.Video.PreviewWindow.superclass.constructor.call(this, config);
  
  this.window = this;
};
Ext.extend(AdFunded.views.Video.PreviewWindow, Ext.Window);
Ext.reg('video_preview_window', AdFunded.views.Video.PreviewWindow);

Note the namespacing employed above - within an Ext MVC framework I have been developing across several projects for the last few months, all views follow this structure. AdFunded is the name of the application. The precise structure doesn't matter here, but using a namespace for each app does.

So we've taken the Window setup out of our view now, which leaves us with:


items: [
  new Ext.Button({
    text: 'Preview Video',
    iconCls: 'play',
    handler: function() {
      var win;
      
      if (!win) {
        win = new AdFunded.views.Video.PreviewWindow({video_id: id});
      };
      win.show();
      
    }
  })
]

Great - we've gone from 34 lines in our view to 15, and scored ourselves a reusable Window component which we can call from anywhere in the app. Nice work, but there's more to come... If we're going to use the Preview Window again, we'll probably need to use that Preview Button again too. Let's see:


/**
 * AdFunded.views.Video.PreviewButton
 * @extends Ext.Button
 * Displays a Preview Window for the given video_id
 */
AdFunded.views.Video.PreviewButton = function(config) {
  var config = config || {};
  
  Ext.applyIf(config, {
    text: 'Preview Video',
    iconCls: 'play',
    handler: function() {
      var win = new AdFunded.views.Video.PreviewWindow({video_id: config.video_id});
      win.show();
    }
  });
  
  AdFunded.views.Video.PreviewButton.superclass.constructor.call(this, config);
};
Ext.extend(AdFunded.views.Video.PreviewButton, Ext.Button);
Ext.reg('video_preview_button', AdFunded.views.Video.PreviewButton);

Which leaves us with the following in the view:

items: [
  {
    xtype: 'video_preview_button',
    video_id: id
  }
]

We've now gone from 34 lines to 6 (in the view at least), but the point is not about cutting out lines of code - it's creating reusable components. We've added 20 lines overall this way but we now have two extra components that we can call on at any time (with minimal lines of code), safe in the knowledge that they will provide a consistent experience each time.

ExtJS Radio Buttons and Square Brackets

Fri, 08 Aug 2008 20:41:00 GMT

While creating an ExtJS form with several radio buttons today I ran into a bug which caused none of them to work as expected, even though there were no errors/exceptions. To cut a long story short, it was because I was setting the name to "schedule[include_type]" - like this:

{
  xtype: 'radio',
  name: 'schedule[include_type]',
  inputValue: 'page',
  boxLabel: 'Show page:'
}

This radio button is one of 4, which allow the user to choose which type of file they want to include on a particular model (a Schedule in this case) - be it Page, Video, Category or one other. The thing is, none of them work with the square brackets in the name. If you remove the brackets, they all work correctly, but the server side is relying on those brackets being present to group the data correctly.

In the end I bit the bullet and updated my submit method to add a new parameter directly - here's a full example:

var form = new Ext.form.FormPanel({
  items: [
    {
      xtype: 'radio',
      name: 'include_type',
      inputValue: 'page',
      boxLabel: 'Show page:'
    },
    {
      xtype: 'radio',
      name: 'include_type',
      inputValue: 'category',
      boxLabel: 'Show category:'
    }
    //... plus some extra items
  ],
  buttons: [
    {
      text: 'Save',
      handler: function() {
        
        //find the currently selected include_type from the form
        //(inside a button handler 'this' is the button, so go through the form variable)
        var include_type = form.form.getValues()['include_type'];
        
        //note the params option - this needs to be added manually otherwise
        //schedule[include_type] won't appear
        form.form.submit({
          waitMsg: 'Saving Data...',
          params: "schedule[include_type]=" + include_type,
          url: '...' //your submit url goes here
        });
      }
    }
  ]
});

Note: I don't usually add buttons in the way above so I'm not sure if the form.form.submit will work correctly here - see http://extjs.com/deploy/dev/docs/?class=Ext.form.FormPanel for information about overriding submit.

So what we're doing here is finding which radio button is currently checked, and appending this under "schedule[include_type]" when POSTing the form variables to the server. This really isn't pleasant but seems to be the best way around this limitation for now.
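One thing to watch with the params string above is that neither the key nor the value is URL-encoded. That's harmless for simple inputValues like 'page', but here is a small sketch of building the string safely with the standard encodeURIComponent function. toQueryString is a hypothetical helper, not part of Ext (Ext.urlEncode does a similar job for plain objects).

```javascript
// Hypothetical helper: builds an application/x-www-form-urlencoded string
// from a plain object, percent-encoding both keys and values.
function toQueryString(params) {
  var pairs = [];
  for (var key in params) {
    if (Object.prototype.hasOwnProperty.call(params, key)) {
      pairs.push(encodeURIComponent(key) + '=' + encodeURIComponent(params[key]));
    }
  }
  return pairs.join('&');
}

// the brackets become %5B and %5D, which the server decodes back to schedule[include_type]
console.log(toQueryString({'schedule[include_type]': 'page'}));
// schedule%5Binclude_type%5D=page
```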

I regularly use square brackets in other Ext JS Fields - Radio Buttons seem to be the only ones that have this problem. http://extjs.com/forum/showthread.php?p=185296 has a bit of background behind this, but no real solution.

When Git tells you it failed to push some refs

Fri, 25 Apr 2008 14:00:00 GMT

I received an unhelpful error while trying to push to a repository on Github today:

git push
To git@github.com:user/repo.git
! [rejected] branchname -> branchname (non-fast forward)
error: failed to push some refs to 'git@github.com:user/repo.git'

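In case you ever have the same problem: a non-fast-forward rejection means the remote branch has commits that your local branch doesn't have yet, so all you have to do is a quick git pull first (which merges them in), then you can push as normal. Easy when you know how...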

Git clone vs Git submodule

Thu, 17 Apr 2008 12:01:00 GMT

Having recently made the switch from svn to git, I wanted to achieve what svn externals did (and what Piston did better). Turns out this is pretty simple, for example to get rails on edge:

cd your_git_dir
git submodule add git://github.com/rails/rails.git vendor/rails

A couple of other default submodules you'll want:

git submodule add git://github.com/dchelimsky/rspec.git vendor/plugins/rspec
git submodule add git://github.com/dchelimsky/rspec-rails.git vendor/plugins/rspec-rails

What submodule does is to check out the submodules as their own repositories, so they are tracked independently of the repository you made them submodules of. The submodules you have are tracked in the .gitmodules file, which might look something like this:

[submodule "vendor/rails"]
 path = vendor/rails
 url = git://github.com/rails/rails.git
[submodule "vendor/plugins/rspec"]
 path = vendor/plugins/rspec
 url = git://github.com/dchelimsky/rspec.git
[submodule "vendor/plugins/rspec-rails"]
 path = vendor/plugins/rspec-rails
 url = git://github.com/dchelimsky/rspec-rails.git

Or at least that's how it should look, Windows seems to mess this up into looking something like the following:

[submodule "vendorrails"]
 path = vendor\rails
[submodule "vendorrails"]
 url = git://github.com/rails/rails.git
[submodule "vendorpluginsrspec"]
 path = vendor\plugins\rspec
[submodule "vendorpluginsrspec"]
 url = git://github.com/dchelimsky/rspec.git
[submodule "vendorpluginsrspec-rails"]
 path = vendor\plugins\rspec-rails
[submodule "vendorpluginsrspec-rails"]
 url = git://github.com/dchelimsky/rspec-rails.git

Note especially that you need to replace the backslashes with forward slashes (and put each submodule's path and url back under a single [submodule] header). If you don't, git will give a fail message like:

fatal: bad config file line 2 in .gitmodules
No submodule mapping found in .gitmodules for path 'vendor/plugins/attachment_fu'

I don't know why it's doing that, maybe it's something I'm doing wrong but you'll need to tidy it up to make it look more like the first example in order for it to work properly.

One final thing to be aware of is that when you clone onto a new machine you'll need to run the following commands:

git submodule init
git submodule update

This will initialise the submodules that are referenced in the .gitmodules file, then pull them down. By default cloning doesn't seem to do that.

Useful Rails javascript expansions for EXTJS

Wed, 16 Apr 2008 18:00:00 GMT

If you're using Edge Rails (or Rails 2.1 or later, which isn't out at the time of writing) and are using the EXT JS framework anywhere, here are a couple of handy JavaScript include tag expansions to clean up your views. Just chuck them into any file in your config/initializers directory:

ActionView::Helpers::AssetTagHelper.register_javascript_expansion :ext => ['ext/adapter/ext/ext-base', 'ext/ext-all']

ActionView::Helpers::AssetTagHelper.register_javascript_expansion :ext_grid_filter => ['ext/ux/menu/EditableItem', 'ext/ux/menu/RangeMenu', 'ext/ux/grid/GridFilters', 'ext/ux/grid/filter/Filter', 'ext/ux/grid/filter/StringFilter', 'ext/ux/grid/filter/DateFilter', 'ext/ux/grid/filter/ListFilter', 'ext/ux/grid/filter/NumericFilter', 'ext/ux/grid/filter/BooleanFilter']

The first one includes the relevant EXT base files, and the second includes all the Grid Filters from the excellent Filter Grid plugin (see http://ccinct.com/lab/filter-grid/).

Include them as usual like this:

javascript_include_tag :ext, :ext_grid_filter, :cache => 'ext_javascripts'

EXT remote-loading forms with Combo boxes

Wed, 16 Apr 2008 15:41:00 GMT

Something that's harder than it should be is populating an EXT edit form with form data, where one of the form fields is a select box. If there is a specific set of values that this select can take, then you can hard code that using a SimpleStore, like this:


var exampleData = [['AL', 'Alabama'],
                   ['AK', 'Alaska'],
                   ['AZ', 'Arizona'],
                   ['AR', 'Arkansas'],
                   ['CA', 'California']];

var store = new Ext.data.SimpleStore({
    fields: ['abbr', 'state'],
    data : exampleData
});

var combo = new Ext.form.ComboBox({
    store: store,
    displayField: 'state',
    valueField: 'abbr',
    mode: 'local' // the data is already in the store, so don't query a server
    // ... other combo config, e.g. a name/hiddenName matching your form data ...
});

var form = new Ext.form.FormPanel({
  items: [combo]
  // ... other form config ...
});

form.load({url: url_to_load_from});

So that will populate the select box with the static values you've defined (the 5 states above), then when the form loads it will select the appropriate option automatically.

So far so good, but what if you need to load what goes into the select box dynamically? Well, first you'll need to set up your remote data store (my server is sending back JSON data, hence the JsonReader):


store = new Ext.data.Store({
  url: 'url_to_load_combo_data_from',
  reader: new Ext.data.JsonReader(
    {root: 'states',totalProperty: 'results'},
    [
      {name: 'name', type: 'string', mapping: 'state.name'},
      {name: 'abbr', type: 'string', mapping: 'state.abbr'}
    ]
  )
});

This will consume data like this:


{
  "states": [
    {"state": {"name": "Alabama", "abbr": "AL"}}, 
    {"state": {"name": "Alaska",  "abbr": "AK"}}
  ], 
  "results": "2"
}

And populate the store with a collection of state records which can then be loaded into the combobox.
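To make the root and mapping options a little more concrete, here's a conceptual sketch in plain JavaScript of how they pick values out of that response. readRecords is a hypothetical function for illustration only; it is not how Ext's JsonReader is implemented, just the same idea: take the array under root, then follow each field's dotted mapping (falling back to the field name) into each row.

```javascript
// Conceptual sketch only, not Ext's JsonReader: shows how `root` and dotted
// `mapping` values select data from a JSON response like the one above.
function readRecords(json, root, fields) {
  // Follow a dotted path such as "state.name" into a nested object,
  // returning undefined if any step along the way is missing.
  const resolve = (obj, path) =>
    path.split(".").reduce((value, key) => (value == null ? undefined : value[key]), obj);

  return json[root].map((row) => {
    const record = {};
    for (const field of fields) {
      record[field.name] = resolve(row, field.mapping || field.name);
    }
    return record;
  });
}
```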

Then all you need to do is load the store before loading the form data, and your comboboxes will be correctly populated, displaying the correct option. Here's the full example:

store = new Ext.data.Store({
  url: 'url_to_load_combo_data_from',
  reader: new Ext.data.JsonReader(
    {root: 'states',totalProperty: 'results'},
    [
      {name: 'name', type: 'string', mapping: 'state.name'},
      {name: 'abbr', type: 'string', mapping: 'state.abbr'}
    ]
  )
});

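var combo = new Ext.form.ComboBox({
    store: store,
    displayField: 'name', // matches the 'name' field defined in the reader
    valueField: 'abbr'
    // ... other combo config ...
});

var form = new Ext.form.FormPanel({
  items: [combo]
});

// store.load is asynchronous, so only load the form once the store has loaded
store.load({
  callback: function() {
    form.load({url: your_form_data_url});
  }
});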

Be wary of using the combobox's pagination options here (see http://extjs.com/deploy/dev/docs/?class=Ext.form.ComboBox): if your state's 'abbr' appears on the second page of the results, it won't be in the store, so the combo box won't display the correct option.
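The reason is easy to see in a plain JavaScript sketch: the combo can only turn a saved value such as 'AZ' back into its label if that record is in the store, and a paged store only holds the current page. loadPage and labelFor are hypothetical helpers for illustration, not Ext APIs, and the page size of 2 is an arbitrary assumption.

```javascript
// Hypothetical helpers for illustration (not Ext APIs).

// A paged store only holds `limit` records starting at offset `start`.
function loadPage(allRecords, start, limit) {
  return allRecords.slice(start, start + limit);
}

// The combo can only show a label for a value whose record is in its store.
function labelFor(storeRecords, value) {
  const match = storeRecords.find((record) => record.abbr === value);
  return match ? match.name : undefined;
}

const states = [
  { abbr: "AL", name: "Alabama" },
  { abbr: "AK", name: "Alaska" },
  { abbr: "AZ", name: "Arizona" },
  { abbr: "AR", name: "Arkansas" },
  { abbr: "CA", name: "California" },
];

// With 2 records per page, 'AZ' lives on page 2, so page 1 can't label it.
const firstPage = loadPage(states, 0, 2);
console.log(labelFor(firstPage, "AZ")); // undefined
console.log(labelFor(states, "AZ"));    // Arizona
```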

Getting EXT PagingToolbars to save state

Sat, 12 Apr 2008 17:18:00 GMT

A problem that has recently had me pulling my hair out is how to save state in an EXT PagingToolbar.

Ext makes it easy to save the state of most of its components, usually by setting a cookie with the relevant configuration info and then reading it back when you load the component again. I've been using it to save the state of a few EXT grids on a recent project; this saves config such as which columns are visible, which column you're sorting by, and how the columns are ordered.

That works great, and is trivial to implement - just set your provider (see http://extjs.com/deploy/dev/docs/?class=Ext.state.CookieProvider) and be sure to give your grid an id in its config - this is used as the key in the state provider and needs to be unique for each component.

The problem comes when you're using a paging toolbar, though, as it does not save state, so every time you view the grid you're back on page 1. You can add state behaviour to the paging toolbar by piggybacking on the grid's own state handling. Here's how it's done:


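Ext.PagingToolbar.override({
  init : function (grid) {
    this.grid = grid;
    // add our 'start' value whenever the grid is about to save its state
    this.grid.on("beforestatesave", this.saveState, this);
    // listen to every event fired by the grid's store
    Ext.util.Observable.capture(grid.store, this.onStateChange, this);
  },
  saveState : function(grid, state) {
    // lastOptions is only set once the store has been loaded
    var options = grid.store.lastOptions;
    if (options && options.params) {
      state.start = options.params.start;
    }
  },
  onStateChange : function(ev, store, records, options) {
    // save state after each load, i.e. each time the page changes
    if (ev == "load") { this.grid.saveState(); }
  }
});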

Basically we're listening for the attached grid's beforestatesave event, which fires just before the grid saves its state, and adding the current start value from the last options the grid's store was loaded with (e.g. if you're looking at page 3 with 25 rows per page, then start = 50). We also tell the grid to save its state every time its store loads, so the saved value follows you as you page through. If you examine the contents of your state provider using Firebug (Ext.state.Manager.getProvider().state, then look for the key that matches the id of your grid), you'll see that there is now a record for 'start', holding the correct value from the grid's store.
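The start value is just the zero-based offset of the first row on the page. Here's a small sketch of the conversion in both directions; pageToStart and startToPage are hypothetical helper names, and page numbers are assumed to be 1-based, as shown in the paging toolbar.

```javascript
// Hypothetical helpers: convert between a 1-based page number and the
// zero-based `start` offset a paging store sends to the server.
function pageToStart(page, pageSize) {
  return (page - 1) * pageSize;
}

function startToPage(start, pageSize) {
  return Math.floor(start / pageSize) + 1;
}

console.log(pageToStart(3, 25)); // 50
console.log(startToPage(50, 25)); // 3
```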

All you need to do then is retrieve that value from the state provider and load your store accordingly:


var store = new Ext.data.Store({ /* ... your store config ... */ });

var grid = new Ext.grid.GridPanel({
  id: 'unique_grid_id',
  store: store
  // ... other grid config ...
});

// shorthand way of retrieving state information
var state = Ext.state.Manager.getProvider();

// get() returns the default ({}) if nothing has been saved for this grid yet
var start = state.get(grid.id, {}).start || 0;
store.load({params: {start: start, limit: 25}});

If the start value for this grid has never been set, it defaults to zero, i.e. the first page. Next time you come back to this grid, it'll take you right back to where you were, including any column setup and sorting you have specified.
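To illustrate that round trip without Ext, here's a sketch using a hypothetical in-memory stand-in for the state provider. MemoryStateProvider is not an Ext class; it only mimics the get(name, defaultValue) and set(name, value) shape. restoreStart is also hypothetical, and falls back to the first page when nothing usable has been saved.

```javascript
// Hypothetical in-memory stand-in for a state provider (not an Ext class);
// it only mimics the get(name, defaultValue) / set(name, value) shape.
class MemoryStateProvider {
  constructor() {
    this.state = new Map();
  }
  set(name, value) {
    this.state.set(name, value);
  }
  get(name, defaultValue) {
    return this.state.has(name) ? this.state.get(name) : defaultValue;
  }
}

// Read the saved start offset for a grid, defaulting to 0 (the first page)
// when nothing has been saved or the saved value isn't a usable number.
function restoreStart(provider, gridId) {
  const saved = provider.get(gridId, {}) || {};
  const start = Number(saved.start);
  return Number.isInteger(start) && start >= 0 ? start : 0;
}
```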

Rails asset tag expansions

Sat, 12 Apr 2008 12:39:00 GMT

If you're using Edge Rails you may have noticed that you can now define your own JavaScript expansions (if you're not on Edge, this will be included in the imminent 2.1 release). The default expansion that comes with Rails looks like this:

javascript_include_tag :defaults

This grabs application.js as well as the Prototype/script.aculo.us JavaScripts and includes them all (only do that if you need it all, as it adds ~150kb to your page). But say you've got a line which looks like this:

javascript_include_tag 'my_js_file', 'another_js_file', 'and_another'

And say you want to include the same set of files on a different page: it turns out Rails makes it really easy to DRY this up. Make a new file in the config/initializers directory (I call mine asset_tag_expansions.rb) and add a line like the following (don't forget to restart your server afterwards):

ActionView::Helpers::AssetTagHelper.register_javascript_expansion :my_js => ['my_js_file', 'another_js_file', 'and_another']

Now in your views you can simply put:

javascript_include_tag :my_js

You can of course register as many of these as you like, and include as many of your own expansions on the same javascript_include_tag line as you want, e.g.:

javascript_include_tag :my_js, :another_expansion, :and_another
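Conceptually, an expansion is just a named list of files that gets swapped in wherever you use its name. Here's a rough model of that in JavaScript; registerExpansion and expandSources are hypothetical names, this isn't Rails' implementation, and the "widgets" and "effects" files are made up for the example. It shows how several expansions and plain file names on one line flatten into a single ordered list.

```javascript
// Rough model only (not Rails' implementation): an expansion is a named list
// of files, swapped in wherever its name appears, keeping the order given.
const expansions = new Map();

function registerExpansion(name, files) {
  expansions.set(name, files);
}

// Names without a registered expansion pass through as plain file names.
function expandSources(...sources) {
  return sources.flatMap((source) => expansions.get(source) || [source]);
}

registerExpansion("my_js", ["my_js_file", "another_js_file", "and_another"]);
registerExpansion("another_expansion", ["widgets", "effects"]); // hypothetical files

console.log(expandSources("my_js", "another_expansion", "extra_file"));
```

Here the result is the three my_js files, then the two from another_expansion, then extra_file, in that order.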

The same applies to stylesheets:

ActionView::Helpers::AssetTagHelper.register_stylesheet_expansion :public_styles => ['reset', 'layout', 'home']

stylesheet_link_tag :public_styles

Finally, although you're getting all that onto one line, your browser still requests each asset file separately, each one costing another expensive HTTP request. Rectify that:

stylesheet_link_tag :public_styles, :cache => 'public'
javascript_include_tag :my_js, :another_expansion, :and_another, :cache => 'public'

This bundles up your three stylesheets by concatenating them into a single file, called public.css in this case (the JavaScript files likewise end up in public.js). In the example above this means two fewer trips to the server to retrieve the stylesheet files, and therefore a faster-loading page. This is helpful because it lets you maintain small, targeted stylesheets in development, which makes finding the relevant CSS declarations easier, without the performance hit of all those HTTP requests in production. Note that the bundling only happens when ActionController::Base.perform_caching is true, which is the default in production but not in development.
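The saving is simple arithmetic: n separate files cost n requests, while one bundle costs one. Here's a sketch in JavaScript, where bundle and requestsSaved are hypothetical helpers (not Rails code) and readFile stands in for however the file contents are actually read.

```javascript
// Hypothetical model of :cache: concatenate the files in the order given,
// so the browser downloads one bundle instead of each file separately.
function bundle(files, readFile) {
  return files.map((file) => readFile(file)).join("\n");
}

// n separate requests become 1, so n - 1 requests are saved (never negative).
function requestsSaved(fileCount) {
  return Math.max(fileCount - 1, 0);
}

console.log(requestsSaved(3)); // 2
```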

One final option is to use the :all expansion, which just grabs everything in the stylesheets or javascripts directory. Be careful with that though as you've got to be sure assets are being loaded in the right order (especially for JavaScript), and that you really need all that asset weight on each page.