Paul Hammant's blog

The limits of merging experiment

2026-04-28T00:00:00+00:00

A colleague suggested that the cherry-picking way of doing trunk-based development with branch-for-release needs worked examples, cos there are foot-guns.

Cherry-pick to a release branch isn’t crystal clear as a workflow

The failure modes are more interesting than they look. To pick at it, I built a small playground - a single-file Sinatra CRUD app, an end-to-end Playwright test, and a series of trunk commits that would let me run the same scenario two different ways and see what git actually does.

The repo is here: paul-hammant/limits-of-merging-experiment. Clone it, then run ./start.sh, which makes the solution/ folder the git working folder instead of the one you cloned.

Then run bundle install to fetch the dependencies.

The setup

Folder solution/ contains the ruby/sinatra app and a playwright test for it. It is three tier - html with JS, a ruby middle tier, and a sqlite base tier. reset.sh will keep getting us back to a starting point. See “C1” below.

Five hypothetical trunk commits, all reachable as patches in patches/ and re-applicable on demand, are key to this experiment.

	Change	Touches
C1	Initial Person CRUD app, seeded Flintstones, Playwright happy-path test	everything
C2	Add `hair_color` (string) - dropdown, JS validation, DB `CHECK` constraint	`app.rb`, `happy_path_test.rb`
C3	Button text change to UPPERCASE: NEW PERSON / EDIT / DELETE / SAVE / CANCEL	`app.rb`, `happy_path_test.rb`
C4	`hair_color` becomes `INTEGER` (1..6); dropdown values are now ints	`app.rb`, `happy_path_test.rb`
C5	Add maintainer comment in header (cosmetic, previously untouched region)	`app.rb`

The shape that matters for this experiment: C3 is cosmetic and unrelated to hair colour. C4 builds on C2. C5 is in an untouched corner. This is normal trunk life - small unrelated changes interleaving in the same files.

The hypothetical release branch is cut from C2. The team wishes that was it for the release, and continues on an unfrozen trunk as normal. Later there’s something that is agreed to be a bug fix (definitely not feature creep) and it should be cherry-picked to the release branch by the responsible “merge meister” or release engineer. We want to ship bugfix C4 (the int conversion) but not the C3 feature change (the uppercase buttons).

Scenario 1: `git am` is honest, and that’s why it fails

First release engineer instinct: apply the C4 patch directly to the “stable” release branch.

$ git checkout -b release c2
$ git am patches/c4-hair-color-int.patch
Applying: C4: hair_color stored as INTEGER (1..6), dropdown values become ints
error: patch failed: app.rb:133
error: app.rb: patch does not apply

Loud failure. Why? Look at the failing hunk:

       <td><%= h p['dob'] %></td>
-      <td><%= h p['hair_color'] %></td>
+      <td><%= h(HAIR_COLORS[p['hair_color']]) %></td>
       <td class="row-actions">
         <a class="button" href="/people/<%= p['id'] %>/edit">EDIT</a>

The patch’s context lines (the unchanged EDIT reference around the change) include C3’s uppercase text. On the release branch the buttons still say Edit. Context doesn’t match. git am is strict about context - it refuses.

This is git am correctly doing its job. It’s not the failure mode I’m interested in.
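The strictness described above can be sketched in a few lines of Python. This is a toy and not how git am is implemented: `apply_hunk` is a hypothetical helper, the hunk is a hand-simplified version of the failing one, and there is no fuzz or offset search. It only shows why a context line saying EDIT cannot match a file that still says Edit.

```python
def apply_hunk(lines, start, hunk):
    """Apply one hunk at a fixed position, demanding exact context.

    hunk is a list of (tag, text): ' ' context, '-' removed, '+' added.
    Hypothetical helper for illustration; real patch tools do much more.
    """
    out = list(lines[:start])
    pos = start
    for tag, text in hunk:
        if tag in (" ", "-"):
            # Context lines and removed lines must match the target exactly.
            if pos >= len(lines) or lines[pos] != text:
                raise ValueError(f"patch does not apply at line {pos + 1}")
            if tag == " ":
                out.append(text)
            pos += 1
        else:  # '+' lines are simply emitted
            out.append(text)
    return out + list(lines[pos:])


# Simplified stand-in for the C4 hunk: the last context line carries C3's EDIT.
hunk = [
    (" ", "<td>dob</td>"),
    ("-", "<td>hair_color</td>"),
    ("+", "<td>HAIR_COLORS[hair_color]</td>"),
    (" ", '<a class="button">EDIT</a>'),
]

trunk_before_c4 = ["<td>dob</td>", "<td>hair_color</td>", '<a class="button">EDIT</a>']
release_at_c2 = ["<td>dob</td>", "<td>hair_color</td>", '<a class="button">Edit</a>']

patched_trunk = apply_hunk(trunk_before_c4, 0, hunk)

try:
    apply_hunk(release_at_c2, 0, hunk)
    release_applied = True
except ValueError:
    release_applied = False  # context mismatch: EDIT vs Edit
```

On the trunk-shaped input the hunk applies; on the release-shaped input it is refused, even though the line being changed is identical in both.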

Scenario 2: `git cherry-pick` is helpful, and that’s why it lies

(we do a ./reset.sh to go back to the starting position)

Second instinct, and what most developers actually type:

$ git checkout -b release c2
$ git cherry-pick c4
Auto-merging app.rb
[release ...] C4: hair_color stored as INTEGER (1..6), dropdown values become ints

Clean. No conflict. Test passes.

Why? Because git cherry-pick doesn’t apply patches by context - it does a three-way merge with the parent of C4 as the merge base. The merger sees:

merge base (C3 on trunk) had EDIT
the cherry-pick target (release branch) has Edit
C4 didn’t change either of those lines

So three-way merge correctly concludes “leave the case alone, just apply the hair-colour change.” Release ends up with C4’s diff cleanly applied on top of Edit.

That’s the textbook good outcome. And it’s exactly the failure mode the colleague was warning about - not because this cherry-pick was wrong, but because the success was contingent on a property of the diffs that nobody checked. C3 happened to touch only button text. If C3 had ever-so-slightly tidied up the dropdown the cherry pick may have conflicted - forcing a human to arbitrate on it.

The lies are little lies, from good intentions, perhaps.
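The three-way logic in this section can be sketched as a per-line merge. This is a minimal sketch and not git's merge machinery: `merge3` is a hypothetical function, and it assumes all three versions have the same number of lines (no insertions or deletions), which real merges do not require. The two-line "files" are stand-ins for the relevant lines of app.rb.

```python
def merge3(base, ours, theirs):
    """Line-by-line three-way merge.

    Simplifying assumption: base, ours and theirs have equal length.
    Returns (merged_lines, conflict_indexes); conflicted lines are None.
    """
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree
            merged.append(o)
        elif t == b:      # only our side differs from base: keep ours
            merged.append(o)
        elif o == b:      # only their side differs from base: take theirs
            merged.append(t)
        else:             # both differ from base, and from each other
            merged.append(None)
            conflicts.append(i)
    return merged, conflicts


# base = C3 (parent of C4), ours = release at C2, theirs = C4
base = ["hair_color", "EDIT"]
ours = ["hair_color", "Edit"]
theirs = ["HAIR_COLORS[hair_color]", "EDIT"]
clean, clean_conflicts = merge3(base, ours, theirs)

# Hypothetical variant: C3 had also tidied the hair-colour line.
tidy_base = ["hair_color (tidied)", "EDIT"]
_, tidy_conflicts = merge3(tidy_base, ours, theirs)
```

The first call keeps Edit and takes the hair-colour change with no conflict. The second call conflicts on line 0, which is the "contingent on a property of the diffs" point: nothing in the command you typed changed, only what C3 happened to touch.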

What about merge-point tracking?

Here’s where Subversion fans get nostalgic. SVN’s svn:mergeinfo tries to record on the release branch “I have integrated revisions r3, r4 from trunk.” A subsequent sweep merge of trunk into release knows to skip those. It’s not just Subversion: the “bigger” VCS technologies, Perforce and Microsoft’s TFVC, track merges too.

Git has none of that. A cherry-pick produces a commit with a different SHA from the original - git cat-file -p on the two reveals different parents, different trees, different hashes. The only audit trail is the optional (cherry picked from commit ...) line that git cherry-pick -x leaves in the commit message, and git itself does not consult that line for anything. It’s a comment.

So if you later merge trunk wholly into release, git’s merge base is the last common ancestor - which is C2, not C4. Git will re-apply C3, C4, and C5 from trunk on top of release. If the cherry-picked C4 on release is byte-identical to trunk’s C4, the merger deduplicates silently. If they differ even slightly (a hotfix on release, a typo correction), you get spurious conflicts that look very real, with no way for git to say “you already integrated this, just take trunk’s copy.” If you are coming from SVN, Perforce or TFVC, merge-point tracking is not as you remember it. In a TBD + release branches workflow you would never do a sweeping merge from trunk to the release branch - you would only do cherry-picks, and fewer and fewer of them over time. At some point the release branch has been superseded and is eligible for deletion.

SVN’s mergeinfo aimed at this and got bitten by edge cases - subtree mergeinfo creep, properties getting out of sync if you bypass svn merge. The “slightly broken” reputation is earned. But the intent - cross-branch awareness of what’s been integrated - is something git deliberately doesn’t have. Linus rejected it on simplicity grounds. The price is paid by anyone running long-lived release branches.

The order of cherry-picks question

If C3 and C4 are eventually both wanted on the release branch, does the order matter? Two scenarios:

Scenario A - out of order. Cherry-pick C4 first (the urgent fix), then C3 later (because someone decided uppercase buttons should ship after all):

release branch cherry-picks: C2 then C4' then C3'

Scenario B - in order. Cherry-pick in trunk order:

release branch cherry-picks: C2 then C3' then C4'

In both cases, trunk has C2, C3, and C4. We could then sweep-merge c5 from trunk into the release branch.

Resulting tree hashes (with author/date/identity pinned so SHAs are deterministic):

	Tree hash after sweep merge	Merge commit SHA
Scenario A (out of order)	`088b679...`	`e96a8e0...`
Scenario B (in order)	`088b679...`	`4ad3efc...` (fast-forward!)

Tree hashes match. Content is byte-identical. Empty git diff between the two release branches.

But the commit graphs are different shapes. Scenario A produced a real merge commit with two parents - the cherry-picked C4’/C3’ on release and the trunk C5 - because the histories diverged. Scenario B’s cherry-picks produced commits byte-identical to trunk’s (same diffs, same pinned timestamps), so the sweep merge fast-forwarded; release’s tip is main’s tip.

So the answer to “does order matter” is layered:

Content: no, both converge to the same source tree.
History shape: yes, you get a merge node in one and a flat history in the other.
SHA equivalence: no, never - parent chains differ, so commit SHAs cascade differently. SHA equality was never the right test for “did this work.”

Blocking a commit (Scenario C)

The order question above is about C3 and C4 both eventually shipping. What about C3 not shipping at all — a hard “no, this isn’t for this release”? Cut release from C1 (so C2 also becomes a cherry-pick), cherry-pick C2 and C4, then:

$ git merge -s ours --no-edit \
    -m "block C3: merge -s ours (record without applying diff)" c3
$ git merge --no-edit main                      # sweep

git merge -s ours c3 makes c3 a parent of release via a merge commit, but the resulting tree is exactly “ours” — none of c3’s diff is applied. The merge-base machinery on subsequent merges then sees c3 as already integrated and skips it. It’s the structural equivalent of SVN’s --record-only and Perforce’s resolve -ay.

The scenario script probes “is each trunk tag reachable from release?” after every step, since reachability is git’s audit trail:

after Cherry-pick C2:       c2 yes, c3 no,  c4 no,  c5 no
after Cherry-pick C4:       c2 yes, c3 no,  c4 no,  c5 no
after -s ours block of C3:  c2 yes, c3 YES, c4 no,  c5 no   ← merge node makes c3 an ancestor
after sweep:                c2 yes, c3 yes, c4 yes, c5 yes  ← main is now an ancestor

The c3 flip on the third row is the block being recorded. After the sweep, all of main’s tags are reachable through the merge edge — but git diff release main reports only C3’s button-text changes (UPPERCASE on main, mixed-case on release). The block held.

(Aside: c2 says “yes” right after its cherry-pick because the experiment pins author/date/identity, so the cherry-pick reproduced c2’s original SHA byte-for-byte. In a real workflow timestamps differ on every cherry-pick, so c2 would show “no” too — cherry-picks leave no DAG fingerprint, only -s ours does.)

The thing to notice: git can block a commit durably, but the audit shape is different from SVN/P4. Where SVN updates a property string and P4 writes an ignored integration record, git records the decision as a graph fact — a merge node with the blocked commit as a parent. Reading the audit trail later means inspecting the DAG and the commit message at the block step. There’s no git mergeinfo, no git integrated. The data lives in git log --merges, and the meaning lives in the message you typed.
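The reachability table in this section can be reproduced with a toy commit graph. This is a sketch of the idea only: the graph is hand-written from the description of Scenario C, the node names `c4'`, `block` and `sweep` are mine, and `is_ancestor` is a hypothetical stand-in for the kind of question git merge-base --is-ancestor answers.

```python
def is_ancestor(parents, commit, tip):
    """True if `commit` is reachable from `tip` by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        node = stack.pop()
        if node == commit:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return False


# Child -> parents. With pinned identity the cherry-pick of C2 onto C1
# reproduces c2 itself, so release starts at c2.
parents = {
    "c1": [],
    "c2": ["c1"], "c3": ["c2"], "c4": ["c3"], "c5": ["c4"],  # trunk
    "c4'": ["c2"],              # cherry-pick of C4: a new commit on release
    "block": ["c4'", "c3"],     # merge -s ours: c3 becomes a parent
    "sweep": ["block", "c5"],   # merge of main
}


def probe(tip):
    """Which trunk tags are reachable from this release tip?"""
    return {tag: is_ancestor(parents, tag, tip) for tag in ("c2", "c3", "c4", "c5")}


after_pick = probe("c4'")
after_block = probe("block")
after_sweep = probe("sweep")
```

The c3 flip appears only once the `block` node exists. The cherry-picked `c4'` never makes c4 reachable; that only happens when the sweep adds main as a parent.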

What the SHA actually proves

This is the bit I had to stop and think about. I’d been comparing commit SHAs and getting confused by the differences. The right framing:

Commit SHA equality means “byte-identical commit object including parents.” Tree hash equality means “byte-identical content.” Cherry-picks change parents. They cannot preserve commit SHAs. They can preserve tree hashes - and that’s the only equivalence that matters for correctness.

So when judging whether a cherry-pick + sweep merge produced the right result: diff the trees, not the commits. A clean git diff main release after the sweep is the only proof you need.
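Why a cherry-pick cannot keep its commit SHA falls out of how git names objects: the id is a hash over a small header plus the object body, and a commit's body includes its parent. A minimal sketch, assuming the default SHA-1 object format; the identity string, parent ids and message are made up, and the empty tree is used only so the example needs no repository.

```python
import hashlib


def git_object_id(kind, body):
    """Object id as git computes it: header is kind, space, byte length, NUL."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree, parent, message, who="Pinned <pinned@example.com> 0 +0000"):
    """Id of a single-parent commit. `who` is a made-up pinned identity."""
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {who}\n"
        f"committer {who}\n"
        f"\n{message}\n"
    ).encode()
    return git_object_id("commit", body)


EMPTY_TREE = git_object_id("tree", b"")  # git's well-known empty tree id

# Same tree, same message, same pinned identity; only the parent differs.
on_trunk = commit_id(EMPTY_TREE, "a" * 40, "C4: hair_color as INTEGER")
on_release = commit_id(EMPTY_TREE, "b" * 40, "C4: hair_color as INTEGER")

# Same parent as well: the commit is reproduced byte-for-byte.
replayed = commit_id(EMPTY_TREE, "a" * 40, "C4: hair_color as INTEGER")
```

`on_trunk` and `on_release` share a tree hash but not a commit id. `replayed` matches `on_trunk`, which is the pinned-identity effect seen in Scenario B.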

What git can’t tell you

Putting it together, here is what git silently cannot answer for a release branch built from cherry-picks:

“Have I already integrated this trunk commit?” Git’s answer is “no” - even if you cherry-picked it. The DAG has no link after the event (ignoring formatted comments).
“When I sweep merge trunk to release, will it be a no-op?” Only if every cherry-picked patch is byte-identical to its trunk twin. There’s no machine check.
“Are these two release branches functionally equivalent?” Only git diff of trees can tell you. Commit history is misleading.
“Did this cherry-pick land safely?” Only your automated tests can tell you. Three-way merge succeeding is necessary, not sufficient.

The real risk in real corporate codebases

A common pushback: “in a million-line corporate codebase, two unrelated commits almost never touch the same file region - cherry-picks land cleanly the vast majority of the time.” That’s empirically true. The base rate of textual collision is low.

But the question isn’t frequency, it’s severity when it does happen. The classic bad outcome isn’t a noisy git am rejection - it’s a quiet git cherry-pick that three-way-merges into wrong-but-plausible code, ships to a release branch, passes your tests because the tests don’t cover the exact corner that broke, and surfaces in production a week later. Git gives you no warning. There’s no git fsck --semantic.

The mitigations I keep coming back to:

Work in thin vertical slices. This is a good idea generally, but it especially pays off when cherry-picks are in your future. A single commit/PR that changes the DB schema, the middle tier, and the UI together is one cherry-pick - either it all lands on the release branch or none of it does. Split the same bug fix across three commits (one per tier) and you now have three cherry-picks that each have to be remembered, ordered, and applied. Miss one and you ship a half-fix; the release branch compiles, the smoke test passes, and the bug is “fixed” everywhere except the layer you forgot. Your pre-commit automated tests on the release branch should catch the omission - but “should” is doing heavy lifting there, and the failure mode is exactly the kind of partial-state subtlety tests are weakest at.
Test the release branch like it’s a fresh codebase, not “trunk minus a few commits.” End-to-end, not just the change you cherry-picked.
Cherry-picks of schema/data changes need extra scrutiny. Migration logic written assuming a trunk DB state may break against a release DB state.
Prefer release-from-trunk over release-with-cherry-picks when your cadence allows it - roughly weekly or faster. If you ship every few days, a release branch buys you very little and the cherry-pick overhead isn’t worth it; just tag trunk. Cherry-picked release branches earn their keep at monthly/quarterly cadences where the stabilization window is long enough that trunk has moved on substantially. Long-lived release branches that absorb selective fixes are a structural risk, not a tooling problem git is going to grow out of.
For high-cost cherry-picks, run the full test suite on the cherry-picked branch before merge. This is what the playground demonstrates: a happy-path Selenium/Cypress/Playwright test that hits the UI and asserts on the DB shows up regressions that git diff won’t.

Reproducing this

git clone https://github.com/paul-hammant/limits-of-merging-experiment
cd limits-of-merging-experiment
./start.sh                          # set up solution/ as a fresh playground
./scenario-a-out-of-order.sh        # release: C2, C4, C3, then merge main
./rollback.sh
./scenario-b-in-order.sh            # release: C2, C3, C4, then merge main
./rollback.sh
./scenario-c-block-c3.sh            # release@C1: cherry-pick C2 + C4, BLOCK C3, sweep main

Each script prints the resulting graph, tree hash, and diff. Compare the output of scenarios A and B.

The patches in patches/ are real git format-patch output - readable, replayable, and the source of truth for the trunk timeline. The scenario scripts pin author identity and timestamps so anyone running them gets the same SHAs I did. That’s the only way to make a cherry-pick experiment reproducible.

Repeating in SVN: what `svn:mergeinfo` actually looks like

Earlier I waved at svn:mergeinfo as the thing git deliberately doesn’t have. Worth showing the property string itself, because the shape is the whole point.

The same C2–C5 timeline replayed in a fresh local SVN repo gives this revision map:

Repo rev	Meaning
r1	layout (`mkdir trunk + branches + tags`)
r2	C1 — initial Person CRUD app
r3	C2 — `hair_color` (string)
r4	C3 — UPPERCASE buttons
r5	C4 — `hair_color` INTEGER
r6	C5 — maintainer comment
r7	`svn copy /trunk@r3 /branches/release` (cut at C2)

Scenario A — cherry-pick out of order (C4 then C3, then sweep)

$ cd svn-wc/branches/release

$ svn merge -c5 ^/trunk .            # cherry-pick C4
$ svn commit -m "cherry-pick C4 from trunk@r5"
$ svn propget svn:mergeinfo .
  /trunk:5

$ svn merge -c4 ^/trunk .            # cherry-pick C3
$ svn commit -m "cherry-pick C3 from trunk@r4"
$ svn propget svn:mergeinfo .
  /trunk:4-5

$ svn merge ^/trunk .                # sweep
--- Merging r6 through r9 into '.':
U    app.rb
$ svn commit -m "sweep merge ^/trunk into release"
$ svn propget svn:mergeinfo .
  /trunk:4-9

Scenario B — cherry-pick in trunk order (C3 then C4, then sweep)

$ svn merge -c4 ^/trunk .            # cherry-pick C3
$ svn commit -m "cherry-pick C3 from trunk@r4"
$ svn propget svn:mergeinfo .
  /trunk:4

$ svn merge -c5 ^/trunk .            # cherry-pick C4
$ svn commit -m "cherry-pick C4 from trunk@r5"
$ svn propget svn:mergeinfo .
  /trunk:4-5

$ svn merge ^/trunk .                # sweep
$ svn commit -m "sweep merge ^/trunk into release"
$ svn propget svn:mergeinfo .
  /trunk:4-9

What the property is telling you

Both scenarios converge to /trunk:4-9, and svn diff ^/trunk ^/branches/release reports no file content difference — only this property exists on the branch root. The intermediate path differs (/trunk:5 → /trunk:4-5 versus /trunk:4 → /trunk:4-5), but order doesn’t matter to the end state, just like in git.

What is different from git: the sweep svn merge ^/trunk reads the property and refuses to re-apply revisions named in it. Only r6 (C5) actually produced edits in the sweep — r4 and r5 were already accounted for. SVN can answer the question “have I integrated this trunk revision yet?” because it wrote down the answer the first time. Git cannot, because git deliberately wrote nothing down.

There’s a quirk visible in the final string: /trunk:4-9, not /trunk:4-6. SVN records the closed range it considered during the sweep, including repo revisions that touched neither trunk nor any merge source — r7 was the branch copy, r8 and r9 were the cherry-pick commits themselves. This is exactly the kind of “mergeinfo creep” SVN earned its slightly-broken reputation for. It’s harmless here; it can become noisy across years of long-lived branches, particularly if anyone bypasses svn merge and edits properties by hand.

Scenario C — blocking C3 with `--record-only`

The third question worth asking: what about the change you don’t want to ship? Suppose the team’s verdict on C3 (UPPERCASE buttons) is “not for this release” — not “we’ll cherry-pick it later” but a hard no. Re-cut the release branch from C1 (so C2 also becomes a cherry-pick), then:

$ svn merge -c3 ^/trunk .                # cherry-pick C2 (the wanted feature)
$ svn merge -c5 ^/trunk .                # cherry-pick C4 (bugfix on top of C2)
$ svn merge --record-only -c4 ^/trunk .  # block C3: record but do not apply
$ svn merge ^/trunk .                    # sweep — should only pick up C5

The --record-only flag adds the revision to svn:mergeinfo without applying its diff. The property evolution:

after cherry-pick C2 (-c3):       /trunk:3
after cherry-pick C4 (-c5):       /trunk:3,5      ← non-contiguous, r4 absent
after record-only block (-c4):    /trunk:3-5      ← r4 fills in, no diff applied
after sweep:                      /trunk:3-10

The sweep consults the property, sees r4 is accounted for, and skips it. The final release tree differs from trunk only by C3’s button-text change — the block held.

The thing to notice: the post-block property string is /trunk:3-5. That is exactly what the string would say if C3 had been merged normally. It cannot tell you whether r4 was applied or blocked — only that it was considered. SVN’s mergeinfo records “we accounted for this revision,” not the intent behind that accounting. Future-you reading the property in a year has to fall back on the commit message at the block step. The data structure remembers the bookkeeping but not the decision.
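The bookkeeping can be modelled as nothing more than a set of revision numbers rendered as ranges. This is a sketch of the idea and not Subversion's implementation: `format_mergeinfo` and `eligible` are hypothetical helpers, and the model ignores the extra revisions the real sweep folds into the range.

```python
def format_mergeinfo(path, revs):
    """Render merged revisions the way the property reads, e.g. /trunk:3,5."""
    runs, run = [], []
    for r in sorted(revs):
        if run and r == run[-1] + 1:
            run.append(r)          # extend the current contiguous run
        else:
            if run:
                runs.append(run)
            run = [r]              # start a new run
    if run:
        runs.append(run)
    text = ",".join(
        str(p[0]) if len(p) == 1 else f"{p[0]}-{p[-1]}" for p in runs
    )
    return f"{path}:{text}"


def eligible(trunk_revs, merged):
    """Trunk revisions (after the branch point) a sweep still has to apply."""
    return [r for r in trunk_revs if r not in merged]


merged = set()
merged.add(3)                       # cherry-pick C2 (-c3)
merged.add(5)                       # cherry-pick C4 (-c5)
after_picks = format_mergeinfo("/trunk", merged)

merged.add(4)                       # --record-only block of C3: no diff applied
after_block = format_mergeinfo("/trunk", merged)

left_for_sweep = eligible([3, 4, 5, 6], merged)
```

The block and a normal merge are the same `merged.add(4)` in this model, which is why the string cannot tell them apart.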

Reproducing the SVN side

The scripts are on the svn-version branch of the same repo:

git checkout svn-version
sudo apt install subversion       # or your platform's equivalent
./start.sh                        # build trunk r2..r6 + release@r3
./scenario-a-out-of-order.sh      # cherry-pick C4 then C3, then sweep
./rollback.sh                     # wipe svn-repo/ and svn-wc/
./scenario-b-in-order.sh          # cherry-pick C3 then C4, then sweep
./rollback.sh
./scenario-c-block-c3.sh          # release@r2; cherry-pick C2 + C4, BLOCK C3, sweep

Each scenario script wipes and rebuilds the repo (SVN is append-only, so “reset to a past revision” means start over), prints the svn:mergeinfo value after every step, and ends with a svn diff of trunk against release. The same patches/ directory feeds both the git and SVN flows — the patches are applied with patch -p1 rather than git apply, because the SVN working copy lives inside the outer git worktree and git apply would treat it as the parent repo’s index.

So: SVN does have the audit trail, the property is human-readable, and the sweep merge is genuinely aware of it. The cost is the property’s tendency to grow ranges that include revisions it had no business including, plus the institutional discipline of never touching svn:mergeinfo directly. Whether that’s a better trade than git’s “we keep no record at all” is a judgement call about what failure mode you’d rather face — false reassurance from a slightly-wrong record, or no record and a test suite doing all the work.

Repeating in Perforce: integration records, not properties

Perforce solves the same problem SVN does — what’s been integrated where — but it stores the answer per-file in the integration database rather than as a string-property on the branch root. Run p4 integrated and the depot tells you, for every file revision on the branch, which trunk revision it came from and whether it arrived as a clean copy or a three-way merge.

Same C2–C5 timeline, replayed against a local p4d:

Changelist	Meaning
CL1	C1 — initial Person CRUD app
CL2	C2 — `hair_color` (string)
CL3	C3 — UPPERCASE buttons
CL4	C4 — `hair_color` INTEGER
CL5	C5 — maintainer comment
CL6	`p4 populate //depot/trunk/...@2 //depot/branches/release/...` (cut at C2)

Scenario A — cherry-pick out of order (C4 then C3, then sweep)

$ p4 integrate //depot/trunk/...@4,@4 //depot/branches/release/...
$ p4 resolve -am //depot/branches/release/...
$ p4 submit -d "cherry-pick C4 from trunk@CL4"
$ p4 integrated //depot/branches/release/app.rb
//depot/branches/release/app.rb#1 - branch from //depot/trunk/app.rb#1,#2
//depot/branches/release/app.rb#2 - merge from //depot/trunk/app.rb#4

$ p4 integrate //depot/trunk/...@3,@3 //depot/branches/release/...
$ p4 resolve -am //depot/branches/release/...
$ p4 submit -d "cherry-pick C3 from trunk@CL3"
$ p4 integrated //depot/branches/release/app.rb
//depot/branches/release/app.rb#1 - branch from //depot/trunk/app.rb#1,#2
//depot/branches/release/app.rb#3 - merge from //depot/trunk/app.rb#3
//depot/branches/release/app.rb#2 - merge from //depot/trunk/app.rb#4

$ p4 integrate //depot/trunk/... //depot/branches/release/...
$ p4 resolve -am //depot/branches/release/...
$ p4 submit -d "sweep merge //depot/trunk into //depot/branches/release"
$ p4 integrated //depot/branches/release/app.rb
//depot/branches/release/app.rb#1 - branch from //depot/trunk/app.rb#1,#2
//depot/branches/release/app.rb#3 - merge from //depot/trunk/app.rb#3
//depot/branches/release/app.rb#2 - merge from //depot/trunk/app.rb#4
//depot/branches/release/app.rb#4 - copy from //depot/trunk/app.rb#5

Scenario B — cherry-pick in trunk order (C3 then C4, then sweep)

$ p4 integrate //depot/trunk/...@3,@3 //depot/branches/release/...
$ p4 resolve -am //depot/branches/release/... ; p4 submit -d "cherry-pick C3"
$ p4 integrated //depot/branches/release/app.rb
//depot/branches/release/app.rb#1 - branch from //depot/trunk/app.rb#1,#2
//depot/branches/release/app.rb#2 - copy from //depot/trunk/app.rb#3

$ p4 integrate //depot/trunk/...@4,@4 //depot/branches/release/...
$ p4 resolve -am //depot/branches/release/... ; p4 submit -d "cherry-pick C4"
$ p4 integrated //depot/branches/release/app.rb
//depot/branches/release/app.rb#1 - branch from //depot/trunk/app.rb#1,#2
//depot/branches/release/app.rb#2 - copy from //depot/trunk/app.rb#3
//depot/branches/release/app.rb#3 - copy from //depot/trunk/app.rb#4

$ p4 integrate //depot/trunk/... //depot/branches/release/...
$ p4 resolve -am //depot/branches/release/... ; p4 submit -d "sweep merge"
$ p4 integrated //depot/branches/release/app.rb
//depot/branches/release/app.rb#1 - branch from //depot/trunk/app.rb#1,#2
//depot/branches/release/app.rb#2 - copy from //depot/trunk/app.rb#3
//depot/branches/release/app.rb#3 - copy from //depot/trunk/app.rb#4
//depot/branches/release/app.rb#4 - copy from //depot/trunk/app.rb#5

What the records are telling you

Same content in both scenarios — p4 diff2 //depot/trunk/... //depot/branches/release/... reports identical for every file. The two depots converged.

But the integration verbs differ in a way SVN’s mergeinfo and git’s history don’t expose at all:

Scenario A: every cherry-pick is recorded as merge from. Cherry-picking C4 onto a branch that doesn’t yet have C3 forced a three-way resolve under the hood — P4 noticed and labelled it.
Scenario B: every cherry-pick is recorded as copy from. C3 then C4 in trunk order produced clean takes on each step.

The “verb that landed me here” is part of the audit trail. If you ever investigate why a release branch file diverges from trunk, knowing whether it got there via a merge or a copy (and from which exact source revision) is the question P4 answers and the question git can’t.

The sweep p4 integrate //depot/trunk/... //depot/branches/release/... with no rev range consults the integration database and only re-applies revisions that haven’t been credited yet - CL5 (C5) in our run. That’s what P4 marketing meant by “merge tracking” decades before SVN tried to bolt the same idea on with svn:mergeinfo. The cost is that it’s all server-side state, locked behind the depot - there’s no offline mode, no pull-request workflow, and an Unloaded depot for archival is its own ceremony. The benefit is the integration history is structured, queryable per-file, and never gets out of sync with what was actually integrated.

Scenario C — blocking C3 with `resolve -ay`

Same workflow as the SVN scenario C, branched from C1: cherry-pick C2, cherry-pick C4, block C3, sweep.

$ p4 integrate //depot/trunk/...@3,@3 //depot/branches/release/...
$ p4 resolve -ay //depot/branches/release/...     # accept yours = keep target, ignore source
$ p4 submit -d "block C3: integrate + accept-yours of trunk@CL3 (no diff)"
$ p4 integrated //depot/branches/release/app.rb
//depot/branches/release/app.rb#1 - branch  from //depot/trunk/app.rb#1
//depot/branches/release/app.rb#2 - copy    from //depot/trunk/app.rb#2     (C2)
//depot/branches/release/app.rb#3 - merge   from //depot/trunk/app.rb#4     (C4)
//depot/branches/release/app.rb#4 - ignored //depot/trunk/app.rb#3          (C3 — blocked)

The integration verb recorded for C3 is “ignored”. P4’s per-file integration database has two outcomes for any source revision: branch/copy/merge (it landed) or ignored (it was considered and rejected). After the sweep:

//depot/branches/release/app.rb#5 - merge   from //depot/trunk/app.rb#5     (C5)

C3 stays ignored. The block held, and the audit trail says, in machine-readable form, what happened.

The thing to notice: this is the cleanest case where P4 carries strictly more information than SVN. SVN’s svn:mergeinfo records that a revision was accounted for; P4’s integration database records whether the bytes were taken or rejected. Asked “did anyone consider C3 for this release?” the SVN string says yes; the P4 record says yes and tells you the answer was “no, ignored.”
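To make the difference concrete, here are the Scenario C records for app.rb as plain data, with the two questions asked of them. This is a sketch only: the `Integration` type and both query functions are hypothetical, and the records are transcribed by hand from the p4 integrated output above rather than read from a server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Integration:
    target_rev: int   # revision of //depot/branches/release/app.rb
    verb: str         # "branch", "copy", "merge" or "ignored"
    source_rev: int   # revision of //depot/trunk/app.rb


records = [
    Integration(1, "branch", 1),
    Integration(2, "copy", 2),      # C2
    Integration(3, "merge", 4),     # C4
    Integration(4, "ignored", 3),   # C3, blocked
    Integration(5, "merge", 5),     # C5, from the sweep
]


def considered(recs):
    """What a mergeinfo-style record can say: which source revs were accounted for."""
    return sorted(r.source_rev for r in recs)


def blocked(recs):
    """What the verb adds: which source revs were considered and rejected."""
    return sorted(r.source_rev for r in recs if r.verb == "ignored")


all_considered = considered(records)
all_blocked = blocked(records)
```

`considered` is the only question the SVN string can answer. `blocked` needs the verb, and the verb is what P4 stores.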

Reproducing the Perforce side

Scripts are on the perforce-version branch:

git checkout perforce-version
# install p4 + p4d — see https://www.perforce.com/downloads/helix-core
# (or the primer at github.com/paul-hammant/fast_perforce_setup)
./start.sh                       # build trunk CL1..CL5 + release@CL2 on a sandbox p4d
./scenario-a-out-of-order.sh     # cherry-pick C4 then C3, then sweep
./rollback.sh                    # stop p4d, wipe p4-server/ and p4-wc/
./scenario-b-in-order.sh         # cherry-pick C3 then C4, then sweep
./rollback.sh
./scenario-c-block-c3.sh         # release@CL1; cherry-pick C2 + C4, BLOCK C3, sweep

The sandbox runs p4d on localhost:1667 (not the conventional 1666), with no SSL and no security level set, so no passwords. Everything lives under p4-server and p4-wc; both are wiped on every run. Patches are applied with patch -p1 for the same reason as the SVN scripts — git apply would notice the outer git worktree and refuse to write.

Closing

So which VCS “knows what’s been integrated”?

	Records integration history?	Where it lives	What it records	Distinguishes “applied” from “blocked”?
Git	No	Nowhere (optional `(cherry picked from …)` comment, never read)	Nothing machine-checkable	No
SVN	Yes	`svn:mergeinfo` property on branch root	A revision-range string, e.g. `/trunk:4-9`	No — both look the same in mergeinfo
Perforce	Yes	Per-file integration database	Source path, source rev, integration verb (`branch`, `copy`, `merge`, `ignored`)	Yes — the `ignored` verb

All three converge on the same source tree when the underlying patches don’t conflict. The difference is what the tool can tell you afterwards about how that tree got built — and therefore what kinds of “did the cherry-pick land safely?” questions you can ask the tool versus answer with tests.

Cherry-pick is not infallible. It also isn’t usually wrong. The hazard is the gap between those two facts: git’s vocabulary for “this is the same change” is “these are byte-identical bytes,” and outside of that narrow case, it has no opinion. SVN tried to fill that gap and is imperfect in its implementation. Git decided the mess wasn’t worth it.

What this means in practice for trunk-based development with release branches: cherry-pick is a tool that requires you to bring your own audit. The audit is your test suite, your code review, and your CI. If those are weak, cherry-pick is a foot-gun. If they’re strong, cherry-pick is fine - and the SHA divergence between trunk and release is just bookkeeping noise.

So per my colleague’s nudge - cherry-picks are great, but be careful and know the limits.

One last thing: I’ve previously been in engineering leadership for a team with 12 planned releases a year, and sometimes late features were merged to the release branch using cherry-pick. Don’t do that - wait for the next release. I lost the argument cos the business really really wanted those late features - more than once for the same component of the system. Cherry-pick is for release-stabilizing things only: toggle off your work that isn’t ready to go live before the branch cut moment.

Updates:

May 7, 2026: add scenario-c. Also svn and perforce branches.

Live Verify

2026-03-17T00:00:00+00:00

Nobody cares about a thing in the tech world until it is successful, and I have something here that could be, but isn’t yet. It’s a brand new idea and has an immense chasm to cross. Most of it is an adoption/patronage chasm, but there are technical challenges too.

I’ve blogged before on this enough to have timed out any chance of filing a patent on it. So there’s no way I could make a SaaS that would have some exclusivity to operate based on patent. Not that patents really protect business ideas these days anyway.

Live verify links

GitHub home page: live-verify.github.io/live-verify/
GitHub Repo: github.com/live-verify/live-verify

This is me and ClaudeCode mostly. Me reminding Claude that I really really like automated tests.

The home page has screenshots of live-verify in action (before, after verification, trust statement), and some real videos of tests running. Some of those videos are me showing camera apps, and some are voiceless screen recordings of a larger multi-tier simulation of all the tech pieces. The home page also links to 480 or so use cases (and has a search feature).

What is it?

A system to verify a document, or part of a document, in your hands that might be far away from the system that produced it. And verify means a claim within - could be just a rectangle of text midway through the document. Could be 20 rectangles and 20 verifications. They could also be from something that’s printed and in front of you, using a phone camera app. Perhaps only for small rectangles of text. Most of the time you’ve the document as a digital document. Say a PDF or a web page. In that case the verification is via a Chrome extension. In time the same tech is built into Adobe, Outlook, MsWord etc. At some point it is in the OS and the need for a Chrome extension disappears.

There are 480 use cases listed. Maybe that’s only 280 after consolidation. Use cases are anti-fraud, safety, and accountability. To say more, use cases cover preventing document fraud (fake certificates, forged receipts), verifying safety credentials (building inspectors, healthcare workers, equipment compliance), and enforcing accountability (audit trails, regulatory compliance, chain of custody). “Accountability” is a catch-all though. It captures the compliance, ethics, audit, and delegated-authority themes that don’t fit neatly under fraud or safety.

Most of the time it is about reducing costs of things by rolling back fraud, because fraudulent claims would now be easier and quicker to identify, and their consequences easier to avoid. Yes, “claim” is an insurance thing, but I’m talking about more general use, like this claim: “I, Jimmy Cricket, worked at Microsoft in Seattle from 2010 to 2013 in the DevOps team”, or “Mr Alex E Spooner earned his millions from his family’s corner shop and not from selling drugs at all. He would make a great investor”. Other uses could be ID systems, with a camera app confirming that the person with the eInk badge-ID is actually a police officer. That’d need rules of conduct for those verification moments.

The system rests on human eyes being able to quickly scan a claim and wonder “I would trust this if only there was a way of verifying it as true”, then a button press and some maths suggesting it is true (is “verified”) and a chain of trust that human eyes can also quickly scan toward another trust or not-trust decision. And a system that’s much more trustworthy than presenting and using a QR code.

No personal data ever leaves your device. The text is captured, normalized, and hashed entirely on-device. Only the hash — a one-way fingerprint that reveals nothing about the document — is sent over the network. The verification endpoint never sees your degree certificate, salary receipt, or passport. Tens of thousands of cryptography PhDs could testify in court that hashing is irreversible.

The maths is hashing - SHA256 by default, but could be stronger or weaker. Some education needed for ordinary people to broadly understand why it is one-way. The hashes are all in public and not indexed. You know the hash, you can see the payload (default would be { "status":"verified" }). You don’t know the hash, you can’t discover it, subject to the plain-text’s entropy.
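To make the "normalize, then hash" step concrete, here is a minimal Node.js sketch. The normalization rules in it (Unicode NFC, collapsing runs of whitespace, trimming lines, dropping blank lines) are my assumptions for illustration only; the project's own normalize.js is the canonical definition, and whether the verify: line is part of the hashed text is not something this sketch settles.

```javascript
const { createHash } = require('crypto');

// Assumed normalization rules, for illustration only:
// NFC, collapse whitespace runs, trim each line, drop blank lines.
function normalizeText(text) {
  return text
    .normalize('NFC')
    .split(/\r?\n/)
    .map((line) => line.replace(/\s+/g, ' ').trim())
    .filter((line) => line.length > 0)
    .join('\n');
}

// SHA-256 of the normalized text, as lowercase hex.
function hashClaim(text) {
  return createHash('sha256').update(normalizeText(text), 'utf8').digest('hex');
}

const printed =
  'I, Paul Hammant, worked for Kevin Behr in\n' +
  'his role as CIO of HedgeServ in New York City\n' +
  'in 2015 and 2016';

// The same claim with the kind of spacing noise a capture step might add.
const ocrNoisy =
  '  I, Paul Hammant,  worked for Kevin Behr in \n\n' +
  'his role as CIO of HedgeServ in New York City\n' +
  ' in 2015 and 2016\n';

console.log(hashClaim(printed));
console.log(hashClaim(printed) === hashClaim(ocrNoisy)); // spacing noise does not change the fingerprint
```

Only the hex string would go over the network. A single changed character in the claim gives an unrelated hash, which is why normalization has to be identical on every platform.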

Revocation is built in. “Verified” isn’t permanent. An issuer can change the status to revoked, suspended, or expired at any time — just update what the endpoint returns. A doctor loses their license, a certificate is superseded, an employee is terminated: the next person who verifies sees the current status, not a stale “OK.” This is what static digital signatures can’t do.
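A small sketch of how a client might interpret the endpoint's current answer. The four status values come from the text above; representing "hash not found" as null, and the function and field names, are assumptions for this example.

```javascript
// Statuses named in the text; anything else is treated as not trusted.
const KNOWN_STATUSES = new Set(['verified', 'revoked', 'suspended', 'expired']);

// `payload` is the parsed JSON body from the issuer's endpoint, or null
// when the hash was not found (an assumption made for this sketch).
function describeStatus(payload) {
  if (payload === null || typeof payload !== 'object') {
    return { trusted: false, label: 'not found' };
  }
  const status = String(payload.status || '').toLowerCase();
  if (!KNOWN_STATUSES.has(status)) {
    return { trusted: false, label: 'unrecognized status' };
  }
  // Only a current "verified" counts; revoked, suspended and expired do not.
  return { trusted: status === 'verified', label: status };
}

console.log(describeStatus({ status: 'verified' }));
console.log(describeStatus({ status: 'revoked' }));
console.log(describeStatus(null));
```

Because the answer is fetched at verification time, nothing on the document itself needs to change when the issuer flips the status.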

Trust rests on the domain, not the app. There’s no central authority or certificate registry. You decide whether ed.ac.uk is an authority for Edinburgh degrees, or whether coned.com is an authority for utility worker badges. The verify: line on the document tells you whose domain is making the claim — and that domain can declare who authorized it. For example, in the automated tests there’s a James Whitfield bank statement from Meridian National Bank with verify:meridian-national.bank.us/statements. The extension verifies the hash, then walks the authority chain: Meridian National Bank says it’s authorized by the FDIC (fdic.gov), which in turn is authorized by the US Treasury (treasury.gov). The verifier sees the full chain and decides whether to trust it.
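Walking the authority chain can be sketched as following "authorized by" links until they run out. The in-memory table and the `authorizedBy` field name are hypothetical stand-ins for whatever each domain actually publishes; only the three domains come from the example above.

```javascript
// Hypothetical stand-in for what each domain declares about itself.
const declarations = {
  'meridian-national.bank.us': { authorizedBy: 'fdic.gov' },
  'fdic.gov': { authorizedBy: 'treasury.gov' },
  'treasury.gov': {},
};

// Follow authorizedBy links from the issuing domain. Stops after a domain
// with no declaration or no authorizer, on a loop, or at maxDepth.
function walkAuthorityChain(startDomain, lookup, maxDepth = 10) {
  const chain = [];
  const seen = new Set();
  let domain = startDomain;
  while (domain && !seen.has(domain) && chain.length < maxDepth) {
    chain.push(domain);
    seen.add(domain);
    const declaration = lookup(domain);
    domain = declaration ? declaration.authorizedBy : undefined;
  }
  return chain;
}

const chain = walkAuthorityChain('meridian-national.bank.us', (d) => declarations[d]);
console.log(chain.join(' -> '));
```

The function only collects the chain. Deciding whether to trust it stays with the person looking at it, as described above.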

So we have two camera-using phone apps, and a Chrome extension. Ideally we’d go on to make some plugin for Acrobat and Outlook and more. The Chrome extension is working well, but the camera apps have upper limits. One is the size of “document” to be verified, which is understandable as OCR through a camera lens isn’t perfect. The other is tabular data - read on for more on that.

I wish there was more shared code, between the implementations:

Platform	Normalization	Hashing
Web app (`public/normalize.js`)	JavaScript (canonical)	Web Crypto API
Chrome extension (`shared/normalize.js`)	JavaScript (auto-generated copy via `scripts/sync-shared.js`)	Web Crypto API
iOS app	JavaScript via JSBridge (runs normalize.js directly)	CryptoKit (native Swift)
Android app	JavaScript via Rhino (runs transpiled normalize.js)	Native `MessageDigest`

The web-app version was where I started with this some months ago. It used Tesseract OCR and would run online just asking for the camera. Tesseract (via WASM) was really problematic so I shifted to proper apps ahead of schedule.

I also have a reference backend that we use in the built-in automated tests. The possibly hundreds of SaaS companies that supply into this space may make different choices. The value is the open standard here.

Post-Verification Actions

Verification doesn’t have to stop at “verified.” There are two sources of follow-up actions.

The first is the app or phone itself. If you’re scanning a coffee shop receipt and you have Expensify installed, the app can offer “Send to Expensify?” — that’s a client-side decision based on context, and a choice for you to accept or ignore. The receipt issuer doesn’t know or control what’s offered. The second is the verification endpoint deliberately returning actions in its response. A building inspector’s badge could offer a form for the homeowner to record visit details (areas accessed, duration, any concerns). A lawyer’s credentials could link to the bar association’s public disciplinary record. These are issuer choices — the endpoint decides what to offer.

The endpoint pattern scales from light (a link) to strong (a POST form for reporting). The strong version matters where there’s a power dynamic — an inspector at your door, a healthcare worker with a vulnerable patient. The verification response tells the verifier “you may record details of this interaction” and explicitly says they will never be told not to.

Current Tech Problems

Live Verify’s camera mode works beautifully for prose documents — certificates, references, claims where text flows continuously line by line. Point your phone, tap verify, done.

But tabular data — receipts, invoices, anything with left-aligned descriptions and right-aligned prices — breaks the pipeline. Many of the use cases involve tabular data.

What Works: Prose

An employment reference like this verifies perfectly:

I, Paul Hammant, worked for Kevin Behr in
his role as CIO of HedgeServ in New York City
in 2015 and 2016
verify:paulhammant.com/refs

The text flows left-to-right with no gaps. The OCR engine (Apple Vision on iOS, Google ML Kit on Android) sees one contiguous block of text. Verification succeeds.

What Breaks: Tabular Receipts

A coffee shop receipt like this fails:

Flat White                £3.40
Almond Croissant          £3.25
SUBTOTAL:                 £6.65

Where a human sees one line, the OCR engine sees two separate text blocks (“Flat White” and “£3.40”) because the visual gap signals “these are separate regions.” A receipt that should be one block becomes 10+ fragments. Both the iOS and Android camera apps I have made, which attempt to use the Live Text features of the OS, show up to 10 fragments. And that’s the best case: sometimes there are 7, and some crucial numbers on the right-hand side are omitted completely.

Where This Leaves Us

Document Type	Clip Mode (Browser)	Camera (Android)	Camera (iOS)
Short prose (peer references, badge claims)	Works	Works	Works
Longer prose (certificates, full letters)	Works	Works (OCR errors creep in)	Works (OCR errors creep in)
Tabular (receipts, invoices)	Works	Works (stitching)	Broken (single-rectangle)

Clip mode — the browser extension — handles everything perfectly because it operates on digital text, not pixels. No OCR, no rectangle fragmentation.

The Real Fix: Registration Marks for Tabular Data

Our stitching on Android is a workaround. It handles the common case but it’s fragile — font size changes, multi-column layouts, and unusual spacing can all defeat Y-coordinate grouping.
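For readers who have not seen Y-coordinate grouping, here is a minimal sketch of the idea in JavaScript. It is not the Android app's code: the fragment shape `{ text, x, y, height }` and the tolerance value are assumptions, and real OCR APIs report bounds in their own formats.

```javascript
// Each fragment is assumed to be { text, x, y, height } with (x, y) the
// top-left corner in pixels.
function stitchFragments(fragments, tolerance = 0.5) {
  const sorted = [...fragments].sort((a, b) => a.y - b.y);
  const rows = [];
  for (const frag of sorted) {
    const centre = frag.y + frag.height / 2;
    const row = rows[rows.length - 1];
    // Same row if the vertical centre is within a fraction of the row height.
    if (row && Math.abs(centre - row.centre) <= tolerance * row.height) {
      row.items.push(frag);
    } else {
      rows.push({ centre, height: frag.height, items: [frag] });
    }
  }
  // Within a row, read left to right and join with a single space.
  return rows
    .map((row) => row.items.sort((a, b) => a.x - b.x).map((f) => f.text).join(' '))
    .join('\n');
}

const fragments = [
  { text: '£3.40', x: 260, y: 11, height: 20 },
  { text: 'Flat White', x: 10, y: 10, height: 20 },
  { text: 'Almond Croissant', x: 10, y: 40, height: 20 },
  { text: '£3.25', x: 260, y: 42, height: 20 },
  { text: 'SUBTOTAL:', x: 10, y: 70, height: 20 },
  { text: '£6.65', x: 260, y: 69, height: 20 },
];
console.log(stitchFragments(fragments));
```

The fragility is visible in the sketch: the row height comes from the first fragment seen on that row, so a change of font size or a second column of text can put fragments into the wrong row.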

The proper solution is for Apple and Google to support registration marks for tabular data in their OCR APIs. Two marks at diagonal corners define a bounding rectangle. The OCR engine sees the marks, treats everything inside as one text block, and strips the marks from the output — like a QR finder pattern that the camera uses for orientation but doesn’t include in the payload.

We already have one mark: the vfy: line at the bottom-left of every verifiable region. A Unicode corner character — ⌝ (U+231D, upper right corner) — on the first line at the right margin provides the opposing corner:

8 Market Square                 ⌝
Henley-on-Thames RG9 2AA
Flat White                  £3.40
Almond Croissant            £3.25
SUBTOTAL:                   £6.65
vfy:r.the-daily-grind.co.uk

It’s already in the pic I took above.

The ⌝ is a control mark, not content. The OCR engine consumes it for bounds detection and omits it from the text output, just as it would omit a QR finder square. The vfy: line stays — it’s already part of the verification protocol and already in the OCR output.

Why a Unicode character rather than an image or a special printed mark? Because it works everywhere text works: HTML, PDF, LaTeX, Word, thermal receipt printers. Any system that can render U+231D can print the mark. No image embedding, no special font, no binary format dependency.

This is future work — it requires Apple and Google to recognize the ⌝ + vfy: pair as a single region boundary in their Vision and ML Kit frameworks. But the convention is simple, the marks are unobtrusive on the printed document, and the benefit is large: receipts, invoices, lab results, bank statements, and every other tabular document becomes verifiable by camera.

Until then, camera apps work better for prose, and clip mode works for everything - including most of the anti-fraud cases.

Did You Send This - for mobile phone SMS/Voice

2025-11-30T00:00:00+00:00

This article is a comprehensive 2025 update to the original 2006 post: “Did you send this - another weapon against spam?”

Applying DYST to Mobile Communication

The original DYST idea could theoretically be adapted by Google and Apple, who could implement it unilaterally for mobile calls and messages. Rather than relying on telecommunication carriers, they could use their existing TCP/IP infrastructure for verification. An app on the sending device would get a temporary token from an Apple/Google server, and the receiving device would check that token with the same central server before alerting the user.

DYST-supporting phone (iOS or Android)

Here is a sequence diagram illustrating the back-channel exchange for a genuine call

sequenceDiagram
    participant Alice as Alice's Handset<br>(Originator)
    participant AliceTeleco as Alice's<br>Teleco
    participant Bob as Bob's Handset<br>(Recipient)
    participant Alliance as Spam Alliance<br>Backend

    Alice->>AliceTeleco: 1. Alice actually<br>Initiates call
    AliceTeleco->>Bob: Call Signalling
    activate Bob
    Note over Bob: Receives signalling,<br/>waits for verification.

    Bob->>Alliance: 2. Verification Request
    deactivate Bob
    activate Alliance

    Note over Alliance: Determines Alice's handset<br/>is fully DYST-enabled.
    Alliance->>Alice: 3. DYST Challenge
    activate Alice
    Note over Alice: Alice does not see<br/>"Are you placing a call<br>to <bobs phone numb>"<br>question. And handset<br>silently confirms she is
    Alice-->>Alliance: 4. Challenge Response (Confirmed)
    deactivate Alice
    Note over Alliance: Alice's auto<br>confirmation received.
    Alliance->>Bob: 5. Verification Success
    deactivate Alliance
    activate Bob
    Note over Bob: Now rings or displays<br>message. Call/Text<br>connection established.<br/>Phone buzzes or rings
    deactivate Bob

If Bob is in the address book, then <bobs phone numb> is replaced with Bob’s name

Here is a sequence diagram illustrating the back-channel exchange for a genuine Apple/Google “Messages” (not SMS)

sequenceDiagram
    participant Alice as Alice's Handset<br>(Originator)
    participant Bob as Bob's Handset<br>(Recipient)
    participant Alliance as Spam Alliance<br>Backend

    Alice->>Bob: 1. Alice actually<br>sends text msg
    activate Bob
    Note over Bob: Receives signalling,<br/>waits for verification.

    Bob->>Alliance: 2. Verification Request
    deactivate Bob
    activate Alliance

    Note over Alliance: Determines Alice's handset<br/>is fully DYST-enabled.
    Alliance->>Alice: 3. DYST Challenge
    activate Alice
    Note over Alice: Alice does not see<br/>"Did you send this<br>to <bob phone num>"<br>question. And handset<br>silently confirms she did
    Alice-->>Alliance: 4. Challenge Response (Confirmed)
    deactivate Alice
    Note over Alliance: Alice's auto<br>confirmation received.
    Alliance->>Bob: 5. Verification Success
    deactivate Alliance
    activate Bob
    Note over Bob: Now rings or displays<br>message. Call/Text<br>connection established.<br/>Phone buzzes
    deactivate Bob

Apple added iMessage to the iPhone in 2011 (for messaging other Apple users). Google did similar for Android users, and these days Rich Communication Services (RCS) allows both to interoperate. In the diagram above the telco’s systems are not shown because they are not involved in the routing of “smart” (non-SMS) messages.

Here is a sequence diagram illustrating the back-channel exchange for a fake call

sequenceDiagram
    participant Eve as Eve's Handset<br>(Originator)
    participant EveVOIP as Eve's VoIP<br>Gateway
    participant Alice as Alice's Handset<br>(Claimed Originator)
    participant Bob as Bob's Handset<br>(Recipient)
    participant Alliance as Spam Alliance<br>Backend

    Eve->>EveVOIP: 1. Eve actually<br>Initiates call<br>caller-id = Alice
    EveVOIP->>Bob: Call Signalling
    activate Bob
    Note over Bob: Receives signalling,<br/>waits for verification.

    Bob->>Alliance: 2. Verification Request
    deactivate Bob
    activate Alliance

    Note over Alliance: Determines Alice's handset<br/>is fully DYST-enabled.
    Alliance->>Alice: 3. DYST Challenge
    activate Alice
    Note over Alice: Alice does not see<br/>"Are you placing a call<br>to <bobs phone num>"<br>message. And handset<br>silently DENIES she is
    Alice-->>Alliance: 4. Challenge Response (Denied)
    deactivate Alice
    Note over Alliance: Alice's auto<br>denial received.
    Alliance->>Bob: 5. Verification Failure
    deactivate Alliance
    activate Bob
    Note over Bob: Drops the call without<br>notifying Bob and doesn't<br>place it in recent calls
    deactivate Bob

The sequence diagram for messages would look very similar.

Smartphone supporting RCS but not yet supporting DYST (older iOS or Android version perhaps)

Legit call via RCS (no DYST)

sequenceDiagram
    participant Alice as Alice's Handset<br>(Originator)
    participant AliceTeleco as Alice's<br>Teleco
    participant Bob as Bob's Handset<br>(Recipient)
    participant Alliance as Spam Alliance<br>Backend

    Alice->>AliceTeleco: 1. Alice actually<br>Initiates call
    AliceTeleco->>Bob: Call Signalling
    activate Bob
    Note over Bob: Receives signalling,<br/>waits for verification.

    Bob->>Alliance: 2. Verification Request
    deactivate Bob
    activate Alliance

    Note over Alliance: Determines Alice's handset<br/>is RCS-enabled but not<br/>DYST-enabled.
    Alliance->>Alice: 3. DYST Challenge (RCS)
    activate Alice
    Note over Alice: Alice sees interactive<br/>"Are you placing a call<br>to <bobs phone numb>"<br>question. And she<br>manually confirms.
    Alice-->>Alliance: 4. Challenge Response (Confirmed)
    deactivate Alice
    Note over Alliance: Alice's manual<br>confirmation received.
    Alliance->>Bob: 5. Verification Success
    deactivate Alliance
    activate Bob
    Note over Bob: Now rings or displays<br>message. Call/Text<br>connection established.<br/>Phone buzzes or rings
    deactivate Bob

Legit message via RCS (no DYST)

sequenceDiagram
    participant Alice as Alice's Handset<br>(Originator)
    participant AliceTeleco as Alice's<br>Teleco
    participant Bob as Bob's Handset<br>(Recipient)
    participant Alliance as Spam Alliance<br>Backend

    Alice->>AliceTeleco: 1. Alice actually<br>sends text msg
    AliceTeleco->>Bob: Message signalling
    activate Bob
    Note over Bob: Receives signalling,<br/>waits for verification.

    Bob->>Alliance: 2. Verification Request
    deactivate Bob
    activate Alliance

    Note over Alliance: Determines Alice's handset<br/>is RCS-enabled but not<br/>DYST-enabled.
    Alliance->>Alice: 3. DYST Challenge (RCS)
    activate Alice
    Note over Alice: Alice sees interactive<br/>"Did you send this<br>to <bob phone num>"<br>question. And she<br>manually confirms.
    Alice-->>Alliance: 4. Challenge Response (Confirmed)
    deactivate Alice
    Note over Alliance: Alice's manual<br>confirmation received.
    Alliance->>Bob: 5. Verification Success
    deactivate Alliance
    activate Bob
    Note over Bob: Now rings or displays<br>message. Call/Text<br>connection established.<br/>Phone buzzes
    deactivate Bob

Fake caller ID call via RCS (no DYST)

sequenceDiagram
    participant Eve as Eve's Handset<br>(Originator)
    participant EveVOIP as Eve's VoIP<br>Gateway
    participant Alice as Alice's Handset<br>(Claimed Originator)
    participant Bob as Bob's Handset<br>(Recipient)
    participant Alliance as Spam Alliance<br>Backend

    Eve->>EveVOIP: 1. Eve actually<br>Initiates call<br>caller-id = Alice
    EveVOIP->>Bob: Call Signalling
    activate Bob
    Note over Bob: Receives signalling,<br/>waits for verification.

    Bob->>Alliance: 2. Verification Request
    deactivate Bob
    activate Alliance

    Note over Alliance: Determines Alice's handset<br/>is RCS-enabled but not<br/>DYST-enabled.
    Alliance->>Alice: 3. DYST Challenge (RCS)
    activate Alice
    Note over Alice: Alice sees interactive<br/>"Are you placing a call<br>to <bobs phone num>"<br>message. And she<br>manually DENIES.
    Alice-->>Alliance: 4. Challenge Response (Denied)
    deactivate Alice
    Note over Alliance: Alice's manual<br>denial received.
    Alliance->>Bob: 5. Verification Failure
    deactivate Alliance
    activate Bob
    Note over Bob: Drops the call without<br>notifying Bob and doesn't<br>place it in recent calls
    deactivate Bob

Phone not supporting RCS nor DYST (much older iOS or Android version perhaps)

Legit call via SMS (no RCS, no DYST)

sequenceDiagram
    participant Alice as Alice's Handset<br>(Originator)
    participant AliceTeleco as Alice's<br>Teleco
    participant Bob as Bob's Handset<br>(Recipient)
    participant Alliance as Spam Alliance<br>Backend

    Alice->>AliceTeleco: 1. Alice actually<br>Initiates call
    AliceTeleco->>Bob: Call Signalling
    activate Bob
    Note over Bob: Receives signalling,<br/>waits for verification.

    Bob->>Alliance: 2. Verification Request
    deactivate Bob
    activate Alliance

    Note over Alliance: Determines Alice's handset<br/>is not RCS or DYST enabled.<br>Will use SMS.
    Alliance->>Alice: 3. DYST Challenge (SMS)
    activate Alice
    Note over Alice: Alice sees SMS asking if<br/>she's calling<br><bobs phone num>.<br/>She replies 'YES'.
    Alice-->>Alliance: 4. Challenge Response (Confirmed)
    deactivate Alice
    Note over Alliance: Alice's SMS reply<br>of 'YES' received.
    Alliance->>Bob: 5. Verification Success
    deactivate Alliance
    activate Bob
    Note over Bob: Now rings or displays<br>message. Call/Text<br>connection established.<br/>Phone buzzes or rings
    deactivate Bob

Legit message via SMS (no RCS, no DYST)

sequenceDiagram
    participant Alice as Alice's Handset<br>(Originator)
    participant AliceTeleco as Alice's<br>Teleco
    participant Bob as Bob's Handset<br>(Recipient)
    participant Alliance as Spam Alliance<br>Backend

    Alice->>AliceTeleco: 1. Alice actually<br>sends SMS msg
    AliceTeleco->>Bob: SMS received
    activate Bob
    Note over Bob: Receives SMS,<br/>waits for verification.

    Bob->>Alliance: 2. Verification Request
    deactivate Bob
    activate Alliance

    Note over Alliance: Determines Alice's handset<br/>is not RCS or DYST enabled.<br>Will use SMS.
    Alliance->>Alice: 3. DYST Challenge (SMS)
    activate Alice
    Note over Alice: Alice sees SMS asking if<br/>she sent message to<br><bobs phone num>.<br/>She replies 'YES'.
    Alice-->>Alliance: 4. Challenge Response (Confirmed)
    deactivate Alice
    Note over Alliance: Alice's SMS reply<br>of 'YES' received.
    Alliance->>Bob: 5. Verification Success
    deactivate Alliance
    activate Bob
    Note over Bob: Now displays message.<br/>Phone buzzes
    deactivate Bob

Fake caller ID call via SMS (no RCS, no DYST)

sequenceDiagram
    participant Eve as Eve's Handset<br>(Originator)
    participant EveVOIP as Eve's VoIP<br>Gateway
    participant Alice as Alice's Handset<br>(Claimed Originator)
    participant Bob as Bob's Handset<br>(Recipient)
    participant Alliance as Spam Alliance<br>Backend

    Eve->>EveVOIP: 1. Eve actually<br>Initiates call<br>caller-id = Alice
    EveVOIP->>Bob: Call Signalling
    activate Bob
    Note over Bob: Receives signalling,<br/>waits for verification.

    Bob->>Alliance: 2. Verification Request
    deactivate Bob
    activate Alliance

    Note over Alliance: Determines Alice's handset<br/>is not RCS or DYST enabled.<br>Will use SMS.
    Alliance->>Alice: 3. DYST Challenge (SMS)
    activate Alice
    Note over Alice: Alice sees SMS asking if<br/>she is calling <bobs phone num>.<br/>She ignores it or replies 'NO'.
    Alice-->>Alliance: 4. Challenge Response (Denied)
    deactivate Alice
    Note over Alliance: Alice's SMS reply<br>of 'NO' received (or timeout).
    Alliance->>Bob: 5. Verification Failure
    deactivate Alliance
    activate Bob
    Note over Bob: Drops the call without<br>notifying Bob and doesn't<br>place it in recent calls
    deactivate Bob

Of course a text reply of “No” would be inferred for anything other than ‘Y’, y, YES, 是, 对, हाँ (haan), sí, oui, نعم, হ্যাঁ, da, sim, ہاں, ya, ja, はい, ndiyo, हो, అవును, evet, vâng / dạ
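A sketch of that "anything else means No" rule. The word list is the one above, with the transliteration "haan" also accepted; the trimming, Unicode normalization and trailing-punctuation handling are my assumptions about what a tolerant matcher would need.

```javascript
// Affirmative replies from the list above, stored NFC-normalized.
const AFFIRMATIVE = new Set([
  'y', 'yes', '是', '对', 'हाँ', 'haan', 'sí', 'oui', 'نعم', 'হ্যাঁ', 'da', 'sim',
  'ہاں', 'ya', 'ja', 'はい', 'ndiyo', 'हो', 'అవును', 'evet', 'vâng', 'dạ',
].map((word) => word.normalize('NFC')));

// True only for a reply that is exactly one of the affirmative words,
// ignoring case, surrounding whitespace and trailing punctuation.
function isAffirmativeReply(reply) {
  const cleaned = reply
    .normalize('NFC')
    .trim()
    .replace(/[.!。！]+$/u, '')
    .trim()
    .toLowerCase();
  return AFFIRMATIVE.has(cleaned);
}

console.log(isAffirmativeReply(' Yes '));
console.log(isAffirmativeReply('はい'));
console.log(isAffirmativeReply('who is this?'));
```

Matching the whole reply rather than searching inside it is deliberate: "yes please stop texting me" should not count as a confirmation.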

This system could be enhanced with network intelligence. For IP-based messages, the receiving service’s backend (e.g., Apple’s servers) could perform a traceroute back to the sender’s IP. This wouldn’t reveal the user’s precise location but would identify the network path. A path from a reputable mobile carrier like O2 in the UK would contribute to a “low-probability spammer” score, whereas a path from a data center known for fraudulent activity would raise suspicion, all without exposing the sender’s private location.

A user could configure their device to “block all unverified calls”, “warn”, or perform no checks. However, limitations of this solution would include the risk of blocking legitimate calls if the verification service has an outage, and the small delay the check adds to every communication. To counter denial-of-service attacks where bad actors flood users with fake verification messages to create fatigue and distrust, the system would rely on server-side intelligence from the central servers to detect and rate-limit anomalous floods of verification requests.
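The device-side decision can be sketched as a small function. The policy and result names are assumptions, and so is the choice to fail open with a warning when the verification service cannot be reached; a stricter deployment could choose to drop instead.

```javascript
// policy: 'block' | 'warn' | 'none'                  (assumed names)
// result: 'confirmed' | 'denied' | 'unavailable'     (outage or timeout)
function decideAlert(policy, result) {
  if (policy === 'none' || result === 'confirmed') return 'ring';
  if (result === 'denied') return policy === 'block' ? 'drop' : 'ring-with-warning';
  // Unavailable: ringing with a warning avoids blocking legitimate calls
  // during an outage, at the cost of letting some spoofed calls through.
  return 'ring-with-warning';
}

console.log(decideAlert('block', 'denied'));
console.log(decideAlert('warn', 'denied'));
console.log(decideAlert('block', 'unavailable'));
```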

A challenge remains communication with “legacy users” on older devices. While most phones would handle the fallback SMS correctly, older feature phones (“dumbphones”) in particular might display the verification message as multiple confusing parts or silently drop it due to full inboxes, undermining the system. To build trust for this fallback, a logical conclusion would be for participating companies to form a “Spam Alliance”, complete with a website listing members, and brand the message with this neutral alliance name.

2026 footnote: The DYST challenge-response pattern — “did you send this?” queried back to the originator’s domain — is the same model underpinning Live Verify (source), but for documents instead of calls. A bank statement, police ID card, or sanctions attestation is normalized, SHA-256 hashed, and looked up at the issuer’s domain: GET /v/{hash} — effectively asking the issuer “did you issue this?” The domain-as-authority principle is identical: the originating server is the source of truth, no central registry needed. Where DYST proposes a Spam Alliance coordinating telcos, Live Verify proposes authority chains — issuer endorsed by regulator endorsed by sovereign root — to establish trust without a single coordinating body.

The telcos’ resistance would likely be because their large customers, such as call centers using VoIP, would face significant hurdles. Much of their software may be old and difficult to update to support a new verification protocol. A call center vendor might even configure their system to automatically answer “yes” to all challenges, legitimate or not. This would necessitate another layer of defense, led by the Spam Alliance, to apply reputation scoring to the responders themselves. If an endpoint blindly says “yes” to everything, its attestations would eventually be deemed untrustworthy.

Finally, it is instructive to look at the history of DMARC, SPF, and DKIM for email. These technologies did not eliminate spam, but they largely solved the problem of direct domain spoofing. Similarly, a DYST-like system for voice and SMS would not be a silver bullet for all unwanted communication, but it could effectively end caller-ID spoofing of legitimate numbers, forcing bad actors onto more easily traceable and blockable channels.

Example DYST Messages

As RCS (Rich Communication Services) supports rich cards and suggested actions, the user experience for a DYST-style verification could be significantly improved over a plain SMS fallback.

Challenge (JSON payload) to claimed originator who has a DYST-enabled Smartphone

When the claimed originator’s Smartphone is DYST-enabled, the verification happens silently in the background between the originating service’s server and the handset. This interaction uses a JSON payload, and the handset automatically answers the challenge without user intervention.

The receiving side sends this JSON payload (via the Spam Alliance backend, per the diagrams above) to the DYST-enabled handset of the claimed originator:

{
  "dyst_challenge": {
    "protocol_version": "1.0",
    "challenge_id": "dyst-challenge-12345",
    "timestamp": "2025-11-29T10:30:00Z",
    "originator_id": "+14155550110",
    "recipient_id": "+12025550148",
    "communication_type": "voice_call",
    "communication_id": "call-abc-789",
    "message_digest": "sha256:abcdef12345...",
    "ttl_seconds": 30
  }
}

The claimed originator’s DYST-enabled handset responds silently:

{
  "dyst_response": {
    "challenge_id": "dyst-challenge-12345",
    "status": "confirmed"
  }
}

Or if they had not placed the voice call (someone else faked caller ID):

{
  "dyst_response": {
    "challenge_id": "dyst-challenge-12345",
    "status": "denied"
  }
}
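How a DYST-enabled handset might answer such a challenge without user involvement, as a minimal sketch. The `recentOutgoing` log and its field names are hypothetical; the challenge and response fields are the ones in the payloads above. Treating a stale challenge as "denied" is also an assumption.

```javascript
// `recentOutgoing` is a hypothetical log of calls and messages the handset
// really placed: [{ recipientId, type, startedAt }] with startedAt in ms.
function answerChallenge(challengeMessage, recentOutgoing, nowMs) {
  const c = challengeMessage.dyst_challenge;
  const issuedAt = Date.parse(c.timestamp);
  const windowMs = c.ttl_seconds * 1000;
  // The challenge must still be within its time-to-live.
  const fresh = nowMs - issuedAt <= windowMs;
  // The handset must have placed a matching communication around that time.
  const placed = recentOutgoing.some((entry) =>
    entry.recipientId === c.recipient_id &&
    entry.type === c.communication_type &&
    Math.abs(issuedAt - entry.startedAt) <= windowMs);
  return {
    dyst_response: {
      challenge_id: c.challenge_id,
      status: fresh && placed ? 'confirmed' : 'denied',
    },
  };
}

const challenge = {
  dyst_challenge: {
    protocol_version: '1.0',
    challenge_id: 'dyst-challenge-12345',
    timestamp: '2025-11-29T10:30:00Z',
    originator_id: '+14155550110',
    recipient_id: '+12025550148',
    communication_type: 'voice_call',
    ttl_seconds: 30,
  },
};
const log = [
  { recipientId: '+12025550148', type: 'voice_call', startedAt: Date.parse('2025-11-29T10:29:58Z') },
];
const now = Date.parse('2025-11-29T10:30:05Z');

console.log(JSON.stringify(answerChallenge(challenge, log, now)));
console.log(JSON.stringify(answerChallenge(challenge, [], now)));
```

With the matching log entry the status is "confirmed"; with an empty log (the spoofed caller-ID case) it is "denied".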

Challenge RCS message to claimed originator who has a Smartphone that’s not DYST enabled (a fallback)

This is what the person making the call would receive on their device if the recipient doesn’t recognize them. It’s designed to be simple and quick.

Spam Alliance Verification ✓
---------------------------------------
🛡️ Did you just try to contact `+1-202-555-0148`?

To connect your call, please confirm it was you.

[ ✅ Yes, that was me ]  [ ❌ No, not me ]
---------------------------------------
<small>Sent by the Spam Alliance. [Learn More]</small>

Challenge SMS message to claimed originator with a non-RCS phone

If the originator’s phone does not support RCS (e.g., a “dumbphone”), the system must fall back to plain SMS. This experience is the most basic and relies on the user to manually reply.

Spam Alliance: Did you just try to contact +1-202-555-0148? To connect, reply YES to this message.

Language Selection for a Global DYST System

For a global anti-spam system to be effective, it must communicate with users in their native language. The DYST system would handle this differently depending on the originator’s device capabilities.

1. First-Class (DYST-enabled Handset)

This is the simplest scenario. The silent JSON payload exchanged between the services is machine-readable and language-agnostic. If the originator’s handset needs to display any notification related to the verification, it uses its own local OS language setting to do so. No language information needs to be transmitted in the challenge itself.

2. RCS Fallback (non-DYST Smartphone)

When falling back to a user-facing RCS message, the system must send the challenge in the correct language. This would be solved by having the originator’s device report its language preference (e.g., ‘es-MX’ for Mexican Spanish) as part of its standard RCS capabilities. The Spam Alliance’s service would then deliver a pre-translated, interactive message in that specific language.

3. SMS Fallback (non-RCS phone)

This is the most challenging scenario. Like the RCS fallback, the system would attempt to determine the device’s language. However, without the rich capabilities of an RCS client, this may not be possible. As a last resort, the system would have to make an educated guess based on the phone number’s country code, which is less reliable but better than defaulting to a single language like English.
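The three tiers can be sketched as one planning function that picks the channel and the language together. The device field names and the tiny country-code table are hypothetical; a real table would need every country calling code, and the final fallback to English is only there because the sketch has to return something.

```javascript
// Tiny illustrative table of country calling code to language tag.
const COUNTRY_CODE_LANGUAGE = { '1': 'en', '33': 'fr', '44': 'en-GB', '52': 'es-MX', '81': 'ja' };

function guessLanguageFromNumber(e164) {
  const digits = e164.replace(/\D/g, '');
  // Country calling codes are 1 to 3 digits; try the longest prefix first.
  for (let len = 3; len >= 1; len--) {
    const lang = COUNTRY_CODE_LANGUAGE[digits.slice(0, len)];
    if (lang) return lang;
  }
  return 'en'; // nothing matched in this small table
}

// `device` fields (dystEnabled, rcsEnabled, rcsLanguage, number) are
// hypothetical names for what the backend knows about the handset.
function planChallenge(device) {
  // Tier 1: silent JSON exchange, no language needed.
  if (device.dystEnabled) return { channel: 'json', language: null };
  // Tier 2: RCS, using the language the device reports if there is one.
  if (device.rcsEnabled) {
    return { channel: 'rcs', language: device.rcsLanguage || guessLanguageFromNumber(device.number) };
  }
  // Tier 3: plain SMS, educated guess from the country code.
  return { channel: 'sms', language: guessLanguageFromNumber(device.number) };
}

console.log(planChallenge({ dystEnabled: true, number: '+14155550110' }));
console.log(planChallenge({ rcsEnabled: true, rcsLanguage: 'es-MX', number: '+14155550110' }));
console.log(planChallenge({ number: '+33123456789' }));
```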

Modern CV Technology: JSON Resume embedded in HTML

2025-10-12T00:00:00+00:00

Problem: uploading your resume/CV to a job portal should yield a perfectly parsed CV but often does not. This is true even if your template .docx is a claimed good starting point for later ingesting into such systems.

Building the Future of Digital Resumes: A Technical Deep Dive

In an era where Applicant Tracking Systems (ATS) and job portals increasingly dominate the recruitment landscape, the traditional PDF resume is showing its age. What if we could create a resume format that’s simultaneously machine-readable, human-friendly, and completely self-contained? Well, that ended up being a side project - read on.

The Problem with Traditional Resumes

Traditional resume formats present a fundamental challenge:

PDFs look great but are difficult for ATS systems to parse accurately, even in the nascent AI era.
Word documents are editable but inconsistent across platforms
Plain text is machine-readable but lacks visual appeal
LinkedIn would like to own this space, but there’s way too much lock-in, and they spend too much effort trying to keep your engagement in their pages versus getting you a job.

The Solution: JSON Resume Schema + Interactive HTML

This repository showcases an evolutionary approach that combines the best of two worlds - machine-parsable and appealing to human eyes. The raw storage of the CV/resume data:

Every resume uses the official JSON Resume standard:

{
  "$schema": "https://raw.githubusercontent.com/jsonresume/resume-schema/v1.0.0/schema.json",
  "basics": { "contact": "info and summary elements" },
  "work": [ { "employment": "history blah blah blah" } ],
  "education": [ { "degrees": "and certifications" } ],
  "skills": [ { "technical": "abilities yada yada"} ],
  "and": "ten more standardized sections"
}

Each resume/CV is a single HTML file containing:

Embedded JSON data above in a <script> tag for ATS extraction
Inlined CSS (~1000 lines) for complete visual control
Inlined JavaScript (~1200 lines) for interactive features
Zero external dependencies for viewing (optional CDN for PDF generation)

The same file serves two masters:

Machines: Extract structured JSON data for database import. It could have as easily been XML or YAML, but JSON parsing in web pages is built in.
Humans: Styled and responsive HTML with some interactive features

Some interactivity to control verbosity

You can click a pen to go into edit mode. Editing isn’t of text; it is contract (-) and expand (+) affordances to reduce the verbosity of sections. While I may gush about the origin story of Selenium, an in-firm recruiter (or the agent who’s taken my CV to them) may know the downstream interviewers are spectacularly uninterested in that, so they’ll hit (-) to collapse that section. This will persist if you go on to print the resume/CV, but not if you close the tab.

Collapse affordance shown:

That clicked, expand affordance shown:

This is only useful for someone wanting to customize the document for a purpose. For example, an interview stage with the candidate, or a discussion with a colleague about whether to take the candidate forward in the process or decline.

Responsive Design with Print Optimization

The CSS includes specialized media queries that ensure URLs are visible in printed versions, which is crucial for ATS systems and recruiters. In a browser they remain plain hyperlinks. No big deal perhaps.

PDF Generation Pipeline

For enhanced PDF output, the system dynamically loads:

html2canvas: Renders HTML to canvas with pixel-perfect accuracy
jsPDF: Converts canvas to professional PDF format

Ignoring this lazy load of JavaScript from CDNs, this was otherwise a zero dependency tech.

Sample CVs in the repository

Current collection includes 14 example resumes featuring fictional characters:

Technical roles: Tony Stark (Genius/Inventor), Harold Finch (Software Engineer)
Leadership positions: T’Challa (Head of State), Princess Leia (Rebel Leader)
Diverse backgrounds: Hermione Granger (Academic), Mulan (Military Officer)

Take a look at Sam “Root” Groves from the “Person of Interest” TV series: paul-hammant.github.io/better-cv-tech/Samantha_Groves_Resume.html

Implementation Highlights

Markdown Support Within JSON

The system supports basic markdown in key text fields:

**bold text** and *italic text*
[link text](https://example.com) for clickable links
Paragraph breaks with \n\n

This enhances human readability while maintaining ATS compatibility. Well, maybe.
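To show how little machinery those inline conventions need, here is a minimal sketch using sed. It is not the repository's code: the function name render_inline_md is hypothetical, only simple non-nested markup is handled, and paragraph breaks are left alone.

```shell
# Hypothetical helper: convert bold, italic and link markup to HTML.
# A sketch only; this is not a Markdown parser.
render_inline_md() {
    printf '%s\n' "$1" | sed -E \
        -e 's/\*\*([^*]+)\*\*/<strong>\1<\/strong>/g' \
        -e 's/\*([^*]+)\*/<em>\1<\/em>/g' \
        -e 's/\[([^]]+)\]\(([^)]+)\)/<a href="\2">\1<\/a>/g'
}

render_inline_md 'Built **Selenium** with *friends*, see [site](https://example.com)'
# prints: Built <strong>Selenium</strong> with <em>friends</em>, see <a href="https://example.com">site</a>
```

Bold is rewritten before italic so that the double asterisks are not mistaken for two italic markers.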

ATS Integration Strategy

For ATS systems to adopt this format, they need minimal changes:

// Extract resume data from HTML
const resumeScript = document.getElementById('cv-data-json');
const resumeData = JSON.parse(resumeScript.textContent);
// Now import structured data directly into database

Likely they’d just be snipping the JSON out of the unparsed source file though. That would go into their databases, but also maybe into a pipeline that renders JSON Resume to HTML. Either way, this is orders of magnitude more reliable than PDF or .docx text extraction or HTML scraping, even with claimed AI on their side in 2025.
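A minimal sketch of that snipping in shell, assuming the data sits in a single script element whose id is cv-data-json (the id used in the JavaScript above) and that the opening and closing tags are each on their own line. The function name extract_resume_json is hypothetical.

```shell
# Hypothetical helper: print the JSON embedded in an HTML resume.
# Assumes <script ... id="cv-data-json" ...> and </script> sit on their own lines.
extract_resume_json() {
    awk '
        /<script[^>]*id="cv-data-json"/ { inside = 1; next }  # opening tag: start capturing
        inside && /<\/script>/          { exit }              # closing tag: stop
        inside                          { print }             # JSON body
    ' "$1"
}

# Usage:
#   extract_resume_json Samantha_Groves_Resume.html > resume.json
```

No browser or DOM is involved, which is the point: the structured data can be lifted from the file as text.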

Performance Characteristics

File size: ~80KB per resume (including all assets)
Load time: Near instant (no external requests for viewing)
Browser support: Modern browsers with JavaScript turned on (ES6+ required)
Mobile responsive: Breakpoints at 768px and 480px

Security Considerations

For recruiters receiving HTML resumes:

✅ No file system access - Pure DOM manipulation
✅ No external requests - Self-contained execution
✅ Standard JavaScript - No eval() or dangerous APIs
✅ Data transparency - JSON visible in source

Likely there will be some over-cautiousness. Bigger companies could verify the JavaScript within each uploaded CV/resume if they really wanted to. A whitelist of sorts (extract > lint > pretty-print > SHA256 > check against whitelist). Or just

Getting Started

To create your own resume:

Copy a template: Use Lorem_Ipsum_Resume.html as your starting point
Replace JSON data: Update the embedded resume data with your information
Inline CSS and JavaScript: They are separate in the repo, as there are fourteen or so sample CVs/resumes.
Test thoroughly: Verify print output and mobile responsiveness
Name appropriately: Use FirstName_LastName_Resume.html format

Or get AI to take the constituent pieces and make the page for you. It did so for mine in a couple of mins. Prompt is in the repo.

Links:

Repository: paul-hammant/better-cv-tech
Live Demo: GitHub Pages Gallery - 14 resume/CVs
Schema: JSON Resume v1.0.0

Building a Secure Container Sandbox on ChromeOS for Testing Untrusted Code

2025-09-18T00:00:00+00:00

The Problem: Running Random GitHub Code Safely

As developers, we frequently encounter interesting GitHub repositories, development tools, or scripts that we want to test. However, running untrusted code directly on our development machines poses significant security risks:

Supply chain attacks: Malicious code that modifies system binaries or installs backdoors (like the 2025 Chalk npm package compromise that affected millions of downloads)
Data theft: Scripts that scan for SSH keys (passphrase protected or not), API tokens, or sensitive files
System compromise: Privilege escalation attacks that gain persistent access
Resource abuse: Cryptocurrency miners or botnet participation

Traditional solutions like virtual machines are heavyweight and slow to reset. Docker provides isolation but shares the kernel and can be escaped. ChromeOS’s unique architecture offers a compelling alternative through its layered container system.

ChromeOS Container Architecture: Defense in Depth

ChromeOS provides multiple layers of isolation that make it ideal for secure sandboxing:

┌──────────────────────────────────────────────┐
│              ChromeOS (Host)                 │
│  ┌─────────────────────────────────────────┐ │
│  │           Termina VM                    │ │
│  │  ┌─────────────┐  ┌─────────────────────┤ │
│  │  │   Penguin   │  │        OSS          │ │
│  │  │  (Trusted)  │  │   (Untrusted)       │ │
│  │  │             │  │                     │ │
│  │  │ - Your work │  │ - Random GitHub     │ │
│  │  │ - SSH keys  │  │   repositories      │ │
│  │  │ - Configs   │  │ - Untested tools    │ │
│  │  │             │  │ - No sensitive      │ │
│  │  │             │  │   data              │ │
│  │  └─────────────┘  └─────────────────────┤ │
│  └─────────────────────────────────────────┘ │
└──────────────────────────────────────────────┘

Or a second ASCII-art way of outlining the same situation, with more detail:

ChromeOS (host)
 └─ crosvm (VM with KVM acceleration if hardware supports it)
     └─ Termina (tiny VM OS, runs LXD daemon and socket)
         │    ├─ uses LXD API to query kernel level cgroups/namespaces
         │    └─ avoids trusting oss's /bin/ps /bin/find etc
         │         
         ├─ oss       (Debian container, untrusted)
         │    └─ [supply-chain attacker could replace ps/find/ls]
         │
         └─ penguin   (main Debian container, trusted code)

Complete Setup Process

Prerequisites

This setup must be run in ChromeOS’s Termina VM, not inside a container. Press ctrl-alt-t; the resulting terminal should say “Welcome to crosh, the ChromeOS developer shell.” and show a prompt like crosh>. From there, run vsh termina to get a shell inside Termina.

Quick Start Script

There’s no vi, emacs or nano in crosh or Termina. You’ll prepare scripts elsewhere and email them to yourself. You’ll use ChromeOS’ regular mail app to read those (you logged in with your Google account after all), then paste them into the terminal using the heredoc form below. The ChromeOS text editor is a bit weak for my liking, and I like to keep a copy of things that may get refined and are not canonically in source control.

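# The heredoc writes the file without needing an editor; run it as a second step
cat > /tmp/filename_as_instructed.sh << 'SETUP_SCRIPT_EOF'
The bash code below 
SETUP_SCRIPT_EOF
bash /tmp/filename_as_instructed.sh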

Save this as /tmp/setup-secure-containers.sh and run it with bash /tmp/setup-secure-containers.sh as you can’t make it executable:

#!/bin/bash
# ChromeOS Secure Container Setup
# This creates two containers:
# - penguin (default/trusted) - your main work environment  
# - oss (untrusted) - for running cloned repos and untrusted code

# IMPORTANT: This script must be run in Termina (the ChromeOS VM)
# because lxc commands don't work from inside containers

set -e

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

echo "================================================="
echo "ChromeOS Secure Container Setup"
echo "================================================="
echo ""
echo "This script will create a two-container security setup:"
echo "- penguin: Your trusted container (already exists)"
echo "- oss: Untrusted container for running random code"
echo ""

# Verify we're in Termina
if ! command -v lxc &> /dev/null; then
    echo "ERROR: lxc command not found. This script must be run in Termina"
    echo "Press Ctrl+Alt+T for crosh, run 'vsh termina', and run this script there."
    exit 1
fi

echo "Environment check passed - lxc command found"

# Step 1: Show current containers
echo ""
echo "Step 1: Current container status"
echo "================================"
lxc list

# Step 2: Get available images
echo ""
echo "Step 2: Finding Debian image..."
echo "==============================="
IMAGE_FINGERPRINT=$(lxc image list --format csv | grep -i debian | cut -d',' -f2 | head -1)

if [ -z "$IMAGE_FINGERPRINT" ]; then
    echo "No Debian image found. Available images:"
    lxc image list
    exit 1
fi

# Step 3: Create the oss container
echo ""
echo "Step 3: Creating oss container..."
echo "=================================="

# Check if oss container already exists
if lxc info oss &>/dev/null; then
    echo "Container 'oss' already exists, skipping..."
else
    echo "Creating 'oss' container..."
    lxc launch $IMAGE_FINGERPRINT oss
    echo "Waiting for container to start..."
    sleep 5
fi

# Show updated container list
echo ""
lxc list

echo ""
echo "Step 4: Setting resource limits for oss container..."
echo "===================================================="

# Set resource limits and security restrictions
lxc config set oss limits.cpu 2
lxc config set oss limits.memory 2GB
lxc config set oss security.nesting false
lxc config set oss security.privileged false

echo "Resource limits and security restrictions applied to oss container"

echo ""
echo "Step 5: Setting up oss container for untrusted code..."
echo "====================================================="

# Install packages in oss container
lxc exec oss -- apt-get update
lxc exec oss -- apt-get install -y git build-essential python3 python3-pip curl wget nodejs npm
lxc exec oss -- mkdir -p /workspace
lxc exec oss -- bash -c "echo 'Untrusted code workspace - Only run random GitHub repos here!' > /workspace/README.txt"

echo "OSS container setup complete"

# Step 6: Set up baseline in trusted container (penguin)
echo ""
echo "Step 6: Creating baseline in penguin (trusted container)..."
echo "=========================================================="

# Ensure penguin has essential tools first, so the baseline includes them
lxc exec penguin -- apt-get update
lxc exec penguin -- apt-get install -y vim git

# Create baseline hashes for binary integrity monitoring
lxc exec penguin -- bash -c "find /bin /usr/bin -type f -executable -exec sha256sum {} \; > /root/baseline_hashes.txt"
lxc exec penguin -- bash -c "wc -l < /root/baseline_hashes.txt | xargs -I {} echo 'Baseline hash count: {}'"
lxc exec penguin -- bash -c "echo '# Baseline created at $(date)' >> /root/baseline_hashes.txt"

echo "Baseline created in penguin container"

# Step 7: Create monitoring script in Termina
echo ""
echo "Step 7: Creating Termina-based monitoring script..."
echo "================================================"

cat > /tmp/monitor-containers.sh << 'MONITOR_EOF'
#!/bin/bash
# Container Security Monitor - Runs in Termina
# Monitors the oss (untrusted) and penguin (trusted) containers for supply chain attacks

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

function setup_baseline() {
    local baseline_file="/tmp/.container_baseline"
    
    if [ ! -f "$baseline_file" ]; then
        echo "Creating baseline for binary integrity monitoring..."
        echo "# Container Security Baseline - Created $(date)" > "$baseline_file"
        
        # Create baseline for both containers
        for container in oss penguin; do
            if lxc info "$container" &>/dev/null; then
                echo "# $container container baseline" >> "$baseline_file"
                for binary in /bin/bash /bin/sh /usr/bin/python3 /usr/bin/node /usr/bin/git; do
                    hash=$(lxc exec $container -- sha256sum $binary 2>/dev/null | awk '{print $1}')
                    if [ -n "$hash" ]; then
                        echo "$container:$binary:$hash" >> "$baseline_file"
                    fi
                done
            fi
        done
        echo "Baseline created at $baseline_file"
    fi
}

function check_binary_integrity() {
    local container=$1
    local found_issues=0
   
    echo -e "${YELLOW}Binary Integrity Check - $container${NC}"
    
    # Check key binaries against baseline
    while IFS=: read -r base_container base_binary base_hash; do
        if [[ "$base_container" == "$container" ]] && [[ "$base_binary" =~ ^/.*$ ]]; then
            current_hash=$(lxc exec $container -- sha256sum $base_binary 2>/dev/null | awk '{print $1}')
            if [ -n "$current_hash" ] && [ "$current_hash" != "$base_hash" ]; then
                echo -e "  ${RED}!!!  Modified: $base_binary${NC}"
                echo "    Expected: $base_hash"
                echo "    Current:  $current_hash"
                found_issues=1
            fi
        fi
    done < "/tmp/.container_baseline" 2>/dev/null
    
    if [ $found_issues -eq 0 ]; then
        echo "  All monitored binaries match baseline"
    fi
    
    return $found_issues
}

function check_processes() {
    local container=$1
    echo -e "${YELLOW}Process Check - $container${NC}"
    echo "  Top processes:"
    
    # Show top CPU-consuming processes
    lxc exec $container -- ps aux --sort=-%cpu 2>/dev/null | head -6 | while IFS= read -r line; do
        echo "    $line"
    done
}

function check_network() {
    local container=$1
    echo -e "${YELLOW}Network Connections - $container${NC}"
   
    # Count listening ports and established connections
    listening=$(lxc exec $container -- ss -tlnp 2>/dev/null | grep LISTEN | wc -l)
    connections=$(lxc exec $container -- ss -tnp 2>/dev/null | grep ESTAB | wc -l)
    
    echo "  Listening ports: $listening"
    echo "  Active connections: $connections"
    
    # Alert on any external connections from untrusted container
    if [ "$container" = "oss" ] && [ "$connections" -gt 0 ]; then
        echo -e "  ${RED}!!!  External connections detected in untrusted container:${NC}"
        lxc exec $container -- ss -tnp 2>/dev/null | grep ESTAB | while IFS= read -r line; do
            echo "    $line"
        done
    fi
}

function check_recent_files() {
    local container=$1
    echo -e "${YELLOW}Recently Modified Files - $container${NC}"
    echo "  Files modified in last 10 minutes:"
   
    # Focus on system directories for supply chain attacks
    lxc exec $container -- find /bin /usr/bin /lib -xdev -type f -mmin -10 2>/dev/null | while IFS= read -r line; do
        echo "    $line"
    done
    
    # Also check workspace for oss
    if [ "$container" = "oss" ]; then
        echo "  Workspace files:"
        lxc exec $container -- find /workspace -xdev -type f -mmin -10 2>/dev/null | head -5 | while IFS= read -r line; do
            echo "    $line"
        done
    fi
}

function check_suspicious_files() {
    local container=$1
    echo -e "${YELLOW}Suspicious Files Check - $container${NC}"
    
    # Look for hidden files in tmp directories
    suspicious_count=$(lxc exec $container -- find /tmp /var/tmp /dev/shm -name ".*" -type f 2>/dev/null | wc -l)
    
    if [ "$suspicious_count" -gt 0 ]; then
        echo -e "  ${RED}!!!  Found $suspicious_count hidden files in temp directories${NC}"
    else
        echo "  No suspicious hidden files found"
    fi
}

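function check_suid_changes() {
    local container=$1
    local known="/tmp/.known_suid_$container"
    echo -e "${YELLOW}SUID Check - $container${NC}"

    # Report only SUID binaries that are absent from the baseline list
    lxc exec $container -- find / -xdev -perm -4000 -type f 2>/dev/null | while IFS= read -r suid_file; do
        if ! grep -qxF "$suid_file" "$known" 2>/dev/null; then
            echo -e "  ${RED}!!!  New SUID: $suid_file${NC}"
        fi
    done
}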

function setup_suid_baseline() {
    for container in oss penguin; do
        if lxc info "$container" &>/dev/null; then
            if [ ! -f "/tmp/.known_suid_$container" ]; then
                lxc exec $container -- find / -xdev -perm -4000 -type f 2>/dev/null > "/tmp/.known_suid_$container"
            fi
        fi
    done
}

# Main monitoring loop
echo "================================================="
echo "Container Security Monitor"
echo "================================================="
echo "Monitoring for supply chain attacks in:"
echo "  - oss (untrusted) - where you run random code"
echo "  - penguin (trusted) - your main environment"
echo ""

# Setup baselines on first run
setup_baseline
setup_suid_baseline

while true; do
    clear
    echo "Container Security Monitor - $(date)"
    echo "====================================="
    echo ""
   
    # Monitor the untrusted container more closely
    echo -e "${RED}=== OSS Container (UNTRUSTED) ===${NC}"
    if ! check_binary_integrity "oss"; then
        echo -e "\n${RED}!!!  ALERT: Binary modification detected in OSS container!${NC}"
        echo -e "${RED}This could indicate a supply chain attack from recently run code${NC}\n"
    fi
    echo ""
    check_processes "oss"
    echo ""
    check_network "oss"
    echo ""
    check_recent_files "oss"
    echo ""
    check_suspicious_files "oss"
   
    echo ""
    echo -e "${GREEN}=== Penguin Container (TRUSTED) ===${NC}"
    if ! check_binary_integrity "penguin"; then
        echo -e "\n${RED}!!!  CRITICAL: Binary modification in TRUSTED container!${NC}\n"
    fi
    echo ""
    check_network "penguin"
   
    echo ""
    echo "Next scan in 30 seconds... (Press Ctrl+C to exit)"
    sleep 30
done
MONITOR_EOF

chmod +x /tmp/monitor-containers.sh
echo "Monitoring script created at /tmp/monitor-containers.sh"

# Step 8: Create helper scripts
echo ""
echo "Step 8: Creating helper scripts..."
echo "=================================="

# Create a script to enter each container
cat > /tmp/enter-oss.sh << 'ENTER_OSS_EOF'
#!/bin/bash
echo "========================================="
echo "Entering OSS (UNTRUSTED) Container"
echo "========================================="
echo "!!!  SECURITY WARNING:"
echo "- Only run untrusted code here"
echo "- No sensitive files or keys"
echo "- Monitor with: bash /tmp/monitor-containers.sh"
echo "========================================="
lxc exec oss -- bash
ENTER_OSS_EOF

chmod +x /tmp/enter-oss.sh

cat > /tmp/enter-penguin.sh << 'ENTER_PENGUIN_EOF'
#!/bin/bash
echo "========================================="
echo "Entering Penguin (TRUSTED) Container"
echo "========================================="
echo "This is your trusted development environment"
echo "========================================="
lxc exec penguin -- bash
ENTER_PENGUIN_EOF

chmod +x /tmp/enter-penguin.sh

# Create status script
cat > /tmp/status.sh << 'STATUS_EOF'
#!/bin/bash
echo "Container Security Setup Status"
echo "==============================="
lxc list
echo ""
echo "OSS Container Security Settings:"
echo "  CPU limit: $(lxc config get oss limits.cpu)"
echo "  Memory limit: $(lxc config get oss limits.memory)"
echo "  Privileged: $(lxc config get oss security.privileged)"
echo "  Nesting: $(lxc config get oss security.nesting)"
echo ""
echo "Quick security check:"
for container in oss penguin; do
    echo -n "  $container binary integrity: "
    ps_hash=$(lxc exec $container -- sha256sum /bin/ps 2>/dev/null | awk '{print substr($1,1,8)}')
    echo "$ps_hash"
done
STATUS_EOF

chmod +x /tmp/status.sh

# Create a reset script for the oss container
cat > /tmp/reset-oss.sh << 'RESET_EOF'
#!/bin/bash
echo "This will destroy and recreate the OSS container"
echo "Any data in the OSS container will be lost!"
read -p "Are you sure? (y/N): " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
    echo "Resetting OSS container..."
    lxc stop oss
    lxc delete oss
    
    # Recreate with same settings
    IMAGE_FP=$(lxc image list --format csv | grep -i debian | cut -d',' -f2 | head -1)
    lxc launch $IMAGE_FP oss
    
    # Reapply security settings
    lxc config set oss limits.cpu 2
    lxc config set oss limits.memory 2GB
    lxc config set oss security.nesting false
    lxc config set oss security.privileged false
    
    # Reinstall packages
    lxc exec oss -- apt-get update
    lxc exec oss -- apt-get install -y git build-essential python3 python3-pip curl wget nodejs npm
    lxc exec oss -- mkdir -p /workspace
    
    echo "OSS container reset complete"
    
    # Update baselines (hash and SUID lists are rebuilt on the next monitor run)
    rm -f /tmp/.container_baseline /tmp/.known_suid_oss
    echo "Run: bash /tmp/monitor-containers.sh to recreate baseline"
fi
RESET_EOF

chmod +x /tmp/reset-oss.sh

# Step 9: Optional SSH Setup for OSS Container
echo ""
echo "Step 9: Setting up SSH access to OSS (optional)..."
echo "=================================================="

# Install and configure SSH
lxc exec oss -- apt-get install -y openssh-server
lxc exec oss -- rm -f /etc/ssh/sshd_not_to_be_run
lxc exec oss -- ssh-keygen -A
lxc exec oss -- systemctl restart ssh
lxc exec oss -- systemctl enable ssh

# Create non-root user
lxc exec oss -- useradd -m -s /bin/bash dev 2>/dev/null || true
lxc exec oss -- bash -c "echo 'dev:changeme' | chpasswd"
lxc exec oss -- usermod -aG sudo dev

# Get IP address
OSS_IP=$(lxc list oss -f csv -c 4 | cut -d' ' -f1)
echo ""
echo "SSH access configured:"
echo "  ssh dev@$OSS_IP"
echo "  Default password: changeme (change immediately!)"
echo ""
echo "!!!  WARNING: Never use SSH agent forwarding (-A) with untrusted containers!"

# Final summary
echo ""
echo "Setup Complete!"
echo "==============="
lxc list
echo ""
echo "Available commands:"
echo "  bash /tmp/monitor-containers.sh - Start security monitoring"
echo "  bash /tmp/enter-oss.sh         - Enter untrusted container"
echo "  bash /tmp/enter-penguin.sh     - Enter trusted container"
echo "  bash /tmp/status.sh            - Check security status"
echo "  bash /tmp/reset-oss.sh         - Reset OSS container (if compromised)"
echo ""
echo "Usage pattern:"
echo "1. Run random code ONLY in oss container (bash /tmp/enter-oss.sh)"
echo "2. Keep monitoring running in another terminal (bash /tmp/monitor-containers.sh)"
echo "3. If monitor detects modified binaries, consider using reset script"
echo ""
echo "The monitor will detect:"
echo "- Modified system binaries (supply chain attacks)"
echo "- Suspicious processes and network connections"
echo "- Recently modified files in system directories"
echo "- Hidden files in temporary directories"
echo ""
echo "🔒 Your trusted work remains safe in the penguin container!"

Manual Setup Steps

If you prefer to understand each step, here’s the manual process:

1. Create the Untrusted Container

# Get available image
IMAGE_FP=$(lxc image list --format csv | grep -i debian | cut -d',' -f2 | head -1)

# Create OSS container
lxc launch $IMAGE_FP oss

# Set security restrictions
lxc config set oss limits.cpu 2
lxc config set oss limits.memory 2GB
lxc config set oss security.nesting false
lxc config set oss security.privileged false

2. Install Development Tools

# Update and install common development tools
lxc exec oss -- apt-get update
lxc exec oss -- apt-get install -y git build-essential python3 python3-pip curl wget nodejs npm

# Create workspace directory
lxc exec oss -- mkdir -p /workspace

3. Create Monitoring Script

The monitoring script runs from Termina and watches both containers for signs of compromise:

# Create the monitoring script
cat > /tmp/monitor-containers.sh << 'EOF'
#!/bin/bash
# [Full monitoring script from above]
EOF

chmod +x /tmp/monitor-containers.sh

Troubleshooting Common Issues

Termina Filesystem Issues

Issue: Scripts fail with a “Read-only file system” error, because Termina’s home directory (~/) is read-only.

Solution: Use /tmp for all scripts, as it is writable.

Issue: No editors.

Solution: Use the pipe-to-file trick mentioned above.

# Correct - use /tmp
cat > /tmp/script.sh << 'EOF'
...
EOF
bash /tmp/script.sh

SSH Setup

SSH service needs manual setup in containers:

You get into the container using lxc exec container_name -- bash from Termina (reached via crosh and vsh termina):

# In the target container (run via lxc exec)
apt-get install -y openssh-server
rm -f /etc/ssh/sshd_not_to_be_run  # Remove startup blocker
ssh-keygen -A                       # Generate host keys
systemctl restart ssh
systemctl enable ssh
echo 'root:changeme' | chpasswd    # Set password

LXC Command Not Found

Do NOT run lxc from inside penguin or any other container. You need to be in Termina: ctrl-alt-t for crosh, then vsh termina.

Secure Git Access Patterns

You don’t want your SSH private key on a container that may be taken over by ‘chalk’ style attacks. At the very least you don’t want it there without a lengthy passphrase. There are traditional solutions, though:

The SSH Agent Forwarding Security Risk

Critical Warning: SSH agent forwarding allows untrusted code to use your keys!

While your SSH session with -A flag is active:

Malicious code can push to ANY repo you have write access to
Can clone ANY private repo you have access to
Cannot steal your key, but can USE it
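One cheap guard is sketched below, assuming you source it from the dev user's shell startup file in the oss container. SSH_AUTH_SOCK is the standard variable OpenSSH sets whenever an agent socket (forwarded or local) is reachable; the function name warn_if_agent_visible is made up for this example.

```shell
# Hypothetical guard for the untrusted container's shell startup file.
# SSH_AUTH_SOCK is set by OpenSSH whenever an agent socket is reachable.
warn_if_agent_visible() {
    if [ -n "${SSH_AUTH_SOCK:-}" ]; then
        echo "WARNING: an SSH agent is reachable here ($SSH_AUTH_SOCK)."
        echo "Anything running in this container can use your keys. Exit and reconnect without -A."
        return 1
    fi
    return 0
}

warn_if_agent_visible || true
```

It cannot stop malicious code from using the agent; it only makes an accidental -A session hard to miss.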

Safer Alternatives for Git Access

Option 1: Read-Only Deploy Keys (RECOMMENDED)

# Create separate key for untrusted work (run in penguin)
ssh-keygen -t ed25519 -f ~/.ssh/oss_readonly_key -N ""

# Add to GitHub as deploy key (READ-ONLY) for specific repos
# Copy ONLY this key to oss container (run in Termina; USER is your penguin username)
lxc exec penguin -- cat /home/USER/.ssh/oss_readonly_key | \
  lxc exec oss -- bash -c "mkdir -p /home/dev/.ssh && cat > /home/dev/.ssh/id_ed25519 && chmod 700 /home/dev/.ssh && chmod 600 /home/dev/.ssh/id_ed25519 && chown -R dev:dev /home/dev/.ssh"

Option 2: Fine-Grained Personal Access Tokens

# Create token with minimal permissions (read-only access to the specific repos you need)
# Use in oss container:
git clone https://TOKEN@github.com/user/repo.git

Option 3: Time-Limited Agent Forwarding (USE SPARINGLY)

# Only when absolutely necessary for push operations
ssh -A dev@[oss-ip]
# Do your git operation
# EXIT IMMEDIATELY
exit

# Or use confirmation-required keys
ssh-add -c ~/.ssh/id_ed25519  # Requires confirmation for each use

Security Best Practices

What to NEVER Do

Don’t store sensitive data in the OSS container:

# NEVER DO THIS
cp ~/.ssh/id_rsa /path/to/oss/container

Don’t run trusted code in OSS container:

# NEVER DO THIS
lxc exec oss -- git clone git@github.com:yourcompany/private-repo.git

Don’t disable monitoring:

# NEVER DO THIS - Always keep monitoring running
pkill -f monitor-containers

What to ALWAYS Do

Reset compromised containers:

# If monitor detects issues
bash /tmp/reset-oss.sh

Don’t attempt to repair them. Heck, maybe reset with some regularity anyway.

Regularly update baselines:

# After legitimate updates
rm /tmp/.container_baseline
bash /tmp/monitor-containers.sh  # Recreates baseline

Container Isolation Rules

Container	Purpose	SSH Keys	Git Configs	Sensitive Data
penguin	Trusted development	Safe	Safe	Safe
oss	Untrusted testing	Never	Never	Never
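The “Never” column for SSH keys can be spot-checked. A minimal sketch, assuming you are happy to treat any file containing a PEM or OpenSSH private-key header line as a key; find_private_keys is a hypothetical helper meant to be run inside the container (for example via lxc exec).

```shell
# Hypothetical helper: list files under a directory that look like private keys.
# Matches header lines such as "-----BEGIN OPENSSH PRIVATE KEY-----".
find_private_keys() {
    grep -rlE -- '^-----BEGIN ([A-Z0-9]+ )*PRIVATE KEY-----' "$1" 2>/dev/null
}

# Usage inside the oss container:
#   find_private_keys /home
```

A read-only deploy key placed there on purpose will show up too, so any hit is a prompt to look, not proof of a leak.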

Git Security Matrix

Operation	Penguin (Trusted)	OSS (Untrusted)	Method
Clone public repos	y	y	HTTPS
Clone private repos	y	!	Deploy keys only
Push to repos	y	N	Never from OSS
Store SSH keys	y	N	Never
Store PATs	y	!	Limited scope only
Agent forwarding	N/A	!	Brief sessions only

Container Access Patterns

Daily workflow

Terminal App -> penguin # Normal development
Ctrl+Alt+T -> vsh termina -> bash /tmp/enter-oss.sh # Testing
Or ssh from penguin to oss

Security monitoring

Ctrl+Alt+T -> vsh termina -> bash /tmp/monitor-containers.sh

Never create shortcuts that bypass security
Don’t alias direct access to OSS in penguin
Don’t auto-start monitoring (review alerts manually)

Lessons from Production Use

Key Insight: The separation between Termina (VM host) and containers is crucial. Many security solutions try to work entirely within containers, but the real power comes from leveraging the host-level view.

What we learned:

Container isolation is only as good as your monitoring for breaches
Host-level monitoring provides better security visibility
ChromeOS’s architecture is designed for this type of security model

Understanding the Monitoring System

The monitoring system watches for several types of compromise:

1. Binary Integrity Monitoring

I grant you this is underdeveloped at the point of this blog entry.

# Creates baseline hashes of critical binaries
for binary in /bin/bash /bin/sh /usr/bin/python3 /usr/bin/node /usr/bin/git; do
    hash=$(lxc exec $container -- sha256sum $binary 2>/dev/null | awk '{print $1}')
    echo "$container:$binary:$hash" >> /tmp/.container_baseline
done

What it detects: Supply chain attacks that modify system binaries

Real-world example: The 2025 Chalk npm package compromise replaced legitimate packages with malicious versions. An attack of that style could go on to modify Node.js binaries or install backdoors, and this monitoring would flag changes to the watched binaries on the next scan.

Example alert:

!!!  Modified: /usr/bin/python3
Expected: a1b2c3d4e5f6...
Current:  x9y8z7w6v5u4...
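The comparison itself is simple enough to sketch without lxc at all. The version below assumes two files in the same container:binary:hash format as the baseline, one saved earlier and one just collected; compare_baselines is a hypothetical name. Unlike the loop in the monitor script, it also reports a binary that has disappeared.

```shell
# Hypothetical helper: compare two "container:binary:hash" files.
# $1 = saved baseline, $2 = freshly collected hashes. Returns 1 if anything differs.
compare_baselines() {
    awk -F: -v basefile="$1" '
        BEGIN {
            # Load the saved baseline, skipping comment lines
            while ((getline line < basefile) > 0) {
                n = split(line, f, ":")
                if (n >= 3 && f[2] ~ /^\//) base[f[1] ":" f[2]] = f[3]
            }
            close(basefile)
        }
        $2 ~ /^\// {
            key = $1 ":" $2
            seen[key] = 1
            if ((key in base) && base[key] != $3) { print "MODIFIED " key; bad = 1 }
        }
        END {
            for (key in base) if (!(key in seen)) { print "MISSING " key; bad = 1 }
            exit bad + 0
        }
    ' "$2"
}

# Usage:
#   compare_baselines /tmp/.container_baseline /tmp/current_hashes
```

Keeping the comparison separate from the collection means it can be exercised with hand-made files before trusting it with real alerts.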

2. Process Monitoring

# Shows CPU-intensive processes
lxc exec $container -- ps aux --sort=-%cpu | head -6

What it detects: Cryptocurrency miners, botnet activity, unexpected daemons
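head -6 shows the top of the list whether or not anything is actually busy. A small sketch of a threshold filter over ps aux output follows; the default cutoff of 50% and the name hot_processes are assumptions for illustration, and the column positions assume the procps ps aux layout (CPU in column 3, command from column 11).

```shell
# Hypothetical filter: read `ps aux` output on stdin and print processes
# whose %CPU (column 3) is at or above a threshold (default 50).
hot_processes() {
    local threshold="${1:-50}"
    awk -v t="$threshold" '
        NR == 1 { next }                       # skip the header row
        $3 + 0 >= t + 0 {
            cmd = $11
            for (i = 12; i <= NF; i++) cmd = cmd " " $i
            printf "%s%% pid=%s %s\n", $3, $2, cmd
        }
    '
}

# Usage from Termina:
#   lxc exec oss -- ps aux | hot_processes 80
```

The ps binary still runs inside the container, so this inherits the trust caveat from the architecture diagram above.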

3. Network Connection Analysis

# Counts active connections
connections=$(lxc exec $container -- ss -tnp | grep ESTAB | wc -l)

What it detects: Data exfiltration, command & control communication, unexpected servers
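A count alone does not say where the traffic is going. The sketch below reduces ss -tn output to the distinct remote endpoints, assuming the usual iproute2 column order (state first, peer address in column 5) and treating 127.x and [::1] as uninteresting; remote_peers is a hypothetical name.

```shell
# Hypothetical filter: read `ss -tn` output on stdin and print each distinct
# non-loopback peer (address:port) of an established connection.
remote_peers() {
    awk '
        $1 == "ESTAB" && $5 !~ /^(127\.|\[::1\])/ { print $5 }
    ' | sort -u
}

# Usage from Termina:
#   lxc exec oss -- ss -tn | remote_peers
```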

4. File System Changes

# Finds recently modified system files
lxc exec $container -- find /bin /usr/bin /lib -xdev -type f -mmin -10

What it detects: System file tampering, backdoor installation

5. Hidden File Detection

# Searches for hidden files in temp directories
lxc exec $container -- find /tmp /var/tmp /dev/shm -name ".*" -type f

What it detects: Malware staging areas, credential harvesting tools

Advanced Usage Patterns

Testing Suspicious GitHub Repositories

# 1. Start monitoring (separate terminal)
bash /tmp/monitor-containers.sh

# 2. Enter untrusted container
bash /tmp/enter-oss.sh

# 3. In OSS container, test the repo
cd /workspace
git clone https://github.com/suspicious/repo.git
cd repo
./setup.sh  # This runs in isolation

# 4. Monitor terminal will alert on any suspicious changes
# 5. If compromised, reset the container
exit  # Leave OSS container
bash /tmp/reset-oss.sh

Development Tool Testing

# Test a new development tool safely
bash /tmp/enter-oss.sh

# In OSS container
curl -fsSL https://some-tool.com/install.sh | bash
some-new-tool --help

# Monitor for:
# - Modified binaries
# - Network connections
# - Hidden files
# - Unexpected processes

Multi-Container Workflow

# Terminal 1: Monitoring
bash /tmp/monitor-containers.sh

# Terminal 2: Trusted work
bash /tmp/enter-penguin.sh
# Do your normal development here

# Terminal 3: Untrusted testing
bash /tmp/enter-oss.sh
# Test random GitHub projects here

# Terminal 4: Status checking
bash /tmp/status.sh

Performance Considerations

Resource Limits

The OSS container is intentionally limited to prevent resource abuse:

# Current limits (modify as needed)
lxc config set oss limits.cpu 2        # 2 CPU cores max
lxc config set oss limits.memory 2GB   # 2GB RAM max

Monitoring Overhead

Monitoring script uses minimal resources
Scans every 30 seconds (configurable)
Focuses on security-critical changes only
Can run continuously without impact
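The interval is hard-coded as sleep 30 in the script above. One way to make “configurable” literal is sketched here, on the assumption that an environment variable named SCAN_INTERVAL is acceptable; neither it nor scan_interval exists in the script as written.

```shell
# Hypothetical helper: return the scan interval in seconds.
# Uses $SCAN_INTERVAL when it is a positive integer, otherwise falls back to 30.
scan_interval() {
    case "${SCAN_INTERVAL:-}" in
        ''|*[!0-9]*|0*) echo 30 ;;               # unset, non-numeric, or leading zero
        *)              echo "$SCAN_INTERVAL" ;;
    esac
}

# In the monitoring loop, `sleep 30` would become:
#   sleep "$(scan_interval)"
```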

Container Reset Speed

# Full OSS container reset takes ~2-3 minutes
time bash /tmp/reset-oss.sh
# Includes: stop, delete, recreate, configure, install packages

Integration with Development Workflow

IDE Integration

You can configure your IDE to work with the container setup:

# VS Code with Remote-Containers
# Point to penguin container for trusted development
code --folder-uri vscode-remote://attached-container+penguin/path/to/project

# For untrusted code testing, always use terminal access
bash /tmp/enter-oss.sh

Git Configuration

# In penguin (trusted) - normal git config
git config --global user.name "Your Name"
git config --global user.email "your.email@domain.com"

# In OSS (untrusted) - minimal or fake config only
git config --global user.name "Test User"
git config --global user.email "test@example.com"
# Never configure real credentials

# Share files from trusted to untrusted (one-way only)
# Copy from penguin to OSS for testing (run in Termina, staging the file in /tmp)
lxc file pull penguin/path/in/penguin/container/file.txt /tmp/file.txt
lxc file push /tmp/file.txt oss/workspace/

# NEVER copy from OSS to penguin without verification
# Instead, manually recreate verified files in penguin

Extending the Security Model

Custom Monitoring Rules

Add your own detection rules to the monitoring script:

# Example: Monitor for specific file types
function check_crypto_miners() {
    local container="$1"
    local miners
    # Count files in /tmp whose names hint at mining tools
    miners=$(lxc exec "$container" -- find /tmp \( -name "*mine*" -o -name "*crypto*" \) 2>/dev/null | wc -l)
    if [ "$miners" -gt 0 ]; then
        echo -e "${RED}!!!  Potential cryptocurrency miner detected in $container${NC}"
    fi
}

Log Integration

# Enhanced logging in monitoring script
LOG_FILE="$HOME/container-security.log"

function log_alert() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') ALERT: $1" | tee -a "$LOG_FILE"
}

Conclusion

ChromeOS’s layered container architecture provides an excellent foundation for safely testing untrusted code. This setup gives you:

True isolation: Each container is properly sandboxed
Real-time monitoring: Immediate detection of compromise attempts
Easy recovery: Quick container reset when needed
Maintained productivity: Your trusted environment stays clean

The key insight is using ChromeOS’s existing security model rather than fighting it. By running the monitoring from Termina and isolating untrusted code in its own container, you get enterprise-grade security with developer-friendly workflows.

All that said, I’m unlikely to use it without the Terminal integration (see below). I’d like to open Terminal, then click oss on a line below penguin. Instead, I’ll probably use a container in podman inside penguin - that seems hardened as a solution and workable today.

The Terminal App Limitation

The ChromeOS Terminal app’s inability to add custom container entries is a significant UX gap. The fact that only penguin appears as a clickable row means:

No visual distinction between trusted/untrusted environments
Can’t theme the OSS terminal differently (red background would be perfect!)
Extra friction to access the untrusted container
No way to know at a glance which container you’re in

The SSH workaround (penguin -> ssh dev@oss-ip) adds complexity to what should be a single click.

Security Implications

These UX limitations create real security risks:

Command confusion: Without visual distinction, you might accidentally run trusted commands (like git push with your real credentials) in the untrusted container
Increased attack surface: The SSH workaround opens network ports and adds authentication complexity that could be exploited
Remote access temptation: Lack of remote access means you might be tempted to run untrusted code directly on your primary machine when traveling, defeating the entire security model

I would really like to be able to make a second container from within the Terminal app. The UX hints that it should be possible, but the feature is not there. A huge shame given how incredibly strong the dev experience is on Chromebooks (that have enough RAM and SSD).

The Chrome Remote Desktop Problem

This is even more frustrating. A ChromeOS Flex machine effectively becomes an island because only Windows and Mac are first-class choices as a destination for Chrome Remote Desktop. Linux as the host OS is a second-class choice because one seems to need a PhD-level understanding of X11/Wayland/DISPLAY. Multi-user possibilities might be one of the complexities, and systemd could be in the “complicating” mix too.

Chrome Remote Desktop TO ChromeOS/Linux (the very platform Google controls!) is not supported at all.

Yet, Chrome Remote Desktop FROM ChromeOS to Windows or Mac is supported.

This means you can’t remotely access your secure container (in ChromeOS Flex on a spare PC) from your Chromebook when traveling, nor from your Mac or Windows laptop, which support Chrome Remote Desktop as an origin just fine. That defeats much of the purpose of having a dedicated security testing environment.

Potential Workarounds (all imperfect)

For Terminal access I could create a web-based terminal (ttyd or similar) in each container, accessible via different ports, but I’d rather be in my terminal of choice: Terminal.

I have other options too, such as a VNC server in penguin, or Guacamole, but I wish this were a mainstream feature of ChromeOS once you’ve enabled developer features.

The irony is that Google has all the pieces (Crostini, Chrome Remote Desktop, Terminal app) but hasn’t connected them properly. A simple “Add Container to Terminal” button and proper Chrome Remote Desktop support would solve everything. It’s particularly galling because ChromeOS is supposed to be the “simple, secure” option, yet these limitations push us toward complex workarounds that probably decrease security.

Googlers

And if any Googlers have got this far: can we have an explicit “Disable trackpad while typing” setting, as macOS Sierra had? It was removed after Sierra. Chromebooks have plastic chassis, and mild weight adjacent to the trackpad while typing can cause a click at the current pointer position.

Updates

Jan 2026: The “Multi-container” ChromeOS flag #crostini-multi-container moved from a hidden experiment to a core feature, and was then deprecated. Now there’s a new “Baguette” direction that is containerless - wee VMs instead. It’s early days with that, and I’m playing with nixos-crostini toward the same goals.

Starting RexxJS

2025-09-15T00:00:00+00:00

Repo: github.com/RexxJS/RexxJS/

Yes, Mike Cowlishaw’s interpreted language from 1979 that’s line-centric, starts indexes at 1 not 0, where vars are “weak” & global dominant, which isn’t OO or functional, and has a dangerous “eval” equivalent. Yes, I do have reservations, but I wanted an in-the-DOM interpreted “glue language” that also works on the command line (mac, win and lin). I wanted it because I have unreleased agentic-ai applications that will be better for having such a language for a use case, and I’m trying to work out whether this is compatible with model-context-protocols, alien to them, competitive, enabling of, etc. It could all be folly of course. In short, a language familiar to me that included a “Control Bus” that could work at a distance was my goal.

RexxJS

RexxJS’s innovation: It combines ARexx’s lightweight string-based messaging with modern browser capabilities (iframes, workers, JSON-RPC - via postMessage, see below), plus adds progress monitoring and fault tolerance through a CHECKPOINT system. That’s one run target (pure DOM; connects iFrames). The other is on the command line via NodeJS. Building this, I worked toward the iFrame/postMessage goal (with integration tests, and pure specs) and then back to other language, library and tooling ecosystems. It’s all in JavaScript, obviously. Unit tests of Rexx can be via JavaScript instantiating the Rexx interpreter and parser, too. Halfway through I thought it was time for tests in Rexx itself, so there’s a test framework and a very experimental expectations capability - those are dogfood tests and there will be more and more of them. At some stage I could be rash and migrate much more of the jest tests to Rexx tests, but I’m sceptical about my ability to sort out an inevitable one-line change that breaks 3000 highly derived tests. Anyway, this is alpha quality right now - don’t put apps live using it. Of course, ClaudeCode, JulesAgent, OpenAI (via the excellent AiderChat) and Gemini-cli have helped a lot.

Agentic AI concerns

Will this have a state synchronization complexity? ARexx worked because AmigaOS applications shared memory space. With distributed nodes, we’ll need robust conflict resolution when scripts modify shared context concurrently. How will this handle split-brain scenarios?
Security surface - Self-sending executable code is possible; powerful but risky. That’s more for something with a shell at the other end rather than APIs of some sort. MCP typically uses structured message passing rather than arbitrary code execution. For DOM execution at least, I already have it working inside iframe sandboxes.
Security surface 2 - ARexx ports in 2025 are a hacker’s dream. Generally available ones for an app would need tokens, cryptographic signature tools, and declarative privilege-request meta-data.
Protocol overhead - ARexx’s beauty was its simplicity. Modern LLM interactions involve complex token management, conversation history, and tool calling. Can RexxJS’s scripting model be expressive enough without becoming any more verbose than it already is?
Testing challenges - Distributed systems with progressive state changes are notoriously hard to test deterministically. New classes of integration tests may be needed. Possibly also property-based testing for the coordination logic.

Strengths, otherwise, include the bidirectional progressive reporting, which “could” handle streaming responses naturally (critical for LLM interactions). Running in both DOM and CLI gives it flexibility that most MCP implementations lack, BUT the ADDRESS fu for the command line has not been built yet.

Rexx history that influenced me

The RexxJS “Control Bus” draws inspiration from ARexx’s revolutionary ADDRESS/PORT model on the Amiga home computer by Commodore, which pioneered lightweight inter-application scripting. In 1987, ARexx allowed any application to expose a named “port” that could receive text commands from Rexx scripts, enabling system-wide automation through simple string messaging. William Hawes was the author of ARexx outside Commodore, but it was so good it was subsequently bundled with the OS in 1990/91.
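
To make the named-port idea concrete, here is a minimal in-process sketch in JavaScript. It is not RexxJS’s (or ARexx’s) actual API: PortBus, openPort and address are hypothetical names, the PAINT commands are made up, and the numeric return codes are arbitrary placeholders loosely echoing Rexx’s RC/RESULT convention.

```javascript
// Minimal in-process sketch of an ARexx-style named-port bus.
// PortBus, openPort and address are hypothetical names for illustration only.
class PortBus {
  constructor() {
    this.ports = new Map(); // port name -> handler(commandString) => result
  }

  // An "application" exposes a named port that accepts text commands.
  openPort(name, handler) {
    if (this.ports.has(name)) {
      throw new Error(`port already open: ${name}`);
    }
    this.ports.set(name, handler);
  }

  // A script addresses a port by name and sends it one command string.
  // Returns { rc, result }; the rc values are placeholders, not ARexx's own.
  address(name, command) {
    const handler = this.ports.get(name);
    if (!handler) {
      return { rc: 10, result: `no such port: ${name}` };
    }
    try {
      return { rc: 0, result: String(handler(command)) };
    } catch (err) {
      return { rc: 5, result: err.message };
    }
  }
}

// Usage: a pretend paint program that understands two text commands.
const bus = new PortBus();
const canvas = { width: 0, height: 0 };
bus.openPort('PAINT', (command) => {
  const [verb, ...args] = command.trim().split(/\s+/);
  if (verb.toUpperCase() === 'RESIZE') {
    canvas.width = Number(args[0]);
    canvas.height = Number(args[1]);
    return 'OK';
  }
  if (verb.toUpperCase() === 'SIZE') {
    return `${canvas.width} ${canvas.height}`;
  }
  throw new Error(`unknown command: ${verb}`);
});

const resized = bus.address('PAINT', 'resize 640 480');
const size = bus.address('PAINT', 'SIZE');
const missing = bus.address('SPREADSHEET', 'RECALC');
```

The point of the sketch is that the script side only needs a port name and a string; everything the application can do is discovered through text commands rather than a typed interface.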

Before ARexx / Prior Art

Of course just because I thought ARexx’s address system was revolutionary did not mean it was without precedent.

IBM Rexx (mainframe / VM CMS exec)

ADDRESS already existed before ARexx - it was used to send commands to different environments (ADDRESS TSO, ADDRESS ISPEXEC, etc.). Lots of concepts there have no peer within modern client/server Unix-land software engineering.

ARexx extended this idea to user applications and arbitrary “ports.” That was without a TCP/IP subsystem for the Amiga in 1987.

Unix shell + stdin/out pipes (1970s)

Not the same, but the philosophy is similar: treat text as the lingua franca, and let a shell script send commands and collect results. What ARexx innovated was the naming of live applications as endpoints rather than just processes and pipes.

Forth message passing (early 80s)

Forth systems often had message/event words for controlling devices. Not system-wide like ARexx ports, but conceptually related.

Tcl’s send (1990s)

Direct analogue, allowing Tcl interpreters to send strings to named targets.

Smalltalk image messaging

Everything is a message, but it’s intra-image, not inter-application. However, some Smalltalks apparently had “workspace to morphic world” messaging similar to ARexx’s openness.

Modern Analogues

AppleScript (1993) - Similar concept but heavier (object models + AppleEvents vs simple strings)
window.postMessage (2008) - Browser API that essentially recreates ARexx ports for web contexts (my key target)
Comlink (Google lib, ~2017) - a library for this exact thing, which one guesses their own web apps use. “Call methods on remote iframes/workers as if local,” it is said. It is strongly typed (Promises), and less string-oriented.
WebAssembly runtimes with messaging - Wasm modules talking to host via postMessage. Not standardized for cross-iframe yet, but evolving.
Electron IPC (2010s; Node <-> Renderer) - Very similar to ARexx ports: each renderer process is a “port” you can send strings to. Structured messages, not always text-based, but philosophically close.
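
For the window.postMessage entry in particular, a receiving frame could treat incoming messages as port commands along the following lines. This is a sketch under assumptions, not how RexxJS does it: the envelope shape ({ port, command, id }), the handlePortMessage name and the example origins are all hypothetical. The handler is a plain function over an event-like object so that it also runs under Node without a DOM.

```javascript
// Sketch: treat window.postMessage traffic as ARexx-style port commands.
// The envelope shape ({ port, command, id }) and all names are assumptions.
function handlePortMessage(event, allowedOrigins, ports) {
  // Ignore anything from an origin we did not explicitly allow.
  if (!allowedOrigins.includes(event.origin)) {
    return null;
  }
  const { port, command, id } = event.data || {};
  if (typeof port !== 'string' || typeof command !== 'string') {
    return { id, ok: false, error: 'malformed envelope' };
  }
  const handler = ports[port];
  if (!handler) {
    return { id, ok: false, error: `no such port: ${port}` };
  }
  return { id, ok: true, result: handler(command) };
}

// In a browser the wiring would look roughly like this:
//   window.addEventListener('message', (event) => {
//     const reply = handlePortMessage(event, ['https://example.com'], ports);
//     if (reply) event.source.postMessage(reply, event.origin);
//   });

// Plain-object stand-ins for MessageEvent, so the sketch also runs under Node.
const ports = { CALC: (command) => command.toUpperCase() };
const accepted = handlePortMessage(
  { origin: 'https://example.com', data: { port: 'CALC', command: 'sum 1 2', id: 7 } },
  ['https://example.com'],
  ports
);
const rejected = handlePortMessage(
  { origin: 'https://evil.example', data: { port: 'CALC', command: 'sum 1 2', id: 8 } },
  ['https://example.com'],
  ports
);
```

Checking event.origin before doing anything else is the part worth keeping from this sketch; it speaks to the security-surface worry about generally available ports.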

Innovation around IPC keeps happening too. github.com/eclipse-iceoryx/iceoryx2 - “Eclipse iceoryx2 true zero-copy inter-process-communication with a Rust core” - is current, and makes me remember those days of the Amiga with an ARexx script controlling multiple full apps.

Other languages that could implement the same iframe/postMessage thing

Languages with potential to emulate ARexx ADDRESS/PORT:

Lisp/Scheme: (BiwaScheme - current, LispyScript - paused): Very easy to extend the environment with primitives. You could define (send "frame1" "(do-something)"). Since Lisp interpreters already treat strings as code, it maps almost 1:1 to ARexx’s “send command string”.
Lua: (Fengari - status?): Lua has a built-in concept of coroutines and message loops. Adding postMessage/onmessage as primitives would make iframe scripting natural. You could write something like address("frame1", "command").
Forth: (tiny Forths in JS): Forth is literally token streaming. Easy to define PORT words that wrap postMessage. Minimal, but you may lose ARexx-style structured results unless you build a return protocol.
Python (well, a subset of): Skulpt and Brython are Python interpreters in JavaScript. Pyodide too, which I am also delegating to in an “extra”.
Blasts from the past: Several small BASIC-in-JS interpreters exist (TinyBASIC.js .. current link?). Tau Prolog (paused?) is a Prolog interpreter written in JS.

All of those are more advanced languages than Rexx itself. I need help with this. It is likely only to be interesting to people with prior Rexx and more recent JavaScript framework/library/tool-building experience.

Lastly I’m likely to continue to refer to REXX as Rexx.

SwiftUI Component Testing with Appium & Test Harnesses

2025-06-30T00:00:00+00:00

Not sure this is the final entry in a series exploring my “UI Component Testing” (started in 2017), but here goes. Over the last couple of weeks, I’ve shown implementations using Playwright, Cypress, Selenium, and NightWatch for a React web application. Now, we leave the web behind and see how the same principles apply to native desktop development with SwiftUI. And, true to the original blog entry, there is a credit card component and a coupled address-of-credit-card component.

The core idea remains the same: test UI components in the “smallest reasonable rectangle” using dedicated test harnesses, enabling fast, isolated, and reliable tests before integrating them into a full application, but more visibly showing the things that would feed into a component and the outcomes of interactions with it.

The Application: A Native SwiftUI Payment Form

Repo: github.com/paul-hammant/swiftui-component-testing-with-appium

To get going with this repo, you’ll need Developer tools (the full app) on a recent Mac, as well as Node 22+ and Appium. After npm install, you’ll need to do appium driver install mac2.

Instead of the “Car Doppler” web app, this example uses a simple macOS payment application built entirely in SwiftUI. The application consists of two main components, and a pseudo app:

CreditCardView: A form for entering credit card details.
BillingAddressView: A form for entering a billing address.
CompositePaymentApp: The “real” application that combines both components (pseudo)

The components are combined in a final CompositePaymentApp to illustrate a complete payment screen, but it is just a mockup: merely the placement of the two MVVM components above in an application that you could ship to customers were it finished and useful.

The composite app, outside of test automation:

Test Automation for SwiftUI

We have two tiers of testing: blazing fast unit tests and UI component testing in the style I have been blogging about.

Swift Unit & Integration Tests (`swift test`)

These tests operate directly on the data models in UIComponentTestingLib without launching any UI. They are incredibly fast. We could run hundreds of tests a second, but we don’t have that many in this repo. Being integrated into the Swift environment, these will compile prod and test code if needed before running.

Component Test via Appium (`npm test`)

To automate our SwiftUI test harnesses, we use Appium with its Mac2 driver. This gives us a powerful, Selenium/Webdriver-like ability to drive our native macOS application from an external script—in this case, JavaScript with WebdriverIO. WebDriver is familiar to me already, of course.

The key to achieving our component testing goal is to not totally rely on brittle UI interactions like typing. Here we use a combination of accessibility identifiers and fast data injection to do some heavy lifting. To aid injection of data, we have a TextEditor view for JSON test data and a “Load Test Data” button to take that and push it into the model using regular functions of the production UI. Via Appium, this is quite smooth.

These tests launch the actual test component harness app(s) and interact with them through the UI layer. They are slower but provide higher confidence that the components are visually correct and interactive. They are more representative of testing the full, compiled application in a way a user would interact with it (albeit with our data injection shortcut). And you have to remember to compile the Swift pieces, as NPM/Node and JS is a different world for the scripted testing of this substantially MacOS app.
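
The injection step described above might look something like this from the WebdriverIO side. Treat it as a sketch: the accessibility ids (testDataEditor, loadTestDataButton) and the loadTestData helper name are hypothetical and not taken from the repo; only the `~` accessibility-id selector and the setValue/click element calls are standard WebdriverIO.

```javascript
// Sketch of "paste JSON into the TextEditor, then press Load Test Data".
// The accessibility ids and the helper name are hypothetical, not from the repo.
async function loadTestData(driver, testData) {
  // "~name" is WebdriverIO's accessibility-id selector strategy.
  const editor = await driver.$('~testDataEditor');
  await editor.setValue(JSON.stringify(testData));

  const loadButton = await driver.$('~loadTestDataButton');
  await loadButton.click();
}

// Inside a test, with a WebdriverIO session already created against the
// Mac2 driver, usage would be along the lines of:
//   await loadTestData(driver, { cardNumber: '4111111111111111' });
```

One setValue call plus one click replaces a field-by-field typing sequence, which is where the speed and the reduced brittleness come from.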

Anyway, here’s the video of UI component testing in the test harness:

Appium wants the whole screen. It places a shade layer over everything with an “Automation Running” message, to give you the distinct impression that you should take your hands off the mouse/kbd for the duration.

MacOS apps and Appium

It has been difficult ironing the kinks out of this demo. There are a few processes that need to be orchestrated, which feels harder than treating everything as a lib for the language ecosystem in question. There are also some permissions in MacOS settings (Settings -> Security & Privacy -> Accessibility) for which I must say I still have not resolved all the permutations. I run regular Terminal on the Mac, as well as VsCode and JetBrains’ “Fleet”. The runner of the Appium script is NodeJs (node the executable). Node is off in a Homebrew-managed folder for me, one which could easily change with a brew update/upgrade some time later. The registration is via a fully qualified path like /opt/homebrew/Cellar/node/<version>/bin/node, a path that is not easy to enter into the same Mac settings UI that manages what has elevated privs and what does not. Then, after you’ve entered it, the fully qualified path disappears and you’re going to see it as just “node” in the list. This is inauthentic in my opinion and a hole in the Mac’s claimed impregnable security armour, but that is an aside. Running the suite, Appium could fail to find the test app and that could be because of this missing privilege. Or it could be something else. It may have worked for you last week, but because of one brew-upgrade it may not today. I am not sure that Appium can be changed to mind-read what the root cause problem could be. Even without version upgrades it is unclear how many runners I have to register in there - VsCode and Node and Fleet? Seems too broad.
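
One small aid for the path problem: Node can report the fully resolved path of its own executable via process.execPath (symlinks are resolved), which is the path you would be registering. The sketch below only inspects that string. The describeNodePath helper and its “/Cellar/” heuristic for spotting a version-specific Homebrew path are my assumptions, not anything Appium or Homebrew provides.

```javascript
// Sketch: report which node binary is running and whether its path looks
// version-specific. describeNodePath and the "/Cellar/" check are assumptions.
function describeNodePath(execPath) {
  const versioned = execPath.includes('/Cellar/');
  return {
    path: execPath,
    versioned,
    note: versioned
      ? 'Path is version-specific; a brew upgrade will likely change it.'
      : 'Path does not look version-specific.',
  };
}

// process.execPath is the absolute, symlink-resolved path of this node binary.
const info = describeNodePath(process.execPath);
console.log(`${info.path}\n${info.note}`);
```

Running this at the start of the suite would at least show which binary macOS needs to trust on that day.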

My beefy ChromeBook plus has great separations between the OS and the VM I’m developing in. Easy to redo things too if I mess them up. Importantly, it’s hard on the Chromebook for a rogue developer-installed thing to take over the whole OS. I can’t have a Mac VM within the Chromebook though (not even the Docker-OSX thing or related), but I feel VMs are the way forward. Ideally, I’d have a MacOS VM within MacOS just for the programmatic testing of developed apps under Appium-like control. This would (should IMO) take only megabytes of disk, not gigabytes and be lean enough to run on 8GB RAM machines. Inside that Mac VM, I would want all permissions without fiddling with settings post install. I’d want a script to run on boot from an overlay that Appium would setup. Could be that we have that in Tahoe that we know ships with better containers, but I’ve not upgraded yet as it is in beta. It is also for Linux containers only, but a subsequent beta release will likely be for lightweight Mac VMs too.

It could be I already have lean Firecracker-style VM capability for Mac Sequoia (v15) with “Lume” - see Show HN: Lume – OS lightweight CLI for MacOS and Linux VMs on Apple Silicon. It is Apple silicon only and I have the last of the Intel Macs for now. It is also for pre-built images (they have some Packer tech too). I’d be really happy with “same as host” restrictions on the VM in order to get to the megabyte place. The BSDs do this well I think with “jails”. I’ve talked of the principles of containment before (2016) and this all feels in that direction. To carve a larger screen into VMs seems doable: when the mouse is in that rectangle, it gets mouse/keyboard/camera/mic and can use speakers and video (as far as the scaled rectangle allows). When the mouse is outside that rectangle, some of those are lost, but not in a “USB unplugged” way. And I’d also want the real touchpad as a virtual touchpad, not re-presented as a mouse. Sandboxes are quite well understood by now, but the containing system should be able to easily grant more than defaults.

Anyway, enjoy these component tests with a test harness for a (fat) MacOS app … that is closer to the 2017 blog entry punditing around this testing stuff.

PS: Pseudo-declarative UIs

It is also worth mentioning that I love pseudo-declarative markup languages like SwiftUI and have 30 blog entries on the concept. Gazing at the main.swift sources in github.com/paul-hammant/swiftui-component-testing-with-appium/blob/main/Sources/ is where you’d see this markup style.

NightWatch Component Testing and visual documentation

2025-06-25T00:00:00+00:00

This blog entry follows my recent exploration of Playwright, Cypress, and Selenium-WebDriver for component testing. I’ve now completed a migration to NightWatch.js, and this post documents the complete transition from Selenium WebDriver to NightWatch for a particular test-harness pattern for component testing.

NightWatch Component Testing Migration

Branch: nightwatch_instead_of_selenium

Note: I didn’t start with the Playwright branch for this one - I started with the canonical selenium-webdriver one, because NightWatch is closer to that ecosystem than it is to anything else.

The Migration Challenge

Starting with a fully functional Selenium WebDriver test suite covering both component tests and e2e tests, the goal was to migrate everything to NightWatch.js while preserving:

All test functionality and coverage
Screenshot capabilities for visual documentation
The same test harness pattern for component testing
Performance optimizations from the Selenium implementation

Why NightWatch.js?

NightWatch.js offers several advantages over raw Selenium WebDriver:

Cleaner more modern syntax: More readable test code with built-in assertions
Better error reporting: Detailed failure messages with stack traces
Integrated screenshots: Built-in screenshot capabilities with failure capture
Configuration simplicity: Single configuration file vs multiple setup files
Browser management: Automatic WebDriver lifecycle management (yes, it uses selenium-webDriver under the hood)
Parallel execution: Built-in support for parallel test execution (though we’re not using that here)

NightWatch Migration work

Component Tests: hundreds of assertions

Controls Component: 91 assertions across 5 scenarios
DebugConsole Component: 125 assertions across 6 scenarios
UnitsConversion Component: 57 assertions across 3 scenarios
Performance: Same visual test harness pattern with optimized navigation

E2E Tests: 127 assertions (less important for this blog entry)

Doppler App Tests: 96 assertions covering main app functionality
Audio Processing Tests: 31 assertions covering file upload and audio features
Responsive Testing: Mobile viewport and cross-browser compatibility

Visual Implementation: Identical Test Documentation

The NightWatch implementation preserves the same visual-first approach, generating detailed screenshots for each test interaction. The screenshots are gated on an env-var, so they can be turned off. Here’s the Test Harness Component Testing pattern, now powered by NightWatch.

Example: Component State Testing

Initial State: Component Ready

Recording Toggle Interaction

The NightWatch implementation maintains the same three-section visual pattern:

Component Under Test (blue border) - The actual React component being tested
Test Harness State (green border) - Shows parent component state reflecting real app conditions
Event Log (yellow border) - Complete interaction history for debugging and verification

Component and E2e tests via NightWatch

Component Test Utils: nightwatch-utils.js

Test harness navigation and interaction
Component-specific assertions
Screenshot management for test documentation

E2E Test Utils: nightwatch-e2e-utils.js

Full application navigation
Cross-component integration testing
Mobile responsive testing utilities

Performance Optimizations Preserved

The NightWatch migration maintained all performance optimizations from the Selenium implementation:

Shared browser instances: Single Firefox instance per test suite. I am not sure if I am doing this in an idiomatically correct way for a forced serial use of NightWatchJs.
Fast page updates: window.location.replace() instead of full navigation
Optimized waits: Implicit timeouts of 1-2 seconds vs default 10+ seconds
Strategic screenshots: Only when not in CI or when SKIP_SCREENSHOTS is false
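
The screenshot gating in the last item could be expressed roughly as below. This is one possible reading of the rule, not the repo’s code: an explicit SKIP_SCREENSHOTS value wins, otherwise screenshots are taken unless CI is set. The helper names and the screenshots/ folder are hypothetical; browser.saveScreenshot is Nightwatch’s own command.

```javascript
// Sketch of env-var gated screenshots; helper names and folder are hypothetical.
// Reading assumed here: an explicit SKIP_SCREENSHOTS wins, otherwise skip in CI.
function shouldTakeScreenshot(env) {
  if (env.SKIP_SCREENSHOTS === 'true') return false;
  if (env.SKIP_SCREENSHOTS === 'false') return true;
  return !env.CI;
}

// Wraps Nightwatch's browser.saveScreenshot so tests can call it unconditionally.
function maybeScreenshot(browser, name, env = process.env) {
  if (shouldTakeScreenshot(env)) {
    browser.saveScreenshot(`screenshots/${name}.png`);
  }
  return browser;
}
```

Keeping the decision in one small function means individual tests never need to know whether they are running in CI.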

Custom Dependency: @nightwatch/react Fork

This project uses a custom fork "@nightwatch/react": "github:paul-hammant/nightwatch-plugin-react#main" to update transitive dependencies that were several major versions behind, resolving React 18+ compatibility issues and security vulnerabilities while maintaining full API compatibility. Fingers crossed the Nightwatch team will process the pull request, and I get to delete this section.

Running the component tests

> react-app@2.1.2 test:ct
> npm run build:server --silent && nightwatch --config nightwatch.conf.js src/components/__tests__/**/*.ct.nightwatch.test.js

CSS imports will be handled by the server
Setting up NightWatch test environment...


[Controls Ct Nightwatch Test] Test Suite
───────────────────────────────────────────────────────────────────────────────
- Starting GeckoDriver on port 4444...

ℹ Connected to GeckoDriver on port 4444 (1542ms).
  Using: firefox (140.0) on LINUX.

- Loading url: http://localhost:3001/render-component/ControlsTestHarness?testName=Initial

  ℹ Loaded url http://localhost:3001/render-component/ControlsTestHarness?testName=Initial in 128ms
  ✔ Element <[data-testid="test-name"]> was present after 29 milliseconds.

  Running renders in test harness with initial state visible:
───────────────────────────────────────────────────────────────────────────────────────────────────
  ✔ Element <[data-testid="test-name"]> was present after 24 milliseconds.
  ✔ Element <[data-testid="record-button"]> was present after 11 milliseconds.
  ✔ Testing if element's <[data-testid="record-button"]> inner text equals 'Start
Listening' (11ms)
  ✔ Element <[data-testid="unit-toggle-button"]> was present after 5 milliseconds.
  ✔ Testing if element's <[data-testid="unit-toggle-button"]> inner text equals 'Switch to
mph' (11ms)
  ✔ Element <[data-testid="harness-recording-state"]> was present after 4 milliseconds.
  ✔ Testing if element's <[data-testid="harness-recording-state"]> inner text equals 'Recording: OFF' (9ms)
  ✔ Element <[data-testid="harness-units-state"]> was present after 3 milliseconds.
  ✔ Testing if element's <[data-testid="harness-units-state"]> inner text equals 'Units: METRIC (km/h)' (8ms)
  ✔ Element <[data-testid="test-name"]> was present after 2 milliseconds.
  ✔ Testing if element's <[data-testid="test-name"]> inner text equals 'Test: Initial State Visibility' (9ms)

  ✨ PASSED. 11 assertions. (220ms)

  Running demonstrates event coupling - recording toggle:
───────────────────────────────────────────────────────────────────────────────────────────────────
  ✔ Element <[data-testid="test-name"]> was present after 21 milliseconds.
  ✔ Element <[data-testid="record-button"]> was present after 4 milliseconds.
  ✔ Testing if element's <[data-testid="record-button"]> inner text equals 'Start
Listening' (17ms)
  ✔ Element <[data-testid="harness-recording-state"]> was present after 2 milliseconds.
  ✔ Testing if element's <[data-testid="harness-recording-state"]> inner text equals 'Recording: OFF' (9ms)
  ✔ Element <[data-testid="event-log"]> was present after 2 milliseconds.
  ✔ Testing if element's <[data-testid="event-log"]> inner text equals 'No events yet...' (11ms)
  ✔ Element <[data-testid="record-button"]> was present after 2 milliseconds.
  ✔ Element <[data-testid="record-button"]> was visible after 12 milliseconds.

  PASSED: 9 passed (481ms)


[Debug Console Ct Nightwatch Test] Test Suite
───────────────────────────────────────────────────────────────────────────────
- Starting GeckoDriver on port 4444...

ℹ Connected to GeckoDriver on port 4444 (1580ms).
  Using: firefox (140.0) on LINUX.

- Loading url: http://localhost:3001/render-component/DebugConsoleTestHarness?testName=Initial

  ℹ Loaded url http://localhost:3001/render-component/DebugConsoleTestHarness?testName=Initial in 113ms
  ✔ Element <[data-testid="test-name"]> was present after 19 milliseconds.

  Running loadDebugTestHarness:
───────────────────────────────────────────────────────────────────────────────────────────────────
  ✔ Element <[data-testid="test-name"]> was present after 24 milliseconds.

  ✨ PASSED. 1 assertions. (39ms)

  Running comprehensive debug console functionality and states:
───────────────────────────────────────────────────────────────────────────────────────────────────
  ✔ Element <[data-testid="test-name"]> was present after 19 milliseconds.
  ✔ Element <[data-testid="debug-toggle-button"]> was present after 8 milliseconds.
  ✔ Testing if element's <[data-testid="debug-toggle-button"]> inner text equals 'Show Debug Console' (15ms)
  ✔ Expected element <[data-testid="debug-console-container"]> to not be present - element was not found (1011ms)
  ✔ Expected element <[data-testid="debug-toggle-button"]> to have attribute "aria-label" which equals: "Show Debug Console" (12ms)
  ✔ Expected element <[data-testid="debug-toggle-button"]> to have attribute "class" which contains: "debug-toggle-button" (8ms)
  ✔ Element <[data-testid="harness-log-count"]> was present after 3 milliseconds.
  ✔ Testing if element's <[data-testid="harness-log-count"]> inner text equals 'Log Count: 4' (10ms)
  ✔ Element <[data-testid="harness-intercept-state"]> was present after 3 milliseconds.
  ✔ Testing if element's <[data-testid="harness-intercept-state"]> inner text equals 'Intercept Console: NO' (8ms)
  ✔ Expected element <[data-testid="event-log"]> to be visible (8ms)
  ✔ Expected element <[data-testid="debug-toggle-button"]> to be present (3ms)

  ✨ PASSED. 12 assertions. (1.222s)

  Running handles empty logs state:
───────────────────────────────────────────────────────────────────────────────────────────────────
  ✔ Element <[data-testid="test-name"]> was present after 19 milliseconds.
  ✔ Element <[data-testid="debug-toggle-button"]> was present after 2 milliseconds.
  ✔ Testing if element's <[data-testid="debug-toggle-button"]> inner text equals 'Show Debug Console' (11ms)
  ✔ Expected element <[data-testid="debug-console-container"]> to not be present - element was not found (1004ms)
  ✔ Element <[data-testid="harness-log-count"]> was present after 6 milliseconds.
  ✔ Testing if element's <[data-testid="harness-log-count"]> inner text equals 'Log Count: 0' (13ms)

  ✨ PASSED. 6 assertions. (1.098s)

  Running handles large number of log entries:
───────────────────────────────────────────────────────────────────────────────────────────────────
  ✔ Element <[data-testid="test-name"]> was present after 27 milliseconds.
  ✔ Element <[data-testid="harness-log-count"]> was present after 3 milliseconds.
  ✔ Testing if element's <[data-testid="harness-log-count"]> inner text equals 'Log Count: 50' (8ms)
  ✔ Element <[data-testid="debug-toggle-button"]> was present after 3 milliseconds.
  ✔ Testing if element's <[data-testid="debug-toggle-button"]> inner text equals 'Show Debug Console' (8ms)
  ✔ Expected element <[data-testid="debug-console-container"]> to not be present - element was not found (1011ms)

  ✨ PASSED. 6 assertions. (1.144s)

  Running debug console with production-like log scenarios:
───────────────────────────────────────────────────────────────────────────────────────────────────
  ✔ Element <[data-testid="test-name"]> was present after 25 milliseconds.
  ✔ Element <[data-testid="debug-toggle-button"]> was present after 5 milliseconds.
  ✔ Testing if element's <[data-testid="debug-toggle-button"]> inner text equals 'Show Debug Console' (12ms)
  ✔ Expected element <[data-testid="debug-console-container"]> to not be present - element was not found (1006ms)
  ✔ Element <[data-testid="harness-log-count"]> was present after 5 milliseconds.
  ✔ Testing if element's <[data-testid="harness-log-count"]> inner text equals 'Log Count: 10' (10ms)
  ✔ Expected element <[data-testid="event-log"]> to be present (3ms)
  ✔ Expected element <[data-testid="debug-toggle-button"]> to be present (3ms)
  ✔ Element <[data-testid="harness-log-count"]> was present after 5 milliseconds.
  ✔ Testing if element's <[data-testid="harness-log-count"]> inner text equals 'Log Count: 10' (10ms)

  ✨ PASSED. 10 assertions. (1.182s)

  Running expanded debug console with production-like content:
───────────────────────────────────────────────────────────────────────────────────────────────────
  ✔ Element <[data-testid="test-name"]> was present after 21 milliseconds.
  ✔ Element <[data-testid="harness-log-count"]> was present after 3 milliseconds.
  ✔ Testing if element's <[data-testid="harness-log-count"]> inner text equals 'Log Count: 10' (10ms)
  ✔ Element <[data-testid="harness-expanded-state"]> was present after 2 milliseconds.
  ✔ Testing if element's <[data-testid="harness-expanded-state"]> inner text equals 'Debug Console State: EXPANDED (for testing)' (9ms)
  ✔ Expected element <[data-testid="debug-console-container"]> to be visible (8ms)
  ✔ Element <[data-testid="debug-toggle-button"]> was present after 2 milliseconds.
  ✔ Testing if element's <[data-testid="debug-toggle-button"]> inner text equals 'Hide Debug Console' (7ms)
  ✔ Expected element <[data-testid="debug-log-entry-0"]> to be visible (8ms)
  ✔ Expected element <[data-testid="debug-log-entry-4"]> to be visible (10ms)
  ✔ Expected element <[data-testid="debug-log-entry-9"]> to be visible (8ms)
  ✔ Element <[data-testid="debug-log-entry-0"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-0"]> contains text 'Application startup complete' (8ms)
  ✔ Element <[data-testid="debug-log-entry-4"]> was present after 2 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-4"]> contains text 'FFT processing timeout' (8ms)
  ✔ Element <[data-testid="debug-log-entry-6"]> was present after 2 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-6"]> contains text 'Audio processing restored' (9ms)
  ✔ Element <[data-testid="debug-log-entry-8"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-8"]> contains text 'Speed calculation: 25.3 mph' (8ms)
  ✔ Element <[data-testid="debug-log-entry-9"]> was present after 2 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-9"]> contains text 'Doppler shift detected: +127 Hz' (9ms)
  ✔ Element <[data-testid="debug-log-entry-4"]> was present after 1 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-4"]> contains text 'ERROR' (8ms)
  ✔ Element <[data-testid="debug-log-entry-4"]> was present after 2 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-4"]> contains text 'FFT processing timeout' (10ms)
  ✔ Element <[data-testid="debug-log-entry-7"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-7"]> contains text 'WARN' (17ms)
  ✔ Element <[data-testid="debug-log-entry-7"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-7"]> contains text 'High CPU usage detected' (10ms)
  ✔ Element <[data-testid="debug-log-entry-6"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-6"]> contains text 'SUCCESS' (11ms)
  ✔ Element <[data-testid="debug-log-entry-6"]> was present after 4 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-6"]> contains text 'Audio processing restored' (11ms)
  ✔ Expected element <[data-testid="debug-log-container"]> to be visible (8ms)
  ✔ Expected element <[data-testid="debug-fft-status"]> to be visible (10ms)
  ✔ Expected element <[data-testid="debug-clear-button"]> to be visible (13ms)

  ✨ PASSED. 36 assertions. (441ms)

  Running debug console supports dynamic log updates after initial load:
───────────────────────────────────────────────────────────────────────────────────────────────────
  ✔ Element <[data-testid="test-name"]> was present after 17 milliseconds.
  ✔ Element <[data-testid="debug-toggle-button"]> was present after 3 milliseconds.
  ✔ Testing if element's <[data-testid="debug-toggle-button"]> inner text equals 'Hide Debug Console' (11ms)
  ✔ Expected element <[data-testid="debug-console-container"]> to be visible (11ms)
  ✔ Element <[data-testid="harness-log-count"]> was present after 5 milliseconds.
  ✔ Testing if element's <[data-testid="harness-log-count"]> inner text equals 'Log Count: 2' (10ms)
  ✔ Element <[data-testid="harness-expanded-state"]> was present after 2 milliseconds.
  ✔ Testing if element's <[data-testid="harness-expanded-state"]> inner text equals 'Debug Console State: EXPANDED (for testing)' (9ms)
  ✔ Element <[data-testid="debug-log-entry-0"]> was present after 7 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-0"]> contains text 'ADDED AFTER 1' (9ms)
  ✔ Element <[data-testid="debug-log-entry-0"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-0"]> contains text 'INFO' (8ms)
  ✔ Element <[data-testid="debug-log-entry-0"]> was present after 2 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-0"]> contains text '10:30:00' (9ms)
  ✔ Element <[data-testid="debug-log-entry-1"]> was present after 2 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-1"]> contains text 'ADDED AFTER 2' (8ms)
  ✔ Element <[data-testid="debug-log-entry-1"]> was present after 2 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-1"]> contains text 'INFO' (8ms)
  ✔ Element <[data-testid="debug-log-entry-1"]> was present after 2 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-1"]> contains text '10:30:05' (9ms)
  ✔ Expected element <[data-testid="debug-toggle-button"]> to be present (2ms)
  ✔ Expected element <[data-testid="event-log"]> to be present (3ms)
  ✔ Element <[data-testid="test-name"]> was present after 22 milliseconds.
  ✔ Element <[data-testid="harness-log-count"]> was present after 2 milliseconds.
  ✔ Testing if element's <[data-testid="harness-log-count"]> inner text equals 'Log Count: 5' (10ms)
  ✔ Element <[data-testid="harness-expanded-state"]> was present after 3 milliseconds.
  ✔ Testing if element's <[data-testid="harness-expanded-state"]> inner text equals 'Debug Console State: EXPANDED (for testing)' (9ms)
  ✔ Element <[data-testid="debug-log-entry-0"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-0"]> contains text 'ADDED AFTER 1' (10ms)
  ✔ Element <[data-testid="debug-log-entry-0"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-0"]> contains text 'INFO' (9ms)
  ✔ Element <[data-testid="debug-log-entry-0"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-0"]> contains text '10:30:00' (8ms)
  ✔ Element <[data-testid="debug-log-entry-1"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-1"]> contains text 'ADDED AFTER 2' (7ms)
  ✔ Element <[data-testid="debug-log-entry-1"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-1"]> contains text 'INFO' (9ms)
  ✔ Element <[data-testid="debug-log-entry-1"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-1"]> contains text '10:30:05' (8ms)
  ✔ Element <[data-testid="debug-log-entry-2"]> was present after 2 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-2"]> contains text 'Collaborator: High memory usage detected' (9ms)
  ✔ Element <[data-testid="debug-log-entry-2"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-2"]> contains text 'WARN' (7ms)
  ✔ Element <[data-testid="debug-log-entry-2"]> was present after 2 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-2"]> contains text '10:30:10' (9ms)
  ✔ Element <[data-testid="debug-log-entry-3"]> was present after 4 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-3"]> contains text 'System: Network timeout occurred' (9ms)
  ✔ Element <[data-testid="debug-log-entry-3"]> was present after 2 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-3"]> contains text 'ERROR' (8ms)
  ✔ Element <[data-testid="debug-log-entry-3"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-3"]> contains text '10:30:15' (9ms)
  ✔ Element <[data-testid="debug-log-entry-4"]> was present after 2 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-4"]> contains text 'User: Speed detection started' (9ms)
  ✔ Element <[data-testid="debug-log-entry-4"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-4"]> contains text 'INFO' (9ms)
  ✔ Element <[data-testid="debug-log-entry-4"]> was present after 3 milliseconds.
  ✔ Testing if element <[data-testid="debug-log-entry-4"]> contains text '10:30:20' (10ms)
  ✔ Expected element <[data-testid="event-log"]> to be present (2ms)
  ✔ Expected element <[data-testid="debug-toggle-button"]> to be present (2ms)
  ✔ Element <[data-testid="debug-toggle-button"]> was present after 2 milliseconds.
  ✔ Testing if element's <[data-testid="debug-toggle-button"]> inner text equals 'Hide Debug Console' (9ms)

  ✨ PASSED. 61 assertions. (590ms)


[Units Conversion Ct Nightwatch Test] Test Suite
───────────────────────────────────────────────────────────────────────────────
- Starting GeckoDriver on port 4444...

ℹ Connected to GeckoDriver on port 4444 (1780ms).
  Using: firefox (140.0) on LINUX.

- Loading url: http://localhost:3001/render-component/ControlsTestHarness?testName=Initial

  ℹ Loaded url http://localhost:3001/render-component/ControlsTestHarness?testName=Initial in 100ms
  ✔ Element <[data-testid="test-name"]> was present after 13 milliseconds.

  Running demonstrates mph → km/h → mph conversion cycle with full visibility:
───────────────────────────────────────────────────────────────────────────────────────────────────
  ✔ Element <[data-testid="test-name"]> was present after 26 milliseconds.
  ✔ Element <[data-testid="test-name"]> was present after 5 milliseconds.
  ✔ Testing if element's <[data-testid="test-name"]> inner text equals 'Test: Initial' (11ms)
  ✔ Element <[data-testid="unit-toggle-button"]> was present after 4 milliseconds.
  ✔ Testing if element's <[data-testid="unit-toggle-button"]> inner text equals 'Switch to
mph' (10ms)
  ✔ Element <[data-testid="harness-units-state"]> was present after 3 milliseconds.
  ✔ Testing if element's <[data-testid="harness-units-state"]> inner text equals 'Units: METRIC (km/h)' (8ms)
  ✔ Element <[data-testid="event-log"]> was present after 3 milliseconds.
  ✔ Testing if element's <[data-testid="event-log"]> inner text equals 'No events yet...' (7ms)
  ✔ Element <[data-testid="unit-toggle-button"]> was present after 3 milliseconds.
  ✔ Element <[data-testid="unit-toggle-button"]> was visible after 11 milliseconds.

  PASSED: 11 passed (544ms)

───────────────────────────────────────────────────────────────────────────────────────────────────

  ️TEST FAILURE (14.615s): 
   - 0 assertions failed; 166 passed
   - 5 skipped

   ✖ 1) Controls.ct.nightwatch.test

   – demonstrates event coupling - recording toggle (481ms)

    SKIPPED (at runtime):
    - demonstrates event coupling - units toggle
    - shows processing state affecting component
    - complex scenario - multiple interactions with full trace
   ✖ 2) UnitsConversion.ct.nightwatch.test

   – demonstrates mph → km/h → mph conversion cycle with full visibility (544ms)

    SKIPPED (at runtime):
    - demonstrates units state with initial imperial mode
    - demonstrates units toggle with processing state

 Wrote HTML report file to: /home/paul/scm/car-doppler/test-results/nightwatch/nightwatch-html-report/index.html

Tearing down NightWatch test environment...

NightWatch vs Selenium-WebDriver JS

They’re the same “Selenium”, so both are using real browsers locally or remotely. NightWatch has a slightly different grammar and tries to do more with fewer config files. It also has smooth built-in error reporting with automatic screenshots. I’m not showing the HTML report for that, but it is pretty. You could make automatic failure screenshots for selenium-webdriver, but it would require some setup code. Nightwatch can also automate browser lifecycle for you. In my case I wanted one Firefox left open for all tests in a run, and that’s not (yet) configured in my project.

NightWatch.js strikes an excellent balance between power and simplicity, making it a solid choice for JavaScript teams wanting robust browser automation with a little less complexity than regular selenium-webdriver, yet still keeping the “real browser” selling point. It is also closer to the speed of Cypress.

Selenium Component Testing and visual documentation

2025-06-22T00:00:00+00:00

Five days ago I posted UI component testing revisited, featuring a React web app and Playwright component tests, and two days ago Cypress component testing, a Cypress version of the same (and a refresher on the ideas). Now…

Selenium Component Testing of a React web app

Branch: selenium_instead_of_playwright

Selenium Visual Implementation: Complete Test Documentation Through Screenshots

The Selenium implementation takes a similar visual-first approach, generating detailed screenshots for each test interaction. Here’s the identical Test Harness Component Testing pattern implemented with Selenium WebDriver, producing the same visual layout as both Playwright and Cypress implementations.

Recap - what the deployed app looks like

The app deployed in Safari on a smaller-screen iPhone:

Example Initial Component State

The Selenium implementation produces the same visual pattern as Playwright and Cypress:

Component Under Test (blue border) - The actual React component being tested
Test Harness State (green border) - Shows the parent component state that would exist in the real app
Event Log (yellow border) - Traces the complete interaction history for debugging

Units Conversion Cycle Component Test

Initial State: Metric Mode (Switch to mph available)

After Click: Imperial Mode (Switch to km/h available)

After Second Click: Back to Metric Mode

The Selenium implementation captures the same interaction flow:

Button text changing from “Switch to mph” → “Switch to km/h” → “Switch to mph”
Harness state updating from “METRIC (km/h)” → “IMPERIAL (mph)” → “METRIC (km/h)”
Event log accumulating each interaction: “Units changed to imperial” → both events visible

Architecture: External Test Server vs Built-in Mounting

As an experiment in testing framework diversity, the car-doppler project also implemented the same Test Harness Component Testing approach using Selenium WebDriver with Jest, providing an interesting comparison point to the Playwright implementation.

The Selenium approach uses a different architecture from Playwright’s built-in component mounting:

Selenium + Jest Setup:

Spawns an external component test server (server.ts)
Tests run against real HTTP endpoints
More complex setup but closer to production environment
Browser automation via WebDriver protocol

Key Architectural Difference:

Component-Under-Test == CUT

Playwright: mount(<CUT />) → virtual DOM → browser context
Selenium:   CUT in a page, on a HTTP server → real DOM → WebDriver → browser automation

Performance Trade-offs and Optimization Journey

For the Selenium Implementation:

One Shared WebDriver instance across all tests
Single browser navigation + page replacement strategy
Optional screenshot generation (SKIP_SCREENSHOTS env var)
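
Here is a sketch of how an optional-screenshot switch like that could be wired. How the repo actually interprets SKIP_SCREENSHOTS isn't shown in this post, so the rule below (any value other than empty, 0 or false means skip) and the function names `screenshotsEnabled` and `maybeScreenshot` are assumptions for illustration. The only Selenium call used is `driver.takeScreenshot()`, which resolves to a base64-encoded PNG.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Assumption: any SKIP_SCREENSHOTS value other than "", "0" or "false" means "skip".
function screenshotsEnabled(env = process.env) {
  const flag = (env.SKIP_SCREENSHOTS || '').toLowerCase();
  return flag === '' || flag === '0' || flag === 'false';
}

// `driver` is expected to be a selenium-webdriver WebDriver instance.
// Returns the written file path, or null when screenshots are skipped.
async function maybeScreenshot(driver, name, outDir, env = process.env) {
  if (!screenshotsEnabled(env)) return null;
  const base64Png = await driver.takeScreenshot();
  fs.mkdirSync(outDir, { recursive: true });
  const file = path.join(outDir, `${name}.png`);
  fs.writeFileSync(file, base64Png, 'base64');
  return file;
}
```

Gating in one place keeps the test bodies identical for both timing runs, so the only difference being measured is the screenshot cost itself.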

Selenium Performance Analysis (100 iterations of identical ‘best case’ test, headless):

WITHOUT Screenshots: 0.585s average per test (1.71 tests per second)
WITH Screenshots: 1.248s average per test (0.80 tests per second)

Browser was Firefox (headless) and tests control the component within the test harness. Admittedly, the test doesn’t have any interactions, so more interactive tests would be slower.

The Performance Gap

Despite aggressive optimization, the 100-iteration analysis reveals Selenium’s consistent performance disadvantage due to architectural differences. While Selenium is only marginally slower than Playwright without screenshots (0.585s vs 0.517s), the gap widens with screenshots. Both are much slower than Cypress’s optimized performance.

Process Architecture - The “Hops” Problem:

Selenium (Many Hops):

Jest Process → HTTP → Express Server → Server-Side React Render → 
HTTP Response → WebDriver JSON Protocol → geckodriver → Firefox Process → 
DOM Updates → WebDriver Response → HTTP → Jest

Playwright Component Testing (Minimal Hops):

Playwright Test Runner → Vite Dev Server (same process) →
Direct Component Mount → Embedded Chromium → DOM → Direct Response

Practical Advice for Selenium-using Teams Today

If you need Selenium-level browser coverage, accept the 1-second tax as the cost of cross-browser compatibility. Definitely bypass all interstitial steps/pages and go directly to the component under test.

When Selenium Component Testing Makes Sense

Use Cases:

High-stakes UI components where visual regression is critical
Stakeholder communication - screenshots are self-explanatory
Complex interaction flows that benefit from step-by-step visual documentation

Integration with WebDriver Infrastructure

For teams already using Selenium for E2E testing, extending the same infrastructure to component testing provides consistency:

Same browser automation patterns
Some shared WebDriver utilities and helpers
Consistent screenshot/video capture approaches
Single testing technology stack

Component test-base: Playwright vs Selenium

Topic	Playwright test-base	Selenium test-base
Core library	`@playwright/test` + `@playwright/experimental-ct-react`	`selenium-webdriver` (Jest/Mocha wrapper)
Browser control	In-process Playwright browsers (`chromium`, `firefox`, `webkit`)	Remote/WebDriver sessions (`Builder`, `By`, `until`)
Startup / teardown	Handled by Playwright fixtures; no manual server code	Helpers `startTestServer()` / `stopTestServer()` spin up dev server on demand
Global fixtures	`test`, `page`, automatic context per test	Custom `driver` via `getDriver()`; shared helpers
DOM access	`page.locator()`, `page.getByRole()`, `page.getByTestId()`	`driver.findElement(By.css('[data-testid="…"]'))` + helper wrappers
Wait strategy	Auto-wait built in; `expect(locator).toBeVisible()`	Explicit `driver.wait(until.elementLocated())` for every action/assert
Assertions	Playwright’s `expect` matchers	Node `assert` / Jest `expect`
Timeouts	Defaults ~ 30 s per Playwright	Test-suite timeout manually bumped (e.g. `jest.setTimeout(60000)`)
Artifacts	Trace viewer, video, screenshots configurable	No built-ins; screenshots would need extra code
File naming	`*.ct.playwright.test.ts(x)`	`*.ct.selenium.test.ts(x)`
Dependencies	`@playwright/*`, Playwright CT config	Adds `selenium-webdriver`, `ts-jest`, `@types/selenium-webdriver`; removes all Playwright packages
Typical helpers (from diff)	N/A (built into Playwright)	`loadTestHarness(url)`, `findElementByTestId(id)`, `clickElementByTestId(id)`, `getTextByTestId(id)`
Speed / flakiness	Faster, less boilerplate, auto-waiting reduces flake	Slower, more boilerplate; explicit waits are best practice, though a default wait can be set too

Overall, the Selenium version replicates Playwright functionality but requires custom scaffolding for server control, waiting, and assertions, resulting in more verbose and potentially slower tests.
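
To make that scaffolding concrete, here is a minimal sketch of what helpers with the names from the table could look like. The function bodies, the `makeHarnessHelpers` factory and the `timeoutMs` default are my assumptions for illustration, not the code in the branch. `By` and `until` are passed in rather than imported so the sketch can be exercised with fakes and without a browser.

```javascript
// Hypothetical factory: bodies are assumptions, only the helper names come from the table.
// In a real suite, `By` and `until` would come from require('selenium-webdriver').
function makeHarnessHelpers({ driver, By, until, timeoutMs = 5000 }) {
  const byTestId = (id) => By.css(`[data-testid="${id}"]`);

  // Explicit wait: block until the element is located, then resolve to it.
  const findElementByTestId = (id) =>
    driver.wait(until.elementLocated(byTestId(id)), timeoutMs);

  return {
    findElementByTestId,

    async loadTestHarness(url) {
      await driver.get(url);
      // The harness pages in the logs above always expose a test-name element.
      return findElementByTestId('test-name');
    },

    async clickElementByTestId(id) {
      const element = await findElementByTestId(id);
      await element.click();
    },

    async getTextByTestId(id) {
      const element = await findElementByTestId(id);
      return element.getText();
    },
  };
}
```

With the real library this would be wired up as `const { Builder, By, until } = require('selenium-webdriver')` and a driver from `new Builder().forBrowser('firefox').build()`. Every helper goes through the same explicit wait, which is the boilerplate Playwright's auto-waiting hides.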

Serial vs parallel test execution

The testing in these blog entries was for serial execution on a single machine or vm. Parallel and distributed execution capabilities vary significantly, and are worth summarizing:

Cypress: Limited parallel execution (Cypress Cloud/Dashboard for CI, but component tests typically run serially)
Playwright: Excellent built-in parallel execution (--workers=4), can distribute across multiple machines
Selenium: Superior distributed execution via Selenium Grid, supports large-scale parallel execution across browser farms. That and 10+ SaaS vendors that made commercial Selenium grids (and more).

Conclusion

Selenium component testing feels more production-like, even in test environments. While it comes with performance trade-offs compared to Cypress or (less so) Playwright, it still yields compelling test evidence, potentially for multiple browsers, that’s immediately understandable to stakeholders. And with parallel/distributed execution via the likes of Selenium Grid and commercial services, the wait for a pass or fail result could be reduced.

The key insight is that Test Harness Component Testing works equally well across different automation frameworks, allowing teams to choose based on their specific priorities: speed (Cypress), cross-browser capabilities (Playwright), or comprehensive visual documentation and production-like environments (Selenium).

Cypress Component Testing - Changing from Playwright for a demo repo

2025-06-20T00:00:00+00:00

A few days ago I posted UI component testing revisited on some component testing patterns I’ve been interested in for many years. The test application was React in TypeScript. The component-testing technology I focused on was Playwright. In this blog entry I explore Cypress as an alternative to Playwright for component testing.

Quick Refresher: Test Harness Component Testing (2017 → 2025)

Back in 2017, I introduced the concept of testing UI components within test harnesses rather than in isolation. The key insight was testing components in the “smallest reasonable rectangle” while maintaining realistic event coupling to parent components. Instead of mocking everything, you create a test harness that simulates how the component would actually be used in the real application. Could be that someone thought of it before me, of course. That’s usually the case.

The pattern involves dual assertions: testing both the component’s visual state AND the test harness state to verify that events flow correctly between child and parent. This bridges the gap between fast unit tests and comprehensive integration tests, providing confidence that components work correctly when integrated while maintaining reasonable test speeds.
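
A toy, DOM-free model of the dual-assertion idea may help. The strings mirror the units-toggle test shown in these posts, but `createHarness` and its shape are hypothetical, not code from the car-doppler repo, and the "Units changed to metric" wording is my guess by symmetry with the imperial message.

```javascript
// Hypothetical model: the harness owns the state, the component only renders and reports.
function createHarness() {
  const state = { isMetric: true, events: [] };

  // Parent-side handler, standing in for what the real parent component would do.
  function onUnitsChange(isMetric) {
    state.isMetric = isMetric;
    state.events.push(`Units changed to ${isMetric ? 'metric' : 'imperial'}`);
  }

  return {
    // Component under test: derives its label from props and raises an event on click.
    component: {
      buttonText: () => (state.isMetric ? 'Switch to mph' : 'Switch to km/h'),
      click: () => onUnitsChange(!state.isMetric),
    },
    // What the harness would display alongside the component.
    unitsState: () => `Units: ${state.isMetric ? 'METRIC (km/h)' : 'IMPERIAL (mph)'}`,
    eventLog: () => (state.events.length ? state.events.join('\n') : 'No events yet...'),
  };
}
```

A test in this style clicks the component once, then asserts on both sides: `component.buttonText()` for what the user sees, and `unitsState()` plus `eventLog()` for what the parent received.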

Eight years later, this approach has become mainstream for React/Angular teams, with modern tooling making visual verification through screenshots a powerful addition for stakeholder communication and debugging.

Cypress Component Testing

Branch with code: cypress_instead_of_playwright

Component testing

While the examples in this article demonstrate the pattern using Playwright, the actual implementation in the cypress_instead_of_playwright branch of the car-doppler repo uses Cypress component testing instead, unsurprisingly. Here’s how the two approaches compare:

Comparison Criteria

For evaluating component testing frameworks, we’ll assess each on:

Criterion	Weight	Why It Matters
Performance	High	Fast feedback during development
Developer Experience	High	Learning curve and debugging ease
Browser Support	Medium	Cross-browser validation needs
Ecosystem Integration	Medium	CI/CD and tooling compatibility
Visual Testing	Medium	Screenshot and visual regression capabilities
Setup Complexity	Low	One-time cost, varies by team expertise

This framework will guide our Playwright ↔ Cypress ↔ Selenium comparisons.

Testing Framework Differences

Cypress Component Testing:

Integrated test runner with excellent developer experience
Built-in browser automation and debugging tools
Simpler setup for React component testing
Real-time test runner with automatic re-runs
Exceptional performance for component testing (17x faster than Playwright)

Playwright Component Testing:

Cross-browser testing capabilities (Chrome, Firefox, Safari)
Better for cross-browser validation (Cypress is Chromium-focused)
Better CI/CD integration options
More flexible browser configuration
Broader ecosystem support and active development

Performance Comparison

Cypress by default uses an embedded browser in Electron, though real Chromium or Firefox can be configured.

Cypress Implementation (Actual):

Specs: 3 component test files
Tests: 10 total tests across components  
Duration: ~1.6 seconds total execution
Performance: ~6.25 tests per second
Browser: Electron 130 (headless)
E2E: 11 passing tests in ~10 seconds (with some skipped tests)

Cypress Performance Analysis (100 iterations of identical test):

WITHOUT Screenshots: 0.030s average per test (33.3 tests per second)
WITH Screenshots: 0.190s average per test (5.26 tests per second)
Browser: Electron 130 (headless)
Test: Controls component with Test Harness Component Testing

Playwright Implementation (Article Examples):

Performance: 1.37-1.93 tests per second
WITH Screenshots: 0.730s average per test
WITHOUT Screenshots: 0.517s average per test

Performance Comparison:

Framework	WITHOUT Screenshots	WITH Screenshots
Cypress	0.030s (33.3/sec)	0.190s (5.26/sec)
Playwright	0.517s (1.93/sec)	0.730s (1.37/sec)
Advantage	Cypress 17x faster	Cypress 3.8x faster
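
The rates and ratios in that table are plain arithmetic on the per-test averages. A small sketch to reproduce them, where the variable and function names are mine and the numbers are copied from the table:

```javascript
// Average seconds per test, copied from the comparison table.
const avgSeconds = {
  cypress: { without: 0.030, with: 0.190 },
  playwright: { without: 0.517, with: 0.730 },
};

// Throughput is the reciprocal of the per-test average.
const testsPerSecond = (seconds) => 1 / seconds;
// Speedup is the slower average divided by the faster one.
const speedup = (slower, faster) => slower / faster;

for (const mode of ['without', 'with']) {
  const c = avgSeconds.cypress[mode];
  const p = avgSeconds.playwright[mode];
  console.log(
    `${mode} screenshots: Cypress ${testsPerSecond(c).toFixed(2)}/s, ` +
    `Playwright ${testsPerSecond(p).toFixed(2)}/s, ratio ${speedup(p, c).toFixed(1)}x`
  );
}
```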

The Cypress implementation dramatically outperforms Playwright, likely due to:

Optimized component mounting: Direct React component injection vs external browser process
Command batching: Multiple operations sent as coordinated sequences
Electron efficiency: Highly optimized for automation workloads
Reduced protocol overhead: Fewer round-trips between test runner and browser - batching of long series of interactions with the page under test is a Cypress special-sauce.

Both implementations successfully achieve the core goal: testing components in the “smallest reasonable rectangle” with realistic event coupling and visual verification capabilities.

When to Choose Cypress

Fast development feedback loops

The 17x performance advantage significantly reduces wait times during development, if you’re running all component tests of that class. If you know the test you want to run, or have a smart way of automatically selecting impacted tests, that advantage is less relevant.

Interactive debugging: Time-travel debugging is compelling too, for complex interaction flows

You might not choose Cypress if:

Cross-browser validation: Cypress is primarily limited to Chromium-based browsers, which might be problematic if you need Safari/Firefox
Legacy web applications: Better suited for modern JavaScript frameworks and much harder for (say) ASP.NET-MVC apps.

Cypress’s Command Batching: A Speed Advantage with Mind-Shift Requirements

One of Cypress’s unique architectural advantages is its command batching and queuing system. Unlike traditional testing frameworks that execute commands synchronously, Cypress batches operations and sends them to the browser for execution as a coordinated sequence.

The Batching Advantage:

// Cypress batches these commands and sends them together:
cy.get('[data-testid="record-button"]').click();
cy.get('[data-testid="unit-toggle-button"]').click(); 
cy.get('[data-testid="record-button"]').click();
cy.get('[data-testid="unit-toggle-button"]').click();

// Browser receives: "click these 4 elements in sequence"
// vs traditional approach: 4 separate round-trips

The performance impact is reduced round-trip latency, as multiple operations are sent in one batch, which leads to a network efficiency that can’t be beaten. Large form filling and wait-for strategies are particularly suited to batching.

The Mind-Shift in Simple Terms:

// Thinking synchronously (won't work):
const text = cy.get('[data-testid="button"]').text(); // Returns a command, not text!
if (text === 'Start') { /* This fails */ }

// Thinking in Cypress chains:
cy.get('[data-testid="button"]')
  .should('contain.text', 'Start')  // Assertion happens when command runs
  .click();                         // Click happens after text verification

Why This Matters: Once you understand the queuing concept, Cypress tests become more reliable because assertions auto-retry and commands execute in guaranteed order. I tried the same thing (ish) with FluentSelenium way back. Well, the retry of chains of locators, but not the batching: it was very chatty over however many hops you had between the test and the browser. I’m sure lots of seasoned QE folks initially try to program the old way, before deciding to relent and do things the Cypress way.
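
The queuing idea can be shown with a toy model. This is not how Cypress is implemented and it has no auto-retry; `ToyChain`, its method names and the `page` map are all made up for illustration. It only shows why a chained call hands back the chain rather than a value: each call records a command, and nothing touches the page until the queue runs.

```javascript
// Toy command queue (hypothetical, not Cypress internals).
class ToyChain {
  constructor(page) {
    this.page = page;   // stand-in for the DOM: a map of test id -> element text
    this.queue = [];    // commands are recorded now and executed later
  }

  get(testId) {
    this.queue.push((ctx) => { ctx.subject = testId; });
    return this; // the chain itself, not the element or its text
  }

  shouldContainText(expected) {
    this.queue.push((ctx) => {
      const text = String(this.page[ctx.subject] ?? '');
      if (!text.includes(expected)) {
        throw new Error(`expected "${ctx.subject}" to contain "${expected}", got "${text}"`);
      }
    });
    return this;
  }

  click() {
    this.queue.push((ctx) => { ctx.log.push(`click ${ctx.subject}`); });
    return this;
  }

  // Runs every recorded command in the order it was enqueued.
  run() {
    const ctx = { subject: null, log: [] };
    for (const command of this.queue) command(ctx);
    return ctx.log;
  }
}
```

Calling `new ToyChain({ button: 'Start' }).get('button').shouldContainText('Start').click()` performs no work at all until `run()` is invoked, which is why comparing the return value to 'Start' in an if statement can never succeed.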

Playwright Comparison:

Playwright executes commands more immediately but with more round-trip overhead. For simple tests the difference is negligible, but for complex component interactions the batching advantage compounds.

Cypress Visual Implementation: The Same Pattern, Different Tool

Just as the post a couple of days ago demonstrated with “Test Harness Component Testing with Playwright” screenshots from a particular test, here’s the identical implementation using Cypress component testing. The visual layout and testing approach remain exactly the same - demonstrating the framework-agnostic nature of the pattern.

Recap - what the deployed app looks like

The app deployed in Safari on a smaller-screen iPhone:

Initial Component State

The Cypress implementation produces the same visual pattern as Playwright:

Component Under Test (blue border) - The actual React component being tested
Test Harness State (green border) - Shows the parent component state that would exist in the real app
Event Log (yellow border) - Traces the complete interaction history for debugging

Units Conversion Cycle Demonstration

Initial State: Metric Mode (Switch to mph available)

After Click: Imperial Mode (Switch to km/h available)

After Second Click: Back to Metric Mode

The Cypress implementation captures the same interaction flow:

Button text changing from “Switch to mph” → “Switch to km/h” → “Switch to mph”
Harness state updating from “METRIC (km/h)” → “IMPERIAL (mph)” → “METRIC (km/h)”
Event log accumulating each interaction: “Units changed to imperial” → both events visible

A potential reader question: “Why not just use Storybook?”

Yes, I don’t have Storybook configured as part of the ‘car-doppler’ solution. Storybook and Test Harness Component Testing serve different but complementary purposes, I think:

Storybook excels at:

Visual component documentation and design system management. Perhaps allowing a menu-centric picking of components for multiple larger web applications.
Interactive component exploration for designers and stakeholders
Visual regression testing (Chromatic or similar tools)
Component isolation for development and review
And yes it is an EASY mounting for GET-centric test-automation purposes - I’ve been in teams that have loved this.

A Test Harness Component Testing setup excels at:

Automated verification of all component behaviors and event coupling
Testing realistic parent-child component interactions
Continuous integration validation with assertion-based testing
Debugging component integration issues through event tracing

A Reality, though: Many teams would use both. Storybook for documentation and design review, Test Harness Component Testing for automated quality assurance. They’re solving different problems in the component development lifecycle.

Conclusion

Cypress component testing offers significant performance advantages for teams focused on fast development feedback loops, particularly when working with React/Angular applications. The 17x speed improvement over Playwright for component testing makes it an attractive choice for development workflows that prioritize rapid iteration.

The framework-agnostic nature of Test Harness Component Testing means the same testing patterns and visual verification approaches work across different tools, allowing teams to choose based on their specific needs rather than being locked into a particular testing philosophy.

For teams building component-heavy applications where development speed is critical and cross-browser testing can be handled separately, Cypress component testing provides an excellent balance of performance, developer experience, and debugging capabilities.

It is not easy to see what went where in the diff between the two branches: github.com/paul-hammant/car-doppler/compare/main...cypress_instead_of_playwright, but you can get a sense of how much was changed.
