<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://brokenco.de//atom.xml" rel="self" type="application/atom+xml" /><link href="https://brokenco.de//" rel="alternate" type="text/html" /><updated>2026-04-12T21:39:52+00:00</updated><id>https://brokenco.de//atom.xml</id><title type="html">rtyler</title><subtitle>a moderately technical blog</subtitle><author><name>R. Tyler Croy</name></author><entry><title type="html">Private Open Source</title><link href="https://brokenco.de//2026/04/01/private-open-source.html" rel="alternate" type="text/html" title="Private Open Source" /><published>2026-04-01T00:00:00+00:00</published><updated>2026-04-01T00:00:00+00:00</updated><id>https://brokenco.de//2026/04/01/private-open-source</id><content type="html" xml:base="https://brokenco.de//2026/04/01/private-open-source.html"><![CDATA[<p>Open source communities depend on a fundamental assumption that is no longer
true: the presumption of good faith actors. The hosts serving free and open
source code are scraped relentlessly, denying service to developers. Once that
code has been assimilated into various models it is washed of all attribution
and license information, denying rights of the developers. Some subset of users
then feel empowered, emboldened, I’m not sure what exactly by these models and
lob massive thousand line changes back at the developers. Nearly every
technology has the possibility to be used for positive and negative effects,
but free and open source communities are being harmed from multiple directions
right now.</p>

<p>I am a big believer in <a href="https://openinfra.org/four-opens/">the four opens</a>:</p>

<blockquote>
  <p>The Four Opens are a set of principles guidelines that were created by the
OpenStack community as a way to guarantee that the users get all the benefits
associated with open source software, including the ability to engage with the
community and influence future evolution of the software.</p>

  <ul>
    <li>Open Source</li>
    <li>Open Design</li>
    <li>Open Development</li>
    <li>Open Community</li>
  </ul>
</blockquote>

<p>There is an implied “to the public” in each of the four opens, at least how I
have understood it over the past many (<em>many</em>) years. I have repeatedly
advocated for open (to the public) discourse and transparency when working with
companies like <a href="https://cloudbees.com">CloudBees</a> and
<a href="https://databricks.com">Databricks</a> as they have engaged with open source
projects.</p>

<p>The mounting negative pressures and in some cases <a href="https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/">outright
hostility</a>
towards free and open source projects have me reconsidering the implied “to the
public” and how these communities may need to evolve in the future.</p>

<p>While I have never been a fan of invite-only Discord or Slack servers, both of
which are used by the <a href="https://datafusion.apache.org/contributor-guide/communication.html">Apache
Datafusion</a>
project for some odd reason, there are good reasons to put the project’s shared
spaces in slightly more private and slightly less AI-accessible systems. A
little bit of privacy can lead to more candid conversations and <em>potentially</em> a
stronger feeling of community and safety.</p>

<p>My first line of thinking led me to the idea of “vouching” which I recall
<a href="https://mitchellh.com/writing">mitchellh</a> posting about in the fediverse, but
I couldn’t find a good linkable reference.</p>

<p>Vouching is what we did as kids when a new friend was suggested to join the
mischief, somebody would vouch for the new kid and say “hey, they’re my
neighbor, they’re cool” and then we would go start new trouble together. In the
context of an open source community vouching can:</p>

<ul>
  <li>Help build a web of trust without every person necessarily knowing each new person</li>
  <li>But <em>also</em> vouching means there is a higher tendency for a community to be
homogeneous, since it will be less welcoming to random new-comers.</li>
</ul>

<p>I think vouching could also increase the likelihood of a <a href="https://en.wikipedia.org/wiki/XZ_Utils_backdoor">Jia
Tan</a> where the web of trust
within the community is compromised by a malicious actor. Getting <em>one</em> member
to vouch for you may lower the guard of all of the other members of the
community, making this style of attack easier to pull off.</p>

<p>Since I started writing this post a whole week has passed by, without any new
ideas or patterns popping into mind. I’m curious how others are thinking about
it, so please let me know <a href="https://hacky.town/@rtyler/116329725989266400">on Mastodon</a> or via
email <code class="language-plaintext highlighter-rouge">rtyler@</code>~</p>]]></content><author><name>R. Tyler Croy</name></author><category term="opensource" /><category term="buoyantdata" /><category term="ai" /><summary type="html"><![CDATA[Open source communities depend on a fundamental assumption that is no longer true: the presumption of good faith actors. The hosts serving free and open source code are scraped relentlessly, denying service to developers. Once that code has been assimilated into various models it is washed of all attribution and license information, denying rights of the developers. Some subset of users then feel empowered, emboldened, I’m not sure what exactly by these models and lob massive thousand line changes back at the developers. Nearly every technology has the possibility to be used for positive and negative effects, but free and open source communities are being harmed from multiple directions right now.]]></summary></entry><entry><title type="html">The problem is obeying in advance</title><link href="https://brokenco.de//2026/03/25/do-not-comply.html" rel="alternate" type="text/html" title="The problem is obeying in advance" /><published>2026-03-25T00:00:00+00:00</published><updated>2026-03-25T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/25/do-not-comply</id><content type="html" xml:base="https://brokenco.de//2026/03/25/do-not-comply.html"><![CDATA[<p>Linux power-users tend to have strong opinions about two things: distribution
and systemd. The bazaar of distributions means
competing implementations or different perspectives end up expressed
through the curation of the packaged software. <code class="language-plaintext highlighter-rouge">systemd</code> ended up so
contentious because it’s a decent piece of technology which suffers from
persistent scope creep that became a foundational component in a <em>lot</em> of
distributions. The drama du jour is that systemd is somehow implicated in “age
verification laws.”</p>

<p><code class="language-plaintext highlighter-rouge">systemd</code> as an init system is pretty good. Once upon a time I worked on
porting <a href="https://en.wikipedia.org/wiki/Launchd">launchd</a> to
<a href="https://freebsd.org">FreeBSD</a> and so I have <em>some</em> familiarity with the
silliness of most init systems.</p>

<p><code class="language-plaintext highlighter-rouge">systemd</code> as a <a href="https://en.wikipedia.org/wiki/Katamari_Damacy">katamari</a> at the
root level of most Linux systems is <em>not</em> “pretty good.” There have been
<em>numerous</em> tendrils of what is understood to be “systemd” which are of lesser
quality and have resulted in security issues.</p>

<p>Anyways, I hope you get the point. systemd as an init system: good. systemd as an operating system: bad.</p>

<p>The drama du jour is about the latter.</p>

<hr />

<p>One should not <a href="https://timothysnyder.org/on-tyranny">obey in advance</a>.
Especially in the domain of free and open source software which is <em>literally a
political project</em>.</p>

<p>I stumbled into <a href="https://blog.bofh.it/debian/id_473">this blog post</a> by a
Debian maintainer through <a href="https://planet.debian.org">Planet Debian</a>, and it is
patently absurd.</p>

<blockquote>
  <p>Recently, the free software Nazi bar crowd styling themselves as “concerned
citizens” has tried to start a moral panic by saying that systemd is
implementing age verification checks or that somehow it will require
providing personally identifiable information.</p>
</blockquote>

<p>The author is correct insofar as <code class="language-plaintext highlighter-rouge">systemd</code> did <strong>not</strong> add age verification.
<strong>However</strong> most of the folks upset with the change are upset that their Linux
systems are obeying in advance.</p>

<p>systemd
<strong>did</strong> make changes in order to obey, to take part in anti-free restrictions
under the guise of “age verification.” From the <a href="https://github.com/systemd/systemd/pull/40954">pull
request</a>:</p>

<blockquote>
  <p>Stores the user’s birth date for age verification, as required by recent laws
in California (AB-1043), Colorado (SB26-051), Brazil (Lei 15.211/2025), etc.</p>
</blockquote>

<p>The whole motivation of the change was to <em>obey</em> these unjust laws <em>in advance</em>.</p>

<p>The author then goes on to make some equally absurd claims about how this
functionality is <em>important for parents</em> to implement controls on computers, for
the children! Clearly this person must not know any actual children, or
even parents for that matter. Children are <em>excellent</em> at finding ways
to circumvent restrictions. The idea that a user-modifiable piece of data on
a local machine should be trusted for “parental controls” is so detached from
reality that I originally thought they were making a sarcastic joke.</p>

<p>I think this <a href="https://lists.debian.org/debian-legal/2026/03/msg00018.html">tongue-in-cheek systemd-censord</a> post shows better than anything how absolutely ludicrous this obeying in advance is:</p>

<blockquote>
  <p>Systemd units will be created for every desired censorship function, and will
be started based on the user’s location. For example, a unit for Kazakhstan
will implement the government-required backdoor, a unit for China will
implement keyword scans and web access blocks (more on this later), a unit
for Florida will ban all packages with “trans” in the name (201 packages in
current stable distribution), a unit for Oklahoma will ensure all educational
software is compliant with the Christian Holy Bible, a unit for the entire
United States will prevent installation of any program capable of decoding
DVD or BluRay media, and a unit for California will provide the user’s age to
all applications and all web sites from which applications may be downloaded.
As can be seen, multiple units may be started for a given location.</p>
</blockquote>

<p>Do not obey in advance.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="software" /><category term="opinion" /><category term="linux" /><summary type="html"><![CDATA[Linux power-users tend to have strong opinions about two things: distribution and systemd. The bazaar of distributions means competing implementations or different perspectives end up expressed through the curation of the packaged software. systemd ended up so contentious because it’s a decent piece of technology which suffers from persistent scope creep that became a foundational component in a lot of distributions. The drama du jour is that systemd is somehow implicated in “age verification laws.”]]></summary></entry><entry><title type="html">35E</title><link href="https://brokenco.de//2026/03/22/a-poem-for-united.html" rel="alternate" type="text/html" title="35E" /><published>2026-03-22T00:00:00+00:00</published><updated>2026-03-22T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/22/a-poem-for-united</id><content type="html" xml:base="https://brokenco.de//2026/03/22/a-poem-for-united.html"><![CDATA[<p>35E, 35E.<br />
Stuck here in the middle<br />
of the middle,<br />
35E.</p>

<p>At my height any seat<br />
can feel like misery.</p>

<p>I wouldn’t be here today,<br />
if not for last night’s delay.</p>

<p>35E, 35E<br />
trapped in this humid<br />
sneeze of<br />
humanity.</p>

<p>Everything is expensive, and still it sucks.<br />
The cheapest coffee was four lousy bucks.</p>

<p>The grumpiness was extreme at the TSA<br />
acting in their theatre for deferred pay</p>

<p>35E, 35E<br />
I’m not sure in which<br />
timezone<br />
I should be.</p>

<p>United customer support has been totally outsourced,<br />
hour seventeen on the phone; just the worst.</p>

<p>For the shareholders the texts all lie,<br />
about being powered by GenAI.</p>

<p>35E, 35E<br />
all of the staff<br />
on this flight and the last<br />
have been really kind and patient which is a testament to their professionalism and hospitality despite the overtly customer-hostile environment of modern American commercial aviation.</p>]]></content><author><name>R. Tyler Croy</name></author><summary type="html"><![CDATA[35E, 35E. Stuck here in the middle of the middle, 35E.]]></summary></entry><entry><title type="html">2026 March: Recently Studied Stuff</title><link href="https://brokenco.de//2026/03/21/fresh-from-rss.html" rel="alternate" type="text/html" title="2026 March: Recently Studied Stuff" /><published>2026-03-21T00:00:00+00:00</published><updated>2026-03-21T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/21/fresh-from-rss</id><content type="html" xml:base="https://brokenco.de//2026/03/21/fresh-from-rss.html"><![CDATA[<p>Over the past week I have made a more conscious effort to keep track of some
really interesting articles that came through my feed reader. I am a big fan of
the open web and the power of RSS for disseminating interesting information
from actual people. Below are some really interesting posts I have read recently!</p>

<p><strong><a href="https://felipe.rs/2024/10/23/arrow-over-http/">Compressed Apache Arrow tables over HTTP</a></strong></p>

<p>When discussing transport protocols for sending data between services at work
recently, a colleague asked “why can’t we just yeet Arrow over HTTP?” It turns out, you <a href="https://github.com/apache/arrow-experiments/tree/main/http/get_simple/python">absolutely can</a> and Arrow IPC streams even have a registered MIME type:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Content-Type: application/vnd.apache.arrow.stream
</code></pre></div></div>

<p><strong><a href="https://blog.dataexpert.io/p/parquet-can-shrink-your-data-100x">Understanding Parquet format for beginners</a></strong></p>

<p>A great introduction to the <a href="https://parquet.apache.org">Apache Parquet</a> format
and why it makes so many things better with large data storage systems like
<a href="https://delta.io">Delta Lake</a>. I have written on this
<a href="/tag/parquet.html">topic</a> before and encourage you to take another read
through <a href="https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/">this blog
post</a>
by some maintainers of the <a href="https://crates.io/crates/parquet">parquet</a> crate.</p>

<p><strong><a href="https://apenwarr.ca/log/20260316">Every layer of review makes you 10x slower</a></strong></p>

<blockquote>
  <p>Every layer of approval makes a process 10x slower [..]</p>

  <p>Just to be clear, we’re counting “wall clock time” here rather than effort. Almost all the extra time is spent sitting and waiting.</p>

  <ul>
    <li>Code a simple bug fix: 30 minutes</li>
    <li>Get it code reviewed by the peer next to you: 300 minutes → 5 hours → half a day</li>
    <li>Get a design doc approved by your architects team first: 50 hours → about a week</li>
    <li>Get it on some other team’s calendar to do all that (for example, if a customer requests a feature): 500 hours → 12 weeks → one fiscal quarter</li>
  </ul>
</blockquote>

<p>This inspired these thoughts which I shared with the <a href="https://github.com/delta-io/delta-rs">delta-rs</a> community:</p>

<p>“what if we didn’t require code review for merging into main”</p>

<p>I’m exploring the thought more about what we might need to make that happen.
“Why would you do such a thing, code review is so valuable!”  I do find code
reviews valuable but we do seem to lose a lot of flow time due to timezones,
differing work schedules, and a number of other things. For something without a
lot of changes, especially bug fixes that come with tests, I would be much more
comfortable with maintainers merging once CI goes green.</p>

<p>Some pieces of the puzzle that I think would be needed:</p>

<ul>
  <li>Soft caps on pull requests. I saw this mentioned somewhere else, but implementing a soft cap of &lt;500 lines per pull request can help people avoid massive unreviewable changes in favor of smaller ones that are simpler to integrate.</li>
  <li>Incorporating some of the benchmarking work into CI that has already been explored. If performance of key operations is not affected and the build is green, go for it.</li>
  <li>Stronger semantic version checks: if our APIs have not changed and all tests pass, I’m generally comfortable with landing stuff by maintainers.</li>
  <li>Implementing Apache Software Foundation style release candidates and voting: this is where we would put a mandatory bottleneck, rather than some jokey Slack emojis like I tend to do, implementing a true release candidate process that requires review and vote before we push something to users.</li>
</ul>
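
<p>The soft cap idea from the list above could be checked mechanically. What follows is only a minimal sketch, not anything delta-rs actually runs: it assumes the CI job feeds in the text output of <code class="language-plaintext highlighter-rouge">git diff --numstat</code>, and the function names and the <code class="language-plaintext highlighter-rouge">SOFT_CAP</code> constant are made up for illustration.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical threshold taken from the "&lt;500 lines" idea; not a real project setting.
const SOFT_CAP: usize = 500;

// Sum added + deleted lines from `git diff --numstat` output.
// Each line looks like "&lt;added&gt;\t&lt;deleted&gt;\t&lt;path&gt;"; binary files report "-"
// for both counts, so they fail to parse and are skipped.
fn changed_lines(numstat: &amp;str) -&gt; usize {
    numstat
        .lines()
        .filter_map(|line| {
            let mut cols = line.split('\t');
            let added = cols.next()?.parse::&lt;usize&gt;().ok()?;
            let deleted = cols.next()?.parse::&lt;usize&gt;().ok()?;
            Some(added + deleted)
        })
        .sum()
}

// True when a change reaches the soft cap and deserves a warning, not a hard failure.
fn over_soft_cap(numstat: &amp;str) -&gt; bool {
    changed_lines(numstat) &gt;= SOFT_CAP
}

fn main() {
    // Sample input standing in for real `git diff --numstat` output.
    let sample = "120\t30\tsrc/lib.rs\n-\t-\tassets/logo.png\n400\t0\tsrc/big.rs\n";
    let total = changed_lines(sample);
    if over_soft_cap(sample) {
        println!("warning: {} changed lines reaches the soft cap of {}", total, SOFT_CAP);
    } else {
        println!("ok: {} changed lines", total);
    }
}
</code></pre></div></div>

<p>Because the cap is soft, a check like this would presumably only warn or label the pull request rather than block the merge.</p>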

<p>All of this is to say that reviews can still be requested, but I would love to
see us land more improvements faster and I think we have a bunch of different
schedules that can make pushing each change through a review queue a lot slower
than necessary.</p>

<p><strong><a href="https://www.possiblerust.com/pattern/conditional-impls">Conditional Impls in Rust</a></strong></p>

<blockquote>
  <p>It’s possible in Rust to conditionally implement methods and traits based on
the traits implemented by a type’s own type parameters. While this is used
extensively in Rust’s standard library, it’s not necessarily obvious that
this is possible.</p>
</blockquote>
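
<p>A minimal sketch of what the quote describes, using only the standard library. The <code class="language-plaintext highlighter-rouge">Labeled</code> and <code class="language-plaintext highlighter-rouge">Opaque</code> types are hypothetical names invented for this example and do not come from the linked post:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::fmt::Display;

// A hypothetical generic wrapper used only for illustration.
struct Labeled&lt;T&gt; {
    label: &amp;'static str,
    value: T,
}

// Methods available for every `T`.
impl&lt;T&gt; Labeled&lt;T&gt; {
    fn new(label: &amp;'static str, value: T) -&gt; Self {
        Labeled { label, value }
    }
}

// Conditional methods: `render` only exists when `T` implements `Display`.
impl&lt;T: Display&gt; Labeled&lt;T&gt; {
    fn render(&amp;self) -&gt; String {
        format!("{}={}", self.label, self.value)
    }
}

// Conditional trait impl: `Labeled&lt;T&gt;` is `Clone` only when `T` is `Clone`.
impl&lt;T: Clone&gt; Clone for Labeled&lt;T&gt; {
    fn clone(&amp;self) -&gt; Self {
        Labeled { label: self.label, value: self.value.clone() }
    }
}

// A type that implements neither `Display` nor `Clone`.
struct Opaque;

fn main() {
    let rows = Labeled::new("rows", 42);
    println!("{}", rows.clone().render());

    // Construction still works, but `render()` and `clone()` do not exist for
    // `Labeled&lt;Opaque&gt;`; uncommenting the last line is a compile error.
    let _opaque = Labeled::new("opaque", Opaque);
    // _opaque.render();
}
</code></pre></div></div>

<p>The standard library uses the same pattern, for example <code class="language-plaintext highlighter-rouge">Vec&lt;T&gt;</code> only implements <code class="language-plaintext highlighter-rouge">Clone</code> when <code class="language-plaintext highlighter-rouge">T</code> does.</p>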

<p>I have been vaguely aware of this functionality but haven’t really taken the
time to consider it, so I really appreciated this post walking through the
conditional impl functionality in Rust.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="rss" /><category term="arrow" /><category term="parquet" /><category term="rust" /><summary type="html"><![CDATA[Over the past week I have made a more conscious effort to keep track of some really interesting articles that came through my feed reader. I am a big fan of the open web and the power of RSS for disseminating interesting information from actual people. Below are some really interesting posts I have read recently!]]></summary></entry><entry><title type="html">Only so many sunrises</title><link href="https://brokenco.de//2026/03/15/so-many-sunrises.html" rel="alternate" type="text/html" title="Only so many sunrises" /><published>2026-03-15T00:00:00+00:00</published><updated>2026-03-15T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/15/so-many-sunrises</id><content type="html" xml:base="https://brokenco.de//2026/03/15/so-many-sunrises.html"><![CDATA[<p>With a lot of discussion around intelligence lately, I find myself thinking a lot more about <strong>wisdom</strong>.
Age doesn’t necessarily beget wisdom, but I do think that experience can.
I am always impressed by those who are able to reflect and grow wise from the
varied joys and traumas that shape each one of us.</p>

<p><a href="https://www.youtube.com/watch?v=ZJEnQOsMtsU">This video</a> struck a chord for
me. Contrasting the Bay Area hustle culture to the things that make life worth
living:</p>

<center><iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/ZJEnQOsMtsU?si=fmlv30bzptgNgN_t" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe></center>

<hr />

<p>San Francisco has always been a destination for those seeking their fortunes. The
frenetic enthusiasm radiates through seemingly everything there.</p>

<p>I also really enjoyed the energy of San Francisco when I first moved
there. I had nothing else but work.</p>

<p>The trade-off for my relentless focus on my career was a tremendous level up in
a short amount of time. I wouldn’t be where I am today without a few years of
judicious networking and workaholism.</p>

<p>San Francisco was “Lord of the Flies” when I described it to friends from
elsewhere. Awash in adult boys, untethered from the real world. I would hang
out with men 10 years older than me who were doing the same dumb shit I was,
except I was in my early twenties, a commonly accepted time to be foolish.</p>

<p>I did not want to end up like them and increasingly put both physical and
mental distance between them and myself.</p>

<p>There is more to life than panning for gold.</p>

<hr />

<p>Today I was talking with an elder almost twice as old as me, who casually
offered:</p>

<blockquote>
  <p>I still get up at 5am; at this age there are only so many sunrises left to
see.</p>
</blockquote>

<p>I’m going to try to not stay up too late dwelling on the comment, lest I miss
tomorrow’s sunrise.</p>]]></content><author><name>R. Tyler Croy</name></author><summary type="html"><![CDATA[With a lot of discussion around intelligence lately, I find myself thinking a lot more about wisdom. Age doesn’t necessarily beget wisdom, but I do think that experience can. I am always impressed by those who are able to reflect and grow wise from the varied joys and traumas that shape each one of us.]]></summary></entry><entry><title type="html">Based Lake, a petabyte-scale low-latency data lake</title><link href="https://brokenco.de//2026/03/10/based-lake.html" rel="alternate" type="text/html" title="Based Lake, a petabyte-scale low-latency data lake" /><published>2026-03-10T00:00:00+00:00</published><updated>2026-03-10T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/10/based-lake</id><content type="html" xml:base="https://brokenco.de//2026/03/10/based-lake.html"><![CDATA[<p>I had a chat today about building large scale low-latency data retrieval
systems around AWS S3. In doing so I got to share a bit of the talk proposal I
submitted to <a href="https://dataaisummit.com">Data and AI Summit</a> this year about
real-life work that has made it into production.</p>

<p>For years the conventional wisdom around <a href="https://delta.io">Delta Lake</a> has
been to <strong>not</strong> connect user-facing/online systems to Delta tables. Basically,
don’t point your Django app at your Delta tables. This continues to be a decent
<em>guideline</em> but definitely <strong>not a rule</strong> and I have the performance data to
back that up.</p>

<p>My talk abstract:</p>

<blockquote>
  <p>Scribd hosts hundreds of millions of documents and has hundreds of billions of
objects across our buckets. Combining large-language models with massive
amounts of text has required investment in our new Content Library
architecture.  We selected Delta Lake as the underlying storage technology but
have pushed it to an extreme. Using the same Delta Lake architecture we offer
both direct data access for data scientists in Databricks Notebooks and online
data retrieval in milliseconds for user-facing web services.</p>

  <p>In this talk we will review principles of performance for each layer of the
stack: web APIs, the Delta Lake tables, Apache Parquet, and AWS S3.</p>
</blockquote>

<p>The work done by myself and my colleague Eugene in this area has been heavily
related to my previous research around <a href="/2025/06/24/low-latency-parquet.html">Low latency Parquet
reads</a> which informed work named <a href="https://tech.scribd.com/blog/2026/content-crush.html">Content
Crush</a>, which I have
explored more on the Scribd tech blog and on the <a href="/2026/02/13/screaming-in-the-cloud.html">Screaming in the
Cloud</a> podcast.</p>

<p>I really hope that I am able to share results at Data and AI Summit from this
incredibly challenging work that I am undertaking. But even if I don’t, blog
posts like my musings on <a href="/2026/01/19/multimodal-delta-lake.html">Multimodal with Delta
Lake</a>, <a href="https://www.buoyantdata.com/blog/2024-12-31-high-concurrency-logstore.html">scaling streaming Delta Lake
applications</a>,
and a myriad of other articles I have published can be pieced together to form
the larger mosaic of insane large-scale data work I have been hammering on!</p>]]></content><author><name>R. Tyler Croy</name></author><category term="arrow" /><category term="parquet" /><category term="deltalake" /><category term="databricks" /><category term="scribd" /><summary type="html"><![CDATA[I had a chat today about building large scale low-latency data retrieval systems around AWS S3. In doing so I got to share a bit of the talk proposal I submitted to Data and AI Summit this year about real-live work that has made it into production.]]></summary></entry><entry><title type="html">Using tmux with bhyve</title><link href="https://brokenco.de//2026/03/10/bhyve-tmux.html" rel="alternate" type="text/html" title="Using tmux with bhyve" /><published>2026-03-10T00:00:00+00:00</published><updated>2026-03-10T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/10/bhyve-tmux</id><content type="html" xml:base="https://brokenco.de//2026/03/10/bhyve-tmux.html"><![CDATA[<p>Many aspects of FreeBSD follow the user-friendly unix philosophy, it’s just choosy about who its friends are. <a href="https://en.wikiquote.org/wiki/Unix">1</a>
I have always found <a href="https://bhyve.org/">bhyve</a> virtualization to be really
interesting but really unfriendly. The
<a href="https://github.com/freebsd/vm-bhyve">vm-bhyve</a> management system was what
finally cracked <code class="language-plaintext highlighter-rouge">bhyve</code> open and made it usable for me. The <code class="language-plaintext highlighter-rouge">vm</code> command has
paper cuts but generally speaking it does what I want on my primary FreeBSD
machine.</p>

<p>For the longest time I used the built-in VNC support to connect to machines
because the <code class="language-plaintext highlighter-rouge">vm console</code> command would use <code class="language-plaintext highlighter-rouge">/usr/bin/cu</code> which would
<em>inevitably</em> trap my console and no amount of <code class="language-plaintext highlighter-rouge">~&gt;~&gt;~&gt;&lt;~D~&gt;D&lt;S~&gt;D&lt;~&gt;&lt;D~&lt;L</code> would
help me exit.</p>

<p>Somewhere along the line <a href="https://github.com/tmux/tmux/wiki">tmux</a> support was
<a href="https://github.com/freebsd/vm-bhyve/wiki/Using-tmux">added to <code class="language-plaintext highlighter-rouge">vm-bhyve</code></a> and now <code class="language-plaintext highlighter-rouge">vm console &lt;name&gt;</code> simply opens up a new tmux
window!</p>

<p>I host everything under <code class="language-plaintext highlighter-rouge">/vm</code> on the machine, so in <code class="language-plaintext highlighter-rouge">/vm/.config/system.conf</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>console="tmux"
</code></pre></div></div>

<p>This seems like a simple thing to be excited about, and it is, but it makes
VMs <em>wildly</em> more accessible and useful for me.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="freebsd" /><summary type="html"><![CDATA[Many aspects of FreeBSD follow the user-friendly unix philosophy, it’s just choosy about who its friends are. 1 I have always found bhyve virtualization to be really interesting but really unfriendly. The vm-bhyve management system was what finally cracked bhyve open and made it usable for me. The vm command has paper cuts but generally speaking it does what I want on my primary FreeBSD machine.]]></summary></entry><entry><title type="html">A Few Good Humans</title><link href="https://brokenco.de//2026/03/05/a-few-good-humans.html" rel="alternate" type="text/html" title="A Few Good Humans" /><published>2026-03-05T00:00:00+00:00</published><updated>2026-03-05T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/05/a-few-good-humans</id><content type="html" xml:base="https://brokenco.de//2026/03/05/a-few-good-humans.html"><![CDATA[<p>The tool is only as good as its training data. A developer I work with was
expressing some frustrations with the strong encouragement <em>but definitely not
a mandate</em> to use AI-assisted coding tools. They were feeling gaslit because
they were told this was going to 10x their productivity and instead it had led
them <em>significantly</em> astray and ended up wasting much more time than it saved!</p>

<p>The developer was trying to write some brand new Terraform for the project they
were working on, not their area of expertise but it needed to be done. They had
an experience which I recognized from my explorations with earlier models where
I <em>also</em> just wanted the model to write some miserable Terraform resources
because I didn’t want to myself. Except when prompted, it was as if I asked
“<em>write me some Terraform to provision this resource, wrong answers only.</em>”</p>

<p>I <em>also</em> thought I must be insane and just using the magic numbers box
incorrectly. Seeing the exact same situation play out for another developer,
with a different model, years later, I feel like I understand the problem!</p>

<p><strong>There is not enough open source <em>use</em> of Terraform to copy</strong>.</p>

<p>Plenty of great <a href="https://github.com/terraform-aws-modules/">open source terraform modules
exist</a> (Putin khuylo!) but perilously
few open source examples exist of <em>using</em> Terraform. I believe this to be
because the vast majority of Terraform is <em>closed source</em> and therefore not
scraped and ingested into these models.</p>

<p>If code-generating AI tools don’t want to suffer from the <a href="https://en.wikipedia.org/wiki/Dead_Internet_theory">dead internet
theory</a>, the data has to
come from somewhere.</p>

<p>The machine relies on code being <em>open sourced</em>.</p>

<hr />

<p><strong>LTJG Kaffee</strong>: Colonel Jessup! Did you author that code?</p>

<p><strong>Judge Randolph</strong>: You don’t have to answer that question!</p>

<p><strong>Col Jessup</strong>: I’ll answer the question. You want answers?</p>

<p><strong>LTJG Kaffee</strong>: I think I’m entitled to them.</p>

<p><strong>Col Jessup</strong>: You want answers?!</p>

<p><strong>LTJG Kaffee</strong>: I want the truth!</p>

<p><strong>Col Jessup</strong>: You can’t handle the truth!</p>

<p>Son, we live in a world that has models, and those models have to be seeded by
humans who code.</p>

<p>Who’s gonna do it? You? You, Lieutenant Weinberg?</p>

<p>I have a greater responsibility than you can possibly fathom.</p>

<p>You weep for the roadmap, and you curse the upstream. You have that luxury. You
have the luxury of not knowing what I know – that the launch delay, while
tragic, probably saved time; and my existence, while grotesque and
incomprehensible to you, saves time.</p>

<p>You don’t want the truth because deep down in places you don’t talk about at
parties, you want me feeding that model  – you need me feeding that model.</p>

<p>We use words like “fork,” “code,” “libre.” We use these words as the backbone of a life spent supporting something. You use them as a punch line.</p>

<p>I have neither the time nor the inclination to explain myself to a man who
rises and sleeps under the blanket of the productivity that I provide and then
questions the manner in which I provide it.</p>

<p>I would rather that you just said “thank you” and went on your way. Otherwise,
I suggest you pick up an editor and submit a PR.</p>

<p>Either way, I don’t give a DAMN what you think you’re entitled to!</p>

<p><strong>LTJG Kaffee</strong>: Did you author that code?</p>

<p><strong>Col Jessup</strong>: I did the job –</p>

<p><strong>LTJG Kaffee</strong>: – Did you author the code?</p>

<p><strong>Col Jessup</strong>: YOU’RE GOD DAMN RIGHT I DID!</p>

<hr />]]></content><author><name>R. Tyler Croy</name></author><category term="software" /><category term="ai" /><category term="llm" /><summary type="html"><![CDATA[The tool is only as good as its training data. A developer I work with was expressing some frustrations with the strong encouragement but definitely not a mandate to use AI-assisted coding tools. They were feeling gaslit because they were told this was going to 10x their productivity and instead it had led them significantly astray and ended up wasting much more time than it saved!]]></summary></entry><entry><title type="html">DCO and AI is a no-go.</title><link href="https://brokenco.de//2026/03/02/copyright-ai.html" rel="alternate" type="text/html" title="DCO and AI is a no-go." /><published>2026-03-02T00:00:00+00:00</published><updated>2026-03-02T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/02/copyright-ai</id><content type="html" xml:base="https://brokenco.de//2026/03/02/copyright-ai.html"><![CDATA[<p>The phrases “generative AI” and “copyright” evoke a multitude of stories about
unauthorized training, scraping, and violation of norms. The thought that
somebody could then turn around and try to copyright works <em>generated</em> by
these large language models is absurd, but in 2026 anything kind of goes,
doesn’t it?</p>

<p>One of the big arguments <em>against</em> generative AI-based coding
tools is that they were trained on billions of lines of <em>copyrighted</em> and
licensed works in the open source ecosystem, and they strip those works of all
attribution and violate the terms of the licenses.</p>

<p>Yesterday there was some fervor in my timeline about <a href="https://www.theverge.com/policy/887678/supreme-court-ai-art-copyright">the Supreme Court allowing a lower court
decision to
stand</a>. I have been following this topic for a few weeks after reading this
<a href="https://www.congress.gov/crs_external_products/LSB/PDF/LSB10922/LSB10922.8.pdf">Congressional Research Service report: Generative Artificial Intelligence and
Copyright
Law</a>
(PDF) and considering how some of the guidance might affect the use of
generative AI in open source projects.</p>

<p><a href="https://toot.cat/@zkat/116162089501237946">kat nailed it with their toot</a>:</p>

<blockquote>
  <p>So uh</p>

  <p>does this mean that there is now precedent that at least “agentic” dev systems,
potentially any genAI dev system, now leaves companies open to their code no
longer being considered copyrightable if they use these systems?</p>
</blockquote>

<p>kat is right that <strong>this is huge</strong>.</p>

<p>In the <a href="https://github.com/delta-io">Delta Lake project</a> we rely on the <a href="https://en.wikipedia.org/wiki/Developer_Certificate_of_Origin">Developer
Certificate of
Origin</a> (DCO) with
guidance from the Linux Foundation. Yes, <a href="https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation">that Linux Foundation</a>.</p>

<p>From the DCO:</p>

<blockquote>
  <p>The contribution was created in whole or in part by me and I have the right
to submit it under the open source license indicated in the file; or</p>
</blockquote>

<p>From the <a href="https://www.apache.org/licenses/LICENSE-2.0.html">Apache License</a>:</p>

<blockquote>
  <p>“Licensor” shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.</p>

  <p>[..]</p>

  <p>“Contributor” shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.</p>
</blockquote>

<p>I am no lawyer but I do have at least a 12th grade reading comprehension level.</p>

<p>If AI-generated works are not copyrightable, then it is not possible for
somebody to <em>license</em> them under any open source license, much less assert via DCO
that they are able to do so.</p>

<hr />
<p><strong>2026-03-04 update:</strong> Software Freedom Conservancy has <a href="https://sfconservancy.org/blog/2026/mar/04/scotus-deny-cert-dc-circuit-thaler-appeal-llm-ai/">a blog
post</a>
on the topic which is worth a read. I understand the narrow scope of the
judgement being referred to, but my opinion on the situation takes
into consideration the other guidance from the Congressional Research Service report
and the US Copyright Office policies as they stand today.</p>

<hr />

<p>Again, this is a big deal.</p>

<p>From the Congressional Research Service report:</p>

<blockquote>
  <p>The AI Guidance states that authors may claim copyright protection only “for
their own contributions” to such works, and they must identify and disclaim
AI-generated parts of the works when applying to register their copyright.</p>
</blockquote>

<p>The only viable solution I can imagine is that all AI-generated code
contributions in open source projects are considered public domain and commented
appropriately. Otherwise I don’t see a sensible path forward for
AI-generated code in open source projects.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="opinion" /><category term="opensource" /><category term="ai" /><category term="llm" /><summary type="html"><![CDATA[The phrases “generative AI” and “copyright” evoke a multitude of stories about unauthorized training, scraping, and violation of norms. The thought that somebody could then turn around and then try to copyright works generated by these large language models is absurd, but in 2026 anything kind of goes doesn’t it?]]></summary></entry><entry><title type="html">The value of efficient software</title><link href="https://brokenco.de//2026/02/23/value-of-efficiency.html" rel="alternate" type="text/html" title="The value of efficient software" /><published>2026-02-23T00:00:00+00:00</published><updated>2026-02-23T00:00:00+00:00</updated><id>https://brokenco.de//2026/02/23/value-of-efficiency</id><content type="html" xml:base="https://brokenco.de//2026/02/23/value-of-efficiency.html"><![CDATA[<p>The value of efficient and thoughtfully designed software is going to continue
to grow. What I never expected was for the “AI” data center to be the catalyst
that could help many organizations understand that argument!</p>

<p>Today Hetzner, a major cloud services provider in Europe, <a href="https://www.hetzner.com/pressroom/statement-price-adjustment/">announced</a>:</p>

<blockquote>
  <p>There have been drastic price increases in various areas in the IT sector
recently. That is why, unfortunately, we must also increase the prices of our
products.</p>

  <p>The costs to operate our infrastructure and to buy new hardware have both
increased dramatically. Therefore, our price changes will affect both
existing products and new orders and will take effect starting on 1 April
2026.</p>
</blockquote>

<p>Last year for Earth Day I wrote <a href="https://www.buoyantdata.com/blog/2025-04-22-rust-is-good-for-the-climate.html">on the Buoyant Data blog</a>:</p>

<blockquote>
  <p>Time is money. In the cloud time is measured and billed by the vCPU/hour and
the most efficient software is always the cheapest.</p>
</blockquote>

<p>Nothing makes the case for more efficient software like more expensive
hardware!</p>

<p>In the past five years I have <em>repeatedly</em> seen success in taking a system
written on a less-efficient platform, redesigning and rebuilding it in Rust, and
reaping the rewards in lower operational costs.</p>

<p>For a simple exercise, imagine a service which costs $100,000/year to operate,
or roughly $1,900 a week. Assuming a developer’s time costs roughly $6,000
a week, taking a month to rebuild the service might cost $25,000. The
cost reduction needed is then only about 25% to pay off that rewrite in a year, but
what I have consistently seen is an <em>order of magnitude</em> change in efficiency.</p>

<p>Instead of costing $100k, these newly deployed services tend to cost less than
10-20% of their predecessors, recouping the cost of conversion in a few
months and freeing up money to go towards different investments.</p>
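
<p>For anybody who wants to plug in their own numbers, here is a rough sketch of
the exercise above in Python. The function names are made up for illustration,
the $25,000 rewrite cost is the rounded estimate from the exercise, and the
model assumes the savings start the moment the new service is deployed, which
is an optimistic simplification.</p>

```python
# Back-of-the-envelope payback model for a rewrite.
# All figures are the illustrative ones from the exercise, not measurements.
ANNUAL_SERVICE_COST = 100_000  # dollars per year to run the existing service
REWRITE_COST = 25_000          # rounded estimate for a month of developer time


def break_even_savings(rewrite_cost, annual_cost):
    """Fraction of the annual bill that must be saved to repay the rewrite in one year."""
    return rewrite_cost / annual_cost


def payback_months(rewrite_cost, annual_cost, new_cost_fraction):
    """Months until the accumulated monthly savings equal the cost of the rewrite.

    new_cost_fraction is the new service's cost relative to the old one,
    e.g. 0.10 means the rebuilt service costs 10% of its predecessor.
    """
    monthly_savings = annual_cost * (1 - new_cost_fraction) / 12
    return rewrite_cost / monthly_savings


if __name__ == "__main__":
    needed = break_even_savings(REWRITE_COST, ANNUAL_SERVICE_COST)
    print(f"savings needed to break even in a year: {needed:.0%}")
    for fraction in (0.10, 0.20):
        months = payback_months(REWRITE_COST, ANNUAL_SERVICE_COST, fraction)
        print(f"new service at {fraction:.0%} of old cost: paid back in {months:.1f} months")
```

<p>Changing <code>new_cost_fraction</code> is a quick way to see how sensitive the
payback period is to the efficiency gain you actually achieve.</p>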

<p>The biggest cost to contend with is opportunity cost and that one is <em>much</em>
harder to model, and also much less subject to changing prices by your vendors.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="software" /><category term="cloud" /><category term="opinion" /><category term="rust" /><summary type="html"><![CDATA[The value of efficient and thoughtfully designed software is going to continue to grow. What I never expected was for the “AI” data center to be the catalyst that could help many organizations understand that argument!]]></summary></entry></feed>