<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://brokenco.de//atom.xml" rel="self" type="application/atom+xml" /><link href="https://brokenco.de//" rel="alternate" type="text/html" /><updated>2026-05-03T00:12:50+00:00</updated><id>https://brokenco.de//atom.xml</id><title type="html">rtyler</title><subtitle>a moderately technical blog</subtitle><author><name>R. Tyler Croy</name></author><entry><title type="html">Butterflia</title><link href="https://brokenco.de//2026/05/02/butterflia.html" rel="alternate" type="text/html" title="Butterflia" /><published>2026-05-02T00:00:00+00:00</published><updated>2026-05-02T00:00:00+00:00</updated><id>https://brokenco.de//2026/05/02/butterflia</id><content type="html" xml:base="https://brokenco.de//2026/05/02/butterflia.html"><![CDATA[<p>There’s a dead deer in the bin. The burial while unceremonious was not without
a deep sadness for an animal that didn’t know I exist.</p>

<p>My house is in the shockingly short boundary where city turns into wild. Troops
of wild turkeys march through, the jack rabbits scurry away when you walk up
the hill, and deer leap across the road heading from the creek on one side to
the hills on the other.</p>

<p>I purchased this house from the woman who built it with her husband. Frank
passed away about 20 years ago, but he still visits. His wife Marilyn moved to
Sacramento and is an avid gardener. She cultivated a rich variety of flowers,
bushes, and trees under the towering oaks of the home she built.</p>

<p>I let most of the non-native plants wither, encouraging those which could
survive the wet winters and dry summers. Continuing Marilyn’s garden would have
taken too much time and water, neither of which is in abundance these days.
Nonetheless I still feel a sense of responsibility to the land, the animals,
and the oaks.</p>

<p>Returning from a week of travel, I was shocked to spot a fat doe lounging on
the hillside. Deer are common but I have never seen a deer <em>just lying on the
ground</em>. A couple days later, while I was walking through the backyard, the doe, who was
given the name “Butterflia,” was just as surprised to see me as I was to see
her. Later that evening I noticed her again further down the hill, a tiny fawn
wobbling between her legs.</p>

<p>Last year a doe was struck and killed by a car on the road that runs alongside
my house. The following day I discovered a fawn with a broken hind leg in the
backyard. The nice woman from <a href="https://www.fawnrescue.org/">Fawn Rescue of Sonoma
County</a> and I cautiously crept through the woods
with outstretched beach towels but could not catch the injured fawn and it
escaped.</p>

<p>I called them back to see if they could help me relocate this doe and fawn. The
same nice woman assured me that they would probably move on in a week or so,
and to call back if their situation deteriorated.</p>

<p>Over the weekend I spotted a second fawn. Butterflia had twins! The backyard
was off limits to ensure that neither Butterflia nor her fawns would be scared
into the road. I am happy to share my space with such relatively benign neighbors.</p>

<p>By the end of the following week there was only one fawn that I regularly saw
with Butterflia.  During the day when I knew she was out I swept through the
backyard to see if I could find a body, but never learned what happened with
that fawn.</p>

<p>For whatever reason Butterflia had decided to call the backyard home.</p>

<p>A week later, on a Sunday, while carefully walking through the backyard to get
something from the shed, I discovered the other fawn’s newly deceased body
behind a rock near the house. The job of coroner would have to wait until
Monday after work; Sunday’s plans did not include a dead fawn.</p>

<p>Butterflia discovered the body Monday morning around 4am. I was awakened by the
mournful bellowing of a mother discovering the body of her child. It was
absolutely <strong>gut-wrenching</strong>. I busied myself on the other side of
the house, periodically checking back in the bedroom, only to hear Butterflia’s
continued sorrow. She painfully wailed for almost three hours and spent the
rest of the day near the rock. I would see her revisiting the body, poking her
head under the leaves of the bushes as if in disbelief.</p>

<p>She was mourning and I found myself mourning as well.</p>

<p>I let the body rest for a couple days. I wanted to make sure Butterflia had
left before I did anything, out of some sense of respect. Using an old spade I
scooped up the fawn and gently placed it in the bin, covered it with dirt, and
then some leaves and straw.</p>

<p>The week came and went without any additional sightings of Butterflia.  Her
sadness affected me more deeply than I anticipated. I don’t know if she mourned
the first fawn, or even knew it was gone. I imagined her grief was compounded
by losing both of her babies, the early morning anguish heavy with the
knowledge that they both died.</p>

<p>She must have moved on, hopefully to greener pastures.</p>

<p>I swept through the yard this weekend to see if there were other
remnants of deer habitation to be cleaned up. After completing my 
yard work, I startled Butterflia as she gingerly walked through the yard.</p>

<p>I said hello with a warm smile, pleased to see her, because I’m human.</p>

<p>She turned and looked at me dumbly, because she’s a deer.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="opinion" /><summary type="html"><![CDATA[There’s a dead deer in the bin. The burial while unceremonious was not without a deep sadness for an animal that didn’t know I exist.]]></summary></entry><entry><title type="html">2026 April: Recently Studied Stuff</title><link href="https://brokenco.de//2026/04/30/fresh-from-rss.html" rel="alternate" type="text/html" title="2026 April: Recently Studied Stuff" /><published>2026-04-30T00:00:00+00:00</published><updated>2026-04-30T00:00:00+00:00</updated><id>https://brokenco.de//2026/04/30/fresh-from-rss</id><content type="html" xml:base="https://brokenco.de//2026/04/30/fresh-from-rss.html"><![CDATA[<p>Similar to last month I have given more intention to some of the interesting
things that I have stumbled across in my feed reader or the fediverse. Rather
than just a quip, boost, or reply, I have wanted to consolidate these thoughts
with more permanence here on my blog.</p>

<p>Chris’ talk below at <a href="https://northbaypython.org/">North Bay Python</a> was, as
his always are, well-delivered and worth consideration.</p>

<center><iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/d7AeWFbOTHg?si=zW0bHhRpj--dsrdW" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe></center>

<p>The conclusion that he
draws towards the end is similar to something I was <a href="/2025/09/20/sacrificing-the-understanding.html">noodling last
year</a>:</p>

<blockquote>
  <p>At some point somebody, somewhere, is going to have to actually understand
how things work.</p>
</blockquote>

<p>Chris makes the point, as he typically does, much more thoughtfully and with a
stronger philosophical base.</p>

<hr />

<p>Had some discussions with the <a href="https://github.com/delta-io/delta-kernel-rs">delta-kernel-rs</a> developers after they mistakenly added a <em>ton</em> of new files to <code class="language-plaintext highlighter-rouge">tests/</code>, blowing up test cycle times, since Cargo compiles every file there into its own test binary. Another community member shared <a href="https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html">this great overview</a> about <strong>not</strong> using Cargo integration tests.</p>

<hr />

<p>Catching up on <a href="https://open.substack.com/pub/dataengineeringcentral/p/revisiting-data-quality?utm_source=share&amp;utm_medium=android&amp;r=cxg56">Daniel’s thoughts on Data
Quality</a>
and reconsidering the domain. The generation of slop has resulted in renewed
discussions of “but how do we ensure correctness?” which is a great question to
be trying to answer, but I am still rather disappointed with the state of the
art for data quality tooling.</p>

<hr />

<p>I recommend <a href="https://etbe.coker.com.au/2026/03/29/communication-hostile-ais/">this blog
post</a> which
has some good citations for negative AI behaviors affecting free and open
source communities.</p>

<blockquote>
  <p>This is going to be a difficult problem to solve, more difficult than the
email spam problem we have been unable to solve after 30
years of working on it.</p>

  <p>This is also a very important problem, we are currently in an age where we have
access to information that most people couldn’t even dream of 30 years ago. We
also have disinformation that combines some of the worst aspects of
authoritarian regimes throughout history combined with the worst aspects of
cult brainwashing. If we lose access to the information but the disinformation
remains (or get worse) then the result will be terrible.</p>
</blockquote>

<hr />

<p>I really enjoy <a href="https://planet.debian.org">Planet Debian</a> as an aggregator of an international set of voices from the Debian community. I get exposed to so many different viewpoints from around the free software ecosystem, which I really value. This past week I read
<a href="https://blog.bofh.it/debian/id_473">this blog post</a> by a Debian maintainer, which flummoxed me so much that I <a href="/2026/03/25/do-not-comply.html">wrote out my thoughts on the topic here</a>.</p>

<hr />

<p>Streaming tar over SSH is one of the more novel Unix tricks I don’t get to use
much anymore. <a href="https://drewdevault.com/2026/03/28/2026-03-28-rsync-without-rsync.html">Drew
DeVault</a>
shared some helpful tips for using it without needing to use incantations of
<code class="language-plaintext highlighter-rouge">rsync(1)</code>.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="rss" /><category term="deltalake" /><category term="data" /><category term="dataengineering" /><category term="opensource" /><summary type="html"><![CDATA[Similar to last month I have given more intention to some of the interesting things that I have stumbled across in my feed reader or the fediverse. Rather than just a quip, boost, or reply, I have wanted to consolidate these thoughts with more permanance here to my blog.]]></summary></entry><entry><title type="html">Unity Catalog with S3 Access Points</title><link href="https://brokenco.de//2026/04/25/unity-s3-access-points.html" rel="alternate" type="text/html" title="Unity Catalog with S3 Access Points" /><published>2026-04-25T00:00:00+00:00</published><updated>2026-04-25T00:00:00+00:00</updated><id>https://brokenco.de//2026/04/25/unity-s3-access-points</id><content type="html" xml:base="https://brokenco.de//2026/04/25/unity-s3-access-points.html"><![CDATA[<p>Governance is the synergy of our era. If I could go one week without a
discussion around governance that really just boils down to classic role-based
access control practices..</p>

<p>The bad news I have for you is that today, in the year 2026, <strong>Unity Catalog
does not work with S3 Access Points.</strong></p>

<p><em>However</em> it does show a different pathology than it once did, which leads me
to believe that it <em>could</em>, if not for one silly little piece of technical
debt.</p>

<hr />

<p>The system I am building utilizes <a href="https://aws.amazon.com/s3/features/access-points/">Amazon S3 Access
Points</a> for <em>governance</em> but
must integrate into the <a href="https://databricks.com">Databricks</a> platform. A
platform which has its <em>own</em> ideas about governance: <a href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/">Unity
Catalog</a>. It
should come as no surprise that a system which was named <em>unity</em> would go to
great lengths to make itself the center of the universe.</p>

<p>How troublesome!</p>

<p>Years ago a colleague and I tried to integrate Databricks Unity Catalog and S3
Access Points only for the approach to crash and burn. Integrating two
different opaque tools like IAM permissions and Unity Catalog led to all sorts
of attempted incantations, none of which actually succeeded.</p>

<p>The Databricks product team told us that the system did not support S3 Access
Points “by design.” I found the reasoning <em>very</em> patronizing because it was
presented as “we don’t support S3 Access Points by design to prevent users from
circumventing Unity access controls.”</p>

<p>What I understand now is that “by design” was more of an excuse for “we just
don’t want to test it” than something substantive.</p>

<p>S3 Access Points can be <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-points-naming.html">referenced a number of
ways</a>,
including S3 Access Point Aliases, so that even the most legacy system can integrate
with them.</p>

<blockquote>
  <p>An access point alias name meets all the requirements of a valid Amazon S3
 bucket name and consists of the following parts:</p>
</blockquote>

<p>The first time we bounced off this problem, <a href="https://aws.amazon.com/about-aws/whats-new/2021/07/amazon-s3-access-points-aliases-allow-application-requires-s3-bucket-name-easily-use-access-point/">S3 Access Point
Aliases</a>
had only recently been released.</p>

<p>Despite all of Unity Catalog’s protestations, the errors we ended up seeing don’t
convey a structural limitation when using S3 Access Point Aliases. Instead they
point to simply outdated SDK support in the underlying Databricks Runtime.</p>

<p>My hunch is that the culprit is the AWS SDK v1, which was <a href="https://aws.amazon.com/blogs/developer/announcing-end-of-support-for-aws-sdk-for-java-v1-x-on-december-31-2025/">announced as deprecated over two years
ago</a>
and has been <em>completely</em> deprecated as of the end of 2025. Lots of Databricks and
other Spark <a href="https://hadoop.apache.org/docs/current3/hadoop-aws/tools/hadoop-aws/aws_sdk_upgrade.html">libraries still interact with S3 via the v1
SDK</a>.
That SDK was originally released in 2010 (lol) and so it’s likely that the
authentication issue we were seeing was mixed up in the support
for S3 Access Point Aliases with this old SDK.</p>

<p>Since we bounced off this problem a number of years ago one thing that has
changed for the better in Unity Catalog is that it is now possible to grant
Unity a completely read-only configuration in IAM-based S3 bucket policies.
While we cannot use S3 Access Points as part of our governance strategy, we can
at least still grant a fairly limited permission to Unity for read-only
operations.</p>

<p>Now I can have my esoteric <a href="https://delta.io">Delta Lake</a> datastores present in
Unity without any risk of misconfiguration or error in Unity leading to data
corruption!</p>

<p><strong>Governance</strong> to a lot of enterprise vendors is about <em>centralization of
control</em>, but for me it’s about <a href="https://en.wikipedia.org/wiki/Defence_in_depth">defense in
depth</a>.
I never want a business-critical system to be a single
misconfiguration away from granting read or write access to the wrong
principal.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="databricks" /><category term="deltalake" /><summary type="html"><![CDATA[Governance is the synergy of our era. If I could go one week without a discussion around governance that really just boils down to classic role-based access control practices..]]></summary></entry><entry><title type="html">Private Open Source</title><link href="https://brokenco.de//2026/04/01/private-open-source.html" rel="alternate" type="text/html" title="Private Open Source" /><published>2026-04-01T00:00:00+00:00</published><updated>2026-04-01T00:00:00+00:00</updated><id>https://brokenco.de//2026/04/01/private-open-source</id><content type="html" xml:base="https://brokenco.de//2026/04/01/private-open-source.html"><![CDATA[<p>Open source communities depend on a fundamental assumption that is no longer
true: the presumption of good faith actors. The hosts serving free and open
source code are scraped relentlessly, denying service to developers. Once that
code has been assimilated into various models it is washed of all attribution
and license information, denying rights of the developers. Some subset of users
then feel empowered, emboldened, I’m not sure what exactly, by these models and
lob massive thousand-line changes back at the developers. Nearly every
technology has the possibility to be used for positive and negative effects,
but free and open source communities are being harmed from multiple directions
right now.</p>

<p>I am a big believer in <a href="https://openinfra.org/four-opens/">the four opens</a>:</p>

<blockquote>
  <p>The Four Opens are a set of principles guidelines that were created by the
OpenStack community as a way to guarantee that the users get all the benefits
associated with open source software, including the ability to engage with the
community and influence future evolution of the software.</p>

  <ul>
    <li>Open Source</li>
    <li>Open Design</li>
    <li>Open Development</li>
    <li>Open Community</li>
  </ul>
</blockquote>

<p>There is an implied “to the public” in each of the four opens, at least how I
have understood it over the past many (<em>many</em>) years. I have repeatedly
advocated for open (to the public) discourse and transparency when working with
companies like <a href="https://cloudbees.com">CloudBees</a> and
<a href="https://databricks.com">Databricks</a> as they have engaged with open source
projects.</p>

<p>The mounting negative pressures and in some cases <a href="https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/">outright
hostility</a>
towards free and open source projects have me reconsidering the implied “to the
public” and how these communities may need to evolve in the future.</p>

<p>I have never been a fan of invite-only Discord or Slack servers, both of
which are used by the <a href="https://datafusion.apache.org/contributor-guide/communication.html">Apache
DataFusion</a>
project for some odd reason. Still, there are good reasons to put a project’s shared
spaces in slightly more private and slightly less AI-accessible systems. A
little bit of privacy can lead to more candid conversations and <em>potentially</em> a
stronger feeling of community and safety.</p>

<p>My first line of thinking led me to the idea of “vouching” which I recall
<a href="https://mitchellh.com/writing">mitchellh</a> posting about in the fediverse, but
I couldn’t find a good linkable reference.</p>

<p>Vouching is what we did as kids when a new friend was suggested to join the
mischief: somebody would vouch for the new kid and say “hey, they’re my
neighbor, they’re cool” and then we would go start new trouble together. In the
context of an open source community, vouching can:</p>

<ul>
  <li>Help build a web of trust without every person necessarily knowing each new person</li>
  <li>But <em>also</em> vouching means there is a higher tendency for a community to be
homogeneous, since it will be less welcoming to random newcomers.</li>
</ul>

<p>I think vouching could also increase the likelihood of a <a href="https://en.wikipedia.org/wiki/XZ_Utils_backdoor">Jia
Tan</a> situation, where the web of trust
within the community is compromised by a malicious actor. Getting <em>one</em> member
to vouch for you may lower the guard of all of the other members of the
community, making this style of attack easier to pull off.</p>

<p>Since I started writing this post a whole week has passed by, without any new
ideas or patterns popping into mind. I’m curious how others are thinking about
it, so please let me know <a href="https://hacky.town/@rtyler/116329725989266400">on Mastodon</a> or via
email <code class="language-plaintext highlighter-rouge">rtyler@</code>~</p>]]></content><author><name>R. Tyler Croy</name></author><category term="opensource" /><category term="buoyantdata" /><category term="ai" /><summary type="html"><![CDATA[Open source communities depend on a fundamental assumption that is no longer true: the presumption of good faith actors. The hosts serving free and open source code are scraped relentlessly, denying service to developers. Once that code has been assimilated into various models it is washed of all attribution and license information, denying rights of the developers. Some subset of users then feel empowered, emboldened, I’m not sure what exactly by these models and lob massive thousand line changes back at the developers. Nearly every technology has the possibility to be used for positive and negative effects, but free and open source communities are being harmed from multiple directions right now.]]></summary></entry><entry><title type="html">The problem is obeying in advance</title><link href="https://brokenco.de//2026/03/25/do-not-comply.html" rel="alternate" type="text/html" title="The problem is obeying in advance" /><published>2026-03-25T00:00:00+00:00</published><updated>2026-03-25T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/25/do-not-comply</id><content type="html" xml:base="https://brokenco.de//2026/03/25/do-not-comply.html"><![CDATA[<p>Linux power-users tend to have strong opinions about two things: distribution
and systemd. The bazaar of distributions means
competing implementations or different perspectives end up expressed
through the curation of the packaged software. <code class="language-plaintext highlighter-rouge">systemd</code> ended up so
contentious because it’s a decent piece of technology which suffers from
persistent scope creep that became a foundational component in a <em>lot</em> of
distributions. The drama du jour is that systemd is somehow implicated in “age
verification laws.”</p>

<p><code class="language-plaintext highlighter-rouge">systemd</code> as an init system is pretty good. Once upon a time I worked on
porting <a href="https://en.wikipedia.org/wiki/Launchd">launchd</a> to
<a href="https://freebsd.org">FreeBSD</a> and so I have <em>some</em> familiarity with the
silliness of most init systems.</p>

<p><code class="language-plaintext highlighter-rouge">systemd</code> as a <a href="https://en.wikipedia.org/wiki/Katamari_Damacy">katamari</a> at the
root level of most Linux systems is <em>not</em> “pretty good.” There have been
<em>numerous</em> tendrils of what is understood to be “systemd” which are of lesser
quality and have resulted in security issues.</p>

<p>Anyways, I hope you get the point. systemd as an init system: good. systemd as an operating system: bad.</p>

<p>The drama du jour is about the latter.</p>

<hr />

<p>One should not <a href="https://timothysnyder.org/on-tyranny">obey in advance</a>.
Especially in the domain of free and open source software, which is <em>literally a
political project</em>.</p>

<p>I stumbled into <a href="https://blog.bofh.it/debian/id_473">this blog post</a> by a Debian maintainer through
<a href="https://planet.debian.org">Planet Debian</a>, and it is
patently absurd.</p>

<blockquote>
  <p>Recently, the free software Nazi bar crowd styling themselves as “concerned
citizens” has tried to start a moral panic by saying that systemd is
implementing age verification checks or that somehow it will require
providing personally identifiable information.</p>
</blockquote>

<p>The author is correct insofar as <code class="language-plaintext highlighter-rouge">systemd</code> did <strong>not</strong> add age verification.
<strong>However</strong> most of the folks upset with the change are upset that their Linux
systems are obeying in advance.</p>

<p>systemd
<strong>did</strong> make changes in order to obey, to take part in anti-free restrictions
under the guise of “age verification.” From the <a href="https://github.com/systemd/systemd/pull/40954">pull
request</a>:</p>

<blockquote>
  <p>Stores the user’s birth date for age verification, as required by recent laws
in California (AB-1043), Colorado (SB26-051), Brazil (Lei 15.211/2025), etc.</p>
</blockquote>

<p>The whole motivation of the change was to <em>obey in advance</em> to these unjust laws.</p>

<p>The author then goes on to make some equally absurd claims about how this
functionality is <em>important for parents</em> to implement controls on computers, for
the children! Clearly this person must not know any actual children, or
even parents for that matter. Children are <em>excellent</em> at finding ways
to circumvent restrictions. The idea that a user-modifiable piece of data on
a local machine should be trusted for “parental controls” is so detached from
reality that I originally thought they were making a sarcastic joke.</p>

<p>I think this <a href="https://lists.debian.org/debian-legal/2026/03/msg00018.html">tongue-in-cheek systemd-censord</a> post does better than anything else at conveying how absolutely ludicrous this obeying in advance is:</p>

<blockquote>
  <p>Systemd units will be created for every desired censorship function, and will
be started based on the user’s location. For example, a unit for Kazakhstan
will implement the government-required backdoor, a unit for China will
implement keyword scans and web access blocks (more on this later), a unit
for Florida will ban all packages with “trans” in the name (201 packages in
current stable distribution), a unit for Oklahoma will ensure all educational
software is compliant with the Christian Holy Bible, a unit for the entire
United States will prevent installation of any program capable of decoding
DVD or BluRay media, and a unit for California will provide the user’s age to
all applications and all web sites from which applications may be downloaded.
As can be seen, multiple units may be started for a given location.</p>
</blockquote>

<p>Do not obey in advance.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="software" /><category term="opinion" /><category term="linux" /><summary type="html"><![CDATA[Linux power-users tend to have strong opinions about two things: distribution and systemd. The bazaar of distributions means competing implementations or different perspectives end up expressed through the curation of the packaged software. systemd ended up so contentious because it’s a decent piece of technology which suffers from persistent scope creep that became a foundational component in a lot of distributions. The drama du jour is that systemd is somehow implicated in “age verification laws.”]]></summary></entry><entry><title type="html">35E</title><link href="https://brokenco.de//2026/03/22/a-poem-for-united.html" rel="alternate" type="text/html" title="35E" /><published>2026-03-22T00:00:00+00:00</published><updated>2026-03-22T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/22/a-poem-for-united</id><content type="html" xml:base="https://brokenco.de//2026/03/22/a-poem-for-united.html"><![CDATA[<p>35E, 35E.<br />
Stuck here in the middle<br />
of the middle,<br />
35E.</p>

<p>At my height any seat<br />
can feel like misery.</p>

<p>I wouldn’t be here today,<br />
if not for last night’s delay.</p>

<p>35E, 35E<br />
trapped in this humid<br />
sneeze of<br />
humanity.</p>

<p>Everything is expensive, and still it sucks.<br />
The cheapest coffee was four lousy  bucks.</p>

<p>The grumpiness was extreme at the TSA<br />
acting in their theatre for deferred pay</p>

<p>35E, 35E<br />
I’m not sure in which<br />
timezone<br />
I should be.</p>

<p>United customer support has been totally outsourced,<br />
hour seventeen on the phone; just the worst.</p>

<p>For the shareholders the texts all lie,<br />
about being powered by GenAI.</p>

<p>35E, 35E<br />
all of the staff<br />
on this flight and the last<br />
have been really kind and patient which is a testament to their professionalism and hospitality despite the overtly customer-hostile environment of modern American commercial aviation.</p>]]></content><author><name>R. Tyler Croy</name></author><summary type="html"><![CDATA[35E, 35E. Stuck here in the middle of the middle, 35E.]]></summary></entry><entry><title type="html">2026 March: Recently Studied Stuff</title><link href="https://brokenco.de//2026/03/21/fresh-from-rss.html" rel="alternate" type="text/html" title="2026 March: Recently Studied Stuff" /><published>2026-03-21T00:00:00+00:00</published><updated>2026-03-21T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/21/fresh-from-rss</id><content type="html" xml:base="https://brokenco.de//2026/03/21/fresh-from-rss.html"><![CDATA[<p>Over the past week I have made a more conscious effort to keep track of some
really interesting articles that came through my feed reader. I am a big fan of
the open web and the power of RSS for disseminating interesting information
from actual people. Below are some really interesting posts I have read recently!</p>

<p><strong><a href="https://felipe.rs/2024/10/23/arrow-over-http/">Compressed Apache Arrow tables over HTTP</a></strong></p>

<p>When discussing transport protocols for sending data between services at work
recently, a colleague asked “why can’t we just yeet Arrow over HTTP?” It turns out, you <a href="https://github.com/apache/arrow-experiments/tree/main/http/get_simple/python">absolutely can</a> and Arrow IPC streams even have a registered MIME type:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Content-Type: application/vnd.apache.arrow.stream
</code></pre></div></div>

<p><strong><a href="https://blog.dataexpert.io/p/parquet-can-shrink-your-data-100x">Understanding Parquet format for beginners</a></strong></p>

<p>A great introduction to the <a href="https://parquet.apache.org">Apache Parquet</a> format
and why it makes so many things better with large data storage systems like
<a href="https://delta.io">Delta Lake</a>. I have written on this
<a href="/tag/parquet.html">topic</a> before and encourage you to take another read
through <a href="https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/">this blog
post</a>
by some maintainers of the <a href="https://crates.io/crates/parquet">parquet</a> crate.</p>

<p><strong><a href="https://apenwarr.ca/log/20260316">Every layer of review makes you 10x slower</a></strong></p>

<blockquote>
  <p>Every layer of approval makes a process 10x slower [..]</p>

  <p>Just to be clear, we’re counting “wall clock time” here rather than effort. Almost all the extra time is spent sitting and waiting.</p>

  <ul>
    <li>Code a simple bug fix: 30 minutes</li>
    <li>Get it code reviewed by the peer next to you: 300 minutes → 5 hours → half a day</li>
    <li>Get a design doc approved by your architects team first: 50 hours → about a week</li>
    <li>Get it on some other team’s calendar to do all that (for example, if a customer requests a feature): 500 hours → 12 weeks → one fiscal quarter</li>
  </ul>
</blockquote>

<p>This inspired these thoughts which I shared with the <a href="https://github.com/delta-io/delta-rs">delta-rs</a> community:</p>

<p>“what if we didn’t require code review for merging into main”</p>

<p>I’m exploring the thought more about what we might need to make that happen.
“Why would you do such a thing, code review is so valuable!”  I do find code
reviews valuable but we do seem to lose a lot of flow time due to timezones,
differing work schedules, and a number of other things. For something without a
lot of changes, especially bug fixes that come with tests, I would be much more
comfortable with maintainers merging once CI goes green.</p>

<p>Some pieces of the puzzle that I think would be needed:</p>

<ul>
  <li>Soft caps on pull requests. I saw this mentioned somewhere else, but implementing a soft cap of &lt;500 lines per pull request can help people avoid massive unreviewable changes in favor of smaller ones that are simpler to integrate.</li>
  <li>Incorporating some of the benchmarking work into CI that has already been explored. If performance of key operations is not affected and the build is green, go for it.</li>
  <li>Stronger semantic version checks: if our APIs have not changed and all tests pass, I’m generally comfortable with landing stuff by maintainers.</li>
  <li>Implementing Apache Software Foundation style release candidates and voting: this is where we would put the mandatory bottleneck. Rather than some jokey Slack emojis like I tend to use, a true release candidate process would require review and a vote before we push something to users.</li>
</ul>
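
<p>As a rough sketch of the soft cap idea, assuming CI can feed it the output of <code class="language-plaintext highlighter-rouge">git diff --numstat</code>: the function name, the sample input, and the hard-coded 500 are all hypothetical, and none of this exists in delta-rs today.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/// Soft cap from the list above; everything else here is hypothetical.
const SOFT_CAP: u64 = 500;

/// Sum added and deleted lines from `git diff --numstat` output.
/// Each line looks like "&lt;added&gt;\t&lt;deleted&gt;\t&lt;path&gt;"; binary files
/// report "-" for both counts and are skipped.
fn changed_lines(numstat: &amp;str) -&gt; u64 {
    numstat
        .lines()
        .filter_map(|line| {
            let mut fields = line.split('\t');
            let added = fields.next()?.parse::&lt;u64&gt;().ok()?;
            let deleted = fields.next()?.parse::&lt;u64&gt;().ok()?;
            Some(added + deleted)
        })
        .sum()
}

fn main() {
    // Made-up sample: two source files and one binary file.
    let sample = "12\t3\tsrc/lib.rs\n-\t-\tlogo.png\n480\t20\tsrc/big.rs\n";
    let total = changed_lines(sample);
    if total &gt;= SOFT_CAP {
        // A soft cap only warns; it does not fail the build.
        println!("warning: {total} changed lines, soft cap is {SOFT_CAP}");
    } else {
        println!("ok: {total} changed lines");
    }
}
</code></pre></div></div>

<p>Because the cap is soft, the sketch only prints a warning. Whether that should ever block a merge is a separate question.</p>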

<p>All of this is to say that reviews can still be requested, but I would love to
see us land more improvements faster and I think we have a bunch of different
schedules that can make pushing each change through a review queue a lot slower
than necessary.</p>

<p><strong><a href="https://www.possiblerust.com/pattern/conditional-impls">Conditional Impls in Rust</a></strong></p>

<blockquote>
  <p>It’s possible in Rust to conditionally implement methods and traits based on
the traits implemented by a type’s own type parameters. While this is used
extensively in Rust’s standard library, it’s not necessarily obvious that
this is possible.</p>
</blockquote>
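
<p>A minimal sketch of the idea, which is mine and not taken from the linked post: <code class="language-plaintext highlighter-rouge">Wrapper</code> and <code class="language-plaintext highlighter-rouge">NotDisplay</code> are hypothetical types made up for illustration.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::fmt::Display;

// Hypothetical wrapper type, used only for illustration.
struct Wrapper&lt;T&gt; {
    inner: T,
}

// Available for every `T`.
impl&lt;T&gt; Wrapper&lt;T&gt; {
    fn new(inner: T) -&gt; Self {
        Wrapper { inner }
    }
}

// This method only exists when `T` implements `Display`.
impl&lt;T: Display&gt; Wrapper&lt;T&gt; {
    fn describe(&amp;self) -&gt; String {
        format!("Wrapper({})", self.inner)
    }
}

// Trait impls can be conditional too: `Wrapper&lt;T&gt;` is `Clone` only if `T` is.
impl&lt;T: Clone&gt; Clone for Wrapper&lt;T&gt; {
    fn clone(&amp;self) -&gt; Self {
        Wrapper { inner: self.inner.clone() }
    }
}

// Implements neither `Display` nor `Clone`.
struct NotDisplay;

fn main() {
    let w = Wrapper::new(42);
    println!("{}", w.describe());

    let copy = w.clone();
    println!("{}", copy.describe());

    // Constructing this is fine, but `_opaque.describe()` and
    // `_opaque.clone()` would not compile.
    let _opaque = Wrapper::new(NotDisplay);
}
</code></pre></div></div>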

<p>I have been vaguely aware of this functionality but haven’t really taken the
time to consider it, so I really appreciated this post walking through the
conditional impl functionality in Rust.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="rss" /><category term="arrow" /><category term="parquet" /><category term="rust" /><summary type="html"><![CDATA[Over the past week I have made a more conscious effort to keep track of some really interesting articles that came through my feed reader. I am a big fan of the open web and the power of RSS for disseminating interesting information from actual people. Below are some really interesting posts I have read recently!]]></summary></entry><entry><title type="html">Only so many sunrises</title><link href="https://brokenco.de//2026/03/15/so-many-sunrises.html" rel="alternate" type="text/html" title="Only so many sunrises" /><published>2026-03-15T00:00:00+00:00</published><updated>2026-03-15T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/15/so-many-sunrises</id><content type="html" xml:base="https://brokenco.de//2026/03/15/so-many-sunrises.html"><![CDATA[<p>With a lot of discussion around intelligence lately, I find myself thinking a lot more about <strong>wisdom</strong>.
Age doesn’t necessarily beget wisdom, but I do think that experience can.
I am always impressed by those who are able to reflect and grow wise from the
varied joys and traumas that shape each one of us.</p>

<p><a href="https://www.youtube.com/watch?v=ZJEnQOsMtsU">This video</a> struck a chord with
me, contrasting Bay Area hustle culture with the things that make life worth
living:</p>

<center><iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/ZJEnQOsMtsU?si=fmlv30bzptgNgN_t" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe></center>

<hr />

<p>San Francisco has always been a destination for those seeking their fortunes. The
frenetic enthusiasm radiates through seemingly everything there.</p>

<p>I also really enjoyed the energy of San Francisco when I first moved
there. I had nothing else but work.</p>

<p>The trade-off for my relentless focus on my career was a tremendous level up in
a short amount of time. I wouldn’t be where I am today without a few years of
judicious networking and workaholism.</p>

<p>San Francisco was “Lord of the Flies” when I described it to friends from
elsewhere. Awash in adult boys, untethered from the real world. I would hang
out with men 10 years older than me who were doing the same dumb shit I was,
except I was in my early twenties, a commonly accepted time to be foolish.</p>

<p>I did not want to end up like them and increasingly put both physical and
mental distance between them and myself.</p>

<p>There is more to life than panning for gold.</p>

<hr />

<p>Today I was talking with an elder almost twice as old as me, who casually
offered:</p>

<blockquote>
  <p>I still get up at 5am; at this age there are only so many sunrises left to
see.</p>
</blockquote>

<p>I’m going to try to not stay up too late dwelling on the comment, lest I miss
tomorrow’s sunrise.</p>]]></content><author><name>R. Tyler Croy</name></author><summary type="html"><![CDATA[With a lot of discussion around intelligence lately, I find myself thinking a lot more about wisdom. Age doesn’t necessarily beget wisdom, but I do think that experience can. I am always impressed by those who are able to reflect and grow wise from the varied joys and traumas that shape each one of us.]]></summary></entry><entry><title type="html">Based Lake, a petabyte-scale low-latency data lake</title><link href="https://brokenco.de//2026/03/10/based-lake.html" rel="alternate" type="text/html" title="Based Lake, a petabyte-scale low-latency data lake" /><published>2026-03-10T00:00:00+00:00</published><updated>2026-03-10T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/10/based-lake</id><content type="html" xml:base="https://brokenco.de//2026/03/10/based-lake.html"><![CDATA[<p>I had a chat today about building large scale low-latency data retrieval
systems around AWS S3. In doing so I got to share a bit of the talk proposal I
submitted to <a href="https://dataaisummit.com">Data and AI Summit</a> this year about
real-live work that has made it into production.</p>

<p>For years the conventional wisdom around <a href="https://delta.io">Delta Lake</a> has
been to <strong>not</strong> connect user-facing/online systems to Delta tables. Basically,
don’t point your Django app at your Delta tables. This continues to be a decent
<em>guideline</em> but definitely <strong>not a rule</strong> and I have the performance data to
back that up.</p>

<p>My talk abstract:</p>

<blockquote>
  <p>Scribd hosts hundreds of millions of documents and has hundreds of billions of
objects across our buckets. Combining large language models with massive
amounts of text has required investment in our new Content Library
architecture.  We selected Delta Lake as the underlying storage technology but
have pushed it to an extreme. Using the same Delta Lake architecture we offer
both direct data access for data scientists in Databricks Notebooks and online
data retrieval in milliseconds for user-facing web services.</p>

  <p>In this talk we will review principles of performance for each layer of the
stack: web APIs, the Delta Lake tables, Apache Parquet, and AWS S3.</p>
</blockquote>

<p>The work done by myself and my colleague Eugene in this area has been heavily
related to my previous research around <a href="/2025/06/24/low-latency-parquet.html">Low latency Parquet
reads</a> which informed work named <a href="https://tech.scribd.com/blog/2026/content-crush.html">Content
Crush</a>, which I have
explored more on the Scribd tech blog and on the <a href="/2026/02/13/screaming-in-the-cloud.html">Screaming in the
Cloud</a> podcast.</p>

<p>I really hope that I am able to share results at Data and AI Summit from this
incredibly challenging work that I am undertaking. But even if I don’t, blog
posts like my musings on <a href="/2026/01/19/multimodal-delta-lake.html">Multimodal with Delta
Lake</a>, <a href="https://www.buoyantdata.com/blog/2024-12-31-high-concurrency-logstore.html">scaling streaming Delta Lake
applications</a>,
and a myriad of other articles I have published can be pieced together to form
the larger mosaic of insane large-scale data work I have been hammering on!</p>]]></content><author><name>R. Tyler Croy</name></author><category term="arrow" /><category term="parquet" /><category term="deltalake" /><category term="databricks" /><category term="scribd" /><summary type="html"><![CDATA[I had a chat today about building large scale low-latency data retrieval systems around AWS S3. In doing so I got to share a bit of the talk proposal I submitted to Data and AI Summit this year about real-live work that has made it into production.]]></summary></entry><entry><title type="html">Using tmux with bhyve</title><link href="https://brokenco.de//2026/03/10/bhyve-tmux.html" rel="alternate" type="text/html" title="Using tmux with bhyve" /><published>2026-03-10T00:00:00+00:00</published><updated>2026-03-10T00:00:00+00:00</updated><id>https://brokenco.de//2026/03/10/bhyve-tmux</id><content type="html" xml:base="https://brokenco.de//2026/03/10/bhyve-tmux.html"><![CDATA[<p>Many aspects of FreeBSD follow the user-friendly unix philosophy, it’s just choosy about who its friends are. <a href="https://en.wikiquote.org/wiki/Unix">1</a>
I have always found <a href="https://bhyve.org/">bhyve</a> virtualization to be really
interesting but really unfriendly. The
<a href="https://github.com/freebsd/vm-bhyve">vm-bhyve</a> management system was what
finally cracked <code class="language-plaintext highlighter-rouge">bhyve</code> open and made it usable for me. The <code class="language-plaintext highlighter-rouge">vm</code> command has
paper cuts but generally speaking it does what I want on my primary FreeBSD
machine.</p>

<p>For the longest time I used the built-in VNC support to connect to machines
because the <code class="language-plaintext highlighter-rouge">vm console</code> command would use <code class="language-plaintext highlighter-rouge">/usr/bin/cu</code> which would
<em>inevitably</em> trap my console and no amount of <code class="language-plaintext highlighter-rouge">~&gt;~&gt;~&gt;&lt;~D~&gt;D&lt;S~&gt;D&lt;~&gt;&lt;D~&lt;L</code> would
help me exit.</p>

<p>Somewhere along the line <a href="https://github.com/tmux/tmux/wiki">tmux</a> support was
<a href="https://github.com/freebsd/vm-bhyve/wiki/Using-tmux">added to <code class="language-plaintext highlighter-rouge">vm-bhyve</code></a> and now <code class="language-plaintext highlighter-rouge">vm console &lt;name&gt;</code> simply opens up a new tmux
window!</p>

<p>I host everything under <code class="language-plaintext highlighter-rouge">/vm</code> on the machine, so in <code class="language-plaintext highlighter-rouge">/vm/.config/system.conf</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>console="tmux"
</code></pre></div></div>

<p>This seems like a simple thing to be excited about, and it is, but it makes
VMs <em>wildly</em> more accessible and useful for me.</p>]]></content><author><name>R. Tyler Croy</name></author><category term="freebsd" /><summary type="html"><![CDATA[Many aspects of FreeBSD follow the user-friendly unix philosophy, it’s just choosy about who its friends are. 1 I have always found bhyve virtualization to be really interesting but really unfriendly. The vm-bhyve management system was what finally cracked bhyve open and made it usable for me. The vm command has paper cuts but generally speaking it does what I want on my primary FreeBSD machine.]]></summary></entry></feed>