Freethought Blogs

Rain O’er Me

Bébé Mélange — Tue, 09 Jun 2026 07:37:25 +0000

rain clouds will ask if you wanted your ground drowned and not wait for you to say yes. rain clouds will wet your city like that one action movie where some guys did a heist during a flood then had to get chased by cops on jetskis. what was that shit called?

rain clouds will be, like, “i see a brown lawn and i want it painted green. bright-assed with the chloroplasts, and mushrooms in between.” i wish i could stay up late and turn off the electronic white noise and listen to my roof get taken to pound town. get down, make love.

alas, work beckons. time to squeak in my thirty-six winks. i’ll get the full forty when i’m dead. rain will turn my ashes to clay. good night, i say as if you aren’t going to be reading this when the sun is up. good night.

–

It may take the collapse of the AI bubble to save us from these sociopaths

Mano Singham — Tue, 09 Jun 2026 05:11:59 +0000

There is a lot in the news these days that is depressing so it takes a lot for me to point to one article and say that it may be a top competitor to be the most depressing thing that I have read this year. It is a long profile of OpenAI head Sam Altman and it deals with him and all the other big players jockeying to be the biggest player in the AI world.

[T]he founding premise of OpenAI was that it would have to be different. The founders, who included Altman, Sutskever, Brockman, and Elon Musk, asserted that artificial intelligence could be the most powerful, and potentially dangerous, invention in human history, and that perhaps, given the existential risk, an unusual corporate structure would be required. The firm was established as a nonprofit, whose board had a duty to prioritize the safety of humanity over the company’s success, or even its survival. The C.E.O. had to be a person of uncommon integrity.

But many who have worked with Altman portray him in the article as utterly untrustworthy and a power-seeker whose actions did not match the noble goals that OpenAI had set forth.

Altman told early recruits that OpenAI would remain a pure nonprofit, and programmers took significant pay cuts to work there. The company accepted charitable grants, including thirty million dollars from what was then called Open Philanthropy, a hub of the effective-altruism movement whose commitments included supporting the distribution of mosquito nets to the global poor.
…

If everything went right, the OpenAI founders believed, artificial intelligence could usher in a post-scarcity utopia, automating grunt work, curing cancer, and liberating people to enjoy lives of leisure and abundance. But if the technology went rogue, or fell into the wrong hands, the devastation could be total. China could use it to build a novel bioweapon or a fleet of advanced drones; an A.I. model could outmaneuver its overseers, replicating itself on secret servers so that it couldn’t be turned off; in extreme cases, it might seize control of the energy grid, the stock market, or the nuclear arsenal. Not everyone believed this, to say the least, but Altman repeatedly affirmed that he did. He wrote on his blog in 2015 that superhuman machine intelligence “does not have to be the inherently evil sci-fi version to kill us all. A more probable scenario is that it simply doesn’t care about us much either way, but in an effort to accomplish some other goal . . . wipes us out.” OpenAI’s founders vowed not to privilege speed over safety, and the organization’s articles of incorporation made benefitting humanity a legally binding duty. If A.I. was going to be the most powerful technology in history, it followed that any individual with sole control over it stood to become uniquely powerful—a scenario that the founders referred to as an “AGI dictatorship.”

There are hundreds of billions, even trillions, of dollars sloshing around, provided by investors who think that there will be a big payoff, even though AI has not made any real money yet.

Two of the biggest lying greed heads, Altman and Elon Musk, initially worked together to start OpenAI as a counter to the expected dominance of Google in the field. But they fell out and Musk left to start his own company called xAI. Musk later went to court claiming that he had invested in OpenAI on the understanding that it would be a nonprofit designed to serve the public good but that Altman had improperly converted it to a for-profit company. His lawsuit claimed fraud and breach of charitable trust.

In 2015, Elon Musk and Sam Altman founded OpenAI together as a nonprofit. Its mission—“to ensure that artificial general intelligence benefits all of humanity”—was explicitly intended to counter Google’s potential dominance of the technology, which seemed almost foreordained at the time. Musk pledged up to a billion dollars to prevent that outcome. It didn’t take long for the two men to disagree over the chain of command. Each thought he alone deserved to run the show. About two years and thirty-eight million dollars later, Musk took his remaining nine hundred and sixty-odd million dollars and went home.

The idea that Musk wanted to create something for the public good and not to enrich himself further, and that his generosity had been taken advantage of by Altman, should have been sufficient to send any jury into fits of laughter. But while the jurors did not do so (at least openly in court), during jury selection one prospective juror assessed Musk to be “a greedy, racist, homophobic piece of garbage,” while a more restrained prospect deemed him only “a world-class jerk.” The jury took only two hours to decide against him.

Musk is probably the only person who can make Sam Altman look good by comparison. The top players in the AI world all seem to be lying, greedy, amoral sociopaths, and the depressing thing is that these people seem to have free rein in developing, using, and unleashing on the public a novel technology that can have huge ramifications for the entire world, with possibly disastrous consequences and almost no governmental oversight.

The Greater Gardening of 2026 – Part 21 – Testing Tomato Trellises

Charly — Mon, 08 Jun 2026 19:26:14 +0000

I had such a surplus of tomato plants this year that I decided to make another makeshift shelter to grow them outside the greenhouses.

I used the portable trellises to do so by wiring five pairs slightly offset to stand in an A-shape and then using the last pair for a roof to support the plastic foil. I then planted the tomatoes between the A-shapes so when they start to grow in height, they can lean on the trellises. We shall see how this goes. The biggest problem could be temperature. If the summer is too cold, tomatoes won’t grow outdoors fast enough. The second big problem could be sunshine. This area gets only about 6-8 hours of direct sunshine; the first half of the day it is in shade. In my experience, tomatoes do not actually need direct sunshine to thrive; temperature is more important. But like all plants, they need enough light to photosynthesise, and I am not entirely certain they will get it here. Well, we shall at least see if these trellises work for this purpose.

MINE.

PZ Myers — Mon, 08 Jun 2026 18:43:06 +0000

Salticus scenicus

We have a lot of pest hunters at my house. The evil cat is not one of them.

Being apolitical is political

PZ Myers — Mon, 08 Jun 2026 14:10:56 +0000

The American Diabetes Association has responded to the little incident at a recent meeting, and issued a formal statement.

As many of you are now aware, an incident took place at the American Diabetes Association’s (ADA) Scientific Sessions.

As a 501(c)(3) organization, the ADA has safeguards in place to ensure that it complies with all IRS regulations. This includes maintaining a strictly nonpartisan environment at all organizational events and functions while engaging across party affiliations to advance our mission. We have always, and will continue to welcome scientific inquiry, respectful dialogue, and diverse perspectives in the pursuit of better outcomes for people living with diabetes and obesity.

Oh. They were being nonpartisan. That claim doesn’t hold up.

When the opposition is ignorant, advancing unscientific ideas, and is using them to consciously dismantle the apparatus of science to silence disagreement, you can’t silence yourself to prevent conflict. They have a political agenda and are distorting and destroying science to achieve it, and conceding the argument in advance with silence is political, too — and it’s favoring their position.

I’m not at all impressed with the cowardly, conservative leadership of the ADA.

Move over, Jessica Rabbit

PZ Myers — Mon, 08 Jun 2026 11:13:03 +0000

We’ve found the hottest cartoon woman on the internet.

Oglaf

It gets a bit gross in subsequent panels.

Not Even Vaguing

Bébé Mélange — Mon, 08 Jun 2026 05:12:53 +0000

Bro. You’re talking about replacing people at programming jobs, sure, maybe. But I’m literally watching people program shit with LLMs. The proof is in the pudding. A guy I know was annoyed with the app for a given task the creators had loaded with DRM and refused to update, and now working with nothing but LLMs, intelligence, and patience, he has an app with features and functions the humans were too lazy to implement.

Meanwhile right off the bat you link to a tweet where you accuse people of not paying attention to your points. Well. I guess I’ll have to wait for part four of your magnum opus to see you address the one in my pinned post, to the extent you will bother. Prediction: You will elide the most important elements and focus on what you think are the weak points. Nobody will be convinced of anything.

We have our biases and we’re all wearing them on our sleeves. Don’t front like that’s just us.

–

EDIT to add: The strawberrry thing. You can trick humans into embarrassing themselves too. Doesn’t mean the human is useless.

EDIT to add: I am not sufficiently educated to understand 99% of what you wrote, so take with grains, but your summation boiled down to the “it’s just collage” argument, which, again, is contradicted by the evidence of them producing original constructions.

Yes, they’re made out of information obtained by training, but THAT IS EXACTLY WHAT HUMANS DO. They’re just doing it very differently. Arguably worse, and you may have made a successful argument to that effect, if I could understand it without a graduate degree. But it doesn’t make them useless any more than brain-damaged humans are useless.

Ableist against my robot siblings, tsk tsk tsk.

EDIT to add: Not all humans learn by letters. Some dyslexic people learn their phonetic languages symbolically, recognizing the shape of the word rather than individual letters within it. Cool cognitive feat. Some AIs have some version of this, whether it’s up to speed or not yet. I have no idea how it works, but you can just try it out and watch ’em go. oh-oh it’s magic.

EDIT to add: I can see the labor argument on one hand. On the other, at my job, the material we work with is complex enough that literally nobody in the organization is right 100% of the time on it. When we mess up, it can cause people financial damage, up to and including losing their homes, because of hard technical limits on processing time for certain operations.

As soon as the LLMs reach better than human success rate, it would be immoral to let me keep my job, even while it puts my home in jeopardy. More people are at risk from our human failures than from our unemployment.

The question is whether our employer is going to have the wisdom to wait until the LLMs are demonstrably superior to the median employee in the organization. Magic Eightball says “Not Fucking Likely.” HJ Eightball says “It will never be superior to humans” and I’m like, are you and I looking at the same human species?

EDIT to add: You’re literally asking us to ignore the evidence of our senses and experiences, like a priest.

–

LLM’s Shouldn’t Code

Hj Hornbeck — Mon, 08 Jun 2026 03:58:16 +0000

My draft for “Loneliness, 3” is currently sitting at 2,600 words. It hasn’t been as hard to write as “Loneliness, 2”; this time around I only redid the intro once. Nonetheless, I haven’t touched it in a few months. The why of it all is complicated, as usual, but one not-insignificant chunk is that I’m starting to doubt my approach. I never expected to find a “magic passphrase” to get people to understand my arguments immediately, but since starting the “Loneliness” series I’ve spent more time with people who love and defend LLMs. The additional evidence and experience suggest that series is shouting into a black hole.

I don’t want to give up on it, but taking a break from it might help get me typing again. Besides, I think I can convince you LLMs should not code.

Remember that whole “strawberry” thing? No? It became a meme to query them with “How many “r”’s are there in strawberry?” The typical result was hilarious: a claim that there are two or one or even zero “r” letters in “strawberry,” even though the LLM correctly used three when spelling it. It was embarrassing enough that OpenAI code-named their o1 model “Project Strawberry.”

But why is this a problem in the first place? Let’s review some prior material: LLMs deal with “tokens,” not characters. To map between them, a sequence of characters is first encoded to a sequence of bytes, typically via UTF-8. That byte sequence is then mapped to a sequence of tokens; all the tokens which could match the first bytes in the sequence are found, and of those the token that consumes the most bytes and has the lowest numeric ID is chosen. The matched bytes are discarded, and the process repeats until there are no more bytes left to map.
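
The matching loop described above can be sketched in a few lines. This is a minimal illustration, assuming a five-entry toy vocabulary; the IDs are the ones this post quotes further down for ChatGPT 5, and `greedy_encode` is a hypothetical name. Production tokenizers are more involved, so treat this as a sketch of the description and not of any vendor's implementation.

```python
# Toy vocabulary: token ID -> bytes. Only these five IDs come from the post;
# a real vocabulary has roughly 200,000 entries.
TOY_VOCAB = {
    81: b"r",
    302: b"st",
    1618: b"raw",
    19772: b"berry",
    101830: b" strawberry",
}


def greedy_encode(text, vocab=TOY_VOCAB):
    """Map text to token IDs by repeatedly taking the longest matching token."""
    data = text.encode("utf-8")  # characters -> bytes
    tokens = []
    while data:
        # Every token whose bytes match the start of what is left.
        candidates = [(tid, tb) for tid, tb in vocab.items() if data.startswith(tb)]
        if not candidates:
            raise ValueError(f"no token matches {data[:10]!r}")
        # Most bytes consumed first, then lowest numeric ID as the tie-break.
        tid, tb = min(candidates, key=lambda c: (-len(c[1]), c[0]))
        tokens.append(tid)
        data = data[len(tb):]  # discard the matched bytes and repeat
    return tokens


print(greedy_encode("strawberry"))   # [302, 1618, 19772]
print(greedy_encode(" strawberry"))  # [101830]
```

Note how a single leading space changes the output from three tokens to one unrelated ID.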

The exact output depends on the model, but ChatGPT 5 (in part) sees that “strawberry” question as “5299, 1991, 392, 81, 50049, 82, 553, 306, 101830, 30.” There can be a loose correlation between how large a token’s number is and how many characters it contains, but that’s merely an artifact of generating the token vocabulary. They’re otherwise arbitrary labels.

Spelling “strawberry” is no problem; ChatGPT 5 only has to emit “302, 1618, 19772,” which maps to “st”, “raw”, and “berry”. But to reverse that map, to figure out that token 19772 contains two token 81’s, is not so straightforward, nor is it to recognize that those three tokens differ from token 101830 only by a single initial space character.
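
For contrast, here is how trivial the reverse map is once the ID-to-text table is available. This is a sketch using only the five IDs quoted above; `count_letter` and `same_word` are hypothetical helper names. The model gets no such table and only ever sees the integers.

```python
# Token ID -> text, restricted to the IDs quoted in the post.
TOKEN_TEXT = {81: "r", 302: "st", 1618: "raw", 19772: "berry", 101830: " strawberry"}


def decode(token_ids, token_text=TOKEN_TEXT):
    """Concatenate the text behind each token ID."""
    return "".join(token_text[t] for t in token_ids)


def count_letter(token_ids, letter, token_text=TOKEN_TEXT):
    """Count a letter by looking inside every token."""
    return decode(token_ids, token_text).count(letter)


def same_word(ids_a, ids_b, token_text=TOKEN_TEXT):
    """Compare two token sequences by decoded text, ignoring outer whitespace."""
    return decode(ids_a, token_text).strip() == decode(ids_b, token_text).strip()


print(count_letter([19772], "r"))               # 2
print(count_letter([302, 1618, 19772], "r"))    # 3
print(same_word([302, 1618, 19772], [101830]))  # True
```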

Compare and contrast with how human beings learn. Our hierarchical learning style hammers the notion of “letter” into us at an early age, so by the time we’re old enough to read sentences the number of “r”’s in “strawberry” is so obvious it almost never gets mentioned in-text. And if the training data for your LLM is just a pile of human-made texts, then it might never encounter the concept of letters.

Thus, LLMs have to be explicitly trained on letters. Most of the time this means feeding in synthetically generated texts that explicitly invoke the concept of letters, such as asking the LLM how many X’s are in Y, and then bopping it on the nose if it gets the answer wrong. You can actually find evidence for this process, sometimes; in the appendices to this paper the authors asked Gemini “How many Rs are in are? I like Strawberries.” and the response back started with “There are three “r”’s in “strawberry”…” I gave it a try with Claude, and after one failed attempt it responded back to “How many r’s are in are? Like strawberries” with “The word strawberry contains 3 r’s…” ChatGPT 5.3 has given me a “fast response” for the number of “r”’s in “strawberry,” but not when I asked for the number of “w” characters instead, though it hasn’t been consistent on that. I can’t tell if OpenAI do a better job of covering their tracks, or used a different sort of training data.

The main thrust of the aforementioned paper is to augment each token with a byte encoding of all the characters it represents, so the underlying transformer gets some access to the additional information. The results were lackluster. Their LLM did better in general, sure, but at best the needle moves from a 67% success rate to a 71% success rate, and there are a few benchmarks where it actually under-performed the old method.

There’s also a good case to be made that if there is any understanding of letters there, it’s only superficial. We can do a lot more than just query how many letters are in a word, after all.

Task	Input	Output
Spelling	Spell out the word: there	there
Inverse Spelling	Write the word that is spelled out (no spaces): t h e r e	there
Contains Character	Is there a ‘c’ in ‘there’?	No
Contains Word	Is there a ‘the’ in ‘the sky is blue’?	Yes
Character Insertion	Add ‘b’ after every ‘e’ in ‘there’	thebreb
Word Insertion	Add ‘is’ after every ‘the’ in ‘the sky is blue’	the is sky is blue
Character Deletion	Delete every ‘e’ in ‘there’	thr
Word Deletion	Delete every ‘the’ in ‘the sky is blue’	sky is blue
Character Substitution	Replace every ‘e’ with ‘a’ in ‘there’	thara
Word Substitution	Replace every ‘the’ with ‘is’ in ‘the sky is blue’	is sky is blue
Character Swapping	Swap ‘t’ and ‘r’ in ‘there’	rhete
Word Swapping	Swap ‘the’ and ‘is’ in ‘the sky is blue’	is sky the blue

Lukas Edman et al., “EXECUTE: A Multilingual Benchmark for LLM Token Understanding,” Findings of the Association for Computational Linguistics: ACL 2025, 2025, 1878–87.
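
To underline how mechanical these tasks are when you can see the characters, here is a sketch that reproduces several rows of the table with ordinary string operations. The function names are hypothetical and are not taken from the EXECUTE benchmark's own code.

```python
def insert_after_char(text, target, addition):
    """Character Insertion: add `addition` after every `target`."""
    return text.replace(target, target + addition)


def delete_char(text, target):
    """Character Deletion: drop every `target`."""
    return text.replace(target, "")


def swap_chars(text, a, b):
    """Character Swapping: exchange every `a` with `b` and vice versa."""
    return text.translate(str.maketrans(a + b, b + a))


def insert_after_word(sentence, target, addition):
    """Word Insertion: add `addition` after every whole word `target`."""
    out = []
    for word in sentence.split():
        out.append(word)
        if word == target:
            out.append(addition)
    return " ".join(out)


def swap_words(sentence, a, b):
    """Word Swapping: exchange whole words `a` and `b`."""
    mapping = {a: b, b: a}
    return " ".join(mapping.get(word, word) for word in sentence.split())


print(insert_after_char("there", "e", "b"))                # thebreb
print(delete_char("there", "e"))                           # thr
print(swap_chars("there", "t", "r"))                       # rhete
print(insert_after_word("the sky is blue", "the", "is"))   # the is sky is blue
print(swap_words("the sky is blue", "the", "is"))          # is sky the blue
```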

That table isn’t a complete replica of Figure 1; the authors also tested how well LLMs could handle “composite” characters. Kanji, for instance, allows multiple characters to be combined together to form another character. Their overall results are in Table 8, which is much too large to include here, but LLMs seemed to have the most trouble with character swapping, insertion, substitution, and deletion. They did great at figuring out if a sentence contained a word, good if inserting one word after another, but had an odd amount of difficulty swapping words around.

LLMs aced spelling, but this gets to my biggest beef about that paper: they didn’t just ask the LLM to perform each task, they also provided four correct examples to show how the task was done. Come on! My physics knowledge is slowly decaying away, but if you handed me a test where every question was repeated five times with slight variation, and four of those five had the correct answers filled in, I bet I could ace it. If you instead try something closer to an actual test, you get:

ME: Where are the “w” characters located in “ strawberry”?

CHATGPT 5.3: The word “strawberry” doesn’t have the letter “w” in it, so there are no “w” characters in it.

ME: How many “w” characters are in “ strawberry”?

CHATGPT 5.3: The word “strawberry” has 0 “w” characters in it.

ME: Substitute each occurrence of the letter “r” in “strawberry” with a “k”.

CHATGPT 5.3: Sure! If we substitute each occurrence of the letter “r” in “strawberry” with a “k”, it becomes: st kaw be k y

Nonetheless, one interesting result of their testing is that the choice of language makes a big impact on what an LLM can do. My previous assessment applies to most languages, but not Amharic, Tamazight, and Santali. Handed those languages, some LLMs they tested could ace every single task with no obvious weaknesses. Why the sudden competence?

Those three languages have barely any presence on the web, and thus there’s very little training data available. Few people/companies would tune an LLM’s token vocabulary for those languages, so the average number of characters represented by one token tends to be lower for those, relative to more popular ones. Some copy-pasting into a tokenizer suggests four Amharic characters map to an average of seven tokens for ChatGPT 5’s vocabulary; in contrast, the first four paragraphs of this blog post suggest four English characters average to one token! In practice, every other language usually has a lower character-to-token ratio than English, and that’s remained true for years.
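
The size of that gap is easy to work out from the two figures above. The sketch below also shows one ingredient that plausibly contributes to it: in UTF-8, Ethiopic characters take three bytes each while ASCII letters take one. The sample word is an illustrative choice of mine, not from the post, and the byte counts say nothing definite about how any particular tokenizer splits it.

```python
from fractions import Fraction

# Characters per token, from the two measurements quoted above.
amharic_ratio = Fraction(4, 7)  # four Amharic characters -> seven tokens
english_ratio = Fraction(4, 1)  # four English characters -> one token

print(round(float(amharic_ratio), 2))  # 0.57
print(english_ratio / amharic_ratio)   # 7

# UTF-8 byte lengths for a short Amharic word and a short English word.
amharic_word = "ሰላም"  # three Ethiopic characters (illustrative sample)
english_word = "hello"
print(len(amharic_word), len(amharic_word.encode("utf-8")))  # 3 9
print(len(english_word), len(english_word.encode("utf-8")))  # 5 5
```

By these numbers an English token covers seven times as many characters as an Amharic one.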

Consequently, manipulating non-English characters should be easier, because there’s less need to split a token into characters.

To test that, the paper authors tried directly mapping English characters to Amharic characters, and repeated their tests. The results? English-as-Amharic had a slightly higher success rate than Amharic! Nice, that’s solid support for that theory… exceeeeept the authors did one more test. They tricked the tokenizer for one LLM into mapping one English letter to one token. If the problem was only the character-to-token ratio, this should have boosted how well that LLM did.

Instead, the LLM’s overall performance cratered. There were modest improvements for the character manipulation tests, true, but the word-level manipulation tasks went from being the best-performing to easily the worst.

Here’s my guess: perhaps in the high-dimensional state space of these LLMs, two different types of inputs map to two very different locations. One corresponds to common written languages, the other to arbitrary byte sequences. The training data for the former is almost absent any character-level manipulations, save artificially-crafted examples hoping to paper over the “strawberry” problem; the latter, though, could have a lot more naturally-occurring examples of those manipulations. Mucking around with sequences of bytes is a common task for programmers, for instance. If the input corresponds more closely to written language, it lands within the former part of the state space, where the phrase “change this character” doesn’t have a clean map to the abstract concept it represents. The output token distribution is often garbage or nonsensical, as a result. But if the input lands in the arbitrary-sequence part of the space, that same phrase has a cleaner mapping and the outputs are more likely to conform to what we expect.

If a language has barely any presence in the training data, it will tend to wind up in the arbitrary-sequence part of the state space instead of the common human language part. Thus the improved results at character-level manipulation. The competence at “word” manipulation stems from that phrase mapping to “grouping of arbitrary characters” in the arbitrary-sequence sub-space, and those being about as common in the training data as byte-level manipulation.

Since bytes can map to characters, those arbitrary-sequence “words” will contain the occasional stray English character. Forcing a one-to-one mapping between English letters and tokens places the input in the “arbitrary byte sequence” part of the state space, but now words are nothing but English characters. These look nothing like the “words” typical of that part of the state space, so “change this word” no longer has a clean mapping and the success rate plummets for those tests.

All of that is rank speculation on my part, of course. But it’s hard to argue the contrary, that any of these LLMs have a general concept of “letters.” Ask me to count the number of X’s, and I’ll do very well no matter whether “X” is numbers, fish, or Fish numbers. The areas where I fail are taken as evidence I lack a general concept of numbers, or that my skill with that concept is limited, or that I simply lack the intelligence to realize the full extent of where numbers can be applied. Likewise, merely being able to answer how many “r” letters are in “strawberry” is insufficient to show an LLM has the concept of letters. On the contrary, the inability to generalize between the arbitrary-sequence case and human language argues against them having a general concept.

Whatever the actual underlying reason, there seems to be an inverse correlation between how popular a language is within an LLM’s training set and how well the LLM can perform character-level manipulations. As a consequence, we should expect an LLM will struggle to tell that these two streams of tokens are functionally equivalent:

257, 1056, 3469, 350, 7743, 11, 13901, 8, 10039, 530, 622, 350, 7743, 425, 3099, 1029

1314, 3469, 7, 1577, 11, 3099, 48169, 271, 622, 1577, 9, 13901

Or, when those sequences are decoded from ChatGPT’s token vocabulary:

     def mult (first,second) :
      return (first * second)

def mult( first, second ):
    return first*second
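
One way to check that the two snippets really are the same program is to compare their syntax trees, which discard whitespace and redundant parentheses. This is a sketch; the string literals transcribe the two snippets with their shared outer indentation removed, and `same_program` is a hypothetical helper name.

```python
import ast

# The two decoded snippets, whitespace quirks preserved.
SNIPPET_A = "def mult (first,second) :\n return (first * second)\n"
SNIPPET_B = "def mult( first, second ):\n    return first*second\n"


def same_program(source_a, source_b):
    """True when both sources parse to identical abstract syntax trees."""
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


print(SNIPPET_A == SNIPPET_B)              # False: the text differs
print(same_program(SNIPPET_A, SNIPPET_B))  # True: the programs do not
```

A parser gets this for free because it works on characters; a model working on token IDs has to learn the equivalence.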

Did you spot that ChatGPT’s token vocabulary includes words with a space in front of them? In fact, a whopping 27,980 tokens out of 199,998 total are just another token with some extra whitespace added to the front or back. That’s almost 14%! Anthropic are oddly secretive about their token vocabulary, but someone’s partly reverse-engineered it. Of the 38,360 tokens known to be in the vocabulary, a whopping 15,741 fall into the same category! But if tokenization always prioritizes the token that absorbs the most characters, an identifier with a space before it can map to a different token than the same identifier with no leading space. Whitespace is ignored in most computer languages, which can be abused to a comical extent, and as shown above even exceptions like Python still have some wiggle room.
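
A count like that can be reproduced with a short scan over a vocabulary. The sketch below runs on an eight-entry toy vocabulary invented for illustration, since the real token lists are not reproduced here, and `whitespace_variants` is a hypothetical name. The last two lines simply redo the percentages from the figures quoted above.

```python
def whitespace_variants(vocab):
    """Tokens that equal another token plus leading or trailing whitespace."""
    entries = set(vocab)
    return sorted(
        tok for tok in entries
        if tok != tok.strip() and tok.strip() and tok.strip() in entries
    )


# Invented toy vocabulary; "second\n" has no bare "second" counterpart.
TOY_VOCAB = ["def", " def", "return", " return", "first", " first", "mult", "second\n"]

variants = whitespace_variants(TOY_VOCAB)
print(variants)                       # [' def', ' first', ' return']
print(len(variants), len(TOY_VOCAB))  # 3 8

# Percentages implied by the counts quoted in the post.
print(round(100 * 27_980 / 199_998, 1))  # 14.0
print(round(100 * 15_741 / 38_360, 1))   # 41.0
```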

In order to correctly interpret programming languages, then, LLMs must understand that tokens are collections of characters. And yet, at best, they struggle to grasp that fact with popular human languages.

When they do shine at that task, it’s either when tossed an arbitrary byte sequence with no underlying meaning, or an obscure human language. Thanks to open source, though, there’s a ridiculous amount of public source code out there. Debian 13 alone contained a whopping 1.4 billion lines of code! Combine that with the frenzy over using LLMs to write program code, and modern LLM training sets are overflowing with the stuff. So again, we’re left believing LLMs should be terrible at coding.

If they are not, then it must be because they’ve memorized large chunks of the training set. That suggests an incredible level of brittleness, though. Throw them some programming code that diverges from what’s in their training set, and their output will be garbage even if the input was valid code.

I remember 20 years ago saying to a colleague, when he talked about “keeping spaces for compatibility”: Hey, we’re past the dark times already — modern tools work fine with semantic things like tabs.

And here we are, 20 years later, in 2026 — damn, AI still cannot work with tabs. What’s next? Will it break files without a newline at the end? Or will we have to add a carriage return manually after each line AI writes?

It’s ridicolous the issue is staying open for half a year.

There’s been a long-running debate within programming about whether to use tabs or spaces for indentation. Tabs are more efficient, but the number of spaces that correspond to one tab character isn’t standard. In recent years, Team Spaces has largely won. Official formatting guides from LLVM, Google, Mozilla, WebKit, and Microsoft forbid the use of tabs for indentation. Thus programming code training datasets are dominated by examples that use spaces for indentation.

Still, there are always some people who refuse to follow convention. And at least some of the time, their code is unintelligible to Claude. Some people don’t encounter the issue, some work-arounds that have been proposed fixed the issue, but nonetheless it’s been a rare but persistent problem since October 25th, 2025, with no sign of resolution.

At long last, you can pick your poison:

LLMs should not be able to code, because to understand a programming language requires breaking apart a token into its constituent characters, and all LLMs struggle with that task.
LLMs should not be allowed to code, because they lack an understanding of programming languages and instead work by copy-pasting examples memorized from their training set, perhaps with a bit of massaging to make the pieces fit together. This can only result in disasters, be they short term or long term.

AI cannot do your job, but an AI salesman can 100% convince your boss to fire you and replace you with an AI that can’t do your job, and when the bubble bursts, the money-hemorrhaging “foundation models” will be shut off and we’ll lose the AI that can’t do your job, and you will be long gone, retrained or retired or “discouraged” and out of the labor market, and no one will do your job. AI is the asbestos we are shoveling into the walls of our society and our descendants will be digging it out for generations.

Model ship building fad

PZ Myers — Sun, 07 Jun 2026 23:16:21 +0000

I just posted about building a ship model, and what happens? Ken Ham posts about building a model ark. I begin to suspect that he’s copying me.

We’ve partnered with an Australian businessman to produce a beautiful model kit of Noah’s ark (based on the Ark Encounter’s design) made from authentic Australian hoop pine. Available in three different sizes from “small” (over two feet long and 506 pieces) to large (over four feet long and 760 pieces!), this scale model is extremely detailed and comes apart to show off the three decks. Once complete, it makes a great display for your home or for churches, or it can be used as a conversation starter for outreach.

It’s not that detailed because it fails to include the large concrete office building asymmetrically grafted onto one side. The article reveals the construction method of the model.

Ick. It’s assembled from thin sheets of laser-cut pressboard, one of the cheapest, laziest ways to make a model…and which will almost certainly be incapable of holding together in water. It wouldn’t be worth the $200 and/or $800 they are asking for the two model sizes. I guess the extra layer of fakery and religion must add value to this piece of crap. My model only had love added, and I didn’t charge anyone for it.

Don’t waste your money on this inauthentic, cheaply-made nonsense.

Arts & Crafts on a lazy Sunday morning

PZ Myers — Sun, 07 Jun 2026 19:56:46 +0000