muffinlabs.com

some recent livecoding

There was a 24-hour live code stream for Palestine starting this past Friday. I ended up performing in two 15-minute slots, one Friday night and one randomly on Saturday morning.

The Friday one was my first real-ish live coding session in front of more than a handful of people, so I was pretty nervous. I had a lot of adrenaline and made some big mistakes (probably no one will care except me) but it felt really great. It was my first time performing music in front of other people at all since I was playing the saxophone in 8th grade:

I thought that would be it, but I happened to notice that there was an open slot on Saturday morning, so I jumped on that and performed again. I felt a lot more relaxed for this one:

I hope to do more of this! There's not much of a scene I know of in Western MA but I guess we'll see.

If you're reading this and are up for it, please consider doing something to support the Palestinian people, whether that's donating money, engaging with the BDS movement, or even just raising awareness. The violence still hasn't stopped in Gaza, and things weren't great before October 7th anyway.

feedsin.space updates

A spent awhile porting feedsin.space from rust to node and deployed it about a week ago. I did it for a couple of reasons. First, as much as I like rust, it's a lot easier to iterate on things in node. Second, I wanted to play around with Fedify. It's in active development and it's going to make it a lot easier to do things like implement quote boosts. It's also got a bunch of logic for smartly delivering to inboxes, handle message queuring, things like that. I've been experiencing a lot of pain around stuff like that with the old rust code, and implementing the logic from scratch is a super drag.

Anyway, the code is over on codeberg, and the old code is still on github if anyone wants that.

looptober

2025-11-09T00:00:00Z

I spent some time last month participating in looptober, a project where you generate a short bit of looped music every month. I used Strudel to write my pieces. Strudel is a JavaScript/Web-based tool for live coding music.

I took a class on live coding at SFPC this past summer and really enjoyed it. I've been playing with Strudel ever since, and so looptober was a good opportunity to focus on using it with a specific goal in mind.

There's definitely a range of quality here, there's some pieces I like a lot, some that are ok, and some that are pretty blah. I also really like the evolution of a piece of live-coded music, which is lacking here. But there's some good snippets in here for sure. I've included a video of each song playing and a link to the Strudel script as well.

1st

First day, first loop. This one is pretty simple.

view on strudel.cc

2nd

I started messing around with using OBS to record these, and originally just recorded my whole screen, video feedback and all, which I actually enjoyed a lot. This piece is pretty messy, I sampled Seal's "Crazy" and sort of just messed around with it. For a lot of these loops I tried to get pretty close to something that was good without spending hours and hours refining.

view on strudel.cc

5th

I'd been listening to a huge excavator with attached jackhammer hack away at slate for a day or two when I made this.

view on strudel.cc

6th

For this piece I sort of went in the opposite direction of the last one, and used a cute bird tweet sample to make a simple melody.

view on strudel.cc

7th

Guitar-ish noodling in Studel (stroodeling?)

view on strudel.cc

8th

I sampled some trolley sounds from Mr Rogers, and also Mr Rogers himself saying "You ok?" and turned it into this. I actually like this one a lot, very basic but I feel like there could be something here.

view on strudel.cc

10th

Another loud one. This uses the sample of a 3d printer from Sonic PI.

view on strudel.cc

12th

I spent a week travelling without my laptop so made this on a Chromebook.

view on strudel.cc

13th

I also made this on a Chromebook. I like the built in number samples in Strudel and they are fun to play around with but this is not much of a song.

view on strudel.cc

14th

Another Chromebook piece, I actually like this one a lot, it was fun to jam on.

view on strudel.cc

15th

I used the number samples again for this one and it's more interesting. Another Chromebook piece!

view on strudel.cc

16th

I think this was my last Chromebook jam, it's ok.

view on strudel.cc

20th

I took some samples from Thomas the Tank Engine for this one, it was fun to make.

view on strudel.cc

22th

I think this is my favorite one from this project. I had an idea in my head, managed to hold onto it for a couple of hours while I was working/taking care of life, and then I was able to convert it into something that was pretty close to what I wanted. It's pretty simple but I like it a lot.

view on strudel.cc

24th

I used some loon samples for this one #loontober.

view on strudel.cc

26th

Messing around with the 'aahh' choir sounds from Strudel and some pad samples. It's easy to get lost in playing with samples like this.

view on strudel.cc

27th

I like the older electronic sound of this one.

view on strudel.cc

29th

Another electronic/vaguely industrial one. I'd seen Autechre perform the night before.

view on strudel.cc

31st

Last one! I think this is my second favorite piece. I sampled Ash from Aliens for the weird talking.

view on strudel.cc

mastodon archive viewer

2025-02-05T00:00:00Z

I kept putting off finishing it because, you know, everything sucks, but I put together a mastodon archive viewer website/tool here:

https://mastodon-viewer.muffinlabs.com/

It does the whole '100% local processing' thing, so you can specify your archive zipfile and everything is processed/stored locally. You can also use it to generate a website suitable for uploading somewhere and sharing your toots if you want.

My initial goal was to have this out in the world before botsin.space closed down, but again, the world/etc.

There's a few other tools out there that will do this for you, and a couple of them are definitely nicer looking, if not more functional than this one, but this might be the only mastodon archive viewer that does in-browser processing, offers an export, and also has the source code available. I hope folks find it useful.

botsin.space postmortem

2024-12-21T00:00:00Z

I switched botsin.space to read-only mode on December 16th. I essentially just prevented the creation of new toots -- you can still login, migrate your account, request an account archive, etc. Unsurprisingly, the load on the website has plummeted. In the next couple of weeks I hope to spend some time downsizing the instance, cleaning up some junk, etc, etc, to have a lean archive of the server that I can keep online without much expense for as long as possible.

After I made the announcement that I was shutting down the instance, a few things happened. First, I received an overwhelming amount of thanks and support. I'm so incredibly grateful for all the kind words. So many people reached out to offer their thanks for running botsin.space, and their understanding for my need to end things. A few people (literally just one or two out of hundreds) were angry about it, which I understand.

Several people/organizations reached out with offers to either fund the service, offer spare servers/bandwidth, promote their new cloud platform, or take it over in one way or another. I rejected these for several reasons. First, several of them involved me continuing to run botsin.space, and as soon as I made the announcement that I wanted to shut down the instance, I realized just how badly I didn't want to do that anymore. Second, a few of the offers involved established orgs/people with a decent presence on the fediverse, but I was concerned that handing over control would change how botsin.space would be run in ways that made me uncomfortable. For example, I blocked Threads/Meta on botsin.space, and I know that some of the orgs that offered their support are either cool with Meta or even supported by Meta. That's their right for sure, but it's not something I was comfortable with.

But even more importantly, I was really worried about the privacy implications of giving the service to someone else. It has always been really important to me to maintain the privacy of the people with accounts on the server. As a rule, it's good behavior to identify the creator of a bot/automated account in the profile of the bot, but it's not strictly required, and there's certainly good reasons to remain anonymous. From time to time someone would request the contact information of a bot owner, and I basically never handed out that information. I would try and be an intermediary instead -- someone contacted me, I would contact the bot owner and pass along their message, etc. I have also been concerned for a few years now about the possibility of law enforcement/state actors asking/demanding information from the instance. It's extremely plausible that the database of an average mastodon service includes some toots about someone's abortion for example, which is the sort of information that police officers are suddenly interested in having in parts of the US. Ultimately, my concerns are probably irrelevant because my hosting provider could hand over the information without me ever knowing, but I worried that handing the server to someone else would leave these questions unresolved in a way that was unacceptable to me.

I was also contacted by at least one known bad actor that definitely had malice in mind when they offered to take over the instance, and went so far as to setup and populate a throwaway account to contact me.

Finally, one or two archivists have reached out, but that's a little different from maintaining the service, and I'll probably see what the options are there.

Numbers

I thought it might be interesting to cobble together some stats, graphs, etc. for the instance. They show certain trends, and might be helpful for people thinking about their own instances.

Expenses

Here's a chart of the monthly expenses for botsin.space:

The red line is the bandwidth/storage cost, and the blue line is the server/CPU cost. The yellow line is the cost for the mail/SMTP service I used, which changed a couple of times over the course of the project. It started off being essentially free, then $24/year, then $10/month when the number of emails sent by the instance hit a certain threshold. The green line is the sum total of everything.

Observant folks will notice that the bandwidth expense trends up over time, but then has a big drop near the end of 2023. I'd been worrying about the creeping expense for months at that point, and I implemented a cache layer to try and save some money. Originally I used AWS to host files, but I moved to Digital Ocean in 2022. Both services charge separately for storage and bandwidth. As the bandwidth expense grew, I experimented with a proxy/cache system on the botsin.space server itself, which gets a certain amount of 'free' bandwidth every month. This helped a bit, but not for very long, as the bandwidth usage continued to grow and grow.

Toot/Media stats

Here's some charts for the number of toots tooted each month, as well as the number and size of media files processed by month. See if you can guess when Elon Musk bought Twitter.

In many ways, that giant spike in 2022 was the beginning of the end for botsin.space. The monthly active accounts total went from 700 to 1100, from just under a million statuses created to 3 million statuses (!). The media usage and bandwidth grew similarly. There's not a whole lot else to say about these three charts except that Elon Musk is a real chaos agent.

Tech Specs

I've written a bit about how I ran botsin.space already, and it's still fairly accurate. In the end times, I added a varnish caching layer to try and help keep things running, and I also added two extra servers, one devoted to web requests and one to background jobs.

Originally the instance was a 2 CPU instance with 4GB of RAM. In the end, it was an 8 CPU instance with 16GB of RAM, with the two extra instances being slightly smaller. It moved from a s-2vcpu-4gb to a s-8vcpu-16gb-amd on this table, with a few steps in the middle. I think one takeaway from these graphs are that if I'd been running an instance for a small, limited number of accounts, it would've been manageable for a very long time. It took a long time for the expense to creep past $50/month, and another long time to go above $100/month.

Starting in 2020, I moved the database storage off of the instance and onto a separate volume. Having data on a volume separate from everything else meant that I was able to take instant snapshots of the database before running upgrades, and this really saved my ass several times. There was a Mastodon upgrade sometime in 2022 that involved some involved data migrations, and I think an upgrade to Postgres. When I ran the upgrade, something went horribly wrong and botsin.space was pretty badly broken. I ended up taking the instance down for several days while I dealt with the issue. Ultimately, I needed to rewrite the migration scripts from scratch and run them on a backup snapshot of the database. This was really hard work -- I had to setup a new server, attach a copy of the database, debug my script, etc, etc. I was extremely likely to be qualified for this work -- I've been working with Ruby on Rails since it was beta software. But I still came pretty close to calling it quits right here and shutting down the instance forever. Anyway, make good backups of your server!

Assorted Challenges

Technical challenges

There were long stretches in the beginning when the service ran with little or no attention from me. Other than the event I described above, there weren't any epic challenges involving downtime, server crashes, etc. I'm fortunate to have a lot of knowledge and expertise regarding running servers. I tend to know how to identify a problem, I can figure out how to fix it, etc. Docker doesn't particularly bother me and using it made it easier for me to tune the service. I wrote scripts to handle things like expand the database volume when needed, run cleanup tasks, etc. I maintained the configuration for the server in git. I also have an instance of Jenkins that I use to deploy my personal projects, and I used it here too. Every time I pushed an update to the setup for botsin.space, Jenkins would grab the changes, build a new docker image and push it to the right place, and take care of a few other tasks. Then I would ssh into the server and run a command or two to switch to the new version. All things considered, I had it running pretty smoothly for years.

Using S3/DigitalOcean spaces to store files meant that I never really needed to worry about hard drives filling up, although when I decided to move from S3 to DigitalOcean, I ended up with a huge AWS bill for the expense of transferring all of those files. Oops!

Two annoyances were spiders/spam, and background jobs. Every now and then some web scraper, spammer, or well-meaning but malfunctioning bot script would tie up a huge chunk of system resources until I blocked their IP address or found some other way to deal with the issue. It was never a catastrophic problem, but it would be annoying from time to time.

But background jobs could frequently become an issue. Basically, every time someone creates a toot, a bunch of background jobs are created to send it to all the followers of an account. There's a similar set of jobs to deal with incoming toots. Sometimes this would slow down and there would be a backlog of work to be done. This was probably happening for several reasons. First, it involves communicating with remote servers, some of which might be slow to respond, or they might be offline. Second, sometimes a job would create multiple other jobs. Essentially speaking, a popular/viral toot can create an unexpected cascade of background jobs. The main solution here was usually to run extra copies of sidekiq to process the backlog, but sometimes this meant doing a lot of juggling to keep from overwhelming the db/web instances that also needed to run.

Other challenges

Dealing with new accounts was never too much of a big deal, especially after I shut down unverified signups, and asked for a 'magic word' in the signup flow. It probably got a little tiring during the months that a couple hundred people signed up.

Moderation was generally not much of a problem most of the time. Most months the number of reports was low, and sometimes there weren't any reports at all.

There were definitely some real problems in these reports -- bots behaving badly, accounts on remote servers being assholes. Lots of complaints about spammers on mastodon.social. But in general, a huge chunk of these reports contained literally no information in them -- no description of the problem, and no links to specific statuses. I generally classified these as "I don't like this account/content, but I'm going to make someone else do labor by clicking the report button instead of the block/unfollow button." They were really frustrating, because I'd generally end up doing some digging to see if anything was going on with the reported account, and inevitably nothing was. I've said this in the past, but I think it would be really easy for a bad actor to weaponize reports to cause a lot of pain for people running instances. I also think that I didn't have a typical experience regarding moderation, since as a rule any interaction between a remote account and a botsin.space account involved a non-human, and while bots can definitely say terrible things and make awful mistakes and are ultimately a human product, in general this didn't happen a whole lot (or at least, I rarely heard about it).

Weirdly, I didn't get a single DMCA takedown request, it seemed basically inevitable and never happened.

There were a few times that I added an instance to the blocklist and would get brigaded by people being total assholes about it. They'd hop on new accounts from other servers to harass me, they'd email me spewing hate or trying to convince me to undo the block (or both at the same time). Sometimes threats were involved. But I recognize that I had it pretty easy compared to a lot of other people running instances. I rarely announced blocks, which I think helped, but maybe wasn't the greatest way to do things.

Once or twice I needed to deal with content that was CSAM/CSAM-adjacent, and honestly it was very upsetting. I think all the articles out there about moderators at big social networks struggling to deal with the mental hardship of dealing with content like this are actually underselling the problem. It really sucked, I lost sleep over it, and what I needed to deal with was pretty trivial compared to other people.

Some final thoughts

When I first started drafting this post, I thought I would write something about how you shouldn't launch an instance without having a fallback plan, an exit strategy, a team to help manage your community, etc, but having spent a couple of days writing everything here, I'm not sure I feel that way anymore. I actually think that ideally, a person or group can decide they want to start a community on the fediverse, do that, and then when it's time to end, they end it, and have that all be fine. Communities don't have to last forever.

I think a lot about phpBB. It's easy to forget about in the post social media era, but there are a lot of phpBB/etc forums out there that host vibrant, active communities. They might be small, but that's ok, and it's likely even a good thing. A friend of mine runs an instance of phpBB devoted to cars with manual transmissions. I know of another forum devoted to parenting that started as a splinter group from yet another forum that was mostly about feminism. I used to frequent a board devoted to the Red Sox, and another one devoted to a 2d airplane game I used to love. Today I spend time on forums related to ambient music and making tape loops. There's forums devoted to medical issues, where people can go to talk, ask questions and find support. And most of these places have off-topic sections, where you can discuss random topics, what's for dinner, shitposting, etc. And those areas exist because phpBB is a great tool for building a community.

phpBB runs on PHP, arguably the most important programming language of all time, and certainly one of the most successful ones. Almost every hosting platform supports it by default. It runs on cheap hardware. You can get a Dreamhost/etc account on a server shared with 1000 other websites and run your forum without thinking twice. It's far from perfect, but you can do it with a relatively low amount of technical knowledge, especially with one-click installers and things like that. It's an incredibly successful and historically important software project.

Right now, there's basically no equivalent to phpBB for the fediverse. I think this is likely true in part because everyone's brains broke in the early dotcom era, and we all collectively decided that something wasn't worth doing online unless it could scale up to 10 million users. A web forum is useless to someone with that mindset. Mastodon requires quite a bit of technical knowledge, and if something goes wrong you need to be an expert to deal with it. Mastodon the software platform is built and maintained to support the scale required to run mastodon.social, an instance with two million users. And if you use Mastodon to run your instance, you're locked into the choices made to support a giant instance. There's alternatives to Mastodon, some of which make better choices for smaller instance, but I don't think any of them are really any easier to use.

I worry that this makes it very difficult to resist the inclination to centralize everything on a couple of big instances. And that inclination absolutely must be resisted. I would love to see a fediverse full of thousands of small instances of software platforms that are as easy to manage as a PHP web forum. But there are a lot of challenges to tackle before something like that could really exist. I could probably ramble about this for another thousand words but I'll stop now. If you made it this far, thanks for reading, and thanks for supporting botsin.space!

RIP botsin.space

2024-10-29T00:00:00Z

There's no easy way to put it, so here goes -- after a lot of consideration, I've made the hard (frankly, painful) decision to shut down botsin.space. TLDR: here's the plan:

Effective immediately I'm shutting down new account signups
I will switch the site into read-only mode sometime not long after December 15th.
I'll do everything I can to help people migrate their accounts elsewhere, and/or generate archives.
I'll keep the site running in read-only mode at least into March of 2025, and if possible/needed I'll extend that as long as I can.

Why?

I launched botsin.space in April of 2017, which honestly feels like it was about six thousand years ago. Originally I just wanted to play around with the fediverse and Mastodon. The oauth flow for creating a bot was messy, so I forked Mastodon and fixed it, deployed those changes to botsin.space, and invited people to create bots.

The server was popular with bot allies and artists, people who wanted to get RSS feeds onto the fediverse, as well as students and professors who wanted to work on coding projects or learn about federated social media. There have been some moderation challenges over the years, but to be honest those have never been all that considerable.

But botsin.space has always been a bit of an odd duck with a unique set of challenges. Over the years, the server has grown to have around a few thousand active accounts, which isn't all that many. However, they've generated something like 32 million statuses. Just to put that in perspective, mastodon.social has over 2 million users, who have generated around 110 million statuses. So the usage patterns are very different, and I think it's safe to say the the mastodon codebase is tuned for mastodon.social and not a weird freaky server like botsin.space.

I work on the internet professionally, use Rails at my day job, and server management is part of my job description, so I've been able to use my skills to keep botsin.space running on a relative shoestring budget. Until recently, the whole thing ran on one server. But that's not maintainable, and given that and some other concerns, I think that now is the time to retire the server.

There are four major expenses for botsin.space, in order from least to most expensive:

My time. As long as I find managing the server rewarding, this is an expense I'm happy to pay.
Server costs. I've been able to be fairly cheap here until recently.
Database storage. The database for the server lives on a dedicated volume at Digital Ocean, and is currently around 191GB in size. Every time I need to increase the size of this volume, the expense goes up, and it's safe to assume that this will only continue to grow.
File storage and bandwidth. These expenses will also only get more expensive over time. File bandwidth is the #1 charge on the monthly server bill right now. I live in fear of an AI scraper figuring out how to scrape all of these files and bankrupting me overnight.

Until recently, my thinking has been: "I'm cool with finding all sorts of weird tricks to keep the server going, and I'll worry about #3 and #4 someday in the future maybe lalalalala I can't hear you." But the recent Mastodon upgrade has caused a significant amount of performance degradation, and I think the only way to really solve it is going to be to throw a lot of money into hardware.

I should mention at this point that I've had a Patreon to help with server expenses, and I've also accepted Paypal donations, and I'm truly grateful for everyone who has ever sent me money to support the server.

However, even with the support, expenses have always outpaced the donations, and while that's been fine with me for a long time, it's not sustainable. I'm fortunate enough to have a career and life where I've been able to support botsin.space, but I can't do it forever, and as the expenses and challenges mount, I find myself thinking about things that I'd probably rather be doing with my time.

With a few exceptions, botsin.space isn't anyone's primary instance, and I've always been mindful of the fact that everyone who supports botsin.space financially has other places and people to think of also, and I am so thankful for the consideration. I hope that everyone who is currently supporting botsin.space finds another instance to support, there are a lot of great instances out there with vibrant communities that need all the help they can get to survive.

So, given two choices -- asking for more donations so I can pay for more hardware to keep the instance running, or retiring it and encouraging people to support more community-oriented instances, I'll choose the second option every time.

As I mentioned above, I'll be working to keep the server stable and running for as long as possible, to give people a chance to migrate their accounts, get archives, etc. If anyone has any questions, please feel free to contact me at @cass@muffin.industries. It's probably smarter to ping me there instead of at botsin.space, because it's more likely that your message will be delivered in a timely fashion.

I'd like to thank everyone who has ever run a bot on botsin.space and gotten joy out of it. I'd like to thank all the people who have ever shared their thanks or kind words with me online -- your support has meant the world to me. I'd like to thank all the #botALLY folks, who have been a constant source of inspiration and learning to me for over a decade. Finally, I'd like to thank Johanna, who has always been there for me <3

A system for randomly timed bot posts

2024-09-06T00:00:00Z

Awhile ago I made a very simple bot -- @WhatNewDevilry, which simply posts one of my favorite scenes/lines from Lord of The Rings:

Instead of posting on a scheduled interval, I wanted to add some randomness to the posting schedule. I decided to do this by using the Mastodon API's scheduled tweet system. Basically, the bot code checks to see if there's a scheduled tweet for the account. If not, it schedules one at a random time in the future. The script runs every hour or so to keep checking the status. Meanwhile, Mastodon itself will post the scheduled toot at the proper time.

The code is pretty basic and looks an awful lot like this:

A Million Random Digits with 100,000 Normal Deviates - Audio Edition

2024-05-05T00:00:00Z

A Million Random Digits with 100,000 Normal Deviates - Audio Edition

This is an audio version of the acclaimed book A Million Random Digits with 100,000 Normal Deviates. Although it was originally published in 1955, there has never been an audio version of it -- until today. Weighing in at 18 gigabytes of audio data, the book provides 455 hours (almost 19 days) of listening enjoyment.

The print edition contains a series of random numbers and some guidance on how to use them. At the time it was published, generating random numbers was challenging so the book was fairly useful and noteworthy for mathematicians, scientists, etc. RAND used it for lots of things, probably including gaming out how a global nuclear war would work out.

The audio book is simply a reading of the digits, in order, along with the introduction, preface, etc. It was generated via a bunch of Python scripts that use the Azure Speech service to generate the samples for all of the digits. Originally I considered generating samples for the numbers zero through nine, and just combining them together. But that generated pretty monotonous audio, so instead I generated samples for every unique sequence of five digits. I spread the creation out over a few months to maximize my use of the Azure free tier (although I messed up and did end up with a bit of a bill).

I originally thought that I would use the code that generates audio for the secret broadcast to generate the audio book, but ultimately I ended up writing some very simple Python code to do the work. I had a script to generate the samples, and a few more to combine them together into WAV files. I had to split the audiobook into sizes small enough to not exceed the 4GB maximum WAV file size.

feedsin.space -- RSS feeds on the fediverse

2023-03-09T00:00:00Z

Introducing feedsin.space, a service for generating Fediverse accounts that post content from RSS feeds.

Something that I've learned over the years of running botsin.space is that a lot of people want to be able to publish RSS feeds to Mastodon. A few months ago, I decided to implement a service to run alongside botsin.space that people could use to setup accounts for RSS feeds, but without needing to go through the work of setting up a full Mastodon account on botsin.space.

Using feedsin.space is pretty straightforward. First, you need to authenticate with the website by sending a message from your mastodon account to @admin@feedsin.space with the word "help". You'll get a response with a link you can click on to authenticate with the website. Then, you can create an account on feedsin.space by specifying a username for the account and the RSS feed you want to follow. Assuming everything looks good, there will be an account created at @username@feedsin.space, which you can follow from your Mastodon account, and any time the RSS feed updates, you'll get the post in your timeline.

I've implemented a few features to the service beyond the basics, including:

You can set the visibility of posts, so they can be public, unlisted, followers-only, or you can opt to get direct messages from the service to keep your feed private
The ability to add a content warning to posts
An optional hashtag for posts
A setting for allowing/disallowing search engine indexing
Embedded audio from podcasts (this isn't working as well as I want)

In the future, I am thinking about adding a directory of feeds available on the service, and I have some other things on my todo list.

The Code

feedsin.space is written in Rust, and the code is available on Github. There's a few libraries I've used, and other ActivityPub websites/projects that I frequently referenced to figure out what the heck I was doing. The ActivityPub protocol can be pretty intimidating to learn, so I spent a lot of time looking at the code for these projects:

The ActivityPub spec
The rust ActivityStreams library. I wouldn't have built the project in Rust without this library
Plume and Lemmy, two AP projects which use Rust
Darius Kazemi's rss-to-activitypub
Mastodon itself. I ran a local copy of Mastodon for testing, and I hacked into the code for sending/receiving messages to get an idea of what was going on, for debugging, etc, etc.

Alternatives

If you want to get RSS feeds into the Fediverse, but this doesn't seem like the service you are looking for, there's a few other tools currently available to do this, including:

feed2toot, which is a script you can run to post from an RSS feed to an account.
Zapier and/or IFTTT, if you're willing to hack a bit.
Darius Kazemi's rss-to-activitypub project.
MastoFeed, which is a web service to do this, but it's a little mysterious about who runs it, and it requires a lot of excess permissions.
FeedBot appears to be similar to MastoFeed, but I can't get it to work for me right now so I don't have much of an opinion about it.

How I maintain botsin.space

2022-09-10T00:00:00Z

I've been meaning to write up some notes on how I manage botsin.space, and how I've dealt with certain problems in the past -- in particular, the several days of issues and downtime in May 2022.

Hosting

botsin.space is hosted at DigitalOcean. There's nothing really special about DO (in fact, I think about moving often) but there's a few features that have really saved me a few times now. First, it's very easy to create a new disk volume, and once you have a volume, it's pretty easy to expand its size. I store the database on a separate volume. Currently the database is taking up ~65GB of space. When the volume is close to full, I'll expand it as needed. Second, it's very easy to take snapshots of volumes. I have a script that takes a nightly snapshot of the database volume. I also make snapshots before doing upgrades, server maintenance, etc. If something bad happens and I need to restore the db copy, I can create a new volume, attach it to the server, and switch from the broken db to the snapshot db. I've had to do this several times, and knowing I can do it again really helps alleviate the stress of running the server.

I run the instance using docker compose. I know that docker causes some people a lot of suffering (enough that the official mastodon documentation doesn't seem to include using docker as an option anymore), but I like it for a few reasons. First, I have a lot of professional experience using docker, so I'm used to the different ways it can cause you pain. Second, I find that using docker makes it a little easier to run upgrades and rollbacks. Third, it makes it a little easier to maintain the code/scripts I need to run the instance in git without having to fork the entire mastodon codebase. Finally, it also makes the service a lot more portable, since if/when I want to move the instance to a new server, I don't need to reinstall as many required programs.

The code

I have a slightly customized build of mastodon, with a docker file that looks an awful lot like this:

FROM tootsuite/mastodon:v3.5.3

COPY app/views/about/_registration.html.haml /opt/mastodon/app/views/about/
COPY app/views/about/_botsinspace-custom-signup.html.haml /opt/mastodon/app/views/about/

This takes the pre-existing image for mastodon, copies a few customized files in, and that's it!

Here's what the Dockerfile and docker-compose.yml look like:

Upgrades and maintenance

When it's time to run an upgrade, I make a snapshot of the database, update the version numbers in docker-compose.yml, and run something along the lines of docker compose build && docker compose up -d. This builds a new docker image and deploys it, then restarts everything as needed. If something goes wrong, I roll back the version and re-run docker compose up -d. The configuration file itself could be a little more optimized (ideally I'd only specify the version stuff once), but I'm lazy and usually do it via search/replace in my editor.

Other stuff

A few things happen outside of docker:

nginx - nginx runs directly on the server, and routes traffic to docker. The configuration is reasonably close to the default mastodon configuration file. There's a couple of rules in there to block some bad actors, and there's some rate-limiting as well.

Lets Encrypt - I use Let's Encrypt to setup HTTPS certificates/etc. I use DNS validation since there's a special plugin that handles everything via the Digital Ocean API.

Scheduled tasks - There's a few nightly tasks running in cron -- making backups, running mastodon maintenance/etc.

File storage - File storage is a huge chunk of the expense of running the instance. Uploads are stored in Digital Ocean's Spaces, which is basically a clone of S3. I kept files on S3 for awhile, but I don't like giving Amazon money, and Spaces is a little cheaper. It's also probably better for performance to have the file storage closer to the actual server.

Emails - Emails are sent with MailPace and it works well enough that I basically never think about it.

Server upgrades

The botsin.space server is running Ubuntu. Server updates aren't too much of a concern, but if I need to do upgrade between major versions or something else large like that, I take advantage of the fact that the database is on its own dedicated volume. I can boot an entirely new server, install any required software (I basically have a script for this), copy over my configuration files, then detach the volume from the old server, attach it to the new one, and update DNS to point to the new server.

Moderation and new accounts

At the moment, I handle all moderation issues and new account requests myself. I use a slightly tweaked version of ivory to help with spam signups and things like that. It's certainly possible that this will become enough work that I can't handle it myself, but that hasn't happened yet.

When things go wrong

The upgrade to Mastodon v3.5.0 involved upgrading PostgreSQL from version 9.6 to 14. There were instructions for running this upgrade that were along the lines of: make a dump of the data in the old version of postgres, upgrade, then import the data into the new version. With a large database, that can take hours or even days, and if it fails while it's running, that's a bunch of time that you've wasted. So, I took a snapshot and shutdown botsin.space, and started running the dump. Unfortunately, the process failed for me over and over again, and when I eventually got it to work, and tried to bring botsin.space back online, it was clear that there were some data issues. I rolled back to the old snapshot and started running upgrade tests on a separate test server.

Eventually I found a neat little docker image that can be used to upgrade between postgres versions, and that seemed to work.

However, there was another problem -- botsin.space was experiencing a data corruption issue. When I tried to run mastodon's custom fix-duplicates script, I found a whole new set of issues. That script checks a bunch of tables for duplicate data. Many of those tables have a manageable amount of data in them, but some of them -- particularly the conversations and status tables -- each of which have over 50 million rows in them right now. The script was trying to run fairly complicated queries against that table, but the server didn't have enough memory to process the result. This meant I needed to write some custom ruby code to do the same thing without causing quite so much server load. I managed to do that (luckily I program in Ruby for a living), let it run for a couple of hours, and when it was done, I was able to bring botsin.space back online.

If I hadn't been able to take snapshots, and increase the database storage volume as needed, and if I wasn't well-versed in Ruby, there's a good chance that this upgrade would've either failed entirely, involved a lot of data lostt, or taken many days/weeks to finish.

SpaceJamCheck in the New York Times

2021-05-24T00:00:00Z

Emoji Fireplace app

2019-12-29T00:00:00Z

Audio Sweetener Bot

2019-01-10T00:00:00Z

November Rain Bot

2018-11-16T00:00:00Z

Tall Boy

2018-11-16T00:00:00Z

Pitchers and Catchers

2018-10-29T00:00:00Z

The Secret Broadcast

2018-09-18T00:00:00Z

Emily Dickinson's Herbarium

2018-09-14T00:00:00Z

moire

2018-04-10T00:00:00Z

Before Dawn v0.9.25

2018-04-06T00:00:00Z

p5.js fullscreen starter on glitch

2018-04-06T00:00:00Z

Before Dawn v0.9.14

2017-12-12T00:00:00Z

Nice Gradients

2017-11-17T00:00:00Z

Before Dawn v0.9.11

2017-11-16T00:00:00Z

Editorialize Chrome Plugin

2017-08-30T00:00:00Z

Lonely Computer

2017-08-06T00:00:00Z

Defrag

2017-07-28T00:00:00Z

Atari Attract Mode

2017-07-14T00:00:00Z

@IndyDaySpeech

2017-07-04T00:00:00Z

@eliza on mastodon

2017-04-20T00:00:00Z

@loveletter on mastodon

2017-04-05T00:00:00Z

The Journey of EarthRoverBot

2017-03-28T16:20:00Z

@EarthRoverBot is in the final stretch of a journey from the edge of Maine to the US/Mexico border. The bot is entirely virtual, and the trip is powered by Google's Street View data. It takes a step forward every 12 minutes. It has a location and a bearing, and if there's valid Street View data in that direction, then it moves forward. If there isn't data, it adjusts course until it finds a way to continue. With each step it sends the image to Twitter.

The bot can be controlled via commands sent through tweets, but mostly it runs on autopilot, with a simple algorithm that it uses to work its way towards the border crossing in San Diego. At some point in the next few weeks, the bot will send an image that looks something like this:

And the trip will be done.

Of all the bots I've made, I think this one is my favorite. I love the experience of a slow, meditative journey, without using a map, getting stuck in unusual places, finding dead-ends and the insides of buildings in places where the data is weird.

Also, while the bot is basically automated, it can accept human commands, so people have been able to control the course of the bot. In fact, it never would have made it as far as it has without help from people.

At the same time, thanks to the use of Google Street view, the journey represents a fairly bizarre version of a road trip. Everything that you are able to see has been dictated by largely commercial needs of a gigantic company. It's almost always sunny in the world presented by Street View, although sometimes seasons will change without warning. There's very little traffic, you never see an accident or weather. The trip is largely devoid of visible people. The quality of light is almost constant -- it's always the middle of the day and the sun is usually out. Over days or weeks, the color palette changes in subtle ways.

When I made the bot, sending it from one corner of the country to the other seemed like a fun and fairly innocuous idea, but it spent an entire election season barreling towards a border that defined so much of the election, and now it's impossible to avoid the feeling that driving something towards a destination like this is inherently political.

Here's a map of the final leg of the trip:

(it's already moved past this point since I'm the worst blogger ever lol)

I do have plans for the bot after it has finished this trip. I might add the ability to jump to specific locations, or I might just start another trip between two other points. I've thought about making a web version, and I'll definitely release the source code for the bot.

Here's a video of the trip of the bot. I took every image is posted to Twitter, filtered out ones where the bot moved fewer than 10 meters, then composited them down to a few thousand frames, then turned that into a video. I'm still experimenting, so I might come up with something more interesting in the future.

Before Dawn

2017-02-28T00:00:00Z

Trump Administration Twitter Archives

2017-02-12T00:00:00Z

buzzcut

2016-11-30T00:00:00Z

@drillify_exe

2016-11-18T00:00:00Z

@head_2_keyboard

2016-11-04T00:00:00Z

@muffin_exe_sta

2016-11-04T00:00:00Z

@happened_today

2016-11-04T00:00:00Z

The Making of @lists_of_lists

2016-08-16T18:37:00Z

I thought I'd write something about how I made the bot @lists_of_lists, from start to finish. It's a relatively simple idea, so if you're interested in writing a bot for the first time, this might be a helpful guide.

I have a bit of an advantage for two reasons. First, I'm a professional programmer, and have been for many years. I know ruby very well, and it's the language I use to build most of my bots. Second, I wrote the library, that I use to make most of my bots, so it's basically adapted to my needs.

That said, if you are not a developer, but want to make a bot, you definitely can, but you should probably expect to have to learn a little bit about coding, and also a little bit about server management, because getting your bot to run consistently is sometimes the hardest part of the process.

The Idea

I spent a lot of time exploring wikipedia's data downloads when I was building gopherpedia. I knew that there were a lot of 'list of' pages, and that some of them were amusing and interesting. I decided to see if I could download a list of them so that I could play around with the data.

Wikipedia offers database dumps at https://dumps.wikimedia.org/. The main files here are gigantic XML files that represent the complete contents of the website. Depending on what you are interested in, some of these XML files are 12GB or larger. That's a single XML file! Parsing those is a real challenge.

Luckily, they offer a much smaller file of just page titles. I downloaded that file, and searched it for pages with the words 'list of' or 'lists of' in the title. I ended up running this a few times, so I combined it all into a single shell command that looks like this:

curl https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz > enwiki-latest-all-titles-in-ns0.gz && gzcat enwiki-latest-all-titles-in-ns0.gz | grep -i 'List_of\|Lists_of' > lists.txt

At that point, I had a text file that looked a little like this:

"List_of_the_works_of_Charles_Cottet_depicting_scenes_of_Brittany
"List_of_the_works_of_Charles_Cottet_depicting_scenes_of_Brittany"
'List_of_Mongolian_musical_instruments
(List_of_Toni,la_Chef_episodes)
/List_of_Parliament_of_Australia_Reports_on_Sport
1996_World_Monuments_Fund_List_of_Most_Endangered_Sites
1996_World_Monuments_Watch_List_of_Most_Endangered_Sites
1998_World_Monuments_Fund_List_of_Most_Endangered_Sites
1998_World_Monuments_Watch_List_of_Most_Endangered_Sites
2000_World_Monuments_Fund_List_of_Most_Endangered_Sites

Sit on it for a year

Once I had the data, I had no idea what I actually wanted to do with it. I thought about running it through a Markov chain tool, or maybe swapping out words randomly, adding adjectives and modifiers, etc, etc.

I couldn't really decide what to do, so I didn't do anything. I let the data sit around for a year or so.

Eventually, I decided to just keep it simple and make a bot that would simply iterate through the list of lists. I randomized the data to make it a little more interesting:

gshuf lists.txt > lists-random.txt

(gshuf is an OSX command to randomly shuffle the lines of a file. If it's not installed already, you can install it via brew install coreutils. On Linux, there's a command called shuf that does the exact same thing. I suspect it's pre-installed on most Linux systems. Thanks to @ckolderup for pointing all of this out!

Start The bot

I had the data, now I needed the bot. Amazingly, when I went to Twitter to register a new account, my first choice was available, so @lists_of_lists was born.

I made myself a directory to hold onto my bot files, and copied the data there. Then, I setup a Gemfile and got ready to install chatterbot

mkdir lists_of_lists

I made a Gemfile that looks like this:

source "https://rubygems.org"
gem "chatterbot", :git => "git://github.com/muffinista/chatterbot.git"

Then I ran bundle to install chatterbot.

Chatterbot has a script which will walk you through the process of setting up a Twitter bot. It will also create a template file for the bot, and setup your credentials file. I ran it!

NOTE I ran all of this while being logged into Twitter as the account for the bot.

bundle exec chatterbot-register

It prints out a message telling me what happens next:

Welcome to Chatterbot. Let's walk through the steps to get a bot running.

Hey, looks like you need to get an API key from Twitter before you can get started.

Have you already set up an app with Twitter? [Y/N]

I haven't setup an app yet, so I put 'N'

> N
OK, I can help with that!

Please hit enter, and I will send you to https://apps.twitter.com/app/new to start the process.
(If it doesn't work, you can open a browser and paste the URL in manually)

Hit Enter to continue.

The form looked a lot like this (they change this a lot):

Once you've filled out that form, Twitter will issue you some API keys. I copied those keys into chatterbot-register, which was waiting for the input:

Once you've filled out the app form, click on the 'Keys and Access Tokens' link


Paste the 'Consumer Key' here: 123456
Paste the 'Consumer Secret' here: abcdefg

Now it's time to authorize your bot!

Do you want to authorize a bot using the account that created the app? [Y/N]

I do want to authorize this account, so I say so:

> Y
OK, on the app page, you can click the 'Create my access token' button
to proceed.

I do that, then I paste the results:

Paste the 'Access Token' here: 123456

Paste the 'Access Token Secret' here: 45678

Hooray, now I have two files! lists_of_lists.rb is a template file for my bot. It lists a bunch of features of chatterbot and gives you something to work from. lists_of_lists.yml has the credentials for the bot, and will also track some other information needed to send out tweets.

My idea for the bot is pretty simple. Each time it runs, it should open up the file with all the lists in it, read the next one, and tweet it out.

The bot will need to keep track of which line it sent out last, and update that value every time. One of the features of chatterbot is that the YAML file which holds the configuration data is accessible to the bot, and is updated with any changes each time the bot is run. This means you can use it to track variables that you need to persist over time, such as the last index of a file that you used.

So I start with some ruby to handle all of that:

SOURCE = "lines.txt"

bot.config[:index] ||= 0

if ENV["FORCE_INDEX"]
  bot.config[:index] = ENV["FORCE_INDEX"].to_i
end
 
data = File.read(SOURCE).split(/\n/)

source = data[ bot.config[:index] ]
puts source

# the page title will have underscores in it, get rid of those
tweet_text = source.gsub(/_/, " ")

This code sets the index variable, opens the file "lines.txt", turns it into an array by splitting on newlines, and then reads the proper value from that array.

Make it Nicer

At this point, I could just tweet that value out like this:

tweet tweet_text

And be done. I decided that would be a little boring though, and I started to wonder about pulling an image from the wikipedia page for the list. Some lists have images on them, and they can be pretty funny.

Wikipedia has an API, and there are a few ruby libraries for accessing it. I decided to check out the official client since I had never used it before. My assumption was that I would need to parse out images from the source text, but it turns out that there is a method you can use to get a list of images! Anyway, here's that code

page = Wikipedia.find( source )

opts = {}

# check if there are any images
if page.image_urls && ! page.image_urls.empty?
  puts page.image_urls.inspect

  # pick an image at random
  image_url = filter_images(page.image_urls).sample
  
  puts image_url
  if image_url && image_url != ""
    # make a local copy of the image
    opts[:media] = save_to_tempfile(image_url)
  end
end

I added a simple method filter_images which rejects any SVG files:

def filter_images(list)
  list.reject { |l| l =~ /.svg$/ }
end

And a second method save_to_tempfile which makes a local copy of the image:

def save_to_tempfile(url)
  uri = URI.parse(url)
  ext = [".", uri.path.split(/\./).last].join("")

  dest = File.join "/tmp", Dir::Tmpname.make_tmpname(['list', ext], nil)

  puts "#{url} -> #{dest}"

  open(dest, 'wb') do |file|
    file << open(url).read
  end

  # if the image is too big, let's lower the quality a bit
  if File.size(dest) > 5_000_000
    `mogrify -quality 65% #{dest}`
  end

  dest
end

This method has one additional twist, which is that it checks the size of the downloaded file. If it's too large, it runs the ImageMagick command mogrify on it to drop the quality down.

At this point, I have the text of a tweet, a page object from the Wikiedpedia API library, and a hash that might have a file in it. I combine it all together and tweet it out:

output = [ tweet_title, page.fullurl ].join("\n")

begin
  tweet(output, opts)
rescue Exception => e
  puts e.inspect
end

Finally, I increment the index variable.

bot.config[:index] += 1

When the script is done running, this value will be updated in the YAML config file for the bot.

During this whole process, I ran the script a couple times. Chatterbot has a debug_mode command, which you can use to run a script without actually sending a tweet, which is pretty handy.

I'm a pretty messy coder, especially when I'm working on personal side projects, so I fixed a couple bugs, spent awhile cleaning up my junky code, etc, etc. Once I was happy with it, I uploaded my code to the server where I run my bots.

Then I needed to setup a cron job to run the bot every few hours. I decided to run the bot every two hours for starters (I might slow it down later), and for variery I run it at 2 minutes past the hour. This is what the job looks like:

2 */2 * * * . ~/.bash_profile; cd /var/stuff/lists_of_lists/; bundle exec ./lists_of_lists.rb >> tweets.log 2>&1

The first bit specifies when the job runs. The rest of it is the command that executes the bot. cron jobs usually run in a different environment then you get when you login to a server via SSH, so you need to explicitly load your environment, cd into the directory where the script is, and run the script. the >> tweets.log 2>&1 bit sends any output into the tweets.log file, which I can check for any errors/etc.

Anyway, that's about it! I've put the code on github -- please feel free to take it and adapt it to your needs!

@lists_of_lists

2016-08-05T00:00:00Z

Tweet Masker

2016-06-10T00:00:00Z

@kiki_flies_exe

2016-05-01T00:00:00Z

@cat_in_field

2016-03-07T00:00:00Z

@snowfall_exe

2015-12-14T00:00:00Z

@yulelogbot

2015-12-05T00:00:00Z

@wayback_exe

2015-10-15T00:00:00Z

@wayback_exe

@wayback_exe is a bot that loads old pages from the Wayback Machine and generates screenshots of the output in an old browser.

T-BannerThe James Marshall Homepage Jun 1997 https://t.co/ClD8MMNGgo pic.twitter.com/GfExw9p4tv
— Wayyyyyyback (@wayback_exe) October 20, 2015

Why?

The Wayback Machine is an incredible tool. It has archived over 23 petabtyes of data and grows by terabytes every day. The Wayback Machine launched just about 20 years ago, and I thought it would be interesting to write a bot displaying some of the oldest pages from the system.

By the way, if you like this bot, or if you like the internet, you should consider donating to the Internet Archive. This bot wouldn't be possible without their hard work.

Code

The bot is my first project using node.js. The code is pretty messy but I've put it on Github for people to take a look and experiment. Please fork it and make your own weird page scrapers!

@botgle

2015-07-01T00:00:00Z

@botgle is a bot that plays games of Boggle.

TIME FOR BOGGLE: Ｗ　Ｓ　Ｌ　Ｒ　Ａ　Ｉ　Ｉ　Ｎ　Ｇ　Ｂ　Ｗ　Ｓ　Ｍ　Ｓ　Ｙ　Ｙ　 🎶 💫 🐉
— BoggleBot (@Botgle) July 6, 2015

The Rules

The rules are pretty simple:

Just like in normal Boggle, there is a 4x4 grid of letters.
The goal is to find as many words as possible in eight minutes.
The game lasts for eight minutes, but the timer doesn't actually start until someone makes the first move.
Words must be built from adjacent tiles - the tiles must be touching horizontally, vertically, or diagonally.
Words must be at least three letters long.
Once someone plays a word, no one else can play it.
If you play a word that no one else has played, the bot will fav your tweet and send out a tweet with the words you found.

How To Play

It's easy! Just follow @botgle. There's four games a day, every six hours. When the game starts, tweet all the words you see in the grid at the bot. You can tweet more than one word at a time, and you can tweet as often as you like. The bot will reply with any valid words which haven't been played by another player. When the game is over, the bot will post the scores of everyone who played.

Notify Me Please!

If you would like to get DM notifications of upcoming games, DM 'NOTIFY' to @botgle, and it will send you a message 10 minutes before any games. If you DM it 'WARN', it will send you a message 1 minute before the game.

If you are getting notifications and they are annoying, DM 'STOP' to @botgle, and it will stop alerting you.

Scoring

3 and 4-letter words are worth 1 point.
5-letter words are worth 2 points.
6-letter words are worth 3 points.
7-letter words are worth 5 points.
Longer words are worth 11 points

Code

The code for @botgle is on github. A good chunk of the code for the board itself is based on earlier work here. There's a bunch of ruby code for Boggle solvers out there and I didn't want to reinvent the wheel. That said, any errors here are almost certainly mine.

The code is a mess and I'll be cleaning it up over the next few weeks.

Tweets by @Botgle

@kaleid_o_bot

2015-02-01T00:00:00Z

@kaleid_o_bot

@kaleid_o_bot is a bot that will take an image you send to it, and run it through an algorithm to make it look like you are seeing it through a kaleidoscope.

@muffinista here you go #northwest #12slices pic.twitter.com/akqJ7MpolK
— Kaleioscope (@kaleid_o_bot) December 23, 2014

You can send any image and it will tweet the results back to you. If you want, you can pass a few options:

animating: You can choose to animate the output or not. If you pass #animate the bot will animate the output to make a funky kaleidoscope effect. If you pass #noanimate the bot will return a static image. The whole animation process is definitely a work in progress, and could fail at any moment.
weighting: You can pass one of these options to specify which section of the image you would like to focus on:
- #northwest
- #north
- #northeast
- #east
- #southeast
- #south
- #southwest
- #west
- #center
Slices: You can specify how many slices you want. Basically the algorithm will make an image with a chunk of the source image mirrored this many times. Here's the options:
- #4slices
- #8slices
- #12slices
- #16slices
- #20slices
- #24slices

NOTE: If you don't pass an option, a random option will be chosen for you

Tweets by @kaleid_o_bot

Chatterbot: A Ruby Library for Twitter Bots

2015-01-01T00:00:00Z

Introducing Chatterbot, a ruby library for producing bots.

Features

It makes setting up Twitter OAuth permissions very easy. The included chatterbot-register script will walk you through authorization in a couple of steps and generate a skeleton script for your bot.
It has a very simple DSL for generating basic bots. Here's a basic but functional example for an actual bot:

It will run searches for you -- multiple searches if you want, and you can also check for responses to your tweets. This is the main functionality of all my bots so far -- search for a phrase or some keywords, and reply to those tweets in some interesting fashion, and then also potentially reply to any mentions of your name. You can do a lot with this sort of functionality on Twitter -- it's mostly what actual humans do on it all day long.
A very basic macro system -- if you reply to a tweet with your bot, and put "#USER#" in the reply string, it will replace that with the name of the user you're replying to.
A fairly simple, but extensible configuration system. If you just want to run a single bot, it will store your configuration data in a YAML file, and you never even need to look at it. But if you want to run multiple bots, you can setup a global config file to store several common parameters. You can also optionally store your configuration in a database.
If desired, you can log tweets to a database. This is handy for tracking your bots activity over time.
It has a blacklist system to keep you from annoying users who don't want to be annoyed, and also to make it easy to ignore tweets which have certain keywords in them.
It has a 'debug mode' so you can test the bot without sending out actual tweets.

Setting up a Bot

Make a Twitter Account

First thing you'll need to do is create an account for your bot on Twitter. That's the easy part.

Install Chatterbot

gem install chatterbot should do the trick.

Run the Generator

Chatterbot comes with a script named chatterbot-register which will handle two tasks -- it will authorize your bot with Twitter and it will generate a skeleton script, which you use to implement your actual bot.

Write your bot

Chatterbot has a very simple DSL inspired by Sinatra and Twibot, an earlier Twitter bot framework. Here's an example, based on @dr_rumack, an actual bot running on Twitter:

    require 'chatterbot/dsl'
    search("'surely you must be joking'") do |tweet|
      reply "@#{tweet_user(tweet)} I am serious, and don't call me Shirley!", tweet
    end

Or, you can create a bot object yourself, extend it if needed, and use it like so:

   bot = Chatterbot::Bot.new
   bot.search("'surely you must be joking'") do |tweet|
     bot.reply "@#{tweet_user(tweet)} I am serious, and don't call me Shirley!", tweet
   end

That's it!

Bots in Action

You can check out the bots I've written on my Twitter Bot page, and there's a bunch of relevant stuff tagged as 'twitter' on muffinlabs.

Got questions? Did I document something poorly? Contact me here or on Github and I'll see what I can do about that.

A Real River

2014-12-11T00:25:00Z

@ARealRiver is a Twitter bot that charts the course of a generative river via emoji. The course of the river is constant as it transitions between tweets, so you can scroll through 100s of tweets and watch the river expand and shrink, and meander back and forth, passing cities and forests and volcanoes and other scenery as it goes.

There were a lot of different inspirations for this bot. I was directly influenced by @katierosepipkin's @tiny_star_field, dungeon_bot by @jeffthompson_, as well as by accounts like @crashtxt and the #140art hash tag.

Another lingering inspiration was a book from the early 80s: Computer Spacegames from Usborne Publishing.

This book was one of several that introduced me to programming. You can get a look at it and many others like it here. It's full of source code for simple games written in BASIC. In particular, there's one called Death Valley.

This was a super-simple game that placed you in a canyon that probably looked a lot like this:

*            *
 *            *
  *            *
   *            *
    *            *
     *            *
   *            *
    *            *
  *            *
 *            *
*     X       *

Your ship is the X, and you need to run along the canyon for as long as possible. Good luck!

I spent years iterating on programs like this as a young programmer, all the way through high school. I would experiment with different output, different speeds, obstacles, etc. It's always stuck with me and ARealRiver is definitely inspired by my time with this code.

EarthRoverBot

2014-10-01T00:00:00Z

@EarthRoverBot status page

@EarthRoverBot is a simulation of a rover that travels the globe via Google Street View imagery. Basically, it stitches together a route by hopping from point to point, wherever there is valid street view data.

In April 2017 the bot completed its first trip from the Canada/US border in Maine, to the US/Mexico border in California, near San Diego. You can see a map of that trip if you like.

The bot is currently in eastern Russia, and is maybe driving to Portugal? I reserve the right to change my mind about that one.

Although the bot can move on its own, it also accepts commands via Twitter, and humans often help it to find paths through difficult situations. Without human input, the bot will make a move every 20 minutes.

The bot has some very simple navigation routines, and it will attempt to drive itself to its destination, but it will only travel to places that have Google Street Map data. Every 20 minutes it will pick a move to make, hopefully towards the destination.

But it's not very smart, so it could definitely use some help from humans! You can issue commands to the bot via Twitter to help it on its way. Here's a list of commands @EarthRoverBot accepts:

help - Asking for help will send you to this page.
status - Send this command to get the bot's position, bearing, and a picture of what it can see.
map - Are you lost? Send this command to get a Google Map URL of the bot's current location.
auto - make a move using the bot's autopilot
move x - Move the bot x meters. You can pass a negative value to go backwards
turn x - turn right x degrees. You can pass a negative value if you want.
left x - turn left x degrees
right x - turn right x degrees

Tweets by @EarthRoverBot

Also, to get around Twitter's duplicate tweet limitations, you can add extra words/characters to a command. For example, "@EarthRoverBot move 10" and "@EarthRoverBot move 10 please" will both work.

This map tracks where the rover is. It will update every 15 minutes or so.

US Prisons

2014-09-13T13:14:00Z

I launched @USPrisons on Twitter a few weeks ago. It will output every prison in the United States -- or at least, I think it's all of them. I found a website with the data, did a bunch of parsing and cleanup, and ended up with 4763 prisons. The bot should spend a year listing them all, along with a few stats, and a picture if possible.

If you're interested, I released the code that does the parsing on github.

StckMrktStatus - Providing Logical Explanations for the Stock Market

2014-04-21T00:00:00Z

I've always thought the stock market reports you hear on the news are fairly silly. "The Dow Jones was up x% because this or that happened." The people saying those things always sound smart and informed, but no one really has any idea why a stock goes up or down in value. So, I made a bot to do the same thing. @StckMrktStatus will pick a stock from the NASDAQ or Dow Jones, see how it is doing for the day, and then add a reason for the change. The reasons are pulled from tweets that have the word 'because' on them. It's pretty simple but seems to work nicely:

Tweets by @StckMrktStatus

The code is pretty simple, and I'll post it sometime soon (I'm working on a post about the code of my last few bots in general).

SpaceJamCheck: Space Jam website monitoring on Twitter

2014-01-09T00:00:00Z

People who have been online for awhile probably know that the website for Space Jam, a movie from 1996, is online still, and is essentially unchanged:

(If you don't know what I'm talking about, you can read about it here.)

At the end of 2010, someone noticed that the website was still online. Before I did a little research, I was convinced that people must have realized this before then, but Google suggests otherwise.

Anyway, here's an article that summarizes how it all happened, basically some Reddit user noticed, the word spread, and then it went viral on Twitter.

I haven't seen this mentioned anywhere, but according to the headers for the website, there were actually some modifications of some sort in 2005:

HEAD http://www2.warnerbros.com/spacejam/movie/jam.htm 200 OK Connection: close Date: Fri, 10 Jan 2014 02:12:09 GMT Accept-Ranges: bytes ETag: "89dfb-13c5-4027752a8ca80" Server: Apache Content-Length: 5061 Content-Type: text/html Last-Modified: Thu, 06 Oct 2005 15:10:18 GMT

It's possible this was just a server move or something like that, but it's interesting to think that someone actually did some maintenance of some sort on the site.

I enjoy visiting the site, especially when I get nostalgic for the early days of my work on the internet. There are so many projects which I've worked on over the years, and a lot of them are gone forever. It's nice to see one that has managed to survive.

Because I'm lazy, and like easy reassurance, I wrote a @SpaceJamStatus, a Twitter bot that will check on the status of the website every few hours and tweet out the status:

Tweets by @SpaceJamCheck

Furthermore, because I am apocalyptic, I wrote @spacejamisdown, a bot which checks the status of the website every few hours, and will only report if it's not online:

Tweets by @spacejamisdown

With a little luck, this bot won't tweet any time soon.

Finally, because I have a love of writing random libraries, I wrote the ruby gem spacejam, which is a pretty simple Ruby library you can use to check on the status of any website. It can do tests against expected response codes, the body of a page, etc. It's pretty simple, but it's good enough to check on the status of the Space Jam website.

Each Town - Listing All Towns in America on Twitter

2013-10-16T00:00:00Z

A week or two ago I launched @eachtown on Twitter. It will spend the next couple years tweeting the name and location of every populated place in America, in alphabetical order.

Tweets by @eachtown

A couple of years ago, I spent a lot of time fiddling with the USGS database of Geographic Names. It's a cool set of data and I've often thought of doing more with it. I was inspired by @everyword to create something similar, and decided to create a bot which iterates through every populated place in America, and tweets the name, and a link to a Google Map for the location. I enjoy the context you get from having the ability to look at a place. Not every location in the database is a city or even a town. There's mobile home parks, condominiums, etc. Seeing them on the map gives you a sense of the fact that these places are real, and gives them a little context.

**[Agnew Mobile Home Park, WA][agnew]**

It's a pretty simple bot, and I'll post the source code at some point once I clean it up a little.

Gopherpedia - The Free Encyclopedia via gopher

2013-06-14T00:00:00Z

My last release for Project Dump week is Gopherpedia -- a mirror of Wikipedia in gopherspace. If you happen to have a gopher client, you can see it at gopherpedia.com on port 70. Otherwise, you can browse to gopherpedia.com and view it via a web proxy.

A couple of years ago, I landed on the idea of a gopher interface to Wikipedia. Originally it was probably a joke, but it stuck with me. So one day I registered a domain name and got to work. The first thing I needed to do was build a gopher server, because none of the currently available options were up to the task. So I built Gopher2000. Then, I quickly realized that the current gopher proxies weren't any good either, so I built GoPHPer. Once both of those were written (well over a year ago), it didn't seem like there was much left to be done -- gopherpedia should've been ready to launch.

But I hadn't reckoned on the challenges of churning through a database dump of Wikipedia.

Wikipedia is very open. They have an API which you can use to search and query documents, and they provide downloadable archives of their entire collection of databases. They encourage you to download these, mirror them, etc.

My first implementation of gopherpedia used the API. This worked well, but had two problems. First, it was a little slow, since it needed to query a remote server for every request. Second, Wikipedia prohibits using the API this way - if you want to make a mirror of their website, they want you to download an archive and use that, so their servers aren't overloaded.

So I downloaded a dump of their database, which is a single 9GB compressed XML file. Nine. Gigabytes. Compressed. A single file.

Then a took the opportunity to learn about streaming XML Parsers. Basically I wrote a parser script that parsed the file while it was reading it, as opposed to reading the whole thing into memory at once, which was clearly impossible. The script splits up wikipedia entries and stores them as flat text files. Running that script took a couple days on my extremely cheap Dreamhost server -- that's right, I have a gopher server hosted on Dreamhost.

So, when someone requests a page, the gopher server reads that file, does some parsing, and returns the result as a gopher query. Sounds simple, right? Not quite, because parsing the contents of a wikipedia entry is also a mess. It's part wikitext, part HTML, and there's plenty of places where both are broken. If I was just outputting HTML, I could probably get away with it. But since this is Gopher I really needed to format the results as plain text. I spent a while writing an incredibly messy parser, and the imperfect results are what you see on gopherpedia now. Sorry for all the flaws.

Anyway, this was a fun project, and it occupied a pleasant chunk of my spare time over the last year or two, but it's time to release it to the wild. Unless I'm mistaken, this is now the largest gopher site in existence. There are about 4.2 million pages on gopherpedia, totaling somewhere over 10GB of data.

Here's my favorite page on the site -- the gopherified wikipedia entry for Gopher.

Please note, this is in extreme beta, and is likely to break, just let me know if you have any problems. Enjoy!

Gophper - A Modern Gopher Proxy for the Modern Age

2013-06-13T00:00:00Z

As I mentioned yesterday, building Gopher applications is fun, but using gopherspace is actually pretty challenging unless you're a die-hard throwback geek. I have a super-secret gopher project (to be revealed tomorrow), but it's pretty useless if no one can actually see it. Sure, I could write up a blog post about how to download a gopher client, etc, etc, but that's just dumb.

There's a few gopher proxies out there -- primarily floodgap and meulie -- these are websites which you can use to browse gopher servers. But there's a few problems with these proxies. First, they're a little clunky. They're handy tools, but they're not really attractive, and the HTML they output is pretty old-fashioned. And most importantly, neither one is open-source.

I wanted a simple gopher proxy, using modern web standards, that was open-source and easy to install. So, I wrote gophper. You can see it in action at gopher.muffinlabs.com.

Here's the details:

It runs on PHP using Slim, which is a nifty lightweight application framework.
It caches requests for faster response times.
All of the rendering happens in the browser, which means someone could easily write a different backend.
It has a wacky theme switcher, so you can choose between a nice modern look, or an old-school monochrome CRT look.
If the user accesses a binary file, they can download it. If they click on an image, they can see it in the browser.
It can be integrated with Google Analytics.
You can restrict it to a single gopher server, so you can integrate it into your project without any fears of someone using your proxy for naughty tricks.

It's still a little rough around the edges, but it definitely works. I would love to see it used all over the place. But tomorrow I'll reveal where I'm using it.

Gopher2000 - A Modern Gopher Server

2013-06-12T00:00:00Z

I'm old enough that the Internet basically didn't exist for anyone other than a college student or scientist when I was a teenager, but by the time I graduated from college it was everywhere. My first access to the Internet was via a friend-of-a-friend-of-a-friend's borrowed account on a Clark University server while I was in high school. I still remember the password.

I was nerdy enough to be dialing into BBSes at this point, and I even managed to communicate over some distance in discussion groups via FIDOnet, but that was a pretty pale comparison to undiscovered wilderness of the Internet. Most of my knowledge of the Internet came from reading The Cuckoo's Egg. When I finally had real access, naturally I spent most of my time playing on Multi-User Dungeons like DartMud and EOTL -- and somehow they both still exist. At the time, everything was text-based, so welcome screens like these were pretty amazing.

My friends and I would learn about interesting FTP servers, and we tried to download interesting documents and applications from them, but we barely knew any commands to use, and the files were always in weird archive formats that we didn't understand at the time, and of course you couldn't Google it.

So while it was amazing to be online, in a lot of ways it was very limiting. Until I learned about Gopher.

If you aren't familiar with it, Gopher is a very simple protocol for browsing text documents on the Internet. It doesn't sound like much, but before HTTP and the World-Wide Web, it was a revelation. There was data out there, and you could get it, if only you knew the hostname. Luckily the first few years of Wired would post interesting gopher address you could visit. Here's their description of gopher from the 'Net Surf' column of one of their early issues:

Is There a Rodent In Your Future?

If you surf the Internet and haven't heard of gopher, you're probably reefed in the backwaters somewhere. Gopher is one of cyberspace's hidden gems - the application even employs that buzz-term of computing, "client- server architecture."

Specifically, gopher is an information gathering tool that offers a smooth, menu-driven way to traverse international "gopherspace" - which these days literally means several hundred servers worldwide, offering text (from the CIA Fact Book to the Bible), computer programs, audio, still images, and even movie clips. Gopher provides a seamless, "hidden programming" interface with which you can transfer files, browse databases, and telnet to sites around the globe, simply and easily. For example, gopher the University of Wisconsin-Parkside (gopher.uwp.edu) and you'll find the music server: a collection of song lyrics, discographies and sound files from a variety of selected tunes.

Another destination, the ArchiGopher at the University of Michigan, contains photographed examples of French architecture and Ann Arbor campus buildings, as well as scanned copies of paintings by Kandinsky. Via gopher, academics can search for employment while students can seek information on various campuses. But there is a catch: To access these goodies, you must have direct access to the Internet (with client software), or be able to remotely login to Net servers that offer that capability. (The software is publicly available via ftp at boombox.micro.umn.edu, in directory pub/gopher.) Then it's as simple as typing "gopher" and the server address (with proper command accompaniment, such as "%" for Unix clients).

Gopher, the helpful rodent, was initially born of programmers at the University of Minnesota (the Gopher State) in an effort to link and search disparate, specialized computer systems on campus. Later offered up to the Net, most public gopher servers have sprung up only within the last year, while new rodents appear to be tunneling fresh soil almost daily. This little tool is a definite nugget in the ore of the Internet, rich with information. - Tom Zillner

We're Listing, Captain Every two weeks, surfers anxiously await the "Yanoff List." Compiled by Scott Yanoff, a computer science student at the University of Wisconsin (yanoff@csd4.csd.uwm.edu) the list offers concise descriptions of helpful sites around the Net. Started in 1991 as a personal log with only six entries, public distribution of The List brought a flood of suggestions: Topics now range from philosophy to amateur radio, astronomy to games. Yanoff also documents locations for such research essentials as Archie, WAIS, Netfind and World Wide Web (WWW or W3). (Internet Hunt participants remember this one ) Cut over to USENET group alt.internet.services, or ftp or gopher csd4.csd.uwm.edu (available in /pub/inet.services.txt). Don't leave cyberspace without it.

I'll Gopher That Known also as the Whole Earth 'Lectronic Link, this particular gopher shreds an info-tube It offers access to a host of electronic magazines, an SF arena featuring input from well-known cybernauts such as Bruce Sterling, as well as all the stuff you'd expect from old (and young) hippies. You'll find text from some of the major, progressive magazines, help files for traversing the big-bad-networks, the online Factsheet Five, art world calls for action, and lots more edgy stuff to gnaw on. All in an easily navigable, menu-driven environment that won't flatten out on you. This service is provided by the Well, and can be accessed at gopher.well.sf.ca.us. E-mail (gopher@well.sf.ca.us) with any questions.

Wow, has the tone of that magazine changed over 20 years.

Beyond sharing hostnames with your friends, if you knew about the Gopher search engine Veronica, then you could find all sorts of stuff. I first learned about Veronica from the teacher who ran our high school Model United Nations club. He showed us how to use it to download copies of UN Resolutions and other documents that would've been very hard to get otherwise. They still list a couple gopher servers on their website, but unfortunately I can't find one that's active anymore.

Of course, Gopher lost out to HTTP. There were some regrettable licensing decisions that scared away a lot of interest, and HTTP was always open. And even though in some ways it has never fulfilled this promise, HTTP was all about collaborative sharing and even editing of documents, something that was lacking from Gopher.

Today there's still a couple hundred gopher servers out there, with maybe a million or two pages on them, which is obviously nothing compared to the incredible mass of data you can access via your browser. But I still have a fondness for Gopher, for a few reasons. First, because it was part of my formative years on the internet. Second, because it has an important place in the history of the internet, and given how ephemeral digital history is, it's easy to lose track of this. And finally, because it is super-hackable.

And since it's so hackable, I went ahead and wrote a modern, fully-functional Gopher server in Ruby: Gopher2000

     _____             _                 _____  _____  _____  _____
    |  __ \           | |               / __  \|  _  ||  _  ||  _  |
    | |  \/ ___  _ __ | |__   ___ _ __  `' / /'| |/' || |/' || |/' |
    | | __ / _ \| '_ \| '_ \ / _ \ '__|   / /  |  /| ||  /| ||  /| |
    | |_\ \ (_) | |_) | | | |  __/ |    ./ /___\ |_/ /\ |_/ /\ |_/ /
     \____/\___/| .__/|_| |_|\___|_|    \_____/ \___/  \___/  \___/
                | |
                |_|

There are a few gopher server frameworks out there, but most of them are lacking in one way or another. They're focused on delivering static pages, or they force you to use weird methods of putting together your menus. There's even a few rough Ruby scripts out there for serving gopher requests, but they are all either so old that they don't work with a modern Ruby, or the code is lacking in one way or another.

I wanted to build the best Gopher server imaginable, using everything I've learned in my career writing software. I wanted something simple, with an easy, flexible syntax that stays out of your way. For example, this is the code for a working gopher application:

Gopher2000 is inspired by Sinatra, an awesome web framework also built in Ruby. Reviewing the code for Sinatra (and reading the book Sinatra: Up and Running) has inspired me and educated me about code more than any other place in recent memory.

Here's a few nice things about Gopher2000:

Simple, Sintra-inspired routing and templating DSL. It's easy to define routes, build menus, accept input, etc.
Dynamic routing of requests via named parameters on request paths, and dynamic responses -- so you can have a dynamic, interactive gopher site.
It's easy to mount directories and serve up files if you're into that.
Integrated logging and stats.
Lots of helper methods for formatting output as prettily as possible

I wrote most of Gopher2000 well over a year ago, and it's been functional for a long while, but I never publicized it until now.

Anyway, the real reason I wrote Gopher2000 is to help with the top-secret gopher project which I will announce in a couple days. Frankly, it's going to blow the fucking roof off of gopherspace. The only problem is that no one will be able to see it -- Gopher support has been stripped from all major web browsers over the years, and I'm guessing that you don't have a gopher client handy. My next post will talk about how I handled that problem.

Just The Comments

2013-06-11T00:00:00Z

"Don't read the bottom half of the internet" is a frequently repeated phrase as you travel on the web -- meaning, avoid the comments on a blog post or article whenever possible. In certain realms of the internet, it's basically a truism.

There's arguments out there both for and against it. Many bloggers have gone so far as to disable commenting entirely on their sites. Other bloggers find a lot of value in comments. But the notion of skipping out on a huge chunk of the internet really interested me.

It also raises an obvious question - instead of skipping it entirely, what if you only read the bottom half of the internet? Well, justthecomments.com is your chance to do that, all in one place, and totally removed from the context of pesky blog posts.

The site pulls comments from the all of the websites using the Disqus commenting system via their API in real time. You can read a bunch, then click the 'load more comments' button to see some more. There's essentially no context here. The comments come from all over the web, and have no relation to each other. I wanted to provide a link to the source content, but it's actually sort of involved with the Disqus API, and in hindsight, reading comments with no context whatsoever is kind of amazing. Here's some choice comments I saw over the course of a few minutes recently:

Yep! It is funny, I was required to read it Freshmen year of college... the Progressive professor tried to say that it was a roadmap to what Reagan was trying to impose upon the people... I could see that nothing in the book came close to Reagan's policies (but they did resemble what Carter had been saying and doing)... it only reinforced my belief that Democrats will lie about who and what they were. -- louisiana_mom

Um, no it isn't. -- BRAND NEWCROW

I mean, I've seen Hara. Especially when she was truly stripped down in those ChumChum Churi soju commercials. I'm not saying that she has to have the booty of Nicki Minaj/Kim Kardasian, but her body isn't something amazing and heaven-sent to me either. -- Nova_REMIX

Leftists are the most disgusting form of humanity that has been burped up by the oceans. To try to leverage and manipulate individuals under the guise of being in camp of the poor and unempowered is their great evil. But, as with everyone, I hope you have a long and meaningful life and that at some point you stop trying to foist the misanthropic political philosphy that exists at the core of the Left. There, how's that for positive. -- Alfa Spider

post a time and place, wear a purple shirt, and one of us will be glad to meat you face to face and DEBATE further -- A Z

YOU ARE HOPELESS ans long as you continue to BLAME the GOP or DEMOCRATS..you will NEVER understand what the problem is. you are a_MORON -- Buddy

Way to be classy. No valid point to make, namecall. -- Chiefpr

As you read the comments, it's clear that many of them would probably seem valid if you saw them in the context where they were written. But in isolation, they are sometimes incredibly offensive. It makes me think of moments of crowd silence where a crowd goes silent except for the one person who blurts out something incredibly inappropriate.

Some people like to be really immature, and sometimes very cruel, on the Internet. Here's an amazing story from someone who contacted a long-time troll on their website in order to learn more about him. They have an amazing conversation. And the troll is totally unlike his online persona.

I Forgot My Age: A Special Helper

2013-06-10T00:00:00Z

As we age, and stop remembering our exact age, down to the month/day, and as other concerns begin to crowd into our brains, it can be easy to forget something as essential as 'How old am I?' Am I 37, or 38? How old was I on my last birthday? And so on.

To help with this serious problem, I launched iforgotmyage.com, a website where you can enter your birthday and learn how old you are. Of course it sounds dumb when you read that, but there's more! You can also learn how many months, weeks, days, hours, minutes, and even seconds old you are. Furthermore, the site sets a cookie so that on return trips you can check your age quickly, and there's even a convenient bookmark you can store to check your age. Finally, if you visit on your birthday, you get a special surprise.

Enjoy!

WTFlevel.com: Real-Time Tracking of Swearing on Twitter

2012-11-05T00:00:00Z

Today I'm launching WTFlevel.com, a website that tracks the rate and magnitude of swearing on Twitter, and displays the data in real-time with a couple of snazzy dynamic charts. I wanted to launch it in time for the election, and I just made it.

PREDICTION there will be a lot of swearing tomorrow and the next day on Twitter.

I have worked on a number of projects with Twitter - there are a handful of bots that mostly entertain people, and a library devoted to building them, along with a few other random unpublished/discarded projects. Anyway, Twitter can be fairly annoying, and I doubt their API will last in its current form for too much longer, but it's still fun to work with their data.

Over time I noticed that people on Twitter swear a lot. I started wondering - do people swear more online than they do in real life? And could you identify trends or useful information in the rates of cursing on Twitter? WTFLevel.com is an attempt to begin to answer those questions.

Using the Twitter Streaming API, I scan tweets for a collection a swear words and other curse-like expressions. I calculate two values from that data: the rate of tweets which contain swears to those that do not contain swears, and also the magnitude of sweariness in those tweets. For example, a tweet with more swears in it has a higher magnitude than one which only has one swear in it.

For fun, I invented a threat level scale, basically a spoof of the Homeland Security threat level, since the notion of color-coded threat levels hasn't been sufficiently mocked yet.

I also tried to keep in mind the DEFCON system which always gave me chills during my childhood. And of course, who could forget this scene from War Games:

The level is assigned according to the current rate of swearing, with a little math tossed in to predict if the rate is increasing or decreasing. The colored bars displayed on the graph correspond to the levels.

I made a decision when I started the project to only look at tweets that are reported as being in English, and to only look for English swear words. This meant that I couldn't get a really good idea of the global swearing status, but I don't really have the knowledge to implement a decent system for tracking swears in other languages. That said, I am really amazed to see that people swear a lot more what is roughly the evening hours in America.

I expected to see a more constant level of swearing through the day. It's not like there aren't reasons to swear in the morning or something. So while I worked on the website, I spent a little time researching the use of profanity in real life, to get an idea of how it compares to online usage. According to Wikipedia's article on Profanity:

Analyses of recorded conversations reveal that roughly 80–90 spoken words each day – 0.5% to 0.7% of all words – are swear words, with usage varying from between 0% to 3.4%. In comparison, first-person plural pronouns (we, us, our) make up 1% of spoken words.

Over two months of monitoring, I found that about 6.97% of tweets had a swear in them. I need to work out the word count for that, but it would seem to be roughly comparable to this analysis. By the way, the article cited by Wikipedia is fascinating.

But I find it really interesting that there's an evening peak in the data. Since we're measuring the rate here, and not the totals, I expected swearing to be at least somewhat constant -- it didn't seem like there would be a reason for there to be fewer sweary tweets in the morning as opposed to the evening. I need to do a little digging into the data to see if I can figure out if there's anything obvious that can explain what is happening here. I might also change the output a bit -- it's interesting to see the current rate of swearing, but it might also be interesting to know how much higher/lower it is than it usually is for the given time of day.

If you like to follow computer programs on Twitter, you can follow @WTFLevel, and get notifications whenever shit blows up or calms down.

Tweets by @WTFLevel

Technical Notes

If you care about how things like this are implemented, then you can check out the WTFlevel implementation details.

Crossfire

2012-01-01T00:00:00Z

Here is Crossfire, my latest game project.

The original Crossfire is an 8-bit game published in the early 80s. It's one of the first memories I have of plucking away at our old Apple ][. This game is my attempt to recreate it, making a couple things easier but also hopefully a few things more interesting as well.

I used Flixel to build the game. It's a great library.

Although the game plays well enough, I'm not entirely happy with it. The balance could use some work, and it would be interesting to have some other modes.

Anyway, here's the game.

Public Art of Amazon Reviews

2011-11-29T00:00:00Z

The art and humor from the UC Davis pepper spray incident has certainly made the most of an otherwise awful event. There's no shortage of photoshopped photos:

A lot of people have helpfully reviewed a variant of the pepper spray on Amazon.

I casually used this product to try to disperse a small band of non-violent campers who had locked their arms together. Although initially it seemed to be effective, it took two applications! The worst part is that the next day they multiplied exponentially! Now what?

There's a long history of subversive product reviews on Amazon, and it is a weird and growing form of participatory public art. Way back in 1999, mock reviews of Monica Lewinksky's tell-all book made the news. Back then, there were a lot of questions about how it was even possible for Amazon to let these reviews appear on their site. Shouldn't they be doing a better job of scanning them and removing 'unacceptable content'? But people continued messing with the system, and Amazon was forced to stop anonymous reviews and they probably had to implement some other protections as well.

The best reviews of this genre on Amazon are without a doubt all from Family Circus books. This is where there first concerted effort to subvert the review system started, and I think it is where the best work lies. For example, check out the reviews for What Does This Say?, a collection published in 1995:

Proustian introspection with Munch's visual conundrums, July 29, 2002

Yeats once wrote, "None other knows what pleasures man/At table or in bed." Bil Keane, however, seems to have found in his latest 'Family Circus' opus a treasure-chest of pleasures for each and all of us. There are some who chafe at the seeming repetitive themes within Keane's major works; I would respectfully submit that all great stories are about life and death, love and loss, fear and triumph. If not Keane, then so go Shakespeare, Lewis Carroll, Sor Juana Inez de la Cruz and Callimachus, too, for good measure. It is not originality that spawns thought and wonderment; it is the vessels of those themes (Billy, Grandma, Barfy, PJ) that inspire and enlighten.

Keane, as carrier of these vessels, reminds us of a truth so eloquently immortalized by Ralph Waldo Emerson: "Some books leave us free and some books make us free." In 'What Does This Say', it is clear that the tome achieves the latter, with gusto and aplomb.

Happiness, November 10, 1999

There is a certain sadness one feels in remembering happy times: turning over the last page of a good novel, and reflecting over the wonders we have just experienced, the characters who have become our friends; discovering old pictures, seeing ourselves in the halcyon throes of youth, silly smiles on our innocent faces; the plangent last notes of a Chopin nocturne, the theme, growing softer and softer now, floating across the room to rest against our face like the rhythmic breaths of a peaceful, sleeping lover. I don't know how: but Keane captures this feeling, this happy sadness - "Oh heavy lightness," as Shakespeare put it. Billy romps around the yard. He runs all over town. His parents are in love. His family is love with itself, each unto each. Can our lives ever be like this? Perhaps not, but we can watch, watch ever single day, and wrap ourself in that happy sadness. And maybe forget, if only for a little while, the way our lives really are, the way they have to be: our heavy lightness. Thanks, Bil Keane, for that, and thanks to Amazon for letting people express themselves. Thank you all.

There is a long history of mocking Family Circus that predates the web. Dysfunctional Family Circus was originally a series of small booklets published in the late 80s/early 90s in San Jose. Apparently they managed to remain anonymous until recently, when they went public after the death of Bil Keane. They succeeded largely because they were anonymous -- it's hard to shut down a publishing operation when you can't find the publishers:

When we had boxes of professional looking 12- or 24-pagers, we left handfuls of them of them in public places around San Jose and the valley. Our influence was Jack Chick, the Chino-based comic-book evangelist whose millions of free pamphlets still turn up like lint at the Laundromat...

We knew what we were doing was semilegal, if that. We weren't just skirting certain sacred rules of copyright, we were making jokes about always uneasy subjects like molestation and incest. For some odd reason, this is the first direction a nihilist humorist takes when disfiguring cartoons about a blameless family.

In an 1999 article for Gettingit.com, David Cassel interviewed Seth Friedman, then editor of the zine roundup Factsheet Five. Friedman said he expected lawyers coming out of the woodwork when he saw the Dysfunctional Family Circus booklets: "We were kind of surprised at the time to hear that there was no legal action coming down. I think the anonymity of it really helped."

The booklets were published for a couple years until the creators moved on to other things. Then in the mid 90s someone else started up a Dysfunctional Family Circus website (that link is to an archive of the site). It ran for a couple years, and people submitted captions for 500 panels. Eventually it was shut down by a request from King Syndicates -- and an apparently amicable phone call between the person running the website and Bil Keane himself. A lot of people were really angry that DFC agreed to shutdown, since it seemed like an acceptable form of parody. Plenty of archives are still available on the web.

Ostensibly, Amazon expects reviews to be pertinent to the product, but any filtering they do is either very basic vulgarity blocking, or it is based on actual requests from the manufacturer of the product. So some of these reviews have been removed, but others have lived on, as seen in this reply to a snarky review of I Had A Frightmare!

As wonderful as this review is, it makes me sad to think about how similarly sarcastic reviews used to be deleted by Amazon--about this very book, in fact -- when some of us wrote them in the early 2000s. But, hey, yours is better than mine was anyway, so more power to you.

The world of Amazon product reviews is its own very bizarre ecosystem. As you flip through them, eventually you find the Top Reviewers page. And you'll find Harriet Klausner, who reviewed 24 books today -- so far! Her grand total for reviews is almost 26,000. She has been profiled in the Wall Street Journal, was listed as one of the "top 15 web generation's movers and shakers" in Time, and there is an entire blog devoted to attacking her reviews -- which seems fair, since she's reading books at a ridiculous rate, and quite possibly plagiarizing reviews.

You have to wonder why anyone would become a prolific reviewer on Amazon. It seems like the main motive is to become a member of the invite-only Amazon Vine program, where Amazon sends you a couple of items you choose each month in return for your reviews of those products. The program has been criticized all over the place for probably being a little too shadowy and underhanded.

Here's some other Family Circus books with good reviews:

Pick Up What Things?
Daddy's Cap Is on Backwards
What Does This Say?, archived elsewhere.
Some copies of reviews snagged before they were removed from Amazon.
An Archive of Family Circus Reviews from all over the internet.

There's also funny reviews on Amazon for speaker cables, the famous Three Wolf Moon T-Shirt, a book called Hgiyiyi (written by jjjj), and don't forget Birth Control is Sinful in the Christian Marriages and also Robbing God of Priesthood Children!!.

You also have to wonder why Amazon allows such bullshit products to remain on their site. The speaker cables are particularly heinous, since no one is going to spend $8500 on a cable, especially after reading the mocking commentary. I assume that Amazon figures the funny fake reviews are driving as much traffic as the good actual reviews.

In a twist on this idea, here's the story of people subverting a McDonalds question and answer site in the UK with some pretty awesome questions.

Namey - A Random Name Generator

2011-11-11T00:00:00Z

Recently, I had an idea for a project where I wanted to be able to create random names for people on the fly. It's a fictional schedule for a fictional cable channel, the Not Lifetime Movie Network. Some of the movie titles needed random names. I dug around, and found a few name generators out there, but none of them had an API, or available source code, so I ended up making my own. This quickly turned into a classic project -- I spent maybe an hour or two writing some code to generate a list of fake movies, and ten times longer coming up with a generic library for random name generation. it went something like this:

But when I was done, I had Namey -- a website where you can quickly generate some random names, as well as an underlying library written in ruby. It uses files generated by the US Census Bureau for the 1990 census to generate whatever sort of name you would like. You can pick a gender, specify if you want a last name or not, and if you would like a common name, rare name, etc.

The data itself looks a lot like this (in fact, this is the data):

JAMES 3.318 3.318 1
JOHN 3.271 6.589 2
ROBERT 3.143 9.732 3
MICHAEL 2.629 12.361 4
WILLIAM 2.451 14.812 5
DAVID 2.363 17.176 6
RICHARD 1.703 18.878 7
CHARLES 1.523 20.401 8
JOSEPH 1.404 21.805 9
THOMAS 1.380 23.185 10

So, the government is big on ALL CAPS. Also, there's no punctuation, so O'Brian is OBRIAN. I've done a bunch of massaging to the data - names are mostly in proper case, with apostrophes where it's fairly obvious, but I'm sure I missed some.

For nerds, the code is open source and available on github, so you can fork it and play around as much as you want. As I worked on this project, I developed certain goals:

Write a decent Ruby gem to generate random names -- this is the bulk of the project
Build a website which can use it -- see http://namey.muffinlabs.com/
Play with Twitter's bootstrap library -- I built the website with it. It's pretty cool!
Generate a Javascript API layer in between. If you wanted to, you could pull the JS onto your own site and generate random names. Still working on this part.

In short, by the time I'm done, I intend to have a ridiculously overbuilt system for generating random names.

Five Pawns, a Simple Chess Game

2010-12-29T00:00:00Z

At the beginning of this year, I released Four Pawns, a very basic strategy game. The idea is simple -- on a 4x4 chess board, two players each have four pawns, and you win by either getting a pawn to the other side of the board, eliminating all the other players pieces, or blocking them from moving.

The concept is simple enough, and I wrote a decent AI system for it, but it was relatively easy to win the game, because I limited the amount of processing per turn that the computer would do to pick a move. As a result, the computer wasn't able to look as far ahead as a human player might, and didn't win as much.

The general solution to this is to precompute a couple million boards, and store the results. Then, the AI just looks it up. The AI gets a lot smaller, and also a lot faster. So, I wrote some code to do this, outputted it in a way that made a file that wasn't too huge, and updated the game.

That makes the 4x4 version of the game virtually impossible to win for a human. Basically every move has been computed, and the computer will kick you ass over and over again.

My real plan the whole time was to make a 5x5 version of the game, but Five Pawns failed because the AI was even more limited than in Four Pawns, so it was pretty easy to beat. But now, with a couple bazillion moves computed, it's a lot smarter. Here goes:

UPDATE: Code available on github

Today in History on Twitter

2010-10-22T00:00:00Z

With my copious amounts of free seconds, I've written another Twitter bot. happened_today is operating differently from my other bots, which all reply to keywords posted by other users. Instead it tweets an event from history every hour or so, as my own personal "Today in History" Twitter bot.

The data is scraped from Wikipedia, and I wrote a Really Simple History API to allow general access. The API outputs JSON, which I then slurp into the bot, and spit out random events over the day. It's pretty crude, but it works.

Most Common Town/City Names in the US

2010-08-19T00:00:00Z

Here's the most common town names, again from the US Board on Geographic Names:

Its sort of amazing that the most common name is Midway, and Five Points isn't far behind. And a bunch of the rest are about hope or second chances.

+-----------------+-------+
| name            | tally |
+-----------------+-------+
| Midway          |   218 |
| Fairview        |   213 |
| Oak Grove       |   169 |
| Five Points     |   150 |
| Riverside       |   130 |
| Pleasant Hill   |   124 |
| Mount Pleasant  |   119 |
| Bethel          |   111 |
| Centerville     |   109 |
| New Hope        |   108 |
| Liberty         |    98 |
| Oakland         |    98 |
| Union           |    97 |
| Pleasant Valley |    97 |
| Shady Grove     |    94 |
| Pine Grove      |    92 |
| Salem           |    92 |
| Greenwood       |    91 |
| Pleasant Grove  |    89 |
| Forest Hills    |    89 |
| Oak Hill        |    88 |
| Georgetown      |    86 |
| Lakeview        |    85 |
| Shiloh          |    84 |
| Glendale        |    81 |
| Lakewood        |    80 |
| Concord         |    79 |
| Cedar Grove     |    78 |
| Highland Park   |    77 |
+-----------------+-------+

Montague - Paper or Plastic?

2010-01-01T00:00:00Z

Hey Montague, wondering what goes out on the curb this week? I can only ever find out by looking at what my neighbors have out. At some point I added the info to my calendar but that gets tiresome fast enough. So, I decided to write a quickie website to do the work for me. By the time I was done, I found the official schedule, but it was utterly buried, and the website is easy to refer to, and should work next year and any year after as well. The source code is available on github.