<h1><a href="http://muffinlabs.com/2023/03/09/feedsin-space/">feedsin.space -- RSS feeds on the fediverse</a></h1>
<p><em>Colin Mitchell, 2023-03-09</em></p>
<p>Introducing <a href="https://feedsin.space/">feedsin.space</a>, a
service for generating Fediverse accounts that post content from RSS
feeds.</p>
<p>Something that I've learned over the years of
running <a href="https://botsin.space">botsin.space</a> is that a lot
of people want to be able to publish RSS feeds to Mastodon. A few
months ago, I decided to implement a service to run alongside
botsin.space that people could use to set up accounts for RSS feeds,
without going through the work of creating a full Mastodon account on
botsin.space.</p>
<p>Using feedsin.space is pretty straightforward. First, you need to
authenticate with the website by sending a message from your Mastodon
account
to <a href="https://feedsin.space/feed/admin">@admin@feedsin.space</a>
with the word "help". You'll get a response with a link you can click
on to authenticate with the website. Then, you can create an account
on feedsin.space by specifying a username for the account and the RSS
feed you want to follow. Assuming everything looks good, an account
will be created at @username@feedsin.space, which you can follow from
your Mastodon account; any time the RSS feed updates, the post will
show up in your timeline.</p>
<p>I've added a few features to the service beyond the basics, including:</p>
<ul>
<li>You can set the visibility of posts to public, unlisted, or
followers-only, or opt to receive direct messages from the service to
keep your feed private</li>
<li>The ability to add a content warning to posts</li>
<li>An optional hashtag for posts</li>
<li>A setting for allowing/disallowing search engine indexing</li>
<li>Embedded audio from podcasts (this isn't working as well as I want)</li>
</ul>
<p>In the future, I am thinking about adding a directory of feeds
available on the service, and I have some other things on my todo
list.</p>
<h2 id="the-code">The Code</h2>
<p>feedsin.space is written in Rust, and the code is
available <a href="https://github.com/muffinista/rust-rss-to-fedi">on
Github</a>. There are a few libraries I used, and other ActivityPub
websites/projects that I frequently referenced to figure out what the
heck I was doing. The ActivityPub protocol can be pretty intimidating
to learn, so I spent a lot of time looking at the code for these
projects:</p>
<ul>
<li>The <a href="https://www.w3.org/TR/2018/REC-activitypub-20180123/">ActivityPub
spec</a></li>
<li>The
rust <a href="https://git.asonix.dog/asonix/activitystreams">ActivityStreams</a>
library. I wouldn't have built the project in Rust without this
library</li>
<li><a href="https://git.joinplu.me/Plume/Plume">Plume</a>
and <a href="https://github.com/LemmyNet/activitypub-federation-rust">Lemmy</a>,
two AP projects which use Rust</li>
<li>Darius
Kazemi's <a href="https://github.com/dariusk/rss-to-activitypub">rss-to-activitypub</a></li>
<li><a href="https://github.com/mastodon/mastodon/">Mastodon</a>
itself. I ran a local copy of Mastodon for testing, and dug into its
code for sending/receiving messages to get an idea of what was going
on, to debug, etc.</li>
</ul>
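<p>For a concrete sense of what those projects deal with: every account a service like this creates has to serve an "actor" document that other servers can fetch. Here's a rough sketch of one as a Ruby hash -- the field names come from the ActivityStreams/ActivityPub specs, but the URL layout and key are placeholders, not necessarily how feedsin.space actually does it:</p>

```ruby
require 'json'

# A minimal ActivityPub actor document as a Ruby hash. Field names are from
# the ActivityStreams vocabulary; the domain, paths, and key are placeholders.
def actor_document(username, domain)
  base = "https://#{domain}/users/#{username}"
  {
    "@context" => [
      "https://www.w3.org/ns/activitystreams",
      "https://w3id.org/security/v1"
    ],
    "id" => base,
    "type" => "Service",    # bots/feeds often use Service rather than Person
    "preferredUsername" => username,
    "inbox" => "#{base}/inbox",
    "outbox" => "#{base}/outbox",
    "publicKey" => {
      "id" => "#{base}#main-key",
      "owner" => base,
      "publicKeyPem" => "-----BEGIN PUBLIC KEY-----\n...\n-----END PUBLIC KEY-----"
    }
  }
end

puts JSON.pretty_generate(actor_document("example", "feedsin.space"))
```

Every incoming activity from another server is then signed against that public key, which is where a lot of the intimidating parts of the protocol live.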
<h2 id="alternatives">Alternatives</h2>
<p>If you want to get RSS feeds into the Fediverse, but this doesn't seem
like the service you are looking for, there are a few other tools
currently available to do this, including:</p>
<ul>
<li><a href="https://feed2toot.readthedocs.io/en/latest/">feed2toot</a>,
which is a script you can run to post from an RSS feed to an
account.</li>
<li><a href="https://www.jessesquires.com/blog/2022/12/15/rss-to-mastodon/">Zapier and/or IFTTT</a>,
if you're willing to hack a bit.</li>
<li>Darius Kazemi's <a href="https://github.com/dariusk/rss-to-activitypub">rss-to-activitypub</a> project.</li>
<li><a href="https://mastofeed.org/">MastoFeed</a>, a web service
that does this, though it's a little mysterious who runs it, and it
requests more permissions than it needs.</li>
<li><a href="https://feedbot.net/">FeedBot</a> appears to be similar to MastoFeed, but I can't get it to work for me right now, so I don't have much of an opinion about it.</li>
</ul>
<h1><a href="http://muffinlabs.com/2022/09/10/how-i-maintain-botsin-space/">How I maintain botsin.space</a></h1>
<p><em>Colin Mitchell, 2022-09-10</em></p>
<p>I've been meaning to write up some notes on how I manage <a href="https://botsin.space/">botsin.space</a>, and how I've dealt with certain
problems in the past – in particular, the several days of <a href="https://botsin.space/@muffinista/108273247048311451">issues and
downtime</a> in May 2022.</p>
<h2 id="hosting">Hosting</h2>
<p>botsin.space is hosted at DigitalOcean. There's nothing really special about DO
(in fact, I often think about moving), but there are a few features that have
really saved me a few times now. First, it's very easy to create a new disk
volume, and once you have a volume, it's pretty easy to expand its size. I store
the database on a separate volume. Currently the database is taking up ~65GB of
space. When the volume is close to full, I'll expand it as needed. Second, it's
very easy to take snapshots of volumes. I have a script that takes a nightly
snapshot of the database volume. I also make snapshots before doing upgrades,
server maintenance, etc. If something bad happens and I need to restore the db
copy, I can create a new volume, attach it to the server, and switch from the
broken db to the snapshot db. I've had to do this several times, and knowing I
can do it again really helps alleviate the stress of running the server.</p>
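<p>The snapshot script itself doesn't need to be fancy. Here's a hedged sketch of the idea in Ruby rather than the actual script -- the volume ID is a placeholder, and the exact <code>doctl</code> invocation is an assumption worth checking against <code>doctl compute volume snapshot --help</code>:</p>

```ruby
require 'date'

# Sketch of a nightly snapshot script. VOLUME_ID is a placeholder, and the
# doctl subcommand is an assumption -- verify it against your doctl version.
VOLUME_ID = "your-volume-id-here"

# build a dated snapshot name, e.g. "db-nightly-2022-09-10"
def snapshot_name(date = Date.today)
  "db-nightly-#{date.strftime('%Y-%m-%d')}"
end

# create a named snapshot of the database volume
system("doctl", "compute", "volume", "snapshot", VOLUME_ID,
       "--snapshot-name", snapshot_name)
```

Dropping something like this into cron, plus a step to prune old snapshots, covers the nightly backup side of things.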
<p>I run the instance using <a href="https://github.com/mastodon/mastodon/blob/main/docker-compose.yml">docker
compose</a>. I know that docker causes some people a lot of suffering (enough
that the official mastodon documentation doesn't seem to include using docker as
an option anymore), but I like it for a few reasons. First, I have a lot of
professional experience using docker, so I'm used to the different ways it can
cause you pain. Second, I find that using docker makes it a little easier to run
upgrades and rollbacks. Third, it makes it a little easier to maintain the
code/scripts I need to run the instance in git without having to fork the entire
mastodon codebase. Finally, it also makes the service a lot more portable, since
if/when I want to move the instance to a new server, I don't need to reinstall
as many required programs.</p>
<h2 id="the-code">The code</h2>
<p>I have a slightly customized build of mastodon, with a Dockerfile that looks an
awful lot like this:</p>
<pre><code>FROM tootsuite/mastodon:v3.5.3
COPY app/views/about/_registration.html.haml /opt/mastodon/app/views/about/
COPY app/views/about/_botsinspace-custom-signup.html.haml /opt/mastodon/app/views/about/
</code></pre>
<p>This takes the <a href="https://hub.docker.com/r/tootsuite/mastodon/#!">pre-existing image</a> for mastodon, copies a few customized files in, and that's it!</p>
<p>Here's what the Dockerfile and docker-compose.yml look like:</p>
<script src="https://gist.github.com/muffinista/b7674bd5afe5d68089a92fb034d72c9e.js"></script>
<h2 id="upgrades-and-maintenance">Upgrades and maintenance</h2>
<p>When it's time to run an upgrade, I make a snapshot of the database, update the
version numbers in docker-compose.yml, and run something along the lines of
<code>docker compose build && docker compose up -d</code>. This builds a new docker image
and deploys it, then restarts everything as needed. If something goes wrong, I
roll back the version and re-run <code>docker compose up -d</code>. The configuration file
itself could be a little more optimized (ideally I'd only specify the version
stuff once), but I'm lazy and usually do it via search/replace in my editor.</p>
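<p>That search/replace is easy enough to script if you'd rather not do it in an editor. A minimal sketch, assuming the version string appears verbatim wherever it's used in docker-compose.yml:</p>

```ruby
# Bump every occurrence of the mastodon image version in a compose file.
def bump_version(contents, old_version, new_version)
  contents.gsub(old_version, new_version)
end

# e.g. `ruby bump.rb v3.5.3 v3.5.5`
if ARGV.length == 2
  old_v, new_v = ARGV
  path = "docker-compose.yml"
  File.write(path, bump_version(File.read(path), old_v, new_v))
end
```

Then the usual <code>docker compose build && docker compose up -d</code> picks up the new version.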
<h2 id="other-stuff">Other stuff</h2>
<p>A few things happen outside of docker:</p>
<p><strong>nginx</strong> - nginx runs directly on the server, and routes traffic to docker. The
configuration is reasonably close to the default mastodon configuration file.
There's a couple of rules in there to block some bad actors, and there's some
rate-limiting as well.</p>
<p><strong>Let's Encrypt</strong> - I use Let's Encrypt to set up HTTPS certificates. I use
DNS validation, since there's a plugin that handles everything via the
Digital Ocean API.</p>
<p><strong>Scheduled tasks</strong> - There are a few nightly tasks running in cron – making
backups, running mastodon maintenance, etc.</p>
<p><strong>File storage</strong> - File storage is a huge chunk of the expense of running the
instance. Uploads are stored in Digital Ocean's Spaces, which is basically a
clone of S3. I kept files on S3 for a while, but I don't like giving Amazon
money, and Spaces is a little cheaper. It's also probably better for performance
to have the file storage closer to the actual server.</p>
<p><strong>Emails</strong> - Emails are sent with <a href="https://mailpace.com/">MailPace</a> and it works
well enough that I basically never think about it.</p>
<h2 id="server-upgrades">Server upgrades</h2>
<p>The botsin.space server is running Ubuntu. Server updates aren't too much of a
concern, but if I need to do an upgrade between major versions or something else
large like that, I take advantage of the fact that the database is on its own
dedicated volume. I can boot an entirely new server, install any required
software (I basically have a script for this), copy over my configuration files,
then detach the volume from the old server, attach it to the new one, and update
DNS to point to the new server.</p>
<h2 id="moderation-and-new-accounts">Moderation and new accounts</h2>
<p>At the moment, I handle all moderation issues and new account requests myself. I
use a slightly tweaked version of <a href="https://github.com/bclindner/ivory">ivory</a> to help with spam signups
and things like that. It's certainly possible that this will become enough work
that I can't handle it myself, but that hasn't happened yet.</p>
<h2 id="when-things-go-wrong">When things go wrong</h2>
<p>The upgrade to Mastodon v3.5.0 involved upgrading PostgreSQL from version 9.6 to
14. There were instructions for running this upgrade that were along the lines
of: make a dump of the data in the old version of postgres, upgrade, then import
the data into the new version. With a large database, that can take hours or
even days, and if it fails while it's running, that's a bunch of time that
you've wasted. So, I took a snapshot, shut down botsin.space, and started
running the dump. Unfortunately, the process failed for me over and over again,
and when I eventually got it to work, and tried to bring botsin.space back
online, it was clear that there were some data issues. I rolled back to the old
snapshot and started running upgrade tests on a separate test server.</p>
<p>Eventually I found a <a href="https://github.com/tianon/docker-postgres-upgrade">neat little docker
image</a> that can be used to upgrade between postgres versions, and that seemed
to work.</p>
<p>However, there was another problem – botsin.space was experiencing a <a href="https://docs.joinmastodon.org/admin/troubleshooting/index-corruption/">data
corruption issue</a>. When I tried to run mastodon's custom <a href="https://github.com/mastodon/mastodon/blob/b07906bdb0127cd73662506b519183cc51a2758e/lib/mastodon/maintenance_cli.rb#L139">fix-duplicates</a>
script, I found a whole new set of issues. That script checks a bunch of tables
for duplicate data. Many of those tables have a manageable amount of data in
them, but some – particularly the conversations and statuses tables – have
over 50 million rows each right now. The script was trying to run fairly
complicated queries against those tables, but the server didn't have enough
memory to process the results. This meant I needed to write some custom
ruby code to do the same thing without causing quite so much server load. I
managed to do that (luckily I program in Ruby for a living), let it run for a
couple of hours, and when it was done, I was able to bring botsin.space back
online.</p>
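<p>I won't reproduce the exact code, but the general shape of the fix was to scan the table in fixed-size batches and collect duplicate keys a chunk at a time, instead of asking the database for one enormous grouped result. A hypothetical sketch, with an in-memory array standing in for the table:</p>

```ruby
# Hypothetical sketch of the batched duplicate scan. `rows` stands in for a
# 50-million-row table; in the real thing each slice would be a ranged query
# against the database rather than a slice of an array.
def duplicate_uris(rows, batch_size: 1000)
  seen = {}
  dupes = []
  rows.each_slice(batch_size) do |batch|
    batch.each do |row|
      if seen.key?(row[:uri])
        dupes << row[:uri]
      else
        seen[row[:uri]] = true
      end
    end
  end
  dupes.uniq
end
```

The point is that each batch is small enough to process without blowing out the server's memory, at the cost of the scan taking a couple of hours.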
<p>If I hadn't been able to take snapshots and increase the database storage
volume as needed, and if I weren't well-versed in Ruby, there's a good chance
that this upgrade would've either failed entirely, involved a lot of data loss,
or taken many days or weeks to finish.</p>
<ul>
<li><a href="https://www.nytimes.com/2021/05/21/style/welcome-to-the-space-jam-again.html">SpaceJamCheck in the New York Times</a> (2021-05-24) – I was interviewed in the NYT as part of an article about the Space Jam website and my Twitter bot @SpaceJamCheck.</li>
<li><a href="https://muffinlabs.com/emoji-fireplace">Emoji Fireplace app</a> (2019-12-29) – I made an app version of my emoji yule log for Android devices.</li>
<li><a href="https://botsin.space/@audiosweetener">Audio Sweetener Bot</a> (2019-01-10) – Here's a bot that posts audio clips from the BBC Sound Effects Archive.</li>
<li><a href="https://glitch.com/~tall-boy">Tall Boy</a> (2018-11-16) – I made a p5.js sketch that is a rendition of those inflatable dancers you see in front of stores and in other random places.</li>
<li><a href="https://botsin.space/@NovemberRain">November Rain Bot</a> (2018-11-16) – I made a bot that posts frames from the November Rain video. It'll only run for the month of November. It runs in rough order, so the bot starts at the start of the video, and it should end on the last frame.</li>
<li><a href="http://pitchersandcatchersreport.in/">Pitchers and Catchers</a> (2018-10-29) – I made a website to let you know when Spring Training starts.</li>
<li><a href="https://secretbroadcast.net/">The Secret Broadcast</a> (2018-09-18) – The Secret Broadcast is a numbers station podcast. The content of the podcast is encrypted messages. You can submit a message and it will be read as an encrypted sequence of letters and numbers, which can be decrypted with a key only available to you and anyone you share it with.</li>
<li><a href="https://botsin.space/@herbarium">Emily Dickinson's Herbarium</a> (2018-09-14) – I made a bot that posts images from Emily Dickinson's collection of pressed flowers and botanical samples.</li>
<li><a href="https://glitch.com/~p5-moireish-color">moire</a> (2018-04-10) – I made another screensaver on glitch before adding it to Before Dawn; it's a neat experiment with moire patterns and colors.</li>
<li><a href="https://p5-fullerscreen-starter.glitch.me/">p5.js fullscreen starter on glitch</a> (2018-04-06) – I made a p5.js starter project on glitch with a full screen toggle. Then I used that project to write a version of the old <a href='https://before-dawn-mystify.glitch.me'>mystify</a> screensaver.</li>
<li><a href="https://github.com/muffinista/before-dawn/releases">Before Dawn v0.9.25</a> (2018-04-06) – Before Dawn has been updated with a couple of new features and new screensavers. In particular, you can specify that you only want to run screensavers on a single monitor -- which is a handy way to keep your CPU load lower.</li>
<li><a href="https://github.com/muffinista/before-dawn/releases">Before Dawn v0.9.14</a> (2017-12-12) – I rewrote Before Dawn to use Vue.js, fixed a pile of bugs, and improved performance.</li>
<li><a href="http://muffinlabs.com/gradients/">Nice Gradients</a> (2017-11-17) – I made a screensaver that slowly updates color gradients on your screen. It's part of <a href='https://github.com/muffinista/before-dawn'>Before Dawn</a>, but I also wanted to put a copy of it on my website.</li>
<li><a href="https://github.com/muffinista/before-dawn/releases">Before Dawn v0.9.11</a> (2017-11-16) – I continue to plug away at Before Dawn. It is getting very stable and usable now. I've added a bunch of screensavers and have plans to add even more.</li>
<li><a href="https://chrome.google.com/webstore/detail/editorialize/emfpglodamcbfnecphcmlkbhnlloghko">Editorialize Chrome Plugin</a> (2017-08-30) – I made a simple Chrome plugin which replaces any text on a NY Times editorial with poop emoji. The source code is <a href='https://github.com/muffinista/editorialize'>on github</a>.</li>
<li><a href="http://muffinlabs.com/screensavers/4-listening/">Lonely Computer</a> (2017-08-06) – I wrote about a screensaver that reacts to noise.</li>
<li><a href="http://muffinlabs.com/screensavers/3-defragment/">Defrag</a> (2017-07-28) – I wrote a little bit about a screensaver that defragments your screen.</li>
<li><a href="http://muffinista.github.io/before-dawn-screensavers/#atari-attract-mode">Atari Attract Mode</a> (2017-07-14) – I added an Atari Attract Mode to Before Dawn. It cycles the screen through low luminosity colors.</li>
<li><a href="https://twitter.com/IndyDaySpeech">@IndyDaySpeech</a> (2017-07-04) – I made a bot that tweets the speech from Independence Day every 4th of July. The source code is <a href='https://github.com/muffinista/indyday'>available on github</a>.</li>
<li><a href="https://botsin.space/@eliza">@eliza on mastodon</a> (2017-04-20) – I made another bot for Mastodon; this one is a chatbot version of ELIZA.</li>
<li><a href="https://botsin.space/@loveletter">@loveletter on mastodon</a> (2017-04-05) – I've been exploring Mastodon as a social network, so naturally I made a bot for it. I'm running a bot-friendly instance at <a href='https://botsin.space/'>botsin.space</a> too.</li>
</ul>
<h1><a href="http://muffinlabs.com/2017/03/28/the-journey-of-earth-rover-bot/">The Journey of EarthRoverBot</a></h1>
<p><em>Colin Mitchell, 2017-03-28</em></p>
<p><img src="/2017/images/trip-full.jpg" class="full-bleed" /></p>
<p><a href="https://twitter.com/earthroverbot">@EarthRoverBot</a> is in the final
stretch of a journey from the edge of Maine to the US/Mexico border.
The bot is entirely virtual, and the trip is powered by Google's
Street View data. It takes a step forward every 12 minutes. It has a
location and a bearing, and if there's valid Street View data in that
direction, then it moves forward. If there isn't data, it adjusts
course until it finds a way to continue. With each step it sends the
image to Twitter.</p>
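<p>The movement logic boils down to something like this sketch, where <code>has_imagery</code> stands in for the real Street View metadata check, and the 15-degree turn increment is an arbitrary choice for illustration:</p>

```ruby
# Sketch of the bot's course correction: try the current bearing, and if
# there's no imagery ahead, rotate in STEP_DEGREES increments until a
# passable direction is found (nil after checking a full circle).
# `has_imagery` is a stand-in for the real Street View metadata check.
STEP_DEGREES = 15

def next_bearing(bearing, has_imagery)
  (0...360).step(STEP_DEGREES) do |offset|
    candidate = (bearing + offset) % 360
    return candidate if has_imagery.call(candidate)
  end
  nil
end
```

Run every 12 minutes against real Street View data, that's enough to produce the wandering, dead-end-finding behavior described above.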
<p>The bot can be controlled via commands sent through tweets, but mostly
it runs on autopilot, with a simple algorithm that it uses to work its
way towards the border crossing in San Diego. At some point in the
next few weeks, the bot will send an image that looks something like
this:</p>
<p><img src="/2017/images/end-view.png" /></p>
<p>And the trip will be done.</p>
<p>Of all the bots I've made, I think this one is my favorite. I love
the experience of a slow, meditative journey, without using a map,
getting stuck in unusual places, finding dead-ends and the insides of
buildings in places where the data is weird.</p>
<p>Also, while the bot is basically automated, it can accept human
commands, so people have been able to control the course of the bot.
In fact, it never would have made it as far as it has without help
from people.</p>
<p>At the same time, thanks to the use of Google Street view, the journey
represents a fairly bizarre version of a road trip. Everything that
you are able to see has been dictated by the largely commercial needs of a
gigantic company. It's almost always sunny in the world presented by
Street View, although sometimes seasons will change without warning.
There's very little traffic, and you never see an accident or weather. The
trip is largely devoid of visible people. The quality of light is
almost constant – it's always the middle of the day and the sun is
usually out. Over days or weeks, the color palette changes in subtle
ways.</p>
<p><img src="/2017/images/flatten.gif" /></p>
<p>When I made the bot, sending it from one corner of the country to the
other seemed like a fun and fairly innocuous idea, but it spent an
entire election season barreling towards a border that defined so much
of the election, and now it's impossible to avoid the feeling that
driving something towards a destination like this is inherently
political.</p>
<p>Here's a moment with a collection of images from the journey:</p>
<p><a class="twitter-moment" href="https://twitter.com/i/moments/845805228364648448">EarthRoverBot</a>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>Here's a map of the final leg of the trip:</p>
<p><img src="/2017/images/rover-end-map.png" /></p>
<p>(it's already moved past this point since I'm the worst blogger ever lol)</p>
<p>I do have plans for the bot after it has finished this trip. I might
add the ability to jump to specific locations, or I might just start
another trip between two other points. I've thought about making a web
version, and I'll definitely release the source code for the bot.</p>
<p>Here's a video of the bot's trip. I took every image posted to
Twitter, filtered out ones where the bot moved fewer than 10 meters,
composited them down to a few thousand frames, and turned that into a
video. I'm still experimenting, so I might come up with
something more interesting in the future.</p>
<iframe width="560" height="400" src="https://www.youtube.com/embed/unow3_ipkmQ?rel=0" frameborder="0" allowfullscreen=""></iframe>
<ul>
<li><a href="https://github.com/muffinista/before-dawn">Before Dawn</a> (2017-02-28) – I made a screensaver tool called Before Dawn! I <a href='/screensavers/'>wrote a little about screensavers</a> too.</li>
<li><a href="https://archive.org/download/twitterArchiveDumps">Trump Administration Twitter Archives</a> (2017-02-12) – I wrote some code to generate archives of realDonaldTrump tweets, along with several Trump admin accounts. The archive should be constantly updated any time someone tweets. I made this before realizing that the <a href='http://trumptwitterarchive.com/'>trump twitter archive</a> has made its <a href='https://github.com/bpb27/political_twitter_archive'>data available on github</a>, but it might still be handy.</li>
<li><a href="https://raw.githubusercontent.com/muffinista/buzzcut/master/output-2.txt">buzzcut</a> (2016-11-30) – This was my entry for the 2016 #NaNoGenMo -- a novel generated using cutup techniques on pages from buzzfeed.</li>
<li><a href="https://twitter.com/drillify_exe">@drillify_exe</a> (2016-11-18) – @drillify_exe is a bot that juxtaposes a random tweet with a random dril tweet. The source code is <a href='https://github.com/muffinista/drillify_exe'>here</a>.</li>
<li><a href="https://twitter.com/happened_today">@happened_today</a> (2016-11-04) – I updated @happened_today to include images from wikipedia when possible. The source code is <a href='https://github.com/muffinista/happened_today'>on github</a>.</li>
<li><a href="https://twitter.com/muffin_exe_sta">@muffin_exe_sta</a> (2016-11-04) – I made an ebooks-style bot that glitches old pictures from my main Twitter account.</li>
<li><a href="https://twitter.com/head_2_keyboard">@head_2_keyboard</a> (2016-11-04) – @head_2_keyboard is an ultra-realistic simulation of smashing a head into a keyboard. <a href='https://github.com/muffinista/head_2_keyboard'>Here is the source code</a>.</li>
</ul>
<h1><a href="http://muffinlabs.com/2016/08/16/the-making-of-lists-of-lists/">The Making of @lists_of_lists</a></h1>
<p><em>Colin Mitchell, 2016-08-16</em></p>
<p>I thought I'd write something about how I made the bot
<a href="https://twitter.com/lists_of_lists">@lists_of_lists</a>, from start to
finish. It's a relatively simple idea, so if you're interested in
writing a bot for the first time, this might be a helpful guide.</p>
<p>I have a bit of an advantage for two reasons. First, I'm a
professional programmer, and have been for many years. I know ruby
very well, and it's the language I use to build most of my bots.
Second, I wrote the
<a href="https://github.com/muffinista/chatterbot">library</a> that I use to
make most of my bots, so it's basically adapted to my needs.</p>
<p>That said, if you are not a developer but want to make a bot, you
definitely can. Just expect to learn a little bit about coding, and
also a little bit about server management, because getting your bot to
run consistently is sometimes the hardest part of the process.</p>
<h2 id="the-idea">The Idea</h2>
<p>I spent a lot of time exploring wikipedia's data downloads when I was
building <a href="http://gopherpedia.com/">gopherpedia</a>. I knew that there
were a lot of 'list of' pages, and that some of them were
<a href="https://en.wikipedia.org/wiki/List_of_salads">amusing</a> and
interesting. I decided to see if I could download a list of them so
that I could play around with the data.</p>
<p>Wikipedia offers database dumps at
<a href="https://dumps.wikimedia.org/">https://dumps.wikimedia.org/</a>. The main
files here are gigantic XML files that represent the complete contents
of the website. Depending on what you are interested in, some of these
XML files are 12GB or larger. That's a single XML file! Parsing those
is a real challenge.</p>
<p>Luckily, they offer a much smaller file of just page titles. I
downloaded that file, and searched it for pages with the words 'list of'
or 'lists of' in the title. I ended up running this a few times, so I
combined it all into a single shell command that looks like this:</p>
<pre><code>curl https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz > enwiki-latest-all-titles-in-ns0.gz && gzcat enwiki-latest-all-titles-in-ns0.gz | grep -i 'List_of\|Lists_of' > lists.txt
</code></pre>
<p>At that point, I had a text file that looked a little like this:</p>
<pre><code>"List_of_the_works_of_Charles_Cottet_depicting_scenes_of_Brittany
"List_of_the_works_of_Charles_Cottet_depicting_scenes_of_Brittany"
'List_of_Mongolian_musical_instruments
(List_of_Toni,la_Chef_episodes)
/List_of_Parliament_of_Australia_Reports_on_Sport
1996_World_Monuments_Fund_List_of_Most_Endangered_Sites
1996_World_Monuments_Watch_List_of_Most_Endangered_Sites
1998_World_Monuments_Fund_List_of_Most_Endangered_Sites
1998_World_Monuments_Watch_List_of_Most_Endangered_Sites
2000_World_Monuments_Fund_List_of_Most_Endangered_Sites
</code></pre>
<h2 id="sit-on-it-for-a-year">Sit on it for a year</h2>
<p>Once I had the data, I had no idea what I actually wanted to do with
it. I thought about running it through a Markov chain tool, or
maybe swapping out words randomly, adding adjectives and modifiers,
etc, etc.</p>
<p>I couldn't really decide what to do, so I didn't do anything. I let
the data sit around for a year or so.</p>
<p>Eventually, I decided to just keep it simple and make a bot that would
simply iterate through the list of lists. I randomized the data to
make it a little more interesting:</p>
<pre><code>gshuf lists.txt > lists-random.txt
</code></pre>
<p>(gshuf is an OSX command to randomly shuffle the lines of a file. If
it's not installed already, you can install it via <code>brew install
coreutils</code>. On Linux, there's a command called <code>shuf</code> that does the
exact same thing. I suspect it's pre-installed on most Linux systems.
Thanks to <a href="https://twitter.com/ckolderup">@ckolderup</a> for pointing
all of this out!)</p>
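<p>If you'd rather not install anything, the same shuffle is a couple of lines of plain Ruby:</p>

```ruby
# Plain-Ruby equivalent of `gshuf lists.txt > lists-random.txt`
def shuffle_file(input, output)
  File.write(output, File.readlines(input).shuffle.join)
end

shuffle_file("lists.txt", "lists-random.txt") if File.exist?("lists.txt")
```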
<h2 id="start-the-bot">Start the Bot</h2>
<p>I had the data, now I needed the bot. Amazingly, when I went to
Twitter to register a new account, my first choice was available, so
<a href="https://twitter.com/lists_of_lists">@lists_of_lists</a> was born.</p>
<p>I made myself a directory to hold onto my bot files, and copied the
data there. Then I set up a <code>Gemfile</code> and got ready to install
<code>chatterbot</code>.</p>
<pre><code>mkdir lists_of_lists
</code></pre>
<p>I made a Gemfile that looks like this:</p>
<pre><code>source "https://rubygems.org"
gem "chatterbot", :git => "git://github.com/muffinista/chatterbot.git"
</code></pre>
<p>Then I ran <code>bundle</code> to install chatterbot.</p>
<p>Chatterbot has a script which will walk you through the process of
setting up a Twitter bot. It will also create a template file for the
bot, and setup your credentials file. I ran it!</p>
<p><strong>NOTE</strong>: I ran all of this while logged into Twitter as the
bot's account.</p>
<pre><code>bundle exec chatterbot-register
</code></pre>
<p>It prints out a message telling me what happens next:</p>
<pre><code>Welcome to Chatterbot. Let's walk through the steps to get a bot running.
Hey, looks like you need to get an API key from Twitter before you can get started.
Have you already set up an app with Twitter? [Y/N]
</code></pre>
<p>I hadn't set up an app yet, so I put 'N':</p>
<pre><code>> N
OK, I can help with that!
Please hit enter, and I will send you to https://apps.twitter.com/app/new to start the process.
(If it doesn't work, you can open a browser and paste the URL in manually)
Hit Enter to continue.
</code></pre>
<p>The form looked a lot like this (they change this a lot):</p>
<p><img src="http://muffinlabs.com/images/twitter-app-signup.png" alt="Twitter App Form" title="Twitter App Form" /></p>
<p>Once you've filled out that form, Twitter will issue you some API
keys. I copied those keys into chatterbot-register, which was waiting
for the input:</p>
<p><img src="http://muffinlabs.com/images/twitter-app-settings.png" alt="Twitter App Settings" title="Twitter App Settings" /></p>
<pre><code>Once you've filled out the app form, click on the 'Keys and Access Tokens' link
Paste the 'Consumer Key' here: 123456
Paste the 'Consumer Secret' here: abcdefg
Now it's time to authorize your bot!
Do you want to authorize a bot using the account that created the app? [Y/N]
</code></pre>
<p>I do want to authorize this account, so I say so:</p>
<pre><code>> Y
OK, on the app page, you can click the 'Create my access token' button
to proceed.
</code></pre>
<p>I do that, then I paste the results:</p>
<pre><code>Paste the 'Access Token' here: 123456
Paste the 'Access Token Secret' here: 45678
</code></pre>
<p>Hooray, now I have two files! <code>lists_of_lists.rb</code> is a template file for
my bot. It lists a bunch of features of chatterbot and gives you
something to work from. <code>lists_of_lists.yml</code> has the credentials for the
bot, and will also track some other information needed to send out
tweets.</p>
<p>My idea for the bot is pretty simple. Each time it runs, it should
open up the file with all the lists in it, read the next one, and
tweet it out.</p>
<p>The bot will need to keep track of which line it sent
out last, and update that value every time. One of the features of
chatterbot is that the YAML file which holds the configuration data is
accessible to the bot, and is updated with any changes each time the
bot is run. This means you can use it to track variables that you need
to persist over time, such as the last index of a file that you used.</p>
<p>So I start with some ruby to handle all of that:</p>
<pre><code>SOURCE = "lines.txt"
bot.config[:index] ||= 0
if ENV["FORCE_INDEX"]
  bot.config[:index] = ENV["FORCE_INDEX"].to_i
end
data = File.read(SOURCE).split(/\n/)
source = data[ bot.config[:index] ]
puts source
# the page title will have underscores in it, get rid of those
tweet_text = source.gsub(/_/, " ")
</code></pre>
<p>This code sets the index variable, opens the file "lines.txt", turns
it into an array by splitting on newlines, and then reads the proper
value from that array.</p>
<h2 id="make-it-nicer">Make it Nicer</h2>
<p>At this point, I could just tweet that value out like this:</p>
<pre><code>tweet tweet_text
</code></pre>
<p>And be done. I decided that would be a little boring though, and I
started to wonder about pulling an image from the wikipedia page for
the list. Some lists have images on them, and they can be
<a href="https://en.wikipedia.org/wiki/List_of_salads">pretty funny</a>.</p>
<p>Wikipedia has an API, and there are a few ruby libraries for accessing
it. I decided to check out the
<a href="https://github.com/kenpratt/wikipedia-client">official client</a> since
I had never used it before. My assumption was that I would need to
parse out images from the source text, but it turns out that there is
a method you can use to get a list of images! Anyway, here's that code:</p>
<pre><code>page = Wikipedia.find(source)
opts = {}

# check if there are any images
if page.image_urls && !page.image_urls.empty?
  puts page.image_urls.inspect

  # pick an image at random
  image_url = filter_images(page.image_urls).sample
  puts image_url

  if image_url && image_url != ""
    # make a local copy of the image
    opts[:media] = save_to_tempfile(image_url)
  end
end
</code></pre>
<p>I added a simple method <code>filter_images</code> which rejects any SVG files:</p>
<pre><code>def filter_images(list)
  # escape the dot so we only reject actual .svg extensions
  list.reject { |l| l =~ /\.svg$/ }
end
</code></pre>
<p>And a second method <code>save_to_tempfile</code> which makes a local copy of the
image:</p>
<pre><code># open-uri lets us read from the image URL; tmpdir provides Dir::Tmpname
require 'open-uri'
require 'tmpdir'

def save_to_tempfile(url)
  uri = URI.parse(url)
  ext = [".", uri.path.split(/\./).last].join("")
  dest = File.join "/tmp", Dir::Tmpname.make_tmpname(['list', ext], nil)
  puts "#{url} -> #{dest}"

  File.open(dest, 'wb') do |file|
    file << URI.open(url).read
  end

  # if the image is too big, let's lower the quality a bit
  if File.size(dest) > 5_000_000
    `mogrify -quality 65% #{dest}`
  end

  dest
end
</code></pre>
<p>This method has one additional twist, which is that it checks the size
of the downloaded file. If it's too large, it runs the ImageMagick
command <code>mogrify</code> on it to drop the quality down.</p>
<p>At this point, I have the text of a tweet, a <code>page</code> object from the
Wikipedia API library, and a hash that might have a file in it. I
combine it all together and tweet it out:</p>
<pre><code>output = [ tweet_text, page.fullurl ].join("\n")

begin
  tweet(output, opts)
rescue StandardError => e
  puts e.inspect
end
</code></pre>
<p>Finally, I increment the index variable.</p>
<pre><code>bot.config[:index] += 1
</code></pre>
<p>When the script is done running, this value will be updated in the
YAML config file for the bot.</p>
<p>During this whole process, I ran the script a couple of times. Chatterbot
has a <code>debug_mode</code> command that lets you run a script without
actually sending any tweets, which is pretty handy.</p>
<p>I'm a pretty messy coder, especially when I'm working on personal side
projects, so I fixed a couple of bugs, spent a while cleaning up my junky
code, and so on. Once I was happy with it, I uploaded my code to the
server where I run my bots.</p>
<p>Then I needed to set up a cron job to run the bot every few hours. I
decided to run the bot every two hours for starters (I might slow it
down later), and for variety I run it at 2 minutes past the hour. This
is what the job looks like:</p>
<pre><code>2 */2 * * * . ~/.bash_profile; cd /var/stuff/lists_of_lists/; bundle exec ./lists_of_lists.rb >> tweets.log 2>&1
</code></pre>
<p>The first bit specifies when the job runs. The rest of it is the
command that executes the bot. Cron jobs usually run in a different
environment than the one you get when you log in to a server via SSH, so you
need to explicitly load your environment, cd into the directory where
the script is, and run the script. The <code>>> tweets.log 2>&1</code> bit sends any output
into the tweets.log file, which I can check for errors.</p>
<p>Anyway, that's about it! I've put the code <a href="https://github.com/muffinista/lists_of_lists">on
github</a> – please feel
free to take it and adapt it to your needs!</p>
<ul>
<li><a href="https://twitter.com/lists_of_lists">@lists_of_lists</a> (2016-08-05): Twitter bot that tweets lists found on Wikipedia. Source code is <a href='https://github.com/muffinista/lists_of_lists'>here</a>.</li>
<li><a href="https://twitter.com/HulkDonaldTrump">@HulkDonaldTrump</a> (2016-08-04): Twitter bot that tweets like the Hulk version of Donald Trump. Here's the <a href='https://github.com/muffinista/HulkDonaldTrump'>source code</a>.</li>
<li><a href="https://chrome.google.com/webstore/detail/tweet-masker/aobdgenfpejjjfcpagkhognobonnjcbc">Tweet Masker</a> (2016-06-10): Chrome extension that can mask tweets that have content warnings.</li>
<li><a href="https://twitter.com/kiki_flies_exe">@kiki_flies_exe</a> (2016-05-01): a bot that outputs scenes from Kiki's Delivery Service.</li>
<li><a href="https://twitter.com/cat_in_field">@cat_in_field</a> (2016-03-07): a cat playing in a field on Twitter. Source code is at <a href='https://github.com/muffinista/cat_in_field'>https://github.com/muffinista/cat_in_field</a>.</li>
<li><a href="https://twitter.com/snowfall_exe">@snowfall_exe</a> (2015-12-14): a bot that will turn an image into an animated GIF with falling snow. Source code is at <a href='https://github.com/muffinista/snowfall_exe'>https://github.com/muffinista/snowfall_exe</a>.</li>
<li><a href="http://muffinlabs.com/emoji_yule_log/">emoji yule log</a> (2015-12-05)</li>
<li><a href="https://twitter.com/yulelogbot">@yulelogbot</a> (2015-12-05): an emoji-powered fireplace on twitter, tweeting every 30 minutes through the holiday season.</li>
<li><a href="http://muffinlabs.com/wayback_exe/">@wayback_exe</a> (2015-10-15)</li>
<li><a href="http://muffinlabs.com/botgle/">@botgle</a> (2015-07-01)</li>
<li><a href="http://muffinlabs.com/kaleid_o_bot/">@kaleid_o_bot</a> (2015-02-01)</li>
<li><a href="http://muffinlabs.com/chatterbot.html">Chatterbot: A Ruby Library for Twitter Bots</a> (2015-01-01)</li>
</ul>
<h2><a href="http://muffinlabs.com/2014/12/11/a-real-river/">A Real River</a> (2014-12-11)</h2>
<p><a href="https://twitter.com/arealriver">@ARealRiver</a> is a Twitter bot that charts the course of a
generative river via emoji. The course of the river is constant as it
transitions between tweets, so you can scroll through hundreds of tweets
and watch the river expand and shrink, and meander back and forth,
passing cities and forests and volcanoes and other scenery as it goes.</p>
<p><img src="/images/ARealRiver.png" /></p>
<p>There were a lot of different inspirations for this bot. I was
directly influenced by <a href="https://twitter.com/katierosepipkin">@katierosepipkin</a>'s
<a href="https://twitter.com/tiny_star_field">@tiny_star_field</a>, <a href="https://twitter.com/dungeon_bot">dungeon_bot</a> by
<a href="https://twitter.com/jeffthompson_">@jeffthompson_</a>, as well as by accounts like
<a href="https://twitter.com/crashtxt">@crashtxt</a> and the <a href="https://twitter.com/hashtag/140art?src=hash">#140art</a> hash tag.</p>
<p>Another lingering inspiration was a book from the early 80s: <em>Computer
Spacegames</em> from Usborne Publishing.</p>
<p><a href="/images/computer-spacegames.jpg"><img src="/images/computer-spacegames-small.jpg" /></a></p>
<p>This book was one of several that introduced me to programming. You
can get a look at it and many others like it <a href="http://mocagh.org/loadpage.php?getcompany=usborne-hayes">here</a>. It's
full of source code for simple games written in BASIC. In particular,
there's one called <em>Death Valley</em>.</p>
<p><a href="/images/death-valley.jpg"><img src="/images/death-valley-small.jpg" /></a></p>
<p>This was a super-simple game that placed you in a canyon that probably
looked a lot like this:</p>
<pre>
* *
* *
* *
* *
* *
* *
* *
* *
* *
* *
* X *
</pre>
<p>Your ship is the X, and you need to run along the canyon for as long
as possible. Good luck!</p>
<p>I spent years iterating on programs like this as a young programmer,
all the way through high school. I would experiment with different
output, different speeds, obstacles, etc. It's always stuck with me
and ARealRiver is definitely inspired by my time with this code.</p>
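<p>This isn't the bot's actual source, but the random-walk trick behind
both that game and ARealRiver can be sketched in a few lines of Ruby
(all the numbers here are arbitrary):</p>

```ruby
# Sketch of a Death Valley-style canyon: each row, the left wall drifts
# by -1, 0, or +1 while the channel keeps a fixed width, producing a
# wandering course like the one above.
def render_canyon(rows: 11, width: 30, channel: 7, seed: 1234)
  rng  = Random.new(seed)
  left = (width - channel) / 2
  rows.times.map do
    left = (left + rng.rand(-1..1)).clamp(1, width - channel - 1)
    line = " " * width
    line[left] = "*"
    line[left + channel] = "*"
    line
  end
end

puts render_canyon
```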
<p><a href="http://muffinlabs.com/rover/">EarthRoverBot</a> (2014-10-01)</p>
<h2><a href="http://muffinlabs.com/2014/09/13/us-prisons/">US Prisons</a> (2014-09-13)</h2>
<p>I launched <a href="https://twitter.com/usprisons">@USPrisons</a> on Twitter a
few weeks ago. It will output every prison in the
United States – or at least, I think it's all of them. I found a
<a href="http://www.insideprison.com/">website</a> with the data, did a bunch of parsing and cleanup, and ended
up with 4763 prisons. The bot should spend a year listing them all,
along with a few stats, and a picture if possible.</p>
<p>If you're interested, I released
<a href="https://github.com/muffinista/prison_scrape">the code</a> that does the
parsing on github.</p>
<h2><a href="http://muffinlabs.com/2014/04/21/stckmrktstatus-providing-logical-explanations-for-the-stock-market/">StckMrktStatus - Providing Logical Explanations for the Stock Market</a> (2014-04-21)</h2>
<p>I've always thought the stock market reports you hear on the news are
fairly silly. "The Dow Jones was up x% because this or that happened."
The people saying those things always sound smart and informed, but no
one really has any idea why a stock goes up or down in value. So, I
made a bot to do the same thing. <a href="https://twitter.com/StckMrktStatus">@StckMrktStatus</a> will pick a stock
from the NASDAQ or Dow Jones, see how it is doing for the day, and
then add a reason for the change. The reasons are pulled from tweets
that have the word 'because' on them. It's pretty simple but seems to
work nicely:</p>
<p><a class="twitter-timeline" href="https://twitter.com/StckMrktStatus" data-widget-id="458347021812789248">Tweets by @StckMrktStatus</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script></p>
<p>The code is pretty simple, and I'll post it sometime soon (I'm working
on a post about the code of my last few bots in general).</p>
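<p>Until the code is up, here's a guess at the general shape of it; every
name in this sketch is hypothetical:</p>

```ruby
# Hypothetical sketch of composing a StckMrktStatus-style tweet from a
# ticker, its daily percentage change, and a harvested "because ..." clause.
def market_status(ticker, change_pct, reasons, rng: Random.new)
  direction = change_pct >= 0 ? "up" : "down"
  reason = reasons.sample(random: rng)
  format("$%s is %s %.2f%% because %s", ticker, direction, change_pct.abs, reason)
end

puts market_status("XYZ", -1.25, ["mercury is in retrograde"])
# => $XYZ is down 1.25% because mercury is in retrograde
```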
<h2><a href="http://muffinlabs.com/2014/01/09/spacejamcheck-space-jam-website-monitoring-on-twitter/">SpaceJamCheck: Space Jam website monitoring on Twitter</a> (2014-01-09)</h2>
<p>People who have been online for a while probably know that the <a href="http://www2.warnerbros.com/spacejam/movie/jam.htm">website
for Space Jam</a>, a movie from 1996, is online still, and is essentially
unchanged:</p>
<p><img src="/images/space-jam-small.jpg" /></p>
<p>(If you don't know what I'm talking about, you can read about it <a href="http://lmgtfy.com/?q=space+jam+website+still+up">here</a>.)</p>
<p>At the end of 2010, someone noticed that the website was still online.
Before I did a little research, I was convinced that people must have
realized this before then, but <a href="http://www.google.com/trends/explore#q=%22space%20jam%22&geo=US&cmpt=q">Google suggests otherwise</a>.</p>
<p>Anyway, here's <a href="http://techcrunch.com/2010/12/31/space-jam/">an article</a> that summarizes how it all happened,
basically some Reddit user noticed, the word spread, and then it went
viral on Twitter.</p>
<p>I haven't seen this mentioned anywhere, but according to the headers
for the website, there were actually some modifications of some sort
in 2005:</p>
<pre><code>HEAD http://www2.warnerbros.com/spacejam/movie/jam.htm
200 OK
Connection: close
Date: Fri, 10 Jan 2014 02:12:09 GMT
Accept-Ranges: bytes
ETag: "89dfb-13c5-4027752a8ca80"
Server: Apache
Content-Length: 5061
Content-Type: text/html
Last-Modified: Thu, 06 Oct 2005 15:10:18 GMT
</code></pre>
<p>It's possible this was just a server move or something like that, but
it's interesting to think that someone actually did some <em>maintenance</em>
of some sort on the site.</p>
<p>I enjoy visiting the site, especially when I get nostalgic for the
early days of my work on the internet. There are so many projects
which I've worked on over the years, and a lot of them are gone
forever. It's nice to see one that has managed to survive.</p>
<p>Because I'm lazy, and like easy reassurance, I wrote
<a href="https://twitter.com/SpaceJamCheck">@SpaceJamCheck</a>, a Twitter bot that will check on the status of the
website every few hours and tweet out the status:</p>
<p><a class="twitter-timeline" href="https://twitter.com/SpaceJamCheck" data-widget-id="421465296361103361">Tweets by @SpaceJamCheck</a></p>
<p>Furthermore, because I am apocalyptic, I wrote
<a href="https://twitter.com/spacejamisdown">@spacejamisdown</a>, a bot which checks the status of
the website every few hours, and will only report if it's not online:</p>
<p><a class="twitter-timeline" href="https://twitter.com/spacejamisdown" data-widget-id="421465631746060288">Tweets by @spacejamisdown</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script></p>
<p>With a little luck, this bot won't tweet any time soon.</p>
<p>Finally, because I have a love of writing random libraries, I wrote
the ruby gem <a href="https://github.com/muffinista/spacejam">spacejam</a>, a simple Ruby library
you can use to check on the status of any website. It can run tests
against expected response codes, the body of a page, and so on. It's
basic, but it's good enough to check on the status of the Space Jam
website.</p>
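<p>As a rough illustration of what such a check involves (the helper
names here are mine, not the gem's actual API), the idea can be
sketched with the standard library:</p>

```ruby
require 'net/http'
require 'uri'

# Sketch of a spacejam-style uptime check: fetch a page and compare it
# against expectations. Helper names are hypothetical.
def matches_expectations?(status, body, expected_status: 200, expected_body: nil)
  return false unless status == expected_status
  return false if expected_body && !body.include?(expected_body)
  true
end

def check_site(url, **expectations)
  res = Net::HTTP.get_response(URI.parse(url))
  matches_expectations?(res.code.to_i, res.body, **expectations)
end

# e.g. check_site("http://www2.warnerbros.com/spacejam/movie/jam.htm")
```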
<h2><a href="http://muffinlabs.com/2013/10/16/each-town-listing-all-towns-in-america-on-twitter/">Each Town - Listing All Towns in America on Twitter</a> (2013-10-16)</h2>
<p>A week or two ago I launched <a href="https://twitter.com/eachtown">@eachtown</a> on Twitter. It will spend the
next couple years tweeting the name and location of every populated
place in America, in alphabetical order.</p>
<p><a class="twitter-timeline" data-dnt="true" href="https://twitter.com/eachtown" data-widget-id="390546604588924928">Tweets by @eachtown</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script></p>
<p>A couple of years ago, I spent a lot of time fiddling with the
<a href="http://gnis.usgs.gov/index.html">USGS database of Geographic Names</a>. It's a cool set of data and
I've often thought of doing more with it. I was inspired by
<a href="https://twitter.com/everyword">@everyword</a> to create something similar, and decided to
create a bot which iterates through every populated place in America
and tweets the name along with a link to a Google Map of the location. I
enjoy the context you get from being able to look at a place.
Not every location in the database is a city or even a town. There are
mobile home parks, condominiums, and more. Seeing them on the map
reminds you that these places are real.</p>
<div>
<img src="/images/agnew-mobile-home-park-wa.jpg" class="imgp_img" alt="Agnew Mobile Home Park, WA" />
</div>
<p><strong><a href="https://maps.google.com/?t=k&q=48.122838,-123.221415">Agnew Mobile Home Park, WA</a></strong></p>
<p>It's a pretty simple bot, and I'll post the source code at some point
once I clean it up a little.</p>
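<p>The core of each tweet is just the place name plus a maps link in the
format shown above; a sketch (the function name is mine, not the bot's):</p>

```ruby
# Hypothetical sketch of building an @eachtown-style tweet: the place
# name plus a Google Maps link for its coordinates.
def town_tweet(name, lat, lon)
  format("%s https://maps.google.com/?t=k&q=%.6f,%.6f", name, lat, lon)
end

puts town_tweet("Agnew Mobile Home Park, WA", 48.122838, -123.221415)
# => Agnew Mobile Home Park, WA https://maps.google.com/?t=k&q=48.122838,-123.221415
```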
<h2><a href="http://muffinlabs.com/2013/06/14/gopherpedia-the-free-encyclopedia-via-gopher/">Gopherpedia - The Free Encyclopedia via gopher</a> (2013-06-14)</h2>
<p>My last release for Project Dump week is <a href="http://gopherpedia.com/">Gopherpedia</a> –
a mirror of Wikipedia in gopherspace. If you happen to have a gopher
client, you can see it at gopherpedia.com on port 70. Otherwise, you
can browse to <a href="http://gopherpedia.com/">gopherpedia.com</a> and view it via a web
proxy.</p>
<p>A couple of years ago, I landed on the idea of a gopher interface to
Wikipedia. Originally it was probably a joke, but it stuck with me. So
one day I registered a domain name and got to work. The first thing I
needed to do was build a gopher server, because none of the currently
available options were up to the task. So I built
<a href="https://github.com/muffinista/gopher2000">Gopher2000</a>. Then, I quickly realized that the current
gopher proxies weren't any good either, so I built <a href="https://github.com/muffinista/gophper-proxy">GoPHPer</a>.
Once both of those were written (well over a year ago), it didn't seem
like there was much left to be done – gopherpedia should've been
ready to launch.</p>
<p>But I hadn't reckoned on the challenges of churning through a database dump
of Wikipedia.</p>
<p>Wikipedia is very open. They have an API which you can use to search
and query documents, and they provide
<a href="http://dumps.wikimedia.org/">downloadable archives</a> of their entire collection of
databases. They encourage you to download these, mirror them, etc.</p>
<p>My first implementation of gopherpedia used the API. This worked well,
but had two problems. First, it was a little slow, since it needed to
query a remote server for every request. Second, Wikipedia prohibits
using the API this way - if you want to make a mirror of their
website, they want you to download an archive and use that, so their
servers aren't overloaded.</p>
<p>So I downloaded a dump of their database, which is a single 9GB
compressed XML file. Nine. Gigabytes. Compressed. A single file.</p>
<p>Then I took the opportunity to learn about streaming XML parsers.
Basically, I wrote a <a href="https://gist.github.com/muffinista/5781615">parser script</a> that parses the file as
it reads it, as opposed to reading the whole thing into memory
at once, which was clearly impossible. The script splits up wikipedia
entries and stores them as flat text files. Running that script took a
couple days on my extremely cheap Dreamhost server – that's right, I
have a gopher server hosted on Dreamhost.</p>
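<p>The parser script itself is linked above; the core streaming trick
looks something like this sketch: read line by line, buffer only the
current <code>&lt;page&gt;</code> element, and handle it the moment it
closes.</p>

```ruby
require 'stringio'

# Streaming sketch (not the actual parser script): only one <page>
# element is ever held in memory at a time.
def each_page(io)
  buffer = nil
  io.each_line do |line|
    buffer = +"" if line.include?("<page>")
    buffer << line if buffer
    if buffer && line.include?("</page>")
      yield buffer
      buffer = nil
    end
  end
end

# a tiny stand-in for the 9GB dump
dump = StringIO.new(<<~XML)
  <mediawiki>
    <page><title>Gopher (protocol)</title></page>
    <page><title>Ruby</title></page>
  </mediawiki>
XML

titles = []
each_page(dump) { |page| titles << page[/<title>(.*?)<\/title>/, 1] }
p titles  # => ["Gopher (protocol)", "Ruby"]
```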
<p>So, when someone requests a page, the gopher server reads that file,
does some parsing, and returns the result as a gopher query. Sounds
simple, right? Not quite, because parsing the contents of a wikipedia
entry is also a mess. It's part wikitext, part HTML, and there are
plenty of places where both are broken. If I were just outputting HTML,
I could probably get away with it. But since this is Gopher, I really
needed to format the results as plain text. I spent a while writing an
incredibly messy parser, and the imperfect results are what you see on
gopherpedia now. Sorry for all the flaws.</p>
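<p>As a flavor of what that cleanup involves, here's a toy version of
the wikitext-to-plain-text pass; the real parser handles far more
cases than these three rules:</p>

```ruby
# Toy wikitext cleanup: strip bold/italic quote runs, collapse
# [[link|label]] markup down to the label, and drop stray HTML tags.
def strip_wikitext(text)
  text
    .gsub(/'{2,}/, "")
    .gsub(/\[\[(?:[^\]|]*\|)?([^\]]*)\]\]/, '\1')
    .gsub(/<[^>]+>/, "")
end

puts strip_wikitext("'''Gopher''' is a [[communications protocol|protocol]] for <b>menus</b>.")
# => Gopher is a protocol for menus.
```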
<p>Anyway, this was a fun project, and it occupied a pleasant chunk of my
spare time over the last year or two, but it's time to release it to
the wild. Unless I'm mistaken, this is now the largest gopher site in
existence. There are about 4.2 million pages on gopherpedia, totaling
somewhere over 10GB of data.</p>
<p>Here's my favorite page on the site – the
<a href="http://gopherpedia.com/Gopher%20(protocol)">gopherified wikipedia entry for Gopher</a>.</p>
<p>Please note, this is in extreme beta and is likely to break; just let
me know if you have any problems. Enjoy!</p>