Mumblr Recent Entries

Monner - Monitor CPU, memory and network whilst running a program

Sun, 12 Aug 2012 13:40:20 +0100

I'm looking at re-implementing one of our services in various different ways. To help me decide between them, I want to benchmark each implementation and monitor how much of a system's resources it uses.

After a lot of digging the best solutions I found involved running various different commands, scraping together the output and then cobbling together a report. Not fun.

Enter Monner. A simple python script that lets you do:

monner some-program

And will output results like:

CPU (%)    Memory used (mb)    Network in (kb)    Network out (kb)
100.0      3379.9              0.0                0.0
100.0      3380.4              0.5                0.0
100.0      3380.4              0.1                0.0
100.0      3380.4              0.0                0.0

This can then be loaded into a spreadsheet program and graphed to your heart's content.
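
If you'd rather stay in Python than open a spreadsheet, the table can be parsed with a few lines. This is only a sketch: it assumes the output is laid out exactly as in the sample above (columns separated by two or more spaces), and parse_monner_output is a made-up helper, not part of Monner.

```python
import re


def parse_monner_output(text):
    """Parse Monner-style column output into a list of dicts keyed by header."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Headers contain single spaces ("Memory used (mb)"), so split on runs
    # of two or more spaces rather than on any whitespace.
    headers = re.split(r"\s{2,}", lines[0].strip())
    rows = []
    for line in lines[1:]:
        values = [float(value) for value in line.split()]
        rows.append(dict(zip(headers, values)))
    return rows


sample = """\
CPU (%)    Memory used (mb)    Network in (kb)    Network out (kb)
100.0      3379.9              0.0                0.0
100.0      3380.4              0.5                0.0
"""

rows = parse_monner_output(sample)
peak_memory = max(row["Memory used (mb)"] for row in rows)
```

From here the rows can be fed to whatever plotting or summarising tool you prefer.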

Code is on github: http://github.com/colinhowe/monner

SSDs on AWS - Impact on Conversocial

Mon, 23 Jul 2012 14:58:37 +0100

AWS releases SSD instances

AWS recently released a new instance type with two 1TB SSDs attached locally.

These SSDs are fast. Especially compared to EBS. EBS will push to about 700 random reads/second when RAIDed and network conditions are kind. The new SSDs will do about 120,000 random reads/second, a 170x performance boost, irrespective of network conditions.

We switched our MongoDB over

We've just switched our primary MongoDB replica set over to the new SSD instances instead of high-memory quadruple extra large instances (68GB RAM each). We've put the two drives together in a RAID1 array for extra reliability.

Performance Impact

Our average response time across all of Conversocial is 43% faster (392ms to 274ms). One particular view that grabs a lot of random data is now 74% faster (877ms to 504ms). We've not done much optimization beyond indexing yet so this is great.
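
A note on reading those percentages: they appear to be ratios of old response time to new response time, which is not the same as the reduction in latency. The small sketch below works through both interpretations using only the figures quoted above; the speedup helper is made up for illustration.

```python
def speedup(before_ms, after_ms):
    """Return (old/new ratio, fractional reduction in response time)."""
    ratio = before_ms / after_ms
    reduction = (before_ms - after_ms) / before_ms
    return ratio, reduction


avg_ratio, avg_reduction = speedup(392, 274)
view_ratio, view_reduction = speedup(877, 504)
```

So the average went from 392ms to 274ms, which is a ratio of about 1.43 (hence "43% faster") and a drop in response time of about 30%. The heavy view has a ratio of about 1.74 and a drop of about 43%.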

According to MMS our CPU usage on our old primary server was hovering around 90% iowait (we definitely needed to shard). It now hovers around 3%.

Long-term Gain - Less Optimizing for Random IO

We've been starting to look at performance and have been considering a lot of fun tricks to get our data clustered together and optimised for reading from platter-based disks. Now that we have SSDs we can save ourselves a lot of time and hassle by not doing these optimisations.

This is a feature we've been waiting for AWS to do for a long time. It's now here and, as always, AWS have done an amazing job.

MongoDB - Collection Per User Performance

Mon, 09 Jul 2012 11:34:57 +0100

Theory

On the MongoDB site there is a suggestion that collections can be used to cluster data and get better performance as a result.

The idea is that a different collection could be used for each user's data. Internally, MongoDB will use different extents for each collection (an extent is a contiguous block of storage). By doing this we ensure that a user's data will be stored in mostly sequential blocks on disk - making it far easier to read in data if going to disk.

Performance - Experiment

The theory is sound, so what about performance?

In this experiment I used two c1.medium (high CPU) instances on AWS - one to run the test and one to be the server.

The test script inserted 5,000,000 ~1kb emails into the database spread over 10,000 users (using a triangle distribution with a mean of 5,000 to simulate high/low volume users). 10,000 queries were then performed for 20 e-mails from a random slice of time for a random user. I'll add a link to the script tonight when I get access to my home laptop.

There was an index on user/time on every collection.

There were three variants:

One collection for all users
Fifty collections with users spread equally amongst the collections
One collection for each user
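
To make the three variants concrete, here is a rough sketch of how an application might choose a collection name for a user. The collection names and the use of a CRC32 hash for the fifty-collection case are assumptions for illustration; the test script may have assigned users differently.

```python
import zlib


def collection_for_user(user_id, variant, buckets=50):
    """Return the (hypothetical) collection name holding a user's emails."""
    if variant == "single":
        # One collection for all users: every document carries the user ID.
        return "emails"
    if variant == "bucketed":
        # crc32 is stable between runs, unlike Python's built-in hash() on str.
        bucket = zlib.crc32(str(user_id).encode("utf-8")) % buckets
        return "emails_bucket_%d" % bucket
    if variant == "per_user":
        # One collection per user: the user ID lives in the collection name.
        return "emails_user_%s" % user_id
    raise ValueError("unknown variant: %r" % variant)
```

In the per-user variant the user ID no longer needs to be stored or indexed inside each document, since the collection name already identifies the user.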

Performance - Results

Having a single collection for each user reduced the amount of storage needed (due to not needing to store/index the user ID):

308mb vs 476mb for indices
4044mb vs 4101mb for data

Query performance was as follows:

One collection for all users - 91.0ms / query
Fifty collections - 20.0ms / query
One collection per user - 13.2ms / query

Insert performance was:

One collection for all users - 5,167 inserts / sec
Fifty collections - 4,350 inserts / sec
One collection per user - 1,645 inserts / sec

Conclusions

Having a single collection per user was ~7x faster for reads but ~3x slower for writes.

Using fifty collections seemed to give a decent balance: ~4.5x faster for reads and only ~16% slower for writes.

This technique will introduce more complexity into your system and reduce the flexibility of querying. However, if performance is a concern then it is a technique worth considering - but benchmark it first as your mileage may vary :)

Powering Conversocial's Analytics

Tue, 03 Jul 2012 15:27:53 +0100

We recently released our new analytics functionality for our customers. It allows them to see stats like:

Number of messages received each day
Messages processed by each agent
Response times split into buckets (less than 30 minutes, less than 1 hour, etc)
Sentiment breakdown

All of this can be viewed for different date ranges and comparisons performed to previous time periods.

We've done some interesting things to make this possible and I'd like to share them.

Queues, MongoDB and Service Oriented Architecture

When an action is performed in the system (e.g. new content arrives, someone replies to a message) an event is generated and placed in our queueing system (backed by Redis and using pyres).

This is then picked up by a worker which identifies which metrics need updating (often tens of metrics for a single event) and then a call is made to our analytics API to perform the actual update.

The analytics API itself then handles pushing this data into MongoDB.
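
As a rough illustration of the worker step, the sketch below fans a single event out into several metric updates. Everything in it is hypothetical: the event fields, the metric names and the post_update callable (which stands in for the HTTP call to the analytics API) are invented for the example and are not our real code.

```python
def metrics_for_event(event):
    """Map one event to the (metric, day, amount) updates it should trigger."""
    day = event["timestamp"][:10]  # "YYYY-MM-DD" from an ISO timestamp
    updates = []
    if event["kind"] == "content_received":
        updates.append(("message-count", day, 1))
        updates.append(("sentiment-%s" % event["sentiment"], day, 1))
    elif event["kind"] == "reply_sent":
        updates.append(("agent-%s-replies" % event["agent_id"], day, 1))
    return updates


def process_event(event, post_update):
    """Worker step: send each metric update for an event to the analytics API."""
    for metric, day, amount in metrics_for_event(event):
        post_update(metric, day, amount)
```

The worker only decides what to update; how the update is stored stays entirely behind the API.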

Why have an internal API?

Creating an internal service for analytics with its own API has given us a lot of benefits:

Small and self-contained code base that deals purely with analytics - this makes debugging far simpler
Failure is isolated - if the analytics servers go down then everything else carries on running
Freedom to use different technologies - our main application is Django, our analytics service uses Flask and pymongo as they fit the requirements better
Upgrading / changing is easier - create a new analytics machine with the new code and redirect a percentage of requests to the new machine until we're happy with it

The two downsides to this:

The whole system now has extra moving parts - but they're simpler so this is OK
Our development environment becomes more complex - but we now have a Virtual Machine with scripts to start everything so this is moot

Why MongoDB?

MongoDB is very fast when things are in memory. As most of our metric updates touch the reading for the current period, we can safely assume that all our counter updates will be hitting data that is in memory.

To ensure good read performance we grouped readings for an individual metric together into a single document for each month. There is more on that in the "MongoDB - Strategies when hitting disk" post.

We also considered Redis and Cassandra but ruled them out:

Redis has a memory limit which makes it useless for us - we want our customers to be able to query data from last year in just as much detail as the data today
Cassandra would have also been a good fit - it has tremendous write performance. We have no experience deploying Cassandra and a lot of experience deploying MongoDB and so we went with MongoDB

Data Structure

All our metrics are stored in the same way: a single document per month with a value for each day. E.g.

{ type: 'message-count', date: '2012-07', 1: 756, 2: 754, ..., 30: 760 }

This works perfectly for simple statistics such as number of messages per day. However, it doesn't really work if you want to see the breakdown of messages by hour. To achieve this we store 24 metrics, 1 for each hour:

{ type: 'message-count-1', date: '2012-07', 1: 1, 2: 1, ..., 30: 3 }
{ type: 'message-count-2', date: '2012-07', 1: 2, 2: 3, ..., 30: 7 }
...
{ type: 'message-count-24', date: '2012-07', 1: 1, 2: 5, ..., 30: 6 }

Then if we want to get the breakdown by hour we query all 24 of these documents and combine them to create the hourly breakdown.
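
A minimal sketch of both sides of this structure in Python, assuming documents shaped like the examples above (with the day numbers as string keys, which is how they arrive through a driver). The function names are made up. With a driver such as pymongo, the filter and update would typically be passed to an upserting update call.

```python
def increment_spec(metric_type, year, month, day, amount=1):
    """Build the (filter, update) pair that bumps one day's counter."""
    query = {"type": metric_type, "date": "%04d-%02d" % (year, month)}
    # $inc creates the day field if it does not exist yet.
    update = {"$inc": {str(day): amount}}
    return query, update


def hourly_breakdown(hourly_docs, day):
    """Combine the per-hour documents into {hour: count} for one day."""
    breakdown = {}
    for doc in hourly_docs:
        # 'message-count-7' -> hour 7
        hour = int(doc["type"].rsplit("-", 1)[1])
        breakdown[hour] = doc.get(str(day), 0)
    return breakdown
```

Hours with no document, or days with no value, simply come back missing or as zero, so the caller can fill in the gaps as it sees fit.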

Performance

We currently use 3 of Amazon's small instances in a replica set to power this. We haven't really stress tested read performance - most queries respond in 1 or 2 milliseconds and doing lots of them at once hasn't caused CPU usage to go above 2%.

Whilst migrating our existing data over to our new analytics we maxed out at around 2,000 metric updates per second.

Conclusions

We've really enjoyed creating this new analytics system for ourselves. By isolating the entire system behind an internal API we made our lives a lot easier and simpler :)

We're Hiring!

We're on a mission to help companies give fantastic customer service on social media such as Facebook and Twitter.

If you want to join a London based VC funded startup working with fun things like Redis, MongoDB, Amazon Web Services and hundreds of millions of messages then e-mail jobs@conversocial.com with your CV and covering letter.

Considerations when Sharding

Fri, 08 Jun 2012 14:05:18 +0100

Whilst this is talking about our use of MongoDB there is relevance here for any sharding discussion.

We currently use MongoDB at Conversocial for our main content store. We're now starting to think about how we shard as the main store is getting pretty large (150 million documents across 300gb).

The easy answer for us is to use MongoDB's built-in sharding. MongoDB divides all the documents into equally sized chunks and distributes these evenly around all shards. This would cause the documents to be spread in a fairly even manner.

The difficult question then is what to shard on? Some combination of customer account and timestamps would ensure that all customer accounts are spread roughly equally around all our boxes and give us both read and write scaling.

Wonderful!

Except, if a shard goes down (assuming worst case scenarios) then ALL customers are affected to some extent. Reporting the temporary disappearance of some of their data will be quite difficult.

Instead we could limit the shard key to just customer account. This changes our failure scenario such that a subset of customers would temporarily lose access to ALL their data but the number of customers affected would be reduced. This scenario is also far simpler to explain to customers. A downside of this is that individual customers get no benefit from reads and writes being distributed around more servers and potentially running faster as a result.

This sounds good - all our customers' data will be evenly distributed (in terms of storage) across our shards and the failure scenario will only affect subsets of customers. Unfortunately, not all customers are created equal. Some customers have a lot of data but don't do a lot with it. Some customers have a lot of data AND do a lot with it. We can easily imagine scenarios where one server with 100gb of data on it has a far lower number of queries than another server with 100gb of data - simply because the latter server has more demanding customers placed on it. In general, this shouldn't happen as customers will be distributed randomly and so the heavy hitters should be distributed fairly evenly. In practice, we must consider worst case scenarios.

To top this off - we also have customers that have more demanding performance requirements (and increasingly, regulatory requirements in terms of where their data is and whether it is encrypted).

Simply sharding based on storage won't deal with this.

Instead, we're thinking of moving our sharding into our application and manually sharding based on how demanding a customer's usage is. The advantage of this is we can then place customers' data in different regions if needed (an added boon of this is we can place data closer to where they are and give them better performance). The disadvantage is that we're going to have to do all the sharding work ourselves.
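
One possible shape for application-level sharding is a router that checks a table of explicit placements first and falls back to hashing the customer ID. This is a sketch under assumptions, not a design we have settled on: ShardRouter, the shard names and the CRC32 fallback are all invented for illustration.

```python
import zlib


class ShardRouter:
    """Pick a shard per customer: explicit placement first, hash otherwise."""

    def __init__(self, default_shards, overrides=None):
        self.default_shards = list(default_shards)
        # customer_id -> shard name, for demanding or region-bound customers
        self.overrides = dict(overrides or {})

    def shard_for(self, customer_id):
        if customer_id in self.overrides:
            return self.overrides[customer_id]
        # Stable hash so a customer always maps to the same default shard.
        digest = zlib.crc32(str(customer_id).encode("utf-8"))
        return self.default_shards[digest % len(self.default_shards)]
```

For example, ShardRouter(["shard-a", "shard-b"], {"big-co": "shard-eu-dedicated"}) would keep "big-co" on its own shard in a specific region while everyone else is spread by hash. Note that a plain modulo fallback reshuffles customers whenever the list of default shards changes, which is part of the work you take on by doing this yourself.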

We've still not made our decision. I put this up here simply as a reminder to others that when deciding to shard you must consider your business needs as well as trying to get bigger read/write volumes.

MongoDB - Strategies when hitting disk

Thu, 26 Apr 2012 15:26:33 +0100

I gave a lightning talk on this at the London MongoDB User Group and thought I'd write it up here.

MongoDB sucks when it hits disk (ignoring SSDs). The general advice is to never hit disk. What if you have to hit disk? Conversocial's new metrics infrastructure will allow people to see statistics for their Facebook and Twitter channels going back indefinitely. In general, the data being queried and updated will be in the past month and we can keep this in memory. But, we want to let them query the data going back further than this - which means hitting disk.

We found three good strategies for making hitting the disk less painful:

1. Use Single Big Documents

The naive implementation of our metrics system stored documents like this:

{ metric: "content_count", client: 5, value: 51, date: ISODate("2012-04-01 13:00") }
{ metric: "content_count", client: 5, value: 49, date: ISODate("2012-04-02 13:00") }

An alternative implementation is:

{ metric: "content_count", client: 5, month: "2012-04", 1: 51, 2: 49, ... }

In this case we have a single document that spans an entire month with the value for each day being a field inside the document.

For a simple test we filled our database so that we had ~7gb of data on an Amazon c1.medium instance (1.7gb RAM) then tested how long it would take to read the data for an entire year and averaged this over multiple runs:

Naive implementation: 1.6s for a single year
Single document per month: 0.3s

That's a huge difference. The reasoning behind it is fairly simple:

The naive implementation has a worst case scenario where it has to read from the disk for all 365 documents and each of these results in a random seek
Having a single document per month has a worst case scenario where it has to read from the disk for 12 documents

An added benefit of this strategy is that there is less overhead per day which means the working set can contain much more data.

Foursquare do this.
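
To show how the two layouts relate, here is a small sketch that folds naive per-day documents into one document per metric, client and month. It assumes the field names from the examples above and that dates arrive as datetime objects; to_monthly_documents is a made-up helper.

```python
from collections import defaultdict
from datetime import datetime


def to_monthly_documents(daily_docs):
    """Fold per-day documents into one document per (metric, client, month)."""
    monthly = defaultdict(dict)
    for doc in daily_docs:
        date = doc["date"]
        key = (doc["metric"], doc["client"], date.strftime("%Y-%m"))
        # The day of the month becomes a field name inside the monthly document.
        monthly[key][str(date.day)] = doc["value"]

    result = []
    for (metric, client, month), days in sorted(monthly.items()):
        merged = {"metric": metric, "client": client, "month": month}
        merged.update(days)
        result.append(merged)
    return result
```

Reading a year then touches at most 12 of these documents rather than up to 365 of the naive ones.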

2. Unusual Indices

Sometimes it pays to experiment with unusual index layouts. The naive index for our metrics system is on metric, client and then date:

db.metrics.ensureIndex({ metric: 1, client: 1, date: 1})

A common tip with indexing is to have all new values go to one side of the index. We reasoned that although the date was at the end of our index we would be writing to the right of lots of parts of the index so performance should be OK. We were wrong. We compared the performance of the above index with a new one:

db.metrics.ensureIndex({ date: 1, metric: 1, client: 1 })

The naive implementation performed 10k/sec inserts but after 20 million inserts the performance dropped down to 2.5k/sec inserts and occasionally stalled with lots of IO to disk. Ouch.
By switching to date at the start of the index our performance was kept constant at 10k/sec inserts

What about queries? By putting the date at the front of the index we realised we'd now have to query an entire year of data using an $in query:

db.metrics.find({
    metric: 'content_count', client: 1, date: { $in: [ "2012-01", "2012-02", ... ] }
})

A test of the read performance of this displayed no noticeable impact.

The reasoning for this is that the naive implementation will be causing a lot of rebalancing of the trees used for the index. By switching the index around we ensured that all inserts went to one side of the index and rebalancing became a trivial operation.
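
For reference, a sketch of building the year-long $in query from Python rather than typing out the months by hand. The helper names are made up; the index is written as a list of (field, direction) pairs, which is the form Python drivers such as pymongo generally accept when creating an index.

```python
def months_of_year(year):
    """Return the twelve 'YYYY-MM' keys for a year, in order."""
    return ["%04d-%02d" % (year, month) for month in range(1, 13)]


def year_query(metric, client, year):
    """Build the query for a whole year of one metric for one client."""
    return {
        "metric": metric,
        "client": client,
        "date": {"$in": months_of_year(year)},
    }


# Field order for the date-first index described above.
DATE_FIRST_INDEX = [("date", 1), ("metric", 1), ("client", 1)]
```

With date first, each new month's inserts land at the right-hand edge of the index, and the $in query then probes twelve known positions in it.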

3. Pre-Allocate for Locality

For most disks (not SSDs) the sequential read performance is vastly better than the random read performance. This means that we can read our metrics really fast from disk if we read them all from the same part of the disk. With MongoDB, documents will reside on disk in the order that you wrote them unless they are resized and need to be moved around.

If we pre-allocate zero filled documents then we can force values for nearby months for the same metric to be stored on disk in the same location and then exploit the speed of sequential reads:

db.metrics.insert([
    { metric: 'content_count', client: 3, date: '2012-01', 0: 0, 1: 0, 2: 0, ... },
    { .................................., date: '2012-02', ... },
    { .................................., date: '2012-03', ... },
    { .................................., date: '2012-04', ... },
    { .................................., date: '2012-05', ... },
    { .................................., date: '2012-06', ... },
    { .................................., date: '2012-07', ... },
    { .................................., date: '2012-08', ... },
    { .................................., date: '2012-09', ... },
    { .................................., date: '2012-10', ... },
    { .................................., date: '2012-11', ... },
    { .................................., date: '2012-12', ... }
])

Now, when client 3 wants their values for 'content_count' for the past year we can serve it using one big sequential read.
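
Generating those zero-filled documents is easy to script. The sketch below is an illustration only: it numbers days from 1 to the length of each month (the shell example above starts at 0), and preallocated_year is a made-up helper. The resulting list would be written in a single bulk insert so the documents are laid down next to each other.

```python
import calendar


def preallocated_year(metric, client, year):
    """Build twelve zero-filled monthly documents for one metric and client."""
    docs = []
    for month in range(1, 13):
        days_in_month = calendar.monthrange(year, month)[1]
        doc = {
            "metric": metric,
            "client": client,
            "date": "%04d-%02d" % (year, month),
        }
        # Pre-fill every day so later updates never grow the document.
        for day in range(1, days_in_month + 1):
            doc[str(day)] = 0
        docs.append(doc)
    return docs
```

Because every day already has a value, incrementing a counter later changes the document in place instead of resizing it, which is what keeps the documents where they were first written.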

And the benchmarks?

Reading an entire year without pre-allocation: 62ms
Reading an entire year with pre-allocation: 6.6ms

Despite the performance gains from this we decided not to do this. Pre-allocation can get expensive for sparse data: you end up wasting a lot of space storing zeros that are never changed.

Conclusions

MongoDB can be made to have decent disk performance. You've just got to do some of the work yourself to ensure that reads aren't too expensive.

This is What We Make Happen

Fri, 20 Apr 2012 08:19:31 +0100

In a startup it's easy to lose focus. When you're worrying about investment, recruitment, customer requirements and all the other things you need to do, it's easy to lose sight of the problem your company is trying to solve.

This is where a simple daily e-mail can really help. At Conversocial we all get an email every day with five random conversations that our customers have had with their customers (the conversations are all public on Twitter and Facebook). This reminds us, on a daily basis, of what we make happen and the problem we're solving - customer service in social media. Not only does it remind us but it motivates us too - seeing people getting their issues resolved via Conversocial gets me fired up.

Give it a try yourself. Find the data that reminds you what you're about and create an automated daily e-mail for it. Afterwards, drop me a comment saying how you got on with it!

Never use Your ORM Directly

Sun, 15 Apr 2012 09:54:06 +0100

ORMs are great for helping a startup create a product quickly. The downside is that they make it incredibly easy to litter your codebase with code like this:

post = BlogPost.objects.get(id=post_id)

This seems relatively innocent at first, but, as your codebase grows you'll find it suffers from a few problems:

When you need to refactor how you access your blog posts you will need to do a search across all your code for any usage of BlogPost and check to see if it needs changing
Suppose you want to check if an index in your database is still needed. You would have to do a search across the entire codebase and check a lot of irrelevant uses of BlogPost
What if you wanted to completely change your ORM? All the places you used BlogPost will need changing, and there will be a lot of these places because of how you've used the ORM

Fortunately, it's easy to avoid this kind of problem by creating another layer between your ORM and the rest of your code - an internal API of sorts.

def get_blogpost_by_id(post_id):
    return BlogPost.objects.get(id=post_id)

Then the example above becomes:

post = get_blogpost_by_id(post_id)

This will make your life easier when the codebase has grown and you need to refactor.

Changing how getting by ID works requires searching only for get_blogpost_by_id which will always be exact matches that you are interested in
Finding out about index usage is easy - you only need to check the code in your internal API (and check that the API is used)
A complete change of your ORM can be made transparent to the rest of your application - they still call the same methods and the API handles the switch

This kind of approach also makes it easier to reason about how you access your data - all the access paths are in one place for you to look over. It also lends itself to exposing a real API to the rest of the world.
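
To show why the extra layer makes an ORM swap transparent, here is a self-contained sketch. It uses an in-memory stand-in instead of a real ORM so that it runs anywhere; InMemoryBlogPosts and set_backend are invented for the example, and in a Django project the backend would wrap BlogPost.objects instead.

```python
class InMemoryBlogPosts:
    """Stand-in backend; a real one would delegate to the ORM."""

    def __init__(self, posts):
        self._posts = dict(posts)

    def get(self, post_id):
        return self._posts[post_id]


# The only place in the codebase that knows which backend is in use.
_backend = InMemoryBlogPosts({1: {"id": 1, "title": "Hello"}})


def set_backend(backend):
    """Swap the storage backend without touching any callers."""
    global _backend
    _backend = backend


def get_blogpost_by_id(post_id):
    """The internal API the rest of the code calls."""
    return _backend.get(post_id)
```

Callers keep writing post = get_blogpost_by_id(post_id) whichever backend is plugged in.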

To summarise, never use your ORM directly.

Tackling Technical Debt

Sun, 08 Apr 2012 20:07:01 +0100

Most startups accrue a large amount of technical debt whilst they develop the first versions of their product. This isn't a problem. Many startups don't know exactly what they need to build until after they've shown something to their users - spending time perfecting these initial versions might be valuable time wasted.

This build-up of technical debt becomes a problem once the company reaches a certain size - often when the company knows what is needed and is more focussed on growing the customer base and scaling the software. This build-up of debt constantly slows down the technical team and reduces effectiveness. In some cases it can even bring development to a grinding halt. So, what do you do to reduce your technical debt?

Get your Debt Visible

Before you embark on reducing your debt it is worth spending time helping everyone understand the problem. Your sales and marketing folks probably don't (yet) care about your refactoring needs. Further, some on your technical team might not be as exposed to your debt as others. It's hard to justify spending time on a problem if not everybody agrees that there is a problem.

So, get your debt visible. You can use fancy solutions such as Geckoboard to allow everyone to see the impact in real-time. Alternatively, simply sending an email every day or week summarising the situation can work wonders.

But what do you show? There are lots of ways to measure the cost of your technical debt and here are a few to get you thinking:

Code complexity
Test coverage
Time spent on support issues due to technical debt
Slowdowns in new development caused by working around existing problems
$ worth of transactions lost due to technical debt (e.g. a client walks away due to instability in your platform)

An easily understood measure that is close to the business is your strongest ally for getting your business behind your attempts to reduce technical debt.

Once your business understands that there is a problem it is usually worthwhile explaining how you've got this problem. It's not because you're cowboys, it's because you wanted to help the business get established quickly.

Paying off the Debt

How Much to Pay Off

It is tempting to spend a lot of time paying off all technical debt and making everything rock solid. This has a few drawbacks:

It's demoralising to spend a long time reworking old code
It's demoralising not creating new and exciting things
It's impossible - you'll always have something that you're not totally happy with

Instead, I find it works best to nibble away at technical debt a little bit at a time. E.g. have a week or two of codepocalypse and then get back to normal. If you've taken the time to decide on a measure for your debt then you can use this measure to decide how long and often these binges should last.

What to Pay Off

When faced with a lot of technical debt there is a tendency to favour the massive rewrite. Resist that urge.

The most important thing to do is to prioritise your debt. It's easy to send your team off to tackle their favourite bit of debt, but that doesn't guarantee good results. I tend to prioritise these things:

Hard stuff. The hard stuff is rewarding and is unlikely to ever get fixed in normal day-to-day development
Infrastructure. Infrastructure (e.g. server setup, error reporting, logging) is also likely to be put off to another day during normal development
Common moans. Optimise for happiness. If everybody groans when they have to work in a certain area then get that fixed

Useful Techniques

Delete and Simplify Functionality

Remember that reporting functionality written six months ago as a trial feature? Does anyone use it? No? Delete it. What about the ultra-granular analytics you provide? Simplify them if no-one uses the granularity.

Scan your entire product for functions no longer called and features no longer used. Deleted code incurs no debt.

The Debt Register

Tracking your technical debt in a register is a great way to help prioritise. It can also be useful for reducing technical debt. Just the act of writing a bit of technical debt down can be enough to spur someone into great efforts to reduce it - pride is a wonderful thing :)

A debt register is also helpful in planning normal development. If everyone knows where the debt is then you're less likely to get nasty surprises when starting work on new functionality.

Classify Your Code's Debt Appetite

Some code is peripheral and can have a high amount of debt before it becomes a real problem. Some code can't. Your layer for talking to your database should have a very small amount of technical debt. Your cron job that sends a daily summary of new customer counts can tolerate a fair amount of technical debt.

Take the time to identify the areas of your system that are not tolerant of debt. Make sure everyone knows what these areas are and why.

Got Money? Use it

When bootstrapping a startup there are always a few technical decisions that are driven by financial reasons, e.g. "let's use these hosts because they give us free credit". Sometimes these decisions incur a lot of technical debt and the debt can easily be paid off with the sprinkling of some cash. Evaluate your past decisions (easy if you have a technical debt register) and decide if any can be wiped out with a bit of money.

How to Reduce Further Growth

After paying off your debt, further growth of technical debt is inevitable and is not necessarily a bad thing. If taking on technical debt allows your company to move faster on things it needs to then it is a debt worth taking. When taking on more debt remember to keep the company aware of this so that it is not taken by surprise later.

Many start-ups in their early phases produce code of generally lower quality than more mature companies, where quality is of greater importance than speed. Shifting your start-up team into this mentality can be hard. Getting the debt visible is a big step towards changing this mindset. Encouraging the team to have a big discussion about how to produce better quality code can create a surprisingly rapid change, because the team is much more bought into solutions it came up with itself.

Conclusions

Technical debt is an inevitable result of any startup trying to move fast. The trick is to not get drowned by it.

If Your Database Server Fails, What Happens?

Thu, 05 Apr 2012 13:57:12 +0100

I'm writing up everything I think someone running engineering at a startup needs to know. Every post will have a checklist at the end that can be used as a quick sanity check.

This is my first post for the series. Let me know what you think!

If Your Database Server Fails, What Happens?

Losing all your data is generally not an option. To prevent total data loss, most people look to backups. There is far more to backups than just copying the files somewhere with a cron job - the realm of backups is big enough that you could get a job as a backup engineer if you wished. In some cases, companies run quite happily without any traditional backups and instead rely on replication of data to enough locations that their data would survive anything short of a world war.

That said, there are some basic techniques and also some more advanced techniques you should be aware of to safeguard your data.

Techniques you Should be Aware of

DR Tests

I want to talk about recovery before backups - there is no point in backups if you cannot recover.

A DR (disaster recovery) test is a full or partial test of your recovery process. This can be made as close to a real disaster as you like, and the closer you get to reality the more confidence you can have in your recovery process.

The most important part to test is other people. You might know exactly how to handle database recovery but what if you aren't around the day that it all goes wrong?

Partial DR testing can also be performed automatically. A script could be written to regularly create a new server from recent backups and check that the backups work.
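
As a toy version of such a script, the sketch below backs up a database, opens the backup as if it were a freshly built server and checks that it is intact and contains the expected tables. SQLite (from the Python standard library) stands in for a real database server here purely so the example is self-contained; the function names and the choice of checks are assumptions, and a real test would restore onto a new machine.

```python
import sqlite3


def take_backup(source, backup_path):
    """Copy a live SQLite database to a backup file."""
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)
    finally:
        target.close()


def verify_backup(backup_path, expected_tables):
    """Open the backup as a fresh database and run basic sanity checks."""
    restored = sqlite3.connect(backup_path)
    try:
        # Ask SQLite to check the restored file for corruption.
        intact = restored.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
        counts = {}
        for table in expected_tables:
            row = restored.execute('SELECT COUNT(*) FROM "%s"' % table).fetchone()
            counts[table] = row[0]
        return intact, counts
    finally:
        restored.close()
```

Run on a schedule, a check like this can alert you when a backup is corrupt or suspiciously empty long before you need it.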

Examples of things to test include:

Configuration of a database server from a fresh machine
Recovery of your database from backup files to a new server
Switching your application to use the new server
Automatic failover of your database - take care with this, if it goes wrong you've just broken your production site
Communication processes - who needs to know about the disaster? Do customers need to know? Do you know what you would tell them?
Failure in recovery - what if the most recent backup is corrupt?

Whilst you are thinking about it, you shouldn't just limit DR testing to your servers. What about your office? What about people?

Benefits

A DR test will prove that you can actually recover in the event of a disaster. Until then, you're just hoping. The first time you do it, you will encounter flaws in your process. Testing it gives you the opportunity to improve it before you're doing it for real.

Doing regular DR tests will also ensure that other people know how to cope in a disaster.

Problems

DR tests can be time consuming to perform and can lead to real downtime if you are testing the failover of real components. The best way to mitigate this is to do such testing out of busy times to minimise impact.

The hardest part of doing DR testing is remembering to do it regularly. As your company evolves your recovery processes might need to evolve too; regular DR tests will highlight this need.

Full Database Backups

A full database backup is a useful but surprisingly hard thing to get right. Ensure you thoroughly read everything available on backups for your chosen database technology.

Benefits

Aside from the obvious, full database backups can also serve as good sources of data for testing or analysis (after being suitably cleansed of any confidential data).

Problems

Full database backups are hard to get right.

Consistency - is the backup consistent? Have you performed whatever your database requires to ensure the data is consistent? E.g. do you prevent writes to the files whilst creating the backup?
Restorability - can the files be restored? This sounds silly but it's relatively easy to create backup files that simply do not work (e.g. your backup device could have run out of space)
Time to restore - if backups are large they can take a long time to restore from (especially if copying from a different data centre). Backups are nice. Knowing it will take 24 hours to copy the backups isn't nice.
Security - are your backups secure? Many leaks of confidential data have been where someone has gained access to backups
Regularity - how regular are the backups? If they're your only option then you could lose a lot of data
Load - creating a full backup can place a lot of load on the server chosen for the backup. It can also slow down network performance as large files are copied

Point-in-time Restoration

Some database technologies give you the option of maintaining logs of operations (often called journals or binary logs). These can then be combined with a full backup to restore data to a single point-in-time - providing your log goes back as far as the time when the full backup was taken. In general, the database will start with the full backup and then replay the logs to get the database to the desired point in time.

Benefits

Journals are generally smaller than the entire database and so can be backed up more regularly than the entire database. This gives smaller windows in which data could be lost.

Journals can also be used in the event of an undesirable operation being performed on the database (e.g. deleting a lot of data accidentally). This is such a common use case that there are tools for most databases that will assist you in replaying entire journals except the specific commands that weren't wanted.

Problems

Replaying journals can take a long time. Combine that time with the time it takes to restore a full backup and you could have a long wait on your hands.

One-button Recovery

If your database fails you are probably going to be stressed. When you are stressed you are more prone to making mistakes. When recovering your database the last thing you want to do is make a mistake. Creating a script that performs recovery can make life much easier. Even better is a script that can be run with one click and zero configuration.

Benefits

Aside from reducing the chances of mistakes, a recovery script will allow others to do the recovery with minimal learning on their part.

Recovery scripts also make automatic DR testing much easier.

Problems

Recovery scripts need to be kept up to date. It is also tempting to make them more complicated than they need be - which can lead to bugs in your recovery process.

Automated Recovery

Automated recovery includes any process whereby a server failure is automatically detected and the server replaced by a new server that has been configured automatically.

Benefits

Downtime is kept to an absolute minimum. No more getting up in the middle of the night to restore a server.

Problems

Creating auto-recovery systems is hard. Detecting when a server has failed is hard - there are lots of ways a server could appear failed without it being a failure, e.g. a network issue could be making it appear down to your monitoring systems. Further, the failure might not be fixed by creating a new server, e.g. disk space might have run out.
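
One way to cut down on false alarms is to only act when several monitoring locations all agree, several times in a row. The sketch below shows that idea in Python. It is an assumption about how you might do it rather than a description of any particular tool, and the names should_replace and required_failures are invented for the example.

```python
def should_replace(check_results, required_failures=3):
    """Decide whether a server looks dead enough to replace.

    check_results: dict mapping a monitoring location to a list of recent
                   health-check results (True = healthy), newest last
    """
    if not check_results:
        return False  # no evidence at all, so do nothing
    for results in check_results.values():
        recent = results[-required_failures:]
        # Too little history, or any recent success from this location,
        # means the server may well be fine
        if len(recent) < required_failures or any(recent):
            return False
    return True


# One monitor has lost its route to the server, the other can still see it
print(should_replace({"london": [False, False, False], "dublin": [True, True, True]}))
# Every monitor has seen three failures in a row
print(should_replace({"london": [False, False, False], "dublin": [True, False, False, False]}))
```

This still does nothing for the second problem: if the real cause is a full disk then a fresh server may fill up in exactly the same way.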

Hot Standbys

A hot standby is a secondary database server that can be switched to if the primary database fails. Typically, the standby is kept up-to-date with the primary database by replicating writes to the standby (a variety of methods for doing this exist).

When the primary fails, the standby is switched to and database functionality is restored. This is called failover.

Most databases provide basic functionality for hot standbys but require manual failover. Some databases go further and provide functionality for automatic failover.

Benefits

Hot standbys are primarily useful when uptime is a concern. In general, a hot standby will allow you to restore database functionality within a few minutes (if not seconds).

A further benefit is that very little data is lost - typically, only the data that was written to the primary but hadn't yet been written to the standby. In some configurations it is possible to have zero data loss - some databases allow for writes to be classed as successful only when they have been replicated to a standby.

Problems

Most of the problems of hot standbys are related to how they have been configured.

One of the biggest problems with a hot standby is that they do little to protect against malicious (or accidental) data corruption. If someone deletes all the data on the primary server then it will likely be deleted from the standby too. Time-delayed standbys can be used to solve this problem.

The placement of a hot standby is important. If the standby is in the same data centre as the primary then it is useless in the event of power outages, network outages, etc.

It is tempting to have standby servers running on cheaper hardware than the primary servers. This may save money but has the downside that performance may be degraded when the primary server fails. It also increases the risk that the standby cannot keep up-to-date with the primary server.

Time-delayed Replicas

A time-delayed replica is the same as a hot standby except it is intentionally behind the primary server. Typically, a time-delayed replica is kept an hour or so behind the primary. Generally, the time-delayed standby is kept out of the set of servers that can be failed over to automatically.

Benefits

A time-delayed replica is useful when the worst should happen: someone accidentally deletes your data (or maliciously during a security breach). Having a time-delayed replica makes it quicker to recover in these situations. It can either be failed over to directly (losing a little bit of data) or allowed to catch up until just before the point of disaster.

Problems

Whilst useful, time-delayed replicas add extra costs that might not be needed. They also add additional overheads in configuration and maintenance.

No Master Server

Not having a master server means that writes can go to any of a number of servers. If one server should go down then writes can continue being performed to the other servers.

There are many different ways to do this and each database technology has its own way of doing it. Needless to say, many SQL implementations support this (e.g. Postgres, MySQL) as well as other technologies that do not use SQL (e.g. Cassandra, Riak).

Benefits

Having multiple masters makes it easier to keep writes going even in the event of a server failure.

Problems

If enough servers go down then writes become impossible. Ensuring that servers are in different racks (at the least) or in different data centres reduces the chance of multiple servers going down at the same time.

In general, the SQL implementations are significantly more complex than normal master-slave replication. The non-SQL technologies that have no master are unfamiliar to most developers and this could slow down your development (once the technologies are learnt you might find you develop faster as downtime issues occupy less time).

Checklist

As I said earlier, there is a lot more to backups than you might have thought. As with all things in technology there are a lot of trade-offs you will need to consider when designing your recovery processes. To make your decisions a bit easier I've put together a checklist for your backups in rough order of importance.

The Bare Minimum

Backups are made automatically
There is a written down restoration process that can be followed by any technical person in the company
At least one DR test when you think you have the rest of this list done

Periodic DR tests
There is a one-click restoration script that can be run by any technical person in the company
Backups are manually checked regularly by restoring to a server
Backups are automatically checked regularly by restoring to a server
Hot standbys are available that can be manually switched to
Restoration can be made to a single point in time

Nice to haves

Hot standbys are available that are automatically switched to
Hot standbys are in a different data centre
At least one time-delayed replica
There is no master database and loss of one server has no impact on write/read availability

Django Sampler 0.6 Released

Sun, 01 Apr 2012 10:08:55 +0100

I've just uploaded Django Sampler 0.6 to pypi!

For those that don't know, Django Sampler allows you to use cost-based sampling of queries on a production site to discover what is consuming the most resources. It's not always the bad queries that need optimising, sometimes it's the little ones that run a lot :)

There is only one new feature but I'm quite excited by it: samples are now split by day. This allows you to see how your queries change over time. If you perform some optimisations then you should be able to see the difference in the data by choosing which day you want to see. We've found this functionality really useful at Conversocial for assessing how much of an impact different changes have made.

Enjoy!

Why every Developer should have Redis

Thu, 12 Jan 2012 16:38:22 +0000

Queues, counters and polling

How many times have you modelled a queue in SQL?

What about event counters?

Let's not even get started on polling for changes...

Since we started using Redis we've noticed that most of our 'hard' modelling problems in SQL will fit into a Redis construct really easily:

Queues - try a list or a sorted set
Counters - a key and using incr/decr work nicely
Polling - Pub/sub or blocking pops on lists
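
As a rough sketch of the first two constructs, the helpers below wrap the list and counter commands. They assume a client object that exposes lpush, brpop and incr the way the redis-py client does. The helper names are made up for this example.

```python
def enqueue(client, queue, item):
    """Queue: push new work onto the left of a list."""
    client.lpush(queue, item)


def dequeue(client, queue, timeout=5):
    """Blocking pop from the right of the list, so items come out in the
    order they went in. Returns None if nothing arrives within timeout
    seconds, which replaces a polling loop."""
    result = client.brpop(queue, timeout=timeout)
    if result is None:
        return None
    _key, value = result
    return value


def count_event(client, name):
    """Counter: INCR treats a missing key as 0 and returns the new value."""
    return client.incr(name)
```

With redis-py installed you would pass in something like redis.Redis(). Note that values come back as bytes unless the client is created with decode_responses=True.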

Realising this - and using it - has made a lot of our code far simpler and faster.

Don't bin SQL - it's great for relational modelling. Be familiar with Redis too - it's far better for when you want non-trivial types (and it's really fast).

Getting the Size of a Specific Index in MongoDB

Sat, 24 Sep 2011 08:32:07 +0100

Spent a little while trying to find the size of a specific index this morning and couldn't find any documentation on how to do it. Eventually stumbled on it in db.collection_name.stats()

> db.content.stats()
{
    "ns" : "conversocial.content",
    "sharded" : false,
    "primary" : "main01",
    "ns" : "conversocial.content",
    "count" : 1924859,
    "size" : 1578724996,
    "avgObjSize" : 820.1769563381006,
    "storageSize" : 1746546688,
    "numExtents" : 23,
    "nindexes" : 2,
    "lastExtentSize" : 301682688,
    "paddingFactor" : 1,
    "flags" : 0,
    "totalIndexSize" : 179773888,
    "indexSizes" : {
        "_id_" : 56226352,
        "source_1_puid_1" : 123547536
    },
    "ok" : 1
}

Also, db.collection_name.totalIndexSize(true) will list these:

> db.content.totalIndexSize(true)
_id_   56226352
source_1_puid_1 123547536
179773888

And an easy script to get the indices in mb:

var s = db.content.stats().indexSizes;
for (var key in s) {
    // Convert bytes to megabytes
    print(key + ': ' + s[key] / (1024 * 1024));
}

Django SQL Sampler becomes Django Sampler (with Mongo support)

Wed, 21 Sep 2011 09:58:43 +0100

In a previous post I described Django SQL Sampler as a tool that helps you find the SQL queries that are consuming the most time on a production site.

I've now renamed Django SQL Sampler to Django Sampler because it now does much more. It now has a plugin architecture that makes it easier to start sampling things other than SQL queries. Using this I have added support for MongoDB. Soon, I'll be adding support for Django's views and also Celery tasks.

It's on github, so check it out and let me know what you think!

Why I Want to See Open Source on your CV

Mon, 12 Sep 2011 07:32:50 +0100

An interview is an opportunity for you to tell me about your legendary hacking and awesome communication skills. A contribution to an open source project shows me this, and more.

Designers don't turn up to interviews with just their CV, they bring along a portfolio of their work. They show their interviewer how good they are and demonstrate their style. If you're serious about software, you should do the same.

By making a single contribution to an open source project you show me that you can:

Code
Communicate
and, Analyse

It also demonstrates a commitment to wanting to be a developer.

It doesn't have to be a lot

I'm not looking for a massive commit. A single line patch, a bug report or a documentation change are all far better than nothing.

If you are serious, you can do any of this in a day.

Don't be afraid

Your code isn't perfect. Nobody's is. There isn't any perfect art either.

If you're just starting out then I expect you to not be as good as someone who has had a lot of experience. By seeing what you can do I can figure out your strengths and weaknesses and decide if they're things that we can work on.

There are lots of open source projects out there that are begging for commits. Most of them will welcome you with open arms. Remember that annoying little bug you encountered in that tool you used yesterday? You could fix that.

What's the best thing to do?

There are lots of ways to contribute. Below I've listed some of the ways you can do so in descending order.

Fix a complex bug

Fixing a complex bug shows that you've understood a complex system. It shows you can work with other people and it shows you can code.

Add/improve a feature

This is below fixing a complex bug because it's normally easier to add or improve an existing feature than it is to understand a big system.

Create an awesome bug report

An awesome bug report contains:

Detailed reproduction steps
Expected result
Actual result
A test case

Doing this shows me how you think and that you can express your thoughts clearly. Writing a test case shows that you can actually write automated tests.

Fix a simple bug

This shows you can work with other people and it shows you can code.

Write a documentation change

Documentation is crucial to most large projects. It's also often done with less effort than the rest of the project.

Helping out with documentation shows me how you communicate and that you understand the system you're writing about.

Help on mailing lists

Subscribe to a project's mailing list. If you see a question come in that you can answer - answer it!

Variety is Key

Doing a variety of the tasks above shows that you are all-round amazing. Fixing lots and lots of simple bugs is great but I'd rather see one simple bug fix and one documentation change - it eliminates more unknowns if I can see a variety in what you do.

Start Small

Jumping in to rewriting something is a big task. You'll likely not finish it either if it is your first time. So, start on something small.

Get on with it!

Stop reading this and get out there and do something. Anything.

Django SQL Sampling

Sun, 04 Sep 2011 19:18:04 +0100

Any site of a large size will have lots of different queries going on. If you're having performance trouble it's easy to find and fix the ones that are causing trouble. It's not so easy if you have a query that runs moderately fast but is run a lot. It's not even easy to find such queries.

Enter Django SQL Sampler (github link). This little tool that I wrote will sample a configurable percentage of SQL queries and group them by stack trace. It then gives you a view over these queries that allows you to find the queries that are taking up the most cumulative time - these are the queries that should be optimised or cached.
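
The core idea fits in a few lines. The sketch below is not the code from the project, just a minimal illustration of sampling a fraction of queries, grouping them by stack trace and ranking the groups by cumulative time. The class and method names are invented for the example.

```python
import random
import traceback
from collections import defaultdict


class QuerySampler:
    def __init__(self, sample_rate=0.1, rng=random.random):
        self.sample_rate = sample_rate  # fraction of queries to record
        self.rng = rng
        # stack trace -> running totals for queries issued from that place
        self.samples = defaultdict(
            lambda: {"count": 0, "total_time": 0.0, "sql": None}
        )

    def record(self, sql, duration, stack=None):
        if self.rng() >= self.sample_rate:
            return  # this query was not sampled
        if stack is None:
            # Drop the last frame (this method) so the key is the caller's stack
            stack = "".join(traceback.format_stack()[:-1])
        entry = self.samples[stack]
        entry["count"] += 1
        entry["total_time"] += duration
        entry["sql"] = sql  # keep one example query for display

    def most_expensive(self, limit=10):
        """Groups with the highest cumulative time first."""
        ranked = sorted(
            self.samples.items(),
            key=lambda item: item[1]["total_time"],
            reverse=True,
        )
        return ranked[:limit]
```

A query that takes 5ms but runs thousands of times from one call site will float to the top of most_expensive(), which is exactly the kind of query that is hard to spot otherwise.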

Let me know what you think!

Profiling Eventlet

Fri, 26 Aug 2011 16:36:51 +0100

I wanted to profile our system that is using eventlet. I wasn't happy with any of the existing solutions so I rolled my own: https://github.com/colinhowe/eventlet_profiler.

The existing solutions have a few failures:

You can't figure out how much time a specific function spends calling another function
They generally measure CPU time instead of wall-clock time (wall-clock time is more useful if your bottleneck is MySQL or similar)
They don't track how much time is spent doing nothing (i.e. waiting for greenlets to finish)
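
The difference between the two clocks is easy to see with the standard library. This is only an illustration of the CPU time versus wall-clock time point, not code from the profiler. The timed helper is a made-up name, and the sleep stands in for waiting on a database.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


# Sleeping stands in for waiting on MySQL: lots of wall-clock time, little CPU
_, wall, cpu = timed(time.sleep, 0.2)
print("wall: %.3fs  cpu: %.3fs" % (wall, cpu))
```

A profiler that only reports the CPU figure would make this call look almost free, even though the caller waited for the full sleep.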

Feedback is welcome!

My Ideal Pair-Programming Setup

Wed, 24 Aug 2011 18:39:15 +0100

Whenever I'm pair-programming with someone we often hit a spot where we both want to experiment and try stuff out - different ways of approaching a problem, playing with APIs, whatever.

Going back to your own computer when you hit this spot isn't great. By moving away from the other person you lose the ability to quickly show each other code and discuss things.

I think I've finally figured out the ideal setup to solve this problem.

Three monitors side-by-side. Each person has their own mouse and keyboard. Two chairs reasonably close together.

The person on the left gets sole control of the left monitor. The person on the right gets sole control of the monitor on the right. The monitor in the middle is shared ground. Either person can drag their windows from their monitor across to the shared one when they have something they want to share, talk about or pair on properly.

This allows people to pair really quickly and easily. It also allows you to switch back to your own experiments just as easily. You could even take it further and have this setup in a long row where everyone excluding the edge people have two people to pair with at any time. Or, further still, a circle of programming.

Sounds great to me... but, I don't know if there is any combination of software and hardware already available that would make this work. If anyone has an idea I'd love to hear it!

MySQL Performance for More Data than Memory

Mon, 22 Aug 2011 17:10:00 +0100

In a previous post I looked at the performance hit you get in MongoDB when the working set is larger than the memory available. I thought I'd have a look to see how MySQL fares under similar conditions.

The experiment was very similar to the one for MongoDB.

Setup

MySQL server: A large Amazon RDS instance. I chose this as RDS is very well configured for MySQL. I'm more interested in how performance changes so don't want to spend time on configuration.

MySQL client: An EC2 small instance.

Code: up on github.

Test

The test will involve inserting X documents to MySQL with the following structure:

key: n (where n is 0, 1, ... X - 1)
text: 'Mary had a little lamb. ' x 100

There will be an index on key to prevent full scans of the data.

After the insert there will be 30,000 reads with random keys.

The expectation is that when the data set gets too large to fit in memory the random reads will become very slow. This will be because most reads have to go to disk instead of being served from memory.

When this thrashing of the disk starts happening it will be interesting to see what happens when a subset of the dataset is read from. To investigate this a further test will be run that:

99% of the time - reads from a random key chosen from only Y% of the keys
1% of the time - reads from any key chosen from the entire dataset

The expectation here is that for small Y the performance will be similar to when the entire data set is in memory - as the pages that contain the subset of data will be in memory already and not need to read from disk.
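
The key selection for this focused test can be sketched as below. This is an illustration rather than the code on github. In particular it assumes the focused subset is simply the first Y% of the keys, which the description above does not specify.

```python
import random


def choose_key(total_keys, focus_percent, rng=random):
    """Pick the key for the next read.

    99% of the time the key comes from the focused subset (assumed here to
    be the first focus_percent of keys); 1% of the time it can be any key.
    """
    focus_size = max(1, int(total_keys * focus_percent / 100))
    if rng.random() < 0.99:
        return rng.randrange(focus_size)  # read from the hot subset
    return rng.randrange(total_keys)  # read from anywhere in the dataset


rng = random.Random(0)
keys = [choose_key(10000000, 1, rng) for _ in range(30000)]
hot = sum(1 for key in keys if key < 100000)
print("%d of %d reads hit the focused 1%% of keys" % (hot, len(keys)))
```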

Results

A result spreadsheet is available here (Google Doc).

The interesting part is when you reach 10,000,000 keys and start reading from subsets:

    Focus (%)  Read 1 (s)  Read 2 (s)  Read 3 (s)
      100        172.35      135.02      154.77
       10        124.70       95.48       77.29
        1         32.58       24.46       22.59

For comparison, the reads for a database with only 10,000 keys average out to 20.5s.

Unsurprisingly, this is similar to the results with MongoDB. What I think makes this interesting is that a lot of people don't seem to be aware of this - there are a lot of discussions about how to optimise indices but not many on how to keep your working set small. I know that we've benefited from being more aware of this when optimising MySQL queries. Maybe you will too?

Visualising Eventlet

Wed, 17 Aug 2011 09:46:02 +0100

When using eventlet you typically have a lot of concurrent things all being processed at once. Trying to identify problems in highly concurrent systems can be hard. To solve a problem we had I created eventlet visualiser (on github). This allows you to see the life span of all your eventlets in a program:

Each block can be clicked on and information is output in the console - such as stack trace and arguments to the function.

I used this to identify that a certain class of eventlet was slowly taking over the pool and preventing execution of any other eventlet. Hopefully someone else will find it useful too :)

Bash: Search and replace across multiple files

Tue, 09 Aug 2011 08:17:33 +0100

I sometimes need to do a search and replace across lots of files. So, I made a function in my .bashrc to make it easy:

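function find_replace {
    local find=$1
    local replace=$2
    echo "Finding $find and replacing with $replace"
    # List the files that contain a match
    grep -rl -- "$find" .
    # Replace every occurrence in each matching file, in place
    grep -rl -- "$find" . | xargs sed -i -e "s/$find/$replace/g"
}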

Usage: find_replace <search> <replace>
e.g. find_replace hello goodbye
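
One thing to be aware of is that sed treats the search term as a regular expression, so characters such as / or . in it change the meaning. If you want a purely literal replacement, a small Python equivalent is one option. This is a sketch under the assumption that the files are UTF-8 text, and anything that cannot be read that way is skipped.

```python
import os


def find_replace(root, find, replace):
    """Replace every literal occurrence of find in files under root.

    Returns the list of paths that were changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # newline="" keeps the file's existing line endings intact
                with open(path, encoding="utf-8", newline="") as handle:
                    text = handle.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            if find not in text:
                continue
            with open(path, "w", encoding="utf-8", newline="") as handle:
                handle.write(text.replace(find, replace))
            changed.append(path)
    return changed
```

Usage: find_replace(".", "hello", "goodbye")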

Solr, Solango and being IO bound

Thu, 14 Jul 2011 11:49:56 +0100

We just hit a problem where the indexing performance of our Solr instance dropped massively when re-indexing the entire database. At the start it would be doing 100 docs/second but after an hour or so dropped down to 10/second and carried on falling.

After looking at iostat I discovered that Solr was IO bound. Specifically, the disk was maxing out writes at 20MB/s (this is on ephemeral storage on a large EC2 instance). However, our total data size at this point was only 350MB, meaning that we were doing a large amount of rewriting of our indexes.

We're using Solango (a python library) to talk to Solr. We've set it up to commit in batches of 500. However, by default Solango will invoke the optimize command after each batch instead of using commit.

Optimize is a heavy-weight command that will remove deleted records from indices, compact indices and generally do all sorts of good stuff. Good stuff that smashes the disk and isn't so necessary during every little step of a re-index.

So, after changing that to do a commit instead of an optimize we're now indexing roughly 150 docs/second and maintaining that pace :)
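
The pattern looks roughly like the sketch below. The add, commit and optimize arguments are placeholder callables, not Solango's actual API, and running a single optimize at the very end is an assumption about what you might want rather than something described above.

```python
def reindex(documents, add, commit, optimize, batch_size=500):
    """Index documents in batches, committing per batch and optimizing once."""
    batch = []
    for document in documents:
        batch.append(document)
        if len(batch) == batch_size:
            add(batch)
            commit()  # cheap: makes the batch visible to searches
            batch = []
    if batch:
        add(batch)  # whatever is left over after the last full batch
        commit()
    optimize()  # expensive: done once, after everything is indexed
```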

Google Paying $150m to Employees - Might not be crazy

Thu, 07 Apr 2011 16:26:11 +0100

According to this article on Techcrunch Google have offered $150 million in stock grants vesting over four years to keep two key employees.

I think that this could be a clever move by Google. Google are trying hard to attract excellent engineers. There are so many companies trying to hire these same people that it is incredibly hard to stand out. When you think of it like this, $37.5 million a year might be a better investment than a load of advertising. It has certainly caught the eye of a number of notable tech sites... plus they get to keep some people they clearly want to keep.

The counter argument is that if you're an excellent engineer then you likely care far more about the environment you're in and what you're working on. Google are trying to fix this - they're currently doing a reorganisation to try and make themselves less bureaucratic and more innovative. Everybody already knows that Google offer a lot of perks. Coupling all this with some big publicity about how well they reward their heroes might just be a winning strategy.

Is AWS the right host for us?

Tue, 05 Apr 2011 20:24:53 +0100

Short answer: probably not for 80% of sites out there. But, it is right for us. To find out why, read on. For a quick guide, skip to the conclusions ;-)

Question of Cost

AWS: generally more expensive for hardware

For the hardware, AWS is expensive compared to traditional hosting (shared or dedicated). However, AWS gives you greater flexibility than a traditional host:

If you find yourself with too much capacity you can turn off a server or two and stop paying
If you find yourself with too little... you can turn them back on

If you have spiky traffic (e.g. slow periods on weekends) then with traditional hosting you will have to pay for spare capacity all the time. With AWS this isn't the case and it can result in lower costs for some sites.

AWS: Redundancy is cheaper

On a traditional host you might need two web servers to cope with your levels of traffic. Your business can't afford to have these go down so you have two backup web servers, always on standby and always costing money.

On AWS you can fire up a new server in minutes. So, instead of having two standby servers you might have one, or even none and just fire them up when something goes down (AWS can even do this for you). This isn't a luxury you get with most traditional hosts as it can take them some time to set up new servers.

Cost of Staff

Hardware costs aren't the only cost. You also have the cost of staff to consider. With dedicated hardware you might need to employ someone to do your sysops far sooner than with AWS. With AWS you can find far more information out there on how to do things and have your existing team maintain the infrastructure more easily. You can also afford mistakes as you can simply start a new server if you break one ;-)

Testing is Cheap

Testing new code can be really cheap with AWS. If we have a big feature to roll-out and want to give it a thorough working over before releasing then we can start up an entire farm of servers with the new code. When we're done, we kill the farm. No more paying for test servers to sit around all day, every day, not being used.

S3 is Cheap

It's worth pointing out that S3 can be cheaper compared to most CDNs. You can use S3 without using any other part of AWS if you wish. S3 is not a CDN itself but does do a good job of serving static content. CloudFront is AWS's CDN that is built on top of S3 and performs much better. CloudFront is more expensive than S3 but is still worth exploring.

Cost not a decider

For us, cost was not the decider. It's more expensive to be on AWS but the cost isn't so high that it's not worth the other benefits we get.

Scaling

80% of people think that their site needs to scale to epic proportions. Sadly, nearly all of them are wrong. Most sites do not need to cope with a million unique visitors a day. Many sites have fairly slow growth patterns and do not need to make use of AWS to get servers up quickly as they grow.

Some sites are not going to grow wildly but still need scaling.

Some sites (such as ours) are rapidly changing what they do and adding new features. In this case it can be incredibly hard to determine what your hardware needs are going to be ahead of time. Having to worry about resource requirements every time you add a substantial new feature can really slow your development. As an example, at Conversocial we added some new features for searching Twitter. These had the potential to double the amount of content we handle each day. Did we worry about the performance impact? Yes, for about 2 minutes. Our answer was that we would roll it out and monitor the performance; if Conversocial slowed down then we'd fire up new servers.

RDS: Hosted MySQL

Amazon recently added RDS to their suite of features. RDS is hosted MySQL. It handles pretty much everything:

Failover
Backups
Replication setups
Upgrades
Point in time snapshots and recovery

It's actually pretty amazing. This was a big deal for us. Not having to worry about all of this freed up a large chunk of our time. It also meant we haven't had to employ someone to look after MySQL for us.

Growing Feature Set

AWS is constantly adding new features. Frankly, the pace is astonishing. There are some features currently lacking that we'd like (e.g. knowing we have SSDs for our disks) but we are confident that these features, or ones we haven't even thought of, will soon come out and make our lives better.

Community Support

There is a thriving community around AWS. The abstracted platform makes it really easy for people to share recipes. Want a Mongo server on AWS? There's a guide for it. We don't have a dedicated sysadmin (yet) and the community support was a big pull for us.

Availability Zones

All hosts have downtime for one reason or another (natural disaster, a bomb or someone destroying their wire to the outside world). If you're a business then you probably worry about this a lot.

AWS has solved this problem by having multiple Availability Zones (AZs) across the world. Each AZ is a data centre that is isolated from all the others. Certain things are shared across multiple AZs, such as machine images and EBS snapshots. EBS is the persistent disk-based storage that you can attach to a running machine.

If one AZ gets wiped off the face of the Earth then you can quickly restore your entire infrastructure on another AZ using your AMIs and snapshots. Alternatively, you can have hot standbys in different AZs that automatically take over when something goes wrong.

It's cool

I'll be honest, AWS is cool. Many technical folks get excited by it. This did play a small role in our decision making: if we want to hire great people then we want to make sure we're giving them opportunities to learn about interesting things. AWS helps us do this :)

Alternatives

Using AWS doesn't have to be an all-or-nothing undertaking. I know of people using AWS solely for their backend processing and web servers, while their Mongo database is hosted on a dedicated server with an SSD RAID somewhere with a good connection to AWS. Similarly, some companies use AWS solely for bulk processing of data with Hadoop.

Conclusions

AWS is not for everyone. In particular, it's not for you if:

You're running just a single server and aren't (honestly) going to need to scale
You're running a load of servers and have fairly stable hardware needs
Hardware is a high cost for you (as %age of expenditures)
Your system hits disk a lot (IO performance on AWS is known to be lacking)

That said, we went with AWS and we're loving it so far. It can be slow at times and we have had an instance die on us (but we started a new one and nobody noticed anything had happened). At some point in the future, when Conversocial is massive and we have a better picture of our hardware needs, we might look at mixed hosting.

Here's what we have on AWS at the moment:

Solr for search
Web servers running Apache
MySQL (via RDS)
MongoDB
Memcached
Backend servers running Celery

Hope that helps. If anyone has any questions, please ask!

How we migrated to AWS

Sun, 03 Apr 2011 13:13:23 +0100

A few weeks ago we (Conversocial) migrated our infrastructure from a shared Solaris host to AWS. I'm going to talk about how we did our migration; why we chose AWS is a big enough topic for another blog post.

The two goals for the migration were:

Migrate a customer at a time instead of doing a big bang migration. This meant that we could get a subset of our customers on the new infrastructure to make sure we were 100% happy before moving more and more customers across.
As seamless as possible for our customers. We wanted all our customers to carry on logging in at the same URL and not really be aware that they were on different servers (in different parts of the world).

Background - Our Infrastructure

Before pressing on it's worth saying a bit about what our infrastructure looked like. The infrastructure was fairly standard as far as Python/Django setups go:

a MySQL database with all our data in
some web servers running Apache to handle requests
some backend servers running Celery to do polling and process background tasks

With regards to size, our database wasn't massive but some of our customers did have several gigabytes of data.

General Approach

The general approach we took for each customer was as follows:

Mark the customer account as being currently migrated. This locked them out of the site but it freed us from having to worry about changes whilst we copied data around
Create an SQL dump of their data
Import the SQL dump in to the new database
Mark the customer account as migrated in both the old and new databases
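
In code, the steps above might look something like the sketch below. Everything here is a stand-in: set_status, dump and load are hypothetical names rather than our actual migration code, and unlocking the account again when something fails is an assumption added for the example.

```python
def migrate_account(account_id, old_db, new_db, dump, load):
    """Move one customer's data from the old database to the new one."""
    # Lock the customer out so nothing changes while data is copied
    old_db.set_status(account_id, "migrating")
    try:
        data = dump(old_db, account_id)  # SQL dump of just this customer
        load(new_db, data)  # import the dump into the new database
    except Exception:
        # Assumed behaviour: give the customer access back on the old site
        old_db.set_status(account_id, "active")
        raise
    # Both sides need to know the account now lives on the new infrastructure
    old_db.set_status(account_id, "migrated")
    new_db.set_status(account_id, "migrated")
```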

Locking them out of the site

Locking customers out of the site was done for each customer in turn. In most cases the customer got full access again within five minutes. Our largest customers were locked out for up to two hours. This wasn't as bad as it sounds - as we could do a customer at a time we were able to identify the ideal time for each customer and migrate them at their convenience and not ours.

To make things easier for the customer, the lock-down page had an auto-refresh to push them back to the site as soon as their account was migrated.

Dumping their data

To dump the data for each customer we created the MySQL Partial Dump tool. This tool allowed us to describe our schema and how all our tables relate to each other. It was then incredibly easy to create a MySQL dump for each customer. An added bonus is that this has given us an easy way to get data for testing environments (data cleansing is supported in the partial dump tool).

Handling Logins

Handling logins was probably the trickiest part of the migration. Whilst both infrastructures were live we wanted customers to be able to login at a single URL and be taken to the appropriate infrastructure without noticing anything different.

To handle this we created an additional subdomain: app.conversocial.com (changing to this was something we wanted to do anyway). Our existing infrastructure was hosting www.conversocial.com and we wanted customers to continue logging in at www.conversocial.com. To do this we altered our code in several ways:

For logins we first checked whether the e-mail existed in the legacy infrastructure
- If the e-mail was found and the account was not migrated then the e-mail/password was checked as normal
- If the e-mail did not exist or the account was migrated then our old servers made a request to the new ones, passing through an encrypted token containing the e-mail/password. If a match was found then a one-time token was created and passed back to the user in a redirect response that pushed them on to app.conversocial.com.
The forgotten password form followed a similar system
All new sign-ups were directed to the new infrastructure
The background tasks/pollers would only handle accounts that were on the same infrastructure as themselves
All page requests went through a Django middleware that checked if the account had become migrated. If so, the user was redirected to the same URL but on app.conversocial.com. This accounted for someone using the site being migrated between requests
Likewise, all AJAX requests went through a Django middleware. The difference here was that the response had to trigger some Javascript to do the redirection instead of relying on HTTP redirects
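
The login decision and the "same URL on the new host" redirect can be sketched in plain Python. This is an illustration under assumptions, not the code we ran: `route_login`, `migrated_redirect_url`, the `/login?token=` path and the `https` scheme are hypothetical, and passwords are compared as plain strings purely to keep the example short (a real system would compare hashes).

```python
from urllib.parse import urlsplit, urlunsplit

NEW_HOST = "app.conversocial.com"


def route_login(email, password, legacy_accounts, new_infra_login):
    """Decide how the legacy servers handle a login attempt.

    legacy_accounts: dict of email -> {"password": str, "migrated": bool}
    new_infra_login: callable(email, password) -> one-time token or None,
        standing in for the server-to-server request to the new infrastructure.
    Returns an (action, detail) tuple.
    """
    account = legacy_accounts.get(email)
    if account is not None and not account["migrated"]:
        # Known, unmigrated account: check credentials locally as normal.
        ok = account["password"] == password
        return ("local_login", None) if ok else ("reject", None)
    # Unknown or migrated account: ask the new infrastructure.
    token = new_infra_login(email, password)
    if token is None:
        return ("reject", None)
    # Hypothetical URL shape for the one-time-token redirect.
    return ("redirect", "https://%s/login?token=%s" % (NEW_HOST, token))


def migrated_redirect_url(url):
    """Return the same URL on the new host, as the page middleware would."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme, NEW_HOST, parts.path, parts.query, parts.fragment))
```

For example, `migrated_redirect_url("http://www.conversocial.com/inbox?page=2")` keeps the path and query string and only swaps the host.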

Putting it all together

All of this was put together using fabric so that we could do:

fab migrate:<account IDs>

Using fabric made it simple to handle connections to several servers for moving data around.

Doing a test-run

The entire migration went through two test-runs to ensure it would work when we did it for real. It also allowed us to iron out a few kinks.

There were two differences with the test-run:

the migration flags weren't set for accounts as we migrated them. We didn't want customers suddenly using our new infrastructure before it was ready
the data was cleansed of all e-mails and sensitive data before copying it. This prevented our new infrastructure from starting to send e-mails to customers (the old infrastructure would also be sending them e-mails)

Apart from that, it was all the same.

Conclusion

None of our customers noticed any unplanned down-time
The vast majority of our customers didn't even notice the migration at all
We didn't work crazy hours
Nothing went wrong that freaked us out
Small things did go wrong, but the slow pace of the migration meant that the problems were small and isolated instead of catastrophic

Overall, we were very happy with the migration :)

Tea rounds

Sun, 13 Mar 2011 19:32:55 +0000

When starting your first job there is a lot to learn. Especially for a developer. During this learning frenzy it is easy to overlook soft skills that will help you make friends in your new work place. The easiest and most important of these is how to make a tea round (this includes coffee).

Making tea rounds is an important part of the work place:

getting to know names becomes easier
forces you to take a break and clear your head
people like getting cups of tea made for them

Sadly, tea round rules vary and it can be a minefield if you get the rules wrong. Fortunately, there is one universal law that will save you from upsetting your colleagues.

The universal law of tea rounds

If you do not make rounds of tea then do not accept cups from others making them.

Those that ignore this rule are seen as people who don't share in the work and think themselves special. That's not what you want.

Who to make tea for

The rule that varies most is how many people to offer tea to.

In small offices (a handful of people) this is usually everybody in the office.
In large offices the situation gets more complex: by team? by row of desks?

The simplest thing to do is watch and learn. If in doubt, be generous until you've figured out the optimal round size.

Remember preferences

Most people have a preference on how they take their tea. Remember it. They'll like you more for it.

Persistent 'no's

You will meet at least one person who always says no to a cup of tea. Ask if they want you to stop asking before you stop offering. The person might want a cup every once in a while or they might just want to feel included.

How to make tea

There are plenty of tutorials out there on how to make tea. The one thing they never mention is to make it too strong instead of too weak. It's easy to fix tea that is too strong and people rarely moan about it. Weak tea, on the other hand, is frowned upon and cannot be fixed.

Enjoy!

Enjoy making tea for people. No-one likes a grudgingly offered cup of tea.

Goodbye wordpress, hello Mumblr

Tue, 08 Mar 2011 21:47:00 +0000

I've been using a self-hosted Wordpress blog for about two years now. In those two years I've not been very impressed by it. It's easy to extend (sometimes) but it runs so slowly and has so much clutter that I felt embarrassed by my blog.

So, I've moved to Mumblr by my colleague Harry Marr. Mumblr is written in Python and has Mongo as the database. It runs superfast and is easy to edit to have the functionality I need.

Another big change is that I have moved from my host (NearlyFreeSpeech) on to an Amazon EC2 micro instance. It was really easy to set up and performance has been good so far.

The end result? A fast blog I can be proud of (barring the design, I'm working on making it neater).

MongoDB Performance for more data than memory

Wed, 23 Feb 2011 23:41:40 +0000

I've recently been having a play around with MongoDB and it's really cool. One of the common messages I see all over is that you should only use it if your dataset fits into memory. I've not yet seen any benchmarks on what happens when it doesn't though. So, here is a benchmark on how MongoDB performs when the data is bigger than the amount of memory available.

Setup

Mongo server: An EC2 large instance (64 bit) running an Ubuntu 10.10 image from Alestic. Has 7.5GB of memory. Data folder was on the instance and not EBS.

Mongo client: An EC2 small instance.

Test

The test will involve inserting X documents to MongoDB with the following structure:

key: n (where n is 0, 1, ... X - 1)
text: 'Mary had a little lamb. ' x 100

There will be an index on key to prevent full scans of the data.

After the insert there will be 30,000 gets with random keys.
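
As a rough sketch of the data shape and the random-get phase (this is not the benchmark script itself, and it deliberately leaves out the MongoDB calls), the documents and the keys to read could be generated like this. The helper names `make_documents` and `random_keys` are made up for the example.

```python
import random

# The text field: one short sentence repeated 100 times.
TEXT = "Mary had a little lamb. " * 100


def make_documents(count):
    """Yield documents shaped like the ones described above."""
    for n in range(count):
        yield {"key": n, "text": TEXT}


def random_keys(count, reads, rng):
    """Keys for the random-get phase: uniform over 0 .. count - 1."""
    return [rng.randrange(count) for _ in range(reads)]


docs = list(make_documents(1000))
keys = random_keys(len(docs), 30000, random.Random(0))
```

Seeding the random generator makes a run repeatable, which helps when comparing timings between runs.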

The expectation is that when the data set gets too large to fit in memory the random gets will become very slow. This will be due to MongoDB's memory mapped files no longer fitting in memory and needing to be read from disk.

When this thrashing of the disk starts happening it will be interesting to see what happens when a subset of the dataset is read from. To investigate this a further test will be run that:

99% of the time - reads from a random key chosen from only Y% of the keys
1% of the time - reads from any key chosen from the entire dataset

The expectation here is that for small Y the performance will be similar to when the entire dataset is in memory - as the pages that contain the subset of data will be in memory already and not need to read from disk.
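
The 99%/1% key selection can be sketched as below. It assumes the focused subset is the lowest-numbered Y% of keys; `pick_key` is a hypothetical helper and not necessarily how the benchmark script does it.

```python
import random


def pick_key(total_keys, focus_percent, rng):
    """Pick a key: 99% of the time from the first focus_percent% of keys,
    1% of the time from the entire key range."""
    if rng.random() < 0.99:
        # Focused read: only the first Y% of keys are candidates.
        focus_size = max(1, total_keys * focus_percent // 100)
        return rng.randrange(focus_size)
    # Unfocused read: any key in the dataset.
    return rng.randrange(total_keys)


rng = random.Random(1)
picks = [pick_key(10_000_000, 1, rng) for _ in range(30000)]
in_focus = sum(1 for k in picks if k < 100_000)
```

With Y = 1 and 10 million keys, nearly all of the 30,000 picks land in the first 100,000 keys, so only a small set of pages needs to stay in memory.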

Results

Basic results

A result spreadsheet is available here (Google Doc).

Up to 3 million documents the reads were consistent around 17s for 30,000 reads:

      Keys  Average time (s)  Memory usage (mb)
    10,000              16.8   forgot to check
   100,000              16.9               547
 1,000,000              18.0              1672
 3,000,000              17.2              4158
10,000,000              74.1              7469 (16.1gb inc. virtual)

Once the dataset got larger than the amount of memory available the read time got slow. It wasn't as slow as it could be in extreme cases as roughly half of the dataset would still have been in memory.

It's worth noting that at this point inserts started getting slow: 178s for 3 million documents vs 1,102s for 10 million documents (~17k inserts/sec vs ~9k inserts/sec).
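
The insert rates quoted above follow directly from the timings:

```python
def inserts_per_second(documents, seconds):
    """Average insert rate over the whole run."""
    return documents / seconds


rate_3m = inserts_per_second(3_000_000, 178)
rate_10m = inserts_per_second(10_000_000, 1102)
print(round(rate_3m), round(rate_10m))  # prints: 16854 9074
```

So the average insert rate roughly halved once the dataset no longer fitted in memory.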

What about when reading a subset more often?

Focus (%)  Read 1 (s)  Read 2 (s)  Read 3 (s)
      100        73.1        75.3        73.9
       10        54.3        37.0        29.5
        1        21.1        18.8        18.2

Focus in the above results refers to the percentage of the dataset that was chosen for 99% of reads. In this case it was the first Y% of rows to be inserted, meaning that those pages had likely been pushed out of memory by the time we wanted to read them.

The results show that MongoDB will perform just as fast on a dataset that is too large for memory if a small subset of the data is read from more frequently than the rest.

It was interesting to see the 10% figure drop over time. I suspect that this figure will get closer to 18s as the number of reads increases - more and more of the pages will be cached by the operating system and not need to be read from disk.

Conclusions

From doing this it can be seen that the performance of MongoDB can drop by an order of magnitude when the dataset gets too big for memory (in this test, 30,000 reads went from about 17s to 74s with roughly half of the dataset still in memory). However, if the reads are clustered in a subset of the dataset then much of that data can be kept in cache and reads stay quick.

It's definitely worth noting that it's normal for the performance to drop by an order of magnitude when the database has to start hitting disk. The point of this experiment was to make sure that it was only one order of magnitude and that if reads were focussed the performance would stay high.

Code

The code for the benchmark (for improvements and your own testing) is in github: http://github.com/colinhowe/mongo-benchmarks/blob/master/bench.py

Getting a slice of live data (MySQL Partial Dump)

Sun, 13 Feb 2011 14:37:00 +0000

Have you ever needed to get all the data for a single customer from a MySQL database? All their orders, the products they've viewed, their billing preferences, everything.

I needed to do this for Conversocial when debugging a problem a customer was having. More recently, I've needed to do this for a server migration project. To make this easy I've created a tool: MySQLPartialDump.

MySQLPartialDump allows you to describe your database structure using a simple DSL. It will then crawl your database following the relationships you specify and create a dump file that can be imported directly in to MySQL.
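
To give a feel for the crawling idea, here is a toy sketch that follows parent-to-child relationships from one customer row and renders the result as INSERT statements. It is not MySQLPartialDump's DSL or implementation: the in-memory `TABLES`, the `RELATIONSHIPS` mapping and both functions are invented for illustration, and it assumes a tree-shaped schema where each child table has a single parent.

```python
from collections import deque

# Toy in-memory "database": table name -> list of row dicts.
TABLES = {
    "customer": [{"id": 1}, {"id": 2}],
    "orders": [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 2}],
    "order_items": [{"id": 100, "order_id": 10}, {"id": 101, "order_id": 11}],
}

# Relationships: parent table -> [(child table, child foreign-key column)].
RELATIONSHIPS = {
    "customer": [("orders", "customer_id")],
    "orders": [("order_items", "order_id")],
}


def crawl(root_table, root_id):
    """Collect every row reachable from one root row via the relationships."""
    selected = {
        root_table: [r for r in TABLES[root_table] if r["id"] == root_id]
    }
    queue = deque([root_table])
    while queue:
        parent = queue.popleft()
        parent_ids = {row["id"] for row in selected[parent]}
        for child, fk in RELATIONSHIPS.get(parent, []):
            # Keep only child rows that point at an already-selected parent.
            selected[child] = [r for r in TABLES[child] if r[fk] in parent_ids]
            queue.append(child)
    return selected


def to_inserts(selected):
    """Render the collected rows as INSERT statements (integer columns only)."""
    statements = []
    for table, rows in selected.items():
        for row in rows:
            cols = ", ".join(row)
            vals = ", ".join(str(v) for v in row.values())
            statements.append(
                "INSERT INTO %s (%s) VALUES (%s);" % (table, cols, vals))
    return statements
```

Calling `to_inserts(crawl("customer", 1))` yields statements for customer 1, their order and that order's items, and nothing belonging to customer 2.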

More complex features include:

Cleansing of sensitive data - e.g. removing emails
Creating multiple files for parallel import
Custom relationships for tables that relate to other tables optionally

For full details please see the readme.

If you find this useful - please let me know :)

Celery and Sentry - Recording Errors

Tue, 08 Feb 2011 16:32:00 +0000

As part of improving our infrastructure for Conversocial we wanted to add django-sentry. This little app groups together similar errors and makes diagnosing problems far easier. It integrates with Django seamlessly, but it needs a bit of work to get Celery sending errors to it.

This is not quite as simple as I first thought. After a bit of experimentation I found the following in our tasks.py worked:

# Hook up sentry to celery's logging
import logging
from celery.signals import task_failure
from sentry.client.handlers import SentryHandler

# Use a dedicated logger rather than celery's own logger
logger = logging.getLogger('task')
logger.addHandler(SentryHandler())


def process_failure_signal(exception, traceback, sender, task_id,
                           signal, args, kwargs, einfo, **kw):
    # Rebuild exc_info so the traceback is attached to the log record
    exc_info = (type(exception), exception, traceback)
    logger.error(
        'Celery job exception: %s(%s)' % (exception.__class__.__name__, exception),
        exc_info=exc_info,
        # Keep the task details out of the message itself
        extra={
            'data': {
                'task_id': task_id,
                'sender': sender,
                'args': args,
                'kwargs': kwargs,
            }
        }
    )


# Run the handler whenever a task fails
task_failure.connect(process_failure_signal)

This is based on some code from the Celery user group. The main difference is that instead of adding the SentryHandler to the celery logger I define my own logger. I do this because I found numerous issues when trying to add it to the celery logger, including:

Double-recording of the errors in Sentry
The task ID appearing in the message in Sentry - eliminating Sentry's ability to group messages
Celery's info/warning messages came through to Sentry - we use Splunk for checking our logs so wanted to get just the errors
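
The effect of using a dedicated logger can be shown with the standard library alone. In this sketch `CollectingHandler` is a made-up stand-in for SentryHandler and `some.library` is a placeholder logger name; it only demonstrates the logging behaviour and makes no claim about Sentry's internals.

```python
import logging


class CollectingHandler(logging.Handler):
    """Stand-in for SentryHandler: just remembers the records it receives."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = CollectingHandler()
task_logger = logging.getLogger("task")
task_logger.addHandler(handler)

# Messages sent to other loggers never reach the handler on "task".
logging.getLogger("some.library").warning("worker starting")

# The task ID travels in `extra`, not in the message, so two failures of the
# same kind produce identical messages.
for task_id in ("id-1", "id-2"):
    task_logger.error("Celery job exception: ValueError(bad input)",
                      extra={"data": {"task_id": task_id}})

messages = {record.getMessage() for record in handler.records}
```

Both failures end up with the same message text while each record still carries its own task ID in `record.data`, which is the property the second point above depends on.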

Thought I'd share this nugget for anyone else who tries to get this working and hits problems :)
