<?xml version="1.0" encoding="ISO-8859-1"?><?xml-stylesheet type="text/xsl" href="/xml/style.xsl" version="1.0"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>W(e)blinks</title>
        <atom:link href="https://wblinks.com/xml/notes.xml" rel="self" type="application/rss+xml" />
        <link>https://wblinks.com/</link>
        <description>Notes Feed</description>
        <language>en-us</language>
        <copyright>Copyright 2025 Rich Adams.</copyright>
        <lastBuildDate>Sun, 12 Jan 2025 04:29:22 +0000</lastBuildDate>
        <ttl>10</ttl>
        <image>
            <url>https://wblinks.com/img/logo.gif</url>
            <title>W(e)blinks</title>
            <link>https://wblinks.com/</link>
        </image>
        <category>Notes</category>
                <item>
            <title><![CDATA[AWS Tips I Wish I'd Known Before I Started]]></title>
            <link>http://wblinks.com/notes/aws-tips-i-wish-id-known-before-i-started</link>
            <guid>http://wblinks.com/notes/aws-tips-i-wish-id-known-before-i-started</guid>
            <pubDate>Mon, 03 Feb 2014 00:00:00 +0000</pubDate>
<description><![CDATA[Moving from physical servers to the "cloud" involves a paradigm shift in thinking. Generally in a physical environment you care about each individual host; they each have their own static IP, you probably monitor them individually, and if one goes down you have to get it back up ASAP. You might think you can just move this infrastructure to AWS and start getting the benefits of the "cloud" straight away. Unfortunately, it's not quite that easy (believe me, I tried). You need to think differently when it comes to AWS, and it's not always obvious what needs to be done.<br />
<br />
So, inspired by <a href='https://launchbylunch.com/posts/2014/Jan/29/aws-tips/'>Sehrope Sarkuni's recent post</a>, here's a collection of AWS tips I wish someone had told me when I was starting out. These are based on things I've learned deploying various applications on AWS both personally and for my day job. Some are just gotchas to watch out for (and that I fell victim to), some are things I've heard from other people that I ended up implementing and finding useful, but mostly they're just things I've learned the hard way.<br />
<br />
<h3 class='subtitle' id="application-development"><a href="#application-development" class="anchor-link"></a>Application Development</h3><br />
<strong>Store no application state on your servers.</strong><br />
The reason for this is so that if your server gets killed, you won't lose any application state. To that end, sessions should be stored in a database (or some other sort of central storage: memcached, redis, etc.), not on the local filesystem. Logs should be handled via syslog (or similar) and sent to a remote store. Uploads should go directly to S3 (don't, for example, store them on the local filesystem and have another process move them to S3). And any post-processing or long-running tasks should be done via an asynchronous queue (SQS is great for this).<br />
<br />
<strong>Edit:</strong> <em>For S3 uploads, HN user <a href='https://news.ycombinator.com/item?id=7172583'>krallin pointed out</a> that you can bypass your server entirely and use <a href='http://docs.aws.amazon.com/AmazonS3/latest/dev/PresignedUrlUploadObject.html'>pre-signed URLs</a> to let your users upload directly to S3.</em><br />
<br />
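To make the idea of a pre-signed URL concrete, here's a sketch of S3's legacy (v2) query-string authentication scheme in plain Python. This is for illustration only: modern SDKs sign with SigV4, and in practice you should call the SDK's own pre-signing method. The bucket name, key, and credentials below are all made up.

```python
import base64
import hashlib
import hmac
import urllib.parse

def presigned_get_url(bucket, key, access_key, secret_key, expires_epoch):
    """Sketch of S3's legacy (v2) query-string authentication: a signature
    over a fixed string, valid until the expiry time. Use the SDK's own
    pre-signing method in real code."""
    string_to_sign = "GET\n\n\n{0}\n/{1}/{2}".format(expires_epoch, bucket, key)
    signature = base64.b64encode(
        hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    ).decode()
    query = urllib.parse.urlencode({
        "AWSAccessKeyId": access_key,
        "Expires": expires_epoch,
        "Signature": signature,
    })
    return "https://{0}.s3.amazonaws.com/{1}?{2}".format(bucket, key, query)

# All values below are made up for illustration.
url = presigned_get_url("my-bucket", "uploads/photo.jpg",
                        "AKIDEXAMPLE", "not-a-real-secret", 1700000000)
print(url)
```

Anyone holding this URL can GET the object until the expiry time, which is what lets users upload or download directly without your server in the middle.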
<strong>Store extra information in your logs.</strong><br />
Log lines normally have information like timestamp, pid, etc. You'll also probably want to add <em>instance-id</em>, <em>region</em>, <em>availability-zone</em> and <em>environment</em> (staging, production, etc), as these will help debugging considerably. You can get this information from the <a href='http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AESDG-chapter-instancedata.html'>instance metadata service</a>. The method I use is to grab this information as part of my bootstrap scripts, and store it in files on the filesystem (<span class='inline'>/env/az</span>, <span class='inline'>/env/region</span>, etc). This way I'm not constantly querying the metadata service for the information. You should make sure this information gets updated properly when your instances reboot, as you don't want to save an AMI and have the same data persist, as it will then be incorrect.<br />
<br />
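As a minimal sketch of the approach above, here's how the cached instance metadata can be injected into every log line using Python's standard logging module. The metadata values are made up; in production they'd be read from the files your bootstrap script wrote (<span class='inline'>/env/az</span>, <span class='inline'>/env/region</span>, etc.).

```python
import io
import logging

# Hypothetical cached metadata -- in production, read these from the files
# a bootstrap script wrote at boot (/env/az, /env/region, etc.).
meta = {"instance_id": "i-0abc1234", "region": "us-east-1",
        "az": "us-east-1a", "env": "staging"}

stream = io.StringIO()
handler = logging.StreamHandler(stream)
# Reference the extra fields directly in the log format string.
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(instance_id)s %(region)s %(az)s %(env)s "
    "%(levelname)s %(message)s"))

logger = logging.getLogger("aws-tips-example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# LoggerAdapter merges the metadata into every record automatically.
log = logging.LoggerAdapter(logger, meta)
log.info("user login succeeded")
print(stream.getvalue().strip())
```

Every line then carries the instance-id, region, AZ, and environment without each call site having to pass them.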
<strong>If you need to interact with AWS, use the SDK for your language.</strong><br />
Don't try to roll your own. I did this at first because I only needed a simple upload to S3, but then you add more services and it's just an all-around bad idea. <a href='http://aws.amazon.com/tools/'>The AWS SDKs</a> are well written, handle authentication automatically, handle retry logic, and they're maintained and iterated on by Amazon. Also, if you use EC2 IAM roles (which you absolutely should, more on this later), the SDK will automatically grab the correct credentials for you.<br />
<br />
<strong>Have tools to view application logs.</strong><br />
You should have an admin tool, syslog viewer, or something that allows you to view current real-time log info without needing to SSH into a running instance. If you have centralised logging (which you really should), then you just want to be sure you can read the logs there without needing to use SSH. Needing to SSH into a running application instance to view logs is going to become problematic.<br />
<br />
<h3 class='subtitle' id="operations"><a href="#operations" class="anchor-link"></a>Operations</h3><br />
<blockquote class="big"><p>If you have to SSH into your servers, then your automation has failed.</p></blockquote><br />
<br />
<strong>Disable SSH access to all servers.</strong><br />
This sounds crazy, I know, but port 22 should be disallowed for everyone in your security group. If there's one thing you take away from this post, this should be it: <strong>If you have to SSH into your servers, then your automation has failed</strong>. Disabling it at the firewall level (rather than on the servers themselves) will help the transition to this frame of thinking, as it will highlight any areas you need to automate, while still letting you easily re-instate access to solve immediate issues. It's incredibly freeing to know that you never need to SSH into an instance. This is both the most frightening and yet most useful thing I've learned.<br />
<br />
<strong>Edit:</strong> <em>A lot of people are concerned about this particular tip (there's some <a href='https://news.ycombinator.com/item?id=7173361'>good discussion over on Hacker News</a>), so I'd like to expand on it a little. Disabling inbound SSH has just been a way for me to stop myself cheating with automation (Oh, I'll just SSH in and fix this one thing). I can still re-enable it in the security group if I need to actively debug something on an instance, since sometimes there really is no other way to debug certain issues. It also depends on your application; If your application relies on you being able to push things to a server via SSH, then disabling it might be a bad idea. Blocking inbound SSH worked for me, and forced me to get my automation into a decent state, but it might not be for everyone.</em><br />
<br />
<strong>Servers are ephemeral, you don't care about them. You only care about the service as a whole.</strong><br />
If a single server dies, it should be of no big concern to you. This is where the real benefit of AWS comes in compared to using physical servers yourself. Normally if a physical server dies, there's panic. With AWS, you don't care, because auto-scaling will give you a fresh new instance soon anyway. Netflix have taken this several steps further with their <a href='http://techblog.netflix.com/2011/07/netflix-simian-army.html'>simian army</a>, where they have things like <a href='http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html'>Chaos Monkey</a>, which will kill random instances in production (they also have Chaos Gorilla to kill AZs and I've heard rumour of a Chaos Kong to kill regions...). The point is that <em>servers will fail</em>, but this shouldn't matter in your application.<br />
<br />
<strong>Don't give servers static/elastic IPs.</strong><br />
For a typical web application, you should put things behind a load balancer, and balance them between AZs. There are a few cases where Elastic IPs will probably need to be used, but in order to make best use of auto-scaling you'll want to use a load balancer instead of giving every instance its own unique IP.<br />
<br />
<strong>Automate everything.</strong><br />
This is more of general operations advice than AWS specific, but everything needs to be automated. Recovery, deployment, failover, etc. Package and OS updates should be managed by something, whether it's just a bash script, or Chef/Puppet, etc. You shouldn't have to care about this stuff. As mentioned earlier, you should also make sure to disable SSH access, as this will pretty quickly highlight any part of your process that isn't automated. Remember the key phrase from earlier, <em>if you have to SSH into your servers, then your automation has failed</em>.<br />
<br />
<strong>Everyone gets an IAM account. Never login to the master.</strong><br />
Usually you'll have an "operations account" for a service, and your entire ops team will have the password. With AWS, you definitely don't want to do that. Everyone gets an IAM user with just the permissions they need (least privilege). IAM users can control everything in the infrastructure, so there's rarely a reason to log in to the master account. At the time of writing, the only thing an IAM user can't access is some parts of the billing pages.<br />
<br />
If you want to protect your account even more, make sure to <a href='http://aws.amazon.com/iam/details/mfa/'>enable multi-factor authentication</a> for everyone (you can use Google Authenticator). I've heard of some users who give the MFA token to two people, and the password to two others, so to perform any action on the master account, two of the users need to agree. This is overkill for my case, but worth mentioning in case someone else wants to do it.<br />
<br />
<blockquote class="big"><p>The last time I had an actionable alert from CloudWatch was about a year ago...</p></blockquote><br />
<br />
<strong>Get your alerts to become notifications.</strong><br />
If you've set everything up correctly, your health checks should automatically destroy bad instances and spawn new ones. There's usually no action to take when getting a CloudWatch alert, as everything should be automated. If you're getting alerts where manual intervention is required, do a post-mortem and figure out if there's a way you can automate the action in future. The last time I had an actionable alert from CloudWatch was about a year ago, and it's extremely awesome not to be woken up at 4am for ops alerts any more.<br />
<br />
<h3 class='subtitle' id="billing"><a href="#billing" class="anchor-link"></a>Billing</h3><br />
<strong>Set up granular billing alerts.</strong><br />
You should always have at least one billing alert set up, but that will only tell you once you've already exceeded your expected spend for the month. If you want to catch runaway billing early, you need a more fine-grained approach. The way I do it is to set up an alert for my expected usage each week. So the first week's alert is for, say, $1,000, the second for $2,000, the third for $3,000, etc. If the week-2 alarm goes off before the 14th/15th of the month, then I know something is probably going wrong. For even more fine-grained control, you can set this up for each individual service; that way you instantly know which service is causing the problem. This could be useful if your usage on one service is quite steady month-to-month, but another is more erratic. Have the individual weekly alerts for the steady one, but just an overall one for the more erratic one. If everything is steady, then this is probably overkill, as looking at CloudWatch will quickly tell you which service is causing the problem.<br />
<br />
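The weekly thresholds described above are just cumulative fractions of the expected monthly spend, which a short helper makes explicit (the $4,000/month figure is only an example):

```python
def weekly_billing_thresholds(expected_monthly_usd, weeks=4):
    """Cumulative weekly alert thresholds: the week-1 alarm fires at a
    quarter of the month's expected spend, week 2 at half, and so on."""
    per_week = expected_monthly_usd / weeks
    return [round(per_week * week, 2) for week in range(1, weeks + 1)]

# With an expected $4,000/month bill, the alarms sit at $1k, $2k, $3k, $4k.
print(weekly_billing_thresholds(4000))
```

Each value becomes the threshold of one CloudWatch billing alarm; if a later week's alarm fires early, spend is running ahead of plan.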
<h3 class='subtitle' id="security"><a href="#security" class="anchor-link"></a>Security</h3><br />
<strong>Use EC2 roles, do not give applications an IAM account.</strong><br />
If your application has AWS credentials baked into it, you're "doing it wrong". One of the reasons it's important to use the AWS SDK for your language is that you can really easily use EC2 IAM roles. The idea of a role is that you specify the permissions a certain role should get, then assign that role to an EC2 instance. Whenever you use the AWS SDK on that instance, you don't specify any credentials. Instead, the SDK will retrieve temporary credentials which have the permissions of the role you set up. This is all handled transparently as far as you're concerned. It's secure, and extremely useful.<br />
<br />
<strong>Assign permissions to groups, not users.</strong><br />
Managing users can be a pain. If you're using Active Directory, or some other external authentication mechanism which you've integrated with IAM, this probably won't matter as much (or maybe it matters more). But I've found it much easier to manage permissions by assigning them only to groups, rather than to individual users. It's much easier to rein in permissions and get an overall view of the system than going through each individual user to see what permissions have been assigned.<br />
<br />
<strong>Set up automated security auditing.</strong><br />
It's important to keep track of changes in your infrastructure's security settings. One way to do this is to first set up a security auditer role (<a href='https://s3.amazonaws.com/reinvent2013-sec402/secaudit.json'>JSON template</a>), which will give anyone assigned that role read-only access to any security related settings on your account. You can then use this rather <a href='https://s3.amazonaws.com/reinvent2013-sec402/SecConfig.py'>fantastic Python script</a>, which will go over all the items in your account and produce a canonical output showing your configuration. You set up a cronjob somewhere to run this script, and compare its output to the output from the previous run. Any differences will show you exactly what has been changed in your security configuration. It's useful to set this up and just have it email you the diff of any changes. (Source: Intrusion Detection in the Cloud - <a href='https://www.youtube.com/user/AmazonWebServices/Cloud?x=us-en_reinvent_1878_35'>Video</a> & <a href='http://awsmedia.s3.amazonaws.com/SEC402.pdf'>Presentation</a>)<br />
<br />
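The compare-and-email step is just a diff of two canonical outputs, which the standard library handles directly. The security-group lines below are made up; the real input would be the canonical dump produced by the audit script.

```python
import difflib

# Two runs of a canonical-output audit script (contents are made up;
# the real script would dump security groups, IAM policies, etc.).
previous = """sg-1234: tcp 443 from 0.0.0.0/0
sg-1234: tcp 22 from 10.0.0.0/8"""
current = """sg-1234: tcp 443 from 0.0.0.0/0
sg-1234: tcp 22 from 0.0.0.0/0"""

diff = difflib.unified_diff(previous.splitlines(), current.splitlines(),
                            "previous-run", "current-run", lineterm="")
# Keep only the changed lines for the notification email.
changes = [line for line in diff
           if line.startswith(("+", "-"))
           and not line.startswith(("+++", "---"))]
print("\n".join(changes))
```

Here the diff immediately surfaces that SSH was opened to the world, which is exactly the kind of change you want emailed to you.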
<strong>Use CloudTrail to keep an audit log.</strong><br />
CloudTrail will log any action performed via the APIs or web console into an S3 bucket. Set up the bucket with versioning to be sure no one can modify your logs, and you then have a complete audit trail of all changes in your account. You hope that you will never need to use this, but it's well worth having for when you do.<br />
<br />
<h3 class='subtitle' id="s3"><a href="#s3" class="anchor-link"></a>S3</h3><br />
<strong>Use "-" instead of "." in bucket names for SSL.</strong><br />
If you ever want to use your bucket over SSL, using a "." will cause you to get certificate mismatch errors. This is because S3's wildcard certificate only covers a single level of subdomain, so a bucket name like "my.bucket" won't match "*.s3.amazonaws.com". You can't change bucket names once you've created them, so you'd have to copy everything to a new bucket.<br />
<br />
<blockquote class="big"><p>I've found them to be about as reliable as a large government department...</p></blockquote><br />
<br />
<strong>Avoid filesystem mounts (FUSE, etc).</strong><br />
I've found them to be about as reliable as a large government department when used in critical applications. Use the SDK instead.<br />
<br />
<strong>You don't have to use CloudFront in front of S3 (but it can help).</strong><br />
<strong>Edit:</strong> <em>Based on some <a href='https://news.ycombinator.com/item?id=7172936'>excellent</a> <a href='https://news.ycombinator.com/item?id=7176110'>feedback</a> from Hacker News users, I've made some modifications to this tip</em>.<br />
If all you care about is scalability, you can link people directly to the S3 URL instead of using CloudFront. S3 can scale to any capacity (although <a href='https://news.ycombinator.com/item?id=7173464'>some users</a> have reported that it doesn't scale instantly), so it's great if that's all you care about. Additionally, updates are available quickly in S3, whereas with a CDN you have to wait for the TTL to see a change (although I believe you can set a 0s TTL in CloudFront now, so this point is probably moot).<br />
<br />
If you need speed, or are handling very high bandwidth (10TB+), then you might want to use a CDN like CloudFront in front of S3. CloudFront can dramatically <a href='http://www.quora.com/What-are-typical-latencies-for-static-content-in-S3-vs-Cloudfront'>speed up access</a> for users around the globe, as it copies your content to edge locations. It can also work out slightly cheaper at very high bandwidth with lower request numbers: above 10TB, CloudFront bandwidth is about $0.010/GB cheaper than S3 bandwidth, though the cost per request is slightly higher than accessing the files from S3 directly. Depending on your usage pattern, the savings on bandwidth can outweigh the extra cost per request. And since content is then only fetched from S3 infrequently, your S3 cost would be much smaller than if you were serving content directly from S3. The <a href='http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/MigrateS3ToCloudFront.html'>AWS documentation on CloudFront</a> explains how you can use it with S3.<br />
<br />
<strong>Use random strings at the start of your keys.</strong><br />
This seems like a strange idea, but one of the implementation details of S3 is that Amazon use the object key to determine where a file is physically placed in S3. So files with the same prefix might end up on the same hard disk for example. By randomising your key prefixes, you end up with a better distribution of your object files. (Source: <a href='http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tips-tricks-seattle-hiring-event.html'>S3 Performance Tips & Tricks</a>)<br />
<br />
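A simple way to randomise prefixes without losing the ability to find objects again is to derive the prefix from the key itself, for example with a short hash (the date-based key below is just an illustration):

```python
import hashlib

def randomized_key(original_key):
    """Prepend a few hex characters of the key's own MD5 hash, so keys that
    would otherwise share a prefix (dates, sequence numbers) are spread
    across S3's keyspace. Because the prefix is derived from the key, it's
    reproducible when you need to read the object back."""
    prefix = hashlib.md5(original_key.encode()).hexdigest()[:4]
    return "{0}/{1}".format(prefix, original_key)

print(randomized_key("2014/02/03/photo-0001.jpg"))
print(randomized_key("2014/02/03/photo-0002.jpg"))
```

Sequentially-named uploads then land under different four-character prefixes instead of all clustering under "2014/02/03/".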
<h3 class='subtitle' id="ec2-vpc"><a href="#ec2-vpc" class="anchor-link"></a>EC2/VPC</h3><br />
<strong>Use tags!</strong><br />
Pretty much everything can be given tags; use them! They're great for organising things, and make it easier to search and group resources. You can also use them to trigger certain behaviour on your instances; for example, a tag of env=debug could put your application into debug mode when it deploys.<br />
<br />
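Driving behaviour from tags can be as simple as mapping the tag dictionary to settings. This is a hypothetical sketch: the tag names, defaults, and the idea that a deploy script fetches the tags once via the EC2 API and caches them are all assumptions for illustration.

```python
def settings_from_tags(tags):
    """Hypothetical example: derive application settings from EC2 instance
    tags (which a deploy script might fetch once via the API and cache)."""
    return {
        "debug": tags.get("env") == "debug",   # env=debug enables debug mode
        "role": tags.get("role", "web"),        # made-up tag with a default
    }

print(settings_from_tags({"env": "debug", "role": "worker"}))
```

The same pattern extends to anything else you'd rather configure per-instance than bake into an AMI.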
<blockquote class="big"><p>I've had it happen, it sucks, learn from my mistake!</p></blockquote><br />
<br />
<strong>Use termination protection for non-auto-scaling instances. Thank me later.</strong><br />
If you have any instances which are one-off things that aren't under auto-scaling, then you should probably enable termination protection, to stop anyone from accidentally deleting the instance. I've had it happen, it sucks, learn from my mistake!<br />
<br />
<strong>Use a VPC.</strong><br />
VPC either wasn't around, or I didn't notice it, when I got started with AWS. It seems like a pain at first, but once you get stuck in and play with it, it's surprisingly easy to set up and get going. It provides all sorts of extra features over EC2 that are well worth the extra time it takes to set up a VPC. First, you can control traffic at the network level using ACLs. You can modify instance size, security groups, etc. without needing to terminate an instance. You can specify egress firewall rules (you cannot control outbound traffic from normal EC2). But the biggest thing is that you have your own private subnet where your instances are completely cut off from everyone else, so it adds an extra layer of protection. Don't wait like I did; use VPC straight away to make things easy on yourself.<br />
<br />
If you're interested in the internals of VPC, I highly recommend watching <a href='http://www.youtube.com/watch?v=Zd5hsL-JNY4'>A Day in the Life of Billion Packets</a> (<a href='http://www.slideshare.net/AmazonWebServices/a-day-in-the-life-of-a-billion-packets-cpn401-aws-reinvent-2013'>Slides</a>).<br />
<br />
<strong>Use reserved instances to save big $$$.</strong><br />
Reserving an instance just means putting some money upfront in order to get a lower hourly rate. It ends up a lot cheaper than the equivalent on-demand instance. So if you know you're going to be keeping an instance around for 1 or 3 years, it's well worth reserving it. Reserved instances are a purely logical concept in AWS; you don't assign a specific instance to be reserved, but rather just specify the type and size, and any instances that match the criteria will get the lower price.<br />
<br />
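The upfront-versus-hourly trade-off boils down to a break-even point in hours of uptime. All the prices in this sketch are made up; check the current AWS price list for real numbers.

```python
def break_even_hours(upfront_usd, reserved_hourly, on_demand_hourly):
    """Hours of uptime after which a reserved instance beats on-demand:
        upfront + r*h < d*h   =>   h > upfront / (d - r)
    All prices are illustrative, not real AWS rates."""
    return upfront_usd / (on_demand_hourly - reserved_hourly)

# e.g. $300 upfront, $0.05/hr reserved vs $0.10/hr on-demand:
hours = break_even_hours(300, 0.05, 0.10)
print(hours)  # roughly 6000 hours, i.e. well under a year of uptime
```

If the instance will run continuously for anything close to the reservation term, it clears the break-even point easily.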
<strong>Lock down your security groups.</strong><br />
Don't use 0.0.0.0/0 if you can help it, make sure to use specific rules to restrict access to your instances. For example, if your instances are behind an ELB, you should set your security groups to only allow traffic from the ELBs, rather than from 0.0.0.0/0. You can do that by entering "amazon-elb/amazon-elb-sg" as the CIDR (it should auto-complete for you). If you need to allow some of your other instances access to certain ports, don't use their IP, but specify their security group identifier instead (just start typing "sg-" and it should auto-complete for you).<br />
<br />
<strong>Don't keep unassociated Elastic IPs.</strong><br />
You get charged for any Elastic IPs you have created but not associated with an instance, so make sure you don't keep them around once you're done with them.<br />
<br />
<h3 class='subtitle' id="elb"><a href="#elb" class="anchor-link"></a>ELB</h3><br />
<strong>Terminate SSL on the load balancer.</strong><br />
You'll need to add your SSL certificate information to the ELB, but this takes the overhead of SSL termination away from your servers, which can speed things up. Additionally, with your certificate on the ELB, you can pass through the HTTPS traffic and the load balancer will add some extra headers to your request (x-forwarded-for, etc.), which are useful if you want to know who the end user is. If you just forward TCP, those headers aren't added and you lose that information.<br />
<br />
<strong>Pre-warm your ELBs if you're expecting heavy traffic.</strong><br />
It takes time for your ELB to scale up capacity. If you know you're going to have a large traffic spike (selling tickets, big event, etc.), you need to "warm up" your ELB in advance. You can inject a load of traffic yourself, which will cause ELB to scale up so it doesn't choke when the real traffic arrives; however, AWS suggest you contact them instead to pre-warm your load balancer. (Source: <a href='http://aws.amazon.com/articles/1636185810492479#pre-warming'>Best Practices in Evaluating Elastic Load Balancing</a>). Alternatively, you can install your own load balancer software on an EC2 instance and use that instead (HAProxy, etc.).<br />
<br />
<h3 class='subtitle' id="elasticache"><a href="#elasticache" class="anchor-link"></a>ElastiCache</h3><br />
<strong>Use the configuration endpoints, instead of individual node endpoints.</strong><br />
Normally you would have to make your application aware of every Memcached node available. If you want to dynamically scale up your capacity, then this becomes an issue as you will need to have some way to make your application aware of the changes. An easier way is to use the configuration endpoint, which means using an AWS version of a Memcached library that abstracts away the auto-discovery of new nodes. The <a href='http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/AutoDiscovery.html'>AWS guide to cache node auto-discovery</a> has more information.<br />
<br />
<h3 class='subtitle' id="rds"><a href="#rds" class="anchor-link"></a>RDS</h3><br />
<strong>Set up event subscriptions for failover.</strong><br />
If you're using a Multi-AZ setup, this is one of those things you might not think about which ends up being incredibly useful when you do need it.<br />
<br />
<h3 class='subtitle' id="cloudwatch"><a href="#cloudwatch" class="anchor-link"></a>CloudWatch</h3><br />
<strong>Use the CLI tools.</strong><br />
It can become extremely tedious to create alarms using the web console, especially if you're setting up a lot of similar alarms, as there's no ability to "clone" an existing alarm while making a minor change elsewhere. Scripting this using the CLI tools can save you lots of time.<br />
<br />
<strong>Use the free metrics.</strong><br />
CloudWatch monitors all sorts of things for free (bandwidth, CPU usage, etc.), and you get up to 2 weeks of historical data. This saves you having to use your own tools to monitor your systems. If you need longer than 2 weeks, unfortunately you'll need to use a third-party or custom-built monitoring solution.<br />
<br />
<strong>Use custom metrics.</strong><br />
If you want to monitor things not covered by the free metrics, you can send your own metric information to CloudWatch and make use of the alarms and graphing features. This can be used not only for things like tracking disk-space usage, but also for custom application metrics. The AWS page on <a href='http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/publishingMetrics.html'>publishing custom metrics</a> has more information.<br />
<br />
<strong>Use detailed monitoring.</strong><br />
It's ~$3.50 per instance/month, and well worth the extra cost for the extra detail. 1 minute granularity is much better than 5 minute. You can have cases where a problem is hidden in the 5 minute breakdown, but shows itself quite clearly in the 1 minute graphs. This may not be useful for everyone, but it's made investigating some issues much easier for me.<br />
<br />
<h3 class='subtitle' id="auto-scaling"><a href="#auto-scaling" class="anchor-link"></a>Auto-Scaling</h3><br />
<strong>Scale down on INSUFFICIENT_DATA as well as ALARM.</strong><br />
For your scale-down action, make sure to trigger a scale-down event when there's no metric data, as well as when your trigger goes off. For example, if you have an app which usually has very low traffic, but experiences occasional spikes, you want to be sure that it scales down once the spike is over and the traffic stops. If there's no traffic, you'll get INSUFFICIENT_DATA instead of ALARM for your low-traffic threshold, and it won't trigger a scale-down action.<br />
<br />
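The policy above reduces to treating two alarm states the same way. This tiny sketch (the function name is mine, the state names are CloudWatch's) shows the decision:

```python
def should_scale_down(alarm_state):
    """Treat INSUFFICIENT_DATA the same as ALARM for the scale-down policy:
    if the low-traffic alarm has no data, there is no traffic at all, so the
    group should shrink rather than stay at its post-spike size."""
    return alarm_state in ("ALARM", "INSUFFICIENT_DATA")

for state in ("OK", "ALARM", "INSUFFICIENT_DATA"):
    print(state, should_scale_down(state))
```

In practice this means attaching the scale-down policy to both the ALARM and INSUFFICIENT_DATA transitions of the same CloudWatch alarm.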
<strong>Use ELB health check instead of EC2 health checks.</strong><br />
When creating your scaling group, you can specify whether to use the standard EC2 checks (is the instance connected to the network?), or to use your ELB health check. The ELB health check offers way more flexibility. If your health check fails and the instance gets taken out of the load-balancing pool, you're pretty much always going to want that instance killed by auto-scaling and a fresh one to take its place. If you don't set up your scaling group to use the ELB checks, then that won't necessarily happen. The <a href='http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/as-add-elb-healthcheck.html'>AWS documentation on adding the health check</a> has all the information you need to set this up.<br />
<br />
<strong>Only use the availability zones (AZs) your ELB is configured for.</strong><br />
If you add your scaling group to multiple AZs, make sure your ELB is configured to use all of those AZs, otherwise your capacity will scale up and the load balancer won't be able to see the new instances.<br />
<br />
<strong>Don't use multiple scaling triggers on the same group.</strong><br />
If you have multiple CloudWatch alarms which trigger scaling actions for the same auto-scaling group, it might not work as you initially expect it to. For example, let's say you add a trigger to scale up when CPU usage gets too high, or when the inbound network traffic gets high, and your scale down actions are the opposite. You might get an increase in CPU usage, but your inbound network is fine. So the high CPU trigger causes a scale-up action, but the low inbound traffic alarm immediately triggers a scale-down action. Depending on how you've set your cooldown period, this can cause quite a problem as they'll just fight against each other. If you want multiple triggers, you can use multiple auto-scaling groups.<br />
<br />
<h3 class='subtitle' id="iam"><a href="#iam" class="anchor-link"></a>IAM</h3><br />
<strong>Use IAM roles.</strong><br />
Don't create IAM users for applications; always use IAM roles if you can. They simplify everything and keep things secure. Having application users just creates a point of failure (what if someone accidentally deletes the API key?), and it becomes a pain to manage.<br />
<br />
<strong>Users can have multiple API keys.</strong><br />
This can be useful if someone is working on multiple projects, or if you want a one-time key just to test something out, without wanting to worry about accidentally revealing your normal key.<br />
<br />
<strong>IAM users can have multi-factor authentication, use it!</strong><br />
<a href='http://aws.amazon.com/iam/details/mfa/'>Enable MFA</a> for your IAM users to add an extra layer of security. Your master account should most definitely have this, but it's also worth enabling it for normal IAM users too.<br />
<br />
<h3 class='subtitle' id="route53"><a href="#route53" class="anchor-link"></a>Route53</h3><br />
<strong>Use ALIAS records.</strong><br />
An ALIAS record will link your record set to a particular AWS resource directly (i.e. you can map a domain to an S3 bucket), but the key is that you don't get charged for any ALIAS lookups. So whereas a CNAME entry would cost you money, an ALIAS record won't. Also, unlike a CNAME, you can use an ALIAS on your zone apex. You can read more about this on <a href='http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/CreatingAliasRRSets.html'>the AWS page for creating alias resource record sets</a>.<br />
<br />
<h3 class='subtitle' id="elastic-mapreduce"><a href="#elastic-mapreduce" class="anchor-link"></a>Elastic MapReduce</h3><br />
<strong>Specify a directory on S3 for Hive results.</strong><br />
If you use Hive to output results to S3, you must specify a directory in the bucket, not the root of the bucket, otherwise you'll get a rather unhelpful NullPointerException with no real explanation as to why.<br />
<br />
<h3 class='subtitle' id="miscellaneous-tips"><a href="#miscellaneous-tips" class="anchor-link"></a>Miscellaneous Tips</h3><br />
<strong>Scale horizontally.</strong><br />
I've found that using lots of smaller machines is generally more reliable than using a smaller number of larger machines. You need to balance this though, as trying to run your application from 100 t1.micro instances probably isn't going to work very well. Breaking your application across lots of smaller instances means you'll be more resilient to the failure of any one machine. If you're just running from two massive compute-cluster machines, and one goes down, things are going to get bad.<br />
<br />
<strong>Your application may require changes to work on AWS.</strong><br />
While a lot of applications can probably just be deployed to an EC2 instance and work well, if you're coming from a physical environment, you may need to re-architect your application in order to accommodate the differences. Don't just think you can copy the files over and be done with it.<br />
<br />
<strong>Always be redundant across availability zones (AZs).</strong><br />
AZs can have outages; it has happened in the past that services within a single AZ have gone down. Spreading your application across multiple AZs is as simple as adding a new AZ to your load balancer and starting an instance in that AZ. You should spread your load over two AZs at the very least! If you can afford it, being redundant across regions can also be well worth it, but this generally has a more complex setup cost and isn't always necessary. You can now copy AMIs between regions, and you can set up your Route53 records to balance traffic between regions, but you can't use a single ELB across regions.<br />
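As a sketch with the AWS CLI (the load balancer name, AMI ID, and key name are placeholders), adding a second AZ to a classic ELB and launching an instance there looks something like:<br />
<pre class='prettyprint'># Register a second AZ with an existing classic load balancer
aws elb enable-availability-zones-for-load-balancer \
    --load-balancer-name my-load-balancer \
    --availability-zones us-east-1c

# Launch an instance in the newly added AZ
aws ec2 run-instances --image-id ami-12345678 \
    --placement AvailabilityZone=us-east-1c \
    --key-name my-key --instance-type m1.small</pre><br />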
<br />
<strong>Be aware of AWS service limits before you deploy.</strong><br />
Various service limits are enforced which aren't highlighted until you're actually trying to deploy your application and get the error notification. These limits can easily be increased by making a request to AWS support; however, that can involve a significant turnaround time (as low as a few minutes, up to a few days, based on past experience), during which you won't be able to finish deploying. A few days before deploying, you should consult the <a href='http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html'>service limits</a> page to see if you think you're going to exceed any of them, and make your support request ahead of time. You will need to make a separate request for each service whose limit you need increased. It's also worth pointing out that some limits are global, while others are per-region.<br />
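Some limits can also be queried from the command line. For example, with the AWS CLI you can check the EC2 on-demand instance limit (a sketch; other services expose their limits through their own describe calls):<br />
<pre class='prettyprint'># Show the maximum number of on-demand instances for this account
aws ec2 describe-account-attributes --attribute-names max-instances</pre><br />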
<br />
<strong>Decide on a naming convention early, and stick to it.</strong><br />
There are a lot of resources on AWS whose names you can change later, but equally a lot whose names you cannot (security group names, etc.). Having a consistent naming convention will help to self-document your infrastructure. Don't forget to make use of tags too.<br />
<br />
<strong>Decide on a key-management strategy from the start.</strong><br />
Are you going to have one key-pair per group of instances, or one key-pair for your entire account? It's easy to modify your authorized_keys file with a bootstrap script of course, but you need to decide whether you want to manage multiple key-pairs or not, as things will get complicated later on if you try to change your mind.<br />
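For illustration, a bootstrap (user-data) script that adds a second, group-specific key might look like this minimal sketch (the key material, user, and path are hypothetical):<br />
<pre class='prettyprint'>#!/bin/bash
# Append a hypothetical group-specific public key to the default user's
# authorized_keys, so these instances accept a second key-pair.
echo 'ssh-rsa AAAAB3... ops-group-key' &gt;&gt; /home/ec2-user/.ssh/authorized_keys</pre><br />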
<br />
<strong>Make sure AWS is right for your workload.</strong><br />
User <a href='https://news.ycombinator.com/user?id=mbreese'>mbreese</a> on Hacker News makes the very good point that you should make sure that using AWS is correct for your particular workload. If you have a steady load and 24/7 servers, it's possible there are cheaper providers you can use, or it might even be cheaper to use dedicated hardware of your own. One of the big benefits of AWS is the ability to scale up and down rapidly in response to load, but not everyone needs that feature. As when purchasing anything, you should shop around a bit first to make sure you're getting the best deal for what you need.<br />
<br />
<h3 class='subtitle' id="parting-words"><a href="#parting-words" class="anchor-link"></a>Parting Words</h3><br />
So there you have it, I hope these tips will be of use to someone out there. There's such a wealth of information available about AWS, whether it's great posts, presentations, or videos; I've added a few additional reading links below for some of the resources I've found most useful. If you have any other tips, or want to suggest improvements to this post (or point out errors), feel free to let me know on Twitter, I'm <a href='https://twitter.com/r_adams'>@r_adams</a>.<br />
<br />
(<em>If you liked this post, then you might also like the talk I gave at USENIX SRECon14, <a href='https://static.wblinks.com/talks/srecon14/sre_in_the_cloud.pdf'>SRE in the Cloud</a></em>).]]></description>
            <author>xml-feeds@wblinks.com (Rich Adams)</author>
        </item>
            <item>
            <title><![CDATA[Protecting Yourself Against Insecure Websites]]></title>
            <link>http://wblinks.com/notes/protecting-yourself-against-insecure-websites</link>
            <guid>http://wblinks.com/notes/protecting-yourself-against-insecure-websites</guid>
            <pubDate>Mon, 08 Apr 2013 00:00:00 +0000</pubDate>
            <description><![CDATA[There's a big problem with the internet right now: there are a large number of websites storing passwords insecurely. It seems there are lots of bad developers out there who don't know how to <a href='https://wblinks.com/notes/storing-passwords-the-wrong-better-and-even-better-way'>store passwords properly</a>. That's fine, it's not really their fault, everyone had to learn at some point, and some people just haven't come across the information yet. Perhaps "bad" isn't the right word; "inexperienced" would be a better description. The life of a programmer is one of constant learning after all (if you're a decent programmer, the worst code you've ever seen is likely to be code you wrote a few years earlier).<br/><br/>But nowadays, there isn't room for inexperience when it comes to basic security. Many websites out there are storing passwords incorrectly, plain and simple. Big sites are not immune to such issues; there are the recent stories of both <a href='http://www.bbc.co.uk/news/technology-18338956'>LinkedIn</a> and <a href='http://www.bbc.co.uk/news/technology-18358485'>Last.fm</a>, neither of which was storing its passwords properly, putting their users at risk. But do users even know they're at risk?<br/><br/>In the UK last summer, there were a <a href='https://twitter.com/UKTesco/status/229542141012107265'>lot of tweets</a> going around about how Tesco, a supermarket chain, were not storing passwords properly. Does the average online grocery shopper really understand why it's bad that their password was emailed to them in plaintext? I highly doubt it.<br/><br/>As people who do understand why it's bad, what can we do? Sure, we could stop using such services in protest, but that's not always feasible. Having to use some websites is just a necessary evil, whether it's because we don't have a choice or simply out of convenience. 
But there is something you can do to protect yourself against sites storing your passwords incorrectly. <br/><br/><h3 class='subtitle' id="what's-the-impact-to-you-if-your-password-is-leaked?"><a href="#what's-the-impact-to-you-if-your-password-is-leaked?" class="anchor-link"></a>What's the impact to you if your password is leaked?</h3><br/>Let's say your password to a service is stolen, either through a site incorrectly storing passwords, or for some other reason. What's the impact to you?<br/><br/><ol><li>Someone will have access to that service's account, including <strong>all information you've stored with that service</strong> (email, address, security questions, etc).</li><li>If you use the same password in other places, those accounts are all compromised and you've given them access to everything stored with those services too.</li></ol><br/><br/>Basically, there's now an access point to other parts of your online presence, all of which can snowball into even more access and information for an attacker. Thankfully, there are some simple things you can do to mitigate such a scenario.<br/><br/><h3 class='subtitle' id="protecting-against-sites-that-don't-securely-store-your-password."><a href="#protecting-against-sites-that-don't-securely-store-your-password." class="anchor-link"></a>Protecting against sites that don't securely store your password.</h3><br/>It all comes down to personal account and password management. How do you manage your account information and passwords to mitigate the impact should you have an account with a company that couldn't care less about storing passwords properly? Here are some simple rules to follow,<br/><br/><ol><li>Never use the same password on more than one site. Ever.</li><li>Always use the most complex password you can that fits with the site's password requirements.</li><li>Never put information on your account that the company doesn't absolutely need. 
(Use fake information if it's "required" but they don't really need it. An online store doesn't need to know my date of birth, mother's maiden name, etc.)</li><li>Regularly purge expired/irrelevant information from your accounts.</li><li>(Optional) Use a different email address for every service you sign up for.</li></ol><br/><br/>That's it. That's all there is to it. Let's look at these in more detail,<br/><br/><h3 class='subtitle' id="1.-never-use-the-same-password-on-more-than-one-site."><a href="#1.-never-use-the-same-password-on-more-than-one-site." class="anchor-link"></a>1. Never use the same password on more than one site.</h3><br/>"But I can't remember thousands of passwords!!!" you might cry. Well, that's kind of the point, you're not meant to remember them. You need to use a password manager. A password manager stores a list of all your usernames and passwords, protected by just one master password. So you only need to remember the one password.<br/><br/>"But what if someone gets the master password!!!" you might cry. Then someone would still need access to your password archive. The password archive is a strongly encrypted file, opened by the password manager, that you keep on your computer, on a keyring, etc. As long as you don't post it up online for the world to see, you should be good. Basically, an attacker would need two things: something you have (the password archive), and something you know (the master password). In multi-factor authentication these are referred to as a possession factor and a knowledge factor.<br/><br/>There are some online services you can also use, such as <a href='https://lastpass.com/'>LastPass</a>. Other people swear by it; however, I'm a little more paranoid. I don't use a service like LastPass simply because I will never trust a 3rd party with my passwords, even if they claim it's encrypted before being sent to their servers. 
This is just a personal preference; I encourage you to try LastPass out, research its security, and make up your own mind.<br/><br/>I personally use <a href='http://keepass.info'>KeePass</a>. Using KeePass means no one else ever has access to my password file except me. I store it in a Git repository on my home server so I can always jump back to an older version and recover an older password if I need to (or if the file gets corrupted, etc). I'm basically abusing Git as a quick recovery solution. Obviously, I'm also backing up the file to several other places (encrypted of course). The point is, the file is never stored on a system that's not under my control without many extra levels of encryption on it. When it is encrypted, I'm the one doing the encryption, and I know exactly where the file is going and what the encryption keys are.<br/><br/>"What if I lose my password archive?" you might cry. That's what backups are for. Always always always back up important files. There's no excuse not to. If you lose your password archive and didn't have any backups, I have no sympathy for you I'm afraid. (Don't forget that sensitive backups should also be encrypted, and remember to test restoring your backups regularly to make sure that they actually work).<br/><br/><h3 class='subtitle' id="2.-always-use-the-most-complex-password-you-can."><a href="#2.-always-use-the-most-complex-password-you-can." class="anchor-link"></a>2. Always use the most complex password you can.</h3><br/>I've written before about how <a href='http://wblinks.com/notes/password-rules-dont-always-help'>websites restrict your password choice in the name of security</a>. Again, there's nothing we can do as users to stop this, since no one ever listens. So we're just going to have to work within the confines of what we're given.<br/><br/>Now that you're using a solution like KeePass, it doesn't matter what your passwords are, since you don't need to remember them. 
Some sites will have stupid rules, like "Don't use a % sign", but that won't matter any more. Most password managers include a password generator tool. Simply enter the same rules that the website allows, and have the manager generate a password for you.<br/><br/>You can even generate your own passwords on the command line; just make sure that you use a good source of randomness/entropy,<br/><pre class='prettyprint'>tr -cd "[:graph:]" &lt; /dev/urandom | head -c ${1:-100}; echo </pre><br/><br/>But the key is to always use as many different characters, and as long a password, as the site will allow. You will then always have the most secure password (assuming you generate a completely random one from the characters you're allowed to use). While the password might still be weak due to the insane password rules, it will at least be the best you can get.<br/><br/><h3 class='subtitle' id="3.-never-put-information-on-your-account-that-the-site-doesn't-absolutely-need."><a href="#3.-never-put-information-on-your-account-that-the-site-doesn't-absolutely-need." class="anchor-link"></a>3. Never put information on your account that the site doesn't absolutely need.</h3><br/>This is a big one. So many websites ask for information which they quite simply don't need. For example, let's say I want to view a forum post on a gaming website. The site forces me to register an account first, where it wants my address, phone number and date of birth. There's no way in hell they need that information, so there's no way in hell I'm going to give them accurate information.<br/><br/>Lots of websites require you to enter your birthday, and will then ask you to verify that information if you ever need to recover your account. Just think about how many people out there know your date of birth. 
There's no need for these websites to have that information; they're really just asking you to provide a piece of information that you can remember in order to verify your account at some point in the future. So put in random information and store it in KeePass, just like a password. I'll say it again,<br/><br/><strong>Never provide information that a site doesn't absolutely need.</strong><br/><br/>Some information is indeed required; for example, if you're ordering a book online, it's pretty reasonable that you need to provide your real address. But by the same token, it's completely unreasonable for that site to request your date of birth or mother's maiden name.<br/><br/>Security questions are a perfect example. So many accounts get broken into via the security questions route. People don't seem to think twice about providing all sorts of information to websites that really don't need that information. When a website asks "What was your mother's maiden name?", you should be thinking "None of your god damn business!". You're under no obligation to provide real information to websites in their security questions (Note that official government websites may be a different matter, since providing false information on those is usually a crime).<br/><br/>The problem with security questions is that the <strong>answers are password-equivalent</strong>, yet they're usually stored in plaintext, and shown to you in plaintext whenever you go to edit them. If a site implements security questions, it's pretty much always the weakest authentication link. Every single time. So what can you do to help? Simple,<br/><br/><strong>Never answer a security question honestly.</strong><br/><br/>First of all, if you do answer it honestly, there are instantly quite a few people in the world who will know, or who can guess, the answer. 
Second of all, if someone gets access to your account, they now probably have enough information to get into some other account based on your security question answers. A shopping website simply does not need to know the town I grew up in, or the name of my first pet.<br/><br/><strong>Your answers to these questions should be generated using the same process as your actual passwords</strong>. Store the answers in KeePass and you're all good. You might have to stick to non-special characters, since some sites won't allow them in security question answers.<br/><br/><h3 class='subtitle' id="4.-regularly-purge-expired-irrelevant-information-from-your-accounts."><a href="#4.-regularly-purge-expired-irrelevant-information-from-your-accounts." class="anchor-link"></a>4. Regularly purge expired/irrelevant information from your accounts.</h3><br/>What does this mean? Well, if you order something from a company, they'll likely keep your address and credit card on file so that they can fulfill your order. But once you've got your item, the site doesn't need to keep this information. Many sites will keep this information indefinitely. Say your account on a shopping website you purchased something from years ago gets broken into; now someone has your address. They can use this information to get into other accounts, more recent ones that contain more useful information, etc.<br/><br/>Once the order has been completed, the information you provided has "expired" and is no longer relevant to the website. You should regularly remove this information from your online accounts. There are some sites I trust with this info, as they have a proven record of protecting my data, like Amazon for example. But others, not so much. As soon as I get my item, my address and credit card info are deleted. If I need to order from them again, then I'll just enter the information again.
<br/><br/>You may find that companies don't provide an easy way to remove this information. In those cases, a letter or an email to them will usually get the job done.<br/><br/><h3 class='subtitle' id="5.-use-a-different-email-address-for-every-account-you-sign-up-for."><a href="#5.-use-a-different-email-address-for-every-account-you-sign-up-for." class="anchor-link"></a>5. Use a different email address for every account you sign up for.</h3><br/>The protection this gives is questionable, but I find this useful for reasons other than security, so I think it's at least worth mentioning.<br/><br/>You might be wondering how you could possible have so many email addresses, well, since I own my own domain, I can have <em>anything@mydomain</em>, so I simple create a new alias that routes to my real email address every time I sign up for something. For laziness sake, I have a catch-all email which I check very infrequently just in case I forget to set up an alias.<br/><br/>Not everyone has their own domain however, or you might be with a webhost which limits the number of email aliases you can have. If you use Gmail however, you can create special email addresses on-the-fly by using a + character. Basically, take your normal email address before the @, and you can add +&lt;anything&gt; to it and it will all route to your normal email address.<br/><br/>For example, if my gmail address was <em>example@gmail.com</em>, then <em>example+amazon@gmail.com</em> and <em>example+newegg@gmail.com</em>, etc. would all route to my main inbox.<br/><br/>What does this provide in terms of security? 
Well, there are a few benefits to using this method,<br/><br/><ol><li>If a website leaks your email address to spammers, you can very easily add a filter to mark those emails as SPAM.</li><li>If you receive an email claiming to be from a certain website, but which wasn't sent to your special alias, you can be sure it's a phishing attempt.</li><li>If an online account is broken into, and your email grabbed, it will be useless for logging into any other website since you will be using a different email.</li></ol><br/><br/>Point 3 warrants further thought. I claim that if one account is broken into, it won't provide the email/username for another online account. But if you use my method above of <em>example+amazon@gmail.com</em>, etc. and the info is leaked, it's pretty obvious that your LinkedIn username would probably be <em>example+linkedin@gmail.com</em>, etc. So how can you get around this?<br/><br/>Something I've started doing is randomizing my username as well as my password (for websites where my username isn't tied to a profile URL... for example, having a random username for my Twitter account would be a bad idea, whereas a random email isn't much of an issue). I just store my random email (eg. <em>example+2hsyd77234@gmail.com</em>) in my KeePass archive, and I'm all good. If I need to verify that an email actually came from a specific site, I can just cross-check it in KeePass. Although an easier method is to set up a rule to tag or move the mail to a specific folder; then if one doesn't get routed, you know it was a phishing mail.<br/><br/>Some may call this overkill, but I like being able to instantly see if an email came from the right place, and it's interesting to see which companies very obviously sell my email to marketers, since all the SPAM arrives to their unique alias.<br/><br/><h3 class='subtitle' id="how-is-all-of-this-stuff-going-to-actually-help-me?"><a href="#how-is-all-of-this-stuff-going-to-actually-help-me?" 
class="anchor-link"></a>How is all of this stuff going to actually help me?</h3><br/>Let's revisit the initial issue. A site you use gets hacked and your account information and password get out (since the password was stored insecurely, the raw password is leaked), only now you've used the methods talked about above. What's the impact to you?<br/><br/><ol><li>Someone will have access to that service's account, including <strong>the limited</strong> information you've stored with that service.</li></ol><br/><br/>That's it. You don't use the same password anywhere else, and the information stored with your account is either random, or useless by itself. You can rest easy knowing the information the attacker has gained is essentially useless, since there aren't massive amounts of information about you on the site, and the password isn't used for anything else. Also, since you used a unique email address, any SPAM or phishing emails can easily be avoided by just deleting the email alias, or routing it to the SPAM folder, and the attacker won't have a clue what your username/email is for other online accounts to even attempt to brute-force a way in.<br/><br/>Obviously, the information you've stored on the compromised account has been leaked. But the only way to prevent that is for the site in question to have decent security. There's nothing you can do as a mere user to prevent such things.<br/><br/>So now you're as protected as you can be in your online life, even if you're dealing with a site that was built by inexperienced developers.<br/><br/>This is how I do it anyway. If you think I'm doing something wrong, or have a better way to protect yourself, let me know in the comments. After all, there's always something new to learn!<br/><br/><h3 class='subtitle' id="afterthought:-how-did-we-get-into-this-mess?"><a href="#afterthought:-how-did-we-get-into-this-mess?" 
class="anchor-link"></a>Afterthought: How did we get into this mess?</h3><br/>Is it really that every developer out there is bad? Well, no, of course not! Everyone is inexperienced in something. Some developers simply might not realise that they are storing passwords insecurely. If you think you're already doing things the right way, then you're not going to go searching for the right way to do them. Back in my early days of web development, I too was one of the bad developers who stored passwords in plaintext, purely because I didn't know any better.<br/><br/>The blame doesn't always lie with inexperienced developers however; it could well be that management don't want to spend time/money on something they see no benefit from, and all too often a developer's cries go unheard. Unfortunately, that's business; we don't live in some fantasy land, this is the real world. Unless the company's hand is forced due to a data breach that's made public, they're not going to invest time and money in upgrading their security, since they'd rather spend the money on something that will make them more money. After all, most people don't change the locks on their house to the latest edition unless they have a break-in first.<br/><br/>Unfortunately, there's not much to do in cases like this. I started writing emails to sites that were obviously storing passwords incorrectly a few years ago, but nothing much ever came of it. I'd get the occasional response saying "Thanks, we'll look into it." and then nothing would change, or on the other end of the scale I'd get threatened with a lawsuit for "hacking" their system (when of course I'd done nothing of the sort). I tried naming and shaming, but that didn't work either, since unless you get a lot of traction and your complaint goes viral, it's not going to make a difference. Not to mention that even if you do get a lot of traction, most end users don't really know why it's bad that passwords are stored in plaintext, and simply don't care.
<br/><br/>I don't see an easy way out. Developers need to be taught about web security from the start, companies need to take security more seriously, and users need to understand that they're being put at risk and how to protect themselves. I guess in the end, we're all a little bit to blame.<br/>]]></description>
            <author>xml-feeds@wblinks.com (Rich Adams)</author>
        </item>
            <item>
            <title><![CDATA[Fix Graphics in Ubuntu 10.04 Lucid Lynx on a Toshiba Portege]]></title>
            <link>http://wblinks.com/notes/fix-graphics-ubuntu-lucid-toshiba-portege</link>
            <guid>http://wblinks.com/notes/fix-graphics-ubuntu-lucid-toshiba-portege</guid>
            <pubDate>Sun, 17 Jun 2012 00:00:00 +0000</pubDate>
            <description><![CDATA[This is a brief follow-on from my previous <a href='http://wblinks.com/notes/fix-networking-ubuntu-lucid-toshiba-portege'>note on how to fix networking</a> for the same setup.<br/><br/>After recently purchasing the awesome <a href='http://www.humblebundle.com/'>Humble Bundle V</a>, it became apparent that I'd never got the graphics working properly on my laptop with Ubuntu 10.04. For those crazy people out there who, like me, want to run Ubuntu 10.04 on their new laptop, here's how to get the graphics drivers installed and working.<br/><br/><h3 class='subtitle' id="hardware"><a href="#hardware" class="anchor-link"></a>Hardware</h3><br/>Before you follow these steps, you should make sure that your hardware is an <strong>Intel Graphics Controller</strong>. I'm using a Toshiba Port&#233;g&#233; (Z835-ST6N03), so you may need to adjust some steps if you're using different hardware.<br/><br/>To find out your exact hardware, you can use lspci. You may get an unhelpful response if you haven't updated your PCI IDs though,<br/><pre class='prettyprint'>&gt; lspci | grep 'VGA'
00:02.0 VGA compatible controller: Intel Corporation Device 0116 (rev 09)</pre><br/><br/>Updating your IDs and running the command again will give you something more descriptive.<br/><pre class='prettyprint'>&gt; sudo update-pciids
Downloaded daily snapshot dated ...
&gt; lspci | grep 'VGA'
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)</pre><br/><br/><h3 class='subtitle' id="new-graphics-drivers"><a href="#new-graphics-drivers" class="anchor-link"></a>New Graphics Drivers</h3><br/>The source of my information comes straight from <a href='http://www.glasen-hardt.de/?page_id=712'>Stefan Glasenhardt's Intel Graphics Driver blog post</a> (German), and the packages to install come from <a href='https://launchpad.net/~glasen/+archive/intel-driver'>his PPA</a>.
<br/><br/>Run the following to install the drivers,<br/><pre class='prettyprint'>sudo add-apt-repository ppa:glasen/intel-driversudo apt-get updatesudo apt-get upgrade</pre><br/><br/>Then create /etc/X11/xorg.conf and add the following,<br/><pre class='prettyprint'>Section "Device"     Identifier "card0"     Driver "intel" EndSection</pre><br/><br/>Finally, if you aren't already running the latest kernel from backports, now is the time to do that,<br/><pre class='prettyprint'>sudo apt-get install linux-image-generic-lts-backport-natty linux-headers-generic-lts-backport-natty</pre><br/><br/>Once you reboot, your graphics should all be working nicely. This also means compiz will now work if you feel like having fancy desktop effects.<br/>]]></description>
            <author>xml-feeds@wblinks.com (Rich Adams)</author>
        </item>
            <item>
            <title><![CDATA[Fix Networking in Ubuntu 10.04 Lucid Lynx on a Toshiba Portege]]></title>
            <link>http://wblinks.com/notes/fix-networking-ubuntu-lucid-toshiba-portege</link>
            <guid>http://wblinks.com/notes/fix-networking-ubuntu-lucid-toshiba-portege</guid>
            <pubDate>Tue, 06 Mar 2012 00:00:00 +0000</pubDate>
            <description><![CDATA[It's been a while since I've written anything here, so I figured I'd start getting into the habit again with a quick note.<br/><br/>I recently got a new Toshiba Port&#233;g&#233; laptop (Z835-ST6N03), onto which I immediately loaded Ubuntu 10.04. Unfortunately, 10.04 is getting a bit old, yet the hardware in the laptop is quite recent, so neither ethernet nor wireless networking worked.<br/><br/>Now, I could have just put on 11.10, or even the 12.04 beta, and it would work straight away (I know, because I tried). If you're happy to use later versions, then that's going to be the best way. Unfortunately, I cannot stand the direction Ubuntu has gone in. Despite being able to ditch Unity for Gnome, and to run Gnome in fallback mode so it's similar to Gnome 2, I still couldn't use the system the way I wanted to. Other applications had been "simplified" to such an extent that it was just excruciating to use. So I resigned myself to using my favourite version and trying to get networking to work the hard way.<br/><br/>Since someone else may be just as crazy as me and want to get Ubuntu 10.04 up and running on their Port&#233;g&#233;, I thought I'd document how I got it working.<br/><br/><h3 class='subtitle' id="hardware"><a href="#hardware" class="anchor-link"></a>Hardware</h3><br/>The ethernet adapter is an <strong>Intel 82579V Gigabit Ethernet</strong>, and the wireless is an <strong>Intel Centrino Advanced-N 6230</strong>. It's worth making sure that you have this exact hardware if you're going to follow the steps below. If you have similar hardware, then you should be able to adjust the steps accordingly.<br/><br/>To find out your exact hardware, you can use lspci. 
You may get an unhelpful response if you haven't updated your PCI IDs though,<br/><pre class='prettyprint'>&gt; lspci | grep 'Ethernet\|Network'
00:19.0 Ethernet controller: Intel Corporation Device 1503 (rev 04)
02:00.0 Network controller: Intel Corporation Device 0091 (rev 34)</pre><br/><br/>Updating your IDs and running the command again will give you something more descriptive.<br/><pre class='prettyprint'>&gt; sudo update-pciids
Downloaded daily snapshot dated ...
&gt; lspci | grep 'Ethernet\|Network'
00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 04)
02:00.0 Network controller: Intel Corporation Centrino Advanced-N 6230 (rev 34)</pre><br/><br/><h3 class='subtitle' id="fixing-eth0"><a href="#fixing-eth0" class="anchor-link"></a>Fixing eth0</h3><br/>First things first, since there's no networking, you'll need access to another computer in order to <a href='http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=15817&ProdId=3299&lang=eng&OSVersion=Linux*&DownloadType=Drivers'>download the ethernet drivers from Intel</a> (or you can <a href='http://sourceforge.net/projects/e1000/files/e1000e%20stable/'>download them from SourceForge</a>). At the time of writing, the latest version was 1.9.5.<br/><br/>Here's how to get your ethernet working once you have the file downloaded and moved to your laptop,<br/><pre class='prettyprint'>tar -xvf e1000e-1.9.5.tar.gz
cd e1000e-1.9.5
sudo make install
sudo modprobe -r e1000e; sudo modprobe e1000e</pre><br/><br/>You can confirm it's loaded by running lsmod,<br/><pre class='prettyprint'>&gt; lsmod | grep e1000e
e1000e                158424  0</pre><br/><br/>Ethernet should start working straight away, so now you'll be able to at least connect to the internet.
<br/><br/><h3 class='subtitle' id="fixing-wlan0"><a href="#fixing-wlan0" class="anchor-link"></a>Fixing wlan0</h3><br/>The wireless was a little more involved (I was hoping it would be as easy at the ethernet). As detailed on the <a href='http://www.intel.com/support/wireless/sb/cs-006408.htm'>Intel Wireless Networking page</a>, you can download the wireless drivers from <a href='http://www.intellinuxwireless.org/'>intellinuxwireless.org</a>.<br/><br/>Originally I downloaded the <a href='http://www.intellinuxwireless.org/?n=Downloads'>microcode image</a> (<em>iwlwifi-6000g2b-ucode-18.168.6.1.tgz</em>) and ran the following,<br/><pre class='prettyprint'>tar -vxf iwlwifi-6000g2b-ucode-18.168.6.1.tgz cd iwlwifi-6000g2b-ucode-18.168.6.1/sudo cp iwlwifi-6000g2b-6.ucode /lib/firmware/sudo modprobe -r iwlagnsudo modprobe iwlagn</pre><br/><br/>However, I later discovered you can do this more easily by just installing the <strong>linux-backports-modules-wireless-lucid-generic</strong> package. Ah well, live and learn.<br/><pre class='prettyprint'>sudo apt-get install linux-backports-modules-wireless-lucid-generic</pre><br/><br/>At this point, lsmod will tell you that everything is loaded (iwlagn, iwlcore, mac80211 and cfg80211),<br/><pre class='prettyprint'>&gt; lsmod | grep iwliwlagn                272480  0 iwlcore               167474  1 iwlagnmac80211              298255  2 iwlagn,iwlcorecfg80211              182202  3 iwlagn,iwlcore,mac80211</pre><br/><br/>Unfortunately, despite what lsmod says, I still had no wireless connectivity at this point. After a long and arduous search, I discovered this was because I was running the 2.6.32-38 kernel, and the wireless driver will only work properly for later kernels. 
So there's one last set of packages to install in order to update the kernel,<br/><pre class='prettyprint'>sudo apt-get install linux-image-generic-lts-backport-natty linux-headers-generic-lts-backport-natty</pre><br/><br/>This will install 2.6.38.13.23 (or later), and after a reboot the wireless should start working straight away.<br/><br/>The later kernel also has the bonus of supporting two-finger scrolling and fixing a minor display issue with the encryption password prompt on the boot screen.<br/><br/>So there you have it: if you want to use Ubuntu 10.04 Lucid Lynx on your Toshiba Port&#233;g&#233; laptop and would like networking to work, that's how I did it. Hopefully this will save someone else an hour or so of Googling.<br/>]]></description>
            <author>xml-feeds@wblinks.com (Rich Adams)</author>
        </item>
            <item>
            <title><![CDATA[Secure Session Management Tips]]></title>
            <link>http://wblinks.com/notes/secure-session-management-tips</link>
            <guid>http://wblinks.com/notes/secure-session-management-tips</guid>
            <pubDate>Sun, 06 Feb 2011 00:00:00 +0000</pubDate>
            <description><![CDATA[Most (if not all) modern websites use sessions to control the experience for individual users, and to maintain state between requests (since HTTP is a stateless protocol after all). Sessions are fantastic and incredibly useful, but if managed incorrectly they can expose your website to security vulnerabilities and potentially allow a malicious attacker to gain unauthorised access to user accounts.<br/><br/>Of course, the biggest tip is that you should really just use a pre-built framework with tried-and-tested session management code that security experts have reviewed, and whose bugs have already been found and fixed. But I never listen to myself...I've been building a new site in my spare time recently and got to the point of writing the session management code. This seemed like a good subject to try to get myself back into the habit of updating my notes more regularly. <br/><br/>So while certainly not an exhaustive list, here are 11 of my tips on managing sessions and avoiding some common security vulnerabilities (yes, this post <a href='http://www.youtube.com/watch?v=XuzpsO4ErOQ'>goes all the way to 11</a>). I'm using PHP in the code examples, but the principles apply to any other language. In fact, PHP does a very good job of automatically protecting against most of the attacks the tips discuss, but this isn't necessarily the case for other languages, so the principles are still important to understand. <br/><br/><h3 class='subtitle' id="1.-always-regenerate-a-session-id-(sid)-when-elevating-privileges-or-changing-between-http-and-https."><a href="#1.-always-regenerate-a-session-id-(sid)-when-elevating-privileges-or-changing-between-http-and-https." class="anchor-link"></a>1. 
Always regenerate a session ID (SID) when elevating privileges or changing between HTTP and HTTPS.</h3><br/>By elevating privileges, I don't mean modifying a user's record to give them more permissions; I mean any time an action is performed which gives the current user of the website more privileges than they had moments before. Logging in is the obvious example: the user now has more privileges than they did a moment ago, yet is still the same user of the website (and still on the same session). In this case you should immediately generate a new session ID for them and destroy their previous session.<br/><br/>Regenerating a SID is extremely important to protect against session fixation. To understand what session fixation is, consider the following example of horizontal privilege escalation, where User M (malicious) is trying to access the account of User V (victim) on a website.<br/><blockquote><p><strong>User M</strong>: Visits the website, and has a session ID assigned to them. They either look at the GET parameters (?sid=xxxxx) or at the headers (Set-Cookie: sid=xxxxx) to determine their session ID. Once they have the ID, they craft either a direct link using the GET parameters (http://example.com/?sid=xxxxx) or a non-direct link which will add the relevant headers. This link is then sent to User V.<br/><br/><strong>User V</strong>: Gets an email from User M which says "Hey, check out your new look banking account! http://example.com/?sid=xxxxx" Since the link looks legitimate (example.com is the real URL), the user clicks it, confident it's not a scam, and logs in with their account.<br/><br/><strong>User M</strong>: Since they already had the same session or already know the SID, they will now be logged in to the site as if they were User V. If you regenerate the SID, this wouldn't be possible, since User M would no longer know the correct SID.
<br/></p></blockquote><br/>There are other ways session fixation can be performed, such as setting a cross-site cookie (Site A sets a cookie for Site B with the session ID, etc), or if you allow a sid to be set from a GET/POST variable then the malicious user can just pick one they want and don't even have to visit in the first place.<br/><br/>Remember it's not just for login, any privilege escalation should get a new session ID. You <em>could</em> just do it on every request since the user won't care that the session ID changes and it makes any session ID immediately invalid after the next request, but that's overkill and doesn't offer any real security benefit. Regenerating the session ID will make the attack useless, since by the time User V has done anything, the session ID they were sent was invalid, so User M cannot access their account.<br/><br/>You should also regenerate the SID when switching between HTTP and HTTPS, since you want to be sure not to use an insecure value over a new secure connection and vice versa.<br/><br/>In PHP it's easy to regenerate a session id using <a href='http://php.net/manual/en/function.session-regenerate-id.php'><em>session_regenerate_id()</em></a>, just remember to take extra care to always use true as the optional parameter so that the old session file gets deleted.<br/><pre class='prettyprint'>session_regenerate_id(true);</pre><br/><br/><h3 class='subtitle' id="2.-check-for-suspicious-activity-and-immediately-destroy-any-suspect-session."><a href="#2.-check-for-suspicious-activity-and-immediately-destroy-any-suspect-session." class="anchor-link"></a>2. Check for suspicious activity and immediately destroy any suspect session.</h3><br/>You want to be sure the user who started the session is the same as the user who is actively using the session. 
<strong>There is no 100% accurate way to do this</strong>, the best you can do is to look for suspicious signs and get the user to re-authenticate if you suspect at any point they're not the same user.<br/><br/>This is to help prevent session hijacking. Suppose a user visits your site and comes from IP address 999.999.999.999 and is using the Chrome browser. Then all of a sudden on the next request they visit from 888.888.888.888 and are using the Internet Explorer browser. This would be pretty unlikely to happen if it were the genuine user, and so can be considered suspicious. You should actively monitor for these types of events and immediately destroy the user session and/or get them to re-authenticate.<br/><br/>So it's simple then, just check if the IP changes and we can be safe knowing we're protected? <strong>WRONG</strong>.  Many people can be using the same IP address and this alone doesn't prevent a malicious user from hijacking a session.<br/><br/>Ok, so we just need to check the User Agent and if it changes then we're protected? <strong>WRONG</strong>. Changing the user agent header is trivial and should not be relied upon to protect against this attack.<br/><br/>So what's the answer? There isn't one (that I'm aware of). There's no foolproof way to positively determine that the user is the same user who started the session. We can only be suspicious that a particular user <strong>isn't</strong> the original user. That's what this tip is about, not knowing the positive, but potentially knowing a negative and acting on it pre-emptively.<br/><br/>So if the user agent changes and the IP changes, you should implement some sort of policy to re-generate the session and get the user to re-authenticate. Be careful about just implementing one of these though, as it can have some bad side effects. Users behind a proxy will find their IP changes on every request, and the user agent string is easy to change manually. 
It depends on how cautious you want to be and what the site is used for. There's no right answer here.<br/><br/>Some guides and tutorials will suggest also checking the referer to make sure requests came from your own site. This seems completely useless to me, though, since it can be easily spoofed by an attacker, and in some cases might not be sent at all. Checking the referer will give you more false alarms than is necessary and will just degrade the experience for your users.<br/><pre class='prettyprint'>if ($_SESSION['_USER_IP'] != $_SERVER['REMOTE_ADDR']
    || $_SESSION['_USER_AGENT'] != $_SERVER['HTTP_USER_AGENT'])
{
    session_unset(); // Same as $_SESSION = array();
    session_destroy();
    session_start();
    session_regenerate_id(true);
    Log::create("Possible session hijacking attempt.", Log::NOTIFY_ADMIN);
    Auth::getCurrentUser()-&gt;reAuthenticate(Auth::SESSION_SUSPICIOUS);
}
$_SESSION['_USER_IP']    = $_SERVER['REMOTE_ADDR'];
$_SESSION['_USER_AGENT'] = $_SERVER['HTTP_USER_AGENT'];</pre><br/><br/>A loose IP check is a good option if you don't want to screw over proxy users: just check the first two blocks of the IP address. It will still catch anyone quickly changing countries, for example. You can add even more information into the mix too, such as whether the "Accept" headers change, since these will generally stay the same for the same user. 
<br/><pre class='prettyprint'>If ($_SESSION['_USER_LOOSE_IP'] != long2ip(ip2long($_SERVER['REMOTE_ADDR'])                                            & ip2long("255.255.0.0"))    || $_SESSION['_USER_AGENT'] != $_SERVER['HTTP_USER_AGENT']    || $_SESSION['_USER_ACCEPT'] != $_SERVER['HTTP_ACCEPT']    || $_SESSION['_USER_ACCEPT_ENCODING'] != $_SERVER['HTTP_ACCEPT_ENCODING']    || $_SESSION['_USER_ACCEPT_LANG'] != $_SERVER['HTTP_ACCEPT_LANGUAGE']    || $_SESSION['_USER_ACCEPT_CHARSET'] != $_SERVER['HTTP_ACCEPT_CHARSET']){    // Destroy and start a new session    session_unset(); // Same as $_SESSION = array();    session_destroy(); // Destroy session on disk    session_start();    session_regenerate_id(true);    // Log for attention of admin    Log::create("Possible session hijacking attempt.", Log::NOTIFY_ADMIN)    // Flag that the user needs to re-authenticate before continuing.    Auth::getCurrentUser()-&gt;reAuthenticate(Auth::SESSION_SUSPICIOUS);}// Store these values into the session so I can check on subsequent requests.$_SESSION['_USER_AGENT']           = $_SERVER['HTTP_USER_AGENT'];$_SESSION['_USER_ACCEPT']          = $_SERVER['HTTP_ACCEPT'];$_SESSION['_USER_ACCEPT_ENCODING'] = $_SERVER['HTTP_ACCEPT_ENCODING'];$_SESSION['_USER_ACCEPT_LANG']     = $_SERVER['HTTP_ACCEPT_LANGUAGE'];$_SESSION['_USER_ACCEPT_CHARSET']  = $_SERVER['HTTP_ACCEPT_CHARSET'];// Only use the first two blocks of the IP (loose IP check). Use a// netmask of 255.255.0.0 to get the first two blocks only.$_SESSION['_USER_LOOSE_IP'] = long2ip(ip2long($_SERVER['REMOTE_ADDR'])                                       & ip2long("255.255.0.0"));</pre><br/><br/><h3 class='subtitle' id="3.-store-all-session-information-server-side,-never-store-anything-except-the-sid-in-the-client-side-cookie."><a href="#3.-store-all-session-information-server-side,-never-store-anything-except-the-sid-in-the-client-side-cookie." class="anchor-link"></a>3. 
Store all session information server-side, never store anything except the SID in the client-side cookie.</h3><br/>A friend of mine (OK... me) built a site a very long time ago and thought that storing the username and password in a cookie on the client-side was the correct way to do things. Those details were then authenticated again on each request. As it turns out, this is a very bad idea. For starters, cookies are generally stored in plaintext on the client-side, so anyone with access to the computer can see them. Secondly, there are many attacks which can steal cookies and then use the information to impersonate another user. If all an attacker gets is a session id which has probably been regenerated since, it's pretty useless. <br/><br/>I... I mean my friend then decided that it might be better to hash the password. But again they were wrong. Even if you hash the password, it shouldn't be stored client-side, EVER. It allows someone to brute force the password and have all the time in the world to do it. Basically, <strong>never trust the client</strong>, you can't rely on client-side information being accurate, you should always keep things server side where you know that they are accurate.<br/><br/>The worst example of this was something I saw a few years ago (not me this time, thankfully). A site would start a session and let you login. The cookie would have a flag which said if the user was logged in or not, and then another variable with the username. All state was stored in the cookie rather than in the session. All you had to do to login as a different user was to change the username in the cookie, no password needed. A cookie can always be manipulated by the user.<br/><br/>Store all information server-side and only store the session ID on the client-side. The cookie should just be a pointer to the information server-side. You should treat cookies in the same way as any other user input (validate it and sanitize it).
<br/><br/>When setting your cookies remember to always specify the domain, an expiry, and set the "HttpOnly" and "secure" options. "HttpOnly" prevents JavaScript from accessing the cookie, only the server can access it (assuming the user's browser implements it correctly of course). A common method of stealing cookies (and hence the session ID) is to inject some JavaScript onto a site using XSS, and then this JavaScript will steal the user cookie and post it to a malicious domain where the information is collected. Adding HttpOnly helps to prevent this, as the cookie can only be accessed via the HTTP protocol (this includes HTTPS, HttpOnly doesn't mean "unsecure HTTP only"). Setting the "secure" flag will limit the cookie so that it can only be accessed over a secure connection using HTTPS.<br/><br/>As of PHP5.2 you can specify "HttpOnly" and "secure" in the <a href='http://php.net/manual/en/function.setcookie.php'><em>setcookie()</em></a> method as the last parameters, or you can just set them directly into your PHP configuration to have <em>session_start()</em> make use of them.<br/><pre class='prettyprint'>// Manually set the cookiesetcookie("sid",                // Name          session_id(),         // Value          strtotime("+1 hour"), // Expiry          "/",                  // Path          ".wblinks.com",       // Domain          true,                 // HTTPS Only          true);                // HTTP Only// Or, in php.inisession.cookie_lifetime = "3600";      // Expirysession.cookie_httponly = "1";         // HTTP Onlysession.cookie_secure = "1";           // HTTPS Onlysession.cookie_domain = ".wblinks.com" // Domain// Then session_start will use the above config.
session_start();</pre><br/><br/><h3 class='subtitle' id="4.-confirm-sids-aren't-from-an-external-source,-and-verify-the-session-was-generated-by-your-server."><a href="#4.-confirm-sids-aren't-from-an-external-source,-and-verify-the-session-was-generated-by-your-server." class="anchor-link"></a>4. Confirm SIDs aren't from an external source, and verify the session was generated by your server.</h3><br/>Never just blindly accept a session ID and assume it's valid. If you grab a session ID from a cookie, confirm that the cookie was set by the domain of your website (and not an invalid sub-domain for example), and make sure the session exists already and was generated by your server (so don't allow users to set their own session ID). We tend to assume that browsers correctly handle cookies so that cross-site cookies aren't possible. This might not be the case for all the browsers that your users use though, which can allow cross-site cooking. If the domain is invalid or the session wasn't created by your server, then destroy the session immediately and regenerate a fresh one.<br/><br/>Checking the session was generated by your server is as simple as adding a value into the session variable and checking for it's existence.<br/><pre class='prettyprint'>If (!isset($_SESSION['MY_SERVER_GENERATED_THIS_SESSION'])){    session_unset();     session_destroy();    session_start();    session_regenerate_id(true);}$_SESSION['MY_SERVER_GENERATED_THIS_SESSION'] = true;</pre><br/><br/><h3 class='subtitle' id="5.-don't-append-the-sid-to-urls-as-a-get-parameter."><a href="#5.-don't-append-the-sid-to-urls-as-a-get-parameter." class="anchor-link"></a>5. 
Don't append the SID to URLs as a GET parameter.</h3><br/>This isn't really an issue in PHP 5.3.0 and later, since the default configuration of <a href='http://www.php.net/manual/en/session.configuration.php#ini.session.use-only-cookies'><em>session.use_only_cookies</em></a> will protect against it, but it could be important for another language or an earlier PHP version.<br/><br/>Putting the SID in the URL leaks it in all sorts of ways, and SID leakage can lead to a session fixation attack if you haven't protected against it (see Tip 1). The SID will be stored in the user's browser history, it will be stored in any bookmark of the page, and if the user copies/pastes the link it gets copied too. Use a cookie instead. This isn't a huge deal if you regenerate the ID on every request, but it's still something to avoid.<br/><br/>Cookies are very rarely disabled nowadays, and I've yet to see anyone (in person, or in logs) who visits without them enabled. I'm all for having a fallback, but you need to decide if it's worth it based on your site's traffic. <br/><br/><h3 class='subtitle' id="6.-expire-sessions-on-the-server-side,-don't-rely-on-cookie-expiration-to-end-a-user-session."><a href="#6.-expire-sessions-on-the-server-side,-don't-rely-on-cookie-expiration-to-end-a-user-session." class="anchor-link"></a>6. Expire sessions on the server side, don't rely on cookie expiration to end a user session.</h3><br/>You should expire a session after both an overall lifetime and an inactivity timeout. Make sure you record when the session was last used on every request; just add a session variable with the current time,<br/><pre class='prettyprint'>$_SESSION['_USER_LAST_ACTIVITY'] = time();</pre><br/><br/>When starting the session you should also store the start time, and then check it against a longer limit to make sure the session cannot last too long.
<br/><pre class='prettyprint'>$_SESSION['SESSION_START_TIME'] = time();</pre><br/><br/>Before doing any of this though, check if the last activity or start time value is older than some pre-defined time limit. If so, then destroy the session immediately. While adding a cookie time limit is important too, <strong>it should not be relied upon</strong>. You can't just clear the cookie on expiry and think it's over. The session will still be active on the server side and the session ID can still be used. <strong>You must clear it server-side too.</strong><br/><br/>When the session expires you should make the user login again, and regenerate the SID as per Tip 1.<br/><pre class='prettyprint'>if ($_SESSION['SESSION_START_TIME'] &lt; (strtotime("-1 hour"))    || $_SESSION['_USER_LAST_ACTIVITY'] &lt; (strtotime("-20 mins"))){    session_unset();    session_destroy();    Auth::getCurrentUser()-&gt;reAuthenticate(Auth::SESSION_EXPIRED);}</pre><br/><br/><h3 class='subtitle' id="7.-use-long-and-unpredictable-session-ids."><a href="#7.-use-long-and-unpredictable-session-ids." class="anchor-link"></a>7. Use long and unpredictable session IDs.</h3><br/>Quite basic this one, but never use sequential session ID's! If you rely on using a session ID which increments every time you need a new one, stop immediately and re-think your strategy. Even if you regenerate your session IDs to prevent fixation, even if you don't allow the session ID to be given as a GET parameter, even if you don't leak it in a GET parameter, none of that matters if you have predictable session IDs as an attacker can just know your current session ID no matter what. Session prediction is very bad.<br/><br/>In PHP this is all taken care of for you, and is not something you need to be concerned about, but in other languages that might not be the case.
<br/><pre class='prettyprint'>session_start();session_regenerate_id(true);</pre> <br/><br/>You can configure PHP a bit deeper than the defaults and decide on the entropy source (<a href='http://www.php.net/manual/en/session.configuration.php#ini.session.entropy-file'><em>session.entropy_file</em></a>) used to create the session IDs (/dev/random, etc), the hash function used (<a href='http://www.php.net/manual/en/session.configuration.php#ini.session.hash-function'><em>session.hash_function</em></a>), how many bytes are used (<a href='http://www.php.net/manual/en/session.configuration.php#ini.session.entropy-length'><em>session.entropy_length</em></a>), etc. Change these if you like, but the defaults usually suffice.<br/><br/>Don't try to get clever and generate a session ID based on a hash of the IP or user-agent or anything like that. That's what I mean by predictable data, if you can find the pattern then an attacker can generate a session ID for anyone, opening you up to session hijacking. <br/><br/>Use something as random as possible, but also make sure it's actually a good pseudo random generator. Don't make the mistake of assuming the PHP <a href='http://php.net/manual/en/function.rand.php'><em>rand()</em></a> method on Windows is good for randomness for example, because <a href='http://www.boallen.com/random-numbers.html'>you'd be surprised</a>.<br/><br/><h3 class='subtitle' id="8.-properly-sanitize-user-input-before-setting-headers-with-them."><a href="#8.-properly-sanitize-user-input-before-setting-headers-with-them." class="anchor-link"></a>8. Properly sanitize user input before setting headers with them.</h3><br/>PHP will automatically protect against this as it will only allow one header to be set in the <em>header()</em> function, but for other languages it may not be the case.<br/><br/>This might seem a strange tip and appear unrelated to sessions, but all will become clear after an example. 
Suppose you have the following code, where the sanitize function strips things like HTML/JavaScript/SQL to prevent cross-site scripting, but doesn't strip CRLFs (carriage return and line feed, or 0x0D and 0x0A in ASCII).<br/><pre class='prettyprint'>$nextPage = Sanitizer::sanitize($_GET['next_page']);
header("Location: $nextPage");</pre><br/><br/>Even though it seems like you're protected from XSS attacks, session fixation is still possible with this code, despite the fact that you've technically not allowed the user to set sessions via the URL. Suppose I give the following link to a user,<br/><pre class='prettyprint'>http://example.com/?next_page=login%0d%0aSet-Cookie:%20sessionID%3d12345678</pre><br/><br/>If you decode the URL-encoded characters you get,<br/><pre class='prettyprint'>?next_page=login\r\nSet-Cookie: sessionID=12345678</pre><br/><br/>A sanitize method which doesn't remove CRLFs will let the above string through, in which case the following headers have just been sent,<br/><pre class='prettyprint'>Location: login
Set-Cookie: sessionID=12345678</pre><br/><br/>So even though you never allowed users to explicitly set the session ID, and you're sure you're safe against session fixation, a seemingly unrelated bit of code has allowed a session ID to be fixed. An attacker can also modify the headers to make sure the cookie never expires, for example, making it more likely the attack will succeed. This type of attack is called <strong><a href='http://en.wikipedia.org/wiki/HTTP_response_splitting'>HTTP response splitting</a></strong>, and it's not something PHP users need to be too concerned with, as <em>header()</em> only allows one header to be set at a time, purposely to prevent this type of attack.<br/><br/>If you're not using PHP, you should sanitize any input which sets headers by removing CRLFs to prevent response splitting.
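<br/><br/>To make that last point concrete, here's a rough sketch of such a check in PHP. The helper name is purely illustrative (it's not a PHP built-in or from any framework); the idea is simply to strip CR and LF from anything destined for a header.

```php
<?php
// Illustrative helper (hypothetical name): strip carriage returns and
// line feeds so user input can't smuggle extra headers into the response.
function sanitize_header_value($value)
{
    return str_replace(array("\r", "\n"), '', $value);
}

// With the CRLFs removed, the crafted payload from the example above
// collapses into a single (broken, but harmless) redirect target
// instead of injecting a Set-Cookie header.
if (isset($_GET['next_page'])) {
    header("Location: " . sanitize_header_value($_GET['next_page']));
}
```

With this in place, the "login%0d%0a..." payload decodes to "login\r\nSet-Cookie: sessionID=12345678" and is reduced to one garbled Location value rather than a fixed cookie.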
<br/><br/><h3 class='subtitle' id="9.-when-a-user-logs-out,-destroy-their-session-explicitly-on-the-server."><a href="#9.-when-a-user-logs-out,-destroy-their-session-explicitly-on-the-server." class="anchor-link"></a>9. When a user logs out, destroy their session explicitly on the server.</h3><br/>Don't rely on <a href='http://php.net/manual/en/features.gc.php'>garbage collection</a> to destroy the session information on disk for you after a user logs out. Garbage collection may never run on a slow traffic site, even if you've set <em>session.gc_probability</em>, <em>session.gc_divisor</em> and <em>session.gc_maxlifetime</em> up properly. You can never absolutely guarantee garbage collection will run when calling <em>session_start()</em>. Always manually use <a href='http://php.net/manual/en/function.session-destroy.php'><em>session_destroy()</em></a> to end the session and delete the data from disk. Don't rely on a cookie expiry to do it for you either, since if you don't manually destroy the session it will still be available on the server.<br/><pre class='prettyprint'>session_unset(); session_destroy();session_start();session_regenerate_id(true);</pre><br/><br/>It's not generally possibly to delete a cookie explicitly, instead you need to re-set a cookie with the same name, but set it's expiry time to the past. I've seen tutorials which suggest using some JavaScript to clear the cookies, <strong>don't do this</strong>. Firstly <a href='http://wblinks.com/notes/javascript-is-good-but-should-not-be-relied-upon'>relying on JavaScript is a bad idea</a>, but also if you're storing your cookies correctly in the first place with HttpOnly (see Tip 3) then it shouldn't be possible to access your cookies via JavaScript anyway.
<br/><br/>So first, here are some <strong>ways you shouldn't clear cookies.</strong><br/><pre class='prettyprint'>unset($_COOKIE);// This will only remove it from the superglobal and will do nothing to// the actual client-side cookie.setcookie("sid", "", 0);// 0 sets the expiry time to when the browser is closed and doesn't // immediately expire it. Don't use 0!setcookie("sid", "", strtotime("-1 hour"));// Sets the expiry to one hour in the past right?// In server time yes, but cookies are stored on the client in their// local timezone, so depending on where that is, it may not expire for a// few more hours!</pre><br/><br/>The correct way to clear a cookie is to just pass in 1 as the expiry time. This is one second after the unix epoch and will always be in the past. (If you really want you can set it to some time over 24 hours in the past, but "1" is always going to be less verbose).<br/><pre class='prettyprint'>setcookie("sid", "", 1);</pre><br/><br/>Don't forget to destroy the session on the server-side too!<br/><br/><h3 class='subtitle' id="10.-check-your-session-configuration."><a href="#10.-check-your-session-configuration." class="anchor-link"></a>10. Check your session configuration.</h3><br/>Check your session configuration carefully to ensure you're not sharing things you shouldn't. For example, by default PHP stores sessions in the "/tmp" directory. All well and good if you have a dedicated server, but if you're on shared hosting then it could allow anyone else on that server to see the session data and hijack them. Of course, you can still use /tmp, just make sure to set the file system permissions properly so only you can read the session data.<br/><br/>It's recommended to read through all of the <a href='http://www.php.net/manual/en/session.configuration.php'>configuration options PHP</a> or the language of your choice provides and to make sure they're all set up correctly for your needs. 
A slight misconfiguration can open you up to all sorts of strange attacks. In general the default configuration is pretty good, but there are still some things you should consider changing (like the <em>session.save_path</em> mentioned above, for example). The parts of the configuration you need to change will always depend on the specific needs of your application, so make sure you understand all of the options available to you.<br/><br/><h3 class='subtitle' id="11.-force-users-to-re-authenticate-on-any-destructive-or-critical-actions."><a href="#11.-force-users-to-re-authenticate-on-any-destructive-or-critical-actions." class="anchor-link"></a>11. Force users to re-authenticate on any destructive or critical actions.</h3><br/>A quick tip to end on. Any time a user wants to perform something destructive or critical (delete account, change password, etc), you should force them to re-authenticate with their password. This prevents anyone who has stolen a valid session ID from performing the critical actions, since they don't know the password.<br/><pre class='prettyprint'>Auth::getCurrentUser()-&gt;reAuthenticate(Auth::ACTION_SENSITIVE_CRITICAL);</pre><br/><br/>You don't have to worry about the malicious user knowing the password, since if they knew that, it's a moot point and game over anyway; attempting to hijack the session would be pointless.
<br/><br/><h3 class='subtitle' id="summary"><a href="#summary" class="anchor-link"></a>Summary</h3><br/><ol><li>Always regenerate a session ID (SID) when elevating privileges or changing between HTTP and HTTPS.</li><li>Check for suspicious activity and immediately destroy any suspect session.</li><li>Store all session information server-side, never store anything except the SID in the client-side cookie.</li><li>Confirm SIDs aren't from an external source, and verify the session was generated by your server.</li><li>Don't append the SID to URLs as a GET parameter.</li><li>Expire sessions on the server side, don't rely on cookie expiration to end a user session.</li><li>Use long and unpredictable session IDs.</li><li>Properly sanitize user input before setting headers with them.</li><li>When a user logs out, destroy their session explicitly on the server.</li><li>Check your session configuration.</li><li>Force users to re-authenticate on any destructive or critical actions.</li></ol><br/><br/>None of this is cutting edge and there are no new session based attacks out there that have prompted this post, all of these tips are things that have been known for years. But that doesn't mean people aren't always started to learn about web development and these things need to be known. Even if you rely on pre-built frameworks, knowing this stuff is useful for other areas.<br/><br/>As I said at the start, the list is certainly not exhaustive and there are plenty of excellent tutorials and articles on the subject just a Google search away. Secure session management is a complicated subject, so it's well advised to read around before trying to implement your own system.<br/><br/>As I have said many times in past notes, I am not a security expert. Before trying to write any session management code yourself, seriously consider using something pre-built and open source. 
Many web frameworks include session management that has been tried and tested by many users and security experts, people who are much smarter than me.<br/>]]></description>
            <author>xml-feeds@wblinks.com (Rich Adams)</author>
        </item>
            <item>
            <title><![CDATA[Cross Site Request Forgery (CSRF/XSRF)]]></title>
            <link>http://wblinks.com/notes/cross-site-request-forgery</link>
            <guid>http://wblinks.com/notes/cross-site-request-forgery</guid>
            <pubDate>Fri, 04 Jun 2010 00:00:00 +0000</pubDate>
<description><![CDATA[If you're building a site that allows users to update any sort of information (so most websites), then you should probably think about protecting against Cross Site Request Forgery (referred to as CSRF or XSRF). Being susceptible to this type of attack can be annoying in some cases, but extremely dangerous in others. Unfortunately, it's not the type of attack that's easy to understand at first, and it's not immediately obvious how to prevent such an attack. Because of this, protecting against XSRF is often overlooked, even on some big name websites. <br/><br/><h3 class='subtitle' id="what-is-xsrf?"><a href="#what-is-xsrf?" class="anchor-link"></a>What is XSRF?</h3><br/>An XSRF vulnerability is one which allows a malicious user (or website) to make an unsuspecting user perform an action on your site which they didn't want to happen.<br/><br/>As a basic example, imagine you allow users to post images in your comments. If a malicious user puts <em>"http://example.com/logout.php"</em> as the image's URL, where example.com is your domain, then any time a logged in user views that comment they will be logged out if you don't protect against XSRF. It's not a valid URL for the image, but that doesn't matter as the unsuspecting user's browser will still make the request and your site will perform the action thinking the user wanted it.<br/><br/>A more dangerous example could be that you allow a user account to be deleted without confirming the action or protecting it from XSRF in any way, so any user viewing a comment like the following would then get their account deleted instantly!<br/><blockquote><p>I thought I'd make a comment on your site. Check out this cool image!<br/>&lt;img src='http://example.com/delete_my_account.php' /&gt;<br/></p></blockquote><br/>Even if you have no user submitted content on your site, you can still be vulnerable. 
If a malicious website contains a form which the unsuspecting user submits, it can submit the information to a different site which the user is logged in to, and that site will think the request came from the logged in user and will just perform the update as if the user had done it intentionally.<br/><br/>It doesn't even have to be a link/button the user clicks on, as shown in the first example it could happen even if just viewing a site. Obviously, this could become quite annoying for your users.<br/><br/>While annoying is one thing, it can also be dangerous. For example, let's say I'm logged into an account on a simple shopping cart site. I then go and browse to another unrelated website. The other website has a button which just says "Click here to register". Seems simple enough, this other website wants me to register an account. However, this is a malicious site and when I click the link, it's actually submitting a request to the original shopping cart website as if I'd clicked the "Order me 1000 of some expensive item in one-click" button. If the shopping cart website is susceptible to XSRF then it will think the request to order 1000 items was genuinely submitted by me, and I'll get a nice surprise in the mail and on my credit card statements.<br/><br/>It's a particularly difficult type of attack to get your head around as it's very subtle, but once you understand how it works it's not that difficult to protect against it. It's very easy to go down the bad route though and think you're safe when in reality you're still wide open to attack.<br/><br/><h3 class='subtitle' id="the-wrong-way-to-protect-against-xsrf"><a href="#the-wrong-way-to-protect-against-xsrf" class="anchor-link"></a>The Wrong Way To Protect Against XSRF</h3><br/>So what can you do as a web developer to prevent such attacks? 
Effectively you just want to make sure a request came from your site and was actually intended to be run by the user, so you could just check the referer header to make sure the request came from your own site, right?<br/><pre class='prettyprint'>&lt;?php
$url = parse_url($_SERVER['HTTP_REFERER']);
if ($url['host'] != "wblinks.com")
{
    die("You're not coming from my site. Possible XSRF attack");
}</pre><br/><br/>While this may work in some cases, it's going to be about as effective as an underwater hair dryer as far as stopping an XSRF attack. Altering the referer header is pretty trivial, and it will also have the downside of making the site unusable for lots of people, since many browsers and proxies strip the referer header when in "private" mode.<br/><br/>This also wouldn't protect against the first example of XSRF, where someone just uses the logout URL as an image URL in a comment, since the request would come from the correct referer in that case.<br/><br/>So not only will you not prevent XSRF attacks, but you'll also annoy some of your users. Not a good solution.<br/><br/><h3 class='subtitle' id="the-right-way-to-protect-against-xsrf"><a href="#the-right-way-to-protect-against-xsrf" class="anchor-link"></a>The Right Way To Protect Against XSRF</h3><br/>What we really need is a <a href='https://en.wikipedia.org/wiki/Cryptographic_nonce'>nonce</a> (one-time key/token) which allows us to validate that the request came from a form we presented to the user intentionally. The following code samples show the method I use to achieve this for the comments section of this site.<br/><br/>For the purposes of the following code examples, you can assume the <strong>Session</strong> class is just a wrapper for the $_SESSION superglobal (it actually does some other stuff, but that doesn't matter here). All of these functions are part of an <strong>Auth</strong> class, which handles all of my XSRF protection (along with some other things).
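<br/><br/>If it helps to make the examples below concrete, a minimal version of such a Session wrapper could look like this. It's just a sketch of the bits the examples rely on, not my real class (which does more),<br/><pre class='prettyprint'>// A minimal Session wrapper around the $_SESSION superglobal.
class Session
{
    // Returns the stored value, or null if the key isn't set.
    public static function get($key)
    {
        return isset($_SESSION[$key]) ? $_SESSION[$key] : null;
    }

    public static function set($key, $value)
    {
        $_SESSION[$key] = $value;
    }
}</pre>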
<br/><br/>First things first, you'll want to add a configuration parameter somewhere so you can specify how long the tokens should last. I use 15 minutes, but you can use whatever makes sense for your application. The important point is that the tokens should have an expiry,<br/><pre class='prettyprint'>$_config['csrf_token_lifetime'] = "15 mins"; // How long the tokens should last.</pre><br/><br/>You'll also need a function which can generate a nonce (basically a random string of characters), here's the method I currently use, there might be better methods out there.<br/><pre class='prettyprint'>// Generates a nonce, by base64 encoding some random binary data.public static function generateNonce($length = 20){    return substr(base64_encode(openssl_random_pseudo_bytes(1000)), 0, $length);}</pre><br/><br/>To prepare the token, you'll want to generate it, calculate the expiry time for it (based on the configuration parameter you stored earlier), and then add these values to the current user's session. I keep an array of active tokens (more on why later).<br/><pre class='prettyprint'>// Generate a new XSRF token and store it in the user's session.public static function getXSRFToken(){    $nonce          = Auth::generateNonce();    $tokens         = Session::get("_xsrf");    $tokens[$nonce] = strtotime("+".$_config['csrf_token_lifetime']);    Session::set("_xsrf", $tokens);    return $nonce;}</pre><br/>    <br/>You'll need some way to validate that a given token is valid. This is something you'll call when you recieve a token as part of a request. Validate that it exists in the user's session, and that it hasn't expired. If it's valid, you should make sure to immediately invalidate it so that it can't be used again. You also want to clear any tokens which have expired.
<br/><pre class='prettyprint'>// This will determine if an fkey is validprivate static function validateXSRFToken($xsrf){    $tokens = Session::get("_xsrf");    if ($tokens == null) { return false; } // Sanity check    // Check that the fkey exists, and time has not expired    foreach ($tokens as $key =&gt; $expires)    {        // Remove any tokens that have expired.        if (time() &gt; $expires)        {            unset($tokens[$key]); Session::set("_xsrf", $tokens);            continue;        }            // If key matches and isn't expired, we can use it.        if ($key == $xsrf  && time() &lt;= $expires)        {            // Key is good, remove it from use            unset($tokens[$key]); Session::set("_xsrf", $tokens);            return true;        }    }    return false;}  </pre><br/><br/>Rather than having to call the <em>getXSRFToken()</em> function manually on every form on your site, you should probably make a quick helper function to do it for you. This will also make sure you don't typo the name of the field anywhere.<br/><pre class='prettyprint'>// Creates the form input to usepublic static function getXSRFFormInput(){    return "&lt;input type=\"hidden\" name=\"fkey\" value=\"".Auth::getXSRFToken()."\" /&gt;";}</pre><br/><br/>So what's going on with all this? Anytime there's a form on the site which POST's data, I will output the <strong>getXSRFFormInput()</strong> function to add a hidden field to this form. This hidden field contains the value returned from <strong>getXSRFToken()</strong>, this method generates a random 20 character nonce, and stores it into the user's session and then returns this nonce so it can be put into the hidden input. For example,<br/><pre class='prettyprint'>&lt;form action="do_something.php" method="POST"&gt;    &lt;fieldset&gt;        &lt;?php echo Auth::getXSRFFormInput(); ?&gt;        &lt;input ... 
/&gt;    &lt;/fieldset&gt;&lt;/form&gt;</pre><br/><br/>This will add a hidden field to the HTML form which will look like this,<br/><pre class='prettyprint'>&lt;input type="hidden" name="fkey" value="3748ab53cf129d536eca" /&gt;";</pre><br/><br/>The user's session will also now contain an array called "_xsrf", which contains a random 20 character nonce, along with the expiry time of that nonce.<br/><pre class='prettyprint'>$_SESSION['_xsrf'] array(1) =&gt; {    ["3748ab53cf129d536eca"] =&gt; int(1275675864)}</pre><br/><br/>When the user submits this form, the idea is that we take the fkey value and check it against the ones we've stored in the user's session. If it's there and hasn't expired, then the request is valid and came from a form which the site generated. If it doesn't exist in the session, or has expired, then it's a possible XSRF attack and the output should be stopped and logged. This validation is done in the <strong>validateXSRFToken()</strong> method.<br/><br/>Now the final step, is to run something like the following code on any page that takes input. I actually have it in my bootstrap file which is run on every page request to the site.<br/><pre class='prettyprint'>if (count($_POST) &gt; 0){    if (!Auth::validateXSRFToken($_POST["fkey"]))    {        Log::create("XSRF", "Possible XSRF attack", LOG::NOTIFY, LOG::NOTIFY_ADMIN);        // Inform user that why things broke and how to fix it.        Notification::set("Your session token expired. Please refresh and try again (don't use the back button).", NotificationType::ERROR);        // Redirect back to whichever page they came from.        header("Location:".Session::getReferer()); exit();    }}</pre><br/><br/>So if anyone ever makes a POST request to the page, we not only validate that the fkey exists in the user's session, but that it also hasn't expired. 
Restricting the validity time of the fkey reduces the likelihood of an attack succeeding since it would have to be mounted quickly. Rather than purely relying on the expiry time, we also invalidate any key as soon as it's used, so that it can't be used again if it was captured by a MITM attack of some sort.<br/><br/><h3 class='subtitle' id="why-an-array-of-tokens?"><a href="#why-an-array-of-tokens?" class="anchor-link"></a>Why an Array of Tokens?</h3><br/>It was just the choice I made when first implementing this; it means each form gets its own token value. This allows users to have another browser tab open and be able to submit both without getting an error (if I used the same token, it would be invalidated after the first form is submitted). If that's not a use case you need to support, then you could just store one XSRF token in the user's session and use it on every form on a page. Then just cycle the token once it's used. This should be just as effective as my method above where I generate a new token for each individual form. The important things to remember are,<br/><br/><ol><li>The token must be tied to a specific user's session, it should not be site-wide!</li><li>The token should have some sort of expiry time.</li><li>The token must be invalidated once it's been used.</li></ol><br/><br/><h3 class='subtitle' id="why-only-post?"><a href="#why-only-post?" class="anchor-link"></a>Why only POST?</h3><br/>You may be wondering why I'm only checking POST variables to prevent XSRF rather than both POST and GET. The answer is because GET should never be susceptible to such an attack if you're using it correctly.<br/><br/>So when should you use GET and when should you use POST? Well, it's all in the name. GET requests ideally should be used when the contents of the page are read-only, so nothing gets changed by the request. 
So a GET request should be <a href='https://en.wikipedia.org/wiki/Idempotence'>idempotent</a> (I should be able to trigger the same GET request as many times as I want and it shouldn't affect the result I get. Basically, you want to make sure it doesn't have any side-effects). POST should be used whenever it causes a destructive action (I don't just mean deletes... I mean destructive as in something changes). So login, logout, updates, creation, deletion, comments, voting, etc.<br/><br/>This is why browsers will generally prompt you to confirm when resending a POST request, whereas they won't with a GET request. This is because a POST request will generally change the outcome each time it is run, so you want to make sure you're not going to accidentally run it again and change something you didn't want to. A GET request shouldn't change any data that way, and so it doesn't need to be confirmed.<br/><br/>If you do want to use a GET for a destructive action, say you want it to be an anchor tag rather than a button (otherwise your styling will look strange), then you must make sure that you at least redirect to a confirmation page to confirm the YES/NO; ideally this confirmation page should use POST.<br/><br/>If not, then say you were to have a link in your admin area which deleted an item using GET, without any form of confirmation. Some browsers do what's called pre-fetching, where they examine the links on a page, and pre-fetch the websites you'd get by clicking these links, under the assumption that you will go to them. Then when you do click them, the page can be displayed very quickly. If you just delete with a GET, then simply visiting the admin page could cause the browser to follow all those links in the background and delete everything. Obviously not something you want. Yes, this happened to me in the past, so please learn from my mistakes. (I've heard stories of GoogleBot accidentally deleting pages from a site this way too).
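<br/><br/>To make that concrete, the difference is between something like these two snippets (the URLs and names here are made up for illustration),<br/><pre class='prettyprint'>&lt;!-- Dangerous: a pre-fetcher or crawler may follow this "link" and delete the item. --&gt;
&lt;a href="/admin/delete_item.php?id=42"&gt;Delete&lt;/a&gt;

&lt;!-- Safer: the destructive action goes through a POST (with an fkey, as above). --&gt;
&lt;form action="/admin/delete_item.php" method="POST"&gt;
    &lt;?php echo Auth::getXSRFFormInput(); ?&gt;
    &lt;input type="hidden" name="id" value="42" /&gt;
    &lt;input type="submit" value="Delete" /&gt;
&lt;/form&gt;</pre>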
<br/><br/><h3 class='subtitle' id="post-refresh"><a href="#post-refresh" class="anchor-link"></a>POST Refresh</h3><br/>Another thing to keep in mind, is that you should redirect users to a GET based URL once the POST request has been handled. This means users will be able to hit F5 and refresh the result page without being prompted if they want to resubmit the POST request (since that will now cause an XSRF error to be displayed). This is common practice on most sites and something users now expect to be able to do. So they'll probably not be too happy if they have to re-send requests and then get errors about their XSRF token expiring.<br/><br/><h3 class='subtitle' id="final-thoughts"><a href="#final-thoughts" class="anchor-link"></a>Final Thoughts</h3><br/>As I have said many times before, <strong>I am not a security expert</strong>. Take everything you've read here with a grain of salt. This is just how I understand things right now. I may have misunderstood something and there could be a glaring security hole in my examples above (please do let me know if that's the case!). Things might also change in future, new attack vectors may be discovered, or better ways of protecting against XSRF may become standard. You're highly encouraged to do your own research before settling on a method of XSRF protection.<br/>]]></description>
            <author>xml-feeds@wblinks.com (Rich Adams)</author>
        </item>
            <item>
            <title><![CDATA[Do we Really Need to Keep Typing www?]]></title>
            <link>http://wblinks.com/notes/do-we-really-need-to-keep-typing-www</link>
            <guid>http://wblinks.com/notes/do-we-really-need-to-keep-typing-www</guid>
            <pubDate>Tue, 09 Feb 2010 00:00:00 +0000</pubDate>
            <description><![CDATA[I'll admit it, I'm the kind of programmer who'll spend 10 hours writing some code to do a job that would only have taken 2 hours to do anyway. This isn't because I'm stupid (well... maybe a little), but more because I just like writing programs and it'll usually teach me something new. I also have a notion that if I'm going to do something once, I'm probably going to have to do it again at some point, so the next time it'll only take a few seconds because I have a program to do it. <br />
<br />
Even though I will happily spend hours writing software that I didn't really need, I don't like to waste my time on something that to me seems pointless. Even small insignificant things which in the long run probably have no real impact on my time anyway. I'm kinda strange like that.<br />
<br />
One of these small insignificant things is typing out the "www." before a web address. Saving myself the milliseconds it would take to type that out is a big deal for me, because I don't see the point of typing it out at all. It's redundant information and can simply be implied. Yet all the time I'm coming across sites which will not work if you miss off those four characters at the beginning. This ends up costing me time, since I won't realise until the request times out, and then I need to type out the "www." anyway to get it to work. There's no real excuse for a site to behave this way; it's just rude. <br />
<br />
<h3 class='subtitle' id="a-story-of-woe"><a href="#a-story-of-woe" class="anchor-link"></a>A story of woe</h3><br />
Just before the holidays I wanted to book a train ticket back home, so I opened up my browser and typed "<a href='http://thetrainline.co.uk'>thetrainline.co.uk</a>", hit enter and expected the site to pop right up. But it didn't, instead it timed out. So figuring the site was down I typed "<a href='http://virgintrains.co.uk'>virgintrains.co.uk</a>" instead, hit enter and the site popped right up. I then went ahead and bought my train tickets home.<br />
<br />
Later on however, I wanted to check some train times, so I went back to try the trainline, and discovered that the site wasn't down at all, it's just it will only work if you type "<a href='http://www.thetrainline.co.uk'>www.thetrainline.co.uk</a>". <br />
<br />
I've no idea on the trainline's business model, but I would imagine they make at least some of their money from selling the actual train tickets. Because their site doesn't work without the "www." they lost a sale from me. Maybe I'm the only person in the entire world who types URLs this way, and so it would only be my sale that was lost, but one sale is still one sale. However, if a lot of people type their URLs the same way as me, then it could end up being quite a bit of cash that's being lost.<br />
<br />
There's no reason for only one of the URLs to work. It's not like they're using the one without the www for something else. It just timed out. It's a tiny piece of site configuration, but if you leave it out then it can cost you visitors and sales.<br />
<br />
<h3 class='subtitle' id='why-do-we-use-"www."-anyway?'><a href='#why-do-we-use-"www."-anyway?' class="anchor-link"></a>Why do we use "www." anyway?</h3><br />
I did some searching around, but couldn't find a conclusive answer to this. The only remotely sensible reason I came across is that back in the early days of the world wide web, it was used to distinguish the web site of a domain from the FTP server or mail server (which would be ftp.example.com and mail.example.com, etc). So it seems to be nothing more than a convention that caught on. <br />
<br />
One of my personal thoughts on the matter is that it was a way to tell people that what they were looking at was an address for the World Wide Web. In the early days of the internet, if you'd seen something like "jurassicpark.com" on the bottom of a movie poster, it wouldn't have been obvious what it was. Is it the name of the production company? Something that got printed by accident? Adding the "www." to the beginning made it immediately obvious to everyone that this was something to do with the internet or World Wide Web.<br />
<br />
Despite this mainstream use of "www." in domains, there is no technical reason why it's there. There's nothing in the <a href='http://www.w3.org/Protocols/rfc2616/rfc2616.html'>HTTP specification</a> that requires the use of "www.", it's simply a standard that caught on. You can host an FTP/mail/SSH/web/news server all on the same domain since these services all use different ports. If you need them to connect to different machines, then you might need separate hostnames in your DNS records, but there's nothing stopping you just using example.com for a normal website.<br />
<br />
<h3 class='subtitle' id='to-"www."-or-not-to-"www."?'><a href='#to-"www."-or-not-to-"www."?' class="anchor-link"></a>To "www." or not to "www."?</h3><br />
So should you use "http://www.example.com" or "http://example.com" for your site? Well, the answer is both, at least from the users' point of view. Make one the actual home page for your site, and just make sure the other one redirects (using a 301 redirect) to the correct one. That way, no matter what the user types into their browser, they'll get to the site.<br />
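<br />
For example, if you pick the non-www version as the canonical one, a simple PHP version of that redirect could look like the following (you could equally do this in your web server's configuration; example.com is a placeholder for your real domain),<br />
<pre class='prettyprint'>// Send a 301 redirect to the canonical host for any other hostname.
$canonical = "example.com"; // Placeholder for your real domain.
if ($_SERVER['HTTP_HOST'] != $canonical)
{
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: http://".$canonical.$_SERVER['REQUEST_URI']);
    exit();
}</pre><br />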
<br />
But then which one is going to be the home page and which one is going to redirect? Some people see this as a non-question saying that it doesn't matter. Well, that's not always the case. The choice you make for this can have big impacts depending on how your site is set up.<br />
<br />
You could make the same page appear for both rather than redirecting, but this can have negative impacts on your search rankings. Search engines may think www.example.com and example.com are actually separate sites, so it will appear twice. Google Webmaster Tools will show you if that's the case and generally they'll just pick one, but it could cause problems with other search engines. <br />
<br />
There are advantages and disadvantages of both ways and it all depends on what your site is really used for, or your personal preference. For me, I prefer to not use the "www.", but that's because I don't use cookies on this domain. At the end of the day, if you don't have any major reasons for using one over the other, then it doesn't really matter so long as you stay consistent.<br />
<br />
<h3 class='subtitle' id="what-if-the-site-uses-cookies?"><a href="#what-if-the-site-uses-cookies?" class="anchor-link"></a>What if the site uses cookies?</h3><br />
Without the "www." cookies will be served on all subdomains. This may be the behaviour you want, or it may not. So you would need to analyse this for your own site and decide the best approach. By including the "www." it means subdomains can have different cookies from the main site, whereas without it, all cookies set on the main domain will propagate to all the subdomains.<br />
<br />
Why does this matter? Well, if you intend to serve static data then you'll need a cookieless subdomain. If you've gone for the option without the "www." then you won't be able to do this, and instead you'll need to use a completely separate domain name. So pick wisely.<br />
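<br />
The difference comes down to the domain argument when the cookie is set. These setcookie() calls are just to illustrate the scoping (the names and values are placeholders),<br />
<pre class='prettyprint'>// Scoped to www.example.com only; static.example.com stays cookieless.
setcookie("sid", $session_id, 0, "/", "www.example.com");

// Scoped to .example.com; this cookie is sent to every subdomain,
// including any static.example.com you set up later.
setcookie("sid", $session_id, 0, "/", ".example.com");</pre><br />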
<br />
<h3 class='subtitle' id="microreasons"><a href="#microreasons" class="anchor-link"></a>Microreasons</h3><br />
Microblogging is big. Yes, I'm talking about Twitter. Everyone and their dog seems to use Twitter, and people will often want to link to places. Having four fewer characters to type can mean the difference between some free publicity for your domain on Twitter and someone using a URL shortener because your URL is too long. Since most URLs are generally too long anyway, if you don't use the "www." you could even implement your own URL shortener. A little free publicity never hurt anyone. <br />
<br />
<h3 class='subtitle' id="summary"><a href="#summary" class="anchor-link"></a>Summary</h3><br />
It all boils down to this; a list of some top tips for web developers with regards to "www.",<br />
<br />
<ol><li>Make your site work both including and excluding the "www.".</li><li>Since including the "www." is social convention, you should make sure that users are sent to the same place whether they include it or not; they shouldn't be sent to a different site if they miss off the "www.", as that will just confuse matters.</li><li>Pick one URL to use as the main destination, and redirect the other URL(s) to it. This way, all of the URLs to your site should stay consistent.</li><li>Use a <a href='http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=93633'>301 redirect</a> as this tells search engines that it's a permanent redirect and that both URLs are actually for the same thing. This should preserve your rankings and keep things consistent.</li><li>If you want to use a cookieless subdomain for static data, then you need to use the "www." for your main site.</li><li>If possible with your domain structure (some websites use subdomains for other things, and that's fine) also try to make "w.domain.com" and "ww.domain.com" point to the main site. This will catch anyone who makes a typo. It takes very little time to set up and can save someone even a few seconds of grief when they realise their typing mistake.</li></ol><br />
<br />
It's the little touches like this which could make the difference between a visitor, or someone who goes elsewhere. It's probably not going to lose you many customers, but every little helps in making people's lives less irritating. <br />
<br />
If you follow the tips, then no matter whether someone bothers to type those extra four characters or not, they'll still be able to get to your site or order their train tickets home without hassle!]]></description>
            <author>xml-feeds@wblinks.com (Rich Adams)</author>
        </item>
            <item>
            <title><![CDATA[Creating a 'Database is Down' Page]]></title>
            <link>http://wblinks.com/notes/creating-a-database-is-down-page</link>
            <guid>http://wblinks.com/notes/creating-a-database-is-down-page</guid>
            <pubDate>Thu, 14 Jan 2010 00:00:00 +0000</pubDate>
<description><![CDATA[Earlier today the database for this site was unavailable for around 30 minutes, I imagine something was being rebooted somewhere. This doesn't really concern me too much, since this is just a personal site. So during this time anyone visiting the site was sent to a <a href='https://wblinks.com/dbdown'>holding page</a> explaining that the database was down, and providing links to some other places for people to visit instead. I made the choice to hide the error details from the user, rather than displaying a page with cryptic error messages on it, or even worse an error message which prints out some critical information. After all, the user probably doesn't really care why my database is down, they just care that they can't get to the information they want. So instead, I log the error details internally and just give a nice page to the user.<br/><br/>I received an email from someone who obviously tried to visit my site during this time, asking how this was done. Since I haven't updated in a while, I thought it'd be good to give a brief overview of how it's done. <br/><br/>It's not very difficult and just involves a simple catch statement. If there's an error when attempting to connect to the database then I log it and email myself the error, and then redirect the user to the holding page. <br/><br/>In the DBAccess class for wblog (the name I gave to the backend code that runs this site), the connect() method looks like this,<br/><pre class='prettyprint'>function connect()
{
    global $DB_CONFIG;

    if ($this-&gt;getHandle() != null) return; // Already connected

    $pdo_string = 'mysql:dbname='.$DB_CONFIG['database'].';host='.$DB_CONFIG['hostname'];

    try
    {
        Log::create("DB", "Connecting to ".$pdo_string." as ".$DB_CONFIG['username'], Log::INFO);

        $this-&gt;_handle = new PDO($pdo_string, $DB_CONFIG['username'], $DB_CONFIG['password']);
        $this-&gt;getHandle()-&gt;setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    }
    catch (PDOException $e)
    {
        Log::create("DB", "Exception caught while connecting. ".$e-&gt;getMessage(),
                    Log::ERROR,
                    Log::NOTIFY_ADMIN);
        header("Location: /dbdown/");
        exit(); // Stop here, otherwise execution would carry on without a connection.
    }

    Log::create("DB", "Connected to database", Log::INFO);
}</pre><br/><br/>The relevant bit of the code is this,<br/><pre class='prettyprint'>    $this-&gt;getHandle()-&gt;setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
}
catch (PDOException $e)
{
    Log::create("DB", "Exception caught while connecting. ".$e-&gt;getMessage(),
                Log::ERROR,
                Log::NOTIFY_ADMIN);
    header("Location: /dbdown/");
    exit();
}</pre><br/><br/>I set the ATTR_ERRMODE attribute so that it will throw exceptions (see the <a href='http://www.php.net/manual/en/pdo.setattribute.php'>PHP manual</a> for details on setting attributes for PDO). I then catch any PDOException that's thrown. The "Log" class for wblog takes care of recording the error into a log file on the server, and the extra argument Log::NOTIFY_ADMIN tells it to drop me an email with a copy of the log (so that I get alerted to something going on). It's smart enough not to just email me the same thing every time a user visits though, and has an increasing delay before it emails me the same thing again (so first 5 minutes, then 10, 15, 20, etc.). You'll find people keep refreshing the site when they get an error page, so you'll just fill your inbox with emails if you're not careful.<br/><br/>Since you don't know what's in my Log class, you can just ignore those bits of the code. I'll probably make another post one day with the full code for this site anyway.
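<br/><br/>The back-off itself is simple enough to sketch. This isn't my actual Log class, just the idea: remember when a given message was last emailed and how many times, and require a linearly growing gap before emailing it again,<br/><pre class='prettyprint'>// Hypothetical sketch of the notification back-off. The n-th repeat email
// for the same message must wait n * 5 minutes after the previous one.
function shouldNotify($messageKey, $now, &$state)
{
    if (!isset($state[$messageKey]))
    {
        $state[$messageKey] = array("count" =&gt; 1, "last" =&gt; $now);
        return true; // First occurrence, always email.
    }

    $delay = $state[$messageKey]["count"] * 300; // 5 mins, then 10, 15, ...
    if ($now - $state[$messageKey]["last"] &gt;= $delay)
    {
        $state[$messageKey]["count"]++;
        $state[$messageKey]["last"] = $now;
        return true;
    }

    return false; // Too soon since the last email; stay quiet.
}</pre>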
<br/><br/>Once the error has been logged, I use the Location header to redirect the user to my holding page. Then I just make sure the holding page is static HTML file and doesn't touch the DB. It could also be PHP, but for my purposes a static HTML file is fine. As long as it doesn't touch the database, otherwise you're going to get into an infinite loop.<br/><br/>So there you have it, that's how I deal with issues when my database is down. If you have any better ways, or see something wrong with how I'm doing it, then let me know in the comments.<br/>]]></description>
            <author>xml-feeds@wblinks.com (Rich Adams)</author>
        </item>
            <item>
            <title><![CDATA[Taking the Security out of Security Questions]]></title>
            <link>http://wblinks.com/notes/taking-the-security-out-of-security-questions</link>
            <guid>http://wblinks.com/notes/taking-the-security-out-of-security-questions</guid>
            <pubDate>Sat, 12 Dec 2009 00:00:00 +0000</pubDate>
            <description><![CDATA[A security system is only as strong as its weakest point, and that weakest point tends to be the bit where a human is involved (putting your password on a sticky note attached to your screen, for example). As a programmer, there are some things you just can't protect against, and human failure is one of them (<a href='http://xkcd.com/292/'>Velociraptors are another</a>). However, developers could at least try to make it a bit harder for people to break into other users' accounts.<br />
<br />
I came across the following scenario a few weeks ago when attempting to sign up to a new system. Imagine the scene, if you will: you've come up with a <a href='https://www.grc.com/passwords.htm'>really secure password</a>, you're happy that it would take someone a very long time to break it, and you haven't noted it down anywhere. Now the website you've signed up to is asking you to enter answers to some security questions in case you forget your password. Questions like "What was the name of your first pet?". You only have 4 different questions to choose from, you have to pick at least two, and it is mandatory to provide an answer. <br />
<br />
This scenario is not as far-fetched as you might first imagine, behold the following image! <br />
<br />
<div class='descriptive-image auto-size'><img src='https://wblinks.com/img/posts/security_questions/question_fail.png' alt="Security Questions that aren't so secure." /><p class='blurb'>I won't name and shame where this screenshot comes from, but it's from a product that's live, out there in the wild, and worryingly also deals with money (and, I should point out, is not associated with anything I've worked on).</p></div><br />
If you have any sort of sense at all, you'll see why this is a very bad idea.<br />
<br />
<h3 class='subtitle' id="all-the-hard-work,-for-nothing"><a href="#all-the-hard-work,-for-nothing" class="anchor-link"></a>All The Hard Work, For Nothing</h3><br />
I really don't know what goes through a developer's head when they come up with things like this. You've built yourself a great web application, you're <a href='https://wblinks.com/notes/storing-passwords-the-wrong-better-and-even-better-way'>using salted hashes for the password storage</a>, and you're happy that your users have picked secure passwords. You've <a href='https://wblinks.com/notes/password-rules-dont-always-help'>not forced some silly password restrictions</a> on people, which actually make it easier to guess passwords. You're feeling pretty happy about it all.<br />
<br />
But then you wonder about a certain use case. What happens when someone forgets their password? You've used salted hashes so you have no way to retrieve the original password, which means you'll need users to verify who they are somehow first.<br />
<br />
Here is where you have some options,<br />
<br />
<ol><li>If a user requests a password reset, generate a new password and send it to the email address on file.</li><li>Get the user to verify who they are by validating some information you have stored under their account. First line of address, day and month of birth, etc. Email them with a confirmation link so that you can verify they have access to the email account on file. That confirmation link then sends them to a page where the user can set a new password.</li><li>Security questions. Get them to fill in some security questions when creating their account, then we can just use that!</li></ol><br />
<br />
<strong>Don't bother with the first option, you should never send a password in plaintext over email</strong>. The second option is my preferred method, yet security questions seem to be the popular choice, and the one that I come across on lots of websites. If done right, security questions can be a life saver when you've forgotten your password, but if done wrong, you might as well give out your password.<br />
<br />
<h3 class='subtitle' id="the-wrong-way"><a href="#the-wrong-way" class="anchor-link"></a>The Wrong Way</h3><br />
Simply, it's forcing a set of pre-defined questions on users. For example,<br />
<br />
<ol><li>What is your mother's maiden name?</li><li>What is your mother's middle name?</li><li>What was the name of your first pet?</li><li>What was the name of your first school?</li></ol><br />
<br />
Now, think about this for a moment. You're going to allow anyone to "recover" their account and change their password if they answer these simple questions. <br />
<br />
Not only have you reduced an attack on the site to a simple dictionary attack (since all those questions require dictionary word/sentence responses) but even worse, if you actually know someone, then you'll probably be able to answer those questions pretty easily. I'm pretty certain I could answer at least 3 of those for most of my friends. You'd also be surprised just how many people grew up with a dog called "Rex" or "Rover", etc.<br />
<br />
Also, what if the user has never had a pet? Or their mother doesn't have a middle name? Forcing users to pick from a fixed set of questions is not only going to irritate people, but will make it much easier for attackers to break the security questions rather than the password.<br />
<br />
<h3 class='subtitle' id="the-better-way"><a href="#the-better-way" class="anchor-link"></a>The Better Way</h3><br />
Once again, I'm going to stop myself from saying "The Right Way", since there's never going to be a foolproof way to recover someone's account if the password is lost. Every method of account recovery has its drawbacks. The very idea of having another method to access an account inherently makes it less secure.<br />
<br />
If you're going to go down the security question route, then a much better way is to let the user decide the question themselves. It allows the user to come up with something really cryptic, which only that person would know, yet no one else will. For example,<br />
<blockquote><p>Q. 2X4B?<br />
A. 523P</p></blockquote><br />
Unless you're a fan of Red Dwarf, you wouldn't get the reference that 2X4B is the middle part of Kryten's name, with the end being 523P. Ok, a simple Google search could probably find that out, so it's probably a bad example, but you get the idea.<br />
<br />
Obviously this is still vulnerable to the "human element". A lot of people might choose very simple questions, not realising the implication that they're making it easier for others to get at their account. I've actually seen the following in a live environment before as a custom security question.<br />
<blockquote><p>Q. My password is "iamawesome"?</p></blockquote><br />
Seriously... the mind boggles at what that particular user was thinking. So it may be worth adding a quick notice to remind users that silly questions like that will make their account insecure.<br />
<br />
<h3 class='subtitle' id="don't-force-your-users-to-create-security-questions"><a href="#don't-force-your-users-to-create-security-questions" class="anchor-link"></a>Don't Force Your Users To Create Security Questions</h3><br />
From a security point of view, allowing someone to recover their account if the password is lost is not a very good idea. The password is there for a reason: to secure the account. If it's lost, then no one should be able to get at the account. Account recovery procedures such as security questions were created purely for customer service, not for security. <br />
<br />
Users are only human, and humans forget passwords. It's that simple. Not wanting to lose a customer, you will want to provide a way for a user to recover their account and prove they are who they say they are. If security questions are implemented correctly, then the benefits to customer service can outweigh the potential security implications of having two access points to an account.<br />
<br />
However, part of customer satisfaction is not to force something on your users. I cannot stand it when I have to sign up for a website and fill in all sorts of mandatory information before I'm allowed to continue. By all means provide the user with the ability to create a custom security question, but don't make it mandatory. If a user doesn't want to provide another means to access their account, don't force it on them.<br />
<br />
Forcing also has the side effect that most users will just think "Eugh, I just want to continue, I don't care about this" and enter something stupid which is easily breakable, putting "I don't care" as the answer, for example (yes, I've done this myself).<br />
<br />
<h3 class='subtitle' id="adding-salt---security-question-answers-are-passwords!"><a href="#adding-salt---security-question-answers-are-passwords!" class="anchor-link"></a>Adding Salt - Security Question Answers Are Passwords!</h3><br />
It's worth noting that answers to security questions are password equivalent and should be treated in exactly the same way as the main account password. I've seen systems which will store passwords as salted hashes, but then store the security answers in plaintext, completely negating the point of salting the password in the first place. Don't forget to store the security question answers as salted, strengthened hashes!<br />
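To make that concrete, here's a quick sketch in Python of what storing an answer that way might look like. The function names, the PBKDF2 parameters, and the case/whitespace normalisation are my own choices for illustration, not from any particular system:

```python
import hashlib
import hmac
import os

def hash_security_answer(answer: str, iterations: int = 310_000) -> str:
    """Store a security answer exactly like a password: salted, strengthened hash."""
    # Normalising case and whitespace is a usability trade-off I'm assuming here,
    # so "Rex" and "  rex " both verify; drop it if you want exact matching.
    normalised = answer.strip().lower()
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", normalised.encode("utf-8"), salt, iterations)
    # Keep the salt and iteration count alongside the digest; both are needed later.
    return f"{iterations}${salt.hex()}${digest.hex()}"

def verify_security_answer(answer: str, stored: str) -> bool:
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac("sha256",
                                    answer.strip().lower().encode("utf-8"),
                                    bytes.fromhex(salt_hex), int(iterations))
    # Constant-time comparison, same as you'd use for a password check.
    return hmac.compare_digest(candidate.hex(), digest_hex)
```

The point is simply that the answer never sits in the database in plaintext, any more than the password does.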
<br />
<h3 class='subtitle' id="combination-lock"><a href="#combination-lock" class="anchor-link"></a>Combination Lock</h3><br />
The most effective method of account recovery I've come across is a combination approach. Ask the user to answer their custom security question, and then get them to verify their email address by sending a confirmation link they need to click. This means unless an attacker knows both the security question answer <strong>and</strong> has access to the email account, they won't be able to attack the account in this manner. At least not very easily. Once the user has clicked the confirmation link, they should be taken to a screen where they can just set a new password for their account.<br />
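For the email half of that combination, the confirmation link just needs an unguessable one-time token. A minimal Python sketch of that step (the function names are my own illustration; you'd also store an expiry time and delete the record once used):

```python
import hashlib
import secrets

def issue_reset_token() -> tuple[str, str]:
    """Return (raw_token, stored_hash).

    The raw token goes into the emailed confirmation link; only its hash
    is kept in the database, so a leaked table doesn't hand out live links.
    """
    raw = secrets.token_urlsafe(32)
    return raw, hashlib.sha256(raw.encode("ascii")).hexdigest()

def token_matches(raw: str, stored_hash: str) -> bool:
    """Check the token from the clicked link, in constant time."""
    candidate = hashlib.sha256(raw.encode("ascii")).hexdigest()
    return secrets.compare_digest(candidate, stored_hash)
```

Only after both the security answer and the token check out should the user reach the "set a new password" screen.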
<br />
<h3 class='subtitle' id="special-case"><a href="#special-case" class="anchor-link"></a>Special Case</h3><br />
As with everything, there's always going to be at least one person who manages to lock themselves out of their account, with no access to their old email address and no memory of their security question response. In these cases it's best not to provide the user with another automated method for account recovery. Another automated method is yet another point of entry for an account, and another place you need to secure. <br />
<br />
In cases like this, it's often better to have a customer service email or phone number which the user can contact, and have a person resolve the issue once they're satisfied the user is who they say they are (by verifying other details on the account). In some cases that's just not going to be possible though, and their account will just have to remain inaccessible forever.<br />
<br />
<h3 class='subtitle' id="get-to-the-point!"><a href="#get-to-the-point!" class="anchor-link"></a>Get To The Point!</h3><br />
Basically my point is that if you're going to go down the security questions route, don't shoot yourself in the foot. I'm sick of signing up to sites that force me to choose from some security questions before I can proceed. It's the illusion of security, because it actually does nothing except annoy me, and make it easier for an attacker to gain access to my account since there are now two access points when I only want one.<br />
<br />
<strong>If at all possible, don't bother implementing security questions at all</strong>. A forgotten password page should get the user to verify some other information in their account and then get them to confirm their email before allowing them to choose a new password. This will always be better than security questions. But if you decide to implement them anyway, at the very least be sure to take note of the following,<br />
<br />
<ol><li>Don't force a certain set of questions on the user. Let them choose their own question.</li><li>Don't force a user to fill in the questions as mandatory. If the user doesn't want to, they shouldn't have to. Of course, feel free to ridicule the user if they then forget their password, but it was at least the user's choice.</li><li>Hash the answer, salt the hash, and strengthen it before storing.</li><li>Don't rely purely on a security question response. Confirm the user's email somehow and then send them straight to a page where they can set a new password. <strong>Avoid generating and sending a new password via email</strong>.</li><li>If the user doesn't have access to the email, or can't remember the question answer, don't use an automated process. Get another human involved to verify who the person is.</li></ol><br />
<br />
So there you have it. Brought on by my recent voyage into websites that force things on to me, let me know if you've had similar experiences, what methods you use for account recovery (if any) and whether you think anything I've said is wrong. If you have some novel method of account recovery, I'd love to hear about it!<br />
<br />
<strong>Update (January, 2010):</strong> Someone sent me the following image, which are the security questions for a banking website. Another good example of how not to do it.<br />
<br />
<div class='descriptive-image auto-size'><img src='https://wblinks.com/img/posts/security_questions/bank_fail.png' alt="Even banks fail at security questions." /><p class='blurb'></p></div><br />
Only allowing four different questions is a very bad idea. For starters, what if you didn't have a high school mascot? Also most of those questions could be answered by anyone who was a close friend (or enemy). This is not the way to implement security questions, especially not for a bank.]]></description>
            <author>xml-feeds@wblinks.com (Rich Adams)</author>
        </item>
            <item>
            <title><![CDATA[Password Rules Don't Always Help]]></title>
            <link>http://wblinks.com/notes/password-rules-dont-always-help</link>
            <guid>http://wblinks.com/notes/password-rules-dont-always-help</guid>
            <pubDate>Tue, 03 Nov 2009 00:00:00 +0000</pubDate>
            <description><![CDATA[A while ago, I <a href='https://wblinks.com/notes/storing-passwords-the-wrong-better-and-even-better-way'>wrote about</a> how users can't be trusted to come up with good passwords, and that it's up to us as programmers and web developers to hash the password (and salt it) so that bad password choices aren't immediately obvious to someone who gets hold of your stored data.<br />
<br />
Of course, if people still use dictionary words, a simple brute force attack will work. So while some password tips such as "Don't use a dictionary word" are quite legitimate, there are plenty of rules and restrictions that do nothing but infuriate users and make passwords less secure.<br />
<br />
There is absolutely no need for certain password rules which seem to be forced on people throughout the corporate world and on many websites. Here are a few which I find the most annoying, supposedly implemented to make people use passwords which are more "secure", but in reality doing just the opposite. Passwords become more predictable and <strong>expert users who create complex passwords get infuriated when they're forced to make them less complex</strong> in order to fit with the restrictions. I wonder how many of you have come across these before. <br />
<br />
<h3 class='subtitle' id="the-rotating-password"><a href="#the-rotating-password" class="anchor-link"></a>The Rotating Password</h3><br />
<blockquote><p>"You must change your password every x days"</p></blockquote><br />
Usually it's every 30 days, so that each month you have to have a new password. There are then crazy rules, like you can't use a password you've used the last 12 times. Sure, if someone discovers a user's password and waits a month to use it, it will no longer work... but that's pretty much the only case this protects against, and it does a lot more harm than good.<br />
<br />
If you force a user to change their password every 30 days, there's very little chance they are going to come up with a truly unique password each time. Instead, they'll tend to keep the same basic password and just add something extra in. You'd be shocked at just how many people use the name or number of the month as that extra thing.<br />
<br />
Since we now know a likely part of the password, it can be a bit easier to crack with brute force attacks. It could even mean that if you do discover a user's password, and it contains the number or name of the month, you'll always know that user's password for any month, making this measure completely pointless.<br />
<br />
To add further annoyance to this, a lot of system administrators feel it's necessary to remind users that they'll soon need to change their password. So 14 days before your password expires in Windows, for example, you'll get a popup saying "Your password expires in 14 days, do you want to change it now?" This has to be the most useless dialog I've ever come across. If my password expires in 14 days, why would I want to change it sooner? It makes no sense. Just tell me when it's expired and I'll change it. Making me select Yes or No every single time I log in just irritates me.<br />
<br />
<h3 class='subtitle' id="the-ridiculous-restriction"><a href="#the-ridiculous-restriction" class="anchor-link"></a>The Ridiculous Restriction</h3><br />
<blockquote><p>Your password must be at least 6-8 characters long, contain uppercase and lowercase letters, and have at least two digits, but not for the first two characters. Oh, and it shouldn't contain the name of a species of dog.</p></blockquote><br />
How many websites or applications have you come across with ridiculous password restrictions like this? I see it all the time. Well... maybe not the dog one. Yet it doesn't actually do anything to make the password more secure! Forcing users to use a certain pattern means that an attacker also knows the expected pattern, reducing their search space from "anything in the world" to "well, now I know there's at least two digits and it will be mixed case". <br />
<br />
I've also seen things like "You must choose a password of the form XxxxxxN" where X is an uppercase letter, x is a lowercase letter and N is a digit. Now, that has to be one of the worst. You've given an attacker everything they need to crack the passwords in no time at all using a simple brute force approach.<br />
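To put some rough numbers on that, here's my own back-of-the-envelope arithmetic, assuming roughly 95 printable ASCII characters for the unrestricted case:

```python
# Forced pattern XxxxxxN: one uppercase, five lowercase, one digit.
pattern_space = 26 * 26**5 * 10   # 3,089,157,760 possibilities
# Unrestricted 7-character password over ~95 printable ASCII characters.
free_space = 95**7                # 69,833,729,609,375 possibilities

# The forced pattern shrinks the brute-force search space by a factor of ~22,600.
print(free_space // pattern_space)
```

And that's before an attacker exploits the fact that the five lowercase letters are probably a dictionary word.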
<br />
Not only does it make attacks easier, but it's more frustrating for users. As I said in the previous post, most users will use the same password for everything. It's not their fault, it's just easier to remember one password than one hundred. If you suddenly enforce a policy that means users can't use that password, they'll have to come up with a new one, and they'll inevitably forget it, frustrating them even more the next time they log in. Some would argue this is a good thing, but I think if a user has a sufficiently complex password, they should be able to use it everywhere if they really want to (although if they really value security, they'll use a different one everywhere; that's a choice the user should make).<br />
<br />
So you're 0 for 2. You've made it easier for attackers and pissed off your users, well done!<br />
<br />
<h3 class='subtitle' id="the-giveaway"><a href="#the-giveaway" class="anchor-link"></a>The Giveaway</h3><br />
<blockquote><p>"You cannot use that password because another user already has it"</p></blockquote><br />
Yes, I have actually seen this in a live environment before. If you have someone on your team who insists on using this method, take them into a room and slap them. This is a pretty big mistake, since you've given away another user's password. Usernames are much easier to find out than passwords, especially if it's a public website. <br />
<br />
I'm really trying hard to imagine what was going through the designer's or programmer's head when they thought of using this technique. There is no need for a system to enforce unique passwords between users... none at all! If your database model uses a password as a primary key, you have failed as a developer.<br />
<br />
<h3 class='subtitle' id="the-normalizer"><a href="#the-normalizer" class="anchor-link"></a>The Normalizer</h3><br />
<blockquote><p>"Your password cannot contain special characters, only normal letters and numbers"</p></blockquote><br />
Why do you care what I use in my password? As long as it's a character understood by the computer, and as long as it will be hashed the same way each time, you shouldn't care what people use. Restricting the use of special characters is on par with dictating the exact format. It's usually done because of some limitation in the system: you can't store those characters in the database, they will mess up the admin area, the website doesn't support multibyte characters, etc. Which means either you're not sanitizing your inputs properly, aren't using prepared statements, and probably don't hash your passwords. Alarm bells should be ringing.<br />
<br />
If I want to use a percentage sign or a Japanese character in my password, I should be able to do so. It will mean my password is harder for anyone else to guess. I cannot stand it when I come up with a really complex password like <i>"l-.(mR{n&lt;Pg1$Yh=u'C?^Wp_$ANNn'4gE+?@VE"</i> and am told that my password can only contain letters and numbers. You're forcing me to choose a password that is much less secure than one I've chosen myself. That is unacceptable.<br />
<br />
<h3 class='subtitle' id="the-length-restriction"><a href="#the-length-restriction" class="anchor-link"></a>The Length Restriction</h3><br />
<blockquote><p>Your password cannot be longer than 16 chars.</p></blockquote><br />
Why not? The most common excuse for something like this is that the database can only store 16 chars in the field, or that a designer only wants the input box to be a certain size. <br />
<br />
Well, the designer's argument is nonsense, since they can size the box however they want without restricting the amount of information that can be input. The database excuse is also nonsense, since you should be hashing passwords, which will always be the same length after hashing anyway. So the password itself can be any length you want, yet you only have to store the 40 character hash output (or however many characters your hash produces).<br />
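You can see the fixed-length property for yourself with a quick Python illustration. I'm using plain SHA-1 here only because its 40-character hex output matches the figure above; for actual storage you'd want the salted, strengthened hashing described earlier:

```python
import hashlib

# Whatever the input length (or character set), the stored value
# is always 40 hex characters with SHA-1.
for password in ["short", "p@ss-with-%-and-日本語", "x" * 10_000]:
    print(len(hashlib.sha1(password.encode("utf-8")).hexdigest()))  # 40 every time
```

So the column in your database never needs to grow, no matter what the user types.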
<br />
If you must enforce an upper limit, make it at least 100 characters.<br />
<br />
If I want to type a novel as my password, then that should be my choice. Restricting password length just restricts the search space an attacker needs to use and infuriates users who actually pick long complicated passwords by making them pick something less secure.<br />
<br />
<h3 class='subtitle' id="so-what-are-the-lessons?"><a href="#so-what-are-the-lessons?" class="anchor-link"></a>So what are the lessons?</h3><br />
<br />
Enforce some complexity if you have to, but <strong>don't restrict complexity</strong>. I should make it clear that I'm not against <strong>all</strong> password restrictions. It can depend on the environment. If you're storing the password for a system which has personal data, then enforcing a certain amount of complexity in the password is probably a good idea. The point I'm trying to make, though, is that it's easy to go too far. There's a point when it just becomes a hindrance to users, and ends up creating patterns which make the system less secure. <br />
<br />
Try not to restrict user password choice. By heavily restricting what a user can enter as a password you're simply annoying your users and making it easier for an attacker to guess or brute force the password. By enforcing too many password restrictions you are making the system more predictable, and when it comes to passwords (and cryptography in general), predictability is a very bad thing.<br />
<br />
It comes down to some very simple points.<br />
<br />
<ol><li>Don't restrict what can go into a password, whether it's a max length or strange characters.</li><li>Don't force a user's password to follow a certain pattern.</li><li>Don't force a user to change their password every x days unless you really have to.<br />
</li><li>Store your passwords securely (strengthened, salted hashes).<br />
</li></ol><br />
<br />
If you follow these simple rules, your users will be happier, your database will always store the same length of data no matter what the user enters, and you can be safe in the knowledge that you're not giving away any information to make it easier for an attacker.<br />
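As a rough sketch of what those rules leave you with, a password check needs almost no code. The blacklist contents and the 8-character floor here are my own arbitrary choices for illustration, not a recommendation from any particular system:

```python
# Tiny stand-in for a real blacklist of common passwords and dictionary words.
COMMON_PASSWORDS = {"password", "123456", "qwerty", "letmein"}

def acceptable(password: str) -> bool:
    """Enforce some complexity without restricting it: any character,
    any length above a small floor, just not a well-known password."""
    return len(password) >= 8 and password.lower() not in COMMON_PASSWORDS
```

Note what's absent: no pattern rules, no banned characters, no maximum length.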
<br />
<strong>Update (December, 2009):</strong> The best method I've seen recently is simply a blacklist of common passwords and dictionary words (this is the <a href='http://news.softpedia.com/news/Twitter-Decides-to-Take-Password-Security-Seriously-130831.shtml'>method Twitter use</a> apparently). Users can pick a password of any length, with any characters they want and don't have to change it at any interval. They just can't use a common password or dictionary word found in the blacklist. Simple, effective and only reduces the search space by the common dictionary words which would've been easy to crack using brute force anyway.]]></description>
            <author>xml-feeds@wblinks.com (Rich Adams)</author>
        </item>
            <item>
            <title><![CDATA[text-transform - Content or Presentation?]]></title>
            <link>http://wblinks.com/notes/text-transform-content-or-presentation</link>
            <guid>http://wblinks.com/notes/text-transform-content-or-presentation</guid>
            <pubDate>Sat, 15 Aug 2009 00:00:00 +0000</pubDate>
            <description><![CDATA[HTML and CSS are all about separating the content of a site from the presentation. As with most things though, there are grey areas, and for a lot of people <em>text-transform</em> is one of them. Some people regard changing the case of text as being a content issue, others see it as a presentation issue.<br/><br/>Personally, I prefer to think of it as a presentation issue for one very good reason: to cover as many scenarios as possible. Suppose you have some mark-up where the text needs to be in uppercase, because that's how you want a menu to look. You could write the HTML like this,<br/><pre class='prettyprint'>&lt;ul&gt;
	&lt;li&gt;SOME OPTION&lt;/li&gt;
	&lt;li&gt;ANOTHER OPTION&lt;/li&gt;
&lt;/ul&gt;</pre><br/><br/>This is all well and good, and is perfectly reasonable. Now suppose though that the design needs to change, and these need to be in lowercase instead. Not a problem, since you can use text-transform to convert them to lowercase.<br/><pre class='prettyprint'>ul li {
	text-transform: lowercase;
}</pre><br/><br/>But now what if it needs to be capitalized instead? You can just use text-transform: capitalize; right? Unfortunately not. A little-known part of <em>"text-transform: capitalize"</em> is that it only changes the first letter of each word to be uppercase. It doesn't touch any other letters. This isn't a bug, it's specifically designed this way. But it could give you problems if you haven't thought ahead, since most people expect it to also transform the other letters to lowercase.<br/><br/>In the above example, you would never be able to capitalize the words. Yes, you could start with lowercase in the HTML rather than uppercase. Then you can use <em>text-transform</em> to make it uppercase, and capitalized. Except it would still be all in lowercase if someone were to view the site without a stylesheet.<br/><br/>My suggestion, then, is to capitalize things like this in the HTML. 
So taking my above example again, I would write it like this in the HTML,<br/><pre class='prettyprint'>&lt;ul&gt;
	&lt;li&gt;Some Option&lt;/li&gt;
	&lt;li&gt;Another Option&lt;/li&gt;
&lt;/ul&gt;</pre><br/><br/>This way you can have full control over the case by using CSS. You can either leave it as it is, make it uppercase, or lowercase. It also means it will be properly capitalized if someone were to view the site without a stylesheet. This way you're not dictating the style in the content.<br/><br/>As with anything though, there are of course exceptions. If a product or trademark name is in all uppercase, or uses different cases within a word, then this would actually be considered the content rather than the presentation. An example of this would be something like "JavaScript", where the J and S should always be uppercase. In those kinds of scenarios, I wouldn't imagine that the case is ever going to change when the design changes, since it's a trademark, brand name etc.<br/><br/>If you need another option, like every other character uppercase or something, then you'll need to do something server-side to change it. CSS2.1 is good, but not <strong>that</strong> good. You could come up with a JavaScript solution, <a href='https://wblinks.com/notes/javascript-is-good-but-should-not-be-relied-upon'>but JavaScript isn't always the answer</a>.<br/><br/>It's all a matter of analysing the situation and deciding the best way to do it. Could the design change in future, and if so, would it be possible to have the control needed using just CSS? A good rule of thumb is to think the following, "If I view the page without a stylesheet, does the text still accurately reflect what I need it to?". But remember not to be caught out by how <em>text-transform: capitalize;</em> works.<br/>]]></description>
            <author>xml-feeds@wblinks.com (Rich Adams)</author>
        </item>
            <item>
            <title><![CDATA[JavaScript is Good, But Should Not be Relied Upon]]></title>
            <link>http://wblinks.com/notes/javascript-is-good-but-should-not-be-relied-upon</link>
            <guid>http://wblinks.com/notes/javascript-is-good-but-should-not-be-relied-upon</guid>
            <pubDate>Sun, 19 Jul 2009 00:00:00 +0000</pubDate>
            <description><![CDATA[There was a time, years ago, when the only reason to use JavaScript on a website was to produce clich&#233; effects: flashing, scrolling, fading and popups to name but a few. It was slow, clunky and not a very nice language to write code in. Browsers required different code to do the same thing as another browser; the whole thing was a mess. People would overuse sites like <a href='http://dynamicdrive.com/'>dynamic drive</a> to achieve all sorts of pointless effects, falling snow, page transition effects, and who can forget the disabling of right mouse clicking by making a popup appear, which was about as effective as putting up a sign which says "Please don't push this button".<br/><br/>Recently however, JavaScript has lost its status as an annoyance and has become commonplace on lots of mainstream websites. It's picked up a certain bit of elegance and, if used correctly, can add a lot to the experience of a site. Browsers can now parse JavaScript at speeds which make it viable to use for visual effects, and it can be used to trigger events and to change parts of the page dynamically without having to refresh the entire page (AJAX for example).<br/><br/>With libraries like <a href='http://jquery.com/'>jQuery</a> and <a href='http://www.prototypejs.org/'>Prototype</a> it has become even easier, as they add a new layer between the browser and the programmer, meaning you don't need to know about all the little inconsistencies between the browser implementations of JavaScript. The library will hide these from you, so you can concentrate on writing the code and let the library deal with getting it to behave the same in every browser.<br/><br/>A combination of browser support, libraries, speed and ease of use means JavaScript is now much more attractive than it was 5 or 10 years ago. It no longer has the stigma associated with 1990s sites and has become a much more civilised solution to web development. 
All the big sites use it: Google has auto-completion when you type, Digg uses it to show extra comments, etc.<br/><br/>There's no doubt that JavaScript is incredibly useful and a great way to make the interactive experience of a website seamless. But you should never rely on JavaScript for a part of your website to function. <br/><br/><h3 class='subtitle' id="everyone-uses-javascript"><a href="#everyone-uses-javascript" class="anchor-link"></a>Everyone Uses JavaScript</h3><br/>Everyone has JavaScript enabled, right? Well, I'm afraid not. If you've made any part of your website accessible only to those with JavaScript enabled, then you've cut out 6% of the people who use the internet (<a href='http://www.thecounter.com/stats/2009/June/javas.php'>June 2009 figures</a> from <a href='http://www.thecounter.com'>thecounter</a>). That's a lot of people, no matter how you look at it. Yes, you can just put a disclaimer on the site saying "This only works with JavaScript". But that isn't going to make it any less annoying for those 6% of users.<br/><br/>You may be asking yourself, "Why would anyone have JavaScript disabled in this day and age?". Well, for any number of reasons. Companies sometimes disable JavaScript for security reasons, people may disable it because pages load faster without it, people could be using a text browser, or a mobile phone browser that doesn't support JavaScript.<br/><br/>If a visitor comes to your site without JavaScript enabled, yet your site relies heavily on JavaScript to function, then they'll leave immediately. Yet it is so simple to make things work without JavaScript.<br/><br/><h3 class='subtitle' id="progressive-enhancement-vs-graceful-degredation"><a href="#progressive-enhancement-vs-graceful-degredation" class="anchor-link"></a>Progressive Enhancement vs Graceful Degradation</h3><br/>There are two main programming philosophies when dealing with a new technology. 
Progressive Enhancement and Graceful Degradation. The latter means that you build the site using the new technology first. Make sure it works in all the latest browsers and that all of your fancy code runs well. Then you go back and add bits to make sure that if someone views it in an older browser, it will still work fine. Progressive Enhancement is the other way around. You make the main functionality of the site work without the new technology, then add it in afterwards so that it works for everyone.<br/><br/>Think of it like this. If you're building a skyscraper, how would you do it? You have two options.<br/><br/><ol><li>Build the foundations, making sure it's a solid base. Then add each floor one by one.</li><li>Build all of the floors right to the top, then go back and make sure the base is strong enough for people to walk around on the bottom floor.</li></ol><br/><br/>It would seem number 1 is the best choice, but for me, it depends on the scenario. For CSS I generally use Graceful Degradation; not from any pre-thought-out plan, it's just how I happen to do it. When I write a new style, I will be developing in just one browser (generally Firefox/Linux). Once I am happy with the style, I will validate it with the W3C validator service, since I will generally have made a few typos here and there. Once the CSS is valid, I'll check it out in other browsers, usually the latest versions of Opera, Chrome, IE and Firefox. It will generally look the same in all of those, with just perhaps a few minor tweaks here and there. Then I will try it out in the older versions: IE7, Firefox 2, etc. Based on how many inconsistencies there are, I may use a quick and nasty CSS hack to target just those browsers, or serve a new CSS file entirely (in the case of IE6/7). This is Graceful Degradation at work. I know it works in the later stuff; now I'm making it work in the older stuff without breaking it in the newer stuff.
<br/><br/>The problem with this method, is that sometimes you'll edit something to work in an older browser, but then find you've broken it in the newer browsers, so you start going back and forth with yourself trying to get it to work in both.<br/><br/>Progressive Enhancement is the other way around. You first make it work without any "extra stuff", then you add on the extra bits afterwards, for those who have the technology to view the extra bits. It means current people can still use the site, but those who are ahead in technology make use of the latest stuff. This is my preferred method, because it means you won't leave anything out, and will slowly build your site up from a base. It also means you generally aren't going to break something you've already coded, which saves time in the long run.<br/><br/>This is how you should add JavaScript to a site. First make the site work without JavaScript at all. Disable it in your browser and make the site function. Then go ahead and add all of the JavaScript goodness you want, safe in the knowledge that those 6% of people will still be able to use the site. Yeah, so it won't have lovely fade effects, or quick AJAX, but <strong>it will still work</strong>. <br/><br/>Progressive Enhancement also has an added bonus. It's actually generally a quicker and easier method than Graceful Degredation. It should be a no brainer!<br/><br/><h3 class='subtitle' id="you're-a-hypocrite!"><a href="#you're-a-hypocrite!" class="anchor-link"></a>You're a Hypocrite!</h3><br/>I recently (well, end of last year) made the decision that I will no longer build website for IE6. It is my strong opinion that IE6 is holding back web development. It takes nearly double the amount of time to make a site work in IE6 as well as all the other browsers out there. To quote a bit from Falling Down, this isn't economically viable! Yes OK, a lot of people still use IE6 (around 9%), but it won't go away unless we make a stand. 
<br/><br/>Unfortunately, it isn't going to go anywhere soon, because it was around for so long without any sort of competition, which means lots and lots of companies used it to build internal tools and sites, which now will only work in IE6. They would need to spend lots of money to upgrade their internal things to work in more modern browsers, so they will simply stick with what they have. Why spend money, when it works as it is.  There is also the fact that "Internet Explorer" was very cleverly named to have the word "Internet" in it. You'll find the majority of non technical people will not understand what a browser is, and simply think that "Internet Explorer" is in fact "The Internet".<br/><br/>You could say I'm being quite hypocritical. Here I am trying to argue that we should make the time to make our sites work for 6% of people who don't have JavaScript enabled, yet I'm advocating cutting off the 9% of IE6 users out there for exactly the same reasons.<br/><br/>Well, it's not that simple. There are some fundamental differences,<br/><br/><ol><li>To make a site work without JavaScript takes very little effort. Compared to the effort required to make a site work in IE6, which normally takes double the time as you have to build two sites. One to work in IE6, and one to work in everything else.</li><li>There is no excuse for someone to be using IE6 exclusively. It's heavily outdated, and I'm betting most people's kettle's are actually newer than IE6. JavaScript on the other hand has legitimate reasons to be turned off. Security, speed, text browser, mobile browser, accessibility.</li><li>Making your site work in IE6 only benefits people who use IE6. Making your site work without JavaScript has several other advantages. If you accidentally introduce a JavaScript bug or the JavaScript fails to load correctly, your site is resilient and won't be crippled, as it will still work without JavaScript. 
Search engines generally can't follow JavaScript links, so making them work without JavaScript means search engines will be able to index your site better, etc.</li><li>As new browsers come out, they may introduce inconsistencies in the way JavaScript works. Being able to fall back to a non-JavaScript solution means you don't have to immediately spend time and effort getting it to work in new browsers. Once you've made the site work in IE6 and put the effort in, that's it; the effort is never going to be useful for anything else.</li></ol><br/><br/>Maybe you still think I'm a hypocrite, and that's fine. Let me know your thoughts in the comments. I'm curious to know where people stand on this subject.<br/><br/><h3 class='subtitle' id="stop-waffling-and-get-to-the-good-stuff"><a href="#stop-waffling-and-get-to-the-good-stuff" class="anchor-link"></a>Stop Waffling and Get to The Good Stuff</h3><br/>OK, so now to get to the code. Let's imagine you have a shopping website that relies entirely on JavaScript. Your links might look something like this,<br/><pre class='prettyprint'>&lt;a href="javascript:addToBasket()"&gt;Buy Item&lt;/a&gt;</pre><br/><br/>That's great, it works for you and adds the item to your shopping cart. Now turn JavaScript off, and you're not going anywhere. There's a link on the page which doesn't work. A user is furiously clicking it, but it won't add anything to their cart. They go to another site instead, and you just lost a sale. Good luck if anyone using a text-based browser wants to buy anything, as it won't work for them either.<br/><br/>(This has actually happened to me in the past, not because I had JavaScript disabled, but because the code had a typo in it and wasn't loading correctly, so I couldn't purchase the item and went somewhere else instead.)<br/><br/>If you build a website, you should make it work for as many people as you possibly can. (Yes, even IE6 if you have the time and the money.) 
Imagine buying a car, only to find you can only use it on 94% of the roads. Great, until you hit one of those 6% of roads.<br/><br/>Here's a better way to make the link. This follows the Graceful Degradation process I talked about earlier. We had our original link, which works with JavaScript, and now we're editing it to work without JavaScript too.<br/><pre class='prettyprint'>&lt;a href="buyitem.php?item=awesomestuff" onclick="return addToBasket()"&gt;Buy Item&lt;/a&gt;</pre><br/><br/>(We'll ignore for the moment that it's a bad idea to use a GET parameter for something like this; that's a PHP issue and not in the scope of this post. I'll write about it sometime in the future.)<br/><br/>If you make the "addToBasket" function return false at the end, then if JavaScript is enabled, the link will not be followed and only the JavaScript will execute. (Note the "return" in the onclick attribute; without it, the false returned by the function never reaches the browser and the link would still be followed.) But if JavaScript is turned off, then the link is followed, and the user can still add things to their basket (with a script you've written; for this example I'm assuming the JavaScript would have called the same PHP script with AJAX anyway, it will just now redirect back to the same page). Everyone is happy and everyone can buy things. <br/><br/>But there's a better way. <br/><br/><h3 class='subtitle' id="the-better-way"><a href="#the-better-way" class="anchor-link"></a>The Better Way</h3><br/>Now there's another issue. If I don't have JavaScript enabled, I'm still downloading the "onclick='return addToBasket()'" bit of the code. It's not much, but over a whole page it could be a lot. It also makes the markup look untidy. Not a big issue, but I'm OCD about that sort of stuff. Also, as we saw earlier, there's a better way to do things. Lay the foundations of your skyscraper first.<br/><br/>If you've never heard the term "unobtrusive JavaScript" before, I suggest you do a quick Google search to see what it's all about. 
Basically, it's called unobtrusive because the markup stays the same and has no JavaScript calls in it at all. It is pure HTML, as it was meant to be: HTML for markup, CSS for presentation and JavaScript for behaviour. So rather than littering your HTML with "onclick=''" attributes, it is nice and clean.<br/><br/>Since the JavaScript only needs to work when JavaScript is enabled, you can use the JavaScript itself to modify the DOM once the page has loaded, in order to add the "onclick" event handlers for you. One caveat: the JavaScript version of a link won't be active until the page has fully loaded, so someone clicking before then will simply follow the plain HTML link instead, which still works. jQuery makes this sort of stuff extremely easy.<br/><br/>On page load, you modify your links to add the JavaScript handlers. If a person has JavaScript disabled, they don't even waste the bandwidth of downloading the JavaScript, and the HTML will be nice and clean, making the page load faster for them. But if they do have JavaScript enabled, the site will be modified to work as a JavaScript version.<br/><br/>Suppose we now create the link like this,<br/><pre class='prettyprint'>&lt;a href="buyitem.php?item=awesomestuff" id="buyitem-awesomestuff" class="buyitem"&gt;Buy This&lt;/a&gt;</pre><br/><br/>This is nice and clean, has an ID to uniquely identify it, has a CSS class so we can style it appropriately, and is free of JavaScript. This will work for everyone. Now we can add a JavaScript file to the page and "AJAXify" this link without ever needing to touch the HTML again. 
So an example using jQuery would be something like this,<br/><pre class='prettyprint'>$(document).ready(function() {
    $("a.buyitem").click(function() {
        var url = "buyitem.php?item=" + $(this).attr("id").substring(8);
        $.get(url,
              "",
              function(val) { alert("Item added to cart"); },
              "text"
        );
        return false;  // So we don't follow the normal link.
    });
});</pre><br/><br/>When the page loads, this will go over all links with the "buyitem" class, pick out the item string from the id, and add an "onclick" event handler to submit the AJAX request. We haven't touched the HTML at all, but just by adding JavaScript we've made it work with AJAX for those with JavaScript, and we're safe in the knowledge it will still work for those without.<br/><br/>Note: The example is just to demonstrate unobtrusive JavaScript. You will want a much better callback than the one above; it should check that the item was actually added, rather than just show an alert either way. You probably also want to parse the id to get everything after the "-" character, rather than just using substring(8), since the id format may change in future, etc.<br/><br/><h3 class='subtitle' id="final-thoughts"><a href="#final-thoughts" class="anchor-link"></a>Final Thoughts</h3><br/>Every so often I come across forums, or answers on the internet, which give JavaScript solutions to common problems (which is perfectly fine), but then they say something like "This will only work with JS enabled, but everyone has it on nowadays, so screw those who don't, they probably won't want to look at your site anyway". This is completely the wrong attitude. With a little bit of extra work you can make it work both ways and have a site that works for 100% of users, not just 94%. 
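As a quick aside, the note on the jQuery example suggested parsing the id rather than hard-coding substring(8). That could look something like this (plain JavaScript; the helper name is my own, not from the example above):

```javascript
// Hypothetical helper: pull the item key out of an id like "buyitem-awesomestuff".
// Taking everything after the first "-" keeps working even if the "buyitem"
// prefix is later renamed to something of a different length.
function itemFromId(id) {
    var dash = id.indexOf("-");
    return dash === -1 ? id : id.substring(dash + 1);
}
```

In the jQuery handler you'd then build the URL with `"buyitem.php?item=" + itemFromId($(this).attr("id"))` instead of the substring call.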
People argue that because it's 2009, everyone should have JavaScript enabled. While perhaps they <strong>should</strong>, it doesn't necessarily mean that they <strong>have to</strong>. There are legitimate reasons to have it disabled, and those people shouldn't be excluded.<br/><br/>Another common issue people tend to solve with JavaScript is input validation. If your input validation is just a JavaScript file, then you have failed as a web developer. All I need to do to bypass your checks is disable JavaScript. It's the equivalent of using JavaScript to check a login/password (something I have actually seen on a production site before). By all means use JavaScript for input validation; it means the user can see something is wrong without having to submit the form, which is a great interface response. But don't rely on it. Always, always, always make sure you also have server-side validation for the same things, otherwise you're just asking for a world of trouble. It may feel like you're just repeating code, but it's worth it in the long run.<br/><br/>So I guess my point is this. Never rely on JavaScript alone to do a job. Form validation, preventing double submits, links, etc. Always make sure they also work without JavaScript, otherwise you've either opened your site up to abuse, or made it unusable to 6% of the people who use the internet. Either way it's not very good.<br/>]]></description>
            <author>xml-feeds@wblinks.com (Rich Adams)</author>
        </item>
            <item>
            <title><![CDATA[Storing Passwords - The Wrong, Better and Even Better Way]]></title>
            <link>http://wblinks.com/notes/storing-passwords-the-wrong-better-and-even-better-way</link>
            <guid>http://wblinks.com/notes/storing-passwords-the-wrong-better-and-even-better-way</guid>
            <pubDate>Sun, 21 Jun 2009 00:00:00 +0000</pubDate>
<description><![CDATA[If you've ever had to sign up to use a website, you'll no doubt have been prompted to provide a username and password, so that when you next visit the site you can log in without having to fill in all of your details again. Your password has to be stored somewhere, otherwise you won't be able to log in the next time you visit. Right? Unfortunately, a few sites I've come across do just this: they store your password itself, which means if the information is stolen, someone has your password.<br/><br/>In a perfect world, everyone would use a different password for every account they sign up for, and that password would be a combination of numbers, letters (uppercase and lowercase) and special characters, and would be at least 20 characters long. But let's be honest here, we're not all memory machines, and remembering cryptic combinations like that isn't something everyone can do. So people are tempted to choose just one password and use it on everything they sign up for, including their email account. Which means there's the potential for someone to have both the email address and the password. Not a very good thing.<br/><br/>There's no way around this; the weakest point in any security system is the human element. People are always going to choose easy passwords, or use the same password for multiple sites. <strong>No amount of security can make up for users picking a bad password to begin with</strong>, but we can still protect those who use the same password for everything. So it's up to us as web developers to help keep their passwords secure, so that these people never have to go through the problems associated with someone getting into their other accounts.<br/><br/>This all comes down to how you store the password. Do you do it the "Wrong Way", the "Better Way" or the "Even Better Way"? (I'm not going to say the "Right Way", because I don't think there is such a thing when it comes to password security.) 
<br/><br/><h3 class='subtitle' id="the-wrong-way"><a href="#the-wrong-way" class="anchor-link"></a>The Wrong Way</h3><br/>I was shocked to recently discover a website which stored their passwords in plain text. I couldn't quite grasp how in modern web development anyone could store their passwords like that, but it still happens. Storing passwords as plain text is a very bad thing. It means that if I sign up for a website using the password of "iamawesome", then the value that's stored in their database is also "iamawesome".  This means any employees with access to the database (and there will always be at least one), can see your password. They can also probably see your email address associated to the account. So if you're one of those people that uses the same password, there's nothing to stop that person going and getting into your accounts. I like to think the majority of people out there in this position are honest and would never use that information for such nefarious means, but by the same token, I'm willing to bet there are people out there who aren't so honest.<br/><br/>So how can you tell if a website stores your passwords securely? The simple answer is that you can't. But there are a few tell tale signs that they're not doing all they can. Many websites have a "Forgotten password" function. If you go through this procedure and you get an email with your original password in, then it is highly likely they're storing you password in plain text. (Although this isn't always the case, they could be using reversible encryption, but again.. the developers will now how to reverse the encryption, so while it may stop anyone who steals the database, it wouldn't stop dishonest employees).<br/><br/>Personally, I would stop using such a service straight away (and in most cases I will email the relevant department to warn them that they're storing things insecurely). There is no need for a website to store that information in plain text. None at all. 
It puts your information at risk. There's a much better way to store passwords.<br/><br/><h3 class='subtitle' id="the-better-way"><a href="#the-better-way" class="anchor-link"></a>The Better Way</h3><br/>In order to log you in, there's no need for the site to actually know what your password is; it just needs to be able to tell that you've entered the same password as when you registered. <br/><br/>You can use something called a "hash" to do this. A hash is a one-way mathematical function which, given an input A, will always produce the same output B, but where it's very difficult to get from B back to A (though not impossible; you could use a <a href='http://en.wikipedia.org/wiki/Rainbow_table'>rainbow table</a> to look up A based on the hash B).<br/><br/>Note: It is possible for more than one input value to result in the same hash. This is called a <a href='http://en.wikipedia.org/wiki/Hash_collision'>hash collision</a>. It's pretty unlikely to hit a hash collision unless you're using very large datasets, but it is a possibility. I've never come across it myself, although it does happen. The danger with a hash collision is that someone doesn't need to know the original password; if they can find something which hashes to the same result, then they can just use that.<br/><br/>Different algorithms hash things in different ways, and some are better than others. MD5 is a common one used in lots of tutorials, but it has serious issues with collisions and is <a href='http://valerieaurora.org/hash.html'>universally considered to be cryptographically broken</a>. SHA1 also has similar issues with collisions. The bottom line is <strong>you shouldn't use MD5 or SHA1</strong>. Others, such as the SHA-2 family of algorithms, currently have no known collision issues. I'm going to use SHA-256 in these examples, but you could use any number of other hashing algorithms. 
There is still a problem with using these algorithms for passwords (they're too fast), but I'll get to that later.<br/><br/>PHP 5 has the built-in <em>hash</em> function, which I will use to hash with SHA-256,<br/><pre class='prettyprint'>$password_hash = hash("sha256", "iamawesome");
// 4aa4029d0d0265c566c934de4f5e0a36496c59c54b6df8a72d9c52bdf0c1a0e8</pre><br/><br/>The idea behind using a hash is that you store this hash in your database, rather than the plain text password. To verify a user has entered the correct password when logging in, you apply the same hash function to whatever text they enter, then compare the result to what you have stored in the database. If they match, then the correct password was entered.<br/><pre class='prettyprint'>$user_entered = hash("sha256", $_POST['password']);
return ($user_entered == $password_from_db);</pre><br/><br/>This way, only the user ever knows the real password. If someone were to look at your database of stored hashes (whether it's a dishonest employee, or because it was stolen), they'll only ever see the hash, and won't be able to go around getting into people's email accounts.<br/><br/>So you might be thinking that's problem solved; let's just do this hashing stuff and we'll be secure. Well, no. There are still some problems with this method. Suppose two people have the same password; this means they will have the same hash in the database. Now suppose you manage to trick one of these people into giving you their real password (by whatever means: an email scam, etc.). You would now have the password for anyone with the same hash.<br/><br/>One of the most basic password attacks is called a dictionary attack. 
At the most basic level this would involve trying every word in the English dictionary as the password to log in as someone, but more commonly the dictionary (a list of strings, not a language dictionary) contains every combination of characters up to a certain length. This will generally work, since it's quite common for people to use a dictionary word or a common sequence of characters as their password. A dictionary attack is still possible when using hashes. You can generate a list of hashes from a dictionary and compare it to the hashes in the database; you'll probably find a few matches, and then you have the password for those people. The list of hashes is called a <a href='http://en.wikipedia.org/wiki/Rainbow_table'>rainbow table</a>, basically a massive lookup table of hashes and the corresponding inputs.<br/><br/>So really all we've done is obfuscate the password from people viewing it directly; certain attacks are still possible. There is still a better way. <br/><br/><h3 class='subtitle' id="the-even-better-way"><a href="#the-even-better-way" class="anchor-link"></a>The Even Better Way</h3><br/>The way around these problems is to use something called a "salted hash". The <a href='http://en.wikipedia.org/wiki/Salt_(cryptography)'>definition of a salt</a> is "random bits that are used as an input to a key derivation function", basically another word for a <a href='http://en.wikipedia.org/wiki/Cryptographic_nonce'>nonce</a>. Normally to create your hash you provide one thing as input (the original password) and you get the hash as an output. A salt/nonce is a random string of characters you use as an additional input to the hash function in order to get the output.<br/><br/>So now when storing a user's password, instead of just hashing the password, you concatenate the password and the salt/nonce and hash that instead. In PHP it would look something like this,<br/><pre class='prettyprint'>$salted_hash = hash("sha256", $password . 
$random_salt);</pre><br/><br/>The salt should be a string of random characters, ideally it should be long (more than 20 characters) and not just alphanumeric, it should have special characters too.<br/><br/>You can make your function however you want, as long as the random salt and password are both used in order to construct the hash you want to store. The method you use doesn't change the effectiveness of your password storage, one is not really any more secure than the other. Relying on the design of how you hash the password and salt together to provide security is called <a href='http://en.wikipedia.org/wiki/Security_through_obscurity'>security through obscurity</a> and should be avoided, since in reality the method you use doesn't affect the security.<br/><pre class='prettyprint'>$salted_hash = hash(hash($password) . $random_salt);$salted_hash = hash(hash($password) . hash($random_salt));// ...etc</pre><br/><br/>In the database, it's also important to store the salt for each user as well as the completed salted hash, otherwise you won't be able to tell if the user has entered the correct password. <strong>Each user should have their own random salt</strong>, you shouldn't use the same one for the entire database or for multiple users, otherwise you've completely negated the point of using a salt in the first place. You should however also use a site-wide salt (stored on the filesystem) <strong>in addition</strong> to a per-user salt (this is sometimes called a pepper), the idea being if your database is compromised then an attacker will only have the user salt, not the application-wide one.<br/><pre class='prettyprint'>$salted_hash = hash($random_salt . $sitewide_salt . 
$password);</pre><br/><br/>You're database table should look something like this,<br/><br/><table><tr><th>username</th><th>password_hash</th><th>password_salt</th></tr><tr><td>rich</td><td>2bae773debd80de...</td><td>?hb-:4a-loDC90^n#=R...</td></tr><tr><td>bob</td><td>d82ff2c12d5065f...</td><td>2g}iT'JG&gt;&lt;,?wP6{#VG...</td></tr></table><br/><br/>So now, this means the hash you store will be different for every user in your database, even if they have the same password. So if anyone wants to do a dictionary attack by precomputing a rainbow table, they have to precompute it for each user individually, rather than the entire database at once, making it much more difficult and often infeasible.<br/><br/>(Note: You can never make it impossible to crack someone's password, there will always be a way. You can just make it very very difficult)<br/><br/>But now you have another problem, how do you implement the "I've forgotten my password" functionality if you can't tell the user their password? Rather then just retrieving the user's original password (which you can't do when using salted hashes), you instead want to verify the user by getting them to confirm some other information on their account, then send them a email with a confirmation URL/number in it. When they click this URL (or enter the number) they should then be taken to a page where they can set a new password for their account. You can now feel safe that the person setting the new password knows information about the account, and has access to the email address. You should <strong>never email a new password to your users</strong> as then if their email account is compromised, so is their account. Most mainstream websites nowadays will do this, and this is usually a good sign that they're storing your password properly.<br/><br/><h3 class='subtitle' id="slow-it-down"><a href="#slow-it-down" class="anchor-link"></a>Slow It Down</h3><br/>But wait, there's still more to do! 
While in most algorithms you aim to make things as fast as possible, in password hashing you want the opposite. The traditional hashing algorithms like MD5, SHA-256, SHA-512, etc. all have a serious problem when it comes to password storage: they're too fast. Now that you have salted passwords stored, what if someone were to try and crack them?<br/><br/>While a rainbow table is a precomputed table of all the hashes corresponding to a set of plaintext passwords, another method of cracking involves feeding every possible combination of characters into your hashing algorithm rather than precomputing the results. These are called incremental password crackers. Basically, rather than using space to attack the passwords (a massive rainbow table lookup), they use time (running a dictionary through your hashing function until a result is found). With services like Amazon EC2, you can have massive amounts of computing power for very little cost to do all the hard work for you.<br/><br/>If your password hashing function is very fast, then the incremental method will work faster and the password can be cracked quickly. If it takes 0.00001 seconds for your hash function to return, someone can try 100,000 passwords a second until they find the password. If it takes 1 second for your hash function to spit out the result, it's not a big deal as far as someone logging into your application is concerned, but for cracking the password it's a very big deal, since each attempt will now take 1 second to get a result, meaning it would take 100,000 times as long to find the password as it would using your original hash function.<br/><br/>So how do you slow it down? Either use a hashing algorithm specifically designed to be slow (like <a href='http://en.wikipedia.org/wiki/Bcrypt'>bcrypt</a>), or use a standard hash function lots of times. 
This is called <a href='http://en.wikipedia.org/wiki/Key_strengthening'>key strengthening</a> (or sometimes key stretching), and is just the idea of running the hash function through thousands of iterations.<br/><br/>So now your password hashing method becomes this,<br/><pre class='prettyprint'>$iterations = 100000;
$salted_hash = hash($random_salt . $sitewide_salt . $password);
for ($i = 0; $i &lt; $iterations; $i++)
{
    $salted_hash = hash($random_salt . $sitewide_salt . $salted_hash);
}</pre><br/><br/>You should also store the number of iterations somewhere in your database, since you will want to increase the number of iterations in the future as you get faster hardware. In that case, not storing it would mean older users would be unable to log in. Your database table should now look like this,<br/><br/><table><tr><th>username</th><th>password_hash</th><th>password_salt</th><th>hash_iterations</th></tr><tr><td>rich</td><td>2bae773debd80de...</td><td>?hb-:4a-loDC90^n#=R...</td><td>100000</td></tr><tr><td>bob</td><td>d82ff2c12d5065f...</td><td>2g}iT'JG&gt;&lt;,?wP6{#VG...</td><td>150000</td></tr></table><br/><br/>Some people like to store these all in one field, with some pre-defined separator. That's fine too; it doesn't really make much difference other than how you write the code to extract those values. Just make sure the separator you use can't appear in the hash or salt, or you'll run into issues when extracting the values. Remember not to store the sitewide salt in the database; it should be stored on the filesystem with your application.<br/><br/><table><tr><th>username</th><th>password</th></tr><tr><td>rich</td><td>$100000$2bae773debd80de...$?hb-:4a-loDC90^n#=R...$</td></tr><tr><td>bob</td><td>$150000$d82ff2c12d5065f...$2g}iT'JG&gt;&lt;,?wP6{#VG...$</td></tr></table><br/><br/>You can also take this a step further to future-proof it by including an identifier for the algorithm you're using. 
If you're using sha256 to hash passwords right now, in a few years you may want to use something better (there could be a sha1024, for example). If you include something that tells you which algorithm was used to hash each password, then you'll be able to use newer or more appropriate algorithms without having to change your code other than to add the new algorithm (and you'll still be able to check passwords that were hashed using the older method). Supposing sha256 had an identifier of 5 (which it does in <a href='http://php.net/manual/en/function.crypt.php'>crypt</a>), then your database would now look like this,<br/><br/><table><tr><th>username</th><th>password</th></tr><tr><td>rich</td><td>$5$100000$2bae773debd80de...$?hb-:4a-loDC90^n#=R...$</td></tr><tr><td>bob</td><td>$5$150000$d82ff2c12d5065f...$2g}iT'JG&gt;&lt;,?wP6{#VG...$</td></tr></table><br/><br/>Hashed, salted, strengthened and future-proofed. If a better hashing algorithm becomes available, you can implement it without affecting current users. Likewise, if you get new hardware, you can increase the number of iterations without affecting current users. If you have dishonest employees, they won't be able to get the passwords, since they're hashed. If someone were to steal your database, they'd either need to compute a rainbow table for every single user in the database, taking a very long time, or take 1 second per password to try a dictionary in an incremental password cracker, also taking a very long time. Either way it's going to be difficult to crack. But not impossible; it never is!<br/><br/><h3 class='subtitle' id="the-even-better-than-the-even-better-way"><a href="#the-even-better-than-the-even-better-way" class="anchor-link"></a>The Even Better Than The Even Better Way</h3><br/>Security is hard, and <strong>I'm not a security expert</strong>. 
Everything I've written about above is just how I understand it right now. There could be important things I've missed, or even worse, things I've misunderstood and presented incorrectly (please do let me know if that's the case). Unless you understand the intricacies involved in cryptographic algorithms and cryptography in general, you <strong>shouldn't try to implement a security system yourself</strong>. There are plenty of great libraries out there built by real security experts, which have been tried and tested in the real world. A simple mistake somewhere in a custom-built system can go unnoticed until it's too late.<br/><br/>Nothing I've written about here is new; in fact most of it dates back to <a href='http://www.usenix.org/events/usenix99/provos/provos_html/node9.html'>1976</a>. If any of this page is news to you, then it just proves my point that you shouldn't custom-build a security system.<br/><br/>I'll say this once more, because it's very important: <strong>use a well-established library that has been tried and tested in the field, and written by real security experts</strong>. There are plenty of options out there, <a href='http://en.wikipedia.org/wiki/Bcrypt'>bcrypt</a> and <a href='http://www.tarsnap.com/scrypt.html'>scrypt</a> being popular choices. PHP has the built-in <a href='http://php.net/manual/en/function.crypt.php'>crypt function</a>, and there are many frameworks such as <a href='http://www.openwall.com/phpass/'>phpass</a> which handle everything for you. All of these incorporate everything I've talked about above.<br/><br/>But if you like to ignore good advice and decide to implement a password storage system yourself (seriously, don't do it!), at the very least remember to store your passwords the even better way, using salted, strengthened hashes!<br/>]]></description>
            <author>xml-feeds@wblinks.com (Rich Adams)</author>
        </item>
            <item>
            <title><![CDATA[Always Include a Print Stylesheet]]></title>
            <link>http://wblinks.com/notes/always-include-a-print-stylesheet</link>
            <guid>http://wblinks.com/notes/always-include-a-print-stylesheet</guid>
            <pubDate>Wed, 29 Apr 2009 00:00:00 +0000</pubDate>
<description><![CDATA[A print stylesheet is a stylesheet that's only applied to your website when the user goes to print it. A lot of sites will provide a "print version" of their page, which is a page with the same content as the original, but with different markup and layout to make it look better when printed. But creating a separate "print version" of your page isn't necessary; you can simply use a print stylesheet to do all of the work.<br/><br/>Most websites have a lot of content which is very useful when browsing the site, but makes no sense being printed out on paper. Navigation menus, for example, are great for getting around a site, but are a waste of ink and paper if they get printed every time a user prints a page from your site. You also don't really want background colours to be printed; if your site has a dark background it's going to waste a lot of ink.<br/><br/>The internet and paper are two completely different types of media, and your site should adjust so that it's best suited for the medium it's being presented on. While small fonts may look fine on a screen, they may be very difficult to read when printed. Banner images may make your website stand out from the rest, but they're just going to waste paper if they get printed. Now, I could go down the green route here and say that having a print stylesheet helps to save the environment. Printing less means using less ink and less paper, and not wasting paper is good for the environment. For most people this would be more than enough to encourage them to use a print stylesheet. But for the more business-minded, print stylesheets take time to develop, and is it really worth the extra cost? Well, the answer is yes. 
<br/><br/>How many times have you gone to print a website, whether it's a flight itinerary or just a product description page, and you've had pointless images printed, the text goes over the edge of the page, and 5 pages or printed material comes out of the printer when you only really need the first page since the other 4 are just banner images? I've lost count of the number of times this has happened to me, and it's extremely frustrating. So much so in fact that I generally resort to just cut and pasting the text I want into a new document and printing it myself. This is a mild annoyance for me, but not the end of the world. But it means the company behind the site doesn't have their logo on my printed copy, and that's a missed opportunity. Plus a less frustrated customer is always an advantage right?<br/><br/>A print stylesheet takes minutes to build, but can make a world of difference for your users. For me, it's not so much the reason that I want my users to have a better experience (although I admit that should be driving force), it's that I want my designs to look as good on paper as it does on a screen. I want my sites to look great, no matter how a user chooses to view it.<br/><br/>So now that you're convinced that adding a print stylesheet is a good thing (or maybe you're not convinced, I'm still going to assume you are though, since otherwise it makes the next bit of this rather pointless) you may be wondering what a typical print stylesheet looks like, and how to include on on your page. Well, that's what this next part is all about.<br/><br/>Including a print stylesheet on your site is pretty straight forward. You add the CSS file like you would any other, but you make sure to select the media type of "print". That way the styles are only applied when the user goes to print the site, rather than all the time.
<br/><pre class='prettyprint'>&lt;link rel="stylesheet" type="text/css" media="print" href="print.css"/&gt;</pre><br/><br/>Generally you will want to hide some elements that don't need to be printed such as sidebars, headers, menus, etc. This can be done by specifying "display: none". I've included the "!important" declaration just to be sure it gets applied and isn't overridden by anything else.<br/><pre class='prettyprint'>#header,#menu,#additional-content   {    display: none !important;}</pre><br/><br/>Another useful thing you can do with print stylesheets is to show the URL of any link after the link text. When viewing a website this isn't really necessary, since you can click the link. But when printed you have no way of knowing the URL for a link. I'm normally not a fan of using CSS to include content, but in this case it's very useful, so I like it.<br/><pre class='prettyprint'>a[href]:after {    content: " (" attr(href) ") ";}</pre><br/><br/>You can use the print stylesheet to add page breaks too. If you have a list of blog posts for example, and you want them to print one to a page, then you can use the following bit of code,<br/><pre class='prettyprint'>.blog-post {    page-break-before: always;}</pre><br/><br/>Does anyone else have any tips and tricks for things to include in print stylesheets? Does anyone prefer to create a "print version" of their page rather than using a print stylesheet, and if so, why? Comments on a postcard please, or you could just comment to this post.. your choice :)<br/>]]></description>
            <author>xml-feeds@wblinks.com (Rich Adams)</author>
        </item>
            <item>
            <title><![CDATA[Separation of Content and Presentation with HTML and CSS]]></title>
            <link>http://wblinks.com/notes/separation-of-content-and-presentation</link>
            <guid>http://wblinks.com/notes/separation-of-content-and-presentation</guid>
            <pubDate>Thu, 07 Aug 2008 00:00:00 +0000</pubDate>
<description><![CDATA[Almost every day I come across sites which have missed this vital point entirely. They think that just by adding CSS to a site, it's suddenly much better. This is not the case, and I admit I've been in this category myself in the past.<br />
<br />
The whole concept of HTML and CSS is to separate the content of a site, from the presentation. To layer it. This means that there should be absolutely NO content in the CSS, and absolutely NO presentation in the HTML. <br />
<br />
<h3 class='subtitle' id="using-presentational-names-for-id's"><a href="#using-presentational-names-for-id's" class="anchor-link"></a>Using presentational names for IDs</h3><br />
Over and over I'm seeing sites with things like <span class='inline'>id="left"</span> in the code. What happens when the design changes? "left" gives no semantic meaning to the element at all; it's presentational, and should not be used. A better way is to name the element something relevant to the content it contains, such as <span class='inline'>id="home-page-quote"</span>. Now when the design changes, the HTML still semantically represents the content.<br />
<br />
<h3 class='subtitle' id="using-spacer.gif-files"><a href="#using-spacer.gif-files" class="anchor-link"></a>Using spacer.gif files</h3><br />
I'm also seeing the use of spacer.gif files. Back in the days of table-based layouts, people would use spacer.gif files (a small transparent image) to space out the content. Not only is this very bad practice because it will mess up the layout for people who use mobile devices, but also because when the page is viewed without style (such as by screen readers) there will be images all over the place that lend absolutely nothing to the content. If you use CSS, there is absolutely no need for a spacer.gif file; you can do everything you need in CSS, with no exceptions! Padding and margins can space out the content, and you can use letter-spacing, text-indent or line-height to space out text.<br />
<br />
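For instance, the gap a spacer.gif once provided can come straight from the stylesheet. A quick sketch (the selectors here are made up for illustration):

```css
/* Instead of <img src="spacer.gif" height="20">, give the element a margin. */
.intro-paragraph {
    margin-bottom: 20px;
}

/* Spaced-out text without padding characters or images. */
.site-tagline {
    letter-spacing: 0.2em;
}
```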
<h3 class='subtitle' id="clearing"><a href="#clearing" class="anchor-link"></a>Clearing</h3><br />
Another thing I've noticed is the use of a <span class='inline'>div</span> which has <span class='inline'>class="clear"</span>, yet has no content whatsoever. Again, such divs add nothing to the content of the page; they don't make semantic sense. This is presentational, as you want to clear a float. But again, what if the design changes? The markup will need to be changed too. It is possible to use CSS to make self-clearing floats with minimal effort, and with valid, standards-compliant CSS. A quick <a href='http://www.google.co.uk/search?q=self-clearing+float&amp;ie=utf-8&amp;oe=utf-8&amp;aq=t&amp;rls=org.mozilla:en-GB:official&amp;client=firefox-a'>Google search</a> will reveal many different pages discussing the matter of self-clearing floats.<br />
<br />
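One common pattern for this is the clearfix, sketched below with an illustrative class name (note that very old versions of Internet Explorer needed extra rules on top of this):

```css
/* Self-clearing float container: generated content does the clearing,
   so no empty <div class="clear"> is needed in the markup. */
.group:after {
    content: "";
    display: block;
    clear: both;
}
```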
<h3 class='subtitle' id="page-ordering"><a href="#page-ordering" class="anchor-link"></a>Page Ordering</h3><br />
When viewing HTML without the CSS applied, it should be in a suitable order so that it reads well, and only images which are part of the content should be shown (logos are included here, as they're part of the page content). If the page is in an odd order, it will be confusing for people using screen readers, and if random style images are showing, they detract from the content. You should be able to completely change the layout of a page by only editing the CSS. If you have to edit the markup, then you've done something wrong and need to re-think the code. This includes extra markup used only for presentational purposes. You shouldn't have a div called "coloured-line", for example, as this is specific to the current design. Yes, you can hide it in later designs with <span class='inline'>display:none</span>, but ideally you want to make the coloured line just part of the CSS and give no reference to it in the markup.<br />
<br />
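As a sketch of that last point (selector and colour are purely illustrative), the coloured line can live entirely in the CSS, attached to an element that's already part of the content:

```css
/* A decorative line as a border on an existing content element,
   rather than an extra presentational <div> in the markup. */
#main-content {
    border-top: 3px solid #c00;
}
```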
Nobody's perfect, and I've done many of the things I've mentioned here in the past myself (and a lot more recently I've found myself naming classes "left" without realising until later). The whole point is that we find new and better ways to do things, and constantly strive for perfect code. Hopefully this will help some people to use HTML/CSS the way they were meant to be used.]]></description>
            <author>xml-feeds@wblinks.com (Rich Adams)</author>
        </item>
        </channel>
</rss>
