<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom">

 <title>Web Species blog</title>
 
 <link type="text/html" rel="alternate" href="http://blog.webspecies.co.uk/" />
 <updated>2013-03-11T11:17:48-07:00</updated>
 <id>http://blog.webspecies.co.uk/</id>
 <author>
   <name>Web Species Ltd</name>
 </author>

 
 <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/WebSpeciesBlog" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="webspeciesblog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry>
   <title>One month later... Azure+ is dead</title>
   <link href="http://blog.webspecies.co.uk/2011-11-17/one-month-later-azure-plus-is-dead.html" />
   <updated>2011-11-17T17:27:00-08:00</updated>
   <id>http://blog.webspecies.co.uk/2011-11-17/one-month-later-azure-plus-is-dead</id>
   <content type="html">&lt;div class='alignright'&gt;&lt;img src='/media/fail.jpg' alt='Fail' /&gt;&lt;/div&gt;
&lt;p&gt;What a ride this was&amp;#8230; Just a bit more than a month ago I posted an &lt;a href='http://blog.webspecies.co.uk/2011-10-03/we-built-a-cloud-platform-for-php-wait-what.html'&gt;article&lt;/a&gt; on the project we were secretly working on - &lt;a href='http://cloud.webspecies.co.uk/'&gt;Azure+&lt;/a&gt;, the PHP cloud platform built on top of Windows Azure infrastructure. Since then functionality was improved, amazing additions were planned, I travelled tens of thousands of miles and showed it to hundreds of people. And now I can announce that it&amp;#8217;s dead. Here is what happened.&lt;/p&gt;
&lt;!--more--&gt;
&lt;p&gt;If you want to know the &amp;#8220;Why?&amp;#8221; part you can skip the first few sections altogether and just go straight to the sad part. However, this case is a bit complicated&amp;#8230; Well ok, it&amp;#8217;s very complicated. And I can&amp;#8217;t talk about the main reason at all, so the only thing you will find is an &amp;#8220;abstraction&amp;#8221; of what happened. Something that I have no control over.&lt;/p&gt;

&lt;h3 id='the_launch'&gt;The launch&lt;/h3&gt;
&lt;div class='alignleft'&gt;&lt;img class='noborder' src='/media/explode.gif' alt='Launch' /&gt;&lt;/div&gt;
&lt;p&gt;The blog post about the project has to be the most popular post on this blog; it received well over 10&amp;#8217;000 unique visitors in a day. I think this is quite a big number, which was mainly fueled by social networks as everyone wanted to see what can be done with Windows and PHP. That&amp;#8217;s why it had the &amp;#8220;Wait&amp;#8230; what?&amp;#8221; at the end of the title - I assumed a lot of people would question the value of this kind of project.&lt;/p&gt;

&lt;p&gt;And &lt;em&gt;oh my god&lt;/em&gt; there were a lot of people who did. Microsoft guys were internet-high-five&amp;#8217;ing us all the way, but the majority of everyone else was either still questioning the point of this or plainly calling it stupid. They had their reasons though - Windows and PHP are not really a welcome topic, mainly because of prejudice (it works just fine), even though a &lt;a href='http://en.wikipedia.org/wiki/Platform_as_a_service'&gt;PaaS&lt;/a&gt; shouldn&amp;#8217;t &amp;#8220;expose&amp;#8221; the OS it is running on (as much as that&amp;#8217;s possible). Also similar projects like &lt;a href='http://orchestra.io/'&gt;Orchestra.io&lt;/a&gt; or &lt;a href='http://phpfog.com/'&gt;PhpFog&lt;/a&gt; had taken off.&lt;/p&gt;

&lt;p&gt;Ignoring the arguments online about whether it was a good project, the prototype that we launched worked flawlessly. One of the biggest demos of it was in my &lt;a href='http://www.slideshare.net/juokaz/php-in-the-cloud-php-barcelona'&gt;PHP in the Cloud&lt;/a&gt; keynote at the PHP Barcelona conference. Deploying an app on the stage with 500 people watching it&amp;#8230; crazy, but it worked. Not once did it fail, even after showing it hundreds of times to all sorts of different people.&lt;/p&gt;

&lt;h3 id='what_people_are_looking_for'&gt;What are people looking for?&lt;/h3&gt;

&lt;p&gt;You might not realize this, but the PHP ecosystem is a bit different from that of any other language (as they are different from each other). PHP developers usually start their careers by just hacking on some code locally, because it only takes minutes to set up a PHP environment. And if something doesn&amp;#8217;t work - fix it and refresh. No need to recompile or deal with DLL hell. This obviously brings some disadvantages, but this post is not about how good or bad PHP is.&lt;/p&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/its-easy.jpg' alt='It&amp;apos;s easy' /&gt;&lt;/div&gt;
&lt;p&gt;Because of this upbringing, the tooling for PHP developers should follow this idea. That&amp;#8217;s what we were trying to achieve, while pushing the industry to do the same. One of the key elements of the platform was that you could always push code directly. As awesome as pulling from Git sounds, you can do way more than that by just executing one terminal command. And if it doesn&amp;#8217;t work - push again and refresh.&lt;/p&gt;

&lt;p&gt;A lot of Microsoft folks asked us about all sorts of different enterprise behaviours and how we are planning to support them. We are not, because PHP (and even non-PHP) projects do not need them. Need a custom PHP setup - build your own servers, you are obviously qualified enough. The same applies to all sorts of custom or niche functionality and technologies. But most PHP projects do not need that. And that&amp;#8217;s why we didn&amp;#8217;t overcomplicate our solution: we solved the problem for the majority of folks, and for anyone else who needs some flexibility which we don&amp;#8217;t support - platform as a service is not right for you.&lt;/p&gt;

&lt;h3 id='future_architecture'&gt;Future architecture&lt;/h3&gt;

&lt;p&gt;Azure+ wasn&amp;#8217;t about being an abstraction on top of Windows Azure, even if initially it looked like one. The reason apps would take 15 minutes to create was that we were spinning up new instances for every new app, which takes some time. Nonetheless the goal of Azure+ was to deploy apps to a &amp;#8220;shared server&amp;#8221; similarly to how &lt;a href='http://www.heroku.com/how'&gt;Heroku&lt;/a&gt; does it (in quotes because it was not trying to be a shared hosting platform).&lt;/p&gt;
&lt;div class='alignleft'&gt;&lt;img class='noborder' src='/media/fast.jpg' alt='Fast!' /&gt;&lt;/div&gt;
&lt;p&gt;Imagine that each server has N slots; each slot can host an app in a fully isolated and secured environment. If it happens that your app needs more resources than the slot can provide (thus not killing other apps), you get more slots - on the same server or other servers. Apps can scale beautifully, no extra modifications are required and the price of the service can match any other PaaS and beat it. If that&amp;#8217;s too crazy for you - you would still be able to choose dedicated boxes.&lt;/p&gt;

&lt;p&gt;Once we had that and the MySQL setup done, I&amp;#8217;d say it would have been ready for the first public beta. We weren&amp;#8217;t that far off actually - it would only have required some hardening of the web server and PHP, a bunch of reverse proxies and a DNS server. This setup works beautifully because applications can be scaled very dynamically by us and there is a cluster supporting the infrastructure so it&amp;#8217;s highly reliable. But then some things happened&amp;#8230;&lt;/p&gt;

&lt;h3 id='good_bye'&gt;Good bye&lt;/h3&gt;

&lt;p&gt;I was put in a position to kill it. Not forced, but continuing it would no longer make any sense. That&amp;#8217;s as much as I&amp;#8217;m going to say - I was advised to keep my mouth shut by some clever people in suits. Of course I&amp;#8217;m sad to let it go after so much effort and time we have put into this, but sometimes ideas just don&amp;#8217;t work out.&lt;/p&gt;

&lt;p&gt;P.S.&lt;br /&gt;The name &lt;em&gt;Azure+&lt;/em&gt; was never an issue; it was chosen as a codename and would have been changed as the project progressed.&lt;/p&gt;</content>
   <author>
    <name>Juozas</name>
    <email>juozas@juokaz.com</email>
   </author>
 </entry>
 
 <entry>
   <title>We built a cloud platform for PHP. Wait… what?</title>
   <link href="http://blog.webspecies.co.uk/2011-10-03/we-built-a-cloud-platform-for-php-wait-what.html" />
   <updated>2011-10-03T15:27:00-07:00</updated>
   <id>http://blog.webspecies.co.uk/2011-10-03/we-built-a-cloud-platform-for-php-wait-what</id>
   <content type="html">&lt;p&gt;We built a cloud platform for PHP. Yep, you heard it correctly. We see a huge opportunity in the market and are willing to work hard to make deploying PHP projects very easy. However, this one is different, and here is the story behind it and what it can do for you.&lt;/p&gt;
&lt;!--more--&gt;
&lt;p&gt;We call it &lt;a href='http://cloud.webspecies.co.uk/'&gt;Azure+&lt;/a&gt;. Similar to Notepad++&amp;#8217;s relation to Notepad, Azure+ is Azure done right and usable. This is a code name though, which might change once this goes to production. As will the design, which currently works as a good basis and is based on the great &lt;a href='http://twitter.github.com/bootstrap/'&gt;Twitter Bootstrap&lt;/a&gt; framework.&lt;/p&gt;

&lt;h3 id='why_azure'&gt;Why Azure?&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img src='/media/azure.png' alt='Azure' /&gt;&lt;p class='wp-caption-text'&gt;Current workflow with Azure, original from &lt;a href='http://xkcd.com/303/'&gt;XKCD&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;
&lt;p&gt;There is nothing specific about &lt;a href='http://en.wikipedia.org/wiki/Azure_Services_Platform'&gt;Azure&lt;/a&gt; that we wanted to leverage, but because so many existing PaaS providers are built on Amazon cloud it just made sense to try something else. Furthermore, I have a lot of experience with Windows and PHP so it all felt like a good plan. I think we are awesome enough to make Azure rock for PHP, because&amp;#8230;&lt;/p&gt;

&lt;p&gt;Azure is just impossible to use for PHP today. This is &lt;strong&gt;a&lt;/strong&gt; fact. Doesn&amp;#8217;t matter which way you look at it, it just su.. isn&amp;#8217;t particularly good. The number of steps you need to take, the knowledge you need to have and the fact that you can only deploy from a Windows host are some of the things which make it a very painful experience. I had enough of this pain.&lt;/p&gt;

&lt;p&gt;Most importantly, I find Microsoft&amp;#8217;s approach and tooling lacking in so many areas that the only way I knew how to fix this was to build a service on top, rather than release Azure+ as a product or open source project. There was and still is no way I can change the 15-20 min. deploy time (try debugging a non-working app having to wait half an hour before every retry), so we built something which overcomes it.&lt;/p&gt;

&lt;h3 id='oh_god_no_windows'&gt;Oh God no, Windows?!&lt;/h3&gt;
&lt;div class='alignleft'&gt;&lt;img class='noborder' src='/media/ohgodno.jpg' alt='Oh God no' /&gt;&lt;/div&gt;
&lt;p&gt;It&amp;#8217;s not a big surprise that Azure is running on top of Windows; it&amp;#8217;s a Microsoft cloud at the end of the day. I know a lot of PHP developers feel very negative about Microsoft and Windows specifically. Well, Internet Explorer 6 specifically, but Windows is not better either. But that is something you would only care about if this were an infrastructure service.&lt;/p&gt;

&lt;p&gt;Azure+ is Platform as a Service, or &lt;a href='http://en.wikipedia.org/wiki/Platform_as_a_service'&gt;PaaS&lt;/a&gt; for short. What that means is that you deploy apps to a cloud black box and the infrastructure it is running on is completely irrelevant to you. There is more work to be done to make it truly PaaS, but our goal is to make deploying to this service completely headache-free and to just make everything work*.&lt;/p&gt;

&lt;p&gt;An important fact to note: this is not developed in any collaboration or affiliation with Microsoft and thus it&amp;#8217;s our own decision where we&amp;#8217;ll take it from here. I think PHP support on Windows is as good as on any other OS and all the PHP apps I tried (Zend Framework, Symfony2, Lithium) worked pretty much out of the box.&lt;/p&gt;

&lt;h3 id='features'&gt;Features&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/toys.gif' alt='Toys' /&gt;&lt;/div&gt;
&lt;p&gt;First of all, PHP developers start by writing PHP code, because to start learning PHP you only need Apache installed and that&amp;#8217;s it. Hack on some code, click refresh and you see the result. That&amp;#8217;s what PHP is. That&amp;#8217;s why a wait of at least 15 minutes is just something a PHP developer wouldn&amp;#8217;t want to put up with. We made it faster. How about 5 sec. or less deployment time?&lt;/p&gt;

&lt;p&gt;Furthermore, in the core we have mechanisms which allow us to support and change PHP configuration and version in the same short time. So you can try different PHP versions with one mouse click or switch off &lt;code&gt;display_errors&lt;/code&gt; when your app is ready to go live. Currently you can only choose from two PHP versions and error reporting mode, but there is more to come.&lt;/p&gt;

&lt;p&gt;Speed of deployments and configuration freedom are a good base to build on. But there is more baked in, like an API which allows pushing code directly and a service which will pull from a specified Git repository automatically. Right now we are working on adding MySQL support, so you can port pretty much any existing app. It&amp;#8217;s a great core platform which allows adding new functionality very easily.&lt;/p&gt;

&lt;h3 id='reception'&gt;Reception&lt;/h3&gt;
&lt;div class='alignleft'&gt;&lt;img class='noborder' src='/media/azureplusisgood.jpg' alt='Azure+ is good' /&gt;&lt;/div&gt;
&lt;p&gt;It has been an unbelievable journey so far and we learned an insane amount of things about Azure itself and how to make PHP deployments blazing fast. Some things required hours to tackle, but in the end we made sure that our users are never going to have to deal with them. And believe me, there are &lt;strong&gt;a lot&lt;/strong&gt; of ways to shoot yourself in the foot when working with Windows.&lt;/p&gt;

&lt;p&gt;This is a project which needs feedback, especially from people who know PHP, cloud stack etc. really well. I was running demos and giving access to some people I know and, I think, they were really impressed with the stack. Also, because it relies heavily on the Microsoft stack, I have spent the past two weeks demoing it to a selected group of Microsoft friends and so far the reception has been amazing. To quote one:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I think you could single-handedly revolutionize Azure&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I think this is a great achievement for the PHP community too, because a lot of the functionality we support is not available in some of the leading services so this should kick their asses a bit. We want to stay competitive and keep pushing the PHP ecosystem further, but when it comes to standards, we&amp;#8217;ll adopt any upcoming specifications for PHP platforms.&lt;/p&gt;

&lt;h3 id='conclusion'&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Currently a group of 15 or so people is actively testing this and is sending us valuable feedback. Nevertheless it&amp;#8217;s quite close to a production-quality service and you&amp;#8217;ll hear more about it very soon. If you feel like you&amp;#8217;d like to test this (completely free of charge) and would be able to provide some good thoughts, feel welcome to write to me. You can find more details about Azure+ &lt;a href='http://cloud.webspecies.co.uk/'&gt;here&lt;/a&gt;.&lt;/p&gt;</content>
   <author>
    <name>Juozas</name>
    <email>juozas@juokaz.com</email>
   </author>
 </entry>
 
 <entry>
   <title>Dependencies management in PHP projects</title>
   <link href="http://blog.webspecies.co.uk/2011-09-09/dependencies-management-in-php-projects.html" />
   <updated>2011-09-09T13:27:00-07:00</updated>
   <id>http://blog.webspecies.co.uk/2011-09-09/dependencies-management-in-php-projects</id>
   <content type="html">&lt;p&gt;Rarely does a project live by itself, especially in the days of frameworks. Furthermore, there are a lot of great open source libraries you might want to use to save time. But all of this raises a new problem - how to manage all those dependencies. Here are some thoughts on this problem and how you might want to solve it without shooting yourself in the foot, a situation commonly known as &lt;a href='http://en.wikipedia.org/wiki/DLL_Hell'&gt;DLL Hell&lt;/a&gt;.&lt;/p&gt;
&lt;!--more--&gt;
&lt;p&gt;Usually SVN or Git integrated external references management tools are used for this. But&amp;#8230; Version control systems are not made for managing dependencies. Period. They can be made to do so, but sooner or later they are going to fail at it. This is a fact and there is no way of avoiding it; if you don&amp;#8217;t trust me on this, here are some reasons why.&lt;/p&gt;

&lt;h3 id='version_control_systems'&gt;Version control systems&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/stop.gif' alt='Stop' /&gt;&lt;/div&gt;
&lt;p&gt;The most popular one a couple of years ago was &lt;a href='http://svnbook.red-bean.com/en/1.0/ch07s03.html'&gt;svn:externals&lt;/a&gt; for SVN, which is quite similar to &lt;a href='http://kernel.org/pub/software/scm/git/docs/git-submodule.html'&gt;git submodule&lt;/a&gt; for Git. The first obvious problem is that they both only support referencing repositories of the same type, that is you can&amp;#8217;t include a Git dependency in an SVN project. Today this is very problematic, because you might still be using SVN (although I&amp;#8217;m not sure why you would), while a lot of the open source projects have moved on to GitHub.&lt;/p&gt;

&lt;p&gt;If you are fine with the above, I think you will quite quickly be annoyed by the fact that those sub-folders you are automatically populating are in fact full checkouts by themselves, thus not read-only. This is potentially a very risky design characteristic, because most of the time you aren&amp;#8217;t supposed to commit from those checkouts, even if you have changed something there.&lt;/p&gt;

&lt;p&gt;Git users might be disappointed to know that submodules do not support partial checkouts, that is you can only check out full repositories. This works fine most of the time, but quite often you&amp;#8217;d like to check out a sub-folder of the repository (for example only the library folder from Zend Framework). There is a solution for that called &lt;a href='http://progit.org/book/ch6-7.html'&gt;subtree merge&lt;/a&gt;, but I find it way too complicated for my liking and I have only used it a handful of times.&lt;/p&gt;

&lt;h3 id='how_far_you_want_to_go'&gt;How far you want to go&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/space.gif' alt='Space' /&gt;&lt;/div&gt;
&lt;p&gt;The most obvious use of external dependencies is to get a copy of the framework you are using. This is quite a simple task because it can even be solved by just downloading a copy of the framework and sticking it in the project folder. Easy enough to manage, although not ideal. If you have fewer than, say, 5 such dependencies then any way you choose to manage them is going to be fine. As long as they don&amp;#8217;t have dependencies themselves&amp;#8230;&lt;/p&gt;

&lt;p&gt;Dependencies actually are much more complicated than that. If you are using truly componentized libraries, those by themselves are going to have some dependencies. This introduces the &lt;a href='http://en.wikipedia.org/wiki/Transitive_dependency'&gt;transitive dependencies&lt;/a&gt; problem which you can&amp;#8217;t easily solve. This is not such a big deal for PHP projects, because the biggest place where such libraries exist is PEAR.net and the tools there will help you with that. Anyhow, keep this in mind.&lt;/p&gt;

&lt;p&gt;As you can see, depending on what sort of external code you are trying to pull in, there are different problems attached to it. From my experience simple management of the dependencies is enough, because I have yet to see a big number of libraries having clearly defined dependencies. So unless this changes soon, I just use the simplest tools available.&lt;/p&gt;

&lt;h3 id='tools_made_for_this'&gt;Tools made for this&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/tools.jpg' alt='Tools' /&gt;&lt;/div&gt;
&lt;p&gt;One of the best known tools is &lt;a href='http://maven.apache.org/'&gt;Apache Maven&lt;/a&gt;, especially if you have Java experience. It does everything you&amp;#8217;d want from a dependency manager and probably more, but having used it for a couple of projects I think it overcomplicates what I would need for our projects. Maybe because I haven&amp;#8217;t worked on projects complicated enough, but more likely because I just don&amp;#8217;t find tools like this attractive and valuable.&lt;/p&gt;

&lt;p&gt;You might also want to use &lt;a href='http://pear.php.net/index.php'&gt;PEAR&lt;/a&gt; for dependency management, although it requires external libraries to be stored in PEAR repositories. Similarly there is the &lt;a href='http://github.com/composer/composer'&gt;composer&lt;/a&gt; project which tries to solve a lot of dependency problems and can resolve them from various sources, but it still seems to be in development and I haven&amp;#8217;t played with it enough. I think composer might be the one to watch.&lt;/p&gt;

&lt;p&gt;Symfony2 has an interesting approach of just having a &lt;a href='https://github.com/symfony/symfony-standard/blob/master/deps'&gt;deps&lt;/a&gt; file which is used to define where all the dependencies are and where to place them. Think of it as a very light build recipe. Following a similar approach I have extended it and added support for different repository types and sub-repository checkouts. One script &lt;code&gt;./bin/vendors install&lt;/code&gt; to run looks great to me.&lt;/p&gt;

&lt;h3 id='things_you_shouldnt_forget'&gt;Things you shouldn&amp;#8217;t forget&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/remember.jpg' alt='Remember' /&gt;&lt;/div&gt;
&lt;p&gt;With the growing popularity of Symfony2, there are more and more bundles floating around on GitHub. However I just recently had a case where a new developer checked out a project and it wasn&amp;#8217;t working completely. Apparently within a couple of weeks one of the bundles was refactored, resulting in all previous integrations completely failing.&lt;/p&gt;

&lt;p&gt;It was my own stupidity not to lock in to a specific revision of the bundle (which you can do using the deps.lock file), but it is likely that you will repeat this mistake too. The fact you need to understand is that most of the time you will be pulling 3rd party dependencies which you have no control over, and if you depend on them heavily, which you probably do, you need to know the specific version you want to use. Point it to a stable version if they have one, because it&amp;#8217;s extremely bad to just point your reference to a master branch.&lt;/p&gt;

&lt;p&gt;Furthermore, a library can go away completely. If it&amp;#8217;s hosted on an obscure small website you just found, a month down the road there could be no trace of it anymore. Even GitHub doesn&amp;#8217;t protect from this - a repository can be deleted and will be gone forever. So you need to make a choice - how much do you trust the authors - and preferably back up the source code locally (or set up a mirror repository).&lt;/p&gt;

&lt;h3 id='conclusion'&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;One way or another you will have to solve this problem. I&amp;#8217;d say the easiest way to start with and have some room to grow is to integrate this with your build scripts, even as a simple bash script. Don&amp;#8217;t try to reinvent the wheel if you need something more sophisticated - there exist working tools already, so just give some of them a go. Just make sure you know what version you want to depend on and have some safety nets against disappearing sources.&lt;/p&gt;</content>
   <author>
    <name>Juozas</name>
    <email>juozas@juokaz.com</email>
   </author>
 </entry>
 
 <entry>
   <title>Never trust your sources</title>
   <link href="http://blog.webspecies.co.uk/2011-08-25/never-trust-your-sources.html" />
   <updated>2011-08-25T13:27:00-07:00</updated>
   <id>http://blog.webspecies.co.uk/2011-08-25/never-trust-your-sources</id>
   <content type="html">&lt;p&gt;Data validation sounds like an obvious thing and it appears that everyone is doing it, but here are some ideas on where you might be doing it wrong. This is not an article of practical examples though, I&amp;#8217;d assume they are pretty easy to figure out; this is more about implications and causes of various validation errors. All of them are places where we have suffered before, so make sure not to repeat the same mistakes.&lt;/p&gt;
&lt;!--more--&gt;
&lt;p&gt;This post is not about security, although security is probably one of the most important users of validation. Here I&amp;#8217;d like to talk about other use cases of validation, mainly how to make sense of data you are receiving and make sure it&amp;#8217;s not breaking your applications.&lt;/p&gt;

&lt;h3 id='obvious_rules'&gt;Obvious rules&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/rules.gif' alt='Rules' /&gt;&lt;/div&gt;
&lt;p&gt;If you expect an integer, check if it&amp;#8217;s an integer. If you expect a date, check if it&amp;#8217;s a date. It doesn&amp;#8217;t matter if it is an admin interface and&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&amp;#8220;only ourselves will be using it, so we will be always entering valid data&amp;#8221;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is not &lt;a href='http://www.phpmyadmin.net/home_page/index.php'&gt;phpMyAdmin&lt;/a&gt; you are building (and even that actually validates what you have entered before storing it in a database). Making sure there is no way to mess up the database from any app will save you time. And grey hair.&lt;/p&gt;

&lt;p&gt;More than once I have seen applications not checking what they accept as the price of a product and then failing to render any subsequent screens because math operations on it are invalid. It&amp;#8217;s especially bad when users can&amp;#8217;t fix it themselves and need to contact you, the developer, to handle that from the other end. If I enter &lt;em&gt;1&amp;#8217;000&amp;#8217;000&amp;#8217;000&lt;/em&gt; into the stock quantity field, make sure the whole app doesn&amp;#8217;t explode trying to insert that many rows in a database.&lt;/p&gt;

&lt;h3 id='make_assumptions'&gt;Make assumptions&lt;/h3&gt;

&lt;p&gt;Programming, I believe, is about logic. So don&amp;#8217;t be an &lt;em&gt;idiot&lt;/em&gt; and use some of it. Ask questions about the input you are receiving, be it user entered data or auto-imports from external sources, and lock down the expectations. Here are some basic rules, just examples of course, sounding so natural, but still I rarely see them in practice:&lt;/p&gt;
&lt;div class='alignleft'&gt;&lt;img class='noborder' src='/media/assume.jpg' alt='Make assumptions' /&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Product price larger than 0, smaller than 1000&lt;/li&gt;

&lt;li&gt;Person&amp;#8217;s age is between 0 and 150&lt;/li&gt;

&lt;li&gt;Stock quantity is between 0 and 50&lt;/li&gt;

&lt;li&gt;Order cannot have date in the future&lt;/li&gt;

&lt;li&gt;Etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Obviously it depends on the application you are building and these might change, but most likely they won&amp;#8217;t and allowing arbitrary data to be entered can create huge problems. This is better than the &lt;a href='http://en.wikipedia.org/wiki/Blacklist'&gt;Blacklist&lt;/a&gt; approach which is not really practical, as it requires specifying what your data &lt;strong&gt;can&amp;#8217;t be&lt;/strong&gt; rather than what it &lt;strong&gt;can only be&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you are importing data from an external source, does it make sense for the result to be empty? If it&amp;#8217;s a list of events it cannot be empty, unless the whole thing got cancelled. So discard such an import and log that you got 0 events when you expected at least a dozen… do not delete all events from your database. That is - do not trust the source 100%, use your, or your code&amp;#8217;s, head too.&lt;/p&gt;

&lt;h3 id='structure'&gt;Structure&lt;/h3&gt;
&lt;div class='alignleft'&gt;&lt;img class='noborder' src='/media/structure.png' alt='Structure' /&gt;&lt;/div&gt;
&lt;p&gt;This one is so easy to check it&amp;#8217;s not even funny when your data imports go wrong. It can be an Excel spreadsheet, some custom format or XML serialized data. All of those have structure, which you should be able to rely on. Personally, if the data format changes, I make sure that my code just stops processing it immediately, because it doesn&amp;#8217;t know any more what any of it means.&lt;/p&gt;

&lt;p&gt;For tabular data it&amp;#8217;s very easy to check the table headers - the number of them and their labels. The order can change, and you can figure out how to handle this, but if some fields are missing it indicates that the actual data may be mixed up too. XML might be trickier to check as it has nested structures, but one could use validation against a &lt;a href='http://en.wikipedia.org/wiki/Document_Type_Definition'&gt;DTD&lt;/a&gt;. If an additional price element is added the code might still work, but the code doesn&amp;#8217;t know if it&amp;#8217;s using the right one anymore.&lt;/p&gt;

&lt;p&gt;There are cases when you might &lt;em&gt;not know&lt;/em&gt; all possible data formats, like what I noticed recently when importing some data from Amazon reports. Everything seemed fine when we were testing, but once we launched, some products were reported with wrong quantities. The type field, which we stupidly ignored because it always showed &lt;em&gt;&amp;#8216;Sellable&amp;#8217;&lt;/em&gt;, apparently can also hold other values, and when it does you should act differently. Obviously because we ignored it the data we imported didn&amp;#8217;t make sense. What we should have done is validate the data with our assumption about the type field in place; that would have notified us about the unseen format.&lt;/p&gt;

&lt;h3 id='encoding'&gt;Encoding&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/encoding.png' alt='Encoding' /&gt;&lt;/div&gt;
&lt;p&gt;We had this issue when working with the &lt;a href='http://blog.webspecies.co.uk/2011-07-04/building-the-edinburgh-festival-api.html'&gt;Edinburgh Festival API&lt;/a&gt; while migrating some import sources to a new location. Everything seemed fine and data was successfully imported, but after some time users pointed out that some of the characters we are returning are invalid Unicode sequences. After some investigation we found out that the conversion of ISO-8859-1 to UTF-8 was in fact incorrect and &lt;a href='http://en.wikipedia.org/wiki/Windows-1252'&gt;Windows-1252&lt;/a&gt; was supposed to have been used.&lt;/p&gt;

&lt;p&gt;Obviously it&amp;#8217;s very unlikely that the encoding of the data will change, but, you know, sometimes things happen when you least expect them. Annoyingly, encoding issues are easiest for humans to spot, because any single-byte encoding can be applied and it will work, kind of, but only an actual person would notice that the resulting text doesn&amp;#8217;t make sense. Luckily computers can now guess too, for example by running this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nv'&gt;$str&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;áéóú&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='c1'&gt;// ISO-8859-1&lt;/span&gt;
&lt;span class='nb'&gt;mb_detect_encoding&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$str&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;However, the results of this function are sometimes pretty unpredictable, although you can still use it to detect whether data is a valid Unicode string, for example:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nv'&gt;$str&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;áéóú&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='c1'&gt;// ISO-8859-1&lt;/span&gt;
&lt;span class='nb'&gt;mb_detect_encoding&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$str&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;UTF-8&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// &amp;#39;UTF-8&amp;#39;&lt;/span&gt;
&lt;span class='nb'&gt;mb_detect_encoding&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$str&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;UTF-8&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And if you are converting from a specific encoding to another one, say Unicode, test that the result doesn&amp;#8217;t contain invisible characters or any other sequences that are impossible in human-readable text.&lt;/p&gt;
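
&lt;p&gt;Both checks can be sketched in Python (used here instead of PHP purely for illustration). The sample bytes are made up; they show how decoding Windows-1252 data as ISO-8859-1 succeeds but leaves invisible control characters behind, which is roughly the kind of problem described above.&lt;/p&gt;

```python
import unicodedata


def is_valid_utf8(data):
    """True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def has_control_chars(text):
    """True if text contains control characters other than tab, CR and LF."""
    return any(
        unicodedata.category(ch) == "Cc" and ch not in "\t\r\n" for ch in text
    )


raw = b"\x93quoted\x94"  # curly quotes as Windows-1252 bytes

wrong = raw.decode("iso-8859-1")  # succeeds, but yields C1 control characters
right = raw.decode("cp1252")      # yields the intended curly quotes

print(is_valid_utf8(raw), has_control_chars(wrong), has_control_chars(right))
```

&lt;p&gt;A control-character check like this will not catch every wrong conversion, but it is cheap and flags this particular mix-up.&lt;/p&gt;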

&lt;h3 id='conclusion'&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;I can&amp;#8217;t stress enough how important data validation really is. There are so many attack vectors exploiting incomplete or faulty validation that you can never be 100% sure all cases are covered. But rather than building a blacklist, go with a whitelist approach, because most likely it&amp;#8217;s going to be safer, and if conditions change you can always fix it later.&lt;/p&gt;</content>
   <author>
    <name>Juozas</name>
    <email>juozas@juokaz.com</email>
   </author>
 </entry>
 
 <entry>
   <title>Web Scraping is actually pretty easy</title>
   <link href="http://blog.webspecies.co.uk/2011-07-27/web-scrapping-is-actually-pretty-easy.html" />
   <updated>2011-07-27T16:15:00-07:00</updated>
   <id>http://blog.webspecies.co.uk/2011-07-27/web-scrapping-is-actually-pretty-easy</id>
   <content type="html">&lt;p&gt;For some of our clients we worked on extracting or submitting data automatically from websites which didn&amp;#8217;t have an API we could use. This and more is called web scraping. Since our announcement of &lt;a href='http://sellerscout.co.uk/'&gt;SellerScout&lt;/a&gt;, which relies heavily on this, I have received a number of questions about how we actually do it. So here are some thoughts on how to get started in the interesting world of web scraping.&lt;/p&gt;
&lt;!--more--&gt;
&lt;p&gt;This article covers some basics, which will work fine for most cases. It is probably not even remotely close to how Google does this or how we do it at SellerScout, because both of those systems work at a much larger scale and their use cases are different. Machine learning, text analysis, semantic search algorithms and so on are all things you might end up using if you want to build something big.&lt;/p&gt;

&lt;h3 id='downloading_the_web'&gt;Downloading the web&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/scraper.jpg' alt='Scraper' /&gt;&lt;p class='wp-caption-text'&gt;It's all just scraping&lt;/p&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href='http://en.wikipedia.org/wiki/Web_crawler'&gt;Spiders&lt;/a&gt; are the small applications you are going to be writing. Usually they are self-contained, CLI-friendly scripts with some internal logic for extracting information from a specific website or websites. As an example, the script might go to a website&amp;#8217;s homepage, download all the category pages, download the product list for each of them and extract a list of products in the store.&lt;/p&gt;

&lt;p&gt;If you are a Python guy, you might want to look at &lt;a href='http://twistedmatrix.com/trac/'&gt;Twisted&lt;/a&gt; or &lt;a href='http://scrapy.org/'&gt;Scrapy&lt;/a&gt;, the latter being very easy to use. If it&amp;#8217;s PHP you are using, a combination of cURL and libxml will allow you to do the same; I&amp;#8217;m not aware of any PHP frameworks for this. For any other language, a Google search is the place to start.&lt;/p&gt;

&lt;p&gt;Depending on your task you will need to support different functionality. If the website is for logged-in users only, you should configure &lt;a href='http://php.net/manual/en/book.curl.php'&gt;cURL&lt;/a&gt; to use a cookie jar and start the scraping with a request to the login page. If you need to extract thousands of documents, have some logic to pause and resume the script, so if it crashes it can start from the last completed document rather than from the start. In any case, try to replicate natural user behaviour on the site.&lt;/p&gt;
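
&lt;p&gt;The pause-and-resume idea can be sketched like this in Python. It is an illustration under simple assumptions: progress is kept in a small JSON file, the function and file names are invented, and the download function is a stand-in so the example runs offline.&lt;/p&gt;

```python
import json
import os
import tempfile


def crawl(urls, fetch, state_path):
    """Fetch each URL once, recording progress so a restart skips finished work."""
    done = set()
    if os.path.exists(state_path):
        with open(state_path) as fh:
            done = set(json.load(fh))
    for url in urls:
        if url in done:
            continue
        fetch(url)
        done.add(url)
        # Persist after every document; a crash loses at most the current one.
        with open(state_path, "w") as fh:
            json.dump(sorted(done), fh)
    return done


# Stand-in for a real download function, so the sketch runs offline.
fetched = []
state_file = os.path.join(tempfile.mkdtemp(), "crawl-state.json")
crawl(["/a", "/b"], fetched.append, state_file)
crawl(["/a", "/b", "/c"], fetched.append, state_file)  # simulated restart
print(fetched)
```

&lt;p&gt;On the second run only the new URL is fetched, because the first two are already recorded in the state file.&lt;/p&gt;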

&lt;p&gt;Is it &lt;a href='http://en.wikipedia.org/wiki/Web_scraping#Legal_issues'&gt;legal&lt;/a&gt;? It depends. There is no strict answer and it varies with the data you are trying to extract. Some data can be copyrighted, for example original texts, so if you are scraping them and showing them on your website - you are being a bad person. Stop! Ideally you should discuss this with your lawyer, which we did, and get some thoughts on how to proceed.&lt;/p&gt;

&lt;h3 id='getting_blocked'&gt;Getting blocked&lt;/h3&gt;
&lt;div class='alignleft'&gt;&lt;img class='noborder' src='/media/stopsign.gif' alt='Stop sign' /&gt;&lt;/div&gt;
&lt;p&gt;One of the decisions you will need to make is how you are going to identify the spider - you can either replicate a normal browser&amp;#8217;s headers or introduce the spider by its name (e.g. &amp;#8217;&lt;a href='http://www.google.com/support/webmasters/bin/answer.py?answer=182072'&gt;&lt;code&gt;googlebot&lt;/code&gt;&lt;/a&gt;&amp;#8217;). The first will probably allow you to stay undetected, while the latter is considered to be the correct way. From my experience, for anything small Firefox headers will work just fine.&lt;/p&gt;

&lt;p&gt;Websites might still decide to block you though, and it&amp;#8217;s something you might want to be prepared for. If you are identifying the spider by a name, you should respect &lt;a href='http://www.robotstxt.org/'&gt;robots.txt&lt;/a&gt; and stop crawling if you are being denied by that file. However the most likely blocking mechanism is to block your IP address, which is going to happen if you are being stupid. Really stupid.&lt;/p&gt;
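
&lt;p&gt;Checking robots.txt does not need any custom parsing in Python, because the standard library ships a parser. A small sketch, with the rules inlined and a placeholder bot name and URLs so it runs without network access:&lt;/p&gt;

```python
from urllib.robotparser import RobotFileParser

# Example rules, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "examplebot" and the URLs are placeholders.
allowed = parser.can_fetch("examplebot", "http://example.com/products/1")
blocked = parser.can_fetch("examplebot", "http://example.com/private/report")
print(allowed, blocked)
```

&lt;p&gt;Against a live site you would point the parser at the real file with &lt;code&gt;set_url()&lt;/code&gt; and &lt;code&gt;read()&lt;/code&gt; instead of parsing an inline string.&lt;/p&gt;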

&lt;p&gt;You see, when people are browsing the web they request one page every 3 or 4 seconds, hence if you have a list of 1000 URLs to download and you just start iterating over them and issuing requests… Well, you are easy to catch by just looking at the access logs. Don&amp;#8217;t do this. Rather have a queue of URLs to download and issue requests with a random delay in the range of 1 to 5 seconds. It&amp;#8217;s going to take longer, but it will help to avoid problems.&lt;/p&gt;
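
&lt;p&gt;A queue with random delays might look like this in Python. The function name and the injectable &lt;code&gt;sleep&lt;/code&gt; parameter are choices made for this sketch, so the demo can run instantly; the 1 to 5 second range is the one suggested above.&lt;/p&gt;

```python
import random
import time
from collections import deque


def polite_crawl(urls, fetch, min_delay=1.0, max_delay=5.0, sleep=time.sleep):
    """Fetch queued URLs one by one, pausing a random interval between requests."""
    queue = deque(urls)
    delays = []
    while queue:
        fetch(queue.popleft())
        if queue:  # no need to wait after the last request
            delay = random.uniform(min_delay, max_delay)
            delays.append(delay)
            sleep(delay)
    return delays


# Offline demo: record URLs instead of downloading, and skip the real sleeping.
seen = []
delays = polite_crawl(["/page/1", "/page/2", "/page/3"], seen.append, sleep=lambda s: None)
```

&lt;p&gt;In real use you would leave &lt;code&gt;sleep&lt;/code&gt; at its default and pass a function that actually downloads the page.&lt;/p&gt;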

&lt;p&gt;This doesn&amp;#8217;t scale though, you might say. And in fact you are right, because a delay of up to 5 seconds between requests limits the amount of content you can download per day. Luckily for you, I have a tip here too - use proxy servers. It&amp;#8217;s going to require writing a request scheduler, but if you need to download the same 1000 URLs you might as well distribute them over a list of proxies, each with its own delay times. The amount of content you can download increases linearly with the number of proxies you have.&lt;/p&gt;
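
&lt;p&gt;The simplest possible scheduler is a round-robin split of the URL list, sketched below in Python. The proxy addresses are placeholders, and a real scheduler would also have to deal with failing proxies and retries.&lt;/p&gt;

```python
from itertools import cycle


def assign_proxies(urls, proxies):
    """Spread URLs over proxies round-robin; returns a dict of proxy to URL list."""
    plan = {proxy: [] for proxy in proxies}
    for url, proxy in zip(urls, cycle(proxies)):
        plan[proxy].append(url)
    return plan


# Placeholder proxy addresses and URLs.
proxies = ["10.0.0.1:3128", "10.0.0.2:3128", "10.0.0.3:3128"]
urls = ["/page/%d" % n for n in range(1, 8)]
plan = assign_proxies(urls, proxies)
for proxy, assigned in plan.items():
    print(proxy, assigned)
```

&lt;p&gt;Each proxy&amp;#8217;s list can then be worked through independently with its own random delays, which is where the linear speed-up comes from.&lt;/p&gt;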

&lt;h3 id='extracting_the_data'&gt;Extracting the data&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/xpath.jpg' alt='XPath' /&gt;&lt;/div&gt;
&lt;p&gt;Once you have the HTML you want to process (to extract links to follow or to extract actual data), you might wonder how to actually do it. There are a couple of ways and libraries for this, but if you want to keep it simple, XPath or CSS-like queries are going to work just fine. If you feel like it, and believe me sometimes there is no other way, you might go with &lt;a href='http://en.wikipedia.org/wiki/Regular_expression'&gt;regular expressions&lt;/a&gt; for this, but that has problems I&amp;#8217;m going to talk about in just a second.&lt;/p&gt;

&lt;p&gt;I tend to go with &lt;a href='http://www.w3.org/TR/xpath/'&gt;XPath&lt;/a&gt; because it&amp;#8217;s very easy to write and to debug. Furthermore there are various browser extensions which let you create those queries and test them on the actual website. I have worked on spiders for over 100 different websites and XPath worked fine all the time, as long as…&lt;/p&gt;

&lt;p&gt;The problem you will need to solve is how to process invalid HTML or XHTML markup. From my experience, I&amp;#8217;m yet to see a website with all pages being 100% valid. The more invalid it is, the harder it is to fix those problems. There are libraries though, most famously &lt;a href='http://www.crummy.com/software/BeautifulSoup/'&gt;Beautiful Soup&lt;/a&gt;, which will try to process invalid markup. They do have performance implications, but keep them in mind because you won&amp;#8217;t be able to issue XPath queries on invalid syntax.&lt;/p&gt;

&lt;p&gt;Now let&amp;#8217;s get back to regular expressions. In theory they look awesome, because they can extract data even if the HTML markup is invalid, but they soon get complicated and become very easy to break. XPath queries work on a DOM tree, so if the website structure changes they simply stop matching. Regexps, on the other hand, might still match, but produce very unpredictable results.&lt;/p&gt;

&lt;h3 id='conclusion'&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;We, as a company, have a lot of experience with web scraping and it&amp;#8217;s actually very, very easy. As long as you follow the logical rules and don&amp;#8217;t try to over-complicate the data extraction, you could easily extract all news items, products or blog posts with a spider of 30 or so lines of code. I can talk about this for hours or days, so I might write more on it soon, because this is just the tip of the iceberg.&lt;/p&gt;</content>
   <author>
    <name>Juozas</name>
    <email>juozas@juokaz.com</email>
   </author>
 </entry>
 
 <entry>
   <title>Building the Edinburgh Festival API</title>
   <link href="http://blog.webspecies.co.uk/2011-07-04/building-the-edinburgh-festival-api.html" />
   <updated>2011-07-04T17:10:00-07:00</updated>
   <id>http://blog.webspecies.co.uk/2011-07-04/building-the-edinburgh-festival-api</id>
   <content type="html">&lt;p&gt;A couple of months ago we started working on a very exciting project - building a data access API for the world’s largest cultural event, the Edinburgh Festival. It was a very exciting journey and here I’m sharing how we built it and what stack we used. The goal is to show how we solved specific problems and how you might apply the same ideas to your applications.&lt;/p&gt;
&lt;!--more--&gt;
&lt;h3 id='the_problem'&gt;The problem&lt;/h3&gt;

&lt;p&gt;We started with a specification from &lt;a href='http://festivalslab.com/'&gt;Festivals Lab&lt;/a&gt; of what this API was supposed to do, and that was a really trivial task - outputting the data of 7 different sub-festivals in one format. And that is pretty much what this project was about. We took the data from various sources, processed it into formats we could understand, did some filtering, cleaning up and validation, and pushed that data through the API.&lt;/p&gt;

&lt;p&gt;Of course this is just a read-only API, so that changed the design a lot. There was no need to connect to an actual database from the API server, but rather to a search server, which would be faster and more reliable for data lookups. And reliability was one of the requirements, because as with all APIs they should always work, &lt;em&gt;in theory&lt;/em&gt;. Having a database and search server decoupled allowed increasing fault-tolerance and scalability.&lt;/p&gt;

&lt;p&gt;What we tried to achieve is that even if all the servers died, we could spin up a new one, run some scripts and have the API back live. Thus data imports had to be deterministic and fast. Once we got that, we were certain that even if things went horribly wrong we could recover very quickly, and for smaller failures the redundancy of the database and search servers protects us.&lt;/p&gt;

&lt;h3 id='the_stack'&gt;The stack&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/festivalapi.png' alt='Festival API' /&gt;&lt;/div&gt;
&lt;h4 id='processing'&gt;Processing&lt;/h4&gt;

&lt;p&gt;Interestingly this is the part where we spent most of the time, because chasing down the data proved to be very challenging. However, once we had all the data in sensible formats, we wrote a collection of self-contained scripts processing the data into our various formats.&lt;/p&gt;

&lt;p&gt;The key lesson here was that Excel is used very commonly and there are problems with most of the Excel file parsers out there, especially with Unicode characters, which worked fine in most cases but sometimes came out as unrecognizable characters. Exporting to CSV first and then reading that from the scripts was how we solved this.&lt;/p&gt;

&lt;h4 id='database_server'&gt;Database server&lt;/h4&gt;

&lt;p&gt;Obviously data coming from the data sources needs to be stored somewhere. Overall the structure was very trivial, consisting only of events containing a list of performances. So MySQL could easily have worked here, but the data structure differed between festivals and it was constantly evolving during the development process.&lt;/p&gt;

&lt;p&gt;Hence we went with CouchDB because of its reliability and the fact that we can just store events as nested objects.&lt;/p&gt;

&lt;h4 id='search_server'&gt;Search server&lt;/h4&gt;

&lt;p&gt;One of the key elements of this API is to be able to filter data efficiently. &lt;a href='http://www.elasticsearch.org/'&gt;ElasticSearch&lt;/a&gt; was chosen because it integrates with CouchDB almost out of the box: it took nothing more than executing &lt;a href='http://www.elasticsearch.org/blog/2010/09/28/the_river_searchable_couchdb.html'&gt;one query&lt;/a&gt; and we had an (almost) real-time representation of the data stored in the database, searchable through the API.&lt;/p&gt;

&lt;p&gt;ElasticSearch also lets you get started in 5 minutes, without needing to define the documents’ structure up front; that can be added later to optimize performance. It supports anything Lucene supports too, so it wasn’t like we were throwing away possible features.&lt;/p&gt;

&lt;h4 id='api'&gt;API&lt;/h4&gt;

&lt;p&gt;As I explained in the previous &lt;a href='http://blog.webspecies.co.uk/2011-06-15/restful-web-services-with-python-the-easy-way.html'&gt;post&lt;/a&gt;, we use Python for all APIs. This wasn’t an exception. However, the actual API is just a couple of hundred lines of code - a very thin layer on top of the search server, which has an HTTP interface anyway, so there was no need to use any libraries either.&lt;/p&gt;

&lt;p&gt;&lt;a href='http://nginx.org/'&gt;Nginx&lt;/a&gt; was used as a proxy for the API servers, each monitored by &lt;a href='http://supervisord.org/'&gt;supervisor&lt;/a&gt;. Deployments are obviously made using Ant and any task is a one-click operation here, so any updates can be rolled out in a matter of minutes.&lt;/p&gt;

&lt;h3 id='the_outcome'&gt;The outcome&lt;/h3&gt;

&lt;p&gt;Well, it works. In our internal tests we tried to make it go down, but it sustained them all quite well. We can&amp;#8217;t share performance characteristics of the application, but it&amp;#8217;s pretty damn fast and the servers have no problems with the load.&lt;/p&gt;

&lt;p&gt;There is logging for all requests coming in, storing information about which API user accessed what sort of information. This should produce some interesting results we might be able to share later. The API is going to be used in some mobile apps too and we haven&amp;#8217;t worked with those before, so we will learn how the use cases of mobile applications differ from those of websites or desktop applications.&lt;/p&gt;

&lt;p&gt;The festival only runs for less than two months, so this project is quite short-lived, but there are big plans for next year to open the data even more. If it were up to us, we would make it fully public and not require any license agreements, but it isn&amp;#8217;t, so we are going to push hard for this next year.&lt;/p&gt;

&lt;h3 id='conclusion'&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;The clients seem to be very happy with the outcome and we are happy because we had a chance to work on a very challenging and high-impact project. I made all those decisions, of course after discussions with the clients, and they seem to work so far. Any tips for future projects?&lt;/p&gt;</content>
   <author>
    <name>Juozas</name>
    <email>juozas@juokaz.com</email>
   </author>
 </entry>
 
 <entry>
   <title>RESTful web services with Python. The easy way</title>
   <link href="http://blog.webspecies.co.uk/2011-06-15/restful-web-services-with-python-the-easy-way.html" />
   <updated>2011-06-15T17:40:14-07:00</updated>
   <id>http://blog.webspecies.co.uk/2011-06-15/restful-web-services-with-python-the-easy-way</id>
   <content type="html">&lt;p&gt;More and more projects are exposing their functionality via REST APIs. We think APIs are awesome and it&amp;#8217;s great what they&amp;#8217;ve done for the web overall, but we also see a lot of bad API examples, like the &lt;a href='http://dev.twitter.com/doc'&gt;Twitter API&lt;/a&gt;. It might be the case that if you don&amp;#8217;t have the right tools, it becomes hard to implement them correctly and quickly. Lately we have been working on a couple of APIs and I decided to share our experiences and why we went with Python in the end.&lt;/p&gt;
&lt;!--more--&gt;
&lt;h3 id='what_apis_should_do'&gt;What should APIs do?&lt;/h3&gt;
&lt;div class='alignleft'&gt;&lt;img class='noborder' src='/media/rest.gif' alt='Rest' /&gt;&lt;/div&gt;
&lt;p&gt;As little as possible, ideally. In most cases an API is just a layer on top of a database or search server, providing a RESTful way to access data and get it back in a form the client understands. The less code there is, the lighter it is and the easier it is to make changes. And with the rising popularity of exposing data and functionality through APIs, yours should follow REST and HTTP standards, so it can be adopted in no time.&lt;/p&gt;

&lt;p&gt;Depending on how you want to do it, you can also go full RESTful or just pretend that what you are doing is a REST API. Supporting &lt;a href='http://en.wikipedia.org/wiki/Hypermedia'&gt;Hypermedia&lt;/a&gt; for example is something you are supposed to do, in theory. But at least the API should handle HTTP Accept headers correctly, use native HTTP authentication like &lt;a href='http://en.wikipedia.org/wiki/Basic_access_authentication'&gt;Basic Auth&lt;/a&gt; and have meaningful resource URLs. Just that will make it much better than most APIs out there.&lt;/p&gt;

&lt;p&gt;Importantly, web services should be as fast as possible and support huge numbers of requests per second. In most cases APIs are called from other applications, and the more time it takes for your API to respond the slower that application becomes. You should aim at no more than 10ms to fulfill a request. And if an application developer decides to retrieve some resources in a loop, your API server shouldn&amp;#8217;t crash either.&lt;/p&gt;

&lt;h3 id='python_or_not'&gt;Python or not&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/snake.gif' alt='Snake' /&gt;&lt;/div&gt;
&lt;p&gt;Although we might seem like a PHP company, we are not really - we use PHP for websites, where it works best, and nothing else. The reason why we tend to go with Python is simply because it&amp;#8217;s just perfect for what I described above. There are libraries for pretty much anything when it comes to reading and writing data from any storage and it&amp;#8217;s super lightweight compared to a lot of other languages.&lt;/p&gt;

&lt;p&gt;Don&amp;#8217;t get me wrong - there is nothing wrong with any other languages; it&amp;#8217;s just that Python worked really great for us. However if you for example want to stay with PHP, &lt;a href='http://getfrapi.com/'&gt;Frapi&lt;/a&gt; might be a good option for APIs. You can&amp;#8217;t really achieve a lot of things as easily as with Python though, and Python is just much more concise. Performance is a debatable topic, but from our experience Python wins any day. &lt;em&gt;&amp;#8220;It scales&amp;#8221;&lt;/em&gt;, that is.&lt;/p&gt;

&lt;p&gt;From a functionality perspective, &lt;a href='http://www.python.org/dev/peps/pep-0318/'&gt;decorators&lt;/a&gt; allow achieving a lot of things without destroying the application flow with endless listeners and callbacks. When I need to provide authentication for the API, I just wrap the application with &lt;code&gt;@auth&lt;/code&gt;, and when data from some API call needs to be cached it just gets wrapped with &lt;code&gt;@cache&lt;/code&gt;. This makes the workflow really clear and doesn&amp;#8217;t require nested &lt;code&gt;if&lt;/code&gt; structures and duplicated logic. Decorators are used heavily in most of the Python web frameworks.&lt;/p&gt;
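
&lt;p&gt;To make the idea concrete, here is a minimal sketch of what a &lt;code&gt;@cache&lt;/code&gt; decorator could look like. It is not the decorator we use in production, just an in-memory illustration; the &lt;code&gt;get_event&lt;/code&gt; function and the &lt;code&gt;calls&lt;/code&gt; list exist only for the demo.&lt;/p&gt;

```python
import functools


def cache(func):
    """Remember results per argument tuple so repeated calls skip the work."""
    results = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in results:
            results[args] = func(*args)
        return results[args]

    return wrapper


calls = []


@cache
def get_event(event_id):
    calls.append(event_id)  # stands in for a slow database or search lookup
    return {"name": "Event %s" % event_id}


first = get_event(7)
second = get_event(7)  # served from the cache, the lookup is not repeated
```

&lt;p&gt;The function body stays free of any caching logic, which is exactly the point of doing it with a decorator.&lt;/p&gt;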

&lt;h3 id='api_in_a_bottle'&gt;API in a Bottle&lt;/h3&gt;
&lt;div class='alignleft'&gt;&lt;img class='noborder' src='/media/bottle.gif' alt='Bottle' /&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href='http://bottlepy.org/'&gt;Bottle&lt;/a&gt; is a one-file web framework based on &lt;a href='http://www.wsgi.org/wsgi/'&gt;WSGI&lt;/a&gt;, so it works just like any other Python framework. It&amp;#8217;s not made specifically for APIs, but it works great for them. Its API looks a lot like &lt;a href='http://www.sinatrarb.com/'&gt;Sinatra&lt;/a&gt;&amp;#8217;s - it just maps routes to actions (functions). What is more, I find it allows very rapid development - in most cases I can write a whole API in less than a day.&lt;/p&gt;

&lt;p&gt;Compared to other Python frameworks it doesn&amp;#8217;t do anything that special, but where it shines is that a lot of the &lt;em&gt;things&lt;/em&gt; can be either configured or swapped for different ones. It&amp;#8217;s just a box of building blocks with some default behaviour, but from there you can really make it work in any way you want. If you need real-time web services allowing you to push data to clients, &lt;a href='http://www.tornadoweb.org/'&gt;Tornado&lt;/a&gt; might be a better choice though.&lt;/p&gt;

&lt;p&gt;Here is an example of a simple API, with the first method returning a plain string and the second returning a Python dictionary, which will be automatically converted to a JSON string:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='nn'&gt;bottle&lt;/span&gt;
&lt;span class='kn'&gt;from&lt;/span&gt; &lt;span class='nn'&gt;bottle&lt;/span&gt; &lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='n'&gt;route&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;run&lt;/span&gt;

&lt;span class='nd'&gt;@route&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;#39;/&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;method&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&lt;span class='s'&gt;&amp;#39;GET&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;homepage&lt;/span&gt;&lt;span class='p'&gt;():&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='s'&gt;&amp;#39;Hello world!&amp;#39;&lt;/span&gt;
    
&lt;span class='nd'&gt;@route&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;#39;/events/:id&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;method&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&lt;span class='s'&gt;&amp;#39;GET&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;get_event&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;id&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nb'&gt;dict&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s'&gt;&amp;#39;Event &amp;#39;&lt;/span&gt; &lt;span class='o'&gt;+&lt;/span&gt; &lt;span class='nb'&gt;str&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;id&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
   
&lt;span class='n'&gt;bottle&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;debug&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;True&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; 
&lt;span class='n'&gt;run&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/xmlvsjson.png' alt='XML vs JSON' /&gt;&lt;/div&gt;
&lt;p&gt;A lot of functionality can be tweaked using plugins, so rather than allowing Bottle to automatically convert data structures to JSON, you might want to use a plugin like this to return data strictly in the type the client accepts:&lt;/p&gt;
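&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='nn'&gt;json&lt;/span&gt;
&lt;span class='kn'&gt;from&lt;/span&gt; &lt;span class='nn'&gt;bottle&lt;/span&gt; &lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='n'&gt;HTTPError&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;request&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;response&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;FormatPlugin&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;object&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;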
    &lt;span class='n'&gt;name&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s'&gt;&amp;#39;format&amp;#39;&lt;/span&gt;

    &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;apply&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;callback&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;context&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
        &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;wrapper&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;a&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='o'&gt;**&lt;/span&gt;&lt;span class='n'&gt;ka&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
            &lt;span class='c'&gt;# Check if return data format is supported&lt;/span&gt;
            &lt;span class='n'&gt;accept&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;request&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;environ&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;get&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;#39;HTTP_ACCEPT&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
            &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='n'&gt;accept&lt;/span&gt; &lt;span class='ow'&gt;not&lt;/span&gt; &lt;span class='ow'&gt;in&lt;/span&gt; &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='s'&gt;&amp;#39;application/json&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s'&gt;&amp;#39;application/atom+xml&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;]:&lt;/span&gt;
                &lt;span class='c'&gt;# 406 Not Acceptable: the client asked for a format we cannot produce&lt;/span&gt;
                &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;HTTPError&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;406&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;Unsupported data format&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
            
            &lt;span class='c'&gt;# Execute the action    &lt;/span&gt;
            &lt;span class='n'&gt;rv&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;callback&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;a&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='o'&gt;**&lt;/span&gt;&lt;span class='n'&gt;ka&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
            
            &lt;span class='c'&gt;# Write out results&lt;/span&gt;
            &lt;span class='n'&gt;response&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;content_type&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;accept&lt;/span&gt;
            &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='n'&gt;accept&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='s'&gt;&amp;#39;application/json&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
                &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;json&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;dumps&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;rv&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
            &lt;span class='k'&gt;elif&lt;/span&gt; &lt;span class='n'&gt;accept&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='s'&gt;&amp;#39;application/atom+xml&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
                &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;render_xml&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;rv&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
            &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;rv&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;wrapper&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
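&lt;p&gt;The plugin above compares the whole Accept header against the supported types, which only works when the client sends exactly one media type. Real clients often send a comma-separated list with parameters, so a slightly more forgiving lookup could be sketched as follows. The helper name is made up, and ignoring q-values is a deliberate simplification rather than correct content negotiation.&lt;/p&gt;

```python
SUPPORTED = ["application/json", "application/atom+xml"]


def pick_format(accept_header, supported=SUPPORTED):
    """Return the first supported media type named in an Accept header, or None.

    Simplification: q-values are ignored and wildcards fall back to the default.
    """
    for part in (accept_header or "*/*").split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in supported:
            return media_type
        if media_type in ("*/*", "application/*"):
            return supported[0]
    return None


print(pick_format("text/html, application/atom+xml;q=0.9"))
```

&lt;p&gt;Inside the plugin, a &lt;code&gt;None&lt;/code&gt; result would be the case to answer with an error.&lt;/p&gt;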
&lt;p&gt;When it comes to performance we haven&amp;#8217;t maxed it out yet. Usually we tunnel multiple Bottle applications through an Nginx pool using uWSGI as a web server, which we chose because of the &lt;a href='http://nichol.as/benchmark-of-python-web-servers'&gt;benchmarks&lt;/a&gt; done by Nicholas Piël, where you&amp;#8217;ll see that uWSGI demonstrates amazing performance. For this article I did some benchmarks on a VirtualBox VM with one core and I was getting at least 2000 req/s from an API talking to MySQL and MongoDB servers to fetch different data.&lt;/p&gt;

&lt;h3 id='conclusion'&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;If you haven&amp;#8217;t tried Bottle before I suggest you do - installation is as easy as &lt;code&gt;easy_install bottle&lt;/code&gt; and you can be up and running in just a few minutes. It worked well for us as it allowed creating web services quickly and customizing certain behaviours, but even with defaults it was suitable for most of the use cases. If you need help getting up and running, you can always get in &lt;a href='http://webspecies.co.uk/contact'&gt;touch&lt;/a&gt; with us.&lt;/p&gt;</content>
   <author>
    <name>Juozas</name>
    <email>juozas@juokaz.com</email>
   </author>
 </entry>
 
 <entry>
   <title>Symfony2 - the best framework today?</title>
   <link href="http://blog.webspecies.co.uk/2011-06-07/symfony2-the-best-framework-today.html" />
   <updated>2011-06-07T15:55:14-07:00</updated>
   <id>http://blog.webspecies.co.uk/2011-06-07/symfony2-the-best-framework-today</id>
   <content type="html">&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/symfony2.png' alt='Symfony2' /&gt;&lt;/div&gt;
&lt;p&gt;I used to use Zend Framework extensively and still believe it&amp;#8217;s the best framework for anything that doesn&amp;#8217;t support PHP 5.3. However, a couple of months ago I started using &lt;a href='http://symfony.com'&gt;Symfony2&lt;/a&gt; for internal tools at Web Species and have stayed with it since. It has its problems and flaws, but let me give you some thoughts on why I think it&amp;#8217;s the framework which is going to go big. Very big.&lt;/p&gt;
&lt;!--more--&gt;
&lt;p&gt;Frameworks are big creatures and naming interesting features can take thousands of words, so this is just a short glimpse of the few things I find interesting. Obviously there is much more to it. Hopefully you won&amp;#8217;t find yourself feeling like this guy:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I have nothing against Symfony2, I&amp;#8217;ve been using it and it&amp;#8217;s great. But this blog post is nothing but a gushing verbal sex exposition with Symfony as the subject.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id='what_i_like_about_it'&gt;What I like about it&lt;/h3&gt;

&lt;p&gt;First of all, once you grasp how it works, it starts to produce great results. I was really sceptical at first, because it seemed very complicated and over-engineered, but after a few days of work I started liking how it works. And that&amp;#8217;s the thing about Symfony2 - after some time it starts to feel great to work with; it even makes working with PHP interesting again. Once you know how to configure it properly it starts to play along.&lt;/p&gt;

&lt;p&gt;Not a big secret, but I&amp;#8217;m quite a Doctrine user and the fact that Symfony2 integrates with Doctrine2 out-of-the-box just makes things easier. For example working with forms is way simpler now because it can create forms out of entities&amp;#8217; definitions and also &lt;a href='http://symfony.com/doc/current/book/forms.html#forms-and-doctrine'&gt;populate&lt;/a&gt; data to entities. And integration for Doctrine CLI is obviously there, so the only thing you need to do is to specify connection properties and everything just works, in theory.&lt;/p&gt;

&lt;p&gt;But that&amp;#8217;s all code which is replicable to other frameworks; the most impressive bit is bundles. Bundles are small self-contained plugins allowing sharing some functionality between different projects. And GitHub is just &lt;a href='http://symfony2bundles.org/'&gt;flourishing&lt;/a&gt; with all sorts of different ones, showing how much people are interested in this framework and are actively using it. Using bundles makes Symfony2 core smaller, but also gives more flexibility with how you bootstrap a new project.&lt;/p&gt;

&lt;h3 id='business_reasons'&gt;Business reasons&lt;/h3&gt;

&lt;p&gt;As much as I&amp;#8217;m still involved in writing code, in the end I need to make business decisions about the tools, frameworks and languages we use. And the tools I tend to go with are the tools I believe can stay around for years to come and are, obviously, high quality and &lt;em&gt;popular&lt;/em&gt;. Without any doubt Symfony2 is the fastest growing framework and quite quickly should become the most popular of all.&lt;/p&gt;

&lt;p&gt;There are a lot of not as well-known frameworks out there which are quite good and work somewhat well, but the reason I&amp;#8217;m not using them is because they are not popular enough, which makes me question if they are actually that good. It&amp;#8217;s hard to find developers with experience, it&amp;#8217;s hard to find blog posts and discussions online and I&amp;#8217;m unsure how long they are going to stay with their limited contributors list. Symfony2 makes me feel &lt;a href='http://symfony.com/contributors'&gt;safe&lt;/a&gt; so far.&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s hard to answer why, but to me Symfony2 looks like the most professional and/or professionally-developed framework. Partly because I personally know a lot of the companies and people working on it and I know they can deliver, but also because the code quality they produce is amazing. And with the number of developers constantly contributing I can see it being actively developed and becoming even more solid.&lt;/p&gt;

&lt;h3 id='the_bad_parts'&gt;The bad parts&lt;/h3&gt;

&lt;p&gt;A lot of the things in this framework still have to be proven to be the right decisions. I know a lot of people who love Symfony2, but at the same time there is a big group who just hate it, and they have different reasons. Maybe because it has a very steep learning curve, at least now, but also because of some of the design ideas.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve mentioned this briefly in my blog post about &lt;a href='http://blog.webspecies.co.uk/2011-05-23/the-new-era-of-php-frameworks.html'&gt;frameworks&lt;/a&gt;, which is the fact that I see a lot of Java patterns and ideas being used in PHP. This is a big topic and I might blog about it sometime, although I have nothing against Java &lt;em&gt;(he he)&lt;/em&gt;, but even things like Dependency injection containers just don&amp;#8217;t feel &lt;em&gt;right&lt;/em&gt; (when used in PHP).&lt;/p&gt;

&lt;p&gt;A DiC does work fine when you can load the object graph into memory and provide dependencies when needed, but the nature of PHP is very different from Java and that graph needs to be built on each and every request. Of course the performance problem, or impact, to be exact, can be solved in some ways, but the difference is still there. So as much as I like this idea, I&amp;#8217;m not yet fully confident that it will work well.&lt;/p&gt;

&lt;p&gt;This brings us to another problem with Symfony2 - the amount of configuration involved. When I was starting with Symfony2 it took me quite some time to figure out how to achieve even simple things. Maybe the framework went too far with the removal of magic; right now it feels quite demanding, asking for everything to be configured explicitly. And once an exception is thrown somewhere deep in the core, good luck figuring out why it is happening.&lt;/p&gt;

&lt;h3 id='conclusion'&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;I think we made the right decision choosing this as a base for our web apps, and we are starting to receive more and more inquiries about the &lt;a href='http://webspecies.co.uk/training/symfony'&gt;Symfony training&lt;/a&gt; courses and consulting we do, which just shows that its popularity is growing and growing. It does have some issues and sometimes just feels clunky, but overall it allows producing high quality projects and being sure about the results. I feel confident investing in Symfony2.&lt;/p&gt;</content>
   <author>
    <name>Juozas</name>
    <email>juozas@juokaz.com</email>
   </author>
 </entry>
 
 <entry>
   <title>Lazy evaluation with PHP</title>
   <link href="http://blog.webspecies.co.uk/2011-05-31/lazy-evaluation-with-php.html" />
   <updated>2011-05-31T15:07:14-07:00</updated>
   <id>http://blog.webspecies.co.uk/2011-05-31/lazy-evaluation-with-php</id>
   <content type="html">&lt;p&gt;Recently I needed to process a huge array of data, and because of PHP’s somewhat inefficient variables, and especially arrays, that was resulting in &amp;#8220;out of memory&amp;#8221; errors. However, I couldn&amp;#8217;t use any tools other than PHP, so I was forced to come up with a solution in it. Here is how I solved it using principles from functional languages.&lt;/p&gt;
&lt;!--more--&gt;&lt;blockquote&gt;
In programming language theory, lazy evaluation or call-by-need is an evaluation strategy which delays the evaluation of an expression until its value is actually required (non-strict evaluation) and also avoid repeated evaluations (sharing). The sharing can reduce the running time of certain functions by an exponential factor over other non-strict evaluation strategies, such as call-by-name.
&lt;/blockquote&gt;
&lt;p&gt;What you are about to read is not necessarily about gaining extra performance; most of the time it’s going to stay the same. The goal is to minimize memory usage as much as possible. So if you have a million rows to process, each weighing 10 KB, the memory usage always stays at around 10 KB. &lt;em&gt;Theoretically, that is; PHP is not that perfect at cleaning up memory.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id='the_problem'&gt;The problem&lt;/h3&gt;

&lt;p&gt;Let’s imagine that you need to process those million database records. How are you going to do it? Well, obviously the first step is to fetch them from a database. &lt;strong&gt;Stop!&lt;/strong&gt; I can’t even remember how many times I’ve seen people making mistakes here… Why? Eager evaluation; look up this term. Wait, I&amp;#8217;m going to make sure you know what I&amp;#8217;m talking about, so read about it &lt;a href='http://en.wikipedia.org/wiki/Eager_evaluation'&gt;here&lt;/a&gt; on Wikipedia.&lt;/p&gt;

&lt;p&gt;It all comes down to using the &lt;a href='http://php.net/manual/en/pdostatement.fetchall.php'&gt;&lt;code&gt;fetchAll()&lt;/code&gt;&lt;/a&gt; method - the script was fetching all orders in a specified time range from a database. &lt;code&gt;fetchAll()&lt;/code&gt; returns an array with all results matching the query, which requires quite some memory if the number of results is in the thousands. Later the script was doing some calculations and creating a second results array, now with processed data. At the end of the main loop memory usage was &lt;code&gt;row count * size per row * 2&lt;/code&gt;, a very big number.&lt;/p&gt;

&lt;p&gt;It’s very rarely beneficial to do it like this - most of the time data can be processed per row, removing the need to store everything in memory. Sometimes SQL queries won’t allow it, but even those can be changed to make it possible to abandon pre-computation and work with data as streams. Once you have a data stream, you can start working with it using pipelines.&lt;/p&gt;

&lt;h3 id='functional_languages'&gt;Functional languages&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/haskell.jpg' alt='Haskell logo' /&gt;&lt;/div&gt;
&lt;p&gt;Before looking at PHP solutions for this problem, let&amp;#8217;s analyse how it can be done in a functional language. Imagine a function returning all &lt;a href='http://en.wikipedia.org/wiki/Fibonacci_number'&gt;Fibonacci numbers&lt;/a&gt;; here is an implementation in Haskell (from the Haskell &lt;a href='http://www.haskell.org/haskellwiki/Haskell/Lazy_evaluation'&gt;documentation&lt;/a&gt;):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='haskell'&gt;&lt;span class='nf'&gt;magic&lt;/span&gt; &lt;span class='ow'&gt;::&lt;/span&gt; &lt;span class='kt'&gt;Int&lt;/span&gt; &lt;span class='ow'&gt;-&amp;gt;&lt;/span&gt; &lt;span class='kt'&gt;Int&lt;/span&gt; &lt;span class='ow'&gt;-&amp;gt;&lt;/span&gt; &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='kt'&gt;Int&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt;
&lt;span class='nf'&gt;magic&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt; &lt;span class='kr'&gt;_&lt;/span&gt; &lt;span class='ow'&gt;=&lt;/span&gt; &lt;span class='kt'&gt;[]&lt;/span&gt;
&lt;span class='nf'&gt;magic&lt;/span&gt; &lt;span class='n'&gt;m&lt;/span&gt; &lt;span class='n'&gt;n&lt;/span&gt; &lt;span class='ow'&gt;=&lt;/span&gt; &lt;span class='n'&gt;m&lt;/span&gt;&lt;span class='kt'&gt;:&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;magic&lt;/span&gt; &lt;span class='n'&gt;n&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;m&lt;/span&gt;&lt;span class='o'&gt;+&lt;/span&gt;&lt;span class='n'&gt;n&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;By calling it with &lt;code&gt;magic 1 1&lt;/code&gt; it will return a list &lt;code&gt;[1,1,2,3,5,8,…]&lt;/code&gt;. But the important part is that this list or the return value is &lt;em&gt;infinite&lt;/em&gt;. There is no boundary like &amp;#8220;return 100 numbers&amp;#8221;, it will actually return all possible numbers. You might say that’s impossible and you are kind-of right, especially if you haven’t worked with similar functional languages before.&lt;/p&gt;

&lt;p&gt;Because Haskell uses lazy evaluation (with strict being an option), calling this method doesn’t actually compute anything. Instead it creates a generator-like resource from which you can read as many numbers as you want, and as long as you are reading them it’s computing next one. So with a function like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='haskell'&gt;&lt;span class='nf'&gt;getIt&lt;/span&gt; &lt;span class='ow'&gt;::&lt;/span&gt; &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='kt'&gt;Int&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt; &lt;span class='ow'&gt;-&amp;gt;&lt;/span&gt; &lt;span class='kt'&gt;Int&lt;/span&gt; &lt;span class='ow'&gt;-&amp;gt;&lt;/span&gt; &lt;span class='kt'&gt;Int&lt;/span&gt;
&lt;span class='nf'&gt;getIt&lt;/span&gt; &lt;span class='kt'&gt;[]&lt;/span&gt; &lt;span class='kr'&gt;_&lt;/span&gt; &lt;span class='ow'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;
&lt;span class='nf'&gt;getIt&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;x&lt;/span&gt;&lt;span class='kt'&gt;:&lt;/span&gt;&lt;span class='n'&gt;xs&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt; &lt;span class='ow'&gt;=&lt;/span&gt; &lt;span class='n'&gt;x&lt;/span&gt;
&lt;span class='nf'&gt;getIt&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;x&lt;/span&gt;&lt;span class='kt'&gt;:&lt;/span&gt;&lt;span class='n'&gt;xs&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='n'&gt;n&lt;/span&gt; &lt;span class='ow'&gt;=&lt;/span&gt; &lt;span class='n'&gt;getIt&lt;/span&gt; &lt;span class='n'&gt;xs&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;n&lt;/span&gt;&lt;span class='o'&gt;-&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We can get &lt;em&gt;Xth&lt;/em&gt; number of Fibonacci - call it like &lt;code&gt;getIt (magic 1 1) 5&lt;/code&gt; and output should be 5, because the 5th number of the sequence is 5. Important part here is that even though I’m passing a result of function &lt;code&gt;magic&lt;/code&gt; to the &lt;code&gt;getIt&lt;/code&gt; function, as mentioned above, &lt;code&gt;magic&lt;/code&gt; doesn’t need to compute anything to return. &lt;code&gt;getIt&lt;/code&gt; reads 5 numbers from that infinite list and terminates returning the last number.&lt;/p&gt;

&lt;h3 id='php_way'&gt;PHP way&lt;/h3&gt;

&lt;p&gt;Sadly, you can’t really do anything like that easily in PHP, because it doesn’t support lazy evaluation or &lt;a href='http://wiki.python.org/moin/Generators'&gt;generators&lt;/a&gt;. However it’s possible to improvise and have a working solution. And the solution is… &lt;a href='http://php.net/manual/en/class.iterator.php'&gt;&lt;code&gt;Iterators&lt;/code&gt;&lt;/a&gt;. One of the most underused pieces of functionality in PHP.&lt;/p&gt;

&lt;p&gt;Any class which implements the &lt;code&gt;Iterator&lt;/code&gt; interface can be used in &lt;code&gt;foreach&lt;/code&gt; as a data source. Let me give you a short example of an iterator generating all numbers from 0 to infinity:&lt;/p&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/lazy.gif' alt='Lazy' /&gt;&lt;/div&gt;&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;Numbers&lt;/span&gt; &lt;span class='k'&gt;implements&lt;/span&gt; &lt;span class='nx'&gt;Iterator&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;private&lt;/span&gt; &lt;span class='nv'&gt;$position&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    
    &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;rewind&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;position&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;current&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;position&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;key&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;position&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;next&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='o'&gt;++&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;position&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;valid&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nv'&gt;$n&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;Numbers&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

&lt;span class='k'&gt;foreach&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$n&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;print&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='nx'&gt;PHP_EOL&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Running this script will yield a never-ending list of numbers. But you can also stop after 10 iterations and get the 10th number without ever needing to store the whole sequence in memory. Of course this is going to be slower than computing that number directly, but using a similar approach you can process data one atomic unit per cycle without ever storing all of it in memory.&lt;/p&gt;
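
&lt;p&gt;As a rough sketch of the same idea, the Haskell &lt;code&gt;magic&lt;/code&gt; function could be approximated with an iterator. The &lt;code&gt;Fibonacci&lt;/code&gt; class below is hypothetical and assumes PHP 8 syntax, while &lt;code&gt;LimitIterator&lt;/code&gt; and &lt;code&gt;iterator_to_array()&lt;/code&gt; are part of PHP itself:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
// Hypothetical class (PHP 8 syntax assumed); it mirrors the Haskell magic function.
final class Fibonacci implements Iterator
{
    private int $index = 0;
    private int $current;
    private int $next;

    public function __construct(private int $first = 1, private int $second = 1)
    {
        $this-&amp;gt;rewind();
    }

    public function rewind(): void
    {
        $this-&amp;gt;index = 0;
        $this-&amp;gt;current = $this-&amp;gt;first;
        $this-&amp;gt;next = $this-&amp;gt;second;
    }

    public function current(): mixed
    {
        return $this-&amp;gt;current;
    }

    public function key(): mixed
    {
        return $this-&amp;gt;index;
    }

    public function next(): void
    {
        // Slide the pair (m, n) forward to (n, m + n), as magic does.
        [$this-&amp;gt;current, $this-&amp;gt;next] = [$this-&amp;gt;next, $this-&amp;gt;current + $this-&amp;gt;next];
        ++$this-&amp;gt;index;
    }

    public function valid(): bool
    {
        return true; // the sequence never ends
    }
}

// LimitIterator stops after five values, so the infinite sequence is never stored.
$firstFive = iterator_to_array(new LimitIterator(new Fibonacci(1, 1), 0, 5), false);
print implode(', ', $firstFive) . PHP_EOL; // prints: 1, 1, 2, 3, 5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The fifth value is 5, the same answer as &lt;code&gt;getIt (magic 1 1) 5&lt;/code&gt;.&lt;/p&gt;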

&lt;p&gt;If you need to render data in a view, again remember the memory - do not compute the result in a model (or, god forbid, a controller) and then pass it to a view. Create an iterator in the model and return it for consumption in the view. For the view it is the same thing - an array and an iterator are both iterable, but you are saving a lot of otherwise wasted memory.&lt;/p&gt;
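
&lt;p&gt;One possible shape for this, as a sketch only: the model wraps its data source in an iterator that transforms each row when it is read, and the view loops over it. &lt;code&gt;MappedRows&lt;/code&gt; and &lt;code&gt;findOrders()&lt;/code&gt; are made-up names, PHP 8 syntax is assumed, and an &lt;code&gt;ArrayIterator&lt;/code&gt; stands in for a real row-by-row source:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
// Hypothetical helper: applies $transform to a row only when that row is read.
final class MappedRows extends IteratorIterator
{
    private Closure $transform;

    public function __construct(Traversable $rows, Closure $transform)
    {
        parent::__construct($rows);
        $this-&amp;gt;transform = $transform;
    }

    public function current(): mixed
    {
        return ($this-&amp;gt;transform)(parent::current());
    }
}

// "Model": returns an iterator instead of a fully built array.
function findOrders(Traversable $source): Iterator
{
    return new MappedRows($source, function (array $row): array {
        $row['label'] = 'Order #' . $row['id'];
        return $row;
    });
}

// "View": consumes the result with foreach, exactly as it would an array.
// Assumption: ArrayIterator stands in for a real row-by-row data source.
foreach (findOrders(new ArrayIterator([['id' =&amp;gt; 1], ['id' =&amp;gt; 2]])) as $order) {
    print $order['label'] . PHP_EOL;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;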

&lt;h3 id='database_side'&gt;Database side&lt;/h3&gt;

&lt;p&gt;Aggregation &lt;em&gt;should&lt;/em&gt; always happen on the database side, when possible. So if you need a total of all items sold for each order - calculate that using a &lt;code&gt;SUM()&lt;/code&gt; function rather than doing it when iterating over results. To do it in PHP you would need to look back or forward in the result set, and that breaks lazy evaluation immediately.&lt;/p&gt;

&lt;p&gt;First of all, fetch data with a normal &lt;code&gt;fetch()&lt;/code&gt; method in a &lt;code&gt;while&lt;/code&gt; loop, rather than iterating over a &lt;code&gt;fetchAll()&lt;/code&gt; result. Nothing will break, hopefully, but instead of building an array of all results and then processing them, the script will process each row and release it from memory. MySQL returns a cursor to the result set and the PHP driver will use that to get row after row.&lt;/p&gt;
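
&lt;p&gt;A minimal sketch of that loop with PDO follows. The table, its columns and the in-memory SQLite database are assumptions made only to keep the example self-contained (it needs the &lt;code&gt;pdo_sqlite&lt;/code&gt; extension). With PDO&amp;#8217;s MySQL driver, results are buffered on the client by default, so you may also need to set &lt;code&gt;PDO::MYSQL_ATTR_USE_BUFFERED_QUERY&lt;/code&gt; to &lt;code&gt;false&lt;/code&gt; to actually stream rows:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
// Assumption: an in-memory SQLite table stands in for the real orders table.
$pdo = new PDO('sqlite::memory:');
$pdo-&amp;gt;setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo-&amp;gt;exec('CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)');
$pdo-&amp;gt;exec("INSERT INTO orders (customer) VALUES ('Anna'), ('Ben'), ('Cara')");

$statement = $pdo-&amp;gt;query('SELECT id, customer FROM orders ORDER BY id');

// Eager version, for comparison: $statement-&amp;gt;fetchAll() builds the whole array first.
// Lazy version: only the current row is held in $row.
while (($row = $statement-&amp;gt;fetch(PDO::FETCH_ASSOC)) !== false) {
    print $row['id'] . ': ' . $row['customer'] . PHP_EOL;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;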

&lt;p&gt;You might wonder how to render pages like a list of orders with items. It’s quite easy, just join the orders and items tables and order the results by &lt;code&gt;order id&lt;/code&gt;. The data you will be getting back will look something like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='x'&gt;Order ID   Item ID   Item name&lt;/span&gt;
&lt;span class='x'&gt;1          156       Milk&lt;/span&gt;
&lt;span class='x'&gt;1          897       Bread&lt;/span&gt;
&lt;span class='x'&gt;2          156       Milk&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To render this just iterate over results and as soon as &lt;code&gt;order id&lt;/code&gt; changes from the previous one, start a new HTML table for an order. Using this approach you can render the whole history of all orders, with information about items too, without ever using more memory than the size of one row. The only limiting factor is then the size of the response HTML.&lt;/p&gt;
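
&lt;p&gt;The grouping step could be sketched roughly like this. The array keys and the HTML layout are illustrative assumptions, and an &lt;code&gt;ArrayIterator&lt;/code&gt; stands in for a statement that is read row by row:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
// Assumption: ArrayIterator stands in for rows fetched one at a time, sorted by order id.
$rows = new ArrayIterator([
    ['order_id' =&amp;gt; 1, 'item_id' =&amp;gt; 156, 'name' =&amp;gt; 'Milk'],
    ['order_id' =&amp;gt; 1, 'item_id' =&amp;gt; 897, 'name' =&amp;gt; 'Bread'],
    ['order_id' =&amp;gt; 2, 'item_id' =&amp;gt; 156, 'name' =&amp;gt; 'Milk'],
]);

$previousOrderId = null;
foreach ($rows as $row) {
    // A changed order id means the previous table is done and a new one starts.
    if ($row['order_id'] !== $previousOrderId) {
        if ($previousOrderId !== null) {
            print '&amp;lt;/table&amp;gt;' . PHP_EOL;
        }
        print '&amp;lt;table&amp;gt;&amp;lt;caption&amp;gt;Order ' . $row['order_id'] . '&amp;lt;/caption&amp;gt;' . PHP_EOL;
        $previousOrderId = $row['order_id'];
    }
    print '&amp;lt;tr&amp;gt;&amp;lt;td&amp;gt;' . $row['item_id'] . '&amp;lt;/td&amp;gt;&amp;lt;td&amp;gt;'
        . htmlspecialchars($row['name']) . '&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;' . PHP_EOL;
}
// Close the last table, if any rows were rendered at all.
if ($previousOrderId !== null) {
    print '&amp;lt;/table&amp;gt;' . PHP_EOL;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Only the previous order id is remembered between rows, so memory use does not grow with the number of orders.&lt;/p&gt;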

&lt;h3 id='conclusion'&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Obviously PHP wasn’t really created for anything like this, but situations when you need to make it work do happen. Make sure you are not building up any data or fetching it eagerly, and release resources as soon as you are done. This solved all the problems for me and all of a sudden the same script was processing all the data without running out of memory.&lt;/p&gt;</content>
   <author>
    <name>Juozas</name>
    <email>juozas@juokaz.com</email>
   </author>
 </entry>
 
 <entry>
   <title>HTML5 History API - dynamic websites like never before</title>
   <link href="http://blog.webspecies.co.uk/2011-05-26/html5-history-api-dynamic-websites-like-never-before.html" />
   <updated>2011-05-26T13:47:13-07:00</updated>
   <id>http://blog.webspecies.co.uk/2011-05-26/html5-history-api-dynamic-websites-like-never-before</id>
   <content type="html">&lt;p&gt;I have talked about this &lt;a href='http://blog.webspecies.co.uk/2011-05-05/javascript-is-not-your-boss.html'&gt;before&lt;/a&gt;, but JavaScript should not dictate content or website structure. It should only improve the UI, and even with JavaScript disabled the website should work. The new HTML5 History API lets you take that one step further - making dynamic websites behave like &lt;em&gt;normal&lt;/em&gt; ones.&lt;/p&gt;
&lt;!--more--&gt;
&lt;h3 id='what_is_history_api'&gt;What is History API?&lt;/h3&gt;

&lt;p&gt;History API is quite a simple concept - a JavaScript API you can use to control history state. Without it, if a user clicks on an image and you show a lightbox with an enlarged version, clicking &lt;code&gt;Back&lt;/code&gt; sends the user back to the previous page, rather than closing the lightbox popup. This happens because there is no state information the browser can use to know how to close that popup window.&lt;/p&gt;

&lt;p&gt;Using the History API you can add an entry to the history stack once some dynamic content is loaded. When the user clicks &lt;code&gt;Back&lt;/code&gt; the browser goes back by one element in the history stack and fires off an event, which you can then handle by closing the popup window. The same thing happens when the user clicks &lt;code&gt;Forward&lt;/code&gt; - an event fires and it’s up to your script to handle this gracefully.&lt;/p&gt;

&lt;p&gt;All this behavior is very well explained in the &lt;em&gt;&amp;#8220;Dive into HTML5&amp;#8221;&lt;/em&gt; &lt;a href='http://diveintohtml5.org/history.html'&gt;book&lt;/a&gt;, but in short: when you load some content using Ajax, or you move the user to a place in a page which you want to be linkable - use the History API. Of course this involves handling the links on the server side too, but this is quite trivial - build the website first, then make use of the History API to improve the interface. In my tests this improved the user experience a lot and made it possible to achieve very rich user interfaces without destroying website structure.&lt;/p&gt;

&lt;p&gt;Of course, like most of the new HTML5 functionality, this is not &lt;a href='http://caniuse.com/#search=history'&gt;supported&lt;/a&gt; in all browsers. Most importantly, it is not supported at all in any IE version, not even IE9, which was released only a couple of months ago. But you can handle this by having an improved UI for some users and falling back to normal non-JS behavior for IE; for example, this is what GitHub &lt;a href='http://github.com/webspecies/blog.webspecies.co.uk'&gt;does&lt;/a&gt; (click on folders).&lt;/p&gt;

&lt;h3 id='how_to_use_it'&gt;How to use it?&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/html5.png' alt='HTML5' /&gt;&lt;/div&gt;
&lt;p&gt;There is only one main method:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='javascript'&gt;&lt;span class='nx'&gt;history&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;pushState&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;state&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;title&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;link&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;and one event:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='javascript'&gt;&lt;span class='nb'&gt;window&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;addEventListener&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;popstate&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kd'&gt;function&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;e&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nx'&gt;alert&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;location: &amp;quot;&lt;/span&gt; &lt;span class='o'&gt;+&lt;/span&gt; &lt;span class='nb'&gt;document&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;location&lt;/span&gt; &lt;span class='o'&gt;+&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;, state: &amp;quot;&lt;/span&gt; &lt;span class='o'&gt;+&lt;/span&gt; &lt;span class='nx'&gt;JSON&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;stringify&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;e&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;state&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;
&lt;span class='p'&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;State example from the Mozilla &lt;a href='https://developer.mozilla.org/en/DOM/window.onpopstate'&gt;documentation&lt;/a&gt;:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='javascript'&gt;&lt;span class='nx'&gt;history&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;pushState&lt;/span&gt;&lt;span class='p'&gt;({&lt;/span&gt;&lt;span class='nx'&gt;page&lt;/span&gt;&lt;span class='o'&gt;:&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;},&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;title 1&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;?page=1&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='nx'&gt;history&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;pushState&lt;/span&gt;&lt;span class='p'&gt;({&lt;/span&gt;&lt;span class='nx'&gt;page&lt;/span&gt;&lt;span class='o'&gt;:&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt;&lt;span class='p'&gt;},&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;title 2&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;?page=2&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='nx'&gt;history&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;replaceState&lt;/span&gt;&lt;span class='p'&gt;({&lt;/span&gt;&lt;span class='nx'&gt;page&lt;/span&gt;&lt;span class='o'&gt;:&lt;/span&gt; &lt;span class='mi'&gt;3&lt;/span&gt;&lt;span class='p'&gt;},&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;title 3&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;?page=3&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='nx'&gt;history&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;back&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt; &lt;span class='c1'&gt;// alerts &amp;quot;location: http://example.com/example.html?page=1, state: {&amp;quot;page&amp;quot;:1}&amp;quot;&lt;/span&gt;
&lt;span class='nx'&gt;history&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;back&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt; &lt;span class='c1'&gt;// alerts &amp;quot;location: http://example.com/example.html, state: null&amp;quot;&lt;/span&gt;
&lt;span class='nx'&gt;history&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='nx'&gt;go&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;2&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;  &lt;span class='c1'&gt;// alerts &amp;quot;location: http://example.com/example.html?page=3, state: {&amp;quot;page&amp;quot;:3}&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;However I would recommend using some abstraction for this, mainly because you need to do quite some manual work, which I hate doing. A great tool exists called &lt;a href='https://github.com/balupton/history.js'&gt;History.js&lt;/a&gt; which does exactly that - abstracts History API and &lt;strong&gt;also&lt;/strong&gt; has fall-back support for lame browsers like IE. I’ve used it extensively and it works great and it even has adapters for jQuery et al so you can use the same interface.&lt;/p&gt;

&lt;p&gt;The most impressive part of it is the optional fall-back for older browsers. It uses a URL similar to a hashbang one, like &lt;code&gt;http://webspecies.co.uk/#/about&lt;/code&gt;, which it handles internally, and all methods for getting the current URL still return &lt;code&gt;/about&lt;/code&gt;. And if you go to this URL from a modern browser it detects the support for the proper History API and redirects to the correct URL. All combined, this makes everything work nicely and future-proof; here is an &lt;a href='https://gist.github.com/854622'&gt;example&lt;/a&gt; of a full script for an Ajax page.&lt;/p&gt;

&lt;h3 id='how_i_used_it'&gt;How I used it&lt;/h3&gt;

&lt;p&gt;When we were building our &lt;a href='http://webspecies.co.uk/'&gt;Web Species&lt;/a&gt; website, the designer had an idea to have all content in one scrollable page. If you go to the homepage and start scrolling you’ll notice it immediately - the menu stays in place while the content slides scroll as usual. Even though this is very nice from the user’s perspective, it creates a problem on the content side. &lt;strong&gt;A very big problem.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The main problem is that there is no way to link people to a specific slide and that it’s hard to get good rankings on Google. The first problem can be easily fixed by using anchor URLs (like &lt;code&gt;http://webspecies.co.uk/#about&lt;/code&gt;), but fixing SEO that way is, I believe, impossible: one page contains a lot of content, so it won’t rank well for any particular keyword.&lt;/p&gt;

&lt;p&gt;If you view the source of &lt;code&gt;http://webspecies.co.uk/about&lt;/code&gt; you’ll see the correct title and only the &amp;#8220;about&amp;#8221; content, but as soon as you load it in a browser, AJAX loads all the slides and replaces the existing slide with them. So from the user’s perspective the one-page effect is still there, but the actual pages can be requested directly (on the server side I have separate pages like about, training, clients etc., plus one with all the slides).&lt;/p&gt;

&lt;p&gt;To fix linking I employed the History API and it worked out beautifully. If you click a menu item on the left side of the website, an AJAX event fires off, resulting in two actions - &lt;code&gt;pushState()&lt;/code&gt; and a scroll to the correct slide. The same thing happens on &lt;code&gt;Back&lt;/code&gt; or &lt;code&gt;Forward&lt;/code&gt; history actions - the content is scrolled to where it belongs. If you go to the &lt;a href='http://webspecies.co.uk/news'&gt;news section&lt;/a&gt;, clicking on news items results in the same behaviour too.&lt;/p&gt;
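&lt;p&gt;The wiring described above can be sketched roughly like this. It is a minimal illustration, not the code running on our website: &lt;code&gt;slideFromPath()&lt;/code&gt;, &lt;code&gt;createNavigator()&lt;/code&gt; and the &lt;code&gt;scrollToSlide&lt;/code&gt; callback are hypothetical names, and it assumes that a path like &lt;code&gt;/about&lt;/code&gt; maps to a slide with the same name.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='javascript'&gt;// Hypothetical helper: turns a path like '/about' into a slide name like 'about'.
// Assumption: the empty path belongs to a slide called 'home'.
function slideFromPath(path) {
  var name = path.replace(/^\/+|\/+$/g, '');
  return name === '' ? 'home' : name;
}

// historyApi is expected to look like window.history;
// scrollToSlide is whatever function moves the page to a named slide.
function createNavigator(historyApi, scrollToSlide) {
  return {
    // Menu click: record the new URL in history, then scroll.
    open: function (path) {
      var slide = slideFromPath(path);
      historyApi.pushState({ slide: slide }, '', path);
      scrollToSlide(slide);
    },
    // Back/Forward: nothing is pushed, only scroll to the restored slide.
    restore: function (state, currentPath) {
      scrollToSlide(state ? state.slide : slideFromPath(currentPath));
    }
  };
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In a browser this would be created with something like &lt;code&gt;createNavigator(window.history, scrollToSlide)&lt;/code&gt;, calling &lt;code&gt;open()&lt;/code&gt; from the menu click handler and &lt;code&gt;restore(event.state, location.pathname)&lt;/code&gt; from a &lt;code&gt;popstate&lt;/code&gt; listener.&lt;/p&gt;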

&lt;h3 id='conclusion'&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;As with all HTML5 functionality, the History API is still quite rarely used, but if you feel like building something awesome - I’d say give it a go. It works beautifully in modern browsers and there are tools which can make it work in older browsers too. From my point of view, it lets you achieve crazy UI ideas and still have semantically correct websites, which is always my goal.&lt;/p&gt;</content>
   <author>
    <name>Juozas</name>
    <email>juozas@juokaz.com</email>
   </author>
 </entry>
 
 <entry>
   <title>The new era of PHP frameworks</title>
   <link href="http://blog.webspecies.co.uk/2011-05-23/the-new-era-of-php-frameworks.html" />
   <updated>2011-05-23T17:51:13-07:00</updated>
   <id>http://blog.webspecies.co.uk/2011-05-23/the-new-era-of-php-frameworks</id>
   <content type="html">&lt;p&gt;I have worked on a lot of different systems and projects over the years, and most of that time was spent doing PHP. Just recently, however, I have noticed a major turning point - a new era of PHP frameworks. It seems like everything is changing these days. I want to discuss what I think the current state is, what’s wrong with it and how the new &lt;em&gt;gang&lt;/em&gt; of frameworks is going to change it.&lt;/p&gt;
&lt;!--more--&gt;
&lt;p&gt;On May 21st I delivered a talk on this topic at the Dutch PHP Conference (DPC for short) and had very interesting discussions afterwards. To start with, &lt;a href='http://www.slideshare.net/juokaz/the-new-era-of-php-frameworks-dpc'&gt;here&lt;/a&gt; are the slides of the talk; keep in mind that they do not work that well without me talking. This article is a brief version of what I talked about.&lt;/p&gt;

&lt;h3 id='frameworks_are_born'&gt;Frameworks are born&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img src='/media/useframeworks.png' alt='I use PHP frameworks' /&gt;&lt;p class='wp-caption-text'&gt;I use PHP frameworks&lt;/p&gt;&lt;/div&gt;
&lt;p&gt;Six years ago CakePHP, one of the first PHP frameworks, was released, and since then we’ve seen a plethora of PHP frameworks. Currently there are about… a million of them, all with their own MVC, DBAL and templating implementations. I like them, even with their quirks, but their adoption is still not massive.&lt;/p&gt;

&lt;p&gt;If you look at the number of open-source PHP projects &lt;strong&gt;based&lt;/strong&gt; on frameworks, you’ll see that there are only a few. Which is sad. Part of the reason is that a lot of those projects were released before any PHP frameworks even existed, but it is also because working with a PHP framework requires quite a lot of learning. So basing a project on a framework steepens its learning curve, at least in most cases.&lt;/p&gt;

&lt;p&gt;Nonetheless, they have changed how we do PHP. A lot of developers claimed they knew OOP, but when frameworks came along they were forced to actually prove it (before that you could hack in any way you wanted). Frameworks have taught millions of PHP developers what real OOP is and how it works. Ask someone to use &lt;code&gt;mysql_query&lt;/code&gt; nowadays and you might get punched in the face. Twice. Because they would also need to use &lt;code&gt;mysql_real_escape_string&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id='how_it_was_done'&gt;How was it done?&lt;/h3&gt;

&lt;p&gt;No one really knew what PHP frameworks should be, nor what features they should have. So how did people manage to make them happen? Well, they either followed existing frameworks in other languages (like RoR) or came up with their own ideas. Because the experience was not there, most frameworks to this day carry legacy designs which everyone knows are bad, but which are impossible to fix.&lt;/p&gt;

&lt;p&gt;The pragmatic approach of PHP developers helped a lot here - just as PHP evolved as a language, PHP frameworks changed and grew, driven by feedback and requests. Within a couple of years most people were happy with what we had. But look closely: in 2007 we had Zend Framework 1.0, which had a very limited feature set compared to 1.11. So even today frameworks are evolving rapidly to keep up with feature requests.&lt;/p&gt;

&lt;p&gt;PHP 4 was supported by all the frameworks (and, surprisingly, some of them still support it). This led to a lot of code which is now very outdated, especially in its OOP paradigms. Trying to keep supporting it has complicated the process of implementing new features and fixing bugs. Furthermore, fewer and fewer developers want to work on such old code.&lt;/p&gt;

&lt;h3 id='whats_broken'&gt;What’s broken?&lt;/h3&gt;
&lt;div class='alignleft'&gt;&lt;img src='/media/phpmagickkills.png' alt='Magic kills' /&gt;&lt;p class='wp-caption-text'&gt;Magic kills&lt;/p&gt;&lt;/div&gt;
&lt;p&gt;First of all, back then it was very popular to use PHP’s magic methods (&lt;code&gt;__get&lt;/code&gt;, &lt;code&gt;__call&lt;/code&gt; etc.). At first glance there is nothing wrong with them, but they are actually really dangerous. They make APIs unclear and auto-completion impossible, and most importantly they are slow. They were used to hack PHP into doing things it didn’t want to do. And it worked. But it made bad things happen.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;SCOP&lt;/em&gt;, static class oriented programming, is a term I invented to describe most PHP code. Static methods are bad for a lot of reasons I’m not going to dive into today, but most importantly, if a class is just a collection of static methods, it’s nowhere near OOP. It’s just using a class as a container for functions. There are even whole frameworks doing just this.&lt;/p&gt;

&lt;p&gt;Zend Framework was my favourite PHP framework for a long time (and still is for PHP 5.2), but my main issue with it is that it’s trying too hard to be a component library. Other frameworks followed the same path - rather than using existing libraries, they wrote their own. This has created many PHP libraries which are kind of standalone, but still require downloading the whole framework. Fat frameworks are really annoying.&lt;/p&gt;

&lt;h3 id='the_new_era_in_2011'&gt;The new era in 2011&lt;/h3&gt;

&lt;p&gt;To improve this situation people have chosen to do a couple of things, the main one being to rewrite frameworks from scratch on top of PHP 5.3. This makes it possible to establish new standards, agree on interfaces between all frameworks and throw away all (most) legacy problems. It sounds easy, but only by doing these things can we enter the new era of frameworks.&lt;/p&gt;

&lt;p&gt;I hadn’t used any frameworks before CakePHP was born, so I’m going to use it as a landmark. Actually, I even doubt anything existed before CakePHP, unless of course you call Drupal a framework. Six years have passed since the CakePHP days, and I mark those as the first era. 2011 marks the second, in which completely new things will finally happen, mainly in the form of releases and announcements.&lt;/p&gt;

&lt;p&gt;Interestingly, in 2011 PHP is no longer PHP. Or no longer just PHP. The LAMP stack is boring and used less and less now that new tools like Nginx and CouchDB are available. Today integration and interoperability are crucial. The same goes for the language - PHP 5.3 is a completely new beast which makes amazing functionality possible, and if you target PHP 5.3 only there is no real backwards compatibility to maintain.&lt;/p&gt;

&lt;h3 id='lets_fix_it_then_shall_we'&gt;Let’s fix it then, shall we?&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img src='/media/usegit.png' alt='Use git' /&gt;&lt;p class='wp-caption-text'&gt;Use git&lt;/p&gt;&lt;/div&gt;
&lt;p&gt;Moving to Git has made it easier to contribute to a lot of frameworks. I’m most impressed with Symfony’s results, because they have managed to attract a huge number of contributors (see the graph &lt;a href='http://github.com/symfony/symfony/graphs/impact'&gt;here&lt;/a&gt;). The current pace is much faster, and frameworks are improving at much higher rates than a few years ago.&lt;/p&gt;

&lt;p&gt;A lot of trimming has been done. First of all, the magic I mentioned earlier is now gone and explicit definitions are used all over. Furthermore, there is more thinking towards having a small core, with additional features coming through extensions and libraries. This is a great way to make frameworks easier to work with and to reduce their memory footprint.&lt;/p&gt;

&lt;p&gt;Performance was a major issue for PHP frameworks, and most of them had it in their plans for new releases. The front-end received a lot of attention too, with frameworks like Symfony helping with asset (JavaScript and CSS) management and proper headers for static content. The PHP side profited from the removed magic and code clean-up, and PHP 5.3 added a huge performance boost.&lt;/p&gt;

&lt;h3 id='new_features'&gt;New features&lt;/h3&gt;

&lt;p&gt;Obviously all the new language features are being incorporated. Namespaces, for example. Namespace support led to the need to research and agree on an autoloading approach which would work for most of the frameworks. &lt;a href='http://groups.google.com/group/php-standards/web/psr-0-final-proposal'&gt;PSR-0&lt;/a&gt; was born earlier, but is now well integrated into frameworks. Anonymous functions are finding their way in too.&lt;/p&gt;

&lt;p&gt;&lt;a href='http://pimple-project.org/'&gt;Dependency injection containers&lt;/a&gt; and &lt;a href='http://www.doctrine-project.org/projects/common/2.0/docs/reference/annotations/en'&gt;Annotations&lt;/a&gt; are two I’d like to mention here; both change how you code. I like using them and Symfony makes great use of them, while other frameworks are catching up and should start to incorporate them soon. The combination of those and the new PHP features makes it possible to create very clean micro-frameworks; look at &lt;a href='http://silex-project.org/'&gt;Silex&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I’m not entirely sure that I like the growing list of features ported directly from the Java world. Java works differently (and requires 1GB of RAM), so even a DiC is tricky in PHP. We’ll see where it gets us, but so far I’m a bit worried, because I know that PHP likes light systems rather than complicated object graphs. As cool as those patterns sound, they can create more problems than they solve.&lt;/p&gt;

&lt;h3 id='so_when'&gt;So when?&lt;/h3&gt;
&lt;div class='alignleft'&gt;&lt;img src='/media/symfony2release.png' alt='Symfony2 release' /&gt;&lt;p class='wp-caption-text'&gt;Symfony2 release&lt;/p&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href='http://framework.zend.com/'&gt;Zend Framework 2.0&lt;/a&gt; is on its way, but will take some time. Because ZF has a massive code base, the first thing they did was convert it all to namespaced code. Once that was done, it was time to start refactoring existing functionality and implementing new features. Currently work is being done on the MVC part, and I’d hope a final release might happen late this year.&lt;/p&gt;

&lt;p&gt;&lt;a href='http://rad-dev.org/lithium'&gt;Lithium&lt;/a&gt; is almost there; it’s in dev mode, but seems pretty close to finished. It’s a completely different framework from the regular ones, so it’s worth checking out. I’m most impressed with their &lt;a href='http://en.wikipedia.org/wiki/Aspect-oriented_programming'&gt;AOP&lt;/a&gt; implementation and would like to see it adopted in more frameworks. Obviously they are PHP 5.3 only, but they do support CouchDB and MongoDB quite nicely.&lt;/p&gt;

&lt;p&gt;&lt;a href='http://symfony.com/'&gt;Symfony2&lt;/a&gt;, in my opinion, is leading the pack. They are currently at beta2 and the final release seems to be a matter of months. The list of features is hard to take in at once, so it’s worth checking them out on their website, but to name one - Bundles. Bundles are a way to have an extensible application structure with a collection of components from outside. Think plugins.&lt;/p&gt;

&lt;h3 id='conclusion'&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;I’m super excited about the things currently happening in the PHP industry and I believe they will lead to great achievements. Finally we have a moment when we can throw away all (most) of our legacy and implement fresh ideas. Fast forward five years and we should be as excited as we are today.&lt;/p&gt;</content>
   <author>
    <name>Juozas</name>
    <email>juozas@juokaz.com</email>
   </author>
 </entry>
 
 <entry>
   <title>Virtual Machines for Web Developers</title>
   <link href="http://blog.webspecies.co.uk/2011-05-16/virtual-machines-for-web-developers.html" />
   <updated>2011-05-16T13:30:13-07:00</updated>
   <id>http://blog.webspecies.co.uk/2011-05-16/virtual-machines-for-web-developers</id>
   <content type="html">&lt;p&gt;There are millions of articles on how to set up a LAMP stack on your own machine for developing websites locally. I think this is the wrong approach, as running server software directly on your computer creates a lot of potential problems. A better approach is to use virtual machines, as they allow more flexibility and cause fewer headaches when something goes wrong.&lt;/p&gt;

&lt;p&gt;Of course, you probably don’t want to do this if you are only working on one project or are just starting your career, as it will feel kind of weird, especially at first. However, for people like me who work on a dozen different projects throughout the week it’s just so much better. Hopefully after reading this you will rethink your setup.&lt;/p&gt;
&lt;!--more--&gt;
&lt;h3 id='performance'&gt;Performance&lt;/h3&gt;

&lt;p&gt;First of all, moving your web server into a VM makes your computer boot much faster. At some point I was getting really frustrated because I had so many different server components installed on my main laptop that it was just unusable. A web server with a dozen websites, database servers, monitoring tools etc. - all of this was just too much. Once all of it is moved to a VM, the problem is solved immediately.&lt;/p&gt;

&lt;p&gt;Processors and other hardware nowadays have really good support for virtualization (you can read more about it on &lt;a href='http://en.wikipedia.org/wiki/X86_virtualization#Hardware_assist'&gt;Wikipedia&lt;/a&gt;), so there is no real performance drawback. Just make sure you have enough RAM to run multiple OSes and as many processor cores as possible for the load to be spread across. Unsurprisingly, you should only install systems without a graphical UI: for example, install Ubuntu Server without the &lt;code&gt;ubuntu-desktop&lt;/code&gt; package. This will save a lot of resources.&lt;/p&gt;

&lt;h3 id='advantages'&gt;Advantages&lt;/h3&gt;

&lt;p&gt;I especially like that it makes machines easier to reinstall and migrate. Not only can you move VMs between different machines (in most cases this works), so you can have the same testing setup on a laptop and on your main computer, but if you need to reinstall anything you are much less likely to screw up the whole system. Tools like VirtualBox also support &lt;a href='http://www.virtualbox.org/manual/ch01.html#snapshots'&gt;snapshots&lt;/a&gt;, which allow reverting to a known state if things go wrong.&lt;/p&gt;

&lt;p&gt;Obviously, &lt;strong&gt;decoupling&lt;/strong&gt; testing environments from the actual machine also makes it possible to have different environments for testing different things. For example, I have one VM with Windows Server and one with Ubuntu, both of them acting as web servers. This makes it very easy to test the same applications with different configurations. Furthermore, VMs can also be used for testing browser support.&lt;/p&gt;

&lt;p&gt;What is more, you can really easily test load balancing, database replication and other things which require multiple servers. And with the help of VM software and tools available in the OS you can also simulate network latency, dropped packets and limited bandwidth. Some of those things you can’t even test on real hardware (like network performance), and VMs just work great for that; it’s only a matter of using the right tools.&lt;/p&gt;

&lt;h3 id='virtualbox'&gt;VirtualBox&lt;/h3&gt;

&lt;p&gt;For VMs I use &lt;a href='http://www.virtualbox.org/'&gt;VirtualBox&lt;/a&gt; because it’s free, works really well and supports all the features I might need. My setup is quite trivial but has worked really well so far - all VMs are in a &lt;a href='http://ptony82.wordpress.com/2008/11/18/the-fastest-way-to-create-host-only-network-with-virtualbox/'&gt;Host-Only&lt;/a&gt; network with static IPs. So I can access them from the host computer or from any other VM by an IP like &lt;code&gt;192.168.56.104&lt;/code&gt;, but they are not visible to other computers on the network, while they still have an internet connection through a second network adapter in NAT mode.&lt;/p&gt;

&lt;p&gt;Sharing files with VirtualBox is really easy as it supports &lt;a href='http://www.virtualbox.org/manual/ch04.html#sharedfolders'&gt;shared folders&lt;/a&gt;. Shared folders are exactly what the name says - any folder on the host computer becomes a visible partition (Windows) or device (Linux) which you can then map wherever you want. To support this you need to install the guest additions in each VM, as shared folders need a special kernel module. I haven’t yet figured out how to install it without the full additions, so let me know if you have any ideas.&lt;/p&gt;

&lt;h3 id='better_setup'&gt;Better setup&lt;/h3&gt;

&lt;p&gt;However, I don’t use shared folders that often, as I have build scripts for deployment. I believe this is simply a better approach: I don’t want my projects&amp;#8217; folders to be accessed directly by the web server, as it might create cache files in them or modify them in some other way, which would be really bad. So I just execute &lt;code&gt;ant testing&lt;/code&gt; and it deploys the application to the assigned VM.&lt;/p&gt;

&lt;p&gt;To make this work well I have my public key deployed to all of the VMs, so I can connect to them without a password. I have also mapped the IP addresses to specific host names like &lt;code&gt;doctrine.dev&lt;/code&gt;, which lets me connect to a VM as easily as &lt;code&gt;ssh doctrine.dev&lt;/code&gt; or access a website hosted on that server at &lt;code&gt;http://doctrine.dev/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Once you have done this, you completely forget the fact that it’s a VM you are using. Because I use Ubuntu, I have all VMs open in a separate workspace which I don’t use for anything else. And I &lt;em&gt;&amp;#8220;talk&amp;#8221;&lt;/em&gt; to them like to any other server - using SSH. Yet another decoupling point - all the build scripts you write can be used for a staging/production server later, as you would be accessing it in the same way.&lt;/p&gt;

&lt;h3 id='conclusion'&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;I couldn’t be happier with this switch, as it finally lets me use my computer for more than just development: if the VMs are not started, nothing is waiting around in the system processes. The flexibility and features are just great and far outweigh the initial hassle of getting the setup working.&lt;/p&gt;</content>
   <author>
    <name>Juozas</name>
    <email>juozas@juokaz.com</email>
   </author>
 </entry>
 
 <entry>
   <title>Jekyll - back to the basics of Web</title>
   <link href="http://blog.webspecies.co.uk/2011-05-09/jekyll-back-to-the-basics-of-web.html" />
   <updated>2011-05-09T16:35:13-07:00</updated>
   <id>http://blog.webspecies.co.uk/2011-05-09/jekyll-back-to-the-basics-of-web</id>
   <content type="html">&lt;p&gt;When considering which platform to use for our website and for this blog I had a couple choices: use Wordpress or any other CMS, build our own custom solution or have a completely static website. I went with none of those and chose a small tool with great powers - &lt;em&gt;Jekyll&lt;/em&gt;. This post is a walkthrough and an introduction to it and when it makes sense to use it.&lt;/p&gt;
&lt;!--more--&gt;
&lt;p&gt;Jekyll is a static site generator and that’s exactly what it does - it turns a collection of templates into HTML files. You can then deploy them to any web server and all of a sudden the website is live. Content can be written using your usual editor, so you can write your blog in Vim if you feel like it. There is absolutely no need for a database, PHP or any other external dependency, which brings us to Jekyll’s advantages.&lt;/p&gt;

&lt;h3 id='advantages'&gt;Advantages&lt;/h3&gt;

&lt;p&gt;Well, it generates HTML pages, so it makes for a secure website (you can’t really hack an HTML page, now can you?). In comparison, Wordpress is known to have security holes. So does any software, you might say, but blogs are often under heavy attack from spam bots, so the more secure the better. Unless you have misconfigured your server, I doubt it’s even theoretically possible to get hacked.&lt;/p&gt;

&lt;p&gt;HTML pages bring another advantage: they are crazy fast. PHP, for example, is fast enough for most things, but still falls behind static pages, for obvious reasons. I like fast-responding websites and the only limit I want to have is the speed of my internet connection. This blog is served by Nginx, an awesome tool in itself, and load times are pretty much instant.&lt;/p&gt;

&lt;p&gt;I started my career by creating HTML websites years ago and can still do it very quickly. Once I have received HTML code from my designer I don’t need to wrap it in a Joomla- or Wordpress-specific template structure - with Jekyll it can be used as it is. That makes it ideal for creating websites fast, and because there aren’t a billion templates, views and partials to juggle, fixing bugs in markup is child’s play.&lt;/p&gt;

&lt;h3 id='disadvantages'&gt;Disadvantages&lt;/h3&gt;
&lt;div class='alignright'&gt;&lt;img class='noborder' src='/media/backtobasics.jpg' alt='Back to basics' /&gt;&lt;/div&gt;
&lt;p&gt;Static pages are static, and thus require some workarounds for the usual dynamic parts. Contact forms cannot be implemented directly, as there is nothing on the server to handle the POST data. Luckily there are services like &lt;a href='http://wufoo.com/'&gt;Wufoo&lt;/a&gt; which work for most such tasks, but they require more fiddling than just creating a PHP handler. The same goes for comments, ratings etc. - anything that requires storing data.&lt;/p&gt;

&lt;p&gt;There is no online interface for editing, so it won’t work well for clients&amp;#8217; websites. I can imagine hacks which would make it somewhat possible, but let’s leave them out. This is not an issue for me, as there is no need to edit anything on this blog once it’s deployed (I can redeploy in a matter of seconds anyway); clients, however, will need this feature. That makes Jekyll usable only for your own projects or websites which are rarely edited.&lt;/p&gt;

&lt;p&gt;That’s pretty much it. Well, at least for me. If none of that sounds like a big show stopper, I suggest you try it out.&lt;/p&gt;

&lt;h3 id='how_do_i_start'&gt;How do I start?&lt;/h3&gt;

&lt;p&gt;The project is &lt;a href='http://github.com/mojombo/jekyll/'&gt;hosted&lt;/a&gt; on GitHub with full installation instructions, but on most systems it can be installed as easily as:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;gem install jekyll
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then you obviously need a sample project. Create &lt;code&gt;index.html&lt;/code&gt; and &lt;code&gt;_config.yml&lt;/code&gt; in some folder and run this command from it:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;jekyll --server --auto
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will launch a web server, and the &lt;code&gt;--auto&lt;/code&gt; option makes it monitor the directory for changes, so as soon as you change some files it will regenerate the site. Point your browser to &lt;code&gt;http://localhost:4000/&lt;/code&gt; and you should see your website. You can go from there and start adding more pages, using layouts etc.; all this information is in the manual. One last thing - the generated website is stored in the &lt;code&gt;_site&lt;/code&gt; folder, and this is the folder you can deploy anywhere.&lt;/p&gt;

&lt;p&gt;I use Markdown for most of the content, but Jekyll supports a bunch of other formats and, obviously, regular HTML files. Templates can be written using the &lt;a href='http://www.liquidmarkup.org/'&gt;Liquid&lt;/a&gt; language, which is quite nice, and more complicated tasks can be written as plugins (the category and tag pages on this blog are generated with them). There is a gallery of websites based on Jekyll &lt;a href='http://github.com/mojombo/jekyll/wiki/sites'&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id='conclusion'&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Next time you are considering building an uncomplicated website or a blog, take a look at Jekyll. It has worked great for us and I’m very happy with its flexibility and lightness. It’s also very easy to get started with, and Liquid templates can potentially be migrated to other frameworks later (Liquid is, for example, very close to &lt;a href='http://www.twig-project.org/'&gt;Twig&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;P.S.&lt;br /&gt;&lt;strong&gt;The source code of this blog is actually open-sourced&lt;/strong&gt; too and available &lt;a href='http://github.com/webspecies/blog.webspecies.co.uk/'&gt;here&lt;/a&gt;, so feel free to hack on it. You can do this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='bash'&gt;git clone git@github.com:webspecies/blog.webspecies.co.uk.git
&lt;span class='nb'&gt;cd &lt;/span&gt;blog.webspecies.co.uk/
jekyll --server
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</content>
   <author>
    <name>Juozas</name>
    <email>juozas@juokaz.com</email>
   </author>
 </entry>
 
 <entry>
   <title>JavaScript is not your boss</title>
   <link href="http://blog.webspecies.co.uk/2011-05-05/javascript-is-not-your-boss.html" />
   <updated>2011-05-05T17:45:13-07:00</updated>
   <id>http://blog.webspecies.co.uk/2011-05-05/javascript-is-not-your-boss</id>
   <content type="html">&lt;p&gt;If you have met me in person you probably know that I don’t like JavaScript. But in a different way. I don’t like JavaScript because most of the time, I believe, it’s implemented in a way which just doesn’t make sense. It still makes sense for the end-user, which is awesome, but from the &lt;em&gt;what-web-was-supposed-to-be&lt;/em&gt; balcony it’s far, far away.&lt;/p&gt;

&lt;p&gt;From my perspective, JavaScript is a layer of logic (or code) sitting &lt;strong&gt;on top&lt;/strong&gt; of HTML. So you have an HTML page with links, images and text, and some JavaScript which makes nice animations, adds Google ads and maybe something else. Is it always like this? Well, you guessed it – no. Let’s dive in to see how we can make things better, without using Flash of course.&lt;/p&gt;
&lt;!--more--&gt;
&lt;h3 id='content'&gt;Content&lt;/h3&gt;
&lt;div class='alignleft'&gt;&lt;img src='/media/javascript.jpg' alt='JavaScript language' /&gt;&lt;p class='wp-caption-text'&gt;JavaScript!&lt;/p&gt;&lt;/div&gt;
&lt;p&gt;Let’s go back a bit and see what JavaScript offers and how it changed websites. First thing which comes to my mind is Ajax – a way to asynchronously retrieve some data after the page has finished loading. This post is going to be mainly about it.&lt;/p&gt;

&lt;p&gt;You might have a product page with a few tabs like “Description&amp;#8221;, “Specifications&amp;#8221;, “Reviews&amp;#8221; etc., and each of those is loaded dynamically when that tab is clicked. This makes it possible to fit more data in one page and have the user choose which part to show. I love it when websites do that.&lt;/p&gt;

&lt;p&gt;But this functionality should not break the website structure from a JavaScript-disabled browser’s point of view. Nowadays one might ignore the users who have chosen to disable JavaScript, but what you can’t ignore is Google. And the reason why you can’t ignore Google is that it makes you money; at least it does for most of the websites I have had a chance to work on.&lt;/p&gt;

&lt;h3 id='missing_content'&gt;Missing content&lt;/h3&gt;

&lt;p&gt;Now look at the product page example I mentioned above: if it is implemented as I described, with each tab loaded only when requested, search engine spiders cannot see the tabs’ content. And this is &lt;strong&gt;bad&lt;/strong&gt;. All of a sudden valuable, relevant and semantically related information is missing from that page, which makes the page rank low for a lot of valuable keywords. I’ve seen this happen a lot.&lt;/p&gt;

&lt;p&gt;There are cases when it is &lt;em&gt;allowed&lt;/em&gt; to ignore all of this, for example if you are &lt;a href='http://www.grooveshark.com/'&gt;Grooveshark&lt;/a&gt;. But this is more of an exception to the rule; in most cases it’s recommended to have an HTML fall-back version of the website. Of course this is going to require more work than just creating a JavaScript-only version; however, the benefits of doing so will outweigh the extra work.&lt;/p&gt;

&lt;h3 id='browser_side'&gt;Browser side&lt;/h3&gt;

&lt;p&gt;Ajax also allows you to POST data back to the server without reloading the page. So a contact form can show a nice thank-you message without redirecting the user to another page. Again, a pretty cool feature. But when you start doing big forms (think order page) like that… You should be smart. Even if you do validation with JavaScript to make ordering faster for the user, you still need to do it on the server side as well. Maintaining two sets of rules is tricky and you might get hacked. I was, once.&lt;/p&gt;

&lt;p&gt;If you are a frontend developer you know that some things just can’t be achieved without some JavaScript sauce. Of course it’s quite crazy to expect a website to look/behave exactly the same when JavaScript is not available, but it should work.&lt;/p&gt;

&lt;h3 id='the_right_way'&gt;“The right way&amp;#8221;&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Don’t code for JavaScript, code for content. Content is the boss, not JavaScript.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When I can, I start by defining the website structure, implementing all the underlying logic and pretty much finishing the whole website before I start working on JavaScript. Obviously this is a generalization, but you get the idea. Take the product page example again: once it is ready, add JavaScript to hide some parts of the HTML, showing only the current tab. From the user’s perspective the UX is exactly the same.&lt;/p&gt;
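
&lt;p&gt;A minimal sketch of that last step, assuming every tab panel is already present in the HTML. The function name &lt;code&gt;showOnlyTab&lt;/code&gt;, the panel ids and the &lt;code&gt;tab-panel&lt;/code&gt; class are hypothetical.&lt;/p&gt;

```javascript
// Hide every panel except the current one. Without JavaScript this never
// runs, so all panels stay visible to users and search engines alike.
function showOnlyTab(panels, currentId) {
  // forEach.call works for both arrays and a NodeList from querySelectorAll.
  Array.prototype.forEach.call(panels, function (panel) {
    panel.style.display = panel.id === currentId ? '' : 'none';
  });
}

// In a browser (the class name is an assumption):
// showOnlyTab(document.querySelectorAll('.tab-panel'), 'description');
```

&lt;p&gt;Clicking a tab then just calls the same function with a different id; no content has to be fetched.&lt;/p&gt;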

&lt;p&gt;If a client/designer/kitten asks for some fancy navigation, keep it in mind, but don’t build the website around that idea. Not because the requirements might (will) change, but because you would be coupling presentation to data. If a website doesn’t work at all with JavaScript disabled, or is impossible to navigate, you have coupling. And coupling is bad. It will cause problems sooner or later.&lt;/p&gt;

&lt;p&gt;Even simple things like a lightbox effect (a popup-like window with an enlarged image) should, ideally, work just fine without JavaScript. When it’s done like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='html'&gt;&lt;span class='nt'&gt;&amp;lt;a&lt;/span&gt; &lt;span class='na'&gt;href=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;big_image.jpg&amp;quot;&lt;/span&gt; &lt;span class='na'&gt;class=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;lightbox&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&amp;lt;img&lt;/span&gt; &lt;span class='na'&gt;src=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;small_image.jpg&amp;quot;&lt;/span&gt; &lt;span class='na'&gt;alt=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;Image&amp;quot;&lt;/span&gt; &lt;span class='nt'&gt;/&amp;gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It’s clean and clear. When like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='html'&gt;&lt;span class='nt'&gt;&amp;lt;a&lt;/span&gt; &lt;span class='na'&gt;href=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;javascript:onclick(openImage('big_image.jpg'))&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&amp;lt;img&lt;/span&gt; &lt;span class='na'&gt;src=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;small_image.jpg&amp;quot;&lt;/span&gt; &lt;span class='na'&gt;alt=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;Image&amp;quot;&lt;/span&gt; &lt;span class='nt'&gt;/&amp;gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You are asking for problems.&lt;/p&gt;
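
&lt;p&gt;With the first version, the lightbox behaviour can be attached from a separate script. This is only a sketch: it assumes some &lt;code&gt;openImage&lt;/code&gt; function exists to display the enlarged image, and &lt;code&gt;enhanceLightboxLinks&lt;/code&gt; is a made-up name.&lt;/p&gt;

```javascript
// Attach lightbox behaviour to plain links. If this script never runs,
// each link still opens the big image the ordinary way.
function enhanceLightboxLinks(links, openImage) {
  Array.prototype.forEach.call(links, function (link) {
    link.onclick = function (event) {
      event.preventDefault(); // stay on the page instead of following the link
      openImage(link.getAttribute('href'));
    };
  });
}

// In a browser:
// enhanceLightboxLinks(document.querySelectorAll('a.lightbox'), openImage);
```

&lt;p&gt;The markup stays a normal link to the image, so spiders and JavaScript-less browsers can follow it, and the fancy part is layered on top.&lt;/p&gt;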

&lt;h3 id='conclusion'&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;This is not in any sense an argument against using JavaScript. It can help achieve amazing things, especially together with HTML5, which makes rich internet applications possible. But make sure the website still makes sense to clients that don’t run JavaScript; for example, Google should be able to index all the data without any tricks.&lt;/p&gt;

&lt;p&gt;P.S. &lt;em&gt;Hashbang&lt;/em&gt; is omitted from this post to make it less ranty. This doesn’t mean hashbangs are good. They are not.&lt;/p&gt;</content>
   <author>
    <name>Juozas</name>
    <email>juozas@juokaz.com</email>
   </author>
 </entry>
 
 <entry>
   <title>Sharing the knowledge</title>
   <link href="http://blog.webspecies.co.uk/2011-05-04/sharing-the-knowledge.html" />
   <updated>2011-05-04T17:05:03-07:00</updated>
   <id>http://blog.webspecies.co.uk/2011-05-04/sharing-the-knowledge</id>
   <content type="html">&lt;p&gt;As you might know from our &lt;a href='http://webspecies.co.uk/'&gt;website&lt;/a&gt; we work with a lot of different technologies, tools and projects. For example when building the website we used all the best from HTML5 and CSS3. It only makes sense for us to share our experiences to help others build things faster and better. This is the reason why we have created this blog - sharing our knowledge.&lt;/p&gt;
&lt;!--more--&gt;
&lt;p&gt;It’s my promise to you that you won’t find any company-related content or advertisement of our products in this blog. As you might have seen, we have a &lt;a href='http://webspecies.co.uk/news'&gt;news section&lt;/a&gt; for this on our website, and that’s where all of that is going to be talked about. The blog, on the other hand, is mainly for developers and designers, so the content here is going to be very technical and thus different.&lt;/p&gt;

&lt;p&gt;Our point of view is quite similar to the one of a company called 37signals, which is talked about in this &lt;a href='http://www.youtube.com/watch?v=Ks2saa38Id4'&gt;talk&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;iframe src='http://www.youtube.com/embed/Ks2saa38Id4' frameborder='0' height='480' width='790'&gt;
&lt;/iframe&gt;&lt;/p&gt;
&lt;p&gt;We are not afraid to put out what we know best. Hey, most of the stack we use is already open-source, so why not contribute back by blogging about it? Of course without revealing security mechanisms and unique algorithms, but if we solve HTML5 History API usage problems, we are going to share the solution. In turn, hopefully this will also bring us a couple of new clients, which would be great.&lt;/p&gt;

&lt;p&gt;This is just an introductory post, but the bare minimum we are going to do is one blog post per week, so make sure to subscribe to our &lt;a href='http://feeds.feedburner.com/WebSpeciesBlog'&gt;RSS feed&lt;/a&gt; and leave comments. I’ll see you soon!&lt;/p&gt;</content>
   <author>
    <name>Juozas</name>
    <email>juozas@juokaz.com</email>
   </author>
 </entry>
 

</feed>
