<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>IONCANNON</title>
	
	<link>http://www.ioncannon.net</link>
	<description>Thoughts on Software Development and Engineering</description>
	<lastBuildDate>Tue, 12 Jan 2010 12:21:24 +0000</lastBuildDate>
	
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/ioncannon" /><feedburner:info uri="ioncannon" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item>
		<title>Developing Adobe Air Apps with Linux</title>
		<link>http://feedproxy.google.com/~r/ioncannon/~3/KZyKO7euX2M/</link>
		<comments>http://www.ioncannon.net/programming/875/developing-adobe-air-apps-with-linux/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 12:21:24 +0000</pubDate>
		<dc:creator>carson</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[air]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[linux]]></category>

		<guid isPermaLink="false">http://www.ioncannon.net/?p=875</guid>
		<description><![CDATA[I finally found a little project I wanted to do using Adobe Air and after some searching I found out you can use Linux to develop Air applications. At first I thought I would have to use Flex Builder which is still in alpha for Linux but it turns out there is a better option [...]]]></description>
			<content:encoded><![CDATA[<p>I finally found a little project I wanted to do using <a href="http://www.adobe.com/products/air/">Adobe Air</a> and after some searching I found out you can use Linux to develop Air applications. At first I thought I would have to use Flex Builder which is still in <a href="http://labs.adobe.com/technologies/flex/flexbuilder_linux/">alpha for Linux</a> but it turns out there is a better option from <a href="http://www.aptana.org/">Aptana</a>. </p>
<p>The <a href="http://www.aptana.org/air/">Aptana Air</a> plugin supports developing Adobe Air applications using HTML and JavaScript. It even supports the 2.0 release of Air that is currently in beta. Aptana uses the Eclipse framework as an editor, so if you are familiar with Eclipse it will be even easier to use.</p>
<p>I started by downloading and installing the latest version of the <a href="http://get.adobe.com/air/">Air runtime</a>. Next I grabbed the <a href="http://www.adobe.com/products/air/tools/sdk/">Air SDK</a>. The SDK doesn&#039;t come with the plugin, so it is something you have to get directly from the Air developer site. After unpacking the SDK I installed the latest Aptana core release. Once the core is installed there is a big plugin button on the startup screen that currently has Air listed.</p>
<p>The install went smoothly except for a few issues. The first one was very noticeable: dialog buttons did not respond to mouse clicks, although they did work when I clicked them and then hit enter, or navigated to them with the keyboard. Luckily someone has already figured out that the cause is an issue with <a href="http://mou.me.uk/2009/10/31/fixing-eclipse-in-ubuntu-9-10-karmic-koala/">Eclipse and GTK+</a> (even though the post is for Ubuntu, the same problem and solution applied to me on Fedora). The fix is to set the GDK_NATIVE_WINDOWS environment variable before running the Aptana binary:</p>
<div class="codesnip-container" >export GDK_NATIVE_WINDOWS=true; AptanaStudio</div>
<p>The next thing I noticed was that the application.xml descriptor Aptana created wasn&#039;t generated correctly. It needs to start with the correct xmlns or the following error will be thrown on run: &#034;invalid application descriptor: descriptor version does not match runtime version&#034;. To fix this, check the version of the Air SDK by running the following command:</p>
<div class="codesnip-container" >./adt -version<br />
adt version &quot;1.5.3.9120&quot;</div>
<p>For the version of the Air SDK I downloaded (1.5.3), the correct xmlns was http://ns.adobe.com/air/application/1.5, so I needed the following application tag:</p>
<div class="codesnip-container" >&lt;application xmlns=&quot;http://ns.adobe.com/air/application/1.5&quot;&gt;</div>
<p>Once I had that working I was able to compile and execute a demo application. I was also able to create an Air application package from within Aptana using File > Export > Adobe AIR > Adobe AIR Package. Before creating the Air package I had to create a signing certificate. Creating the certificate can be done within Aptana too but because I had not yet fixed the above button issue I created a cert on the command line with the Air SDK and then imported it. To create the Air signing certificate from the command line I used the adt command from the SDK:</p>
<div class="codesnip-container" >adt -certificate -cn SelfSigned 1024-RSA sampleCert.pfx samplePassword</div>
<p>Remember the password used to generate the certificate because you will need to enter it whenever a package is signed.</p>
<p>Finally Adobe has a lot of information on developing Air applications on their <a href="http://www.adobe.com/devnet/air/">Air devnet site</a>. The <a href="http://www.adobe.com/devnet/air/ajax/">Air ajax section</a> is especially important.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ioncannon.net/programming/875/developing-adobe-air-apps-with-linux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.ioncannon.net/programming/875/developing-adobe-air-apps-with-linux/</feedburner:origLink></item>
		<item>
		<title>Heroku Tips for the Cheap</title>
		<link>http://feedproxy.google.com/~r/ioncannon/~3/73i7NeZ-808/</link>
		<comments>http://www.ioncannon.net/programming/842/heroku-tips-for-the-cheap/#comments</comments>
		<pubDate>Sun, 20 Dec 2009 10:50:14 +0000</pubDate>
		<dc:creator>carson</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[heroku]]></category>
		<category><![CDATA[mongodb]]></category>
		<category><![CDATA[rack]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[tips]]></category>

		<guid isPermaLink="false">http://www.ioncannon.net/?p=842</guid>
		<description><![CDATA[I&#039;ve been playing around with the Ruby/Rails cloud provider Heroku a little bit lately just to try it out. It is somewhat like Google App Engine or Microsoft Azure in the way it works since you bundle your application and push it out to the Heroku cloud for deployment. It is very easy to get [...]]]></description>
			<content:encoded><![CDATA[<p>I&#039;ve been playing around with the Ruby/Rails cloud provider <a href="http://www.heroku.com">Heroku</a> a little bit lately just to try it out. It is somewhat like Google App Engine or Microsoft Azure in the way it works since you bundle your application and push it out to the Heroku cloud for deployment. It is very easy to get things going but I ran into a few interesting items that I figured I would share. </p>
<p><span id="more-842"></span></p>
<p>Some of the following grew out of the fact that a lot of the Heroku addons require a verified account, and verification requires a credit card. For some of these tips there are easier paths if you don&#039;t mind paying a little extra or just verifying your account.</p>
<ul>
<li><a href="#heroku-config">Configuration</a></li>
<li><a href="#heroku-cname">Set up a custom domain name</a></li>
<li><a href="#heroku-cron">Poor man&#039;s cron</a></li>
<li><a href="#heroku-gmail">Sending mail with GMail</a></li>
<li><a href="#heroku-logs">Keeping your logs</a></li>
<li><a href="#heroku-rack">Rack works too</a></li>
<li><a href="#heroku-compiled-gems">Compiled gems</a></li>
<li><a href="#heroku-dynos">Understanding dynos</a></li>
</ul>
<p><a name="heroku-config"><b>Configuration</b></a></p>
<p>For configuration <a href="http://docs.heroku.com/config-vars#local-setup">Heroku has its own approach</a>, but it didn&#039;t seem like the best or easiest way to do things. I started off setting up a configuration file as described in this <a href="http://railscasts.com/episodes/85-yaml-configuration-file">railscast on using yaml configuration files</a>. After digging a little more I found a similar but even <a href="http://almosteffortless.com/2009/06/25/config-vars-and-heroku/">better way of doing configuration with Heroku</a>.</p>
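<p>To make the idea concrete, here is a minimal sketch that combines the two approaches: it reads a setting from the environment first (Heroku exposes config vars as environment variables) and falls back to a local YAML file for development. The helper name app_setting, the default path config/config.yml and the key names are all assumptions for illustration, not code from either of the linked articles:</p>
<pre class="brush: ruby;">
require 'yaml'

# Hypothetical helper: prefer an environment variable, then a local YAML file.
# The file path and the key naming scheme are assumptions for this sketch.
def app_setting(name, yaml_path = 'config/config.yml')
  from_env = ENV[name.to_s.upcase]
  return from_env if from_env
  return nil unless File.exist?(yaml_path)
  settings = YAML.load_file(yaml_path) || {}
  settings[name.to_s]
end

ENV['MAIL_PASSWORD'] = 'secret'
puts app_setting(:mail_password)
</pre>
<p>With this shape the same call works locally and on Heroku, and secrets stay out of the repository as long as the YAML file is not committed.</p>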
<p><a name="heroku-cname"><b>Set up a custom domain name</b></a></p>
<p>To make a more professional looking app you probably want to have a non-Heroku base domain name. They make it very easy to set that up using a <a href="http://docs.heroku.com/custom-domains#cname-setup">CNAME DNS entry</a>.</p>
<p><a name="heroku-cron"><b>Poor man&#039;s cron</b></a></p>
<p>One of the things I ran into pretty quickly was the need to run a task once every minute. Heroku offers both <a href="http://docs.heroku.com/cron">cron</a> and <a href="http://docs.heroku.com/delayed-job">delayed job</a>. The cron jobs offered are a little limited since the finest grained execution for cron is once an hour. I believe delayed jobs could be made to run every minute but the next issue with both cron and delayed job on Heroku is that they are addons that require verification. </p>
<p>The solution for me was to create a poor man&#039;s cron. I added a controller that would execute the task and then ran wget to hit the controller from an external server. This isn&#039;t a great solution but for testing the service it worked fine. The main thing to keep in mind is that you will tie up a dyno for the length of the request; see <a href="#heroku-dynos">Understanding dynos</a> for more.</p>
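<p>The idea can be sketched as a tiny Rack-style endpoint rather than a full Rails controller. Everything named here is a placeholder: the /tasks/run path, the shared token (an extra precaution that is not part of the setup described above) and the task body are assumptions for illustration only:</p>
<pre class="brush: ruby;">
# Hypothetical Rack-style endpoint, not the controller described above.
# An external scheduler (for example wget from cron on another server)
# requests /tasks/run once a minute. Path, token and task are placeholders.
SECRET_TOKEN = 'change-me'

def run_minute_task
  # Keep this short: the dyno stays busy for the whole request.
  'task ran'
end

def plain_response(status, body)
  headers = {}
  headers['Content-Type'] = 'text/plain'
  [status, headers, [body]]
end

CRON_APP = lambda do |env|
  wanted_path = env['PATH_INFO'] == '/tasks/run'
  authorized = env['QUERY_STRING'] == "token=#{SECRET_TOKEN}"
  if wanted_path and authorized
    plain_response(200, run_minute_task)
  else
    plain_response(404, 'not found')
  end
end
</pre>
<p>On the external server a crontab entry that runs every minute could then call something like wget -q -O /dev/null "http://example.com/tasks/run?token=change-me", where the host name is again a placeholder.</p>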
<p><a name="heroku-gmail"><b>Sending mail with GMail</b></a></p>
<p>If you want to send email you have a number of different options on Heroku. First off it is important to note that Heroku doesn&#039;t support sending email from their systems directly but instead they support outgoing <a href="http://docs.heroku.com/smtp">SMTP</a>. Sending mail with the <a href="http://docs.heroku.com/sendgrid">Sendgrid addon</a> is probably the most flexible option but the <a href="http://docs.heroku.com/gmail-smtp">GMail SMTP</a> option is the least costly. The GMail option gives you 500 emails a day and that was plenty for my use. </p>
<p>Even though there is an addon for the GMail option you don&#039;t actually need to use it. Instead install the <a href="http://github.com/collectiveidea/action_mailer_optional_tls">action mailer optional TLS plugin</a>. Follow the readme to get it installed and configured. Then you put something like the following into your production.rb file:</p>
<pre class="brush: plain;">
  config.action_mailer.delivery_method = :smtp
  config.action_mailer.raise_delivery_errors = true
  ActionMailer::Base.smtp_settings = {
    :tls            =&gt; true,
    :address        =&gt; 'smtp.gmail.com',
    :port           =&gt; 587,
    :domain         =&gt; 'example.com',
    :authentication =&gt; :plain,
    :user_name      =&gt; 'support@example.com',
    :password       =&gt; 'password'
  }
</pre>
<p><br/></p>
<p><a name="heroku-logs"><b>Keeping your logs</b></a></p>
<p>One downside to Heroku is that they only retain a small portion of your logs. They indicate in their docs that they only retain <a href="http://docs.heroku.com/logs-exceptions">100 lines of logs</a>. So if you want to track your log output you will need to store them outside of the service. They give two suggestions in the docs for logging but there are other ways to do something similar on your own.</p>
<p>The solution I found that seems like the easiest is to use <a href="http://blog.philburrows.com/articles/2009/09/28/rails-logging-with-mongodb/">MongoDB to log</a> since it is <a href="http://blog.mongodb.org/post/172254834/mongodb-is-fantastic-for-logging">&#034;fantastic for logging&#034;</a>. You will need to have a <a href="http://www.mongodb.org/">MongoDB</a> server available first. If you don&#039;t want to <a href="http://www.engineyard.com/blog/2009/mongodb-a-light-in-the-darkness-key-value-stores-part-5/">install MongoDB on your own external server</a> you can try the new <a href="http://mongohq.com/">MongoDB hosted solution</a>.</p>
<p>Once you have the MongoDB server ready you will need to add the MongoDB gem to your .gems file:</p>
<pre class="brush: plain;">
mongodb-mongo --source gems.github.com
</pre>
<p>Then you will want to install the <a href="http://github.com/peburrows/mongo_db_logger">mongo db logger</a> plugin in your rails project. Just follow the instructions they give on the project page to get it installed. </p>
<p>Your app/controllers/application_controller.rb will look something like this:</p>
<pre class="brush: ruby;">
class ApplicationController &lt; ActionController::Base
  include MongoDBLogging

  helper :all # include all helpers, all the time
  protect_from_forgery # See ActionController::RequestForgeryProtection for details
end
</pre>
<p>After you get the plugin installed you will have to hard code the MongoDB information into the plugin. This could probably be fixed and pulled out of a configuration file but it won&#039;t work the way it is set up out of the box. Edit the file vendor/plugins/mongo_db_logger/lib/mongo_logger.rb and change the db_configuration to match your setup. It should be something like this:</p>
<pre class="brush: plain;">
  db_configuration = {
    'host'    =&gt; 'my.mongohost.com',
    'port'    =&gt; 56700,
    'database'    =&gt; 'testapp',
    'capsize' =&gt; default_capsize}
</pre>
<p>Now you should be able to deploy to Heroku and see your logs show up in your MongoDB database.</p>
<p>Two very important things to note: this logging won&#039;t catch exceptions that happen above the application itself, and if your MongoDB server goes down your app may hang trying to connect to it.</p>
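<p>One possible way to limit the hang described above is to give each logging call a time budget. This is only a sketch using the Timeout module from the Ruby standard library; the wrapper name log_with_deadline is hypothetical, it is not something the mongo db logger plugin provides, and Timeout has its own sharp edges when it interrupts code mid-operation:</p>
<pre class="brush: ruby;">
require 'timeout'

# Hypothetical wrapper: run a logging block with a deadline so a dead log
# server cannot stall the request. Returns false when the write was abandoned.
def log_with_deadline(seconds = 0.5)
  Timeout.timeout(seconds) { yield }
  true
rescue Timeout::Error, StandardError
  false
end

# Stand-in for a slow or unreachable log server.
puts log_with_deadline(0.1) { sleep 1 }
</pre>
<p>The request then continues without the log entry instead of waiting on the connection, which trades completeness of the logs for responsiveness.</p>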
<p><a name="heroku-rack"><b>Rack works too</b></a></p>
<p>You aren&#039;t limited to just Ruby on Rails with Heroku; <a href="http://docs.heroku.com/rack">Rack works</a> as well. That opens the door for other frameworks like <a href="http://www.sinatrarb.com/">Sinatra</a>, <a href="http://merbivore.com/">Merb</a> and <a href="http://camping.rubyforge.org/files/README.html">Camping</a> on Heroku. It is also possible that more in-depth logging could be done using <a href="http://clogger.rubyforge.org/">clogger</a> or even <a href="http://www.rackamole.com/">rack a mole</a>.</p>
<p>Here is an example Camping application:</p>
<p>The .gems file:</p>
<pre class="brush: plain;">
camping
</pre>
<p>The config.ru file:</p>
<pre class="brush: ruby;">
require 'hello'
run Rack::Adapter::Camping.new(Hello)
</pre>
<p>The hello.rb file:</p>
<pre class="brush: ruby;">
require 'camping'

Camping.goes :Hello

module Hello::Controllers
  class Index &lt; R '/'
     def get
        render :hello
     end
  end
end

module Hello::Views
  def hello
     p  &quot;Hello World!&quot;
  end
end
</pre>
<p><br/></p>
<p><a name="heroku-compiled-gems"><b>Compiled gems</b></a></p>
<p>Heroku also lets you use gems that need to be compiled before they are installed. You can see this if you try to use something like <a href="http://github.com/ice799/memprof">memprof</a>. However, you will also notice that in the memprof case it won&#039;t actually work because there are missing libraries, so using gems that require libraries may be hit or miss. </p>
<p>If you are lucky you may find that the gem is already part of Heroku like RMagick is. Here is a little example of using Heroku and the RMagick gem to produce an image using Camping:</p>
<p>The .gems file:</p>
<pre class="brush: plain;">
camping
rmagick
</pre>
<p>The config.ru file:</p>
<pre class="brush: ruby;">
require 'hello'
run Rack::Adapter::Camping.new(Hello)
</pre>
<p>The hello.rb file:</p>
<pre class="brush: ruby;">
require 'camping'
require 'RMagick'

Camping.goes :Hello

module Hello::Controllers
  class Index &lt; R '/'
     def get
       @headers[&quot;Content-Type&quot;] = &quot;image/gif&quot;
       img = Magick::Image.new(200, 200)

       gc = Magick::Draw.new
       gc.gravity = Magick::CenterGravity
       gc.pointsize = 32
       gc.font_family = &quot;Helvetica&quot;
       gc.font_weight = Magick::BoldWeight
       gc.stroke = 'none'
       gc.annotate(img, 0, 0, 0, 0, &quot;Hello world!&quot;)

       img.format = &quot;GIF&quot;
       img.to_blob
     end
  end
end
</pre>
<p><br/></p>
<p><a name="heroku-dynos"><b>Understanding dynos</b></a></p>
<p>It is important to understand how <a href="http://docs.heroku.com/dynos">dynos</a> work. The bottom line is that you need one for every concurrent request. This is easy to demonstrate with the following modification of the Camping example above:</p>
<pre class="brush: plain;">
require 'camping'

Camping.goes :Hello

module Hello::Controllers
  class Index &lt; R '/'
     def get
        render :hello
     end
  end
end

module Hello::Views
  def hello
     sleep 10
     p  &quot;Hello World!&quot;
  end
end
</pre>
<p>If you deploy this application to the free version of Heroku and then open two requests to it at the same time the second request will hang until the first one completes. This should illustrate why it is important to keep your processing quick and have enough dynos to match the concurrent request needs of your application.</p>
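<p>The same effect can be reproduced locally with a toy model. This is plain Ruby and not Heroku code: a dyno is modelled as a slot that handles one request at a time, and the function name total_time and the timings are made up for the illustration:</p>
<pre class="brush: ruby;">
require 'thread'

# Toy model, not Heroku code: each dyno is a slot that serves one request at a time.
def total_time(dynos, requests, seconds_each)
  slots = Queue.new
  dynos.times { slots.push(:free) }
  started = Time.now
  workers = Array.new(requests) do
    Thread.new do
      slot = slots.pop      # wait until a dyno is free
      sleep seconds_each    # stand-in for the slow view
      slots.push(slot)      # release the dyno
    end
  end
  workers.each { |worker| worker.join }
  Time.now - started
end

one = total_time(1, 2, 0.2)
two = total_time(2, 2, 0.2)
puts format('1 dyno: %.1fs, 2 dynos: %.1fs', one, two)
</pre>
<p>With one slot the two requests run back to back, so the total is roughly double the single request time; with two slots they overlap.</p>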
]]></content:encoded>
			<wfw:commentRss>http://www.ioncannon.net/programming/842/heroku-tips-for-the-cheap/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.ioncannon.net/programming/842/heroku-tips-for-the-cheap/</feedburner:origLink></item>
		<item>
		<title>Building Chromium and Chrome OS with EC2</title>
		<link>http://feedproxy.google.com/~r/ioncannon/~3/lpcjxicsJnQ/</link>
		<comments>http://www.ioncannon.net/system-administration/771/building-chromium-and-chrome-os-with-ec2/#comments</comments>
		<pubDate>Mon, 23 Nov 2009 21:17:59 +0000</pubDate>
		<dc:creator>carson</dc:creator>
				<category><![CDATA[system administration]]></category>
		<category><![CDATA[chrome os]]></category>
		<category><![CDATA[chromium]]></category>
		<category><![CDATA[ec2]]></category>

		<guid isPermaLink="false">http://www.ioncannon.net/?p=771</guid>
		<description><![CDATA[When the initial cut of the Chromium OS source was released last week I decided to use the opportunity to see if it would run on my EEE PC 900 netbook (check out EEE PC 900 running Chrome OS on Youtube to see the final result). The first roadblock I hit with the build instructions [...]]]></description>
			<content:encoded><![CDATA[<p>When the initial cut of the <a href="http://www.chromium.org/chromium-os">Chromium OS</a> source was released last week I decided to use the opportunity to see if it would run on my EEE PC 900 netbook (check out <a href="http://www.youtube.com/watch?v=gfQ6Qef5DWk">EEE PC 900 running Chrome OS</a> on YouTube to see the final result). The first roadblock I hit with the <a href="http://sites.google.com/a/chromium.org/dev/chromium-os/building-chromium-os/build-instructions">build instructions</a> was the Ubuntu requirement (I did put a little effort into getting it working on Fedora first). I don&#039;t have an Ubuntu box so I started out trying to use VirtualBox, but that was going to take forever, so I decided to move things to EC2 and what follows is the result. This isn&#039;t meant to be a replacement for the build docs since they are surely going to change. It is more of a cookbook to build <a href="http://www.chromium.org/Home">Chromium</a> (the browser) and Chromium OS using EC2 (EBS is used as well if you want to cache the source over time).</p>
<p>When I first started down the path of using EC2 I thought I would grab the source each time I wanted to build. I quickly ran into a snag however because it took forever to sync the source and download the Ubuntu repo. Once I had the initial sync of the source I decided I would copy it all to an EBS volume and keep that volume up to date. Using EBS to store the source feels better too since I assume Google expects people to be syncing changes only as opposed to pulling the entire source tree down every time they want to build.</p>
<p>I started out by finding this <a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1952&#038;categoryID=101">Ubuntu AMI</a> for a base to work from. For the most efficient compile times I ended up using the High CPU (c1.medium) instance. I started with the default small instance but it was just too slow. With the high cpu instance you are looking at about 45 minutes to build the OS after you have the source synced for the first time and if you add building Chromium in there you are looking at around 55 additional minutes. All told you can have a complete build in less than 2 hours even if there are some source updates needed. For EBS you need a 3G volume for the Chrome OS source plus Ubuntu package repo and a 4G volume for the Chromium source. </p>
<p><span id="more-771"></span></p>
<p>I&#039;ve bundled everything up into one script called <a href="http://www.ioncannon.net/examples/builder.sh">builder.sh</a> and if you don&#039;t care to know the details you can download it and give it a try. There are some things you need to know about it, however, even if you don&#039;t want to follow all the details. First, here are the assumptions made by the script:</p>
<ol>
<li>The EC2 AMI ami-ccf615a5 image is used on the node it is run on.</li>
<li>If you are storing the source then you have attached the EBS volumes and you have initialized a filesystem on both with mkfs.ext3 </li>
<li>If you are storing the source then you have attached the EBS volumes and indicated which devices they are on using OSVOLDEV and BROWSERVOLDEV in the following config section</li>
<li>You are running the script as the root user</li>
</ol>
<p>Next is a small configuration area at the top of the script that lets it know what devices the two volumes will be mounted on. If you don&#039;t intend on having stored source then you don&#039;t need to worry about setting them to anything as long as what they are set to doesn&#039;t match a real device.</p>
<div class="codesnip-container" >
<div class="bash codesnip" style="font-family:monospace;"><span class="re2">OSVOLDEV</span>=<span class="sy0">/</span>dev<span class="sy0">/</span>sdf1<br />
<span class="re2">BROWSERVOLDEV</span>=<span class="sy0">/</span>dev<span class="sy0">/</span>sdg1</div>
</div>
<p><br/></p>
<p>There are a few command line options that will let you tune the script if you want to skip parts of the build process:</p>
<ul>
<li>&#45;-skip-sync &#8211; This will skip trying to sync any of the repos. If you aren&#039;t using EBS to store the source the script will fail if you use this option since there will be nothing to build.</li>
<li>&#45;-skip-chrome-build &#8211; This will skip trying to build chrome. You have to build chrome at least once for the chrome os build to work.</li>
<li>&#45;-skip-chrome-os-build &#8211; This will skip trying to build chrome os.</li>
</ul>
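<p>To show how the three flags above relate to the build stages, here is a small illustration written in Ruby with the standard OptionParser library. builder.sh itself is a shell script, so this is not its actual option handling; the function name stages_for and the stage names are assumptions made for the sketch:</p>
<pre class="brush: ruby;">
require 'optparse'

# Hypothetical sketch: map the three skip flags onto the stages they disable.
# builder.sh is a shell script; this only illustrates the behaviour of the flags.
def stages_for(argv)
  stages = { sync: true, chrome: true, chrome_os: true }
  parser = OptionParser.new
  parser.on('--skip-sync') { stages[:sync] = false }
  parser.on('--skip-chrome-build') { stages[:chrome] = false }
  parser.on('--skip-chrome-os-build') { stages[:chrome_os] = false }
  parser.parse(argv)
  stages
end

p stages_for(['--skip-sync', '--skip-chrome-build'])
</pre>
<p>Every stage runs by default and each flag switches exactly one of them off, which matches the way the options are described in the list.</p>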
<p>Here are a few other important things to make note of:</p>
<ul>
<li>The build script adds a user named &#034;gogo&#034; that can be used to log in if you don&#039;t want to use your Google login.</li>
<li>The build script sets the password for root access to &#034;gogo&#034;.</li>
<li>Running the build script should result in an image named /mnt/builder/&lt;BUILDID&gt;.usb.img.bz when everything is done. This is a compressed bootable image that can be written to a USB key once it has been decompressed (use something like dd if=usb.img of=/dev/usbkeydevice bs=10M).</li>
<li> I could imagine the same instructions and script being re-purposed for building on a non-EC2 system too.</li>
<li>The script can be run multiple times on the same node and will skip the parts that should only be done once.</li>
<li>There are probably ways to make the build even faster. With some more effort the build for the browser could be kicked off while the source for the OS was still being synced. I also wondered if using a 64 bit EC2 node and creating a large ramdisk would help, but I didn&#039;t want to fiddle with 64 bit builds of Chrome. I may revisit this if I find time.</li>
<li>I was tempted to build an AMI that included all the prerequisites but it only takes 5 minutes to pull all of those together so I decided not to. It could streamline things to do that but I&#039;m not sure that it is worth the effort.</li>
</ul>
<p>If you are looking for more information about the script, places where it might make sense to modify the script and the build process in general then read on. </p>
<p>To get started there are a number of prerequisites that need to be installed. These are the required parts for both Chromium and Chromium OS. One note here is that there is some issue with lighttpd on Ubuntu that causes it to die while the apt-get is downloading all the packages so I had to resort to installing Apache as well. I make sure lighttpd isn&#039;t running since the prerequisites install it.</p>
<div class="codesnip-container" >
<div class="bash codesnip" style="font-family:monospace;"><span class="kw1">if</span> <span class="br0">&#91;</span> <span class="re5">-f</span> <span class="sy0">/</span>var<span class="sy0">/</span>run<span class="sy0">/</span>chromepre <span class="br0">&#93;</span><br />
<span class="kw1">then</span><br />
&nbsp; <span class="kw3">echo</span> <span class="st0">&quot;Prereqs already installed, skipping&#8230;&quot;</span><br />
&nbsp; <span class="kw3">return</span><br />
<span class="kw1">fi</span></p>
<p><span class="kw2">touch</span> <span class="sy0">/</span>var<span class="sy0">/</span>run<span class="sy0">/</span>chromepre</p>
<p><span class="kw2">apt-get</span> <span class="re5">-y</span> <span class="re5">&#45;-force-yes</span> update<br />
<span class="kw2">apt-get</span> <span class="re5">-y</span> <span class="re5">&#45;-force-yes</span> upgrade<br />
<span class="kw2">wget</span> http:<span class="sy0">//</span>src.chromium.org<span class="sy0">/</span>svn<span class="sy0">/</span>trunk<span class="sy0">/</span>src<span class="sy0">/</span>build<span class="sy0">/</span>install-build-deps.sh <span class="re5">-O</span> <span class="sy0">/</span>tmp<span class="sy0">/</span>install-build-deps.sh<br />
<span class="kw2">chmod</span> +x <span class="sy0">/</span>tmp<span class="sy0">/</span>install-build-deps.sh<br />
<span class="kw3">echo</span> <span class="st0">&quot;yy&quot;</span> <span class="sy0">|</span> <span class="sy0">/</span>tmp<span class="sy0">/</span>install-build-deps.sh <span class="co0"># (requires y twice)</span><br />
<span class="sy0">/</span>etc<span class="sy0">/</span>init.d<span class="sy0">/</span>lighttpd stop<br />
<span class="kw2">apt-get</span> <span class="re5">-y</span> <span class="re5">&#45;-force-yes</span> <span class="kw2">install</span> git-core apache2 apt-mirror <span class="kw2">zip</span></p>
<p><span class="kw2">cat</span> <span class="sy0">&gt;</span> <span class="sy0">/</span>etc<span class="sy0">/</span>apt<span class="sy0">/</span>mirror.list <span class="sy0">&lt;&lt;</span>__EOF__<br />
<span class="kw1">set</span> base_path &nbsp; &nbsp;<span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chromeos<span class="sy0">/</span>repo<br />
<span class="kw1">set</span> mirror_path &nbsp;<span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chromeos<span class="sy0">/</span>repo<span class="sy0">/</span>mirror<br />
<span class="kw1">set</span> skel_path &nbsp; &nbsp;<span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chromeos<span class="sy0">/</span>repo<span class="sy0">/</span>skel<br />
<span class="kw1">set</span> nthreads &nbsp; &nbsp; 20<br />
<span class="kw1">set</span> _tilde 0<br />
deb http:<span class="sy0">//</span>build.chromium.org<span class="sy0">/</span>buildbot<span class="sy0">/</span>packages<span class="sy0">/</span> chromeos main restricted universe multiverse<br />
deb http:<span class="sy0">//</span>build.chromium.org<span class="sy0">/</span>buildbot<span class="sy0">/</span>packages<span class="sy0">/</span> chromeos_dev main restricted universe multiverse<br />
clean http:<span class="sy0">//</span>build.chromium.org<span class="sy0">/</span>buildbot<span class="sy0">/</span>packages<span class="sy0">/</span><br />
__EOF__</p>
<p><span class="kw1">if</span> <span class="br0">&#91;</span> <span class="sy0">!</span> <span class="re5">-b</span> <span class="sy0">/</span>dev<span class="sy0">/</span>loop6 <span class="br0">&#93;</span><br />
<span class="kw1">then</span><br />
&nbsp; <span class="kw2">mknod</span> <span class="re5">-m660</span> <span class="sy0">/</span>dev<span class="sy0">/</span>loop6 b 7 6<br />
<span class="kw1">fi</span></p>
<p><span class="kw1">if</span> <span class="br0">&#91;</span> <span class="sy0">!</span> <span class="re5">-b</span> <span class="sy0">/</span>dev<span class="sy0">/</span>loop7 <span class="br0">&#93;</span><br />
<span class="kw1">then</span><br />
&nbsp; <span class="kw2">mknod</span> <span class="re5">-m660</span> <span class="sy0">/</span>dev<span class="sy0">/</span>loop7 b 7 7<br />
<span class="kw1">fi</span></div>
</div>
<p><br/></p>
<p>The next step is to create a user that will actually run the build. It turns out to be important that this user is not root, which is why there are some hoops being jumped through here. The two source volumes get mounted into the user&#039;s home directory here as well. The last part of this section builds a filesystem to be used as temporary storage for output images. The way the build is set up, it pushes the final images into the source tree, and because that is stored on EBS it could end up eating a lot of room. There is more on this later in the Chrome OS build section.</p>
<div class="codesnip-container" >
<div class="bash codesnip" style="font-family:monospace;"><span class="kw1">if</span> <span class="br0">&#91;</span> <span class="re5">-d</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder <span class="br0">&#93;</span><br />
<span class="kw1">then</span><br />
&nbsp; <span class="kw3">echo</span> <span class="st0">&quot;Build user already created, skipping&#8230;&quot;</span><br />
&nbsp; <span class="kw3">return</span><br />
<span class="kw1">fi</span></p>
<p>useradd <span class="re5">-G</span> disk <span class="re5">-u</span> 1001 <span class="re5">-s</span> <span class="sy0">/</span>bin<span class="sy0">/</span><span class="kw2">bash</span> <span class="re5">-d</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder <span class="re5">-m</span> builder<br />
<span class="kw3">echo</span> <span class="st0">&quot;builder &nbsp; &nbsp;ALL=NOPASSWD: ALL&quot;</span> <span class="sy0">&gt;&gt;</span> <span class="sy0">/</span>etc<span class="sy0">/</span>sudoers</p>
<p><span class="kw2">mkdir</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chromeos<br />
<span class="kw2">chown</span> builder.builder <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chromeos<span class="sy0">/</span><br />
<span class="kw2">mkdir</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chrome<br />
<span class="kw2">chown</span> builder.builder <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chrome<span class="sy0">/</span></p>
<p><span class="kw1">if</span> <span class="br0">&#91;</span> <span class="re5">-b</span> <span class="re1">$OSVOLDEV</span> <span class="br0">&#93;</span><br />
<span class="kw1">then</span><br />
&nbsp; <span class="kw2">mount</span> <span class="re1">$OSVOLDEV</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chromeos<span class="sy0">/</span><br />
<span class="kw1">fi</span><br />
<span class="kw1">if</span> <span class="br0">&#91;</span> <span class="re5">-b</span> <span class="re1">$BROWSERVOLDEV</span> <span class="br0">&#93;</span><br />
<span class="kw1">then</span><br />
&nbsp; <span class="kw2">mount</span> <span class="re1">$BROWSERVOLDEV</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chrome<span class="sy0">/</span><br />
<span class="kw1">fi</span></p>
<p><span class="kw3">echo</span> <span class="st0">&quot;export PATH=<span class="es1">\$</span>PATH:/mnt/builder/chromeos/depot_tools/&quot;</span> <span class="sy0">&gt;&gt;</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>.bashrc</p>
<p><span class="kw2">ln</span> <span class="re5">-s</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chromeos<span class="sy0">/</span>repo<span class="sy0">/</span>mirror<span class="sy0">/</span>build.chromium.org<span class="sy0">/</span>buildbot<span class="sy0">/</span>packages<span class="sy0">/</span> <span class="sy0">/</span>var<span class="sy0">/</span>www<span class="sy0">/</span>packages</p>
<p><span class="kw1">if</span> <span class="br0">&#91;</span> <span class="sy0">!</span> <span class="re5">-f</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>images.img <span class="br0">&#93;</span><br />
<span class="kw1">then</span><br />
&nbsp; <span class="kw2">dd</span> <span class="kw1">if</span>=<span class="sy0">/</span>dev<span class="sy0">/</span>zero <span class="re2">of</span>=<span class="sy0">/</span>mnt<span class="sy0">/</span>images.img <span class="re2">bs</span>=1 <span class="re2">count</span>=0 <span class="re2">seek</span>=10G<br />
&nbsp; losetup <span class="sy0">/</span>dev<span class="sy0">/</span>loop6 <span class="sy0">/</span>mnt<span class="sy0">/</span>images.img<br />
&nbsp; mkfs.ext3 <span class="sy0">/</span>dev<span class="sy0">/</span>loop6<br />
<span class="kw1">fi</span></p>
<p><span class="kw1">if</span> <span class="br0">&#91;</span> <span class="sy0">!</span> <span class="re5">-d</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chromeos<span class="sy0">/</span>depot_tools<span class="sy0">/</span> <span class="br0">&#93;</span><br />
<span class="kw1">then</span><br />
&nbsp; <span class="kw2">su</span> - builder <span class="re5">-c</span> <span class="st0">&quot;cd /mnt/builder/chromeos/; svn co http://src.chromium.org/svn/trunk/tools/depot_tools&quot;</span><br />
<span class="kw1">fi</span></div>
</div>
<p><br/></p>
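<p>As a rough illustration of the sparse file trick used for images.img above, the following sketch creates a sparse file in a temporary directory and compares its apparent size with the space actually allocated. The helper name make_sparse and the 10M size are assumptions made for the example, and it relies on GNU dd, stat and du. No loop device or root access is involved.</p>

```shell
#!/bin/bash
# Hypothetical helper: create a sparse file at path $1 with size $2.
make_sparse() {
  # count=0 writes no data; seek moves the end of the file to the requested size
  dd if=/dev/zero of="$1" bs=1 count=0 seek="$2" 2>/dev/null
}

workdir=$(mktemp -d)
make_sparse "$workdir/images.img" 10M

apparent=$(stat -c %s "$workdir/images.img")        # size in bytes as ls -l would report it
allocated=$(du -k "$workdir/images.img" | cut -f1)  # kilobytes actually used on disk

echo "apparent bytes: $apparent"
echo "allocated KB:   $allocated"

rm -rf "$workdir"
```

<p>The allocated figure depends on the filesystem, but it should stay far below the apparent size until data is written, which is why the 10G image in the script does not cost 10G up front.</p>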
<p>The next sections each sync either source or the apt-get repository. The first is syncing the apt-get repository using apt-mirror. In case you missed it the mirror was defined in the prerequisites section above.</p>
<div class="codesnip-container" >
<div class="bash codesnip" style="font-family:monospace;"><span class="kw2">su</span> - builder <span class="re5">-c</span> apt-mirror</div>
</div>
<p><br/></p>
<p>The next is the Chromium OS source repository.</p>
<div class="codesnip-container" >
<div class="bash codesnip" style="font-family:monospace;"><span class="kw3">cd</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chromeos<span class="sy0">/</span>chromiumos<span class="sy0">/</span><br />
<span class="kw1">if</span> <span class="br0">&#91;</span> <span class="sy0">!</span> <span class="re5">-f</span> .gclient <span class="br0">&#93;</span><br />
<span class="kw1">then</span><br />
&nbsp; <span class="kw2">su</span> - builder <span class="re5">-c</span> <span class="st0">&quot;cd /mnt/builder/chromeos/chromiumos/; /mnt/builder/chromeos/depot_tools/gclient config http://src.chromium.org/git/chromiumos.git&quot;</span><br />
<span class="kw1">fi</span></p>
<p><span class="kw2">su</span> - builder <span class="re5">-c</span> <span class="st0">&quot;cd /mnt/builder/chromeos/chromiumos/; /mnt/builder/chromeos/depot_tools/gclient sync&quot;</span></div>
</div>
<p><br/></p>
<p>There are a few extra parts to syncing the Chromium browser. One of those extras is to make sure the third party test suites don&#039;t get synced since they are large.</p>
<div class="codesnip-container" >
<div class="bash codesnip" style="font-family:monospace;"><span class="kw2">cat</span> <span class="sy0">&gt;</span> <span class="sy0">/</span>tmp<span class="sy0">/</span>scs.sh <span class="sy0">&lt;&lt;</span>__EOF__<br />
<span class="kw3">export</span> <span class="re2">GYP_DEFINES</span>=<span class="st0">&quot;chromeos=1 target_arch=ia32&quot;</span><br />
<span class="kw3">export</span> <span class="re2">GYP_GENERATORS</span>=<span class="kw2">make</span><br />
<span class="kw3">cd</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chrome<span class="sy0">/</span></p>
<p><span class="kw1">if</span> <span class="br0">&#91;</span> <span class="sy0">!</span> <span class="re5">-f</span> .gclient <span class="br0">&#93;</span><br />
<span class="kw1">then</span><br />
&nbsp; <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chromeos<span class="sy0">/</span>depot_tools<span class="sy0">/</span>gclient config http:<span class="sy0">//</span>src.chromium.org<span class="sy0">/</span>svn<span class="sy0">/</span>trunk<span class="sy0">/</span>src<br />
&nbsp; <span class="kw2">awk</span> <span class="re5">-f</span> - .gclient <span class="sy0">&lt;&lt;</span> __END__<br />
<span class="br0">&#123;</span> print <span class="br0">&#125;</span><br />
<span class="sy0">/</span><span class="st0">&quot; &nbsp; &nbsp;&quot;</span>custom_deps<span class="st0">&quot; : {&quot;</span><span class="sy0">/</span> <span class="br0">&#123;</span> <br />
&nbsp; &nbsp; print <span class="st0">&quot; &nbsp; &nbsp;<span class="es1">\&quot;</span>custom_deps<span class="es1">\&quot;</span> : {<span class="es1">\n</span><span class="es1">\&quot;</span>src/third_party/WebKit/LayoutTests<span class="es1">\&quot;</span>: None,&quot;</span> <br />
<span class="br0">&#125;</span><br />
__END__<br />
<span class="kw1">fi</span></p>
<p><span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chromeos<span class="sy0">/</span>depot_tools<span class="sy0">/</span>gclient <span class="kw2">sync</span> <span class="re5">--deps</span>=<span class="st0">&quot;chromeos,unix&quot;</span></p>
<p><span class="kw3">exit</span><br />
__EOF__</p>
<p><span class="kw2">chmod</span> +x <span class="sy0">/</span>tmp<span class="sy0">/</span>scs.sh</p>
<p><span class="kw2">su</span> - builder <span class="re5">-c</span> <span class="sy0">/</span>tmp<span class="sy0">/</span>scs.sh</div>
</div>
<p><br/></p>
<p>The next step is building the Chromium browser. This isn&#039;t needed every time. When it is done it puts the browser in place for the Chromium OS build.</p>
<div class="codesnip-container" >
<div class="bash codesnip" style="font-family:monospace;"><span class="kw1">if</span> <span class="br0">&#91;</span> <span class="sy0">!</span> <span class="re5">-d</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chrome.tmp <span class="br0">&#93;</span><br />
<span class="kw1">then</span><br />
&nbsp; <span class="kw2">su</span> - builder <span class="re5">-c</span> <span class="st0">&quot;cp -Rp /mnt/builder/chrome /mnt/builder/chrome.tmp&quot;</span><br />
<span class="kw1">fi</span><br />
<span class="kw2">su</span> - builder <span class="re5">-c</span> <span class="st0">&quot;export PATH=<span class="es2">$PATH</span>:/mnt/builder/chromeos/depot_tools/; /mnt/builder/chromeos/chromiumos/chromiumos.git/src/scripts/build_chrome.sh --chrome_dir /mnt/builder/chrome.tmp/&quot;</span></div>
</div>
<p><br/></p>
<p>Finally the place where the OS actually gets compiled. The majority of this is right out of the build instructions. One thing to understand here is that the build process creates a chroot environment and then maps the source into that before the build starts. The build is done inside the chroot environment so the build script has to create a temporary script with all the commands that need to run there. One other note here is that the images filesystem created above has to be mapped into the chroot environment and I did that by setting it up as a loopback device then mounting it from inside the chroot environment. After the build is complete I can then get to the filesystem outside of the chroot environment to grab the final image. If you want to change the test user, the system password or get rid of either of those options this is the section to change.</p>
<div class="codesnip-container" >
<div class="bash codesnip" style="font-family:monospace;"><span class="kw2">su</span> - builder <span class="re5">-c</span> <span class="st0">&quot;/mnt/builder/chromeos/chromiumos/chromiumos.git/src/scripts/make_chroot.sh --mirror http://localhost/packages/ --chroot /mnt/builder/chroot --replace&quot;</span></p>
<p><span class="kw2">cat</span> <span class="sy0">&gt;</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chroot<span class="sy0">/</span>tmp<span class="sy0">/</span>go.sh <span class="sy0">&lt;&lt;</span> __EOF__<br />
<span class="co0">#!/bin/sh</span><br />
<span class="kw2">rm</span> <span class="re5">-f</span> <span class="sy0">/</span>tmp<span class="sy0">/</span>go.sh</p>
<p><span class="kw3">cd</span> ..<span class="sy0">/</span>platform<span class="sy0">/</span>pam_google <span class="sy0">&amp;&amp;</span> .<span class="sy0">/</span>enable_localaccount.sh gogo<br />
<span class="kw3">cd</span> -<br />
<span class="kw3">echo</span> <span class="st0">&quot;gogo&quot;</span> <span class="sy0">|</span> .<span class="sy0">/</span>set_shared_user_password.sh</p>
<p>.<span class="sy0">/</span>build_platform_packages.sh<br />
.<span class="sy0">/</span>build_kernel.sh</p>
<p><span class="kw2">sudo</span> <span class="kw2">mount</span> <span class="sy0">/</span>dev<span class="sy0">/</span>loop6 <span class="sy0">/</span>home<span class="sy0">/</span>builder<span class="sy0">/</span>trunk<span class="sy0">/</span>src<span class="sy0">/</span>build<span class="sy0">/</span>images<span class="sy0">/</span><br />
<span class="kw2">sudo</span> <span class="kw2">chown</span> builder.adm <span class="sy0">/</span>home<span class="sy0">/</span>builder<span class="sy0">/</span>trunk<span class="sy0">/</span>src<span class="sy0">/</span>build<span class="sy0">/</span>images</p>
<p>.<span class="sy0">/</span>build_image.sh <span class="re5">--mirror</span> http:<span class="sy0">//</span>localhost<span class="sy0">/</span>packages<span class="sy0">/</span></p>
<p><span class="kw2">sudo</span> <span class="kw2">umount</span> <span class="sy0">/</span>home<span class="sy0">/</span>builder<span class="sy0">/</span>trunk<span class="sy0">/</span>src<span class="sy0">/</span>build<span class="sy0">/</span>images<span class="sy0">/</span></p>
<p><span class="kw3">exit</span><br />
__EOF__</p>
<p><span class="kw2">chown</span> builder.builder <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chroot<span class="sy0">/</span>tmp<span class="sy0">/</span>go.sh<br />
<span class="kw2">chmod</span> +x <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chroot<span class="sy0">/</span>tmp<span class="sy0">/</span>go.sh</p>
<p><span class="kw2">su</span> - builder <span class="re5">-c</span> <span class="st0">&quot;/mnt/builder/chromeos/chromiumos/chromiumos.git/src/scripts/enter_chroot.sh --chroot /mnt/builder/chroot /tmp/go.sh&quot;</span></p>
<p><span class="kw2">mkdir</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>mnt<span class="sy0">/</span><br />
<span class="kw2">mount</span> <span class="sy0">/</span>dev<span class="sy0">/</span>loop6 <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>mnt<br />
<span class="re2">NEWEST</span>=<span class="sy0">`</span><span class="kw2">ls</span> <span class="re5">-tadr</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>mnt<span class="sy0">/*/</span> <span class="sy0">|</span> <span class="kw2">tail</span> <span class="re5">-1</span> <span class="sy0">|</span> <span class="kw2">head</span> -1<span class="sy0">`</span><br />
<span class="re2">BUILDID</span>=<span class="sy0">`</span><span class="kw2">basename</span> <span class="re1">$NEWEST</span><span class="sy0">`</span><br />
<span class="kw2">dd</span> <span class="kw1">if</span>=<span class="sy0">/</span>dev<span class="sy0">/</span>zero <span class="re2">of</span>=<span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span><span class="re1">$BUILDID</span>.usb.img <span class="re2">bs</span>=1 <span class="re2">count</span>=0 <span class="re2">seek</span>=4G<br />
losetup <span class="sy0">/</span>dev<span class="sy0">/</span>loop7 <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span><span class="re1">$BUILDID</span>.usb.img<br />
<span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>chromeos<span class="sy0">/</span>chromiumos<span class="sy0">/</span>chromiumos.git<span class="sy0">/</span>src<span class="sy0">/</span>scripts<span class="sy0">/</span>image_to_usb.sh <span class="re5">-y</span> <span class="re5">--from</span>=<span class="re1">$NEWEST</span> <span class="re5">--to</span>=<span class="sy0">/</span>dev<span class="sy0">/</span>loop7<br />
losetup <span class="re5">-d</span> <span class="sy0">/</span>dev<span class="sy0">/</span>loop7<br />
<span class="kw2">umount</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span>mnt<br />
<span class="kw2">bzip2</span> <span class="re5">-9</span> <span class="sy0">/</span>mnt<span class="sy0">/</span>builder<span class="sy0">/</span><span class="re1">$BUILDID</span>.usb.img</div>
</div>
<p><br/></p>
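<p>The NEWEST and BUILDID lines above rely on ls sorting by modification time. A minimal sketch of that selection step, run against two throwaway directories instead of the mounted image filesystem, might look like this. The helper name newest_dir and the directory names are made up for the example, and touch -d assumes GNU coreutils.</p>

```shell
#!/bin/bash
# Hypothetical helper: print the most recently modified subdirectory of $1.
newest_dir() {
  # -t sorts newest first and -r reverses that, so the newest entry comes last
  ls -tdr "$1"/*/ | tail -1
}

workdir=$(mktemp -d)
mkdir "$workdir/build-old" "$workdir/build-new"
touch -d '2010-01-01 00:00:00' "$workdir/build-old"
touch -d '2010-01-02 00:00:00' "$workdir/build-new"

NEWEST=$(newest_dir "$workdir")
BUILDID=$(basename "$NEWEST")   # basename drops the trailing slash as well
echo "$BUILDID"

rm -rf "$workdir"
```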
<p>A final bit of glue looks for command line arguments and ties everything together.</p>
<div class="codesnip-container" >
<div class="bash codesnip" style="font-family:monospace;"><span class="re2">SKIP_SYNC</span>=<span class="kw2">false</span><br />
<span class="re2">SKIP_CHROME_BUILD</span>=<span class="kw2">false</span><br />
<span class="re2">SKIP_CHROME_OS_BUILD</span>=<span class="kw2">false</span></p>
<p><span class="kw1">set</span> <span class="re5">--</span> $<span class="br0">&#40;</span><span class="kw2">getopt</span> <span class="re5">-l</span> skip-sync,skip-chrome-build,skip-chrome-os-build abc: <span class="st0">&quot;$@&quot;</span><span class="br0">&#41;</span><br />
<span class="kw1">while</span> <span class="br0">&#91;</span> <span class="re4">$#</span> <span class="re5">-gt</span> 0 <span class="br0">&#93;</span><br />
<span class="kw1">do</span><br />
&nbsp; <span class="kw1">case</span> <span class="st0">&quot;$1&quot;</span> <span class="kw1">in</span><br />
&nbsp; &nbsp; <span class="br0">&#40;</span>--skip-sync<span class="br0">&#41;</span> <span class="re2">SKIP_SYNC</span>=<span class="kw2">true</span><span class="sy0">;;</span><br />
&nbsp; &nbsp; <span class="br0">&#40;</span>--skip-chrome-build<span class="br0">&#41;</span> <span class="re2">SKIP_CHROME_BUILD</span>=<span class="kw2">true</span><span class="sy0">;;</span><br />
&nbsp; &nbsp; <span class="br0">&#40;</span>--skip-chrome-os-build<span class="br0">&#41;</span> <span class="re2">SKIP_CHROME_OS_BUILD</span>=<span class="kw2">true</span><span class="sy0">;;</span><br />
&nbsp; &nbsp; <span class="br0">&#40;</span>--<span class="br0">&#41;</span> <span class="kw3">shift</span>; <span class="kw3">break</span><span class="sy0">;;</span><br />
&nbsp; &nbsp; <span class="br0">&#40;</span>--<span class="sy0">*</span><span class="br0">&#41;</span> <span class="kw3">echo</span> <span class="st0">&quot;$0: error - unrecognized option $1&quot;</span> 1<span class="sy0">&gt;&amp;</span><span class="nu0">2</span>; <span class="kw3">exit</span> <span class="nu0">1</span><span class="sy0">;;</span><br />
&nbsp; &nbsp; <span class="br0">&#40;</span><span class="sy0">*</span><span class="br0">&#41;</span> &nbsp;<span class="kw3">break</span><span class="sy0">;;</span><br />
&nbsp; <span class="kw1">esac</span><br />
&nbsp; <span class="kw3">shift</span><br />
<span class="kw1">done</span></p>
<p>install_prereqs<br />
create_build_user</p>
<p><span class="kw1">if</span> <span class="br0">&#91;</span> <span class="st0">&quot;<span class="es2">$SKIP_SYNC</span>&quot;</span> <span class="sy0">!</span>= <span class="st0">&quot;true&quot;</span> <span class="br0">&#93;</span><br />
<span class="kw1">then</span><br />
&nbsp; sync_apt<br />
&nbsp; sync_chrome<br />
&nbsp; sync_chrome_os<br />
<span class="kw1">fi</span></p>
<p><span class="kw1">if</span> <span class="br0">&#91;</span> <span class="st0">&quot;<span class="es2">$SKIP_CHROME_BUILD</span>&quot;</span> <span class="sy0">!</span>= <span class="st0">&quot;true&quot;</span> <span class="br0">&#93;</span><br />
<span class="kw1">then</span><br />
&nbsp; build_chrome<br />
<span class="kw1">fi</span></p>
<p><span class="kw1">if</span> <span class="br0">&#91;</span> <span class="st0">&quot;<span class="es2">$SKIP_CHROME_OS_BUILD</span>&quot;</span> <span class="sy0">!</span>= <span class="st0">&quot;true&quot;</span> <span class="br0">&#93;</span><br />
<span class="kw1">then</span><br />
&nbsp; build_chrome_os<br />
<span class="kw1">fi</span></div>
</div>
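<p>For anyone who wants to experiment with the flag handling on its own, here is a minimal sketch written as a plain case loop rather than with getopt. Each recognized flag falls through to the shift, so several flags can be combined in one run. The function name parse_flags is an assumption for the example and is not part of the build script.</p>

```shell
#!/bin/bash
# Hypothetical helper: set the SKIP_* variables from the arguments passed in.
parse_flags() {
  SKIP_SYNC=false
  SKIP_CHROME_BUILD=false
  SKIP_CHROME_OS_BUILD=false
  while [ $# -gt 0 ]
  do
    case "$1" in
      --skip-sync) SKIP_SYNC=true;;
      --skip-chrome-build) SKIP_CHROME_BUILD=true;;
      --skip-chrome-os-build) SKIP_CHROME_OS_BUILD=true;;
      --) shift; break;;   # explicit end of options
      --*) echo "$0: error - unrecognized option $1" 1>&2; return 1;;
      *) break;;           # first non-option argument ends parsing
    esac
    shift
  done
}

parse_flags --skip-sync --skip-chrome-os-build
echo "$SKIP_SYNC $SKIP_CHROME_BUILD $SKIP_CHROME_OS_BUILD"
```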
]]></content:encoded>
			<wfw:commentRss>http://www.ioncannon.net/system-administration/771/building-chromium-and-chrome-os-with-ec2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.ioncannon.net/system-administration/771/building-chromium-and-chrome-os-with-ec2/</feedburner:origLink></item>
		<item>
		<title>Upgrade to Fedora 12 from Fedora 11</title>
		<link>http://feedproxy.google.com/~r/ioncannon/~3/XAiHMjGzO9A/</link>
		<comments>http://www.ioncannon.net/system-administration/719/upgrade-to-fedora-12-from-fedora-11/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 11:56:18 +0000</pubDate>
		<dc:creator>carson</dc:creator>
				<category><![CDATA[system administration]]></category>
		<category><![CDATA[Fedora]]></category>
		<category><![CDATA[linux]]></category>

		<guid isPermaLink="false">http://www.ioncannon.net/?p=719</guid>
		<description><![CDATA[Fedora 12 was just released and it is time to upgrade again of course. I almost thought this was going to be a version to yawn at but then I saw that there was going to be a new version of Fedora based on Moblin and it seemed exciting again. Of course that isn&#039;t the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://docs.fedoraproject.org/release-notes/f12/en-US/html/">Fedora 12</a> was just released and it is time to upgrade again of course. I almost thought this was going to be a version to yawn at but then I saw that there was going to be a new version of Fedora based on <a href="http://moblin.org/">Moblin</a> and it seemed exciting again. Of course that isn&#039;t the only thing being upgraded in the latest version of Fedora. Some of the more notable changes in this version:</p>
<ul>
<li>Updated window managers <a href="http://www.fedoraproject.org/wiki/Features/Gnome2.28">Gnome 2.28</a>, <a href="http://www.fedoraproject.org/wiki/Features/KDE43">KDE 4.3</a> and <a href="http://www.fedoraproject.org/wiki/Features/FedoraMoblin">Fedora Moblin</a></li>
<li><a href="http://fedoraproject.org/wiki/Features/Presto">Delta RPM support</a></li>
<li><a href="http://www.fedoraproject.org/wiki/Features/F12X86Support">i686 as the base architecture</a></li>
<li>Lots of virtualization changes: <a href="http://www.fedoraproject.org/wiki/Features/KSM">KSM</a>, <a href="http://www.fedoraproject.org/wiki/Features/KVM_Huge_Page_Backed_Memory">KVM huge page support</a>, <a href="http://www.fedoraproject.org/wiki/Features/KVM_NIC_Hotplug">KVM NIC hotplug</a>, <a href="http://www.fedoraproject.org/wiki/Features/KVM_qcow2_Performance">KVM QCow2 performance improvements</a>, <a href="http://www.fedoraproject.org/wiki/Features/KVM_Stable_Guest_ABI">KVM Stable Guest ABI</a>, <a href="http://www.fedoraproject.org/wiki/Features/libguestfs">libguestfs</a>, <a href="http://www.fedoraproject.org/wiki/Features/Network_Interface_Management">Virtual network management</a> and <a href="http://www.fedoraproject.org/wiki/Features/VirtPrivileges">improved virtual privileges</a> to name a few</li>
<li>An easier to use bug reporting interface <a href="http://www.fedoraproject.org/wiki/Features/ABRTF12">Abrt 1.0</a></li>
<li><a href="http://www.fedoraproject.org/wiki/Features/BetterWebcamSupportF12">Better Webcam Support</a></li>
</ul>
<p>You can find the complete list of <a href="http://www.fedoraproject.org/wiki/Releases/12/FeatureList">Fedora 12 enhancements</a> as well if you want more details.</p>
<p><span id="more-719"></span></p>
<p>I&#039;m again starting with <a href="http://fedoraproject.org/wiki/Features/PreUpgrade">PreUpgrade</a> since it worked well last time. The steps are pretty much the same as last time but I did have more problems after the upgrade:</p>
<ol>
<li>You have to be at Fedora11 before you try this. If you aren&#039;t there follow the <a href="http://www.ioncannon.net/system-administration/358/upgrading-to-fedora-11-from-fedora-10/">steps to get to Fedora11</a>.</li>
<li>yum clean all</li>
<li>yum update</li>
<li>Make sure to back up your xorg.conf since it can disappear with the upgrade.</li>
<li>preupgrade-cli &#034;Fedora 12 (Constantine)&#034; (the total download for the upgrade from Fedora 11 to Fedora 12 was 1.1GB for me)</li>
<li>After I rebooted the first time I got an error that the /boot directory didn&#039;t have enough space. It turns out I needed about 27M of free space so I ended up having to delete old kernels until I had 28M of free space.</li>
<li>reboot and wait</li>
</ol>
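<p>To avoid the /boot surprise from the steps above, the free space can be checked before rebooting. This is only a sketch: the helper name free_mb is made up, the 28M threshold is simply the figure I ended up needing, and the script falls back to / when there is no /boot directory so that it runs anywhere.</p>

```shell
#!/bin/bash
# Hypothetical helper: print the free space, in MB, of the filesystem holding $1.
free_mb() {
  # -P keeps each filesystem on one line, -m reports 1 MB blocks; column 4 is "Available"
  df -Pm "$1" | awk 'NR==2 {print $4}'
}

target=/boot
[ -d "$target" ] || target=/   # fallback so the sketch also runs without /boot

need=28
have=$(free_mb "$target")
if [ "$have" -lt "$need" ]
then
  echo "$target has ${have}M free, need at least ${need}M"
else
  echo "$target has enough free space (${have}M)"
fi
```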
<p>I lucked out again and didn&#039;t have to remove anything to fix dependency issues. I recompiled my existing NVIDIA driver but I ran into an issue with the nouveau NVIDIA driver that comes with Fedora 12. I needed to remove the nouveau package, rebuild initrd and reboot before I could build the NVIDIA driver:</p>
<div class="codesnip-container" >
<div class="text codesnip" style="font-family:monospace;">yum remove xorg-x11-drv-nouveau.i686<br />
rm /boot/initramfs-2.6.31.5-127.fc12.i686.img<br />
mkinitrd /boot/initramfs-2.6.31.5-127.fc12.i686.img 2.6.31.5-127.fc12.i686</div>
</div>
<p>If that is too much work you can also get the NVIDIA driver from <a href="http://www.atrpms.net/">atrpms</a> but you will still need to remove the nouveau driver first. I also needed to pull down the Fedora 12 version of <a href="http://www.virtualbox.org/wiki/Linux_Downloads">Virtualbox</a> but so far that was all.</p>
<p>If you prefer the yum upgrade option here are the steps for that as well:</p>
<ol>
<li>yum clean all</li>
<li>yum update</li>
<li>rpm -Uvh http://mirrors.kernel.org/fedora/releases/12/Fedora/i386/os/Packages/fedora-release-notes-12.0.0-4.fc12.noarch.rpm http://mirrors.kernel.org/fedora/releases/12/Fedora/i386/os/Packages/fedora-release-12-1.noarch.rpm</li>
<li>yum clean all</li>
<li>yum -y update</li>
<li>You may need to resolve dependencies and then do another yum -y update</li>
<li>reboot</li>
</ol>
<p>I had to resolve dependencies to get this to work. I had to remove tigervnc-server-1.0.0-2.fc11.i586 and VirtualBox-3.0.10_54097_fedora11-1.i586. The resulting update was about 1.4G so it took a little while to apply.</p>
<p>I&#039;ve also put together a few <a href="http://www.ioncannon.net/projects/miscellaneous-projects/fedora-12-screenshots-and-videos/">videos and screenshots</a> if you want to get a quick preview of what the different versions available look like. You can also view the videos on Youtube: <a href="http://www.youtube.com/watch?v=W6rTVTyjQ1Q&#038;feature=player_embedded">Moblin</a>, <a href="http://www.youtube.com/watch?v=F0UlVnoUtbw&#038;feature=player_embedded">KDE</a> and <a href="http://www.youtube.com/watch?v=GZ8tSxMJTr8&#038;feature=player_embedded">Gnome</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.ioncannon.net/system-administration/719/upgrade-to-fedora-12-from-fedora-11/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.ioncannon.net/system-administration/719/upgrade-to-fedora-12-from-fedora-11/</feedburner:origLink></item>
		<item>
		<title>Full Text Search with Sphinx</title>
		<link>http://feedproxy.google.com/~r/ioncannon/~3/G_kGmfpWI_A/</link>
		<comments>http://www.ioncannon.net/programming/685/full-text-search-with-sphinx/#comments</comments>
		<pubDate>Tue, 20 Oct 2009 08:30:28 +0000</pubDate>
		<dc:creator>carson</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[sphinx]]></category>

		<guid isPermaLink="false">http://www.ioncannon.net/?p=685</guid>
		<description><![CDATA[While developing my GeeQE iPhone application I decided I needed a way to let users search posts so I started looking around for a simple search engine that I could use with PHP. I took a look at a number of different options like MySQL Full Text search, Sphinx, Solr and others based on Lucene. [...]]]></description>
			<content:encoded><![CDATA[<p>While developing my <a href="http://www.ioncannon.net/projects/geeqe/">GeeQE</a> iPhone application I decided I needed a way to let users search posts so I started looking around for a simple search engine that I could use with PHP. I took a look at a number of different options like <a href="http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html">MySQL Full Text search</a>, <a href="http://www.sphinxsearch.com/">Sphinx</a>, <a href="http://lucene.apache.org/solr/">Solr</a> and others based on <a href="http://lucene.apache.org/">Lucene</a>. After looking at what it would take to get started with each I decided to go with Sphinx. Sphinx looked like it would be the easiest and quickest to set up, didn&#039;t require a lot of resources to run in an idle state and would integrate with PHP easily.</p>
<p>This post goes over how I went about configuring Sphinx and gives an example of how to integrate it with PHP. I&#039;m using MySQL as the data store filled with the <a href="http://blog.stackoverflow.com/2009/06/stack-overflow-creative-commons-data-dump/">Stack Overflow CC data dump</a> although it should be easy to adapt the instructions to other data sources. To follow along just download a copy of the data dump and use my <a href="http://code.google.com/p/geeqe/source/browse/trunk/scripts/schema.sql">schema</a> and <a href="http://code.google.com/p/geeqe/source/browse/trunk/scripts/load.rb">loader</a> to get the same MySQL database.</p>
<p><span id="more-685"></span></p>
<p>I&#039;ve broken the setup down into the following 4 steps:</p>
<ol>
<li><a href="#sphinx-config">Configuring Sphinx</a></li>
<li><a href="#sphinx-index">Building an Index with Sphinx</a></li>
<li><a href="#sphinx-search">Searching with Sphinx</a></li>
<li><a href="#sphinx-php-api">Using the Sphinx PHP API</a></li>
</ol>
<p>As a side note, Sphinx is available as a package under Fedora 11 and I assume the same of other Linux distributions so it shouldn&#039;t be too hard to install. I am using version 0.9.8.1 compiled from source.</p>
<p><a name="sphinx-config"><b>Configuring Sphinx</b></a></p>
<p>The Sphinx configuration file can be broken down into different parts, with some of those parts being used by the indexer and some by the search service. I have broken the configuration file I am using into different sections but I include the full file for download at the end. The first segment of the configuration describes where to get the data for the index, which in this case is a MySQL database:</p>
<div class="codesnip-container" >
<div class="text codesnip" style="font-family:monospace;">###############################################<br />
## source for the data to be indexed<br />
###############################################</p>
<p>source sosrc<br />
{<br />
&nbsp; &nbsp; &nbsp; &nbsp; type&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = mysql</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; sql_host&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = localhost<br />
&nbsp; &nbsp; &nbsp; &nbsp; sql_user&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = souser<br />
&nbsp; &nbsp; &nbsp; &nbsp; sql_pass&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; =<br />
&nbsp; &nbsp; &nbsp; &nbsp; sql_db&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = so_2009_10<br />
&nbsp; &nbsp; &nbsp; &nbsp; sql_port&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = 3306</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; sql_query &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = SELECT id, owner_id, UNIX_TIMESTAMP(created) AS date_added, title, body_text FROM post WHERE post_type_id = 1</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; sql_attr_uint &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = owner_id<br />
&nbsp; &nbsp; &nbsp; &nbsp; sql_attr_timestamp&nbsp; &nbsp; &nbsp; = date_added</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; sql_ranged_throttle &nbsp; &nbsp; = 0</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; # document info query, ONLY for CLI search (ie. testing and debugging)<br />
&nbsp; &nbsp; &nbsp; &nbsp; sql_query_info&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = SELECT * FROM post WHERE id=$id<br />
}</div>
</div>
<p><br/></p>
<p>There are a few notable parts to the above. The &#034;sql_query&#034; setting defines the query that pulls rows from the database for the index; its first column must be the unique document id, which is why &#034;id&#034; comes first. The &#034;sql_attr_*&#034; settings name the columns that are stored as attributes, and attributes are what search queries can later sort or group by. There is more on that in the searching section.</p>
<p>The next section defines the index:</p>
<div class="codesnip-container" >
<div class="text codesnip" style="font-family:monospace;">###############################################<br />
## base search index<br />
###############################################</p>
<p>index so_2009_10<br />
{<br />
&nbsp; &nbsp; &nbsp; &nbsp; source&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = sosrc<br />
&nbsp; &nbsp; &nbsp; &nbsp; path&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = ./data/so_2009_10<br />
&nbsp; &nbsp; &nbsp; &nbsp; docinfo &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = extern<br />
&nbsp; &nbsp; &nbsp; &nbsp; mlock &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = 0</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; morphology&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = none</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; min_word_len&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = 1<br />
&nbsp; &nbsp; &nbsp; &nbsp; charset_type&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = sbcs</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; html_strip&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = 1<br />
}</p>
<p>###############################################<br />
## index that extends the original index<br />
###############################################</p>
<p>index so_2009_10stemmed : so_2009_10<br />
{<br />
&nbsp; &nbsp; &nbsp; &nbsp; path&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = ./data/so_2009_10stemmed<br />
&nbsp; &nbsp; &nbsp; &nbsp; morphology&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = stem_en<br />
}</div>
</div>
<p><br/></p>
<p>In the above there are two index definitions. The first index is very basic and has no morphology defined. The second index is derived from the first index and uses a <a href="http://en.wikipedia.org/wiki/Stemming">stemming</a> morphology. I&#039;m also having any HTML stripped out of the content since there is some in the data and it doesn&#039;t need to be searchable in the index.</p>
<p>The next block in the configuration file defines how the indexer will run. Here I&#039;m limiting the amount of memory used to 32M as an example of what could be done:</p>
<div class="codesnip-container" >
<div class="text codesnip" style="font-family:monospace;">###############################################<br />
## indexer settings<br />
###############################################</p>
<p>indexer<br />
{<br />
&nbsp; &nbsp; &nbsp; &nbsp; mem_limit &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = 32M<br />
}</div>
</div>
<p><br/></p>
<p>The final part of the configuration file sets options for the search daemon:</p>
<div class="codesnip-container" >
<div class="text codesnip" style="font-family:monospace;">###############################################<br />
## searchd settings<br />
###############################################</p>
<p>searchd<br />
{<br />
&nbsp; &nbsp; &nbsp; &nbsp; port&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = 3312<br />
&nbsp; &nbsp; &nbsp; &nbsp; log &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = /tmp/searchd.log<br />
&nbsp; &nbsp; &nbsp; &nbsp; query_log &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = /tmp/query.log</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; read_timeout&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = 5<br />
&nbsp; &nbsp; &nbsp; &nbsp; max_children&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = 30<br />
&nbsp; &nbsp; &nbsp; &nbsp; pid_file&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = /tmp/searchd.pid<br />
&nbsp; &nbsp; &nbsp; &nbsp; max_matches &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = 100<br />
&nbsp; &nbsp; &nbsp; &nbsp; seamless_rotate &nbsp; &nbsp; &nbsp; &nbsp; = 1<br />
&nbsp; &nbsp; &nbsp; &nbsp; preopen_indexes &nbsp; &nbsp; &nbsp; &nbsp; = 0<br />
&nbsp; &nbsp; &nbsp; &nbsp; unlink_old&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = 1<br />
}</div>
</div>
<p><br/></p>
<p>You can download the full configuration file here: <a href="/examples/sphinx.conf">sphinx.conf</a></p>
<p>The above configuration gets you the basics. If you want to find out more you can check out the <a href="http://www.sphinxsearch.com/docs/current.html#indexing">indexing</a> documentation.</p>
<p><a name="sphinx-index"><b>Building an Index with Sphinx</b></a></p>
<p>Building and maintaining an index gets more complex the more often it has to be updated. Luckily the data set I&#039;m working with is only released once a month, so it is fairly static and I only have to rebuild the index monthly, which keeps the indexing simple.</p>
<p>This is the command I use to create the index:</p>
<pre>
<div class="codesnip-container" >
<div class="bash codesnip" style="font-family:monospace;">indexer <span class="re5">--all</span> <span class="re5">--config</span> sphinx.conf</div>
</div>
</pre>
<p>Indexing the entire data set only takes a few minutes, so re-indexing speed isn&#039;t an issue for me. The resulting index is in the 250M range as of this post, so size isn&#039;t much of an issue either. However, I currently create the index on one machine and then upload it to the server, so a 250M transfer could become a bandwidth hog if I needed to update the index more often. If I were able to get deltas from the data dump I would look into using the <a href="http://www.sphinxsearch.com/docs/current.html#index-merging">index merging</a> feature and transferring only the delta index.</p>
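<p>As a rough sketch of what that merge step might look like, assuming a second index named so_2009_10delta has been defined in sphinx.conf for the new rows (that index name and the merge_delta helper are hypothetical and not part of my setup), the indexer --merge option folds the delta into the main index and --rotate asks a running searchd to pick up the result:</p>

```shell
#!/bin/sh
# Hypothetical helper: merge a delta index into the main index.
# INDEXER can be overridden (for example with "echo") for a dry run.
INDEXER="${INDEXER:-indexer}"

merge_delta() {
    main_index="$1"    # e.g. so_2009_10
    delta_index="$2"   # e.g. so_2009_10delta (assumed, not defined in this post)
    config="${3:-sphinx.conf}"
    # --merge takes the destination index first, then the index to fold in
    "$INDEXER" --config "$config" --merge "$main_index" "$delta_index" --rotate
}
```

<p>Setting INDEXER=echo before calling merge_delta prints the arguments instead of running the indexer, which is a cheap way to check them first.</p>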
<p><a name="sphinx-search"><b>Searching with Sphinx</b></a></p>
<p>Sphinx offers a lot of flexibility in configuring how a search is run and I found myself needing that flexibility to get better results. The search command line interface (see the <a href="http://www.sphinxsearch.com/docs/current.html#ref-search">reference</a> for all the options) can be used to test a few of the options but you can&#039;t do everything with it that you can with the various APIs available.</p>
<p>The first default I changed was how the search matches the query words. The default mode requires all of the words to be present:</p>
<pre>
<div class="codesnip-container" >
<div class="bash codesnip" style="font-family:monospace;">search <span class="re5">--config</span> sphinx.conf <span class="st0">&quot;php full text search&quot;</span></div>
</div>
</pre>
<p>Here is an example of the output you get from the CLI search:</p>
<div class="codesnip-container" >
<div class="text codesnip" style="font-family:monospace;">Copyright (c) 2001-2008, Andrew Aksyonoff</p>
<p>using config file &#039;sphinx.conf&#039;&#8230;<br />
index &#039;so_2009_10&#039;: query &#039;php full text search &#039;: returned 19 matches of 19 total in 0.061 sec</p>
<p>displaying matches:<br />
1. document=553055, weight=6, owner_id=2287, date_added=Mon Feb 16 11:45:01 2009<br />
&nbsp; &nbsp; &nbsp; &nbsp; id=553055<br />
&nbsp; &nbsp; &nbsp; &nbsp; post_type_id=1<br />
&nbsp; &nbsp; &nbsp; &nbsp; accepted_answer_id=553269<br />
&nbsp; &nbsp; &nbsp; &nbsp; parent_id=(null)<br />
&nbsp; &nbsp; &nbsp; &nbsp; score=6<br />
&nbsp; &nbsp; &nbsp; &nbsp; view_count=604<br />
&nbsp; &nbsp; &nbsp; &nbsp; body_text=truncated&#8230; body text would be here<br />
&nbsp; &nbsp; &nbsp; &nbsp; owner_id=2287<br />
&nbsp; &nbsp; &nbsp; &nbsp; last_editor_user_id=2287<br />
&nbsp; &nbsp; &nbsp; &nbsp; last_editor_display_name=PConroy<br />
&nbsp; &nbsp; &nbsp; &nbsp; last_edit_date=2009-02-18 10:54:44<br />
&nbsp; &nbsp; &nbsp; &nbsp; last_activity_date=2009-03-25 19:42:34<br />
&nbsp; &nbsp; &nbsp; &nbsp; title=Best full text search for mysql?<br />
&nbsp; &nbsp; &nbsp; &nbsp; answer_count=5<br />
&nbsp; &nbsp; &nbsp; &nbsp; comment_count=0<br />
&nbsp; &nbsp; &nbsp; &nbsp; favorite_count=7<br />
&nbsp; &nbsp; &nbsp; &nbsp; created=2009-02-16 11:45:01<br />
&#8230;</div>
</div>
<p><br/></p>
<p>The output shows you the &#034;document id&#034; as well as the weight and any &#034;sql_attr_&#034; values. I used the CLI search to get the weights for each search change I made before I started working with the API. Please note that the extended information in the above example output only shows up if the sql_query_info query is set correctly in the configuration file.</p>
<p>Next I tested with &#034;any&#034; and &#034;extended search version 2&#034; before settling on extended version 2:</p>
<pre>
<div class="codesnip-container" >
<div class="bash codesnip" style="font-family:monospace;">search <span class="re5">--any</span> <span class="re5">--config</span> sphinx.conf <span class="st0">&quot;php full text search&quot;</span>

search <span class="re5">--ext2</span> <span class="re5">--config</span> sphinx.conf <span class="st0">&quot;php full text search&quot;</span></div>
</div>
</pre>
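<p>To compare the modes side by side, something like the following sketch could be used. It assumes the search binary is on the PATH and keeps only the summary line (the one containing &#034;returned&#034;) from each run; the compare_modes helper name is made up for this example:</p>

```shell
#!/bin/sh
# Hypothetical helper: run one query under several match modes and keep
# only the summary line ("returned N matches ...") from each run.
# SEARCH can be overridden (for example with "echo") for a dry run.
SEARCH="${SEARCH:-search}"

compare_modes() {
    query="$1"
    config="${2:-sphinx.conf}"
    for mode in all any ext2; do
        printf '%s: ' "$mode"
        if [ "$mode" = "all" ]; then
            # "all" is the default match mode, so no flag is passed
            "$SEARCH" --config "$config" "$query" | grep 'returned' || echo "no summary line"
        else
            "$SEARCH" "--$mode" --config "$config" "$query" | grep 'returned' || echo "no summary line"
        fi
    done
}
```

<p>For example, compare_modes &#034;php full text search&#034; sphinx.conf prints one labelled summary per mode and per index.</p>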
<p>The CLI search will also perform sorts. It is important to note that you can only sort on values that were indexed using the &#034;sql_attr_*&#034; configuration options. Here is an example of sorting by an indexed value:</p>
<pre>
<div class="codesnip-container" >
<div class="bash codesnip" style="font-family:monospace;">search <span class="re5">--sortby</span> <span class="st0">&quot;date_added desc&quot;</span> <span class="re5">--config</span> sphinx.conf <span class="st0">&quot;php regex&quot;</span></div>
</div>
</pre>
<p>This is what happens when you try to sort on an attribute that was not indexed:</p>
<pre>
<div class="codesnip-container" >
<div class="bash codesnip" style="font-family:monospace;">search <span class="re5">--sortby</span> <span class="st0">&quot;id desc&quot;</span> <span class="re5">--config</span> sphinx.conf <span class="st0">&quot;php regex&quot;</span>

Sphinx 0.9.8.1-release <span class="br0">&#40;</span>r1533<span class="br0">&#41;</span>
Copyright <span class="br0">&#40;</span>c<span class="br0">&#41;</span> 2001-2008, Andrew Aksyonoff

using config <span class="kw2">file</span> <span class="st_h">'sphinx.conf'</span>...
index <span class="st_h">'so_2009_10'</span>: search error: failed to create sorting queue: sort-by attribute <span class="st_h">'id'</span> not found.</div>
</div>
</pre>
<p>One of the things that the CLI search interface can&#039;t do is change <a href="http://www.sphinxsearch.com/docs/current.html#weighting">weightings</a> for the different fields. I found that weighting the title more than the body made sense, and that is reflected in the PHP code that follows in the API section.</p>
<p>The last part of searching to cover is the searchd server. This is the integration point for the various Sphinx APIs, and I assume it is running for the examples in the PHP API section. If the configuration example given in the first section is used, all that needs to be done is to start the daemon. The following is a simple init script I use to start and stop searchd:</p>
<pre>
<div class="codesnip-container" >
<div class="bash codesnip" style="font-family:monospace;"><span class="co0">#!/bin/sh</span>

<span class="kw1">case</span> <span class="st0">&quot;$1&quot;</span> <span class="kw1">in</span>
<span class="st_h">'start'</span><span class="br0">&#41;</span>
searchd <span class="re5">--config</span> <span class="sy0">/</span>etc<span class="sy0">/</span>sphinx<span class="sy0">/</span>sphinx.conf
<span class="sy0">;;</span>
<span class="st_h">'stop'</span><span class="br0">&#41;</span>
<span class="kw2">killall</span> <span class="re5">-9</span> searchd;
<span class="sy0">;;</span>
<span class="sy0">*</span><span class="br0">&#41;</span>
<span class="kw3">echo</span> <span class="st0">&quot;Usage: $0 { start | stop}&quot;</span>
<span class="kw3">exit</span> 1
<span class="sy0">;;</span>
<span class="kw1">esac</span>
<span class="kw3">exit</span> <span class="nu0">0</span>
<span class="co0">#</span></div>
</div>
</pre>
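<p>One caveat with the script above is that killall -9 gives searchd no chance to shut down cleanly. A gentler sketch, assuming the pid file lives at /tmp/searchd.pid as configured earlier and that searchd exits on SIGTERM, could look like this (stop_searchd is a hypothetical helper name, not something that ships with Sphinx):</p>

```shell
#!/bin/sh
# Hypothetical helper: stop searchd through its pid file instead of killall -9.
# The default path matches pid_file in the searchd section of sphinx.conf.
PID_FILE="${PID_FILE:-/tmp/searchd.pid}"

stop_searchd() {
    pid_file="${1:-$PID_FILE}"
    if [ ! -r "$pid_file" ]; then
        echo "no pid file at $pid_file" >&2
        return 1
    fi
    pid=$(cat "$pid_file")
    # TERM asks the daemon to finish up and exit on its own
    kill -TERM "$pid" 2>/dev/null || return 1
    # Wait up to about 5 seconds for the process to go away
    i=0
    while kill -0 "$pid" 2>/dev/null && [ "$i" -lt 5 ]; do
        sleep 1
        i=$((i + 1))
    done
    # Succeed only if the process is really gone
    ! kill -0 "$pid" 2>/dev/null
}
```

<p>The &#039;stop&#039; branch of the init script could then call stop_searchd and only fall back to killall if it returns a failure.</p>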
<p><a name="sphinx-php-api"><b>Using the Sphinx PHP API</b></a></p>
<p>The Sphinx PHP API is included in the <a href="http://www.sphinxsearch.com/downloads.html">Sphinx source</a>. The API is contained in one file named sphinxapi.php that is located in the api directory of the source. Make sure the library is in a place where it can be included before trying out the examples. There is also some <a href="http://www.sphinxsearch.com/wiki/doku.php?id=php_api_docs">documentation</a> for the PHP API.</p>
<p>In the following example I&#039;m running the search query then serializing it to JSON format. Notice that the title is given a weight of 70 and the body_text is given a weight of 30 so the results will focus more on the title than what is in the body:</p>
<div class="codesnip-container" >
<div class="php codesnip" style="font-family:monospace;"><span class="kw2">&lt;?php</span></p>
<p><span class="kw1">require_once</span><span class="br0">&#40;</span><span class="st_h">&#039;sphinxapi.php&#039;</span><span class="br0">&#41;</span><span class="sy0">;</span></p>
<p><span class="co1">// Create the client, tell it where the server </span><br />
<span class="co1">// is and how long to wait for a response.</span><br />
<span class="re0">$sphinxClient</span> <span class="sy0">=</span> <span class="kw2">new</span> SphinxClient<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
<span class="re0">$sphinxClient</span><span class="sy0">-&gt;</span><span class="me1">SetServer</span><span class="br0">&#40;</span> <span class="st_h">&#039;localhost&#039;</span><span class="sy0">,</span> 3312 <span class="br0">&#41;</span><span class="sy0">;</span><br />
<span class="re0">$sphinxClient</span><span class="sy0">-&gt;</span><span class="me1">SetConnectTimeout</span><span class="br0">&#40;</span> 1 <span class="br0">&#41;</span><span class="sy0">;</span></p>
<p><span class="co1">// This gives the title more weight than the </span><br />
<span class="co1">// body text for searches.</span><br />
<span class="re0">$sphinxClient</span><span class="sy0">-&gt;</span><span class="me1">SetFieldWeights</span><span class="br0">&#40;</span><a href="http://www.php.net/array"><span class="kw3">array</span></a><span class="br0">&#40;</span><span class="st_h">&#039;title&#039;</span> <span class="sy0">=&gt;</span> <span class="nu0">70</span><span class="sy0">,</span> <span class="st_h">&#039;body_text&#039;</span> <span class="sy0">=&gt;</span> 30<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span></p>
<p><span class="co1">// Use the extended v2 match type</span><br />
<span class="re0">$sphinxClient</span><span class="sy0">-&gt;</span><span class="me1">SetMatchMode</span><span class="br0">&#40;</span> SPH_MATCH_EXTENDED2 <span class="br0">&#41;</span><span class="sy0">;</span></p>
<p><span class="co1">// Return the first 20 results (offset 0). The third argument is the</span><br />
<span class="co1">// per-query max_matches and must not exceed max_matches in sphinx.conf.</span><br />
<span class="re0">$sphinxClient</span><span class="sy0">-&gt;</span><span class="me1">SetLimits</span><span class="br0">&#40;</span> 0<span class="sy0">,</span> 20<span class="sy0">,</span> 100 <span class="br0">&#41;</span><span class="sy0">;</span></p>
<p><span class="co1">// Set how to rank the weighted values</span><br />
<span class="re0">$sphinxClient</span><span class="sy0">-&gt;</span><span class="me1">SetRankingMode</span><span class="br0">&#40;</span> SPH_RANK_PROXIMITY_BM25 <span class="br0">&#41;</span><span class="sy0">;</span></p>
<p><span class="co1">// Give me back the results as an array</span><br />
<span class="re0">$sphinxClient</span><span class="sy0">-&gt;</span><span class="me1">SetArrayResult</span><span class="br0">&#40;</span> <span class="kw4">true</span> <span class="br0">&#41;</span><span class="sy0">;</span></p>
<p><span class="re0">$searchQuery</span> <span class="sy0">=</span> <span class="re0">$_GET</span><span class="br0">&#91;</span><span class="st_h">&#039;query&#039;</span><span class="br0">&#93;</span><span class="sy0">;</span><br />
<span class="re0">$searchResults</span> <span class="sy0">=</span> <span class="re0">$sphinxClient</span><span class="sy0">-&gt;</span><span class="me1">Query</span><span class="br0">&#40;</span> <span class="re0">$searchQuery</span><span class="sy0">,</span> <span class="st_h">&#039;*&#039;</span> <span class="br0">&#41;</span><span class="sy0">;</span></p>
<p><span class="re0">$jhash</span> <span class="sy0">=</span> <a href="http://www.php.net/array"><span class="kw3">array</span></a><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></p>
<p><span class="kw1">if</span> <span class="br0">&#40;</span> <span class="re0">$searchResults</span> <span class="sy0">===</span> <span class="kw4">false</span> <span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; <span class="re0">$jhash</span><span class="br0">&#91;</span><span class="st_h">&#039;status&#039;</span><span class="br0">&#93;</span> <span class="sy0">=</span> <span class="st_h">&#039;failed&#039;</span><span class="sy0">;</span><br />
&nbsp; <span class="re0">$jhash</span><span class="br0">&#91;</span><span class="st_h">&#039;status_message&#039;</span><span class="br0">&#93;</span> <span class="sy0">=</span> <span class="re0">$sphinxClient</span><span class="sy0">-&gt;</span><span class="me1">GetLastError</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span><br />
<span class="kw1">else</span><br />
<span class="br0">&#123;</span><br />
&nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span> <span class="re0">$sphinxClient</span><span class="sy0">-&gt;</span><span class="me1">GetLastWarning</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#41;</span><br />
&nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="re0">$jhash</span><span class="br0">&#91;</span><span class="st_h">&#039;status&#039;</span><span class="br0">&#93;</span> <span class="sy0">=</span> <span class="st_h">&#039;warning&#039;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="re0">$jhash</span><span class="br0">&#91;</span><span class="st_h">&#039;status_message&#039;</span><span class="br0">&#93;</span> <span class="sy0">=</span> <span class="re0">$sphinxClient</span><span class="sy0">-&gt;</span><span class="me1">GetLastWarning</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; <span class="br0">&#125;</span><br />
&nbsp; <span class="kw1">else</span><br />
&nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="re0">$jhash</span><span class="br0">&#91;</span><span class="st_h">&#039;status&#039;</span><span class="br0">&#93;</span> <span class="sy0">=</span> <span class="st_h">&#039;good&#039;</span><span class="sy0">;</span><br />
&nbsp; <span class="br0">&#125;</span></p>
<p>&nbsp; <span class="re0">$jhash</span><span class="br0">&#91;</span><span class="st_h">&#039;result_total&#039;</span><span class="br0">&#93;</span> <span class="sy0">=</span> <span class="re0">$searchResults</span><span class="br0">&#91;</span><span class="st_h">&#039;total&#039;</span><span class="br0">&#93;</span><span class="sy0">;</span><br />
&nbsp; <span class="re0">$jhash</span><span class="br0">&#91;</span><span class="st_h">&#039;result_found&#039;</span><span class="br0">&#93;</span> <span class="sy0">=</span> <span class="re0">$searchResults</span><span class="br0">&#91;</span><span class="st_h">&#039;total_found&#039;</span><span class="br0">&#93;</span><span class="sy0">;</span></p>
<p>&nbsp; <span class="re0">$jhash_matches</span> <span class="sy0">=</span> <a href="http://www.php.net/array"><span class="kw3">array</span></a><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span> <a href="http://www.php.net/is_array"><span class="kw3">is_array</span></a><span class="br0">&#40;</span><span class="re0">$searchResults</span><span class="br0">&#91;</span><span class="st0">&quot;matches&quot;</span><span class="br0">&#93;</span><span class="br0">&#41;</span> <span class="br0">&#41;</span><br />
&nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="co1">// Collect the id and weight of each match for the JSON response.</span><br />
&nbsp; &nbsp; <span class="kw1">foreach</span> <span class="br0">&#40;</span> <span class="re0">$searchResults</span><span class="br0">&#91;</span><span class="st0">&quot;matches&quot;</span><span class="br0">&#93;</span> <span class="kw1">as</span> <span class="re0">$docinfo</span> <span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; <span class="re0">$jhash_matches</span><span class="br0">&#91;</span><span class="br0">&#93;</span> <span class="sy0">=</span> <a href="http://www.php.net/array"><span class="kw3">array</span></a><span class="br0">&#40;</span><span class="st_h">&#039;id&#039;</span> <span class="sy0">=&gt;</span> <span class="re0">$docinfo</span><span class="br0">&#91;</span><span class="st_h">&#039;id&#039;</span><span class="br0">&#93;</span><span class="sy0">,</span> <span class="st_h">&#039;weight&#039;</span> <span class="sy0">=&gt;</span> <span class="re0">$docinfo</span><span class="br0">&#91;</span><span class="st_h">&#039;weight&#039;</span><span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; <span class="br0">&#125;</span></p>
<p>&nbsp; <span class="re0">$jhash</span><span class="br0">&#91;</span><span class="st_h">&#039;matches&#039;</span><span class="br0">&#93;</span> <span class="sy0">=</span> <span class="re0">$jhash_matches</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span></p>
<p><span class="kw1">echo</span> <a href="http://www.php.net/json_encode"><span class="kw3">json_encode</span></a><span class="br0">&#40;</span><span class="re0">$jhash</span><span class="br0">&#41;</span><span class="sy0">;</span></p>
<p><span class="sy1">?&gt;</span></div>
</div>
<p>Although I picked PHP, there is a wide range of language-specific libraries available.</p>
<p>Overall it wasn&#039;t too difficult to set up Sphinx and have it serving search results quickly. I liked that it is very lightweight and doesn&#039;t need a lot of resources or require a lot of extra parts to be installed.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ioncannon.net/programming/685/full-text-search-with-sphinx/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.ioncannon.net/programming/685/full-text-search-with-sphinx/</feedburner:origLink></item>
		<item>
		<title>How I Used Hpricot and Mechanize in GeeQE</title>
		<link>http://feedproxy.google.com/~r/ioncannon/~3/FLq7P0utn40/</link>
		<comments>http://www.ioncannon.net/programming/675/using-hpricot-and-mechanize/#comments</comments>
		<pubDate>Mon, 28 Sep 2009 00:50:46 +0000</pubDate>
		<dc:creator>carson</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[hpricot]]></category>
		<category><![CDATA[mechanize]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://www.ioncannon.net/?p=675</guid>
		<description><![CDATA[While building GeeQE I wanted to enhance the CC dump of Stack Overflow&#039;s data. The main reason I wanted to do this was to capture Gravatar hashes and user badges. To do this I decided to continue using Ruby as I did with the XML loading (see  my previous post on XML parsing with [...]]]></description>
			<content:encoded><![CDATA[<p>While building <a href="http://www.ioncannon.net/projects/geeqe/">GeeQE</a> I wanted to enhance the <a href="http://blog.stackoverflow.com/2009/06/stack-overflow-creative-commons-data-dump/">CC dump of Stack Overflow&#039;s data</a>. The main reason I wanted to do this was to capture <a href="http://www.gravatar.com/">Gravatar</a> hashes and user badges. To do this I decided to continue using Ruby as I did with the XML loading (see  my previous post on <a href="http://www.ioncannon.net/programming/643/fast-xml-parsing-with-ruby/">XML parsing with Ruby</a>).  The easy choice was of course <a href="http://github.com/hpricot">Hpricot</a> to parse the HTML from the <a href="http://stackoverflow.com/users">users page</a> and <a href="http://mechanize.rubyforge.org/mechanize/">Mechanize</a> to move from one page to the next.</p>
<p><span id="more-675"></span></p>
<p>The first thing I wanted to make sure of was that I scraped this data as efficiently as possible. That is why I use the users page instead of going over every single user&#039;s profile page. This approach is not perfect: the users pages most likely change as the script moves through them, so some users could be missed. I&#039;m not looking for 100% coverage here, so that limitation was acceptable.</p>
<p>To run the <a href="http://code.google.com/p/geeqe/source/browse/trunk/scripts/useroverviewpuller.rb">user profile script</a> that is described here you will first need to load the database <a href="http://code.google.com/p/geeqe/source/browse/trunk/scripts/schema.sql">schema</a> and then load the CC data dump with the <a href="http://code.google.com/p/geeqe/source/browse/trunk/scripts/load.rb">XML loader script</a>.</p>
<p>Two good sources of information about using Hpricot can be found in the <a href="http://wiki.github.com/hpricot/hpricot/an-hpricot-showcase">Hpricot showcase</a> and the <a href="http://wiki.github.com/hpricot/hpricot/hpricot-challenge">Hpricot challenge</a> wiki pages.</p>
<p>The only tricky thing that I needed to do with Hpricot was associated with parsing badge counts since the outer spans have a title with the name of the badge type in them:</p>
<div class="codesnip-container" >
<div class="html4strict codesnip" style="font-family:monospace;"><span class="sc2">&lt;<a href="http://december.com/html/4/element/div.html"><span class="kw2">div</span></a> <span class="kw3">class</span><span class="sy0">=</span><span class="st0">&quot;user-details&quot;</span>&gt;</span><br />
&nbsp; <span class="sc2">&lt;<a href="http://december.com/html/4/element/a.html"><span class="kw2">a</span></a> <span class="kw3">href</span><span class="sy0">=</span><span class="st0">&quot;/users/256/example-user&quot;</span> &gt;</span>Example User<span class="sc2">&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/a.html"><span class="kw2">a</span></a>&gt;&lt;<a href="http://december.com/html/4/element/br.html"><span class="kw2">br</span></a>&gt;</span><br />
&nbsp; <span class="sc2">&lt;<a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a> <span class="kw3">class</span><span class="sy0">=</span><span class="st0">&quot;reputation-score&quot;</span> <span class="kw3">title</span><span class="sy0">=</span><span class="st0">&quot;reputation score&quot;</span>&gt;</span>22k<span class="sc2">&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a>&gt;</span><br />
&nbsp; <span class="sc2">&lt;<a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a> <span class="kw3">title</span><span class="sy0">=</span><span class="st0">&quot;8 gold badges&quot;</span>&gt;&lt;<a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a> <span class="kw3">class</span><span class="sy0">=</span><span class="st0">&quot;badge1&quot;</span>&gt;</span><span class="sc1">&amp;#9679;</span><span class="sc2">&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a>&gt;&lt;<a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a> <span class="kw3">class</span><span class="sy0">=</span><span class="st0">&quot;badgecount&quot;</span>&gt;</span>8<span class="sc2">&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a>&gt;&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a>&gt;</span><br />
&nbsp; <span class="sc2">&lt;<a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a> <span class="kw3">title</span><span class="sy0">=</span><span class="st0">&quot;5 silver badges&quot;</span>&gt;&lt;<a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a> <span class="kw3">class</span><span class="sy0">=</span><span class="st0">&quot;badge2&quot;</span>&gt;</span><span class="sc1">&amp;#9679;</span><span class="sc2">&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a>&gt;&lt;<a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a> <span class="kw3">class</span><span class="sy0">=</span><span class="st0">&quot;badgecount&quot;</span>&gt;</span>5<span class="sc2">&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a>&gt;&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a>&gt;</span><br />
&nbsp; <span class="sc2">&lt;<a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a> <span class="kw3">title</span><span class="sy0">=</span><span class="st0">&quot;7 bronze badges&quot;</span>&gt;&lt;<a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a> <span class="kw3">class</span><span class="sy0">=</span><span class="st0">&quot;badge3&quot;</span>&gt;</span><span class="sc1">&amp;#9679;</span><span class="sc2">&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a>&gt;&lt;<a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a> <span class="kw3">class</span><span class="sy0">=</span><span class="st0">&quot;badgecount&quot;</span>&gt;</span>7<span class="sc2">&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a>&gt;&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/span.html"><span class="kw2">span</span></a>&gt;</span><br />
<span class="sc2">&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/div.html"><span class="kw2">div</span></a>&gt;</span></div>
</div>
<p>I probably could have used the class of the inner span, but since those classes aren&#039;t named after the badge type I couldn&#039;t be completely sure they would stay the same.</p>
<p>I used the ability of Hpricot to match attribute values based on <a href="http://trac.edgewall.org/wiki/TracQuery#QueryLanguage">Trac Query</a> syntax. You can see that here in &#034;@title~=badge type&#034;, where ~= matches when the value of the title attribute contains the corresponding badge type name:</p>
<div class="codesnip-container" >
<div class="ruby codesnip" style="font-family:monospace;">user_bc = <span class="br0">&#40;</span>user_info<span class="sy0">/</span><span class="st0">&quot;div[@class='user-details']/span[@title~=gold]/span[@class='badgecount']&quot;</span><span class="br0">&#41;</span><br />
&nbsp; user_gold = user_bc != <span class="kw2">nil</span> <span class="sy0">&amp;&amp;</span> user_bc<span class="br0">&#91;</span>0<span class="br0">&#93;</span> != <span class="kw2">nil</span> ? user_bc<span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span>.<span class="me1">inner_html</span> : <span class="nu0">0</span><br />
&nbsp; <br />
&nbsp; user_bc = <span class="br0">&#40;</span>user_info<span class="sy0">/</span><span class="st0">&quot;div[@class='user-details']/span[@title~=silver]/span[@class='badgecount']&quot;</span><span class="br0">&#41;</span><br />
&nbsp; user_silver = user_bc != <span class="kw2">nil</span> <span class="sy0">&amp;&amp;</span> user_bc<span class="br0">&#91;</span>0<span class="br0">&#93;</span> != <span class="kw2">nil</span> ? user_bc<span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span>.<span class="me1">inner_html</span> : <span class="nu0">0</span><br />
&nbsp; <br />
&nbsp; user_bc = <span class="br0">&#40;</span>user_info<span class="sy0">/</span><span class="st0">&quot;div[@class='user-details']/span[@title~=bronze]/span[@class='badgecount']&quot;</span><span class="br0">&#41;</span><br />
&nbsp; user_bronze = user_bc != <span class="kw2">nil</span> <span class="sy0">&amp;&amp;</span> user_bc<span class="br0">&#91;</span>0<span class="br0">&#93;</span> != <span class="kw2">nil</span> ? user_bc<span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span>.<span class="me1">inner_html</span> : <span class="nu0">0</span></div>
</div>
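<p>The three lookups above differ only in the badge name, so they can be folded into one helper. The following is a minimal sketch rather than the original script: it swaps Hpricot for REXML and uses the XPath contains() function in place of ~=, so that it runs with nothing but the libraries that ship with Ruby. The sample markup and the badge_count name are made up for illustration.</p>

```ruby
require 'rexml/document'

# Hypothetical fragment shaped like the user-details markup shown above.
sample = '<div class="user-details">' \
         '<span title="8 gold badges"><span class="badgecount">8</span></span>' \
         '<span title="5 silver badges"><span class="badgecount">5</span></span>' \
         '</div>'

# Return the count for one badge type, or 0 when that badge span is absent.
def badge_count(root, type)
  path = "span[contains(@title, '#{type}')]/span[@class='badgecount']"
  node = REXML::XPath.first(root, path)
  node ? node.text.to_i : 0
end

root = REXML::Document.new(sample).root
counts = %w[gold silver bronze].map { |type| [type, badge_count(root, type)] }.to_h
puts counts.inspect
```

<p>Unlike the snippet above, this version always returns an integer, so a missing badge and a present one have the same type.</p>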
<p>After parsing the page for user information, the script looks for the URL of the next page, then sleeps for a random amount of time before using Mechanize to pull down that page.</p>
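<p>The paging loop can be sketched roughly as follows. This is not the original script: the fetch, next_url and sleeper arguments are hypothetical stand-ins (in the real script Mechanize would do the fetching and Hpricot would find the next link), injected here so the loop runs without network access. The 1 to 5 second delay range is also an assumption.</p>

```ruby
# Walk a chain of pages, pausing a random interval between requests.
# fetch, next_url and sleeper are hypothetical hooks, not Mechanize calls.
def crawl(start_url, fetch:, next_url:, sleeper: ->(secs) { sleep(secs) }, delay: 1.0..5.0)
  visited = []
  url = start_url
  while url
    page = fetch.call(url)
    visited.push(url)
    url = next_url.call(page)
    # Only pause when there is another page left to request.
    sleeper.call(rand(delay)) if url
  end
  visited
end

# Stand-in "site": each page simply names its successor.
pages = {
  '/users?page=1' => '/users?page=2',
  '/users?page=2' => '/users?page=3',
  '/users?page=3' => nil
}
pauses = []
visited = crawl('/users?page=1',
                fetch: ->(url) { pages.fetch(url) },
                next_url: ->(page) { page },
                sleeper: ->(secs) { pauses.push(secs) })
puts visited.inspect
```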
]]></content:encoded>
			<wfw:commentRss>http://www.ioncannon.net/programming/675/using-hpricot-and-mechanize/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.ioncannon.net/programming/675/using-hpricot-and-mechanize/</feedburner:origLink></item>
		<item>
		<title>Fast XML parsing with Ruby</title>
		<link>http://feedproxy.google.com/~r/ioncannon/~3/w1IEmH3yWqI/</link>
		<comments>http://www.ioncannon.net/programming/643/fast-xml-parsing-with-ruby/#comments</comments>
		<pubDate>Wed, 09 Sep 2009 13:54:47 +0000</pubDate>
		<dc:creator>carson</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[parse]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://www.ioncannon.net/?p=643</guid>
		<description><![CDATA[One of the first things I needed to do while building the GeeQE iPhone application was process the CC data dump from Stack Overflow. The dump contains XML files representing tables from Stack Overflow with the largest file being posts.xml weighing in at 1.2G as of September. I decided it would be pretty easy to [...]]]></description>
			<content:encoded><![CDATA[<p>One of the first things I needed to do while building the <a href="http://www.ioncannon.net/projects/geeqe/">GeeQE iPhone application</a> was process the <a href="http://blog.stackoverflow.com/2009/06/stack-overflow-creative-commons-data-dump/">CC data dump from Stack Overflow</a>. The dump contains XML files representing tables from Stack Overflow with the largest file being posts.xml weighing in at 1.2G as of September. I decided it would be pretty easy to use Ruby to parse the XML and load the data into MySQL so I went about finding the right parser for the job. </p>
<p>If you haven&#039;t processed large amounts of XML before, one thing to realize is that you don&#039;t want to use a DOM parser because it is going to load the entire XML structure into memory. What you want is a SAX parser that can work on the XML stream as it comes in. With this in mind I started looking around and quickly found an <a href="http://depixelate.com/2008/4/23/ruby-xml-parsing-benchmarks">older benchmark</a> post that gave me an educated guess that the <a href="http://libxml.rubyforge.org/">LibXML</a> library was going to be the fastest parser for Ruby. After figuring out how to use it I decided to also give a couple other libraries a shot to see how they stacked up. The other two I looked at were <a href="http://www.germane-software.com/software/rexml/">REXML</a> and <a href="http://nokogiri.rubyforge.org/nokogiri/">Nokogiri</a>.</p>
<p><span id="more-643"></span></p>
<p>The following is a set of examples using each library in streaming SAX mode. Each processes the 1.2G posts.xml file from the dump and does nothing more than check that the element represents a &#034;row&#034;. I have also included a sample runtime for each:</p>
<h2>REXML SAX example</h2>
<div class="codesnip-container" >
<div class="ruby codesnip" style="font-family:monospace;"><span class="kw3">require</span> <span class="st0">&#039;rubygems&#039;</span><br />
<span class="kw3">require</span> <span class="st0">&#039;rexml/document&#039;</span><br />
<span class="kw3">require</span> <span class="st0">&quot;rexml/streamlistener&quot;</span></p>
<p><span class="kw1">include</span> REXML</p>
<p><span class="kw1">class</span> PostCallbacks<br />
&nbsp; <span class="kw1">include</span> StreamListener</p>
<p>&nbsp; <span class="kw1">def</span> tag_start<span class="br0">&#40;</span>element, attributes<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> element == <span class="st0">&#039;row&#039;</span><br />
&nbsp; &nbsp; &nbsp; <span class="co1"># Process row of data here</span><br />
&nbsp; &nbsp; <span class="kw1">end</span><br />
&nbsp; <span class="kw1">end</span><br />
<span class="kw1">end</span></p>
<p>source = <span class="kw4">File</span>.<span class="me1">new</span> <span class="st0">&quot;posts.xml&quot;</span><br />
Document.<span class="me1">parse_stream</span><span class="br0">&#40;</span>source, PostCallbacks.<span class="me1">new</span><span class="br0">&#41;</span></div>
</div>
<p><b>REXML runtime</b></p>
<div class="codesnip-container" >
<div class="text codesnip" style="font-family:monospace;">time ruby rexmltest.rb<br />
real &nbsp; &nbsp;47m22.871s<br />
user &nbsp; &nbsp;42m0.711s<br />
sys &nbsp; &nbsp; 3m31.943s</div>
</div>
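<p>To give an idea of what could go where the &#034;Process row of data here&#034; comment sits, here is a small sketch using the same REXML stream listener on an inline string instead of posts.xml. The Id and Score attribute names are assumptions about the dump format, and the RowCollector class is made up for this example.</p>

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Collects the Id and Score attributes of every row element.
# The attribute names are assumptions about the dump format.
class RowCollector
  include REXML::StreamListener

  attr_reader :rows

  def initialize
    @rows = []
  end

  # Called once per opening tag; attributes arrives as a name => value hash.
  def tag_start(element, attributes)
    return unless element == 'row'
    @rows.push(id: attributes['Id'].to_i, score: attributes['Score'].to_i)
  end
end

xml = '<posts><row Id="1" Score="12" /><row Id="2" Score="3" /><other /></posts>'
collector = RowCollector.new
REXML::Document.parse_stream(xml, collector)
puts collector.rows.inspect
```

<p>Because the listener only keeps what it pushes into rows, memory use depends on what you choose to retain, not on the size of the input.</p>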
<h2>Nokogiri SAX example</h2>
<div class="codesnip-container" >
<div class="ruby codesnip" style="font-family:monospace;"><span class="kw3">require</span> <span class="st0">&#039;rubygems&#039;</span><br />
<span class="kw3">require</span> <span class="st0">&#039;nokogiri&#039;</span></p>
<p><span class="kw1">include</span> Nokogiri</p>
<p><span class="kw1">class</span> PostCallbacks <span class="sy0">&lt;</span> <span class="re2">XML::SAX::Document</span><br />
&nbsp; <span class="kw1">def</span> start_element<span class="br0">&#40;</span>element, attributes<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> element == <span class="st0">&#039;row&#039;</span><br />
&nbsp; &nbsp; &nbsp; <span class="co1"># Process row of data here</span><br />
&nbsp; &nbsp; <span class="kw1">end</span><br />
&nbsp; <span class="kw1">end</span><br />
<span class="kw1">end</span></p>
<p>parser = <span class="re2">XML::SAX::Parser</span>.<span class="me1">new</span><span class="br0">&#40;</span>PostCallbacks.<span class="me1">new</span><span class="br0">&#41;</span><br />
parser.<span class="me1">parse_file</span><span class="br0">&#40;</span><span class="st0">&quot;posts.xml&quot;</span><span class="br0">&#41;</span></div>
</div>
<p><b>Nokogiri runtime</b></p>
<div class="codesnip-container" >
<div class="text codesnip" style="font-family:monospace;">time ruby nokogiri.rb<br />
real &nbsp; &nbsp;4m45.347s<br />
user &nbsp; &nbsp;4m7.504s<br />
sys &nbsp; &nbsp; 0m19.332s</div>
</div>
<h2>LibXML SAX example</h2>
<div class="codesnip-container" >
<div class="ruby codesnip" style="font-family:monospace;"><span class="kw3">require</span> <span class="st0">&#039;rubygems&#039;</span><br />
<span class="kw3">require</span> <span class="st0">&#039;libxml&#039;</span></p>
<p><span class="kw1">include</span> LibXML</p>
<p><span class="kw1">class</span> PostCallbacks<br />
&nbsp; <span class="kw1">include</span> <span class="re2">XML::SaxParser::Callbacks</span></p>
<p>&nbsp; <span class="kw1">def</span> on_start_element<span class="br0">&#40;</span>element, attributes<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> element == <span class="st0">&#039;row&#039;</span><br />
&nbsp; &nbsp; &nbsp; <span class="co1"># Process row of data here</span><br />
&nbsp; &nbsp; <span class="kw1">end</span><br />
&nbsp; <span class="kw1">end</span><br />
<span class="kw1">end</span></p>
<p>parser = <span class="re2">XML::SaxParser</span>.<span class="me1">file</span><span class="br0">&#40;</span><span class="st0">&quot;posts.xml&quot;</span><span class="br0">&#41;</span><br />
parser.<span class="me1">callbacks</span> = PostCallbacks.<span class="me1">new</span><br />
parser.<span class="me1">parse</span></div>
</div>
<p><b>LibXML runtime</b></p>
<div class="codesnip-container" >
<div class="text codesnip" style="font-family:monospace;">time ruby libxmltest.rb<br />
real &nbsp; &nbsp;1m55.657s<br />
user &nbsp; &nbsp;1m41.938s<br />
sys &nbsp; &nbsp; 0m5.718s</div>
</div>
<p><br/><br />
From the above you can see that LibXML is the fastest. I thought that Nokogiri would be a lot closer in execution time given that it uses libxml2, but it is still roughly 2.5 times slower. The slowest by far was REXML, clocking in at more than 20 times slower than LibXML. Nokogiri seemed easier to debug than LibXML when things went wrong, so had I needed to construct a more complex application to load the data I would probably have used it instead.</p>
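<p>For anyone who wants to check the comparison, the ratios can be worked out from the &#034;real&#034; times listed above. This is just arithmetic on those three readings; the seconds helper is a name made up for this sketch.</p>

```ruby
# Convert a `time` reading such as "47m22.871s" into seconds.
def seconds(reading)
  minutes, secs = reading.delete('s').split('m')
  minutes.to_i * 60 + secs.to_f
end

# The "real" wall-clock readings from the three runs above.
real = {
  'REXML' => '47m22.871s',
  'Nokogiri' => '4m45.347s',
  'LibXML' => '1m55.657s'
}

baseline = seconds(real['LibXML'])
ratios = real.transform_values { |reading| (seconds(reading) / baseline).round(1) }
ratios.each { |name, ratio| puts "#{name}: #{ratio}x" }
```

<p>With LibXML as the baseline this works out to about 24.6x for REXML and 2.5x for Nokogiri.</p>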
]]></content:encoded>
			<wfw:commentRss>http://www.ioncannon.net/programming/643/fast-xml-parsing-with-ruby/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://www.ioncannon.net/programming/643/fast-xml-parsing-with-ruby/</feedburner:origLink></item>
		<item>
		<title>RFID Reader USB Prototyping Kit</title>
		<link>http://feedproxy.google.com/~r/ioncannon/~3/o4T4w02I0Sc/</link>
		<comments>http://www.ioncannon.net/programming/551/rfid-reader-writer-usb-prototyping-kit/#comments</comments>
		<pubDate>Mon, 03 Aug 2009 10:00:23 +0000</pubDate>
		<dc:creator>carson</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[diy]]></category>
		<category><![CDATA[rfid]]></category>

		<guid isPermaLink="false">http://www.ioncannon.net/?p=551</guid>
		<description><![CDATA[I recently won a programming contest that netted me a gift card for ThinkGeek and not knowing what else to do I strolled the site looking for something interesting to use the gift card on. Eventually I ran into the RFID Experimentation Kit they have and decided that was what I needed. I have been [...]]]></description>
			<content:encoded><![CDATA[<p>I recently won a programming contest that netted me a gift card for <a href="http://www.thinkgeek.com/">ThinkGeek</a> and not knowing what else to do I strolled the site looking for something interesting to use the gift card on. Eventually I ran into the <a href="http://www.thinkgeek.com/geektoys/science/907a/">RFID Experimentation Kit</a> they have and decided that was what I needed. I have been wanting to play around with <a href="http://en.wikipedia.org/wiki/Radio-frequency_identification">RFID</a> for a while and this kit turned out to be pretty nice for tinkering.</p>
<p><span id="more-551"></span></p>
<p>Even though I picked this RFID kit from ThinkGeek you can buy it a couple different places and in a number of different configurations. The one I got came with a book from ExtremeTech called <a href="http://www.amazon.com/RFID-Toys-Projects-Entertainment-ExtremeTech/dp/0471771961">RFID Toys: 11 Cool Projects for Home, Office and Entertainment</a> and the RFID parts are from <a href="http://www.phidgets.com/">Phidgets</a>, who actually make a lot of other interesting kits. </p>
<p><a href="http://www.ioncannon.net/wp-content/uploads/2009/07/allkit.png"><img src="http://www.ioncannon.net/wp-content/uploads/2009/07/allkit-300x228.png" alt="RFID Kit Parts" title="RFID Kit Parts" width="300" height="228" class="alignnone size-medium wp-image-552" /></a></p>
<p>Outside of the book you can get the <a href="http://www.phidgets.com/products.php?category=14&#038;product_id=1023">RFID reader</a> and <a href="http://www.phidgets.com/products.php?category=14">RFID keyfobs, cards, etc</a>. Another good place to look for this type of gear is <a href="http://www.sparkfun.com">SparkFun</a> and they also have a <a href="http://www.sparkfun.com/commerce/product_info.php?products_id=8852">USB RFID reader</a>.</p>
<p><a href="http://www.ioncannon.net/wp-content/uploads/2009/07/kitsansbook.png"><img src="http://www.ioncannon.net/wp-content/uploads/2009/07/kitsansbook-219x300.png" alt="RFID Reader and Badges" title="RFID Reader and Badges" width="219" height="300" class="alignnone size-medium wp-image-553" /></a></p>
<p><a href="http://www.ioncannon.net/wp-content/uploads/2009/07/rfidreader.png"><img src="http://www.ioncannon.net/wp-content/uploads/2009/07/rfidreader-260x300.png" alt="RFID Reader" title="RFID Reader" width="260" height="300" class="alignnone size-medium wp-image-554" /></a></p>
<p>This RFID kit reads EM4102 type tags and operates at 125 kHz. The frequency and tag type are actually important. I have a number of badges and keyfobs outside of what I got with the kit and none of them work with this reader.</p>
<p>The Phidgets site has source code in a number of different languages making it very easy to develop custom applications. To get started you will need to grab the <a href="http://www.phidgets.com/drivers.php">driver source</a> and compile that. It isn&#039;t a driver in the normal sense of the word since it doesn&#039;t get installed by your operating system but more of a base library. Once you have the base library compiled you can grab any number of different language source files from the <a href="http://www.phidgets.com/products.php?category=14&#038;product_id=1023">RFID reader source</a> area. There should be a simple RFID reader example that shows how to use most of the available functions out of the base library.</p>
<p>If you are interested in RFID projects check out these links:</p>
<ul>
<li><a href="http://www.elektronika.ba/674/lazy-mans-usb-rfid-reader/">lazy mans usb RFID reader</a></li>
<li><a href="http://www.instructables.com/id/RFID_Reader_Detector_and_Tilt_Sensitive_RFID_Tag/">RFID reader detector</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.ioncannon.net/programming/551/rfid-reader-writer-usb-prototyping-kit/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.ioncannon.net/programming/551/rfid-reader-writer-usb-prototyping-kit/</feedburner:origLink></item>
		<item>
		<title>iPhone Windowed HTTP Live Streaming Server</title>
		<link>http://feedproxy.google.com/~r/ioncannon/~3/XnexDl89faM/</link>
		<comments>http://www.ioncannon.net/meta/564/iphone-windowed-http-live-streaming-server/#comments</comments>
		<pubDate>Mon, 27 Jul 2009 11:22:54 +0000</pubDate>
		<dc:creator>carson</dc:creator>
				<category><![CDATA[meta]]></category>
		<category><![CDATA[iphone]]></category>
		<category><![CDATA[streaming]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://www.ioncannon.net/?p=564</guid>
		<description><![CDATA[After some more work I have enhanced the HTTP segmenter and uploading script from my iPhone streaming using AWS S3 and Cloudfront post. I added a number of features and tried to pull together some of the ideas from the comments. I&#039;ll go over some of the features here and there is a full list [...]]]></description>
			<content:encoded><![CDATA[<p>After some more work I have enhanced the HTTP segmenter and uploading script from my <a href="http://www.ioncannon.net/programming/475/iphone-windowed-http-live-streaming-using-amazon-s3-and-cloudfront-proof-of-concept/">iPhone streaming using AWS S3 and Cloudfront</a> post. I added a number of features and tried to pull together some of the ideas from the comments. I&#039;ll go over some of the features here and there is a full list of configuration options on the <a href="http://www.ioncannon.net/projects/http-live-video-stream-segmenter-and-distributor/">HTTP Live Video Streaming server</a> project page and the source is available at the <a href="http://github.com/carsonmcdonald/HTTP-Live-Video-Stream-Segmenter-and-Distributor/tree/master">github repo</a>.</p>
<p>So the major changes I have added are:</p>
<ul>
<li>Yaml based configuration file. See the project page for a complete list of options.</li>
<li>Ability to transfer segments via copy, ftp, scp and s3.</li>
<li>Added the ability to do variable bitrate streams.</li>
<li>Added re-streaming support.</li>
<li>Added logging to a file and better debug output.</li>
</ul>
<p>The variable bitrate streams were done by using pipes. I haven&#039;t done a large amount of testing but it seems to work fairly well. I am able to stream a live HD video source into 3 different bitrates on a fairly old PC. Here are a couple clips I created to show the progressive enhancement in action. You probably want to switch to the HD version of the video and watch it full screen to get the best view:</p>
<p><center><br />
<object width="480" height="295"><param name="movie" value="http://www.youtube.com/v/teKAyN0qZVY&#038;rel=0&#038;color1=0xb1b1b1&#038;color2=0xcfcfcf&#038;feature=player_profilepage&#038;fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowScriptAccess" value="always"></param><embed src="http://www.youtube.com/v/teKAyN0qZVY&#038;rel=0&#038;color1=0xb1b1b1&#038;color2=0xcfcfcf&#038;feature=player_profilepage&#038;fs=1" type="application/x-shockwave-flash" allowfullscreen="true" allowScriptAccess="always" width="480" height="295"></embed></object></center></p>
<p>The configuration file will allow for any number of encoding options or transfer options and they can be put together in a number of different ways. Here are a couple examples of both, see the example configuration files for more. </p>
<p>An encoder example:</p>
<div class="codesnip-container" >
<div class="text codesnip" style="font-family:monospace;">ep_128k:<br />
&nbsp; ffmpeg_command: &quot;ffmpeg -er 4 -y -i %s -f mpegts -acodec libmp3lame -ar 48000 -ab 64k -s 320x240 -vcodec libx264 -b 128k -flags +loop -cmp +chroma -partitions +parti4x4+partp8x8+partb8x8 -subq 5 -trellis 1 -refs 1 -coder 0 -me_range 16 -keyint_min 25 -sc_threshold 40 -i_qfactor 0.71 -bt 128k -maxrate 128k -bufsize 128k -rc_eq &#039;blurCplx^(1-qComp)&#039; -qcomp 0.6 -qmin 10 -qmax 51 -qdiff 4 -level 30 -aspect 320:240 -g 30 -async 2 - | %s %s %s %s %s&quot;<br />
&nbsp; bandwidth: 128000</div>
</div>
<p><br/></p>
<p>Transfer configuration example:</p>
<div class="codesnip-container" >
<div class="text codesnip" style="font-family:monospace;">ftp_dev:<br />
&nbsp; transfer_type: &#039;ftp&#039;<br />
&nbsp; remote_host: &#039;192.168.1.1&#039;<br />
&nbsp; user_name: &#039;user&#039;<br />
&nbsp; password: &#039;pass&#039;<br />
&nbsp; directory: &#039;html/streamingvideo&#039;</div>
</div>
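<p>As a rough illustration of how a transfer profile like this might be read, here is a minimal sketch using the YAML library that ships with Ruby. The profile text is copied from the example above, but the lookup and validation code is illustrative only and is not taken from the project.</p>

```ruby
require 'yaml'

# Transfer profile copied from the example above.
config_text = [
  "ftp_dev:",
  "  transfer_type: 'ftp'",
  "  remote_host: '192.168.1.1'",
  "  user_name: 'user'",
  "  password: 'pass'",
  "  directory: 'html/streamingvideo'"
].join("\n")

config = YAML.safe_load(config_text)
profile = config.fetch('ftp_dev')

# Fail early when a key the FTP transfer needs is missing.
required = %w[transfer_type remote_host user_name password directory]
missing = required.reject { |key| profile.key?(key) }
raise ArgumentError, "ftp_dev is missing: #{missing.join(', ')}" unless missing.empty?

summary = "#{profile['transfer_type']} to #{profile['remote_host']}/#{profile['directory']}"
puts summary
```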
<p>As a final note on changes, the segmenter can no longer really be used without the script. If you need a standalone segmenter you should use the original version of the <a href="http://svn.assembla.com/svn/legend/segmenter/">segmenter source</a>. </p>
<p>Please note that the script still needs some work before it is complete. If I have time, my next enhancement will be to add encryption, and I will probably try to test builds on other distributions (and maybe attempt to create segmenter binaries).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ioncannon.net/meta/564/iphone-windowed-http-live-streaming-server/feed/</wfw:commentRss>
		<slash:comments>38</slash:comments>
		<feedburner:origLink>http://www.ioncannon.net/meta/564/iphone-windowed-http-live-streaming-server/</feedburner:origLink></item>
		<item>
		<title>iPhone Windowed HTTP Live Streaming Using Amazon S3 and Cloudfront Proof of Concept</title>
		<link>http://feedproxy.google.com/~r/ioncannon/~3/HgEUTThKySo/</link>
		<comments>http://www.ioncannon.net/programming/475/iphone-windowed-http-live-streaming-using-amazon-s3-and-cloudfront-proof-of-concept/#comments</comments>
		<pubDate>Mon, 06 Jul 2009 02:07:37 +0000</pubDate>
		<dc:creator>carson</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[cloudfront]]></category>
		<category><![CDATA[iphone]]></category>
		<category><![CDATA[s3]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://www.ioncannon.net/?p=475</guid>
		<description><![CDATA[This post should be seen as a proof of concept. I&#039;m working on creating a more concise and easier to use package of everything covered here but I felt like getting the knowledge out sooner rather than later would be of help to people looking for a way to do this. If you are interested [...]]]></description>
			<content:encoded><![CDATA[<p>This post should be seen as a proof of concept. I&#039;m working on creating a more concise and easier to use package of everything covered here but I felt like getting the knowledge out sooner rather than later would be of help to people looking for a way to do this. If you are interested keep an eye on the <a href="http://www.ioncannon.net/projects/http-live-video-stream-segmenter-and-distributor/">HTTP live video stream segmenter and distributor project page</a> as well as the <a href="http://github.com/carsonmcdonald/HTTP-Live-Video-Stream-Segmenter-and-Distributor/tree/master">github git repository</a>.</p>
<p>After my post on <a href="http://www.ioncannon.net/programming/452/iphone-http-streaming-with-ffmpeg-and-an-open-source-segmenter/">using FFMpeg and an open source segmenter</a> to create videos for the iPhone that conform to the <a href="http://tools.ietf.org/html/draft-pantos-http-live-streaming-01">HTTP live streaming protocol</a> I decided to see if I could get the same segmenter to work on a live stream. As it turns out it didn&#039;t take much modification to work. </p>
<p>If you are looking for something you can buy out of the box it appears that <a href="http://gigaom.com/2009/07/02/akamai-to-make-iphone-video-streaming-smooth/">Akamai</a> is doing <a href="http://newteevee.com/2009/06/30/video-see-apples-http-adaptive-video-streaming-in-action/">iPhone video streaming</a> now. I believe that the following solution using Amazon <a href="http://aws.amazon.com/s3/">S3</a> and <a href="http://aws.amazon.com/cloudfront/">Cloudfront</a> is probably as good as what Akamai can offer, but Akamai may be a better choice if you don&#039;t want to have to maintain the configuration yourself.</p>
<p>I put together a quick diagram of the process of transferring the video stream from source to final destination that will hopefully help people understand the full picture before jumping into the details:</p>
<p><a href="http://www.ioncannon.net/wp-content/uploads/2009/07/streaming-diagram.png"><img src="http://www.ioncannon.net/wp-content/uploads/2009/07/streaming-diagram.png" alt="HTTP Live Streaming Diagram" title="HTTP Live Streaming Diagram" width="450" height="250" class="alignnone size-medium wp-image-508" /></a></p>
<p><span id="more-475"></span></p>
<p>Please note that except for the video stream all of the following was done using Fedora 11. I believe it could work on Windows or OS X but I haven&#039;t had time to test it on either.</p>
<h3>Step 1: Find a suitable video source</h3>
<p>This is an important part and can take more work than it seems like it should. I started out trying to stream video from a USB QuickCam but for some reason the resulting stream wasn&#039;t correctly formatted even after going through the FFMpeg transcoding. I then turned to the iSight camera on a MacBook using <a href="http://www.ioncannon.net/software/478/streaming-video-between-quicktime-broadcaster-and-vlc/">QuickTime Broadcaster to VLC streaming</a>. The resulting stream from the iSight camera works well.</p>
<p>The easiest way I found to experiment with finding a good stream is to dump a short clip out to a file then use the instructions in my <a href="http://www.ioncannon.net/programming/452/iphone-http-streaming-with-ffmpeg-and-an-open-source-segmenter/">previous post</a> to test.</p>
<p>It is also possible to do all of the following steps using a non-live video stream. In fact, due to the flexibility of what can be input into FFMpeg, someone could pull a stream from somewhere like <a href="http://ustream.tv">Ustream TV</a> or <a href="http://www.livestream.com/">Livestream</a> and rebroadcast it. Doing so would open up the door for live streaming to the iPhone using just Safari.</p>
<h3>Step 2: Set up modified segmenter</h3>
<p>To handle the live stream I had to modify the segmenter in two ways. The first was required to bypass an issue with the input coming in as a pipe. For those interested here is the modified section of code:</p>
<div class="codesnip-container" >
<div class="c codesnip" style="font-family:monospace;">ret <span class="sy0">=</span> av_write_frame<span class="br0">&#40;</span>output_context<span class="sy0">,</span> <span class="sy0">&amp;</span>packet<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>ret <span class="sy0">&lt;</span> <span class="nu0">0</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; fprintf<span class="br0">&#40;</span>stderr<span class="sy0">,</span> <span class="st0">&quot;Could not write frame of stream: %d<span class="es1">\n</span>&quot;</span><span class="sy0">,</span> ret<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; av_free_packet<span class="br0">&#40;</span><span class="sy0">&amp;</span>packet<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; <span class="co1">//break; ****** removed for streaming support *****</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; <span class="kw1">else</span> <span class="kw1">if</span> <span class="br0">&#40;</span>ret <span class="sy0">&gt;</span> <span class="nu0">0</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; fprintf<span class="br0">&#40;</span>stderr<span class="sy0">,</span> <span class="st0">&quot;End of stream requested<span class="es1">\n</span>&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; av_free_packet<span class="br0">&#40;</span><span class="sy0">&amp;</span>packet<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; <span class="kw2">break</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="br0">&#125;</span></div>
</div>
<p><br/></p>
<p>The second modification was more extensive. The original segmenter wrote the index file out for each segment as the segment ended. Instead of writing the index to disk, I needed to push the index information to S3 and to know when each segment itself was ready to be pushed. I could have used an <a href="http://libs3.ischo.com/index.html">S3 library</a> and put everything into the C code, but I decided it made more sense to save the stream segments to disk and then push the index information to another process.</p>
<p>I transfer the information over a TCP socket connection from the segmenter to the upload process. A side effect is that the upload process can take input from multiple transcoders and segmenters at the same time. This should make for easy variable rate configurations where the transcoding can take advantage of multiple machines.</p>
<p>Here is the modified section of code:</p>
<div class="codesnip-container" >
<div class="c codesnip" style="font-family:monospace;"><span class="kw4">int</span> write_index_file<span class="br0">&#40;</span><span class="kw4">const</span> <span class="kw4">char</span> index<span class="br0">&#91;</span><span class="br0">&#93;</span><span class="sy0">,</span> <span class="kw4">const</span> <span class="kw4">unsigned</span> <span class="kw4">int</span> segment_duration<span class="sy0">,</span> <span class="kw4">const</span> <span class="kw4">char</span> output_directory<span class="br0">&#91;</span><span class="br0">&#93;</span><span class="sy0">,</span> <span class="kw4">const</span> <span class="kw4">char</span> output_prefix<span class="br0">&#91;</span><span class="br0">&#93;</span><span class="sy0">,</span> <span class="kw4">const</span> <span class="kw4">char</span> http_prefix<span class="br0">&#91;</span><span class="br0">&#93;</span><span class="sy0">,</span> <span class="kw4">const</span> <span class="kw4">unsigned</span> <span class="kw4">int</span> first_segment<span class="sy0">,</span> <span class="kw4">const</span> <span class="kw4">unsigned</span> <span class="kw4">int</span> last_segment<span class="sy0">,</span> <span class="kw4">const</span> <span class="kw4">int</span> end<span class="sy0">,</span> <span class="kw4">const</span> <span class="kw4">char</span> bucket_name<span class="br0">&#91;</span><span class="br0">&#93;</span><span class="sy0">,</span> <span class="kw4">const</span> <span class="kw4">char</span> key_prefix<span class="br0">&#91;</span><span class="br0">&#93;</span><span class="br0">&#41;</span><br />
<span class="br0">&#123;</span><br />
&nbsp; <span class="kw4">char</span> buffer<span class="br0">&#91;</span>1024 <span class="sy0">*</span> 10<span class="br0">&#93;</span><span class="sy0">;</span><br />
&nbsp; memset<span class="br0">&#40;</span>buffer<span class="sy0">,</span> 0<span class="sy0">,</span> <span class="kw4">sizeof</span><span class="br0">&#40;</span><span class="kw4">char</span><span class="br0">&#41;</span> <span class="sy0">*</span> 1024 <span class="sy0">*</span> 10<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; snprintf<span class="br0">&#40;</span>buffer<span class="sy0">,</span> <span class="kw4">sizeof</span><span class="br0">&#40;</span>buffer<span class="br0">&#41;</span><span class="sy0">,</span> <span class="st0">&quot;%s, %s, %u, %s, %s, %u, %u, %d, %s, %s&quot;</span><span class="sy0">,</span> index<span class="sy0">,</span> output_directory<span class="sy0">,</span> segment_duration<span class="sy0">,</span> output_prefix<span class="sy0">,</span> http_prefix<span class="sy0">,</span> first_segment<span class="sy0">,</span> last_segment<span class="sy0">,</span> end<span class="sy0">,</span> bucket_name<span class="sy0">,</span> key_prefix<span class="br0">&#41;</span><span class="sy0">;</span></p>
<p>&nbsp; fprintf<span class="br0">&#40;</span>stderr<span class="sy0">,</span> <span class="st0">&quot;Sending: %s<span class="es1">\n</span>&quot;</span><span class="sy0">,</span> buffer<span class="br0">&#41;</span><span class="sy0">;</span></p>
<p>&nbsp; <span class="kw4">int</span> sock<span class="sy0">;</span><br />
&nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span><span class="br0">&#40;</span>sock <span class="sy0">=</span> socket<span class="br0">&#40;</span>PF_INET<span class="sy0">,</span> SOCK_STREAM<span class="sy0">,</span> IPPROTO_TCP<span class="br0">&#41;</span><span class="br0">&#41;</span> <span class="sy0">&lt;</span> <span class="nu0">0</span><span class="br0">&#41;</span><br />
&nbsp; <span class="br0">&#123;</span><br />
&nbsp; &nbsp; fprintf<span class="br0">&#40;</span>stderr<span class="sy0">,</span> <span class="st0">&quot;Could not open socket.<span class="es1">\n</span>&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="sy0">-</span><span class="nu0">1</span><span class="sy0">;</span><br />
&nbsp; <span class="br0">&#125;</span></p>
<p>&nbsp; <span class="kw4">const</span> <span class="kw4">char</span> <span class="sy0">*</span>serverIP <span class="sy0">=</span> <span class="st0">&quot;127.0.0.1&quot;</span><span class="sy0">;</span> &nbsp;<br />
&nbsp; <span class="kw4">int</span> serverPort <span class="sy0">=</span> <span class="nu0">10234</span><span class="sy0">;</span></p>
<p>&nbsp; <span class="kw4">struct</span> sockaddr_in serverAddress<span class="sy0">;</span><br />
&nbsp; memset<span class="br0">&#40;</span><span class="sy0">&amp;</span>serverAddress<span class="sy0">,</span> 0<span class="sy0">,</span> <span class="kw4">sizeof</span><span class="br0">&#40;</span>serverAddress<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; serverAddress.<span class="me1">sin_family</span> &nbsp; &nbsp; &nbsp;<span class="sy0">=</span> AF_INET<span class="sy0">;</span><br />
&nbsp; serverAddress.<span class="me1">sin_addr</span>.<span class="me1">s_addr</span> <span class="sy0">=</span> inet_addr<span class="br0">&#40;</span>serverIP<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; serverAddress.<span class="me1">sin_port</span> &nbsp; &nbsp; &nbsp; &nbsp;<span class="sy0">=</span> htons<span class="br0">&#40;</span>serverPort<span class="br0">&#41;</span><span class="sy0">;</span></p>
<p>&nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>connect<span class="br0">&#40;</span>sock<span class="sy0">,</span> <span class="br0">&#40;</span><span class="kw4">struct</span> sockaddr <span class="sy0">*</span><span class="br0">&#41;</span> <span class="sy0">&amp;</span>serverAddress<span class="sy0">,</span> <span class="kw4">sizeof</span><span class="br0">&#40;</span>serverAddress<span class="br0">&#41;</span><span class="br0">&#41;</span> <span class="sy0">&lt;</span> <span class="nu0">0</span><span class="br0">&#41;</span><br />
&nbsp; <span class="br0">&#123;</span><br />
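&nbsp; &nbsp; fprintf<span class="br0">&#40;</span>stderr<span class="sy0">,</span> <span class="st0">&quot;Could not connect to socket.<span class="es1">\n</span>&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; close<span class="br0">&#40;</span>sock<span class="br0">&#41;</span><span class="sy0">;</span> <span class="co1">// do not leak the descriptor on failure</span><br />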
&nbsp; &nbsp; <span class="kw1">return</span> <span class="sy0">-</span><span class="nu0">1</span><span class="sy0">;</span><br />
&nbsp; <span class="br0">&#125;</span></p>
<p>&nbsp; <span class="kw4">int</span> buffer_len <span class="sy0">=</span> strlen<span class="br0">&#40;</span>buffer<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>send<span class="br0">&#40;</span>sock<span class="sy0">,</span> buffer<span class="sy0">,</span> buffer_len<span class="sy0">,</span> <span class="nu0">0</span><span class="br0">&#41;</span> <span class="sy0">!=</span> buffer_len<span class="br0">&#41;</span><br />
&nbsp; <span class="br0">&#123;</span><br />
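&nbsp; &nbsp; fprintf<span class="br0">&#40;</span>stderr<span class="sy0">,</span> <span class="st0">&quot;Could not send command.<span class="es1">\n</span>&quot;</span><span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; close<span class="br0">&#40;</span>sock<span class="br0">&#41;</span><span class="sy0">;</span> <span class="co1">// do not leak the descriptor on failure</span><br />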
&nbsp; &nbsp; <span class="kw1">return</span> <span class="sy0">-</span><span class="nu0">1</span><span class="sy0">;</span><br />
&nbsp; <span class="br0">&#125;</span></p>
<p>&nbsp; close<span class="br0">&#40;</span>sock<span class="br0">&#41;</span><span class="sy0">;</span></p>
<p>&nbsp; <span class="kw1">return</span> <span class="nu0">0</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span></div>
</div>
<p> <br/></p>
<p>The above function assumes that the upload server lives on the same machine as the segmenter at this point. Also note that the command line arguments for the segmenter have grown to include an S3 bucket name and an S3 key prefix:</p>
<div class="codesnip-container" >
<div class="text codesnip" style="font-family:monospace;">Usage: live_upload &lt;input MPEG-TS file&gt; &lt;segment duration in seconds&gt; &lt;output directory&gt; &lt;output MPEG-TS file prefix&gt; &lt;output m3u8 index file&gt; &lt;http prefix&gt; &lt;bucket name&gt; &lt;key prefix&gt;</div>
</div>
<ul>
<li>input MPEG-TS file &#8211; For live streaming the input arrives on a pipe, so pass - here</li>
<li>segment duration in seconds &#8211; How long to make each segment of video</li>
<li>output directory &#8211; Where the video segments live before they are transferred</li>
<li>output MPEG-TS file prefix &#8211; The prefix of the video file</li>
<li>output m3u8 index file &#8211; The name of the m3u8 index file</li>
<li>http prefix &#8211; The prefix of the URL where the segments are ultimately located</li>
<li>bucket name &#8211; The S3 bucket name that the segments and index will be stored in</li>
<li>key prefix &#8211; The S3 key that the segments and index should be prefixed with</li>
</ul>
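<p>To exercise the upload process without running the segmenter, the message that write_index_file sends can be reproduced by hand. The following Ruby sketch is a stand-in client and not part of the original code: build_message and send_message are hypothetical helper names, and the host and port defaults simply mirror the 127.0.0.1 and 10234 values hardcoded in the C function above.</p>

```ruby
require 'socket'

# Build the comma-separated message in the same field order that the C
# function write_index_file uses for its format string.
# (build_message is a hypothetical helper, not part of the segmenter.)
def build_message(index, output_directory, segment_duration, output_prefix,
                  http_prefix, first_segment, last_segment, stream_end,
                  bucket_name, key_prefix)
  [index, output_directory, segment_duration, output_prefix, http_prefix,
   first_segment, last_segment, stream_end ? 1 : 0,
   bucket_name, key_prefix].join(', ')
end

# Open a TCP connection, send one message, and close it again. Closing the
# socket is what lets a server that reads with gets see the end of the
# message, because no trailing newline is sent.
def send_message(message, host = '127.0.0.1', port = 10234)
  socket = TCPSocket.new(host, port)
  socket.write(message)
ensure
  socket.close if socket
end

# Example (needs an upload server listening on port 10234):
#   send_message(build_message('stream-128k.m3u8', '/tmp/', 10, 'sample_128k',
#                              'http://d3vmly3syseqo9.cloudfront.net/stream0001/',
#                              1, 3, false, 'ionlivestream', 'stream0001'))
```

<p>Because each message travels on its own short-lived connection, several segmenters can talk to one upload process without any coordination between them.</p>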
<p>Download the <a href="http://www.ioncannon.net/examples/httplive/Makefile">Makefile</a> and <a href="http://www.ioncannon.net/examples/httplive/live_upload.c">segmenter source</a> and compile if you want to follow step 4.</p>
<h3>Step 3: Transfer the segments</h3>
<p>The modified segmenter is now ready to push the index information to a process that will in turn upload the index as well as the stream segment. In this case I chose to push the segments and index to S3 but that isn&#039;t the only option possible. Furthermore I&#039;ve stuck Cloudfront in front of the segments so they can be cached closer to the destination. Letting the index files be cached by Cloudfront could be done but care would need to be taken to make sure the index isn&#039;t cached for longer than the segment duration.</p>
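<p>One way to express the "no longer than the segment duration" rule is to attach a Cache-Control header to the index when it is uploaded. The sketch below only builds the header hashes. index_headers and segment_headers are hypothetical helpers, the max-age values are assumptions rather than tested settings, and whether a particular CDN honors such short lifetimes needs to be checked before relying on it.</p>

```ruby
# Headers for the index file: tell caches to keep it for less than one
# segment. (The "duration minus one second" policy is an assumption.)
def index_headers(segment_duration)
  max_age = [segment_duration.to_i - 1, 0].max
  {
    'x-amz-acl'     => 'public-read',
    'cache-control' => "max-age=#{max_age}"
  }
end

# Headers for a segment: its content does not change once written, so it can
# be cached much longer, provided segment names are never reused.
# (One day is an arbitrary illustrative value.)
def segment_headers
  {
    'x-amz-acl'     => 'public-read',
    'content-type'  => 'video/MP2T',
    'cache-control' => 'max-age=86400'
  }
end

p index_headers(10)
p segment_headers
```

<p>Hashes like these could be handed to the upload call in place of the literal header hashes used by the upload server in this step.</p>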
<p>For the upload server I&#039;m using Ruby and the <a href="http://rightscale.rubyforge.org/right_aws_gem_doc/">Rightscale AWS gem</a> to push the segments and the index files to S3. Here is the complete code for the upload server (this file is called s3server.rb in the git repository):</p>
<div class="codesnip-container" >
<div class="ruby codesnip" style="font-family:monospace;"><span class="kw3">require</span> <span class="st0">&#039;thread&#039;</span><br />
<span class="kw3">require</span> <span class="st0">&#039;socket&#039;</span><br />
<span class="kw3">require</span> <span class="st0">&#039;ftools&#039;</span><br />
<span class="kw3">require</span> <span class="st0">&#039;rubygems&#039;</span><br />
<span class="kw3">require</span> <span class="st0">&#039;right_aws&#039;</span></p>
<p>AWS_S3_ID=<span class="st0">&quot;your s3 id&quot;</span><br />
AWS_S3_KEY=<span class="st0">&quot;your s3 private key&quot;</span></p>
<p><span class="kw1">def</span> create_index<span class="br0">&#40;</span>segment_duration, output_prefix, http_prefix, first_segment, last_segment, stream_end<span class="br0">&#41;</span><br />
&nbsp; <span class="kw4">File</span>.<span class="kw3">open</span><span class="br0">&#40;</span><span class="st0">&quot;tmp.index.m3u8&quot;</span>, <span class="st0">&#039;w&#039;</span><span class="br0">&#41;</span> <span class="kw1">do</span> <span class="sy0">|</span>index_file<span class="sy0">|</span> <br />
&nbsp; &nbsp; index_file.<span class="me1">write</span><span class="br0">&#40;</span><span class="st0">&quot;#EXTM3U<span class="es0">\n</span>&quot;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; index_file.<span class="me1">write</span><span class="br0">&#40;</span><span class="st0">&quot;#EXT-X-TARGETDURATION:#{segment_duration}<span class="es0">\n</span>&quot;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; index_file.<span class="me1">write</span><span class="br0">&#40;</span><span class="st0">&quot;#EXT-X-MEDIA-SEQUENCE:#{last_segment &gt;= 5 ? last_segment-4 : 1}<span class="es0">\n</span>&quot;</span><span class="br0">&#41;</span></p>
<p>&nbsp; first_segment.<span class="me1">upto</span><span class="br0">&#40;</span>last_segment<span class="br0">&#41;</span> <span class="kw1">do</span> <span class="sy0">|</span> segment_index <span class="sy0">|</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> segment_index <span class="sy0">&gt;</span> last_segment <span class="sy0">-</span> <span class="nu0">5</span><br />
&nbsp; &nbsp; &nbsp; index_file.<span class="me1">write</span><span class="br0">&#40;</span><span class="st0">&quot;#EXTINF:#{segment_duration},<span class="es0">\n</span>&quot;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; index_file.<span class="me1">write</span><span class="br0">&#40;</span><span class="st0">&quot;#{http_prefix}#{output_prefix}-%05u.ts<span class="es0">\n</span>&quot;</span> <span class="sy0">%</span> segment_index<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">end</span><br />
&nbsp; <span class="kw1">end</span></p>
<p>&nbsp; &nbsp; index_file.<span class="me1">write</span><span class="br0">&#40;</span><span class="st0">&quot;#EXT-X-ENDLIST&quot;</span><span class="br0">&#41;</span> <span class="kw1">if</span> stream_end<br />
&nbsp; <span class="kw1">end</span><br />
<span class="kw1">end</span></p>
<p><span class="kw1">def</span> push_to_s3<span class="br0">&#40;</span>index, output_directory, bucket_name, key_prefix, output_prefix, last_segment<span class="br0">&#41;</span><br />
&nbsp; s3 = <span class="re2">RightAws::S3Interface</span>.<span class="me1">new</span><span class="br0">&#40;</span>AWS_S3_ID, AWS_S3_KEY<span class="br0">&#41;</span></p>
<p>&nbsp; video_filename = <span class="st0">&quot;#{output_directory}/#{output_prefix}-%05u.ts&quot;</span> <span class="sy0">%</span> last_segment<br />
&nbsp; <span class="kw3">puts</span> <span class="st0">&quot;Pushing #{video_filename} to s3://#{bucket_name}/#{key_prefix}/#{output_prefix}-%05u.ts&quot;</span> <span class="sy0">%</span> last_segment<br />
&nbsp; s3.<span class="me1">put</span><span class="br0">&#40;</span>bucket_name, <span class="st0">&quot;#{key_prefix}/#{output_prefix}-%05u.ts&quot;</span> <span class="sy0">%</span> last_segment, <span class="kw4">File</span>.<span class="kw3">open</span><span class="br0">&#40;</span>video_filename<span class="br0">&#41;</span>, <span class="br0">&#123;</span><span class="st0">&#039;x-amz-acl&#039;</span> <span class="sy0">=&gt;</span> <span class="st0">&#039;public-read&#039;</span>, <span class="st0">&#039;content-type&#039;</span> <span class="sy0">=&gt;</span> <span class="st0">&#039;video/MP2T&#039;</span><span class="br0">&#125;</span><span class="br0">&#41;</span><br />
&nbsp; <span class="kw3">puts</span> <span class="st0">&quot;Done pushing video file&quot;</span><br />
&nbsp; <br />
&nbsp; <span class="kw3">puts</span> <span class="st0">&quot;Pushing tmp.index.m3u8 to s3://#{bucket_name}/#{key_prefix}/#{index}&quot;</span><br />
&nbsp; s3.<span class="me1">put</span><span class="br0">&#40;</span>bucket_name, key_prefix <span class="sy0">+</span> <span class="st0">&quot;/&quot;</span> <span class="sy0">+</span> index, <span class="kw4">File</span>.<span class="kw3">open</span><span class="br0">&#40;</span><span class="st0">&quot;tmp.index.m3u8&quot;</span><span class="br0">&#41;</span>, <span class="br0">&#123;</span><span class="st0">&#039;x-amz-acl&#039;</span> <span class="sy0">=&gt;</span> <span class="st0">&#039;public-read&#039;</span>, <span class="st0">&#039;content-type&#039;</span> <span class="sy0">=&gt;</span> <span class="st0">&#039;application/x-mpegURL&#039;</span><span class="br0">&#125;</span><span class="br0">&#41;</span><br />
&nbsp; <span class="kw3">puts</span> <span class="st0">&quot;Done pushing index file&quot;</span><br />
<span class="kw1">end</span></p>
<p>queue = <span class="kw4">Queue</span>.<span class="me1">new</span></p>
<p>server_thread = <span class="kw4">Thread</span>.<span class="me1">new</span> <span class="kw1">do</span><br />
&nbsp; server = TCPServer.<span class="me1">new</span><span class="br0">&#40;</span><span class="st0">&#039;0.0.0.0&#039;</span>, 10234<span class="br0">&#41;</span><br />
&nbsp; <span class="kw1">while</span> <span class="br0">&#40;</span>session = server.<span class="me1">accept</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; input = session.<span class="kw3">gets</span><br />
&nbsp; &nbsp; queue <span class="sy0">&lt;&lt;</span> input<br />
&nbsp; &nbsp; session.<span class="me1">close</span><br />
&nbsp; <span class="kw1">end</span> &nbsp; &nbsp;<br />
<span class="kw1">end</span></p>
<p>upload_thread = <span class="kw4">Thread</span>.<span class="me1">new</span> <span class="kw1">do</span><br />
&nbsp; <span class="kw1">while</span> <span class="br0">&#40;</span>value = queue.<span class="me1">pop</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="br0">&#40;</span>index, output_directory, segment_duration, output_prefix, http_prefix, first_segment, last_segment, stream_end, bucket_name, key_prefix<span class="br0">&#41;</span> = value.<span class="me1">strip</span>.<span class="kw3">split</span><span class="br0">&#40;</span><span class="sy0">%</span>r<span class="br0">&#123;</span>,\s<span class="sy0">*</span><span class="br0">&#125;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">if</span> last_segment.<span class="me1">to_i</span> <span class="sy0">&gt;</span> 0<br />
&nbsp; &nbsp; &nbsp; create_index<span class="br0">&#40;</span>segment_duration.<span class="me1">to_i</span>, output_prefix, http_prefix, first_segment.<span class="me1">to_i</span>, last_segment.<span class="me1">to_i</span>, stream_end.<span class="me1">to_i</span> == 1<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; push_to_s3<span class="br0">&#40;</span>index, output_directory, bucket_name, key_prefix, output_prefix, last_segment.<span class="me1">to_i</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">end</span><br />
&nbsp; <span class="kw1">end</span><br />
<span class="kw1">end</span></p>
<p>server_thread.<span class="me1">join</span></div>
</div>
<p><br/></p>
<p>To use the above code you will need to put your S3 credentials in place for the values AWS_S3_ID and AWS_S3_KEY.</p>
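<p>To see what the windowed index looks like, the logic of create_index can be restated as a pure function that returns the playlist text instead of writing tmp.index.m3u8. This is a sketch for illustration: build_index and the WINDOW constant are hypothetical names, the window of 5 segments matches the numbers hardcoded in the upload server, and the EXTINF lines carry the trailing comma that separates the duration from the (empty) title in the playlist format.</p>

```ruby
WINDOW = 5 # number of most recent segments kept in the index (assumed name)

# Return the m3u8 text for a sliding window that ends at last_segment.
def build_index(segment_duration, output_prefix, http_prefix,
                first_segment, last_segment, stream_end)
  # Oldest segment still listed; never earlier than the first segment.
  # With first_segment == 1 this matches the server's media sequence number.
  window_start = [last_segment - WINDOW + 1, first_segment].max

  lines = ['#EXTM3U',
           "#EXT-X-TARGETDURATION:#{segment_duration}",
           "#EXT-X-MEDIA-SEQUENCE:#{window_start}"]

  window_start.upto(last_segment) do |segment_index|
    lines.push("#EXTINF:#{segment_duration},")
    lines.push("#{http_prefix}#{output_prefix}-%05u.ts" % segment_index)
  end

  # Only a finished stream gets the end marker; a live index stays open.
  lines.push('#EXT-X-ENDLIST') if stream_end
  lines.join("\n") + "\n"
end

# Segment 7 has just finished: the index lists segments 3 through 7.
puts build_index(10, 'sample_128k', 'http://example.invalid/stream0001/', 1, 7, false)
```

<p>Keeping the function free of file and network access makes it easy to check the window arithmetic before wiring it into the upload thread.</p>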
<h3>Step 4: Test it</h3>
<ol>
<li>Set up an HTML file that points to the streaming index file:
<div class="codesnip-container" >
<div class="html4strict codesnip" style="font-family:monospace;"><span class="sc2">&lt;<a href="http://december.com/html/4/element/html.html"><span class="kw2">html</span></a>&gt;</span><br />
&nbsp; <span class="sc2">&lt;<a href="http://december.com/html/4/element/head.html"><span class="kw2">head</span></a>&gt;</span><br />
&nbsp; &nbsp; <span class="sc2">&lt;<a href="http://december.com/html/4/element/title.html"><span class="kw2">title</span></a>&gt;</span>Video Test<span class="sc2">&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/title.html"><span class="kw2">title</span></a>&gt;</span><br />
&nbsp; &nbsp; <span class="sc2">&lt;<a href="http://december.com/html/4/element/meta.html"><span class="kw2">meta</span></a> <span class="kw3">name</span><span class="sy0">=</span><span class="st0">&quot;viewport&quot;</span> <span class="kw3">content</span><span class="sy0">=</span><span class="st0">&quot;width=320; initial-scale=1.0; maximum-scale=1.0; user-scalable=0;&quot;</span><span class="sy0">/</span>&gt;</span><br />
&nbsp; <span class="sc2">&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/head.html"><span class="kw2">head</span></a>&gt;</span><br />
&nbsp; <span class="sc2">&lt;<a href="http://december.com/html/4/element/body.html"><span class="kw2">body</span></a> <span class="kw3">style</span><span class="sy0">=</span><span class="st0">&quot;background-color:#FFFFFF; &quot;</span>&gt;</span><br />
&nbsp; &nbsp; <span class="sc2">&lt;<a href="http://december.com/html/4/element/center.html"><span class="kw2">center</span></a>&gt;</span><br />
&nbsp; &nbsp; &nbsp; <span class="sc2">&lt;video <span class="kw3">width</span><span class="sy0">=</span><span class="st0">&#039;150&#039;</span> <span class="kw3">height</span><span class="sy0">=</span><span class="st0">&#039;150&#039;</span> <span class="kw3">src</span><span class="sy0">=</span><span class="st0">&quot;http://s3.amazonaws.com/ionlivestream/stream0001/stream-128k.m3u8&quot;</span> <span class="sy0">/</span>&gt;</span><br />
&nbsp; &nbsp; <span class="sc2">&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/center.html"><span class="kw2">center</span></a>&gt;</span><br />
&nbsp; <span class="sc2">&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/body.html"><span class="kw2">body</span></a>&gt;</span><br />
<span class="sc2">&lt;<span class="sy0">/</span><a href="http://december.com/html/4/element/html.html"><span class="kw2">html</span></a>&gt;</span></div>
</div>
<p>Note that the index file is coming from S3 directly and not from Cloudfront to keep it from being cached and a stale version being served. In case it helps, the format for the source index location in this example is: http://s3.amazonaws.com/&lt;bucket name>/&lt;key prefix>/&lt;index file>
</li>
<li>Start the upload script. Once started, it will sit and wait for input from the segmenter:
<div class="codesnip-container" >
<div class="text codesnip" style="font-family:monospace;">ruby s3server.rb</div>
</div>
<p>As requests are made the script will dump output to stdout describing what it is doing.
</li>
<li>Configure and start your source video stream. In my case I need to start the VLC to QuickTime Broadcaster connection for the iSight.
</li>
<li>Run FFMpeg against the source video stream and pipe the resulting transcoded output into the segmenter. For my stream I used the following command:
<div class="codesnip-container" >
<div class="text codesnip" style="font-family:monospace;">ffmpeg -v 0 -i http://192.168.132.101:8080 -f mpegts -acodec libmp3lame -ar 48000 -ab 64k -s 320x240 -vcodec libx264 -b 128k -flags +loop -cmp +chroma -partitions +parti4x4+partp8x8+partb8x8 -subq 5 -trellis 1 -refs 1 -coder 0 -me_range 16 -keyint_min 25 -sc_threshold 40 -i_qfactor 0.71 -bt 200k -maxrate 128k -bufsize 128k -rc_eq &#039;blurCplx^(1-qComp)&#039; -qcomp 0.6 -qmin 10 -qmax 51 -qdiff 4 -level 30 -aspect 320:240 -g 30 -async 2 - | live_upload - 10 /tmp/ sample_128k stream-128k.m3u8 http://d3vmly3syseqo9.cloudfront.net/stream0001/ ionlivestream stream0001</div>
</div>
</li>
<li>Point Safari on your iPhone to the HTML file created in step 1 and hit play. After buffering starts you should see the live video with a delay of a segment or two.</li>
</ol>
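<p>The segmenter arguments and the URLs used in the steps above are related in a fixed way, which the following sketch spells out. The helper names are hypothetical; the URL layouts are taken from the example in step 1 and from the file naming used by the upload server.</p>

```ruby
# Where the index is fetched from: S3 directly, so that it is not cached.
# Layout from step 1: http://s3.amazonaws.com/BUCKET/KEY_PREFIX/INDEX_FILE
def index_url(bucket_name, key_prefix, index_file)
  "http://s3.amazonaws.com/#{bucket_name}/#{key_prefix}/#{index_file}"
end

# Where a segment is fetched from: the http prefix (the Cloudfront
# distribution in this example) followed by the numbered segment file name.
def segment_url(http_prefix, output_prefix, segment_number)
  "#{http_prefix}#{output_prefix}-%05u.ts" % segment_number
end

# The S3 key under which the upload server stores that same segment.
def segment_key(key_prefix, output_prefix, segment_number)
  "#{key_prefix}/#{output_prefix}-%05u.ts" % segment_number
end

puts index_url('ionlivestream', 'stream0001', 'stream-128k.m3u8')
puts segment_url('http://d3vmly3syseqo9.cloudfront.net/stream0001/', 'sample_128k', 1)
puts segment_key('stream0001', 'sample_128k', 1)
```

<p>Assuming the distribution uses the bucket as its origin, the http prefix has to end with the key prefix (stream0001/ here) for the segment URL to resolve to the key the upload server wrote.</p>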
<p><i>Other information</i></p>
<p>Again, this is just a proof of concept so there are a number of things lacking. Here is a list of enhancements I could imagine people would find useful:</p>
<ul>
<li>Variable bitrate segments</li>
<li>Easier and more flexible configuration</li>
<li>Use <a href="http://aws.amazon.com/ec2/">EC2</a> for encoding from one source stream</li>
<li>Pluggable transfer to a website using FTP or SCP instead of S3</li>
</ul>
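<p>For the variable bitrate item, the index that ties several streams together is itself a small m3u8 file with one entry per bitrate. The sketch below generates such a file. It is not part of the proof of concept: build_variant_index is a hypothetical helper and the per-stream index file names are made up for illustration.</p>

```ruby
# Build a variant index that points at one stream index per bitrate.
# streams maps a bandwidth in bits per second to the index URL for that rate.
# (build_variant_index and the file names below are illustrative assumptions.)
def build_variant_index(streams)
  lines = ['#EXTM3U']
  # Sort by bandwidth so the lowest rate is listed first.
  streams.sort.each do |bandwidth, url|
    lines.push("#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=#{bandwidth}")
    lines.push(url)
  end
  lines.join("\n") + "\n"
end

puts build_variant_index({
  128_000 => 'stream-128k.m3u8',
  256_000 => 'stream-256k.m3u8',
  364_000 => 'stream-364k.m3u8'
})
```

<p>Each segmenter would keep updating its own stream index as before; the variant index only needs to be uploaded once because its contents do not change while the stream runs.</p>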
<p>I thought some about the cost breakdown of doing this using S3 and Cloudfront. The following is a quick calculation on what it might cost for a variable rate stream using the cost of S3 and Cloudfront today.</p>
<p><b>Some assumptions:</b><br />
Everyone always finishes the entire stream when they start it.<br />
Variable rates: 128kbps, 256kbps, 364kbps<br />
Video length: 5 minute video<br />
Client streams: 100</p>
<p>30 segments for each stream (5 minutes = 300 seconds / 10 second intervals)</p>
<p>128kbps x 5 minutes = 4.8MB<br />
256kbps x 5 minutes = 9.6MB<br />
364kbps x 5 minutes = 13.65MB</p>
<p>% of each stream rate<br />
25% client streams @ 364kbps = 25 * 13.65MB = 341.25MB<br />
50% client streams @ 256kbps = 50 * 9.6MB = 480MB<br />
25% client streams @ 128kbps = 25 * 4.8MB = 120MB</p>
<p>3 stream index files + 1 variable index file<br />
30 segments * 3 streams = 90 index puts + 90 stream segment puts</p>
<p><b>S3 put cost:</b><br />
  $0.10 per GB – all data transfer in<br />
  $0.01 per 1,000 PUT, COPY, POST, or LIST requests</p>
<p>  $0.01 * (180/1000) = $0.0018<br />
  $0.10 * (4.8+9.6+13.65)/1000 = $0.002805<br />
  $0.0018 + $0.002805 = $0.004605</p>
<p><b>S3 storage cost:</b><br />
  $0.15 per GB – first 50 TB / month of storage used</p>
<p>  $0.15 * (4.8+9.6+13.65)/1000 = $0.0042075</p>
<p><b>S3 transfer cost:</b><br />
  $0.17 per GB – first 10 TB / month data transfer out<br />
  $0.01 per 10,000 GET and all other requests</p>
<p>  The index gets pulled from S3 every time, the streams come from Cloudfront<br />
  Assumes the transfer to Cloudfront happens 3 times per stream<br />
  $0.17 * (3 * (4.8+9.6+13.65)) / 1000 = $0.0143055<br />
  $0.01 * ((100 * 30) + (30 * 3)) / 10000 = $0.00309</p>
<p><b>Cloudfront transfer cost:</b><br />
  $0.17 per GB – first 10 TB / month data transfer out<br />
  $0.01 per 10,000 GET requests</p>
<p>  $0.17 * (341.25 + 480 + 120) / 1000 = $0.1600125<br />
  $0.01 * (30 * 100) / 10000 = $0.003</p>
<p><b>Rough total cost:</b> $0.1892205</p>
<p>So for 100 streams of 5 minutes worth of video you would be looking at something around 19 cents.</p>
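<p>The estimate can also be recomputed with a short script, which makes it easier to try other viewer counts or prices. This is a sketch under the same assumptions and per-unit prices listed above. The constant and method names are my own, sizes use 1 GB = 1000 MB as in the hand calculation, and because every line item is derived directly from the assumptions, the total can differ slightly from a figure added up by hand.</p>

```ruby
# Per-unit prices as listed above (USD).
PRICE_IN_GB      = 0.10 # S3 data transfer in
PRICE_PUT_1K     = 0.01 # per 1,000 PUT requests
PRICE_STORAGE_GB = 0.15 # S3 storage per month
PRICE_OUT_GB     = 0.17 # S3 and Cloudfront data transfer out
PRICE_GET_10K    = 0.01 # per 10,000 GET requests

# rates maps a bitrate in kbps to the number of clients watching at that rate.
def estimate_cost(rates, seconds = 300, segment_seconds = 10)
  segments = seconds / segment_seconds
  size_mb  = lambda { |kbps| kbps * seconds / 8.0 / 1000.0 }

  stored_mb    = rates.keys.inject(0.0) { |sum, kbps| sum + size_mb.call(kbps) }
  delivered_mb = rates.inject(0.0) { |sum, (kbps, n)| sum + n * size_mb.call(kbps) }
  clients      = rates.values.inject(0) { |sum, n| sum + n }
  put_requests = segments * rates.size * 2 # one segment put and one index put

  costs = {
    :s3_put      => PRICE_PUT_1K * put_requests / 1000 + PRICE_IN_GB * stored_mb / 1000,
    :s3_storage  => PRICE_STORAGE_GB * stored_mb / 1000,
    # Assumes each stream is transferred to Cloudfront 3 times and that every
    # client fetches the index from S3 once per segment.
    :s3_transfer => PRICE_OUT_GB * 3 * stored_mb / 1000 +
                    PRICE_GET_10K * (clients * segments + segments * rates.size) / 10_000,
    :cloudfront  => PRICE_OUT_GB * delivered_mb / 1000 +
                    PRICE_GET_10K * clients * segments / 10_000
  }
  costs[:total] = costs.values.inject(0.0) { |sum, c| sum + c }
  costs
end

costs = estimate_cost({ 128 => 25, 256 => 50, 364 => 25 })
costs.each { |name, usd| puts format('%-12s $%.7f', name, usd) }
```

<p>Changing the hash passed to estimate_cost, for example to 1,000 clients at a single bitrate, shows how strongly the Cloudfront transfer line dominates the total.</p>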
]]></content:encoded>
			<wfw:commentRss>http://www.ioncannon.net/programming/475/iphone-windowed-http-live-streaming-using-amazon-s3-and-cloudfront-proof-of-concept/feed/</wfw:commentRss>
		<slash:comments>38</slash:comments>
		<feedburner:origLink>http://www.ioncannon.net/programming/475/iphone-windowed-http-live-streaming-using-amazon-s3-and-cloudfront-proof-of-concept/</feedburner:origLink></item>
	</channel>
</rss>
