<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
  <channel>
    <title>Ruby Pond</title>
    <link>http://rubypond.com/rss/</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Ruby Pond Ltd. | Ruby on Rails Developers, Online Marketing, Social Networking</description>
    
    
        <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/rubypond" type="application/rss+xml" /><item>
          <title>Scotland on Rails 09 - Videos are up</title>
          <description>&lt;p&gt;Not much more to say really, the title sums it up. For those who weren&amp;#8217;t there, or those who were and just want a recap, go and check out the &lt;a href="http://www.engineyard.com/blog/community/scotland-on-rails/"&gt;Scotland on Rails videos&lt;/a&gt; now.&lt;/p&gt;

&lt;p&gt;And I &lt;em&gt;will&lt;/em&gt; eventually get around to doing that round-up of the post/out-of-conference activities to do should you be looking to attend the conference yourself next year.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/rubypond/~4/uvunOx_HEPs" height="1" width="1"/&gt;</description>
          <pubDate>Thu, 21 May 2009 10:41:37 GMT</pubDate>
          <guid isPermaLink="false">http://rubypond.com/articles/2009/05/21/scotland-on-rails-09---videos-are-up/</guid>
          <link>http://feedproxy.google.com/~r/rubypond/~3/uvunOx_HEPs/</link>
        <feedburner:origLink>http://rubypond.com/articles/2009/05/21/scotland-on-rails-09---videos-are-up/</feedburner:origLink></item>
    
        <item>
          <title>Belated Scotland on Rails round up</title>
          <description>&lt;p&gt;Okay, so I&amp;#8217;m almost a month and a half late in finishing this. And the content might already seem a little dated given &lt;a href="http://www.railsconf.com/"&gt;RailsConf&lt;/a&gt; is currently on in Vegas and &lt;a href="http://euruko2009.org/"&gt;EuRuKo&lt;/a&gt; kicks off this weekend in Barcelona. But better late than never.&lt;/p&gt;

&lt;h2&gt;The Summary&lt;/h2&gt;

&lt;p&gt;So in summary, how was it? Pretty good and great value. It wasn&amp;#8217;t blow-my-mind amazing, but there was enough good stuff going on to justify the trip. Plus Scotland is a beautiful country, and Edinburgh in particular a great city, so any means of justifying a business expense to get&amp;nbsp;there&amp;nbsp;is&amp;nbsp;worthwhile&amp;nbsp;;)&lt;/p&gt;

&lt;p&gt;So onto the sessions I saw or chatted in depth to people about.&lt;/p&gt;

&lt;h2&gt;Sessions&lt;/h2&gt;

&lt;h3&gt;Keynote - Marcel Molina Jr&lt;/h3&gt;

&lt;p&gt;I&amp;#8217;d heard of Marcel before, I&amp;#8217;d even seen his name grace the cover of some books I own, but I didn&amp;#8217;t know much about him beyond that. For those like me, he was one of the original core team at 37signals that worked on Rails. And his keynote was quite a personal look at the beginnings of the framework. Digging up old usenet posts from DHH showing how he&amp;#8217;s trying to learn ruby, to slowly building the hype around rails, to a public release. For a presentation that was essentially reading old emails and newsgroup postings I found it thoroughly enjoyable. It was particularly interesting to see just how quickly DHH went from new to ruby to releasing probably one of the most widely used tools written in the language.&lt;/p&gt;

&lt;h3&gt;Implementing Geo-awareness - Jim Remsik&lt;/h3&gt;

&lt;p&gt;I&amp;#8217;d had big hopes for this presentation. I&amp;#8217;ve dabbled with geo-awareness stuff before, heck I&amp;#8217;ve even written my own rails mapping plugin because I thought YM4R and the alternatives weren&amp;#8217;t very ruby (btw. it&amp;#8217;s in use for &lt;a href="http://map.ly/"&gt;Map.ly&lt;/a&gt;, the collaborative mapping/direction site I quickly rustled up for my girlfriend). I was really looking for some ideas on how to make an existing site more &amp;#8220;geo aware&amp;#8221;, and this presentation was little more than &amp;#8220;how to put a google map on your site 101&amp;#8221;. For all my complaints about the existing plugins and their approach, they&amp;#8217;re still quite simple to use. With any level of ruby/rails knowledge, this talk could have probably been reduced to 5-10mins.&lt;/p&gt;

&lt;h3&gt;Getting Git: Getting effective project management - Scott Chacon&lt;/h3&gt;

&lt;p&gt;Scott is part of the team at Logical Awesome working on Github. He&amp;#8217;s also done the peepcode on git, and maintains the community book. He really knows his git, and by all accounts I&amp;#8217;m disappointed I missed this one for the geo-awareness session. Thankfully the guys I pair with most days, &lt;a href="http://www.effectif.com"&gt;Graham Ashton&lt;/a&gt; and Jon Ramsey, went and filled me in somewhat on it. The first half was a bit of an introduction on what it is, why it&amp;#8217;s different, how to use it, etc. that could mostly be ignored if you&amp;#8217;ve used it at any length. Apparently the second half of the talk (which unfortunately he tried to cram into 10mins) was amazing. Showing how to use bisect to identify which commit introduced a bug as one of the highlights. The slides are available at &lt;a href="http://github.com/schacon/git-presentations/tree/master"&gt;github&lt;/a&gt; and hopefully the videos will be up soon.&lt;/p&gt;

&lt;h3&gt;Ruby on the couch eating potatoes - Alexander Lang&lt;/h3&gt;

&lt;p&gt;A well-timed presentation from my perspective. I&amp;#8217;ve been looking at a number of relational database alternatives of late for one of my side projects, and &lt;a href="http://couchdb.apache.org/"&gt;CouchDB&lt;/a&gt; had come up as one of the most likely contenders. So I&amp;#8217;d done a fair degree of research on CouchDB itself, but not much on the most suitable adapter for using it with rails.&lt;/p&gt;

&lt;p&gt;Alex did a good job going through all the various known adapters for talking to CouchDB (including his own, CouchPotato) and pointed out the pros and cons of each. Amazingly for a presentation like this he admitted that you should avoid his. For those interested, the winner was Paul Carey&amp;#8217;s &lt;a href="http://github.com/paulcarey/relaxdb/tree/master"&gt;RelaxDB&lt;/a&gt;. A point of contention which spilled onto the twittersphere was Alex&amp;#8217;s recommendation to avoid George Palmer&amp;#8217;s &lt;a href="http://github.com/georgepalmer/couch_foo/tree/master"&gt;CouchFoo&lt;/a&gt; (George was in the audience). The rationale being that CouchFoo, much like CouchPotato, is too &amp;#8220;ActiveRecord&amp;#8221; and that document-based databases are different to relational databases so this means the approach must be inherently wrong.&lt;/p&gt;

&lt;p&gt;Personally I think no, the rationale is wrong. Sure, you don&amp;#8217;t want to treat CouchDB like a relational database because you&amp;#8217;re missing the benefits gained in the document approach and kind of missing the whole point. But that doesn&amp;#8217;t mean ActiveRecord for CouchDB is wrong. ActiveRecord does not &lt;em&gt;have to&lt;/em&gt; mean Object Relational Mapper, it&amp;#8217;s just up until this point it&amp;#8217;s all people have used it for. In fact, there is nothing in the name ActiveRecord that inherently implies it is only for relational schemas. I think if people had grown up being accustomed to AR being an Object Document Mapper instead of an ORM then this argument would be moot. In rails you interface with an object, the adapter takes care of how to store it, be it in a document store or a relational store. We&amp;#8217;ve got a pretty well documented API for AR, with lots of validation rules and other plugins that are useful. If you then decide to go build a relational schema in your document database that isn&amp;#8217;t the fault of CouchFoo, just like AR can&amp;#8217;t be blamed if you decide to try and keep everything in a single big flat table in MySQL.&lt;/p&gt;

&lt;p&gt;My own conclusion after some research was the two best adapters out there were RelaxDB and CouchFoo. RelaxDB seems the best (I&amp;#8217;ll go into details in a future post) but CouchFoo is ideal if you want to transition quickly or need access to some of the AR helpers/mixins. So all in all, a good presentation and made me go back and re-assess my own research on the topic and we ultimately shared the same conclusion albeit with different reasoning.&lt;/p&gt;

&lt;h3&gt;Merb and Rails 3.0 - Yehuda Katz&lt;/h3&gt;

&lt;p&gt;Probably not a lot to go through here which won&amp;#8217;t be out of date due to RailsConf. An interesting and high-paced presentation from Yehuda on what&amp;#8217;s coming in rails 3.0, some of the design decisions, etc. Probably not a huge amount of new information for anybody that had been using merb or following the public updates.&lt;/p&gt;

&lt;h3&gt;In Praise of non-fixtured data - Kevin Barnes&lt;/h3&gt;

&lt;p&gt;I often wonder how I&amp;#8217;d go on stage in front of a bunch of peers, but I&amp;#8217;m thankful that many of the presenters at these conferences look as nervous up there as I&amp;#8217;d be. Kevin made me rethink if I should ever bother. A really excellent and clear presenter that was a joy to listen to.&lt;/p&gt;

&lt;p&gt;That being said, I didn&amp;#8217;t take from this as much as I&amp;#8217;d hoped. It&amp;#8217;s my own fault as the talk was exactly what was presented on the label, mostly about why you shouldn&amp;#8217;t use fixtured data and then a look at what the alternatives are. I was looking for a bit more of a detailed comparison of the pros/cons of each of the object factory approaches (much like Alex had done in the CouchDB talk) but it was really just a cursory mention of them all and then a look at how to use the one Kevin had developed, ObjectDaddy.&lt;/p&gt;

&lt;h3&gt;Confessions of a PackRat - Scott Raymond&lt;/h3&gt;

&lt;p&gt;I only caught the tail-end of this talk, but from those I spoke to it was quite insightful. Scott went into detail about how they developed a Facebook app and then needed to scale quickly to deal with the load. Everything from scaling into EC2, switching between technologies, and having to handle people trying to game the system. The 10 mins I saw were thoroughly enjoyable.&lt;/p&gt;

&lt;h3&gt;Advanced Deployment - Jonathan Weiss&lt;/h3&gt;

&lt;p&gt;This is one that I felt wasn&amp;#8217;t at all what was described on the label. I&amp;#8217;d normally consider deployment something more akin to &amp;#8220;getting your stuff onto servers&amp;#8221;. Instead this covered lots of high-level architectural options you have available to you, the alternatives for each, but without any real explanation on why you should choose one over another. An interesting topic, but pitched to the wrong crowd with the wrong name I feel. Much better for manager types with little-to-no hands-on rails experience. I think anybody who&amp;#8217;s actually deployed an application would have known the answers to all the theoretical questions raised.&lt;/p&gt;

&lt;h3&gt;The Ruby Object Model - Dave Thomas&lt;/h3&gt;

&lt;p&gt;The guy that brought us the Pragmatic Programmers and a number of ruby books and videos from that stable gave a great presentation on the object model with ruby. The shoeless maestro strutted around the stage and made what could be a very dull and overly technical topic seem approachable and simple. When you call a method where does it look to first to run it? Where does it bubble up to? In what order? And how do you access an anonymous method? All this and more was answered! Not sure if I picked up anything new, but it was a great cementing of my understanding in a highly graphical fashion. 6 weeks on when I&amp;#8217;m debugging the internals of something I&amp;#8217;m still visualising Dave&amp;#8217;s charts in my head to help me. Much of the content you can go through at leisure and in more detail in the $5 videos from &lt;a href="http://www.pragprog.com/screencasts/v-dtrubyom/the-ruby-object-model-and-metaprogramming"&gt;pragprog&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;R-house - Fernand Galiana&lt;/h3&gt;

&lt;p&gt;Fernand took the noble risk of trying to do a live demonstration of the possibilities of using home automation with ruby. Thankfully for the most part it worked, and when it didn&amp;#8217;t he was entertaining enough to cover it and continue. And I have to appreciate the level of work that has gone into R-house and the libraries it depends on to make this happen. With such a range of different home automation equipment out there, each with their own slight nuances on how to interface, it can&amp;#8217;t have been a simple job. A great presentation, not sure how soon I&amp;#8217;ll be able to use any of it though.&lt;/p&gt;

&lt;h3&gt;How to make a successful Rails app - Steven A Bristol&lt;/h3&gt;

&lt;p&gt;A presentation with no slides?! The horror!! Steven was another excellent presenter, and with a great presentation to boot. He went into what it takes to actually make a successful Rails app (hint: actually releasing it is a big part), and the advice was sound. The other big one which I think many developers ignore, I know I&amp;#8217;ve been guilty of it, is not to do the design yourself. Pay a designer for a few days of their time and make it look professional. He was awfully opinionated on a number of topics without any real basis. That was fine when I agreed with him, I thought he was full of shit when he didn&amp;#8217;t justify it ;) One point that niggled at me throughout most of the presentation was that the reason I know of Steven and his company was through the open-source &lt;a href="http://lovdbyless.com/"&gt;Lovd by Less&lt;/a&gt; social networking framework they&amp;#8217;d built in rails. I&amp;#8217;d needed something like it once upon a time and looked at Lovd, and the code was horrible. So here I was being lectured by someone on how to make a Rails app who&amp;#8217;d produced some of the ugliest code I&amp;#8217;d tried to use.&lt;/p&gt;

&lt;p&gt;Then the bombshell hit (paraphrasing), &amp;#8220;I don&amp;#8217;t care what the code looks like if it works, just get it out the door and see if people like it. Anybody who&amp;#8217;s seen some of our code will know it&amp;#8217;s horrible&amp;#8221;. And he was right, on all counts. But&amp;#8230; I still had reservations. This was an open source framework, so the consumers in this instance are developers. So for the same reasons that you should get a designer to make your site look good for regular punters, I think you should take the time to make your code elegant and usable for developers. He&amp;#8217;s got a great looking product which does what it needs to, it&amp;#8217;s a shame that it&amp;#8217;s such a pain to use and made me question the validity of what he was saying. To that end, and rather than continuing to pay other developers to extend my own social networking engine I&amp;#8217;ve taken it upon myself to fix everything I think is broken with Lovd. There&amp;#8217;s still a long way to go and I&amp;#8217;ll make a more formal announcement once it&amp;#8217;s nearer completion but it&amp;#8217;s already come a long way IMHO. Get it from &lt;a href="http://github.com/rubypond/lovd-by-less/tree"&gt;github&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Who needs Photoshop? Image manipulation in Ruby - Cory Forsyth&lt;/h3&gt;

&lt;p&gt;Reading through this initially I thought it was a bit of a sensational headline for what was going to be an intermediate level look at rmagick. And given my own experience with rmagick, I couldn&amp;#8217;t possibly stomach the thought of it ever being considered a viable photoshop replacement. I was wrong.&lt;/p&gt;

&lt;p&gt;This was another thoroughly entertaining and informative presentation. Cory went through in detail the logic behind the one feature that has been making me want to upgrade to the latest version of photoshop, &lt;a href="http://www.youtube.com/watch?v=019mu8FTy6M"&gt;content aware scaling&lt;/a&gt;, complete with code samples. Next up was facial recognition which could be applied to suggest smart cropping, or apparently put party hats on cats. It&amp;#8217;s most likely made my CS4 upgrade redundant, which paid for the trip numerous times over.&lt;/p&gt;

&lt;h3&gt;Edgecase dialog: Ruby code review - Joe O&amp;#8217;Brien and Jim Weirich&lt;/h3&gt;

&lt;p&gt;It sounded like a boring topic, it should have been a boring topic, and a very odd note on which to end a conference. It turns out Joe and Jim are brilliant entertainers taking the audience through a hypothetical role play of a consultant coming in to refactor an existing rails code base. While it&amp;#8217;s unlikely they&amp;#8217;ll be invited back to reprise it at the Edinburgh Fringe Festival in August, they thoroughly deserved their rapturous applause at the end. I&amp;#8217;ve not got any notes unfortunately, I&amp;#8217;m not sure if that&amp;#8217;s because I was laughing too hard or most of it was common sense. In any event, it was a great reminder that you should step back and look at your code to make sure you aren&amp;#8217;t trying to be clever for the sake of it. Simple solutions are often the best.&lt;/p&gt;

&lt;h2&gt;Seeya next year&lt;/h2&gt;

&lt;p&gt;It was well worth the trip. The conference was super cheap so it&amp;#8217;s really not hard to justify the expense and Edinburgh is a great place to visit. I&amp;#8217;ve been there as both a tourist and lived there for 12 months, so I&amp;#8217;ll do a subsequent post for those looking to attend next year with my own off-conference highlights.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/rubypond/~4/B4MxJHbkt1g" height="1" width="1"/&gt;</description>
          <pubDate>Fri, 08 May 2009 13:15:08 GMT</pubDate>
          <guid isPermaLink="false">http://rubypond.com/articles/2009/05/08/belated-scotland-on-rails-round-up/</guid>
          <link>http://feedproxy.google.com/~r/rubypond/~3/B4MxJHbkt1g/</link>
        <feedburner:origLink>http://rubypond.com/articles/2009/05/08/belated-scotland-on-rails-round-up/</feedburner:origLink></item>
    
        <item>
          <title>Capturing a form cancel</title>
          <description>&lt;p&gt;I&amp;#8217;ve been meaning to write this one up for months now, ever since I originally posted the &lt;a href="http://rubypond.com/articles/2008/07/16/sexy-forms-in-rails/"&gt;sexy forms in rails&lt;/a&gt;. The form builder I talk about in that post makes it easy to put a submit and cancel button on the form, and a few people have asked how to handle the cancel action. Some have suggested using javascript to redirect back using an onclick, ick! What happens if the user doesn&amp;#8217;t have javascript support? Sure you might try and justify it by saying, &amp;#8220;Well, how many users don&amp;#8217;t have javascript these days?&amp;#8221; but there is absolutely no reason why this &lt;em&gt;has&lt;/em&gt; to use javascript, so let&amp;#8217;s make it work for &lt;em&gt;everybody&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s actually much easier than you&amp;#8217;d think, within &lt;em&gt;application.rb&lt;/em&gt; I&amp;#8217;ve got the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class ApplicationController &amp;lt; ActionController::Base
  # Intercept a Cancel submit before any state-changing action runs.
  before_filter :catch_cancel, :only =&amp;gt; [:create, :update, :destroy]
  # Remember the last index/show page so there is somewhere to go back to.
  after_filter :set_referrer, :only =&amp;gt; [:index, :show]

  private
    def set_referrer
      session[:referrer] = url_for(params)
    end

    def catch_cancel
      redirect_to session[:referrer] if params[:commit] == &amp;quot;Cancel&amp;quot;
    end
end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The logic being, that if we&amp;#8217;ve got a fully RESTful architecture the only pages we&amp;#8217;ll ever want to go &amp;#8220;back&amp;#8221; to will be the index or show actions on each controller. So we cheekily call url_for in an after filter with the current params to get the path of the page we&amp;#8217;ve just displayed to the user, and stash it in the session. Theoretically it may be possible to use request.env[&amp;quot;HTTP_REFERER&amp;quot;] instead but it&amp;#8217;s not guaranteed to be set; this approach however ensures we keep track of the page to go back to explicitly ourselves.&lt;/p&gt;

&lt;p&gt;Then, to piece it all together there&amp;#8217;s a before filter checking if the user has clicked the submit button labeled with &amp;#8220;Cancel&amp;#8221;. You&amp;#8217;ll need to change that equality test dependent on what text you put on your button. You may also need to move the before filter into the controllers that require them, depending on the order it needs to run with other filters you have.&lt;/p&gt;
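
&lt;p&gt;One gap worth guarding against: if the user lands on a form without having visited an index or show page first, session[:referrer] will be empty and there is nowhere to redirect to. The following is a minimal, framework-free sketch of the decision the before filter makes, with a fallback path added. The cancel_target helper, the string keys and the "/" default are illustrative assumptions rather than part of the code above.&lt;/p&gt;

```ruby
# Hypothetical helper: decide where a cancelled form should send the user.
# params and session are plain Hashes standing in for the Rails objects.
def cancel_target(params, session, fallback = "/")
  # Only act when the submit button labelled "Cancel" was pressed.
  return nil unless params["commit"] == "Cancel"

  # Prefer the page stashed by the after filter, else use the fallback.
  session["referrer"] || fallback
end

puts cancel_target({ "commit" => "Cancel" }, { "referrer" => "/posts/1" })
puts cancel_target({ "commit" => "Cancel" }, {})
puts cancel_target({ "commit" => "Save" }, { "referrer" => "/posts/1" }).inspect
```

&lt;p&gt;In a real controller the same idea would probably boil down to redirecting to session[:referrer] or, failing that, to some sensible default page of your choosing.&lt;/p&gt;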

&lt;p&gt;Hope that helps.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/rubypond/~4/3Nu3RZdKoFA" height="1" width="1"/&gt;</description>
          <pubDate>Fri, 06 Feb 2009 22:40:02 GMT</pubDate>
          <guid isPermaLink="false">http://rubypond.com/articles/2009/02/06/capturing-a-form-cancel/</guid>
          <link>http://feedproxy.google.com/~r/rubypond/~3/3Nu3RZdKoFA/</link>
        <feedburner:origLink>http://rubypond.com/articles/2009/02/06/capturing-a-form-cancel/</feedburner:origLink></item>
    
        <item>
          <title>Ruby Manor Videos Available</title>
          <description>&lt;p&gt;For anybody that was unable to attend (like me), or for those that just want a recap then a selection of the &lt;a href="http://rubymanor.org/videos/"&gt;Ruby Manor videos&lt;/a&gt; are now available for download. Hopefully some more will be posted soon.&lt;/p&gt;

&lt;p&gt;For any of the bits that are missing you can read through Graham&amp;#8217;s review of the various sessions. &lt;a href="http://effectif.com/2008/11/22/notes-from-the-ruby-manor"&gt;Ruby Manor part 1&lt;/a&gt; and &lt;a href="http://effectif.com/2008/11/23/notes-from-the-ruby-manor-part-2"&gt;Ruby Manor part 2&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/rubypond/~4/BpYYoZmBDBg" height="1" width="1"/&gt;</description>
          <pubDate>Fri, 16 Jan 2009 14:38:44 GMT</pubDate>
          <guid isPermaLink="false">http://rubypond.com/articles/2009/01/16/ruby-manor-videos-available/</guid>
          <link>http://feedproxy.google.com/~r/rubypond/~3/BpYYoZmBDBg/</link>
        <feedburner:origLink>http://rubypond.com/articles/2009/01/16/ruby-manor-videos-available/</feedburner:origLink></item>
    
        <item>
          <title>scRUBYt! Gets Plugins!</title>
          <description>&lt;p&gt;Yes! You heard right! As you may have gathered, it&amp;#8217;s been a rather frantic month of development for scRUBYt! and currently this is the addition I&amp;#8217;m most proud of. One of the most common requests used to be along the lines of &amp;#8220;when do you plan to support xxx format output?&amp;#8221;. Now, scRUBYt! is oblivious to output formats. That&amp;#8217;s right, it natively supports nothing, nada, zilch. But to make it useful, we&amp;#8217;ve written a Hash output plugin which we&amp;#8217;re shipping it with.&lt;/p&gt;

&lt;h2&gt;How to use a scRUBYt! output plugin&lt;/h2&gt;

&lt;p&gt;Firstly, you need to make sure you have the plugin you require installed. At the time of writing there will only be the two I&amp;#8217;ve written, Hash and XmlFile. Then in your ruby file require the plugin. As the current edge release isn&amp;#8217;t yet packaged as a gem you&amp;#8217;ll need to test this with the github checkout and reference the output plugin explicitly:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;require &amp;quot;plugins/scrubyt_xml_file_output/scrubyt_xml_file_output&amp;quot;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you&amp;#8217;ve been following the tutorials talking about &lt;a href="http://rubypond.com/articles/2009/01/15/web-scraping---the-amazon-example-finale/"&gt;web scraping&lt;/a&gt; with the new version for the past few weeks you&amp;#8217;ll have seen how to direct output to a plugin. To request Hash output it&amp;#8217;s:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@extractor = Scrubyt::Extractor.new :output =&amp;gt; :hash do
    fetch &amp;quot;http://www.google.com/search?&amp;amp;q=ruby&amp;quot;
    result &amp;quot;//html/body/div[5]/div/div/h2/a&amp;quot;
end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and for XmlFile it is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@file = File.open(&amp;quot;results.xml&amp;quot;, &amp;quot;w&amp;quot;)
@extractor = Scrubyt::Extractor.new :output =&amp;gt; :xml_file, :file =&amp;gt; @file do
    fetch &amp;quot;http://www.google.com/search?&amp;amp;q=ruby&amp;quot;
    result &amp;quot;//html/body/div[5]/div/div/h2/a&amp;quot;
end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The XmlFile output takes an additional parameter which is the file to stream the results out to.&lt;/p&gt;

&lt;h2&gt;Creating your own plugin&lt;/h2&gt;

&lt;p&gt;That&amp;#8217;s great for those of you that are happy with XML or Hash output, but what about if you want some other custom format? Well it&amp;#8217;s time to create your own. I&amp;#8217;ll show you the actual code that implements the XmlFile output to show you how simple it is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;require 'rexml/document'
require &amp;quot;#{File.dirname(__FILE__)}/inflector&amp;quot;
require &amp;quot;#{File.dirname(__FILE__)}/inflections&amp;quot;

class Scrubyt::Output::XmlFile &amp;lt; Scrubyt::Output::Plugin  
  @subscribers = {}
  on_initialize :setup_file
  before_extractor :open_root_node
  after_extractor :close_root_node
  on_save_result :save_xml


  def setup_file(args = {})
    @file = args[:file]
  end

  def open_root_node(*args)
    @file.write(&amp;quot;&amp;lt;root&amp;gt;&amp;quot;)
  end

  def save_xml(name, results)
    if results.is_a?(::Hash)
      @file.write results.to_xml
    else
      results.each do |result|
        @file.write result.to_xml(name)
      end
    end
  end

  def close_root_node(*args)
    @file.write(&amp;quot;&amp;lt;/root&amp;gt;&amp;quot;)
  end
end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The require lines at the top are only needed for this output format. REXML to construct the XML tags for me, and some inflections I&amp;#8217;ve put together to turn the Hash and Array objects into XML. Now into analysing the class proper.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class Scrubyt::Output::XmlFile &amp;lt; Scrubyt::Output::Plugin  
  @subscribers = {}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;At the moment, you&amp;#8217;ll need to initialize this instance variable to be an empty Hash for the events to get attached correctly. I&amp;#8217;m looking for a way to remove it, stay tuned. But for now you&amp;#8217;ll need to put it in.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;on_initialize :setup_file
before_extractor :open_root_node
after_extractor :close_root_node
on_save_result :save_xml&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here we&amp;#8217;ve got four events to listen for, the concept should be familiar if you&amp;#8217;re coming from Rails. Essentially all we are doing is saying &amp;#8220;When we initialize run the setup_file method. Before the extractor actually starts, run the method called open_root_node. Whenever we get a result to save, call save_xml. And finally, after the extractor run the method called close_root_node.&amp;#8221;&lt;/p&gt;
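
&lt;p&gt;If the event idea is unfamiliar, here is a rough sketch of how class-level declarations like these can be wired up in plain Ruby. It is not scRUBYt!&amp;#8217;s actual implementation; the Hooks module, the on and notify methods and the DemoOutput class are all made up for illustration.&lt;/p&gt;

```ruby
# Hypothetical illustration only, not scRUBYt!'s real plugin base class.
module Hooks
  # Map of event name to the method names subscribed to it.
  def subscribers
    @subscribers ||= Hash.new { |hash, event| hash[event] = [] }
  end

  # Class-level declaration, e.g. `on :before_extractor, :open_root_node`.
  def on(event, method_name)
    subscribers[event].push(method_name)
  end
end

class DemoOutput
  extend Hooks

  on :before_extractor, :open_root_node
  on :save_result, :save_line
  on :after_extractor, :close_root_node

  attr_reader :lines

  def initialize
    @lines = []
  end

  # Call every method subscribed to the event, in declaration order.
  def notify(event, *args)
    self.class.subscribers[event].each { |name| send(name, *args) }
  end

  def open_root_node
    @lines.push("open root")
  end

  def save_line(name, value)
    @lines.push("#{name}: #{value}")
  end

  def close_root_node
    @lines.push("close root")
  end
end

output = DemoOutput.new
output.notify(:before_extractor)
output.notify(:save_result, "title", "Ruby")
output.notify(:after_extractor)
puts output.lines.inspect
```

&lt;p&gt;Creating the Hash lazily on first use is one way a base class could avoid needing an explicit @subscribers line in every plugin, though whether that fits scRUBYt!&amp;#8217;s internals is a guess on my part.&lt;/p&gt;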

&lt;pre&gt;&lt;code&gt;def setup_file(args = {})
  @file = args[:file]
end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is fairly straightforward. If you&amp;#8217;ve got any custom logic that needs to happen when the output plugin is initialized you can place it in here. Any parameter that is passed in to Extractor.new() is passed through for you to access here.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def open_root_node(*args)
  @file.write(&amp;quot;&amp;lt;root&amp;gt;&amp;quot;)
end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now just to open the XML file, and keep it somewhat consistent with the old scRUBYt! XML output we open a &amp;lt;root&amp;gt; node within the file.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def save_xml(name, results)
  if results.is_a?(::Hash)
    @file.write results.to_xml
  else
    results.each do |result|
      @file.write result.to_xml(name)
    end
  end
end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here is where the majority of the magic happens. The save_xml method will be passed the desired name for the result, and a hash of the results. This is essentially the same format you&amp;#8217;d get if you used the Hash output format, except for each individual detail block rather than the entire extractor.&lt;/p&gt;

&lt;p&gt;The reason for the if/else scenario is for when results are not part of a detail block. If you&amp;#8217;re just returning results straight (like the Google example at the top of this post) then &amp;#8220;results&amp;#8221; in this context will be a list/Array of all the matching results rather than a Hash.&lt;/p&gt;

&lt;h2&gt;Passing results back to the extractor&lt;/h2&gt;

&lt;p&gt;Not everyone is going to want to stream results out to a file though, so to deal with this you can make a results method available on the instance of your plugin. As I said earlier, even Hash operates as a plugin now so we can see an example of how this works in the Hash output plugin:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class Scrubyt::Output::Hash &amp;lt; Scrubyt::Output::Plugin
  @subscribers = {}
  on_initialize :setup_results
  on_save_result :store_hash

  def setup_results(args = {})
    @results = []
  end

  def results
    @results
  end

  def store_hash(name, passed_results)
    @results &amp;lt;&amp;lt; passed_results
  end
end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here we set up a @results instance variable in on_initialize, and then on_save_result simply pushes the passed_results into @results. Confused yet? Hopefully the code is clear enough to make sense.&lt;/p&gt;

&lt;p&gt;All that happens then is that back in your extractor definition the call to @extractor.results is passed through to the first output plugin it can find.&lt;/p&gt;

&lt;h2&gt;Naming Conventions and Namespacing&lt;/h2&gt;

&lt;p&gt;The only additional requirement for a plugin to work in scRUBYt! is that it is correctly named and namespaced. As you may have noticed the ones I&amp;#8217;ve provided are called Scrubyt::Output::Hash and Scrubyt::Output::XmlFile, that means they can be targeted using :output =&gt; :hash and :output =&gt; :xml_file respectively. If you wanted to call your output GlennsBadExample it would be namespaced as Scrubyt::Output::GlennsBadExample and you&amp;#8217;d then just need to require the appropriate file and use :output =&gt; :glenns_bad_example&lt;/p&gt;
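
&lt;p&gt;As a rough illustration of that naming convention (the real lookup inside scRUBYt! may well differ), turning the :output symbol into a class name could look like the sketch below. The output_class_name helper is hypothetical.&lt;/p&gt;

```ruby
# Hypothetical helper: map an :output symbol to the expected class name.
def output_class_name(symbol)
  # Split on underscores and capitalise each part: xml_file becomes XmlFile.
  camel = symbol.to_s.split("_").map { |part| part.capitalize }.join
  "Scrubyt::Output::#{camel}"
end

puts output_class_name(:hash)
puts output_class_name(:xml_file)
puts output_class_name(:glenns_bad_example)
```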

&lt;h2&gt;Oh the possibilities! So what&amp;#8217;s next?&lt;/h2&gt;

&lt;p&gt;We&amp;#8217;re only just starting to see the possibilities that this will offer our extractors. It opens up the possibility of pushing results not only to a different format, but possibly a completely different service. It&amp;#8217;s now trivial to create an output format that streams results directly into backgroundRB, a nanite worker, or a web service for further processing and data warehousing. By the time you read this, you&amp;#8217;ll also be able to pass in an array of outputs like :output =&gt; [:hash, :xml_file] and have both plugins generate the appropriate format(s). For the scraper I&amp;#8217;m currently working on where I have two different companies wanting the same data, this could be just the ticket for interfacing directly to their API as I scrape.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;d love to hear what ideas people might have for this, or how you think it could be improved. We&amp;#8217;re really hopeful that this is the kind of thing that makes developing and extending scRUBYt! really easy for those with more complicated needs.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/rubypond/~4/gMWBj9onwkg" height="1" width="1"/&gt;</description>
          <pubDate>Fri, 16 Jan 2009 01:18:45 GMT</pubDate>
          <guid isPermaLink="false">http://rubypond.com/articles/2009/01/16/scrubyt-gets-plugins/</guid>
          <link>http://feedproxy.google.com/~r/rubypond/~3/gMWBj9onwkg/</link>
        <feedburner:origLink>http://rubypond.com/articles/2009/01/16/scrubyt-gets-plugins/</feedburner:origLink></item>
    
        <item>
          <title>Web Scraping - The Amazon Example Finale</title>
          <description>&lt;p&gt;So yesterday I showed you a fairly contrived example of &lt;a href="http://rubypond.com/articles/2009/01/13/more-web-scrapers-with-the-upcoming-scrubyt/"&gt;how to build a web scraper&lt;/a&gt; using scRUBYt! to get data from Amazon (they&amp;#8217;ve got an API that would be much easier and more robust if you need access to this info, but that&amp;#8217;s not the point at the moment). But if you look at the results, they&amp;#8217;re not the greatest. There is too much noise in some fields, and we probably want to share the data with another system so a ruby based Hash object isn&amp;#8217;t going to work.&lt;/p&gt;

&lt;h2&gt;Removing empty results with scRUBYt!&lt;/h2&gt;

&lt;p&gt;For various reasons, sometimes you may not get all the data you want back for every record. It&amp;#8217;s usually related to your result definition being too restrictive, or a change in format on a specific page. Maybe the price information is in a different DIV if it is on sale. In any event, you need to make a decision on what to do. In the new release of scRUBYt! there are three immediate options that come to mind, but we may well build more in if required. First, and what we did yesterday in the example, is to do nothing. You&amp;#8217;ll get the nil/empty result returned back to you to handle as you see fit. Second, you can simply drop any fields that are nil (I&amp;#8217;ve just displayed the first few results):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@extractor = Scrubyt::Extractor.new do
  fetch &amp;quot;http://www.amazon.com/&amp;quot;
  select_option &amp;quot;url&amp;quot;, &amp;quot;Books&amp;quot;
  fill_textfield &amp;quot;field-keywords&amp;quot;, &amp;quot;ruby&amp;quot;
  submit
  book_detail &amp;quot;//td[@class='dataColumn']/table/tr/td/a&amp;quot; do
    title &amp;quot;//h1[@class='parseasinTitle']&amp;quot;
    price &amp;quot;//b[@class='priceLarge']&amp;quot;, :remove_blank =&amp;gt; true
    saving &amp;quot;//td[@class='price']&amp;quot;
    isbn &amp;quot;//li[text()*='ISBN-10:']&amp;quot;
  end
end
puts @extractor.results.inspect
=&amp;gt; [{:book=&amp;gt;[{:title=&amp;gt;&amp;quot;The Ruby Programming Language [ILLUSTRATED]  (Paperback)&amp;quot;}, 
             {:price=&amp;gt;&amp;quot;$26.39&amp;quot;}, 
             {:saving=&amp;gt;&amp;quot;$13.60\n      (34%)\n    &amp;quot;}, 
             {:isbn=&amp;gt;&amp;quot;ISBN-10: 0596516177&amp;quot;}]},
    {:book=&amp;gt;[{:title=&amp;gt;&amp;quot;The Ruby Programming Language (Paperback)&amp;quot;}, 
             {:saving=&amp;gt;nil}, 
             {:isbn=&amp;gt;&amp;quot;ISBN-10: 020171096X&amp;quot;}]}, 
    {:book=&amp;gt;[{:title=&amp;gt;&amp;quot;Beginning Ruby: From Novice to Professional (Beginning from Novice to Professional) (Paperback)&amp;quot;}, 
             {:price=&amp;gt;&amp;quot;$26.39&amp;quot;}, 
             {:saving=&amp;gt;&amp;quot;$13.60\n      (34%)\n    &amp;quot;}, 
             {:isbn=&amp;gt;&amp;quot;ISBN-10: 1590597664&amp;quot;}]},
    ...]&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We&amp;#8217;ve set :remove_blank to true on the price field, and as a result you&amp;#8217;ll see that the 2nd result contains no price element. Alternatively, you could drop any detail block (in this example, any single book) which is missing the field:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@extractor = Scrubyt::Extractor.new do
  fetch &amp;quot;http://www.amazon.com/&amp;quot;
  select_option &amp;quot;url&amp;quot;, &amp;quot;Books&amp;quot;
  fill_textfield &amp;quot;field-keywords&amp;quot;, &amp;quot;ruby&amp;quot;
  submit
  book_detail &amp;quot;//td[@class='dataColumn']/table/tr/td/a&amp;quot; do
    title &amp;quot;//h1[@class='parseasinTitle']&amp;quot;, :required =&amp;gt; true
    price &amp;quot;//b[@class='priceLarge']&amp;quot;, :required =&amp;gt; true
    saving &amp;quot;//td[@class='price']&amp;quot;
    isbn &amp;quot;//li[text()*='ISBN-10:']&amp;quot;
  end
end
puts @extractor.results.inspect
=&amp;gt; [{:book=&amp;gt;[{:title=&amp;gt;&amp;quot;The Ruby Programming Language [ILLUSTRATED]  (Paperback)&amp;quot;},
             {:price=&amp;gt;&amp;quot;$26.39&amp;quot;},
             {:saving=&amp;gt;&amp;quot;$13.60\n      (34%)\n    &amp;quot;},
             {:isbn=&amp;gt;&amp;quot;ISBN-10: 0596516177&amp;quot;}]},
    {:book=&amp;gt;[{:title=&amp;gt;&amp;quot;Beginning Ruby: From Novice to Professional (Beginning from Novice to Professional) (Paperback)&amp;quot;},
             {:price=&amp;gt;&amp;quot;$26.39&amp;quot;},
             {:saving=&amp;gt;&amp;quot;$13.60\n      (34%)\n    &amp;quot;},
             {:isbn=&amp;gt;&amp;quot;ISBN-10: 1590597664&amp;quot;}]},
    {:book=&amp;gt;[{:title=&amp;gt;&amp;quot;Programming Ruby: The Pragmatic Programmers' Guide, Second Edition [ILLUSTRATED]  (Paperback)&amp;quot;},
             {:price=&amp;gt;&amp;quot;$29.67&amp;quot;},
             {:saving=&amp;gt;&amp;quot;$15.28\n      (34%)\n    &amp;quot;},
             {:isbn=&amp;gt;&amp;quot;ISBN-10: 0974514055&amp;quot;}]},
    ...]&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This time &amp;#8220;The Ruby Programming Language (Paperback)&amp;#8221; isn&amp;#8217;t included in the results at all. For our purposes though, I&amp;#8217;ve decided that I only want to know about books that I have &lt;em&gt;all&lt;/em&gt; the details for. Instead of setting :required on every result, I can specify it on the book_detail definition which will give the same output:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@extractor = Scrubyt::Extractor.new do
  fetch &amp;quot;http://www.amazon.com/&amp;quot;
  select_option &amp;quot;url&amp;quot;, &amp;quot;Books&amp;quot;
  fill_textfield &amp;quot;field-keywords&amp;quot;, &amp;quot;ruby&amp;quot;
  submit
  book_detail &amp;quot;//td[@class='dataColumn']/table/tr/td/a&amp;quot;, :required =&amp;gt; :all do
    title &amp;quot;//h1[@class='parseasinTitle']&amp;quot;
    price &amp;quot;//b[@class='priceLarge']&amp;quot;
    saving &amp;quot;//td[@class='price']&amp;quot;
    isbn &amp;quot;//li[text()*='ISBN-10:']&amp;quot;
  end
end&lt;/code&gt;&lt;/pre&gt;
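
&lt;p&gt;If it helps to picture the difference between :remove_blank and :required, this is roughly the same filtering done by hand on a results array in plain Ruby. It is only a sketch of the behaviour described above, using made-up sample data, and says nothing about how scRUBYt! implements either option internally.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Made-up sample data in the same shape as the results above.
results = [
  { book: [{ title: 'Book A' }, { price: '$26.39' }, { saving: '$13.60' }] },
  { book: [{ title: 'Book B' }, { price: nil }, { saving: nil }] }
]

# Roughly what :remove_blank on price does: drop the price field when it is nil.
without_blank_price = results.map do |record|
  { book: record[:book].reject { |field| field == { price: nil } } }
end

# Roughly what :required set to :all does: drop any book with a nil field.
complete_only = results.reject do |record|
  record[:book].any? { |field| field.values.any? { |value| value.nil? } }
end

puts without_blank_price.inspect
puts complete_only.inspect&lt;/code&gt;&lt;/pre&gt;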

&lt;h2&gt;Sanitizing scRUBYt! output&lt;/h2&gt;

&lt;p&gt;The output still isn&amp;#8217;t quite what we want. We&amp;#8217;ve dumped the empty results, but we&amp;#8217;ve still got that ugly &amp;#8220;ISBN-10: &amp;#8221; in front of the ISBN. We could clean it up later, but it&amp;#8217;s creating additional work for ourselves. And if we want this thing to scale (I&amp;#8217;ve got scrapers which scrape thousands of pages off a single site) trying to keep all that data hanging around in memory isn&amp;#8217;t going to work. So let&amp;#8217;s do as much as possible within the scRUBYt! definition as we&amp;#8217;re collecting the data: &lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@extractor = Scrubyt::Extractor.new do
  fetch &amp;quot;http://www.amazon.com/&amp;quot;
  select_option &amp;quot;url&amp;quot;, &amp;quot;Books&amp;quot;
  fill_textfield &amp;quot;field-keywords&amp;quot;, &amp;quot;ruby&amp;quot;
  submit
  book_detail &amp;quot;//td[@class='dataColumn']/table/tr/td/a&amp;quot;, :required =&amp;gt; :all do
    title &amp;quot;//h1[@class='parseasinTitle']&amp;quot;
    price &amp;quot;//b[@class='priceLarge']&amp;quot;
    saving &amp;quot;//td[@class='price']&amp;quot;
    isbn &amp;quot;//li[text()*='ISBN-10:']&amp;quot;, :script =&amp;gt; Proc.new{|isbn| isbn.gsub(&amp;quot;ISBN-10: &amp;quot;, &amp;quot;&amp;quot;)}
  end
end
puts @extractor.results.inspect
=&amp;gt; [{:book=&amp;gt;[{:title=&amp;gt;&amp;quot;The Ruby Programming Language [ILLUSTRATED]  (Paperback)&amp;quot;},
             {:price=&amp;gt;&amp;quot;$26.39&amp;quot;},
             {:saving=&amp;gt;&amp;quot;$13.60\n      (34%)\n    &amp;quot;},
             {:isbn=&amp;gt;&amp;quot;0596516177&amp;quot;}]},
    ... ]&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And now you&amp;#8217;ll see that we&amp;#8217;re getting a much cleaner ISBN result. Just create a Proc; the result will be passed into it and you can then do with it as you see fit. An if statement to check it contains something you expect, a regexp, the possibilities are endless. You can also combine this with the other options like :required and :remove_blank. So let&amp;#8217;s really jazz this thing up. We&amp;#8217;ll clean up the saving, and pull in the description and that&amp;#8217;s all the data we need:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@extractor = Scrubyt::Extractor.new do
  fetch &amp;quot;http://www.amazon.com/&amp;quot;
  select_option &amp;quot;url&amp;quot;, &amp;quot;Books&amp;quot;
  fill_textfield &amp;quot;field-keywords&amp;quot;, &amp;quot;ruby&amp;quot;
  submit
  book_detail &amp;quot;//td[@class='dataColumn']/table/tr/td/a&amp;quot;, :required =&amp;gt; :all  do
    title &amp;quot;//h1[@class='parseasinTitle']&amp;quot;
    price &amp;quot;//b[@class='priceLarge']&amp;quot;
    saving &amp;quot;//td[@class='price']&amp;quot;, :script =&amp;gt; Proc.new{|saving| saving.match(/(\$[\d\.]*)/)[1]}
    isbn &amp;quot;//li[text()*='ISBN-10:']&amp;quot;, :script =&amp;gt; Proc.new{|isbn| isbn.gsub(&amp;quot;ISBN-10: &amp;quot;,&amp;quot;&amp;quot;)}
    description &amp;quot;//div[@id='productDescription']//div[@class='content']&amp;quot;
  end
end
puts @extractor.results.inspect
=&amp;gt; [{:book=&amp;gt;[{:title=&amp;gt;&amp;quot;Beginning Ruby: From Novice to Professional (Beginning from Novice to Professional) (Paperback)&amp;quot;},
             {:price=&amp;gt;&amp;quot;$26.39&amp;quot;},
             {:saving=&amp;gt;&amp;quot;$13.60&amp;quot;},
             {:isbn=&amp;gt;&amp;quot;1590597664&amp;quot;}, 
             {:description=&amp;gt;&amp;quot;Product Description\n  Ruby is perhaps best known as the engine powering the...&amp;quot;}]}, 
    {:book=&amp;gt;[{:title=&amp;gt;&amp;quot;Programming Ruby: The Pragmatic Programmers' Guide, Second Edition [ILLUSTRATED]  (Paperback)&amp;quot;}, 
             {:price=&amp;gt;&amp;quot;$29.67&amp;quot;}, 
             {:saving=&amp;gt;&amp;quot;$15.28&amp;quot;}, 
             {:isbn=&amp;gt;&amp;quot;0974514055&amp;quot;}, 
             {:description=&amp;gt;&amp;quot;Product Description\n  Ruby is an increasingly popular, fully object-oriented dynamic...&amp;quot;}]},
    ...]&lt;/code&gt;&lt;/pre&gt;
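
&lt;p&gt;Since the :script option takes an ordinary Proc, you can try your clean-up logic in irb before wiring it into an extractor. The sketch below uses made-up Proc names and assumes the Proc might be handed nil or a string in an unexpected format. In that case it returns nil rather than raising, which saving.match(...)[1] would do when nothing matches.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical clean-up Procs, written to cope with nil or odd input.
clean_isbn = Proc.new do |isbn|
  # String#[] with a regexp and a group index returns nil when nothing matches.
  isbn.to_s[/ISBN-10:\s*(\w+)/, 1]
end

clean_saving = Proc.new do |saving|
  # Keep just the dollar amount, dropping the percentage and whitespace.
  saving.to_s[/\$[\d.]+/]
end

puts clean_isbn.call('ISBN-10: 0596516177')
puts clean_saving.call('$13.60 (34%)')
puts clean_saving.call(nil).inspect&lt;/code&gt;&lt;/pre&gt;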

&lt;h2&gt;Outputting results to XML&lt;/h2&gt;

&lt;p&gt;Inevitably there comes a time where you want to consume this data in something other than your ruby application. At that point, passing around a Hash is probably not the best idea. Alternatively, you might have a scraper that has to scrape hundreds to thousands of pages. Storing all the results in a Hash as you go will bring your machine to its knees. So here comes one of the largest changes to the way the new release of scRUBYt! works.&lt;/p&gt;

&lt;p&gt;Previously, you always had results returned as a Hash and/or XML depending on your need. Everything was held in memory until you destroyed your extractor. Now, the standard XML option is to stream the results out to a file as they are processed and remove them from memory. There is no way to retrieve the results as XML within your program, they have to be streamed out to a file (and really, why would you want XML within your app when you can have native ruby structures instead?). So to save our scraper above out to an XML file you just pass a new output format and an instance of a File to the extractor:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@file = File.new(&amp;quot;amazon_results.xml&amp;quot;, &amp;quot;w&amp;quot;)
@extractor = Scrubyt::Extractor.new :output =&amp;gt; :xml_file, :file =&amp;gt; @file do
  fetch &amp;quot;http://www.amazon.com/&amp;quot;
  select_option &amp;quot;url&amp;quot;, &amp;quot;Books&amp;quot;
  fill_textfield &amp;quot;field-keywords&amp;quot;, &amp;quot;ruby&amp;quot;
  submit
  book_detail &amp;quot;//td[@class='dataColumn']/table/tr/td/a&amp;quot;, :required =&amp;gt; :all  do
    title &amp;quot;//h1[@class='parseasinTitle']&amp;quot;
    price &amp;quot;//b[@class='priceLarge']&amp;quot;
    saving &amp;quot;//td[@class='price']&amp;quot;, :script =&amp;gt; Proc.new{|saving| saving.match(/(\$[\d\.]*)/)[1]}
    isbn &amp;quot;//li[text()*='ISBN-10:']&amp;quot;, :script =&amp;gt; Proc.new{|isbn| isbn.gsub(&amp;quot;ISBN-10: &amp;quot;,&amp;quot;&amp;quot;)}
    description &amp;quot;//div[@id='productDescription']//div[@class='content']&amp;quot;
  end
end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now a call to @extractor.results at the end of the scrape will return no results. I hope that wasn&amp;#8217;t too much and it&amp;#8217;s given you a good view into how to create your very own web scraper. If you have any questions, head on over to the &lt;a href="http://www.scrubyt.org/"&gt;scRUBYt! forums&lt;/a&gt; or post them in the comments.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/rubypond/~4/eGbApyAbWo8" height="1" width="1"/&gt;</description>
          <pubDate>Thu, 15 Jan 2009 20:52:12 GMT</pubDate>
          <guid isPermaLink="false">http://rubypond.com/articles/2009/01/15/web-scraping---the-amazon-example-finale/</guid>
          <link>http://feedproxy.google.com/~r/rubypond/~3/eGbApyAbWo8/</link>
        <feedburner:origLink>http://rubypond.com/articles/2009/01/15/web-scraping---the-amazon-example-finale/</feedburner:origLink></item>
    
        <item>
          <title>More web scrapers with the upcoming scRUBYt!</title>
          <description>&lt;p&gt;In the previous articles I gave a brief glimpse at the upcoming scRUBYt! release. We learned how to do some basic &lt;a href="http://rubypond.com/articles/2008/12/09/web-spidering-and-data-extraction-with-scrubyt/"&gt;html scraping&lt;/a&gt;, and then followed it up by &lt;a href="http://rubypond.com/articles/2008/12/17/web-spider-creation-with-scrubyt---part-ii/"&gt;scraping multiple pages&lt;/a&gt;. As promised, in this installment I&amp;#8217;ll go over how to get more detailed logging output to help you debug your scraper while in development and how to submit and navigate forms.&lt;/p&gt;

&lt;h2&gt;Logging Scraper Output&lt;/h2&gt;

&lt;p&gt;There&amp;#8217;s been quite a fundamental change to the way scRUBYt! works internally of late. It doesn&amp;#8217;t manifest itself visually in the way you interface to it, but it does mean logging the output is now much easier and cleaner. Using our most basic example from the first tutorial, you just need to pass the :log_level into the extractor:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@extractor = Scrubyt::Extractor.new(:log_level =&amp;gt; :verbose) do
  fetch &amp;quot;http://www.google.com/ncr&amp;quot;
end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;By default, the output is directed to stdout so you&amp;#8217;d see the following on your screen:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;start
fetch: http://www.google.com/ncr
end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you&amp;#8217;re after a more complex example, here is the scraper definition from the second part of the tutorial series and the corresponding log output:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@extractor = Scrubyt::Extractor.new(:log_level =&amp;gt; :verbose) do
  fetch &amp;quot;http://www.google.com/ncr&amp;quot;
  fill_textfield &amp;quot;q&amp;quot;, &amp;quot;ruby&amp;quot;
  submit
  page_detail &amp;quot;//h3[@class='r']/a&amp;quot; do
    title &amp;quot;//title&amp;quot;
    summary &amp;quot;//p&amp;quot;, :script =&amp;gt; Proc.new{|result| result if result.match(%r{(\w+\W+){25}})}
  end
  next_page &amp;quot;//a[text()*='Next']&amp;quot;, :limit =&amp;gt; 2
end&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;start
fetch: http://www.google.com/ncr
textfield: 'q' = 'ruby'
with options ''
submit
next detail: 'page' = 'http://ruby-lang.org/'
with args: ''
next detail: 'page' = 'http://en.wikipedia.org/wiki/Ruby_(programming_language)'
with args: ''

etc...

next page: /search?hl=en&amp;amp;ie=UTF-8&amp;amp;q=ruby&amp;amp;start=10&amp;amp;sa=N
fetch: http://www.google.com/search?hl=en&amp;amp;ie=UTF-8&amp;amp;q=ruby&amp;amp;start=10&amp;amp;sa=N
next detail: 'page' = 'http://www.rubycentral.com/book/'
with args: ''

etc...

end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;At present the valid :log_level values are :none, :critical, :error, :warn, :info, :debug, and :verbose (in increasing order of noise). If you want to direct the log output to something other than stdout the only way at the moment is to override the Scrubyt::Logger#log method. I&amp;#8217;m looking at ways to make it easier to substitute in a file based or other logging approach.&lt;/p&gt;
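
&lt;p&gt;One way to read &amp;#8220;in increasing order of noise&amp;#8221; is that each level also lets through everything quieter than it. That reading is an assumption rather than documented behaviour, and the LOG_LEVELS constant and logs? helper below are made up to illustrate it in plain Ruby.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Levels as listed above, quietest first. Illustration only.
LOG_LEVELS = [:none, :critical, :error, :warn, :info, :debug, :verbose]

# Assumed rule: a message is shown when its level is no noisier than
# the configured :log_level. Nothing is ever shown at :none.
def logs?(configured_level, message_level)
  LOG_LEVELS.index(message_level).between?(1, LOG_LEVELS.index(configured_level))
end

puts logs?(:warn, :error)
puts logs?(:warn, :debug)
puts logs?(:none, :critical)&lt;/code&gt;&lt;/pre&gt;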

&lt;h2&gt;Making Your Scraper Navigate Forms&lt;/h2&gt;

&lt;p&gt;So now that we know how to log the output, let&amp;#8217;s do something more useful with our scraper. We can&amp;#8217;t really take our Google example from previous posts any further given how simple their interface is, so let us move over to Amazon. Say I wanted to grab a list of the books on ruby that are for sale. Sure, I could probably get this information via an Amazon API&amp;#8230; but that&amp;#8217;s not really the point now is it ;)&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@extractor = Scrubyt::Extractor.new(:log_level =&amp;gt; :debug) do
  fetch &amp;quot;http://www.amazon.com/&amp;quot;
  select_option &amp;quot;url&amp;quot;, &amp;quot;Books&amp;quot;
  fill_textfield &amp;quot;field-keywords&amp;quot;, &amp;quot;ruby&amp;quot;
  submit
  book_detail &amp;quot;//td[@class='dataColumn']/table/tr/td/a&amp;quot; do
    title &amp;quot;//h1[@class='parseasinTitle']&amp;quot;
    price &amp;quot;//b[@class='priceLarge']&amp;quot;
    saving &amp;quot;//td[@class='price']&amp;quot;
    isbn &amp;quot;//li[text()*='ISBN-10:']&amp;quot;
  end
end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So let&amp;#8217;s run through what we&amp;#8217;ve got here, hopefully some of it looks familiar from the previous examples we&amp;#8217;ve gone through. First we fetch the page to start with, then in the select field named &amp;#8220;url&amp;#8221; we choose the option that says &amp;#8220;Books&amp;#8221;, fill the &amp;#8220;field-keywords&amp;#8221; text field with &amp;#8220;ruby&amp;#8221;, and then submit the form. scRUBYt! will keep track of the last form you input any data to, so if there are multiple forms on the page then you just need to target the appropriate input fields. From there, the submit action will work out what it needs to do.&lt;/p&gt;

&lt;p&gt;Next is to define a detail block, so we point out the XPath to the heading/link for each book on the page and say we want to navigate to that page and extract the title, price, saving, etc. I&amp;#8217;ve been a little cheeky with the isbn definition, saying just find me any LI tag that contains the string &amp;#8220;ISBN-10:&amp;#8221;. If we were to look at the results generated you&amp;#8217;d see:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;puts @extractor.results.inspect
[{:book=&amp;gt;[{:title=&amp;gt;&amp;quot;The Ruby Programming Language [ILLUSTRATED]  (Paperback)&amp;quot;}, {:price=&amp;gt;&amp;quot;$26.39&amp;quot;}, 
          {:saving=&amp;gt;&amp;quot;$13.60\n      (34%)\n    &amp;quot;}, 
          {:isbn=&amp;gt;&amp;quot;ISBN-10: 0596516177&amp;quot;}]}, 
 {:book=&amp;gt;[{:title=&amp;gt;&amp;quot;The Ruby Programming Language (Paperback)&amp;quot;}, 
          {:price=&amp;gt;nil}, 
          {:saving=&amp;gt;nil}, 
          {:isbn=&amp;gt;&amp;quot;ISBN-10: 020171096X&amp;quot;}]}, 
 {:book=&amp;gt;[{:title=&amp;gt;&amp;quot;Beginning Ruby: From Novice to Professional (Beginning from Novice to Professional) (Paperback)&amp;quot;}, 
          {:price=&amp;gt;&amp;quot;$26.39&amp;quot;}, 
          {:saving=&amp;gt;&amp;quot;$13.60\n      (34%)\n    &amp;quot;}, 
          {:isbn=&amp;gt;&amp;quot;ISBN-10: 1590597664&amp;quot;}]}, 
 {:book=&amp;gt;[{:title=&amp;gt;&amp;quot;Beginning Ruby: From Novice to Professional (Kindle Edition)&amp;quot;}, 
          {:price=&amp;gt;nil}, 
          {:saving=&amp;gt;nil}, 
          {:isbn=&amp;gt;nil}]}, 
 {:book=&amp;gt;[{:title=&amp;gt;&amp;quot;Programming Ruby: The Pragmatic Programmers' Guide, Second Edition [ILLUSTRATED]  (Paperback)&amp;quot;}, 
          {:price=&amp;gt;&amp;quot;$29.67&amp;quot;}, 
          {:saving=&amp;gt;&amp;quot;$15.28\n      (34%)\n    &amp;quot;}, 
          {:isbn=&amp;gt;&amp;quot;ISBN-10: 0974514055&amp;quot;}]}, 
 {:book=&amp;gt;[{:title=&amp;gt;&amp;quot;The Ruby Way, Second Edition: Solutions and Techniques in Ruby Programming (2nd Edition) (Addison-Wesley Professional Ruby Series) (Paperback)&amp;quot;}, 
          {:price=&amp;gt;&amp;quot;$29.69&amp;quot;}, 
          {:saving=&amp;gt;&amp;quot;$15.30\n      (34%)\n    &amp;quot;}, 
          {:isbn=&amp;gt;&amp;quot;ISBN-10: 0672328844&amp;quot;}]}, 
 {:book=&amp;gt;[{:title=&amp;gt;&amp;quot;Ruby Way, The: Solutions and Techniques in Ruby Programming (Kindle Edition)&amp;quot;}, 
          {:price=&amp;gt;nil}, 
          {:saving=&amp;gt;nil}, 
          {:isbn=&amp;gt;nil}]}, 
 {:book=&amp;gt;[{:title=&amp;gt;&amp;quot;Learning Ruby [ILLUSTRATED]  (Paperback)&amp;quot;}, 
          {:price=&amp;gt;&amp;quot;$23.09&amp;quot;}, 
          {:saving=&amp;gt;&amp;quot;$11.90\n      (34%)\n    &amp;quot;}, 
          {:isbn=&amp;gt;&amp;quot;ISBN-10: 0596529864&amp;quot;}]}
]&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And we have a reasonable snapshot of the data. You&amp;#8217;ll see there though, that it&amp;#8217;s not perfect. Firstly, we are missing information for some results. We&amp;#8217;ve got spaces and carriage returns in the saving data, and we probably don&amp;#8217;t need the &amp;#8220;ISBN-10:&amp;#8221; string at the front of the ISBN result. And what if we wanted to link to the actual result so someone could actually buy the book on Amazon? &lt;/p&gt;

&lt;p&gt;All good questions, and all easily solvable. I&amp;#8217;ll follow it up with a post in the next day or so and highlight some of the new ways of specifying constraints on your data in scRUBYt!. And special thanks have to go to Homeflow who have been funding at least a day of time for Peter and me each week lately, hence the increased level of development in scRUBYt!. It&amp;#8217;s nice when you have clients who want to actively give back.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/rubypond/~4/F3mx_D_x6Yw" height="1" width="1"/&gt;</description>
          <pubDate>Tue, 13 Jan 2009 01:20:13 GMT</pubDate>
          <guid isPermaLink="false">http://rubypond.com/articles/2009/01/13/more-web-scrapers-with-the-upcoming-scrubyt/</guid>
          <link>http://feedproxy.google.com/~r/rubypond/~3/F3mx_D_x6Yw/</link>
        <feedburner:origLink>http://rubypond.com/articles/2009/01/13/more-web-scrapers-with-the-upcoming-scrubyt/</feedburner:origLink></item>
    
        <item>
          <title>Rails Hosting - Updated</title>
          <description>&lt;p&gt;This is just a quick note so that those subscribing to my feed or checking the site for updates are aware. I&amp;#8217;ve made a series of minor changes to the &lt;a href="http://rubypond.com/articles/2008/12/04/ruby-on-rails-hosting-round-up/"&gt;rails hosting&lt;/a&gt; post from last month based on the feedback provided by you guys. If you&amp;#8217;ve got any additional experiences with different rails hosts that you&amp;#8217;d like to share, then please send them through.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/rubypond/~4/_q5r2WVjcrg" height="1" width="1"/&gt;</description>
          <pubDate>Mon, 05 Jan 2009 23:10:53 GMT</pubDate>
          <guid isPermaLink="false">http://rubypond.com/articles/2009/01/05/rails-hosting---updated/</guid>
          <link>http://feedproxy.google.com/~r/rubypond/~3/_q5r2WVjcrg/</link>
        <feedburner:origLink>http://rubypond.com/articles/2009/01/05/rails-hosting---updated/</feedburner:origLink></item>
    
        <item>
          <title>Ruby on Rails Tutorials</title>
          <description>&lt;p&gt;There have been a few occasions over the previous months when various people I meet have asked me to point them in the right direction to get started with ruby on rails. I remember when I started, it was a pretty easy question to answer&amp;#8230; everyone just got pointed in the general direction of why_&amp;#8217;s poignant guide, the pickaxe book, and agile web development with rails. But things have come a long way since then. There is not only a veritable cornucopia of options now out there, but for somebody who isn&amp;#8217;t already engrossed in the community it&amp;#8217;s almost impossible to tell which ones are still relevant and useful and which ones are so dated that they&amp;#8217;ll make your attempts at learning markedly more difficult.&lt;/p&gt;

&lt;p&gt;So with that in mind, I&amp;#8217;ve attempted to compile a list of resources which are (at least for the most part) still relevant at the time of posting. Some will have some minor inconsistencies either in implementation or current convention but I&amp;#8217;ve made a genuine effort to review and ensure they are not way out of line. If you find any errors on that front, please let me know in the comments.&lt;/p&gt;

&lt;p&gt;So without any further ado, here are the ruby on rails tutorials I&amp;#8217;ve found broken into various categories (a couple are listed in more than one):&lt;/p&gt;

&lt;h2&gt;Ruby on Rails Tutorials&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="#getting-started"&gt;Getting Started and Example Applications&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#models-migrations"&gt;Models/Migrations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#controllers"&gt;Controllers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#views"&gt;Views&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#routing"&gt;Routing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mail"&gt;Mail&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#testing"&gt;Testing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#performance-admin-debug"&gt;Performance, Administration, and Debugging&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#misc"&gt;Miscellaneous&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#special-mention"&gt;Extra Special Mention&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a name="getting-started"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Getting Started and Example Applications&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://guides.rails.info/getting_started_with_rails.html" rel="nofollow"&gt;Getting started with Rails&lt;/a&gt;
A great guide on building your first rails app and understanding the framework.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.bitwisemag.com/2/Ruby-On-Rails-Tutorial-The-Basics" rel="nofollow"&gt;Ruby On Rails Tutorial : The Basics&lt;/a&gt;
A two-part tutorial to getting up and running on rails by Huw Collingbourne.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://ruby.about.com/od/rubyonrails/ss/railsblog1.htm" rel="nofollow"&gt;Building a blog&lt;/a&gt;
About.com has this guide for Rails 2.1 which runs you through the classic &amp;#8220;15-minute blog&amp;#8221; project.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://docs.activestate.com/komodo/4.4/tutorial/railstut.html" rel="nofollow"&gt;Starting a rails project using Komodo&lt;/a&gt;
For those of you using Komodo this is a great guide to show you how to develop within Komodo and run the tests within your IDE.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.webmonkey.com/tutorial/Ruby_on_Rails_for_Beginners" rel="nofollow"&gt;Building a bookmark manager&lt;/a&gt;
A quick introductory guide to rails and an example app to manage your bookmarks.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://godbit.com/article/beginners-guide-to-rails-part-1" rel="nofollow"&gt;Building a contact manager&lt;/a&gt;
A ruby on rails tutorial on how to get acquainted with the framework by building a contact list manager.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.oracle.com/technology/pub/articles/haefel-oracle-ruby.html" rel="nofollow"&gt;A simple product catalog&lt;/a&gt;
A tutorial from Oracle on how to create a comics catalog on top of an Oracle back-end.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://media.rubyonrails.org/video/rails_blog_2.mov" rel="nofollow"&gt;The official 15-min blog screencast&lt;/a&gt;
A rather high paced run-through of how to get a blog up and running in 15mins.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.sitepoint.com/article/learn-ruby-on-rails/" rel="nofollow"&gt;The ultimate guide to ruby &amp;amp; rails&lt;/a&gt;
Quite a detailed and useful tutorial from Patrick Lenz which not only goes through rails but is a pretty decent intro to the ruby you should get your head around first too.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://betterexplained.com/articles/starting-ruby-on-rails-what-i-wish-i-knew/" rel="nofollow"&gt;What I wish I knew&lt;/a&gt;
A bit of a retrospective from someone after they got their first few rails applications under their belt.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.railsforum.com/viewtopic.php?id=1033" rel="nofollow"&gt;Sending Instant Messages in Rails&lt;/a&gt;
How to use XMPP and BackgroundRB to send messages to GMail and other Jabber based clients.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.railsforum.com/viewtopic.php?id=1041" rel="nofollow"&gt;Programming Conventions&lt;/a&gt;
A discussion on what exactly DRY means, why you should stick to conventions, and when you should look at optimization.&lt;/p&gt;

&lt;p&gt;&lt;a name="models-migrations"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Models/Migrations&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2008/06/18/bulk-insertion-of-data-with-activerecord/"&gt;Bulk Insert into ActiveRecord with Rails&lt;/a&gt;
How to quickly insert multiple records at once with ActiveRecord.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.oracle.com/technology/pub/articles/haefel-oracle-ruby.html" rel="nofollow"&gt;Using Oracle&lt;/a&gt;
A tutorial from Oracle on how to create a comics catalog on top of an Oracle back-end.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2007/11/20/reject-junk-email-addresses-with-email-veracity/"&gt;Rails email validation&lt;/a&gt;
Check that a provided email address really is legit, by doing a DNS/MX lookup on the domain.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.railsforum.com/viewtopic.php?id=1011" rel="nofollow"&gt;Rail Migrations for beginners&lt;/a&gt;
This tutorial is a little dated, but thankfully the examples all still work in Rails 2.x. What I particularly like, is that they actually cover how to add an index to a column which most don&amp;#8217;t cover and thus many don&amp;#8217;t bother to do.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://guides.rails.info/migrations.html" rel="nofollow"&gt;Rails Guide to Migrations&lt;/a&gt;
A very detailed guide on what you can and can&amp;#8217;t do in migrations. Even I learnt a few new tricks from reading through it.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://guides.rails.info/activerecord_validations_callbacks.html" rel="nofollow"&gt;ActiveRecord Validations and Callbacks&lt;/a&gt;
All the various validation options you have, and what method callbacks you can hook into when modifying an instance of a model.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://guides.rails.info/association_basics.html" rel="nofollow"&gt;ActiveRecord Associations&lt;/a&gt;
How to link up the associations between the models in your applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://guides.rails.info/finders.html" rel="nofollow"&gt;Finding records in your database&lt;/a&gt;
A tutorial on all the various ways you can find either a single record or a collection of records using ActiveRecord. Both the built-in finders and passing in your own custom conditions are covered.&lt;/p&gt;

&lt;p&gt;&lt;a name="controllers"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Controllers&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2007/11/27/simplifying-rails-controllers-with-make_resourceful/"&gt;Simplify Ruby on Rails controllers with make_resourceful&lt;/a&gt;
Take advantage of the DRY approach by quickly implementing common controller actions in just a few lines of code.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.devarticles.com/c/a/Ruby-on-Rails/Controlling-Information-Access-with-the-Rails-Action-Controller/" rel="nofollow"&gt;Controlling Information Access&lt;/a&gt;
A quick explanation of why and when you should make methods in your controllers private or protected.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2008/07/11/useful-flash-messages-in-rails/"&gt;Displaying Useful Flash Messages&lt;/a&gt;
How to provide more useful feedback on errors or actions to your users within your application.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2008/07/16/sexy-forms-in-rails/"&gt;Consistent and sexy rails forms&lt;/a&gt;
How to both DRY up your code and ensure your users get a consistent experience by using a Rails Custom Form Builder.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://guides.rails.info/actioncontroller_basics.html" rel="nofollow"&gt;ActionController Basics&lt;/a&gt;
Covers filters, disabling the session, cookies, authentication, and protecting against forgery. It&amp;#8217;s called basics, but it covers quite a lot.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2007/12/13/new-error-handling-in-rails-2-0/"&gt;Catching rails error messages&lt;/a&gt;
An easy example of how to use the 2.x approach to catching errors in your rails applications.&lt;/p&gt;

&lt;p&gt;&lt;a name="views"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Views&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://guides.rails.info/form_helpers.html" rel="nofollow"&gt;Introduction to form helpers&lt;/a&gt;
A quick tutorial on when you should use the object/model based form helpers in rails, and when you should use the standard ones.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://guides.rails.info/layouts_and_rendering.html" rel="nofollow"&gt;Rendering layouts and views&lt;/a&gt;
How to render a response to the web browser within your rails app. Also goes into providing only a HEAD response, nesting layouts, using partials, redirects, and using yield.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2008/01/03/unobtrusive-javascript-with-lowpro/"&gt;Implementing unobtrusive javascript in rails&lt;/a&gt;
How to stop using the built-in javascript helpers and instead use lowpro to keep your views and code clean.&lt;/p&gt;

&lt;p&gt;&lt;a name="routing"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Routing&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://adam.blog.heroku.com/past/2007/12/20/nested_resources_in_rails_2/" rel="nofollow"&gt;Nested Resources in Rails 2.x&lt;/a&gt;
A guide to bring you up to speed on the changes between the way resources work in 1.x and the 2.x releases.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://guides.rails.info/routing_outside_in.html" rel="nofollow"&gt;Rails Routing from the Outside-In&lt;/a&gt;
A very comprehensive guide on how to configure and use the routing options in rails.&lt;/p&gt;

&lt;p&gt;&lt;a name="mail"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Mail&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2007/11/16/trap-rails-from-sending-mail-via-smtp/"&gt;Stop Rails from sending email&lt;/a&gt;
A useful way to check that mail is getting sent, without actually sending it.&lt;/p&gt;

&lt;p&gt;&lt;a name="testing"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Testing&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2007/12/10/testing-xml-output-in-rspec-and-rails/"&gt;Test Rails XML Output with rspec&lt;/a&gt;
How to easily test XML output using rspec.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2008/01/07/testing-web-services-with-rspec/"&gt;Testing webservices using rspec&lt;/a&gt;
A guide on testing the old style SOAP/XML webservices with rspec.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2008/03/31/using-rspec-have_tag/"&gt;Using rspec have_tag&lt;/a&gt;
How to test anything you need to in a view with rspec have_tag.&lt;/p&gt;

&lt;p&gt;&lt;a name="performance-admin-debug"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Performance, Administration, and Debugging&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2008/02/06/improving-rails-app-and-mongrel-performance-with-thin/"&gt;Improve Rails Performance with Thin&lt;/a&gt;
A tutorial on replacing mongrel or webrick with the event driven thin web server.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2008/04/08/letting-nginx-automatically-detect-new-rails-apps/"&gt;Let nginx automatically become aware of your Rails apps&lt;/a&gt;
How to configure nginx to automatically become aware of any new apps you deploy and start serving them without any additional configuration.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2008/04/07/rails-god-config/"&gt;Monitoring your Rails app with god&lt;/a&gt;
How to use the god monitoring tool to manage, start, stop, and restart your rails app.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://effectif.com/2008/4/8/monitoring-backgroundrb-with-god"&gt;Monitoring BackgroundRB with god&lt;/a&gt;
Another tutorial on using god, this one for monitoring BackgroundRB.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2008/06/18/bulk-insertion-of-data-with-activerecord/"&gt;Improving Bulk Data Insertion with ActiveRecord&lt;/a&gt;
Reduce the number of round-trips to the database by batch inserting records into ActiveRecord.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2008/07/17/the-complete-guide-to-setting-up-starling/"&gt;Background Processing with Starling&lt;/a&gt;
How to off-load long-running tasks to a queue so that you don&amp;#8217;t needlessly tie up your web server processes.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.railsenvy.com/2007/2/28/rails-caching-tutorial" rel="nofollow"&gt;Ruby on Rails Caching Tutorial&lt;/a&gt;
Another two-part tutorial. This one, by Gregg from RailsEnvy, runs through the various ruby on rails caching options and how to implement them to improve the performance of your site.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.railsenvy.com/2007/6/11/ruby-on-rails-rake-tutorial" rel="nofollow"&gt;Ruby on Rails Rake Tutorial&lt;/a&gt;
Gregg from RailsEnvy again, this time running through how to make the most of rake. If you&amp;#8217;re not already writing your own custom rake tasks, you should definitely take a look, because there are probably a handful of things you could (and should) be doing in it.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.sitepoint.com/article/debug-rails-app-ruby-debug/" rel="nofollow"&gt;Debug your app with ruby-debug&lt;/a&gt;
I don&amp;#8217;t know how I ever diagnosed problems in development before I learnt how to use ruby-debug. If you&amp;#8217;ve got applications even just starting to come together, you really need to get your head around this tutorial.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.rubyinside.com/19-rails-tricks-most-rails-coders-dont-know-131.html" rel="nofollow"&gt;19 Rails Tricks Most Rails Coders Don&amp;#8217;t Know&lt;/a&gt;
Not a tutorial per se, but a good collection of handy tips on how to benchmark performance, test various aspects of your app, improve its performance, and more.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.railsforum.com/viewtopic.php?id=872" rel="nofollow"&gt;Reading a stack trace&lt;/a&gt;
You&amp;#8217;re inevitably going to see a few when you&amp;#8217;re developing, so you&amp;#8217;d better learn how to make sense of them.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://guides.rails.info/security.html" rel="nofollow"&gt;Ruby on Rails Security Guide&lt;/a&gt;
Storing and protecting the session, user management, protecting against injection, and more.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://guides.rails.info/debugging_rails_applications.html" rel="nofollow"&gt;Debugging Rails Applications&lt;/a&gt;
How to use views to help you debug your app, detect memory leaks, and use the logger.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://guides.rails.info/benchmarking_and_profiling.html" rel="nofollow"&gt;Benchmarking and Profiling Rails&lt;/a&gt;
The why and how of benchmarking and profiling. Still a work in progress.&lt;/p&gt;

&lt;p&gt;&lt;a name="misc"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Miscellaneous&lt;/h2&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2008/12/08/custom-starting-templates-for-rails-apps/"&gt;Define your own base rails framework with templates&lt;/a&gt;
Stop manually importing and installing the same plugins in all of your applications. Define a template in just a few lines and let rails know what it should start with.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2008/06/26/make-your-own-im-bot-in-ruby-and-interface-it-with-your-rails-app/"&gt;Integrate an IM Bot into your Rails App &lt;/a&gt;
How to make an Instant Messaging BOT and use it in your rails application.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://guides.rails.info/creating_plugins.html" rel="nofollow"&gt;Creating Plugins&lt;/a&gt;
How to create a plugin for use in your rails app.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rubypond.com/articles/2008/02/08/improved-timezones-in-rails-2-1/"&gt;Working with Rails Timezones&lt;/a&gt;
How to use the 2.x enhancements to easily switch between timezones or display dates/times in a user&amp;#8217;s local zone.&lt;/p&gt;

&lt;p&gt;&lt;a name="special-mention"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Super extra special mention&lt;/h2&gt;

&lt;p&gt;Ryan Bates is simply tireless in providing useful screencasts showing various bits of functionality and tips within rails. If I were to list every one I thought would be useful here, the list would probably be three times longer than it already is. Instead, make sure you check out &lt;a href="http://railscasts.com/" rel="nofollow"&gt;Railscasts&lt;/a&gt; and scan through the archive.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/rubypond/~4/ycwf5Y38Ghk" height="1" width="1"/&gt;</description>
          <pubDate>Mon, 05 Jan 2009 02:34:09 GMT</pubDate>
          <guid isPermaLink="false">http://rubypond.com/articles/2009/01/05/ruby-on-rails-tutorials/</guid>
          <link>http://feedproxy.google.com/~r/rubypond/~3/ycwf5Y38Ghk/</link>
        <feedburner:origLink>http://rubypond.com/articles/2009/01/05/ruby-on-rails-tutorials/</feedburner:origLink></item>
    
        <item>
          <title>Web Spider Creation with scRUBYt! - Part II </title>
          <description>&lt;p&gt;Continuing on from the previous post, &lt;a href="/articles/2008/12/09/web-spidering-and-data-extraction-with-scrubyt/"&gt;Web Spidering and Data Extraction with scRUBYt!&lt;/a&gt;, this article will help you delve a little deeper with the &lt;a href="http://www.scrubyt.org"&gt;scRUBYt! scraping framework&lt;/a&gt; both in terms of your understanding of how to use it&amp;#8230; and actually delving deeper in your crawl to more pages.&lt;/p&gt;

&lt;h2&gt;A quick recap on the last web spider&lt;/h2&gt;

&lt;p&gt;Last week we got as far as going to Google, searching for the word &amp;#8220;ruby&amp;#8221;, and listing the link text of each result. Here&amp;#8217;s the code we ended up with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@extractor = Skimr::Extractor.new(:agent =&amp;gt; :standard) do
  fetch &amp;quot;http://www.google.com/ncr&amp;quot;
  fill_textfield &amp;quot;q&amp;quot;, &amp;quot;ruby&amp;quot;
  submit
  page_title &amp;quot;//h3[@class='r']&amp;quot;
end&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;How to scrape deeper pages&lt;/h2&gt;

&lt;p&gt;This is a fairly contrived example, though, and it&amp;#8217;s not going to save you much time over a quick manual copy-and-paste from the results page. But what if you wanted more than a list of links, such as a summary or some additional detail for each one? Let&amp;#8217;s actually visit each website, see what kind of content it has, and grab something useful from it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@extractor = Skimr::Extractor.new(:agent =&amp;gt; :standard) do
  fetch &amp;quot;http://www.google.com/ncr&amp;quot;
  fill_textfield &amp;quot;q&amp;quot;, &amp;quot;ruby&amp;quot;
  submit
  page_detail &amp;quot;//h3[@class='r']/a&amp;quot; do
    page_body &amp;quot;//body&amp;quot;
  end
end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you give a result a name ending in _detail, supply the XPath to a link element, and pass in a block, scRUBYt! will follow the link before trying to process the block. That means we can use this technique to follow each of the results Google gives us, and for each one return all of the text contained on the page between the &amp;lt;body&amp;gt; tags.&lt;/p&gt;

&lt;p&gt;Now you could take the hash returned and do some post-processing on it to get something meaningful from the extracted text. But for the sake of example, I&amp;#8217;ll make the big assumption that every page has at least a &amp;lt;title&amp;gt; tag and a &amp;lt;p&amp;gt; tag with more than just a few words in it.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@extractor = Skimr::Extractor.new(:agent =&amp;gt; :standard) do
  fetch &amp;quot;http://www.google.com/ncr&amp;quot;
  fill_textfield &amp;quot;q&amp;quot;, &amp;quot;ruby&amp;quot;
  submit
  page_detail &amp;quot;//h3[@class='r']/a&amp;quot; do
    title &amp;quot;//title&amp;quot;
    summary &amp;quot;//p&amp;quot;, :script =&amp;gt; Proc.new{|result| result if result.match(%r{(\w+\W+){25}})}
  end
end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As you may have noticed, you can pass a proc in as a parameter to your result definition. The output of the XPath match is passed in to the proc, and whatever the proc returns becomes the final result for that definition. Make sense? If not, what I&amp;#8217;ve done above is look for all &amp;lt;p&amp;gt; tags on the page and pass them in to my proc. The proc then runs a regexp against the text to check that the &amp;lt;p&amp;gt; contains at least 25 words. If it does, all of the &amp;lt;p&amp;gt; content is returned; otherwise nil is returned.&lt;/p&gt;
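
&lt;p&gt;If you want to convince yourself of what that filter does before wiring it into an extractor, you can try the proc on its own in plain Ruby. This is only a sketch: the constant names and the two sample paragraphs are made up for illustration, and nothing here touches scRUBYt! itself.&lt;/p&gt;

```ruby
# The same regexp as in the :script option: 25 runs of word characters,
# each followed by at least one non-word character (space, punctuation).
AT_LEAST_25_WORDS = %r{(\w+\W+){25}}

# The same proc body as in the extractor: return the text if it is long
# enough, otherwise fall through and return nil.
KEEP_IF_LONG = Proc.new { |result| result if result.match(AT_LEAST_25_WORDS) }

# Hypothetical sample paragraphs, not taken from any real page.
SHORT_PARAGRAPH = "Ruby is a friendly language."
LONG_PARAGRAPH  = (["word"] * 30).join(" ") + "."

puts KEEP_IF_LONG.call(SHORT_PARAGRAPH).inspect # too short, so nil
puts KEEP_IF_LONG.call(LONG_PARAGRAPH).inspect  # long enough, so the full text
```

&lt;p&gt;Note that the regexp needs a non-word character after each word, so a paragraph of exactly 25 words with no trailing punctuation would not pass.&lt;/p&gt;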

&lt;h2&gt;What if the results are paginated?&lt;/h2&gt;

&lt;p&gt;You could potentially create a highly recursive extractor to handle this, but it&amp;#8217;s such a common case that we&amp;#8217;ve included a method to do it for you:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@extractor = Skimr::Extractor.new(:agent =&amp;gt; :standard) do
  fetch &amp;quot;http://www.google.com/ncr&amp;quot;
  fill_textfield &amp;quot;q&amp;quot;, &amp;quot;ruby&amp;quot;
  submit
  page_detail &amp;quot;//h3[@class='r']/a&amp;quot; do
    title &amp;quot;//title&amp;quot;
    summary &amp;quot;//p&amp;quot;, :script =&amp;gt; Proc.new{|result| result if result.match(%r{(\w+\W+){25}})}
  end
  next_page &amp;quot;//a[text()*='Next']&amp;quot;, :limit =&amp;gt; 2
end&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I&amp;#8217;ve used the XPath text() function here to highlight its usefulness. I use it quite a lot as a shortcut to get things working and test, and it&amp;#8217;s been a lifesaver in many scenarios where the markup is inconsistent or you want to keep the scraper definition generic. And thankfully, it works for this scenario. However, be wary of using it as is in production as it may have some unexpected side effects. If one of the results that came back had the word &amp;#8220;Next&amp;#8221; in the title, then scRUBYt! would diligently follow that link and you&amp;#8217;d end up on the wrong page.&lt;/p&gt;
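
&lt;p&gt;To make that side effect concrete, here is a small plain-Ruby sketch of the difference between matching any link that contains &amp;#8220;Next&amp;#8221; and matching only a link whose whole text is &amp;#8220;Next&amp;#8221;. The link texts are hypothetical and no scRUBYt! or XPath engine is involved. In standard XPath the two checks would roughly correspond to contains(text(), 'Next') and normalize-space(text())='Next', though I&amp;#8217;m assuming rather than asserting that your scraper accepts those expressions.&lt;/p&gt;

```ruby
# Hypothetical link texts from a results page; only the last one is the pager link.
LINK_TEXTS = ["What to learn Next in Ruby", "Ruby Programming Language", " Next "]

# Loose match: any link whose text contains "Next" anywhere.
LOOSE_MATCHES = LINK_TEXTS.select { |text| text.include?("Next") }

# Strict match: only links whose whole text, ignoring surrounding whitespace, is "Next".
STRICT_MATCHES = LINK_TEXTS.select { |text| text.strip == "Next" }

puts "loose:  #{LOOSE_MATCHES.inspect}"
puts "strict: #{STRICT_MATCHES.inspect}"
```

&lt;p&gt;The loose match picks up the search result as well as the pager link, which is exactly how you end up on the wrong page.&lt;/p&gt;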

&lt;h2&gt;What&amp;#8217;s next?&lt;/h2&gt;

&lt;p&gt;In the next installment I&amp;#8217;ll briefly cover how to handle logging of the scrape to help you diagnose any problems, and how to handle more complex form completion.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/rubypond/~4/g1CBnhSn18o" height="1" width="1"/&gt;</description>
          <pubDate>Wed, 17 Dec 2008 10:00:35 GMT</pubDate>
          <guid isPermaLink="false">http://rubypond.com/articles/2008/12/17/web-spider-creation-with-scrubyt---part-ii/</guid>
          <link>http://feedproxy.google.com/~r/rubypond/~3/g1CBnhSn18o/</link>
        <feedburner:origLink>http://rubypond.com/articles/2008/12/17/web-spider-creation-with-scrubyt---part-ii/</feedburner:origLink></item>
    
    
  </channel>
</rss>
