<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0"><channel><title>ht4</title><description>thoughts of a web developer and a sysadmin</description><link>http://blog.ht4.ca</link><lastBuildDate>Thu, 23 Feb 2012 14:18:37 GMT</lastBuildDate><generator>typhoon</generator><ttl>60</ttl><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/ht4" /><feedburner:info uri="ht4" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item><title>Your first HipChat bot</title><description>&lt;p&gt;A few weeks ago I stumbled upon a &lt;a href="https://hipchat.com"&gt;promising new startup&lt;/a&gt; while searching for a hosted group chat solution.&lt;/p&gt;

&lt;p&gt;Besides being feature-rich and easy to use, &lt;a href="https://hipchat.com"&gt;HipChat&lt;/a&gt; allows you to write your own client by exposing an XMPP entry-point.&lt;/p&gt;

&lt;p&gt;If you have never heard of &lt;a href="http://en.wikipedia.org/wiki/Extensible_Messaging_and_Presence_Protocol"&gt;XMPP&lt;/a&gt;, it is a common chat protocol used by services such as &lt;a href="http://code.google.com/apis/talk/open_communications.html#which"&gt;Google Talk&lt;/a&gt; and &lt;a href="http://developers.facebook.com/docs/chat/"&gt;Facebook Chat&lt;/a&gt;.&lt;/p&gt;

&lt;!-- more --&gt;

&lt;h1&gt;wobot&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://github.com/cjoudrey/wobot/blob/master/README.md"&gt;wobot&lt;/a&gt; is my attempt at abstracting the XMPP protocol in &lt;a href="http://nodejs.org"&gt;Node.js&lt;/a&gt; and offering a simple API to write your own bot.&lt;/p&gt;

&lt;p&gt;&lt;img src="/2011/05/11/your-first-hipchat-bot/screen1.png"/ &gt;&lt;/p&gt;

&lt;h1&gt;Installing wobot&lt;/h1&gt;

&lt;p&gt;To get started you will need a few things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node.js (&lt;a href="http://nodejs.org/#download"&gt;http://nodejs.org/#download&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;npm (&lt;a href="http://npmjs.org"&gt;http://npmjs.org&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additionally, wobot depends on the &lt;a href="https://github.com/astro/node-xmpp"&gt;node-xmpp&lt;/a&gt; module, which requires the following (the &lt;code&gt;apt-get&lt;/code&gt; commands are for Debian/Ubuntu):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;libexpat1-dev: &lt;code&gt;apt-get install libexpat1-dev&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;libicu-dev: &lt;code&gt;apt-get install libicu-dev&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you have installed the build dependencies, install wobot in your working directory using npm:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;mkdir ~/mybot
cd ~/mybot
npm install wobot
&lt;/code&gt;&lt;/pre&gt;

&lt;h1&gt;Configuration&lt;/h1&gt;

&lt;p&gt;Once wobot is installed, you will need to add a new member to your HipChat group and log in as that member.&lt;/p&gt;

&lt;p&gt;Under "My Account" &gt; "XMPP/Jabber Info" you will find the following:&lt;/p&gt;

&lt;p&gt;&lt;img src="/2011/05/11/your-first-hipchat-bot/screen2.png" /&gt;&lt;/p&gt;

&lt;p&gt;Instantiate the &lt;code&gt;wobot.Bot&lt;/code&gt; class as follows:&lt;/p&gt;

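&lt;pre&gt;&lt;code&gt;var wobot = require('wobot');
// Credentials from "My Account" &gt; "XMPP/Jabber Info"; the part after "/" is the XMPP resource.
var bot = new wobot.Bot({
  jid: '1234_12345@chat.hipchat.com/bot',
  password: 'yourpassword',
  name: 'XMPP Bot' // name the bot uses in HipChat
});
bot.connect(); // opens the XMPP connection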
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;After running the above script, you should see your bot in HipChat's Lobby.&lt;/p&gt;

&lt;h1&gt;Auto-joining on connect&lt;/h1&gt;

&lt;p&gt;Once the bot is connected, you will likely want it to join one or more channels. This can be done as follows:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;bot.on('connect', function() {
  this.join('1234_bot_testing@conf.hipchat.com');
});
&lt;/code&gt;&lt;/pre&gt;

&lt;h1&gt;Reacting to a message&lt;/h1&gt;

&lt;p&gt;Whenever a message is sent to a channel your bot is in, the &lt;code&gt;message&lt;/code&gt; event will be emitted.&lt;/p&gt;

&lt;p&gt;Here is a simple example of an echo bot:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;bot.on('message', function(channel, from, msg) {
  // Ignore the bot's own messages, otherwise it would keep echoing itself.
  if (from === this.name) return false;
  this.message(channel, '@' + from + ' you just said: ' + msg);
});
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;img src="/2011/05/11/your-first-hipchat-bot/screen3.png" /&gt;&lt;/p&gt;

&lt;h1&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;The goal of this article was to briefly introduce wobot.&lt;/p&gt;

&lt;p&gt;There are many &lt;a href="https://github.com/cjoudrey/wobot/tree/master/examples"&gt;more examples&lt;/a&gt; and &lt;a href="https://github.com/cjoudrey/wobot/blob/master/README.md"&gt;detailed documentation&lt;/a&gt; on GitHub.&lt;/p&gt;

&lt;p&gt;-Christian Joudrey&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ht4/~4/NEiBAQXmj0o" height="1" width="1"/&gt;</description><pubDate>Wed, 11 May 2011 00:00:00 GMT</pubDate><guid isPermaLink="false">http://blog.ht4.ca/2011/05/11/your-first-hipchat-bot</guid><link>http://feedproxy.google.com/~r/ht4/~3/NEiBAQXmj0o/your-first-hipchat-bot</link><feedburner:origLink>http://blog.ht4.ca/2011/05/11/your-first-hipchat-bot</feedburner:origLink></item><item><title>Grooveshark: Behind the scenes</title><description>&lt;p&gt;Since its &lt;a href="http://blog.grooveshark.com/post/2083426261/on-being-broke-n"&gt;revamp in 2010&lt;/a&gt;, most of the magic in Grooveshark happens on the client-side. I will not be talking about these technologies as it was already covered by Jerod Santo's post: &lt;a href="http://blog.jerodsanto.net/2010/12/the-tech-behind-the-new-grooveshark/"&gt;The Tech Behind the New Grooveshark&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;What I'd like to address is the data exchanged between the client and the server when you hit the play and search buttons.&lt;/p&gt;

&lt;!-- more --&gt;

&lt;p&gt;I do not work for Grooveshark. Some of this information may be incorrect.&lt;/p&gt;

&lt;p&gt;When you first load &lt;a href="http://listen.grooveshark.com"&gt;Grooveshark&lt;/a&gt; a session id is set by PHP:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Set-Cookie: PHPSESSID=1c3b5c7d906f60cab128b1a2b4c30201; expires=Wed, 06-Apr-2011 23:11:20 GMT; path=/; domain=.grooveshark.com
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This session id is required throughout the API and is used to compute a communication token.&lt;/p&gt;

&lt;h1&gt;The SWF proxy&lt;/h1&gt;

&lt;p&gt;Most of the communication between the client and the server is handled by the Flash object &lt;code&gt;JSQueue.swf&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For those of you who are interested, Grooveshark logs all traffic between the SWF proxy and the server to the browser's console:&lt;/p&gt;

&lt;p&gt;&lt;img src="http://i.imgur.com/uT7WM.png" /&gt;&lt;/p&gt;

&lt;p&gt;Most calls are POST requests to &lt;code&gt;https://listen.grooveshark.com/more.php?&lt;method name&gt;&lt;/code&gt; with a JSON formatted string as the data:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "parameters": &lt;params&gt;,
  "method": &lt;method name&gt;,
  "header": {
    "uuid": &lt;uuid&gt;,
    "clientRevision": &lt;"20101222.5" | "20101222"&gt;,
    "country: &lt;localization hash&gt;,
    "privacy": 0,
    "session": &lt;session id&gt;,
    "client": &lt;"jsqueue" | "htmlshark"&gt;,
    "token": &lt;hashed communication token&gt;
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Depending on the method called, &lt;code&gt;client&lt;/code&gt; and &lt;code&gt;clientRevision&lt;/code&gt; will differ. For instance, searching for songs must be done with &lt;code&gt;htmlshark&lt;/code&gt;, while downloading songs must use &lt;code&gt;jsqueue&lt;/code&gt;.&lt;/p&gt;
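&lt;p&gt;Putting the request format and the client rule above together, a request could be assembled as follows. This is a sketch based only on the traffic described in this post: &lt;code&gt;buildRequest&lt;/code&gt; and the &lt;code&gt;session&lt;/code&gt; object are hypothetical names, the header values are placeholders you would have to capture yourself, and the hashed token is deliberately left as an input.&lt;/p&gt;

```javascript
// Which "client" value each method needs, based on the observations above.
// Defaulting any other method to "htmlshark" is an assumption.
const CLIENT_FOR_METHOD = new Map([
  ['getSearchResultsEx', 'htmlshark'],
  ['getStreamKeyFromSongIDEx', 'jsqueue']
]);

// Builds the URL and JSON body for a more.php call. The session fields
// (uuid, clientRevision, country, sessionId, token) are placeholders.
function buildRequest(method, parameters, session) {
  return {
    url: 'https://listen.grooveshark.com/more.php?' + method,
    body: JSON.stringify({
      parameters: parameters,
      method: method,
      header: {
        uuid: session.uuid,
        clientRevision: session.clientRevision,
        country: session.country,
        privacy: 0,
        session: session.sessionId,
        client: CLIENT_FOR_METHOD.get(method) || 'htmlshark',
        token: session.token
      }
    })
  };
}
```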

&lt;p&gt;The API responds with a JSON formatted string with the following structure:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "header": {
    "session": &lt;session id&gt;,
    "serviceVersion": "20100903",
    "prefetchEnabled": true
  },
  "result": &lt;result&gt;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;h1&gt;The communication token&lt;/h1&gt;

&lt;p&gt;When the app first loads, an API call is made to &lt;code&gt;getCommunicationToken&lt;/code&gt; with the following parameters:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{ "secretKey": "&lt;secret key&gt;" }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The secret key is actually just an MD5 hash of your session ID:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var n=hex_md5(GS.service.sessionID);req=new a("getCommunicationToken",{secretKey:n},t,w,{},true)}
&lt;/code&gt;&lt;/pre&gt;
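&lt;p&gt;In Node.js the same secret key could be computed with the built-in &lt;code&gt;crypto&lt;/code&gt; module. This assumes &lt;code&gt;hex_md5&lt;/code&gt; returns the usual lowercase hex digest of the session ID string; &lt;code&gt;secretKeyFor&lt;/code&gt; and &lt;code&gt;communicationTokenParams&lt;/code&gt; are illustrative names.&lt;/p&gt;

```javascript
const crypto = require('crypto');

// Equivalent of hex_md5(GS.service.sessionID), assuming a lowercase hex digest.
function secretKeyFor(sessionId) {
  return crypto.createHash('md5').update(sessionId, 'utf8').digest('hex');
}

// Parameters for the getCommunicationToken call.
function communicationTokenParams(sessionId) {
  return { secretKey: secretKeyFor(sessionId) };
}
```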

&lt;p&gt;The result of this call is a 13-character hex string, which they call the communication token.&lt;/p&gt;

&lt;h1&gt;Hashing the communication token&lt;/h1&gt;

&lt;p&gt;The communication token is then used by the Flash proxy to compute another token, which the server uses to validate your request.&lt;/p&gt;

&lt;p&gt;This final token is composed of two parts: the &lt;code&gt;randomizer&lt;/code&gt; and the &lt;code&gt;hash&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;randomizer&lt;/code&gt; consists of 6 random hex characters that are regenerated by the Flash proxy before each API request.&lt;/p&gt;
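&lt;p&gt;Generating a value of that shape is straightforward. A minimal sketch, assuming lowercase hex is acceptable (the case and random source the Flash proxy actually uses are unknown); &lt;code&gt;makeRandomizer&lt;/code&gt; is an illustrative name:&lt;/p&gt;

```javascript
const crypto = require('crypto');

// 3 random bytes encode to exactly 6 hex characters.
function makeRandomizer() {
  return crypto.randomBytes(3).toString('hex');
}
```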

&lt;p&gt;The &lt;code&gt;hash&lt;/code&gt; is computed as follows:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&lt;Removed as requested by Grooveshark&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;h1&gt;Searching for songs&lt;/h1&gt;

&lt;p&gt;&lt;img src="http://i.imgur.com/Frh8U.png" /&gt;&lt;/p&gt;

&lt;p&gt;When you search for a song using the main search box, multiple requests are made to the API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A search request for &lt;code&gt;Songs&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;A search request for &lt;code&gt;Playlists&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;A search request for &lt;code&gt;Users&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;A search request for &lt;code&gt;Artists&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;A search request for &lt;code&gt;Albums&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The data returned by the API is used to build the various elements of the results view:&lt;/p&gt;

&lt;p&gt;&lt;img src="http://i.imgur.com/94W6L.png" /&gt; &lt;img src="http://i.imgur.com/JZ6di.png" /&gt; &lt;img src="http://i.imgur.com/gykaB.png" /&gt;&lt;/p&gt;

&lt;p&gt;What is interesting about these requests is that pagination is done on the client side: the API appears to always return at most 200 results.&lt;/p&gt;
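&lt;p&gt;Client-side pagination over such a result list boils down to slicing an array. A sketch of the idea, not Grooveshark's code; &lt;code&gt;paginate&lt;/code&gt; and &lt;code&gt;pageCount&lt;/code&gt; are illustrative names, pages are zero-based, and the page size is arbitrary:&lt;/p&gt;

```javascript
// Number of pages needed to show all results at the given page size.
function pageCount(results, pageSize) {
  return Math.ceil(results.length / pageSize);
}

// Returns the items of one zero-based page; past the end it returns [].
function paginate(results, pageSize, pageIndex) {
  const start = pageIndex * pageSize;
  return results.slice(start, start + pageSize);
}
```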

&lt;p&gt;All the queries are sent to the same method &lt;code&gt;getSearchResultsEx&lt;/code&gt; with the following parameters:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
    "query": &lt;search query&gt;,
    "type": &lt;"Songs" | "Playlists" | "Users" | "Artists" | "Albums"&gt;,
    "guts": 0,
    "ppOverride": false
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;From my tests, it appears that this method requires the &lt;code&gt;client&lt;/code&gt; value of the JSON header to be set to &lt;code&gt;htmlshark&lt;/code&gt;.&lt;/p&gt;
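&lt;p&gt;Since the same method is called once per result type, the five parameter objects can be generated from a list. A sketch; &lt;code&gt;searchRequests&lt;/code&gt; is an illustrative name, and the &lt;code&gt;guts&lt;/code&gt; and &lt;code&gt;ppOverride&lt;/code&gt; values are simply copied from the captured requests:&lt;/p&gt;

```javascript
// The five result types requested for a single search.
const SEARCH_TYPES = ['Songs', 'Playlists', 'Users', 'Artists', 'Albums'];

// One getSearchResultsEx parameter object per type; each request would be sent
// with the header's client set to "htmlshark".
function searchRequests(query) {
  return SEARCH_TYPES.map((type) => ({
    query: query,
    type: type,
    guts: 0,
    ppOverride: false
  }));
}
```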

&lt;p&gt;Grooveshark has a lot of interesting data. Searching with &lt;code&gt;"type": "Songs"&lt;/code&gt; returns many useful fields, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;AlbumID&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ArtistID&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SongName&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;AlbumName&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ArtistName&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CoverArtFilename&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;EstimateDuration&lt;/code&gt; (in seconds)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;IsLowBitrateAvailable&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Popularity&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ArtistPopularity&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SongPlays&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ArtistPlays&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SphinxWeight&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Score&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Rank&lt;/code&gt; (used to sort the results, I think)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sample song entry:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  SongID: '7507736',
  AlbumID: '1123311',
  ArtistID: '490',
  GenreID: '16',
  Name: 'Cry for Help',
  SongName: 'Cry for Help',
  AlbumName: 'Greatest Hits',
  ArtistName: 'Rick Astley',
  Year: '',
  TrackNum: '16',
  CoverArtFilename: '1123311.jpg',
  TSAdded: '1209773471',
  AvgRating: 0,
  AvgDuration: 246,
  EstimateDuration: 244,
  Flags: 0,
  IsLowBitrateAvailable: '1',
  IsSponsored: '0',
  IsVerified: '1',
  SongVerified: '1',
  AlbumVerified: 1,
  ArtistVerified: 1,
  Popularity: 1100400047,
  AlbumPopularity: 0,
  ArtistPopularity: 1100400084,
  SongPlays: 87,
  ArtistPlays: 2882,
  SphinxWeight: 350700,
  Score: 41414.48648202,
  Rank: 0.99154757716654
}
&lt;/code&gt;&lt;/pre&gt;
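&lt;p&gt;Note that the entry mixes types: IDs and several flags are strings (&lt;code&gt;'1'&lt;/code&gt;) while others are numbers (&lt;code&gt;1&lt;/code&gt;). A few small helpers for working with such entries, as a sketch; the function names are illustrative, and sorting by &lt;code&gt;Rank&lt;/code&gt; with the highest first is only an assumption about how the results are ordered:&lt;/p&gt;

```javascript
// Flags arrive as '1' or 1, so compare numerically.
function isFlagSet(value) {
  return Number(value) === 1;
}

// EstimateDuration is in seconds; format it as m:ss for display.
function formatDuration(seconds) {
  const total = Math.floor(Number(seconds));
  const minutes = Math.floor(total / 60);
  const rest = total % 60;
  return minutes + ':' + String(rest).padStart(2, '0');
}

// Returns a copy of the songs sorted by Rank, highest first (assumed ordering).
function sortByRank(songs) {
  return songs.slice().sort((a, b) => b.Rank - a.Rank);
}
```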

&lt;h1&gt;Downloading cover art&lt;/h1&gt;

&lt;p&gt;Cover art is available in 3 sizes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small: 100x100&lt;/li&gt;
&lt;li&gt;Medium: 170x170&lt;/li&gt;
&lt;li&gt;Large: 240x240&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They can be accessed through: &lt;code&gt;http://beta.grooveshark.com/static/amazonart/&lt;first letter of size&gt;&lt;cover art filename&gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For example, &lt;a href="http://beta.grooveshark.com/static/amazonart/s1123311.jpg"&gt;http://beta.grooveshark.com/static/amazonart/s1123311.jpg&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img src="http://i.imgur.com/9uqEe.jpg" /&gt;&lt;/p&gt;

&lt;h1&gt;Obtaining a song&lt;/h1&gt;

&lt;p&gt;About a year ago I was looking into how Grooveshark worked. At the time, when you played a song, the API would return a link to an MP3 file hosted on an &lt;a href="http://www.akamai.com/"&gt;Akamai&lt;/a&gt; server. The link would have an expiry time of about 7 minutes.&lt;/p&gt;

&lt;p&gt;Since then, they have rolled out a new way of serving the media. They seem to be running a bunch of &lt;a href="http://www.lighttpd.net/"&gt;lighttpd&lt;/a&gt; servers that stream the media.&lt;/p&gt;

&lt;p&gt;The first step is to obtain the stream server's IP and a stream key for a given song.&lt;/p&gt;

&lt;p&gt;This is done by calling &lt;code&gt;getStreamKeyFromSongIDEx&lt;/code&gt; with the following parameters:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "prefetch": &lt;true | false&gt;,
  "mobile": &lt;true | false&gt;,
  "country": &lt;same localization hash as header&gt;,
  "songID": &lt;song id&gt;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The API response contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;uSecs&lt;/code&gt;: The length of the song in microseconds.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ip&lt;/code&gt;: Hostname of the stream server.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;streamKey&lt;/code&gt;: Key used to obtain the song from the stream server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The next step is quite straightforward.&lt;/p&gt;

&lt;p&gt;Perform a &lt;code&gt;POST&lt;/code&gt; request to &lt;code&gt;http://&lt;ip&gt;/stream.php&lt;/code&gt; with the following URL-encoded parameter, and the MP3 will be returned:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;streamKey=&lt;stream key&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;h1&gt;Interesting facts about the media files&lt;/h1&gt;

&lt;p&gt;When &lt;code&gt;mobile=false&lt;/code&gt;, songs are sent as they were uploaded, including the original ID3 tags and the original bitrate.&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;mobile=true&lt;/code&gt;, songs are converted to mono at 64 kbps, and all ID3 tags are stripped.&lt;/p&gt;

&lt;h1&gt;Closing remarks&lt;/h1&gt;

&lt;p&gt;If you want to use Grooveshark's API, you might be better off (legally) using their HTTP API: &lt;a href="http://apishark.com"&gt;ApiShark.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Although this explains how to download songs from Grooveshark, it's not an excuse to do so. Grooveshark's &lt;a href="http://www.grooveshark.com/terms"&gt;terms of service&lt;/a&gt; explicitly disallow storing any data from their service, among other things. Read them carefully.&lt;/p&gt;

&lt;p&gt;If someone from Grooveshark happens to stumble across this, let me first say that I am a great fan of your service. If you are unhappy with this blog post for some reason, feel free to contact me.&lt;/p&gt;

&lt;p&gt;-Christian Joudrey&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ht4/~4/Wa74RdhM2N4" height="1" width="1"/&gt;</description><pubDate>Wed, 30 Mar 2011 00:00:00 GMT</pubDate><guid isPermaLink="false">http://blog.ht4.ca/2011/03/30/grooveshark-behind-the-scenes</guid><link>http://feedproxy.google.com/~r/ht4/~3/Wa74RdhM2N4/grooveshark-behind-the-scenes</link><feedburner:origLink>http://blog.ht4.ca/2011/03/30/grooveshark-behind-the-scenes</feedburner:origLink></item><item><title>Dissecting Google thumbnails</title><description>&lt;p&gt;Today I had an itch to look into how Google generates the preview thumbnails when you do a search.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://i.imgur.com/NhS6D.png"/&gt;&lt;!-- more --&gt;&lt;/p&gt;

&lt;h1&gt;Google results page&lt;/h1&gt;

&lt;p&gt;The first thing I discovered is that the thumbnails are only loaded the first time you click a magnifying glass &lt;img src="http://i.imgur.com/vkShs.png"/&gt;. When you revisit a previous query, the thumbnails are loaded from the cache once the results are rendered to the screen.&lt;/p&gt;

&lt;h1&gt;JSONP /webpagethumbnail request&lt;/h1&gt;

&lt;p&gt;After you first click the magnifying glass, 10 JSONP calls (1 per search result) are made to &lt;code&gt;http://clients1.google.com/webpagethumbnail&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;An example request for my search query &lt;code&gt;site:reddit.com programming&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;c=11
r=2
f=2
s=300:585
hl=en
gl=ca

query=programming
d=http://www.reddit.com/programming
b=1
j=google.vs.r
a=IFs
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A few values are hardcoded in the page's HTML (before the search results are even loaded), namely the thumbnail size &lt;code&gt;s&lt;/code&gt; and locale values &lt;code&gt;hl&lt;/code&gt; and &lt;code&gt;gl&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;"kfeUrlPrefix":"/webpagethumbnail?c=11&amp;r=2&amp;f=2&amp;s=300:585&amp;query=&amp;hl=en&amp;gl=ca"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The remaining values are the ones that interest me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;query&lt;/code&gt; only contains the keyword I searched and not my entire query &lt;code&gt;site:reddit.com programming&lt;/code&gt;. I find this particularly interesting as this "slicing" logic seems to be done client-side.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;d&lt;/code&gt; contains the full URL of the given result item.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;j&lt;/code&gt; contains the JSONP callback function.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;a&lt;/code&gt; contains a 3-character checksum, obtained with the results HTML (in this case &lt;code&gt;IFs&lt;/code&gt;), which from what I can tell prevents third-party requests:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src="http://i.imgur.com/UXiE2.png"/ &gt;&lt;/p&gt;

&lt;h1&gt;JSONP /webpagethumbnail response&lt;/h1&gt;

&lt;p&gt;The thumbnails are served with an expiry time of 1 day by a server identifying itself as &lt;code&gt;snapshot_btfe&lt;/code&gt; (likely the codename of their thumbnail server). No surprise there.&lt;/p&gt;

&lt;p&gt;The structure of the returned JSON goes as follows:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "s": "b",
  "b": 1,
  "dim": [302, 585],
  "ssegs": [ "data:image/jpeg...", "data:image/jpeg..." ],
  "ssegs-heights": [405, 180],
  "tbts": [ ... ],
  "url": "http://www.reddit.com/r/programming"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;dim&lt;/code&gt; contains the total width and height of the thumbnail&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ssegs&lt;/code&gt; contains an array of data URIs, each holding one segment of the thumbnail&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ssegs-heights&lt;/code&gt; contains the height of each segment&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tbts&lt;/code&gt; contains an array of text that will be overlaid on top of the thumbnail&lt;/li&gt;
&lt;li&gt;&lt;code&gt;url&lt;/code&gt; contains the URL of the requested page&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this time I am unsure what &lt;code&gt;s&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; are used for.&lt;/p&gt;

&lt;h1&gt;Building the thumbnail&lt;/h1&gt;

&lt;p&gt;The thumbnail appears to be split into segments when its height exceeds 405 pixels. I'm guessing this is for either performance or compatibility reasons (&lt;a href="http://en.wikipedia.org/wiki/Data_URI_scheme#Disadvantages"&gt;IE8 supports data URIs of at most 32 KB&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;img src="http://i.imgur.com/7AalF.jpg" /&gt; &lt;img src="http://i.imgur.com/kyCnx.jpg" /&gt;&lt;/p&gt;

&lt;p&gt;Both segments are simply appended one after the other in the preview bubble.&lt;/p&gt;
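&lt;p&gt;Stacking the segments only requires a running sum of &lt;code&gt;ssegs-heights&lt;/code&gt;. A sketch that computes each segment's vertical offset (&lt;code&gt;segmentLayout&lt;/code&gt; is an illustrative name); in the sample response the heights, 405 + 180, add up to the 585 pixels reported in &lt;code&gt;dim&lt;/code&gt;:&lt;/p&gt;

```javascript
// Computes where each data-URI segment sits when the segments are stacked
// top to bottom, using the ssegs and ssegs-heights arrays from the response.
function segmentLayout(response) {
  const heights = response['ssegs-heights'];
  let top = 0;
  const segments = response.ssegs.map((src, i) => {
    const segment = { src: src, top: top, height: heights[i] };
    top += heights[i];
    return segment;
  });
  // Report whether the stacked height matches the height given in dim.
  return { segments: segments, totalHeight: top, matchesDim: top === response.dim[1] };
}
```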

&lt;h1&gt;Building the overlay text&lt;/h1&gt;

&lt;p&gt;As explained above, the overlay text data is contained in the &lt;code&gt;tbts&lt;/code&gt; array of the JSON.&lt;/p&gt;

&lt;p&gt;Each text overlay has an entry in the array with the following values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;box&lt;/code&gt; contains the dimensions and position (top, left) of the thumbnail highlight&lt;/li&gt;
&lt;li&gt;&lt;code&gt;txt&lt;/code&gt; contains the HTML text that is displayed in the overlay&lt;/li&gt;
&lt;li&gt;&lt;code&gt;txtBox&lt;/code&gt; contains the dimensions and position of the text box&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
  "box": {
    "h": 10,
    "l": 211,
    "t": 71,
    "w": 74
  },
  "txt": "A reddit for discussion and news about computer &lt;em&gt;programming&lt;/em&gt; &lt;b&gt;...&lt;/b&gt;",
  "txtBox": {
    "h": 42,
    "l": 0,
    "t": 25,
    "w": 300
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A &lt;code&gt;div&lt;/code&gt; is then appended to the preview bubble for each box and text box, which gives the end result:&lt;/p&gt;

&lt;p&gt;&lt;img src="http://i.imgur.com/f4bvN.png" /&gt;&lt;/p&gt;

&lt;h1&gt;Unanswered questions&lt;/h1&gt;

&lt;p&gt;Unfortunately, many questions remain unanswered. I would be really grateful if someone at Google made an official post about how the thumbnails work.&lt;/p&gt;

&lt;p&gt;Specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How are the thumbnail images generated? I'm guessing they are using a headless version of Chrome?&lt;/li&gt;
&lt;li&gt;How are the positions of the boxes calculated?&lt;/li&gt;
&lt;li&gt;What kind of infrastructure is behind the thumbnail service?&lt;/li&gt;
&lt;li&gt;What proportion of pages is currently thumbnailed?&lt;/li&gt;
&lt;li&gt;Will there ever be an open API?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;-Christian Joudrey&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ht4/~4/1OsirOlmDwU" height="1" width="1"/&gt;</description><pubDate>Wed, 23 Mar 2011 00:00:00 GMT</pubDate><guid isPermaLink="false">http://blog.ht4.ca/2011/03/23/disecting-google-thumbnails</guid><link>http://feedproxy.google.com/~r/ht4/~3/1OsirOlmDwU/disecting-google-thumbnails</link><feedburner:origLink>http://blog.ht4.ca/2011/03/23/disecting-google-thumbnails</feedburner:origLink></item><item><title>Fiddling with netcat - intercepting proxy</title><description>&lt;p&gt;A couple of days ago Chris needed a way to see how a particular client was interacting with a server. Obviously there are numerous ways to do this, but I was curious how easy it would be to implement something similar with a quick &lt;a href="http://en.wikipedia.org/wiki/Netcat"&gt;netcat&lt;/a&gt; command. Sure enough after a little bit of fiddling I was able to produce exactly what he needed.&lt;!-- more --&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;nc -l -p 12345 &lt; pipe | tee outgoing.log | nc server 12345 | tee pipe incoming.log
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now this may seem a little cryptic, so I'm going to dissect each portion to explain how it works. Keep in mind that &lt;code&gt;pipe&lt;/code&gt; refers to an actual &lt;a href="http://en.wikipedia.org/wiki/Named_pipe"&gt;named pipe&lt;/a&gt;. You can create a FIFO by running &lt;code&gt;mkfifo pipe&lt;/code&gt; or &lt;code&gt;mknod pipe p&lt;/code&gt;; the former is more common. If you're not familiar with named pipes, I recommend &lt;a href="http://en.wikipedia.org/wiki/Named_pipe"&gt;reading up on them&lt;/a&gt; before continuing with this post, as you may get a little confused.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;nc -l -p 12345 &lt; pipe
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This portion simply has netcat listen on port 12345 and send anything read from the pipe to the connected client. If you're not familiar with named pipes, think of it as a simple file containing the word &lt;code&gt;hello&lt;/code&gt;: when a client connects to the netcat instance, it receives &lt;code&gt;hello&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;| tee outgoing.log
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you're not familiar with &lt;a href="http://unixhelp.ed.ac.uk/CGI/man-cgi?tee"&gt;tee&lt;/a&gt; this may seem a bit obscure. Tee prints the things piped to it to stdout as well as writing it to a file. In this instance any traffic from the connected client will get printed to stdout and to the file &lt;code&gt;outgoing.log&lt;/code&gt;. An example of how this would work is if I connected to the netcat instance and simply typed &lt;code&gt;hello&lt;/code&gt; it would print it out to the screen and log it to the &lt;code&gt;outgoing.log&lt;/code&gt; file.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;| nc server 12345
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This connects to the server the client would normally talk to. Remember, the goal is a quick intercepting proxy to see how the client interacts with the server.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;| tee pipe incoming.log
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here is where the magic happens. This completes the relay so the client and server can communicate across the proxy. It takes the traffic coming from the server and uses tee to print it to stdout while also writing it to both &lt;code&gt;pipe&lt;/code&gt; and &lt;code&gt;incoming.log&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now all of this may make sense individually, though how they work together might be slightly confusing.&lt;/p&gt;

&lt;p&gt;If you recall, the first command sends all data from our &lt;code&gt;pipe&lt;/code&gt; to the client, and at the end we write all data from the server into the &lt;code&gt;pipe&lt;/code&gt;. See now? Everything the server sends is passed back to the client, completing the relay and allowing normal operation.&lt;/p&gt;
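&lt;p&gt;For comparison, the same relay can be sketched with Node.js's &lt;code&gt;net&lt;/code&gt; module. This is not a drop-in replacement: &lt;code&gt;startProxy&lt;/code&gt; and its options are illustrative names, the logs can be any writable streams, server output is not echoed to stdout, and the upstream connection is opened when a client connects rather than immediately as with netcat.&lt;/p&gt;

```javascript
const net = require('net');

// Rough Node.js equivalent of the netcat pipeline above. Client bytes are
// copied to `outgoing` and forwarded upstream; server bytes are copied to
// `incoming` and written back to the client, closing the loop that the
// named pipe closes in the shell version.
function startProxy(opts) {
  const server = net.createServer((client) => {
    // One upstream connection per client (netcat connects once, at startup).
    const upstream = net.connect(opts.targetPort, opts.targetHost);

    client.on('data', (chunk) => {
      opts.outgoing.write(chunk);  // like `tee outgoing.log`
      upstream.write(chunk);       // queued until the upstream socket connects
    });
    upstream.on('data', (chunk) => {
      opts.incoming.write(chunk);  // like `tee pipe incoming.log`
      client.write(chunk);         // the "pipe" back to the client
    });

    // Propagate shutdowns and errors in both directions.
    client.on('end', () => upstream.end());
    upstream.on('end', () => client.end());
    client.on('error', () => upstream.destroy());
    upstream.on('error', () => client.destroy());
  });
  server.listen(opts.listenPort, opts.listenHost);
  return server;
}

// Example (not run here), mirroring the IRC setup in this post:
// const fs = require('fs');
// startProxy({
//   listenPort: 12345,
//   targetHost: 'irc.freenode.net',
//   targetPort: 6667,
//   outgoing: fs.createWriteStream('outgoing.log'),
//   incoming: fs.createWriteStream('incoming.log')
// });
```

&lt;p&gt;Here the explicit &lt;code&gt;client.write()&lt;/code&gt; in the upstream handler plays the role of the named pipe.&lt;/p&gt;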

&lt;p&gt;In this case Chris needed a quick look at how a normal IRC client interacts with the server, since the RFC is lacking. Here is the real-world command that was used (though there are likely many better ways to do it):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;nc -l -p 12345 &lt; pipe | tee outgoing.log | nc irc.freenode.net 6667 | tee pipe incoming.log
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When you execute the above command, you'll immediately start seeing some traffic from the server:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;NOTICE AUTH :*** Looking up your hostname...
NOTICE AUTH :*** Checking ident
NOTICE AUTH :*** No identd (auth) response
NOTICE AUTH :*** Couldn't look up your hostname
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now connect your client to the netcat server (in this case localhost on port 12345). If everything goes as planned, it should connect to Freenode as usual. If you take a peek at the netcat server's terminal, you'll see a bunch of activity!&lt;/p&gt;

&lt;p&gt;The cool part is the logs: we can see exactly how this particular IRC client (Irssi) and server (Freenode) interact.&lt;/p&gt;

&lt;p&gt;Once again, this isn't the best way to do this; tcpdump, Wireshark and many other tools are available. That being said, it's fun to fiddle and learn.&lt;/p&gt;

&lt;p&gt;-Cody Robertson&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ht4/~4/WkV8nn5kzFA" height="1" width="1"/&gt;</description><pubDate>Wed, 02 Dec 2009 00:00:00 GMT</pubDate><guid isPermaLink="false">http://blog.ht4.ca/2009/12/02/fiddling-with-netcat-intercepting-proxy</guid><link>http://feedproxy.google.com/~r/ht4/~3/WkV8nn5kzFA/fiddling-with-netcat-intercepting-proxy</link><feedburner:origLink>http://blog.ht4.ca/2009/12/02/fiddling-with-netcat-intercepting-proxy</feedburner:origLink></item></channel></rss>

