<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0"><channel><description>An exploration of people and data management, the evolution of learning and the scientific method in an era of data-intensive distributed computing, and efficient knowledge capture and distribution using the web.  Probably other stuff, too.</description><title>Mainly Data</title><generator>Tumblr (3.0; @jeh)</generator><link>http://jeffhammerbacher.com/</link><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/MainlyData" type="application/rss+xml" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item><title>Blogging for the Yoga Bear Cause</title><description>&lt;p&gt;I’ve been lucky to watch &lt;a title="Yoga Bear" href="http://www.yogabear.org" target="_blank"&gt;Yoga Bear&lt;/a&gt; grow from an idea to a nationwide organization of hundreds of volunteers, yoga studios, and participants recovering from cancer. The &lt;a title="Blogging for a Cause" href="http://www.zemanta.com/bloggingforacause/" target="_blank"&gt;Blogging for a Cause&lt;/a&gt; campaign from Zemanta is looking for nonprofits to help out, and I can’t think of one more worthy than Yoga Bear. 
From events like &lt;a title="Yogapalooza" href="http://www.yogabear.org/page/yogapalooza-1" target="_blank"&gt;Yogapalooza&lt;/a&gt; and the &lt;a title="Meeting of the Minds" href="http://blog.yogabear.org/2009/01/meeting-of-minds.html" target="_blank"&gt;Meeting of the Minds&lt;/a&gt; to their commitment to highlighting &lt;a title="Yoga and Cancer Research" href="http://blog.yogabear.org/search/label/research" target="_blank"&gt;research&lt;/a&gt; on the effectiveness of alternative medicine in the treatment of cancer, Yoga Bear is enhancing the lives of many on a shoestring budget. Let’s hope Zemanta recognizes the great work Yoga Bear is doing and awards them the grant!&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/GTDyylb8V58/114974065</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/114974065</guid><pubDate>Fri, 29 May 2009 12:53:00 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/114974065</feedburner:origLink></item><item><title>The image above is used by Google to implement a common...</title><description>&lt;img src="http://12.media.tumblr.com/EdUe32Hjqiaiajzke2Er08mzo1_250.png"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;The image above is used by Google to implement a common technique known as &lt;a title="CSS Sprites" href="http://www.alistapart.com/articles/sprites/" target="_blank"&gt;CSS spriting&lt;/a&gt;. This technique is used by most major websites to minimize HTTP requests and hence improve page load times. As an example, Facebook loads all of the News Feed icons in a single PNG file, displayed below. And yes, reblogging this post was an excuse for me to display Facebook’s icon set, which is totally awesome; Facebook’s design team is superb.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Facebook's icon set" src="http://facebook.com/rsrc.php/z8BAQ/h/8nj7d9vy/images/sprite/icons.png" height="1884" width="18"/&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://davemorin.tumblr.com/post/68268731/this-is-crazy-genius-superamit-googles" target="_blank"&gt;davemorin&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This is crazy genius.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://superamit.tumblr.com/post/68202422/googles-search-pages-load-this-1-single-image" target="_blank"&gt;superamit&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“Google’s search pages load this 1 single image for their logo and all the UI parts and pieces, using CSS to crop the images as needed, which seems to me as 100% more efficient than managing a pool of tiny images.” - &lt;a href="http://www.dressedinvalue.com/2008/12/google/" target="_blank"&gt;Dressed In Value: Google&lt;/a&gt; (via &lt;a href="http://chrisglass.com/" target="_blank"&gt;Chris Glass&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Smart!&lt;/b&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/blockquote&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/6LVDRxN2O2g/74216957</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/74216957</guid><pubDate>Thu, 29 Jan 2009 20:11:00 -0800</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/74216957</feedburner:origLink></item><item><title>Hadoop for a living</title><description>&lt;a href="http://www.cloudera.com"&gt;Hadoop for a living&lt;/a&gt;: &lt;p&gt;Hadoop has been good to me, so I’m going to try to be good to Hadoop. Come say hello to us in Burlingame.&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/l_rKp76WOjE/57229691</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/57229691</guid><pubDate>Thu, 30 Oct 2008 22:33:24 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/57229691</feedburner:origLink></item><item><title>Hive is alive!</title><description>&lt;a href="https://issues.apache.org/jira/browse/HADOOP-3601"&gt;Hive is alive!&lt;/a&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/4LtahxJLppo/46180526</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/46180526</guid><pubDate>Sat, 16 Aug 2008 03:25:40 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/46180526</feedburner:origLink></item><item><title>bigdata</title><description>&lt;a href="http://en.oreilly.com/oscon2008/public/schedule/detail/2933"&gt;bigdata&lt;/a&gt;: &lt;p&gt;If you’re at &lt;a title="oscon" href="http://en.oreilly.com/oscon2008/" target="_blank"&gt;OSCON&lt;/a&gt;, please attend this talk and let me know if the hyperbolic claims in this abstract are substantiated.&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/t2NR0fOU05o/43232974</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/43232974</guid><pubDate>Wed, 23 Jul 2008 01:18:02 
-0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/43232974</feedburner:origLink></item><item><title>August in San Francisco</title><description>&lt;p&gt;&lt;a href="http://secretenemyhideout.com/post/42725099/august-in-san-francisco" target="_blank"&gt;zachklein&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;I’m renting a place near Alamo Square for a month in order to be closer to a handful of friends I’ve been distant from for too long, to recruit developers, to learn the geography of the Bay Area, and to have a convenient jumpoff point for a few backpacking trips in the Cascades.&lt;/blockquote&gt;
&lt;p&gt;yay.&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/nfUSyyCWhEA/42859007</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/42859007</guid><pubDate>Sat, 19 Jul 2008 16:08:05 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/42859007</feedburner:origLink></item><item><title>Cassandra</title><description>&lt;a href="http://code.google.com/p/the-cassandra-project/"&gt;Cassandra&lt;/a&gt;: &lt;p&gt;Congratulations to Avinash, Prashant, and Karthik.&lt;/p&gt;
&lt;p&gt;
&lt;object height="355" width="425"&gt;
&lt;param value="center" name="align"&gt;
&lt;param value="true" name="allowFullScreen"&gt;
&lt;param value="always" name="allowScriptAccess"&gt;
&lt;param value="http://static.slideshare.net/swf/ssplayer2.swf?doc=datapresentationscassandrasigmod-1214939766004171-9" name="src"&gt;
&lt;embed height="355" width="425" src="http://static.slideshare.net/swf/ssplayer2.swf?doc=datapresentationscassandrasigmod-1214939766004171-9" allowscriptaccess="always" allowfullscreen="true" align="center" type="application/x-shockwave-flash"&gt;&lt;/embed&gt;&lt;/object&gt;
&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/voIkF-swzf4/40749429</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/40749429</guid><pubDate>Wed, 02 Jul 2008 15:52:00 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/40749429</feedburner:origLink></item><item><title>MMDS 2008 and CIM</title><description>&lt;p&gt;Last week the second &lt;a title="MMDS" href="http://www.stanford.edu/group/mmds/" target="_blank"&gt;Workshop on Algorithms for Modern Massive Data Sets&lt;/a&gt; was held at Stanford University.  This workshop had an incredible density of prestigious &lt;a title="speakers" href="http://www.stanford.edu/group/mmds/abstracts.html" target="_blank"&gt;speakers&lt;/a&gt; from the field of machine learning; I guess spending a weekend in California during the summer is an easy sell.&lt;/p&gt;
&lt;p&gt;Each of the four days of the conference had a theme.  The official themes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Data Analysis and Data Applications&lt;/li&gt;
&lt;li&gt;Networked Data and Algorithmic Tools&lt;/li&gt;
&lt;li&gt;Statistical, Geometric, and Topological Methods&lt;/li&gt;
&lt;li&gt;Machine Learning and Dimensionality Reduction&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After attending several talks by these machine learning luminaries over the course of four days, I tried to pull together a few common themes of my own for further exploration:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Incomplete Dyadic Data&lt;/li&gt;
&lt;li&gt;Distributing Data and Computation&lt;/li&gt;
&lt;li&gt;Manifolds with Noise&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I’ll try to put a detailed post up on each of these three topics this week.&lt;/p&gt;
&lt;p&gt;Over the past few years I’ve become a regular conference attendee.  It seems plausible that I will continue to attend KDD, VLDB, SIGMOD, and MMDS for years to come, leaving me with a major question: where can I find a longitudinal analysis of the content of these conferences?&lt;/p&gt;
&lt;p&gt;It seems that these conferences provide an excellent yearly cross-section of the state of their respective fields, but I’m much more interested in the deltas.  What are the new topics, which topics are making forward progress, and which topics are losing steam?  Taking time to collect this information would provide some fascinating insight into the progress of science.&lt;/p&gt;
&lt;p&gt;The KDD community has done some &lt;a title="DBConnect" href="http://portal.acm.org/citation.cfm?id=1348549.1348558&amp;coll=portal&amp;dl=ACM" target="_blank"&gt;analysis&lt;/a&gt; of the &lt;a title="DBLP" href="http://www.informatik.uni-trier.de/~ley/db/" target="_blank"&gt;DBLP&lt;/a&gt; data set, but not with the aims proposed above.  From the VLDB community has come some work that is a bit closer in intent: AnHai Doan’s &lt;a title="Cimple" href="http://pages.cs.wisc.edu/~anhai/projects/cimple/" target="_blank"&gt;Cimple project&lt;/a&gt;.  So far, they’ve produced the moderately useful &lt;a title="DBLife" href="http://dblife.cs.wisc.edu/" target="_blank"&gt;DBLife&lt;/a&gt;, and they’ve outlined a promising research direction.&lt;/p&gt;
&lt;p&gt;I’ve recently started a &lt;a title="conference type" href="http://freebase.com/view/user/hammer/default_domain/conference" target="_blank"&gt;conference type&lt;/a&gt; on Freebase and have a simple tool to monitor conference progress using their API.  I’d like to make this tool more general and robust.  Any CIM students or conference geeks looking for a summer project?&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/igf1jyWVzBc/40393497</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/40393497</guid><pubDate>Mon, 30 Jun 2008 02:29:00 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/40393497</feedburner:origLink></item><item><title>My good friend, musical mastermind Matt O’Malley, cooked...</title><description>&lt;embed type="application/x-shockwave-flash" src="http://jeffhammerbacher.com/swf/audio_player.swf?audio_file=http://www.tumblr.com/audio_file/40036150/n7dBUAlkMaq5mp47nApv3r18&amp;color=FFFFFF" height="27" width="207" quality="best"&gt;&lt;/embed&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;My good friend, musical mastermind &lt;a title="Matt O'Malley" href="http://www.omalleymusic.com" target="_blank"&gt;Matt O’Malley&lt;/a&gt;, cooked up a theme song for the &lt;a title="Data Team" href="http://www.facebook.com/pages/Facebook-Data-Team/8394258414" target="_blank"&gt;Facebook Data Team&lt;/a&gt;.  You can also check it out on my &lt;a title="Muxtape" href="http://jhammerb.muxtape.com" target="_blank"&gt;Muxtape&lt;/a&gt;.  It’s just so, so good.  
If you would like to create an animation to accompany the music, please let me know.&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/4tPZqcWHLjE/40036150</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/40036150</guid><pubDate>Fri, 27 Jun 2008 02:51:42 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/40036150</feedburner:origLink></item><item><title>REPL</title><description>&lt;p&gt;Python is an amazing language for many reasons, but code prototyping via the interactive interpreter has crippled my development speed in languages that do not have a &lt;a title="REPL" href="http://en.wikipedia.org/wiki/REPL" target="_blank"&gt;REPL&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For this reason, I was ecstatic when I joined Facebook and learned they had developed their own interactive shell for PHP, &lt;a title="PHPSH" href="http://www.phpsh.org" target="_blank"&gt;phpsh&lt;/a&gt;.  My JavaScript development was hastened when I found Mozilla’s &lt;a title="spidermonkey" href="http://www.mozilla.org/js/spidermonkey/" target="_blank"&gt;SpiderMonkey&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I also enjoyed learning Ruby and Erlang, because in Ruby, you have &lt;a title="irb" href="http://en.wikipedia.org/wiki/Interactive_Ruby_Shell" target="_blank"&gt;irb&lt;/a&gt;, and in Erlang, you have the Erlang shell, &lt;a title="EShell" href="http://www.erlang.org/doc/getting_started/seq_prog.html#2.1" target="_blank"&gt;Eshell&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I recently came across CERN’s &lt;a title="CINT" href="http://root.cern.ch/twiki/bin/view/ROOT/CINT" target="_blank"&gt;CINT&lt;/a&gt;, a REPL for C/C++.  Finally!  I have come to dread coding in these languages, especially C++, but I’m now looking forward to my next big C/C++ project.  If you’ve used CINT, drop me a line.&lt;/p&gt;
&lt;p&gt;I’ve also started looking for a nice Java REPL.  There’s &lt;a title="DynamicJava" href="http://koala.ilog.fr/djava" target="_blank"&gt;DynamicJava&lt;/a&gt;, and the &lt;a title="Groovy Shell" href="http://groovy.codehaus.org/Groovy+Shell" target="_blank"&gt;Groovy interactive shell&lt;/a&gt;.  Anything else I should try?&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/DJm8kqzvorU/40012837</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/40012837</guid><pubDate>Thu, 26 Jun 2008 22:30:55 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/40012837</feedburner:origLink></item><item><title>Wired Magazine: (Brief) Dispatches from the "Petabyte Age"</title><description>&lt;p&gt;I tend to geek hard when a publication drops a series of articles about petabyte-scale data analysis.  Frequent culprits include the &lt;a title="SIGMOD Record" href="http://www.sigmod.org/record/" target="_blank"&gt;SIGMOD Record&lt;/a&gt; or &lt;a title="Teradata Magazine" href="http://www.teradata.com/tdmo/" target="_blank"&gt;Teradata Magazine&lt;/a&gt;, but this week, &lt;a title="Wired" href="http://www.wired.com/science/discoveries/magazine/16-07/pb_intro" target="_blank"&gt;Wired Magazine&lt;/a&gt; splashed into the pool.&lt;/p&gt;
&lt;p&gt;Chris Anderson (the &lt;a title="Long Tail" href="http://en.wikipedia.org/wiki/The_Long_Tail" target="_blank"&gt;long tail&lt;/a&gt; one) starts things off by declaring that &lt;a title="scientific method" href="http://www.wired.com/science/discoveries/magazine/16-07/pb_theory" target="_blank"&gt;the data deluge spells trouble for the scientific method&lt;/a&gt;.  Clearly this claim is false; Google’s success is almost entirely due to rapid product iterations via thousands of hypothesis tests every month.  To give Chris the benefit of the doubt, he seems to be asserting instead that the practice of model development should be done in close concert with the collection of empirical data.  I couldn’t agree more.&lt;/p&gt;
&lt;p&gt;Near the end of the article, Chris makes a common (if frustrating) mistake: he claims that the cluster used for the NSF’s &lt;a title="CluE" href="http://www.nsf.gov/pubs/2008/nsf08560/nsf08560.htm" target="_blank"&gt;CluE&lt;/a&gt; program will be running the &lt;a title="Google File System" href="http://labs.google.com/papers/gfs.html" target="_blank"&gt;Google File System&lt;/a&gt;, and he credits Google and IBM with building the software that will power this cluster.  In actuality, Yahoo deserves almost all of the credit, as they have done an incredible job scaling the &lt;a title="Hadoop" href="http://hadoop.apache.org/core/" target="_blank"&gt;Hadoop&lt;/a&gt; project to thousands of nodes and making the NSF program possible.  Just another testament to the effectiveness of Google’s &lt;a title="Google PR" href="http://www.businessweek.com/magazine/content/07_52/b4064048925836.htm" target="_blank"&gt;PR machine&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The rest of the articles cover many application areas of large scale data analysis in brief, including agriculture, astronomy, high-energy physics, politics, epidemiology, and insurance.  There’s also a startlingly incoherent attempt to describe how &lt;a title="MapReduce" href="http://labs.google.com/papers/mapreduce.html" target="_blank"&gt;MapReduce&lt;/a&gt; works that probably should have been left out.&lt;/p&gt;
&lt;p&gt;Wired’s intentions were noble but their execution was not up to expectations.  I’m hoping that tomorrow’s &lt;a title="MMDS" href="http://www.stanford.edu/group/mmds/" target="_blank"&gt;MMDS Workshop&lt;/a&gt; will have a bit more substance.&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/8yIb3dGSaKs/39755862</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/39755862</guid><pubDate>Wed, 25 Jun 2008 01:22:00 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/39755862</feedburner:origLink></item><item><title>Bloomberg for the Web?  We Need Real-Time News, Data, and Analytics.</title><description>&lt;p&gt;The NYT had an &lt;a title="Bloomberg Thomson Reuters" href="http://www.nytimes.com/2008/06/23/business/media/23thomson.html?adxnnl=1&amp;ref=business&amp;pagewanted=all" target="_blank"&gt;article&lt;/a&gt; today about the battle for efficient financial services information delivery currently heating up between &lt;a title="Bloomberg" href="http://www.bloomberg.com/" target="_blank"&gt;Bloomberg&lt;/a&gt; and &lt;a title="Thomson Reuters" href="http://www.thomsonreuters.com/" target="_blank"&gt;Thomson Reuters&lt;/a&gt;.  A few weeks ago, there were articles about the &lt;a title="NASDAQ real time data" href="http://www.nasdaq.com/newsroom/news/newsroomnewsStory.aspx?textpath=pr2008%5CACQPMZ200806020730PRIMZONEFULLFEED143845.htm" target="_blank"&gt;NASDAQ&lt;/a&gt; and &lt;a title="NYSE real time data" href="http://www.reuters.com/article/marketsNews/idINN0432325620080605?rpc=44" target="_blank"&gt;NYSE&lt;/a&gt; making their stock quotes available in real time to multiple information outlets.  And just last week, &lt;a title="Tibco" href="http://www.tibco.com/" target="_blank"&gt;Tibco&lt;/a&gt;, once a wholly owned subsidiary of Reuters, purchased &lt;a title="Insightful" href="http://www.insightful.com/" target="_blank"&gt;Insightful&lt;/a&gt;.  
Insightful makes S-Plus, the commercial implementation of the S language that the &lt;a title="R Project" href="http://www.r-project.org/" target="_blank"&gt;R project&lt;/a&gt; also implements.  Tibco already owns &lt;a title="Spotfire" href="http://spotfire.tibco.com/" target="_blank"&gt;Spotfire&lt;/a&gt;, a firm that makes excellent software for data exploration.&lt;/p&gt;
&lt;p&gt;These developments made me think back to my time in financial services.  Trading floors have the highest concentration of numerate people I’ve ever been around.  They also have ready access to superb information manipulation software.&lt;/p&gt;
&lt;p&gt;Now consider the current state of analytics software for the web.  I won’t bother to list the major competitors, as they are all pretty mediocre.  As the online advertising space grows in sophistication and mechanisms to promote a more efficient market are introduced (see, for example, &lt;a title="Right Media" href="http://www.rightmedia.com" target="_blank"&gt;Right Media&lt;/a&gt; and ContextWeb’s &lt;a title="ADSDAQ" href="http://exchange.contextweb.com/" target="_blank"&gt;ADSDAQ&lt;/a&gt;), a workbench for the real-time exploration of news and data related to the web will be a necessary tool for many quantitative marketers.&lt;/p&gt;
&lt;p&gt;As Hal Varian &lt;a title="hal varian" href="http://blogs.wsj.com/economics/2007/07/19/economics-according-to-google/" target="_blank"&gt;points out&lt;/a&gt;, marketing is the next field to be overrun with quants, and I expect that the tools most useful in finance will be brought along for the invasion.&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/9kQ3TntlMQo/39606942</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/39606942</guid><pubDate>Mon, 23 Jun 2008 23:03:00 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/39606942</feedburner:origLink></item><item><title>Yoga Bear: Now Using YUI and App Engine</title><description>&lt;a href="http://www.yogabear.org"&gt;Yoga Bear: Now Using YUI and App Engine&lt;/a&gt;: &lt;p&gt;My girlfriend &lt;a title="Halle" href="http://halle.tumblr.com" target="_blank"&gt;Halle&lt;/a&gt; works hard at her nonprofit, &lt;a title="Yoga Bear" href="http://www.yogabear.org" target="_blank"&gt;Yoga Bear&lt;/a&gt;.  I help out by maintaining the website.  Today I ported the website from &lt;a title="Django" href="http://www.djangoproject.com" target="_blank"&gt;Django&lt;/a&gt; (hosted by &lt;a title="WebFaction" href="http://www.webfaction.com" target="_blank"&gt;WebFaction&lt;/a&gt;) to Google’s &lt;a title="App Engine" href="http://code.google.com/appengine" target="_blank"&gt;App Engine&lt;/a&gt;; I also ported the CSS to use &lt;a title="YUI" href="http://developer.yahoo.com/yui" target="_blank"&gt;YUI&lt;/a&gt;’s reset, fonts, and grids.&lt;/p&gt;
&lt;p&gt;As a result, we no longer have to pay hosting fees and the site renders properly in IE 6.  Furthermore, the transition will allow me to explore the YUI AJAX libraries and App Engine data store to add some new features to the Yoga Bear website.  The web is beginning to feel less hostile to rapid development.&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/19pb-RWNPCA/39486383</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/39486383</guid><pubDate>Mon, 23 Jun 2008 02:08:00 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/39486383</feedburner:origLink></item><item><title>I made a wordle from my delicious tags.  I found this site in...</title><description>&lt;img src="http://19.media.tumblr.com/n7dBUAlkMajwuic7v4U03znJ_250.png"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;I made a &lt;a title="wordle" href="http://wordle.net" target="_blank"&gt;wordle&lt;/a&gt; from my delicious tags.  I found this site in the App Engine gallery.&lt;/p&gt;
&lt;p&gt;I really dislike web programming due to its many &lt;a title="accidental complexities" href="http://books.google.com/books?id=gJrmszNHQV4C&amp;pg=RA5-PA129&amp;output=html&amp;source=gbs_toc_s&amp;cad=1&amp;sig=W8_DPqpyHpixqBF2LPWchXodw_4" target="_blank"&gt;accidental complexities&lt;/a&gt;, but the simplicity of developing a web app in Python via App Engine has me experimenting with the web once again.&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/p5Kh0GTZgCw/39445127</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/39445127</guid><pubDate>Sun, 22 Jun 2008 17:59:30 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/39445127</feedburner:origLink></item><item><title>TIME Digital's 1998 "Cyber Elite"</title><description>&lt;a href="http://www.time.com/time/digital/cyberelite/"&gt;TIME Digital's 1998 "Cyber Elite"&lt;/a&gt;: &lt;p&gt;A late-night browsing session landed me on this gem from ten years ago.  Some highlights:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It’s been said before, but reading the description of &lt;a title="David Bohnett" href="http://www.time.com/time/digital/cyberelite/16.html" target="_blank"&gt;GeoCities&lt;/a&gt; from 1998 really drives home the fact that online social networking dates back to the beginning of the web.&lt;/li&gt;
&lt;li&gt;
&lt;a title="DoubleClick" href="http://www.time.com/time/digital/cyberelite/47.html" target="_blank"&gt;DoubleClick&lt;/a&gt; is really old school.  It’s crazy to think of them integrating with Google.  Also, there’s a reminder of how controversial cookies were on the early web: “Using controversial software called cookies, DoubleClick sites can snoop on users’ browsing habits, sometimes picking up such critical information as zip codes”.&lt;/li&gt;
&lt;li&gt;
&lt;a title="Rupert Murdoch" href="http://www.time.com/time/digital/cyberelite/19.html" target="_blank"&gt;Rupert Murdoch&lt;/a&gt;’s internet ambitions were already apparent.&lt;/li&gt;
&lt;li&gt;
&lt;a title="Joe Nacchio" href="http://www.time.com/time/digital/cyberelite/32.html" target="_blank"&gt;Joe Nacchio&lt;/a&gt; was not on trial.&lt;/li&gt;
&lt;li&gt;Even before the iPod, &lt;a title="Jonathan Ive" href="http://www.time.com/time/digital/cyberelite/44.html" target="_blank"&gt;Jonathan Ive&lt;/a&gt; was recognized for his visionary designs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;TIME also showed an early awareness of the web’s global reach, highlighting Japanese, Chinese, and Brazilian leaders.&lt;/p&gt;
&lt;p&gt;Historical artifacts like these have helped bring perspective to my first two and a half years in the Valley.  For another historical reference, check out an old &lt;a title="Forbes" href="http://www.forbes.com/forbes/2003/0901/072_print.html" target="_blank"&gt;Forbes article&lt;/a&gt; about the aftermath of Yahoo’s first major acquisition spree.&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/qCG0zTtHAMs/39274109</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/39274109</guid><pubDate>Sat, 21 Jun 2008 04:51:00 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/39274109</feedburner:origLink></item><item><title>Creative Commons Technology Summit 2008</title><description>&lt;p&gt;On Wednesday, Creative Commons hosted their first ever &lt;a title="Technology Summit" href="http://wiki.creativecommons.org/Creative_Commons_Technology_Summit_2008-06-18" target="_blank"&gt;technology summit&lt;/a&gt; at Google.  I’ve been into CC for a &lt;a title="protest" href="http://www.flickr.com/photos/fcb/86753755/in/set-72057594048528507/" target="_blank"&gt;while&lt;/a&gt; now and I was relatively pleased by their progress since I last checked in.&lt;/p&gt;
&lt;p&gt;&lt;a title="Joi Ito" href="http://joi.ito.com" target="_blank"&gt;Joi Ito&lt;/a&gt; started the day off with a lucid delineation of CC’s major components.  He pointed out the technical side of CC is hoping to create a set of standards for digital media exchange in a similar spirit to what the &lt;a title="IETF" href="http://www.ietf.org/" target="_blank"&gt;IETF&lt;/a&gt; does for the internets as a whole.  The political side of CC is more akin to the &lt;a title="Open Internet Coalition" href="http://www.openinternetcoalition.com/" target="_blank"&gt;Open Internet Coalition&lt;/a&gt; that is fighting to keep these standards in neutral hands.&lt;/p&gt;
&lt;p&gt;&lt;a title="Ben Adida" href="http://ben.adida.net/" target="_blank"&gt;Ben&lt;/a&gt; then gave a great outline of the technical components of &lt;a title="ccREL" href="http://creativecommons.org/projects/ccREL" target="_blank"&gt;ccREL&lt;/a&gt;.  I am impressed with the refinements and flexibility introduced by the full-scale adoption of &lt;a title="RDFa" href="http://en.wikipedia.org/wiki/RDFa" target="_blank"&gt;RDFa&lt;/a&gt; for semantic markup.&lt;/p&gt;
&lt;p&gt;The later sessions started to drift away from my core interests, but I was intrigued by the proliferation of digital copyright registries: &lt;a title="Registered Commons" href="http://www.registeredcommons.org/" target="_blank"&gt;Registered Commons&lt;/a&gt;, &lt;a title="SafeCreative" href="https://www.safecreative.org" target="_blank"&gt;SafeCreative&lt;/a&gt;, and &lt;a title="Noank Media" href="http://www.noankmedia.com" target="_blank"&gt;Noank Media&lt;/a&gt;, for example.  It was great to see &lt;a title="attributor" href="http://attributor.com" target="_blank"&gt;Attributor&lt;/a&gt; have a presence at the summit.  They’re also heavy users of Hadoop—I am once again impressed by what &lt;a title="Jim Pitkow" href="http://www.crunchbase.com/person/jim-pitkow" target="_blank"&gt;Jim&lt;/a&gt; has built.&lt;/p&gt;
&lt;p&gt;I’ve enjoyed watching Creative Commons evolve over the past several years and I’m still holding out some hope that I’ll be able to have a material impact on their success some time in the future.  For now, it’s great to keep up with the team; I trust they’re in good hands with a fellow &lt;a title="Canterbury School" href="http://canterburyschool.org" target="_blank"&gt;Cavalier&lt;/a&gt; leading the way.&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/T66K9zlh8mE/39261966</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/39261966</guid><pubDate>Sat, 21 Jun 2008 02:32:00 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/39261966</feedburner:origLink></item><item><title>Hive JIRA ticket opened!</title><description>&lt;a href="https://issues.apache.org/jira/browse/HADOOP-3601"&gt;Hive JIRA ticket opened!&lt;/a&gt;: &lt;p&gt;We’re one step closer to getting our data warehousing framework built on top of Hadoop into the hands of the community.&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/OTjYW_ujJH8/39098723</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/39098723</guid><pubDate>Thu, 19 Jun 2008 19:09:49 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/39098723</feedburner:origLink></item><item><title>I learned yesterday that Google’s ads data warehouse,...</title><description>&lt;img src="http://23.media.tumblr.com/n7dBUAlkM9yjfgcl7t9FF0vt_500.jpg"/&gt;&lt;br/&gt;&lt;br/&gt;&lt;p&gt;I learned yesterday that Google’s ads data warehouse, built on &lt;a title="Netezza" href="http://www.netezza.com" target="_blank"&gt;Netezza&lt;/a&gt;, is named “Everest”—the same name Yahoo chose for their modified Postgres data warehouse.  
Awk-ward…&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/7W1_qa02uCc/37563560</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/37563560</guid><pubDate>Sat, 07 Jun 2008 19:01:00 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/37563560</feedburner:origLink></item><item><title>Repeatability, again</title><description>&lt;p&gt;In a &lt;a title="narrative in science" href="http://jeffhammerbacher.com/post/36038729/peer-reviewed-journals-for-source-code-and-data" target="_blank"&gt;recent post&lt;/a&gt; I mentioned the idea of &lt;a title="reproducible research" href="http://www.stat.washington.edu/jaw/jaw.research.reproducible.html" target="_blank"&gt;reproducible research&lt;/a&gt;.  It turns out that this year’s &lt;a title="sigmod" href="http://www.sigmod08.org" target="_blank"&gt;SIGMOD conference&lt;/a&gt;, where we’ll be &lt;a title="cassandra" href="http://www.sigmod08.org/program_glance.shtml#products1" target="_blank"&gt;presenting&lt;/a&gt; a new approach to structured storage, has conducted an experiment in reproducible research.  You can read a fascinating account of the experiment in this month’s &lt;a title="SIGMOD Record" href="http://www.sigmod.org/record/" target="_blank"&gt;SIGMOD Record&lt;/a&gt;.  
While you’re there, be sure to check out the “Data Management Projects at Google” article as well!&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/LND7Sz8Wbtc/37449980</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/37449980</guid><pubDate>Fri, 06 Jun 2008 15:50:00 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/37449980</feedburner:origLink></item><item><title>Hadoop at Facebook</title><description>&lt;p&gt;There’s a &lt;a title="Hadoop at Facebook" href="http://www.facebook.com/note.php?note_id=16121578919" target="_blank"&gt;post&lt;/a&gt; on the Facebook Engineering blog today from one of the Data team engineers, Joydeep Sen Sarma, discussing how we use Hadoop here at Facebook.  Check it out.&lt;/p&gt;
&lt;p&gt;For more on Hadoop at Facebook, you can check out the &lt;a title="Dublin" href="http://www.slideshare.net/jhammerb/tags/dublin" target="_blank"&gt;slides&lt;/a&gt; from a set of lectures I gave at IBM’s &lt;a title="IBM Cloud Computing Center" href="http://www-03.ibm.com/press/us/en/pressrelease/23710.wss" target="_blank"&gt;Cloud Computing Center&lt;/a&gt; in Dublin.&lt;/p&gt;
&lt;p&gt;For a deeper look at the architecture of HDFS, check out the &lt;a title="HDFS Architecture" href="http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/hdfs_dhruba.pdf" target="_blank"&gt;presentation [PDF]&lt;/a&gt; that &lt;a title="dhruba" href="http://www.borthakur.com/dhruba_borthakur" target="_blank"&gt;dhruba&lt;/a&gt;, a recent addition to the &lt;a title="Data team" href="http://www.facebook.com/pages/Facebook-Data-Team/8394258414" target="_blank"&gt;Data team&lt;/a&gt;, gave at the IBM storage team’s recent offsite.&lt;/p&gt;</description><link>http://feedproxy.google.com/~r/MainlyData/~3/ZRL7XnWVu0I/37439593</link><guid isPermaLink="false">http://jeffhammerbacher.com/post/37439593</guid><pubDate>Fri, 06 Jun 2008 13:48:00 -0700</pubDate><feedburner:origLink>http://jeffhammerbacher.com/post/37439593</feedburner:origLink></item></channel></rss>
