<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">

<channel>
	<title>The Mendicant Bug » computational linguistics</title>
	
	<link>http://mendicantbug.com</link>
	<description>Wanderings into computational linguistics, science, social media and life...</description>
	<lastBuildDate>Fri, 10 May 2013 02:24:52 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain="mendicantbug.com" port="80" path="/?rsscloud=notify" registerProcedure="" protocol="http-post" />
<image>
		<url>http://1.gravatar.com/blavatar/3c73bbb145eaa976335be29004da9868?s=96&amp;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>The Mendicant Bug » computational linguistics</title>
		<link>http://mendicantbug.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://mendicantbug.com/osd.xml" title="The Mendicant Bug" />
	
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/TheMendicantBugComputationalLinguistics" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="themendicantbugcomputationallinguistics" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://mendicantbug.com/?pushpress=hub" /><item>
		<title>Semantifi and the Deep Web</title>
		<link>http://mendicantbug.com/2010/02/06/semantifi-and-the-deep-web/</link>
		<comments>http://mendicantbug.com/2010/02/06/semantifi-and-the-deep-web/#comments</comments>
		<pubDate>Sat, 06 Feb 2010 05:09:22 +0000</pubDate>
		<dc:creator>Jason Adams</dc:creator>
				<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[search interfaces]]></category>
		<category><![CDATA[semantic search engine]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[wolfram alpha]]></category>

		<guid isPermaLink="false">http://mendicantbug.com/?p=1277</guid>
		<description><![CDATA[At the Atlanta Semantic Web Meetup tonight, Vishy Dasari gave us a quick description and demo of a new search engine called Semantifi.  They purportedly are a search engine for the deep web, meaning the web that is not indexed by traditional search engines because the content is dynamic.  They are just in the very [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>At the <a href="http://www.meetup.com/The-Atlanta-Semantic-Web-Meetup/" target="_blank">Atlanta Semantic Web Meetup</a> tonight, Vishy Dasari gave us a quick description and demo of a new search engine called <a href="http://semantifi.com" target="_blank">Semantifi</a>.  It is purportedly a search engine for the deep web, meaning the part of the web that traditional search engines do not index because the content is dynamic.  Semantifi is in the very early stages, but the site is open for people to play with and to add data to via &#8220;Apps.&#8221;  These Apps are something like agents that respond to queries, returning results to a marshal process that decides which App gets to answer.  Results are ranked by some method I wasn&#8217;t able to ascertain, but it reminded me of how <a href="http://blog.amyiris.com/2009/06/extending-amy-iris-python.html" target="_blank">Amy Iris</a> works.  The Apps seem to form the backbone of the Semantifi system, and Semantifi is crowdsourcing their creation.  You can create a very simple App that answers queries on your own data set in a few short minutes.</p>
<p>Perhaps more interesting is that Semantifi offers a natural language interface in addition to the standard query interface we&#8217;re all used to.  Given the small amount of data currently available, I couldn&#8217;t really judge how well this interface performs.  It is based on a cognitive theory by John Hawks (sp?) that apparently holds that we think in terms of patterns.  That&#8217;s very general, I haven&#8217;t been able to chase down the reference, and I forgot to ask Vishy for more info at the meetup.  If someone can clear that up for me, I&#8217;d be grateful.  The only seemingly relevant <a href="http://johnhawks.net/weblog" target="_blank">John Hawks</a> I could find is a paleoanthropologist, so I&#8217;m not sure.  Anyhow, Vishy says these patterns are what the system uses to interpret natural language input.  That may be a grandiose way of saying n-gram matching.</p>
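<p>Vishy&#8217;s description of Apps answering queries through a marshal, together with the guess that the &#8220;patterns&#8221; boil down to n-gram matching, can be sketched in a few lines of Ruby. This is purely hypothetical: the <code>App</code> struct, the <code>ngrams</code> and <code>marshal</code> helpers, the example phrases, and the overlap-count scoring are assumptions for illustration, not anything Semantifi has described.</p>

```ruby
require 'set'

# Hypothetical sketch: Apps register example phrases, and a marshal routes a
# query to the App whose phrases share the most word n-grams with it.
# None of these names come from Semantifi; they are illustrative assumptions.

# Set of word n-grams (n = 1..max_n) in a lowercased string.
def ngrams(text, max_n = 2)
  words = text.downcase.scan(/[a-z0-9]+/)
  grams = Set.new
  (1..max_n).each do |n|
    words.each_cons(n) { |gram| grams << gram.join(' ') }
  end
  grams
end

App = Struct.new(:name, :patterns) do
  # Score = number of query n-grams that also occur in this App's patterns.
  def score(query_grams)
    pattern_grams = patterns.map { |p| ngrams(p) }.reduce(Set.new, :|)
    (query_grams & pattern_grams).size
  end
end

# Return the best-scoring App, or nil if no App shares any n-gram with the query.
def marshal(apps, query)
  query_grams = ngrams(query)
  best = apps.max_by { |app| app.score(query_grams) }
  best && best.score(query_grams) > 0 ? best : nil
end

apps = [
  App.new('mortgage', ['monthly mortgage payment', 'mortgage rate for a loan']),
  App.new('weather',  ['weather forecast for atlanta', 'will it rain tomorrow'])
]
puts marshal(apps, 'What is my monthly mortgage payment?').name # prints mortgage
```

<p>Counting shared n-grams is about the crudest possible ranking; it only shows the shape of an App-plus-marshal design, not how well such a system would actually answer.</p>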
<p>While Wolfram|Alpha is a computational knowledge engine™, Semantifi does not make that claim.  Apps may compute certain things, like mortgage values, but Semantifi is not a general-purpose calculator.  However, Semantifi is looking at bringing in unstructured data from blogs and the like, which W|A ignores.  It remains to be seen what that will look like, though.  Also, users can contribute to Semantifi, whereas W|A is a black box.  In any case, they are making interesting claims, and I look forward to seeing how they play out with more data.</p>
<p><em>Note: All of my observations are based on notes and memories of tonight&#8217;s presentation, so if I made any mistakes please post corrections in the comments or email me.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://mendicantbug.com/2010/02/06/semantifi-and-the-deep-web/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/ebec6abd2b9f1eb4de865aed01242171?s=96&amp;d=monsterid&amp;r=PG" medium="image">
			<media:title type="html">ealdent</media:title>
		</media:content>
	</item>
		<item>
		<title>NLP Resources for Ruby</title>
		<link>http://mendicantbug.com/2009/09/13/nlp-resources-for-ruby/</link>
		<comments>http://mendicantbug.com/2009/09/13/nlp-resources-for-ruby/#comments</comments>
		<pubDate>Sun, 13 Sep 2009 06:28:02 +0000</pubDate>
		<dc:creator>Jason Adams</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[nlp]]></category>
		<category><![CDATA[parsers]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[stemmers]]></category>
		<category><![CDATA[wordnet]]></category>

		<guid isPermaLink="false">http://mendicantbug.com/?p=1268</guid>
		<description><![CDATA[There are quite a few well-known libraries for doing various NLP tasks in Java and Python, such as the Stanford Parser (Java) and the Natural Language Toolkit (Python).  For Ruby, there are a few resources out there, but they are usually derivative or not as mature.  By derivative, I mean they are ports from other [&#8230;]]]></description>
				<content:encoded><![CDATA[<p style="text-align:justify;">There are quite a few well-known libraries for doing various NLP tasks in Java and Python, such as the Stanford Parser (Java) and the Natural Language Toolkit (Python).  For Ruby, there are a few resources out there, but they are usually derivative or not as mature.  By derivative, I mean they are ports from other languages or extensions using code from another language.  And I&#8217;m responsible for two of them! :)</p>
<ul>
<li><a title="Treat" href="https://github.com/louismullie/treat" target="_blank">Treat</a> &#8211; Text REtrieval and Annotation Toolkit, definitely the most comprehensive toolkit I&#8217;ve encountered so far for Ruby
<ul>
<li>Text extractors for various document formats</li>
<li>Chunkers, segmenters, tokenizers</li>
<li>LDA</li>
<li>much more &#8211; the list is big</li>
</ul>
</li>
<li><a href="http://www.deveiate.org/projects/Linguistics/" target="_blank">Ruby Linguistics</a> &#8211; this is one of the more ambitious projects, but is not as mature as NLTK
<ul>
<li>interface for WordNet</li>
<li>Link grammar parser</li>
<li>some inflection stuff</li>
</ul>
</li>
<li><a title="Stanford Core NLP" href="https://github.com/louismullie/stanford-core-nlp" target="_blank">Stanford Core NLP</a> &#8211; if you&#8217;ve gotten a headache trying to use the Java bridge, this is your answer</li>
<li><a href="http://rubyforge.org/projects/stanfordparser/" target="_blank">Stanford Parser</a> interface &#8211; uses a Java bridge to access the Stanford Parser library</li>
<li><a href="http://www.markwatson.com/" target="_blank">Mark Watson</a> has a <a href="http://www.markwatson.com/opensource/rubytagger_0.1.1.zip" target="_blank">part of speech tagger</a> [zip], a <a href="http://www.markwatson.com/opensource/rubyreuters_0.1.zip" target="_blank">text categorizer</a> [zip], and <a href="http://www.markwatson.com/opensource/ruby_read_docs.zip" target="_blank">some text extraction utilities</a> [zip], but I haven&#8217;t tried to use them yet</li>
<li><a href="http://github.com/ealdent/lda-ruby" target="_blank">LDA Ruby Gem</a> &#8211; Ruby port of David Blei&#8217;s lda-c library by yours truly
<ul>
<li>Uses Blei&#8217;s C code for the actual LDA, but I include some wrappers to make using it a bit easier</li>
</ul>
</li>
<li><a href="http://github.com/ealdent/uea-stemmer" target="_blank">UEA Stemmer</a> &#8211; Ruby port (again by yours truly) of a conservative stemmer based on Jenkins and Smith&#8217;s <a href="http://www.uea.ac.uk/cmp/research/graphicsvisionspeech/speech/WordStemming" target="_blank">UEA Stemmer</a></li>
<li><a href="http://rubyforge.org/projects/stemmer/" target="_blank">Stemmer gem</a> &#8211; <a href="http://tartarus.org/~martin/PorterStemmer/" target="_blank">Porter stemmer</a></li>
<li><a href="http://www.locknet.ro/projects/ann-ruby-stemmer" target="_blank">Lingua Stemmer</a> &#8211; another stemming library, Porter stemmer</li>
<li><a href="http://www.deveiate.org/projects/Ruby-WordNet/" target="_blank">Ruby WordNet</a> &#8211; basically what&#8217;s included in Ruby Linguistics</li>
<li><a href="http://sourceforge.net/projects/raspell/" target="_blank">Raspell</a> &#8211; Ruby interface to Aspell spell checker</li>
</ul>
<p style="text-align:justify;">There are also a number of fledgling or orphaned projects out there purporting to be ports of or interfaces to various other libraries, like the Stanford POS Tagger and Named Entity Recognizer.  Ruby (plain Ruby, not just JRuby) can interface with just about any Java library using the <a href="http://rjb.rubyforge.org/" target="_blank">Ruby Java Bridge</a> (RJB).  RJB can be a pain, and I could only initialize it once per run (a second attempt never succeeds), so there are some limitations.  But using it, I was able to easily interface with the Stanford POS tagger.</p>
<p style="text-align:justify;">So while there aren&#8217;t terribly many libraries for NLP tasks in Ruby, the ability to interface with Java directly widens the scope quite a bit.  You can also incorporate a C library using Ruby&#8217;s C extensions.</p>
<p style="text-align:justify;">Naturally, if I missed anything, no matter how small, please let me know.</p>
<p style="text-align:justify;"><em>Update:</em> Here is a great list of <a href="http://web.media.mit.edu/~dustin/papers/ai_ruby_plugins/" target="_blank">AI-related Ruby libraries</a> from Dustin Smith.</p>
<br />Posted in Uncategorized]]></content:encoded>
			<wfw:commentRss>http://mendicantbug.com/2009/09/13/nlp-resources-for-ruby/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/ebec6abd2b9f1eb4de865aed01242171?s=96&amp;d=monsterid&amp;r=PG" medium="image">
			<media:title type="html">ealdent</media:title>
		</media:content>
	</item>
		<item>
		<title>Lazyfeed: the missing link in the evolution of RSS?</title>
		<link>http://mendicantbug.com/2009/08/01/lazyfeed-the-missing-link-in-the-evolution-of-rss/</link>
		<comments>http://mendicantbug.com/2009/08/01/lazyfeed-the-missing-link-in-the-evolution-of-rss/#comments</comments>
		<pubDate>Sat, 01 Aug 2009 13:00:09 +0000</pubDate>
		<dc:creator>Jason Adams</dc:creator>
				<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[exploratory search]]></category>
		<category><![CDATA[google alerts]]></category>
		<category><![CDATA[information trapping]]></category>
		<category><![CDATA[invites]]></category>
		<category><![CDATA[lazyfeed]]></category>
		<category><![CDATA[recommender system]]></category>
		<category><![CDATA[rss]]></category>
		<category><![CDATA[topic detection]]></category>

		<guid isPermaLink="false">http://mendicantbug.com/?p=1240</guid>
		<description><![CDATA[When Lazyfeed announced a limited round of beta invites on TechCrunch, I admit, I lusted after them.  Only 250?  I wanted to be one!  But alas, I was put on the waiting list.  It&#8217;s a decent marketing strategy for building up some hype.  When I finally did get my invite, I tried them out for [&#8230;]]]></description>
				<content:encoded><![CDATA[<p style="text-align:justify;">When <a href="http://www.lazyfeed.com" target="_blank">Lazyfeed</a> announced a limited round of beta invites on <a href="http://www.techcrunch.com/2009/07/10/lazyfeed-wakes-up-%E2%80%94-and-we-have-invites/" target="_blank">TechCrunch</a>, I admit, I lusted after them.  Only 250?  I wanted to be one!  But alas, I was put on the waiting list.  It&#8217;s a decent marketing strategy for building up some hype.  When I finally did get my invite, I tried it out for about 5 minutes and then fell prey to the distractions of the internet.  That was a bad sign.  Usually a new service can hold my attention for a little while longer.  So what happened?</p>
<h3>Lazy-wha?</h3>
<p style="text-align:justify;">Lazyfeed is a service that lets you enter topics, blogs, and Twitter, Delicious, and Flickr accounts to form a live streaming <em>lazyfeed</em>.  You then get live updates as new items appear under your tags.  Your main screen consists of a bunch of boxes with your topics and then things it guesses are related.</p>
<h3>The hook</h3>
<p style="text-align:justify;">Lazyfeed&#8217;s marketing strategy succeeded again by giving me three invites to hand out to friends.  I offered them on <a class="zem_slink" title="Twitter" rel="homepage" href="http://twitter.com">Twitter</a>, having only one person bite.  So here are the other two invites for the adventurous.  Get em while they&#8217;re hot.  If you manage to take one, please comment that you did so, so that I can at least know who you were and we can save someone else the wasted time.  I&#8217;m just throwing them into the ether like this because I don&#8217;t feel like pushing them on Twitter again.</p>
<blockquote><p><span style="text-decoration:line-through;">NTI1MzMxMjc5ZVhmUTl5cDBiek1R<br />
</span><span style="text-decoration:line-through;"> OTk5MTUwNjczN3JCLklmZHhjMDdV</span></p></blockquote>
<h3>The rub</h3>
<p style="text-align:justify;">Lazyfeed is a lovely service in terms of appearance and ajaxy goodness, but my initial impression is that it ends up being streaming <a class="zem_slink" title="Information overload" rel="wikipedia" href="http://en.wikipedia.org/wiki/Information_overload">information overload</a>.  For one, the topic suggestion feature appears to be fairly naive.  Someone correct me if I&#8217;m wrong, but it looks a bit like document similarity for topics is done purely by one-for-one matching on tags.  Whatever the method, the result of their suggested topics (&#8220;Stuff for Lazy Jason&#8221;) is stuff like the following:</p>
<div id="attachment_1241" class="wp-caption aligncenter" style="width: 510px"><a href="http://ealdent.files.wordpress.com/2009/08/picture-1.png"><img class="size-full wp-image-1241" title="Lazyfeed sample results" src="http://ealdent.files.wordpress.com/2009/08/picture-1.png?w=614" alt="Lazyfeed sample results"   /></a><p class="wp-caption-text">Lazyfeed sample suggested topics</p></div>
<p style="text-align:justify;">Granted, it&#8217;s a hard problem, but those results are pretty bad.  So as I started to write this post lambasting this service, I considered that maybe I was just seeing cold-start problems, and I was being unfair.  So I trained it with some additional feeds and topics that are straight-to-the-point of stuff I&#8217;m interested in, like <em>sigir2009</em>, <em>topicmodeling</em>, <em>recommendersystems</em>, etc.  Tags can contain no spaces, btw, which is why those don&#8217;t.  When I tried using dashes, like I often do on delicious, it gives no results.  I also removed some things that were too general or contained too many spurious results.</p>
<h3>The light</h3>
<p style="text-align:justify;">Things started improving here, and I actually began to understand the point of Lazyfeed.  My initial confusion was that I thought &#8220;Stuff for Lazy Jason&#8221; was stuff that I would want to read right now.  Being lazy, I didn&#8217;t expect to have to do work to get those things.  But &#8220;Stuff for Lazy Jason&#8221; is actually a list of topics it thinks I might be interested in.  Saving any one of those puts it into my lazyfeed, which is in the bar on the left.</p>
<div id="attachment_1242" class="wp-caption aligncenter" style="width: 510px"><a href="http://ealdent.files.wordpress.com/2009/08/picture-2.png"><img class="size-full wp-image-1242" title="My lazyfeed topics" src="http://ealdent.files.wordpress.com/2009/08/picture-2.png?w=614" alt="My lazyfeed topics"   /></a><p class="wp-caption-text">My lazyfeed topics</p></div>
<p style="text-align:justify;">So now what happens is that occasionally it discovers something new related to my interests, bumps that category to the top of the list, and turns it bold again (grayed-out topics have been read).  Most of my topics are low traffic, so add something like <em>mariahcarey</em> if you want to see this functionality in action.  Now we&#8217;re getting somewhere.  It has actually started being helpful and has found me some stuff that my Google alerts haven&#8217;t.  That is weird, and it makes me think I need to double-check that my Google alerts are working&#8230;</p>
<h3>The end</h3>
<p style="text-align:justify;">My takeaway after using Lazyfeed for nigh on two hours is that it&#8217;s an interesting alternative (or even extension) to <a class="zem_slink" title="RSS" rel="wikipedia" href="http://en.wikipedia.org/wiki/RSS">RSS</a>, but one that still hasn&#8217;t crossed the bridge to the next stage in evolution.  The idea is solid: automatically discover stuff in the sea of human knowledge (or human idiocy) and serve it up fresh.  The implementation, however, lacks robust topic detection, which it will need if it is to become a useful stream of relevant information rather than another source of information overload.  Relevance is an ephemeral thing, given that your information needs change from day to day.  Lazyfeed makes it pretty easy to get rid of old topics and add new ones, even if some of its suggestions are still wonky.  It&#8217;s an interesting recommender system problem with a lot of potential.</p>
]]></content:encoded>
			<wfw:commentRss>http://mendicantbug.com/2009/08/01/lazyfeed-the-missing-link-in-the-evolution-of-rss/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/ebec6abd2b9f1eb4de865aed01242171?s=96&amp;d=monsterid&amp;r=PG" medium="image">
			<media:title type="html">ealdent</media:title>
		</media:content>

		<media:content url="http://ealdent.files.wordpress.com/2009/08/picture-1.png" medium="image">
			<media:title type="html">Lazyfeed sample results</media:title>
		</media:content>

		<media:content url="http://ealdent.files.wordpress.com/2009/08/picture-2.png" medium="image">
			<media:title type="html">My lazyfeed topics</media:title>
		</media:content>

	</item>
		<item>
		<title>Updates to lda-ruby gem</title>
		<link>http://mendicantbug.com/2009/07/30/updates-to-lda-ruby-gem/</link>
		<comments>http://mendicantbug.com/2009/07/30/updates-to-lda-ruby-gem/#comments</comments>
		<pubDate>Fri, 31 Jul 2009 02:20:25 +0000</pubDate>
		<dc:creator>Jason Adams</dc:creator>
				<category><![CDATA[c]]></category>
		<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[latent dirichlet allocation]]></category>
		<category><![CDATA[lda]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[nlp]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[rubygems]]></category>
		<category><![CDATA[topic modeling]]></category>

		<guid isPermaLink="false">http://mendicantbug.com/?p=1230</guid>
		<description><![CDATA[A while back I ported David Blei&#8217;s lda-c code for performing Latent Dirichlet Allocation to Ruby.  Basically I just wrapped the C methods in a Ruby class, turned it into a gem, and called it a day.  The result was a bit ugly and unwieldy, like most research code.  A few months later, Todd Fisher [&#8230;]]]></description>
				<content:encoded><![CDATA[<p style="text-align:justify;"><a href="http://ealdent.files.wordpress.com/2009/07/works-on-my-machine-starburst.jpg"><img class="alignright size-full wp-image-1232" style="margin:3px 4px;" title="works-on-my-machine-starburst" src="http://ealdent.files.wordpress.com/2009/07/works-on-my-machine-starburst.jpg?w=614" alt="works-on-my-machine-starburst"   /></a>A while back <a href="http://mendicantbug.com/2008/11/17/lda-in-ruby/" target="_self">I ported</a> David Blei&#8217;s <a href="http://www.cs.princeton.edu/~blei/lda-c/" target="_blank">lda-c</a> code for performing <a class="zem_slink" title="Latent Dirichlet allocation" rel="wikipedia" href="http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation">Latent Dirichlet Allocation</a> to <a class="zem_slink" title="Ruby (programming language)" rel="homepage" href="http://www.ruby-lang.org/">Ruby</a>.  Basically I just wrapped the C methods in a Ruby class, turned it into <a href="http://github.com/ealdent/lda-ruby" target="_blank">a gem</a>, and called it a day.  The result was a bit ugly and unwieldy, like most research code.  A few months later, <a href="http://github.com/taf2" target="_blank">Todd Fisher</a> came along and discovered a couple of bugs and memory leaks in the C code, for which I am very grateful.  I had been toying with the idea of improving the Ruby code, so I embarked on a mission to do so.  The result is, I hope, a much cleaner <a class="zem_slink" title="RubyGems" rel="homepage" href="http://docs.rubygems.org">gem</a> that can be used right out of the box with little screwing around.</p>
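<p style="text-align:justify;">For readers wondering what LDA actually computes, here is a small, self-contained Ruby sketch of collapsed Gibbs sampling for LDA. It is a swapped-in technique: lda-c, and therefore the gem that wraps it, fits the model with variational EM instead, and the <code>TinyLda</code> class is hypothetical, not part of the gem&#8217;s API. Documents are assumed to be arrays of integer word ids.</p>

```ruby
# Minimal collapsed Gibbs sampler for LDA, for illustration only.
# lda-c (wrapped by the gem) uses variational EM instead.
class TinyLda
  def initialize(docs, num_topics:, vocab_size:, alpha: 0.1, beta: 0.01, seed: 42)
    @docs = docs
    @num_topics = num_topics
    @vocab_size = vocab_size
    @alpha = alpha
    @beta = beta
    @rng = Random.new(seed)
    @doc_topic = Array.new(docs.size) { Array.new(num_topics, 0) }   # n_dk
    @topic_word = Array.new(num_topics) { Array.new(vocab_size, 0) } # n_kw
    @topic_total = Array.new(num_topics, 0)                          # n_k
    # Start from a random topic assignment z for every token.
    @z = docs.each_with_index.map do |doc, d|
      doc.map do |w|
        k = @rng.rand(num_topics)
        update_counts(d, w, k, 1)
        k
      end
    end
  end

  # Each sweep resamples every token's topic given all other assignments.
  def run(iterations)
    iterations.times do
      @docs.each_with_index do |doc, d|
        doc.each_with_index do |w, i|
          update_counts(d, w, @z[d][i], -1) # forget the current assignment
          k = sample_topic(d, w)
          @z[d][i] = k
          update_counts(d, w, k, 1)
        end
      end
    end
    self
  end

  # phi[k][w]: smoothed estimate of P(word w | topic k).
  def topic_word_distribution
    @topic_word.each_with_index.map do |counts, k|
      denom = @topic_total[k] + @vocab_size * @beta
      counts.map { |c| (c + @beta) / denom }
    end
  end

  private

  def update_counts(d, w, k, delta)
    @doc_topic[d][k] += delta
    @topic_word[k][w] += delta
    @topic_total[k] += delta
  end

  # Draw k with probability proportional to
  # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
  def sample_topic(d, w)
    weights = Array.new(@num_topics) do |k|
      (@doc_topic[d][k] + @alpha) * (@topic_word[k][w] + @beta) /
        (@topic_total[k] + @vocab_size * @beta)
    end
    r = @rng.rand * weights.sum
    weights.each_with_index do |weight, k|
      r -= weight
      return k if r <= 0
    end
    @num_topics - 1 # guard against floating-point round-off
  end
end

docs = [[0, 1, 2, 0, 1, 2], [0, 1, 2, 2, 1, 0], [3, 4, 5, 3, 4, 5], [5, 4, 3, 3, 4, 5]]
phi = TinyLda.new(docs, num_topics: 2, vocab_size: 6).run(50).topic_word_distribution
phi.each { |row| puts row.map { |prob| prob.round(3) }.inspect }
```

<p style="text-align:justify;">Each row of <code>topic_word_distribution</code> is a probability distribution over the vocabulary; which words end up grouped under which topic depends on the seed, the priors, and the number of sweeps.</p>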
<p style="text-align:justify;">Unfortunately, I did something I&#8217;m ashamed of.  Ruby gems are notorious for breaking backwards compatibility, and I have done just that.  The good news is that your code will almost work, assuming you didn&#8217;t dive into the Document and Corpus classes too heavily.  If you did, you will probably experience a lot of breakage.  The result, I hope, is a more sensible implementation, so maybe you won&#8217;t hate me.  Of course, I could be wrong and my implementation could still be crap.  If that&#8217;s the case, please let me know what needs to be improved.</p>
<p>To install the gem:</p>
<p style="padding-left:30px;"><code>gem sources -a <a href="http://gems.github.com" rel="nofollow">http://gems.github.com</a><br />
sudo gem install ealdent-lda-ruby</code></p>
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://mendicantbug.com/2009/07/30/updates-to-lda-ruby-gem/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/ebec6abd2b9f1eb4de865aed01242171?s=96&amp;d=monsterid&amp;r=PG" medium="image">
			<media:title type="html">ealdent</media:title>
		</media:content>

		<media:content url="http://ealdent.files.wordpress.com/2009/07/works-on-my-machine-starburst.jpg" medium="image">
			<media:title type="html">works-on-my-machine-starburst</media:title>
		</media:content>

	</item>
		<item>
		<title>Porting the UEA-Lite Stemmer to Ruby</title>
		<link>http://mendicantbug.com/2009/07/16/porting-the-uea-lite-stemmer-to-ruby/</link>
		<comments>http://mendicantbug.com/2009/07/16/porting-the-uea-lite-stemmer-to-ruby/#comments</comments>
		<pubDate>Thu, 16 Jul 2009 11:00:44 +0000</pubDate>
		<dc:creator>Jason Adams</dc:creator>
				<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[finite state transducers]]></category>
		<category><![CDATA[github]]></category>
		<category><![CDATA[information retrieval]]></category>
		<category><![CDATA[nlp]]></category>
		<category><![CDATA[open source software]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[stemmers]]></category>
		<category><![CDATA[stemming]]></category>

		<guid isPermaLink="false">http://mendicantbug.com/?p=1224</guid>
		<description><![CDATA[A twitter friend (@communicating) tipped me off to the UEA-Lite Stemmer by Marie-Claire Jenkins and Dan J. Smith.  Stemmers are NLP tools that get rid of inflectional and derivational affixes from words.  In English, that usually means getting rid of the plural -s, progressive -ing, and preterite -ed.  Depending on the type of stemmer, that [&#8230;]]]></description>
				<content:encoded><![CDATA[<p style="text-align:justify;">A Twitter friend (<a href="http://twitter.com/communicating" target="_blank">@communicating</a>) tipped me off to the <a href="http://www.uea.ac.uk/cmp/research/graphicsvisionspeech/speech/WordStemming" target="_blank">UEA-Lite Stemmer</a> by <a href="http://www.scienceforseo.com/" target="_blank">Marie-Claire Jenkins</a> and Dan J. Smith.  Stemmers are <a class="zem_slink" title="Natural language processing" rel="wikipedia" href="http://en.wikipedia.org/wiki/Natural_language_processing">NLP</a> tools that strip inflectional and derivational <a class="zem_slink" title="Affix" rel="wikipedia" href="http://en.wikipedia.org/wiki/Affix">affixes</a> from words.  In English, that usually means removing the plural <em>-s</em>, progressive <em>-ing</em>, and preterite <em>-ed</em>.  Depending on the type of <a class="zem_slink" title="Stemming" rel="wikipedia" href="http://en.wikipedia.org/wiki/Stemming">stemmer</a>, it might also mean removing derivational <a class="zem_slink" title="Suffix" rel="wikipedia" href="http://en.wikipedia.org/wiki/Suffix">suffixes</a> like <em>-ful</em> and <em>-ness</em>.  Sometimes it&#8217;s useful to be able to reduce words like <em>consolation</em> and <em>console</em> to the same root form: <em>consol</em>.  But sometimes that doesn&#8217;t make sense.  If you&#8217;re searching for video game consoles, you don&#8217;t want to find documents about consolation.  In this case, you need a conservative stemmer.</p>
<p style="text-align:justify;">The UEA-Lite Stemmer is a rule-based, conservative stemmer that handles regular words, proper nouns and acronyms.  It was originally written in <a class="zem_slink" title="Perl" rel="homepage" href="http://www.perl.org/">Perl</a> and had since been ported to <a class="zem_slink" title="Java (software platform)" rel="homepage" href="http://java.sun.com">Java</a>.  Since I usually code in Ruby these days, I thought it&#8217;d be nice to make it available to the Ruby community, so I ported it over last night.</p>
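<p style="text-align:justify;">To illustrate what &#8220;conservative&#8221; means in practice, here is a toy rule-based stemmer in Ruby. It only strips the inflectional suffixes mentioned above (plural <em>-s</em>, progressive <em>-ing</em>, preterite <em>-ed</em>), leaves derivational suffixes alone so that <em>consolation</em> and <em>console</em> stay distinct, and skips capitalized words and acronyms. The rule list and the <code>conservative_stem</code> name are assumptions for illustration; this is far simpler than the actual UEA-Lite rule set and is not a port of it.</p>

```ruby
# Toy conservative stemmer (illustrative rules, NOT the UEA-Lite rule set).
# Rules are tried in order; the first matching pattern wins.
RULES = [
  [/sses\z/, 'ss'],                # classes   -> class
  [/ies\z/, 'y'],                  # ponies    -> pony
  [/(ss|us|is)\z/, '\1'],          # class, corpus, this: left unchanged
  [/s\z/, ''],                     # consoles  -> console
  [/eed\z/, 'eed'],                # agreed    -> agreed (left unchanged)
  [/\A(.*[aeiouy].*)ing\z/, '\1'], # searching -> search (stem needs a vowel)
  [/\A(.*[aeiouy].*)ed\z/, '\1']   # searched  -> search
].freeze

def conservative_stem(word)
  # Proper nouns and acronyms (anything starting with a capital) are kept as-is.
  return word if word.match?(/\A[A-Z]/)
  # Very short words are too risky to strip.
  return word if word.length <= 3

  RULES.each do |pattern, replacement|
    return word.sub(pattern, replacement) if word.match?(pattern)
  end
  word
end

%w[consoles consolation searching searched classes thing NASA Jenkins].each do |w|
  puts "#{w} -> #{conservative_stem(w)}"
end
```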
<p style="text-align:justify;">The code is open source under the <a class="zem_slink" title="Apache License" rel="wikipedia" href="http://en.wikipedia.org/wiki/Apache_License">Apache 2 License</a> and hosted on <a href="http://github.com/ealdent/uea-stemmer/tree/master" target="_blank">GitHub</a>.  So please check out the code and let me know what you think.  Heck, you can even fork the project and make some improvements yourself if you want.</p>
<p style="text-align:justify;">One direction I&#8217;d like to be able to go is to turn all of the rules into <a href="http://en.wikipedia.org/wiki/Finite_state_transducer" target="_blank">finite state transducers</a>, which can be composed into a single large deterministic finite state transducer.  That would be a lot more efficient (and even fun!), but Ruby lacks a decent FST implementation.</p>
]]></content:encoded>
			<wfw:commentRss>http://mendicantbug.com/2009/07/16/porting-the-uea-lite-stemmer-to-ruby/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/ebec6abd2b9f1eb4de865aed01242171?s=96&amp;d=monsterid&amp;r=PG" medium="image">
			<media:title type="html">ealdent</media:title>
		</media:content>

	</item>
		<item>
		<title>First Impressions of Wolfram|Alpha</title>
		<link>http://mendicantbug.com/2009/05/16/first-impressions-of-wolframalpha/</link>
		<comments>http://mendicantbug.com/2009/05/16/first-impressions-of-wolframalpha/#comments</comments>
		<pubDate>Sun, 17 May 2009 04:42:46 +0000</pubDate>
		<dc:creator>Jason Adams</dc:creator>
				<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[google squared]]></category>
		<category><![CDATA[knowledge engines]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[stephen wolfram]]></category>
		<category><![CDATA[wikipedia]]></category>
		<category><![CDATA[wolfram alpha]]></category>

		<guid isPermaLink="false">http://mendicantbug.com/?p=1158</guid>
		<description><![CDATA[Perhaps you&#8217;ve heard of the latest brainchild of the Wunderkind Stephen Wolfram:  Wolfram&#124;Alpha.  Matthew Hurst nicknamed it Alphram today and I agree that&#8217;s a much better name.   Wolfram&#124;Alpha (W&#124;A henceforth) is not a search engine, it&#8217;s a knowledge engine.  It will compete with Google on a slice of traffic that Google really isn&#8217;t all that [&#8230;]]]></description>
				<content:encoded><![CDATA[<p style="text-align:justify;">Perhaps you&#8217;ve heard of the latest brainchild of wunderkind <a class="zem_slink" title="Stephen Wolfram" rel="wikipedia" href="http://en.wikipedia.org/wiki/Stephen_Wolfram">Stephen Wolfram</a>:  <a href="http://wolframalpha.com" target="_blank">Wolfram|Alpha</a>.  Matthew Hurst nicknamed it <a href="http://datamining.typepad.com/data_mining/2009/05/alphram-and-google.html" target="_blank">Alphram</a> today, and I agree that&#8217;s a much better name.  Wolfram|Alpha (W|A henceforth) is not a <a class="zem_slink" title="Web search engine" rel="wikipedia" href="http://en.wikipedia.org/wiki/Web_search_engine">search engine</a>; it&#8217;s a knowledge engine.  It will compete with <a class="zem_slink" title="Google" rel="homepage" href="http://google.com">Google</a> for a slice of traffic where Google currently isn&#8217;t all that strong: comparative question answering.  When you <a href="http://www.google.com/search?source=ig&amp;hl=en&amp;rlz=1G1GGLQ_ENUS245&amp;=&amp;q=how+does+the+gdp+of+south+africa+compare+to+china%3F&amp;btnG=Google+Search&amp;aq=f&amp;oq=" target="_blank">ask Google</a> something like &#8220;How does the <a class="zem_slink" title="Gross domestic product" rel="wikipedia" href="http://en.wikipedia.org/wiki/Gross_domestic_product">GDP</a> of <a class="zem_slink" title="South Africa" rel="wikipedia" href="http://en.wikipedia.org/wiki/South_Africa">South Africa</a> compare to <a class="zem_slink" title="China" rel="geolocation" href="http://maps.google.com/maps?ll=35.0,105.0&amp;spn=10.0,10.0&amp;q=35.0,105.0%20%28China%29&amp;t=h">China</a>?&#8221; you hope to get back something relevant in the first few results (<em>spoiler alert:  you don&#8217;t</em>).  When you ask that of W|A, you get <a href="http://www.wolframalpha.com/input/?i=how+does+the+gdp+of+south+africa+compare+to+china%3F" target="_blank">exactly what you&#8217;re looking for</a>.  Beautiful.  
W|A&#8217;s so-called natural language interface isn&#8217;t perfect, though.  It can be pretty flaky until you start to recognize what works and what doesn&#8217;t.</p>
<p style="text-align:justify;">Now let&#8217;s be honest.  How often do we search for that kind of thing?  Not very often.  I think that&#8217;s partly because Google is notoriously bad at it.  Once we start to get a handle on what W|A is capable of, I think people will start expecting more of their friendly neighborhood search giant.  Google claims to have <a href="http://www.techcrunch.com/2009/05/12/what-is-google-squared-it-is-how-google-will-crush-wolfram-alpha-exclusive-video/" target="_blank">a few tricks</a> up its sleeves, but everything <a href="http://googleblog.blogspot.com/2009/04/adding-search-power-to-public-data.html" target="_blank">I&#8217;ve seen out</a> of Google lately has been such a disappointment I am deeply skeptical.  The new trick is called Google Squared and it returns search results in a spreadsheet format, breaking down the various facets of the things you are searching for.  In the demo, it shows stuff like rollercoaster drop speeds, heights, etc when you search for <a class="zem_slink" title="Roller coaster" rel="wikipedia" href="http://en.wikipedia.org/wiki/Roller_coaster">roller coasters</a>.  You can add to the square and do some pretty nifty stuff.  <a class="zem_slink" title="TechCrunch" rel="homepage" href="http://www.techcrunch.com">TechCrunch</a> claims this will kill W|A.  I think the two could be complementary.  Based on the demo, I expect W|A will return results of a higher calibre, but will miss out on a lot of queries because the knowledge is just missing.  Google Squared appears to be doing something fuzzier and will return results that might be really bad.  So while W|A just says it doesn&#8217;t know, Google Squared will let you pick through the junk to find the gem.  Google Squared is expected to launch later this month in <a class="zem_slink" title="Google Labs" rel="homepage" href="http://labs.google.com">Google Labs</a>.</p>
<p style="text-align:justify;">Many have said that where W|A will really compete is with Wikipedia, and I am inclined to agree.  There are plenty of things I go to Wikipedia for now that I will probably look up on W|A instead, like populations of countries, the sizes of Neptune&#8217;s moons, and so on.  Wikipedia still wins for more in-depth knowledge on a topic.  W|A also does some pretty cool stuff when you search for the definition of a word (use a query like &#8220;<a href="http://www.wolframalpha.com/input/?i=word+kitten" target="_blank">word kitten</a>&#8221;).  You learn that kitten comes from <a class="zem_slink" title="Classical Latin" rel="wikipedia" href="http://en.wikipedia.org/wiki/Classical_Latin">Classical Latin</a> and entered English about 700 years ago.  You can find similar information (and go deeper into the etymology, at least) using <a class="zem_slink" title="The American Heritage Dictionary (based on the New Second College Edition)" rel="amazon" href="http://www.amazon.com/American-Heritage-Dictionary-Second-College/dp/0440201896%3FSubscriptionId%3D0G81C5DAZ03ZR9WH9X82%26tag%3Dthemenbug-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D0440201896">the American Heritage Dictionary</a> on <a href="http://dictionary.reference.com" target="_blank">dictionary.com</a>, but W|A requires less digging.</p>
<p style="text-align:justify;">And this brings me around to a key point with W|A.  It&#8217;s an awesome factoid answering service.  It does that well, and it does it in a pretty way.  Information you could find in more depth elsewhere you can get quickly and easily via W|A, but only superficially.  There are links to more information, though, so you don&#8217;t lose much by relying on W|A, assuming it has knowledge about what you&#8217;re looking for.  You&#8217;re still more likely to hit a brick wall with W|A.</p>
<p style="text-align:justify;">And of course, since Wolfram developed <a class="zem_slink" title="Mathematica" rel="amazon" href="http://www.amazon.com/Mathematica-Stephen-Wolfram/dp/0201193302%3FSubscriptionId%3D0G81C5DAZ03ZR9WH9X82%26tag%3Dthemenbug-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D0201193302">Mathematica</a>, W|A is backed by it.  Enter an equation and you get some really handy math info back.  Need to quickly know the derivative of a fairly complicated function?  Presto.  Probably the most satisfying feeling I got today was from a query similar to &#8220;<a href="http://www.wolframalpha.com/input/?i=what+is+the+area+under+x^4%2B3x^2%2B4+from+1+to+8%3F" target="_blank">what is the area under x^4+3x^2+4 from 1 to 8?</a>&#8221;  Let&#8217;s see you answer that, Google Squared.</p>
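<p style="text-align:justify;">For the record, here is that query worked by hand, integrating term by term and evaluating the antiderivative at the two bounds:</p>

```latex
\int_{1}^{8} \left(x^4 + 3x^2 + 4\right)\,dx
  = \left[\frac{x^5}{5} + x^3 + 4x\right]_{1}^{8}
  = \left(\frac{32768}{5} + 512 + 32\right) - \left(\frac{1}{5} + 1 + 4\right)
  = \frac{35462}{5} = 7092.4
```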
<p style="text-align:justify;"><a href="http://ealdent.files.wordpress.com/2009/05/picture-11.png"><img class="aligncenter size-full wp-image-1160" title="Wolfram|Alpha sample results" src="http://ealdent.files.wordpress.com/2009/05/picture-11.png?w=614" alt="Wolfram|Alpha sample results"   /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://mendicantbug.com/2009/05/16/first-impressions-of-wolframalpha/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/ebec6abd2b9f1eb4de865aed01242171?s=96&amp;d=monsterid&amp;r=PG" medium="image">
			<media:title type="html">ealdent</media:title>
		</media:content>

		<media:content url="http://ealdent.files.wordpress.com/2009/05/picture-11.png" medium="image">
			<media:title type="html">Wolfram|Alpha sample results</media:title>
		</media:content>

	</item>
		<item>
		<title>My #WWW2009 Reading List</title>
		<link>http://mendicantbug.com/2009/04/25/my-www2009-reading-list/</link>
		<comments>http://mendicantbug.com/2009/04/25/my-www2009-reading-list/#comments</comments>
		<pubDate>Sat, 25 Apr 2009 14:54:40 +0000</pubDate>
		<dc:creator>Jason Adams</dc:creator>
				<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[computer science]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[literature]]></category>
		<category><![CDATA[proceedings]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[social networks]]></category>
		<category><![CDATA[web 2.0]]></category>
		<category><![CDATA[www2009]]></category>

		<guid isPermaLink="false">http://mendicantbug.com/?p=1143</guid>
		<description><![CDATA[The papers are out for WWW2009 (and have been for a bit), but I&#8217;ve only just gotten a chance to start looking at them. First of all, kudos to the ePrints people for improving the presentation of conference proceedings. This is a lot easier than having to do a Google Scholar search for each paper [&#8230;]]]></description>
				<content:encoded><![CDATA[<p style="text-align:justify;">The papers are out for <a href="http://www2009.org/" target="_blank">WWW2009</a> (and have been for a bit), but I&#8217;ve only just gotten a chance to start looking at them. First of all, kudos to the ePrints people for improving the presentation of conference proceedings. This is a lot easier than having to do a <a class="zem_slink" title="Google Scholar" rel="homepage" href="http://scholar.google.com/" target="_blank">Google Scholar</a> search for each paper and hoping I find something, like I have to do with <a href="http://www.acl-ijcnlp-2009.org/main/acceptedfullpapers.html" target="_blank">some conferences</a>.</p>
<div id="attachment_1144" class="wp-caption aligncenter" style="width: 510px"><a href="http://ealdent.files.wordpress.com/2009/04/pradera_www2009.jpg"><img class="size-full wp-image-1144" title="WWW2009" src="http://ealdent.files.wordpress.com/2009/04/pradera_www2009.jpg?w=614" alt="WWW2009 Madrid"   /></a><p class="wp-caption-text">WWW2009 Madrid</p></div>
<p style="text-align:justify;">There are a lot of very interesting ones, and here are a few that bubbled to the top of my reading list:</p>
<h4><a class="zem_slink" title="Data mining" rel="wikipedia" href="http://en.wikipedia.org/wiki/Data_mining" target="_blank">Data Mining</a> Track</h4>
<ul>
<li><em><a href="http://www2009.eprints.org/21/" target="_blank">A Class-Feature-Centroid Classifier for Text Categorization</a></em></li>
<li><em><a href="http://www2009.eprints.org/22/" target="_blank">Large Scale Multi-Label Classification via MetaLabeler</a></em></li>
<li><a href="http://www2009.eprints.org/13/" target="_blank"><em>Learning Consensus Opinion: Mining Data from a Labeling Game</em></a></li>
<li><a href="http://www2009.eprints.org/12/" target="_blank"><em>Matchbox: Large Scale Online Bayesian Recommendations</em></a></li>
</ul>
<h4>Semantic/Data Web</h4>
<ul>
<li><a href="http://www2009.eprints.org/67/" target="_blank"><em>Extracting Key Terms From Noisy and Multi-theme Documents</em></a></li>
<li><a href="http://www2009.eprints.org/63/" target="_blank"><em>Triplify: Light-Weight Linked Data Publication from Relational Databases</em></a></li>
</ul>
<h4><a class="zem_slink" title="Social network" rel="wikipedia" href="http://en.wikipedia.org/wiki/Social_network">Social Networks</a> and <a class="zem_slink" title="Web 2.0" rel="wikipedia" href="http://en.wikipedia.org/wiki/Web_2.0">Web 2.0</a></h4>
<ul>
<li><a href="http://www2009.eprints.org/70/" target="_blank"><em>Personalized Recommendation on Dynamic Content Using Predictive Bilinear Models</em></a></li>
<li><a href="http://www2009.eprints.org/78/" target="_blank"><em>Ranking and Classifying Attractiveness of Photos in Folksonomies</em></a></li>
<li><a href="http://www2009.eprints.org/68/" target="_blank"><em>Tagommenders: Connecting Users to Items through Tags</em></a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://mendicantbug.com/2009/04/25/my-www2009-reading-list/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/ebec6abd2b9f1eb4de865aed01242171?s=96&amp;d=monsterid&amp;r=PG" medium="image">
			<media:title type="html">ealdent</media:title>
		</media:content>

		<media:content url="http://ealdent.files.wordpress.com/2009/04/pradera_www2009.jpg" medium="image">
			<media:title type="html">WWW2009</media:title>
		</media:content>

	</item>
		<item>
		<title>Much ado about nothing</title>
		<link>http://mendicantbug.com/2009/03/26/much-ado-about-nothing/</link>
		<comments>http://mendicantbug.com/2009/03/26/much-ado-about-nothing/#comments</comments>
		<pubDate>Thu, 26 Mar 2009 06:19:24 +0000</pubDate>
		<dc:creator>Jason Adams</dc:creator>
				<category><![CDATA[blagoblag]]></category>
		<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[disappointment]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[hype]]></category>
		<category><![CDATA[information retrieval]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[semantic search]]></category>

		<guid isPermaLink="false">http://mendicantbug.com/?p=1111</guid>
		<description><![CDATA[There has been much ballyhoo in the blogosphere touting Google&#8217;s so-called foray into semantic search.  The blog post announcing the new feature doesn&#8217;t even mention the word semantics, but it does say it looks at associations and concepts related to your query.  I see no mention of tuples or anything of the sort and the [&#8230;]]]></description>
				<content:encoded><![CDATA[<p style="text-align:justify;">There has been much <a href="http://www.pcworld.com/businesscenter/article/161869/google_rolls_out_semantic_search_capabilities.html?tk=rel_news" target="_blank">ballyhoo</a> in the blogosphere touting Google&#8217;s so-called foray into semantic search.  The blog post <a href="http://googleblog.blogspot.com/2009/03/two-new-improvements-to-google-results.html" target="_blank">announcing the new feature</a> doesn&#8217;t even mention the word <em>semantics</em>, but it does say the feature looks at associations and concepts related to your query.  I see no mention of tuples or anything of the sort, and the suggested queries look like what I would expect from document/query classification rather than from semantic analysis.</p>
<div id="attachment_1112" class="wp-caption aligncenter" style="width: 510px"><a href="http://ealdent.files.wordpress.com/2009/03/picture-1.png"><img class="size-full wp-image-1112" title="much ado about nothing" src="http://ealdent.files.wordpress.com/2009/03/picture-1.png?w=614" alt="Related search results for much ado about nothing"   /></a><p class="wp-caption-text">Related search results for <em>much ado about nothing</em></p></div>
<p style="text-align:justify;">And the results are pretty meh.  Except for <em>taming of the shrew</em>, those suggestions are no-brainers, the kind of thing query completion already gives you.  Of course, you can&#8217;t judge the whole system by one isolated example.</p>
<p style="text-align:justify;">When PC World and a host of other pop tech media zines started toasting Google&#8217;s entrance into the semantic arena, I was excited to see some cool stuff.  Imagine my disappointment when I was underwhelmed not only by the quality of the results but also by the lack of novelty.  How long has that feature been there?  It seems like I&#8217;ve seen it for ages.  Maybe it got a technological face-lift (I guess that would be a face-lift on the inside), but it looks about the same as I remember it.  Plus, its placement at the bottom of the results page relegates it to search engine hell.</p>
<p style="text-align:justify;">In summary:  boring.  My complaints are first and foremost with those elements of the blagoblag who over-hyped this.  Secondly, I am complaining to Google for not being better.  I am feeling demanding today.</p>
<p style="text-align:justify;">Daniel&#8217;s <a href="http://thenoisychannel.com/2009/03/24/google-offers-more-and-better-search-refinements/" target="_blank">post</a> on it is worth reading.</p>
]]></content:encoded>
			<wfw:commentRss>http://mendicantbug.com/2009/03/26/much-ado-about-nothing/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/ebec6abd2b9f1eb4de865aed01242171?s=96&amp;d=monsterid&amp;r=PG" medium="image">
			<media:title type="html">ealdent</media:title>
		</media:content>

		<media:content url="http://ealdent.files.wordpress.com/2009/03/picture-1.png" medium="image">
			<media:title type="html">much ado about nothing</media:title>
		</media:content>
	</item>
		<item>
		<title>Computational Linguistics Blogs</title>
		<link>http://mendicantbug.com/2009/01/24/computational-linguistics-blogs/</link>
		<comments>http://mendicantbug.com/2009/01/24/computational-linguistics-blogs/#comments</comments>
		<pubDate>Sat, 24 Jan 2009 22:00:07 +0000</pubDate>
		<dc:creator>Jason Adams</dc:creator>
				<category><![CDATA[academia]]></category>
		<category><![CDATA[blagoblag]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[blogs]]></category>
		<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[information retrieval]]></category>
		<category><![CDATA[lists]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://mendicantbug.com/?p=1041</guid>
		<description><![CDATA[Since I started blogging almost a year and a half ago, I have been following many blogs. I managed to find some blogs dealing with computational linguistics and natural language processing, but they were few and far between. Since then, I&#8217;ve discovered quite a few NLP people who have entered the blagoblag. Here is a [&#8230;]]]></description>
				<content:encoded><![CDATA[<p style="text-align:justify;">Since I started blogging almost a year and a half ago, I have been following many blogs.  I managed to find some blogs dealing with computational linguistics and natural language processing, but they were few and far between.  Since then, I&#8217;ve discovered quite a few NLP people who have entered the blagoblag.  Here is a non-exhaustive list of the ones I follow.</p>
<p style="text-align:justify;">Many of these bloggers post sporadically, and even then only occasionally about CL/NLP.  I&#8217;ve tried to organize the list by how often each blog covers CL/NLP (at least as far as I have followed them), from almost always to only occasionally, with corporate and institutional blogs listed separately.  I would fall into one of the latter groups, since I frequently blog about my dogs, regular computer science-y and programming stuff, and other rants.  P.S. I group Information Retrieval in with CL/NLP here, but only the blogs I actually read.  I&#8217;m sure there are a bazillion I don&#8217;t.</p>
<p style="text-align:justify;">If I&#8217;ve missed one+, please let me know.  I&#8217;m always on the lookout.  Ditto if you think I&#8217;ve miscategorized someone.  I&#8217;ve excluded a few that haven&#8217;t posted in a while.</p>
<ul>
<li>Almost always containing Computational Linguistics/NLP
<ul>
<li> <a href="http://computationallinguistic.blogspot.com/">Computational Linguistics</a> (Ali Reza Ebadat)</li>
<li> <a href="http://livewebir.com/blog/">Information Retrieval on the Live Web</a> (Paul Ogilvie)</li>
<li> <a href="http://nlpers.blogspot.com/">natural language processing blog</a> (Hal Daume)</li>
<li> <a href="http://kreisman.wordpress.com/">thought process</a> (Ken Reisman)  Very new, but so far, just CL/NLP stuff.</li>
</ul>
</li>
<li> Often containing Computational Linguistics/NLP
<ul>
<li> <a href="http://a-dimit.blogspot.com/">?- true</a></li>
<li><a href="http://andyhickl.com/" target="_blank">AndyHickl.com</a> (Andy Hickl)</li>
<li> <a href="http://apperceptual.wordpress.com/">Apperceptual</a> (Peter Turney)</li>
<li> <a href="http://www.drni.de/niels/cl/blog/">Automatic Mind</a> (Niels Ott)</li>
<li> <a href="http://blogamundo.net/dev/">Hacklog: Blogamundo</a> (Patrick Hall)</li>
<li> <a href="http://lada.si.umich.edu:8080/wordpress/">Information Engineering</a> (Dragomir Radev)</li>
<li> <a href="http://languagewrong.tumblr.com/">Language Wrong</a> (Roddy Lindsay)</li>
<li> <a href="http://lingpipe-blog.com/">LingPipe</a> (Bob Carpenter)</li>
<li><a href="http://staff.science.uva.nl/~tsagias/">Manos Tsagkias</a></li>
<li> <a href="http://resnotebook.blogspot.com/">Misc Research Stuff</a> (Delip Rao)</li>
<li> <a href="http://khassanali-nlp-research.blogspot.com/">Natural Language Processing</a> (Nisha)</li>
<li> <a href="http://thenoisychannel.com/">The Noisy Channel</a> (Daniel Tunkelang)</li>
<li> <a href="http://blog.outerthoughts.com/">Outer Thoughts</a> (Alexandre Rafalovitch)</li>
<li> <a href="http://ramslifeofalinguist.blogspot.com/">Ramifications of a Linguist&#8217;s Life</a></li>
<li><a href="http://scienceforseo.blogspot.com/">Science for SEO</a></li>
<li><a href="http://streamhacker.com">streamhacker.com</a></li>
</ul>
</li>
<li> Occasionally containing Computational Linguistics/NLP
<ul>
<li> <a href="http://blog.amyiris.com/">Amy Iris</a></li>
<li> <a href="http://dpn.name/">Attempted Axiomatisation</a> (David Petar Novakovic)  Though <a href="http://twitter.com/dpn">@dpn</a> hasn&#8217;t posted much lately.  Consider this a <em>poke</em>. :)</li>
<li><a href="http://www.drmaciver.com/">David R. MacIver</a></li>
<li> <a href="http://earningmyturns.blogspot.com/">Earning My Turns</a> (Fernando Pereira)</li>
<li> <a href="http://www.searchenginecaffe.com/">Jeff&#8217;s Search Engine Caffe</a> (Jeff Dalton)</li>
<li> <a href="http://thelousylinguist.blogspot.com/">The Lousy Linguist</a></li>
<li> <a href="https://www.stanford.edu/~mjockers/cgi-bin/drupal/?q=blog">Matthew L. Jockers</a></li>
<li> <a href="http://sandersn.com/blog/index.php">Nathan Sanders : Journal</a></li>
<li> <a href="http://prospero.bluescarf.net/stuart/">Nerd Industries: Stuart Robinson&#8217;s blog</a></li>
<li> <a href="http://www.nlpado.de/blog/">NLPadLog</a></li>
<li> <a href="http://topicmodels.wordpress.com/">Research Log</a></li>
<li> <a href="http://tdunning.blogspot.com/">Surprise and Coincidence &#8211; musings from the long tail</a> (Ted Dunning)</li>
<li> <a href="http://synthese.wordpress.com/">Synthèse</a> (Andre Vellino)</li>
<li> <a href="http://textai.blogspot.com/">Text and Artificial Intelligence</a> (Shahzad Khan)</li>
<li> <a href="http://windowoffice.tumblr.com/">window office</a> (Jon Elsas)</li>
<li><a href="http://anileklavya.wordpress.com/">अनिल एकलव्य</a> (Anil Eklavya)</li>
</ul>
</li>
<li> Corporations and Institutions working with CL/NLP
<ul>
<li><a href="http://blog.evri.com/">Evri</a></li>
<li> <a href="http://blog.hakia.com/">hakia</a></li>
<li> <a href="http://blog.jamiq.com/">The JamiQ Report</a></li>
<li> <a href="http://blog.larkc.eu/">LarKC</a> (Large Knowledge Collider)</li>
<li> <a href="http://www.lexalytics.com/lexablog/">LexaBlog</a> (Lexalytics)</li>
<li> <a href="http://blogs.msdn.com/naturallanguage/default.aspx">Office Natural Language Team Blog</a> (Microsoft)</li>
<li> <a href="http://opencalais.com/blog">OpenCalais</a></li>
<li><a href="http://www.powerset.com/blog/">Powerset</a> (defunct?)</li>
<li> <a href="http://blog.semantichacker.com/">Semantic Hacker</a> (textwise)</li>
<li> <a href="http://www.texttechnologies.com/">Text Technologies</a> (Monash Research)</li>
<li> <a href="http://www.w3lab.nl/blog/">w3lab &#8211; the science of searching</a></li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://mendicantbug.com/2009/01/24/computational-linguistics-blogs/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/ebec6abd2b9f1eb4de865aed01242171?s=96&amp;d=monsterid&amp;r=PG" medium="image">
			<media:title type="html">ealdent</media:title>
		</media:content>
	</item>
		<item>
		<title>Books for Christmas</title>
		<link>http://mendicantbug.com/2009/01/04/books-for-christmas/</link>
		<comments>http://mendicantbug.com/2009/01/04/books-for-christmas/#comments</comments>
		<pubDate>Mon, 05 Jan 2009 02:49:06 +0000</pubDate>
		<dc:creator>Jason Adams</dc:creator>
				<category><![CDATA[books]]></category>
		<category><![CDATA[christmas]]></category>
		<category><![CDATA[collective intelligence]]></category>
		<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[computer science]]></category>
		<category><![CDATA[data visualization]]></category>
		<category><![CDATA[evolutionary computing]]></category>
		<category><![CDATA[genetic algorithms]]></category>
		<category><![CDATA[string algorithms]]></category>
		<category><![CDATA[web 2.0]]></category>

		<guid isPermaLink="false">http://mendicantbug.com/?p=981</guid>
		<description><![CDATA[I got most of the books I most wanted for Christmas this year. It was a great haul that will keep me busy for a while. Among them were: Programming Collective Intelligence: Building Smart Web 2.0 Applications Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology Visualizing Data: Exploring and Explaining Data [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=mendicantbug.com&#038;blog=1474857&#038;post=981&#038;subd=ealdent&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p style="text-align:justify;">I got most of the books I most wanted for Christmas this year.  It was a great haul that will keep me busy for a while.  Among them were:</p>
<ul>
<li><a href="http://www.amazon.com/gp/product/0596529325?ie=UTF8&amp;tag=themenbug-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0596529325">Programming Collective Intelligence: Building Smart Web 2.0 Applications</a></li>
<li><a href="http://www.amazon.com/gp/product/0521585198?ie=UTF8&amp;tag=themenbug-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0521585198">Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology</a></li>
<li><a href="http://www.amazon.com/gp/product/0596514557?ie=UTF8&amp;tag=themenbug-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0596514557">Visualizing Data: Exploring and Explaining Data with the Processing Environment</a></li>
<li><a href="http://www.amazon.com/gp/product/3540401849?ie=UTF8&amp;tag=themenbug-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=3540401849">Introduction to Evolutionary Computing (Natural Computing Series)</a></li>
</ul>
<p style="text-align:justify;">The books on string and tree algorithms and on collective intelligence should be self-explanatory.  I wanted the book on data visualization because it covers a skill my education overlooked.  I appreciate great data visualizations, and taking some steps to improve my understanding and skills in that area is worth doing.  Finally, the book on evolutionary computing is for personal enrichment.  I&#8217;ve been playing around with genetic algorithms since 1994, before I had even finished high school.  It&#8217;s always been <em>playing</em>, though, and I wanted a more rigorous introduction to them.</p>
<p style="text-align:justify;">With any luck, I&#8217;ll be posting some thoughts on these books in the coming months.</p>
]]></content:encoded>
			<wfw:commentRss>http://mendicantbug.com/2009/01/04/books-for-christmas/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/ebec6abd2b9f1eb4de865aed01242171?s=96&amp;d=monsterid&amp;r=PG" medium="image">
			<media:title type="html">ealdent</media:title>
		</media:content>

	</item>
		<item>
		<title>Definition of computational linguistics #87</title>
		<link>http://mendicantbug.com/2008/12/29/definition-of-computational-linguistics-87/</link>
		<comments>http://mendicantbug.com/2008/12/29/definition-of-computational-linguistics-87/#comments</comments>
		<pubDate>Tue, 30 Dec 2008 03:05:51 +0000</pubDate>
		<dc:creator>Jason Adams</dc:creator>
				<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[definitions]]></category>
		<category><![CDATA[dictionaries]]></category>

		<guid isPermaLink="false">http://mendicantbug.com/?p=964</guid>
		<description><![CDATA[According to the somewhat suspect Definitions.net (suspect by default, since I haven&#8217;t evaluated it otherwise): 1. (noun) computational linguistics the use of computers for linguistic research and applications This particular definition came to my attention thanks to a Google alert and I thought it was about the shortest definition of computational linguistics I&#8217;ve ever seen. [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=mendicantbug.com&#038;blog=1474857&#038;post=964&#038;subd=ealdent&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>According to the somewhat suspect <a href="http://www.synonyms.net/definition/computational%20linguistics" target="_blank">Definitions.net</a> (suspect by default, since I haven&#8217;t evaluated it otherwise):</p>
<blockquote><p>1. (noun)  computational linguistics<br />
the use of computers for linguistic research and applications</p></blockquote>
<p style="text-align:justify;">This particular definition came to my attention thanks to a Google Alert, and I thought it was about the shortest definition of computational linguistics I&#8217;ve ever seen.  It might not be a half-bad definition for telling friends and family what you do when you don&#8217;t want to see them go all glassy-eyed and start drooling on themselves.  It&#8217;s certainly not a <em>satisfying</em> definition, though.</p>
]]></content:encoded>
			<wfw:commentRss>http://mendicantbug.com/2008/12/29/definition-of-computational-linguistics-87/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/ebec6abd2b9f1eb4de865aed01242171?s=96&amp;d=monsterid&amp;r=PG" medium="image">
			<media:title type="html">ealdent</media:title>
		</media:content>
	</item>
		<item>
		<title>Group effort</title>
		<link>http://mendicantbug.com/2008/12/08/group-effort/</link>
		<comments>http://mendicantbug.com/2008/12/08/group-effort/#comments</comments>
		<pubDate>Tue, 09 Dec 2008 03:40:09 +0000</pubDate>
		<dc:creator>Jason Adams</dc:creator>
				<category><![CDATA[academia]]></category>
		<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[credit]]></category>
		<category><![CDATA[effort]]></category>
		<category><![CDATA[einstein]]></category>
		<category><![CDATA[heroic science]]></category>
		<category><![CDATA[intelligence]]></category>
		<category><![CDATA[multiple discovery]]></category>
		<category><![CDATA[praise]]></category>
		<category><![CDATA[science]]></category>

		<guid isPermaLink="false">http://mendicantbug.com/?p=940</guid>
		<description><![CDATA[Hal Daume has a nice post that deals with credit in academia among other things. What I took away from this comment is essentially the realization that we are all working toward some vague future goal, which has to do with computationalizing language processing (or some other topic, for the non-NLP audience). Progress is good. [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=mendicantbug.com&#038;blog=1474857&#038;post=940&#038;subd=ealdent&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p style="text-align:justify;">Hal Daume has a nice post that deals with <a href="http://nlpers.blogspot.com/2008/12/two-reviewer-comments-that-stuck-with.html" target="_blank">credit in academia</a> among other things.</p>
<blockquote><p>What I took away from this comment is essentially the realization that we are <span style="font-style:italic;">all</span> working toward some vague future goal, which has to do with computationalizing language processing (or some other topic, for the non-NLP audience). Progress is good. If I&#8217;ve done work that has something interesting and novel to say about this goal, then it&#8217;s not bad &#8212; and is often <span style="font-style:italic;">good</span> &#8212; that this builds on and improves on your work.</p></blockquote>
<p style="text-align:justify;">One illusion people have about science is that it advances by giant leaps: an Einstein comes along and revolutionizes the field.  If you live your life with that ambition, you will almost certainly end up disappointed.  Most advances are small hops forward, and they are often <a href="http://apperceptual.wordpress.com/2007/01/15/the-heroic-theory-of-scientific-development/" target="_blank">multiply discovered</a>.  Believing that you will somehow be the next Einstein will probably work against you.</p>
<p style="text-align:justify;">I am becoming more and more convinced that intelligence is a matter of hard work, dedication, and interest.  I&#8217;ve seen some pop-science reporting that a growing body of research supports praising kids for their effort rather than telling them how special they are.  I can&#8217;t find the link at the moment, but that rings true to me.  As a child, I was always told I was the smartest person in the world, and I think that made me intellectually lazy.  It took some pretty serious life mistakes to learn that so-called intelligence is more about effort.</p>
]]></content:encoded>
			<wfw:commentRss>http://mendicantbug.com/2008/12/08/group-effort/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/ebec6abd2b9f1eb4de865aed01242171?s=96&amp;d=monsterid&amp;r=PG" medium="image">
			<media:title type="html">ealdent</media:title>
		</media:content>
	</item>
		<item>
		<title>10 Reasons to Use Git for Research</title>
		<link>http://mendicantbug.com/2008/11/30/10-reasons-to-use-git-for-research/</link>
		<comments>http://mendicantbug.com/2008/11/30/10-reasons-to-use-git-for-research/#comments</comments>
		<pubDate>Mon, 01 Dec 2008 04:08:49 +0000</pubDate>
		<dc:creator>Jason Adams</dc:creator>
				<category><![CDATA[academia]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[computer science]]></category>
		<category><![CDATA[cvs]]></category>
		<category><![CDATA[git]]></category>
		<category><![CDATA[github]]></category>
		<category><![CDATA[productivity]]></category>
		<category><![CDATA[reproducibility]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[subversion]]></category>
		<category><![CDATA[version control systems]]></category>

		<guid isPermaLink="false">http://ealdent.wordpress.com/?p=910</guid>
		<description><![CDATA[Git is a version control system that has been gaining in popularity recently.  If you have heard of or used Subversion or CVS, you are familiar with the basic principle of keeping track of changes by multiple users in a series of documents (source code, text files, etc).  One of the chief benefits of version [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=mendicantbug.com&#038;blog=1474857&#038;post=910&#038;subd=ealdent&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p style="text-align:justify;"><a href="http://git-scm.com/" target="_blank">Git</a> is a <a href="http://en.wikipedia.org/wiki/Revision_control" target="_blank">version control system</a> that has been gaining popularity recently.  If you have heard of or used <a href="http://subversion.tigris.org/" target="_blank">Subversion</a> or CVS, you are familiar with the basic principle: keeping track of changes made by multiple users to a set of documents (source code, text files, etc.).  One of the chief benefits of version control in software is that you can roll back if the code becomes corrupted.  It&#8217;s easy to see which changes were made where, and broken code can be fixed more easily than if you had no version control and had to reconstruct the working code from scratch.  Unlike Subversion and CVS, Git is a distributed version control system: each user has their own copy of the entire repository and its history.  Branching and merging are much easier, and it&#8217;s extremely simple to get started.  Plus, having used all three, I find Git the most fun.</p>
<p style="text-align:justify;">Academic settings impose different constraints on code base management.  The goal is usually less about code quality and more about exploring possibilities.  Academic code is often quite shitty, hacked together by some grad student(s), with dozens of false starts and changes in requirements.  Recreating previous experiments is often very difficult unless the grad student made provisions for such rollbacks.  And if they have, it&#8217;s probably done in a way that seemed logical to the grad student at the time but is a nightmare for someone new to the project.  There are ways to avoid this by placing more emphasis on software engineering, but sometimes projects are so small or short-lived that it doesn&#8217;t seem worth the trouble at first.  And if you don&#8217;t even have a clear picture of where you are heading, it might not even be possible (though you are probably doomed to many problems in that case).</p>
<p style="text-align:justify;">To help combat these issues, I contend that every academic software project <strong>must</strong> use version control.  Git makes that easy, and here&#8217;s why.</p>
<h4><strong>1.  Creating the first repository is a no-brainer.</strong></h4>
<p>To create a new repository you simply type:</p>
<p style="padding-left:30px;"><code>git init</code></p>
<p>It&#8217;s so easy, you can use it for <em>anything</em>.  To clone someone else&#8217;s repository, just type:</p>
<p style="padding-left:30px;"><code>git clone git://location.of.origin.repository</code></p>
<p style="text-align:justify;">Cloning is very similar to checking out in Subversion and CVS, except that you get a full copy of the history and can work completely independently if you desire.  You can also tunnel it through SSH (substitute ssh:// for git:// above) if you&#8217;re worried about security.</p>
<h4><strong>2.  You can still use it while off the grid.</strong></h4>
<p style="text-align:justify;">In Subversion, the repository lives in some central place where all of the code goes.  If you are collaborating with several people, chances are this repository is not on your own machine, so if you cannot access the network, you cannot access the repository.  With Git, the entire repository and its history are stored on your own machine, so even when you are off the network you can still commit, branch, and browse the history.</p>
<h4 style="text-align:justify;"><strong>3.  Branch your experiments.</strong></h4>
<p style="text-align:justify;">Often the need arises to try out different approaches in academic coding.  Branching in Git is ridiculously simple:</p>
<p style="text-align:justify;padding-left:30px;"><code>git checkout -b new-branch-name</code></p>
<p style="text-align:justify;">You can easily switch between multiple branches, merge branches, or discard them.  One approach is to keep the main architecture in your master branch (the original) and use branches for different experimental parameters.  This lets you cleanly separate functionality, so running an old experiment is just a matter of checking out the branch that pertains to it.  <strong>Update</strong>:  Thanks to <a href="http://twitter.com/dlsspy" target="_blank">Dustin Sallings</a> for the <a href="http://twitter.com/dlsspy/status/1031715076" target="_blank">shorter version</a> of checking out a new branch.</p>
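<p style="text-align:justify;">As a minimal sketch of the one-branch-per-parameter-setting idea, the Ruby below derives a branch name from an experiment&#8217;s parameters, so the same settings always map to the same branch.  The <code>exp/</code> prefix, the naming scheme, and the helper names are hypothetical conventions rather than anything Git requires, and the sketch assumes parameter values are plain numbers or words (Git rejects branch names containing spaces, <code>..</code>, and a few other characters).</p>

```ruby
# Hypothetical naming convention: one branch per experimental parameter
# setting, named deterministically so an old run is easy to find again.
# Assumes values are simple numbers or words that are legal in a Git ref name.
def experiment_branch(params)
  parts = params.sort_by { |key, _value| key.to_s }
                .map { |key, value| "#{key}-#{value}" }
  "exp/" + parts.join("_")
end

# Argument list for creating and switching to that branch. Inside a Git
# working copy you could run it with system(*checkout_command(params)).
def checkout_command(params)
  ["git", "checkout", "-b", experiment_branch(params)]
end

puts experiment_branch({ topics: 20, alpha: 0.1 })
# prints exp/alpha-0.1_topics-20
```

<p style="text-align:justify;">Sorting the keys means the name does not depend on the order in which parameters were listed.  Running <code>git check-ref-format --branch</code> on a generated name is one way to confirm Git will accept it.</p>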
<h4 style="text-align:justify;"><strong>4.  Version control your paper.</strong></h4>
<p style="text-align:justify;">Why use a shared folder or email to edit your paper?  You can easily create a Git repository to collaborate and merge changes.  You can quickly see who contributed what to a paper.  <a href="http://nitens.org/taraborelli/home">Dario Taraborelli</a> <a href="http://www.academicproductivity.com/2008/revision-control-for-latex-in-search-of-an-answer/">wrote about this</a> a few months ago, though his point was that you would need your collaborators to be familiar with a version control system and they usually aren&#8217;t.  I am arguing that <em>they should be</em>.  On a side note, another VCS, <a href="http://bazaar-vcs.org/" target="_blank">Bazaar</a>, is listed as an alternative in the comments to Dario&#8217;s post.</p>
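<p style="text-align:justify;">To see who contributed what to a paper, <code>git blame</code> attributes each line of a file to the commit (and author) that last changed it.  Here is a minimal sketch, assuming the paper lives in a file such as <code>paper.tex</code> and you feed in the output of <code>git blame --line-porcelain paper.tex</code>.  In that format every line of the file gets its own header block containing an <code>author</code> line, while the file&#8217;s own text appears on lines that begin with a tab.  The helper name is hypothetical.</p>

```ruby
# Counts how many lines of a file each author last touched, given the
# output of `git blame --line-porcelain FILE`.
# Header lines look like "author Jane Doe"; content lines start with a tab,
# so a line of the paper can never be mistaken for a header.
def lines_per_author(porcelain)
  counts = Hash.new(0)
  porcelain.each_line do |line|
    next unless line.start_with?("author ")   # skips "author-mail", "author-time", ...
    counts[line.chomp.delete_prefix("author ")] += 1
  end
  counts
end

# Example (inside a repository):
#   blame = IO.popen(["git", "blame", "--line-porcelain", "paper.tex"]) { |io| io.read }
#   lines_per_author(blame).each { |name, n| puts "#{name}: #{n} lines" }
```

<p style="text-align:justify;">Line counts are only a rough proxy for contribution, since a one-word fix makes someone the &#8220;author&#8221; of a whole line, but they are a quick way to see who has been working on which parts of the draft.</p>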
<h4 style="text-align:justify;"><strong>5.  Convert into an open source project.</strong></h4>
<p style="text-align:justify;">SourceForge has been around for a while, but the UI is absolute garbage.  There is a much better solution out there:  <a href="https://github.com/" target="_blank">GitHub</a>.  GitHub is free for open source projects and offers some great visualizations for tracking the life of your project.  Of course, there is Google Code, which is quite nice and easy to use, but it supports only Subversion, not Git.  Another drawback to Google Code is that you have a lifetime maximum of 10 open source projects; GitHub has no such limit.  Moving your Git repository to GitHub is also a simple matter of <a href="http://github.com/guides/fork-a-project-and-submit-your-modifications" target="_blank">forking your project</a> to GitHub.</p>
<p style="text-align:justify;">Why does this even matter?  Check out <a href="http://www.d.umn.edu/~tpederse/" target="_blank">Ted Pedersen</a>&#8217;s <em>Empiricism is not a matter of faith</em> [<a href="http://www.d.umn.edu/~tpederse/Pubs/pedersen-last-word-2008.pdf" target="_blank">pdf</a>] in the September issue of <em>Computational Linguistics</em>.  He contends that you should create academic software with the goal of releasing it.  This ensures the survivability of your project, increases the impact of your work, and allows reproducibility of your results.  Git makes that easier, n&#8217;est-ce pas?</p>
<h4 style="text-align:justify;"><strong>6.  Keep track of your grad students.</strong></h4>
<p style="text-align:justify;">Suspect your grad students are slacking?  Check the commit logs!  (And now I prepare for hate mail from grad students.)  However, I think that if I had had this form of accountability, it would have made me more productive.  Of course, you don&#8217;t need Git for this; any version control system would do.  Of all the systems I&#8217;ve used, though, Git presents changes in the most user-friendly way.</p>
<h4 style="text-align:justify;"><strong>7.  Version control helps you write the paper.</strong></h4>
<p style="text-align:justify;">When it comes time to write the paper, the version control logs can be used to provide a roadmap of what you have done.  Even though you probably have kept good notes, version control keeps a calendar of events that can add useful perspective (or fill in gaps when your notes are inadequate).</p>
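<p style="text-align:justify;">As a rough sketch of turning the log into a roadmap, the Ruby below groups commit subjects by day, oldest first.  It assumes the log was produced with <code>git log --pretty=format:'%ad|%s' --date=short</code>, where <code>%ad</code> is the author date (printed as YYYY-MM-DD because of <code>--date=short</code>) and <code>%s</code> is the commit subject.  The helper name is hypothetical.</p>

```ruby
# Groups commit subjects by day, oldest day first, from the output of
#   git log --pretty=format:'%ad|%s' --date=short
def commit_timeline(log_text)
  entries = log_text.each_line
                    .map { |line| line.chomp.split("|", 2) }  # subjects may contain "|"
                    .select { |fields| fields.length == 2 }
  entries.group_by { |date, _subject| date }
         .sort                                                # ISO dates sort chronologically
         .map do |date, group|
           # git log lists newest first, so reverse to get oldest first within a day
           [date, group.map { |_date, subject| subject }.reverse]
         end
end

# Example (inside a repository):
#   log = `git log --pretty=format:'%ad|%s' --date=short`
#   commit_timeline(log).each do |date, subjects|
#     puts date
#     subjects.each { |s| puts "  - #{s}" }
#   end
```

<p style="text-align:justify;">The more descriptive your commit messages, the more useful this outline is when you sit down to write the methods section.</p>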
<h4 style="text-align:justify;"><strong>8.  Git is faster and leaner than other version control systems.</strong></h4>
<p style="text-align:justify;">Because you have the complete repository on your own system, most operations are much faster in Git; its developers report an order-of-magnitude speedup for some operations.  They also report that Git&#8217;s packed storage format uses less space in most circumstances.  Git has been reported to be <a href="http://vcscompare.blogspot.com/2008/06/git-mercurial-bazaar-repository-size.html" target="_blank">almost three times</a> more space-efficient than Bazaar, the other distributed version control system mentioned above.  Git also offers <a href="http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html" target="_blank">an easy binary search</a> (<code>git bisect</code>) for tracking down the commit that introduced a bug.</p>
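<p style="text-align:justify;">The idea behind <code>git bisect</code> can be sketched as a plain binary search.  This is a simplified model, not Git&#8217;s implementation: it assumes a linear history in which the oldest commit is known to be good, the newest is known to be bad, and every commit after the first bad one is also bad (the real <code>git bisect</code> also copes with branched and merged history).  The <code>first_bad_commit</code> helper is hypothetical; the block you pass plays the role of &#8220;build and test this commit.&#8221;</p>

```ruby
# Binary search for the first bad commit in a linear history.
# commits: oldest first; commits.first is assumed good, commits.last bad.
# The block returns true when a commit is bad (e.g. its tests fail).
# Returns [first_bad_commit, number_of_commits_tested].
def first_bad_commit(commits)
  good = 0
  bad = commits.length - 1
  tested = 0
  while bad - good > 1
    mid = (good + bad) / 2
    tested += 1
    if yield(commits[mid])
      bad = mid    # bug already present: look earlier
    else
      good = mid   # still good: look later
    end
  end
  [commits[bad], tested]
end

history = (0...100).map { |i| "commit#{i}" }
culprit, tested = first_bad_commit(history) { |c| c.delete_prefix("commit").to_i >= 37 }
puts "#{culprit} found after testing #{tested} commits"
# prints "commit37 found after testing 6 commits"
```

<p style="text-align:justify;">Because each test halves the range, 100 commits need at most 7 tests rather than up to 99.  In practice, <code>git bisect run</code> automates the loop by running a script of your choosing at each step and treating an exit status of 0 as good and most non-zero statuses as bad.</p>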
<h4 style="text-align:justify;"><strong>9.  Version control makes it easier to bring new team members up to speed.</strong></h4>
<p style="text-align:justify;">Speaking from experience, having a record of commits (especially well-documented commits) makes it easier to come up to speed on an existing project.  This applies not only to academic coding but to any coding endeavor.  Good documentation doesn&#8217;t hurt either.</p>
<h4 style="text-align:justify;"><strong>10.  Save yourself some headaches.</strong></h4>
<p style="text-align:justify;">I think you&#8217;ll minimize headaches if you use Git.  If not Git, at least use some version control system.  A lot of the things I listed above are covered by most version control systems, but Git combines regular advantages of version control in a way that is very friendly to non-linear coding situations.  Git also makes it a cinch to move your code into an open source project that can have a significant impact on your career as a researcher.  And Git is so easy to use, you have to ask yourself, <em>why not?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://mendicantbug.com/2008/11/30/10-reasons-to-use-git-for-research/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/ebec6abd2b9f1eb4de865aed01242171?s=96&amp;d=monsterid&amp;r=PG" medium="image">
			<media:title type="html">ealdent</media:title>
		</media:content>
	</item>
		<item>
		<title>LDA in Ruby</title>
		<link>http://mendicantbug.com/2008/11/17/lda-in-ruby/</link>
		<comments>http://mendicantbug.com/2008/11/17/lda-in-ruby/#comments</comments>
		<pubDate>Tue, 18 Nov 2008 04:50:47 +0000</pubDate>
		<dc:creator>Jason Adams</dc:creator>
				<category><![CDATA[c]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[git]]></category>
		<category><![CDATA[github]]></category>
		<category><![CDATA[latent dirichlet allocation]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[ruby gems]]></category>
		<category><![CDATA[rubyforge]]></category>
		<category><![CDATA[topic modeling]]></category>

		<guid isPermaLink="false">http://ealdent.wordpress.com/?p=880</guid>
		<description><![CDATA[Since Ruby is my new favorite toy, I thought it would be fun to try my hand at C extensions.  I came across David Blei&#8217;s C code for Latent Dirichlet Allocation and it looked simple enough to convert into a Ruby module.  Ruby makes it very easy to wrap some C functions (which is good [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=mendicantbug.com&#038;blog=1474857&#038;post=880&#038;subd=ealdent&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p style="text-align:justify;">Since Ruby is my new favorite toy, I thought it would be fun to try my hand at C extensions.  I came across <a href="http://www.cs.princeton.edu/~blei/lda-c/" target="_blank">David Blei&#8217;s C code</a> for <a href="http://mendicantbug.com/2008/11/16/latent-dirichlet-allocation/" target="_self">Latent Dirichlet Allocation</a>, and it looked simple enough to convert into a Ruby module.  Ruby makes it very easy to wrap individual C functions (which is good to know if you need a really fast implementation of something that gets called a lot).  Wrapping a whole C library is slightly harder, but not horribly so.  Most of my challenge was probably that it had been so long since I&#8217;d written anything in C.</p>
<p style="text-align:justify;">Since the code is open source, I decided to release the Ruby wrapper as <a href="http://github.com/ealdent/lda-ruby/wikis" target="_blank">a gem</a> on <a href="http://github.com" target="_blank">GitHub</a>.  I chose GitHub over <a href="http://rubyforge.org/" target="_blank">RubyForge</a> because it uses <a href="http://git.or.cz" target="_blank">Git</a> and freakin&#8217; rocks.  But GitHub is a story for another day.  Feel free to contribute and extend the project if you&#8217;re so inclined.</p>
<p>A basic usage example:</p>
<pre class="brush: ruby; title: ; notranslate">
require 'lda'
# create an Lda object for training
lda = Lda::Lda.new
corpus = Lda::Corpus.new(&quot;data/data_file.dat&quot;)
lda.corpus = corpus
# run EM algorithm using random starting points
lda.em(&quot;random&quot;)
# load the vocabulary so topics can be printed as words
lda.load_vocabulary(&quot;data/vocab.txt&quot;)
# print the top 20 words for each topic
lda.print_topics(20)
</pre>
<p style="text-align:justify;">You can also install the gem from GitHub directly:</p>
<p><code>gem sources -a http://gems.github.com<br />
sudo gem install ealdent-lda-ruby</code></p>
<p style="text-align:justify;">You only need the first line if you haven&#8217;t added GitHub to your sources before.</p>
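<p style="text-align:justify;">If you need to build <code>data/data_file.dat</code> and <code>data/vocab.txt</code> yourself, here is a minimal sketch.  It assumes the gem reads the same input as the lda-c code it wraps: one document per line in the sparse form <code>[M] [term_1]:[count] ... [term_M]:[count]</code>, where M is the number of unique terms in the document and each term is an integer index into a vocabulary file that lists one word per line.  Tokenization is left out, and the helper names are hypothetical.</p>

```ruby
# Converts tokenized documents into lda-c's sparse text format.
# Returns [lines, vocabulary]: lines[i] describes document i, and
# vocabulary[id] is the word with term id `id` (one word per line in vocab.txt).
def to_lda_c(docs)
  vocab = {}
  lines = docs.map do |tokens|
    counts = Hash.new(0)
    tokens.each do |token|
      id = (vocab[token] ||= vocab.size)  # new words get the next unused id
      counts[id] += 1
    end
    ([counts.size] + counts.map { |id, n| "#{id}:#{n}" }).join(" ")
  end
  [lines, vocab.keys]
end

# Parses one lda-c line back into { term_id => count }, checking that the
# leading count of unique terms matches the pairs that follow.
def parse_lda_c_line(line)
  n_unique, *pairs = line.split
  counts = pairs.map { |pair| pair.split(":").map { |s| Integer(s) } }.to_h
  unless counts.size == Integer(n_unique)
    raise ArgumentError, "expected #{n_unique} unique terms, got #{counts.size}"
  end
  counts
end

lines, vocab = to_lda_c([%w[mars rover mars], %w[soil water mars]])
puts lines   # prints "2 0:2 1:1" then "3 2:1 3:1 0:1"
# To write the files the usage example above expects:
#   File.write("data/data_file.dat", lines.join("\n") + "\n")
#   File.write("data/vocab.txt", vocab.join("\n") + "\n")
```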
]]></content:encoded>
			<wfw:commentRss>http://mendicantbug.com/2008/11/17/lda-in-ruby/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/ebec6abd2b9f1eb4de865aed01242171?s=96&amp;d=monsterid&amp;r=PG" medium="image">
			<media:title type="html">ealdent</media:title>
		</media:content>
	</item>
		<item>
		<title>Latent Dirichlet Allocation</title>
		<link>http://mendicantbug.com/2008/11/16/latent-dirichlet-allocation/</link>
		<comments>http://mendicantbug.com/2008/11/16/latent-dirichlet-allocation/#comments</comments>
		<pubDate>Sun, 16 Nov 2008 21:30:58 +0000</pubDate>
		<dc:creator>Jason Adams</dc:creator>
				<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[computer science]]></category>
		<category><![CDATA[em algorithm]]></category>
		<category><![CDATA[latent dirichlet allocation]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[topic modeling]]></category>
		<category><![CDATA[topics]]></category>
		<category><![CDATA[unsupervised learning]]></category>

		<guid isPermaLink="false">http://ealdent.wordpress.com/?p=873</guid>
		<description><![CDATA[Latent Dirichlet Allocation (LDA) is an unsupervised method of finding topics in a collection of documents.  It posits a set of possible topics from which a subset are selected for each document.  This selected mixture of topics represents the topics discussed in the document, and each word in the document is generated by this mixture.  [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=mendicantbug.com&#038;blog=1474857&#038;post=873&#038;subd=ealdent&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p style="text-align:justify;"><a href="http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation" target="_blank">Latent Dirichlet Allocation</a> (LDA) is an unsupervised method of finding topics in a collection of documents.  It posits a set of possible topics, from which a subset is selected for each document.  This selected mixture of topics represents the topics discussed in the document, and each word in the document is generated by this mixture.  As a quick example, if we had a short document with the topics <em>geology</em> and <em>astronomy</em>:</p>
<blockquote>
<p style="text-align:justify;"><span style="color:#000000;"><span style="background:#b73333 none repeat scroll 0 0;">The rover traveled many millions of miles through space to arrive at Mars.</span> <span style="background:#33b736 none repeat scroll 0 0;">Once there, it collected soil samples and examined them to determine if liquid water had ever been present on the surface.</span></span></p>
</blockquote>
<p style="text-align:justify;">In this case, the topic <em>astronomy</em> is represented in red and <em>geology</em> in green.  LDA finds these latent topics in an unsupervised fashion using the EM algorithm (Blei et al. use a variational version of it).  EM is an iterative, two-step process for estimating the parameters of a statistical model.  The nice thing about it is that no iteration ever decreases the likelihood (for variational EM, a lower bound on it), so it is guaranteed to converge, typically to a local maximum (not necessarily the global one!).  However, it can take a while to converge, depending on the size and nature of the data and the model.  While I was in school, EM was one of the most confusing concepts, and I&#8217;m still not 100% on it, but it makes a lot more sense now than it did before.</p>
<p style="text-align:justify;">In the context of LDA, EM is basically doing two things.  First (the E-step), given the current topics, we estimate how each document&#8217;s words are distributed among those topics.  Next (the M-step), we re-estimate the model&#8217;s probabilities, such as how likely each word is under each topic, from those estimated assignments.  Repeating these two steps, we eventually converge to a locally &#8220;best&#8221; set of topics.  These may not correspond to realistic topics, but they (locally) maximize the log probability of the data under the model.  Usually LDA does a pretty good job of finding explainable topics given a decent amount of data.</p>
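<p style="text-align:justify;">To make the two steps concrete, here is a minimal Ruby sketch of EM for a simpler model than LDA: a <em>mixture of unigrams</em>, where each document is assumed to come from a single topic rather than a mixture of topics.  It is not the variational EM that Blei et al. use for LDA, but it has the same shape: the E-step computes each topic&#8217;s responsibility for each document under the current parameters, and the M-step re-estimates the topic weights and per-topic word probabilities from those responsibilities.  Documents are hashes from word id to count; the function name and defaults are illustrative.</p>

```ruby
# EM for a mixture of unigrams: each document is generated by ONE topic.
# (A simplified stand-in for LDA, used only to show the E-step/M-step loop.)

# Rescales non-negative numbers to sum to 1 (uniform if they are all zero).
def normalize(values)
  total = values.sum
  return Array.new(values.size, 1.0 / values.size) if total.zero?
  values.map { |v| v / total }
end

# docs: array of { word_id => count } hashes.
# Returns [topic_weights, word_probs, log_likelihood_per_iteration].
def em_mixture_of_unigrams(docs, n_topics:, vocab_size:, iterations: 30, seed: 1)
  rng = Random.new(seed)
  # Random starting point: uniform topic weights, random word distributions.
  weights = Array.new(n_topics, 1.0 / n_topics)
  word_probs = Array.new(n_topics) { normalize(Array.new(vocab_size) { 0.5 + rng.rand }) }
  log_likelihoods = []

  iterations.times do
    # E-step: responsibility of each topic for each document under the
    # current parameters, computed in log space to avoid underflow.
    total_ll = 0.0
    resp = docs.map do |counts|
      logs = Array.new(n_topics) do |k|
        Math.log(weights[k]) + counts.sum { |w, c| c * Math.log(word_probs[k][w]) }
      end
      max = logs.max
      scaled = logs.map { |l| Math.exp(l - max) }
      total_ll += max + Math.log(scaled.sum)   # log p(document), up to a constant
      normalize(scaled)
    end
    log_likelihoods.push(total_ll)

    # M-step: re-estimate the parameters from the expected assignments.
    weights = normalize(Array.new(n_topics) { |k| resp.sum { |r| r[k] } })
    word_probs = Array.new(n_topics) do |k|
      expected = Array.new(vocab_size, 0.0)
      docs.each_with_index do |counts, d|
        counts.each { |w, c| expected[w] += resp[d][k] * c }
      end
      normalize(expected)
    end
  end

  [weights, word_probs, log_likelihoods]
end
```

<p style="text-align:justify;">The returned log-likelihoods never decrease from one iteration to the next, which is the convergence guarantee mentioned above.  Because the starting point is random, different seeds can end at different local maxima; running several seeds and keeping the run with the highest final log-likelihood is a common workaround.</p>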
<p style="text-align:justify;">For more details about LDA, check out the paper by Blei et al. (2003).  LDA has been extended in a number of different directions since the original paper, so it&#8217;s essential reading if you&#8217;re doing any sort of topic modeling <sup>[<a href="http://en.wikipedia.org/wiki/Wikipedia:Citation_needed" target="_blank"><em><span style="color:blue;">citation needed</span></em></a>]</sup>.</p>
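<p style="text-align:justify;">The generative story from the start of this post, as Blei et al. (2003) describe it, is: for each document, draw topic proportions &#952; from a Dirichlet(&#945;) distribution; then, for each word, draw a topic z from &#952; and draw the word from that topic&#8217;s word distribution.  A minimal Ruby sketch follows, with two simplifying assumptions: &#945; = 1 for every topic (so the Dirichlet draw is just normalized exponential draws) and a fixed document length instead of the Poisson-distributed length in the paper.  The two toy topics are made up for illustration.</p>

```ruby
# Dirichlet(1, ..., 1) sample: k independent Exp(1) draws, normalized.
# (alpha = 1 is a simplifying assumption; general alpha needs Gamma draws.)
def sample_dirichlet_alpha1(k, rng)
  draws = Array.new(k) { -Math.log(1.0 - rng.rand) }
  total = draws.sum
  draws.map { |d| d / total }
end

# Returns index i with probability probs[i].
def sample_categorical(probs, rng)
  r = rng.rand
  probs.each_with_index do |p, i|
    r -= p
    return i if r.negative?
  end
  probs.length - 1 # guard against floating-point round-off
end

# topics: array of { word => probability } hashes, one per topic.
# Document length is fixed here instead of drawn from a Poisson.
def generate_document(topics, length, rng)
  theta = sample_dirichlet_alpha1(topics.length, rng)  # this document's topic mixture
  Array.new(length) do
    topic = topics[sample_categorical(theta, rng)]      # z ~ Multinomial(theta)
    topic.keys[sample_categorical(topic.values, rng)]   # w ~ p(w | z)
  end
end

# Made-up toy topics, echoing the astronomy/geology example above.
topics = [
  { "space" => 0.4, "rover" => 0.3, "mars" => 0.3 },   # astronomy
  { "soil" => 0.5, "water" => 0.3, "surface" => 0.2 }  # geology
]
puts generate_document(topics, 12, Random.new(42)).join(" ")
```

<p style="text-align:justify;">Inference (EM or otherwise) runs this story in reverse: given only the words, it recovers plausible values for the topics and for each document&#8217;s &#952;.</p>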
<h3>References</h3>
<p>D.M. Blei, A.Y. Ng, and M.I. Jordan, “Latent Dirichlet allocation,” The Journal of Machine Learning Research, vol. 3, 2003, pp. 993-1022. [<a href="http://www.cs.princeton.edu/picasso/mats/BleiNgJordan2003_blei.pdf" target="_blank">pdf</a>]</p>
]]></content:encoded>
			<wfw:commentRss>http://mendicantbug.com/2008/11/16/latent-dirichlet-allocation/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/ebec6abd2b9f1eb4de865aed01242171?s=96&amp;d=monsterid&amp;r=PG" medium="image">
			<media:title type="html">ealdent</media:title>
		</media:content>
	</item>
	</channel>
</rss>
