<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:arto="http://ar.to/" xmlns:cc="http://creativecommons.org/ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://ar.to/tags/drupal.rss">
    <title xml:lang="en">Arto Bendiken - Drupal</title>
    <link>http://ar.to/tags/drupal</link>
    <description xml:lang="en"></description>
    <items rdf:nodeID="b1"/>
    <cc:license rdf:resource="http://creativecommons.org/publicdomain/zero/1.0/"/>
  </channel>
  <rdf:Seq rdf:nodeID="b1">
    <rdf:li>http://ar.to/node/223</rdf:li>
    <rdf:li>http://ar.to/node/193</rdf:li>
    <rdf:li>http://ar.to/node/49</rdf:li>
  </rdf:Seq>
  <item rdf:about="http://ar.to/node/223">
    <title>RDF for Intrepid Unix Hackers: Grepping N-Triples</title>
    <link>http://ar.to/2010/03/grepping-ntriples</link>
    <description>&lt;p&gt;&lt;em&gt;&lt;small&gt;I originally wrote this tutorial for the &lt;a href=&quot;http://blog.datagraph.org/2010/03/grepping-ntriples&quot;&gt;Datagraph Blog&lt;/a&gt;. Subscribe to the &lt;a href=&quot;http://feeds.feedburner.com/datagraph&quot;&gt;Datagraph RSS feed&lt;/a&gt; to get subsequent installments.&lt;/small&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;http://en.wikipedia.org/wiki/N-Triples&quot;&gt;N-Triples&lt;/a&gt; format is the lowest common denominator for &lt;a href=&quot;http://www.w3.org/RDF/&quot;&gt;RDF&lt;/a&gt;
serialization formats, and turns out to be a very good fit to the Unix
paradigm of line-oriented, whitespace-separated data processing. In this
tutorial we'll see how to process N-Triples data by &lt;a href=&quot;http://en.wikipedia.org/wiki/Pipeline_(Unix)&quot;&gt;pipelining&lt;/a&gt;
standard Unix tools such as &lt;code&gt;grep&lt;/code&gt;, &lt;code&gt;wc&lt;/code&gt;, &lt;code&gt;cut&lt;/code&gt;, &lt;code&gt;awk&lt;/code&gt;, &lt;code&gt;sort&lt;/code&gt;, &lt;code&gt;uniq&lt;/code&gt;,
&lt;code&gt;head&lt;/code&gt; and &lt;code&gt;tail&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To follow along, you will need access to a Unix box (Mac OS X, Linux, or
BSD) with a &lt;a href=&quot;http://en.wikipedia.org/wiki/Bash&quot;&gt;Bash&lt;/a&gt;-compatible shell. We'll be using &lt;a href=&quot;http://curl.haxx.se/&quot;&gt;&lt;code&gt;curl&lt;/code&gt;&lt;/a&gt; to
fetch data over HTTP, but you can substitute &lt;code&gt;wget&lt;/code&gt; or &lt;code&gt;fetch&lt;/code&gt; if necessary. 
A couple of the examples require a modern &lt;a href=&quot;http://en.wikipedia.org/wiki/AWK&quot;&gt;AWK&lt;/a&gt; version such as
&lt;a href=&quot;http://www.gnu.org/software/gawk/&quot;&gt;&lt;code&gt;gawk&lt;/code&gt;&lt;/a&gt; or &lt;a href=&quot;http://invisible-island.net/mawk/mawk.html&quot;&gt;&lt;code&gt;mawk&lt;/code&gt;&lt;/a&gt;; on Linux distributions you should be okay
by default, but on Mac OS X you will need to install &lt;code&gt;gawk&lt;/code&gt; or &lt;code&gt;mawk&lt;/code&gt; from
&lt;a href=&quot;http://www.macports.org/&quot;&gt;MacPorts&lt;/a&gt; as follows:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ sudo port install mawk
$ alias awk=mawk
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Grokking N-Triples&lt;/h2&gt;
&lt;p&gt;Each N-Triples line encodes one RDF statement, also known as a &lt;em&gt;triple&lt;/em&gt;.
Each line consists of the subject (a URI or a blank node identifier), one or
more characters of whitespace, the predicate (a URI), some more whitespace,
and finally the object (a URI, blank node identifier, or literal) followed
by a dot and a newline. For example, the following N-Triples statement
asserts the title of my website:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;http://ar.to/&amp;gt; &amp;lt;http://purl.org/dc/terms/title&amp;gt; &quot;Arto Bendiken&quot; .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is an almost perfect format for Unix tooling; the only possible further
improvement would have been to define the statement component separator to
be a tab character, which would have simplified obtaining the object
component of statements -- as we'll see in a bit.&lt;/p&gt;
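&lt;p&gt;To make the separator issue concrete, here is a minimal sketch that counts
the whitespace-separated fields AWK sees in the statement above. The
&lt;code&gt;field_count&lt;/code&gt; helper is a hypothetical name used only for this
illustration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical helper: print the number of whitespace-separated fields per line.
field_count() { awk '{ print NF }'; }

printf '%s\n' '&amp;lt;http://ar.to/&amp;gt; &amp;lt;http://purl.org/dc/terms/title&amp;gt; &quot;Arto Bendiken&quot; .' | field_count
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This prints 5 rather than 4, because the space inside the literal splits
the object into two fields.&lt;/p&gt;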
&lt;h2&gt;Getting N-Triples&lt;/h2&gt;
&lt;p&gt;Many RDF data dumps are made available as compressed N-Triples files.
&lt;a href=&quot;http://dbpedia.org/&quot;&gt;DBpedia&lt;/a&gt;, the RDFization of Wikipedia, is a prominent example. For
purposes of this tutorial I've prepared an N-Triples dataset containing all
&lt;a href=&quot;http://drupal.org/&quot;&gt;Drupal&lt;/a&gt;-related RDF statements from &lt;a href=&quot;http://wiki.dbpedia.org/Downloads34&quot;&gt;DBpedia 3.4&lt;/a&gt;, which is the latest
release at the moment and reflects Wikipedia as of late September 2009.&lt;/p&gt;
&lt;p&gt;I prepared the sample dataset by downloading all English-language core
datasets (20 N-Triples files totaling 2.1 GB when compressed) and crunching
through them as follows:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ bzgrep Drupal *.nt.bz2 &amp;gt; drupal.nt
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To save you from gigabyte-sized downloads and an hour of data crunching, you
can just grab a copy of the resulting &lt;a href=&quot;http://blog.datagraph.org/2010/03/grepping-ntriples/drupal.nt&quot;&gt;&lt;code&gt;drupal.nt&lt;/code&gt;&lt;/a&gt; file as
follows:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ curl http://blog.datagraph.org/2010/03/grepping-ntriples/drupal.nt &amp;gt; drupal.nt
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The sample dataset totals 294 RDF statements and weighs in at 70 KB.&lt;/p&gt;
&lt;h2&gt;Counting N-Triples&lt;/h2&gt;
&lt;p&gt;The first thing we want to do is count the number of triples in an N-Triples
dataset. This is straightforward to do, since each triple is represented by
one line in an N-Triples input file and there are a number of Unix tools
that can be used to count input lines. For example, we could use either of
the following commands:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ cat drupal.nt | wc -l
294

$ cat drupal.nt | awk 'END { print NR }'
294
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since we'll be using a lot more of &lt;a href=&quot;http://en.wikipedia.org/wiki/AWK&quot;&gt;AWK&lt;/a&gt; throughout this tutorial, let's
stick with &lt;code&gt;awk&lt;/code&gt; and define a handy shell alias for this operation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ alias rdf-count=&quot;awk 'END { print NR }'&quot;

$ cat drupal.nt | rdf-count
294
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that, for reasons of comprehensibility, the previous examples as well
as most of the subsequent ones assume that we're dealing with &quot;clean&quot;
N-Triples datasets that don't contain comment lines or other miscellanea.
The DBpedia data dumps fit this bill very well. However, further onwards I
will give &quot;fortified&quot; versions of these commands that can correctly deal
with arbitrary N-Triples files.&lt;/p&gt;
&lt;h2&gt;Measuring N-Triples&lt;/h2&gt;
&lt;p&gt;We at Datagraph frequently use the N-Triples representation as the canonical
lexical form of an RDF statement, and work with &lt;a href=&quot;http://rdfcache.rubyforge.org/&quot;&gt;content-addressable storage
systems for RDF data&lt;/a&gt; that in fact &lt;em&gt;store&lt;/em&gt; statements using their
N-Triples representation. In such cases, it is often useful to know some
statistical characteristics of the data to be loaded in a mass import, so as
to e.g. be able to fine-tune the underlying storage for optimum space
efficiency.&lt;/p&gt;
&lt;p&gt;A first useful statistic is to know the typical size of a datum, i.e. the
line length of an N-Triples statement, in the dataset we're dealing with.
AWK yields us N-Triples line lengths without much trouble:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ alias rdf-lengths=&quot;awk '{ print length }'&quot;

$ cat drupal.nt | rdf-lengths | head -n5
162
150
155
137
150
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that N-Triples is an ASCII format, so the numbers above reflect both
the byte size and the ASCII character count of each input line. All
non-ASCII characters are escaped in N-Triples, and for present
purposes we'll be talking in terms of ASCII characters only.&lt;/p&gt;
&lt;p&gt;The above list of line lengths in and of itself won't do us much good; we
want to obtain aggregate information for the whole dataset at hand, not for
individual statements. It's too bad that Unix doesn't provide commands for
simple numeric aggregate operations such as the minimum, maximum and average
of a list of numbers, so let's see if we can remedy that.&lt;/p&gt;
&lt;p&gt;One way to define such operations would be to pipe the above output to an
&lt;a href=&quot;http://en.wikipedia.org/wiki/Reverse_Polish_notation&quot;&gt;RPN&lt;/a&gt; shell calculator such as &lt;code&gt;dc&lt;/code&gt; and have it perform the needed
calculations. The complexity of this would go somewhat beyond mere shell
aliases, however. Thankfully, it turns out that AWK is well-suited to
writing these aggregate operations as well. Here's how we can extend
our earlier pipeline to boil the list of line lengths down to an average:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ alias avg=&quot;awk '{ s += \$1 } END { print s / NR }'&quot;

$ cat drupal.nt | rdf-lengths | avg
242.517
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The above, incidentally, is an example of a simple &lt;a href=&quot;http://en.wikipedia.org/wiki/MapReduce&quot;&gt;map/reduce&lt;/a&gt;
operation: a sequence of input values is &lt;em&gt;mapped&lt;/em&gt; through a function, in
this case &lt;code&gt;length(line)&lt;/code&gt;, to give a sequence of output values (the line
lengths) that is then &lt;em&gt;reduced&lt;/em&gt; to a single aggregate value (the average
line length). Though I won't go further into this just now, it is worth
mentioning in passing that N-Triples is an ideal format for massively
parallel processing of RDF data using &lt;a href=&quot;http://hadoop.apache.org/&quot;&gt;Hadoop&lt;/a&gt; and the like.&lt;/p&gt;
&lt;p&gt;Now, we can still optimize and simplify the above somewhat by combining both
steps of the operation into a single alias that outputs an average line
length for the given input stream, like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ alias rdf-length-avg=&quot;awk '\
  { s += length } \
  END { print s / NR }'&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Likewise, it doesn't take much more to define an alias for obtaining the
maximum line length in the input dataset:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ alias rdf-length-max=&quot;awk '\
  BEGIN { n = 0 } \
  { if (length &amp;gt; n) n = length } \
  END { print n }'&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Getting the minimum line length is only slightly more complicated. Instead
of comparing against a zero baseline like above, we need to instead define a
&quot;roof&quot; value to compare against. In the following, I've picked an
arbitrarily large number, making the (at present) reasonable assumption that
no N-Triples line will be longer than a billion ASCII characters, which
would amount to somewhat less than a binary gigabyte:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ alias rdf-length-min=&quot;awk '\
  BEGIN { n = 1e9 } \
  { if (length &amp;gt; 0 &amp;amp;&amp;amp; length &amp;lt; n) n = length } \
  END { print (n &amp;lt; 1e9 ? n : 0) }'&quot;
&lt;/code&gt;&lt;/pre&gt;
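&lt;p&gt;If the hard-coded roof feels uncomfortable, one alternative is to seed the
minimum from the first non-empty line instead. The following is only a sketch:
the name &lt;code&gt;rdf_length_min_noroof&lt;/code&gt; is hypothetical, and a shell function is
used rather than an alias purely to keep the quoting simple:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical variant: track the minimum without an arbitrary upper bound.
rdf_length_min_noroof() {
  awk 'length($0) &amp;gt; 0 {
         # Take the first non-empty line as the baseline, then keep the smaller value.
         if (!seen || n &amp;gt; length($0)) n = length($0)
         seen = 1
       }
       END { print n + 0 }'   # prints 0 when no non-empty line was seen
}
&lt;/code&gt;&lt;/pre&gt;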
&lt;p&gt;Now that we have some aggregate operations to crunch N-Triples data with,
let's analyze our sample DBpedia dataset using the three aliases defined
above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ cat drupal.nt | rdf-length-avg
242.517

$ cat drupal.nt | rdf-length-max
2179

$ cat drupal.nt | rdf-length-min
84
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can see from the output that N-Triples line lengths in this dataset vary
considerably: from under a hundred bytes to just over two kilobytes, with an
average of around 240 bytes. This variability is to be
expected for DBpedia data, given that many RDF statements in such a dataset
contain a long textual description as their object literal whereas others
contain merely a simple integer literal.&lt;/p&gt;
&lt;p&gt;Many other statistics, such as the median line length or the standard
deviation of the line lengths, could conceivably be obtained in a manner
similar to what I've shown above. I'll leave those as exercises for the
reader, however, as further stats regarding the raw N-Triples lines are
unlikely to be all that generally interesting.&lt;/p&gt;
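&lt;p&gt;For readers who would rather not do the exercise, here is one possible
sketch of both statistics. The function names are hypothetical and not part of
the alias set in this tutorial. The median version holds every line length in
memory, and the standard deviation is the population form computed from the
running sum and sum of squares:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical helpers, not part of the original alias set.

# Median: sort the line lengths numerically, then pick the middle value
# (or the mean of the two middle values for an even count).
rdf_length_median() {
  awk '{ print length($0) }' | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# Population standard deviation, computed as sqrt(E[x^2] - E[x]^2).
rdf_length_stddev() {
  awk '{ x = length($0); n++; s += x; q += x * x }
       END { if (n) { m = s / n; print sqrt(q / n - m * m) } }'
}
&lt;/code&gt;&lt;/pre&gt;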
&lt;h2&gt;Parsing N-Triples&lt;/h2&gt;
&lt;p&gt;It's time to move on to getting at the three components -- the subject, the
predicate and the object -- that constitute RDF statements.&lt;/p&gt;
&lt;p&gt;We have two straightforward choices for obtaining the subject and predicate:
the &lt;code&gt;cut&lt;/code&gt; command and good old &lt;code&gt;awk&lt;/code&gt;. I'll show both aliases:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ alias rdf-subjects=&quot;cut -d' ' -f 1 | uniq&quot;
$ alias rdf-subjects=&quot;awk '{ print \$1 }' | uniq&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While &lt;code&gt;cut&lt;/code&gt; might shave off some microseconds compared to &lt;code&gt;awk&lt;/code&gt; here, AWK is
still the better choice for the general case, as it allows us to expand the
alias definition to ignore empty lines and comments, as we'll see later. On
our sample data, though, either form works fine.&lt;/p&gt;
&lt;p&gt;You may have noticed and wondered about the pipelined &lt;code&gt;uniq&lt;/code&gt; after &lt;code&gt;cut&lt;/code&gt; and
&lt;code&gt;awk&lt;/code&gt;. This is simply a low-cost, low-grade deduplication filter: it drops
consecutive duplicate values. For an ordered dataset (where the input
N-Triples lines are already sorted in lexical order), it will get rid of all
duplicate subjects. In an unordered dataset, it won't do much good, but it
won't do much harm either (what's a microsecond here or there?)&lt;/p&gt;
&lt;p&gt;To fully deduplicate the list of subjects for a (potentially) unordered
dataset, apply another &lt;code&gt;uniq&lt;/code&gt; filter after a &lt;code&gt;sort&lt;/code&gt; operation as follows:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ cat drupal.nt | rdf-subjects | sort | uniq | head -n5
&amp;lt;http://dbpedia.org/resource/Acquia_Drupal&amp;gt;
&amp;lt;http://dbpedia.org/resource/Adland&amp;gt;
&amp;lt;http://dbpedia.org/resource/Advomatic&amp;gt;
&amp;lt;http://dbpedia.org/resource/Apadravya&amp;gt;
&amp;lt;http://dbpedia.org/resource/Application_programming_interface&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I've not made &lt;code&gt;sort&lt;/code&gt; an integral part of the &lt;code&gt;rdf-subjects&lt;/code&gt; alias because
sorting the subjects is an expensive operation with resource usage
proportional to the number of statements processed; when processing a
billion-triple N-Triples stream, it is usually simply better to not care too
much about ordering.&lt;/p&gt;
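&lt;p&gt;If full deduplication is needed but sorted output is not, one alternative
is to let AWK remember the values it has already printed. A minimal sketch
follows; &lt;code&gt;rdf_uniq&lt;/code&gt; is a hypothetical name, and the approach trades the
cost of sorting for memory proportional to the number of distinct values:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical helper: print each distinct input line once, in first-seen order.
rdf_uniq() { awk '!seen[$0]++'; }

# Usage: cat drupal.nt | rdf-subjects | rdf_uniq | head -n5
&lt;/code&gt;&lt;/pre&gt;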
&lt;p&gt;Getting the predicates from N-Triples data works exactly the same way as
getting the subjects:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ alias rdf-predicates=&quot;cut -d' ' -f 2 | uniq&quot;
$ alias rdf-predicates=&quot;awk '{ print \$2 }' | uniq&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Again, you can apply &lt;code&gt;sort&lt;/code&gt; in conjunction with &lt;code&gt;uniq&lt;/code&gt; to get the list of
unique predicate URIs in the dataset:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ cat drupal.nt | rdf-predicates | sort | uniq | tail -n5
&amp;lt;http://www.w3.org/2000/01/rdf-schema#label&amp;gt;
&amp;lt;http://www.w3.org/2004/02/skos/core#subject&amp;gt;
&amp;lt;http://xmlns.com/foaf/0.1/depiction&amp;gt;
&amp;lt;http://xmlns.com/foaf/0.1/homepage&amp;gt;
&amp;lt;http://xmlns.com/foaf/0.1/page&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Obtaining the object component of N-Triples statements, however, is somewhat more
complicated than getting the subject or the predicate. This is due to the
fact that object literals can contain whitespace that will throw off the
whitespace-separated field handling of &lt;code&gt;cut&lt;/code&gt; and &lt;code&gt;awk&lt;/code&gt; that we've relied on
so far. Not to worry, AWK can still get us the results we want, but I won't
attempt to explain how the following alias works; just be happy that it
does:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ alias rdf-objects=&quot;awk '{ ORS=\&quot;\&quot;; for (i=3;i&amp;lt;=NF-1;i++) print \$i \&quot; \&quot;; print \&quot;\n\&quot; }' | uniq&quot;
&lt;/code&gt;&lt;/pre&gt;
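&lt;p&gt;For the curious: the alias loops over the third through the second-to-last
whitespace-separated fields and prints each one followed by a space, which
drops the terminating dot but also collapses runs of whitespace inside
literals. A hedged alternative is to strip the subject, predicate and dot with
&lt;code&gt;sub()&lt;/code&gt; and leave the rest of the line alone. In this sketch the name
&lt;code&gt;rdf_objects_verbatim&lt;/code&gt; is hypothetical, and the patterns assume that the
subject and predicate contain no whitespace:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical variant of rdf-objects that leaves whitespace inside literals untouched.
rdf_objects_verbatim() {
  awk '{
    sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, &quot;&quot;)  # remove the subject and the predicate
    sub(/[ \t]*\.[ \t]*$/, &quot;&quot;)                     # remove the terminating dot
    print
  }'
}
&lt;/code&gt;&lt;/pre&gt;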
&lt;p&gt;The output of &lt;code&gt;rdf-objects&lt;/code&gt; is the N-Triples encoded object URI, blank node
identifier or object literal. URIs are output in the same format as
subjects and predicates, with enclosing angle brackets; language-tagged
literals include the language tag, and datatyped literals include the
datatype URI:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ cat drupal.nt | rdf-objects | sort | uniq | head -n5
&quot;09&quot;^^&amp;lt;http://www.w3.org/2001/XMLSchema#integer&amp;gt;
&quot;16&quot;^^&amp;lt;http://www.w3.org/2001/XMLSchema#integer&amp;gt;
&quot;2001-01&quot;^^&amp;lt;http://www.w3.org/2001/XMLSchema#gYearMonth&amp;gt;
&quot;2009&quot;^^&amp;lt;http://www.w3.org/2001/XMLSchema#integer&amp;gt;
&quot;6.14&quot;^^&amp;lt;http://www.w3.org/2001/XMLSchema#decimal&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Another very useful operation to have is getting the list of object literal
datatypes used in an N-Triples dataset. This is also a somewhat involved
alias definition, and requires a modern AWK version such as &lt;a href=&quot;http://www.gnu.org/software/gawk/&quot;&gt;&lt;code&gt;gawk&lt;/code&gt;&lt;/a&gt;
or &lt;a href=&quot;http://invisible-island.net/mawk/mawk.html&quot;&gt;&lt;code&gt;mawk&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ alias rdf-datatypes=&quot;awk -F'\x5E' '/\&quot;\^\^&amp;lt;/ { print substr(\$3, 1, length(\$3)-2) }' | uniq&quot;

$ cat drupal.nt | rdf-datatypes | sort | uniq
&amp;lt;http://www.w3.org/2001/XMLSchema#decimal&amp;gt;
&amp;lt;http://www.w3.org/2001/XMLSchema#gYearMonth&amp;gt;
&amp;lt;http://www.w3.org/2001/XMLSchema#integer&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As we can see, most object literals in this dataset are untyped strings, but
there are some decimal and integer values as well as year + month literals.&lt;/p&gt;
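&lt;p&gt;To see how often each datatype occurs, the same idea can be extended into a
frequency table. This is a sketch only: &lt;code&gt;rdf_datatype_counts&lt;/code&gt; is a
hypothetical name, and the pattern assumes that the datatype URI is the last
thing on the line before the terminating dot:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical helper: count how often each literal datatype occurs.
rdf_datatype_counts() {
  awk 'match($0, /\^\^&amp;lt;[^&amp;gt;]+&amp;gt;[ \t]*\.[ \t]*$/) {
         t = substr($0, RSTART + 2, RLENGTH - 2)  # skip the leading ^^
         sub(/[ \t]*\.[ \t]*$/, &quot;&quot;, t)           # drop the terminating dot
         count[t]++
       }
       END { for (t in count) print count[t], t }' | sort -rn
}
&lt;/code&gt;&lt;/pre&gt;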
&lt;h2&gt;Aliasing N-Triples&lt;/h2&gt;
&lt;p&gt;As promised, here follow more robust versions of all the aforementioned Bash
aliases. Just copy and paste the following code snippet into your
&lt;code&gt;~/.bash_aliases&lt;/code&gt; or &lt;code&gt;~/.bash_profile&lt;/code&gt; file, and you will always have these
aliases available when working with N-Triples data on the command line.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# N-Triples aliases from http://blog.datagraph.org/2010/03/grepping-ntriples
alias rdf-count=&quot;awk '/^[ \t]*[^# \t]/ { n += 1 } END { print n + 0 }'&quot;
alias rdf-lengths=&quot;awk '/^[ \t]*[^# \t]/ { print length }'&quot;
alias rdf-length-avg=&quot;awk '/^[ \t]*[^# \t]/ { n += 1; s += length } END { if (n) print s/n }'&quot;
alias rdf-length-max=&quot;awk 'BEGIN { n=0 } /^[ \t]*[^# \t]/ { if (length&amp;gt;n) n=length } END { print n }'&quot;
alias rdf-length-min=&quot;awk 'BEGIN { n=1e9 } /^[ \t]*[^# \t]/ { if (length&amp;gt;0 &amp;amp;&amp;amp; length&amp;lt;n) n=length } END { print (n&amp;lt;1e9 ? n : 0) }'&quot;
alias rdf-subjects=&quot;awk '/^[ \t]*[^# \t]/ { print \$1 }' | uniq&quot;
alias rdf-predicates=&quot;awk '/^[ \t]*[^# \t]/ { print \$2 }' | uniq&quot;
alias rdf-objects=&quot;awk '/^[ \t]*[^# \t]/ { ORS=\&quot;\&quot;; for (i=3;i&amp;lt;=NF-1;i++) print \$i \&quot; \&quot;; print \&quot;\n\&quot; }' | uniq&quot;
alias rdf-datatypes=&quot;awk -F'\x5E' '/\&quot;\^\^&amp;lt;/ { print substr(\$3, 2, length(\$3)-4) }' | uniq&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I should also note that though I've spoken throughout only in terms of
N-Triples, most of the above aliases will work fine also for input in
&lt;a href=&quot;http://sw.deri.org/2008/07/n-quads/&quot;&gt;N-Quads&lt;/a&gt; format.&lt;/p&gt;
&lt;p&gt;In the next installments of &lt;em&gt;RDF for Intrepid Unix Hackers&lt;/em&gt;, we'll attempt
something a little more ambitious: building an &lt;code&gt;rdf-query&lt;/code&gt; alias to perform
subject-predicate-object queries on N-Triples input. We'll also see what to
do if your RDF data isn't already in N-Triples format, learning how to
install and use the &lt;a href=&quot;http://librdf.org/raptor/&quot;&gt;Raptor RDF Parser Library&lt;/a&gt; to convert RDF data
between the various popular RDF serialization formats. &lt;a href=&quot;http://feeds.feedburner.com/datagraph&quot;&gt;Stay
tuned&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;&lt;em&gt;Lest there be any doubt, all the code in this tutorial is hereby
released into the public domain using the &lt;a href=&quot;http://unlicense.org/&quot;&gt;Unlicense&lt;/a&gt;. You are free to
copy, modify, publish, use, sell and distribute it in any way you please,
with or without attribution.&lt;/em&gt;&lt;/small&gt;&lt;/p&gt;</description>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2010-03-05T16:00:00Z</dc:date>
    <dc:creator>Arto</dc:creator>
    <dc:subject>AWK</dc:subject>
    <dc:subject>DBpedia</dc:subject>
    <dc:subject>Drupal</dc:subject>
    <dc:subject>N-Triples</dc:subject>
    <dc:subject>RDF</dc:subject>
    <dc:subject>Unix</dc:subject>
    <cc:license rdf:resource="http://creativecommons.org/publicdomain/zero/1.0/"/>
  </item>
  <item rdf:about="http://ar.to/node/193">
    <title>RDFizing Drupal: Upgrading the RSS Feeds</title>
    <link>http://ar.to/2009/03/rdfizing-drupal-rss-feeds</link>
    <description>&lt;p&gt;&lt;em&gt;This is the first part in a series of articles on &lt;a href=&quot;http://ar.to/articles/rdfizing-drupal&quot;&gt;RDFizing Drupal&lt;/a&gt;, showing how you can make use of the &lt;a href=&quot;http://drupal.org/project/rdf&quot;&gt;RDF module&lt;/a&gt; for Drupal 6.x to set your data free and connect your Drupal site to the emerging &lt;a href=&quot;http://linkeddata.org/&quot;&gt;Linked Data&lt;/a&gt; web.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;
If you've been wanting to use &lt;a href=&quot;http://drupal.org/&quot;&gt;Drupal&lt;/a&gt; 6.x for creating &lt;a href=&quot;http://drupal.org/node/219862&quot; title=&quot;Resource Description Framework&quot;&gt;RDF&lt;/a&gt;-enabled websites, you've probably been annoyed at the fact that Drupal outputs feeds in &lt;acronym title=&quot;Really Simple Syndication&quot;&gt;RSS&lt;/acronym&gt; 2.0 format, which isn't based on RDF. This article will show you, step by step, how to upgrade&lt;a href=&quot;#footnote-rss&quot;&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/a&gt; all of Drupal's RSS feeds into clean, extensible and RDF-compatible &lt;a href=&quot;http://web.resource.org/rss/1.0/&quot;&gt;&lt;acronym title=&quot;RDF Site Summary&quot;&gt;RSS&lt;/acronym&gt; 1.0&lt;/a&gt; format.
&lt;/p&gt;

&lt;p&gt;
To get started you must first, of course, install the &lt;a href=&quot;http://drupal.org/project/rdf&quot;&gt;RDF module&lt;/a&gt;. Any version of the module since RDF 6.x-1.0-alpha6 ought to do fine. There are no dependencies that you need to care about&lt;a href=&quot;#footnote-xmlwriter&quot;&gt;&lt;sup&gt;[2]&lt;/sup&gt;&lt;/a&gt; other than ensuring your PHP version is sufficiently recent (PHP 5.2.0 or newer). Just follow the installation instructions in the accompanying &lt;a href=&quot;http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/rdf/INSTALL.txt?view=markup&quot; title=&quot;How to install the Drupal RDF module&quot;&gt;INSTALL.txt&lt;/a&gt; file, and then enable the module at &lt;em&gt;Administer &amp;raquo; Site building &amp;raquo; Modules&lt;/em&gt;:
&lt;/p&gt;

&lt;a name=&quot;image-module&quot;&gt;&lt;/a&gt;
&lt;img src=&quot;http://ar.to/2009/03/rdfizing-drupal-rss-feeds/module.png&quot;
  alt=&quot;Administer &amp;gt; Site building &amp;gt; Modules&quot;
  width=&quot;727&quot; height=&quot;172&quot; border=&quot;0&quot;/&gt;

&lt;p&gt;
After enabling the module, navigate to &lt;em&gt;Administer &amp;raquo; Site configuration &amp;raquo; RDF settings &amp;raquo; Feeds&lt;/em&gt;:
&lt;/p&gt;

&lt;a name=&quot;image-breadcrumb&quot;&gt;&lt;/a&gt;
&lt;img src=&quot;http://ar.to/2009/03/rdfizing-drupal-rss-feeds/breadcrumb.png&quot;
  alt=&quot;Home &amp;gt; Administer &amp;gt; Site configuration &amp;gt; RDF settings&quot;
  width=&quot;727&quot; height=&quot;57&quot; border=&quot;0&quot;/&gt;

&lt;p&gt;
On this screen you'll see a listing of all the available RSS feeds output by Drupal's core modules. These are published, in Drupal 6.x, by the &lt;em&gt;Node&lt;/em&gt;, &lt;em&gt;Taxonomy&lt;/em&gt;, &lt;em&gt;Blog&lt;/em&gt; and &lt;em&gt;Aggregator&lt;/em&gt; modules. (If you've installed the &lt;a href=&quot;http://drupal.org/project/views&quot;&gt;Views&lt;/a&gt; module, this screen will also list any RDF feeds you've enabled for your views&lt;a href=&quot;#footnote-hook_rdf_feeds&quot;&gt;&lt;sup&gt;[3]&lt;/sup&gt;&lt;/a&gt;.)
&lt;/p&gt;

&lt;a name=&quot;image-feeds&quot;&gt;&lt;/a&gt;
&lt;img src=&quot;http://ar.to/2009/03/rdfizing-drupal-rss-feeds/feeds.png&quot;
  alt=&quot;Home &amp;gt; Administer &amp;gt; Site configuration &amp;gt; RDF settings &amp;gt; Feeds&quot;
  width=&quot;727&quot; height=&quot;327&quot; border=&quot;0&quot;/&gt;

&lt;p&gt;
To upgrade any of the core feeds to RDF format, simply use the &lt;em&gt;enable&lt;/em&gt; action. This will present you with the following choice:
&lt;/p&gt;

&lt;a name=&quot;image-config&quot;&gt;&lt;/a&gt;
&lt;img src=&quot;http://ar.to/2009/03/rdfizing-drupal-rss-feeds/config.png&quot;
  alt=&quot;Configure RDF feed&quot;
  width=&quot;727&quot; height=&quot;337&quot; border=&quot;0&quot;/&gt;

&lt;p&gt;
Note that you can downgrade back to the default RSS 2.0 feeds at any time, so don't be afraid to experiment. To RDFize your feed, simply select the RSS 1.0 option and save the configuration. You will be returned to the same screen with a status message indicating that the feed was upgraded:
&lt;/p&gt;

&lt;a name=&quot;image-config-upgrade&quot;&gt;&lt;/a&gt;
&lt;img src=&quot;http://ar.to/2009/03/rdfizing-drupal-rss-feeds/config-upgrade.png&quot;
  alt=&quot;The feed rss.xml has been upgraded for RDF compatibility.&quot;
  width=&quot;727&quot; height=&quot;277&quot; border=&quot;0&quot;/&gt;

&lt;p&gt;
Once you've upgraded a feed, some additional configuration options will be made available to you. (Note that you &lt;em&gt;don't&lt;/em&gt; need to change any of these settings if you don't want to, and everything will work just as before using the defaults; feel free to skip ahead several paragraphs if you don't care to tinker with this at the moment.)
&lt;/p&gt;

&lt;p&gt;
In the &lt;em&gt;Channel settings&lt;/em&gt; section, you will find settings that implement the &lt;a href=&quot;http://web.resource.org/rss/1.0/modules/syndication/&quot;&gt;RSS 1.0 syndication hints&lt;/a&gt; specification. This is a standard that specifies advisory metadata that you can include in your feed to tell feed readers how often your feed is updated. This allows aggregators to optimize how often they'll re-fetch your feed, and hence also affords you some potential control over your bandwidth usage:
&lt;/p&gt;

&lt;a name=&quot;image-config-channel&quot;&gt;&lt;/a&gt;
&lt;img src=&quot;http://ar.to/2009/03/rdfizing-drupal-rss-feeds/config-channel.png&quot;
  alt=&quot;Configure RDF feed - channel settings&quot;
  width=&quot;727&quot; height=&quot;442&quot; border=&quot;0&quot;/&gt;

&lt;p&gt;
Here you can also change the RSS feed's serialization format. RDF can be represented in a wide variety of &lt;a href=&quot;http://en.wikipedia.org/wiki/Resource_Description_Framework#Serialization_formats&quot;&gt;serialization formats&lt;/a&gt;, and the RDF module provides support for some of the most popular ones (if you install the optional &lt;a href=&quot;http://arc.semsol.org/&quot;&gt;ARC&lt;/a&gt; RDF library, you will get support for yet more formats). However, only explicitly RDF-aware feed aggregators can handle anything other than the default RDF/XML serialization, so be advised that changing this setting is probably a rather bad idea for the time being.
&lt;/p&gt;

&lt;p&gt;
Below the channel settings you will find a section for configuring how feed items (that is, Drupal nodes, taxonomy terms, and such) are output in the RSS feed. At the moment, you have two additional settings: you can configure how body fields get output (using the teaser only, or including the full text), and you can configure whether date/time information in the feed includes the time zone component (if applicable, such as for &lt;a href=&quot;http://drupal.org/project/date&quot;&gt;Date&lt;/a&gt; module fields) or whether all times will be output in UTC:
&lt;/p&gt;

&lt;a name=&quot;image-config-item&quot;&gt;&lt;/a&gt;
&lt;img src=&quot;http://ar.to/2009/03/rdfizing-drupal-rss-feeds/config-item.png&quot;
  alt=&quot;Configure RDF feed - item settings&quot;
  width=&quot;727&quot; height=&quot;297&quot; border=&quot;0&quot;/&gt;

&lt;p&gt;
Once you're done with the feed settings, save the configuration and you'll be returned to the RDF feed management screen. Notice that the &lt;em&gt;Operations&lt;/em&gt; column indicates which feeds have been upgraded to RDF, with the &lt;em&gt;enable&lt;/em&gt; action changing to &lt;em&gt;configure&lt;/em&gt; where applicable:
&lt;/p&gt;

&lt;a name=&quot;image-config-save&quot;&gt;&lt;/a&gt;
&lt;img src=&quot;http://ar.to/2009/03/rdfizing-drupal-rss-feeds/config-save.png&quot;
  alt=&quot;The feed configuration for rss.xml has been updated.&quot;
  width=&quot;727&quot; height=&quot;362&quot; border=&quot;0&quot;/&gt;

&lt;p&gt;
A special note on Drupal's front page feed, &lt;tt&gt;rss.xml&lt;/tt&gt;: once you've RDFized this feed, it isn't ideal that it still has the all-too-generic URL extension &lt;tt&gt;.xml&lt;/tt&gt;. You can certainly keep it that way if you wish (feed aggregators parse feeds based on the MIME content type, not the file extension), but Drupal makes it so trivially easy to rename the feed's URL that I'd recommend doing so. A more appropriate extension for RSS 1.0 feeds would be &lt;tt&gt;.rss&lt;/tt&gt; or &lt;tt&gt;.rdf&lt;/tt&gt;. You can rename the feed URL by navigating to &lt;em&gt;Administer &amp;raquo; Site building &amp;raquo; URL aliases &amp;raquo; Add alias&lt;/em&gt; and entering something like the following:
&lt;/p&gt;

&lt;a name=&quot;image-alias&quot;&gt;&lt;/a&gt;
&lt;img src=&quot;http://ar.to/2009/03/rdfizing-drupal-rss-feeds/alias.png&quot;
  alt=&quot;Home &amp;gt; Administer &amp;gt; Site building &amp;gt; URL aliases &amp;gt; Add alias&quot;
  width=&quot;727&quot; height=&quot;367&quot; border=&quot;0&quot;/&gt;

&lt;p&gt;
Based on the feeds listed at &lt;a href=&quot;http://planetrdf.com/&quot;&gt;Planet RDF&lt;/a&gt;, &lt;tt&gt;index.rdf&lt;/tt&gt; would seem to be the most popular URL for a front page feed, so that's a data point to take into consideration. (I've been contrarian on this, myself, and named my blog's feed simply &lt;tt&gt;blog.rss&lt;/tt&gt;, intending it to only include blog posts. I'm using the &lt;tt&gt;.rss&lt;/tt&gt; extension to differentiate my RSS feeds from other RDF data that I will publish here later using the usual &lt;tt&gt;.rdf&lt;/tt&gt; extension.)
&lt;/p&gt;

&lt;p&gt;
Now, in a similar way as you would rename Drupal's &lt;tt&gt;rss.xml&lt;/tt&gt;, you can also define URL aliases for any of the other non-wildcard feeds listed on the RDF feeds management screen:
&lt;/p&gt;

&lt;a name=&quot;image-feeds-aliased&quot;&gt;&lt;/a&gt;
&lt;img src=&quot;http://ar.to/2009/03/rdfizing-drupal-rss-feeds/feeds-aliased.png&quot;
  alt=&quot;Home &amp;gt; Administer &amp;gt; Site configuration &amp;gt; RDF settings &amp;gt; Feeds&quot;
  width=&quot;727&quot; height=&quot;327&quot; border=&quot;0&quot;/&gt;

&lt;p&gt;
And if you'd like to rename any of the displayed wildcard feeds, such as the taxonomy feeds at paths of the form &lt;tt&gt;taxonomy/term/%/0/feed&lt;/tt&gt;, that's easy enough to do by installing the excellent &lt;a href=&quot;http://drupal.org/project/pathauto&quot;&gt;Pathauto&lt;/a&gt; module that will automatically create such URL aliases where needed. Here on my blog, for instance, all my &lt;a href=&quot;http://ar.to/tags&quot;&gt;tags&lt;/a&gt; have RDFized feeds with URLs of the form &lt;a href=&quot;http://ar.to/tags/drupal.rss&quot;&gt;http://ar.to/tags/drupal.rss&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
If you're a perfectionist, consider also installing the &lt;a href=&quot;http://drupal.org/project/globalredirect&quot;&gt;Global Redirect&lt;/a&gt; module to ensure that attempts to access a non-aliased feed URL will result in an HTTP redirect to the canonical aliased URL. For example, should you try to load up &lt;a href=&quot;http://ar.to/rss.xml&quot; title=&quot;My non-aliased RSS feed&quot;&gt;http://ar.to/rss.xml&lt;/a&gt;, you will be redirected to &lt;a href=&quot;http://ar.to/blog.rss&quot; title=&quot;My aliased RSS feed&quot;&gt;http://ar.to/blog.rss&lt;/a&gt; which is the alias I've defined for my front page feed. Among other benefits, this makes sure that search engines won't index both URLs.
&lt;/p&gt;

&lt;p&gt;
Once you've RDFized your feeds, you may want to use &lt;a href=&quot;http://www.w3.org/RDF/Validator/&quot;&gt;W3C's RDF Validation Service&lt;/a&gt; to double-check that everything turned out a-okay and that your feeds are indeed valid RDF. My blog feed is clearly bursting at the seams with RDFness, as &lt;a href=&quot;http://www.w3.org/RDF/Validator/ARPServlet?URI=http%3A%2F%2Far.to%2Fblog.rss&amp;amp;PARSE=Parse+URI%3A+&amp;amp;TRIPLES_AND_GRAPH=PRINT_BOTH&amp;amp;FORMAT=PNG_EMBED&quot;&gt;validating it&lt;/a&gt; yields the following reassuring message:
&lt;/p&gt;

&lt;a name=&quot;image-validate&quot;&gt;&lt;/a&gt;
&lt;img src=&quot;http://ar.to/2009/03/rdfizing-drupal-rss-feeds/validate.png&quot;
  alt=&quot;W3C RDF Validation Service&quot;
  width=&quot;727&quot; height=&quot;227&quot; border=&quot;0&quot;/&gt;

&lt;p&gt;
In case you are still learning RDF, the validation service is also a great way to view the underlying &lt;a href=&quot;http://en.wikipedia.org/wiki/Resource_Description_Framework#Overview&quot;&gt;triples&lt;/a&gt; (RDF statements) that constitute RDF documents such as your RSS feed. You can get the triples listed both in table format and rendered as a graph in a variety of graphics formats; this can really be helpful in grokking how simple RDF actually is beneath all that &lt;acronym title=&quot;Extensible Markup Language&quot;&gt;XML&lt;/acronym&gt; verbiage.
&lt;/p&gt;
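
&lt;p&gt;
If you'd like to build such a validation link for your own feed, here is a minimal Python sketch. It merely mirrors the query string of the validation link above; the helper name &lt;tt&gt;validator_url&lt;/tt&gt; is hypothetical, and the endpoint and parameter values are copied from that link rather than taken from any documented API.
&lt;/p&gt;

```python
from urllib.parse import urlencode

# Endpoint and parameter values are copied from the validation link in this
# article; they are assumptions about the service, not a documented API.
VALIDATOR_ENDPOINT = 'http://www.w3.org/RDF/Validator/ARPServlet'


def validator_url(feed_url):
    # Hypothetical helper: returns a validation link for the given feed URL.
    query = urlencode([
        ('URI', feed_url),                    # the feed to validate
        ('PARSE', 'Parse URI: '),             # value as seen in the link
        ('TRIPLES_AND_GRAPH', 'PRINT_BOTH'),  # value as seen in the link
        ('FORMAT', 'PNG_EMBED'),              # value as seen in the link
    ])
    return VALIDATOR_ENDPOINT + '?' + query


print(validator_url('http://ar.to/blog.rss'))
```

&lt;p&gt;
Swap in the URL of your own aliased feed to get a link you can bookmark and revisit whenever you change your feed settings.
&lt;/p&gt;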

&lt;p&gt;
Well, that's all for now. Go forth and RDFize all your feeds; you know you want to. I will add a link here to the first several people who upgrade their Drupal feeds per these instructions (just &lt;a href=&quot;http://ar.to/contact&quot; title=&quot;Contact Me&quot;&gt;leave me a note&lt;/a&gt; with a link to your site). And should you run into any trouble with these instructions, please &lt;a href=&quot;http://drupal.org/node/add/project-issue/rdf&quot;&gt;post an issue&lt;/a&gt; at drupal.org and we'll see if we can sort it out.
&lt;/p&gt;

&lt;p&gt;
&lt;em&gt;In the next couple of parts of this article series, I'll be talking about how you can include additional &lt;a href=&quot;http://drupal.org/project/cck&quot;&gt;CCK&lt;/a&gt; fields in your RSS feeds, and how to enable &lt;a href=&quot;http://rdfa.info/&quot;&gt;RDFa&lt;/a&gt; (affectionately known as &quot;microformats on steroids&quot;) on your Drupal site. Be sure to subscribe to the aforementioned &lt;a href=&quot;http://ar.to/blog.rss&quot;&gt;feed&lt;/a&gt; to get these upcoming articles!&lt;/em&gt;
&lt;/p&gt;

&lt;p&gt;
&lt;strong&gt;Update:&lt;/strong&gt; &lt;a href=&quot;http://www.juliakm.com/&quot;&gt;Julia Kulla-Mader&lt;/a&gt; (&lt;a href=&quot;http://www.juliakm.com/rss.rdf&quot;&gt;RSS&lt;/a&gt;) and &lt;a href=&quot;http://web3.ee/&quot;&gt;Kaido Toomingas&lt;/a&gt; (&lt;a href=&quot;http://web3.ee/rss.rdf&quot;&gt;RSS&lt;/a&gt;) are the first pioneers to brave these waters and RDFize their feeds. &lt;a href=&quot;http://groups.drupal.org/node/20529&quot;&gt;Anyone else?&lt;/a&gt;
&lt;/p&gt;

&lt;br/&gt;

&lt;p&gt;
&lt;small&gt;&lt;a name=&quot;footnote-rss&quot; href=&quot;#footnote-rss&quot;&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/a&gt;
I won't delve here into &lt;a href=&quot;http://en.wikipedia.org/wiki/RSS#History&quot;&gt;the history&lt;/a&gt; of the &lt;a href=&quot;http://en.wikipedia.org/wiki/RSS#Variants&quot;&gt;RSS 2.0 controversy&lt;/a&gt;, but suffice it to say that RSS 2.0 (&quot;Really Simple Syndication&quot;) represents a downgrade from RSS 1.0 (&quot;RDF Site Summary&quot;) in terms of capabilities and potential. You've heard of &quot;embrace and extend&quot;, right? Well, try &quot;co-opt and cripple&quot; on for size. (&lt;strong&gt;Update:&lt;/strong&gt; I posted some more on this at the &lt;a href=&quot;http://www.reddit.com/r/programming/comments/87d5h/rdfizing_drupal_upgrading_the_rss_feeds/&quot;&gt;Reddit&lt;/a&gt; thread and at &lt;a href=&quot;http://groups.drupal.org/node/20529&quot;&gt;groups.drupal.org&lt;/a&gt;.)&lt;/small&gt;
&lt;/p&gt;

&lt;p&gt;
&lt;small&gt;&lt;a name=&quot;footnote-xmlwriter&quot; href=&quot;#footnote-xmlwriter&quot;&gt;&lt;sup&gt;[2]&lt;/sup&gt;&lt;/a&gt;
Note that for the purposes described in this article, you &lt;em&gt;don't&lt;/em&gt; have to install the optional &lt;a href=&quot;http://arc.semsol.org/&quot;&gt;ARC&lt;/a&gt; RDF library; the RDF module includes native support for &lt;a href=&quot;http://www.w3.org/TR/rdf-syntax-grammar/&quot;&gt;RDF/XML&lt;/a&gt; output using PHP's &lt;a href=&quot;http://php.net/manual/en/book.xmlwriter.php&quot;&gt;XMLWriter&lt;/a&gt; extension. This extension is &lt;a href=&quot;http://php.net/manual/en/xmlwriter.installation.php&quot;&gt;available by default&lt;/a&gt; since PHP 5.1.2, though FreeBSD users may need to explicitly install the &lt;a href=&quot;http://www.freshports.org/textproc/php5-xmlwriter/&quot;&gt;php5-xmlwriter&lt;/a&gt; package.&lt;/small&gt;
&lt;/p&gt;

&lt;p&gt;
&lt;small&gt;&lt;a name=&quot;footnote-hook_rdf_feeds&quot; href=&quot;#footnote-hook_rdf_feeds&quot;&gt;&lt;sup&gt;[3]&lt;/sup&gt;&lt;/a&gt;
Developers: see &lt;tt&gt;hook_rdf_feeds()&lt;/tt&gt; in &lt;tt&gt;rdf.module&lt;/tt&gt; for an example of how you can declare RDF-compatible feeds that will be listed on this screen.&lt;/small&gt;
&lt;/p&gt;</description>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-03-24T23:00:00Z</dc:date>
    <dc:creator>Arto</dc:creator>
    <dc:subject>Drupal</dc:subject>
    <dc:subject>Linked Data</dc:subject>
    <dc:subject>RDF</dc:subject>
    <dc:subject>RSS</dc:subject>
    <dc:subject>Semantic Web</dc:subject>
    <dc:subject>RDFizing Drupal</dc:subject>
    <cc:license rdf:resource="http://creativecommons.org/publicdomain/zero/1.0/"/>
  </item>
  <item rdf:about="http://ar.to/node/49">
    <title>The Universal Timeline Aggregator</title>
    <link>http://ar.to/2006/12/the-universal-timeline-aggregator</link>
    <description>&lt;p&gt;&lt;img alt=&quot;Timeline&quot; src=&quot;http://ar.to/2006/12/the-universal-timeline-aggregator/timeline-sundial.png&quot; width=&quot;150&quot; height=&quot;150&quot; border=&quot;0&quot; align=&quot;left&quot; style=&quot;margin-right: 15px;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;For those who haven&amp;#8217;t yet come across it somewhere on the web, may I recommend checking out the &lt;a href=&quot;http://simile.mit.edu/timeline/&quot;&gt;Timeline&lt;/a&gt; widget developed by &lt;a href=&quot;http://people.csail.mit.edu/people/dfhuynh/&quot;&gt;David F. Huynh&lt;/a&gt; of the &lt;a href=&quot;http://simile.mit.edu/&quot;&gt;&lt;span class=&quot;caps&quot;&gt;MIT&lt;/span&gt; Simile&lt;/a&gt; project. It&amp;#8217;s a snazzy DHTML/JavaScript tool for visualizing chronological events on a scrollable, graphical timeline &amp;#8212; sort of a Google Maps for temporal information.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve been working quite a bit with the Simile widget lately, co-developing (with &lt;a href=&quot;http://drdonohue.com/&quot;&gt;David Donohue&lt;/a&gt;) a &lt;a href=&quot;http://drupal.org/project/timeline&quot;&gt;module&lt;/a&gt; that integrates the widget into &lt;a href=&quot;http://drupal.org/&quot;&gt;Drupal&lt;/a&gt;, allowing Drupal sites to display any &lt;a href=&quot;http://drupal.org/project/cck&quot;&gt;CCK&lt;/a&gt; / &lt;a href=&quot;http://drupal.org/project/views&quot;&gt;Views&lt;/a&gt; content as graphical timelines.&lt;/p&gt;

&lt;p&gt;Today, inspired by &lt;a href=&quot;http://apassant.net/blog/&quot;&gt;Alexandre Passant&amp;#8217;s&lt;/a&gt; &lt;a href=&quot;http://apassant.net/home/2006/07/rss2timeline/&quot;&gt;RSS2Timeline&lt;/a&gt; implementation, I sat down to code up a generic web service that can take any Atom or &lt;span class=&quot;caps&quot;&gt;RSS&lt;/span&gt; feed and convert it into a &lt;a href=&quot;http://en.wikipedia.org/wiki/JSON&quot;&gt;JSON-based&lt;/a&gt; event source for the Timeline widget. My goal was to make it absolutely trivial to embed live Atom/&lt;span class=&quot;caps&quot;&gt;RSS&lt;/span&gt; timelines into blogs and whatnot, so that anyone with basic &lt;span class=&quot;caps&quot;&gt;HTML&lt;/span&gt; skills could use timelines without having to go through the relatively complex technical setup the widget requires.&lt;/p&gt;

&lt;p&gt;I hereby present the &lt;strong&gt; &lt;a href=&quot;http://timeline.to/&quot;&gt;Universal Timeline Aggregator&lt;/a&gt; &lt;/strong&gt;, available at &lt;strong&gt; &lt;a href=&quot;http://timeline.to/&quot;&gt;http://timeline.to/&lt;/a&gt; &lt;/strong&gt; (I&amp;#8217;ve been snapping up Tonga&amp;#8217;s &lt;a href=&quot;http://en.wikipedia.org/wiki/.to&quot;&gt;dot-to domains&lt;/a&gt; since getting the ultimate &lt;a href=&quot;http://en.wikipedia.org/wiki/Vanity_domain&quot;&gt;vanity domain&lt;/a&gt;, &lt;a href=&quot;http://ar.to/&quot;&gt;ar.to&lt;/a&gt;, as a Christmas present; the &amp;#8220;to&amp;#8221; preposition works rather nicely for the present purpose, too.)&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s a screen capture (and live example, if you click on it) of the sort of timeline display you can create in a minute or two using the &lt;strong&gt; &lt;a href=&quot;http://timeline.to/&quot;&gt;timeline.to&lt;/a&gt; &lt;/strong&gt; service:&lt;/p&gt;

&lt;div class=&quot;screenshot&quot;&gt;
&lt;a href=&quot;http://timeline.to/?url=http%3A%2F%2Fdev.rubyonrails.org%2Ftimeline%3Fchangeset%3Don%26max%3D50%26format%3Drss&quot; target=&quot;_blank&quot;&gt;&lt;img alt=&quot;Ruby on Rails Trac&quot; src=&quot;http://ar.to/2006/12/the-universal-timeline-aggregator/timeline-trac-rails.png&quot; width=&quot;500&quot; height=&quot;400&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;

&lt;span&gt;Timeline view of &lt;a href=&quot;http://dev.rubyonrails.org/timeline&quot;&gt;recent Ruby on Rails development&lt;/a&gt;.&lt;/span&gt;
&lt;/div&gt;

&lt;p&gt;Embedding a live, interactive Atom or &lt;span class=&quot;caps&quot;&gt;RSS&lt;/span&gt; timeline into any site is now as easy as copying and pasting the following &lt;span class=&quot;caps&quot;&gt;HTML&lt;/span&gt; snippet, with the appropriate modifications:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;html&quot;&gt;&amp;lt;iframe src=&quot;http://timeline.to/http://www.mysite.com/rss.xml&quot;
  width=&quot;500&quot; height=&quot;400&quot;
  scrolling=&quot;no&quot; frameborder=&quot;1&quot;
  marginwidth=&quot;0&quot; marginheight=&quot;0&quot;&amp;gt;&amp;lt;/iframe&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Just replace &lt;tt&gt;http://www.mysite.com/rss.xml&lt;/tt&gt; with the &lt;span class=&quot;caps&quot;&gt;URL&lt;/span&gt; of a real Atom or &lt;span class=&quot;caps&quot;&gt;RSS&lt;/span&gt; feed, and adjust the width and height as you like. &lt;a href=&quot;http://diveintomark.org/archives/2002/08/15/ultraliberal_rss_locator&quot;&gt;Autodiscovery&lt;/a&gt; of feeds is supported to a reasonable extent, so in most cases you won&amp;#8217;t even need the exact &lt;span class=&quot;caps&quot;&gt;URL&lt;/span&gt; of the feed; the website&amp;#8217;s own &lt;span class=&quot;caps&quot;&gt;URL&lt;/span&gt; will do.&lt;/p&gt;
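
&lt;p&gt;As a rough illustration, the following Python sketch builds a timeline address for a given feed or site. It assumes the percent-encoded &lt;tt&gt;?url=&lt;/tt&gt; form used by the screenshot links in this post; the function name &lt;tt&gt;timeline_url&lt;/tt&gt; is hypothetical and not part of the service.&lt;/p&gt;

```python
from urllib.parse import quote


def timeline_url(feed_url):
    # Hypothetical helper. Percent-encode the whole address, including
    # ':' and '/', so that it travels as a single query parameter, as in
    # the screenshot links in this post.
    return 'http://timeline.to/?url=' + quote(feed_url, safe='')


print(timeline_url('http://www.scheme.dk/planet/'))
```

&lt;p&gt;The resulting address can then be used as the &lt;tt&gt;src&lt;/tt&gt; attribute of the iframe.&lt;/p&gt;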

&lt;p&gt;Here&amp;#8217;s another example as a screen capture, this time showing the popup box that opens when a timeline event is clicked:&lt;/p&gt;

&lt;div class=&quot;screenshot&quot;&gt;
&lt;a href=&quot;http://timeline.to/?url=http%3A%2F%2Fwww.scheme.dk%2Fplanet%2F&quot; target=&quot;_blank&quot;&gt;&lt;img alt=&quot;Planet Scheme&quot; src=&quot;http://ar.to/2006/12/the-universal-timeline-aggregator/timeline-planet-scheme.png&quot; width=&quot;500&quot; height=&quot;400&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;br /&gt;

&lt;span&gt;Timeline view of &lt;a href=&quot;http://www.scheme.dk/planet/&quot;&gt;Planet Scheme&lt;/a&gt;, showing a preview of a blog entry.&lt;/span&gt;
&lt;/div&gt;

&lt;p&gt;Starting out on the &lt;strong&gt; &lt;a href=&quot;http://timeline.to/&quot;&gt;timeline.to&lt;/a&gt; &lt;/strong&gt; implementation today, I had to actually pause a moment to contemplate which technology to use: Python, Ruby or &lt;span class=&quot;caps&quot;&gt;PHP&lt;/span&gt; &amp;#8212; it wasn&amp;#8217;t quite as clear-cut a decision as usual.&lt;/p&gt;

&lt;p&gt;Case in point, while I haven&amp;#8217;t done much Python coding recently (since defecting to the Ruby camp), the language does have some excellent libraries and frameworks going for it &amp;#8212; including arguably the best Atom/&lt;span class=&quot;caps&quot;&gt;RSS&lt;/span&gt; parser in existence, the &lt;a href=&quot;http://feedparser.org/&quot;&gt;Universal Feed Parser&lt;/a&gt; written by &lt;a href=&quot;http://diveintomark.org/&quot;&gt;Mark Pilgrim&lt;/a&gt;. Considering the &lt;a href=&quot;http://diveintomark.org/archives/2002/08/20/how_liberal_is_too_liberal&quot;&gt;staggering number&lt;/a&gt; &lt;a href=&quot;http://diveintomark.org/archives/2003/01/22/parse_at_all_costs&quot;&gt;of malformed and invalid feeds&lt;/a&gt; out there, a good parser is essential (I won&amp;#8217;t get further, right this moment, into the delicious irony of having to parse &lt;span class=&quot;caps&quot;&gt;XML&lt;/span&gt; formats using regular expressions).&lt;/p&gt;

&lt;p&gt;On the other hand, Ruby also has (at least) two relatively comprehensive and decent libraries for feed parsing, &lt;a href=&quot;http://rubyforge.org/projects/feedtools/&quot;&gt;FeedTools&lt;/a&gt; and &lt;a href=&quot;http://rubyforge.org/projects/syndication/&quot;&gt;Syndication&lt;/a&gt;. Unfortunately, in my experience neither library is quite up there with the Universal Feed Parser yet, and neither seems &lt;a href=&quot;http://rubyforge.org/tracker/index.php?func=detail&amp;#38;aid=4843&amp;#38;group_id=775&amp;#38;atid=3061&quot;&gt;particularly active&lt;/a&gt; recently.&lt;/p&gt;

&lt;p&gt;In the end, underdog &lt;span class=&quot;caps&quot;&gt;PHP&lt;/span&gt; won out on this project on purely practical points: since my &lt;a href=&quot;http://drupal.org/project/timeline&quot;&gt;Timeline module&lt;/a&gt; for &lt;a href=&quot;http://drupal.org/&quot;&gt;Drupal&lt;/a&gt; is written in &lt;span class=&quot;caps&quot;&gt;PHP&lt;/span&gt;, it makes sense to try and reuse code both ways between the &lt;strong&gt; &lt;a href=&quot;http://timeline.to/&quot;&gt;timeline.to&lt;/a&gt; &lt;/strong&gt; service and Drupal.&lt;/p&gt;

&lt;p&gt;Investigating the current best way to parse both Atom and &lt;span class=&quot;caps&quot;&gt;RSS&lt;/span&gt; feeds with &lt;span class=&quot;caps&quot;&gt;PHP&lt;/span&gt;, I learned of a new feed parser library for &lt;span class=&quot;caps&quot;&gt;PHP&lt;/span&gt; called &lt;a href=&quot;http://simplepie.org/&quot;&gt;SimplePie&lt;/a&gt;, which has been gaining a lot of momentum lately (indeed, it seems to be on track for eventually surpassing &lt;a href=&quot;http://magpierss.sourceforge.net/&quot;&gt;MagpieRSS&lt;/a&gt; as the de facto &lt;span class=&quot;caps&quot;&gt;RSS&lt;/span&gt; parser for &lt;span class=&quot;caps&quot;&gt;PHP&lt;/span&gt;). The SimplePie developers are apparently in the process of porting the 3000+ unit tests from Pilgrim&amp;#8217;s parser, which certainly seems a promising prospect for creating a truly robust parser.&lt;/p&gt;

&lt;p&gt;SimplePie is also bundled with the &lt;a href=&quot;http://drupal.org/project/feedparser&quot;&gt;Feedparser&lt;/a&gt; Drupal module, so again, it all just makes sense. The library turned out to be quite painless to work with, and has, so far, been able to parse all the feeds I&amp;#8217;ve thrown at it. (I did have to disable SimplePie&amp;#8217;s ad-removal feature, as that was eating up the entry descriptions on some Atom feeds.)&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://ar.to/contact&quot;&gt;Feedback&lt;/a&gt; on the &lt;strong&gt; &lt;a href=&quot;http://timeline.to/&quot;&gt;timeline.to&lt;/a&gt; &lt;/strong&gt; service is welcome. If there&amp;#8217;s sufficient interest, I will consider adding further functionality such as &lt;a href=&quot;http://en.wikipedia.org/wiki/ICalendar&quot;&gt;iCalendar&lt;/a&gt; support, and perhaps &lt;a href=&quot;http://en.wikipedia.org/wiki/Mashup_%28web_application_hybrid%29&quot;&gt;mashup&lt;/a&gt; features allowing multiple feeds and data sources to be combined into a single timeline display.&lt;/p&gt;</description>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2006-12-29T17:37:02Z</dc:date>
    <dc:creator>Arto</dc:creator>
    <dc:subject>atom</dc:subject>
    <dc:subject>Drupal</dc:subject>
    <dc:subject>JSON</dc:subject>
    <dc:subject>mashups</dc:subject>
    <dc:subject>projects</dc:subject>
    <dc:subject>RSS</dc:subject>
    <dc:subject>services</dc:subject>
    <dc:subject>SimplePie</dc:subject>
    <dc:subject>timeline</dc:subject>
    <cc:license rdf:resource="http://creativecommons.org/publicdomain/zero/1.0/"/>
  </item>
</rdf:RDF>
