<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://hecker.org">
  <title type="text">Frank Hecker - Blosxom</title>
  <subtitle type="text">Notes on the Blosxom blogging system and my contributions to it</subtitle>
  
  <link rel="alternate" type="text/html" hreflang="en" href="http://hecker.org/blosxom/" />
  <id>tag:hecker.org,2004:/blosxom</id>
  <generator uri="http://www.blosxom.com/" version="2.0">Blosxom</generator>
  <rights>Copyright 2004-2006 Frank Hecker, http://www.hecker.org/</rights>
  
  
  <updated>2006-07-23T16:37:00Z</updated>
<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/hecker-blosxom" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="hecker-blosxom" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry>
    <id>tag:hecker.org,2006:/blosxom/extensionless</id>
    <link rel="alternate" type="text/html" href="http://hecker.org/blosxom/extensionless" />

    <title type="text">Extensionless URIs for Blosxom entries</title>
    <published>2006-07-23T16:37:00Z</published>
    <updated>2006-07-23T16:37:00Z</updated>
    <category term="blosxom" />
    <author>
      <name>Frank Hecker</name>
      <uri>http://hecker.org</uri>
    </author>
    <content type="xhtml" xml:base="http://hecker.org" xml:lang="en">
<div xmlns="http://www.w3.org/1999/xhtml"><p>This post introduces the Blosxom plugin
"<a href="http://hecker.org/blosxom/plugins/extensionless">extensionless</a>". Ever since reading
the essay "<a href="http://www.w3.org/Provider/Style/URI">Cool URIs don't
change</a>" I've wanted to change
my web site to conform to some of its recommendations, including the
recommendation to omit file extensions (e.g., ".html") on
URIs. Unfortunately standard Blosxom requires that a file extension be
present in a URI for an individual entry (e.g.,
<code>http://www.example.com/foo.html</code>) to distinguish it from a URI for a
category (e.g., <code>http://www.example.com/foo</code>). How to fix this?</p>

<p>In my ignorance I originally thought it would be necessary to <a href="http://groups.yahoo.com/group/blosxom/message/8600" title="Blosxom 2 patch to support extensionless URIs for entries">patch
Blosxom itself</a> to recognize extensionless URIs for
entries. However I hadn't fully grasped the power of <a href="http://www.blosxom.com/documentation/users/plugins.html">Blosxom
plugins</a>, nor had I had a good look at the <a href="http://www.blosxom.com/plugins/">Blosxom plugin
registry</a>, where I would have found not <a href="http://www.blosxom.com/plugins/link/cooluri.htm" title="cooluri plugin">one</a> but <a href="http://www.blosxom.com/plugins/link/cooluri2.htm" title="cooluri2 plugin">two</a>
plugins to implement a "cool URI" scheme.</p>

<p>Unfortunately those plugins are overkill for my own personal needs,
since I'm not interested in doing date-based permalinks as implemented
by those plugins. As a result I decided to implement my own plugin to
provide the more limited functionality that I wanted.</p>

<p>(I refrained from naming the plugin "cooluri3" or something similar
since it really doesn't implement the full date-based "cool URI"
scheme recommended by the W3C and implemented by the <code>cooluri</code> and
<code>cooluri2</code> plugins; at best it implements "semi-cool" URIs.)</p>

<p>See the plugin code itself for the full documentation. In most cases
you should just be able to copy the plugin into your Blosxom plugin
directory. Note that you should not have to change any other plugins;
the <code>extensionless</code> plugin will fix up an entry's URI internally so
that it has the proper file extension for its flavour as expected by
Blosxom and Blosxom plugins.</p>
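
<p>As a rough sketch of that internal fix-up (an illustration only, not the
plugin's actual code; the helper name <code>add_flavour_extension</code> and the
<code>$is_entry</code> callback are hypothetical), the idea is to append the
flavour as a file extension whenever an extensionless path names an entry
rather than a category:</p>

<pre><code>use strict;
use warnings;

# Hypothetical helper: return the path with a flavour extension appended
# when the extensionless path names an entry rather than a category.
sub add_flavour_extension {
    my ($path, $flavour, $is_entry) = @_;
    return $path if $path =~ m{/$};           # trailing slash: a category
    return $path if $path =~ m{\.[^./]+$};    # already has an extension
    return $is_entry->($path) ? "$path.$flavour" : $path;
}

# Stand-in for a real check such as a file test in the data directory.
my %entries = ('/blosxom/extensionless' => 1);
my $is_entry = sub { exists $entries{ $_[0] } };

print add_flavour_extension('/blosxom/extensionless', 'html', $is_entry), "\n";
print add_flavour_extension('/blosxom', 'html', $is_entry), "\n";
</code></pre>

<p>In a real plugin the callback would presumably be a file test against the
Blosxom data directory; the hash lookup here only keeps the sketch
self-contained.</p>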

<p>If you encounter problems with the plugin (or if you just
use it and like it) please <a href="mailto:hecker@hecker.org">send me email</a>.</p>

<p>UPDATE: Thanks go to Stu MacKenzie for providing a patch to allow the
plugin to work for extensionless URIs where the entry name starts with
a digit.</p>

<p>UPDATE 2: The original version of the <code>extensionless</code> plugin would not
work with Blosxom versions 2.0.1 and higher. I fixed the plugin to
correct this problem; current versions of the <code>extensionless</code> plugin
should work with all Blosxom versions.</p>
</div>
    </content>
  </entry>
<entry>
    <id>tag:hecker.org,2006:/blosxom/feedback</id>
    <link rel="alternate" type="text/html" href="http://hecker.org/blosxom/feedback" />

    <title type="text">The feedback plugin, an alternative to writeback</title>
    <published>2006-07-20T05:41:00Z</published>
    <updated>2006-07-20T05:41:00Z</updated>
    <category term="blosxom" />
    <author>
      <name>Frank Hecker</name>
      <uri>http://hecker.org</uri>
    </author>
    <content type="xhtml" xml:base="http://hecker.org" xml:lang="en">
<div xmlns="http://www.w3.org/1999/xhtml"><p>When I originally put up my blog one of the major things lacking was
support for comments and TrackBacks. After looking at the various
alternatives (the <a href="http://blosxom.ookee.com/blosxom/plugins/v2/writeback-v20030918-zip">writeback
plugin</a>,
the <a href="http://fletcher.freeshell.org/wiki/WritebackplusPlugin">writebackplus
plugin</a>, and
so on) I decided to embark on a complete rewrite of the writeback
plugin in order to support my particular requirements for a comments
system. After much struggle I created an initial version of my
<a href="http://hecker.org/blosxom/plugins/feedback">feedback plugin</a> for publication and use
on my site; since that time I've upgraded the plugin and incorporated
bug fixes suggested by various people.</p>

<h2>Features</h2>

<p>I did a pretty much complete rewrite of writeback and the various
writeback derivatives, both because they didn't support particular
features of interest to me and also because I didn't understand
exactly why they did certain things the way they did. Here are the key
things I wanted, all of which I managed to implement in one form or
another:</p>

<ul>
<li>no need to use a special writeback flavour</li>
<li>different formatting for comments vs. TrackBacks</li>
<li>correct formatting of basic plain-text comments (in particular, if
you enter paragraphs separated by blank lines then the resulting
comment should actually display as multiple paragraphs, without
requiring you to use HTML tags)</li>
<li>comment previewing</li>
<li>comment and TrackBack moderation</li>
<li><del>basic</del> spam blacklist checking using Akismet</li>
<li>basic security measures against code injection attacks</li>
<li><a href="http://daringfireball.net/projects/markdown/">Markdown</a> support (although I'm not using it right now)</li>
<li>clean enough code structure to allow easy addition of new features</li>
<li>minimal dependence on other plugins</li>
</ul>

<h2>Non-features</h2>

<p>There were a number of other features that I didn't care about and
decided to leave out:</p>

<ul>
<li><p><a href="http://en.wikipedia.org/wiki/Captcha">Captcha</a> support. I omitted this in favor of a combination of
Akismet spam checking plus (optional) moderation; unlike captchas
this protects against TrackBack spam, not just comment spam.</p></li>
<li><p>HTML tags in comments (even just a subset). I think Markdown is a
better approach for those who want links and more complicated
formatting.</p></li>
<li><p>Comment threading (e.g., as implemented in the <a href="http://blosxom.ookee.com/blosxom/plugins/v2/comments-v0i6-zip">comments
plugin</a>). This just wasn't that important to me.</p></li>
<li><p>A writeback compatibility mode (i.e., the ability to use the
feedback plugin as a drop-in replacement for the writeback plugin,
using legacy writeback templates and variables). The plugin could
probably be enhanced to support this, but I'll leave it to someone
else to do this.</p></li>
<li><p>Correct display of comment and TrackBack counts (e.g., displaying "1
comment" vs. "2 comments"). I left this as a future task.</p></li>
<li><p>A generalized API for extending features through feedback-specific
plugins, e.g., <a href="http://lathi.net/twiki-bin/view/Main/BlosxomWriteback?skin=print">as implemented by Doug Alcorn</a> for
writeback. While an interesting concept, I felt this was a bit
heavyweight for what I wanted to do.</p></li>
</ul>

<p>There were also several features that I initially implemented and then
later pulled out in the interest of simplicity:</p>

<ul>
<li><p>Ability to display comments and TrackBacks on index and archive
pages. (Right now the plugin displays comments and TrackBacks only
on individual story pages.) I've seen some Blosxom-based blogs that
do this, but it wasn't something I was interested in. (For anyone
who wants to do this, it's a trivial patch.)</p></li>
<li><p>Support for "comments_head" and "comments_foot" templates to
provide additional content to be displayed before all comments and
after all comments (with analogous "trackbacks_head" and
"trackbacks_foot" templates for TrackBacks). In the end I found I
could use variable interpolation (as supported by the
<a href="http://blosxom.ookee.com/blosxom/plugins/v2/interpolate_fancy-v20030909">interpolate_fancy plugin</a>) in the story and/or foot templates to
achieve the look I wanted.</p></li>
<li><p>Support for a separate "preview template" used for previewed
comments; in the end I just reused the comment template for this.</p></li>
<li><p>Support for HTML email for moderation and notification messages. I
decided that plain text email worked fine and was arguably better
from a security point of view.</p></li>
</ul>

<h2>Implementation</h2>

<p>Finally, here are some implementation details that might be of interest to
people using or (especially) writing writeback-like plugins:</p>

<ul>
<li><p>The feedback plugin contains all the plugin-specific default
templates it needs; you should only have to modify your
existing story and foot templates to reference the plugin variables
(as described in the documentation at the bottom of the plugin).</p></li>
<li><p>I used the same basic file structure to store comments and
TrackBacks as is used by writeback and friends. (In fact, I think
you may be able to use existing writeback files with this plugin,
but I haven't tested this.)</p></li>
<li><p>I moved processing of submitted comments from the <code>start</code> subroutine
into the <code>story</code> subroutine. This seemed to simplify the
implementation, especially when it came to deciding whether or not
comments or TrackBacks should be closed after a certain time. (To do
this properly you need the modification date/time of the story,
which you don't have until the <code>date</code> subroutine runs, right before
the <code>story</code> subroutine.)</p></li>
<li><p>Comment previewing was pretty straightforward to implement: It's
basically a matter of keying off the particular submit button
clicked, formatting a special "preview" comment separate from the
main comments, and then pre-filling the comment form fields with the
previewed field values.</p></li>
<li><p>Moderation was also pretty straightforward (after a couple of false
starts): Write the moderated comment or TrackBack not to the main
feedback file for the story, but to a separate file whose name
contains a randomly-generated 8-character alphanumeric string. To
support approval or rejection of the comment or TrackBack, create
<code>moderate=approve</code> and <code>moderate=reject</code> URLs referencing that
string as a query parameter (e.g., <code>feedback=mi3g9qcl4</code>) and send
those URLs to the moderator as part of an email message notifying
them of the comment/TrackBack. When the moderator clicks on the
appropriate URL and the plugin processes the GET request, it will
either append the temporary file to the main feedback file (if the
request was approved) or just delete it (if the request was
rejected). The use of a random string minimizes the possibility of
unauthorized persons trying to approve their own requests. (This
doesn't protect against eavesdropping attacks, of course; doing that
would require some use of cryptography.)</p></li>
<li><p>The plugin seems to support non-ASCII posts (e.g., in Japanese,
etc.), except in the notification/moderation email messages. I
suspect this is just an artifact of the particular version of
Perl I'm using; I certainly didn't do any work to support anything
but 7-bit ASCII, and it may be that some things are still broken
even in my case.</p></li>
</ul>
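
<p>The moderation scheme described above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the plugin's code: the
helper names <code>random_token</code> and <code>moderation_urls</code> are
hypothetical, the example story URL is made up, and <code>rand</code> is not
a cryptographically strong source of randomness.</p>

<pre><code>use strict;
use warnings;

# Hypothetical helper: build a random lowercase alphanumeric token.
# rand() is not cryptographically strong, so treat this as a sketch only.
sub random_token {
    my ($length) = @_;
    my @chars = ('a' .. 'z', '0' .. '9');
    return join '', map { $chars[ int rand @chars ] } 1 .. $length;
}

# Hypothetical helper: the approve and reject URLs sent to the moderator.
sub moderation_urls {
    my ($story_url, $token) = @_;
    my @urls;
    for my $action (qw(approve reject)) {
        push @urls, "$story_url?moderate=$action&amp;feedback=$token";
    }
    return @urls;
}

my $token = random_token(8);
print "$_\n" for moderation_urls('http://www.example.com/blog/foo', $token);
</code></pre>

<p>The same token would then name the temporary file holding the pending
comment or TrackBack, so that a later GET request can locate it and either
append it to the main feedback file or delete it.</p>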

<p>Anyway, I hope this may be of interest to some people. As always,
please feel free to re-use the code or ideas as you wish. See the
<a href="http://hecker.org/blosxom/plugins/feedback">plugin</a> itself for the full documentation on how to configure it.
If you encounter problems with the plugin (or if you just use it and
like it) please <a href="mailto:hecker@hecker.org">send me email</a>.</p>

<p>UPDATE: Added mention of Akismet support. Thanks go to Kevin
Scaldeferri for the Akismet support (adapted from his version of
writebackplus) and to Keith Carangelo, Gustaf Erikson, Matthijs
Kooijman, and Michael Lamertz for various bug fixes and related
suggestions for improvement.</p>
</div>
    </content>
  </entry>
<entry>
    <id>tag:hecker.org,2005:/blosxom/atomfeed-utc-patch</id>
    <link rel="alternate" type="text/html" href="http://hecker.org/blosxom/atomfeed-utc-patch" />

    <title type="text">Patch for atomfeed plugin (UTC dates)</title>
    <published>2005-02-20T08:06:00Z</published>
    <updated>2005-02-20T08:06:00Z</updated>
    <category term="blosxom" />
    <author>
      <name>Frank Hecker</name>
      <uri>http://hecker.org</uri>
    </author>
    <content type="xhtml" xml:base="http://hecker.org" xml:lang="en">
<div xmlns="http://www.w3.org/1999/xhtml"><p>I recently experienced a strange problem with the Atom feed on my
weblog. My weblog server is running on U.S. Eastern time as the basic
time zone, but the story dates in the Atom feed should be expressed in
UTC/GMT; the atomfeed plugin has code that supposedly should do any
necessary conversions. On my local test blog (running under OS X 10.3
using Perl 5.8.1) this worked fine, but on my real blog (running on
Red Hat Enterprise Linux 3 using Perl 5.8.0) the dates in the Atom
feed were incorrect; they were five hours earlier than what they
should be, suggesting that they didn't get converted to UTC/GMT.
After some investigation this turned out to be due to non-portable
code in the <a href="http://www.blosxom.com/downloads/plugins/atomfeed">atomfeed
plugin</a>.</p>

<p>More specifically, the atomfeed plugin attempts to convert the
modification time of each entry file (expressed as a Unix time, i.e.,
in seconds since the epoch) into a UTC date by calling the subroutine
<code>blosxom::nice_date</code>, which in turn uses the <code>ctime</code> function defined
by <code>Time::localtime</code>. Since <code>ctime</code> normally converts Unix time into a
local time (i.e., using the time zone currently in effect), the
original atomfeed plugin attempts to coerce <code>ctime</code> into returning a
UTC/GMT time by setting the <code>TZ</code> environment variable to the value
'GMT' prior to calling <code>nice_date</code>, and then restoring <code>TZ</code> to its
original value afterwards.</p>

<p>Unfortunately, as noted in <a href="http://www.perl.com/doc/manual/html/pod/perlport.html">"Writing Portable Perl"</a>, this
won't necessarily work on all systems:</p>

<blockquote>
  <p>The system's notion of time of day and calendar date is controlled
  in widely different ways. Don't assume the timezone is stored in
  $ENV{TZ}, and even if it is, don't assume that you can control the
  timezone through that variable.</p>
</blockquote>

<p>I appear to have hit one of the cases where this doesn't work.</p>

<p>In any case the solution was simple: I just changed the atomfeed
plugin to convert the entry's modification time using the built-in
Perl function <code>gmtime</code>, which converts a Unix time into a time
expressed as UTC/GMT. Like the <code>nice_date</code> subroutine the <code>gmtime</code>
function returns an array; the returned values require a bit more
reformatting than those from <code>nice_date</code>, but nothing that
complicated.</p>

<p>For the exact code changes see the <a href="http://hecker.org/blosxom/plugins/atomfeed-utc.patch">atomfeed UTC patch</a> itself.</p>
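
<p>The patch remains the reference for the actual change. As a rough
illustration of the approach (a sketch only; the helper name
<code>utc_timestamp</code> is hypothetical and the ISO 8601 output format is
an assumption), formatting a Unix time with <code>gmtime</code> avoids any
dependence on the <code>TZ</code> environment variable:</p>

<pre><code>use strict;
use warnings;

# Format a Unix time as a UTC timestamp without touching $ENV{TZ}.
sub utc_timestamp {
    my ($unixtime) = @_;
    # gmtime returns the year minus 1900 and a zero-based month.
    my ($sec, $min, $hour, $mday, $mon, $year) = gmtime $unixtime;
    return sprintf '%04d-%02d-%02dT%02d:%02d:%02dZ',
        $year + 1900, $mon + 1, $mday, $hour, $min, $sec;
}

print utc_timestamp(0), "\n";    # prints 1970-01-01T00:00:00Z
</code></pre>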
</div>
    </content>
  </entry>
<entry>
    <id>tag:hecker.org,2005:/blosxom/seemore-full-text-feed-patch</id>
    <link rel="alternate" type="text/html" href="http://hecker.org/blosxom/seemore-full-text-feed-patch" />

    <title type="text">Patch seemore plugin for full text feeds</title>
    <published>2005-01-18T09:50:00Z</published>
    <updated>2005-01-18T09:50:00Z</updated>
    <category term="blosxom" />
    <author>
      <name>Frank Hecker</name>
      <uri>http://hecker.org</uri>
    </author>
    <content type="xhtml" xml:base="http://hecker.org" xml:lang="en">
<div xmlns="http://www.w3.org/1999/xhtml"><p>I use the <a href="http://molelog.molehill.org/blox/Computers/Internet/Web/Blosxom/SeeMore/">seemore
plugin</a>
by <a href="http://molelog.molehill.org/blox/Meta/about-me.html">Todd Larason</a>
to show only excerpts of entries on my main blog page, index pages for
categories, and archive pages, while displaying the entire article on
an individual entry's page. It's worked well, with one exception: When
I created my <a href="http://hecker.org/site/feeds">RSS and Atom feeds</a> I wanted the feeds to
contain the full text of all entries, for the convenience of people
using news readers. (Many of these applications display article text
directly in the reader, removing the need to open a browser window to
read the article.)</p>

<p>To do this I made a minor <a href="http://hecker.org/blosxom/plugins/seemore-v0i3-full-text-feed.patch">patch to the seemore plugin</a>, which
I thought others might find of interest as well. The patch essentially
bypasses seemore processing for selected Blosxom flavours (in my case,
the 'rss' and 'atom' flavours).</p>
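
<p>In outline, the bypass amounts to a flavour check before any excerpting is
done. The following is a minimal sketch of that idea, not the patch itself;
the names <code>%full_text_flavour</code> and <code>wants_excerpt</code> are
hypothetical.</p>

<pre><code>use strict;
use warnings;

# Flavours that should always receive the full entry text (an assumption
# for this sketch; the real patch may configure this differently).
my %full_text_flavour = map { $_ => 1 } qw(rss atom);

# Hypothetical helper: true when the entry should be cut down to an excerpt.
sub wants_excerpt {
    my ($flavour, $is_single_entry_page) = @_;
    return 0 if $full_text_flavour{$flavour};    # feeds get everything
    return $is_single_entry_page ? 0 : 1;        # excerpts on index pages only
}

print wants_excerpt('html', 0), "\n";    # prints 1
print wants_excerpt('atom', 0), "\n";    # prints 0
</code></pre>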
</div>
    </content>
  </entry>
<entry>
    <id>tag:hecker.org,2005:/blosxom/entries_cache_meta-meta-values-patch</id>
    <link rel="alternate" type="text/html" href="http://hecker.org/blosxom/entries_cache_meta-meta-values-patch" />

    <title type="text">Patch for entries_cache_meta plugin (meta values)</title>
    <published>2005-01-17T23:57:00Z</published>
    <updated>2005-01-17T23:57:00Z</updated>
    <category term="blosxom" />
    <author>
      <name>Frank Hecker</name>
      <uri>http://hecker.org</uri>
    </author>
    <content type="xhtml" xml:base="http://hecker.org" xml:lang="en">
<div xmlns="http://www.w3.org/1999/xhtml"><p>I've been using the <a href="http://ahab.com/lint/entries_cache_meta.html">entries_cache_meta
plugin</a> by <a href="http://ahab.com/">Jason
Thaxter</a>, mainly for the convenience of specifying
the modification date within the entry file. After a while I decided
I'd like to also use its "meta" capability, i.e., the ability to
specify arbitrary variables in the entry header along with the
modification time, e.g.,</p>

<pre><code>The entry title
meta-mtime: 2005/01/17 12:18:00
meta-foo: Whatever you want

The entry text begins here...
</code></pre>

<p>and then reference the variables as, e.g., <code>$meta::foo</code> within the
story template (as is possible with Rael Dornfest's
original <a href="http://www.blosxom.com/downloads/plugins/meta">meta
plugin</a>). Unfortunately,
I couldn't get this to work at all.</p>

<p>After a bit of debugging I managed to find out what the problem was:
The plugin code for the <code>story</code> subroutine attempts to take the cached
meta values for the current story and stuff them into the "meta"
namespace, so that they can be accessed as, e.g.,
<code>$meta::foo</code>. However the code doesn't correctly access the cached set
of meta values for the entry; it uses as a cache key the value of the
variable <code>$filename</code> as passed to the <code>story</code> subroutine, but this
variable is simply the basename of the entry file (e.g., "foo"). What
it should be using is the full absolute pathname of the entry file,
e.g., <code>/blosxom/data/abc/foo.txt</code>.</p>

<p>The fix is very simple: use the standard Blosxom variables
<code>$blosxom::datadir</code> and <code>$blosxom::file_extension</code> and the variable
<code>$path</code> (also passed to the <code>story</code> subroutine) along with <code>$filename</code>
to recreate the entry file's absolute pathname. For full details see
the one-line <a href="http://hecker.org/blosxom/plugins/entries_cache_meta-v0i6-meta-values.patch">patch</a> itself.</p>
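
<p>To make the shape of the fix concrete, here is a small sketch of rebuilding
the cache key. It is an illustration rather than the patch itself: the helper
name <code>entry_pathname</code> is hypothetical, the values are passed in as
arguments instead of being read from the Blosxom variables, and it assumes
that <code>$path</code> carries a leading slash (or is empty for entries at
the top level).</p>

<pre><code>use strict;
use warnings;

# Hypothetical helper: rebuild the absolute pathname of an entry file from
# the pieces available inside the story subroutine.
sub entry_pathname {
    my ($datadir, $path, $filename, $file_extension) = @_;
    return "$datadir$path/$filename.$file_extension";
}

# prints /blosxom/data/abc/foo.txt
print entry_pathname('/blosxom/data', '/abc', 'foo', 'txt'), "\n";
</code></pre>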

<p>A final note: The documentation for the entries_cache_meta plugin
claims as a special feature that</p>

<blockquote>
  <p>By combining the meta-tag functionality with the entries cache, it
  becomes possible to write or use plugins that access meta-values
  outside the <code>story</code> hook. For example, you could use this plugin to
  write another one showing the most recent entry for each author
  defined in meta tags.</p>
</blockquote>

<p>The documentation doesn't expand on how this would be possible, and in
my confusion I was thinking that this was done using <code>$meta::foo</code>
variables; this is not correct. The problem is that entry-specific
meta variables have a unique meaning and value only in the context of
a single entry. For example, one entry file might have a line
<code>meta-foo: abc</code> and another file a line <code>meta-foo: xyz</code>; hence
<code>$meta::foo</code> would have the value 'abc' in the context of the first
entry (e.g., when processing a story template for that entry) and the
value 'xyz' in the context of the second entry. Outside the context of
those two entries (e.g., when processing a head template) it doesn't
make sense to refer to <code>$meta::foo</code>.</p>

<p>Now having said that, with the entries_cache_meta plugin it is in
fact possible to make use of cached meta values outside the context of
an entry, since the variable <code>%entries_cache_meta::cache</code> is populated
as soon as the <code>entries</code> subroutine is run. For example, a plugin
could loop over all the entries to be displayed and determine how many
entries were by a particular author, as determined by a "meta-author"
field in the entry files:</p>

<pre><code>my %num_by;                   # number of entries for each author
for my $entry (keys %entries_cache_meta::cache) {
    my $author = $entries_cache_meta::cache{$entry}{'author'};
    $num_by{$author}++ if $author;    # skip entries with no meta-author
}
</code></pre>

<p>The results could then be used to customize the head section of the
page (e.g., to identify the most prolific author).</p>
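
<p>For instance, picking the most prolific author out of such a set of counts
might look like the sketch below. The sample data and the helper name
<code>most_prolific</code> are hypothetical; only <code>reduce</code> from
the core <code>List::Util</code> module is assumed.</p>

<pre><code>use strict;
use warnings;
use List::Util qw(reduce);

# Sample counts standing in for a hash built from the entries cache.
my %num_by = (alice => 3, bob => 5, carol => 1);

# Hypothetical helper: the author with the most entries. Ties go to the
# alphabetically first name because the keys are sorted beforehand.
sub most_prolific {
    my (%count_for) = @_;
    return reduce { $count_for{$a} >= $count_for{$b} ? $a : $b }
        sort keys %count_for;
}

print most_prolific(%num_by), "\n";    # prints bob
</code></pre>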
</div>
    </content>
  </entry>
<entry>
    <id>tag:hecker.org,2005:/blosxom/slashredir</id>
    <link rel="alternate" type="text/html" href="http://hecker.org/blosxom/slashredir" />

    <title type="text">Enforcing proper use of trailing slashes</title>
    <published>2005-01-11T10:50:00Z</published>
    <updated>2005-01-11T10:50:00Z</updated>
    <category term="blosxom" />
    <author>
      <name>Frank Hecker</name>
      <uri>http://hecker.org</uri>
    </author>
    <content type="xhtml" xml:base="http://hecker.org" xml:lang="en">
<div xmlns="http://www.w3.org/1999/xhtml"><p>I've previously blogged about my <a href="http://hecker.org/blosxom/canonicaluri">canonicaluri
plugin</a> that checks to see whether the
requested URI is in the canonical form for the type of page being
requested, and if necessary does a browser redirect to the canonical
form of the URI. However the canonicaluri plugin may be overkill for
some people; for example, it presumes use of the <a href="http://hecker.org/blosxom/extensionless">extensionless
plugin</a>, so that canonical URIs for individual
entries do not have file extensions for the default flavour. A
simpler alternative to the canonicaluri plugin is the <a href="http://hecker.org/blosxom/plugins/slashredir">slashredir
plugin</a>, which only enforces proper usage
regarding trailing slashes.</p>

<p>In particular, the slashredir plugin enforces the following rules:</p>

<ul>
<li><p>URIs for individual entry pages should not have a trailing
slash.</p></li>
<li><p>URIs for all other pages (including the blog root, category index
pages, and archive index pages) should have a trailing slash.</p></li>
</ul>

<p>For example, if you request the URI</p>

<pre><code>http://www.example.com/blog/foo
</code></pre>

<p>where "foo" is a category, this plugin will force a redirect to the
canonical URI</p>

<pre><code>http://www.example.com/blog/foo/
</code></pre>

<p>Similarly, if you request the URI</p>

<pre><code>http://www.example.com/blog/foo.html/
</code></pre>

<p>where "foo.html" is an individual entry, this plugin will force a
redirect to the canonical URI</p>

<pre><code>http://www.example.com/blog/foo.html
</code></pre>

<p>Note that this plugin depends on having URI rewriting rules in place
(e.g., in the Apache configuration file) to enforce the restriction
that a URI should never have more than one trailing slash. The plugin
as presently written can't handle this case properly (even though the
code attempts to do so) because it depends on the <code>path_info()</code>
function to get the URI path, and the <code>path_info()</code> value has already
been stripped of any excess trailing slashes that might have been
present in the original URI.</p>
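
<p>The two rules can be sketched as a small function. This is an illustration
only, not the plugin's code: the helper names <code>canonical_path</code> and
<code>redirect_target</code> are hypothetical, and the caller is assumed to
know already whether the path names an individual entry.</p>

<pre><code>use strict;
use warnings;

# Hypothetical helper: apply the trailing-slash rules to a URI path.
sub canonical_path {
    my ($path, $is_entry) = @_;
    (my $canonical = $path) =~ s{/+$}{};    # drop any trailing slashes
    $canonical .= '/' unless $is_entry;     # root, categories, archives
    return $canonical;
}

# Hypothetical helper: the path to redirect to, or undef if none is needed.
sub redirect_target {
    my ($path, $is_entry) = @_;
    my $canonical = canonical_path($path, $is_entry);
    return $canonical eq $path ? undef : $canonical;
}

print canonical_path('/blog/foo', 0), "\n";          # prints /blog/foo/
print canonical_path('/blog/foo.html/', 1), "\n";    # prints /blog/foo.html
</code></pre>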

<p>See the <a href="http://hecker.org/blosxom/plugins/slashredir">plugin code</a> itself for the full documentation. If you
encounter problems with the plugin (or if you just use it and like it)
please <a href="mailto:hecker@hecker.org">send me email</a>.</p>
</div>
    </content>
  </entry>
<entry>
    <id>tag:hecker.org,2005:/blosxom/atomfeed-modified-patch</id>
    <link rel="alternate" type="text/html" href="http://hecker.org/blosxom/atomfeed-modified-patch" />

    <title type="text">Patch for atomfeed plugin ("modified" element for feed)</title>
    <published>2005-01-09T13:55:00Z</published>
    <updated>2005-01-09T13:55:00Z</updated>
    <category term="blosxom" />
    <author>
      <name>Frank Hecker</name>
      <uri>http://hecker.org</uri>
    </author>
    <content type="xhtml" xml:base="http://hecker.org" xml:lang="en">
<div xmlns="http://www.w3.org/1999/xhtml"><p>The "official" <a href="http://www.blosxom.com/downloads/plugins/atomfeed">atomfeed
plugin</a> does not
generate valid feeds for the current version (0.3) of the <a href="http://www.atomenabled.org/developers/syndication/atom-format-spec.php">Atom
specification</a>
because the output does not have a "modified" element for the feed as
a whole, just "modified" elements for each story. Obviously the
modification date/time for the feed can be interpreted as the
modification date/time of the most recent story, so then it's just a
matter of generating the proper output for the <code>MODIFIED</code> tags.</p>
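
<p>Computing that feed-level value is straightforward. As a minimal sketch
(the helper name <code>feed_modified</code> and the sample modification times
are hypothetical, and this is not the code from either patch):</p>

<pre><code>use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: the feed's modification time is taken to be the
# latest modification time among the stories it contains.
sub feed_modified {
    my (%mtime_of) = @_;
    return max(values %mtime_of);
}

# Sample story modification times, in seconds since the epoch.
my %mtime_of = (
    '/blosxom/first'  => 1105278180,
    '/blosxom/second' => 1105278900,
);

print feed_modified(%mtime_of), "\n";    # prints 1105278900
</code></pre>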

<p><a href="http://jclark.org/">Jason Clark</a> already <a href="http://jclark.org/weblog/WebDev/ThisSite/atomic.html">looked at this</a> and created a <a href="http://jclark.org/download/plugins/atomfeed">patched
version of the atomfeed plugin</a>. However his patch
requires the use of the <a href="http://www.cobblers.net/files/lastmodified">lastmodified plugin</a>. While I'm using a
<a href="http://hecker.org/blosxom/lastmodified2" title="The lastmodified2 plugin">rewritten version of the lastmodified plugin</a>, I don't
want to depend on it being present in order to get Atom feeds to work
properly.</p>

<p>Prior to discovering Jason Clark's atomfeed patch I had already
patched atomfeed myself. My <a href="http://hecker.org/blosxom/plugins/atomfeed-modified.patch">atomfeed patch</a> has the advantage of
not requiring a separate plugin. However the downside is that I had to
generate the "modified" element for the feed in the foot section after
the "entry" elements for the stories themselves, right before the
<code>FEED</code> end tag. Jason's patch generates the "modified" element for the
feed in the head section, which I think is more aesthetically pleasing
if nothing else.</p>

<p>Putting the "modified" element for the feed at the end after the
"entry" elements doesn't appear to affect the validity of the feed
output; it validates fine according to <a href="http://feedvalidator.org/">feedvalidator.org</a>. However
I don't know if this might cause problems for any Atom-aware feed
readers out there.  (The only ones I've tested with are <a href="http://www.newsfirerss.com/">NewsFire</a>
and <a href="http://www.ranchero.com/netnewswire/">NetNewsWire</a> for OS X; both seem to work fine.)</p>
</div>
    </content>
  </entry>
<entry>
    <id>tag:hecker.org,2005:/blosxom/lastmodified2</id>
    <link rel="alternate" type="text/html" href="http://hecker.org/blosxom/lastmodified2" />

    <title type="text">The lastmodified2 plugin</title>
    <published>2005-01-09T13:43:00Z</published>
    <updated>2005-01-09T13:43:00Z</updated>
    <category term="blosxom" />
    <author>
      <name>Frank Hecker</name>
      <uri>http://hecker.org</uri>
    </author>
    <content type="html" xml:base="http://hecker.org" xml:lang="en">
&lt;p&gt;In a previous post I discussed the general problem of &lt;a href="http://hecker.org/blosxom/validating-and-caching"&gt;validating and
caching dynamic content&lt;/a&gt;. In order to
implement the strategy outlined in that post I decided to create a new
version of the &lt;a href="http://www.cobblers.net/files/lastmodified"&gt;lastmodified
plugin&lt;/a&gt; originally created
by &lt;a href="http://www.cobblers.net/blog"&gt;Bob Schumaker&lt;/a&gt;. The lastmodified
plugin was a good base to build on; however it didn't do exactly what
I wanted to do, and hence I couldn't resist trying to improve on it.&lt;/p&gt;

&lt;p&gt;The following material documents the &lt;a href="http://hecker.org/blosxom/plugins/lastmodified2"&gt;lastmodified2
plugin&lt;/a&gt; that I created, including my
notes on how I implemented page validation according to my
interpretation of the &lt;a href="http://www.ietf.org/rfc/rfc2616.txt"&gt;HTTP 1.1
specification&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;The strategy revisited&lt;/h2&gt;

&lt;p&gt;As you may recall, in my &lt;a href="http://hecker.org/blosxom/validating-and-caching"&gt;previous post&lt;/a&gt; I outlined an overall
strategy for how to support validating and caching dynamic
content. Here's a recap of that strategy, with additional detail added
on the subject of validation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;When sending responses to requests, add a &lt;code&gt;Content-length&lt;/code&gt; header to
identify the total number of bytes in the response.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When sending responses to requests, add an &lt;code&gt;ETag&lt;/code&gt; header to identify
the "version number" (entity tag) for this particular version of the
page, and/or a &lt;code&gt;Last-Modified&lt;/code&gt; header to identify the date/time the
page was last modified. These are computed as follows, depending on
whether weak or strong validation is used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For weak validation, the &lt;code&gt;Last-modified&lt;/code&gt; header should reflect the
date/time modified of the most recently-updated
"semantically-significant" component of the page. (For example, for
Blosxom we consider entries to be semantically significant, but not
flavour templates.) The &lt;code&gt;ETag&lt;/code&gt; header can then supply a weak etag
directly derived from this date/time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For strong validation, the &lt;code&gt;ETag&lt;/code&gt; header should change if even a
single bit on a page changes; for example, it could be derived from
the MD5 or SHA-1 digest of the page. A &lt;code&gt;Last-modified&lt;/code&gt; header value
could then be determined by consulting a cached copy of the &lt;code&gt;ETag&lt;/code&gt; and
&lt;code&gt;Last-modified&lt;/code&gt; values for the URI; if there is a cache match then the
&lt;code&gt;Last-modified&lt;/code&gt; value can be taken from the cache, otherwise it can be
arbitrarily assigned to be a date in the recent past.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When sending responses to requests, also add &lt;code&gt;Cache-control&lt;/code&gt; and
&lt;code&gt;Expires&lt;/code&gt; headers to the response to provide a "use by" date/time to
clients doing caching.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When processing requests, look for the &lt;code&gt;If-none-match&lt;/code&gt;
and &lt;code&gt;If-modified-since&lt;/code&gt; headers. If one or both are present, return the
full page in the response only if necessary: if the version of the
page currently available is different from the version requested in
the &lt;code&gt;If-none-match&lt;/code&gt; header, or if the page has been modified since the
date in the &lt;code&gt;If-modified-since&lt;/code&gt; header.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Implementation overview&lt;/h2&gt;

&lt;p&gt;This section and the next describe in more depth how I implemented the
above strategy.&lt;/p&gt;

&lt;p&gt;First, the plugin is designed to have its behavior easily modifiable
using configurable variables, as is done with other Blosxom
plugins. In particular, it is possible to specify whether the plugin
should do strong or weak validation (&lt;code&gt;&amp;#036;strong&lt;/code&gt; boolean variable) and
whether it should generate an &lt;code&gt;ETag&lt;/code&gt; header, &lt;code&gt;Last-modified&lt;/code&gt; header,
or both (&lt;code&gt;&amp;#036;generate_etag&lt;/code&gt; and &lt;code&gt;&amp;#036;generate_mod&lt;/code&gt; boolean variables). By
default the plugin is configured to be a "plug-compatible" replacement
for the lastmodified plugin, doing weak validation and generating both
&lt;code&gt;ETag&lt;/code&gt; and &lt;code&gt;Last-modified&lt;/code&gt; headers.&lt;/p&gt;

&lt;p&gt;The basic plan of the lastmodified2 plugin is as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;start&lt;/code&gt; subroutine: Read in the cached information containing the
previous &lt;code&gt;ETag&lt;/code&gt; and &lt;code&gt;Last-modified&lt;/code&gt; values for this URI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;filter&lt;/code&gt; subroutine: Get the information necessary for weak
validation by traversing the list of entries to be displayed on the
page and determining the most recent date/time at which any of those
entries was modified. Use this last-modified date/time to create a weak
&lt;code&gt;ETag&lt;/code&gt; value.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;skip&lt;/code&gt; subroutine: For weak validation interpret any &lt;code&gt;If-none-match&lt;/code&gt;
or &lt;code&gt;If-modified-since&lt;/code&gt; headers and determine whether or not we need to
send a full response. If not we can skip the actual story processing
after setting &lt;code&gt;Status&lt;/code&gt; to 304 (Not Modified) and generating any other
headers appropriate for a 304 response.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;last&lt;/code&gt; subroutine: For strong validation generate an MD5 digest of
the page and use this to create the &lt;code&gt;ETag&lt;/code&gt; value. Create the
&lt;code&gt;Last-modified&lt;/code&gt; header by using the cached &lt;code&gt;Last-modified&lt;/code&gt; value if
the new &lt;code&gt;ETag&lt;/code&gt; value matches the cached &lt;code&gt;ETag&lt;/code&gt; value, otherwise
assigning a new &lt;code&gt;Last-modified&lt;/code&gt; value in the very recent past. Then
interpret any &lt;code&gt;If-none-match&lt;/code&gt; or &lt;code&gt;If-modified-since&lt;/code&gt; headers in the
request and determine whether or not we need to send a full
response. In either case send the appropriate headers.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that there is also a &lt;code&gt;story&lt;/code&gt; subroutine in the lastmodified2
plugin, but its purpose is restricted to setting output variables
(e.g., for use in flavour templates) for compatibility with the
lastmodified plugin. It does not affect the actual caching and
validation processes.&lt;/p&gt;

&lt;h2&gt;Implementation details&lt;/h2&gt;

&lt;p&gt;Like the lastmodified plugin, this version of the plugin looks for and
acts upon the &lt;code&gt;If-modified-since&lt;/code&gt; header itself, instead of letting
the underlying web server deal with it. Note that the 1.3 and 2.0
versions of Apache in common use today have a feature whereby the
underlying web server will handle &lt;code&gt;If-modified-since&lt;/code&gt; checks as long
as the CGI script simply sets the &lt;code&gt;Last-modified&lt;/code&gt; header; this can be
used to easily implement simple validation. (Previous versions of this
plugin relied on this feature.)&lt;/p&gt;

&lt;p&gt;The plugin also looks for and acts upon the &lt;code&gt;If-none-match&lt;/code&gt;
header. (Apache does not do this for CGI scripts, so the plugin has no
choice but to do it itself.) Note that for weak validation
(&lt;code&gt;&amp;#036;generate_etag&lt;/code&gt; set to 1 but &lt;code&gt;&amp;#036;strong&lt;/code&gt; set to 0) we generate entity
tag values using the date/time modified of the most recent entry, so
both the &lt;code&gt;If-modified-since&lt;/code&gt; check and the &lt;code&gt;If-none-match&lt;/code&gt; check can
be done as soon as we compute the &lt;code&gt;Last-modified&lt;/code&gt; value, which is done
in the &lt;code&gt;filter&lt;/code&gt; subroutine. This allows the plugin to save processing
time by skipping story processing (i.e., using the &lt;code&gt;skip&lt;/code&gt; subroutine)
when it does not need to return a full response to a conditional GET
with &lt;code&gt;If-modified-since&lt;/code&gt; or &lt;code&gt;If-none-match&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For strong validation using &lt;code&gt;ETag&lt;/code&gt; (&lt;code&gt;&amp;#036;generate_etag&lt;/code&gt; and &lt;code&gt;&amp;#036;strong&lt;/code&gt;
both set to 1) the &lt;code&gt;ETag&lt;/code&gt; value is computed as an MD5 digest of the
entire page as it will be returned to the user, in order to
distinguish changes that affect even a single bit of the page. We
can't skip story processing in this case since we need the complete
output (including the results of interpolating variables) in order to
compute the correct MD5 digest.&lt;/p&gt;

&lt;p&gt;For strong validation using &lt;code&gt;Last-modified&lt;/code&gt; (&lt;code&gt;&amp;#036;generate_mod&lt;/code&gt; and
&lt;code&gt;&amp;#036;strong&lt;/code&gt; both set to 1) we also compute an MD5 digest of the entire
page as it will be returned to the user, and we compare that value
against a cached MD5 digest computed for the page on previous
requests. If they match then we know that no changes have occurred
since the previous requests, and we set the &lt;code&gt;Last-modified&lt;/code&gt; value to
the value cached with the MD5 digest. Otherwise we know that some
change has occurred since the time of the previous requests, but do
not know exactly when that change occurred; we therefore arbitrarily
set the &lt;code&gt;Last-modified&lt;/code&gt; value to a time just prior to the time of the
current request. Note that we can't skip story processing in this case
either, since again we need the complete output (including the results
of interpolating variables) in order to compute the correct MD5
digest.&lt;/p&gt;
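
&lt;p&gt;The digest-and-cache logic described above can be sketched roughly as follows. This is an illustrative Python sketch, not the plugin's actual Perl code: the function name &lt;code&gt;strong_validators&lt;/code&gt;, the in-memory &lt;code&gt;cache&lt;/code&gt; dictionary keyed by URI, and the &lt;code&gt;backdate_seconds&lt;/code&gt; parameter are all assumptions made for the example.&lt;/p&gt;

```python
import hashlib


def strong_validators(page, uri, cache, now, backdate_seconds=5):
    """Return (etag, last_modified) for a fully rendered page.

    page is the response body as bytes, cache maps a URI to the
    (etag, last_modified) pair recorded on an earlier request, and now
    is the current time in seconds since the epoch. All of these names
    are hypothetical and exist only for this sketch.
    """
    # A strong entity tag must change if even one bit of the page changes,
    # so derive it from a digest of the complete output.
    etag = '"' + hashlib.md5(page).hexdigest() + '"'

    cached = cache.get(uri)
    if cached is not None and cached[0] == etag:
        # Same digest as before: the page is unchanged, so reuse the
        # Last-modified time recorded when this version was first seen.
        last_modified = cached[1]
    else:
        # The page changed at some unknown earlier moment; pick a time
        # just before the current request and remember it.
        last_modified = now - backdate_seconds
        cache[uri] = (etag, last_modified)
    return etag, last_modified
```

&lt;p&gt;Because the digest is taken over the complete output, a function like this can only be called once story processing has finished, which is why strong validation cannot skip that step.&lt;/p&gt;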

&lt;p&gt;Note that for weak validation (&lt;code&gt;&amp;#036;strong&lt;/code&gt; set to 0) the &lt;code&gt;Last-modified&lt;/code&gt;
header does &lt;em&gt;not&lt;/em&gt; necessarily provide the date/time at which the
actual (bit for bit) contents of the page last changed; instead it
provides the date/time at which the &lt;em&gt;meaning&lt;/em&gt; of the page last
changed, i.e., because the contents of at least one entry on the page
were changed. It is possible for other elements on the page such as
headers, footers, or comments to change without changing the meaning
of the page in this sense, so in this case the &lt;code&gt;Last-modified&lt;/code&gt; value
is only a "weak validator" as defined by section 13.3.3 of the HTTP
1.1 specification. When &lt;code&gt;&amp;#036;strong&lt;/code&gt; is set to 0 the entity tag provided
by the &lt;code&gt;ETag&lt;/code&gt; header is derived from the &lt;code&gt;Last-modified&lt;/code&gt; value and
hence is also only a weak validator, and we explicitly mark it as such
by prefixing it with "W/", as described in section 3.11 of the HTTP
1.1 specification.&lt;/p&gt;
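
&lt;p&gt;As a rough illustration of weak validation, the following Python sketch derives both validators from entry modification times alone. It is not the plugin's Perl code; the function name &lt;code&gt;weak_validators&lt;/code&gt; and the exact text placed inside the entity tag are assumptions, and only the "W/" prefix comes from the specification.&lt;/p&gt;

```python
from email.utils import formatdate


def weak_validators(entry_mtimes):
    """Return (etag, last_modified) header values for weak validation.

    entry_mtimes holds the modification times (seconds since the epoch)
    of the entries shown on the page; flavour templates are deliberately
    ignored. The tag format is an assumption made for this sketch.
    """
    latest = int(max(entry_mtimes))
    # HTTP dates are always expressed in GMT.
    last_modified = formatdate(latest, usegmt=True)
    # The W/ prefix marks the entity tag as a weak validator.
    etag = 'W/"' + str(latest) + '"'
    return etag, last_modified
```

&lt;p&gt;Both values depend only on the entries, so they can be computed before any page output is generated.&lt;/p&gt;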

&lt;p&gt;The net effect is that with weak validation if browsers, web caches,
and news aggregators caching the page send a conditional GET request
(i.e., with an &lt;code&gt;If-none-match&lt;/code&gt; and/or &lt;code&gt;If-modified-since&lt;/code&gt; header) to
check the current status of the page, they will be given a brand new
copy of the page only if there have been "semantically significant"
changes to the page (in the words of the HTTP 1.1 specification). With
strong validation they will get a new copy of the page if the page has
changed in any way, no matter how slight.&lt;/p&gt;
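
&lt;p&gt;The decision made for a conditional GET can be sketched along the following lines. This is an illustrative Python sketch under simplifying assumptions, not the plugin's Perl code: the function name &lt;code&gt;not_modified&lt;/code&gt; is hypothetical, request headers are assumed to arrive in a plain dictionary with exactly these key spellings, and entity tags are compared as simple strings.&lt;/p&gt;

```python
from email.utils import parsedate_to_datetime


def not_modified(request_headers, etag=None, last_modified=None):
    """Return True if a 304 (Not Modified) response is appropriate.

    etag is the current entity tag (None if no ETag is generated) and
    last_modified is the current modification time in seconds since the
    epoch (None if no Last-modified value is generated).
    """
    inm = request_headers.get("If-None-Match")
    ims = request_headers.get("If-Modified-Since")
    if inm is None and ims is None:
        return False  # unconditional request: always send the page

    if inm is not None:
        if etag is None:
            return False  # cannot evaluate this test, so send the page
        candidates = [tag.strip() for tag in inm.split(",")]
        if "*" not in candidates and etag not in candidates:
            return False  # the client holds a different version

    if ims is not None:
        if last_modified is None:
            return False  # cannot evaluate this test, so send the page
        try:
            since = parsedate_to_datetime(ims).timestamp()
        except (TypeError, ValueError):
            return False  # unparseable date: send the page
        if last_modified > since:
            return False  # modified after the client's copy was fetched

    return True
```

&lt;p&gt;A 304 is returned only when every condition present in the request can be evaluated and is satisfied; any doubt results in a full response.&lt;/p&gt;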

&lt;p&gt;Generation of the &lt;code&gt;Cache-control&lt;/code&gt; and &lt;code&gt;Expires&lt;/code&gt; headers is relatively
straightforward: We use the value of &lt;code&gt;&amp;#036;freshness_time&lt;/code&gt; directly with
the &lt;code&gt;max-age&lt;/code&gt; directive of the &lt;code&gt;Cache-control&lt;/code&gt; header, and add it to
the current date/time to create a date/time in the future for the
&lt;code&gt;Expires&lt;/code&gt; header. If &lt;code&gt;&amp;#036;freshness_time&lt;/code&gt; is set to 0 then we instead
send the &lt;code&gt;no-cache&lt;/code&gt; directive with the &lt;code&gt;Cache-control&lt;/code&gt; header and set
the &lt;code&gt;Expires&lt;/code&gt; header to a date in the past.&lt;/p&gt;
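
&lt;p&gt;The header generation just described might look like this in outline. This is an illustrative Python sketch, not the plugin's Perl code; the function name &lt;code&gt;cache_headers&lt;/code&gt; and its &lt;code&gt;now&lt;/code&gt; parameter are assumptions, and the past date used when caching is prohibited is taken here to be one minute before the current time.&lt;/p&gt;

```python
from email.utils import formatdate


def cache_headers(freshness_time, now):
    """Return Cache-control and Expires headers for a response.

    freshness_time is the allowed cache lifetime in seconds (0 prohibits
    caching) and now is the current time in seconds since the epoch.
    """
    if freshness_time > 0:
        return {
            "Cache-control": "max-age=" + str(freshness_time),
            "Expires": formatdate(now + freshness_time, usegmt=True),
        }
    # A freshness time of 0 prohibits caching: send no-cache and an
    # Expires date in the past (one minute ago in this sketch).
    return {
        "Cache-control": "no-cache",
        "Expires": formatdate(now - 60, usegmt=True),
    }
```

&lt;p&gt;Sending both headers covers HTTP 1.1 clients, which honor &lt;code&gt;Cache-control&lt;/code&gt;, and HTTP 1.0 clients, which understand only &lt;code&gt;Expires&lt;/code&gt;.&lt;/p&gt;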

&lt;p&gt;Generation of the &lt;code&gt;Content-length&lt;/code&gt; header is also straightforward: We
simply use the length of &lt;code&gt;&amp;#036;blosxom::output&lt;/code&gt;. (This assumes of course
that no other plugin will subsequently be changing that output.) Note
that for HEAD requests Apache will not actually send the output but
&lt;em&gt;will&lt;/em&gt; send the &lt;code&gt;Content-length&lt;/code&gt; header if set; in that case the
&lt;code&gt;Content-length&lt;/code&gt; value reflects the length of the output that would
have been sent for a GET request, in compliance with section 14.13 of
the HTTP 1.1 specification.&lt;/p&gt;

&lt;p&gt;Note that the plugin does not generate the &lt;code&gt;Last-modified&lt;/code&gt; and
&lt;code&gt;Content-length&lt;/code&gt; headers for a 304 (Not Modified) response, in
accordance with section 10.3.5 of the HTTP 1.1 protocol specification.&lt;/p&gt;

&lt;p&gt;Finally, for upward compatibility the lastmodified2 plugin supports
the following features present in the original lastmodified plugin:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Optionally checking the &lt;code&gt;%others&lt;/code&gt; hash for last-modified dates:
Checking &lt;code&gt;%others&lt;/code&gt; is one way to detect changes other than changes in
the entries themselves; in particular it can be used to detect changes
to flavour files used in creating the page. Unfortunately some of the
entries in &lt;code&gt;%others&lt;/code&gt; are not relevant for the page being created
(e.g., flavour files for flavours other than the one currently being
generated) and may cause the &lt;code&gt;Last-modified&lt;/code&gt; time to be computed
incorrectly. Also, checking &lt;code&gt;%others&lt;/code&gt; will not detect page changes due
to interpolating variables into flavour files (e.g., for
comments). Finally, some plugins that replace the default Blosxom
&lt;code&gt;entries&lt;/code&gt; subroutine (including the &lt;a href="http://blosxom.ookee.com/blosxom/plugins/v2/entries_cache_meta-v0i5"&gt;entries_cache_meta plugin&lt;/a&gt;
in particular) do not create the &lt;code&gt;%others&lt;/code&gt; hash at all.&lt;/p&gt;

&lt;p&gt;For the above reasons this feature is deprecated; you should not
use it unless you need it for upward compatibility with your current
lastmodified configuration. If you want to check for changes to a page
outside the entries themselves then you should simply enable strong
validation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Exporting variables with the last-modified time and other times in
RFC 822 and ISO 8601 formats: The lastmodified2 plugin computes these
variables essentially in the same way as the lastmodified plugin; see
the code and the plugin documentation below for more information.&lt;/p&gt;

&lt;p&gt;Note that the variables &lt;code&gt;&amp;#036;latest_rfc822&lt;/code&gt; and &lt;code&gt;&amp;#036;latest_iso8601&lt;/code&gt;
always refer to the date/time modified for the most recently updated
entry, regardless of whether weak or strong validation is being
used. The problem with interpreting &lt;code&gt;&amp;#036;latest_rfc822&lt;/code&gt; or
&lt;code&gt;&amp;#036;latest_iso8601&lt;/code&gt; as a &lt;code&gt;Last-modified&lt;/code&gt; value is that when using strong
validation we wouldn't have values for these variables until we
completed generating output for the page, too late for the variables
to be of any use.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to use the lastmodified2 plugin to replace an existing
configuration of the lastmodified plugin, change the plugin's filename
and Perl package name (i.e., in the &lt;code&gt;package&lt;/code&gt; statement at the
beginning of the code) to "lastmodified" and set the &lt;code&gt;&amp;#036;generate_mod&lt;/code&gt;,
&lt;code&gt;&amp;#036;generate_etag&lt;/code&gt;, and &lt;code&gt;&amp;#036;use_others&lt;/code&gt; configurable variables to match
your current values. All other configurable variables can be left as
is.&lt;/p&gt;

&lt;h2&gt;Description&lt;/h2&gt;

&lt;p&gt;This section and the succeeding ones contain more in-depth
documentation of the lastmodified2 plugin to supplement the material
included in the plugin itself.&lt;/p&gt;

&lt;p&gt;The lastmodified2 plugin enables caching and validation of
dynamically-generated Blosxom pages by web browsers, web proxies, news
aggregators, and other clients by generating various cache-related
HTTP headers in the response and supporting conditional GET requests,
as described below. This can reduce excess network traffic and server
load caused by requests for RSS or Atom feeds or for web pages for
popular entries or categories.&lt;/p&gt;

&lt;p&gt;The plugin generates an &lt;code&gt;ETag&lt;/code&gt; header to identify the particular
version of the page, as well as a &lt;code&gt;Last-modified&lt;/code&gt; header based on the
plugin's determination of when the contents of the page were most
recently modified. The plugin also recognizes and properly acts on an
&lt;code&gt;If-none-match&lt;/code&gt; and/or &lt;code&gt;If-modified-since&lt;/code&gt; header in a request,
enabling a client to check whether the page has changed since it last
requested the page. This reduces network traffic for the site, because
the server can skip returning a copy of the page if in fact it has not
changed.&lt;/p&gt;

&lt;p&gt;The plugin can also optionally generate &lt;code&gt;Cache-control&lt;/code&gt; and/or
&lt;code&gt;Expires&lt;/code&gt; headers to specify how long copies of a page should be
retained by caches. This reduces server load for the site, because web
proxies and other caching clients can use a cached copy of the page
and avoid sending additional requests for the page (including
conditional GET requests) to the site's server for as long as the page
remains fresh. Alternatively you can use the &lt;code&gt;Cache-control&lt;/code&gt; and
&lt;code&gt;Expires&lt;/code&gt; headers to specify that pages should not be cached at all
under any circumstance. This helps ensure that users always get the
most up-to-date content, at the expense of increased server load.&lt;/p&gt;

&lt;p&gt;Finally, the plugin also generates a &lt;code&gt;Content-length&lt;/code&gt; header
containing the length in bytes of the content ("entity body" in HTTP
1.1 jargon). Providing a &lt;code&gt;Content-length&lt;/code&gt; header supports persistent
connections for clients that use the HTTP 1.0 "keep-alive" mechanism
(as documented in section 19.7.1 of &lt;a href="http://www.ietf.org/rfcs/rfc2068.txt"&gt;RFC 2068&lt;/a&gt;); this can reduce the
number of connections to the site in some cases.&lt;/p&gt;

&lt;p&gt;Note that at present this plugin can be used as a replacement for the
lastmodified plugin and its default configuration is essentially
equivalent to that of the lastmodified plugin, as discussed below.&lt;/p&gt;

&lt;h2&gt;Installation and configuration&lt;/h2&gt;

&lt;p&gt;To install the lastmodified2 plugin copy the plugin file into your
Blosxom plugin directory. You should not normally need to rename the
plugin; however see the discussion below.&lt;/p&gt;

&lt;p&gt;Configurable variables specify how the plugin handles validation
(&lt;code&gt;&amp;#036;generate_etag&lt;/code&gt;, &lt;code&gt;&amp;#036;generate_mod&lt;/code&gt;, and &lt;code&gt;&amp;#036;strong&lt;/code&gt;), caching
(&lt;code&gt;&amp;#036;generate_cache&lt;/code&gt;, &lt;code&gt;&amp;#036;generate_expires&lt;/code&gt;, and &lt;code&gt;&amp;#036;freshness_time&lt;/code&gt;),
whether or not to generate any other recommended headers
(&lt;code&gt;&amp;#036;generate_length&lt;/code&gt;), and whether to implement features from the
lastmodified plugin for compatibility (&lt;code&gt;&amp;#036;use_others&lt;/code&gt; and
&lt;code&gt;&amp;#036;export_dates&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;For validation the most common configurations are the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;No validation: &lt;code&gt;&amp;#036;generate_etag&lt;/code&gt; and &lt;code&gt;&amp;#036;generate_mod&lt;/code&gt; both set to 0.
The plugin does not generate &lt;code&gt;ETag&lt;/code&gt; or &lt;code&gt;Last-modified&lt;/code&gt; headers, and
does not check &lt;code&gt;If-none-match&lt;/code&gt; and &lt;code&gt;If-modified-since&lt;/code&gt; headers in the
request. Use this configuration if you plan to allow caching of
responses (as discussed below) but for some reason you don't want to
do validation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Weak validation: &lt;code&gt;&amp;#036;generate_etag&lt;/code&gt; and &lt;code&gt;&amp;#036;generate_mod&lt;/code&gt; both set to 1,
&lt;code&gt;&amp;#036;strong&lt;/code&gt; set to 0. The plugin generates both &lt;code&gt;ETag&lt;/code&gt; and
&lt;code&gt;Last-modified&lt;/code&gt; headers based on the most recent time that any entry
on the page was modified; it checks for &lt;code&gt;If-none-match&lt;/code&gt; and/or
&lt;code&gt;If-modified-since&lt;/code&gt; headers in the request, and sends a 304 (Not
Modified) response with no output when it can do so. Use this
configuration if changes to your pages are only (or at least
primarily) due to changes to the entries themselves. This is the
default configuration, for compatibility with the lastmodified plugin.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Strong validation: &lt;code&gt;&amp;#036;generate_etag&lt;/code&gt;, &lt;code&gt;&amp;#036;generate_mod&lt;/code&gt;, and &lt;code&gt;&amp;#036;strong&lt;/code&gt;
all set to 1. The plugin generates both &lt;code&gt;ETag&lt;/code&gt; and &lt;code&gt;Last-modified&lt;/code&gt;
headers based on the current page's contents and our estimate as to
when the contents were last modified; the plugin checks for
&lt;code&gt;If-none-match&lt;/code&gt; and/or &lt;code&gt;If-modified-since&lt;/code&gt; headers in the request, and
sends a 304 response when it can do so. Use this configuration if your
pages contain comments or other material that is updated more
frequently than the entries themselves.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Strong validation using &lt;code&gt;ETag&lt;/code&gt; only: &lt;code&gt;&amp;#036;generate_etag&lt;/code&gt; and &lt;code&gt;&amp;#036;strong&lt;/code&gt;
set to 1, &lt;code&gt;&amp;#036;generate_mod&lt;/code&gt; set to 0. The plugin generates only an
&lt;code&gt;ETag&lt;/code&gt; header (not &lt;code&gt;Last-modified&lt;/code&gt;) and checks only for an
&lt;code&gt;If-none-match&lt;/code&gt; header in the request (not &lt;code&gt;If-modified-since&lt;/code&gt;).  Use
this configuration if you want to support strong validation but don't
want the performance overhead of caching &lt;code&gt;Last-modified&lt;/code&gt; values as
previously described. Note that this configuration does not support
validation for HTTP 1.0 clients or other clients that do not support
validation using &lt;code&gt;If-none-match&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that if you set &lt;code&gt;&amp;#036;generate_mod&lt;/code&gt; and &lt;code&gt;&amp;#036;strong&lt;/code&gt; to 1 then you
might as well set &lt;code&gt;&amp;#036;generate_etag&lt;/code&gt; to 1 as well, since correctly
using &lt;code&gt;Last-modified&lt;/code&gt; as a strong validator requires that we generate
and cache MD5 digests of the page in order to detect any changes, and
these digests are also what we use to generate &lt;code&gt;ETag&lt;/code&gt; values.&lt;/p&gt;

&lt;p&gt;For caching the most common configurations are the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;No caching: &lt;code&gt;&amp;#036;generate_cache&lt;/code&gt; and &lt;code&gt;&amp;#036;generate_expires&lt;/code&gt; both set to 0.
The plugin does not generate either a &lt;code&gt;Cache-control&lt;/code&gt; or &lt;code&gt;Expires&lt;/code&gt;
header, and thus web proxies and other clients will typically not
cache returned pages. This is the default configuration; use it if you
don't care about caching.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caching allowed: &lt;code&gt;&amp;#036;generate_cache&lt;/code&gt; and &lt;code&gt;&amp;#036;generate_expires&lt;/code&gt; both set
to 1, and &lt;code&gt;&amp;#036;freshness_time&lt;/code&gt; set to a positive integer value. The
plugin generates &lt;code&gt;Cache-control&lt;/code&gt; and &lt;code&gt;Expires&lt;/code&gt; headers that allow for
caching of returned pages for up to &lt;code&gt;&amp;#036;freshness_time&lt;/code&gt; seconds from the
time of the request. Use this configuration if you'd like to allow
caching by proxies and other clients to reduce server hits due to GET
requests (whether conditional or not), and set &lt;code&gt;&amp;#036;freshness_time&lt;/code&gt; to a
value comparable to the frequency with which your site is updated.&lt;/p&gt;

&lt;p&gt;(By default &lt;code&gt;&amp;#036;freshness_time&lt;/code&gt; is set to 3,000 seconds, long enough
to provide some benefit through caching by web proxies, especially
during periods of heavy load, but short enough to ensure that news
aggregators doing hourly polling will always use up-to-date copies of
feeds.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Caching prohibited: &lt;code&gt;&amp;#036;generate_cache&lt;/code&gt; and &lt;code&gt;&amp;#036;generate_expires&lt;/code&gt; both
set to 1, and &lt;code&gt;&amp;#036;freshness_time&lt;/code&gt; set to 0. The plugin generates
&lt;code&gt;Cache-control&lt;/code&gt; and &lt;code&gt;Expires&lt;/code&gt; headers that specifically prohibit
caching of returned pages. Use this configuration if you want all
clients to always see the most up-to-date content.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that if you set &lt;code&gt;&amp;#036;generate_cache&lt;/code&gt; to 1 then you might as well
set &lt;code&gt;&amp;#036;generate_expires&lt;/code&gt; to 1 and vice versa, in order to properly
support both HTTP 1.1 and HTTP 1.0 clients; there is no performance
penalty for doing so.&lt;/p&gt;

&lt;p&gt;The other configurable variables are as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;&amp;#036;generate_length&lt;/code&gt; controls whether or not to generate a
&lt;code&gt;Content-length&lt;/code&gt; header. The default is to generate the header; you
can disable this by setting &lt;code&gt;&amp;#036;generate_length&lt;/code&gt; to 0. Note that support
of HTTP 1.0 persistent connections using &lt;code&gt;Content-length&lt;/code&gt; requires
that your web server be configured to support persistent connections
in the first place; for Apache this is done using the &lt;code&gt;KeepAlive On&lt;/code&gt;
directive in the Apache configuration file.&lt;/p&gt;

&lt;p&gt;Also note that HTTP 1.1 clients can use persistent connections
even if the &lt;code&gt;Content-length&lt;/code&gt; header is not present, if (like Apache)
the underlying web server supports HTTP 1.1 persistent connections for
CGI scripts using the &lt;code&gt;Connection&lt;/code&gt; header and chunked transfer
coding. However we generate a &lt;code&gt;Content-length&lt;/code&gt; header by default
because it's recommended by section 14.13 of the HTTP 1.1
specification.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;&amp;#036;use_others&lt;/code&gt; controls whether changes to flavour files and other
non-entry files in the Blosxom data directory should also be
considered semantically significant for weak validation. Note that
this feature is provided only for compatibility with the lastmodified
plugin and its use is deprecated; by default it is disabled.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;&amp;#036;export_dates&lt;/code&gt; controls whether or not the plugin should set the
following variables for use in flavour templates and other plugins:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;&amp;#036;now_rfc822&lt;/code&gt; and &lt;code&gt;&amp;#036;now_iso8601&lt;/code&gt;: Current date/time, in RFC 822
and ISO 8601 formats respectively. These variables can be used in any
flavour template.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&amp;#036;latest_rfc822&lt;/code&gt; and &lt;code&gt;&amp;#036;latest_iso8601&lt;/code&gt;: Date/time modified of the
most recently modified entry to be displayed on the page, in RFC 822
and ISO 8601 formats respectively. These variables can be used in any
flavour template.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&amp;#036;others_rfc822&lt;/code&gt; and &lt;code&gt;&amp;#036;others_iso8601&lt;/code&gt;: Date/time modified of the
most recently modified non-entry file in the Blosxom data directory,
in RFC 822 and ISO 8601 formats respectively. These variables can be
used in any flavour template, but are set only if &lt;code&gt;&amp;#036;use_others&lt;/code&gt; is set
to 1.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&amp;#036;story_rfc822&lt;/code&gt; and &lt;code&gt;&amp;#036;story_iso8601&lt;/code&gt;: Date/time modified of the
current entry, in RFC 822 and ISO 8601 formats respectively. These
variables can be used in the story and date templates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that the ISO 8601 format produced is the complete date plus
hours, minutes and seconds: &lt;code&gt;YYYY-MM-DDThh:mm:ssTZD&lt;/code&gt; (e.g.,
&lt;code&gt;1997-07-16T19:20:30+01:00&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can set the variable &lt;code&gt;&amp;#036;debug&lt;/code&gt; to 1 or greater to produce
additional information useful in debugging the operation of the
plugin; the debug output is sent to your web server's error log.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This plugin supplies &lt;code&gt;filter&lt;/code&gt;, &lt;code&gt;skip&lt;/code&gt;, &lt;code&gt;story&lt;/code&gt;, and &lt;code&gt;last&lt;/code&gt;
subroutines. It needs to run after any other plugin whose &lt;code&gt;filter&lt;/code&gt;
subroutine changes the list of entries included in the response;
otherwise the &lt;code&gt;Last-modified&lt;/code&gt; date may be computed incorrectly. It
needs to run after any other plugin whose &lt;code&gt;skip&lt;/code&gt; subroutine does
redirection (e.g., the &lt;a href="http://hecker.org/blosxom/canonicaluri"&gt;canonicaluri plugin&lt;/a&gt;) or otherwise
conditionally sets the HTTP status to any value other than
200. Finally, this plugin needs to run after any other plugin whose
&lt;code&gt;last&lt;/code&gt; subroutine changes the output for the page; otherwise the
&lt;code&gt;Content-length&lt;/code&gt; value (and the &lt;code&gt;ETag&lt;/code&gt; and &lt;code&gt;Last-modified&lt;/code&gt; values, if
you are using strong validation) may be computed incorrectly. If you
are encountering problems in any of these regards then you can force
the plugin to run after other plugins by renaming it to, e.g.,
99lastmodified2.&lt;/p&gt;

&lt;h2&gt;Bugs&lt;/h2&gt;

&lt;p&gt;Several of the following items are not in fact bugs, but the behaviors
in question may cause confusion in some cases; hence their inclusion
here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;As discussed above, with weak validation the &lt;code&gt;Last-modified&lt;/code&gt; header
generated may not always reflect the date/time at which the
bit-for-bit contents of the page most recently changed, and the
contents of the page may change without changing the &lt;code&gt;ETag&lt;/code&gt; value. In
particular, if changes are made to flavour files used in generating
the page or comments are added to a page via variable interpolation
(e.g., as done by the writeback plugin and others) then a user will
not necessarily see such changes without forcing a full reload of the
page (i.e., using an unconditional GET request). This should be
considered a feature and not a bug; if you are not comfortable with
this behavior then you should set &lt;code&gt;&amp;#036;strong&lt;/code&gt; to 1 to enable strong
validation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When &lt;code&gt;ETag&lt;/code&gt; generation is enabled and &lt;code&gt;Last-modified&lt;/code&gt; disabled (or
vice versa) and a request includes both an &lt;code&gt;If-none-match&lt;/code&gt; and
&lt;code&gt;If-modified-since&lt;/code&gt; header, the plugin will &lt;em&gt;not&lt;/em&gt; return a 304
response under any circumstances. This is not a bug, but rather
complies with section 13.3.4 of the HTTP 1.1 specification: "An
HTTP/1.1 origin server, upon receiving a conditional request that
includes both a Last-Modified date (e.g., in an If-Modified-Since or
If-Unmodified-Since header field) and one or more entity tags (e.g.,
in an If-Match, If-None-Match, or If-Range header field) as cache
validators, MUST NOT return a response status of 304 (Not Modified)
unless doing so is consistent with all of the conditional header
fields in the request."&lt;/p&gt;

&lt;p&gt;In other words, if a conditional request contains both tests and we
can't perform one of the tests (because we're not generating the
header value used in the test) then we can't return a 304 regardless
of the results of the other test.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the &lt;code&gt;Cache-control&lt;/code&gt; and/or &lt;code&gt;Expires&lt;/code&gt; headers are enabled then a
user requesting to view a page will not necessarily see updates to
that page even if the underlying entries have been changed since the
last time the user viewed the page. This should be considered a
feature and not a bug; if you are not comfortable with this behavior
then you should not enable generation of the &lt;code&gt;Cache-control&lt;/code&gt; and/or
&lt;code&gt;Expires&lt;/code&gt; headers, or you should explicitly prohibit caching by
setting the freshness time to 0.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When using the &lt;code&gt;Expires&lt;/code&gt; header to prohibit caching, for strict
consistency with the HTTP 1.1 specification (section 14.21) the
date/time sent with the &lt;code&gt;Expires&lt;/code&gt; header should be equal to the
date/time sent with the &lt;code&gt;Date&lt;/code&gt; header. However we don't necessarily
know what the exact &lt;code&gt;Date&lt;/code&gt; value is (at least not for Apache, where it
is generated by the server itself), and it's possible that the current
date/time as measured in the plugin itself may be a little bit later
than the time in the &lt;code&gt;Date&lt;/code&gt; header, so instead we set the &lt;code&gt;Expires&lt;/code&gt;
value to be a minute before the current date/time (as measured in the
plugin itself).&lt;/p&gt;

&lt;p&gt;This should produce correct behavior for HTTP 1.0 clients relying
on the &lt;code&gt;Expires&lt;/code&gt; header, per section 10.7 of the HTTP 1.0
specification, as well as for HTTP 1.1 clients in the absence of a
&lt;code&gt;Cache-control&lt;/code&gt; header, per section 14.9.3 of the HTTP 1.1
specification.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As noted previously, if we're doing strong validation using
&lt;code&gt;Last-modified&lt;/code&gt; and we don't have a cached &lt;code&gt;Last-modified&lt;/code&gt; value
then we have to make up one; we arbitrarily set it to 5 seconds
prior to the current time. Since our value for the current time may be
later than that used in the &lt;code&gt;Date&lt;/code&gt; header (as noted in the previous
item), it's possible that the &lt;code&gt;Last-modified&lt;/code&gt; value generated may be
in the future relative to that sent with the &lt;code&gt;Date&lt;/code&gt; header,
especially if the CGI script takes a long time to run (e.g., because
of heavy load). This violates the HTTP 1.1 specification (see section
14.29).&lt;/p&gt;

&lt;p&gt;The probability of this happening could be lessened by setting
an earlier &lt;code&gt;Last-modified&lt;/code&gt; time; however this increases the
possibility of having two updates occur within the &lt;em&gt;n&lt;/em&gt;-second time
window between the ostensible &lt;code&gt;Last-modified&lt;/code&gt; time and the current
time, and there may be race conditions associated with this that could
cause other problems, such as sending a &lt;code&gt;Last-modified&lt;/code&gt; value that's
earlier than one sent previously for the same URI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;With strong validation using &lt;code&gt;Last-modified&lt;/code&gt; it's possible that the
plugin may attempt to update the cache file while another plugin
invocation (resulting from a simultaneous request) may attempt to read
it; more seriously, two plugin invocations may attempt to both update
the validator cache file simultaneously. I've tried to minimize
problems relating to this by having the plugin write out cache data to
a temporary file and then rename it to the real file; if the rename is
an atomic operation then this should eliminate the problem of a plugin
invocation trying to read from a partially-written validator cache
file.&lt;/p&gt;

&lt;p&gt;As for simultaneous updates, presumably the worst that can happen
is that one of the plugin invocations will fail to update the cache
entry for its URI (since its changes will be overwritten by the second
plugin invocation); however this simply means that the plugin won't be
able to send a 304 on a subsequent conditional GET for that URI, and
will then have to update the cache file again.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As noted above, you may experience problems if you install this
plugin with other plugins that set HTTP status in the &lt;code&gt;skip&lt;/code&gt;
subroutine. Blosxom stops executing &lt;code&gt;skip&lt;/code&gt; subroutines as soon as one
returns a true value, so whichever plugin is first in the execution
order will get to set the final HTTP status.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
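
&lt;p&gt;The write-to-a-temporary-file-then-rename approach described above can be sketched as follows. This is only an illustration in Python, not the plugin's actual Perl code: the function name &lt;code&gt;write_cache_atomically&lt;/code&gt; and the JSON cache format are assumptions made for the example. Using &lt;code&gt;tempfile.mkstemp&lt;/code&gt; also gives each invocation a unique temporary filename.&lt;/p&gt;

```python
import json
import os
import tempfile


def write_cache_atomically(cache_path, cache_data):
    """Write `cache_data` (a dict) to `cache_path` without exposing partial files.

    Hypothetical helper: the data is written to a uniquely named temporary
    file in the same directory, which is then renamed over the real file.
    """
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cache-")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(cache_data, tmp)
        # os.replace is an atomic rename on POSIX filesystems, so a reader
        # sees either the old file or the complete new one.
        os.replace(tmp_path, cache_path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

&lt;p&gt;Note that this protects readers from partially-written files only; if two invocations update the cache at the same time, the later rename still wins, as discussed above.&lt;/p&gt;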

&lt;h2&gt;To do&lt;/h2&gt;

&lt;p&gt;Here are some ideas for ways in which the lastmodified2 plugin could
be enhanced and extended:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Support selective use of strong or weak validation depending on the
flavour. For example, weak validation would probably work fine for RSS
and Atom feeds, since they typically contain content only for entries;
however strong validation may be needed for the HTML flavour of
individual entry pages (and, to a lesser extent, HTML index pages) in
order to pick up changes due to comments.&lt;/p&gt;

&lt;p&gt;Note that doing this would be perfectly compatible with the HTTP
1.1 protocol specification, since different flavours correspond to
different URIs; any given URI (or set of URIs) could be either
strongly validated or weakly validated independent of any other URIs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support specifying different freshness times for different types of
content, e.g., for different flavours, for individual entries
vs. entry index pages, and/or for current index pages vs. archive
index pages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Try to make the filename for the validator cache temporary file more
unique to minimize the possibility of name collisions by simultaneous
plugin invocations. (Perhaps use &lt;a href="http://search.cpan.org/~jhi/Time-HiRes-1.66/HiRes.pm"&gt;Time::HiRes&lt;/a&gt; to get subsecond times?)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For completeness, support the case where the &lt;code&gt;If-none-match&lt;/code&gt; header
has the value '*' (which matches any entity). See section 14.26 of the
HTTP 1.1 specification for the desired behavior in this case.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For completeness, support conditional GETs using the &lt;code&gt;If-match&lt;/code&gt;
and/or &lt;code&gt;If-unmodified-since&lt;/code&gt; headers within the plugin itself, in
addition to &lt;code&gt;If-none-match&lt;/code&gt; and/or &lt;code&gt;If-modified-since&lt;/code&gt;. However note
that this doesn't appear to be necessary for Apache, since it appears
to correctly make these checks as long as &lt;code&gt;ETag&lt;/code&gt; and/or
&lt;code&gt;Last-modified&lt;/code&gt; headers are returned by the CGI script.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

    </content>
  </entry>
<entry>
    <id>tag:hecker.org,2005:/blosxom/validating-and-caching</id>
    <link rel="alternate" type="text/html" href="http://hecker.org/blosxom/validating-and-caching" />

    <title type="text">Validating and caching dynamic content</title>
    <published>2005-01-09T05:25:00Z</published>
    <updated>2005-01-09T05:25:00Z</updated>
    <category term="blosxom" />
    <author>
      <name>Frank Hecker</name>
      <uri>http://hecker.org</uri>
    </author>
    <content type="xhtml" xml:base="http://hecker.org" xml:lang="en">
<div xmlns="http://www.w3.org/1999/xhtml"><p>One of the things I enjoy about setting up my own blog with the
<a href="http://www.blosxom.com/">Blosxom software</a> is learning about the deep
details of web protocols and formats that I've never worried about
before. (This might have been the case if I'd used another blogging
system, but the hackable nature of Blosxom inspires, nay, almost
demands it.) Lately I've been educating myself about <a href="http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers" title="HTTP Conditional GET for RSS Hackers">HTTP conditional
GET
requests</a> and <a href="http://www.mnot.net/cache_docs/" title="Caching Tutorial for Web Authors and Webmasters">validation and caching of
dynamically-generated content</a>.</p>

<p>In this post I discuss the subtleties of validating and caching
dynamic content in general, and then in a separate post I tell how I
created the <a href="http://hecker.org/blosxom/lastmodified2">lastmodified2 plugin</a> for
Blosxom, a rewrite of the <a href="http://www.cobblers.net/files/lastmodified">lastmodified
plugin</a>.</p>

<p>I'm writing this really for my own education more than anything else
(under the theory that you don't really understand something until you
can explain it), but others may find it useful as well. My goal is to
explain how the HTTP protocol actually works in this context (as
opposed to just saying "do this" without explaining why) while at the
same time avoiding "protocol geekery" that's irrelevant to the problem
at hand.</p>

<h2>The problem</h2>

<p>Suppose that you have a blog (or other web site) whose content is
generated dynamically in response to incoming requests. (In other
words, you are not using Blosxom in static mode, or another blogging
system like <a href="http://www.movabletype.org/">MovableType</a> that normally generates static pages.) In
practice there are several types of web clients that might access such
a site, of which the following are the most common:</p>

<ul>
<li>web browsers used by humans viewing web pages</li>
<li>web proxy servers supporting browser users by accessing and caching
web pages on their behalf</li>
<li>search engines "spidering" a site: downloading pages, following
links to find more pages, and indexing the results</li>
<li>news aggregators downloading RSS and/or Atom feeds, typically on a
periodic basis</li>
</ul>

<p>If we generate a full dynamic response for each and every request (as
is the case with standard Blosxom, for example) then this produces a
lot of network traffic and server load, some of which we could avoid.
In the blogosphere most of the attention has been on traffic generated
by feed aggregators, but (<a href="http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers" title="HTTP Conditional GET for RSS Hackers">as pointed out by Charles Miller</a>
and others) there's really no need to treat RSS feeds as a special
case, at least initially. (Some people have proposed <a href="http://bobwyman.pubsub.com/main/2004/09/using_rfc3229_w.html" title="Using RFC3229 with Feeds">more advanced
techniques</a> specifically tailored to RSS/Atom feeds, but these
techniques presuppose use of the techniques I describe here.)</p>

<h2>The tools at hand</h2>

<p>We have three goals in dealing with web clients: to minimize network
traffic to and from our site, to minimize server load for our site,
and to deliver up-to-date content to the various users of the site; as
discussed below, these three goals are often in conflict with each
other, but we can usually implement a reasonable trade-off.</p>

<p>We have at least three approaches we can take to help achieve these
goals: We can reduce the need for clients to make multiple network
connections to the site, we can provide ways for clients to validate
whether they need to re-download a page that they've previously
downloaded, and we can provide clients "freshness" information telling
them how long they can keep copies of pages without having to
revalidate them. (There is also a fourth possible approach, namely to
use compression techniques to reduce the size of page data returned to
the client; I hope to discuss this in a future article.)</p>

<h3>Persistent connections</h3>

<p>Reducing the number of needed connections can be done through support
of so-called "persistent connections". The original <a href="http://www.faqs.org/ftp/rfc/rfc1945.html" title="RFC 1945: Hypertext Transfer Protocol -- HTTP/1.0">HTTP 1.0
protocol</a> required the client to open up a new network
connection for each and every request; for an HTML page with lots of
included images this might amount to a dozen or more connections, each
of which the server has to accept and then close after
responding. Support for persistent connections allows a client to open
up one connection and make several requests over that connection,
either one after the other or nearly simultaneously ("pipelining").</p>

<p>A form of persistent connections was introduced as an extension to
HTTP 1.0 and described in section 19.7.1 of <a href="http://www.ietf.org/rfc/rfc2068.txt" title="RFC 2068: Hypertext Transfer Protocol -- HTTP/1.1">RFC 2068</a>, an earlier
version of the HTTP 1.1 specification. Using this scheme a client
sends a <code>Connection: Keep-Alive</code> header in the request to the server,
with the server then keeping the connection open after sending its
response. However the client needs some indication from the server as
to when the response is actually complete; this is provided by a
<code>Content-length</code> header in the response that provides the size in
bytes of the response (more correctly, the "entity-body" part of the
response, after the headers).</p>

<p>Providing a <code>Content-length</code> value requires determining the length of
the output prior to sending it.  This is easy to do for static files
but more difficult for dynamic content (e.g. CGI output), and hence
many content generation tools (including Blosxom and other blogging
systems) do not produce <code>Content-length</code> headers by default.</p>

<p>In HTTP 1.1 a new scheme for persistent connections was introduced; in
this scheme the response can be broken up into multiple "chunks", with
each chunk accompanied by an indication of its size. When Apache is
configured to support persistent connections (using the <a href="http://httpd.apache.org/docs-2.0/mod/core.html#keepalive"><code>KeepAlive
On</code> directive</a>) then it can automatically handle persistent
connections for dynamic content without the need for a
<code>Content-length</code> header.</p>
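
<p>To make the "chunks" idea concrete, here is a minimal sketch in Python of how a body is framed on the wire under the HTTP 1.1 chunked transfer coding. In practice the web server does this for you; the function name <code>encode_chunked</code> is hypothetical and the sketch ignores optional chunk extensions and trailers.</p>

```python
def encode_chunked(pieces):
    """Encode an iterable of byte strings using HTTP 1.1 chunked transfer coding."""
    out = bytearray()
    for piece in pieces:
        if not piece:
            continue  # a zero-length chunk would signal the end of the body
        out += b"%x\r\n" % len(piece)  # chunk size in hexadecimal
        out += piece + b"\r\n"         # chunk data followed by CRLF
    out += b"0\r\n\r\n"                # final zero-length chunk, no trailers
    return bytes(out)
```

<p>Because each chunk announces its own size, the server can start sending output before it knows the total length, and the client still knows exactly where the response ends.</p>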

<h3>Validation</h3>

<p>Validation can be done in the HTTP protocol by using a so-called
"conditional" GET request, which is in turn implemented by using one
or both of two special HTTP headers sent with the request:
<code>If-modified-since</code> and/or <code>If-none-match</code>. (The <code>If-modified-since</code>
header is supported in both HTTP 1.0 and HTTP 1.1, while the
<code>If-none-match</code> header is only in HTTP 1.1. In practice servers can
and should recognize and properly handle either of them.)</p>

<p>For example, using the <code>If-modified-since</code> HTTP header a client can
tell the server, "Give me this page, but only if it's been modified
since 9:08 am on December 17, 2004". This date could represent the
last time the client downloaded the page; alternatively it could
represent the last time the page was actually modified, as identified
in a <code>Last-modified</code> HTTP header returned by the site as part of the
response to a previous page request from that client.</p>

<p>Similarly, using the <code>If-none-match</code> HTTP header a client can tell the
server, "Give me this page, but only if it's different from the version
'foo' I already have". The version is identified using an "entity tag"
(or "etag") assigned by the server to each new version of the page;
the entity tag value is contained in an optional <code>ETag</code> HTTP header
previously returned by the site as part of the response for that page.</p>

<p>Note that some people have suggested using HTTP HEAD requests for
validating pages: Send a HEAD request, check the <code>Last-modified</code> or
<code>ETag</code> value in the response, and then send a GET request
(unconditional) if the page appears to have changed. However this
approach is inferior to using a conditional GET, for at least two
reasons:</p>

<ul>
<li>Using HEAD for page validation requires two HTTP requests and
responses to accomplish the same purpose as a single conditional GET
request and response. This leads to increased network traffic and
longer network latency relative to using conditional GETs.</li>
<li>A response to a HEAD request is supposed to contain the exact same
HTTP headers as would the response to a GET request for the same URI;
the only difference is that a response to a HEAD request doesn't
contain any actual content (i.e., an entity-body). For dynamic content
this often means that to properly satisfy a HEAD request you have to
generate exactly the same content you would for a GET request, only to
discard the content after creating the headers; for example, this is
true if you're generating the <code>Content-length</code> header, and is also
true for certain approaches to generating <code>ETag</code> and <code>Last-modified</code>
values, as discussed below. This leads to increased server load
relative to using conditional GET requests.</li>
</ul>

<p>A site based on dynamic content should be able to properly respond to
HEAD requests (as required by the HTTP specifications), but should
support conditional GET requests as the primary mechanism for page
validation.</p>

<h3>Freshness and caching</h3>

<p>Validation can reduce the network bandwidth used by your site, since
the site does not always need to send back full copies of the pages;
however the clients are still hitting the site, if only to validate
pages, and this still puts a load on the server. To reduce this load
the site can also indicate how long a response should be considered
"fresh"--in other words, how long clients can wait before having to
return to the site to check for a new version of the page.</p>

<p>Freshness tests can be done in the HTTP protocol using one or both of
two special HTTP headers sent with the site's response to a request:
<code>Expires</code> and/or <code>Cache-control</code>. The <code>Expires</code> header is supported
in HTTP 1.0, while the <code>Cache-control</code> header was introduced with HTTP
1.1; sites can and should support both headers.</p>

<p>The <code>Expires</code> header is like the "use by" date on a perishable item in
a grocery store: For example, a site can tell a client, "If you keep a
copy of this page, throw it away after 7:00 pm on December 20, 2004;
if you need a copy after that please check to see if there's a new
version available". The <code>Cache-control</code> header can be used similarly,
except expressing the "use by" date in terms of a time relative to the
time of the request: "Don't keep a copy of this page longer than 12
hours from now".</p>

<p>Regardless of whether the <code>Expires</code> or <code>Cache-control</code> header is used,
the net effect is the same: Clients downloading the page are
instructed to keep a copy of the page for a specified period of time
and reuse it as necessary during that time, and to avoid contacting
the site again to request that page (even with a conditional GET)
until that time period is over.</p>
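
<p>As a rough illustration, the two headers can be derived from a single freshness time. The sketch below is in Python and assumes a hypothetical helper <code>freshness_headers</code>; it uses the <code>max-age</code> directive of <code>Cache-control</code> to express the relative time.</p>

```python
from email.utils import formatdate


def freshness_headers(now, freshness_seconds):
    """Return Expires and Cache-control headers for a response generated at `now`.

    `now` is in seconds since the epoch; `freshness_seconds` is how long
    clients may reuse their copy without revalidating it.
    """
    return {
        # Absolute "use by" date, in the HTTP date format (always GMT).
        "Expires": formatdate(now + freshness_seconds, usegmt=True),
        # The same limit expressed relative to the time of the response.
        "Cache-control": "max-age=%d" % freshness_seconds,
    }
```

<p>For the 12-hour example above one would call <code>freshness_headers(time.time(), 12 * 60 * 60)</code>.</p>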

<h3>The strategy</h3>

<p>Based on the techniques available, a suitable strategy for a
dynamically-generated site is then as follows:</p>

<ul>
<li>If possible, use a web server that supports HTTP 1.1 persistent
connections for CGI output.  In addition, when sending responses to
requests add a <code>Content-length</code> header to identify the total number of
bytes in the response, in order to support HTTP 1.0 persistent
connections.</li>
<li>When sending responses to requests, add an <code>ETag</code> header to
identify the "version number" (entity tag) for this particular
version of the page, and/or a <code>Last-modified</code> header to identify the
date/time the page was last modified.</li>
<li>When sending responses to requests, also add <code>Cache-control</code> and
<code>Expires</code> headers to the response to provide a "use by" date/time to
clients doing caching.</li>
<li>When processing requests, look for the <code>If-none-match</code> and
<code>If-modified-since</code> headers. If one or both are present, return the
full page in the response only if necessary: if the version of the
page currently available is different than the version requested in
the <code>If-none-match</code> header, or if the page has been modified since the
date in the <code>If-modified-since</code> header.</li>
</ul>
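
<p>The last step of this strategy can be sketched as a small decision function. This is an illustrative Python sketch under stated assumptions, not the code of any particular plugin: the name <code>is_not_modified</code> and the shape of its arguments are hypothetical, entity tags are compared as exact strings, and the special <code>If-none-match: *</code> value is not handled. If the request carries both headers, a 304 is returned only when both tests pass.</p>

```python
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers, etag, last_modified):
    """Return True if a 304 (Not Modified) response is appropriate.

    `request_headers` maps lower-cased header names to values. `etag` is the
    current entity tag (quotes included) and `last_modified` a timezone-aware
    datetime; either may be None if the site does not generate that validator.
    """
    inm = request_headers.get("if-none-match")
    ims = request_headers.get("if-modified-since")
    if inm is None and ims is None:
        return False  # unconditional GET: always send the full page
    if inm is not None:
        if etag is None:
            return False  # cannot perform the test, so cannot claim "unchanged"
        if etag not in [tag.strip() for tag in inm.split(",")]:
            return False  # the client has a different version
    if ims is not None:
        if last_modified is None:
            return False
        try:
            if last_modified > parsedate_to_datetime(ims):
                return False  # the page changed after the client's date
        except (TypeError, ValueError):
            return False  # unparseable date: fall back to a full response
    return True
```

<p>When the function returns true, the site sends a 304 status with no body; otherwise it generates and sends the page as usual.</p>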

<h2>What's new?</h2>

<p>The strategy outlined above seems simple enough, but we've glossed
over a crucial and surprisingly difficult question: How do we
determine if and when a page has changed and a new version has been
created?</p>

<p>The creators of the HTTP protocol specification suggested two
different approaches to this question, with two correspondingly
different ways to use the <code>ETag</code> header. (For various reasons too
geeky to go into, <code>ETag</code> is a better example here than
<code>Last-modified</code>.)</p>

<p>The strict approach is to consider a page to be changed if even one
bit on the page changes. For example, in the context of Blosxom if you
made even a single-character correction to a flavour template then any
page using that template would be considered to have changed. When
sending an <code>ETag</code> header for such a page you would then be duty-bound
to update the entity tag value identifying the version for that
page. Under this approach the <code>ETag</code> header is considered to be a
"strong validator" in HTTP jargon.</p>

<p>A looser approach is to consider the page to be changed only if the
essential "meaning" of the page changes, where you as the site author
get to decide what that meaning actually is in this context. For
example, if you are primarily concerned with RSS feeds then you might
decide that the response sent to news aggregators and other clients
should be considered changed only if there were content changes to any
of the entries included in the response. You would then be free to
keep the entity tag value sent in the <code>ETag</code> header the same as long
as the underlying entries didn't change; here you're using the <code>ETag</code>
header as a "weak validator".</p>
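
<p>One way to picture the difference is in how the entity tag is computed. The following Python sketch is only an assumption-laden illustration (the function names and the choice of SHA-256 are mine): the strong tag is derived from the bytes of the rendered page, the weak tag only from the modification times of the underlying entries. The <code>W/</code> prefix is how HTTP 1.1 marks an entity tag as weak.</p>

```python
import hashlib


def strong_etag(page_bytes):
    """Entity tag that changes whenever any byte of the rendered page changes."""
    return '"%s"' % hashlib.sha256(page_bytes).hexdigest()


def weak_etag(entry_mtimes):
    """Entity tag that changes only when an entry is added, removed or modified.

    `entry_mtimes` maps entry paths to modification times (seconds since
    the epoch). Template changes do not affect the result.
    """
    digest = hashlib.sha256()
    for path in sorted(entry_mtimes):
        digest.update(("%s:%d\n" % (path, entry_mtimes[path])).encode("utf-8"))
    return 'W/"%s"' % digest.hexdigest()
```

<p>Note that the strong version requires the full page to be generated before the tag can be computed, while the weak version needs only file metadata.</p>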

<p>Having a strong validator as described above is important in cases
where knowing about bit-level changes is absolutely required. The most
common example of this is downloading large binary files where the
download might be interrupted for some reason and the client wishes to
resume from the point at which it was interrupted (as opposed to
restarting the download from the beginning). For this purpose HTTP
provides a mechanism whereby clients can request a range of bytes for
a resource, so that (for example) a client can tell the server "give
me bytes 737878-1643324 for this resource (I already have the
others)".</p>

<p>In order for this to work properly, when the client goes back to the
server to pick up the rest of the file the client has to know that the
version of the file for which it's getting the new set of bytes is
<em>exactly</em> the same as the version for which it got the first
(interrupted) set of bytes; otherwise it will end up with a corrupted
copy. The <code>ETag</code> header can provide the necessary version information,
but only if it's a strong validator as described above.</p>
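
<p>A minimal sketch of that check, in Python, might look like the following. It assumes the client sends the entity tag it saw earlier along with its range request (HTTP 1.1 provides the <code>If-range</code> header for this); the function name <code>resume_response</code> and its simplified arguments are hypothetical.</p>

```python
def resume_response(if_range_etag, current_etag, first_byte, body):
    """Return (status, payload) for a request resuming a download at `first_byte`.

    The remaining bytes are sent (206 Partial Content) only if the client's
    entity tag is a strong validator identical to the current one; otherwise
    the whole current version is sent (200 OK).
    """
    is_strong = not current_etag.startswith("W/")
    if is_strong and if_range_etag == current_etag:
        return 206, body[first_byte:]
    return 200, body
```

<p>With a weak validator the two versions might differ in their bytes even though the tags match, which is why the sketch refuses to send a partial response in that case.</p>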

<p>However using the etag as a weak validator is arguably a better
approach for a typical blog, both because it better fits the nature of
the content (most people care more about the prose content of the page
than about the exact bytes making up the page) and also because
correctly implementing strong validation for dynamic content can be
more difficult and time-consuming (in terms of server load), at least
for Blosxom.</p>

<p>However implementing weak validation has its own problems as well. In
particular, for Blosxom there are changes which at least for some
would arguably change the "meaning" of a page but which can be
difficult to detect in practice without checking for byte-for-byte
changes; the most important examples are having new comments for an
individual entry's page or a new number of comments for an entry
listing on an index page. In these cases the change is typically
introduced through interpolating variables when processing flavour
templates, and hence you can't use the date/time modified for either
the entry file or the flavour template as a guide to when the change
occurred.</p>

<p>(You could potentially look at the date/time modified for
comment-related files as stored in the Blosxom plugin state directory
or elsewhere, but this would require knowing exactly what plugin is
being used to generate comments, and how it stores comment-related
information. This is one of the drawbacks to Blosxom's minimal
approach to blogging, in which a comments capability isn't a standard
feature of the software but has to be implemented by add-on software,
with different sites using different comments plugins.)</p>

<p>I discuss this and other implementation issues in more detail in my
<a href="http://hecker.org/blosxom/lastmodified2">next post</a>.</p>

<h2>For more information</h2>

<p>Here are some useful reference documents and related material I
consulted while researching the issue of validating and caching
dynamic content in the course of creating the lastmodified2
plugin. The main documents of interest are the following:</p>

<ul>
<li><a href="http://www.mnot.net/cache_docs/">"Caching Tutorial for Web Authors and Webmasters"</a> by <a href="http://www.mnot.net/">Mark
Nottingham</a> is the best introductory document I've found on page
validation and caching.</li>
<li>The <a href="http://www.faqs.org/ftp/rfc/rfc2616.html">HTTP 1.1 protocol specification</a> (RFC 2616, "Hypertext
Transfer Protocol -- HTTP/1.1") is the ultimate authority for how
validation and caching should work. See in particular sections 10.3.5
("304 Not Modified"), 13 ("Caching in HTTP"), 14.9 (<code>Cache-control</code>
header), 14.13 (<code>Content-length</code> header), 14.19 (<code>ETag</code> header), 14.21
(<code>Expires</code> header), 14.25 (<code>If-modified-since</code> header), and 14.26
(<code>If-none-match</code> header).</li>
</ul>

<p>You may also find the following documents of interest:</p>

<ul>
<li><a href="http://fishbowl.pastiche.org/2002/10/21/http_conditional_get_for_rss_hackers">"HTTP Conditional GET for RSS Hackers"</a> by <a href="http://fishbowl.pastiche.org/">Charles
Miller</a> is a basic tutorial on implementing conditional GETs in the
context of a blog. However it lacks an in-depth discussion of strong
vs. weak validation and why the distinction matters.</li>
<li>The post <a href="http://philringnalda.com/blog/2002/10/joels_rss_problem.php">"Joel's RSS Problem"</a> on <a href="http://philringnalda.com/">Phil Ringnalda's
blog</a> is a good example of various views on how to
address the problem of blogs being overloaded by aggregator requests,
including links to related blog posts and articles.</li>
<li><a href="http://www.modperlbook.org/html/ch16_01.html" title="Practical mod_perl, chapter 16: HTTP Headers for Optimal Performance">Chapter 16</a> of the book <cite><a href="http://www.modperlbook.org/">Practical
mod_perl</a></cite> has some good in-depth information on
the issue of validation and caching of dynamic content and strong
vs. weak validators.</li>
<li>Though it's been superseded by RFC 2616, the <a href="http://www.faqs.org/ftp/rfc/rfc2068.html" title="replaced by RFC 2616">original version of
the HTTP 1.1 protocol specification</a> (RFC 2068,
"Hypertext Transfer Protocol -- HTTP/1.1") is worth consulting for its
description (in section 19.7.1) of the HTTP 1.0 "Keep-Alive" extension
for persistent connections.</li>
<li>The original <a href="http://www.faqs.org/ftp/rfc/rfc1945.html" title="RFC 1945: Hypertext Transfer Protocol -- HTTP/1.0">HTTP 1.0 protocol specification</a> (RFC 1945,
"Hypertext Transfer Protocol -- HTTP/1.0") is mainly of historical
interest. (The HTTP 1.1 specification addresses backwards
compatibility for HTTP 1.0 clients.)</li>
</ul>
</div>
    </content>
  </entry>
<entry>
    <id>tag:hecker.org,2005:/blosxom/emptymessage-patch</id>
    <link rel="alternate" type="text/html" href="http://hecker.org/blosxom/emptymessage-patch" />

    <title type="text">Emptymessage patch for Apache compatibility, etc.</title>
    <published>2005-01-08T13:15:00Z</published>
    <updated>2005-01-08T13:15:00Z</updated>
    <category term="blosxom" />
    <author>
      <name>Frank Hecker</name>
      <uri>http://hecker.org</uri>
    </author>
    <content type="xhtml" xml:base="http://hecker.org" xml:lang="en">
<div xmlns="http://www.w3.org/1999/xhtml"><p>When stock Blosxom sees a URL that doesn't correspond to an existing
entry or list of entries, it simply puts up a "normal" page (i.e.,
using the standard head and foot templates for that flavour) that
doesn't have any actual content. I really don't like this behavior,
and thus I decided to try out the <a href="http://fletcher.freeshell.org/downloads/emptymessage">emptymessage
plugin</a> created
by <a href="http://fletcher.freeshell.org/">Fletcher Penney</a>. Unfortunately I
wasn't entirely happy with its behavior either, and so I decided to
patch it.</p>

<p>First, I didn't like the standard "404 Not Found" message produced by
the plugin; it was reminiscent of the standard message produced by
Apache, but different enough that it got on my nerves. My first patch
was therefore to make the 404 message produced by the emptymessage
plugin look <em>exactly</em> like the standard message produced by Apache
2.0, even down to including the same information about the server
version, etc. (You can compare for yourself by trying a <a href="http://hecker.org/foo/">bogus URL</a>
for my blog vs. <a href="http://www.tsc.org/foo/">another bogus URL</a> for a static site hosted on the
same system.)</p>

<p>My second patch wasn't to fix an actual problem but rather to make the
emptymessage plugin work better with another plugin I'm creating that
also wants to set the HTTP <code>Status</code> header in the response. In the
original emptymessage code the HTTP status is set by doing an actual
<code>print</code> operation to output the <code>Status</code> header; unfortunately doing
it this way means that other plugins have no way of detecting that the
status was thus set, in case they want to modify their own behavior.</p>

<p>To address this problem I patched the emptymessage plugin to set the
status header using <code>$blosxom::header-&gt;{'Status'}</code>; other plugins can
then check that variable to determine what the HTTP status will be. At
the same time I also changed the emptymessage plugin to produce a
<code>Content-length</code> header (by setting
<code>$blosxom::header-&gt;{'Content-length'}</code>) for further compatibility with
Apache.</p>

<p>The above patches worked fine in testing (which I do using a
stand-alone Apache server located on my laptop) but when I deployed
the emptymessage plugin to my production site it didn't seem to be
working at all: The returned response for an empty/non-existent page
was exactly the same as if the plugin were not installed.</p>

<p>After doing some debugging I discovered the problem: The data
directory on my production site has a period in the pathname (e.g.,
<code>/usr/local/foo.bar</code>) and the emptymessage plugin was mangling the
pathname in its attempts to take the Blosxom <code>$currentdir</code> variable,
prepend the data directory pathname, and then strip off any flavour
suffix (i.e., after a '.'). I patched this by changing the order of
operations so that the flavour suffix was stripped <em>before</em> prepending
the data directory.</p>
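
<p>The effect of the ordering can be seen in a small sketch. This is Python rather than the plugin's Perl, and both functions are hypothetical illustrations: in particular, <code>entry_path_buggy</code> is only my guess at the kind of mangling involved, not a transcription of the original code.</p>

```python
def entry_path(datadir, currentdir):
    """Build the filesystem path for a request, stripping any flavour suffix.

    The suffix is removed from `currentdir` before `datadir` is prepended,
    so a period in `datadir` is never mistaken for the start of a flavour.
    """
    last_segment = currentdir.rsplit("/", 1)[-1]
    if "." in last_segment:
        currentdir = currentdir[: currentdir.rindex(".")]
    return datadir.rstrip("/") + "/" + currentdir.lstrip("/")


def entry_path_buggy(datadir, currentdir):
    """Prepend first, then strip everything after the first period (illustrative)."""
    full = datadir.rstrip("/") + "/" + currentdir.lstrip("/")
    return full.split(".", 1)[0]
```

<p>With a data directory of <code>/usr/local/foo.bar</code> and a request for <code>blog/entry.html</code>, the first function yields <code>/usr/local/foo.bar/blog/entry</code>, while the second truncates the path to <code>/usr/local/foo</code>.</p>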

<p>For more details on the above changes see the <a href="http://hecker.org/blosxom/plugins/emptymessage-03.patch" title="patch to emptymessage plugin version 0.3">patch itself</a>.</p>

<p>A final note: The emptymessage plugin still doesn't address one
important case, namely when a request is made for a date-based archive
page (e.g., <code>http://www.example.com/blog/2005/01</code>) and there are no
entries for that date. I may look at providing a patch for that later.</p>
</div>
    </content>
  </entry>
</feed>
