<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
 
 <title>Zopatista</title>
 
 <link href="http://www.zopatista.com" />
 <updated>2012-06-14T13:07:02-07:00</updated>
 <id>http://www.zopatista.com</id>
 <author>
   <name>Martijn Pieters</name>
   <email>mj@zopatista.com</email>
 </author>

 
 <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/zopatista" /><feedburner:info uri="zopatista" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry>
   <title>Portlets as ESI include</title>
   <link href="http://feedproxy.google.com/~r/zopatista/~3/bAKY-rxWJek/portlets-as-esi-include" />
   <updated>2012-06-14T00:00:00-07:00</updated>
   <id>http://www.zopatista.com/plone/2012/06/14/portlets-as-esi-include</id>
   <content type="html">
&lt;p&gt;&lt;em&gt;Using &lt;abbr title="Edge Side Includes"&gt;ESI&lt;/abbr&gt; includes to cache Plone portlets separately.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To help a large and busy intranet website perform better, we&amp;rsquo;ve used a light sprinkling of &lt;a href="https://en.wikipedia.org/wiki/Edge_Side_Includes"&gt;&lt;abbr title="Edge Side Includes"&gt;ESI&lt;/abbr&gt;&lt;/a&gt; (via &lt;a href="https://www.varnish-cache.org/trac/wiki/ESIfeatures"&gt;Varnish&amp;rsquo;s &lt;abbr title="Edge Side Includes"&gt;ESI&lt;/abbr&gt; support&lt;/a&gt;) to improve the cacheability of pages in the site. By delegating assembly of parts of the page to the Varnish cache, pages become much more cacheable as frequently changing chunks such as the personal bar at the top are requested separately.&lt;/p&gt;

&lt;h2 id="portlets-via-esi-include"&gt;Portlets via &lt;abbr title="Edge Side Includes"&gt;ESI&lt;/abbr&gt; include&lt;/h2&gt;

&lt;p&gt;One such chunk we separated out is the right-hand portlets column. Varnish has been configured to set a special header so that we can detect that &lt;abbr title="Edge Side Includes"&gt;ESI&lt;/abbr&gt; is supported:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="c"&gt;&lt;span class="n"&gt;sub&lt;/span&gt; &lt;span class="n"&gt;vcl_recv&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="cp"&gt;    # Indicate that a varnish capable of doing &lt;abbr title="Edge Side Includes"&gt;ESI&lt;/abbr&gt; is in front...&lt;/span&gt;
    &lt;span class="n"&gt;set&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;&lt;abbr title="Edge Side Includes"&gt;ESI&lt;/abbr&gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;&amp;quot;esi&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Using this header we can then conditionally swap out the portlets column with an &lt;code&gt;&amp;lt;esi:include&amp;gt;&lt;/code&gt; statement; this makes site development much easier as we do not have to run Varnish just to see the site working. Here is the relevant section from the &lt;code&gt;main_template.pt&lt;/code&gt; file:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="xml"&gt;&lt;span class="nt"&gt;&amp;lt;td&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;portal-column-two&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;metal:define-slot=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;column_two_slot&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;tal:condition=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;sr&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;visualPadding&amp;quot;&lt;/span&gt;
       &lt;span class="na"&gt;tal:define=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;
&lt;span class="s"&gt;           esi_header request/HTTP_X_ESI | nothing;&lt;/span&gt;
&lt;span class="s"&gt;           base context/@@plone_context_state/current_base_url | nothing;&lt;/span&gt;
&lt;span class="s"&gt;           location python:base and base.rstrip(&amp;#39;/&amp;#39;).split(&amp;#39;/&amp;#39;)[-1].lstrip(&amp;#39;@&amp;#39;);&lt;/span&gt;
&lt;span class="s"&gt;           esi python:esi_header and (location not in (&lt;/span&gt;
&lt;span class="s"&gt;               &amp;#39;manage-portlets&amp;#39;, &amp;#39;manage-content-type-portlets&amp;#39;));&lt;/span&gt;
&lt;span class="s"&gt;           queryString request/QUERY_STRING;&lt;/span&gt;
&lt;span class="s"&gt;           queryString python: queryString and &amp;#39;?&amp;#39; + queryString or &amp;#39;&amp;#39;;&lt;/span&gt;
&lt;span class="s"&gt;           &amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;metal:portlets&lt;/span&gt; &lt;span class="na"&gt;define-slot=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;portlets_two_slot&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;esi:include&lt;/span&gt; &lt;span class="na"&gt;tal:condition=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;esi&amp;quot;&lt;/span&gt;
          &lt;span class="na"&gt;tal:attributes=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;src string:${context/absolute_url}/@@right-column${queryString}&amp;quot;&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;tal:noesi&lt;/span&gt; &lt;span class="na"&gt;condition=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;not: esi&amp;quot;&lt;/span&gt;
                 &lt;span class="na"&gt;replace=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;structure provider:plone.rightcolumn&amp;quot;&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/metal:portlets&amp;gt;&lt;/span&gt;
    &lt;span class="ni"&gt;&amp;amp;nbsp;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/td&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Note that we are making sure that &lt;abbr title="Edge Side Includes"&gt;ESI&lt;/abbr&gt; is also not applied when using the portlet management views.&lt;/p&gt;
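
&lt;p&gt;The decision made by those TAL definitions can be restated in plain Python. This is only an illustrative sketch (written for Python 3); the function name &lt;code&gt;use_esi&lt;/code&gt; and the constant &lt;code&gt;MANAGEMENT_VIEWS&lt;/code&gt; are hypothetical and not part of the site&amp;rsquo;s code.&lt;/p&gt;

```python
# Views on which the portlets column must be rendered inline.
MANAGEMENT_VIEWS = ('manage-portlets', 'manage-content-type-portlets')


def use_esi(esi_header, base_url):
    """Mirror the template logic: use ESI only when Varnish announced
    itself and we are not on a portlet management view."""
    # Last path segment of the current URL, without any leading '@@'.
    location = base_url.rstrip('/').split('/')[-1].lstrip('@') if base_url else ''
    return bool(esi_header) and location not in MANAGEMENT_VIEWS
```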

&lt;p&gt;The &lt;code&gt;@@right-column&lt;/code&gt; view is simply a template:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="xml"&gt;&lt;span class="nt"&gt;&amp;lt;html&lt;/span&gt; &lt;span class="na"&gt;tal:omit-tag=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;body&lt;/span&gt; &lt;span class="na"&gt;tal:omit-tag=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;tal:block&lt;/span&gt; &lt;span class="na"&gt;replace=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;structure provider:plone.rightcolumn&amp;quot;&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This whole setup was working swimmingly; we could cache pages for extended periods of time, with things like the portlets updating much more frequently and with caching keyed to specific groups of users.&lt;/p&gt;

&lt;h2 id="where-did-that-portlet-go"&gt;Where did that portlet go?&lt;/h2&gt;

&lt;p&gt;This being a large and complex intranet, it took some time for someone to notice that some lightly-used portlets were no longer showing up. These were portlets that depend on certain content being there, so their absence was not necessarily a problem. However, it was becoming clear that even when their specific conditions were met, they were still not being rendered. This was quickly narrowed down to the &lt;abbr title="Edge Side Includes"&gt;ESI&lt;/abbr&gt;-included portlet rendering; if you bypassed the cache the portlets would show up.&lt;/p&gt;

&lt;p&gt;So what went wrong?&lt;/p&gt;

&lt;p&gt;Portlets are essentially rendered as part of the Zope viewlet framework. Viewlets are snippets of page output that are looked up by a key consisting of the current context, the current request, the current view and the viewlet manager. Portlets have access to these same pieces of information, so you can register portlets that only show for certain contexts (particular content types, marker interfaces, etc.), browser layers (usually themes), and even only for specific views or portlet managers (tying the portlet to the left, right or dashboard portlet wells).&lt;/p&gt;

&lt;p&gt;With the lesser-known &lt;a href="https://github.com/plone/plone.app.portlets/blob/7a6303400b4ecf7595fb21ec9c43b38b31fb8aca/plone/app/portlets/metadirectives.py#L67"&gt;&lt;code&gt;&amp;lt;plone:portletRenderer /&amp;gt;&lt;/code&gt; directive&lt;/a&gt;, you can also vary the way portlets are rendered for the above keys. Thus, a portlet can look different in different themes, different portlet managers, or when a certain extra marker interface is present on your content objects. This is what had happened to the vanished portlets here; they had been tied to specific &lt;em&gt;views&lt;/em&gt;:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="xml"&gt;&lt;span class="nt"&gt;&amp;lt;configure&lt;/span&gt;
    &lt;span class="na"&gt;xmlns=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;http://namespaces.zope.org/zope&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;xmlns:plone=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;http://namespaces.plone.org/plone&amp;quot;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

  &lt;span class="nt"&gt;&amp;lt;plone:portlet&lt;/span&gt;
    &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;foobar.portlets.localcalendar&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;interface=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;.localportlet.ILocalCalendarPortlet&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;assignment=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;.localportlet.Assignment&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;renderer=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;.localportlet.Hidden&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;addview=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;.localportlet.AddForm&amp;quot;&lt;/span&gt;
    &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;

  &lt;span class="c"&gt;&amp;lt;!-- My HQ page --&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;plone:portletRenderer&lt;/span&gt;
    &lt;span class="na"&gt;portlet=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;.localportlet.ILocalCalendarPortlet&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;.localportlet.Renderer&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;view=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;foobar.types.browser.mychain.MyChainView&amp;quot;&lt;/span&gt;
    &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;

  &lt;span class="c"&gt;&amp;lt;!-- My Store page --&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;plone:portletRenderer&lt;/span&gt;
    &lt;span class="na"&gt;portlet=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;.localportlet.ILocalCalendarPortlet&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;.localportlet.Renderer&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;view=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;foobar.types.browser.store.StoreView&amp;quot;&lt;/span&gt;
    &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/configure&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The above &lt;code&gt;plone:portlet&lt;/code&gt; declaration registers a portlet that is hidden by default. The two &lt;code&gt;plone:portletRenderer&lt;/code&gt; declarations then assign new renderers when certain views are being used instead. This neat trick allows for the portlet to be targeted very specifically.&lt;/p&gt;

&lt;p&gt;This all works great, unless you use a dedicated view for &lt;abbr title="Edge Side Includes"&gt;ESI&lt;/abbr&gt; rendering of the portlets. Suddenly the current view is no longer &lt;code&gt;MyChainView&lt;/code&gt; or &lt;code&gt;StoreView&lt;/code&gt;, but rather &lt;code&gt;@@right-column&lt;/code&gt;. Thus the dedicated renderer is skipped in favour of the &lt;code&gt;.localportlet.Hidden&lt;/code&gt; renderer, which does what it says on the tin: not render.&lt;/p&gt;
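
&lt;p&gt;The fallback can be pictured with a deliberately simplified Python sketch. The real lookup goes through the Zope component architecture and matches on interfaces and classes rather than on names; the &lt;code&gt;RENDERERS&lt;/code&gt; dictionary and the &lt;code&gt;lookup_renderer&lt;/code&gt; function below are hypothetical stand-ins that only illustrate why an unknown view ends up with the hidden renderer.&lt;/p&gt;

```python
class Hidden:
    """Stand-in for the default renderer, which renders nothing."""

    def render(self):
        return ''


class Renderer:
    """Stand-in for the real calendar renderer."""

    def render(self):
        return 'calendar'


# Hypothetical registry: view name -> view-specific renderer class.
RENDERERS = {'MyChainView': Renderer, 'StoreView': Renderer}


def lookup_renderer(view_name):
    # Fall back to the hidden default when no view-specific renderer exists.
    return RENDERERS.get(view_name, Hidden)()
```

&lt;p&gt;In this sketch, looking up &lt;code&gt;StoreView&lt;/code&gt; gives the calendar renderer, while any other view name, such as &lt;code&gt;right-column&lt;/code&gt;, gives the hidden one.&lt;/p&gt;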

&lt;h2 id="reconstruct-the-whole-context"&gt;Reconstruct the &lt;em&gt;whole&lt;/em&gt; context&lt;/h2&gt;

&lt;p&gt;The solution is of course to reconstruct the whole context. The &lt;code&gt;@@right-column&lt;/code&gt; view already had most things right; only the current view was wrong. With a simple set of TAL declarations we can set up a new value for the &lt;code&gt;view&lt;/code&gt; variable when rendering the portlets. Here is the reworked &lt;code&gt;main_template.pt&lt;/code&gt; code:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="xml"&gt;&lt;span class="nt"&gt;&amp;lt;td&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;portal-column-two&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;metal:define-slot=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;column_two_slot&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;tal:condition=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;sr&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;visualPadding&amp;quot;&lt;/span&gt;
       &lt;span class="na"&gt;tal:define=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;
&lt;span class="s"&gt;           esi_header request/HTTP_X_ESI | nothing;&lt;/span&gt;
&lt;span class="s"&gt;           base context/@@plone_context_state/current_base_url | nothing;&lt;/span&gt;
&lt;span class="s"&gt;           location python:base and base.rstrip(&amp;#39;/&amp;#39;).split(&amp;#39;/&amp;#39;)[-1].lstrip(&amp;#39;@&amp;#39;);&lt;/span&gt;
&lt;span class="s"&gt;           esi python:esi_header and (location not in (&amp;#39;manage-portlets&amp;#39;, &amp;#39;manage-content-type-portlets&amp;#39;));&lt;/span&gt;
&lt;span class="s"&gt;           viewContext string:?__view_context=${view/__name__};&lt;/span&gt;
&lt;span class="s"&gt;           queryString request/QUERY_STRING;&lt;/span&gt;
&lt;span class="s"&gt;           queryString python: queryString and viewContext + &amp;#39;&amp;amp;amp;&amp;#39; + queryString or viewContext;&lt;/span&gt;
&lt;span class="s"&gt;                   &amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;metal:portlets&lt;/span&gt; &lt;span class="na"&gt;define-slot=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;portlets_two_slot&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;esi:include&lt;/span&gt; &lt;span class="na"&gt;tal:condition=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;esi&amp;quot;&lt;/span&gt;
                   &lt;span class="na"&gt;tal:attributes=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;src string:${context/absolute_url}/@@right-column${queryString}&amp;quot;&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
      &lt;span class="nt"&gt;&amp;lt;tal:noesi&lt;/span&gt; &lt;span class="na"&gt;condition=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;not: esi&amp;quot;&lt;/span&gt;
                 &lt;span class="na"&gt;replace=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;structure provider:plone.rightcolumn&amp;quot;&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/metal:portlets&amp;gt;&lt;/span&gt;
    &lt;span class="ni"&gt;&amp;amp;nbsp;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/td&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;We use a GET parameter to pass along the name of the view to look up; I&amp;rsquo;ve used a double-underscore prefix here to reduce the chances we clash with a query string parameter used elsewhere in the site. The &lt;code&gt;@@right-column&lt;/code&gt; view then restores this view for portlet rendering (with a fallback to the Plone default view context &lt;code&gt;@@plone&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="xml"&gt;&lt;span class="nt"&gt;&amp;lt;html&lt;/span&gt; &lt;span class="na"&gt;tal:omit-tag=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;body&lt;/span&gt; &lt;span class="na"&gt;tal:omit-tag=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;tal:block&lt;/span&gt;
    &lt;span class="na"&gt;define=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;viewname request/__view_context | nothing;&lt;/span&gt;
&lt;span class="s"&gt;            viewname python:viewname and &amp;#39;@@&amp;#39; + viewname or &amp;#39;@@plone&amp;#39;;&lt;/span&gt;
&lt;span class="s"&gt;            view nocall:context/?viewname&amp;quot;&lt;/span&gt;
    &lt;span class="na"&gt;replace=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;structure provider:plone.rightcolumn&amp;quot;&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;/body&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
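
&lt;p&gt;For readers less familiar with TAL, the round trip of the view name can be sketched in Python 3. The function names are hypothetical, and &lt;code&gt;form&lt;/code&gt; is assumed to be a plain dictionary of request parameters. Note that &lt;code&gt;urlencode&lt;/code&gt; re-encodes the values, so the percent-encoding may differ slightly from the plain string concatenation used in the template.&lt;/p&gt;

```python
from urllib.parse import parse_qsl, urlencode


def esi_query_string(view_name, query_string):
    """Build the query string for the @@right-column ESI request."""
    # Put the view name first, then carry over the original parameters.
    params = [('__view_context', view_name)]
    params += parse_qsl(query_string, keep_blank_values=True)
    return '?' + urlencode(params)


def view_name_from_request(form):
    """Return the traversal name of the view to restore for the portlets."""
    name = form.get('__view_context')
    # Fall back to the Plone default view context when nothing was passed.
    return '@@' + name if name else '@@plone'
```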

&lt;p&gt;Et voilà, our portlets are showing up good and proper again.&lt;/p&gt;

</content>
 <feedburner:origLink>http://www.zopatista.com/plone/2012/06/14/portlets-as-esi-include</feedburner:origLink></entry>
 
 <entry>
   <title>Portlets As Esi Include</title>
   <link href="http://feedproxy.google.com/~r/zopatista/~3/gvxCqyhL1tU/portlets-as-esi-include" />
   <updated>2012-06-09T00:00:00-07:00</updated>
   <id>http://www.zopatista.com/plone/2012/06/09/portlets-as-esi-include</id>
   <content type="html">
</content>
 <feedburner:origLink>http://www.zopatista.com/plone/2012/06/09/portlets-as-esi-include</feedburner:origLink></entry>
 
 <entry>
   <title>Unicode in RTF documents</title>
   <link href="http://feedproxy.google.com/~r/zopatista/~3/q8PHvEbLdnE/rtf-and-unicode" />
   <updated>2012-06-06T00:00:00-07:00</updated>
   <id>http://www.zopatista.com/python/2012/06/06/rtf-and-unicode</id>
   <content type="html">
&lt;p&gt;&lt;em&gt;How to encode unicode codepoints in &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; documents using PyRTF.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Some time ago I had to output some nicely formatted reports from a web application, to be usable offline by Windows users. Naturally, I used the aging but still reliable &lt;a href="https://pypi.python.org/pypi/PyRTF"&gt;&lt;code&gt;PyRTF&lt;/code&gt; module&lt;/a&gt; to generate &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; documents with headers, tables, and a consistent style.&lt;/p&gt;

&lt;p&gt;As my application users are mostly Norwegians, however, I quickly discovered that the &lt;code&gt;PyRTF&lt;/code&gt; module does not handle international characters (i.e. anything outside the &lt;abbr title="American Standard Code for Information Interchange"&gt;ASCII&lt;/abbr&gt; codepoints) at all. There is no unicode support whatsoever (it has been on the TODO list since forever), let alone conversion of unicode codepoints to whatever &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; uses to represent international characters.&lt;/p&gt;

&lt;p&gt;Recently, a &lt;a href="http://stackoverflow.com/q/10852810/100297"&gt;Stack Overflow question&lt;/a&gt; reminded me of how I solved this problem at the time, and clearly this question has &lt;a href="http://stackoverflow.com/q/9908647/100297"&gt;come&lt;/a&gt; &lt;a href="https://groups.google.com/forum/?fromgroups#!topic/django-users/gZH1mnBfgoI"&gt;up&lt;/a&gt; &lt;a href="http://osdir.com/ml/web2py/2010-03/msg01045.html"&gt;before&lt;/a&gt;. Because some approaches I&amp;rsquo;ve seen can actually produce incorrect or overly verbose output (including &lt;a href="https://code.google.com/p/pyrtf-ng/"&gt;pyrtf-ng&lt;/a&gt;), I wanted to explain and expand on my solution to provide a definitive answer to the problem, and also see how my original method fared in terms of speed.&lt;/p&gt;

&lt;h2 id="so-does-rtf-handle-unicode"&gt;So &lt;em&gt;does&lt;/em&gt; &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; handle unicode?&lt;/h2&gt;

&lt;p&gt;Since PyRTF doesn&amp;rsquo;t filter the text you add to a document at all, we can just encode unicode strings ourselves. Lucky for me, the &lt;a href="https://en.wikipedia.org/wiki/Rich_Text_Format"&gt;Wikipedia entry on &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt;&lt;/a&gt; has a fairly detailed section on how &lt;a href="https://en.wikipedia.org/wiki/Rich_Text_Format#Character_encoding"&gt;&lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; handles characters outside the &lt;abbr title="American Standard Code for Information Interchange"&gt;ASCII&lt;/abbr&gt; range&lt;/a&gt;. Together with the &lt;a href="http://www.boumphrey.com/rtf/rtfspec.pdf"&gt;published &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; 1.9.1 specification&lt;/a&gt; (PDF) there is plenty of information on how to encode unicode codepoints to &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; control sequences.&lt;/p&gt;

&lt;p&gt;There basically are two choices:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;The &lt;code&gt;\'hh&lt;/code&gt; control sequence; a backslash and single quote, followed by an 8-bit hexadecimal value. The value is interpreted as a code-point in a Windows codepage, limiting its use. You &lt;em&gt;can&lt;/em&gt; assign different codepages to different fonts, but you still cannot use the full range of unicode in a paragraph.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The &lt;code&gt;\uN?&lt;/code&gt; control sequence; backslash &amp;lsquo;u&amp;rsquo; followed by a signed 16-bit integer value in decimal and a placeholder character (represented here by a question mark). The signed 16-bit integer number here is consistent with the &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; standard for control characters, a value between -32768 and 32767.&lt;/p&gt;

    &lt;p&gt;This control sequence &lt;em&gt;can&lt;/em&gt; properly represent unicode, at least for the U+0000 through to U+FFFF codepoints. This sequence was introduced in the 1.5 revision of the &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; spec, in 1997, so it should be widely supported. The placeholder character is meant to be used by readers that do not yet support this escape sequence and should be an &lt;abbr title="American Standard Code for Information Interchange"&gt;ASCII&lt;/abbr&gt; character closest to the unicode codepoint.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
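
&lt;p&gt;To make the signed 16-bit arithmetic of the second option concrete, here is a minimal Python 3 sketch for a single character. The function name &lt;code&gt;rtf_unicode_escape&lt;/code&gt; is hypothetical, the placeholder is always a question mark, and the sketch does not attempt to handle codepoints beyond U+FFFF.&lt;/p&gt;

```python
def rtf_unicode_escape(char):
    """Return the \\uN? control sequence for one character up to U+FFFF."""
    codepoint = ord(char)
    if codepoint > 0xFFFF:
        raise ValueError('only U+0000 through U+FFFF are handled here')
    # The number must fit a signed 16-bit integer, so values from
    # 0x8000 upwards wrap around into the negative range.
    if codepoint >= 0x8000:
        codepoint -= 0x10000
    return '\\u%d?' % codepoint
```

&lt;p&gt;For example, U+8123 is 33059 in decimal, which wraps around to 33059 - 65536 = -32477.&lt;/p&gt;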

&lt;p&gt;The &lt;code&gt;\uN?&lt;/code&gt; format is the easiest to produce, especially if you ignore the replacement character: just set it to &amp;lsquo;?&amp;rsquo; at all times. Surely most &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; readers support the 1.5 &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; standard by now; it&amp;rsquo;s been out there for 15 years.&lt;/p&gt;

&lt;h2 id="encoding-the-slow-and-incorrect-way"&gt;Encoding the slow (and incorrect) way&lt;/h2&gt;

&lt;p&gt;A quick search with Google showed me how &lt;a href="https://code.google.com/p/pyrtf-ng/source/browse/trunk/rtfng/Renderer.py?r=81#506"&gt;pyrtf-ng encodes unicode points&lt;/a&gt;:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="python"&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;writeUnicodeElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;\u&lt;/span&gt;&lt;span class="si"&gt;%s&lt;/span&gt;&lt;span class="s"&gt;?&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Unfortunately, the above snippet does a few things wrong: it uses a control code for &lt;em&gt;every&lt;/em&gt; character in the unicode string, producing output that is at least 5 times as long as the input, and it doesn&amp;rsquo;t produce negative numbers for codepoints over &lt;code&gt;\u7fff&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="pycon"&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;u&amp;#39;CJK Ideograph: &lt;/span&gt;&lt;span class="se"&gt;\u8123&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;\u&lt;/span&gt;&lt;span class="si"&gt;%s&lt;/span&gt;&lt;span class="s"&gt;?&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="go"&gt;u&amp;#39;\\u67?\\u74?\\u75?\\u32?\\u73?\\u100?\\u101?\\u111?\\u103?\\u114?\\u97?\\u112?\\u104?\\u58?\\u32?\\u33059?&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;A recent &lt;a href="http://stackoverflow.com/a/9912561/100297"&gt;Stack Overflow answer&lt;/a&gt; improved on this by only encoding characters over &lt;code&gt;\u007f&lt;/code&gt; (decimal 127) but it still iterates over every character in the string to do so:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="pycon"&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;\u&lt;/span&gt;&lt;span class="si"&gt;%s&lt;/span&gt;&lt;span class="s"&gt;?&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;127&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="go"&gt;u&amp;#39;CJK Ideograph: \\u33059?&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This outputs unicode because codepoints &amp;lt; 128 are left untouched; numbers are not properly converted to signed shorts either. Here is my variation that remedies these things, and dispenses with the &lt;code&gt;str()&lt;/code&gt; call:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="pycon"&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;\u&lt;/span&gt;&lt;span class="si"&gt;%i&lt;/span&gt;&lt;span class="s"&gt;?&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s"&gt;u&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\u8000&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;65536&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\x7f&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\x20&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="s"&gt;u&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;{}&amp;#39;&lt;/span&gt;&lt;span 
class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="go"&gt;&amp;#39;CJK Ideograph: \\u-32477?&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This feels like rather a waste to me, and must be slow as well. I wanted to see how my own solution stacks up against the character-by-character, naive implementation.&lt;/p&gt;
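&lt;p&gt;For reference, the one-liner above can be unrolled into a more readable loop. This is only a sketch, assuming Python 3 and input limited to the &lt;abbr title="Basic Multilingual Plane"&gt;BMP&lt;/abbr&gt;; the name &lt;code&gt;rtf_escape_naive&lt;/code&gt; is made up for illustration:&lt;/p&gt;

```python
def rtf_escape_naive(text):
    """Escape text character by character (sketch; BMP input assumed)."""
    pieces = []
    for char in text:
        codepoint = ord(char)
        # Escape control characters, non-ASCII characters and the
        # RTF special characters backslash, { and }.
        needs_escape = codepoint > 0x7f or 0x20 > codepoint or char in '\\{}'
        if not needs_escape:
            pieces.append(char)
            continue
        # RTF expects a signed 16-bit decimal, so wrap values of
        # 0x8000 and above into the negative range.
        if codepoint >= 0x8000:
            codepoint -= 0x10000
        pieces.append('\\u%d?' % codepoint)
    return ''.join(pieces)
```

&lt;p&gt;Every character goes through the Python-level loop, whether it needs escaping or not, which is where the cost comes from.&lt;/p&gt;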

&lt;h2 id="encoding-the-lazy-way"&gt;Encoding the lazy way&lt;/h2&gt;

&lt;p&gt;While casting around for my own solution, I also looked into the Python &lt;a href="http://docs.python.org/library/codecs.html"&gt;&lt;code&gt;codecs&lt;/code&gt; module&lt;/a&gt; to come up with ideas on how to do this more efficiently. Of course, the codecs provided by that module are all implemented in C, but the &lt;code&gt;unicode_escape&lt;/code&gt; codec did produce output quite close to what I needed for &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt;; codepoints between &lt;code&gt;\u0020&lt;/code&gt; and &lt;code&gt;\u007f&lt;/code&gt; are left alone, the rest are encoded to one of the &lt;code&gt;\xhh&lt;/code&gt;, &lt;code&gt;\uhhhh&lt;/code&gt; or &lt;code&gt;\Uhhhhhhhh&lt;/code&gt; 8, 16 or 32-bit escapes (with the exception of &lt;code&gt;\t&lt;/code&gt;, &lt;code&gt;\n&lt;/code&gt; and &lt;code&gt;\r&lt;/code&gt;). Would there be any way to reuse this output?&lt;/p&gt;

&lt;p&gt;Well, if you combine this with a bit of &lt;a href="http://docs.python.org/library/re.html#re.sub"&gt;&lt;code&gt;re.sub&lt;/code&gt;&lt;/a&gt; magic, you can in fact produce convincing &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; command sequences:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="pycon"&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;re&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;struct&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;_charescape&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;r&amp;#39;(?&amp;lt;!&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4}))&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_replace_struct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;    &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groups&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;    &lt;span class="c"&gt;# Convert XX or XXXX hex string into 2 bytes&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;    &lt;span class="n"&gt;codepoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;00&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;hex&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;    &lt;span class="c"&gt;# Convert 2 bytes into a signed integer, insert into escape sequence&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;u&lt;/span&gt;&lt;span class="si"&gt;%i&lt;/span&gt;&lt;span class="s"&gt;?&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;struct&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unpack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;!h&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;codepoint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;escaped&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;unicode_escape&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;escaped&lt;/span&gt;
&lt;span class="go"&gt;&amp;#39;CJK Ideograph: \\u8123&amp;#39;&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;_charescape&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_replace_struct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;escaped&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;&amp;#39;CJK Ideograph: \\u-32477?&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Using the &lt;a href="http://docs.python.org/library/struct.html"&gt;&lt;code&gt;struct&lt;/code&gt; module&lt;/a&gt; gave me a quick means to re-interpret the hexadecimal notation as produced by the &lt;code&gt;unicode_escape&lt;/code&gt; format as a signed short, but I did have to make sure there were 2 bytes at all times.&lt;/p&gt;
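&lt;p&gt;The padding and re-interpretation step can be looked at in isolation. The following is a small sketch, assuming Python 3 (where &lt;code&gt;bytes.fromhex&lt;/code&gt; takes the place of &lt;code&gt;.decode('hex')&lt;/code&gt;); the helper name &lt;code&gt;signed_short_from_hex&lt;/code&gt; is hypothetical:&lt;/p&gt;

```python
import struct


def signed_short_from_hex(hexdigits):
    """Interpret 2 or 4 hex digits as a big-endian signed 16-bit integer."""
    # Pad 'hh' to '00hh' so there are always exactly 2 bytes to unpack.
    padded = hexdigits.rjust(4, '0')
    (value,) = struct.unpack('!h', bytes.fromhex(padded))
    return value
```

&lt;p&gt;With this helper, &lt;code&gt;'8123'&lt;/code&gt; comes out as -32477 (33059 minus 65536), while a short escape such as &lt;code&gt;'f8'&lt;/code&gt; stays at 248.&lt;/p&gt;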

&lt;p&gt;Of course, the above trick does not handle newlines, returns or tabs (&lt;code&gt;\n&lt;/code&gt;, &lt;code&gt;\r&lt;/code&gt; and &lt;code&gt;\t&lt;/code&gt; respectively) correctly, nor does it escape existing backslashes yet, but I hoped at the time that this proof of concept would run several orders of magnitude faster than the naive character-by-character method when dealing with mostly-&lt;abbr title="American Standard Code for Information Interchange"&gt;ASCII&lt;/abbr&gt; input; most of the work is done in C by the &lt;code&gt;codecs&lt;/code&gt; and &lt;code&gt;re&lt;/code&gt; modules, after all.&lt;/p&gt;

&lt;p&gt;So this time around I decided to time these:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="pycon"&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;timeit&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test1&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;\u&lt;/span&gt;&lt;span class="si"&gt;%i&lt;/span&gt;&lt;span class="s"&gt;?&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s"&gt;u&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\u8000&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;65536&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\x7f&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\x20&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="s"&gt;u&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span 
class="s"&gt;{}&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;testdocument&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test2&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="n"&gt;_charescape&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_replace_struct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;testdocument&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;unicode_escape&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;declaration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;u&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;&lt;span class="s"&gt;Alle mennesker er f&lt;/span&gt;&lt;span class="se"&gt;\xf8&lt;/span&gt;&lt;span class="s"&gt;dt frie og med samme menneskeverd og menneskerettigheter. &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;&lt;span class="s"&gt;De er utstyrt med fornuft og samvittighet og b&lt;/span&gt;&lt;span class="se"&gt;\xf8&lt;/span&gt;&lt;span class="s"&gt;r handle mot hverandre i brorskapets &lt;/span&gt;&lt;span class="se"&gt;\xe5&lt;/span&gt;&lt;span class="s"&gt;nd.&amp;#39;&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;testdocument&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;declaration&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;test1()&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;from __main__ import test1&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;5.982733964920044&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;test2()&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;from __main__ import test2&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;1.4459600448608398&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Cool, so my hybrid encode plus regular-expression based solution looks to be around 4 times as fast, at least when it comes to simple Norwegian text with a handful of latin-1 characters, my most common case. Note however that I am not handling the &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; escape characters properly, nor are the &lt;code&gt;\n&lt;/code&gt;, &lt;code&gt;\r&lt;/code&gt; and &lt;code&gt;\t&lt;/code&gt; characters handled correctly.&lt;/p&gt;
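&lt;p&gt;To make that limitation concrete, here is what &lt;code&gt;unicode_escape&lt;/code&gt; produces for a tab and a literal backslash. This is a sketch assuming Python 3, where the codec returns &lt;code&gt;bytes&lt;/code&gt;; the sample string is made up:&lt;/p&gt;

```python
# A Latin-1 character, a tab and a trailing literal backslash.
sample = 'f\xf8dt\tfrie\\'

escaped = sample.encode('unicode_escape')

# The tab becomes the two characters backslash + 't' and the backslash
# is doubled; neither matches the \xhh or \uhhhh patterns, so the
# regular expression leaves both as they are.
print(escaped)
```

&lt;p&gt;Only the &lt;code&gt;\xf8&lt;/code&gt; sequence is picked up by the regular expression; the other two escapes pass through untouched and are not valid &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; &lt;code&gt;\uN?&lt;/code&gt; sequences.&lt;/p&gt;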

&lt;h2 id="can-i-do-better"&gt;Can I do better?&lt;/h2&gt;

&lt;p&gt;But I am actually being too clever by half (read: pretty dumb really); why did I encode to &lt;code&gt;unicode_escape&lt;/code&gt; in the first place? I was still in the process of fully understanding the issues and saw a shortcut. My regular expression isn&amp;rsquo;t particularly clever, I dabbled with the struct module to get my signed short values, and with all this hocus-pocus I lost sight of the goal: to escape certain classes of characters to &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; command codes.&lt;/p&gt;

&lt;p&gt;But aren&amp;rsquo;t regular expressions quite good at finding those classes all by themselves? I may as well use a decent expression that selects what needs to be encoded directly:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="pycon"&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;_charescape_direct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;u&amp;#39;([&lt;/span&gt;&lt;span class="se"&gt;\x00&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="se"&gt;\x1f\\\\&lt;/span&gt;&lt;span class="s"&gt;{}&lt;/span&gt;&lt;span class="se"&gt;\x80&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="se"&gt;\uffff&lt;/span&gt;&lt;span class="s"&gt;])&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_replace_direct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;    &lt;span class="n"&gt;codepoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="gp"&gt;... &lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;u&lt;/span&gt;&lt;span class="si"&gt;%s&lt;/span&gt;&lt;span class="s"&gt;?&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;codepoint&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;codepoint&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;32768&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;codepoint&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;65536&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;_charescape_direct&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_replace_direct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;ascii&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;&amp;#39;CJK Ideograph: \\u-32477?&amp;#39;&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test3&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="n"&gt;_charescape_direct&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_replace_direct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;testdocument&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;ascii&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;...&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;test3()&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;from __main__ import test3&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;0.5356400012969971&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Suddenly we have a more than 10 times speed increase over the naive method! Not only that, I am now also properly escaping the three whitespace characters &lt;code&gt;\n&lt;/code&gt;, &lt;code&gt;\r&lt;/code&gt; and &lt;code&gt;\t&lt;/code&gt;, and as an added bonus, the &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; special characters &lt;code&gt;\&lt;/code&gt;, &lt;code&gt;{&lt;/code&gt; and &lt;code&gt;}&lt;/code&gt; are now also being escaped! I call this a result, and a lesson to learn.&lt;/p&gt;
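&lt;p&gt;The same idea carries over to Python 3 with few changes. The following is a sketch rather than the packaged implementation; the names &lt;code&gt;_ESCAPE_RE&lt;/code&gt; and &lt;code&gt;rtf_escape&lt;/code&gt; are made up here, and only &lt;abbr title="Basic Multilingual Plane"&gt;BMP&lt;/abbr&gt; input is assumed:&lt;/p&gt;

```python
import re

# Control characters, the RTF specials (backslash, braces) and
# everything from U+0080 up to the end of the BMP.
_ESCAPE_RE = re.compile(r'[\x00-\x1f\\{}\x80-\uffff]')


def _replace(match):
    codepoint = ord(match.group(0))
    # Wrap into the signed 16-bit range that RTF expects.
    if codepoint >= 0x8000:
        codepoint -= 0x10000
    return '\\u%d?' % codepoint


def rtf_escape(text):
    """Return text as ASCII bytes with RTF unicode escapes (BMP only)."""
    return _ESCAPE_RE.sub(_replace, text).encode('ascii')
```

&lt;p&gt;The replacement function only runs for characters that actually need escaping; everything else is copied by the regular expression engine, which is why mostly-&lt;abbr title="American Standard Code for Information Interchange"&gt;ASCII&lt;/abbr&gt; input benefits so much.&lt;/p&gt;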

&lt;h2 id="perhaps-we-can-translate-instead"&gt;Perhaps we can translate instead&lt;/h2&gt;

&lt;p&gt;We could also use a &lt;a href="http://docs.python.org/library/stdtypes.html#str.translate"&gt;translation table&lt;/a&gt; to do my escaping for me. This is simply a dict that maps unicode codepoints to a replacement value. To create a static dict for all unicode values could be somewhat tricky, requiring either a custom &lt;code&gt;__missing__&lt;/code&gt; method or loading a generated structure on import.&lt;/p&gt;
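&lt;p&gt;The &lt;code&gt;__missing__&lt;/code&gt; route could look roughly like the sketch below: a dict subclass that computes and caches each entry the first time a codepoint is looked up. It assumes Python 3 and &lt;abbr title="Basic Multilingual Plane"&gt;BMP&lt;/abbr&gt; input, and the class name &lt;code&gt;RTFEscapeTable&lt;/code&gt; is hypothetical:&lt;/p&gt;

```python
class RTFEscapeTable(dict):
    """Hypothetical lazy translation table for str.translate()."""

    def __missing__(self, codepoint):
        if codepoint > 0xffff:
            # Assumption: this sketch only covers the BMP.
            raise ValueError('codepoint beyond the BMP: %#x' % codepoint)
        if 0x7f >= codepoint >= 0x20 and chr(codepoint) not in '\\{}':
            # Returning the ordinal maps the character to itself.
            replacement = codepoint
        else:
            signed = codepoint - 0x10000 if codepoint >= 0x8000 else codepoint
            replacement = '\\u%d?' % signed
        # Cache the entry so later lookups skip __missing__.
        self[codepoint] = replacement
        return replacement


table = RTFEscapeTable()
translated = 'f\xf8dt {frie}'.translate(table)
```

&lt;p&gt;The table starts out empty and grows only with the codepoints actually seen, so nothing has to be generated up front.&lt;/p&gt;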

&lt;p&gt;Before digging into clever solutions to that, I should perhaps first test the speed of a simple translation table, one that only covers codepoints up to &amp;lsquo;\u00ff&amp;rsquo;, or latin-1:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="pycon"&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;u&amp;quot;&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;{0:02x}&amp;quot;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;xrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;_table&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="s"&gt;u&amp;quot;&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;{0:02x}&amp;quot;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;{}&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;_table&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;u&amp;quot;&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;u{0}&amp;quot;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;xrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_table&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;163&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;testdocument.translate(_table).encode(&amp;quot;ascii&amp;quot;)&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;from __main__ import testdocument, _table&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;2.66812801361084&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Unfortunately, using &lt;code&gt;.translate&lt;/code&gt; turns out to be slowing us down considerably. Reducing the table to just a few codepoints doesn&amp;rsquo;t help either:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="pycon"&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;_basictable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="s"&gt;u&amp;quot;&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;{0:02x}&amp;quot;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\n\r\t\\&lt;/span&gt;&lt;span class="s"&gt;{}&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_basictable&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;6&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;testdocument.translate(_basictable)&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;from __main__ import testdocument, _basictable&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;2.0113179683685303&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;So it looks like I might want to avoid using &lt;code&gt;.translate&lt;/code&gt; if at all possible.&lt;/p&gt;

&lt;h2 id="worst-case-scenario"&gt;Worst-case scenario&lt;/h2&gt;

&lt;p&gt;So far, I&amp;rsquo;ve compared methods by testing them against some Norwegian text, typical of many European languages with a generous helping of &lt;abbr title="American Standard Code for Information Interchange"&gt;ASCII&lt;/abbr&gt; characters.&lt;/p&gt;

&lt;p&gt;To get a more complete picture, I need to test these methods against a worst-case scenario, a UTF-8 encoded test set from a great set of &lt;a href="https://github.com/bits/UTF-8-Unicode-Test-Documents"&gt;UTF-8 test documents&lt;/a&gt;:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="pycon"&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;urllib&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;utf8_sequence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;urlopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;https://raw.github.com/bits/UTF-8-Unicode-Test-Documents/master/UTF-8_sequence_unseparated/utf8_sequence_0-0xffff_assigned_printable_unseparated.txt&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;utf-8&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;utf8_sequence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;58081&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;testdocument&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;utf8_sequence&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;test1()&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;from __main__ import test1&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;0.7785000801086426&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;test3()&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;from __main__ import test3&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;0.8913929462432861&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Interesting! So in the worst-case scenario, where the vast majority (99.8%) of the text requires encoding, the character-by-character method is actually a little faster again! But this also means that for most cases, where you insert shorter text snippets into an &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; document, and where a far larger percentage of characters do not need escaping, the regular expression method will beat the character-by-character method hands down.&lt;/p&gt;
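&lt;p&gt;Which side of that trade-off a given text falls on can be estimated by counting how many characters need escaping. A small sketch, assuming Python 3; the helper name &lt;code&gt;escape_ratio&lt;/code&gt; is made up:&lt;/p&gt;

```python
import re

# Same character class as the direct regular expression method.
_NEEDS_ESCAPE = re.compile(r'[\x00-\x1f\\{}\x80-\uffff]')


def escape_ratio(text):
    """Return the fraction of characters in text that need escaping."""
    if not text:
        return 0.0
    return len(_NEEDS_ESCAPE.findall(text)) / len(text)
```

&lt;p&gt;A ratio close to 1 corresponds to the worst case measured above; typical European-language text should sit much closer to 0.&lt;/p&gt;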

&lt;h2 id="so-what-about-non-bmp-unicode"&gt;So what about non-&lt;abbr title="Basic Multilingual Plane"&gt;BMP&lt;/abbr&gt; Unicode?&lt;/h2&gt;

&lt;p&gt;So far I&amp;rsquo;ve focused only on characters within the &lt;a href="https://en.wikipedia.org/wiki/Unicode_plane#Basic_Multilingual_Plane"&gt;&lt;abbr title="Basic Multilingual Plane"&gt;BMP&lt;/abbr&gt;&lt;/a&gt;. For codepoints beyond the &lt;abbr title="Basic Multilingual Plane"&gt;BMP&lt;/abbr&gt; you can apparently use a &lt;a href="https://en.wikipedia.org/wiki/UTF-16#Code_points_U.2B10000_to_U.2B10FFFF"&gt;UTF-16 surrogate pair&lt;/a&gt;, at least according to Wikipedia. However, the &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; specification itself is silent on this, and no endianness is documented anywhere that I can find. The Microsoft platform uses UTF-16-LE throughout, so perhaps &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; readers support little-endian surrogate pairs too.&lt;/p&gt;

&lt;p&gt;However, I cannot at this time be bothered to extend my encoder to support such codepoints. On a UCS-2-compiled Python there is a happy coincidence that codepoints beyond the &lt;abbr title="Basic Multilingual Plane"&gt;BMP&lt;/abbr&gt; are treated mostly like UTF-16 surrogate pairs anyway, so they are sort-of supported by this method:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="pycon"&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;beyond&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;u&amp;#39;&lt;/span&gt;&lt;span class="se"&gt;\U00010196&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;_charescape_direct&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_replace_direct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;beyond&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;ascii&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;&amp;#39;\\u-10240?\\u-8810?&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Note, however, that the first value is -10240, or &lt;code&gt;0xd800&lt;/code&gt; in unsigned hexadecimal, making this a big-endian encoded surrogate pair. Presumably on Windows that&amp;rsquo;ll encode the other way around.&lt;/p&gt;
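
&lt;p&gt;The two values in that output follow from the standard UTF-16 surrogate arithmetic. Here is a minimal sketch of that calculation in plain Python 3; the helper names &lt;code&gt;rtf_surrogates&lt;/code&gt; and &lt;code&gt;rtf_escape_beyond_bmp&lt;/code&gt; are made up for illustration and are not part of the encoder above:&lt;/p&gt;

```python
def rtf_surrogates(codepoint):
    """Split a codepoint beyond the BMP into its UTF-16 surrogate pair,
    expressed as the signed 16-bit decimals used by RTF unicode escapes."""
    if codepoint not in range(0x10000, 0x110000):
        raise ValueError("not a codepoint beyond the BMP")
    offset = codepoint - 0x10000
    high = 0xD800 + offset // 0x400  # top 10 bits of the offset
    low = 0xDC00 + offset % 0x400    # bottom 10 bits of the offset
    # Surrogates are always 0x8000 or above, so as signed 16-bit values
    # they are negative: subtract 0x10000 to get the decimal RTF expects.
    return high - 0x10000, low - 0x10000


def rtf_escape_beyond_bmp(codepoint):
    """Format the surrogate pair as two RTF unicode escapes (hypothetical helper)."""
    return "".join("\\u%d?" % unit for unit in rtf_surrogates(codepoint))


print(rtf_escape_beyond_bmp(0x10196))  # \u-10240?\u-8810?
```

&lt;p&gt;Because this works on the codepoint number itself, the result does not depend on how the interpreter stores strings internally. Whether &lt;abbr title="Rich Text Format"&gt;RTF&lt;/abbr&gt; readers accept the pair in this order remains an open question.&lt;/p&gt;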

&lt;p&gt;On a UCS-4 platform the codepoint will be ignored by the regular expression and the &lt;code&gt;.encode('ascii')&lt;/code&gt; call will raise a UnicodeEncodeError instead.&lt;/p&gt;

&lt;p&gt;I am calling this &amp;lsquo;unsupported&amp;rsquo; and a day. Suggestions for implementing this in a neat and performant way are welcome!&lt;/p&gt;

&lt;h2 id="off-to-pypi-we-go"&gt;Off to PyPI we go&lt;/h2&gt;

&lt;p&gt;I am quite happy with the simple regular expression method, and prefer it over the character-by-character loop.&lt;/p&gt;

&lt;p&gt;So I packaged up my regular expression method as a handy &lt;a href="https://pypi.python.org/pypi/rtfunicode"&gt;module on PyPI&lt;/a&gt;, complete with Python 2 and 3 support and a minuscule test suite; the &lt;a href="https://github.com/mjpieters/rtfunicode"&gt;source code is available on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The module in fact registers a new codec, called &lt;code&gt;rtfunicode&lt;/code&gt;, so after you import the package all you need do is use the new codec in the &lt;code&gt;.encode()&lt;/code&gt; method:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="pycon"&gt;&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;rtfunicode&lt;/span&gt;
&lt;span class="gp"&gt;&amp;gt;&amp;gt;&amp;gt; &lt;/span&gt;&lt;span class="n"&gt;declaration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;rtfunicode&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;&amp;#39;Alle mennesker er f\\u248?dt frie og med samme menneskeverd og menneskerettigheter. De er utstyrt med fornuft og samvittighet og b\\u248?r handle mot hverandre i brorskapets \\u229?nd.&amp;#39;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
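
&lt;p&gt;For the curious, this is roughly how a package can hook a new name into &lt;code&gt;.encode()&lt;/code&gt;: Python&amp;rsquo;s &lt;code&gt;codecs.register()&lt;/code&gt; takes a search function that returns a &lt;code&gt;codecs.CodecInfo&lt;/code&gt; for the names it knows. The sketch below is &lt;em&gt;not&lt;/em&gt; the rtfunicode source. The codec name &lt;code&gt;rtfsketch&lt;/code&gt; and all helper names are hypothetical, it targets Python 3 only, it handles only the &lt;abbr title="Basic Multilingual Plane"&gt;BMP&lt;/abbr&gt;, and escaping backslashes and braces as numeric escapes is an assumption on my part:&lt;/p&gt;

```python
import codecs
import re

# Assumption: backslash and braces are RTF syntax and get escaped too,
# along with every non-ASCII character inside the BMP.
_ESCAPE = re.compile("[\\\\{}\u0080-\uffff]")


def _escape(match):
    code = ord(match.group(0))
    if code // 0x8000:  # true for 0x8000 and up within the BMP
        code -= 0x10000  # RTF wants a signed 16-bit decimal
    return "\\u%d?" % code


def _encode(text, errors="strict"):
    escaped = _ESCAPE.sub(_escape, text)
    # Codepoints beyond the BMP are not matched above and fail here.
    return escaped.encode("ascii", errors), len(text)


def _decode(data, errors="strict"):
    raise NotImplementedError("this sketch only encodes")


def _search(name):
    # Called by the codecs machinery with the normalised codec name.
    if name == "rtfsketch":
        return codecs.CodecInfo(_encode, _decode, name="rtfsketch")
    return None


codecs.register(_search)

print("b\u00f8r".encode("rtfsketch"))  # b'b\\u248?r'
```

&lt;p&gt;Registering at import time is what makes a bare &lt;code&gt;import&lt;/code&gt; enough to enable the codec.&lt;/p&gt;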

&lt;p&gt;Hopefully it comes in handy for others. Feedback is most welcome, as are patches!&lt;/p&gt;

</content>
 <feedburner:origLink>http://www.zopatista.com/python/2012/06/06/rtf-and-unicode</feedburner:origLink></entry>
 
 <entry>
   <title>The dreaded plone.relations IntId KeyError</title>
   <link href="http://feedproxy.google.com/~r/zopatista/~3/g-0hTnGEkfc/the-dreaded-plone-relations-intid-keyerror" />
   <updated>2011-06-29T00:00:00-07:00</updated>
   <id>http://www.zopatista.com/plone/2011/06/29/the-dreaded-plone-relations-intid-keyerror</id>
   <content type="html">
&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="http://jarn.com"&gt;jarn.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;When IntIds go missing, the going gets tough. Specifically, plone.app.relations and related packages do not cope gracefully when a relationship source or target is missing. Here is how we clear such broken relationships.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We&amp;rsquo;ve been experimenting with &lt;a href="http://pypi.python.org/pypi/plone.app.relations"&gt;plone.app.relations&lt;/a&gt; to manage relationships between objects for a few years now. This package uses &lt;a href="http://pypi.python.org/pypi/zc.relationship"&gt;zc.relationship&lt;/a&gt; to lay the links between content items in your site, which in turn relies on &lt;a href="http://pypi.python.org/pypi/zope.app.intid"&gt;zope.app.intid&lt;/a&gt; to indirectly create those links. Basically, intids are pointers to the real objects and let you handle the linking efficiently.&lt;/p&gt;

&lt;h2 id="water-in-the-bilge"&gt;Water in the Bilge&lt;/h2&gt;

&lt;p&gt;The relations machinery is not very forgiving if any intid has gone AWOL. Normally, the relations data structures are kept in sync through Zope events, but this doesn&amp;rsquo;t always work out. In our experience, you can end up with objects and their intids removed, but the relationships pointing to the now-gone intids still in place. When this happens, things break, and you get tracebacks ending in the dreaded &lt;code&gt;KeyError: &amp;lt;long number&amp;gt;&lt;/code&gt; in &lt;code&gt;getObject&lt;/code&gt; of &lt;code&gt;zope/app/intid/__init__.py&lt;/code&gt;. The traceback line before that will be in &lt;code&gt;zc/relationship/index.py&lt;/code&gt;, in the method &lt;code&gt;resolveToken&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now, the zc.relationship package is very powerful and very, very flexible. This comes at a price, as its internal data structures are quite daunting to the uninitiated. If you have to repair these relations and all you have is the missing intid at one end of the relation, it&amp;rsquo;ll be a long hard slog through a maze of 3 or 4 different packages and opaque TreeSets.&lt;/p&gt;

&lt;h2 id="bucket-by-bucket"&gt;Bucket by Bucket&lt;/h2&gt;

&lt;p&gt;Luckily, we already did the deep code dive for you. The following method, if passed an intid, will find any references to it in the relations data structure and remove these for you:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="python"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;plone.relations.interfaces&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;IComplexRelationshipContainer&lt;/span&gt;
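&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;zope.app.intid.interfaces&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;IIntIds&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;zope.component&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;getUtility&lt;/span&gt;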

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;removeKeyErrorRelationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot;Remove all relationships that point to an intid no&lt;/span&gt;
&lt;span class="sd"&gt;       longer in the site&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="n"&gt;intids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getUtility&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IIntIds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;relationships&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getUtility&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IComplexRelationshipContainer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                               &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;relations&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;relIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;relationships&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relationIndex&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;direction&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;target&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;source&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;relIndex&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_name_TO_mapping&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;direction&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt; &lt;span class="c"&gt;# Empty set for this direction&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;relid&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
            &lt;span class="n"&gt;keyref&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intids&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;refs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;keyref&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c"&gt;# Not even the relationship exists anymore&lt;/span&gt;
                &lt;span class="n"&gt;relIndex&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iid&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt; &lt;span class="n"&gt;direction&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;relation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;keyref&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object&lt;/span&gt;
                &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__parent__&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="ne"&gt;AttributeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="c"&gt;# The relation object only exists in the intid utility;&lt;/span&gt;
                    &lt;span class="c"&gt;# in this case __parent__ is None.&lt;/span&gt;
                    &lt;span class="n"&gt;relIndex&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unindex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;relIndex&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unindex_doc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;# be doubly sure&lt;/span&gt;
                    &lt;span class="n"&gt;intids&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unregister&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keyref&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Note that this method assumes you already have &lt;a href="http://stackoverflow.com/questions/5819978/how-do-i-trigger-portal-quickinstaller-reinstallproducts-form-outside-the-plone-s/5820885#5820885"&gt;the local site manager set up properly&lt;/a&gt;. This is a great little method to get rid of individual KeyError problems.&lt;/p&gt;

&lt;h2 id="man-the-pumps"&gt;Man the Pumps&lt;/h2&gt;

&lt;p&gt;It would be better if you could clear out all missing intids from the relations tool altogether, &lt;em&gt;before&lt;/em&gt; they become a problem and things fall down. Luckily, you can! The following code will hunt down and remove all missing intids from the tool. Note that it&amp;rsquo;ll take a while (it&amp;rsquo;ll scan through two whole relations indexes), so you&amp;rsquo;d better sit back and relax while the work is done.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="python"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;plone.relations.interfaces&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;IComplexRelationshipContainer&lt;/span&gt;
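&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;zope.app.intid.interfaces&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;IIntIds&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;zope.component&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;getUtility&lt;/span&gt;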
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;BTrees.IOBTree&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;difference&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clearAllMissingLinks&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot;Find and remove all missing intids in the&lt;/span&gt;
&lt;span class="sd"&gt;       relations tool.&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="n"&gt;intids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getUtility&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IIntIds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;relationships&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getUtility&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IComplexRelationshipContainer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                               &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;relations&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;relIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;relationships&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relationIndex&lt;/span&gt;
    &lt;span class="n"&gt;rtotal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;itotal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;direction&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#39;target&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;source&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;relIndex&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_name_TO_mapping&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;direction&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;iid&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;difference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;intids&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;refs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;itotal&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;relid&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;iid&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
                &lt;span class="n"&gt;keyref&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;intids&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;refs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;keyref&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="c"&gt;# Not even the relationship exists anymore&lt;/span&gt;
                    &lt;span class="n"&gt;relIndex&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iid&lt;/span&gt;&lt;span class="p"&gt;,),&lt;/span&gt; &lt;span class="n"&gt;direction&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;relation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;keyref&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object&lt;/span&gt;
                    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__parent__&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="ne"&gt;AttributeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="c"&gt;# The relation object only exists in the intid utility;&lt;/span&gt;
                        &lt;span class="c"&gt;# in this case __parent__ is None.&lt;/span&gt;
                        &lt;span class="n"&gt;relIndex&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unindex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="n"&gt;relIndex&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unindex_doc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;# be doubly sure&lt;/span&gt;
                        &lt;span class="n"&gt;intids&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unregister&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keyref&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;rtotal&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;itotal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rtotal&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Note that this method returns the total number of intids identified, as well as the total number of relationships removed.&lt;/p&gt;

&lt;h2 id="patch-the-leak"&gt;Patch the leak?&lt;/h2&gt;

&lt;p&gt;Instead of pumping out the water, we should of course patch the leak. We have yet to find it, though; when we do, we&amp;rsquo;ll make sure the affected packages receive the patch!&lt;/p&gt;

&lt;h3 id="april-2012-update-clean-up-methods-fine-tuned"&gt;&lt;em&gt;April 2012 Update&lt;/em&gt;: clean-up methods fine-tuned.&lt;/h3&gt;

&lt;p&gt;I&amp;rsquo;ve found that in practice some relationships were still referenced by intid keyrefs and present in the relationships index, but were no longer present in the relationship utility itself. These have to be manually unindexed and removed; the code examples above have been updated to reflect this.&lt;/p&gt;
</content>
 <feedburner:origLink>http://www.zopatista.com/plone/2011/06/29/the-dreaded-plone-relations-intid-keyerror</feedburner:origLink></entry>
 
 <entry>
   <title>Saving the day: recovering lost objects</title>
   <link href="http://feedproxy.google.com/~r/zopatista/~3/fZsGBq_Usmw/saving-the-day-recovering-lost-objects" />
   <updated>2008-12-18T00:00:00-08:00</updated>
   <id>http://www.zopatista.com/plone/2008/12/18/saving-the-day-recovering-lost-objects</id>
   <content type="html">
&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="http://jarn.com"&gt;jarn.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;When a customer discovers over a week later that an important object was accidentally deleted, what do you do?&lt;/em&gt;&lt;/p&gt;

&lt;h2 id="oh-noes"&gt;Oh noes!&lt;/h2&gt;

&lt;p&gt;A customer discovered that an entire important section of his site was missing and asked us to bring it back. This was in a heavily edited site, with loads of writes each day, but we quickly located the offending transaction: someone had deleted the object in question 9 days earlier.&lt;/p&gt;

&lt;p&gt;Undo was no longer an option, though: too many things had changed, not least the catalog. Truncating the Data.fs (removing all transactions since, including the offending one) was not only undesirable, but impossible as the site stores the data in Oracle through &lt;a href="http://wiki.zope.org/ZODB/RelStorage"&gt;RelStorage&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id="time-travel"&gt;Time travel&lt;/h2&gt;

&lt;p&gt;So, instead of permanently removing transactions, we used a handy little package to do some time traveling: &lt;a href="http://pypi.python.org/pypi/zc.beforestorage"&gt;zc.beforestorage&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;zc.beforestorage&lt;/code&gt; does require a ZODB version 3.8 or 3.9; the customer installation is on Plone 3.0, so a newer ZODB3 egg was necessary for this operation. A small additional buildout configuration file (saved as beforestorage.cfg) helps out:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="ini"&gt;&lt;span class="k"&gt;[buildout]&lt;/span&gt;
&lt;span class="na"&gt;extends&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s" /&gt;
&lt;span class="s"&gt;    buildout.cfg&lt;/span&gt;
&lt;span class="na"&gt;eggs +&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s" /&gt;
&lt;span class="s"&gt;    zc.beforestorage&lt;/span&gt;
&lt;span class="s"&gt;    ZODB3&lt;/span&gt;
&lt;span class="s"&gt;    zope.proxy&lt;/span&gt;

&lt;span class="k"&gt;[versions]&lt;/span&gt;
&lt;span class="na"&gt;ZODB3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;3.8.1&lt;/span&gt;
&lt;span class="na"&gt;zope.proxy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;3.4.2&lt;/span&gt;

&lt;span class="k"&gt;[relstorage-patch]&lt;/span&gt;
&lt;span class="na"&gt;recipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;plone.recipe.command&lt;/span&gt;
&lt;span class="na"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s" /&gt;
&lt;span class="s"&gt;    cd ${buildout:eggs-directory}/ZODB3-3.8.1-py2.4-linux-i686.egg/ZODB&lt;/span&gt;
&lt;span class="s"&gt;    curl -s http://svn.zope.de/zope.org/relstorage/tags/1.1c1/poll-invalidation-1-zodb-3-8-0.patch | patch -N -p0&lt;/span&gt;
&lt;span class="s"&gt;    cd ${buildout:directory}&lt;/span&gt;
&lt;span class="na"&gt;update-command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;${relstorage-patch:command}&lt;/span&gt;

&lt;span class="k"&gt;[instance]&lt;/span&gt;
&lt;span class="na"&gt;zope-conf-additional +&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s" /&gt;
&lt;span class="s"&gt;    enable-product-installation False&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;relstorage-patch&lt;/code&gt; section in the above code ensures that our ZODB3 egg is patched with the RelStorage additions, and the zope.proxy egg is needed because ZODB 3.8 requires a newer version. The &lt;code&gt;enable-product-installation&lt;/code&gt; line is required because &lt;code&gt;zc.beforestorage&lt;/code&gt; puts your ZODB in read-only mode (understandably); the option tells Zope not to try and write product information to the ZODB.&lt;/p&gt;

&lt;p&gt;Once buildout has been run with this configuration (with the &lt;code&gt;-c&lt;/code&gt; switch), you&amp;rsquo;ll still need to edit the zope.conf file for your instance, usually in parts/instance/etc/zope.conf. You need to edit the &lt;code&gt;&amp;lt;zodb_db main&amp;gt;&lt;/code&gt; section to wrap the storage in the beforestorage. Ours looked something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;zodb_db main&amp;gt;
    # Main database
    cache-size 650000
%import zc.beforestorage
%import relstorage
    &amp;lt;before&amp;gt;
    before 2008-12-08T10:29:03
    &amp;lt;relstorage&amp;gt;
        &amp;lt;oracle&amp;gt;
            dsn RELSTORAGE_DSN
            password xxxxxxxxx
            user xxxxxxxx
        &amp;lt;/oracle&amp;gt;
    &amp;lt;/relstorage&amp;gt;
    &amp;lt;/before&amp;gt;
    mount-point /
&amp;lt;/zodb_db&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Any line with the word &amp;lsquo;before&amp;rsquo; in it is new. We took the timestamp from the undo log and simply converted it to UTC. Now, when you start the instance, you are in the past. You can&amp;rsquo;t alter this past (no killing of grandfathers), but you &lt;em&gt;can&lt;/em&gt; read it. And lo and behold, the deleted object is back.&lt;/p&gt;

&lt;h2 id="recovery"&gt;Recovery&lt;/h2&gt;

&lt;p&gt;Now that we have found the lost object, we can recover it. We simply exported it; in the ZMI, choose the Export/Import button, and save the export on the server. Remove the zc.beforestorage configuration (just run buildout with your regular buildout file), restart, import the .zexp file, done!&lt;/p&gt;

&lt;p&gt;Note that you&amp;rsquo;ll need to reindex the imported content and that any related data that lives outside of the object itself is gone. For example, its intid is gone, and all relationships to it will have to be recreated. But you just saved your customer&amp;rsquo;s bacon; I&amp;rsquo;m sure they won&amp;rsquo;t mind a little manual work!&lt;/p&gt;
</content>
 <feedburner:origLink>http://www.zopatista.com/plone/2008/12/18/saving-the-day-recovering-lost-objects</feedburner:origLink></entry>
 
 <entry>
   <title>One cookie please, but hold the pickles</title>
   <link href="http://feedproxy.google.com/~r/zopatista/~3/4KTiODkruuI/one-cookie-please" />
   <updated>2007-11-09T00:00:00-08:00</updated>
   <id>http://www.zopatista.com/plone/2007/11/09/one-cookie-please</id>
   <content type="html">
&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="http://jarn.com"&gt;jarn.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The Python pickle module is dangerous, didn&amp;rsquo;t you know?&lt;/em&gt;&lt;/p&gt;

&lt;h2 id="all-your-base-are-belong-to-us"&gt;All your base are belong to us&lt;/h2&gt;

&lt;p&gt;By now you all should have installed &lt;a href="http://plone.org/products/plone-hotfix/releases/20071106"&gt;last Tuesday&amp;rsquo;s Hotfix&lt;/a&gt;. If you haven&amp;rsquo;t yet, but are running Plone 2.5 or Plone 3.0 websites, you should do so &lt;strong&gt;yesterday&lt;/strong&gt;, or at least as soon as humanly possible.&lt;/p&gt;

&lt;p&gt;The Hotfix patches a serious security problem in the statusmessages and linkintegrity modules, where network-supplied data was interpreted as &lt;a href="http://docs.python.org/lib/module-pickle.html"&gt;pickles&lt;/a&gt;. &amp;ldquo;Network-supplied&amp;rdquo; data in this case means both cookies and form data, and no authentication is required to exploit the holes.&lt;/p&gt;

&lt;h2 id="what-happen-"&gt;What happen ?&lt;/h2&gt;

&lt;p&gt;The basic problem with the holes is that the Plone community was totally unaware of how dangerous the pickle module really is. Hanno Schlichting did file a &lt;a href="http://dev.plone.org/plone/ticket/6943"&gt;report&lt;/a&gt; a few months ago stating that the code was potentially dangerous, but even he didn&amp;rsquo;t fully appreciate that pickles are a security hole only waiting for attacker input. The scary thing here is that the code in question was written by extremely capable and experienced developers, but none of them were aware of the fact that you cannot ever use pickles to load user-supplied data.&lt;/p&gt;

&lt;p&gt;What is needed then, is education. This is my contribution.&lt;/p&gt;

&lt;h2 id="you-are-on-the-way-to-destruction"&gt;You are on the way to destruction&lt;/h2&gt;

&lt;p&gt;So what is wrong with pickles? They are just a damn handy way to serialize arbitrary data into binary strings and back again, right?&lt;/p&gt;

&lt;p&gt;Yes, they are that, but the pickle format used is also a &lt;a href="http://peadrop.com/blog/2007/06/18/pickle-an-interesting-stack-language/"&gt;simple stack language&lt;/a&gt; that allows arbitrary Python structures to be created and executed. This stack language allows you to import modules (the &amp;lsquo;c&amp;rsquo; symbol), and apply arguments to callables (the &amp;lsquo;R&amp;rsquo; symbol), thus causing code to be run. Combine this with the Python built-in functions &lt;code&gt;eval&lt;/code&gt; and &lt;code&gt;compile&lt;/code&gt; and you have the perfect vehicle for an attacker to have the pickle loader routine execute arbitrary Python code when loading a well-crafted pickle. Just imagine what an attacker could do with that to your Zope server. Do you think you&amp;rsquo;ll ever be sure you got all the backdoors out of your Data.fs?&lt;/p&gt;
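
&lt;p&gt;To make the &amp;lsquo;c&amp;rsquo; and &amp;lsquo;R&amp;rsquo; symbols concrete, here is a deliberately harmless, hand-written pickle, loaded with Python 3 (where the built-ins live in the &lt;code&gt;builtins&lt;/code&gt; module). It is only an illustration; the byte string is made up for this example and has nothing to do with the actual exploit:&lt;/p&gt;

```python
import pickle

# A hand-written protocol 0 pickle. Nothing in it is data to be restored;
# it is a tiny program for the unpickler's stack machine:
#   c  push the global named on the next two lines (module, then name)
#   (  push a mark
#   S  push a string
#   t  pop back to the mark and build a tuple
#   R  pop the callable and the tuple, then CALL the callable
#   .  stop, returning whatever is on top of the stack
crafted = b"cbuiltins\nlen\n(S'abc'\ntR."

result = pickle.loads(crafted)
print(result)  # 3, the return value of len('abc')
```

&lt;p&gt;The loader never asked whether &lt;code&gt;len&lt;/code&gt; was a sensible thing to call; it simply called it. Whoever writes the bytes chooses the callable and its arguments.&lt;/p&gt;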

&lt;h2 id="we-get-signal"&gt;We get signal&lt;/h2&gt;

&lt;p&gt;So next time you need to preserve data across HTTP requests, please do not be tempted to use the pickle module to create strings for you. Rarely will you have anything more than a handful of simple datatypes to pass along anyway, so just invent a simple data format and use that instead. (No, using a subclass of the Python implementation of pickle is not a simpler solution.)&lt;/p&gt;

&lt;p&gt;With statusmessages, for example, each message consists of a message and a type string, both unicode. So we changed to a hand-rolled format using a 2-byte length header (11 bits of message length, 5 for the type) directly followed by the message and type strings (encoded to UTF-8). When reading this from a cookie again later, the decoder simply has to read the lengths from the first 2 bytes, then read the right number of bytes to get the message and type back. A similar method was used to encode the linkintegrity data. Simple, effective, and impervious to this kind of attack.&lt;/p&gt;
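
&lt;p&gt;A format like that could be sketched roughly as follows. This is not the code that went into the Hotfix: the function names are made up for the example, and putting the message length in the high 11 bits of the header is an assumption, since only the bit counts are given above.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="python"&gt;import struct

MAX_MESSAGE = 2047  # largest length that fits in 11 bits
MAX_TYPE = 31       # largest length that fits in 5 bits


def encode(message, type_):
    """Pack one status message into a 2-byte header plus two UTF-8 strings."""
    message_bytes = message.encode("utf-8")
    type_bytes = type_.encode("utf-8")
    if len(message_bytes) not in range(MAX_MESSAGE + 1):
        raise ValueError("message too long")
    if len(type_bytes) not in range(MAX_TYPE + 1):
        raise ValueError("type too long")
    # Assumed layout: message length in the high 11 bits, type length in the low 5.
    header = len(message_bytes) * 32 + len(type_bytes)
    return struct.pack("!H", header) + message_bytes + type_bytes


def decode(data):
    """Read one message back; returns (message, type, remaining bytes)."""
    (header,) = struct.unpack("!H", data[:2])
    message_len, type_len = divmod(header, 32)
    end = 2 + message_len + type_len
    body = data[2:end]
    if len(body) != message_len + type_len:
        raise ValueError("truncated data")
    message = body[:message_len].decode("utf-8")
    type_ = body[message_len:].decode("utf-8")
    return message, type_, data[end:]


cookie = encode("Changes saved", "info")
print(decode(cookie))
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The decoder only ever slices bytes and decodes UTF-8, so the worst a malformed cookie can do is raise an exception.&lt;/p&gt;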

&lt;blockquote&gt;
  &lt;p&gt;Congratulation!!&lt;br /&gt;
A.D.2111&lt;br /&gt;
All bases of CATS were destroyed.&lt;br /&gt;
It seems to be peaceful.&lt;br /&gt;
But it is incorrect. CATS is still alive.&lt;br /&gt;
ZIG-01 must fight against CATS again.&lt;br /&gt;
And down with them completely !&lt;br /&gt;
Good luck.&lt;/p&gt;

  &lt;p&gt;&lt;small&gt;&lt;a href="http://en.wikipedia.org/wiki/All_your_base_are_belong_to_us"&gt;Zero Wing, 1989&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;
&lt;/blockquote&gt;
</content>
 <feedburner:origLink>http://www.zopatista.com/plone/2007/11/09/one-cookie-please</feedburner:origLink></entry>
 
 <entry>
   <title>Small change, big effect</title>
   <link href="http://feedproxy.google.com/~r/zopatista/~3/-pcTCl4uBtI/small-change-big-effect" />
   <updated>2007-11-04T00:00:00-07:00</updated>
   <id>http://www.zopatista.com/plone/2007/11/04/small-change-big-effect</id>
   <content type="html">
&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="http://jarn.com"&gt;jarn.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;How changing one line halved the time it took to rename a Plone folder.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here at the Plone Performance sprint, Matt Hamilton and Sasha Vincic are homing in on the Catalog and folder renaming. As Sasha already &lt;a href="http://valentinewebsystems.com/archive/2007/11/02/indexing-in-plone-got-twice-as-fast"&gt;reported earlier&lt;/a&gt;, they identified the object_provides index as a potential bottleneck.&lt;/p&gt;

&lt;h2 id="the-objectprovides-index"&gt;The object_provides index&lt;/h2&gt;

&lt;p&gt;The index is filled with interface identifiers, strings representing the actual interfaces. The data for the index comes from a small method in Products.CMFPlone.CatalogTool, object_provides, which looked like this:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="python"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;object_provides&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;portal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kw&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;interfaceToName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;portal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;providedBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flattened&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;So, for each interface provided by an object, interfaceToName is invoked. The purpose of interfaceToName is to turn an interface into a string that can later be turned back into that interface through the queryInterface method, interfaceToName&amp;rsquo;s sibling.&lt;/p&gt;

&lt;p&gt;Now the problem with interfaceToName is that it has to iterate over the whole utilities registry to find all interfaces registered as utilities, just to find out what name an interface was registered with. This is a slow process, but a necessary one; although the default name for an interface is its &lt;em&gt;dotted name&lt;/em&gt; (the Python identifier path to its definition), some special classes of interfaces are registered with a different name. For example, when registering a Zope3-style browser menu, an interface is generated for the menu, and registered with a zope.app.menus prefix.&lt;/p&gt;

&lt;h2 id="no-need-for-interfacetoname"&gt;No need for interfaceToName&lt;/h2&gt;

&lt;p&gt;Luckily, for the object_provides index use-case, interfaceToName is overkill. First of all, object_provides indexes declared interfaces on content objects only, and therefore will never encounter any of the &amp;ldquo;special&amp;rdquo; interfaces.&lt;/p&gt;

&lt;p&gt;But more importantly, the index contents are never used to find the original interfaces again. Quite the contrary, the index is only used to find which objects provide a given interface, and the developer querying the catalog will have to generate the same string format every time they search. So, if interfaceToName is used to fill the index, developers also have to use interfaceToName to query it. Search for IATFolder? Pass in interfaceToName(IATFolder) and hit the same performance problem.&lt;/p&gt;

&lt;p&gt;&lt;img src="/assets/images/renaming-performance-increase.png" alt="Renaming performance increase" class="pull-right" /&gt;&lt;/p&gt;

&lt;h2 id="unique-identifier"&gt;Unique identifier&lt;/h2&gt;

&lt;p&gt;So if interfaceToName is overkill, what unique identifier should we use then? As we already mentioned, when you register an interface in the first place, the default name is the dotted name of the interface. It&amp;rsquo;s a unique identifier, as it&amp;rsquo;s the path under which Python finds the interface once its module is imported. It is available as the &lt;code&gt;__identifier__&lt;/code&gt; attribute on the interface. As it&amp;rsquo;s unique, and available directly from the interfaces themselves, it&amp;rsquo;s ideally suited for both indexing and searching.&lt;/p&gt;

&lt;p&gt;Of course, this means that if we use &lt;code&gt;__identifier__&lt;/code&gt; then you should use the same attribute when querying the index. Because &lt;code&gt;__identifier__&lt;/code&gt; (or &lt;code&gt;__module__ + '.' + __name__&lt;/code&gt;, which is the same) is already the default for interfaceToName, this is what Plone developers have been using anyway.&lt;/p&gt;
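
&lt;p&gt;The difference between the two lookups can be sketched with plain Python stand-ins. Nothing below is Zope code: &lt;code&gt;FakeInterface&lt;/code&gt;, the registry dictionary and both helper functions are hypothetical, and only mimic the behaviour described above.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="python"&gt;class FakeInterface:
    """Hypothetical stand-in for an interface object."""

    def __init__(self, module, name):
        self.__module__ = module
        self.__name__ = name
        # The dotted name, as described for __identifier__ above.
        self.__identifier__ = module + "." + name


def slow_name(registry, iface):
    """Reverse lookup: scan every registration to find the name used."""
    for name, registered in registry.items():
        if registered is iface:
            return name
    return None


def fast_name(iface):
    """Direct lookup: the dotted name is already on the interface."""
    return iface.__identifier__


folder = FakeInterface("example.interfaces", "IFolder")
menu = FakeInterface("example.menus", "IActionsMenu")
registry = {
    "example.interfaces.IFolder": folder,
    # Special interfaces can be registered under a different name.
    "zope.app.menus.actions": menu,
}

print(slow_name(registry, folder) == fast_name(folder))
print(slow_name(registry, menu) == fast_name(menu))
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;For an ordinary interface both lookups give the same string, so the scan buys nothing; only the specially registered menu interface comes out differently, and those never end up in this index.&lt;/p&gt;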

&lt;p&gt;So we changed the indexing method to:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;code class="python"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;object_provides&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;portal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kw&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__identifier__&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;providedBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flattened&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;and presto, indexing was more than twice as fast, as shown by the pretty graph on the right. We tested this by having JMeter rename a folder with 20 documents in it, 40 times.&lt;/p&gt;

&lt;p&gt;Not bad for a one-line change.&lt;/p&gt;

</content>
 <feedburner:origLink>http://www.zopatista.com/plone/2007/11/04/small-change-big-effect</feedburner:origLink></entry>
 
 
</feed>
