<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2enclosuresfull.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:yt="http://gdata.youtube.com/schemas/2007" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
   <channel>
      <title>JISC-CRIG Planet</title>
      <description>Selected feeds from around the web searching for a single tag and aggregated into a single feed. The tag (CRIG) is for the Common Repository Interface Group as funded by the Joint Information Systems Committee.  We are improving repositories in the Higher and Further Education Sector over the next eighteen months, til Sept 2009.</description>
      <link>http://pipes.yahoo.com/pipes/pipe.info?_id=IOhBPNqn3BG4kFuCJZhxuA</link>
      <atom:link rel="next" href="http://pipes.yahoo.com/pipes/pipe.run?_id=IOhBPNqn3BG4kFuCJZhxuA&amp;_render=rss&amp;page=2" />
      <pubDate>Tue, 18 Jun 2013 03:43:00 +0000</pubDate>
      <generator>http://pipes.yahoo.com/pipes/</generator>
      <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/jisc-crig" /><feedburner:info uri="jisc-crig" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><itunes:explicit>no</itunes:explicit><itunes:subtitle>Selected feeds from around the web searching for a single tag and aggregated into a single feed. The tag (CRIG) is for the Common Repository Interface Group as funded by the Joint Information Systems Committee. We are improving repositories in the Higher </itunes:subtitle><itunes:summary>Selected feeds from around the web searching for a single tag and aggregated into a single feed. The tag (CRIG) is for the Common Repository Interface Group as funded by the Joint Information Systems Committee. We are improving repositories in the Higher and Further Education Sector over the next eighteen months, til Sept 2009.</itunes:summary><feedburner:browserFriendly></feedburner:browserFriendly><item>
         <title>Promoting IWMW 2013: the Video Summary</title>
         <link>http://ukwebfocus.wordpress.com/2013/05/28/promoting-iwmw-2013-the-video-summary/</link>
         <description>Back in the late 1990s publicising an event was quite simple &amp;#8211; the main activity was simply sending messages to relevant email lists. A message I sent to the web-support JISCMail list on Wednesday, 2 September 1998 illustrates this (and I&amp;#8217;m pleased that the JISCMail service has continued to provide an archive of messages over this [&amp;#8230;]</description>
         <guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=13712</guid>
         <pubDate>Tue, 28 May 2013 13:00:40 +0000</pubDate>
         <content:encoded><![CDATA[<p>Back in the late 1990s publicising an event was quite simple &#8211; the main activity was simply sending messages to relevant email lists. A <a rel="nofollow" target="_blank" href="https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=WEB-SUPPORT;df4768d4.9809">message I sent to the web-support JISCMail list</a> on Wednesday, 2 September 1998 illustrates this (and I&#8217;m pleased that the JISCMail service has continued to provide an archive of messages over this period of time).</p>
<p>Nowadays, of course, there are many more communications channels available, and many users (in the case of events, potential participants) will expect to receive information of relevance to them in their preferred environment. Indeed, to use <a rel="nofollow" target="_blank" href="http://tallblog.conted.ox.ac.uk/index.php/2011/12/16/visitors-and-residents-an-update/">the visitors and residents metaphor</a>, if they are &#8216;residents&#8217; of the online environment they will expect the modern equivalent of the town crier to make the announcements close to their residence, whereas &#8216;visitors&#8217; may well expect to receive information only if they track down relevant information kiosks.</p>
<p>In addition to using tweets, blog posts, RSS feeds and LinkedIn announcements it is now possible to use video sharing tools, such as YouTube. Such popular services, which will be readily available on mobile devices, may be particularly useful in reaching out to people on the move, who may find it easier to view a brief video clip rather than read text on a small screen.</p>
<p>For this reason I have created a brief video clip, lasting just over 2 minutes, which summarises the <a rel="nofollow" target="_blank" href="http://iwmw.ukoln.ac.uk/iwmw2013/">IWMW 2013 event</a>, which will be held at the University of Bath on 26-28 June. The <a rel="nofollow" target="_blank" href="http://www.youtube.com/watch?v=ss7y10FI0OU">video clip is available on YouTube</a> and is embedded below. I should add that the questions were asked by Kirsty Pitkin and the video was taken by Rich Pitkin, who also edited the video. Kirsty and Rich are running a session on <a rel="nofollow" target="_blank" href="http://iwmw.ukoln.ac.uk/iwmw2013/sessions/pitkin/">Creating a Multimedia CV or Project Summary</a> at the IWMW 2013 event, so if you would like a brief video made about yourself or some aspect of your work, feel free to sign up for the session! Remember that <a rel="nofollow" target="_blank" href="http://iwmw.ukoln.ac.uk/iwmw2013/booking/">the 3-day event costs only £350</a>, which includes 2 nights&#8217; accommodation.</p>
<span class='embed-youtube' style='text-align:center;display:block;'></span> 
<hr />
<p><a rel="nofollow" name="statistics"></a>View Twitter conversation from: [<a rel="nofollow" target="_blank" href="http://topsy.com/ukwebfocus.wordpress.com/2013/05/28/promoting-iwmw-2013-the-video-summary/">Topsy</a>] | View Twitter statistics from: [<a rel="nofollow" target="_blank" href="https://tweetreach.com/reports/7602453">TweetReach</a>] – [<a rel="nofollow" target="_blank" href="http://bit.ly/19qAgSd">Bit.ly</a>]</p>
<br />Filed under: <a rel="nofollow" target="_blank" href='http://ukwebfocus.wordpress.com/category/events/'>Events</a>]]></content:encoded>
         <geo:lat>51.379915</geo:lat>
         <geo:long>-2.331708</geo:long>
         <media:content medium="image" url="http://2.gravatar.com/avatar/27731abff266f585f006998f65c74be9?s=96&amp;amp;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96">
            <media:title type="html">Brian Kelly (UK Web Focus)</media:title>
         </media:content>
      </item>
      <item>
         <title>Flanders’ rants</title>
         <link>http://feedproxy.google.com/~r/DFFlanders/~3/Q5m8ZSbPf-k/</link>
         <description>Disclaimer: This page on my blog is intended to act as a repository for rants. I&amp;#8217;ll be the first to admit that these rants (by their very nature) are *not* well thought out and intended more as a place for me to vent about what frustrates me. The point in posting these rants is that [&amp;#8230;]</description>
         <guid isPermaLink="false">http://dfflanders.wordpress.com/?p=775</guid>
         <pubDate>Sun, 12 May 2013 05:39:08 +0000</pubDate>
         <content:encoded><![CDATA[<h1>Disclaimer:</h1>
<h3>This page on my blog is intended to act as a repository for rants. I&#8217;ll be the first to admit that these rants (by their very nature) are *not* well thought out and intended more as a place for me to vent about what frustrates me. The point in posting these rants is that at some stage (once I am a bit calmer), I can come back to view these rants in a more methodical and mindful way. The value in these rants is therefore not in the intellectual ideas, but rather in the emotional ideas that provoked their &#8216;knee jerk&#8217; reaction.  I hope to learn from these rants by posting them publicly, and  (hopefully) through my friends and colleagues I can separate the emotional from my ideological responses.</h3>
<p>Below in the comments you&#8217;ll find these rants; to receive them, you&#8217;ll need to leave a comment and then tick the &#8220;Notify me of follow-up comments via email&#8221; box.</p>
<p>NB The other reason for having this page on my blog is my perception that Twitter is increasingly becoming a failed space for short topical debates, which often get kicked off on Twitter and usually erode after three or more people join in the discussion.</p>
]]></content:encoded>
         <media:content medium="image" url="http://2.gravatar.com/avatar/8cdc26501af3b7e571a4d23f0afbbf08?s=96&amp;amp;d=identicon&amp;amp;r=G">
            <media:title type="html">dfflanders</media:title>
         </media:content>
      </item>
      <item>
         <title>Mobile discovery: don’t retro-fit; invent!</title>
         <link>http://infteam.jiscinvolve.org/wp/2013/05/09/mobile-discovery-dont-retro-fit-invent/</link>
         <description>&lt;address&gt;The following is a version of an article that was printed in the March edition of &lt;a rel="nofollow" target="_blank" href="http://www.cilip.org.uk/publications/update-magazine/pages/default.aspx"&gt;Cilip Update&lt;/a&gt;.&lt;/address&gt;
&lt;address&gt; &lt;/address&gt;
&lt;address&gt;&lt;span style="font-style:normal;"&gt;When confronted with new technologies we often fail, early in their existence, to exploit the opportunities offered by the new medium. We retro-fit existing solutions rather than inventing new experiences.&lt;/span&gt;&lt;/address&gt;
&lt;p&gt;The Canadian philosopher of communication and media, Marshall McLuhan, famously argued:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;We see the world through a rear-view mirror. We march backwards into the future&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;In the early days of the web it was common for retailers to replicate paper brochures online, so called ‘brochureware’, missing the interactivity and format opportunities the web provides (and losing customers in the process too!). &lt;strong&gt;We continue to transpose our experiences of physical paper and books online&lt;/strong&gt;, with little or no adaptation to the opportunities for interaction and multi-media.&lt;/p&gt;
&lt;p&gt;While mobile technology has been available for decades, its current ubiquity and power (both socially and technologically) mean we find ourselves at the edge of a technological shift. As we move from a desk top to a mobile lifestyle we must be careful not to succumb to the rear-view mirror effect and replicate the desk top experience in the services and systems we design for the mobile user.&lt;/p&gt;
&lt;p&gt;We find ourselves inhabiting a very different environment to a few years ago. Where once our computing power was located in one place, it now travels with us, capturing and distracting us no matter where we find ourselves. It connects us to people, places and things in ways not previously possible.&lt;/p&gt;
&lt;p&gt;With this mobile lifestyle in mind I want to explore 4 challenges that mobile technologies present to libraries. In articulating these challenges I hope it will become increasingly clear what strategies and opportunities there are for libraries, and their services, systems and collections.&lt;/p&gt;
&lt;h3&gt;Simplicity&lt;/h3&gt;
&lt;p&gt;When you take a look at some of the best mobile experiences, whether apps or websites, they usually have one thing in common: they do one thing extremely well. Everything extraneous is stripped away to leave only the most essential and relevant information.&lt;/p&gt;
&lt;p&gt;Exemplars include &lt;a rel="nofollow" target="_blank" href="https://itunes.apple.com/us/app/rise-alarm-clock/id577221529?mt=8"&gt;Rise&lt;/a&gt;, an alarm clock app that incorporates visually simple interfaces, combined with gesture recognition and your music playlists. Or &lt;a rel="nofollow" target="_blank" href="http://www.realmacsoftware.com/clear/"&gt;Clear&lt;/a&gt;, a ‘to do’ app, with intuitive gesture controls and the use of colour to denote urgency – nothing else.&lt;/p&gt;
&lt;p&gt;Amazon’s &lt;a rel="nofollow" target="_blank" href="https://www.amazon.co.uk/gp/feature.html?ie=UTF8&amp;amp;docId=1000644603"&gt;stripped down app&lt;/a&gt; is a good example of a website that has adapted its presence to a mobile experience: Only the relevant information is included and all the complexity is hidden away from sight (although you can dig deeper if you wish).&lt;/p&gt;
&lt;p&gt;The Amazon example is an interesting one. It invites comparisons with the library catalogue, and it certainly provides an effective template for mobile discovery. However, libraries have a physical infrastructure, processes and technologies that mean refining the mobile experience to a single thing can be hard. When we use a phrase like ‘discovery’ in a library or information-seeking context we often mean a set of interrelated actions, such as: search, select, find and use. Is it possible to break these down into their component parts and still deliver a positive experience for the user, both in terms of the mobile experience and of using the library?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The challenge that mobile devices present to libraries in this context is one of needs over solutions.&lt;/strong&gt; The challenge is to think beyond the solutions already in place (the catalogue, the discovery layer) and to articulate the actual need. In the case of discovery, that need might be ‘I need to answer a question’ or ‘I need to find something’. Formulated in this way, it is clear that a solution may be very different from the ones already available.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;It forces us to consider the context we’re operating in; it invites us to invent, not retro-fit!&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;People and Place&lt;/h3&gt;
&lt;p&gt;Increasingly, &lt;strong&gt;the mobile device is a bridge between our online social connectivity and our localised real-world interactions&lt;/strong&gt;. If you explore a map on your phone you don’t have to tell it where you are; the internal GPS has already told it. Similarly, it can tell you when a friend is nearby through apps like Facebook, Foursquare and so on.&lt;/p&gt;
&lt;p&gt;There are a number of interesting examples where libraries and others have exploited these inherent benefits of mobile devices. Mendeley, the reference manager, is a good example of a service that is explicitly looking to build a social layer on top of the bibliographic data they have crowdsourced from the academic community in the form of bibliographies. You can follow academics with similar research interests, build groups and curate and build your own, personalised discovery network.&lt;br /&gt;
Increasingly, the discovery experience unfolds and is led by the content itself. What used to be the destination, the content or resource, is now the beginning of the journey.&lt;/p&gt;
&lt;p&gt;For example, projects like &lt;a rel="nofollow" target="_blank" href="http://bombsight.org/#15/51.5050/-0.0900"&gt;Bomb Sight&lt;/a&gt;, from the National Archives, have taken bomb site map data and made it available as a responsive website so that academics, researchers and members of the public can explore where bombs fell. This data is overlaid on a map and includes images, descriptions and people’s memories.&lt;/p&gt;
&lt;div id="attachment_1776" class="wp-caption aligncenter" style="width:310px;"&gt;&lt;a rel="nofollow" target="_blank" href="http://infteam.jiscinvolve.org/wp/files/2013/05/bombsite.jpg"&gt;&lt;img class="size-medium wp-image-1776" src="http://infteam.jiscinvolve.org/wp/files/2013/05/bombsite-300x146.jpg" alt="" width="300" height="146"/&gt;&lt;/a&gt;&lt;p class="wp-caption-text"&gt;Bomb Sight App&lt;/p&gt;&lt;/div&gt;
&lt;p&gt;Similarly, the &lt;a rel="nofollow" target="_blank" href="http://jiscphonebooth.wordpress.com/"&gt;PhoneBooth &lt;/a&gt;project from the London School of Economics mobilised the Charles Booth poverty maps of London so that students and researchers could use and annotate the maps in context, i.e., on the streets of London as part of their learning experience.&lt;/p&gt;
&lt;div id="attachment_1779" class="wp-caption aligncenter" style="width:209px;"&gt;&lt;a rel="nofollow" target="_blank" href="http://infteam.jiscinvolve.org/wp/files/2013/05/phonebooth1.png"&gt;&lt;img class="size-medium wp-image-1779" src="http://infteam.jiscinvolve.org/wp/files/2013/05/phonebooth1-199x300.png" alt="" width="199" height="300"/&gt;&lt;/a&gt;&lt;p class="wp-caption-text"&gt;PhoneBooth app&lt;/p&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Increasingly the discovery process will find itself facilitating peer-to-peer and social recommendation experiences&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The traditional catalogue will itself begin to disappear from these interactions. Instead, the discovery experience will have an intimacy and personalisation associated with it that mirrors the intimately personal experience of the mobile device itself.&lt;/p&gt;
&lt;h3&gt;Personal&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;The web provides unparalleled opportunities for scale&lt;/strong&gt;. The local bric-a-brac shop becomes eBay, the bookshop Amazon, the University becomes the massive open online course (MOOC) such as Coursera. Similarly the library begins to operate at ‘web-scale’ with its systems and services.&lt;/p&gt;
&lt;p&gt;Yet, the mobile experience is an intimately personal one. It challenges libraries and information providers to find a balance between these two types of scale: the singular (the personal) and the ‘web-scale’. It is not enough simply to adopt web-scale systems and services: mobile challenges us to think about how that web-based interaction is transformed into real-world action.&lt;/p&gt;
&lt;p&gt;One opportunity for libraries is in the data that circulates through their systems, both the management data and the user-generated interaction data. There are an increasing number of services and projects looking at exploiting this data for the personalisation of the user experience. These include commercial offerings, of which the best known is bX from Ex Libris.&lt;/p&gt;
&lt;p&gt;There are also a number of academic library projects working with this data, including &lt;a rel="nofollow" target="_blank" href="http://salt11.wordpress.com/"&gt;SALT&lt;/a&gt; (Surfacing the Academic Long Tail) and &lt;a rel="nofollow" target="_blank" href="http://www.open.ac.uk/blogs/RISE/"&gt;RISE&lt;/a&gt; (Recommendations Improve the Search Experience), which are exploring how different sets of data can be used to enhance and personalise the library experience.&lt;/p&gt;
&lt;p&gt;The ability of libraries to exploit this data will grow increasingly important. The data provides a way for libraries to continue delivering services to hundreds and thousands of users, while providing a personalised experience that users expect from web-based services.&lt;/p&gt;
&lt;h3&gt;New models&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;If the mobile shift challenges libraries to invent new experiences, it also invites us to rethink how we develop and implement these.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As information becomes abundant and digital, the models for how libraries develop and implement new services and systems will radically change too. Libraries are no longer comparing themselves and their services to other libraries; instead they are being compared to the web, and the types of services and resources users can access there. Increasingly libraries will find themselves needing to adopt approaches that would normally be more associated with web start-ups.&lt;/p&gt;
&lt;p&gt;This implies a greater focus on ideas (ideas from everywhere: librarians, users et al), rapid iteration and testing, and implementation of the idea (or quick relegation of ideas). This more entrepreneurial approach recognises that there is no simple crossing between how things are now and the future. &lt;strong&gt;There is not a simple roadmap from the complexities of the information environment as they are now, to some stable future; disruption is a feature, not a bug of the system.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;While the change in a library’s approach to the user and the work it undertakes is significant, and not easy, there are some straightforward starting points. There are already great examples and case studies of mobile innovation in libraries. The &lt;a rel="nofollow" target="_blank" href="http://mlibraries.jiscinvolve.org/wp/"&gt;M-Libraries community support blog&lt;/a&gt;, for example, includes a large amount of information, including case-studies, best practice guides and inspiration from other organisations on how they have transformed services with mobile technology.&lt;/p&gt;
&lt;p&gt;Indeed, as many of the examples on the M-Libraries blog demonstrate, the financial overhead for this type of change should be low. Rethinking your approach to design of mobile services shouldn&amp;#8217;t involve significant barriers, either financial or technical. A good place to start is by borrowing ideas from other domains, like software development and design. &lt;a rel="nofollow" target="_blank" href="http://hedtek.com/2012/paper-prototyping-mlibrary-2012-presentation/"&gt;Paper-prototyping&lt;/a&gt;, used in a recent mobile development workshop, is one such idea.&lt;/p&gt;
&lt;p&gt;What many of these examples share is a renewed focus on the user. It moves us away from a focus on internal systems and processes, toward the behaviours and requirements of the user. The centre of gravity moves away from the technology and toward the user; the mobile-turn is one where the technology is overshadowed by the needs of the user.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The challenges mobile technologies present to libraries are ones drenched in paradox. The hardware (the phone, tablet, ereader) gradually fades from view, and it is the user, with their intricate behaviours and requirements, that remains the focus of our attention&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Unlike so many other technologies, mobile enables the library to rethink its services, systems and processes to ensure that it is the user that remains at their heart. This does not mean business as usual, however. But it does mean that by understanding these challenges and their implications, libraries are in a position to design and deliver mobile experiences that users will want to engage with.&lt;/p&gt;</description>
         <author>Ben Showers</author>
         <guid isPermaLink="false">http://infteam.jiscinvolve.org/wp/?p=1765</guid>
         <pubDate>Thu, 09 May 2013 12:11:22 +0000</pubDate>
      </item>
      <item>
         <title>open service development and the Expenditure Database</title>
         <link>http://feedproxy.google.com/~r/ScienceLibraryPad/~3/_LlyrtZW3Rk/open-service-development-and-the-expenditure-database.html</link>
         <description>&lt;div&gt;&lt;p id="yui_3_7_2_1_1367594642545_4011" style="margin:0in 0in 1.5pt;padding:0px;display:block;line-height:13.5pt;font-size:10pt;font-family:Helvetica, sans-serif;color:#454545;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:#ffffff;"&gt;On April 22, 2013 TBS released the &lt;a rel="nofollow" target="_blank" href="http://www.tbs-sct.gc.ca/ems-sgd/edb-bdd/index-eng.html" style="text-decoration:underline;color:purple;outline:0px;"&gt;Expenditure Database&lt;/a&gt;.  It allows one to browse and search expenditures organised in ways that are easier to understand than the "votes" system under which they are actually allocated.  It is great progress within the overall open government initiative.&lt;/p&gt;&#xD;
&lt;p id="yui_3_7_2_1_1367594642545_4019" style="margin:0in 0in 1.5pt;padding:0px;display:block;line-height:13.5pt;font-size:10pt;font-family:Helvetica, sans-serif;color:#454545;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:#ffffff;"&gt;However, from a technology, design and process perspective, I think there are some opportunities that are being missed.&lt;/p&gt;&#xD;
&lt;h2&gt;Victorian Service Delivery&lt;/h2&gt;&#xD;
&lt;p style="margin:0in 0in 1.5pt;padding:0px;display:block;line-height:13.5pt;font-size:10pt;font-family:Helvetica, sans-serif;color:#454545;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:#ffffff;"&gt;Minister Clement is fond of saying that the government's data is like grandmother's silver, hidden away.  In fact, &lt;strong&gt;our current model of delivering applications is more like hiding the entire kitchen from view&lt;/strong&gt;.  In Ottawa's Victorian homes, the kitchen is usually at the back of the house, closed off from the dining room.  The servants were supposed to work there in obscurity, with only the final result appearing with a dramatic flourish in the formal dining room.  &#xD;
&lt;/p&gt;&#xD;
&lt;p style="margin:0in 0in 1.5pt;padding:0px;display:block;line-height:13.5pt;font-size:10pt;font-family:Helvetica, sans-serif;color:#454545;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:#ffffff;"&gt;In most of those homes that wall has now been smashed down, because we found that in the post-servant age, the kitchen is the hub of activity in the house, where we socialise and cook together openly, a very human and social activity.&lt;/p&gt;&#xD;
&lt;p style="margin:0in 0in 1.5pt;padding:0px;display:block;line-height:13.5pt;font-size:10pt;font-family:Helvetica, sans-serif;color:#454545;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:#ffffff;"&gt;But in government software development, that wall still stands.  Applications are developed behind closed doors by public servants, and then suddenly appear, fully formed, on release day. This model of &lt;a rel="nofollow" target="_blank" href="http://en.wikipedia.org/wiki/Proprietary_software"&gt;closed software&lt;/a&gt; development has some very real consequences:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;There is no opportunity for issues to be caught and corrected during the development process; instead we have to wait until after launch to find design issues that are only visible after public use.&lt;/li&gt;&#xD;
&lt;li&gt;It makes application development seem mysterious and closed off from the world.  In a country where we want young people--in particular young women--to see software development as an interesting career, we've hidden all the real human aspects of development away, all the interactions, discussions, decisions, and compromises that have to be made as we assemble the software from the available ingredients in our coding kitchen.&lt;/li&gt;&#xD;
&lt;li&gt;It closes the government developer community off from the much larger national and worldwide community of developers, both reducing the network that they can tap into as well as hiding our own government excellence and expertise.&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p style="margin:0in 0in 1.5pt;padding:0px;display:block;line-height:13.5pt;font-size:10pt;font-family:Helvetica, sans-serif;color:#454545;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:#ffffff;"&gt;We need not just open data, we need the entire philosophy of open source web development: we need to develop in the open.&lt;/p&gt;&#xD;
&lt;h2&gt;Modern Government Service Design&lt;/h2&gt;&#xD;
&lt;p id="yui_3_7_2_1_1367594642545_4025" style="margin:0in 0in 1.5pt;padding:0px;display:block;line-height:13.5pt;font-size:10pt;font-family:Helvetica, sans-serif;color:#454545;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:#ffffff;"&gt;The UK Government has released a fantastic &lt;a rel="nofollow" target="_blank" href="https://www.gov.uk/service-manual" style="text-decoration:underline;color:purple;outline:0px;"&gt;Government Service Design Manual&lt;/a&gt;.  It has some explicit statements and some built-in assumptions.  The underlying concepts are to develop government services using modern software engineering processes.  This means &lt;strong&gt;being open about the code as it is in development&lt;/strong&gt; (e.g. through blogs explaining the work and github repositories making the code available) and iterating through the design, from Discovery, through Alpha, Beta, Live and (all important in the government) Retirement.&lt;/p&gt;&#xD;
&lt;p id="yui_3_7_2_1_1367594642545_4028" style="margin:0in 0in 1.5pt;padding:0px;display:block;line-height:13.5pt;font-size:10pt;font-family:Helvetica, sans-serif;color:#454545;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:#ffffff;"&gt;The UK &lt;a rel="nofollow" target="_blank" href="http://digital.cabinetoffice.gov.uk/" id="yui_3_7_2_1_1367594642545_4026" style="text-decoration:underline;color:purple;outline:0px;"&gt;Government Digital Service&lt;/a&gt; does its work in the open, through blogs, twitter, and github.&lt;/p&gt;&#xD;
&lt;p style="margin:0in 0in 1.5pt;padding:0px;display:block;line-height:13.5pt;font-size:10pt;font-family:Helvetica, sans-serif;color:#454545;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:#ffffff;"&gt;Software development is a process: the application you release at a moment in time is not an end in itself; what's important is that the application tells a story about itself, so that it can be improved and can inspire further work.&lt;/p&gt;&#xD;
&lt;h2&gt;The Expenditure Database&lt;/h2&gt;&#xD;
&lt;p style="margin:0in 0in 1.5pt;padding:0px;display:block;line-height:13.5pt;font-size:10pt;font-family:Helvetica, sans-serif;color:#454545;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:#ffffff;"&gt;The Expenditure Database is a great step in providing easy citizen access to the underlying data.  But it doesn't tell any of the story of how it works, how the data is processed, and who did the work.  The meal has been delivered and it looks very nice, but the kitchen and the process of making it are still closed off from view, hidden in mystery.&lt;/p&gt;&#xD;
&lt;h2&gt;Code on the Web&lt;/h2&gt;&#xD;
&lt;p style="margin:0in 0in 1.5pt;padding:0px;display:block;line-height:13.5pt;font-size:10pt;font-family:Helvetica, sans-serif;color:#454545;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:#ffffff;"&gt;Of&#xD;
 course this is not quite as simple a story because it is a web &#xD;
application, so in theory you could look through all of the JavaScript.  But that means reverse-engineering the code, trying to figure out how it works without having access to documentation (the recipe) and the developers (the cooks).  When someone is trying to understand an application the first thing they will do is read its story about how it works and how it was put together, and then ask the developers when there are aspects not clear or not covered in the documentation.  Without open code development, this avenue of understanding is closed.&lt;/p&gt;&#xD;
&lt;p style="margin:0in 0in 1.5pt;padding:0px;display:block;line-height:13.5pt;font-size:10pt;font-family:Helvetica, sans-serif;color:#454545;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:#ffffff;"&gt;Instead one has to dig through the code and make guesses.  For example, it appears upon examination to be drawing the data from a local &#xD;
file: &lt;a rel="nofollow" target="_blank" href="http://www.tbs-sct.gc.ca/ems-sgd/edb-bdd/data.js" style="text-decoration:underline;color:purple;outline:0px;"&gt;http://www.tbs-sct.gc.ca/ems-sgd/edb-bdd/data.js&lt;/a&gt;&lt;/p&gt;&#xD;
&lt;p style="margin:0in 0in 1.5pt;padding:0px;display:block;line-height:13.5pt;font-size:10pt;font-family:Helvetica, sans-serif;color:#454545;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:#ffffff;"&gt;If&#xD;
 instead the code were in a shared repository and documented, it would be much &#xD;
easier to understand how it works and the rights to reuse and modify &#xD;
would be a lot clearer.  One would also expect both &#xD;
examples of using the data file and an explanation of how it was &#xD;
generated.&lt;/p&gt;&#xD;
&lt;h2&gt;Expenditure Database Development&lt;/h2&gt;&#xD;
&lt;p id="yui_3_7_2_1_1367594642545_4031" style="margin:0in 0in 1.5pt;padding:0px;display:block;line-height:13.5pt;font-size:10pt;font-family:Helvetica, sans-serif;color:#454545;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:#ffffff;"&gt;What Expenditure Database development would look like if it followed this model:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;In advance of coding, in the Discovery phase there would be an internal call for requirements and expertise, e.g. on GCpedia (the Government of Canada internal wiki)&lt;/li&gt;&#xD;
&lt;li&gt;An alpha using open source code would be quickly released&lt;/li&gt;&#xD;
&lt;li&gt;The alpha would use public budget open data from data.gc.ca&lt;/li&gt;&#xD;
&lt;li&gt;If the data wasn't available, it would have to be released (the rule is, internal developers shouldn't have any special or privileged access to information)&lt;/li&gt;&#xD;
&lt;li&gt;A beta would be released to the public incorporating feedback and code changes&lt;/li&gt;&#xD;
&lt;li&gt;At the beta phase at the latest, code would be shared on github (a public code repository and developer community) and there would be public explanations and documentation of the work going on.&lt;/li&gt;&#xD;
&lt;li&gt;The developers would be using their own accounts; it would be clear whom to contact for questions about the software.&lt;/li&gt;&#xD;
&lt;li&gt;Once the system goes live, there would be an ongoing process to incorporate improvements to the code, improve the documentation, and respond to internal and external questions from other developers and users.&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;h2&gt;What We're Missing&lt;/h2&gt;&#xD;
&lt;p style="margin:0in 0in 1.5pt;padding:0px;display:block;line-height:13.5pt;font-size:10pt;font-family:Helvetica, sans-serif;color:#454545;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:#ffffff;"&gt;Just a few examples of what we're missing as a consequence of the current implementation:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;There are no permanent URLs to departments or searches.  That means if I want to point you at the expenditures for the National Research Council, I can't just give you a URL.  I have to say "go to &lt;a rel="nofollow" target="_blank" href="http://www.tbs-sct.gc.ca/ems-sgd/edb-bdd/index-eng.html" style="text-decoration:underline;color:purple;outline:0px;"&gt;http://www.tbs-sct.gc.ca/ems-sgd/edb-bdd/index-eng.html&lt;/a&gt; and search or scroll to National Research Council and then click on it and then scroll down and click on the details for various parts of it".  A good standards-based design and testing process should have revealed this issue.&lt;/li&gt;&#xD;
&lt;li&gt;The Expenditure Database "stands alone" on a difficult-to-remember GC standard URL.  You have to know to navigate to it, or get the link from somewhere.  Why not tbs-sct.gc.ca/expenditures as a short link?  There is also no link from the database to data.gc.ca nor is there a link from data.gc.ca to the database.&#xD;
&lt;/li&gt;&#xD;
&lt;li&gt;Without documentation about how the data were created, it's not clear if it's using budget data from data.gc.ca or whether it is using a custom database, and how the numbers are being generated; without access to the code there is a lost opportunity for other developers to learn how to parse, manage and display the budget data.&lt;/li&gt;&#xD;
&lt;li&gt;Without the code, it's not possible for developers to either make their own websites using a similar approach, or to improve the Expenditure Database code itself.&lt;/li&gt;&#xD;
&lt;li&gt;Without pulling the data directly from data.gc.ca there is no opportunity to explore all of the challenges that external developers need to solve, such as how to deal with versioning (what if a number has to be changed in the released data? how to tell if it has been changed?  how to see the previous version?) and performance (should there be an API instead of a download?  should the download be cached locally?  for how long?) and updates (how to tell when a new budget year has been released?)&lt;/li&gt;&#xD;
&lt;li&gt;Without the code shared and the work publicly described, people are left to speculate as to how the results are generated.&lt;/li&gt;&#xD;
&lt;li&gt;Without the code shared and the work publicly described, there is no entry point for external developers; there is no community-building for external developers.&lt;/li&gt;&#xD;
&lt;li&gt;There's no contact point.  The site gives no idea who developed the Expenditure Database, how it works, or who made the design decisions.  The code and its development are made mysterious, the work of the kitchen hidden, rather than being demystified.&lt;/li&gt;&#xD;
&lt;li&gt;There's no way to download the data displayed; you can only take a screenshot.&lt;/li&gt;&#xD;
&lt;li&gt;There is no standard web navigation (e.g. the back button doesn't work)&lt;/li&gt;&#xD;
&lt;li&gt;There's no way to compare departments &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;h2&gt;Additional information&lt;/h2&gt;&#xD;
&lt;p&gt;TBS PR - &lt;a rel="nofollow" target="_blank" href="http://www.tbs-sct.gc.ca/media/nr-cp/2013/0422-eng.asp" style="text-decoration:underline;color:purple;outline:0px;"&gt;Searchable Information on Government Spending is now Just a Click Away&lt;/a&gt;&lt;/p&gt;&#xD;
&lt;h2&gt;Call for Feedback&lt;/h2&gt;&#xD;
&lt;p&gt;I welcome your feedback on this post, with any clarifications, corrections, suggestions or pointers.&lt;/p&gt;&#xD;
&lt;p&gt;Thanks to &lt;a rel="nofollow" target="_blank" href="http://ea.typepad.com/"&gt;Stephen Anthony&lt;/a&gt;, Adam Frankowski, Patrice Collin and &lt;a rel="nofollow" target="_blank" href="http://www.cpsrenewal.ca/"&gt;Nick Charney&lt;/a&gt; for providing ideas and feedback that improved this post.&lt;/p&gt;&lt;/div&gt;</description>
         <author>Richard Akerman</author>
         <guid isPermaLink="false">tag:typepad.com,2003:post-6a00d8341c8a6453ef019101c3944f970c</guid>
         <pubDate>Fri, 03 May 2013 13:10:57 +0000</pubDate>
      <enclosure url="http://www.tbs-sct.gc.ca/ems-sgd/edb-bdd/data.js" length="570739" type="application/x-javascript" /><media:content url="http://www.tbs-sct.gc.ca/ems-sgd/edb-bdd/data.js" fileSize="570739" type="application/x-javascript" /><itunes:subtitle>On April 22, 2013 TBS released the Expenditure Database.  It allows one to browse and search expenditures organised in ways that are easier to understand than the "votes" system under which they are actually allocated.  It is great progress within the ove</itunes:subtitle><itunes:author>Richard Akerman</itunes:author><itunes:summary>On April 22, 2013 TBS released the Expenditure Database.  It allows one to browse and search expenditures organised in ways that are easier to understand than the "votes" system under which they are actually allocated.  It is great progress within the overall open government initiative. However, from a technology, design and process perspective, I think there are some opportunities that are being missed. Victorian Service Delivery Minister Clement is fond of saying that the government's data is like grandmother's silver, hidden away.  In fact, our current model of delivering applications is more like hiding the entire kitchen from view.  In Ottawa's Victorian homes, the kitchen is usually at the back of the house, closed off from the dining room.  The servants were supposed to work there in obscurity, with only the final result appearing with a dramatic flourish in the formal dining room.  In most of those homes that wall has now been smashed down, because we found that in the post-servant age, the kitchen is the hub of activity in the house, where we socialise and cook together openly, a very human and social activity. But in government software development, that wall still stands.  Applications are developed behind closed doors by public servants, and then suddenly appear, fully formed, on release day. 
This model of closed software development has some very real consequences: There is no opportunity for issues to be caught and corrected during the development process; instead we have to wait post-launch to find design issues that are only visible after public use It makes application development seem mysterious and closed off from the world.  In a country where we want young people--in particular young women--to see software development as an interesting career, we've hidden all the real human aspects of development away, all the interactions, discussions, decisions, and compromises that have to be made as we assemble the software from the available ingredients in our coding kitchen. It closes the government developer community off from the much larger national and worldwide community of developers, both reducing the network that they can tap into as well as hiding our own government excellence and expertise. We need not just open data, we need the entire philosophy of open source web development: we need to develop in the open. Modern Government Service Design The UK Government has released a fantastic Government Service Design Manual.  It has some explicit statements and some built-in assumptions.  The underlying concepts are to develop government services using modern software engineering processes.  This means being open about the code as it is in development (e.g. through blogs explaining the work and github repositories making the code available) and iterating through the design, from Discovery, through Alpha, Beta, Live and (all important in the government) Retirement. The UK Government Digital Service does its work in the open, through blogs, twitter, and github. Software development is a process: the application you release at a moment in time is not an end, what's important is that the application should tell a story about itself, so that it can be improved and so that it can be an inspiration for further work. 
The Expenditure Database The Expenditure Database is a great step in providing easy citizen access to the underlying data.  But it doesn't tell any of the story of how it works, how the data is processed, and who did the work.  The meal has been delivered and it looks very nice, but the kitchen and the process of making it are still closed off from view, hidden in mystery. Code on the Web Of course this is not quite as simple a story because it is a web application, so in theory you could look through all of the JavaScript.  But that means reverse-engineering the code, trying to figure out how it works without having access to documentation (the recipe) and the developers (the cooks).  When someone is trying</itunes:summary></item>
      <item>
         <title>Shared Library Systems and Services, Part 1</title>
         <link>http://infteam.jiscinvolve.org/wp/2013/04/22/shared-library-systems-and-services-part-1/</link>
         <description>&lt;p dir="ltr"&gt;As part of the &lt;a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk/whatwedo/programmes/di_informationandlibraries/emergingopportunities/librarysystems.aspx"&gt;Library Systems Programme&lt;/a&gt;, two reports have been published exploring the potential for shared library systems across Universities in both Scotland and Wales.&lt;/p&gt;
&lt;p&gt;In this first of two posts I want to briefly &lt;strong&gt;introduce the two recently published reports and their main findings and recommendations&lt;/strong&gt;. In the second post I will highlight some of the other developments in the shared library systems landscape, along with some of their implications.&lt;/p&gt;
&lt;h3&gt;A Shared LMS for Wales (WHELF)&lt;/h3&gt;
&lt;p&gt;The &lt;a rel="nofollow" target="_blank" href="http://blogs.cf.ac.uk/sharedlms/resource/JISC-Shared-LMS-Report.pdf"&gt;Welsh Shared Service Library Management System Feasibility Report&lt;/a&gt; focussed, in broad terms, on &lt;strong&gt;the most prevalent and practical issues for a shared all-Wales HE library management system&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p dir="ltr"&gt;A set of &lt;strong&gt;high-level agreed consortium requirements for a shared LMS&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p dir="ltr"&gt;A &lt;strong&gt;proposed governance model for the consortium&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p dir="ltr"&gt;High-level recommendations on &lt;strong&gt;integration requirements for local systems&lt;/strong&gt;, with a mapping of the communications standards applicable to the project against the standards in use by suppliers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p dir="ltr"&gt;A &lt;strong&gt;business case for a Wales-wide consortium LMS&lt;/strong&gt;, including cost matrices for the different approaches presented.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p dir="ltr"&gt;Recommendations on &lt;strong&gt;the most cost-effective approach for software, hosting and ongoing management&lt;/strong&gt; of the LMS.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p dir="ltr"&gt;The report makes the following &lt;strong&gt;recommendations&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;The Project recommended &lt;strong&gt;setting up an All-Wales Consortium with formal governance&lt;/strong&gt;. This requires the consortium to formally agree which processes, working practices and configurations will be adhered to by all members as a whole.&lt;/p&gt;
&lt;p dir="ltr"&gt;A &lt;strong&gt;cloud solution hosted by a vendor (or open source vendor) is the preferred option&lt;/strong&gt;, because this will provide the most cost-effective resilient solution.&lt;/p&gt;
&lt;p dir="ltr"&gt;Further work will be required to &lt;strong&gt;develop a clear statement on the vision for shared LMS services in Wales&lt;/strong&gt;, ensuring clarity of purpose and providing a compelling statement of intent for senior stakeholders and staff to achieve buy-in to the strategic direction proposed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Next steps&amp;#8230;&lt;/strong&gt;&lt;/p&gt;
&lt;p dir="ltr"&gt;The report suggests a &lt;strong&gt;phased approach to implementation, anticipating that the first implementations will be no sooner than Summer 2014.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The report also suggests that a &lt;strong&gt;task-and-finish group should be convened to quickly put together a high-level plan, costs and cost allocation&lt;/strong&gt; (i.e. funding) for the establishment of a project team.&lt;/p&gt;
&lt;h3&gt;The Benefits of Sharing (SCURL)&lt;/h3&gt;
&lt;p&gt;The &lt;a rel="nofollow" target="_blank" href="http://libraryblogs.is.ed.ac.uk/benefitsofsharing/"&gt;Benefits of Sharing project&lt;/a&gt; has also just released a &lt;a rel="nofollow" target="_blank" href="http://libraryblogs.is.ed.ac.uk/benefitsofsharing/files/2013/04/The-Benefits-Of-Sharing-Summary-Report.pdf"&gt;summary report&lt;/a&gt; of its work exploring a simple question:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;strong&gt;How would a shared library management system improve services in Scotland?&lt;/strong&gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;While the question is simple, the answer is a little more complex. Indeed, the project began looking at the question with an initial &lt;a rel="nofollow" target="_blank" href="http://libraryblogs.is.ed.ac.uk/benefitsofsharing/the-lms-day/"&gt;workshop and subsequent report&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It then broke the problem into 3 parts:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p dir="ltr"&gt;&lt;a rel="nofollow" target="_blank" href="http://libraryblogs.is.ed.ac.uk/benefitsofsharing/users/"&gt;Users&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p dir="ltr"&gt;&lt;a rel="nofollow" target="_blank" href="http://libraryblogs.is.ed.ac.uk/benefitsofsharing/systems/"&gt;Systems&lt;/a&gt;, and&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p dir="ltr"&gt;&lt;a rel="nofollow" target="_blank" href="http://libraryblogs.is.ed.ac.uk/benefitsofsharing/content/"&gt;Content&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The project also published a summary report which concludes with a number of recommendations, including the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p dir="ltr"&gt;&lt;strong&gt;From a systems perspective, sharing technical infrastructure and support structures would offer benefits of economies of scale, with more efficient use of staffing and greater expertise than any single library could offer. System options such as Open Source (OS) alternatives to ‘off the shelf’ commercial products could, therefore, become viable. It is recommended that at the tender and procurement phases of a shared LMS, all options, including OS systems, are reviewed and assessed.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr /&gt;
&lt;p&gt;Both reports make very interesting reading &amp;#8211; and also tell us a lot about the current library systems landscape. In particular &lt;strong&gt;there is renewed interest in the potential for sharing and collaborating around services and systems between libraries and institutions&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;There is also &lt;strong&gt;a clear recognition that open source solutions are viable options for the community&lt;/strong&gt;, and may represent a feature of this new library landscape.&lt;/p&gt;
&lt;p&gt;In the second post on shared library services and systems I&amp;#8217;ll explore some of the other developments within this landscape, and the implications they have for institutions, libraries and systems vendors.&lt;/p&gt;
</description>
         <author>Ben Showers</author>
         <guid isPermaLink="false">http://infteam.jiscinvolve.org/wp/?p=1758</guid>
         <pubDate>Mon, 22 Apr 2013 10:24:35 +0000</pubDate>
      <enclosure url="http://blogs.cf.ac.uk/sharedlms/resource/JISC-Shared-LMS-Report.pdf" length="896960" type="application/pdf" /><media:content url="http://blogs.cf.ac.uk/sharedlms/resource/JISC-Shared-LMS-Report.pdf" fileSize="896960" type="application/pdf" /><itunes:subtitle>As part of the Library Systems Programme, two reports have been published exploring the potential for shared library systems across Universities in both Scotland and Wales. In the first of two posts I wanted to briefly introduce you to the two recently pu</itunes:subtitle><itunes:author>Ben Showers</itunes:author><itunes:summary>As part of the Library Systems Programme, two reports have been published exploring the potential for shared library systems across Universities in both Scotland and Wales. In the first of two posts I wanted to briefly introduce you to the two recently published reports, and their main findings/recommendations. In the second post I want to highlight some of the other developments on the shared library systems landscape, and highlight some of the implications. A Shared LMS for Wales (WHELF) The  Welsh Shared Service Library Management System Feasibility Report focussed on the most prevalent and practical issues for a shared all Wales HE library management system in broad terms: A set of high-level agreed consortium requirements for a shared LMS. A proposed governance model for the consortium. High level recommendations on integration requirements for local systems; map communications standards which are applicable to the project against standards in use by suppliers. A business case for a Wales-wide consortium LMS, including cost matrices for the different approaches presented. Recommendations on the most cost-effective approach for software, hosting and ongoing management of the LMS. The report makes the following recommendations: The Project recommended setting up an All-Wales Consortium with formal governance. 
This requires the consortium to formally agree which processes, working practices and configurations will be adhered to by all members as a whole. A cloud solution hosted by a vendor (or open source vendor) is the preferred option, because this will provide the most cost-effective resilient solution. Further work will be required to develop a clear statement on the vision for shared LMS services in Wales, ensuring clarity of purpose and providing a compelling statement of intent for senior stakeholders and staff to achieve buy-in to the strategic direction proposed. Next steps&amp;#8230; The report suggests a phased approach to implementation; anticipating that the first implementations will be no sooner than Summer 2014. The report also suggests a task and finish group should be convened to quickly put together a high level plan, costs and cost allocation (i.e. funding) for the establishment of a project team. The Benefits of Sharing (SCURL) The Benefits of Sharing project has also just released a summary report of its work exploring a simple question: How would a shared library management system improve services in Scotland? While the question is simple, the answer is a little more complex. Indeed, the project began looking at the question with an initial workshop and subsequent report. It then broke the problem into 3 parts: Users Systems, and; Content The project also published a summary report which concludes with a number of recommendations, including the following: From a systems perspective, sharing technical infrastructure and support structures would offer benefits of economies of scale, with more efficient use of staffing and greater expertise than any single library could offer. System options such as Open Source (OS) alternatives to ‘off the shelf’ commercial products could, therefore, become viable. It is recommended that at the tender and procurement phases of a shared LMS, all options, including OS systems, are reviewed and assessed. 
&amp;#8212;&amp;#8212;&amp;#8212;&amp;#8212;&amp;#8212;&amp;#8212;&amp;#8212;&amp;#8212;&amp;#8212;&amp;#8212;&amp;#8212;&amp;#8212;- Both reports make very interesting reading &amp;#8211; and also tell us a lot about the current library systems landscape. In particular there is a renewed vigour in the potential for sharing and collaborating around services and systems between libraries and institutions. There is also a clear recognition that open source solutions are viable options for the community, and may represent a feature of this new library landscape. In the second post on shared library services and systems I&amp;#8217;ll explore some of the other developments within this landscape, and the implications they have for instit</itunes:summary></item>
      <item>
         <title>How you can help us to make research administration more efficient</title>
         <link>http://infteam.jiscinvolve.org/wp/2013/04/18/how-you-can-help-us-to-make-research-administration-more-efficient/</link>
         <description>&lt;p&gt;Jisc and &lt;a rel="nofollow" target="_blank" href="http://casrai.org/"&gt;CASRAI&lt;/a&gt; are piloting the development of a &amp;#8216;UK chapter&amp;#8217; of the CASRAI dictionary to improve research interoperability.&lt;br /&gt;
Get in touch by emailing &lt;a rel="nofollow" target="_blank" href="mailto:info@casrai.org"&gt;info@casrai.org&lt;/a&gt; if you&amp;#8217;d like to keep up to date with progress and to contribute your views. More information is below.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The problem we are addressing&lt;/strong&gt;&lt;br /&gt;
Research teams and administrators must retype the same information repeatedly when applying for grants and reporting to funders. Research policy-makers, managers and evaluators are consistently frustrated by an inability to draw meaningful conclusions from a growing mass of disconnected data. The problem and a way to reduce this administrative duplication for the research community are nicely illustrated in the CASRAI video:&lt;/p&gt;
&lt;p&gt;&lt;/p&gt; 
&lt;p&gt;&lt;strong&gt;The Solution&lt;/strong&gt;&lt;br /&gt;
The solution suggested by &lt;a rel="nofollow" target="_blank" href="http://casrai.org/"&gt;CASRAI&lt;/a&gt; is to compile a common, international dictionary. The dictionary contains definitions of key terms or information elements relating to the management of, for example, research grants, CVs or data management plans. It also documents the controlled vocabularies, authoritative lists and identifiers that are relevant to these terms. The dictionary thereby provides the basis for data profiles that ease the exchange of information within and between organisations. As a single, open and unambiguous reference source for data profiles, the CASRAI dictionary can be used by multiple technology suppliers, including those using CERIF or VIVO, thereby forming a basis for interoperability and allowing information to be exchanged smoothly.&lt;/p&gt;
&lt;p&gt;The ‘CASRAI approach’ to developing this dictionary and building agreement around key terms is generating more and more interest in the UK research community and we are excited about trialling it here in the UK.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CASRAI-UK&lt;/strong&gt;&lt;br /&gt;
CASRAI and Jisc are piloting three &lt;em&gt;National Working Groups&lt;/em&gt; (NWGs) around a number of priority areas identified at the &lt;a rel="nofollow" target="_blank" href="http://isc.ukoln.ac.uk/2012/12/14/casrai-uk-chapter/"&gt;CASRAI-UK summit&lt;/a&gt; organised by Jisc and CASRAI last December. These pilot projects are exploring both the methods and the particular content that is the focus of this work.&lt;/p&gt;
&lt;p&gt;The people on the working groups (i.e. funders, research managers, standards experts) will identify and document agreements on vocabularies. While these agreements will build on and have defined relationships with an international core, they will reflect UK requirements. A CASRAI analyst will help to develop ‘data profiles’, each defined as a harmonized standard that specifies the subset of information required by the users of an inter-organisational work process.&lt;/p&gt;
&lt;p&gt;This approach can prevent us from reinventing wheels and offer a sustainable home for these agreements and also, for example, for the outputs of the Jisc &lt;a rel="nofollow" target="_blank" href="http://ukriss.cerch.kcl.ac.uk/"&gt;UK Research Information Shared Service&lt;/a&gt; project.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Three pilot National Working Groups will focus on&lt;/strong&gt;&lt;br /&gt;
1. Data Management Plans&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Developing an initial data profile reflecting the current version of the &lt;a rel="nofollow" target="_blank" href="http://www.dcc.ac.uk/dmponline"&gt;Digital Curation Centre’s&lt;/a&gt; “DMP-Online” tool.&lt;/li&gt;
&lt;li&gt;Concurrently, developing a potential further version of the data profile derived from harmonization and discussions among the UK Research Councils.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;2. Organisational Lists&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Assessment and recommendation on the suitability of the planned FundRef list for adoption as a standard authoritative list for international funding organizations.&lt;/li&gt;
&lt;li&gt;Exploring possible sources of authoritative lists of organisations involved in UK research, including research performing organisations, charities, industry, etc.&lt;/li&gt;
&lt;li&gt;Developing a sustainable process for maintaining authoritative lists of organisations in the CASRAI dictionary.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;3. Research Reporting&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data profile supporting institutional reporting to UK funders for the new policy on Open Access, and for research contributions / outputs more generally.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While other areas identified at the CASRAI-UK Summit &amp;#8211; Ethics Review and Research Equipment profiles &amp;#8211; are also important, we think that these need more discussion before we can convene and set the scope for working groups. For now, these discussions will be continued in a special online forum for each of these topics.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How it works&lt;/strong&gt; &lt;strong&gt;and how you can get involved&lt;/strong&gt;&lt;br /&gt;
Due to the pilot status of the working groups, the CASRAI governance arrangements and membership model will not be applied to its full extent this year. An objective of the pilot is to develop &amp;#8211; together with the people participating in the working groups &amp;#8211; a mechanism that works appropriately for the UK.&lt;/p&gt;
&lt;p&gt;At the same time as we are starting the working groups we are also convening the &lt;em&gt;CASRAI-UK National Review Circle.&lt;/em&gt; This group includes a wider group of people that are interested in the progress of CASRAI-UK and is open for anyone interested to join.&lt;/p&gt;
&lt;p&gt;The &lt;em&gt;National Working Groups (NWGs)&lt;/em&gt; will, in the course of their work, produce drafts, announcements and other outputs. These will be posted to a dedicated forum for the &lt;em&gt;National Review Circle&lt;/em&gt;. This wider group can keep up-to-date on progress but also contribute advice and feedback to the NWGs as they evolve new national standards for the UK research community.&lt;br /&gt;
The purpose of the &lt;em&gt;National Review Circle&lt;/em&gt; is to ensure that the resulting standards are applicable and multi-disciplinary and that valid divergent views are communicated.&lt;/p&gt;
&lt;p&gt;Please email &lt;a rel="nofollow" target="_blank" href="mailto:info@casrai.org"&gt;info@casrai.org&lt;/a&gt; if you are interested in joining the National Review Circle or the discussion forum on Ethics Review or Research Equipment profiles.&lt;/p&gt;</description>
         <author>verena13</author>
         <guid isPermaLink="false">http://infteam.jiscinvolve.org/wp/?p=1732</guid>
         <pubDate>Thu, 18 Apr 2013 15:12:54 +0000</pubDate>
      </item>
      <item>
         <title>Data storytelling will get more citations for researchers?</title>
         <link>http://feedproxy.google.com/~r/DFFlanders/~3/KuN2G0xiG-Q/</link>
         <description>I spent the weekend creating the following infographic (a version of which is soon to appear in JISC&amp;#8217;s #mobile1st report). What struck me as interesting during this process was the fact that I was working with very reputable data, and yet my methods for visualizing this data were very focused on the storytelling aspects of this chart [&amp;#8230;]</description>
         <guid isPermaLink="false">http://dfflanders.wordpress.com/?p=712</guid>
         <pubDate>Tue, 02 Apr 2013 23:43:03 +0000</pubDate>
         <content:encoded><![CDATA[<p>I spent the weekend creating the following infographic (a version of which is soon to appear in JISC&#8217;s #mobile1st report). What struck me as interesting during this process was the fact that I was working with very reputable data, and yet my methods for visualizing this data were very focused on the storytelling aspects of this chart (see below for the various drafts I iterated through to get to this version).</p>
<div class="wp-caption aligncenter" style="width:998px;"><img alt="" src="https://docs.google.com/drawings/d/1fiBDA9bZjaWJDMJlzzKiuXL3TDNSQXxgdDV4HrUc6cw/pub?w=988&amp;h=539" width="988" height="538"/><p class="wp-caption-text">Increase of international student tuition fees (vs national student tuition) in the UK per year and student enrollment.</p></div>
<p>What strikes me as a bit awkward in the coming scholarly revolution is how easy it is to manipulate data visually. Will we not see researchers employing similar data storytelling skills (in the form of infographics like the above &#8216;bubbles&#8217; atop a bar chart) for the purpose of getting more people to click, link and download their paper based on what they see on the first page (abstract + image)?</p>
<p>NB I&#8217;d highly recommend you read the JISC/OCLC report on Gen Y researcher reading habits: &#8220;Bouncing, Chunking and Squirrelling&#8221;. To grossly paraphrase this report: <span style="text-decoration:underline;"><strong>the researcher who puts a map or infographic on the first page of their research article significantly improves the chance of that research being &#8216;squirrelled&#8217; away</strong></span> by readers, which leads to more citations, <span style="text-decoration:underline;"><strong>which leads to increased researcher reputation</strong></span>.</p>
<p>My question to you is: do these types of hybrid-(info)graphs undermine the scholarly process? Are we perhaps even applying marketing techniques to the data so that it is more appealing to the reader?</p>
<p>On the other side of the coin is this article, in which journalists feel they are losing their creative storytelling abilities because of &#8216;data journalism&#8217;: <a rel="nofollow" target="_blank" href="http://techcrunch.com/2013/03/31/data-is-not-killing-creativity-its-just-changing-how-we-tell-stories/">http://techcrunch.com/2013/03/31/data-is-not-killing-creativity-its-just-changing-how-we-tell-stories/</a></p>
<p>Some other attempts at various infographics of the same data as the chart at the top of this post:</p>
<p><img class="aligncenter" alt="" src="https://docs.google.com/spreadsheet/oimg?key=0AiEMyUaWk5YidDNQdXQyb3pCSG9yc3hwOGxBd1FRX3c&amp;oid=3&amp;zx=v56kanfic362" width="456" height="299"/></p>
<p>This one (as opposed to the one at the top of this post) doesn&#8217;t use logarithmic data but does place the trends in proportion to one another&#8230;</p>
<p><img class="aligncenter" alt="" src="https://docs.google.com/spreadsheet/oimg?key=0AiEMyUaWk5YidDNQdXQyb3pCSG9yc3hwOGxBd1FRX3c&amp;oid=4&amp;zx=eib9a5qklkd0" width="404" height="244"/></p>
]]></content:encoded>
         <media:content medium="image" url="http://2.gravatar.com/avatar/8cdc26501af3b7e571a4d23f0afbbf08?s=96&amp;amp;d=identicon&amp;amp;r=G">
            <media:title type="html">dfflanders</media:title>
         </media:content>
         <media:content medium="image" url="https://docs.google.com/drawings/d/1fiBDA9bZjaWJDMJlzzKiuXL3TDNSQXxgdDV4HrUc6cw/pub?w=988&amp;h=539" />
         <media:content medium="image" url="https://docs.google.com/spreadsheet/oimg?key=0AiEMyUaWk5YidDNQdXQyb3pCSG9yc3hwOGxBd1FRX3c&amp;oid=3&amp;zx=v56kanfic362" />
         <media:content medium="image" url="https://docs.google.com/spreadsheet/oimg?key=0AiEMyUaWk5YidDNQdXQyb3pCSG9yc3hwOGxBd1FRX3c&amp;oid=4&amp;zx=eib9a5qklkd0" />
      </item>
      <item>
         <title>When Staff and Researchers Leave Their Host Institution</title>
         <link>http://ukwebfocus.wordpress.com/2013/03/22/when-staff-and-researchers-leave-their-host-institution/</link>
         <description>What happens when staff and researchers are planning to leave their host institution? In light of the &amp;#8220;UKOLN – Looking Ahead&amp;#8221; announcement this is a subject which is currently preoccupying me and many of my colleagues. As Martin Hamilton pointed out in his post on A Tale of Two Jiscs: Reflections on CETIS13, FutureLearn and the JISC Diaspora, &amp;#8220;In many [&amp;#8230;]</description>
         <guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=13484</guid>
         <pubDate>Fri, 22 Mar 2013 08:45:50 +0000</pubDate>
         <content:encoded><![CDATA[<p>What happens when staff and researchers are planning to leave their host institution? In light of the &#8220;<a rel="nofollow" title="Permanent link to Announcement: UKOLN &#x002013; Looking&#xa0;Ahead" target="_blank" href="http://ukwebfocus.wordpress.com/2012/12/21/ukoln-looking-ahead/">UKOLN – Looking Ahead</a>&#8221; announcement this is a subject which is currently preoccupying me and many of my colleagues.</p>
<p>As Martin Hamilton pointed out in his post on <a rel="nofollow" target="_blank" href="http://blog.martinh.net/2013/03/a-tale-of-two-jiscs-reflections-on.html">A Tale of Two Jiscs: Reflections on CETIS13, FutureLearn and the JISC Diaspora</a>, &#8220;<em>In many cases, JISC was farsighted enough to forsee requirements in the research and education sector that have subsequently turned into significant businesses in themselves</em>&#8221;. But Martin then went on to describe how those benefits are about to be lost: &#8220;<em>we are entering a new era, necessitated by funding reductions, changing student demographics and frankly an unwillingness to see &#8220;R&amp;D&#8221; type activities (of which a large proportion can be expected to fail) facilitated through top sliced central funding</em>&#8221;. Many of my colleagues and I are having to respond to the scenario depicted by Martin: &#8220;<em>Behind the scenes, a lot of people who have been working for JISC on its various centres and services have been having meetings with their local HR departments about redundancy and redeployment</em>&#8221;.</p>
<p>But what should you do if you wish to continue to make use of the skills and expertise you have developed over the years but new full-time posts appear to be in short supply? I suspect the changes in Jisc will provide new consultancy opportunities, with <a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk/publications/jiscinform/2013/inform36.aspx">their current preoccupation</a> with telling <a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk/news.aspx">good news stories</a>, without addressing any of the underlying complexities or tensions, leaving a void which can be filled by those who have a more realistic understanding of the complexities of exploiting IT to support institutional requirements.</p>
<p>The preparation for a new career will mean the loss of an IT infrastructure and the accompanying support which many of us will have grown accustomed to. But who can provide help and advice in the preparation for a move away from an institutional environment? One might expect the Library to provide support, especially for institutions which have a commitment to information literacy, which <a rel="nofollow" target="_blank" href="http://www.lilacconference.com/WP/">is defined as</a> &#8220;<em>the ability to find, use, evaluate and communicate information</em>&#8221; and is &#8220;<em>an essential skill in this digital age and era of life-long learning</em>&#8221;. But as <a rel="nofollow" target="_blank" href="http://www.ukoln.ac.uk/web-focus/events/conferences/lilac-2013/">I will be describing next week</a> at the <a rel="nofollow" target="_blank" href="http://www.lilacconference.com/WP/">LILAC 2013 conference</a>, this is not necessarily the case, with the role of librarians perhaps being to promote use of institutional rather than Cloud services. But since we will all, at some point, leave our host institution, this is not really providing staff and researchers with the life-long skills needed to thrive beyond an institutional context.</p>
<p>Surely it is timely for a change in focus, especially if the gloomy predictions are correct and we continue to see reductions in staffing levels in higher education institutions?</p>
<p>I&#8217;d welcome your thoughts and comments &#8211; especially if you have experience of leaving your host institution and continuing to work, perhaps as a consultant. My slides are <a rel="nofollow" target="_blank" href="http://www.slideshare.net/lisbk/when-staff-and-researchers-leave-their-host-institution">available on Slideshare</a> and embedded below:</p>
 
<hr />
<p><a rel="nofollow" name="statistics"></a>View Twitter conversation from: [<a rel="nofollow" target="_blank" href="http://topsy.com/ukwebfocus.wordpress.com/2013/03/22/when-staff-and-researchers-leave-their-host-institution/">Topsy</a>] | View Twitter statistics from: [<a rel="nofollow" target="_blank" href="https://tweetreach.com/reports/7028585?oauth=1">TweetReach</a>] – [<a rel="nofollow" target="_blank" href="https://bitly.com/Yz6Vgd+/global">Bit.ly</a>]</p>
<br />Filed under: <a rel="nofollow" target="_blank" href='http://ukwebfocus.wordpress.com/category/general/'>General</a>]]></content:encoded>
         <geo:lat>51.379915</geo:lat>
         <geo:long>-2.331708</geo:long>
         <media:content medium="image" url="http://2.gravatar.com/avatar/27731abff266f585f006998f65c74be9?s=96&amp;amp;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96">
            <media:title type="html">Brian Kelly (UK Web Focus)</media:title>
         </media:content>
         <category>General</category>
      </item>
      <item>
         <title>Week endNotes</title>
         <link>http://infteam.jiscinvolve.org/wp/2013/03/01/week-endnotes/</link>
         <description>&lt;p style="text-align:center;"&gt;&lt;a rel="nofollow" target="_blank" href="http://infteam.jiscinvolve.org/wp/files/2013/03/BL1.jpg"&gt;&lt;img class="aligncenter size-medium wp-image-1716" src="http://infteam.jiscinvolve.org/wp/files/2013/03/BL1-300x200.jpg" alt="" width="300" height="200"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The highlight of this week&amp;#8217;s endNotes is something that happened at the end of last week (I know it&amp;#8217;s cheating, but there wasn&amp;#8217;t an endNote last week!).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The International Discovery Summit&lt;/strong&gt;, held at the British Library, saw delegates from the UK and across the world come together to discuss ways of collaborating to address the common technical, political and social challenges that are preventing us realising our grand visions for better resource discovery.&lt;/p&gt;
&lt;p&gt;All the presentations and resources from the day can be found on the &lt;a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk/events/2013/02/discovery-13.aspx"&gt;website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;On Tuesday this week Ithaka S+R and Jisc published a report into the &lt;a rel="nofollow" target="_blank" href="http://www.sr.ithaka.org/research-publications/supporting-changing-research-practices-chemists"&gt;&lt;strong&gt;Changing Research Practices of Chemists&lt;/strong&gt;&lt;/a&gt;. The report describes the findings of research into academic chemists’ research habits and research support needs. The report covers themes such as data management, research collaboration, library use, discovery, publication practices, and research funding with a number of recommendations for new support service models.&lt;/p&gt;
&lt;p&gt;This week also saw two workshops take place that explored the challenges of new technologies on libraries and teaching and learning. The University of Huddersfield hosted a &lt;strong&gt;&lt;a rel="nofollow" target="_blank" href="http://library.hud.ac.uk/blogs/projects/hike/"&gt;HIKE project workshop&lt;/a&gt; &lt;/strong&gt;that explored the impact of new library technologies (like next generation systems and services like &lt;a rel="nofollow" target="_blank" href="http://www.jisc-collections.ac.uk/knowledgebaseplus/"&gt;KB+&lt;/a&gt;) on workflows and processes.&lt;/p&gt;
&lt;p&gt;The second workshop was on the &lt;strong&gt;&lt;a rel="nofollow" target="_blank" href="http://ebookchallenge.org.uk/"&gt;challenges of eBooks&lt;/a&gt;&lt;/strong&gt; and is part of a report exploring the challenges that ebooks (and e-textbooks) pose to teaching and learning, as well as beginning to map some of the ways institutions are addressing and solving these issues. The case studies developed with institutions are particularly interesting in mapping out potential ways forward in this fluid and complex space.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Finally&amp;#8230;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;My final link this week is for a documentary that is part of the BBC&amp;#8217;s Storyville series. &lt;a rel="nofollow" target="_blank" href="http://www.bbc.co.uk/iplayer/episode/b01qxmqc/Storyville_20122013_Google_and_the_World_Brain/"&gt;Google and the World Brain&lt;/a&gt; is about the campaign by authors to put a stop to the Google Books website after Google scanned millions of books, over half of which were still in copyright. Fascinating stuff.&lt;/p&gt;</description>
         <author>Ben Showers</author>
         <guid isPermaLink="false">http://infteam.jiscinvolve.org/wp/?p=1705</guid>
         <pubDate>Fri, 01 Mar 2013 16:19:31 +0000</pubDate>
      </item>
      <item>
         <title>Naming Conventions For Institutional Repositories: Lessons from CORE</title>
         <link>http://ukwebfocus.wordpress.com/2013/02/21/naming-conventions-for-institutional-repositories-lessons-from-core/</link>
         <description>Whilst preparing a follow-up post on institutional repositories I started to explore the data which has been collected by the JISC-funded CORE project. The CORE (COnnecting REpositories) project aims to &amp;#8220;facilitate free access to scholarly publications distributed across many systems&amp;#8221;. The CORE Web site, which was developed at the Open University, provides access to [&amp;#8230;]</description>
         <guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=13367</guid>
         <pubDate>Thu, 21 Feb 2013 13:46:24 +0000</pubDate>
         <content:encoded><![CDATA[<h2>The CORE (COnnecting REpositories) Project</h2>
<p>Whilst preparing a follow-up post on institutional repositories I started to explore the data which has been collected by the JISC-funded <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/">CORE project</a>. The CORE (COnnecting REpositories) project aims to &#8220;<em>facilitate free access to scholarly publications distributed across many systems</em>&#8221;. The CORE Web site, which was developed at the Open University, provides access to four applications including:</p>
<p style="padding-left:30px;"><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics">Repository Analytics</a> - A tool that enables users to monitor the ingestion of metadata and content from repositories and provides a wide range of statistics.</p>
<p>I wanted to use this service to find information about the repositories provided by the 24 Russell Group universities. However, as can be seen from the accompanying screenshot, it was not easy to associate a repository with its host institution.</p>
<p><a rel="nofollow" target="_blank" href="http://ukwebfocus.files.wordpress.com/2013/02/core-project.png"><img class="alignright size-full wp-image-13407" style="border:1px solid black;margin-left:10px;margin-right:10px;" alt="CORE project" src="http://ukwebfocus.files.wordpress.com/2013/02/core-project.png?w=663&#038;h=173" width="663" height="173"/></a>The first four examples illustrate the difficulties I had in using the information. The first entry, for the <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/1">Aberdeen University Research Archive</a>, gives a clear indication of the host institution. The second example, <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/2">Abertay Research Collections</a>, is somewhat more obscure, unless you know that Abertay is the name of a Scottish university. However the next two examples, <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/3">Access to Research Resources for Teachers</a> and <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/4">Advanced Knowledge Technologies EPrints Archive</a>, give no clue as to the host institution.</p>
<p>This meant that browsing the list was not an effective way of finding the repositories for the Russell Group universities. In addition the search interface was misleading: a search for &#8220;Southampton&#8221; enabled me to find <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/36">eCrystals &#8211; Southampton</a> and <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/37">Electronics &amp; Computer Science EPrints Service &#8211; University of Southampton</a> - but not the main repository which has the name <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/34">e-Prints Soton</a>.</p>
<h2>Using CORE to Search for Russell Group University Repositories</h2>
<p>Despite the limitations caused by the lack of institutional identifiers I felt it would be useful to discover information held about Russell Group university repositories, based on a search of the CORE system using the obvious name for the host institution. The following table summarises the findings for a survey carried out on 21 February 2013 using the search term given in the second column.</p>
<table border="1">
<tbody>
<tr>
<td rowspan="1" colspan="1"><strong>Ref.<br />
No.</strong></td>
<td rowspan="1" colspan="1"><strong>Institution<br />
(search string)</strong></td>
<td rowspan="1" colspan="1"><strong>Repository</strong></td>
<td rowspan="1" colspan="1"><strong>Metadata<br />
Download</strong></td>
<td rowspan="1" colspan="1"><strong>Metadata<br />
Readable</strong></td>
<td rowspan="1" colspan="1"><strong>PDF<br />
Downloads</strong></td>
</tr>
<tr>
<td style="text-align:center;" rowspan="3">1</td>
<td rowspan="3">Birmingham</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/119">University of Birmingham<br />
Research Archive, E-papers Repository</a></td>
<td style="text-align:center;">    937</td>
<td style="text-align:center;">    928</td>
<td style="text-align:center;"> 103</td>
</tr>
<tr>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/120">University of Birmingham<br />
Research Archive, E-prints Repository</a></td>
<td style="text-align:center;">    828</td>
<td style="text-align:center;">    802</td>
<td style="text-align:center;">  766</td>
</tr>
<tr>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/121">University of Birmingham<br />
Research Archive, E-theses Repository</a></td>
<td style="text-align:center;">  2,559</td>
<td style="text-align:center;">  2,513</td>
<td style="text-align:center;">2,133</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="1">2</td>
<td rowspan="1">Bristol</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/11">Bristol Repository of Scholarly Eprints</a></td>
<td style="text-align:center;">   -</td>
<td style="text-align:center;">       4</td>
<td style="text-align:center;">  -</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="2">3</td>
<td rowspan="2">Cambridge</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/22">Computer Laboratory Technical Reports<br />
- Cambridge University</a></td>
<td style="text-align:center;">  3,252</td>
<td style="text-align:center;">     520</td>
<td style="text-align:center;">  440</td>
</tr>
<tr>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/27">DSpace @ Cambridge</a></td>
<td style="text-align:center;">216,718</td>
<td style="text-align:center;">192,129</td>
<td style="text-align:center;">2,847</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="1">4</td>
<td rowspan="1">Cardiff</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/83">Online Research @ Cardiff</a></td>
<td style="text-align:center;">   31,274</td>
<td style="text-align:center;">    1,647</td>
<td style="text-align:center;">1,555</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="2">5</td>
<td rowspan="2">Durham</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/32">Durham e-Theses</a></td>
<td style="text-align:center;">    4,483</td>
<td style="text-align:center;">   4,411</td>
<td style="text-align:center;">4,051</td>
</tr>
<tr>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/33">Durham Research Online</a></td>
<td style="text-align:center;">    9,062</td>
<td style="text-align:center;">   2,922</td>
<td style="text-align:center;">2,856</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="1">6</td>
<td rowspan="1">Exeter</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/45">Exeter Research and Institutional Content archive</a></td>
<td style="text-align:center;">    2,547</td>
<td style="text-align:center;">   2,334</td>
<td style="text-align:center;">     4</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="2">7</td>
<td rowspan="2">Edinburgh</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/38">Edinburgh DataShare</a></td>
<td style="text-align:center;">        75</td>
<td style="text-align:center;">      75</td>
<td style="text-align:center;">  -</td>
</tr>
<tr>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/39">Edinburgh Research Archive</a></td>
<td style="text-align:center;">    5,769</td>
<td style="text-align:center;">  5,395</td>
<td style="text-align:center;">1,583</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="2">8</td>
<td rowspan="2">Glasgow</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/47">Glasgow DSpace Service</a></td>
<td style="text-align:center;">   -</td>
<td style="text-align:center;">  -</td>
<td style="text-align:center;">  -</td>
</tr>
<tr>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/48">Glasgow Theses Service</a></td>
<td style="text-align:center;">    2,682</td>
<td style="text-align:center;">   2,683</td>
<td style="text-align:center;">2,356</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="1">9</td>
<td rowspan="1">Imperial</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/105">Spiral &#8211; Imperial College Digital Repository</a></td>
<td style="text-align:center;">    8,097</td>
<td style="text-align:center;">   8,094</td>
<td style="text-align:center;">      4</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="1">10</td>
<td rowspan="1">King&#8217;s College London<br />
(also used King&#8217;s and Kings)</td>
<td>None found</td>
<td style="text-align:center;">   -</td>
<td style="text-align:center;">  -</td>
<td style="text-align:center;">  -</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="2">11</td>
<td rowspan="2">Leeds</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/62">leedsmet open search</a> (Incorrect institution)</td>
<td style="text-align:center;">   (-)</td>
<td style="text-align:center;">   (-)</td>
<td style="text-align:center;">   (-)</td>
</tr>
<tr>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/64">Leodis &#8211; A photographic archive of Leeds</a></td>
<td style="text-align:center;">    57,998</td>
<td style="text-align:center;">  57,998</td>
<td style="text-align:center;">   -</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="2">12</td>
<td rowspan="2">Liverpool</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/65">Liverpool John Moores University Research Archive</a><br />
(Incorrect institution)</td>
<td style="text-align:center;">     (-)</td>
<td style="text-align:center;">   (-)</td>
<td style="text-align:center;">   (-)</td>
</tr>
<tr>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/129">University of Liverpool Research Archive</a></td>
<td style="text-align:center;">      885</td>
<td style="text-align:center;">    810</td>
<td style="text-align:center;">  517</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="2">13</td>
<td rowspan="2">LSE</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/67">LSE Research Online</a></td>
<td style="text-align:center;">  33,959</td>
<td style="text-align:center;">  6,520</td>
<td style="text-align:center;">6,463</td>
</tr>
<tr>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/170">LSE Theses Online</a></td>
<td style="text-align:center;">      454</td>
<td style="text-align:center;">    454</td>
<td style="text-align:center;">  424</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="2">14</td>
<td rowspan="2">Manchester</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/35">e-space at Manchester Metropolitan University</a><br />
(Incorrect institution)</td>
<td style="text-align:center;">  (-)</td>
<td style="text-align:center;">   (-)</td>
<td style="text-align:center;">  (-)</td>
</tr>
<tr>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/234">Manchester eScholar Services</a></td>
<td style="text-align:center;"> 119,854</td>
<td style="text-align:center;">119,854</td>
<td style="text-align:center;">  -</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="1">15</td>
<td rowspan="1">Newcastle</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/252">Newcastle University E-Prints</a></td>
<td style="text-align:center;">   -</td>
<td style="text-align:center;">  -</td>
<td style="text-align:center;">  -</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="2">16</td>
<td rowspan="2">Nottingham</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/80">Nottingham ePrints</a></td>
<td style="text-align:center;">     1,084</td>
<td style="text-align:center;">   1,026</td>
<td style="text-align:center;">  990</td>
</tr>
<tr>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/81">Nottingham eTheses</a></td>
<td style="text-align:center;">     1,843</td>
<td style="text-align:center;">   1,793</td>
<td style="text-align:center;">1,757</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="1">17</td>
<td rowspan="1">Oxford</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/88">Oxford University Research Archive</a></td>
<td style="text-align:center;">   16,215</td>
<td style="text-align:center;">   3,745</td>
<td style="text-align:center;">    98</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="1">18</td>
<td rowspan="1">Queen Mary</td>
<td>None found</td>
<td style="text-align:center;">   -</td>
<td style="text-align:center;">  -</td>
<td style="text-align:center;">  -</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="1">19</td>
<td rowspan="1">Queen&#8217;s University Belfast</td>
<td>None found</td>
<td style="text-align:center;">   -</td>
<td style="text-align:center;">  -</td>
<td style="text-align:center;">  -</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="1">20</td>
<td rowspan="1">Sheffield</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/102">Sheffield Hallam University Research Archive</a><br />
(Incorrect institution)</td>
<td style="text-align:center;">    (-)</td>
<td style="text-align:center;">  (-)</td>
<td style="text-align:center;">  (-)</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="2">21</td>
<td rowspan="2">Southampton</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/36">eCrystals &#8211; Southampton</a></td>
<td style="text-align:center;">     602</td>
<td style="text-align:center;">    602</td>
<td style="text-align:center;">  -</td>
</tr>
<tr>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/37">Electronics &amp; Computer Science EPrints Service -<br />
University of Southampton</a></td>
<td style="text-align:center;"> 15,835</td>
<td style="text-align:center;">   8,947</td>
<td style="text-align:center;">7,071</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="1">22</td>
<td rowspan="1">UCL</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/118">UCL Discovery</a></td>
<td style="text-align:center;">         0</td>
<td style="text-align:center;">245,407</td>
<td style="text-align:center;">      2</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="2">23</td>
<td rowspan="2">Warwick</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/44">EPrints at the Centre for Scientific Computing,<br />
University of Warwick</a></td>
<td style="text-align:center;">   -</td>
<td style="text-align:center;"> -</td>
<td style="text-align:center;">   360</td>
</tr>
<tr>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/136">Warwick Research Archives Portal Repository</a></td>
<td style="text-align:center;">   49,469</td>
<td style="text-align:center;">    7,696</td>
<td style="text-align:center;"> 7,025</td>
</tr>
<tr>
<td style="text-align:center;" rowspan="1">24</td>
<td rowspan="1">York</td>
<td><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/142">York St John University ArchivalWare Digital Library</a><br />
(Incorrect institution)</td>
<td style="text-align:center;">       331</td>
<td style="text-align:center;">         1</td>
<td style="text-align:center;">  -</td>
</tr>
</tbody>
</table>
<p>Note that the <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics">Repository Analytics</a> page does not appear to provide a formal definition of the data collected. However, from hovering over the icon accompanying each entry, it appears that the Metadata Download column gives the number of metadata records, the Metadata Readable column gives the number of links extracted from the metadata and the PDF Download column gives the number of PDFs which were downloaded.</p>
<h2>Discussion</h2>
<p>It is difficult to interpret the data given in the table: the entry for the <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/118">UCL Discovery</a> repository, for example, tells us that there are 0 metadata records, yet 245,407 links were extracted from these records and 2 PDFs were downloaded!</p>
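<p>The kind of inconsistency seen in the UCL Discovery entry can be flagged mechanically. The following is a minimal sketch in Python, assuming the column meanings inferred in the note above (metadata records, extracted links, downloaded PDFs). The names <code>Row</code>, <code>parse_count</code>, <code>make_row</code> and <code>is_suspicious</code> are hypothetical and are not part of CORE; the figures are copied from four rows of the table.</p>

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    """One table row; None means the cell held a dash rather than a number."""
    name: str
    metadata_download: Optional[int]  # assumed: number of metadata records
    metadata_readable: Optional[int]  # assumed: links extracted from the metadata
    pdf_download: Optional[int]       # assumed: PDFs actually downloaded


def parse_count(cell: str) -> Optional[int]:
    """Turn a cell such as ' 245,407' into 245407, and '-' or '(-)' into None."""
    text = cell.strip().replace(",", "")
    return int(text) if text.isdigit() else None


def make_row(name: str, download: str, readable: str, pdf: str) -> Row:
    """Build a Row from the raw cell text of the table."""
    return Row(name, parse_count(download), parse_count(readable), parse_count(pdf))


def is_suspicious(row: Row) -> bool:
    """Flag rows that report links or PDFs but no metadata records."""
    records = row.metadata_download or 0
    derived = (row.metadata_readable or 0) + (row.pdf_download or 0)
    return records == 0 and derived > 0


rows = [
    make_row("UCL Discovery", "0", "245,407", "2"),
    make_row("LSE Research Online", "33,959", "6,520", "6,463"),
    make_row("EPrints at the Centre for Scientific Computing, University of Warwick",
             "-", "-", "360"),
    make_row("Oxford University Research Archive", "16,215", "3,745", "98"),
]

suspicious = [row.name for row in rows if is_suspicious(row)]
for name in suspicious:
    print("Check this entry:", name)
```

<p>On these four rows the check flags the UCL Discovery entry and the Warwick Centre for Scientific Computing entry, both of which report derived counts without any metadata records.</p>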
<p>However, the table does suggest patterns in the naming conventions for institutional repositories. The institution&#8217;s name may be given at the beginning (&#8220;<em>University of Birmingham Research Archive, E-prints Repository</em>&#8221;, &#8220;<em>University of Liverpool Research Archive</em>&#8221; and &#8220;<em>LSE Research Online</em>&#8221;) or at the end of the repository name (&#8220;<em>EPrints at the Centre for Scientific Computing, University of Warwick</em>&#8221;, &#8220;<em>Electronics &amp; Computer Science EPrints Service - University of Southampton</em>&#8221; and &#8220;<em>Computer Laboratory Technical Reports - Cambridge University</em>&#8221;), and a large number of examples use a partial form of the institution&#8217;s name (e.g. &#8220;<em>Edinburgh Research Archive</em>&#8221;, &#8220;<em>Glasgow DSpace Service</em>&#8221; and &#8220;<em>Manchester eScholar Services</em>&#8221;).</p>
<p>But of greater interest are the institutional repositories which have been harvested by CORE but are missing from this search, such as &#8220;<a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/34">e-Prints Soton</a>&#8221; and the &#8220;<a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/139">White Rose E-theses Online</a>&#8221; and &#8220;<a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/140">White Rose Research Online</a>&#8221; repositories, which are used by the universities of Leeds, York and Sheffield.</p>
<p>Whilst the ownership of a repository will be apparent to the end user who accesses the service via the main entry point (perhaps from the institution&#8217;s Library Web site), in a number of cases such information is not apparent when the repository has been harvested and is accessed using other systems such as, in this case, the interface developed by the CORE project.</p>
<p>In light of the findings from a survey of Russell Group universities, I would make the following simple recommendation:</p>
<p style="padding-left:30px;"><strong>The names of institutional repositories should contain the name of the host institution.</strong></p>
<p>In order to illustrate the need for such a recommendation, here is a list of repositories which have been harvested by CORE:</p>
<p style="padding-left:30px;"><a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/3">Access to Research Resources for Teachers</a> - <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/26">Department of Computer Science E-Repository</a> - <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/42">Enlighten</a> - <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/73">Modern Languages Publications Archive</a> - <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/82">Online Publications Store</a> - <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/86">Open Research Online</a> - <a rel="nofollow" target="_blank" href="http://core.kmi.open.ac.uk/repository_analytics/display/90">Pharmacy Eprints</a></p>
<p>If you are unfamiliar with these repositories, would you be able to guess who owns them?</p>
<p>Or, to put it another way, meaningful metadata is important for repositories!</p>
<hr />
<br />Filed under: <a rel="nofollow" target="_blank" href='http://ukwebfocus.wordpress.com/category/openness/'>openness</a>, <a rel="nofollow" target="_blank" href='http://ukwebfocus.wordpress.com/category/repositories/'>Repositories</a>]]></content:encoded>
         <geo:lat>51.379915</geo:lat>
         <geo:long>-2.331708</geo:long>
         <media:content medium="image" url="http://2.gravatar.com/avatar/27731abff266f585f006998f65c74be9?s=96&amp;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96">
            <media:title type="html">Brian Kelly (UK Web Focus)</media:title>
         </media:content>
         <media:content medium="image" url="http://ukwebfocus.files.wordpress.com/2013/02/core-project.png">
            <media:title type="html">CORE project</media:title>
         </media:content>
      </item>
      <item>
         <title>Profiling Use of Third-Party Research Repository Services</title>
         <link>http://ukwebfocus.wordpress.com/2013/02/12/profiling-use-of-third-party-research-repository-services/</link>
         <description>Background In a recent post I explained Why I’m Evaluating ResearchGate. In the post I summarised the reasons why I felt that researchgate.net could provide an additional service for depositing research papers which would complement Opus, the University of Bath institutional repository. But what other services might also be relevant? And which services are hosting the largest numbers of research [&amp;#8230;]</description>
         <guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=13269</guid>
         <pubDate>Tue, 12 Feb 2013 10:37:02 +0000</pubDate>
         <content:encoded><![CDATA[<h2>Background</h2>
<div id="attachment_13334" class="wp-caption alignright" style="width:130px;"><a rel="nofollow" target="_blank" href="http://ukwebfocus.files.wordpress.com/2013/02/oxford-researchgate.png"><img class="size-full wp-image-13334" alt="How significant is use of third-party repository services?" src="http://ukwebfocus.files.wordpress.com/2013/02/oxford-researchgate.png?w=120&#038;h=384" width="120" height="384"/></a><p class="wp-caption-text">How significant is use of third-party repository services?</p></div>
<p>In a recent post I explained <a rel="nofollow" title="Permanent link to Why I&#x002019;m Evaluating&#xa0;ResearchGate" target="_blank" href="http://ukwebfocus.wordpress.com/2013/02/06/why-im-evaluating-researchgate/">Why I’m Evaluating ResearchGate</a>. In the post I summarised the reasons why I felt that <a rel="nofollow" target="_blank" href="http://researchgate.net/">researchgate.net</a> could provide an additional service for depositing research papers which would complement <a rel="nofollow" target="_blank" href="http://opus.ukoln.ac.uk/">Opus</a>, the University of Bath institutional repository. But what other services might also be relevant? And which services are hosting the largest numbers of research papers?</p>
<p>In order to seek answers to these questions, I used Google to estimate the size of a number of hosting services and the number of PDFs they host. The services I analysed were:</p>
<ul>
<li><a rel="nofollow" target="_blank" href="https://www.researchgate.net/">Researchgate.net</a>: This site is <a rel="nofollow" target="_blank" href="http://en.wikipedia.org/wiki/ResearchGate">described in Wikipedia</a> as &#8220;<em>a social networking site for scientists and researchers to share papers, ask and answer questions, and find collaborators. The site has been described as a mash-up of “Facebook, Twitter and LinkedIn” that includes “profile pages, comments, groups, job listings, and ‘like’ and ‘follow’ buttons”. Members are encouraged to share raw data and failed experiment results as well as successes, in order to avoid repeating their peers’ scientific research mistakes.</em>&#8221;</li>
<li><a rel="nofollow" target="_blank" href="http://academia.edu/">Academia.edu</a>: This site is <a rel="nofollow" target="_blank" href="http://en.wikipedia.org/wiki/Academia.edu">described in Wikipedia</a> as &#8220;<em>a platform for academics to share research papers. It was launched in September 2008. Currently the site is approaching 2 million registered users.<sup id="cite_ref-2"><a rel="nofollow" target="_blank" href="http://en.wikipedia.org/wiki/Academia.edu#cite_note-2">[2]</a></sup> The platform can be used to share papers, monitor their impact, and follow the research in a particular field.</em>&#8221;</li>
<li><a rel="nofollow" target="_blank" href="http://www.mendeley.com/">Mendeley.com</a>: This site is <a rel="nofollow" target="_blank" href="http://en.wikipedia.org/wiki/Mendeley">described in Wikipedia</a> as &#8220;<em>a desktop and web program for managing and sharing research papers,<sup id="cite_ref-2"><a rel="nofollow" target="_blank" href="http://en.wikipedia.org/wiki/Mendeley#cite_note-2">[2]</a></sup> discovering research data and collaborating online. It combines Mendeley Desktop, a PDF and <a rel="nofollow" title="Reference management software" target="_blank" href="http://en.wikipedia.org/wiki/Reference_management_software">reference management application</a> (available for Windows, Mac and Linux) with Mendeley Web, an online <a rel="nofollow" title="Social software" target="_blank" href="http://en.wikipedia.org/wiki/Social_software">social</a> network for researchers.<sup id="cite_ref-HullD2008_3-0"><a rel="nofollow" target="_blank" href="http://en.wikipedia.org/wiki/Mendeley#cite_note-HullD2008-3">[3]</a></sup><sup id="cite_ref-4"><a rel="nofollow" target="_blank" href="http://en.wikipedia.org/wiki/Mendeley#cite_note-4">[4]</a></sup><sup id="cite_ref-5"><a rel="nofollow" target="_blank" href="http://en.wikipedia.org/wiki/Mendeley#cite_note-5">[5]</a></sup> Mendeley requires the user to store all basic citation data on its servers &#8211; storing copies of documents is at the user&#8217;s discretion</em>&#8221;.</li>
<li><a rel="nofollow" target="_blank" href="http://www.citeulike.org/">Citeulike.org</a>: This site is <a rel="nofollow" target="_blank" href="http://en.wikipedia.org/wiki/Citeulike">described in Wikipedia</a> as &#8220;<em>based on the principle of <a rel="nofollow" title="Social bookmarking" target="_blank" href="http://en.wikipedia.org/wiki/Social_bookmarking">social bookmarking</a> [the service] is aimed to promote and to develop the sharing of scientific references amongst <a rel="nofollow" title="Researchers" target="_blank" href="http://en.wikipedia.org/wiki/Researchers">researchers</a>. In the same way that it is possible to catalog <a rel="nofollow" title="Web page" target="_blank" href="http://en.wikipedia.org/wiki/Web_page">web pages</a> (with <a rel="nofollow" title="Furl" target="_blank" href="http://en.wikipedia.org/wiki/Furl">Furl</a> and <a rel="nofollow" title="Del.icio.us" target="_blank" href="http://en.wikipedia.org/wiki/Del.icio.us">del.icio.us</a>) or photographs (with <a rel="nofollow" title="Flickr" target="_blank" href="http://en.wikipedia.org/wiki/Flickr">Flickr</a>), scientists can share information on <a rel="nofollow" title="Academic publishing" target="_blank" href="http://en.wikipedia.org/wiki/Academic_publishing#Academic_paper">academic papers</a> with specific tools (like CiteULike) developed for that purpose</em>&#8221;.</li>
<li><a rel="nofollow" target="_blank" href="http://www.scribd.com/">Scribd.com</a>: This site is <a rel="nofollow" target="_blank" href="http://en.wikipedia.org/wiki/Scribd">described in Wikipedia</a> as &#8220;<em>a document-sharing website that allows users to post documents of various formats, and embed them into a web page using its iPaper format</em>&#8221;.</li>
</ul>
<p>Many researchers will probably be familiar with the first four services listed. The fifth service, scribd.com, is included in order to explore whether a general-purpose PDF repository service could have a role to play in supporting the sharing of research publications.</p>
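<p>The size estimates rely on Google&#8217;s <code>site:</code> and <code>filetype:</code> search operators. As a minimal sketch in Python (the function name <code>google_search_url</code> is hypothetical, and the code only builds the search links; it does not fetch or parse result counts), the queries for the five services could be generated as follows:</p>

```python
from urllib.parse import urlencode

# The five services described in the list above.
SERVICES = [
    "researchgate.net",
    "academia.edu",
    "mendeley.com",
    "citeulike.org",
    "scribd.com",
]


def google_search_url(domain: str, pdf_only: bool = False) -> str:
    """Build a Google search link restricted to one site, optionally to PDFs."""
    query = f"site:{domain}"
    if pdf_only:
        query += " filetype:pdf"
    # urlencode percent-encodes the colons and turns the space into '+'.
    return "https://www.google.co.uk/search?" + urlencode({"q": query})


for domain in SERVICES:
    print(google_search_url(domain))
    print(google_search_url(domain, pdf_only=True))
```

<p>Keeping the query construction in one function ensures that every service is searched with exactly the same operators, so the resulting counts are at least comparable with each other.</p>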
<h2>Findings for the Coverage of the Services</h2>
<p>Google was used in order to provide an estimate of the coverage of the services, including the total number of resources which have been indexed by Google and the number of PDF files. The findings are given in the following table. Note that the figures were initially collected on 6 February 2013. In order to check <a rel="nofollow" target="_blank" href="http://www.ehow.com/info_12218524_google-index-count-fluctuate-much.html">the volatility of the findings</a> the searches were repeated on 11 February.</p>
<table border="1">
<tbody>
<tr>
<td><strong>Search for</strong></td>
<td><strong>Search Term</strong></td>
<td><strong>No. of results</strong></td>
<td><strong>Date</strong></td>
</tr>
<tr>
<td colspan="4"><strong>Service: researchgate.net</strong></td>
</tr>
<tr>
<td rowspan="2">Total number of resources</td>
<td rowspan="2"><em><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=site:researchgate.net">site:researchgate.net</a></em></td>
<td style="text-align:center;">55,300,000</td>
<td>  6 Feb 2013</td>
</tr>
<tr>
<td style="text-align:center;">56,100,000</td>
<td>11 Feb 2013</td>
</tr>
<tr>
<td rowspan="2">Total number of PDF files</td>
<td rowspan="2"><em><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=site:researchgate.net+filetype%3Apdf">site:researchgate.net filetype:pdf</a></em></td>
<td style="text-align:center;">  2,980,000</td>
<td>  6 Feb 2013</td>
</tr>
<tr>
<td style="text-align:center;">  2,910,000</td>
<td>11 Feb 2013</td>
</tr>
<tr>
<td colspan="4"><strong>Service: academia.edu</strong></td>
</tr>
<tr>
<td rowspan="2">Total number of resources</td>
<td rowspan="2"><em><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=site:academia.edu">site:academia.edu</a></em></td>
<td style="text-align:center;">12,500,000</td>
<td>  6 Feb 2013</td>
</tr>
<tr>
<td style="text-align:center;"> 12,400,000</td>
<td>11 Feb 2013</td>
</tr>
<tr>
<td rowspan="2">Total number of PDF files</td>
<td rowspan="2"><em><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=site:academia.edu+filetype%3Apdf">site:academia.edu filetype:pdf</a></em></td>
<td style="text-align:center;">          4,930</td>
<td>  6 Feb 2013</td>
</tr>
<tr>
<td style="text-align:center;">         4,740</td>
<td>11 Feb 2013</td>
</tr>
<tr>
<td colspan="4"><strong>Service: mendeley.com</strong></td>
</tr>
<tr>
<td rowspan="2">Total number of resources</td>
<td rowspan="2"><em><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=site:mendeley.com">site:mendeley.com</a></em></td>
<td style="text-align:center;">  3,310,000</td>
<td>  6 Feb 2013</td>
</tr>
<tr>
<td style="text-align:center;">  3,150,000</td>
<td>11 Feb 2013</td>
</tr>
<tr>
<td rowspan="2">Total number of PDF files</td>
<td rowspan="2"><em><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=site:mendeley.com+filetype%3Apdf">site:mendeley.com filetype:pdf</a></em></td>
<td style="text-align:center;">         3,840</td>
<td>  6 Feb 2013</td>
</tr>
<tr>
<td style="text-align:center;">         4,020</td>
<td>11 Feb 2013</td>
</tr>
<tr>
<td colspan="4"><strong>Service: citeulike.org</strong></td>
</tr>
<tr>
<td rowspan="2">Total number of resources</td>
<td rowspan="2"><em><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=site:citeulike.org">site:citeulike.org</a></em></td>
<td style="text-align:center;"> 35,600,000</td>
<td>  6 Feb 2013</td>
</tr>
<tr>
<td style="text-align:center;"> 35,700,000</td>
<td>11 Feb 2013</td>
</tr>
<tr>
<td rowspan="2">Total number of PDF files</td>
<td rowspan="2"><em><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=site:citeulike.org+filetype%3Apdf">site:citeulike.org filetype:pdf</a></em></td>
<td style="text-align:center;">             244</td>
<td>  6 Feb 2013</td>
</tr>
<tr>
<td style="text-align:center;">               30</td>
<td>11 Feb 2013</td>
</tr>
<tr>
<td colspan="4"><strong>Service: scribd.com</strong></td>
</tr>
<tr>
<td rowspan="2">Total number of resources</td>
<td rowspan="2"><em><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=site:scribd.com">site:scribd.com</a></em></td>
<td style="text-align:center;">  61,300,000</td>
<td>  6 Feb 2013</td>
</tr>
<tr>
<td style="text-align:center;">166,000,000</td>
<td>11 Feb 2013</td>
</tr>
<tr>
<td rowspan="2">Total number of PDF files</td>
<td rowspan="2"><em><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=site:scribd.com+filetype%3Apdf">site:scribd.com filetype:pdf</a></em></td>
<td style="text-align:center;">                 -</td>
<td>6 Feb 2013</td>
</tr>
<tr>
<td style="text-align:center;">371,000,000</td>
<td>11 Feb 2013</td>
</tr>
<tr>
<td colspan="4"><strong>Service: issuu.com</strong></td>
</tr>
<tr>
<td rowspan="2">Total number of resources</td>
<td rowspan="2"><em><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=site:issuu.com">site:issuu.com</a></em></td>
<td style="text-align:center;">10,300,000</td>
<td>  6 Feb 2013</td>
</tr>
<tr>
<td style="text-align:center;">26,100,000</td>
<td>11 Feb 2013</td>
</tr>
<tr>
<td rowspan="2">Total number of PDF files</td>
<td rowspan="2"><em><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=site:issuu.com+filetype%3Apdf">site:issuu.com filetype:pdf</a></em></td>
<td style="text-align:center;">       48,800</td>
<td>  6 Feb 2013</td>
</tr>
<tr>
<td style="text-align:center;">       48,800</td>
<td>11 Feb 2013</td>
</tr>
</tbody>
</table>
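<p>The volatility between the two sampling dates can be expressed as a percentage change. The following is a minimal Python sketch using four pairs of figures from the table above; the helper name <code>percent_change</code> and the labels are illustrative assumptions.</p>

```python
def percent_change(first: int, second: int) -> float:
    """Relative change from the first count to the second, in percent."""
    return (second - first) / first * 100.0


# (count on 6 Feb 2013, count on 11 Feb 2013), copied from the table.
SAMPLES = {
    "researchgate.net (all resources)": (55_300_000, 56_100_000),
    "researchgate.net (PDF files)": (2_980_000, 2_910_000),
    "citeulike.org (PDF files)": (244, 30),
    "scribd.com (all resources)": (61_300_000, 166_000_000),
}

changes = {
    label: round(percent_change(first, second), 1)
    for label, (first, second) in SAMPLES.items()
}

for label, change in changes.items():
    print(f"{label}: {change:+.1f}%")
```

<p>Changes of a few percent for ResearchGate sit alongside much larger swings for the CiteULike PDF count and the Scribd total, which is why such counts are best treated as rough estimates.</p>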
<p><span style="font-size:13px;line-height:19px;">It seems that Scribd hosts a very large number of resources (although an initial finding of 3 PDF resources was discarded as the result seemed to be unreliable).</span></p>
<p><span style="font-size:13px;line-height:19px;">However, since Scribd is a general-purpose repository service, it was felt that ResearchGate, which hosts a large number of PDF resources, is more relevant for researchers. In light of this confirmation of the popularity of ResearchGate, an additional survey was carried out which reported on use of the service across Russell Group universities.</span></p>
<h2>Findings for Institutional Use of Academia.edu and ResearchGate</h2>
<p>On 1 August 2012 a <a rel="nofollow" target="_blank" href="http://ukwebfocus.wordpress.com/2012/08/01/a-survey-of-use-of-researcher-profiling-services-by-the-24-russell-group-universities/">Survey of Use of Researcher Profiling Services Across the 24 Russell Group Universities</a> was published on this blog. This survey has been repeated in order to detect changes in the use of ResearchGate. Since the original survey also provided an analysis of Academia.edu, this was also included in the current survey. The results are given in the following table. Note that <a rel="nofollow" target="_blank" href="https://docs.google.com/spreadsheet/ccc?key=0AqyjJ9Eviy8idHZSODRyZHB4WlRWa0l6US1tNjN6Snc#gid=0">the data is also available in Google Spreadsheets.</a></p>
<table border="1">
<tbody>
<tr>
<td rowspan="3"><strong>Ref.<br />
No.</strong></td>
<td rowspan="3"><strong>Institution</strong></td>
<td rowspan="2" colspan="2"><strong>Academia.edu (members)</strong></td>
<td style="text-align:center;" colspan="4"><strong>ResearchGate</strong></td>
</tr>
<tr>
<td style="text-align:center;" colspan="2"><strong>Aug 2012</strong></td>
<td style="text-align:center;" colspan="2"><strong>Feb 2013</strong></td>
</tr>
<tr>
<td><strong>Aug 2012</strong></td>
<td><strong>Feb 2013<sup>*</sup></strong></td>
<td><strong>Members</strong></td>
<td><strong>Publications</strong></td>
<td><strong>Members</strong></td>
<td><strong>Publications</strong></td>
</tr>
<tr>
<td style="text-align:center;">1</td>
<td><a rel="nofollow" target="_blank" href="http://www.bham.ac.uk/">University of Birmingham</a></td>
<td style="text-align:center;">1,210</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="http://birmingham.academia.edu/">1,562</a></td>
<td style="text-align:center;"> 782</td>
<td style="text-align:center;">19,515</td>
<td style="text-align:center;">1,439</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/University_of_Birmingham">22,068</a></td>
</tr>
<tr>
<td style="text-align:center;">2</td>
<td><a rel="nofollow" target="_blank" href="http://www.bristol.ac.uk/">University of Bristol</a></td>
<td style="text-align:center;"> 1,018</td>
<td style="text-align:center;"> <a rel="nofollow" target="_blank" href="http://bristol.academia.edu/">1,189</a></td>
<td style="text-align:center;">  641</td>
<td style="text-align:center;">21,249</td>
<td style="text-align:center;"> 1,251</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/University_of_Bristol">23,701</a></td>
</tr>
<tr>
<td style="text-align:center;">3</td>
<td><a rel="nofollow" target="_blank" href="http://www.cam.ac.uk/">University of Cambridge</a></td>
<td style="text-align:center;"> 3,020</td>
<td style="text-align:center;"> <a rel="nofollow" target="_blank" href="http://cambridge.academia.edu/">3,439</a></td>
<td style="text-align:center;">  972</td>
<td style="text-align:center;">39,713</td>
<td style="text-align:center;">1,699</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/University_of_Cambridge">42,419</a></td>
</tr>
<tr>
<td style="text-align:center;">4</td>
<td><a rel="nofollow" target="_blank" href="http://www.cf.ac.uk/">Cardiff University</a></td>
<td style="text-align:center;">    906</td>
<td style="text-align:center;"> <a rel="nofollow" target="_blank" href="http://cardiff.academia.edu/">1,071</a></td>
<td style="text-align:center;">  646</td>
<td style="text-align:center;">  9,596</td>
<td style="text-align:center;">1,272</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/Cardiff_University">10,696</a></td>
</tr>
<tr>
<td style="text-align:center;">5</td>
<td><a rel="nofollow" target="_blank" href="http://www.durham.ac.uk/">Durham University</a></td>
<td style="text-align:center;"> 1,001</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="http://durham.academia.edu/">1,189</a></td>
<td style="text-align:center;"> 273</td>
<td style="text-align:center;"> 1,151</td>
<td style="text-align:center;">   662</td>
<td style="text-align:center;">  <a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/Durham_University">7,152</a></td>
</tr>
<tr>
<td style="text-align:center;">6</td>
<td><a rel="nofollow" target="_blank" href="http://www.ex.ac.uk/">University of Exeter</a></td>
<td style="text-align:center;">   919</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="http://ex.academia.edu/">1,106</a></td>
<td style="text-align:center;">  269</td>
<td style="text-align:center;"> 5,150</td>
<td style="text-align:center;">  652</td>
<td style="text-align:center;">  <a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/University_of_Exeter">6,191</a></td>
</tr>
<tr>
<td style="text-align:center;">7</td>
<td><a rel="nofollow" target="_blank" href="http://www.ed.ac.uk/">University of Edinburgh</a></td>
<td style="text-align:center;"> 2,079</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="http://edinburgh.academia.edu/">2,479</a></td>
<td style="text-align:center;">1,181</td>
<td style="text-align:center;">25,918</td>
<td style="text-align:center;">2,065</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/The_University_of_Edinburgh">28,486</a></td>
</tr>
<tr>
<td style="text-align:center;">8</td>
<td><a rel="nofollow" target="_blank" href="http://www.gla.ac.uk/">University of Glasgow</a></td>
<td style="text-align:center;">1,004</td>
<td style="text-align:center;"> <a rel="nofollow" target="_blank" href="http://glasgow.academia.edu/">1,212</a></td>
<td style="text-align:center;">   613</td>
<td style="text-align:center;">20,041</td>
<td style="text-align:center;">1,224</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/University_of_Glasgow">21,733</a></td>
</tr>
<tr>
<td style="text-align:center;">9</td>
<td><a rel="nofollow" target="_blank" href="http://www.ic.ac.uk/">Imperial College</a></td>
<td style="text-align:center;">   798</td>
<td style="text-align:center;">    <a rel="nofollow" target="_blank" href="http://imperial.academia.edu/">896</a></td>
<td style="text-align:center;">1,096</td>
<td style="text-align:center;">30,404</td>
<td style="text-align:center;">1,377</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/Imperial_College_London">34,202</a></td>
</tr>
<tr>
<td style="text-align:center;">10</td>
<td><a rel="nofollow" target="_blank" href="http://www.kcl.ac.uk/">King’s College London</a></td>
<td style="text-align:center;">1,420</td>
<td style="text-align:center;"> <a rel="nofollow" target="_blank" href="http://kcl.academia.edu/">1,748</a></td>
<td style="text-align:center;">1,406</td>
<td style="text-align:center;">18,264</td>
<td style="text-align:center;">2,241</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/Kings_College_London">23,391</a></td>
</tr>
<tr>
<td style="text-align:center;">11</td>
<td><a rel="nofollow" target="_blank" href="http://www.leeds.ac.uk/">University of Leeds</a></td>
<td style="text-align:center;">1,657</td>
<td style="text-align:center;"> <a rel="nofollow" target="_blank" href="http://leeds.academia.edu/">1,871</a></td>
<td style="text-align:center;">   848</td>
<td style="text-align:center;"> 16,944</td>
<td style="text-align:center;">1,455</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/University_of_Leeds">19,560</a></td>
</tr>
<tr>
<td style="text-align:center;">12</td>
<td><a rel="nofollow" target="_blank" href="http://www.liv.ac.uk/">University of Liverpool</a></td>
<td style="text-align:center;">  866</td>
<td style="text-align:center;">    <a rel="nofollow" target="_blank" href="http://liverpool.academia.edu/">989</a></td>
<td style="text-align:center;">  582</td>
<td style="text-align:center;"> 16,475</td>
<td style="text-align:center;">1,146</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/University_of_Liverpool">18,749</a></td>
</tr>
<tr>
<td style="text-align:center;">13</td>
<td><a rel="nofollow" target="_blank" href="http://www.lse.ac.uk/">London School of Economics</a></td>
<td style="text-align:center;">1,131</td>
<td style="text-align:center;"> <a rel="nofollow" target="_blank" href="http://lse.academia.edu/">1,354</a></td>
<td style="text-align:center;">   191</td>
<td style="text-align:center;">   1,838</td>
<td style="text-align:center;">   407</td>
<td style="text-align:center;">  <a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/The_London_School_of_Economics_and_Political_Science">2,449</a></td>
</tr>
<tr>
<td style="text-align:center;">14</td>
<td><a rel="nofollow" target="_blank" href="http://www.man.ac.uk/">University of Manchester</a></td>
<td style="text-align:center;">2,279</td>
<td style="text-align:center;"> <a rel="nofollow" target="_blank" href="http://manchester.academia.edu/">2,590</a></td>
<td style="text-align:center;">1,113</td>
<td style="text-align:center;"> 25,139</td>
<td style="text-align:center;">2,188</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/The_University_of_Manchester">29,675</a></td>
</tr>
<tr>
<td style="text-align:center;">15</td>
<td><a rel="nofollow" target="_blank" href="http://www.newcastle.ac.uk/">Newcastle University</a></td>
<td style="text-align:center;">   906</td>
<td style="text-align:center;"> <a rel="nofollow" target="_blank" href="http://newcastle.academia.edu/">1,039</a></td>
<td style="text-align:center;">   704</td>
<td style="text-align:center;"> 17,307</td>
<td style="text-align:center;">1,348</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/Newcastle_University">17,376</a></td>
</tr>
<tr>
<td style="text-align:center;">16</td>
<td><a rel="nofollow" target="_blank" href="http://www.nottingham.ac.uk/">University of Nottingham</a></td>
<td style="text-align:center;">1,299</td>
<td>       <a rel="nofollow" target="_blank" href="http://nottingham.academia.edu/">1,529</a></td>
<td style="text-align:center;">   970</td>
<td style="text-align:center;"> 20,513</td>
<td style="text-align:center;">1,559</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/University_of_Nottingham">20,145</a></td>
</tr>
<tr>
<td style="text-align:center;">17</td>
<td><a rel="nofollow" target="_blank" href="http://www.ox.ac.uk/">University of Oxford</a></td>
<td style="text-align:center;">3,842</td>
<td>       <a rel="nofollow" target="_blank" href="http://oxford.academia.edu/">4,469</a></td>
<td style="text-align:center;">1,221</td>
<td style="text-align:center;"> 38,224</td>
<td style="text-align:center;">1,967</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/University_of_Oxford">39,861</a></td>
</tr>
<tr>
<td style="text-align:center;">18</td>
<td><a rel="nofollow" target="_blank" href="http://www.qmul.ac.uk/">Queen Mary</a></td>
<td style="text-align:center;">   715</td>
<td>          <a rel="nofollow" target="_blank" href="http://qmul.academia.edu/">849</a></td>
<td style="text-align:center;">  228</td>
<td style="text-align:center;">   5,232</td>
<td style="text-align:center;">   898</td>
<td style="text-align:center;"> <a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/Queen_Mary_University_of_London">6,013</a></td>
</tr>
<tr>
<td style="text-align:center;">19</td>
<td><a rel="nofollow" target="_blank" href="http://www.qub.ac.uk/">Queen’s University Belfast</a></td>
<td style="text-align:center;">   689</td>
<td>          <a rel="nofollow" target="_blank" href="http://qub.academia.edu/">774</a></td>
<td style="text-align:center;">  479</td>
<td style="text-align:center;">10,750</td>
<td style="text-align:center;">   864</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/Queens_University_Belfast">11,699</a></td>
</tr>
<tr>
<td style="text-align:center;">20</td>
<td><a rel="nofollow" target="_blank" href="http://www.shef.ac.uk/">University of Sheffield</a></td>
<td style="text-align:center;"> 1,082</td>
<td>       <a rel="nofollow" target="_blank" href="http://sheffield.academia.edu/">1,235</a></td>
<td style="text-align:center;">  823</td>
<td style="text-align:center;">18,127</td>
<td style="text-align:center;"> 1,659</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/The_University_of_Sheffield">20,149</a></td>
</tr>
<tr>
<td style="text-align:center;">21</td>
<td><a rel="nofollow" target="_blank" href="http://www.soton.ac.uk/">University of Southampton</a></td>
<td style="text-align:center;"> 1,083</td>
<td>       <a rel="nofollow" target="_blank" href="http://soton.academia.edu/">1,265</a></td>
<td style="text-align:center;">  670</td>
<td style="text-align:center;"> 16,887</td>
<td style="text-align:center;"> 1,371</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/University_of_Southampton">18,325</a></td>
</tr>
<tr>
<td style="text-align:center;">22</td>
<td><a rel="nofollow" target="_blank" href="http://www.ucl.ac.uk/">University College London</a></td>
<td style="text-align:center;"> 2,776</td>
<td>       <a rel="nofollow" target="_blank" href="http://ucl.academia.edu/">3,162</a></td>
<td style="text-align:center;">1,624</td>
<td style="text-align:center;"> 35,035</td>
<td style="text-align:center;"> 2,878</td>
<td style="text-align:center;"><a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/University_College_London">38,550</a></td>
</tr>
<tr>
<td style="text-align:center;">23</td>
<td><a rel="nofollow" target="_blank" href="http://www.warwick.ac.uk/">University of Warwick</a></td>
<td style="text-align:center;">1,143</td>
<td>       <a rel="nofollow" target="_blank" href="http://warwick.academia.edu/">1,349</a></td>
<td style="text-align:center;">   448</td>
<td style="text-align:center;">  8,098</td>
<td style="text-align:center;">    873</td>
<td style="text-align:center;">  <a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/The_University_of_Warwick">9,334</a></td>
</tr>
<tr>
<td style="text-align:center;">24</td>
<td><a rel="nofollow" target="_blank" href="http://www.york.ac.uk/">University of York</a></td>
<td style="text-align:center;">   986</td>
<td>       <a rel="nofollow" target="_blank" href="http://york.academia.edu/">1,180</a></td>
<td style="text-align:center;">   386</td>
<td style="text-align:center;">  4,841</td>
<td style="text-align:center;">   696</td>
<td style="text-align:center;">  <a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/University_of_York">5,179</a></td>
</tr>
<tr>
<td></td>
<td><strong>TOTAL</strong></td>
<td>   <strong>33,829</strong></td>
<td style="text-align:center;"><strong>39,546</strong></td>
<td><strong>18,166</strong></td>
<td>  <strong>426,414</strong></td>
<td style="text-align:center;"> <strong>33,191</strong></td>
<td style="text-align:center;"><strong>477,103</strong></td>
</tr>
<tr>
<td></td>
<td><strong>Increase (%)</strong></td>
<td style="text-align:center;"></td>
<td style="text-align:center;"><strong>  14.5%</strong></td>
<td style="text-align:center;"></td>
<td style="text-align:center;"></td>
<td style="text-align:center;"><strong> 82.7% </strong></td>
<td style="text-align:center;"><strong>  11.9%</strong></td>
</tr>
</tbody>
</table>
<p><strong>Note</strong>: * As described in the previous survey, the number of Academia.edu members is obtained by entering the name of the institution in the search box.</p>
<h2>Conclusions</h2>
<div id="attachment_13333" class="wp-caption alignright" style="width:336px;"><a rel="nofollow" target="_blank" href="http://ukwebfocus.files.wordpress.com/2013/02/researchgate-publications.png"><img class="size-full wp-image-13333" title="Nos. of items deposited in Researchgate in Aug 2012 (blue) &amp; Feb 2013 (red)" alt="Nos. of Researchgate publications" src="http://ukwebfocus.files.wordpress.com/2013/02/researchgate-publications.png?w=326&#038;h=208" width="326" height="208"/></a><p class="wp-caption-text">Nos. of items deposited in Researchgate in Aug 2012 (blue) &amp; Feb 2013 (red)</p></div>
<div id="attachment_13332" class="wp-caption alignright" style="width:330px;"><a rel="nofollow" target="_blank" href="http://ukwebfocus.files.wordpress.com/2013/02/researchgate-members.png"><img class="size-full wp-image-13332" style="margin-left:10px;margin-right:10px;" title="Nos. of Researchgate Members in Aug 2012 (blue) &amp; Feb 2013 (red)" alt="Nos. of Researchgate Members" src="http://ukwebfocus.files.wordpress.com/2013/02/researchgate-members.png?w=320&#038;h=213" width="320" height="213"/></a><p class="wp-caption-text">Nos. of Researchgate Members in Aug 2012 (blue) &amp; Feb 2013 (red)</p></div>
<p>As illustrated in the accompanying diagrams, the number of researchers who have signed up for a ResearchGate account has grown significantly over the past six months and now stands at over 33,000 users, a growth of 82.7%. The number of papers which have been deposited by researchers at Russell Group universities has also grown, to a total of over 477,000 items. However, since this represents a growth of only 11.9% over six months, while membership has almost doubled, it suggests that new members are providing metadata records only and not depositing the full text.</p>
<p>I therefore believe that the conclusions I reached in my post which explained <a rel="nofollow" title="Permanent link to Why I&#x002019;m Evaluating&#xa0;ResearchGate" target="_blank" href="http://ukwebfocus.wordpress.com/2013/02/06/why-im-evaluating-researchgate/">Why I’m Evaluating ResearchGate</a> were correct, and that ResearchGate is a service which I should use not only to provide a presence for my research activities but also to host my research papers. I do wonder, though, whether the large number of items which have been deposited in ResearchGate is due to promotion of the service within the Russell Group universities or represents a bottom-up approach, in which researchers have recognised the benefits of the service and recommended it to their peers.</p>
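<p>The growth figures quoted above can be reproduced from the TOTAL row of the table. The following is a minimal sketch in Python. It assumes that the third and fifth numeric columns are the ResearchGate member totals for August 2012 and February 2013, and that the fourth and sixth are the corresponding publication totals; the function and variable names are my own.</p>

```python
def percent_growth(old, new):
    """Return the percentage change from old to new."""
    return (new - old) / old * 100.0


# Totals taken from the TOTAL row of the table. The pairing of columns
# with survey dates is an assumption, as noted in the text above.
members_aug_2012 = 18166
members_feb_2013 = 33191
items_aug_2012 = 426414
items_feb_2013 = 477103

member_growth = percent_growth(members_aug_2012, members_feb_2013)
item_growth = percent_growth(items_aug_2012, items_feb_2013)

print(f"Members: {member_growth:.1f}% growth")
print(f"Items:   {item_growth:.1f}% growth")

# If members grow faster than items, the number of items per member falls.
items_per_member_before = items_aug_2012 / members_aug_2012
items_per_member_after = items_feb_2013 / members_feb_2013
print(f"Items per member, Aug 2012: {items_per_member_before:.1f}")
print(f"Items per member, Feb 2013: {items_per_member_after:.1f}")
```

<p>The fall in items per member is consistent with the suggestion that newer members are adding fewer items, although it does not by itself show whether the items are metadata records or full text.</p>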
<hr />
<p>View Twitter conversation from: [<a rel="nofollow" target="_blank" href="http://topsy.com/ukwebfocus.wordpress.com/2013/02/12/profiling-use-of-third-party-research-repository-services/">Topsy</a>] | View Twitter statistics from: [<a rel="nofollow" target="_blank" href="http://tweetreach.com/reach?q=http%3A%2F%2Fukwebfocus.wordpress.com%2F2013%2F02%2F12%2Fprofiling-use-of-third-party-research-repository-services%2F">TweetReach</a>] – [<a rel="nofollow" target="_blank" href="https://bitly.com/Xyh0bO+/global">Bit.ly</a>]</p>
<br />Filed under: <a rel="nofollow" target="_blank" href='http://ukwebfocus.wordpress.com/category/openness/'>openness</a>, <a rel="nofollow" target="_blank" href='http://ukwebfocus.wordpress.com/category/repositories/'>Repositories</a>]]></content:encoded>
         <geo:lat>51.379915</geo:lat>
         <geo:long>-2.331708</geo:long>
         <media:content medium="image" url="http://2.gravatar.com/avatar/27731abff266f585f006998f65c74be9?s=96&amp;amp;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96">
            <media:title type="html">Brian Kelly (UK Web Focus)</media:title>
         </media:content>
         <media:content medium="image" url="http://ukwebfocus.files.wordpress.com/2013/02/oxford-researchgate.png">
            <media:title type="html">How significant is use of third-party repository services?</media:title>
         </media:content>
         <media:content medium="image" url="http://ukwebfocus.files.wordpress.com/2013/02/researchgate-publications.png">
            <media:title type="html">Nos. of items deposited in Researchgate in Aug 2012 (blue) &amp;amp; Feb 2013 (red)</media:title>
         </media:content>
         <media:content medium="image" url="http://ukwebfocus.files.wordpress.com/2013/02/researchgate-members.png">
            <media:title type="html">Nos. of Researchgate Members in Aug 2012 (blue) &amp;amp; Feb 2013 (red)</media:title>
         </media:content>
      </item>
      <item>
         <title>Week endNotes: Digital Infrastructure</title>
         <link>http://infteam.jiscinvolve.org/wp/2013/02/11/week-endnotes-digital-infrastructure/</link>
         <description>&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_blank" href="http://infteam.jiscinvolve.org/wp/files/2013/01/136Cockcroft070704LO.jpg"&gt;&lt;img class="aligncenter size-full wp-image-1656" src="http://infteam.jiscinvolve.org/wp/files/2013/01/136Cockcroft070704LO.jpg" alt="" width="425" height="264"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;It seems that all the cool kids produce weekly notes on their blogs reflecting back on the work that&amp;#8217;s taken place over the week! Take a look at the &lt;a rel="nofollow" target="_blank" href="http://digital.cabinetoffice.gov.uk/category/gds/week-notes/"&gt;Government&amp;#8217;s Digital Service&lt;/a&gt;, &lt;a rel="nofollow" target="_blank" href="http://berglondon.com/blog/tag/weeknotes/"&gt;Berg&lt;/a&gt; and &lt;a rel="nofollow" target="_blank" href="http://dougbelshaw.com/blog/tag/weeknotes/"&gt;Doug Belshaw&amp;#8217;s blog&lt;/a&gt; for some really good examples.&lt;/p&gt;
&lt;p&gt;So, with these Week endNotes the plan is to reflect on the past week of work, events and general happenings from the Digital Infrastructure team. This first week will have a definite library flavour about it&amp;#8230;&lt;/p&gt;
&lt;h2&gt;Library Systems&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;First up is a slight cheat as this &lt;a rel="nofollow" target="_blank" href="http://infteam.jiscinvolve.org/wp/2013/01/28/living-in-interesting-times-kualiole-announce-further-funding-from-mellon-foundation/"&gt;KualiOLE announcement&lt;/a&gt; was from last week! The community source library system &lt;a rel="nofollow" target="_blank" href="http://www.kuali.org/ole"&gt;Kuali Open Library Environment&lt;/a&gt; announced a further $750,000 in funding to help finish the development of the project. An important announcement for the project and for those institutions, in the US and elsewhere, who are watching the progress of the project.&lt;/li&gt;
&lt;li&gt;The &lt;a rel="nofollow" target="_blank" href="http://www.lmschange.info/blog/2013/01/the-library-service-platform-context-key-resources/"&gt;LMS Change landscape document&lt;/a&gt; has been published on the project blog. It provides a very good overview of the current Library Systems landscape and is well worth a read if you can.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Library Analytics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;This week saw the announcement of the very exciting &lt;a rel="nofollow" target="_blank" href="http://infteam.jiscinvolve.org/wp/2013/02/01/jisc-lamp-shedding-light-on-library-data-and-metrics/"&gt;Library Analytics and Metrics Project&lt;/a&gt; (jiscLAMP). The project will develop a shared library analytics service for UK HE. Stay tuned for more information soon!&lt;/li&gt;
&lt;li&gt;There has also been the recent release of the Activity Data Report entitled &lt;a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk/publications/reports/2012/activity-data-delivering-benefits.aspx"&gt;Delivering benefits from the data deluge&lt;/a&gt;. This is a great report and worth a look, both for background to the issues and for the use-cases and the implications for institutions of being able to analyse and act upon data effectively.&lt;/li&gt;
&lt;li&gt;Finally, but by no means least, one of the most interesting posts on the blog last week was &lt;a rel="nofollow" target="_blank" href="http://infteam.jiscinvolve.org/wp/2013/02/01/observing-the-web/"&gt;Observing the Web&lt;/a&gt; by my colleague Neil Grindley. The post describes the work of the &lt;a rel="nofollow" target="_blank" href="http://webscience.org/"&gt;Web Science Trust&lt;/a&gt; and their aim to build a global network of Web Observatories providing an open analytics environment to drive new forms of Web research. The implications of this work seem significant &amp;#8211; and are of particular importance to the LAMP project and its aim to aggregate different data sets in order to uncover and describe new and useful narratives.&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;So that&amp;#8217;s it for this week. Next week there won&amp;#8217;t be a Week endNote as I am away for work (it seems the experiment is over before it begins!), but they&amp;#8217;ll be back the week after.&lt;/div&gt;
&lt;div&gt;As a final link, I thought it might be worth noting the release of the &lt;a rel="nofollow" target="_blank" href="http://www.nmc.org/publications/2013-horizon-report-higher-ed"&gt;NMC Horizon Report for 2013&lt;/a&gt;. There are all the kinds of things you&amp;#8217;d expect to see: Mobile Apps, Augmented Reality and, of course, MOOCs. Wearable technologies and 3D printing are a little further away on the horizon too!&lt;/div&gt;</description>
         <author>Ben Showers</author>
         <guid isPermaLink="false">http://infteam.jiscinvolve.org/wp/?p=1654</guid>
         <pubDate>Mon, 11 Feb 2013 06:16:52 +0000</pubDate>
      </item>
      <item>
         <title>Why I’m Evaluating ResearchGate</title>
         <link>http://ukwebfocus.wordpress.com/2013/02/06/why-im-evaluating-researchgate/</link>
         <description>A PDF Repository for my Research Publications: In a recent post which explained Why I’m Now Embedding ORCID Metadata in PDFs I described my intentions to ensure that my research papers contain rich embedded metadata to help enhance the discoverability of the publications, ensure that authorship is asserted (by embedding the ORCID ID of the authors of the [&amp;#8230;]</description>
         <guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=13141</guid>
         <pubDate>Wed, 06 Feb 2013 10:00:00 +0000</pubDate>
         <content:encoded><![CDATA[<h2>A PDF Repository for my Research Publications</h2>
<p>In a recent post which explained <a rel="nofollow" title="Permanent link to Why I&#x002019;m Now Embedding ORCID Metadata in&#xa0;PDFs" target="_blank" href="http://ukwebfocus.wordpress.com/2013/01/28/why-im-now-embedding-orcid-metadata-in-pdfs/">Why I’m Now Embedding ORCID Metadata in PDFs</a> I described my intentions to ensure that my research papers contain rich embedded metadata to help enhance the discoverability of the publications, ensure that authorship is asserted (by embedding the ORCID ID of the authors of the papers) and ensure that embedded images contain descriptions which help ensure that the content can be understood by visually impaired readers. In addition I wish to ensure that the PDF is stored in PDF/A format, which is better suited to long-term preservation.</p>
<p>In light of discussions on the blog and by email I have decided to embed the ORCID IDs for co-authors of my peer-reviewed papers although, as suggested by Geoffrey Bilder, I will be embedding the HTTP URI version of the ORCID IDs (e.g. <a rel="nofollow" target="_blank" href="http://orcid.org/0000-0001-5875-8744">http://orcid.org/0000-0001-5875-8744</a>) rather than just the ORCID ID itself (0000-0001-5875-8744). In addition I will also be embedding the DOI for papers which have been assigned a DOI.</p>
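<p>Converting a bare ORCID ID into its HTTP URI form can be sketched in a few lines of Python. This is an illustration rather than the process I actually use: the function names are hypothetical, and the check-digit test follows the ISO 7064 MOD 11-2 scheme which ORCID documents for the final character of an identifier.</p>

```python
import re

# Four groups of four characters; the last character may be a digit or X.
ORCID_FORMAT = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid_id):
    """Return True if the ID has the right shape and check character."""
    if not ORCID_FORMAT.fullmatch(orcid_id):
        return False
    digits = orcid_id.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def orcid_uri(orcid_id):
    """Return the HTTP URI form of a bare ORCID ID (hypothetical helper)."""
    if not is_valid_orcid(orcid_id):
        raise ValueError("not a valid ORCID ID: " + orcid_id)
    return "http://orcid.org/" + orcid_id


print(orcid_uri("0000-0001-5875-8744"))
```

<p>Validating the identifier before embedding it guards against a mistyped digit ending up in the PDF metadata, where it would be difficult to spot later.</p>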
<p>But I am now faced with the problem of where the paper should be hosted. This post summarises the processes I am using in the selection of an appropriate repository service to complement my institutional repository.</p>
<h2>Selection Processes</h2>
<p>As <a rel="nofollow" target="_blank" href="http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/">described previously</a>, the workflow processes used in the creation of cover sheets for items hosted in our repository mean that metadata embedded in PDFs is lost. Although we&#8217;re having discussions with repository staff about this, it occurred to me that I now have an ideal opportunity to make use of a third-party repository service.</p>
<p>In the past I have normally deposited papers in my institutional repository and used third-party services (such as <a rel="nofollow" target="_blank" href="http://www.researchgate.net/">ResearchGate</a> and <a rel="nofollow" target="_blank" href="http://www.academia.edu/">academia.edu</a>) to host the metadata, with links being provided to the full text of the papers hosted in the institutional repository. The main reason for doing this was to ensure that usage statistics for accesses of the full text were available in a single location rather than being fragmented across a range of services. There was a need to minimise the effort in collating such statistics for the production of the evidence reports of our work which our funders have required in the past. However in light of the <a rel="nofollow" target="_blank" href="http://ukwebfocus.wordpress.com/2012/12/21/ukoln-looking-ahead/">recent announcement of the cessation of core-funding for UKOLN</a>, this is no longer a priority! Indeed it is now important to ensure that ideas described in peer-reviewed papers are widely disseminated.</p>
<h2>Using ResearchGate</h2>
<p>Having recognised the value of hosting PDF copies of my papers on a third-party repository service the question then was which one to select. The key criteria used in the selection were:</p>
<ul>
<li>Easy to upload files.</li>
<li>Popular with readers.</li>
<li>Resource is easily found using Google.</li>
<li>PDF files preserved intact.</li>
<li>Service appears to be viable.</li>
</ul>
<p><a rel="nofollow" target="_blank" href="http://ukwebfocus.files.wordpress.com/2013/01/researchgate-bath.png"><img class="alignright size-full wp-image-13183" style="margin-left:10px;margin-right:10px;" alt="Researchgate: University of Bath" src="http://ukwebfocus.files.wordpress.com/2013/01/researchgate-bath.png?w=388&#038;h=342" width="388" height="342"/></a>On 25 December 2012 I received an automated email from <a rel="nofollow" target="_blank" href="https://www.researchgate.net/">ResearchGate</a> which informed me that &#8220;<em>28 of your colleagues from University of Bath have joined ResearchGate in the last month</em>&#8221;. On 24 January 2013 an automated message announced &#8220;<em>44 of your colleagues recently joined ResearchGate</em>&#8221;. As illustrated, <a rel="nofollow" target="_blank" href="https://www.researchgate.net/institution/University_of_Bath">the University of Bath&#8217;s entry on ResearchGate</a> shows that there are currently researchers from 26 departments who have uploaded a total of 7,263 publications. It seems ResearchGate is growing in popularity, at least at the University of Bath.</p>
<p>On 20 December 2012 I was notified of the numbers of views of my papers (or, more accurately, the numbers of views of the metadata for my papers): &#8220;<em>Your published research was viewed <strong>1,678</strong> times in 2012</em>&#8221; so perhaps ResearchGate is popular beyond the University of Bath!</p>
<p>In light of the apparent popularity of the service I decided to upload one of my papers to the service: the PDF copy of the paper on &#8220;<em>Developing A Holistic Approach For E-Learning Accessibility</em>&#8221;.</p>
<p>It was trivial to upload the paper, especially as the associated metadata had been created previously. I then downloaded the PDF and was able to confirm that the metadata was still embedded in the PDF resource.</p>
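<p>A quick way of repeating this check on a downloaded file is to look for the ORCID URI in the raw bytes of the PDF. The sketch below is a rough test only, and the function name and file name are hypothetical. It assumes the metadata is stored uncompressed, as XMP packets usually are; metadata held in a compressed stream would be missed, so a negative result is not conclusive.</p>

```python
from pathlib import Path


def pdf_contains_text(pdf_path, needle):
    """Return True if the raw bytes of the file contain the given text.

    This is a plain byte search, not a PDF parse: it only finds text
    that is stored uncompressed in the file.
    """
    data = Path(pdf_path).read_bytes()
    return needle.encode("utf-8") in data


# Hypothetical usage with a file downloaded from the repository service:
# pdf_contains_text("downloaded-paper.pdf", "http://orcid.org/0000-0001-5875-8744")
```

<p>A positive result shows that the identifier survived the upload and download; it does not show that all of the other embedded metadata is intact.</p>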
<p>The <a rel="nofollow" target="_blank" href="https://www.researchgate.net/publication/216570896_Developing_A_Holistic_Approach_For_E-Learning_Accessibility">paper can be accessed from ResearchGate</a> and the user interface is shown below. I&#8217;ll leave others to judge the usability of the service.</p>
<div class="wp-caption alignnone" style="width:661px;"><a rel="nofollow" target="_blank" href="http://ukwebfocus.files.wordpress.com/2013/01/researchgate-cjtl-2004-paper.png"><img style="border:1px solid black;" alt="ResearchGate page for CJTL 2004 paper" src="http://ukwebfocus.files.wordpress.com/2013/01/researchgate-cjtl-2004-paper.png?w=651&#038;h=395" width="651" height="395"/></a><p class="wp-caption-text">Page on ResearchGate for one of my papers</p></div>
<p>But in addition to users who are linked directly to the paper or access resources on the ResearchGate service using the Web site&#8217;s browse and search functionality, what of the discoverability of resources using Google?</p>
<h2>ResearchGate, Google and Embedded Metadata</h2>
<p>The PDF version of the paper now contains content which will not be widely used elsewhere: a combination of the authors&#8217; names and their ORCID IDs. A Google search for &#8220;<em>Brian Kelly ORCID: 0000-0001-5875-8744</em>&#8221;, &#8220;<em>Lawrie Phipps ORCID: 0000-0002-0834-273X</em>&#8221; or &#8220;<em>Elaine Swift ORCID: 0000-0002-6101-6861</em>&#8221; should initially find information about the paper hosted on the UKOLN Web site, the UK Web Focus blog and other services which may be used by the co-authors, although not the institutional repository as this does not currently provide ORCID information (understandably, as ORCID is so new).</p>
<p>I have therefore provided links to the following Google searches which I will monitor to see when Google has indexed the PDFs hosted on ResearchGate:</p>
<table border="1">
<tbody>
<tr>
<td><strong>Search Term</strong></td>
<td><strong>Findings</strong></td>
<td><strong>Date</strong></td>
</tr>
<tr>
<td><em><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=Brian+Kelly+ORCID%3A+0000-0001-5875-8744">Brian Kelly ORCID: 0000-0001-5875-8744</a></em></td>
<td>Large number of hits from UK Web Focus blog<br />
together with ORCID, UKOLN and Slideshare Web sites</td>
<td>27 Jan 2013</td>
</tr>
<tr>
<td rowspan="2"><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=Lawrie+Phipps+ORCID%3A+0000-0002-0834-273X"><em>Lawrie Phipps ORCID: 0000-0002-0834-273X</em></a></td>
<td>5 hits (ORCID and UKOLN Web sites and UK Web Focus blog)</td>
<td>6 Feb 2013</td>
</tr>
<tr>
<td>4 hits (ORCID Web site and UK Web Focus blog)</td>
<td>27 Jan 2013</td>
</tr>
<tr>
<td rowspan="2"><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=Elaine+Swift+ORCID%3A+0000-0002-6101-6861"><em>Elaine Swift ORCID: 0000-0002-6101-6861</em></a></td>
<td>3 hits (ORCID and UKOLN Web sites and UK Web Focus blog)</td>
<td>6 Feb 2013</td>
</tr>
<tr>
<td>2 hits (ORCID Web site and UK Web Focus blog)</td>
<td>27 Jan 2013</td>
</tr>
</tbody>
</table>
<p>It appears that over a period of a week the ORCID metadata is being found from citation records hosted on the UKOLN Web site together with the citation records already indexed on the ORCID Web site and this blog, but not yet the PDF files hosted on ResearchGate. Might this be due to Google not indexing the researchgate.net site? In order to answer this question Google was used to provide information on the total number of resources on the service and the total number of PDF files. The results are given below.</p>
<table border="1">
<tbody>
<tr>
<td><strong>Purpose</strong></td>
<td><strong>Search Term</strong></td>
<td><strong>Number of results</strong></td>
<td><strong>Date</strong></td>
</tr>
<tr>
<td>Total number of resources on researchgate.net site</td>
<td><em><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=site:researchgate.net">site:researchgate.net</a></em></td>
<td>24,100,000 -<br />
55,300,000 <sup>*</sup></td>
<td>6 Feb 2013</td>
</tr>
<tr>
<td>Total number of PDF files on researchgate.net site</td>
<td><em><a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=site:researchgate.net+filetype%3Apdf">site:researchgate.net filetype:pdf</a></em></td>
<td>2,980,000</td>
<td>6 Feb 2013</td>
</tr>
</tbody>
</table>
<p>* The number of search results has fluctuated between 24,100,000 and 55,300,000 during the last few days.</p>
<p>It seems that a large number of PDF files hosted on ResearchGate have been indexed by Google, but that it takes longer than a week for new resources to be indexed and to appear in Google search results.</p>
<h2>Sustainability of the Service</h2>
<h3><a rel="nofollow" target="_blank" href="http://ukwebfocus.files.wordpress.com/2013/01/researchgate-numbers.png"><img class="alignright" style="margin-left:10px;margin-right:10px;border:1px solid black;" alt="Numbers of ResearchGate users" src="http://ukwebfocus.files.wordpress.com/2013/01/researchgate-numbers.png?w=286&#038;h=268" width="286" height="268"/></a>What Does The Evidence Say?</h3>
<p>The <a rel="nofollow" target="_blank" href="http://www.researchgate.net/">home page for the service</a> displays a graphic (to users who are not logged in) summarising the service&#8217;s usage figures. It seems that 2.4 million users have subscribed. Since these are likely to be researchers, this does appear to be a significant number.</p>
<p>But what else do we know about the service and the company which provides the service? TechCrunch provides <a rel="nofollow" target="_blank" href="http://techcrunch.com/tag/researchgate/">a handful of posts about the company</a> together with <a rel="nofollow" target="_blank" href="http://www.crunchbase.com/company/researchgate">the following summary</a>:</p>
<p style="padding-left:30px;"><em>ResearchGate is the leading social network for scientists. It offers tools and applications for researchers to interact and collaborate. ResearchGate offers a social, crowdsourced platform designed for researchers. The platform provides a global scientific web-based environment in which scientists can interact, exchange knowledge and collaborate with researchers of different fields.</em></p>
<p style="padding-left:30px;"><em>The results of ResearchGate’s new search engine, called ReFind, are not merely based on keywords, but selected in an intelligent way based on semantic, contextual correlations.</em></p>
<p><a rel="nofollow" target="_blank" href="http://ukwebfocus.files.wordpress.com/2013/01/researchgate-numbers-of-users-in-2012.png"><img class="alignleft" style="border-width:1px;border-style:solid;margin-right:10px;margin-bottom:10px;" alt="Researchgate: numbers of users in 2012" src="http://ukwebfocus.files.wordpress.com/2013/01/researchgate-numbers-of-users-in-2012.png?w=311&#038;h=126" width="311" height="126"/></a>In addition the article also provides a graph showing the numbers of users over the past year, based on <a rel="nofollow" target="_blank" href="http://siteanalytics.compete.com/researchgate.net/">figures provided by Compete</a>.</p>
<p>As can be seen, the numbers of unique visitors seem to be growing significantly, from 61,640K in December 2011 to 236,170K in December 2012.</p>
<p><a rel="nofollow" target="_blank" href="http://ukwebfocus.files.wordpress.com/2013/02/majestic-seo-figures-for-researchgate.png"><img class="alignright  wp-image-13248" style="border-width:1px;border-color:black;border-style:solid;margin:10px;" alt="MajesticSEO figures for Researchgate" src="http://ukwebfocus.files.wordpress.com/2013/02/majestic-seo-figures-for-researchgate.png?w=314&#038;h=228" width="314" height="228"/></a>I also <a rel="nofollow" target="_blank" href="https://www.majesticseo.com/reports/site-explorer/summary/researchgate.net?oq=researchgate.net&amp;IndexDataSource=F">used MajesticSEO to report on the SEO characteristics of the service</a> (note that a free subscription is required in order to view the findings). As can be seen there are 7,459 domains which have links to researchgate.net and a total of 177,945 backlinks. Although such figures need to be regarded with caution (for example, they can be skewed significantly by link spam) the number of links from educational domains (3,241) and the number of educational domains (551) may be more appropriate measures, due to the difficulty of creating educational domains to host link farms. This snapshot may therefore provide a useful baseline for measuring changes in the link popularity of the service.</p>
<h3>Terms and Conditions</h3>
<p>It should be noted that looking at the <a rel="nofollow" target="_blank" href="http://www.researchgate.net/application.TermsAndConditions.html">ResearchGate terms and conditions</a> I found no suggestions that the company claims rights to sell my data or my attention data to others (although I haven&#8217;t studied the terms and conditions in great detail). Although some may welcome this, others may wonder what the business model for the company is. An article entitled <a rel="nofollow" target="_blank" href="http://www.forbes.com/sites/alexknapp/2012/03/15/researchgate-wants-to-be-facebook-for-scientists/">ResearchGate Wants To Be Facebook For Scientists</a> published by Forbes in March 2012 described how:</p>
<p style="padding-left:30px;">&#8220;<em>ResearchGate will also be looking into ways to monetize its platform. The “no-brainer” way to do that, in Madisch’s words, is to provide job boards for scientists looking for jobs. Universities and companies would pay the site to place listings. The company is also looking for ways to partner with other companies that manufacture and sell biotech lab equipment, as well as several other different programs.&#8221;</em></p>
<p>Perhaps this is an appropriate business model which will be accepted by researchers who normally shy away from free services on the grounds that &#8220;<em>If You’re Not Paying for It; You’re the Product</em>&#8221;.</p>
<h2>Interest in UK HE Sector</h2>
<p>Although ResearchGate seems to be growing in popularity globally (and at the University of Bath), is there any evidence of interest within the UK&#8217;s higher education community? For me this is not necessarily a significant issue (it can be fine to be an early adopter) but it would be interesting to see what others in my community are saying about the service.</p>
<p>Using a <a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=researchgate+terms+and+conditions">Google search for &#8220;<em>researchgate terms and conditions</em>&#8221;</a> I found that the DCC has provided <a rel="nofollow" target="_blank" href="http://www.dcc.ac.uk/resources/external/researchgate">a summary of ResearchGate</a> in its list of resources for digital curators, with <a rel="nofollow" target="_blank" href="http://www.ed.ac.uk/schools-departments/humanities-soc-sci/research-ke/news/current-news/researchgate">a similar resource</a> being provided by the University of Edinburgh&#8217;s College of Humanities and Social Science. A <a rel="nofollow" target="_blank" href="https://www.google.co.uk/search?q=researchgate+uk">Google search for &#8220;<em>researchgate UK</em>&#8221;</a> finds a number of additional resources from the sector including pages provided by the University of Leeds (<a rel="nofollow" target="_blank" href="http://www.leeds.ac.uk/comms/for_staff/Researchgate.pdf">PDF format</a>), the <a rel="nofollow" target="_blank" href="http://www2.le.ac.uk/library/find/databases/r/researchgate">University of Leicester</a>, the University of Liverpool (<a rel="nofollow" target="_blank" href="http://www.liv.ac.uk/gradschool/tools_resources/attachments/ResearchGateHANDOUT.pdf">PDF format</a>) and the <a rel="nofollow" target="_blank" href="http://insight.glos.ac.uk/academicschools/nss/members/undergraduates/biosciences/biosciencenews/Pages/researchgate.aspx">University of Gloucestershire</a> together with blog posts at <a rel="nofollow" target="_blank" href="http://blog.lboro.ac.uk/researchstaff/researchgate/">Loughborough University</a> and the <a rel="nofollow" target="_blank" href="http://blogs.warwick.ac.uk/libresearch/entry/mendeley_and_researchgate/">University of Warwick</a>.</p>
<h2>My Decision</h2>
<p>In light of these figures and my experiences in using the service I am happy to use it to provide additional exposure for my research papers, complementing the master copies which are hosted on my institutional repository. Are other researchers making similar decisions or are alternative services felt to provide better options?</p>
<br />Filed under: <a rel="nofollow" target="_blank" href='http://ukwebfocus.wordpress.com/category/openness/'>openness</a>, <a rel="nofollow" target="_blank" href='http://ukwebfocus.wordpress.com/category/repositories/'>Repositories</a>]]></content:encoded>
         <geo:lat>51.379915</geo:lat>
         <geo:long>-2.331708</geo:long>
         <media:content medium="image" url="http://2.gravatar.com/avatar/27731abff266f585f006998f65c74be9?s=96&amp;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96">
            <media:title type="html">Brian Kelly (UK Web Focus)</media:title>
         </media:content>
         <media:content medium="image" url="http://ukwebfocus.files.wordpress.com/2013/01/researchgate-bath.png">
            <media:title type="html">Researchgate: University of Bath</media:title>
         </media:content>
         <media:content medium="image" url="http://ukwebfocus.files.wordpress.com/2013/01/researchgate-cjtl-2004-paper.png">
            <media:title type="html">ResearchGate page for CJTL 2004 paper</media:title>
         </media:content>
         <media:content medium="image" url="http://ukwebfocus.files.wordpress.com/2013/01/researchgate-numbers.png">
            <media:title type="html">Numbers of ResearchGate users</media:title>
         </media:content>
         <media:content medium="image" url="http://ukwebfocus.files.wordpress.com/2013/01/researchgate-numbers-of-users-in-2012.png">
            <media:title type="html">Researchgate: numbers of users in 2012</media:title>
         </media:content>
         <media:content medium="image" url="http://ukwebfocus.files.wordpress.com/2013/02/majestic-seo-figures-for-researchgate.png">
            <media:title type="html">MajesticSEO figures for Researchgate</media:title>
         </media:content>
      <enclosure url="http://www.leeds.ac.uk/comms/for_staff/Researchgate.pdf" length="2547437" type="application/pdf" /><itunes:subtitle>A PDF Repository for my Research Publications In a recent post which explained Why I’m Now Embedding ORCID Metadata in PDFs I described my intentions to ensure that my research papers contain rich embedded metadata to help enhance the discoverability of </itunes:subtitle><itunes:summary>A PDF Repository for my Research Publications In a recent post which explained Why I’m Now Embedding ORCID Metadata in PDFs I described my intentions to ensure that my research papers contain rich embedded metadata to help enhance the discoverability of the publications, ensure that authorship is asserted (by embedding the ORCID ID of the authors of the [&amp;#8230;]</itunes:summary></item>
      <item>
         <title>Call for feedback to the ResourceSync specification for synchronisation of web resources</title>
         <link>http://blog.paulwalk.net/2013/02/01/call-for-feedback-to-the-resourcesync-specification-for-synchronisation-of-web-resources/</link>
         <description>I have been slightly involved (through Jisc funding) with the ResourceSync specification project, being led by Herbert Van de Sompel of the Los Alamos National Laboratory. The project has just released a draft specification, which is available at http://www.openarchives.org/rs/. The draft will be available for public comment until March 15th 2013 &amp;#8211; you are invited [...]</description>
         <guid isPermaLink="false">http://blog.paulwalk.net/?p=345</guid>
         <pubDate>Fri, 01 Feb 2013 18:04:01 +0000</pubDate>
         <content:encoded><![CDATA[<p><img src="http://blog.paulwalk.net/wp-content/uploads/sites/6/2013/02/resourcesync_logo.gif" alt="Resourcesync logo" title="resourcesync_logo.gif" border="0" width="72" height="66" style="float:left;margin:3px;"/>I have been slightly involved (through <a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk">Jisc</a> funding) with the <a rel="nofollow" target="_blank" href="http://www.niso.org/workrooms/resourcesync/">ResourceSync</a> specification project, being led by Herbert Van de Sompel of the Los Alamos National Laboratory. The project has just released a draft specification, which is available at <a rel="nofollow" target="_blank" href="http://www.openarchives.org/rs/">http://www.openarchives.org/rs/</a>.</p>

<p>The draft will be available for public comment until March 15th 2013 &#8211; you are invited to comment via the <a rel="nofollow" target="_blank" href="https://groups.google.com/forum/?fromgroups#!forum/resourcesync">ResourceSync Google Group</a>. Group discussions are openly accessible; posting requires group membership.</p>

<p>In Herbert&#8217;s words:</p>

<blockquote>
  <p>The ResourceSync specification describes a synchronisation framework for the web that consists of various capabilities that allow third party systems to remain synchronised with a server&#8217;s evolving resources. The capabilities may be combined in a modular manner to meet local or community requirements. The specification also describes how a server can advertise the synchronisation capabilities it supports and how third party systems can discover this information. The document formats used in the synchronisation framework are based on the widely adopted <em>Sitemap</em> protocol.</p>
</blockquote>

<p>ResourceSync is a collaboration between the <a rel="nofollow" target="_blank" href="http://niso.org">National Information Standards Organization (NISO)</a> and the <a rel="nofollow" target="_blank" href="http://www.openarchives.org">Open Archives Initiative (OAI)</a>. It is funded by the <a rel="nofollow" target="_blank" href="http://www.sloan.org/">Alfred P. Sloan Foundation</a> and <a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk">Jisc</a>.</p>]]></content:encoded>
      </item>
      <item>
         <title>Observing the Web</title>
         <link>http://infteam.jiscinvolve.org/wp/2013/02/01/observing-the-web/</link>
         <description>&lt;p&gt;Like a lot of people, when I think about it, or when I&amp;#8217;m reminded about it, I understand that the Web is a place where someone is always watching what you do. I understand that &amp;#8230; but then I think, well &amp;#8230; the Web is such a huge beast; such a vast ocean; such a giant metropolis where the comings and goings of individuals are insignificant. How and why would anyone notice what I&amp;#8217;m looking at and which links I&amp;#8217;m clicking on?&lt;/p&gt;
&lt;p&gt;Then up pops Tom Barnett from Switch Concepts Ltd. at a meeting yesterday to tell us that &amp;#8216;Google has a file the size of an encyclopedia on everyone in this room.&amp;#8217;&lt;/p&gt;
&lt;p&gt;Hmmm &amp;#8230; that&amp;#8217;s not a particularly comfortable idea for someone to put in your head. I start to feel a vague sense of paranoia creeping through my mind.&lt;/p&gt;
&lt;p&gt;And then I think, c&amp;#8217;mon Neil, pull yourself together! Google really doesn&amp;#8217;t care who you are. They just want to put things in your line of sight that are more rather than less likely to get you to open your wallet and part with your wages!!&lt;/p&gt;
&lt;p&gt;Such were the thoughts that were buzzing around my head yesterday at an event organised by the Web Science Trust (&lt;a rel="nofollow" target="_blank" href="http://webscience.org" title="Web Science Trust"&gt;http://webscience.org&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The meeting was entitled &amp;#8216;Observing the Web&amp;#8217; and the purpose was to highlight some of the work that the Web Science Trust and their partners and collaborators are doing to build a global network of Web Observatories providing an open analytics environment to drive new forms of Web research. We went round the room doing introductions and Dame Wendy Hall ended up branding us a &amp;#8216;motley crew&amp;#8217;. Academics, industry players, not-for-profits, technologists, funders, charities, a lawyer. (Quite a respectable looking motley crew in the very smart surroundings of the Royal Society I might add). But &amp;#8216;motley crew&amp;#8217; felt about right for a topic and a collaborative, academic, open activity that is still exploring the territory and testing new ground. Presumably in contrast to the well-resourced, sophisticated and highly developed (but opaque) methods employed by the corporate observers of the Web (Facebook, Amazon, Google, Microsoft, Yahoo etc.).&lt;/p&gt;
&lt;p&gt;The point of all of this &amp;#8216;observing&amp;#8217; is not to try and take account of every little bit of data and content on the web, but rather to understand what the aggregated use of the Web can tell us; how trends and fashions and changes of behaviour in relation to the Web might illuminate aspects of our society and culture, both now and for future students and researchers.&lt;/p&gt;
&lt;p&gt;This was all of great interest to Jisc. We are currently working with the British Library, the Oxford Internet Institute and the Institute of Historical Research on an initiative that aligns very well with the notion of the Web Observatory.&lt;/p&gt;
&lt;p&gt;The Big Data project (&lt;a rel="nofollow" target="_blank" href="http://www.oii.ox.ac.uk/research/projects/?id=88" title="Big Data Project"&gt;http://www.oii.ox.ac.uk/research/projects/?id=88&lt;/a&gt;) and the AADDA project (&lt;a rel="nofollow" target="_blank" href="http://www.history.ac.uk/projects/digital/AADDA" title="AADDA Project"&gt;http://www.history.ac.uk/projects/digital/AADDA&lt;/a&gt;) are both using a copy of the Internet Archive&amp;#8217;s collection of UK domain websites collected over the period 1996-2010, to examine new ways to engage with the web at domain level, and develop new forms of research that leverage the scale of the web. As the name of the Oxford project says &amp;#8230; it&amp;#8217;s all about using &amp;#8216;Big Data&amp;#8217;.&lt;/p&gt;
&lt;p&gt;This was work that emerged from influential JISC-funded reports commissioned in 2010 -&lt;/p&gt;
&lt;p&gt;Researcher Engagement with Web Archives&lt;br /&gt;
&lt;a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk/whatwedo/programmes/preservation/researcherengagementwithWA.aspx" title="Researcher Engagement with Web Archives"&gt; http://www.jisc.ac.uk/whatwedo/programmes/preservation/researcherengagementwithWA.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As we heard at the meeting, the academic observatory is a very different proposition to the corporate observatory and comes with enormous challenges including: interoperability (how do we link observatories?); access (aside from Twitter, which of the big corporates will let us use their data?); privacy (will people feel spied upon?); and sustainability (what is the business model?).&lt;/p&gt;
&lt;p&gt;A fascinating meeting and a big topic. There will be more discussion in early May at the ACM Web Science Meeting in Paris.&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_blank" href="http://www.websci13.org/registration/" title="ACM Web Science 2013"&gt;http://www.websci13.org/registration/&lt;/a&gt;&lt;/p&gt;
</description>
         <author>Neil Grindley</author>
         <guid isPermaLink="false">http://infteam.jiscinvolve.org/wp/?p=1681</guid>
         <pubDate>Fri, 01 Feb 2013 16:09:02 +0000</pubDate>
      </item>
      <item>
         <title>Jisc LAMP &amp;#8211; Shedding Light on Library Data and Metrics</title>
         <link>http://infteam.jiscinvolve.org/wp/2013/02/01/jisc-lamp-shedding-light-on-library-data-and-metrics/</link>
         <description>&lt;div id="attachment_1630" class="wp-caption aligncenter" style="width:458px;"&gt;&lt;a rel="nofollow" target="_blank" href="http://infteam.jiscinvolve.org/wp/files/2013/01/New-Picture-15.jpg"&gt;&lt;img class="size-full wp-image-1630" src="http://infteam.jiscinvolve.org/wp/files/2013/01/New-Picture-15.jpg" alt="Indianapolis Museum of Art analytics dashboard" width="448" height="216"/&gt;&lt;/a&gt;&lt;p class="wp-caption-text"&gt;Indianapolis Museum of Art analytics dashboard&lt;/p&gt;&lt;/div&gt;
&lt;p&gt;The current buzz and enthusiasm around &lt;em&gt;data &lt;/em&gt;is difficult to avoid. Whether it&amp;#8217;s the challenges of &amp;#8216;big data&amp;#8217;, the potential of data-driven techniques, or simply the power and clarity of data visualisations; data is inescapably entwined with much of what we do, and the systems and services we provide and use.&lt;/p&gt;
&lt;p&gt;Long before we were all swept up in this data euphoria, libraries had understood the potential and opportunities of collecting, managing and acting on the myriad sources of data they create through their systems and services. &lt;strong&gt;A recent survey of libraries by Mimas highlighted that the use of data to inform decision making was a top priority for the next 5 years&lt;/strong&gt;. At the same time it is clear from Jisc&amp;#8217;s &lt;a rel="nofollow" target="_blank" href="http://www.activitydata.org/"&gt;Activity Data programme&lt;/a&gt; that there is a lack of viable tools and services for libraries to effectively exploit the broad range of data available.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A new project has just started at Jisc called the Library Analytics and Metrics project (jiscLAMP) &lt;/strong&gt;which aims to enable libraries to capitalise on this data and support the improvement and development of new services (enhanced collections management, personalised services, tailored student support) and demonstrate value and impact in new ways across the institution (student attainment and satisfaction, identify changing research patterns, for example).&lt;/p&gt;
&lt;p&gt;Besides LAMP providing ample opportunities for nice puns around shining a light onto library data,&lt;strong&gt; the project will be developing a prototype shared library analytics service for UK academic libraries&lt;/strong&gt;. Initially this is being envisioned as a kind of data dashboard, bringing together disparate data sets and visualising them in an attractive and meaningful way.&lt;/p&gt;
&lt;p&gt;The project is a partnership between Jisc, Mimas (at the University of Manchester) and the University of Huddersfield running until October this year when it will deliver a prototype service for use by the community.&lt;/p&gt;
&lt;p&gt;While the project will establish specific analytics use-cases for the data, the project intends to initially focus on services and projects such as &lt;a rel="nofollow" target="_blank" href="http://copac.ac.uk/innovations/activity-data/"&gt;Copac activity data&lt;/a&gt;, the &lt;a rel="nofollow" target="_blank" href="http://library.hud.ac.uk/blogs/projects/lidp/"&gt;Library Impact Data project&lt;/a&gt;, &lt;a rel="nofollow" target="_blank" href="http://jusp.mimas.ac.uk/"&gt;JUSP&lt;/a&gt;, and &lt;a rel="nofollow" target="_blank" href="http://www.irus.mimas.ac.uk/"&gt;IRUS&lt;/a&gt;. As the project iterates through prototypes and use-cases so the depth and breadth of data sources and services will change too.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ultimately the project will be exploiting the potential that bringing these various services and emerging data sets together will have: unearthing new narratives and unforeseen connections.&lt;/strong&gt; While individually these services and data sets are powerful, the opportunities for new insights and evidence grow exponentially as these data sets are brought together and interrogated as a group.&lt;/p&gt;
&lt;p&gt;So that was a very brief introduction to LAMP.&lt;/p&gt;
&lt;p&gt;At this point I was going to simply end this post with some of the aims and objectives of the project. But, instead I thought I&amp;#8217;d do something a little different and give you a &amp;#8216;vision&amp;#8217; and goals for the eventual prototype. Come October, this is what you can expect&amp;#8230;.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;vision&lt;/strong&gt; of the library analytics and metrics project is to put data at the fingertips of librarians to improve student attainment and satisfaction and achieve new efficiencies and economies through innovative services and tailored support.&lt;/p&gt;
&lt;p&gt;To achieve this the project will meet the following &lt;strong&gt;goals&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Provide insight&lt;/strong&gt;: Enable analysis that provides new insights, resulting in new services and practices that provide additional value to students and researchers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Diversity of data&lt;/strong&gt;: Deliver data from a minimum of three distinct data sources to meet a range of use-cases, such as improving student retention and attainment, personalised services and demonstrating value&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sustainable&lt;/strong&gt;: Ensure the project is based on a sustainable foundation to provide long-term support for the academic library community&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Community based&lt;/strong&gt;: Enable the library community to do things together that are not possible separately. Through the community advisory group users will be involved from the beginning, and the prototype service will engage a minimum of 8 institutions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you would like to find out more then take a look at the&lt;strong&gt; &lt;a rel="nofollow" target="_blank" href="http://jisclamp.mimas.ac.uk/"&gt;project website&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;
</description>
         <author>Ben Showers</author>
         <guid isPermaLink="false">http://infteam.jiscinvolve.org/wp/?p=1622</guid>
         <pubDate>Fri, 01 Feb 2013 10:27:12 +0000</pubDate>
      </item>
      <item>
         <title>Jisc Report on Sustaining Our Digital Future: Institutional Strategies for Digital Content</title>
         <link>http://ukwebfocus.wordpress.com/2013/01/30/jisc-report-on-sustaining-our-digital-future-institutional-strategies-for-digital-content/</link>
         <description>Earlier today the Jisc announced the launch of a report on Sustaining Our Digital Future: Institutional Strategies for Digital Content. This report, which provides a close look at three institutions (UCL, Imperial War Museums and the National Library of Wales) in the United Kingdom confirms: How fragmented the digital landscape is at universities and within [&amp;#8230;]&lt;img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=ukwebfocus.wordpress.com&amp;#038;blog=497535&amp;#038;post=13188&amp;#038;subd=ukwebfocus&amp;#038;ref=&amp;#038;feed=1" width="1" height="1"/&gt;</description>
         <guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=13188</guid>
         <pubDate>Wed, 30 Jan 2013 13:50:14 +0000</pubDate>
         <content:encoded><![CDATA[<p><a rel="nofollow" target="_blank" href="http://ukwebfocus.files.wordpress.com/2013/01/jisc-sca-report-2013-01.png"><img class="alignright  wp-image-13189" style="margin-left:10px;margin-right:10px;" alt="JISC SCA report" src="http://ukwebfocus.files.wordpress.com/2013/01/jisc-sca-report-2013-01.png?w=302&#038;h=433" width="302" height="433"/></a>Earlier today <a rel="nofollow" target="_blank" href="http://sca.jiscinvolve.org/wp/2013/01/30/new-sca-ithaka-sr-report-and-videos-sustaining-our-digital-future/">the Jisc announced the launch of a report</a> on <em>Sustaining Our Digital Future: Institutional Strategies for Digital Content</em>.</p>
<p>This report, which provides a close look at three institutions (UCL, Imperial War Museums and the National Library of Wales) in the United Kingdom, confirms:</p>
<ul>
<li>How fragmented the digital landscape is at universities and within other organisations.</li>
<li>How there are examples of good practice within and outside higher education that all can learn from but that greater co-ordination is required to deliver this at a UK level.</li>
<li>How little the topic of post-build sustainability comes up at the higher levels of administration.</li>
<li>How risk is present within the current system, concerning the sustainability of digital content.</li>
</ul>
<p>The report (which is available in <a rel="nofollow" target="_blank" href="http://sca.jiscinvolve.org/wp/files/2013/01/Sustaining-our-digital-future-FINAL-31.pdf">PDF format</a>) is substantial, containing 88 pages. In addition to this main report, a second document (also available in <a rel="nofollow" target="_blank" href="http://sca.jiscinvolve.org/wp/files/2013/01/sustainability_healthcheck_tool.pdf">PDF format</a>) provides a &#8220;<em>Sustainability Health Check Tool for Digital Content Projects</em>&#8221;.</p>
<p>This report is very timely, arriving at a time when we are seeing reductions in the levels of funding available across public sector organisations in the UK, which will lead to questions regarding the sustainability of existing online services and digital resources.</p>
<p>The report is based on a study conducted by Ithaka S+R, with funding from the Jisc-led Strategic Content Alliance, which reported on findings of earlier studies showing that both funders and project leaders rely heavily on their host institutions to support and sustain digital content, beyond the end of the grant. But what will happen when the host institutions have significantly reduced levels of funding to continue to maintain and develop such content?</p>
<p>The report describes the need for an &#8220;<em>early and honest appraisal of which projects are likely to require .. support post-launch</em>&#8221;:</p>
<ul>
<li><em><strong>Digital content, requiring just “maintenance”</strong>: These may not require ongoing growth, but certainly do require a clear exit plan to ensure that the content will be smoothly deposited and integrated into some other site, database, or repository. The issue of ongoing investment does not disappear; it just becomes the concern of the larger platform on which this piece of content now lives.</em></li>
<li><em><strong>Digital resources, requiring ongoing growth and investment</strong>: These require early sustainability planning, including identifying institutional or other partners and careful consideration of the full range of costs and activities needed to keep the resource vibrant.</em></li>
</ul>
<p>The Sustainability Health Check Tool provides a paper-based checklist for those with responsibilities for managing digital content. The tool covers a number of areas, including ongoing support; audience, usage and impact assessment; and preservation issues.</p>
<p>A series of video clips has been produced to accompany the launch of this report. It was particularly interesting to hear the comment from Prof David Price, Vice-Provost (Research) at UCL:</p>
<p style="padding-left:30px;">&#8220;<em>We&#8217;re not just worried about things disappearing but about things never appearing! They are hosted all over the place, and not all the projects have a sustainable plan</em>.&#8221;</p>
<p>This video clip <a rel="nofollow" target="_blank" href="http://www.youtube.com/watch?feature=player_embedded&amp;v=HjH-FsWuAkw">is available on YouTube</a>.</p>
<br />Filed under: <a rel="nofollow" target="_blank" href='http://ukwebfocus.wordpress.com/category/preservation/'>preservation</a>]]></content:encoded>
         <geo:lat>51.379915</geo:lat>
         <geo:long>-2.331708</geo:long>
         <media:content medium="image" url="http://2.gravatar.com/avatar/27731abff266f585f006998f65c74be9?s=96&amp;d=http%3A%2F%2F2.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96">
            <media:title type="html">Brian Kelly (UK Web Focus)</media:title>
         </media:content>
         <media:content medium="image" url="http://ukwebfocus.files.wordpress.com/2013/01/jisc-sca-report-2013-01.png">
            <media:title type="html">JISC SCA report</media:title>
         </media:content>
      <enclosure url="http://sca.jiscinvolve.org/wp/files/2013/01/Sustaining-our-digital-future-FINAL-31.pdf" length="4468396" type="application/pdf" /><itunes:subtitle>Earlier today the Jisc announced the launch of a report on Sustaining Our Digital Future: Institutional Strategies for Digital Content. This report, which provides a close look at three institutions (UCL, Imperial War Museums and the National Library of W</itunes:subtitle><itunes:summary>Earlier today the Jisc announced the launch of a report on Sustaining Our Digital Future: Institutional Strategies for Digital Content. This report, which provides a close look at three institutions (UCL, Imperial War Museums and the National Library of Wales) in the United Kingdom confirms: How fragmented the digital landscape is at universities and within [&amp;#8230;]</itunes:summary></item>
      <item>
         <title>Living in interesting times: KualiOLE announce further funding from Mellon Foundation</title>
         <link>http://infteam.jiscinvolve.org/wp/2013/01/28/living-in-interesting-times-kualiole-announce-further-funding-from-mellon-foundation/</link>
         <description>&lt;p&gt;I thought it was worth posting this announcement I saw about the &lt;strong&gt;&lt;a rel="nofollow" target="_blank" href="http://www.kuali.org/ole"&gt;KualiOLE project&lt;/a&gt; receiving further funding from the Mellon Foundation &lt;/strong&gt;($750,000 in total).&lt;/p&gt;
&lt;p&gt;While the KualiOLE project is a partnership between US research libraries, it is also a project which is generating some interest within the UK library community.&lt;/p&gt;
&lt;p&gt;I don&amp;#8217;t think it would be an overstatement to claim that &lt;strong&gt;the Library Systems landscape is in a particularly interesting phase at the moment&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;A few years ago, in 2007/8, Jisc and SCONUL released their &lt;a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk/publications/briefingpapers/2008/librarymanagementbp.aspx"&gt;LMS landscape report&lt;/a&gt;, which reflected on the library community&amp;#8217;s frustration at the perceived lack of visible innovation within the systems space.&lt;/p&gt;
&lt;p&gt;Over the past few years &lt;strong&gt;that landscape has been somewhat transformed with new proprietary systems from all the major library systems vendors&lt;/strong&gt;, as well as the first implementation of an &lt;a rel="nofollow" target="_blank" href="http://blogs.staffs.ac.uk/informationlandscape/2010/12/10/staffordshire-university-chooses-koha-for-its-new-library-system/"&gt;open source solution at Staffordshire University&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;And we shouldn&amp;#8217;t forget the impact of the &lt;a rel="nofollow" target="_blank" href="http://talis-systems.com/2011/03/talis-group-completes-the-sale-of-its-library-division-to-capita-group-plc/"&gt;sale of Talis Group&amp;#8217;s Library Division&lt;/a&gt; to Capita.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;This agitation of the fragile library systems ecology has continued apace, and things do not seem to be slowing down&amp;#8230;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Late last year, the Bloomsbury Library Management Systems Consortium made a &lt;a rel="nofollow" target="_blank" href="http://www.blms.ac.uk/blms-solution-decision-in-principle/"&gt;decision in principle&lt;/a&gt; to develop a new, 21st-century library system with KualiOLE as its platform. There is a &lt;a rel="nofollow" target="_blank" href="http://www.blms.ac.uk/scanning-the-horizon/"&gt;great post&lt;/a&gt; by John Robinson of &lt;a rel="nofollow" target="_blank" href="http://www.soas.ac.uk/library/"&gt;SOAS&lt;/a&gt; on what led them to the decision and what the library systems landscape looks like from their perspective at the moment.&lt;/p&gt;
&lt;p&gt;At the same time some of the projects involved in the Jisc &lt;a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk/whatwedo/programmes/di_informationandlibraries/emergingopportunities/librarysystems.aspx"&gt;Library Systems programme&lt;/a&gt; have had their interest sparked by the KualiOLE project.&lt;/p&gt;
&lt;p&gt;This interest and general awareness was catalysed by the &lt;a rel="nofollow" target="_blank" href="http://www.sconul.ac.uk/news/kuali-ole-seminar"&gt;SCONUL KualiOLE seminar&lt;/a&gt; held prior to the SCONUL Winter conference in December 2012, which a number of institutions attended.&lt;/p&gt;
&lt;p&gt;A number of projects and libraries posted thoughts on the work of KualiOLE; a sample is below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Benefits of Sharing project in Scotland &lt;a rel="nofollow" target="_blank" href="http://libraryblogs.is.ed.ac.uk/benefitsofsharing/2013/01/tbos-learns-about-kualiole/"&gt;blogged some thoughts&lt;/a&gt; on the project.&lt;/li&gt;
&lt;li&gt;Richard Nurse of the OU &lt;a rel="nofollow" target="_blank" href="http://libwebrarian.wordpress.com/2012/12/10/kuali-ole-seminar/"&gt;reflected&lt;/a&gt; on the seminar.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There has also been a very fruitful partnership between KualiOLE, Jisc and JISC Collections to build an open web service that provides electronic resource information on a global scale.&lt;/p&gt;
&lt;p&gt;The &lt;a rel="nofollow" target="_blank" href="http://gokb.org/"&gt;Global Open Knowledgebase (GOKb) project&lt;/a&gt; is working in partnership with &lt;a rel="nofollow" target="_blank" href="http://www.jisc-collections.ac.uk/knowledgebaseplus/"&gt;knowledgebase+ (KB+)&lt;/a&gt; and sharing a common data model to help address global electronic resource management issues such as data accuracy and transfer formats. &lt;strong&gt;It is also helping to establish a global library community working together to solve some of these everyday frustrations and challenges.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GOKb is set to deliver an open, community-based, international data repository that will provide libraries with publication information about electronic resources, supporting libraries in providing efficient and effective services to their users and ensuring that critical electronic collections are available to their students and researchers&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;It will help underpin the KualiOLE management system and provide valuable data and collaboration for KB+ and the UK library community.&lt;/p&gt;
&lt;p&gt;So these are very exciting times for libraries (in both the UK and elsewhere) as they reflect on the requirements of their users and the functionality and agility of their systems. And, while libraries find themselves navigating through these constantly treacherous technological waters, it may be worth keeping an eye on the work of the &lt;a rel="nofollow" target="_blank" href="http://www.lmschange.info/blog/"&gt;LMS Change project&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Like a bright star in the sky, it may help guide you to slightly calmer waters!&lt;/p&gt;
</description>
         <author>Ben Showers</author>
         <guid isPermaLink="false">http://infteam.jiscinvolve.org/wp/?p=1634</guid>
         <pubDate>Mon, 28 Jan 2013 10:16:46 +0000</pubDate>
      </item>
      <item>
         <title>Activity data &amp;#8211; delivering benefits from the data deluge</title>
         <link>http://infteam.jiscinvolve.org/wp/2012/12/17/activity-data-delivering-benefits-from-the-data-deluge/</link>
         <description>&lt;p&gt;We are pleased to announce a new report that explores how activity data and analytics can benefit universities and proposes how institutions can cope with the associated challenges and opportunities. The report is called Activity data &amp;#8211; delivering benefits from the data deluge and is available on the &lt;a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk/publications/reports/2012/activity-data-delivering-benefits.aspx"&gt;Jisc website now&lt;/a&gt;. The eagle-eyed will have spotted a link to it in the &lt;a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk/inform/inform35/FutureTrendsDataAnalytics.html"&gt;current issue of Jisc inform&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The report was written by David Kay of Sero Consulting and Mark van Harmelen of Headtek and it builds on the &lt;a rel="nofollow" target="_blank" href="http://www.activitydata.org/"&gt;work we have been doing with activity data over the last couple of years&lt;/a&gt;. Over those two years it has felt that activity data has moved from being a relatively fringe and immature area in universities to something that is likely to be of vital importance in the next few years.&lt;/p&gt;
&lt;p&gt;I think that this is emphasised by a flurry of exciting new developments. My colleague Myles Danson has worked with CETIS to release the &lt;a rel="nofollow" target="_blank" href="http://publications.cetis.ac.uk/c/analytics"&gt;Analytics Series&lt;/a&gt;. This is a series of seven useful and interesting reports that explore analytics from a number of different angles. This includes thinking about the implications for research and teaching and learning.&lt;/p&gt;
&lt;p&gt;I&amp;#8217;ll pause here to explain what I see as the difference between analytics and activity data. Analytics is a broad heading for the mining of data to inform business decisions or provide improved services to end users. Activity data is one type of data that falls under the analytics heading. Activity data specifically focuses on the data recorded about a user&amp;#8217;s actions when they interact with a website or software or even a physical space.&lt;/p&gt;
&lt;p&gt;Another exciting development is a project to explore a shared library analytics service. This project is seeking to develop a pilot shared service that builds on some of the experiments we have been doing in our activity data work. It is expected to complete in Autumn 2013 and should provide libraries with a useful new way to study how their services are working and to gather data to inform crucial decisions over allocation of resources. More detail will be available on this soon.&lt;/p&gt;
&lt;p&gt;One project that will be an important part of delivering the library analytics suite will be Huddersfield&amp;#8217;s Library Impact Data project. They released version 2 of their toolkit last week. So if you can&amp;#8217;t wait for the library analytics suite to start exploring your library activity data then head over to their &lt;a rel="nofollow" target="_blank" href="http://library.hud.ac.uk/blogs/projects/lidp/"&gt;blog for more information&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;So, there is a lot going on. That makes the Activity Data report even more timely since it provides an accessible and useful introduction to the topic. The report discusses the benefits that are on offer to institutions. It includes case studies on UK and US institutions who are leading the way with activity data. It finishes by offering some pointers on strategies that may be useful in getting ready to seize the opportunities offered by activity data.&lt;/p&gt;
&lt;p&gt;This is a fast moving area and it looks like 2013 should see some even more exciting developments.&lt;/p&gt;</description>
         <author>Andy McGregor</author>
         <guid isPermaLink="false">http://infteam.jiscinvolve.org/wp/?p=1602</guid>
         <pubDate>Mon, 17 Dec 2012 10:47:39 +0000</pubDate>
      </item>
      <item>
         <title>RIOXX application profile – draft 1</title>
         <link>http://blog.paulwalk.net/2012/10/23/rioxx-application-profile-draft-1/</link>
         <description>Together with Sheridan Brown, I have been tasked with developing some guidelines and a metadata &amp;#8216;application&amp;#8217; profile for institutional repositories (IRs) in the UK. We are calling this work RIOXX. This post focusses on the application profile more than the guidelines, and describes phase 1 of the project, which aims to deploy this application profile [...]</description>
         <guid isPermaLink="false">http://blog.paulwalk.net/?p=300</guid>
         <pubDate>Tue, 23 Oct 2012 13:30:09 +0000</pubDate>
         <content:encoded><![CDATA[<p>Together with <a rel="nofollow" target="_blank" href="http://www.keyperspectives.co.uk/aboutus/sbrown.html">Sheridan Brown</a>, I have been tasked with developing some guidelines and a metadata &#8216;application&#8217; profile for institutional repositories (IRs) in the UK. We are calling this work <em>RIOXX</em>. This post focusses on the <strong>application profile</strong> more than the guidelines, and describes phase 1 of the project, which aims to deploy this application profile across IRs in the UK by the first quarter of 2013. </p>
<h2>Objectives</h2>
<ul>
<li>to develop an application profile which enables open access repositories to expose metadata more consistently and which, in particular, conveys information about how the item being described in the metadata was funded</li>
<li>to develop general guidelines for repositories which support the use of the application profile</li>
<li>to support such technical development as is necessary to implement these recommendations and the application profile in common repository platforms</li>
<li>to develop these such that they pave the way for a likely CERIF-based solution in the medium-long term.</li>
</ul>
<h2>Scope and approach</h2>
<p>Funder policy regarding Open Access (OA) is being actively developed and the OA landscape is shifting. The emphasis in this phase of RIOXX is to do something which is adequate and able to be <strong>quickly</strong> implemented. This work will provide an application profile and guidelines which are inherently an <em>interim</em> solution. Broadly speaking, the approach we are taking is as follows:</p>
<h3>Develop the simplest possible application profile, based on Dublin Core (DC).</h3>
<p>Pretty much all repositories support DC, as another application profile of DC, OAI-DC, is a mandated minimum metadata format for the ubiquitous protocol for harvesting metadata from repositories (OAI-PMH). If all goes well, the development work needed for repository systems should be minimised.</p>
<h3>Consider other, related guidelines</h3>
<p>We have examined two related initiatives: the <a rel="nofollow" target="_blank" href="http://www.openaire.eu/en/component/attachments/download/79%E3%80%88=en">OpenAIRE guidelines</a> (and the Driver guidelines which preceded these), and the <a rel="nofollow" target="_blank" href="http://ethostoolkit.cranfield.ac.uk/tiki-index.php?page_ref_id=69">EThOS Toolkit</a> which developed an application profile of DC for eTheses.</p>
<h3>Consider a CERIF-XML expression of this application profile</h3>
<p>The interest in CERIF as the <em>de facto</em> standard format for exchanging this kind of information between systems is growing steadily. We are liaising with the <a rel="nofollow" target="_blank" href="http://isc.ukoln.ac.uk/work/csp/">CERIF Support Project</a> and ensuring that a transition towards a CERIF-based approach remains viable. </p>
<h3>Develop a modelled, expressive application profile</h3>
<p>In later phases of RIOXX, we hope to develop the application profile more fully. This will take into account such things as:</p>
<ul>
<li>greater use of controlled vocabularies</li>
<li>a move away from DC and towards CERIF</li>
<li>greater involvement of systems other than repositories &#8211; notably Current Research Information Systems (CRIS)</li>
<li>modelling of &#8216;access-level semantics&#8217; &#8211; i.e. describing how, where and under what license or conditions the resource might be accessed and used</li>
</ul>
<h2>Rationale for some decisions in phase 1</h2>
<h3>Keeping things very simple</h3>
<p>Timescales are very, very tight. From a pragmatic, technical point of view we have restricted ourselves in this phase to developing an approach which allows the repository to emit RIOXX records based on information properties already catered for in the repository system (that is, the placeholders for <em>Sponsor</em> and <em>ProjectID</em> already being there, even if the actual data has not yet been entered). We have deferred a more complete and complex approach to a later phase because the capacity to deliver this kind of information from institutional systems is developing rapidly.</p>
<h3>The <em>ProjectID</em> property</h3>
<p>We found ourselves unable to simply adopt the OpenAIRE guidelines as these mandate a particular syntax for the <em>ProjectID</em> (designed for EC funded projects) which would preclude certain UK funders. In any case, we consider it to be a mistake to embed semantics into this property and believe it is best provided as a globally-unique, opaque identifier. To this end, we are actively looking at the possibility of funders minting DOIs for the <em>ProjectID</em>. In the meantime, we will be requiring that the ProjectID be whatever identifier is provided by the funder of the output being described in the record.<br />
We have chosen the term <em>ProjectID</em> rather than, for example, <em>GrantID</em>, as we have been advised that the former is the more widely used term in the UK.</p>
<h3>The <em>Sponsor</em> property</h3>
<p>For phase 1 we are mandating this property, but specifying only that a recognised form of identifier for the funder/sponsor be used. This will mean a free-text string for now. We are actively exploring possibilities for identifying and then mandating a particular authority list of funder names, such that this property becomes underpinned by a controlled vocabulary. However, this will not make it into phase 1.<br />
This property, while essential in the short term, might become more of a convenience than a necessity, as the <em>ProjectID</em> becomes more reliably &#8216;actionable&#8217;. In the medium-term, we would anticipate being able to reliably derive the sponsor/funder from the <em>ProjectID</em>. For this reason, we have not modelled the relationship between these two properties closely &#8211; except insofar as they exist in a particular record. This means that some records may contain more than one <em>Sponsor</em> and more than one <em>ProjectID</em> with no direct way to relate a given <em>ProjectID</em> to a given <em>Sponsor</em>. While it would be possible to model this relationship, we have chosen not to do so in this phase, because:</p>
<ul>
<li>it is not the common case that a record would have more than one <em>Sponsor</em></li>
<li>it is more likely that a record might have more than one <em>ProjectID</em>, but only one <em>Sponsor</em>. This happens where a project has multiple versions &#8211; such as when the PI moves institution during the project.</li>
<li>it is unlikely that current repository systems will be able to provide more richly modelled relationships between these properties without further development</li>
<li>it is the common case that a record will have one <em>Sponsor</em> and one <em>ProjectID</em>.</li>
</ul>
<p>We anticipate that this will need to be modelled more thoroughly in future phases.</p>
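<p>To illustrate the limitation, here is a minimal sketch of a record fragment with two funders and two project identifiers. The element names follow the table later in this post, written with a colon as XML qualified names; the <code>rioxxterms</code> namespace URI, the wrapper element and all values are hypothetical placeholders, not part of the profile.</p>
<pre><code>&lt;!-- Hypothetical fragment: wrapper element, namespace URI and values are illustrative only --&gt;
&lt;record xmlns:rioxxterms="http://example.org/rioxxterms/"&gt;
  &lt;rioxxterms:sponsor&gt;Example Research Council&lt;/rioxxterms:sponsor&gt;
  &lt;rioxxterms:sponsor&gt;Example Charitable Trust&lt;/rioxxterms:sponsor&gt;
  &lt;rioxxterms:projectid&gt;EX/0001/1&lt;/rioxxterms:projectid&gt;
  &lt;rioxxterms:projectid&gt;EX-2012-42&lt;/rioxxterms:projectid&gt;
&lt;/record&gt;</code></pre>
<p>Nothing in such a record says which identifier belongs to which funder; a consumer can only infer that both funders and both projects relate to the output.</p>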
<h3>Deferring the &#8216;access-level-semantics&#8217; question</h3>
<p>In order to convey the precise nature of the open-access &#8216;state&#8217; of a resource, RIOXX will need to develop a richer way of describing such concepts as &#8216;green&#8217; or &#8216;gold&#8217; open access, embargoes, licenses etc. The use-cases and operations which will depend on such information are not yet clear and, while the time has now come to model these, this should not be done in a hurry.</p>
<p>The following is a table of proposed elements and recommended formats. We propose to extend the Dublin Core elements with two new elements under the <em>rioxxterms</em> namespace.</p>
<ul>
<li><strong>M</strong>: Mandated</li>
<li><strong>R</strong>: Recommended</li>
<li><strong>O</strong>: Optional</li>
</ul>
<table>
<tr>
<th>Element</th>
<th>Inclusion M/R/O</th>
<th>Format</th>
<th>Format M/R/O</th>
</tr>
<tr>
<td>dc:title</td>
<td>M</td>
<td>Free text. It is recommended to use the form: <i>Title:Subtitle</i></td>
<td>R</td>
</tr>
<tr>
<td>dc:creator</td>
<td>M</td>
<td>Free text. Recommended practice is to either use the form <i>Last Name, First Name(s)</i> or a unique identifier from a recognised system. Each creator should be given a separate <i>dc:creator</i> element</td>
<td>R</td>
</tr>
<tr>
<td>dc:identifier</td>
<td>M</td>
<td>A globally unique identifier. It is strongly recommended to use a URI which can be de-referenced (i.e. is &#8216;actionable&#8217;) where this is appropriate</td>
<td>R</td>
</tr>
<tr>
<td>dc:source</td>
<td>M</td>
<td>Journal title, reference or ISSN</td>
<td>M</td>
</tr>
<tr>
<td>dc:language</td>
<td>M</td>
<td>Use <a rel="nofollow" target="_blank" href="http://en.wikipedia.org/wiki/ISO_639-3">ISO 639-3 language codes</a></td>
<td>M</td>
</tr>
<tr>
<td>rioxxterms.projectid</td>
<td>M</td>
<td>Use the identifier provided by the funder to indicate the project within which this output has been created</td>
<td>M</td>
</tr>
<tr>
<td>dc:coverage</td>
<td>O</td>
<td>The extent or scope of the content of the resource. Coverage will typically include spatial location (a place name or geographic co-ordinates), temporal period (a period label, date or date range) or jurisdiction (such as a named administrative entity).</td>
<td></td>
</tr>
<tr>
<td>dc:rights</td>
<td>O</td>
<td>No agreed vocabulary or semantics exist for this in the context of Open Access papers, and it is common practice for this to be ignored by repositories currently. Some work is being funded to look at this area for the next phase of RIOXX. For now, this element has to be optional.</td>
<td></td>
</tr>
<tr>
<td>dc:audience</td>
<td>O</td>
<td>Free text.</td>
<td></td>
</tr>
<tr>
<td>dc:format</td>
<td>R</td>
<td>It is recommended to use the IANA registered list of Internet Media Types (MIME types)</td>
<td>M</td>
</tr>
<tr>
<td>dc:date</td>
<td>M</td>
<td>One date using ISO 8601.  Published date is the default and recommended interpretation.</td>
<td>M</td>
</tr>
<tr>
<td>dc:type</td>
<td>O</td>
<td>This is currently free text and an optional element. However, RIOXX phase 1 will be recommending that a vocabulary be adopted or developed for this element.</td>
<td>O</td>
</tr>
<tr>
<td>dc:contributor</td>
<td>O</td>
<td>(as for dc:creator)</td>
<td></td>
</tr>
<tr>
<td>rioxxterms.sponsor</td>
<td>M</td>
<td>Free text &#8211; Funder name using the funder&#8217;s preferred format</td>
<td>O</td>
</tr>
<tr>
<td>dc:publisher</td>
<td>R</td>
<td>Free text indicating the name of the publisher (commercial or non-commercial)</td>
<td>O</td>
</tr>
<tr>
<td>dc:description</td>
<td>R</td>
<td>Best practice is to use an English language abstract.</td>
<td>O</td>
</tr>
<tr>
<td>dc:subject</td>
<td>R</td>
<td>Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme. E.g. LOC, MESH.</td>
<td>O</td>
</tr>
</table>
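<p>As a rough illustration of the table, a record carrying only the mandated elements might look something like the sketch below. The wrapper element, the <code>rioxxterms</code> namespace URI and every value are hypothetical placeholders, and writing the two new elements with a colon rather than a dot is an assumption; only the Dublin Core namespace URI is the standard one.</p>
<pre><code>&lt;!-- Sketch only: wrapper element, rioxxterms namespace URI and all values are hypothetical --&gt;
&lt;record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:rioxxterms="http://example.org/rioxxterms/"&gt;
  &lt;!-- Recommended form Title:Subtitle --&gt;
  &lt;dc:title&gt;An Example Article:An Example Subtitle&lt;/dc:title&gt;
  &lt;!-- One dc:creator element per creator, Last Name, First Name(s) --&gt;
  &lt;dc:creator&gt;Smith, Jane&lt;/dc:creator&gt;
  &lt;dc:creator&gt;Jones, Alex&lt;/dc:creator&gt;
  &lt;!-- A globally unique identifier, ideally a URI that can be de-referenced --&gt;
  &lt;dc:identifier&gt;http://repository.example.ac.uk/id/eprint/1234&lt;/dc:identifier&gt;
  &lt;!-- Journal title, reference or ISSN --&gt;
  &lt;dc:source&gt;Journal of Example Studies&lt;/dc:source&gt;
  &lt;!-- ISO 639-3 language code --&gt;
  &lt;dc:language&gt;eng&lt;/dc:language&gt;
  &lt;!-- One ISO 8601 date, read as the published date by default --&gt;
  &lt;dc:date&gt;2012-10-23&lt;/dc:date&gt;
  &lt;!-- Identifier exactly as provided by the funder --&gt;
  &lt;rioxxterms:projectid&gt;EX/0001/1&lt;/rioxxterms:projectid&gt;
  &lt;!-- Free text, funder name in the funder's preferred format --&gt;
  &lt;rioxxterms:sponsor&gt;Example Research Council&lt;/rioxxterms:sponsor&gt;
&lt;/record&gt;</code></pre>
<p>Recommended and optional elements such as <code>dc:format</code> or <code>dc:subject</code> would sit alongside these as further child elements.</p>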
<p>I would appreciate any comments people might have about the technical aspects of this.</p>]]></content:encoded>
      <enclosure url="http://www.openaire.eu/en/component/attachments/download/79%E3%80%88=en" length="146066" type="application/pdf" /><media:content url="http://www.openaire.eu/en/component/attachments/download/79%E3%80%88=en" fileSize="146066" type="application/pdf" /><itunes:subtitle>Together with Sheridan Brown, I have been tasked with developing some guidelines and a metadata &amp;#8216;application&amp;#8217; profile for institutional repositories (IRs) in the UK. We are calling this work RIOXX. This post focusses on the application profile</itunes:subtitle><itunes:summary>Together with Sheridan Brown, I have been tasked with developing some guidelines and a metadata &amp;#8216;application&amp;#8217; profile for institutional repositories (IRs) in the UK. We are calling this work RIOXX. This post focusses on the application profile more than the guidelines, and describes phase 1 of the project, which aims to deploy this application profile [...]</itunes:summary></item>
      <item>
         <title>RIP my friend &amp;#8211; Lee Dirks</title>
         <link>http://savas.me/2012/08/rip-my-friend-lee-dirks/</link>
         <description>&lt;p&gt;&lt;img class="alignleft" src="http://savas.me/wp-content/uploads/2012/08/083012_0332_RIPmyfriend1.jpg" alt="" width="211" height="165" align="left"/&gt;People started posting their condolences on &lt;a rel="nofollow" target="_blank" href="http://www.facebook.com/ldirks"&gt;Lee&amp;#8217;s Facebook timeline&lt;/a&gt; so I guess it&amp;#8217;s ok to post this note now.&lt;/p&gt;
&lt;p&gt;Earlier this morning I received the sad news that my good friend and colleague at Microsoft, Lee Dirks, and his wife, Judy, were killed in a &lt;a rel="nofollow" target="_blank" href="http://latino.foxnews.com/latino/news/2012/08/29/2-americans-peruvian-die-in-peru-highway-accident/"&gt;tragic accident in Peru&lt;/a&gt;. Their two daughters were in Seattle. Even now, I cannot process the news.&lt;/p&gt;
&lt;p&gt;Lee was an amazing amazing guy. Larger than life.&lt;/p&gt;
&lt;p&gt;We worked together in MSR, under Tony Hey, for almost three years but continued to be in touch (drink and play poker) even after that. I&amp;#8217;d like to believe that Lee, Alex, and I produced some great work together. We had lots of fun collaborating and traveling around, working with communities in domains such as Scholarly Communication, Digital Libraries, Computer Science, Repositories, you name it. Lee and Alex introduced me to the wonderful world of Library/Information Science and great people around the world. Most importantly, I&amp;#8217;d like to believe that we really connected… I would do anything for these guys.&lt;/p&gt;
&lt;p&gt;Lee was one of those guys that was friends with everyone. HUGE smile! Always there for you!&lt;/p&gt;
&lt;p&gt;My thoughts are with his family, friends, and colleagues.&lt;/p&gt;
&lt;p&gt;We lost a really good guy today :-(&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here are some photos of/with Lee from the things we did together:&lt;/p&gt;
&lt;p&gt;Sasquatch Festival with Mary and Shelton&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_blank" href="http://savas.me/wp-content/uploads/2012/08/IMG_3809.jpg"&gt;&lt;img style="background-image:none;padding-top:0px;padding-left:0px;margin:0px;display:inline;padding-right:0px;border:0px;" title="IMG_3809" src="http://savas.me/wp-content/uploads/2012/08/IMG_3809_thumb.jpg" alt="IMG_3809" width="240" height="160" border="0"/&gt;&lt;/a&gt; &lt;a rel="nofollow" target="_blank" href="http://savas.me/wp-content/uploads/2012/08/IMG_3830.jpg"&gt;&lt;img style="background-image:none;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border:0px;" title="IMG_3830" src="http://savas.me/wp-content/uploads/2012/08/IMG_3830_thumb.jpg" alt="IMG_3830" width="240" height="160" border="0"/&gt;&lt;/a&gt; &lt;a rel="nofollow" target="_blank" href="http://savas.me/wp-content/uploads/2012/08/IMG_3881.jpg"&gt;&lt;img style="background-image:none;padding-top:0px;padding-left:0px;margin:0px;display:inline;padding-right:0px;border:0px;" title="IMG_3881" src="http://savas.me/wp-content/uploads/2012/08/IMG_3881_thumb.jpg" alt="IMG_3881" width="240" height="160" border="0"/&gt;&lt;/a&gt; &lt;a rel="nofollow" target="_blank" href="http://savas.me/wp-content/uploads/2012/08/IMG_3888.jpg"&gt;&lt;img style="background-image:none;padding-top:0px;padding-left:0px;margin:0px;display:inline;padding-right:0px;border:0px;" title="IMG_3888" src="http://savas.me/wp-content/uploads/2012/08/IMG_3888_thumb.jpg" alt="IMG_3888" width="240" height="160" border="0"/&gt;&lt;/a&gt; &lt;a rel="nofollow" target="_blank" href="http://savas.me/wp-content/uploads/2012/08/IMG_3901.jpg"&gt;&lt;img style="background-image:none;padding-top:0px;padding-left:0px;margin:0px;display:inline;padding-right:0px;border:0px;" title="IMG_3901" src="http://savas.me/wp-content/uploads/2012/08/IMG_3901_thumb.jpg" alt="IMG_3901" width="240" height="160" border="0"/&gt;&lt;/a&gt; &lt;a rel="nofollow" target="_blank" 
href="http://savas.me/wp-content/uploads/2012/08/IMG_3904.jpg"&gt;&lt;img style="background-image:none;padding-top:0px;padding-left:0px;margin:0px;display:inline;padding-right:0px;border:0px;" title="IMG_3904" src="http://savas.me/wp-content/uploads/2012/08/IMG_3904_thumb.jpg" alt="IMG_3904" width="240" height="160" border="0"/&gt;&lt;/a&gt; &lt;a rel="nofollow" target="_blank" href="http://savas.me/wp-content/uploads/2012/08/IMG_3781.jpg"&gt;&lt;img style="background-image:none;padding-top:0px;padding-left:0px;margin:0px;display:inline;padding-right:0px;border:0px;" title="IMG_3781" src="http://savas.me/wp-content/uploads/2012/08/IMG_3781_thumb.jpg" alt="IMG_3781" width="240" height="160" border="0"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Poker at my place&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_blank" href="http://savas.me/wp-content/uploads/2012/08/IMG_1321.jpg"&gt;&lt;img style="background-image:none;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border:0px;" title="IMG_1321" src="http://savas.me/wp-content/uploads/2012/08/IMG_1321_thumb.jpg" alt="IMG_1321" width="240" height="160" border="0"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;The Scholarly Communications trio (&lt;a rel="nofollow" target="_blank" href="http://savas.me/2009/11/scholarly-communications-group-on-facebook-and-at-the-oclc-mashathon-event/"&gt;at an OCLC event&lt;/a&gt;:-)&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_blank" href="http://savas.me/wp-content/uploads/2012/08/1042.jpg"&gt;&lt;img style="background-image:none;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border:0px;" title="1042" src="http://savas.me/wp-content/uploads/2012/08/1042_thumb.jpg" alt="1042" width="240" height="180" border="0"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_blank" href="http://research.microsoft.com/en-us/news/features/zentity-052009.aspx"&gt;Our Zentity project is announced to the world&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_blank" href="http://savas.me/wp-content/uploads/2012/08/scholarlycommunications_zentity.jpg"&gt;&lt;img style="background-image:none;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border:0px;" title="scholarlycommunications_zentity" src="http://savas.me/wp-content/uploads/2012/08/scholarlycommunications_zentity_thumb.jpg" alt="scholarlycommunications_zentity" width="240" height="180" border="0"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;On the road to talk about Zentity (and eat and drink :-)&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_blank" href="http://savas.me/wp-content/uploads/2012/08/IMG_6508.jpg"&gt;&lt;img style="background-image:none;padding-top:0px;padding-left:0px;display:inline;padding-right:0px;border:0px;" title="IMG_6508" src="http://savas.me/wp-content/uploads/2012/08/IMG_6508_thumb.jpg" alt="IMG_6508" width="160" height="240" border="0"/&gt;&lt;/a&gt;&lt;/p&gt;</description>
         <author>savas</author>
         <guid isPermaLink="false">http://savas.me/?p=2753</guid>
         <pubDate>Thu, 30 Aug 2012 03:32:29 +0000</pubDate>
      </item>
      <item>
         <title>x11e7a crig sport</title>
         <link>http://www.dailymotion.com/video/x11e7a_crig_sport?syndication=112314</link>
         <guid isPermaLink="false">http://previous.delicious.com/url/2a9f66062228759c55cdbf56979db1f8#</guid>
         <pubDate>Sun, 12 Aug 2012 14:32:17 +0000</pubDate>
      </item>
      <item>
         <title>UTS Library future service model (with notes)</title>
         <link>http://www.slideshare.net/malbooth/uts-library-future-service-model-with-notes</link>
         <guid isPermaLink="false">http://previous.delicious.com/url/ca043160865691f5080ba3cb1c3cc44c#</guid>
         <pubDate>Sun, 03 Jun 2012 23:36:35 +0000</pubDate>
      </item>
      <item>
         <title>iPad App for Islandora Repo Now Available</title>
         <link>http://feedproxy.google.com/~r/typepad/mleggott/loomware/~3/QenxhR2ARU8/ipad-app-for-islandora-repo-now-available.html</link>
         <description>&lt;div&gt;&lt;p&gt;Our new iPad application, Telling Island Stories, is now available in the App Store (and yes it does support the retina display). TIS is a front-end to a number of &lt;a rel="nofollow" target="_blank" href="http://islandora.ca/"&gt;Islandora&lt;/a&gt;-based digital collections, with a focus on story and geo-level presentation of digital assets. As a lightweight iPad app it will give you a good example of what can be done with a powerful Islandora-&lt;a rel="nofollow" target="_blank" href="http://fedora-commons.org/"&gt;Fedora&lt;/a&gt; driven backend. Some highlights of this version:&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt;- provision of "collection" views via Media (images, audio, books, maps), Stories (associated subsets of assets) and Communities (assets sharing a geographic range);&lt;br&gt;- a map-based view of all assets, including media-specific pins, pin aggregation for larger numbers of items in close proximity (including the ability to turn this feature on and off), search/display via the map;&lt;br&gt;- flexible display including ability to toggle metadata display on and off by item, filter items by type;&lt;br&gt;- delivery of high quality retina-display quality books in a page &lt;/p&gt;&#xD;
&lt;/blockquote&gt;&#xD;
&lt;p&gt;Items available at launch include photographs from the University of PEI and community groups, herbarium specimens, digital books from the &lt;a rel="nofollow" target="_blank" href="http://www.lmmontgomery.ca/"&gt;Lucy Maud Montgomery Institute&lt;/a&gt;, oral histories from the &lt;a rel="nofollow" target="_blank" href="http://www.islandvoices.ca/"&gt;Dutch Thompson collection&lt;/a&gt; and &lt;a rel="nofollow" target="_blank" href="http://www.islandimagined.ca/"&gt;maps from UPEI and partner organizations&lt;/a&gt;. As content is added to the Islandora system it automatically shows up in the app - no need to download a new app just for new content. A future version will allow the user to submit content to a central repository.&lt;/p&gt;&#xD;
&lt;p&gt;We are especially proud of the collaborative approach to making content from all parts of Prince Edward Island available to the world. If you are visiting PEI this summer you definitely want to download the free TIS app to your iPad!&lt;/p&gt;&#xD;
&lt;p&gt;TIS is a joint development venture between &lt;a rel="nofollow" target="_blank" href="http://upei.ca/"&gt;UPEI&lt;/a&gt; and &lt;a rel="nofollow" target="_blank" href="http://discoverygarden.ca/"&gt;DiscoveryGarden&lt;/a&gt;, an Islandora services company, along with generous funding from &lt;a rel="nofollow" target="_blank" href="http://www.innovationpei.com/"&gt;Innovation PEI&lt;/a&gt;. If you are interested in adapting the TIS app to your own Fedora/Islandora framework &lt;a rel="nofollow" target="_blank" href="mailto:mleggott@mac.com"&gt;drop me a note&lt;/a&gt;.&lt;/p&gt;&#xD;
&lt;blockquote&gt;&#xD;
&lt;p&gt; &lt;/p&gt;&#xD;
&lt;/blockquote&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a rel="nofollow" target="_blank" href="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?a=QenxhR2ARU8:0uWltsh9w2k:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?d=yIl2AUoC8zA" border="0"&gt;&lt;/a&gt; &lt;a rel="nofollow" target="_blank" href="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?a=QenxhR2ARU8:0uWltsh9w2k:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?i=QenxhR2ARU8:0uWltsh9w2k:F7zBnMyn0Lo" border="0"&gt;&lt;/a&gt; &lt;a rel="nofollow" target="_blank" href="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?a=QenxhR2ARU8:0uWltsh9w2k:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?i=QenxhR2ARU8:0uWltsh9w2k:V_sGLiPBpWU" border="0"&gt;&lt;/a&gt; &lt;a rel="nofollow" target="_blank" href="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?a=QenxhR2ARU8:0uWltsh9w2k:EpLpB3ZkKWg"&gt;&lt;img src="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?d=EpLpB3ZkKWg" border="0"&gt;&lt;/a&gt;
&lt;/div&gt;</description>
         <author>mleggott</author>
         <guid isPermaLink="false">tag:typepad.com,2003:post-6a00d83452e76c69e20168ebfdfa49970c</guid>
         <pubDate>Fri, 01 Jun 2012 13:11:18 +0000</pubDate>
      </item>
      <item>
         <title>UPEI Senate Approves Open Access Policy</title>
         <link>http://feedproxy.google.com/~r/typepad/mleggott/loomware/~3/C4lfLSIF4_8/upei-senate-approves-open-access-policy.html</link>
         <description>&lt;div&gt;&lt;p&gt;After a lengthy but very fruitful process UPEI's Senate approved &lt;a rel="nofollow" target="_blank" href="https://cab.upei.ca/sites/default/files/attachments/OpenAccessandDisseminationofResearchOutput.pdf"&gt;a new open access policy&lt;/a&gt;. Some highlights of the policy:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;immediate deposit of scholarly articles (including faculty, grad student and undergraduate works) in the UPEI repository is encouraged (a mandate was not considered a viable option - next time...);&lt;/li&gt;&#xD;
&lt;li&gt;encouragement for scholars to retain copyright, including a reference to the &lt;a rel="nofollow" class="zem_slink" target="_blank" href="http://en.wikipedia.org/wiki/SPARC" title="SPARC"&gt;SPARC&lt;/a&gt; addendum;&lt;/li&gt;&#xD;
&lt;li&gt;deposit of research data into a UPEI &lt;a rel="nofollow" target="_blank" href="http://library.upei.ca/vre"&gt;Virtual Research Environment&lt;/a&gt; is also encouraged, along with links to the final scholarly work.&lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;&lt;a rel="nofollow" target="_blank" href="http://islandscholar.ca/"&gt;IslandScholar.ca&lt;/a&gt; is the UPEI open access repository, and it is currently undergoing a major re-architecture with a new launch this summer. The UPEI effort is also a key part of the new &lt;a rel="nofollow" target="_blank" href="http://islandora.ca/"&gt;Islandora&lt;/a&gt; Institutional Repository Solution Pack, which will be released this summer as well.&lt;/p&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a rel="nofollow" target="_blank" href="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?a=C4lfLSIF4_8:ELE4kW3L6jE:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?d=yIl2AUoC8zA" border="0"&gt;&lt;/a&gt; &lt;a rel="nofollow" target="_blank" href="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?a=C4lfLSIF4_8:ELE4kW3L6jE:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?i=C4lfLSIF4_8:ELE4kW3L6jE:F7zBnMyn0Lo" border="0"&gt;&lt;/a&gt; &lt;a rel="nofollow" target="_blank" href="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?a=C4lfLSIF4_8:ELE4kW3L6jE:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?i=C4lfLSIF4_8:ELE4kW3L6jE:V_sGLiPBpWU" border="0"&gt;&lt;/a&gt; &lt;a rel="nofollow" target="_blank" href="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?a=C4lfLSIF4_8:ELE4kW3L6jE:EpLpB3ZkKWg"&gt;&lt;img src="http://feeds.feedburner.com/~ff/typepad/mleggott/loomware?d=EpLpB3ZkKWg" border="0"&gt;&lt;/a&gt;
&lt;/div&gt;</description>
         <author>mleggott</author>
         <guid isPermaLink="false">tag:typepad.com,2003:post-6a00d83452e76c69e2016766e80188970b</guid>
         <pubDate>Tue, 29 May 2012 12:16:15 +0000</pubDate>
      </item>
      <item>
         <title>CQL Bibliographic Searching Proposal</title>
         <link>http://www.loc.gov:8081/standards/sru/cql-bibliographic-searching.html</link>
         <guid isPermaLink="false">http://previous.delicious.com/url/7214687e6ef45bc3a6e86c62d12a7ee2#</guid>
         <pubDate>Wed, 23 May 2012 21:18:42 +0000</pubDate>
      </item>
      <item>
         <title>Library systems of the future</title>
         <link>http://blog.paulwalk.net/2012/01/20/library-systems-of-the-future/</link>
         <description>Edit: The presentation I gave to accompany this post is available on Slideshare I was asked by Ben Showers of the JISC to write a &amp;#8216;challenging and provocative vision&amp;#8217; for library management systems, for a joint JISC / SCONUL workshop. I was given a free hand with this &amp;#8211; the only parameters were that the [...]</description>
         <guid isPermaLink="false">http://blog.paulwalk.net/?p=262</guid>
         <pubDate>Fri, 20 Jan 2012 18:54:07 +0000</pubDate>
         <content:encoded><![CDATA[<p><em>Edit: The presentation I gave to accompany this post is <a rel="nofollow" target="_blank" href="http://www.slideshare.net/paulwalk/future-of-lms">available on Slideshare</a></em></p>
<p><em>I was asked by Ben Showers of <a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk/">the JISC</a> to write a &#8216;challenging and provocative vision&#8217; for library management systems, for a joint JISC / <a rel="nofollow" target="_blank" href="http://www.sconul.ac.uk/">SCONUL</a> workshop. I was given a free hand with this &#8211; the only parameters were that the piece should be no more than a side of A4 paper in length, and that it should use 2020 as its target year for prediction. I think I ignored both of these restrictions, but I had fun and it did provoke some discussion…</em></p>
<p><em>Dramatis personae:</em></p>
<ul>
<li>Alby, a young student &amp; researcher in full time employment</li>
<li>Charlotte, a venerable librarian</li>
<li>Bob, Dan and Eva, semi-autonomous software agents</li>
</ul>
<p>Following the unprecedented Conservative &#8216;walk-over&#8217; election victory of 2015 and the subsequent consolidation in 2019, the landscape of higher-education in the UK is all but unrecognisable. The free market dominates the buying and selling of courses, and the provisioning of learning and research resources has, in the end, simply had to follow suit. Copyright has mostly been &#8216;fixed&#8217; in the virtual world through a combination of an adjustment to more modest expectations of compensation for copyright holders, workable systems to control distribution, and global agreements allowing extradition and prosecution.</p>
<h2 id="thestudentresearcher1">The student researcher (1)</h2>
<p>Alby works, full time, as a software engineer. As part of his job, he is given some time to pursue research topics of interest to him and to his employer. His firm gives him a small budget to support this. In the evenings he studies part-time for the new <em>Masters++</em> qualification. He is enrolled at three universities, visiting one of these &#8211; the local <em>George Osbourne University</em> (GOU) &#8211; every Thursday evening. He finances all of this himself.</p>
<p>On Monday evening, when Alby gets home, he goes straight to his laptop and works through all the notes he has dictated into his smart phone during the day. He has become interested in the evolution of library systems and wants to register this interest on the Research Interest Grid (RIG). While recording notes into his phone, he has also published some of these to <em>StreamingConscious</em>, the latest social network to become popular with researchers, and has gained a few new connections with people with aligned interests, including a promising one with a subject librarian at GOU.</p>
<p>Alby then invokes his Foraging Agent, &#8216;Bob&#8217;. A license for Bob was given to him by a publisher, <em>Coyote</em>, which specialises in resources for software engineers, in return for sending him a steady stream of advertisements. Alby adopted Bob because he liked its interface, but he suspects it has in-built biases towards certain commercial information sources. He believes that he compensates for this by carefully defining his research questions in Research Question Format (RQF) and filtering the results.</p>
<p>Bob runs constantly on Alby&#8217;s &#8216;slice&#8217; &#8211; a portion of Personal Cloud (PC) infrastructure provided by a well-known supermarket chain. After a series of questions and answers, Bob is armed with three carefully formulated RQF research questions, and a set of parameters, such as when to report back, and how much of Alby&#8217;s research budget to spend on a single transaction before asking him for approval. Bob has learned through observation how Alby likes to work. It <em>knows</em> him in a sense, enough to represent his interests when dealing with other agents. Alby then instructs Bob to begin searching, negotiating and shopping for answers, leads and recommendations, while he gets on with some reading. Alby has grown to trust Bob.</p>
<h2 id="thelibrarian">The Librarian</h2>
<p>Charlotte is a subject librarian with many years&#8217; experience (she tried to retire 3 years ago but has been forced to come back to work), specialising in software &amp; systems engineering, and currently working for George Osbourne University. On Tuesday morning she checks the reports from her Listening Agents over breakfast. She controls several agents running on the library&#8217;s slice of the GOU cloud.</p>
<p>Bob, an agent representing someone called Alby, has made contact, coincidentally, with two of her agents &#8211; one which represents GOU and which reports to her, and her own personal agent, Eva. Only yesterday, <em>BirdSong</em> (a social network monitoring agent) had suggested that she connect with @alby on <em>StreamingConscious</em> based on their mutual interest in the history of LMS systems. Charlotte&#8217;s interest in LMS systems is partly fuelled by nostalgia &#8211; she has been working with such systems for more than thirty years.</p>
<p>She sees that Dan, the GOU agent, has supplied Bob with material to which Alby is automatically entitled, and has automatically reserved two books from the local GOU collection for him. In so doing, Dan recommends to Charlotte the purchase of a newer edition of one of these textbooks.</p>
<p>Dan has also made a number of offers to Bob of more restricted material which can be supplied at a cost, including 3 inter-library-loans. Bob has accepted one of these paid-for items on Alby&#8217;s behalf and Charlotte is happy to see that it has also observed the protocol of explaining why it has not accepted the others. In one case, she sees that Bob was successful in bidding on <em>eBay</em> for a second-hand copy of a book which Dan had offered as an ILL. Bob has also made an offer to Dan for ownership of the book, once Alby has finished with it, in return for one free ILL. Dan needs Charlotte to approve this. However, she declines, knowing the book to be flawed, despite its 4 star popularity rating. Dan registers this decision, quietly blacklists the book against any future recommendation, and reports this decision to Bob.</p>
<p>Dan notes that Bob has also registered a second book on Alby&#8217;s personal virtual book-shelf and indicated a willingness to make this available to the GOU circulation agent for loan to other GOU students as part of the &#8216;Support Your Library&#8217; protocol, in return for one free ILL token. Charlotte accepts this offer.</p>
<p>Charlotte instructs Dan to negotiate with Bob to arrange a meeting over coffee for Alby and herself. She does this partly because Eva has separately registered Alby&#8217;s interest on the RIG and it seems worthwhile meeting with Alby in person to discuss his research. She decides to investigate a couple of other suggestions thrown up by Dan in the meantime. She also notes that Dan has suggested a couple of other contacts to Bob &#8211; other people who are enrolled at GOU and whom Alby may wish to befriend on <em>StreamingConscious</em> &#8211; as part of a strategy to reinforce the local GOU social network of students and researchers.</p>
<h2 id="thestudentresearcher2">The student researcher (2)</h2>
<p>Later on Tuesday morning, Alby wakes to find an interesting report from Bob waiting for him. He discovers he is the proud owner of a new book on LMS system design and is pleased to note that it has a four star rating &#8211; one star above the threshold he has set in Bob&#8217;s book-buying decision parameters.</p>
<p>Bob has, inevitably, also turned up a few offers of information and resources from the &#8216;invisible market&#8217;. Alby knows that if you have the right connections, you can get just about any book in ePub5 format. The penalties for possession of an illegally obtained, copyrighted resource are stiff, however. Although it is not illegal, he is also a little wary of using <em>Turpin</em>, the global federation of Open Access papers and other resources, as he has been culturally conditioned to be suspicious of things which appear to be &#8216;free&#8217;.</p>
<p>He also finds a tentative appointment in his diary for coffee with @charlotte, the subject librarian with whom he connected yesterday on <em>StreamingConscious</em>. As he works close by the university, he accepts the appointment. He can pick up his reservations while he&#8217;s there.</p>
<h2 id="facetofacelaterthatmorning">Face to face, later that morning</h2>
<p>Alby finally puts his pen down, and takes a swig of his coffee. He has been writing furiously for half an hour. Charlotte has just taken him on a whirlwind tour of the evolution of the LMS.</p>
<p>She has described how the library has learned, over the last decade, that client relationship management (CRM) is crucial to its mission. Adjusting to the new realities of social networking and global search, the LMS has become a distributed and loosely-coupled collection of processes, all designed to help connect people with resources and with each other.</p>
<p>Alby learns how the rapid introduction of semi-autonomous software agents into research practice took many by surprise. Although the concepts were not new, and much of the technology existed in one form or another, it took the confluence of a number of factors to finally introduce agent-mediated research:</p>
<ul>
<li>the acceptance of an &#8216;always online&#8217; culture, brought about through the ubiquity of smart phones, the prevalence of global social networks and the move from the desktop to cloud-based processes</li>
<li>the utter complexity of negotiating through &#8216;permission stacks&#8217; to determine whether or not an individual has the rights to access a given resource in a given context</li>
<li>the complexity of relationships between individuals and institutions</li>
</ul>
<p>Charlotte explains how, from having been a destination for local researchers, the LMS has dissolved into the fabric of a vast, distributed network of research interests, library collections, national, private and open resources.</p>
<p>While the curation of local collections remains important, the facilitation of networking and the handling of transactions, both social and financial, have taken over as the focus of the LMS. She points out that where once it was quite easy to point to the LMS &#8211; at least as a line in a budget sheet &#8211; it has become somewhat nebulous in recent years. The LMS has become the coffee-shop of cyberspace, where software agents meet to compare notes, register interests, make deals&#8230;</p>
<p>Taking a sip of her peppermint tea, Charlotte sighs as she remembers how simple it all once was.</p>]]></content:encoded>
      </item>
      <item>
         <title>Making Debian Changelogs from Github repositories</title>
         <link>http://davetaz-blog.blogspot.com/2012/01/making-debian-changlogs-from-github.html</link>
         <description>One of the many things that irks me is the gap between good developers who put all their code on platforms such as GitHub, and those who then actually bother to put some effort into packaging up their code for easy platform installation. &lt;br /&gt;&lt;br /&gt;I have come to the realisation that this is mainly due to the pedantic nature of packaging formats and platform lock-in. One such example is the exacting format of the Debian changelog...&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://p2-registry.ecs.soton.ac.uk/opf/gh2ch/"&gt;GitHub2Changelog&lt;/a&gt; is a bit of code that I knocked together to help in this situation. It takes a GitHub repository URL and builds a Debian changelog from the repository commits and tags.&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://2.bp.blogspot.com/-dQQtRWJC1zw/TxmYH4qTEwI/AAAAAAAAAFE/RH2MrLzNCZs/s1600/gh2hc.png"&gt;&lt;img style="display:block;margin:0px auto 10px;text-align:center;cursor:pointer;cursor:hand;width:580px;" src="http://2.bp.blogspot.com/-dQQtRWJC1zw/TxmYH4qTEwI/AAAAAAAAAFE/RH2MrLzNCZs/s1600/gh2hc.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5699754064650375938"/&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;By looking at the tags and commits it works out which commits are related to which tags (something GitHub APIv3 doesn't do) and then outputs the result directly to you, already formatted. &lt;br /&gt;&lt;br /&gt;The service is built in PHP, and is web based with both a pretty front end and API access. &lt;br /&gt;&lt;br /&gt;Ironically, since I've now committed the code to GitHub &lt;a rel="nofollow" target="_blank" href="https://github.com/davetaz/Github2ChangeLog"&gt;here&lt;/a&gt; I now need to use the service on itself and build the easy to install packages. More on that soon...</description>
         <author>Dave Tarrant</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-4926451824261299693.post-1386325288664783565</guid>
         <pubDate>Fri, 20 Jan 2012 08:27:00 +0000</pubDate>
         <media:thumbnail height="72" url="http://2.bp.blogspot.com/-dQQtRWJC1zw/TxmYH4qTEwI/AAAAAAAAAFE/RH2MrLzNCZs/s72-c/gh2hc.png" width="72" />
      </item>
      <item>
         <title>DepositMOre - The Prototype</title>
         <link>http://davetaz-blog.blogspot.com/2012/01/depositmore-prototype.html</link>
         <description>Building on the success of DepositMO and SWORDv2, I thought it would be a good idea to put a quick HTML5 client together to save myself some pain.&lt;br /&gt;&lt;br /&gt;The basic premise of this web-based client is to automatically search for "your stuff" in a number of ways and then allow it all to be submitted to a repository in one click. &lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://1.bp.blogspot.com/-WDA-88sy11c/Txgvj2mr50I/AAAAAAAAAEI/Yfwv3R4oXhs/s1600/easychair_desktop.png"&gt;&lt;img style="float:right;margin:0 10px 10px 0;cursor:pointer;cursor:hand;width:320px;height:162px;" src="http://1.bp.blogspot.com/-WDA-88sy11c/Txgvj2mr50I/AAAAAAAAAEI/Yfwv3R4oXhs/s320/easychair_desktop.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5699357621437065026"&gt;&lt;/a&gt;First target for me was www.easychair.org. This service is used as an online conference submission and review system. In a nutshell, if an author wants to get accepted into a conference, easychair is one system which they WILL have to battle with in order to submit their content. As a result there is a strong potential that easychair knows about many publications which should also be present in other systems.&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://2.bp.blogspot.com/-uD3VFZnJ9VA/TxgwT1_4uQI/AAAAAAAAAEU/tvKLK2oAYr8/s1600/easychair_paper.png"&gt;&lt;img style="float:left;margin:0 10px 10px 0;cursor:pointer;cursor:hand;width:320px;height:174px;" src="http://2.bp.blogspot.com/-uD3VFZnJ9VA/TxgwT1_4uQI/AAAAAAAAAEU/tvKLK2oAYr8/s320/easychair_paper.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5699358445908048130"&gt;&lt;/a&gt;From the main screen in easychair it is possible to navigate and find the many conference publications which you have submitted. 
Each publication is tied to a conference and it can take a substantial number of clicks to navigate between each publication.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;DepositMOre&lt;/h2&gt;&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://3.bp.blogspot.com/-1vtThDXev9E/TxgxHmACtZI/AAAAAAAAAEg/jpP977aiIsU/s1600/deposit_more_login.png"&gt;&lt;img style="float:right;margin:0 0 10px 10px;cursor:pointer;cursor:hand;height:200px;" src="http://3.bp.blogspot.com/-1vtThDXev9E/TxgxHmACtZI/AAAAAAAAAEg/jpP977aiIsU/s320/deposit_more_login.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5699359334966932882"&gt;&lt;/a&gt;DepositMOre is a modular system which is intended to be a home for many services which locate your publications. The first module to be developed is for easychair.&lt;br /&gt;&lt;br /&gt;By simply providing your login credentials to the DepositMOre system, it will not only list all your authored items from easychair but also check if these are present in your locally detected repository. If they are not deposited, and they should be, then one click will do this for you. &lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://3.bp.blogspot.com/-Jn9fKQenFlo/TxhEiufSZWI/AAAAAAAAAE4/eDUgULqIVpY/s1600/depositmore.png"&gt;&lt;img style="display:block;margin:0px auto 10px;text-align:center;cursor:pointer;cursor:hand;width:640px;" src="http://3.bp.blogspot.com/-Jn9fKQenFlo/TxhEiufSZWI/AAAAAAAAAE4/eDUgULqIVpY/s1600/depositmore.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5699380691822863714"/&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;A combination of HTML5 and SWORD2 makes this process quick and seamless! Multiple items can be submitted at once and as each is submitted you can instantly click a link to your item and view it in the repository.&lt;br /&gt;&lt;br /&gt;The following video gives a demo of the prototype in action. 
We hope to continue development with the support of a funded project.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Technologies Used&lt;/h2&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;HTML/JavaScript/jQuery/PHP&lt;/li&gt;&lt;br /&gt;&lt;li&gt;SWORD2 PHP Library - Stuart Lewis - https://github.com/stuartlewis/swordappv2-php-library/&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;</description>
         <author>Dave Tarrant</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-4926451824261299693.post-3103110777547409006</guid>
         <pubDate>Thu, 19 Jan 2012 06:51:00 +0000</pubDate>
         <media:thumbnail height="72" url="http://1.bp.blogspot.com/-WDA-88sy11c/Txgvj2mr50I/AAAAAAAAAEI/Yfwv3R4oXhs/s72-c/easychair_desktop.png" width="72" />
      </item>
      <item>
         <title>Business processes and workflow in the Web services world</title>
         <link>http://www.ibm.com/developerworks/webservices/library/ws-work.html</link>
         <guid isPermaLink="false">http://previous.delicious.com/url/27690112bc0ca3e5964fa7e311d5df3a#</guid>
         <pubDate>Mon, 19 Sep 2011 15:26:15 +0000</pubDate>
      </item>
      <item>
         <title>Financing Community Renewables Schemes – Recommendation June 2010 - Annex A</title>
         <link>http://www.scotland.gov.uk/Topics/Business-Industry/Energy/Energy-sources/19185/Resources/17613/FREDSCRIG/CRIGRecAJune2010</link>
         <guid isPermaLink="false">http://previous.delicious.com/url/0e7e7e458c0b6e5f6bcf3d42e6553cdb#</guid>
         <pubDate>Thu, 21 Jul 2011 03:47:01 +0000</pubDate>
      </item>
      <item>
         <title>David F. Flanders</title>
         <link>http://dfflanders.wordpress.com/</link>
         <guid isPermaLink="false">http://previous.delicious.com/url/1045d474fbd6c6bb28f24fa474c2cae0#</guid>
         <pubDate>Wed, 25 May 2011 14:29:56 +0000</pubDate>
      </item>
      <item>
         <title>OAI-ORE Importer for Omeka</title>
         <link>http://www.screencast.com/users/SeanHannan/folders/Default/media/47e6be9d-774f-47ae-b5a6-04b4b185e5f2</link>
         <guid isPermaLink="false">http://previous.delicious.com/url/315631b94cc8b1bf750f59faa18ebc4a#</guid>
         <pubDate>Wed, 30 Mar 2011 21:24:05 +0000</pubDate>
      </item>
      <item>
         <title>Preservation Tools - Moving Forward</title>
         <link>http://davetaz-blog.blogspot.com/2011/03/preservation-tools-moving-forward.html</link>
         <description>&lt;div&gt;Over the last number of years, JISC and other bodies have funded a number of digital preservation projects which have resulted in some really valuable contributions to the area... now is the time to realise the benefits of this work and provide a digital preservation experience to everyday users. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;To achieve this a not insignificant amount of work needs to be undertaken, namely to identify key applications and separate these from the complex systems into which they have been built. Alternatively, many applications now need re-thinking, with the best bits built into the systems which have superseded them. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;File Format Identification Tools&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;File format identification now has a number of tools available, each with its own advantages and disadvantages. In no particular order, they are:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;DROID: &lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Started out as a tool to identify file types and versions of those types. :)&lt;/li&gt;&lt;li&gt;Each file version was assigned an identifier which could be referenced and re-used. :)&lt;/li&gt;&lt;li&gt;Identification of files was done via "signature", not extension matching. 
:)&lt;/li&gt;&lt;li&gt;Became complex as it was adjusted to suit workflows and provide much more complex information which few people understand or want :(&lt;/li&gt;&lt;li&gt;Added complexity increased the time required for each file classification, no longer a simple tool :(&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;FIDO:&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;A new cut-down client which takes the DROID signature files and does the simple stuff again :)&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;FILE: &lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;A built-in Unix tool installed on every Unix-based system in the world already! :)&lt;/li&gt;&lt;li&gt;Does not do version type identification :(&lt;/li&gt;&lt;li&gt;Does not provide a mime-type URI :(&lt;/li&gt;&lt;li&gt;Very quick to run :)&lt;/li&gt;&lt;li&gt;Has the capacity to add version type identification and there is a TODO in the code for it! :)&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;With the PRONOM registry now looking at providing URIs for file versions, why can't we stop coding new tools and change the FILE library instead? That way it could handle the version information and feed back the URIs if people want them. I've looked briefly into this and the PRONOM signatures should be easy to transport and use with the file tool.  &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If I get time I might well have a go at this and feed it back to the community. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;</description>
         <author>Dave Tarrant</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-4926451824261299693.post-2661869222401824269</guid>
         <pubDate>Mon, 14 Mar 2011 04:38:00 +0000</pubDate>
      </item>
      <item>
         <title>Responsive innovation – change management in a recession</title>
         <link>http://blog.paulwalk.net/2010/12/03/responsive-innovation-change-management-in-a-recession/</link>
         <description>Back in August I gave a short presentation to the JISC Innovation Group about the DevCSI project, introducing some ideas about possible future directions. The DevCSI project is a JISC-funded initiative designed to work directly with (software) developers in Higher Education through the general approach of encouraging them to establish a community of peers, sharing knowledge, [...]</description>
         <guid isPermaLink="false">http://blog.paulwalk.net/?p=259</guid>
         <pubDate>Fri, 03 Dec 2010 16:03:48 +0000</pubDate>
         <content:encoded><![CDATA[<p>Back in August I gave a short presentation to the JISC Innovation Group about the <a rel="nofollow" target="_blank" href="http://devcsi.ukoln.ac.uk/blog/">DevCSI project</a>, introducing some ideas about possible future directions. The DevCSI project is a JISC-funded initiative designed to work directly with (software) developers in Higher Education through the general approach of encouraging them to establish a community of peers, sharing knowledge, experience, code etc. An aspect of this which has emerged during the first year of the project is the potential value in peer-training &#8211; where one developer trains a few of their peers. By supporting this kind of activity as an &#8216;add-on&#8217; to larger events, we seem to have hit on a way to deliver extremely cost-effective training to (and, importantly, <em>by</em>) the sector&#8217;s developers (we&#8217;ve done some work to calculate the financial value of this). DevCSI, then, provides a channel through which the sector, represented by JISC and UKOLN, can invest in its developers.</p>
<p>In recent years, JISC has invested in some development programmes based around an approach labelled <em><a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk/whatwedo/programmes/inf11/jiscri.aspx">Rapid Innovation</a></em>. Rapid Innovation, in this context, described an approach of investment in small, short, cheap development projects designed to &#8216;scratch an itch&#8217;. There was more than an echo of the <a rel="nofollow" target="_blank" href="http://agilemanifesto.org/">Agile Manifesto</a> in this approach. The Rapid Innovation projects tended to show the following characteristics:</p>
<ul>
<li>they brought developers more to the fore</li>
<li>they produced lighter, more frequent documentation</li>
<li>they produced working code very early in the process</li>
<li>they involved end-users directly, and throughout the project</li>
</ul>
<p>The early work of DevCSI has been informed by this work &#8211; notably in the increased awareness of adoption of agile development methodologies.</p>
<p>So why is this important?</p>
<p>The radical changes currently being introduced to the economic and political landscape around higher education in the UK are forcing universities and colleges to re-examine themselves as &#8216;businesses&#8217;. With the growing interest in commodified hardware and software and remote <em>software as a service </em>(SaaS) options for service delivery, HEIs need to examine how they can best exploit these opportunities. (The JISC&#8217;s <a rel="nofollow" target="_blank" href="http://www.jiscinfonet.ac.uk/flexible-service-delivery/programme">Flexible Service Delivery Programme</a> has been established to help institutions in this). While HEIs will have differing levels and types of interest in what are being referred to as <em>cloud services</em>, they are generally going to be searching for efficiency-based savings.</p>
<p>The value proposition of financial cost-reduction from using shared services is something which cannot be ignored by HEIs &#8211; but it seems to me that there are some things which need to be borne in mind:</p>
<ul>
<li>the biggest saving in cloud-based services is to the supplier, not the customer (although the supplier will pass on some of this saving)</li>
<li>this whole approach is not yet well understood &#8211; especially how SaaS sits with an &#8216;enterprise&#8217; service oriented architecture (SOA) approach which is also of interest to some HEIs</li>
<li>some services can be outsourced more easily, or to greater benefit, than others</li>
</ul>
<p>In <em><a rel="nofollow" target="_blank" href="http://blogs.cf.ac.uk/mwe/resource/eunis-2009-final.pdf">The role of the central IT Services organisation in a Web 2.0 world</a></em>, Joe Nicholls and David I Harrison introduce the useful characterisation of services being either <em>chore</em> or <em>core</em>. Making use of SaaS is a form of <em>outsourcing</em>, and outsourcing is a tricky thing to get right. There are arguments for outsourcing those things you have to do but have no special interest in (e.g. HEIs frequently outsource their catering operation). In the ICT service context such services might include the various administration systems which all HEIs need to operate (e.g. finance). These we might call <em>chore</em> services. However, another reason for outsourcing is a lack of capacity or expertise to deliver a service internally &#8211; whether or not that is the preferred option. Services which are <em>core</em> to the HEI&#8217;s business might fall into this category occasionally &#8211; even if this is not ideal. In a recession, with drastically reduced funding, HEIs might see more <em>core</em> services become unsustainable &#8211; or indeed need to reconsider what is core in the first place. Normally, business decisions of this sort are not so simply binary, and some complex judgement will need to be made.</p>
<p>Inevitably, the growing opportunity for outsourcing ICT services will be appealing to many HEIs &#8211; whether those services are outsourced to generic or specialist commercial suppliers, or to HE-sector-based consortia such as the <a rel="nofollow" target="_blank" href="http://kuali.org/">Kuali Foundation</a>. But outsourcing can introduce hidden costs. A lessening of control is one obvious concern. But a more insidious risk introduced by an enthusiastic embracing of outsourcing services is a temptation to start to regard the maintenance of local development expertise as a luxury. After all, if we&#8217;re going to outsource our ICT, why do we need to retain technical staff and, especially, developers? ICT is just a commodity, right?</p>
<p>Well, no. I think it is a mistake to lose sight of the advantages that come from a local capacity to perform and deal with technical innovation. A local or &#8216;in house&#8217; development capacity is a valuable resource in the normal run of things. In a recession, it is <em>vital</em>.</p>
<p>The successful organisation will use a recession to examine its business and to change in order to be ready to fully exploit the economic recovery, when it comes. And large organisations are getting better at preparing themselves to be able to innovate <em>internally</em> or <em>locally</em>. Scott Anthony, who has worked with Clayton Christensen, who coined the expression &#8220;disruptive innovation&#8221;, <a rel="nofollow" target="_blank" href="http://blogs.hbr.org/anthony/2008/12/can_established_companies_disr.html">lists some principles which inform an organisation&#8217;s ability to engage in innovation:</a></p>
<ul>
<li>Put the customer, and their important, unsatisfied job-to-be-done at the center of the innovation equation</li>
<li>Embrace the power of simplicity, convenience, and affordability</li>
<li>Create organizational space for disruptive growth businesses</li>
<li>Consider innovation levers beyond features and functions</li>
<li>Become world class at testing, iterating and adjusting</li>
</ul>
<p>(I&#8217;m not entirely enamoured of the &#8216;disruptive innovation&#8217; label &#8211; as my colleague Brian Kelly pointed out at the recent CETIS Conference, the HEI sector is receiving plenty of &#8216;disruption&#8217; right now from political forces &#8211; certainly enough to encourage innovation!)</p>
<p>In <a rel="nofollow" target="_blank" href="http://blogs.cetis.ac.uk/adam/2010/11/22/whither-innovation/"><em>Whither Innovation</em></a>, Adam Cooper of CETIS asks: &#8220;Could we leave innovation to the commercial sector and buy it in?&#8221;. Answering his own question, he quotes <a rel="nofollow" target="_blank" href="http://faculty.fuqua.duke.edu/%7Echarlesw/s591/Bocconi-Duke/Papers/C10/CohenLevinthalASQ.pdf">Cohen and Levinthal (1990)</a>, who introduce the term <em>absorptive capacity</em>, describing:</p>
<blockquote><p>&#8230;a model of firm investment in research and development (R&amp;D), in which R&amp;D contributes to a firm’s absorptive capacity&#8230;.</p></blockquote>
<p>I see a direct parallel between outsourcing too much, and losing the absorptive capacity necessary to respond to change and to innovate to meet new challenges. In my talk to the JISC Innovation Group, I presented this diagram:</p>
<p><img style="margin-left:auto;margin-right:auto;border-color:initial;border-width:0px;" src="http://www.paulwalk.net/images/change_management.jpg" alt="change_management.jpg" border="0"/></p>
<p>This diagram tries to express the role of the local developer to act as an agent enabling and supporting change in an HEI. The developer deals with the remote, outsourced ICT system at a technical level, becoming one route through which the HEI ensures it gets the best possible value out of this arrangement. Remote services are, nowadays, guaranteed to offer some sort of application programming interface (API) which allows the more technically capable customer to tailor the service to their needs, rather than simply being obliged to use an undifferentiated, default user-interface for example. Local developers are increasingly networked with their peers in other HEIs (not least because of the efforts of the <a rel="nofollow" target="_blank" href="http://devcsi.ukoln.ac.uk/blog/">DevCSI project</a>), so they become quite powerful in being able to exploit commonly used remote services through the free sharing of knowledge, technique and even code. And because local developers are, in some cases, embracing a more agile approach to development, they become the conduit through which the end-user expresses their needs to make the remote, shared service better fit their local, idiosyncratic needs. Developers can become surprisingly aware of &#8216;business&#8217; processes and information flows through an HEI, as they have to deal with them at several levels (I wrote about experiences of this sort in a previous post, <a rel="nofollow" target="_blank" href="http://blog.paulwalk.net/2008/02/26/soa-and-reusable-knowledge/">SOA and reusable knowledge</a>).</p>
<p>I see an opportunity for the <a rel="nofollow" target="_blank" href="http://devcsi.ukoln.ac.uk/blog/">DevCSI project</a> to focus its efforts on this aspect of change within our HEIs. <em>Change management </em>is going to be crucial for HEIs as they redefine what is <em>core</em> and what is <em>chore</em>, as they decide what they can do best, and what can be best done for them by others. They are going to need a capable, knowledgeable and above all agile capacity to innovate to meet new business challenges and a changed ICT environment.</p>
<p>I&#8217;ve taken to using the label <em>responsive innovation</em> to describe the act of dealing with or instigating technical change in a manner which advances the core mission of the institution. Developers are not the only part of the solution, but they are a vital part. Not only do HEIs need to hang on to their best developers, they need to invest in them, if they are to manage change and not be managed <em>by</em> the changes being imposed on them.</p>
<p>Developers are <em>core.</em></p>]]></content:encoded>
      <enclosure url="http://blogs.cf.ac.uk/mwe/resource/eunis-2009-final.pdf" length="120772" type="application/pdf" /><media:content url="http://blogs.cf.ac.uk/mwe/resource/eunis-2009-final.pdf" fileSize="120772" type="application/pdf" /><itunes:subtitle>Back in August I gave a short presentation to the JISC Innovation Group about the DevCSI project, introducing some ideas about possible future directions. The DevCSI project is a JISC-funded initiative designed to work directly with (software) developers </itunes:subtitle><itunes:summary>Back in August I gave a short presentation to the JISC Innovation Group about the DevCSI project, introducing some ideas about possible future directions. The DevCSI project is a JISC-funded initiative designed to work directly with (software) developers in Higher Education through the general approach of encouraging them to establish a community of peers, sharing knowledge, [...]</itunes:summary></item>
      <item>
         <title>Hot Topics in Scholarly Systems</title>
         <link>http://davetaz-blog.blogspot.com/2010/11/hot-topics-in-scholarly-systems.html</link>
         <description>Since I last wrote a blog post the world has been going through some harsh times where cutbacks and simplifications have been essential. The phrase "Throw money at it" no longer applies to anything and all of a sudden organisations as well as people seem far more keen to share than before (although we are still not fully open and sharing, mostly it's organisations wanting stuff without sharing themselves, but we'll get there).&lt;br /&gt;&lt;br /&gt;Anyway enough of that, what is actually happening?&lt;br /&gt;&lt;br /&gt;Well I am very proud to be at the forefront of an international effort to hold a series of scholarly technology meetings focussed on solving institutional problems. These meetings, known as the Scholarly Infrastructure Technical Summit (SITS) meetings, are being held alongside many international conferences over the next 2 years and are being backed by all the major international funding bodies. See http://bit.ly/Scholarly_Infrastructure_Technical_Summit for more info.&lt;br /&gt;&lt;br /&gt;There have now been 2 meetings, although SITS only came about because the first one was so successful. Each meeting conforms to the Open Agenda (see Wikipedia) principle and is chaired likewise. This leads to the agenda being very pertinent to the people in the room and often creates conversation critical to the forward momentum of some of the technologies discussed. In the next few paragraphs I'm going to try and summarise the hot topics from the first meeting:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;SWORD&lt;/span&gt; - Put stuff in a repository&lt;br /&gt;&lt;br /&gt;SWORD has undoubtedly been a huge success: it's simple and well supported by many publishers and publishing software (including most notably the Microsoft Office suite via the author add-in tool http://research.microsoft.com/authoring). 
There are however some problems which the community wants to address without making it more complex:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Packaging Formats - What exactly do you submit in your SWORD bundle, and how should it be formed? There was no clear consensus other than we feel endpoints should try to support a multitude of formats depending on their users.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Endpoints are hard to find, for both users and the software; this could do with being addressed either via negotiation or meta tags of some sort.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;URIs in the returned package are not well specified to say what they mean or what they should mean.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Not a complete CRUD model&lt;/li&gt;&lt;li&gt;No levels of compliance any more&lt;/li&gt;&lt;li&gt;SWORD uses basic auth (too basic?)&lt;/li&gt;&lt;/ul&gt;The general call was that these points need addressing without making the SIMPLE (that's what the S stands for) too complex. CRUD looks interesting.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;OUTCOME: &lt;/span&gt;A follow-on &lt;span style="font-weight:bold;"&gt;SWORD&lt;/span&gt; project has been funded by JISC (UK) along with a number of complementary (but separate) projects including &lt;span style="font-weight:bold;"&gt;DepositMO &lt;/span&gt;(http://blogs.ecs.soton.ac.uk/depositmo) and &lt;span style="font-weight:bold;"&gt;SONEX&lt;/span&gt; (http://sonexworkgroup.blogspot.com/).&lt;br /&gt;&lt;br /&gt;Personally I'm involved in DepositMO, which intends to use SWORD (+CRUD) at its core and extend this even further (outside of SWORD) to be fully interactive with the users. 
More can be found on the levels of conformance via the DepositMO blog (http://blogs.ecs.soton.ac.uk/depositmo).&lt;br /&gt;&lt;br /&gt;Package guidelines are to be set out by the new SWORD project along with tight definitions on what URIs mean and what it means to CRUD those URIs.&lt;br /&gt;&lt;br /&gt;Being written in to both projects I hope to bring not only technical knowledge to the table but also real world usages.&lt;br /&gt;&lt;br /&gt;There was also a call to look into technologies like OAuth and its uses in SWORD, however this was a minor part of what became a major conversation at the second meeting.&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;&lt;br /&gt;Inverse Sword&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;&lt;span style="font-weight:bold;"&gt;&lt;span style="font-weight:bold;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;This conversation started on workflows and a discussion on the opportunities for common workflows and their impact. &lt;span style="font-weight:bold;"&gt;&lt;span style="font-weight:bold;"&gt;&lt;span style="font-weight:bold;"&gt;&lt;span style="font-weight:bold;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;The problem is that workflows tend to be very specific and quite heavyweight in their approach to a problem, often constrained by the domain. This is the advantage of SWORD: it doesn't specify one, just a technique for transferring stuff. So what about reverse SWORD, where you request a URI and the packaging format you want?&lt;br /&gt;&lt;br /&gt;This basically then re-inforced the conversation on what it meant to have SWORD endpoints supporting full CRUD using content negotiation to agree on packaging formats.  Clearly something to take forward... 
as it was!&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Storage for Digital Repositories&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Question was (not from me): What is there beyond Akubra (now DuraCloud) and my two projects (one of which has been finished)? &lt;span style="font-weight:bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;It is clear that there are now a whole range of storage options and technologies with infinite numbers of APIs; luckily many of the cloud providers use the S3 API (which is good!). So what rule languages are there for expressing &lt;span style="font-weight:bold;"&gt;where&lt;/span&gt; things should be stored?&lt;br /&gt;&lt;br /&gt;I briefly explained the EPrints implementation (labelled as mine but it isn't, it's EPrints property) which uses lightweight plug-ins to communicate with each service. These plug-ins implement 4 API calls (Store, Retrieve, Delete and one other necessary I won't bother explaining here). There is then an XML/XSLT based policy file which dictates which plug-ins are used to store what. Each file is then stored and its metadata adjusted to state where it is stored in case policy changes. Upon a policy change, the files can be re-arranged to their correct locations again. This can also handle changes in storage architecture and whole services being taken offline. The advantage of this approach, which the community likes, is that you can use any number of storage solutions simultaneously and store as many copies of files on different ones as you like. For more see http://eprints.ecs.soton.ac.uk/17084/.&lt;br /&gt;&lt;br /&gt;The actions from this were that others were going to look at this implementation to see if this rule-based language could apply on other repository platforms. 
Further it would be nice to have some good reference architectures available from vendors.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Services and Configuration Languages &lt;/span&gt;(was Common Platforms/Tools on the day)&lt;br /&gt;&lt;br /&gt;This was an interesting conversation which started around the idea of being able to re-use technologies by re-using/calling code libraries directly. The problem here (as I see it) is the number of coding environments and versions of these environments available.&lt;br /&gt;&lt;br /&gt;The solution is REST (not SOAP) APIs on the web and abstraction APIs in the code (e.g. SOLR) which enable you to call functions from (say) the command line, without having to &lt;span style="font-style:italic;"&gt;understand&lt;/span&gt; the code.&lt;br /&gt;&lt;br /&gt;David Flanders perhaps summed it up best: there are levels of interaction, some easier than others:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Core System (hard)&lt;/li&gt;&lt;li&gt;Exposing structured data&lt;/li&gt;&lt;li&gt;End user interfaces (including APIs)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;XML for configuration is a bit of a sticking point with users, but you need a machine readable language to configure the machine. Perhaps the point here is to use XML only if you need it; otherwise simple config files with "=" signs in them are fine.&lt;br /&gt;&lt;br /&gt;There is no real answer to this question other than try and keep it simple... stupid.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Author IDs (URIs)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;&lt;/span&gt;&lt;/span&gt;Yes it's our favourite topic raising its ugly head again!&lt;br /&gt;&lt;br /&gt;It is clear that there are many efforts in this area, none of which have fully succeeded yet&lt;span style="font-weight:bold;"&gt;&lt;span style="font-weight:bold;"&gt;&lt;/span&gt;. 
&lt;/span&gt;There is still much interest in this area however and it is clear that we should be prepared to handle multiple IDs for a single author and be able to align them (if allowed) at a later stage.&lt;br /&gt;&lt;br /&gt;Currently the project to watch is ORCID which is a continuation of a previous project by Thomson (which failed commercially in this project).&lt;br /&gt;&lt;br /&gt;The consensus was however that we are not wrong to mint URIs for our authors in our repositories.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Conclusions&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Identification/Authorisation is a problem: can technologies like OAuth not only help with authorisation but also with identification? This could be a very interesting area.&lt;span style="font-weight:bold;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;SWORD being taken forward is a very positive outcome of the first SITS meeting.&lt;span style="font-weight:bold;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Simple services with simple APIs are so much more effective than "project centric" solutions and bloatware.&lt;span style="font-weight:bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Simple services are usable by lots of people!&lt;span style="font-weight:bold;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;</description>
         <author>Dave Tarrant</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-4926451824261299693.post-318178988423514450</guid>
         <pubDate>Mon, 22 Nov 2010 07:13:00 +0000</pubDate>
      </item>
      <item>
         <title>djatoka</title>
         <link>http://african.lanl.gov/adore-djatoka/</link>
         <guid isPermaLink="false">http://previous.delicious.com/url/66dc259754079bf51c80414b3bb04c5f#</guid>
         <pubDate>Tue, 16 Nov 2010 01:40:14 +0000</pubDate>
      </item>
      <item>
         <title>GeoCrossWalk</title>
         <link>http://www.geoxwalk.ac.uk/</link>
         <guid isPermaLink="false">http://previous.delicious.com/url/68b098f38de8289c6a78090f981566ba#</guid>
         <pubDate>Wed, 27 Oct 2010 19:58:46 +0000</pubDate>
      </item>
      <item>
         <title>OJS/OCS Repository Deposit Project - APSRWiki</title>
         <link>http://pilot.apsr.edu.au/wiki/index.php/OJS/OCS_Repository_Deposit_Project</link>
         <guid isPermaLink="false">http://previous.delicious.com/url/0aded1bb3b6e5594322fff83e01cd960#</guid>
         <pubDate>Wed, 20 Oct 2010 15:32:29 +0000</pubDate>
      </item>
      <item>
         <title>XForms has been Dead for Years.  What to do?</title>
         <link>http://feedproxy.google.com/~r/MediashelfBlog/~3/N7paiU4Jjek/</link>
         <description>At Open Repositories in 2007, I gave a presentation on Simplifying Fedora Frontends with XForms and Fedora Disseminators (PPT).  This work showed how to leverage Fedora disseminators to embed XForms directly into your Fedora repository, thus embedding a basic edit interface directly into your archival objects.  The reference implementation used Orbeon XForms and a whole lot of [...]</description>
         <guid isPermaLink="false">http://yourmediashelf.com/blog/?p=383</guid>
         <pubDate>Tue, 20 Jul 2010 21:00:02 +0000</pubDate>
         <content:encoded><![CDATA[<p>At <a rel="nofollow" target="_blank" href="http://openrepositories.org/2007/program/fedora">Open Repositories</a> in 2007, I gave a presentation on <a rel="nofollow" target="_blank" href="http://yourmediashelf.com/reference/xforms/XForms%20and%20Fedora.ppt">Simplifying Fedora Frontends with XForms and Fedora Disseminators</a> (PPT).  This work showed how to leverage Fedora disseminators to embed XForms directly into your Fedora repository, thus embedding a basic edit interface directly into your archival objects.  The reference implementation used Orbeon XForms and a whole lot of XSLT.  Shortly after giving that presentation, I dove into Ruby on Rails in a big way and swiftly realized that I was gazing up the wrong tree.  By comparison with Rails, my development process was slow, my code was difficult to test or debug, and the user interfaces I had created were kludgy, ugly and difficult to refine.  From an architectural perspective, the XForms+Disseminators work was a great concept, but from an engineering perspective, it had serious flaws.  The most damning flaw was the fact that it put the ideals of software architecture ahead of the interests of real-life user interaction.</p>
<p>After careful consideration, I abandoned XForms and threw my energy into putting Rails on top of Fedora Repositories, which I reported on one year later at Open Repositories 2008 in a presentation called <a rel="nofollow" target="_blank" href="http://pubs.or08.ecs.soton.ac.uk/108/">How we Integrated Fedora into Ruby on Rails and How You Can Use It</a>.  This in turn led to the development of ActiveFedora and the resulting Hydra framework.</p>
<p>I was not alone in my decision to abandon XForms.  In January 2008, the W3C published the first draft of HTML5. Later, the funding for the <a rel="nofollow" target="_blank" href="http://www.mozilla.org/projects/xforms/">Mozilla XForms project</a> evaporated.  Since then, to my knowledge, none of the browser manufacturers have committed any resources to supporting XForms.  By contrast, all of the major browser manufacturers have committed to support HTML5 (with HTML5 Forms) and have already made substantial headway in that direction.</p>
<p>Meanwhile, my old presentation from 2007 has been floating around the web.  I periodically get emails from people who are enthusiastic about using XForms with their Fedora repositories.  Every time that happens, I do my best to discourage them and promise to put up a blog post enumerating my criticisms of XForms. Thus, with sincere apologies for the lengthy delay, here is the quick rundown of my thoughts on XForms and a bit of info about how MediaShelf is dealing with complex, hierarchical XML in the repository-driven applications that we build.</p>
<h3>XForms are to Beta Cassettes as HTML5 Forms are to VHS</h3>
<p>XHTML2 and, by extension, XForms have lost their bid to become the chosen replacements for HTML4. HTML5, CSS3 and HTML5&#8217;s Forms (formerly WebForms) are the official successors, and they are looming large on the horizon.  All of the major browsers already support them.   The further you sink your legs into XForms, the harder it will be to keep up when this wave hits the mainstream.</p>
<p>In case that&#8217;s not &#8217;nuff said, let&#8217;s look a little more deeply.</p>
<h4>XForms is a dead technology.</h4>
<p>Technologies are like languages &#8212; Latin is a dead language but is still used (and useful) in rare cases.  The same is true for XForms.  If you choose to use XForms, you&#8217;re choosing to use a technology that will never enjoy broad adoption and will increasingly fall short of contemporary development practices.</p>
<p>Losing the battle to become a standard doesn&#8217;t always kill a technology, but in this case it did.  XForms was dead in the water in 2008 when the W3C decided to favor HTML5 over XHTML2.  The writing was on the wall when the major corporate support for XForms quickly evaporated (e.g. IBM cancelled its funding for the Mozilla XForms Project, which to my knowledge has never seen any work since).</p>
<p>While some organizations do still use XForms, this doesn&#8217;t mean that the technology is strong &#8212; after all, plenty of hefty corporations still run and spend money on maintaining COBOL apps.</p>
<h4>XForms makes it difficult to stay abreast of contemporary interface and interaction design.</h4>
<p>The structure and conceptual framework of XForms feels strange and overly complex to most developers (especially developers who don&#8217;t like declarative languages).  This discourages innovation on the part of those developers, who are only going to work with XForms when they are forced to.  Since XForms effectively died in 2008, none of the innovations since 2007 have happened in the XForms arena.  Case in point: XForms is incompatible with the vast majority of the JavaScript widgets and plugins floating around out there, which in turn means that unless you re-implement all of those plugins you&#8217;re stuck with kludgy, outdated interactivity in your forms.</p>
<h3>XForms has some real design flaws</h3>
<h4>XForms assumes that In-Browser interactivity means Asynchronous XML.</h4>
<p>With its rigid fixation on XML, XForms prevents you from taking advantage of non-XML communications.</p>
<p>The very notion of AJAX has evolved substantially since the inception of XForms.  In fact, the very name AJAX has fallen from favor because what originally manifested as &#8220;Asynchronous JavaScript and XML&#8221; has now become much more open ended.  Notably, convention has drifted heavily in the direction of preferring JSON over XML by virtue of its simplicity and attendant ease of serialization/deserialization. Meanwhile XForms instead focuses on embracing the full complexity of XML, which adds complexity without much, or any, real payoff.  Further, with the arrival of HTML5, the predominant conventions, which are not all XML-centric, will become ensconced as the standard.</p>
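<p>To make the serialization point concrete, here is a minimal Python sketch (not taken from any of the projects mentioned in this post) that round-trips the same small record through JSON and through XML.  The record, the function names and the element names are all made up for illustration.</p>
<pre>
```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for this example.
RECORD = {"title": "A sample article", "authors": ["Ada", "Grace"]}


def to_json(rec):
    # The dict maps directly onto JSON; no schema decisions are needed.
    return json.dumps(rec, sort_keys=True)


def from_json(text):
    return json.loads(text)


def to_xml(rec):
    # For XML we have to choose element names and nesting ourselves.
    root = ET.Element("record")
    ET.SubElement(root, "title").text = rec["title"]
    authors = ET.SubElement(root, "authors")
    for name in rec["authors"]:
        ET.SubElement(authors, "author").text = name
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    # ...and the reader has to know those same choices to get the data back.
    root = ET.fromstring(text)
    return {
        "title": root.findtext("title"),
        "authors": [node.text for node in root.findall("authors/author")],
    }
```
</pre>
<p>Both round trips return the original record, but the XML pair has to encode the structure twice, once when writing and once when reading, whereas the JSON pair does not.</p>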
<h4>XForms encourages you to break the MVC pattern in your applications.</h4>
<p>Though XForms uses an MVC model internally, it encourages you to break the MVC model in your application by forcing you to embed decisions about your XML schema(s) &#8212; which are an implementation detail of your model &#8212; into the XForms.  Thus, you&#8217;re forced to put representations of your model into the view layer of your application.  This is especially noticeable when your user wants to edit metadata that is actually spread across multiple XML documents (and schemas) in your model &#8212; though you can technically handle this in XForms with XInclude, etc, it is far from ideal.</p>
<p>While this is merely bad design when you&#8217;ve got just one form in one application, it compounds upon itself as you add new forms because decisions about the schema become replicated across numerous XForms controls within each of your edit views.  Any time you want to change something about your model (e.g. which attribute to use on a particular metadata field), you have to update all of the corresponding references in all of your forms.</p>
<p>You could add a layer of abstraction in order to at least partially achieve this separation of concerns, but in that case, once you&#8217;ve added an abstraction layer, why the heck would you use XForms instead of a regular dynamic form?</p>
<h3>How, then, to deal with complex XML?</h3>
<p>This brings us back to the real issue at hand.  Most of the people who ask me about XForms are building interfaces for creating and/or editing complex XML metadata.  Most often they are either dealing with MODS or Qualified Dublin Core.  Since the metadata is already in XML, XForms seem like a perfect fit.</p>
<p>Though I was able to offer criticisms, until recently I haven&#8217;t been able to offer a complete, viable alternative to XForms.  ActiveFedora has supported Dublin Core since late 2008, but that was neither a complete nor a freestanding solution, and it wasn&#8217;t any help at all for those poor souls who are coping with MODS.  Now, as of last month, this has all changed.</p>
<p>As part of MediaShelf&#8217;s work on Hydra for the Stanford Libraries, we have created a Ruby gem called OM (Opinionated Metadata). OM is a server-side solution for mapping between your application&#8217;s metadata vocabulary and a specific XML structure.  In order to use it, all you need to do is provide a single file specifying how your application terms map to nodes in an XML document.  Based on that information, it uses XPath and some of the characteristics of XUpdate to give you simple tools for predictably creating, retrieving, updating and deleting nodes within complex XML documents.</p>
<p>Though it&#8217;s only in its first stages of development, I think OM has some real promise.  We are currently using it in Hydrangea (the Hydra demo application) to edit MODS descriptive metadata and to edit rights metadata based on the Hydra rightsMetadata schema.  Also, ActiveFedora natively supports using OM to index complex, hierarchical XML into Solr, so a single OM mapping file gets you all of the basics you need in order to use complex XML metadata in all of your application&#8217;s Search/View/Edit functionality while putting regular HTML (either HTML4 or HTML5) into your views.</p>
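<p>The general idea of mapping application terms to XML nodes can be sketched in a few lines.  The following is a Python illustration of the concept only; it is not OM&#8217;s API, and the mapping table, helper names and sample values are all hypothetical.  It keeps the XPath expressions in one place so that the rest of the application only ever refers to terms.</p>
<pre>
```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical mapping: application terms on the left, XPath on the right.
TERM_MAP = {
    ("title",): "mods:titleInfo/mods:title",
    ("person", "first_name"):
        "mods:name[@type='personal']/mods:namePart[@type='given']",
    ("person", "last_name"):
        "mods:name[@type='personal']/mods:namePart[@type='family']",
}


def qualified(tag):
    # Build a namespace-qualified tag name in ElementTree's Clark notation.
    return "{%s}%s" % (MODS_NS["mods"], tag)


def build_sample():
    # A tiny MODS-like document, built in code to keep the example short.
    root = ET.Element(qualified("mods"))
    title_info = ET.SubElement(root, qualified("titleInfo"))
    ET.SubElement(title_info, qualified("title")).text = "A sample article"
    name = ET.SubElement(root, qualified("name"), {"type": "personal"})
    ET.SubElement(name, qualified("namePart"), {"type": "given"}).text = "Ada"
    ET.SubElement(name, qualified("namePart"), {"type": "family"}).text = "Lovelace"
    return root


def get_values(doc, term):
    # Retrieve the text of every node that the term maps to.
    return [node.text for node in doc.findall(TERM_MAP[term], MODS_NS)]


def update_value(doc, term, index, new_text):
    # Update the index-th node that the term maps to.
    nodes = doc.findall(TERM_MAP[term], MODS_NS)
    nodes[index].text = new_text
```
</pre>
<p>With something like this in place, a form handler can call <em>update_value(doc, ("person", "first_name"), 0, "Augusta")</em> without knowing anything about MODS, and a schema change only touches the mapping table.</p>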
<p>There is much more information to come on OM.  In the meantime, go directly to the code.</p>
<p>The OM code: <a rel="nofollow" title="opinionated metadata on github" target="_blank" href="http://github.com/mediashelf/om">http://github.com/mediashelf/om</a></p>
<p>The Hydrangea code, which uses OM: <a rel="nofollow" target="_blank" href="http://github.com/projecthydra/hydrangea">http://github.com/projecthydra/hydrangea</a></p>
<p>Within Hydrangea, check out the OM mapping files for <a rel="nofollow" target="_blank" href="http://github.com/projecthydra/hydrangea/blob/master/vendor/plugins/hydra_repository/lib/hydra/mods_article.rb">MODS Articles</a> and <a rel="nofollow" target="_blank" href="http://github.com/projecthydra/hydrangea/blob/master/vendor/plugins/hydra_repository/lib/hydra/rights_metadata.rb">Hydra Rights Metadata</a>.  (Note: Both files also specify custom methods that augment the behaviors provided by OM.)  You can see how the resulting mappings (e.g. <em>[:person, :first_name]</em>) are used in the <a rel="nofollow" target="_blank" href="http://github.com/projecthydra/hydrangea/tree/master/app/views/hydrangea_articles">edit form templates for hydrangea articles</a>.</p>
]]></content:encoded>
      </item>
      <item>
         <title>JISC CNI Meeting, Edinburgh</title>
         <link>http://blog.paulwalk.net/2010/07/02/jisc-cni-meeting-edinburgh/</link>
         <description>I&amp;#8217;ve been at the excellent JISC CNI Meeting in Edinburgh these last two days. Lots of interesting work was described, and I met some great new people. Some people have asked me to post my slides, so here they are: JISC CNI Meeting, Edinburgh 2010.</description>
         <guid isPermaLink="false">http://blog.paulwalk.net/?p=239</guid>
         <pubDate>Fri, 02 Jul 2010 13:21:30 +0000</pubDate>
         <content:encoded><![CDATA[<p>I&#8217;ve been at the excellent <a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk/events/2010/07/cni.aspx">JISC CNI Meeting in Edinburgh</a> these last two days. Lots of interesting work was described, and I met some great new people. Some people have asked me to post my slides, so here they are:</p>
<div style="width:425px;"><strong><a rel="nofollow" title="JISC CNI Meeting, Edinburgh 2010" target="_blank" href="http://www.slideshare.net/paulwalk/jisc-cni-meeting-edinburgh-2010">JISC CNI Meeting, Edinburgh 2010</a></strong> 
</div>]]></content:encoded>
      </item>
      <item>
         <title>Usage Statistics parsing and querying with redis and python</title>
         <link>http://oxfordrepo.blogspot.com/2010/03/usage-statistics-parsing-and-querying.html</link>
         <description>This is an update of my previous dabblings with chomping through log files. To summarise where I am now:&lt;br /&gt;&lt;br /&gt;I have a distributable workflow, loosely coordinated using Redis and Supervisord - Redis is used in two fashions: firstly using its lists as queues, buffering the communication between the workers, and secondly as a store, counting and associating the usage with the items and the metadata entities (people, subjects, etc) of those items.&lt;br /&gt;&lt;br /&gt;I have written a very small Python logger that pushes loglines directly onto a Redis list, providing me with live updating abilities, as well as manual log file parsing. This is currently switched on for testing in the live repository.&lt;br /&gt;&lt;br /&gt;Current code base is here: &lt;a rel="nofollow" target="_blank" href="http://github.com/benosteen/UsageLogAnalysis"&gt;http://github.com/benosteen/UsageLogAnalysis&lt;/a&gt; - it has a good number of things hardcoded to the peculiarities of my log files and repository. 
However, as part of the &lt;a rel="nofollow" target="_blank" href="http://www.cranfieldlibrary.cranfield.ac.uk/pirus2/tiki-index.php"&gt;PIRUS 2 project&lt;/a&gt;, I am turning this into an easily reusable codebase, adding in the ability to push out OpenURLs to PIRUS statistics gatherers.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Overview&lt;/strong&gt;:&lt;br /&gt;&lt;br /&gt;Loglines  -- lpush'd to 'q:loglines'&lt;br /&gt;&lt;br /&gt;workers -  'debot.py' - pulls lines from this queue and parses them up, separating them into 4 categories:&lt;br /&gt;&lt;ol&gt;&lt;br /&gt; &lt;li&gt;Any hit by a recognised Bot or spider&lt;/li&gt;&lt;br /&gt; &lt;li&gt;Any view or download made by a real person on an item in the repository&lt;/li&gt;&lt;br /&gt; &lt;li&gt;Any 404, etc&lt;/li&gt;&lt;br /&gt; &lt;li&gt;And anything else&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;and the lines are moved onto 4 (5) queues respectively, q:bothits, q:objectviews (and q:count simultaneously), q:fof, and q:other. I am using prefixes as a convention when working with Redis keys - "q:" will almost always be a queue of some sort. These four queues are consumed by loggers, who commit the logs to disc, segregated into their categories.&lt;br /&gt;&lt;br /&gt;The q:count queue is consumed by a further worker called - count.py. This does a number of jobs, and is the part that actually does the analysis.&lt;br /&gt;&lt;br /&gt;For each repository item logged event, it finds the ID of the item and also whether this was a download of an item's files. With my repository, both these facts are deducible from the URL itself.&lt;br /&gt;&lt;br /&gt;Given the ID, it checks redis to see if this item has had its metadata analysed before. 
If it hasn't, it grabs the metadata for the item from the repository's index (hosted by an instance of Apache Solr) and starts to add connections between metadata entity and ID to the Redis index:&lt;br /&gt;&lt;br /&gt;eg say item "pid:1" has the simple metadata of author_name='Ben' and subjects='foo, bar'&lt;br /&gt;&lt;br /&gt;Create unique IDs by hashing the text and prefixing the hash with the type of the field it came from:&lt;br /&gt;&lt;br /&gt;Prefixes:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt; &lt;li&gt;name =&amp;gt; "n:"&lt;/li&gt;&lt;br /&gt; &lt;li&gt;institution =&amp;gt; "i:"&lt;/li&gt;&lt;br /&gt; &lt;li&gt;faculty =&amp;gt; "f:"&lt;/li&gt;&lt;br /&gt; &lt;li&gt;subjects =&amp;gt; "s:"&lt;/li&gt;&lt;br /&gt; &lt;li&gt;keyphrases =&amp;gt; "k:"&lt;/li&gt;&lt;br /&gt; &lt;li&gt;content type =&amp;gt; "type:"&lt;/li&gt;&lt;br /&gt; &lt;li&gt;collection =&amp;gt; "col:"&lt;/li&gt;&lt;br /&gt; &lt;li&gt;thesis type =&amp;gt; "tt:"&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;eg&lt;br /&gt;&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; from hashlib import md5&lt;br /&gt;&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; md5("Ben").hexdigest()&lt;br /&gt;&lt;br /&gt;'092f2ba9f39fbc2876e64d12cd662f72'&lt;br /&gt;&lt;br /&gt;So, the hashkey of the 'name' 'Ben' is 'n:092f2ba9f39fbc2876e64d12cd662f72'&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Now to make the connections in Redis:&lt;/strong&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt; &lt;li&gt;Add ID to the set 'objectitems' - to keep track of all the IDs (SADD objectitems {ID})&lt;/li&gt;&lt;br /&gt; &lt;li&gt;Set 'n:092f2....' to 'Ben' (so we can keep a reverse mapping)&lt;/li&gt;&lt;br /&gt; &lt;li&gt;Add 'n:092f2...' to 'names' set (to make it clearer. KEYS n:* should return an equivalent set)&lt;/li&gt;&lt;br /&gt; &lt;li&gt;Add 'n:092f2...' to 'e:{id}' eg "e:pid:1" - (e -&amp;gt; prefix for collections of entities. e:{id} is a set of all entities that occur in id)&lt;/li&gt;&lt;br /&gt; &lt;li&gt;Add 'e:pid:1' to 'e:n:092f2....' 
(gathers a list of item ids in which this entity 'Ben' occurs)&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;Repeat for any entity you wish to track.&lt;br /&gt;&lt;br /&gt;To make this more truth-manageable, you should include the id of the record with the text when you generate the hashkey. That way, 'Ben' appearing in one record will have a different key than 'Ben' occurring in another. The assertion that these two entities are the same can easily take place in a different set (I'm using b: as the prefix for these bundles of asserted equivalence).&lt;br /&gt;&lt;br /&gt;Once you have made these assertions, you can set about counting :)&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Conventions for tracking hits:&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;d[v|d|o]:{id} - set of the dates on which {id} was viewed (v), downloaded from (d) or any other page action (o)&lt;br /&gt;&lt;p style="padding-left:30px;"&gt;eg dv:pid:1 -&amp;gt; set of dates on which pid:1 had page views.&lt;/p&gt;&lt;br /&gt;YYYY-MM-DD:{id}:[v|d|o] - set of IP clients that accessed a particular item on a given day - v,d,o as above&lt;br /&gt;&lt;p style="padding-left:30px;"&gt;eg 2010-02-03:pid:1:d - set of IP clients that downloaded a file from pid:1 on 2010-02-03&lt;/p&gt;&lt;br /&gt;t:views:{hashkey}, t:dls:{hashkey}, t:other:{hashkey}&lt;br /&gt;&lt;p style="padding-left:30px;"&gt;Grand totals of views, downloads or other accesses on a given entity or id. Good for quick lookups.&lt;/p&gt;&lt;br /&gt;Let's walk through an example: consider that a client of IP 1.2.3.4 visits the record page for this 'pid:1' on 2010-01-01:&lt;br /&gt;&lt;br /&gt;ID = pid:1&lt;br /&gt;&lt;br /&gt;Add the User Agent string ("mozilla... 
etc") to the 'ua:{IP}' set, to keep track of the fingerprints of the visitors.&lt;br /&gt;&lt;br /&gt;Try to add the IP address to the set  - in this case "2010-01-01:pid:1:v"&lt;br /&gt;&lt;br /&gt;If the IP isn't already in this set (the client hasn't accessed this page already today) then:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt; &lt;li&gt;make sure that "2010-01-01" is a part of the 'dv:pid:1' set&lt;/li&gt;&lt;br /&gt; &lt;li&gt;go through all the entities that are part of pid:1 (n:092... etc) and increment their totals by one.&lt;br /&gt;&lt;ul&gt;&lt;br /&gt; &lt;li&gt;INCR t:views:n:092...&lt;/li&gt;&lt;br /&gt; &lt;li&gt;INCR t:views:pid:1&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;strong&gt;Now, what about querying?&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Say we wish to look up the activity on a given entity, say for 'Ben'?&lt;br /&gt;&lt;br /&gt;First, find the hashkey(s) that exist that are equivalent - either directly using the simple md5sum hash, or by checking which bundles are for this entity.&lt;br /&gt;&lt;br /&gt;You can get the grand totals by simply querying "t:views:key", "t:dls..." for each key and summing them together.&lt;br /&gt;&lt;br /&gt;You can get more refined answers by getting the set of IDs that this entity is associated with, and querying that to gather all the daily IP sets for them, and summing the answer. 
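&lt;br /&gt;&lt;br /&gt;The key conventions and the counting walk-through above can be sketched roughly as follows. This is an illustration only, not the code from the UsageLogAnalysis repository: plain Python dictionaries stand in for Redis sets and counters so that it runs without a server, the function names are made up, and for brevity the entity's set holds the bare item id rather than the 'e:'-prefixed one.&lt;br /&gt;
&lt;pre&gt;
```python
from collections import defaultdict
from hashlib import md5

sets = defaultdict(set)      # stand-in for Redis sets (SADD / SMEMBERS)
counters = defaultdict(int)  # stand-in for Redis counters (INCR)


def hashkey(prefix, text):
    # Prefix plus md5 hex digest, e.g. prefix "n:" for a name.
    return prefix + md5(text.encode("utf-8")).hexdigest()


def link_entity(item_id, prefix, text):
    # Record the two-way association between an item and an entity.
    key = hashkey(prefix, text)
    sets["objectitems"].add(item_id)
    sets["e:" + item_id].add(key)   # entities that occur in this item
    sets["e:" + key].add(item_id)   # items in which this entity occurs
    return key


def record_view(item_id, ip, day):
    # Count a page view once per client IP per day.
    daily = "%s:%s:v" % (day, item_id)
    if ip in sets[daily]:
        return False                # this client was already counted today
    sets[daily].add(ip)
    sets["dv:" + item_id].add(day)
    counters["t:views:" + item_id] += 1
    for key in sets["e:" + item_id]:
        counters["t:views:" + key] += 1
    return True


def daily_views(entity_key):
    # Sum the sizes of the daily IP sets over every item of the entity.
    totals = defaultdict(int)
    for item_id in sets["e:" + entity_key]:
        for day in sets["dv:" + item_id]:
            totals[day] += len(sets["%s:%s:v" % (day, item_id)])
    return dict(totals)
```
&lt;/pre&gt;
Summing the sizes of the daily IP sets per date, as daily_views does in this sketch, yields one count per day for the entity. 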
This gives me a nice way to generate data suitable for a daily activity sparkline, like:&lt;br /&gt;&lt;br /&gt;&lt;img class="alignnone" title="Usage sparkline" src="http://chart.apis.google.com/chart?chs=400x125&amp;amp;cht=ls&amp;amp;chco=0077CC&amp;amp;chds=0,15&amp;amp;chxt=x&amp;amp;chxl=0:|2009-07-16|2009-09-16|2010-03-17&amp;amp;chd=e:AAAAAAAAAAAAAAAAAAAAAAAAYAAMAAAAAAAMAMAAAAAAYAAAAMMAkMAAAAAAAMAAAAAAYAAAYAAAAAMAAAAAMAAMAAYAAAAAAAAAAAYMAAAAAAAAAAAAMAAAAAAAAAMAAAAAAAAAAMMYAMAAAAAAYAAAAAMAMAMMAAAAMAAAAMAMAAAM8MMkAAAAAAAAAAAMAAAAMAAAAkAYAAMMAMMAAAAAAAAAAAAAAMYAAAAMAMAAYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMMAMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAYA" alt="" width="400" height="125"/&gt;&lt;br /&gt;&lt;br /&gt;I have added another set of keys to the store, of the form 'geocode:{IP}' that record country code to IP address, which gives me a nice way to plot out graphs like the following also using the google chart API:&lt;br /&gt;&lt;br /&gt;&lt;img class="alignnone" title="Usage distribution of an item from the repository" src="http://chart.apis.google.com/chart?cht=t&amp;amp;chs=440x220&amp;amp;chd=s:_&amp;amp;chf=bg,s,EAF7FE&amp;amp;chtm=world&amp;amp;chco=FFFFFF,FF0000,FFFF00,00FF00&amp;amp;chld=AEGRHKTRDEJPUSKRKWGBUKINEUNLSGSD&amp;amp;chd=t:0,0,0,0,0,0,40,46,0,0,100,6,6,0,0,0" alt="" width="440" height="220"/&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Python logging to Redis&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;This functionality is mainly in one file in the github repo: &lt;a rel="nofollow" target="_blank" href="http://github.com/benosteen/UsageLogAnalysis/blob/master/redislogger.py"&gt;redislogger.py&lt;/a&gt;﻿&lt;br /&gt;&lt;br /&gt;As you can see, most of that file is taken up with a demonstration of how to invoke it! 
The file that holds the logging configuration which this demo uses is in &lt;a rel="nofollow" target="_blank" href="http://github.com/benosteen/UsageLogAnalysis/blob/master/logging.conf.example"&gt;logging.conf.example&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;NB The usage analysis code and UI is very much a WIP&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;but, I just wanted to post quickly on the rough overview on how it is set up and working.&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://3.bp.blogspot.com/_KLlGSypGAvw/S6yPzoLmhXI/AAAAAAAAAGY/f5dWVvq7T5Q/s1600/repo_statistics.png"&gt;&lt;img style="display:block;margin:0px auto 10px;text-align:center;cursor:pointer;cursor:hand;width:369px;height:400px;" src="http://3.bp.blogspot.com/_KLlGSypGAvw/S6yPzoLmhXI/AAAAAAAAAGY/f5dWVvq7T5Q/s400/repo_statistics.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5452891365961008498"/&gt;&lt;/a&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://2.bp.blogspot.com/_KLlGSypGAvw/S6yPzH1bZcI/AAAAAAAAAGQ/z9o1QHxNmlo/s1600/repo_stats_for_item.png"&gt;&lt;img style="display:block;margin:0px auto 10px;text-align:center;cursor:pointer;cursor:hand;width:358px;height:400px;" src="http://2.bp.blogspot.com/_KLlGSypGAvw/S6yPzH1bZcI/AAAAAAAAAGQ/z9o1QHxNmlo/s400/repo_stats_for_item.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5452891357278070210"/&gt;&lt;/a&gt;</description>
         <author>Ben O'Steen</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3090914606822911489.post-4239157151754782056</guid>
         <pubDate>Fri, 26 Mar 2010 03:36:00 +0000</pubDate>
         <media:thumbnail height="72" url="http://3.bp.blogspot.com/_KLlGSypGAvw/S6yPzoLmhXI/AAAAAAAAAGY/f5dWVvq7T5Q/s72-c/repo_statistics.png" width="72" />
      </item>
      <item>
         <title>Curating content from one repository to put into another</title>
         <link>http://oxfordrepo.blogspot.com/2010/03/curating-content-from-one-repository-to.html</link>
         <description>First you need a little code that I've written:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;sudo easy_install recordsilo oaipmhscraper &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;(This should install all the dependencies for the following)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;To harvest some OAI-PMH records from say... &lt;a rel="nofollow" target="_blank" href="http://eprints.soton.ac.uk/perl/oai2"&gt;http://eprints.soton.ac.uk/perl/oai2&lt;/a&gt; :&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;First, take a look at the Identify page for the OAI-PMH endpoint: &lt;a rel="nofollow" target="_blank" href="http://eprints.soton.ac.uk/perl/oai2?verb=Identify"&gt;http://eprints.soton.ac.uk/perl/oai2?verb=Identify&lt;/a&gt; &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The example identifier indicates that the record identifiers start with: "oai:eprints.soton.ac.uk:" - we'll need this in a bit. Maybe not need, but it'll make the local storage more... 
elegant?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Go to a nice clean directory, with enough storage to handle whatever you want to harvest.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;Start a python commandline:&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; from oaipmhscraper import OAIPMHScraper&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;---&amp;gt; NB  OAIPMHScraper(storage_dir, base_oai_url, identifier_uri_prefix)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; oaipmh = OAIPMHScraper("myrepo",&lt;/div&gt;&lt;div&gt;                                                         "http://eprints.soton.ac.uk/perl/oai2", &lt;/div&gt;&lt;div&gt;                                                         "oai:eprints.soton.ac.uk:")&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's have a look at what could be found out about the OAI-PMH endpoint then:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; oaipmh.state&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;{'lastidentified': '2010-03-25T15:57:15.670552', 'identify': {'deletedRecord': 'persistent', 'compression': [], 'granularity': 'YYYY-MM-DD', 'baseURL': 'http://eprints.soton.ac.uk/perl/oai2', 'adminEmails': ['mailto:eprints@soton.ac.uk'], 'descriptions': ['........'], 'protocolVersion': '2.0', 'repositoryName': 'e-Prints Soton', 'earliestDatestamp': '0001-01-01 00:00:00'}}&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; oaipmh.getMetadataPrefixes()&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;{'oai_dc': ('http://www.openarchives.org/OAI/2.0/oai_dc.xsd', 'http://www.openarchives.org/OAI/2.0/oai_dc/'), 'uketd_dc': ('http://naca.central.cranfield.ac.uk/ethos-oai/2.0/uketd_dc.xsd', 'http://naca.central.cranfield.ac.uk/ethos-oai/2.0/')}&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's grab all the oai_dc from all 
the objects:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; oaipmh.getRecords('oai_dc')&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Go make a cup of coffee or tea.... you'll get lots of stuff like:&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;INFO:OAIPMH Harvester:New object: oai:eprints.soton.ac.uk:1267 found with datestamp 2004-04-27T00:00:00 - storing.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;2010-03-25 16:01:11,807 - OAIPMH Harvester - INFO - New object: oai:eprints.soton.ac.uk:1268 found with datestamp 2005-04-22T00:00:00 - storing.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;INFO:OAIPMH Harvester:New object: oai:eprints.soton.ac.uk:1268 found with datestamp 2005-04-22T00:00:00 - storing.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;2010-03-25 16:01:11,813 - OAIPMH Harvester - INFO - New object: oai:eprints.soton.ac.uk:1269 found with datestamp 2004-04-07T00:00:00 - storing.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;INFO:OAIPMH Harvester:New object: oai:eprints.soton.ac.uk:1269 found with datestamp 2004-04-07T00:00:00 - storing.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;2010-03-25 16:01:11,819 - OAIPMH Harvester - INFO - New object: oai:eprints.soton.ac.uk:1270 found with datestamp 2004-04-07T00:00:00 - storing.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;INFO:OAIPMH Harvester:New object: oai:eprints.soton.ac.uk:1270 found with datestamp 2004-04-07T00:00:00 - storing.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;2010-03-25 16:01:11,824 
- OAIPMH Harvester - INFO - New object: oai:eprints.soton.ac.uk:1271 found with datestamp 2004-04-14T00:00:00 - storing.&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;My advice is to hop to a different terminal window and start to poke around with the content you are getting. The underlying store is a take on the CDL's Pairtree microspec (pairtree being a minimalist specification for how to structure the access to object-orientated items on a hierarchical filesystem). I've called the model on top of pairtree a Silo (in the RecordSilo library I've written); it constitutes a basic object model, where each object has a persistent JSON state (r/w-able) and can store any file, either at the top level or in a subdirectory. It has crude object-level versioning, rather than file-versioning, so you can clone one version, delete/alter/add to it to create a second, curated version for reuse elsewhere without affecting the original.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;What makes pairtree attractive is that the files themselves are not altered in form, so normal POSIX tools can be used on the files without unwrapping, depacking, etc.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's have a look around at what's been harvested so far into the "myrepo" silo:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; from recordsilo import Silo&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; s = Silo("myrepo")&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; s.state&lt;/div&gt;&lt;div&gt;{'storage_dir': 'myrepo', 'identifier_uri_prefix': 'oai:eprints.soton.ac.uk:', 'uri_base': 'oai:eprints.soton.ac.uk:', 'base_oai_url': 'http://eprints.soton.ac.uk/perl/oai2'}&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; len(s)   # NB this can be a time-consuming 
operation&lt;/div&gt;&lt;div&gt;&lt;div&gt;1100&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; len(s)&lt;/div&gt;&lt;div&gt;1200&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now let's look at a record: I'm sure I saw '6102' whizz past as it was harvesting...&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; obj = s.get_item("oai:eprints.soton.ac.uk:6102")&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; obj&lt;/div&gt;&lt;div&gt;{'files': {'1': ['oai_dc']}, 'subdir': {'1': []}, 'versions': ['1'], 'date': '2004-06-24T00:00:00', 'currentversion': '1', 'metadata_files': {'1': ['oai_dc']}, 'item_id': 'oai:eprints.soton.ac.uk:6102', 'version_dates': {'1': '2004-06-24T00:00:00'}, 'metadata': {'identifier': 'oai:eprints.soton.ac.uk:6102', 'firstSeen': '2004-06-24T00:00:00', 'setSpec': ['7374617475733D707562', '7375626A656374733D51:5148:5148333031', '7375626A656374733D47:4743', '74797065733D61727469636C65', '67726F75703D756F732D686B']}}&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; obj.files&lt;/div&gt;&lt;div&gt;['oai_dc']&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; obj.versions&lt;/div&gt;&lt;div&gt;['1']&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; obj.clone_version("1","workingcopy")&lt;/div&gt;&lt;div&gt;'workingcopy'&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; obj.versions&lt;/div&gt;&lt;div&gt;['1', 'workingcopy']&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; obj.currentversion&lt;/div&gt;&lt;div&gt;'workingcopy'&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; obj.set_version_cursor("1")&lt;/div&gt;&lt;div&gt;True&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; obj.set_version_cursor("workingcopy")&lt;/div&gt;&lt;div&gt;True&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; obj.files&lt;/div&gt;&lt;div&gt;['oai_dc']&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; with obj.get_stream("oai_dc") as 
oai_dc_xml:&lt;/div&gt;&lt;div&gt;...   print oai_dc_xml.read()&lt;/div&gt;&lt;div&gt;... &lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&amp;lt;metadata xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"&amp;gt;&lt;/div&gt;&lt;div&gt;      &amp;lt;oai_dc:dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"&amp;gt;&lt;/div&gt;&lt;div&gt;        &amp;lt;dc:title&amp;gt;Population biology of Hirondellea sp. nov. (Amphipoda: Gammaridea: Lysianassoidea) from the Atacama Trench (south-east Pacific Ocean)&amp;lt;/dc:title&amp;gt;&lt;/div&gt;&lt;div&gt;        &amp;lt;dc:creator&amp;gt;Perrone, F.M.&amp;lt;/dc:creator&amp;gt;&lt;/div&gt;&lt;div&gt;        &amp;lt;dc:creator&amp;gt;Dell'Anno, A.&amp;lt;/dc:creator&amp;gt;&lt;/div&gt;&lt;div&gt;        &amp;lt;dc:creator&amp;gt;Danovaro, R.&amp;lt;/dc:creator&amp;gt;&lt;/div&gt;&lt;div&gt;        &amp;lt;dc:creator&amp;gt;Groce, N.D.&amp;lt;/dc:creator&amp;gt;&lt;/div&gt;&lt;div&gt;        &amp;lt;dc:creator&amp;gt;Thurston, M.H.&amp;lt;/dc:creator&amp;gt;&lt;/div&gt;&lt;div&gt;        &amp;lt;dc:subject&amp;gt;QH301 Biology&amp;lt;/dc:subject&amp;gt;&lt;/div&gt;&lt;div&gt;        &amp;lt;dc:subject&amp;gt;GC Oceanography&amp;lt;/dc:subject&amp;gt;&lt;/div&gt;&lt;div&gt;        &amp;lt;dc:description/&amp;gt;&lt;/div&gt;&lt;div&gt;        &amp;lt;dc:publisher/&amp;gt;&lt;/div&gt;&lt;div&gt;        &amp;lt;dc:date&amp;gt;2002&amp;lt;/dc:date&amp;gt;&lt;/div&gt;&lt;div&gt;        &amp;lt;dc:type&amp;gt;Article&amp;lt;/dc:type&amp;gt;&lt;/div&gt;&lt;div&gt;        &amp;lt;dc:type&amp;gt;PeerReviewed&amp;lt;/dc:type&amp;gt;&lt;/div&gt;&lt;div&gt;        
&amp;lt;dc:identifier&amp;gt;http://eprints.soton.ac.uk/6102/&amp;lt;/dc:identifier&amp;gt;&amp;lt;/oai_dc:dc&amp;gt;&amp;lt;/metadata&amp;gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You can add bytestreams as strings:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; obj.put_stream("foo.txt", "Some random text!")&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;or as file-like objects:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; with open("README", "r") as readmefile:&lt;/div&gt;&lt;div&gt;...   obj.put_stream("README", readmefile)&lt;/div&gt;&lt;div&gt;... &lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; obj.files&lt;/div&gt;&lt;div&gt;['oai_dc', 'foo.txt', 'README']&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; obj.set_version_cursor("1")&lt;/div&gt;&lt;div&gt;True&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; obj.files&lt;/div&gt;&lt;div&gt;['oai_dc']&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;This isn't the easiest way to browse or poke around the files. 
It would be nice to see these through a web UI of some kind:&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Grab the basic UI code from http://github.com/benosteen/siloserver&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;(You'll need to install web.py and Mako:  sudo easy_install mako web.py)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Then edit the silodirectory_conf.py file to point to the location of the Silo - if the directory structure looks like the following:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;myrepo&lt;/div&gt;&lt;div&gt;   |&lt;/div&gt;&lt;div&gt;   ---  Silo directory stuff...&lt;/div&gt;&lt;div&gt;SiloServer&lt;/div&gt;&lt;div&gt;   |&lt;/div&gt;&lt;div&gt;    - dropbox.py&lt;/div&gt;&lt;div&gt;    etc&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You need to change data_dir to "../myrepo" and then you can start the server by running 'python dropbox.py'&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Point a browser at http://localhost:8080/ and wait a while - that start page loads *every* object in the Silo.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a rel="nofollow" target="_blank" href="http://1.bp.blogspot.com/_KLlGSypGAvw/S6uUGcqms0I/AAAAAAAAAF8/F13VVNI2GLc/s1600/dropbox_soton.png"&gt;&lt;img src="http://1.bp.blogspot.com/_KLlGSypGAvw/S6uUGcqms0I/AAAAAAAAAF8/F13VVNI2GLc/s320/dropbox_soton.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5452614612357133122" style="display:block;margin-top:0px;margin-right:auto;margin-bottom:10px;margin-left:auto;text-align:center;cursor:pointer;width:320px;height:214px;"/&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;And let's revisit our altered record, at &lt;a rel="nofollow" target="_blank" href="http://localhost:8080/oai:eprints.soton.ac.uk:6102"&gt;http://localhost:8080/oai:eprints.soton.ac.uk:6102&lt;/a&gt; &lt;/div&gt;&lt;div&gt;&lt;a rel="nofollow" target="_blank" 
href="http://4.bp.blogspot.com/_KLlGSypGAvw/S6uUeJNq0kI/AAAAAAAAAGE/H7ZTuIE8N7s/s1600/dropbox_6102.png"&gt;&lt;img src="http://4.bp.blogspot.com/_KLlGSypGAvw/S6uUeJNq0kI/AAAAAAAAAGE/H7ZTuIE8N7s/s400/dropbox_6102.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5452615019452355138" style="display:block;margin-top:0px;margin-right:auto;margin-bottom:10px;margin-left:auto;text-align:center;cursor:pointer;width:400px;height:295px;"/&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;So, from this point, I can curate the records as I wish, add files to each item - perhaps licences, PREMIS files, etc - and then push them onto another repository, such as Fedora.&lt;/div&gt;</description>
         <author>Ben O'Steen</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3090914606822911489.post-6383826637269561669</guid>
         <pubDate>Thu, 25 Mar 2010 08:42:00 +0000</pubDate>
         <media:thumbnail height="72" url="http://1.bp.blogspot.com/_KLlGSypGAvw/S6uUGcqms0I/AAAAAAAAAF8/F13VVNI2GLc/s72-c/dropbox_soton.png" width="72" />
      </item>
      <item>
         <title>Ideas for the OR10 Developer Challenge?</title>
         <link>http://blog.paulwalk.net/2010/03/23/ideas-for-the-or10-developer-challenge/</link>
         <description>Update: I have closed comments on this post now. Thank you very much to all who commented and suggested ideas for a challenge. I have now posted a draft Challenge here and would welcome comments on that post. Thanks again! Through the JISC-funded DevCSI project, UKOLN has been asked to arrange a &amp;#8216;Developer Challenge&amp;#8217; for [...]</description>
         <guid isPermaLink="false">http://blog.paulwalk.net/2010/03/23/ideas-for-the-or10-developer-challenge/</guid>
         <pubDate>Tue, 23 Mar 2010 21:01:14 +0000</pubDate>
         <content:encoded><![CDATA[<p><strong>Update: </strong>I have closed comments on this post now. Thank you very much to all who commented and suggested ideas for a challenge. I have now <a rel="nofollow" target="_blank" href="http://blog.paulwalk.net/2010/04/26/draft-or10-challenge-idea/">posted a draft Challenge here</a> and would welcome comments on that post. Thanks again!</p>
<p>Through the <a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk">JISC</a>-funded <a rel="nofollow" target="_blank" href="http://devcsi.ukoln.ac.uk/">DevCSI</a> project, <a rel="nofollow" target="_blank" href="http://www.ukoln.ac.uk/">UKOLN</a> has been asked to arrange a &#8216;Developer Challenge&#8217; for the <a rel="nofollow" target="_blank" href="http://or2010.fecyt.es/publico/Home/index.aspx">Fifth International Conference on Open Repositories</a>, (OR10) to be held in Madrid in July of this year.</p>
<p>This will be the third consecutive year that the Developer Challenge has been a feature of this conference. Previous challenges have been both competitive and creative.</p>
<p><img src="http://blog.paulwalk.net/wp-content/uploads/sites/6/2010/03/OR09_dev_challenge.jpg" alt="OR09_dev_challenge.jpg" width="240" height="180"/></p>
<p><a rel="nofollow" target="_blank" href="http://www.flickr.com/photos/grahamtriggs/3563945078/sizes/m/"><span style="font-size:x-small;"><span style="font-size:10px;">Photo by Graham Triggs</span></span></a></p>
<p>This year we have been considering a change of format. Previously, a general challenge has been issued, inviting developers to submit prototypes for anything which they feel is relevant and useful to the repository community. But now that the community has a better appreciation of the sort of creativity which developers can bring to these events, we wonder if we might try something a little different.</p>
<p><strong>A general challenge?</strong></p>
<p>We have been thinking about the possibility of the repository community issuing a particular challenge to the developers planning to attend OR10. This could be decided on by the community well in advance of the conference. If we managed to &#8216;crowd source&#8217; a few ideas, we could organise a simple vote. Something we are trying to do more with the DevCSI project is to get developers together with non-developers from the same &#8216;domain&#8217; (repositories in this case) &#8211; so we are quite interested in pursuing this approach with OR10.</p>
<p>The OR10 organisers have helpfully couched the conference itself in terms of some challenges:</p>
<blockquote><p>In a world of increasingly dispersed and modularized digital services and content, it remains a grand challenge for the future to cross the borders between diverse poles:</p>
<ul>
<li>the web and the repository</li>
<li>knowledge and technology</li>
<li>wild and curated content</li>
<li>linked and isolated data</li>
<li>disciplinary and institutional systems</li>
<li>scholars and service providers</li>
<li>ad-hoc and long-term access</li>
<li>ubiquitous and personalized environments</li>
<li>the cloud and the desktop.</li>
</ul>
</blockquote>
<p>Perhaps one or more of these could serve as the inspiration for a more concrete developer challenge?</p>
<p>What this boils down to is finding a challenge in the general area of repositories, recognised as important by the community generally, which could only be met by getting developers to work with non-developers at the conference. For it to be fair, the challenge would need to be non-specific with regard to any particular repository software.</p>
<p>I would welcome some feedback:</p>
<ul>
<li>is this general approach a good idea?</li>
<li>do you have any ideas for a challenge?</li>
</ul>
<p>Please feel free to comment here if you have any ideas, or alternatively drop me an email at <a rel="nofollow" target="_blank" href="http://p.walk@ukoln.ac.uk">p.walk@ukoln.ac.uk</a>.</p>
<p>Thanks!</p>]]></content:encoded>
      </item>
      <item>
         <title>An Analytical Anniversary</title>
         <link>http://chronicles-of-richard.blogspot.com/2010/03/analytical-anniversary.html</link>
         <description>&lt;p&gt;Today is my anniversary.  I have been at &lt;a rel="nofollow" target="_blank" href="http://www.symplectic.co.uk/"&gt;Symplectic Ltd&lt;/a&gt; for one of your Earth "years".  And a very busy one it has been, what with writing repository integration tools for our &lt;a rel="nofollow" target="_blank" href="http://www.symplectic.co.uk/products/publications.html"&gt;research management system&lt;/a&gt; to &lt;a rel="nofollow" target="_blank" href="http://www.symplectic.co.uk/products/repository-tools.html"&gt;deposit content&lt;/a&gt; into &lt;a rel="nofollow" target="_blank" href="http://www.dspace.org/"&gt;DSpace&lt;/a&gt;, &lt;a rel="nofollow" target="_blank" href="http://www.eprints.org/"&gt;EPrints&lt;/a&gt; and &lt;a rel="nofollow" target="_blank" href="http://www.fedora-commons.info/"&gt;Fedora&lt;/a&gt;, plus supporting the integration into a number of other platforms.  I thought it would be fun to do a bit of a breakdown of the code that I've written from scratch in the last 12 months (which I'm counting as 233 working days).  I'm going to do an analysis of the following areas of productivity:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;lines of code&lt;/li&gt;&lt;li&gt;lines of inline code commentary&lt;/li&gt;&lt;li&gt;number of A4 pages of documentation (end user, administrator and technical)&lt;/li&gt;&lt;li&gt;number of version control commits&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Let's start from the bottom and work upwards.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Number of version control commits&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Total: 700&lt;strong&gt;&lt;br /&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Per day: 3&lt;/p&gt;&lt;p&gt;I tend to commit units of work, so this might suggest that I do 3 bits of functionality every day.  
In reality I quite often also commit quick bug fixes (so that I can record in the commit log the fix details), or at the end of a day/week, when I want to know that my code is safe from hardware theft, nuclear disaster, etc.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Number of A4 pages of documentation&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Total: 72&lt;/p&gt;&lt;p&gt;Per day: 0.31&lt;/p&gt;&lt;p&gt;Not everyone writes their documentation in A4 form any more, and it's true that some of my dox take the form of web pages, but as a commercial software house we tend to produce well formatted, nice end-user and administrator documentation.  In addition, I rather enjoy at a geek level a nice printable document that's well laid out, so I do my technical dox that way too.&lt;/p&gt;&lt;p&gt;The amount of documentation is relatively small, but it doesn't take into account a lot of informal documentation.  More importantly, though, at the back end of the first version of our Repository Tools software, the documentation is still in development.  I expect the number of pages to probably triple or quadruple over the next few weeks.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Lines of Code and Lines of Commentary&lt;/strong&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;I wrote a script which analysed my outputs.  Ironically, it's written in Python, which isn't one of the languages that I use professionally, so it's not included in this analysis (and none of my personal programming projects are therefore included).  This analysis covers all of my final code on my anniversary (23rd March), and does not take into account prototyping or refactoring of any kind.  Note also that blank lines are not counted.&lt;/p&gt;&lt;p&gt;&lt;em&gt;Line Counts&lt;/em&gt;:&lt;br /&gt;&lt;br /&gt;XML (107 Files) :: Lines of Code: 17819; Lines of Inline Comments: 420&lt;br /&gt;&lt;/p&gt;&lt;p&gt;XML isn't really programming, but it was interesting to see how much I actually work with it.  
This figure is not used in any of the below statistics.  Some of these are large metadata documents and some are configuration (maven build files, ant build files, web server config, etc).&lt;/p&gt;&lt;p&gt;&lt;br /&gt;XSLT (36 Files) :: Lines of Code: 8502; Lines of Inline Comments: 2762&lt;br /&gt;JAVA (181 Files) :: Lines of Code: 22350; Lines of Inline Comments: 7565&lt;br /&gt;JSP (16 Files) :: Lines of Code: 2847; Lines of Inline Comments: 1&lt;br /&gt;PERL (58 Files) :: Lines of Code: 6506; Lines of Inline Comments: 1699&lt;br /&gt;---------------&lt;br /&gt;TOTAL (291 Files) :: Lines of Code: 40205; Lines of Inline Comments: 12027&lt;br /&gt;&lt;/p&gt;&lt;p&gt;I remember once being told that 30k lines of code a year was pretty reasonable for a developer.  I feel quite chuffed!&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;em&gt;Lines of code/comments per day:&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;XSLT :: Lines of Code: 36; Lines of Inline Comments: 12&lt;br /&gt;JAVA :: Lines of Code: 96; Lines of Inline Comments: 32&lt;br /&gt;JSP :: Lines of Code: 12; Lines of Inline Comments: 0&lt;br /&gt;PERL :: Lines of Code: 28; Lines of Inline Comments: 7&lt;br /&gt;---------------&lt;br /&gt;TOTAL :: Lines of Code: 173; Lines of Inline Comments: 52&lt;/p&gt;&lt;p&gt;It looks much less impressive when you look at it on a daily basis.  We just have to remember that this is 173 &lt;strong&gt;wonderful&lt;/strong&gt; lines of code every day!&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Comment to code ratio (comments/code)&lt;/em&gt;:&lt;br /&gt;&lt;br /&gt;XSLT :: 0.33&lt;br /&gt;JAVA :: 0.34&lt;br /&gt;JSP :: 0&lt;br /&gt;PERL :: 0.26&lt;br /&gt;---------------&lt;br /&gt;TOTAL :: 0.30&lt;br /&gt;&lt;/p&gt;&lt;p&gt;It was interesting to see that my commenting ratio is fairly stable at about 30% of the overall codebase size.  I didn't plan that or anything.  This includes block comments for classes and methods, and inline programmer documentation.  
The reason for the shortfall in Perl is suggested below.  Notice that I wrote almost no comments in the JSPs, because I only use this code for testing and it is less carefully curated.&lt;/p&gt;&lt;p&gt;Some Perl comments don't start with anything specific - they are block comments starting and ending with =xxx and =cut respectively, which are difficult to parse out for analysis.  Therefore the Perl code line counts overestimate and the comment counts underestimate.  More likely figures, if comments are taken as 0.33 of the combined 8205 lines:&lt;br /&gt;&lt;br /&gt;PERL (58 Files) :: Lines of Code: 5498; Lines of Inline Comments: 2707&lt;/p&gt;&lt;p&gt;&lt;em&gt;Amount of testing code (testing/production):&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;9937 / 30268 = 0.33&lt;br /&gt;&lt;/p&gt;&lt;p&gt;This is the total amount of code that I wrote to test the other code that I wrote.  So nearly 10k lines of code are there purely to demonstrate that the other 30k lines of code are working.  I'm not going to suggest that this 33% is a linear relationship as the projects increase in size, but maybe we'll find out next year.  
Incidentally, the test code that I analysed was the third version of my test framework, so in reality I wrote quite a few more lines of code (perhaps 3 or 4k) before reaching the final version used above.&lt;/p&gt;&lt;p&gt;Note that I'm a big fan of Behaviour Driven Development, and this does tend to cause testing code to be fairly extensive in its own right.&lt;/p&gt;&lt;p&gt;&lt;em&gt;Number of new files per day:&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;XSLT :: 0.15&lt;br /&gt;JAVA :: 0.78&lt;br /&gt;JSP :: 0.07&lt;br /&gt;PERL :: 0.25&lt;br /&gt;---------------&lt;br /&gt;TOTAL :: 1.25&lt;/p&gt;&lt;p&gt;In reality, of course, I create lots and lots of new files over a short period of time, and then nothing for ages.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;em&gt;Average file length&lt;/em&gt;:&lt;br /&gt;&lt;br /&gt;Excluding blank lines: 179&lt;br /&gt;Including blank lines: 211&lt;br /&gt;Spaciousness (including/excluding): 1.18&lt;br /&gt;&lt;/p&gt;&lt;p&gt;What is spaciousness?  It's a measure of how I tend to space my code.  Everyone, I have noticed, is fairly different in this regard - I wonder what other people's spaciousness is?&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Source Code&lt;/strong&gt;&lt;strong&gt;&lt;br /&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Do you want to have a go at this yourself?  Blogger doesn't make attaching files particularly easy, so you can get this from the nice folks at pastebin, who say this shouldn't ever time out: &lt;a rel="nofollow" target="_blank" href="http://pastebin.com/GVkHd7tB"&gt;http://pastebin.com/GVkHd7tB&lt;/a&gt;.&lt;/p&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-8994691411480589353</guid>
         <pubDate>Mon, 22 Mar 2010 16:42:00 +0000</pubDate>
      </item>
      <item>
         <title>Blacklight, ActiveFedora and Shelver: Interplay between Searching, Managing and Indexing in a Repository Solution</title>
         <link>http://feedproxy.google.com/~r/MediashelfBlog/~3/DAJYpAvrXWY/</link>
         <description>I submitted an abbreviated version of this proposal (limited to 4 pages) to the OR10 review committee.  Feel free to download the abbreviated version or this long version in PDF format.
OpenRepositories 2010 Presentation Proposal (Long Version)
Any repository solution provides facilities for Creation, Management, &amp;#38; Editing of Content as well as facilities for Searching &amp;#38; Browsing [...]</description>
         <guid isPermaLink="false">http://yourmediashelf.com/blog/?p=342</guid>
         <pubDate>Mon, 01 Mar 2010 22:54:01 +0000</pubDate>
         <content:encoded><![CDATA[<p><em>I submitted an abbreviated version of this proposal (limited to 4 pages) to the OR10 review committee.  Feel free to download the </em><em><a rel="nofollow" target="_blank" href="http://yourmediashelf.com/blog/wp-content/uploads/2010/03/or10_proposal_blacklight_activefedora_and_shelver.pdf">abbreviated version</a> </em><em>or this </em><a rel="nofollow" target="_blank" href="http://yourmediashelf.com/blog/wp-content/uploads/2010/03/long_version_of_or10_proposal_blacklight_activefedora_and_shelver.pdf"><em>long version</em></a><em> in PDF format.</em></p>
<h3>OpenRepositories 2010 Presentation Proposal (Long Version)</h3>
<p>Any repository solution provides facilities for Creation, Management, &amp; Editing of Content as well as facilities for Searching &amp; Browsing through that content.  Experience has shown that when a solution binds these two areas of functionality together too tightly, the system becomes brittle and unworkable, discouraging innovation.  Our work on the Hydra project has produced a flexible and intuitive solution that combines these two areas in an almost entirely decoupled fashion.  This solution, which is already working in multiple Hydra applications, is built on a three-part pattern where <em>Blacklight</em> handles Search &amp; Discovery, <em>ActiveFedora</em> handles Creation, Management and Editing of Content, and a small application called <em>Shelver</em> supplies the crossover point by indexing the content into Solr so that it will show up in Blacklight.  This three-part approach reflects a strong pattern for designing and/or improving repository solutions.  The main pivot of this approach is to treat indexing as its own separate part of the application and to allow that indexing process to evolve constantly as part of the application development cycle.</p>
<p>This work is the product of combining established best practices, best of breed software, and lessons learned from an iterative approach to application development.  While our implementation is focused on Fedora Repositories, the software could be used in multiple contexts and the pattern is certainly applicable to any content-oriented application.</p>
<h2>The anatomy of a Hydra Application</h2>
<p><em><strong>Note</strong>: This is a working model of the functional structure of a Hydra application.  The complete designs for the final features and functionality of Hydra applications reach far beyond what is presented here.  For more information on the greater vision around the Hydra project, please refer to the <a rel="nofollow" target="_blank" href="http://www.fedora-commons.org/confluence/display/hydra/The+Hydra+Project">Hydra Project pages on the Fedora Commons wiki</a>.</em></p>
<ul>
<li>The portion of a Hydra application that handles Creation, Management, &amp; Editing of content is provided by the Hydra Core, which consists of ActiveFedora along with a few Hydra &#8220;helpers&#8221; which integrate ActiveFedora into Ruby on Rails.</li>
<li>The Search &amp; Discovery portion of a Hydra application is a Blacklight installation - nothing more, nothing less.  As with any Blacklight installation, its behavior and appearance are likely to be customized but otherwise there is nothing Hydra-specific about it.</li>
<li>Shelver (which can be run from within the application, from the command line, or as a JMS listener) indexes content and its salient metadata in Solr, usually pulling that content from Fedora.</li>
</ul>
<p>These three components &#8212; Blacklight, Hydra Core and Shelver &#8212; work in concert to present a consolidated repository solution to the end user.  Meanwhile, the three components are sufficiently decoupled that each could be run as a freestanding application.  They interoperate based on a minimal contract that revolves around decisions about what information should be in Solr and how it should be represented in the Solr index in order to achieve the ideal search experience.</p>
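<p>To make the idea of a minimal contract concrete, here is a small sketch in Python (not the Ruby used by Hydra). It assumes the contract is simply an agreed list of index fields that every indexed document must carry. The field names <code>title_t</code> and <code>format_facet</code> and both function names are hypothetical and are not taken from Blacklight, ActiveFedora or Shelver.</p>
<pre>
```python
# Hypothetical field names; a real application would agree on its own list.
REQUIRED_FIELDS = ("id", "title_t", "format_facet")


def missing_fields(solr_doc):
    """Return the contract fields that are absent or empty in solr_doc."""
    return [name for name in REQUIRED_FIELDS if not solr_doc.get(name)]


def check_contract(solr_doc):
    """Raise ValueError if the document does not satisfy the contract."""
    missing = missing_fields(solr_doc)
    if missing:
        raise ValueError("missing index fields: " + ", ".join(missing))
    return solr_doc


doc = {"id": "demo:1", "title_t": "A sample record", "format_facet": "Article"}
check_contract(doc)
```
</pre>
<p>A check of this kind could sit on the indexing side only, which would leave the search and management components unaware of each other.</p>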
<p>In the process of customizing or extending a Hydra application, some changes require modifications to all three components, but most changes impact only one or two of the components at a time.  This makes it very easy to iteratively improve the application and adapt to real world needs.</p>
<p>This structure grew naturally out of a process of exploration.  In early 2009 developers at UVA and Stanford discovered that it was relatively easy to put Blacklight on top of a Solr index that had in turn been populated by ActiveFedora &#8212; effectively turning Blacklight into a search &amp; discovery interface for that Fedora repository.  Based on that, we tried dropping ActiveFedora-driven views &amp; controllers for editing Fedora content into the same Ruby on Rails application as an existing copy of Blacklight.  It worked like a charm.  The two systems happily coexist.  What we found was that as long as we could change and refine how the metadata percolates from Fedora into Solr, getting Blacklight to operate together with the ActiveFedora management component was completely straightforward.</p>
<p>With most Hydra applications, all content is stored in a Fedora Repository.  However, there is nothing to prevent you from adding non-Fedora content to Solr and having it show up in the (Blacklight) search &amp; discovery views.  Of course, that content will not be editable unless you implement the code to integrate with that content&#8217;s host system.</p>
<h3>Best of Breed: Blacklight &amp; ActiveFedora</h3>
<p><a rel="nofollow" title="Blacklight Project" target="_blank" href="http://projectblacklight.org">Blacklight</a> is a next generation Search &amp; Discovery tool.  It was intentionally designed to serve a single purpose - Search &amp; Discovery - without having any knowledge of indexing, cataloging, or even the location of the content it&#8217;s searching through.  Whatever information you have in your Solr index, Blacklight will help you expose a rich, faceted search interface for exploring through that information and displaying detail views of individual records.  This open-ended design made it very easy for us to integrate Blacklight directly into our Hydra applications as-is.  The ease with which we achieved this seamless integration is a testament to the quality of Blacklight&#8217;s design.</p>
<p><a rel="nofollow" target="_blank" href="http://yourmediashelf.com/activefedora/">ActiveFedora</a> is a Ruby library that encapsulates the details of interacting with a Fedora repository and provides high-level tools for defining data models, creating Fedora objects, and modifying the data associated with those objects.  While opening the door to rapid, iterative application development, ActiveFedora also attempts to expose and accentuate many of the strong design patterns inherent in Fedora.  ActiveFedora&#8217;s emphasis on flexibility and design patterns provided us with many opportunities to make our Hydra applications robust and re-usable.  In particular, ActiveFedora makes it possible for multiple Hydra (and non-Hydra) applications to operate on top of a single Fedora repository, thus achieving the goal of providing many lightweight views onto complex, heterogeneous repository content.</p>
<h3>Hydra Core: the building blocks of an interface for creating &amp; editing Fedora content</h3>
<p>Hydra Core provides the Ruby on Rails code that handles Creation, Management, &amp; Editing of Fedora content.  This primarily consists of Rails helpers for generating edit interfaces and Rails controllers to handle the submissions from those interfaces.</p>
<p>Fedora allows a great amount of freedom with respect to data models and metadata.  As a result, we could not simply create a single generic content management interface in Hydra.  Instead, we created a number of &#8220;helpers&#8221; that allow you to deal with your Fedora content and its metadata at a high level of abstraction.  For example, the editable_metadata_field helper generates the HTML for displaying an editable version of whichever metadata field you specify.  All you have to know is what field you want to display and where it is stored within the object.  Everything else is handled for you.</p>
<p>The forms generated by the Hydra helpers need somewhere to submit their data to.  This is handled by the Rails controllers provided by Hydra Core.</p>
<p>Underneath the helpers and controllers, Hydra Core relies on ActiveFedora to handle connecting with Fedora, modeling Fedora objects &amp; metadata, and performing the basic operations of creation, retrieval, updating and deletion.</p>
<h3>Shelver: a script that brought unexpected freedom</h3>
<p>When we wrote Shelver, we didn&#8217;t anticipate how integral it would become to the application development process.  Shelver started out as something extremely simple.  A developer at Stanford initially wrote it in order to populate a Solr index with some working data from a Fedora Repository.  Over time, as needs arose, we built out the script to be more robust.  It soon became apparent how crucial it is to be able to modify and/or augment the behavior of your indexing tool.  In most other systems, the indexing tool is either implicit (relational databases) or external to your application and difficult to re-configure (e.g. Fedora GSearch).  As a result, when working with other systems, discussion of the indexing strategy (and changes to it) is kept to a minimum.  In contrast, since we had Shelver at our disposal, we found ourselves constantly tweaking it to satisfy new functionality.  This ability to tweak our indexing routines gives us radical freedom to explore new features, improve the search experience, and increase the quality of search results.</p>
<p>Eventually, we pulled Shelver into the Hydra app itself so that we could trigger it as part of the save/update process, though we retained the hooks for running it as a command line tool as well.  We also did this because we found that changes we made to Shelver often corresponded to changes in the search interface.  Shelver was continually evolving in conjunction with the application, so it made sense to track the code with a single versioning system.</p>
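<p>The arrangement described above, where one indexing routine is called both from the save/update path and from a command line entry point, can be sketched as follows. This is an illustration in Python under stated assumptions and is not Shelver&#8217;s code: <code>InMemoryIndex</code> stands in for Solr, a plain dictionary stands in for Fedora, and every name is hypothetical.</p>
<pre>
```python
class InMemoryIndex:
    """Stand-in for a Solr core: maps document id to indexed fields."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        self.docs[doc["id"]] = doc


def index_object(obj, index):
    """The single indexing routine shared by the save hook and the CLI."""
    doc = {"id": obj["pid"], "title_t": obj["metadata"].get("title", "")}
    index.add(doc)
    return doc


def save(obj, store, index):
    """Persist the object, then reindex it as part of the same operation."""
    store[obj["pid"]] = obj
    index_object(obj, index)


def reindex_all(store, index):
    """Command-line style entry point: rebuild the index from the store."""
    for obj in store.values():
        index_object(obj, index)
    return len(store)


if __name__ == "__main__":
    store, index = {}, InMemoryIndex()
    save({"pid": "demo:1", "metadata": {"title": "First object"}}, store, index)
    print("reindexed %d object(s)" % reindex_all(store, index))
```
</pre>
<p>Because both paths go through <code>index_object</code>, a change to the indexing rules takes effect for new saves and for bulk reindexing alike.</p>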
<h2>Approaches to Indexing: from RDBMS to Fedora + Shelver</h2>
<h3>RDBMS (data model &amp; search index combined)</h3>
<p>If you rely solely on a Relational Database to drive your application, your data model (the database schema) is also your indexing model &#8212; any search-oriented changes necessitate changes to your data model.  This makes it difficult to refine and extend the search &amp; discovery portion of the application without impacting other areas of functionality.</p>
<h3>RDBMS + Solr (separate search index from data without much thought to the conceptual differences)</h3>
<p>A number of tools exist for pulling content from a relational database into Solr.  This achieves the goal of separating the search index from the data itself, allowing you to have an indexing model separate from your data model.  However, often with these systems the indexing methodology remains tightly bound to the data model.  This is more of a conceptual stumbling block than a technical one.  It&#8217;s easy to underestimate the complexity and distinctiveness of indexing for search &amp; discovery.  It is not enough to index your data in Solr; you must think differently about how and why you put it there.  This &#8220;thinking&#8221; must be manifest somewhere in the application&#8217;s code, ideally separated from the rest of the application.</p>
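<p>As an illustration of an indexing model that is deliberately different from the data model, the Python sketch below reads a row from an in-memory SQLite table and builds a search document whose fields do not mirror the table columns. The table layout, the field names and the decade facet are all invented for this example.</p>
<pre>
```python
import sqlite3


def build_index_doc(row):
    """Turn one database row into a search document.

    The index fields are chosen for searching and faceting, so they do not
    mirror the table columns one to one.
    """
    year = int(row["published"][:4])
    return {
        "id": "item-%d" % row["id"],
        "title_t": row["title"],
        # One catch-all field so a single query box can search everything.
        "text": " ".join([row["title"], row["author"]]),
        # A facet derived from the date rather than stored in the database.
        "decade_facet": "%ds" % (year - year % 10),
    }


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE items "
    "(id INTEGER PRIMARY KEY, title TEXT, author TEXT, published TEXT)"
)
conn.execute(
    "INSERT INTO items (title, author, published) VALUES (?, ?, ?)",
    ("Deep sea amphipods", "A. Author", "2002-05-01"),
)
docs = [build_index_doc(row) for row in conn.execute("SELECT * FROM items")]
```
</pre>
<p>The point of the sketch is that <code>build_index_doc</code> is where the search-specific decisions live, so they can change without any change to the schema.</p>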
<h3>Fedora + GSearch + Solr (freestanding tool specifically handles indexing)</h3>
<p>Fedora is explicitly designed with the idea that you should separate your data model from your indexing solution(s).  This allows us to use any variety of content models and metadata schemas to represent our content in Fedora while pulling that information into any number of indexes to suit specific searching needs.  The most common indexing approach with Fedora repositories is to use Fedora GSearch to pull Fedora content into a Lucene, Solr or Zebra index.  This approach has the benefits of completely separating the data from the index while also providing a freestanding, configurable tool to handle the process of indexing.</p>
<p>GSearch was designed with the goal of enabling 1) full-text searching of Fedora content and 2) indexing of arbitrary XML metadata from Fedora objects.  It runs as a web application alongside Fedora, listening for JMS messages or REST API commands telling it to (re)index Fedora objects.  The process by which GSearch indexes the content is implemented as a mix of XSLT and Java code.</p>
<p>GSearch establishes the strong best practice of decoupling both the search index and the indexing process from the data itself.  This pattern was part of Fedora&#8217;s design all along, but thus far GSearch has been the clearest manifestation of it.</p>
<p>Because it was designed specifically to enable full-text indexing using XSL Transformations (XSLT), GSearch operates on the premise that you are <em>transforming</em> the content in order to put it in the index.  In a basic system, transformations are sufficient.  However, most repository solutions eventually need to actively <em>process</em> the data when indexing it, performing complex actions in order to decide how to populate the search index.  Because XSLT does not lend itself to performing such complex processes, you must modify Java code if you want to implement this type of processing in GSearch.  Modifying that code has proven daunting for most.  Very few projects have taken on the challenge of modifying the GSearch code itself.  Those that have modified the code have only done so in minimal and relatively stable ways.</p>
<h3>Fedora + Shelver + Solr (allowing the indexing methodology to constantly evolve)</h3>
<p>If you want to provide a great search &amp; discovery experience in your application, you must make it easy to iteratively &#8220;massage&#8221; the indexes.  Anyone who manages a Blacklight or VuFind installation on top of their ILS (or anyone who participates in Code4Lib) can attest that achieving a truly successful search &amp; discovery experience requires continually refining the way you index your metadata.  Little changes in your indexing methodology can yield tangible results for end-users.</p>
<p>In building SALT, the first Hydra application to combine Blacklight with ActiveFedora, we created Shelver as an alternative to GSearch because we wanted to be able to specify our indexing process in Ruby code and, where possible, to use simple mapping files rather than being forced to use XSLT and Java to perform those actions. We assumed that Shelver would be a relatively simple application whose code rarely changed.  After all, when using GSearch we rarely changed the XSLT and basically never changed the Java code.  We expected that the same would be true with Shelver.  We were wrong.  Shelver is constantly changing because we are constantly coming up with new things that we want to do to improve the search &amp; discovery utilities in our Hydra applications.  As time has passed, the code of Shelver itself has stabilized, but the instructions for how to index specific data from Fedora continually morph as a regular part of the application development process.  In fact, touching the Shelver code has become such an integral part of our work that we can&#8217;t imagine building a repository solution without this kind of freedom.</p>
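<p>To make the idea of mapping-driven indexing concrete, here is a small sketch. It is written in Python rather than Ruby and is not Shelver&#8217;s actual API: <code>FIELD_MAP</code>, <code>build_index_doc</code>, the sample metadata and the index field names are hypothetical. The point is only that the indexing instructions live in a small piece of data that can change often, while the code that applies them stays stable.</p>

```python
# Hypothetical mapping: index field name -> (source field, optional converter).
# In a real project this could live in its own file and change frequently.
FIELD_MAP = {
    "title_t": ("title", None),
    "title_sort": ("title", str.lower),
    "creator_facet": ("creator", str.strip),
}


def build_index_doc(pid, metadata, field_map=FIELD_MAP):
    """Apply the mapping to one object's metadata and return a flat document."""
    doc = {"id": pid}
    for index_field, (source_field, convert) in field_map.items():
        if source_field not in metadata:
            continue  # skip fields this object does not have
        value = metadata[source_field]
        doc[index_field] = convert(value) if convert else value
    return doc


# Illustrative object; the identifier and values are made up.
doc = build_index_doc("demo:1", {"title": "Hydra Notes", "creator": " M. Example "})
print(doc)
```

<p>Adding a new facet or sort field then means adding one entry to the mapping and re-indexing, without touching <code>build_index_doc</code> itself.</p>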
<h2>Conclusion, Observations and Best Practices</h2>
<p>To review, some of the recommendations coming out of this work are to</p>
<ul>
<li>use indexing as the crossover point between decoupled solutions for searching through and managing your content</li>
<li>make the indexer an explicit, evolving part of your application</li>
<li>use flexible components that were designed with iterative development in mind</li>
<li>re-use established best practices where possible</li>
<li>combine best-of-breed solutions for astounding results</li>
</ul>
<p>We were pleasantly surprised to discover how easy it was to combine Blacklight and ActiveFedora into a single Fedora solution.  The three-part pattern that emerged out of this effort, which now constitutes a basic Hydra application, builds on well-established practices and serendipitously combines them in a stable, intuitive way.  This in turn provides a strong base for us all to carry out a great amount of innovative work in the coming years.</p>
]]></content:encoded>
      </item>
      <item>
         <title>Dev8D 2010 Day 1</title>
         <link>http://feedproxy.google.com/~r/MediashelfBlog/~3/tNSr4z4Rp4E/</link>
         <description>JISC Developer Days (aka dev8D) kicked off today at the University of London Union with an estimated 500 attendees. I don&amp;#8217;t remember the number of attendees last year, but it feels much bigger this year. Spanning five rooms, I counted at least 27 different sessions, talks and workshops over the space of a single afternoon.
Linked [...]</description>
         <guid isPermaLink="false">http://yourmediashelf.com/blog/?p=225</guid>
         <pubDate>Thu, 25 Feb 2010 01:12:19 +0000</pubDate>
         <content:encoded><![CDATA[<p><a rel="nofollow" target="_blank" href="http://dev8d.org/">JISC Developer Days</a> (aka dev8D) kicked off today at the University of London Union with an estimated 500 attendees. I don&#8217;t remember the number of attendees last year, but it feels <em>much</em> bigger this year. Spanning five rooms, I counted at least <a rel="nofollow" target="_blank" href="http://data.dev8d.org/2010/programme/dev8d_programme.html">27 different sessions</a>, talks and workshops over the space of a single afternoon.</p>
<p>Linked Data was the dominant theme today, starting with the pre-conference <a rel="nofollow" target="_blank" href="http://www.meetup.com/Web-Of-Data/calendar/12317420/">Linked Data Meetup</a> which included presentations and workshops from Tom Scott, Patrick Sinclair, and Silver Oliver of the BBC and John Sheridan &amp; Jeni Tennison of <a rel="nofollow">data.gov.uk</a> among many, many others.</p>
<p>Lin Clark of DERI ran a Drupal RDFa workshop in the afternoon. She demonstrated <a rel="nofollow" target="_blank" href="http://drupal.org/project/rdfcck">RDF Mapping</a>, the <a rel="nofollow" target="_blank" href="http://drupal.org/project/rdfproxy">RDF SPARQL Proxy</a>, and the <a rel="nofollow" target="_blank" href="http://drupal.org/project/sparql_ep">RDF SPARQL Endpoint</a> modules and gave a preview of the RDF features to expect in Drupal 7 core.</p>
<p>One session I really wanted, but wasn&#8217;t able, to attend was Jim Downing&#8217;s <a rel="nofollow" target="_blank" href="http://wiki.2010.dev8d.org/w/Project_Zone_Wed_PM#Multicore_Processing_-_Session_1">Multicore Processing</a> session. Not a terribly interesting title, but he worked through mining linked data from data.gov.uk, using the Clojure library <a rel="nofollow" target="_blank" href="http://liebke.github.com/incanter/">Incanter</a> to find correlations in the data, and finally, parallelizing the data processing.</p>
<p>The dev8D spirit of developer happiness was nowhere more evident than in this year&#8217;s new attention to electronics. I sat at a table watching Emma Tonkin build an electronic whiteboard with <a rel="nofollow" target="_blank" href="http://dev8d.jiscinvolve.org/2010/02/24/how-to-make-a-low-cost-electronic-whiteboard/">a nano-projector and Wiimote</a>. Walking over to the projects room, I saw Ben O&#8217;Steen working with facial recognition software, Dave Tarrant turning off and dimming lights over the local network, Julian Cheal wiring each room&#8217;s entry with an RFID reader, and Ian Ibbotson programming an Arduino. Tomorrow, I&#8217;m told, we&#8217;ll have someone showing off a <a rel="nofollow" target="_blank" href="http://en.wikipedia.org/wiki/RepRap_Project">RepRap</a> 3D printer.</p>
<p>I find the impromptu meetings as valuable as any of the structured components of dev8D. I sat down with Andy Turner, a researcher at the University of Leeds, who introduced me to <a rel="nofollow" target="_blank" href="http://www.casa.ucl.ac.uk/genesisblog/?page_id=2">GENeSIS</a>, a mad-scientist level modeling and simulation project that I thought was absolutely fascinating. Just as serendipitously, as we wrapped up Day 1 of dev8D at the pub (where else?), I found myself drinking a pint with Scott Wilson and we got to talking about Apache Wookie and HTML 5.</p>
<p>See you tomorrow!</p>
]]></content:encoded>
      </item>
      <item>
         <title>My swiss army toolkit for distributed/multiprocessing systems</title>
         <link>http://oxfordrepo.blogspot.com/2010/02/my-swiss-army-toolkit-for.html</link>
         <description>&lt;div&gt;My first confession - I avoid 'threading' and shared memory. Avoid it like the plague, not because I cannot do it but because it can be a complete pain to build and maintain relative to the alternatives.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I am very much pro multiprocessing versus multithreading - obviously, there are times when threading is by far the best choice, but I've found multiprocessing, for the most part, to be quicker, simpler and far easier to log, manage and debug than multithreading.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, what do I mean by a 'multiprocessing' system? (just to be clear)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;i&gt;A multiprocessing system consists of many concurrent processes running on one or more machines, and contains some means to distribute messages and persist data between these processes.&lt;/i&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This does not mean that the individual processes cannot multithread themselves, it is just that each process handles a small, well-defined aspect of the system (paralleling the unix commandline tool idiom).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Tools for multiprocess management:&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;&lt;a rel="nofollow" target="_blank" href="http://code.google.com/p/redis"&gt;Redis&lt;/a&gt; - data structure server, providing atomic operations on integers, lists, sets, and sorted sets.&lt;/li&gt;&lt;li&gt;&lt;a rel="nofollow" target="_blank" href="http://www.rabbitmq.com/"&gt;RabbitMQ&lt;/a&gt; - messaging server, based on the AMQP spec. 
IMO Much cleaner, easier to manage, more flexible and more reliable than all the JMS systems I've used.&lt;/li&gt;&lt;li&gt;&lt;a rel="nofollow" target="_blank" href="http://supervisord.org/"&gt;Supervisor&lt;/a&gt; - a battle-tested, process manager that can be operated via XML-RPC or HTTP. Enables live control and status of your processes.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Redis has become my swiss army knife of data munging - a store that persists data and which has some very useful atomic operations, such as integer incrementing, list manipulations and very fast set operations. I've also used it for some quick-n-dirty process orchestrations (which is how I've used it in the example that ends this post.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I've also used it for usage statistic parsing and characterisation of miscellaneous XML files too!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;RabbitMQ - a dependable, fast message server which I am primarily using as a buffer for asynchronous operations and task distribution. More boilerplate to use than, say Redis, but by far more suited for that sort of thing.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Supervisord - I've been told that the ruby project 'god' is similar to this - I really have found it very useful, especially on those systems I run remotely. An HTML page to control processes and view logs and stats? what's not to like!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;Now for a little illustration of a simple multiprocessing solution - in fact, this blog post far, far outweighs the code written and perhaps even overeggs the simple nature of the problem. 
I typically wouldn't use supervisor for a simple task like the following, but it seems a suitable example to show how to work it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The ability to asynchronously deliver messages, updates and tasks between your processes is a real boon - it enables quick solutions to normally vexing or time-consuming problems. For example, let's look at a trivial problem of how to harvest the content from a repository with an OAI-PMH service:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A possible solution needs:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;a process to communicate with the OAI-PMH service to gain the list of identifiers for the items in the repository (with the ability to update itself at a later time). Including the ability to find the serialised form of the full metadata for the item, if it cannot be gotten from the OAI-PMH service (eg Eprints3 XML isn't often included in the OAI-PMH service, but can be retrieved from the Export function.),&lt;/li&gt;&lt;li&gt;a process that simply downloads files to a point on the disc,&lt;/li&gt;&lt;li&gt;and a service that allows process one to queue jobs for process 2 to download - in this case Redis.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;I told you it would be trivial :)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Installing Redis: (See &lt;a rel="nofollow" target="_blank" href="http://code.google.com/p/redis/wiki/QuickStart"&gt;http://code.google.com/p/redis/wiki/QuickStart&lt;/a&gt; for fuller instructions)&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;sudo apt-get install build-essential python-dev python-setuptools [make sure you can build and use easy_install - here shown for debian/ubuntu/etc]&lt;/li&gt;&lt;li&gt;sudo easy_install supervisor&lt;/li&gt;&lt;li&gt;mkdir oaipmh_directory        # A directory to contain all the bits you need&lt;/li&gt;&lt;li&gt;cd oaipmh_directory&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;Create a supervisor 
configuration for the task at hand and save it as supervisord.conf.&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;div&gt;[program:oaipmhgrabber]&lt;/div&gt;&lt;div&gt;autorestart = false&lt;/div&gt;&lt;div&gt;numprocs = 1&lt;/div&gt;&lt;div&gt;autostart = false&lt;/div&gt;&lt;div&gt;redirect_stderr = True&lt;/div&gt;&lt;div&gt;stopwaitsecs = 10&lt;/div&gt;&lt;div&gt;startsecs = 10&lt;/div&gt;&lt;div&gt;priority = 10&lt;/div&gt;&lt;div&gt;command = python harvest.py&lt;/div&gt;&lt;div&gt;startretries = 3&lt;/div&gt;&lt;div&gt;stdout_logfile = workerlogs/harvest.log&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;[program:downloader]&lt;/div&gt;&lt;div&gt;autorestart = true&lt;/div&gt;&lt;div&gt;numprocs = 1&lt;/div&gt;&lt;div&gt;autostart = false&lt;/div&gt;&lt;div&gt;redirect_stderr = True&lt;/div&gt;&lt;div&gt;stopwaitsecs = 10&lt;/div&gt;&lt;div&gt;startsecs = 10&lt;/div&gt;&lt;div&gt;priority = 999&lt;/div&gt;&lt;div&gt;command = oaipmh_file_downloader q:download_list&lt;/div&gt;&lt;div&gt;startretries = 3&lt;/div&gt;&lt;div&gt;stdout_logfile = workerlogs/download.log&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;[program:redis]&lt;/div&gt;&lt;div&gt;autorestart = true&lt;/div&gt;&lt;div&gt;numprocs = 1&lt;/div&gt;&lt;div&gt;autostart = true&lt;/div&gt;&lt;div&gt;redirect_stderr = True&lt;/div&gt;&lt;div&gt;stopwaitsecs = 10&lt;/div&gt;&lt;div&gt;startsecs = 10&lt;/div&gt;&lt;div&gt;priority = 999&lt;/div&gt;&lt;div&gt;command = path/to/the/redis-server&lt;/div&gt;&lt;div&gt;startretries = 3&lt;/div&gt;&lt;div&gt;stdout_logfile = workerlogs/redis.log&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;[unix_http_server]&lt;/div&gt;&lt;div&gt;file = /tmp/supervisor.sock&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;[supervisord]&lt;/div&gt;&lt;div&gt;minfds = 1024&lt;/div&gt;&lt;div&gt;minprocs = 200&lt;/div&gt;&lt;div&gt;loglevel = info&lt;/div&gt;&lt;div&gt;logfile = 
/tmp/supervisord.log&lt;/div&gt;&lt;div&gt;logfile_maxbytes = 50MB&lt;/div&gt;&lt;div&gt;nodaemon = false&lt;/div&gt;&lt;div&gt;pidfile = /tmp/supervisord.pid&lt;/div&gt;&lt;div&gt;logfile_backups = 10&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;[supervisorctl]&lt;/div&gt;&lt;div&gt;serverurl = unix:///tmp/supervisor.sock&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;[rpcinterface:supervisor]&lt;/div&gt;&lt;div&gt;supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;[inet_http_server]&lt;/div&gt;&lt;div&gt;username = guest&lt;/div&gt;&lt;div&gt;password = mypassword&lt;/div&gt;&lt;div&gt;port = 127.0.0.1:9001&lt;/div&gt;&lt;/blockquote&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This has a lot of boilerplate on it, so let's go through it, section by section:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;[program:redis] - this controls the redis program. You will need to change the path to the redis server to wherever it was built on your system - eg ~/redis-1.2.1/redis-server&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;[program:oaipmhgrabber] and [program:downloader] - these set up the processes, look at the 'command' key for the command that is run for them eg downloader has "oaipmh_file_downloader q:download_list" - The OAIPMHScraper package adds in the script, 'q:download_list' is the redis-based list that the download tasks appear on. 
NB we haven't written harvest.py yet - don't worry!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;NB it is very important that autorestart=false in [program:oaipmhgrabber] - if it weren't, it would eternally repeat - on and on and on - harvesting!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Supervisor boilerplate: [unix_http_server], [supervisord], [supervisorctl]&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;RPC interface control [rpcinterface:supervisor]&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;HTTP interface control - [inet_http_server] - which, importantly, includes the username and password to log in to the control panel!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now, create the log directory:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;mkdir workerlogs&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's now write 'harvest.py': PLEASE use a different OAI2 endpoint url!&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;div&gt;#!/usr/bin/env python&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;from oaipmhscraper import Eprints3Harvester&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;o = Eprints3Harvester("repo", base_oai_url="http://eprints.maths.ox.ac.uk/cgi/oai2/")&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;o.getRecords(metadataPrefix="XML",&lt;/div&gt;&lt;div&gt;                        template="%(pid)s/%(prefix)s/mieprints-eprint-%(pid)s.xml")&lt;/div&gt;&lt;/blockquote&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;[Note there is a base OAIPMHScraper class, but this simply goes and gets the metadata or Identifiers for a given endpoint and stores whatever XML metadata it gets into a store. 
The Eprints3 harvester gets the files as well, or tries to.]&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You may have to change the template for other eprints repositories - the above template would result in the following for item 774:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;"http://eprints.maths.ox.ac.uk/cgi/export/774/XML/mieprints-eprint-774.xml"&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;YMMV for other repositories of course, so you can rewrite this template accordingly.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Your directory should look like this:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;--&amp;gt;  harvest.py  supervisord.conf  workerlogs/&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's start the supervisor to make sure the configuration is correct:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;[---] $ supervisord -c supervisord.conf&lt;/div&gt;&lt;div&gt;[---] $ &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now open &lt;a rel="nofollow" target="_blank" href="http://localhost:9001/"&gt;http://localhost:9001/&lt;/a&gt; - it should look like the following:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color:rgb(0, 0, 238);"&gt;&lt;img src="http://2.bp.blogspot.com/_KLlGSypGAvw/S3QGPGqfG8I/AAAAAAAAAFI/QSLPEbfZPQE/s320/supervisor.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5436977506699647938" style="display:block;margin-top:0px;margin-right:auto;margin-bottom:10px;margin-left:auto;text-align:center;cursor:pointer;width:320px;height:153px;"/&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;Click on the 'redis' name to see the logfile that this is generating - you'll want to see lines like:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;i&gt;11 Feb 13:34:32 . 
0 clients connected (0 slaves), 2517 bytes in use, 0 shared objects&lt;/i&gt;&lt;/div&gt;&lt;div&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/div&gt;&lt;div&gt;Let's start the harvest :)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Click on 'start' for the oaipmh grabber process and wait - in the configuration file, we told it to wait for the process to stay up for 10 seconds before reporting that it was running, so it should take about that long for the page to refresh.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now, let's see what it is putting onto the queue, before we start the download process (see, easy to debug!)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;python&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; from redis import Redis&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; r = Redis()&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; r.keys("*")&lt;/div&gt;&lt;div&gt;&lt;div&gt;[u'q:download_list']&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; r.llen("q:download_list")&lt;/div&gt;&lt;div&gt;351&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; r.llen("q:download_list")&lt;/div&gt;&lt;div&gt;361&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; r.llen("q:download_list")&lt;/div&gt;&lt;div&gt;370&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; # Still accruing things to download as we speak...&lt;/div&gt;&lt;div&gt;&amp;gt;&amp;gt;&amp;gt; r.lrange("q:download_list", 0,0)&lt;/div&gt;&lt;div&gt;[u'{"url": "http://eprints.maths.ox.ac.uk/cgi/export/774/XML/mieprints-eprint-774.xml", "filename": "XML", "pid": "oai:generic.eprints.org:774", "silo": "repo"}']&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now, let's switch on the downloader and work on those messages - go back to http://localhost:9001 and start the downloader. 
Click on the downloader name when the page refreshes to get a 'tail' of its logfile in the browser.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You should get something like the following:&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;INFO:CombineHarvester File downloader:Starting download of XML (from http://eprints.maths.ox.ac.uk/cgi/export/370/XML/mieprints-eprint-370.xml) to object oai:generic.eprints.org:370&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;2010-02-11 13:43:51,284 - CombineHarvester File downloader - INFO - Download completed in 0 seconds&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;INFO:CombineHarvester File downloader:Download completed in 0 seconds&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;2010-02-11 13:43:51,285 - CombineHarvester File downloader - INFO - Saving to Silo repo&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;INFO:CombineHarvester File downloader:Saving to Silo repo&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;2010-02-11 13:43:51,287 - CombineHarvester File downloader - INFO - Starting download of XML (from http://eprints.maths.ox.ac.uk/cgi/export/371/XML/mieprints-eprint-371.xml) to object oai:generic.eprints.org:371&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:small;"&gt;INFO:CombineHarvester File downloader:Starting download of XML (from http://eprints.maths.ox.ac.uk/cgi/export/371/XML/mieprints-eprint-371.xml) to object oai:generic.eprints.org:371&lt;/span&gt;&lt;/div&gt;&lt;/blockquote&gt;&lt;div&gt;&lt;span class="Apple-style-span" 
style="font-size:small;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt; So, that will go about and download all the XML (Eprints3 XML) for each item it found in the repository. (I haven't put in much to stop dupe downloads etc. - exercise for the reader ;))&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;How about we try to download the files for each item too? It just so happens I've included a little Eprints3 XML parser and a method for queuing up the files for download, 'reprocessRecords' - let's use this to download the files now - save as download_files.py&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;div&gt;#!/usr/bin/env python&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;from oaipmhscraper import Eprints3Harvester&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;o = Eprints3Harvester("repo", base_oai_url="http://eprints.maths.ox.ac.uk/cgi/oai2/")&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;o.reprocessRecords()&lt;/div&gt;&lt;/blockquote&gt;&lt;div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;Add this process to the top of the supervisord.conf file:&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;div&gt;[program:queuefilesfordownload]&lt;/div&gt;&lt;div&gt;autorestart = false&lt;/div&gt;&lt;div&gt;numprocs = 1&lt;/div&gt;&lt;div&gt;autostart = false&lt;/div&gt;&lt;div&gt;redirect_stderr = True&lt;/div&gt;&lt;div&gt;stopwaitsecs = 10&lt;/div&gt;&lt;div&gt;startsecs = 10&lt;/div&gt;&lt;div&gt;priority = 999&lt;/div&gt;&lt;div&gt;command = python download_files.py&lt;/div&gt;&lt;div&gt;startretries = 3&lt;/div&gt;&lt;div&gt;stdout_logfile = workerlogs/download_files.log&lt;/div&gt;&lt;/blockquote&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now, to demonstrate the commandline supervisor controller:&lt;/div&gt;&lt;div&gt;[--] $ supervisorctl&lt;/div&gt;&lt;div&gt;&lt;div&gt;$ supervisorctl &lt;/div&gt;&lt;div&gt;downloader                 
      RUNNING    pid 20750, uptime 0:15:41&lt;/div&gt;&lt;div&gt;oaipmhgrabber                    STOPPED    Feb 11 01:58 PM&lt;/div&gt;&lt;div&gt;redis                            RUNNING    pid 16291, uptime 0:25:31&lt;/div&gt;&lt;div&gt;supervisor&amp;gt; shutdown&lt;/div&gt;&lt;div&gt;Really shut the remote supervisord process down y/N? y&lt;/div&gt;&lt;div&gt;Shut down&lt;/div&gt;&lt;div&gt;supervisor&amp;gt; &lt;/div&gt;&lt;div&gt;(Press Ctrl+D to leave this terminal)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now restart the supervisor:&lt;/div&gt;&lt;div&gt;[--] $ supervisord -c supervisord.conf&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And refresh http://localhost:9001/&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;[NB in the following picture, I reran oaipmhgrabber, so you could see what the status of a normally exiting process looks like]&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://1.bp.blogspot.com/_KLlGSypGAvw/S3QOwkhk0eI/AAAAAAAAAFU/ZrAuV2di7wQ/s1600-h/supervisor2.png"&gt;&lt;img style="display:block;margin:0px auto 10px;text-align:center;cursor:pointer;cursor:hand;width:320px;height:154px;" src="http://1.bp.blogspot.com/_KLlGSypGAvw/S3QOwkhk0eI/AAAAAAAAAFU/ZrAuV2di7wQ/s320/supervisor2.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5436986877744042466"/&gt;&lt;/a&gt;Now, switch on the reprocess record worker and tail -f the downloader if you want to watch it work :)&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;What's a RecordSilo&lt;/b&gt;? (aka How things are stored in the example)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This class is based on CDL's spec for &lt;a rel="nofollow" target="_blank" href="https://confluence.ucop.edu/display/Curation/PairTree"&gt;Pairtree&lt;/a&gt; object storage - each object contains a JSON manifest and is made up of object-level versions. 
But, it is easier to understand if you have some kind of GUI to poke around with, so I quickly wrote the following dropbox.py server for that end:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Grab the dropbox code and templates from &lt;a rel="nofollow" target="_blank" href="http://github.com/benosteen/SiloServer"&gt;http://github.com/benosteen/SiloServer&lt;/a&gt; - unpack it into the same directory as you are in now.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;so that:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;[--] $ ls&lt;/div&gt;&lt;div&gt;download_files.py  dropbox.py  dump.rdb  harvest.py  repo  supervisord.conf  templates  workerlogs&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Edit dropbox.py and change the data_dir to equal your repo directory name - in this case, just "repo"&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;(Make sure you have mako and web.py installed too! sudo easy_install mako web.py)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;then:  &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;$ python dropbox.py &lt;/div&gt;&lt;div&gt;http://0.0.0.0:8080/&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Go to http://localhost:8080/ to then see all your objects! 
This page opens them all, so could take a while :)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://3.bp.blogspot.com/_KLlGSypGAvw/S3QVsECsWdI/AAAAAAAAAFo/UBDx4VKNB3M/s1600-h/dropbox2.png"&gt;&lt;img style="cursor:pointer;cursor:hand;width:320px;height:245px;" src="http://3.bp.blogspot.com/_KLlGSypGAvw/S3QVsECsWdI/AAAAAAAAAFo/UBDx4VKNB3M/s320/dropbox2.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5436994496886495698"/&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://3.bp.blogspot.com/_KLlGSypGAvw/S3QVVAzYa6I/AAAAAAAAAFg/qpe4ffhbFGs/s1600-h/dropbox.png"&gt;&lt;img style="cursor:pointer;cursor:hand;width:320px;height:137px;" src="http://3.bp.blogspot.com/_KLlGSypGAvw/S3QVVAzYa6I/AAAAAAAAAFg/qpe4ffhbFGs/s320/dropbox.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5436994100879977378"/&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;(I did this on my work computer and may have not put in some dependencies, etc but it worked for me. Let me know if it doesn't in the comments)&lt;/div&gt;</description>
         <author>Ben O'Steen</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3090914606822911489.post-5117977572954400653</guid>
         <pubDate>Thu, 11 Feb 2010 03:45:00 +0000</pubDate>
         <media:thumbnail height="72" url="http://2.bp.blogspot.com/_KLlGSypGAvw/S3QGPGqfG8I/AAAAAAAAAFI/QSLPEbfZPQE/s72-c/supervisor.png" width="72" />
      </item>
      <item>
         <title>Usage stats and Redis</title>
         <link>http://oxfordrepo.blogspot.com/2010/01/usage-stats-and-redis.html</link>
         <description>Redis has been such a massively useful tool to me. &lt;p&gt;Recently, it has let me cut through access logs munging like a hot knife through butter, all with multiprocessing goodness.&lt;/p&gt;&lt;p&gt;&lt;b&gt;Key things:&lt;/b&gt;&lt;/p&gt;&lt;p&gt;&lt;i&gt;Using sets to manage botlists:&lt;/i&gt;&lt;/p&gt;&lt;p&gt;&amp;gt;&amp;gt;&amp;gt; from redis import Redis&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; r = Redis()&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; for bot in r.smembers("botlist"):&lt;br /&gt;...   print bot&lt;br /&gt;...&lt;br /&gt;lycos.txt&lt;br /&gt;non_engines.txt&lt;br /&gt;inktomi.txt&lt;br /&gt;misc.txt&lt;br /&gt;askjeeves.txt&lt;br /&gt;oucs_bots&lt;br /&gt;wisenut.txt&lt;br /&gt;altavista.txt&lt;br /&gt;msn.txt&lt;br /&gt;googlebotlist.txt&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; total = 0&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; for bot in r.smembers("botlist"):&lt;br /&gt;...   total = total + r.scard(bot)&lt;br /&gt;...&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; total&lt;br /&gt;3882&lt;br /&gt;&lt;/p&gt;&lt;p&gt;So, I have 3882 different IP addresses that I have built up that I consider bots.&lt;/p&gt;&lt;p&gt;&lt;i&gt;Keeping counts and avoiding race-conditions&lt;/i&gt;&lt;/p&gt;&lt;p&gt;By using the Redis INCR command, it's easy to write little workers that run in their own process but which atomically increment counts of hits.&lt;/p&gt;&lt;p&gt;&lt;b&gt;What does the stat system look like?&lt;br /&gt;&lt;/b&gt;&lt;/p&gt;&lt;p&gt;I am treating each line of the Apache-style log as a message that I am passing through a number of workers. &lt;/p&gt;&lt;p&gt;&lt;b&gt;Queues&lt;/b&gt;&lt;/p&gt;&lt;p&gt;All in the same AMQP exchange: ("stats")&lt;/p&gt;&lt;p&gt;Queue "&lt;b&gt;loglines&lt;/b&gt;" - msg's = A single log line in the Apache format. Can be sourced from either local logs or from the live service.&lt;/p&gt;&lt;p&gt;&lt;b&gt;loglines&lt;/b&gt; is listened to by a &lt;b&gt;debot.py&lt;/b&gt; worker, just one at the moment. 
This worker feeds three queues:&lt;/p&gt;&lt;p&gt;Queue "&lt;b&gt;bothits&lt;/b&gt;" - log lines from a request that matches a bot IP&lt;/p&gt;&lt;p&gt;Queue "&lt;b&gt;objectviews&lt;/b&gt;" - log lines from a request that was a record page view or item download&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Queue "&lt;b&gt;other&lt;/b&gt;" - log lines that I am presently not so interested in.&lt;/p&gt;&lt;p&gt;[These three queues are consumed by 3 loggers and these maintain a copy of the logs, pre-separated. These are designed to be temporary parts of the workflow, to be discarded once we know what we want from the logs.]&lt;/p&gt;&lt;p&gt;&lt;b&gt;objectviews&lt;/b&gt; is subscribed to by a &lt;b&gt;count.py&lt;/b&gt; worker which does the heavy crunching as shown below.&lt;/p&gt;&lt;p&gt;&lt;b&gt;Debot.py&lt;/b&gt;&lt;br /&gt;&lt;/p&gt; &lt;p&gt;The first worker is 'debot.py' - this does the broad separation and checking of a logged event. In essence, it uses the Redis SISMEMBER command to see if the IP address is in the blacklists and, if not, applies a few regexes to see if it is a record view and/or a download or something else.&lt;/p&gt; &lt;p&gt;&lt;b&gt;Broad Logging&lt;/b&gt;&lt;/p&gt; &lt;p&gt;There are three logger workers that debot.py feeds for "bothits", "objectviews", and "other" - each of these workers just sits and listens on the relevant queue for an Apache log line and appends it to the logfile it has open. 
Saves me having to open/close logger objects or pass anything around.&lt;/p&gt; &lt;p&gt;The logfiles are purely as a record of the processing and so I can skip redoing it if I want to do any further analysis, like tracking individuals, etc.&lt;/p&gt;&lt;p&gt;The loggers also INCR a key in Redis for each line they see - u:objectviews, u:bothits, and u:other as appropriate - these give me a rough idea of how the processing is going.&lt;/p&gt;&lt;p&gt;(And you can generate pretty charts from it too:)&lt;/p&gt;&lt;p&gt;&lt;a rel="nofollow" target="_blank" href="http://chart.apis.google.com/chart?cht=p3&amp;amp;chds=0,9760660&amp;amp;chd=t:368744,9760660,1669552&amp;amp;chs=600x200&amp;amp;chl=Views%7CBots%7COther"&gt;http://chart.apis.google.com/chart?cht=p3&amp;amp;chds=0,9760660&amp;amp;chd=t:368744,9760660,1669552&amp;amp;chs=600x200&amp;amp;chl=Views|Bots|Other&lt;/a&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;a rel="nofollow" target="_blank" href="http://chart.apis.google.com/chart?cht=p3&amp;amp;chds=0,9760660&amp;amp;chd=t:368744,9760660,1669552&amp;amp;chs=600x200&amp;amp;chl=Views%7CBots%7COther"&gt;&lt;img style="margin:0px auto 10px;display:block;text-align:center;cursor:pointer;width:600px;height:200px;" src="http://chart.apis.google.com/chart?cht=p3&amp;amp;chds=0,9760660&amp;amp;chd=t:368744,9760660,1669552&amp;amp;chs=600x200&amp;amp;chl=Views%7CBots%7COther" alt="" border="0"/&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;(data sourced at a point during the processing - 10million bot hits vs 360k object views/dls)&lt;br /&gt;&lt;/p&gt; &lt;p&gt;&lt;b&gt;Counting hits (metadata and time based)&lt;br /&gt;&lt;/b&gt;&lt;/p&gt; &lt;p&gt;Most of the heavy lifting is in count.py - this is fed from the object views/downloads stream coming from the debot.py worker. 
It does a number of procedural steps for the metadata:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;Get metadata from ORA's Solr endpoint (as JSON)&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Specifically, get the 'authors' (names), subjects/keyphrases, institutions, content types, and collections things appear in.&lt;/li&gt;&lt;li&gt;These fields correspond to certain keys in Redis. Eg names = 'number:names' = number of unique names, 'n:...' = hits to a given name, etc&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;For each view/dl:&lt;/li&gt;&lt;ul&gt;&lt;li&gt;INCR 'ids:XXXXX' where XXXXX is 'names', 'subjects', etc. It'll return the new value for this, eg 142&lt;/li&gt;&lt;li&gt;SET X:142 to be equal to the text for this new entity, where X is the prefix for the field.&lt;/li&gt;&lt;li&gt;SADD this id (eg X:142) to the relevant set for it, like 'names', 'subjects', etc - This is so we can have an accurate idea of the entities in use even after removing/merging them.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;i&gt;Reverse lookup:&lt;/i&gt; Hash the text for the entity (eg md5("John F. 
Smith")) and SET r:X:{hash} to be equal to "X:142"&lt;br /&gt;&lt;/li&gt;&lt;li&gt;SET X:views:142 to be equal to 1 to get the ball rolling (or X:dl:142 for downloads)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;If the name is not new:&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Hash the text and lookup r:{hash} to get the id (eg n:132)&lt;/li&gt;&lt;li&gt;INCR the item's counter (eg INCR n:views:132)&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;i&gt;Time-based and other counts:&lt;/i&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;INCR t:{object id} (total hits on that repository object since logs began)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;INCR t:MMYY (total 'proper' hits for that month)&lt;/li&gt;&lt;li&gt;INCR t:MMYY:{object id} (total 'proper' hits for that repo item that month)&lt;/li&gt;&lt;li&gt;INCR t:MMYY:{entity id} (Total hits for an entity, say 'n:132' that month)&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt; &lt;p&gt;A lot of pressure is put on Redis by count.py but it seems to be coping fine. A note for anyone else thinking about this: Redis keeps its datastore in RAM - running out of RAM is a Bad Thing(tm).&lt;/p&gt;&lt;p&gt;I know that I could also just use the md5 hashes as ids, rather than using a second id - I'm still developing this section and this outline just states it how it is now!&lt;br /&gt;&lt;/p&gt; &lt;p&gt;Also, it's worth noting that if I needed to, I can put remote redis 'shards' on other machines and they can just pull log lines from the main objectview queue to process. (It'll still need to create the id &amp;lt;-&amp;gt; entity name mapping on the main store though or a slave of the main store.)&lt;/p&gt; &lt;p&gt;&lt;b&gt;But why did I do this?&lt;/b&gt;&lt;/p&gt;&lt;p&gt;I thought that it would mean I could handle both legacy logs and live data and have a framework I could put against other systems and in a way that would mean I would write less code and for the system to be more reliable.&lt;/p&gt;&lt;p&gt;So far, I still think this is the case. 
If people are interested, I'll abstract out a class or two (eg the metadata lookup function, etc) and stick it on google code. It's not really a lot of code so far, I think even this outline post is longer....&lt;br /&gt;&lt;/p&gt;</description>
         <author>Ben O'Steen</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3090914606822911489.post-6053583822562222201</guid>
         <pubDate>Mon, 18 Jan 2010 09:19:00 +0000</pubDate>
      </item>
      <item>
         <title>#iPres09: e-Infrastructure and digital preservation: challenges and outlook</title>
         <link>http://davetaz-blog.blogspot.com/2009/10/ipres09-e-infrastructure-and-digital.html</link>
         <description>e-infrastructure: Starts by defining infrastructure (see Wikipedia) and e-infrastructure specific to a collection of European digital repositories. So basically we are looking at opportunities to build and supply services which are applicable to these repositories. &lt;br /&gt;&lt;br /&gt;Background: EU is supplying lots of support for this and in Germany they are researching national approaches, identifying activities and assigning tasks to "expert" institutions. By introducing the current fields of project he is outlining that there is still a significant mismatch between the scale of the problem and the amount of effort being expended. From this he outlines that there is a significant lack of common approaches to solving problems. [I don't think this will ever go away, unless there is a mandate, and even then not everyone will want to sign up].&lt;br /&gt;&lt;br /&gt;[Lots of argument] Funding is focused on many individual projects and thus doubles up the argument that there are no commons. This leads to a slide about interoperability and standards and the lack of them. [Which again, I don't think will ever go away and I think that we should be appreciative that people tend to pick XML to encode their data in, this makes it interoperable, right?]. &lt;br /&gt;&lt;br /&gt;[This is a start of project presentation, I don't seem to see that much output. They have some simple models as diagrams, again though at this stage it is hard to see how they are not just another project which will come up with (another) set of standards which no one will then want to adopt.] &lt;br /&gt;&lt;br /&gt;Giving a set of examples now where they are going to re-use and extend existing software/projects. The goals are good, in terms of concrete steps for global infrastructure for registries, data formats, software deposits and risk management. 
[Just not sure how achievable all this is based upon the fact it has been the aim of many projects already]</description>
         <author>Dave Tarrant</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-4926451824261299693.post-5274314513105060713</guid>
         <pubDate>Mon, 05 Oct 2009 13:39:00 +0000</pubDate>
      </item>
      <item>
         <title>Exploring curation micro-services</title>
         <link>http://feedproxy.google.com/~r/technosophia/~3/zPtUh6TmyRo/</link>
         <description>As far as I&amp;#039;m concerned, the most exciting developments this year in repositories and digital curation have come out of the California Digital Library. It has been impossible not to notice their papers and presentations. Put simply, their idea is that digital curation is enabled by &amp;#034;micro-services&amp;#034; built upon well-known abstractions such as the filesystem. [...]</description>
         <guid isPermaLink="false">http://lackoftalent.org/michael/blog/?p=504</guid>
         <pubDate>Sun, 27 Sep 2009 04:00:29 +0000</pubDate>
         <content:encoded><![CDATA[<p><img src="http://lackoftalent.org/images/micro_repo_thumb.png" alt="thumbnail of micro-repo tree" style="float:left;"/>As far as I&#039;m concerned, the most exciting developments this year in  repositories and digital curation have come out of the California Digital Library.  It has been impossible <a rel="nofollow" target="_blank" href="http://conferences.library.gatech.edu/or/or09/paper/view/95">not</a> <a rel="nofollow" target="_blank" href="http://uccsc2009.ucdavis.edu/preso/UCCSC-2009-CDL-PODS-v05.ppt">to</a> <a rel="nofollow" target="_blank" href="http://www.ijdc.net/index.php/ijdc/article/view/98">notice</a> <a rel="nofollow" target="_blank" href="https://meeting-reg.com/sunpasig/abstracts.php">their</a> <a rel="nofollow" target="_blank" href="http://www.digitalpreservation.gov/news/events/ndiipp_meetings/ndiipp09/docs/NDIIPP%20Partner%20Meeting%202009_Breakout%20Session%20Schedule.pdf">papers</a> <a rel="nofollow" target="_blank" href="http://www.ijdc.net/index.php/ijdc/article/view/108/84">and</a> <a rel="nofollow" target="_blank" href="http://www.cdlib.org/iPres/confsched.html">presentations</a>.  Put simply, their idea is that digital curation is enabled by &#034;micro-services&#034; built upon well-known abstractions such as the filesystem.  The benefits are obvious: filesystem tools are ubiquitous and cross-platform, and there are strong market forces to ensure the filesystem persists.  The idea is radically simple and straightforward, though many questions remain about such a paradigm.  I&#039;ll return to those later. </p>
<p>If you have not yet taken a look at CDL&#039;s curation micro-service specifications, most of which may be printed on as few as one or two sheets of paper, see the <a rel="nofollow" target="_blank" href="http://www.cdlib.org/inside/diglib/">Digital Library Building Blocks</a>.</p>
<p>My co-workers in the LC Repository Development Center have been chatting about these specs on and off throughout the year.  After months of procrastinating, I finally read all of the specs on Thursday; it&#039;s wonderful that you can do so in the course of one reading session, I might add.  Yesterday a bunch of us RDCers got together to chat (informally) about the specs: what they&#039;re for, how they work, and how they interact with one another.  I learn by doing, by examples, so I combed through each of the specs in advance of our meeting and tried to <a rel="nofollow" target="_blank" href="http://twitter.com/mjgiarlo/statuses/4371794936">construct</a> a minimal repository[<a rel="nofollow" target="_blank" href="http://lackoftalent.org/michael/blog/2009/09/27/exploring-curation-micro-services/#footnote_0_504" id="identifier_0_504" class="footnote-link footnote-identifier-link" title="Perhaps it&amp;#8217;s more in line with the specs to refer to this space as &amp;#8220;a managed filesystem that drives repository and curation services,&amp;#8221; given the CDL philosophy that preservation is not a place/repository.  But it&amp;#8217;s easier to say &amp;#8220;repository,&amp;#8221; so there you go.">1</a>] based on micro-services.<br />
<span id="more-504"></span><br />
Here is a tree visualization of the final product, inevitable warts and all: <a rel="nofollow" target="_blank" href="http://lackoftalent.org/images/micro_repo.png"><img src="http://lackoftalent.org/images/micro_repo.png" alt="sample micro-services repo tree"/></a>  The services I used were <a rel="nofollow" target="_blank" href="http://www.cdlib.org/inside/diglib/namaste/namastespec.html">Namaste</a>, <a rel="nofollow" target="_blank" href="http://www.cdlib.org/inside/diglib/can/canspec.pdf">Content Access Node (CAN)</a>, <a rel="nofollow" target="_blank" href="http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html">Pairtree</a>, <a rel="nofollow" target="_blank" href="http://www.cdlib.org/inside/diglib/dflat/dflatspec.pdf">Dflat</a>, <a rel="nofollow" target="_blank" href="http://www.cdlib.org/inside/diglib/redd/reddspec.html">Reverse Directory Deltas (ReDD)</a>, <a rel="nofollow" target="_blank" href="http://www.cdlib.org/inside/diglib/clop/clopspec.pdf">Class-based System for Managing Object Properties (CLOP)</a>, and <a rel="nofollow" target="_blank" href="http://www.digitalpreservation.gov/library/resources/tools/docs/bagitspec.pdf">BagIt</a> (co-developed by LC and CDL).</p>
<p>As I mentioned in our Friday meeting, recounting my experience exploring the specs: the bad thing is that I spent an hour building a repository with rudimentary tools such as mkdir, touch, cp, ln, and emacs; but the good thing is that I built a <em>repository</em> in <em>one hour</em> using <em>common, rudimentary tools</em>.  It&#039;s a very compelling paradigm.  <a rel="nofollow" target="_blank" href="http://inkdroid.org/ehs">Ed</a>&#039;s already built a <a rel="nofollow" target="_blank" href="http://github.com/edsu/dflat">tool</a> implementing some of Dflat, further demonstrating how lightweight these micro-services are.  (<strong>UPDATE</strong>: Ed notes that this code is a work in progress and is &#034;barely functional.&#034;)  (<strong>UPDATE 2</strong>: The dflat library has come a long way.  Check it out if you&#039;re interested.  Also, I just committed a pretty basic Namaste library: <a rel="nofollow" target="_blank" href="http://github.com/mjgiarlo/namaste">http://github.com/mjgiarlo/namaste</a>.  Only took about an hour, which is a testament to the power of lightweight specs.)</p>
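<p>To give a flavour of how little machinery is involved, here is a minimal Python sketch of the identifier-to-path mapping as I read the Pairtree spec: hex-encode the awkward characters, swap three common ones, then split the result into two-character directory names.  The names <code>clean_id</code> and <code>pairpath</code> are made up for illustration and are not taken from any existing library, so treat this as a sketch under those assumptions rather than a conforming implementation.</p>

```python
# Hypothetical helper names; the character rules follow my reading of the Pairtree spec.

# Characters that get hex-encoded as ^hh before the single-character swaps.
_HEX_ENCODE = set('"*+,<=>?\\^|')

# Single-character swaps applied to everything that was not hex-encoded.
_SWAP = {"/": "=", ":": "+", ".": ","}


def clean_id(identifier):
    """Return a filesystem-safe form of an identifier string."""
    out = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        # Non-visible ASCII, non-ASCII bytes and the reserved set become ^hh.
        if byte < 0x21 or byte > 0x7E or ch in _HEX_ENCODE:
            out.append("^%02x" % byte)
        else:
            out.append(_SWAP.get(ch, ch))
    return "".join(out)


def pairpath(identifier):
    """Split the cleaned identifier into two-character path segments."""
    cleaned = clean_id(identifier)
    return "/".join(cleaned[i:i + 2] for i in range(0, len(cleaned), 2))


if __name__ == "__main__":
    print(pairpath("ark:/13030/xt12t3"))
```

<p>Passing the resulting relative path to <code>os.makedirs</code> under a <code>pairtree_root</code> directory would stand in for the chain of mkdir calls I typed by hand.</p>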
<p>I am certain this will be a running thread at work as the specifications evolve and our understanding of them grows.  Some questions and comments that occurred to me while exploring the micro-service specs and building the minimal repo:</p>
<ul>
<li>CAN was a bit puzzling.  The spec is simple enough, but I found some of the conventions confusing, and I was left wondering what CAN provides other than a container.  What I would like to see is a simple use case and perhaps more examples.  Thus, the CAN stuff in my sample repo doesn&#039;t feel very useful only because I had a hard time working with the spec.</li>
<li>CLOP feels like the least mature of the specifications.  It seems generally useful to be able to put digital objects, however you define that, into classes and define properties on those classes.  The spec did not clearly convey to me just how it accomplishes that aim.  A few examples would go a very long way.  I&#039;ve got some CLOP stuff in the sample repo but I have no idea how closely my implementation matches the spec.</li>
<li>Is Dflat dependent on ReDD?  One would assume not since there&#039;s an optional property in the dflat-info.txt file for specifying a delta scheme.  But, say, could you stub out the v001 directory (reserved to hold the initial version of a digital object) and use a system such as <a rel="nofollow" target="_blank" href="http://git-scm.com/">git</a> or <a rel="nofollow" target="_blank" href="http://bazaar-vcs.org/">bazaar</a>?  <br/><br/>One might argue that these established delta schemes, if you want to call them that, have many more developers and users than a system such as ReDD and thus should persist longer and have more tools built around them.  I imagine the micro-service viewpoint would acknowledge that point, but counter that the spirit of these specs is to avoid dependencies from outside the filesystem?</li>
<li>Is the ReDD specification meaningful outside of a Dflat given that any one ReDD directory knows nothing of its successors and predecessors, or is it dependent upon Dflat?</li>
<li>Could a BagIt bag live inside of the ReDD reserved &#034;full&#034; directory?  That is, could the &#034;full&#034; directory be marked up appropriately to <em>be</em> a BagIt bag?</li>
<li>How many tools exist for these specs?  I notice there&#039;s code in CPAN for Pairtree and Namaste, which is a fabulous start.  Tools are the difference between YAMF (Yet Another Messy Filesystem) and reliably managed curation services.  Granted, tools such as cp and emacs already exist and are part of the appeal of these micro-services, but there&#039;s also tremendous room for error if operations are all done &#034;by hand.&#034;</li>
<li>To what extent has CDL transitioned to using these specs/tools?</li>
<li>Are other institutions using these specs/tools?  I have heard tell that digital library folks from the University of Michigan and the University of North Texas may be involved.</li>
</ul>
<p>I hope I don&#039;t sound overly critical.  I&#039;m really glad our colleagues at the California Digital Library have written these specifications and applied their deep experience to what could be a transformative paradigm[<a rel="nofollow" target="_blank" href="http://lackoftalent.org/michael/blog/2009/09/27/exploring-curation-micro-services/#footnote_1_504" id="identifier_1_504" class="footnote-link footnote-identifier-link" title="Please excuse the fanboyishness; this filesystem fetishism is exciting stuff!">2</a>] in the digital curation world.  Kudos to them!</p>
<h5>Notes</h5><ol class="footnotes"><li id="footnote_0_504" class="footnote">Perhaps it&#039;s more in line with the specs to refer to this space as &#034;a managed filesystem that drives repository and curation services,&#034; given the CDL philosophy that preservation is not a place/repository.  But it&#039;s easier to say &#034;repository,&#034; so there you go.</li><li id="footnote_1_504" class="footnote">Please excuse the fanboyishness; this filesystem fetishism is exciting stuff!</li></ol><br/>
]]></content:encoded>
      <enclosure url="http://uccsc2009.ucdavis.edu/preso/UCCSC-2009-CDL-PODS-v05.ppt" length="1569792" type="application/vnd.ms-powerpoint" /><media:content url="http://uccsc2009.ucdavis.edu/preso/UCCSC-2009-CDL-PODS-v05.ppt" fileSize="1569792" type="application/vnd.ms-powerpoint" /><itunes:subtitle>As far as I&amp;#039;m concerned, the most exciting developments this year in repositories and digital curation have come out of the California Digital Library. It has been impossible not to notice their papers and presentations. Put simply, their idea is tha</itunes:subtitle><itunes:summary>As far as I&amp;#039;m concerned, the most exciting developments this year in repositories and digital curation have come out of the California Digital Library. It has been impossible not to notice their papers and presentations. Put simply, their idea is that digital curation is enabled by &amp;#034;micro-services&amp;#034; built upon well-known abstractions such as the filesystem. [...]</itunes:summary></item>
      <item>
         <title>I2: Survey results</title>
         <link>http://feedproxy.google.com/~r/technosophia/~3/4SISswBO5mE/</link>
         <description>I wrote in June that the I2 subgroup surveyed &amp;#034;repository managers to determine the current practices and needs of the repository community regarding institutional identifiers. Results from the survey will inform a set of use cases that will be shared with the community, and that are expected to drive the development of a new standard [...]</description>
         <guid isPermaLink="false">http://lackoftalent.org/michael/blog/?p=497</guid>
         <pubDate>Tue, 15 Sep 2009 12:29:55 +0000</pubDate>
         <content:encoded><![CDATA[<p>I <a rel="nofollow" target="_blank" href="http://lackoftalent.org/michael/blog/2009/06/20/i2-survey/">wrote</a> in June that the I2 subgroup surveyed &#034;repository managers to determine the current practices and needs of the repository community regarding institutional identifiers. Results from the survey will inform a set of use cases that will be shared with the community, and that are expected to drive the development of a new standard for institutional identifiers.&#034;</p>
<p>The survey closed in July, and the subgroup spent August writing a report on the survey results.  That report is now <a rel="nofollow" target="_blank" href="http://www.niso.org/apps/group_public/document.php?document_id=2773">final</a> and it&#039;s available to the public.  Feedback may be sent to our (woefully underutilized) public <a rel="nofollow" target="_blank" href="http://www.niso.org/lists/i2info/">i2info</a> mailing list, left as a comment on this post, or e-mailed to me privately, in which case I can forward it to our internal list.</p>
<p>The next step is to build upon the report to draw yet more conclusions from the data &#8212; there&#039;s an awful lot there &#8212; and flesh out some repository use cases for institutional identifiers.  The I2 core group is moving quickly towards finalizing identifier metadata elements so that a standard may be drafted, and I think having some use cases documented will help drive the standard in a direction the community can get behind.</p>
<p>Onward and upward.</p>
]]></content:encoded>
      </item>
      <item>
         <title>Thoughts on digitization, data deluge and linking</title>
         <link>http://davetaz-blog.blogspot.com/2009/09/thoughts-on-digitization-data-deluge.html</link>
         <description>It's been a while since I've put a post up and this is probably due to being busy and also trying to tidy up a lot of stuff before starting on new projects.&lt;br /&gt;&lt;br /&gt;In this post then: &lt;b&gt;Digitisation&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I never really gathered how big the area of digitisation is and how many non repository people are actively involved in digitisation. There are a great many projects &amp;gt;50 who are digitising resources and these include national libraries. Items being digitised include everything from postcards and newspapers to full books and old journals.&lt;br /&gt;&lt;br /&gt;So what's the problem here ... simple ... how many people are digitising the same things? Yes I know that there is so much out there that this is unlikely to be the case however it brings me nicely to the problem of information overload. There is already more valuable information on the internet than we can possibly handle effectively, so how do you ensure that any resources you digitize for open access usage on the web can be found and used? &lt;br /&gt;&lt;br /&gt;I don't normally say this but perhaps we should look at physical libraries for the answer. Libraries are a very good central point where you can find publications related to all subject areas, and if your local library does not have a copy then it will try and find a copy somewhere else. &lt;br /&gt;&lt;br /&gt;How then does this map onto the web? Web sites become the library and links become the references to additional items or items this site does not contain, simple right? Unfortunately with 50+ projects I can count already, this leads to 50+ different web sites all with differing information presented in different ways. Due to the presentation of each web site being totally different this means that in fact they are not a library - that pride themselves on the standard way to organise resources -&lt;br /&gt;thus web sites become books. 
Thus to find resources we have to rely on search engines and federation. Thus we are back to where we started and we have a problem with information overload. &lt;br /&gt;&lt;br /&gt;Unfortunately I don't have an answer to this problem, however I do know that links hold the key to the solution. Each website at the moment is simply an island of information; what is desperately required is for the authors and community to establish links to these resources. If digitisation houses are curating refereed resources then the simplest way to link to these would be to put information about them on Wikipedia.&lt;br /&gt;&lt;br /&gt;This would be my final point then: Wikipedia is actually a good thing, simply because of the community aspect. However it also provides many other huge benefits:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;External resources such as photos have to have a licence&lt;/li&gt;&lt;br /&gt;&lt;li&gt;In annotating a page/item you create links and establish facts which are available by semantic Wikipedia (dbpedia)&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Wikipedia is an easy way to establish your presence on the linked data web (linkeddata.org)&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;So if you are digitising books by an author, add this link to their Wikipedia page. If you are digitising a collection of World War images, add links to some of these to Wikipedia and Flickr.&lt;br /&gt;&lt;br /&gt;Establish links and help yourself to help everyone else.</description>
         <author>Dave Tarrant</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-4926451824261299693.post-1374762836853461443</guid>
         <pubDate>Tue, 08 Sep 2009 07:01:00 +0000</pubDate>
      </item>
      <item>
         <title>Less talk, less code, more data - The Preserv2 Data Registry</title>
         <link>http://davetaz-blog.blogspot.com/2009/04/talk-code-data-preserv2-data-registry.html</link>
         <description>Yes, less talk more code (oxfordrepo.blogspot.com) is a good saying but I'm going to argue in this post that in fact we need more data! Having a ton of available services and a load of highly complex and well considered data models is all well and good but without data all of these services are useless; a repository is not a repository until it has something in it (Harnad). &lt;br /&gt;&lt;br /&gt;If we look outside of the repository community for a minute we find that the web community is accumulating a whole ton of data, Wikipedia being the main point of reference here. Yet in the repository community we are not harnessing this open linked data model to enhance our data. &lt;br /&gt;&lt;br /&gt;I have been working in the area of digital preservation for a while now and the PRONOM file format registry (TNA UK) has been my friend for many years and contains some valuable data. However I am concerned with the way I see it progressing. The main thing I use the PRONOM registry for is as a complement to DROID for file format information, and the data here is not even that complete. I am concerned however at the size of the new data model and the sheer effort which is going to be required to fill it with the data which it specifies. &lt;br /&gt;&lt;br /&gt;Why not look to the linked data web to see how to tie a series of smaller systems together to make a much more powerful and easier to maintain one? &lt;br /&gt;&lt;br /&gt;This is where I have started with the preserv2 registry available at &lt;a rel="nofollow" target="_blank" href="http://p2-registry.ecs.soton.ac.uk/"&gt;http://p2-registry.ecs.soton.ac.uk/&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;The preserv2 registry is a semantic knowledge base (RDF triples based) with a SPARQL endpoint, RESTful services and a basic browser. Currently the data is focussed on file formats and is basically made up of the PRONOM database ported from a complex XML schema into simple RDF triples. 
On top of this I'm beginning to add data from dbpedia (Wikipedia RDF'd) and making links between the PRONOM data and the dbpedia data! &lt;br /&gt;&lt;br /&gt;Already this is helping us build a greater knowledge base and the cost of gathering and compiling this data is very low. Other than that the registry took me less than a week to construct! &lt;br /&gt;&lt;br /&gt;So "Go forth and make links" (Wendy Hall) is exactly what I'm now doing. With enough data you will be able to make complex OWL-S rules that can be used to accurately deduce facts such as which formats are at risk.</description>
         <author>Dave Tarrant</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-4926451824261299693.post-3516378215565453655</guid>
         <pubDate>Wed, 08 Apr 2009 08:21:00 +0000</pubDate>
      </item>
      <item>
         <title>We need people!</title>
         <link>http://oxfordrepo.blogspot.com/2009/03/we-need-people.html</link>
         <description>(UPDATE - Grrr.... seems that the concept of persistent URLs is lost on the admin - link below has been removed - see google cached copy &lt;a rel="nofollow" target="_blank" href="http://209.85.229.132/search?q=cache:YsdhcWzKWksJ:www.admin.ox.ac.uk/ps/oao/ar/ar3979j.shtml+http://www.admin.ox.ac.uk/ps/oao/ar/ar3979j.shtml&amp;amp;cd=1&amp;amp;hl=en&amp;amp;ct=clnk"&gt;here&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://www.admin.ox.ac.uk/ps/oao/ar/ar3979j.shtml"&gt;http://www.admin.ox.ac.uk/ps/oao/ar/ar3979j.shtml&lt;/a&gt; - job description.&lt;br /&gt;&lt;br /&gt;Essentially, we need smart people who are willing to join us to do good, innovative stuff; work that isn't by-the-numbers with room for initiative and ideas.&lt;br /&gt;&lt;br /&gt;Help us turn our digital repository into a digital library, it'll be fun! Well, maybe not fun, but it will be very interesting at least!&lt;br /&gt;&lt;br /&gt;bulletpoints: python/ruby frameworks, REST, a little SemWeb, ajax, jQuery, AMQP, Atom, JSON, RDF+RDFa, Apache WSGI deployment, VMs, linux, NFS, storage, RAID, etc.</description>
         <author>Ben O'Steen</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3090914606822911489.post-462847752063249912</guid>
         <pubDate>Thu, 19 Mar 2009 07:46:00 +0000</pubDate>
      </item>
      <item>
         <title>Developer Happiness days - why happyness is important</title>
         <link>http://oxfordrepo.blogspot.com/2009/02/developer-happiness-days-why-happyness.html</link>
         <description>&lt;big&gt;&lt;big&gt;&lt;b&gt;Creativity and innovation&lt;/b&gt;&lt;/big&gt;&lt;/big&gt;&lt;br /&gt;&lt;br /&gt;One of the defining qualities of a good innovative developer is creativity and a pragmatic attitude; someone with the '&lt;i&gt;rough consensus, running code&lt;/i&gt;' mentality that pervades good software innovation. This can be seen as the drive to experiment, to turn inspiration and ideas into real, running code or to pathfind by trying out different things. Innovation can often happen when talking about quite separate, seemingly unrelated things, even to the point that most of the time, the 'outcomes' of an interaction are impossible to pin down.&lt;br /&gt;&lt;br /&gt;&lt;big&gt;&lt;big&gt;&lt;b&gt;Play, vagueness and communication&lt;/b&gt;&lt;/big&gt;&lt;/big&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Creativity, inspiration, innovation, ideas, fun, and curiosity&lt;/b&gt; are all useful and important when developing software. These words convey concepts that do not thrive in situations that are purely scheduled, didactic, and teacher-pupil focussed. There needs to be an amount of '&lt;b&gt;&lt;i&gt;play&lt;/i&gt;&lt;/b&gt;' in the system (&lt;a rel="nofollow" target="_blank" href="http://en.wikipedia.org/wiki/Play_%28engineering%29"&gt;see 'Play'.&lt;/a&gt;) While this '&lt;i&gt;&lt;b&gt;play&lt;/b&gt;&lt;/i&gt;' is bad in a tightly regimented system, it is an essential part in a creative system, to allow for new things to develop, new ideas to happen and for 'random' interactions to take place.&lt;br /&gt;&lt;br /&gt;Alongside this notion of &lt;i&gt;&lt;b&gt;play&lt;/b&gt;&lt;/i&gt; in an event, there also needs to be an amount of blank space, a &lt;i&gt;&lt;b&gt;vagueness&lt;/b&gt;&lt;/i&gt; to the event. I think that we can agree that much of the usefulness of normal conferences comes from the 'coffee breaks' and 'lunch breaks', which are blank spaces of a sort. 
What matters is recognising this and factoring more of it in.&lt;br /&gt;&lt;br /&gt;Note that if a single developer could guess at how things should best be developed in the academic space, they would have done so by now. &lt;i&gt;Pre-compartmentalisation of ideas into 'tracks' can kill potential innovation stone-dead.&lt;/i&gt; The distinction between CMS, repository and VLE developers is purely semantic and it is detrimental for people involved in one space to not overhear the developments, needs, ideas and issues in another. It is especially counter-productive to further segregate by community, such as having simultaneous Fedora, DSpace and EPrints strands at an event.&lt;br /&gt;&lt;br /&gt;While the inherent and intended &lt;i&gt;&lt;b&gt;vagueness&lt;/b&gt;&lt;/i&gt; provides the potential for cross-fertilisation of ideas, and the room for &lt;i&gt;&lt;b&gt;play&lt;/b&gt;&lt;/i&gt; provides the space, the final ingredient is that of &lt;i&gt;&lt;b&gt;speech, or any communication that takes place with the same ease and at the same speed as speech&lt;/b&gt;&lt;/i&gt;. While some may find the 140 character limit on twitter or identi.ca a strange constraint, this provides a target for people to really think about what they wish to convey and keeps the dialogue from becoming a series of monologues - much like the majority of emails on mailing lists - and keeps it as a dialogue between people.&lt;br /&gt;&lt;br /&gt;&lt;big&gt;&lt;big&gt;&lt;b&gt;Communication and Developers&lt;/b&gt;&lt;/big&gt;&lt;/big&gt;&lt;br /&gt;&lt;br /&gt;One of the dichotomies in the necessity of communication to development is that developers can be shy, initially preferring the false anonymity of textual communication to spoken words between real people. There is a need to provide means for people to break the ice, and to strike up conversations with people that they can recognise as being of like minds. 
Asking that people's public online avatars are changed to be pictures of them can help people at an event find those that they have been talking to online and to start talking, face to face.&lt;br /&gt;&lt;br /&gt;On a personal note, one of the most difficult things I have to do when meeting people out in real life is answer the question 'What do you do?' - it is much easier when I already know that the person asking the question has a technical background.&lt;br /&gt;&lt;br /&gt;And again, going back to the concept of compartmentalisation - &lt;i&gt;developers who only deal with developers and their managers/peers will build systems that work best for their peers and their managers.&lt;/i&gt; If these people are not the only users then they need to widen their communications. It is important for the developers that do not use their own systems to engage with the people who actually do. They should do this directly, without the potential for garbled dialogue via layers of protocol. This part needs managing in whatever space, both to avoid dominance by loud, disgruntled users and to mitigate anti-social behaviour. By and large, I am optimistic about this process: people tend to want to be thanked, and this simple &lt;i&gt;feedback loop&lt;/i&gt; can be used to help motivate. 
Making this feedback more disproportionate (a small 'thank you' can lead to great effects) and adding in the notion of &lt;i&gt;highscore&lt;/i&gt; can lead to all sorts of interaction and outcomes, most notably being the rapid reinforcement of any behaviour that led to a positive outcome.&lt;br /&gt;&lt;br /&gt;&lt;big&gt;&lt;big&gt;&lt;b&gt;Disproportionate feedback loops and Highscores drive human behaviour&lt;/b&gt;&lt;/big&gt;&lt;/big&gt;&lt;br /&gt;&lt;br /&gt;I'll just digress quickly to cover what I mean by a &lt;b&gt;&lt;i&gt;disproportionate feedback loop&lt;/i&gt;&lt;/b&gt;: A disproportionate feedback loop is something that encourages a certain behaviour; the input to which is something small and inexpensive, in either time or effort, but the output can be large and very rewarding. This pattern can be seen in very many interactions: playing the lottery, [good] video game controls, twitter and facebook, musical instruments, the 'who wants to be a millionaire' format, mashups, posting to a blog ('free' comments, auto rss updating, a google-able webpage for each post) etc.&lt;br /&gt;&lt;br /&gt;The &lt;i&gt;&lt;b&gt;natural drive for highscores&lt;/b&gt;&lt;/i&gt; is also worth pointing out. At first glance, is it as simple as considering its use in videogames? How about the concept of getting your '5 fruit and veg a day'? &lt;a rel="nofollow" target="_blank" href="http://www.5aday.nhs.uk/topTips/default.html"&gt;http://www.5aday.nhs.uk/topTips/default.html&lt;/a&gt; Running in a marathon against other people? Inbox Zero (&lt;a rel="nofollow" target="_blank" href="http://www.slideshare.net/merlinmann/inbox-zero-actionbased-email"&gt;http://www.slideshare.net/merlinmann/inbox-zero-actionbased-email&lt;/a&gt;)? Learning to play different musical scores? Your work being rated highly online? An innovation of yours being commented on by 5 different people in quick succession? 
Highscores can be very good drivers for human behaviour, addictive to some personalities.&lt;br /&gt;&lt;br /&gt;Why not set up some software highscores? For example, in the world of repositories, how about 'Fastest UI for self-submission' - encouraging automatic metadata/datamining, a monthly prize for 'Most issue tickets handled' - to the satisfaction of those posting the tickets, and so on.&lt;br /&gt;&lt;br /&gt;It is very easy to over-metricise this - some will purposefully abstain from this and some metrics are truly misleading. In the 90s, there was a push to have lines of code added as a metric of productivity. The false assumption is that lines of code have anything to do with productivity - code should be lean, but not too lean to maintain.&lt;br /&gt;&lt;br /&gt;So be very careful when adding means to record highscores - they should be flexible, and be fun - if they are no fun for the developers and/or the users, they become a pointless metric, more of an obstacle than a motivation.&lt;br /&gt;&lt;br /&gt;&lt;big&gt;&lt;big&gt;&lt;b&gt;The Dev8D event&lt;/b&gt;&lt;/big&gt;&lt;/big&gt;&lt;br /&gt;&lt;br /&gt;People were free to roam and interact at the Dev8D event and there was no enforced schedule, but twitter and a loudhailer were used to make people aware of things that were going on. Talks and discussions were lined up prior to the event of course, but the event was organised on a wiki which all were free to edit. As experience has told us, the important and sometimes inspired ideas occur in relaxed and informal surroundings where people just talk and share information, such as in a typical social situation like having food and drink.&lt;br /&gt;&lt;br /&gt;As a specific example, look &lt;a rel="nofollow" target="_blank" href="http://oxfordrepo.blogspot.com/2009/02/tracking-conferences-at-dev8d-with.html"&gt;at the role of twitter at the event&lt;/a&gt;. 
Sam Easterby-Smith (&lt;a rel="nofollow" target="_blank" href="http://twitter.com/samscam"&gt;http://twitter.com/samscam&lt;/a&gt;) created a means to track 'developer happiness' and shared the tracking '&lt;a rel="nofollow" target="_blank" href="http://samscam.co.uk/happier/"&gt;Happyness-o-meter&lt;/a&gt;' site with us all. This unplanned development inspired me to relay the information back to twitter and similarly led to me running an operating system/hardware survey in a very similar fashion.&lt;br /&gt;&lt;br /&gt;To help break the ice and to encourage play, we instituted a number of ideas:&lt;br /&gt;&lt;br /&gt;A &lt;b&gt;wordcloud on each attendee's badge&lt;/b&gt;, consisting of whatever we could find of their work online, be it their blog or similar so that it might provide a talking point, or allow people to spot people who write about things they might be interested in learning more about.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The poker chip game&lt;/b&gt; - each attendee was given 5 poker chips at the start of the event, and attendees were encouraged to trade chips for help, advice or as a way to convey a thank you. The goal was that the top 4 people ranked by amounts of chips at the end of the third day would receive a Dell mini 9 computer. The balance to this was that each chip was also worth a drink at the bar on that day too.&lt;br /&gt;&lt;br /&gt;We were well aware that we'd left a lot of play in this particular system, allowing for lotteries to be set up, people pooling their chips, and so on. As the sole purpose of this was to encourage people to interact, to talk and bargain with each other, and to provide that feedback loop I mentioned earlier, it wasn't too important how people got the chips as long as it wasn't underhanded. It was the interaction and the 'fun' that we were after. 
Just as an aside, Dave Flanders deserves the credit for this particular scheme.&lt;br /&gt;&lt;br /&gt;&lt;big&gt;&lt;big&gt;&lt;b&gt;Developer Decathlon&lt;/b&gt;&lt;/big&gt;&lt;/big&gt;&lt;br /&gt;&lt;br /&gt;The basic concept of the &lt;a rel="nofollow" target="_blank" href="http://code.google.com/p/developerhappinessdays/wiki/DeveloperDecathlon"&gt;Developer Decathlon&lt;/a&gt; was also reusing these ideas of play and feedback: "&lt;a rel="nofollow" name="What_is_the_Developer_Decathlon?"&gt;The Developer Decathlon is a competition at dev8D that enables developers to come together face-to-face to do rapid prototyping of software ideas. [..] &lt;/a&gt; We help facilitate this at dev8D by providing both 'real users' and 'expert advice' on how to run these rapid prototyping sprints. [..] The 'Decathlon' part of the competition represents the '10 users' who will be available on the day to present the biggest issues they have with the apps they use and in turn to help answer developer questions as the prototypes applications are being created.  The developers will have two days to work with the users in creating their prototype applications."&lt;br /&gt;&lt;br /&gt;The best two submissions will get cash prizes that go to the individual, not to the company or institution that they are affiliated with. The outcomes will be made public shortly, once the judging panel have done their work.&lt;br /&gt;&lt;br /&gt;&lt;big&gt;&lt;big&gt;Summary&lt;/big&gt;&lt;/big&gt;&lt;br /&gt;&lt;br /&gt;To foster innovation and to allow for creativity in software development:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Having &lt;b&gt;play&lt;/b&gt; space is &lt;b&gt;important&lt;/b&gt;&lt;/li&gt;&lt;li&gt;Being &lt;b&gt;vague&lt;/b&gt; with aims and &lt;b&gt;flexible&lt;/b&gt; with outcomes is not a bad thing and is &lt;b&gt;vital&lt;/b&gt; for unexpected things to develop - &lt;i&gt;e.g. 
A project's outcomes should be under continual re-negotiation as a general rule, not as the exception.&lt;/i&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Encouraging&lt;/b&gt; and &lt;b&gt;enabling&lt;/b&gt; free and easy communication is &lt;b&gt;crucial&lt;/b&gt;.&lt;/li&gt;&lt;li&gt;Be aware of what drives people to do what they do. Push all feedback to be &lt;b&gt;as disproportionate as possible&lt;/b&gt;, allowing both developers and users to benefit while putting in only a relatively trivial amount of input (this pattern affects web UIs, development cycles, team interaction, etc).&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Choose useful highscores&lt;/b&gt; and be prepared to ditch them or change them if they are no longer &lt;b&gt;fun and motivational&lt;/b&gt;.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;</description>
         <author>Ben O'Steen</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3090914606822911489.post-7889113205346661214</guid>
         <pubDate>Wed, 25 Feb 2009 06:09:00 +0000</pubDate>
      </item>
      <item>
         <title>Handling Tabular data</title>
         <link>http://oxfordrepo.blogspot.com/2009/02/handling-tabular-data.html</link>
         <description>&lt;div&gt;&lt;b&gt;"Storage"&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;I put the s-word in quotes because the storing of the item is actually a very straightforward process - we have been dealing with storing tabular data for computation for a very long time now. Unfortunately, this also means that there are very many ways to capture, edit and present tables of information.&lt;br/&gt;&lt;br/&gt;One realisation to make with regard to preserving access to data coming from research is that there is a huge backlog of data in formats that we shall kindly call 'legacy'. Not only is there this issue, but data is being made with tools and systems that effectively 'trap' or lock-in a lot of this information - case in point being any research being recorded using Microsoft Access. While the tables of data can often be extracted with some effort, it is normally difficult to impossible to extract the implicit information; how tables interlink, how the Access Form adds information to the dataset, etc.&lt;br/&gt;&lt;br/&gt;It is this implicit knowledge that is the elephant in the room. Very many serialisations, such as SQL 'dumps', csv, xls and so on, rely on implicit knowledge that is either related to the particulars of the application used to open it, or is actually highly domain specific.&lt;br/&gt;&lt;br/&gt;So, it is trivial and easy to specify a model for storing data, but without also encoding the implied information and without making allowances for the myriad of sources, the model is useless; it would be akin to defining the colour of storage boxes holding bric-a-brac. The datasets need to be characterised, and the implied information recorded in as good a way as possible.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Characterisation&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;The first step is to characterise the dataset that has been marked for archival and reuse. 
(Strictly, the best first step is to consult with the researcher or research team and help and guide them so that as much of the unsaid knowledge as possible is known by all parties.)&lt;br/&gt;&lt;br/&gt;Some serialisations do a good job of this themselves: *SQL-based serialisations include basic data type information inside the table declarations. As a pragmatic measure, it seems sensible to accept SQL-style table descriptions as a reasonable beginning. Later, we'll consider the implicit information that also needs to be recorded alongside such a declaration.&lt;br/&gt;&lt;br/&gt;Some others, such as CSV, leave it up to the parsing agent to guess at the type of information included. In these cases, it is important to find out or even deduce the type of data held in each column. Again, this data can be serialised in a SQL table declaration held alongside the original &lt;i&gt;unmodified&lt;/i&gt; dataset.&lt;br/&gt;&lt;br/&gt;(It is assumed that a basic data review will be carried out: does the csv have a consistent number of columns per row, is the version and operating system known for the MySQL that held the data, is there a PI or responsible party for the data, etc.)&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Implicit information&lt;br/&gt;&lt;/b&gt;&lt;br/&gt;Good teachers are right to point out this simple truth: "don't forget to write down the obvious!" It may seem obvious that all your data is latin-1 encoded, or that you are using a FAT32 filesystem, or even that you are running in a 32-bit environment, but the painful truth is that we can't guarantee that these aspects won't affect how the data is held, accessed or stored. 
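&lt;br/&gt;&lt;br/&gt;Some of the obvious can be written down automatically. As a minimal sketch (not a complete audit), the Python below gathers a few environment facts using only the standard library and prints them as JSON, so that they can be stored next to the dataset. The function name and the dictionary keys are illustrative assumptions, and details such as the filesystem type (FAT32 or otherwise) are not covered here and would still need recording by hand.&lt;br/&gt;&lt;br/&gt;
&lt;pre&gt;
```python
import json
import locale
import platform
import sys


def describe_environment():
    """Collect a few 'obvious' facts about the machine holding the data.

    The key names are illustrative, not part of any standard.
    """
    return {
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        # e.g. '32bit' or '64bit' for the running interpreter
        "word_size": platform.architecture()[0],
        "python_version": platform.python_version(),
        "default_text_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "byte_order": sys.byteorder,
    }


if __name__ == "__main__":
    # Print the record as JSON so it can be saved alongside the dataset.
    print(json.dumps(describe_environment(), indent=2, sort_keys=True))
```
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;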
There may be systematic issues that we are not aware of, such as the problems with early versions of ZFS causing [, at the time, detected] data corruption, or MySQL truncating fields when serialised in a way that is not anticipated or discovered until later.&lt;br/&gt;&lt;br/&gt;In characterising the legacy sets of data, it is important to realise that there will be loss, especially with the formats and applications that blend presentation with storage. For example, it will require a major effort to attempt to recover the forms and logic bound into the various versions of MS Access. I am even aware of a major dataset, a highly researched dictionary of Old English words and phrases, the final output of which is a Macromedia Authorware application, and the source files are held by an unknown party (that is if they still exist at all) - the joy of hiring contractors. In fact, this warrants a slight digression:&lt;br/&gt;&lt;br/&gt;&lt;b&gt;The gap in IT support for research&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;If an academic researcher wishes to gain an external email account at their institution, there is an established protocol for this. Email is so commonplace, it sounds an easy thing to provide, but you need expertise, server hardware, multiuser configuration, adoption of certain access standards (IMAP, POP3, etc), and generally there are very few types of email (text or text with MIME attachments - NB the IM in MIME stands for Internet Mail).&lt;br/&gt;&lt;br/&gt;If a researcher has a need to store tables of data, where do they turn? They should turn to the same department, who will handle the heavy lifting of guiding standards, recording the implicit information and providing standard access APIs to the data. What the IT departments seem to be doing currently is - to carry on the metaphor - handing the researcher the email server software and telling them to get on with it, to configure it as they want. 
No wonder the resulting legacy systems are as free-form as they are.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Practical measures - Curation&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;Back to specifics now: consider that a set of data has been found to be important, research has been based on it, and it's been recognised that this dataset needs to be looked after. [This will illustrate the technical measures. Licensing, dialogue with the data owners, and other non-technical analysis and administration is left out, but assumed.]&lt;br/&gt;&lt;br/&gt;The first task is to store the incoming data, byte-for-byte, as much as is possible - storing the iso image of the media the data is stored on, storing the SQL dump of a database, etc.&lt;br/&gt;&lt;br/&gt;Analyse the tables of data - record the base types of each column (text, binary, float, decimal, etc) aping the syntax of a SQL table declaration, as well as trying to identify the key columns.&lt;br/&gt;&lt;br/&gt;Record the inter-table joins between primary and secondary keys, possibly by using a &lt;i&gt;"table.column SAMEAS table.column;"&lt;/i&gt; declaration after the table declarations.&lt;br/&gt;&lt;br/&gt;Likewise, attempt to add information concerning each column, information such as units or any other identifying material.&lt;br/&gt;&lt;br/&gt;Store this table description alongside the recorded tabular data source.&lt;br/&gt;&lt;br/&gt;Form a representation of this data in a well-known, current format such as a MySQL dump. For spreadsheets that are 'frozen', cells that are the results of embedded formulae should be calculated and added as fixed values. 
It is important to record the environment, library and platform that these calculations are made with.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Table description as RDF &lt;/b&gt;(strictly, referencing cols/rows via the URI)&lt;br/&gt;&lt;br/&gt;One syntax I am playing around with is the notion that by appending sensible suffixes to the base URI for a dataset, we can uniquely specify a row, a column, a region or even a single cell. Simply put:&lt;br/&gt;&lt;br/&gt;http://datasethost/datasets/{data-id}#table/{table-name}/column/{column-id} to reference a whole column&lt;br/&gt;http://datasethost/datasets/{data-id}#table/{table-name}/row/{row-id} to reference a whole row, etc&lt;br/&gt;&lt;br/&gt;[The use of the # in the position it is in will no doubt cause debate. Suffice it to say, this is a pragmatic measure, as I suspect that an intermediary layer will have to take care of dereferencing a GET on these forms in any case.]&lt;br/&gt;&lt;br/&gt;The purpose of this is so that the tabular description can be made using common and established namespaces to describe and characterise the tables of data. 
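&lt;br/&gt;&lt;br/&gt;As a rough illustration of the URI pattern above, the following Python sketch builds column and row references using only the standard library. The function names are hypothetical, &lt;i&gt;datasethost&lt;/i&gt; is the same placeholder host as above, and the row form assumes that the final segment is a row identifier. Identifiers are percent-encoded so that spaces or slashes in a table name cannot change the shape of the reference.&lt;br/&gt;&lt;br/&gt;
&lt;pre&gt;
```python
from urllib.parse import quote

# Placeholder host, as in the URI patterns above.
BASE = "http://datasethost/datasets"


def _segment(value):
    """Percent-encode one segment, including any '/' it contains."""
    return quote(str(value), safe="")


def _table_prefix(data_id, table_name):
    """Shared prefix: dataset URI plus the '#table/...' fragment start."""
    return "%s/%s#table/%s" % (BASE, _segment(data_id), _segment(table_name))


def column_uri(data_id, table_name, column_id):
    """Reference a whole column of a table (hypothetical helper)."""
    return "%s/column/%s" % (_table_prefix(data_id, table_name), _segment(column_id))


def row_uri(data_id, table_name, row_id):
    """Reference a whole row of a table (assumes a row identifier)."""
    return "%s/row/%s" % (_table_prefix(data_id, table_name), _segment(row_id))


print(column_uri("ds1", "results", "temperature"))
print(row_uri("ds1", "field notes", 42))
```
&lt;/pre&gt;
&lt;br/&gt;&lt;br/&gt;Because everything after the # is a URI fragment, an HTTP client does not send it to the server, which is one reason an intermediary layer would be needed to dereference these forms.&lt;br/&gt;&lt;br/&gt;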
Following on from a previous post on extending the BagIt protocol with an RDF manifest, this information can be included in said manifest, alongside the more expected metadata without disrupting or altering how this is handled.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;A possible content type for tabular data&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;By considering the base Fedora repository object model, or the BagIt model, we can apply the above to form a content model for a dataset:&lt;br/&gt;&lt;br/&gt;As a Fedora Object:&lt;br/&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;Original data in whatever forms or formats it arrives in (dsid prefix convention: DATA*)&lt;/li&gt;&lt;li&gt;Binary/textual serialisation in a well-understood format (dsid prefix convention: DERIV*)&lt;/li&gt;&lt;li&gt;'Manifest' of the contents (dsid convention: RELS-INT)&lt;/li&gt;&lt;li&gt;Connections between this dataset and other objects, like articles, etc as well as the RDF description of this item (RELS-EXT)&lt;/li&gt;&lt;li&gt;Basic description of dataset for interoperability (Simple Dublin Core - DC)&lt;/li&gt;&lt;/ul&gt;&lt;br/&gt;As a BagIt+RDF:&lt;br/&gt;&lt;br/&gt;Zip archive - &lt;br/&gt;&lt;ul&gt;&lt;li&gt;/MANIFEST (list of files and checksums)&lt;/li&gt;&lt;li&gt;/RDFMANIFEST (RELS-INT and RELS-EXT from above)&lt;/li&gt;&lt;li&gt;/data/* (original dataset files/disk images/etc)&lt;/li&gt;&lt;li&gt;/derived/* (normalised/re-rendered datasets in a well known format)&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;Presentation - the important part&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;What is described above is the archival of the data. This is a form suited for discovery, but is not in a form suited for reuse. So, what are the possibilities?&lt;br/&gt;&lt;br/&gt;BigTable (Google) or HBase (Hadoop) provides a platform where tabular data can be stored in a scalable manner. In fact, I would go on to suggest that HBase should be a basic service offered by the IT department of any institution. 
By providing this database as a service, it should be easier to normalise, and to educate the academic users in a manner that is useful to them, not just to the archivist. Google spreadsheet is an extremely good example of how such a large, scalable database might be presented to the end-user.&lt;br/&gt;&lt;br/&gt;For archival sets with a good (RDF) description of the table, it should be possible to instantiate working versions of the tabular data on a scalable database platform like HBase on demand. Having a policy to put to 'sleep' unused datasets can provide a useful compromise, avoiding having all the tables live but still providing a useful service. &lt;br/&gt;&lt;br/&gt;It should also be noted that the adoption of popular methods of data access should be part of the responsibility of the data providers - this will change as time goes on, and protocols and methods for access alter with fashion. Currently, Atom/RSS feeds of any part of a table of data (the google spreadsheet model) fit very well with the landscape of applications that can reuse this information.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Summary&lt;/b&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;Try to record as much information as can be found or derived - from host operating system to column types.&lt;/li&gt;&lt;li&gt;Keep the original dataset byte-for-byte as you received it.&lt;/li&gt;&lt;li&gt;Try to maintain a version of the data in a well-understood format&lt;/li&gt;&lt;li&gt;Describe the tables of information in a reusable way, preferably by adopting a machine-readable mechanism&lt;/li&gt;&lt;li&gt;Be prepared to create services that the users want and need, not services that you think they should have.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;</description>
         <author>Ben O'Steen</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3090914606822911489.post-7013951292415915268</guid>
         <pubDate>Sun, 22 Feb 2009 18:28:00 +0000</pubDate>
      </item>
      <item>
         <title>Tracking conferences (at Dev8D) with python, twitter and tags</title>
         <link>http://oxfordrepo.blogspot.com/2009/02/tracking-conferences-at-dev8d-with.html</link>
         <description>There was so much going on at http://www.dev8d.org (#dev8d) that it might be foolish for me to attempt to write up what happened.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;So, I'll focus on a small, but to my mind, crucial aspect of it - tag tracking with a focus on &lt;a rel="nofollow" target="_blank" href="http://twitter.com/"&gt;Twitter&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;big&gt;&lt;big&gt;The Importance of Tags&lt;/big&gt;&lt;/big&gt;&lt;br /&gt;&lt;br /&gt;First, the tag (#)dev8d was cloudburst over a number of social sites - &lt;a rel="nofollow" target="_blank" href="http://www.flickr.com/photos/tags/dev8d/"&gt;Flickr&lt;/a&gt;(dev8d tagged photos), &lt;a rel="nofollow" target="_blank" href="http://search.twitter.com/search?q=%23dev8d"&gt;Twitter&lt;/a&gt;(dev8d feed), blogs such as the &lt;a rel="nofollow" target="_blank" href="http://dev8d.jiscinvolve.org/"&gt;JISCInvolve Dev8D site&lt;/a&gt;, and so on. This was not just done for publicity, but as a means to track and re-assemble the various inputs to and outputs from the event.&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://www.flickr.com/photos/tags/dev8d/"&gt;Flickr&lt;/a&gt; has some really nice photos on it, shared by people like &lt;a rel="nofollow" target="_blank" href="http://www.flickr.com/photos/ianibbo/"&gt;Ian Ibbotson&lt;/a&gt; (who caught &lt;a rel="nofollow" target="_blank" href="http://www.flickr.com/photos/ianibbo/3275945388/"&gt;an urban fox&lt;/a&gt; on camera during the event!) While there was an 'official' &lt;a rel="nofollow" target="_blank" href="http://www.flickr.com/photos/dev8d"&gt;dev8d flickr user&lt;/a&gt;, I expect the most unexpected and most interesting photos to be shared by other people who kindly add on the dev8d tag so we can find them. For conference organisers, this means that there is a pool of images that we can choose from, each with their own provenance so we can contact the owner if we wanted to re-use, or re-publish. 
Of course, if the owner puts a &lt;a rel="nofollow" target="_blank" href="http://creativecommons.org/"&gt;CC licence&lt;/a&gt; on them, it makes things easier :)&lt;br /&gt;&lt;br /&gt;So, asserting a tag or label for an event is a useful thing to do in any case. But, this twinned with using a messaging system like &lt;a rel="nofollow" target="_blank" href="http://twitter.com/"&gt;Twitter&lt;/a&gt; or &lt;a rel="nofollow" target="_blank" href="http://identi.ca/"&gt;Identi.ca&lt;/a&gt;, means that you can coordinate, share, and bring together an event. There was a projector in the Basecamp room, which was either the bar, or one of the large basement rooms at Birkbeck depending on the day. Initially, this was used to run through the basic flow of events, which was primarily organised through the use of a &lt;a rel="nofollow" target="_blank" href="http://code.google.com/p/developerhappinessdays/"&gt;wiki&lt;/a&gt;, of which all of us and the attendees were members.&lt;br /&gt;&lt;br /&gt;&lt;big&gt;&lt;big&gt;Projecting the bird's eye view of the event&lt;/big&gt;&lt;/big&gt;&lt;br /&gt;&lt;br /&gt;I am not entirely sure whose idea it was initially to use the projector to follow the dev8d tag on twitter, auto-refreshing itself every minute, but it would be one or more of the following: Dave Flanders (&lt;a rel="nofollow" target="_blank" href="http://twitter.com/dfflanders"&gt;@dfflanders&lt;/a&gt;), Andy McGregor (&lt;a rel="nofollow" target="_blank" href="http://twitter.com/andymcg"&gt;@andymcg&lt;/a&gt;) and Dave Tarrant (&lt;a rel="nofollow" target="_blank" href="http://twitter.com/davetaz"&gt;@davetaz&lt;/a&gt;), aka BitTarrant due to his network wizardry in keeping the wifi going despite the best efforts of Birkbeck's network to stop any form of useful networking.&lt;br /&gt;&lt;br /&gt;The funny thing about the feed being there was that it felt perfectly natural from the start. 
It was almost like a mix of notice board, event liveblog and facebook status updates, but the overall effect was that of a &lt;i&gt;bird's eye view&lt;/i&gt; of the entire event, which you could dip into and out of at will, follow up on talks you weren't even attending, catch interesting links that people posted, and just follow the whole event while doing your own thing.&lt;br /&gt;&lt;br /&gt;&lt;big&gt;&lt;big&gt;Then things got interesting.&lt;/big&gt;&lt;/big&gt;&lt;br /&gt;&lt;br /&gt;From what I heard, a conversation in the bar about developer happiness (involving &lt;a rel="nofollow" target="_blank" href="http://twitter.com/rgardler"&gt;@rgardler&lt;/a&gt;?) led &lt;a rel="nofollow" target="_blank" href="http://samscam.co.uk/"&gt;Sam Easterby-Smith&lt;/a&gt; (&lt;a rel="nofollow" target="_blank" href="http://twitter.com/samscam"&gt;@samscam&lt;/a&gt;) to create a script that dug through the dev8d tweets looking for &lt;i&gt;n/m&lt;/i&gt; (like 7/10) and used that as a mark of happyness, e.g.&lt;br /&gt;&lt;blockquote&gt;&lt;a rel="nofollow" target="_blank" href="http://twitter.com/samscam"&gt;" @samscam&lt;/a&gt; #dev8d I am seriously 9/10 happy &lt;a rel="nofollow" target="_blank" href="http://samscam.co.uk/happier"&gt;http://samscam.co.uk/happier&lt;/a&gt; HOW HAPPY ARE YOU? " &lt;a rel="nofollow" target="_blank" href="http://twitter.com/samscam/status/1197185415"&gt; (Tue, 10 Feb 2009 11:17:15)&lt;/a&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;img src="http://lh4.ggpht.com/_KLlGSypGAvw/SZvxf5lx4rI/AAAAAAAAAEA/C9twrbS5xgE/%5BUNSET%5D.png?imgmax=800" style="max-width:800px;"/&gt;&lt;br /&gt;&lt;br /&gt;The script then computed the average happyness and overall happyness of those who tweeted how they were doing!&lt;br /&gt;&lt;br /&gt;Of course, being friendly, constructive sorts, we knew the best way to help 'improve' his happyometer was to try to break it by sending it bad input... 
*ahem*.&lt;br /&gt;&lt;blockquote&gt;&lt;a rel="nofollow" target="_blank" href="http://twitter.com/samscam"&gt;" @samscam&lt;/a&gt; #dev8d based on instant discovery of bugs in the Happier Pipe am now only 3/5 happy " (&lt;a rel="nofollow" target="_blank" href="http://twitter.com/samscam/statuses/1197215138"&gt;Tue, 10 Feb 2009 23:05:05&lt;/a&gt;)&lt;br /&gt;&lt;/blockquote&gt;BUT things got fixed, and the community got involved and interested. It caused talk and debate, and got people wondering how it was done, how they could do the same thing and how to take it further.&lt;br /&gt;&lt;br /&gt;At which point, I thought it might be fun to 'retweet' the happyness ratings as they changed, to keep a running track of things. And so, a purpose for &lt;a rel="nofollow" target="_blank" href="http://twitter.com/randomdev8d"&gt;@randomdev8d&lt;/a&gt; was born:&lt;br /&gt;&lt;br /&gt;&lt;img src="http://lh4.ggpht.com/_KLlGSypGAvw/SZvxqf_Xz5I/AAAAAAAAAEE/Gr_rAh0ojPs/%5BUNSET%5D.png?imgmax=800" style="max-width:800px;"/&gt;&lt;br /&gt;&lt;br /&gt;How I did this was fairly simple: I grabbed his page every minute or so, used BeautifulSoup to parse the HTML, got the happyness numbers out and compared them to the last ones the script had seen. 
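If there was a change, it tweeted it and seconds later, the projected tweet feed updated to show the new values - a disproportionate feedback loop, the key to involvement in games; you do something small like press a button or add 4/10 to a message, and you can affect the stock-market ticker of happyness :)&lt;br /&gt;&lt;br /&gt;If I had been able to give my talk on the python code day, the code to do this would contain zero surprises, because I covered 99% of this - so here's my &lt;a rel="nofollow" target="_blank" href="http://developerhappinessdays.googlecode.com/files/dev8d-presentation.pdf"&gt;'slides'&lt;/a&gt;[pdf] - basically a snapshot screencast.&lt;br /&gt;&lt;br /&gt;Here's the crufty code though that did this:&lt;br /&gt;&lt;blockquote&gt;import time&lt;br /&gt;import simplejson, httplib2, BeautifulSoup&lt;br /&gt;h = httplib2.Http()&lt;br /&gt;h.add_credentials('randomdev8d','PASSWORD')&lt;br /&gt;happy = httplib2.Http()&lt;br /&gt;# last overall and average values seen&lt;br /&gt;o = 130.9&lt;br /&gt;a = 7.7&lt;br /&gt;import urllib&lt;br /&gt;&lt;br /&gt;while True:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print "Checking happiness...."&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# scrape the current figures from the happier page&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;(resp, content) = happy.request('http://samscam.co.uk/happier/')&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;soup = BeautifulSoup.BeautifulSoup(content)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;overallHappyness = soup.findAll('div')[2].contents&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;avergeHappyness = soup.findAll('div')[4].contents&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;over = float(overallHappyness[0])&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ave = float(avergeHappyness[0])&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print "Overall %s - Average %s" % (over, ave)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;omess = "DOWN"&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if over &amp;gt; o:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;omess = "UP!"&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;amess = "DOWN"&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if ave &amp;gt; a:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;amess= "UP!"&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if over == o:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;omess = "SAME"&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if ave == a:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;amess = "SAME"&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if not (o == over and a == ave):&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print "Change!"&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;o = over&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;a = ave&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# tweet the new figures as @randomdev8d&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tweet = "Overall happiness is now %s(%s), with an average=%s(%s) #dev8d (from http://is.gd/j99q)" % (overallHappyness[0], omess, avergeHappyness[0], amess)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;data = {'status':tweet}&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;body = urllib.urlencode(data)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;(rs,cont) = h.request('http://www.twitter.com/statuses/update.json', "POST", body=body)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;else:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print "No change"&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;time.sleep(120)&lt;br /&gt;&lt;/blockquote&gt;(Available from &lt;a rel="nofollow" target="_blank" href="http://pastebin.com/f3d42c348"&gt;http://pastebin.com/f3d42c348&lt;/a&gt; with syntax highlighting - NB this was written beat-poet style, written from A to B with little concern for form. The fact that it works is a miracle, so comment on the code if you must.)&lt;br /&gt;&lt;br /&gt;&lt;big&gt;&lt;big&gt;The grand, official #Dev8D survey!&lt;/big&gt;&lt;/big&gt;&lt;br /&gt;&lt;br /&gt;... which was anything but official, or grand. The happyness-o-meter idea led BitTarrant and me to think "Wouldn't it be cool to find out what computers people have brought here?" Essentially, finding out what computer environment developers &lt;i&gt;choose&lt;/i&gt; to use is a very valuable thing - developers choose things which make our lives easier, by and large, so finding out which setups they use by preference to develop or work with could guide later choices, such as being able to actually target the majority of environments for wifi, software, or talks.&lt;br /&gt;&lt;br /&gt;So, on the Wednesday morning, Dave put out the call on @dev8d for people to post the operating systems on the hardware they brought to this event, in the form of OS/HW. I then busied myself with writing a script that hit the twitter search api directly, and parsed the results itself. As this was a more deliberate script, I made sure that it kept track of things properly, pickling its per-person tallies. (You could post up multiple configurations in one or more tweets, and it kept track of them per-person.) 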
This script was a little bloated at 86 lines, so I won't post it inline - plus, it also showed that I should've gone to the regexp lesson, as I got stuck trying to do it with regexp, gave up, and then used whitespace-tokenising... but it worked fine ;)&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://pastebin.com/f2c04719b"&gt;Survey code: http://pastebin.com/f2c04719b&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Survey results:&lt;/span&gt; &lt;a rel="nofollow" target="_blank" href="http://spreadsheets.google.com/pub?key=pDKcyrBE6SJqToHzjCs2jaQ"&gt;http://spreadsheets.google.com/pub?key=pDKcyrBE6SJqToHzjCs2jaQ&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;OS:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Linux was the majority at 42%&lt;/span&gt; closely followed by Apple at 37% with MS-based OS at 18% with a stellar showing of one user of OpenSolaris (4%)!&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Hardware type:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;66% were laptops, with 25% of the machines there being classed as netbooks&lt;/span&gt;. 8% of the hardware there were iPhones too, and one person claimed to have brought Amazon EC2 with them ;)&lt;br /&gt;&lt;br /&gt;&lt;big&gt;&lt;big&gt;The post hoc analysis&lt;/big&gt;&lt;/big&gt;&lt;br /&gt;&lt;br /&gt;Now then, having gotten back to normal life, I've spent a little time grabbing stuff from twitter and digging through them. 
Here is the &lt;a rel="nofollow" target="_blank" href="http://spreadsheets.google.com/pub?key=pDKcyrBE6SJoJdIcm7mdpBg"&gt;list of the 1300+ tweets with the #dev8d tag in them&lt;/a&gt; published via google docs, and here are some derived things posted by Tony Hirst (&lt;a rel="nofollow" target="_blank" href="http://twitter.com/psychemedia"&gt;@psychemedia&lt;/a&gt;) and Chris Wilper (&lt;a rel="nofollow" target="_blank" href="http://twitter.com/cwilper"&gt;@cwilper&lt;/a&gt;) seconds after I posted this:&lt;br /&gt;&lt;br /&gt;Tagcloud of twitterers:&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://www.wordle.net/gallery/wrdl/549364/dev8_twitterers"&gt;http://www.wordle.net/gallery/wrdl/549364/dev8_twitterers&lt;/a&gt; [java needed]&lt;br /&gt;&lt;br /&gt;Tagcloud of tweeted words:&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://www.wordle.net/gallery/wrdl/549350/dev8d"&gt;http://www.wordle.net/gallery/wrdl/549350/dev8d&lt;/a&gt; [java needed]&lt;br /&gt;&lt;br /&gt;And a column of all the tweeted links:&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://spreadsheets.google.com/pub?key=p1rHUqg4g423-wWQn8arcTg"&gt;http://spreadsheets.google.com/pub?key=p1rHUqg4g423-wWQn8arcTg&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This led me to dig through them and republish the list of tweets, this time trying to unminimise the URLs and grab the &amp;lt;title&amp;gt; tag of the HTML page each one goes to, which you can find here:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://spreadsheets.google.com/pub?key=pDKcyrBE6SJpwVmV4_4qOdg"&gt;http://spreadsheets.google.com/pub?key=pDKcyrBE6SJpwVmV4_4qOdg&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;(Which, incidentally, led me to spot that there was one link to "YouTube - Rick Astley - Never Gonna Give You Up" which means the hacking was all worthwhile :))&lt;br /&gt;&lt;br /&gt;&lt;big&gt;&lt;big&gt;Graphing Happyness&lt;/big&gt;&lt;/big&gt;&lt;br /&gt;&lt;br /&gt;For one, I've re-analysed the 
happyness tweets and posted up the following:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;A &lt;a rel="nofollow" target="_blank" href="http://spreadsheets.google.com/pub?key=pDKcyrBE6SJqHVP8Fb7euEA"&gt;full log of happyness with timeline attached to it&lt;/a&gt;,&lt;/li&gt;&lt;li&gt;&lt;a rel="nofollow" target="_blank" href="http://spreadsheets.google.com/pub?key=pDKcyrBE6SJoxj8D7_EWscQ"&gt;The running average, with accompanying timeline,&lt;/a&gt;&lt;/li&gt;&lt;li&gt;and the &lt;a rel="nofollow" target="_blank" href="http://spreadsheets.google.com/pub?key=pDKcyrBE6SJp6acAAn77SZQ"&gt;average of the last 10 tweets&lt;/a&gt; in much the same way as before.&lt;/li&gt;&lt;/ul&gt;It is easier to understand the averages as graphs over time of course! You could also use Tony Hirst's excellent write up here about &lt;a rel="nofollow" target="_blank" href="http://ouseful.wordpress.com/2009/02/17/creating-your-own-results-charts-for-surveys-created-with-google-forms/"&gt;creating graphs from google forms and spreadsheets.&lt;/a&gt; I'm having issues embedding the google timeline widget here, so you'll have to make do with static graphs.&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://lh6.ggpht.com/_KLlGSypGAvw/SZr0KFxfnRI/AAAAAAAAADk/AJQI307X1As/s800/dev8d_running_total_average.png"&gt;&lt;img style="margin:0px auto 10px;display:block;text-align:center;cursor:pointer;width:800px;height:468px;" src="http://lh6.ggpht.com/_KLlGSypGAvw/SZr0KFxfnRI/AAAAAAAAADk/AJQI307X1As/s800/dev8d_running_total_average.png" alt="" border="0"/&gt;&lt;/a&gt;&lt;span style="font-weight:bold;"&gt;Average happyness over the course of the event - all tweets counted towards the average.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://lh4.ggpht.com/_KLlGSypGAvw/SZr0KnArJ0I/AAAAAAAAADs/N5OdzUBDefQ/s912/dev8d_last10HappynessTweetsCount.png"&gt;&lt;img style="margin:0px auto 10px;display:block;text-align:center;cursor:pointer;width:912px;height:510px;" 
src="http://lh4.ggpht.com/_KLlGSypGAvw/SZr0KnArJ0I/AAAAAAAAADs/N5OdzUBDefQ/s912/dev8d_last10HappynessTweetsCount.png" alt="" border="0"/&gt;&lt;/a&gt;&lt;span style="font-weight:bold;"&gt;Average happyness, but only the previous 10 tweets counted towards the average, making it more reflective of the happyness at that time.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you are wondering about the first dip, that was when we all tried to break Sam's tracker by sending it bad data, so a lot of 0 happyness ratings were recorded :) As for the second dip, well, you can see that for yourselves from the &lt;a rel="nofollow" target="_blank" href="http://spreadsheets.google.com/pub?key=pDKcyrBE6SJqHVP8Fb7euEA"&gt;log of happyness&lt;/a&gt; :)&lt;br /&gt;&lt;/div&gt;</description>
         <author>Ben O'Steen</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3090914606822911489.post-3073595490420683868</guid>
         <pubDate>Wed, 18 Feb 2009 04:18:00 +0000</pubDate>
         <media:thumbnail height="72" url="http://lh4.ggpht.com/_KLlGSypGAvw/SZvxf5lx4rI/AAAAAAAAAEA/C9twrbS5xgE/s72-c/%5BUNSET%5D.png?imgmax=800" width="72" />
      </item>
      <item>
         <title>EPrints 3.2 - Amazon S3/Cloudfront Plug-in</title>
         <link>http://davetaz-blog.blogspot.com/2009/01/eprints-32-amazon-s3cloudfront-plug-in.html</link>
         <description>A quick post to say that we have just successfully tested an EPrints 3.2 (svn) install with the new Storage Controller plugged into Amazon S3! &lt;br /&gt;&lt;br /&gt;This has quite a lot of implications for both EPrints and other projects wanting to provide external services which operate on objects in a repository. We hope to bring people more news on this at the upcoming Open Repositories 2009 conference in Atlanta. &lt;br /&gt;&lt;br /&gt;For more information on all of this, check out the storage section on the Preserv2 website @ &lt;a rel="nofollow" target="_blank" href="http://www.preserv.org.uk"&gt;www.preserv.org.uk&lt;/a&gt;.</description>
         <author>Dave Tarrant</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-4926451824261299693.post-7166652851684473013</guid>
         <pubDate>Wed, 21 Jan 2009 05:45:00 +0000</pubDate>
      </item>
      <item>
         <title>Beginning with RDF triplestores - a 'survey'</title>
         <link>http://oxfordrepo.blogspot.com/2008/11/beginning-with-rdf-triplestores.html</link>
         <description>&lt;div&gt;Like last time, this was prompted by an email that eventually was passed to me. It was a call for opinion - "&lt;tt&gt;&lt;font color='#737373'&gt;we thought we'd check first to see what software&lt;/font&gt;&lt;/tt&gt;&lt;tt&gt;&lt;font color='#737373'&gt; either of you recommend or use for an RDF database.&lt;/font&gt;&lt;/tt&gt;"&lt;br/&gt;&lt;br/&gt;It's a good question.&lt;br/&gt;&lt;br/&gt;In fact, it's a really great question, as searching for similar advice online results in very few opinions on the subject.&lt;br/&gt;&lt;br/&gt;But which ones are the best for novices? Which have the best learning curves? Which has the easiest install or the shortest time between starting out and being able to query things?&lt;br/&gt;&lt;br/&gt;I'll try to pose as much as I can as a newcomer, which won't be too hard :) Some of the comments will be my own, and some will be comments from others, but I'll try to be as honest as I can be to reflect new user expectation and experience and, most importantly, developer attention span. (See the end for some of my reasons for this approach.)&lt;br/&gt;&lt;br/&gt;&lt;em&gt;(Puts on newbie hat and enables PEBKAC mode.)&lt;/em&gt;&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Installable (local) triplestores&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Sesame&lt;/b&gt; - &lt;a rel="nofollow" target="_blank" href='http://www.openrdf.org/'&gt;http://www.openrdf.org/&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;Simple menu on the left of the website, with one item called downloads. Great, I'll give that a whirl. "Download &lt;a rel="nofollow" target="_blank" href='http://sourceforge.net/project/showfiles.php?group_id=46509&amp;amp;package_id=168413'&gt;the latest Sesame 2.x release&lt;/a&gt;" looks good to me. Hmm 5 differently named files... I'll grab the 'onejar' file and try to run it. "Failed to load Main-Class manifest attribute from openrdf-sesame-2.2.1-onejar.jar", okay... 
so back to the site to find out how to install this thing.&lt;br/&gt;&lt;br/&gt;No links for installation guide... on the &lt;a rel="nofollow" target="_blank" href='http://www.openrdf.org/documentation.jsp'&gt;Documentation&lt;/a&gt; page, no link for installation instructions for the sesame 2.2.1 I downloaded, but there is &lt;a rel="nofollow" target="_blank" href='http://www.openrdf.org/doc/sesame2/users/'&gt;Sesame 2 user documentation&lt;/a&gt; and &lt;a rel="nofollow" target="_blank" href='http://www.openrdf.org/doc/sesame2/system/'&gt;Sesame 2 system documentation&lt;/a&gt;. Phew, after guessing that the user documentation might have the guide, I finally found the &lt;a rel="nofollow" target="_blank" href='http://www.openrdf.org/doc/sesame2/users/ch06.html'&gt;installation guide&lt;/a&gt;  (system documentation was about the architecture, not how to administer the system as you might expect.)&lt;br/&gt;&lt;br/&gt;(Developer losing interest...)&lt;br/&gt;&lt;br/&gt;Ah, I see, I need the SDK. I wonder what that 'onejar' was then... "The deployment process is container-specific, please consult the&lt;br/&gt;			documentation for your container on how to deploy a web application. " - right, okay... let's assume that I have a Java background and am not just a user wanting to hook into it from my language of choice, such as php, ruby, python, or dare I say it, javascript.&lt;br/&gt;&lt;br/&gt;(Only Java-friendly developers continue on)&lt;br/&gt;&lt;br/&gt;Right, got Tomcat, and put in the war file... right so, now I need to work out how to use a &lt;a rel="nofollow" target="_blank" href='http://www.openrdf.org/doc/sesame2/users/ch07.html#d0e354'&gt;commandline&lt;/a&gt; console tool to set up a 'repository'... does this use SVN or CVS then? Oh, it doesn't do anything unless I end the line with a period. I thought it had hung trying to connect!  "Triple indexes [spoc,posc]" Wha? Well, whatever that was, the test repository is created. 
Let's see what's at http://localhost:8080/openrdf-sesame then. &lt;br/&gt;&lt;br/&gt;"You are currently accessing an OpenRDF Sesame server. This server is&lt;br/&gt;intended to be accessed by dedicated clients, using a specialized&lt;br/&gt;protocol. To access the information on this server through a browser,&lt;br/&gt;we recommend using the OpenRDF Workbench software."&lt;br/&gt;&lt;br/&gt;Bugger. Google for "sesame clients" then.&lt;br/&gt;&lt;ul&gt;&lt;li&gt;There is a Java client it seems, but it seems to need a lot to get going. Oh, and useful if my application is in Java or in a JVM (jRuby, jython)&lt;br/&gt;&lt;/li&gt;&lt;li&gt;&lt;a rel="nofollow" target="_blank" href='http://jeenbroekstra.blogspot.com/2008/09/sesame-2-desktop-client.html'&gt;http://jeenbroekstra.blogspot.com/2008/09/sesame-2-desktop-client.html&lt;/a&gt; .Net GUI... not so useful for programmatic stuff&lt;/li&gt;&lt;li&gt;...&lt;/li&gt;&lt;/ul&gt;I've pretty much given up at this point. If I knew I needed to use a triplestore then I might have persisted, but if I was just investigating it? I would've probably given up earlier.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Mulgara&lt;/b&gt; - &lt;a rel="nofollow" target="_blank" href='http://www.mulgara.org/'&gt;http://www.mulgara.org/&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;Nice, they've given the frontpage some style, not too keen on orange, but the effort makes it look professional. "&lt;em&gt;Mulgara&lt;/em&gt; is a scalable RDF database written entirely in &lt;strong&gt;&lt;a rel="nofollow" class='styleBlack' target="_blank" href='http://java.com/'&gt;Java&lt;/a&gt;&lt;/strong&gt;." -&amp;gt; Great, I found what I am looking for, and it warns me it needs Java. "DOWNLOAD NOW" - that's pretty clear. *click*&lt;br/&gt;&lt;br/&gt;Hmm, where's the style gone? Lots of download options, but thankfully one is marked by "These released binaries are all that are required for most applications." 
so I'll grab &lt;a rel="nofollow" target="_blank" href='http://www.mulgara.org/files/v2.0.6/mulgara-2.0.6-bin.tar.gz'&gt;those&lt;/a&gt;. 25Mb? Wow...&lt;br/&gt;&lt;br/&gt;Okay, it's downloaded and unpacked now. Let's see what we've got - a 'dist/' directory and two jars. Well, I guess I should try to run one (wonder what the licence is, where's the README?)&lt;br/&gt;&lt;blockquote&gt;&lt;em&gt;Mulgara Semantic Store Version 2.0.6 (Build 2.0.6.local) INFO [main] (EmbeddedMulgaraServer.java:715) - RMI Registry started automatically on port 10990 [main] INFO org.mulgara.server.EmbeddedMulgaraServer  - RMI Registry started automatically on port 1099 INFO [main] (EmbeddedMulgaraServer.java:738) - java.security.policy set to jar:file:/home/ben/Desktop/apache-tomcat-6.0.18/mulgara-2.0.6/dist/mulgara-2.0.6.jar!/conf/mulgara-rmi.policy3 [main] INFO org.mulgara.server.EmbeddedMulgaraServer  - java.security.policy set to jar:file:/home/ben/Desktop/apache-tomcat-6.0.18/mulgara-2.0.6/dist/mulgara-2.0.6.jar!/conf/mulgara-rmi.policy2008-11-14 14:06:39,899 INFO  Database - Host name aliases for this server are: [billpardy, localhost, 127.0.0.1]&lt;/em&gt;&lt;br/&gt;&lt;/blockquote&gt;Well, I guess something has started... back to the site, there is a documentation page and a wiki. A quick view of the official documentation has just confused me, is &lt;a rel="nofollow" target="_blank" href='http://docs.mulgara.org/'&gt;this an external site&lt;/a&gt;? No easy link to something like 'getting started' or tutorials. I've heard of SPARQL, what's iTQL? 
nevermind, let's see if the &lt;a rel="nofollow" target="_blank" href='http://www.mulgara.org/trac/wiki'&gt;wiki&lt;/a&gt; is more helpful.&lt;br/&gt;&lt;br/&gt;Let's try '&lt;a rel="nofollow" target="_blank" href='http://www.mulgara.org/trac/wiki/Docs'&gt;Documentation&lt;/a&gt;' - sweet, first link looks like what I want - &lt;a rel="nofollow" target="_blank" href='http://www.mulgara.org/trac/wiki/WebUI' class='wiki'&gt;Web User Interface&lt;/a&gt;.&lt;br/&gt;&lt;blockquote&gt;A default configuration for a standalone Mulgara server runs a set of&lt;br/&gt;web services, including the Web User Interface. The standard&lt;br/&gt;configuration puts uses port 8080, so the web services can be seen by&lt;br/&gt;pointing a browser on the server running Mulgara to &lt;a rel="nofollow" target="_blank" href='http://localhost:8080/' class='ext-link'&gt;&lt;span class='icon'&gt;http://localhost:8080/&lt;/span&gt;&lt;/a&gt;.&lt;br/&gt;&lt;/blockquote&gt;Ooo cool. *click* &lt;br/&gt;&lt;blockquote&gt;&lt;h2&gt;Available Services&lt;/h2&gt;&lt;ul&gt;&lt;li&gt;&lt;a rel="nofollow" target="_blank" href='http://localhost:8080/sparql'&gt;SPARQL HTTP Service&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a rel="nofollow" target="_blank" href='http://localhost:8080/webui'&gt;User Interface&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a rel="nofollow" target="_blank" href='http://localhost:8080/webservices'&gt;Web Services&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a rel="nofollow" target="_blank" href='http://localhost:8080/tql'&gt;TQL HTTP Service&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/blockquote&gt;&lt;br/&gt;SPARQL, I've heard of that. 
*click* &lt;br/&gt;&lt;blockquote&gt;&lt;h2&gt;HTTP ERROR: 400&lt;/h2&gt;&lt;pre&gt;Query must be supplied&lt;/pre&gt;&lt;p&gt;RequestURI=/sparql/&lt;/p&gt;&lt;p&gt;&lt;i&gt;&lt;small&gt;&lt;a rel="nofollow" target="_blank" href='http://jetty.mortbay.org/'&gt;Powered by Jetty://&lt;/a&gt;&lt;/small&gt;&lt;/i&gt;&lt;/p&gt;&lt;/blockquote&gt;I guess that's the SPARQL api, good to know, but the frontpage could've warned me a little. Ah, second link is to the User Interface.&lt;br/&gt;&lt;br/&gt;Good, I can use a drop down to look at lots of example queries, nice. Don't understand most of them at the moment, but it's definitely comforting to have examples. They look nothing like SPARQL though... wonder what it is? I'm sure it does SPARQL... was I wrong?&lt;br/&gt;&lt;br/&gt;Quick poke at the HTML shows that it is just POSTing the query text to webui/ExecuteQuery. Looks straightforward to start hacking against too, but I should probably password protect this somehow! I wonder how that is done... documentation mentions a '&lt;tt&gt;java.security.policy&lt;/tt&gt;' field:&lt;tt&gt;&lt;br/&gt;&lt;br/&gt;java.security.policy&lt;/tt&gt;&lt;i&gt;&lt;br/&gt;string: URL&lt;/i&gt;: The URL for the security policy file to use.&lt;br/&gt;Default: jar:file:/jar_path!/conf/mulgara-rmi.policy &lt;br/&gt;&lt;blockquote&gt;&lt;p/&gt;&lt;/blockquote&gt;Kinda stumped... will investigate that later, but at least there's hope. Just firing off the example queries shows me stuff though, so I've got something to work with at least.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Jena&lt;/b&gt; - &lt;a rel="nofollow" target="_blank" href='http://jena.sourceforge.net/'&gt;http://jena.sourceforge.net/&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;Front page is pretty clear, even if I don't understand what all those acronyms are. &lt;a rel="nofollow" target="_blank" href='http://jena.sourceforge.net/downloads.html'&gt;downloads&lt;/a&gt; link takes me to a page with an obvious download link, good. 
(Oh, and sourceforge, you suck. How many frikkin mirrors do I have to try to get this file?)&lt;br/&gt;&lt;br/&gt;Have to put Jena on pause while Sourceforge sorts its life out.&lt;br/&gt;&lt;br/&gt;&lt;b&gt;ARC2&lt;/b&gt; - http://arc.semsol.org/&lt;br/&gt;&lt;br/&gt;Frontpage: "Easy RDF and SPARQL for LAMP systems" Nice, I know of LAMP and I particularly like the word Easy. Let's see... &lt;a rel="nofollow" target="_blank" href='http://arc.semsol.org/download'&gt;Download&lt;/a&gt; is easy to find, and tells me straight away I need PHP 4.3+ and MySQL 4.0.4+ *check* Right, now how do I enable PHP for apache again?... Ah, it helps if I install it first... Okay, done. Dropping the folder into my web space... Hmm nothing does anything. From the documentation, it does look like it is geared to providing a PHP library framework for working with its triplestore and RDF. Hang on, &lt;a rel="nofollow" target="_blank" href='http://arc.semsol.org/docs/v2/endpoint'&gt;SPARQL Endpoint Setup&lt;/a&gt; looks like what I want. It wants a database, okay... done, bit of a hassle though.&lt;br/&gt;&lt;br/&gt;Hmm, all I get is "&lt;b&gt;Fatal error&lt;/b&gt;:  Call to undefined function mysql_connect() in &lt;b&gt;/********/arc2/store/ARC2_Store.php&lt;/b&gt; on line &lt;b&gt;53"&lt;br/&gt;&lt;/b&gt;&lt;br/&gt;Of course, install php libraries to access mysql (PEBKAC)... done and I also realise I need to set up the store, like the example in "&lt;a rel="nofollow" target="_blank" href='http://arc.semsol.org/docs/v2/getting_started'&gt;Getting Started&lt;/a&gt;"... done (with &lt;a rel="nofollow" target="_blank" href='http://pastebin.com/f2ca379e7'&gt;this&lt;/a&gt;) and what does the index page now look like?&lt;br/&gt;&lt;br/&gt;&lt;img src='http://lh4.ggpht.com/_KLlGSypGAvw/SR2Xk92vjbI/AAAAAAAAACo/RhWSkZvbYCM/%5BUNSET%5D.png?imgmax=800' style='max-width:800px;'/&gt;&lt;br/&gt;&lt;br/&gt;Yay! there's like SPARQL and stuff... 
I guess 'load' and 'insert' will help me stick stuff in, and 'select' looks familiar... Well, it seems to be working at least.&lt;br/&gt;&lt;br/&gt;Unfortunately, it looks like the Jena download from sourceforge is in a world of FAIL for now. Maybe I'll look at it next time?&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Triplestores in the cloud&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Talis Platform&lt;/b&gt; - &lt;a rel="nofollow" target="_blank" href='http://www.talis.com/platform/'&gt;http://www.talis.com/platform/&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;From the frontpage - "&lt;i&gt;Developers using the Platform can spend more of their time building&lt;br /&gt;extraordinary applications and less of their time worrying about how&lt;br /&gt;they will scale their data storage.&lt;/i&gt;" - pretty much what I wanted to hear, so how do I get to play with it?&lt;br/&gt;&lt;br/&gt;There is a &lt;a rel="nofollow" title='Get involved' target="_blank" href='http://www.talis.com/platform/get_involved/index.shtml'&gt;Get involved&lt;/a&gt; link on the left, which rapidly leads me to see the section: "Develop, play and try out" - &lt;a rel="nofollow" target="_blank" href='http://n2.talis.com/wiki/Main_Page'&gt;n&lt;sup&gt;2&lt;/sup&gt; developer community &lt;/a&gt; seems to be where it wants me to go. &lt;br/&gt;&lt;br/&gt;Lots of links on the frontpage, takes a few seconds to spot: "&lt;a rel="nofollow" title='Join' target="_blank" href='http://n2.talis.com/wiki/Join'&gt;Join&lt;/a&gt; - join the n² community to get free developer stores and online support" - free, nice word that. So, I just have to email someone? Okay, I can live with that.&lt;br/&gt;&lt;br/&gt;Documentation seems good, lots of choices though, a little hard to spot a single thread to follow to get up to speed, but &lt;a rel="nofollow" title='Guides and Tutorials' target="_blank" href='http://n2.talis.com/wiki/Guides_and_Tutorials'&gt;Guides and Tutorials&lt;/a&gt; looks right to get going with. 
The &lt;a rel="nofollow" target="_blank" href='http://n2.talis.com/wiki/Kniblet_Tutorial'&gt;Kniblet tutorial&lt;/a&gt; (whatever a kniblet is) looks the most beginnerish, and it's also very PHP focussed, which is either a good thing or a bad thing depending on the user :)&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Commercial triplestores&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Openlink Virtuoso&lt;/b&gt; - &lt;a rel="nofollow" target="_blank" href='http://virtuoso.openlinksw.com/'&gt;http://virtuoso.openlinksw.com/&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;Okay, I tried the &lt;a rel="nofollow" target="_blank" href='http://download.openlinksw.com/download/'&gt;Download&lt;/a&gt; link, but I am pretty confused by what I'm greeted with: &lt;br/&gt;&lt;br/&gt;&lt;img src='http://lh3.ggpht.com/_KLlGSypGAvw/SR2rWiIXUBI/AAAAAAAAACs/qOB0ORoEOI4/%5BUNSET%5D.png?imgmax=800' style='max-width:800px;'/&gt;&lt;br/&gt;&lt;br/&gt;Not sure which one to pick just to try it out, it's late in the day, and my tolerance for all things installable has ended.&lt;br/&gt;&lt;br/&gt;-----------------------------------------&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Why take the http/web-centric, newbie approach to looking at these?&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;&lt;i&gt;Answer: &lt;/i&gt;In part, I am taking this approach because I have a deep belief that it was only after relational DBs became commoditised - "You want fries with your MySQL database?" - that the dynamic web kicked off. If we want the semantic web to kick off, we need to commoditise it or, at least, make it very easy for developers to get started. And I mean &lt;b&gt;&lt;i&gt;EASY&lt;/i&gt;&lt;/b&gt;. A query that I want answered is: "Is there something that fits: 'apt-get install triplestore; r = store('localhost'), r.add(rdf), r.query(blah)'? 
" &lt;br/&gt;&lt;br/&gt;(I am particularly interested to see what happens when &lt;a rel="nofollow" target="_blank" href='http://tom.opiumfield.com/'&gt;Tom Morris&lt;/a&gt;'s work on &lt;a rel="nofollow" target="_blank" href='http://github.com/tommorris/reddy/tree/master'&gt;Reddy&lt;/a&gt; collides with ActiveRecord or activerdf...)&lt;br/&gt;&lt;br/&gt;&lt;b&gt;NB &lt;/b&gt;I've short circuited the discovery of software homepages - Imagine&lt;br/&gt;I've seen projects stating that they use "XXXXX as a triplestore". I know&lt;br/&gt;this will likely mean I've compared apples to oranges, but as a newbie, how&lt;br/&gt;would I be expected to know this? "Powered by the Talis Platform" and&lt;br/&gt;"Powered by Jena" seem pretty similar on the surface.)&lt;br/&gt;&lt;/div&gt;</description>
         <author>Ben O'Steen</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3090914606822911489.post-280227876627147956</guid>
         <pubDate>Fri, 14 Nov 2008 08:50:00 +0000</pubDate>
         <media:thumbnail height="72" url="http://lh4.ggpht.com/_KLlGSypGAvw/SR2Xk92vjbI/AAAAAAAAACo/RhWSkZvbYCM/s72-c/%5BUNSET%5D.png?imgmax=800" width="72" />
      </item>
      <item>
         <title>A Fedora/Solr Digital Library for Oxford's 'Forced Migration Online'</title>
         <link>http://oxfordrepo.blogspot.com/2008/11/fedorasolr-digital-library-for-oxford.html</link>
         <description>&lt;div&gt;(mods:subtitle - Slightly more technical follow-up to the &lt;a rel="nofollow" target="_blank" href='http://expertvoices.nsdl.org/hatcheck/2008/11/06/a-fedorasolr-digital-library-for-oxfords-forced-migration-online/'&gt;Fedora Hatcheck piece&lt;/a&gt;.)&lt;br/&gt;&lt;br/&gt;As I have been prompted via email by Phil Cryer (of the &lt;a rel="nofollow" target="_blank" href='http://mobot.org/'&gt;Missouri Botanical Garden&lt;/a&gt;) to talk more about how this technically works, I thought it would be best to make it a written post, rather than the more limited email response.&lt;br/&gt;&lt;br/&gt;&lt;big&gt;Background&lt;/big&gt;&lt;br/&gt;&lt;br/&gt;Forced Migration Online (FMO) had a proprietary system supporting their document needs. It was originally designed for newspaper holdings and applied that model to encoding the mostly paginated documents that FMO held - such that each part was broken up into paragraphs of text, images and the location of all these parts on a page. It even encoded (in its own format) the location of the words on the page when it OCR'd the documents, making per-word highlighting possible. Which is nice.&lt;br/&gt;&lt;br/&gt;However, the backend that powered this was over-priced, and FMO wanted to move to a more open, sustainable platform.&lt;br/&gt;&lt;br/&gt;&lt;big&gt;Enter the DAMS&lt;/big&gt;&lt;br/&gt;&lt;br/&gt;(DAMS = Digital Asset Management System)&lt;br/&gt;&lt;br/&gt;I have been doing work on trying to make a service out of a base of fedora-commons and additional 'plugin' services, such as the wonderful Apache Solr and the useful eXist XML db. 
The end aim is for departments/users/whoever to requisition a 'store' with a certain quality of service (solr attached, 50Gb+ etc) but this is not yet an automated process.&lt;br/&gt;&lt;br/&gt;The focus for the store is a very clear separation between storage, management, indexing services and distribution - Normal filesystems, or Sun Honeycomb are the storage, Fedora-commons provides the management + CRUD, solr, eXist, mulgara, sesame, and couchDB can provide potential index and query services, and distribution is handled pragmatically, caching outgoing and mirroring where necessary.&lt;br/&gt;&lt;br/&gt;&lt;big&gt;The FMO 'store'&lt;/big&gt;&lt;br/&gt;&lt;br/&gt;From discussions with FMO, and examining the information they held and the way they wished to make use of it, a simple Fedora/Solr store seemed to fulfill what they wanted: a persistent store of items with attachments and the ability to search the metadata and retrieve results.&lt;br/&gt;&lt;br/&gt;&lt;big&gt;Bring in the consultants&lt;/big&gt;&lt;br/&gt;&lt;br/&gt;FMO hired Aptivate to do the migration of their data from the proprietary system, in its custom format, to a Fedora/Solr store, while trying as much as possible to retain the functionality they had.&lt;br/&gt;&lt;br/&gt;Some points that I think it is important to impress on people here:&lt;br/&gt;&lt;ul&gt;&lt;li&gt;In general, software engineer consultants don't understand METS or FOXML.&lt;/li&gt;&lt;li&gt;They *really* don't understand the point of disseminators.&lt;br/&gt;&lt;/li&gt;&lt;li&gt;Having to teach software engineer consultants to do METS/FOXML/bDef's etc is likely an arduous and costly task.&lt;/li&gt;&lt;li&gt;Consultants add lots of money to do things their team don't already have the experience to do.&lt;/li&gt;&lt;/ul&gt;So, my conclusion was to not make these things part of the development at all to the extent that I might even have forgotten to mention these things to them except in passing. 
I helped them install their own local store and helped them with the various interfaces and gotchas of the two software packages. By showing them how I use Fedora and Solr in ora.ouls.ox.ac.uk, they were able to hit the ground running.&lt;br/&gt;&lt;br/&gt;They began by using the REST interface to Fedora and the RESTful interface to Solr. By having them begin by using the simple put/get REST interface to Fedora, they could concentrate on getting used to the nature of Fedora as an objectstore. I think they moved to use the SOAP interface as it better suited their Java background, although I cannot be certain as it wasn't an issue that came up.&lt;br/&gt;&lt;br/&gt;Once they had developed the migration scripts to their satisfaction, they asked me to give them a store, which I did (but due to hardware and stupid support issues here I am afraid to say I held them up on this.) They fired off their scripts, moved all the content into the fedora with a straightforward layout per object (pdf, metadata, fulltext and thumbnail) The metadata is - from what I can see - the same XML metadata as before - very MARCXML in nature, with 'Application_Info' elements having types like 'bl:DC.Title'. If necessary, we will strip out the dublin core metadata and put what we can into the DC datastream, but that's not of particular interest to FMO right now. &lt;br/&gt;&lt;br/&gt;&lt;big&gt;Fedora/Solr notes&lt;/big&gt;&lt;br/&gt;&lt;br/&gt;As for the link between Solr and Fedora? This is very loosely coupled, such that they are running in the same Tomcat container for convenience, but aren't linked in a hard way. 
&lt;br/&gt;&lt;br/&gt;I've looked at GSearch, which is great for a homogeneous collection of items, such that they can be acted on by the same XSLT to produce a suitable record for Solr, but as the metadata was a complete unknown for this project, it wasn't too suitable.&lt;br/&gt;&lt;br/&gt;Currently, they have one main route into the fedora store, and so, it isn't hard to simply reindex an item after a change is made, especially for services such as Solr or eXist, which expect to have things change incrementally. I am looking at services such as ActiveMQ for scheduling these index tasks, but more and more I am starting to favour RabbitMQ which seems to be more useful, while retaining the scalability and very robust nature.&lt;br/&gt;&lt;br/&gt;Sending an update to Solr is as simple as an HTTP POST to its /update service, consisting of an XML or JSON packet carrying the record's field values (e.g. "changeme:1", "John Smith", ....) - it uses a transactional model, such that you can push all the changes and additions into the live index via a commit call, without taking the index offline. To query Solr, all manner of clients exist, and it is built to be very simple to interact with: it handles facet queries, filtering and ordering, and can deliver the results in XML, JSON, PHP or Python directly. It can even do an XSLT transform of the results on the way out, leading to a trivial way to support OpenSearch, Atom feeds and even HTML blocks for embedding in other sites.&lt;br/&gt;&lt;br/&gt;Likewise, changing a PDF in Fedora can be done with an HTTP POST as well. 
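&lt;br/&gt;&lt;br/&gt;To make the Solr half of that concrete, here is a minimal sketch in Python of building an update packet and wrapping it in a POST. It is only an illustration under stated assumptions: the field names (id, name), the function names and the localhost URL are hypothetical, not what FMO or Aptivate actually used; only the add/doc/field packet shape, the commit call and the /update service are Solr's own. Nothing is sent over the network unless the request is explicitly opened.&lt;br/&gt;&lt;br/&gt;

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint: adjust host, port and path to the local Solr install.
SOLR_UPDATE_URL = "http://localhost:8080/solr/update"


def build_add_packet(fields):
    """Return a Solr XML 'add' packet (as bytes) holding a single document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        # One field element per metadata value, keyed by its name attribute.
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode").encode("utf-8")


def build_commit_packet():
    """Return the packet that asks Solr to make pending changes live."""
    return ET.tostring(ET.Element("commit"), encoding="unicode").encode("utf-8")


def build_update_request(payload, url=SOLR_UPDATE_URL):
    """Wrap a packet in an HTTP POST; nothing is sent until it is opened."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    # Field names here are illustrative assumptions, not a real schema.
    packet = build_add_packet({"id": "changeme:1", "name": "John Smith"})
    print(packet.decode("utf-8"))
    request = build_update_request(packet)
    print(request.get_method(), request.full_url)
    # To actually send it (requires a running Solr):
    #     urllib.request.urlopen(request)
    # followed by a second request carrying build_commit_packet().
```

&lt;br/&gt;&lt;br/&gt;Posting the add packet and then the commit packet to the same URL is the whole update cycle. 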
Does it need to be more complicated?&lt;br/&gt;&lt;br/&gt;&lt;b&gt;Last, but not least, a project to watch closely:&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;The &lt;a rel="nofollow" target="_blank" href='http://ice.usq.edu.au/projects/fascinator/trac'&gt;Fascinator project&lt;/a&gt;, funded by  &lt;a rel="nofollow" class='ext-link' target="_blank" href='http://www.arrow.edu.au/'&gt;&lt;span class='icon'&gt;ARROW&lt;/span&gt;&lt;/a&gt;, as part of their mini project scheme, is an Apache &lt;a rel="nofollow" class='ext-link' target="_blank" href='http://lucene.apache.org/solr/'&gt;&lt;span class='icon'&gt;Solr&lt;/span&gt;&lt;/a&gt; front end to the &lt;a rel="nofollow" class='ext-link' target="_blank" href='http://www.fedora-commons.org/'&gt;&lt;span class='icon'&gt;Fedora commons&lt;/span&gt;&lt;/a&gt; repository. The goal of the project is to create a simple interface to Fedora that uses a single technology – that’s Solr – to handle all browsing, searching and security. Well worth a look, as it seeks to turn this Fedora/Solr pairing truly into an appliance, with a simple installer and handling the linkage between the two.&lt;/div&gt;</description>
         <author>Ben O'Steen</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3090914606822911489.post-7367532608884117842</guid>
         <pubDate>Thu, 13 Nov 2008 06:30:00 +0000</pubDate>
      </item>
      <item>
         <title>News and updates Oct 2008</title>
         <link>http://oxfordrepo.blogspot.com/2008/10/news-and-updates-oct-2008.html</link>
         <description>&lt;div&gt;Right, I haven't forgotten about this blog, just getting all my ducks in a line as it were. Some updates:&lt;br/&gt;&lt;br/&gt;&lt;ul&gt;&lt;li&gt;The JISC bid for eAdministration was successful, titled "Building the Research Information Infrastructure (BRII)". The project will categorise the research information structure, build vocabularies if necessary, and populate it with information. It will link research outputs (text and data), people, projects, groups, departments, grant/funding information and funding bodies together, using RDF and as many pre-existing vocabularies as is suitable. The first vocab gap we've hit is one for funding, and I've made a draft RDF schema for this which will be openly published once we've worked out a way to make it persistent here at Oxford (trying to get a vocab.ox.ac.uk address)&lt;/li&gt;&lt;ul&gt;&lt;li&gt;One of the final outputs will be a 'foafbook' which will re-use data in the BRII store - it will act as a blue book of researchers. Think Cornell's Vivo, but with the idea of Linked Data firmly in mind.&lt;/li&gt;&lt;li&gt;We are just sorting out a home for this project, and I'll post up an update as soon as it is there.&lt;br/&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Forced Migration Online (FMO) have completed their archived document migration from a crufty, proprietary store to an ORA-style store (Fedora/Solr) - you can see their preliminary frontend at &lt;a rel="nofollow" target="_blank" href='http://fmo.qeh.ox.ac.uk'&gt;http://fmo.qeh.ox.ac.uk&lt;/a&gt;. Be aware that this is a work in progress. We provide the store as a service to them, giving them a Fedora and a Solr to use. They contracted a company called Aptivate to migrate their content, and I believe also to create their frontend. 
This is a pilot project to show that repositories can be treated in a distributed way, given out like very smart, shared drive space.&lt;/li&gt;&lt;li&gt;We are working to archive and migrate a number of library and historical catalogs. A few projects have a similar aim to provide an architecture and software system to hold medieval catalog research - a record of what libraries existed, and what books and works they held. This is much more complex than a normal catalog, as each assertion is backed by a type of evidence, ranging from the solid (first-hand catalog evidence), to the more loose (handwriting on the front page looks like a certain person who worked at a certain library.) So modelling this informational structure is looking to be very exciting, and we will have to try a number of ways to represent this, starting with RDF due to the interlinked nature of the data. This is related to the kinds of evidence that genealogy uses, and so related ontologies may be of use.&lt;/li&gt;&lt;li&gt;The work on storing and presenting scanned imagery is gearing up. We are investigating storing the sequence of images and associated metadata/ocr text/etc as a single tar file as part of a Fedora object (i.e. a book object will have a catalog record, technical/provenance information, an attached tar file and a list of file-to-offset information.)&lt;/li&gt;&lt;ul&gt;&lt;li&gt;This is due to us trying to hit the 'sweet spot' for most file systems. A very large number of highly compressed images and little pieces of text does not fit well with most FS internals. We estimate that for a book there will be around [4+PDFs+2xPages] files, or 500+ typically. Just counting up the various sources of scanned media we already have, we are pressing for about 1/2 million books from one source, 200,000 images from another, 54,000 from yet another... 
it's adding up real fast.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;We are starting to deal with archiving/curating the 'long-tail' of data - small, bespoke datasets that are useful to many, but don't fall into the realm of Big Data, or Web data. I don't plan on touching Access/FoxPro databases any time soon though! I am building a Fedora/Solr/eXist box to hold and disseminate these, which should live at databank.ouls.ox.ac.uk very, very shortly. (Just waiting on a new VMware host to grow into, our current one is at capacity.)&lt;br/&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;To give a better idea of the structure, etc, I am writing it up in a second blog post to follow shortly - currently named "Modelling and storing a phonetics database inside a store"&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;I am in the process of integrating the Google-analytics-style statistics package at http://piwik.org with the ORA interface, to give relatively live hit counts on a per-item basis and to build per-collection reports.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Right now, piwik is capturing the hits and downloads from ORA, but I have yet to add in the count display on each item page, so halfway there :)&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;We are just waiting on a number of departments here to upgrade the version of EPrints they are using for their internal, disciplinary repositories, so that we can begin archiving surrogate copies of the work they wish to put up for this service. (Using ORE descriptions of their items) By doing so, their content becomes exposed in ORA, mirror copies are made (working on a good way to maintain these as content evolves), but they retain the content control, ORA will also act as a registry for their content. 
It's only when their service drops do the users get redirected to the mirror copies that ORA holds (think google cache, but a 100% copy).&lt;/li&gt;&lt;li&gt;In the process of battle-testing the Fedora-Honeycomb connection, but as mentioned above, just waiting for a little more hardware before I set to it. Also, we are examining a number of other storage boxes that should plug in under Fedora, using the Honeycomb software, such as the new and shiny Thumper box, "Thor" Sun Fire Xsomething-or-other. Also, getting pretty interested at the idea of MAID storage - massive array of idle disks. Hopefully, this will act like tape, but have a sustainable access speed of disk. Also, a little more green than a tower of spinning hardware.&lt;/li&gt;&lt;li&gt;Planning out the indexer service at the moment. It will use the Solr 1.3 multicore functionality, with a little parsing magic at the ingest side of things to make a generic indexer-as-a-service type system. One use-case is to be able to bring up VM machines with multicore solr on to act as indexers/search engines as needed. An example aim? "Economics want an index that facets on their JEL codes." POST a schema and ingest indexer to the nearest free indexer, and point the search interface at it once an XMPP message comes back that it is finished.&lt;/li&gt;&lt;li&gt;URI resolvers - still investigating what can be put in place for this, as I strongly wish to avoid coding this myself. Looking at OCLC's OpenURL and how I can hack it to feed it info:fedora uris and link them to their disseminated location. Also, using a tinyurl type library + simple interface might not be a bad idea for a quick PoC.&lt;br/&gt;&lt;/li&gt;&lt;li&gt;Just to let you all know that we are building up the digital team here, most recently held interviews for the futureArch project but we are looking for about 3 others to hire, due to all the projects we are doing. 
We will be putting out job adverts as and when we feel up to battling with HR :)&lt;/li&gt;&lt;/ul&gt;That's most of the more interesting hot topics and projects I am doing at the moment.... phew :)&lt;br/&gt;&lt;/div&gt;</description>
         <author>Ben O'Steen</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3090914606822911489.post-2317123182133225178</guid>
         <pubDate>Thu, 16 Oct 2008 03:39:00 +0000</pubDate>
      </item>
      <item>
         <title>Institutions hate repositories... one simple reason.</title>
         <link>http://davetaz-blog.blogspot.com/2008/09/institutions-hate-repositories-one.html</link>
         <description>Open access is not enough!&lt;br /&gt;&lt;br /&gt;People want to give Open Access to some of their materials at their institution; however, the IR software is seen as a means to manage all Institutional content and not just that which is Open Access and part of the external image of the Institution.&lt;br /&gt;The problem exists in the other direction as well, where repository software is trying to solve these problems, thus people are not likely to use this software until it is included.&lt;br /&gt;&lt;br /&gt;So what do we end up with...&lt;br /&gt;&lt;br /&gt;Lots of Repository Islands which aren't interoperable with each other!&lt;br /&gt;&lt;br /&gt;So if we solve the access and copyright issue will people use the software? errrr No. At this point the software is an all-in solution and not a service which can be utilised by current institutional practice ... Give up...?&lt;br /&gt;&lt;br /&gt;No!&lt;br /&gt;&lt;br /&gt;Focus on providing a service, e.g. something which can manage your Digital Resources and enable this to plug into existing institutional services. Some software packages would argue they support this already. OK good, so don't try and solve the problem if it is just an integration issue.&lt;br /&gt;&lt;br /&gt;To the repositories: Decouple! Build a set of services, build ways of plugging services together and allow the community to pick 'n' mix.&lt;br /&gt;&lt;br /&gt;To the institution: You already have access control systems; ask your Information/Computer Systems department. You probably already have a Content Management System for educational resources for students (Blackboard? - Integrates with an LDAP server); these use external services to manage access and authentication! Here's a few services for you... LDAP, Radius, Eduroam, Domain Controller. &lt;br /&gt; &lt;br /&gt; Till next time!</description>
         <author>Dave Tarrant</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-4926451824261299693.post-3694637630389146162</guid>
         <pubDate>Thu, 18 Sep 2008 02:14:00 +0000</pubDate>
      </item>
      <item>
         <title>DSpace and Fedora *need* opinionated installers.</title>
         <link>http://oxfordrepo.blogspot.com/2008/08/dspace-and-fedora-need-opinionated.html</link>
         <description>Just to say that both Fedora-Commons and DSpace really, really need opinionated installers that make choices for the user. Getting either installed is a real struggle - which we demonstrated during the Crigshow, so please don't write in the comments that it is easy, it just isn't.&lt;br /&gt;&lt;br /&gt;Something that is relatively straightforward to install is a Debian package.&lt;br /&gt;&lt;br /&gt;So, just a plea in the dark, can we set up a race? Who can make their repository software installable as a .deb first? Will it be DSpace or Fedora? Who am I going to send a box of cookies to and a thank you note from the entire developer community?&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://wiki.eprints.org/w/Installing_EPrints_3_via_apt_%28Debian/Ubuntu%29"&gt;(EPrints doesn't count in this race; they've already done it)&lt;/a&gt;</description>
         <author>Ben O'Steen</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3090914606822911489.post-1371143081899015772</guid>
         <pubDate>Mon, 18 Aug 2008 05:16:00 +0000</pubDate>
      </item>
      <item>
         <title>Re-using video compression code to aid document quality checking</title>
         <link>http://oxfordrepo.blogspot.com/2008/08/re-using-video-compression-code-to-aid.html</link>
         <description>&lt;a rel="nofollow" target="_blank" href="http://crigshow.blogspot.com/2008/07/prototype-motion-analysis-to-detect.html"&gt;(Expanding on this video post from the Crigshow)&lt;/a&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;&lt;br /&gt;Problem:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The volume of pages from a large digitisation project can be overwhelming. Add into that the simple fact that all (UK) institutional projects are woefully underfunded and underresourced, it's surprising that we can cope with them really.&lt;br /&gt;&lt;br /&gt;One issue that repeatedly comes up is the idea of quality assurance; How can we know that a given book has been scanned well? How can we spot images easily? Can we detect if foreign bodies were present in the scan, such as thumbs, fingers or bookmarks?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;A quick solution:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This was inspired by a talk at one of the conference strands at WorldComp, where the author talked about using a component of a commonly used video compression standard (MPEG2) to detect degrees of motion and change in a video, without having to analyse the image sequences using a novel or smart algorithm.&lt;br /&gt;&lt;br /&gt;He talked about the motion vector stream being a good rough guide to the amount of change between frames of video.&lt;br /&gt;&lt;br /&gt;So, why did this inspire me?&lt;br /&gt;&lt;ul&gt;&lt;li&gt;MPEG-2 compression is pretty much a solved problem; there are some very fast and scalable solutions out there today - direct benefit: &lt;span style="font-weight:bold;"&gt;No new code needs to be written and maintained&lt;/span&gt;&lt;/li&gt;&lt;li&gt;The format is very well understood and stripping out the motion vector stream wouldn't be tricky. Code exists for this too.&lt;/li&gt;&lt;li&gt;Pages of text in printed documents tend towards being justified so that the two edges of the text columns are straight lines. 
There is also (typically) a fixed number of lines on a page.&lt;/li&gt;&lt;li&gt;A (comparatively rapid) MPEG2 compression of the scans of a book would have the following qualities:&lt;/li&gt;&lt;ul&gt;&lt;li&gt;The motion vectors between pages of text would either show little overall change (as differing letters are actually quite similar) or a small, global shift if the page was printed on a slight offset.&lt;/li&gt;&lt;li&gt;The motion vectors between a page of text and a page with an image embedded in text on the next, or a thumb on the edge, would show localised and distinct changes that differ greatly from the overall perspective.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;In fact, a really crude solution could be to just use the vector stream to create a bookmark list for all the suspect changes. This might bring the number of pages to check down to a level that a human mediator could handle.&lt;/li&gt;&lt;/ul&gt;How much needs to be checked?&lt;br /&gt;&lt;br /&gt;Via basic sample survey statistics: to be sure to 95% (±5%) that the scanned images of 300 million pages are okay, just 387 totally random pages need to be checked. However, to be sure that each individual book is okay to the same degree, a book being ~300 pages, 169 pages need to be checked &lt;span style="font-weight:bold;"&gt;in each book&lt;/span&gt;. I would suggest that the above technique would significantly lower this threshold, but it would be by an empirically found amount.&lt;br /&gt;&lt;br /&gt;Also note that the above figures carry the assumption that the scanning process doesn't change over time, which of course it does!</description>
         <author>Ben O'Steen</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3090914606822911489.post-7963164596904275017</guid>
         <pubDate>Mon, 18 Aug 2008 04:47:00 +0000</pubDate>
      </item>
      <item>
         <title>The four rules of the web and compound documents</title>
         <link>http://oxfordrepo.blogspot.com/2008/08/four-rules-of-web-and-compound.html</link>
         <description>A real quirk that truly interests me is the difference in aims between the way documents are typically published and the way that the information within them is reused.&lt;br /&gt;&lt;br /&gt;A published document is normally in a single 'format' - a paginated layout, and this may comprise text, numerical charts, diagrams, tables of data and so on.&lt;br /&gt;&lt;br /&gt;My assumption is that, to support a given view or argument, a reference to the entirety of an article is not necessary; The full paper gives the context to the information, but it is much more likely that a small part of this paper contains the novel insight being referenced.&lt;br /&gt;&lt;br /&gt;In the paper-based method, it is difficult to uniquely identify parts of an article as items in their own right. You could reference a page number, give line numbers, or quote a table number, but this doesn't solve the issue that the author hadn't put time into considering that a chart, a table or a section of text would be reused.&lt;br /&gt;&lt;br /&gt;So, on the web, where multiple representations of the same information are getting to be commonplace (mashups, rss, microblogs, etc), what can we do to help better fulfill both aims, to show a paginated final version of a document, and also to allow each of the components to exist as items in their own right, with their own URIs (or better, URLs containing some notion of the context e.g.   
if /store/article-id gets to the splash page of the article, /store/article-id/paragraph-id will resolve to the text for that paragraph in the article.)&lt;br /&gt;&lt;br /&gt;Note that the four rules of the web (well, of &lt;a rel="nofollow" target="_blank" href="http://linkeddata.org/"&gt;Linked Data&lt;/a&gt;) are in essence:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;give everything a name,&lt;br /&gt;&lt;/li&gt;&lt;li&gt;make that name a URL ...&lt;/li&gt;&lt;li&gt;...which results in data about that thing,&lt;br /&gt;&lt;/li&gt;&lt;li&gt;and have it link to other related things. &lt;/li&gt;&lt;/ul&gt;[From &lt;a rel="nofollow" target="_blank" href="http://www.w3.org/DesignIssues/LinkedData.html"&gt;TimBL's originating article&lt;/a&gt;. Also, see this &lt;a rel="nofollow" target="_blank" href="http://virtuoso.openlinksw.com/presentations/Creating_Deploying_Exploiting_Linked_Data2/Creating_Deploying_Exploiting_Linked_Data2_TimBL_v3.html#%281%29"&gt;presentation &lt;/a&gt;- a remix of presentations from TimBL and the speaker, &lt;span style="font-style:italic;font-size:100%;"&gt;&lt;a rel="nofollow" target="_blank" href="http://myopenlink.net/dataspace/person/kidehen#this"&gt;&lt;span style="color:rgb(0, 147, 182);text-decoration:underline;"&gt;Kingsley Idehen&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt; - given at the recent Linked Data Planet conference&lt;/span&gt;]&lt;br /&gt;&lt;br /&gt;I strongly believe that applying this to the individual components of a document is a very good and useful thing.&lt;br /&gt;&lt;br /&gt;One thing first, we have to get over the legal issue of just storing and presenting a bitwise perfect copy of what an author gives us. We need to let authors know that we may present alternate versions, based on a user's demands. 
This actually needs to be the case for preservation and the repository needs to make it part of their submission policy to allow for format migrations, accessibility requirements and so on.&lt;br /&gt;&lt;br /&gt;The system holding the articles needs to be able to clearly indicate versions and show multiple versions for a single record.&lt;br /&gt;&lt;br /&gt;When a compound document is submitted to the archive, a second parallel version should be made by fragmenting the document into paragraphs of text, individual diagrams, tables of data, and other natural elements. One issue that has already come up in testing is that documents tend to clump multiple, separate diagrams together into a single physical image. It is likely that the only solution to breaking these up is going to be a human one, either author/publisher education (unlikely) or by breaking them up by hand.&lt;br /&gt;&lt;br /&gt;I would suggest using a very lightweight, hierarchical structure to record the document's logical structure. I have yet to settle on basing it on the content XML format inside the OpenDocument format, or on something very lightweight, using HTML elements, which would have a double benefit of being able to be sent directly to a browser to 'recreate' the document roughly.&lt;br /&gt;&lt;br /&gt;Summary:&lt;br /&gt;&lt;br /&gt;1) Break apart any compound document into its constituent elements (paragraph level is suggested for text)&lt;br /&gt;2) Make sure that each one of these parts is clearly expressed in the context it is in, using hierarchical URLs, /article/paragraph or even better, /article/page/chart&lt;br /&gt;3) On the article's splashpage, make a clear distinction between the real article and the broken up version. I would suggest a scheme like Google search's 'View [PDF, PPT, etc] as HTML'. 
I would assert that many people intuitively understand that this view is not like the original and will look or act differently.&lt;br /&gt;&lt;br /&gt;Some related video blogs from the &lt;a rel="nofollow" target="_blank" href="http://crigshow.blogspot.com/"&gt;Crigshow&lt;/a&gt; trip&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://crigshow.blogspot.com/2008/07/prototype-extracting-and-finding.html"&gt;Finding and reusing algorithms from published articles&lt;/a&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://crigshow.blogspot.com/2008/07/real-documents-are-complex-objects.html"&gt;OCR'ing documents; Real documents are always complex&lt;/a&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://crigshow.blogspot.com/2008/07/protoype-providing-overviews-of.html"&gt;Providing a systematic overview of how a Research paper is written&lt;/a&gt; - giving each component and each version of a component would have major benefits here</description>
         <author>Ben O'Steen</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3090914606822911489.post-2742105134406656447</guid>
         <pubDate>Mon, 18 Aug 2008 03:40:00 +0000</pubDate>
      </item>
      <item>
         <title>#crigshow - Conference 2 - Worldcomp</title>
         <link>http://davetaz-blog.blogspot.com/2008/07/crigshow-conference-2-worldcomp.html</link>
         <description>&lt;span style="font-weight:bold;"&gt;Agents and Web Services... Why no collaboration?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Out of all the presentations at worldcomp this one struck me as one of the most obvious but not covered areas for research in computer science. Probably the most well known agent system is that used by the travel industry where they have standard ways of interfacing with each other to find details of travel and hotels available on a global scale. This is no mean feat with the number of companies there are hooking into this network. &lt;br /&gt;&lt;br /&gt;So why doesn't the same exist for web services or if there is such a system why isn't everyone in the open community using it? &lt;br /&gt;&lt;br /&gt;Surely the point of web services is for people to discover and use them in their own scenarios just like the agents in the travel industry do. OK so maybe the problem lies in the fact that there are so many communities that there will never be a specific use case or framework and thus hosting a generic web service network becomes infinitely hard with the number of different APIs and Implementations. &lt;br /&gt;&lt;br /&gt;OK so if you are going to use Agents in Web Services what issues do you need to consider? Also what do you gain through doing this?&lt;br /&gt;&lt;br /&gt;One of the key ideas which came out of a talk at worldcomp is to use Agents to be the intelligent front to a web service. This enables an agent to keep track of a set of web services including information about a specific web service such as availability, versions, changing cost and an offline copy if the service allows this. So the agent becomes a Rendezvous Point for a series of web services. &lt;br /&gt;&lt;br /&gt;So why aren't we seeing more collaboration between the Agent community and the Web Services community?</description>
         <author>Dave Tarrant</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-4926451824261299693.post-6139828693528407847</guid>
         <pubDate>Wed, 16 Jul 2008 13:19:00 +0000</pubDate>
      </item>
      <item>
         <title>#crigshow - Conference 1 - Oscelot</title>
         <link>http://davetaz-blog.blogspot.com/2008/07/crigshow-conference-1-oscelot.html</link>
         <description>This open source day (#osdiii) hosted by Oscelot was an unconference which soon became based heavily around the Blackboard platform. This was expected as the majority of people attending it were then going on to attend the BbWorld conference. With the title of the conference being Open Source and yet the main topic being that of a Closed Source product this gave an opening for the CRIG team to promote the wider Open Source community to those who are focused on Blackboard use cases. &lt;br /&gt;&lt;br /&gt;The day was a success for the team as we promoted good practices in web development, standards, resource management and the fact that the people who manage an eLearning platform have a responsibility to the content they hold. &lt;br /&gt;&lt;br /&gt;From our point of view, we discovered: If Blackboard is the industry leader in learning management systems then the repository community has big problems when it comes to archiving these resources by the current methodologies each community practices. &lt;br /&gt;&lt;br /&gt;More Collaboration and Awareness please!</description>
         <author>Dave Tarrant</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-4926451824261299693.post-8889383351258675109</guid>
         <pubDate>Mon, 14 Jul 2008 13:00:00 +0000</pubDate>
      </item>
      <item>
         <title>OAI-PMH + OAI-ORE (Atom) + Pronom Droid = Pretty</title>
         <link>http://davetaz-blog.blogspot.com/2008/06/oai-pmh-oai-ore-atom-pronom-droid.html</link>
         <description>I've just finished writing a wrapper (very simple!) which takes an &lt;a rel="nofollow" target="_blank" href="http://www.openarchives.org/ore/"&gt;OAI-ORE&lt;/a&gt; Resource Map in Atom Format and classifies the objects which are listed in the Aggregation using the National Archives (UK) technical registry (&lt;a rel="nofollow" target="_blank" href="http://www.nationalarchives.gov.uk/pronom/"&gt;Pronom&lt;/a&gt;). &lt;br /&gt;&lt;br /&gt;The wrapper provides a simple front end to the &lt;a rel="nofollow" target="_blank" href="http://droid.sourceforge.net/wiki/index.php/Introduction"&gt;DROID tool&lt;/a&gt;: it takes an &lt;a rel="nofollow" target="_blank" href="http://www.openarchives.org/pmh/"&gt;OAI-PMH&lt;/a&gt; URI, requests the latest resource maps in Atom format (ore-atom) and creates a list of the resources, which is passed to DROID to classify directly. &lt;br /&gt;&lt;br /&gt;The wrapper requires OAI-PMH as it requests all records which have been modified since it last parsed the repository. This way the wrapper can be scheduled to run once a day/week/month etc. &lt;br /&gt;&lt;br /&gt;A single DROID XML file comes back as the output. &lt;br /&gt;&lt;br /&gt;This is all working with EPrints repository software currently.&lt;br /&gt;&lt;br /&gt;The next stage is to do something useful with the output XML in terms of providing useful data back to the repository manager.&lt;br /&gt;&lt;br /&gt;Total lines of source code for the wrapper: 302 :)</description>
         <author>Dave Tarrant</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-4926451824261299693.post-6418976742344708314</guid>
         <pubDate>Fri, 27 Jun 2008 05:56:00 +0000</pubDate>
      </item>
      <item>
         <title>ORE software libraries from Foresite</title>
         <link>http://chronicles-of-richard.blogspot.com/2008/06/foresite-1-project-is-pleased-to.html</link>
         <description>The Foresite [1] project is pleased to announce the initial code of two software libraries for constructing, parsing, manipulating and serialising OAI-ORE [2] Resource Maps.  These libraries are being written in Java and Python, and can be used generically to provide advanced functionality to OAI-ORE aware applications, and are compliant with the latest release (0.9) of the specification.  The software is open source, released under a BSD licence, and is available from a Google Code repository:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://code.google.com/p/foresite-toolkit/"&gt;http://code.google.com/p/foresite-toolkit/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;You will find that the implementations are not absolutely complete yet, and are lacking good documentation for this early release, but we will be continuing to develop this software throughout the project and hope that it will be of use to the community immediately and beyond the end of the project.&lt;br /&gt;&lt;br /&gt;Both libraries support parsing and serialising in: ATOM, RDF/XML, N3, N-Triples, Turtle and RDFa&lt;br /&gt;&lt;br /&gt;Foresite is a JISC [3] funded project which aims to produce a demonstrator and test of the OAI-ORE standard by creating Resource Maps of journals and their contents held in JSTOR [4], and delivering them as ATOM documents via the SWORD [5] interface to DSpace [6].  DSpace will ingest these resource maps, and convert them into repository items which reference content which continues to reside in JSTOR.  
The Python library is being used to generate the resource maps from JSTOR and the Java library is being used to provide all the ingest, transformation and dissemination support required in DSpace.&lt;br /&gt;&lt;br /&gt;Please feel free to download and play with the source code, and let us have your feedback via the Google group:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="mailto:foresite@googlegroups.com"&gt;foresite@googlegroups.com&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Richard Jones &amp; Rob Sanderson&lt;br /&gt;&lt;br /&gt;[1] Foresite project page: &lt;a rel="nofollow" target="_blank" href="http://foresite.cheshire3.org/"&gt;http://foresite.cheshire3.org/&lt;/a&gt;&lt;br /&gt;[2] OAI-ORE specification: &lt;a rel="nofollow" target="_blank" href="http://www.openarchives.org/ore/0.9/toc"&gt;http://www.openarchives.org/ore/0.9/toc&lt;/a&gt;&lt;br /&gt;[3] Joint Information Systems Committee (JISC): &lt;a rel="nofollow" target="_blank" href="http://www.jisc.ac.uk/"&gt;http://www.jisc.ac.uk/&lt;/a&gt;&lt;br /&gt;[4] JSTOR: &lt;a rel="nofollow" target="_blank" href="http://www.jstor.org/"&gt;http://www.jstor.org/&lt;/a&gt;&lt;br /&gt;[5] Simple Web Service Offering Repository Deposit (SWORD):&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://www.ukoln.ac.uk/repositories/digirep/index/SWORD"&gt;http://www.ukoln.ac.uk/repositories/digirep/index/SWORD&lt;/a&gt;&lt;br /&gt;[6] DSpace: &lt;a rel="nofollow" target="_blank" href="http://www.dspace.org/"&gt;http://www.dspace.org/&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/CiZe1c6fjFs" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-4049331535370251135</guid>
         <pubDate>Mon, 09 Jun 2008 14:56:00 +0000</pubDate>
      </item>
      <item>
         <title>Repository Software is Dead</title>
         <link>http://davetaz-blog.blogspot.com/2008/06/repository-software-is-dead.html</link>
         <description>Repository Software for digital collections as we know it supplies the complete solution to the client, so without the software you cannot access any of the data in your repository. This is a bad thing for object reuse and digital preservation! &lt;br /&gt;&lt;br /&gt;Many people at conferences such as &lt;a rel="nofollow" target="_blank" href="http://or08.ecs.soton.ac.uk"&gt;Open Repositories 2008&lt;/a&gt; and from workgroups like &lt;a rel="nofollow" target="_blank" href="http://www.ukoln.ac.uk/repositories/digirep/index/CRIG"&gt;CRIG&lt;/a&gt; have been talking for a long while about the importance of Interoperability. However, if you get rid of the need for the interoperability and use a standard specification for accessing simple data objects (pdfs and their metadata), then you don't need interoperability! &lt;br /&gt;&lt;br /&gt;So this leads me to the fact that &lt;a rel="nofollow" target="_blank" href="http://www.eprints.org"&gt;EPrints&lt;/a&gt;, &lt;a rel="nofollow" target="_blank" href="http://www.fedora-commons.org/"&gt;Fedora&lt;/a&gt; and hopefully at some point &lt;a rel="nofollow" target="_blank" href="http://www.dspace.org"&gt;DSpace&lt;/a&gt; are abstracting their database and storage layers to support use of any type of storage platform. Thanks go to the &lt;a rel="nofollow" target="_blank" href="http://www.sun.com"&gt;Sun Microsystems&lt;/a&gt; preservation action group and open storage group for pushing this work from a commercial perspective. 
But we need to go further than this to get rid of the need for interoperability.&lt;br /&gt;&lt;br /&gt;At &lt;a rel="nofollow" target="_blank" href="http://or08.ecs.soton.ac.uk"&gt;Open Repositories 2008&lt;/a&gt;, a colleague, &lt;a rel="nofollow" target="_blank" href="http://oxfordrepo.blogspot.com/"&gt;Ben O'Steen&lt;/a&gt; from Oxford University, and I showed how &lt;a rel="nofollow" target="_blank" href="http://www.openarchives.org/ore/"&gt;OAI-ORE (OAI specification for Object Reuse and Exchange)&lt;/a&gt; can be used to enable high level repository interoperability. &lt;a rel="nofollow" target="_blank" href="http://www.preserv.org.uk/?page=oai-ore"&gt;This work won us $5000&lt;/a&gt; but more importantly got the community thinking about the true power of a specification like OAI-ORE. Ben and I are now hoping to push this work down to the low level storage such that the objects within an ORE map (documents and metadata) can be directly referenced without the need for the current repository layer. For this to happen &lt;b&gt;all objects need to be stored in their simplest form - NO WRAPPER FORMATS ALLOWED at the lowest level&lt;/b&gt;. &lt;br /&gt;&lt;br /&gt;From recent talks with Sandy Payette and Les Carr (Fedora and EPrints respectively) I am envisaging that the current repository software becomes classified as repository service software which is able to manage low level objects but is not specifically required to access these objects. So current services which plug into the repository software can act directly on the objects.&lt;br /&gt;&lt;br /&gt;A couple of problems remain to be solved: security and consistency of cached data. Both are especially applicable if you have more than one piece of repository service software modifying your objects.</description>
         <author>Dave Tarrant</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-4926451824261299693.post-6344307351894496505</guid>
         <pubDate>Sun, 08 Jun 2008 04:43:00 +0000</pubDate>
      </item>
      <item>
         <title>CRIG / IEDemonstator After Thoughts</title>
         <link>http://davetaz-blog.blogspot.com/2008/06/crig-iedemonstator-after-thoughts.html</link>
         <description>IEDemonstrator is a really bad name for a project as it just says Microsoft to me, but I'm fairly sure it isn't anything to do with that most stable of web browsers.&lt;br /&gt;&lt;br /&gt;From the workshop it has become clear to me that discussing a specification for service interaction globally is going to be impossible. This could be due to the fact that SOAP did such a good job of it and no one wants to use anything else (enough sarcasm??). I think many people left the workshop with a much better idea of how HTTP error codes (which have been around for years) already go most of the way to solving a web service model. We also realised quickly that any specification would have to be built specifically for pay services (e.g. make use of the 402 code); this would then encourage companies/institutions to supply reliable services which last more than 4 years (cough AHDS cough).</description>
         <author>Dave Tarrant</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-4926451824261299693.post-8557063529511560836</guid>
         <pubDate>Sun, 08 Jun 2008 04:35:00 +0000</pubDate>
      </item>
      <item>
         <title>First Post - CRIG DRY Workshop</title>
         <link>http://davetaz-blog.blogspot.com/2008/06/first-post-crig-dry-workshop.html</link>
         <description>Well there's a surprise!&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://www.ukoln.ac.uk/repositories/digirep/index/CRIG_DRY_Workshop"&gt;CRIG DRY Workshop&lt;/a&gt; in Bath is where I am now. So what's happening:&lt;br /&gt;&lt;br /&gt;People have been talking about services and proposed projects to provide authoritative and complete services to users/agents/repositories. A couple of themes have come out of the morning session for me:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://www.w3.org/2004/02/skos/"&gt;SKOS: &lt;/a&gt; A lot of projects (incl. &lt;a rel="nofollow" target="_blank" href="http://lcsh.info"&gt;Library of Congress&lt;/a&gt;) are using this RDF language to describe subjects and properties. Each provides access to this information in so many different ways that it is hard to see how to interact in a consistent manner.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Service Interaction&lt;/b&gt; (read on as the name is not that descriptive)&lt;br /&gt;&lt;br /&gt;This moves us on from the Open Storage stuff I've been working on (again more later in another blog post) into how we facilitate the use of services and discover how to interact with these services. We are pushing for the use of HTTP codes! CRIG it. &lt;br /&gt;&lt;br /&gt;Tis it for now....</description>
         <author>Dave Tarrant</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-4926451824261299693.post-1214878339545743666</guid>
         <pubDate>Fri, 06 Jun 2008 04:43:00 +0000</pubDate>
      </item>
      <item>
         <title>CRIG Flipchart Outputs</title>
         <link>http://chronicles-of-richard.blogspot.com/2008/01/crig-flipchart-outputs.html</link>
         <description>The JISC &lt;a rel="nofollow" target="_blank" href="http://chronicles-of-richard.blogspot.com/search/label/CRIG"&gt;CRIG&lt;/a&gt; meeting which I previously &lt;a rel="nofollow" target="_blank" href="http://chronicles-of-richard.blogspot.com/2007/12/crig-meeting-day-1-1.html"&gt;live-blogged&lt;/a&gt; from has now had its output formulated into a series of slides with annotations on Flickr, which can be found here:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://www.flickr.com/photos/wocrig/"&gt;http://www.flickr.com/photos/wocrig/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The process by which this was achieved was through an intense round of brain-storming sessions culminating in a room full of topic spaced flip chart sheets.  We then performed a &lt;a rel="nofollow" target="_blank" href="http://www.dotmocracy.org/"&gt;Dotmocracy&lt;/a&gt;, and the results that you see on the Flickr page are the ideas which made it through the process as having some interest invested in them.&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/emYCzRUx-Pk" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-4749146878939399170</guid>
         <pubDate>Thu, 24 Jan 2008 14:23:00 +0000</pubDate>
      </item>
      <item>
         <title>European ORE Roll-Out at Open Repositories 2008</title>
         <link>http://chronicles-of-richard.blogspot.com/2008/01/european-ore-roll-out-at-open.html</link>
         <description>The European leg of the ORE roll-out has been announced and will occur on the final day of the &lt;a rel="nofollow" target="_blank" href="http://or08.ecs.soton.ac.uk/"&gt;Open Repositories 2008&lt;/a&gt; conference in Southampton, UK.  This is to complement the meeting at Johns Hopkins University in Baltimore on March 3.  From the email circular:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;A meeting will be held on April 4, 2008 at the University of Southampton, in conjunction with Open Repositories 2008, to roll-out the beta release of the OAI-ORE specifications. This meeting is the European follow-on to a meeting that will be held in the USA on March 3, 2008 at Johns Hopkins University.&lt;br /&gt;&lt;br /&gt;The OAI-ORE specifications describe a data model to identify and describe aggregations of web resources, and they introduce machine-readable formats to describe these aggregations based on ATOM and RDF/XML. The current, alpha version of the OAI-ORE specifications is at &lt;a rel="nofollow" target="_blank" href="http://www.openarchives.org/ore/0.1/"&gt;http://www.openarchives.org/ore/0.1/&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Additional details for the OAI-ORE European Open Meeting are available at:&lt;br /&gt;&lt;br /&gt;- The full press release for this event:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://www.openarchives.org/ore/documents/EUKickoffPressrelease.pdf"&gt;http://www.openarchives.org/ore/documents/EUKickoffPressrelease.pdf&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;- The registration site for the event:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://regonline.com/eu-oai-ore"&gt;http://regonline.com/eu-oai-ore&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Note that registration is required and space is limited.&lt;br /&gt;&lt;/blockquote&gt;&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/lsmJMN_BEro" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-8526639596679087835</guid>
         <pubDate>Wed, 23 Jan 2008 17:11:00 +0000</pubDate>
      <enclosure url="http://www.openarchives.org/ore/documents/EUKickoffPressrelease.pdf" length="59937" type="application/pdf" /><media:content url="http://www.openarchives.org/ore/documents/EUKickoffPressrelease.pdf" fileSize="59937" type="application/pdf" /><itunes:subtitle>The European leg of the ORE roll-out has been announced and will occur on the final day of the Open Repositories 2008 conference in Southampton, UK. This is to complement the meeting at Johns Hopkins University in Baltimore on March 3. From the email circ</itunes:subtitle><itunes:author>Richard</itunes:author><itunes:summary>The European leg of the ORE roll-out has been announced and will occur on the final day of the Open Repositories 2008 conference in Southampton, UK. This is to complement the meeting at Johns Hopkins University in Baltimore on March 3. From the email circular: A meeting will be held on April 4, 2008 at the University of Southampton, in conjunction with Open Repositories 2008, to roll-out the beta release of the OAI-ORE specifications. This meeting is the European follow-on to a meeting that will be held in the USA on March 3, 2008 at Johns Hopkins University. The OAI-ORE specifications describe a data model to identify and describe aggregations of web resources, and they introduce machine-readable formats to describe these aggregations based on ATOM and RDF/XML. The current, alpha version of the OAI-ORE specifications is at http://www.openarchives.org/ore/0.1/. Additional details for the OAI-ORE European Open Meeting are available at: - The full press release for this event: http://www.openarchives.org/ore/documents/EUKickoffPressrelease.pdf - The registration site for the event: http://regonline.com/eu-oai-ore Note that registration is required and space is limited. </itunes:summary></item>
      <item>
         <title>Fine Grained Repository Interoperability: can't package, won't package</title>
         <link>http://chronicles-of-richard.blogspot.com/2008/01/fine-grained-repository.html</link>
         <description>Sadly (although some of you may not agree!), my paper proposed for this year's Open Repositories conference in Southampton has not made it through the Programme Committee.  I include here, therefore, my submission so that it may live on, and you can get an idea of the sorts of things I was thinking about talking about.&lt;br /&gt;&lt;br /&gt;The reasons given for not accepting it are probably valid; mostly concerning a lack of focus.  Honestly, I thought it did a pretty good job of saying what I would talk about, but such is life.&lt;br /&gt;&lt;br /&gt;&lt;hr/&gt;&lt;br /&gt;&lt;br /&gt;What is the point of interoperability, what might it allow us to achieve, and why aren't we very good at it yet?  &lt;br /&gt;&lt;br /&gt;Interoperability is a loosely defined concept.  It can allow systems to talk to each other about the information that they hold, about the information that they can disseminate, and to interchange that information.  It can allow us to tie systems together to improve ingest and dissemination of repository holdings, and allows us to distribute repository functions across multiple systems.  It ought even to allow us to offer repository services to systems which don't do so natively, improving the richness of the information space; repository interoperability is not just about repository to repository, it is also about cross-system communications.  The maturing set of repositories such as DSpace, Fedora and EPrints and other information systems such as publications management tools and research information systems, as well as home-spun solutions are making the task of taking on the interoperability beast both tangible and urgent.&lt;br /&gt;&lt;br /&gt;Traditional approaches to interoperability have often centred around moving packaged information between systems (often other repositories).  The effect this has is to introduce a black-box problem concerning the content of the package itself.  
We are no longer transferring information, we are transferring data!  It therefore becomes necessary to introduce package descriptors which allow the endpoint to re-interpret the package correctly, to turn it back into information.  But this constrains us very tightly in the form of our packages, and introduces a great risk of data loss.  Furthermore, it means that we cannot perform temporally and spatially disparate interoperability on an object level (that is, assemble an object's content over a period of time, and from a variety of sources).  A more general approach to information interchange may be more powerful.  &lt;br /&gt;&lt;br /&gt;This paper brings together a number of sources.  It discusses some of the work undertaken at Imperial College London to connect a distributed repository system (built on top of DSpace) to an existing information environment.  This provides repository services to existing systems, and offers library administrators custom repository management tools in an integrated way.  It also considers some of the thoughts arising from the JISC Common Repository Interfaces Group (CRIG) in this area, as well as some speculative proposals for future work and further ideas that may need to be explored.&lt;br /&gt;&lt;br /&gt;Where do we start?  The most basic way to address this problem is to break the idea of the package down into its most simple component parts in the context of a repository: the object metadata, the file content, and the use rights metadata.  Using this approach, you can go a surprisingly long way down the interoperability route without adding further complexity.  
At the heart of the Imperial College Digital Repository is a set of web services which deal with exactly this fine structure of the package, because the content for the repository may be fed from a number of sources over a period of time, and thus there never is a definitive package.&lt;br /&gt;&lt;br /&gt;These sorts of operations are not new, though, and there are a variety of approaches to them which have already been undertaken.  For example, WebDAV offers extensions to HTTP to deal with objects using operations such as PUT, COPY or MOVE which could be used to achieve the effects that we desire.  The real challenge, therefore, is not in the mechanics of the web services which we use to exchange details about this deconstructed package, but is in the additional complexities which we can introduce to enhance the interoperability of our systems and provide the value-added services which repositories wish to offer.  &lt;br /&gt;&lt;br /&gt;Consider some other features of interoperability which might be desirable:&lt;br /&gt;&lt;br /&gt;- &lt;strong&gt;fine grained or partial metadata records.&lt;/strong&gt;  We may wish to ingest partial records from a variety of sources to assemble into a single record, or disseminate only subsets of our stored metadata.&lt;br /&gt;- &lt;strong&gt;file metadata&lt;/strong&gt;, or any other sub-structure element of the object.  
This may include bibliographic, administrative or technical metadata.&lt;br /&gt;- &lt;strong&gt;object structural information&lt;/strong&gt;, to allow complex hierarchies and relationships to be expressed and modified.&lt;br /&gt;- &lt;strong&gt;versioning&lt;/strong&gt;, and other inter-object relationships.&lt;br /&gt;- &lt;strong&gt;workflow status&lt;/strong&gt;, if performing deposit across multiple systems, it may be necessary to be aware of the status of the object in each system to calculate an overall state.&lt;br /&gt;- &lt;strong&gt;state and provenance reporting&lt;/strong&gt;, to offer feedback on the repository state to other information systems, administrators or users.&lt;br /&gt;- &lt;strong&gt;statistics&lt;/strong&gt;, to allow content delivery services to aggregate statistics globally.&lt;br /&gt;- &lt;strong&gt;identifiers&lt;/strong&gt;, to support multiple identification schemes.&lt;br /&gt;&lt;br /&gt;Techniques such as application profiling for metadata allow us to frame entire metadata records in terms of their interpretation (e.g. the Scholarly Works Application Profile (SWAP)), but should also be used to frame individual metadata elements.  Object structural data can be encoded using standards such as METS, which can also help us with attaching metadata to sub-structures of the object itself, such as its files.  Versioning, and other inter-object relationships could be achieved using an RDF approach, and perhaps the OAI-ORE project will offer some guidance.  But other operations such as workflow status, and state and provenance reporting do not have such clear approaches.  Meanwhile, the Interoperable Repository Statistics (IRS) project has looked at the statistics problem, and the RIDIR project is looking into interoperable identifiers.  
In these latter cases, can we ever consider providing access to their outcomes or services through some general fine grained interface?&lt;br /&gt;&lt;br /&gt;The Imperial College Digital Repository offers limited file metadata which is attached during upload and exposed as part of a METS record, detailing the entire digital object, as a descriptive metadata section.  It can deal with the idea that some metadata comes from one source, while other metadata comes from another, allowing for a primitive partial metadata interchange process.  Conversely, it will also deal with multiple metadata records for the same item.  Also introduced are custom workflow metadata fields which allow some basic interaction between different systems to track deposit of objects both from the point of view of the administrator, the depositor and the systems themselves.  In addition, there is an extensible notifications engine which is used to produce periodic reports to all depositors whose content has undergone some sort of modification or interesting event in a given time period.  This notifications engine is behind a very generic web service which offers extreme flexibility within the context of the College's information environment.&lt;br /&gt;&lt;br /&gt;Important work in the fields that will help achieve this interoperability include the SWORD deposit mechanism which currently deals with packages but may be extensible to include these much needed enhancements.  
Meanwhile, the OAI-ORE will be able to provide the semantics for complex objects which will no doubt assist in framing the problems that interoperability faces in a manner in which they can be solved.&lt;br /&gt;&lt;br /&gt;Other examples of the spaces in which interoperability needs to work would include the EThOSnet project, the UK national e-theses effort, where it is conceivable that institutions may want to provide their own e-theses submission system with integration into the central hub to offer seamless distributed submission.  Or in the relationship between Current Research Information Systems (CRIS) and open access repositories, to offer a full-stack information environment for researchers and administrators alike.  The possibilities are extensive and the benefit to the research community would be truly great.  HP Labs is actively researching in these and related areas with its continued work on the DSpace platform.&lt;br /&gt;&lt;br /&gt;&lt;hr/&gt;&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/ykFbvAZETZs" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-2615047246925194437</guid>
         <pubDate>Tue, 22 Jan 2008 18:56:00 +0000</pubDate>
      </item>
      <item>
         <title>SWORD/ORE</title>
         <link>http://chronicles-of-richard.blogspot.com/2008/01/swordore.html</link>
         <description>Last week I was at the ORE meeting in Washington DC, and presented some thoughts regarding SWORD and its relationship to ORE.  The slides I presented can be found here:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://wiki.dspace.org/static_files/1/1d/Sword-ore.pdf"&gt;http://wiki.dspace.org/static_files/1/1d/Sword-ore.pdf&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;[Be warned that discussion on these slides ensued, and they therefore don't reflect the most recent thinking on the topic]&lt;br /&gt;&lt;br /&gt;The overall approach of using SWORD as the infrastructure to do deposit for ORE seems sound.  There are three main approaches identified:&lt;br /&gt;&lt;br /&gt;- &lt;strong&gt;SWORD is used to deposit the URI of a Resource Map onto a repository&lt;/strong&gt;&lt;br /&gt;- &lt;strong&gt;SWORD is used to deposit the Resource Map as XML onto a repository&lt;/strong&gt;&lt;br /&gt;- &lt;strong&gt;SWORD is used to deposit a package containing the digital object and its Resource Map onto a repository&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;In terms of complications there are two primary ones which concern me the most:&lt;br /&gt;&lt;br /&gt;- &lt;strong&gt;Mapping of the SWORD levels to the usage of ORE.&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;The principal issue is that level 1 implies level 0, and therefore level 2 implies level 1 and level 0.  The inclusion of semantics to support ORE specifics could invoke a new level, and if this level is (for argument's sake) level 3, it implies all the levels beneath it, whatever they might require.  Since the service, by this stage, is becoming complex in itself, such a linear relationship might not follow.&lt;br /&gt;&lt;br /&gt;A brief option discussed at the meeting would be to modularise the SWORD support instead of implementing a level based approach.  
That is, the service document would describe the actual services offered by the server, such as ORE support, NoOp support, Verbose support and so forth, with no recourse to "bundles" of functionality labelled by linear levelling.&lt;br /&gt;&lt;br /&gt;- &lt;strong&gt;Scalability of the service document&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;The mechanisms imposed by ORE allow for complex objects to be attached to other complex objects as aggregated resources (ORE term).  This means that you could have a resource map which you wish to tell a repository describes a new part of an existing complex object.  In order to do this, the service document will need to supply the appropriate deposit URI for a segment of an existing repository item.  In DSpace semantics, for example, we may be adding a cluster of files to an existing item, and would therefore require the deposit URI of the item itself.  To do otherwise would be to limit the applicability of ORE within SWORD and the repository model.  Our current service document is a flat document describing what is pragmatically assumed (correctly, in virtually all cases) to be a small selection of deposit URIs.  The same will not be true of item level deposit targets, which could be a very large number of possible deposit targets.  Furthermore, in repositories which exploit the full descriptive capabilities of ORE, the number of deposit targets could be identical to the number of aggregations described (which can be more than one per resource map), which has the potential to be a very large number.&lt;br /&gt;&lt;br /&gt;The consequences are in scalability of response time, which is a platform specific issue, and the scalability of the document itself and the usefulness of the consequences.  
It may be more useful to navigate hierarchically through the different levels of the service document in order to identify deposit nodes.&lt;br /&gt;&lt;br /&gt;Any feedback on this topic is probably most useful in the &lt;a rel="nofollow" target="_blank" href="http://groups.google.com/group/oai-ore"&gt;ORE Google Group&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/DJEvhQnxebE" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-9149789036743981806</guid>
         <pubDate>Mon, 21 Jan 2008 13:05:00 +0000</pubDate>
      </item>
      <item>
         <title>BMC and the Free Open Repository Trial</title>
         <link>http://chronicles-of-richard.blogspot.com/2007/12/bmc-and-free-open-repository-trial.html</link>
         <description>Our good buddies at BioMedCentral's Open Repository team have released the latest upgrade to their service, and are offering 3 month trial repositories for evaluation.  From the &lt;a rel="nofollow" target="_blank" href="http://www.dspace.org/"&gt;DSpace home page&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;BioMed Central announced the latest upgrades to Open Repository, the open access publisher's hosted repository solution. Open Repository offers institutions a cost effective repository solution (setup, hosting and maintenance) which includes new DSpace features, customization options, improved user interface.  Along with the announcement of the upgrades, Open Repository is offering a free 3-month pilot repository, so institutions can test the suitability of the service without obligation. See the full articles in &lt;a rel="nofollow" target="_blank" href="http://newsbreaks.infotoday.com/wndReader.asp?ArticleId=40331"&gt;Weekly News Digest&lt;/a&gt; and in &lt;a rel="nofollow" target="_blank" href="http://www.alphagalileo.org/index.cfm?fuseaction=readrelease&amp;releaseid=525415"&gt;Alpha Galileo&lt;/a&gt;.&lt;br /&gt;&lt;/blockquote&gt;&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/cxCzlAqTC1k" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-5373224028320805461</guid>
         <pubDate>Wed, 12 Dec 2007 09:36:00 +0000</pubDate>
      </item>
      <item>
         <title>CRIG Meeting Day 2 (2)</title>
         <link>http://chronicles-of-richard.blogspot.com/2007/12/crig-meeting-day-2-2.html</link>
         <description>Topics for today:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://www.ukoln.ac.uk/repositories/digirep/index/CRIG_Unconference#Friday_December_7th"&gt;http://www.ukoln.ac.uk/repositories/digirep/index/CRIG_Unconference#Friday_December_7th&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The ones that interest me the most are probably these:&lt;br /&gt;&lt;br /&gt;- Death to Packages&lt;br /&gt;&lt;br /&gt;Not really Death to Packages, but let's not forget that packaging sometimes isn't what we want to do or what we can do.&lt;br /&gt;&lt;br /&gt;- Get What?&lt;br /&gt;&lt;br /&gt;This harks to my ORE interest, as to what is available under the URLs, and what that means for something like content negotiation.&lt;br /&gt;&lt;br /&gt;- One Put to Multiple Places&lt;br /&gt;&lt;br /&gt;Really important to distributed information systems (e.g. ethosnet integration into local institutions).  Also, this relates, for me, to the unpackaging question, because it introduces differences between what systems might all be expecting.&lt;br /&gt;&lt;br /&gt;- Web 2.0 interfaces (ok, ok)&lt;br /&gt;&lt;br /&gt;I'm interested in web services.  Yes it's a bit trendy.  But it is useful.&lt;br /&gt;&lt;br /&gt;- Core Services of a Repository&lt;br /&gt;&lt;br /&gt;For repository core architecture, this is important.  With my DSpace hat on I'd like to see what sorts of things an internal service architecture or API ought to be able to support.&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/XguJLOxM_NU" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-1908914017517604548</guid>
         <pubDate>Fri, 07 Dec 2007 10:08:00 +0000</pubDate>
      </item>
      <item>
         <title>CRIG Meeting Day 2 (1)</title>
         <link>http://chronicles-of-richard.blogspot.com/2007/12/crig-meeting-day-2-1.html</link>
         <description>It's first thing on day two.  I'm late because I have to get all the way across town, which takes a surprisingly long time in London.  I should have just stayed at a nearby hotel.  Oh well.&lt;br /&gt;&lt;br /&gt;The remainder of yesterday was interesting.  Live blogging is difficult, as the conference is extremely mobile.  Today I will have to pick a point and hide in a corner to get you up to date.&lt;br /&gt;&lt;br /&gt;In the afternoon we discussed the CRIG scenarios, and then implemented something called a Dotmocracy, which involves sticking dots (like house points at school) next to the topics that had come up and that we were interested in.  When we start up today, the first order of business will be to see what topics made the cut.  From what I saw at the end of the day, this will include Federated Searching, Google Search, and package deconstruction (my personal favourite this week).&lt;br /&gt;&lt;br /&gt;As a brief aside, one running theme has been "no more standards".  As it happens, I disagree with this.  We're never going to get everyone thinking the same and working the same.  That's why there are so many standards, and why new ones get made all the time.  It's the way of the world.  With a standard, though, once you have implemented one you at least have a way of telling people what you did, unlike the home-grown, undocumented solutions which are the alternative.&lt;br /&gt;&lt;br /&gt;Right, I suppose I'd better get my skates on.&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/dPbLzdCq4ok" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-4799712183286978005</guid>
         <pubDate>Fri, 07 Dec 2007 08:42:00 +0000</pubDate>
      </item>
      <item>
         <title>CRIG Meeting Day 1 (2)</title>
         <link>http://chronicles-of-richard.blogspot.com/2007/12/crig-meeting-day-1-2.html</link>
         <description>&lt;a rel="nofollow" target="_blank" href="http://en.wikipedia.org/wiki/Unconference"&gt;http://en.wikipedia.org/wiki/Unconference&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;See also &lt;a rel="nofollow" target="_blank" href="http://wwmm.ch.cam.ac.uk/blogs/downing/"&gt;Jim Downing's&lt;/a&gt; live blogging.&lt;br /&gt;&lt;br /&gt;We've just done a round of preliminary unconferencing, where the &lt;a rel="nofollow" target="_blank" href="http://chronicles-of-richard.blogspot.com/2007/11/sword-10-released.html"&gt;CRIG Podcast&lt;/a&gt; topics were brainstormed onto flip charts.  Not sure how useful that's going to be, but I'm going to approach the whole thing with an open mind.  I've got my marker pen, my balloon, and my three dots.&lt;br /&gt;&lt;br /&gt;Wish me luck ...&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/5j8sMCBCo5k" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-1512541800794566898</guid>
         <pubDate>Thu, 06 Dec 2007 15:01:00 +0000</pubDate>
      </item>
      <item>
         <title>CRIG Meeting Day 1 (1)</title>
         <link>http://chronicles-of-richard.blogspot.com/2007/12/crig-meeting-day-1-1.html</link>
         <description>Some live blogging; may be slightly malformed, as this is happening inline, with no post-editing.&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://www.ukoln.ac.uk/repositories/digirep/index/CRIG_Unconference"&gt;http://www.ukoln.ac.uk/repositories/digirep/index/CRIG_Unconference&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Les Carr and Jim Downing have introduced us to the first day of the CRIG workshop.  We're unconferencing, which means that there's no programme!  We're going to try to stay at the abstract or high level of discussion, and not try to talk about technology.&lt;br /&gt;&lt;br /&gt;David Flanders outlines the meeting philosophy.  The outputs aimed for from the meeting include: ideas (bluesky), standards and scenarios and how they can be linked together.  The outputs will be taken to OR08.  The best way for a group to produce good stuff is for everyone to think about themselves.  Makes me think of an article I read recently:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://www7.nationalgeographic.com/ngm/0707/feature5/index.html"&gt;http://www7.nationalgeographic.com/ngm/0707/feature5/index.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;We are &lt;em&gt;not&lt;/em&gt; about creating new specs.&lt;br /&gt;&lt;br /&gt;Julie then brings us some stuff about SWORD.  See my &lt;a rel="nofollow" target="_blank" href="http://chronicles-of-richard.blogspot.com/2007/11/sword-10-released.html"&gt;previous post&lt;/a&gt; on this.  We are going to have implementations for arXiv, White Rose Research Online and Jorum.  There will also be a SPECTRa deposit client, and later an article in Ariadne and a presentation at OR08.&lt;br /&gt;&lt;br /&gt;Break time ... tea and coffee!&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/wSKAR7Doxt8" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-6875507276473434800</guid>
         <pubDate>Thu, 06 Dec 2007 13:19:00 +0000</pubDate>
      </item>
      <item>
         <title>CRIG Podcast</title>
         <link>http://chronicles-of-richard.blogspot.com/2007/11/crig-podcast.html</link>
         <description>A couple of weeks ago the JISC CRIG (Common Repository Interfaces Group) organised a series of telephone debates on important areas for it.  These have now been edited into  short commentaries which might be of interest to you, and are aimed at priming and informing the upcoming "unconference" to be held 6/7 December in London:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://www.ukoln.ac.uk/repositories/digirep/index/CRIG_Podcasts"&gt;http://www.ukoln.ac.uk/repositories/digirep/index/CRIG_Podcasts&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The "unconference" will take place at Birkbeck College in Bloomsbury, London.  Take a listen, and enjoy.  Yours truly appears in the "Get and Put within Repositories" and the "Object Interoperability" discussions.&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/WXv1M_TyW-w" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-2034686157678763588</guid>
         <pubDate>Fri, 30 Nov 2007 09:19:00 +0000</pubDate>
      </item>
      <item>
         <title>SWORD 1.0 Released</title>
         <link>http://chronicles-of-richard.blogspot.com/2007/11/sword-10-released.html</link>
         <description>Just a quick heads up to say that the SWORD 1.0 release is now out and ready for download from SourceForge:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://sourceforge.net/projects/sword-app/"&gt;http://sourceforge.net/projects/sword-app/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here you will find the common Java library which supports repositories wanting to implement SWORD, plus implementations for DSpace and Fedora.  There is also a client (with GUI and CLI versions) which you can use to deposit content into the repositories.&lt;br /&gt;&lt;br /&gt;The DSpace implementation is designed only to work with the forthcoming DSpace 1.5 (which is currently in Alpha release).  Your feedback and experiences with the code would be much appreciated.  We expect to be making refinements to the DSpace implementation up until DSpace 1.5 is released as stable.&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/OIEs2RRDcp4" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-8632597815206720838</guid>
         <pubDate>Thu, 08 Nov 2007 14:45:00 +0000</pubDate>
      </item>
      <item>
         <title>Scandinavian Dugnad</title>
         <link>http://chronicles-of-richard.blogspot.com/2007/10/scandinavian-dugnad.html</link>
         <description>I was invited by the Scandinavian DSpace User Group to join them at their first official meeting yesterday in Oslo.  It was great to see so many people representing a small-ish geographical area and a reasonably small population all together from 4 nations (Norway, Sweden, Finland and Denmark) to talk about DSpace.  Probably 35 people all-in, with plans to extend the group to be the Nordic DSpace User Group to include members from Iceland, and perhaps even the Faroe Islands, and Greenland (if DSpace instances appear there).&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://wiki.dspace.org/index.php/Scandinavia"&gt;http://wiki.dspace.org/index.php/Scandinavia&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In the grand traditions of Open Source and Open Access, I borrowed presentations given at the recent DSpace User Group Rome, and gave them an update on the state of the DSpace Foundation, DSpace 2.0, and then went on to produce some original slides telling folks how to get involved in DSpace developments.  Hopefully all the content will be available on the web soon.&lt;br /&gt;&lt;br /&gt;As your humble chronicler struggled with his sub-par Norwegian, he picked up some interesting things.  There is good user end development going on in Scandinavia which could be harnessed to bring improvements to the DSpace UI.  There are also increasingly many requests for "Integration with ...", where the object of integration is one of a variety of library information systems.  Statistics are high on the agenda here as they are everywhere else.  They are also a base of experts in multi-language problems stemming from being polyglot nations with additional letters in their native alphabets.&lt;br /&gt;&lt;br /&gt;It's clear where the future of repositories lies in Scandinavian nations where the national interest and the community feature prominently in society and culture.  
Bibsys, a major supplier of library systems and services in Norway (and organisers of the meeting), have 29 DSpace clients on their books already, and are looking at tighter integration between it and their other products, right down to the information model level.  National research reporting systems are much desired repository data sources, and internal information systems at each institution are starting to feed into their public repositories.&lt;br /&gt;&lt;br /&gt;With such a big user group, and such a community focus, there is little doubt in my mind that the Nordic user group will be a great asset to the DSpace users in that region, and probably to the DSpace community as a whole.&lt;br /&gt;&lt;br /&gt;PS Dugnad is a Norwegian word effectively referring to voluntary, communal work which benefits the community to some degree, but is also social and enjoyable for the participants.  It also formed the basis of the 2006 DSpace User Group Meeting in Bergen:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://dsug2006.uib.no/"&gt;http://dsug2006.uib.no/&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/BlPZVoGLi1A" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-7713247949699281106</guid>
         <pubDate>Wed, 31 Oct 2007 12:23:00 +0000</pubDate>
      </item>
      <item>
         <title>DSpace 1.5 Alpha with experimental binary distribution</title>
         <link>http://chronicles-of-richard.blogspot.com/2007/10/dspace-15-alpha-with-experimental.html</link>
         <description>The DSpace 1.5 Alpha has now been released and we encourage you to download this exciting new release of DSpace and try it out.&lt;br /&gt;&lt;br /&gt;There are big changes in this code base, both in terms of functionality and organisation.  First, we are now using Maven to manage our build process, and have carved the application into a set of core modules which can be used to assemble your desired DSpace instance.  For example, the JSP UI and the Manakin UI are now available as separate UI modules, and you may build either or both of these.  We are taking an important step down the road, here, to allowing for community developments to be more easily created, and also more easily shared.  You should be able, with a little tinkering, to provide separate code packages which can be dropped in alongside the dspace core modules, and built along with them.  There are many stages to go through before this process is complete or perfect, so we encourage you to try out this new mechanism, and to let us know how you get on, or what changes you would make.  Oh, and please do share your modules with the community!  Props to Mark Diggory and the MIT guys for this restructuring work.&lt;br /&gt;&lt;br /&gt;The second big and most exciting thing is that Manakin is now part of our standard distribution, and we want to see it taking over from the JSP UI over the next few major releases.  A big hand for Scott Phillips and the Texas A&amp;M guys for getting this code into the distribution; they have worked really hard.&lt;br /&gt;&lt;br /&gt;In addition to this, we have an Event System which should help us start to decouple tightly integrated parts of the repository, from Richard Rodgers and the guys at MIT.  Browsing is now done with a heavily configurable system written initially by myself, but with significant assistance from Graham Triggs at BioMed Central.  
Tim Donohue's much desired Configurable Submission system is now integrated with both JSP and Manakin interfaces and is part of the release too.&lt;br /&gt;&lt;br /&gt;Further to this we have a bunch of other functionality including: IP Authentication, better metadata and schema registry import, move items from one collection to another, metadata export, configurable multilingualism support, Google and html sitemap generator, Community and Sub-Communities as OAI Sets, and Item metadata in XHTML head &amp;lt;meta&amp;gt; elements.&lt;br /&gt;&lt;br /&gt;All in all, a good looking release.  There will be a testathon organised shortly which will be announced on the mailing lists, so that we can run this up to beta and then into final release as soon as possible.  There's lots to test, so please lend a hand.&lt;br /&gt;&lt;br /&gt;We are also experimenting with a binary release, which can be downloaded from the same page as the source release.  We are interested in how people get on with this, so let us know on the mailing lists.&lt;br /&gt;&lt;br /&gt;Come and get it:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://sourceforge.net/project/showfiles.php?group_id=19984"&gt;http://sourceforge.net/project/showfiles.php?group_id=19984&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/Su8qnj6lHUM" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-4691783139275190839</guid>
         <pubDate>Thu, 25 Oct 2007 09:48:00 +0000</pubDate>
      </item>
      <item>
         <title>my my where did the summer go</title>
         <link>http://chronicles-of-richard.blogspot.com/2007/10/my-my-where-did-summer-go.html</link>
         <description>OK, ok, it's been a long long time since I updated.  Did I say at the beginning that this was an experiment in seeing if I was capable of maintaining a blog?  If I didn't I should have done.&lt;br /&gt;&lt;br /&gt;But there's a good reason that I've not updated for a while: I've been working flat out on the Imperial College Digital Repository, Spir@l, and am pleased to finally announce in a quiet way that we are officially LIVE:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://spiral.imperial.ac.uk/"&gt;http://spiral.imperial.ac.uk/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;On the outside it doesn't look too serious.  A standard looking DSpace, I hear you say, with an Imperial College site template on it.  And you'd be right.  But only about the tip of the iceberg.&lt;br /&gt;&lt;br /&gt;Without wishing to blow my own trumpet (modesty &lt;em&gt;is&lt;/em&gt; the third or fourth best thing about me), please do check out the article which I co-wrote with my good colleague Fereshteh Afshari:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://hdl.handle.net/10044/1/493"&gt;http://hdl.handle.net/10044/1/493&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;And you may also be interested in my presentation at the recent DSpace User Group Meeting in Rome 2007 (more on that later, maybe):&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" target="_blank" href="http://www.aepic.it/conf/viewabstract.php?id=200&amp;cf=11"&gt;http://www.aepic.it/conf/viewabstract.php?id=200&amp;cf=11&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I could probably be persuaded to write a little here about how it works; maybe you'll even get snippets from the monolithic technical documentation that I'm in the middle of writing.&lt;br /&gt;&lt;br /&gt;Oh, and there's more news, but now I've got your attention again you have to wait for the next installment.&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/nIamUPYRpyU" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-5921631848918710005</guid>
         <pubDate>Wed, 24 Oct 2007 15:27:00 +0000</pubDate>
      </item>
      <item>
         <title>EThOSnet Kick-Off</title>
         <link>http://chronicles-of-richard.blogspot.com/2007/05/ethosnet-kick-off.html</link>
         <description>On Tuesday of this week the EThOSnet Project Board met for the first time to kick off this significant new project.  For background, this project is the successor to the &lt;a rel="nofollow" target="_blank" href="http://www.ethos.ac.uk/"&gt;EThOS&lt;/a&gt; project, which in turn grew out of the Scottish projects: &lt;a rel="nofollow" target="_blank" href="http://www.thesesalive.ac.uk/"&gt;Theses Alive&lt;/a&gt; at Edinburgh, &lt;a rel="nofollow" target="_blank" href="http://www.lib.gla.ac.uk/daedalus/"&gt;DAEDALUS&lt;/a&gt; at Glasgow, and &lt;a rel="nofollow" target="_blank" href="http://www2.rgu.ac.uk/library/e-theses.htm"&gt;Electronic Theses&lt;/a&gt; at the Robert Gordon University.&lt;br /&gt;&lt;br /&gt;The aim of EThOSnet is to take the work done under EThOS and bring it up to a point where UK institutions can actually start to become early adopters, to start to digitise the back-catalogue of print theses in the UK, investigate technology for the current and the future incarnations of the system, and to basically kick-start a genuinely viable service for deposit and dissemination of UK theses.&lt;br /&gt;&lt;br /&gt;At this stage, the project does not have a Project Manager, which is causing minor hold-ups initially, but the Project Director, Clare Jenkins, Director of Library Services at Imperial College Library, has stepped in to hold things together until one is appointed (we are expecting to hear very soon).  In the interim, the Project Board has also been put in place to check that all the 7 Work Packages have the things they need to get going.&lt;br /&gt;&lt;br /&gt;Of these 7 Work Packages, the first and last are concerned with project management and exit strategy, and the meat of the project will take place in packages 2 - 6.  
Details of these work packages are available in the project proposal, which will hopefully be available on the JISC website soon.&lt;br /&gt;&lt;br /&gt;A quick summary, then, of some of the changes and more concrete decisions that we made during the meeting:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;We have set a pleasingly high target of 20,000 digitised theses and 3,000 born-digital theses by the end of the project.  These will be sourced from the many institutions who have already expressed an interest in adopting the service, before the project is even going!&lt;/li&gt;&lt;br /&gt;&lt;li&gt;The first port of call for the technology is to smooth the process of the existing software tools for repository users.  I would hope to have something which works well for DSpace available quickly, and general enough to be part of the main distribution.  EPrints is already fully compliant, and Fedora has representatives from the University of Hull looking after it.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Communications will be done primarily through a soon-to-exist project wiki, and it is hoped that the existing &lt;a rel="nofollow" target="_blank" href="http://www.jiscmail.ac.uk/lists/E-THESES-UK.html"&gt;E-Theses UK list&lt;/a&gt; will be used more heavily than it is already.  Imperial College has agreed to host the existing EThOS website, the wiki, and potentially the toolkit if necessary (&lt;a rel="nofollow" target="_blank" href="http://ethostoolkit.rgu.ac.uk/"&gt;currently hosted&lt;/a&gt; at RGU).&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Toolkit development will be ongoing, with work being done on it within a wiki, but with the option to move to some XML format for the final product.&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;This is a very big project, and I can't possibly represent everything that came out of Tuesday's meeting here.  
In the near future expect to see links to the project wiki appear and more information to come out.&lt;img src="http://feeds.feedburner.com/~r/chronicles-of-richard/~4/iT023WB9CPI" height="1" width="1"/&gt;</description>
         <author>Richard</author>
         <guid isPermaLink="false">tag:blogger.com,1999:blog-3741879089300545664.post-1653851498787432588</guid>
         <pubDate>Thu, 10 May 2007 14:05:00 +0000</pubDate>
      </item>
   <language>en-us</language><media:rating>nonadult</media:rating></channel>
</rss>
