<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>petermr's blog</title>
	
	<link>http://blogs.ch.cam.ac.uk/pmr</link>
	<description>A Scientist and the Web</description>
	<lastBuildDate>Fri, 25 May 2012 13:43:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/PetermrsBlog" /><feedburner:info uri="petermrsblog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>We meet in Berlin to prepare the #schoolofdata</title>
		<link>http://feedproxy.google.com/~r/PetermrsBlog/~3/2cpW63qJ8HI/</link>
		<comments>http://blogs.ch.cam.ac.uk/pmr/2012/05/25/we-meet-in-berlin-to-prepare-the-schoolofdata/#comments</comments>
		<pubDate>Fri, 25 May 2012 13:41:44 +0000</pubDate>
		<dc:creator>pm286</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.ch.cam.ac.uk/pmr/?p=3711</guid>
		<description><![CDATA[I&#8217;m spending an exciting two days in Berlin helping the OKFN/P2PU prepare their School Of Data (SoD) course/s. I&#8217;m sure this will turn out to be a seminal event in both Internet education and advancement in &#8220;data wrangling&#8221;. Here&#8217;s the initial announcement &#8211; http://blog.okfn.org/2012/02/08/announcing-the-school-of-data/ . &#8220;The School will be a joint venture between the Open [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m spending an exciting two days in Berlin helping the OKFN/P2PU prepare their <a href="http://schoolofdata.org">School Of Data</a> (SoD) course/s. I&#8217;m sure this will turn out to be a seminal event in both Internet education and advancement in &#8220;data wrangling&#8221;. Here&#8217;s the initial announcement &#8211; http://blog.okfn.org/2012/02/08/announcing-the-school-of-data/ .  &#8220;The School will be a joint venture between the <a href="http://okfn.org/">Open Knowledge Foundation</a> and <a href="http://p2pu.org">Peer 2 Peer University (P2PU)</a>. &#8221; 
</p><p>There&#8217;s a huge need for skilled and inventive data wrangling. This is a  mixture of technical knowledge and knowhow and the &#8220;course&#8221; will cover both. We are working out the granularity of the &#8220;course&#8221; – almost certainly a collection of smaller units, generally self-paced but with some clear timelines. P2PU has had considerable experience in this – for example partnering with Mozilla on web skills. 
</p><p><img src="http://blogs.ch.cam.ac.uk/pmr/files/2012/05/052512_1341_WemeetinBer11.jpg" alt="" />
	</p><p>Here&#8217;s Laura Newman – the course coordinator – getting our thoughts organized and photographed, and here&#8217;s Rufus Pollock and Stiivi Urbanek hard at work planning the details.
</p><p><img src="http://blogs.ch.cam.ac.uk/pmr/files/2012/05/052512_1341_WemeetinBer21.jpg" alt="" />
	</p><p>Stiivi has put together a great &#8220;architecture&#8221; for the technical side of the course which goes from acquiring data, to cleaning, filtering, repurposing and presentation. We have a strong sense of pipeline, where course participants take a problem from start to finish, using the appropriate skills at each stage. We are presenting this around &#8220;challenges&#8221; – we take a theme which everyone can relate to and go all the way from finding the data to drawing conclusions.
</p><p>The course structure and participation are flexible and controlled – there is no hierarchical distinction between teachers and learners – we are all a bit of both. We expect information to flow from and to the course.
</p><p>The overall components (stages) – which have largely crystallized in our planning &#8211; are
</p><ul><li>Data sources 
</li><li>Discovery and acquisition
</li><li>Extraction
</li><li>Cleansing, transformation, and integration
</li><li>Analytical modelling
</li><li>Data mining
</li><li>Presentation, Analysis, publishing and packaging
</li></ul><p>And an overarching subject of &#8220;data governance&#8221; 
</p><p>To analyse a particular subject a participant needs to go through the processes above, although not all will be needed for a given problem/challenge. We call this process a &#8220;journey&#8221;, where we visit the different stages on a planned itinerary.  Many courses will be organized like this – and the first we have designed is &#8220;What is unique about my country?&#8221;
</p><p>In this, participants (perhaps working in teams) will find and extract information about their country, clean, filter and integrate it, and finally present answers to this very general question (which requires comparison with other countries).
</p><p>In an orthogonal fashion, participants will also study a particular stage in depth. In the journey metaphor, this is like spending your time in one place, finding the different ways of tackling it. So one early topic will be &#8220;Crawling and scraping&#8221; – there are several different tools, approaches and problems.
</p><p>There&#8217;s a real buzz! Over 300 people have signed up and we had an IRC meeting yesterday with 30 – who are very keen to be involved and contribute. Lots of great skills and ideas.
</p><p>Much more later – on a regular basis &#8211; as this is an important part of my life.
</p>]]></content:encoded>
			<wfw:commentRss>http://blogs.ch.cam.ac.uk/pmr/2012/05/25/we-meet-in-berlin-to-prepare-the-schoolofdata/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blogs.ch.cam.ac.uk/pmr/2012/05/25/we-meet-in-berlin-to-prepare-the-schoolofdata/</feedburner:origLink></item>
		<item>
		<title>#scholpub should be regulated</title>
		<link>http://feedproxy.google.com/~r/PetermrsBlog/~3/yFgQ1V-0bZs/</link>
		<comments>http://blogs.ch.cam.ac.uk/pmr/2012/05/25/scholpub-should-be-regulated/#comments</comments>
		<pubDate>Fri, 25 May 2012 06:16:05 +0000</pubDate>
		<dc:creator>pm286</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.ch.cam.ac.uk/pmr/?p=3707</guid>
		<description><![CDATA[I recently asked &#8220;What&#8217;s the difference between Elsevier and British Gas?&#8221; I didn&#8217;t get many answers (it would be nice to have a greater response so I could highlight ideas other than mine). The question could also have replaced &#8220;British Gas&#8221; by &#8220;Virgin Trains&#8221;, &#8220;Scottish Power&#8221;, &#8220;East Anglian Water&#8221; or even &#8220;Lloyds Bank&#8221;. The answer [...]]]></description>
			<content:encoded><![CDATA[<p>I recently asked &#8220;<strong>What&#8217;s the difference between Elsevier and British Gas</strong>?&#8221; I didn&#8217;t get many answers (it would be nice to have a greater response so I could highlight ideas other than mine).  The question could also have replaced &#8220;British Gas&#8221; by &#8220;Virgin Trains&#8221;, &#8220;Scottish Power&#8221;, &#8220;East Anglian Water&#8221; or even &#8220;Lloyds Bank&#8221;.
</p><p>The answer is that the others are all, to a greater or lesser extent, bound by regulation. They have a legal duty to:
</p><ul><li>Ensure the quality of service
</li><li>Limit prices
</li></ul><p>Scholarly publishing is in a bizarre and completely unhealthy market where there is no effective regulation of price and no quality control (the quality of #scholpub is awful compared to other e-products on the web and hasn&#8217;t changed in 20 years). We have NO IDEA what the true costs of publishing a paper are, or what they could be if the market operated.
</p><p>Acta Crystallographica E publishes the highest quality papers in science. It&#8217;s a data-only journal and doesn&#8217;t completely scale to other journals. It charges 150 GBP for Gold Open Access and makes a margin. They have built their own authoring system which every crystallographer uses and the papers are full of checked, semantic data and there is high-quality peer review. It&#8217;s difficult to extrapolate but I think a figure of 500 GBP would be the MAXIMUM cost of an efficient scholarly publisher. I&#8217;d like to see the high price publishers challenge this. 
</p><p>Yesterday I was asked by a journalist (I won&#8217;t spoil their story) to comment on the UK Finch report. This hasn&#8217;t formally reported but there are some open readable minutes at <a href="http://www.researchinfonet.org/wp-content/uploads/2012/01/Minutes-mtg-2012-04-272.docx" title="blocked::http://www.researchinfonet.org/wp-content/uploads/2012/01/Minutes-mtg-2012-04-272.docx" target="_blank"><span style="font-family:Arial;font-size:10pt">http://www.researchinfonet.org/wp-content/uploads/2012/01/Minutes-mtg-2012-04-272.docx</span></a><span style="font-family:Arial;font-size:10pt"> and I was asked to comment on what I thought of the pricing , market, etc. I don&#8217;t have a strong view on Finch, but it says:
</span></p><p style="margin-left: 36pt"><em>The Working Group first considered the tables in the annex, which were founded on modelling undertaken for the Heading for the Open Road report. It was noted that the &#8216;central case&#8217; was a starting point under which APCs were set at a cost-neutral level for the HE sector in the UK of c£1,450 per article, with an assumed take-up rate of 23.3% for OA publications. All the tables therefore use that as a starting point, and vary the costs according to a series of different assumptions – some of which are obviously more realistic than others. The variability is determined by four factors: (i) the level of APCs, (ii) the level of take-up of the gold option, (iii) the difference between levels of take up in UK and rest of the world, and (iv) the proportion of APCs to be met by authors outside and within the UK for jointly-authored papers. The Group observed also that the £18.7m saving from subscription charges does not take account of &#8216;stickiness&#8217; in a transitional shift from subscription to APCs – which is liable to take a significant amount of time. Such a transition implies additional costs.
</em></p><p>I haven&#8217;t read the annexe and I cannot see how they can actually assess the costs since almost no publishers analyse and publish them. Some publishers have argued that costs can approach 20,000 USD because of high rejection rates. This is a typical example of an unregulated market. It&#8217;s like saying &#8220;we don&#8217;t have enough capacity on our buses so we are going to throw most passengers off and charge the others a huge amount to make our profits&#8221;. It&#8217;s a sign of a broken market.
</p><p>A typical example of how inefficient the industry is and how unresponsive to costs is that most publishers send the manuscripts off to be retyped – this is an appalling admission of lack of reaction to the 21<sup>st</sup> century. It&#8217;s like having to send Amazon a snail mail to order something. It&#8217;s because Amazon broke the model that we have an efficient, price-competitive market for goods. If the academic sector wished to reduce costs of Gold OA they should create a system with author-side cost reduction. If I was given the option of paying 1450 GBP for an APC or 500 GBP if I created it in NLM DTD XML I&#8217;d go for the latter. The NLM (which publishes PubMed) is a world authority on publishing and far more efficient than publishers. It has been highly innovative and the only brake on progress has been the relentless destructive legalisation against it and restrictive practices imposed by major toll-access publishers. That&#8217;s why we cannot get access to content-based search, not because they can&#8217;t do it.
</p><p>Anyway I wrote the following for the journalist. It echoes what I have written here:
</p><p>&#8220;What I am concerned about [and what I intend to blog about as soon as I have time] is the lack of regulation in this market.  In almost all transactions, whether author-&gt;publisher or publisher-&gt;reader there is no price-sensitive market. There is little market pressure on publishers to bring down costs, nor to produce better products. (Scholarly publishing is one of the very few sectors to be completely unaffected by the web &#8211; the product is an electronic copy of what was done 20 years ago). There is even less market force in the hybrid Gold model where publishers can charge what they like with no regulation &#8211; it is simply up to the funders or authors to pay what is demanded. Moreover the products offered are often not significantly different from Green &#8211; there are no rights of re-use and in some cases not even of copying.<br /><br />In areas such as transport, energy, banks, public services and many others the government regulates the market. Providers have to work within negotiated margins and provide an agreed level of service. None of this pressure is put on publishers. The market often resembles personal vanity products where only the brand matters and cost of production is irrelevant. <br /><br />My view is that any Green/Gold model will be a seriously suboptimal model until all the current cost (10 billion USD/yr) can be brought funder/author-side. This desperately needs regulation and strong leadership from bodies &#8211; probably governments and major funders. I don&#8217;t think Finch has addressed this at all &#8211; you cannot be convincing unless you demand a change of control and do the budgeting properly.<br /><br />I believe that even at 1500 GBP per paper this represents a seriously overpriced market. I think it might be brought down by bringing in public contractors / purchasers as is done in Brazil, I believe. 
Nothing could be more inefficient than leaving market forces to libraries in 10,000 scattered uncoordinated universities.<br />&#8221;
</p><p>So I am not getting excited about Finch unless the government (Willetts) does. AFAICS Finch says &#8220;we want a mixed Green/Gold model with the emphasis on Gold. We aren&#8217;t putting money in. We aren&#8217;t imposing regulation. We are not controlling prices related to costs.&#8221; And of course it&#8217;s only one country.
</p><p>#scholpub is now, at its worst, a vanity market such as fragrance or mineral water. The price is vastly higher than the cost. You ask what you can get, not what it costs. There is large, wasteful marketing, there is large and wasteful investment in technology and lawyers to prevent access. 
</p><p>So what&#8217;s the difference between Elsevier and Chanel? Not much. They are both unregulated.
</p><p>Oh, and stop thinking of publishers as collaborating partners. Alicia Wise on the GOAL Open Access mailing list asks &#8220;what can publishers do to help&#8221;. She asserts publicly that I don&#8217;t trust her. Actually I trust her completely. I trust her to behave like a middle manager public relations officer in &#8220;Customer Relations&#8221; for British Gas, or Scotrail or whomever. She is there to maximize profits for the company. And part of that is preserving the current pseudo-monopolies. I trust her to continue to try to defend that. And offering help is a well-used strategy.
</p><p>And she can trust me to challenge almost everything that Elsevier does, says, and more importantly doesn&#8217;t do.
</p><p>Stevan Harnad is dismayed that Elsevier has introduced a catch-22 in their Green regulations. It&#8217;s convoluted (well-designed Catch-22s are) and says something like &#8220;you can deposit Green, but if your institution mandates it then you cannot&#8221;. Stevan feels this is a breach of trust and that Elsevier should change it. I say that until this is regulated by a body with teeth we shall continue to have these games played by the publishers. If I travel to somewhere via London on British trains the price is higher. The cost is not higher.
</p><p>Think of Elsevier, Nature, Wiley, Springer, etc as gas, transport, telecoms, etc. They have no more reason to be loved or hated than those. 
</p><p>The sick part is that the trains have to pay for their fuel (and a lot else). In #scholpub we GIVE the publishing industry the content.
</p>]]></content:encoded>
			<wfw:commentRss>http://blogs.ch.cam.ac.uk/pmr/2012/05/25/scholpub-should-be-regulated/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		<feedburner:origLink>http://blogs.ch.cam.ac.uk/pmr/2012/05/25/scholpub-should-be-regulated/</feedburner:origLink></item>
		<item>
		<title>Today we mobilize the forces of academic Freedom</title>
		<link>http://feedproxy.google.com/~r/PetermrsBlog/~3/81e_jk3naLs/</link>
		<comments>http://blogs.ch.cam.ac.uk/pmr/2012/05/21/today-we-mobilize-the-forces-of-academic-freedom/#comments</comments>
		<pubDate>Mon, 21 May 2012 09:38:25 +0000</pubDate>
		<dc:creator>pm286</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.ch.cam.ac.uk/pmr/?p=3704</guid>
		<description><![CDATA[I&#8217;m attaching a mail that&#8217;s going round the academic twittersphere – mobilizing everyone to sign a WhiteHouse (US) petition requiring that all federally funded research be made publicly available. It&#8217;s a no-brainer. : Find the site below Sign Mail this message to your contacts Will it do any good? Yes. Every bit of publicity is [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m attaching a mail that&#8217;s going round the academic twittersphere – mobilizing everyone to sign a WhiteHouse (US) petition requiring that all federally funded research be made publicly available.
</p><p>It&#8217;s a no-brainer:
</p><ul><li>Find the site below
</li><li>Sign
</li><li>Mail this message to your contacts
</li></ul><p>Will it do any good?
</p><p>Yes. 
</p><p>Every bit of publicity is good and every indication of support helps. This isn&#8217;t asking you to occupy the streets. It&#8217;s simply, democratically, asking the US government to act.
</p><p>The US government has been flooded with contrary bills from vested interests (SOPA, ACTA, RWA, etc.) and public opinion has killed some of them and is chopping off the hydra heads as they emerge. It is unsustainable.
</p><p>Meanwhile we are now taking positive measures. It makes sense to everyone except those with a narrow view of corporate power over material they haven&#8217;t produced and have little moral right to control. 
</p><p>So, in simple words:
</p><p>ALL FEDERALLY FUNDED RESEARCH SHOULD BE MANDATORILY POSTED ON THE WEB.
</p><p>The funders want this to happen.
</p><p>The authors want this to happen.
</p><p>The readers (that&#8217;s YOU) want this to happen.
</p><p>The #scholarlyPoor want this to happen. 
</p><p>It&#8217;s not revolution. It&#8217;s our right and our responsibility.
</p><p><span style="font-family:Times New Roman;font-size:12pt">SIGN.
</span></p><p>
 </p><p style="margin-left: 36pt"><span style="font-family:Times New Roman;font-size:12pt"><em>On *Monday, May 21*, we lodge a petition on the White House&#8217;s &#8220;We the People&#8221; page asking the Obama administration to require that all federally funded research be posted on the Web – extending the principle of the NIH policy to all federal agencies.<br /><br />1. What We&#8217;re Asking<br /><br />· Publicity/ Call for Participation.  Please help line up publicity for the petition before Monday.  Specifically, can you help get it on the front pages of Reddit, Tumblr, Wikipedia, Boing Boing, and send out an all-hands-on-deck request through your own blogs/twitter feeds, etc?<br /><br />· 25,000 signatures in 30 days gets an official Administration response.  We want to hit that number fast to escalate this issue inside the White House.  We believe the policy has support but is stuck.  This could well be the event that gets it through.<br /><br />· Please sign the petition on Monday.<br /><br />2. Social Media links/handles
</em></span></p><p style="margin-left: 36pt"><span style="font-family:Times New Roman;font-size:12pt"><em><br />The official campaign website is at <a href="http://access2research.org/" target="_blank"><span style="color:#1155cc;text-decoration:underline">http://access2research.org</span></a> and there are already Facebook pages (<a href="http://facebook.com/access2research" target="_blank"><span style="color:#1155cc;text-decoration:underline">http://facebook.com/access2research</span></a>) and Twitter handles (@access2research) in place. 
</em></span></p><p style="margin-left: 36pt"><span style="font-family:Times New Roman;font-size:12pt"><em>3. Petition Text (800 character limit)<br /><br />WE PETITION THE OBAMA ADMINISTRATION TO: [This doesn't count toward the character count]<br /><br />Require free access over the Internet to journal articles arising from taxpayer-funded research.<br /><br />We believe in the power of the Internet to foster innovation, research, and education. Requiring the published results of taxpayer-funded research to be posted on the Internet would give access to entrepreneurs, researchers, patients, caregivers, and students, who currently are blocked by high costs. We know this works without disturbing the process of scientific publishing because the National Institutes of Health is already doing it through its highly successful Public Access Policy. All other federal agencies that fund research should have similar policies.<br /><br />President Obama, please act now to make federally-funded research freely available to taxpayers on the Internet.<br /><br />4. The Ask to Others<br /><br />To sign the petition:<br /><br />-   Have to be 13 years or older<br />-   Have to create an account on <a href="http://whitehouse.gov/" target="_blank"><span style="color:blue;text-decoration:underline">whitehouse.gov</span></a>, which requires giving a name and an email address and then clicking the validation link sent to that address<br />-   Click to sign<br /><br />5. Further Context
</em></span></p><p style="margin-left: 36pt">
 </p><p style="margin-left: 36pt"><span style="font-family:Times New Roman;font-size:12pt"><em>After years of work on promoting policy change to make federally-funded research available on the Internet, and after winning the battle to implement a public access policy at NIH, it has become clear that being on the right side of the issue is necessary but not sufficient. We&#8217;ve had the meetings, done the hearings, replied to the requests for information.<br /><br />But we&#8217;re opposed in our work by a small set of publishers who profit enormously from the existing system, even though there is no evidence that the NIH policy has had any measurable impact on their business models. They can &#8211; and do &#8211; outspend those of us who have chosen to make a huge part of our daily work the expansion of access to knowledge. This puts the idea of access at a disadvantage. We know there is a serious debate about the extension of public access to taxpayer funded research going on right now in the White House, but we also know that we need more than our current  approaches to get that extension made into federal policy.<br /><br />The best approach that we have yet to try is to make a broad public appeal for support, straight to the people. The Obama Administration has created a web platform to petition the White House directly called We The People. Any petition receiving more than 25,000 digital signatures is placed on the desk of the President&#8217;s Chief of Staff and must be integrated into policy and political discussions. But there&#8217;s a catch &#8211; a petition only has 30 days to gather the required number of signatures to qualify.<br /><br />We can get 25,000 signatures. And if we not only get 25,000, but an order of magnitude more, we can change the debate happening right now.<br /><br />Next week we will publish our petition and the 30 day cycle begins. 
What we&#8217;re asking you to do is to leverage your personal and professional networks to get the word out.<br /><br />You can do this in any way that makes you feel comfortable. A blog post, an email to constituencies, a tweet, a facebook share, you name it &#8211; something that tells thousands of people &#8220;I support this petition, I&#8217;m signing this petition, and I thought you should know about it too.&#8221; Because this isn&#8217;t just slacktivism with a &#8220;like&#8221; or a retweet &#8211; people need to go to the White House website, enter their name and email address, and hit the button.<br /><br />Qualified signers must be 13 years old or more, and have a valid email address. That&#8217;s all.<br /><br />The goal is not just to get 25,000, but to get far more to show the White House that this issue matters to people, not just a few publishers.<br /><br />We are launching the campaign on Monday May 21. The petition will go live late Sunday night May 20, so that the waves can start in the EU and sweep west with the sunrise. We&#8217;re asking you to turn on your networks on Monday morning.<br /><br />Thanks for considering this. If we can all come together to get the word out at once, and stay behind it for 30 days, we have a real chance to get access to taxpayer funded research across the entire government, and send a signal that the people have a voice in this debate, not just publishers and activists.
</em></span></p>]]></content:encoded>
			<wfw:commentRss>http://blogs.ch.cam.ac.uk/pmr/2012/05/21/today-we-mobilize-the-forces-of-academic-freedom/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blogs.ch.cam.ac.uk/pmr/2012/05/21/today-we-mobilize-the-forces-of-academic-freedom/</feedburner:origLink></item>
		<item>
		<title>#scholpub , Maxwell and the Laws of Acadynamics</title>
		<link>http://feedproxy.google.com/~r/PetermrsBlog/~3/BrQKW1lSnTY/</link>
		<comments>http://blogs.ch.cam.ac.uk/pmr/2012/05/15/scholpub-maxwell-and-the-laws-of-acadynamics/#comments</comments>
		<pubDate>Tue, 15 May 2012 21:08:02 +0000</pubDate>
		<dc:creator>pm286</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.ch.cam.ac.uk/pmr/?p=3701</guid>
		<description><![CDATA[For many days we have been discussing #scholpub on the GOAL mailing list, run by Richard Poynder. Some important issues are coming up and there is now a healthy divergence of views which RP runs well. I&#8217;ll talk more later, I hope. In the time between trying to content-mine PDF (yes, more later), I thought [...]]]></description>
			<content:encoded><![CDATA[<p>For many days we have been discussing #scholpub on the GOAL mailing list, run by Richard Poynder. Some important issues are coming up and there is now a healthy divergence of views which RP runs well. I&#8217;ll talk more later, I hope.
</p><p>In the time between trying to content-mine PDF (yes, more later), I thought about the tragedy of the academic commons. We have 10,000,000,000 USD (count the zeros) of mainly public money and student fees to &#8220;buy&#8221; the #scholpub we produce. That&#8217;s a sizable market. It&#8217;s not as large as many, but quite enough to run competently and for the benefit of everyone. 
</p><p>Including the #scholarlypoor
</p><p>But we don&#8217;t. #scholpub is the most inefficient &#8220;market&#8221; in the world. (No, perhaps arms procurement is worse.) I&#8217;ll analyse more in a later post. Hint: here&#8217;s the answer to my question:
</p><p style="margin-left: 36pt"><em>&#8220;What&#8217;s the difference between Elsevier and British Gas (or Central Trains, or Scottish Power or umpteen more)?&#8221;
</em></p><p><strong>Answer: There is no regulator for #scholpub.
</strong></p><p>I wondered why. Basically because academia is 10,000 institutions all going in different directions. 
</p><p>In molecular sciences these particles obey a Maxwellian distribution. Some fast, some slow, some east, some west, some north, some south, some up, some down. Occasionally they bump into each other, but they are basically uncoordinated. 
</p><p>And they give rise to the laws of thermodynamics. The analogy that follows has some merit – I am still working it out – feel free to contribute: The laws in their formal form are not easily accessible but there&#8217;s a witty synopsis (<a href="http://en.wikiquote.org/wiki/Thermodynamics">http://en.wikiquote.org/wiki/Thermodynamics</a> ) 
</p><p>    0     You have to play the game
</p><ol style="margin-left: 54pt"><li>You can&#8217;t win; you can only break even.
</li><li>You can only break even at absolute zero.
</li><li>You can&#8217;t reach absolute zero.
</li></ol><p>Law 1 says you can move resources (heat and work) around and that you conserve energy
</p><p>Law 2 says that there are inefficiencies in the system (loss of useful energy) which only disappear at absolute zero (the lowest possible temperature)
</p><p>Law 3 is obvious
</p><p>
 </p><p>So I thought – there is ten billion dollars in the system. It can be moved around. There are inefficiencies in the system, but if we work together we can achieve high efficiency. And then? The sad truth. So I proposed 3 laws. They are raw, you are welcome to tune the wording. But they are roughly based on the three laws of Thermodynamics and perhaps there is a zeroth here:
</p><p style="margin-left: 36pt">0. There is a lot of money in the academic #scholpub system
</p><p style="margin-left: 36pt">1. We can change the system by moving money around
</p><p style="margin-left: 36pt">2. To do this academics must collaborate
</p><p style="margin-left: 36pt">3. Academics will never collaborate
 </p><p>And when I published them Jan Velterop came up with the lovely &#8220;Laws of Acadynamics&#8221;. Thanks Jan.
</p><p>Now there is a way to get round the Second Law. Maxwell&#8217;s Demon (<a href="http://en.wikipedia.org/wiki/Maxwell%27s_demon">http://en.wikipedia.org/wiki/Maxwell%27s_demon</a> ) . A superintelligent being that bats individual molecules around.  Organizes Universities to point in the same direction. Yes, we need a Maxwell demon.
</p><p>But haven&#8217;t we had a Maxwell Demon already in #scholpub?
</p>]]></content:encoded>
			<wfw:commentRss>http://blogs.ch.cam.ac.uk/pmr/2012/05/15/scholpub-maxwell-and-the-laws-of-acadynamics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blogs.ch.cam.ac.uk/pmr/2012/05/15/scholpub-maxwell-and-the-laws-of-acadynamics/</feedburner:origLink></item>
		<item>
		<title>What’s the difference between Elsevier and British Gas?</title>
		<link>http://feedproxy.google.com/~r/PetermrsBlog/~3/sZVTJ7yPqdU/</link>
		<comments>http://blogs.ch.cam.ac.uk/pmr/2012/05/12/whats-the-difference-between-elsevier-and-british-gas/#comments</comments>
		<pubDate>Sat, 12 May 2012 09:51:18 +0000</pubDate>
		<dc:creator>pm286</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.ch.cam.ac.uk/pmr/?p=3698</guid>
		<description><![CDATA[This is a serious question and I have a serious answer. See if you can guess it. If so add a comment. You can substitute &#8220;FooPub&#8221; for &#8220;Elsevier&#8221; where FooPub is any #scholpub such as ACS, PLoS, Wiley, BMC, etc. You can substitute Eastern Water, Scottish Power, First Capital Connect (a train operator) and many [...]]]></description>
			<content:encoded><![CDATA[<p>This is a serious question and I have a serious answer. See if you can guess it. If so add a comment.
</p><p>You can substitute &#8220;FooPub&#8221; for &#8220;Elsevier&#8221; where FooPub is any #scholpub such as ACS, PLoS, Wiley, BMC, etc.
</p><p>You can substitute Eastern Water, Scottish Power, First Capital Connect (a train operator) and many others for &#8220;British Gas&#8221;.
</p><p>I shall continue to turn my attention to content-mining in the next few posts.
</p><p>
 </p><p>
 </p><img src="http://feeds.feedburner.com/~r/PetermrsBlog/~4/sZVTJ7yPqdU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.ch.cam.ac.uk/pmr/2012/05/12/whats-the-difference-between-elsevier-and-british-gas/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://blogs.ch.cam.ac.uk/pmr/2012/05/12/whats-the-difference-between-elsevier-and-british-gas/</feedburner:origLink></item>
		<item>
		<title>Data are part of the future; the OKFN’s contribution</title>
		<link>http://feedproxy.google.com/~r/PetermrsBlog/~3/woOSQF1hub4/</link>
		<comments>http://blogs.ch.cam.ac.uk/pmr/2012/05/08/data-are-part-of-the-future-the-okfn%e2%80%99s-contribution/#comments</comments>
		<pubDate>Tue, 08 May 2012 11:58:37 +0000</pubDate>
		<dc:creator>pm286</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.ch.cam.ac.uk/pmr/2012/05/08/data-are-part-of-the-future-the-okfn%e2%80%99s-contribution/</guid>
		<description><![CDATA[I am really excited about the OKF&#8217;s commitment to data. Most data is lost, badly produced, unclear, etc. The OKFN-P2PU School of Data intends to create a new approach to education for the data-age. I&#8217;m very excited to be part of this. Don&#8217;t have time to do more than advertise: http://blog.okfn.org/2012/05/08/were-recruiting/ The Open Knowledge Foundation [...]]]></description>
			<content:encoded><![CDATA[<p>I am really excited about the OKF&#8217;s commitment to data. 
</p><p>Most data is lost, badly produced, unclear, etc. The OKFN-P2PU School of Data intends to create a new approach to education for the data-age. I&#8217;m very excited to be part of this.
</p><p>Don&#8217;t have time to do more than advertise:
</p><p><a href="http://blog.okfn.org/2012/05/08/were-recruiting/">http://blog.okfn.org/2012/05/08/were-recruiting/</a>
	</p><p><strong>The Open Knowledge Foundation are currently recruiting for a Data Wrangler and a Data Visualisation Developer. If you&#8217;d like to join our team, please visit our <a href="http://okfn.org/jobs/">jobs page</a>.</strong>
	</p><p><img src="http://blogs.ch.cam.ac.uk/pmr/files/2012/05/050812_1158_Dataarepart1.jpg" alt="" />
	</p><p>At the Open Knowledge Foundation, we build tools and communities to create, use and share open knowledge – and to help others to do the same. In recent months, we have become involved in a growing number of open data projects, and two new positions have now been created within our team.
</p><p>We are seeking two data experts to join us as a <strong>Data Wrangler</strong> and a <strong>Data Visualisation Developer</strong>. Read on to find out more about what the roles involve.
</p><h2>Data Wrangler
</h2><p>We&#8217;re looking for a data wrangler who is excited to tell stories through data. You will work on various datasets, to understand them and to tell their story to a broader audience. You will also be involved in training efforts, creating and teaching courses in data analysis to technical and non-technical audiences.
</p><p>Your role will be exciting and varied, and will include:
</p><ul><li>Work on the School of Data, building learning challenges and course content (see our <a href="http://blog.okfn.org/2012/02/08/announcing-the-school-of-data/">previous post</a> for more information on the School)
</li><li>Research for our new data blog, coming soon.
</li><li>Collaborations with our Working Groups, for example the <a href="http://openeconomics.net/">Working Group on Open Economics</a>
		</li><li>Work on <a href="http://openspending.org/">OpenSpending</a>, one of our flagship projects.
</li></ul><h3>Skills
</h3><p>We are open to people from a wide variety of backgrounds; whether coding, visualisation, journalistic, statistical or otherwise. We are seeking someone who has:
</p><ul><li>Experience in data analysis and statistical methods
</li><li>Experience with data cleansing, ETL patterns
</li><li>Good written communication skills
</li><li>Experience with R/Stata/SPSS
</li><li>Coding skill in a modern script language, e.g. Python, Javascript.
</li><li>Basic skills in information/data visualization
</li></ul><p>If that sounds like you, please visit our <a href="http://okfn.org/jobs/">jobs page</a> to find out more.
</p><h2>Data Visualisation Developer
</h2><p>As a Data Visualisation Developer, much of your time will be spent on our flagship <a href="http://openspending.org/">OpenSpending</a> project.
</p><p>OpenSpending is about mapping the money. We want to make government finances accessible to advocates, journalists and citizens. Our goal is to collect budgeting information from across the world and to present it in a form that promotes understanding, analysis and participation. Some of the questions we ask are:
</p><ul><li>How much is government spending on health? Is expenditure growing or shrinking? How does this translate into results?
</li><li>What are the proportions of different government programmes? What is spending on prisons compared to schools? How much is Ghana spending on education compared to Nigeria?
</li><li>How much tax do I pay into which area of government?
</li></ul><p>Our day-to-day work has many facets. We work on the core platform, undertake journalistic projects as part of &#8220;Spending Stories&#8221;, which won the Knight News Challenge in 2011, and work with organizations and civic activists world-wide to set up local budget transparency projects.
</p><h3>Your role with us
</h3><p>You&#8217;ll help us to create new visualizations to answer spending questions through meaningful, visual narration.
</p><p>Skills we&#8217;re looking for:
</p><ul><li>Strong visual design skills
</li><li>HTML5/Javascript visualisation experience
</li><li>Familiarity with several visualization toolkits (e.g. D3, Raphael)
</li><li>Experience with cross-browser compatibility
</li><li>Plus (but optional): Knowledge of Python 
</li></ul><p>Basically: send us some demos of good stuff you&#8217;ve done.
</p><img src="http://feeds.feedburner.com/~r/PetermrsBlog/~4/woOSQF1hub4" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.ch.cam.ac.uk/pmr/2012/05/08/data-are-part-of-the-future-the-okfn%e2%80%99s-contribution/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blogs.ch.cam.ac.uk/pmr/2012/05/08/data-are-part-of-the-future-the-okfn%e2%80%99s-contribution/</feedburner:origLink></item>
		<item>
		<title>Towards a manifesto on Open Mining of scholarship</title>
		<link>http://feedproxy.google.com/~r/PetermrsBlog/~3/myHdGAdilgU/</link>
		<comments>http://blogs.ch.cam.ac.uk/pmr/2012/05/01/towards-a-manifesto-on-open-mining-of-scholarship/#comments</comments>
		<pubDate>Tue, 01 May 2012 18:00:04 +0000</pubDate>
		<dc:creator>pm286</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.ch.cam.ac.uk/pmr/2012/05/01/towards-a-manifesto-on-textmining/</guid>
		<description><![CDATA[Tomorrow a small group of people interested in &#8220;textmining&#8221; will have a Skype meeting under the auspices of the OKFN. We have sort-of-pushed this agenda for some years and now it&#8217;s come to fruition – there is clear public awareness of the value of textmining and the barriers that prevent it being used. Indeed my [...]]]></description>
			<content:encoded><![CDATA[<p>Tomorrow a small group of people interested in &#8220;textmining&#8221; will have a Skype meeting under the auspices of the OKFN. We have sort-of-pushed this agenda for some years and now it&#8217;s come to fruition – there is clear public awareness of the value of textmining and the barriers that prevent it being used. Indeed my blog has even got mentioned in a financial analyst&#8217;s review of Elsevier (the implication being that if Elsevier continues to drag their feet, their market will react against them). Of course it&#8217;s not just Elsevier, but they are the ones that have had most prominence. So this post is to prepare my mind and hopefully come out with some useful ideas.
</p><p>There is no doubt that the lack of positive approaches to textmining is having huge costs:
</p><ul style="margin-left: 38pt"><li><strong>Opportunity</strong>. We cannot do the things that we want to. Moreover this stifles the imagination of the rest of the community – without exciting examples of what can be done – and they *are* exciting – people do not realise what they are missing. And that&#8217;s all of us, not just subscribers to journals.
</li><li><strong>Wasted time</strong>. Anyone wishing to do textmining has to spend huge amounts of time trying to get permissions, worrying about being taken to court, and simply waiting for null responses.
</li><li><strong>Bad science</strong>. Much published scientific data is flawed. Not necessarily deliberately, but by the outdated methods of publication. Almost no scientific data are reviewed (a few publishers like Int. Union of Crystallography are shining exceptions). And their tools have unearthed bad and fraudulent science. There is no reason to believe it is different elsewhere – in fact I suspect it&#8217;s worse – the chance of getting caught is often near zero. Textmining is a major tool in data review.
</li><li><strong>Unexploited information and products</strong>. Google et al. have shown that there are huge new markets. There is undoubtedly a large market in downstream information and information products from scientific research. I estimate it at low billions for chemistry alone.
</li><li><strong>Bad policy decisions</strong>. If the scientific literature is not used fully then decisions are flawed. These range from new drugs, to climate, to the effects of chemicals to… Machines can provide decision support that complements humans.
</li><li><strong>Bad scholarship and bad scholarly relations</strong>. When a new technology emerges of benefit to scholarship then its wilful prevention for non-scholarly reasons has harmful effects on the whole community. It&#8217;s fair to say that many textminers see publishers as a major problem: organisations solely bent on making money by restrictive practices.
</li></ul><p>There are more – but that should be more than enough to build an overwhelming case.
</p><p><strong>Now what is &#8220;textmining&#8221;?</strong> The word is very unfortunate for several reasons:
</p><ul><li>There are specific legal aspects of text which may differ from other forms of information.
</li><li>There is a confusion with &#8220;fulltext&#8221;.
</li><li>It suggests that only the words in scholarship are involved. This is particularly damaging since much information is conveyed in images, diagrams, audio and video (in fact all of the major MIME-types!). For example commercial publishers often forbid the re-use of diagrams or charge large amounts because artistic images have special protection under copyright.
</li></ul><p>I would like to see a more general term – perhaps &#8220;information mining&#8221; (IM) which covers all the types above and also &#8220;data&#8221;. Or possibly &#8220;publication mining&#8221;. It would be a disaster if we only agreed how to manage &#8220;text&#8221; and left the rest unchallenged.
</p><p>Some technical background. (I actually suspect that most of the people who make the rules about IM  (libraries, publishers) haven&#8217;t a clue how it&#8217;s done). Simply:
</p><ul><li>You write (or borrow) a program that retrieves the things you want to mine. A simple F/OSS one is called wget. Ours (Nick Day, Sam Adams) is called &#8220;PubCrawler&#8221; and has been specially built for crawling scholarly publications. You point it at a website and it systematically retrieves files/pages one-by-one. The only problem is that if you do this too quickly then it may overload the website, <strong>so responsible crawlers have a delay (perhaps 5 seconds) between requests – POINT 1.</strong> The argument that textmining will destroy servers is a smokescreen. (There are many ways of avoiding technical problems). Note that if you already have the papers on a local machine this step is unnecessary. Universities create caches to avoid repeated downloads but publishers want the downloads so they can count-the-clicks.<strong> This process does NOT violate copyright </strong>though it may technically violate the restrictive publisher contracts that Universities have signed.
</li><li>You have another program that mines information from each paper. This is hard and tedious to write but once done is automatic to run. How well it performs depends on many factors (the format of the paper, the language/style of the journal/authors, the use of dumb (GIF/PNG) or semi-semantic (SVG) diagrams, etc.). For text you could use Lucene – an Apache project. Daniel Lowe has shown that it&#8217;s possible to mine 500,000 chemical reactions from US patents using our F/OSS OSCAR/OPSIN/ChemicalTagger and the NIH&#8217;s OSRA for chemical diagrams. Things are better than they were 5 years ago and I am fairly hopeful about the technical mass-mining of chemistry. <strong>This process does NOT violate copyright </strong>though it may technically violate the restrictive publisher contracts that Universities have signed.
</li><li>You publish your results. Here there is a potential problem with copyright although I suspect it has never been tested. I suspect anything less than bulk republishing of verbatim full-text would be allowable in many courts. In particular republishing &#8220;factual&#8221; information would incur no legal penalties, whether or not for commercial purposes.
</li></ul><p><strong>The miner&#8217;s problem.
</strong></p><p>Simply stated:
</p><ul><li>IM MIGHT fall foul of copyright law. Because of the risk-averseness of libraries and the pressure from some publishers to limit activities such as UK/PMC no authorities are prepared to challenge or test this. Individual researchers are left to make their own judgments, with little hope that they will get support from institutions. This canopy of fear is a dampener for research.
</li><li>There are NO explicit rules. Because of this researchers do not know what they can and cannot do. Logic does NOT work in courts of law – only laws and precedent. People who make facile assertions that you can/not do something only muddy the waters.
</li><li>It MIGHT fall foul of database laws such as <em>sui generis</em> in Europe. Again, in our risk-averse culture no-one offers support to challenge this.
</li><li>It probably WILL fall foul of the Publisher-imposed extensions to University contracts. These are basically unethical and imposed solely (IMO) for protecting the market.
</li></ul><p>Simply stated: Miners need clear, simple, permanent, automatic answers so they know what they can and cannot do. 
</p><p><strong><em>Researchers are responsible people</em></strong>. There are many places where research has to take account of law and there are very few public breaches. The same should be assumed for IM. 
</p><p><strong>The publishers&#8217; problem.
</strong></p><p>The primary problem is that publishers now have a market (not necessarily of their own making) which is profitable and where change may bring problems. The flip-side, that IM may bring benefits, is never mentioned! Thus Richard Kidd of the Royal Soc. Chemistry has voiced on this blog the fear that my textmining may undermine the RSC&#8217;s viability, and he wants an assurance that I won&#8217;t do anything to harm their income. I think of all publishers in the world the RSC is best placed to <strong>benefit massively</strong> from IM instead of preventing it happening.
</p><p>This is a typical problem with monopolies (which the publishers have). They want to see their income continue indefinitely in the same way rather than changing their models. It&#8217;s natural, and history shows it&#8217;s ultimately doomed. Only the conservatism of academia (see Michael Eisen&#8217;s blog) keeps them in business. Whether or not we take the publishers&#8217; interests into account depends on the worth that society gives to their services – and that is changing rapidly.
</p><p>There is no natural law that says we do or don&#8217;t have to accommodate the publishers, whether or not they are learned socs.  <strong>They no longer have the moral right to control unilaterally how scientific knowledge is published and used</strong>. There has been no constructive debate in this area and publishers should think about their source of material and its volatility.
</p><p><strong>The libraries&#8217; problem.
</strong></p><p>This is a completely new technology which is opaque to many libraries. There are, of course, some world-leaders in information management, especially the NLM and national libraries, but the average University has no experience of either the technology or the law. This makes it problematic when publishers suggest that text-miners should go through their libraries and have joint discussions with publishers. This is counterproductive as it drastically slows the process and means that many of the decisions are made by non-practitioners. [I have so far written several times to my librarian and am waiting for a reply]. The rigmarole that Elsevier put Heather Piwowar through with UBC librarians is out of order and in any case doesn&#8217;t scale across publishers, libraries or researchers. 
</p><p><strong>Current concerns and why we need principles
</strong></p><p>There is a high probability that some well-intentioned academics will &#8220;negotiate&#8221; terms with publishers which are then used as a precedent to constrain everyone else. I, for example, am unwilling to accept the terms that UBC have. For that reason we are setting out principles, which we believe are absolute and which will inform the practices and their adoption. In the spirit of the excellently crafted BOAI and other declarations we are working towards words which will last for decades.
</p><p>Bases of the principles:
</p><ul style="margin-left: 38pt"><li>The scholarly literature is created to inform and enlighten humankind. Authors expect that their material will be used as widely, in as many ways, and by as many people as possible.
</li><li>Information mining is a natural and major advance in the use of the scholarly literature and brings very large benefits. 
</li><li>The only inexorable laws relating to IM are copyright and database rights. These were not designed to restrict the flow of scholarship and should not be used for this purpose.
</li><li>Subscribers to the scholarly literature are responsible people and will not deliberately break the law. They need a globally published set of principles by which they can determine what they may do.
</li><li>Technology and human attitudes are changing rapidly and we should be positively and proactively responsive to them. We cannot and should not try to guess the future and we should not jeopardise it by short-term considerations.
</li></ul><p>And perhaps a single definition. I suggest the term &#8220;Open Mining&#8221; as inclusive. Note that these principles are statements of what we wish to be the case, not a negotiation. BBB are  statements of aspiration.
</p><ul><li>&#8220;By Open-mining  we mean the unrestricted use of machines to extract, process and republish content in whatever form (text, diagrams, images, data, audio, video, etc.) without prior specific permissions other than community norms of responsible behaviour in the electronic age.&#8221;
</li></ul><p>&#8220;Responsible behaviour&#8221; and &#8220;community norms&#8221; cover matters such as server overloading, personal data, deliberate corruption, and adherence to generally accepted Internet practice.
</p><p>That&#8217;s the aspiration. BBB are aspirations. Some scholars and some publishers have adopted them enthusiastically. They have helped enormously.
</p><p>
 </p><p>
 </p><p> 
 </p><p>
 </p><img src="http://feeds.feedburner.com/~r/PetermrsBlog/~4/myHdGAdilgU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.ch.cam.ac.uk/pmr/2012/05/01/towards-a-manifesto-on-open-mining-of-scholarship/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		<feedburner:origLink>http://blogs.ch.cam.ac.uk/pmr/2012/05/01/towards-a-manifesto-on-open-mining-of-scholarship/</feedburner:origLink></item>
		<item>
		<title>A pictorial Amusement</title>
		<link>http://feedproxy.google.com/~r/PetermrsBlog/~3/NMuH6h6Qz4w/</link>
		<comments>http://blogs.ch.cam.ac.uk/pmr/2012/04/30/a-pictorial-amusement/#comments</comments>
		<pubDate>Mon, 30 Apr 2012 17:55:06 +0000</pubDate>
		<dc:creator>pm286</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.ch.cam.ac.uk/pmr/2012/04/30/a-pictorial-amusement/</guid>
		<description><![CDATA[I dropped in to see our computer officers today – they&#8217;ve just had an aircon failure and I was offering sympathy – they have a lot to deal with. While there I noticed this splendid spanner (== wrench/US). I love tools and this one has a majesty of its own in a computer office. It&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://blogs.ch.cam.ac.uk/pmr/files/2012/04/043012_1754_ApictorialA1.jpg" alt="" />
	</p><p>I dropped in to see our computer officers today – they&#8217;ve just had an aircon failure and I was offering sympathy – they have a lot to deal with.  While there I noticed this splendid spanner (== wrench/US). I love tools and this one has a majesty of its own in a computer office. It&#8217;s about 40 cm long (see ruler) and we guess it&#8217;s about 2 kilos.
</p><p>I naturally assumed it was for something like bolting units to the floor, but that&#8217;s not why it was ordered. The reason is gently amusing – perhaps you can make some guesses.
</p><p>Meanwhile tomorrow I&#8217;ll be blogging about text-mining. I&#8217;ve been hacking code furiously over the last 5 days and feeling it. There is a lot I need to write about but textmining is the priority.
</p><p>
 </p><img src="http://feeds.feedburner.com/~r/PetermrsBlog/~4/NMuH6h6Qz4w" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.ch.cam.ac.uk/pmr/2012/04/30/a-pictorial-amusement/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://blogs.ch.cam.ac.uk/pmr/2012/04/30/a-pictorial-amusement/</feedburner:origLink></item>
		<item>
		<title>Text-mining the scholarly literature: towards a set of universal Principles; Update and strategy</title>
		<link>http://feedproxy.google.com/~r/PetermrsBlog/~3/b_65BMjoapE/</link>
		<comments>http://blogs.ch.cam.ac.uk/pmr/2012/04/25/text-mining-the-scholarly-literature-towards-a-set-of-universal-principles-update-and-strategy/#comments</comments>
		<pubDate>Wed, 25 Apr 2012 08:51:44 +0000</pubDate>
		<dc:creator>pm286</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.ch.cam.ac.uk/pmr/2012/04/25/text-mining-the-scholarly-literature-towards-a-set-of-universal-principles-update-and-strategy/</guid>
		<description><![CDATA[For some years I have seen the primary literature as an enormous untapped resource of scholarly information. We humans are very good at some aspects of &#8220;reading the literature&#8221; but there are many areas where machines are better and should be used. These include scale (hundreds of thousands of manuscripts), checking, validation, transformation (e.g. scientific [...]]]></description>
			<content:encoded><![CDATA[<p>For some years I have seen the primary literature as an enormous untapped resource of scholarly information. We humans are very good at some aspects of &#8220;reading the literature&#8221; but there are many areas where machines are better and should be used. These include scale (hundreds of thousands of manuscripts), checking, validation, transformation (e.g. scientific units), deduction (many papers have implicit semantics), aggregation of knowledge, and much more. We are now reaching the time when the technology of &#8220;text-mining&#8221; is mature enough to  deploy and, for example, my group and I have developed among the best tools in the world for mining chemistry. I am now expanding that to other fields which I will describe in later posts. 
</p><p>In general the readers of the scholarly literature (who may include the #scholarlypoor) have been seriously frustrated by the restrictions imposed by publishers and universally agreed by librarians. Most subscriptions to most major journals have terms forbidding readers to mine/crawl/index/extract etc. This is not a consequence of copyright – it is an additional restriction imposed by publishers and apparently automatically assented to by academic purchasing systems (mainly libraries). This automatic assent has done scholarship a grave disservice, so I give the library community a chance to correct the historical record:
</p><p><em>Has any library ever publicly challenged the terms of use [on mining] set by publishers? </em>I haven&#8217;t seen any.  But I&#8217;d be grateful to know public cases, and what happened. My current view is that publishers set conditions and that libraries accept them verbatim, which, unfortunately, means that they don&#8217;t have a track record of fighting for text-mining or other freedoms. <em>
		</em></p><p>Moving on, the UK Hargreaves report has recommended removing these restrictions (which are not legally required) and also modifying copyright law. My grapevine suggests there is a high probability that significant changes will be made and that &#8220;text-mining&#8221; will become widely available without requiring explicit permission. We should prepare for this, and any responsible publisher and library/purchaser should be preparing for this. 
</p><p>A month ago I and colleagues in OKF submitted cases to the Hargreaves process. As part of that I asked 6 major publishers whether I could &#8220;text-mine&#8221; their journals. Naomi Lillie of OKF is summarising the results and I will keep you in suspense till then. It&#8217;s fair to say some were helpful, some were not and some were fuzzy (for whatever motivation).  
</p><p>A number of publishers said <strong>we should discuss it with the library. There is no need for this</strong>. My group and I can text-mine material by ourselves – in one week Daniel Lowe extracted 500,000 chemical reactions from the US Patent Office without needing any help. Nick Day has built PubCrawler and extracted 200,000 crystal structures from supplemental information without any help. The only thing I need is:
</p><ul><li>An assurance I won&#8217;t be sued for behaving like a responsible scholar
</li><li>An assurance that my institution won&#8217;t get cut off for (my) responsible behaviour
</li></ul><p>In case anyone in the publishing or library communities doesn&#8217;t understand what &#8220;responsible&#8221; means, it means:
</p><ul><li>I do not intend deliberately to re-publish the publishers&#8217; manuscripts (&#8220;the PDF&#8221;) in bulk without valid scholarly reason.
</li></ul><p>I am a responsible scholar. I conform to health and safety. I obey the law of the UK. I do not steal. I can justify the expenditures on my grants. I attempt to value and promote human equality in my scholarship. I try to give credit where it is due. Responsible scholarship is a fundamental principle which I believe applies to almost all readers of the scholarly literature. Occasionally I and others fail – there are ample mechanisms for addressing these without forbidding textmining.
</p><p><strong>So this post asserts my absolute right as a subscriber to the scholarly literature to carry out textmining and to disseminate the results to anyone. I do not need any other permissions.
</strong></p><p>A number of details follow which I&#8217;ll address in later posts.
</p><p>At present, therefore, a group of us – under the aegis of the Open Knowledge Foundation – is drafting a set of principles for textmining. They include:
</p><ul><li>Heather Piwowar. Heather has written several blogposts (<a href="http://researchremix.wordpress.com/">http://researchremix.wordpress.com/</a> ) about text-mining. They include negotiations with Elsevier (which include the need for Elsevier and librarians to give her permission) and more recently a manifesto (<a href="http://researchremix.wordpress.com/2012/04/20/new-fron/">http://researchremix.wordpress.com/2012/04/20/new-fron/</a> ).
</li><li>Maximilian Haussler. See (<a href="http://blogs.ch.cam.ac.uk/pmr/2012/03/09/textmining-update-max-haussler%E2%80%99s-questions-to-publishers-they-have-a-duty-to-reply/">http://blogs.ch.cam.ac.uk/pmr/2012/03/09/textmining-update-max-haussler%E2%80%99s-questions-to-publishers-they-have-a-duty-to-reply/</a>). Max was quoted 85,000 USD by NPG to mine their content (I think this has been altered to 0?). He and colleagues have fought for the right and he has submitted a detailed case to the US government.
</li><li>Diane Cabell and Jenny Molloy, OKF. Diane is a specialist in intellectual property law and has helped to craft the <a href="http://science.okfn.org/2012/03/21/response-to-ipo-consultation-on-text-mining-copyright-exception/">OKF open-science response to Hargreaves</a>. 
</li><li>Ross Mounce. Panton fellow (<a href="http://about.me/rossmounce">http://about.me/rossmounce</a>). Ross has created a superb and damning summary of publishers&#8217; distortion of the term &#8220;Open Access&#8221; in paid hybrid journals. Ross and I are now working on the technology and strategy of textmining.
</li></ul><p>We shall come up with a manifesto/set-of-principles. This will be a statement of our rights and our responsibilities. It is not a negotiation, any more than Tom Paine or the Founding Fathers negotiated in the construction of their declarations. Or, more recently, the BBB declarations of Open Access. Those declarations are priceless – it&#8217;s just a pity that too few believe in them enough to push for their universal acceptance. We shall not make the same mistake with the principles of textmining.
</p><p>
 </p><img src="http://feeds.feedburner.com/~r/PetermrsBlog/~4/b_65BMjoapE" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.ch.cam.ac.uk/pmr/2012/04/25/text-mining-the-scholarly-literature-towards-a-set-of-universal-principles-update-and-strategy/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		<feedburner:origLink>http://blogs.ch.cam.ac.uk/pmr/2012/04/25/text-mining-the-scholarly-literature-towards-a-set-of-universal-principles-update-and-strategy/</feedburner:origLink></item>
		<item>
		<title>Panton Fellows, Principles in Japanese, #pantonscience</title>
		<link>http://feedproxy.google.com/~r/PetermrsBlog/~3/vADAlIyMERc/</link>
		<comments>http://blogs.ch.cam.ac.uk/pmr/2012/04/24/panton-fellows-principles-in-japanese-pantonscience/#comments</comments>
		<pubDate>Tue, 24 Apr 2012 13:26:33 +0000</pubDate>
		<dc:creator>pm286</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.ch.cam.ac.uk/pmr/2012/04/24/panton-fellows-principles-in-japanese-pantonscience/</guid>
		<description><![CDATA[   It&#8217;s been an exciting week in Pantonia. I have been very active with our new Panton Fellows (http://science.okfn.org/2012/04/03/introducing-our-panton-fellows/). Last Monday Ross Mounce came over to Cambridge and we looked in depth at liberating information about phylogenetic trees. This is exciting and keeps me up at night and active on train journeys. And yesterday I [...]]]></description>
			<content:encoded><![CDATA[<p>
 </p><p>
 </p><p>It&#8217;s been an exciting week in Pantonia. I have been very active with our new Panton Fellows (<a href="http://science.okfn.org/2012/04/03/introducing-our-panton-fellows/">http://science.okfn.org/2012/04/03/introducing-our-panton-fellows/</a>). Last Monday Ross Mounce came over to Cambridge and we looked in depth at liberating information about phylogenetic trees. This is exciting and keeps me up at night and active on train journeys. And yesterday I took the train to Oxford to visit Sophie Kershaw, who&#8217;s putting together a radically different course for graduates, with an emphasis on reproducible computing. I&#8217;m deliberately downplaying both of these here, as they&#8217;ll be telling you all about what they are doing.
</p><p>Part of yesterday was an evening meeting run by Jenny Molloy – a new Open Science group, with about 12 of us in the Oxford eResearch Centre (OeRC), where we met Dave de Roure, who took us out to dinner at the Royal Oak. While there we discussed in some depth, with Diane Cabell and Dave Shotton among others, what needs to be done for text-mining. It&#8217;s really great to see critical mass in this way. I will have a LOT to write about textmining.
</p><p><img src="http://blogs.ch.cam.ac.uk/pmr/files/2012/04/042412_1326_PantonFello13.jpg" alt="" />
	</p><p>So today I met with Ayumi Koso (above) from Tokyo. Ayumi works with the Japanese government in Tokyo on the National Bioscience Database Centre (NBDC). She has already translated the Panton Principles into Japanese (<a href="http://pantonprinciples.org/translations/#Japanese">http://pantonprinciples.org/translations/#Japanese</a>). She&#8217;s staying in Cambridge, and so today she has a chance to meet some OKF people. Here&#8217;s our visit to the Panton Arms, preceded by a trip to Hinxton/Sanger Centre to see Tim Hubbard (OKF advisory). And this afternoon Laura Newman will be coming round to meet her.
</p><p>I am really fortunate to be living in the middle of all this.
</p><p>(We&#8217;ve decided today that the Panton hashtag is #pantonscience)
</p><p> 
 </p><img src="http://feeds.feedburner.com/~r/PetermrsBlog/~4/vADAlIyMERc" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://blogs.ch.cam.ac.uk/pmr/2012/04/24/panton-fellows-principles-in-japanese-pantonscience/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://blogs.ch.cam.ac.uk/pmr/2012/04/24/panton-fellows-principles-in-japanese-pantonscience/</feedburner:origLink></item>
	</channel>
</rss>

