<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
  <title>HubLog</title>
  <id>tag:hublog.hubmed.org,2010://2</id>
  <updated>2012-05-17T04:48:17+00:00</updated>
  <author>
    <name>Alf Eaton</name>
  </author>
  <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/hublog" /><feedburner:info uri="hublog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:browserFriendly>This is an XML content feed. It is intended to be viewed in a newsreader or syndicated to another site, subject to copyright and fair use.</feedburner:browserFriendly><entry>
    <id>tag:hublog.hubmed.org,2010://2.1954</id>
    <title>Adding Files to Google Drive using PHP</title>
    <updated>2012-05-01T17:28:52+00:00</updated>
    <published>2012-05-01T09:19:27+00:00</published>
    <link rel="alternate" type="html" href="http://hublog.hubmed.org/archives/001954.html" />
    <content type="html" xml:base="http://hublog.hubmed.org/" xml:lang="en"><![CDATA[<p><a href="http://drive.google.com/">Google Drive</a> has <a href="https://developers.google.com/drive/">an API</a> which lets web applications <A href="https://developers.google.com/drive/v1/reference/files/insert">write files to a user's Google Drive</a>, and also register to open those files. Here are my notes on using the API via PHP to upload a file to Google Drive.</p>

<h3>Google API Project and Google Chrome extension</h3>

<p>First, you need to register your application as an API Project in <a href="https://code.google.com/apis/console/">Google's API Console</a>, and enable the "Drive API" and "Drive SDK" services.</p>

<p>In the "API Access" section, generate an OAuth 2.0 client ID (choose "Web application", and enter the URL of the web application). This should provide a client ID, a client secret, and a URI to which OAuth authentication requests will redirect after confirmation.</p>

<p>In the "Drive SDK" section, fill in all the required information. Use the client ID generated above for the "OAuth Client ID" field. I'm not sure exactly which scopes are needed here, but I've entered "https://www.googleapis.com/auth/drive.file", "https://www.googleapis.com/auth/userinfo.email" and "https://www.googleapis.com/auth/userinfo.profile".</p>

<p>Next, the App ID from the top of the "Drive SDK" section needs to be added to a Chrome extension. Create a Google Chrome extension for your web application, including the settings <code>"container": "GOOGLE_DRIVE"</code> and <code>"api_console_project_id"</code> (the App ID) in its <a href="https://code.google.com/p/google-drive-sdk-samples/source/browse/chromewebstore/manifest.json">manifest.json</a>, and <a href="https://chrome.google.com/webstore/developer/dashboard">publish it on the Chrome Web Store</a>.</p>

<p>When a user installs this Chrome extension, the web application asks for permission to access the user's Google Drive space.</p>

<h3>PHP Client Library for Google APIs</h3>

<p>Google provide <a href="http://code.google.com/p/google-api-php-client/">an API Client Library in PHP</a>, and if you <a href="http://code.google.com/p/google-api-php-client/source/checkout">check out the current version of the code</a> there is a newly-added <a href="http://code.google.com/p/google-api-php-client/source/browse/trunk/src/contrib/apiDriveService.php">apiDriveService</a> class. At the time of writing it still has a few bugs, so <a href="https://gist.github.com/2566379">this patch</a> needs to be applied.</p>

<p>I've used the API Client Library in <a href="https://gist.github.com/2560093">a simple PHP script to upload a PDF file to Google Drive</a>. The steps involved are described below.</p>

<p>To use the client library, put the src folder in your include path, and include the appropriate files in a PHP script:</p>
<pre><code>require 'google-api/apiClient.php';
require 'google-api/contrib/apiOauth2Service.php';
require 'google-api/contrib/apiDriveService.php';</code></pre>

<p>Initialise the client, using the OAuth client ID, client secret and client redirect URI generated in the API console:</p>
<pre><code>$client = new apiClient();
$client->setUseObjects(true);
$client->setAuthClass('apiOAuth2');
$client->setScopes(array('https://www.googleapis.com/auth/drive.file'));
$client->setClientId($config['client_id']);
$client->setClientSecret($config['client_secret']);
$client->setRedirectUri($config['client_uri']);
</code></pre>

<p>To authenticate the client for this user, first generate an authentication URL:</p>
<pre><code>print $client->createAuthUrl(array('https://www.googleapis.com/auth/drive.file'));</code></pre>

<p>Visit that URL in a web browser, and confirm that the web application is allowed to access your Google Drive files. Copy the code from the end of the URL to which you are redirected after confirmation, and use it as $_GET['code'] to retrieve an access token:</p>
<pre><code>$_GET['code'] = ''; // insert the verification code here
file_put_contents('token.json', $client->authenticate());</code></pre>
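<p>As a rough illustration of what those two manual steps involve, here is a minimal JavaScript sketch (not part of the PHP client library) that builds an OAuth 2.0 authorisation URL and reads the code back out of the redirect URL. The endpoint URL and the function names <code>buildAuthUrl</code> and <code>extractCode</code> are my assumptions for illustration; the client library's <code>createAuthUrl</code> may add further parameters.</p>

```javascript
// Assumed authorisation endpoint; check Google's OAuth 2.0 documentation for the current one.
const AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/auth";

// Build the URL that the user visits to grant access (hypothetical helper).
function buildAuthUrl(clientId, redirectUri, scopes) {
  const params = new URLSearchParams({
    response_type: "code",
    client_id: clientId,
    redirect_uri: redirectUri,
    scope: scopes.join(" "), // multiple scopes are space-separated
  });
  return AUTH_ENDPOINT + "?" + params.toString();
}

// Read the "code" query parameter from the URL the user is redirected to.
// Returns null when the parameter is missing.
function extractCode(redirectedUrl) {
  return new URL(redirectedUrl).searchParams.get("code");
}
```

<p>The value returned by <code>extractCode</code> is what gets pasted into <code>$_GET['code']</code> in the PHP script.</p>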

<p>Use the stored access token to authenticate the client each time it runs:</p>
<pre><code>$client->setAccessToken(file_get_contents('token.json'));</code></pre>

<p>Create a new Google Drive file, attach the raw data (from "test.pdf" in this case) and upload it to the Google Drive service:</p>
<pre><code>$file = new DriveFile;
$file->setMimeType('application/pdf');
$file->setTitle('test.pdf'); // it's important that the title includes the correct file extension

$service = new apiDriveService($client);
$insertedFile = $service->files->insert($file, array('data' => file_get_contents('test.pdf'), 'mimeType' => 'application/pdf'));</code></pre>

<p>The file should now be available in Google Drive.</p>
]]></content>
  </entry>
  <entry>
    <id>tag:hublog.hubmed.org,2010://2.1953</id>
    <title>Working with the Harvard Library Bibliographic Dataset</title>
    <updated>2012-04-27T07:55:17+00:00</updated>
    <published>2012-04-27T07:54:10+00:00</published>
    <link rel="alternate" type="html" href="http://hublog.hubmed.org/archives/001953.html" />
    <content type="html" xml:base="http://hublog.hubmed.org/" xml:lang="en"><![CDATA[<p>Earlier this week, Harvard Library released <a href="http://openmetadata.lib.harvard.edu/bibdata">a set of metadata for 12 million items held by the library</a>, under a <a href="http://creativecommons.org/publicdomain/zero/1.0/">Creative Commons CC0 (Public Domain) license</a>.</p>

<p>Here's what I've done with it so far:</p>

<p>Start an Ubuntu server instance on <a href="https://console.aws.amazon.com/ec2/">Amazon EC2</a> (the "medium" variety works best, or the "high-CPU medium" (dual-core) if you have a script that can run in parallel).</p>

<p>Create a 10GB volume, attach it to the instance and mount it at /marc: <pre><tt>sudo mkfs.xfs /dev/xvdf
sudo mkdir /marc
sudo mount /dev/xvdf /marc
sudo chmod 777 /marc</tt></pre></p>

<p>I also mounted an empty 50GB volume at /mods, for storing individual files transformed to a different format (see below).</p>

<p>The dataset is available as a gzipped tarball containing 10 MARC21-formatted files. Download to a temporary directory using <a href="http://aria2.sourceforge.net/">aria2</a>: <pre><tt>aria2c --max-connection-per-server=5 http://openmetadata.lib.harvard.edu/bibdata/data</tt></pre> or wget: <pre><tt>wget --continue http://openmetadata.lib.harvard.edu/bibdata/data</tt></pre></p>

<p>Once the file has downloaded, extract the MARC21 files: <tt>tar -xvzf harvard.tar.gz -C /marc</tt>. I've shared an EBS snapshot of this stage (snap-a099a1dd in us-east-1), so you can start from this point by creating a volume from it and attaching it to a running EC2 instance.</p>

<p>Run <a href="https://gist.github.com/2481431#file_marc_to_mods.php">a PHP script</a> that opens each MARC21 file, converts each record to MARCXML using <a href="http://pear.php.net/package/File_MARC/">File_MARC</a>, then transforms each record to MODS, using <a href="http://www.loc.gov/standards/mods/mods-conversions.html">a stylesheet provided by the Library of Congress</a>. I've also shared an EBS snapshot of this stage (snap-90333fed in us-east-1), as it took a while to run. It happened to leave the MODS namespace off the output XML, which may or may not make it easier to work with&hellip;</p>
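<p>For a sense of what "opening each MARC21 file" involves: in the binary MARC21 format, each record starts with a 24-character leader whose first five characters give the total record length in bytes. The following is a minimal JavaScript (Node.js) sketch of splitting a file into records on that basis. It is not the PHP script linked above, the function name <code>splitMarcRecords</code> is hypothetical, and in practice a library such as File_MARC takes care of this for you.</p>

```javascript
// Split a buffer of concatenated MARC21 records into one buffer per record.
// Assumes well-formed input: each record begins with a 5-digit ASCII length.
function splitMarcRecords(buffer) {
  const records = [];
  let offset = 0;
  while (offset + 5 <= buffer.length) {
    const length = parseInt(buffer.toString("ascii", offset, offset + 5), 10);
    // Stop on anything that cannot be a record (the leader alone is 24 bytes).
    if (!Number.isInteger(length) || length < 24) break;
    // Stop on a truncated final record.
    if (offset + length > buffer.length) break;
    records.push(buffer.subarray(offset, offset + length));
    offset += length;
  }
  return records;
}
```

<p>Each returned buffer can then be handed to a MARC parser one record at a time, which keeps memory use low for files of this size.</p>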

<p>I've begun one more step, which is <a href="https://gist.github.com/2481431#file_mods_to_cloudsearch.php">an XSL transformation</a> from MODS to <a href="http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/DocumentsBatch.XML.html">CloudSearch input XML</a> - the idea being to import all the data into a CloudSearch instance and make it browsable/searchable there. There are still some fields to add, though. Other transformations that might be useful include Turtle, for import into a triplestore like <a href="http://kasabi.com/">Kasabi</a>; HTML, for browsable/crawlable individual records; JSON, for loading into ElasticSearch, MongoDB or a JS interface.</p>]]></content>
  </entry>
  <entry>
    <id>tag:hublog.hubmed.org,2010://2.1952</id>
    <title>BBC Radio -&gt; XSPF Bookmarklet</title>
    <updated>2012-03-12T12:04:57+00:00</updated>
    <published>2012-03-12T10:44:24+00:00</published>
    <link rel="alternate" type="html" href="http://hublog.hubmed.org/archives/001952.html" />
    <content type="html" xml:base="http://hublog.hubmed.org/" xml:lang="en"><![CDATA[<p>A tiny bookmarklet for getting from a <a href="http://www.bbc.co.uk/programmes/b01d8tvw">BBC Radio</a> or <a href="http://www.bbc.co.uk/iplayer/episode/b01cvq24/">iPlayer</a> web page to <a href="http://www.bbc.co.uk/programmes/b01d8tvt/segments.xspf">the XSPF tracklisting</a>, for use in <a href="http://www.tomahawk-player.org/">Tomahawk</a>:</p>

<p><a href="javascript:(function(){var matches=location.pathname.match(/^\/iplayer\/episode\/(\w+)/);if(!matches)matches=location.pathname.match(/^\/programmes\/(\w+)/);if(matches)location.href=&quot;/programmes/&quot;+matches[1]+&quot;/segments.xspf&quot;;})();">Programme XSPF</a></p>
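<p>For readability, here is the same path-matching logic as the bookmarklet, unpacked into a standalone function. The function name <code>xspfPath</code> is mine; the bookmarklet itself assigns the result to <code>location.href</code> rather than returning it.</p>

```javascript
// Given a BBC page path, return the path of the XSPF tracklisting,
// or null if the path is neither an iPlayer episode nor a programme page.
function xspfPath(pathname) {
  // Try the iPlayer episode pattern first, then the programme page pattern.
  let matches = pathname.match(/^\/iplayer\/episode\/(\w+)/);
  if (!matches) matches = pathname.match(/^\/programmes\/(\w+)/);
  return matches ? "/programmes/" + matches[1] + "/segments.xspf" : null;
}
```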
]]></content>
  </entry>
  <entry>
    <id>tag:hublog.hubmed.org,2010://2.1951</id>
    <title>How To Text Mine Open Access Documents</title>
    <updated>2012-03-12T22:52:47+00:00</updated>
    <published>2012-02-22T09:09:22+00:00</published>
    <link rel="alternate" type="html" href="http://hublog.hubmed.org/archives/001951.html" />
    <content type="html" xml:base="http://hublog.hubmed.org/" xml:lang="en"><![CDATA[<aside><p>Note: I've placed all the code and files associated with this post in <a href="https://github.com/hubgit/open-access-text-mining">a repository on GitHub</a>. If you'd like to fork the project and add a script which processes documents through your favourite entity extractor (storing the results in a new directory), I'd be happy to receive pull requests.</p></aside>

<h3>Fetching the documents</h3>

<p>First of all, find a set of open access documents in a standard XML format. Articles deposited in <a href="http://www.ncbi.nlm.nih.gov/pmc/">PubMed Central</a> (PMC) are ideal, as they are converted from publisher-specific DTDs to one of the standard NLM Journal Article DTDs during deposition. PMC also has <a href="http://www.ncbi.nlm.nih.gov/pmc/tools/oai/">an OAI interface</a>, which makes it straightforward to find and retrieve articles.</p>

<p>To find the name of a set of articles, use the OAI "ListSets" command to <a href="https://github.com/hubgit/open-access-text-mining/blob/master/fetch-sets.php">fetch all the sets into a local CSV file</a>. Have a look through that file and find the set you're interested in - in this case I'm using "elsevierwt": Elsevier's "Sponsored Documents", for which a fee has been paid on publication to make the articles open access; <a href="http://www.elsevier.com/wps/find/authorsview.authors/supplementalterms">the license</a> allows text mining for non-commercial purposes*.</p>

<p>Use that set name with the OAI "ListIdentifiers" command to <a href="https://github.com/hubgit/open-access-text-mining/blob/master/fetch-identifiers.php">fetch the identifiers for all documents in that set into a local CSV file</a>. This script checks that each article is also in the "pmc-open" set, which denotes <a href="http://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/">the Open Access subset of PubMed Central</a>.</p>

<p>For each identifier, use the OAI "GetRecord" command to <a href="https://github.com/hubgit/open-access-text-mining/blob/master/fetch-records.php">fetch the document XML into a local folder</a>. The document identifier can be base64-encoded into the filename, so it's easy to identify later.</p>
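<p>The three OAI requests above differ only in their query parameters. Here is a minimal JavaScript (Node.js) sketch of building those URLs and of encoding an identifier into a filename; it is not the code from the repository. The base URL, the <code>pmc</code> metadata prefix and the helper names are assumptions for illustration, and I've used a URL-safe variant of base64 because plain base64 can contain "/".</p>

```javascript
// Assumed base URL for the PMC OAI service; check the PMC documentation for the current one.
const OAI_BASE = "https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi";

// Build an OAI-PMH request URL for a verb plus any extra parameters.
function oaiUrl(verb, params = {}) {
  const query = new URLSearchParams({ verb, ...params });
  return OAI_BASE + "?" + query.toString();
}

// Encode a document identifier into a filename that is safe on disk.
function identifierToFilename(identifier) {
  const encoded = Buffer.from(identifier, "utf8").toString("base64");
  return encoded.replace(/\+/g, "-").replace(/\//g, "_") + ".xml";
}

// Reverse of identifierToFilename.
function filenameToIdentifier(filename) {
  const encoded = filename.replace(/\.xml$/, "").replace(/-/g, "+").replace(/_/g, "/");
  return Buffer.from(encoded, "base64").toString("utf8");
}

// Example requests for the three steps described above.
const listSets = oaiUrl("ListSets");
const listIdentifiers = oaiUrl("ListIdentifiers", { metadataPrefix: "pmc", set: "elsevierwt" });
const getRecord = oaiUrl("GetRecord", { identifier: "oai:pubmedcentral.nih.gov:12345", metadataPrefix: "pmc" });
```

<p>The identifier in the last request is a made-up example; real identifiers come from the "ListIdentifiers" response.</p>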

<h3>Converting the documents</h3>

<p><a href="https://github.com/hubgit/open-access-text-mining/blob/master/conversion/nlm-3.0/process.php">Convert all the XML files to the most up-to-date NLM Journal Article DTD</a>, using <a href="http://dtd.nlm.nih.gov/tools/tools.html">the XSL transformation provided by the NLM for this purpose</a>. In this case, I'm converting from v2 to v3 of the NLM Journal Article Archiving and Interchange format; once <a href="http://jats.nlm.nih.gov/">JATS</a> becomes the official standard hopefully the same tools will be provided for conversion.</p>

<p><a href="https://github.com/hubgit/open-access-text-mining/blob/master/conversion/turtle/process.php">Convert the article metadata from the XML into RDF triples</a> in <a href="http://en.wikipedia.org/wiki/Turtle_(syntax)">Turtle</a>, and store them in <a href="http://kasabi.com/dataset/elsevier-sponsored-documents">a Kasabi data set</a>:</p>
<pre><tt>find . -name '*.ttl' -exec curl -vvv -H "Content-Type: text/turtle" --data-binary @{} "http://api.kasabi.com/dataset/${STORE}/store?apikey=${APIKEY}" \;</tt></pre>

<p>Finally, <a href="https://github.com/hubgit/open-access-text-mining/blob/master/conversion/html/process.php">convert the body of the article to simple HTML</a>, using another XSL transformation. All the inline elements will become "span" elements, all the block-level elements will become "div" elements**.</p>
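<p>The mapping rule can be sketched in a few lines of JavaScript. This is not the XSL transformation from the repository: the list of inline element names is a small illustrative subset, the function name <code>toHtmlElement</code> is hypothetical, and using the original element name as the "class" value is my assumption.</p>

```javascript
// Illustrative subset of inline element names from the NLM Journal Article DTD.
const INLINE_ELEMENTS = new Set(["italic", "bold", "sup", "sub", "xref", "ext-link"]);

// Describe the HTML element that a given XML element would become:
// inline elements become "span", everything else becomes "div".
function toHtmlElement(name, attributes = {}) {
  return {
    tag: INLINE_ELEMENTS.has(name) ? "span" : "div",
    // Keep the original attributes and record the source element name as a class.
    attributes: { ...attributes, class: name },
  };
}
```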

<h3>Text mining</h3>

<p>Now the articles are ready for text mining. Choose an entity extraction tool or web service and <a href="https://github.com/hubgit/open-access-text-mining/blob/master/mining/proteins/process.php">run each article through it</a>. I'm using the EBI's <a href="http://www.ebi.ac.uk/webservices/whatizit/">Whatizit</a> here, which has a SOAP web service that understands plain text and returns XML. If you're lucky, you'll have a simple HTTP POST web service that understands HTML and returns JSON.</p>

  <p>Store the results locally, and <a href="https://github.com/hubgit/open-access-text-mining/blob/master/mining/proteins/convert.php">extract the data you need into RDF triples as Turtle</a>. So far, I've extracted disease and protein names from these articles using Whatizit; the easiest way to find the names for the Whatizit processing pipelines is to View Source and look at the options in <a href="http://www.ebi.ac.uk/webservices/whatizit/info.jsf">Whatizit's HTML entry form</a>.</p>

<p>Post the Turtle files to <a href="http://kasabi.com/dataset/elsevier-sponsored-documents">the same Kasabi data set as the article metadata</a>, where they can be browsed and <a href="http://kasabi.com/dataset/elsevier-sponsored-documents/queries/sample-resources">queried using SPARQL</a>:</p>
<pre><tt>find . -name '*.ttl' -exec curl -vvv -H "Content-Type: text/turtle" --data-binary @{} "http://api.kasabi.com/dataset/${STORE}/store?apikey=${APIKEY}" \;</tt></pre>

<hr>

<p>* This particular license is quite vague, full of restrictions, and doesn't mention what you can do with derivative works - such as the results of text mining. You might want to choose a set of articles from PLoS or BioMed Central instead, which are clearly licenced with Creative Commons CC-BY licences.</p>

<p>** Each element retains its attributes, and a "class" attribute is added for styling if you ever want to display this HTML.</p>

]]></content>
  </entry>
  <entry>
    <id>tag:hublog.hubmed.org,2010://2.1950</id>
    <title>Open Access Author Manuscripts in PubMed Central</title>
    <updated>2012-02-20T19:07:45+00:00</updated>
    <published>2012-02-20T18:55:23+00:00</published>
    <link rel="alternate" type="html" href="http://hublog.hubmed.org/archives/001950.html" />
    <content type="html" xml:base="http://hublog.hubmed.org/" xml:lang="en"><![CDATA[<p>Of <a href="http://www.ncbi.nlm.nih.gov/pmc/?term=(%222002%22%5BPublication+Date%5D+%3A+%222012%22%5BPublication+Date%5D)">~1,000,000 articles deposited in PubMed Central from the last 10 years</a>, <a href="http://www.ncbi.nlm.nih.gov/pmc/?term=(%222002%22%5BPublication+Date%5D+%3A+%222012%22%5BPublication+Date%5D)+AND+%22open+access%22%5BFilter%5D">~300,000 are marked as "Open Access"</a>. Many of these come from "paid open access" programmes, where a fee is paid when the article is published so that it becomes more freely accessible. These schemes include <a href="http://www.ncbi.nlm.nih.gov/pmc/?term=(%222002%22%5BPublication+Date%5D+%3A+%222012%22%5BPublication+Date%5D)+AND+%22open+access%22%5BFilter%5D+AND+%22springer+open+choice%22%5BFilter%5D">"Springer Open Choice" (15,000 articles)</a>, <a href="http://www.ncbi.nlm.nih.gov/pmc/?term=(%222002%22%5BPublication+Date%5D+%3A+%222012%22%5BPublication+Date%5D)+AND+%22open+access%22%5BFilter%5D+AND+%22elsevier+sponsored+documents%22%5BFilter%5D">"Elsevier Sponsored Articles" (2,711 articles)</a> and <a href="http://www.ncbi.nlm.nih.gov/pmc/?term=(%222002%22%5BPublication+Date%5D+%3A+%222012%22%5BPublication+Date%5D)+AND+%22open+access%22%5BFilter%5D+AND+%22npg+open+access%22%5BFilter%5D">"NPG Open Access" (561 articles)</a>.</p>

<p>Of  the ~1,000,000 articles in PubMed Central, <a href="http://www.ncbi.nlm.nih.gov/pmc/?term=(%222002%22%5BPublication+Date%5D+%3A+%222012%22%5BPublication+Date%5D)+AND+%22author+manuscript%22%5BFilter%5D">~180,000 are marked as "Author Manuscript"</a> - deposited in response to funder mandates, or where the author has retained permission to deposit a pre-print in repositories. However, only <a href="http://www.ncbi.nlm.nih.gov/pmc/?term=(%222002%22%5BPublication+Date%5D+%3A+%222012%22%5BPublication+Date%5D)+AND+%22author+manuscript%22%5BFilter%5D+AND+%22open+access%22%5BFilter%5D">163 of these are marked as "Open Access"</a>.</p>

<p>The question - <a href="http://caseybergman.wordpress.com/2012/02/11/why-the-research-works-act-doesnt-affect-text-mining-research">raised by Casey Bergman</a> - is why there is so little overlap: why are author manuscripts not being marked as Open Access in PubMed Central?</p>]]></content>
  </entry>
  <entry>
    <id>tag:hublog.hubmed.org,2010://2.1949</id>
    <title>ISSN(L)s And Serial Title Abbreviations</title>
    <updated>2012-02-09T19:16:28+00:00</updated>
    <published>2012-02-09T18:56:06+00:00</published>
    <link rel="alternate" type="html" href="http://hublog.hubmed.org/archives/001949.html" />
    <content type="html" xml:base="http://hublog.hubmed.org/" xml:lang="en"><![CDATA[<p>I'd like to build a non-copyrighted list of journals/serials (anything with an ISSN, basically) - including their ISSNs, full titles and abbreviated titles.</p>

<p>As a start, every item needs an identifier. The <a href="http://issn.org/">ISSN International Centre</a> assigns ISSNs to publications that request them, but <a href="http://www.issn.org/2-22659-ISSN-Data-file.php">the data file listing all the ISSN:serial title mappings</a> is copyrighted, expensive and not redistributable (it also uses sentence case for journal titles, which means some information gets lost).</p>

<p>As each publication can have multiple ISSNs - one for each medium in which it is distributed (for example, the online version of a publication can have a different ISSN to the print version) - a pan-ISSN identifier is required to link all the ISSNs together, so the <a href="http://www.issn.org/2-22637-What-is-an-ISSN-L.php">ISSN-L</a> was introduced.</p>

<p><a href="http://www.issn.org/2-24117-Download-the-ISSN-ISSN-L-table.php">The table that maps ISSNs to ISSN-Ls</a> can be downloaded from issn.org after filling in a form requesting access. In the latest table, there are 1,614,355 unique ISSNs, mapped to 1,552,542 unique ISSN-Ls.</p>
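<p>Counts like these can be reproduced with a short script. The sketch below is JavaScript and assumes the table is plain text with one tab-separated "ISSN, ISSN-L" pair per line; that layout, the function name <code>countUnique</code> and the sample rows in the usage note are assumptions, so adjust the parsing to match the actual download.</p>

```javascript
// Matches the printed form of an ISSN, e.g. 1548-7091 (the last character may be X).
const ISSN_PATTERN = /^\d{4}-\d{3}[\dX]$/;

// Count the unique ISSNs and unique ISSN-Ls in a tab-separated mapping table.
function countUnique(tableText) {
  const issns = new Set();
  const issnls = new Set();
  for (const line of tableText.split(/\r?\n/)) {
    const [issn, issnl] = line.trim().split("\t");
    // Skip header lines, blank lines and anything malformed.
    if (!ISSN_PATTERN.test(issn) || !ISSN_PATTERN.test(issnl)) continue;
    issns.add(issn);
    issnls.add(issnl);
  }
  return { issns: issns.size, issnls: issnls.size };
}
```

<p>For example, a table containing the two sample rows "1548-7091, 1548-7091" and "1548-7105, 1548-7091" gives two unique ISSNs and one unique ISSN-L.</p>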

<p>This ISSN-L:ISSN mapping table, like all the information published by the ISSN International Centre, is protected by sui generis database rights, which last, <a href="http://en.wikipedia.org/wiki/Database_Directive">according to Wikipedia</a>, for 15 years from the last substantial update.</p>

<blockquote cite="http://www.issn.org/2-22687-Legal-notices.php">The databases appearing on or accessible from the website "the ISSN International Centre" are the exclusive property of CIEPS and are protected under the provisions of the law of 1st July 1998 implementing in the Intellectual Property Code the European Directive of 11 March 1996 on the legal protection of databases. Any performance, whether total or partial, of this site by any company whatsoever, without the express authorization of the CIEPS is strictly forbidden and shall constitute an infringement sanctioned such as Intellectual Property Code.</blockquote>
  
  <cite><a href="http://www.issn.org/2-22687-Legal-notices.php">ISSN International Centre, Legal Notice, Section 3</a></cite>

<p>(note, incidentally, that I'm already in conflict with <a href="http://www.issn.org/2-22687-Legal-notices.php">section 3 of the legal notice</a>: "Users and visitors cannot place a hyperlink to this website without the CIEPS' express and prior authorization."&hellip;)</p>

<p>The full database itself may be copyrighted, but (I believe) the individual facts within it shouldn't be. In which case, the best non-copyrighted source for the journal title, ISSN and ISSN-L of each serial is probably the publishers themselves, and as many publishers make their metadata available through an OAI interface it may be possible to extract a fair amount of serials information from those sources. <a href="http://en.scientificcommons.org/repository/overview">ScientificCommons</a>, for example, harvests articles from OAI repositories, so may be able to aggregate useful title and ISSN information.</p>

<h3>Existing, non-free sources</h3>

<p>A commercial source of the information I'm looking to create is the <a href="http://journalseek.net/">JournalSeek</a> database (from Genamics, <a href="http://nj.oclc.org/journalseek/">licensed through OCLC</a>) which <a href="http://journalseek.net/publishers.htm">includes around 100,000 journals</a>.</p>

  <p><a href="http://www.sherpa.ac.uk/romeo/journalbrowse.php?fIDnum=|&mode=simple&la=en">SHERPA/RoMEO aggregates journal lists</a> from <a href="http://zetoc.mimas.ac.uk/jnllist.html">Zetoc</a>, <a href="http://www.doaj.org/doaj?func=loadTempl&templ=faq#metadata">DOAJ</a> (7500 journals) and <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=journals&Cmd=DetailsSearch&Term=currentlyindexed%5BAll%5D">Entrez</a> (<a href="http://www.ncbi.nlm.nih.gov/entrez/citmatch_help.html#JournalLists">40,000 journals</a>), but <a href="http://www.sherpa.ac.uk/romeoreuse.html">the data is only available for non-commercial use</a>.</p>

<h3>Other sources of Journal/ISSN information</h3>

<ul>
  <li><a href="http://www.serialssolutions.com/management/ulrichs/">300,000 serials in Ulrich's</a> (JSON interface somewhere?)</li>
    <li><a href="http://www.serialssolutions.com/resources/detail/summon-serials-titles">SerialsSolutions Summon</a> (title list as PDF)</li>
  <li><a href="http://www.ebscohost.com/titleLists/a9h-journals.htm">13,000 journals in EBSCO "Academic Search Complete"</a></li>
  <li><a href="http://www.oclc.org/worldcatlocal/overview/content/journals.htm">91,000 journals in WorldCat Local</a></li>
  <li><a href="http://ip-science.thomsonreuters.com/cgi-bin/jrnlst/jlresults.cgi?PC=MASTER">17,000 journals in Thomson Reuters Master Journal List</a></li>
  <li><a href="http://www.crossref.org/titleList/">27,000 journals in CrossRef</a></li>
  <li><a href="http://www.portico.org/digital-preservation/who-participates-in-portico/participating-titles">12,500 journals archived by Portico</a></li>
  <li><a href="http://lockss.org/lockss/Publishers_and_Titles">9,000 journals participating in LOCKSS</a></li>
    <li><a href="http://www.sciencedirect.com/science/journals">3,330 journals in ScienceDirect</a></li>
  <li><a href="http://academic.research.microsoft.com/RankList?entitytype=4&topDomainID=6">Journal lists by subject in Bing Academic Search</a></li>
  <li><a href="http://cassi.cas.org/search.jsp">Search journals in the CAS Source Index</a> (copyrighted by the ACS)</li>
</ul>
      
 <p>It would be nice if Freebase could serve as a central repository for this information, but there are <a href="http://www.freebase.com/view/book/journal">only 4,321 journals in Freebase</a> so far.</p>

<h3>Abbreviations</h3>
      
      <p>Once we have the list of ISSNs and journal titles, we also need the corresponding journal title abbreviations, for use when generating bibliographies. There are <a href="http://www.library.uq.edu.au/faqs/endnote/journal_terms.html">lists of journal title abbreviations available for import into EndNote</a>. Sadly, there are several different abbreviation styles (ISO, MEDLINE, BIOSIS, CASSI, etc).</p>

<ul>
  <li>The ISSN International Centre maintains the <a href="http://www.issn.org/2-22660-LTWA.php">list of Title Word Abbreviations</a> which corresponds to the ISO 4 standard (<a href="http://www.iso.org/iso/catalogue_detail?csnumber=3569">available from the ISO store for ~£50</a>), which describes the rules for abbreviating title words and titles of publications. The <a href="http://www.issn.org/2-22661-LTWA-online.php">list of Title Word Abbreviations is available online as HTML</a>.</li>
  <li><a href="http://www.nlm.nih.gov/pubs/factsheets/constructitle.html">The NLM uses the list of Title Word Abbreviations to abbreviate periodical titles.</a></li>
  <li><a href="http://www.ncbi.nlm.nih.gov/books/NBK7251/">The NCBI provides a less-comprehensive list of abbreviations for commonly-used English words in journal titles.</a></li>
  <li><a href="http://www.compholio.com/latex/jabbrv/">The jabbrv LaTeX package abbreviates journal titles using the list of Title Word Abbreviations.</a></li>
  <li>The <a href="http://jabbr.mannlib.cornell.edu/">JAbbr service</a> built a list of abbreviated serial titles from the Cornell library catalog of MARC records, and provides abbreviation &rarr; full title mapping as a JSON and HTML web service (source code provided). <a href="http://journal.code4lib.org/articles/1758">Article in Code4Lib Journal.</a></li>
  <li><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=journals&term=%22Nat%20Methods%22[Title%20Abbreviation]">The NLM Catalog is searchable using a journal title abbreviation</a>, but uses sentence case for full titles.</li>
  <li><a href="http://www.ncbi.nlm.nih.gov/nlmcatalog?term=%221548-7091%22%5BISSNL%5D">The NLM Catalog is also searchable by various ISSNs.</a></li>
    <li><a href="http://images.webofknowledge.com/WOK46/help/WOS/A_abrvjt.html">Web of Science has a list of journal title abbreviations.</a></li>
  <li><a href="http://www.abbreviations.com/jas.asp">A large list of Journal Abbreviation Sources.</a></li>
</ul>
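<p>To show roughly how a word-abbreviation list gets applied to a full title, here is a minimal JavaScript sketch. The handful of abbreviations and the list of skipped words are an illustrative sample only, not the real list of Title Word Abbreviations (which also matches word stems rather than whole words), and the function name <code>abbreviateTitle</code> is hypothetical.</p>

```javascript
// Tiny illustrative sample of word abbreviations (lower-case word to abbreviation).
const SAMPLE_ABBREVIATIONS = new Map([
  ["journal", "J."],
  ["international", "Int."],
  ["biology", "Biol."],
  ["chemistry", "Chem."],
  ["nature", "Nat."],
]);

// Articles, conjunctions and prepositions that are dropped from abbreviated titles.
const SKIPPED_WORDS = new Set(["a", "an", "the", "and", "of", "in", "for"]);

// Abbreviate a title word by word, leaving unknown words unchanged.
function abbreviateTitle(title, abbreviations = SAMPLE_ABBREVIATIONS) {
  return title
    .split(/\s+/)
    .filter((word) => word && !SKIPPED_WORDS.has(word.toLowerCase()))
    .map((word) => abbreviations.get(word.toLowerCase()) || word)
    .join(" ");
}
```

<p>With this sample list, "Journal of Biology" becomes "J. Biol." and "Nature Methods" becomes "Nat. Methods". Other styles, such as MEDLINE, drop the full stops.</p>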
      
      <h3>More links</h3>
      <p><a href="http://pinboard.in/u:hubpin/t:journals/">http://pinboard.in/u:hubpin/t:journals/</a></p>

]]></content>
  </entry>
  <entry>
    <id>tag:hublog.hubmed.org,2010://2.1948</id>
    <title>Extracting Text From A PDF Using Only Javascript</title>
    <updated>2011-11-18T11:38:16+00:00</updated>
    <published>2011-11-18T10:55:04+00:00</published>
    <link rel="alternate" type="html" href="http://hublog.hubmed.org/archives/001948.html" />
    <content type="html" xml:base="http://hublog.hubmed.org/" xml:lang="en"><![CDATA[<p>Using an HTML page like <a href="https://gist.github.com/1376120">this</a>, which embeds a PDF-to-text extraction service I built using <a href="https://github.com/mozilla/pdf.js">pdf.js</a>, you can extract the text from a PDF using only client-side Javascript:</p>

<pre><code>
&lt;!-- edit this; the PDF file must be on the same domain as this page -->
&lt;iframe id="input" src="your-file.pdf">&lt;/iframe>

&lt;!-- embed the pdftotext service as an iframe -->
&lt;iframe id="processor" src="http://hubgit.github.com/2011/11/pdftotext/">&lt;/iframe>

&lt;!-- a container for the output -->
&lt;div id="output">&lt;/div>

&lt;script>
var input = document.getElementById("input");
var processor = document.getElementById("processor");
var output = document.getElementById("output");

// listen for messages from the processor
window.addEventListener("message", function(event){
  if (event.source != processor.contentWindow) return;

  switch (event.data){
    // "ready" = the processor is ready, so fetch the PDF file
    case "ready":
      var xhr = new XMLHttpRequest;
      xhr.open('GET', input.getAttribute("src"), true);
      xhr.responseType = "arraybuffer";
      xhr.onload = function(event) {
        processor.contentWindow.postMessage(this.response, "*");
      };
      xhr.send();
    break;

    // anything else = the processor has returned the text of the PDF
    default:
      output.textContent = event.data.replace(/\s+/g, " ");
    break;
  }
}, true);
&lt;/script>
</code></pre>

<p><a href="http://hubgit.github.com/2011/11/pdftotext/example/">See an example running as a live demonstration.</a></p>

<p>It'll only work in recent browsers, as it requires <a href="http://updates.html5rocks.com/2011/09/Workers-ArrayBuffer">sending binary data between windows as an ArrayBuffer using window.postMessage</a>, and <a href="http://www.html5rocks.com/en/tutorials/workers/basics/">Web Workers</a> in pdf.js.</p>

<p>Basically, this fetches a PDF as an ArrayBuffer using XMLHttpRequest, then posts it to the embedded window, which uses <a href="https://github.com/mozilla/pdf.js">pdf.js</a> to render the PDF to Canvas (invisibly; you can see the rendered images if you poke around a bit with a web inspector tool). As it does so, <a href="https://github.com/mozilla/pdf.js/pull/738">an HTML layer is constructed</a>, containing a block to match each row of the PDF - this would normally be overlaid on top of the rendered images to allow text to be selected, a technique used by many services that allow PDF text selection and highlighting, including <a href="http://crocodoc.com/">Crocodoc</a> and Google Docs' PDF viewer. By taking the text content of those blocks, the service can return the contents of the PDF as a single block of text.</p>
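<p>The last step, collapsing the blocks into one string, might look something like the JavaScript sketch below. It is an illustration rather than the service's actual code: the function name <code>blocksToText</code> is hypothetical, and it assumes each block exposes a <code>textContent</code> property, as DOM elements do.</p>

```javascript
// Join the text of each block with a space, then collapse runs of whitespace,
// so the whole document comes back as a single line of text.
function blocksToText(blocks) {
  return Array.from(blocks)
    .map((block) => block.textContent)
    .join(" ")
    .replace(/\s+/g, " ")
    .trim();
}
```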
  
<p>I expect that pdf.js will acquire a native function for retrieving the text content directly, to make documents searchable. It would be nice, next, to try to recreate paragraphs by looking at the spacing between the blocks, and to use the formatting and other heuristics to extract metadata like title, authors, etc.</p>]]></content>
  </entry>
  <entry>
    <id>tag:hublog.hubmed.org,2010://2.1947</id>
    <title>Open Graph wins the Semantic Web</title>
    <updated>2011-09-29T23:57:10+00:00</updated>
    <published>2011-09-29T23:39:21+00:00</published>
    <link rel="alternate" type="html" href="http://hublog.hubmed.org/archives/001947.html" />
    <content type="html" xml:base="http://hublog.hubmed.org/" xml:lang="en"><![CDATA[<p>It took me a year - and the configuration step below - to realise that <a href="http://ogp.me/">Open Graph</a> has found a solution that works for referencing things on the web:</p>

<div><img src="/files/misc/2011-09-29-og-configuration.png"></div>

<p>We now have a standard way of providing metadata about any object, based on two principles:</p>

<ol>
  <li>Every object is represented by at least one HTML page on the web.</li>
  <li>Properties of that object are represented as &lt;meta&gt; elements in the &lt;head&gt; section of that HTML page.</li>
</ol>

<p>From that, we can make statements about any object using URIs, and fetch metadata about that object using HTTP. The Semantic Web!</p>

<h2>Statements</h2>

<p>This is an RDF statement:</p>

<table style="margin-top:0">
  <tr><td>[THING]</td><td>[LINK]</td><td>[THING]</td></tr>
  <tr><td>&lt;http://music.com/band/nirvana&gt;</td><td>&lt;http://example.com/member&gt;</td><td>&lt;http://music.com/person/kurt-cobain&gt;.</td></tr>
</table>

<p>That&#39;s two things connected by a link, all represented by URIs.</p>

<h2><span>Creating a Graph</span></h2>

<p>Several statements of this kind can be combined to make a graph:</p>

<div>&lt;http://music.com/band/nirvana&gt;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://example.com/member&gt;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://music.com/person/kurt-cobain&gt;.</div>
<div>&lt;http://music.com/band/nirvana&gt;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://example.com/member&gt;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://music.com/person/dave-grohl&gt;.</div>
<div>&lt;http://music.com/band/nirvana&gt;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://example.com/member&gt;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://music.com/person/krist-novoselic&gt;.</div>
<div>&lt;http://music.com/band/nirvana&gt;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://example.com/recorded&gt;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://music.com/track/on-a-plain&gt;.</div>

<p><img  src="/files/misc/2011-09-29-og-graph.png"></p>

<p>Or, to write those statements in shorthand, without repeating the first part of each one:</p>

<div>&lt;http://music.com/band/nirvana&gt;</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://example.com/member&gt;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://music.com/person/kurt-cobain&gt;;</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://example.com/member&gt;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://music.com/person/dave-grohl&gt;;</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://example.com/member&gt;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://music.com/person/krist-novoselic&gt;;</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://example.com/recorded&gt;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://music.com/track/on-a-plain&gt;.</div>

<p>And using prefixes to avoid having to write out the full URI each time:</p>

<div>PREFIX eg: &lt;http://example.com/&gt;</div>
<div>&lt;http://music.com/band/nirvana&gt;</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;eg:member&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://music.com/person/kurt-cobain&gt;;</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;eg:member&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://music.com/person/dave-grohl&gt;;</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;eg:member&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://music.com/person/krist-novoselic&gt;;</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;eg:recorded&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://music.com/track/on-a-plain&gt;.</div>
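<p>As a rough illustration of the two shorthand steps above, here is a minimal Python sketch. The tuple representation, the prefix table and the function names are assumptions made for this example; they are not part of any RDF library.</p>

```python
from collections import defaultdict

# Hypothetical prefix table; "eg" is the prefix used in the example above.
PREFIXES = {"eg": "http://example.com/"}

# Each statement is a (thing, link, thing) tuple of URIs.
TRIPLES = [
    ("http://music.com/band/nirvana", "http://example.com/member", "http://music.com/person/kurt-cobain"),
    ("http://music.com/band/nirvana", "http://example.com/member", "http://music.com/person/dave-grohl"),
    ("http://music.com/band/nirvana", "http://example.com/member", "http://music.com/person/krist-novoselic"),
    ("http://music.com/band/nirvana", "http://example.com/recorded", "http://music.com/track/on-a-plain"),
]


def compact(uri, prefixes=PREFIXES):
    """Shorten a URI to prefix:name form when a known prefix matches."""
    for prefix, base in prefixes.items():
        if uri.startswith(base):
            return prefix + ":" + uri[len(base):]
    return uri


def group_by_subject(triples):
    """Collect the (link, object) pairs for each subject, so the subject is stored once."""
    graph = defaultdict(list)
    for subject, link, obj in triples:
        graph[subject].append((compact(link), obj))
    return dict(graph)
```

<p>Calling group_by_subject(TRIPLES) gives a single entry for the band, holding three "eg:member" pairs and one "eg:recorded" pair.</p>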

<h2>Fetching information</h2>

<p>The URI &lt;http://music.com/band/nirvana&gt; represents the band Nirvana. We could equally have used &lt;http://en.wikipedia.org/wiki/Nirvana_(band)&gt; or &lt;http://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh&gt;*. As these are HTTP URLs, a representation of this Thing can be <a href="http://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh">fetched using HTTP</a> - in this case, your web browser probably receives an HTML representation of the band.</p>

<p>How does the server decide in which format to return that information? There&#39;s a negotiation between whoever requests the information and the server that provides it. The request contains a list of formats that the requester is able to handle, and the server returns the first of those that it&#39;s able to provide. In fact, the information about a Thing might be available as JSON, or XML, or any other format, but <strong>Open Graph requires that every Thing identified by a URL must have an HTML web page that represents it</strong>.</p>
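<p>The negotiation can be sketched in a few lines of Python. This is a simplified model, not how any particular server implements it: the function name is made up for this example, and it ignores the "q" weights that real Accept headers may carry, simply taking the formats in the order they are listed.</p>

```python
def negotiate(accept_header, available):
    """Return the first media type listed in the Accept header that the server can provide."""
    for item in accept_header.split(","):
        # Drop any parameters such as "q=0.9"; this sketch does not rank by them.
        media_type = item.split(";")[0].strip()
        if media_type in available:
            return media_type
    return None  # nothing acceptable; a real server might answer 406 Not Acceptable
```

<p>For example, negotiate("application/json, text/html", ["text/html"]) returns "text/html", because that is the first requested format the server has.</p>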

<p>In this way, we can make statements about any Thing, and fetch information about that Thing by dereferencing its URL to see what information it provides.</p>

<p>How should the information about the Thing be presented in that HTML page**? As the page represents the Thing***, this information can be added to the &lt;head&gt; section of the page; it ends up looking like this:</p>

<div>&lt;meta property=&quot;eg:member&quot; content=&quot;http://music.com/person/kurt-cobain&quot;&gt;</div>
<div>&lt;meta property=&quot;eg:member&quot; content=&quot;http://music.com/person/dave-grohl&quot;&gt;</div>
<div>&lt;meta property=&quot;eg:member&quot; content=&quot;http://music.com/person/krist-novoselic&quot;&gt;</div>
<div>&lt;meta property=&quot;eg:recorded&quot; content=&quot;http://music.com/track/on-a-plain&quot;&gt;</div>

<p>Which is exactly the same information as in the shorthand RDF statements above. It&#39;s RDF in HTML!</p>
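<p>To make that concrete, here is a minimal Python sketch that reads meta[property][content] elements out of an HTML string and turns them back into statements about the page's URL. It uses the standard library's HTMLParser; the class and function names are invented for this example, and it does not check that the elements sit inside the &lt;head&gt; section.</p>

```python
from html.parser import HTMLParser


class MetaPropertyParser(HTMLParser):
    """Collect (property, content) pairs from meta elements."""

    def __init__(self):
        super().__init__()
        self.properties = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property")
        content = attributes.get("content")
        # Skip meta elements that are not property/content pairs (e.g. charset).
        if name and content:
            self.properties.append((name, content))


def statements_from_html(url, html):
    """Turn a page's meta elements into (thing, link, thing) statements about its URL."""
    parser = MetaPropertyParser()
    parser.feed(html)
    return [(url, name, content) for name, content in parser.properties]
```

<p>Fed the four &lt;meta&gt; elements above and the URL http://music.com/band/nirvana, it returns four statements with that URL as the first part of each one.</p>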

<p>If someone says they like the album &quot;Nevermind&quot;, a statement is created:<br>
&lt;http://facebook.com/eaton.alf&gt;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://example.com/emotions/likes&gt;&nbsp;&nbsp;&nbsp;&nbsp;&lt;http://open.spotify.com/album/6okv1avxEgYSdc2JYy6ZEi&gt;</p>

<p>When we fetch the HTML document from the URL referenced (&lt;http://open.spotify.com/album/6okv1avxEgYSdc2JYy6ZEi&gt;), it contains (amongst other things) this information:</p>

<div>&lt;http://open.spotify.com/album/6okv1avxEgYSdc2JYy6ZEi&gt;</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;og:type &quot;music.album&quot;;</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;og:title &quot;Nevermind&quot;;</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;music:release_date &quot;1991-01-01&quot;;</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;music:musician &lt;http://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh&gt;.</div>

<p>And when we fetch the HTML document from the &quot;musician&quot; URL &lt;http://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh&gt;, it contains (amongst other things) this information:</p>

<div>&lt;http://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh&gt;</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;og:title &quot;Nirvana&quot;.</div>

<p>When all that information is combined, we know that this person, who clicked the &quot;Like&quot; button while listening to an album in Spotify, liked the album &quot;Nevermind&quot; by the musician &quot;Nirvana&quot; - which is what gets displayed in their Facebook timeline.</p>
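<p>Combining the two documents amounts to following the "musician" link from one set of properties to the other. A minimal Python sketch, assuming the metadata has already been fetched into a dictionary keyed by URL (the PAGES snapshot and the function name are hypothetical, and this is not a description of how Facebook does it):</p>

```python
# Hypothetical snapshot of the properties found at each URL in the example above.
PAGES = {
    "http://open.spotify.com/album/6okv1avxEgYSdc2JYy6ZEi": {
        "og:type": "music.album",
        "og:title": "Nevermind",
        "music:musician": "http://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh",
    },
    "http://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh": {
        "og:title": "Nirvana",
    },
}


def describe_like(album_url, pages=PAGES):
    """Follow the musician link from the album and build a readable summary."""
    album = pages[album_url]
    musician = pages[album["music:musician"]]
    return 'liked the album "{}" by the musician "{}"'.format(
        album["og:title"], musician["og:title"]
    )
```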

<h2>Referring to URLs</h2>

<p>This is all relevant to my <a href="http://hublog.hubmed.org/archives/001946.html">recent post about citing with URIs</a>. In that demonstration, the script dereferenced the URI to get information about the thing, but specifically asked for JSON. In the end, though, the JSON is basically just a list of properties about the thing being referenced, and there&#39;s no reason why that information can&#39;t be represented in &lt;meta&gt; elements in the &lt;head&gt; of an HTML page, which is exactly what most publishers do in order to <a href="http://scholar.google.com/intl/en/scholar/inclusion.html">get their documents indexed by Google Scholar</a>. They use several prefixes (&quot;dc.&quot;, &quot;prism.&quot;, &quot;citation_&quot;); they often use meta[name][content] instead of meta[property][content], but it&#39;s all basically the same thing. I&#39;ve now updated the script to parse &lt;meta&gt; elements from HTML, alongside JSON responses.</p>
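<p>Treating those differently-prefixed names as "basically the same thing" comes down to a lookup table. The Python sketch below is an illustration only, not the code in the updated script: the mapping is a small, assumed selection of names, and the common keys on the right-hand side are invented for this example.</p>

```python
# Assumed mapping from a few publisher meta names to one common set of keys.
FIELD_MAP = {
    "citation_title": "title",
    "dc.title": "title",
    "citation_author": "author",
    "dc.creator": "author",
    "citation_doi": "doi",
    "citation_journal_title": "journal",
    "prism.publicationname": "journal",
}


def normalise(meta_pairs):
    """Merge (name, content) pairs into a dict of lists keyed by the common field names."""
    record = {}
    for name, content in meta_pairs:
        key = FIELD_MAP.get(name.lower())  # names are matched case-insensitively
        # Unmapped names are ignored; repeated values are stored only once.
        if key and content not in record.setdefault(key, []):
            record[key].append(content)
    return record
```

<p>The (name, content) pairs can come from either meta[name] or meta[property] attributes; after normalising, the rest of a script only has to deal with one set of keys.</p>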

<p>In summary: if someone wants to refer to a Thing, they should be able to use an HTTP URL. If someone wants to get information about that Thing, they should be able to dereference that URL, get an HTML document, look in the &lt;meta&gt; elements in the &lt;head&gt; section, and retrieve all the information about that thing (including further URLs to find out more information about any of those properties).</p>

<hr>

<p>* Asserting equivalence between URIs allows links from one URI to also apply to the other. For example:<br>
&lt;http://example.com/music/nirvana&gt; &lt;http://www.w3.org/2002/07/owl#sameAs&gt; &lt;http://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh&gt;</p>

<p>** We don&#39;t have to worry about representing multiple items on a single page - each one will have a link to its own, individual page.</p>

<p>*** We don&#39;t have to worry about whether the URI represents the Thing or a document about the Thing: it&#39;s always the Thing. Most of the time, no-one cares who wrote the document about the Thing, or when that document was last updated. An exception might be Wikipedia, so I have a suggestion: the Thing is still represented by the web page; information about authors and update times can be attached to an appropriate property of the Thing, e.g. &lt;http://en.wikipedia.org/wiki/Nirvana_(band)#description&gt;.</p>

]]></content>
  </entry>
  <entry>
    <id>tag:hublog.hubmed.org,2010://2.1946</id>
    <title>Citing With URIs in Google Docs</title>
    <updated>2011-09-16T02:14:50+00:00</updated>
    <published>2011-09-16T00:58:35+00:00</published>
    <link rel="alternate" type="html" href="http://hublog.hubmed.org/archives/001946.html" />
    <content type="html" xml:base="http://hublog.hubmed.org/" xml:lang="en"><![CDATA[<p>I built a script that runs in Google Docs, turning inline citations into a formatted bibliography. It lets you cite using DOIs, Mendeley library IDs, or any URL that returns metadata as JSON. It's a first, basic attempt, but here's why I like it:</p>

<h3>Citing With URIs</h3>

<p>The most straightforward way of being able to cite something in a document is to insert an identifier. On the web we hyperlink using URLs, which provide a unique identifier for the item being referenced - with the added bonus of being able to follow that URL to retrieve the item. When writing a scholarly article, however, there's still an expectation that the metadata for a citation will be provided, so that the reference will still make sense even if the URL stops working.</p>

<p>To be able to successfully cite using identifiers, therefore, means being able to retrieve the metadata for each identifier, and the simplest way to do that is to convert that identifier to a URL - if it isn't already - and retrieve it using an HTTP request.</p>

<p>Once we have the metadata for each citation, all that's needed is to generate a bibliography (a list of endnotes) at the end of the document, and insert links to those references inline. As a complication, there are many different publishing systems, and they each have <a href="http://www.zotero.org/styles">their own special preferred formatting</a> for those inline citations and bibliographies, so the tool should ideally be able to cater for any of those formats.</p>

<p>There is a need for citation software that works with Google Docs, as it's basically the standard online writing tool (and is continually getting more awesome). I've managed to get the first steps of a citation processor working in Google Docs; it's not complete yet...</p>

<h3>Inserting and Processing Citations</h3>

<p><a href="http://code.google.com/googleapps/appsscript/">Google Apps Script</a> provides a way to add menu items to Google Docs and call a function when a menu item is selected. It's server-side JavaScript, with an online editor that functions well. You can currently only attach scripts to Google Spreadsheets, but that's ok in this case: we need somewhere to store a local copy of our references.</p>

<p>Here's how to use Google Apps Script to format citations in a Google Document:</p>

<ol>
  <li>Create a new Document in Google Docs and give it a unique title.</li>
  <li>Write your article, adding citations inline in the form {{cite:doi:10.1038/nchem.1108}}.</li>
  <li>Create a new Spreadsheet in Google Docs and give it a title which is the same as the document, but with " - References" at the end.</li>
  <li>Add <a href="https://github.com/hubgit/Exciting">my Exciting script</a> to the spreadsheet (Tools > Script Editor). Once it's installed, an "Exciting" menu should appear.</li>
  <li>From the "Exciting" menu, select "Generate Bibliography".</li>
</ol>

<p>The script will now create a copy of the document (which must be in the same folder as the spreadsheet, and have the same name minus the " - References" suffix); the original document will remain untouched. The script will parse the document for {{cite}} strings, fetch the metadata for each one, and store the data in the current spreadsheet (if the script is run a second time, it will use this local data instead of fetching it again). It will then replace the inline citations with numbered references, add a formatted bibliography at the end of the copy, email you a PDF of the final, formatted document, and move the formatted copy of the document to the trash.</p>
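<p>The find-and-number step can be sketched in Python (the script itself is Google Apps Script, so this is an illustration of the idea rather than its actual code). The pattern, the function name and the choice to reuse a number when the same item is cited twice are all assumptions made for this example.</p>

```python
import re

# Matches markers such as {{cite:doi:10.1038/nchem.1108}} and captures the identifier.
CITE_PATTERN = re.compile(r"\{\{cite:([^}]+)\}\}")


def number_citations(text):
    """Replace each {{cite:...}} marker with [n]; return the new text and the ordered identifiers."""
    identifiers = []

    def replace(match):
        identifier = match.group(1)
        if identifier not in identifiers:
            identifiers.append(identifier)
        # A repeated citation gets the number it was given the first time.
        return "[{}]".format(identifiers.index(identifier) + 1)

    return CITE_PATTERN.sub(replace, text), identifiers
```

<p>The returned list of identifiers is in citation order, which is also the order the bibliography entries would appear in.</p>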
  
<p>[NB: this is a first attempt, written last weekend. The citation formatting is very, very basic.]</p>

<p>There are several ways to cite using this system, and this is where it gets most interesting:</p>
<ul>
  <li>You can use {{cite:doi:10.1038/nchem.1108}} to cite an item by DOI; in this case the data will be fetched from CrossRef.</li>
  <li>You can use {{cite:mendeley:123456}} to cite an item using its ID in your Mendeley library; in this case the data will be fetched from mendeley.com (this might not be working properly yet - I haven't tested it much. It uses OAuth authentication, so you need to register an application and get a key from the <a href="http://dev.mendeley.com/">Mendeley Developers Portal</a>. You also need to run the "authorizeMendeley" function from within the Script Editor, to authorize this application).</li>
  <li><strong>You can use any URL</strong>, as long as it returns JSON when JSON is requested in the HTTP Accept header. For example, {{cite:http://dx.doi.org/10.1038/nchem.1108}} works just as well as the DOI example above.</li>
</ul>
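<p>Turning an identifier into a request might look something like the Python sketch below. It is a simplification under stated assumptions: the function names are invented, DOIs are resolved through dx.doi.org (as in the URL example above) rather than through whatever CrossRef endpoint the script calls, "application/json" is an assumed Accept value, and Mendeley IDs are left out because they need OAuth.</p>

```python
from urllib.request import Request


def citation_url(identifier):
    """Map a citation identifier to the URL its metadata would be requested from."""
    if identifier.startswith(("http://", "https://")):
        return identifier  # already a URL
    scheme, _, value = identifier.partition(":")
    if scheme == "doi":
        return "http://dx.doi.org/" + value
    raise ValueError("No URL rule for identifier: " + identifier)


def metadata_request(identifier):
    """Build (but do not send) a request that asks for JSON via the Accept header."""
    return Request(citation_url(identifier), headers={"Accept": "application/json"})
```

<p>Passing the result to urllib.request.urlopen would send the request; whether JSON comes back depends on the server supporting content negotiation.</p>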

<p>Theoretically, you can cite any URL, and the script will retrieve the metadata from that URL and make use of it. In practice, not nearly as many URLs as I'd like perform content negotiation and return JSON instead of HTML from the same URL, and even when they do there's no standard format for the reference metadata (which is where RDF comes in, but there's no RDF parser in Google Apps Script; RDF triples as JSON would be a good intermediate). The current script has custom functions to normalise the data returned from CrossRef and Mendeley into a single, standard format for local use; adding other sources would probably require a custom parser for their metadata as well.</p>

<p>I'm not able to enter the <a href="http://dev.mendeley.com/api-binary-battle">Mendeley/PLoS API Binary Battle</a>, but I'd be delighted if anyone who's interested were to take this code and make use of it. I see the next steps like this, possibly: 1) get citeproc-node running on a node.js server somewhere (Heroku or Joyent, maybe), and use that for formatting the references; 2) use the UI Services/GUI Builder in Google Apps Script to build an editing interface, for tidying up references once they've been retrieved; 3) add the ability to specify custom formatting for the inline citations, and to choose the citation format for the bibliography.</p>

]]></content>
  </entry>
  <entry>
    <id>tag:hublog.hubmed.org,2010://2.1945</id>
    <title>Client-Side PubMed Searching</title>
    <updated>2011-07-23T18:39:29+00:00</updated>
    <published>2011-07-23T17:35:14+00:00</published>
    <link rel="alternate" type="html" href="http://hublog.hubmed.org/archives/001945.html" />
    <content type="html" xml:base="http://hublog.hubmed.org/" xml:lang="en"><![CDATA[<p>The NCBI have added <a href="https://developer.mozilla.org/en/HTTP_access_control#Access-Control-Allow-Origin"><tt>Access-Control-Allow-Origin: *</tt></a> to the <a href="http://eutils.ncbi.nlm.nih.gov/">eUtils</a> response headers, to allow <a href="http://www.w3.org/TR/cors/">cross-origin resource sharing</a>.</p>

<p>This means that anyone can now make client-side PubMed search interfaces, like <a href="http://alf.hubmed.org/2011/07/pubmed/">this one</a>.</p>

<p>Only the eSearch and eSummary methods have the Access-Control-Allow-Origin header so far, so it's not possible to get abstracts or full citation data this way (using eFetch) yet.</p>
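<p>For reference, the request behind such an interface is just a URL with a few query parameters. A client-side interface would build it in browser JavaScript; the Python sketch below only shows the shape of an eSearch URL, with an invented function name and an arbitrary default for the number of results.</p>

```python
from urllib.parse import urlencode

ESEARCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=20):
    """Build an eSearch URL that searches PubMed for the given term."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)
```

<p>The IDs returned by eSearch can then be passed to eSummary to get a summary of each record.</p>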

]]></content>
  </entry>
</feed>

