<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
   <channel>
      <title>Common Knowledge</title>
      <link>http://scienceblogs.com/commonknowledge/</link>
      <description />
      <language>en</language>
      <copyright>Copyright 2009</copyright>
      <lastBuildDate>Thu, 05 Nov 2009 07:45:41 -0500</lastBuildDate>
      <generator>http://www.sixapart.com/movabletype/?v=4.261</generator>
      <docs>http://blogs.law.harvard.edu/tech/rss</docs> 

      
      <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/scienceblogs/CommonKnowledge" type="application/rss+xml" /><feedburner:emailServiceId>scienceblogs/CommonKnowledge</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item>
         <title>Distributed Science, Part 2</title>
          <description>&lt;p&gt;I got a lot of feedback on my &lt;a href="http://scienceblogs.com/commonknowledge/2009/10/open_source_science_or_distrib.php#comments"&gt;last post&lt;/a&gt;, in which I argued that open source is the wrong metaphor for science, because it ties us too closely to the artifact that is open source software. The core of my argument remains the same - &lt;em&gt;science is not software, and we shouldn't treat it the way we treat software&lt;/em&gt;. But I got a few comments, here on the blog and in email, that are worth looking at.&lt;/p&gt;

&lt;p&gt;Here's comment #1.&lt;/p&gt;

&lt;blockquote&gt;You cite openwetware and the biobricks registry, but if you look closer, openwetware is a wiki, not a website about open source wetware tech. To my knowledge, other than the people over at diybio, there have been no signs of anyone with an understanding of free and open source software infrastructure (not the legalese- the toolchains) applying the concepts to the world of open source science.&lt;/blockquote&gt;

&lt;p&gt;This comment illustrates my point by missing it, which is that we should not be applying the understanding of software to science. In software, we the humans are in charge. We write the code. We compile it. Everything exists inside a system that we built, that is at least somewhat intelligently designed. Bringing this "understanding" to science means we shove a science peg into a software slot. The idea that "open source science" should be a site about wetware tech betrays a focus on the construction of tech, which is indeed the point of software.&lt;/p&gt;

&lt;p&gt;But science isn't like software. Science is about extending the boundaries of our ignorance, not making technology. The difference between making technology (which is the point of software) and making discoveries (the point of science) is the root of the failure of the "open source science" metaphor. Science is about creating knowledge that doesn't exist and exposing ignorance that does exist, not about writing source code that we control. &lt;/p&gt;

&lt;p&gt;In honor of his recent passing, here's Claude Lévi-Strauss: "The scientist is not a person who gives the right answers, he's one who asks the right questions." (from &lt;a href="http://fr.wikipedia.org/wiki/Le_Cru_et_le_cuit"&gt;Le Cru et le cuit&lt;/a&gt;, 1964)&lt;/p&gt;

&lt;p&gt;This is precisely why I want to take us up a layer in the ontology. Open source software is an example of distributed innovation, and as an inspiration to make distributed innovation happen in science, it's lovely. But it's an inspiration, not a map.&lt;/p&gt;

&lt;p&gt;We should absolutely have distributed innovation in science. Open WetWare (which I am well aware is a wiki) contains many protocols, crafts, and techniques that are shared openly. This is a locally relevant form of distribution, even if it doesn't fit into an open source software box. Control over protocols and craft is at the core of one of the biggest resistors to distribution in science, which is &lt;a href="http://www3.interscience.wiley.com/cgi-bin/fulltext/121633537/HTMLSTART?CRETRY=1&amp;SRETRY=0"&gt;competitive withholding&lt;/a&gt;. The registry of standard biological parts is another locally relevant form of distribution. These are resources and toolchains that absolutely support distribution of capability and increase capacity, which are fundamental to early-stage distributed innovation. &lt;/p&gt;

&lt;p&gt;They're just not what we expect when we wear open source glasses.&lt;/p&gt;

&lt;p&gt;Here's comment #2:&lt;/p&gt;

&lt;blockquote&gt;The "Open Gel Box" project is an initiative to bring biotech equipment into the 21st century. We need innovation in "established" tools to make them intuitive and accessible for anyone who wants to work with DNA. To that end, a group of users from the DIYbio list got together and designed a better, faster gel system than what exists today.

&lt;p&gt;Pearl Biotech is now manufacturing a complete gel electrophoresis system according to the Open Gel Box design. The Pearl Gel Box is available for $199 at http://www.pearlbiotech.com. We're advocating for better equipment on all fronts, such as an Open Thermal Cycler.&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;I think this is awesome. It's not "open source" though. It's not even what I'd call "distributed innovation" - the innovation theorists call this kind of thing &lt;a href="http://en.wikipedia.org/wiki/User_innovation"&gt;User-Driven Innovation&lt;/a&gt;. This is about as clear a case of UDI as I know, right down to the fact that it's designed by the DIY folks and then made pretty and sold by a company. This again gets to the paucity of the open source software example. It simply isn't big enough to fit science into it. &lt;/p&gt;

&lt;p&gt;Distributed science, user-driven science, open innovation science, we need ALL of them, not a narrow idea that comes from software. It's about hardware for science. It's about data for science. It's about laboratories for science. It's about research departments and funders and promotion and tenure. It's about paradigms, and paradigm shifts. &lt;/p&gt;

&lt;p&gt;It's not software. &lt;/p&gt;

&lt;p&gt;We control software. We don't control science. DIY Biology is one of the absolute leading examples of how, when we have a critical mass of open craft and protocols, users can lead the way. But it's not something that's enabled by an open source license, a code version repository, and other hallmarks of open source software. It's users saying, "screw this, I can do better" - and doing it. It's users who know the problem best and design the best solutions. &lt;/p&gt;

&lt;p&gt;The business school folks call this "&lt;a href="http://web.mit.edu/evhippel/www/papers/stickyinfo.pdf"&gt;stickiness&lt;/a&gt;." The knowledge of how to make the solution is localized - sticks - to the user. The dumb firms in the sector only make products their marketing departments tell them about, and the smart ones find ways to take user inventions and turn them into their product lines. Like Pearl. &lt;/p&gt;

&lt;p&gt;Comment #3:&lt;/p&gt;

&lt;blockquote&gt; (from my post: Stem cells, mice, vectors, plasmids, and more will need to be available outside the old boys' club that dominates modern life sciences.)

&lt;p&gt;This is simply never gonna happen, because of the huge irreducible expense of maintaining and manipulating these reagents.&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;See: &lt;a href="http://personalgenome.org"&gt;Personal Genome Project&lt;/a&gt;, &lt;a href="http://ccr.coriell.org/"&gt;Coriell Cell Culture Repository&lt;/a&gt;, &lt;a href="http://www.jax.org"&gt;Jackson Laboratories&lt;/a&gt;, &lt;a href="http://www.straininfo.ugent.be/"&gt;StrainInfo&lt;/a&gt;. I could link a dozen more. The nodes are emerging. What's missing is the network that connects them. What's missing is an impact factor for materials. &lt;/p&gt;

&lt;p&gt;We're headed straight towards a future where scientists will need to publish their tools, data, and narratives, instead of compressing everything into a "paper" that is constrained by the cost of printing and mailing. I for one can't wait. It's going to be a key to distributing democratized access to tools, which is fundamental for both distributed innovation *and* user-driven innovation.&lt;/p&gt;

&lt;p&gt;Comment #4:&lt;/p&gt;

&lt;blockquote&gt;I believe your historical facts are a little skewed. Open Biology perhaps began on the internet back with BIONET, which functioned well through the late 80's and early 90's, until the network apparently failed to grab sufficient interest for funding. [...] There have been efforts to create biology software repositories (similar to sourceforge.net except for Biology software) and these have largely failed to attract a majority of Bio-scientists too.&lt;/blockquote&gt;

&lt;p&gt;This comment's talking about software. I'm not. It again illustrates the way that the open source metaphor comes with code-centric blinders. &lt;/p&gt;

&lt;blockquote&gt;It would be great to accelerate this process even further, for example by expanding PLoS, encouraging all scientists to publish their working software (for example, MATLAB scripts) into open source repositories&lt;/blockquote&gt;

&lt;p&gt;Now this is talking about the foundations for distributed science. When there is software in science, it should be published. Just like stem cells. Into repositories. Couldn't agree more.&lt;/p&gt;

&lt;blockquote&gt;encouraging the people-in-the-middle (hobbyists, engineers) to publish in an intermediate form which isn't as strict as a scientific journal yet maintains some level of technological standard and legitimacy -- similar to the Internet RFC's, which started as simple technical memo's.&lt;/blockquote&gt;

&lt;p&gt;Now here's where the comment truly shines, IMO. This is thinking broadly about breaking open the central metaphor of knowledge governance in science. This is not about "open source" - the internet RFCs aren't "open source software" - they are protocols, distributed for implementation and comment. Sort of like that stuff on the Open WetWare wiki, huh?&lt;/p&gt;

&lt;p&gt;Coming back to my point. &lt;/p&gt;

&lt;p&gt;Let's take off the open source glasses. Making science isn't like making software. Engineering foundations for distribution, for user hacking, for bringing more people into the system, these are the things that allowed open source to emerge in software. Good design choices, like separation of concerns, led us to the world of open source software. Let's learn from those lessons and build the foundations first, and let the science surprise us with the way it localizes distributed and user driven innovation. &lt;br /&gt;
&lt;/p&gt;</description>
         <link>http://feedproxy.google.com/~r/scienceblogs/CommonKnowledge/~3/6ySN0TN9shA/distributed_science_part_2.php</link>
         <guid isPermaLink="false">http://scienceblogs.com/commonknowledge/2009/11/distributed_science_part_2.php</guid>
         <category />
         
         <pubDate>Thu, 05 Nov 2009 07:45:41 -0500</pubDate>
      <feedburner:origLink>http://scienceblogs.com/commonknowledge/2009/11/distributed_science_part_2.php</feedburner:origLink></item>
      
      <item>
         <title>Open Source Science? Or Distributed Science?</title>
          <description>&lt;p&gt;I was asked in an interview recently about "&lt;a href="http://www.google.com/search?q=open+source+science&amp;ie=utf-8&amp;oe=utf-8&amp;aq=t&amp;rls=org.mozilla:en-US:official&amp;client=firefox-a"&gt;open source science&lt;/a&gt;" and it got me thinking about the ways that, in the "open" communities of practice, we frequently over-simplify the realities of how software like &lt;a href="http://en.wikipedia.org/wiki/Linux"&gt;GNU/Linux&lt;/a&gt; actually came to be. &lt;a href="http://en.wikipedia.org/wiki/Free_and_open_source_software"&gt;Open Source&lt;/a&gt; refers to a software worldview. It's about software development, not a universal truth that can be easily exported. And it's well worth unpacking the worldview to understand it, and then to look at the realities of open source software as they map - or more frequently do not map - to science.&lt;/p&gt;

&lt;p&gt;The foundations of open source software are relatively easy to track. In the beginning, there was free software and &lt;a href="http://en.wikipedia.org/wiki/Richard_Stallman"&gt;Richard Stallman&lt;/a&gt;. RMS didn't just invent the &lt;a href="http://en.wikipedia.org/wiki/GNU_GPL_license"&gt;GPL&lt;/a&gt; as a legal tool; he also wrote crucial foundational software for writing software, notably the &lt;a href="http://en.wikipedia.org/wiki/GNU_Compiler_Collection"&gt;GNU compiler collection&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/GNU_Debugger"&gt;GNU Debugger&lt;/a&gt;, and the original &lt;a href="http://en.wikipedia.org/wiki/Emacs"&gt;Emacs&lt;/a&gt;. So from the beginning, there was not only a free legal tool, but tools for coding that were better than other systems at the time. &lt;/p&gt;

&lt;p&gt;Simultaneously, we can see that the emergence of &lt;a href="http://en.wikipedia.org/wiki/Microcomputer"&gt;microcomputers&lt;/a&gt; and ubiquitous access to the internet expanded the number (and interconnectivity) of potential programmers. Suddenly there were tens of thousands of programmers with computers at home and at work. The explosion of the Web saw the creation of infrastructure like &lt;a href="http://sourceforge.net/"&gt;code repositories&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/List_of_revision_control_software"&gt;version control systems&lt;/a&gt;, and coding communities. Thanks to &lt;a href="http://en.wikipedia.org/wiki/Object-oriented_programming"&gt;object-orientation&lt;/a&gt;, software was also very amenable to being broken into defined, modular chunks and tasks. One coder could work on a kernel function, another on a user interface function, a third on an application, and they could be reasonably sure that as long as they all followed the standards, their work would snap together into the growing distribution. The phrase "open source" can serve as shorthand for this kind of innovation, which we also see in Wikipedia and other community-built projects. &lt;/p&gt;

&lt;p&gt;Open source, if we view it through a different lens, is really more about a &lt;a href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1021034"&gt;distributed methodology for software development.&lt;/a&gt; The burden of creation is widely distributed across a massive community with more-or-less equal access to tools and systems. In this context, the role of the legal tool is more akin to an enzyme. It was an essential piece of a puzzle, but it was not the only piece. In fact, without the rest of the infrastructure (connectivity, tools, and people) the legal tool on its own would not have led us to GNU/Linux. &lt;/p&gt;

&lt;p&gt;Yet far too often the focus on "porting" open source to science focuses on the legal aspects rather than performing an analysis of the infrastructure for science. Science is actually not very similar to modern software at this point. In science, especially life science, many of these factors don't exist. There isn't democratic access to tools. You tend to need a lab, which means you tend to need to work at a place big enough to afford a lab, which tends to mean you need an advanced degree, which means &lt;em&gt;there is no crowd&lt;/em&gt; - thus the fundamentals for distributed science development aren't there. And when we try to force open source on a knowledge space that is fundamentally poorly structured for distributed development, we'll not only be frustrated by our failures to replicate the GNU/Linux and Wikipedia successes, we'll risk discrediting the idea of distribution itself.&lt;/p&gt;

&lt;p&gt;Another problem: the open source approach, which is based on the open licensing of a powerful, moderately internationally harmonious property right, doesn't really apply very well to science, in which the IP situation is far more often patents v trade secret instead of copyright v copyleft. Copyrights are free to acquire, and thus easy to license at no cost as well. No one's losing an investment they made of $50,000 or more to acquire their copyright when they license code under copyleft. Patents are not so amenable to legal &lt;a href="http://en.wikipedia.org/wiki/Aikido"&gt;aikido&lt;/a&gt;. And they can kill a great idea in the cradle by tying up all the rights in a tangle of patent thickets and expensive licenses.&lt;/p&gt;

&lt;p&gt;A third problem is that science is a long, long, long, long, long way from being a modular knowledge construction discipline. Whereas writing code forces the programmer to compile the code, and the standard distribution forces a certain amount of interoperability, scientists typically write up their knowledge as narrative text. It's written for human brains, not silicon compilers. Scientists are taught to think in a reductionist fashion, asking smaller and smaller questions to prove or disprove specific hypotheses. This system almost guarantees that the tasks fail to achieve modularity like software, and also binds scientists through tradition into a culture of writing their knowledge in a word processor rather than a compiler. Until we can achieve something &lt;a href="http://neurocommons.org"&gt;akin to object-orientation in scientific discourse&lt;/a&gt;, we're unlikely to see distributed innovation erupt as it does in culture and code.&lt;/p&gt;

&lt;p&gt;A fourth problem is collective action congestion: science is crowded with institutional participants, including research institutions, tech transfer offices, venture capital, startups, and so forth. Software isn't subject to these constraints, at least, not most software. But science is like writing code in the 1950s - if you didn't work at a research institution then, you probably couldn't write code, and if you did, you were stuck with punch cards. Science is in the punch card stage, and punch cards aren't so easy to turn into GNU/Linux.&lt;/p&gt;

&lt;p&gt;None of this is meant to discourage open approaches. We need to try. The problems we face, from neglected diseases to climate change to earthquake analysis to sustainability, are so complex that they'll probably overwhelm any approach that is not inherently distributed. Distributed systems scale much better than non-distributed, closed systems. But we should always understand the foundations, and closely examine our work to see if we need to work on building those foundations.&lt;/p&gt;

&lt;p&gt;In the sciences, the first foundation is &lt;a href="http://www.earlham.edu/~peters/fos/overview.htm"&gt;access to the narrative texts&lt;/a&gt; that form the canon of the sciences. Tens of thousands of papers are published a year. They need object-orientation - semantics - so that we can begin to treat that information as a platform, not a consumable product. Licensing is a part of this, but so is technology and scientific culture. &lt;a href="http://www.obofoundry.org/"&gt;Better ontologies&lt;/a&gt;, buy-in to technical standards, publisher participation in integration and federation, and more will be foundational to the establishment of content-as-platform. As the &lt;a href="http://research.microsoft.com/en-us/collaboration/fourthparadigm/default.aspx"&gt;data deluge&lt;/a&gt; intensifies, this foundation becomes more and more important, as the literature provides the context for the data. Moving to a &lt;a href="http://linkeddata.org/"&gt;linked web&lt;/a&gt; or &lt;a href="http://www.w3.org/2001/sw/"&gt;semantic web&lt;/a&gt; without a powerful knowledge platform at the base is building a castle made of sand - close to the water line.&lt;/p&gt;

&lt;p&gt;Another foundation is access to tools and the creation of fundamental open tools. We need the biological equivalent of the C compiler, of Emacs. Stem cells, mice, vectors, plasmids, and more will need to be available outside the old boys' club that dominates modern life sciences. We need access to supercomputers that can run massive simulations for earth sciences and climate sciences. These tools need to be democratized to bring the beginning of distributed knowledge creation into labs, with the efficiencies we know from eBay and Amazon (of course, these tools should perhaps be restricted to authenticated research scientists, so that we don't get garage biologists accidentally creating a super-virus).&lt;/p&gt;

&lt;p&gt;The legal aspects weave through these foundations. The license has power to create freedoms but the improper application of a license approach carries significant risks. The "open source" meme can often feel a little religious about licenses, but it's good to remember that the GPL was invented not in the desire to write a license, but in a desire to return programming to a free state. With data and tools, we have the chance to avoid the intellectual property trap completely - if we have the nerve for it. &lt;/p&gt;

&lt;p&gt;There is some distributed innovation happening in new fields of science, like &lt;a href="http://diybio.org/"&gt;DIY biology&lt;/a&gt;, and in non science communities, like &lt;a href="http://patientslikeme.com"&gt;patients sharing treatments and outcomes with each other.&lt;/a&gt; A quick examination of the foundations reveals they are ripe for distribution: DIY biology can build on &lt;a href="http://openwetware.org/"&gt;open wetware&lt;/a&gt;, the &lt;a href="http://parts.mit.edu"&gt;registry of standard biological parts&lt;/a&gt;, and the availability of equipment and tools. Patients can connect using Web 2.0 and talk to each other without intermediaries. But this doesn't scale across into traditional science.&lt;/p&gt;

&lt;p&gt;I propose that the point of this isn't to replicate "open source" as we know it in software. The point is to create the essential foundations for distributed science so that it can emerge in a form that is locally relevant and globally impactful. We can do this. But we have to be relentless in questioning our assumptions and in discovering the interventions necessary to make this happen. We don't want to wake up in ten years and realize we missed an opportunity by focusing on the software model instead of designing an open system out of which open science might emerge on its own.&lt;br /&gt;
&lt;/p&gt;</description>
         <link>http://feedproxy.google.com/~r/scienceblogs/CommonKnowledge/~3/tLekVMSBBOo/open_source_science_or_distrib.php</link>
         <guid isPermaLink="false">http://scienceblogs.com/commonknowledge/2009/10/open_source_science_or_distrib.php</guid>
         <category />
         
         <pubDate>Fri, 30 Oct 2009 08:58:52 -0500</pubDate>
      <feedburner:origLink>http://scienceblogs.com/commonknowledge/2009/10/open_source_science_or_distrib.php</feedburner:origLink></item>
      
      <item>
         <title>Story Time</title>
          <description>&lt;p&gt;This post was prompted by the combination of three events: a visit with the founder of &lt;a href="http://pubget.com/"&gt;PubGet&lt;/a&gt;, an invitation to keynote at a conference on publishing, and an &lt;a href="http://scienceblogs.com/clock/2009/09/scienceonline09_-_interview_wi_12.php"&gt;interview with Bora&lt;/a&gt; about the &lt;a href="http://www.scienceonline09.com/index.php/wiki/"&gt;Science Online 2009&lt;/a&gt; conference last January in RTP.&lt;/p&gt;

&lt;p&gt;The past year has seen an explosion of talk about the future of the scientific article. It's wonderful to see, even if the results are either &lt;a href="http://www.ploscompbiol.org/doi/pcbi.1000361"&gt;depressingly complicated to achieve&lt;/a&gt; or &lt;a href="http://beta.cell.com/"&gt;depressingly incremental innovation&lt;/a&gt;. Both of those results are better than when I got into this - I remember at a &lt;a href="http://www.lub.lu.se/ncsc2006/"&gt;conference in Sweden in 2006&lt;/a&gt; hearing a grand high priest of the publishing industry argue that they'd gotten this whole digital publishing thing sorted right out...that attitude was the first thing that needed to change. Glad it has.&lt;/p&gt;

&lt;p&gt;I've been hammering for years now on the need to enrich articles with semantics. My talk at that conference in Sweden was probably the first good one I gave on the topic, and it's been a &lt;a href="http://en.wikipedia.org/wiki/Leitmotif"&gt;leitmotif&lt;/a&gt; for me going back to the mid-1990s, when I was studying epistemology and getting my first real exposure to networked computers. For years I was convinced it was right around the corner. &lt;/p&gt;

&lt;p&gt;That semantic publishing future now feels closer than it ever has. But I'm actually less convinced it's around the corner than in years past, and the reasons for that are human, not technical.&lt;/p&gt;

&lt;p&gt;To be clear: in the following, I'm going to be talking about narratives and text, not about databases. The semantic future for databases and data &lt;a href="http://linkeddata.org/"&gt;is&lt;/a&gt; &lt;a href="http://www.w3.org/2001/sw/"&gt;already&lt;/a&gt; &lt;a href="http://neurocommons.org"&gt;here&lt;/a&gt;, but to paraphrase William Gibson, it's just unevenly distributed. Those who argue that the Semantic Web isn't going to work have already lost the argument. You just don't see it, because it's an infrastructure upgrade to the back-end of the Web to make it work for data. &lt;/p&gt;

&lt;p&gt;But the impact of formal semantics on text, which is what humans interface with, has been negligible. It's had nowhere near the impact of tagging and folksonomy. That's driven me, and many others who like formal semantics, crazy.&lt;/p&gt;

&lt;p&gt;The benefits to a formal semantic approach to text are so obvious: we can start to treat knowledge as a graph, and we can even maybe start to get some network externality benefits to that knowledge. Make it more valuable via the network...one fact is like one fax machine, but many facts build a hypothesis, etc. etc. etc.&lt;/p&gt;

&lt;p&gt;Beautiful dream. Not going to happen anytime soon. &lt;/p&gt;

&lt;p&gt;The problem is that people are the writers. Humans. Not machines. Machines luuuuuv semantics. Otherwise they can't tell the difference between a picture and a pitcher (or between a pitcher of water and a baseball pitcher). This is why one should never send one's mother to buy jewelry via Google without the safe browsing mode enabled.&lt;/p&gt;

&lt;p&gt;And people don't like formal semantics. I majored in formal semantics, and it's a topic that still gives me headaches. &lt;/p&gt;

&lt;p&gt;People like stories. &lt;/p&gt;

&lt;p&gt;Scientists are people. &lt;/p&gt;

&lt;p&gt;Scientists like stories. &lt;/p&gt;

&lt;p&gt;A paper is a story. It tells, in its own way, the story of years of work. Of building expertise. Of designing falsifiable hypotheses. Of the results found in the lab. Of the search to balance those results against the canon and dogma. Of the potential ramifications of the results. &lt;/p&gt;

&lt;p&gt;It's a story of science. And the telling of it is an important part of being a human who does science. &lt;/p&gt;

&lt;p&gt;A recent article in PLoS Genetics states that "&lt;a href="http://www.plosgenetics.org/article/info:doi%2F10.1371%2Fjournal.pgen.1000622;jsessionid=BA728C02B80685DF22C9B77B025579AC"&gt;Fission Yeast Tel1ATM and Rad3ATR Promote Telomere Protection and Telomerase Recruitment&lt;/a&gt;" - now, those are the key "facts" asserted. They could be written into machine-readable format. I will spare you what that would look like. Suffice it to say it's eye-bleedingly ugly, and requires lots of agreement about unique identifiers. It's doable. It's being done for the databases and that will eventually make it possible for the literature. It's just not fun. And it ignores the story. &lt;/p&gt;

&lt;p&gt;It reduces the research tale to a few assertions, nested into a massive graph of stuff other people asserted. While this is great for machines, it is lousy for people. &lt;/p&gt;

&lt;p&gt;This is all leading up to an idea I'm working on for the talk later this month. Publishers need to be in the business of providing the service that translates the stories for the machines to understand. The Web makes it trivial to publish stories in human readable form. All the beautiful layout services and print services that used to be worth paying for...aren't. Peer review isn't free, but it's nowhere near as expensive as it's made out to be - and it's going to get transformed by the Web, too. The Web makes peer review massively more powerful as it makes it massively more democratic. The Web kills a lot of things that used to drive value in content, especially controlled content.&lt;/p&gt;

&lt;p&gt;After all, I can't remember the last time I used a Zagat's guide. Not when I have &lt;a href="http://chow.com"&gt;Chowhound&lt;/a&gt;. It's going to come to science. Don't know exactly how, but it's coming.&lt;/p&gt;

&lt;p&gt;But this only covers one piece of science - the telling of the story. There's another key, which is the ability to use the information to write a new tale. The ability to take this massive corpus of story and turn it into something that can be modeled, that can be used by humans and machines together to draft new stories...that ability is going to require the emergence of publishers who understand their role in the new content economy. It's not as printers who use bits rather than ink. It's as translators between the human stories and the machines who have to take those stories, integrate them into a web of linked data, and make it possible for humans to ask questions, dream dreams, and tell new stories.&lt;/p&gt;

&lt;p&gt;The semantic article isn't going to come from individual scientists rebelling and marking up their own text. It's going to be a publisher value-added service - "let us make your article integrated, and comprehensible, so that you maximize your citation count and potential collaboration." &lt;/p&gt;

&lt;p&gt;Sounds good, doesn't it?&lt;/p&gt;

&lt;p&gt;Focusing on the control of copies of the article, of the story, isn't just a losing strategy because of the open access movement, although it is that as well. It's the wrong concept entirely. Translation is a service for which authors would gladly pay. For which searchers would gladly pay. And it's a market that is going to get more valuable as a result of open systems, not less valuable, as the cost of controlled scientific published content drops thanks to &lt;a href="http://www.earlham.edu/~peters/fos/overview.htm"&gt;green and gold open access&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Think about Clayton Christensen's &lt;a href="http://interestingprinciples.blogspot.com/2009/03/law-of-conservation-of-attractive.html"&gt;law of conservation of attractive profits&lt;/a&gt;: "When attractive profits disappear at one stage in the value chain because a product becomes commoditized, the opportunity to earn attractive profits with proprietary products usually emerges at an adjacent stage."&lt;/p&gt;

&lt;p&gt;Publishers are trying to fight the commoditization of the story. They shouldn't. The vast majority of the stories are bought and paid for by the public one way or the other. Publishers should be looking at the place where they can compete on proprietary services, and taking over those markets before their competitors - or startups - beat them to it. There is enormous opportunity in the emerging open access world to make money without needing to vigilantly police the movement of content. &lt;/p&gt;

&lt;p&gt;Help the scientists tell their stories in a way that lets those stories integrate into the digital web. Don't just gussy up a paper version of a story with hyperlinks. Don't focus on controlling the movement of stories. They're sand in your hands once they're on the network. Embrace that fact. Find the value in the next layer, the service layer. &lt;/p&gt;

&lt;p&gt;Be a guide. Be a search engine. Be a &lt;em&gt;translator&lt;/em&gt;. &lt;/p&gt; &lt;a href="http://scienceblogs.com/commonknowledge/2009/09/this_post_was_prompted_by.php#commentsArea"&gt;Read the comments on this post...&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/scienceblogs/CommonKnowledge/~4/IB7raOLs3LM" height="1" width="1"/&gt;</description>
         <link>http://feedproxy.google.com/~r/scienceblogs/CommonKnowledge/~3/IB7raOLs3LM/this_post_was_prompted_by.php</link>
         <guid isPermaLink="false">http://scienceblogs.com/commonknowledge/2009/09/this_post_was_prompted_by.php</guid>
         <category />
         
         <pubDate>Wed, 02 Sep 2009 12:56:54 -0500</pubDate>
      <feedburner:origLink>http://scienceblogs.com/commonknowledge/2009/09/this_post_was_prompted_by.php</feedburner:origLink></item>
      
      <item>
         <title>Ignore this post</title>
          <description>&lt;p&gt;Seriously. Just getting around to technorati claiming. Move along, nothing to see here. Watch for a lengthy post on scientific publishing later tonight or tomorrow.&lt;/p&gt;

&lt;p&gt;59tbcg4wsi&lt;/p&gt; &lt;a href="http://scienceblogs.com/commonknowledge/2009/09/ignore_this_post.php#commentsArea"&gt;Read the comments on this post...&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/scienceblogs/CommonKnowledge/~4/T83wZKcuxdY" height="1" width="1"/&gt;</description>
         <link>http://feedproxy.google.com/~r/scienceblogs/CommonKnowledge/~3/T83wZKcuxdY/ignore_this_post.php</link>
         <guid isPermaLink="false">http://scienceblogs.com/commonknowledge/2009/09/ignore_this_post.php</guid>
         <category />
         
         <pubDate>Tue, 01 Sep 2009 19:59:30 -0500</pubDate>
      <feedburner:origLink>http://scienceblogs.com/commonknowledge/2009/09/ignore_this_post.php</feedburner:origLink></item>
      
      <item>
         <title>Open Data: It's About Interoperability, Not Property</title>
          <description>&lt;p&gt;I wrote this up on the request of a colleague who heard my talk recently on open data. I'm posting it here for comment and adding some hyperlinks...&lt;/p&gt;

&lt;p&gt;Moving from a Web of documents to a &lt;a href="http://www.w3.org/2001/sw/"&gt;Web of data&lt;/a&gt; (or of &lt;a href="http://linkeddata.org/"&gt;Linked Open Data&lt;/a&gt;) is an oft-cited goal in the sciences. The Web of data would allow us to link together disparate information from unrelated disciplines, run powerful queries, and get precise answers to complex, data-driven questions. It's an undoubtedly desirable extension of the way that the existing networks increase the value of documents and computers through connectivity - &lt;a href="http://en.wikipedia.org/wiki/Metcalfe%27s_law"&gt;Metcalfe's Law&lt;/a&gt; applied to more complex information and systems.&lt;/p&gt;

&lt;p&gt;However, making the Web of data turns out to be a deeply complex endeavor.  Data - here, a catchall word covering databases and datasets and generally meaning information that is gathered in the sciences as a result of either experimental work or environmental observation - require a much more robust and complete set of standards to achieve the same "web" capabilities we take for granted in commerce and culture. &lt;/p&gt;

&lt;p&gt;Unlike documents, the ultimate intended reader of most data is a machine. Some classic examples include search engines, analytic software, database back ends, and more. There is simply too much data in production to place people on the front lines of analysis. When data scales easily into the petabytes, we just can't keep up using the existing systems. &lt;/p&gt;

&lt;p&gt;This machine-readability requirement is very different from the Web of documents, which was designed to standardize the way information is shown to people. Machine readability means we have to think, early and often, about the level of interoperability in any given chunk of data. "How "connectable" is it to other data?" should be the first question we ask of new data, because the level of effort required to make data connectable post-hoc is significant - frequently unbearable.&lt;/p&gt;

&lt;p&gt;The connectability quotient creates significant pressures to build interoperability deep into the Web of data. It implies a level of rigor in the design of data, one that recognizes that the intended use of that data is in a network context.  Thus, we need to turn to the concept of interoperability and examine what it means in a data context.&lt;/p&gt;

&lt;p&gt;There are three interlocking dimensions to interoperability in data: legal, technical, and semantic. By legal, we mean the contractual and intellectual property rights associated with the data; by technical, the standard systems (especially the computer languages) in which the data is published; and by semantic,  the actual meaning of the data itself - what it describes, and how it relates to the broader world. &lt;/p&gt;

&lt;p&gt;Each of these dimensions is complex on its own. Taken together, the three represent unsolvable complexity. The semantic layer alone requires an almost miraculous level of agreement on "what things mean," and anyone who has witnessed argument among scientists, be they economists or physicists, knows that even apparently simple topics turn contentious over matters as basic as definitions. Consensus on the technical layer is somewhat easier - the existence of the Web and the Semantic Web "stack" of standard technologies has begun to take a leadership position in data networking - but still difficult, long, and open to argument. One of the only opportunities we have is in the legal layer, where we can look to a broad set of successes in legal interoperability through the use of a simple, flat standard: the public domain.&lt;/p&gt;

&lt;p&gt;The public domain is a very simple concept - no rights are reserved to owners, and all rights are granted to users. The public domain exists as a counterweight to copyright in the creative space, but in some countries - especially the United States - it also serves as a first option for data that is not considered "creative." &lt;/p&gt;

&lt;p&gt;The public domain option currently underpins a wide variety of linked data that is already well on its way to achieving Web scale. From the &lt;a href="http://www.ivoa.net/"&gt;International Virtual Observatory&lt;/a&gt;, whose members build an international data net on norms of "acknowledgment" rather than contracts of "attribution", to the world of genomics, where &lt;a href="http://www.ncbi.nlm.nih.gov/Genbank/"&gt;entire genomes and related data are harmonized nightly across multiple countries&lt;/a&gt;, the public domain creates complete interoperability at the legal layer of the data network, and serves as a foundation for the next layer of technical interoperability. &lt;/p&gt;

&lt;p&gt;Interestingly, we have yet to observe similar network effects emerging in cases where the underlying data is treated in a more conservative "intellectual property" context by using copyright licenses or database licenses inspired by copyright. Indeed, in the case of the international consortium mapping human genomic variation, the implementation of a &lt;a href="http://www.worldlii.org/int/other/PubRL/2003/4.html"&gt;"click through" license&lt;/a&gt; was found in practice to &lt;a href="http://www.sanger.ac.uk/Info/Press/2004/041213.shtml"&gt;impede integration of that mapped variation with other public domain data&lt;/a&gt;, limiting the value of the map. The license was removed, the &lt;a href="http://www.hapmap.org/guidelines_hapmap_data.html.en"&gt;public domain option instated&lt;/a&gt;, and the database was immediately technically integrated with the rest of the international web of gene data. &lt;/p&gt;

&lt;p&gt;The legal element is of course just the beginning. The entities inside the databases themselves must be &lt;a href="http://neurocommons.org/page/URIs"&gt;named and linked&lt;/a&gt;, in a standard way. Consensus on a dizzying array of technical standards must be achieved through working groups and hard won agreement. Semantic agreement - or disagreement - must be enabled where possible, and managed through savvy technology where not possible. But if the entire system must begin with a complex set of legal terms and conditions, and be subject to the kinds of injunctions and property claims so familiar from the creative world, it is inherently unstable and unlikely to interoperate. &lt;/p&gt;
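
&lt;p&gt;To make the naming-and-linking step concrete, here is a minimal sketch in Python. It assumes two independently maintained datasets that both identify genes by the same URI; the example.org URIs, record fields, and variant ids are invented for illustration and are not taken from any real database. Under that assumption, joining the two reduces to a lookup on the shared name.&lt;/p&gt;

```python
# Hypothetical records from two independent databases. The URI scheme
# (http://example.org/gene/...) and the variant ids are made up for illustration.
GENE_DB = {
    "http://example.org/gene/BRCA1": {"symbol": "BRCA1", "chromosome": "17"},
    "http://example.org/gene/TP53": {"symbol": "TP53", "chromosome": "17"},
}

VARIATION_DB = [
    {"variant": "var-001", "gene": "http://example.org/gene/BRCA1"},
    {"variant": "var-002", "gene": "http://example.org/gene/TP53"},
    {"variant": "var-003", "gene": "http://example.org/gene/BRCA1"},
]


def variants_by_gene(genes, variants):
    """Group variant ids under the gene URI they point at."""
    linked = {uri: [] for uri in genes}
    for row in variants:
        uri = row["gene"]
        if uri in linked:  # only link entities that both sides can name
            linked[uri].append(row["variant"])
    return linked


if __name__ == "__main__":
    for uri, found in variants_by_gene(GENE_DB, VARIATION_DB).items():
        print(GENE_DB[uri]["symbol"], found)
```

&lt;p&gt;The join needs no negotiation between the two sources beyond agreeing on the names, which is the point of doing the naming in a standard way.&lt;/p&gt;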

&lt;p&gt;We have seen the public domain option work, again and again, across the scientific disciplines. Implementing the public domain as the interoperability standard for the legal dimension of the web of data holds the greatest promise for scalability and long-term achievement of the network effect for data, as it permits the widest range of experimentation and development at the technical and semantic layers.&lt;/p&gt; &lt;a href="http://scienceblogs.com/commonknowledge/2009/08/open_data_its_about_interopera.php#commentsArea"&gt;Read the comments on this post...&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/scienceblogs/CommonKnowledge/~4/NdAPeC0RrrY" height="1" width="1"/&gt;</description>
         <link>http://feedproxy.google.com/~r/scienceblogs/CommonKnowledge/~3/NdAPeC0RrrY/open_data_its_about_interopera.php</link>
         <guid isPermaLink="false">http://scienceblogs.com/commonknowledge/2009/08/open_data_its_about_interopera.php</guid>
         <category />
         
         <pubDate>Thu, 20 Aug 2009 11:43:31 -0500</pubDate>
      <feedburner:origLink>http://scienceblogs.com/commonknowledge/2009/08/open_data_its_about_interopera.php</feedburner:origLink></item>
      
      <item>
         <title>May All Your Standards Be Simple and Evolvable</title>
          <description>&lt;p&gt;I was in a roundtable yesterday talking about Health IT with a bunch of very smart people in the bay area. It was sort of a briefing of ourselves and others about the real issues underpinning what it would take to generate real disruptive innovation in health technology and health costs. The vast majority of the conversation centered on payment reform, which is outside my ambit. &lt;/p&gt;

&lt;p&gt;But we did spend some time talking about health data standards, and the problem of getting standards that are so geared to the existing market-dominant companies that they actually freeze out new market entrants. My contribution in all this was pretty small, and to me seemed obvious. The standard that works best tends to be the least powerful solution to the problem, especially if it's an openly released solution. This can be counterintuitive - why wouldn't we want the most powerful one? - but it's been proven again and again. &lt;/p&gt;

&lt;p&gt;In technology, standards propagate like kudzu. Most of them go nowhere, representing an enormous sunk cost of time and money. And that's because most of them are way too complex. The more powerful they are, the more brittle they are, the more expensive they are to implement, and the more they restrict the re-use of the system.&lt;/p&gt;

&lt;p&gt;Tim Berners-Lee calls this the &lt;a href="http://www.w3.org/DesignIssues/Principles.html"&gt;Rule of Least Power&lt;/a&gt;, and it's one of the most important lessons I learned working at the W3C. There's a simple reason for this - the more basic the markup of the content, the easier it is to write applications that process the content. &lt;/p&gt;

&lt;p&gt;Thus TCP/IP, created simply to move bits between computers, begat a variety of new protocols like &lt;a href="http://en.wikipedia.org/wiki/File_Transfer_Protocol"&gt;FTP&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Gopher_(protocol)"&gt;Gopher&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Finger_protocol"&gt;Finger&lt;/a&gt;, and many other protocols that layered atop the basic bits standard. Complexity from simplicity. Attempting to embed file transfer into the bits protocol would have made this whole process a lot harder. &lt;/p&gt;

&lt;p&gt;And of course HTML/HTTP begat the entire Web, all the way to YouTube and Amazon and everything else. Writing video codes into HTML wouldn't have worked nearly as well as writing a standard that was simple enough to be extended by smart users coming along ten years later.&lt;/p&gt;

&lt;p&gt;To the rule of least power we can add the rule of openness - the standards process should be as open as is feasible, and the standards themselves must be open. Users have to be able to read a standard, and to have the freedom to implement the standard, to be able to innovate atop it with new systems. &lt;/p&gt;

&lt;p&gt;There's a lesson here. Gathering the relevant powers that be to figure out a standard is an important task. The &lt;a href="http://www.w3.org/"&gt;W3C&lt;/a&gt;, the &lt;a href="http://www.ietf.org/"&gt;IETF&lt;/a&gt;, the &lt;a href="http://www.omg.org/"&gt;OMG&lt;/a&gt; (that's Object Management Group, not the internet acronym, for you younguns), and what &lt;a href="http://www.google.com/search?client=safari&amp;rls=en-us&amp;q=data+standards&amp;ie=UTF-8&amp;oe=UTF-8"&gt;feels like every different data discipline&lt;/a&gt; on earth do standards this way.&lt;/p&gt;

&lt;p&gt;But there are a lot of fingers on the scale for most of this work. That's because data standards tend to get created by well-meaning, overworked, and underpaid people who are making a real sacrifice to work on the standards. And those people are going to depend on a lot of in-kind work from the interested parties, who are always going to try to bend the standards to their will. &lt;/p&gt;

&lt;p&gt;That can go multiple ways. The paranoid conclusion is that the for-profits involved will try to use the standard to increase stock prices, which is why smart standards efforts include &lt;a href="http://www.w3.org/Consortium/Patent-Policy-20040205/"&gt;patent policies&lt;/a&gt; to prevent enclosure. But there's a bigger problem out there, which is much less visible but much more of a force in the creation of standards that don't get used, or that don't do what we want them to do.&lt;/p&gt;

&lt;p&gt;It's what I call the problem of standards completeness. Experts in the field, interested parties, impassioned volunteers - these people by their nature tend to want to make the standard they build as complete as possible. They want to cover the most ground with the standard. They understand the space so well that they want to build standards that address vast swaths of work. &lt;/p&gt;

&lt;p&gt;But that violates the Rule of Least Power. And as we move towards a web of data, even a &lt;a href="http://www.nhinwatch.com/"&gt;web of patient data&lt;/a&gt;, we'll do well to make our standards by solving real problems with the simplest possible solutions, then releasing those solutions for others to build on. &lt;/p&gt;

&lt;p&gt;The impact of the simple evolvable standard in the short term is probably less than that of a more complete, perfect standard. Certainly TCP/IP didn't scare the systems integrators at its inception. But it's the power of the crowd that can build on the open standard that breaks open the market. Thanks to simple standards, &lt;a href="http://en.wikipedia.org/wiki/Google#History"&gt;two talented programmers can start a company in a garage that changes the world&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;If we're going to bring that level of innovation potential to health IT, we need to keep the lessons of the simple standard in mind. Because right now, if you're a bright young entrepreneur, you don't get into health IT. And the lack of not just standards, but the right kinds of standards, is the first barrier we have to knock down to change that reality.&lt;/p&gt; &lt;a href="http://scienceblogs.com/commonknowledge/2009/08/things_you_dont_want_to_watch.php#commentsArea"&gt;Read the comments on this post...&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/scienceblogs/CommonKnowledge/~4/V9gLOrDTW4g" height="1" width="1"/&gt;</description>
         <link>http://feedproxy.google.com/~r/scienceblogs/CommonKnowledge/~3/V9gLOrDTW4g/things_you_dont_want_to_watch.php</link>
         <guid isPermaLink="false">http://scienceblogs.com/commonknowledge/2009/08/things_you_dont_want_to_watch.php</guid>
         <category />
         
         <pubDate>Wed, 05 Aug 2009 12:05:36 -0500</pubDate>
      <feedburner:origLink>http://scienceblogs.com/commonknowledge/2009/08/things_you_dont_want_to_watch.php</feedburner:origLink></item>
      
      <item>
         <title>Integrate. Annotate. Federate. </title>
          <description>&lt;p&gt;Following on to &lt;a href="http://scienceblogs.com/commonknowledge/2009/07/publishing_science_on_the_web.php"&gt;yesterday's post&lt;/a&gt;, where I wrote about the four functions that traditional publishers claim as their space (registration, certification, dissemination, preservation), I want to revisit an argument I made last week at the British Library. &lt;/p&gt;

&lt;p&gt;In &lt;a href="http://www.slideshare.net/wilbanks/future-of-scientific-communication-what-is-the-genuine-article"&gt;my slides&lt;/a&gt;, I argued that the web brings us at least three additional functions: integration, annotation, and federation. I wanted to get this argument out onto the web and get some feedback...&lt;/p&gt;

&lt;p&gt;Let's start with integration. The article no longer sits on a piece of dead tree, inside a journal formatted by date and volume and page number. It exists as a digital entity, capable of dense integration into other digital entities. One way to think of this is to think of how the citation is truly weak tea compared to the hyperlink - an individual citation carries more weight than an individual hyperlink, but the hyperlink is so easy to create, and carries &lt;a href="http://en.wikipedia.org/wiki/PageRank"&gt;so much power in aggregate&lt;/a&gt;, that we get Google. Citations are the only way most articles are integrated with other articles, and that simply has to change.&lt;/p&gt;

&lt;p&gt;Articles need to be integrated with lots of other digital information. Media is an obvious one, and the Elsevier-Cell &lt;a href="http://beta.cell.com/erickson/"&gt;"article of the future"&lt;/a&gt; seems to start here with an interview with the authors. To me this is absurd, and the height of how a "big company" thinks "the users" use the web. I don't want to hear an author interview with a reporter. I assume the author is going to say his or her work is sweets and sparkles and Nobel prizes. I'd rather see an embedded high-resolution video of all protocols necessary to replicate the experiment like the ones you get from &lt;a href="http://www.jove.com/"&gt;JoVE&lt;/a&gt; (I'd like them to actually be open access too, but that's a &lt;a href="http://scienceblogs.com/commonknowledge/2009/04/jove_goes_closed_access.php"&gt;different blog post&lt;/a&gt;). &lt;/p&gt;

&lt;p&gt;If you want to make the article of the future, start with integration and work backwards. Don't start with the article and work forward, because you'll be trapped in the document mentality instead of the network mentality.&lt;/p&gt;

&lt;p&gt;We don't just want the data downloadable, we want to be able to run the same algorithms the author ran on the data, and adjust the variables ourselves, to see if the results are the output of statistical foul play or negligence. We want to be able to hide all the boring language that recapitulates past canon and focus on the new assertions, unless of course the author is trying to game the past canon and shade the facts. And we want to be able to effortlessly click out and get data about the assertions in the paper from other databases - when there's a gene mentioned, we should be able to one-click and run any number of core queries against the sequence and the ontological classifications, order genetic materials from biobanks, and so forth.&lt;/p&gt;

&lt;p&gt;Annotation is the second new essential function. The old method of annotation is to write a new paper that validates, invalidates, extends, or otherwise affects the assertions made in an old paper. Or, if something is really wrong, there might be a letter to the editor or a retraction. In a wiki world, this is fundamentally insane. The paper is a snapshot of years of incremental knowledge progress. We have much better technology to use than dead trees. &lt;/p&gt;

&lt;p&gt;Of course, there isn't any incentive to take the wiki that is science and actually use a wiki to create and edit it. Scientists get tenure for papers, and &lt;a href="http://en.wikipedia.org/wiki/Egoboo"&gt;egoboo&lt;/a&gt; is cold comfort. Annotation needs to be provided by publishers, and is being provided, but the next step is to create an open platform that actually tracks the kind of annotation-relationships that the web enables. Bloggers use &lt;a href="http://en.wikipedia.org/wiki/Trackback"&gt;trackback&lt;/a&gt; to create a formal hyperlink between blog posts, and the protocol can and should be extended to let us connect all sorts of things: articles, wiki pages, database entries, catalog pages for biological materials, data sets, and on and on. By making these link transactions - which exist anyway - explicit and trackable, and most importantly reportable, we'll create a currency that scientists will gladly spend. It won't be about "sharing" but instead about "publishing" more of the intermediate knowledge that currently gets left on the lab floor when the paper gets written.&lt;/p&gt;
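
&lt;p&gt;As a rough sketch of what explicit, trackable, reportable links could look like, the Python below records each link as a small typed record and counts inbound links per relation. This is not the trackback protocol itself; the Link record, the relation vocabulary, and the example.org URLs are all hypothetical.&lt;/p&gt;

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str    # URL of the thing making the link
    target: str    # URL of the thing being linked to
    relation: str  # hypothetical vocabulary, e.g. "validates" or "extends"


def report(links, target):
    """Count inbound links to target, grouped by relation type."""
    return Counter(link.relation for link in links if link.target == target)


# Invented link records connecting a blog post, a wiki page, and a
# data set to papers.
LINKS = [
    Link("http://example.org/blog/post-1", "http://example.org/paper/42", "extends"),
    Link("http://example.org/wiki/ProtocolA", "http://example.org/paper/42", "validates"),
    Link("http://example.org/dataset/7", "http://example.org/paper/42", "validates"),
    Link("http://example.org/blog/post-2", "http://example.org/paper/99", "extends"),
]

if __name__ == "__main__":
    print(dict(report(LINKS, "http://example.org/paper/42")))
```

&lt;p&gt;The report is the part that matters for incentives: once links are recorded as data, a count of who validated or extended a given item is a one-line query rather than a literature search.&lt;/p&gt;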

&lt;p&gt;Federation is the last essential new function I'll deal with here (I have some theories on other long term essential ones, but they're poorly formed in comparison). By federation I mean the ability to take a set of articles and federate them into a corpus with other materials. There are a lot of reasons one might want to do this: text mining, semantic indexing, integration with information that is private, and so forth. It's great to be able to read articles on the web. But if we're going to really explode the way we communicate, the ability to cache local copies (or cloud copies) in new formats for new kinds of analysis, and the right to then distribute the resulting corpus for follow-on innovation and exploration, is going to be central. &lt;/p&gt;

&lt;p&gt;Publishers are so focused on the prevention of copying that they don't see the central business opportunity here: the human-readable, copyrighted version of the article is the least federation-friendly. Charge a fee to make the article beautifully machine-readable and give away the text - because the service of improving the technical aspects of the article is clearly a value-add that shouldn't be subject to a funder mandate.&lt;/p&gt;

&lt;p&gt;Integration, Annotation, Federation. It's what the Web is all about. And if we can get to the point where publishers feel these as core responsibilities, the Open Access debate will have made a major leap. All of these create a world in which the text of the article itself is lower in economic value, and thus more easily distributable, than the connectivity of that article into a larger web of information. OA is the beginning, not the end game, of making the web work for science the way it works for culture. Step two is all about the connectivity, and it's time to start arguing - loudly - for the right to start wiring the science together.&lt;br /&gt;
&lt;/p&gt; &lt;a href="http://scienceblogs.com/commonknowledge/2009/07/integrate_annotate_federate.php#commentsArea"&gt;Read the comments on this post...&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/scienceblogs/CommonKnowledge/~4/iiaKvTx-jic" height="1" width="1"/&gt;</description>
         <link>http://feedproxy.google.com/~r/scienceblogs/CommonKnowledge/~3/iiaKvTx-jic/integrate_annotate_federate.php</link>
         <guid isPermaLink="false">http://scienceblogs.com/commonknowledge/2009/07/integrate_annotate_federate.php</guid>
         <category />
         
         <pubDate>Fri, 31 Jul 2009 11:21:22 -0500</pubDate>
      <feedburner:origLink>http://scienceblogs.com/commonknowledge/2009/07/integrate_annotate_federate.php</feedburner:origLink></item>
      
      <item>
         <title>Publishing science on the web</title>
          <description>&lt;p&gt;I spoke last week at an event at the &lt;a href="http://www.bl.uk"&gt;British Library&lt;/a&gt; about the &lt;a href="http://www.bl.uk/onlinegallery/whatson/events/talkscience.html"&gt;future of the scientific article&lt;/a&gt;. It was a lively event - lots of &lt;a href="http://friendfeed.com/british-library"&gt;friendfeed&lt;/a&gt; and twitter reactions - and it got me thinking a lot about the way we use publication in science. &lt;/p&gt;

&lt;p&gt;In my conversations with research staff and leaders at the BL, I ran across this statement. Publishers frequently claim four functions: registration (when was an idea stated?), certification (is the idea original, has it been "proved" to satisfactory peer review?), dissemination (delivery), and preservation of the record. The journal thus provides for both the claiming of ideas by scientists and for the "memory" of the sciences.&lt;/p&gt;

&lt;p&gt;But the Web does a lot of this for us outside of science. It's become easy to write and read, and to use Google as a memory cache. The ability to rapidly find relevant information is part of daily life for us outside of science. But inside of science there is complaint that even within one's own specialized discipline, there is too much to read, too many journals, too little time. This doesn't even begin to include the coming deluge of data wrought by the relentless miniaturization and parallelization of a world where data is generated by robotic lab machinery and captured by tiny, ubiquitous sensors.&lt;/p&gt;

&lt;p&gt;Wikis and blogs provide almost costless registration and dissemination of new scientific communication. But resistance to wikis and blogs is a feature of science - Nature's web efforts have yet to make significant revenue despite significant individual use. Is it a matter of certification? Preservation? Cultural aspects related to the way we fund and reward scientists?&lt;/p&gt;

&lt;p&gt;Another thought on science communication - science is already a wiki if you look at it a certain way. It's just a really, really inefficient one - the incremental edits are made in papers instead of wikispace, and significant effort is expended to recapitulate the existing knowledge in a paper in order to support the one-to-three new assertions made in any one paper. And the papers are written in a highly specialized form of text that demonstrates the expertise of the writer in the relevant domain, but can form a language barrier to scientists outside the domain understanding the key facts.&lt;/p&gt;

&lt;p&gt;In places where the local knowledge is sufficient to create falsifiable hypotheses and experiments, the time required to learn the language of others doesn't get rewarded by results - gene sequencing doesn't need a physicist, for example. How can we get to enough technical standards so that this kind of science can be harvested, aggregated, and mashed up by people and machines into a higher level of discipline traversal? Right now the problem is we still think about cross disciplinarity as a function of people choosing to work together. But the internet and the Web give us a different model. What's more cross disciplinary than Google? But the language barrier among scientists is preserved - indeed, made worse - by the lack of knowledge interoperability at the machine level. It's the Tower of Babel made digital. Until we can get past that one, we're going to be stuck doing human speed knowledge construction on machine speed data generation...&lt;/p&gt; &lt;a href="http://scienceblogs.com/commonknowledge/2009/07/publishing_science_on_the_web.php#commentsArea"&gt;Read the comments on this post...&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/scienceblogs/CommonKnowledge/~4/ltN0Cs3YzKw" height="1" width="1"/&gt;</description>
         <link>http://feedproxy.google.com/~r/scienceblogs/CommonKnowledge/~3/ltN0Cs3YzKw/publishing_science_on_the_web.php</link>
         <guid isPermaLink="false">http://scienceblogs.com/commonknowledge/2009/07/publishing_science_on_the_web.php</guid>
         <category />
         
         <pubDate>Thu, 30 Jul 2009 13:32:45 -0500</pubDate>
      <feedburner:origLink>http://scienceblogs.com/commonknowledge/2009/07/publishing_science_on_the_web.php</feedburner:origLink></item>
      
      <item>
         <title>WisconsinView converts to CC0</title>
          <description>&lt;p&gt;Just a quick hit - I'm digging out after a wonderful break from work - but this deserves notice...&lt;/p&gt;

&lt;blockquote&gt;Since 2004, &lt;a href="http://www.wisconsinview.org"&gt;WisconsinView&lt;/a&gt; has made aerial photography and satellite imagery of Wisconsin available to the public for free over the web. As part of the AmericaView consortium, WisconsinView supports access and use of these imagery collections through education, workforce development, and research. Starting June 30, 2009, WisconsinView is making available all of its more than 6 Terabytes of imagery data under the new CC0 Protocol provided by Creative Commons. The CC0 (pronounced CC-Zero) Protocol waives any rights in a dataset, ensuring that all of the dataset is available to anyone without encumbrance of any kind.&lt;/blockquote&gt; 

&lt;p&gt;Thanks to Puneet Kishor, our SC Fellow for geospatial, for his tireless advocacy on behalf of the public domain for data!&lt;/p&gt; &lt;a href="http://scienceblogs.com/commonknowledge/2009/07/wisconsinview_converts_to_cc0.php#commentsArea"&gt;Read the comments on this post...&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/scienceblogs/CommonKnowledge/~4/VGJeH_RAbys" height="1" width="1"/&gt;</description>
         <link>http://feedproxy.google.com/~r/scienceblogs/CommonKnowledge/~3/VGJeH_RAbys/wisconsinview_converts_to_cc0.php</link>
         <guid isPermaLink="false">http://scienceblogs.com/commonknowledge/2009/07/wisconsinview_converts_to_cc0.php</guid>
         <category />
         
         <pubDate>Sat, 11 Jul 2009 14:38:04 -0500</pubDate>
      <feedburner:origLink>http://scienceblogs.com/commonknowledge/2009/07/wisconsinview_converts_to_cc0.php</feedburner:origLink></item>
      
      <item>
         <title>Attribution v. Citation</title>
          <description>&lt;p&gt;There's an interesting &lt;a href="http://twitter.com/#search?q=science%20attribution"&gt;tweet&lt;/a&gt; about attribution in the data web. And it raises a tension I run into a lot but haven't seen a lot written about: the shifting nature of what the word "attribution" means.&lt;/p&gt;

&lt;p&gt;We have a fairly common understanding of &lt;a href="http://en.wikipedia.org/wiki/Attribution"&gt;attribution&lt;/a&gt; in our daily lives: "credit where credit is due" is my definition, and it tends to be what most people think. This holds whether one is a musician, a scientist, a teacher, or anyone who does creative or innovative work. We like getting credit for our work. No problem there. &lt;/p&gt;

&lt;p&gt;This idea of attribution encompasses the idea that we should get credit for our ideas: if I'm the first one to realize that a certain gene knockout cures death, the idea is linked to me forever, the way we link Watson and Crick to the discovery of the structure of DNA. In this sense, attribution is very similar to the scholarly concept of &lt;a href="http://en.wikipedia.org/wiki/Citation"&gt;citation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, the word "attribution" in a copyright license is a different beast. It even has a &lt;a href="http://en.wikipedia.org/wiki/Attribution_(copyright)"&gt;different Wikipedia entry&lt;/a&gt; (which I did not create, and have not edited, despite my temptation!). I don't like the first sentence a lot, because it doesn't make clear that &lt;strong&gt;&lt;em&gt;in copyright, attribution is something that gets triggered by the making of a copy&lt;/em&gt;&lt;/strong&gt; - not by the use of the ideas in the copyrighted work.&lt;/p&gt;

&lt;p&gt;This is the thing about the law. It's narrow in a lot of places. And it's often not what we think it is. Mainly because it was written by lawyers, not regular people.&lt;/p&gt;

&lt;p&gt;Let's look at the &lt;a href="http://creativecommons.org/licenses/by/3.0/legalcode"&gt;legal code of the Creative Commons Attribution license.&lt;/a&gt; It's interesting.&lt;/p&gt;

&lt;p&gt;The license grants the following rights:&lt;/p&gt;

&lt;p&gt;- to Reproduce the Work, to incorporate the Work into one or more Collections, and to Reproduce the Work as incorporated in the Collections;&lt;br /&gt;
- to create and Reproduce Adaptations provided that any such Adaptation, including any translation in any medium, takes reasonable steps to clearly label, demarcate or otherwise identify that changes were made to the original Work. For example, a translation could be marked "The original work was translated from English to Spanish," or a modification could indicate "The original work has been modified.";&lt;br /&gt;
- to Distribute and Publicly Perform the Work including as incorporated in Collections; and,&lt;br /&gt;
- to Distribute and Publicly Perform Adaptations.&lt;/p&gt;

&lt;p&gt;See? It's about reproducing the work, adapting the work, and distribution. I don't need these rights to read a work, or study a data set, and take the ideas in the work or the data set. I only need them to make copies and derivatives. The law doesn't allow ideas or facts to be covered by copyright. But don't take it from me, &lt;a href="http://www.copyright.gov/help/faq/faq-general.html"&gt;take it from the US Government:&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;"Copyright does not protect facts, ideas, systems, or methods of operation, although it may protect the way these things are expressed."&lt;/p&gt;

&lt;p&gt;Now, because copyright doesn't protect these things, "attribution" in the sense of the license doesn't apply to ideas or facts either. Those rights above are conditional on my compliance with the terms of the license. Section 4 of the legal code lays out those conditions. If I fail to provide proper attribution, I lose the right to make and distribute copies and derivatives. I do NOT lose the right to "steal" the ideas in the article and claim them as my own, because those ideas are not subject to copyright, and cannot be made subject to the attribution requirement. &lt;/p&gt;

&lt;p&gt;This is where it helps to understand that, to the law, attribution is a very specific term of art, and very different from what we casually and commonly mean by the word. Citation is much closer to the way we think than what is enabled in public copyright licenses or, for that matter, private copyright licenses. &lt;/p&gt;

&lt;p&gt;This is why we recommend waiving attribution in the Science Commons protocol for open access to data. It's a narrow legal term that can screw with interoperability, while at the same time failing to provide what people really want, which is credit where credit is due.&lt;/p&gt;

&lt;p&gt;Puneet Kishor, one of our fellows, got it right. We shouldn't use the law to make it hard to do the wrong thing. We should use technology to make it easy to do the right thing. &lt;/p&gt;

&lt;p&gt;When it comes to data, and in particular to data interoperability, enabling citation and provenance that are easy to track will serve the scientific goals far better than an attempt to port open source "principles" into a world where they fundamentally don't fit. &lt;/p&gt; &lt;a href="http://scienceblogs.com/commonknowledge/2009/06/attribution_v_citation.php#commentsArea"&gt;Read the comments on this post...&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/scienceblogs/CommonKnowledge/~4/NkGGwYtYCQI" height="1" width="1"/&gt;</description>
         <link>http://feedproxy.google.com/~r/scienceblogs/CommonKnowledge/~3/NkGGwYtYCQI/attribution_v_citation.php</link>
         <guid isPermaLink="false">http://scienceblogs.com/commonknowledge/2009/06/attribution_v_citation.php</guid>
         <category />
         
         <pubDate>Wed, 24 Jun 2009 16:44:26 -0500</pubDate>
      <feedburner:origLink>http://scienceblogs.com/commonknowledge/2009/06/attribution_v_citation.php</feedburner:origLink></item>
      
      <item>
         <title>State of Innovation Summit</title>
          <description>&lt;p&gt;I'm at the &lt;a href="http://seedmagazine.com/stateofinnovation/"&gt;Seed - Council on Competitiveness State of Innovation Summit&lt;/a&gt;. I was thinking about live blogging, but find that doing so makes it hard for me to think about what people are actually saying. There's a &lt;a href="http://www.seedmagazine.com/webcast"&gt;webcast&lt;/a&gt; if you're interested.&lt;/p&gt;

&lt;p&gt;As far as conferences go, it's a good one. Rock stars on the stage (E.O. Wilson is a hero of mine) and interesting conversations about innovation.&lt;/p&gt;

&lt;p&gt;But I'm frustrated, as I often am at "innovation" conferences. What follows is a bit of a rant directed less at this event, which as I said is a good one, than at the conversation I hear all the time about scientific innovation. There are three problems.&lt;/p&gt;

&lt;p&gt;Problem 1: there's almost no conversation about the essential theories of emerging innovation - &lt;a href="http://en.wikipedia.org/wiki/Open_innovation"&gt;open&lt;/a&gt;, &lt;a href="http://userinnovation.mit.edu/"&gt;user-driven&lt;/a&gt;, &lt;a href="http://www.mitpressjournals.org/doi/pdf/10.1162/itgg.2007.2.3.97?cookieSet=1"&gt;distributed&lt;/a&gt;. This is about the new forms of innovation that the network enables, and should be on every agenda of every meeting that claims to talk about innovation. If we simply do things the old way, but bigger, we fail. Disruptive innovation models ought to be part of the conversation and they too often aren't. &lt;/p&gt;

&lt;p&gt;Problem 2: there's no conversation about technical infrastructure for innovation. Here's what I mean by that: the internet is infrastructure for innovation in culture and commerce. It underpins an enormous amount of economic value, and from it emerges disruption that we could never have predicted, like the Web. And the web in turn begat Google, Amazon, Facebook, blogging, you name it. Both of these systems work this way &lt;a href="http://isen.com/blog/2009/04/broadband-without-internet-ain-worth.html"&gt;because they are public systems&lt;/a&gt;. Yet we don't talk about an open public technical infrastructure for science. We build individual bits of it, but our vision is investing in unconnected nodes, not networks.&lt;/p&gt;

&lt;p&gt;On top of this, there is the assumption that because the web works for culture, it works for science. But the Web is a system built for documents - it's infrastructure for documents. Science innovation depends on data. This conference had a great panel on data, with Ben Fry, who's a data visualization wizard. Yet there was no conversation about the fact that the infrastructure we have for the Web completely fails at data. Infrastructure for making the web function on data is woeful - format standards, annotation, and so on are always underfunded and the first to be cut in a crisis. &lt;/p&gt;

&lt;p&gt;Infrastructure for data integration, data federation, and so forth should be encoded directly into the open standards of the web and internet. Full stop. And we should talk about this problem more often. Otherwise people look at their iPhones, check for a latte, and assume this level of functionality scales from coffee to the bench. It doesn't.&lt;/p&gt;

&lt;p&gt;Problem 3: there's no conversation about the way that our legal and policy regimes affect emerging modes of innovation. Data use is dependent on legal access to data. There's a range of data regimes across the world that make legal access to data conditional on rights being granted. Copyright licenses prevent innovative scientists from using software to index the literature and integrate it into the database world. Default settings on government policy create strong incentives for patenting smaller and smaller inventions by universities. Tenure and review systems encourage secrecy and withholding.&lt;/p&gt;

&lt;p&gt;Taken together, these three problems represent the core "immune system" of science to disruptive change. That's not a terrible thing. Science should resist some disruptive changes. But right now, the disruptive change being resisted is the network. It's a terrible irony that at the moment we have the technical ability to send any content anywhere at almost no cost of distribution, we haven't got the technical and legal infrastructure to realize the potential of that ability for science. It's an even more terrible irony that the innovation resulting from that ability in culture is being constricted by the very policies and regimes that we claim promote innovation.&lt;/p&gt; &lt;a href="http://scienceblogs.com/commonknowledge/2009/06/state_of_innovation_summit.php#commentsArea"&gt;Read the comments on this post...&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/scienceblogs/CommonKnowledge/~4/NWnGTyM6HMc" height="1" width="1"/&gt;</description>
         <link>http://feedproxy.google.com/~r/scienceblogs/CommonKnowledge/~3/NWnGTyM6HMc/state_of_innovation_summit.php</link>
         <guid isPermaLink="false">http://scienceblogs.com/commonknowledge/2009/06/state_of_innovation_summit.php</guid>
         <category />
         
         <pubDate>Tue, 23 Jun 2009 12:39:25 -0500</pubDate>
      <feedburner:origLink>http://scienceblogs.com/commonknowledge/2009/06/state_of_innovation_summit.php</feedburner:origLink></item>
      
      <item>
         <title>@ the British Library, July 22</title>
          <description>&lt;p&gt;I'm happy to say that I'll be doing a forum at the British Library on July 22, called Scientific Findings in a Digital World: What is the Genuine Article? There's a &lt;a href="http://network.nature.com/groups/genuine_article/forum/topics"&gt;Nature Network group you can join&lt;/a&gt; to participate in the creation of the agenda.&lt;/p&gt;

&lt;p&gt;This is pretty cool. The British Library is a legendary institution, and has some personal resonance for me too - my dad wrote a big chunk of his dissertation in the reading room there. I'll make a few introductory comments and then do my best Oprah impersonation.&lt;/p&gt; &lt;a href="http://scienceblogs.com/commonknowledge/2009/06/_the_british_library_july_22.php#commentsArea"&gt;Read the comments on this post...&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/scienceblogs/CommonKnowledge/~4/ZBXVe0EtfQs" height="1" width="1"/&gt;</description>
         <link>http://feedproxy.google.com/~r/scienceblogs/CommonKnowledge/~3/ZBXVe0EtfQs/_the_british_library_july_22.php</link>
         <guid isPermaLink="false">http://scienceblogs.com/commonknowledge/2009/06/_the_british_library_july_22.php</guid>
         <category />
         
         <pubDate>Mon, 01 Jun 2009 14:32:29 -0500</pubDate>
      <feedburner:origLink>http://scienceblogs.com/commonknowledge/2009/06/_the_british_library_july_22.php</feedburner:origLink></item>
      
      <item>
         <title>Podcast on Cloud of Data</title>
          <description>&lt;p&gt;Paul Miller and I recorded a chat last week that's now online as a &lt;a href="http://cloudofdata.com/2009/06/john-wilbanks-talks-about-creative-commons-data-science-and-more/"&gt;podcast from Cloud of Data&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Paul is a smart guy and it was a fun interview. We first met when he was working with &lt;a href="http://talis.com"&gt;Talis&lt;/a&gt;, which is a very progressive company in the UK (they sponsored some of the development of the &lt;a href="http://www.opendatacommons.org/licenses/pddl/1.0/"&gt;PDDL&lt;/a&gt; and currently host data in the public domain for free in the &lt;a href="http://www.talis.com/platform/cc/"&gt;Talis Connected Commons&lt;/a&gt;) but he's now out freelancing. Check out the podcast and let me know your comments.&lt;/p&gt; &lt;a href="http://scienceblogs.com/commonknowledge/2009/06/podcast_on_cloud_of_data.php#commentsArea"&gt;Read the comments on this post...&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/scienceblogs/CommonKnowledge/~4/oXWEswBAxE0" height="1" width="1"/&gt;</description>
         <link>http://feedproxy.google.com/~r/scienceblogs/CommonKnowledge/~3/oXWEswBAxE0/podcast_on_cloud_of_data.php</link>
         <guid isPermaLink="false">http://scienceblogs.com/commonknowledge/2009/06/podcast_on_cloud_of_data.php</guid>
         <category />
         
         <pubDate>Mon, 01 Jun 2009 09:13:03 -0500</pubDate>
      <feedburner:origLink>http://scienceblogs.com/commonknowledge/2009/06/podcast_on_cloud_of_data.php</feedburner:origLink></item>
      
      <item>
         <title>Video of talk online...</title>
          <description>&lt;p&gt;So, I was supposed to go up to Montreal and Ottawa the past couple of days, but a run of miserable luck with planes made it unworkable (it's complicated).&lt;/p&gt;

&lt;p&gt;Instead, I tried to record a presentation and get it onto the web so we could play it for them, and then take questions by Skype. That also didn't work.&lt;/p&gt;

&lt;p&gt;However, we did succeed in the end in getting the video online. So if you're interested in what I say when I talk to libraries, but haven't been to one of the conferences where I've spoken, take a look.&lt;/p&gt;

&lt;p&gt;&lt;embed src="http://blip.tv/play/gpxSgYWQdIPZMg" type="application/x-shockwave-flash" width="512" height="414" allowscriptaccess="always" allowfullscreen="true"&gt;&lt;/embed&gt; &lt;/p&gt; &lt;a href="http://scienceblogs.com/commonknowledge/2009/05/video_of_talk_online.php#commentsArea"&gt;Read the comments on this post...&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/scienceblogs/CommonKnowledge/~4/aZv1to1FHZ0" height="1" width="1"/&gt;</description>
         <link>http://feedproxy.google.com/~r/scienceblogs/CommonKnowledge/~3/aZv1to1FHZ0/video_of_talk_online.php</link>
         <guid isPermaLink="false">http://scienceblogs.com/commonknowledge/2009/05/video_of_talk_online.php</guid>
         <category />
         
         <pubDate>Thu, 28 May 2009 15:22:22 -0500</pubDate>
      <feedburner:origLink>http://scienceblogs.com/commonknowledge/2009/05/video_of_talk_online.php</feedburner:origLink></item>
      
      <item>
         <title>Digg adopts CC0</title>
          <description>&lt;p&gt;As noted on the Creative Commons blog, the folks at &lt;a href="http://blog.digg.com/?p=779"&gt;Digg have converted to CC0&lt;/a&gt; (replacing a multiyear use of a different public domain legal tool).&lt;/p&gt;

&lt;p&gt;This is very cool on lots of levels. But Daniel Burka of Digg said it best, so I'll make this a short post by simply quoting him...&lt;/p&gt;

&lt;blockquote&gt;This is good for the internet and good for society.&lt;/blockquote&gt;

&lt;p&gt;He's talking about the public domain, and he's right.&lt;br /&gt;
&lt;/p&gt; &lt;a href="http://scienceblogs.com/commonknowledge/2009/05/digg_adopts_cc0.php#commentsArea"&gt;Read the comments on this post...&lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/scienceblogs/CommonKnowledge/~4/knHWkH2eqno" height="1" width="1"/&gt;</description>
         <link>http://feedproxy.google.com/~r/scienceblogs/CommonKnowledge/~3/knHWkH2eqno/digg_adopts_cc0.php</link>
         <guid isPermaLink="false">http://scienceblogs.com/commonknowledge/2009/05/digg_adopts_cc0.php</guid>
         <category />
         
         <pubDate>Wed, 27 May 2009 16:10:38 -0500</pubDate>
      <feedburner:origLink>http://scienceblogs.com/commonknowledge/2009/05/digg_adopts_cc0.php</feedburner:origLink></item>
      
   </channel>
</rss>
