<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Federated Search</title>
	
	<link>http://federatedsearchblog.com</link>
	<description>Covers topics related to federated search and the deep web</description>
	<pubDate>Fri, 03 Jul 2009 23:13:00 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/Federatedsearchblogcom" type="application/rss+xml" /><feedburner:emailServiceId>Federatedsearchblogcom</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>Michael Bergman on data federation</title>
		<link>http://feedproxy.google.com/~r/Federatedsearchblogcom/~3/5jz5sv1lDgM/</link>
		<comments>http://federatedsearchblog.com/2009/07/03/michael-bergman-on-data-federation/#comments</comments>
		<pubDate>Fri, 03 Jul 2009 23:11:25 +0000</pubDate>
		<dc:creator>Sol</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[federated search]]></category>

		<guid isPermaLink="false">http://federatedsearchblog.com/?p=767</guid>
		<description><![CDATA[I&#8217;m new to the term &#8220;data federation.&#8221; How about you? 
Michael Bergman, federated search luminary, just wrote on the subject, preferring the term &#8220;data mixing.&#8221; He explains the concept:

 What is Data Mixing and Why is it So Hard?
As a new term there is no “official” definition of data mixing. However, I think we can [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m new to the term &#8220;<a href="http://en.wikipedia.org/wiki/Federated_database_system">data federation</a>.&#8221; How about you? </p>
<p>Michael Bergman, <a href="http://federatedsearchblog.com/2008/10/17/luminary-interview-with-michael-bergman-a-preview/">federated search luminary</a>, <a href="http://www.mkbergman.com/?p=496">just wrote on the subject</a>, preferring the term &#8220;data mixing.&#8221; He explains the concept:</p>
<blockquote>
<h2> What is Data Mixing and Why is it So Hard?</h2>
<p>As a new term there is no “official” definition of data mixing. However, I think we can consider it as generally equivalent to the older data federation concept.</p>
<p>Data federation is the bringing together of data from heterogeneous and often physically distributed data sources into a single, coherent view. Sometimes this is the result of searching across multiple sources, in which case it is called <a href="http://en.wikipedia.org/wiki/Federated_search">federated search</a>. But it is not limited to search. Data federation is a key concept in <a href="http://en.wikipedia.org/wiki/Business_intelligence">business intelligence</a> and <a href="http://en.wikipedia.org/wiki/Data_warehouse">data warehousing</a> and a driver behind <a href="http://en.wikipedia.org/wiki/Master_data_management">master data management</a> (MDM).
</p></blockquote>
<p><span id="more-767"></span>Bergman explains that data federation was a hot research topic in the 1980s. Computers with differing hardware, operating systems, databases, and other software were proliferating. Today&#8217;s robust and ubiquitous networking protocols were far from mature then, and there were no dominant standards for data representation. Today we take interoperability for granted; if two systems don&#8217;t speak to one another directly, we expect that someone has already developed software to bridge the gap. The whole Internet speaks TCP/IP. XML is everywhere.</p>
<p>In other words, solving some major data federation problems laid the foundation for the Internet and the Web that we enjoy today.</p>
<p>Bergman further explains that the next major challenge is in semantics:</p>
<blockquote><p>
The Internet and its TCP/IP and Web HTTP protocols and XML standards in particular, have been major contributors to overcoming respective physical and syntactical and data exchange heterogeneities. The current challenge is to resolve differences in meaning, or semantics, between disparate data sources. Your “glad” may be someone else’s “happy” and you may organize the world into countries while others organize by regions or cultures.
</p></blockquote>
<p>I recommend Bergman&#8217;s article, especially if you have an interest in the Semantic Web. It&#8217;s moderately technical, but it&#8217;s worth reading to understand where data federation fits into the Semantic Web.</p>
]]></content:encoded>
			<wfw:commentRss>http://federatedsearchblog.com/2009/07/03/michael-bergman-on-data-federation/feed/</wfw:commentRss>
		<feedburner:origLink>http://federatedsearchblog.com/2009/07/03/michael-bergman-on-data-federation/</feedburner:origLink></item>
		<item>
		<title>Science source selection</title>
		<link>http://feedproxy.google.com/~r/Federatedsearchblogcom/~3/n9SyWyiTeh0/</link>
		<comments>http://federatedsearchblog.com/2009/07/01/science-source-selection/#comments</comments>
		<pubDate>Thu, 02 Jul 2009 03:21:58 +0000</pubDate>
		<dc:creator>Sol</dc:creator>
		
		<category><![CDATA[viewpoints]]></category>

		<guid isPermaLink="false">http://federatedsearchblog.com/?p=739</guid>
		<description><![CDATA[My fur was raised when I saw Serials Solutions&#8217; claim that their discovery service was an evolutionary step beyond federated search. I raised my concerns a couple of times: here and here. My beef isn&#8217;t with Serials Solutions as a business, it&#8217;s with their position that it&#8217;s fine to not search content that they don&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>My fur was raised when I saw Serials Solutions&#8217; claim that their discovery service was an evolutionary step beyond federated search. I raised my concerns a couple of times: <a href="http://federatedsearchblog.com/2009/03/19/beyond-federated-search/">here</a> and <a href="http://federatedsearchblog.com/2009/03/20/beyond-federated-search-the-conversation-continues/">here</a>. My beef isn&#8217;t with Serials Solutions as a business, it&#8217;s with their position that it&#8217;s fine to not search content that they don&#8217;t provide access to. There&#8217;s no room (yet) in their discovery service model to include access to quality content that can only be searched live, i.e. via federated search. Carl Grant <a href="http://federatedsearchblog.com/2009/03/30/beyond-federated-search-%E2%80%93-winning-the-battle-and-losing-the-war/">joined the conversation</a> and various people commented, making the topic a very lively one.</p>
<p>My concern was, and is, that libraries and research organizations might give away their responsibility to select quality sources for their patrons, for what I imagine are two primary reasons: (1) library patrons don&#8217;t like to wait 30 seconds for federated search results, and (2) (possibly) cost savings. I don&#8217;t have a lot of sympathy for the Google generation. Even though I&#8217;m an American and my culture has taught me that immediate gratification is a good thing, I think 30 seconds is a small price to pay to see better results. I can&#8217;t speak to cost, as I don&#8217;t have any figures.</p>
<p><span id="more-739"></span>One of my colleagues pointed me to an article by scientist and writer Michael Nielsen, <a href="http://michaelnielsen.org/blog/?p=629">Is scientific publishing about to be disrupted?</a>, which strengthens my belief that access to content from aggregators can only supplement access via other methods such as federated search.</p>
<p>Michael Nielsen is a very accomplished scientist. <a href="http://michaelnielsen.org/blog/?page_id=181">His bio</a> lists some of his impressive credentials:</p>
<blockquote><p>
Michael Nielsen is one of the pioneers of quantum computation. Together with Ike Chuang of MIT, he wrote the <a href="http://books.google.com/books?id=66TgFp2YqrAC&#038;dq=Nielsen+Chuang">standard text on quantum computation</a>. This is the most highly cited physics publication of the last 25 years, and one of the ten most highly cited physics books of all time (Source: Google Scholar, December 2007). He is the author of more than fifty scientific papers, including invited contributions to Nature and Scientific American. His research contributions include involvement in one of the first <a href="http://arxiv.org/abs/quant-ph/9811020">quantum teleportation experiments</a> (<a href="http://scholar.google.com/scholar?hl=en&#038;lr=&#038;cites=15126028160438974342">related</a>), named as one of <a href="http://www.sciencemag.org/cgi/content/full/282/5397/2157?maxtoshow=&#038;HITS=10&#038;hits=10&#038;RESULTFORMAT=&#038;fulltext=1998+breakthrough+of+the+year&#038;searchid=1&#038;FIRSTINDEX=0&#038;resourcetype=HWCIT">Science Magazine’s Top Ten Breakthroughs of the Year for 1998</a>, <a href="http://arxiv.org/abs/quant-ph/9703032">quantum gate teleportation</a>, <a href="http://arxiv.org/abs/quant-ph/9610001">quantum process tomography</a>, the fundamental <a href="http://arxiv.org/abs/quant-ph/9811053">majorization theorem</a> for comparing entangled quantum states, and critical contributions to the formula for the quantum channel capacity. A full list of papers is <a href="http://scholar.google.com/scholar?hl=en&#038;lr=&#038;sa=G&#038;oi=qs&#038;q=ma+nielsen+author:m-nielsen">here</a>.
</p></blockquote>
<p>Nielsen&#8217;s article argues that scientific publishing faces impending disruption. The article is fascinating, and Nielsen is a compelling and well-informed writer. I recommend you read the fairly long article and, if you have time, follow at least some of its numerous links. I should also add that I had the opportunity to spend some time with Nielsen at a conference he helped to organize at the <a href="http://perimeterinstitute.ca/">Perimeter Institute</a>, and I very much appreciate how down to earth the man is.</p>
<p>What I found most valuable in Nielsen&#8217;s writing were various examples of science being published in non-traditional ways. </p>
<p>One example is Nielsen&#8217;s response to a New York Times editorial about the death of newspapers. Here&#8217;s a snippet from the editorial:</p>
<blockquote><p>
There’s a great deal of good commentary out there on the Web, as you say. Frankly, I think it is the task of bloggers to catch up to us, not the other way around… Our board is staffed with people with a wide and deep range of knowledge on many subjects. Phil Boffey, for example, has decades of science and medical writing under his belt and often writes on those issues for us… Here’s one way to look at it: If the Times editorial board were a single person, he or she would have six Pulitzer prizes…
</p></blockquote>
<p>And here&#8217;s Nielsen&#8217;s pointed response:</p>
<blockquote><p>
[The New York Times editorial piece] demonstrates a deep commitment to high-quality journalism, and the other values that have made the New York Times great. In ordinary times this kind of commitment to values would be a sign of strength. The problem is that as good as Phil Boffey might be, I prefer the combined talents of Fields medallist <a href="http://terrytao.wordpress.com/">Terry Tao</a>, Nobel prize winner <a href="http://www.scientificblogging.com/cwieman">Carl Wieman</a>, MacArthur Fellow <a href="http://vonahn.blogspot.com/">Luis von Ahn</a>, acclaimed science writer <a href="http://blogs.discovermagazine.com/loom/">Carl Zimmer</a>, and thousands of others. The blogosphere has at least <a href="http://michaelnielsen.org/blog/?p=545">four Fields medalists</a> (the Nobel of math), three Nobelists, and many more luminaries. The New York Times can keep its Pulitzer Prizes.
</p></blockquote>
<p>Nielsen&#8217;s point is clear. The blogosphere is a tremendous resource for scientists. Libraries and research organizations miss huge amounts of valuable and current material if they only provide access to content from major publishers (or their aggregators). I do realize that the writings of probably all of the bloggers Nielsen mentioned are available through Google and might not make sense to federate. The problem with searching Google for excellent science is that you need the time and discernment to find the good stuff. But however one accesses science content, the power of traditional publishers is waning, which is a very good reason not to depend on them for all the science worth reading.</p>
<p>Here&#8217;s another excerpt from Nielsen&#8217;s article, this one on innovative ways to communicate science that are sprouting up everywhere:</p>
<blockquote><p>
What’s new today is the flourishing of an ecosystem of startups that are experimenting with new ways of communicating research, some radically different to conventional journals. Consider <a href="http://www.chemspider.com/">Chemspider</a>, the excellent online database of more than 20 million molecules, recently <a href="http://www.rsc.org/AboutUs/News/PressReleases/2009/ChemSpider.asp">acquired</a> by the Royal Society of Chemistry. Consider <a href="http://www.mendeley.com/">Mendeley</a>, a platform for managing, filtering and searching scientific papers, with <a href="http://en.wikipedia.org/wiki/Mendeley#History">backing</a> from some of the people involved in Last.fm and Skype. Or consider startups like <a href="http://www.scivee.tv/">SciVee (YouTube for scientists)</a>, the <a href="http://plos.org/">Public Library of Science</a>, the <a href="http://jove.com/">Journal of Visualized Experiments</a>, vibrant community sites like <a href="http://openwetware.org/wiki/Main_Page">OpenWetWare</a> and the <a href="http://www.alzforum.org/">Alzheimer Research Forum</a>, and dozens more. And then there are companies like <a href="http://wordpress.com/">Wordpress</a>, <a href="http://friendfeed.com/">Friendfeed</a>, and <a href="http://www.wikimedia.org/">Wikimedia</a>, that weren’t started with science in mind, but which are increasingly helping scientists communicate their research.
</p></blockquote>
<p>These Web 2.0 science offerings, at least the ones that provide an API or other mechanism for efficient search, are prime candidates for federation because they constantly generate new content.</p>
<p>One last quote from Nielsen. I very much enjoyed the examples he packed into this paragraph of outstanding science being found, of all places, in blogs.</p>
<blockquote><p>
It’s easy to miss the impact of blogs on research, because most science blogs focus on outreach. But more and more blogs contain high quality research content. Look at Terry Tao’s <a href="http://terrytao.wordpress.com/tag/285g-poincare-conjecture/">wonderful series of posts</a> explaining one of the biggest breakthroughs in recent mathematical history, the proof of the <a href="http://en.wikipedia.org/wiki/Poincare_conjecture">Poincare conjecture</a>. Or Tim Gowers recent experiment in “<a href="http://gowers.wordpress.com/2009/01/27/is-massively-collaborative-mathematics-possible/">massively collaborative mathematics</a>”, using open source principles to successfully attack a significant mathematical problem. Or Richard Lipton’s <a href="http://rjlipton.wordpress.com/tag/factoring/">excellent series of posts</a> exploring his ideas for solving a major problem in computer science, namely, finding a fast algorithm for factoring large numbers. Scientific publishers should be terrified that some of the world’s best scientists, people at or near their research peak, people whose time is at a premium, are spending hundreds of hours each year creating original research content for their blogs, content that in many cases would be difficult or impossible to publish in a conventional journal. What we’re seeing here is a spectacular expansion in the range of the blog medium. By comparison, the journals are standing still.
</p></blockquote>
<p>At SLA 2009, <a href="http://federatedsearchblog.com/About#Abe">Abe</a> delivered a presentation: <a href="http://federatedsearchblog.com/2009/06/08/journey-to-10000-sources-at-sla-2009/">A Journey to 10,000 sources</a>. The talk was about the efforts of <a href="http://www.deepwebtech.com">Deep Web Technologies</a> (this blog&#8217;s sponsor) to search first hundreds, then thousands, and eventually 10,000 sources. The <a href="http://www.sla.org/pdfs/sla2009/SciResJourneyTenThouSources.pdf">accompanying paper</a> makes this important argument for making a wider range of science information available to researchers:</p>
<blockquote><p>
By relying on only the content available from the major publishers and aggregators, researchers miss other important content, in particular the output of scientists who do not publish in mainstream journals. The world is shrinking, the brain pool is growing, and the output of science is everywhere.
</p></blockquote>
<p>While one may argue about the merits of federation vs. crawling and indexing vs. discovery services, those arguments frequently focus on the technological merits of particular approaches. The more important question, I think, is what information is worth your while to see. For most of us, that information can&#8217;t all be federated, or all indexed, or all provided by a discovery service. I think federated search will continue to evolve into a hybrid in which multiple technologies are enlisted to give scientists what they need.</p>
]]></content:encoded>
			<wfw:commentRss>http://federatedsearchblog.com/2009/07/01/science-source-selection/feed/</wfw:commentRss>
		<feedburner:origLink>http://federatedsearchblog.com/2009/07/01/science-source-selection/</feedburner:origLink></item>
		<item>
		<title>Review: About.com guide to Online Research</title>
		<link>http://feedproxy.google.com/~r/Federatedsearchblogcom/~3/t2vcAs57eOg/</link>
		<comments>http://federatedsearchblog.com/2009/06/28/review-aboutcom-guide-to-online-research/#comments</comments>
		<pubDate>Mon, 29 Jun 2009 03:40:25 +0000</pubDate>
		<dc:creator>Sol</dc:creator>
		
		<category><![CDATA[books]]></category>

		<category><![CDATA[federated search]]></category>

		<guid isPermaLink="false">http://federatedsearchblog.com/?p=717</guid>
		<description><![CDATA[When I reviewed Going Beyond Google I made a mental note to try to find an inexpensive consumer-oriented guide to performing research in the deep Web. While Going Beyond Google is a great book that I highly recommend for use in LIS programs, the book is a class text and at $65 it&#8217;s not a [...]]]></description>
			<content:encoded><![CDATA[<p>When I reviewed <a href="http://federatedsearchblog.com/2009/04/20/review-going-beyond-google/">Going Beyond Google</a> I made a mental note to try to find an inexpensive consumer-oriented guide to performing research in the deep Web. While Going Beyond Google is a great book that I highly recommend for use in LIS programs, the book is a class text and at $65 it&#8217;s not a book that is aimed at the masses.</p>
<p>When I learned about About.com&#8217;s $18 <a href="http://www.amazon.com/About-com-Guide-Online-Research-Web%C2%97/dp/1598695037">guide to Online Research</a>, I became very curious to see whether I had found a complement to Going Beyond Google. I got a review copy from the publisher, and what follows are my impressions of the book.</p>
<p>The Online Research book is authored by <a href="http://websearch.about.com/bio/Wendy-Boswell-13134.htm">Wendy Boswell</a>, About.com&#8217;s guide to Web Search. The book is 276 pages long and has 15 chapters plus several appendices. It was published in 2007. While that may seem fairly recent, depending on the month of publication the book could be as much as two and a half years old. That&#8217;s getting old for a book with so many references to web resources.</p>
<p><span id="more-717"></span>My main interest was in how useful the book would be for proselytizing about federated and deep Web searching. Chapters 8, 9, and 10 were the most relevant:</p>
<blockquote><p>
Chapter 8: Digging Deeper with the Invisible Web<br />
Chapter 9: The Web as Your Personal Librarian<br />
Chapter 10: Evaluating Web Sites for Credibility
</p></blockquote>
<p>For the sake of completeness, I&#8217;ll list the other chapters, although I only skimmed them:</p>
<blockquote><p>
Chapter 1: An Introduction to the World Wide Web<br />
Chapter 2: The Basic Web Search Toolbox<br />
Chapter 3: Using Search Engines<br />
Chapter 4: Google Tips and Tricks<br />
Chapter 5: Searching the Web with RSS<br />
Chapter 6: The Niche Web<br />
Chapter 7: Using the Social Web in Searches<br />
Chapter 11: Finding Multimedia on the Web<br />
Chapter 12: Mining the Blogosphere<br />
Chapter 13: Keeping Your Web Searches Private<br />
Chapter 14: Most-Requested Reader Searches<br />
Chapter 15: Web 2.0
</p></blockquote>
<h2>Chapter 8: Digging Deeper with the Invisible Web</h2>
<p>This chapter provides a really good introduction to the deep Web. I particularly appreciated this paragraph:</p>
<blockquote><p>
Why is the invisible Web important? I can answer that in one word: quality. Most of the information on the invisible Web is very topic-focused, simply because most of this fantastic information is packaged in various databases concerning everything from archeology to zoology. Because this information is so narrow - and for the most part, academically oriented - you&#8217;re more likely to obtain higher than average quality search results in a shorter amount of time, which definitely comes in handy when you&#8217;re trying to do a research paper on a deadline.
</p></blockquote>
<p>Bingo! I couldn&#8217;t have said it better. I like the author&#8217;s clear and simple style of writing. She goes on to discuss the size of the deep Web, citing statistics from Michael Bergman&#8217;s seminal BrightPlanet paper on the subject. She explains how crawling differs from deep Web searching and how &#8220;invisible Web gateways&#8221; provide access to deep Web content. Most of the rest of the chapter lists deep Web resources (portals and search engines).</p>
<p>I learned a handy trick in this chapter for finding deep Web databases: add the word &#8220;database&#8221; to your queries. Sure enough, when I tried the example of searching for the two words <a href="http://www.google.com/#hl=en&#038;q=flowers+database&#038;aq=f&#038;oq=&#038;aqi=&#038;fp=0k1C0PXk7Hc">&#8220;flowers&#8221; and &#8220;database&#8221;</a> (not as a phrase), the top few results all pointed to searchable databases of flower-related information. I found a pressed flower database, a gardening plant finder from the BBC, and a searchable database of companies in the floral industry, to name a few.</p>
<h2>Chapter 9: Using the Web as Your Personal Librarian</h2>
<p>This chapter is about finding a topic to research. It lists more web resources, in this case general reference resources intended to give a researcher high-level information about a subject. Some of the resources are deep Web ones: <a href="http://www.library.yale.edu/art/databases.html">The Arts Database at Yale</a>, <a href="http://library.dialog.com/bluesheets/html/bl0034.html">SciSearch</a>, <a href="http://www.science.gov">Science.gov</a>, and <a href="http://biography.com">Biography.com</a> are just a few of those mentioned. The chapter does touch on how to evaluate resources for credibility but leaves the deeper discussion for the next chapter.</p>
<p>I have to note that this book is chock full of resources. This chapter, like much of the book, is filled with descriptions of and URLs for many great web sites. While I like the level of detail, I also find it a bit overwhelming: too many sources and not enough time to explore them all. So I find myself skimming much of the book, looking for what&#8217;s relevant to me. Maybe that&#8217;s the author&#8217;s intent.</p>
<h2>Chapter 10: Evaluating Web Sites for Credibility</h2>
<p>Of course, with the scholarly federated search engines out there, one needn&#8217;t worry about the credibility of information. It&#8217;s when one strays from the deep Web search engines that one has to worry about the credibility of the content found. I do think that this question of credibility is a critical one, especially for researchers. But even the general public should be more concerned about what&#8217;s true. Just because it&#8217;s online doesn&#8217;t make it true, right? So, how do we know what&#8217;s true?</p>
<p>This chapter considers factors that determine credibility: outside editorial oversight, double-checking of facts, and maintenance by trained experts. It also gives specific advice on how to evaluate a web site:</p>
<ul>
<li>Who&#8217;s in charge?
<li>Is it absolutely clear which company or organization is responsible for the information on the site?
<li>Is there a link to a page describing what the company or organization does and the people who are involved (an &#8220;About Us&#8221; page)?
<li>Is there a valid way of making sure the company or organization is legit - is this a real place that has real contact information?
<li>Is the site telling me the truth?
<li>What is your source really trying to tell you?
</ul>
<p>I appreciate this exploration of critical thinking skills, which I don&#8217;t hear discussed very often. Given how much time young people entering college are going to spend online, I think it&#8217;s important that they learn to filter what they read.</p>
<p>What do I think of the book? Do I recommend it? I like it, and I recommend it. It is certainly not a replacement for Going Beyond Google. It is not an academic book. I wouldn&#8217;t use it as the only book in a college course, but I would use it as a second source. The book provides a very readable introduction to the deep Web. It lists a great many resources, perhaps too many, though you may not find that overwhelming. It gives a really great introduction to web searching, which applies to federated search as much as it does to searching the crawler-based engines. This is a great book to give to a child heading off to college, especially if the child has an aptitude for or an interest in information science.</p>
]]></content:encoded>
			<wfw:commentRss>http://federatedsearchblog.com/2009/06/28/review-aboutcom-guide-to-online-research/feed/</wfw:commentRss>
		<feedburner:origLink>http://federatedsearchblog.com/2009/06/28/review-aboutcom-guide-to-online-research/</feedburner:origLink></item>
		<item>
		<title>Federated search: A wonder or a waste?</title>
		<link>http://feedproxy.google.com/~r/Federatedsearchblogcom/~3/gNA-yILXWCU/</link>
		<comments>http://federatedsearchblog.com/2009/06/24/federated-search-a-wonder-or-a-waste/#comments</comments>
		<pubDate>Wed, 24 Jun 2009 19:42:45 +0000</pubDate>
		<dc:creator>Sol</dc:creator>
		
		<category><![CDATA[conferences/shows]]></category>

		<category><![CDATA[federated search]]></category>

		<guid isPermaLink="false">http://federatedsearchblog.com/?p=703</guid>
		<description><![CDATA[Michelle Manafy, Chair of the upcoming Enterprise Search Summit West Conference in San Jose in November, came up with the catchy &#8220;Wonder or Waste?&#8221; title for the conference&#8217;s panel discussion about federated search.
Michelle asked me to moderate the panel, an honor I was grateful to accept. Michelle and I worked together to identify and recruit panelists [...]]]></description>
			<content:encoded><![CDATA[<p>Michelle Manafy, Chair of the upcoming <a href="http://www.enterprisesearchsummit.com/west2009">Enterprise Summit West Conference</a> in San Jose in November, came up with the catchy &#8220;Wonder or Waste?&#8221; title for the conference&#8217;s <a href="http://www.enterprisesearchsummit.com/west2009/daythree.shtml">panel discussion</a> about federated search.</p>
<p>Michelle asked me to moderate the panel, an honor I was grateful to accept. Michelle and I worked together to identify and recruit panelists and to define the topic. Here&#8217;s what we came up with:</p>
<blockquote>
<h2>Federated Search: A Wonder or a Waste?</h2>
<p>Thursday, November 19, 2009<br />
10:45 am – 11:30 am</p>
<p>Moderator: Sol Lederman, Federated Search Blog<br />
Dr Peter Noerr, CTO, MuseGlobal, Inc.<br />
Carl Grant, President, Ex Libris North America<br />
Christopher Cox, Dean of Libraries, Western Washington University</p>
<p>Opinions about the value of federated search vary widely. Some view it as the optimal way to discover unified content. Others believe it to be a slow and poor substitute for searching the underlying sources. Some see it as a necessary evil and learn to tolerate it within their organizations. Come listen to our three panelists discuss their experience with numerous federated search deployments. Learn about the benefits federated search can offer as well as its shortcomings and common pitfalls. Walk away knowing if federated search is right for your organization and how to get from a blank search screen to one that delivers the results your users need.
</p></blockquote>
<p><span id="more-703"></span>Regular readers of this blog should recognize all three panelists.</p>
<p>Peter Noerr, CTO of MuseGlobal, has been a great supporter of this blog through a number of insightful comments, although I&#8217;ve not yet corralled him into being a guest writer. MuseGlobal does quite a bit of content integration work that they don&#8217;t label as federated search. Peter was also one of the judges in our first-ever <a href="http://federatedsearchblog.com/2008/06/18/noted-industry-experts-to-judge-federated-search-writing-contest/">predict-the-future-of-federated-search</a> writing contest.</p>
<p>Carl Grant, President of Ex Libris North America, has written a number of very well-received guest articles for this blog. I&#8217;ve given Carl his own <a href="http://federatedsearchblog.com/category/carl-grant/">category</a> in the blog to make it easy for readers to find all his articles. Carl and I first &#8220;met&#8221; when I wrote <a href="http://federatedsearchblog.com/2008/01/12/opentranslators-announcement-raises-questions/">an article</a> in January of 2008 about his old company, CARE Affiliates, and he responded. Carl was also a judge for the writing contest.</p>
<p>Christopher Cox, Dean of Libraries at Western Washington University, was the editor of the book &#8220;Federated Search: Solution or Setback for Online Library Services,&#8221; which contains a number of essays related to all stages of the federated search exploration, procurement, installation and test process. The book&#8217;s publisher kindly gave me <a href="http://federatedsearchblog.com/2008/03/18/christopher-cox-book-review-copies-available/">three copies of the book</a> which I gave away in exchange for <a href="http://federatedsearchblog.com/2008/04/18/federated-search-book-covers-a-wide-range-of-topics/">reviews</a> of parts of the book. </p>
<p>Our panel is going to discuss the different views about federated search. We&#8217;ll consider why some people absolutely abhor federated search while others love it. There might even be a middle ground.</p>
<p>If you&#8217;re planning to attend Enterprise Search Summit West, please come to our panel; I&#8217;d love to meet you. If not, look for my blog article after the event.</p>
<p><a href="http://sharethis.com/item?&wp=2.7.1&amp;publisher=3cfadb1e-a1cd-4b64-bce1-33d1702fd1a6&amp;title=Federated+search%3A+A+wonder+or+a+waste%3F&amp;url=http%3A%2F%2Ffederatedsearchblog.com%2F2009%2F06%2F24%2Ffederated-search-a-wonder-or-a-waste%2F">ShareThis</a></p><img src="http://feeds.feedburner.com/~r/Federatedsearchblogcom/~4/gNA-yILXWCU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://federatedsearchblog.com/2009/06/24/federated-search-a-wonder-or-a-waste/feed/</wfw:commentRss>
		<feedburner:origLink>http://federatedsearchblog.com/2009/06/24/federated-search-a-wonder-or-a-waste/</feedburner:origLink></item>
		<item>
		<title>The “lowest common denominator” myth</title>
		<link>http://feedproxy.google.com/~r/Federatedsearchblogcom/~3/McirVfyqQ4Q/</link>
		<comments>http://federatedsearchblog.com/2009/06/22/the-lowest-common-denominator-myth/#comments</comments>
		<pubDate>Mon, 22 Jun 2009 21:05:15 +0000</pubDate>
		<dc:creator>Sol</dc:creator>
		
		<category><![CDATA[viewpoints]]></category>

		<category><![CDATA[federated search]]></category>

		<guid isPermaLink="false">http://federatedsearchblog.com/?p=683</guid>
		<description><![CDATA[Daniel Tunkelang, Endeca co-founder and Chief Scientist, wrote a guest article articulating a particular problem with federated search. In the article, Daniel wrote:

But federation is no panacea, at least as it is implemented today. A federated search application brokers a query, sending it to multiple search providers (i.e., the search interfaces to a variety of [...]]]></description>
			<content:encoded><![CDATA[<p>Daniel Tunkelang, Endeca co-founder and Chief Scientist, wrote a guest article articulating <a href="http://federatedsearchblog.com/2009/06/12/daniel-tunkelang-on-the-problem-with-federated-search">a particular problem with federated search</a>. In the article, Daniel wrote:</p>
<blockquote><p>
But federation is no panacea, at least as it is implemented today. A federated search application brokers a query, sending it to multiple search providers (i.e., the search interfaces to a variety of content repositories), whose results it then attempts to assemble into a coherent whole. Unfortunately, since most search providers provide little more than the top-ranked result pages, federated search applications are largely reduced to assembling a unified ranking of those disparate result pages.</p>
<p>This functionality is significant, and I do not mean to dismiss it. But it is not enough. In particular, this approach to federation necessarily assumes a <strong>lowest common denominator</strong> of search functionality–a consequence of the requirement to evenhandedly broker among a variety of search applications that vary in the richness of their APIs.
</p></blockquote>
<p><span id="more-683"></span>Note my emphasis of the phrase &#8220;lowest common denominator.&#8221; Peter Noerr, Chief Technology Officer for MuseGlobal, left a detailed comment which reads, in part:</p>
<blockquote><p>
This assumption of [lowest common denominator] LCD searching is taken as an obvious truism. But why?</p>
<p>If federated search systems are presumed to be capable of handling multiple record formats for data extraction from retrieved records, then why should they not be considered capable of generating Source specific search statements?</p>
<p>The reason (as for federated search itself) seems to be that most don’t because it is yet another messy thing to deal with on a Source by Source basis. Note that here I am talking about more than adding blanks and quotes to a search statement.
</p></blockquote>
<p>Peter raises an excellent point. There is a prevalent myth that a federated search application can search no better than its most simple-minded source. But the myth makes no sense. If one of a dozen sources doesn&#8217;t support author search, the LCD myth implies that you&#8217;d get no author results from any source. Taken to its logical conclusion, with enough sources your users would only ever get titles and URLs back because, for any set of searchable fields, some source will fail to support at least one of them.</p>
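<p>To make the argument concrete, here is a small Python sketch. The source names and field sets are invented for illustration, and this is not how any particular product works. It contrasts a broker that only uses the fields every source supports (the literal LCD) with one that plans a separate query for each source:</p>

```python
# Hypothetical capability table: which search fields each source accepts.
SOURCE_FIELDS = {
    "source_a": {"title", "author", "subject", "fulltext"},
    "source_b": {"title", "author", "fulltext"},
    "source_c": {"title", "fulltext"},
}


def lcd_fields(source_fields):
    """Fields every source supports: the literal 'lowest common denominator'."""
    field_sets = list(source_fields.values())
    common = set(field_sets[0]) if field_sets else set()
    for fields in field_sets[1:]:
        common &= fields
    return common


def per_source_plan(query_fields, source_fields):
    """For each source, keep every requested field that source can search."""
    return {
        name: sorted(set(query_fields) & fields)
        for name, fields in source_fields.items()
    }


if __name__ == "__main__":
    print(sorted(lcd_fields(SOURCE_FIELDS)))  # ['fulltext', 'title']
    print(per_source_plan(["author", "title"], SOURCE_FIELDS))
```

<p>With only three sources the common set has already shrunk to title and full text, while the per-source plan still sends an author search to the two sources that can handle it.</p>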
<p>I suspect that what some people consider LCD behavior is this: if, for example, you run an author search against a source that has no searchable author field, you get nothing back from that source. As unpleasant as that sounds, depending on the source, it might be the right behavior, i.e., returning nothing rather than irrelevant results. In other cases, when the user enters text into the advanced search author field, it might be better to run a full-text search against that source than not to search it at all. The whole question of what counts as the lowest common denominator is a messy one because it&#8217;s not clear what the right behavior is. So, it&#8217;s not fair to say that federated search engines do the &#8220;wrong thing&#8221; when searching multiple sources.</p>
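<p>The two behaviors described above could be captured as a per-source setting. In this minimal sketch, the <code>Source</code> class, the <code>unsupported_policy</code> values, and the query dictionary are all hypothetical:</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    name: str
    fields: set
    # Hypothetical per-source policy for a fielded search the source cannot
    # run: "skip" returns nothing, "fulltext" widens to a full-text search.
    unsupported_policy: str = "skip"


def translate(source: Source, field: str, terms: str) -> Optional[dict]:
    """Return the query to send to `source`, or None to skip that source."""
    if field in source.fields:
        return {"field": field, "terms": terms}
    if source.unsupported_policy == "fulltext" and "fulltext" in source.fields:
        return {"field": "fulltext", "terms": terms}
    return None  # better no results than irrelevant ones


if __name__ == "__main__":
    strict = Source("strict_db", {"title"}, "skip")
    lenient = Source("lenient_db", {"title", "fulltext"}, "fulltext")
    print(translate(strict, "author", "EINSTEIN"))   # None
    print(translate(lenient, "author", "EINSTEIN"))  # {'field': 'fulltext', 'terms': 'EINSTEIN'}
```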
<p>Even if we agreed that LCD means suboptimal behavior by the federated search engine, it&#8217;s certainly NOT true that federated search is forced into LCD behavior. Author search is a great example of how a federated search application can do much better than LCD. Author search is a pain. A source could expect the author name in a variety of formats: LASTNAME, FIRSTNAME (with or without the comma) or FIRSTNAME LASTNAME. Then there are first and middle names and first and middle initials to deal with. There are many ways a user could enter a name and many ways a source could want it. Plus, in some cases, a source could recognize more than one name format. A user might search for A S EINSTEIN, ALBERT S EINSTEIN, A SCIENTIST EINSTEIN, ALBERT SCIENTIST EINSTEIN, ALBERT EINSTEIN, or simply EINSTEIN. What happens if the source expects A EINSTEIN and you search for ALBERT EINSTEIN? Will the source do the right thing? Maybe. Maybe not. Smart connectors deal with these kinds of issues by translating, or mapping, the user&#8217;s search terms into the form that will yield the best results from a particular source. A smart connector would turn ALBERT EINSTEIN into A EINSTEIN just for the one source that needed that form in order to return relevant results.</p>
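<p>A rough Python sketch of that kind of mapping follows. It assumes only two name orders on input and two made-up source formats (<code>source_x</code> and <code>source_y</code> are hypothetical); a production connector would need far more rules, such as suffixes, hyphenated surnames, and sources that accept several forms:</p>

```python
def parse_author(raw: str) -> dict:
    """Split a user-entered name into first, middle, and last parts.

    Accepts "LAST, FIRST MIDDLE" or "FIRST MIDDLE LAST"; a single word is
    treated as a last name. Input must contain at least one word.
    """
    raw = " ".join(raw.split()).upper()
    if "," in raw:
        last, _, rest = raw.partition(",")
        last, given = last.strip(), rest.split()
    else:
        parts = raw.split()
        last, given = parts[-1], parts[:-1]
    return {
        "first": given[0] if given else "",
        "middle": " ".join(given[1:]),
        "last": last,
    }


# Hypothetical output formats that two sources might expect.
def initial_last(name: dict) -> str:
    return f"{name['first'][:1]} {name['last']}".strip()


def last_comma_first(name: dict) -> str:
    return f"{name['last']}, {name['first']}" if name["first"] else name["last"]


SOURCE_AUTHOR_FORMAT = {
    "source_x": initial_last,      # wants "A EINSTEIN"
    "source_y": last_comma_first,  # wants "EINSTEIN, ALBERT"
}


def author_query_for(source: str, raw: str) -> str:
    """Rewrite the user's author string into the form `source` expects."""
    return SOURCE_AUTHOR_FORMAT[source](parse_author(raw))


if __name__ == "__main__":
    print(author_query_for("source_x", "ALBERT EINSTEIN"))    # A EINSTEIN
    print(author_query_for("source_y", "Albert S Einstein"))  # EINSTEIN, ALBERT
```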
<p>The reality that should replace the LCD myth is that not all federated search engines are created equal; some deal with picky source behavior better than others. When I worked full-time for blog sponsor <a href="http://deepwebtech.com">Deep Web Technologies</a>, I dabbled in connector building, and I worked closely with their connector developers on some projects. The Deep Web connectors have remarkably complex logic for what I used to think was a simple task. Dealing with search syntax, phrases, wildcards, Booleans, and a host of other factors is far from trivial. The Deep Web connector developers put a lot of sweat and testing into each connector they build.</p>
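<p>To give a flavor of that logic, here is a toy translator that renders one parsed query into two different source syntaxes. The <code>PROFILES</code> table and both syntaxes are hypothetical, and this is not Deep Web Technologies' code; real connectors also have to cope with operator precedence, stop words, and characters a source treats specially:</p>

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Term:
    text: str
    phrase: bool = False
    wildcard: bool = False  # right-truncation: match words starting with text


@dataclass
class And:
    parts: List["Node"]


@dataclass
class Or:
    parts: List["Node"]


Node = Union[Term, And, Or]

# Hypothetical syntax profiles for two imaginary sources.
PROFILES = {
    "source_p": {"and": " AND ", "or": " OR ", "quote": '"', "wild": "*"},
    "source_q": {"and": " & ", "or": " | ", "quote": "'", "wild": "?"},
}


def render(node: Node, p: dict) -> str:
    """Recursively write a query tree in one source's syntax."""
    if isinstance(node, Term):
        text = node.text + (p["wild"] if node.wildcard else "")
        return f'{p["quote"]}{text}{p["quote"]}' if node.phrase else text
    joiner = p["and"] if isinstance(node, And) else p["or"]
    return "(" + joiner.join(render(child, p) for child in node.parts) + ")"


if __name__ == "__main__":
    query = And([Term("deep web", phrase=True),
                 Or([Term("federat", wildcard=True), Term("metasearch")])])
    for name, profile in PROFILES.items():
        print(name, render(query, profile))
```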
<p>Connector quality is not the whole LCD story. Sometimes a source will return very few results compared to other sources. This puts the source at a disadvantage because the more results you have from a source, the better you can rank them. A smart connector can try to get multiple results pages from that source. A source may also be slow to return any results. Rather than ignore results from that source, a smart federated search engine can <a href="http://federatedsearchblog.com/2008/03/28/federated-search-the-challenges-of-incremental-results/">initially show results</a> from the sources that responded quickly and then update the results to merge in the late arrivals. Another major challenge for federated search applications is ranking results well when a source returns no snippet, or returns just the beginning of the abstract instead of a context-sensitive snippet.</p>
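<p>The incremental approach can be sketched with Python's <code>asyncio</code>. Here <code>fake_source</code> is a stand-in for a real connector, and the scores, delays, and deadline are invented. The point is only that results are shown as each source answers and that a source missing the deadline is dropped rather than holding up the page:</p>

```python
import asyncio


async def fake_source(name, delay, hits):
    """Stand-in for a real connector: waits `delay` seconds, then returns
    (name, [(score, title), ...])."""
    await asyncio.sleep(delay)
    return name, hits


async def incremental_search(sources, deadline):
    """Yield the merged, score-sorted result list each time a source answers.

    Sources that have not answered within `deadline` seconds are dropped.
    """
    merged = []
    tasks = [asyncio.create_task(fake_source(*s)) for s in sources]
    try:
        for next_done in asyncio.as_completed(tasks, timeout=deadline):
            name, hits = await next_done
            merged.extend((score, name, title) for score, title in hits)
            merged.sort(reverse=True)  # highest score first
            yield list(merged)  # a snapshot the UI could render right away
    except asyncio.TimeoutError:
        pass  # stragglers are simply left out in this sketch
    finally:
        for task in tasks:
            task.cancel()


async def demo():
    sources = [
        ("fast_db", 0.01, [(0.9, "Paper A"), (0.4, "Paper B")]),
        ("slow_db", 0.05, [(0.7, "Paper C")]),
        ("stuck_db", 5.0, [(0.99, "Never shown")]),
    ]
    async for snapshot in incremental_search(sources, deadline=0.5):
        print([title for _, _, title in snapshot])


if __name__ == "__main__":
    asyncio.run(demo())
```

<p>Run as a script, this prints the fast source's results first, then the re-ranked list once the slower source arrives; the stuck source is cut off at the deadline.</p>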
<p>There&#8217;s more to the LCD story. Rather than repeat it all here, I recommend you read a fairly in-depth article I wrote a while back on what determines <a href="http://federatedsearchblog.com/2008/02/10/what-determines-quality-of-search-results/">the quality of search results</a>. I also recommend a <a href="http://federatedsearchblog.com/2009/03/27/white-paper-on-why-quality-of-search-results-matters/">white paper</a> I wrote that distills the &#8220;quality of search results&#8221; ideas into four pages.</p>
<p>Hopefully, the next time someone tells you that federated search is confined to delivering the lowest common denominator of results you&#8217;ll be able to tell them that it &#8220;ain&#8217;t so&#8221; and you&#8217;ll be able to tell them why.</p>
<p><a href="http://sharethis.com/item?&wp=2.7.1&amp;publisher=3cfadb1e-a1cd-4b64-bce1-33d1702fd1a6&amp;title=The+%26%238220%3Blowest+common+denominator%26%238221%3B+myth&amp;url=http%3A%2F%2Ffederatedsearchblog.com%2F2009%2F06%2F22%2Fthe-lowest-common-denominator-myth%2F">ShareThis</a></p><img src="http://feeds.feedburner.com/~r/Federatedsearchblogcom/~4/McirVfyqQ4Q" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://federatedsearchblog.com/2009/06/22/the-lowest-common-denominator-myth/feed/</wfw:commentRss>
		<feedburner:origLink>http://federatedsearchblog.com/2009/06/22/the-lowest-common-denominator-myth/</feedburner:origLink></item>
		<item>
		<title>Federated search: yes I knew</title>
		<link>http://feedproxy.google.com/~r/Federatedsearchblogcom/~3/7_s0yIcMBus/</link>
		<comments>http://federatedsearchblog.com/2009/06/19/federated-search-yes-i-knew/#comments</comments>
		<pubDate>Fri, 19 Jun 2009 16:49:13 +0000</pubDate>
		<dc:creator>Sol</dc:creator>
		
		<category><![CDATA[viewpoints]]></category>

		<guid isPermaLink="false">http://federatedsearchblog.com/?p=662</guid>
		<description><![CDATA[Search Marketing Sage published an article yesterday entitled Federated Search: Did You Know? I read the article plus one of the articles it referenced, Understanding Federated Link Building: A Primer With Examples, at Search Engine Land. I found them interesting enough to tweet them but I didn&#8217;t think about responding to them because both publishing [...]]]></description>
			<content:encoded><![CDATA[<p>Search Marketing Sage published an article yesterday entitled <a href="http://blog.search-mojo.com/2009/06/18/federated-search-did-you-know/">Federated Search: Did You Know?</a> I read the article plus one of the articles it referenced, <a href="http://searchengineland.com/understanding-federated-link-building-a-primer-with-examples-21056">Understanding Federated Link Building: A Primer With Examples</a>, at Search Engine Land. I found them interesting enough to <a href="http://twitter.com/fedsearchblog">tweet them</a> but I didn&#8217;t think about responding to them because both publishing blogs are SEO (search engine optimization) related and I didn&#8217;t think there was much of a connection between what I blog about and SEO. So, I forgot about the two articles. </p>
<p>To be honest, warning lights went on in my head when I noticed articles in the SEO community about federated search. My knee-jerk reaction was that the articles were going to tell people how to exploit federated search engines to get spam into the search engine indexes. The articles turned out not to be spammy at all, but I still didn&#8217;t have any thoughts on what I might be able to contribute about SEO and federated search. The federated search applications I write about all provide scholarly content, and getting your site aggregated by such an application is next to impossible unless, of course, you&#8217;re a university or other major institution that provides top-notch content. So, I mentally filed the two articles away.</p>
<p><span id="more-662"></span>Then, this morning I did my regular twitter search for &#8220;<a href="http://search.twitter.com/search?q=%22federated+search%22">federated search</a>&#8221; and noticed a bunch of tweets for the &#8220;<a href="http://search.twitter.com/search?q=%22federated+search%22+%22did+you+know%22">Did You Know?</a>&#8221; article. And, I noticed that the &#8220;<a href="http://search.twitter.com/search?q=%22federated+link+building%22">federated link building</a>&#8221; article was also getting attention on twitter. So, I reconsidered my attitude about these two articles. A number of people found these articles valuable enough to retweet. Maybe I could find something to say.</p>
<p>[begin tongue-in-cheek]<br />
Heck, if nothing else, maybe I&#8217;d get some inbound links for this post. Maybe if I write about the subject and reference my article on enough blogs that discuss SEO and &#8220;federated search&#8221; I could get some new blog subscribers and new Twitter followers. Maybe some of the blogs I could comment on would be &#8220;do follow&#8221; blogs with a high Google PageRank that would give me some PageRank juice, although, with a PageRank of 7, I&#8217;m likely to give more through my link love than I would receive.<br />
[end tongue-in-cheek]</p>
<p>Seriously, though, the &#8220;Federated Link Building&#8221; article got me thinking about SEO and federated search applications. There&#8217;s actually tremendous interest in SEO among the organizations whose content is being federated. <a href="http://www.osti.gov">OSTI</a>, one of my clients, makes a large amount of content available via OAI-PMH, through <a href="http://oaister.org">OAIster</a> and <a href="http://opendoar.org">OpenDOAR</a>, and to anyone else who wants to harvest it. Plus, they create large sitemaps to make it easy for the big three (Google, Yahoo!, and MSN) to index their content. This blog&#8217;s sponsor, <a href="http://deepwebtech.com">Deep Web Technologies</a>, is building a product that lets organizations create static, SEO-friendly topic pages that combine search results from their primary sources with reference material and images from outside sources. <a href="http://scitopia.org">Scitopia.org</a> is just one Deep Web Technologies customer that is very interested in the outreach potential of having Google find and index their content. Creating sitemaps and topic pages and making content &#8220;harvestable&#8221; are all ways to expose deep web content that crawlers would find very difficult, if not impossible, to reach without help. At the end of the day, every online organization wants to attract readers, followers, subscribers, and buyers of what they&#8217;re providing. Even organizations that give away information want enough traffic to justify their worth.</p>
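<p>For readers who have not built one, a sitemap is a simple XML file defined by the sitemaps.org protocol. The Python sketch below uses only the standard library; the record URLs are placeholders, and it is not a description of how OSTI generates its sitemaps. Each file may list at most 50,000 URLs, so larger collections are split, and in practice the resulting files would be referenced from a sitemap index file (not shown):</p>

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def build_sitemaps(records):
    """Turn (url, lastmod) pairs into one or more sitemap XML strings.

    `lastmod` should be a W3C date such as "2009-06-19". Characters like
    "&" in URLs are escaped automatically by ElementTree.
    """
    ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." with no prefix
    records = list(records)
    sitemaps = []
    for start in range(0, len(records), MAX_URLS_PER_SITEMAP):
        urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
        for loc, lastmod in records[start:start + MAX_URLS_PER_SITEMAP]:
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        sitemaps.append(ET.tostring(urlset, encoding="unicode"))
    return sitemaps


if __name__ == "__main__":
    docs = [("http://www.example.org/record?id=1", "2009-06-19")]
    print(build_sitemaps(docs)[0])
```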
<p>Going back to Eric Ward&#8217;s &#8220;federated linking&#8221; idea about building links by getting aggregated by metasearch engines, I think that can work for folks who have authoritative, or at least highly desirable, content. This is a case of the rich getting richer; if you already have popular content then you may be able to attract search aggregators who might make your content even more attractive. If your content is poor then no aggregator with a reputation to maintain would want to touch it. </p>
<p>I think the point of Eric&#8217;s article might be that SEO folks don&#8217;t swim in the &#8220;deep web&#8221; waters and that they should consider dipping a toe in, so to speak. Many might never have heard of the deep web or thought about how many metasearch engines there are out there. Charles Knight and his writers cover the alternative search engine industry, including metasearch engines, at <a href="http://altsearchengines.com">AltSearchEngines</a>, so I recommend that blog as a way of finding sites that might aggregate you.</p>
<p>My suspicion that many in the SEO world may never have heard of federated search is confirmed by the &#8220;Did You Know&#8221; article I mentioned early on. Author Catherine Potts leads with this:</p>
<blockquote><p>
What is Federated Search?</p>
<p>I really feel like an idiot because I was unaware of this term until yesterday but I guess it’s a pretty obvious thing. Eric Ward discusses it in his latest article over at <a href="http://searchengineland.com/understanding-federated-link-building-a-primer-with-examples-21056">Search Engine Land</a>. After all the reading I’ve done… was I not paying attention or is this topic just not really something most SEOs focus on? It’s just not something I’ve had to really deal with in my job. It is something I’ve used while in college, however.
</p></blockquote>
<p>Catherine goes on to give a nice introduction to federated search and to the limitations of crawling and is kind enough to reference my blog. Plus, given the number of retweets by link builders, her thoughts and her article have struck a chord in her community.</p>
<p>So, those are my thoughts on SEO and federated search. If you like this article, please Digg it, tweet it, Stumble it, and link to it from your highest ranking site. Oh, and if someone can tell me how I can rank higher than Wikipedia in the SERPs for &#8220;federated search&#8221; I&#8217;d very much appreciate it.</p>
<p><a href="http://sharethis.com/item?&wp=2.7.1&amp;publisher=3cfadb1e-a1cd-4b64-bce1-33d1702fd1a6&amp;title=Federated+search%3A+yes+I+knew&amp;url=http%3A%2F%2Ffederatedsearchblog.com%2F2009%2F06%2F19%2Ffederated-search-yes-i-knew%2F">ShareThis</a></p><img src="http://feeds.feedburner.com/~r/Federatedsearchblogcom/~4/7_s0yIcMBus" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://federatedsearchblog.com/2009/06/19/federated-search-yes-i-knew/feed/</wfw:commentRss>
		<feedburner:origLink>http://federatedsearchblog.com/2009/06/19/federated-search-yes-i-knew/</feedburner:origLink></item>
		<item>
		<title>Noteworthy news</title>
		<link>http://feedproxy.google.com/~r/Federatedsearchblogcom/~3/_n6E_MzpS3A/</link>
		<comments>http://federatedsearchblog.com/2009/06/15/noteworthy-news/#comments</comments>
		<pubDate>Tue, 16 Jun 2009 00:20:57 +0000</pubDate>
		<dc:creator>Sol</dc:creator>
		
		<category><![CDATA[industry news]]></category>

		<category><![CDATA[federated search]]></category>

		<guid isPermaLink="false">http://federatedsearchblog.com/?p=622</guid>
		<description><![CDATA[In the past few days a number of noteworthy happenings have hit my radar. Here are three:

ScienceResearch.com debuts as the most comprehensive Deep Web science search engine
Blog sponsor Deep Web Technologies announced the official launch of ScienceResearch.com. The science research portal itself is not new. What&#8217;s new is that it now searches over 400 sources [...]]]></description>
			<content:encoded><![CDATA[<p>In the past few days a number of noteworthy happenings have hit my radar. Here are three:</p>
<h2>ScienceResearch.com debuts as the most comprehensive Deep Web science search engine</h2>
<p>Blog sponsor Deep Web Technologies announced the <a href="http://www.ereleases.com/pr/scienceresearchcom-debuts-worlds-comprehensive-deep-web-science-search-engine-21723">official launch</a> of <a href="http://scienceresearch.com">ScienceResearch.com</a>. The science research portal itself is not new. What&#8217;s new is that it now searches over 400 sources in real time. It runs on Amazon&#8217;s EC2 cloud, which lets it respond quickly to changes in demand. Amazon&#8217;s Web Services Blog <a href="http://aws.typepad.com/aws/2009/06/scienceresearch-deep-web-search-engine-now-on-amazon-ec2.html">reported on the launch</a>. Paula Hane at Information Today also <a href="http://newsbreaks.infotoday.com/NewsBreaks/Deep-Web-Tech-Relaunches-ScienceResearchcom-54675.asp">covered the relaunch</a> in a good amount of detail. All of this press coincides with Deep Web Technologies&#8217; founder Abe Lederman&#8217;s presentation today at SLA: <a href="http://federatedsearchblog.com/2009/06/08/journey-to-10000-sources-at-sla-2009/">Journey to 10,000 sources</a>. ScienceResearch.com is a key accomplishment on the road to searching 10,000 sources at once.</p>
<p><span id="more-622"></span></p>
<h2>Scitopia.org adds new content partners</h2>
<p><a href="http://www.scitopia.org">Scitopia.org</a> is a federated search application, powered by search technology from Deep Web Technologies. The portal was created by leading scientific and engineering societies. Today <a href="http://www.ereleases.com/pr/scitopiaorg-celebrates-sla-centennial-content-partners-21756">they announced</a> that three societies had joined. Scitopia boasts access to &#8220;over 3.5 million documents, plus patent and government data.&#8221; Content is from &#8220;twenty one societies spanning 350 years of sci-tech scholarship.&#8221;</p>
<h2>OSTI web-site sports a new look and releases two new videos</h2>
<p>One of my clients, the DOE Office of Scientific and Technical Information (OSTI), recently <a href="http://www.osti.gov">updated their web-site</a>. I really like the modern look and the move to highlight RSS, their blog, alerts, and other user-friendly features. SLA&#8217;s Government Information Division Blog <a href="http://sla-divisions.typepad.com/government_information/2009/06/new-look-new-features-for-ostigov.html">briefly covered</a> the new look. OSTI, a pioneer in bringing federated search to the Federal government, is wholeheartedly embracing Web 2.0. On a somewhat related note, I recently reported on <a href="http://federatedsearchblog.com/2009/06/06/osti-videos-dive-into-the-deep-web/">two videos</a> that OSTI had released on their new <a href="http://www.youtube.com/user/ostigov">YouTube channel</a>. OSTI has now posted two more videos, both featuring OSTI&#8217;s Director, Dr. Walt Warnick. They&#8217;re short and fun:</p>
<h3>I bet you didn&#8217;t know this about searching!</h3>
<div id="vvq4a521b83ac371" class="vvqbox vvqyoutube" style="width:425px;height:355px;">
<p><a href="http://www.youtube.com/watch?v=Nb6JZUUAuz8">http://www.youtube.com/watch?v=Nb6JZUUAuz8</a></p>
</div>
<h3>The thrill of working for OSTI</h3>
<div id="vvq4a521b83ac3c8" class="vvqbox vvqyoutube" style="width:425px;height:355px;">
<p><a href="http://www.youtube.com/watch?v=ZJN83V4Jiyg">http://www.youtube.com/watch?v=ZJN83V4Jiyg</a></p>
</div>
<p><a href="http://sharethis.com/item?&wp=2.7.1&amp;publisher=3cfadb1e-a1cd-4b64-bce1-33d1702fd1a6&amp;title=Noteworthy+news&amp;url=http%3A%2F%2Ffederatedsearchblog.com%2F2009%2F06%2F15%2Fnoteworthy-news%2F">ShareThis</a></p><img src="http://feeds.feedburner.com/~r/Federatedsearchblogcom/~4/_n6E_MzpS3A" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://federatedsearchblog.com/2009/06/15/noteworthy-news/feed/</wfw:commentRss>
		<feedburner:origLink>http://federatedsearchblog.com/2009/06/15/noteworthy-news/</feedburner:origLink></item>
		<item>
		<title>Daniel Tunkelang on the problem with federated search</title>
		<link>http://feedproxy.google.com/~r/Federatedsearchblogcom/~3/kwmK9cHHt0E/</link>
		<comments>http://federatedsearchblog.com/2009/06/12/daniel-tunkelang-on-the-problem-with-federated-search/#comments</comments>
		<pubDate>Fri, 12 Jun 2009 20:52:14 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
		
		<category><![CDATA[viewpoints]]></category>

		<category><![CDATA[federated search]]></category>

		<guid isPermaLink="false">http://federatedsearchblog.com/?p=598</guid>
		<description><![CDATA[[ Editor's Note: This is a guest article by Daniel Tunkelang. (See his bio below.) Daniel is passionate about designing search systems that improve users' experience with information retrieval. This passion comes across very strongly in his book about faceted search, which I recently reviewed.
This article addresses a limitation with federated search that could be [...]]]></description>
			<content:encoded><![CDATA[<p>[ Editor's Note: This is a guest article by Daniel Tunkelang. (See his bio below.) Daniel is passionate about designing search systems that improve users' experience with information retrieval. This passion comes across very strongly in his book about faceted search, which I recently <a href="http://federatedsearchblog.com/2009/05/14/review-faceted-search/">reviewed</a>.</p>
<p>This article addresses a limitation with federated search that could be removed if content sources provided specific metadata to federated search engines to improve relevance ranking. Good food for thought. ]</p>
<p><img align="right" width="104" height="111" src="http://federatedsearchblog.com/images/DTunkelang-100dpi.gif">Daniel Tunkelang is the Chief Scientist and a co-founder of <a href="http://www.endeca.com/">Endeca</a>, a leading vendor of search technology. Before joining Endeca’s founding team, he worked at the IBM T. J. Watson Research Center and AT&#038;T Bell Labs. Daniel pioneered the annual <a href="http://cuaslis.org/hcir2009/">workshops</a> on <a href="http://en.wikipedia.org/wiki/Human_Computer_Information_Retrieval">human-computer information retrieval</a> and recently published a <a href="http://thenoisychannel.com/faceted-search-the-book/">book</a> on faceted search. <a href="http://thenoisychannel.com/">He blogs at The Noisy Channel</a>.</p>
<p><span id="more-598"></span>
<p></p>
<h2>The problem with federated search</h2>
<p>The case for <a href="http://en.wikipedia.org/wiki/Federated_search">federated search</a> is straightforward: no single organization has all of the answers, and therefore no single index can ever hope to completely satisfy its users&#8217; needs. Federation allows the developer of a search application to hedge his or her bets by bringing in knowledge from outside resources.</p>
<p>But federation is no panacea, at least as it is implemented today. A federated search application brokers a query, sending it to multiple search providers (i.e., the search interfaces to a variety of content repositories), whose results it then attempts to assemble into a coherent whole. Unfortunately, since most search providers provide little more than the top-ranked result pages, federated search applications are largely reduced to assembling a unified ranking of those disparate result pages.</p>
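<p>As an illustration of &#8220;assembling a unified ranking,&#8221; here is a minimal Python sketch of one simple, commonly used approach: min-max normalizing each provider's scores before merging. This is an assumed technique chosen for illustration, not something the article prescribes, and the provider names and scores are made up:</p>

```python
def min_max(results):
    """Rescale one provider's scores to [0, 1] so providers are comparable."""
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in results]


def merge_ranked(provider_results):
    """Merge top-ranked pages from several providers into one ranking.

    A document returned by more than one provider keeps its best normalized
    score; ties are broken alphabetically so the output is deterministic.
    """
    best = {}
    for results in provider_results.values():
        if not results:
            continue
        for doc, score in min_max(results):
            best[doc] = max(score, best.get(doc, 0.0))
    return sorted(best, key=lambda d: (-best[d], d))


if __name__ == "__main__":
    pages = {
        "provider_1": [("d1", 12.0), ("d2", 8.0), ("d3", 4.0)],
        "provider_2": [("d4", 9.0), ("d2", 6.0), ("d5", 3.0)],
    }
    print(merge_ranked(pages))  # ['d1', 'd4', 'd2', 'd3', 'd5']
```

<p>The merge only ever sees the handful of documents each provider chose to return, which is exactly the limitation described above.</p>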
<p>This functionality is significant, and I do not mean to dismiss it. But it is not enough. In particular, this approach to federation necessarily assumes a lowest common denominator of search functionality&#8211;a consequence of the requirement to evenhandedly broker among a variety of search applications that vary in the richness of their <a href="http://en.wikipedia.org/wiki/Application_programming_interface">APIs</a>.</p>
<p>What I would like to see is federation of <a href="http://thenoisychannel.com/2008/08/24/set-retrieval-vs-ranked-retrieval/">set retrieval</a>, not just of ranked retrieval. At first glance, this aspiration may seem impractical, since the sets being combined are often too large to be aggregated at query time. I certainly don&#8217;t expect a federated search engine to dynamically aggregate gigabytes or even terabytes of documents to process each individual query!</p>
<p>Instead, we need search engines to support, as a standard API capability, the ability to return a summary of a set of search results. Faceted search has led the way in demonstrating the value of such summarization: it offers users a much richer overview of the search results than the users could hope to obtain from a handful of top-ranked results. For the same reason, it offers far more information to a federation broker.</p>
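<p>A minimal sketch of the idea, assuming every provider can return counts of facet values over its whole matching set and that all providers share the same facet vocabulary (an assumption the next paragraph calls into question). The function names and the <code>year</code> facet are hypothetical:</p>

```python
from collections import Counter


def summarize(docs, facet):
    """What a provider might return instead of (or alongside) ranked hits:
    counts of each facet value over the entire matching set."""
    return Counter(doc[facet] for doc in docs if facet in doc)


def merge_summaries(summaries):
    """A federator can add count summaries without seeing any documents."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total


if __name__ == "__main__":
    a = summarize([{"year": 2008}, {"year": 2009}, {"year": 2009}], "year")
    b = summarize([{"year": 2009}, {"title": "untagged"}], "year")
    print(merge_summaries([a, b]))  # Counter({2009: 3, 2008: 1})
```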
<p>Indeed, even <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> is not enough. It is unreasonable to expect all search applications to use the same faceted classification scheme, and thus federators find themselves confronting the infamous <a href="http://www.si.umich.edu/%7Efurnas/Papers/vocab.paper.pdf">vocabulary problem</a>&#8211;perhaps more familiar to practitioners as an aspect of <a href="http://en.wikipedia.org/wiki/Master_Data_Management">master data management</a>. How do we address this problem?</p>
<p>With yet more summarization. For example, a facet value corresponds to the set of documents assigned that value, and thus to a distribution of occurrence frequencies on the other facet values. In fact, we can take this idea further and summarize a facet value in terms of a distribution on the words and phrases used in documents assigned that value. Given access to these summarizations, a federator can at least make educated guesses to establish correspondences and relationships among the facets returned by different search applications.</p>
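<p>One naive way to act on such summaries, sketched in Python: describe each facet value by the word counts of its documents and pair values across two sources by cosine similarity. The sample data, the threshold, and the choice of cosine similarity are all illustrative assumptions, not a method from the article:</p>

```python
import math
from collections import Counter


def word_profile(texts):
    """Summarize a facet value as word counts over its documents."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def cosine(p, q):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def align_facets(facets_a, facets_b, threshold=0.3):
    """Guess which facet value in source B corresponds to each value in A.

    Assumes facets_b is non-empty; weak matches below `threshold` are dropped.
    """
    matches = {}
    for name_a, prof_a in facets_a.items():
        best = max(facets_b, key=lambda name_b: cosine(prof_a, facets_b[name_b]))
        if cosine(prof_a, facets_b[best]) >= threshold:
            matches[name_a] = best
    return matches


if __name__ == "__main__":
    source_a = {
        "Astrophysics": word_profile(["galaxy redshift survey", "dark matter halo galaxy"]),
        "Genetics": word_profile(["gene expression", "dna sequencing gene"]),
    }
    source_b = {
        "Space Science": word_profile(["galaxy cluster survey", "redshift of distant galaxy"]),
        "Molecular Biology": word_profile(["gene regulation dna", "protein expression"]),
    }
    print(align_facets(source_a, source_b))
```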
<p>Of course, there is far more work needed to make such an approach effective and efficient enough to be practical. Summarization is more computationally intensive than simply ranking results, and combining summarizations is more complex than simply harmonizing relevance scores. But these are the challenges that represent the best opportunity to make federation a successful strategy.</p>
<p><a href="http://sharethis.com/item?&wp=2.7.1&amp;publisher=3cfadb1e-a1cd-4b64-bce1-33d1702fd1a6&amp;title=Daniel+Tunkelang+on+the+problem+with+federated+search&amp;url=http%3A%2F%2Ffederatedsearchblog.com%2F2009%2F06%2F12%2Fdaniel-tunkelang-on-the-problem-with-federated-search%2F">ShareThis</a></p><img src="http://feeds.feedburner.com/~r/Federatedsearchblogcom/~4/kwmK9cHHt0E" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://federatedsearchblog.com/2009/06/12/daniel-tunkelang-on-the-problem-with-federated-search/feed/</wfw:commentRss>
		<feedburner:origLink>http://federatedsearchblog.com/2009/06/12/daniel-tunkelang-on-the-problem-with-federated-search/</feedburner:origLink></item>
		<item>
		<title>What is an API?</title>
		<link>http://feedproxy.google.com/~r/Federatedsearchblogcom/~3/MwtP6BUZceU/</link>
		<comments>http://federatedsearchblog.com/2009/06/11/what-is-an-api/#comments</comments>
		<pubDate>Thu, 11 Jun 2009 13:46:24 +0000</pubDate>
		<dc:creator>Sol</dc:creator>
		
		<category><![CDATA[technology]]></category>

		<category><![CDATA[federated search]]></category>

		<guid isPermaLink="false">http://federatedsearchblog.com/?p=592</guid>
		<description><![CDATA[Product manager for Blogs.com and lead for their blogger training, Andy Wibbels, wrote an outstanding blog article, &#8220;What is an API?&#8221; As a programmer I know what an API is but I have a hard time explaining the concept to non-programmers. Now, Andy has done the explaining for me.
Andy&#8217;s article does a nice job of [...]]]></description>
			<content:encoded><![CDATA[<p>Product manager for Blogs.com and lead for their blogger training, Andy Wibbels, wrote an outstanding blog article, &#8220;<a href="http://andywibbels.com/2009/06/what-is-an-api/">What is an API?</a>&#8221; As a programmer, I know what an API is, but I have a hard time explaining the concept to non-programmers. Now, Andy has done the explaining for me.</p>
<p>Andy&#8217;s article does a nice job of explaining without overwhelming, and his short introduction skillfully avoids going into more detail than most people want. If you&#8217;ve ever wanted to explain (or understand) the connection between mashups and APIs, or how Twitter&#8217;s massive and rapid success can be attributed to embracing APIs, then this is the article for you.</p>
<p><span id="more-592"></span>APIs and Web 2.0 are intimately related. Tim O&#8217;Reilly, who popularized the term &#8220;Web 2.0,&#8221; describes Web 2.0 as “the design of systems that harness network effects to get better the more people use them.” (See <a href="http://federatedsearchblog.com/2009/01/12/review-programming-collective-intelligence/">this article</a> for a reference to this quote.) The proliferation of APIs, which make it easy to mash up services, powers Web 2.0 and enables the rich interaction that lets those services grow, evolve, and improve.</p>
<p>APIs are at the heart of a number of technologies I&#8217;ve blogged about. Here are a few examples:</p>
<ol>
<li><a href="http://federatedsearchblog.com/2009/06/01/query-free-federated-search/">&#8220;Query-free&#8221; federated search</a>. This prototype software helps writers find relevant content and references by watching what they type. It uses APIs both to identify the right search terms and to search resources.</li>
<li>In <a href="http://federatedsearchblog.com/2009/05/27/a-new-paradigm-for-federated-search/">A new paradigm for federated search</a>, venture capitalist Steve Jurvetson&#8217;s distributed search paradigm relies on APIs to report which pages users are browsing and to retrieve relevance information from the &#8220;Collective&#8221; database populated by other users.</li>
<li><a href="http://federatedsearchblog.com/2008/09/19/federated-search-and-information-in-context/">Federated search and information in context</a> describes new features in release 5.0 of the Science.gov federated search application: clustering, new content sources, and a mashup of Wikipedia and EurekAlert results. Each of these features relies on an API in one way or another.</li>
</ol>
<p>APIs are absolutely everywhere! Federated search relies heavily on them. Searching sources, especially those that don&#8217;t have to be <a href="http://federatedsearchblog.com/2007/12/27/content-access-basics-part-i-screen-scraping/">screen-scraped</a>, uses APIs that specify how to search and how to retrieve results. <a href="http://federatedsearchblog.com/2008/04/07/the-interplay-between-ajax-and-federated-search/">AJAX</a> relies on a browser API (XMLHttpRequest) that lets a web page request data from a server in the background and update part of the page without reloading it. Many modern web applications, not just federated search tools, use AJAX. And if we think of APIs simply as protocols that let two programs communicate, then even your humble web browser uses an API to reach your favorite federated search applications: it talks to web servers using a standard called HTTP.</p>
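<p>The following minimal Python sketch shows a federator talking to search APIs over HTTP, which illustrates what it means for an API to specify how to search and how to retrieve results. The endpoints, parameter names, and JSON response shape are all hypothetical; each real source documents its own. The <code>fetch</code> function is passed in so that the sketch runs without a network connection. In a live system it might wrap <code>urllib.request.urlopen</code>.</p>

```python
import json
from urllib.parse import urlencode

# Hypothetical per-source API descriptions: each search API names its own
# query parameter, so the federator keeps a small template for each source.
SOURCES = {
    "source_a": {"base": "https://api.example.org/search", "query_param": "q"},
    "source_b": {"base": "https://search.example.com/v1", "query_param": "query"},
}


def build_request_url(source, terms, max_results=10):
    """Translate the user's query into one source's API request (an HTTP GET URL)."""
    config = SOURCES[source]
    params = {config["query_param"]: terms, "limit": max_results}
    return config["base"] + "?" + urlencode(params)


def parse_response(body):
    """Extract (title, url) pairs from a JSON body with a hypothetical "results" list."""
    data = json.loads(body)
    return [(item["title"], item["url"]) for item in data.get("results", [])]


def federated_search(terms, fetch):
    """Query every source through its API and merge the parsed results.

    `fetch` takes a URL and returns the response body as a string.
    """
    merged = []
    for source in SOURCES:
        body = fetch(build_request_url(source, terms))
        merged.extend((source, title, url) for title, url in parse_response(body))
    return merged


print(build_request_url("source_a", "deep web"))
# prints https://api.example.org/search?q=deep+web&limit=10
```

<p>Sources that offer no such API are the ones that have to be screen-scraped. In that case, the <code>parse_response</code> step would have to pull results out of HTML pages meant for people rather than structured data meant for programs.</p>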
<p>Can you think of other places where APIs are used with federated search? Leave a comment and let us all know.</p>
]]></content:encoded>
			<wfw:commentRss>http://federatedsearchblog.com/2009/06/11/what-is-an-api/feed/</wfw:commentRss>
		<feedburner:origLink>http://federatedsearchblog.com/2009/06/11/what-is-an-api/</feedburner:origLink></item>
		<item>
		<title>Journey to 10,000 sources at SLA 2009</title>
		<link>http://feedproxy.google.com/~r/Federatedsearchblogcom/~3/KkYEwbewE_k/</link>
		<comments>http://federatedsearchblog.com/2009/06/08/journey-to-10000-sources-at-sla-2009/#comments</comments>
		<pubDate>Tue, 09 Jun 2009 02:30:17 +0000</pubDate>
		<dc:creator>Sol</dc:creator>
		
		<category><![CDATA[conferences/shows]]></category>

		<category><![CDATA[papers]]></category>

		<guid isPermaLink="false">http://federatedsearchblog.com/?p=572</guid>
		<description><![CDATA[The SLA 2009 Annual Conference is fast approaching. Blog sponsor Deep Web Technologies will have a booth at the conference. If you&#8217;re going to be attending they&#8217;d welcome your visit. Abe Lederman, founder and President of Deep Web Technologies, will be a presenter. His talk and contributed paper are titled &#8220;Science Research: Journey to Ten [...]]]></description>
			<content:encoded><![CDATA[<p>The SLA 2009 Annual Conference is fast approaching. Blog sponsor <a href="http://deepwebtech.com">Deep Web Technologies</a> will have a booth at the conference. If you&#8217;re attending, they&#8217;d welcome your visit. <a href="http://federatedsearchblog.com/about/#Abe">Abe Lederman</a>, founder and President of Deep Web Technologies, will be a presenter. His talk and <a href="http://www.sla.org/pdfs/sla2009/SciResJourneyTenThouSources.pdf">contributed paper</a> are titled &#8220;Science Research: Journey to Ten Thousand Sources.&#8221; The talk will be on Monday (June 15) from 3:30 to 5:00 as part of the &#8220;Adapt, Leverage and Communicate (Part I)&#8221; contributed paper session.</p>
<p><span id="more-572"></span>The paper and talk consider what it will take to build federated search engines that simultaneously search first one thousand and, eventually, many thousands of sources. Here is the first paragraph of the paper&#8217;s abstract:</p>
<blockquote><p>
The future of scientific research depends on sifting through more information, more quickly, and more effectively. For a researcher to expect to be able to search 1,000 databases simultaneously for critical information is not unreasonable. While parallel search is the domain of federated search, the current paradigm has severe limitations. The limitations, which include speed, relevance ranking, and selecting the appropriate sources, become painfully obvious when one attempts to search more than a few dozen sources simultaneously. A paradigm is needed for scalability to not only overcome the limitations but also help us assimilate important information.
</p></blockquote>
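<p>The speed limitation shows up even in the simplest form of parallel search: a federated search cannot respond faster than its slowest source unless it stops waiting. The Python sketch below queries every source at once on a thread pool and keeps only the results that arrive before a deadline. It is not the approach from Abe&#8217;s paper, only the basic pattern whose limits the paper discusses. The <code>fake_source</code> helper, the source names, and the timings are hypothetical stand-ins for remote databases.</p>

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fake_source(name, delay):
    """Hypothetical stand-in for a remote source that takes `delay` seconds to answer."""
    def search(query):
        time.sleep(delay)
        return [name + ": result for " + query]
    return search


def parallel_search(sources, query, deadline=1.0):
    """Query all sources concurrently and keep whatever finishes by the deadline.

    Returns (results, names of sources that missed the deadline or failed).
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(search, query): name for name, search in sources.items()}
    done, not_done = wait(futures, timeout=deadline)
    results, missed = [], [futures[f] for f in not_done]
    for future in done:
        try:
            results.extend(future.result())
        except Exception:
            missed.append(futures[future])  # a failing source should not sink the search
    # Return without blocking on slow sources; their threads finish in the background.
    pool.shutdown(wait=False)
    return sorted(results), sorted(missed)


sources = {"fast": fake_source("fast", 0.05), "slow": fake_source("slow", 0.5)}
print(parallel_search(sources, "genomics", deadline=0.2))
# typically prints (['fast: result for genomics'], ['slow'])
```

<p>The deadline trades completeness for speed. With thousands of sources, the other limitations the abstract names still remain: merging relevance rankings from sources that score results differently, and choosing which sources to query at all.</p>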
<p>Check out all the <a href="http://www.sla.org/content/Events/conference/ac2009/Conference/contibpap.cfm">contributed paper abstracts</a> and come to Abe&#8217;s talk. And visit <a href="https://www.sla.org/veh09/">the exhibitors</a>, which include a few federated search vendors.</p>
]]></content:encoded>
			<wfw:commentRss>http://federatedsearchblog.com/2009/06/08/journey-to-10000-sources-at-sla-2009/feed/</wfw:commentRss>
		<feedburner:origLink>http://federatedsearchblog.com/2009/06/08/journey-to-10000-sources-at-sla-2009/</feedburner:origLink></item>
	</channel>
</rss>
