<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/atom10full.xsl" type="text/xsl" media="screen"?><?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/itemcontent.css" type="text/css" media="screen"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0"><id>tag:blogger.com,1999:blog-21224994</id><updated>2008-07-24T11:25:07.373-07:00</updated><title type="text">Google Research Blog</title><link rel="alternate" type="text/html" href="http://googleresearch.blogspot.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default?start-index=26&amp;max-results=25&amp;redirect=false" /><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/posts/default" /><author><name>Eric Case</name><email>noreply@blogger.com</email></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>30</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><subtitle type="html">The latest news on Google Research</subtitle><link rel="self" href="http://feeds.feedburner.com/blogspot/gJZg" type="application/atom+xml" /><entry><id>tag:blogger.com,1999:blog-21224994.post-1751107742026811701</id><published>2008-05-20T17:32:00.000-07:00</published><updated>2008-05-22T08:00:49.347-07:00</updated><title type="text">Machine Learning Meeting</title><content type="html">&lt;span class="byline-author"&gt;Posted by &lt;a href="http://www.phillong.info/"&gt;Phil Long&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span"   style="border-collapse: collapse; color: rgb(52, 52, 52);   font-family:arial;font-size:13px;"&gt;&lt;div class="Ih2E3d" style="color: rgb(80, 0, 80); "&gt;&lt;/div&gt;&lt;span&gt;&lt;span&gt;&lt;a 
href="http://en.wikipedia.org/wiki/Machine_learning"&gt;Machine Learning&lt;/a&gt; is a branch of &lt;a href="http://en.wikipedia.org/wiki/Artificial_intelligence"&gt;Artificial Intelligence&lt;/a&gt; in which, naturally enough, the aim is to get computers to learn: things like improving performance over time, and recognizing general tendencies among a number of specific cases.  We have many ways to exploit Machine Learning programs, and a lot of data to give them.  Machine Learning helps us to estimate what content users like most, what content is even legitimate, and how to match ads to content.  It also plays key roles in products such as &lt;a href="http://translate.google.com/"&gt;Google Translate&lt;/a&gt; and &lt;a href="http://www.google.com/goog411/"&gt;1-800-GOOG-411&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Last week, Google held a Machine Learning Summit in &lt;a href="http://www.google.com/support/jobs/bin/static.py?page=why-ny-ny.html"&gt;New York&lt;/a&gt;, gathering together engineers from around the world, including participants from Beijing, Zurich, Boston, Haifa, and Mountain View. The program included invited talks by &lt;a href="http://www.csie.ntu.edu.tw/~cjlin/"&gt;CJ Lin&lt;/a&gt; and &lt;a href="http://www.clsp.jhu.edu/people/xp/"&gt;Peng Xu&lt;/a&gt;, and many shorter presentations by Googlers on topics including broadly applicable Machine Learning software, especially challenging applications, and some mathematical analysis&lt;a href="http://youtube.com/watch?v=eBGIQ7ZuuiU"&gt;.&lt;/a&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="Ih2E3d" style="color: rgb(80, 0, 80); "&gt;&lt;/div&gt;&lt;/span&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/295212610" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/295212610/teaching-machine-learning.html" title="Machine Learning Meeting" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=1751107742026811701" title="11 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/1751107742026811701/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/1751107742026811701" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/1751107742026811701" /><author><name>Research Admin</name><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2008/05/teaching-machine-learning.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-4464853286790942186</id><published>2008-05-06T15:10:00.000-07:00</published><updated>2008-05-12T17:24:52.656-07:00</updated><title type="text">Can You Publish at Google?</title><content type="html">&lt;span class="byline-author"&gt;Posted by &lt;a href="http://en.wikipedia.org/wiki/Rich_gossweiler"&gt;Rich Gossweiler&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span"   style="border-collapse: collapse; color: rgb(52, 52, 52);   font-family:arial;font-size:13px;"&gt;&lt;div&gt;As part of the &lt;a href="http://www.google.com/intl/en/jobs/index.html"&gt;interview process&lt;/a&gt; at Google we try to describe what it is like to do &lt;a href="http://research.google.com/"&gt;research&lt;/a&gt; here. A common question I get is "How hard is it to publish at Google?" I want to dispel the myth that it is hard. 
&lt;a href="http://research.google.com/pubs/papers.html"&gt;It is easy to publish&lt;/a&gt;, easy to put code into &lt;a href="http://code.google.com/hosting/projects.html"&gt;open source&lt;/a&gt;, easy to give &lt;a href="http://www.youtube.com/user/googletechtalks"&gt;talks&lt;/a&gt;, etc. But it is also easy for great research to become great engineering, and that is an incredible lure.  Great barriers of despair exist between research and development at many companies; researchers can find it hard to have impact beyond demos, papers, or patents.&lt;br /&gt;&lt;br /&gt;Here at Google it is not uncommon for researchers to work on products (it is my own modus operandi); you can come up with something interesting, &lt;a href="http://labs.google.com/"&gt;experiment &lt;/a&gt;to convince yourself and others that it is worthwhile, and then work as part of the team to build it and help create world-changing products.  But are you willing to do that?  That decision is the hard part, not publishing a paper.&lt;br /&gt;&lt;br /&gt;I think from a Google standpoint, we need to make sure these barriers don't form, and that making products, experimenting, and having a venue for trying bold new approaches continue to be part of the &lt;a href="http://www.google.com/corporate/tenthings.html"&gt;culture&lt;/a&gt;&lt;a href="http://youtube.com/watch?v=eBGIQ7ZuuiU"&gt;.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;/span&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/289054431" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/289054431/can-you-publish-at-google.html" title="Can You Publish at Google?" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=4464853286790942186" title="1 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/4464853286790942186/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/4464853286790942186" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/4464853286790942186" /><author><name>Research Admin</name><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2008/05/can-you-publish-at-google.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-7866471163522013754</id><published>2008-05-01T13:30:00.000-07:00</published><updated>2008-05-01T13:37:19.758-07:00</updated><title type="text">VisualRank</title><content type="html">&lt;span class="byline-author"&gt;Posted by Shumeet Baluja and Yushi Jing&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;At WWW-2008, in Beijing, China, we presented our paper "&lt;a href="http://www.esprockets.com/papers/www2008-jing-baluja.pdf"&gt;PageRank for Product Image Search&lt;/a&gt;".   In this paper, we presented a system that used visual cues, instead of solely text information, to determine the rank of images.  The idea was simple: find common visual themes in a set of images, and then find a small set of images that best represented those themes.  The resulting algorithm wound up being PageRank, but on an entirely inferred graph of image similarities.  
Since the release of the paper, we've noticed lots of &lt;a href="http://www.nytimes.com/2008/04/28/technology/28google.html"&gt;coverage&lt;/a&gt; in the press and have received quite a few questions.  We thought we could answer a few of them here.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;"Why did we choose to use products for our test case?"&lt;/span&gt;  First and foremost, product queries are popular in actual usage; addressing them is important.  Second, users have strong expectations of what results we should return for these queries; therefore, this category provides an important set of examples that we need to address especially carefully.   Third, on a pragmatic note, they lend themselves well to the type of "image features" that we selected in this study.  Since the publication of the paper, we've also extended our results to other query types, including travel-related queries.    One of the nice features of the approach is that (we hope) it will be easy to extend to new domains; as research in measuring image or object similarity continues, the advances can easily be incorporated into the similarity calculation to compute the underlying graph; the computations on the graph do not change.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;"Where are we going from here?"&lt;/span&gt;  Besides broadening the sets of queries (and sets of features) for which we can use this approach, there are three directions we're exploring.   First, estimating similarity measures for all of the images on the web is computationally expensive; approximations or alternative computations are needed.    Second, we hope to evaluate our approach with respect to the large number of recently proposed alternative clustering methods.  Third, many variations of PageRank can be used in quite interesting ways for image search.  
For example, we can use some of these previously published methods to reintroduce, in a meaningful manner, the textual information that the VisualRank algorithm removed.   In the end, we have an approach that integrates easily with both text and visual cues.  Stay tuned for more on that in the coming months. &lt;br /&gt;&lt;br /&gt;And now to answer the most commonly asked question, &lt;span style="font-weight:bold;"&gt;"Is it live?"&lt;/span&gt;   Not yet. Currently, it is research in progress (&lt;a href="http://www.google.com/intl/en/jobs/index.html"&gt;click here to help speed up the process&lt;/a&gt;).   In the meantime, though, if you'd like another sneak peek at our research on large graphs, this time in the context of YouTube data mining, &lt;a href="http://www.esprockets.com/papers/adsorption-yt.pdf"&gt;just follow the link&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Finally, we want to extend our deepest thanks to the people who helped on this project, especially the image-search team; without their help, this research would not have been possible.&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/281670173" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/281670173/visualrank.html" title="VisualRank" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=7866471163522013754" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/7866471163522013754/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/7866471163522013754" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/7866471163522013754" /><author><name>Kevin McCurley</name><uri>http://www.blogger.com/profile/05232657898652200217</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2008/05/visualrank.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-7179523526847802819</id><published>2008-04-23T14:13:00.000-07:00</published><updated>2008-04-23T14:23:54.658-07:00</updated><title type="text">Research in the Cloud: Providing Cutting Edge Computational Resources to Scientists</title><content type="html">&lt;span class="byline-author"&gt;Posted by Christophe Bisciglia, Senior Software Engineer, and Alfred Spector, Vice President of Research&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The emergence of extremely large datasets, well beyond the capacity of almost any single computer, has challenged traditional and contemporary methods of analysis in the research world.  
While a simple spreadsheet or modest database remains sufficient for some research, problems in the domain of  "computational science," which explores mathematical models via computational simulation, require systems that provide huge amounts of data storage and computer processing (current research areas in computational science include climate modeling, gene sequencing, protein mapping, materials science and many more).  As an added hurdle, this level of computational infrastructure is often not affordable to research teams, who usually work with significant budgetary restrictions.&lt;br /&gt;&lt;br /&gt;Fortunately, as the Internet technology industry expands its global infrastructure, accessing world class distributed computational and storage resources can be as simple as visiting a website.  Building on its &lt;a href="http://googleblog.blogspot.com/2007/10/let-thousand-servers-bloom.html"&gt;Academic Cloud Computing Initiative&lt;/a&gt; (ACCI) announced last October, Google and IBM, with the National Science Foundation, announced in February the &lt;a href="http://googleblog.blogspot.com/2008/02/supporting-cluster-computing-in.html"&gt;CluE initiative&lt;/a&gt; to address this particular need.  After coordinating the technical details with Google and IBM,  the NSF posted the official &lt;a href="http://www.nsf.gov/cise/clue/index.jsp"&gt;solicitation&lt;/a&gt; of proposals last week. &lt;br /&gt;&lt;br /&gt;Our primary goal in participating in the CluE initiative is to encourage the understanding, further refinement and --importantly-- targeted application of the latest distributed computing technology and methods across many academic disciplines.  
Engaging educators and researchers with the new potential of distributed computing for processing and analyzing extremely large datasets is an invaluable investment for any technology company to make, and Google in particular is pleased to make a contribution to the academic community that has enabled so many recent advances in the industry.  &lt;br /&gt;&lt;br /&gt;We're looking forward to an eclectic collection of proposals from the NSF's solicitation.  We believe many will leverage the power of distributed computing to produce a diverse range of knowledge that will provide long term benefit to both the research community and the public at large.  We also hope that Google's contribution to this low cost, open source approach to distributed computing will allow many more in the academic community to take advantage of this pervasive technological shift.&lt;br /&gt;&lt;br /&gt;More details, including information on how to apply for access to these resources, is available on the &lt;a href="http://www.nsf.gov/news/news_summ.jsp?cntn_id=111470&amp;org=NSF&amp;from=news"&gt;NSF site&lt;/a&gt;.&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/276441468" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/276441468/research-in-cloud-providing-cutting.html" title="Research in the Cloud: Providing Cutting Edge Computational Resources to Scientists" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=7179523526847802819" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/7179523526847802819/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/7179523526847802819" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/7179523526847802819" /><author><name>Kevin McCurley</name><uri>http://www.blogger.com/profile/05232657898652200217</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2008/04/research-in-cloud-providing-cutting.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-2088148321221400987</id><published>2008-03-28T15:34:00.000-07:00</published><updated>2008-03-31T14:43:41.196-07:00</updated><title type="text">Deploying Goog411</title><content type="html">&lt;span class="byline-author"&gt;Posted by Francoise Beaufays&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A couple of years ago, a few of us got together and decided to build Goog411. It would be a free phone service that users could call to connect to any business in the US, or simply to browse through a list of businesses such as "bookstores" in a given city. 
Everything would be fully automated, with no operator in the background, just a speech recognition system to converse with the user, and Google Maps to execute the business search.&lt;br /&gt;&lt;br /&gt;We knew that speech recognition is not a solved problem; there would be users for whom the system wouldn't work well, and queries that would be harder to recognize than others. But hosting the service gave us big advantages: we could iterate as often as we wanted on any component of the system, we'd have access to all the data, and we could measure whatever seemed relevant to callers. So we built Goog411, started taking some traffic, defined metrics, and iterated many, many times.&lt;br /&gt;&lt;br /&gt;We learned a few interesting things in the process (see &lt;a href="http://research.google.com/archive/goog411.pdf"&gt;our ICASSP paper&lt;/a&gt;). For example, we discovered that databases with lists of business names are almost useless for training a language model for how users answer the question "What business name or category?"; aggregated web query logs from Google Maps yield far better performance. And we found the speech data we collect through our own service is almost as useful for modeling new queries as the web data, even though we have orders of magnitude less of it. After all, you may type "real estate" in Google Maps to glance at a few properties, but would you ask for it over the phone while driving your car?&lt;br /&gt;&lt;br /&gt;Today Goog411 has grown from an experiment into a product, and we're working on expanding the service to Canada. 
As calls flow through the system, our focus is still on making the best use of the growing volume of data, defining metrics that best correlate with the user's experience, and taking advantage of the computing resources and data sources available within Google.&lt;br /&gt;&lt;br /&gt;Maybe our most rewarding experience so far has been to see our traffic grow, and to see repeat callers succeed more and more often with the system. Have you tried it already? Just call 1-800-GOOG-411, and don't hesitate to send us &lt;a href="http://groups.google.com/group/goog411"&gt;feedback&lt;/a&gt;!&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/261542044" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/261542044/deploying-goog411.html" title="Deploying Goog411" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=2088148321221400987" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/2088148321221400987/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/2088148321221400987" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/2088148321221400987" /><author><name>Kevin McCurley</name><uri>http://www.blogger.com/profile/05232657898652200217</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2008/03/deploying-goog411.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-7904601348637320999</id><published>2008-02-11T11:53:00.000-08:00</published><updated>2008-02-11T15:09:27.051-08:00</updated><title type="text">This year's scalability conference</title><content type="html">&lt;span class="byline-author"&gt;Posted by Andrew Schwerin, Software Engineer&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Managing huge repositories of data and large clusters of machines is no easy task -- and building systems that use those clusters to usefully process that data is even harder. Last year, we held a conference on scalable systems so a bunch of people who work on these challenges could get together and share ideas. Well, it was so much fun that we've decided to do it again.&lt;br /&gt;&lt;br /&gt;This year, the conference is taking place in Seattle on Saturday, June 14. (Registration is free.) 
If you'd like to talk about a topic on scalable or large-scale systems that is near and dear to your heart, we'd love to hear from you. Potential topics include:&lt;br /&gt;&lt;br /&gt;Development, deployment and production:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;    Systems, environments and languages for building, deploying and debugging complex datacenter-scale apps, or for allowing teams of collaborating engineers to work together on such apps more effectively&lt;/li&gt;&lt;/ul&gt;Mobile applications:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;    Unique challenges of scaling services for mobile devices&lt;/li&gt;&lt;li&gt;Location-aware scaling techniques&lt;/li&gt;&lt;li&gt;Experiences designing scalable apps involving mobile devices&lt;/li&gt;&lt;/ul&gt;Of course, you've probably got more ideas. Send a 500-word abstract of your 30-minute presentation to scalabilityconf@google.com no later than Friday, April 11, and we'll post registration details in the next couple of months.&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/233383938" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/233383938/this-years-scalability-conference.html" title="This year's scalability conference" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=7904601348637320999" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/7904601348637320999/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/7904601348637320999" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/7904601348637320999" /><author><name>Kevin McCurley</name><uri>http://www.blogger.com/profile/05232657898652200217</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2008/02/this-years-scalability-conference.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-753636001906003049</id><published>2007-10-18T17:31:00.000-07:00</published><updated>2007-10-26T12:03:58.979-07:00</updated><title type="text">Google Education Summit</title><content type="html">&lt;span class="byline-author"&gt;Posted by Jeff Walz and Kevin McCurley&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The world's research and educational infrastructures are tightly intertwined.  Research universities enable students to participate in research activities, and research contributes to the vitality of the educational experience.  At Google, we also recognize the importance of education to our research and engineering activities.  In addition to our own in-house activities, we maintain strong ties to academic institutions through visiting faculty programs and summer internships.  
In recognition of the importance of education to Google's mission, we also recently organized a &lt;a href="http://research.google.com/university/relations/eduSummit2007/"&gt;Google Education Summit&lt;/a&gt;.  Mehran Sahami has more to say about this in a &lt;a href="http://googleblog.blogspot.com/2007/10/about-google-education-summit.html"&gt;recent blog post&lt;/a&gt;.&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/175497366" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/175497366/google-education-summit.html" title="Google Education Summit" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=753636001906003049" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/753636001906003049/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/753636001906003049" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/753636001906003049" /><author><name>Kevin McCurley</name><uri>http://www.blogger.com/profile/05232657898652200217</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2007/10/google-education-summit.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-9127077813767484192</id><published>2007-09-23T14:01:00.000-07:00</published><updated>2007-09-23T15:21:00.037-07:00</updated><title type="text">OpenHTMM Released</title><content type="html">&lt;span class="byline-author"&gt;Posted by Ashok C. Popat, Research Scientist&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Statistical methods of text analysis have become increasingly sophisticated over the years.  A good example is automated topic analysis using latent models, two variants of which are &lt;a href="http://en.wikipedia.org/wiki/PLSI"&gt;Probabilistic latent semantic analysis&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Latent_Dirichlet_Allocation"&gt;Latent Dirichlet Allocation&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Earlier this year, &lt;a href="http://www.cs.huji.ac.il/~amitg/"&gt;Amit Gruber&lt;/a&gt;, a Ph.D. 
student at the Hebrew University of Jerusalem, presented a technique for analyzing the topical content of text at the &lt;a href="http://www.stat.umn.edu/~aistat/"&gt;Eleventh International Conference on Artificial Intelligence and Statistics&lt;/a&gt; in Puerto Rico.&lt;br /&gt;&lt;br /&gt;Gruber's approach, dubbed &lt;a href="http://www.cs.huji.ac.il/~amitg/aistats07.pdf"&gt;Hidden Topic Markov Models (HTMM)&lt;/a&gt;, was developed in collaboration with &lt;a href="http://domino.research.ibm.com/comm/research_people.nsf/pages/rosenzvi.index.html"&gt;Michal Rosen-Zvi&lt;/a&gt; and &lt;a href="http://www.cs.huji.ac.il/~yweiss/"&gt;Yair Weiss&lt;/a&gt;.  It differs notably from other approaches in that, rather than treating each document as a single "bag of words," it imposes a temporal Markov structure on the document.  In this way, it can account for shifting topics within a document and, in so doing, provides a topic segmentation of the document. It also seems to distinguish effectively among the multiple senses that the same word may have in different contexts within the same document.&lt;br /&gt;&lt;br /&gt;Amit is currently doing a graduate internship at Google.  As part of his project, he has developed a fresh implementation of his method in C++. We are pleased to release it as the &lt;a href="http://code.google.com/p/openhtmm/"&gt;OpenHTMM&lt;/a&gt; package to the research community under the Apache 2 license, in the hopes that it will be of general interest and facilitate further research in this area.&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/160357004" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/160357004/openhtmm-released.html" title="OpenHTMM Released" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=9127077813767484192" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/9127077813767484192/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/9127077813767484192" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/9127077813767484192" /><author><name>Kevin McCurley</name><uri>http://www.blogger.com/profile/05232657898652200217</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2007/09/openhtmm-released.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-5235054919803637191</id><published>2007-09-19T16:16:00.000-07:00</published><updated>2007-09-19T23:27:53.534-07:00</updated><title type="text">The Sky is Open</title><content type="html">&lt;span class="byline-author"&gt;Posted by Jeremy Brewer&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We've gotten an incredible amount of positive feedback about &lt;a href="http://googleresearch.blogspot.com/2007/08/introducing-sky-in-google-earth.html"&gt;Sky in Google Earth&lt;/a&gt;, which lets Google Earth users explore the sky above them with hundreds of millions of stars and galaxies taken from astronomy imagery.&lt;br /&gt;&lt;br /&gt;From the start though, we have wanted to open the sky up to everyone.  
As a first step, we've been hard at work developing tools to let astronomers add their own imagery, and we think we've come up with something that does the job nicely. We're pleased to announce the availability of &lt;a href="http://code.google.com/p/wcs2kml/"&gt;wcs2kml&lt;/a&gt;, an open source project for importing astronomical imagery into Sky.&lt;br /&gt;&lt;br /&gt;Modern telescopes output imagery in the FITS binary format that contains a set of headers known as a World Coordinate System (that's the "wcs" part) specifying the location of the image on the sky.  Wcs2kml handles the task of transforming this imagery into the projection system used by Google Earth (the "kml" part) so that it can be viewed directly in Sky. Wcs2kml also includes tools to simplify uploading this data to a web server and sharing it with friends.&lt;br /&gt;&lt;br /&gt;We were astounded at the imagery and novel applications people created when we opened the Google Earth API to our users. Now, by opening Sky in Google Earth to the astronomy community, we hope to open a floodgate of new imagery for Sky!&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/158882358" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/158882358/sky-is-open.html" title="The Sky is Open" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=5235054919803637191" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/5235054919803637191/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/5235054919803637191" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/5235054919803637191" /><author><name>Kevin McCurley</name><uri>http://www.blogger.com/profile/05232657898652200217</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2007/09/sky-is-open.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-2585924616580120145</id><published>2007-08-21T19:46:00.000-07:00</published><updated>2007-08-22T12:31:17.395-07:00</updated><title type="text">Introducing Sky in Google Earth</title><content type="html">&lt;span class="byline-author"&gt;Posted by Andy Connolly and Ryan Scranton&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;At Google we are always interested in creating new ways to share ideas and information and applying these techniques to different research fields. Astronomy provides a great opportunity with an abundance of images and information that are accessible to researchers and indeed, anyone with an interest in the stars.  With the release of the &lt;a href="http://earth.google.com/"&gt;Google Earth 4.2 client&lt;/a&gt; the new Sky feature acts as a virtual telescope that provides a view of some of the most detailed images ever taken of the night sky. 
By clicking on the Sky button, you can explore the universe, seamlessly zooming from the familiar views of the constellations and stars, to the deepest images ever taken of galaxies and more. From planets moving across the sky to supernovae exploding in distant galaxies, Sky provides a view of a dynamic universe that we hope you will enjoy.&lt;br /&gt;&lt;br /&gt;In addition to allowing educators, amateurs or anyone with an interest in space to visually explore the sky, one of the most exciting aspects of Sky is its capability for research and discovery in astronomy. With the latest features in KML you can connect astronomical image and catalog databases directly to the visualization capabilities of Sky (e.g., searching the Sloan Digital Sky Survey database for the highest redshift quasars or correlating the infrared and optical sky to detect the presence of dust within our Galaxy). From releasing new data about the latest discovery of planets around nearby stars to identifying the host galaxy of a gamma ray burst, the possibilities are endless. Examples of how to build research applications such as a view of the microwave background emission from the remnant of the Big Bang can be found in the &lt;a href="http://earth.google.com/gallery"&gt; Google Earth Gallery&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;It has been a lot of fun creating Google's first astronomical observatory. Go check it out and explore the sky from the comfort of your home. If you find something interesting, let us know on the &lt;a href="http://bbs.keyhole.com/ubb/postlist.php/Cat/0/Board/sky/"&gt; Sky section&lt;/a&gt; of the Google Earth Community, or &lt;a href="http://code.google.com/apis/kml/documentation/"&gt; author your own &lt;/a&gt; KML applications to share your discoveries and data with everyone else. You can also find more Sky resources on our &lt;a href="http://earth.google.com/sky/skyedu"&gt; website.&lt;/a&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/147034293" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/147034293/introducing-sky-in-google-earth.html" title="Introducing Sky in Google Earth" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=2585924616580120145" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/2585924616580120145/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/2585924616580120145" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/2585924616580120145" /><author><name>Kevin McCurley</name><uri>http://www.blogger.com/profile/05232657898652200217</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2007/08/introducing-sky-in-google-earth.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-3313596925024274627</id><published>2007-07-26T11:27:00.000-07:00</published><updated>2007-07-26T12:18:39.785-07:00</updated><title type="text">Drink from the firehose with University Research Programs</title><content type="html">&lt;span class="byline-author"&gt;Posted by Michael Lancaster and Josh Estelle, Software Engineers&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Whenever we talk to university researchers, we hear a consistent message: they wish they had Google infrastructure. In pursuit of our &lt;a href="http://www.google.com/intl/en/corporate/index.html"&gt;company mission&lt;/a&gt;, we have built an elaborate set of systems for collecting, organizing, and analyzing information about the web. Operating and maintaining such an infrastructure is a high barrier to entry for many researchers. 
We recognize this and want to share some of the fruits of our labor with the research community. Today, in conjunction with the &lt;a href="http://www.google.com/events/facultysummit2007/agenda.html"&gt;Google Faculty Summit&lt;/a&gt; we're making two services available under the new University Research Programs, namely access to web search, and machine translation.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://research.google.com/university/search/"&gt;University Research Program for Google Search&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Google is focused on the success of the web, which is essentially an organism in and of itself with extremely complex contents and an ever-evolving structure. The primary goal of the University Research Program for Google Search is to promote research that creates a greater understanding of the web. We want to make it easy for researchers to analyze millions of queries in a reasonably short amount of time. We feel that such research can benefit everyone. As such, we've added a proviso that all research produced through this program must be published in a freely accessible manner.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://research.google.com/university/translate/"&gt;University Research Program for Google Translate&lt;br /&gt;&lt;/a&gt;&lt;br /&gt;The web is a global information medium with content from many cultures and languages. In order to break the language barrier, many researchers are hard at work building high quality, automatic, machine translation systems. We've been successful with our own statistical machine translation system, and are now happy to provide researchers greater access to it. The University Research Program for Google Translate provides researchers access to translations, including detailed word alignment information and lists of the n-best translations with detailed scoring information. 
We hope this program will be a terrific resource to help further the state of the art in automatic machine translation.&lt;br /&gt;&lt;br /&gt;The web holds a wealth of untapped research potential and we look forward to seeing great new publications enabled by these new programs. Go ahead - &lt;a href="http://research.google.com/university/"&gt;surprise us&lt;/a&gt;!&lt;br /&gt;&lt;br /&gt;By the way, since many researchers lead a double life as educators, we want to let you know about a site that recently launched: &lt;a href="http://code.google.com/edu/"&gt;Google Code for Educators&lt;/a&gt;, designed to make it easy for CS faculty to integrate cutting-edge computer science topics into their courses. Check it out.&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/144582859" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/144582859/drink-from-firehose-with-university.html" title="Drink from the firehose with University Research Programs" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=3313596925024274627" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/3313596925024274627/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/3313596925024274627" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/3313596925024274627" /><author><name>Kevin McCurley</name><uri>http://www.blogger.com/profile/05232657898652200217</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2007/07/drink-from-firehose-with-university.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-6570151136412082882</id><published>2007-06-18T15:10:00.000-07:00</published><updated>2007-06-20T20:20:21.638-07:00</updated><title type="text">New Conference on Web Search and Data Mining</title><content type="html">&lt;span class="byline-author"&gt;Posted by Ziv Bar-Yossef and Kevin McCurley, Research Team&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The pace of innovation on the World Wide Web continues unabated more than fifteen years after the first servers went live.  The web was initially used by only a small community of scientists, but there are now over a billion people on the planet who use the web in their lives. The World Wide Web grows and changes as a young organism might, reflecting the social forces of the users and information producers.  
Each year seems to bring a radical new change, including the movement of commerce to the web, the availability of realtime news on the web, mobile users being able to access the web from anywhere, new forms of media such as video, and the emergence of blogs changing politics and publishing.&lt;br /&gt;&lt;br /&gt;This rapid pace of innovation and scale presents many interesting research questions.  At Google our goal is to organize information in ways that are useful to users, and we regularly find ourselves solving problems that seemed like ridiculous thought experiments just a few years ago.  We therefore welcome the arrival of a &lt;a href="http://www.wsdm2008.org/"&gt;new conference on Web Search and Data Mining&lt;/a&gt;, prosaically named with the acronym WSDM (pronounced as wisdom). WSDM is intended to be complementary to the World Wide Web Conference tracks in search and data mining. The soaring volume of submissions to these two tracks over the past few years justifies the foundation of a new top-tier conference on web search and mining. WSDM is a joint effort of researchers from the three large search engines (Google, Yahoo, MSN) as well as top-notch scientists from academia (such as Jon Kleinberg from Cornell, Rajeev Motwani from Stanford, and Monika Henzinger from Google and EPFL).  The first WSDM conference will take place at Stanford University (the place where both Google and Yahoo! were conceived by their founders). The conference will be held in February of 2008, and the deadline for submissions is July 30, 2007. For further information see the &lt;a href="http://www.wsdm2008.org/"&gt;WSDM web site&lt;/a&gt;. If you have good papers on search or data mining in the pipeline, please consider sending them to WSDM. &lt;br /&gt;&lt;br /&gt;We look forward to seeing you there! &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/144582860" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/144582860/new-conference-on-web-search-and-data.html" title="New Conference on Web Search and Data Mining" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=6570151136412082882" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/6570151136412082882/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/6570151136412082882" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/6570151136412082882" /><author><name>Kevin McCurley</name><uri>http://www.blogger.com/profile/05232657898652200217</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2007/06/new-conference-on-web-search-and-data.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-7018577866531962749</id><published>2007-06-18T14:38:00.000-07:00</published><updated>2007-06-18T19:33:04.438-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="video" /><category scheme="http://www.blogger.com/atom/ns#" term="publication" /><title type="text">Videos of talks</title><content type="html">&lt;span class="byline-author"&gt;Posted by Kevin McCurley, Research Team&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We've recently launched a &lt;a href="http://research.google.com/"&gt;Google Research&lt;/a&gt; web site that we'll be updating to provide information about research activities at Google.  
Among other things, you'll find there the ability to search and view &lt;a href="http://research.google.com/video.html"&gt;videos of talks at Google&lt;/a&gt;.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;One of the best features of working at Google is the rich variety of talks that we can attend, both technical and general interest.  Most of these are videotaped for later viewing.  This has multiple benefits:&lt;br /&gt;&lt;/p&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;In case of a scheduling conflict, Google employees may view talks at a later time (yes, some of us &lt;span style="font-style: italic;"&gt;do&lt;/span&gt; have other things to do in the day).&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Talks are available for viewing by Google employees at other sites.  This provides us with a much more cohesive intellectual culture than most global companies.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;When appropriate, speakers may opt to have their talks available on the World Wide Web.  This provides a benefit to both viewers and speakers, since it allows speakers to reach a much broader audience, and it allows viewers to hear interesting talks without the need to be&lt;br /&gt;physically present.&lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;The World Wide Web started out as a means for scientists to communicate among themselves.  In the early days it provided a less formal and more timely means of distributing information than archival refereed publications, and it's now routine for a scientist to have a home page from which they distribute their writings and thoughts.  Moreover, it's also now commonplace to find a large fraction of current scientific literature through the web, both refereed and unrefereed.  In fact, the situation has evolved to the point where scientists often consult the web for publications before going to a library.&lt;br /&gt;&lt;br /&gt;Archival publications are but one means of communication that has typically been used by scientists.  
Another mode of communication that has a long history of use is the presentation of talks at meetings and during visits to other institutions.  Oral presentations have historically been less formal, and allow the speaker to be more speculative and interactive.&lt;br /&gt;&lt;br /&gt;In the last few years, several technological developments have made it possible to distribute high quality video of talks on the web in addition to written publications.  This distribution of videos from talks holds the promise of changing the way that scientists think about communication.  Imagine what lessons would be available to us if we had the ability to view lectures by Kepler, Einstein, Turing, Shannon, or von Neumann!  Imagine also what it would be like to be able to watch and listen to selected talks from conferences that are across the world, without having to suffer the burden of traveling to the remote location.  Such media are unlikely to ever completely supplant the richness of communication that arises from personal interaction in physical proximity, but they will probably still change scientific communication as much as email and the web have already.&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/144582861" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/144582861/videos-of-talks.html" title="Videos of talks" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=7018577866531962749" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/7018577866531962749/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/7018577866531962749" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/7018577866531962749" /><author><name>Kevin McCurley</name><uri>http://www.blogger.com/profile/05232657898652200217</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2007/06/videos-of-talks.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-117165350451339942</id><published>2007-02-16T11:15:00.000-08:00</published><updated>2007-02-16T11:18:24.526-08:00</updated><title type="text">Seattle conference on scalability</title><content type="html">&lt;span class="byline-author"&gt;Posted by Amanda Camp, Software Engineer&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We care a lot about &lt;a href="http://en.wikipedia.org/wiki/Scalability"&gt;scalability&lt;/a&gt; at Google. An algorithm that works only on a small scale doesn't cut it when we are talking global access, millions of people, millions of search queries. We think big and love to talk about big ideas, so we're planning our first ever &lt;span style="font-weight:bold;"&gt;conference on scalable systems.&lt;/span&gt; It will take place on &lt;span style="font-weight: bold;"&gt;June 23&lt;/span&gt; at our Seattle office. 
Our goal: to create a collegial atmosphere for participants to brainstorm different ways to build the robust systems that can handle, literally, a world of information.  &lt;br /&gt;&lt;br /&gt;If you have a great new idea for handling a growing system or an innovative approach to scalability, we  want to hear from you. Send a short note about who you are and a description of  your 45-minute talk in 500 words or less to &lt;a href="mailto:scalabilityconf@google.com"&gt;scalabilityconf@google.com&lt;/a&gt; by &lt;span style="font-weight: bold;"&gt;Friday, April 20&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;With your help, we can create an exciting event that brings together great  people and ideas. (And by the way, we'll bring the &lt;a href="http://www.flickr.com/photos/brettlider/sets/154249/"&gt;food&lt;/a&gt;.) If you'd like to attend but not speak, we'll post registration details later.&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/144582862" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/144582862/seattle-conference-on-scalability.html" title="Seattle conference on scalability" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=117165350451339942" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/117165350451339942/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/117165350451339942" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/117165350451339942" /><author><name>A Googler</name><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2007/02/seattle-conference-on-scalability.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-117147989383764198</id><published>2007-02-14T09:54:00.000-08:00</published><updated>2007-02-14T17:06:03.613-08:00</updated><title type="text">Hear, here.   A Sample of Audio Processing at Google.</title><content type="html">&lt;span class="byline-author"&gt;Posted by Shumeet Baluja, Michele Covell, Pedro Moreno &amp; Eugene Weinstein&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Text isn't the only source of information on the web!  We've been working on a variety of projects related to audio and visual recognition.  One of the fundamental constraints that we have in designing systems at Google is the huge amount of data that we need to process rapidly.  A few of the research papers that have come out of this work are shown here.  
&lt;br /&gt;&lt;br /&gt;In the first pair of papers, to be presented at the 2007 International Conference on Acoustics, Speech and Signal Processing (&lt;a href="http://www.esprockets.com/papers/Baluja_Covell_ICASSP2007_1888.pdf"&gt;Waveprint Overview&lt;/a&gt;, &lt;a href="http://www.esprockets.com/papers/detection_CB.pdf"&gt;Waveprint-for-Known-Audio&lt;/a&gt;), we show how computer vision processing techniques, combined with large-scale data stream processing, can create an efficient system for recognizing audio that has been degraded by various means such as cell phone playback, lossy compression, echoes, time-dilation (as found on the radio), competing noise, etc. &lt;br /&gt;&lt;br /&gt;It is also fun and surprising to see how often in research the same problem can be approached from a completely different perspective.  In the third paper to be presented at ICASSP-2007 (&lt;a href="http://pmoreno.googlepages.com/music.pdf"&gt;Music Identification with WFST&lt;/a&gt;) we explore how acoustic modeling techniques commonly used in speech recognition, and finite state transducers used to represent and search large graphs, can be used in the problem of music identification.  Our approach learns a common alphabet of music sounds (which we call music-phones) and represents large song collections as a big graph where efficient search is possible.&lt;br /&gt;&lt;br /&gt;Perhaps one of the most interesting aspects of audio recognition goes beyond the matching of degraded signals, and instead attempts to capture meaningful notions of similarity.  In our paper presented at the International Joint Conference on Artificial Intelligence (&lt;a href="http://www.esprockets.com/papers/IJCAI07_BalujaCovell.pdf"&gt;Music Similarity&lt;/a&gt;), we describe a system that learns relevant similarities in music signals, while maintaining efficiency by using these learned models to create customized hashing functions.  
&lt;br /&gt;&lt;br /&gt;We're extending these pieces of work in a variety of ways, not only in the learning algorithms used, but also in the application areas.  If you're interested in joining Google Research and working on these projects, be sure to drop us a line.&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/144582863" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/144582863/hear-here-sample-of-audio-processing.html" title="Hear, here.   A Sample of Audio Processing at Google." /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=117147989383764198" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/117147989383764198/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/117147989383764198" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/117147989383764198" /><author><name>Peter Norvig</name><uri>http://www.blogger.com/profile/04107527216960668585</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2007/02/hear-here-sample-of-audio-processing.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-116587801847653170</id><published>2006-12-11T14:58:00.000-08:00</published><updated>2006-12-11T15:16:04.016-08:00</updated><title type="text">Google Research Picks for Videos of the Year</title><content type="html">&lt;span class="byline-author"&gt;Posted by Peter Norvig&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Everyone else is giving you year-end top ten lists of their favorite movies, so we thought we'd give you &lt;a href="http://video.google.com/videosearch?q=Google+engedu"&gt;&lt;b&gt;ours&lt;/b&gt;&lt;/a&gt;, but we're skipping &lt;i&gt;Cars&lt;/i&gt; and &lt;i&gt;The Da Vinci Code&lt;/i&gt; and giving you autonomous cars and open source code.  
Our top twenty (we couldn't stop at ten):&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=8594517128412883394&amp;q=engedu"&gt;Winning the DARPA Grand Challenge&lt;/a&gt;: Sebastian Thrun stars in the heartwarming drama of a little car that could.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=-7497796873809571567&amp;amp;q=engedu"&gt;The Graphing Calculator Story&lt;/a&gt;: A thriller starring Ron Avitzur as the engineer who snuck into the Apple campus to write code.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=1996321846673788606&amp;q=engedu"&gt;Should Google Go Nuclear?&lt;/a&gt;: Robert Bussard (former Asst. Director of the AEC) talks about inertial electrostatic fusion.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=-6972678839686672840&amp;amp;q=engedu"&gt;A New Way to Look at Networking&lt;/a&gt;: Van Jacobson as the old pro discovering that the old problems have not gone away.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=-6459339159268485356&amp;q=engedu"&gt;Python 3000&lt;/a&gt;: Guido van Rossum always looks on the bright side of life in this epic look at the future of Python.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=7951038502689013454"&gt;How to Survive a Robot Uprising&lt;/a&gt;: Daniel Wilson stars in this sci-fi horror story.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=-6363477088541627991&amp;amp;q=engedu"&gt;The New "Bill of Rights of Information Society"&lt;/a&gt;: Raj Reddy talks about how to get the right information to the right people at the right time.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=448441135356213813&amp;q=engedu"&gt;Practical Common Lisp&lt;/a&gt;: In this foreign film, Peter Seibel introduces the audience to a 
new language. Subtitles in parentheses.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=3897010229726822034&amp;amp;q=engedu"&gt;Debugging Backwards in Time&lt;/a&gt;: Starring Bil Lewis in this sequel to &lt;i&gt;Back to the Future&lt;/i&gt;.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=-5699448884004201579&amp;q=engedu"&gt;Building Large Systems at Google&lt;/a&gt;: Narayanan Shivakumar takes us behind the scenes to see how Google builds large distributed systems.  Like &lt;i&gt;Charlie and the Chocolate Factory&lt;/i&gt; but without the Oompa-Loompas.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=-6459171443654125383&amp;amp;q=Google+engedu"&gt;The Science and Art of User Experience at Google&lt;/a&gt;: Jen Fitzpatrick continues the behind-the-scenes look.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=7423822399040727959&amp;q=Google+engedu"&gt;Universally Accessible Demands Accessibility for All of Humanity&lt;/a&gt;: MacArthur "Genius Award" Fellow Jim Fruchterman talks about accessibility for the blind and others.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=-8220394453782681101&amp;amp;q=watson+google"&gt;DNA and the Brain&lt;/a&gt;: Nobel Laureate James Watson explains how the key to understanding the brain is in our genes.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=383231378223541436&amp;q=authors+google"&gt;Steve Wozniak&lt;/a&gt;: This one-man show is playing to boffo reviews.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=3763187418980465301"&gt;Jane Goodall&lt;/a&gt;: The celebrated primatologist discusses her mission to empower individuals to improve the environment.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a 
href="http://video.google.com/videoplay?docid=-7704388615049492068&amp;amp;q=engedu"&gt;Computers Versus Common Sense&lt;/a&gt;: Doug Lenat reprises his role as the teacher trying to get computers to understand.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=-5108715140883188059"&gt;The Google Story&lt;/a&gt;: David Vise talks about his book on Google.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=-4381488634998231167"&gt;The Search&lt;/a&gt;: John Battelle talks about his book on Google.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=8211813884612792878&amp;q=engedu"&gt;The Archimedes Palimpsest&lt;/a&gt;: Like &lt;i&gt;Da Vinci Code&lt;/i&gt;, only true.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://video.google.com/videoplay?docid=6127548813950043200&amp;amp;q=engedu"&gt;The Paradox of Choice - Why More is Less&lt;/a&gt;: With Barry Schwartz.  Hmm, maybe I should have made this a top &lt;i&gt;three&lt;/i&gt; list?&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~f/blogspot/gJZg?a=REv18J"&gt;&lt;img src="http://feeds.feedburner.com/~f/blogspot/gJZg?i=REv18J" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/144582864" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/144582864/google-research-picks-for-videos-of.html" title="Google Research Picks for Videos of the Year" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=116587801847653170" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/116587801847653170/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/116587801847653170" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/116587801847653170" /><author><name>Peter Norvig</name><uri>http://www.blogger.com/profile/04107527216960668585</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2006/12/google-research-picks-for-videos-of.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-116473518428934493</id><published>2006-11-28T09:37:00.000-08:00</published><updated>2006-11-28T09:37:12.950-08:00</updated><title type="text">CSCW 2006: Collaborative editing 20 years later</title><content type="html">&lt;span class="byline-author"&gt;Posted by Lilly Irani &amp; Jens Riegelsberger, User Experience team&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;9am Mountain View, California. 6pm Zurich, Switzerland. The two of us sit separated by thousands of miles, telephones tucked under our ears, talking about this blog post and typing words and edits into &lt;a href="http://docs.google.com/" title="Writely"&gt;Google Docs&lt;/a&gt;. As we talk about the title, we start typing into the same paragraph -- and Lilly gets a warning: "You've edited a paragraph that Jens has been editing!" 
Lilly stops typing so she doesn't lose her thoughts and coordinates with Jens over the phone. Then we realize "We just &lt;a href="http://www.cscw2006.org/program_technical_tue.html#session9" title="read a paper"&gt;talked about this problem&lt;/a&gt; at the conference we're writing about!"&lt;br /&gt;&lt;br /&gt;Two weeks ago four Googlers ventured north to attend &lt;a href="http://www.cscw2006.org/" title="ACM CSCW"&gt;ACM CSCW&lt;/a&gt; in Banff, Alberta, Canada. CSCW is ACM's conference on Computer Supported Cooperative Work and brings together computer scientists, social scientists, and designers interested in how people live their lives -- at work, at play, and in between -- with and around technology, with a focus on understanding the design of technological systems. Topics like issues and implementation of collaborative editing are staples at CSCW.&lt;br /&gt;&lt;br /&gt;As this year was the conference's 20th anniversary, we had a chance to hear from many of the founders of CSCW: Irene Greif, Jonathan Grudin, Tom Malone, Judy Olson, Lucy Suchman, among others. Not surprisingly, the mood was introspective, with many speakers tracing the impact of the community over time and looking critically and constructively at the future paths the research community might take. Many sessions focused on less traditional areas of research, such as how Facebook figures into college students' school transitions and how tagging vocabularies evolve and are shaped by technology in a movie community. Jens also gave a talk on his pre-Google research on &lt;a href="http://doi.acm.org/10.1145/1180875.1180899" title="effects of photos and voice in online gamer matchmaking systems"&gt;how photos and voice profiles affect people's choice of gaming partners&lt;/a&gt;. 
And he participated in a &lt;a href="http://myweb.cwpost.liu.edu/qzhang/cscw06/presentations.html" title="workshop exploring how people trust -- and learn to trust"&gt;workshop exploring how people trust -- and learn to trust&lt;/a&gt; -- in online environments.&lt;br /&gt;&lt;br /&gt;Apart from actively taking part in the debates and Q&amp;As, we also demo-ed Google's tools for getting things done, collaboratively or solo: &lt;a href="http://docs.google.com/" title="Google Docs &amp;amp; Spreadsheets"&gt;Google Docs &amp; Spreadsheets&lt;/a&gt; and &lt;a href="http://www.google.com/notebook" title="Google Notebook"&gt;Google Notebook&lt;/a&gt;. These were met with much interest, as these publicly available Google tools build on insights gained in the CSCW field over the last 20 years.&lt;br /&gt;&lt;br /&gt;If you're interested in these issues, you'd be a great addition to our team. Learn about available positions in &lt;a href="http://www.google.com/support/jobs/bin/topic.py?dep_id=1091&amp;amp;loc_id=1100" title="user experience research and design"&gt;user experience research and design&lt;/a&gt;.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~f/blogspot/gJZg?a=BOmQUJ"&gt;&lt;img src="http://feeds.feedburner.com/~f/blogspot/gJZg?i=BOmQUJ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/144582865" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/144582865/cscw-2006-collaborative-editing-20.html" title="CSCW 2006: Collaborative editing 20 years later" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=116473518428934493" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/116473518428934493/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/116473518428934493" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/116473518428934493" /><author><name>A Googler</name><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2006/11/cscw-2006-collaborative-editing-20.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-115894742981077088</id><published>2006-09-22T09:25:00.000-07:00</published><updated>2006-10-31T12:15:39.533-08:00</updated><title type="text">And the Awards Go To ...</title><content type="html">&lt;span class="byline-author"&gt;Posted by Proud Googlers&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We're usually a modest bunch, but we couldn't help but let you know about some honors and awards bestowed on Googlers recently:&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt; Ramakrishnan Srikant is the winner of the 2006 ACM &lt;a href="http://www.acm.org/sigs/sigkdd/awards.php#2006i"&gt;SIGKDD Innovation Award&lt;/a&gt; for his work on pruning techniques for the discovery of association rules, and for developing new data mining approaches that respect the privacy of people in the database.&lt;br /&gt;&lt;br /&gt;&lt;li&gt; Henry Rowley and Shumeet Baluja, along with CMU 
professor Takeo Kanade, received the Longuet-Higgins prize for "a contribution which has stood the test of time," namely their 1996 paper &lt;i&gt;&lt;a href="http://www.ri.cmu.edu/pubs/pub_2678.html"&gt;Neural Network based face detection&lt;/a&gt;&lt;/i&gt;.  The award was given at the 2006 Computer Vision and Pattern Recognition (&lt;a href="http://www.cvpr.org/2006/"&gt;CVPR&lt;/a&gt;) Conference.&lt;br /&gt;&lt;br /&gt;&lt;li&gt; &lt;i&gt;Team Smartass&lt;/i&gt;, consisting of Christopher Hendrie, Derek Kisman, Ambrose Feinstein and Daniel Wright won first place in the ICFP (International Conference on Functional Programming) &lt;a href="http://icfpcontest.org/scoreboard.shtml"&gt;programming contest&lt;/a&gt;, using a combination of C++, Haskell and 2D. Third place went to &lt;i&gt;Can't Spell Awesome without ASM&lt;/i&gt;, a team consisting of Google engineer Jon Dethridge, former Google interns Ralph Furmaniak and Tomasz Czajka, and Reid Barton of Harvard. They got the judges at the functional programming conference to admit "Assembler is not too shabby."&lt;br /&gt;&lt;br /&gt;&lt;li&gt; Peter Norvig was named a Berkeley &lt;a href="http://www.eecs.berkeley.edu/alumni/distinguished.cs.shtml#outstanding"&gt;Distinguished Alumni&lt;/a&gt; in Computer Science, and gave the keynote &lt;a href="http://norvig.com/speech.html"&gt;commencement address&lt;/a&gt;.  
We'd also like to congratulate  Prabhakar Raghavan, Head of Yahoo Research, who was a co-recipient of this award.&lt;br /&gt;&lt;br /&gt;&lt;li&gt; Simon Quellen Field's book &lt;i&gt;&lt;a href="http://www.amazon.com/gp/product/1556526105"&gt;Return of Gonzo Gizmos&lt;/a&gt;&lt;/i&gt; was a selection of the &lt;a href="http://www.sciambookclub.com"&gt;Scientific American Book Club&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;li&gt; Google summer intern Rion Snow (along with Stanford professors Dan Jurafsky and Andrew Ng) got the best paper award at the 2006 &lt;a href="http://www.acl2006.mq.edu.au/"&gt;ACL/COLING&lt;/a&gt; (computational linguistics) conference for his paper titled &lt;i&gt;&lt;a href="http://ai.stanford.edu/~rion/papers/semtax_acl06.pdf"&gt;Semantic taxonomy induction from heterogenous evidence&lt;/a&gt;&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;&lt;li&gt; Google summer intern Lev Reyzin won the outstanding student paper award at &lt;a href="http://www.icml2006.org/icml2006/16648.html"&gt;ICML&lt;/a&gt; (International Conference on Machine Learning) for work with Rob Schapire of Princeton on &lt;i&gt;&lt;a href="http://www.icml2006.org/icml_documents/camera-ready/095_How_Boosting_the_Mar.pdf"&gt;How Boosting the Margin Can Also Boost Classifier Complexity&lt;/a&gt;&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;&lt;li&gt; As we &lt;a href="http://googleresearch.blogspot.com/2006/06/interactive-tv-conference-and-best.html"&gt;mentioned earlier&lt;/a&gt;, Michael Fink, Michele Covell and Shumeet Baluja won a best paper award for &lt;i&gt;Social- and Interactive-Television Applications Based on Real-Time Ambient-Audio Identification&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;&lt;li&gt; &lt;b&gt;Update 13 Oct 2006:&lt;/b&gt; &lt;a href="http://www.technologyreview.com/TR35/Profile.aspx?Cand=T&amp;TRID=437"&gt;Paul Rademacher&lt;/a&gt; has been named one of the top innovators under 35 by MIT's &lt;a href="http://www.technologyreview.com/"&gt;Technology Review&lt;/a&gt;. 
He was cited&lt;br /&gt;for his mashup of Google Maps and Craig's List housing data at &lt;a href="http://housingmaps.com"&gt;housingmaps.com&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;li&gt; &lt;b&gt;Update 31 Oct 2006:&lt;/b&gt; We forgot Alon Halevy, who won the &lt;a href="http://www.informatik.uni-trier.de/~ley/db/conf/vldb/vldb2006.html"&gt;VLDB 10 Year Best Paper Award&lt;/a&gt; for &lt;i&gt;Querying Heterogeneous Information Sources Using Source Descriptions&lt;/i&gt; with Anand Rajaraman and Joann J. Ordille.&lt;br /&gt;&lt;br /&gt;&lt;/ul&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~f/blogspot/gJZg?a=qEsjhJ"&gt;&lt;img src="http://feeds.feedburner.com/~f/blogspot/gJZg?i=qEsjhJ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/144582866" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/144582866/and-awards-go-to.html" title="And the Awards Go To ..." /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=115894742981077088" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/115894742981077088/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/115894742981077088" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/115894742981077088" /><author><name>Peter Norvig</name><uri>http://www.blogger.com/profile/04107527216960668585</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2006/09/and-awards-go-to.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-115463142334550661</id><published>2006-08-03T11:26:00.000-07:00</published><updated>2006-10-10T17:50:42.940-07:00</updated><title type="text">All Our N-gram are Belong to You</title><content type="html">&lt;span class="byline-author"&gt;Posted by Alex Franz and Thorsten Brants, Google Machine Translation Team&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Here at Google Research we have been using word &lt;a href="http://en.wikipedia.org/wiki/N-gram"&gt;n-gram models&lt;/a&gt; for a variety of R&amp;amp;D projects, such as &lt;a href="http://googleresearch.blogspot.com/2006/04/statistical-machine-translation-live.html"&gt;statistical machine translation&lt;/a&gt;, speech recognition, &lt;a href="http://www.google.com/jobs/britney.html"&gt;spelling correction&lt;/a&gt;, entity detection, information extraction, and others.  
While such models have usually been estimated from training corpora containing at most a few billion words, we have been harnessing the vast power of Google's datacenters and distributed processing &lt;a href="http://labs.google.com/papers/mapreduce.html"&gt;infrastructure&lt;/a&gt; to process larger and larger training corpora. We found that there's no data like more data, and scaled up the size of our data by one order of magnitude, and then another, and then one more - resulting in a training corpus of &lt;i&gt;one trillion words&lt;/i&gt; from public Web pages.&lt;br /&gt;&lt;br /&gt;We believe that the entire research community can benefit from access to such massive amounts of data. It will advance the state of the art, it will focus research in the promising direction of large-scale, data-driven approaches, and it will allow all research groups, no matter how large or small their computing resources, to play together. That's why we decided to share this enormous dataset with everyone. We processed 1,024,908,267,229 words of running text and are publishing the counts for all 1,176,470,663 five-word sequences that appear at least 40 times. There are 13,588,391 unique words, after discarding words that appear fewer than 200 times.&lt;br /&gt;&lt;br /&gt;Watch for an announcement at the Linguistic Data Consortium (&lt;a href="http://www.ldc.upenn.edu"&gt;LDC&lt;/a&gt;), who will be distributing it soon, and then order your set of 6 DVDs. And &lt;a href="mailto:ngrams@google.com"&gt;let us hear from you&lt;/a&gt; - we're excited to hear what you will do with the data, and we're always interested in feedback about this dataset, or other potential datasets that might be useful for the research community.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update (22 Sept. 2006):&lt;/b&gt; The LDC now has the &lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13"&gt;data available&lt;/a&gt; in their catalog. 
The counts are as follows:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;File sizes: approx. 24 GB compressed (gzip'ed) text files&lt;br /&gt;&lt;br /&gt;Number of tokens:    1,024,908,267,229&lt;br /&gt;Number of sentences:    95,119,665,584&lt;br /&gt;Number of unigrams:         13,588,391&lt;br /&gt;Number of bigrams:         314,843,401&lt;br /&gt;Number of trigrams:        977,069,902&lt;br /&gt;Number of fourgrams:     1,313,818,354&lt;br /&gt;Number of fivegrams:     1,176,470,663&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The following is an example of the 3-gram data contained in this corpus:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;ceramics collectables collectibles 55&lt;br /&gt;ceramics collectables fine 130&lt;br /&gt;ceramics collected by 52&lt;br /&gt;ceramics collectible pottery 50&lt;br /&gt;ceramics collectibles cooking 45&lt;br /&gt;ceramics collection , 144&lt;br /&gt;ceramics collection . 247&lt;br /&gt;ceramics collection &amp;lt;/S&amp;gt; 120&lt;br /&gt;ceramics collection and 43&lt;br /&gt;ceramics collection at 52&lt;br /&gt;ceramics collection is 68&lt;br /&gt;ceramics collection of 76&lt;br /&gt;ceramics collection | 59&lt;br /&gt;ceramics collections , 66&lt;br /&gt;ceramics collections . 60&lt;br /&gt;ceramics combined with 46&lt;br /&gt;ceramics come from 69&lt;br /&gt;ceramics comes from 660&lt;br /&gt;ceramics community , 109&lt;br /&gt;ceramics community . 212&lt;br /&gt;ceramics community for 61&lt;br /&gt;ceramics companies . 53&lt;br /&gt;ceramics companies consultants 173&lt;br /&gt;ceramics company ! 4432&lt;br /&gt;ceramics company , 133&lt;br /&gt;ceramics company . 
92&lt;br /&gt;ceramics company &amp;lt;/S&amp;gt; 41&lt;br /&gt;ceramics company facing 145&lt;br /&gt;ceramics company in 181&lt;br /&gt;ceramics company started 137&lt;br /&gt;ceramics company that 87&lt;br /&gt;ceramics component ( 76&lt;br /&gt;ceramics composed of 85&lt;br /&gt;ceramics composites ferrites 56&lt;br /&gt;ceramics composition as 41&lt;br /&gt;ceramics computer graphics 51&lt;br /&gt;ceramics computer imaging 52&lt;br /&gt;ceramics consist of 92&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The following is an example of the 4-gram data in this corpus:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;serve as the incoming 92&lt;br /&gt;serve as the incubator 99&lt;br /&gt;serve as the independent 794&lt;br /&gt;serve as the index 223&lt;br /&gt;serve as the indication 72&lt;br /&gt;serve as the indicator 120&lt;br /&gt;serve as the indicators 45&lt;br /&gt;serve as the indispensable 111&lt;br /&gt;serve as the indispensible 40&lt;br /&gt;serve as the individual 234&lt;br /&gt;serve as the industrial 52&lt;br /&gt;serve as the industry 607&lt;br /&gt;serve as the info 42&lt;br /&gt;serve as the informal 102&lt;br /&gt;serve as the information 838&lt;br /&gt;serve as the informational 41&lt;br /&gt;serve as the infrastructure 500&lt;br /&gt;serve as the initial 5331&lt;br /&gt;serve as the initiating 125&lt;br /&gt;serve as the initiation 63&lt;br /&gt;serve as the initiator 81&lt;br /&gt;serve as the injector 56&lt;br /&gt;serve as the inlet 41&lt;br /&gt;serve as the inner 87&lt;br /&gt;serve as the input 1323&lt;br /&gt;serve as the inputs 189&lt;br /&gt;serve as the insertion 49&lt;br /&gt;serve as the insourced 67&lt;br /&gt;serve as the inspection 43&lt;br /&gt;serve as the inspector 66&lt;br /&gt;serve as the inspiration 1390&lt;br /&gt;serve as the installation 136&lt;br /&gt;serve as the institute 187&lt;br /&gt;serve as the institution 279&lt;br /&gt;serve as the institutional 461&lt;br /&gt;serve as the instructional 173&lt;br /&gt;serve as the instructor 
286&lt;br /&gt;serve as the instructors 161&lt;br /&gt;serve as the instrument 614&lt;br /&gt;serve as the instruments 193&lt;br /&gt;serve as the insurance 52&lt;br /&gt;serve as the insurer 82&lt;br /&gt;serve as the intake 70&lt;br /&gt;serve as the integral 68&lt;br /&gt;&lt;/pre&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~f/blogspot/gJZg?a=CjmQYJ"&gt;&lt;img src="http://feeds.feedburner.com/~f/blogspot/gJZg?i=CjmQYJ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/144582867" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/144582867/all-our-n-gram-are-belong-to-you.html" title="All Our N-gram are Belong to You" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=115463142334550661" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/115463142334550661/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/115463142334550661" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/115463142334550661" /><author><name>Peter Norvig</name><uri>http://www.blogger.com/profile/04107527216960668585</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-115274014722981285</id><published>2006-07-12T15:28:00.000-07:00</published><updated>2006-07-13T06:18:53.813-07:00</updated><title type="text">Call for attendees - Conference on Test Automation</title><content type="html">&lt;span class="byline-author"&gt;Posted by Allen Hutchison, Engineering Manager&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As we &lt;a href="http://googleresearch.blogspot.com/2006/04/our-conference-on-automated-testing.html"&gt;noted earlier&lt;/a&gt;, we're hosting our first-ever Conference on Test Automation in London in September.&lt;br /&gt;&lt;br /&gt;We've heard from many interested parties, and now have 13 excellent presentations lined up. Now we are soliciting people who want to attend. 
Because we expect lots of interest and space is limited, we're asking everyone who's interested to write a short note (400 words or less) on why you want to be there. There's an &lt;a href="http://services.google.com/events/londontesters"&gt;easy form&lt;/a&gt; for requesting a spot, and we hope to hear from you. The deadline for writing in is July 28th - and you'll hear back by August 4.&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~f/blogspot/gJZg?a=e5B0vJ"&gt;&lt;img src="http://feeds.feedburner.com/~f/blogspot/gJZg?i=e5B0vJ" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/144582868" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/144582868/call-for-attendees-conference-on-test.html" title="Call for attendees - Conference on Test Automation" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=115274014722981285" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/115274014722981285/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/115274014722981285" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/115274014722981285" /><author><name>A Googler</name><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2006/07/call-for-attendees-conference-on-test.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-114962138526611754</id><published>2006-06-06T10:13:00.000-07:00</published><updated>2006-06-06T12:16:25.570-07:00</updated><title type="text">Interactive TV: Conference and Best Paper</title><content type="html">&lt;span class="byline-author"&gt;Posted by Michele Covell &amp;amp; Shumeet Baluja, Research Scientists&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Euro ITV (the interactive television conference) took place in Athens last week.  The presentations included a diverse collection of user studies, new application areas, and exploratory business models. One of the main themes was the integration of multiple information sources.  
For example, during a time-out in a live sporting event, some viewers may enjoy reviewing highlight footage, while others may prefer to switch to a parallel program showing player profiles and statistics before being automatically returned to the soccer match once play is back underway.&lt;br /&gt;&lt;br /&gt;Other papers explored the idea of selecting and recommending videos.  When many videos are available, such as through IPTV or digital cable, we see a &lt;i&gt;heavy-tailed distribution&lt;/i&gt; of content accesses (much like that on the internet).  There are a small number of popular channels but the combined viewings from thousands of "niche" channels outweigh the popular channels.  As on the web, the problem that arises from this situation is one of discovery. A &lt;i&gt;TV guide&lt;/i&gt; type resource is not practical; methods like collaborative filtering can help.  Nonetheless, new ideas and interfaces are needed.  &lt;br /&gt;&lt;br /&gt;We also presented our work at the conference.  &lt;a href="http://www.mangolassi.org/covell/pubs/euroITV-2006.pdf"&gt;Our paper [pdf]&lt;/a&gt; (which received the &lt;b&gt;best paper&lt;/b&gt; award :) focused on using broadcast viewing to automatically present relevant information on a web browser.  We showed how to sample the ambient sound emitted from a TV and automatically determine what is being watched from a small signature of the sound -- all with complete privacy and minuscule effort.  The system could keep up with users while they channel surf, presenting them with a real-time forum about a live political debate one minute and an ad-hoc chat room for a sporting event in the next. And, all of this would be done without users ever having to type or to even know the name of the program or channel being viewed.  
Taking this further, we could collect snippets from the web describing the actors appearing in a movie or present maps of locales within the movie as it takes place (no matter if users are watching it as a live broadcast or as a recorded broadcast).&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~f/blogspot/gJZg?a=XCJV0J"&gt;&lt;img src="http://feeds.feedburner.com/~f/blogspot/gJZg?i=XCJV0J" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/144582869" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/144582869/interactive-tv-conference-and-best.html" title="Interactive TV: Conference and Best Paper" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=114962138526611754" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/114962138526611754/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/114962138526611754" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/114962138526611754" /><author><name>Peter Norvig</name><uri>http://www.blogger.com/profile/04107527216960668585</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2006/06/interactive-tv-conference-and-best.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-114926336657808713</id><published>2006-06-02T08:34:00.000-07:00</published><updated>2008-02-17T18:12:21.754-08:00</updated><title type="text">Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken</title><content type="html">&lt;span class="byline-author"&gt;Posted by Joshua Bloch, Software Engineer&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I remember vividly Jon Bentley's first Algorithms lecture at CMU, where he asked all of us incoming Ph.D. students to write a binary search, and then dissected one of our implementations in front of the class.  Of course it was broken, as were most of our implementations.  
This made a real impression on me, as did the treatment of this material in his wonderful &lt;i&gt;Programming Pearls&lt;/i&gt; (Addison-Wesley, 1986; Second Edition, 2000).  The key lesson was to carefully consider the invariants in your programs.&lt;br /&gt;&lt;br /&gt;Fast forward to 2006.  I was shocked to learn that the binary search program that Bentley proved correct and subsequently tested in Chapter 5 of &lt;i&gt;Programming Pearls&lt;/i&gt; contains a bug.  Once I tell you what it is, you will understand why it escaped detection for two decades.  Lest you think I'm picking on Bentley, let me tell you how I discovered the bug: The version of binary search that I wrote for the JDK contained the same bug. It was reported to Sun recently when it broke someone's program, after lying in wait for nine years or so.&lt;br /&gt;&lt;br /&gt;So what's the bug?  Here's a standard binary search, in Java.  (It's one that I wrote for &lt;code&gt;java.util.Arrays&lt;/code&gt;):&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;1:     public static int binarySearch(int[] a, int key) {&lt;br /&gt;2:         int low = 0;&lt;br /&gt;3:         int high = a.length - 1;&lt;br /&gt;4:&lt;br /&gt;5:         while (low &lt;= high) {&lt;br /&gt;6:             int mid = (low + high) / 2;&lt;br /&gt;7:             int midVal = a[mid];&lt;br /&gt;8:&lt;br /&gt;9:             if (midVal &lt; key)&lt;br /&gt;10:                 low = mid + 1;&lt;br /&gt;11:             else if (midVal &gt; key)&lt;br /&gt;12:                 high = mid - 1;&lt;br /&gt;13:             else&lt;br /&gt;14:                 return mid; // key found&lt;br /&gt;15:         }&lt;br /&gt;16:         return -(low + 1);  // key not found.&lt;br /&gt;17:     }&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The bug is in this line:&lt;br /&gt;&lt;pre&gt; 6:             int mid = (low + high) / 2;&lt;/pre&gt;&lt;br /&gt;In &lt;i&gt;Programming Pearls&lt;/i&gt; Bentley says that the analogous line "sets m to the average of l and u, truncated 
down to the nearest integer."  On the face of it, this assertion might appear correct, but it fails for large values of the &lt;code&gt;int&lt;/code&gt; variables &lt;code&gt;low&lt;/code&gt; and &lt;code&gt;high&lt;/code&gt;. Specifically, it fails if the sum of &lt;code&gt;low&lt;/code&gt; and &lt;code&gt;high&lt;/code&gt; is greater than the maximum positive &lt;code&gt;int&lt;/code&gt; value (2&lt;sup&gt;31&lt;/sup&gt; - 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results.  In Java, it throws &lt;code&gt;ArrayIndexOutOfBoundsException&lt;/code&gt;.&lt;br /&gt;&lt;br /&gt;This bug can manifest itself for arrays whose length (in elements) is 2&lt;sup&gt;30&lt;/sup&gt; or greater (roughly a billion elements).  This was inconceivable back in the '80s, when &lt;i&gt;Programming Pearls&lt;/i&gt; was written, but it is common these days at Google and other places.  In &lt;i&gt;Programming Pearls&lt;/i&gt;, Bentley says "While the first binary search was published in 1946, the first binary search that works correctly for all values of &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; did not appear until 1962."  The truth is, very few correct versions have ever been published, at least in mainstream programming languages.&lt;br /&gt;&lt;br /&gt;So what's the best way to fix the bug? Here's one way:&lt;br /&gt;&lt;pre&gt; 6:             int mid = low + ((high - low) / 2);&lt;/pre&gt;&lt;br /&gt;Probably faster, and arguably as clear, is:&lt;br /&gt;&lt;pre&gt; 6:             int mid = (low + high) &gt;&gt;&gt; 1;&lt;/pre&gt;&lt;br /&gt;In C and C++ (where you don't have the &lt;code&gt;&gt;&gt;&gt;&lt;/code&gt; operator), you can do this:&lt;br /&gt;&lt;pre&gt; 6:             mid = ((unsigned int)low + (unsigned int)high) &gt;&gt; 1;&lt;/pre&gt;&lt;br /&gt;And now we &lt;i&gt;know&lt;/i&gt; the binary search is bug-free, right?  
Well, we strongly suspect so, but we don't know. It is not sufficient merely to prove a program correct; you have to test it too.  Moreover, to be really certain that a program is correct, you have to test it for all possible input values, but this is seldom feasible.  With concurrent programs, it's even worse: You have to test for all internal states, which is, for all practical purposes, impossible.&lt;br /&gt;&lt;br /&gt;The binary-search bug applies equally to mergesort, and to other divide-and-conquer algorithms.  If you have any code that implements one of these algorithms, fix it now before it blows up.  The general lesson that I take away from this bug is humility: It is hard to write even the smallest piece of code correctly, and our whole world runs on big, complex pieces of code.&lt;br /&gt;&lt;br /&gt;We programmers need all the help we can get, and we should never assume otherwise.  Careful design is great.  Testing is great.  Formal methods are great.  Code reviews are great. Static analysis is great.  But none of these things alone is sufficient to eliminate bugs: They will always be with us.  A bug can exist for half a century despite our best efforts to exterminate it. We must program carefully and defensively, and remain ever vigilant.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Update 17 Feb 2008&lt;/span&gt;: Thanks to Antoine Trux, Principal Member of Engineering Staff at Nokia Research Center Finland, for pointing out that the original proposed fix for C and C++ (Line 6) was not guaranteed to work by the relevant C99 standard (&lt;span style="font-style: italic;"&gt;INTERNATIONAL STANDARD - ISO/IEC - 9899 - Second edition - 1999-12-01&lt;/span&gt;, 3.4.3.3), which says that if you add two signed quantities and get an overflow, the result is undefined. The older C Standard, C89/90, and the C++ Standard are both identical to C99 in this respect. 
Now that we've made this change, we &lt;span style="font-style: italic;"&gt;know&lt;/span&gt; that the program is correct;)&lt;h3&gt;Resources&lt;/h3&gt;&lt;br /&gt;&lt;ul&gt; &lt;li&gt; &lt;a href="http://www.google.com/url?sa=D&amp;amp;q=http%3A%2F%2Fnetlib.bell-labs.com%2Fcm%2Fcs%2Fpearls%2F" target="_blank"&gt;&lt;i&gt;Programming Pearls&lt;/i&gt;&lt;/a&gt; - Highly recommended.  Get a copy today!  &lt;/li&gt;&lt;li&gt; &lt;a href="http://www.google.com/url?sa=D&amp;amp;q=http%3A%2F%2Fbugs.sun.com%2Fbugdatabase%2Fview_bug.do%3Fbug_id%3D5045582" target="_blank"&gt; The Sun bug report describing this bug in the JDK&lt;/a&gt;  &lt;/li&gt;&lt;li&gt; &lt;a href="http://www.google.com/url?sa=D&amp;amp;q=http%3A%2F%2Fwww.di.unipi.it%2F%7Eruggieri%2FPapers%2Fsemisum.pdf" target="_blank"&gt; A 2003 paper by Salvatore Ruggieri&lt;/a&gt; discussing a related problem -  The problem is a bit more general but perhaps less interesting: the average of two numbers of arbitrary sign.  The paper does not discuss performance, and its solution is not fast enough for use in the inner loop of a mergesort. &lt;/li&gt; &lt;/ul&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/144582870" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/144582870/extra-extra-read-all-about-it-nearly.html" title="Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=114926336657808713" title="3 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/114926336657808713/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/114926336657808713" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/114926336657808713" /><author><name>Peter Norvig</name><uri>http://www.blogger.com/profile/04107527216960668585</uri><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-114614493946140129</id><published>2006-04-28T15:40:00.000-07:00</published><updated>2006-05-01T11:52:07.336-07:00</updated><title type="text">Statistical machine translation live</title><content type="html">&lt;span class="byline-author"&gt;Posted by Franz Och, Research Scientist&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Because we want to provide everyone with access to all the world's information, including information written in every language, one of the exciting projects at Google Research is machine translation. 
Most state-of-the-art commercial machine translation systems in use today have been developed using a rules-based approach and require a lot of work by linguists to define vocabularies and grammars.&lt;br /&gt;&lt;br /&gt;Several research systems, including ours, take a different approach: we feed the computer with billions of words of text, both monolingual text in the target language, and aligned text consisting of examples of human translations between the languages.  We then apply statistical learning techniques to build a translation model. We have achieved &lt;a href="http://www.nist.gov/speech/tests/mt/mt05eval_official_results_release_20050801_v3.html"&gt;very good results&lt;/a&gt; in research evaluations.&lt;br /&gt;&lt;br /&gt;Now you can see the results for yourself. We recently launched an online version of our system for &lt;a href="http://translate.google.com/translate_t?langpair=ar%7Cen"&gt;Arabic-English&lt;/a&gt; and &lt;a href="http://translate.google.com/translate_t?langpair=en%7Car"&gt;English-Arabic&lt;/a&gt;. Try it out! Arabic is a very challenging language to translate to and from: it requires long-distance reordering of words and has a very rich morphology. Our system works better for some types of text (e.g. news) than for others (e.g. novels) -- and you probably should not try to translate poetry ... but do stay tuned for more exciting developments.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Update:&lt;/span&gt; We've just opened a &lt;a href="http://groups.google.com/group/google-translate"&gt;discussion forum&lt;/a&gt; for all topics related to machine translation.&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/144582871" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/144582871/statistical-machine-translation-live.html" title="Statistical machine translation live" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=114614493946140129" title="1 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/114614493946140129/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/114614493946140129" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/114614493946140129" /><author><name>A Googler</name><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2006/04/statistical-machine-translation-live.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-114573258339695752</id><published>2006-04-27T09:45:00.000-07:00</published><updated>2006-04-27T09:47:19.743-07:00</updated><title type="text">Our conference on automated testing</title><content type="html">&lt;span class="byline-author"&gt;Posted by Allen Hutchison, Engineering Manager&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Automated testing is one of my passions: it has hard problems to be solved, and they get harder every day. Over the past few years, I've had the opportunity to work on several automation projects, and now I'm getting a chance to combine my  passion for automation with my love for the city of London.&lt;br /&gt;&lt;br /&gt;I'm happy to  announce that Google will be hosting a &lt;b&gt;Conference on Test Automation &lt;/b&gt;in our London office on September 7 and 8, 2006. 
Our goal is to create a collegial atmosphere where participants can discuss challenges facing people on the cutting edge of test automation, and evaluate solutions for meeting those challenges.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Call for Presentations&lt;/span&gt;&lt;br /&gt;We're looking for speakers with exciting ideas and new approaches to test automation.  If you have a subject you'd like to talk about, please send me an email at &lt;a href="mailto:londontestconf@google.com"&gt;londontestconf@google.com&lt;/a&gt; that includes a description of your 60- or 90-minute talk in 500 words or fewer. The deadline for submissions is &lt;span style="font-weight:bold;"&gt;June 1&lt;/span&gt;. &lt;br /&gt;&lt;br /&gt;We're planning to have 10 people give presentations at the conference, followed by adequate time for discussion. If you'd like to attend as a non-speaker, watch this space. Once we've got a slate of speakers, we'll post it along with details on attending.&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/144582872" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/144582872/our-conference-on-automated-testing.html" title="Our conference on automated testing" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=114573258339695752" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/114573258339695752/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/114573258339695752" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/114573258339695752" /><author><name>A Googler</name><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2006/04/our-conference-on-automated-testing.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-21224994.post-114580429560357159</id><published>2006-04-23T08:01:00.000-07:00</published><updated>2006-04-23T08:04:02.250-07:00</updated><title type="text">See you at CHI</title><content type="html">&lt;span class="byline-author"&gt;Posted by Rick Boardman, User Experience Researcher&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The raison d’etre for our user experience research team is driven by Google's keen interest in focusing on the user. So we help many product teams provide the best possible experience to everyone around the world, primarily by inviting thousands of people to take part in &lt;a href="http://en.wikipedia.org/wiki/Usability_testing"&gt;usability tests&lt;/a&gt; in our labs, and by analyzing our logs to identify problems which need fixing. 
From this we get the data we need to help our engineers make Google products as easy as possible to use for the millions of people out there who think computers are far too complicated. People like my Mum, Dad, girlfriend, Gran, and pretty much everyone I know!&lt;br /&gt;&lt;br /&gt;We’re one of several Google teams that publish research at academic and industry conferences, and this week a number of us will be attending the &lt;a href="http://www.chi2006.org/"&gt;CHI&lt;/a&gt; (Computer-Human Interaction) conference in Montreal, the world's premier gathering for CHI researchers and practitioners. Googlers from several teams will take part in eight sessions, each focusing on different aspects of human-computer interaction. (The full program is &lt;a href="http://www.chi2006.org/docs/finalprogram2006.pdf"&gt;here&lt;/a&gt; – it’s a PDF file.)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;A Large Scale Study of Wireless Search Behavior: Google Mobile Search&lt;/span&gt; – In a session on Search and Navigation: Mobiles and Audio, we'll present the first large-scale study of search behavior for mobile users, highlighting some shortcomings of wireless search interfaces.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Scaling the card sort method to over 500 items: Restructuring the Google AdWords Help Center&lt;/span&gt; – Here we adapt the popular card-sorting research methodology to large information sets where the traditional approach is impractical and discuss how we've applied this technique.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;No IM Please, We’re Testing&lt;/span&gt; – During the Usability Evaluations: Challenges and Solutions session we’ll discuss the use of instant messaging tools like Google Talk in usability tests, and the benefits of this technique for enabling live collaboration between test moderators and observers.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Add a Dash of Interface: 
Taking Mash-Ups to the Next Level&lt;/span&gt; – Here we contribute to the discussion of how extendable interfaces like Google Maps are enabling exciting new online innovation by combining data sources.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Why Do Tagging Systems Work?&lt;/span&gt; – This panel will address the design challenges of scaling tagging systems to meet their recent surge in popularity. Gmail is an example of email tagging that offers more flexibility than traditional hierarchical systems.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Design Communication: How Do You Get Your Point Across?&lt;/span&gt; – A key challenge for UI designers is communicating solutions and challenges within product teams. This panel focuses on effective ways to do that.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;“It’s About the Information, Stupid!” Why We Need a Separate Field of Human Information Interaction&lt;/span&gt; – This interdisciplinary panel will discuss arguments for and against a distinct field focusing on information rather than computing technology. One for the theoreticians? (-;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Incorporating Eyetracking into User Studies at Google &lt;/span&gt;– In this Eyetracking in Practice workshop, we’ll talk about some of the challenges we’ve encountered in studies of eyetracking in our labs.&lt;br /&gt;&lt;br /&gt;If you work in, or study, the area of human-computer interaction, the user experience team is hiring. 
Right now we’re looking for &lt;a href="http://www.google.com/support/jobs/bin/answer.py?answer=32255"&gt;user experience researchers&lt;/a&gt; (including those with specialized quantitative skills), &lt;a href="http://www.google.com/support/jobs/bin/answer.py?answer=23690"&gt;UI designers&lt;/a&gt;, and &lt;a href="http://www.google.com/support/jobs/bin/topic.py?dep_id=1091&amp;amp;loc_id=1100"&gt;more&lt;/a&gt;.&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/blogspot/gJZg/~4/144582873" height="1" width="1"/&gt;</content><link rel="alternate" type="text/html" href="http://feeds.feedburner.com/~r/blogspot/gJZg/~3/144582873/see-you-at-chi.html" title="See you at CHI" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=21224994&amp;postID=114580429560357159" title="0 Comments" /><link rel="replies" type="application/atom+xml" href="http://googleresearch.blogspot.com/feeds/114580429560357159/comments/default" title="Post Comments" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/114580429560357159" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/21224994/posts/default/114580429560357159" /><author><name>A Googler</name><email>noreply@blogger.com</email></author><feedburner:origLink>http://googleresearch.blogspot.com/2006/04/see-you-at-chi.html</feedburner:origLink></entry></feed>
