<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
    <channel>
        <title>Jean Véronis - Language technologies</title>
        <description>Language, technology, politics, society</description>
        <link>http://aixtal.blogspot.com</link>
        <lastBuildDate>Fri, 26 Jun 2009 14:12:01 +0100</lastBuildDate>
        <language>en-US</language>
        <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/aixtal-en" type="application/rss+xml" /><item>
            <title>Wikio: Over 100,000 UK blogs</title>
            <link>http://feedproxy.google.com/~r/aixtal-en/~3/vsWNAk8CEOw/wikio-over-100000-uk-blogs.html</link>
            <description>&lt;div&gt;      &lt;div class="noborder" style="text-align: right;"&gt;&lt;a href="http://aixtal.blogspot.com/2009/06/wikio-plus-de-100-000-blogs-uk.html"&gt;&lt;img src="http://www.veronis.fr/blog-images/french-version.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;I've been quiet recently. I've been working flat out on a project that has required all of my attention: increasing the number of UK blogs for Wikio UK (&lt;a href="http://www.wikio.co.uk/" target="_blank"&gt;www.wikio.co.uk&lt;/a&gt;). The UK site was the last one to appear after &lt;a href="http://www.wikio.fr/" target="_blank"&gt;wikio.fr&lt;/a&gt;, &lt;a href="http://www.wikio.it/" target="_blank"&gt;wikio.it&lt;/a&gt;, &lt;a href="http://www.wikio.es/" target="_blank"&gt;wikio.es&lt;/a&gt;, &lt;a href="http://www.wikio.de/" target="_blank"&gt;wikio.de&lt;/a&gt; and &lt;a href="http://www.wikio.com/" target="_blank"&gt;wikio.com&lt;/a&gt;, and has to some extent always suffered a little in terms of increasing the number of sites in the database. 
I thus put in place some adapted algorithms several weeks ago and I'm happy to announce that &lt;b&gt;the UK site has now passed 100,000 blogs&lt;/b&gt;.&lt;b&gt; &lt;/b&gt;Exactly 113,000 at the time of writing, and this number is set to increase further in the coming hours: there are nearly 30,000 more blogs in the pipeline.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.wikio.co.uk/"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 350px; height: 354px;" src="http://4.bp.blogspot.com/_J6yMAuMJ-Wc/SkRzl3pugxI/AAAAAAAAAUA/QhrGC05TQNk/s400/Image+7.jpg" alt="" id="BLOGGER_PHOTO_ID_5351529351654310674" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;If you go to the &lt;a href="http://www.wikio.co.uk/" target="_blank"&gt;site&lt;/a&gt; you will see "&lt;span style="font-style: italic;"&gt;Live breaking news from 156920 blogs&lt;/span&gt;", but this is simply the number of &lt;span style="font-weight: bold;"&gt; anglophone&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;blogs&lt;/span&gt;, and not only those from the UK. The same number is indeed shown on &lt;a href="http://www.wikio.com/" target="_blank"&gt;wikio.com&lt;/a&gt;. Both sites draw from the same database but do not display the same results: it's all a question of weighting. The UK site prioritises UK news and the US site prioritises US news (hence the need to geolocate sources). You will see for example the differing reactions to international events, be it the situation in Iran, or the death of Michael Jackson - all rather interesting.&lt;br /&gt;&lt;br /&gt;It is alas very complicated in practice. It is extremely difficult for our machines to determine whether a site is American or British (or Canadian or Australian etc.). Obviously if the URL ends in .&lt;a href="http://co.uk/" target="_blank"&gt;co.uk&lt;/a&gt;, there is little ambiguity. But this is in fact rarely the case. 
Most British blogs, for example, are on &lt;a href="http://blogspot.com/" target="_blank"&gt;blogspot.com&lt;/a&gt;, &lt;a href="http://wordpress.com/" target="_blank"&gt;wordpress.com&lt;/a&gt;, etc.&lt;br /&gt;&lt;br /&gt;The algorithms are rather sensitive, and as far as I'm aware, no other service goes as far as Wikio in distinguishing UK blogs from US ones. If you try Google Blog Search or Technorati, you will see for example that it is a mish-mash without any real attempt to sort by country except a (probable) bias towards .&lt;a href="http://co.uk/" target="_blank"&gt;co.uk&lt;/a&gt; domains.&lt;br /&gt;&lt;br /&gt;The difficulty comes from the fact that &lt;span style="font-weight: bold;"&gt;no one criterion suffices unto itself. &lt;/span&gt;We can, for example, check the &lt;b&gt;spelling&lt;/b&gt;. We know that in Britain they write &lt;span style="font-style: italic;"&gt;colour&lt;/span&gt; or &lt;span style="font-style: italic;"&gt;neighbour&lt;/span&gt; and not &lt;span style="font-style: italic;"&gt;color&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;neighbor&lt;/span&gt; as in America. This can be useful, but it does not in fact concern that many words, and we are not guaranteed to find them on your average blog. To further complicate matters, Canadian, Australian and other Commonwealth blogs use the British spelling style. So we can also turn to the &lt;b&gt;blogger's profile&lt;/b&gt;: if it cites "London, UK", there you have it. But there is very often no profile on the page, and when there is one it must be found and correctly parsed by the machines. Web 2.0, it appears, lacks certain standards! 
So in practice this requires a fair bit of work...&lt;br /&gt;&lt;br /&gt;We can also look at the &lt;span style="font-weight: bold;"&gt;topology of the blogosphere&lt;/span&gt; (I hope soon to be able to show you some maps of the US/UK &lt;i&gt;à la&lt;/i&gt; &lt;a href="http://labs.wikio.net/wikiopole" target="_blank"&gt;Wikiopole FR&lt;/a&gt;). UK blogs tend principally to reference UK blogs, and US blogs to reference US blogs. The web is simply a sum of communities... However, in practice it's a little trickier than that: UK blogs also reference US blogs (yet this tends not to happen in the opposite direction, which does help a little).&lt;br /&gt;&lt;br /&gt;So, in order to end up with a reliable sourcing technique, one must combine all these criteria, and let me assure you it has not been simple. But I am rather pleased with the results, both in terms of coverage and reliability. The UK site is now the second biggest in terms of the number of blogs. I hope it will be useful for you if you are interested in British culture, and wish to discover blogs from across the Channel. I would have loved that when I was learning English at school (we had only the BBC on short wave radio...). The &lt;a href="http://www.wikio.co.uk/blogs/top" target="_blank"&gt;themed rankings&lt;/a&gt; are still somewhat light, but I am currently working furiously on this with a team of Master's students whom Wikio kindly granted internships, and we are already seeing some great categories emerging. I don't know whether some (perhaps Wine &amp;amp; Beer) will see the light of day for the next ranking, but if not, it will be at the end of July.&lt;br /&gt;&lt;br /&gt;That is also a real challenge: categorising hundreds of thousands of blogs as reliably as possible. It's not simple: a nice example of &lt;span style="font-weight: bold;"&gt;intermingled semantics and topology.&lt;/span&gt; That, however, will be the subject of another post. I don't wish to wear you all out!
&lt;/div&gt;    &lt;/div&gt;
				&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/aixtal-en/~4/vsWNAk8CEOw" height="1" width="1"/&gt;</description>
            <pubDate>Fri, 26 Jun 2009 13:59:00 +0100</pubDate>
            <guid isPermaLink="false">http://aixtal.blogspot.com/2009/06/wikio-over-100000-uk-blogs.html</guid>
        <feedburner:origLink>http://aixtal.blogspot.com/2009/06/wikio-over-100000-uk-blogs.html</feedburner:origLink></item>
        <item>
            <title>Blogs: French, less sociable than you might have thought (but Italians even less so!)</title>
            <link>http://feedproxy.google.com/~r/aixtal-en/~3/oFQu3GHufSY/blogs-french-less-sociable-than-you.html</link>
            <description>&lt;div&gt;
					&lt;div class="noborder" style="text-align: right;"&gt;&lt;a href="http://aixtal.blogspot.com/2008/10/blogs-le-franais-nest-pas-liant-mais.html"&gt;&lt;img src="http://www.veronis.fr/blog-images/french-version.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;We have already discussed several times on this blog (most notably in the comments) the fact that French bloggers seem to link to one another far less often than their American counterparts. I had wanted to avoid stereotypes - the disciplined Germans, complaining French and romantic Italians etc. - and approach this with an open mind. Still, the results are fairly clear cut: each country has a &lt;span style="font-weight: bold;"&gt;different approach to Web 2.0&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;I worked out the &lt;span style="font-weight: bold;"&gt;proportion of links&lt;/span&gt; on the blogs of the various countries in the &lt;a href="http://www.wikio.com/"&gt;Wikio&lt;/a&gt; database for September 2008. The results are clear: the US is well out in front with 0.17 links per post, or one link for every 6 posts. Then comes Germany, followed by the UK and then France, which has half the proportion of links of the United States (0.08 per post or one link for every 12 posts published). Then finally come Spain and Italy. 
Clichés aside, it's funny to see the Anglo-Saxons and Germans on one side, and the Latin countries on the other, the two flanking the French who sit slap-bang in the middle.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;center&gt;&lt;table width="60%"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td bgcolor="#f6cc72"&gt;&lt;b&gt;Country&lt;/b&gt;&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#f6cc72"&gt;&lt;b&gt;Links/post&lt;br /&gt;&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#fef8e9"&gt;US&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fef8e9"&gt;0.17&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#fcedcd"&gt;DE&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fcedcd"&gt;0.12&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#fef8e9"&gt;UK&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fef8e9"&gt;0.09&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#fcedcd"&gt;FR&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fcedcd"&gt;0.08&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#fef8e9"&gt;ES&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fef8e9"&gt;0.06&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#fcedcd"&gt;IT&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fcedcd"&gt;0.05&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/center&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_J6yMAuMJ-Wc/SPoo2KLBzHI/AAAAAAAAAI8/B_ZB_4p0uNM/s1600-h/Image+3.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_J6yMAuMJ-Wc/SPoo2KLBzHI/AAAAAAAAAI8/B_ZB_4p0uNM/s400/Image+3.png" alt="" id="BLOGGER_PHOTO_ID_5258560425817132146" border="0" /&gt;&lt;/a&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;Number of links per post&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Even more interesting 
is to see the results separated by link type: to another post or to a blog's homepage.&lt;br /&gt;&lt;br /&gt;&lt;center&gt;&lt;table width="60%"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td bgcolor="#f6cc72"&gt;&lt;b&gt;Country&lt;/b&gt;&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#f6cc72"&gt;&lt;span style="font-weight: bold;"&gt;to a post&lt;/span&gt;&lt;br /&gt;&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#f6cc72"&gt;&lt;b&gt;to homepage&lt;br /&gt;&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#fef8e9"&gt;US&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fef8e9"&gt;0.14&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fef8e9"&gt;0.03&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#fcedcd"&gt;DE&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fcedcd"&gt;0.10&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fcedcd"&gt;0.02&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#fef8e9"&gt;UK&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fef8e9"&gt;0.07&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fef8e9"&gt;0.02&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#fcedcd"&gt;FR&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fcedcd"&gt;0.03&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fcedcd"&gt;0.04&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#fef8e9"&gt;ES&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fef8e9"&gt;0.03&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fef8e9"&gt;0.02&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td bgcolor="#fcedcd"&gt;IT&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fcedcd"&gt;0.03&lt;/td&gt;&lt;td style="text-align: right;" bgcolor="#fcedcd"&gt;0.02&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/center&gt;&lt;br /&gt;&lt;br /&gt;You will notice that the different rates are essentially those of &lt;span style="font-weight: bold;"&gt;links from post to post&lt;/span&gt;: links to homepages occur in pretty much the same proportions from country to 
country (with a slightly higher rate for France). French, Spanish and Italian bloggers link to other posts about four times less often than Americans do.&lt;br /&gt;&lt;br /&gt;These results lead me to two remarks:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;1.&lt;/span&gt; First off, they explain the difficulties one encounters in trying to make a &lt;a style="font-weight: bold;" href="http://en.wikipedia.org/wiki/Memetracker"&gt;memetracker&lt;/a&gt; work in several countries - one which tracks 'hot' discussions by following post-to-post links. The example of &lt;a href="http://www.techmeme.com/"&gt;Techmeme&lt;/a&gt; seems a difficult one to recreate in other cultures: for example Wikio's memetracker works better in the &lt;a href="http://www.wikio.com/blogs"&gt;US&lt;/a&gt; than in &lt;a href="http://www.wikio.fr/blogs"&gt;France&lt;/a&gt;, where the discussions are less easily aggregated. New ideas are needed!&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;2.&lt;/span&gt; Also, as I explained in my last post [&lt;a href="http://aixtal.blogspot.com/2008/10/blogs-dans-les-entrailles-du-classement_04.html"&gt;Fr&lt;/a&gt;], the &lt;a href="http://www.wikio.com/blogs/top"&gt;Wikio rankings&lt;/a&gt; do not currently take into account links to homepages. This was a way of combating "chains", but (having seen your reactions and comments) we will evidently have to rethink this one too!&lt;p class="blogger-labels"&gt;Labels: &lt;a rel='tag' href="http://aixtal.blogspot.com/search/label/Blogs"&gt;Blogs&lt;/a&gt;, &lt;a rel='tag' href="http://aixtal.blogspot.com/search/label/Wikio%20labs"&gt;Wikio labs&lt;/a&gt;&lt;/p&gt;
				&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/aixtal-en/~4/oFQu3GHufSY" height="1" width="1"/&gt;</description>
            <pubDate>Mon, 20 Oct 2008 22:15:00 +0100</pubDate>
            <guid isPermaLink="false">http://aixtal.blogspot.com/2008/10/blogs-french-less-sociable-than-you.html</guid>
        <feedburner:origLink>http://aixtal.blogspot.com/2008/10/blogs-french-less-sociable-than-you.html</feedburner:origLink></item>
        <item>
            <title>Blogs: Turbulence ahead</title>
            <link>http://feedproxy.google.com/~r/aixtal-en/~3/mAyNnDJgFjk/blogs-turbulence-ahead.html</link>
            <description>&lt;div&gt;      &lt;div class="noborder" style="text-align: right;"&gt;&lt;a href="http://aixtal.blogspot.com/2008/09/blogs-avis-de-turbulences.html"&gt;&lt;img src="http://www.veronis.fr/blog-images/french-version.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;A big clean is more commonly carried out in Spring... but there's no reason not to do so at the end of the summer! This indeed was my recent advice to Wikio regarding their famous &lt;a style="font-weight: bold;" href="http://www.wikio.com/blogs/top"&gt;Blog rankings&lt;/a&gt;. I told you recently that one of the projects which would be receiving my attention would be the rankings, in collaboration with you all. In fact, I completely reworked it, and, &lt;a href="http://aixtal.blogspot.com/2008/09/blogs-wikio-enleve-le-haut.html"&gt;as promised [fr]&lt;/a&gt;, I will provide you with the algorithm's details in the days to come. My first observation is that there was a significant amount of dust in certain nooks and crannies, which needed a little attention before we could progress and try to improve the rankings as a whole. This is not a criticism: such a ranking is an extremely technical undertaking and even the very big names have troubles with it (&lt;a href="http://www.arhv.lhivic.org/index.php/2008/09/24/817-mon-rapport-annuel-sur-l-etat-de-technorati"&gt;Technorati for example [fr]&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.wikio.fr/blogs/top"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_J6yMAuMJ-Wc/SOCgJWrR4bI/AAAAAAAAAG8/t1Vrz3D6jj0/s400/bal-rss.jpg" alt="" id="BLOGGER_PHOTO_ID_5251373248080372146" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;So, the various Wikio teams have spent September with broom in hand and the results are likely to ruffle a few feathers... 
There will surely be some grinding of teeth (there always is: not everyone can be on top), but the engine is now much cleaner. Several of you had noted that there were &lt;span style="font-weight: bold;"&gt;inactive blogs&lt;/span&gt; that had stuck around in the rankings, even though they had not published for a few weeks. Well no more - they're out. I got our developers to create several indicators, one of which flags up publication volume, that allow us to more closely follow the behaviour of the tens of thousands of sources in our database. All blogs that had not published for four months have thus been jettisoned. Other indicators were a little more difficult to implement, but now that they are in place they allow one to assess the similarity between sources and so address spammers, aggregators and multiple posting (which is sometimes legitimate, but such activity can seriously affect the analysis of backlinks, and thus the rankings as they are based solely on this criterion). So out also with &lt;span style="font-weight: bold;"&gt;aggregators and other duplicates&lt;/span&gt; (a lot of the recent work was precisely this, dealing with the enormous amount of source duplication, which is a delicate and extensive process).&lt;br /&gt;&lt;br /&gt;I also implemented a small change, which has no bearing on the overall principle, but improves the &lt;span style="font-weight: bold;"&gt;transition from one month to another&lt;/span&gt;. Many of you had seen that there was sometimes a yo-yo effect, whereby blogs suddenly lose a large number of positions, or the opposite, they shoot up the rankings like a rocket. This was largely due to the time period used when analysing backlinks. This period, as you will know, is four months: if a blog is very heavily buzzed about in April, it will then appear high up in the rankings from May to August and then (if it is not further talked about in the meantime) suddenly plummet in September. Not ideal, clearly. 
I thus replaced the straight four-month calculation with a progressive attenuation over nine months. So September's links have a value of 1, August's a value of 1 – 1/9, July's 1 – 2/9 etc. etc. The variations are now a lot more temperate.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_J6yMAuMJ-Wc/SOCbqu-kemI/AAAAAAAAAGs/_mnBsUDGhpU/s1600-h/retrolien1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_J6yMAuMJ-Wc/SOCbqu-kemI/AAAAAAAAAGs/_mnBsUDGhpU/s320/retrolien1.png" alt="" id="BLOGGER_PHOTO_ID_5251368323981277794" border="0" /&gt;&lt;/a&gt;&lt;span style="font-size:85%;"&gt;Before&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_J6yMAuMJ-Wc/SOCb0kZ8h9I/AAAAAAAAAG0/SAt38spwcaI/s1600-h/retrolien2.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_J6yMAuMJ-Wc/SOCb0kZ8h9I/AAAAAAAAAG0/SAt38spwcaI/s320/retrolien2.png" alt="" id="BLOGGER_PHOTO_ID_5251368492942002130" border="0" /&gt;&lt;/a&gt;&lt;span style="font-size:85%;"&gt;Now&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Obviously this month there will still be a lot of change in the rankings as many things have been adjusted. The good news is that the clearing out of moribund or spammer blogs has cleared a number of places, and there are thus more blogs on their way up than on their way down. I don't yet wish to reveal the rankings as verifications are still being carried out, but there are some &lt;span style="font-weight: bold;"&gt;noteworthy and indeed worthy leaps&lt;/span&gt;. A few falls as well but that is to be expected. 
The summer entailed a drop in activity for many blogs but that is true everywhere (you will have likely seen the report on Technorati). It is of course up for analysis, but we hope at least to have provided an improved and cleaner ranking.&lt;p class="blogger-labels"&gt;Labels: &lt;a rel='tag' href="http://aixtal.blogspot.com/search/label/Blogs"&gt;Blogs&lt;/a&gt;, &lt;a rel='tag' href="http://aixtal.blogspot.com/search/label/Wikio%20labs"&gt;Wikio labs&lt;/a&gt;&lt;/p&gt;
				&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/aixtal-en/~4/mAyNnDJgFjk" height="1" width="1"/&gt;</description>
            <pubDate>Mon, 29 Sep 2008 14:43:00 +0100</pubDate>
            <guid isPermaLink="false">http://aixtal.blogspot.com/2008/09/blogs-turbulence-ahead.html</guid>
        <feedburner:origLink>http://aixtal.blogspot.com/2008/09/blogs-turbulence-ahead.html</feedburner:origLink></item>
        <item>
            <title>Google: Please find attached...</title>
            <link>http://feedproxy.google.com/~r/aixtal-en/~3/wRKu2_7jMio/google-please-find-attached.html</link>
            <description>&lt;div&gt;      &lt;div class="noborder" style="text-align: right;"&gt;&lt;a href="http://aixtal.blogspot.com/2008/09/google-veuillez-trouver-ci-joint.html"&gt;&lt;img src="http://www.veronis.fr/blog-images/french-version.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;I’ve dreamt about it (and I’m sure you have too), Google have done it (in part at least)... How many times have you sent a message and later realized that you have forgotten to send the &lt;span style="font-weight: bold;"&gt;attachment&lt;/span&gt;? Embarrassment guaranteed. It has nearly come to be a standing joke with me to say that the automatic detection of missing attachments will be one of the best selling natural language processing programs in the world. A few years ago I even had discussions with students in my seminars on the various ways of developing such a function.&lt;br /&gt;&lt;br /&gt;Well, believe it or not &lt;a href="http://gmailblog.blogspot.com/2008/09/new-in-labs-handy-intern-tweaks.html"&gt;Google has announced&lt;/a&gt; that it has developed this function as part of GMail, under the mildly sexy name of "&lt;span style="font-weight: bold;"&gt;Forgotten attachment detector&lt;/span&gt;".&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://gmailblog.blogspot.com/2008/09/new-in-labs-handy-intern-tweaks.html"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_J6yMAuMJ-Wc/SNUbMudxKbI/AAAAAAAAAFk/5vnV-dVEd3c/s400/gmail-logo.jpg" alt="" id="BLOGGER_PHOTO_ID_5248130846215449010" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;It must seem slightly magical to some of you, almost the stuff of science-fiction (could Google now be able to guess, or even anticipate our thoughts? It’s enough to make you shiver...). 
I am the first to denounce false announcements, which do more harm than good in the field of language technologies (there have been a slew of them over the last half century or more, on automatic translation, man-machine dialogue, and others). We know the problem with these technologies, and the greatest modesty still reigns. As I say in my first lesson, in fifty years we have managed to decode the human genome, but not the language... In this particular case however, I do believe it’s perfectly feasible.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_J6yMAuMJ-Wc/SNVZfegBVnI/AAAAAAAAAFs/YkqPUnrv-sY/s1600-h/Image+1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_J6yMAuMJ-Wc/SNVZfegBVnI/AAAAAAAAAFs/YkqPUnrv-sY/s400/Image+1.png" alt="" id="BLOGGER_PHOTO_ID_5248199338068366962" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;How on earth has Google managed to do it? Honestly I have no idea, but I can tell you how I would have done it (and it seems to me to be just about the only way). The wrong way, in my opinion, is to scratch your head and try to find expressions to detect in the body of mails: "&lt;span style="font-style: italic;"&gt;please find attached&lt;/span&gt;", etc. 
Even if you hire the best linguists in the world, the majority of such expressions will still more than likely be missed.&lt;br /&gt;&lt;br /&gt;So here’s my recipe:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Take a very large mail base: millions of messages, billions if possible (Google easily has that).&lt;/li&gt;&lt;li&gt;Split them into two piles: mails with attachments, mails without attachments.&lt;/li&gt;&lt;li&gt;Extract from each of the piles the dictionary of words that come up, or even better the &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;-grams, that is, sequences of &lt;span style="font-style: italic;"&gt;n&lt;/span&gt; words that come up.&lt;/li&gt;&lt;li&gt;With the use of statistical tools, extract the &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;-grams which appear frequently in mails with attachments and not in mails without attachments.&lt;/li&gt;&lt;li&gt;For each new mail, check to see if one of these magical &lt;span style="font-style: italic;"&gt;n&lt;/span&gt;-grams is present in the text while no file is attached, and if so trigger an alarm.&lt;/li&gt;&lt;/ul&gt;I’ve just done a little rough test with my own mails and I can see word sequences appearing like: &lt;span style="font-style: italic;"&gt;"hereafter", "attached file(s)", "attachment(s)", "I’m sending you", "I’m forwarding to you", "here is the report", "here is the file", "here is the/a document", "here is the estimate", "please find", &lt;/span&gt;etc.&lt;br /&gt;&lt;br /&gt;Of course, a program like this will generate a little noise (false alerts) and silence (missed attachments), but if 95% of cases can be detected, it’s a more than useful function.&lt;br /&gt;&lt;br /&gt;My estimate:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Building a prototype: one day.&lt;/li&gt;&lt;li&gt;Developing and testing an operational version: one month.&lt;/li&gt;&lt;/ul&gt;Maybe I should offer my services to Google, since if I am to believe the mini-test featured on &lt;a href="http://pulse2.com/2008/09/16/testing-gmails-new-forgotten-attachment-detector/"&gt;Pulse 2.0&lt;/a&gt;, it's not very good. 
The detector recognizes "I have attached", but not "Attach a document" or "Here is the attachment"... I tested this myself, with phrases like "Attached please find a copy of...", without much more success. Rather strange all the same.&lt;br /&gt;&lt;br /&gt;It remains to be seen (once these few details have been resolved...) if Google will offer a French version. I’ve already mentioned in the past the amount of time Google takes to localize its products. Sometimes a few years. Watch this space.&lt;p class="blogger-labels"&gt;Labels: &lt;a rel='tag' href="http://aixtal.blogspot.com/search/label/Google"&gt;Google&lt;/a&gt;&lt;/p&gt;
				&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/aixtal-en/~4/wRKu2_7jMio" height="1" width="1"/&gt;</description>
            <pubDate>Wed, 24 Sep 2008 09:44:00 +0100</pubDate>
            <guid isPermaLink="false">http://aixtal.blogspot.com/2008/09/google-please-find-attached.html</guid>
        <feedburner:origLink>http://aixtal.blogspot.com/2008/09/google-please-find-attached.html</feedburner:origLink></item>
        <item>
            <title>Tool: Who's creating today's buzz?</title>
            <link>http://feedproxy.google.com/~r/aixtal-en/~3/jGV31YNGVQ0/tool-whos-creating-todays-buzz.html</link>
            <description>&lt;div&gt;
					&lt;div class="noborder" style="text-align: right;"&gt;&lt;a href="http://aixtal.blogspot.com/2008/04/outil-qui-buzze-aujourdhui.html"&gt;&lt;img src="http://www.veronis.fr/blog-images/french-version.png"&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;I have &lt;a href="http://aixtal.blogspot.com/2008/03/wikio-intelligent-news-portal.html"&gt;already mentioned&lt;/a&gt; the admiration I have for &lt;a href="http://www.wikio.com"&gt;Wikio&lt;/a&gt;'s linguistic technology, which I find to be one of the most well-developed amongst the many tools, search engines and portals currently available on the web. One very interesting function is the automatic detection of "named entities", those being names of people, places and companies. You might have already noticed that the engine displays, in the summary of each news story, a selection of different links to the various entities that it has recognised, allowing one to launch further, related searches with a single click. An inveterate  &lt;a href="http://en.wikipedia.org/wiki/Tinkerer"&gt;tinkerer&lt;/a&gt;, I had a little fun with this and aggregated a few statistics to give you the daily buzz:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;" border="0"&gt;&lt;iframe src="http://labs.wikio.net/buzzclouds/widget.php?n=50&amp;size=3&amp;country=us" marginwidth="0" marginheight="0" frameborder="0" height="400" scrolling="no" width="400"&gt;&lt;/iframe&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Interesting, no? Do bookmark this page; it's automatically updated every hour according to the latest news.&lt;br /&gt;&lt;br /&gt;And if you are interested in other languages, &lt;a href="http://labs.wikio.net/buzzclouds"&gt;drop by this page&lt;/a&gt;, where you will find the daily buzz for the news in German, French, Spanish and Italian. You will be surprised at just how much tastes vary from country to country!
				&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/aixtal-en/~4/jGV31YNGVQ0" height="1" width="1"/&gt;</description>
            <pubDate>Mon, 28 Apr 2008 11:35:00 +0100</pubDate>
            <guid isPermaLink="false">http://aixtal.blogspot.com/2008/04/tool-whos-creating-todays-buzz.html</guid>
        <feedburner:origLink>http://aixtal.blogspot.com/2008/04/tool-whos-creating-todays-buzz.html</feedburner:origLink></item>
        <item>
            <title>Wikio: Intelligent news portal</title>
            <link>http://feedproxy.google.com/~r/aixtal-en/~3/t3b2aLHq06I/wikio-intelligent-news-portal.html</link>
            <description>&lt;div&gt;
					&lt;div class="noborder" style="text-align: right;"&gt;&lt;a href="http://aixtal.blogspot.com/2008/02/wikio-portail-dactualits-intelligent.html"&gt;&lt;img src="http://www.veronis.fr/blog-images/french-version.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Quite a while ago now, I promised to talk to you about the intelligent news portal &lt;a href="http://www.wikio.fr/"&gt;&lt;span style="font-weight: bold;"&gt;Wikio [fr]&lt;/span&gt;&lt;/a&gt;. I first came across this site with an absent-minded glance, as many of you no doubt did, and stupidly saw it as just another aggregator, albeit one with Digg-style vote buttons, but nothing to write home about. Tragic error. Wikio is undoubtedly the service that harbors the most advanced &lt;span style="font-weight: bold;"&gt;linguistic technology&lt;/span&gt; on the Web at present (and you’ve noticed that that’s the theme of this blog... it just had to interest me!).&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://www.wikio.com/"&gt;&lt;img src="http://www.veronis.fr/blog-images/logo-mywikio.gif" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;I’ll no doubt come back to it in other postings, but I just wanted to give you an example. Wikio doesn’t just aggregate news and postings ad hoc. When you go to its main competitor, &lt;a href="http://news.google.com/"&gt;Google News&lt;/a&gt;, the home page offers you today’s headlines grouped into major &lt;span style="font-weight: bold;"&gt;categories&lt;/span&gt; (Sports, International, France, Economy, etc.). That’s basically where the intelligence of the service ends. It’s true that when you enter a keyword, the articles are presented to you in groups, but this grouping is of poor quality. Enter “&lt;a href="http://news.google.com/news?q=yahoo"&gt;Yahoo&lt;/a&gt;”, for example, and you will see that the groups are quite unreadable. 
Many news items are not grouped at all and the existing groups overlap each other: the Microsoft affair is spread over a variety of groups, etc. (the page will certainly have changed by the time you enter the query, but you get the idea). When it came online in 2002, however, I praised this service. &lt;span style="font-weight: bold;"&gt;Document clustering&lt;/span&gt; (and thus news clustering) is an extremely difficult problem, as you can imagine, and the system seemed very promising. Alas, as with many Google products, it hardly evolved after its initial launch, although it officially left beta in 2006. Google concentrated more on the number of sources (4,500 for English, so we’re told) than on their quality, or that of the algorithms… Increasing the number of sources (easy to do automatically) quite logically degrades the quality of the clustering.&lt;br /&gt;&lt;br /&gt;As for &lt;span style="font-weight: bold;"&gt;Wikio&lt;/span&gt;, it’s not perfect (the service is clearly labelled as a beta version), but the underlying technology is infinitely more promising. Articles (from media or blogs) are not merely grouped into high-level categories (Sports, etc.) but into a veritable “knowledge tree” which currently includes over &lt;a href="http://www.wikio.com/categories"&gt;30,000 categories&lt;/a&gt; (at least on the &lt;a href="http://www.wikio.fr/"&gt;French site&lt;/a&gt;; Wikio.com is more recent and might be a little behind):&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://www.wikio.com/categories"&gt;&lt;img src="http://www.veronis.fr/blog-images/wikio-categ.jpg" /&gt;&lt;br /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;If you count, you will see that there aren’t quite 30,000 categories (even on the French site). 
I asked Wikio about this: it's normal, since the list changes constantly and only categories which have had recent news appear.&lt;br /&gt;&lt;br /&gt;To my knowledge, the categories are not visible anywhere in tree form, but one can guess the organization from the form of the URLs. Take the “&lt;a style="font-weight: bold;" href="http://www.wikio.com/health/disability/deafness"&gt;deafness&lt;/a&gt;” category for example. When you enter this keyword into the engine, it takes you to a page containing news on the topic, with a URL that reveals the following hierarchy:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://www.wikio.com/health/disability/deafness"&gt;&lt;span style="font-weight: bold;"&gt;http://www.wikio.com/health/disability/deafness&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;The &lt;a href="http://www.wikio.com/health"&gt;Health&lt;/a&gt; theme contains numerous sub-themes, including &lt;a href="http://www.wikio.com/health/disability"&gt;Disability&lt;/a&gt;, which in turn contains &lt;a href="http://www.wikio.com/health/disability/deafness"&gt;Deafness&lt;/a&gt;. 
This hierarchy is also clearly given by navigation links in the top left-hand corner of the page:&lt;br /&gt;&lt;br /&gt;&lt;h4 style="text-align: center;" class="trail" id="trail"&gt;&lt;span class="g"&gt;&lt;a href="http://www.wikio.com/"&gt;News&lt;/a&gt; &gt; &lt;a href="http://www.wikio.com/health"&gt;Health&lt;/a&gt; &gt; &lt;a href="http://www.wikio.com/health/disability"&gt;Disability&lt;/a&gt; &gt; Deafness&lt;/span&gt;&lt;/h4&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img src="http://www.veronis.fr/blog-images/wikio-deafness-1.jpg" /&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;The &lt;a href="http://www.wikio.com/health/disability/deafness"&gt;Deafness&lt;/a&gt; theme in turn contains other sub-themes: &lt;a href="http://www.wikio.com/health/disability/deafness/cochlear_implants"&gt;Cochlear implants&lt;/a&gt;, &lt;a href="http://www.wikio.com/health/disability/deafness/sign_language"&gt;Sign language&lt;/a&gt;, &lt;a href="http://www.wikio.com/health/disability/deafness/lip_reading"&gt;Lip reading&lt;/a&gt;, &lt;a href="http://www.wikio.com/health/disability/deafness/cued_speech"&gt;Cued speech&lt;/a&gt; and others. Navigating to the sub-categories is less easy, though, which is a shame (a bunch of tags can indeed be found on the right of the screen, but they are often complex and don't show only child categories). One could imagine other, more practical solutions (for example, a small drop-down menu under the word Deafness in the navigation links at the top of the page).&lt;br /&gt;&lt;br /&gt;Don’t think that this is merely an alert on the keyword &lt;span style="font-style: italic;"&gt;deafness&lt;/span&gt;, as is the case with Google. The page also offers articles which don’t contain this word but do contain related ones: &lt;span style="font-style: italic;"&gt;deaf, hearing, hearing loss,&lt;/span&gt; etc. 
And, above all, Wikio doesn’t let itself be hoodwinked too easily by articles (and there are plenty in its database, I’ve just checked) which talk about the deafness of power, politicians turning a deaf ear and so on.&lt;br /&gt;&lt;br /&gt;Wikio offers a fantastic reservoir of &lt;span style="font-weight: bold;"&gt;structured information&lt;/span&gt; that is, to my knowledge, unrivalled. The beauty of the thing is that everyone can create their own news pages, either by subscribing directly to a category’s RSS feed (for example &lt;a href="http://rss.wikio.com/health/disability/deafness.rss"&gt;here for deafness&lt;/a&gt;), or by combining categories with each other to create one’s own tabs – which in turn can be used through a dedicated RSS feed!&lt;br /&gt;&lt;br /&gt;Absolutely fascinating. The possibilities of such a system are mind-boggling... Of course, there is some tweaking to be done here and there, as you may imagine. This is the very forefront of language technologies (and, believe me, extremely difficult). And there are some perverse cases. For example, &lt;a href="http://aixtal.blogspot.com/2007/12/seo-google-and-its-image.html"&gt;one of my postings&lt;/a&gt;, on Google and search engine optimization, ended up in the Cosmetics category because I quoted the expression &lt;span style="font-style: italic;"&gt;nail varnish&lt;/span&gt;. But, honestly, only &lt;a href="http://aix-echos.blogspot.com/2008/02/lib-1968-anne-informatique.html"&gt;HAL's grandson [fr]&lt;/a&gt; would be able to resolve that one, and in 3001 no doubt.&lt;br /&gt;&lt;br /&gt;I'll be brief... I know that we live in a channel-hopping civilization and that most of you have already switched to other channels. So I’ll come back to this, and go into greater detail about what I've been able to understand of the surprising technology behind all this. 
Meanwhile, I’m eagerly awaiting the new version, for which Wikio is apparently about to start &lt;a href="http://www.kelblog.com/2008/02/wikio-europa-ar.html"&gt;“teasing” [fr]&lt;/a&gt; ;-) So watch this space!&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;p class="ps"&gt;PS&lt;/p&gt;&lt;br /&gt;&lt;a href="http://blog.wikio.fr/2008/02/wikio-europa-da.html"&gt;It's confirmed! [fr]&lt;/a&gt; A new version is in the starting blocks.
				&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/aixtal-en/~4/t3b2aLHq06I" height="1" width="1"/&gt;</description>
            <pubDate>Sat, 01 Mar 2008 08:44:00 +0100</pubDate>
            <guid isPermaLink="false">http://aixtal.blogspot.com/2008/03/wikio-intelligent-news-portal.html</guid>
        <feedburner:origLink>http://aixtal.blogspot.com/2008/03/wikio-intelligent-news-portal.html</feedburner:origLink></item>
        <item>
            <title>Yahoo: Multilingual suggestions</title>
            <link>http://feedproxy.google.com/~r/aixtal-en/~3/itd2qgccEQs/yahoo-multilingual-suggestions.html</link>
            <description>&lt;div&gt;
					&lt;div class="noborder" style="text-align: right;"&gt;&lt;a href="http://aixtal.blogspot.com/2008/02/yahoo-suggestions-multilingues.html"&gt;&lt;img src="http://www.veronis.fr/blog-images/french-version.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;A lot has been said about &lt;span style="font-weight: bold;"&gt;Yahoo!&lt;/span&gt; &lt;a href="http://www.iht.com/articles/2008/02/06/technology/google.php"&gt;lately&lt;/a&gt;. The company is clearly suffering from image problems, and a certain lack of coherence… But, as I’ve already had the opportunity to point out (for example &lt;a href="http://aixtal.blogspot.com/2005/03/web-yahoo-indexes-more-pages-than.html"&gt;here&lt;/a&gt;, or &lt;a href="http://aixtal.blogspot.com/2006/03/moteurs-yahoo-google.html"&gt;here&lt;/a&gt;…), its technology is not bad at all, and in a number of cases it is even superior to Google’s.&lt;br /&gt;&lt;br /&gt;This is the case, for example, with search suggestions. It’s true that Google announced this function in late 2004, three years earlier than Yahoo! [Nostalgia trip: it was one of the &lt;a href="http://aixtal.blogspot.com/2004/12/web-google-devine-la-suite.html"&gt;first postings [fr]&lt;/a&gt; I wrote on this blog...] We’re now familiar with Ajax and relatively blasé about Web 2.0 (which already sometimes seems dated), but remember: at the time, this type of interactive technology (based on JavaScript and XMLHttpRequest, see &lt;a href="http://serversideguy.blogspot.com/2004/12/google-suggest-dissected.html"&gt;reverse-engineering here&lt;/a&gt;) was a small revolution…&lt;br /&gt;&lt;br /&gt;The problem with &lt;span style="font-weight: bold;"&gt;Google Suggest&lt;/span&gt;, as with many other Google technologies, is that it has hardly evolved since it was launched. 
It’s true that it has been integrated into the Toolbar and the &lt;a href="http://www.google.com/webhp?complete=1&amp;amp;hl=en"&gt;service home page &lt;/a&gt;no longer says “beta”,  but I can hardly see any changes compared with 2004. Especially as, unless I’m mistaken, Google Suggest still doesn't distinguish between languages, which is really awkward for us French speakers (and a few other internet users throughout the world).&lt;br /&gt;&lt;br /&gt;So, when I type “ala” in my search box, I get suggestions concerning &lt;span style="font-style: italic;"&gt;Alaska airlines, Alamo car rental&lt;/span&gt; or&lt;span style="font-style: italic;"&gt; alarm clocks,&lt;/span&gt; which are not exactly subjects of interest for your average Frog.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img src="http://www.veronis.fr/blog-images/google-suggest-ala.png" /&gt;&lt;/div&gt;&lt;br /&gt;Yahoo! took three years longer (&lt;span style="font-style: italic;"&gt;that&lt;/span&gt; just might be their problem...) but the version which came out in December (after coming out in October in the States) is "localized":&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img src="http://www.veronis.fr/blog-images/yahoo-suggest-ala.jpg" /&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;It is even better designed than Google Suggest, as it knows how to search in individual words inside complex requests:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img src="http://www.veronis.fr/blog-images/yahoo-suggest-alb.png" /&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Compared with Google Suggest, which settles for searching in the first word:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img src="http://www.veronis.fr/blog-images/google-suggest-alb.png" /&gt;&lt;/div&gt;&lt;br /&gt;Other interesting functions from a linguistics point of view have also appeared recently with Yahoo! 
I’m a little short of time these days (I’m sure you've noticed ;-) but I'll try to come back to this... Anyway, once again I find it hard to understand the disdain internet users have for Yahoo! (especially in France). Questions of image, marketing, buzz... That’s the way the world goes round, and the Web with it!&lt;p class="blogger-labels"&gt;Labels: &lt;a rel='tag' href="http://aixtal.blogspot.com/search/label/Yahoo"&gt;Yahoo&lt;/a&gt;&lt;/p&gt;
				&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/aixtal-en/~4/itd2qgccEQs" height="1" width="1"/&gt;</description>
            <pubDate>Fri, 08 Feb 2008 12:40:00 +0100</pubDate>
            <guid isPermaLink="false">http://aixtal.blogspot.com/2008/02/yahoo-multilingual-suggestions.html</guid>
        <feedburner:origLink>http://aixtal.blogspot.com/2008/02/yahoo-multilingual-suggestions.html</feedburner:origLink></item>
        <item>
            <title>Tool: Analyzing the buzz with CaptuCourbes</title>
            <link>http://feedproxy.google.com/~r/aixtal-en/~3/pAkF-48CS6w/tool-analyzing-buzz-with-captucourbes.html</link>
            <description>&lt;div&gt;
					&lt;div class="noborder" style="text-align: right;"&gt;&lt;a href="http://aixtal.blogspot.com/2008/01/outil-analysez-le-buzz-avec-le.html"&gt;&lt;img src="http://www.veronis.fr/blog-images/french-version.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;You know I love tools (&lt;a href="http://affordance.typepad.com/mon_weblog/2006/01/utilomane.html"&gt;some [fr]&lt;/a&gt; might even say I'm a tool junkie). I've come up with a few myself, as you may remember, but I like it even better when they are developed by others – especially when it's precisely an idea I've been toying with for months without finding the time to implement it.&lt;br /&gt;&lt;br /&gt;You have probably noticed that numerous sites offer buzzometers, providing graphs: &lt;a href="http://www.technorati.com/"&gt;Technorati&lt;/a&gt;, &lt;a href="http://www.blogpulse.com/"&gt;BlogPulse&lt;/a&gt;, &lt;a href="http://www.blogscope.net/"&gt;BlogScope&lt;/a&gt;, &lt;a href="http://www.trendio.com/"&gt;Trendio&lt;/a&gt;, soon &lt;a href="http://blog.wikio.fr/2007/12/bonne-et-heureu.html"&gt;Wikio Buzz [fr]&lt;/a&gt;, etc. There's just one thing though... These services usually dish up beautiful graphics, but not the data themselves (they're not that daft), and of course the graphs cannot be compared directly (different scales, different time frames, etc.). Maybe you did what I did: manually superimposed the graphs by playing with transparency in Photoshop. 
But I wondered whether the images couldn't be analyzed automatically...&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Fait accompli.&lt;/span&gt; &lt;a href="http://philippe.gambette.free.fr/"&gt;Philippe Gambette [fr]&lt;/a&gt;, another surprising tool junkie (and an admirer of my blog – I promise I paid nothing for the title of &lt;a href="http://gambette.blogspot.com/2008/01/comparer-les-courbes-de-buzz-avec-le.html"&gt;his [fr]&lt;/a&gt;!) has built just such a tool. Simple, effective, open and free.&lt;br /&gt;&lt;br /&gt;With, as a bonus, the comparative analysis of the &lt;a href="http://aixtal.blogspot.com/2007/12/actu-laure-manaudou-nue-dans-un-arbre.html"&gt;"Manaudou naked" [fr]&lt;/a&gt; mega-buzz:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://gambette.blogspot.com/2008/01/comparer-les-courbes-de-buzz-avec-le.html"&gt;&lt;img src="http://www.veronis.fr/blog-images/CaptucourbesBruniManaudou.jpg" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;div style="text-align: center;"&gt;&lt;span style="font-size:85%;"&gt;&lt;a href="http://gambette.blogspot.com/2008/01/comparer-les-courbes-de-buzz-avec-le.html"&gt;Click to enlarge and view on Philippe's blog&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;To be seen (and buzzed!) urgently! It's &lt;a href="http://gambette.blogspot.com/2008/01/comparer-les-courbes-de-buzz-avec-le.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;p class="ps"&gt;New&lt;/p&gt;&lt;ul class="noborder"&gt;&lt;li&gt;&lt;a style="text-decoration: none;" href="http://feeds.feedburner.com/aixtal-en"&gt;&lt;img src="http://sites.univ-provence.fr/veronis/blog-images/english.gif" /&gt; &lt;span style="font-weight: bold;"&gt;Subscribe to the English-only RSS feed!&lt;/span&gt;&lt;/a&gt; &lt;/li&gt;&lt;/ul&gt;
				&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/aixtal-en/~4/pAkF-48CS6w" height="1" width="1"/&gt;</description>
            <pubDate>Wed, 09 Jan 2008 11:38:00 +0100</pubDate>
            <guid isPermaLink="false">http://aixtal.blogspot.com/2008/01/tool-analyzing-buzz-with-captucourbes.html</guid>
        <feedburner:origLink>http://aixtal.blogspot.com/2008/01/tool-analyzing-buzz-with-captucourbes.html</feedburner:origLink></item>
        <item>
            <title>Aixtal: RSS streaming per language</title>
            <link>http://feedproxy.google.com/~r/aixtal-en/~3/M2MSJ2c_-MA/aixtal-rss-streaming-per-language.html</link>
            <description>&lt;div&gt;
					&lt;div class="noborder" style="text-align: right;"&gt;&lt;a href="http://aixtal.blogspot.com/2007/12/aixtal-flux-rss-par-langue.html"&gt;&lt;img src="http://www.veronis.fr/blog-images/french-version.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;You have no doubt noticed that I have started posting in English again lately. I don’t know if I’ll carry on as it takes more time…&lt;br /&gt;&lt;br /&gt;Anyway, this time I thought it might cause problems for some tools like &lt;a href="http://www.wikio.fr/high-tech/internet/blogosphere/blogueurs/jean_veronis"&gt;Wikio&lt;/a&gt; or &lt;a href="http://www.rollsense.com/"&gt;RollSense&lt;/a&gt;, which apparently don't detect the language (not yet, at least). This means my postings are wrongly indexed…&lt;br /&gt;&lt;br /&gt;So I programmed a little system enabling this blog to have two streams:&lt;br /&gt;&lt;ul class="noborder"&gt;&lt;li&gt;&lt;a href="http://feeds.feedburner.com/aixtal-fr"&gt;&lt;img src="http://sites.univ-provence.fr/veronis/blog-images/french.gif" /&gt;&lt;/a&gt; &lt;span style="font-weight: bold;"&gt;French&lt;/span&gt;: &lt;a href="http://feeds.feedburner.com/aixtal-fr"&gt;http://feeds.feedburner.com/aixtal-fr&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://feeds.feedburner.com/aixtal-en"&gt;&lt;img src="http://sites.univ-provence.fr/veronis/blog-images/english.gif" /&gt;&lt;/a&gt; &lt;span style="font-weight: bold;"&gt;English&lt;/span&gt;: &lt;a href="http://feeds.feedburner.com/aixtal-en"&gt;http://feeds.feedburner.com/aixtal-en&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;This might be more convenient for those of you who only want to have one or other of the versions. 
If you want to have both French and English, the following stream provides both:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Bilingual&lt;/span&gt;: &lt;a href="http://feeds.feedburner.com/aixtal"&gt;http://feeds.feedburner.com/aixtal&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;It’s a shame that the blog platforms don't offer this function. It's so easy though...
				&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/aixtal-en/~4/M2MSJ2c_-MA" height="1" width="1"/&gt;</description>
            <pubDate>Thu, 13 Dec 2007 16:09:00 +0100</pubDate>
            <guid isPermaLink="false">http://aixtal.blogspot.com/2007/12/aixtal-rss-streaming-per-language.html</guid>
        <feedburner:origLink>http://aixtal.blogspot.com/2007/12/aixtal-rss-streaming-per-language.html</feedburner:origLink></item>
        <item>
            <title>Google: When California awakes...</title>
            <link>http://feedproxy.google.com/~r/aixtal-en/~3/3vf7N7p9Zlo/google-when-california-awakes.html</link>
            <description>&lt;div&gt;
					&lt;div class="noborder" style="text-align: right;"&gt;&lt;a href="http://aixtal.blogspot.com/2007/12/google-quand-la-californie-seveille.html"&gt;&lt;img src="http://www.veronis.fr/blog-images/french-version.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;After publishing a posting on Google, I like looking at my statistics just as California is waking up.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img src="http://www.veronis.fr/blog-images/google-s-eveille.jpg" /&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Hi guys, greetings from Aix!&lt;p class="blogger-labels"&gt;Labels: &lt;a rel='tag' href="http://aixtal.blogspot.com/search/label/Google"&gt;Google&lt;/a&gt;&lt;/p&gt;
				&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/aixtal-en/~4/3vf7N7p9Zlo" height="1" width="1"/&gt;</description>
            <pubDate>Wed, 12 Dec 2007 22:07:00 +0100</pubDate>
            <guid isPermaLink="false">http://aixtal.blogspot.com/2007/12/google-when-california-awakes.html</guid>
        <feedburner:origLink>http://aixtal.blogspot.com/2007/12/google-when-california-awakes.html</feedburner:origLink></item>
    </channel>
</rss>
