<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><!-- name="generator" content="SnipSnap/1.0b3-uttoxeter" --><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:blogChannel="http://backend.userland.com/blogChannelModule" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

  <channel>
    <title>thinkberg</title>
    
    <link>http://thinkberg.com/space/start</link>
    <description />
    <dc:title>start</dc:title>
<dc:language>en</dc:language>
<dc:type>Text</dc:type>
<dc:date>2007-09-06T00:23:40+01:00</dc:date>
<dc:identifier>http://thinkberg.com/space/start</dc:identifier>
<dc:creator>arte</dc:creator>

    <!-- <blogChannel:changes>http://www.weblogs.com/rssUpdates/changes.xml</blogChannel:changes> -->
    <admin:generatorAgent rdf:resource="http://www.snipsnap.org/space/version-1.0b3-uttoxeter" />
    
       <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/thinkberg" /><feedburner:info uri="thinkberg" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
        <title>TWIMPACT NIPS preparations</title>
        <link>http://feedproxy.google.com/~r/thinkberg/~3/lDkwLgdYEuw/1</link>
        <description>Finally, as Mikio wrote in our dev blog, the analysis is running smoothly. Next we are going to put the parts of the analysis that do not have to run serially on the cluster to roll the whole of 2011 into our snapshots. It will be fun to see where the hotspots are. As a side effect, we are comparing our own little language detector to the Chrome language detector. What Mikio doesn't know: I am already planning on using more memory for even more fun stuff to analyze.</description>
        <guid isPermaLink="false">http://thinkberg.com/space/start/2011-11-29/1#TWIMPACT_NIPS_preparations</guid>
        <content:encoded><![CDATA[Finally, <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://twimpact.tumblr.com/">as Mikio wrote in our dev blog</a></span>, the analysis is running smoothly. Next we are going to put the parts of the analysis that do not have to run serially on the cluster to roll the whole of 2011 into our snapshots. It will be fun to see where the hotspots are. As a side effect, we are comparing our own little language detector to the Chrome language detector.<p class="paragraph"/>What Mikio doesn't know: I am already planning on using more memory for even more fun stuff to analyze.<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/thinkberg?a=lDkwLgdYEuw:-Ssy68K8gB4:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/thinkberg?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/thinkberg/~4/lDkwLgdYEuw" height="1" width="1"/>]]></content:encoded>
        <dc:title>TWIMPACT NIPS preparations</dc:title>
<dc:language>en</dc:language>
<dc:type>Text</dc:type>
<dc:date>2011-11-29T19:31:51+01:00</dc:date>
<dc:identifier>http://thinkberg.com/space/start/2011-11-29/1#TWIMPACT_NIPS_preparations</dc:identifier>
<dc:creator>arte</dc:creator>

        <comments>http://thinkberg.com/comments/start/2011-11-29/1#post</comments>
      <feedburner:origLink>http://thinkberg.com/space/start/2011-11-29/1#TWIMPACT_NIPS_preparations</feedburner:origLink></item>
    
       <item>
        <title>Why the Bundestrojaner is so bad ...</title>
        <link>http://feedproxy.google.com/~r/thinkberg/~3/dDhFvATAns0/1</link>
        <description>See Google+, where I make a first attempt at shedding light on why the federal government's software is so bad ... https://plus.google.com/117122816629542437147/posts/f5WpLAYGqFj</description>
        <guid isPermaLink="false">http://thinkberg.com/space/start/2011-10-09/1#Warum_der_Bundestrojaner_so_schlecht_ist_...</guid>
        <content:encoded><![CDATA[See Google+, where I make a first attempt at shedding light on why the federal government's software is so bad ...<p class="paragraph"/><span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="https://plus.google.com/117122816629542437147/posts/f5WpLAYGqFj">https://plus.google.com/117122816629542437147/posts/f5WpLAYGqFj</a></span><div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/thinkberg?a=dDhFvATAns0:QTYc1eSh4G8:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/thinkberg?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/thinkberg/~4/dDhFvATAns0" height="1" width="1"/>]]></content:encoded>
        <dc:title>Why the Bundestrojaner is so bad ...</dc:title>
<dc:language>en</dc:language>
<dc:type>Text</dc:type>
<dc:date>2011-10-09T10:53:31+01:00</dc:date>
<dc:identifier>http://thinkberg.com/space/start/2011-10-09/1#Warum_der_Bundestrojaner_so_schlecht_ist_...</dc:identifier>
<dc:creator>arte</dc:creator>

        <comments>http://thinkberg.com/comments/start/2011-10-09/1#post</comments>
      <feedburner:origLink>http://thinkberg.com/space/start/2011-10-09/1#Warum_der_Bundestrojaner_so_schlecht_ist_...</feedburner:origLink></item>
    
       <item>
        <title>The curse of the bit dump ...</title>
        <link>http://feedproxy.google.com/~r/thinkberg/~3/DvILzT4Zutw/1</link>
        <description>I wonder where it will end. Everyone is speaking of big data and even I take part in it.
At first I thought we might end up in something like "Muell", a comic book by Juan Gimenez picturing a world where only a very small part is still habitable and all the rest of it looks like a big dump. Funnily, that is not going to happen. At least I think that we are going in a different direction that has to do with our fascination for big data, which is closely related to "messyism": stupidly collecting things we no longer need.
Not so long ago everything would at some point simply disintegrate because the material would not last forever. With the industrial revolution we invented more and more durable materials and now we have this volatile bit which is going to haunt us even longer, because we make it stick in our data centers. Already most of Europe and parts of the rest of our world are covered with data centers. We will need even more if we don't look back at how our ancestors handled information. Partly due to lack of storage, but also for the sake of simplicity and understanding, information was boiled down and filtered, and only then came the part where it was stored (drawn or written down mostly). That's the point where I think we need to think again and focus on the part where understanding and filtering reduce the amount of information. Still, there is this wish to keep up with the speed of our time and to have it all available. At the speed we produce data, the analytics tools that evolve will lose in the end if we try to catch up with the bit dumps already there.
We have a chance though to keep pace by shifting away from big data storage analytics to real-time data analytics that simply keeps our level of knowledge, the insights from our continuous data streams, up-to-date at all times. The resulting data will be much smaller and much more informative than all the bit muell it came from, and it will in the end protect us from ending up looking like Coruscant, a planet of data centers.</description>
        <guid isPermaLink="false">http://thinkberg.com/space/start/2011-06-27/1#The_curse_of_the_bit_dump_...</guid>
        <content:encoded><![CDATA[I wonder where it will end. Everyone is speaking of big data and even <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://twimpact.com">I take part in it</a></span>.<p class="paragraph"/><a href="http://www.google.com?q=juan+gimenez+muell"><img src="http://thinkberg.com/space/start/2011-06-27/1/muellschwermetall.png" alt="muellschwermetall" class="float-right" border="0"/></a>
At first I thought we might end up in something like "Muell", a comic book by Juan Gimenez picturing a world where only a very small part is still habitable and all the rest of it looks like a big dump. Funnily, that is not going to happen. At least I think that we are going in a different direction that has to do with our fascination for big data, which is closely related to "messyism": stupidly collecting things we no longer need.<p class="paragraph"/><a href="http://www.datacentermap.com/"><img src="http://thinkberg.com/space/start/2011-06-27/1/datacenters.png" alt="datacenters" class="float-left" border="0"/></a>
Not so long ago everything would at some point simply disintegrate because the material would not last forever. With the industrial revolution we invented more and more durable materials and now we have this volatile bit which is going to haunt us even longer, because we make it stick in our data centers. Already most of Europe and parts of the rest of our world are covered with data centers. We will need even more if we don't look back at how our ancestors handled information.<p class="paragraph"/>Partly due to lack of storage, but also for the sake of simplicity and understanding, information was boiled down and filtered, and only then came the part where it was stored (drawn or written down mostly). That's the point where I think we need to think again and focus on the part where understanding and filtering reduce the amount of information.<p class="paragraph"/>Still, there is this wish to keep up with the speed of our time and to have it all available. At the speed we produce data, the analytics tools that evolve will lose in the end if we try to catch up with the bit dumps already there.<p class="paragraph"/><a href="http://swc.fs2downloads.com/sshot.php?subdir=Misc/Brand-X/&#38;page=1&#38;images=10&#38;sort=MTIME_ASC"><img src="http://thinkberg.com/space/start/2011-06-27/1/coruscant.png" alt="coruscant" class="float-right" border="0"/></a>
<i class="italic">We have a chance though to keep pace by shifting away from big data storage analytics to real-time data analytics that simply keeps our level of knowledge, the insights from our continuous data streams, up-to-date at all times.</i><p class="paragraph"/>The resulting data will be much smaller and much more informative than all the bit muell it came from, and it will in the end protect us from ending up looking like Coruscant, a planet of data centers.<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/thinkberg?a=DvILzT4Zutw:egelUcVNe74:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/thinkberg?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/thinkberg/~4/DvILzT4Zutw" height="1" width="1"/>]]></content:encoded>
        <dc:title>The curse of the bit dump ...</dc:title>
<dc:language>en</dc:language>
<dc:type>Text</dc:type>
<dc:date>2011-06-27T09:39:20+01:00</dc:date>
<dc:identifier>http://thinkberg.com/space/start/2011-06-27/1#The_curse_of_the_bit_dump_...</dc:identifier>
<dc:creator>arte</dc:creator>

        <comments>http://thinkberg.com/comments/start/2011-06-27/1#post</comments>
      <feedburner:origLink>http://thinkberg.com/space/start/2011-06-27/1#The_curse_of_the_bit_dump_...</feedburner:origLink></item>
    
       <item>
        <title>updated arab revolt trends with location detection</title>
        <link>http://feedproxy.google.com/~r/thinkberg/~3/2KFCSlPIGv4/1</link>
        <description>IMPORTANT: To really update the application, reload it a few times in the browser until you see a map below the keywords and the trending graph. I have updated our trend detection for the Arabic world in three ways:

I have separated Libya and Iran into their own trackers.
The trends now also contain retweet information.
An experimental feature shows an approximate coordinate of what is being talked about in the tweet.
More to come: the data stream already transmits mention trends, retweet trends, hashtag trends and link trends in addition to what is already there. I just did not have enough time to put it into the interface.
You can now select the tracker in the menu at the top right. Enjoy!</description>
        <guid isPermaLink="false">http://thinkberg.com/space/start/2011-03-06/1#updated_arab_revolt_trends_with_location_detection</guid>
        <content:encoded><![CDATA[<b class="bold">IMPORTANT:</b> To really update the application, reload it a few times in the browser until you see a map below the keywords and the trending graph.<p class="paragraph"/>I have updated our trend detection for the Arabic world in three ways:
<ol>
<li>I have separated <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://beta.twimpact.com/wike/#!/stream/libya">Libya</a></span> and <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://beta.twimpact.com/wike/#!/stream/iran">Iran</a></span> into their own trackers.</li>
<li>The trends now also contain retweet information.</li>
<li>An experimental feature shows an approximate coordinate of what is being talked about in the tweet.</li>
</ol>More to come: the data stream already transmits mention trends, retweet trends, hashtag trends and link trends in addition to what is already there. I just did not have enough time to put it into the interface.<p class="paragraph"/><a href="http://thinkberg.com/space/start/2011-03-06/1/screeshot16.png"><img src="http://thinkberg.com/space/start/2011-03-06/1/screeshot16small.png" alt="screeshot16small" border="0"/></a><p class="paragraph"/>
You can now select the tracker in the menu at the top right:<p class="paragraph"/><img src="http://thinkberg.com/space/start/2011-03-06/1/menu.png" alt="menu" border="0"/><p class="paragraph"/>Enjoy!<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/thinkberg?a=2KFCSlPIGv4:-lYxWD4wi8g:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/thinkberg?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/thinkberg/~4/2KFCSlPIGv4" height="1" width="1"/>]]></content:encoded>
        <dc:title>updated arab revolt trends with location detection</dc:title>
<dc:language>en</dc:language>
<dc:type>Text</dc:type>
<dc:date>2011-03-06T22:25:47+01:00</dc:date>
<dc:identifier>http://thinkberg.com/space/start/2011-03-06/1#updated_arab_revolt_trends_with_location_detection</dc:identifier>
<dc:creator>arte</dc:creator>

        <comments>http://thinkberg.com/comments/start/2011-03-06/1#post</comments>
      <feedburner:origLink>http://thinkberg.com/space/start/2011-03-06/1#updated_arab_revolt_trends_with_location_detection</feedburner:origLink></item>
    
       <item>
        <title>How to use egypt.twimpact.com</title>
        <link>http://feedproxy.google.com/~r/thinkberg/~3/Bkbfv-HTWl4/1</link>
        <description />
        <guid isPermaLink="false">http://thinkberg.com/space/start/2011-02-17/1#How_to_use_egypt.twimpact.com</guid>
        <content:encoded><![CDATA[<a href="http://egypt.twimpact.com"><img src="http://thinkberg.com/space/start/2011-02-17/1/streamhowto.png" alt="link=http://egypt.twimpact.com" border="0"/></a><div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/thinkberg?a=Bkbfv-HTWl4:tZg-FYdqxhk:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/thinkberg?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/thinkberg/~4/Bkbfv-HTWl4" height="1" width="1"/>]]></content:encoded>
        <dc:title>How to use egypt.twimpact.com</dc:title>
<dc:language>en</dc:language>
<dc:type>Text</dc:type>
<dc:date>2011-02-17T22:43:27+01:00</dc:date>
<dc:identifier>http://thinkberg.com/space/start/2011-02-17/1#How_to_use_egypt.twimpact.com</dc:identifier>
<dc:creator>arte</dc:creator>

        <comments>http://thinkberg.com/comments/start/2011-02-17/1#post</comments>
      <feedburner:origLink>http://thinkberg.com/space/start/2011-02-17/1#How_to_use_egypt.twimpact.com</feedburner:origLink></item>
    
       <item>
        <title>Spike on arrest of Al-Jazeera Journalists</title>
        <link>http://feedproxy.google.com/~r/thinkberg/~3/axlHLUtw1FE/1</link>
        <description>Here is the spike our trender collected after Al-Jazeera journalists were taken into custody.</description>
        <guid isPermaLink="false">http://thinkberg.com/space/start/2011-01-31/1#Spike_on_arrest_of_Al-Jazeera_Journalists</guid>
        <content:encoded><![CDATA[Here is the spike our trender collected after Al-Jazeera journalists were taken into custody.<p class="paragraph"/><a href="space/start/2011-01-31/1/wike.png"><img src="http://thinkberg.com/space/start/2011-01-31/1/wike-small.png" alt="link=space/start/2011-01-31/1/wike.png" border="0"/></a><div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/thinkberg?a=axlHLUtw1FE:5F0p2VT87L4:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/thinkberg?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/thinkberg/~4/axlHLUtw1FE" height="1" width="1"/>]]></content:encoded>
        <dc:title>Spike on arrest of Al-Jazeera Journalists</dc:title>
<dc:language>en</dc:language>
<dc:type>Text</dc:type>
<dc:date>2011-01-31T14:16:08+01:00</dc:date>
<dc:identifier>http://thinkberg.com/space/start/2011-01-31/1#Spike_on_arrest_of_Al-Jazeera_Journalists</dc:identifier>
<dc:creator>arte</dc:creator>

        <comments>http://thinkberg.com/comments/start/2011-01-31/1#post</comments>
      <feedburner:origLink>http://thinkberg.com/space/start/2011-01-31/1#Spike_on_arrest_of_Al-Jazeera_Journalists</feedburner:origLink></item>
    
       <item>
        <title>Following #egypt on twitter</title>
        <link>http://feedproxy.google.com/~r/thinkberg/~3/d3EtHcVcYo8/1</link>
        <description>Trying to follow all the keywords relevant to the Egypt crisis can be very difficult. The updates sometimes come in so quickly that it is impossible to read them. I tried that first, tracking all keywords directly. The opportunity to actually provide a useful service is great, so I have installed a special Egypt events tracker on http://egypt.twimpact.com/ It tracks retweets that contain the keywords and ranks the keywords over time. All this is done using Mikio's new "squid farm" trending system. This does not even begin to scratch the surface of its possibilities; but who cares. We will install it to track a few million topics on twimpact.com later on. It's actually streaming the latest tweets directly to all subscribers. The latest updates are there, along with a tag cloud showing the relevance of the keywords over time and the trending rate within the last hour. What's next? I am going to install a link trender to track all links and, if I get it working, a picture stream from the Twitter data.</description>
        <guid isPermaLink="false">http://thinkberg.com/space/start/2011-01-30/1#Following_#egypt_on_twitter</guid>
        <content:encoded><![CDATA[Trying to follow all the keywords relevant to the Egypt crisis can be very difficult. The updates sometimes come in so quickly that it is impossible to read them. I tried that first, tracking all keywords directly. The opportunity to actually provide a useful service is great, so I have installed a special Egypt events tracker on<p class="paragraph"/><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><span class="nobr"><a href="http://egypt.twimpact.com/">http://egypt.twimpact.com/</a></span><p class="paragraph"/>It tracks retweets that contain the keywords and ranks the keywords over time. All this is done using <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://blog.mikiobraun.de/">Mikio's</a></span> new "squid farm" trending system. This does not even begin to scratch the surface of its possibilities; but who cares. We will install it to track a few million topics on <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://twimpact.com">twimpact.com</a></span> later on.<p class="paragraph"/>It's actually streaming the latest tweets directly to all subscribers. The latest updates are there, along with a tag cloud showing the relevance of the keywords over time and the trending rate within the last hour.<p class="paragraph"/>What's next? I am going to install a link trender to track all links and, if I get it working, a picture stream from the Twitter data.<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/thinkberg?a=d3EtHcVcYo8:TOD5w9ryDfI:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/thinkberg?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/thinkberg/~4/d3EtHcVcYo8" height="1" width="1"/>]]></content:encoded>
        <dc:title>Following #egypt on twitter</dc:title>
<dc:language>en</dc:language>
<dc:type>Text</dc:type>
<dc:date>2011-01-30T12:56:45+01:00</dc:date>
<dc:identifier>http://thinkberg.com/space/start/2011-01-30/1#Following_#egypt_on_twitter</dc:identifier>
<dc:creator>arte</dc:creator>

        <comments>http://thinkberg.com/comments/start/2011-01-30/1#post</comments>
      <feedburner:origLink>http://thinkberg.com/space/start/2011-01-30/1#Following_#egypt_on_twitter</feedburner:origLink></item>
    
       <item>
        <title>Moving a cluster at 200GB/hour</title>
        <link>http://feedproxy.google.com/~r/thinkberg/~3/ovgiFcS8YbA/1</link>
        <description>I just moved the twimpact cluster from Berlin to Amsterdam. It's about 650km and the current payload on the cluster is about 800GB. We moved at an average rate of 200GB/hour but in the end, with the last traffic jam right before entering Amsterdam we dropped back to an effective rate of 114GB/hour. Still okay, especially since a DELL PowerEdge M1000e Blade Enclosure fits very nicely into the trunk of my car. The only indication that we have loaded something is the weight, pressing down the back of the car. Today it will be put into the rack and then the holiday part of the trip begins.</description>
        <guid isPermaLink="false">http://thinkberg.com/space/start/2010-10-02/1#Moving_a_cluster_at_200GB/hour</guid>
        <content:encoded><![CDATA[I just moved the <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://twimpact.com">twimpact</a></span> cluster from Berlin to Amsterdam. It's about 650km and the current payload on the cluster is about 800GB. We moved at an average rate of 200GB/hour but in the end, with the last traffic jam right before entering Amsterdam we dropped back to an effective rate of 114GB/hour. Still okay, especially since a DELL PowerEdge M1000e Blade Enclosure fits very nicely into the trunk of my car. The only indication that we have loaded something is the weight, pressing down the back of the car.<p class="paragraph"/>Today it will be put into the rack and then the holiday part of the trip begins.<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/thinkberg?a=ovgiFcS8YbA:exbkyp2FFdk:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/thinkberg?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/thinkberg/~4/ovgiFcS8YbA" height="1" width="1"/>]]></content:encoded>
        <dc:title>Moving a cluster at 200GB/hour</dc:title>
<dc:language>en</dc:language>
<dc:type>Text</dc:type>
<dc:date>2010-10-02T09:57:08+01:00</dc:date>
<dc:identifier>http://thinkberg.com/space/start/2010-10-02/1#Moving_a_cluster_at_200GB/hour</dc:identifier>
<dc:creator>arte</dc:creator>

        <comments>http://thinkberg.com/comments/start/2010-10-02/1#post</comments>
      <feedburner:origLink>http://thinkberg.com/space/start/2010-10-02/1#Moving_a_cluster_at_200GB/hour</feedburner:origLink></item>
    
       <item>
        <title>What I've learned in the past few months ...</title>
        <link>http://feedproxy.google.com/~r/thinkberg/~3/EaHth86Fy3Q/1</link>
        <description>
Scala (new TWIMPACT infrastructure)
Apache Cassandra (new TWIMPACT datastore)
Akka (decoupling TWIMPACT analysis, processing and query infrastructure)
Apache ActiveMQ
Apache Commons Pool (Scala implementation for our Cassandra client and an Akka actors pool for connection to ActiveMQ)
REST (TWIMPACT API)
This is not counting that it keeps a lot of other knowledge alive. And all that after I thought I would never really code again. For those who don't know, my day-to-day job is knowledge transfer from university to industry in the neuroscience area. A lot of fun too. Do you love your job? ;-)</description>
        <guid isPermaLink="false">http://thinkberg.com/space/start/2010-09-18/1#What_I've_learned_in_the_past_few_month_...</guid>
        <content:encoded><![CDATA[<ul class="minus">
<li><span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://www.scala-lang.org/">Scala</a></span> (new <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://twimpact.com">TWIMPACT</a></span> infrastructure)</li>
<li><span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://cassandra.apache.org">Apache Cassandra</a></span> (new <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://twimpact.com">TWIMPACT</a></span> datastore)</li>
<li><span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://www.akkasource.org">Akka</a></span> (decoupling <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://twimpact.com">TWIMPACT</a></span> analysis, processing and query infrastructure)</li>
<li><span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://activemq.apache.org">Apache ActiveMQ</a></span></li>
<li><span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://commons.apache.org/pool/">Apache Commons Pool</a></span> (Scala implementation for our Cassandra client and an Akka actors pool for connection to ActiveMQ)</li>
<li><span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://en.wikipedia.org/wiki/Representational_State_Transfer">REST</a></span> (<span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://twimpact.com">TWIMPACT</a></span> API)</li>
</ul>This is not counting that it keeps a lot of other knowledge alive. And all that after I thought I would never really code again. For those who don't know, my day-to-day job is <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://bfnt-berlin.de">knowledge transfer from university to industry</a></span> in the neuroscience area. A lot of fun too.<p class="paragraph"/>Do you love your job? ;-)<div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/thinkberg?a=EaHth86Fy3Q:tqULeXp99zU:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/thinkberg?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/thinkberg/~4/EaHth86Fy3Q" height="1" width="1"/>]]></content:encoded>
        <dc:title>What I've learned in the past few months ...</dc:title>
<dc:language>en</dc:language>
<dc:type>Text</dc:type>
<dc:date>2010-09-18T23:03:46+01:00</dc:date>
<dc:identifier>http://thinkberg.com/space/start/2010-09-18/1#What_I've_learned_in_the_past_few_month_...</dc:identifier>
<dc:creator>arte</dc:creator>

        <comments>http://thinkberg.com/comments/start/2010-09-18/1#post</comments>
      <feedburner:origLink>http://thinkberg.com/space/start/2010-09-18/1#What_I've_learned_in_the_past_few_month_...</feedburner:origLink></item>
    
       <item>
        <title>pool size, threads and TIME_WAIT</title>
        <link>http://feedproxy.google.com/~r/thinkberg/~3/TSpXqcpNxxk/1</link>
        <description>We are now at a stage where we are putting everything together for the new TWIMPACT. Our main analysis, done by Mikio, now runs on Scala, the database is based on Cassandra, and we have created a messaging infrastructure to distribute further analysis and aggregation tasks. Now was the time to do some testing on real hardware to see whether it would give us what we want. A first test using a single analysis thread ran fast at first, but dropped to very low numbers when storing the data. So he decided to increase the number of threads doing the job, and the performance increase was immense. However, at some point our little test program started spitting out strange socket errors about not being able to "assign the requested address". It turned out that the host system was accumulating open sockets up to the maximum of around 30k, and from then on it did not allow any more new sockets to be opened. However, most of these sockets were in state TIME_WAIT, indicating a socket that is closed but not yet finalized. This looked very strange: I had wrapped commons-pool in a very straightforward way and the socket factory also looks simple enough. Enabling extensive debug output only revealed normal activity until the errors started. Also, the actual pool size never really went above 10. And that's where that idea, lurking in the back of my mind, came rushing forward: the pool's maximum number of idle elements is set to 8 by default, leading to the problem.

16 threads take out a socket from the pool (not all 16 are actually active at the same time for some reason).
The pool creates new sockets on demand.
After doing a single task the threads put back the socket.
The pool sometimes counts the number of idle sockets and closes some to get back to the maximum idle count.
The closed sockets sit in the system with TIME_WAIT.
As all this happens very fast, the pool closes and creates new sockets quickly and the system starts accumulating closed sockets until it breaks. Setting the pool size to 16 with 16 worker threads works beautifully. Also, we have 32 (1 server, 1 client) open sockets for 16 TSocket connections using the Thrift API, which sped up the accumulation of open sockets. Conclusion: Set the pool size according to your worker threads.</description>
        <guid isPermaLink="false">http://thinkberg.com/space/start/2010-07-15/1#pool_size,_threads_and_TIME_WAIT</guid>
        <content:encoded><![CDATA[We are now at a stage where we are putting everything together for the new TWIMPACT. Our main analysis, done by <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://blog.mikiobraun.de/">Mikio</a></span>, now runs on <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://www.scala-lang.org/">Scala</a></span>, the database is based on <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://cassandra.apache.org/">Cassandra</a></span>, and we have created a messaging infrastructure to distribute further analysis and aggregation tasks.<p class="paragraph"/>Now was the time to do some testing on real hardware to see whether it would give us what we want. A first test using a single analysis thread ran fast at first, but dropped to very low numbers when storing the data. So he decided to increase the number of threads doing the job, and the performance increase was immense. However, at some point our little test program started spitting out strange socket errors about not being able to "<i class="italic">assign the requested address</i>".<p class="paragraph"/>It turned out that the host system was accumulating open sockets up to the maximum of around 30k, and from then on it did not allow any more new sockets to be opened. However, most of these sockets were in state TIME_WAIT, indicating a socket that is closed but not yet finalized.<p class="paragraph"/>This looked very strange: I had wrapped <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://commons.apache.org/pool/">commons-pool</a></span> in a very straightforward way and the socket factory also looks simple enough. Enabling extensive debug output only revealed normal activity until the errors started. 
Also, the actual pool size never really went above 10. And that's where that idea, lurking in the back of my mind, came rushing forward: the pool's maximum number of idle elements is set to 8 by default, leading to the problem.
<ul class="star">
<li>16 threads take out a socket from the pool (not all 16 are actually active at the same time for some reason).</li>
<li>The pool creates new sockets on demand.</li>
<li>After doing a single task the threads put back the socket.</li>
<li>The pool sometimes counts the number of idle sockets and closes some to get back to the maximum idle count.</li>
<li>The closed sockets sit in the system with TIME_WAIT.</li>
</ul>As all this happens very fast, the pool closes and creates new sockets quickly and the system starts accumulating closed sockets until it breaks.<p class="paragraph"/>Setting the pool size to 16 with 16 worker threads works beautifully. Also, we have 32 (1 server, 1 client) open sockets for 16 TSocket connections using the <span class="nobr"><img src="http://thinkberg.com/theme/images/Icon-Extlink.png" alt="&gt;&gt;" border="0"/><a href="http://wiki.apache.org/cassandra/API06">Thrift API</a></span>, which sped up the accumulation of open sockets.<p class="paragraph"/>Conclusion: <b class="bold">Set the pool size according to your worker threads.</b><div class="feedflare">
<a href="http://feeds.feedburner.com/~ff/thinkberg?a=TSpXqcpNxxk:f3OFqAmoNCg:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/thinkberg?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/thinkberg/~4/TSpXqcpNxxk" height="1" width="1"/>]]></content:encoded>
        <dc:title>pool size, threads and TIME_WAIT</dc:title>
<dc:language>en</dc:language>
<dc:type>Text</dc:type>
<dc:date>2010-07-15T09:25:19+01:00</dc:date>
<dc:identifier>http://thinkberg.com/space/start/2010-07-15/1#pool_size,_threads_and_TIME_WAIT</dc:identifier>
<dc:creator>arte</dc:creator>

        <comments>http://thinkberg.com/comments/start/2010-07-15/1#post</comments>
      <feedburner:origLink>http://thinkberg.com/space/start/2010-07-15/1#pool_size,_threads_and_TIME_WAIT</feedburner:origLink></item>
    
  </channel>
</rss>

