<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>René Pickhardt</title>
	
	<link>http://www.rene-pickhardt.de</link>
	<description>China, Mathematik, SEO, Heavy Metal</description>
	<lastBuildDate>Fri, 24 Feb 2012 12:53:59 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/Rene-pickhardt" /><feedburner:info uri="rene-pickhardt" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><creativeCommons:license>http://creativecommons.org/licenses/by-sa/2.0/</creativeCommons:license><item>
		<title>Wishlist of features for a distributed graph data base technology</title>
		<link>http://feedproxy.google.com/~r/Rene-pickhardt/~3/hS-bL37GZXQ/</link>
		<comments>http://www.rene-pickhardt.de/wishlist-of-features-for-a-distributed-graph-data-base-technology/#comments</comments>
		<pubDate>Fri, 24 Feb 2012 12:53:59 +0000</pubDate>
		<dc:creator>Rene</dc:creator>
				<category><![CDATA[distributed graph data base]]></category>
		<category><![CDATA[application]]></category>
		<category><![CDATA[graph database]]></category>
		<category><![CDATA[graph processing]]></category>
		<category><![CDATA[how to]]></category>
		<category><![CDATA[Information retrieval]]></category>
		<category><![CDATA[neo4j]]></category>
		<category><![CDATA[PhD]]></category>

		<guid isPermaLink="false">http://www.rene-pickhardt.de/?p=1151</guid>
		<description><![CDATA[I am just dreaming here: this does not exist yet and needs to be refined at a later stage. Fast traversals: Jumping from one vertex of the graph to another should be possible in O(1). Online processing: “Standard queries” (&#60;&#8211;whatever this means) should compute within milliseconds. As an example: Local recommendations e.g. similar users in a bipartite [...]


Related posts:<ol><li><a href='http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/' rel='bookmark' title='From Graph (batch) processing towards a distributed graph data base'>From Graph (batch) processing towards a distributed graph data base</a></li>
<li><a href='http://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/' rel='bookmark' title='Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons'>Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons</a></li>
<li><a href='http://www.rene-pickhardt.de/neo4j-news-stream-demo-on-wikipedia-graph-running/' rel='bookmark' title='neo4j based social news feed demo on wikipedia graph running'>neo4j based social news feed demo on wikipedia graph running</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>I am just dreaming here: this does not exist yet and needs to be refined at a later stage.</p>
<ul>
<li><strong>Fast traversals: </strong>
<ul>
<li>Jumping from one vertex of the graph to another (i.e. following an edge) should be possible in O(1).</li>
</ul>
</li>
<li><strong>Online processing</strong>:
<ul>
<li>“Standard queries” (&lt;&#8211; whatever this means exactly) should be answered within milliseconds.</li>
<li>For example, local recommendations such as similar users in a bipartite &#8220;User &#8211; Band&#8221; graph should be computable online in less than a second.</li>
</ul>
</li>
<li><strong>Query language</strong>:
<ul>
<li>A programming model that supports pattern matching and traversals from one (or possibly several) starting nodes.</li>
<li>No SPARQL support is needed (it is too general for a reasonable graph application).</li>
<li>Support for reading and writing new data (to disk)!</li>
</ul>
</li>
<li><strong>Distribution effort</strong>:
<ul>
<li>The programmer should not have to care about the distribution techniques.</li>
<li>He should just be able to use the technology.</li>
</ul>
</li>
<li><strong>Fault tolerance</strong>:
<ul>
<li>The system has to keep running stably when machines are added or removed.</li>
<li>This can probably be achieved by introducing redundancy in some way [1].</li>
</ul>
</li>
<li><strong>Persistence</strong>:
<ul>
<li>Transactions and persistence are important for any data base service.</li>
</ul>
</li>
</ul>
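<p>To make the first two wishes a bit more concrete, here is a minimal single-machine sketch in Python. It is my own illustration, not a design the reading club has agreed on: the graph is stored as adjacency sets, so following an edge is an average O(1) hash lookup, and &#8220;similar users&#8221; are ranked by the number of bands they share. All names and data are hypothetical, and the sketch deliberately ignores distribution, which is exactly the hard part.</p>

```python
from collections import Counter, defaultdict

# Hypothetical single-machine store for a bipartite "User - Band" graph.
# Each vertex maps to the set of its neighbours, so jumping from a vertex to
# its neighbours is an (average) O(1) dictionary lookup.
user_to_bands = defaultdict(set)
band_to_users = defaultdict(set)


def add_edge(user, band):
    """Record that `user` likes `band` (stored in both directions)."""
    user_to_bands[user].add(band)
    band_to_users[band].add(user)


def similar_users(user, top_k=3):
    """Rank other users by how many bands they share with `user`.

    This is a two-hop traversal (user to band, band to user) that starts from
    a single node, i.e. a local query rather than a batch job over the graph.
    """
    shared = Counter()
    for band in user_to_bands.get(user, ()):
        for other in band_to_users[band]:
            if other != user:
                shared[other] += 1
    return shared.most_common(top_k)


# Tiny illustrative data set (made-up users and bands).
add_edge("alice", "band_a")
add_edge("alice", "band_b")
add_edge("bob", "band_a")
add_edge("bob", "band_b")
add_edge("carol", "band_a")
print(similar_users("alice"))  # [('bob', 2), ('carol', 1)]
```

<p>The query only touches the neighbourhood of its starting node, which is what makes sub-second online answers plausible; in a distributed setting the open question is how often such a two-hop walk has to leave the local machine.</p>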
<p>This wish list is clearly very high level. But I think these are reasonable assumptions from which we can break down the problem and discuss the pros and cons of all the techniques needed to build such a system.</p>
<h3>[1] On the redundancy discussion</h3>
<p>Depending on the techniques used, introducing redundancy probably has positive effects on two things:</p>
<ol>
<li>Fast traversals</li>
<li>Fault tolerance</li>
</ol>
<p>On the other hand, it has a deep negative impact on:</p>
<ol>
<li>Persistence (which is already hard to achieve in a distributed setting and becomes even harder once redundancy is introduced).</li>
</ol>
<p>It is not clear whether we really need redundancy. Maybe there are other techniques that let us reach our goals, but I personally have the feeling that a good model for redundancy will “solve” the problem.</p>
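<p>One simple way to think about redundancy is replicated placement. The sketch below is only an illustration under my own assumptions (a CRC32 hash with ring-style placement, a hypothetical replication factor of 2 and made-up machine names): each vertex is stored on two machines, so it stays readable when one of them fails. At the same time every write would have to reach all copies, which is exactly why persistence becomes harder.</p>

```python
import zlib


def replicas(vertex_id, machines, replication_factor=2):
    """Choose the machines that store copies of a vertex.

    The first copy goes to a machine picked by a stable hash of the vertex id;
    further copies go to the following machines in the list (ring placement).
    """
    start = zlib.crc32(str(vertex_id).encode("utf-8")) % len(machines)
    copies = min(replication_factor, len(machines))
    return [machines[(start + i) % len(machines)] for i in range(copies)]


def readable_from(vertex_id, machines, alive, replication_factor=2):
    """Return a live machine holding a copy of the vertex, or None if every copy is down."""
    for machine in replicas(vertex_id, machines, replication_factor):
        if machine in alive:
            return machine
    return None


# Hypothetical cluster of four machines; take down the primary copy.
machines = ["m0", "m1", "m2", "m3"]
primary, backup = replicas("user:42", machines)
alive = {m for m in machines if m != primary}
print(readable_from("user:42", machines, alive) == backup)  # True
```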
<h3>Relation to the <a href="http://www.rene-pickhardt.de/cat/webscience/reading-club/">reading club</a></h3>
<p>I already found the time to look over <a target="_blank" href="http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/">our current reading assignments</a>. Especially the VLDB paper (<a href="http://cs-www.cs.yale.edu/homes/dna/papers/sw-graph-scale.pdf" target="_blank">Topology partitioning applied to SPARQL, HADOOP and TripleStores</a>) and <a target="_blank" href="http://www.sandia.gov/~bahendr/papers/graphs-and-machines.pdf">Challenges in parallel graph processing</a> strengthen my confidence that an approach like the one described above is very reasonable.</p>
<h3>What is your opinion?</h3>
<p>Do you think I am missing some features or should keep a focus on one particular feature? What about methods to achieve those goals? I am happy to discuss your thoughts!</p>


<p>Related posts:<ol><li><a href='http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/' rel='bookmark' title='From Graph (batch) processing towards a distributed graph data base'>From Graph (batch) processing towards a distributed graph data base</a></li>
<li><a href='http://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/' rel='bookmark' title='Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons'>Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons</a></li>
<li><a href='http://www.rene-pickhardt.de/neo4j-news-stream-demo-on-wikipedia-graph-running/' rel='bookmark' title='neo4j based social news feed demo on wikipedia graph running'>neo4j based social news feed demo on wikipedia graph running</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.rene-pickhardt.de/wishlist-of-features-for-a-distributed-graph-data-base-technology/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.rene-pickhardt.de/wishlist-of-features-for-a-distributed-graph-data-base-technology/</feedburner:origLink></item>
		<item>
		<title>From Graph (batch) processing towards a distributed graph data base</title>
		<link>http://feedproxy.google.com/~r/Rene-pickhardt/~3/niOtpgCJsag/</link>
		<comments>http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/#comments</comments>
		<pubDate>Thu, 23 Feb 2012 12:45:22 +0000</pubDate>
		<dc:creator>Rene</dc:creator>
				<category><![CDATA[Reading Club]]></category>
		<category><![CDATA[graph database]]></category>
		<category><![CDATA[graph processing]]></category>
		<category><![CDATA[PhD]]></category>

		<guid isPermaLink="false">http://www.rene-pickhardt.de/?p=1156</guid>
		<description><![CDATA[Yesterday&#8217;s meeting of the reading club was quite nice. We all agreed that the papers were of good quality and that we gained some nice insights. The only drawback was that they did not directly tell us how to achieve our goal of a real-time distributed graph data base technology. In the [...]


Related posts:<ol><li><a href='http://www.rene-pickhardt.de/wishlist-of-features-for-a-distributed-graph-data-base-technology/' rel='bookmark' title='Wishlist of features for a distributed graph data base technology'>Wishlist of features for a distributed graph data base technology</a></li>
<li><a href='http://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/' rel='bookmark' title='Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons'>Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons</a></li>
<li><a href='http://www.rene-pickhardt.de/reading-club-on-graph-databases-and-distributed-systems/' rel='bookmark' title='Reading club on Graph databases and distributed systems'>Reading club on Graph databases and distributed systems</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>Yesterday&#8217;s meeting of the reading club was quite nice. We all agreed that the papers were of good quality and that we gained some nice insights. The only drawback was that they did not directly tell us how to achieve our goal of a real-time distributed graph data base technology. In the readings for the <strong>next meeting (which will take place on Wednesday, March 7th, 2pm CET)</strong> we tried to choose papers that don&#8217;t discuss these distributed graph / data processing techniques but focus more on speed or point out the general challenges in parallel graph processing.</p>
<h3>Reading list for the next meeting (Wednesday, March 7th, 2pm CET)</h3>
<ul>
<li><a target="_blank" href="http://dl.acm.org/citation.cfm?id=1012894">memcached paper</a>: to understand how distributed shared memory works, which <a href="http://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/">could significantly speed up approaches like Signal Collect</a>.</li>
<li><a target="_blank" href="http://www.cs.cornell.edu/people/egs/papers/beehive.pdf">Beehive</a>: to see a P2P approach to graph distribution.</li>
<li><a target="_blank" href="http://www.sandia.gov/~bahendr/papers/graphs-and-machines.pdf">Challenges in parallel graph processing</a>: for obvious reasons, since it lays out the big picture.</li>
<li><a target="_blank" href="http://www.boost.org/doc/libs/1_48_0/libs/graph_parallel/doc/html/index.html">Parallel Boost Graph Library</a>: a general parallel graph processing framework. In any case, it is interesting and useful to understand what is going on there.</li>
<li><a target="_blank" href="http://cs-www.cs.yale.edu/homes/dna/papers/sw-graph-scale.pdf">Topology partitioning applied to SPARQL, HADOOP and TripleStores</a>: shows how a speedup of 1000x can be achieved through smart partitioning of the graph.</li>
</ul>
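<p>To see why partitioning matters so much, count how many edges cross machine boundaries: each such edge turns a local traversal step into a network hop. The toy example below is a hypothetical graph with hand-made placements, not the method from the VLDB paper; it only compares a topology-blind placement with one that keeps each dense community on a single machine.</p>

```python
def cut_edges(edges, partition):
    """Count edges whose two endpoints are stored on different machines.

    `partition` maps each vertex to a machine id. Every cut edge means that a
    traversal along it has to cross the network instead of staying local.
    """
    return sum(1 for u, v in edges if partition[u] != partition[v])


# Hypothetical toy graph: two triangles {a, b, c} and {d, e, f} joined by c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

# Topology-blind placement: round-robin over two machines.
naive = {v: i % 2 for i, v in enumerate("abcdef")}
# Topology-aware placement: each triangle stays on one machine.
aware = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

print(cut_edges(edges, naive), cut_edges(edges, aware))  # 5 1
```

<p>With this toy graph the naive placement cuts 5 of the 7 edges, while the topology-aware one cuts only 1.</p>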
<p>Again, while reading and preparing, feel free to add more reading wishes in the comments of this blog post or drop me an email!</p>
<h3>Summary of yesterday&#8217;s meeting</h3>
<p>As written in the introduction, we agreed that the papers were interesting but not heading in our direction. Claudio pointed out that everyone should consider the following set of questions:</p>
<ul>
<li>Do we want the graph to be mutable (writable), or is it supposed to be read-only?
<ul>
<li>Writing makes sense. If the graph is read-only, we are talking about batch processing.</li>
<li>Writing is hard: you have to care about locking and consistency.</li>
</ul>
</li>
</ul>
<ul>
<li>Do we want to answer queries (Cypher / Gremlin / whatever)?</li>
<li>Do we want to provide an API for processing?</li>
<li>How big is the data set we want to support?
<ul>
<li>Many people work in memory.</li>
<li>If you go to disk, you open up a whole new set of problems.</li>
<li>One approach would be to solve the problem in memory first.</li>
</ul>
</li>
</ul>
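<p>For the first and last of Claudio&#8217;s questions, a minimal sketch of a mutable in-memory graph shows where locking enters the picture as soon as writes are allowed. The class is hypothetical and uses only the Python standard library: a single coarse lock makes each write appear atomic to readers. A real distributed system would need far more than this (finer-grained locks, transactions, replication), so treat it only as a starting point for the &#8220;solve it in memory first&#8221; approach.</p>

```python
import threading
from collections import defaultdict


class InMemoryGraph:
    """Hypothetical mutable, undirected, in-memory graph guarded by one lock.

    The lock makes each write (both directions of an edge) atomic with respect
    to readers, so nobody ever observes an edge recorded in only one direction.
    """

    def __init__(self):
        self._adj = defaultdict(set)
        self._lock = threading.Lock()

    def add_edge(self, u, v):
        with self._lock:
            self._adj[u].add(v)
            self._adj[v].add(u)

    def neighbors(self, u):
        with self._lock:
            # Return a copy so callers never iterate over a set being modified.
            return set(self._adj.get(u, ()))


# Several writer threads adding edges to the same vertex concurrently.
graph = InMemoryGraph()
writers = [threading.Thread(target=graph.add_edge, args=(0, i)) for i in range(1, 51)]
for w in writers:
    w.start()
for w in writers:
    w.join()
print(len(graph.neighbors(0)))  # 50
```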
<p>I am very confident that it was a good idea to start with graph processing, and that we are now taking the right steps toward real distributed graph data base systems. I think there are some more questions and high-level assumptions that we have to settle; I will post them on this blog in a few days. Sorry, I am in a hurry today and for the rest of the week.</p>
<h3>Infrastructure</h3>
<p>Schegi just suggested creating a mailing list for the reading club or switching to Google Groups. He pointed out that a private blog is kind of a weird medium to be so central. What is your opinion on that? Do we need some other / more formal infrastructure?</p>


<p>Related posts:<ol><li><a href='http://www.rene-pickhardt.de/wishlist-of-features-for-a-distributed-graph-data-base-technology/' rel='bookmark' title='Wishlist of features for a distributed graph data base technology'>Wishlist of features for a distributed graph data base technology</a></li>
<li><a href='http://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/' rel='bookmark' title='Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons'>Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons</a></li>
<li><a href='http://www.rene-pickhardt.de/reading-club-on-graph-databases-and-distributed-systems/' rel='bookmark' title='Reading club on Graph databases and distributed systems'>Reading club on Graph databases and distributed systems</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		<feedburner:origLink>http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/</feedburner:origLink></item>
		<item>
		<title>Question of the Day: How the hell do we reach more people?</title>
		<link>http://feedproxy.google.com/~r/Rene-pickhardt/~3/9ZXs1ANYvkI/</link>
		<comments>http://www.rene-pickhardt.de/question-of-the-day-how-the-hell-do-we-reach-more-people/#comments</comments>
		<pubDate>Sun, 19 Feb 2012 21:33:09 +0000</pubDate>
		<dc:creator>Rene</dc:creator>
				<category><![CDATA[Question of Musicians]]></category>
		<category><![CDATA[ad]]></category>
		<category><![CDATA[advertising]]></category>
		<category><![CDATA[answer]]></category>
		<category><![CDATA[fundamental concepts]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[marketing]]></category>
		<category><![CDATA[music industry]]></category>
		<category><![CDATA[musicians]]></category>
		<category><![CDATA[network]]></category>
		<category><![CDATA[record label]]></category>
		<category><![CDATA[social circle]]></category>

		<guid isPermaLink="false">http://www.rene-pickhardt.de/?p=1027</guid>
		<description><![CDATA[Recently I received an email from a musician who wishes to stay unnamed, telling me that many people out there love his music but that it just hasn&#8217;t spread very far. His basic question is how his band can reach more people on the web, especially with regard to a new upcoming video. His promoter suggested something [...]


Related posts:<ol><li><a href='http://www.rene-pickhardt.de/question-by-filip-stilin-house-on-mars-what-do-you-think-of-bandcamp/' rel='bookmark' title='Question by Filip Stilin (House on Mars): What do you think of Bandcamp?'>Question by Filip Stilin (House on Mars): What do you think of Bandcamp?</a></li>
<li><a href='http://www.rene-pickhardt.de/you-should-use-youtube-promoted-videos-for-your-music-band/' rel='bookmark' title='you should use Youtube promoted videos for your music band'>you should use Youtube promoted videos for your music band</a></li>
<li><a href='http://www.rene-pickhardt.de/stop-facebook-filterbubble-of-facebooks-news-stream-wall/' rel='bookmark' title='Stop Facebook &#8211; Filterbubble of facebook&#8217;s news stream &amp; wall'>Stop Facebook &#8211; Filterbubble of facebook&#8217;s news stream &#038; wall</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>Recently I received an email from a musician who wishes to stay unnamed, telling me that many people out there love his music but that it just hasn&#8217;t spread very far. His basic question is how his band can reach more people on the web, especially with regard to a new upcoming video.</p>
<p>His promoter suggested something like:</p>
<ol>
<li>You should send a press release to all music-related websites.</li>
<li>You should show the video to your friends and ask them to reshare it with their friends.</li>
<li>You should hold a raffle related to the video.</li>
</ol>
<p>Before I start with my thoughts: in this blog article I will explain some fundamental concepts of online marketing. This is interesting not only for musicians but for any brand! Common concepts of branding will be seriously questioned.</p>
<h3>The question asked is fundamental for online (music) marketing</h3>
<p>This musician&#8217;s observation is quite right and holds true for many musicians, but also for other products. Actually it is a <a target="_blank" href="http://networkscience.wordpress.com/2012/02/07/the-pareto-principle/">nice application of the Pareto Principle</a> (8.9% of bands make up 91.1% of plays on Last.fm). And the problem does not exist only on the web. Once you start a new thing with a lot of enthusiasm, you will almost certainly come to realize <strong>that no one really cares</strong>, even though your product / music is good and you receive some nice feedback. I looked at some of the social media response for this particular artist and found that he really is in the lucky position of receiving a lot of great feedback, and that quite a lot of people care. Still, it is hard for him to increase his reach and inform people about this very product, i.e. to introduce the music to them.</p>
<h3>Problem of overcoming the borders of one&#8217;s own social circle</h3>
<p>The entire problem boils down to overcoming one&#8217;s own social circle. Once you understand that <a target="_blank" href="http://en.wikipedia.org/wiki/Six_degrees_of_separation">the entire world is a network</a> (it is really hard to fully grasp all the implications of this), you understand that the big problem in (online) marketing is the following:</p>
<p>You have your ego network &#8211; let&#8217;s call it your social circle &#8211; of people you can reach with an idea. Within your social circle, some ties are stronger and some are weaker. Some ties are so strong that they might even help you out (for example because they are your record label, promoter, booker or die-hard fans). But in the end they can all be combined into one person in the network who has a (somewhat bigger) social circle. And that social circle will look like the following pictures, which I took from <a target="_blank" href="http://www.connectedaction.net/2009/03/30/social-networks-in-the-news/">Marc Smith&#8217;s blog post about social networks in the news</a> (Marc was a tutor of mine at the Web Science summer school 2011):</p>
<p><img class="alignnone" title="Full Ego Network" src="http://www.connectedaction.net/wp-content/uploads/2009/03/20090329-nyt-facebook-ego-network-1-of-2.png" alt="Ego network and social circle" width="649" height="323" /></p>
<p>In this series of pictures you can actually see how limited your social circle is (again, for a music band the circle might be bigger overall, but the entire picture will look similar):</p>
<p><img class="alignnone" title="Rest of an ego network" src="http://www.connectedaction.net/wp-content/uploads/2009/03/20090329-nyt-facebook-ego-network-2-of-2.png" alt="Rest of an ego network" width="546" height="319" /></p>
<p>Even though <a target="_blank" href="http://en.wikipedia.org/wiki/Six_degrees_of_separation">you can reach anyone in our worldwide friendship network with only 6 people in between</a>, your social circle will only reach a few tens of thousands of people! So the question is: <strong>&#8220;how to get the other 7 billion?&#8221;</strong></p>
<p>To make this even clearer: as a b(r)and you might &#8220;know&#8221; an amazing number of, let us say, 3,500 fans (maybe you reach even more) and become a &#8220;rather&#8221; central node in the social graph of people and brands. But let us see why this is not of any help!</p>
<ol>
<li>you know 3500 fans</li>
<li>Those fans already have 410,000 (almost half a million) people in their ego networks, which is probably what makes using Facebook so tempting.</li>
<li>Your fans&#8217; friends altogether know 51 million people.</li>
<li>Finally, these people, whom you already reach over three hops, know the entire rest of the world, i.e. the other 7 billion people.</li>
</ol>
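<p>The numbers in this list can be turned into a quick back-of-the-envelope calculation. Taking the figures above at face value, each additional hop multiplies the reach by a factor of roughly 117 to 137, which is why the message would have to be passed on at every single hop:</p>

```python
# Figures quoted in the list above: people reached after 1, 2, 3 and 4 hops.
REACH = [3_500, 410_000, 51_000_000, 7_000_000_000]


def growth_factors(reach):
    """Return how many times larger each hop's reach is than the previous one."""
    return [reach[i] / reach[i - 1] for i in range(1, len(reach))]


for hop, (people, factor) in enumerate(zip(REACH[1:], growth_factors(REACH)), start=2):
    print(f"hop {hop}: {people:,} people, about {factor:.0f} times the previous hop")
# hop 2: 410,000 people, about 117 times the previous hop
# hop 3: 51,000,000 people, about 124 times the previous hop
# hop 4: 7,000,000,000 people, about 137 times the previous hop
```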
<p>Do you really think your music / product / message / idea is so great that your friends will tell their friends who will tell their friends who will tell the others? <strong>That is really all it takes! 4 hops. Sorry to disappoint you. This is almost impossible to achieve!</strong></p>
<h3>The classic answer before the viral web was born: Advertising</h3>
<p>Well, the problem existed long before the web was born, and people have already figured out a solution: you ask someone who has a huge reach to tell their audience about your new idea! Put less elegantly, we can summarize this idea in one word: &#8220;<strong>advertising</strong>&#8221;. You go to any media outlet, pay them some money, and they will talk about you. This worked amazingly well in earlier days. There weren&#8217;t that many media, and if you had your product in some of them you could be certain of at least some brand awareness. But advertising has a limitation&#8230;</p>
<h3>A Problem of advertising and some advantages and disadvantages of online advertising</h3>
<p>Of course, once you run advertising campaigns you will receive some attention, no matter how good or bad your product is. That was the principle during the dot-com bubble, when <a target="_blank" href="http://en.wikipedia.org/wiki/Dot-com_bubble#Free_spending">many start-ups spent far too many advertising dollars to build reach</a> instead of focusing on a great product. This is why every ad campaign should carefully measure several things (fortunately, in online marketing this is easier than anywhere else!):</p>
<ul>
<li><a target="_blank" href="http://en.wikipedia.org/wiki/Conversion_rate">Conversion rate</a> &#8211; the percentage of people who convert into fans, customers, email addresses, &#8230; (basically whatever goal you have for the campaign)</li>
<li><a target="_blank" href="http://en.wikipedia.org/wiki/Bounce_rate">Bounce rate</a> &#8211; the share of people who clicked your ad but left your page / product right away</li>
<li>The price you pay per click (or per thousand impressions)</li>
<li>The value of a new fan (customer) to you.</li>
</ul>
<p>Once you know these numbers, you can easily calculate whether an ad campaign is useful for you or not! And those numbers are easy to obtain: the first three are given or measured during the ad campaign (if you don&#8217;t know how to do this, contact me). The last one gets its own paragraph.</p>
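<p>The calculation itself is simple. Here is a minimal sketch in Python, assuming that you pay per click and that the conversion rate is measured per click (function names and numbers are hypothetical): dividing the price per click by the conversion rate gives the average cost of one new fan, which you then compare with what a fan is worth to you.</p>

```python
def cost_per_new_fan(cost_per_click, conversion_rate):
    """Average ad spend needed to win one new fan.

    conversion_rate is the fraction of clicks that end in a conversion
    (0.02 means 2 %). Visitors who bounce never convert, so a high bounce
    rate shows up here as a low conversion rate.
    """
    if conversion_rate == 0:
        return float("inf")  # nobody converts: no amount of spend buys a fan
    return cost_per_click / conversion_rate


def profit_per_new_fan(cost_per_click, conversion_rate, value_per_fan):
    """Value of a new fan minus what it costs to acquire one.

    A positive result means the campaign pays for itself; a negative one
    means every new fan costs more than they are worth to you.
    """
    return value_per_fan - cost_per_new_fan(cost_per_click, conversion_rate)


# Hypothetical campaign: 0.10 Euro per click, 2 % of clicks become fans,
# and a fan is assumed to be worth 6 Euro.
print(round(profit_per_new_fan(0.10, 0.02, 6.0), 2))  # 1.0
```

<p>With these made-up numbers each new fan costs 5 Euro and brings in about 1 Euro more than it costs; at a conversion rate of 1 % the same campaign would lose about 4 Euro per fan.</p>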
<h3>What is the value of one new fan?</h3>
<p>To estimate the value of a new fan, you could do the following: count the number of fans you have (e.g. newsletter subscribers, Facebook fans, &#8230;), then take last year&#8217;s revenue (merch + concerts + music sales) and divide it by the number of fans. This tells you how much revenue one fan produces in one year. I know it is only a rough estimate, but I wonder how many bands have ever calculated this number! Look at other products and markets: everyone in the direct marketing business maintains a customer database, which is a solid asset to the business. Groupon, for example, gives away 6 Euro for every new customer someone brings in. (This means that Groupon thinks a new contact is worth at least 6 Euro.)</p>
<p>As a musician you should know this number. If a new fan is worth 6 Euro to you, then 5,000 fans are worth 30,000 Euro, which would be as much as the production of an entire record. Could this be true? I will soon start some calculations together with my colleagues from In Legend to figure out how much we should spend to gain a new fan!</p>
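<p>The rough estimate described above fits in a few lines. This is only a sketch with invented numbers: it uses revenue rather than profit, exactly as described, and treats the fan count as a single number even though newsletter and Facebook audiences overlap in practice.</p>

```python
def value_per_fan(yearly_revenue_by_source, fan_count):
    """Rough yearly revenue produced by one fan.

    yearly_revenue_by_source: last year's revenue per source
        (e.g. merch, concerts, music sales).
    fan_count: a rough proxy for the number of fans
        (e.g. newsletter subscribers or Facebook fans).
    """
    if fan_count == 0:
        raise ValueError("fan_count must not be zero")
    return sum(yearly_revenue_by_source.values()) / fan_count


# Made-up figures, purely for illustration.
revenue = {"merch": 6_000, "concerts": 9_000, "music": 3_000}
print(value_per_fan(revenue, 3_000))  # 6.0
```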
<h3>Problem of aggressive marketing vs spreading of new information</h3>
<p>On the web, the problem is that people have so much choice that advertising only helps to some degree. With Google AdWords, for example, you only pay for clicks, so you definitely get attention. On the other hand, people drift away very quickly and their attention span is very short. Another drawback is that many things on the web are free. Even though advertising music on the web is much cheaper than offline advertising, online advertising is not yet standard practice: record labels and promoters have so far refused to invest money in it. That is sad, because there is a lot of potential in online advertising (especially once you measure your conversion and bounce rates as well as your click-through rates).</p>
<p>Luckily, those rates can be measured before you run an ad campaign. You should therefore never invest money to increase the reach of a completely new product. A completely new product should be tested first: you measure the feedback, and once you see that people are happy, you can safely pay to increase your reach. For example, in In Legend&#8217;s case the YouTube <a target="_blank" href="http://www.youtube.com/watch?v=aLesGMA0-rM">Pandemonium video</a> received much better feedback than the <a target="_blank" href="http://www.youtube.com/watch?v=l8AnViAS6BQ">Vortex video</a>, and so far the <a target="_blank" href="http://www.youtube.com/watch?v=exSsKgUVu54">Soul Apart video</a> seems to receive the best feedback overall, judging by user ratings, average daily views and user comments. So you had better advertise with your best-performing content. Will you care that this is not your newest piece of work, or not even your first video? Hell no! Advertising is always targeted at people who don&#8217;t know you. You can even show them a 10-year-old video and they might not notice its age! Once they like you, they can still discover your newer stuff.</p>
<p>Attention: the above suggestion changes drastically if you are a mature b(r)and. In that case your star products are of course already well known, and you should use your promotional power and brand recognition to get the new product out. This leads to the next paragraph.</p>
<h3>Do all these calculations depend on the maturity of the band?</h3>
<p>The answer is clearly yes! A band that is mature will still need promotion and advertising but also a lot of messages spread from almost alone. Some other things are easier achieved (mag title stories and so on&#8230;) But the obvious message is the following. The cost of advertising and your conversion rate will most probably stay constant and remain independent of the bands maturity. Metallica has to pay the same price for youtube ads as In Legend has to. But the fan is worth more if a band is not mature yet. A fan from the first record might go to many tours and buy a lot of products and also help spreading the word in a viral manner. For a mature band chances are pretty high that people who see the ad already know the brand but already decided not to like the band. So even though a young band has not much money and can not be sure that the product is already sufficiently good (which will lead to a drop in conversionrate i.e. more expensive ads or higher acquisition cost per new fan) ads make much more sense if the band is totally unknown. Be couragous invest some money and bootstrap your product! You are also courageous by going on the stage. So please also enter the web stage and built your reach! I know it&#8217;s less fun than rocking the audience in a life concert but the effect should be pretty much the same.</p>
<h3>Other non advertising impact factors to overcome the boarders of one&#8217;s social circle</h3>
<p>Since advertising is very expensive and probably not sustainable one can wonder if there are other ways to overcome the boarders of your social circle. I would say there are. The best way to do so in my pure empirical experience is transparicy, openess, the trust in other people and the interactive communication with them. Take me as an example. When I went to china I had basically no one to talk to about my interests. Now I started blogging sharing my ideas and &#8211; some people already called me crazy &#8211; parts of my intellectual propertie. Guess what! By doing so I was able to find more and more people with similar interests from all around the globe that I would never have met in my own social circle giving my valuable feedback to my ideas. <a href="http://www.rene-pickhardt.de/reading-club-on-graph-databases-and-distributed-systems/">My Reading club on distributed graph data bases</a> is just a recent example of this added value from transparency.</p>
<p>So, as a band, here are some things you could and should do besides advertising your star products (and yes, this is not mainstream and might take some courage!):</p>
<ol>
<li><strong>Interact with your fans</strong>. Treat them with respect, not just by telling them but by really showing them. Don&#8217;t be so &#8220;kind&#8221; as to share only news about yourselves. That&#8217;s not interacting, that&#8217;s publishing! Join the discussion on current topics (in music, news, &#8230;) or respond to what your fans say! Find out what is relevant and stay in the conversation. I know this is hard, since fans can become annoying, and what is already hard for a single person is even harder for a band, which has more fame and more people and ideas to take care of.</li>
<li><strong>And please don&#8217;t be fake interactive</strong> by asking questions like &#8220;<em><a target="_blank" rel="nofollow" href="http://www.facebook.com/inlegend/posts/354278247931831">Heya Legends..<br />
A new week has begun, what is your sound for depressing mondays and what gets your through the week???</a></em>&#8221; Asking questions, especially ones that don&#8217;t really carry any meaning, is not interaction. It is just embarrassing, and that is why questions like this don&#8217;t take your marketing anywhere.</li>
<li><strong>Integrate your fans!</strong> This is by far the most promising strategy. As a band with already 3,500 fans, you have so much diversity and creativity among them that you will be able to achieve extraordinary things. I am sure some of your fans have access to recording studios or to cheap video production, and there are probably web designers and photographers among them. If you interact with your fans in a smart way, you don&#8217;t even have to ask whether there is a <strong>REPLACE BY WHATEVER YOU NEED</strong>. You will just know who it is. This leads me to my fourth point:</li>
<li><strong>Use the upcoming social network Google+</strong>. First of all, it is very obvious that <a target="_blank" href="http://towriteistothink.com/2012/02/18/kids-google-and-the-increasing-speed-of-innovation/">it will kill Facebook in the long term</a> (probably even in the short term), <strong>but more importantly, it helps you follow your fans, since you can put them into circles</strong>. All photographers go in one circle, all reviewers in another, all bloggers in the next, and so on for technicians, bookers and concert organizers. Street team members go in yet another circle. Follow those people: they like your music and could be valuable to you.</li>
<li><strong>Speaking of street teams:</strong> I once talked to a record label employee. She told me that in her experience <strong>the single best promotion tool for a band is a street team</strong>. Unfortunately, I have seen many street teams that did not receive the respect they deserved from band members. A band&#8217;s street team has an incredible impact. I once wondered how a poster of my favourite band came to be placed in the train station of a small German city. Only much later did I realize that this must have been street team members&#8230; On the web, by being interactive, you can build your own global street team at almost no cost &#8211; besides time! Treat your street team well. Have a street team meeting at least once a year, have all band members come, and have a great party with them. You can also hold virtual meetings.</li>
<li>Last but not least: do all the things your promoter suggested at the very top of my post. Those are the core homework chores.</li>
</ol>
<p>To close: have fun making music, and enjoy the most recent video of my band.</p>
<p><iframe width="560" height="315" src="http://www.youtube.com/embed/exSsKgUVu54" frameborder="0" allowfullscreen></iframe></p>


<p>Related posts:<ol><li><a href='http://www.rene-pickhardt.de/question-by-filip-stilin-house-on-mars-what-do-you-think-of-bandcamp/' rel='bookmark' title='Question by Filip Stilin (House on Mars): What do you think of Bandcamp?'>Question by Filip Stilin (House on Mars): What do you think of Bandcamp?</a></li>
<li><a href='http://www.rene-pickhardt.de/you-should-use-youtube-promoted-videos-for-your-music-band/' rel='bookmark' title='you should use Youtube promoted videos for your music band'>you should use Youtube promoted videos for your music band</a></li>
<li><a href='http://www.rene-pickhardt.de/stop-facebook-filterbubble-of-facebooks-news-stream-wall/' rel='bookmark' title='Stop Facebook &#8211; Filterbubble of facebook&#8217;s news stream &amp; wall'>Stop Facebook &#8211; Filterbubble of facebook&#8217;s news stream &#038; wall</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.rene-pickhardt.de/question-of-the-day-how-the-hell-do-we-reach-more-people/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.rene-pickhardt.de/question-of-the-day-how-the-hell-do-we-reach-more-people/</feedburner:origLink></item>
		<item>
		<title>Google Pregel vs Signal Collect for distributed Graph Processing – pros and cons</title>
		<link>http://feedproxy.google.com/~r/Rene-pickhardt/~3/oOPr8057tZ4/</link>
		<comments>http://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/#comments</comments>
		<pubDate>Sun, 19 Feb 2012 17:05:49 +0000</pubDate>
		<dc:creator>Rene</dc:creator>
				<category><![CDATA[Reading Club]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[graph database]]></category>
		<category><![CDATA[graph processing]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[neo4j]]></category>
		<category><![CDATA[Pregel]]></category>
		<category><![CDATA[Signal Collect]]></category>

		<guid isPermaLink="false">http://www.rene-pickhardt.de/?p=1134</guid>
		<description><![CDATA[One of the reading club assignments was to read the papers about Google Pregel and Signal Collect, compare them and point out the pros and cons of both approaches. So after reading both papers as well as Claudio&#8217;s overview of Pregel clones and taking some notes, here are my thoughts, but first a short summary [...]


Related posts:<ol><li><a href='http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/' rel='bookmark' title='From Graph (batch) processing towards a distributed graph data base'>From Graph (batch) processing towards a distributed graph data base</a></li>
<li><a href='http://www.rene-pickhardt.de/some-thoughts-on-google-mapeduce-and-google-pregel-after-our-discussions-in-the-reading-club/' rel='bookmark' title='Some thoughts on Google Mapeduce and Google Pregel after our discussions in the Reading Club'>Some thoughts on Google Mapeduce and Google Pregel after our discussions in the Reading Club</a></li>
<li><a href='http://www.rene-pickhardt.de/claudio-martella-talks-fosdem-about-apache-giraph-distributed-graph-processing-in-the-cloud/' rel='bookmark' title='Claudio Martella talks @ FOSDEM about Apache Giraph: Distributed Graph Processing in the Cloud'>Claudio Martella talks @ FOSDEM about Apache Giraph: Distributed Graph Processing in the Cloud</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>One of the <a href="http://www.rene-pickhardt.de/some-thoughts-on-google-mapeduce-and-google-pregel-after-our-discussions-in-the-reading-club/">reading club assignments</a> was to read the papers about Google Pregel and Signal Collect, compare them and point out the pros and cons of both approaches.</p>
<p>After reading both papers as well as <a target="_blank" href="http://blog.acaro.org/entry/google-pregel-the-rise-of-the-clones">Claudio&#8217;s overview of Pregel clones</a> and taking some notes, here are my thoughts. But first, a short summary of both papers.</p>
<h3>Summary of Google Pregel</h3>
<p>The methodology is heavily based on the <a target="_blank" href="http://en.wikipedia.org/wiki/Bulk_synchronous_parallel">Bulk Synchronous Parallel model (BSP)</a> and also has some similarities to MapReduce (which can be seen as having just one superstep). The main idea is to spread the data over several machines and to divide the computation into supersteps. In each superstep every vertex of the graph executes a function given by the programmer: it reads the messages sent to it in the previous superstep, may update its value and sends messages to other vertices, which are delivered in the next superstep.<br />
This makes it possible to process large graphs that are distributed over several machines. The paper describes how to use checkpoints to increase fault tolerance and how to make good use of the Google File System in order to partition the graph data across the workers. The authors mention that smarter hashing functions could distribute the vertices not randomly but according to how they are connected in the graph, which could potentially increase performance.<br />
Overall, the goal of Google Pregel seems to be to make it possible to process large graph data and gain knowledge from it. The focus does not seem to be on using the computing power of the distributed system as efficiently as possible. Instead, it seems to be on creating a system that makes the distribution of data that does not fit on one machine possible, at a decent speed, with little effort for the programmer, and with methods for increasing fault tolerance.</p>
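<p>To make the superstep idea more concrete, here is a minimal single-process sketch in Python of the vertex-centric model, using the maximum-value propagation example from the Pregel paper. The function names and the dict-based graph format are hypothetical, and there is no distribution, checkpointing or GFS involved: the sketch only shows the control flow, i.e. a vertex reads last superstep&#8217;s messages, updates its value, sends messages, votes to halt, and is reactivated by incoming messages.</p>

```python
from collections import defaultdict


def compute(superstep, value, messages, out_edges):
    """Hypothetical vertex program: propagate the maximum value through the graph."""
    new_value = max([value] + messages)
    changed = new_value != value
    # Send on the first superstep, and afterwards only when the value has grown.
    if superstep == 0 or changed:
        outgoing = [(target, new_value) for target in out_edges]
    else:
        outgoing = []
    return new_value, outgoing


def run_pregel(values, edges, max_supersteps=100):
    """Run synchronous supersteps until no vertex is active and no message is in flight."""
    values = dict(values)
    inbox = defaultdict(list)
    active = set(values)
    superstep = 0
    while (active or inbox) and superstep != max_supersteps:
        next_inbox = defaultdict(list)
        # A vertex runs if it is still active or if it has received messages.
        for v in active | set(inbox):
            values[v], outgoing = compute(superstep, values[v], inbox[v], edges.get(v, []))
            for target, msg in outgoing:
                next_inbox[target].append(msg)
        # Barrier: messages sent in superstep S only become visible in superstep S + 1.
        inbox = next_inbox
        # In this sketch every vertex votes to halt after each step; messages reactivate it.
        active = set()
        superstep += 1
    return values, superstep


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(run_pregel({"a": 3, "b": 6, "c": 2, "d": 1}, graph))
    # ({'a': 6, 'b': 6, 'c': 6, 'd': 6}, 4)
```

<p>Because outgoing messages are collected in a separate inbox for the next superstep, the order in which vertices run within one superstep does not change the result. This is exactly what the synchronization barrier buys, and also what it costs.</p>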
<h3>Summary of Signal Collect</h3>
<p>Signal Collect as a system is pretty similar to Google Pregel. The main difference is that the authors introduce threshold scores, which are used to decide whether a vertex should collect its signals or whether it should send signals. Using these scores, the processing of algorithms can be accelerated: in every superstep, signal and collect operations are only performed where a certain threshold is exceeded.<br />
From here the authors argue that one can get rid of the superstep model altogether and make the entire computation asynchronous. This is done by randomizing which vertices perform their signal and collect operations next (as long as their scores exceed the thresholds).<br />
The entire system is implemented on a single machine, but the vertices of the compute graph are processed by different workers (in this setting, threads). All threads share the main memory of the system, which makes message passing for signal and collect operations unnecessary. The authors show that in the asynchronous setting the runtime of the algorithm decreases in inverse proportion to the number of workers. They also give evidence that different scheduling strategies suit different graphs or algorithms.</p>
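<p>The score-guided, asynchronous execution can be sketched as follows. This is a minimal single-process Python sketch, not the Signal Collect framework or its API: it runs PageRank, and a vertex only signals when its signal score, assumed here to be how much its state has changed since it last signaled, exceeds a threshold. There are no supersteps; an eligible vertex is picked at random. For brevity the collect step runs at the receiving vertices right after each signal. The graph format, the function name and the score definition are assumptions for illustration.</p>

```python
import random


def async_pagerank(out_edges, damping=0.85, threshold=1e-4, seed=42):
    """Score-guided asynchronous PageRank sketch (hypothetical helper).

    out_edges maps each vertex to the list of vertices it links to.
    Returns the final states and the number of signal operations performed.
    """
    rng = random.Random(seed)
    vertices = list(out_edges)
    state = {v: 1 - damping for v in vertices}
    last_signaled = {v: 0.0 for v in vertices}
    # inbox[t][s] keeps the latest signal that vertex s sent to vertex t
    inbox = {v: {} for v in vertices}
    signal_ops = 0

    while True:
        # Signal score (assumption): how far a vertex's state moved since it last signaled.
        pending = [v for v in vertices if abs(state[v] - last_signaled[v]) > threshold]
        if not pending:
            break  # every score is below the threshold: stop
        v = rng.choice(pending)  # no barrier: any eligible vertex may go next
        targets = out_edges[v]
        # Signal: send this vertex's share of rank along each out-edge.
        for t in targets:
            inbox[t][v] = state[v] / len(targets)
        last_signaled[v] = state[v]
        signal_ops += 1
        # Collect: each receiving vertex recomputes its state from its latest signals.
        for t in targets:
            state[t] = (1 - damping) + damping * sum(inbox[t].values())
    return state, signal_ops


if __name__ == "__main__":
    ranks, ops = async_pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
    print(ops, {v: round(r, 3) for v, r in ranks.items()})
```

<p>Every signal removes a vertex&#8217;s pending change and passes on at most a <code>damping</code> fraction of it, so the total pending change shrinks and the loop terminates. Swapping <code>rng.choice</code> for another rule (for example, always picking the vertex with the highest score) is where a different scheduling strategy would plug in.</p>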
<h3>Discussion of Pros and Cons</h3>
<ul>
<li>From the text above it might seem obvious that Signal Collect, with its asynchronous programming model, is superior. But &#8211; unlike the authors &#8211; I have so far not mentioned the drawbacks of one small but important detail: all the workers share common state, which they can access via random access to the machine&#8217;s main memory. This is what allows their model to be so fast while being asynchronous. It is not clear how to maintain this speed in a real distributed system, where that shared state would have to be accessed over the network. In this sense, Signal Collect only gives a proof of concept that an abstract programming model for graph processing exists and that it enables fast distribution in theory.</li>
<li>Pregel, on the other hand, is a real framework that actually distributes large data sets across clusters of several thousand machines, which is certainly a huge advantage.</li>
<li>Signal Collect claims to be more general than Pregel, since Pregel supports only one vertex type and stores edges implicitly, whereas Signal Collect is able to store RDF graphs. However, as far as I understand it, Signal Collect can only send a signal from one vertex to another if an edge exists between them, and it is not able to add or remove edges or vertices. Pregel, by contrast, lets a vertex send messages to any vertex whose identifier it knows and supports changes to the graph topology. In this sense I still think that Pregel is the more general system, but I guess my point of view is debatable.</li>
<li>In my opinion, Pregel&#8217;s big drawback is that the system is not optimized for speed. As already discussed in the last meeting of the reading club, MapReduce &#8211; with its single-superstep approach &#8211; is able to start backup tasks towards the end of the computation in order to fight stragglers. Pregel has to wait for those stragglers in every superstep in order to make its synchronization barriers possible.</li>
<li>Another point that is unique to Pregel is its deep integration with the Google File System (by the way, I am almost through the Google File System paper, and even if you already know the idea, it is absolutely worth reading it and understanding the arguments behind its design decisions). So far I am not sure whether this integration is a strength or a weakness, because I can&#8217;t yet see all the implications.<strong> However, it supports my argument that for a distributed system, things like network protocols and file systems should be considered, since they seem to have a strong impact on the entire system.</strong></li>
<li>In my opinion, both systems fail to treat graph partitioning and the choice of network protocol as important tasks. Especially for Pregel I do not understand this, since it already generates so much network traffic. Partitioning the graph might increase start-up traffic on the one hand, but it could reduce overall traffic in the long term.</li>
</ul>
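<p>The straggler argument can be illustrated with a toy cost model. This is an assumption for illustration, not a measurement of either system, and the timings below are made up: with a synchronization barrier, each superstep lasts as long as its slowest worker, so a worker that is slow in just one superstep delays everybody, while without barriers each worker only pays for its own work.</p>

```python
def bsp_runtime(step_times):
    """Total time when every superstep ends with a barrier.

    step_times[s][w] is how long worker w needs in superstep s; the barrier
    makes each superstep last as long as its slowest worker.
    """
    return sum(max(workers) for workers in step_times)


def barrier_free_runtime(step_times):
    """Idealized bound without barriers: each worker runs its own steps back to back.

    This ignores that real workers still depend on each other's messages.
    """
    num_workers = len(step_times[0])
    return max(sum(step[w] for step in step_times) for w in range(num_workers))


# Hypothetical timings: 3 supersteps, 4 workers, and a different slow worker each time.
times = [
    [1.0, 1.0, 5.0, 1.0],
    [1.0, 1.0, 1.0, 5.0],
    [5.0, 1.0, 1.0, 1.0],
]
print(bsp_runtime(times))           # 15.0
print(barrier_free_runtime(times))  # 7.0
```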
<h3>Outlooks and personal thoughts:</h3>
<p>I am considering inviting the authors of both papers to next week&#8217;s reading club. It would be even more interesting to discuss these and other questions directly with the people who built these systems.</p>
<p>I also like Schegi&#8217;s idea of seeing what happens if one actually runs several neo4j servers on different machines and uses a model similar to Signal Collect or Pregel to perform some computations. This way, a programming model would be given, and research could focus on the core distribution framework while relying on good technologies for the workers.</p>
<p>For the development of the first version of <a target="_blank" title="metal community" href="http://www.metalcon.de">metalcon</a> we used <a target="_blank" href="http://memcached.org/">memcached</a>. I have read a lot <a target="_blank" href="http://highscalability.com/bunch-great-strategies-using-memcached-and-mysql-better-together">that memcached scales perfectly horizontally across several machines</a>. I wonder how memcached could be integrated into Signal Collect in order to make the asynchronous computation possible in a distributed fashion. Since random access memory is a bottleneck in any application, I suggest putting the <a target="_blank" href="http://dl.acm.org/citation.cfm?id=1012894">original memcached paper</a> on our reading list.</p>
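<p>Part of why memcached scales horizontally is that the servers do not talk to each other: the client decides which server stores a key, and many clients can do this with consistent hashing. Below is a minimal Python sketch of that idea applied to vertex state, as one might use it for a distributed Signal Collect. Plain dicts stand in for memcached servers; the node names, the number of ring points per node and the <code>vertex:&lt;id&gt;</code> key format are assumptions, and a real integration would go through an actual memcached client instead.</p>

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to nodes so that adding a node only moves a fraction of the keys."""

    def __init__(self, nodes, points_per_node=100):
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            for i in range(points_per_node):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # First ring point after the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical setup: three cache nodes, each simulated by a dict.
nodes = {name: {} for name in ("cache-a", "cache-b", "cache-c")}
ring = ConsistentHashRing(nodes)


def put_vertex_state(vertex_id, state):
    key = f"vertex:{vertex_id}"
    nodes[ring.node_for(key)][key] = state


def get_vertex_state(vertex_id):
    key = f"vertex:{vertex_id}"
    return nodes[ring.node_for(key)].get(key)


put_vertex_state(1, 0.25)
print(get_vertex_state(1))  # 0.25
```

<p>When a fourth node joins, only the keys that now fall on its ring points move, and all of them move to the new node; every other vertex stays where it was, which is what makes adding machines cheap.</p>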
<p>One last point to mention is that neither system yet seems suitable as a technology for building a distributed graph data base that enables online query processing.</p>


<p>Related posts:<ol><li><a href='http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/' rel='bookmark' title='From Graph (batch) processing towards a distributed graph data base'>From Graph (batch) processing towards a distributed graph data base</a></li>
<li><a href='http://www.rene-pickhardt.de/some-thoughts-on-google-mapeduce-and-google-pregel-after-our-discussions-in-the-reading-club/' rel='bookmark' title='Some thoughts on Google Mapeduce and Google Pregel after our discussions in the Reading Club'>Some thoughts on Google Mapeduce and Google Pregel after our discussions in the Reading Club</a></li>
<li><a href='http://www.rene-pickhardt.de/claudio-martella-talks-fosdem-about-apache-giraph-distributed-graph-processing-in-the-cloud/' rel='bookmark' title='Claudio Martella talks @ FOSDEM about Apache Giraph: Distributed Graph Processing in the Cloud'>Claudio Martella talks @ FOSDEM about Apache Giraph: Distributed Graph Processing in the Cloud</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/</feedburner:origLink></item>
		<item>
		<title>President Obama on Google+ talking to people</title>
		<link>http://feedproxy.google.com/~r/Rene-pickhardt/~3/saYszkXwDaY/</link>
		<comments>http://www.rene-pickhardt.de/president-obama-on-google-talking-to-people/#comments</comments>
		<pubDate>Thu, 16 Feb 2012 07:03:14 +0000</pubDate>
		<dc:creator>Rene</dc:creator>
				<category><![CDATA[Webscience]]></category>
		<category><![CDATA[googleplus]]></category>
		<category><![CDATA[hangout]]></category>
		<category><![CDATA[obama]]></category>
		<category><![CDATA[youtube]]></category>

		<guid isPermaLink="false">http://www.rene-pickhardt.de/?p=1127</guid>
		<description><![CDATA[Not really news, since it happened about 20 days ago, but here is a nice YouTube summary of President Obama&#8217;s public Hangout with the American people. Kind of amazing that he actually did this. I am really looking forward to the time when these kinds of events are not amazing anymore but rather standard [...]


Related posts:<ol><li><a href='http://www.rene-pickhardt.de/how-to-invite-people-to-google-plus/' rel='bookmark' title='how to invite people to Google plus'>how to invite people to Google plus</a></li>
<li><a href='http://www.rene-pickhardt.de/google-2011-q4-earnings/' rel='bookmark' title='Google 2011 Q4 Earnings'>Google 2011 Q4 Earnings</a></li>
<li><a href='http://www.rene-pickhardt.de/google-facebook-co-are-not-free/' rel='bookmark' title='Google, Facebook &amp; co. are not free!'>Google, Facebook &#038; co. are not free!</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>Not really news, since it happened about 20 days ago, but here is a nice YouTube summary of President Obama&#8217;s public Hangout with the American people.</p>
<p><iframe width="560" height="315" src="http://www.youtube.com/embed/HMCanwEsy_k" frameborder="0" allowfullscreen></iframe></p>
<p>Kind of amazing that he actually did this. I am really looking forward to the time when these kinds of events are not amazing anymore but rather standard, as they should be. It is also interesting to see which technology he chose. Well, I guess it was not for the publicity of Google Plus but rather for the brand recognition of YouTube. Anyway, I didn&#8217;t see Facebook anywhere. Although I think questions like this should not be decided by how a president behaves, it is still interesting to see Obama choose Google over Facebook these days&#8230;</p>


<p>Related posts:<ol><li><a href='http://www.rene-pickhardt.de/how-to-invite-people-to-google-plus/' rel='bookmark' title='how to invite people to Google plus'>how to invite people to Google plus</a></li>
<li><a href='http://www.rene-pickhardt.de/google-2011-q4-earnings/' rel='bookmark' title='Google 2011 Q4 Earnings'>Google 2011 Q4 Earnings</a></li>
<li><a href='http://www.rene-pickhardt.de/google-facebook-co-are-not-free/' rel='bookmark' title='Google, Facebook &amp; co. are not free!'>Google, Facebook &#038; co. are not free!</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.rene-pickhardt.de/president-obama-on-google-talking-to-people/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.rene-pickhardt.de/president-obama-on-google-talking-to-people/</feedburner:origLink></item>
		<item>
		<title>Some thoughts on Google Mapeduce and Google Pregel after our discussions in the Reading Club</title>
		<link>http://feedproxy.google.com/~r/Rene-pickhardt/~3/09hJvHf5eA4/</link>
		<comments>http://www.rene-pickhardt.de/some-thoughts-on-google-mapeduce-and-google-pregel-after-our-discussions-in-the-reading-club/#comments</comments>
		<pubDate>Wed, 15 Feb 2012 16:54:44 +0000</pubDate>
		<dc:creator>Rene</dc:creator>
				<category><![CDATA[Reading Club]]></category>
		<category><![CDATA[GFS]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>

		<guid isPermaLink="false">http://www.rene-pickhardt.de/?p=1123</guid>
		<description><![CDATA[The&#160;first meeting of our reading club was quite a success. Everyone was well prepared and we discussed some issues about Google&#8217;s Map Reduce framework and I had the feeling that everyone now better understands what is going on there. I will now post a summary of what has been discussed and will also post some [...]


Related posts:<ol><li><a href='http://www.rene-pickhardt.de/reading-club-on-graph-databases-and-distributed-systems/' rel='bookmark' title='Reading club on Graph databases and distributed systems'>Reading club on Graph databases and distributed systems</a></li>
<li><a href='http://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/' rel='bookmark' title='Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons'>Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons</a></li>
<li><a href='http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/' rel='bookmark' title='From Graph (batch) processing towards a distributed graph data base'>From Graph (batch) processing towards a distributed graph data base</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.rene-pickhardt.de/reading-club-on-graph-databases-and-distributed-systems/">first meeting of our reading club</a> was quite a success. Everyone was well prepared, we discussed some issues about Google&#8217;s MapReduce framework, and I had the feeling that everyone now better understands what is going on there. I will now post a summary of what was discussed, and at the end of this post I will also post some feedback and the reading for next week. Most importantly: the reading club will meet next week on Wednesday, February 22nd, at 2 pm CET.</p>
<h3>Summary</h3>
<p>The first takeaway, which was already well known, is that there is a stack of Google papers with corresponding Apache implementations:</p>
<ol>
<li><a target="_blank" href="http://www.cs.brown.edu/courses/cs295-11/2006/gfs.pdf">Google File System</a> vs <a target="_blank" href="http://hadoop.apache.org/hdfs/">Apache Hadoop filesystem</a></li>
<li><a target="_blank" href="http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/bigtable-osdi06.pdf">Google Bigtable</a> vs <a target="_blank" href="http://hbase.apache.org/">Apache HBase</a></li>
<li><a target="_blank" href="http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/mapreduce-osdi04.pdf">Google MapReduce</a> vs <a target="_blank" href="http://hadoop.apache.org/">Apache Hadoop</a></li>
<li><a target="_blank" href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">Google Pregel</a> vs <a target="_blank" href="http://incubator.apache.org/giraph/">Apache Giraph</a></li>
</ol>
<p>The latter are all based either on GFS or on HDFS. Therefore we agreed that a detailed understanding of GFS (the Google File System) is necessary to fully understand the MapReduce implementation. We don&#8217;t want to discuss GFS together yet, but everyone should be well aware of it, and there will be room for further questions about it at next week&#8217;s reading club.</p>
<p>We discussed MapReduce&#8217;s advantage over Pregel&#8217;s approach when it comes to handling stragglers. Since MapReduce is a one-step system, stragglers are easy to deal with: as soon as a task takes too long, start it again on a different machine and use whichever copy finishes first. This handles stragglers caused by faulty machines very well. The superstep model in Pregel has &#8211; to our knowledge &#8211; no clear solution to this kind of straggler (<strong>coming up with a strategy to handle them would be a very nice research topic!</strong>). On the other hand, Pregel has another kind of straggler that comes from supernodes, i.e. vertices with a very high degree. There are <a target="_blank" href="http://www.few.vu.nl/~ekr/papers/icdcn11.pdf">some papers that address these problems</a>; one of them is on the reading list for next week.</p>
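<p>The backup-task idea from the MapReduce paper can be illustrated with a small toy model, using made-up task durations and a hypothetical helper name: once the phase is close to completion, every task that is still running is also started on a healthy machine, and a task counts as finished as soon as either copy finishes.</p>

```python
def job_time(durations, backup_start=None, backup_duration=1.0):
    """Completion time of a single map or reduce phase in a toy model.

    durations[i] is how long task i takes on its original (possibly faulty) machine.
    If backup_start is given, every task still running at that time is also
    started on a healthy machine, and it finishes with whichever copy ends first.
    """
    finish_times = []
    for d in durations:
        if backup_start is not None and d > backup_start:
            d = min(d, backup_start + backup_duration)
        finish_times.append(d)
    return max(finish_times)


# Hypothetical phase: nine normal tasks and one straggler on a faulty machine.
tasks = [1.0] * 9 + [10.0]
print(job_time(tasks))                    # 10.0
print(job_time(tasks, backup_start=1.0))  # 2.0
```

<p>In a superstep model the same trick would have to be applied at every barrier, and it does not help with supernode stragglers, since a backup copy of a supernode&#8217;s work is just as slow as the original.</p>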
<p>We discussed that partitioning the data in a smart way would make processing more efficient. We agreed that for MapReduce and Pregel, where you just want to process the graph in a cloud, this is not the most important thing. But for a real-time graph data base, data partitioning will most certainly be a crucial point. Here again we saw the strong connection to the Google File System, since GFS does a lot of the partitioning in the current approaches.</p>
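<p>Why partitioning matters for a distributed graph data base can be seen by counting cut edges, i.e. edges whose endpoints are stored on different machines, since each of them turns a local lookup or message into network traffic. The sketch below compares a hash-based placement in the style of Pregel&#8217;s default <code>hash(ID) mod N</code> with a placement that keeps each of two communities on one machine. The graph is a made-up toy example, and the helper names are hypothetical.</p>

```python
import zlib


def hash_partition(vertex, num_parts):
    # Deterministic stand-in for hash(ID) mod N (Python's built-in str hash is randomized per process).
    return zlib.crc32(vertex.encode("utf-8")) % num_parts


def cut_edges(edges, placement):
    """Number of edges whose endpoints are assigned to different machines."""
    return sum(1 for u, v in edges if placement[u] != placement[v])


# Toy graph (hypothetical): two fully connected communities joined by a single edge.
community_a = ["a1", "a2", "a3", "a4"]
community_b = ["b1", "b2", "b3", "b4"]
edges = [(u, v) for group in (community_a, community_b)
         for i, u in enumerate(group) for v in group[i + 1:]]
edges.append(("a1", "b1"))

vertices = community_a + community_b
hashed = {v: hash_partition(v, 2) for v in vertices}
by_community = {v: 0 if v.startswith("a") else 1 for v in vertices}

print(len(edges), cut_edges(edges, hashed), cut_edges(edges, by_community))
```

<p>The community-aware placement is balanced (four vertices per machine) and cuts only the bridge edge, while a hash placement ignores the structure and will in general cut many of the dense community edges. For batch processing this mostly costs throughput; for online queries every cut edge on a query path is an extra network round trip.</p>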
<p>Achim pointed out that Microsoft also has some proprietary products in this area. It would be nice if someone could provide more detailed resources. He also wished that we would focus on the problems first and talk about distribution afterwards, i.e. approach this top-down.</p>
<p>We also discussed whether frameworks that use MapReduce to process large graphs have been compared with Pregel or Apache Giraph so far. <strong>Such an evaluation would also be a very interesting research topic</strong>. For that reason, and to better understand what happens when large graphs are processed with MapReduce, we added the last two papers to the reading list.</p>
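<p>To illustrate what processing a graph with plain MapReduce (without a Pregel-like framework) looks like, here is a single-process Python sketch of one PageRank iteration written as map, shuffle and reduce steps. It follows the common pattern of emitting each vertex&#8217;s adjacency list alongside its rank contributions so that the reducer can rebuild the graph structure for the next iteration. The function names and the record format are assumptions; in a real deployment every iteration would be a separate MapReduce job that rereads and rewrites the whole graph.</p>

```python
from collections import defaultdict


def map_fn(vertex, record):
    """Emit the graph structure plus one rank contribution per out-edge."""
    rank, neighbors = record
    yield vertex, ("graph", neighbors)
    for n in neighbors:
        yield n, ("rank", rank / len(neighbors))


def reduce_fn(vertex, values, damping=0.85):
    """Rebuild the vertex record with its new rank."""
    neighbors, total = [], 0.0
    for kind, payload in values:
        if kind == "graph":
            neighbors = payload
        else:
            total += payload
    return vertex, ((1 - damping) + damping * total, neighbors)


def pagerank_iteration(graph):
    """One MapReduce job: map every record, group by key (shuffle), reduce every group."""
    groups = defaultdict(list)
    for vertex, record in graph.items():
        for key, value in map_fn(vertex, record):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())


# Hypothetical input: vertex -> (rank, out-neighbors)
graph = {"a": (1.0, ["b", "c"]), "b": (1.0, ["c"]), "c": (1.0, ["a"])}
for _ in range(20):  # each iteration would be a separate MapReduce job
    graph = pagerank_iteration(graph)
print({v: round(rank, 3) for v, (rank, _) in graph.items()})
```

<p>Shipping the full adjacency lists through the shuffle in every iteration is exactly the overhead that Pregel avoids by keeping the graph resident on the workers between supersteps.</p>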
<h3>Feedback from you guys</h3>
<p>After the club was over, I asked everyone for suggestions and got some useful feedback:</p>
<ul>
<li>We should prepare more than one paper</li>
<li>Using Google Hangout with many people in one room is a little hard (either introduce everyone at the beginning, or everyone brings a notebook, or groups of people sit in front of one camera)</li>
<li>We need more focus on the paper we are currently discussing. Questions about it should be collected one or two days before we meet and integrated into the agenda.</li>
<li>We need some checkpoints for every paper. Everyone should state: what do I like, what do I not like, what could be further research, what do I want to discuss, what do I not understand?</li>
<li>We need a reading pool to which everyone can commit papers</li>
</ul>
<h3>New Rules</h3>
<p>In order to incorporate your feedback, I thought of some rules for next week&#8217;s meeting. I am not sure whether they are the best rules, and if they don&#8217;t work we can easily change them.</p>
<ul>
<li>There is a list of papers to be discussed (see below)</li>
<li>At the end of each meeting we fix 3-6 papers from the paper pool that are to be prepared for the next week</li>
<li>Before the club meets, everyone should commit some more papers to the pool that they would like to read the week after (you can do this in the comments here or via email)</li>
<li>If several people are in the same room, they should sit together in front of one camera</li>
<li>A short introduction of who is there at the beginning</li>
<li>Use the checkpoints to discuss papers</li>
<li>No discussions of brand-new solutions and ideas. Write them down, send a mail, or discuss them elsewhere. The reading club is for collectively understanding the papers that we are reading.</li>
</ul>
<p>Last but not least: the focus is on creating ideas and doing research on distributed real-time graph data base solutions. That is why we first want to understand graph processing.</p>
<h3>Reading tasks for next week</h3>
<p>For a better understanding of the basics (not to be discussed):</p>
<ul>
<li><a target="_blank" href="http://www.cs.brown.edu/courses/cs295-11/2006/gfs.pdf">Google File System</a></li>
</ul>
<p>These papers cover Pregel and another approach that does not have this rigid superstep model; the last one introduces some methods to fight stragglers that come from the graph topology:</p>
<ul>
<li><a target="_blank" href="http://www.ifi.uzh.ch/pax/uploads/pdf/publication/1450/SignalCollect.pdf">Signal Collect</a></li>
<li><a target="_blank" href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">Google Pregel</a></li>
<li><a target="_blank" href="http://www.few.vu.nl/~ekr/papers/icdcn11.pdf">HipG: Parallel Processing of Large-Scale Graphs</a></li>
</ul>
<p>And finally, two more papers that discuss how MapReduce can be used to process large graphs without a Pregel-like framework:</p>
<ul>
<li><a target="_blank" href="http://www.math.cmu.edu/~ctsourak/pegasusICDM09.pdf">Pegasus Paper</a></li>
<li>
<div id="_mcePaste"><a target="_blank" href="http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Schatz_MLG2010.pdf">Design Patterns for Efficient Graph Algorithms in MapReduce</a></div>
</li>
</ul>
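<p>Without a Pregel-like framework, an iterative graph algorithm becomes a chain of MapReduce jobs, one job per iteration. The following Python sketch simulates breadth-first search (shortest paths with unit edge weights) in that style. It is my own illustration, not code from either paper: the function names are hypothetical and the in-memory grouping only stands in for Hadoop&#8217;s shuffle. Note that the mapper has to re-emit each node&#8217;s adjacency list so that the reducer can rebuild the graph for the next job.</p>

```python
from collections import defaultdict

INF = float("inf")


def map_phase(node_id, record):
    """record = (distance, adjacency_list); emits (key, value) pairs."""
    distance, adjacency = record
    # Pass the graph structure along so the reducer can rebuild the node.
    yield node_id, ("NODE", record)
    if distance != INF:
        for neighbour in adjacency:
            yield neighbour, ("DIST", distance + 1)


def reduce_phase(node_id, values):
    """Keep the adjacency list and the smallest distance seen for this node."""
    adjacency, best = [], INF
    for kind, payload in values:
        if kind == "NODE":
            best = min(best, payload[0])
            adjacency = payload[1]
        else:
            best = min(best, payload)
    return node_id, (best, adjacency)


def mapreduce_iteration(graph):
    """One simulated MapReduce job: map, group by key (shuffle), reduce."""
    grouped = defaultdict(list)
    for node_id, record in graph.items():
        for key, value in map_phase(node_id, record):
            grouped[key].append(value)
    return dict(reduce_phase(k, grouped[k]) for k in sorted(grouped))


def parallel_bfs(graph):
    """Run one job per iteration until no distance changes."""
    iterations = 0
    while True:
        new_graph = mapreduce_iteration(graph)
        iterations += 1
        if new_graph == graph:
            return new_graph, iterations
        graph = new_graph


# Usage: source "s" has distance 0, every other node starts at infinity.
graph = {"s": (0, ["a", "b"]), "a": (INF, ["c"]), "b": (INF, ["c"]), "c": (INF, [])}
result, iterations = parallel_bfs(graph)
```

<p>The last job changes nothing and only detects convergence; in a real Hadoop setup that check would typically be done with a counter rather than by comparing whole graphs.</p>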
<h3>More feedback is welcome</h3>
<p>If you have suggestions for the rules, other remarks on things we haven&#8217;t thought of, or if you just want to read other papers, feel free to comment here. This way everyone who is interested can contribute to the discussion.</p>


<p>Related posts:<ol><li><a href='http://www.rene-pickhardt.de/reading-club-on-graph-databases-and-distributed-systems/' rel='bookmark' title='Reading club on Graph databases and distributed systems'>Reading club on Graph databases and distributed systems</a></li>
<li><a href='http://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/' rel='bookmark' title='Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons'>Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons</a></li>
<li><a href='http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/' rel='bookmark' title='From Graph (batch) processing towards a distributed graph data base'>From Graph (batch) processing towards a distributed graph data base</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.rene-pickhardt.de/some-thoughts-on-google-mapeduce-and-google-pregel-after-our-discussions-in-the-reading-club/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		<feedburner:origLink>http://www.rene-pickhardt.de/some-thoughts-on-google-mapeduce-and-google-pregel-after-our-discussions-in-the-reading-club/</feedburner:origLink></item>
		<item>
		<title>Reading club on Graph databases and distributed systems</title>
		<link>http://feedproxy.google.com/~r/Rene-pickhardt/~3/CXU9hweTqbI/</link>
		<comments>http://www.rene-pickhardt.de/reading-club-on-graph-databases-and-distributed-systems/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 14:57:14 +0000</pubDate>
		<dc:creator>Rene</dc:creator>
				<category><![CDATA[Reading Club]]></category>

		<guid isPermaLink="false">http://www.rene-pickhardt.de/?p=1108</guid>
		<description><![CDATA[Update: find a summary of last meeting and the current reading list for next week&#8217;s meeting here. Teaching is over for this term, so for the next couple of weeks I want to spend a lot of time working on some research topics that are on my mind. My goal is to finally write [...]


Related posts:<ol><li><a href='http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/' rel='bookmark' title='From Graph (batch) processing towards a distributed graph data base'>From Graph (batch) processing towards a distributed graph data base</a></li>
<li><a href='http://www.rene-pickhardt.de/some-thoughts-on-google-mapeduce-and-google-pregel-after-our-discussions-in-the-reading-club/' rel='bookmark' title='Some thoughts on Google Mapeduce and Google Pregel after our discussions in the Reading Club'>Some thoughts on Google Mapeduce and Google Pregel after our discussions in the Reading Club</a></li>
<li><a href='http://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/' rel='bookmark' title='Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons'>Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><strong>Update: <a href="http://www.rene-pickhardt.de/some-thoughts-on-google-mapeduce-and-google-pregel-after-our-discussions-in-the-reading-club/">find a summary of last meeting and the current reading list for next week&#8217;s meeting here</a>.</strong></p>
<p>Teaching is over for this term, so for the next couple of weeks I want to spend a lot of time working on some research topics that are on my mind. My goal is to finally write down my PhD proposal and have a well-organized written structure for the rest of my PhD.</p>
<h3>The main topics for 2012 in the scientific part of my life can be summarized by these bullet points:</h3>
<ul>
<li>Graph databases</li>
<li>distributed systems</li>
<li>distributed computing</li>
<li>distribution of graph databases</li>
<li>dynamic hash tables</li>
<li>peer-to-peer networks</li>
<li>graph database query languages (since they seem to have a deep impact on the technologies that support everything)</li>
<li>real-time graph processing</li>
</ul>
<p>So the reading club will read, and most importantly understand and discuss, papers that belong to these categories.</p>
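<p>As a small illustration of the distribution and hash-table topics above: a distributed hash table in a peer-to-peer network has to decide which peer owns which key, and a common technique for this is consistent hashing. The following Python sketch is my own illustration, not taken from any of the papers (the class and method names are hypothetical). It assigns graph vertices to peers so that adding a peer only moves vertices onto that new peer.</p>

```python
import bisect
import hashlib


def _hash(key):
    # Stable hash; Python's built-in hash() of str is salted per process.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical vertex-to-peer placement using consistent hashing."""

    def __init__(self, nodes, replicas=50):
        self.replicas = replicas  # virtual nodes per peer, smooths the load
        self._ring = []           # sorted list of (hash, peer)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def owner(self, vertex_id):
        """The first virtual node clockwise from the vertex's hash owns it."""
        if not self._ring:
            raise ValueError("ring has no peers")
        h = _hash(str(vertex_id))
        idx = bisect.bisect(self._ring, (h, ""))
        return self._ring[idx % len(self._ring)][1]


# Usage: place 1000 vertices on three peers, then add a fourth peer.
ring = ConsistentHashRing(["peer-1", "peer-2", "peer-3"])
before = {v: ring.owner(v) for v in range(1000)}
ring.add_node("peer-4")
after = {v: ring.owner(v) for v in range(1000)}
```

<p>Every vertex either keeps its old owner or moves to <code>peer-4</code>; with plain modulo hashing almost all vertices would move. The value of 50 virtual nodes per peer is an arbitrary choice for the sketch.</p>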
<h3>I will start with the following selection of papers:</h3>
<ul>
<li><a target="_blank" href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">Google Pregel</a></li>
<li><a target="_blank" href="http://research.google.com/archive/mapreduce.html">Map Reduce</a></li>
<li><a target="_blank" href="http://dl.acm.org/citation.cfm?id=1251183">Beehive</a></li>
</ul>
<h3>Time and Place</h3>
<p>The reading club will take place every Wednesday at 2 pm CET in D116, the &#8220;Kreuzverweisraum&#8221;.</p>
<p><strong>For next week, I expect everyone who wants to join to have read the MapReduce paper by Wednesday.</strong></p>
<p>I will keep everyone up to date with the results of the reading club and the announcements of next week&#8217;s readings.</p>
<h3>How to join on the web!</h3>
<p>I have shared a Google+ circle with everyone who is interested:<br />
<a href ="https://plus.google.com/115250982031867883098/posts/AhSZgvbKYs8">https://plus.google.com/115250982031867883098/posts/AhSZgvbKYs8</a><br />
Contact me if you want to be included in the circle. The circle will be invited to a hangout every Wednesday at about 2 pm CET (Central European Time).<br />
If anyone knows a better technology for the conference call, feel free to tell me.</p>


<p>Related posts:<ol><li><a href='http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/' rel='bookmark' title='From Graph (batch) processing towards a distributed graph data base'>From Graph (batch) processing towards a distributed graph data base</a></li>
<li><a href='http://www.rene-pickhardt.de/some-thoughts-on-google-mapeduce-and-google-pregel-after-our-discussions-in-the-reading-club/' rel='bookmark' title='Some thoughts on Google Mapeduce and Google Pregel after our discussions in the Reading Club'>Some thoughts on Google Mapeduce and Google Pregel after our discussions in the Reading Club</a></li>
<li><a href='http://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/' rel='bookmark' title='Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons'>Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.rene-pickhardt.de/reading-club-on-graph-databases-and-distributed-systems/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://www.rene-pickhardt.de/reading-club-on-graph-databases-and-distributed-systems/</feedburner:origLink></item>
		<item>
		<title>Birds of a feather: Graph processing future trends in Graph Devroom</title>
		<link>http://feedproxy.google.com/~r/Rene-pickhardt/~3/mFt0db6ghBE/</link>
		<comments>http://www.rene-pickhardt.de/birds-of-a-feather-graph-processing-future-trends-in-graph-devroom/#comments</comments>
		<pubDate>Sun, 05 Feb 2012 11:00:56 +0000</pubDate>
		<dc:creator>Rene</dc:creator>
				<category><![CDATA[FOSDEM]]></category>

		<guid isPermaLink="false">http://www.rene-pickhardt.de/?p=1096</guid>
		<description><![CDATA[Since one of the talks was canceled, the organisers of the Graph Devroom at FOSDEM used the opportunity to hold a public discussion with all the developers about future trends in graph processing. I really liked the idea, but unfortunately the discussion never really took off. I guess for a discussion [...]


Related posts:<ol><li><a href='http://www.rene-pickhardt.de/nils-grunwald-from-linkfluence-talks-at-fosdem-about-cascalog-for-graph-processing/' rel='bookmark' title='Nils Grunwald from Linkfluence talks at FOSDEM about Cascalog for graph processing'>Nils Grunwald from Linkfluence talks at FOSDEM about Cascalog for graph processing</a></li>
<li><a href='http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/' rel='bookmark' title='From Graph (batch) processing towards a distributed graph data base'>From Graph (batch) processing towards a distributed graph data base</a></li>
<li><a href='http://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/' rel='bookmark' title='Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons'>Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>Since one of the talks was canceled, the organisers of the Graph Devroom at FOSDEM used the opportunity to hold a public discussion with all the developers about future trends in graph processing. I really liked the idea, but unfortunately the discussion never really took off. I guess for a discussion like this people need to be better prepared.</p>
<ul>
<li>Topics were <a target="_blank" href="https://github.com/tinkerpop/blueprints/wiki/">Blueprints</a> (a common graph access API) created by <a target="_blank" href="http://markorodriguez.com/">Marko Rodriguez</a></li>
<li>The problem of real-time graph processing</li>
<li>Benchmarking issues (we need standards for benchmarking)</li>
<li>Someone from OrientDB raised the question whether graph databases should really have <a target="_blank" href="http://en.wikipedia.org/wiki/ACID">ACID transactions</a>.</li>
<li>Max De Marzi raised the question whether graphs change during processing or are rather static.</li>
<li>Achim pointed out that relational databases are actually a special case of graph databases. He urged vendors to generalize more and consolidate the technologies.</li>
</ul>
<p>The room was not as full as for the previous talks, but still about half of the seats were filled, as you can see in this short video:</p>
<p><iframe frameborder="0" src="http://www.youtube.com/embed/85ex0DGhx4A" height="315" width="560"></iframe></p>
<h3>My thoughts on the ACID discussion</h3>
<p>I think the ACID question was interesting. Alistair from Neo4j gave a good response, saying that it clearly depends on the use case and the kind of transactions you really need. He compared this to the relational database world, where you might have the option of switching ACID off.</p>
<p>He said that in Neo4j you cannot switch off ACID, as Neo4j believes this is the best choice for most of their customers. But he admitted that there are use cases where you might want to switch ACID off.</p>
<h3>Changing graphs vs static graphs</h3>
<p>I think this is also a very important question. On the one hand, we have <a href="http://www.rene-pickhardt.de/claudio-martella-talks-fosdem-about-apache-giraph-distributed-graph-processing-in-the-cloud/">static models like Giraph</a> that are able to answer queries on huge static graphs; on the other hand, there are situations like <a href="http://www.rene-pickhardt.de/graphity">Graphity, where the data fluctuates quickly</a>. Unfortunately, for the latter there is no technology I know of (besides the non-public systems at Facebook, Twitter and Google+).</p>
<p>The entire discussion was recorded here:<br />
<iframe width="560" height="315" src="http://www.youtube.com/embed/1H0jtWY03zA" frameborder="0" allowfullscreen></iframe></p>


<p>Related posts:<ol><li><a href='http://www.rene-pickhardt.de/nils-grunwald-from-linkfluence-talks-at-fosdem-about-cascalog-for-graph-processing/' rel='bookmark' title='Nils Grunwald from Linkfluence talks at FOSDEM about Cascalog for graph processing'>Nils Grunwald from Linkfluence talks at FOSDEM about Cascalog for graph processing</a></li>
<li><a href='http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/' rel='bookmark' title='From Graph (batch) processing towards a distributed graph data base'>From Graph (batch) processing towards a distributed graph data base</a></li>
<li><a href='http://www.rene-pickhardt.de/google-pregel-vs-signal-collect-for-distributed-graph-processing-pros-and-cons/' rel='bookmark' title='Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons'>Google Pregel vs Signal Collect for distributed Graph Processing &#8211; pros and cons</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.rene-pickhardt.de/birds-of-a-feather-graph-processing-future-trends-in-graph-devroom/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.rene-pickhardt.de/birds-of-a-feather-graph-processing-future-trends-in-graph-devroom/</feedburner:origLink></item>
		<item>
		<title>Nils Grunwald from Linkfluence talks at FOSDEM about Cascalog for graph processing</title>
		<link>http://feedproxy.google.com/~r/Rene-pickhardt/~3/zwYMhieSglE/</link>
		<comments>http://www.rene-pickhardt.de/nils-grunwald-from-linkfluence-talks-at-fosdem-about-cascalog-for-graph-processing/#comments</comments>
		<pubDate>Sun, 05 Feb 2012 10:10:03 +0000</pubDate>
		<dc:creator>Rene</dc:creator>
				<category><![CDATA[FOSDEM]]></category>
		<category><![CDATA[graph processing]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[Social Network]]></category>

		<guid isPermaLink="false">http://www.rene-pickhardt.de/?p=1090</guid>
		<description><![CDATA[Nils Grunwald works at the French startup Linkfluence. Their product is essentially social network analysis and graph processing. They crawl the web and blogs, or obtain other social network data, and provide their customers with statistics and insights. In this scenario big data is obviously involved and the data carries a [...]


Related posts:<ol><li><a href='http://www.rene-pickhardt.de/claudio-martella-talks-fosdem-about-apache-giraph-distributed-graph-processing-in-the-cloud/' rel='bookmark' title='Claudio Martella talks @ FOSDEM about Apache Giraph: Distributed Graph Processing in the Cloud'>Claudio Martella talks @ FOSDEM about Apache Giraph: Distributed Graph Processing in the Cloud</a></li>
<li><a href='http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/' rel='bookmark' title='From Graph (batch) processing towards a distributed graph data base'>From Graph (batch) processing towards a distributed graph data base</a></li>
<li><a href='http://www.rene-pickhardt.de/birds-of-a-feather-graph-processing-future-trends-in-graph-devroom/' rel='bookmark' title='Birds of a feather: Graph processing future trends in Graph Devroom'>Birds of a feather: Graph processing future trends in Graph Devroom</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>Nils Grunwald works at the French startup Linkfluence. Their product is essentially social network analysis and graph processing. They crawl the web and blogs, or obtain other social network data, and provide their customers with statistics and insights.</p>
<p>In this scenario big data is obviously involved, and the data naturally has the structure of a graph. He said that a system to process the data has the following constraints:</p>
<ul>
<li>The processing should not compromise the rest of the system</li>
<li>Low maintenance costs</li>
<li>Used for queries and rapid prototyping (so they want a &#8220;general&#8221; graph processing solution as customer needs change)</li>
<li>Flexible, since it is hard to tell beforehand which fields or metadata will be used</li>
</ul>
<p>He then introduced <a target="_blank" href="http://nathanmarz.com/blog/introducing-cascalog-a-clojure-based-query-language-for-hado.html">Cascalog</a>, the solution they use. It is based on Hadoop and inspired by Cascading (a workflow management system) and Datalog (a subset of Prolog), which, as a declarative and expressive language, offers a very concise way of writing queries and enables quick prototyping.</p>
<p>For me personally it is not a very interesting solution, since it cannot answer queries in real time, which is of course obvious given the technologies it is based on. But I guess for people who have time and just do analysis, this solution will probably work pretty well!</p>
<p>What I really liked about his solution is that after processing the graph you can export the data to Gephi or to <a href="http://www.rene-pickhardt.de/tag/neo4j">Neo4j</a> for fast query processing.</p>
<p>He then explained a lot of specific details about Cascalog&#8217;s syntax:</p>
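<p>To give a feeling for the Datalog style of querying, here is a hedged Python sketch that evaluates two rules over a small, made-up set of link facts: a join (pairs of blogs connected by a path of length two) and a count aggregate (in-degree). It does not use Cascalog&#8217;s syntax or API; the rule notation in the docstrings is plain Datalog, and the aggregate is an extension beyond pure Datalog of the kind Cascalog offers.</p>

```python
from collections import Counter

# Hypothetical facts: links(X, Y) means blog X links to blog Y.
links = {("alice", "bob"), ("bob", "carol"), ("bob", "dave"), ("carol", "dave")}


def two_hop(edges):
    """two_hop(X, Z) :- links(X, Y), links(Y, Z), X != Z."""
    # Index the second links/2 goal by its first argument (a simple hash join).
    by_source = {}
    for x, y in edges:
        by_source.setdefault(x, set()).add(y)
    return {(x, z) for x, y in edges for z in by_source.get(y, ()) if x != z}


def in_degree(edges):
    """in_degree(Y, count(X)) :- links(X, Y).  (aggregate, beyond pure Datalog)"""
    return Counter(y for _, y in edges)


pairs = two_hop(links)
degrees = in_degree(links)
```

<p>The point of the declarative style is that the rule states <em>what</em> to join; how the join is executed (here a dictionary lookup, in Cascalog a series of Hadoop jobs) is left to the engine.</p>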
<div id="attachment_1091" class="wp-caption alignnone" style="width: 650px"><a href="http://www.rene-pickhardt.de/wp-content/uploads/2012/02/nils-grundwald-fosdem.jpg"><img class="size-full wp-image-1091" title="nils grundwald fosdem" src="http://www.rene-pickhardt.de/wp-content/uploads/2012/02/nils-grundwald-fosdem.jpg" alt="nils grundwald fosdem" width="640" height="480" /></a>
<p class="wp-caption-text">Nils Grunwald from Linkfluence talks about Cascalog at FOSDEM</p>
</div>
<p><iframe width="560" height="315" src="http://www.youtube.com/embed/3czv_E9ALNE" frameborder="0" allowfullscreen></iframe></p>


<p>Related posts:<ol><li><a href='http://www.rene-pickhardt.de/claudio-martella-talks-fosdem-about-apache-giraph-distributed-graph-processing-in-the-cloud/' rel='bookmark' title='Claudio Martella talks @ FOSDEM about Apache Giraph: Distributed Graph Processing in the Cloud'>Claudio Martella talks @ FOSDEM about Apache Giraph: Distributed Graph Processing in the Cloud</a></li>
<li><a href='http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/' rel='bookmark' title='From Graph (batch) processing towards a distributed graph data base'>From Graph (batch) processing towards a distributed graph data base</a></li>
<li><a href='http://www.rene-pickhardt.de/birds-of-a-feather-graph-processing-future-trends-in-graph-devroom/' rel='bookmark' title='Birds of a feather: Graph processing future trends in Graph Devroom'>Birds of a feather: Graph processing future trends in Graph Devroom</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.rene-pickhardt.de/nils-grunwald-from-linkfluence-talks-at-fosdem-about-cascalog-for-graph-processing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.rene-pickhardt.de/nils-grunwald-from-linkfluence-talks-at-fosdem-about-cascalog-for-graph-processing/</feedburner:origLink></item>
		<item>
		<title>Claudio Martella talks @ FOSDEM about Apache Giraph: Distributed Graph Processing in the Cloud</title>
		<link>http://feedproxy.google.com/~r/Rene-pickhardt/~3/VJYHm1AeuYo/</link>
		<comments>http://www.rene-pickhardt.de/claudio-martella-talks-fosdem-about-apache-giraph-distributed-graph-processing-in-the-cloud/#comments</comments>
		<pubDate>Sun, 05 Feb 2012 09:01:45 +0000</pubDate>
		<dc:creator>Rene</dc:creator>
				<category><![CDATA[FOSDEM]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[giraph]]></category>
		<category><![CDATA[graph database]]></category>
		<category><![CDATA[graph processing]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://www.rene-pickhardt.de/?p=1085</guid>
		<description><![CDATA[Claudio Martella introduces Apache Giraph, which according to him is a loose implementation of Google Pregel, presented at SIGMOD in 2010. He points out that MapReduce is not well suited for graph processing. He then gave an example of how MapReduce can be used to calculate PageRank. He points out that PageRank can be calculated [...]


Related posts:<ol><li><a href='http://www.rene-pickhardt.de/nils-grunwald-from-linkfluence-talks-at-fosdem-about-cascalog-for-graph-processing/' rel='bookmark' title='Nils Grunwald from Linkfluence talks at FOSDEM about Cascalog for graph processing'>Nils Grunwald from Linkfluence talks at FOSDEM about Cascalog for graph processing</a></li>
<li><a href='http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/' rel='bookmark' title='From Graph (batch) processing towards a distributed graph data base'>From Graph (batch) processing towards a distributed graph data base</a></li>
<li><a href='http://www.rene-pickhardt.de/birds-of-a-feather-graph-processing-future-trends-in-graph-devroom/' rel='bookmark' title='Birds of a feather: Graph processing future trends in Graph Devroom'>Birds of a feather: Graph processing future trends in Graph Devroom</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<div><a target="_blank" href="http://blog.acaro.org/">Claudio Martella</a> introduces <a target="_blank" href="http://incubator.apache.org/giraph/">Apache Giraph</a>, which according to him is a loose implementation of <a target="_blank" href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">Google Pregel</a>, presented at SIGMOD in 2010. He points out that MapReduce is not well suited for graph processing.</div>
<div>
<p>He then gave an example of how MapReduce can be used to calculate PageRank. He pointed out that PageRank can be computed in a distributed way as a local property of the graph: each vertex calculates its rank from the information it receives from its neighbours. He did this to show what, in his opinion, the drawbacks of this method are:</p>
<ul>
<li>Job bootstrapping takes some time</li>
<li>The disk is hit about 6 times</li>
<li>The data is sorted</li>
<li>The graph itself is passed through the job every time</li>
</ul>
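<p>These drawbacks become visible when such a job is written down. Below is a minimal Python sketch of one PageRank iteration as a MapReduce job. It is my own illustration under simplifying assumptions (damping factor 0.85, no dangling vertices, an in-memory dictionary standing in for Hadoop&#8217;s shuffle and sort), and the function names are hypothetical. The mapper must re-emit every vertex&#8217;s adjacency list, so the whole graph is shuffled and sorted again in every iteration, and each iteration is a separate job.</p>

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor


def pagerank_map(node, rank, out_links):
    # The graph structure is re-emitted in every iteration (drawback: shuffled again).
    yield node, ("GRAPH", out_links)
    for target in out_links:
        yield target, ("RANK", rank / len(out_links))


def pagerank_reduce(node, values, n):
    """Sum the incoming rank shares and reattach the adjacency list."""
    out_links, incoming = [], 0.0
    for kind, payload in values:
        if kind == "GRAPH":
            out_links = payload
        else:
            incoming += payload
    return node, ((1 - DAMPING) / n + DAMPING * incoming, out_links)


def pagerank_job(graph):
    """One MapReduce job = one PageRank iteration. graph: node -> (rank, out_links)."""
    n = len(graph)
    shuffle = defaultdict(list)
    for node, (rank, out_links) in graph.items():
        for key, value in pagerank_map(node, rank, out_links):
            shuffle[key].append(value)
    # The framework sorts the keys before the reduce phase.
    return dict(pagerank_reduce(k, shuffle[k], n) for k in sorted(shuffle))


# Usage: a -> b, a -> c, b -> c, c -> a, all ranks starting at 1/3.
graph = {"a": (1 / 3, ["b", "c"]), "b": (1 / 3, ["c"]), "c": (1 / 3, ["a"])}
for _ in range(10):  # ten iterations = ten separate jobs
    graph = pagerank_job(graph)
```

<p>In Hadoop, each of those jobs also reads its input from and writes its output to the distributed file system, which is where the repeated disk access comes from.</p>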
<p>As in the Pregel paper, he said that other graph algorithms, such as single-source shortest paths, have the same problems.</p>
<div>
<dl id="attachment_1086">
<dt><a href="http://www.rene-pickhardt.de/wp-content/uploads/2012/02/Claudio-Martella-fosdem.jpg"><img title="Claudio Martella fosdem" src="http://www.rene-pickhardt.de/wp-content/uploads/2012/02/Claudio-Martella-fosdem.jpg" alt="Claudio Martella from Apache explains how giraph works at in the graph dev room @ Fosdem 2012" width="576" height="432" /></a></dt>
<dd>Claudio Martella from Apache explains how Giraph works in the graph devroom at FOSDEM 2012</dd>
</dl>
</div>
<p>After explaining in more detail how Pregel is implemented on top of the existing MapReduce infrastructure for distribution, he said that this system has some advantages over MapReduce:</p>
<ul>
<li>It&#8217;s a stateful computation</li>
<li>The disk is hit only for checkpoints, if at all</li>
<li>No sorting is necessary</li>
<li>Only messages hit the network</li>
</ul>
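<p>For comparison with the MapReduce PageRank example from the talk, here is the computation in a vertex-centric, Pregel-like style, again as a hedged single-process Python sketch rather than Giraph&#8217;s actual Java API (the <code>Vertex</code> class, its <code>compute</code> method and <code>run_pregel_pagerank</code> are hypothetical). Each vertex keeps its value and out-edges in memory across supersteps, so nothing is sorted and the graph is never re-emitted; only the rank messages would travel over the network.</p>

```python
from collections import defaultdict


class Vertex:
    """Hypothetical vertex: state survives between supersteps."""

    def __init__(self, vid, out_edges, num_vertices):
        self.id = vid
        self.out_edges = out_edges          # stays local, never shuffled
        self.n = num_vertices
        self.value = 1.0 / num_vertices     # initial rank

    def compute(self, superstep, messages):
        """Update the local rank, then return (target, message) pairs to send."""
        if superstep >= 1:
            self.value = 0.15 / self.n + 0.85 * sum(messages)
        if not self.out_edges:
            return []
        share = self.value / len(self.out_edges)
        return [(target, share) for target in self.out_edges]


def run_pregel_pagerank(adjacency, supersteps=30):
    n = len(adjacency)
    vertices = {v: Vertex(v, out, n) for v, out in adjacency.items()}
    inbox = defaultdict(list)
    for step in range(supersteps):
        outbox = defaultdict(list)
        for vertex in vertices.values():
            for target, msg in vertex.compute(step, inbox[vertex.id]):
                outbox[target].append(msg)  # only messages cross the "network"
        inbox = outbox                      # barrier between supersteps
    return {v.id: v.value for v in vertices.values()}


# Usage: the same small graph a -> b, a -> c, b -> c, c -> a.
ranks = run_pregel_pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

<p>A real system would also need checkpointing for fault tolerance, which is where the remaining disk access mentioned in the talk comes in.</p>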
<p>He pointed out that the main advantage of Giraph over other systems (Hama, GoldenOrb, Signal/Collect) is the active community behind the project (Facebook, Yahoo, LinkedIn, Twitter). I personally think another advantage is that it is run by Apache, which already runs MapReduce (<a target="_blank" href="http://hadoop.apache.org/">Hadoop</a>) with great success, so it is something people trust.</p>
<p>Claudio explicitly pointed out that they are looking for more contributors, and I think this is a really interesting topic to work on! So thank you, Claudio, for your inspiring work!
</p></div>
<p>Here are the video streams from the graph devroom:<br />
<iframe width="560" height="315" src="http://www.youtube.com/embed/3ZrqPEIPRe4" frameborder="0" allowfullscreen></iframe></p>
<p><iframe width="560" height="315" src="http://www.youtube.com/embed/BmRaejKGeDM" frameborder="0" allowfullscreen></iframe></p>


<p>Related posts:<ol><li><a href='http://www.rene-pickhardt.de/nils-grunwald-from-linkfluence-talks-at-fosdem-about-cascalog-for-graph-processing/' rel='bookmark' title='Nils Grunwald from Linkfluence talks at FOSDEM about Cascalog for graph processing'>Nils Grunwald from Linkfluence talks at FOSDEM about Cascalog for graph processing</a></li>
<li><a href='http://www.rene-pickhardt.de/from-graph-batch-processing-towards-a-distributed-graph-data-base/' rel='bookmark' title='From Graph (batch) processing towards a distributed graph data base'>From Graph (batch) processing towards a distributed graph data base</a></li>
<li><a href='http://www.rene-pickhardt.de/birds-of-a-feather-graph-processing-future-trends-in-graph-devroom/' rel='bookmark' title='Birds of a feather: Graph processing future trends in Graph Devroom'>Birds of a feather: Graph processing future trends in Graph Devroom</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.rene-pickhardt.de/claudio-martella-talks-fosdem-about-apache-giraph-distributed-graph-processing-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.rene-pickhardt.de/claudio-martella-talks-fosdem-about-apache-giraph-distributed-graph-processing-in-the-cloud/</feedburner:origLink></item>
	</channel>
</rss>

