<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Dive into A Data Deluge</title>
	
	<link>http://diveintodata.org</link>
	<description>Discussion about Newly Emerging Issues on Database</description>
	<lastBuildDate>Wed, 27 Jan 2010 20:08:16 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/diveintodata" /><feedburner:info uri="diveintodata" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item>
		<title>Apple's Tablet, the iPad, Has Been Announced</title>
		<link>http://feedproxy.google.com/~r/diveintodata/~3/R3EfgC7b-wA/</link>
		<comments>http://diveintodata.org/2010/01/%ec%95%a0%ed%94%8c-%ed%83%80%ed%94%8c%eb%a6%bf-ipad-%eb%b0%9c%ed%91%9c-%eb%90%90%ea%b5%b0%ec%9a%94/#comments</comments>
		<pubDate>Wed, 27 Jan 2010 20:08:16 +0000</pubDate>
		<dc:creator>Hyunsik Choi</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[apple]]></category>
		<category><![CDATA[ipad]]></category>

		<guid isPermaLink="false">http://diveintodata.org/?p=598</guid>
		<description><![CDATA[




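There was a lot of noise before the launch, but it seems it was not just media hype after all. The links below cover the announcement, product photos, and a video. The starting price of $499 is a bit steep.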

http://www.engadget.com/2010/01/27/live-from-the-apple-tablet-latest-creation-event/
http://www.apple.com/ipad/
http://www.apple.com/ipad/#video

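What I found interesting is that the SDK, the programming guidelines, and the human interface guidelines were already prepared at the time of the announcement and were introduced on the website right away. Some Korean companies that rely on media hype as a matter of routine could learn a thing or two from this.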
]]></description>
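			<content:encoded><![CDATA[<p>There was a lot of noise before the launch, but it seems it was not just media hype after all. The links below cover the announcement, product photos, and a video. The starting price of $499 is a bit steep.</p>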
<ul>
<li><a href="http://www.engadget.com/2010/01/27/live-from-the-apple-tablet-latest-creation-event/">http://www.engadget.com/2010/01/27/live-from-the-apple-tablet-latest-creation-event/</a></li>
<li><a href="http://www.apple.com/ipad/">http://www.apple.com/ipad/</a></li>
<li><a href="http://www.apple.com/ipad/#video">http://www.apple.com/ipad/#video</a></li>
</ul>
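<p>What I found interesting is that the SDK, the programming guidelines, and the human interface guidelines were already prepared at the time of the announcement and were introduced on the website right away. Some Korean companies that rely on media hype as a matter of routine could learn a thing or two from this.</p>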
<img src="http://feeds.feedburner.com/~r/diveintodata/~4/R3EfgC7b-wA" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://diveintodata.org/2010/01/%ec%95%a0%ed%94%8c-%ed%83%80%ed%94%8c%eb%a6%bf-ipad-%eb%b0%9c%ed%91%9c-%eb%90%90%ea%b5%b0%ec%9a%94/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://diveintodata.org/2010/01/%ec%95%a0%ed%94%8c-%ed%83%80%ed%94%8c%eb%a6%bf-ipad-%eb%b0%9c%ed%91%9c-%eb%90%90%ea%b5%b0%ec%9a%94/</feedburner:origLink></item>
		<item>
		<title>A New Kind of Social Service – Sekai Camera</title>
		<link>http://feedproxy.google.com/~r/diveintodata/~3/VJujC7i4IOk/</link>
		<comments>http://diveintodata.org/2009/12/%ec%83%88%eb%a1%9c%ec%9a%b4-%ea%b0%9c%eb%85%90%ec%9d%98-%ec%86%8c%ec%85%9c-%ec%84%9c%eb%b9%84%ec%8a%a4-sekai-camera/#comments</comments>
		<pubDate>Tue, 22 Dec 2009 04:27:57 +0000</pubDate>
		<dc:creator>Hyunsik Choi</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[augmented reality]]></category>
		<category><![CDATA[iphone]]></category>
		<category><![CDATA[social networks]]></category>
		<category><![CDATA[ucc]]></category>

		<guid isPermaLink="false">http://diveintodata.org/?p=585</guid>
		<description><![CDATA[
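An app called Sekai Camera has reportedly been released on the App Store as a global version. From what I can see, it looks like a new kind of social service that combines augmented reality, UCC (user-created content), and social networks. Services like this, built on a variety of media and devices, have been springing up everywhere lately, and I look forward to seeing where things stand in three to five years. Many related data management issues will likely be raised as well. But Korean [...]]]></description>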
			<content:encoded><![CDATA[<p style="text-align: center;"><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/KgTwSXK_5dg&amp;hl=en_US&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/KgTwSXK_5dg&amp;hl=en_US&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
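<p>An app called Sekai Camera has reportedly been released on the App Store as a global version. From what I can see, it looks like a new kind of social service that combines augmented reality, UCC (user-created content), and social networks. Services like this, built on a variety of media and devices, have been springing up everywhere lately, and I look forward to seeing where things stand in three to five years. Many related data management issues will likely be raised as well. I do wonder, though, what ideas Korean IT companies are using to prepare for the future amid such rapid changes in media and technology.</p>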
<img src="http://feeds.feedburner.com/~r/diveintodata/~4/VJujC7i4IOk" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://diveintodata.org/2009/12/%ec%83%88%eb%a1%9c%ec%9a%b4-%ea%b0%9c%eb%85%90%ec%9d%98-%ec%86%8c%ec%85%9c-%ec%84%9c%eb%b9%84%ec%8a%a4-sekai-camera/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://diveintodata.org/2009/12/%ec%83%88%eb%a1%9c%ec%9a%b4-%ea%b0%9c%eb%85%90%ec%9d%98-%ec%86%8c%ec%85%9c-%ec%84%9c%eb%b9%84%ec%8a%a4-sekai-camera/</feedburner:origLink></item>
		<item>
		<title>How to Create A Table in HBase for Beginners</title>
		<link>http://feedproxy.google.com/~r/diveintodata/~3/e7M6U-4SMds/</link>
		<comments>http://diveintodata.org/2009/11/how-to-make-a-table-in-hbase-for-beginners/#comments</comments>
		<pubDate>Fri, 27 Nov 2009 02:33:36 +0000</pubDate>
		<dc:creator>Hyunsik Choi</dc:creator>
				<category><![CDATA[FOSS]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[create table]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hbase]]></category>
		<category><![CDATA[table]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://diveintodata.org/?p=527</guid>
		<description><![CDATA[I have accumulated some know-how about MapReduce, Hadoop, and HBase through several projects. From now on, I&#8217;ll post HBase tips periodically. Today, I&#8217;m going to show how to create an HBase table in Java.
HBase provides two ways for an HBase client to connect to the HBase master. One [...]]]></description>
			<content:encoded><![CDATA[<p>I have accumulated some know-how about MapReduce, Hadoop, and HBase through several projects. From now on, I&#8217;ll post HBase tips periodically. Today, I&#8217;m going to show how to create an HBase table in Java.</p>
<p>HBase provides two ways for an HBase client to connect to the HBase master. One is to use an instance of the <a href="http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/HBaseAdmin.html" target="_blank">HBaseAdmin</a> class, which provides methods for creating, modifying, and deleting tables and column families. The other is to use an instance of the HTable class, which mainly provides methods for manipulating data, such as inserting, modifying, and deleting rows and cells.</p>
<p>Thus, to create an HBase table, we first connect to the HBase master by creating an instance of HBaseAdmin, as in line 4. HBaseAdmin requires an instance of <a href="http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/HBaseConfiguration.html" target="_blank">HBaseConfiguration</a>. If necessary, you can set configuration properties, as in line 2.</p>
<p>To describe the HBase schema, we create an instance of <a href="http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/HColumnDescriptor.html" target="_blank">HColumnDescriptor</a> for each column family. In addition to the column family name, HColumnDescriptor lets you set various parameters, such as maxVersions, compression type, timeToLive, and bloomFilter. Each column family is then added to an HTableDescriptor (lines 8 and 9), and the table is created by invoking createTable, as in line 10.</p>
<pre class="brush: java;">
HBaseConfiguration conf = new HBaseConfiguration();
conf.set(&quot;hbase.master&quot;,&quot;localhost:60000&quot;); // optional: address of the HBase master

HBaseAdmin hbase = new HBaseAdmin(conf); // connects to the HBase master
HTableDescriptor desc = new HTableDescriptor(&quot;TEST&quot;); // table name
HColumnDescriptor personal = new HColumnDescriptor(&quot;personal&quot;.getBytes()); // column family &quot;personal&quot;
HColumnDescriptor account = new HColumnDescriptor(&quot;account&quot;.getBytes()); // column family &quot;account&quot;
desc.addFamily(personal);
desc.addFamily(account);
hbase.createTable(desc); // creates the table with both column families
</pre>
<p>Finally, you can check your HBase table with the following commands.</p>
<pre class="brush: bash;">
c0d3h4ck@code:~/Development/hbase$ bin/hbase shell
HBase Shell; enter 'help&lt;RETURN&gt;' for list of supported commands.
Version: 0.20.1, r822817, Wed Oct  7 11:55:42 PDT 2009
hbase(main):001:0&gt; list
TEST

1 row(s) in 0.0940 seconds
</pre>
<img src="http://feeds.feedburner.com/~r/diveintodata/~4/e7M6U-4SMds" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://diveintodata.org/2009/11/how-to-make-a-table-in-hbase-for-beginners/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://diveintodata.org/2009/11/how-to-make-a-table-in-hbase-for-beginners/</feedburner:origLink></item>
		<item>
		<title>ACM SIGMOD 2010 Programming Contest</title>
		<link>http://feedproxy.google.com/~r/diveintodata/~3/n4RDIQua-Wc/</link>
		<comments>http://diveintodata.org/2009/11/acm-sigmod-2010-programming-contest/#comments</comments>
		<pubDate>Fri, 20 Nov 2009 11:44:06 +0000</pubDate>
		<dc:creator>Hyunsik Choi</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[acm]]></category>
		<category><![CDATA[distributed systems]]></category>
		<category><![CDATA[programming contest]]></category>
		<category><![CDATA[relational database]]></category>
		<category><![CDATA[SIGMOD]]></category>

		<guid isPermaLink="false">http://diveintodata.org/?p=515</guid>
		<description><![CDATA[As you know, SIGMOD is ACM&#8217;s Special Interest Group on Management of Data. SIGMOD holds an annual conference that is regarded as one of the best conferences in computer science. SIGMOD also organizes a programming contest in parallel with the ACM SIGMOD conference. Below is the call for this year&#8217;s programming contest. [...]]]></description>
			<content:encoded><![CDATA[<p>As you know, SIGMOD is ACM&#8217;s Special Interest Group on Management of Data. SIGMOD holds an annual conference that is regarded as one of the best conferences in computer science. SIGMOD also organizes a programming contest in parallel with the ACM SIGMOD conference. Below is the call for this year&#8217;s programming contest. This year&#8217;s contest topic seems very interesting! The task is to implement a simple distributed query executor built on top of last year&#8217;s main-memory index. The environment on which contestants will test their implementations may be provided by Amazon. If you are interested in the contest, give it a try. You can get further information here (<a href="http://dbweb.enst.fr/events/sigmod10contest/" target="_blank">http://dbweb.enst.fr/events/sigmod10contest</a>).</p>
<blockquote><p>A programming contest is organized in parallel with the ACM SIGMOD 2010 conference, following the success of the first annual SIGMOD programming contest organized last year. Student teams from degree-granting institutions are invited to compete to develop a distributed query engine over relational data. Submissions will be judged on the overall performance of the system on a variety of workloads. A shortlist of finalists will be invited to present their implementation at the SIGMOD conference in June 2010 in Indianapolis, USA. The winning team, to be selected during the conference, will be awarded a prize of 5,000 USD and will be invited to a one-week research visit in Paris. The winning system, released in open source, will form a building block of a complete distributed database system which will be built over the years, throughout the programming contests.</p></blockquote>
<img src="http://feeds.feedburner.com/~r/diveintodata/~4/n4RDIQua-Wc" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://diveintodata.org/2009/11/acm-sigmod-2010-programming-contest/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://diveintodata.org/2009/11/acm-sigmod-2010-programming-contest/</feedburner:origLink></item>
		<item>
		<title>CIKM 2009 in Hong Kong</title>
		<link>http://feedproxy.google.com/~r/diveintodata/~3/0_OrfHqXzy0/</link>
		<comments>http://diveintodata.org/2009/11/cikm-2009-in-hong-kong/#comments</comments>
		<pubDate>Mon, 09 Nov 2009 15:08:26 +0000</pubDate>
		<dc:creator>Hyunsik Choi</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[cikm]]></category>
		<category><![CDATA[cikm09]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[Hong Kong]]></category>
		<category><![CDATA[spider]]></category>

		<guid isPermaLink="false">http://diveintodata.org/?p=473</guid>
		<description><![CDATA[Together with Min Kyoung Sung, a coauthor of &#8216;SPIDER: A System for Scalable, Parallel / Distributed Evaluation of large-scale RDF Data&#8217;, I attended the 18th ACM CIKM 2009 (Conference on Information and Knowledge Management), held in Hong Kong. We stayed at the Marriott Hotel near AsiaWorld-Expo, where CIKM 2009 was held. At [...]]]></description>
			<content:encoded><![CDATA[<p>Together with Min Kyoung Sung, a coauthor of &#8216;<a href="http://dbserver.korea.ac.kr/projects/spider/" target="_blank"><em>SPIDER: A System for Scalable, Parallel / Distributed Evaluation of large-scale RDF Data</em></a>&#8217;, I attended the <a href="http://www.comp.polyu.edu.hk/conference/cikm2009/about/index.htm" target="_blank">18th ACM CIKM 2009 (Conference on Information and Knowledge Management)</a>, held in Hong Kong. We stayed at the Marriott Hotel near <a href="http://www.asiaworld-expo.com/" target="_blank">AsiaWorld-Expo</a>, where CIKM 2009 was held. At the conference, I spent time with several Korean researchers (Kyong-Ha Lee, Jinoh Oh, and Sangchul Kim), and during the demonstration session I discussed SPIDER with researchers interested in RDF data processing.</p>
<p>At CIKM 2009, I felt that the trend in web data management is shifting toward information extraction and semantic or structured web data rather than unstructured data. Many papers and posters addressed these issues. In addition, the subject of the panel was ‘<span><strong><em>Information extraction meets relational databases: Where are we heading?</em></strong></span>’ One of the panelists said that the focus of web data management research is moving from crawling, indexing, and searching to information extraction and semantic data. These changes raise various new data and knowledge management issues. Besides information extraction, graph data mining was one of the hot topics at CIKM 2009.</p>
<p>In the main keynote, Kyu-Young Whang (KAIST, Korea) spoke on &#8216;<span style="font-style: italic; font-weight: bold;">DB-IR Integration and Its Application to a Massively-Parallel Search Engine</span>&#8217;. His key point was that DB-IR integration is becoming one of the major challenges in the database area and is leading to new DBMS architectures suited to it. In addition, Edward Chang (Google Research China) and Clement Yu (University of Illinois at Chicago) spoke on &#8216;<strong><em>Confucius and its intelligent Disciples</em></strong>&#8217; and &#8216;<strong><em>Advanced Metasearch Engines</em></strong>&#8217;, respectively.</p>
<p style="text-align: center;"><a class="flickr-image alignnone" title="Coffee Break at CIKM 2009" rel="flickr-mgr[CIKM]" href="http://www.flickr.com/photos/hyunsik/4088464259/"><img class="flickr-medium" src="http://farm3.static.flickr.com/2764/4088464259_4f6498eca2_m.jpg" alt="Coffee Break at CIKM 2009" /></a><a class="flickr-image alignnone" title="SPIDER in Demo Session" rel="flickr-mgr[CIKM]" href="http://www.flickr.com/photos/hyunsik/4088463803/"><img class="flickr-medium" src="http://farm3.static.flickr.com/2752/4088463803_b53bbd8646_m.jpg" alt="SPIDER in Demo Session" /></a></p>
<p style="text-align: center;"><a class="flickr-image alignnone" title="Tian Tan Buddha Statue in Hong Kong" rel="flickr-mgr[CIKM]" href="http://www.flickr.com/photos/hyunsik/4088461317/"><img class="flickr-medium" src="http://farm3.static.flickr.com/2609/4088461317_5546d70eff_m.jpg" alt="Tian Tan Buddha Statue in Hong Kong" /></a><a class="flickr-image alignnone" title="The lunch time in CIKM 2009" rel="flickr-mgr[CIKM]" href="http://www.flickr.com/photos/hyunsik/4088462251/"><img class="flickr-medium" src="http://farm3.static.flickr.com/2591/4088462251_d5875a68e3_m.jpg" alt="The lunch time in CIKM 2009" /></a></p>
<p>This conference was a really nice experience for me. I enjoyed the conference, the reception, and the banquet. However, I regret that I didn&#8217;t attend <a href="http://www.clouddb.org/CloudDB09/" target="_blank">the 1st Workshop CloudDB 2009</a>, held in conjunction with CIKM 2009.</p>
<p>In any case, this conference inspired both Min Kyoung Sung and me. We will remember it for a long time.</p>
<img src="http://feeds.feedburner.com/~r/diveintodata/~4/0_OrfHqXzy0" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://diveintodata.org/2009/11/cikm-2009-in-hong-kong/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		<feedburner:origLink>http://diveintodata.org/2009/11/cikm-2009-in-hong-kong/</feedburner:origLink></item>
		<item>
		<title>MapReduce Online Comes Out!</title>
		<link>http://feedproxy.google.com/~r/diveintodata/~3/ZTE2NNQt_uo/</link>
		<comments>http://diveintodata.org/2009/10/mapreduce-onlie-comes-out/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 15:49:37 +0000</pubDate>
		<dc:creator>Hyunsik Choi</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[map-reduce]]></category>
		<category><![CDATA[online aggregation]]></category>
		<category><![CDATA[stream queries]]></category>

		<guid isPermaLink="false">http://diveintodata.org/?p=461</guid>
		<description><![CDATA[MapReduce has been gaining much attention in the field of data-intensive computing. As you know, it is a very popular framework for batch processing.
Recently, however, Tyson Condie, a Ph.D. student at UC Berkeley, released MapReduce Online. I heard this news today from Data Beta. It is impressive work, since the original [...]]]></description>
			<content:encoded><![CDATA[<p>MapReduce has been gaining much attention in the field of data-intensive computing. As you know, it is a very popular framework for batch processing.</p>
<p>Recently, however, Tyson Condie, a Ph.D. student at UC Berkeley, released <a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-136.html" target="_self">MapReduce Online</a>. I heard this news today from <a href="http://databeta.wordpress.com/2009/10/18/mapreduce-online/" target="_self">Data Beta</a>. It is impressive work, since the original MapReduce was designed and specialized for batch processing only. Moreover, most people believed that MapReduce would remain a batch-processing framework.</p>
<p>The essence of MapReduce Online is that it tries to preserve the fault-tolerance model of the <a href="http://labs.google.com/papers/mapreduce.html" target="_self">original MapReduce</a> while pipelining results across tasks and jobs instead of materializing the output of each MapReduce task and job to disk. Consequently, MapReduce Online lets a program return results earlier from a big job.</p>
<p>You can get further information from <a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-136.html" target="_self">MapReduce Online</a>.</p>
<img src="http://feeds.feedburner.com/~r/diveintodata/~4/ZTE2NNQt_uo" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://diveintodata.org/2009/10/mapreduce-onlie-comes-out/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://diveintodata.org/2009/10/mapreduce-onlie-comes-out/</feedburner:origLink></item>
		<item>
		<title>BSP Library on Hadoop?</title>
		<link>http://feedproxy.google.com/~r/diveintodata/~3/8lAjCvhr7mA/</link>
		<comments>http://diveintodata.org/2009/10/bsp-library-on-hadoop/#comments</comments>
		<pubDate>Fri, 09 Oct 2009 11:45:33 +0000</pubDate>
		<dc:creator>Hyunsik Choi</dc:creator>
				<category><![CDATA[FOSS]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[angrapa]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[bsp]]></category>
		<category><![CDATA[bulk synchronization parallel]]></category>
		<category><![CDATA[distributed systems]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hama]]></category>

		<guid isPermaLink="false">http://diveintodata.org/?p=443</guid>
		<description><![CDATA[Recently, I joined the Hama project (a distributed scientific package on Hadoop for massive matrix and graph data), and I have been spending time developing a bulk synchronous parallel (BSP) library on Hadoop (HAMA-195), with help from Edward Yoon, a founder of the Hama project. The motivation for the BSP library is [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I joined the <a href="http://incubator.apache.org/hama/" target="_self">Hama project</a> (a distributed scientific package on Hadoop for massive matrix and graph data), and I have been spending time developing a <a href="http://en.wikipedia.org/wiki/Bulk_synchronous_parallel" target="_self">bulk synchronous parallel</a> (BSP) library on Hadoop (<a href="https://issues.apache.org/jira/browse/HAMA-195" target="_self">HAMA-195</a>), with help from <a href="http://blog.udanax.org/" target="_self">Edward Yoon</a>, a founder of the Hama project. The motivation for the BSP library is clear.</p>
<p>Hadoop is installed at cloud computing service providers and many companies, as you can see at <a href="http://wiki.apache.org/hadoop/PoweredBy" target="_self">http://wiki.apache.org/hadoop/PoweredBy</a>. However, most of them probably run only MapReduce programs. Although MapReduce is very scalable, it offers only a simple programming model. Many programmers want a wider choice of programming models without changing the platform (i.e., <a href="http://hadoop.apache.org" target="_self">Hadoop</a>). The BSP library is a first step toward meeting that need. Like MapReduce, however, BSP is no Swiss Army knife. Once we find suitable applications, the BSP library on Hadoop will prove its value through its scalability and capability.</p>
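<p>For readers who have not met the BSP model before, here is a minimal single-JVM sketch of its shape: local computation, communication, then a barrier before the next superstep. It is only an illustration built on <code>java.util.concurrent.CyclicBarrier</code>; it is not the Hama API, and the class name <code>BspSketch</code> and the shared <code>mailbox</code> array are assumptions made for this example.</p>
<pre class="brush: java;">
import java.util.concurrent.CyclicBarrier;

// Hypothetical illustration of the BSP model; not part of Hama or Hadoop.
public class BspSketch {
  // Runs values.length worker threads and returns, per worker,
  // the sum of the squares of all input values.
  public static long[] run(long[] values) {
    int n = values.length; // needs at least one value (CyclicBarrier rejects 0 parties)
    long[] mailbox = new long[n]; // slot i holds the message sent by worker i
    long[] result = new long[n];
    CyclicBarrier barrier = new CyclicBarrier(n);
    Thread[] workers = new Thread[n];
    for (int i = 0; i &lt; n; i++) {
      int id = i;
      workers[i] = new Thread(() -&gt; {
        try {
          // Superstep 1: local computation, then communication.
          mailbox[id] = values[id] * values[id];
          // Barrier: nobody proceeds until every message has been written.
          barrier.await();
          // Superstep 2: each worker reads all messages.
          long sum = 0;
          for (long m : mailbox) sum += m;
          result[id] = sum;
        } catch (Exception e) {
          throw new IllegalStateException(e);
        }
      });
      workers[i].start();
    }
    try {
      for (Thread t : workers) t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return result;
  }
}
</pre>
<p>Calling <code>BspSketch.run(new long[] {1, 2, 3})</code> gives every worker the same global sum of squares. The barrier is what separates the supersteps: no worker reads the mailbox until all workers have written to it.</p>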
<p>Soon, I&#8217;ll post articles about the progress of the BSP library and <a href="http://wiki.apache.org/hama/GraphPackage" target="_self">Angrapa</a> (the graph package on Hama).</p>
<img src="http://feeds.feedburner.com/~r/diveintodata/~4/8lAjCvhr7mA" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://diveintodata.org/2009/10/bsp-library-on-hadoop/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://diveintodata.org/2009/10/bsp-library-on-hadoop/</feedburner:origLink></item>
		<item>
		<title>Google’s New Location-based Service</title>
		<link>http://feedproxy.google.com/~r/diveintodata/~3/ipCz9BpOiNU/</link>
		<comments>http://diveintodata.org/2009/10/googles-new-location-based-service/#comments</comments>
		<pubDate>Thu, 01 Oct 2009 15:08:07 +0000</pubDate>
		<dc:creator>Hyunsik Choi</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[location-based service]]></category>
		<category><![CDATA[map]]></category>
		<category><![CDATA[mobile service]]></category>

		<guid isPermaLink="false">http://diveintodata.org/?p=430</guid>
		<description><![CDATA[I always feel that Google is leading the way in internet services. Recently, Google Maps started offering a service that lets users search for locations by query keywords, such as restaurant, hospital, and gas station. Results can be ordered by distance from the user&#8217;s location, by user-preferred ranking, or by both. In addition, Google has introduced a new local [...]]]></description>
			<content:encoded><![CDATA[<p>I always feel that Google is leading the way in internet services. Recently, <a href="http://maps.google.com/" target="_blank">Google Maps</a> started offering a service that lets users search for locations by query keywords, such as restaurant, hospital, and gas station. Results can be ordered by distance from the user&#8217;s location, by user-preferred ranking, or by both. In addition, Google has introduced a new local search for mobile. This service lets users mark locations with stars and call starred places in only a few clicks. The video below shows the service.</p>
<div><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="560" height="340" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/Y_62nFjUW7Q&amp;hl=ko&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="560" height="340" src="http://www.youtube.com/v/Y_62nFjUW7Q&amp;hl=ko&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
<div>From an academic point of view, these services are not new, but Google is putting into practice ideas that have been described in the literature.</div>
<img src="http://feeds.feedburner.com/~r/diveintodata/~4/ipCz9BpOiNU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://diveintodata.org/2009/10/googles-new-location-based-service/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://diveintodata.org/2009/10/googles-new-location-based-service/</feedburner:origLink></item>
		<item>
		<title>Java Universal Network/Graph Framework</title>
		<link>http://feedproxy.google.com/~r/diveintodata/~3/g_9TJPnoSz8/</link>
		<comments>http://diveintodata.org/2009/09/java-universal-networkgraph-framework/#comments</comments>
		<pubDate>Mon, 14 Sep 2009 23:30:45 +0000</pubDate>
		<dc:creator>Hyunsik Choi</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[jung]]></category>
		<category><![CDATA[visualization tools]]></category>

		<guid isPermaLink="false">http://diveintodata.org/?p=349</guid>
		<description><![CDATA[Recently, I have been primarily concerned with large-scale graph data processing. Visualizing a graph can sometimes be a good way to observe properties of a graph data set. Today, I&#8217;m going to introduce a graph framework called Java Universal Network/Graph Framework (Jung). Jung provides data structures for graphs, a programming interface suited to graph [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I have been primarily concerned with large-scale graph data processing. Visualizing a graph can sometimes be a good way to observe properties of a graph data set. Today, I&#8217;m going to introduce a graph framework called <em><a href="http://jung.sourceforge.net/" target="_blank">Java Universal Network/Graph Framework (Jung)</a>. </em>Jung provides data structures for graphs, a programming interface suited to graph features, some fundamental graph algorithms (e.g., minimum spanning tree, depth-first search, breadth-first search, and Dijkstra&#8217;s algorithm), and even visualization methods. I&#8217;m especially interested in its visualization methods.</p>
<p>The following Java source shows Jung&#8217;s programming interface. In more detail, this program makes a graph, adds three vertices to it, and connects the vertices with edges. The source code is taken from the <a href="http://jung.sourceforge.net/doc/index.html" target="_blank">Jung tutorial</a>. As you can see, Jung&#8217;s APIs are very easy to use.</p>
<pre class="brush: java;">
  // Make a graph backed by a SparseMultigraph instance.
  Graph&lt;Integer, String&gt; g = new SparseMultigraph&lt;Integer, String&gt;();
  g.addVertex(1); // Add a vertex with the integer value 1
  g.addVertex(2);
  g.addVertex(3);
  g.addEdge(&quot;Edge-A&quot;, 1, 3); // Add an edge connecting vertices 1 and 3.
  g.addEdge(&quot;Edge-B&quot;, 2, 3, EdgeType.DIRECTED);
  g.addEdge(&quot;Edge-C&quot;, 3, 2, EdgeType.DIRECTED);
  g.addEdge(&quot;Edge-P&quot;, 2, 3); // A parallel edge

  // Make some objects for graph layout and visualization.
  Layout&lt;Integer, String&gt; layout = new KKLayout&lt;Integer, String&gt;(g);
  BasicVisualizationServer&lt;Integer, String&gt; vv =
      new BasicVisualizationServer&lt;Integer, String&gt;(layout);
  vv.setPreferredSize(new Dimension(800, 800));

  // Determines how each vertex is labeled in the diagram.
  ToStringLabeller&lt;Integer&gt; vertexLabeller = new ToStringLabeller&lt;Integer&gt;() {
    public String transform(Integer i) {
      return String.valueOf(i);
    }
  };

  vv.getRenderContext().setVertexLabelTransformer(vertexLabeller);

  JFrame frame = new JFrame(&quot;Simple Graph View&quot;);
  frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
  frame.getContentPane().add(vv);
  frame.pack();
  frame.setVisible(true);
</pre>
<p>Some of Jung&#8217;s APIs are generic, so vertices and edges can easily carry user-defined data. For more details, visit <a href="http://jung.sourceforge.net/">http://jung.sourceforge.net</a>.</p>
<p>The source code above produces the following diagram.<br />
<a class="flickr-image aligncenter" title="Jung example" rel="flickr-mgr" href="http://www.flickr.com/photos/hyunsik/3919489249/"><img class="flickr-medium aligncenter" src="http://farm3.static.flickr.com/2646/3919489249_3377cc8c63.jpg" alt="Jung example" width="347" height="346" /></a></p>
<img src="http://feeds.feedburner.com/~r/diveintodata/~4/g_9TJPnoSz8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://diveintodata.org/2009/09/java-universal-networkgraph-framework/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://diveintodata.org/2009/09/java-universal-networkgraph-framework/</feedburner:origLink></item>
		<item>
		<title>Zipf Distribution Generator in Java</title>
		<link>http://feedproxy.google.com/~r/diveintodata/~3/7YZdCPkhSl0/</link>
		<comments>http://diveintodata.org/2009/09/zipf-distribution-generator-in-java/#comments</comments>
		<pubDate>Sun, 13 Sep 2009 14:17:34 +0000</pubDate>
		<dc:creator>Hyunsik Choi</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[distribution]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[probability]]></category>
		<category><![CDATA[zipf]]></category>

		<guid isPermaLink="false">http://diveintodata.org/?p=369</guid>
		<description><![CDATA[When I run experiments, I usually generate synthetic data sets from probability distributions. The Zipf distribution, a discrete power-law probability distribution, is especially common for synthetic data. You can find details under Zipf&#8217;s law on Wikipedia. I have attached my own Java [...]]]></description>
			<content:encoded><![CDATA[<p>When I run experiments, I usually generate synthetic data sets from probability distributions. The Zipf distribution, a discrete power-law probability distribution, is especially common for synthetic data. You can find details under <a href="http://en.wikipedia.org/wiki/Zipf%27s_law" target="_blank">Zipf&#8217;s law</a> on Wikipedia. I have attached my own Java class for the Zipf distribution. The graphs below were generated with my Java code and gnuplot.</p>
<pre><a class="flickr-image alignleft" title="Zipf Distribution (s=1)" rel="flickr-mgr" href="http://www.flickr.com/photos/hyunsik/3914971725/"><img class="flickr-medium" src="http://farm3.static.flickr.com/2528/3914971725_39800bd7f5_m.jpg" alt="Zipf Distribution (s=1)" /></a><a class="flickr-image alignnone" title="Zipf Distribution with log scale (s=1)" rel="flickr-mgr" href="http://www.flickr.com/photos/hyunsik/3914971927/"><img class="flickr-medium" src="http://farm3.static.flickr.com/2486/3914971927_df23796db2_m.jpg" alt="Zipf Distribution with log scale (s=1)" /></a>
<pre class="brush: java;">
import java.util.Random;

public class ZipfGenerator {
 private final Random rnd = new Random(System.currentTimeMillis());
 private final int size;
 private final double skew;
 private double bottom = 0; // normalization constant: sum of 1/i^skew for i = 1..size

 public ZipfGenerator(int size, double skew) {
  this.size = size;
  this.skew = skew;

  for(int i=1;i&lt;=size; i++) {
   this.bottom += (1/Math.pow(i, this.skew));
  }
 }

 // Returns a rank in [1, size]. The frequency of the returned ranks follows the Zipf distribution.
 // Rejection sampling: draw a rank uniformly and accept it with probability getProbability(rank).
 public int next() {
   int rank;
   double frequency;
   double dice;

   do {
     rank = rnd.nextInt(size) + 1; // ranks start at 1; rank 0 would make 1/rank^skew infinite
     frequency = getProbability(rank);
     dice = rnd.nextDouble();
   } while(!(dice &lt; frequency));

   return rank;
 }

 // Returns the probability that the given rank (1..size) occurs.
 public double getProbability(int rank) {
   return (1.0d / Math.pow(rank, this.skew)) / this.bottom;
 }

 public static void main(String[] args) {
   if(args.length != 2) {
     System.out.println(&quot;usage: ./zipf size skew&quot;);
     System.exit(-1);
   }

   int size = Integer.parseInt(args[0]);
   ZipfGenerator zipf = new ZipfGenerator(size, Double.parseDouble(args[1]));
   // Print the probability of the first 100 ranks (or fewer if size is smaller).
   for(int i=1;i&lt;=Math.min(100, size);i++)
     System.out.println(i+&quot; &quot; +zipf.getProbability(i));
 }
}
</pre>
</pre>
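<p>The loop in <code>next()</code> draws a rank uniformly and keeps it only with that rank&#8217;s probability, so it may need many draws when <code>size</code> is large. As an alternative, here is a minimal sketch that precomputes the cumulative probabilities once and then finds each rank by binary search. It is not part of the class above: the class name <code>ZipfCdfSampler</code>, the <code>rankFor</code> helper, and the explicit <code>seed</code> parameter are assumptions made for this example.</p>
<pre class="brush: java;">
import java.util.Arrays;
import java.util.Random;

// Hypothetical alternative sampler: inverse-CDF lookup instead of rejection sampling.
public class ZipfCdfSampler {
  private final double[] cdf; // cdf[i] = P(rank &lt;= i + 1)
  private final Random rnd;

  public ZipfCdfSampler(int size, double skew, long seed) {
    cdf = new double[size];
    double sum = 0;
    for (int i = 1; i &lt;= size; i++) {
      sum += 1.0 / Math.pow(i, skew);
      cdf[i - 1] = sum;
    }
    for (int i = 0; i &lt; size; i++) {
      cdf[i] /= sum; // normalize so that the last entry is exactly 1.0
    }
    rnd = new Random(seed);
  }

  // Returns a rank in [1, size] drawn from the Zipf distribution.
  public int next() {
    return rankFor(rnd.nextDouble());
  }

  // Maps u in [0, 1) to the smallest rank whose cumulative probability exceeds u.
  public int rankFor(double u) {
    int idx = Arrays.binarySearch(cdf, u);
    // A negative result encodes the insertion point; an exact hit moves to the next slot.
    int pos = (idx &gt;= 0) ? idx + 1 : -idx - 1;
    return Math.min(pos, cdf.length - 1) + 1;
  }
}
</pre>
<p>With <code>size = 2</code> and <code>skew = 1</code>, the cumulative probabilities are 2/3 and 1, so <code>rankFor(0.5)</code> falls on rank 1 and <code>rankFor(0.7)</code> on rank 2. The trade-off is memory: the sketch stores one <code>double</code> per rank, in exchange for a logarithmic-time lookup per sample.</p>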
<img src="http://feeds.feedburner.com/~r/diveintodata/~4/7YZdCPkhSl0" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://diveintodata.org/2009/09/zipf-distribution-generator-in-java/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://diveintodata.org/2009/09/zipf-distribution-generator-in-java/</feedburner:origLink></item>
	</channel>
</rss>
