<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:blogger='http://schemas.google.com/blogger/2008' xmlns:georss='http://www.georss.org/georss' xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-3256159328630041416</id><updated>2017-12-08T21:22:09.954-06:00</updated><category term="data mining"/><category term="python"/><category term="proc fcmp"/><category term="r"/><category term="vba"/><category term="9.3"/><category term="macro"/><category term="proc iml"/><category term="proc sql"/><category term="spark"/><category term="visualization"/><category term="SAS"/><category term="credit risk"/><category term="marketing"/><category term="statistics"/><category term="web analytics"/><category term="Hadoop"/><category term="algorithm"/><category term="market risk"/><category term="data manipulation"/><category term="hash object"/><category term="sql server"/><category term="sqlite"/><category term="data cleaning"/><category term="data preparation"/><category term="healthcare"/><category term="javascript"/><category term="9.4"/><category term="Elasticsearch"/><category term="Mainframe"/><category term="db2"/><category term="flask"/><category term="option"/><category term="redis"/><title type='text'>Software &amp; Service</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default?redirect=false'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' 
href='http://www.blogger.com/feeds/3256159328630041416/posts/default?start-index=26&amp;max-results=25&amp;redirect=false'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>162</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-837342275205197863</id><published>2017-09-11T23:23:00.000-05:00</published><updated>2017-09-11T23:23:01.774-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="redis"/><title type='text'>Algorithmic pricing with Redis</title><content type='html'>&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;https://4.bp.blogspot.com/-W_iFHzF0k_s/Wbbgk5FuUkI/AAAAAAAAMXk/Gc8yiuBnV7s418pgKGsW1krqRO5BYeFyQCLcBGAs/s1600/blur-1853262_640.jpg&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; data-original-height=&quot;480&quot; data-original-width=&quot;640&quot; height=&quot;240&quot; src=&quot;https://4.bp.blogspot.com/-W_iFHzF0k_s/Wbbgk5FuUkI/AAAAAAAAMXk/Gc8yiuBnV7s418pgKGsW1krqRO5BYeFyQCLcBGAs/s320/blur-1853262_640.jpg&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Algorithmic pricing is an exciting new area that combines engineering and mathematics. &lt;a href=&quot;https://mislove.org/publications/Amazon-WWW.pdf&quot;&gt;Chen’s paper&lt;/a&gt; examines algorithmic pricing on Amazon Marketplace. 
This post discusses how to implement algorithmic pricing with Redis from the sellers’ perspective.&lt;br /&gt;&lt;h3 id=&quot;the-background&quot;&gt;The background&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;Each Amazon ASIN has many sellers that compete with each other.&lt;/li&gt;&lt;li&gt;Amazon has a ranking mechanism for the Buy Box that can, for example, penalize a new seller. But for the same ASIN, the seller with the lowest price usually wins the Buy Box. &lt;/li&gt;&lt;li&gt;For each ASIN, each seller has an optimal price (the price they want to sell at) and a lowest acceptable price (they cannot sell below it). &lt;/li&gt;&lt;li&gt;Amazon allows a seller to change the price of an ASIN once every 15 minutes.&lt;/li&gt;&lt;li&gt;Why 15 minutes? Amazon says that it takes up to 15 minutes for all systems to converge to the new price, but that is not exactly true. Amazon MWS uses 4 data centers in North America, and data synchronization across its databases and data centers is not always that fast. &lt;/li&gt;&lt;li&gt;Amazon MWS provides APIs, but don’t rely on them; build your own crawlers and price adjustor instead. The reasons are explained later. &lt;/li&gt;&lt;li&gt;The sellers’ main target variables are:  &lt;br /&gt;&lt;ul&gt;&lt;li&gt;Buy Box take-over percentage (how often they beat the other competitors)&lt;/li&gt;&lt;li&gt;Profit margin (the higher the selling price, the better)&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3 id=&quot;how-an-algorithmic-pricing-engine-works&quot;&gt;How an algorithmic pricing engine works&lt;/h3&gt;The overall infrastructure may consist of three parts. 
&lt;br /&gt;&lt;br /&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;https://2.bp.blogspot.com/-1y96SPZ5uYU/Wbbgz6qS29I/AAAAAAAAMXo/nDqKZBbS0GIviVhV1PVBc3G8KRMKqwfkACLcBGAs/s1600/Note4_0%25281%2529.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; data-original-height=&quot;1600&quot; data-original-width=&quot;900&quot; height=&quot;320&quot; src=&quot;https://2.bp.blogspot.com/-1y96SPZ5uYU/Wbbgz6qS29I/AAAAAAAAMXo/nDqKZBbS0GIviVhV1PVBc3G8KRMKqwfkACLcBGAs/s320/Note4_0%25281%2529.png&quot; width=&quot;180&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;h4 id=&quot;1-the-distributed-crawler&quot;&gt;1. The distributed crawler&lt;/h4&gt;Most of the time, Amazon allows a crawler to run at 1 QPS per IP. However, all the data centers and all ASINs of both the seller and his competitors have to be closely watched, so a distributed approach is safer. Response time is crucial in deciding who wins the game. In the typical chasing diagram below, Seller 1 clearly has the upper hand and Seller 2 is losing ground, since Seller 1’s crawler is faster.&lt;br /&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;https://4.bp.blogspot.com/-rxVvgnIeXlM/WbbikUDj-AI/AAAAAAAAMYA/-DolXZWm21AyRWEmkSmjhX_A9-_HZpaiwCLcBGAs/s1600/Picture1.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; data-original-height=&quot;803&quot; data-original-width=&quot;721&quot; height=&quot;320&quot; src=&quot;https://4.bp.blogspot.com/-rxVvgnIeXlM/WbbikUDj-AI/AAAAAAAAMYA/-DolXZWm21AyRWEmkSmjhX_A9-_HZpaiwCLcBGAs/s320/Picture1.png&quot; width=&quot;287&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;h4 id=&quot;2-the-algorithm-core&quot;&gt;2. 
The algorithm core&lt;/h4&gt;The embedded algorithm serves two purposes:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Analyze the crawled data and infer the competitors’ optimal prices, lowest acceptable prices and strategies. &lt;/li&gt;&lt;li&gt;Adjust the price according to an algorithm. The strategy can be as simple as &quot;minus one cent from the current competitor&#39;s price&quot;, or as complicated as machine learning or deep learning.&lt;/li&gt;&lt;/ul&gt;&lt;h4 id=&quot;3-the-price-adjustor&quot;&gt;3. The price adjustor&lt;/h4&gt;The computed price is submitted to Amazon and takes effect. Amazon ranks sellers lower when it suspects them of algorithmic pricing, and sometimes it even bans them. So don’t let Amazon find out that a computer is manipulating its APIs. To emulate human behavior in a browser, two options are phantom.js and headless Chrome.&lt;br /&gt;&lt;h3 id=&quot;the-role-of-redis&quot;&gt;The role of Redis&lt;/h3&gt;Redis is an in-memory data store that supports persistence and sharding. Its first use in this algorithmic pricing engine is as a centralized task scheduler for the crawlers. Beyond that, Redis opens up some other interesting directions in algorithmic pricing. &lt;br /&gt;&lt;h5 id=&quot;1-predicative-pricing&quot;&gt;1. Predictive pricing&lt;/h5&gt;As a cache, Redis supports TTL (time to live). As data accumulates, the times at which competitors change their prices can be predicted. In the publisher-subscriber model, each predicted interval until the next price change can be stored as an expiring key with a TTL. Once the key expires, the publisher dispatches a crawling task and makes the price adjustment. The benefit of this approach is that the crawlers don’t need to hit the same Amazon web page every second, which risks getting banned. 
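&lt;br /&gt;&lt;br /&gt;The TTL-driven scheduling described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the author’s implementation: it assumes the next price change is predicted from the median gap between a competitor’s past changes, and it replaces Redis with a small in-memory heap so it runs without a server. With real Redis, the equivalent would be setting the key with SET key value EX seconds and subscribing to expired-key events (keyspace notifications enabled with notify-keyspace-events Ex). The function names loosely mirror calculateNextTTL and dispatchCrawler in the diagram below, and the ASINs are made up.

```python
import heapq
import statistics


def calculate_next_ttl(change_times, floor=60):
    """Predict the seconds until a competitor's next price change.

    Assumption: the median gap between past changes is a reasonable
    predictor; `floor` keeps the TTL from becoming too short.
    """
    if len(change_times) >= 2:
        gaps = [b - a for a, b in zip(change_times, change_times[1:])]
        return max(floor, int(statistics.median(gaps)))
    return floor


class ExpiringKeys:
    """In-memory stand-in for Redis keys that carry a TTL.

    With a real server this would be SET key value EX ttl, plus an
    'expired' keyspace notification delivered through pub/sub.
    """

    def __init__(self):
        self._heap = []  # (expire_at, key) pairs ordered by expiry time

    def set_with_ttl(self, key, ttl, now):
        heapq.heappush(self._heap, (now + ttl, key))

    def pop_expired(self, now):
        """Remove and return every key whose TTL has elapsed."""
        expired = []
        while self._heap and now >= self._heap[0][0]:
            expired.append(heapq.heappop(self._heap)[1])
        return expired


def dispatch_crawler(asin):
    """Hypothetical hook: enqueue a crawl and price check for one ASIN."""
    return "crawl:" + asin


def run_publisher(history, until, step=30):
    """Simulate the publisher reacting to expired keys over time.

    `history` maps an ASIN to its past price-change timestamps (seconds).
    Returns the (time, task) pairs dispatched before `until`.
    """
    keys = ExpiringKeys()
    for asin, times in history.items():
        keys.set_with_ttl(asin, calculate_next_ttl(times), now=0)
    dispatched = []
    for now in range(0, until, step):
        for asin in keys.pop_expired(now):
            dispatched.append((now, dispatch_crawler(asin)))
            # Re-arm the key so the next predicted change triggers a crawl.
            keys.set_with_ttl(asin, calculate_next_ttl(history[asin]), now)
    return dispatched


if __name__ == "__main__":
    sample = {
        "B000TEST01": [0, 600, 1200, 1800],
        "B000TEST02": [0, 300, 900],
    }
    for when, task in run_publisher(sample, until=1900):
        print(when, task)
```

With this sample history, B000TEST02 (median gap 450 s) is crawled first at t=450 and B000TEST01 (gap 600 s) at t=600, instead of both pages being hit every second. A real engine would also append each newly observed change time to the history before re-arming the key.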
&lt;br /&gt;&lt;br /&gt;&lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot; hljs asciidoc&quot;&gt;&lt;span class=&quot;hljs-code&quot;&gt;+--------------------+&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-header&quot;&gt;|  Subscriber        |&lt;br /&gt;+--------------------+&lt;/span&gt;&lt;br /&gt;|                    |&lt;br /&gt;| &lt;span class=&quot;hljs-code&quot;&gt;+ psubscribe():void|  +&lt;/span&gt;--------------+&lt;br /&gt;|                    |  |      PubSub  |&lt;br /&gt;&lt;span class=&quot;hljs-code&quot;&gt;+--------------------+&lt;/span&gt;  &lt;span class=&quot;hljs-code&quot;&gt;+--------+&lt;/span&gt;-----+&lt;br /&gt;&lt;span class=&quot;hljs-code&quot;&gt;                                 ^&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-header&quot;&gt;                                 |&lt;br /&gt;+--------------------------------+-----+&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-header&quot;&gt;|       Publisher                      |&lt;br /&gt;+--------------------------------------+&lt;/span&gt;&lt;br /&gt;|                                      |&lt;br /&gt;|  + calculateNextTTL():int            |&lt;br /&gt;|  + onPMessage():void                 |&lt;br /&gt;|  + dispatchCrawler():boolean         |&lt;br /&gt;&lt;span class=&quot;hljs-header&quot;&gt;|                                      |&lt;br /&gt;+--------------------------------------+&lt;/span&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h5 id=&quot;2-cross-data-center-hedging&quot;&gt;2. Cross-data-center hedging&lt;/h5&gt;Synchronization across the data centers takes time, sometimes hours. That is why different Amazon IPs can show different prices at the same time. People also show different purchasing patterns at different times. 
Since a seller can change the price at a specific data center by addressing its IP instead of the domain name, an interesting idea is to use the cool-down time, while a new price spreads across the whole network, for hedging. Redis’ ability to keep all prices as hashes in memory helps spot those valuable opportunities.&lt;br /&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;https://3.bp.blogspot.com/-Q9Qhl49SEsQ/Wbbg7_kfJlI/AAAAAAAAMXs/KVCoVfms_IYG02HGohwJ-fj42xoKei1agCLcBGAs/s1600/Note4_2.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; data-original-height=&quot;1600&quot; data-original-width=&quot;900&quot; height=&quot;640&quot; src=&quot;https://3.bp.blogspot.com/-Q9Qhl49SEsQ/Wbbg7_kfJlI/AAAAAAAAMXs/KVCoVfms_IYG02HGohwJ-fj42xoKei1agCLcBGAs/s640/Note4_2.png&quot; width=&quot;360&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/837342275205197863'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/837342275205197863'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2017/09/algorithmic-pricing-with-redis.html' title='Algorithmic pricing with Redis'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://4.bp.blogspot.com/-W_iFHzF0k_s/Wbbgk5FuUkI/AAAAAAAAMXk/Gc8yiuBnV7s418pgKGsW1krqRO5BYeFyQCLcBGAs/s72-c/blur-1853262_640.jpg" height="72" 
width="72"/></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-963364892874150808</id><published>2017-02-17T10:38:00.001-06:00</published><updated>2017-02-17T10:45:41.189-06:00</updated><title type='text'>Good math, bad engineering</title><content type='html'>&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;https://3.bp.blogspot.com/-CxYsH_YbSeM/WKcncNVtVAI/AAAAAAAALZI/smM-UPJ7L7E46tGMbzMqhWpiEksVJO0agCLcB/s1600/math-1547018_960_720.jpg&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;324&quot; src=&quot;https://3.bp.blogspot.com/-CxYsH_YbSeM/WKcncNVtVAI/AAAAAAAALZI/smM-UPJ7L7E46tGMbzMqhWpiEksVJO0agCLcB/s640/math-1547018_960_720.jpg&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;As a former statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s ability to find the abstraction and the engineer’s ability to find the implementation.&lt;br /&gt;&lt;br /&gt;For a typical engineering problem, the steps are usually: &lt;br /&gt;- 1. Abstract the problem with a formula or some pseudocode &lt;br /&gt;- 2. Solve the problem with the formula  &lt;br /&gt;- 3. Iterate on the initial solution until it achieves optimal time and space complexity&lt;br /&gt;&lt;br /&gt;I feel that a mathematician would like dynamic programming (DP) questions most, because they are so similar to the typical deduction questions in math. An engineer may find them challenging, since they require imagination and some sense of math. &lt;br /&gt;&lt;br /&gt;The formula is the most important part: without it, trial-and-error or debugging does not help. Once the formula is figured out, the rest is a piece of cake. However, sometimes things are not that straightforward. 
Good mathematics does not always lead to good engineering.&lt;br /&gt;&lt;br /&gt;Let’s see &lt;a href=&quot;https://leetcode.com/problems/target-sum/?tab=Solutions&quot;&gt;one question from Leetcode&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;&lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot; hljs livecodeserver&quot;&gt;You are given &lt;span class=&quot;hljs-operator&quot;&gt;a&lt;/span&gt; list &lt;span class=&quot;hljs-operator&quot;&gt;of&lt;/span&gt; non-negative integers, a1, a2, ..., &lt;span class=&quot;hljs-operator&quot;&gt;an&lt;/span&gt;, &lt;span class=&quot;hljs-operator&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;hljs-operator&quot;&gt;a&lt;/span&gt; target, S. Now you have &lt;span class=&quot;hljs-number&quot;&gt;2&lt;/span&gt; symbols + &lt;span class=&quot;hljs-operator&quot;&gt;and&lt;/span&gt; -. For &lt;span class=&quot;hljs-keyword&quot;&gt;each&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot;&gt;integer&lt;/span&gt;, you should choose &lt;span class=&quot;hljs-constant&quot;&gt;one&lt;/span&gt; &lt;span class=&quot;hljs-built_in&quot;&gt;from&lt;/span&gt; + &lt;span class=&quot;hljs-operator&quot;&gt;and&lt;/span&gt; - &lt;span class=&quot;hljs-keyword&quot;&gt;as&lt;/span&gt; its &lt;span class=&quot;hljs-built_in&quot;&gt;new&lt;/span&gt; symbol.&lt;br /&gt;&lt;br /&gt;Find out how many ways &lt;span class=&quot;hljs-built_in&quot;&gt;to&lt;/span&gt; assign symbols &lt;span class=&quot;hljs-built_in&quot;&gt;to&lt;/span&gt; make &lt;span class=&quot;hljs-built_in&quot;&gt;sum&lt;/span&gt; &lt;span class=&quot;hljs-operator&quot;&gt;of&lt;/span&gt; integers equal &lt;span class=&quot;hljs-built_in&quot;&gt;to&lt;/span&gt; target S.&lt;br /&gt;&lt;br /&gt;Example &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;:&lt;br /&gt;Input: nums is [&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;, &lt;span 
class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;], S is &lt;span class=&quot;hljs-number&quot;&gt;3.&lt;/span&gt; &lt;br /&gt;Output: &lt;span class=&quot;hljs-number&quot;&gt;5&lt;/span&gt;&lt;br /&gt;Explanation: &lt;br /&gt;&lt;br /&gt;-&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt; = &lt;span class=&quot;hljs-number&quot;&gt;3&lt;/span&gt;&lt;br /&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;-&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt; = &lt;span class=&quot;hljs-number&quot;&gt;3&lt;/span&gt;&lt;br /&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;-&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt; = &lt;span class=&quot;hljs-number&quot;&gt;3&lt;/span&gt;&lt;br /&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;-&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt; = &lt;span class=&quot;hljs-number&quot;&gt;3&lt;/span&gt;&lt;br /&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;+&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;-&lt;span 
class=&quot;hljs-number&quot;&gt;1&lt;/span&gt; = &lt;span class=&quot;hljs-number&quot;&gt;3&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There are &lt;span class=&quot;hljs-number&quot;&gt;5&lt;/span&gt; ways &lt;span class=&quot;hljs-built_in&quot;&gt;to&lt;/span&gt; assign symbols &lt;span class=&quot;hljs-built_in&quot;&gt;to&lt;/span&gt; make &lt;span class=&quot;hljs-operator&quot;&gt;the&lt;/span&gt; &lt;span class=&quot;hljs-built_in&quot;&gt;sum&lt;/span&gt; &lt;span class=&quot;hljs-operator&quot;&gt;of&lt;/span&gt; nums be target &lt;span class=&quot;hljs-number&quot;&gt;3.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h3 id=&quot;1-the-quick-solution&quot;&gt;1. The quick solution&lt;/h3&gt;Each element of the list has two options: plus or minus. So the question asks how many of all the possible paths reach the target number. Of course, if the target is unreachable (for example, its absolute value exceeds the sum of all the numbers), we just return 0.&lt;br /&gt;&lt;br /&gt;This sounds exactly like a DP question. With a pencil and paper, we can start to explore the relationship between &lt;code&gt;dp(n)&lt;/code&gt; and &lt;code&gt;dp(n-1)&lt;/code&gt;. For example, suppose our goal is a sum of 5 and we are given the list [1, 1, 1, 1, 1]. If some paths over the smaller tuple (1, 1, 1, 1) reach 4, that is exactly what we want, since adding the last 1 makes 5. Similarly, paths that reach 6 work as well, since subtracting the last 1 makes 5. We simply add the two counts together, since the last element has only two choices.  &lt;br /&gt;&lt;br /&gt;The formula is &lt;code&gt;dp(n, s) = dp(n-1, s-x) + dp(n-1, s+x)&lt;/code&gt;, where &lt;code&gt;n&lt;/code&gt; is the size of the list, &lt;code&gt;s&lt;/code&gt; is the target sum and &lt;code&gt;x&lt;/code&gt; is the element appended to the previous list. OK, the second step is easy. 
&lt;br /&gt;&lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot;language-python hljs &quot;&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot;&gt;findTargetSumWays_1&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(nums, S)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-string&quot;&gt;&quot;&quot;&quot;&lt;br /&gt;    :type nums: Tuple[int]&lt;br /&gt;    :type S: int&lt;br /&gt;    :rtype: int&lt;br /&gt;    &quot;&quot;&quot;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot;&gt;not&lt;/span&gt; nums:&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; S == &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;:&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;else&lt;/span&gt;:&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt; findTargetSumWays_1(nums[&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;:], S+nums[&lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;]) + findTargetSumWays_1(nums[&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;:], S-nums[&lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;]) &lt;br /&gt;&lt;br /&gt;small_test_nums = (&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;)&lt;br /&gt;small_test_S = &lt;span 
class=&quot;hljs-number&quot;&gt;3&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;%time findTargetSumWays_1(small_test_nums, small_test_S)&lt;/code&gt;&lt;/pre&gt;It is theoretically correct and works perfectly on small test cases. But it is going to be a nightmare for an engineering application, because it has a hefty time complexity of O(2^N). So the math part is done, and we have to move on to the third step.&lt;br /&gt;&lt;h3 id=&quot;2-the-third-step-that-is-hard&quot;&gt;2. The third step that is hard&lt;/h3&gt;So we need a data structure to record the intermediate results of all the paths. If this were the &lt;a href=&quot;https://en.wikipedia.org/wiki/Fibonacci_number&quot;&gt;Fibonacci number&lt;/a&gt; problem, a simple linear data structure like a list would slash O(2^N) to O(N). &lt;br /&gt;&lt;br /&gt;But the hard part is deciding which data structure to use here. Since the &lt;code&gt;get&lt;/code&gt; operation on a hash table is O(1) on average, a rolling dictionary can record the previous states. However, a Python dictionary cannot have keys added while it is being iterated over, so we have to build a new dictionary at each step and replace the old one manually. The overall set of paths looks like a tree structure. 
So the ideal solution looks like this:&lt;br /&gt;&lt;br /&gt;&lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot; hljs python&quot;&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot;&gt;findTargetSumWays_2&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(nums, S)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot;&gt;not&lt;/span&gt; nums:&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; S == &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-comment&quot;&gt;# dic maps each reachable partial sum to its number of paths&lt;/span&gt;&lt;br /&gt;    dic = {nums[&lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;]: &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;, -nums[&lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;]: &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;} &lt;span class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; nums[&lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;] != &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot;&gt;else&lt;/span&gt; {&lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;: &lt;span class=&quot;hljs-number&quot;&gt;2&lt;/span&gt;}&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;for&lt;/span&gt; i &lt;span class=&quot;hljs-keyword&quot;&gt;in&lt;/span&gt; range(&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;, len(nums)):&lt;br /&gt;        &lt;span class=&quot;hljs-comment&quot;&gt;# build a fresh dict, since keys cannot be added to dic while iterating over it&lt;/span&gt;&lt;br /&gt;        tdic = {}&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;for&lt;/span&gt; d &lt;span class=&quot;hljs-keyword&quot;&gt;in&lt;/span&gt; dic:&lt;br /&gt;            tdic[d + nums[i]] = tdic.get(d + nums[i], &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;) + dic.get(d, &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;)&lt;br /&gt;            tdic[d - nums[i]] = tdic.get(d - nums[i], &lt;span 
class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;) + dic.get(d, &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;)&lt;br /&gt;        dic = tdic&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt; dic.get(S, &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;)&lt;br /&gt;&lt;br /&gt;big_test_nums = tuple(range(&lt;span class=&quot;hljs-number&quot;&gt;100&lt;/span&gt;))&lt;br /&gt;big_test_S = sum(range(&lt;span class=&quot;hljs-number&quot;&gt;88&lt;/span&gt;))&lt;br /&gt;%time findTargetSumWays_2(big_test_nums, big_test_S)&lt;/code&gt;&lt;/pre&gt;The running time is exactly what we need. However, the code is not elegant and is hard to understand. &lt;br /&gt;&lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot; hljs perl&quot;&gt;CPU &lt;span class=&quot;hljs-keyword&quot;&gt;times&lt;/span&gt;: user &lt;span class=&quot;hljs-number&quot;&gt;189&lt;/span&gt; ms, sys: &lt;span class=&quot;hljs-number&quot;&gt;4.77&lt;/span&gt; ms, total: &lt;span class=&quot;hljs-number&quot;&gt;194&lt;/span&gt; ms&lt;br /&gt;Wall &lt;span class=&quot;hljs-keyword&quot;&gt;time&lt;/span&gt;: &lt;span class=&quot;hljs-number&quot;&gt;192&lt;/span&gt; ms&lt;/code&gt;&lt;/pre&gt;&lt;h3 id=&quot;3-finally-the-easy-solution&quot;&gt;3. Finally, the easy solution&lt;/h3&gt;If we don’t want things to get complicated, all we need here is a cache, and Python 3 provides the &lt;code&gt;functools.lru_cache&lt;/code&gt; decorator. Adding one decorator line to the first solution quickly solves the problem. This works because &lt;code&gt;nums&lt;/code&gt; is passed as a tuple, which is hashable and can therefore be part of the cache key. 
&lt;br /&gt;&lt;br /&gt;&lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot; hljs python&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;from&lt;/span&gt; functools &lt;span class=&quot;hljs-keyword&quot;&gt;import&lt;/span&gt; lru_cache&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-decorator&quot;&gt;@lru_cache(10000000)&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot;&gt;findTargetSumWays_3&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(nums, S)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot;&gt;not&lt;/span&gt; nums:&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; S == &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;:&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;else&lt;/span&gt;:&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt; findTargetSumWays_3(nums[&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;:], S+nums[&lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;]) + findTargetSumWays_3(nums[&lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;:], S-nums[&lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;]) &lt;br /&gt;&lt;br /&gt;%time findTargetSumWays_3(big_test_nums, big_test_S)&lt;br /&gt;&lt;br /&gt;CPU times: user &lt;span class=&quot;hljs-number&quot;&gt;658&lt;/span&gt; ms, sys: &lt;span class=&quot;hljs-number&quot;&gt;19.7&lt;/span&gt; ms, total: &lt;span class=&quot;hljs-number&quot;&gt;677&lt;/span&gt; ms&lt;br /&gt;Wall time: &lt;span class=&quot;hljs-number&quot;&gt;680&lt;/span&gt; ms&lt;/code&gt;&lt;/pre&gt;&lt;h4 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h4&gt;Good math cannot solve all the 
engineering problems. It has to be combined with the details of the language, the application and the system to avoid a bad engineering implementation.&lt;br /&gt;&lt;br /&gt;The Jupyter notebook is on &lt;a href=&quot;https://github.com/dapangmao/Blog/blob/master/Good%20math%2C%20bad%20engineering.ipynb&quot;&gt;GitHub&lt;/a&gt;. If you have any comments, please email me at wm@sasanalysis.com. </content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/963364892874150808'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/963364892874150808'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2017/02/good-math-bad-engineering.html' title='Good math, bad engineering'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://3.bp.blogspot.com/-CxYsH_YbSeM/WKcncNVtVAI/AAAAAAAALZI/smM-UPJ7L7E46tGMbzMqhWpiEksVJO0agCLcB/s72-c/math-1547018_960_720.jpg" height="72" width="72"/></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-8758802296176401668</id><published>2017-02-16T13:20:00.001-06:00</published><updated>2017-02-17T10:34:49.502-06:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Elasticsearch"/><title type='text'>Use Elasticsearch and Kibana for large BI system</title><content type='html'>&lt;br /&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;https://2.bp.blogspot.com/-Ft_j8Vo9plg/WKX7vSu36GI/AAAAAAAALWA/bTeMPL3MUJUOJvljREK7yUBclhYJzi9OgCLcB/s1600/valve-1135657_960_720.jpg&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; 
height=&quot;392&quot; src=&quot;https://2.bp.blogspot.com/-Ft_j8Vo9plg/WKX7vSu36GI/AAAAAAAALWA/bTeMPL3MUJUOJvljREK7yUBclhYJzi9OgCLcB/s640/valve-1135657_960_720.jpg&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;p&gt;Elasticsearch is becoming more and more popular. Besides its original search functionality, I have found that Elasticsearch can be &lt;/p&gt; &lt;ol&gt;&lt;li&gt;used as a logging container, which is what the ELK stack was created for.&lt;/li&gt;&lt;li&gt;utilized as a JSON server with rich APIs, which can be combined with Kibana as a BI server. &lt;/li&gt;&lt;/ol&gt; &lt;p&gt;This is the data store I see every day: &lt;/p&gt; &lt;ul&gt;&lt;li&gt;10 PB of stored data&lt;/li&gt;&lt;li&gt;an average of 30 TB of incoming data every day&lt;/li&gt;&lt;li&gt;various data sources, including binary files such as PDF&lt;/li&gt;&lt;li&gt;very complicated SQL queries (fortunately no stored procedures)&lt;/li&gt;&lt;li&gt;millions of JSON documents created daily&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;People want to know what is going on with such data, so a business intelligence (BI) or OLAP system is needed to visualize and aggregate the data and its flow. Since Elasticsearch is so easy to scale out, in my experience it beats the other big data solutions on the market for this job.&lt;/p&gt;   &lt;h2 id=&quot;1-batch-worker&quot;&gt;1. Batch Worker&lt;/h2&gt; &lt;p&gt;There are many options for implementing a batch worker. In the end, the decision came down to either Spring Data Batch or writing a library from scratch in Python. &lt;/p&gt;   &lt;h4 id=&quot;11-spring-data-batch-vs-python&quot;&gt;1.1 Spring Data Batch vs. Python&lt;/h4&gt;   &lt;h5 id=&quot;spring-data-batch&quot;&gt;Spring Data Batch&lt;/h5&gt; &lt;ul&gt;&lt;li&gt;Pros: &lt;br&gt;&lt;ul&gt;&lt;li&gt;is a full framework that includes rollback, notification and scheduler features&lt;/li&gt;&lt;li&gt;provides great design patterns through Dependency Injection, such as factory and singleton, which help multiple people work together. 
For example -&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;   &lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot;language-java hljs &quot;&gt;&lt;span class=&quot;hljs-annotation&quot;&gt;@Bean&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;public&lt;/span&gt; Step &lt;span class=&quot;hljs-title&quot;&gt;IndexMySQLJob01&lt;/span&gt;() {&lt;br /&gt;   &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt; stepBuilderFactory.get(&lt;span class=&quot;hljs-string&quot;&gt;&quot;IndexMySQLJob01&quot;&lt;/span&gt;)&lt;br /&gt;           .&amp;lt;Data, Data&amp;gt; chunk(&lt;span class=&quot;hljs-number&quot;&gt;10&lt;/span&gt;)&lt;br /&gt;           .reader(reader())&lt;br /&gt;           .processor(processor())&lt;br /&gt;           .writer(writer())&lt;br /&gt;           .build();&lt;br /&gt;}&lt;/code&gt;&lt;/pre&gt; &lt;ul&gt;&lt;li&gt;Cons: &lt;br&gt;&lt;ul&gt;&lt;li&gt;is difficult to handle nested JSONs&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/client.html&quot;&gt;Elasticsearch Java client&lt;/a&gt; is not as good as its &lt;a href=&quot;https://elasticsearch-py.readthedocs.io/en/master/&quot;&gt;Python client&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;   &lt;h5 id=&quot;python&quot;&gt;Python&lt;/h5&gt; &lt;ul&gt;&lt;li&gt;Pros: &lt;br&gt;&lt;ul&gt;&lt;li&gt;less codes; flexible &lt;/li&gt;&lt;li&gt;super easy from dictionary to JSON &lt;/li&gt;&lt;li&gt;has official/3rd party libraries for everything&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;Cons: &lt;br&gt;&lt;ul&gt;&lt;li&gt;you create your own library/framework&lt;/li&gt;&lt;li&gt;if the pattern like Spring Data Batch is deployed, has to inject dependencies manually, such as -&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;   &lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot;language-python hljs &quot;&gt;&lt;span class=&quot;hljs-class&quot;&gt;&lt;span 
class=&quot;hljs-keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;hljs-title&quot;&gt;IndexMySQLJob01&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(object)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;     &lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(self, reader, processor, writer, listener)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;          self.reader = reader&lt;br /&gt;          self.processor = processor&lt;br /&gt;          self.writer = writer&lt;br /&gt;          self.listener = listener&lt;br /&gt;     ...&lt;/code&gt;&lt;/pre&gt; &lt;p&gt;Eventually Python was picked, because the overall scenario is algorithm-bound rather than language-bound. &lt;/p&gt;   &lt;h4 id=&quot;12-the-algorithms&quot;&gt;1.2 The algorithms&lt;/h4&gt; &lt;p&gt;Since the data is big, both time and space complexity always have to be considered. The most direct way to decrease the time complexity is to use hash tables, as long as the memory can hold the data. For example, a join between an N-row table and an M-row table can be optimized from O(M*N) to O(M+N): building a hash table over the N-row table costs O(N), after which each of the M rows is matched with a single average O(1) lookup. 
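&lt;/p&gt;&lt;p&gt;Here is a minimal sketch of such a hash join in plain Python. It assumes both tables arrive as iterables of dicts that share a join column. The function name &lt;code&gt;hash_join&lt;/code&gt; and the sample rows are hypothetical and are not taken from the production code. The build phase indexes the first table by key, and the probe phase streams the second table through that index. Because the function is a generator, it yields merged rows one at a time.&lt;/p&gt;&lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot;language-python hljs &quot;&gt;def hash_join(left_rows, right_rows, key):
    """Inner-join two iterables of dicts on the column named by key."""
    # Build phase, O(N): map each key value to the left rows carrying it.
    # A list per key keeps duplicate keys instead of overwriting them.
    index = {}
    for row in left_rows:
        index.setdefault(row[key], []).append(row)

    # Probe phase: one average O(1) dict lookup per right row, O(M) lookups.
    # Right rows without a match are dropped (inner-join semantics).
    for row in right_rows:
        for match in index.get(row[key], ()):
            merged = dict(match)  # copy, so the indexed row is not mutated
            merged.update(row)
            yield merged


customers = [{'id': 1, 'name': 'A'}, {'id': 2, 'name': 'B'}]
orders = [{'id': 1, 'amount': 10}, {'id': 1, 'amount': 5}, {'id': 3, 'amount': 7}]
print(list(hash_join(customers, orders, 'id')))
# [{'id': 1, 'name': 'A', 'amount': 10}, {'id': 1, 'name': 'A', 'amount': 5}]
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If the smaller table is passed as &lt;code&gt;left_rows&lt;/code&gt;, only that table has to fit in memory. The larger one is streamed row by row.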
&lt;/p&gt; &lt;p&gt;To save space, a chain of generators streams the data from start to end instead of materializing sizable objects.&lt;/p&gt;   &lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot;language-python hljs &quot;&gt;&lt;span class=&quot;hljs-class&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;hljs-title&quot;&gt;JsonTask01&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(object)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    ...&lt;br /&gt;    &lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot;&gt;get_json&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(self, generator1, hashtable1)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;for&lt;/span&gt; each_dict &lt;span class=&quot;hljs-keyword&quot;&gt;in&lt;/span&gt; generator1:&lt;br /&gt;            key = each_dict.get(&lt;span class=&quot;hljs-string&quot;&gt;&#39;key&#39;&lt;/span&gt;)&lt;br /&gt;            &lt;span class=&quot;hljs-comment&quot;&gt;# default to {} so a key missing from the hash table does not raise TypeError&lt;/span&gt;&lt;br /&gt;            each_dict.update(hashtable1.get(key, {}))&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot;&gt;yield&lt;/span&gt; each_dict&lt;/code&gt;&lt;/pre&gt;   &lt;h4 id=&quot;13-the-scheduler&quot;&gt;1.3 The scheduler&lt;/h4&gt; &lt;p&gt;A scheduler is a must: &lt;code&gt;cron&lt;/code&gt; is enough for simple tasks, while a bigger system requires a workflow engine. &lt;a href=&quot;https://airflow.incubator.apache.org/&quot;&gt;Airflow&lt;/a&gt; helps organize and schedule such workflows. It has a web UI and is written in Python, which makes it easy to integrate with the batch worker. &lt;/p&gt;   &lt;h4 id=&quot;14-high-availability-with-zero-downtime&quot;&gt;1.4 High availability with zero downtime&lt;/h4&gt; &lt;p&gt;Indexing a large quantity of data has a significant impact on the cluster. 
For mission-critical indexes that need 100% uptime, we implement the &lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/guide/current/index-aliases.html&quot;&gt;zero-downtime approach with index aliases&lt;/a&gt; and keep two copies of an index for maximum safety. Once indexing is finished, the alias is switched from one copy to the other.&lt;/p&gt;   &lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot;language-python hljs &quot;&gt;    &lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot;&gt;add_alias&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(self, idx)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;        LOGGER.warning(&lt;span class=&quot;hljs-string&quot;&gt;&quot;The alias {} will point to {}.&quot;&lt;/span&gt;.format(self.index, idx))&lt;br /&gt;        self.es.indices.put_alias(index=idx, name=self.index)&lt;br /&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot;&gt;delete_alias&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(self, idx)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;        LOGGER.warning(&lt;span class=&quot;hljs-string&quot;&gt;&quot;The alias {} will be removed from {}.&quot;&lt;/span&gt;.format(self.index, idx))&lt;br /&gt;        self.es.indices.delete_alias(index=idx, name=self.index)&lt;/code&gt;&lt;/pre&gt;   &lt;h2 id=&quot;2-elasticsearch-cluster&quot;&gt;2. Elasticsearch Cluster&lt;/h2&gt;   &lt;h4 id=&quot;21-three-kinds-of-nodes&quot;&gt;2.1 Three kinds of nodes&lt;/h4&gt; &lt;p&gt;An Elasticsearch node can take one of three roles: master node, data node and ingest node (not to be confused with the former client node, which is now called a coordinating-only node). It is common for a single node to act as master, ingest and data node all at once. For a large system, it is always helpful to assign the three roles to different machines/VMs. 
That way, when a node goes down or comes back up, failover and recovery are quicker. &lt;br&gt;&lt;a href=&quot;https://github.com/mobz/elasticsearch-head&quot;&gt;elasticsearch-head&lt;/a&gt; clearly visualizes how the shards are moved between nodes when an incident occurs. &lt;/p&gt; &lt;p&gt;As the number of cluster nodes grows, deployment becomes painful. I feel the best tool so far is &lt;a href=&quot;https://github.com/elastic/ansible-elasticsearch&quot;&gt;ansible-elasticsearch&lt;/a&gt;. With &lt;code&gt;ansible-playbook -i hosts ./your-playbook.yml -c paramiko&lt;/code&gt;, the cluster is up and running in no time. &lt;/p&gt;   &lt;h4 id=&quot;22-memory-is-cheap-but-heap-is-expensive&quot;&gt;2.2 Memory is cheap but heap is expensive&lt;/h4&gt; &lt;p&gt;&lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html&quot;&gt;The rules of thumb&lt;/a&gt; for Elasticsearch are - &lt;/p&gt; &lt;blockquote&gt;  &lt;p&gt;Give (less than) Half Your Memory to Lucene&lt;/p&gt;     &lt;p&gt;Don’t Cross 32 GB!&lt;/p&gt;&lt;/blockquote&gt; &lt;p&gt;This leads to an awkward situation: on a machine with more than 64 GB of memory, a single Elasticsearch instance cannot use the additional memory as heap. It therefore makes sense to run two or more Elasticsearch instances side by side to make full use of the hardware. For example, on a machine with 96 GB of memory, we can allocate 31 GB of heap to an ingest node, 31 GB to a data node and leave the rest to the OS. However, two data nodes on a single machine will compete for disk I/O, which hurts performance, while putting a master node and a data node together increases the risk of downtime.&lt;/p&gt; &lt;p&gt;The great thing about Elasticsearch is that it provides rich REST APIs, such as &lt;a href=&quot;http://localhost:9200/_nodes/stats?pretty&quot;&gt;http://localhost:9200/_nodes/stats?pretty&lt;/a&gt;. We can use X-Pack (paid) or other custom tools to monitor them. 
I feel that the three most important statistics for the heap, and therefore for performance, are - &lt;/p&gt;   &lt;h5 id=&quot;the-heap-usage-and-the-old-gc-duration&quot;&gt;The heap usage and the old GC duration&lt;/h5&gt; &lt;p&gt;The two statistics are intertwined. High heap usage, such as 75%, triggers a GC, and a GC with high heap usage takes longer. We have to keep both numbers as low as possible. &lt;/p&gt;   &lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot; hljs r&quot;&gt;     &lt;span class=&quot;hljs-string&quot;&gt;&quot;jvm&quot;&lt;/span&gt; : {&lt;br /&gt;        &lt;span class=&quot;hljs-string&quot;&gt;&quot;mem&quot;&lt;/span&gt; : {&lt;br /&gt;          &lt;span class=&quot;hljs-string&quot;&gt;&quot;heap_used_percent&quot;&lt;/span&gt; : &lt;span class=&quot;hljs-number&quot;&gt;89&lt;/span&gt;,&lt;br /&gt;          &lt;span class=&quot;hljs-keyword&quot;&gt;...&lt;/span&gt; &lt;br /&gt;        &lt;span class=&quot;hljs-string&quot;&gt;&quot;gc&quot;&lt;/span&gt; : {&lt;br /&gt;          &lt;span class=&quot;hljs-string&quot;&gt;&quot;collectors&quot;&lt;/span&gt; : {&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot;&gt;...&lt;/span&gt;&lt;br /&gt;            &lt;span class=&quot;hljs-string&quot;&gt;&quot;old&quot;&lt;/span&gt; : {&lt;br /&gt;              &lt;span class=&quot;hljs-string&quot;&gt;&quot;collection_count&quot;&lt;/span&gt; : &lt;span class=&quot;hljs-number&quot;&gt;225835&lt;/span&gt;,&lt;br /&gt;              &lt;span class=&quot;hljs-string&quot;&gt;&quot;collection_time_in_millis&quot;&lt;/span&gt; : &lt;span class=&quot;hljs-number&quot;&gt;22624857&lt;/span&gt;&lt;br /&gt;            }&lt;br /&gt;          }&lt;br /&gt;        }&lt;/code&gt;&lt;/pre&gt;   &lt;h5 id=&quot;thread-pools&quot;&gt;Thread pools&lt;/h5&gt; &lt;p&gt;Each thread pool reports three key numbers: active, queue and rejected. It is useful to visualize how they change in real time. 
Once there are many queued or rejected tasks, it is a good time to think about scaling up or scaling out. &lt;/p&gt;   &lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot; hljs bash&quot;&gt;&lt;span class=&quot;hljs-string&quot;&gt;&quot;thread_pool&quot;&lt;/span&gt; : {&lt;br /&gt;        &lt;span class=&quot;hljs-string&quot;&gt;&quot;bulk&quot;&lt;/span&gt; : {&lt;br /&gt;          &lt;span class=&quot;hljs-string&quot;&gt;&quot;threads&quot;&lt;/span&gt; : &lt;span class=&quot;hljs-number&quot;&gt;4&lt;/span&gt;,&lt;br /&gt;          &lt;span class=&quot;hljs-string&quot;&gt;&quot;queue&quot;&lt;/span&gt; : &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;,&lt;br /&gt;          &lt;span class=&quot;hljs-string&quot;&gt;&quot;active&quot;&lt;/span&gt; : &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;,&lt;br /&gt;          &lt;span class=&quot;hljs-string&quot;&gt;&quot;rejected&quot;&lt;/span&gt; : &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;,&lt;br /&gt;          &lt;span class=&quot;hljs-string&quot;&gt;&quot;largest&quot;&lt;/span&gt; : &lt;span class=&quot;hljs-number&quot;&gt;4&lt;/span&gt;,&lt;br /&gt;          &lt;span class=&quot;hljs-string&quot;&gt;&quot;completed&quot;&lt;/span&gt; : &lt;span class=&quot;hljs-number&quot;&gt;53680&lt;/span&gt;&lt;br /&gt;        },&lt;/code&gt;&lt;/pre&gt;   &lt;h5 id=&quot;the-segments-numbersize&quot;&gt;The segments’ number/size&lt;/h5&gt; &lt;p&gt;The segments are the inverted indexes stored on the hard disk. While a segment is open, part of it is held persistently in memory, and GC has no effect on that part. Each segment leaves a footprint on every search thread. 
The size of the segments is important because it is multiplied by the number of threads.&lt;/p&gt;   &lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot; hljs bash&quot;&gt;        &lt;span class=&quot;hljs-string&quot;&gt;&quot;segments&quot;&lt;/span&gt; : {&lt;br /&gt;          &lt;span class=&quot;hljs-string&quot;&gt;&quot;count&quot;&lt;/span&gt; : &lt;span class=&quot;hljs-number&quot;&gt;215&lt;/span&gt;,&lt;br /&gt;          &lt;span class=&quot;hljs-string&quot;&gt;&quot;memory_in_bytes&quot;&lt;/span&gt; : &lt;span class=&quot;hljs-number&quot;&gt;15084680&lt;/span&gt;,&lt;br /&gt;        },&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt; &lt;p&gt;The number of shards effectively controls the number of segments: as the number of shards increases, the segments get smaller and more numerous. So we cannot increase the number of shards as much as we want, because many small segments drive the heap usage much higher. The solution is &lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html&quot;&gt;Force merge&lt;/a&gt;, which is time-consuming but effective. &lt;/p&gt;   &lt;h2 id=&quot;3-kibana-as-bi-dashboard&quot;&gt;3. Kibana as BI dashboard&lt;/h2&gt;   &lt;h4 id=&quot;31-plugins&quot;&gt;3.1 Plugins&lt;/h4&gt; &lt;p&gt;Kibana integrates Dev Tools (previously the Sense UI) for free. Dev Tools has code assistance and is a powerful tool for debugging. If budget is not an issue, X-Pack is also highly recommended. As for Elasticsearch, since 5.0 &lt;code&gt;ingest-geoip&lt;/code&gt; is a separate plugin. 
We have to add it to the Ansible YAML, such as -&lt;/p&gt;   &lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot; hljs haml&quot;&gt;es_plugins:&lt;br /&gt;    -&lt;span class=&quot;ruby&quot;&gt; &lt;span class=&quot;hljs-symbol&quot;&gt;plugin:&lt;/span&gt; ingest-geoip&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;   &lt;h4 id=&quot;32-instant-aggregation&quot;&gt;3.2 Instant Aggregation&lt;/h4&gt; &lt;p&gt;There are quite a few KPIs that need system-wide terms aggregations. Since 5.0, the shard request cache is enabled by default for all requests with &lt;code&gt;size:0&lt;/code&gt;.  &lt;br&gt;For example -&lt;/p&gt;   &lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot; hljs haskell&quot;&gt;&lt;span class=&quot;hljs-type&quot;&gt;POST&lt;/span&gt; /big_data_index/&lt;span class=&quot;hljs-typedef&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;data&lt;/span&gt;/_search&lt;/span&gt;&lt;br /&gt;{   &lt;span class=&quot;hljs-string&quot;&gt;&quot;size&quot;&lt;/span&gt;: &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;,&lt;br /&gt;    &lt;span class=&quot;hljs-string&quot;&gt;&quot;query&quot;&lt;/span&gt;: {&lt;br /&gt;        &lt;span class=&quot;hljs-string&quot;&gt;&quot;bool&quot;&lt;/span&gt;: {&lt;br /&gt;            &lt;span class=&quot;hljs-string&quot;&gt;&quot;must_not&quot;&lt;/span&gt;: {&lt;br /&gt;                &lt;span class=&quot;hljs-string&quot;&gt;&quot;exists&quot;&lt;/span&gt;: {&lt;br /&gt;                    &lt;span class=&quot;hljs-string&quot;&gt;&quot;field&quot;&lt;/span&gt;: &lt;span class=&quot;hljs-string&quot;&gt;&quot;interesting_field&quot;&lt;/span&gt;&lt;br /&gt;                }&lt;br /&gt;            }&lt;br /&gt;        }&lt;br /&gt;    }&lt;br /&gt;}&lt;/code&gt;&lt;/pre&gt; &lt;p&gt;The force merge mentioned above, such as &lt;code&gt;POST /_forcemerge?max_num_segments=1&lt;/code&gt;, combines the segments and dramatically increases the aggregation speed. 
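&lt;/p&gt;&lt;p&gt;Force merge is expensive, so a natural place to trigger it is the Python batch worker. The call can go right after an index has been fully loaded and before the alias is switched to it. The sketch below assumes &lt;code&gt;es&lt;/code&gt; is an instance of the official Python client (&lt;code&gt;elasticsearch.Elasticsearch&lt;/code&gt;). The function name, the aggregation name &lt;code&gt;by_value&lt;/code&gt; and the field argument are hypothetical. The field is assumed to be a &lt;code&gt;keyword&lt;/code&gt; field so that a terms aggregation is allowed on it.&lt;/p&gt;&lt;pre class=&quot;prettyprint&quot;&gt;&lt;code class=&quot;language-python hljs &quot;&gt;def merge_then_aggregate(es, index, field, top_n=10):
    """Force-merge a finished index, then run a size:0 terms aggregation."""
    # Merge each shard down to a single segment. Only do this on an index
    # that no longer receives writes, e.g. right after a batch load.
    es.indices.forcemerge(index=index, max_num_segments=1)

    body = {
        "size": 0,  # no hits, only aggregations: eligible for the request cache
        "aggs": {"by_value": {"terms": {"field": field, "size": top_n}}},
    }
    # request_cache=True asks for caching explicitly; for size:0 searches
    # it is already the default since Elasticsearch 5.0.
    result = es.search(index=index, body=body, request_cache=True)
    buckets = result["aggregations"]["by_value"]["buckets"]
    # Map each term to its document count.
    return {b["key"]: b["doc_count"] for b in buckets}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Consider a call such as &lt;code&gt;merge_then_aggregate(es, 'big_data_index', 'status')&lt;/code&gt;, where &lt;code&gt;status&lt;/code&gt; is a hypothetical field. It returns a dict that maps each term to its document count. After the merge, repeating the same KPI query only touches one segment per shard. As long as the index does not change, the result can also be served from the shard request cache. On a big index, the force merge call may outlast the client default request timeout, so the timeout may need to be raised.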
&lt;/p&gt; &lt;h4 id=&quot;33-proxy&quot;&gt;3.3 Proxy&lt;/h4&gt; &lt;p&gt;Nginx is possibly the best proxy to put in front of Kibana. There are two advantages: first, the proxy can cache the static resources of Kibana; second, we can always check the Nginx logs to figure out what causes problems for Kibana. &lt;/p&gt;   &lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt; &lt;p&gt;Elasticsearch and Kibana together provide high availability and high scalability for a large BI system. &lt;/p&gt; &lt;blockquote&gt;  &lt;p&gt;If you have any comments, please email me at wm@sasanalysis.com&lt;/p&gt;&lt;/blockquote&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/8758802296176401668'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/8758802296176401668'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2017/02/use-elasticsearch-and-kibana-for-large_16.html' title='Use Elasticsearch and Kibana for large BI system'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://2.bp.blogspot.com/-Ft_j8Vo9plg/WKX7vSu36GI/AAAAAAAALWA/bTeMPL3MUJUOJvljREK7yUBclhYJzi9OgCLcB/s72-c/valve-1135657_960_720.jpg" height="72" width="72"/></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-4090150749080677705</id><published>2016-09-07T14:19:00.003-05:00</published><updated>2016-09-07T14:19:57.452-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="python"/><title type='text'>Use Slack bot to monitor the server</title><content type='html'>&lt;br /&gt;&lt;div class=&quot;markdown-here-wrapper&quot; 
data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/target=post;postID=4090150749080677705&quot; markdown-here-wrapper-content-modified=&quot;true&quot;&gt;&lt;div style=&quot;margin: 0px 0px 1.2em !important;&quot;&gt;I used to install &lt;a href=&quot;https://www.datadoghq.com/&quot;&gt;Datadog&lt;/a&gt; or other SaaS tools to monitor my Linux boxes in the cloud. Most of the time they are just overkill for my tiny servers with only 1 GB or 2 GB of memory. What I am actually most interested in is the running processes and/or the exact memory usage. I also need a mobile solution to monitor on the go.  &lt;/div&gt;&lt;div style=&quot;margin: 0px 0px 1.2em !important;&quot;&gt;Now with the arrival of &lt;a href=&quot;https://api.slack.com/bot-users&quot;&gt;Slack bots&lt;/a&gt; and &lt;a href=&quot;https://github.com/slackhq/python-slackclient&quot;&gt;their real-time Python client&lt;/a&gt;,&amp;nbsp;I can just use a simple Python script to achieve this. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block !important; display: block; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;from&lt;/span&gt; slackclient &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; SlackClient&lt;br /&gt;&lt;span 
class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;from&lt;/span&gt; subprocess &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt;  getoutput&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; logging&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; time&lt;br /&gt;&lt;br /&gt;message_channel = &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;#my-server-001&#39;&lt;/span&gt;&lt;br /&gt;api_key = &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;xoxb-slack-token&#39;&lt;/span&gt;&lt;br /&gt;client = SlackClient(api_key)&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; client.rtm_connect():&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;True&lt;/span&gt;:&lt;br /&gt;        last_read = client.rtm_read()&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; last_read:&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;try&lt;/span&gt;:&lt;br /&gt;                parsed = last_read[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;][&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;text&#39;&lt;/span&gt;]&lt;br /&gt;                &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; parsed &lt;span class=&quot;hljs-keyword&quot; 
style=&quot;color: #333333; font-weight: bold;&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;status&#39;&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; parsed:&lt;br /&gt;                    result = getoutput(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;pstree&#39;&lt;/span&gt;)&lt;br /&gt;                    result += &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;\n\n&#39;&lt;/span&gt; + getoutput(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;free -h&#39;&lt;/span&gt;)&lt;br /&gt;                    client.rtm_send_message(message_channel, str(result))&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;except&lt;/span&gt; Exception &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;as&lt;/span&gt; e:&lt;br /&gt;                logging.error(e)&lt;br /&gt;        time.sleep(&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;)&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 0px 0px 1.2em !important;&quot;&gt;Then I use &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;systemd&lt;/code&gt; or other tools to daemonize it. 
No matter where and when I am, as soon as I enter &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;status&lt;/code&gt; in the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;#my-server-001&lt;/code&gt; channel on my phone, I instantly get a result like - &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block !important; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;systemd-+-accounts-daemon-+-{gdbus}&lt;br /&gt;       |                 `-{gmain}&lt;br /&gt;       |-agetty&lt;br /&gt;       |-cron&lt;br /&gt;       |-dbus-daemon&lt;br /&gt;       |-fail2ban-server---2*[{fail2ban-server}]&lt;br /&gt;       |-login---bash&lt;br /&gt;       |-nginx---nginx&lt;br /&gt;       |-postgres---5*[postgres]&lt;br /&gt;       |-python---sh---pstree&lt;br /&gt;       |-redis-server---2*[{redis-server}]&lt;br /&gt;       |-rsyslogd-+-{in:imklog}&lt;br /&gt;       |          |-{in:imuxsock}&lt;br /&gt;       |          `-{rs:main Q:Reg}&lt;br /&gt;       |-sshd-+-3*[sshd---sshd---bash]&lt;br /&gt;       |      `-sshd---sshd&lt;br /&gt;       |-2*[systemd---(sd-pam)]&lt;br /&gt;       
|-systemd-journal&lt;br /&gt;       |-systemd-logind&lt;br /&gt;       |-systemd-timesyn---{sd-resolve}&lt;br /&gt;       |-systemd-udevd&lt;br /&gt;       `-uwsgi---uwsgi---5*[uwsgi]&lt;br /&gt;&lt;br /&gt;             total        used        free      shared  buff/cache   available&lt;br /&gt;Mem:           2.0G        207M        527M         26M        1.2G        1.7G&lt;br /&gt;Swap:          255M          0B        255M&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/4090150749080677705/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=4090150749080677705' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/4090150749080677705'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/4090150749080677705'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2016/09/use-slack-bot-to-monitor-server.html' title='Use Slack bot to monitor the server'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-3234862478706284550</id><published>2015-07-29T08:24:00.000-05:00</published><updated>2015-07-29T09:17:19.426-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="spark"/><title type='text'>Deploy edx spark environment to DigitalOcean</title><content type='html'>&lt;div class=&quot;markdown-here-wrapper&quot; data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/target=post;postID=3234862478706284550&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;This summer I took the Spark courses at edX &lt;a href=&quot;https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x&quot;&gt;CS100&lt;/a&gt; and &lt;a href=&quot;https://www.edx.org/course/scalable-machine-learning-uc-berkeleyx-cs190-1x&quot;&gt;CS190&lt;/a&gt;, and had a wonderful experience. 
&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;The two classes use a Vagrant virtual machine containing Spark and all of the teaching materials. The virtual machine poses two challenges:&lt;/div&gt;&lt;ol style=&quot;margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;The labs usually take a long time to finish, often 8-10 hours. If the host machine is shut down, the RDDs are lost and the pipeline has to be run again.&lt;/div&gt;&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;Some RDD operations, such as &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;groupByKey&lt;/code&gt; and &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;distinct&lt;/code&gt;, require a lot of computation and communication. Many of my 50k classmates complained about the waiting time. Moreover, my most-used laptop is a Chromebook, which does not even offer an option to install VirtualBox. &lt;/div&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;Deploying the learning environment to the cloud is an alternative. 
DigitalOcean is a good choice because it hosts mirrors for most packages and its network is remarkably fast, reaching almost 100MB/s (the SSD infrastructure DigitalOcean uses for its cloud makes this possible, since a conventional hard disk might not keep up with such rapid I/O; see my deployment records on&amp;nbsp;&lt;a href=&quot;https://github.com/dapangmao/Blog/blob/master/Deploy%20edx%20spark%20environment%20to%20DigitalOcean/Deploy%20edx%20spark%20environment%20to%20DigitalOcean.ipynb&quot;&gt;GitHub&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;I found that a Linux box at DigitalOcean with 1 GB of memory and 1 CPU, which costs 10 dollars a month, handles most labs fairly easily with IPython and Spark. A droplet with 2 GB of memory and 2 CPUs is ideal, since that is the minimum requirement for a simulated cluster. It costs 20 dollars a month, but is still much cheaper than the $100 big data certificate ($50 for each course). I only needed to write Python scripts to install the IPython notebook with SSL and to download Spark and the course materials.&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;&lt;a href=&quot;http://2.bp.blogspot.com/-qycNjcvU1X8/VbjIaTpaFJI/AAAAAAAAERs/tMZnSxXzu7U/s1600/Capture2.PNG&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;448&quot; src=&quot;http://2.bp.blogspot.com/-qycNjcvU1X8/VbjIaTpaFJI/AAAAAAAAERs/tMZnSxXzu7U/s640/Capture2.PNG&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;ul style=&quot;margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;The DevOps tool is &lt;a href=&quot;http://www.fabfile.org/&quot;&gt;Fabric&lt;/a&gt;, and the fabfile is on &lt;a href=&quot;https://github.com/dapangmao/Blog/blob/master/Deploy%20edx%20spark%20environment%20to%20DigitalOcean/fabfile.py&quot;&gt;GitHub&lt;/a&gt;.&lt;/div&gt;&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 
0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;The deployment pipeline is also at &lt;a href=&quot;https://github.com/dapangmao/Blog/blob/master/Deploy%20edx%20spark%20environment%20to%20DigitalOcean/Deploy%20edx%20spark%20environment%20to%20DigitalOcean.ipynb&quot;&gt;GitHub&lt;/a&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div style=&quot;font-size: 0em; height: 0; margin: 0; max-height: 0; max-width: 0; overflow: hidden; padding: 0; width: 0;&quot; title=&quot;MDH:PHA+VGhpcyBzdW1tZXIgSSB0b29rIHRoZSBTcGFyayBjb3Vyc2VzIGF0IGVkeCBbQ1MxMDBdKGh0 dHBzOi8vd3d3LmVkeC5vcmcvY291cnNlL2ludHJvZHVjdGlvbi1iaWctZGF0YS1hcGFjaGUtc3Bh cmstdWMtYmVya2VsZXl4LWNzMTAwLTF4KSBhbmQgW0NTMTkwXShodHRwczovL3d3dy5lZHgub3Jn L2NvdXJzZS9zY2FsYWJsZS1tYWNoaW5lLWxlYXJuaW5nLXVjLWJlcmtlbGV5eC1jczE5MC0xeCks IGFuZCBoYWQgd29uZGVyZnVsIGV4cGVyaWVuY2UuJm5ic3A7PC9wPjxwPjxicj48L3A+PHA+VGhl IHR3byBjbGFzc2VzIHVzZXMgYSBWYWdyYW50IHZpcnR1YWwgbWFjaGluZSBjb250YWluaW5nIFNw YXJrIGFuZCBhbGwgdGVhY2hpbmcgbWF0ZXJpYWxzLiBUaGVyZSBhcmUgdHdvIGNoYWxsZW5nZXMg d2l0aCB0aGUgdmlydHVhbCBtYWNoaW5lIC0tPC9wPjxwPjxicj48L3A+PHA+MS4gVGhlIGxhYnMg dGFrZXMgbG9uZyB0aW1lIHRvIGZpbmlzaCwgc2F5IDgtMTAgaG91cnMuIElmIHRoZSBob3N0IG1h Y2hpbmUgaXMgY2xvc2VkLCB0aGUgUkREcyB3aWxsIGJlIGxvc3QgYW5kIHRoZSBwaXBlbGluZSBo YXMgdG8gYmUgcnVuIGFnYWluLjwvcD48cD48YnI+PC9wPjxwPjIuIFNvbWUgUkREIG9wZXJhdGlv bnMgdGFrZXMgYSBsb3QgY29tcHV0YXRpb24vY29tbXVuaWNhdGlvbiBwb3dlcnMsIHN1Y2ggYXMg YGdyb3VwQnlLZXlgIGFuZCBgZGlzdGluY3RgLiBNYW55IG9mIG15IDUwayBjbGFzc21hdGVzIGNv bXBsYWluZWQgYWJvdXQgaXQuIEFuZCBteSBtb3N0IHVzZWQgbGFwdG9wIGlzIGEgQ2hyb21lYm9v ayBhbmQgZG9lc24ndCBldmVuIGhhdmUgb3B0aW9ucyB0byBpbnN0YWxsIFZpcnR1YWwgQm94LiZu YnNwOzwvcD48cD48YnI+PC9wPjxwPlRvIGRlcGxveSB0aGUgbGVhcm5pbmcgZW52aXJvbm1lbnQg dG8gYSBjbG91ZCBtYXkgYmUgYW4gYWx0ZXJuYXRpdmUuIERpZ2l0YWxPY2VhbiBpcyBhIGdvb2Qg Y2hvaWNlIGJlY2F1c2UgaXQgdXNlcyBtaXJyb3JzIGZvciBtb3N0IHBhY2thZ2VzLCBhbmQgdGhl IG5ldHdvcmsgc3BlZWQgaXMgYW1hemluZ2x5IGZhc3QgdGhhdCBpcyBhbG1vc3QgMTAwTUIvcyAo 
dGhhbmtzIHRvIHRoZSBTU0QgaW5mcmFzdHJ1Y3J1ZSBEaWdpdGFsT2NlYW4gaW1wbGVtZW50cyku PC9wPjxwPjxicj48L3A+PHA+SSBmb3VuZCB0aGF0IGEgTGludXggYm94IHdpdGggMUdCIGFuZCAx Q1BVIGF0IERpZ2l0YWxPY2VhbiB0aGF0IGNvc3RzIDEwIGRvbGxhcnMgYSBtb250aCB3aWxsIGhh bmRsZSBtb3N0IGxhYnMgZmFpcmx5IGVhc3kgd2l0aCBJUHl0aG9uIGFuZCBTcGFyay4gQSAyR0Ig YW5kIDJDUFUgZHJvcGxldCB3aWxsIGJlIGlkZWFsIHNpbmNlIGl0IGlzIG1pbmltYWwgZm9yIGEg c2ltdWxhdGVkIGNsdXN0ZXIuIEl0IGNvc3RzIDIwIGRvbGxhcnMgYSBtb250aCwgYnV0IGlzIHN0 aWxsIG11Y2ggY2hlYXBlciB0aGFuIHRoZSBjZXJ0aWZjYXRlIGNvc3QgdGhhdCBpcyAkMTAwLiBJ IGp1c3QgbmVlZCB0byBpbnN0YWxsIElQeXRob24gbm90ZWJvb2sgd2l0aCBTU0wsIGFuZCBkb3du bG9hZCBTcGFyayBhbmQgdGhlIGNvdXJzZSBtYXRlcmlhbHMuJm5ic3A7PC9wPjxwPjxicj48L3A+ PHAgY2xhc3M9InNlcGFyYXRvciIgc3R5bGU9InRleHQtYWxpZ246IGNlbnRlcjsgY2xlYXI6IGJv dGg7Ij48YSBpbWFnZWFuY2hvcj0iMSIgaHJlZj0iaHR0cDovLzIuYnAuYmxvZ3Nwb3QuY29tLy1x eWNOamN2VTFYOC9WYmpJYVRwYUZKSS9BQUFBQUFBQUVScy90TVpuU3hYenU3VS9zMTYwMC9DYXB0 dXJlMi5QTkciIHN0eWxlPSJtYXJnaW4tbGVmdDogMWVtOyBtYXJnaW4tcmlnaHQ6IDFlbTsiPjxp bWcgc3JjPSJodHRwczovLzIuYnAuYmxvZ3Nwb3QuY29tLy1xeWNOamN2VTFYOC9WYmpJYVRwYUZK SS9BQUFBQUFBQUVScy90TVpuU3hYenU3VS9zNjQwL0NhcHR1cmUyLlBORyIgYm9yZGVyPSIwIiB3 aWR0aD0iNjQwIiBoZWlnaHQ9IjQ0OCI+PC9hPjwvcD48cD48YnI+PC9wPjxwPjxicj48L3A+PHA+ LSBUaGUgRGV2T3BzIHRvb2wgaXMgW0ZhYnJpY10oaHR0cDovL3d3dy5mYWJmaWxlLm9yZy8pIGFu ZCB0aGUgZmFiZmlsZSBpcyBhdCBbR2l0SHViXShodHRwczovL2dpdGh1Yi5jb20vZGFwYW5nbWFv L0Jsb2cvYmxvYi9tYXN0ZXIvRGVwbG95JTIwZWR4JTIwc3BhcmslMjBlbnZpcm9ubWVudCUyMHRv JTIwRGlnaXRhbE9jZWFuL2ZhYmZpbGUucHkpLjwvcD48cD48YnI+PC9wPjxwPi0gVGhlIGRlcGxv eW1lbnQgcGlwZWxpbmUgaXMgYWxzbyBhdCBbR2l0SHViXShodHRwczovL2dpdGh1Yi5jb20vZGFw YW5nbWFvL0Jsb2cvYmxvYi9tYXN0ZXIvRGVwbG95JTIwZWR4JTIwc3BhcmslMjBlbnZpcm9ubWVu dCUyMHRvJTIwRGlnaXRhbE9jZWFuL0RlcGxveSUyMGVkeCUyMHNwYXJrJTIwZW52aXJvbm1lbnQl MjB0byUyMERpZ2l0YWxPY2Vhbi5pcHluYik8L3A+PHA+PGJyPjwvcD4=&quot;&gt;​&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/3234862478706284550/comments/default' 
title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=3234862478706284550' title='15 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/3234862478706284550'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/3234862478706284550'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2015/07/deploy-edx-spark-environment-to.html' title='Deploy edx spark environment to DigitalOcean'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-qycNjcvU1X8/VbjIaTpaFJI/AAAAAAAAERs/tMZnSxXzu7U/s72-c/Capture2.PNG" height="72" width="72"/><thr:total>15</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-8898042550578692558</id><published>2015-07-17T11:25:00.000-05:00</published><updated>2015-07-17T11:25:21.393-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="spark"/><title type='text'>Transform SAS files to Parquet through Spark</title><content type='html'>&lt;div class=&quot;markdown-here-wrapper&quot; data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/target=post;postID=8898042550578692558&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;The demo pipeline is on &lt;a href=&quot;https://github.com/dapangmao/Blog/blob/master/Transform%20SAS%20files%20to%20Parquet%20through%20Spark/Transform%20SAS%20files%20to%20Parquet%20through%20Spark.ipynb&quot;&gt;GitHub&lt;/a&gt;.&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;Since version 1.3, Spark has introduced a new data 
structure &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;DataFrame&lt;/code&gt;. A data analyst can now easily scale out existing code built on the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;DataFrame&lt;/code&gt; in Python or R to a cluster hosting Hadoop and Spark.&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;There are quite a few practical scenarios in which &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;DataFrame&lt;/code&gt; fits well. For example, many data files, including hard-to-read SAS files, may need to be merged into a single data store. &lt;a href=&quot;https://parquet.apache.org/documentation/latest/&quot;&gt;Apache Parquet&lt;/a&gt; is a popular columnar storage format for distributed environments, and it is especially friendly to structured or semi-structured data. 
It is an ideal candidate for a universal data destination.&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;I copy three SAS files, &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;prdsale&lt;/code&gt;, &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;prdsal2&lt;/code&gt; and &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;prdsal3&lt;/code&gt;, which contain simulated sales records, from the SASHELP library to a Linux directory. Then I launch the SQL context in Spark 1.4. &lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;Together, the three SAS files take up 4.2 MB. 
My overall strategy is to build a pipeline along the lines of &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;SAS --&amp;gt; Python --&amp;gt; Spark --&amp;gt; Parquet&lt;/code&gt;.&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block !important; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;import os&lt;br /&gt;from pyspark.sql import HiveContext&lt;br /&gt;&lt;br /&gt;try:&lt;br /&gt;    import sas7bdat&lt;br /&gt;    import pandas&lt;br /&gt;except ImportError:&lt;br /&gt;    print(&#39;try to install the sas7bdat and pandas packages first&#39;)&lt;br /&gt;&lt;br /&gt;# sc and sqlContext are created by the PySpark shell/notebook&lt;br /&gt;print(&#39;Spark version is {}&#39;.format(sc.version))&lt;br /&gt;&lt;br /&gt;if not isinstance(sqlContext, HiveContext):&lt;br /&gt;    print(&#39;reset the Spark SQL context&#39;)&lt;br /&gt;&lt;br /&gt;os.chdir(&#39;/root/playground&#39;)&lt;br /&gt;&lt;br /&gt;def print_bytes(filename):&lt;br /&gt;    print(&#39;{} has {:,} bytes&#39;.format(filename, os.path.getsize(filename)))&lt;br /&gt;&lt;br /&gt;print_bytes(&#39;prdsale.sas7bdat&#39;)&lt;br /&gt;print_bytes(&#39;prdsal2.sas7bdat&#39;)&lt;br /&gt;print_bytes(&#39;prdsal3.sas7bdat&#39;)&lt;br /&gt;&lt;br /&gt;# IPython shell escape: total size of the directory, excluding the Parquet output&lt;br /&gt;!du -ch --exclude=test_parquet&lt;br /&gt;&lt;br /&gt;Spark version is 1.4.0&lt;br /&gt;prdsale.sas7bdat has 148,480 bytes&lt;br /&gt;prdsal2.sas7bdat has 2,790,400 bytes&lt;br /&gt;prdsal3.sas7bdat has 1,401,856 
bytes&lt;br /&gt;4.2M    .&lt;br /&gt;4.2M    total&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h4 id=&quot;1-test-dataframe-in-python-and-spark&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;1. Test DataFrame in Python and Spark&lt;/h4&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;First I transform a SAS &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;sas7bdat&lt;/code&gt; file into a pandas DataFrame. A great feature of Spark is that a pandas DataFrame can be converted into a Spark DataFrame with the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;createDataFrame&lt;/code&gt; method. Now I have two DataFrames: one in pandas and one in Spark. 
&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block !important; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;with sas7bdat.SAS7BDAT(&#39;prdsale.sas7bdat&#39;) as f:&lt;br /&gt;     pandas_df = f.to_data_frame()&lt;br /&gt;print(&#39;-----Data in Pandas dataframe-----&#39;)&lt;br /&gt;print(pandas_df.head())&lt;br /&gt;&lt;br /&gt;print(&#39;-----Data in Spark dataframe-----&#39;)&lt;br /&gt;spark_df = sqlContext.createDataFrame(pandas_df)&lt;br /&gt;spark_df.show(5)&lt;br /&gt;&lt;br /&gt;-----Data in Pandas dataframe-----&lt;br /&gt;   ACTUAL COUNTRY   DIVISION  MONTH  PREDICT   PRODTYPE PRODUCT  QUARTER  \&lt;br /&gt;0     925  CANADA  EDUCATION  12054      850  FURNITURE    SOFA        1   &lt;br /&gt;1     999  CANADA  EDUCATION  12085      297  FURNITURE    SOFA        1   &lt;br /&gt;2     608  CANADA  EDUCATION  12113      846  FURNITURE    SOFA        1   &lt;br /&gt;3     642  CANADA  EDUCATION  12144      533  FURNITURE    SOFA        2   &lt;br /&gt;4     656  CANADA  EDUCATION  12174      646  FURNITURE    SOFA        2   &lt;br /&gt;&lt;br /&gt;  REGION  YEAR  &lt;br /&gt;0   EAST  1993  &lt;br /&gt;1   EAST  1993  &lt;br /&gt;2   EAST  1993  &lt;br /&gt;3   EAST  1993  &lt;br /&gt;4   EAST  1993  &lt;br /&gt;-----Data in Spark dataframe-----&lt;br /&gt;+------+-------+---------+-------+-------+---------+-------+-------+------+------+&lt;br /&gt;|ACTUAL|COUNTRY| DIVISION|  MONTH|PREDICT| PRODTYPE|PRODUCT|QUARTER|REGION|  YEAR|&lt;br 
/&gt;+------+-------+---------+-------+-------+---------+-------+-------+------+------+&lt;br /&gt;| 925.0| CANADA|EDUCATION|12054.0|  850.0|FURNITURE|   SOFA|    1.0|  EAST|1993.0|&lt;br /&gt;| 999.0| CANADA|EDUCATION|12085.0|  297.0|FURNITURE|   SOFA|    1.0|  EAST|1993.0|&lt;br /&gt;| 608.0| CANADA|EDUCATION|12113.0|  846.0|FURNITURE|   SOFA|    1.0|  EAST|1993.0|&lt;br /&gt;| 642.0| CANADA|EDUCATION|12144.0|  533.0|FURNITURE|   SOFA|    2.0|  EAST|1993.0|&lt;br /&gt;| 656.0| CANADA|EDUCATION|12174.0|  646.0|FURNITURE|   SOFA|    2.0|  EAST|1993.0|&lt;br /&gt;+------+-------+---------+-------+-------+---------+-------+-------+------+------+&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;The two should have the same length; here both have 1,440 rows. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block !important; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;print(len(pandas_df))&lt;br /&gt;print(spark_df.count())&lt;br /&gt;&lt;br /&gt;1440&lt;br /&gt;1440&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h4 id=&quot;2-automate-the-transformation&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;2. Automate the transformation&lt;/h4&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;I write a pipeline function to automate the transformation. 
As a result, all three SAS files are saved in Parquet format under the same directory.&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block !important; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;def sas_to_parquet(filelist, destination):&lt;br /&gt;    &quot;&quot;&quot;Save SAS files to Parquet, one key=N partition per file&lt;br /&gt;    Args:&lt;br /&gt;        filelist (list): the list of SAS file names&lt;br /&gt;        destination (str): the path for Parquet&lt;br /&gt;    Returns:&lt;br /&gt;        None&lt;br /&gt;    &quot;&quot;&quot;&lt;br /&gt;    rows = 0&lt;br /&gt;    for i, filename in enumerate(filelist):&lt;br /&gt;        with sas7bdat.SAS7BDAT(filename) as f:&lt;br /&gt;            pandas_df = f.to_data_frame()&lt;br /&gt;            rows += len(pandas_df)&lt;br /&gt;        spark_df = sqlContext.createDataFrame(pandas_df)&lt;br /&gt;        # the key=i directory name lets Spark read key back as a partition column;&lt;br /&gt;        # DataFrame.save is deprecated in Spark 1.4 in favor of spark_df.write&lt;br /&gt;        spark_df.save(&quot;{0}/key={1}&quot;.format(destination, i), &quot;parquet&quot;)&lt;br /&gt;    print(&#39;{0} rows have been transformed&#39;.format(rows))&lt;br /&gt;&lt;br /&gt;# keep only the SAS data files in the working directory&lt;br /&gt;sasfiles = [x for x in os.listdir(&#39;.&#39;) if x.endswith(&#39;.sas7bdat&#39;)]&lt;br /&gt;print(sasfiles)&lt;br /&gt;&lt;br /&gt;sas_to_parquet(sasfiles, &#39;/root/playground/test_parquet&#39;)&lt;br /&gt;&lt;br /&gt;[&#39;prdsale.sas7bdat&#39;, &#39;prdsal2.sas7bdat&#39;, &#39;prdsal3.sas7bdat&#39;]&lt;br /&gt;36000 rows have been transformed&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;Then I read from the newly created Parquet data store. 
The query shows that the data has been successfully saved.&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block !important; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;df = sqlContext.load(&quot;/root/playground/test_parquet&quot;, &quot;parquet&quot;)&lt;br /&gt;print(df.count())&lt;br /&gt;df.filter(df.key == 0).show(5)&lt;br /&gt;&lt;br /&gt;36000&lt;br /&gt;+------+-------+------+----+-------+-------+---------+-------+-------+-----+------+-----+---------+------+---+&lt;br /&gt;|ACTUAL|COUNTRY|COUNTY|DATE|  MONTH|PREDICT| PRODTYPE|PRODUCT|QUARTER|STATE|  YEAR|MONYR| DIVISION|REGION|key|&lt;br /&gt;+------+-------+------+----+-------+-------+---------+-------+-------+-----+------+-----+---------+------+---+&lt;br /&gt;| 925.0| CANADA|  null|null|12054.0|  850.0|FURNITURE|   SOFA|    1.0| null|1993.0| null|EDUCATION|  EAST|  0|&lt;br /&gt;| 999.0| CANADA|  null|null|12085.0|  297.0|FURNITURE|   SOFA|    1.0| null|1993.0| null|EDUCATION|  EAST|  0|&lt;br /&gt;| 608.0| CANADA|  null|null|12113.0|  846.0|FURNITURE|   SOFA|    1.0| null|1993.0| null|EDUCATION|  EAST|  0|&lt;br /&gt;| 642.0| CANADA|  null|null|12144.0|  533.0|FURNITURE|   SOFA|    2.0| null|1993.0| null|EDUCATION|  EAST|  0|&lt;br /&gt;| 656.0| CANADA|  null|null|12174.0|  646.0|FURNITURE|   SOFA|    2.0| null|1993.0| null|EDUCATION|  EAST|  0|&lt;br /&gt;+------+-------+------+----+-------+-------+---------+-------+-------+-----+------+-----+---------+------+---+&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h4 id=&quot;3-conclusion&quot; 
style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;3. Conclusion&lt;/h4&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;There are several advantages to transforming data from various sources to Parquet.&lt;/div&gt;&lt;ol style=&quot;margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;It is an open format that can be read and written by most major software. &lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;It can be distributed well on HDFS. &lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;It compresses data (here with gzip, as the .gz.parquet file names show). &lt;/li&gt;&lt;/ol&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;For example, the original SAS files add up to 4.2 MB, while the Parquet store weighs only 292 KB, a compression ratio of about 14x (4,200 KB / 292 KB). &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block !important; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;os.chdir(&#39;/root/playground/test_parquet/&#39;)&lt;br /&gt;!du -ahc &lt;br /&gt;&lt;br /&gt;4.0K    ./key=2/._metadata.crc&lt;br /&gt;4.0K    ./key=2/._SUCCESS.crc&lt;br /&gt;0    ./key=2/_SUCCESS&lt;br /&gt;4.0K    ./key=2/_common_metadata&lt;br /&gt;4.0K    ./key=2/.part-r-00001.gz.parquet.crc&lt;br /&gt;4.0K    ./key=2/._common_metadata.crc&lt;br /&gt;4.0K    ./key=2/_metadata&lt;br /&gt;60K    ./key=2/part-r-00001.gz.parquet&lt;br /&gt;88K    ./key=2&lt;br /&gt;4.0K    ./key=0/._metadata.crc&lt;br /&gt;4.0K    ./key=0/._SUCCESS.crc&lt;br /&gt;0    
./key=0/_SUCCESS&lt;br /&gt;4.0K    ./key=0/_common_metadata&lt;br /&gt;4.0K    ./key=0/.part-r-00001.gz.parquet.crc&lt;br /&gt;4.0K    ./key=0/._common_metadata.crc&lt;br /&gt;4.0K    ./key=0/_metadata&lt;br /&gt;12K    ./key=0/part-r-00001.gz.parquet&lt;br /&gt;40K    ./key=0&lt;br /&gt;4.0K    ./key=1/._metadata.crc&lt;br /&gt;4.0K    ./key=1/._SUCCESS.crc&lt;br /&gt;0    ./key=1/_SUCCESS&lt;br /&gt;4.0K    ./key=1/_common_metadata&lt;br /&gt;4.0K    ./key=1/.part-r-00001.gz.parquet.crc&lt;br /&gt;4.0K    ./key=1/._common_metadata.crc&lt;br /&gt;4.0K    ./key=1/_metadata&lt;br /&gt;132K    ./key=1/part-r-00001.gz.parquet&lt;br /&gt;160K    ./key=1&lt;br /&gt;292K    .&lt;br /&gt;292K    total&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;A bar plot visualizes the significant size difference between the two formats: Parquet takes up roughly an order of magnitude less space. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block !important; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;%matplotlib inline&lt;br /&gt;import matplotlib.pyplot as plt&lt;br /&gt;import numpy as np&lt;br /&gt;index = np.arange(2)&lt;br /&gt;bar_width = 0.35&lt;br /&gt;data = [4200, 292]  # total sizes in KB, from the du output above&lt;br /&gt;header = [&#39;SAS files&#39;, &#39;Parquet&#39;]&lt;br /&gt;# center each bar on its index so the tick labels line up&lt;br /&gt;plt.bar(index, data, bar_width, align=&#39;center&#39;)&lt;br /&gt;plt.grid(True, which=&#39;major&#39;, axis=&#39;y&#39;)&lt;br /&gt;plt.ylabel(&#39;File size (KB)&#39;)&lt;br /&gt;plt.xticks(index, header)&lt;br /&gt;plt.show()&lt;br 
/&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;&lt;a href=&quot;http://3.bp.blogspot.com/--lCxMED8HIo/VaksPWdzLvI/AAAAAAAAEPY/JKQ_NAVTGHo/s1600/output_13_0.png&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;208&quot; src=&quot;http://3.bp.blogspot.com/--lCxMED8HIo/VaksPWdzLvI/AAAAAAAAEPY/JKQ_NAVTGHo/s320/output_13_0.png&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style=&quot;font-size: 0em; height: 0; margin: 0; max-height: 0; max-width: 0; overflow: hidden; padding: 0; width: 0;&quot; title=&quot;MDH:PHA+VGhlIGRlbW8gcGlwZWxpbmUgaXMgYXQgW0dpdEh1Yl0oaHR0cHM6Ly9naXRodWIuY29tL2Rh cGFuZ21hby9CbG9nL2Jsb2IvbWFzdGVyL1RyYW5zZm9ybSUyMFNBUyUyMGZpbGVzJTIwdG8lMjBQ YXJxdWV0JTIwdGhyb3VnaCUyMFNwYXJrL1RyYW5zZm9ybSUyMFNBUyUyMGZpbGVzJTIwdG8lMjBQ YXJxdWV0JTIwdGhyb3VnaCUyMFNwYXJrLmlweW5iKS48L3A+PHA+PGJyPjwvcD48cD48YnI+PC9w PjxwPlNpbmNlIHRoZSB2ZXJzaW9uIDEuMywgU3BhcmsgaGFzIGludHJvZHVjZWQgdGhlIG5ldyBk YXRhIHN0cnVjdHVyZSBgRGF0YUZyYW1lYC4gQSBkYXRhIGFuYWx5c3Qgbm93IGNvdWxkIGVhc2ls eSBzY2FsZSBvdXQgdGhlIGV4c2l0aW5nIGNvZGVzIGJhc2VkIG9uIHRoZSBgRGF0YUZyYW1lYCBm cm9tIFB5dGhvbiBvciBSIHRvIGEgY2x1c3RlciBob3N0aW5nIEhhZG9vcCBhbmQgU3BhcmsuPC9w PjxwPjxicj48L3A+PHA+VGhlcmUgYXJlIHF1aXRlIGEgZmV3IHByYWN0aWNhbCBzY2VuYXJpb3Mg dGhhdCBgRGF0YUZyYW1lYCBmaXRzIHdlbGwuIEZvciBleGFtcGxlLCBhIGxvdCBvZiBkYXRhIGZp bGVzIGluY2x1ZGluZyB0aGUgaGFyZGx5IHJlYWQgU0FTIGZpbGVzIHdhbnQgdG8gbWVyZ2UgaW50 byBhIHNpbmdsZSBkYXRhIHN0b3JlLiBbQXBhY2hlIFBhcnF1ZXRdKGh0dHBzOi8vcGFycXVldC5h cGFjaGUub3JnL2RvY3VtZW50YXRpb24vbGF0ZXN0LykgaXMgYSBwb3B1bGFyIGNvbHVtbiBzdG9y ZSBpbiBhIGRpc3RyaWJ1dGVkIGVudmlyb25tZW50LCBhbmQgZXNwZWNpYWxseSBmcmllbmRseSB0 byBzdHJ1Y3R1cmVkIG9yIHNlbWktc3RydWN1dHJlZCBkYXRhLiBJdCBpcyBhbiBpZGVhbCBjYW5k aWRhdGUgZm9yIGEgdW5pdmVyYWwgZGF0YSBkZXN0aW5hdGlvbi48L3A+PHA+PGJyPjwvcD48cD5J IGNvcHkgdGhyZWUgU0FTIGZpbGVzIGNhbGxlZCBgcHJkc2FsZWAsIGBwcmRzYWwyYCBhbmQgYHBy ZHNhbDNgLCB3aGljaCBhcmUgYWJvdXQgYSBzaW11bGF0ZWQgc2FsZXMgcmVjb3JkLCBmcm9tIHRo 
ZSBTQVNIRUxQIGxpYnJhcnkgdG8gYSBMaW51eCBkaXJlY3RvcnkuIEFuZCB0aGVuIEkgbGF1bmNo IHRoZSBTUUwgY29udGV4dCBmcm9tIFNwYXJrIDEuNC4mbmJzcDs8L3A+PHA+PGJyPjwvcD48cD5U aGUgdGhyZWUgU0FTIGZpbGVzIG5vdyBoYXZlIHRoZSBzaXplIG9mIDQuMk1CLiBNeSBvdmVyYWxs IHN0cmF0ZWd5IGlzIHRvIGJ1aWxkIGEgcGlwZWxpbmUgdG8gcmVhbGl6ZSBteSBwdXJwb3NlIHN1 Y2ggYXMgYFNBUyAtLSZndDsgUHl0aG9uIC0tJmd0OyBTcGFyayAtLSZndDsgUGFycXVldGAuPC9w PjxwPjxicj48L3A+PHA+PGJyPjwvcD48cD4mbmJzcDsgJm5ic3A7IGltcG9ydCBvczwvcD48cD4m bmJzcDsgJm5ic3A7IHRyeTo8L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IGltcG9y dCBzYXM3YmRhdDwvcD48cD4mbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgaW1wb3J0IHBhbmRh czwvcD48cD4mbmJzcDsgJm5ic3A7IGV4Y2VwdCBJbXBvcnRFcnJvcjo8L3A+PHA+Jm5ic3A7ICZu YnNwOyAmbmJzcDsgJm5ic3A7IHByaW50KCd0cnkgdG8gaW5zdGFsbCB0aGUgcGFja2FncyBmaXJz dCcpPC9wPjxwPiZuYnNwOyAmbmJzcDsmbmJzcDs8L3A+PHA+Jm5ic3A7ICZuYnNwOyBwcmludCgn U3BhcmsgdmVyaW9uIGlzIHt9Jy5mb3JtYXQoc2MudmVyc2lvbikpPC9wPjxwPiZuYnNwOyAmbmJz cDsmbmJzcDs8L3A+PHA+Jm5ic3A7ICZuYnNwOyBpZiB0eXBlKHNxbENvbnRleHQpICE9IHB5c3Bh cmsuc3FsLmNvbnRleHQuSGl2ZUNvbnRleHQ6PC9wPjxwPiZuYnNwOyAmbmJzcDsgJm5ic3A7ICZu YnNwOyBwcmludCgncmVzZXQgdGhlIFNwYXJrIFNRTCBjb250ZXh0Jyk8L3A+PHA+Jm5ic3A7ICZu YnNwOyAmbmJzcDsgJm5ic3A7Jm5ic3A7PC9wPjxwPiZuYnNwOyAmbmJzcDsgb3MuY2hkaXIoJy9y b290L3BsYXlncm91bmQnKTwvcD48cD4mbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsmbmJzcDs8 L3A+PHA+Jm5ic3A7ICZuYnNwOyBkZWYgcHJpbnRfYnl0ZXMoZmlsZW5hbWUpOjwvcD48cD4mbmJz cDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgcHJpbnQoJ3t9IGhhcyB7Oix9IGJ5dGVzJy5mb3JtYXQo ZmlsZW5hbWUsIG9zLnBhdGguZ2V0c2l6ZShmaWxlbmFtZSkpKTwvcD48cD4mbmJzcDsgJm5ic3A7 ICZuYnNwOyAmbmJzcDsmbmJzcDs8L3A+PHA+Jm5ic3A7ICZuYnNwOyBwcmludF9ieXRlcygncHJk c2FsZS5zYXM3YmRhdCcpPC9wPjxwPiZuYnNwOyAmbmJzcDsgcHJpbnRfYnl0ZXMoJ3ByZHNhbDIu c2FzN2JkYXQnKTwvcD48cD4mbmJzcDsgJm5ic3A7IHByaW50X2J5dGVzKCdwcmRzYWwzLnNhczdi ZGF0Jyk8L3A+PHA+Jm5ic3A7ICZuYnNwOyZuYnNwOzwvcD48cD4mbmJzcDsgJm5ic3A7ICFkdSAt Y2ggLS1leGNsdWRlPXRlc3RfcGFycXVldDwvcD48cD48YnI+PC9wPjxwPiZuYnNwOyAmbmJzcDsg 
U3BhcmsgdmVyaW9uIGlzIDEuNC4wPC9wPjxwPiZuYnNwOyAmbmJzcDsgcHJkc2FsZS5zYXM3YmRh dCBoYXMgMTQ4LDQ4MCBieXRlczwvcD48cD4mbmJzcDsgJm5ic3A7IHByZHNhbDIuc2FzN2JkYXQg aGFzIDIsNzkwLDQwMCBieXRlczwvcD48cD4mbmJzcDsgJm5ic3A7IHByZHNhbDMuc2FzN2JkYXQg aGFzIDEsNDAxLDg1NiBieXRlczwvcD48cD4mbmJzcDsgJm5ic3A7IDQuMk08c3BhbiBjbGFzcz0i QXBwbGUtdGFiLXNwYW4iIHN0eWxlPSJ3aGl0ZS1zcGFjZTpwcmUiPgk8L3NwYW4+LjwvcD48cD4m bmJzcDsgJm5ic3A7IDQuMk08c3BhbiBjbGFzcz0iQXBwbGUtdGFiLXNwYW4iIHN0eWxlPSJ3aGl0 ZS1zcGFjZTpwcmUiPgk8L3NwYW4+dG90YWw8L3A+PHA+PGJyPjwvcD48cD48YnI+PC9wPjxwPiMj IyMxLiBUZXN0IERhdGFGcmFtZSBpbiBQeXRob24gYW5kIFNwYXJrPC9wPjxwPjxicj48L3A+PHA+ Rmlyc3QgSSB0cmFuc2Zvcm0gYSBTQVMgYHNhczdiZGF0YCBmaWxlIHRvIGEgcGFuZGFzIERhdGFG cmFtZS4gJm5ic3A7VGhlIGdyZWF0IHRoaW5nIGluIFNwYXJrIGlzIHRoYXQgYSBQeXRob24vcGFu ZGFzIERhdGFGcmFtZSBjb3VsZCBiZSB0cmFuc2xhdGVkIHRvIFNwYXJrIERhdGFGcmFtZSBieSB0 aGUgYGNyZWF0ZURhdGFGcmFtZWAgbWV0aG9kLiBOb3cgSSBoYXZlIHR3byBEYXRhRnJhbWVzOiBv bmUgaXMgYSBwYW5kYXMgRGF0YUZyYW1lIGFuZCB0aGUgb3RoZXIgaXMgYSBTcGFyayBEYXRhRnJh bWUuJm5ic3A7PC9wPjxwPjxicj48L3A+PHA+PGJyPjwvcD48cD4mbmJzcDsgJm5ic3A7IHdpdGgg c2FzN2JkYXQuU0FTN0JEQVQoJ3ByZHNhbGUuc2FzN2JkYXQnKSBhcyBmOjwvcD48cD4mbmJzcDsg Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7cGFuZGFzX2RmID0gZi50b19kYXRhX2ZyYW1lKCk8 L3A+PHA+Jm5ic3A7ICZuYnNwOyBwcmludCgnLS0tLS1EYXRhIGluIFBhbmRhcyBkYXRhZnJhbWUt LS0tLScpPC9wPjxwPiZuYnNwOyAmbmJzcDsgcHJpbnQocGFuZGFzX2RmLmhlYWQoKSk8L3A+PHA+ Jm5ic3A7ICZuYnNwOyZuYnNwOzwvcD48cD4mbmJzcDsgJm5ic3A7IHByaW50KCctLS0tLURhdGEg aW4gU3BhcmsgZGF0YWZyYW1lLS0tLS0nKTwvcD48cD4mbmJzcDsgJm5ic3A7IHNwYXJrX2RmID0g c3FsQ29udGV4dC5jcmVhdGVEYXRhRnJhbWUocGFuZGFzX2RmKTwvcD48cD4mbmJzcDsgJm5ic3A7 IHNwYXJrX2RmLnNob3coNSk8L3A+PHA+PGJyPjwvcD48cD4mbmJzcDsgJm5ic3A7IC0tLS0tRGF0 YSBpbiBQYW5kYXMgZGF0YWZyYW1lLS0tLS08L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5i c3A7QUNUVUFMIENPVU5UUlkgJm5ic3A7IERJVklTSU9OICZuYnNwO01PTlRIICZuYnNwO1BSRURJ Q1QgJm5ic3A7IFBST0RUWVBFIFBST0RVQ1QgJm5ic3A7UVVBUlRFUiAmbmJzcDtcPC9wPjxwPiZu 
YnNwOyAmbmJzcDsgMCAmbmJzcDsgJm5ic3A7IDkyNSAmbmJzcDtDQU5BREEgJm5ic3A7RURVQ0FU SU9OICZuYnNwOzEyMDU0ICZuYnNwOyAmbmJzcDsgJm5ic3A7ODUwICZuYnNwO0ZVUk5JVFVSRSAm bmJzcDsgJm5ic3A7U09GQSAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsxICZuYnNwOyZuYnNw OzwvcD48cD4mbmJzcDsgJm5ic3A7IDEgJm5ic3A7ICZuYnNwOyA5OTkgJm5ic3A7Q0FOQURBICZu YnNwO0VEVUNBVElPTiAmbmJzcDsxMjA4NSAmbmJzcDsgJm5ic3A7ICZuYnNwOzI5NyAmbmJzcDtG VVJOSVRVUkUgJm5ic3A7ICZuYnNwO1NPRkEgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7MSAm bmJzcDsmbmJzcDs8L3A+PHA+Jm5ic3A7ICZuYnNwOyAyICZuYnNwOyAmbmJzcDsgNjA4ICZuYnNw O0NBTkFEQSAmbmJzcDtFRFVDQVRJT04gJm5ic3A7MTIxMTMgJm5ic3A7ICZuYnNwOyAmbmJzcDs4 NDYgJm5ic3A7RlVSTklUVVJFICZuYnNwOyAmbmJzcDtTT0ZBICZuYnNwOyAmbmJzcDsgJm5ic3A7 ICZuYnNwOzEgJm5ic3A7Jm5ic3A7PC9wPjxwPiZuYnNwOyAmbmJzcDsgMyAmbmJzcDsgJm5ic3A7 IDY0MiAmbmJzcDtDQU5BREEgJm5ic3A7RURVQ0FUSU9OICZuYnNwOzEyMTQ0ICZuYnNwOyAmbmJz cDsgJm5ic3A7NTMzICZuYnNwO0ZVUk5JVFVSRSAmbmJzcDsgJm5ic3A7U09GQSAmbmJzcDsgJm5i c3A7ICZuYnNwOyAmbmJzcDsyICZuYnNwOyZuYnNwOzwvcD48cD4mbmJzcDsgJm5ic3A7IDQgJm5i c3A7ICZuYnNwOyA2NTYgJm5ic3A7Q0FOQURBICZuYnNwO0VEVUNBVElPTiAmbmJzcDsxMjE3NCAm bmJzcDsgJm5ic3A7ICZuYnNwOzY0NiAmbmJzcDtGVVJOSVRVUkUgJm5ic3A7ICZuYnNwO1NPRkEg Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7MiAmbmJzcDsmbmJzcDs8L3A+PHA+Jm5ic3A7ICZu YnNwOyZuYnNwOzwvcD48cD4mbmJzcDsgJm5ic3A7ICZuYnNwOyBSRUdJT04gJm5ic3A7WUVBUiAm bmJzcDs8L3A+PHA+Jm5ic3A7ICZuYnNwOyAwICZuYnNwOyBFQVNUICZuYnNwOzE5OTMgJm5ic3A7 PC9wPjxwPiZuYnNwOyAmbmJzcDsgMSAmbmJzcDsgRUFTVCAmbmJzcDsxOTkzICZuYnNwOzwvcD48 cD4mbmJzcDsgJm5ic3A7IDIgJm5ic3A7IEVBU1QgJm5ic3A7MTk5MyAmbmJzcDs8L3A+PHA+Jm5i c3A7ICZuYnNwOyAzICZuYnNwOyBFQVNUICZuYnNwOzE5OTMgJm5ic3A7PC9wPjxwPiZuYnNwOyAm bmJzcDsgNCAmbmJzcDsgRUFTVCAmbmJzcDsxOTkzICZuYnNwOzwvcD48cD4mbmJzcDsgJm5ic3A7 IC0tLS0tRGF0YSBpbiBTcGFyayBkYXRhZnJhbWUtLS0tLTwvcD48cD4mbmJzcDsgJm5ic3A7ICst LS0tLS0rLS0tLS0tLSstLS0tLS0tLS0rLS0tLS0tLSstLS0tLS0tKy0tLS0tLS0tLSstLS0tLS0t Ky0tLS0tLS0rLS0tLS0tKy0tLS0tLSs8L3A+PHA+Jm5ic3A7ICZuYnNwOyB8QUNUVUFMfENPVU5U 
Ull8IERJVklTSU9OfCAmbmJzcDtNT05USHxQUkVESUNUfCBQUk9EVFlQRXxQUk9EVUNUfFFVQVJU RVJ8UkVHSU9OfCAmbmJzcDtZRUFSfDwvcD48cD4mbmJzcDsgJm5ic3A7ICstLS0tLS0rLS0tLS0t LSstLS0tLS0tLS0rLS0tLS0tLSstLS0tLS0tKy0tLS0tLS0tLSstLS0tLS0tKy0tLS0tLS0rLS0t LS0tKy0tLS0tLSs8L3A+PHA+Jm5ic3A7ICZuYnNwOyB8IDkyNS4wfCBDQU5BREF8RURVQ0FUSU9O fDEyMDU0LjB8ICZuYnNwOzg1MC4wfEZVUk5JVFVSRXwgJm5ic3A7IFNPRkF8ICZuYnNwOyAmbmJz cDsxLjB8ICZuYnNwO0VBU1R8MTk5My4wfDwvcD48cD4mbmJzcDsgJm5ic3A7IHwgOTk5LjB8IENB TkFEQXxFRFVDQVRJT058MTIwODUuMHwgJm5ic3A7Mjk3LjB8RlVSTklUVVJFfCAmbmJzcDsgU09G QXwgJm5ic3A7ICZuYnNwOzEuMHwgJm5ic3A7RUFTVHwxOTkzLjB8PC9wPjxwPiZuYnNwOyAmbmJz cDsgfCA2MDguMHwgQ0FOQURBfEVEVUNBVElPTnwxMjExMy4wfCAmbmJzcDs4NDYuMHxGVVJOSVRV UkV8ICZuYnNwOyBTT0ZBfCAmbmJzcDsgJm5ic3A7MS4wfCAmbmJzcDtFQVNUfDE5OTMuMHw8L3A+ PHA+Jm5ic3A7ICZuYnNwOyB8IDY0Mi4wfCBDQU5BREF8RURVQ0FUSU9OfDEyMTQ0LjB8ICZuYnNw OzUzMy4wfEZVUk5JVFVSRXwgJm5ic3A7IFNPRkF8ICZuYnNwOyAmbmJzcDsyLjB8ICZuYnNwO0VB U1R8MTk5My4wfDwvcD48cD4mbmJzcDsgJm5ic3A7IHwgNjU2LjB8IENBTkFEQXxFRFVDQVRJT058 MTIxNzQuMHwgJm5ic3A7NjQ2LjB8RlVSTklUVVJFfCAmbmJzcDsgU09GQXwgJm5ic3A7ICZuYnNw OzIuMHwgJm5ic3A7RUFTVHwxOTkzLjB8PC9wPjxwPiZuYnNwOyAmbmJzcDsgKy0tLS0tLSstLS0t LS0tKy0tLS0tLS0tLSstLS0tLS0tKy0tLS0tLS0rLS0tLS0tLS0tKy0tLS0tLS0rLS0tLS0tLSst LS0tLS0rLS0tLS0tKzwvcD48cD4mbmJzcDsgJm5ic3A7Jm5ic3A7PC9wPjxwPjxicj48L3A+PHA+ PGJyPjwvcD48cD5UaGUgdHdvIHNob3VsZCBiZSB0aGUgaWRlbnRpY2FsIGxlbmd0aC4gSGVyZSBi b3RoIHNob3cgMSw0NDAgcm93cy4mbmJzcDs8L3A+PHA+PGJyPjwvcD48cD48YnI+PC9wPjxwPiZu YnNwOyAmbmJzcDsgcHJpbnQobGVuKHBhbmRhc19kZikpPC9wPjxwPiZuYnNwOyAmbmJzcDsgcHJp bnQoc3BhcmtfZGYuY291bnQoKSk8L3A+PHA+PGJyPjwvcD48cD4mbmJzcDsgJm5ic3A7IDE0NDA8 L3A+PHA+Jm5ic3A7ICZuYnNwOyAxNDQwPC9wPjxwPjxicj48L3A+PHA+PGJyPjwvcD48cD4jIyMj Mi4gQXV0b21hdGUgJm5ic3A7dGhlIHRyYW5zZm9ybWF0aW9uPC9wPjxwPjxicj48L3A+PHA+SSB3 cml0ZSBhIHBpcGVsaW5lIGZ1bmN0aW9uIHRvIGF1dG9tYXRlIHRoZSB0cmFuc2Zvcm1hdGlvbi4g QXMgdGhlIHJlc3VsdCwgdGhlIGFsbCB0aHJlZSBTQVMgZmlsZXMgYXJlIHNhdmVkIHRvIHRoZSBz 
YW1lIGRpcmVjdG9yeSBhcyBQYXJxdWV0IGZvcm1hdC48L3A+PHA+PGJyPjwvcD48cD48YnI+PC9w PjxwPiZuYnNwOyAmbmJzcDsgZGVmIHNhc190b19wYXJxdWV0KGZpbGVsaXN0LCBkZXN0aW5hdGlv bik6PC9wPjxwPiZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAiIiJTYXZlIFNBUyBmaWxlIHRv IHBhcnF1ZXQ8L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IEFyZ3M6PC9wPjxwPiZu YnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IGZpbGVsaXN0IChsaXN0KTog dGhlIGxpc3Qgb2Ygc2FzIGZpbGUgbmFtZXM8L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5i c3A7ICZuYnNwOyAmbmJzcDsgZGVzdGluYXRpb24gKHN0cik6IHRoZSBwYXRoIGZvciBwYXJxdWV0 PC9wPjxwPiZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyBSZXR1cm5zOjwvcD48cD4mbmJzcDsg Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyBOb25lPC9wPjxwPiZuYnNwOyAmbmJz cDsgJm5ic3A7ICZuYnNwOyAiIiI8L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IHJv d3MgPSAwPC9wPjxwPiZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyBmb3IgaSwgZmlsZW5hbWUg aW4gZW51bWVyYXRlKGZpbGVsaXN0KTo8L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7 ICZuYnNwOyAmbmJzcDsgd2l0aCBzYXM3YmRhdC5TQVM3QkRBVChmaWxlbmFtZSkgYXMgZjo8L3A+ PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNw OyBwYW5kYXNfZGYgPSBmLnRvX2RhdGFfZnJhbWUoKTwvcD48cD4mbmJzcDsgJm5ic3A7ICZuYnNw OyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IHJvd3MgKz0gbGVuKHBhbmRhc19k Zik8L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgc3Bhcmtf ZGYgPSBzcWxDb250ZXh0LmNyZWF0ZURhdGFGcmFtZShwYW5kYXNfZGYpPC9wPjxwPiZuYnNwOyAm bmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IHNwYXJrX2RmLnNhdmUoInswfS9rZXk9 ezF9Ii5mb3JtYXQoZGVzdGluYXRpb24sIGkpLCAicGFycXVldCIpPC9wPjxwPiZuYnNwOyAmbmJz cDsgJm5ic3A7ICZuYnNwOyBwcmludCgnezB9IHJvd3MgaGF2ZSBiZWVuIHRyYW5zZm9ybWVkJy5m b3JtYXQocm93cykpPC9wPjxwPiZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyZuYnNwOzwvcD48 cD4mbmJzcDsgJm5ic3A7IHNhc2ZpbGVzID0gW3ggZm9yIHggaW4gb3MubGlzdGRpcignLicpIGlm IHhbLTk6XSA9PSAnLnNhczdiZGF0J108L3A+PHA+Jm5ic3A7ICZuYnNwOyBwcmludChzYXNmaWxl cyk8L3A+PHA+Jm5ic3A7ICZuYnNwOyZuYnNwOzwvcD48cD4mbmJzcDsgJm5ic3A7IHNhc190b19w 
YXJxdWV0KHNhc2ZpbGVzLCAnL3Jvb3QvcGxheWdyb3VuZC90ZXN0X3BhcnF1ZXQnKTwvcD48cD48 YnI+PC9wPjxwPjxicj48L3A+PHA+Jm5ic3A7ICZuYnNwOyBbJ3ByZHNhbGUuc2FzN2JkYXQnLCAn cHJkc2FsMi5zYXM3YmRhdCcsICdwcmRzYWwzLnNhczdiZGF0J108L3A+PHA+Jm5ic3A7ICZuYnNw OyAzNjAwMCByb3dzIGhhcyBiZWVuIHRyYW5zZm9ybWVkPC9wPjxwPjxicj48L3A+PHA+PGJyPjwv cD48cD5UaGVuIEkgcmVhZCBmcm9tIHRoZSBuZXdseSBjcmVhdGVkIFBhcnF1ZXQgZGF0YSBzdG9y ZS4gVGhlIHF1ZXJ5IHNob3dzIHRoYXQgdGhlIGRhdGEgaGFzIGJlZW4gc3VjY2Vzc2Z1bGx5IHNh dmVkLjwvcD48cD48YnI+PC9wPjxwPjxicj48L3A+PHA+Jm5ic3A7ICZuYnNwOyBkZiA9IHNxbENv bnRleHQubG9hZCgiL3Jvb3QvcGxheWdyb3VuZC90ZXN0X3BhcnF1ZXQiLCAicGFycXVldCIpPC9w PjxwPiZuYnNwOyAmbmJzcDsgcHJpbnQoZGYuY291bnQoKSk8L3A+PHA+Jm5ic3A7ICZuYnNwOyBk Zi5maWx0ZXIoZGYua2V5ID09IDApLnNob3coNSk8L3A+PHA+PGJyPjwvcD48cD4mbmJzcDsgJm5i c3A7IDM2MDAwPC9wPjxwPiZuYnNwOyAmbmJzcDsgKy0tLS0tLSstLS0tLS0tKy0tLS0tLSstLS0t Ky0tLS0tLS0rLS0tLS0tLSstLS0tLS0tLS0rLS0tLS0tLSstLS0tLS0tKy0tLS0tKy0tLS0tLSst LS0tLSstLS0tLS0tLS0rLS0tLS0tKy0tLSs8L3A+PHA+Jm5ic3A7ICZuYnNwOyB8QUNUVUFMfENP VU5UUll8Q09VTlRZfERBVEV8ICZuYnNwO01PTlRIfFBSRURJQ1R8IFBST0RUWVBFfFBST0RVQ1R8 UVVBUlRFUnxTVEFURXwgJm5ic3A7WUVBUnxNT05ZUnwgRElWSVNJT058UkVHSU9OfGtleXw8L3A+ PHA+Jm5ic3A7ICZuYnNwOyArLS0tLS0tKy0tLS0tLS0rLS0tLS0tKy0tLS0rLS0tLS0tLSstLS0t LS0tKy0tLS0tLS0tLSstLS0tLS0tKy0tLS0tLS0rLS0tLS0rLS0tLS0tKy0tLS0tKy0tLS0tLS0t LSstLS0tLS0rLS0tKzwvcD48cD4mbmJzcDsgJm5ic3A7IHwgOTI1LjB8IENBTkFEQXwgJm5ic3A7 bnVsbHxudWxsfDEyMDU0LjB8ICZuYnNwOzg1MC4wfEZVUk5JVFVSRXwgJm5ic3A7IFNPRkF8ICZu YnNwOyAmbmJzcDsxLjB8IG51bGx8MTk5My4wfCBudWxsfEVEVUNBVElPTnwgJm5ic3A7RUFTVHwg Jm5ic3A7MHw8L3A+PHA+Jm5ic3A7ICZuYnNwOyB8IDk5OS4wfCBDQU5BREF8ICZuYnNwO251bGx8 bnVsbHwxMjA4NS4wfCAmbmJzcDsyOTcuMHxGVVJOSVRVUkV8ICZuYnNwOyBTT0ZBfCAmbmJzcDsg Jm5ic3A7MS4wfCBudWxsfDE5OTMuMHwgbnVsbHxFRFVDQVRJT058ICZuYnNwO0VBU1R8ICZuYnNw OzB8PC9wPjxwPiZuYnNwOyAmbmJzcDsgfCA2MDguMHwgQ0FOQURBfCAmbmJzcDtudWxsfG51bGx8 MTIxMTMuMHwgJm5ic3A7ODQ2LjB8RlVSTklUVVJFfCAmbmJzcDsgU09GQXwgJm5ic3A7ICZuYnNw 
OzEuMHwgbnVsbHwxOTkzLjB8IG51bGx8RURVQ0FUSU9OfCAmbmJzcDtFQVNUfCAmbmJzcDswfDwv cD48cD4mbmJzcDsgJm5ic3A7IHwgNjQyLjB8IENBTkFEQXwgJm5ic3A7bnVsbHxudWxsfDEyMTQ0 LjB8ICZuYnNwOzUzMy4wfEZVUk5JVFVSRXwgJm5ic3A7IFNPRkF8ICZuYnNwOyAmbmJzcDsyLjB8 IG51bGx8MTk5My4wfCBudWxsfEVEVUNBVElPTnwgJm5ic3A7RUFTVHwgJm5ic3A7MHw8L3A+PHA+ Jm5ic3A7ICZuYnNwOyB8IDY1Ni4wfCBDQU5BREF8ICZuYnNwO251bGx8bnVsbHwxMjE3NC4wfCAm bmJzcDs2NDYuMHxGVVJOSVRVUkV8ICZuYnNwOyBTT0ZBfCAmbmJzcDsgJm5ic3A7Mi4wfCBudWxs fDE5OTMuMHwgbnVsbHxFRFVDQVRJT058ICZuYnNwO0VBU1R8ICZuYnNwOzB8PC9wPjxwPiZuYnNw OyAmbmJzcDsgKy0tLS0tLSstLS0tLS0tKy0tLS0tLSstLS0tKy0tLS0tLS0rLS0tLS0tLSstLS0t LS0tLS0rLS0tLS0tLSstLS0tLS0tKy0tLS0tKy0tLS0tLSstLS0tLSstLS0tLS0tLS0rLS0tLS0t Ky0tLSs8L3A+PHA+Jm5ic3A7ICZuYnNwOyZuYnNwOzwvcD48cD48YnI+PC9wPjxwPjxicj48L3A+ PHA+IyMjIzMuIENvbmNsdXNpb248L3A+PHA+PGJyPjwvcD48cD5UaGVyZSBhcmUgbXVsdGlwbGUg YWR2YW50YWdlcyB0byB0cmFuZm9ybSBkYXRhIGZyb20gdmFyaW91cyBzb3VyY2VzIHRvIFBhcnF1 ZXQuPC9wPjxwPjxicj48L3A+PHA+MS4gSXQgaXMgYW4gb3BlbiBmb3JtYXQgdGhhdCBjb3VsZCBi ZSByZWFkIGFuZCB3cml0dGVuIGJ5IG1ham9yIHNvZnR3YXJlcy4mbmJzcDs8L3A+PHA+Mi4gSXQg Y291bGQgYmUgd2VsbCBkaXN0cmlidXRlZCB0byBIREZTLiZuYnNwOzwvcD48cD4zLiBJdCBjb21w cmVzc2VzIGRhdGEuJm5ic3A7PC9wPjxwPjxicj48L3A+PHA+Rm9yIGV4YW1wbGUsIHRoZSBvcmln aW5hbCBTQVMgZmlsZXMgYWRkIHVwIHRvIDQuMiBtZWdhYnl0ZS4gTm93IGFzIFBhcnF1ZXQsIGl0 IG9ubHkgd2VpZ2hzIDI5MktCIGFuZCBhY2hpZXZlcyAxNFggY29tcHJlc3Npb24gcmF0aW8uJm5i c3A7PC9wPjxwPjxicj48L3A+PHA+PGJyPjwvcD48cD4mbmJzcDsgJm5ic3A7IG9zLmNoZGlyKCcv cm9vdC9wbGF5Z3JvdW5kL3Rlc3RfcGFycXVldC8nKTwvcD48cD4mbmJzcDsgJm5ic3A7ICFkdSAt YWhjJm5ic3A7PC9wPjxwPjxicj48L3A+PHA+Jm5ic3A7ICZuYnNwOyA0LjBLPHNwYW4gY2xhc3M9 IkFwcGxlLXRhYi1zcGFuIiBzdHlsZT0id2hpdGUtc3BhY2U6cHJlIj4JPC9zcGFuPi4va2V5PTIv Ll9tZXRhZGF0YS5jcmM8L3A+PHA+Jm5ic3A7ICZuYnNwOyA0LjBLPHNwYW4gY2xhc3M9IkFwcGxl LXRhYi1zcGFuIiBzdHlsZT0id2hpdGUtc3BhY2U6cHJlIj4JPC9zcGFuPi4va2V5PTIvLl9TVUND RVNTLmNyYzwvcD48cD4mbmJzcDsgJm5ic3A7IDA8c3BhbiBjbGFzcz0iQXBwbGUtdGFiLXNwYW4i 
IHN0eWxlPSJ3aGl0ZS1zcGFjZTpwcmUiPgk8L3NwYW4+Li9rZXk9Mi9fU1VDQ0VTUzwvcD48cD4m bmJzcDsgJm5ic3A7IDQuMEs8c3BhbiBjbGFzcz0iQXBwbGUtdGFiLXNwYW4iIHN0eWxlPSJ3aGl0 ZS1zcGFjZTpwcmUiPgk8L3NwYW4+Li9rZXk9Mi9fY29tbW9uX21ldGFkYXRhPC9wPjxwPiZuYnNw OyAmbmJzcDsgNC4wSzxzcGFuIGNsYXNzPSJBcHBsZS10YWItc3BhbiIgc3R5bGU9IndoaXRlLXNw YWNlOnByZSI+CTwvc3Bhbj4uL2tleT0yLy5wYXJ0LXItMDAwMDEuZ3oucGFycXVldC5jcmM8L3A+ PHA+Jm5ic3A7ICZuYnNwOyA0LjBLPHNwYW4gY2xhc3M9IkFwcGxlLXRhYi1zcGFuIiBzdHlsZT0i d2hpdGUtc3BhY2U6cHJlIj4JPC9zcGFuPi4va2V5PTIvLl9jb21tb25fbWV0YWRhdGEuY3JjPC9w PjxwPiZuYnNwOyAmbmJzcDsgNC4wSzxzcGFuIGNsYXNzPSJBcHBsZS10YWItc3BhbiIgc3R5bGU9 IndoaXRlLXNwYWNlOnByZSI+CTwvc3Bhbj4uL2tleT0yL19tZXRhZGF0YTwvcD48cD4mbmJzcDsg Jm5ic3A7IDYwSzxzcGFuIGNsYXNzPSJBcHBsZS10YWItc3BhbiIgc3R5bGU9IndoaXRlLXNwYWNl OnByZSI+CTwvc3Bhbj4uL2tleT0yL3BhcnQtci0wMDAwMS5nei5wYXJxdWV0PC9wPjxwPiZuYnNw OyAmbmJzcDsgODhLPHNwYW4gY2xhc3M9IkFwcGxlLXRhYi1zcGFuIiBzdHlsZT0id2hpdGUtc3Bh Y2U6cHJlIj4JPC9zcGFuPi4va2V5PTI8L3A+PHA+Jm5ic3A7ICZuYnNwOyA0LjBLPHNwYW4gY2xh c3M9IkFwcGxlLXRhYi1zcGFuIiBzdHlsZT0id2hpdGUtc3BhY2U6cHJlIj4JPC9zcGFuPi4va2V5 PTAvLl9tZXRhZGF0YS5jcmM8L3A+PHA+Jm5ic3A7ICZuYnNwOyA0LjBLPHNwYW4gY2xhc3M9IkFw cGxlLXRhYi1zcGFuIiBzdHlsZT0id2hpdGUtc3BhY2U6cHJlIj4JPC9zcGFuPi4va2V5PTAvLl9T VUNDRVNTLmNyYzwvcD48cD4mbmJzcDsgJm5ic3A7IDA8c3BhbiBjbGFzcz0iQXBwbGUtdGFiLXNw YW4iIHN0eWxlPSJ3aGl0ZS1zcGFjZTpwcmUiPgk8L3NwYW4+Li9rZXk9MC9fU1VDQ0VTUzwvcD48 cD4mbmJzcDsgJm5ic3A7IDQuMEs8c3BhbiBjbGFzcz0iQXBwbGUtdGFiLXNwYW4iIHN0eWxlPSJ3 aGl0ZS1zcGFjZTpwcmUiPgk8L3NwYW4+Li9rZXk9MC9fY29tbW9uX21ldGFkYXRhPC9wPjxwPiZu YnNwOyAmbmJzcDsgNC4wSzxzcGFuIGNsYXNzPSJBcHBsZS10YWItc3BhbiIgc3R5bGU9IndoaXRl LXNwYWNlOnByZSI+CTwvc3Bhbj4uL2tleT0wLy5wYXJ0LXItMDAwMDEuZ3oucGFycXVldC5jcmM8 L3A+PHA+Jm5ic3A7ICZuYnNwOyA0LjBLPHNwYW4gY2xhc3M9IkFwcGxlLXRhYi1zcGFuIiBzdHls ZT0id2hpdGUtc3BhY2U6cHJlIj4JPC9zcGFuPi4va2V5PTAvLl9jb21tb25fbWV0YWRhdGEuY3Jj PC9wPjxwPiZuYnNwOyAmbmJzcDsgNC4wSzxzcGFuIGNsYXNzPSJBcHBsZS10YWItc3BhbiIgc3R5 
bGU9IndoaXRlLXNwYWNlOnByZSI+CTwvc3Bhbj4uL2tleT0wL19tZXRhZGF0YTwvcD48cD4mbmJz cDsgJm5ic3A7IDEySzxzcGFuIGNsYXNzPSJBcHBsZS10YWItc3BhbiIgc3R5bGU9IndoaXRlLXNw YWNlOnByZSI+CTwvc3Bhbj4uL2tleT0wL3BhcnQtci0wMDAwMS5nei5wYXJxdWV0PC9wPjxwPiZu YnNwOyAmbmJzcDsgNDBLPHNwYW4gY2xhc3M9IkFwcGxlLXRhYi1zcGFuIiBzdHlsZT0id2hpdGUt c3BhY2U6cHJlIj4JPC9zcGFuPi4va2V5PTA8L3A+PHA+Jm5ic3A7ICZuYnNwOyA0LjBLPHNwYW4g Y2xhc3M9IkFwcGxlLXRhYi1zcGFuIiBzdHlsZT0id2hpdGUtc3BhY2U6cHJlIj4JPC9zcGFuPi4v a2V5PTEvLl9tZXRhZGF0YS5jcmM8L3A+PHA+Jm5ic3A7ICZuYnNwOyA0LjBLPHNwYW4gY2xhc3M9 IkFwcGxlLXRhYi1zcGFuIiBzdHlsZT0id2hpdGUtc3BhY2U6cHJlIj4JPC9zcGFuPi4va2V5PTEv Ll9TVUNDRVNTLmNyYzwvcD48cD4mbmJzcDsgJm5ic3A7IDA8c3BhbiBjbGFzcz0iQXBwbGUtdGFi LXNwYW4iIHN0eWxlPSJ3aGl0ZS1zcGFjZTpwcmUiPgk8L3NwYW4+Li9rZXk9MS9fU1VDQ0VTUzwv cD48cD4mbmJzcDsgJm5ic3A7IDQuMEs8c3BhbiBjbGFzcz0iQXBwbGUtdGFiLXNwYW4iIHN0eWxl PSJ3aGl0ZS1zcGFjZTpwcmUiPgk8L3NwYW4+Li9rZXk9MS9fY29tbW9uX21ldGFkYXRhPC9wPjxw PiZuYnNwOyAmbmJzcDsgNC4wSzxzcGFuIGNsYXNzPSJBcHBsZS10YWItc3BhbiIgc3R5bGU9Indo aXRlLXNwYWNlOnByZSI+CTwvc3Bhbj4uL2tleT0xLy5wYXJ0LXItMDAwMDEuZ3oucGFycXVldC5j cmM8L3A+PHA+Jm5ic3A7ICZuYnNwOyA0LjBLPHNwYW4gY2xhc3M9IkFwcGxlLXRhYi1zcGFuIiBz dHlsZT0id2hpdGUtc3BhY2U6cHJlIj4JPC9zcGFuPi4va2V5PTEvLl9jb21tb25fbWV0YWRhdGEu Y3JjPC9wPjxwPiZuYnNwOyAmbmJzcDsgNC4wSzxzcGFuIGNsYXNzPSJBcHBsZS10YWItc3BhbiIg c3R5bGU9IndoaXRlLXNwYWNlOnByZSI+CTwvc3Bhbj4uL2tleT0xL19tZXRhZGF0YTwvcD48cD4m bmJzcDsgJm5ic3A7IDEzMks8c3BhbiBjbGFzcz0iQXBwbGUtdGFiLXNwYW4iIHN0eWxlPSJ3aGl0 ZS1zcGFjZTpwcmUiPgk8L3NwYW4+Li9rZXk9MS9wYXJ0LXItMDAwMDEuZ3oucGFycXVldDwvcD48 cD4mbmJzcDsgJm5ic3A7IDE2MEs8c3BhbiBjbGFzcz0iQXBwbGUtdGFiLXNwYW4iIHN0eWxlPSJ3 aGl0ZS1zcGFjZTpwcmUiPgk8L3NwYW4+Li9rZXk9MTwvcD48cD4mbmJzcDsgJm5ic3A7IDI5Mks8 c3BhbiBjbGFzcz0iQXBwbGUtdGFiLXNwYW4iIHN0eWxlPSJ3aGl0ZS1zcGFjZTpwcmUiPgk8L3Nw YW4+LjwvcD48cD4mbmJzcDsgJm5ic3A7IDI5Mks8c3BhbiBjbGFzcz0iQXBwbGUtdGFiLXNwYW4i IHN0eWxlPSJ3aGl0ZS1zcGFjZTpwcmUiPgk8L3NwYW4+dG90YWw8L3A+PHA+PGJyPjwvcD48cD48 
YnI+PC9wPjxwPkEgYmFyIHBsb3QgdmlzdWFsaXplcyB0aGUgc2lnbmlmY2FudCBzaXplIGRpZmZl cmVuY2UgYmV0d2VlbiB0aGUgdHdvIGZvcm1hdHMuIEl0IHNob3dzIGFuIG9yZGVyIG9mIG1hZ25p dHVkZSBzcGFjZSBkZWR1Y3Rpb24uJm5ic3A7PC9wPjxwPjxicj48L3A+PHA+PGJyPjwvcD48cD4m bmJzcDsgJm5ic3A7ICVtYXRwbG90bGliIGlubGluZTwvcD48cD4mbmJzcDsgJm5ic3A7IGltcG9y dCBtYXRwbG90bGliLnB5cGxvdCBhcyBwbHQ8L3A+PHA+Jm5ic3A7ICZuYnNwOyBpbXBvcnQgbnVt cHkgYXMgbnA8L3A+PHA+Jm5ic3A7ICZuYnNwOyBpbmRleCA9IG5wLmFyYW5nZSgyKTwvcD48cD4m bmJzcDsgJm5ic3A7IGJhcl93aWR0aCA9IDAuMzU8L3A+PHA+Jm5ic3A7ICZuYnNwOyBkYXRhID0g WzQyMDAsIDI5Ml08L3A+PHA+Jm5ic3A7ICZuYnNwOyBoZWFkZXIgPSBbJ1NBUyBmaWxlcycsICdQ YXJxdWV0J108L3A+PHA+Jm5ic3A7ICZuYnNwOyBwbHQuYmFyKGluZGV4LCBkYXRhKTwvcD48cD4m bmJzcDsgJm5ic3A7IHBsdC5ncmlkKGI9VHJ1ZSwgd2hpY2g9J21ham9yJywgYXhpcz0neScpPC9w PjxwPiZuYnNwOyAmbmJzcDsgcGx0LnlsYWJlbCgnRmlsZSBTaXplIGJ5IEtCJyk8L3A+PHA+Jm5i c3A7ICZuYnNwOyBwbHQueHRpY2tzKGluZGV4ICsgYmFyX3dpZHRoLCBoZWFkZXIpPC9wPjxwPiZu YnNwOyAmbmJzcDsgcGx0LnNob3coKTwvcD48cD48YnI+PC9wPjxwPjxicj48L3A+PHAgY2xhc3M9 InNlcGFyYXRvciIgc3R5bGU9InRleHQtYWxpZ246IGNlbnRlcjsgY2xlYXI6IGJvdGg7Ij48YSBp bWFnZWFuY2hvcj0iMSIgaHJlZj0iaHR0cDovLzMuYnAuYmxvZ3Nwb3QuY29tLy0tbEN4TUVEOEhJ by9WYWtzUFdkekx2SS9BQUFBQUFBQUVQWS9KS1FfTkFWVEdIby9zMTYwMC9vdXRwdXRfMTNfMC5w bmciIHN0eWxlPSJtYXJnaW4tbGVmdDogMWVtOyBtYXJnaW4tcmlnaHQ6IDFlbTsiPjxpbWcgc3Jj PSJodHRwczovLzMuYnAuYmxvZ3Nwb3QuY29tLy0tbEN4TUVEOEhJby9WYWtzUFdkekx2SS9BQUFB QUFBQUVQWS9KS1FfTkFWVEdIby9zMzIwL291dHB1dF8xM18wLnBuZyIgYm9yZGVyPSIwIiBzdHls ZT0iIiB3aWR0aD0iMzIwIiBoZWlnaHQ9IjIwOCI+PC9hPjwvcD48cD48YnI+PC9wPjxwPjxicj48 L3A+&quot;&gt;​&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/8898042550578692558/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=8898042550578692558' title='13 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/3256159328630041416/posts/default/8898042550578692558'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/8898042550578692558'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2015/07/transform-sas-files-to-parquet-through.html' title='Transform SAS files to Parquet through Spark'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/--lCxMED8HIo/VaksPWdzLvI/AAAAAAAAEPY/JKQ_NAVTGHo/s72-c/output_13_0.png" height="72" width="72"/><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-3945689256495065527</id><published>2015-06-19T10:43:00.000-05:00</published><updated>2015-06-19T13:39:27.436-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="python"/><category scheme="http://www.blogger.com/atom/ns#" term="SAS"/><title type='text'>Two alternative ways to query large dataset in SAS</title><content type='html'>&lt;h3 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1.5em; line-height: 1.43; margin-bottom: 16px; margin-top: 0px !important; position: relative;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; font-size: 16px; font-weight: normal; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;I really appreciate those wonderful comments on my SAS posts by the readers (&lt;a 
href=&quot;http://www.sasanalysis.com/2015/02/solve-top-n-questions-in-sassql_3.html&quot; style=&quot;box-sizing: border-box; color: #4078c0; text-decoration: none;&quot;&gt;1&lt;/a&gt;,&amp;nbsp;&lt;a href=&quot;http://www.sasanalysis.com/2012/05/top-10-tips-and-tricks-about-proc-sql.html&quot; style=&quot;box-sizing: border-box; color: #4078c0; text-decoration: none;&quot;&gt;2&lt;/a&gt;,&amp;nbsp;&lt;a href=&quot;http://www.sasanalysis.com/2011/01/top-10-most-powerful-functions-for-proc.html&quot; style=&quot;box-sizing: border-box; color: #4078c0; text-decoration: none;&quot;&gt;3&lt;/a&gt;). They gave me a lot of inspiration. Because of the inherent limitations of SAS and SQL, I have recently struggled to handle some extremely large SAS datasets, even after exhausting all the traditional approaches. As a follow-up to the comments, here I summarize two alternative solutions for these extreme cases.&lt;/div&gt;&lt;ol style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px; margin-top: 0px; padding: 0px 0px 0px 2em;&quot;&gt;&lt;li style=&quot;box-sizing: border-box;&quot;&gt;Read Directly&lt;ul style=&quot;box-sizing: border-box; margin-bottom: 0px; margin-top: 0px; padding: 0px 0px 0px 2em;&quot;&gt;&lt;li style=&quot;box-sizing: border-box;&quot;&gt;Use a scripting language such as Python to read SAS datasets directly&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li style=&quot;box-sizing: border-box;&quot;&gt;Code Generator&lt;ul style=&quot;box-sizing: border-box; margin-bottom: 0px; margin-top: 0px; padding: 0px 0px 0px 2em;&quot;&gt;&lt;li style=&quot;box-sizing: border-box;&quot;&gt;Use SAS or other scripting languages to generate SAS/SQL code&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, 
freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;The examples still use&amp;nbsp;&lt;code style=&quot;background-color: rgba(0, 0, 0, 0.0392157); border-radius: 3px; box-sizing: border-box; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; margin: 0px; padding: 0.2em 0px;&quot;&gt;sashelp.class&lt;/code&gt;, which has 19 rows. The target variable is&amp;nbsp;&lt;code style=&quot;background-color: rgba(0, 0, 0, 0.0392157); border-radius: 3px; box-sizing: border-box; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; margin: 0px; padding: 0.2em 0px;&quot;&gt;weight&lt;/code&gt;.&lt;/div&gt;&lt;pre style=&quot;background-color: #f7f7f7; border-radius: 3px; box-sizing: border-box; color: #333333; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; font-stretch: normal; line-height: 1.45; margin-bottom: 16px; overflow: auto; padding: 16px; word-wrap: normal;&quot;&gt;&lt;code style=&quot;background: transparent; border-radius: 3px; border: 0px; box-sizing: border-box; display: inline; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; line-height: inherit; margin: 0px; max-width: initial; overflow: initial; padding: 0px; word-break: normal; word-wrap: normal;&quot;&gt;*In SAS&lt;br /&gt;data class;&lt;br /&gt;    set sashelp.class;&lt;br /&gt;    row = _n_;&lt;br /&gt;run;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/h3&gt;&lt;h4 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1.25em; line-height: 1.4; margin-bottom: 16px; margin-top: 1em; position: relative;&quot;&gt;&lt;a aria-hidden=&quot;true&quot; class=&quot;anchor&quot; 
href=&quot;https://github.com/dapangmao/Blog/blob/master/Two%20alternative%20ways%20to%20query%20large%20dataset%20in%20SAS.md#example-1-find-the-median&quot; id=&quot;user-content-example-1-find-the-median&quot; style=&quot;box-sizing: border-box; color: #4078c0; display: block; left: 0px; line-height: 1.2; margin-left: -30px; padding-left: 30px; padding-right: 6px; position: absolute; text-decoration: none; top: 0px;&quot;&gt;&lt;/a&gt;Example 1: Find the median&lt;/h4&gt;&lt;h5 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1em; line-height: 1.4; margin-bottom: 16px; margin-top: 1em; position: relative;&quot;&gt;&lt;a aria-hidden=&quot;true&quot; class=&quot;anchor&quot; href=&quot;https://github.com/dapangmao/Blog/blob/master/Two%20alternative%20ways%20to%20query%20large%20dataset%20in%20SAS.md#sql-query&quot; id=&quot;user-content-sql-query&quot; style=&quot;box-sizing: border-box; color: #4078c0; display: block; left: 0px; line-height: 1.1; margin-left: -30px; padding-left: 30px; padding-right: 6px; position: absolute; text-decoration: none; top: 0px;&quot;&gt;&lt;/a&gt;SQL Query&lt;/h5&gt;&lt;h3 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1.5em; line-height: 1.43; margin-bottom: 16px; margin-top: 0px !important; position: relative;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; font-size: 16px; font-weight: normal; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;&lt;a href=&quot;http://www.sasanalysis.com/2012/05/top-10-tips-and-tricks-about-proc-sql.html&quot; style=&quot;box-sizing: 
border-box; color: #4078c0; text-decoration: none;&quot;&gt;In the comment&lt;/a&gt;, Anders Sköllermo wrote:&lt;/div&gt;&lt;blockquote style=&quot;border-left-color: rgb(221, 221, 221); border-left-style: solid; border-left-width: 4px; box-sizing: border-box; color: #777777; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin: 0px 0px 16px; padding: 0px 15px;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; margin-bottom: 16px;&quot;&gt;Hi! About 1. Calculate the median of a variable:&lt;/div&gt;&lt;div style=&quot;box-sizing: border-box; margin-bottom: 16px;&quot;&gt;If you look at the details in the SQL code for calculation the median, then you find that the intermediate file is of size N*N obs, where N is the number of obs in the SAS data set.&lt;/div&gt;&lt;div style=&quot;box-sizing: border-box;&quot;&gt;So this is OK for very small files. But for a file with 10000 obs, you have an intermediate file of size 100 million obs. / Br Anders Anders Sköllermo Ph.D., Reuma and Neuro Data Analyst&lt;/div&gt;&lt;/blockquote&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;The SQL query below is simple and uses only standard SQL, so it can be ported to almost any other SQL platform. 
However, as Anders pointed out, it is far too expensive, because the self-join builds an intermediate table of N*N rows.&lt;/div&gt;&lt;pre style=&quot;background-color: #f7f7f7; border-radius: 3px; box-sizing: border-box; color: #333333; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; font-stretch: normal; line-height: 1.45; margin-bottom: 16px; overflow: auto; padding: 16px; word-wrap: normal;&quot;&gt;&lt;code style=&quot;background: transparent; border-radius: 3px; border: 0px; box-sizing: border-box; display: inline; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; line-height: inherit; margin: 0px; max-width: initial; overflow: initial; padding: 0px; word-break: normal; word-wrap: normal;&quot;&gt;*In SAS&lt;br /&gt;proc sql;&lt;br /&gt;   select avg(weight) as Median&lt;br /&gt;   from (select e.weight&lt;br /&gt;       from class e, class d&lt;br /&gt;       group by e.weight&lt;br /&gt;       having sum(case when e.weight = d.weight then 1 else 0 end)&lt;br /&gt;          &amp;gt;= abs(sum(sign(e.weight - d.weight)))&lt;br /&gt;    );&lt;br /&gt;quit;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/h3&gt;&lt;h5 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1em; line-height: 1.4; margin-bottom: 16px; margin-top: 1em; position: relative;&quot;&gt;&lt;a aria-hidden=&quot;true&quot; class=&quot;anchor&quot; href=&quot;https://github.com/dapangmao/Blog/blob/master/Two%20alternative%20ways%20to%20query%20large%20dataset%20in%20SAS.md#proc-univariate&quot; id=&quot;user-content-proc-univariate&quot; style=&quot;box-sizing: border-box; color: #4078c0; display: block; left: 0px; line-height: 1.1; margin-left: -30px; padding-left: 30px; padding-right: 6px; position: absolute; text-decoration: none; top: 0px;&quot;&gt;&lt;/a&gt;PROC UNIVARIATE&lt;/h5&gt;&lt;h3 style=&quot;box-sizing: 
border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1.5em; line-height: 1.43; margin-bottom: 16px; margin-top: 0px !important; position: relative;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; font-size: 16px; font-weight: normal; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;&lt;a href=&quot;http://www.sasanalysis.com/2012/05/top-10-tips-and-tricks-about-proc-sql.html&quot; style=&quot;box-sizing: border-box; color: #4078c0; text-decoration: none;&quot;&gt;In the comment&lt;/a&gt;, Anonymous wrote:&lt;/div&gt;&lt;blockquote style=&quot;border-left-color: rgb(221, 221, 221); border-left-style: solid; border-left-width: 4px; box-sizing: border-box; color: #777777; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin: 0px 0px 16px; padding: 0px 15px;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; margin-bottom: 16px;&quot;&gt;I noticed the same thing - we tried this on one of our &#39;smaller&#39; datasets (~2.9 million records), and it took forever.&lt;/div&gt;&lt;div style=&quot;box-sizing: border-box;&quot;&gt;Excellent solution, but maybe PROC UNIVARIATE will get you there faster on a large dataset.&lt;/div&gt;&lt;/blockquote&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;Indeed, PROC UNIVARIATE is the better way to find the median in SAS, since it relies on SAS&#39;s built-in routines instead of an N*N self-join.&lt;/div&gt;&lt;/div&gt;&lt;/h3&gt;&lt;h6 style=&quot;box-sizing: border-box; color: 
#777777; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1em; line-height: 1.4; margin-bottom: 16px; margin-top: 1em; position: relative;&quot;&gt;&lt;a aria-hidden=&quot;true&quot; class=&quot;anchor&quot; href=&quot;https://github.com/dapangmao/Blog/blob/master/Two%20alternative%20ways%20to%20query%20large%20dataset%20in%20SAS.md#read-directly&quot; id=&quot;user-content-read-directly&quot; style=&quot;box-sizing: border-box; color: #4078c0; display: block; left: 0px; line-height: 1.1; margin-left: -30px; padding-left: 30px; padding-right: 6px; position: absolute; text-decoration: none; top: 0px;&quot;&gt;&lt;/a&gt;Read Directly&lt;/h6&gt;&lt;h3 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1.5em; line-height: 1.43; margin-bottom: 16px; margin-top: 0px !important; position: relative;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; font-size: 16px; font-weight: normal; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;In extreme cases, for example when SAS cannot even open the entire dataset, we may have to stream the sas7bdat file and read it line by line. 
The sas7bdat format has been decoded by open-source readers in&amp;nbsp;&lt;a href=&quot;https://github.com/dapangmao/Blog/blob/master/kasper.eobjects.org/2011/06/sassyReadinger-open-source-Readinger-of-sas.html&quot; style=&quot;box-sizing: border-box; color: #4078c0; text-decoration: none;&quot;&gt;Java&lt;/a&gt;,&amp;nbsp;&lt;a href=&quot;http://cran.r-project.org/web/packages/sas7bdat/index.html&quot; style=&quot;box-sizing: border-box; color: #4078c0; text-decoration: none;&quot;&gt;R&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;a href=&quot;https://pypi.python.org/pypi/sas7bdat&quot; style=&quot;box-sizing: border-box; color: #4078c0; text-decoration: none;&quot;&gt;Python&lt;/a&gt;. In theory, we don&#39;t even need SAS to query a SAS dataset.&lt;/div&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;A&amp;nbsp;&lt;a href=&quot;https://en.wikipedia.org/wiki/Heap_(data_structure)&quot; style=&quot;box-sizing: border-box; color: #4078c0; text-decoration: none;&quot;&gt;heap&lt;/a&gt;&amp;nbsp;is an interesting data structure that always keeps its minimum (or maximum) at the top. As we stream the values, we can build a max heap for the smaller half and a min heap for the larger half in Python, so the median can always be read from the tops of the two heaps. The algorithm resembles heapsort. 
The good news is that it only keeps one variable in memory, which saves a lot of space compared with loading the whole dataset.&lt;/div&gt;&lt;pre style=&quot;background-color: #f7f7f7; border-radius: 3px; box-sizing: border-box; color: #333333; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; font-stretch: normal; line-height: 1.45; margin-bottom: 16px; overflow: auto; padding: 16px; word-wrap: normal;&quot;&gt;&lt;code style=&quot;background: transparent; border-radius: 3px; border: 0px; box-sizing: border-box; display: inline; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; line-height: inherit; margin: 0px; max-width: initial; overflow: initial; padding: 0px; word-break: normal; word-wrap: normal;&quot;&gt;#In Python&lt;br /&gt;import heapq&lt;br /&gt;from sas7bdat import SAS7BDAT&lt;br /&gt;class MedianStream(object):&lt;br /&gt;    def __init__(self):&lt;br /&gt;        self.first_half = [] # max heap (values negated) for the smaller half; may hold one extra element&lt;br /&gt;        self.second_half = [] # min heap for the larger half&lt;br /&gt;        self.N = 0&lt;br /&gt;&lt;br /&gt;    def insert(self, x):&lt;br /&gt;        # push x onto the max heap, then move its largest value to the min heap&lt;br /&gt;        heapq.heappush(self.first_half, -x)&lt;br /&gt;        heapq.heappush(self.second_half, -heapq.heappop(self.first_half))&lt;br /&gt;        # rebalance so that len(first_half) &amp;gt;= len(second_half)&lt;br /&gt;        if len(self.second_half) &amp;gt; len(self.first_half):&lt;br /&gt;            heapq.heappush(self.first_half, -heapq.heappop(self.second_half))&lt;br /&gt;        self.N += 1&lt;br /&gt;&lt;br /&gt;    def show_median(self):&lt;br /&gt;        if self.N == 0:&lt;br /&gt;            raise ValueError(&#39;please use the insert method first&#39;)&lt;br /&gt;        elif self.N % 2 == 0:&lt;br /&gt;            return (-self.first_half[0] + self.second_half[0]) / 2.0&lt;br /&gt;        return -self.first_half[0]&lt;br /&gt;&lt;br /&gt;if __name__ == &quot;__main__&quot;:&lt;br /&gt;    stream = MedianStream()&lt;br /&gt;    with SAS7BDAT(&#39;class.sas7bdat&#39;) as infile:&lt;br /&gt;        for i, line in enumerate(infile):&lt;br /&gt;            if i == 0:&lt;br /&gt;                continue # the first row holds the column names&lt;br /&gt;            stream.insert(float(line[-1])) # assumes weight is the last column&lt;br /&gt;    print(stream.show_median())&lt;br /&gt;&lt;br /&gt;99.5&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/h3&gt;
&lt;h4 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1.25em; line-height: 1.4; margin-bottom: 16px; margin-top: 1em; position: relative;&quot;&gt;&lt;a aria-hidden=&quot;true&quot; class=&quot;anchor&quot; href=&quot;https://github.com/dapangmao/Blog/blob/master/Two%20alternative%20ways%20to%20query%20large%20dataset%20in%20SAS.md#example-2-find-top-k-by-groups&quot; id=&quot;user-content-example-2-find-top-k-by-groups&quot; style=&quot;box-sizing: border-box; color: #4078c0; display: block; left: 0px; line-height: 1.2; margin-left: -30px; padding-left: 30px; padding-right: 6px; position: absolute; text-decoration: none; top: 0px;&quot;&gt;&lt;/a&gt;Example 2: Find top K by groups&lt;/h4&gt;&lt;h5 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1em; line-height: 1.4; margin-bottom: 16px; margin-top: 1em; position: relative;&quot;&gt;&lt;a aria-hidden=&quot;true&quot; class=&quot;anchor&quot; href=&quot;https://github.com/dapangmao/Blog/blob/master/Two%20alternative%20ways%20to%20query%20large%20dataset%20in%20SAS.md#sql-query-1&quot; id=&quot;user-content-sql-query-1&quot; style=&quot;box-sizing: border-box; color: #4078c0; display: block; left: 0px; line-height: 1.1; margin-left: -30px; padding-left: 
30px; padding-right: 6px; position: absolute; text-decoration: none; top: 0px;&quot;&gt;&lt;/a&gt;SQL Query&lt;/h5&gt;&lt;h3 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1.5em; line-height: 1.43; margin-bottom: 16px; margin-top: 0px !important; position: relative;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; font-size: 16px; font-weight: normal; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;The query below is very expensive. Its correlated subquery effectively self-joins the table within each group, which costs O(N^2), and the final sort costs O(N log N), so the total time complexity is a terrible O(N^2 + N log N), dominated by the quadratic term.&lt;/div&gt;&lt;pre style=&quot;background-color: #f7f7f7; border-radius: 3px; box-sizing: border-box; color: #333333; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; font-stretch: normal; line-height: 1.45; margin-bottom: 16px; overflow: auto; padding: 16px; word-wrap: normal;&quot;&gt;&lt;code style=&quot;background: transparent; border-radius: 3px; border: 0px; box-sizing: border-box; display: inline; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; line-height: inherit; margin: 0px; max-width: initial; overflow: initial; padding: 0px; word-break: normal; word-wrap: normal;&quot;&gt;* In SAS&lt;br /&gt;proc sql; &lt;br /&gt;    select a.sex, a.name, a.weight, (select count(distinct b.weight) &lt;br /&gt;            from class as b where b.weight &amp;gt;= a.weight and a.sex = b.sex ) as rank &lt;br /&gt;    from class as a&lt;br /&gt;    where calculated rank &amp;lt;= 3&lt;br /&gt;    order by sex, rank&lt;br /&gt;;quit;&lt;br 
/&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/h3&gt;&lt;h5 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1em; line-height: 1.4; margin-bottom: 16px; margin-top: 1em; position: relative;&quot;&gt;&lt;a aria-hidden=&quot;true&quot; class=&quot;anchor&quot; href=&quot;https://github.com/dapangmao/Blog/blob/master/Two%20alternative%20ways%20to%20query%20large%20dataset%20in%20SAS.md#code-generator&quot; id=&quot;user-content-code-generator&quot; style=&quot;box-sizing: border-box; color: #4078c0; display: block; left: 0px; line-height: 1.1; margin-left: -30px; padding-left: 30px; padding-right: 6px; position: absolute; text-decoration: none; top: 0px;&quot;&gt;&lt;/a&gt;Code Generator&lt;/h5&gt;&lt;h3 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1.5em; line-height: 1.43; margin-bottom: 16px; margin-top: 0px !important; position: relative;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; font-size: 16px; font-weight: normal; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;The overall idea is divide-and-conquer. If we generate SAS code from a scripting language such as Python, we essentially get many small SAS code segments. For example, each generated SQL query below only filters one group and sorts it. 
So the time complexity drops substantially, to O(N log N).&lt;/div&gt;&lt;pre style=&quot;background-color: #f7f7f7; border-radius: 3px; box-sizing: border-box; color: #333333; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; font-stretch: normal; line-height: 1.45; margin-bottom: 16px; overflow: auto; padding: 16px; word-wrap: normal;&quot;&gt;&lt;code style=&quot;background: transparent; border-radius: 3px; border: 0px; box-sizing: border-box; display: inline; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; line-height: inherit; margin: 0px; max-width: initial; overflow: initial; padding: 0px; word-break: normal; word-wrap: normal;&quot;&gt;# In Python&lt;br /&gt;def create_sql(k, candidates):&lt;br /&gt;    # one PROC SQL step per group; OUTOBS= keeps only the first k sorted rows&lt;br /&gt;    template = &quot;&quot;&quot;&lt;br /&gt;    proc sql outobs = {0};&lt;br /&gt;    select *&lt;br /&gt;    from {1}&lt;br /&gt;    where sex = &#39;{2}&#39; &lt;br /&gt;    order by weight desc&lt;br /&gt;    ;&lt;br /&gt;    quit;&quot;&quot;&quot;&lt;br /&gt;    for x in candidates:&lt;br /&gt;        current = template.format(k, &#39;class&#39;, x)&lt;br /&gt;        print(current)&lt;br /&gt;&lt;br /&gt;if __name__ == &quot;__main__&quot;:&lt;br /&gt;    create_sql(3, [&#39;M&#39;, &#39;F&#39;])&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;    proc sql outobs = 3;&lt;br /&gt;    select *&lt;br /&gt;    from class&lt;br /&gt;    where sex = &#39;M&#39; &lt;br /&gt;    order by weight desc&lt;br /&gt;    ;&lt;br /&gt;    quit;&lt;br /&gt;&lt;br /&gt;    proc sql outobs = 3;&lt;br /&gt;    select *&lt;br /&gt;    from class&lt;br /&gt;    where sex = &#39;F&#39; &lt;br /&gt;    order by weight desc&lt;br /&gt;    ;&lt;br /&gt;    quit;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/h3&gt;&lt;h5 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1em; 
line-height: 1.4; margin-bottom: 16px; margin-top: 1em; position: relative;&quot;&gt;&lt;a aria-hidden=&quot;true&quot; class=&quot;anchor&quot; href=&quot;https://github.com/dapangmao/Blog/blob/master/Two%20alternative%20ways%20to%20query%20large%20dataset%20in%20SAS.md#read-directly-1&quot; id=&quot;user-content-read-directly-1&quot; style=&quot;box-sizing: border-box; color: #4078c0; display: block; left: 0px; line-height: 1.1; margin-left: -30px; padding-left: 30px; padding-right: 6px; position: absolute; text-decoration: none; top: 0px;&quot;&gt;&lt;/a&gt;Read Directly&lt;/h5&gt;&lt;h3 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1.5em; line-height: 1.43; margin-bottom: 16px; margin-top: 0px !important; position: relative;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; font-size: 16px; font-weight: normal; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;This time we use a heap again in Python. To find the top k rows for each group, we only need to maintain one min heap of size k per group. Whenever a heap grows beyond k, we pop its smallest value, so in the end each heap holds the top k values of its group. 
The optimized time complexity is O(N log k).&lt;/div&gt;&lt;pre style=&quot;background-color: #f7f7f7; border-radius: 3px; box-sizing: border-box; color: #333333; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; font-stretch: normal; line-height: 1.45; margin-bottom: 16px; overflow: auto; padding: 16px; word-wrap: normal;&quot;&gt;&lt;code style=&quot;background: transparent; border-radius: 3px; border: 0px; box-sizing: border-box; display: inline; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; line-height: inherit; margin: 0px; max-width: initial; overflow: initial; padding: 0px; word-break: normal; word-wrap: normal;&quot;&gt;#In Python&lt;br /&gt;from sas7bdat import SAS7BDAT&lt;br /&gt;from heapq import heappush, heappop&lt;br /&gt;&lt;br /&gt;def get_top(k, sasfile):&lt;br /&gt;    minheaps = [[], []]  # one min heap of (weight, row) pairs per group&lt;br /&gt;    sexes = [&#39;M&#39;, &#39;F&#39;]&lt;br /&gt;    with SAS7BDAT(sasfile) as infile:&lt;br /&gt;        for i, row in enumerate(infile):&lt;br /&gt;            if i == 0:  # skip the header row&lt;br /&gt;                continue&lt;br /&gt;            sex, weight = row[1], row[-1]&lt;br /&gt;            current = minheaps[sexes.index(sex)]&lt;br /&gt;            heappush(current, (weight, row))&lt;br /&gt;            if len(current) &amp;gt; k:  # evict the lightest row so only k remain&lt;br /&gt;                heappop(current)&lt;br /&gt;    for x in minheaps:&lt;br /&gt;        for _, y in sorted(x):  # ascending order of weight&lt;br /&gt;            print(y)&lt;br /&gt;&lt;br /&gt;if __name__ == &quot;__main__&quot;:&lt;br /&gt;    get_top(3, &#39;class.sas7bdat&#39;)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[u&#39;Robert&#39;, u&#39;M&#39;, 12.0, 64.8, 128.0]&lt;br /&gt;[u&#39;Ronald&#39;, u&#39;M&#39;, 15.0, 67.0, 133.0]&lt;br /&gt;[u&#39;Philip&#39;, u&#39;M&#39;, 16.0, 72.0, 150.0]&lt;br /&gt;[u&#39;Carol&#39;, u&#39;F&#39;, 14.0, 62.8, 102.5]&lt;br /&gt;[u&#39;Mary&#39;, u&#39;F&#39;, 15.0, 66.5, 
112.0]&lt;br /&gt;[u&#39;Janet&#39;, u&#39;F&#39;, 15.0, 62.5, 112.5]&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/h3&gt;&lt;h4 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1.25em; line-height: 1.4; margin-bottom: 16px; margin-top: 1em; position: relative;&quot;&gt;&lt;a aria-hidden=&quot;true&quot; class=&quot;anchor&quot; href=&quot;https://github.com/dapangmao/Blog/blob/master/Two%20alternative%20ways%20to%20query%20large%20dataset%20in%20SAS.md#example-3-find-moving-window-maxium&quot; id=&quot;user-content-example-3-find-moving-window-maxium&quot; style=&quot;box-sizing: border-box; color: #4078c0; display: block; left: 0px; line-height: 1.2; margin-left: -30px; padding-left: 30px; padding-right: 6px; position: absolute; text-decoration: none; top: 0px;&quot;&gt;&lt;/a&gt;Example 3: Find Moving Window Maximum&lt;/h4&gt;&lt;h3 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1.5em; line-height: 1.43; margin-bottom: 16px; margin-top: 0px !important; position: relative;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; font-size: 16px; font-weight: normal; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;In my daily work, I often need three statistics over a moving window: the mean, the max, and the min. 
The sheer data size poses challenges.&lt;/div&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;In his&amp;nbsp;&lt;a href=&quot;http://www.sas-programming.com/2015/05/fast-sql-moving-average-calculation.html&quot; style=&quot;box-sizing: border-box; color: #4078c0; text-decoration: none;&quot;&gt;blog post&lt;/a&gt;, Liang Xie showed three advanced approaches to calculate moving averages, including&amp;nbsp;&lt;code style=&quot;background-color: rgba(0, 0, 0, 0.0392157); border-radius: 3px; box-sizing: border-box; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; margin: 0px; padding: 0.2em 0px;&quot;&gt;PROC EXPAND&lt;/code&gt;,&amp;nbsp;&lt;code style=&quot;background-color: rgba(0, 0, 0, 0.0392157); border-radius: 3px; box-sizing: border-box; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; margin: 0px; padding: 0.2em 0px;&quot;&gt;DATA STEP&lt;/code&gt;&amp;nbsp;and&amp;nbsp;&lt;code style=&quot;background-color: rgba(0, 0, 0, 0.0392157); border-radius: 3px; box-sizing: border-box; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; margin: 0px; padding: 0.2em 0px;&quot;&gt;PROC SQL&lt;/code&gt;. In that comparison,&amp;nbsp;&lt;code style=&quot;background-color: rgba(0, 0, 0, 0.0392157); border-radius: 3px; box-sizing: border-box; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; margin: 0px; padding: 0.2em 0px;&quot;&gt;PROC EXPAND&lt;/code&gt;&amp;nbsp;is the clear winner. 
In short, a self-join is very expensive, typically O(N^2), and we should avoid it whenever possible.&lt;/div&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;Finding the max or the min of a moving window is different from finding the mean: for the mean, only the running mean (or sum) needs to be carried from one window to the next, while for the max or the min we also need to remember the positions of past maxima or minima, because the current extreme eventually slides out of the window.&lt;/div&gt;&lt;/div&gt;&lt;/h3&gt;&lt;h5 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1em; line-height: 1.4; margin-bottom: 16px; margin-top: 1em; position: relative;&quot;&gt;&lt;a aria-hidden=&quot;true&quot; class=&quot;anchor&quot; href=&quot;https://github.com/dapangmao/Blog/blob/master/Two%20alternative%20ways%20to%20query%20large%20dataset%20in%20SAS.md#code-generator-1&quot; id=&quot;user-content-code-generator-1&quot; style=&quot;box-sizing: border-box; color: #4078c0; display: block; left: 0px; line-height: 1.1; margin-left: -30px; padding-left: 30px; padding-right: 6px; position: absolute; text-decoration: none; top: 0px;&quot;&gt;&lt;/a&gt;Code Generator&lt;/h5&gt;&lt;h3 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1.5em; line-height: 1.43; margin-bottom: 16px; margin-top: 0px !important; position: relative;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; font-size: 16px; font-weight: normal; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 
16px;&quot;&gt;The strategy is very straightforward: we take k consecutive rows (k = 3 here) from the table at a time and calculate their maximum. The time complexity is O(k*N). The generated SAS code is very lengthy, but the machine should have no trouble reading it. Note that the generated queries assume the table carries a row-number column named&amp;nbsp;&lt;code style=&quot;background-color: rgba(0, 0, 0, 0.0392157); border-radius: 3px; box-sizing: border-box; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; margin: 0px; padding: 0.2em 0px;&quot;&gt;row&lt;/code&gt;.&lt;/div&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;In addition, if we want to save the results, we could insert those maximums into an empty table.&lt;/div&gt;&lt;pre style=&quot;background-color: #f7f7f7; border-radius: 3px; box-sizing: border-box; color: #333333; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; font-stretch: normal; line-height: 1.45; margin-bottom: 16px; overflow: auto; padding: 16px; word-wrap: normal;&quot;&gt;&lt;code style=&quot;background: transparent; border-radius: 3px; border: 0px; box-sizing: border-box; display: inline; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; line-height: inherit; margin: 0px; max-width: initial; overflow: initial; padding: 0px; word-break: normal; word-wrap: normal;&quot;&gt;# In Python&lt;br /&gt;def create_sql(k, N):&lt;br /&gt;    # one SELECT per window of k consecutive rows&lt;br /&gt;    template = &quot;&quot;&quot;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in ({0})&lt;br /&gt;    ;&quot;&quot;&quot;&lt;br /&gt;    SQL = &quot;&quot;&lt;br /&gt;    for x in range(1, N - k + 2):&lt;br /&gt;        current = map(str, range(x, x + k))&lt;br /&gt;        SQL += template.format(&#39;,&#39;.join(current))&lt;br /&gt;    print(&quot;proc sql;&quot; + SQL + &quot;quit;&quot;)&lt;br /&gt;&lt;br /&gt;if __name__ == &quot;__main__&quot;:&lt;br /&gt;    create_sql(3, 19)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;proc sql;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in 
(1,2,3)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (2,3,4)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (3,4,5)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (4,5,6)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (5,6,7)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (6,7,8)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (7,8,9)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (8,9,10)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (9,10,11)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (10,11,12)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (11,12,13)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (12,13,14)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (13,14,15)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (14,15,16)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (15,16,17)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (16,17,18)&lt;br /&gt;    ;&lt;br /&gt;    select max(weight)&lt;br /&gt;    from class&lt;br /&gt;    where row in (17,18,19)&lt;br /&gt;    ;quit;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/h3&gt;&lt;h5 style=&quot;box-sizing: border-box; color: #333333; font-family: 
&#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1em; line-height: 1.4; margin-bottom: 16px; margin-top: 1em; position: relative;&quot;&gt;&lt;a aria-hidden=&quot;true&quot; class=&quot;anchor&quot; href=&quot;https://github.com/dapangmao/Blog/blob/master/Two%20alternative%20ways%20to%20query%20large%20dataset%20in%20SAS.md#read-directly-2&quot; id=&quot;user-content-read-directly-2&quot; style=&quot;box-sizing: border-box; color: #4078c0; display: block; left: 0px; line-height: 1.1; margin-left: -30px; padding-left: 30px; padding-right: 6px; position: absolute; text-decoration: none; top: 0px;&quot;&gt;&lt;/a&gt;Read Directly&lt;/h5&gt;&lt;h3 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1.5em; line-height: 1.43; margin-bottom: 16px; margin-top: 0px !important; position: relative;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; font-size: 16px; font-weight: normal; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;Again, if we want to further decrease the time complexity, say to O(N), we need a better data structure, such as a double-ended queue (deque). SAS doesn&#39;t have a built-in queue, so we switch to Python. The deque below stores row indices whose weights decrease from front to back, so its front always points to the maximum of the current window. Since each index is appended once and popped at most once, the two loops add up to roughly 2N deque operations. 
That is still O(N), which beats the other methods here.&lt;/div&gt;&lt;pre style=&quot;background-color: #f7f7f7; border-radius: 3px; box-sizing: border-box; color: #333333; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; font-stretch: normal; line-height: 1.45; margin-bottom: 16px; overflow: auto; padding: 16px; word-wrap: normal;&quot;&gt;&lt;code style=&quot;background: transparent; border-radius: 3px; border: 0px; box-sizing: border-box; display: inline; font-family: Consolas, &#39;Liberation Mono&#39;, Menlo, Courier, monospace; font-size: 13.6000003814697px; line-height: inherit; margin: 0px; max-width: initial; overflow: initial; padding: 0px; word-break: normal; word-wrap: normal;&quot;&gt;# In Python&lt;br /&gt;from sas7bdat import SAS7BDAT&lt;br /&gt;from collections import deque&lt;br /&gt;&lt;br /&gt;def maxSlidingWindow(A, w):&lt;br /&gt;    N = len(A)&lt;br /&gt;    ans = [0] * (N - w + 1)&lt;br /&gt;    myqueue = deque()  # indices whose values decrease from front to back&lt;br /&gt;    for i in range(w):&lt;br /&gt;        # drop values no larger than A[i]: they can no longer be a window max&lt;br /&gt;        while myqueue and A[i] &amp;gt;= A[myqueue[-1]]:&lt;br /&gt;            myqueue.pop()&lt;br /&gt;        myqueue.append(i)&lt;br /&gt;    for i in range(w, N):&lt;br /&gt;        ans[i - w] = A[myqueue[0]]  # max of the window ending at i - 1&lt;br /&gt;        while myqueue and A[i] &amp;gt;= A[myqueue[-1]]:&lt;br /&gt;            myqueue.pop()&lt;br /&gt;        # drop the index that has slid out of the window&lt;br /&gt;        while myqueue and myqueue[0] &amp;lt;= i - w:&lt;br /&gt;            myqueue.popleft()&lt;br /&gt;        myqueue.append(i)&lt;br /&gt;    ans[-1] = A[myqueue[0]]  # max of the last window&lt;br /&gt;    return ans&lt;br /&gt;&lt;br /&gt;if __name__ == &quot;__main__&quot;:&lt;br /&gt;    weights = []&lt;br /&gt;    with SAS7BDAT(&#39;class.sas7bdat&#39;) as infile:&lt;br /&gt;        for i, row in enumerate(infile):&lt;br /&gt;            if i == 0:  # skip the header row&lt;br /&gt;                continue&lt;br /&gt;            weights.append(float(row[-1]))&lt;br /&gt;&lt;br /&gt;    print(maxSlidingWindow(weights, 3))&lt;br /&gt;&lt;br /&gt;[112.5, 
102.5, 102.5, 102.5, 102.5, 112.5, 112.5, 112.5, 99.5, 99.5, 90.0, 112.0, 150.0, 150.0, 150.0, 133.0, 133.0]&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/h3&gt;&lt;h3 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1.5em; line-height: 1.43; margin-bottom: 16px; margin-top: 1em; position: relative;&quot;&gt;&lt;a aria-hidden=&quot;true&quot; class=&quot;anchor&quot; href=&quot;https://github.com/dapangmao/Blog/blob/master/Two%20alternative%20ways%20to%20query%20large%20dataset%20in%20SAS.md#conclusion&quot; id=&quot;user-content-conclusion&quot; style=&quot;box-sizing: border-box; color: #4078c0; display: block; left: 0px; line-height: 1.2; margin-left: -30px; padding-left: 30px; padding-right: 6px; position: absolute; text-decoration: none; top: 0px;&quot;&gt;&lt;/a&gt;Conclusion&lt;/h3&gt;&lt;h3 style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; font-size: 1.5em; line-height: 1.43; margin-bottom: 16px; margin-top: 0px !important; position: relative;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; font-size: 16px; font-weight: normal; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;&lt;div style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 16px;&quot;&gt;As data keeps growing, we should pay more and more attention to three things:&lt;/div&gt;&lt;ul style=&quot;box-sizing: border-box; color: #333333; font-family: &#39;Helvetica Neue&#39;, Helvetica, &#39;Segoe UI&#39;, Arial, freesans, sans-serif; line-height: 20.4799995422363px; margin-bottom: 0px !important; margin-top: 0px; padding: 0px 0px 0px 2em;&quot;&gt;&lt;li style=&quot;box-sizing: border-box;&quot;&gt;Time complexity: we 
don&#39;t want a job to run for weeks.&lt;/li&gt;&lt;li style=&quot;box-sizing: border-box;&quot;&gt;Space complexity: we don&#39;t want memory to overflow.&lt;/li&gt;&lt;li style=&quot;box-sizing: border-box;&quot;&gt;Clean code: colleagues should be able to read and modify the code easily.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/h3&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/3945689256495065527/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=3945689256495065527' title='92 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/3945689256495065527'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/3945689256495065527'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2015/06/two-alternative-ways-with-sas-and-sql.html' title='Two alternative ways to query large dataset in SAS'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>92</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-1995775124679997534</id><published>2015-06-03T10:59:00.000-05:00</published><updated>2015-06-03T10:59:11.474-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="python"/><category scheme="http://www.blogger.com/atom/ns#" term="SAS"/><title type='text'>saslib: a simple Python tool to lookup SAS metadata </title><content type='html'>&lt;div style=&quot;background-color: white; color: #222222; font-family: arial, sans-serif; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;&lt;a href=&quot;https://github.com/dapangmao/saslib&quot; style=&quot;color: #1155cc;&quot;&gt;saslib&lt;/a&gt;&amp;nbsp;is an HTML report generator to look up the metadata (or the header information), like PROC CONTENTS in SAS.&lt;/div&gt;&lt;ul style=&quot;background-color: white; color: #222222; font-family: arial, sans-serif; font-size: small; margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;It reads the sas7bdat files directly and quickly, and does not need SAS installed.&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;It emulates&amp;nbsp;&lt;a href=&quot;http://support.sas.com/documentation/cdl/en/proc/67327/HTML/default/viewer.htm#n1hqa4dk5tay0an15nrys1iwr5o2.htm&quot; style=&quot;color: #1155cc;&quot;&gt;PROC CONTENTS&lt;/a&gt;&amp;nbsp;with jQuery and&amp;nbsp;&lt;a href=&quot;https://www.datatables.net/&quot; style=&quot;color: #1155cc;&quot;&gt;DataTables&lt;/a&gt;.&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;It extracts the metadata from all sas7bdat files under the specified directory.&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;It supports IE (&amp;gt;=10), Firefox, Chrome, and other modern browsers.&lt;/li&gt;&lt;/ul&gt;&lt;h2 id=&quot;installation&quot; style=&quot;background-color: white; 
border-bottom-color: rgb(238, 238, 238); border-bottom-style: solid; border-bottom-width: 1px; color: #222222; font-family: arial, sans-serif; font-size: 1.4em; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Installation&lt;/h2&gt;&lt;pre style=&quot;background-color: white; color: #222222; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 1em; line-height: 1.2em; margin-bottom: 1.2em; margin-top: 1.2em;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(204, 204, 204); display: block !important; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em;&quot;&gt;pip install saslib&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial, sans-serif; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;saslib&lt;/code&gt;&amp;nbsp;requires&amp;nbsp;&lt;a href=&quot;https://pypi.python.org/pypi/sas7bdat&quot; style=&quot;color: #1155cc;&quot;&gt;sas7bdat&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;a href=&quot;https://pypi.python.org/pypi/Jinja2/2.7.3&quot; style=&quot;color: #1155cc;&quot;&gt;Jinja2&lt;/a&gt;.&lt;/div&gt;&lt;h2 id=&quot;usage&quot; style=&quot;background-color: white; border-bottom-color: rgb(238, 238, 238); border-bottom-style: solid; border-bottom-width: 1px; color: #222222; font-family: arial, sans-serif; font-size: 1.4em; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Usage&lt;/h2&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial, sans-serif; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em 
!important;&quot;&gt;The module is very simple to use. For example, the SAS data sets under the SASHELP library can be viewed as follows:&lt;/div&gt;&lt;pre style=&quot;background-color: white; color: #222222; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 1em; line-height: 1.2em; margin-bottom: 1.2em; margin-top: 1.2em;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background: rgb(248, 248, 248); border-radius: 3px; border: 1px solid rgb(204, 204, 204); color: #333333; display: block !important; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em;&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;font-weight: bold;&quot;&gt;from&lt;/span&gt; saslib &lt;span class=&quot;hljs-keyword&quot; style=&quot;font-weight: bold;&quot;&gt;import&lt;/span&gt; PROCcontents&lt;br /&gt;&lt;br /&gt;sasdata = PROCcontents(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;c:/Program Files/SASHome/SASFoundation/9.3/core/sashelp&#39;&lt;/span&gt;)&lt;br /&gt;sasdata.show()&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;div style=&quot;-webkit-text-stroke-width: 0px; background-color: white; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; margin: 1.2em 0px !important; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 1; word-spacing: 0px;&quot;&gt;The HTML report generated by the code above looks like the one&amp;nbsp;&lt;a href=&quot;http://dapangmao.github.io/saslib_demo/sashelp.html&quot; style=&quot;color: #1155cc;&quot;&gt;here&lt;/a&gt;.&lt;/div&gt;</content><link rel='replies' 
type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/1995775124679997534/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=1995775124679997534' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/1995775124679997534'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/1995775124679997534'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2015/06/saslib-simple-python-tool-to-lookup-sas.html' title='saslib: a simple Python tool to lookup SAS metadata '/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-5600838384179368577</id><published>2015-03-20T13:52:00.001-05:00</published><updated>2015-03-20T14:00:48.002-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="python"/><category scheme="http://www.blogger.com/atom/ns#" term="spark"/><title type='text'>Deploy a minimal Spark cluster</title><content type='html'>&lt;div class=&quot;markdown-here-wrapper&quot; data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/target=post;postID=5600838384179368577&quot;&gt;&lt;h3 id=&quot;requirements&quot; style=&quot;font-size: 1.3em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Requirements&lt;/h3&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;Because Spark is evolving rapidly, I need to deploy and maintain a minimal Spark cluster for testing and prototyping. A public cloud fits my current needs best. 
&lt;/div&gt;&lt;ol style=&quot;margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;Intranet speed&lt;/div&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;The cluster should copy data quickly from one server to another, because MapReduce routinely shuffles large chunks of data across the HDFS nodes. Ideally, the disks are SSDs.&lt;/div&gt;&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;Elasticity and scalability&lt;/div&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;Before scaling the cluster out to more machines, the cloud should offer some elasticity to size each machine up or down. &lt;/div&gt;&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;Locality of Hadoop&lt;/div&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;Most importantly, the Hadoop cluster and the Spark cluster should have a one-to-one mapping, as in the table below, so that computation and storage always live on the same machines. 
&lt;/div&gt;&lt;/li&gt;&lt;/ol&gt;&lt;table style=&quot;border-collapse: collapse; border-spacing: 0px; border: 0px; font-family: inherit; font-size: inherit; font-stretch: inherit; font-style: inherit; font-variant: inherit; font-weight: inherit; line-height: inherit; margin: 1.2em 0px; padding: 0px;&quot;&gt;&lt;thead&gt;&lt;tr style=&quot;background-color: white; border-top-color: rgb(204, 204, 204); border-top-style: solid; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em;&quot;&gt;Hadoop&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: center;&quot;&gt;Cluster Manager&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;Spark&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;MapReduce&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody style=&quot;border: 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;tr style=&quot;background-color: white; border-top-color: rgb(204, 204, 204); border-top-style: solid; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;Name Node&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: center;&quot;&gt;Master&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;Driver&lt;/td&gt;&lt;td 
style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;Job Tracker&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;background-color: #f8f8f8; background-color: white; border-top-color: rgb(204, 204, 204); border-top-style: solid; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;Data Node&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: center;&quot;&gt;Slave&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;Executor&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;Task Tracker&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;h3 id=&quot;choice-of-public-cloud-&quot; style=&quot;font-size: 1.3em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Choice of public cloud&lt;/h3&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;I compare two cloud service providers: AWS and DigitalOcean. Both have good Python client libraries (&lt;a href=&quot;https://github.com/boto/boto&quot;&gt;Boto&lt;/a&gt; for AWS and &lt;a href=&quot;https://github.com/koalalorenzo/python-digitalocean&quot;&gt;python-digitalocean&lt;/a&gt; for DigitalOcean). &lt;/div&gt;&lt;ol style=&quot;margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;From storage to computation&lt;/div&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;On AWS, S3 is a convenient place to keep data and load it into the Spark/EC2 cluster. 
Alternatively, the Spark cluster on EC2 can read an S3 bucket directly through a path such as s3n://file (the speed is acceptable). On DigitalOcean, I have to upload data from my local machine to the cluster’s HDFS. &lt;/div&gt;&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;DevOps tools:&lt;/div&gt;&lt;ul style=&quot;margin: 0px; margin: 1.2em 0px; padding-left: 1em; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;AWS: &lt;a href=&quot;https://github.com/apache/spark/blob/master/ec2/spark_ec2.py&quot;&gt;spark-ec2.py&lt;/a&gt;&lt;/div&gt;&lt;ul style=&quot;margin: 0px; margin: 1.2em 0px; padding-left: 1em; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;With the default settings, running it gives you&lt;ul style=&quot;margin: 0px; margin: 1.2em 0px; padding-left: 1em; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;2 HDFSs: one persistent and one ephemeral&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Spark 1.3 or any earlier version&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Spark’s standalone cluster manager&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;A minimal cluster with 1 master and 3 slaves consists of 4 m1.xlarge EC2 instances &lt;ul style=&quot;margin: 0px; margin: 1.2em 0px; padding-left: 1em; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Pros: large memory (15 GB per node) &lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Cons: no SSD; expensive ($0.35 * 4 = $1.40 per hour)&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;DigitalOcean: &lt;a 
href=&quot;https://digitalocean.mesosphere.com/&quot;&gt;https://digitalocean.mesosphere.com/&lt;/a&gt;&lt;/div&gt;&lt;ul style=&quot;margin: 0px; margin: 1.2em 0px; padding-left: 1em; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;With the default settings, running it gives you &lt;ul style=&quot;margin: 0px; margin: 1.2em 0px; padding-left: 1em; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;HDFS&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;no Spark&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Mesos&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;OpenVPN&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;A minimal cluster with 1 master and 3 slaves consists of 4 droplets, each with 2 GB memory and 2 CPUs &lt;ul style=&quot;margin: 0px; margin: 1.2em 0px; padding-left: 1em; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Pros: as low as $0.12 per hour; Mesos provides fine-grained control over the cluster (down to 0.1 CPU and 16 MB memory); the built-in VPN helps keep the cluster secure&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Cons: small memory (2 GB per droplet); Spark has to be installed manually&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ol&gt;&lt;h3 id=&quot;add-spark-to-digitalocean-cluster&quot; style=&quot;font-size: 1.3em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Add Spark to the DigitalOcean cluster&lt;/h3&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;Tom Faulhaber has &lt;a href=&quot;http://www.infolace.com/blog/2015/02/27/create-an-ad-hoc-spark-cluster/&quot;&gt;a quick bash script&lt;/a&gt; for deployment. 
To install Spark 1.3.0, I rewrote it as a &lt;a href=&quot;https://github.com/dapangmao/Blog/blob/master/Deploy%20a%20minimal%20Spark%20cluster/fabfile.py&quot;&gt;fabfile&lt;/a&gt; for &lt;a href=&quot;http://www.fabfile.org/&quot;&gt;Python’s Fabric&lt;/a&gt;.&lt;br /&gt;Deploying Spark onto the DigitalOcean cluster then takes a single command. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block !important; display: block; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# 10.1.2.3 is the internal IP address of the master&lt;/span&gt;&lt;br /&gt;fab -H 10.1.2.3 deploy_spark&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;&lt;em&gt;The source code above is available on my &lt;a href=&quot;https://github.com/dapangmao/Blog/tree/master/Deploy%20a%20minimal%20Spark%20cluster&quot;&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/5600838384179368577/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=5600838384179368577' title='19 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/5600838384179368577'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/5600838384179368577'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2015/03/deploy-minimal-spark-cluster.html' title='Deploy a minimal Spark cluster'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>19</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-3016598022911617398</id><published>2015-02-03T08:33:00.000-06:00</published><updated>2015-02-03T08:33:03.913-06:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="SAS"/><title type='text'>Solve the Top N questions in SAS/SQL</title><content type='html'>&lt;div class=&quot;markdown-here-wrapper&quot; data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/target=post;postID=3016598022911617398;onPublishedMenu=allposts;onClosedMenu=allposts;postNum=0;src=postname&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;This is a follow-up to &lt;a href=&quot;http://www.sasanalysis.com/2012/05/top-10-tips-and-tricks-about-proc-sql.html&quot;&gt;my previous post&lt;/a&gt; about SAS/SQL. &lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! 
important;&quot;&gt;SAS’s SQL procedure supports only basic SQL syntax. I find that the most challenging task is using PROC SQL to answer TOP N (or TOP N by group) questions. Compared with other modern database systems, PROC SQL lacks:&lt;/div&gt;&lt;ul style=&quot;margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px ! important; margin: 1.2em 0px ! important;&quot;&gt;Ranking functions such as &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;RANK()&lt;/code&gt;, or the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;SELECT TOP&lt;/code&gt; clause, as in&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-sql&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-operator&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;select&lt;/span&gt; TOP &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;3&lt;/span&gt; * &lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;from&lt;/span&gt; class&lt;br /&gt;;&lt;/span&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px ! important; margin: 1.2em 0px ! important;&quot;&gt;The &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;partition by&lt;/code&gt; clause such as &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-sql&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-operator&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;select&lt;/span&gt; sex, name, weight&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;from&lt;/span&gt; (&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;select&lt;/span&gt; sex, name, weight, &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;max&lt;/span&gt;(weight) over(&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;partition&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;by&lt;/span&gt; sex) &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;as&lt;/span&gt; max_weight&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;from&lt;/span&gt; class) t&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;where&lt;/span&gt; weight = max_weight&lt;br /&gt;;&lt;/span&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;However, SAS always offers alternative solutions. Below I list a few questions in ascending order of difficulty to explore the possibilities. &lt;/div&gt;&lt;h5 id=&quot;prepare-the-data&quot; style=&quot;font-size: 1em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Prepare the data&lt;/h5&gt;&lt;div style=&quot;margin: 1.2em 0px ! 
important;&quot;&gt;First, the &lt;a href=&quot;http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_intro_sect009.htm&quot;&gt;SASHELP.CLASS&lt;/a&gt; dataset (available in every SAS installation) is used as a demo. It is a small weight and height dataset for a fictional class of 19 children. I keep only the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;weight&lt;/code&gt; variable as the target column.&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-sql&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;data class;&lt;br /&gt;    &lt;span class=&quot;hljs-operator&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;set&lt;/span&gt; sashelp.class;&lt;/span&gt;&lt;br /&gt;    keep name sex age weight;&lt;br /&gt;run;&lt;br /&gt;&lt;br /&gt;proc sort data=class;&lt;br /&gt;    by descending weight;&lt;br /&gt;run;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;table style=&quot;border-collapse: collapse; border-spacing: 0px; border: 0px none; font: inherit; margin: 1.2em 0px; padding: 0px;&quot;&gt;&lt;thead&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Name&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Sex&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;Age&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: 
right;&quot;&gt;Weight&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody style=&quot;border: 0px none; margin: 0px; padding: 0px;&quot;&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Philip&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;M&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;16&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;150&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Ronald&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;M&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;15&lt;/td&gt;&lt;td style=&quot;border: 1px 
solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;133&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Robert&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;M&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;12&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;128&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Alfred&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;M&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;14&lt;/td&gt;&lt;td style=&quot;border: 1px solid 
rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;112.5&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Janet&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;F&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;15&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;112.5&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Mary&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;F&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;15&lt;/td&gt;&lt;td style=&quot;border: 1px solid 
rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;112&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;William&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;M&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;15&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;112&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Carol&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;F&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;14&lt;/td&gt;&lt;td style=&quot;border: 1px solid 
rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;102.5&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Henry&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;M&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;14&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;102.5&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;John&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;M&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;12&lt;/td&gt;&lt;td style=&quot;border: 1px solid 
rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;99.5&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Barbara&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;F&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;13&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;98&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Judy&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;F&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;14&lt;/td&gt;&lt;td style=&quot;border: 1px solid 
rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;90&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Thomas&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;M&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;11&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;85&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Jane&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;F&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;12&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 
204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;84.5&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Alice&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;F&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;13&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;84&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Jeffrey&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;M&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;13&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 
204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;84&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;James&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;M&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;12&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;83&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Louise&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;F&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;12&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); 
font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;77&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Joyce&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;F&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;11&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;50.5&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;h5 id=&quot;1-select-highest-value&quot; style=&quot;font-size: 1em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;1. Select highest value&lt;/h5&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;The simplest approach is to sort the rows by weight in descending order and set the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;outobs&lt;/code&gt; option to 1. PROC SQL then keeps only the first row, which holds the highest weight. 
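&lt;br /&gt;&lt;br /&gt;For readers without SAS at hand, the sketch below runs the same query in Python's built-in sqlite3 module. It assumes an in-memory copy of the 19-row class table above. CLASS_ROWS, load_class, top_one, and all_at_max are hypothetical names. In SQLite, LIMIT 1 plays the role of outobs = 1. Either option keeps exactly one row even when several rows share the top weight. A tie-safe variant therefore compares each weight against max(weight) instead.&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot;&gt;
```python
import sqlite3

# Hypothetical in-memory copy of the class table above: (name, sex, age, weight).
CLASS_ROWS = [
    ("Philip", "M", 16, 150), ("Ronald", "M", 15, 133), ("Robert", "M", 12, 128),
    ("Alfred", "M", 14, 112.5), ("Janet", "F", 15, 112.5), ("Mary", "F", 15, 112),
    ("William", "M", 15, 112), ("Carol", "F", 14, 102.5), ("Henry", "M", 14, 102.5),
    ("John", "M", 12, 99.5), ("Barbara", "F", 13, 98), ("Judy", "F", 14, 90),
    ("Thomas", "M", 11, 85), ("Jane", "F", 12, 84.5), ("Alice", "F", 13, 84),
    ("Jeffrey", "M", 13, 84), ("James", "M", 12, 83), ("Louise", "F", 12, 77),
    ("Joyce", "F", 11, 50.5),
]


def load_class():
    """Create an in-memory SQLite table named class from CLASS_ROWS."""
    con = sqlite3.connect(":memory:")
    con.execute("create table class (name text, sex text, age integer, weight real)")
    con.executemany("insert into class values (?, ?, ?, ?)", CLASS_ROWS)
    return con


def top_one(con):
    # LIMIT 1 keeps only the first sorted row, like outobs = 1 in SAS.
    return con.execute(
        "select name, weight from class order by weight desc limit 1"
    ).fetchall()


def all_at_max(con):
    # Tie-safe variant: every row whose weight equals the overall maximum.
    return con.execute(
        "select name, weight from class"
        " where weight = (select max(weight) from class)"
    ).fetchall()


con = load_class()
print(top_one(con))     # [('Philip', 150.0)]
print(all_at_max(con))  # [('Philip', 150.0)]
```
&lt;/code&gt;&lt;/pre&gt;Both queries agree here because no other child weighs 150.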
&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;title &quot;Select highest weight overall&quot;;&lt;br /&gt;proc sql outobs = 1;&lt;br /&gt;    select name, weight&lt;br /&gt;    from class&lt;br /&gt;    order by weight desc&lt;br /&gt;;quit;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;table style=&quot;border-collapse: collapse; border-spacing: 0px; border: 0px none; font: inherit; margin: 1.2em 0px; padding: 0px;&quot;&gt;&lt;thead&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Name&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;Weight&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody style=&quot;border: 0px none; margin: 0px; padding: 
0px;&quot;&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Philip&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;150&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;h5 id=&quot;2-select-second-highest-value&quot; style=&quot;font-size: 1em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;2. Select second highest value&lt;/h5&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;What about the second highest weight? The logic is simple. A subquery first excludes the highest weight, which moves the second highest weight to the first row. The outobs option then picks that row out. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-java&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;title &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;Select second highest weight overall&quot;&lt;/span&gt;;&lt;br /&gt;proc sql outobs = &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;;&lt;br /&gt;    select name, &lt;span class=&quot;hljs-function&quot;&gt;weight &lt;br /&gt;    from class&lt;br /&gt;    where weight not &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;hljs-params&quot;&gt;(select max(weight)&lt;/span&gt; from class)&lt;br /&gt;    order by weight desc&lt;br /&gt;&lt;/span&gt;;quit;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;table style=&quot;border-collapse: collapse; border-spacing: 0px; border: 0px none; font: inherit; margin: 1.2em 0px; padding: 0px;&quot;&gt;&lt;thead&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Name&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;Weight&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody style=&quot;border: 0px none; margin: 0px; padding: 
0px;&quot;&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Ronald&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;133&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;h5 id=&quot;3-select-nth-highest-value&quot; style=&quot;font-size: 1em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;3. Select Nth highest value&lt;/h5&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Now comes the hard part: the Nth highest value, say, the fourth highest weight. Here the table has to be compared against itself through a correlated subquery. A row holds the Nth highest weight when exactly N - 1 distinct weights are greater than its own, so for the fourth highest weight that count must equal 3. Since two children weigh 112.5, the query returns both tied names. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-java&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;title &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;Select Nth highest weight&quot;&lt;/span&gt;;&lt;br /&gt;%let n = &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;4&lt;/span&gt;;&lt;br /&gt;&lt;br /&gt;proc sql;&lt;br /&gt;    select distinct a.name, a.&lt;span class=&quot;hljs-function&quot;&gt;weight&lt;br /&gt;    from class as a&lt;br /&gt;    &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;where&lt;/span&gt; &lt;span class=&quot;hljs-params&quot;&gt;(select count(distinct b.weight)&lt;/span&gt;&lt;br /&gt;        from class as b&lt;br /&gt;        where b.weight &amp;gt; a.weight&lt;br /&gt;        ) &lt;/span&gt;= &amp;amp;n - &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;;&lt;br /&gt;quit;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;table style=&quot;border-collapse: collapse; border-spacing: 0px; border: 0px none; font: inherit; margin: 1.2em 0px; padding: 0px;&quot;&gt;&lt;thead&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Name&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 
1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;Weight&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody style=&quot;border: 0px none; margin: 0px; padding: 0px;&quot;&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Alfred&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;112.5&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Janet&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;112.5&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;h5 id=&quot;4-select-highest-values-by-group&quot; style=&quot;font-size: 1em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;4. Select highest values by group&lt;/h5&gt;&lt;div style=&quot;margin: 1.2em 0px ! 
important;&quot;&gt;The class has two groups, male and female. The most direct way to find the highest weight in each group is &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;select max for female union select max for male&lt;/code&gt;. A more scalable solution is the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;group by&lt;/code&gt; clause, which handles any number of groups. Because name and weight are neither grouped nor summarized, PROC SQL remerges each group's max(weight) back onto its rows. The having clause then keeps every row whose weight equals its group maximum. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-java&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;title &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;Select highest weights by group&quot;&lt;/span&gt;;&lt;br /&gt;proc sql;&lt;br /&gt;    select sex, name, weight&lt;br /&gt;    from &lt;span class=&quot;hljs-class&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;class&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;group&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;by&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;sex&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;having&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;weight&lt;/span&gt; &lt;/span&gt;= max(weight)&lt;br /&gt;;quit;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;table style=&quot;border-collapse: collapse; border-spacing: 0px; border: 0px none; font: inherit; margin: 1.2em 0px; padding: 0px;&quot;&gt;&lt;thead&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 
0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Sex&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Name&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;Weight&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody style=&quot;border: 0px none; margin: 0px; padding: 0px;&quot;&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;F&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Janet&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;112.5&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td 
style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;M&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Philip&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;150&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;h5 id=&quot;5-rank-all-values&quot; style=&quot;font-size: 1em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;5. Rank all values&lt;/h5&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;The most general solution to all of the questions above is to derive a &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;rank&lt;/code&gt; column for the target variable. There are two ways to do it: the first uses a subquery in the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;select&lt;/code&gt; clause, while the second uses a subquery in the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;from&lt;/code&gt; clause. Both count the distinct weights greater than or equal to the current one, which yields a dense rank: tied weights share a rank, and the next distinct weight gets the next integer. &lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;The subquery in the first solution is correlated with the main query (it refers to &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;a.weight&lt;/code&gt;), so it is re-evaluated for every row; it needs less code and is easier to recall in practice. 
The second one is effectively a self-join, which tends to be faster than the first solution.&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-java&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;/* Solution I */&lt;/span&gt;&lt;br /&gt;proc sql; &lt;br /&gt;    select name, weight, (&lt;span class=&quot;hljs-function&quot;&gt;select &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(distinct b.weight)&lt;/span&gt; &lt;br /&gt;            from class as b where b.weight &amp;gt;&lt;/span&gt;= a.weight) as Rank&lt;br /&gt;    from &lt;span class=&quot;hljs-class&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;a&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;order&lt;/span&gt; &lt;span 
class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;by&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;rank&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;;quit;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;/* Solution II */&lt;/span&gt;&lt;br /&gt;proc sql;&lt;br /&gt;    select a.name, a.weight, count(b.weight) as rank&lt;br /&gt;    from &lt;span class=&quot;hljs-class&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;a&lt;/span&gt;, (&lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;select&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;distinct&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;weight&lt;/span&gt;&lt;br /&gt;           &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;class&lt;/span&gt;&lt;br /&gt;           ) &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; 
style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;b&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;where&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;a&lt;/span&gt;.&lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;weight&lt;/span&gt; &amp;lt;&lt;/span&gt;= b.weight&lt;br /&gt;    group by a.name, a.weight&lt;br /&gt;    order by a.weight desc&lt;br /&gt;;quit;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;table style=&quot;border-collapse: collapse; border-spacing: 0px; border: 0px none; font: inherit; margin: 1.2em 0px; padding: 0px;&quot;&gt;&lt;thead&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Name&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;Weight&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;Rank&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody style=&quot;border: 0px none; margin: 0px; padding: 0px;&quot;&gt;&lt;tr style=&quot;-moz-border-bottom-colors: 
none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Philip&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;150&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;1&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Ronald&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;133&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;2&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 
0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Robert&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;128&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;3&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Alfred&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;112.5&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;4&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Janet&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; 
text-align: right;&quot;&gt;112.5&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;4&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Mary&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;112&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;5&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;William&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;112&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;5&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; 
-moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Henry&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;102.5&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;6&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Carol&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;102.5&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;6&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; 
margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;John&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;99.5&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;7&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Barbara&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;98&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;8&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Judy&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; 
text-align: right;&quot;&gt;90&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;9&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Thomas&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;85&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;10&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Jane&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;84.5&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;11&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; 
-moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Alice&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;84&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;12&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Jeffrey&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;84&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;12&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 
0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;James&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;83&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;13&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Louise&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;77&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;14&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Joyce&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: 
right;&quot;&gt;50.5&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;15&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;h5 id=&quot;6-select-top-n-values-by-group&quot; style=&quot;font-size: 1em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;6. Select top N values by group&lt;/h5&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Once the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;rank&lt;/code&gt; column is available, many otherwise tricky problems become easy to solve. For example, we can use it to find the three heaviest students in each sex group, and the same query scales to any number of groups. Because the rank is dense, tied weights share a rank, so a group may return more than three rows when weights are tied. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-java&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block !
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;title &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;Select Top N weights by group&quot;&lt;/span&gt;;&lt;br /&gt;proc sql; &lt;br /&gt;    select a.sex, a.name, a.weight, (&lt;span class=&quot;hljs-function&quot;&gt;select &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(distinct b.weight)&lt;/span&gt; &lt;br /&gt;            from class as b where b.weight &amp;gt;&lt;/span&gt;= a.weight and a.sex = b.sex ) as rank &lt;br /&gt;    from &lt;span class=&quot;hljs-class&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;a&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;where&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;calculated&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;rank&lt;/span&gt; &amp;lt;&lt;/span&gt;= &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;3&lt;/span&gt;&lt;br /&gt;    order by sex, rank&lt;br /&gt;;quit;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;table 
style=&quot;border-collapse: collapse; border-spacing: 0px; border: 0px none; font: inherit; margin: 1.2em 0px; padding: 0px;&quot;&gt;&lt;thead&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Sex&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Name&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;Weight&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;rank&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody style=&quot;border: 0px none; margin: 0px; padding: 0px;&quot;&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;F&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); 
font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Janet&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;112.5&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;1&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;F&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Mary&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;112&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;2&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;F&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); 
font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Carol&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;102.5&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;3&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;M&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Philip&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;150&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;1&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;M&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); 
font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Ronald&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;133&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;2&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;M&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: left;&quot;&gt;Robert&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;128&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: right;&quot;&gt;3&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/3016598022911617398/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=3016598022911617398' title='37 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/3016598022911617398'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/3016598022911617398'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2015/02/solve-top-n-questions-in-sassql_3.html' title='Solve the Top N questions in SAS/SQL'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>37</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-6389320002516762287</id><published>2015-02-01T18:34:00.000-06:00</published><updated>2015-02-02T08:42:13.608-06:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="python"/><title type='text'>Deploy a MongoDB powered Flask app in 5 minutes</title><content type='html'>&lt;div class=&quot;markdown-here-wrapper&quot; data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/src=dashboard&quot; markdown-here-wrapper-content-modified=&quot;true&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;This is a quick tutorial on deploying a web service (a social network) with the LNMP (Linux, Nginx, MongoDB, Python) stack on any IaaS cloud. The GitHub repository is at &lt;a href=&quot;https://github.com/dapangmao/minitwit-mongo-ubuntu&quot;&gt;https://github.com/dapangmao/minitwit-mongo-ubuntu&lt;/a&gt;. &lt;/div&gt;&lt;h4 id=&quot;stack&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Stack&lt;/h4&gt;&lt;div style=&quot;margin: 1.2em 0px ! 
important;&quot;&gt;The stack is built on the following tools from the Python ecosystem.&lt;/div&gt;&lt;h4 id=&quot;stack&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;&lt;a href=&quot;http://3.bp.blogspot.com/-s1sq96hNeXg/VM7FeqEiiTI/AAAAAAAADf4/naHg5uF93JY/s1600/test%2B-%2BNew%2BPage%2B%281%29.png&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://3.bp.blogspot.com/-s1sq96hNeXg/VM7FeqEiiTI/AAAAAAAADf4/naHg5uF93JY/s1600/test%2B-%2BNew%2BPage%2B(1).png&quot; height=&quot;560&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/h4&gt;&lt;table style=&quot;border-collapse: collapse; border-spacing: 0px; border: 0px none; font: inherit; margin: 1.2em 0px; padding: 0px;&quot;&gt;&lt;thead&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em;&quot;&gt;Tool&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em; text-align: center;&quot;&gt;Name&lt;/th&gt;&lt;th style=&quot;background-color: #f0f0f0; border: 1px solid rgb(204, 204, 204); font-size: 1em; font-weight: bold; margin: 0px; padding: 0.5em 1em;&quot;&gt;Advantage&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody style=&quot;border: 0px none; margin: 0px; padding: 0px;&quot;&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; 
background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;Cloud&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: center;&quot;&gt;&lt;a href=&quot;https://www.digitalocean.com/&quot;&gt;DigitalOcean&lt;/a&gt;&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;Cheap but fast&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;Server distro&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: center;&quot;&gt;Ubuntu 14.10 x64&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;Up-to-date packages&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td 
style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;WSGI server&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: center;&quot;&gt;Gunicorn&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;Manage workers automatically&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;Reverse proxy&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: center;&quot;&gt;Nginx&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;Fast and easy to configure&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;Framework&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: center;&quot;&gt;Flask&lt;/td&gt;&lt;td 
style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;Single-file approach for MVC&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: #f8f8f8; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;Data store&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: center;&quot;&gt;MongoDB&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;No schema needed and scalable&lt;/td&gt;&lt;/tr&gt;&lt;tr style=&quot;-moz-border-bottom-colors: none; -moz-border-left-colors: none; -moz-border-right-colors: none; -moz-border-top-colors: none; background-color: white; border-color: rgb(204, 204, 204) -moz-use-text-color -moz-use-text-color; border-image: none; border-right: 0px none; border-style: solid none none; border-width: 1px 0px 0px; margin: 0px; padding: 0px;&quot;&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;DevOps&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em; text-align: center;&quot;&gt;Fabric&lt;/td&gt;&lt;td style=&quot;border: 1px solid rgb(204, 204, 204); font-size: 1em; margin: 0px; padding: 0.5em 1em;&quot;&gt;Agentless and Pythonic&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a 
href=&quot;http://2.bp.blogspot.com/-V5Iu8aOWDKs/VM7Fj3dyIRI/AAAAAAAADgA/Rsesi9zUZIw/s1600/Screenshot%2Bfrom%2B2015-02-01%2B15%3A25%3A41.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://2.bp.blogspot.com/-V5Iu8aOWDKs/VM7Fj3dyIRI/AAAAAAAADgA/Rsesi9zUZIw/s1600/Screenshot%2Bfrom%2B2015-02-01%2B15%3A25%3A41.png&quot; height=&quot;640&quot; width=&quot;552&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;In addition, &lt;a href=&quot;http://supervisord.org/&quot;&gt;Supervisor&lt;/a&gt; runs on the server as a daemon that monitors the Gunicorn-Flask process and restarts it if it crashes. &lt;/div&gt;&lt;h4 id=&quot;the-minitwit-app&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;The MiniTwit app&lt;/h4&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;The MiniTwit application is &lt;a href=&quot;https://github.com/mitsuhiko/flask/tree/master/examples/minitwit&quot;&gt;an example provided by Flask&lt;/a&gt;: a prototype of a Twitter-like, multi-user social network. The original application depends on SQLite; however, its data store can be replaced with a NoSQL store such as Google Datastore or MongoDB. A live MiniTwit demo is hosted at &lt;a href=&quot;http://minitwit-123.appspot.com/public&quot;&gt;http://minitwit-123.appspot.com/public&lt;/a&gt;.&lt;/div&gt;&lt;h4 id=&quot;deployment&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Deployment&lt;/h4&gt;&lt;h5 id=&quot;1-install-fabric-and-clone-the-github-repo&quot; style=&quot;font-size: 1em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;1. Install Fabric and fetch the fabfile from the GitHub repo&lt;/h5&gt;&lt;div style=&quot;margin: 1.2em 0px ! 
important;&quot;&gt;The DevOps tool is &lt;a href=&quot;https://github.com/fabric/fabric&quot;&gt;Fabric&lt;/a&gt;, which simply runs its tasks over SSH. The &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;fabfile.py&lt;/code&gt; and the staged &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;flask&lt;/code&gt; files are stored on GitHub. Before deployment, install &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;fabric&lt;/code&gt; and download fabfile.py to the local machine.&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-bash&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# fabfile.py uses the Fabric 1.x API (env.hosts), which runs on Python 2.7&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-built_in&quot; style=&quot;color: #0086b3;&quot;&gt;sudo&lt;/span&gt; pip install &quot;fabric&amp;lt;2&quot;&lt;br /&gt;wget https://raw.githubusercontent.com/dapangmao/minitwit-mongo-ubuntu/master/fabfile.py&lt;br /&gt;fab &lt;span class=&quot;hljs-operator&quot;&gt;-l&lt;/span&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h5 id=&quot;2-input-ip-from-the-virtual-machine&quot; style=&quot;font-size: 1em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;2. Enter the IP address of the virtual machine&lt;/h5&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;When a new VM is created, the cloud provider usually emails its IP address and root password. Then modify the head of &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;fabfile.py&lt;/code&gt; accordingly. For prototyping, quite a few cloud providers are less expensive than Amazon EC2; for example, a minimal DigitalOcean instance costs only five dollars a month. If an SSH key has been uploaded, the password can be omitted. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;from fabric.api import env&lt;br /&gt;env.hosts = [&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;YOUR IP ADDRESS&#39;&lt;/span&gt;]  &lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Enter IP&lt;/span&gt;&lt;br /&gt;env.user = &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;root&#39;&lt;/span&gt;&lt;br /&gt;env.password = &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;YOUR PASSWORD&#39;&lt;/span&gt;  &lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Enter password&lt;/span&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h5 id=&quot;3-fire-up-fabric&quot; style=&quot;font-size: 1em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;3. Fire up Fabric&lt;/h5&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;http://1.bp.blogspot.com/-w8utFJ80m9A/VM7FpNNWYiI/AAAAAAAADgI/c8-lpBMGAd8/s1600/Screenshot%2Bfrom%2B2015-02-01%2B19%3A21%3A34.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://1.bp.blogspot.com/-w8utFJ80m9A/VM7FpNNWYiI/AAAAAAAADgI/c8-lpBMGAd8/s1600/Screenshot%2Bfrom%2B2015-02-01%2B19%3A21%3A34.png&quot; height=&quot;297&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Now it is time to deploy the application. 
With the command below, &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;fabric&lt;/code&gt; first installs &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;pip, git, nginx, gunicorn, supervisor&lt;/code&gt; and the latest &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;MongoDB&lt;/code&gt;, then configures them in sequence. In less than five minutes, a Flask and MongoDB application is ready for use. Because DigitalOcean hosts its own Ubuntu package mirror and its VMs run on SSDs, deployment there is even faster, usually finishing in about a minute. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-bash&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;fab deploy_minitwit&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/6389320002516762287/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=6389320002516762287' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/6389320002516762287'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/6389320002516762287'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2015/02/deploy-mongodb-powered-flask-app-in-5.html' title='Deploy a MongoDB powered Flask app in 5 minutes'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" 
url="http://3.bp.blogspot.com/-s1sq96hNeXg/VM7FeqEiiTI/AAAAAAAADf4/naHg5uF93JY/s72-c/test%2B-%2BNew%2BPage%2B(1).png" height="72" width="72"/><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-2704239854334413865</id><published>2015-01-08T17:45:00.001-06:00</published><updated>2015-01-08T17:45:41.425-06:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="python"/><category scheme="http://www.blogger.com/atom/ns#" term="spark"/><title type='text'>Spark practice(4): malicious web attack</title><content type='html'>&lt;div class=&quot;markdown-here-wrapper&quot; data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/target=post;postID=2704239854334413865&quot;&gt;&lt;blockquote style=&quot;border-left: 4px solid rgb(221, 221, 221); color: #777777; margin: 1.2em 0px; padding: 0px 1em; quotes: none;&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Suppose a website tracks user activity to prevent robotic attacks on the Internet. Please design an algorithm to identify the user IDs that have more than 500 clicks within any 10-minute window.&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Sample.txt: anonymousUserID  timeStamp    clickCount&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block ! 
important; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;123    9:45am    10&lt;br /&gt;234    9:46am    12&lt;br /&gt;234    9:50am    20&lt;br /&gt;456    9:53am    100&lt;br /&gt;123    9:55am    33&lt;br /&gt;456    9:56am    312&lt;br /&gt;123    10:03am    110&lt;br /&gt;123    10:16am    312&lt;br /&gt;234    10:20am    201&lt;br /&gt;456    10:23am    180&lt;br /&gt;123    10:25am    393&lt;br /&gt;456    10:27am    112&lt;br /&gt;999    12:21pm    888&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;h3 id=&quot;thought&quot; style=&quot;font-size: 1.3em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Thought&lt;/h3&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;This is a typical stream-processing problem. The key is to slide a fixed-length time window through the data, total the clicks inside the window for each ID, and return the IDs that look malicious.&lt;/div&gt;&lt;h3 id=&quot;single-machine-solution&quot; style=&quot;font-size: 1.3em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Single machine solution&lt;/h3&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Two data structures are used: a queue and a hash table. The queue scans the records in time order and keeps only those inside the current 10-minute window: each time a new record is appended, the records that have fallen out of the window are popped from the front. The hash table holds the click total of each ID over the records in the queue and is updated as records enter and leave. Any ID whose total exceeds 500 clicks is added to a result set. 
&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;from&lt;/span&gt; datetime &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; datetime&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; time&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;from&lt;/span&gt; collections &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; deque&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;get_minute&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(s, fmt = &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;%I:%M%p&#39;&lt;/span&gt;)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; 
time.mktime(datetime.strptime(s, fmt).timetuple())&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;get_diff&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(s1, s2)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; int(get_minute(s2) - get_minute(s1)) / &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;60&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;find_ids&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(infile, duration, maxcnt)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    queue, htable, ans = deque(), {}, set()&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;with&lt;/span&gt; open(infile, &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;rt&#39;&lt;/span&gt;) &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;as&lt;/span&gt; _infile:&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; l &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; _infile:&lt;br /&gt;            line = l.split()&lt;br /&gt;            line[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;2&lt;/span&gt;] = int(line[&lt;span class=&quot;hljs-number&quot; style=&quot;color: 
teal;&quot;&gt;2&lt;/span&gt;])&lt;br /&gt;            current_id, current_time, current_clk = line&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; current_id &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; htable:&lt;br /&gt;                htable[current_id] = current_clk&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;else&lt;/span&gt;:&lt;br /&gt;                htable[current_id] += current_clk&lt;br /&gt;            queue.append(line)&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;while&lt;/span&gt; queue &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;and&lt;/span&gt; get_diff(queue[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;][&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;], current_time) &amp;gt; duration:&lt;br /&gt;                past_id, _, past_clk = queue.popleft()&lt;br /&gt;                htable[past_id] -= past_clk&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; htable[current_id] &amp;gt; maxcnt:&lt;br /&gt;                ans.add(current_id)&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; ans&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; __name__ == &lt;span class=&quot;hljs-string&quot; style=&quot;color: 
#dd1144;&quot;&gt;&quot;__main__&quot;&lt;/span&gt;:&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;print&lt;/span&gt;(find_ids(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;sample.txt&#39;&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;10&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;500&lt;/span&gt;))&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h3 id=&quot;cluster-solution&quot; style=&quot;font-size: 1.3em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Cluster solution&lt;/h3&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;&lt;a href=&quot;https://spark.apache.org/news/spark-1-2-0-released.html&quot;&gt;The newest Spark release (version 1.2.0)&lt;/a&gt; adds Python support for Spark Streaming. However, its documentation is still scarce, so it remains to be seen whether this problem can be solved with the new API. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;To be continued&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/2704239854334413865/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=2704239854334413865' title='16 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/2704239854334413865'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/2704239854334413865'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2015/01/spark-practice4-malicious-web-attack.html' title='Spark practice(4): malicious web attack'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>16</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-438116158511483050</id><published>2014-12-23T21:52:00.000-06:00</published><updated>2014-12-23T21:52:00.103-06:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="python"/><category scheme="http://www.blogger.com/atom/ns#" term="spark"/><title type='text'>Spark practice (3): clean and sort Social Security numbers</title><content type='html'>&lt;div class=&quot;markdown-here-wrapper&quot; data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/target=post;postID=438116158511483050&quot; markdown-here-wrapper-content-modified=&quot;true&quot;&gt;&lt;blockquote style=&quot;border-left: 4px solid rgb(221, 221, 221); color: #777777; margin: 1.2em 0px; padding: 0px 1em; quotes: none;&quot;&gt;&lt;div 
style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Requirements:&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block ! important; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;1. Separate valid SSNs from invalid ones&lt;br /&gt;2. Count the number of valid SSNs&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Sample.txt&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block ! important; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;402-94-7709 &lt;br /&gt;283-90-3049 &lt;br /&gt;124-01-2425 &lt;br /&gt;1231232&lt;br /&gt;088-57-9593 &lt;br /&gt;905-60-3585 &lt;br /&gt;44-82-8341&lt;br /&gt;257581087&lt;br /&gt;327-84-0220&lt;br /&gt;402-94-7709&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;h4 id=&quot;thoughts&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Thoughts&lt;/h4&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;SSN-indexed data is common and is stored in many file systems. 
The trick to speeding this up on Spark is to convert each SSN into a numerical key and sort with the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;sortByKey&lt;/code&gt; operator. In addition, an &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;accumulator&lt;/code&gt; is a shared variable that tasks on every machine in the cluster can add to (only the driver reads its value), which makes it especially useful for counting. &lt;/div&gt;&lt;h4 id=&quot;single-machine-solution&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Single machine solution&lt;/h4&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;#!/usr/bin/env python&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# coding=utf-8&lt;/span&gt;&lt;br /&gt;htable = {}  &lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# maps each numeric key to its original SSN string&lt;/span&gt;&lt;br /&gt;valid_cnt = &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;  &lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# number of distinct valid SSNs&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;with&lt;/span&gt; open(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;sample.txt&#39;&lt;/span&gt;, &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;r&#39;&lt;/span&gt;) &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;as&lt;/span&gt; infile, open(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;sample_bad.txt&#39;&lt;/span&gt;, &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;w&#39;&lt;/span&gt;) &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;as&lt;/span&gt; outfile:&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; l &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; infile:&lt;br /&gt;        l = l.strip()&lt;br /&gt;        nums = l.split(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;-&#39;&lt;/span&gt;)&lt;br /&gt;        key = -&lt;span class=&quot;hljs-number&quot; style=&quot;color: 
teal;&quot;&gt;1&lt;/span&gt;  &lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# -1 marks an invalid SSN&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; l.isdigit() &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;and&lt;/span&gt; len(l) == &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;9&lt;/span&gt;:&lt;br /&gt;            key = int(l)  &lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# plain 9-digit form&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; len(nums) == &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;and&lt;/span&gt; list(map(len, nums)) == [&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;3&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;2&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;4&lt;/span&gt;] &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;&#39;&lt;/span&gt;.join(nums).isdigit():&lt;br /&gt;            key = &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1000000&lt;/span&gt;*int(nums[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;]) + &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;10000&lt;/span&gt;*int(nums[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;]) + int(nums[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;2&lt;/span&gt;])  &lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# dashed AAA-GG-SSSS form&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; key == -&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;:&lt;br /&gt;            outfile.write(l + &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;\n&#39;&lt;/span&gt;)&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; 
style=&quot;color: #333333; font-weight: bold;&quot;&gt;else&lt;/span&gt;:&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; key &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; htable:  &lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# keep only the first occurrence of each SSN&lt;/span&gt;&lt;br /&gt;                htable[key] = l&lt;br /&gt;                valid_cnt += &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;with&lt;/span&gt; open(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;sample_sorted.txt&#39;&lt;/span&gt;, &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;w&#39;&lt;/span&gt;) &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;as&lt;/span&gt; outfile:&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; x &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; sorted(htable):  &lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# ascending numeric order&lt;/span&gt;&lt;br /&gt;        outfile.write(htable[x] + &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;\n&#39;&lt;/span&gt;)&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;print&lt;/span&gt;(valid_cnt)&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px ! 
important;&quot;&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-Iih49DOgGVE/VJmCGe0HnjI/AAAAAAAADfI/aQcVxejSmXM/s1600/Capture.PNG&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://4.bp.blogspot.com/-Iih49DOgGVE/VJmCGe0HnjI/AAAAAAAADfI/aQcVxejSmXM/s1600/Capture.PNG&quot; height=&quot;193&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;h4 id=&quot;cluster-solution&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Cluster solution&lt;/h4&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;#!/usr/bin/env python&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# coding=utf-8&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; pyspark&lt;br /&gt;sc = pyspark.SparkContext()&lt;br /&gt;valid_cnt = sc.accumulator(&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;)&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span 
class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;is_validSSN&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(l)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    l = l.strip()&lt;br /&gt;    nums = l.split(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;-&#39;&lt;/span&gt;)&lt;br /&gt;    cdn1 = (l.isdigit() &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;and&lt;/span&gt; len(l) == &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;9&lt;/span&gt;)&lt;br /&gt;    &lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Every dash-separated group must be digits, otherwise set_key() fails&lt;/span&gt;&lt;br /&gt;    cdn2 = (len(nums) == &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;and&lt;/span&gt; map(len, nums) == [&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;3&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;2&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;4&lt;/span&gt;] &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;&#39;&lt;/span&gt;.join(nums).isdigit())&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; cdn1 &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;or&lt;/span&gt; cdn2:&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;True&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;False&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; 
font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;set_key&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(l)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    l = l.strip()&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; len(l) == &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;9&lt;/span&gt;:&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; (int(l), l)&lt;br /&gt;    nums = l.split(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;-&#39;&lt;/span&gt;)&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; (&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1000000&lt;/span&gt;*int(nums[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;]) + &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;10000&lt;/span&gt;*int(nums[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;]) + int(nums[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;2&lt;/span&gt;]), l)&lt;br /&gt;&lt;br /&gt;rdd = sc.textFile(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;sample.txt&#39;&lt;/span&gt;)&lt;br /&gt;rdd1 = rdd.filter(&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;lambda&lt;/span&gt; x: &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;not&lt;/span&gt; is_validSSN(x))&lt;br /&gt;&lt;br /&gt;rdd2 = rdd.filter(is_validSSN).distinct() \&lt;br /&gt;    .map(&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;lambda&lt;/span&gt; x: set_key(x)) \&lt;br /&gt;    .sortByKey().map(&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;lambda&lt;/span&gt; x: x[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;])&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Update the accumulator inside an action: updates made in a transformation&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# such as map() may be applied more than once, e.g. when sortByKey() samples keys&lt;/span&gt;&lt;br /&gt;rdd2.foreach(&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;lambda&lt;/span&gt; x: valid_cnt.add(&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;))&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; x &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; rdd1.collect():&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;print&lt;/span&gt; &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;Invalid SSN\t&#39;&lt;/span&gt;, x&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; x &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; rdd2.collect():&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;print&lt;/span&gt; &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;valid SSN\t&#39;&lt;/span&gt;, x&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;print&lt;/span&gt; &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;\nNumber of valid SSN is {}&#39;&lt;/span&gt;.format(valid_cnt.value)&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Save RDD to file 
system&lt;/span&gt;&lt;br /&gt;rdd1.saveAsTextFile(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;sample_bad&#39;&lt;/span&gt;)&lt;br /&gt;rdd2.saveAsTextFile(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;sample_sorted&#39;&lt;/span&gt;)&lt;br /&gt;sc.stop()&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-OH87V18pc34/VJmCAJAqC1I/AAAAAAAADfA/Xr8pe9S7FVA/s1600/Capture1.PNG&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://4.bp.blogspot.com/-OH87V18pc34/VJmCAJAqC1I/AAAAAAAADfA/Xr8pe9S7FVA/s1600/Capture1.PNG&quot; height=&quot;200&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://blog.sasanalysis.com/feeds/438116158511483050/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=438116158511483050' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/438116158511483050'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/438116158511483050'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2014/12/spark-practice-3-clean-and-sort-social.html' title='Spark practice (3): clean and sort Social Security numbers'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-Iih49DOgGVE/VJmCGe0HnjI/AAAAAAAADfI/aQcVxejSmXM/s72-c/Capture.PNG" height="72" width="72"/><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-645898768991537099</id><published>2014-12-12T12:36:00.001-06:00</published><updated>2014-12-12T12:36:59.882-06:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="python"/><category scheme="http://www.blogger.com/atom/ns#" term="spark"/><title type='text'>Spark practice (2): query text using SQL</title><content type='html'>&lt;div class=&quot;markdown-here-wrapper&quot; data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/target=post;postID=645898768991537099;onPublishedMenu=allposts;onClosedMenu=allposts;postNum=0;src=link&quot; markdown-here-wrapper-content-modified=&quot;true&quot;&gt;&lt;blockquote style=&quot;border-left: 4px solid rgb(221, 221, 221); color: #777777; margin: 1.2em 0px; padding: 0px 1em; quotes: 
none;&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;In a class of a few children, use SQL to find those who are male and weigh more than 100.&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block ! important; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;class.txt (columns: Name Sex Age Height Weight)&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block ! 
important; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;Alfred M 14 69.0 112.5 &lt;br /&gt;Alice F 13 56.5 84.0 &lt;br /&gt;Barbara F 13 65.3 98.0 &lt;br /&gt;Carol F 14 62.8 102.5 &lt;br /&gt;Henry M 14 63.5 102.5 &lt;br /&gt;James M 12 57.3 83.0 &lt;br /&gt;Jane F 12 59.8 84.5 &lt;br /&gt;Janet F 15 62.5 112.5 &lt;br /&gt;Jeffrey M 13 62.5 84.0 &lt;br /&gt;John M 12 59.0 99.5 &lt;br /&gt;Joyce F 11 51.3 50.5 &lt;br /&gt;Judy F 14 64.3 90.0 &lt;br /&gt;Louise F 12 56.3 77.0 &lt;br /&gt;Mary F 15 66.5 112.0 &lt;br /&gt;Philip M 16 72.0 150.0 &lt;br /&gt;Robert M 12 64.8 128.0 &lt;br /&gt;Ronald M 15 67.0 133.0 &lt;br /&gt;Thomas M 11 57.5 85.0 &lt;br /&gt;William M 15 66.5 112.0&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;h4 id=&quot;thoughts&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Thoughts&lt;/h4&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;The challenge is to transform unstructured text into structured data: a schema, giving each column a name and a type, has to be applied before SQL can query the plain text.&lt;/div&gt;&lt;h4 id=&quot;single-machine-solution&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Single machine solution&lt;/h4&gt;&lt;div style=&quot;margin: 1.2em 0px ! 
important;&quot;&gt;This is straightforward with Python’s built-in module &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;sqlite3&lt;/code&gt;.&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; sqlite3&lt;br /&gt;&lt;br /&gt;conn = sqlite3.connect(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;:memory:&#39;&lt;/span&gt;)&lt;br /&gt;c = conn.cursor()&lt;br /&gt;c.execute(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;&quot;&quot;CREATE TABLE class&lt;br /&gt;             (Name text, Sex text, Age real, Height real, Weight real)&quot;&quot;&quot;&lt;/span&gt;)&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;with&lt;/span&gt; open(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;class.txt&#39;&lt;/span&gt;) &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: 
bold;&quot;&gt;as&lt;/span&gt; infile:&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; l &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; infile:&lt;br /&gt;        line = l.split()&lt;br /&gt;        c.execute(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;INSERT INTO class VALUES (?,?,?,?,?)&#39;&lt;/span&gt;, line)&lt;br /&gt;conn.commit()&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; x &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; c.execute(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;SELECT * FROM class WHERE Sex = &#39;M&#39; AND Weight &amp;gt; 100&quot;&lt;/span&gt;):&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;print&lt;/span&gt; x&lt;br /&gt;conn.close()&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;http://3.bp.blogspot.com/-82EPLhf98qY/VIs0IInzpKI/AAAAAAAADec/xr7BCafaa3s/s1600/Capture1.PNG&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://3.bp.blogspot.com/-82EPLhf98qY/VIs0IInzpKI/AAAAAAAADec/xr7BCafaa3s/s1600/Capture1.PNG&quot; height=&quot;141&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;h4 id=&quot;cluster-solution&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Cluster solution&lt;/h4&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Spark SQL, used here through &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;HiveContext&lt;/code&gt;, can query semi-structured JSON data directly, so the first step is to dump the records to the file system as JSON, one object per line.&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; os&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; subprocess&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; json&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;from&lt;/span&gt; pyspark &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; SparkContext&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;from&lt;/span&gt; pyspark.sql &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; HiveContext&lt;br /&gt;sc = SparkContext()&lt;br /&gt;hiveCtx = HiveContext(sc)&lt;br /&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; 
style=&quot;color: #990000; font-weight: bold;&quot;&gt;trans&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(x)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; {&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;Name&#39;&lt;/span&gt;: x[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;], &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;Sex&#39;&lt;/span&gt;: x[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;], &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;Age&#39;&lt;/span&gt;: int(x[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;2&lt;/span&gt;]), \&lt;br /&gt;           &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;Height&#39;&lt;/span&gt;: float(x[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;3&lt;/span&gt;]), &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;Weight&#39;&lt;/span&gt;: float(x[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;4&lt;/span&gt;])}&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Remove the output directory for JSON if it exists&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;class-output&#39;&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; os.listdir(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;.&#39;&lt;/span&gt;):&lt;br /&gt;    subprocess.call([&lt;span 
class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;rm&#39;&lt;/span&gt;, &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;-rf&#39;&lt;/span&gt;, &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;class-output&#39;&lt;/span&gt;])&lt;br /&gt;&lt;br /&gt;rdd = sc.textFile(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;class.txt&#39;&lt;/span&gt;)&lt;br /&gt;rdd1 = rdd.map(&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;lambda&lt;/span&gt; x: x.split()).map(&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;lambda&lt;/span&gt; x: trans(x))&lt;br /&gt;rdd1.map(&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;lambda&lt;/span&gt; x: json.dumps(x)).saveAsTextFile(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;class-output&#39;&lt;/span&gt;)&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# saveAsTextFile writes one part-NNNNN file per partition, so read the whole directory&lt;/span&gt;&lt;br /&gt;infile = hiveCtx.jsonFile(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;class-output&quot;&lt;/span&gt;)&lt;br /&gt;infile.registerTempTable(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;class&quot;&lt;/span&gt;)&lt;br /&gt;&lt;br /&gt;query = hiveCtx.sql(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;&quot;&quot;SELECT * FROM class WHERE Sex = &#39;M&#39; AND Weight &amp;gt; 100&lt;br /&gt;      &quot;&quot;&quot;&lt;/span&gt;)&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; x &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; query.collect():&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;print&lt;/span&gt; 
x&lt;br /&gt;&lt;br /&gt;sc.stop()&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;http://1.bp.blogspot.com/-nXwla2xDfOQ/VIs0YbGcJiI/AAAAAAAADek/OrfFQvLu3nU/s1600/Capture.PNG&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://1.bp.blogspot.com/-nXwla2xDfOQ/VIs0YbGcJiI/AAAAAAAADek/OrfFQvLu3nU/s1600/Capture.PNG&quot; height=&quot;108&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;In conclusion, JSON is worth considering whenever SQL queries are desired on Spark.&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/645898768991537099/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=645898768991537099' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/645898768991537099'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/645898768991537099'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2014/12/spark-practice-2-query-text-using-sql.html' title='Spark practice (2): query text using SQL'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-82EPLhf98qY/VIs0IInzpKI/AAAAAAAADec/xr7BCafaa3s/s72-c/Capture1.PNG" height="72" 
width="72"/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-4798203155541449895</id><published>2014-12-07T17:05:00.002-06:00</published><updated>2014-12-08T14:30:00.931-06:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="python"/><category scheme="http://www.blogger.com/atom/ns#" term="spark"/><title type='text'>Spark practice (1): find the stranger that shares the most friends with me</title><content type='html'>&lt;div class=&quot;markdown-here-wrapper&quot; data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/target=post;postID=4798203155541449895&quot; markdown-here-wrapper-content-modified=&quot;true&quot;&gt;&lt;blockquote style=&quot;border-left: 4px solid rgb(221, 221, 221); color: #777777; margin: 1.2em 0px; padding: 0px 1em; quotes: none;&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Given the friend pairs in the sample text below (each line contains two people who are friends), find the stranger that shares the most friends with me.&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;sample.txt&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block ! 
important; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;me Alice&lt;br /&gt;Henry me&lt;br /&gt;Henry Alice&lt;br /&gt;me Jane&lt;br /&gt;Alice John&lt;br /&gt;Jane John&lt;br /&gt;Judy Alice&lt;br /&gt;me Mary&lt;br /&gt;Mary Joyce&lt;br /&gt;Joyce Henry&lt;br /&gt;Judy me&lt;br /&gt;Judy Jane&lt;br /&gt;John Carol&lt;br /&gt;Carol me&lt;br /&gt;Mary Henry&lt;br /&gt;Louise Ronald&lt;br /&gt;Ronald Thomas&lt;br /&gt;William Thomas&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;h4 id=&quot;thoughts&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Thoughts&lt;/h4&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;This scenario is common in social networks. Spark offers three ways to query such data:&lt;/div&gt;&lt;ul style=&quot;margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;MapReduce&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;GraphX&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Spark SQL&lt;/li&gt;&lt;/ul&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Starting with the simplest MapReduce-style approach, I use two hash tables in Python. First, I scan all friend pairs and store each person’s friends in a hash table. Second, I use another hash table to count my friends’ friends, keeping only those who are strangers to me.
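For readers on Python 3, here is a minimal sketch of the same two-hash-table idea, assuming the friend pairs fit in memory: a defaultdict of sets plays the role of the first hash table and a collections.Counter the second. The sample data is inlined, and the function names build_friend_table and count_common_friends are illustrative, not taken from the original post.

```python
from collections import Counter, defaultdict

# The friend pairs from sample.txt above, inlined so the sketch runs on its own.
SAMPLE = """me Alice
Henry me
Henry Alice
me Jane
Alice John
Jane John
Judy Alice
me Mary
Mary Joyce
Joyce Henry
Judy me
Judy Jane
John Carol
Carol me
Mary Henry
Louise Ronald
Ronald Thomas
William Thomas"""


def build_friend_table(lines):
    """First hash table: person -> set of friends (every pair is stored both ways)."""
    friends = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        a, b = parts
        friends[a].add(b)
        friends[b].add(a)
    return friends


def count_common_friends(friends, me="me"):
    """Second hash table: stranger -> number of friends shared with me."""
    my_friends = friends.get(me, set())
    counts = Counter()
    for friend in my_friends:
        for candidate in friends[friend]:
            # Only strangers count: not me and not already one of my friends.
            if candidate != me and candidate not in my_friends:
                counts[candidate] += 1
    return counts


counts = count_common_friends(build_friend_table(SAMPLE.splitlines()))
for name, n in counts.most_common():
    print("The stranger {} has {} common friends with me".format(name, n))
```

Because each person's friends are kept in a set, a pair that appears twice in the log is not double-counted. On this sample the sketch ranks John (three shared friends) ahead of Joyce (two), matching the result reported in this post.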
&lt;/div&gt;&lt;h4 id=&quot;single-machine-solution&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Single machine solution&lt;/h4&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;#!/usr/bin/env python&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# coding=utf-8&lt;/span&gt;&lt;br /&gt;htable1 = {}&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;with&lt;/span&gt; open(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;sample.txt&#39;&lt;/span&gt;, &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;rb&#39;&lt;/span&gt;) &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;as&lt;/span&gt; infile:&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; l &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; infile:&lt;br /&gt;        line = l.split()&lt;br /&gt;        &lt;span 
class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; line[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;] &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; htable1:&lt;br /&gt;            htable1[line[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;]] = [line[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;]]&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;else&lt;/span&gt;:&lt;br /&gt;            htable1[line[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;]] += [line[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;]]&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; line[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;] &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; htable1:&lt;br /&gt;            htable1[line[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;]] = [line[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;]]&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;else&lt;/span&gt;:&lt;br /&gt;            htable1[line[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;]] += [line[&lt;span 
class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;]]&lt;br /&gt;&lt;br /&gt;lst = htable1[&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;me&#39;&lt;/span&gt;]&lt;br /&gt;htable2 = {}&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; key, value &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; htable1.iteritems():&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; key &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; lst:&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; x &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; value:&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; x &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; lst &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;and&lt;/span&gt; x != &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;me&#39;&lt;/span&gt;: &lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# should only limit to strangers&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; x &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;not&lt;/span&gt; 
&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; htable2:&lt;br /&gt;                    htable2[x] = &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;&lt;br /&gt;                &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;else&lt;/span&gt;:&lt;br /&gt;                    htable2[x] += &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; x &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; sorted(htable2, key = htable2.get, reverse = &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;True&lt;/span&gt;):&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;print&lt;/span&gt; &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;The stranger {} has {} common friends with me&quot;&lt;/span&gt;.format(x, \&lt;br /&gt;        htable2[x])&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-QIXAgtbR2-o/VITcrf1vMlI/AAAAAAAADeA/jYzmwsGFugQ/s1600/Capture.PNG&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://4.bp.blogspot.com/-QIXAgtbR2-o/VITcrf1vMlI/AAAAAAAADeA/jYzmwsGFugQ/s1600/Capture.PNG&quot; height=&quot;128&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;The result shows that John shares three friends with me, followed by Joyce, who shares two. Therefore, John is the stranger the social network would most likely recommend to me.&lt;/div&gt;&lt;h4 id=&quot;cluster-solution&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Cluster solution&lt;/h4&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;If the log file of friend pairs is large, say several GB, the single-machine solution may not be able to load the data into memory, and we need to turn to a cluster.&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Spark provides the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;pair RDD&lt;/code&gt;, which is essentially a key-value structure similar to a hash table. To translate the single-machine solution into a cluster one, I use operators from Spark’s Python API, including &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;map&lt;/code&gt;, &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;reduceByKey&lt;/code&gt;, &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;filter&lt;/code&gt;, &lt;code style=&quot;background-color: #f8f8f8; 
border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;union&lt;/code&gt; and &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;sortByKey&lt;/code&gt;. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-java&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;#!/usr/bin/env python&lt;br /&gt;# coding=utf-&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;8&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; pyspark&lt;br /&gt;sc = pyspark.SparkContext()&lt;br /&gt;# Load data from hdfs&lt;br /&gt;rdd = sc.textFile(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;hdfs://sample.txt&#39;&lt;/span&gt;) &lt;br /&gt;# Build the first pair RDD&lt;br /&gt;rdd1 = rdd.map(lambda x: x.split()).union(rdd.map(lambda x: x.split()[::-&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;]))&lt;br /&gt;# Bring my 
friend list to local&lt;br /&gt;lst = rdd1.filter(lambda x: x[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;] == &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;me&#39;&lt;/span&gt;).map(lambda x: x[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;]).collect()&lt;br /&gt;# Build the second pair RDD&lt;br /&gt;rdd2 = rdd1.filter(lambda x: x[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;] in lst).map(lambda x: x[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;]) \&lt;br /&gt;    .filter(lambda x: x != &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;me&#39;&lt;/span&gt; and x not in lst) \&lt;br /&gt;    .map(lambda x: (x, &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;)).reduceByKey(lambda a, b: a + b) \&lt;br /&gt;    .map(lambda (x, y): (y, x)).sortByKey(ascending = False)&lt;br /&gt;# Save the result to hdfs&lt;br /&gt;rdd2.saveAsTextFile(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;hdfs://sample_output&quot;&lt;/span&gt;)&lt;br /&gt;# Bring the result to local since the sample is small&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; x, y in rdd2.collect():&lt;br /&gt;    print &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;The stranger {} has {} common friends with me&quot;&lt;/span&gt;.format(y, x)&lt;br /&gt;&lt;br /&gt;sc.stop()&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;http://1.bp.blogspot.com/-o4uLKQLW6kI/VITc6kS9UDI/AAAAAAAADeI/ZuAao6wCBbA/s1600/Capture2.PNG&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 
1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://1.bp.blogspot.com/-o4uLKQLW6kI/VITc6kS9UDI/AAAAAAAADeI/ZuAao6wCBbA/s1600/Capture2.PNG&quot; height=&quot;134&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;The result is the same. In this experiment, most of the time is spent loading the data from HDFS into memory; the subsequent MapReduce operations take only a small fraction of the overall time. In conclusion, Spark is well suited to iterative data analysis against an RDD that has already been loaded.&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/4798203155541449895/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=4798203155541449895' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/4798203155541449895'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/4798203155541449895'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2014/12/spark-practice-1-find-stranger-that.html' title='Spark practice (1): find the stranger that shares the most friends with me'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-QIXAgtbR2-o/VITcrf1vMlI/AAAAAAAADeA/jYzmwsGFugQ/s72-c/Capture.PNG" height="72" width="72"/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-2046349011753900323</id><published>2014-12-05T13:57:00.001-06:00</published><updated>2014-12-05T13:57:45.295-06:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="SAS"/><title type='text'>Use a vector to print Pascal&#39;s triangle in SAS</title><content type='html'>&lt;div 
class=&quot;markdown-here-wrapper&quot; data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/target=post;postID=2046349011753900323&quot; markdown-here-wrapper-content-modified=&quot;true&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Yesterday Rick Wicklin showed a cool SAS/IML function that uses a matrix to &lt;a href=&quot;http://blogs.sas.com/content/iml/2014/12/03/pascals-triangle-in-sas/&quot;&gt;print Pascal’s triangle&lt;/a&gt;. I came up with an alternative solution that uses a vector in SAS/IML.&lt;/div&gt;&lt;h3 id=&quot;method&quot; style=&quot;font-size: 1.3em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Method&lt;/h3&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Two functions are used: a main function &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;PascalRule&lt;/code&gt; and a helper function &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;_PascalRule&lt;/code&gt;. The helper function reuses the same vector on each call, updating its values in place and appending a trailing 1; the main function grows the vector from length 1 to n.&lt;/div&gt;&lt;h3 id=&quot;pro&quot; style=&quot;font-size: 1.3em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Pro&lt;/h3&gt;&lt;div style=&quot;margin: 1.2em 0px ! 
important;&quot;&gt;Gets the nth row directly (for example, &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;PascalRule(10)&lt;/code&gt; returns the 10th row); needs no matrix or matrix-related operators; uses less memory, so it can handle a larger &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;n&lt;/code&gt;.&lt;/div&gt;&lt;h3 id=&quot;con&quot; style=&quot;font-size: 1.3em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Con&lt;/h3&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;More lines of code; slower to print the whole triangle, since no data structure such as a matrix keeps the intermediate rows.&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-java&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;proc iml;&lt;br /&gt;    &lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;/* The main function */&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-function&quot;&gt;start &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;PascalRule&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(n)&lt;/span&gt;&lt;/span&gt;;&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; n &amp;lt;= &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;hljs-function&quot;&gt;then&lt;br /&gt;            &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;return&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;({&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;})&lt;/span&gt;&lt;/span&gt;;&lt;br /&gt;        answer = {&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;};&lt;br /&gt;        do i = &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt; to n - &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;2&lt;/span&gt; ;&lt;br /&gt;            answer = _PascalRule(answer);&lt;br /&gt;        end;&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt;(answer);&lt;br /&gt;    finish;&lt;br /&gt;    &lt;span class=&quot;hljs-comment&quot; 
style=&quot;color: #999988; font-style: italic;&quot;&gt;/* The helper function: turns row k into row k+1 in place */&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-function&quot;&gt;start &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;_PascalRule&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(vector)&lt;/span&gt;&lt;/span&gt;;&lt;br /&gt;        previous = &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;;&lt;br /&gt;        do i = &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;hljs-function&quot;&gt;to &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;nrow&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(vector)&lt;/span&gt;&lt;/span&gt;;&lt;br /&gt;            current = vector[i];&lt;br /&gt;            vector[i] = previous + current;&lt;br /&gt;            previous = current;&lt;br /&gt;        end;&lt;br /&gt;        vector = vector // {&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;};   &lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;/* append the trailing 1 */&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt;(vector);&lt;br /&gt;    finish;&lt;br /&gt;    &lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;/* Print the first 10 rows of Pascal&#39;s triangle */&lt;/span&gt;&lt;br /&gt;    do i = &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt; to &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;10&lt;/span&gt;;&lt;br /&gt;        x = PascalRule(i);&lt;br /&gt;        x = x`;&lt;br /&gt;        print x;&lt;br /&gt;    end;&lt;br /&gt;quit;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px ! 
important;&quot;&gt;Theoretically, Rick’s solution has a time complexity of &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;O(N^2)&lt;/code&gt; and a space complexity of &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;O(N^2)&lt;/code&gt;, while my solution has a time complexity of &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;O(N^3)&lt;/code&gt; and a space complexity of &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;O(N)&lt;/code&gt;. The cubic time comes from rebuilding every row from scratch: row i alone costs O(i^2) additions, so printing N rows unfortunately needs three nested &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;do&lt;/code&gt; loops in IML. It is a trade-off between speed and memory cost.&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/2046349011753900323/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=2046349011753900323' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/2046349011753900323'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/2046349011753900323'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2014/12/use-vector-to-print-pascals-triangle-in.html' title='Use a vector to print Pascal&#39;s triangle in SAS'/><author><name>CHARLIE 
HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-6232425220663233636</id><published>2014-11-28T13:59:00.001-06:00</published><updated>2014-11-28T13:59:08.580-06:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="python"/><category scheme="http://www.blogger.com/atom/ns#" term="spark"/><title type='text'>Minimize complexity by Spark</title><content type='html'>&lt;div class=&quot;markdown-here-wrapper&quot; data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/target=post;postID=6232425220663233636&quot; markdown-here-wrapper-content-modified=&quot;true&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;There is always a trade-off between time complexity and space complexity in computer programs: decreasing the time cost usually increases the space cost, and vice versa. The ideal solution is to parallelize the program across multiple cores on a multi-core computer, or even scale it out to multiple machines in a cluster, which can reduce both the running time and the memory needed on each machine.&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! 
important;&quot;&gt;&lt;a href=&quot;https://spark.apache.org/&quot;&gt;Spark&lt;/a&gt; is currently one of the hottest platforms for cluster computing on top of Hadoop. Its Python interface provides &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;map&lt;/code&gt;, &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;reduce&lt;/code&gt; and many other methods, which make it straightforward to write a MapReduce job, and therefore easy to migrate an algorithm from a single machine to a cluster of many machines.&lt;/div&gt;&lt;ul style=&quot;margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Minimize space complexity&lt;/li&gt;&lt;/ul&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;One question asks for the only single number in an array where every other number appears in pairs.&lt;/div&gt;&lt;blockquote style=&quot;border-left: 4px solid rgb(221, 221, 221); color: #777777; margin: 1.2em 0px; padding: 0px 1em; quotes: none;&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Single Number&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block ! 
important; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;  Given an array of integers, every element appears twice except for one.&lt;br /&gt;    Find that single one.&lt;br /&gt;    Note:&lt;br /&gt;    Your algorithm should have a linear runtime complexity.&lt;br /&gt;    Could you implement it without using extra memory? &lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;The optimal space complexity for this question is &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;O(1)&lt;/code&gt;, achieved with the bitwise operator &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;xor&lt;/code&gt;: since a xor a = 0 and a xor 0 = a, xor-ing all the elements cancels every pair and leaves only the single number. 
For a cluster, since Spark aggregates memory across machines, the memory load on each machine may drop to roughly 1/k of the single-machine load, where k is the number of machines in the cluster.&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Space.py&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; pyspark&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;from&lt;/span&gt; random &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; shuffle&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;from&lt;/span&gt; operator &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; xor&lt;br /&gt;sc = pyspark.SparkContext()&lt;br /&gt;&lt;br 
/&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Create the test case: 0 to 98 appear twice and 99 appears once, so the target is 99&lt;/span&gt;&lt;br /&gt;testCase = list(range(&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;99&lt;/span&gt;)) + list(range(&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;100&lt;/span&gt;))&lt;br /&gt;shuffle(testCase)&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Run the test with Spark: xor cancels every pair, leaving the single number&lt;/span&gt;&lt;br /&gt;result = sc.parallelize(testCase).reduce(xor)&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Show the result (99)&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;print&lt;/span&gt;(result)&lt;br /&gt;sc.stop()&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;ul style=&quot;margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Minimize time complexity&lt;/li&gt;&lt;/ul&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Another question asks to implement a function (or a method) that returns the square root of an integer.&lt;/div&gt;&lt;blockquote style=&quot;border-left: 4px solid rgb(221, 221, 221); color: #777777; margin: 1.2em 0px; padding: 0px 1em; quotes: none;&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px ! 
important;&quot;&gt;Sqrt(x)&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block ! important; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;Implement int sqrt(int x).&lt;br /&gt;Compute and return the square root of x.&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;The optimal solution achieves a time complexity of &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;O(lgN)&lt;/code&gt; by using &lt;a href=&quot;http://en.wikipedia.org/wiki/Binary_search_algorithm&quot;&gt;binary search&lt;/a&gt;. 
If we map the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;sqrt&lt;/code&gt; function over many inputs with Spark, each call still costs &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;O(lgN)&lt;/code&gt;, but the calls are spread across the cluster, so the time cost per input effectively decreases to &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;O(lgN/k)&lt;/code&gt;, where k is the number of machines in the cluster.&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Time.py&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; pyspark&lt;br /&gt;sc = pyspark.SparkContext()&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Integer square root by binary search: returns floor(sqrt(x))&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(x)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; x &amp;lt; &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;or&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;not&lt;/span&gt; isinstance(x, int):&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;raise&lt;/span&gt; ValueError&lt;br /&gt;    hi, lo = x // &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;2&lt;/span&gt; + &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot; style=&quot;color: 
teal;&quot;&gt;0&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;while&lt;/span&gt; hi &amp;gt;= lo:&lt;br /&gt;        mid = (hi + lo) // &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;2&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; mid * mid &amp;gt; x:&lt;br /&gt;            hi = mid - &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;else&lt;/span&gt;:&lt;br /&gt;            lo = mid + &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; int(hi)&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Test the square root function on 1 to 99 in parallel&lt;/span&gt;&lt;br /&gt;testCase = sc.parallelize(range(&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;, &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;100&lt;/span&gt;))&lt;br /&gt;result = testCase.map(&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;lambda&lt;/span&gt; x: sqrt(x))&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Show the result&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; x &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; result.collect():&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: 
#333333; font-weight: bold;&quot;&gt;print&lt;/span&gt;(x)&lt;br /&gt;sc.stop()&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;ul style=&quot;margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Find the worst rating by accounts&lt;/li&gt;&lt;/ul&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;There is also a question to find the worst of a few rating letters for each account number. &lt;/div&gt;&lt;blockquote style=&quot;border-left: 4px solid rgb(221, 221, 221); color: #777777; margin: 1.2em 0px; padding: 0px 1em; quotes: none;&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Want to find the worst rating for each account number. &lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;sample.txt is below&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block ! important; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;Account_number    Rating&lt;br /&gt;1            A&lt;br /&gt;1            B&lt;br /&gt;2            A&lt;br /&gt;2            B&lt;br /&gt;2            C&lt;br /&gt;3            A&lt;br /&gt;3            C&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px ! 
important;&quot;&gt;the desired result should be like  &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block ! important; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;1            B&lt;br /&gt;2            C&lt;br /&gt;3            C&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;The question is essentially a grouping question. Spark’s pair RDD, which stores records as key-value pairs, supplies a one-line solution: &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;reduceByKey(max)&lt;/code&gt; keeps the alphabetically largest, that is, the worst, rating for each account.&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; pyspark&lt;br /&gt;sc = pyspark.SparkContext()&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Split each line into (account, rating), skipping the header and blank lines,&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# then keep the worst (alphabetically largest) rating for each account&lt;/span&gt;&lt;br /&gt;rdd = sc.textFile(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;sample.txt&#39;&lt;/span&gt;)&lt;br /&gt;result = rdd.map(&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;lambda&lt;/span&gt; x: x.split()).filter(&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;lambda&lt;/span&gt; x: len(x) == &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;and&lt;/span&gt; x[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;].isdigit()).map(tuple).reduceByKey(max).sortByKey()&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Show the result&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; x &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; result.collect():&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;print&lt;/span&gt;(x)&lt;br /&gt;sc.stop()&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;In conclusion, Spark significantly changes the way we think about data analysis. 
&lt;/div&gt;&lt;div style=&quot;font-size: 0em; height: 0; margin: 0; max-height: 0; max-width: 0; overflow: hidden; padding: 0; width: 0;&quot; title=&quot;MDH:VGhlcmUgaXMgYWx3YXlzIGEgdHJhZGUtb2ZmIGJldHdlZW4gdGltZSBjb21wbGV4aXR5IGFuZCBz cGFjZSBjb21wbGV4aXR5IGZvciBjb21wdXRlciBwcm9ncmFtcy4gRGVjZWFzaW5nIHRoZSB0aW1l IGNvc3Qgd2lsbCBpbmNyZWFzZSBzcGFjZSBjb3N0LCBhbmQgdmljZSB2ZXJzYSwgVGhlIGlkZWFs IHNvbHV0aW9uIHRvIHBhcmFsbGVsaXplIHRoZSBwcm9ncmFtIHRvIG11bHRpcGxlIGNvcmVzIGlm IHRoZXJlIGlzIGEgbXVsdGlwbGUtY29yZSBjb21wdXRlciwgb3IgZXZlbiBzY2FsZSBpdCBvdXQg dG8gbXV0aXBsZSBtYWNoaW5lcyBhY3Jvc3MgYSBjbHVzdGVyLCB3aGljaCB3b3VsZCBldmVudHVh bGx5IHJlZHVjZSBib3RoIHRpbWUgY29tcGxleGl0eSBhbmQgc3BhY2UgY29tcGxleGl0eS48YnI+ PGJyPltTcGFya10oaHR0cHM6Ly9zcGFyay5hcGFjaGUub3JnLykgaXMgY3VycmVudGx5IHRoZSBo b3R0ZXN0IHBsYXRmb3JtIGZvciBjbHVzdGVyIGNvbXB1dGluZyBvbiB0b3Agb2YgSGFkb29wLCBh bmQgaXRzIFB5dGhvbiBpbnRlcmZhY2UgcHJvdmlkZXMgYG1hcGAsIGByZWR1Y2VgIGFuZCBtYW55 IG90aGVyIG1ldGhvZHMsIHdoaWNoIGFsbG93IGEgbWFwUmVjZHVlIGpvYiBpbiBhIHN0cmFpZ2h0 Zm9yd2FyZCB3YXksIGFuZCB0aGVyZWZvcmUgZWFzaWx5IG1pZ3JhdGUgYW4gYWxnb3JpdGhtIGZy b20gYSBzaW5nbGUgbWFjaGluZSB0byBhIGNsdXN0ZXIgb2YgbWFueSBtYWNoaW5lcy7CoCA8YnI+ PGJyPjxicj4tIE1pbmltaXplIHNwYWNlIGNvbXBsZXhpdHk8YnI+PGJyPlRoZXJlIGlzIGEgcXVl c3Rpb24gdG8gbG9vayBmb3IgdGhlIG9ubHkgc2luZ2xlIG51bWJlciBmcm9tIGEgbW9zdGx5IHBh aXJlZC1udW1iZXIgYXJyYXkuwqAgPGJyPiZndDsgU2luZ2xlIE51bWJlcjxicj48YnI+Jmd0O8Kg wqDCoMKgwqDCoCBHaXZlbiBhbiBhcnJheSBvZiBpbnRlZ2VycywgZXZlcnkgZWxlbWVudCBhcHBl YXJzIHR3aWNlIGV4Y2VwdCBmb3Igb25lLjxicj7CoMKgwqDCoMKgwqDCoCBGaW5kIHRoYXQgc2lu Z2xlIG9uZS48YnI+wqDCoMKgwqDCoMKgwqAgTm90ZTo8YnI+wqDCoMKgwqDCoMKgwqAgWW91ciBh bGdvcml0aG0gc2hvdWxkIGhhdmUgYSBsaW5lYXIgcnVudGltZSBjb21wbGV4aXR5Ljxicj7CoMKg wqDCoMKgwqDCoCBDb3VsZCB5b3UgaW1wbGVtZW50IGl0IHdpdGhvdXQgdXNpbmcgZXh0cmEgbWVt b3J5PyA8YnI+PGJyPlRoZSBvcHRpbWFsIHNwYWNlIGNvbXBsZXhpdHkgZm9yIHRoaXMgcXVlc3Rp b24gaXMgYE8oMSlgIGJ5IHVzaW5nIHRoZSBiaXQgbWFuaXB1bGF0b3IgYHhvcmAuIEZvciBhIGNs 
dXN0ZXIsIHNpbmNlIFNwYXJrIGFnZ3JlZ2F0ZXMgbWVtb3J5IGFjcm9zc3MgbWFjaGluZXMsIHRo ZSBzcGFjZSBjb21wbGV4aXR5IG1heSBiZWNvbWUgYE8oMS9rKWAsIHdoZXJlIGsgaXMgdGhlIG51 bWJlciBvZiB0aGUgbWFjaGluZXMgaW4gdGhlIGNsdXN0ZXIuPGJyPjxicj5gYGBweXRob248YnI+ IyBTcGFjZS5weTxicj5pbXBvcnQgcHlzcGFyazxicj5mcm9tIHJhbmRvbSBpbXBvcnQgc2h1ZmZs ZTxicj5mcm9tIG9wZXJhdG9yIGltcG9ydCB4b3I8YnI+c2MgPSBweXNwYXJrLlNwYXJrLkNvbnRl eHQoKTxicj48YnI+IyBDcmVhdGUgdGhlIHRlc3QgY2FzZSBhbmQgdGhlIHRhcmdldCBpcyA5OTxi cj50ZXN0Q2FzZSA9IHJhbmdlKDAsIDk5KSArIHJhbmdlKDAsIDEwMCk8YnI+c2h1ZmZsZSh0ZXN0 Q2FzZSk8YnI+IyBSdW4gdGhlIHRlc3Rpbmcgd2l0aCBTcGFyazxicj5yZXN1bHQgPSBzYy5wYXJh bGxlbGl6ZSh0ZXN0Q2FzZSkucmVkdWNlKHhvcik8YnI+IyBTaG93IHRoZSByZXN1bHQ8YnI+cHJp bnQgcmVzdWx0PGJyPnNjLnN0b3AoKTxicj5gYGA8YnI+PGJyPi0gTWluaW1pemUgdGltZSBjb21w bGV4aXR5PGJyPjxicj5UaGVyZSBpcyBhIHF1ZXN0aW9uIHRvIGltcGxlbWVudCB0aGUgZnVuY3Rp b24gKG9yIGEgbWV0aG9kKSB0aGF0IHJldHVybnMgdGhlIHNxdWFyZSByb290IG9mIGFuIGludGVn ZXIuPGJyPjxicj4mZ3Q7IFNxcnQoeCk8YnI+YGBgPGJyPkltcGxlbWVudCBpbnQgc3FydChpbnQg eCkuPGJyPkNvbXB1dGUgYW5kIHJldHVybiB0aGUgc3F1YXJlIHJvb3Qgb2YgeC48YnI+YGBgPGJy Pjxicj5UaGUgb3B0aW1hbCBzb2x1dGlvbiBjb3VsZCBhY2hpZXZlIHRoZSB0aW1lIGNvbXBsZXhp dHkgb2YgYE8obGdOKWAgYnkgdXNpbmcgW2JpbmFyeSBzZWFyY2hdKGh0dHA6Ly9lbi53aWtpcGVk aWEub3JnL3dpa2kvQmluYXJ5X3NlYXJjaF9hbGdvcml0aG0pLiBJZiB3ZSBwYXNzIHRoZSBgc3Fy dGAgZnVuY3Rpb24gdG8gU3BhcmssIHRoZW4gdGhlIHRpbWUgY29tcGxleGl0eSB3aWxsIGRlY3Jl YXNlZCB0byBgTyhsZ04vaylgLCB3aGVyZSBrIGlzIHRoZSBudW1iZXIgb2YgdGhlIG1hY2hpbmVz IGluIHRoZSBjbHVzdGVyLjxicj48YnI+YGBgcHl0aG9uPGJyPiMgVGltZS5weTxicj5pbXBvcnQg cHlzcGFyazxicj5zYyA9IHB5c3BhcmsuU3BhcmsuQ29udGV4dCgpPGJyPiMgSW1wbGVtZW50IGJp bmFyeSBzZWFyY2ggZm9yIHNxdWFyZSByb290IGZ1bmN0aW9uPGJyPmRlZiBzcXJ0KHgpOjxicj7C oMKgwqAgaWYgeCAmbHQ7IDAgb3Igbm90IGlzaW5zdGFuY2UoeCwgaW50KTo8YnI+wqDCoMKgwqDC oMKgwqAgcmFpc2UgVmFsdWVFcnJvcjxicj7CoMKgwqAgaGksIGxvID0geC8yICsgMSwgMDxicj7C oMKgwqAgd2hpbGUgaGkgJmd0Oz0gbG86PGJyPsKgwqDCoMKgwqDCoMKgIG1pZCA9IChoaSArIGxv 
KSAvIDI8YnI+wqDCoMKgwqDCoMKgwqAgaWYgbWlkICogbWlkICZndDsgeDo8YnI+wqDCoMKgwqDC oMKgwqDCoMKgwqDCoCBoaSA9IG1pZCAtIDE8YnI+wqDCoMKgwqDCoMKgwqAgZWxzZTo8YnI+wqDC oMKgwqDCoMKgwqDCoMKgwqDCoCBsbyA9IG1pZCArIDE8YnI+wqDCoMKgIHJldHVybiBpbnQoaGkp PGJyPsKgwqDCoCA8YnI+IyBUZXN0IHRoZSBzcXVhcmUgcm9vdCBhbGdvcml0aG08YnI+dGVzdENh c2UgPSBzYy5wYXJhbGxlbGl6ZSh4cmFuZ2UoMSwgMTAwKSk8YnI+cmVzdWx0ID0gdGVzdENhc2Uu bWFwKGxhbWJkYSB4OiBzcXJ0KHgpKTxicj4jIFNob3cgdGhlIHJlc3VsdDxicj5mb3IgeCBpbiBy ZXN1bHQuY29sbGVjdCgpOjxicj7CoMKgwqAgcHJpbnQgeDxicj5zYy5zdG9wKCk8YnI+YGBgPGJy Pjxicj4tIEZpbmQgdGhlIHdvcnN0IHJhdGluZyBieSBhY2NvdW50czxicj48YnI+VGhlcmUgaXMg YSBxdWVzdGlvbiB0byBmaW5kIHRoZSB3b3JzdCBvbmUgYW1vbmcgYSBmZXcgcmF0aW5nIGxldHRl cnMgZm9yIGVhY2ggb2YgdGhlIGFjY291bnQgbnVtYmVycy4gPGJyPjxicj4mZ3Q7IFdhbnQgdG8g ZmluZCB0aGUgd29yc3QgcmF0aW5nIGZvciBlYWNoIGFjY291bnQgbnVtYmVyLiA8YnI+PGJyPiZn dDsgc2FtcGxlLnR4dCBpcyBiZWxvdzxicj5gYGA8YnI+QWNjb3VudF9udW1iZXLCoMKgwqAgUmF0 aW5nPGJyPjHCoMKgwqDCoMKgwqDCoMKgwqDCoMKgIEE8YnI+McKgwqDCoMKgwqDCoMKgwqDCoMKg wqAgQjxicj4ywqDCoMKgwqDCoMKgwqDCoMKgwqDCoCBBPGJyPjLCoMKgwqDCoMKgwqDCoMKgwqDC oMKgIEI8YnI+MsKgwqDCoMKgwqDCoMKgwqDCoMKgwqAgQzxicj4zwqDCoMKgwqDCoMKgwqDCoMKg wqDCoCBBPGJyPjPCoMKgwqDCoMKgwqDCoMKgwqDCoMKgIEM8YnI+YGBgPGJyPnRoZSBkZXNpcmVk IHJlc3VsdCBzaG91bGQgYmUgbGlrZcKgIDxicj5gYGA8YnI+McKgwqDCoMKgwqDCoMKgwqDCoMKg wqAgQjxicj4ywqDCoMKgwqDCoMKgwqDCoMKgwqDCoCBDPGJyPjPCoMKgwqDCoMKgwqDCoMKgwqDC oMKgIEM8YnI+YGBgPGJyPjxicj5UaGUgcXVlc3Rpb24gaXMgZXNzZW50aWFsbHkgb25lIG9mIHRo ZSBncm91cGluZyBxdWVzdG9ucy4gU3BhcmsncyBwYWlyIFJERCwgd2hpY2ggcmVmbGVjdHMgdGhl IGtleS12YWx1ZSByZWxhdGlvbnNoaXAgZm9yIGdyb3Vwcywgc3VwcGxpZXMgYSBvbmUtbGluZSBz b2x1dGlvbiBmb3IgaXQuPGJyPjxicj5gYGBweXRob248YnI+aW1wb3J0IHB5c3Bhcms8YnI+c2Mg PSBweXNwYXJrLlNwYXJrQ29udGV4dCgpPGJyPjxicj4jIFNxdWVlemUgdGhlIGxldHRlcnMgYnkg a2V5czxicj5yZGQgPSBzYy50ZXh0RmlsZSgnc2FtcGxlLnR4dCcpPGJyPnJlc3VsdCA9IHJkZC5t YXAobGFtYmRhIHg6IHguc3BsaXQoKSkuZmlsdGVyKHg6IHhbMF0uaXNkaWdpdCgpKS5yZWR1Y2VC 
eUtleShtYXgpIDxicj4jIFNob3cgdGhlIHJlc3VsdDxicj5mb3IgeCBpbiByZXN1bHQuY29sbGVj dCgpOiA8YnI+wqDCoMKgIHByaW50IHg8YnI+c2Muc3RvcCgpPGJyPmBgYDxicj48YnI+SW4gYSBj b25jbHVzaW9uLCBTcGFyayBzaWduaWZpY2FudGx5IGNoYW5nZXMgdGhlIHdheSB3ZSB0aGluayBh Ym91dCBkYXRhIGFuYWx5c2lzLiA8YnI+&quot;&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/6232425220663233636/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=6232425220663233636' title='17 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/6232425220663233636'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/6232425220663233636'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2014/11/minimize-complexity-by-spark.html' title='Minimize complexity by Spark'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>17</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-4108135054526708767</id><published>2014-10-20T21:18:00.000-05:00</published><updated>2014-10-21T09:34:26.812-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="python"/><title type='text'>Automated testing by pytest</title><content type='html'>&lt;div class=&quot;markdown-here-wrapper&quot; data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/target=post;postID=4108135054526708767&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;The hardest part of testing is writing test cases, which is time-consuming and error-prone. 
Fortunately, besides Python's built-in modules such as &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;doctest&lt;/code&gt; and &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;unittest&lt;/code&gt;, quite a few third-party packages can help with automated testing. My favorite is &lt;a href=&quot;http://pytest.org/latest/#&quot;&gt;pytest&lt;/a&gt;, which has a proven track record and concise syntax. 
&lt;/div&gt;&lt;h3 id=&quot;step-1-test-driven-development&quot; style=&quot;font-size: 1.3em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Step 1: test-driven development&lt;/h3&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;For example, there is a &lt;a href=&quot;https://oj.leetcode.com/problems/find-minimum-in-rotated-sorted-array/&quot;&gt;coding challenge on LeetCode&lt;/a&gt;: &lt;/div&gt;&lt;blockquote style=&quot;border-left-color: rgb(221, 221, 221); border-left-style: solid; border-left-width: 4px; color: #777777; margin: 1.2em 0px; padding: 0px 1em; quotes: none;&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;Find Minimum in Rotated Sorted Array &lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;Suppose a sorted array is rotated at some pivot unknown to you beforehand.&lt;br /&gt;    (i.e., 0 1 2 4 5 6 7 might become 4 5 6 7 0 1 2).&lt;br /&gt;    Find the minimum element.&lt;br /&gt;    You may assume no duplicate exists in the array.&lt;/div&gt;&lt;/blockquote&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;The straightforward way to find the minimum of an array (a list in Python) is a sequential search, which visits every element and has a time complexity of &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;O(N)&lt;/code&gt;. 
If the array were fully sorted, the minimum would simply be the first element, which costs only &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;O(1)&lt;/code&gt;.&lt;br /&gt;Here, however, the sorted array has been rotated, which suggests a binary search that reduces the complexity from &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;O(N)&lt;/code&gt; to &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;O(logN)&lt;/code&gt;.&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;As usual, write the test cases first. The great thing about pytest is that it greatly simplifies coding the test cases: in this example, I use only three lines to generate 101 test cases, covering all 100 rotations of the list from 0 to 99 plus an empty-list test.  &lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;The next step is to write the function. 
It is easy to adapt &lt;a href=&quot;http://en.wikipedia.org/wiki/Binary_search_algorithm&quot;&gt;the iterative approach of binary search&lt;/a&gt; to this question. If the current search window is already sorted, its leftmost element is the minimum. Otherwise, compare the middle element with the rightmost one and move either the right or the left boundary toward the half that contains the rotation point. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: rgb(248, 248, 255); border-bottom-left-radius: 3px; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block !important; display: block; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-comment&quot;&gt;# test1.py&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;import&lt;/span&gt; pytest&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot;&gt;# Prepare 101 test cases: 100 rotations plus an empty list&lt;/span&gt;&lt;br /&gt;array = list(range(&lt;span class=&quot;hljs-number&quot;&gt;100&lt;/span&gt;))&lt;br /&gt;_testdata = [[array[i: ] + array[ :i], &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;] &lt;span class=&quot;hljs-keyword&quot;&gt;for&lt;/span&gt; i &lt;span class=&quot;hljs-keyword&quot;&gt;in&lt;/span&gt; range(&lt;span class=&quot;hljs-number&quot;&gt;100&lt;/span&gt;)]&lt;br /&gt;_testdata += [pytest.param([], &lt;span class=&quot;hljs-keyword&quot;&gt;None&lt;/span&gt;, marks=pytest.mark.empty)]&lt;br /&gt;&lt;br 
/&gt;&lt;span class=&quot;hljs-comment&quot;&gt;# Code the initial binary search function&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot;&gt;findMinPrev&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(num)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    lo, hi = &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;, len(num) - &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;while&lt;/span&gt; lo &amp;lt;= hi:&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; num[lo] &amp;lt;= num[hi]:&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt; num[lo]&lt;br /&gt;        mid = (lo + hi) // &lt;span class=&quot;hljs-number&quot;&gt;2&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; num[mid] &amp;lt; num[hi]:&lt;br /&gt;            hi = mid - &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;else&lt;/span&gt;:&lt;br /&gt;            lo = mid + &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-decorator&quot;&gt;@pytest.mark.parametrize(&#39;input, expected&#39;, _testdata)&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot;&gt;test_findMinPrev&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(input, expected)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;assert&lt;/span&gt; findMinPrev(input) == expected&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;After running the &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; 
border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;py.test -v test1.py&lt;/code&gt; command, part of the results is shown below. 65 tests passed and 36 failed. The failed cases return values much larger than the expected 0, which suggests that the boundary update is too aggressive: setting &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;hi = mid - 1&lt;/code&gt; drops the minimum from the search window whenever &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;num[mid]&lt;/code&gt; is itself the smallest element.  &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; background: rgb(248, 248, 255); border-bottom-left-radius: 3px; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block !important; display: block; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;test1.py:20: AssertionError&lt;br /&gt;_________________________ test_findMinPrev[input98-0] _________________________&lt;br /&gt;&lt;br /&gt;input = [98, 99, 0, 1, 2, 3, ...], expected = 0&lt;br /&gt;&lt;br /&gt;    @pytest.mark.parametrize(&#39;input, expected&#39;, _testdata)&lt;br /&gt;    def test_findMinPrev(input, expected):&lt;br /&gt;&amp;gt;       assert findMinPrev(input) == expected&lt;br /&gt;E       assert 98 == 0&lt;br /&gt;E        +  where 98 = findMinPrev([98, 99, 0, 1, 2, 3, ...])&lt;br /&gt;&lt;br /&gt;test1.py:20: AssertionError&lt;br 
/&gt;==================== 36 failed, 65 passed in 0.72 seconds =====================&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;Changing the right-boundary update from &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;hi = mid - 1&lt;/code&gt; to &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;hi = mid&lt;/code&gt; keeps the minimum inside the search window, and this version passes all the tests. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: rgb(248, 248, 255); border-bottom-left-radius: 3px; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block !important; display: block; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot;&gt;findMin&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(num)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    lo, hi = &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;, len(num) - &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;while&lt;/span&gt; lo &amp;lt;= hi:&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; num[lo] &amp;lt;= num[hi]:&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt; num[lo]&lt;br /&gt;        mid = (lo + hi) // &lt;span class=&quot;hljs-number&quot;&gt;2&lt;/span&gt;&lt;br /&gt;        &lt;span 
class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; num[mid] &amp;lt; num[hi]:&lt;br /&gt;            hi = mid&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;else&lt;/span&gt;:&lt;br /&gt;            lo = mid + &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h3 id=&quot;step-2-performance-profiling&quot; style=&quot;font-size: 1.3em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Step 2: performance profiling&lt;/h3&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;Besides correctness, I am also interested in whether the binary search method actually improves performance. For this step I choose &lt;a href=&quot;https://pypi.python.org/pypi/line_profiler/&quot;&gt;line_profiler&lt;/a&gt; for its line-by-line profiling. I take the most basic approach (the sequential search) as the benchmark, and also include a method that calls the built-in &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;min&lt;/code&gt; function, since built-ins like it run their loop in compiled C code, which gives a speedup similar to &lt;a href=&quot;http://www.sasanalysis.com/2012/01/do-loop-vs-vectorization-in-sasiml.html&quot;&gt;vectorization&lt;/a&gt;. The test case is a rotated sorted array with 10 million elements.  
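&lt;br /&gt;As a complementary check that is not part of the original test2.py, the following sketch times the same three approaches with the standard-library &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;timeit&lt;/code&gt; module instead of line_profiler, so the explicit loop is not slowed down by per-line instrumentation. The function names are hypothetical, the array size of 1 million elements is an assumption chosen to keep the run short, and the printed timings will vary by machine and Python version.&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: rgb(248, 248, 255); border-bottom-left-radius: 3px; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block !important; display: block; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;# timing_sketch.py: hypothetical cross-check, standard library only
import timeit


def find_min_loop(num):
    # Sequential search with an explicit Python loop: O(N)
    if not num:
        return None
    min_val = num[0]
    for x in num:
        if x &amp;lt; min_val:
            min_val = x
    return min_val


def find_min_builtin(num):
    # Built-in min, which runs its loop in C: still O(N)
    return min(num) if num else None


def find_min_binary(num):
    # Same binary search logic as findMin above: O(logN)
    lo, hi = 0, len(num) - 1
    while lo &amp;lt;= hi:
        if num[lo] &amp;lt;= num[hi]:
            return num[lo]
        mid = (lo + hi) // 2
        if num[mid] &amp;lt; num[hi]:
            hi = mid
        else:
            lo = mid + 1
    return None  # empty input


if __name__ == '__main__':
    # Assumed size: smaller than the 10 million used in test2.py
    n, pivot = 1000000, 56780
    array = list(range(n))
    data = array[pivot:] + array[:pivot]
    for func in (find_min_loop, find_min_builtin, find_min_binary):
        assert func(data) == 0
        # Average wall-clock time over 5 calls
        seconds = timeit.timeit(lambda: func(data), number=5) / 5
        print('%-18s %.6f s per call' % (func.__name__, seconds))&lt;/code&gt;&lt;/pre&gt;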
&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: rgb(248, 248, 255); border-bottom-left-radius: 3px; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block !important; display: block; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-comment&quot;&gt;# test2.py&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;from&lt;/span&gt; line_profiler &lt;span class=&quot;hljs-keyword&quot;&gt;import&lt;/span&gt; LineProfiler&lt;br /&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;from&lt;/span&gt; sys &lt;span class=&quot;hljs-keyword&quot;&gt;import&lt;/span&gt; maxsize&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-decorator&quot;&gt;@profile&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot;&gt;findMinRaw&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(num)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-string&quot;&gt;&quot;&quot;&quot;Sequential searching&quot;&quot;&quot;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot;&gt;not&lt;/span&gt; num:&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt; &lt;br /&gt;    min_val = maxsize&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;for&lt;/span&gt; x &lt;span class=&quot;hljs-keyword&quot;&gt;in&lt;/span&gt; num:&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; x &amp;lt; min_val:&lt;br /&gt;            min_val = x&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt; min_val&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-decorator&quot;&gt;@profile&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot;&gt;findMinLst&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(num)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-string&quot;&gt;&quot;&quot;&quot;Searching by the built-in min function&quot;&quot;&quot;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot;&gt;not&lt;/span&gt; num:&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt; min(num)&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-decorator&quot;&gt;@profile&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot;&gt;findMin&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(num)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-string&quot;&gt;&quot;&quot;&quot;Binary search&quot;&quot;&quot;&lt;/span&gt;&lt;br /&gt;    lo, hi = &lt;span class=&quot;hljs-number&quot;&gt;0&lt;/span&gt;, len(num) - &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;hljs-keyword&quot;&gt;while&lt;/span&gt; lo &amp;lt;= hi:&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; num[lo] &amp;lt;= num[hi]:&lt;br /&gt;            &lt;span class=&quot;hljs-keyword&quot;&gt;return&lt;/span&gt; num[lo]&lt;br /&gt;        mid = (lo + hi) // &lt;span class=&quot;hljs-number&quot;&gt;2&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;if&lt;/span&gt; num[mid] &amp;lt; num[hi]:&lt;br /&gt;            hi = mid&lt;br /&gt;        &lt;span class=&quot;hljs-keyword&quot;&gt;else&lt;/span&gt;:&lt;br /&gt;            lo = mid + &lt;span class=&quot;hljs-number&quot;&gt;1&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot;&gt;# Prepare a rotated array&lt;/span&gt;&lt;br /&gt;array = list(range(&lt;span class=&quot;hljs-number&quot;&gt;10000000&lt;/span&gt;))&lt;br /&gt;_testdata = array[&lt;span class=&quot;hljs-number&quot;&gt;56780&lt;/span&gt;: ] + array[ :&lt;span class=&quot;hljs-number&quot;&gt;56780&lt;/span&gt;]&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot;&gt;# Test the three functions&lt;/span&gt;&lt;br /&gt;findMinRaw(_testdata)&lt;br /&gt;findMinLst(_testdata)&lt;br /&gt;findMin(_testdata)&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;After running &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;kernprof -l -v test2.py&lt;/code&gt;, I get the output below. The sequential search executes its &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;for&lt;/code&gt; loop line 10,000,001 times and takes almost 14 seconds. 
The &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;min&lt;/code&gt; function handles all the details internally and takes about 0.5 seconds, roughly 27 times faster. Part of that gap comes from line_profiler itself, which adds overhead to every line it records and therefore penalizes the explicit loop most. By contrast, the binary search needs only 20 iterations to find the minimum and takes just 0.0001 seconds. So when the input is large, a better algorithm can save a great deal of time. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; background: rgb(248, 248, 255); border-bottom-left-radius: 3px; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block !important; display: block; display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;Total time: 13.8512 s&lt;br /&gt;File: test2.py&lt;br /&gt;Function: findMinRaw at line 4&lt;br /&gt;&lt;br /&gt;Line #      Hits         Time  Per Hit   % Time  Line Contents&lt;br /&gt;==============================================================&lt;br /&gt;     4                                           @profile&lt;br /&gt;     5                                           def findMinRaw(num):&lt;br /&gt;     6         1           
13     13.0      0.0      if not num:&lt;br /&gt;     7                                                   return&lt;br /&gt;     8         1            3      3.0      0.0      min_val = maxint&lt;br /&gt;     9  10000001     16031900      1.6     47.5      for x in num:&lt;br /&gt;    10  10000000     17707821      1.8     52.5          if x &amp;lt; min_val:&lt;br /&gt;    11         2            5      2.5      0.0              min_val = x&lt;br /&gt;    12         1            3      3.0      0.0      return min_val&lt;br /&gt;&lt;br /&gt;Total time: 0.510298 s&lt;br /&gt;File: test2.py&lt;br /&gt;Function: findMinLst at line 15&lt;br /&gt;&lt;br /&gt;Line #      Hits         Time  Per Hit   % Time  Line Contents&lt;br /&gt;==============================================================&lt;br /&gt;    15                                           @profile&lt;br /&gt;    16                                           def findMinLst(num):&lt;br /&gt;    17         1            4      4.0      0.0      if not num:&lt;br /&gt;    18                                                   return&lt;br /&gt;    19         1      1243016 1243016.0    100.0      return min(num)&lt;br /&gt;&lt;br /&gt;Total time: 0.000101812 s&lt;br /&gt;File: test2.py&lt;br /&gt;Function: findMin at line 22&lt;br /&gt;&lt;br /&gt;Line #      Hits         Time  Per Hit   % Time  Line Contents&lt;br /&gt;==============================================================&lt;br /&gt;    22                                           @profile&lt;br /&gt;    23                                           def findMin(num):&lt;br /&gt;    24         1           15     15.0      6.0      lo, hi = 0, len(num) - 1&lt;br /&gt;    25        20           40      2.0     16.1      while lo &amp;lt;= hi:&lt;br /&gt;    26        20           48      2.4     19.4          if num[lo] &amp;lt;= num[hi]:&lt;br /&gt;    27         1            2      2.0      0.8              return num[lo]&lt;br /&gt;    28        19       
    54      2.8     21.8          mid = (lo + hi) / 2&lt;br /&gt;    29        19           50      2.6     20.2          if num[mid] &amp;lt; num[hi]:&lt;br /&gt;    30         5           10      2.0      4.0              hi = mid&lt;br /&gt;    31                                                   else:&lt;br /&gt;    32        14           29      2.1     11.7              lo = mid + 1&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;font-size: 0em; height: 0; margin: 0; padding: 0;&quot; title=&quot;MDH:PGRpdiBjbGFzcz0ibWFya2Rvd24taGVyZS13cmFwcGVyIiBkYXRhLWJsb2dnZXItZXNjYXBlZC1k YXRhLW1kLXVybD0iaHR0cHM6Ly93d3cuYmxvZ2dlci5jb20vYmxvZ2dlci5nP2Jsb2dJRD0zMjU2 MTU5MzI4NjMwMDQxNDE2I2VkaXRvci90YXJnZXQ9cG9zdDtwb3N0SUQ9NDEwODEzNTA1NDUyNjcw ODc2NyI+PGRpdiBzdHlsZT0ibWFyZ2luOiAxLjJlbSAwcHggIWltcG9ydGFudDsiPjxwPlRoZSBt b3N0IGhhcmQgcGFydCBpbiB0ZXN0aW5nIGlzIHRvIHdyaXRlIHRlc3QgY2FzZXMsIHdoaWNoIGlz IHRpbWUtY29uc3VtaW5nIGFuZCBlcnJvci1wcm9uZS4gRm9ydHVuYXRlbHksIGJlc2lkZXMgUHl0 aG9uIGJ1aWx0LWluIG1vZHVsZXMgc3VjaCBhcyBgZG9jdGVzdGAsIGB1bml0dGVzdGAsIHRoZXJl IGFyZSBxdWl0ZSBhIGZldyB0aGlyZC1wYXJ0eSBwYWNrYWdlcyB0aGF0IGNvdWxkIGhlbHAgd2l0 aCBhdXRvbWF0ZWQgdGVzdGluZy4gTXkgZmF2b3JpdGUgb25lIGlzIFtweXRlc3RdKGh0dHA6Ly9w eXRlc3Qub3JnL2xhdGVzdC8jKSwgd2hpY2ggZW5qb3lzIHByb3ZlbiByZWNvcmQgYW5kIHN5bnRh eCBzdWdhci4mbmJzcDs8L3A+PHA+PGJyPjwvcD48cD4jIyNTdGVwIDE6IHRlc3QtZHJpdmVuIGRl dmVsb3BtZW50PC9wPjxwPjxicj48L3A+PHA+Rm9yIGV4YW1wbGUsIHRoZXJlIGlzIGEgW2NvZGlu ZyBjaGFsbGVuZ2Ugb24gTGVldGNvZGVdKGh0dHBzOi8vb2oubGVldGNvZGUuY29tL3Byb2JsZW1z L2ZpbmQtbWluaW11bS1pbi1yb3RhdGVkLXNvcnRlZC1hcnJheS8pOiZuYnNwOzwvcD48cD4mZ3Q7 IEZpbmQgTWluaW11bSBpbiBSb3RhdGVkIFNvcnRlZCBBcnJheSZuYnNwOzwvcD48cD48YnI+PC9w PjxwPiZndDsgU3VwcG9zZSBhIHNvcnRlZCBhcnJheSBpcyByb3RhdGVkIGF0IHNvbWUgcGl2b3Qg dW5rbm93biB0byB5b3UgYmVmb3JlaGFuZC48L3A+PHA+Jm5ic3A7ICZuYnNwOyAoaS5lLiwgMCAx IDIgNCA1IDYgNyBtaWdodCBiZWNvbWUgNCA1IDYgNyAwIDEgMikuPC9wPjxwPiZuYnNwOyAmbmJz cDsgRmluZCB0aGUgbWluaW11bSBlbGVtZW50LjwvcD48cD4mbmJzcDsgJm5ic3A7IFlvdSBtYXkg 
YXNzdW1lIG5vIGR1cGxpY2F0ZSBleGlzdHMgaW4gdGhlIGFycmF5LjwvcD48cD48YnI+PC9wPjxw PlRoZSBzdHJhaWdodGZvcndhcmQgd2F5IHRvIGZpbmQgYSBtaW5pbWFsIGVsZW1lbnQgaW4gYW4g YXJyYXkob3IgbGlzdCBpbiBQeXRob24pIGlzIHNlcXVlbnRpYWwgc2VhcmNoaW5nLCB3aGljaCBn b2VzIHRocm91Z2ggZXZlcnkgZWxlbWVudCBhbmQgaGFzIGEgdGltZSBjb21wbGV4aXR5IG9mIGBP KE4pYC4gSWYgdGhlIGFycmF5IGlzIHNvcnRlZCwgdGhlbiB0aGUgbWluaW1hbCBvbmUgaXMgdGhl IGZpcnN0IGVsZW1lbnQgdGhhdCBvbmx5IGNvc3RzIGBPKDEpYC4mbmJzcDs8L3A+PHA+SG93ZXZl ciwgdGhpcyBxdWVzdGlvbiBwcm92aWRlcyBhIHJvdGF0ZWQgc29ydGVkIGFycmF5LCB3aGljaCBz dWdnZXN0cyBhIGJpbmFyeSBzZWFyY2ggYW5kIHJlZHVjZXMgdGhlIGNvbXBsZXhpdHkgZnJvbSBg TyhOKWAgdG8gYE8obG9nTilgLjwvcD48cD48YnI+PC9wPjxwPkFzIHVzdWFsLCB3cml0ZSB0aGUg dGVzdCBjYXNlcyBmaXJzdC4gVGhlIGdyZWF0IHRoaW5nIGZvciBweXRlc3QgaXMgdGhhdCBpdCBz aWduaWZpY2FudGx5IHNpbXBsaWVzIHRoZSBlZmZvcnQgdG8gY29kZSB0aGUgdGVzdCBjYXNlczog aW4gdGhpcyBleGFtcGxlLCBJIG9ubHkgdXNlIDMgbGluZXMgdG8gZ2VuZXJhdGUgMTAxIHRlc3Qg Y2FzZXMgdG8gY292ZXIgYWxsIGNvbmRpdGlvbnMgZnJvbSAwIHRvIDk5IGFuZCBhbHNvIGluY2x1 ZGUgYW4gbnVsbCB0ZXN0LiAmbmJzcDs8L3A+PHA+PGJyPjwvcD48cD5OZXh0IHN0ZXAgaXMgdG8g Y29kZSB0aGUgZnVuY3Rpb24uIEl0IGlzIGVhc3kgdG8gdHJhbnNwbGFudCBbdGhlIGl0ZXJhdGl2 ZSBhcHByb2FjaCBvZiBiaW5hcnkgc2VhcmNoXShodHRwOi8vZW4ud2lraXBlZGlhLm9yZy93aWtp L0JpbmFyeV9zZWFyY2hfYWxnb3JpdGhtKSB0byB0aGlzIHF1ZXN0aW9uLiBJZiB0aGUgcG9pbnRl ciBpcyBiZXR3ZWVuIGEgc29ydGVkIHNlZ21lbnQsIHRoZW4gcmV0dXJuIHRoZSBtb3N0IGxlZnQg ZWxlbWVudCBhcyBtaW5pbWFsLiBPdGhlcndpc2UsIGFkanVzdCB0aGUgcmlnaHQgYm91bmRhcnkg YW5kIHRoZSBsZWZ0IGJvdW5kYXJ5LiZuYnNwOzwvcD48cD5gYGBweXRob248L3A+PHA+IyB0ZXN0 MS5weTwvcD48cD5pbXBvcnQgcHl0ZXN0PC9wPjxwPjxicj48L3A+PHA+IyBQcmVwYXJlIDEwMSB0 ZXN0IGNhc2VzPC9wPjxwPmFycmF5ID0gbGlzdChyYW5nZSgxMDApKTwvcD48cD5fdGVzdGRhdGEg PSBbW2FycmF5W2k6IF0gKyBhcnJheVsgOmldLCAwXSBmb3IgaSBpbiByYW5nZSgxMDApXTwvcD48 cD5fdGVzdGRhdGEgKz0gW3B5dGVzdC5tYXJrLmVtcHR5KChbXSwgTm9uZSkpXTwvcD48cD48YnI+ PC9wPjxwPiMgQ29kZSB0aGUgaW5pdGlhbCBiaW5hcnkgc2VhcmNoIGZ1bmN0aW9uPC9wPjxwPmRl 
ZiBmaW5kTWluUHJldihudW0pOjwvcD48cD4mbmJzcDsgJm5ic3A7IGxvLCBoaSA9IDAsIGxlbihu dW0pIC0gMTwvcD48cD4mbmJzcDsgJm5ic3A7IHdoaWxlIGxvICZsdDs9IGhpOjwvcD48cD4mbmJz cDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgaWYgbnVtW2xvXSAmbHQ7PSBudW1baGldOjwvcD48cD4m bmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyByZXR1cm4gbnVtW2xvXTwv cD48cD4mbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgbWlkID0gKGxvICsgaGkpIC8gMjwvcD48 cD4mbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgaWYgbnVtW21pZF0gJmx0OyBudW1baGldOjwv cD48cD4mbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyBoaSA9IG1pZCAt IDE8L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IGVsc2U6PC9wPjxwPiZuYnNwOyAm bmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IGxvID0gbWlkICsgMTwvcD48cD48YnI+ PC9wPjxwPkBweXRlc3QubWFyay5wYXJhbWV0cml6ZSgnaW5wdXQsIGV4cGVjdGVkJywgX3Rlc3Rk YXRhKTwvcD48cD5kZWYgdGVzdF9maW5kTWluUHJldihpbnB1dCwgZXhwZWN0ZWQpOjwvcD48cD4m bmJzcDsgJm5ic3A7IGFzc2VydCBmaW5kTWluUHJldihpbnB1dCkgPT0gZXhwZWN0ZWQ8L3A+PHA+ YGBgPC9wPjxwPkFmdGVyIHJ1bm5pbmcgdGhlIGBweS50ZXN0IC12IHRlc3QxLnB5YCBjb21tYW5k LCBwYXJ0IG9mIHRoZSByZXN1bHRzIHNob3dzIGJlbG93LiA2NSB0ZXN0cyBwYXNzZWQgYW5kIDM2 IGZhaWxlZDsgdGhlIGZhaWxlZCBjYXNlcyByZXR1cm4gdGhlIG11Y2ggYmlnZ2VyIHZhbHVlcyB0 aGF0IHN1Z2dlc3RzIG91dCBvZiBib3VuZGFyeSBkdXJpbmcgbG9vcHMsIGFuZCB0aGUgc2VsZWN0 aW9uIG9mIHRoZSBib3VkYXJpZXMgbWF5IGJlIHRvbyBhZ2dyZXNpdmUuICZuYnNwOzwvcD48cD5g YGA8L3A+PHA+dGVzdDEucHk6MjA6IEFzc2VydGlvbkVycm9yPC9wPjxwPl9fX19fX19fX19fX19f X19fX19fX19fX18gdGVzdF9maW5kTWluUHJldltpbnB1dDk4LTBdIF9fX19fX19fX19fX19fX19f X19fX19fX188L3A+PHA+PGJyPjwvcD48cD5pbnB1dCA9IFs5OCwgOTksIDAsIDEsIDIsIDMsIC4u Ll0sIGV4cGVjdGVkID0gMDwvcD48cD48YnI+PC9wPjxwPiZuYnNwOyAmbmJzcDsgQHB5dGVzdC5t YXJrLnBhcmFtZXRyaXplKCdpbnB1dCwgZXhwZWN0ZWQnLCBfdGVzdGRhdGEpPC9wPjxwPiZuYnNw OyAmbmJzcDsgZGVmIHRlc3RfZmluZE1pblByZXYoaW5wdXQsIGV4cGVjdGVkKTo8L3A+PHA+Jmd0 OyAmbmJzcDsgJm5ic3A7ICZuYnNwOyBhc3NlcnQgZmluZE1pblByZXYoaW5wdXQpID09IGV4cGVj dGVkPC9wPjxwPkUgJm5ic3A7ICZuYnNwOyAmbmJzcDsgYXNzZXJ0IDk4ID09IDA8L3A+PHA+RSAm 
bmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsrICZuYnNwO3doZXJlIDk4ID0gZmluZE1pblByZXYo Wzk4LCA5OSwgMCwgMSwgMiwgMywgLi4uXSk8L3A+PHA+PGJyPjwvcD48cD50ZXN0MS5weToyMDog QXNzZXJ0aW9uRXJyb3I8L3A+PHA+PT09PT09PT09PT09PT09PT09PT0gMzYgZmFpbGVkLCA2NSBw YXNzZWQgaW4gMC43MiBzZWNvbmRzID09PT09PT09PT09PT09PT09PT09PTwvcD48cD5gYGA8L3A+ PHA+Tm93IEkgYWRqdXN0IHRoZSByaWdodCBib3VuZGFyeSBzbGlnaHRseSBhbmQgZmluYWxseSBj b21lIHVwIHdpdGggYSBzb2x1dGlvbiB0aGF0IHBhc3NlcyBhbGwgdGhlIHRlc3RzLiZuYnNwOzwv cD48cD5gYGBweXRob248L3A+PHA+ZGVmIGZpbmRNaW4obnVtKTo8L3A+PHA+Jm5ic3A7ICZuYnNw OyBsbywgaGkgPSAwLCBsZW4obnVtKSAtIDE8L3A+PHA+Jm5ic3A7ICZuYnNwOyB3aGlsZSBsbyAm bHQ7PSBoaTo8L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IGlmIG51bVtsb10gJmx0 Oz0gbnVtW2hpXTo8L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJz cDsgcmV0dXJuIG51bVtsb108L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IG1pZCA9 IChsbyArIGhpKSAvIDI8L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IGlmIG51bVtt aWRdICZsdDsgbnVtW2hpXTo8L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNw OyAmbmJzcDsgaGkgPSBtaWQ8L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IGVsc2U6 PC9wPjxwPiZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IGxvID0gbWlk ICsgMTwvcD48cD5gYGA8L3A+PHA+PGJyPjwvcD48cD4jIyNTdGVwIDI6IHBlcmZvcm1hbmNlIHBy b2ZpbGluZyZuYnNwOzwvcD48cD5CZXNpZGVzIHRoZSByaWdodCBzb2x1dGlvbiwgSSBhbSBhbHNv IGludGVyZXN0ZWQgaW4gaWYgdGhlIGJpbmFyeSBzZWFyY2ggbWV0aG9kIGhhcyBpbmRlZWQgaW1w cm92ZWQgdGhlIHBlcmZvcm1hbmNlLiBUaGlzIHN0ZXAgSSBjaG9vc2UgW2xpbmVfcHJvZmlsZXJd KGh0dHBzOi8vcHlwaS5weXRob24ub3JnL3B5cGkvbGluZV9wcm9maWxlci8pIGdpdmVuIGl0cyBs aW5lLWJ5LWxpbmUgYWJpbGl0eSBvZiBwcm9maWxpbmcuIEkgdGFrZSB0aGUgbW9zdCBiYXNpYyBv bmUgKHRoZSBzZXF1ZW50aWFsIHNlYXJjaCkgYXMgYmVuY2htYXJrLCBhbmQgYWxzbyBpbmNsdWRl IHRoZSBtZXRob2QgdGhhdCBhcHBsaWVzIHRoZSBgbWluYCBmdW5jdGlvbiBzaW5jZSBhIGZldyBm dW5jdGlvbnMgc2ltaWxhciB0byBpdCBpbiBQeWh0b24gaW1wbGVtZW50IFt2ZWN0b3JpemFpdG9u XShodHRwOi8vd3d3LnNhc2FuYWx5c2lzLmNvbS8yMDEyLzAxL2RvLWxvb3AtdnMtdmVjdG9yaXph 
dGlvbi1pbi1zYXNpbWwuaHRtbCkgdG8gc3BlZWQgdXAuIFRoZSB0ZXN0IGNhc2UgaXMgYSByb3Rh dGVkIHNvcnRlZCBhcnJheSB3aXRoIDEwIG1pbGxpb24gZWxlbWVudHMuICZuYnNwOzwvcD48cD5g YGBweXRob248L3A+PHA+IyB0ZXN0Mi5weTwvcD48cD5mcm9tIGxpbmVfcHJvZmlsZXIgaW1wb3J0 IExpbmVQcm9maWxlcjwvcD48cD5mcm9tIHN5cyBpbXBvcnQgbWF4aW50PC9wPjxwPjxicj48L3A+ PHA+QHByb2ZpbGU8L3A+PHA+ZGVmIGZpbmRNaW5SYXcobnVtKTo8L3A+PHA+Jm5ic3A7ICZuYnNw OyAiIiJTZXF1ZW50aWFsIHNlYXJjaGluZyIiIjwvcD48cD4mbmJzcDsgJm5ic3A7IGlmIG5vdCBu dW06PC9wPjxwPiZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyByZXR1cm4mbmJzcDs8L3A+PHA+ Jm5ic3A7ICZuYnNwOyBtaW5fdmFsID0gbWF4aW50PC9wPjxwPiZuYnNwOyAmbmJzcDsgZm9yIHgg aW4gbnVtOjwvcD48cD4mbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgaWYgeCAmbHQ7IG1pbl92 YWw6PC9wPjxwPiZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IG1pbl92 YWwgPSB4PC9wPjxwPiZuYnNwOyAmbmJzcDsgcmV0dXJuIG1pbl92YWw8L3A+PHA+PGJyPjwvcD48 cD5AcHJvZmlsZTwvcD48cD5kZWYgZmluZE1pbkxzdChudW0pOjwvcD48cD4mbmJzcDsgJm5ic3A7 ICIiIlNlYXJjaGluZyBieSBsaXN0IGNvbXByZWhlbnNpb24iIiI8L3A+PHA+Jm5ic3A7ICZuYnNw OyBpZiBub3QgbnVtOjwvcD48cD4mbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgcmV0dXJuPC9w PjxwPiZuYnNwOyAmbmJzcDsgcmV0dXJuIG1pbihudW0pPC9wPjxwPjxicj48L3A+PHA+QHByb2Zp bGU8L3A+PHA+ZGVmIGZpbmRNaW4obnVtKTo8L3A+PHA+Jm5ic3A7ICZuYnNwOyAiIiIiQmluYXJ5 IHNlYXJjaCIiIjwvcD48cD4mbmJzcDsgJm5ic3A7IGxvLCBoaSA9IDAsIGxlbihudW0pIC0gMTwv cD48cD4mbmJzcDsgJm5ic3A7IHdoaWxlIGxvICZsdDs9IGhpOjwvcD48cD4mbmJzcDsgJm5ic3A7 ICZuYnNwOyAmbmJzcDsgaWYgbnVtW2xvXSAmbHQ7PSBudW1baGldOjwvcD48cD4mbmJzcDsgJm5i c3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyByZXR1cm4gbnVtW2xvXTwvcD48cD4mbmJz cDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgbWlkID0gKGxvICsgaGkpIC8gMjwvcD48cD4mbmJzcDsg Jm5ic3A7ICZuYnNwOyAmbmJzcDsgaWYgbnVtW21pZF0gJmx0OyBudW1baGldOjwvcD48cD4mbmJz cDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyBoaSA9IG1pZDwvcD48cD4mbmJz cDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgZWxzZTo8L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDsg Jm5ic3A7ICZuYnNwOyAmbmJzcDsgbG8gPSBtaWQgKyAxPC9wPjxwPjxicj48L3A+PHA+IyBQcmVw 
YXJlIGEgcm90YXRlZCBhcnJheTwvcD48cD5hcnJheSA9IGxpc3QocmFuZ2UoMTAwMDAwMDApKTwv cD48cD5fdGVzdGRhdGEgPSBhcnJheVs1Njc4MDogXSArIGFycmF5WyA6NTY3ODBdPC9wPjxwPiMg VGVzdCB0aGUgdGhyZWUgZnVuY3Rpb25zPC9wPjxwPmZpbmRNaW5SYXcoX3Rlc3RkYXRhKTwvcD48 cD5maW5kTWluTHN0KF90ZXN0ZGF0YSk8L3A+PHA+ZmluZE1pbihfdGVzdGRhdGEpPC9wPjxwPjxi cj48L3A+PHA+YGBgPC9wPjxwPjxicj48L3A+PHA+QWZ0ZXIgcnVubmluZyBga2VybnByb2YgLWwg LXYgdGVzdDIucHlgLCBJIGhhdmUgdGhlIG91dHB1dCBhcyBiZWxvdy4gVGhlIHNlcXVlbnRpYWwg c2VhcmNoIGhhcyBoaXQgdGhlIGxvb3BzIDEwMDAwMDAxIHRpbWVzIGFuZCBjb3N0cyBhbG1vc3Qg MTQgc2Vjb25kcy4gVGhlIGBtaW5gIGZ1bmN0aW9uIGVuY2Fwc3VsYXRlIGFsbCBkZXRhaWxzIGlu c2lkZSBhbmQgdXNlcyAwLjUgc2Vjb25kcyB3aGljaCBpcyAyOCB0aW1lcyBmYXN0ZXIuIE9uIHRo ZSBjb250cmFyeSwgdGhlIGJpbmFyeSBzZWFyY2ggbWV0aG9kIG9ubHkgdGFrZXMgMjAgbG9vcHMg dG8gZmluZCB0aGUgbWluaW1hbCB2YWx1ZSBhbmQgc3BlbmRzIGp1c3QgMC4wMDAxIHNlY29uZHMu IEFzIGEgcmVzdWx0LCB3aGlsZSBkZWFsaW5nIHdpdGggbGFyZ2UgbnVtYmVyLCBhbiBpbXByb3Zl ZCBhbGdvcml0aG0gY2FuIHJlYWxseSBzYXZlIHRpbWUuJm5ic3A7PC9wPjxwPjxicj48L3A+PHA+ YGBgPC9wPjxwPlRvdGFsIHRpbWU6IDEzLjg1MTIgczwvcD48cD5GaWxlOiB0ZXN0Mi5weTwvcD48 cD5GdW5jdGlvbjogZmluZE1pblJhdyBhdCBsaW5lIDQ8L3A+PHA+PGJyPjwvcD48cD5MaW5lICMg Jm5ic3A7ICZuYnNwOyAmbmJzcDtIaXRzICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyBUaW1l ICZuYnNwO1BlciBIaXQgJm5ic3A7ICUgVGltZSAmbmJzcDtMaW5lIENvbnRlbnRzPC9wPjxwPj09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PC9wPjxwPiZuYnNwOyAmbmJzcDsgJm5ic3A7NCAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJz cDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNw OyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7 ICZuYnNwOyBAcHJvZmlsZTwvcD48cD4mbmJzcDsgJm5ic3A7ICZuYnNwOzUgJm5ic3A7ICZuYnNw OyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7 ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsg Jm5ic3A7ICZuYnNwOyAmbmJzcDsgZGVmIGZpbmRNaW5SYXcobnVtKTo8L3A+PHA+Jm5ic3A7ICZu 
YnNwOyAmbmJzcDs2ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAxICZuYnNwOyAmbmJzcDsg Jm5ic3A7ICZuYnNwOyAmbmJzcDsgMTMgJm5ic3A7ICZuYnNwOyAxMy4wICZuYnNwOyAmbmJzcDsg Jm5ic3A7MC4wICZuYnNwOyAmbmJzcDsgJm5ic3A7aWYgbm90IG51bTo8L3A+PHA+Jm5ic3A7ICZu YnNwOyAmbmJzcDs3ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZu YnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5i c3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJz cDsgJm5ic3A7ICZuYnNwOyByZXR1cm48L3A+PHA+Jm5ic3A7ICZuYnNwOyAmbmJzcDs4ICZuYnNw OyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAxICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJz cDsgJm5ic3A7MyAmbmJzcDsgJm5ic3A7ICZuYnNwOzMuMCAmbmJzcDsgJm5ic3A7ICZuYnNwOzAu MCAmbmJzcDsgJm5ic3A7ICZuYnNwO21pbl92YWwgPSBtYXhpbnQ8L3A+PHA+Jm5ic3A7ICZuYnNw OyAmbmJzcDs5ICZuYnNwOzEwMDAwMDAxICZuYnNwOyAmbmJzcDsgMTYwMzE5MDAgJm5ic3A7ICZu YnNwOyAmbmJzcDsxLjYgJm5ic3A7ICZuYnNwOyA0Ny41ICZuYnNwOyAmbmJzcDsgJm5ic3A7Zm9y IHggaW4gbnVtOjwvcD48cD4mbmJzcDsgJm5ic3A7IDEwICZuYnNwOzEwMDAwMDAwICZuYnNwOyAm bmJzcDsgMTc3MDc4MjEgJm5ic3A7ICZuYnNwOyAmbmJzcDsxLjggJm5ic3A7ICZuYnNwOyA1Mi41 ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDtpZiB4ICZsdDsgbWluX3ZhbDo8L3A+ PHA+Jm5ic3A7ICZuYnNwOyAxMSAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgMiAmbmJzcDsg Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOzUgJm5ic3A7ICZuYnNwOyAmbmJzcDsy LjUgJm5ic3A7ICZuYnNwOyAmbmJzcDswLjAgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZu YnNwOyAmbmJzcDsgJm5ic3A7bWluX3ZhbCA9IHg8L3A+PHA+Jm5ic3A7ICZuYnNwOyAxMiAmbmJz cDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgMSAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5i c3A7ICZuYnNwOzMgJm5ic3A7ICZuYnNwOyAmbmJzcDszLjAgJm5ic3A7ICZuYnNwOyAmbmJzcDsw LjAgJm5ic3A7ICZuYnNwOyAmbmJzcDtyZXR1cm4gbWluX3ZhbDwvcD48cD48YnI+PC9wPjxwPlRv dGFsIHRpbWU6IDAuNTEwMjk4IHM8L3A+PHA+RmlsZTogdGVzdDIucHk8L3A+PHA+RnVuY3Rpb246 IGZpbmRNaW5Mc3QgYXQgbGluZSAxNTwvcD48cD48YnI+PC9wPjxwPkxpbmUgIyAmbmJzcDsgJm5i c3A7ICZuYnNwO0hpdHMgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IFRpbWUgJm5ic3A7UGVy 
IEhpdCAmbmJzcDsgJSBUaW1lICZuYnNwO0xpbmUgQ29udGVudHM8L3A+PHA+PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT08L3A+PHA+ Jm5ic3A7ICZuYnNwOyAxNSAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNw OyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7 ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyBAcHJvZmls ZTwvcD48cD4mbmJzcDsgJm5ic3A7IDE2ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJz cDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNw OyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7 IGRlZiBmaW5kTWluTHN0KG51bSk6PC9wPjxwPiZuYnNwOyAmbmJzcDsgMTcgJm5ic3A7ICZuYnNw OyAmbmJzcDsgJm5ic3A7IDEgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJz cDs0ICZuYnNwOyAmbmJzcDsgJm5ic3A7NC4wICZuYnNwOyAmbmJzcDsgJm5ic3A7MC4wICZuYnNw OyAmbmJzcDsgJm5ic3A7aWYgbm90IG51bTo8L3A+PHA+Jm5ic3A7ICZuYnNwOyAxOCAmbmJzcDsg Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAm bmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZu YnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgcmV0 dXJuPC9wPjxwPiZuYnNwOyAmbmJzcDsgMTkgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IDEg Jm5ic3A7ICZuYnNwOyAmbmJzcDsxMjQzMDE2IDEyNDMwMTYuMCAmbmJzcDsgJm5ic3A7MTAwLjAg Jm5ic3A7ICZuYnNwOyAmbmJzcDtyZXR1cm4gbWluKG51bSk8L3A+PHA+PGJyPjwvcD48cD5Ub3Rh bCB0aW1lOiAwLjAwMDEwMTgxMiBzPC9wPjxwPkZpbGU6IHRlc3QyLnB5PC9wPjxwPkZ1bmN0aW9u OiBmaW5kTWluIGF0IGxpbmUgMjI8L3A+PHA+PGJyPjwvcD48cD5MaW5lICMgJm5ic3A7ICZuYnNw OyAmbmJzcDtIaXRzICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyBUaW1lICZuYnNwO1BlciBI aXQgJm5ic3A7ICUgVGltZSAmbmJzcDtMaW5lIENvbnRlbnRzPC9wPjxwPj09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PC9wPjxwPiZu YnNwOyAmbmJzcDsgMjIgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsg Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAm 
bmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgQHByb2ZpbGU8 L3A+PHA+Jm5ic3A7ICZuYnNwOyAyMyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7 ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsg Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyBk ZWYgZmluZE1pbihudW0pOjwvcD48cD4mbmJzcDsgJm5ic3A7IDI0ICZuYnNwOyAmbmJzcDsgJm5i c3A7ICZuYnNwOyAxICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgMTUgJm5ic3A7 ICZuYnNwOyAxNS4wICZuYnNwOyAmbmJzcDsgJm5ic3A7Ni4wICZuYnNwOyAmbmJzcDsgJm5ic3A7 bG8sIGhpID0gMCwgbGVuKG51bSkgLSAxPC9wPjxwPiZuYnNwOyAmbmJzcDsgMjUgJm5ic3A7ICZu YnNwOyAmbmJzcDsgJm5ic3A7MjAgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyA0 MCAmbmJzcDsgJm5ic3A7ICZuYnNwOzIuMCAmbmJzcDsgJm5ic3A7IDE2LjEgJm5ic3A7ICZuYnNw OyAmbmJzcDt3aGlsZSBsbyAmbHQ7PSBoaTo8L3A+PHA+Jm5ic3A7ICZuYnNwOyAyNiAmbmJzcDsg Jm5ic3A7ICZuYnNwOyAmbmJzcDsyMCAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7 IDQ4ICZuYnNwOyAmbmJzcDsgJm5ic3A7Mi40ICZuYnNwOyAmbmJzcDsgMTkuNCAmbmJzcDsgJm5i c3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7aWYgbnVtW2xvXSAmbHQ7PSBudW1baGldOjwvcD48cD4m bmJzcDsgJm5ic3A7IDI3ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAxICZuYnNwOyAmbmJz cDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7MiAmbmJzcDsgJm5ic3A7ICZuYnNwOzIuMCAm bmJzcDsgJm5ic3A7ICZuYnNwOzAuOCAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7 ICZuYnNwOyAmbmJzcDtyZXR1cm4gbnVtW2xvXTwvcD48cD4mbmJzcDsgJm5ic3A7IDI4ICZuYnNw OyAmbmJzcDsgJm5ic3A7ICZuYnNwOzE5ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJz cDsgNTQgJm5ic3A7ICZuYnNwOyAmbmJzcDsyLjggJm5ic3A7ICZuYnNwOyAyMS44ICZuYnNwOyAm bmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDttaWQgPSAobG8gKyBoaSkgLyAyPC9wPjxwPiZuYnNw OyAmbmJzcDsgMjkgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7MTkgJm5ic3A7ICZuYnNwOyAm bmJzcDsgJm5ic3A7ICZuYnNwOyA1MCAmbmJzcDsgJm5ic3A7ICZuYnNwOzIuNiAmbmJzcDsgJm5i c3A7IDIwLjIgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwO2lmIG51bVttaWRdICZs dDsgbnVtW2hpXTo8L3A+PHA+Jm5ic3A7ICZuYnNwOyAzMCAmbmJzcDsgJm5ic3A7ICZuYnNwOyAm 
bmJzcDsgNSAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IDEwICZuYnNwOyAmbmJz cDsgJm5ic3A7Mi4wICZuYnNwOyAmbmJzcDsgJm5ic3A7NC4wICZuYnNwOyAmbmJzcDsgJm5ic3A7 ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwO2hpID0gbWlkPC9wPjxwPiZuYnNwOyAmbmJzcDsg MzEgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNw OyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7 ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsg Jm5ic3A7IGVsc2U6PC9wPjxwPiZuYnNwOyAmbmJzcDsgMzIgJm5ic3A7ICZuYnNwOyAmbmJzcDsg Jm5ic3A7MTQgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAyOSAmbmJzcDsgJm5i c3A7ICZuYnNwOzIuMSAmbmJzcDsgJm5ic3A7IDExLjcgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5i c3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7bG8gPSBtaWQgKyAxPC9wPjxwPjxicj48L3A+PHA+YGBg PC9wPjxwPjxicj48L3A+PC9kaXY+PC9kaXY+&quot;&gt;​&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/4108135054526708767/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=4108135054526708767' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/4108135054526708767'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/4108135054526708767'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2014/10/automated-testing-by-pytest.html' title='Automated testing by pytest'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-8488996285438608067</id><published>2014-09-24T14:08:00.000-05:00</published><updated>2014-09-25T14:09:01.618-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="python"/><title type='text'> One example of test-driven development in Python</title><content type='html'>&lt;div class=&quot;markdown-here-wrapper&quot; data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/target=post;postID=8488996285438608067;onPublishedMenu=allposts;onClosedMenu=allposts;postNum=0;src=postname&quot; markdown-here-wrapper-content-modified=&quot;true&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Functions and methods are the most basic units of Python programming, and test-driven development (TDD) is a key practice for a developer to ensure the quality of those units. In &lt;a href=&quot;http://chimera.labs.oreilly.com/books/1234000000754/index.html&quot;&gt;his book&lt;/a&gt;, Harry Percival gives several great examples of how to use TDD with Python and Django. For web development, TDD, including both unit testing and integration testing, seems to be the cornerstone of success. For data analysis, code mostly relies on built-in packages instead of a large framework like Django, which makes TDD easier. In my opinion, TDD in data analysis can follow three steps. &lt;/div&gt;&lt;ul style=&quot;margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px ! important; margin: 1.2em 0px ! important;&quot;&gt;Step 1: requirements analysis&lt;/div&gt;&lt;div style=&quot;margin: 0.5em 0px ! important; margin: 1.2em 0px ! important;&quot;&gt;  Before writing any code for data analysis, the programmer should carefully ask the customer (or himself) about the requirements: 
&lt;/div&gt;&lt;ul style=&quot;margin: 0px; margin: 1.2em 0px; padding-left: 1em; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;What are the input parameters?&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;What if the input data doesn’t fit the assumptions?&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;What is the purpose of this function or method? What are the desired outputs?&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px ! important; margin: 1.2em 0px ! important;&quot;&gt;Take, for example, a recent coding challenge called &lt;a href=&quot;https://oj.leetcode.com/problems/maximum-product-subarray/&quot;&gt;Maximum Product Subarray&lt;/a&gt;:&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); display: block ! important; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em 0.7em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;  &amp;gt; Find the contiguous subarray within an array (containing at least one number) which has the largest product.&lt;br /&gt;  For example, given the array [2,3,-2,4],&lt;br /&gt;  the contiguous subarray [2,3] has the largest product = 6.&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px ! important; margin: 1.2em 0px ! important;&quot;&gt;OK, this question is quite straightforward to understand. Given an array (a list in Python), return the integer that is the maximum product of a contiguous subarray of the input array. 
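Before any clever solution exists, it helps to have an answer key for the test cases. Below is a minimal sketch that assumes a hypothetical helper, `max_product_brute_force`, which is not part of the original post. It simply tries every contiguous subarray in O(n^2) time. That is too slow for large inputs, but it is simple enough to trust when working out the expected values of hand-written test cases.

```python
def max_product_brute_force(nums):
    """Hypothetical reference answer for Maximum Product Subarray.

    Tries every contiguous subarray, so it runs in O(n^2) time: too slow
    for production, but easy to trust when deriving expected test values.
    """
    if not nums:
        # Assumption: the problem guarantees at least one number.
        raise ValueError("nums must contain at least one number")
    best = nums[0]
    for start in range(len(nums)):
        product = 1
        for end in range(start, len(nums)):
            # Product of nums[start:end + 1], extended one element at a time.
            product *= nums[end]
            best = max(best, product)
    return best


if __name__ == "__main__":
    print(max_product_brute_force([2, 3, -2, 4]))  # 6, from the subarray [2, 3]
```

The same helper can double-check the expected value of each hand-written case, for example 48 for `[0, 2, 3, -2, 4, 1, -1]`, before that value goes into a docstring.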
&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;  &lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;maxProduct&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(A)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;&quot;&quot; A function to find the maximum product value for a continuous subarray.&lt;br /&gt;      :param A: an array or list&lt;br /&gt;      :type A: list&lt;br /&gt;      &quot;&quot;&quot;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; A &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;is&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;None&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;or&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;not&lt;/span&gt; isinstance(A, 
list):&lt;br /&gt;          &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;None&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; len(A) == &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;:&lt;br /&gt;          &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; A[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;]&lt;br /&gt;      &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;pass&lt;/span&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;ul style=&quot;margin: 0px; margin: 1.2em 0px; padding-left: 1em; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px ! important; margin: 1.2em 0px ! important;&quot;&gt;A production version of the code above should raise an exception on invalid input instead of silently returning None, more like this: &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-class&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #445588; color: #990000; font-weight: bold; font-weight: bold;&quot;&gt;FunctionInputError&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(Exception)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;pass&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;maxProduct&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(A)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;&quot;&quot; A function to find the maximum product value for a continuous subarray.&lt;br /&gt;  :param A: an array or list&lt;br /&gt;  :type A: list&lt;br /&gt;  &quot;&quot;&quot;&lt;/span&gt;&lt;br /&gt;  &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; A &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;is&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;None&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;or&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; 
font-weight: bold;&quot;&gt;not&lt;/span&gt; isinstance(A, list):&lt;br /&gt;      &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;raise&lt;/span&gt; FunctionInputError(&lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;must give a list as input&#39;&lt;/span&gt;)&lt;br /&gt;  &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; len(A) == &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;:&lt;br /&gt;      &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; A[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;]&lt;br /&gt;  &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;pass&lt;/span&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px ! important; margin: 1.2em 0px ! important;&quot;&gt;Step 2: write test cases&lt;/div&gt;&lt;div style=&quot;margin: 0.5em 0px ! important; margin: 1.2em 0px ! important;&quot;&gt;  Since not a single line of logic has been written yet, I call the current step black-box testing, which means that I want this function to fail every test case. Python has a built-in module, &lt;a href=&quot;https://docs.python.org/2/library/doctest.html&quot;&gt;doctest&lt;/a&gt;, which allows test cases to be embedded within the docstring. I write six test cases, run the pseudo-function below, and arbitrarily make it return -1. As expected, it fails all six test cases with scary red warnings in the Python shell. That is a good thing: it proves that the tests actually work.  
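doctest can also report the failures as numbers instead of a wall of output. Below is a minimal sketch. It assumes a hypothetical stub, `max_product_stub`, that carries three of the six test cases, and a hypothetical helper, `count_doctest_failures`, built on `doctest.DocTestFinder` and `doctest.DocTestRunner`. Each `runner.run()` call returns a `TestResults(failed, attempted)` named tuple, so a stub that always returns -1 should fail every example it attempts.

```python
import doctest


def max_product_stub(nums):
    """Hypothetical black-box stub: no logic yet, it always returns -1.

    >>> max_product_stub([0, 2, 3, -2, 4, 1, -1])
    48
    >>> max_product_stub([2, 3, 0, 2, 4, 0, 3, 4])
    12
    >>> max_product_stub([0, -2, 0])
    0
    """
    return -1


def count_doctest_failures(func):
    """Run the docstring examples of func; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    # Pass globs explicitly so the examples can call func by name,
    # however this file happens to be executed.
    for test in finder.find(func, globs={func.__name__: func}):
        # The no-op out function swallows the failure report; only counts are kept.
        result = runner.run(test, out=lambda text: None)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted


if __name__ == "__main__":
    print(count_doctest_failures(max_product_stub))  # expected: (3, 3)
```

`doctest.testmod()` returns the same kind of named tuple for a whole module. For a stub carrying all six examples, it should therefore report six failures out of six attempts.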
&lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;  &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; doctest&lt;br /&gt;  &lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;maxProduct&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(A)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;&quot;&quot; A function to find the maximum product value for a continuous subarray.&lt;br /&gt;      :param A: an array or list&lt;br /&gt;      :type A: list&lt;br /&gt;&lt;br /&gt;      - testcase1&lt;br /&gt;          &amp;gt;&amp;gt;&amp;gt; maxProduct([0, 2, 3,-2,4, 1, -1])&lt;br /&gt;          48&lt;br /&gt;&lt;br /&gt;      - testcase2&lt;br /&gt;          &amp;gt;&amp;gt;&amp;gt; maxProduct([2, 3, 0, 2, 4, 0, 3, 4])&lt;br /&gt;          12&lt;br /&gt;&lt;br /&gt;      - testcase3&lt;br /&gt;          &amp;gt;&amp;gt;&amp;gt; maxProduct([0, 5, 3, -1, -2, 0, -2, 4, 0, 3, 4])&lt;br /&gt;          30&lt;br /&gt;&lt;br /&gt;      - testcase4&lt;br /&gt;          
&amp;gt;&amp;gt;&amp;gt; maxProduct([0, 1, -3, -4, -5, -1, 1, 2, 1, -1, -100, 0, -100000])&lt;br /&gt;          12000&lt;br /&gt;&lt;br /&gt;      - testcase5&lt;br /&gt;          &amp;gt;&amp;gt;&amp;gt; maxProduct([0, -3000, -2, 0, -100, -100, 0, -9, -8, 1, 1, 2])&lt;br /&gt;          10000&lt;br /&gt;&lt;br /&gt;      - testcase6&lt;br /&gt;          &amp;gt;&amp;gt;&amp;gt; maxProduct([0, -2, 0])&lt;br /&gt;          0&lt;br /&gt;      &quot;&quot;&quot;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; A &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;is&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;None&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;or&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;not&lt;/span&gt; isinstance(A, list):&lt;br /&gt;          &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;None&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; len(A) == &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;:&lt;br /&gt;          &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; A[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;]&lt;br /&gt;      &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; -&lt;span class=&quot;hljs-number&quot; style=&quot;color: 
teal;&quot;&gt;1&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  doctest.testmod()&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px ! important; margin: 1.2em 0px ! important;&quot;&gt;Step 3: implement the logic&lt;/div&gt;&lt;div style=&quot;margin: 0.5em 0px ! important; margin: 1.2em 0px ! important;&quot;&gt;  It’s time to tackle the most difficult part: writing the real function. Think about time complexity (ideally a single pass over the input array, i.e. &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;O(n)&lt;/code&gt;) and space complexity (ideally no extra space). Run &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;testmod()&lt;/code&gt; again and again to find mistakes and modify the code accordingly. Finally I come up with a solution built on a private helper function, _maxProduct. The helper keeps a running product from left to right and restarts it at every zero. maxProduct calls the helper on both the array and its reverse, then takes the larger result. Scanning in both directions is what handles the negative numbers. Within a stretch without zeros that contains an odd number of negatives, the best subarray either stops just before the last negative (found by the forward scan) or starts just after the first one (found by the backward scan). The solution passes all six test cases. Although I am not sure that this function is free of bugs, at least it works now. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;  &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; doctest&lt;br /&gt;  &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;from&lt;/span&gt; sys &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; maxint&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;maxProduct&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(A)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;&quot;&quot; A function to find the maximum product value for a continuous subarray.&lt;br /&gt;      :param A: an array or list&lt;br /&gt;      :type A: list&lt;br /&gt;&lt;br /&gt;      - testcase1&lt;br /&gt;          &amp;gt;&amp;gt;&amp;gt; maxProduct([0, 2, 3,-2,4, 1, -1])&lt;br /&gt;          48&lt;br /&gt;&lt;br /&gt;      - testcase2&lt;br /&gt;          &amp;gt;&amp;gt;&amp;gt; maxProduct([2, 3, 0, 2, 4, 0, 3, 4])&lt;br /&gt;          12&lt;br /&gt;&lt;br /&gt;      - testcase3&lt;br /&gt;          &amp;gt;&amp;gt;&amp;gt; maxProduct([0, 5, 3, -1, -2, 0, -2, 4, 0, 3, 4])&lt;br /&gt;          30&lt;br /&gt;&lt;br /&gt;      - testcase4&lt;br /&gt;          &amp;gt;&amp;gt;&amp;gt; maxProduct([0, 1, -3, -4, -5, -1, 1, 2, 1, -1, -100, 0, -100000])&lt;br /&gt;          12000&lt;br /&gt;&lt;br /&gt;      - testcase5&lt;br /&gt;          
&amp;gt;&amp;gt;&amp;gt; maxProduct([0, -3000, -2, 0, -100, -100, 0, -9, -8, 1, 1, 2])&lt;br /&gt;          10000&lt;br /&gt;&lt;br /&gt;      - testcase6&lt;br /&gt;          &amp;gt;&amp;gt;&amp;gt; maxProduct([0, -2, 0])&lt;br /&gt;          0&lt;br /&gt;      &quot;&quot;&quot;&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; A &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;is&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;None&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;or&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;not&lt;/span&gt; isinstance(A, list):&lt;br /&gt;          &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;None&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; len(A) == &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;:&lt;br /&gt;          &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; A[&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;]&lt;br /&gt;      &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; max(_maxProduct(A), _maxProduct([a &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; a &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: 
bold;&quot;&gt;in&lt;/span&gt; reversed(A)]))&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;hljs-function&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;_maxProduct&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(A)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;      max_val_forward = &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;&lt;br /&gt;      rst = -maxint&lt;br /&gt;      &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;for&lt;/span&gt; a &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;in&lt;/span&gt; A:&lt;br /&gt;          &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; a != &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;:&lt;br /&gt;              max_val_forward *= a&lt;br /&gt;              rst = max(rst, max_val_forward)&lt;br /&gt;          &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;else&lt;/span&gt;:&lt;br /&gt;              rst = max(&lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;0&lt;/span&gt;, rst)&lt;br /&gt;              max_val_forward = &lt;span class=&quot;hljs-number&quot; style=&quot;color: teal;&quot;&gt;1&lt;/span&gt;&lt;br /&gt;      &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;return&lt;/span&gt; rst&lt;br /&gt;&lt;br /&gt;  &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;if&lt;/span&gt; __name__ == &lt;span class=&quot;hljs-string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;__main__&quot;&lt;/span&gt;:&lt;br /&gt;      doctest.testmod()&lt;br 
/&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;In conclusion, the most important part of TDD in data analysis is writing test cases, a skill that takes a lot of training and practice. &lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/8488996285438608067/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=8488996285438608067' title='91 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/8488996285438608067'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/8488996285438608067'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2014/09/one-example-of-test-driven-development.html' title=' One example of test-driven development in Python'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>91</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-224456519230091406</id><published>2014-08-10T08:30:00.000-05:00</published><updated>2014-09-26T10:10:45.938-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="python"/><category scheme="http://www.blogger.com/atom/ns#" term="SAS"/><title type='text'>Translate SAS&#39;s sas7bdat format to SQLite and Pandas</title><content type='html'>&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Thanks to Jared Hobbs’ &lt;a href=&quot;https://pypi.python.org/pypi/sas7bdat&quot;&gt;sas7bdat&lt;/a&gt; package, Python can read SAS data sets quickly and precisely. It would be useful to have a few extension functions that connect this package with SQLite and Pandas. &lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Benefits of transferring SAS libraries to SQLite: &lt;/div&gt;&lt;ol style=&quot;margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Size reduction:&lt;br /&gt;SAS’s sas7bdat format is verbose. So far I have successfully loaded 40 GB of SAS data into SQLite, with an 85% reduction in disk usage.&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Saving the cost of SAS/ACCESS:&lt;br /&gt;SAS/ACCESS costs around $8,000 a year per server, while SQLite is accessible from most common software.&lt;/li&gt;&lt;/ol&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Benefits of transferring SAS data sets to Pandas: &lt;/div&gt;&lt;ol style=&quot;margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Pandas’ powerful Excel interface:&lt;br /&gt;Write very large Excel files quickly, as long as the data fits in memory. 
&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Validation of statistics:&lt;br /&gt;Pandas works well with &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;statsmodels&lt;/code&gt; and &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;scikit-learn&lt;/code&gt;, which makes it easy to validate SAS’s outputs. &lt;/li&gt;&lt;/ol&gt;&lt;script src=&quot;https://gist.github.com/dapangmao/b4ac940343498408c7a2.js&quot;&gt;&lt;/script&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/224456519230091406/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=224456519230091406' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/224456519230091406'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/224456519230091406'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2014/08/python-extension-functions-to-translate.html' title='Translate SAS&#39;s sas7bdat format to SQLite and Pandas'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-8327540405541315675</id><published>2014-06-10T09:03:00.002-05:00</published><updated>2014-06-10T09:04:06.410-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="SAS"/><title type='text'>Remove tabs from SAS code files</title><content type='html'>&lt;div class=&quot;markdown-here-wrapper&quot; data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/target=post;postID=8327540405541315675&quot; markdown-here-wrapper-content-modified=&quot;true&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px !important;&quot;&gt;By default, when you indent by pressing the Tab key, SAS stores the indent as a literal &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;tab&lt;/code&gt; character, which causes many problems when the code files are used in a different environment. There are two ways in SAS to eliminate the tab characters and replace them with spaces. 
&lt;/div&gt;&lt;ul&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;Regular expression&lt;br /&gt;Press Ctrl + H → The Replace window pops up → Choose Regular expression search → In the Find text box, enter &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;\t&lt;/code&gt; → In the Replace box, enter multiple &lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;\s&lt;/code&gt;, say four of them&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-qdpjMLIY4p0/U5cPtTXfz2I/AAAAAAAAC8Y/kYOab7YkHT0/s1600/Capture1.PNG&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://4.bp.blogspot.com/-qdpjMLIY4p0/U5cPtTXfz2I/AAAAAAAAC8Y/kYOab7YkHT0/s1600/Capture1.PNG&quot; height=&quot;176&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;ul&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;&lt;div style=&quot;margin: 0.5em 0px !important; margin: 1.2em 0px !important;&quot;&gt;Editor option&lt;br /&gt;Click Tools → Options → Enhanced Editors… → Choose Insert spaces for tabs → Choose Replace tabs with spaces on file 
open&lt;/div&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-li_btVYbwCw/U5cP_JBwH1I/AAAAAAAAC8k/kF7-WzRAfyw/s1600/Capture.PNG&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://4.bp.blogspot.com/-li_btVYbwCw/U5cP_JBwH1I/AAAAAAAAC8k/kF7-WzRAfyw/s1600/Capture.PNG&quot; height=&quot;219&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/8327540405541315675/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=8327540405541315675' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/8327540405541315675'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/8327540405541315675'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2014/06/by-default-sas-records-indent-by.html' title='Remove tabs from SAS code files'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-qdpjMLIY4p0/U5cPtTXfz2I/AAAAAAAAC8Y/kYOab7YkHT0/s72-c/Capture1.PNG" height="72" width="72"/><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-4572762488354650217</id><published>2014-05-26T09:58:00.000-05:00</published><updated>2014-10-03T07:43:02.211-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="flask"/><category scheme="http://www.blogger.com/atom/ns#" term="python"/><title type='text'>Steps to deploy Flask&#39;s minitwit on Google App Engine</title><content type='html'>&lt;div class=&quot;markdown-here-wrapper&quot; data-md-url=&quot;https://www.blogger.com/blogger.g?blogID=3256159328630041416#editor/src=dashboard&quot;&gt;&lt;div style=&quot;margin: 1.2em 0px ! 
important;&quot;&gt;&lt;a href=&quot;https://github.com/mitsuhiko/flask&quot;&gt;Flask&lt;/a&gt; is a lightweight web framework for Python that is well documented and clearly written. Its GitHub repository provides a few examples, including &lt;a href=&quot;https://github.com/mitsuhiko/flask/tree/master/examples/minitwit&quot;&gt;minitwit&lt;/a&gt;. The &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;minitwit&lt;/code&gt; website has a few basic social-network features, such as following other users and login/logout. The demo site on GAE is &lt;a href=&quot;http://minitwit-123.appspot.com/&quot;&gt;http://minitwit-123.appspot.com&lt;/a&gt;. The GitHub repo is &lt;a href=&quot;https://github.com/dapangmao/minitwit&quot;&gt;https://github.com/dapangmao/minitwit&lt;/a&gt;.&lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;&lt;a href=&quot;https://appengine.google.com/&quot;&gt;Google App Engine&lt;/a&gt; (GAE) is a major public cloud service alongside Amazon EC2. Among the four languages it supports (Java, Python, Go and PHP), GAE is especially friendly to Python users, possibly because Guido van Rossum worked at Google and personally created the Python Datastore interface. For me, it is a good choice for a Flask app. &lt;/div&gt;&lt;h4 id=&quot;step1-download-gae-sdk-and-gae-flask-skeleton&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Step1: download GAE SDK and GAE Flask skeleton&lt;/h4&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;&lt;a href=&quot;https://cloud.google.com/appengine/downloads#Google_App_Engine_SDK_for_Python&quot;&gt;GAE’s Python SDK&lt;/a&gt; tests the staging app and eventually pushes it to the cloud. &lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! 
important;&quot;&gt;A Flask skeleton can be downloaded from the &lt;a href=&quot;https://console.developers.google.com/start/appengine&quot;&gt;Google Developer Console&lt;/a&gt;. It contains three files:&lt;/div&gt;&lt;ul style=&quot;margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;app.yaml: specifies the runtime entry point&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;appengine_config.py: adds external libraries such as Flask to the system path&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;main.py: the root Python program&lt;/li&gt;&lt;/ul&gt;&lt;h4 id=&quot;step2-schema-design&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Step2: schema design&lt;/h4&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;The database used by the original minitwit is SQLite. &lt;a href=&quot;https://github.com/mitsuhiko/flask/blob/master/examples/minitwit/schema.sql&quot;&gt;The schema&lt;/a&gt; consists of three tables: &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;user&lt;/code&gt;, &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;follower&lt;/code&gt; and &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;message&lt;/code&gt;, which together form a normalized database. 
GAE has two Datastore APIs: &lt;a href=&quot;https://cloud.google.com/appengine/docs/python/datastore/&quot;&gt;DB&lt;/a&gt; and &lt;a href=&quot;https://cloud.google.com/appengine/docs/python/ndb/&quot;&gt;NDB&lt;/a&gt;. Since neither of them supports joins (here, the one-to-many join from user to follower), I move the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;follower&lt;/code&gt; table into the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;user&lt;/code&gt; table as a nested, repeated property, which eliminates the need for joins. &lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;As a result, &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;main.py&lt;/code&gt; has two data models: &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;User&lt;/code&gt; and &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;Message&lt;/code&gt;. 
They create and maintain two &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;kind&lt;/code&gt;s (the Datastore counterpart of tables) with the same names in Datastore. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;from&lt;/span&gt; google.appengine.ext &lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;import&lt;/span&gt; ndb&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-class&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;User&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(ndb.Model)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;  username = ndb.StringProperty(required=&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;True&lt;/span&gt;)&lt;br /&gt;  email = ndb.StringProperty(required=&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;True&lt;/span&gt;)&lt;br /&gt;  pw_hash = ndb.StringProperty(required=&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;True&lt;/span&gt;)&lt;br /&gt;  following = 
ndb.IntegerProperty(repeated=&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;True&lt;/span&gt;)&lt;br /&gt;  start_date = ndb.DateTimeProperty(auto_now_add=&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;True&lt;/span&gt;)&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;hljs-class&quot;&gt;&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;hljs-title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;Message&lt;/span&gt;&lt;span class=&quot;hljs-params&quot;&gt;(ndb.Model)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;  author = ndb.IntegerProperty(required=&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;True&lt;/span&gt;)&lt;br /&gt;  text = ndb.TextProperty(required=&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;True&lt;/span&gt;)&lt;br /&gt;  pub_date = ndb.DateTimeProperty(auto_now_add=&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;True&lt;/span&gt;)&lt;br /&gt;  email = ndb.StringProperty(required=&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;True&lt;/span&gt;)&lt;br /&gt;  username = ndb.StringProperty(required=&lt;span class=&quot;hljs-keyword&quot; style=&quot;color: #333333; font-weight: bold;&quot;&gt;True&lt;/span&gt;)&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h4 id=&quot;step3-replace-sql-statements&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Step3: replace SQL statements&lt;/h4&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;The next step is to replace SQL operations in each of the routing functions with NDB’s methods. 
NDB’s two fundamental methods are &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;get()&lt;/code&gt;, which retrieves an entity (an instance of a model class) from Datastore, and &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;put()&lt;/code&gt;, which writes an entity back to Datastore. In short, data is created and manipulated as individual objects rather than through SQL statements. &lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;For example, when a user follows another user, I first retrieve the current user by its ID, which returns a &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;User&lt;/code&gt; entity with the properties &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;username, email, pw_hash, following, start_date&lt;/code&gt;, where &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;following&lt;/code&gt; is itself a list of user IDs. Then I append the ID of the followed user to that list and save the entity back. &lt;/div&gt;&lt;pre style=&quot;font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; font-size: 1em; line-height: 1.2em; margin: 1.2em 0px;&quot;&gt;&lt;code class=&quot;hljs language-python&quot; style=&quot;background-color: #f8f8f8; background: none repeat scroll 0% 0% rgb(248, 248, 248); border-radius: 3px; border-radius: 3px; border: 1px solid rgb(204, 204, 204); border: 1px solid rgb(234, 234, 234); color: #333333; display: block ! 
important; display: block; display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; overflow-x: auto; overflow: auto; padding: 0.5em 0.7em; padding: 0.5em; padding: 0px 0.3em; white-space: pre-wrap; white-space: pre;&quot;&gt;u = User.get_by_id(cid)&lt;br /&gt;&lt;span class=&quot;hljs-comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# A repeated property defaults to an empty list, so no None check is needed&lt;/span&gt;&lt;br /&gt;u.following.append(whom_id)&lt;br /&gt;u.put()&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;People with experience of an ORM such as &lt;a href=&quot;http://www.sqlalchemy.org/&quot;&gt;SQLAlchemy&lt;/a&gt; will find these changes easy to implement. &lt;/div&gt;&lt;h4 id=&quot;setp4-testing-and-deployment&quot; style=&quot;font-size: 1.2em; font-weight: bold; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Step4: testing and deployment&lt;/h4&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;Without the schema file, minitwit is now a true single-file web app. It’s time to use the GAE SDK to test it locally, and eventually to push it to the cloud. 
On &lt;a href=&quot;https://appengine.google.com/&quot;&gt;GAE&lt;/a&gt;, we can check errors and warnings in the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;Logs&lt;/code&gt; tab to find bugs, or view the raw data in the &lt;code style=&quot;background-color: #f8f8f8; border-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas,Inconsolata,Courier,monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;Datastore Viewer&lt;/code&gt; tab. &lt;/div&gt;&lt;div style=&quot;margin: 1.2em 0px ! important;&quot;&gt;In conclusion, GAE has a few advantages and disadvantages as a host for a Flask web app.&lt;/div&gt;&lt;ul style=&quot;margin: 1.2em 0px; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Pro: &lt;ul style=&quot;margin: 0px; margin: 1.2em 0px; padding-left: 1em; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;It allows up to 25 free apps (great for exercises)&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Use of the Datastore is free within the free quota&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Automatic caching in memcache for high I/O&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;Con:&lt;ul style=&quot;margin: 0px; margin: 1.2em 0px; padding-left: 1em; padding-left: 2em;&quot;&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;The Datastore is NoSQL, which makes data hard to port&lt;/li&gt;&lt;li style=&quot;margin: 0.5em 0px;&quot;&gt;More expensive for production than EC2&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/4572762488354650217/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=4572762488354650217' title='27 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/4572762488354650217'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/4572762488354650217'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2014/05/steps-to-deploy-flasks-minitwit-on.html' title='Steps to deploy Flask&#39;s minitwit on Google App Enginee '/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>27</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-537922754674236523</id><published>2014-05-21T16:22:00.003-05:00</published><updated>2014-05-21T16:22:40.947-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="data mining"/><category scheme="http://www.blogger.com/atom/ns#" term="python"/><title type='text'>Use recursion and 
gradient ascent to solve logistic regression in Python</title><content type='html'>&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-ysUUQI8I3zM/U30Y9uIarOI/AAAAAAAACig/ScA0gaGu8LY/s1600/plot.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://4.bp.blogspot.com/-ysUUQI8I3zM/U30Y9uIarOI/AAAAAAAACig/ScA0gaGu8LY/s1600/plot.png&quot; height=&quot;320&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;In his book&amp;nbsp;&lt;a href=&quot;http://www.amazon.com/Machine-Learning-Action-Peter-Harrington/dp/1617290181/ref=sr_1_1?ie=UTF8&amp;amp;qid=1400707152&amp;amp;sr=8-1&amp;amp;keywords=Machine+Learning+in+Action&quot; style=&quot;color: #1155cc;&quot;&gt;Machine Learning in Action&lt;/a&gt;, Peter Harrington provides&amp;nbsp;&lt;a href=&quot;https://github.com/pbharrin/machinelearninginaction/blob/master/Ch05/logRegres.py&quot; style=&quot;color: #1155cc;&quot;&gt;a solution&lt;/a&gt;&amp;nbsp;for parameter estimation in logistic regression. 
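&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;For comparison, the iterative approach can be sketched as a plain loop of batch gradient ascent. This is a minimal illustration under stated assumptions, not the book’s code: the function name grad_ascent_iter and the toy data are hypothetical, and it uses NumPy arrays instead of NumPy matrices. Like the recursive version in this post, it starts from a vector of ones and uses a step size alpha of 0.001.&lt;/div&gt;&lt;pre style=&quot;background-color: white; color: #222222; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 1em; line-height: 1.2em; margin-bottom: 1.2em; margin-top: 1.2em;&quot;&gt;&lt;code class=&quot;language-python&quot; style=&quot;background-color: ghostwhite; border: 1px solid rgb(204, 204, 204); color: #333333; display: block !important; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em;&quot;&gt;
```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def grad_ascent_iter(X, y, cycles=500, alpha=0.001):
    """Batch gradient ascent on the logistic log-likelihood, written as a loop.

    X: (m, n) array whose first column is all ones (the intercept term)
    y: (m,) array of 0/1 class labels
    """
    weights = np.ones(X.shape[1])  # same starting point as the recursive version
    for _ in range(cycles):
        errors = y - sigmoid(X.dot(weights))  # residuals at the current weights
        weights = weights + alpha * X.T.dot(errors)  # step along the gradient
    return weights


# Toy data: intercept column plus two features (hypothetical values)
X = np.array([[1.0, 0.5, 1.0],
              [1.0, -1.0, 2.0],
              [1.0, 2.0, -0.5],
              [1.0, -0.3, -1.2]])
y = np.array([1.0, 0.0, 1.0, 0.0])
print(grad_ascent_iter(X, y, cycles=500))
```
&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;Both versions apply the same update the same number of times from the same starting point, so grad_ascent_iter(X, y, 500) should match the recursive grad_ascent(mat1, mat2, 500) up to floating-point rounding (as a flat array rather than a column matrix), while the loop needs only constant stack space.&lt;/div&gt;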
I use&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;pandas&lt;/code&gt;&amp;nbsp;and&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;ggplot&lt;/code&gt;&amp;nbsp;to implement a recursive alternative. Compared with the iterative method, the recursion applies the same updates but costs more space: every cycle adds a stack frame, so the number of cycles is capped by Python’s recursion limit (1000 by default).&lt;/div&gt;&lt;pre style=&quot;background-color: white; color: #222222; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 1em; line-height: 1.2em; margin-bottom: 1.2em; margin-top: 1.2em;&quot;&gt;&lt;code class=&quot;language-python&quot; style=&quot;background-color: ghostwhite; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); color: #333333; display: block !important; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em;&quot;&gt;&lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# -*- coding: utf-8 -*-&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;&quot;&quot;&lt;br /&gt;Use recursion and gradient ascent to solve logistic regression in Python&lt;br /&gt;&quot;&quot;&quot;&lt;/span&gt;&lt;br /&gt;&lt;br 
/&gt;&lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;import&lt;/span&gt; pandas &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;as&lt;/span&gt; pd&lt;br /&gt;&lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;from&lt;/span&gt; ggplot &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;import&lt;/span&gt; *&lt;br /&gt;&lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# NumPy supplies exp, shape, ones and mat used below&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;from&lt;/span&gt; numpy &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;import&lt;/span&gt; *&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;function&quot;&gt;&lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;sigmoid&lt;/span&gt;&lt;span class=&quot;params&quot;&gt;(inX)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1.0&lt;/span&gt;/(&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1&lt;/span&gt;+exp(-inX))&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;function&quot;&gt;&lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;grad_ascent&lt;/span&gt;&lt;span class=&quot;params&quot;&gt;(dataMatrix, labelMat, cycle)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;&quot;&quot;&lt;br /&gt;    A function to use gradient ascent to calculate the coefficients&lt;br /&gt;    &quot;&quot;&quot;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;not&lt;/span&gt; isinstance(cycle, int) &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;or&lt;/span&gt; cycle &amp;lt; &lt;span class=&quot;number&quot; 
style=&quot;color: #009999;&quot;&gt;0&lt;/span&gt;:&lt;br /&gt;        &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;raise&lt;/span&gt; ValueError(&lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;Must be a valid value for the number of iterations&quot;&lt;/span&gt;)&lt;br /&gt;    m, n = shape(dataMatrix)&lt;br /&gt;    alpha = &lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;0.001&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;if&lt;/span&gt; cycle == &lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;0&lt;/span&gt;:&lt;br /&gt;        &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;return&lt;/span&gt; ones((n, &lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1&lt;/span&gt;))&lt;br /&gt;    &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;else&lt;/span&gt;:&lt;br /&gt;        weights = grad_ascent(dataMatrix, labelMat, cycle-&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1&lt;/span&gt;)&lt;br /&gt;        h = sigmoid(dataMatrix * weights)&lt;br /&gt;        errors = (labelMat - h)&lt;br /&gt;        &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;return&lt;/span&gt; weights + alpha * dataMatrix.transpose()* errors&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;function&quot;&gt;&lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;plot&lt;/span&gt;&lt;span class=&quot;params&quot;&gt;(vector)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;&quot;&quot;&lt;br /&gt;    A function to use ggplot to visualize the result&lt;br /&gt;    &quot;&quot;&quot;&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Decision boundary w0 + w1*x + w2*y = 0, i.e. y = -w0/w2 - (w1/w2)*x&lt;/span&gt;&lt;br /&gt;    w0, w1, w2 = vector[0, 0], vector[1, 0], vector[2, 0]&lt;br /&gt;    infile.classlab = infile.classlab.astype(str)&lt;br /&gt;    p = ggplot(aes(x=&lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;x&#39;&lt;/span&gt;, y=&lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;y&#39;&lt;/span&gt;, colour=&lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;classlab&#39;&lt;/span&gt;), data=infile) + geom_point()&lt;br /&gt;    &lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Assumes this ggplot version provides geom_abline(intercept=..., slope=...)&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;return&lt;/span&gt; p + geom_abline(intercept=-w0 / w2, slope=-w1 / w2)&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Use pandas to manipulate data&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;if&lt;/span&gt; __name__ == &lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;__main__&#39;&lt;/span&gt;:&lt;br /&gt;    infile = pd.read_csv(&lt;span class=&quot;string&quot; style=&quot;color: 
#dd1144;&quot;&gt;&quot;https://raw.githubusercontent.com/pbharrin/machinelearninginaction/master/Ch05/testSet.txt&quot;&lt;/span&gt;, sep=&lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;\t&#39;&lt;/span&gt;, header=&lt;span class=&quot;built_in&quot; style=&quot;color: #0086b3;&quot;&gt;None&lt;/span&gt;, names=[&lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;x&#39;&lt;/span&gt;, &lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;y&#39;&lt;/span&gt;, &lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;classlab&#39;&lt;/span&gt;])&lt;br /&gt;    infile[&lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;one&#39;&lt;/span&gt;] = &lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1&lt;/span&gt;&lt;br /&gt;    mat1 = mat(infile[[&lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;one&#39;&lt;/span&gt;, &lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;x&#39;&lt;/span&gt;, &lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;y&#39;&lt;/span&gt;]])&lt;br /&gt;    mat2 = mat(infile[&lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;classlab&#39;&lt;/span&gt;]).transpose()&lt;br /&gt;    result1 = grad_ascent(mat1, mat2, &lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;500&lt;/span&gt;)&lt;br /&gt;    &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;print&lt;/span&gt;(plot(result1))&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/537922754674236523/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=537922754674236523' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/537922754674236523'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/537922754674236523'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2014/05/use-recursion-and-gradient-ascent-to.html' title='Use recursion and gradient ascent to solve logistic regression in Python'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-ysUUQI8I3zM/U30Y9uIarOI/AAAAAAAACig/ScA0gaGu8LY/s72-c/plot.png" height="72" width="72"/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-6514512212441585272</id><published>2014-04-30T16:09:00.001-05:00</published><updated>2014-05-28T18:37:29.699-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="python"/><title type='text'>Count large chunk of data in Python</title><content type='html'>&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;Because Python can read a file line by line, it can count data that lives on disk and is too large to load into memory at once. 
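&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;A minimal sketch of line-by-line counting with a dictionary, assuming plain-text input with one value per line (count_lines is a hypothetical name, and io.StringIO stands in for a large file opened with open(), which can be iterated over line by line in the same way):&lt;/div&gt;&lt;pre style=&quot;background-color: white; color: #222222; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 1em; line-height: 1.2em; margin-bottom: 1.2em; margin-top: 1.2em;&quot;&gt;&lt;code class=&quot;language-python&quot; style=&quot;background-color: ghostwhite; border: 1px solid rgb(204, 204, 204); color: #333333; display: block !important; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em;&quot;&gt;
```python
import io


def count_lines(lines):
    """Count distinct values one line at a time; only the totals stay in memory."""
    counts = {}
    for line in lines:
        value = line.strip()
        if value:  # skip blank lines
            counts[value] = counts.get(value, 0) + 1
    return counts


# io.StringIO stands in for a large file opened with open(path)
sample = io.StringIO("red\nblue\nred\n\ngreen\nred\n")
print(count_lines(sample))
```
&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;Because the loop holds only one line at a time, memory use grows with the number of distinct values rather than with the size of the file.&lt;/div&gt;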
The most frequently used data structures in Python are the list and the dictionary. In many cases the dictionary has the advantage, since it is essentially a hash table with average O(1) lookups and insertions.&lt;/div&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;However, for counting values the two options make little difference, and either can be chosen for convenience. Two examples are listed below.&lt;/div&gt;&lt;h3 id=&quot;use-a-dictionary-as-a-counter&quot; style=&quot;background-color: white; color: #222222; font-family: arial; font-size: 1.3em; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Use a dictionary as a counter&lt;/h3&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;Here is a question on how to&amp;nbsp;&lt;a href=&quot;http://www.mitbbs.com/article_t/Statistics/31368081.html&quot; style=&quot;color: #1155cc;&quot;&gt;count the strings in Excel&lt;/a&gt;.&lt;/div&gt;&lt;blockquote style=&quot;background-color: white; border-left-color: rgb(221, 221, 221); border-left-style: solid; border-left-width: 4px; color: #777777; font-family: arial; font-size: small; margin: 1.2em 0px; padding: 0px 1em; quotes: none;&quot;&gt;&lt;div style=&quot;margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;Count the unique values in one column in EXCEL 2010.
The worksheet has 1 million rows and 10 columns.&lt;br /&gt;or numbers.&lt;br /&gt;For example,&lt;br /&gt;A5389579_10&lt;br /&gt;A1543848_6&lt;br /&gt;A5389579_8&lt;br /&gt;Need to cut off the part after (including) underscore such as from A5389579_10 to A5389579&lt;/div&gt;&lt;/blockquote&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;Excel on a typical desktop struggles with data of this size, while Python handles the job easily.&lt;/div&gt;&lt;pre style=&quot;background-color: white; color: #222222; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 1em; line-height: 1.2em; margin-bottom: 1.2em; margin-top: 1.2em;&quot;&gt;&lt;code class=&quot;language-python&quot; style=&quot;background-color: ghostwhite; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); color: #333333; display: block !important; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em;&quot;&gt;&lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Load the Excel file with the xlrd package&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Note: xlrd 2.0+ reads only .xls files, so opening .xlsx needs xlrd 1.x (or use openpyxl)&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;import&lt;/span&gt; xlrd&lt;br /&gt;book = xlrd.open_workbook(&lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;test.xlsx&quot;&lt;/span&gt;)&lt;br /&gt;sh = book.sheet_by_index(&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;0&lt;/span&gt;)&lt;br /&gt;print(sh.name, sh.nrows, sh.ncols)&lt;br /&gt;print(&lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;Cell D30 is&quot;&lt;/span&gt;, sh.cell_value(rowx=&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;29&lt;/span&gt;, colx=&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;3&lt;/span&gt;))&lt;br /&gt;&lt;br /&gt;&lt;span
class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Count the unique values in a dictionary&lt;/span&gt;&lt;br /&gt;c = {}&lt;br /&gt;&lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;for&lt;/span&gt; rx &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;in&lt;/span&gt; range(sh.nrows):&lt;br /&gt;    &lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Keep the part before the underscore, e.g. A5389579_10 -&amp;gt; A5389579&lt;/span&gt;&lt;br /&gt;    word = str(sh.row(rx)[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1&lt;/span&gt;].value).split(&lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;_&quot;&lt;/span&gt;)[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;0&lt;/span&gt;]&lt;br /&gt;    &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;try&lt;/span&gt;:&lt;br /&gt;        c[word] += &lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;except&lt;/span&gt; KeyError:&lt;br /&gt;        c[word] = &lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;print(c)&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h3 id=&quot;use-a-list-as-a-counter&quot; style=&quot;background-color: white; color: #222222; font-family: arial; font-size: 1.3em; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;Use a list as a counter&lt;/h3&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;Here is a question on how to&amp;nbsp;&lt;a
href=&quot;http://www.mitbbs.com/article_t/DataSciences/4599.html&quot; style=&quot;color: #1155cc;&quot;&gt;count emails&lt;/a&gt;.&lt;/div&gt;&lt;blockquote style=&quot;background-color: white; border-left-color: rgb(221, 221, 221); border-left-style: solid; border-left-width: 4px; color: #777777; font-family: arial; font-size: small; margin: 1.2em 0px; padding: 0px 1em; quotes: none;&quot;&gt;&lt;div style=&quot;margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;A 3-column data set includes sender, receiver and timestamp. How to calculate the time between the sender sends the email&lt;br /&gt;and the receiver sends the reply email?&lt;/div&gt;&lt;/blockquote&gt;&lt;span style=&quot;background-color: white; color: #222222; font-family: arial; font-size: x-small;&quot;&gt;The challenge is to scale the solution from the small sample up to much larger data. In the solution below, sorting costs O(n log n), but the nested scan that follows can take O(n^2) time in the worst case, since each email may be compared with every later one.&lt;/span&gt;&lt;br /&gt;&lt;pre style=&quot;background-color: white; color: #222222; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 1em; line-height: 1.2em; margin-bottom: 1.2em; margin-top: 1.2em;&quot;&gt;&lt;code class=&quot;language-python&quot; style=&quot;background-color: ghostwhite; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); color: #333333; display: block !important; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em;&quot;&gt;&lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;from&lt;/span&gt; operator &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;import&lt;/span&gt; itemgetter&lt;br /&gt;&lt;br /&gt;raw_data = &lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;&quot;&quot;&lt;br /&gt;    SENDER|RECEIVER|TIMESTAMP&lt;br /&gt;    A B 56&lt;br /&gt;    A A 7&lt;br /&gt;    A C 5&lt;br /&gt;    C D 9&lt;br /&gt;    B B 12&lt;br /&gt;    B A 8&lt;br /&gt;    F G 12&lt;br /&gt;    B A 18&lt;br /&gt;    G F 2&lt;br /&gt;    A B 20&lt;br /&gt;    &quot;&quot;&quot;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span
class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Transform the raw data to a list of (sender, receiver, timestamp) tuples&lt;/span&gt;&lt;br /&gt;data = raw_data.split()&lt;br /&gt;data.pop(&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;0&lt;/span&gt;) &lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Remove the header&lt;/span&gt;&lt;br /&gt;data = list(zip(data[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;0&lt;/span&gt;::&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;3&lt;/span&gt;], data[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1&lt;/span&gt;::&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;3&lt;/span&gt;], map(int, data[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;2&lt;/span&gt;::&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;3&lt;/span&gt;])))&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Sort the list by the timestamp&lt;/span&gt;&lt;br /&gt;data.sort(key=itemgetter(&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;2&lt;/span&gt;))&lt;br /&gt;&lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;for&lt;/span&gt; r &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;in&lt;/span&gt; data:&lt;br /&gt;    print(r)&lt;br /&gt;&lt;br /&gt;&lt;span
class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Collect the reply times in a list&lt;/span&gt;&lt;br /&gt;c = []&lt;br /&gt;&lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;while&lt;/span&gt; len(data) &amp;gt; &lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1&lt;/span&gt;:&lt;br /&gt;    y = data.pop(&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;0&lt;/span&gt;)&lt;br /&gt;    &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;for&lt;/span&gt; x &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;in&lt;/span&gt; data:&lt;br /&gt;        &lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# A reply goes from the original receiver back to the original sender&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;if&lt;/span&gt; x[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;0&lt;/span&gt;] == y[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1&lt;/span&gt;] &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;and&lt;/span&gt; x[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1&lt;/span&gt;] == y[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;0&lt;/span&gt;]:&lt;br /&gt;            diff = x[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;2&lt;/span&gt;] - y[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;2&lt;/span&gt;]&lt;br /&gt;            print(y, x, &lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;----&amp;gt;&#39;&lt;/span&gt;, diff)&lt;br /&gt;            c.append(diff)&lt;br /&gt;            &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;break&lt;/span&gt; &lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Only find the quickest time to respond&lt;/span&gt;&lt;br /&gt;print(c)&lt;/code&gt;&lt;/pre&gt;&lt;h4 id=&quot;p-s-&quot; style=&quot;background-color: white; color: #222222; font-family: arial; font-size: 1.2em; margin: 1.3em 0px 1em; padding: 0px;&quot;&gt;P.S.&lt;/h4&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;I came up with the O(n) solution below, which uses two hash tables (Python dictionaries) to reduce the complexity.&lt;/div&gt;&lt;pre style=&quot;background-color: white; color: #222222; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 1em; line-height: 1.2em; margin-bottom: 1.2em; margin-top: 1.2em;&quot;&gt;&lt;code class=&quot;language-python&quot; style=&quot;background: rgb(248, 248, 255); border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); color: #333333; display: block !important; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em;&quot;&gt;__author__ = &lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&#39;dapangmao&#39;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;function&quot;&gt;&lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;title&quot; style=&quot;color: #990000; font-weight: bold;&quot;&gt;find_duration&lt;/span&gt;&lt;span class=&quot;params&quot;&gt;(data)&lt;/span&gt;:&lt;/span&gt;&lt;br /&gt;    &lt;span
class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Construct two hash tables&lt;/span&gt;&lt;br /&gt;    h1 = {}&lt;br /&gt;    h2 = {}&lt;br /&gt;    &lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Find the earliest sending time for each (sender, receiver) pair&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;for&lt;/span&gt; x &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;in&lt;/span&gt; data:&lt;br /&gt;        &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;if&lt;/span&gt; x[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;0&lt;/span&gt;] != x[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1&lt;/span&gt;]:&lt;br /&gt;            &lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Tuple keys avoid collisions between multi-letter IDs&lt;/span&gt;&lt;br /&gt;            key = (x[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;0&lt;/span&gt;], x[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1&lt;/span&gt;])&lt;br /&gt;            &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;try&lt;/span&gt;:&lt;br /&gt;                h1[key] = min(h1[key], x[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;2&lt;/span&gt;])&lt;br /&gt;            &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;except&lt;/span&gt; KeyError:&lt;br /&gt;                h1[key] = x[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;2&lt;/span&gt;]&lt;br /&gt;    &lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Find the minimum reply duration for each pair&lt;/span&gt;&lt;br /&gt;    &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;for&lt;/span&gt; x &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;in&lt;/span&gt; data:&lt;br /&gt;        key = (x[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1&lt;/span&gt;], x[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;0&lt;/span&gt;]) &lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# the (sender, receiver) pair that x answers&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Skip replies sent before the first email of the pair&lt;/span&gt;&lt;br /&gt;        &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;if&lt;/span&gt; key &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;in&lt;/span&gt; h1 &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;and&lt;/span&gt; x[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;2&lt;/span&gt;] &amp;gt;= h1[key]:&lt;br /&gt;            duration = x[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;2&lt;/span&gt;] - h1[key]&lt;br /&gt;            &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;try&lt;/span&gt;:&lt;br /&gt;                h2[key] = min(h2[key], duration)&lt;br /&gt;            &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;except&lt;/span&gt; KeyError:&lt;br /&gt;                h2[key] = duration&lt;br /&gt;    &lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;return&lt;/span&gt; h2&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;keyword&quot; style=&quot;font-weight: bold;&quot;&gt;if&lt;/span&gt; __name__ == &lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;__main__&quot;&lt;/span&gt;:&lt;br /&gt;    raw_data = &lt;span class=&quot;string&quot; style=&quot;color: #dd1144;&quot;&gt;&quot;&quot;&quot;&lt;br /&gt;        SENDER|RECEIVER|TIMESTAMP&lt;br /&gt;        A B 56&lt;br /&gt;        A A 7&lt;br /&gt;        A C 5&lt;br /&gt;        C D 9&lt;br /&gt;        B B 12&lt;br /&gt;        B A 8&lt;br /&gt;        F G 12&lt;br /&gt;        B A 18&lt;br /&gt;        G F 2&lt;br /&gt;        A B 20&lt;br /&gt;        &quot;&quot;&quot;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    &lt;span
class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Transform the raw data to a list of (sender, receiver, timestamp) tuples&lt;/span&gt;&lt;br /&gt;    data = raw_data.split()&lt;br /&gt;    data.pop(&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;0&lt;/span&gt;) &lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Remove the header&lt;/span&gt;&lt;br /&gt;    data = list(zip(data[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;0&lt;/span&gt;::&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;3&lt;/span&gt;], data[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;1&lt;/span&gt;::&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;3&lt;/span&gt;], map(int, data[&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;2&lt;/span&gt;::&lt;span class=&quot;number&quot; style=&quot;color: #009999;&quot;&gt;3&lt;/span&gt;])))&lt;br /&gt;    &lt;span class=&quot;comment&quot; style=&quot;color: #999988; font-style: italic;&quot;&gt;# Verify the result&lt;/span&gt;&lt;br /&gt;    print(find_duration(data))&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: 0em; height: 0px; margin: 0px; padding: 0px;&quot; title=&quot;MDH:PGRpdj48YnI+PC9kaXY+PGRpdj4jIyMjUC5TLjwvZGl2PjxkaXY+SSBjb21lIHVwIHdpdGggdGhl IE8obikgc29sdXRpb24gYmVsb3csIHdoaWNoIHV0aWxpemVzIHR3byBoYXNoIHRhYmxlcyB0byBk ZWNyZWFzZSB0aGUgY29tcGxleGl0eS4mbmJzcDs8L2Rpdj48ZGl2PmBgYHB5dGhvbjwvZGl2Pjxk aXY+X19hdXRob3JfXyA9ICdkYXBhbmdtYW8nPC9kaXY+PGRpdj48YnI+PC9kaXY+PGRpdj5kZWYg ZmluZF9kdXJhdGlvbihkYXRhKTo8L2Rpdj48ZGl2PiZuYnNwOyAmbmJzcDsgIyBDb25zdHJ1Y3Qg dHdvIGhhc2ggdGFibGVzPC9kaXY+PGRpdj4mbmJzcDsgJm5ic3A7IGgxID0ge308L2Rpdj48ZGl2 PiZuYnNwOyAmbmJzcDsgaDIgPSB7fTwvZGl2PjxkaXY+Jm5ic3A7ICZuYnNwOyAjIEZpbmQgdGhl IHN0YXJ0aW5nIHRpbWUgZm9yIGVhY2ggSUQgcGFpcjwvZGl2PjxkaXY+Jm5ic3A7ICZuYnNwOyBm b3IgeCBpbiBkYXRhOjwvZGl2PjxkaXY+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IGlmIHhb MF0gIT0geFsxXTo8L2Rpdj48ZGl2PiZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsg Jm5ic3A7IGtleSA9IHhbMF0gKyB4WzFdPC9kaXY+PGRpdj4mbmJzcDsgJm5ic3A7ICZuYnNwOyAm bmJzcDsgJm5ic3A7ICZuYnNwOyB0cnk6PC9kaXY+PGRpdj4mbmJzcDsgJm5ic3A7ICZuYnNwOyAm bmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IGgxW2tleV0gPSB4WzJdPC9kaXY+PGRp dj4mbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyBleGNlcHQ6PC9kaXY+ PGRpdj4mbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5i
c3A7IGgxW2tleV0gPSBtaW4oaDFba2V5XSwgeFsyXSk8L2Rpdj48ZGl2PiZuYnNwOyAmbmJzcDsg IyBGaW5kIHRoZSBtaW5pbXVtIGR1cmF0aW9uIGZvciBlYWNoIElEIHBhaXI8L2Rpdj48ZGl2PiZu YnNwOyAmbmJzcDsgZm9yIHggaW4gZGF0YTo8L2Rpdj48ZGl2PiZuYnNwOyAmbmJzcDsgJm5ic3A7 ICZuYnNwOyBrZXkgPSB4WzFdICsgeFswXTwvZGl2PjxkaXY+Jm5ic3A7ICZuYnNwOyAmbmJzcDsg Jm5ic3A7IGlmIGgxLmhhc19rZXkoa2V5KTo8L2Rpdj48ZGl2PiZuYnNwOyAmbmJzcDsgJm5ic3A7 ICZuYnNwOyAmbmJzcDsgJm5ic3A7IGR1cmF0aW9uID0geFsyXSAtIGgxW2tleV08L2Rpdj48ZGl2 PiZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IHRyeTo8L2Rpdj48ZGl2 PiZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsg aDJba2V5XSA9IGR1cmF0aW9uPC9kaXY+PGRpdj4mbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsg Jm5ic3A7ICZuYnNwOyBleGNlcHQ6PC9kaXY+PGRpdj4mbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJz cDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IGgyW2tleV0gPSBtaW4oaDJba2V5XSwgZHVy YXRpb24pPC9kaXY+PGRpdj4mbmJzcDsgJm5ic3A7IHJldHVybiBoMjwvZGl2PjxkaXY+PGJyPjwv ZGl2PjxkaXY+aWYgX19uYW1lX18gPT0gIl9fbWFpbl9fIjo8L2Rpdj48ZGl2PiZuYnNwOyAmbmJz cDsgcmF3X2RhdGEgPSAiIiI8L2Rpdj48ZGl2PiZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyBT RU5ERVJ8UkVDRUlWRVJ8VElNRVNUQU1QPC9kaXY+PGRpdj4mbmJzcDsgJm5ic3A7ICZuYnNwOyAm bmJzcDsgQSBCIDU2PC9kaXY+PGRpdj4mbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsgQSBBIDc8 L2Rpdj48ZGl2PiZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyBBIEMgNTwvZGl2PjxkaXY+Jm5i c3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7IEMgRCA5PC9kaXY+PGRpdj4mbmJzcDsgJm5ic3A7ICZu YnNwOyAmbmJzcDsgQiBCIDEyPC9kaXY+PGRpdj4mbmJzcDsgJm5ic3A7ICZuYnNwOyAmbmJzcDsg QiBBIDg8L2Rpdj48ZGl2PiZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyBGIEcgMTI8L2Rpdj48 ZGl2PiZuYnNwOyAmbmJzcDsgJm5ic3A7ICZuYnNwOyBCIEEgMTg8L2Rpdj48ZGl2PiZuYnNwOyAm bmJzcDsgJm5ic3A7ICZuYnNwOyBHIEYgMjwvZGl2PjxkaXY+Jm5ic3A7ICZuYnNwOyAmbmJzcDsg Jm5ic3A7IEEgQiAyMDwvZGl2PjxkaXY+Jm5ic3A7ICZuYnNwOyAmbmJzcDsgJm5ic3A7ICIiIjwv ZGl2PjxkaXY+PGJyPjwvZGl2PjxkaXY+Jm5ic3A7ICZuYnNwOyAjIFRyYW5zZm9ybSB0aGUgcmF3 IGRhdGEgdG8gYSBuZXN0ZWQgbGlzdDwvZGl2PjxkaXY+Jm5ic3A7ICZuYnNwOyBkYXRhID0gcmF3 
X2RhdGEuc3BsaXQoKTwvZGl2PjxkaXY+Jm5ic3A7ICZuYnNwOyBkYXRhLnBvcCgwKSAjIFJlbW92 ZSB0aGUgSGVhZDwvZGl2PjxkaXY+Jm5ic3A7ICZuYnNwOyBkYXRhID0gemlwKGRhdGFbMDo6M10s IGRhdGFbMTo6M10sIG1hcChsYW1iZGEgeDogaW50KHgpLCBkYXRhWzI6OjNdKSk8L2Rpdj48ZGl2 PiZuYnNwOyAmbmJzcDsgIyBWZXJpZnkgdGhlIHJlc3VsdDwvZGl2PjxkaXY+Jm5ic3A7ICZuYnNw OyBwcmludCBmaW5kX2R1cmF0aW9uKGRhdGEpPC9kaXY+PGRpdj5gYGA8L2Rpdj48ZGl2Pjxicj48 L2Rpdj4=&quot;&gt;​&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/6514512212441585272/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=6514512212441585272' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/6514512212441585272'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/6514512212441585272'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2014/04/count-large-chunk-of-data-in-python.html' title='Count large chunk of data in Python'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3256159328630041416.post-2761331639484264001</id><published>2014-04-06T08:41:00.002-05:00</published><updated>2014-04-08T16:04:14.059-05:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Hadoop"/><title type='text'>10 popular Linux commands for Hadoop</title><content type='html'>&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;The Hadoop system has its unique shell language, which is 
called&amp;nbsp;&lt;a href=&quot;https://hadoop.apache.org/docs/r0.18.3/hdfs_shell.html&quot; style=&quot;color: #1155cc;&quot;&gt;FS&lt;/a&gt;. Compared with the common Bash shell in the Linux ecosystem, the FS shell has far fewer commands. To deal with the huge volumes of data distributed across the Hadoop nodes, I rely on 10 popular Linux commands in my daily work.&lt;/div&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;&lt;strong&gt;1. sort&lt;/strong&gt;&lt;br /&gt;A good practice with Hadoop is to always test the map/reduce programs on the local machine before submitting the time-consuming jobs to the cluster. The&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;sort&lt;/code&gt;&amp;nbsp;command simulates the sort-and-shuffle step between the map and reduce phases.
For example, to check whether the Python code has any bugs, I can pipe a small sample of the input data through the commands below, since the mapper reads from standard input.&lt;br /&gt;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;./mapper.py | sort | ./reducer.py&lt;/code&gt;&lt;/div&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;&lt;strong&gt;2. tail&lt;/strong&gt;&lt;br /&gt;Interestingly, at the time of writing, the FS shell supports the&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;tail&lt;/code&gt;&amp;nbsp;command but not the&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;head&lt;/code&gt;&amp;nbsp;command.
So I can only grab the end of a file stored in Hadoop. Note that&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;hadoop fs -tail&lt;/code&gt;&amp;nbsp;prints the last kilobyte of the file rather than a given number of lines.&lt;br /&gt;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;hadoop fs -tail data/web.log.9&lt;/code&gt;&lt;/div&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;&lt;strong&gt;3. sed&lt;/strong&gt;&lt;br /&gt;Since the FS shell doesn’t provide the&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;head&lt;/code&gt;&amp;nbsp;command, an alternative is the&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;sed&lt;/code&gt;&amp;nbsp;command, which actually has more flexible options. The command below prints the first five lines and then quits.&lt;br /&gt;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;hadoop fs -cat data/web.log.9 | sed -n &#39;1,5p;5q&#39;&lt;/code&gt;&lt;/div&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;&lt;strong&gt;4. stat&lt;/strong&gt;&lt;br /&gt;The&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;stat&lt;/code&gt;&amp;nbsp;command shows when a file was last modified.&lt;br /&gt;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;hadoop fs -stat data/web.log.9&lt;/code&gt;&lt;/div&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;&lt;strong&gt;5. awk&lt;/strong&gt;&lt;br /&gt;The commands that the FS shell supports usually have very few options.
For example, the&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;du&lt;/code&gt;&amp;nbsp;command in the FS shell does not support the&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;-sh&lt;/code&gt;&amp;nbsp;options to summarize the disk usage of sub-directories. In this case, I turn to the&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;awk&lt;/code&gt;&amp;nbsp;command to add up the sizes in the first column of the output.&lt;br /&gt;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;hadoop fs -du data | awk &#39;{sum+=$1} END {print sum}&#39;&lt;/code&gt;&lt;/div&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small;
margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;&lt;strong&gt;6. wc&lt;/strong&gt;&lt;br /&gt;One of the first things to learn about a file stored in Hadoop is its total number of lines.&lt;br /&gt;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;hadoop fs -cat data/web.log.9 | wc -l&lt;/code&gt;&lt;/div&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;&lt;strong&gt;7. cut&lt;/strong&gt;&lt;br /&gt;The&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;cut&lt;/code&gt;&amp;nbsp;command is convenient for selecting specified columns, or character positions, of a file.
For example, I can count the lines for each unique value of the field between character positions 5 and 14 (inclusive). Because&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;uniq -c&lt;/code&gt;&amp;nbsp;only collapses adjacent duplicate lines, the output of&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;cut&lt;/code&gt;&amp;nbsp;is sorted first.&lt;br /&gt;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;hadoop fs -cat data/web.log.9 | cut -c 5-14 | sort | uniq -c&lt;/code&gt;&lt;/div&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;&lt;strong&gt;8. getmerge&lt;/strong&gt;&lt;br /&gt;The great thing about the&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;getmerge&lt;/code&gt;&amp;nbsp;command is that it fetches all the part files of a MapReduce output directory to the local file system as a single file.&lt;br /&gt;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;hadoop fs -getmerge result result_merged.txt&lt;/code&gt;&lt;/div&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;&lt;strong&gt;9. 
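grep&lt;/strong&gt;&lt;br /&gt;With Hadoop Streaming (here&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;$STREAMING&lt;/code&gt;&amp;nbsp;is assumed to hold the path of the streaming jar), I can run the&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;grep&lt;/code&gt;&amp;nbsp;command from the Bash shell as the mapper to find the lines that contain the keywords I am interested in. Setting the number of reduce tasks to zero makes this a map-only job. Since&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;grep&lt;/code&gt;&amp;nbsp;exits with a non-zero status when an input split contains no match, which Streaming treats as a failed task by default, the command is followed by&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;|| true&lt;/code&gt;.&lt;/div&gt;&lt;pre style=&quot;background-color: white; color: #222222; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 1em; line-height: 1.2em; margin-bottom: 1.2em; margin-top: 1.2em;&quot;&gt;&lt;code style=&quot;background-color: ghostwhite; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); color: #333333; display: block !important; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em;&quot;&gt;hadoop jar $STREAMING -D mapred.reduce.tasks=0 -input data -output result -mapper &quot;bash -c &#39;grep -e Texas || true&#39;&quot;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div style=&quot;background-color: white; color: #222222; font-family: arial; font-size: small; margin-bottom: 1.2em !important; margin-top: 1.2em !important;&quot;&gt;&lt;strong&gt;10. 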
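at and crontab&lt;/strong&gt;&lt;br /&gt;The&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;at&lt;/code&gt;&amp;nbsp;and&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;crontab&lt;/code&gt;&amp;nbsp;commands are my favorite ways to schedule Hadoop jobs:&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;at&lt;/code&gt;&amp;nbsp;runs a command once at a given time, while&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;crontab&lt;/code&gt;&amp;nbsp;runs it on a recurring schedule. For example, I use the commands below to clean up the MapReduce results at midnight. The&amp;nbsp;&lt;code style=&quot;background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(234, 234, 234); display: inline; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; padding: 0px 0.3em; white-space: pre-wrap;&quot;&gt;at&amp;gt;&lt;/code&gt;&amp;nbsp;text is the interactive prompt, and pressing Ctrl-D submits the job.&lt;/div&gt;&lt;pre style=&quot;background-color: white; color: #222222; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 1em; line-height: 1.2em; margin-bottom: 1.2em; margin-top: 1.2em;&quot;&gt;&lt;code style=&quot;background-color: ghostwhite; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); color: #333333; display: block !important; font-family: Consolas, Inconsolata, Courier, monospace; font-size: 0.85em; margin: 0px 0.15em; overflow: auto; padding: 0.5em;&quot;&gt;at midnight&lt;br /&gt;at&amp;gt; hadoop fs -rmr result&lt;/code&gt;&lt;/pre&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.sasanalysis.com/feeds/2761331639484264001/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3256159328630041416&amp;postID=2761331639484264001' 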
title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/2761331639484264001'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3256159328630041416/posts/default/2761331639484264001'/><link rel='alternate' type='text/html' href='http://blog.sasanalysis.com/2014/04/10-popular-linux-commands-for-hadoop.html' title='10 popular Linux commands for Hadoop'/><author><name>CHARLIE HUANG</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry></feed>