<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:blogger='http://schemas.google.com/blogger/2008' xmlns:georss='http://www.georss.org/georss' xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5979497974446854318</id><updated>2024-09-14T12:53:19.122+08:00</updated><category term="R"/><category term="Tutorial"/><category term="Python"/><category term="Descriptive Statistics"/><category term="LaTeX"/><category term="Mathematics"/><category term="Parametric Inference"/><category term="Probability Theory"/><category term="Data Mining"/><category term="Installation"/><category term="Interactive Visualization"/><category term="C and CPP"/><category term="Image Analysis"/><category term="Machine Learning"/><category term="Mapping"/><category term="Multivariate Analysis"/><category term="SAS"/><category term="Spatial Analysis"/><category term="Statistical Learning"/><category term="Animation"/><category term="Book Review"/><category term="Infographic"/><category term="Linear Models"/><category term="Mathematica"/><category term="Meetup"/><category term="Packages"/><category term="Real Analysis"/><category term="Signal Processing"/><category term="Ubuntu"/><category term="Video"/><category term="ALUES"/><category term="Julia"/><category term="Nonlinear Models"/><category term="Optimization Algorithm"/><category term="Sampling Analysis"/><category term="Shiny Apps"/><category term="Windows"/><title type='text'>Analysis with Programming</title><subtitle type='html'>. . . a love story between theory and practice . . 
.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default?redirect=false'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default?start-index=26&amp;max-results=25&amp;redirect=false'/><author><name>AL</name><uri>http://www.blogger.com/profile/09263478137359820882</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>68</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-7061899207800133848</id><published>2017-06-12T13:01:00.000+08:00</published><updated>2017-11-07T12:35:18.730+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Installation"/><category scheme="http://www.blogger.com/atom/ns#" term="Julia"/><title type='text'>Julia: Installation and Editors</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: justify;&quot; trbidi=&quot;on&quot;&gt;
If you have been following this blog, you may have noticed that I haven&#39;t posted any update for more than a year now. The reason is that I&#39;ve been busy with my research and my work, and I promised not to share anything here until I finished my degree (Master of Science in Statistics). Anyway, at this point I think it&#39;s time to share with you what I&#39;ve learned in the past year. It&#39;s been a good year for Statistics, especially in the Philippines. In fact, on November 15, 2016, a team of local data scientists made a huge step for Big Data by organizing the &lt;a href=&quot;http://www.bigdataconferenceph.com/&quot; target = &quot;_blank&quot;&gt;first ever conference on this topic&lt;/a&gt;. A few months before that, the &lt;a href=&quot;https://psa.gov.ph/ncs/13th&quot; target = &quot;_blank&quot;&gt;13th National Convention on Statistics&lt;/a&gt;, organized by the &lt;a href=&quot;https://psa.gov.ph/&quot; target = &quot;_blank&quot;&gt;Philippine Statistics Authority&lt;/a&gt;, invited a keynote speaker from &lt;a href=&quot;http://www.paris21.org/&quot; target = &quot;_blank&quot;&gt;Paris21&lt;/a&gt; to tackle Big Data and its use in government. &lt;br/&gt;&lt;br/&gt;

So without further ado, in this post I would like to share a new programming language which I&#39;ve used for several months now, called &lt;a target = &quot;_blank&quot; href=&quot;https://julialang.org/&quot;&gt;Julia&lt;/a&gt;. This language is by far my favorite; as many would say, it is a well-thought-out language, for many reasons. The first, of course, is its speed, the second is its grammar, and there are many more. I can&#39;t list them all here, but I suggest you visit the &lt;a href=&quot;https://julialang.org/&quot; target = &quot;_blank&quot;&gt;official website&lt;/a&gt; and try it for yourself.
&lt;h2&gt;Installation&lt;/h2&gt;
The installation is straightforward: simply go to &lt;a target = &quot;_blank&quot; href=&quot;https://julialang.org/downloads/&quot;&gt;Julia&#39;s official download page&lt;/a&gt; and download the binaries for your operating system. Alternatively, you can install Julia by downloading &lt;a target = &quot;_blank&quot; href=&quot;https://juliacomputing.com/products/&quot;&gt;JuliaPro&lt;/a&gt; from the &lt;a href=&quot;https://juliacomputing.com/&quot;&gt;Julia Computing&lt;/a&gt; products. This will set up everything you need, including the &lt;a target = &quot;_blank&quot; href=&quot;https://atom.io/&quot;&gt;Github Atom Editor&lt;/a&gt;, out of the box. After installation, the first time you launch the command-line version, you&#39;ll see the following window: &lt;br/&gt;&lt;br/&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center; margin-top:-20px;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyH49Xj1D6Ktyc1ZetszkF_0a9UrbyVRL7Y8Kw3OzjdQt-WbSgeo1KFRU2DBSbiT5l4WTx1Ec7V2wtiQjI40sKfBl5Sg3bPY_9-PgqQRMSTVdDJv5dB48JBLh-nToZEhwUSKcvdbT0YQJH/s1600/Screen+Shot+2017-06-11+at+1.51.53+PM.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyH49Xj1D6Ktyc1ZetszkF_0a9UrbyVRL7Y8Kw3OzjdQt-WbSgeo1KFRU2DBSbiT5l4WTx1Ec7V2wtiQjI40sKfBl5Sg3bPY_9-PgqQRMSTVdDJv5dB48JBLh-nToZEhwUSKcvdbT0YQJH/s1600/Screen+Shot+2017-06-11+at+1.51.53+PM.png&quot; data-original-width=&quot;1410&quot; data-original-height=&quot;956&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;p style = &quot;margin-top: -50px;&quot;&gt;&lt;/p&gt;
Working with the command-line version is actually fun, and personally I think Julia has the best command-line interface of the three, compared with R and Python, in terms of features. For example, you can switch to &lt;b&gt;shell mode&lt;/b&gt; by simply pressing &lt;kbd&gt;;&lt;/kbd&gt; at the Julia prompt, and activate &lt;b&gt;help mode&lt;/b&gt; with &lt;kbd&gt;?&lt;/kbd&gt;. It also has autocompletion: press &lt;kbd&gt;Tab&lt;/kbd&gt; after typing the first few letters of a name. The LaTeX-to-Unicode autocompletion is also one of its best features, and almost any symbol/character can be used as a variable name, even an emoticon, as shown below:
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGZ6fK6K39zLUeFIBFqq2mzLYcyfg-8Sukhgb7naQVNZJf368IK267szlrAlVvu2zPgTPNrXn-9L4dTtv16NdZijhyphenhyphenqGmzbUEaUM5oBLbJ_aUoYd2Kbj2wfjPBFxK4k9vc2WFcmXpZ9mSy/s1600/Screen+Shot+2017-06-12+at+9.10.19+AM.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGZ6fK6K39zLUeFIBFqq2mzLYcyfg-8Sukhgb7naQVNZJf368IK267szlrAlVvu2zPgTPNrXn-9L4dTtv16NdZijhyphenhyphenqGmzbUEaUM5oBLbJ_aUoYd2Kbj2wfjPBFxK4k9vc2WFcmXpZ9mSy/s1600/Screen+Shot+2017-06-12+at+9.10.19+AM.png&quot; data-original-width=&quot;1422&quot; data-original-height=&quot;1246&quot; /&gt;&lt;/a&gt;&lt;/div&gt;


&lt;h2 style = &quot;margin-top: -50px;&quot;&gt;Editor&lt;/h2&gt;
While Julia&#39;s command-line version is loaded with good features, working on a huge project calls for a better front-end editor. Like &lt;a target = &quot;_blank&quot; href=&quot;https://www.rstudio.com/&quot;&gt;RStudio&lt;/a&gt; for &lt;a target = &quot;_blank&quot; href=&quot;https://cran.r-project.org/&quot;&gt;R&lt;/a&gt; and &lt;a target = &quot;_blank&quot; href=&quot;https://www.jetbrains.com/pycharm/&quot;&gt;PyCharm&lt;/a&gt; for &lt;a target = &quot;_blank&quot; href=&quot;https://www.python.org/&quot;&gt;Python&lt;/a&gt;, Julia can run in &lt;a target = &quot;_blank&quot; href=&quot;http://jupyter.org/&quot;&gt;Jupyter&lt;/a&gt; (also available for R and Python), the &lt;a target = &quot;_blank&quot; href=&quot;https://atom.io/&quot;&gt;Github Atom Editor&lt;/a&gt;, and &lt;a target = &quot;_blank&quot; href=&quot;https://code.visualstudio.com/&quot;&gt;Microsoft Visual Studio Code&lt;/a&gt;.&lt;br/&gt;&lt;br/&gt;

&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3Hzkzu7ir_vQaWYUBU8fSusSY_AVPLxqK6y71WzNlHMzYPlYww-fpNGmzqqETF4u67Mi7Eecc4i0VTrFHMBXwDOv80eL7KVkGVN7b1NdDXWAPX_8tF_6DKpn4o2DW6JRRxB04pnv8BhIo/s1600/Screen+Shot+2017-06-11+at+2.56.10+PM.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3Hzkzu7ir_vQaWYUBU8fSusSY_AVPLxqK6y71WzNlHMzYPlYww-fpNGmzqqETF4u67Mi7Eecc4i0VTrFHMBXwDOv80eL7KVkGVN7b1NdDXWAPX_8tF_6DKpn4o2DW6JRRxB04pnv8BhIo/s1600/Screen+Shot+2017-06-11+at+2.56.10+PM.png&quot; data-original-width=&quot;1600&quot; data-original-height=&quot;969&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div dir=&quot;ltr&quot; style=&quot;text-align: center;&quot; trbidi=&quot;on&quot;&gt;Julia in Jupyter Notebook&lt;/div&gt;&lt;br/&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgh_TebM2bablWafsQvmibEIy2r_zsSmhSbKq-_LqySfHWjrR_IdFmgWQ8oLCyF8uDIvQQa7bj-KyFt0K6Er9qLbxyHikhAeeX1EV0lSo5C-Pc1ktOl4icY3Oa2B2DTpIkiIxP6NW7_RNbj/s1600/Screen+Shot+2017-06-11+at+2.34.04+PM.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgh_TebM2bablWafsQvmibEIy2r_zsSmhSbKq-_LqySfHWjrR_IdFmgWQ8oLCyF8uDIvQQa7bj-KyFt0K6Er9qLbxyHikhAeeX1EV0lSo5C-Pc1ktOl4icY3Oa2B2DTpIkiIxP6NW7_RNbj/s1600/Screen+Shot+2017-06-11+at+2.34.04+PM.png&quot; data-original-width=&quot;1600&quot; data-original-height=&quot;1000&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div dir=&quot;ltr&quot; style=&quot;text-align: center;&quot; trbidi=&quot;on&quot;&gt;Julia in Github Atom Editor&lt;/div&gt;
&lt;br/&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3QoGw2daSeu77KrI16do_PUXUR1rdVmHM9YmOdtVkKV52pKilyKSo2f9-QVQMx9lbJ6-aOOgeMHMP4sBQjhXQtSbk-GrpPKM1vByePZZu3eI5AzrzPRJmsoKbCSY4MuHm8an7CMtOYt6f/s1600/Screen+Shot+2017-06-11+at+2.22.42+PM.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3QoGw2daSeu77KrI16do_PUXUR1rdVmHM9YmOdtVkKV52pKilyKSo2f9-QVQMx9lbJ6-aOOgeMHMP4sBQjhXQtSbk-GrpPKM1vByePZZu3eI5AzrzPRJmsoKbCSY4MuHm8an7CMtOYt6f/s1600/Screen+Shot+2017-06-11+at+2.22.42+PM.png&quot; data-original-width=&quot;1600&quot; data-original-height=&quot;1000&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div dir=&quot;ltr&quot; style=&quot;text-align: center;&quot; trbidi=&quot;on&quot;&gt;Julia in Microsoft Visual Studio Code&lt;/div&gt;
&lt;br/&gt;
To install the Jupyter notebook, simply run the following code:
&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/986d299cee2b620162ae1b76c0ee734a.js&quot;&gt;&lt;/script&gt;
In the screenshot above, I tweaked the theme of the notebook using the script from &lt;a target = &quot;_blank&quot; href=&quot;https://github.com/dunovank/jupyter-themes&quot;&gt;this repository&lt;/a&gt;.

As mentioned, to set up Julia in the Github Atom Editor, I recommend downloading JuliaPro, or you can follow the instructions on the &lt;a target = &quot;_blank&quot; href=&quot;http://junolab.org/&quot;&gt;Juno Lab website&lt;/a&gt;. After installation, you can add Atom extensions like &lt;a target = &quot;_blank&quot; href=&quot;https://atom.io/packages/minimap&quot;&gt;Minimap&lt;/a&gt;, which is not installed by default; and in case you are interested, the syntax theme I used in the screenshot is &lt;a target = &quot;_blank&quot; href=&quot;https://atom.io/themes/gruvbox-plus-syntax&quot;&gt;Gruvbox Plus&lt;/a&gt;.&lt;br/&gt;&lt;br/&gt;

Further, to set up Julia in Microsoft Visual Studio Code, open the program, press &lt;kbd&gt;Ctrl&lt;/kbd&gt;+&lt;kbd&gt;P&lt;/kbd&gt;, paste &lt;code&gt;ext install language-julia&lt;/code&gt;, and hit &lt;kbd&gt;Enter&lt;/kbd&gt;. This will install the Julia extension for Visual Studio Code. After installation, you can load the Julia REPL by pressing &lt;kbd&gt;Ctrl&lt;/kbd&gt;+&lt;kbd&gt;Shift&lt;/kbd&gt;+&lt;kbd&gt;P&lt;/kbd&gt; (Windows) or &lt;kbd&gt;Cmd&lt;/kbd&gt;+&lt;kbd&gt;Shift&lt;/kbd&gt;+&lt;kbd&gt;P&lt;/kbd&gt; (Mac), typing &lt;code&gt;julia start repl&lt;/code&gt;, and pressing &lt;kbd&gt;Enter&lt;/kbd&gt;. If you get an error, you may need to point the extension to your Julia executable. To do this, go to &lt;b&gt;Preferences &gt; Settings&lt;/b&gt;. Then, in the .json user settings file, enter the following:&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/f296a604c168f84dea3418577f118f71.js&quot;&gt;&lt;/script&gt;
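In case the gist above does not load, the setting boils down to a single JSON entry pointing the extension to the Julia binary. A sketch is below; the &lt;code&gt;julia.executablePath&lt;/code&gt; key is the one used by the julia-vscode extension, while the exact path shown is only illustrative:

```json
{
    "julia.executablePath": "C:/Users/MyName/AppData/Local/Julia-0.6.0-rc3/bin/julia.exe"
}
```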
Of course, you need to adjust the path by replacing &lt;code&gt;Julia-0.6.0-rc3&lt;/code&gt; (Windows) or &lt;code&gt;Julia-0.6.app&lt;/code&gt; (Mac) with your installed Julia version, and &lt;code&gt;C:/Users/MyName&lt;/code&gt; with the correct path on your machine. Further, I use the following setting in my .json file to adjust my Minimap, similar to the screenshot above.&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/d2fb3f89ed03e5267a237437ece382e1.js&quot;&gt;&lt;/script&gt;
Lastly, to toggle the cursor&#39;s focus between the script pane and the integrated Julia terminal using &lt;kbd&gt;Ctrl&lt;/kbd&gt;+&lt;kbd&gt;`&lt;/kbd&gt;, I use the following &lt;a target = &quot;_blank&quot; href=&quot;https://stackoverflow.com/questions/42796887/switch-focus-between-editor-and-integrated-terminal-in-visual-studio-code&quot;&gt;Keybindings&lt;/a&gt; (go to &lt;b&gt;Preferences &gt; Keyboard Shortcuts &gt; keybindings.json&lt;/b&gt;).&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/76d61a566f03e96a1410c5cc0416d423.js&quot;&gt;&lt;/script&gt;
For more on this topic, visit the &lt;a target = &quot;_blank&quot; href=&quot;https://github.com/JuliaEditorSupport/julia-vscode&quot;&gt;official GitHub page&lt;/a&gt;. The three editors above each have advantages and disadvantages, but my primary editor is Visual Studio Code, because it is fast and loaded with features as well. Its major limitation is the lack of LaTeX-to-Unicode autocompletion, which is available in the Atom Editor. There are third-party packages like &lt;a target = &quot;_blank&quot; href=&quot;https://marketplace.visualstudio.com/items?itemName=oijaz.unicode-latex&quot;&gt;Unicode LaTeX&lt;/a&gt; that can do the job indirectly, or alternatively you can generate the Unicode characters using the console (the integrated Julia terminal in Visual Studio Code). I don&#39;t think this is a big deal, and maybe in the near future this capability will be added. The Atom Editor, on the other hand, has more features for Julia, for example the plot pane, the workspace viewer, and many more. The only problem is that it&#39;s kind of slow: when working with several datasets in your workspace, plus plots, plus very long lines of code, scrolling is not smooth. Nevertheless, let&#39;s be positive and hope that more improvements are coming to these editors.

Finally, for those who want to start using Julia, visit the &lt;a target = &quot;_blank&quot; href=&quot;https://docs.julialang.org/en/stable/&quot;&gt;Official Documentation&lt;/a&gt; and &lt;a target = &quot;_blank&quot;  href=&quot;https://julialang.org/learning/&quot;&gt;Learning Materials&lt;/a&gt;; ask questions on &lt;a target = &quot;_blank&quot;  href=&quot;https://discourse.julialang.org/&quot;&gt;Julia Discourse&lt;/a&gt; and join the &lt;a target = &quot;_blank&quot; href=&quot;https://gitter.im/JuliaLang/julia&quot;&gt;Julia Gitter&lt;/a&gt;.

&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/7061899207800133848/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2017/06/julia-installation-and-editors.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/7061899207800133848'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/7061899207800133848'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2017/06/julia-installation-and-editors.html' title='Julia: Installation and Editors'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyH49Xj1D6Ktyc1ZetszkF_0a9UrbyVRL7Y8Kw3OzjdQt-WbSgeo1KFRU2DBSbiT5l4WTx1Ec7V2wtiQjI40sKfBl5Sg3bPY_9-PgqQRMSTVdDJv5dB48JBLh-nToZEhwUSKcvdbT0YQJH/s72-c/Screen+Shot+2017-06-11+at+1.51.53+PM.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-6446820968926186166</id><published>2015-12-22T22:20:00.003+08:00</published><updated>2017-04-15T20:31:50.077+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Machine Learning"/><category scheme="http://www.blogger.com/atom/ns#" term="Nonlinear Models"/><category scheme="http://www.blogger.com/atom/ns#" term="Optimization Algorithm"/><category scheme="http://www.blogger.com/atom/ns#" term="Python"/><category scheme="http://www.blogger.com/atom/ns#" term="R"/><category scheme="http://www.blogger.com/atom/ns#" term="Statistical Learning"/><title type='text'>R and Python: Gradient 
Descent</title><content type='html'>&lt;div style=&quot;text-align: justify;&quot;&gt;
One of the problems often dealt with in Statistics is the minimization of an objective function. And contrary to linear models, there is no analytical solution for models that are nonlinear in the parameters, such as logistic regression, neural networks, and nonlinear regression models (like the Michaelis-Menten model). In such situations, we have to use mathematical programming or optimization. One popular optimization algorithm is &lt;i&gt;gradient descent&lt;/i&gt;, which we&#39;re going to illustrate here. To start with, let&#39;s consider a simple function with a closed-form solution, given by
\begin{equation}
f(\beta) \triangleq \beta^4 - 3\beta^3 + 2.
\end{equation}
We want to minimize this function with respect to $\beta$. The quick solution, as calculus taught us, is to compute the first derivative of the function, that is
\begin{equation}
\frac{\text{d}f(\beta)}{\text{d}\beta}=4\beta^3-9\beta^2.
\end{equation}
Setting this to 0 to obtain the stationary points gives us
\begin{align}
\frac{\text{d}f(\beta)}{\text{d}\beta}&amp;amp;\overset{\text{set}}{=}0\nonumber\\
4\hat{\beta}^3-9\hat{\beta}^2&amp;amp;=0\nonumber\\
\hat{\beta}^2(4\hat{\beta}-9)&amp;amp;=0\nonumber\\
\hat{\beta}&amp;amp;=0\quad\text{or}\quad\hat{\beta}=\frac{9}{4}.
\end{align}
Since the second derivative, $12\beta^2-18\beta$, vanishes at $\hat{\beta}=0$ (an inflection point) and is positive at $\hat{\beta}=\frac{9}{4}$, the minimum is attained at $\hat{\beta}=\frac{9}{4}$.
&lt;/div&gt;
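We can sanity-check this closed-form answer numerically; a quick sketch (the function names here are mine):

```python
# f and its first two derivatives, taken from the definitions above
def f(b):
    return b**4 - 3*b**3 + 2

def f_prime(b):
    return 4*b**3 - 9*b**2

def f_double_prime(b):
    return 12*b**2 - 18*b

beta_hat = 9 / 4
print(f_prime(beta_hat))         # 0.0: a stationary point
print(f_double_prime(beta_hat))  # 20.25 > 0: a local minimum
```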
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;The following plot shows the minimum of the function at $\hat{\beta}=\frac{9}{4}$ (red line in the plot below).&lt;br /&gt;
&lt;br /&gt;
&lt;center&gt;
&lt;embed src=&quot;https://cdn.rawgit.com/alstat/SampleImages/master/g1.svg&quot; width=&quot;500&quot;&gt;&lt;/embed&gt;
&lt;/center&gt;
&lt;i&gt;R Script&lt;/i&gt;
&lt;script src=&quot;https://gist.github.com/alstat/a097d6c4b1a0e8d693b7.js&quot;&gt;&lt;/script&gt;
Now let&#39;s consider solving this minimization problem using gradient descent with the following algorithm:
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Initialize $\mathbf{x}_{r},r=0$&lt;/li&gt;
&lt;li&gt;&lt;b&gt;while&lt;/b&gt; $\lVert \mathbf{x}_{r}-\mathbf{x}_{r+1}\rVert &amp;gt; \nu$&lt;/li&gt;
&lt;li&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; $\mathbf{x}_{r+1}\leftarrow \mathbf{x}_{r} - \gamma\nabla f(\mathbf{x}_r)$&lt;/li&gt;
&lt;li&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; $r\leftarrow r + 1$&lt;/li&gt;
&lt;li&gt;&lt;b&gt;end while&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;&lt;b&gt;return&lt;/b&gt; $\mathbf{x}_{r}$ and $r$&lt;/li&gt;
&lt;/ol&gt;
&lt;div style=&quot;text-align: justify;&quot;&gt;
where $\nabla f(\mathbf{x}_r)$ is the &lt;i&gt;gradient&lt;/i&gt; of the cost function, $\gamma$ is the &lt;i&gt;learning-rate&lt;/i&gt; parameter of the algorithm, and $\nu$ is the &lt;i&gt;precision&lt;/i&gt; parameter. For the function above, let the initial guess be $\hat{\beta}_0=4$ and $\gamma=.001$ with $\nu=.00001$. Then $\nabla f(\hat{\beta}_0)=112$, so that 
\[\hat{\beta}_1=\hat{\beta}_0-.001(112)=3.888.\] 
And $|\hat{\beta}_1 - \hat{\beta}_0| = 0.112 &amp;gt; \nu$. Repeat the process until, at some $r$, $|\hat{\beta}_{r}-\hat{\beta}_{r+1}| \ngtr \nu$. It turns out that 350 iterations are needed to satisfy the desired inequality; the steps are plotted in the following figure, with estimated minimum $\hat{\beta}_{350}=2.250483\approx\frac{9}{4}$.&lt;/div&gt;
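The loop above is easy to sketch in code. Here is a minimal Python version of the update rule, with the same $\gamma$ and $\nu$ as in the text (function and variable names are mine):

```python
def gradient_descent(grad, x0, gamma=0.001, nu=0.00001, max_iter=100000):
    """Minimize a univariate function, given its gradient `grad`,
    by repeating x <- x - gamma * grad(x) until successive iterates
    differ by at most nu."""
    x_old, r = x0, 0
    while r < max_iter:
        x_new = x_old - gamma * grad(x_old)
        r += 1
        if abs(x_new - x_old) <= nu:
            break
        x_old = x_new
    return x_new, r

# gradient of f(beta) = beta**4 - 3*beta**3 + 2
grad_f = lambda b: 4*b**3 - 9*b**2

beta_hat, iters = gradient_descent(grad_f, x0=4.0)
print(beta_hat, iters)  # approaches 9/4 ~ 2.2505 after about 350 iterations
```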
&lt;br /&gt;
&lt;center&gt;
&lt;embed src=&quot;https://cdn.rawgit.com/alstat/SampleImages/master/z2.svg&quot; width=&quot;500&quot;&gt;&lt;/embed&gt;
&lt;/center&gt;
&lt;div style=&quot;text-align: justify;&quot;&gt;
&lt;i&gt;R Script with Plot&lt;/i&gt;
&lt;script src=&quot;https://gist.github.com/alstat/ad7c715acf07aa54ae0f.js&quot;&gt;&lt;/script&gt;
&lt;i&gt;Python Script&lt;/i&gt;
&lt;script src=&quot;https://gist.github.com/alstat/ee5dacbd1ef59efd458b.js&quot;&gt;&lt;/script&gt;
Obviously the convergence is slow, and we can address this by tuning the learning-rate parameter. For example, if we increase it to $\gamma=.01$ (change &lt;code&gt;gamma&lt;/code&gt; to .01 in the codes above), the algorithm converges at the 42nd iteration. To support that claim, see the steps of the gradient in the plot below.&lt;/div&gt;
&lt;br /&gt;
&lt;center&gt;
&lt;embed src=&quot;https://cdn.rawgit.com/alstat/SampleImages/master/z3.svg&quot; width=&quot;500&quot;&gt;&lt;/embed&gt;
&lt;/center&gt;
&lt;div style=&quot;text-align: justify;&quot;&gt;
If we try to change the starting value from 4 to .1 (change &lt;code&gt;beta_new&lt;/code&gt; to .1) with $\gamma=.01$, the algorithm converges at the 173rd iteration with estimate $\hat{\beta}_{173}=2.249962\approx\frac{9}{4}$ (see the plot below).&lt;/div&gt;
&lt;br /&gt;
&lt;center&gt;
&lt;embed src=&quot;https://cdn.rawgit.com/alstat/SampleImages/master/z4.svg&quot; width=&quot;500&quot;&gt;&lt;/embed&gt;
&lt;/center&gt;
&lt;div style=&quot;text-align: justify;&quot;&gt;
Now let&#39;s consider another function known as &lt;a href=&quot;https://en.wikipedia.org/wiki/Rosenbrock_function&quot; target=&quot;_blank&quot;&gt;Rosenbrock&lt;/a&gt; defined as
\begin{equation}
f(\mathbf{w})\triangleq(1 - w_1) ^ 2 + 100 (w_2 - w_1^2)^2.
\end{equation}
The gradient is
\begin{align}
\nabla f(\mathbf{w})&amp;amp;=[-2(1 - w_1) - 400(w_2 - w_1^2) w_1]\mathbf{i}+200(w_2-w_1^2)\mathbf{j}\nonumber\\
&amp;amp;=\left[\begin{array}{c} 
-2(1 - w_1) - 400(w_2 - w_1^2) w_1\\
200(w_2-w_1^2)
\end{array}\right].
\end{align}
Let the initial guess be $\hat{\mathbf{w}}_0=\left[\begin{array}{c}-1.8\\-.8\end{array}\right]$, $\gamma=.0002$, and $\nu=.00001$. Then $\nabla f(\hat{\mathbf{w}}_0)=\left[\begin{array}{c} -2914.4\\-808.0\end{array}\right]$. So that 
\begin{equation}\nonumber
\hat{\mathbf{w}}_1=\hat{\mathbf{w}}_0-\gamma\nabla f(\hat{\mathbf{w}}_0)=\left[\begin{array}{c} -1.21712
\\-0.63840\end{array}\right].
\end{equation}
And $\lVert\hat{\mathbf{w}}_0-\hat{\mathbf{w}}_1\rVert=0.6048666&amp;gt;\nu$. Repeat the process until, at some $r$, $\lVert\hat{\mathbf{w}}_r-\hat{\mathbf{w}}_{r+1}\rVert\ngtr \nu$. It turns out that 23,374 iterations are needed to satisfy the desired inequality, with estimate $\hat{\mathbf{w}}_{23375}=\left[\begin{array}{c} 0.9464841\\0.8956111\end{array}\right]$; the contour plot is depicted in the figure below.
&lt;/div&gt;
&lt;center&gt;
&lt;embed src=&quot;https://cdn.rawgit.com/alstat/SampleImages/master/z5.svg&quot; width=&quot;550&quot;&gt;&lt;/embed&gt;
&lt;/center&gt;
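The same update rule carries over directly to the vector case. A compact NumPy sketch, using the gradient derived above (names are mine):

```python
import numpy as np

def rosenbrock_grad(w):
    # gradient of (1 - w1)^2 + 100 (w2 - w1^2)^2
    return np.array([
        -2 * (1 - w[0]) - 400 * (w[1] - w[0]**2) * w[0],
        200 * (w[1] - w[0]**2),
    ])

w_old = np.array([-1.8, -0.8])   # initial guess
gamma, nu = 0.0002, 0.00001      # learning rate and precision
r = 0
while r < 50000:                 # safety cap on the iteration count
    w_new = w_old - gamma * rosenbrock_grad(w_old)
    r += 1
    if np.linalg.norm(w_new - w_old) <= nu:
        break
    w_old = w_new

print(w_new, r)  # near [0.946, 0.896] after roughly 23,000 iterations
```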
&lt;div style=&quot;text-align: justify;&quot;&gt;
&lt;i&gt;R Script with Contour Plot&lt;/i&gt;
&lt;script src=&quot;https://gist.github.com/alstat/6b77eadaf6d34d809b4a.js&quot;&gt;&lt;/script&gt;
&lt;i&gt;Python Script&lt;/i&gt;
&lt;script src=&quot;https://gist.github.com/alstat/f90a17ce13b7504037d4.js&quot;&gt;&lt;/script&gt;
Notice that I did not use ggplot for the contour plot; this is because the plot needs to be updated 23,374 times just to accommodate the arrows for the trajectory of the gradient vectors, and ggplot is simply too slow for that. Finally, we can also visualize the gradient points on the surface, as shown in the following figure.
&lt;/div&gt;
&lt;center&gt;
&lt;embed src=&quot;https://cdn.rawgit.com/alstat/SampleImages/master/z6.svg&quot; width=&quot;450&quot;&gt;&lt;/embed&gt;
&lt;/center&gt;
&lt;div style=&quot;text-align: justify;&quot;&gt;
&lt;i&gt;R Script&lt;/i&gt;
&lt;script src=&quot;https://gist.github.com/alstat/3ea6112e9aaf3945e4af.js&quot;&gt;&lt;/script&gt;
In a future blog post, I hope to apply this algorithm to statistical models like linear/nonlinear regression models for a simple illustration.&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/6446820968926186166/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/12/r-and-python-gradient-descent.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/6446820968926186166'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/6446820968926186166'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/12/r-and-python-gradient-descent.html' title='R and Python: Gradient Descent'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-8984078873601215128</id><published>2015-12-15T18:55:00.000+08:00</published><updated>2015-12-16T19:02:15.111+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Interactive Visualization"/><category scheme="http://www.blogger.com/atom/ns#" term="Linear Models"/><category scheme="http://www.blogger.com/atom/ns#" term="Python"/><category scheme="http://www.blogger.com/atom/ns#" term="R"/><title type='text'>R and Python: Theory of Linear Least Squares</title><content type='html'>In my &lt;a href=&quot;http://alstatr.blogspot.com/2015/08/r-python-and-sas-getting-started-with.html&quot; target=&quot;_blank&quot;&gt;previous article&lt;/a&gt;, we talked about implementations of linear regression models in R, Python and SAS. On the theoretical sides, however, I briefly mentioned the estimation procedure for the parameter $\boldsymbol{\beta}$. 
So to help us understand how software performs the estimation procedure, we&#39;ll look at the mathematics behind it. We will also perform the estimation manually in R and in Python; that is, we&#39;re not going to use any special packages, which will help us appreciate the theory.&lt;br/&gt;&lt;br/&gt;
&lt;h3&gt;Linear Least Squares&lt;/h3&gt;
Consider the linear regression model,
\[
y_i=f_i(\mathbf{x}|\boldsymbol{\beta})+\varepsilon_i,\quad\mathbf{x}_i=\left[
\begin{array}{cccc}
1&amp;x_{i1}&amp;\cdots&amp;x_{ip}
\end{array}\right],\quad\boldsymbol{\beta}=\left[\begin{array}{c}\beta_0\\\beta_1\\\vdots\\\beta_p\end{array}\right],
\]
where $y_i$ is the &lt;i&gt;response&lt;/i&gt; or the &lt;i&gt;dependent&lt;/i&gt; variable at the $i$th case, $i=1,\cdots, N$. The $f_i(\mathbf{x}|\boldsymbol{\beta})$ is the deterministic part of the model that depends on both the parameters $\boldsymbol{\beta}\in\mathbb{R}^{p+1}$ and the predictor variable $\mathbf{x}_i$, which in matrix form, say $\mathbf{X}$, is represented as follows
\[
\mathbf{X}=\left[
\begin{array}{cccccc}
1&amp;x_{11}&amp;\cdots&amp;x_{1p}\\
1&amp;x_{21}&amp;\cdots&amp;x_{2p}\\
\vdots&amp;\vdots&amp;\ddots&amp;\vdots\\
1&amp;x_{N1}&amp;\cdots&amp;x_{Np}\\
\end{array}
\right].
\]
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
$\varepsilon_i$ is the error term at the $i$th case, which we assume to be Gaussian distributed with mean 0 and variance $\sigma^2$. So that
\[
\mathbb{E}y_i=f_i(\mathbf{x}|\boldsymbol{\beta}),
\]
i.e. $f_i(\mathbf{x}|\boldsymbol{\beta})$ is the &lt;i&gt;expectation function&lt;/i&gt;. The uncertainty around the response variable is also modelled by a Gaussian distribution. Specifically, if $Y=f(\mathbf{x}|\boldsymbol{\beta})+\varepsilon$, then for any realized value $y$,
\begin{align*}
\mathbb{P}[Y\leq y]&amp;=\mathbb{P}[f(\mathbf{x}|\boldsymbol{\beta})+\varepsilon\leq y]\\
&amp;=\mathbb{P}[\varepsilon\leq y-f(\mathbf{x}|\boldsymbol{\beta})]=\mathbb{P}\left[\frac{\varepsilon}{\sigma}\leq \frac{y-f(\mathbf{x}|\boldsymbol{\beta})}{\sigma}\right]\\
&amp;=\Phi\left[\frac{y-f(\mathbf{x}|\boldsymbol{\beta})}{\sigma}\right],
\end{align*}
where $\Phi$ denotes the standard Gaussian cumulative distribution function, with density denoted by $\phi$ below. Hence $Y\sim\mathcal{N}(f(\mathbf{x}|\boldsymbol{\beta}),\sigma^2)$. That is,
\begin{align*}
\frac{\operatorname{d}}{\operatorname{d}y}\Phi\left[\frac{y-f(\mathbf{x}|\boldsymbol{\beta})}{\sigma}\right]&amp;=\phi\left[\frac{y-f(\mathbf{x}|\boldsymbol{\beta})}{\sigma}\right]\frac{1}{\sigma}=\mathbb{P}[y|f(\mathbf{x}|\boldsymbol{\beta}),\sigma^2]\\
&amp;=\frac{1}{\sqrt{2\pi}\sigma}\exp\left\{-\frac{1}{2}\left[\frac{y-f(\mathbf{x}|\boldsymbol{\beta})}{\sigma}\right]^2\right\}.
\end{align*}
If the data are independent and identically distributed, then the log-likelihood function of $y$ is,
\begin{align*}
\mathcal{L}[\boldsymbol{\beta}|\mathbf{y},\mathbf{X},\sigma]&amp;=\mathbb{P}[\mathbf{y}|\mathbf{X},\boldsymbol{\beta},\sigma]=\prod_{i=1}^N\frac{1}{\sqrt{2\pi}\sigma}\exp\left\{-\frac{1}{2}\left[\frac{y_i-f_i(\mathbf{x}|\boldsymbol{\beta})}{\sigma}\right]^2\right\}\\
&amp;=\frac{1}{(2\pi)^{\frac{N}{2}}\sigma^N}\exp\left\{-\frac{1}{2}\sum_{i=1}^N\left[\frac{y_i-f_i(\mathbf{x}|\boldsymbol{\beta})}{\sigma}\right]^2\right\}\\
\log\mathcal{L}[\boldsymbol{\beta}|\mathbf{y},\mathbf{X},\sigma]&amp;=-\frac{N}{2}\log2\pi-N\log\sigma-\frac{1}{2\sigma^2}\sum_{i=1}^N\left[y_i-f_i(\mathbf{x}|\boldsymbol{\beta})\right]^2.
\end{align*}
The likelihood function measures the plausibility of the parameter $\boldsymbol{\beta}$ in explaining the sample data, so we want the estimate of $\boldsymbol{\beta}$ that most likely generated the sample. Our goal is therefore to maximize the likelihood, which is equivalent to maximizing the log-likelihood with respect to $\boldsymbol{\beta}$, and that is done by taking the partial derivative with respect to $\boldsymbol{\beta}$. The first two terms on the right-hand side of the equation above can be disregarded, since they do not depend on $\boldsymbol{\beta}$. Also, the location of the maximum of the log-likelihood with respect to $\boldsymbol{\beta}$ is not affected by multiplication by an arbitrary positive scalar, so the factor $\frac{1}{2\sigma^2}$ can be omitted. We are left with the following expression,
\begin{equation}\label{eq:1}
-\sum_{i=1}^N\left[y_i-f_i(\mathbf{x}|\boldsymbol{\beta})\right]^2.
\end{equation}
One last thing: instead of maximizing the log-likelihood function, we can equivalently minimize its negative. Hence we are interested in minimizing the negative of Equation (\ref{eq:1}), which is
\begin{equation}\label{eq:2}
\sum_{i=1}^N\left[y_i-f_i(\mathbf{x}|\boldsymbol{\beta})\right]^2,
\end{equation}
popularly known as the &lt;i&gt;residual sum of squares&lt;/i&gt; (RSS). So minimizing the RSS is a consequence of maximizing the log-likelihood under the Gaussian assumption on the uncertainty around the response variable $y$. For models with two parameters, say $\beta_0$ and $\beta_1$, the RSS can be visualized like the one in my &lt;a href=&quot;http://alstatr.blogspot.com/2015/08/r-python-and-sas-getting-started-with.html&quot; target=&quot;_blank&quot;&gt;previous article&lt;/a&gt;, that is
&lt;div&gt;
    &lt;a href=&quot;https://plot.ly/~alstated1a61/479/&quot; target=&quot;_blank&quot; title=&quot;Error Surface&quot; style=&quot;display: block; text-align: center;&quot;&gt;&lt;img src=&quot;https://plot.ly/~alstated1a61/479.png&quot; alt=&quot;Error Surface&quot; style=&quot;max-width: 100%;width: 500px;&quot;  width=&quot;500&quot; onerror=&quot;this.onerror=null;this.src=&#39;https://plot.ly/404.png&#39;;&quot; /&gt;&lt;/a&gt;
    &lt;script data-plotly=&quot;alstated1a61:479&quot;  src=&quot;https://plot.ly/embed.js&quot; async&gt;&lt;/script&gt;
&lt;/div&gt;
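To make the RSS/log-likelihood equivalence concrete, here is a small numerical sketch in Python with NumPy (the data and variable names here are my own illustration, not the gists used later in this post): for any two candidate values of $\boldsymbol{\beta}$, the one with the smaller RSS always has the larger Gaussian log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
beta_true = np.array([3.5, 2.8])
sigma = np.sqrt(7.0)
y = X @ beta_true + rng.normal(scale=sigma, size=50)

def rss(beta):
    # residual sum of squares, Equation (2)
    r = y - X @ beta
    return r @ r

def loglik(beta):
    # Gaussian log-likelihood with known sigma; a decreasing function of the RSS
    n = len(y)
    return -n / 2 * np.log(2 * np.pi) - n * np.log(sigma) - rss(beta) / (2 * sigma**2)

b_good = np.array([3.4, 2.9])   # close to the truth
b_bad = np.array([0.0, 0.0])    # far from the truth
assert rss(b_bad) > rss(b_good)
assert loglik(b_good) > loglik(b_bad)
```

Since the log-likelihood equals a constant minus the scaled RSS, ranking candidates by RSS and by likelihood always agree.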

Performing differentiation with respect to the $(p+1)$-dimensional parameter $\boldsymbol{\beta}$ is manageable in the context of linear algebra, so Equation (\ref{eq:2}) is equivalent to
\begin{align*}
\lVert\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\rVert^2&amp;=\langle\mathbf{y}-\mathbf{X}\boldsymbol{\beta},\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\rangle=\mathbf{y}^{\text{T}}\mathbf{y}-\mathbf{y}^{\text{T}}\mathbf{X}\boldsymbol{\beta}-(\mathbf{X}\boldsymbol{\beta})^{\text{T}}\mathbf{y}+(\mathbf{X}\boldsymbol{\beta})^{\text{T}}\mathbf{X}\boldsymbol{\beta}\\
&amp;=\mathbf{y}^{\text{T}}\mathbf{y}-\mathbf{y}^{\text{T}}\mathbf{X}\boldsymbol{\beta}-\boldsymbol{\beta}^{\text{T}}\mathbf{X}^{\text{T}}\mathbf{y}+\boldsymbol{\beta}^{\text{T}}\mathbf{X}^{\text{T}}\mathbf{X}\boldsymbol{\beta}.
\end{align*}
And the derivative with respect to the parameter is 
\begin{align*}
\frac{\operatorname{\partial}}{\operatorname{\partial}\boldsymbol{\beta}}\lVert\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\rVert^2&amp;=-2\mathbf{X}^{\text{T}}\mathbf{y}+2\mathbf{X}^{\text{T}}\mathbf{X}\boldsymbol{\beta}
\end{align*}
Taking the critical point by setting the above equation to the zero vector and dividing through by 2, we have
\begin{align}
\frac{\operatorname{\partial}}{\operatorname{\partial}\boldsymbol{\beta}}\lVert\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}\rVert^2&amp;\overset{\text{set}}{=}\mathbf{0}\nonumber\\
-\mathbf{X}^{\text{T}}\mathbf{y}+\mathbf{X}^{\text{T}}\mathbf{X}\hat{\boldsymbol{\beta}}&amp;=\mathbf{0}\nonumber\\
\mathbf{X}^{\text{T}}\mathbf{X}\hat{\boldsymbol{\beta}}&amp;=\mathbf{X}^{\text{T}}\mathbf{y}\label{eq:norm}
\end{align}
Equation (\ref{eq:norm}) is called the &lt;i&gt;normal equation&lt;/i&gt;. If $\mathbf{X}$ has full column rank, then $\mathbf{X}^{\text{T}}\mathbf{X}$ is invertible and we can solve for $\hat{\boldsymbol{\beta}}$,
\begin{align}
\mathbf{X}^{\text{T}}\mathbf{X}\hat{\boldsymbol{\beta}}&amp;=\mathbf{X}^{\text{T}}\mathbf{y}\nonumber\\
(\mathbf{X}^{\text{T}}\mathbf{X})^{-1}\mathbf{X}^{\text{T}}\mathbf{X}\hat{\boldsymbol{\beta}}&amp;=(\mathbf{X}^{\text{T}}\mathbf{X})^{-1}\mathbf{X}^{\text{T}}\mathbf{y}\nonumber\\
\hat{\boldsymbol{\beta}}&amp;=(\mathbf{X}^{\text{T}}\mathbf{X})^{-1}\mathbf{X}^{\text{T}}\mathbf{y}.\label{eq:betahat}
\end{align}
That&#39;s it, since both $\mathbf{X}$ and $\mathbf{y}$ are known.&lt;br/&gt;&lt;br/&gt;
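As a quick numerical check of Equation (\ref{eq:betahat}), the following is a minimal NumPy sketch (the matrix and variable names are mine, not from the gists below): with a noise-free $\mathbf{y}=\mathbf{X}\boldsymbol{\beta}$, the normal equation recovers $\boldsymbol{\beta}$ exactly, up to floating point. In practice one solves the normal equation directly rather than forming the inverse explicitly.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 7.0]])          # full column rank
beta = np.array([3.5, 2.8])
y = X @ beta                        # noise-free response

# Normal equation: (X'X) beta_hat = X'y, solved without an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(beta_hat, beta)
```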
&lt;h3&gt;Prediction&lt;/h3&gt;
Suppose $\mathbf{X}$ has full column rank and its columns span the subspace $V\subseteq\mathbb{R}^N$, so that $\mathbb{E}\mathbf{y}=\mathbf{X}\boldsymbol{\beta}\in V$. Then the predicted values of $\mathbf{y}$ are given by,
\begin{equation}\label{eq:pred}
\hat{\mathbf{y}}=\mathbb{E}\mathbf{y}=\mathbf{P}_{V}\mathbf{y}=\mathbf{X}(\mathbf{X}^{\text{T}}\mathbf{X})^{-1}\mathbf{X}^{\text{T}}\mathbf{y},
\end{equation}
where $\mathbf{P}_{V}$ is the projection matrix onto the space $V$. For a proof of the projection matrix in Equation (\ref{eq:pred}), please refer to reference (1) below. Notice that this is equivalent to
\begin{equation}\label{eq:yhbh}
\hat{\mathbf{y}}=\mathbb{E}\mathbf{y}=\mathbf{X}\hat{\boldsymbol{\beta}}.
\end{equation}
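This equivalence can also be verified numerically. The NumPy sketch below (variable names are my own) checks that $\mathbf{P}_{V}=\mathbf{X}(\mathbf{X}^{\text{T}}\mathbf{X})^{-1}\mathbf{X}^{\text{T}}$ behaves like a projection, i.e. it is symmetric and idempotent, and that applying it to $\mathbf{y}$ gives the same predictions as $\mathbf{X}\hat{\boldsymbol{\beta}}$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))        # full column rank with probability one
y = rng.normal(size=20)

P = X @ np.linalg.inv(X.T @ X) @ X.T           # projection onto the column space of X
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # normal-equation estimate

assert np.allclose(P, P.T)                     # symmetric
assert np.allclose(P @ P, P)                   # idempotent
assert np.allclose(P @ y, X @ beta_hat)        # projected y equals X beta_hat
```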
&lt;br/&gt;
&lt;h3&gt;Computation&lt;/h3&gt;
Let's fire up R and Python and see how we can apply the equations we derived. For purposes of illustration, we're going to simulate data from a Gaussian-distributed population. To do so, consider the following codes:
&lt;br/&gt;&lt;br/&gt;
&lt;i&gt;R Script&lt;/i&gt;
&lt;script src=&quot;https://gist.github.com/alstat/9c838343aefc64609e8b.js&quot;&gt;&lt;/script&gt;
&lt;i&gt;Python Script&lt;/i&gt;
&lt;script src=&quot;https://gist.github.com/alstat/0679705fbfc6dc85ace1.js&quot;&gt;&lt;/script&gt;
Here we have two predictors &lt;code&gt;x1&lt;/code&gt; and &lt;code&gt;x2&lt;/code&gt;, and our response variable &lt;code&gt;y&lt;/code&gt; is generated with parameters $\beta_1=3.5$ and $\beta_2=2.8$ plus Gaussian noise with variance 7. Although we set the same random seed in both R and Python, we should not expect the generated values to be identical, since seeds do not correspond across languages; rather, the two samples are independent and identically distributed (iid). For visualization, I will use Python Plotly; you can also translate it to R Plotly.&lt;br/&gt;&lt;br/&gt;
&lt;div&gt;
    &lt;a href=&quot;https://plot.ly/~alstated/5/&quot; target=&quot;_blank&quot; title=&quot;&quot; style=&quot;display: block; text-align: center;&quot;&gt;&lt;img src=&quot;https://plot.ly/~alstated/5.png&quot; alt=&quot;&quot; style=&quot;max-width: 100%;width: 600px;&quot;  width=&quot;600&quot; onerror=&quot;this.onerror=null;this.src=&#39;https://plot.ly/404.png&#39;;&quot; /&gt;&lt;/a&gt;
    &lt;script data-plotly=&quot;alstated:5&quot;  src=&quot;https://plot.ly/embed.js&quot; async&gt;&lt;/script&gt;
&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/90650219b47f4412f789.js&quot;&gt;&lt;/script&gt;
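For readers who cannot load the gists, the simulation and estimation just described can be sketched in plain NumPy as follows (the seed and variable names here are mine, so the numbers will differ from the gists):

```python
import numpy as np

rng = np.random.default_rng(123)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2])

# y generated with beta1 = 3.5, beta2 = 2.8 and Gaussian noise of variance 7
y = 3.5 * x1 + 2.8 * x2 + rng.normal(scale=np.sqrt(7.0), size=n)

# estimate via the normal equation
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # estimates near [3.5, 2.8]
```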
Now let's estimate the parameter $\boldsymbol{\beta}$, whose true values we set to $\beta_1=3.5$ and $\beta_2=2.8$. We will use Equation (\ref{eq:betahat}) for estimation, so that we have&lt;br/&gt;&lt;br/&gt;
&lt;i&gt;R Script&lt;/i&gt;
&lt;script src=&quot;https://gist.github.com/alstat/c628ccdc642c8dbdc390.js&quot;&gt;&lt;/script&gt;
&lt;i&gt;Python Script&lt;/i&gt;
&lt;script src=&quot;https://gist.github.com/alstat/31cc00ba2af9fe88980c.js&quot;&gt;&lt;/script&gt;
That's a good estimate. Again, a reminder: the estimates in R and in Python differ because the random samples differ; the important thing is that both are iid. To proceed, we'll do prediction using Equation (\ref{eq:pred}). That is,
&lt;br/&gt;&lt;br/&gt;
&lt;i&gt;R Script&lt;/i&gt;
&lt;script src=&quot;https://gist.github.com/alstat/87212b699aea6049b5ab.js&quot;&gt;&lt;/script&gt;
&lt;i&gt;Python Script&lt;/i&gt;
&lt;script src=&quot;https://gist.github.com/alstat/35f8b83dcaacb98eb231.js&quot;&gt;&lt;/script&gt;
The first column above is the data &lt;code&gt;y&lt;/code&gt; and the second column is the prediction from Equation (\ref{eq:pred}). Thus, if we expand the predictions into an expectation plane, we have&lt;br/&gt;&lt;br/&gt;
&lt;div&gt;
    &lt;a href=&quot;https://plot.ly/~alstated/10/&quot; target=&quot;_blank&quot; title=&quot;&quot; style=&quot;display: block; text-align: center;&quot;&gt;&lt;img src=&quot;https://plot.ly/~alstated/10.png&quot; alt=&quot;&quot; style=&quot;max-width: 100%;width: 600px;&quot;  width=&quot;600&quot; onerror=&quot;this.onerror=null;this.src=&#39;https://plot.ly/404.png&#39;;&quot; /&gt;&lt;/a&gt;
    &lt;script data-plotly=&quot;alstated:10&quot;  src=&quot;https://plot.ly/embed.js&quot; async&gt;&lt;/script&gt;
&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/8907a7679a2a7ee8c13f.js&quot;&gt;&lt;/script&gt;
By the way, you have to rotate the plot to see the plane; I still can't figure out how to change the default view in Plotly. At this point we could proceed to compute other statistics, like the variance of the error, and so on, but I will leave that for you to explore. Our aim here is simply to understand what happens inside the internals of our software when we estimate the parameters of linear regression models.
&lt;br/&gt;&lt;br/&gt;
&lt;h3&gt;Reference&lt;/h3&gt;
&lt;ol&gt;&lt;li&gt;
Arnold, Steven F. (1981). &lt;i&gt;The Theory of Linear Models and Multivariate Analysis&lt;/i&gt;. Wiley.
&lt;/li&gt;
&lt;li&gt;
&lt;a href=&quot;http://web.stanford.edu/~mrosenfe/soc_meth_proj3/matrix_OLS_NYU_notes.pdf&quot; target = &quot;_blank&quot;&gt;OLS in Matrix Form&lt;/a&gt;
&lt;/li&gt;&lt;/ol&gt;</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/8984078873601215128/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/12/r-and-python-theory-of-linear-least.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/8984078873601215128'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/8984078873601215128'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/12/r-and-python-theory-of-linear-least.html' title='R and Python: Theory of Linear Least Squares'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-5640154164563165613</id><published>2015-08-17T10:38:00.001+08:00</published><updated>2015-08-24T17:29:00.409+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Data Mining"/><category scheme="http://www.blogger.com/atom/ns#" term="Interactive Visualization"/><category scheme="http://www.blogger.com/atom/ns#" term="Linear Models"/><category scheme="http://www.blogger.com/atom/ns#" term="Python"/><category scheme="http://www.blogger.com/atom/ns#" term="R"/><category scheme="http://www.blogger.com/atom/ns#" term="SAS"/><title type='text'>R, Python, and SAS: Getting Started with Linear Regression</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
Consider the linear regression model,
$$
y_i=f_i(\boldsymbol{x}|\boldsymbol{\beta})+\varepsilon_i,
$$
where $y_i$ is the &lt;i&gt;response&lt;/i&gt; or the &lt;i&gt;dependent&lt;/i&gt; variable at the $i$th case, $i=1,\cdots, N$ and the &lt;i&gt;predictor&lt;/i&gt; or the &lt;i&gt;independent&lt;/i&gt; variable is the $\boldsymbol{x}$ term defined in the mean function $f_i(\boldsymbol{x}|\boldsymbol{\beta})$. For simplicity, consider the following simple linear regression (SLR) model,
$$
y_i=\beta_0+\beta_1x_i+\varepsilon_i.
$$
To obtain the best estimates of $\beta_0$ and $\beta_1$, we minimize the residual sum of squares (RSS) given by,
$$
S=\sum_{i=1}^{n}\varepsilon_i^2=\sum_{i=1}^{n}(y_i-\beta_0-\beta_1x_i)^2.
$$
Now suppose we want to fit the model to the following data, &lt;b&gt;Average Heights and Weights for American Women&lt;/b&gt;, where &lt;i&gt;weight&lt;/i&gt; is the response and &lt;i&gt;height&lt;/i&gt; is the predictor. The data is available in R by default.
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/a05a87bfe2140c230cfe.js&quot;&gt;&lt;/script&gt;
The following is the plot of the residual sum of squares based on the SLR model over $\beta_0$ and $\beta_1$; note that we standardized the variables before plotting,
&lt;div&gt;
    &lt;a href=&quot;https://plot.ly/~alstated1a61/479/&quot; target=&quot;_blank&quot; title=&quot;Error Surface&quot; style=&quot;display: block; text-align: center;&quot;&gt;&lt;img src=&quot;https://plot.ly/~alstated1a61/479.png&quot; alt=&quot;Error Surface&quot; style=&quot;max-width: 100%;width: 500px;&quot;  width=&quot;500&quot; onerror=&quot;this.onerror=null;this.src=&#39;https://plot.ly/404.png&#39;;&quot; /&gt;&lt;/a&gt;
    &lt;script data-plotly=&quot;alstated1a61:479&quot;  src=&quot;https://plot.ly/embed.js&quot; async&gt;&lt;/script&gt;
&lt;/div&gt;
If you are interested in the code for the above figure, please click &lt;a href=&quot;https://gist.github.com/alstat/9833b04843449747ba1b&quot; target = &quot;_blank&quot;&gt;here&lt;/a&gt;.
To minimize this elliptic paraboloid, we differentiate with respect to the parameters, set the derivatives to zero to obtain the stationary point, and finally solve for $\beta_0$ and $\beta_1$. For more on the derivation of the parameter estimates, see reference 1.&lt;br/&gt;&lt;br/&gt;
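The resulting closed-form solutions are $\hat{\beta}_1=S_{xy}/S_{xx}$ and $\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}$, where $S_{xx}$ and $S_{xy}$ are the sums of squared deviations and cross-deviations. We can check them directly on the women data with a short Python sketch (the dataset's values are typed in by hand here):

```python
# Average Heights and Weights for American Women (R's built-in dataset)
height = list(range(58, 73))
weight = [115, 117, 120, 123, 126, 129, 132, 135, 139,
          142, 146, 150, 154, 159, 164]

n = len(height)
xbar = sum(height) / n
ybar = sum(weight) / n
sxx = sum((x - xbar) ** 2 for x in height)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(height, weight))

b1 = sxy / sxx           # slope
b0 = ybar - b1 * xbar    # intercept
print(b0, b1)            # matches the lm output discussed below
```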
&lt;h3&gt;
Simple Linear Regression in R&lt;/h3&gt;
In R, we can fit the model using the &lt;code&gt;lm&lt;/code&gt; function, which stands for linear models, i.e.&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/30459979336fc95974da.js&quot;&gt;&lt;/script&gt;
The formula, written above as &lt;code&gt;{response ~ predictor}&lt;/code&gt;, is a handy way of specifying the model to fit in R. Mathematically, our model is
$$
weight = \beta_0 + \beta_1 (height) + \varepsilon.
$$
Its summary is obtained by running &lt;code&gt;model %&gt;% summary&lt;/code&gt; or, for non-magrittr users, &lt;code&gt;summary(model)&lt;/code&gt;, given the &lt;code&gt;model&lt;/code&gt; object defined in the previous code,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/bb4ca0b911c335591c60.js&quot;&gt;&lt;/script&gt;
The Coefficients section above returns the estimated coefficients of the model: $\hat{\beta}_0 = -87.51667$ and $\hat{\beta}_1=3.45000$ (it should be clear that we used the unstandardized variables to obtain these estimates). Both estimates are significant, with p-values below the .05 and even the .01 level of the test. Using the estimated coefficients along with the residual standard error, we can now construct the fitted line and its confidence interval as shown below.
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDImaXQJOaEM8X54ZbNDVUq4xGXADZHF4K44UYvODiDxSmDNxRrDlPVNLxw761xjK1snwI8xbQ0j-uwiZniOWPvRY0BugWCyrv6rapgzuMFpmIsMQToFB620Ck_0h9kF94XwxKke3gkAnj/s1600/Rplot02.png&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Fig 1. Plot of the Data and the Predicted Values in R.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;script src=&quot;https://gist.github.com/alstat/f40b85ca7bb0799db221.js&quot;&gt;&lt;/script&gt;
&lt;h3&gt;
Simple Linear Regression in Python&lt;/h3&gt;
In Python, two modules have implementations of linear regression modelling: one is in &lt;a href=&quot;http://scikit-learn.org/stable/index.html&quot; target=&#39;_blank&#39;&gt;scikit-learn&lt;/a&gt; (&lt;code&gt;sklearn&lt;/code&gt;) and the other is in &lt;a href=&quot;http://statsmodels.sourceforge.net/&quot; target = &#39;_blank&#39;&gt;Statsmodels&lt;/a&gt; (&lt;code&gt;statsmodels&lt;/code&gt;). For example, we can model the above data using &lt;code&gt;sklearn&lt;/code&gt; as follows:
&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/e23b7fd39ae3772852a4.js&quot;&gt;&lt;/script&gt;
The above output is the estimate of the parameters. To obtain the predicted values and plot these along with the data points as we did in R, we can wrap the functions above into a class, say &lt;code&gt;linear_regression&lt;/code&gt;, that requires the &lt;a href=&quot;http://stanford.edu/~mwaskom/software/seaborn/&quot;&gt;Seaborn&lt;/a&gt; package for neat plotting; see the codes below:&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/ea8747cd0cfe81a07e56.js&quot;&gt;&lt;/script&gt;
Using this class and its methods, fitting the model to the data is coded as follows:
&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/1146870dd5ec8c1a5256.js&quot;&gt;&lt;/script&gt;
The predicted values of the data points are obtained using the &lt;code&gt;predict&lt;/code&gt; method,
&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/684f3a9869aa1b377603.js&quot;&gt;&lt;/script&gt;
Figure 2 below shows the plot of the predicted values along with their confidence interval and the data points.
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimIGa4H-5AZMJjusg43s_F1eh7wMZSEhVFsUm73g3pnJ53hriu0AKADvgVvMkunZq5qgRuewyGIZNNmzhw4ue_C8j6I64SZsBLhA8KsUZx95KwJ-_b9gNF6-uHA38P_HirC4XHnw0jEiJU/s1600/plot1.png&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Fig 2. Plot of the Data and the Predicted Values in Python.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;script src=&quot;https://gist.github.com/alstat/e5e68803df06fff20852.js&quot;&gt;&lt;/script&gt;
If one is only interested in the estimates of the model, then &lt;code&gt;LinearRegression&lt;/code&gt; of scikit-learn is sufficient; but if other statistics, such as those returned in the R model summary, are of interest, the said module can still do the job, though one might need to program the additional routines. &lt;code&gt;statsmodels&lt;/code&gt;, on the other hand, returns a complete summary of the fitted model, comparable to the R output above, which is useful for studies with particular interest in this information. Modelling the data using simple linear regression is then done as follows:&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/ee2bc1a96c584aaeb3ee.js&quot;&gt;&lt;/script&gt;
Clearly, we can save time with statsmodels, especially in diagnostic checking involving test statistics such as the &lt;a href=&quot;https://en.wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic&quot;&gt;Durbin-Watson&lt;/a&gt; and &lt;a href=&quot;https://en.wikipedia.org/wiki/Jarque%E2%80%93Bera_test&quot;&gt;Jarque-Bera&lt;/a&gt; tests. We can of course add some diagnostic plotting, but I prefer to discuss that in a separate entry.&lt;br/&gt;&lt;br/&gt;
&lt;h3&gt;
Simple Linear Regression in SAS&lt;/h3&gt;
Now let's consider running the data in SAS. I am using SAS Studio, and in order to import the data, I first saved it as a CSV file with columns height and weight, then uploaded it to SAS Studio. The following codes import the data.&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/69f1ba19db43e16aff9f.js&quot;&gt;&lt;/script&gt;
Next we fit the model to the data using the &lt;code&gt;REG&lt;/code&gt; procedure,&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/f4ff898dc4c275d570e2.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Number of Observations Read&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Number of Observations Used&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;15&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;6&quot; scope=&quot;colgroup&quot;&gt;Analysis of Variance&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;b header&quot; scope=&quot;col&quot;&gt;Source&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;DF&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Sum of&lt;br&gt;Squares&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Mean&lt;br&gt;Square&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;F Value&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Pr&amp;nbsp;&amp;gt;&amp;nbsp;F&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Model&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3332.70000&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3332.70000&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1433.02&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;lt;.0001&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Error&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;13&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;30.23333&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2.32564&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Corrected Total&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;14&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3362.93333&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Root MSE&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;1.52501&lt;/td&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;R-Square&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;0.9910&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Dependent Mean&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;136.73333&lt;/td&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Adj R-Sq&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;0.9903&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Coeff Var&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;1.11531&lt;/td&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;&amp;nbsp;&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;6&quot; scope=&quot;colgroup&quot;&gt;Parameter Estimates&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;b header&quot; scope=&quot;col&quot;&gt;Variable&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;DF&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Parameter&lt;br&gt;Estimate&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Standard&lt;br&gt;Error&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;t&amp;nbsp;Value&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Pr&amp;nbsp;&amp;gt;&amp;nbsp;|t|&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Intercept&lt;/th&gt;
&lt;th class=&quot;r data&quot;&gt;1&lt;/th&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-87.51667&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;5.93694&lt;/td&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-14.74&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;lt;.0001&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;height&lt;/th&gt;
&lt;th class=&quot;r data&quot;&gt;1&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;3.45000&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0.09114&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;37.86&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;lt;.0001&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg7nJ6aFRjJxNzKG3dIAK8Xr3igJ2soY9-BHlDbIRPKuz6Ps07U-KbY-IEmd7tYEHgU6CAtp99h8SwfXkLurGsNdyz6RSiQRQ090mxtA_9E-5DDNIV5DtLPf5dsQ2Sj0ODZkFDd78giQP6d/s1600/p1.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg7nJ6aFRjJxNzKG3dIAK8Xr3igJ2soY9-BHlDbIRPKuz6Ps07U-KbY-IEmd7tYEHgU6CAtp99h8SwfXkLurGsNdyz6RSiQRQ090mxtA_9E-5DDNIV5DtLPf5dsQ2Sj0ODZkFDd78giQP6d/s400/p1.png&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvHA3voiF4-9sLjy6fO4Zh0syrc0A34w0S94dZLNLxNJaYnQ_frNnQgqVqCq_DSjcgrtICRwMCCI27bk1Vc50cgjBUo7wkkPLuBgjAiCOoe_3QDDUylLSeDIha0_edMwpXEIX3txwqRRPI/s1600/p2.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvHA3voiF4-9sLjy6fO4Zh0syrc0A34w0S94dZLNLxNJaYnQ_frNnQgqVqCq_DSjcgrtICRwMCCI27bk1Vc50cgjBUo7wkkPLuBgjAiCOoe_3QDDUylLSeDIha0_edMwpXEIX3txwqRRPI/s400/p2.png&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgf7ICwXKHJtMls3RpLlGUbl9lIvbWrTwTOfNAIYP8k4ymWxoVZwq-J4GdPbs0cPcKqJAI-LO2R2fe-SDUkW4B07Ii8sMcRFbQ89DCSGcL9YszMFxSitxXANDWgoklZuEnjC3D0d9gPzXJn/s1600/p3.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgf7ICwXKHJtMls3RpLlGUbl9lIvbWrTwTOfNAIYP8k4ymWxoVZwq-J4GdPbs0cPcKqJAI-LO2R2fe-SDUkW4B07Ii8sMcRFbQ89DCSGcL9YszMFxSitxXANDWgoklZuEnjC3D0d9gPzXJn/s400/p3.png&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;/center&gt;
Now that's a lot of output, probably the complete one. But like I said, I am not going to discuss each of these values and plots, as some of them are used for diagnostic checking (you can read more on that in reference 1 and in other applied linear regression books). For now, let's just confirm the coefficients obtained: both estimates are the same as those in R and Python.
&lt;br/&gt;&lt;br/&gt;
&lt;h3&gt;Multiple Linear Regression (MLR)&lt;/h3&gt;
To extend SLR to MLR, we'll demonstrate by simulation. Using the formula-based &lt;code&gt;lm&lt;/code&gt; function of R, and assuming we have $x_1$ and $x_2$ as our predictors, the following is how we do MLR in R:&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/d0ba72899808667ec59f.js&quot;&gt;&lt;/script&gt; 
Although we did not use an intercept in simulating the data, the obtained estimates for $\beta_1$ and $\beta_2$ are close to the true parameters (.35 and .56). The intercept, however, helps capture the noise term we added in the simulation. &lt;br/&gt;&lt;br/&gt;
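The same MLR estimation can also be sketched with plain NumPy least squares (my own seed, variable names, and noise scale; only the true slopes .35 and .56 come from the setup above):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# true slopes .35 and .56, no intercept, plus a small Gaussian noise term
y = 0.35 * x1 + 0.56 * x2 + rng.normal(scale=0.1, size=n)

# design matrix with an intercept column, as lm includes by default
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # intercept near 0, slopes near 0.35 and 0.56
```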
Next we&#39;ll try MLR in Python using statsmodels, consider the following:&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/0bdfe480e4e2aa3feeb4.js&quot;&gt;&lt;/script&gt;
It should be noted that the estimates in R and in Python need not be the same, since these are simulated values from different software. Finally, we can perform MLR in SAS as follows:&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/6244ad3429c96c53fadc.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Number of Observations Read&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Number of Observations Used&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;6&quot; scope=&quot;colgroup&quot;&gt;Analysis of Variance&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;b header&quot; scope=&quot;col&quot;&gt;Source&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;DF&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Sum of&lt;br&gt;Squares&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Mean&lt;br&gt;Square&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;F Value&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Pr&amp;nbsp;&amp;gt;&amp;nbsp;F&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Model&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;610.86535&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;305.43268&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;303.88&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;lt;.0001&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Error&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;97&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;97.49521&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1.00511&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Corrected Total&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;99&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;708.36056&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Root MSE&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;1.00255&lt;/td&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;R-Square&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;0.8624&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Dependent Mean&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;244.07327&lt;/td&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Adj R-Sq&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;0.8595&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Coeff Var&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;0.41076&lt;/td&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;&amp;nbsp;&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;6&quot; scope=&quot;colgroup&quot;&gt;Parameter Estimates&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;b header&quot; scope=&quot;col&quot;&gt;Variable&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;DF&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Parameter&lt;br&gt;Estimate&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Standard&lt;br&gt;Error&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;t&amp;nbsp;Value&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Pr&amp;nbsp;&amp;gt;&amp;nbsp;|t|&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;Intercept&lt;/th&gt;
&lt;th class=&quot;r data&quot;&gt;1&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;18.01299&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;11.10116&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1.62&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0.1079&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;X1&lt;/th&gt;
&lt;th class=&quot;r data&quot;&gt;1&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;0.31770&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0.01818&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;17.47&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;lt;.0001&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;rowheader&quot; scope=&quot;row&quot;&gt;X2&lt;/th&gt;
&lt;th class=&quot;r data&quot;&gt;1&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;0.58276&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0.03358&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;17.35&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&amp;lt;.0001&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWfDi3dta-1ujEZ6oZA490kUp7f261UCJ9Wf57Ycshq4FvU8qQMf2Y-C3cgMQFFsGvrMD4MRE09ZM8r-hjeARBqs4OwVyKm4ZLQlYjDaBjPNImfJSfqfn1SXVOL6xjha6ap2rHXUYtCVFd/s1600/v1.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWfDi3dta-1ujEZ6oZA490kUp7f261UCJ9Wf57Ycshq4FvU8qQMf2Y-C3cgMQFFsGvrMD4MRE09ZM8r-hjeARBqs4OwVyKm4ZLQlYjDaBjPNImfJSfqfn1SXVOL6xjha6ap2rHXUYtCVFd/s400/v1.png&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhY3aA7XHfmbUV06ItoKLoQbNzMXc9cCixImST5HsmvhPpaKquJ6M13BpZ8JU5TRk80awEYUB35cbDbS11CvhSNp9-e6kSKzB9ywP0Pe6lIwTjZls4GjhSNZH2sgvAB4gZMBk-3qFQv2anH/s1600/v2.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhY3aA7XHfmbUV06ItoKLoQbNzMXc9cCixImST5HsmvhPpaKquJ6M13BpZ8JU5TRk80awEYUB35cbDbS11CvhSNp9-e6kSKzB9ywP0Pe6lIwTjZls4GjhSNZH2sgvAB4gZMBk-3qFQv2anH/s400/v2.png&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;/center&gt;
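For readers without a SAS license, the same kind of summary can be approximated in a few lines of Python. The sketch below is purely illustrative: the data are re-simulated with hypothetical coefficients (18, 0.32, 0.58, chosen to echo the parameter estimates above), so its numbers will not reproduce the SAS table exactly.

```python
import numpy as np

# Hand-rolled OLS via least squares, mirroring the quantities PROC REG reports.
# The data here are hypothetical stand-ins, not the post's simulated dataset.
rng = np.random.default_rng(123)
n = 100
X1 = rng.normal(500.0, 30.0, n)
X2 = rng.normal(120.0, 15.0, n)
y = 18.0 + 0.32 * X1 + 0.58 * X2 + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), X1, X2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # parameter estimates
resid = y - X @ beta
ss_err = resid @ resid                          # Error sum of squares
ss_tot = ((y - y.mean()) ** 2).sum()            # Corrected total SS
r2 = 1 - ss_err / ss_tot                        # R-Square
root_mse = np.sqrt(ss_err / (n - 3))            # Root MSE (error DF = 97)
print(beta, r2, root_mse)
```

Standard errors, t values, and the ANOVA F test can be built from the same pieces, which is essentially what statsmodels in Python and summary(lm(...)) in R automate.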
&lt;h3&gt;Conclusion&lt;/h3&gt;
In conclusion, SAS saves a lot of work, since it returns a complete summary of the model by default; no doubt this, along with its active customer support, is why companies prefer it. R and Python, on the other hand, being open-source, can compete well with SAS, although reproducing all of the SAS output requires programming skill. I think that&#39;s the exciting part, though: it makes you think and manage your time, and achieving the same results in R and Python is fulfilling. Hope you&#39;ve learned something; feel free to share your thoughts in the comments below.&lt;br/&gt;&lt;br/&gt;
&lt;h3&gt;
Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
Draper, N. R. and Smith, H. (1966). Applied Regression Analysis. John Wiley &amp; Sons, Inc. United States of America.
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://scikit-learn.org/stable/documentation.html&quot; target = &quot;_blank&quot;&gt;Scikit-learn Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://statsmodels.sourceforge.net/documentation.html&quot; target =&quot;_blank&quot;&gt;Statsmodels Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://support.sas.com/documentation/&quot; target = &quot;_blank&quot;&gt;SAS Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.amazon.com/The-Little-SAS-Book-Edition-ebook/dp/B00B29H9HU/ref=pd_sim_kstore_3?ie=UTF8&amp;refRID=1831EFXWNWTBVA83EMDY&quot; target = &quot;_blank&quot;&gt;Delwiche, Lora D., and Susan J. Slaughter. 2012. The Little SAS® Book: A Primer, Fifth Edition. Cary, NC: SAS Institute Inc.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.ats.ucla.edu/stat/sas/webbooks/reg/chapter1/sasreg1.htm&quot; target = &quot;_blank&quot;&gt;Regression with SAS. Institute for Digital Research and Education. UCLA. Retrieved August 13, 2015.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://plot.ly/python/getting-started/&quot;&gt;Python Plotly Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;style&gt;
.header {
    background-color: #EDF2F9;
    border-color: #B0B7BB;
    border-style: solid;
    border-width: 0px 1px 1px 0px;
    color: #127;
    font-family: Arial,&quot;Albany AMT&quot;,Helvetica,Helv;
    font-size: x-small;
    font-style: normal;
    font-weight: bold;
    padding: 2px 5px 2px 5px;
}


.rowheader {
    background-color: #EDF2F9;
    border-color: #B0B7BB;
    border-style: solid;
    border-width: 0px 1px 1px 0px;
    color: #127;
    font-family: Arial,&quot;Albany AMT&quot;,Helvetica,Helv;
    font-size: x-small;
    font-style: normal;
    font-weight: bold;
    text-align: center;
    padding: 2px 5px 2px 5px;
}


.data, .dataemphasis {
    background-color: #FFF;
    border-color: #C1C1C1;
    border-style: solid;
    border-width: 0px 1px 1px 0px;
    font-family: Arial,&quot;Albany AMT&quot;,Helvetica,Helv;
    font-size: x-small;
    font-style: normal;
    font-weight: normal;
    text-align: right;
    padding: 2px 5px 2px 5px;
}

.table {
    border-color: #C1C1C1;
    border-style: solid;
    border-width: 1px 1px 1px 1px;
    border-collapse: collapse;
    border-spacing: 0px;
    padding: 5px 5px 5px 5px;
    margin-bottom: 1em;
}

.body {
    color: #000;
    font-family: Arial,&quot;Albany AMT&quot;,Helvetica,Helv;
    font-size: x-small;
    font-style: normal;
    font-weight: normal;
    line-height: 1.231;
}
&lt;/style&gt;
&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/5640154164563165613/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/08/r-python-and-sas-getting-started-with.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/5640154164563165613'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/5640154164563165613'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/08/r-python-and-sas-getting-started-with.html' title='R, Python, and SAS: Getting Started with Linear Regression'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDImaXQJOaEM8X54ZbNDVUq4xGXADZHF4K44UYvODiDxSmDNxRrDlPVNLxw761xjK1snwI8xbQ0j-uwiZniOWPvRY0BugWCyrv6rapgzuMFpmIsMQToFB620Ck_0h9kF94XwxKke3gkAnj/s72-c/Rplot02.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-1597733395520390528</id><published>2015-07-21T07:05:00.000+08:00</published><updated>2015-07-21T07:05:27.000+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Parametric Inference"/><category scheme="http://www.blogger.com/atom/ns#" term="R"/><title type='text'>Parametric Inference: Karlin-Rubin Theorem</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
&lt;div class=&quot;definition&quot;&gt;
A family of pdfs or pmfs $\{g(t|\theta):\theta\in\Theta\}$ for a univariate random variable $T$ with real-valued parameter $\theta$ has a &lt;i&gt;monotone likelihood ratio&lt;/i&gt; (MLR) if, for every $\theta_2&amp;gt;\theta_1$, $g(t|\theta_2)/g(t|\theta_1)$ is a monotone (nonincreasing or nondecreasing) function of $t$ on $\{t:g(t|\theta_1)&amp;gt;0\;\text{or}\;g(t|\theta_2)&amp;gt;0\}$. Note that $c/0$ is defined as $\infty$ if $0&amp;lt; c$.
&lt;/div&gt;
&lt;div class=&quot;theorem&quot; style=&quot;content: &#39;Karlin-Rubin Theorem.&#39;;&quot;&gt;
Consider testing $H_0:\theta\leq \theta_0$ versus $H_1:\theta&amp;gt;\theta_0$. Suppose that $T$ is a sufficient statistic for $\theta$ and the family of pdfs or pmfs $\{g(t|\theta):\theta\in\Theta\}$ of $T$ has an MLR. Then for any $t_0$, the test that rejects $H_0$ if and only if $T &amp;gt;t_0$ is a UMP level $\alpha$ test, where $\alpha=P_{\theta_0}(T &amp;gt;t_0)$.
&lt;/div&gt;
&lt;b&gt;Example 1&lt;/b&gt;&lt;br/&gt;
To better understand the theorem, consider a single observation, $X$, from $\mathrm{n}(\theta,1)$, and test the following hypotheses:
$$
H_0:\theta\leq \theta_0\quad\mathrm{versus}\quad H_1:\theta&amp;gt;\theta_0.
$$
Take $\theta_1&amp;gt;\theta_0$; then the likelihood ratio test statistic is
$$
\lambda(x)=\frac{f(x|\theta_1)}{f(x|\theta_0)}.
$$
The null hypothesis is rejected if $\lambda(x)&amp;gt;k$. To check whether the distribution of the sample has the MLR property, we simplify the ratio as follows:
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
$$
\begin{aligned}
\lambda(x)&amp;amp;=\frac{\frac{1}{\sqrt{2\pi}}\exp\left[-\frac{(x-\theta_1)^2}{2}\right]}{\frac{1}{\sqrt{2\pi}}\exp\left[-\frac{(x-\theta_0)^2}{2}\right]}\\
&amp;amp;=\exp
\left[-\frac{x^2-2x\theta_1+\theta_1^2}{2}+\frac{x^2-2x\theta_0+\theta_0^2}{2}\right]\\
&amp;amp;=\exp\left[\frac{2x\theta_1-\theta_1^2-2x\theta_0+\theta_0^2}{2}\right]\\
&amp;amp;=\exp\left[\frac{2x(\theta_1-\theta_0)-(\theta_1^2-\theta_0^2)}{2}\right]\\
&amp;amp;=\exp\left[x(\theta_1-\theta_0)\right]\times\exp\left[-\frac{\theta_1^2-\theta_0^2}{2}\right]
\end{aligned}
$$
which is increasing as a function of $x$, since $\theta_1&amp;gt;\theta_0$.
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiPqtDpMdd70cQJjLJU-1dYl2EYm31o0YhF8_8IsbQcMZIZ60jzJ3oex5ou6YTbvNZdwN4H6V4IuP-RYi_6pWhYVPrI4JMod_egy2ZfgxSk_zjZgBGjCXUz1PPgeSW0jJ1I5VOjeDiBbaZh/s1600/m1.png&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Figure 1. Normal Densities with $\mu=1,2$.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;script src=&quot;https://gist.github.com/alstat/d49f8ccc7482bc2a32df.js&quot;&gt;&lt;/script&gt;
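The monotonicity derived above can also be checked numerically. A quick sketch (Python here, standing in for the post's R gist) using the same hypothetical values as Figure 1, $\theta_0=1$ and $\theta_1=2$:

```python
import math

def lam(x, t0=1.0, t1=2.0):
    """Likelihood ratio f(x given theta1) / f(x given theta0) for N(theta, 1)."""
    def f(x, t):
        return math.exp(-(x - t) ** 2 / 2) / math.sqrt(2 * math.pi)
    return f(x, t1) / f(x, t0)

xs = [i / 10 for i in range(-30, 31)]
ratios = [lam(x) for x in xs]

# The ratio is strictly increasing in x, matching the closed form
# exp[x(t1 - t0)] * exp[-(t1^2 - t0^2)/2] derived above.
print(all(b > a for a, b in zip(ratios, ratios[1:])))   # True
```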
As an illustration, consider Figure 1. The plot of the likelihood ratio of these models is monotone increasing, as seen in Figure 2, where rejecting $H_0$ if $\lambda(x)&amp;gt;k$ is equivalent to rejecting it if $T\geq t_0$.
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCONTh9cNqxbETGNiZWbAzhLUh4nRRoMI4JW_n4cTrtkrBK-xcmI1rzqpWWt4OIUD4ikweapnlFfpRRdALTm2RlGNWz1KMNbtFhpyIY5AahtW4nmGfKdJa1SSEw1qUVWkHaguwqlzaV86U/s1600/m2.png&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Figure 2. Likelihood Ratio of the Normal Densities.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;script src=&quot;https://gist.github.com/alstat/bbf49740f68f40719c08.js&quot;&gt;&lt;/script&gt;
By the factorization theorem, the likelihood ratio test statistic can be written as a function of the sufficient statistic, since the term $h(x)$ cancels. That is,
$$
\lambda(t)=\frac{g(t|\theta_1)}{g(t|\theta_0)}.
$$
By the Karlin-Rubin theorem, the test with rejection region $R=\{t:t&amp;gt;t_0\}$ is a uniformly most powerful level-$\alpha$ test, where $t_0$ satisfies the following:
$$
\begin{aligned}
\mathrm{P}(T&amp;gt;t_0|\theta_0)&amp;amp;=\mathrm{P}(T\in R|\theta_0)\\
\alpha&amp;amp;=1-\mathrm{P}(X\leq t_0|\theta_0)\\
1-\alpha&amp;amp;=\int_{-\infty}^{t_0}\frac{1}{\sqrt{2\pi}}\exp\left[-\frac{(x-\theta_0)^2}{2}\right]\operatorname{d}x
\end{aligned}
$$
Hence $t_0=\theta_0+z_{\alpha}$, where $z_{\alpha}$ is the $(1-\alpha)$ quantile of the standard normal, and thus we reject $H_0$ if $T&amp;gt;\theta_0+z_{\alpha}$.&lt;br/&gt;&lt;br/&gt;
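As a small numerical check (assuming $\theta_0=0$ and $\alpha=.05$), the cutoff $t_0$ is just a standard normal quantile, available from Python's standard library:

```python
from statistics import NormalDist

alpha = 0.05
theta0 = 0.0
z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper-tail quantile, about 1.645
t0 = theta0 + z_alpha                       # reject H0 when T exceeds t0
print(round(t0, 3))                         # 1.645
```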
&lt;b&gt;Example 2&lt;/b&gt;&lt;br/&gt;
Now consider testing the hypotheses $H_0:\theta\geq \theta_0$ versus $H_1:\theta&amp;lt; \theta_0$ using a single observation $X$ from Beta($\theta$, 2); to be specific, let $\theta_0=4$ and $\theta_1=3$. Can we apply Karlin-Rubin?
Of course! Visually, we have something like Figure 3. 
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjI3mKWraNNNHa2TvrfE7sDorZ3gFTfYa8q-2_1E1xIfVumHKi58E5KbVl8p_wClOBfqnXs6D75pvB_JO4175-9Qn8GP4FIN25tXgnLujRIoIPEthXkrkV5B1I0IdKScXp9lybEj_YR0_Rm/s1600/m3.png&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Figure 3. Beta Densities Under Different Parameters.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;script src=&quot;https://gist.github.com/alstat/bf0d0bae4ad61c0b1985.js&quot;&gt;&lt;/script&gt;
Note that for this test, $\theta_1&amp;lt;\theta_0$, and so the likelihood ratio test statistic simplifies as follows:
$$
\begin{aligned}
\lambda(x)&amp;amp;=\frac{f(x|\theta_1=3, 2)}{f(x|\theta_0=4, 2)}=\frac{\displaystyle\frac{\Gamma(\theta_1+2)}{\Gamma(\theta_1)\Gamma(2)}x^{\theta_1-1}(1-x)^{2-1}}{\displaystyle\frac{\Gamma(\theta_0+2)}{\Gamma(\theta_0)\Gamma(2)}x^{\theta_0-1}(1-x)^{2-1}}\\
&amp;amp;=\frac{\displaystyle\frac{\Gamma(5)}{\Gamma(3)\Gamma(2)}x^{2}(1-x)}{\displaystyle\frac{\Gamma(6)}{\Gamma(4)\Gamma(2)}x^{3}(1-x)}=\frac{\displaystyle\frac{12\Gamma(3)}{\Gamma(3)\Gamma(2)}x^{2}(1-x)}{\displaystyle\frac{20\Gamma(4)}{\Gamma(4)\Gamma(2)}x^{3}(1-x)}\\
&amp;amp;=\frac{3}{5x},
\end{aligned}
$$
which is decreasing as a function of $x$; see the plot in Figure 4. Thus $H_0$ is rejected if $\lambda(x) &gt; k$, equivalently if $T &lt; t_0$, where $t_0$ satisfies the following:
$$
\begin{aligned}
\mathrm{P}(T &lt; t_0|\theta_0)&amp;=\mathrm{P}(X &lt; t_0|\theta_0)\\
\alpha&amp;=\int_{0}^{t_0}\frac{\Gamma(\theta_0+2)}{\Gamma(\theta_0)\Gamma(2)}x^{\theta_0-1}(1-x)^{2-1}\operatorname{d}x\\
\alpha&amp;=\int_{0}^{t_0}\frac{\Gamma(6)}{\Gamma(4)\Gamma(2)}x^{3}(1-x)\operatorname{d}x.
\end{aligned}
$$
Hence $t_0$ is the $\alpha$ quantile of the Beta($4,2$) distribution, $x_{\alpha}=t_0$, and thus we reject $H_0$ if $T &lt; x_{\alpha}$.
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiLiH2EYgwqTX-dd99U1bpdqhhLxttnKuvruxqr-qcKcGCAQergyB0YJAKp6mY3A0tOtr4QTM9WZ7eWJJphgEGEgZN_bm0cTIX1ABMV8fpFBIfffrAL6OR6uBBSCwrpqHf1ESN73iAEKath/s1600/m4.png&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Figure 4. Likelihood Ratio of the Beta Densities.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;script src=&quot;https://gist.github.com/alstat/87844496a84cdbc56171.js&quot;&gt;&lt;/script&gt;
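The cutoff here needs no special functions either: for Beta($4,2$) the CDF integrates in closed form to $F(t)=5t^4-4t^5$, so $t_0$ can be found by bisection. A minimal sketch, assuming $\alpha=.05$:

```python
def beta42_cdf(t):
    # CDF of Beta(4, 2): the integral of 20 x^3 (1 - x) from 0 to t
    return 5 * t ** 4 - 4 * t ** 5

def beta42_quantile(alpha):
    # Bisection works because the CDF is increasing on [0, 1].
    lo, hi = 0.0, 1.0
    while hi - lo > 1e-12:
        mid = (lo + hi) / 2
        if alpha > beta42_cdf(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t0 = beta42_quantile(0.05)   # reject H0 when T falls below t0
print(round(t0, 4))
```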
&lt;/div&gt;
&lt;h3&gt;
Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;a href=&quot;http://www.amazon.com/Statistical-Inference-George-Casella/dp/0534243126&quot; target=&quot;_blank&quot;&gt;Casella, G. and Berger, R.L. (2001). &lt;i&gt;Statistical Inference&lt;/i&gt;. Thomson Learning, Inc.&lt;/a&gt; 
&lt;/li&gt;
&lt;/ol&gt;
</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/1597733395520390528/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/07/parametric-inference-karlin-rubin.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/1597733395520390528'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/1597733395520390528'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/07/parametric-inference-karlin-rubin.html' title='Parametric Inference: Karlin-Rubin Theorem'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiPqtDpMdd70cQJjLJU-1dYl2EYm31o0YhF8_8IsbQcMZIZ60jzJ3oex5ou6YTbvNZdwN4H6V4IuP-RYi_6pWhYVPrI4JMod_egy2ZfgxSk_zjZgBGjCXUz1PPgeSW0jJ1I5VOjeDiBbaZh/s72-c/m1.png" height="72" width="72"/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-3643474548584730338</id><published>2015-05-23T16:30:00.002+08:00</published><updated>2015-05-23T16:42:51.183+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Parametric Inference"/><category scheme="http://www.blogger.com/atom/ns#" term="R"/><title type='text'>Parametric Inference: Likelihood Ratio Test Problem 2</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
More on Likelihood Ratio Test, the following problem is originally from Casella and Berger (2001), exercise 8.12.&lt;br/&gt;&lt;br/&gt;
&lt;h3&gt;
Problem&lt;/h3&gt;
For samples of size $n=1,4,16,64,100$ from a normal population with mean $\mu$ and known variance $\sigma^2$, plot the power function of the following LRTs (Likelihood Ratio Tests). Take $\alpha = .05$.
&lt;ol type=&quot;a&quot;&gt;
&lt;li&gt;
$H_0:\mu\leq 0$ versus $H_1:\mu&amp;gt;0$&lt;/li&gt;
&lt;li&gt;$H_0:\mu=0$ versus $H_1:\mu\neq 0$&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
Solution&lt;/h3&gt;
&lt;ol type=&quot;a&quot;&gt;
&lt;li&gt;
The LRT statistic is given by
$$
\lambda(\mathbf{x})=\frac{\displaystyle\sup_{\mu\leq 0}\mathcal{L}(\mu|\mathbf{x})}{\displaystyle\sup_{-\infty&amp;lt;\mu&amp;lt;\infty}\mathcal{L}(\mu|\mathbf{x})}, \;\text{since }\sigma^2\text{ is known}.
$$
The denominator can be expanded as follows:
$$
\begin{aligned}
\sup_{-\infty&amp;lt;\mu&amp;lt;\infty}\mathcal{L}(\mu|\mathbf{x})&amp;amp;=\sup_{-\infty&amp;lt;\mu&amp;lt;\infty}\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right]\\
&amp;amp;=\sup_{-\infty&amp;lt;\mu&amp;lt;\infty}\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\displaystyle\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}\right]\\
&amp;amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\displaystyle\sum_{i=1}^{n}\frac{(x_i-\bar{x})^2}{2\sigma^2}\right],\\
&amp;amp;\quad\text{since }\bar{x}\text{ is the MLE of }\mu.\\
&amp;amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{n-1}{n-1}\displaystyle\sum_{i=1}^{n}\frac{(x_i-\bar{x})^2}{2\sigma^2}\right]\\
&amp;amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{(n-1)s^2}{2\sigma^2}\right],\\
\end{aligned}
$$
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
while the numerator is evaluated as follows:
$$
\begin{aligned}
\sup_{\mu\leq 0}\mathcal{L}(\mu|\mathbf{x})&amp;amp;=\sup_{\mu\leq 0}\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right]\\
&amp;amp;=\sup_{\mu\leq 0}\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\displaystyle\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}\right].
\end{aligned}
$$
The expression above attains its maximum when the sum inside the exponential is smallest. The unconstrained maximizer is $\bar{x}$; when $\bar{x}&amp;gt;0$, the sum $\sum_i(x_i-\mu)^2$ grows as $\mu$ moves further below $\bar{x}$, so over $\mu\leq 0$ the supremum is attained at the boundary $\mu=\mu_0=0$. Hence, 
$$
\begin{aligned}
\sup_{\mu\leq 0}\mathcal{L}(\mu|\mathbf{x})&amp;amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\displaystyle\sum_{i=1}^{n}\frac{(x_i-\mu_0)^2}{2\sigma^2}\right]\\
=\frac{1}{(2\pi\sigma^2)^{n/2}}&amp;amp;\exp\left[-\displaystyle\sum_{i=1}^{n}\frac{(x_i-\bar{x}+\bar{x}-\mu_0)^2}{2\sigma^2}\right]\\
=\frac{1}{(2\pi\sigma^2)^{n/2}}&amp;amp;\exp\left\{-\displaystyle\sum_{i=1}^{n}\left[\frac{(x_i-\bar{x})^2+2(x_i-\bar{x})(\bar{x}-\mu_0)+(\bar{x}-\mu_0)^2}{2\sigma^2}\right]\right\}\\
=\frac{1}{(2\pi\sigma^2)^{n/2}}&amp;amp;\exp\left[-\frac{(n-1)s^2+n(\bar{x}-\mu_0)^2}{2\sigma^2}\right], \\
&amp;amp;\text{since the middle term is 0.}\\
=\frac{1}{(2\pi\sigma^2)^{n/2}}&amp;amp;\exp\left[-\frac{(n-1)s^2+n\bar{x}^2}{2\sigma^2}\right], \text{since }\mu_0=0.\\
\end{aligned}
$$
So that
$$
\begin{equation}
\label{eq:lrtre}
\begin{aligned}
\lambda(\mathbf{x})&amp;amp;=\frac{\frac{1}{(2\pi\sigma^2)^{1/n}}\exp\left[-\frac{(n-1)s^2+n\bar{x}^2}{2\sigma^2}\right]}{\frac{1}{(2\pi\sigma^2)^{1/n}}\exp\left[-\frac{(n-1)s^2}{2\sigma^2}\right]}\\
&amp;amp;=\exp\left[-\frac{n\bar{x}^2}{2\sigma^2}\right].\\
\end{aligned}
\end{equation}
$$
And we reject the null hypothesis if $\lambda(\mathbf{x})\leq c$, that is
$$
\begin{aligned}
\exp\left[-\frac{n\bar{x}^2}{2\sigma^2}\right]&amp;amp;\leq c\\
-\frac{n\bar{x}^2}{2\sigma^2}&amp;amp;\leq \log c\\
\frac{\lvert\bar{x}\rvert}{\sigma/\sqrt{n}}&amp;amp;\geq\sqrt{-2\log c}=c&#39;.
\end{aligned}
$$
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihyphenhyphenPbZeiZG14r_0R4mp4N5kXAYmtYIKb0eBzMjd8PKhH0iPIpdnLJ2FssyKVv3GUAObCE7UE96rQAx2IKcbiBAg4ALz5PmNOuUFD8cFKirRik23oeS2gwygM9e0eINYFhfzKaFJK-rQCLy/s1600/Rplot05.png&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Figure 1: Plot of Likelihood Ratio Test Statistic for $n = 4,\sigma = 1$.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;&lt;br/&gt;

&lt;script src=&quot;https://gist.github.com/alstat/9d0f59f757950d48bbde.js&quot;&gt;&lt;/script&gt;
Hence, rejecting the null hypothesis if $\lambda(\mathbf{x})\leq c$ is equivalent to rejecting $H_0$ if $\frac{\bar{x}}{\sigma/\sqrt{n}}\geq c&#39;\in[0,\infty)$. Figure 1 depicts the plot of the LRT. The shaded region lies on the positive side because that is where the alternative is, $H_1:\mu&amp;gt;0$: if the LRT is small enough to reject $H_0$, the parameter values in the alternative explain the sample better than those in the null. In that case we expect the sample to come from the model proposed by $H_1$, so the sample mean $\bar{x}$, an unbiased estimator of $\mu$ and a function of the LRT statistic, should fall on the alternative&#39;s side (the shaded region).&lt;br /&gt;&lt;br /&gt;
So the power function, the probability of rejecting $H_0$ as a function of $\mu$ (which, for $\mu$ in the null region, is the probability of a Type I error), is
$$
\begin{aligned}
\beta(\mu)&amp;amp;=\mathrm{P}\left[\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}\geq c&#39;\right],\quad\mu_0=0\\
&amp;amp;=1-\mathrm{P}\left[\frac{\bar{x}+\mu-\mu-\mu_0}{\sigma/\sqrt{n}}&amp;lt; c&#39;\right]\\
&amp;amp;=1-\mathrm{P}\left[\frac{\bar{x}-\mu}{\sigma/\sqrt{n}} + \frac{\mu-\mu_0}{\sigma/\sqrt{n}}&amp;lt; c&#39;\right]\\
&amp;amp;=1-\mathrm{P}\left[\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}&amp;lt; c&#39;+ \frac{\mu_0-\mu}{\sigma/\sqrt{n}}\right]\\
&amp;amp;=1-\Phi\left[c&#39;+ \frac{\mu_0-\mu}{\sigma/\sqrt{n}}\right].
\end{aligned}
$$
As $\mu$ increases, the argument of $\Phi$ decreases, so $\Phi$ decreases and hence $\beta(\mu)$ is an increasing function of $\mu$. So for $\alpha=.05$,
$$
\begin{aligned}
\alpha&amp;amp;=\sup_{\mu\leq \mu_0}\beta(\mu)\\
.05&amp;amp;=\beta(\mu_0)\Rightarrow\beta(\mu_0)=1-\Phi(c&#39;)\\
.95&amp;amp;=\Phi(c&#39;)\Rightarrow c&#39;=1.645.
\end{aligned}
$$
Since,
$$
\begin{aligned}
\Phi(1.645)=\int_{-\infty}^{1.645}\frac{1}{\sqrt{2\pi}}\exp\left[-\frac{x^2}{2}\right]\operatorname{d}x=.9500151.
\end{aligned}
$$
Therefore for $c&#39;=1.645,\mu_0=0,\sigma=1$, the plot of the power function as a function of $\mu$ for different sample sizes $n$ is shown in Figure 2. For example, for $n=1$ we compute
\begin{equation}
\label{eq:powcomp}
\begin{aligned}
\beta(\mu)&amp;amp;=1-\Phi\left[c&#39;+ \frac{\mu_0-\mu}{\sigma/\sqrt{n}}\right]\\
&amp;amp;=1-\Phi\left[1.645+ \frac{0-\mu}{1/\sqrt{1}}\right]\\
&amp;amp;=1-\int_{-\infty}^{\left(1.645+ \frac{0-\mu}{1/\sqrt{1}}\right)}\frac{1}{\sqrt{2\pi}}\exp\left[-\frac{x^2}{2}\right]\operatorname{d}x.
\end{aligned}
\end{equation}
The obtained values give the $y$-coordinates of the curve. For $n = 64$,
$$
\begin{aligned}
\beta(\mu)&amp;amp;=1-\Phi\left[c&#39;+ \frac{\mu_0-\mu}{\sigma/\sqrt{n}}\right]\\
&amp;amp;=1-\Phi\left[1.645+ \frac{0-\mu}{1/\sqrt{64}}\right]\\
&amp;amp;=1-\int_{-\infty}^{\left(1.645+ \frac{0-\mu}{1/\sqrt{64}}\right)}\frac{1}{\sqrt{2\pi}}\exp\left[-\frac{x^2}{2}\right]\operatorname{d}x,
\end{aligned}
$$
and so on.
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGJmgbTJ2HqDGAcTPWOZ5TcFayUljxccLRL4LfOcvFh2Q-vlMlis5cfxCnXPE8ld_jMDt6ABUlzrjhICaz1Mm6gO2qP-BeISoV1CFqx29kSd-6yDPWj2DYR7lY1FceqeHBkSzQi2UJJzui/s1600/p1.png&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Figure 2: Power Function for Different Values of $n$.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/056d1f4e3684f374c6bc.js&quot;&gt;&lt;/script&gt;
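The curves in Figure 2 can also be reproduced with Python's standard library alone. A sketch, assuming $c&#39;=1.645$, $\mu_0=0$, and $\sigma=1$ as above:

```python
from statistics import NormalDist

def power(mu, n, c=1.645, mu0=0.0, sigma=1.0):
    # beta(mu) = 1 - Phi(c + (mu0 - mu) / (sigma / sqrt(n)))
    return 1 - NormalDist().cdf(c + (mu0 - mu) * n ** 0.5 / sigma)

# Power at mu = 0.5 grows quickly with the sample size.
for n in (1, 4, 16, 64, 100):
    print(n, round(power(0.5, n), 4))
```

At $\mu=\mu_0=0$ the function returns the size of the test, approximately $\alpha=.05$, for every $n$.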
&lt;/li&gt;
&lt;li&gt;
The LRT statistic is given by
$$
\lambda(\mathbf{x})=\frac{\displaystyle\sup_{\mu= 0}\mathcal{L}(\mu|\mathbf{x})}{\displaystyle\sup_{-\infty&lt;\mu&lt;\infty}\mathcal{L}(\mu|\mathbf{x})}, \;\text{since }\sigma^2\text{ is known}.
$$
The denominator can be expanded as follows:
$$
\begin{aligned}
\sup_{-\infty&lt;\mu&lt;\infty}\mathcal{L}(\mu|\mathbf{x})&amp;=\sup_{-\infty&lt;\mu&lt;\infty}\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right]\\
&amp;=\sup_{-\infty&lt;\mu&lt;\infty}\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\displaystyle\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}\right]\\
&amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\displaystyle\sum_{i=1}^{n}\frac{(x_i-\bar{x})^2}{2\sigma^2}\right],\\
&amp;\quad\;\text{since }\bar{x}\text{ is the MLE of }\mu.\\
&amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{n-1}{n-1}\displaystyle\sum_{i=1}^{n}\frac{(x_i-\bar{x})^2}{2\sigma^2}\right]\\
&amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{(n-1)s^2}{2\sigma^2}\right],\\
\end{aligned}
$$
and the numerator is evaluated as follows:
$$
\begin{aligned}
\sup_{\mu=0}\mathcal{L}(\mu|\mathbf{x})&amp;=\sup_{\mu=0}\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right]\\
&amp;=\sup_{\mu=0}\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\displaystyle\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}\right]\\
&amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\displaystyle\sum_{i=1}^{n}\frac{(x_i-0)^2}{2\sigma^2}\right]\\
&amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{(n-1)s^2+n\bar{x}^2}{2\sigma^2}\right],
\end{aligned}
$$
we skip some steps in the simplification above since they were already carried out in part (a). And by Equation (1), $\lambda(\mathbf{x})=\exp\left[-\frac{n\bar{x}^2}{2\sigma^2}\right]$. So that $\lambda(\mathbf{x})\leq c$ would be
$$
\begin{aligned}
\exp\left[-\frac{n\bar{x}^2}{2\sigma^2}\right]&amp;\leq c\\
-\frac{n\bar{x}^2}{2\sigma^2}&amp;\leq \log c\\
\frac{\lvert\bar{x}-\mu_0\rvert}{\sigma/\sqrt{n}}&amp;\geq\sqrt{-2\log c}=c&#39;,\quad \mu_0=0.
\end{aligned}
$$
So rejecting the null hypothesis if $\lambda(\mathbf{x})\leq c$ is equivalent to rejecting $H_0$ if $\frac{\lvert\bar{x}\rvert}{\sigma/\sqrt{n}}\geq c&#39;$. And since $H_1$ is two-sided, we reject $H_0$ if $\frac{\bar{x}}{\sigma/\sqrt{n}}\geq c&#39;$ or $\frac{\bar{x}}{\sigma/\sqrt{n}}\leq -c&#39;$. To illustrate this, consider Figure 3, where the two shaded regions are the lower and upper rejection regions.
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjR1qESpUyjhyrtn46O577Fqr8LwxsxjPmoHWIhzd3uGjcWQA1Op7RCZmoh4rQ0H66QT8Gbli5jWTu6MQ62qT-lCGguXn8simjsqAwXv2I20hHllz9h7xDM1574D1xzNoDkCGE2Gno3FIpW/s1600/Rplot04.png&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Figure 3: Plot of Likelihood Ratio Test Statistic for $n = 4,\sigma = 1$.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/2dc31448dede2706b261.js&quot;&gt;&lt;/script&gt;
So that the power function is,

$$
\begin{aligned}
\beta(\mu)&amp;=\mathrm{P}\left[\frac{\lvert\bar{x}\rvert}{\sigma/\sqrt{n}}\geq c&#39;\right]\\
&amp;=1 - \mathrm{P}\left[\frac{\lvert\bar{x}\rvert}{\sigma/\sqrt{n}}&lt; c&#39;\right]\\
&amp;=1 - \mathrm{P}\left[-c&#39;&lt;\frac{\bar{x}}{\sigma/\sqrt{n}}&lt; c&#39;\right]\\
&amp;=1 - \left\{\mathrm{P}\left[\frac{\bar{x}}{\sigma/\sqrt{n}}&lt; c&#39;\right]-\mathrm{P}\left[\frac{\bar{x}}{\sigma/\sqrt{n}}&lt; -c&#39;\right]\right\}\\
&amp;=1 - \left\{\mathrm{P}\left[\frac{\bar{x}+\mu-\mu}{\sigma/\sqrt{n}}&lt; c&#39;\right]-\mathrm{P}\left[\frac{\bar{x}+\mu-\mu}{\sigma/\sqrt{n}}&lt; -c&#39;\right]\right\}\\
&amp;=1 - \mathrm{P}\left[\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}&lt; c&#39;-\frac{\mu}{\sigma/\sqrt{n}}\right]+\mathrm{P}\left[\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}&lt; -c&#39;-\frac{\mu}{\sigma/\sqrt{n}}\right]\\
&amp;=\underbrace{1 - \Phi\left[c&#39;-\frac{\mu}{\sigma/\sqrt{n}}\right]}_{\Phi_1}+\underbrace{\Phi\left[-c&#39;-\frac{\mu}{\sigma/\sqrt{n}}\right]}_{\Phi_2}.
\end{aligned}
$$
Notice that $\Phi_1$ is increasing in $\mu$, while $\Phi_2$ is decreasing in $\mu$. We expect this since the alternative hypothesis is two-sided, and so the power function rises in both directions away from $\mu_0=0$. To see this, consider Figure 4 for different values of $n$.
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwKsCla95GYIlyArs6_JvTi41hxJQ6pI__qBzGGzxvVcHS6mVR1J_6FUI_DVzSPyi-bPZtVppPEYsRKofMFHrT54NHq1ThFc7uaDyxv9FiH7O37hUqzfidP4ZNxqJSovIbSqCV5hlmLtHz/s1600/p3.png&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Figure 4: Two-Sided Power Function for Different $n$.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/dfbb443b94fa558477b9.js&quot;&gt;&lt;/script&gt;
The points in the plot are computed by substituting values of $\mu$ (with $\mu_0=0$, $\sigma=1$, and the given $n$) into the power function, just as we did in Equation (2).
&lt;/li&gt;
&lt;/ol&gt;
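The two-sided power function derived above is easy to check numerically. Below is a minimal Python sketch (the critical value $c&#39;=1.959964$, giving a size of $0.05$, is an illustrative choice):

```python
from math import erf, sqrt

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def power(mu, n, sigma=1.0, c=1.959964):
    # beta(mu) = 1 - Phi(c - mu*sqrt(n)/sigma) + Phi(-c - mu*sqrt(n)/sigma)
    shift = mu * sqrt(n) / sigma
    return 1.0 - Phi(c - shift) + Phi(-c - shift)

# at mu = 0 the power equals the size of the test, alpha = 0.05
print(round(power(0.0, n=4), 4))
# the power at a fixed alternative mu = 0.5 grows with n
print([round(power(0.5, n), 3) for n in (4, 16, 64)])
```

The first value printed is the size of the test, and the second line shows the power at $\mu=0.5$ growing with $n$, matching the steepening curves in Figure 4.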
&lt;/div&gt;
&lt;h3&gt;
Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;a href=&quot;http://www.amazon.com/Statistical-Inference-George-Casella/dp/0534243126&quot; target=&quot;_blank&quot;&gt;Casella, G. and Berger, R.L. (2001). &lt;i&gt;Statistical Inference&lt;/i&gt;. Thomson Learning, Inc.&lt;/a&gt; 
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.nicebread.de/shading-regions-of-the-normal-the-stanine-scale/&quot;&gt;Felix Schönbrodt. &lt;i&gt;Shading regions of the normal: The Stanine scale.&lt;/i&gt; Retrieved May 2015.&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/3643474548584730338/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/05/parametric-inference-likelihood-ratio_23.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/3643474548584730338'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/3643474548584730338'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/05/parametric-inference-likelihood-ratio_23.html' title='Parametric Inference: Likelihood Ratio Test Problem 2'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihyphenhyphenPbZeiZG14r_0R4mp4N5kXAYmtYIKb0eBzMjd8PKhH0iPIpdnLJ2FssyKVv3GUAObCE7UE96rQAx2IKcbiBAg4ALz5PmNOuUFD8cFKirRik23oeS2gwygM9e0eINYFhfzKaFJK-rQCLy/s72-c/Rplot05.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-6951754898016482363</id><published>2015-05-21T15:25:00.000+08:00</published><updated>2015-05-22T14:19:12.991+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Parametric Inference"/><title type='text'>Parametric Inference: Likelihood Ratio Test Problem 1</title><content type='html'>Another post for mathematical statistics, the problem below is originally from Casella and Berger (2001) (&lt;i&gt;see&lt;/i&gt; Reference 1), exercise 8.6.
&lt;br/&gt;&lt;br/&gt;
&lt;h3&gt;Problem&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Suppose that we have two independent random samples: $X_1,\cdots, X_n$ are exponential($\theta$), and $Y_1,\cdots, Y_m$ are exponential($\mu$).
&lt;ol type = &quot;a&quot;&gt;
&lt;li&gt; Find the LRT (Likelihood Ratio Test) of $H_0:\theta=\mu$ versus $H_1:\theta\neq\mu$.&lt;/li&gt;
&lt;li&gt; Show that the test in part (a) can be based on the statistic
$$
T=\frac{\sum X_i}{\sum X_i+\sum Y_i}.
$$
&lt;/li&gt;
&lt;li&gt; Find the distribution of $T$ when $H_0$ is true.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Solution&lt;/h3&gt;
&lt;ol&gt;
&lt;ol type = &quot;a&quot;&gt;
&lt;li&gt;
The likelihood ratio test statistic is given by
$$
\lambda(\mathbf{x},\mathbf{y}) = \frac{\displaystyle\sup_{\theta = \mu,\mu&gt;0}\mathrm{P}(\mathbf{x},\mathbf{y}|\theta,\mu)}{\displaystyle\sup_{\theta &gt; 0,\mu&gt;0}\mathrm{P}(\mathbf{x}, \mathbf{y}|\theta,\mu)},
$$
where the denominator is evaluated as follows:
$$
\sup_{\theta &gt; 0,\mu&gt;0}\mathrm{P}(\mathbf{x}, \mathbf{y}|\theta,\mu)=
\sup_{\theta &gt; 0}\mathrm{P}(\mathbf{x}|\theta)\sup_{\mu &gt; 0}\mathrm{P}(\mathbf{y}|\mu),\quad\text{by independence.}
$$
So that,
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
$$
\begin{aligned}
\sup_{\theta &gt; 0}\mathrm{P}(\mathbf{x}|\theta)&amp;=\sup_{\theta&gt;0}\prod_{i=1}^{n}\frac{1}{\theta}\exp\left[-\frac{x_i}{\theta}\right]=\sup_{\theta&gt;0}\frac{1}{\theta^n}\exp\left[-\frac{\sum_{i=1}^{n}x_i}{\theta}\right]\\
&amp;=\frac{1}{\bar{x}^n}\exp\left[-\frac{\sum_{i=1}^{n}x_i}{\bar{x}}\right]=\frac{1}{\bar{x}^n}\exp[-n],
\end{aligned}
$$
since the sample mean $\bar{x}$ is the MLE of $\theta$. Also,
$$
\begin{aligned}
\sup_{\mu &gt; 0}\mathrm{P}(\mathbf{y}|\mu)&amp;=\sup_{\mu&gt;0}\prod_{j=1}^{m}\frac{1}{\mu}\exp\left[-\frac{y_j}{\mu}\right]=\sup_{\mu&gt;0}\frac{1}{\mu^m}\exp\left[-\frac{\sum_{j=1}^{m}y_j}{\mu}\right]\\
&amp;=\frac{1}{\bar{y}^m}\exp\left[-\frac{\sum_{j=1}^{m}y_j}{\bar{y}}\right]=\frac{1}{\bar{y}^m}\exp[-m].
\end{aligned}
$$
Now the numerator is evaluated as follows,
$$
\begin{aligned}
\sup_{\theta = \mu,\mu&gt;0}\mathrm{P}(\mathbf{x},\mathbf{y}|\theta,\mu)&amp;=\sup_{\theta=\mu,\mu&gt;0}\mathrm{P}(\mathbf{x}|\theta)\mathrm{P}(\mathbf{y}|\mu),\quad\text{by independence.}\\
&amp;=\sup_{\theta=\mu,\mu&gt;0}\prod_{i=1}^{n}\frac{1}{\theta}\exp\left[-\frac{x_i}{\theta}\right]\prod_{j=1}^{m}\frac{1}{\mu}\exp\left[-\frac{y_j}{\mu}\right]\\
&amp;=\sup_{\theta=\mu,\mu&gt;0}\frac{1}{\theta^n}\exp\left[-\frac{\sum_{i=1}^nx_i}{\theta}\right]\frac{1}{\mu^m}\exp\left[-\frac{\sum_{j=1}^m y_j}{\mu}\right]\\
&amp;=\sup_{\mu&gt;0}\frac{1}{\mu^n}\exp\left[-\frac{\sum_{i=1}^nx_i}{\mu}\right]\frac{1}{\mu^m}\exp\left[-\frac{\sum_{j=1}^m y_j}{\mu}\right]\\
&amp;=\sup_{\mu&gt;0}\frac{1}{\mu^{n+m}}\exp\left\{-\frac{1}{\mu}\left[\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right]\right\}
\end{aligned}
$$
Note that $\mu$ is still unknown under $H_0$, and so we also maximize over its domain. To do that, we take the log-likelihood function first,
$$
\begin{aligned}
\ell(\mu|\mathbf{x},\mathbf{y})&amp;=-\log(\mu^{n+m})-\frac{1}{\mu}\left[\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right]\\
&amp;=-(n+m)\log(\mu)-\frac{1}{\mu}\left[\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right].
\end{aligned}
$$
Taking the derivative with respect to $\mu$ gives us
$$
\frac{\operatorname{d}}{\operatorname{d}\mu}\ell(\mu|\mathbf{x},\mathbf{y})=-(n+m)\frac{1}{\mu}+\frac{1}{\mu^2}\left[\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right],
$$
equate this to zero to obtain the stationary point,
$$
\begin{aligned}
-(n+m)\frac{1}{\mu}+\frac{1}{\mu^2}\left[\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right]&amp;=0\\
-(n+m)\mu+\left[\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right]&amp;=0\\
\mu&amp;=\frac{1}{n+m}\left[\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right].
\end{aligned}
$$
To verify that this is the MLE, we apply the second derivative test to the log-likelihood function, evaluating at the stationary point where $\sum_{i=1}^nx_i+\sum_{j=1}^m y_j=(n+m)\hat{\mu}$:
$$
\frac{\operatorname{d}^2}{\operatorname{d}\mu^2}\ell(\mu|\mathbf{x},\mathbf{y})\bigg|_{\mu=\hat{\mu}}=(n+m)\frac{1}{\hat{\mu}^2}-\frac{2}{\hat{\mu}^3}\left[\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right]=(n+m)\frac{1}{\hat{\mu}^2}-\frac{2(n+m)}{\hat{\mu}^2}=-\frac{n+m}{\hat{\mu}^2}&lt;0,
$$
implying $\hat{\mu}=\displaystyle\frac{1}{n+m}\left[\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right]$ is the MLE of $\mu$.
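This maximization can also be checked numerically: the pooled mean $\hat{\mu}$ should give a log-likelihood at least as large as nearby values of $\mu$. Below is a small Python sketch with simulated exponential data (the sample sizes and $\theta$ are arbitrary choices):

```python
from math import log
import random

random.seed(1)
n, m, theta = 12, 8, 2.0
# expovariate takes the rate 1/theta, so the draws have mean theta
x = [random.expovariate(1.0 / theta) for _ in range(n)]
y = [random.expovariate(1.0 / theta) for _ in range(m)]
S = sum(x) + sum(y)

def loglik(mu):
    # l(mu | x, y) = -(n + m) log(mu) - S / mu
    return -(n + m) * log(mu) - S / mu

mu_hat = S / (n + m)  # the stationary point derived above
# the log-likelihood at mu_hat is at least as large as at nearby values
print(all(loglik(mu_hat) >= loglik(mu_hat * f) for f in (0.5, 0.9, 1.1, 2.0)))
```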
Thus the LRT, $\lambda(\mathbf{x},\mathbf{y})$ would be,
$$
\begin{aligned}
\lambda(\mathbf{x},\mathbf{y})&amp;=\frac{\sup_{\mu&gt;0}\displaystyle\frac{1}{\mu^{n+m}}\exp\left\{-\frac{1}{\mu}\left[\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right]\right\}}{\displaystyle\frac{1}{\bar{x}^n}\frac{1}{\bar{y}^m}\exp[-(n+m)]}\\
&amp;=\left(\frac{1}{\frac{1}{{(n+m)}^{n+m}}\left[\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right]^{n+m}}\times\right.\\
&amp;\qquad\left.\exp\left\{-\frac{1}{\frac{1}{n+m}\left[\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right]}\left[\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right]\right\}\right)\bigg/\\
&amp;\qquad\qquad\qquad\displaystyle\frac{1}{\bar{x}^n}\frac{1}{\bar{y}^m}\exp[-(n+m)]
\end{aligned}
$$
$$
\begin{aligned}
&amp;=\frac{\displaystyle\frac{1}{\displaystyle\frac{1}{{(n+m)}^{n+m}}\left[\displaystyle\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right]^{n+m}}\times\exp[-(n+m)]}{\displaystyle\frac{1}{\bar{x}^n}\frac{1}{\bar{y}^m}\exp[-(n+m)]}\\[.3cm]
&amp;=\frac{\displaystyle \bar{x}^n \bar{y}^m}{\displaystyle\frac{1}{{(n+m)}^{n+m}}\left[\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right]^{n+m}}.
\end{aligned}
$$
And we say that $H_0$ is rejected if $\lambda(\mathbf{x},\mathbf{y})\leq c$.
&lt;/li&gt;
&lt;li&gt;
If we do some algebra on the LRT in part (a), we obtain the following:
$$
\begin{aligned}
\lambda(\mathbf{x},\mathbf{y})&amp;=\frac{\displaystyle \bar{x}^n \bar{y}^m}{\displaystyle\frac{1}{{(n+m)}^{n+m}}\left[\sum_{i=1}^nx_i+\sum_{j=1}^m y_j\right]^{n+m}}\\
&amp;=\frac{\displaystyle\frac{1}{n^n}\left(\sum_{i=1}^{n}x_i\right)^{n}\frac{1}{m^{m}}\left(\sum_{j=1}^{m}y_j\right)^{m}}{\displaystyle\frac{1}{(n+m)^{n+m}}\left[\sum_{i=1}^{n}x_i+\sum_{j=1}^{m}y_j\right]^{n+m}}\\
&amp;=\frac{\displaystyle (n+m)^{n+m}\left(\sum_{i=1}^{n}x_i\right)^{n}\left(\sum_{j=1}^{m}y_j\right)^{m}}{\displaystyle n^{n}m^{m}\left[\sum_{i=1}^{n}x_i+\sum_{j=1}^{m}y_j\right]^{n+m}}\\
&amp;=\frac{(n+m)^{n+m}}{n^nm^{m}}\left[\frac{\displaystyle \sum_{j=1}^{m}y_j}{\displaystyle\sum_{i=1}^{n}x_i+\sum_{j=1}^{m}y_j}\right]^{m}\left[\frac{\displaystyle \sum_{i=1}^{n}x_i}{\displaystyle\sum_{i=1}^{n}x_i+\sum_{j=1}^{m}y_j}\right]^{n}\\
&amp;=\frac{(n+m)^{n+m}}{n^nm^{m}}\left[1-\frac{\displaystyle \sum_{i=1}^{n}x_i}{\displaystyle\sum_{i=1}^{n}x_i+\sum_{j=1}^{m}y_j}\right]^{m}\left[\frac{\displaystyle \sum_{i=1}^{n}x_i}{\displaystyle\sum_{i=1}^{n}x_i+\sum_{j=1}^{m}y_j}\right]^{n}\\
&amp;=\frac{(n+m)^{n+m}}{n^nm^{m}}\left[1-T\right]^{m}\left[T\right]^{n}.
\end{aligned}
$$
Hence, the LRT can be based on the statistic $T$.
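This algebra is easy to verify numerically: computing $\lambda(\mathbf{x},\mathbf{y})$ directly from part (a) and through the statistic $T$ gives the same value. A small Python sketch (the sample sizes and rates are arbitrary choices):

```python
import random

random.seed(7)
n, m = 10, 15
x = [random.expovariate(0.5) for _ in range(n)]
y = [random.expovariate(0.8) for _ in range(m)]
sx, sy = sum(x), sum(y)
xbar, ybar = sx / n, sy / m

# LRT from part (a): xbar^n ybar^m (n+m)^(n+m) / (sum x + sum y)^(n+m)
lam_a = xbar ** n * ybar ** m * (n + m) ** (n + m) / (sx + sy) ** (n + m)
# the same quantity written in terms of T, as in part (b)
T = sx / (sx + sy)
lam_b = (n + m) ** (n + m) / (n ** n * m ** m) * (1 - T) ** m * T ** n
print(lam_a, lam_b)  # the two agree up to floating-point error
```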
&lt;/li&gt;
&lt;li&gt;
The distribution of $\sum X_i$ is obtained using the MGF (Moment Generating Function) technique, that is
$$
\begin{aligned}
\mathrm{M}_{\Sigma X_i}(t)&amp;=\mathrm{E}\exp[t\Sigma X_i]=\mathrm{E}\exp[tX_1 +\cdots + tX_n]\\
&amp;=\mathrm{E}\exp[tX_1]\times\cdots\times\mathrm{E}\exp[tX_n],\quad\text{by independence.}\\
&amp;=\frac{1}{1-\theta t}\times\cdots\times\frac{1}{1-\theta t}\\
&amp;=\left(\frac{1}{1-\theta t}\right)^{n}=\text{MGF of gamma}(n,\theta).
\end{aligned}
$$
Now, $\sum X_i$ is gamma($n,\theta$); and when $H_0$ is true, $\sum Y_j$ is likewise gamma($m,\theta$). For brevity, let $X=\sum_{i=1}^{n} X_i$ and $Y=\sum_{j=1}^{m}Y_j$. The joint distribution of $X$ and $Y$ is given below,
$$
f_{XY}(x, y)=\frac{1}{\Gamma (n)\theta^{n}}x^{n-1}\exp[-x/\theta]\times\frac{1}{\Gamma (m)\theta^{m}}y^{m-1}\exp[-y/\theta].
$$
Let $U=\frac{X}{X+Y}$ and $V=X+Y$. The support of $(X,Y)$ is $\mathcal{A}=\left\{(x,y)\in \mathbb{R}^{+}\times \mathbb{R}^{+}\right\}$, and since the transformation $(x,y)\mapsto(u,v)$ is one-to-one and onto, the image is $\mathcal{B}=\left\{(u,v)\in [0,1]\times \mathbb{R}^{+}\right\}$. Consider the following transformations
$$
u=g_{1}(x,y)=\frac{x}{x+y}\quad\text{and}\quad v=g_{2}(x,y)=x+y.
$$
Then,
\begin{equation}
\label{eq:bvt1}
u=\frac{x}{x+y}\Rightarrow x=\frac{uy}{1-u}
\end{equation}
and
\begin{equation}
\label{eq:bvt2}
v=x+y\Rightarrow y = v-x.
\end{equation}
Substituting Equation (\ref{eq:bvt2}) into Equation (\ref{eq:bvt1}), then
$$
\begin{aligned}
x=\frac{u(v-x)}{1-u}&amp;\Rightarrow x(1-u)=u(v-x)\\
x-ux=uv-ux&amp;\Rightarrow x=uv=h_{1}(u,v).
\end{aligned}
$$
Substitute $x$ above to Equation (\ref{eq:bvt2}) to obtain,
$$y=v(1-u)=h_2(u,v).$$
And the Jacobian determinant is,
$$
\mathbf{J}=\bigg|
\begin{array}{cc}
v&amp;u\\[.2cm]
-v&amp;1-u
\end{array}
\bigg|=v(1-u)+uv=v.
$$
So that,
$$
\begin{aligned}
f_{UV}(u,v)&amp;=f_{XY}(h_1(u,v),h_2(u,v))\lvert \mathbf{J}\rvert=f_{XY}(uv,v(1-u))\lvert v\rvert\\
&amp;=\frac{1}{\Gamma (n)\theta^{n}}(uv)^{n-1}\exp[-uv/\theta]\times\\
&amp;\quad\;\frac{1}{\Gamma (m)\theta^{m}}(v(1-u))^{m-1}\exp[-v(1-u)/\theta]v\\
&amp;=\frac{1}{\Gamma (n)\theta^{n}}(uv)^{n-1}\exp[-uv/\theta]\times\\
&amp;\quad\;\frac{1}{\Gamma (m)\theta^{m}}(v(1-u))^{m-1}\exp[-v/\theta]\exp[uv/\theta]v\\
&amp;=\frac{1}{\Gamma (n)\theta^{n}}u^{n-1}v^{n-1}\times\frac{1}{\Gamma (m)\theta^{m}}v^{m-1}(1-u)^{m-1}\exp[-v/\theta]v\\
&amp;=\frac{1}{\Gamma (n)}\underbrace{u^{n-1}(1-u)^{m-1}}_{\text{Beta}(n,m)\text{ kernel}}\frac{1}{\Gamma (m)\theta^{m+n}}v^{m-1}v^{n-1}\exp[-v/\theta]v\\
&amp;=\frac{\Gamma(m)\Gamma(m+n)}{\Gamma(m)\Gamma(m+n)}\frac{u^{n-1}(1-u)^{m-1}}{\Gamma (n)}\times\\
&amp;\quad\;\frac{1}{\Gamma (m)\theta^{m+n}}v^{m-1}v^{n}\exp[-v/\theta]\\
&amp;=\underbrace{\frac{\Gamma(m+n)}{\Gamma (n)\Gamma(m)}u^{n-1}(1-u)^{m-1}}_{\text{Beta}(n,m)}\times\\
&amp;\quad\;\underbrace{\frac{1}{\Gamma(m+n)\theta^{m+n}}v^{m+n-1}\exp[-v/\theta]}_{\text{Gamma}(m+n,\theta)}.
\end{aligned}
$$
So that the marginal density of $U=T=\displaystyle\frac{\sum X_i}{\sum X_i +\sum Y_j}$ is Beta($n,m$) when $H_0$ is true.
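This result can be checked by simulation: under $H_0$ the empirical mean and variance of $T$ should match the Beta($n,m$) moments $\frac{n}{n+m}$ and $\frac{nm}{(n+m)^2(n+m+1)}$. A minimal Monte Carlo sketch (the sample sizes, $\theta$, and the number of replications are arbitrary choices):

```python
import random

random.seed(42)
n, m, theta, reps = 5, 7, 2.0, 20000

def draw_T():
    # one realization of T under H0: both samples are exponential(theta)
    sx = sum(random.expovariate(1.0 / theta) for _ in range(n))
    sy = sum(random.expovariate(1.0 / theta) for _ in range(m))
    return sx / (sx + sy)

ts = [draw_T() for _ in range(reps)]
emp_mean = sum(ts) / reps
emp_var = sum((t - emp_mean) ** 2 for t in ts) / reps

# Beta(n, m) moments: mean n/(n+m), variance n*m/((n+m)^2 (n+m+1))
print(round(emp_mean, 3), round(n / (n + m), 3))
print(round(emp_var, 4), round(n * m / ((n + m) ** 2 * (n + m + 1)), 4))
```

The empirical and theoretical values agree up to Monte Carlo error.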
&lt;/li&gt;
&lt;/ol&gt;
&lt;/ol&gt;
&lt;h3&gt;
Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;a href=&quot;http://www.amazon.com/Statistical-Inference-George-Casella/dp/0534243126&quot; target=&quot;_blank&quot;&gt;Casella, G. and Berger, R.L. (2001). &lt;i&gt;Statistical Inference&lt;/i&gt;. Thomson Learning, Inc.&lt;/a&gt; 
&lt;/li&gt;
&lt;/ol&gt;</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/6951754898016482363/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/05/parametric-inference-likelihood-ratio.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/6951754898016482363'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/6951754898016482363'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/05/parametric-inference-likelihood-ratio.html' title='Parametric Inference: Likelihood Ratio Test Problem 1'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-2562292771002515136</id><published>2015-05-01T15:17:00.000+08:00</published><updated>2015-08-17T10:41:57.752+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Interactive Visualization"/><category scheme="http://www.blogger.com/atom/ns#" term="Parametric Inference"/><category scheme="http://www.blogger.com/atom/ns#" term="Python"/><title type='text'>Parametric Inference: The Power Function of the Test</title><content type='html'>In Statistics, we model random phenomena and make conclusions about the underlying population. For example, consider an experiment to determine the true heights of the students in a university. Suppose we take a sample from the population of the students, and consider testing the null hypothesis that the average height is 5.4 ft against the alternative hypothesis that the average height is greater than 5.4 ft.
Mathematically, we can represent this as $H_0:\theta=\theta_0$ vs $H_1:\theta&gt;\theta_0$, where $\theta$ is the true value of the parameter and $\theta_0=5.4$ is the testing value set by the experimenter. And because we only consider a subset (the sample) of the population for testing the hypotheses, we should expect to commit errors. To understand these errors, suppose the above test results in rejecting $H_0$ given that $\theta\in\Theta_0$, where $\Theta_0$ is the parameter space of the null hypothesis; in other words, we mistakenly reject $H_0$. In this case we have committed a Type I error. On the other hand, if the above test results in accepting $H_0$ given that $\theta\in\Theta_0^c$, where $\Theta_0^c$ is the parameter space of the alternative hypothesis, then we have committed a Type II error. To summarize this, consider the following table,&lt;br/&gt;&lt;br/&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th align = &quot;center&quot;&gt;Truth&lt;/th&gt;&lt;th colspan = &quot;2&quot; align = &quot;center&quot;&gt;Decision&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;4&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 1: Two Types of Errors in Hypothesis Testing.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td align = &quot;center&quot;&gt;Accept $H_0$&lt;/td&gt;&lt;td align = &quot;center&quot;&gt;Reject $H_0$&lt;/td&gt;&lt;/tr&gt;
&lt;tr class = &quot;alt&quot;&gt;&lt;td&gt;$H_0$&lt;/td&gt;&lt;td&gt;Correct Decision&lt;/td&gt;&lt;td&gt;Type I Error&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;$H_1$&lt;/td&gt;&lt;td&gt;Type II Error&lt;/td&gt;&lt;td&gt;Correct Decision&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;br/&gt;
Let&#39;s formally define the power function, from Casella and Berger (2001), see reference 1.
&lt;blockquote&gt;
&lt;b&gt;Definition 1&lt;/b&gt;. The &lt;i&gt;power function&lt;/i&gt; of a hypothesis test with rejection region $R$ is the function of $\theta$ defined by $\beta(\theta)=\mathrm{P}_{\theta}(\mathbf{X}\in R)$.
&lt;/blockquote&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
To relate the definition to the above problem, let $R$ be the rejection region of $H_0$. Then we make a mistake if the observed sample $\mathbf{x}\in R$ given that $\theta\in\Theta_0$; that is, for $\theta\in\Theta_0$, $\beta(\theta)=\mathrm{P}_{\theta}(\mathbf{X}\in R)$ is the probability of a Type I error. Let&#39;s consider an example, one that is popularly used in testing the sample mean. The example below is the combined problem of Example 8.3.3 and Exercise 8.37 (a) of Reference 1.
&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Example 1&lt;/b&gt;. Let $X_1,\cdots, X_n\overset{r.s.}{\sim}N(\theta,\sigma^2)$ -- a normal population where $\sigma^2$ is known. Consider testing $H_0:\theta\leq \theta_0$ vs $H_1:\theta&gt; \theta_0$; obtain the likelihood ratio test (LRT) statistic and its power function.
&lt;br/&gt;&lt;br/&gt;
&lt;i&gt;Solution:&lt;/i&gt;
The LRT statistic is given by
$$
\lambda(\mathbf{x})=\frac{\displaystyle\sup_{\theta\leq\theta_0}L(\theta|\mathbf{x})}{\displaystyle\sup_{-\infty&lt;\theta&lt;\infty}L(\theta|\mathbf{x})},
$$
where
$$
\begin{aligned}
\sup_{\theta\leq\theta_0}L(\theta|\mathbf{x})&amp;=\sup_{\theta\leq\theta_0}\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{(x_i-\theta)^2}{2\sigma^2}\right]\\
&amp;=\sup_{\theta\leq\theta_0}\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\displaystyle\sum_{i=1}^{n}\frac{(x_i-\theta)^2}{2\sigma^2}\right]\\
&amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\displaystyle\sum_{i=1}^{n}\frac{(x_i-\theta_0)^2}{2\sigma^2}\right],\;\text{for }\bar{x}&gt;\theta_0\text{ (else the sup is at }\bar{x}\text{)},\\
&amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\displaystyle\sum_{i=1}^{n}\frac{(x_i-\bar{x}+\bar{x}-\theta_0)^2}{2\sigma^2}\right]\\
&amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left\{-\displaystyle\sum_{i=1}^{n}\left[\frac{(x_i-\bar{x})^2+2(x_i-\bar{x})(\bar{x}-\theta_0)+(\bar{x}-\theta_0)^2}{2\sigma^2}\right]\right\}\\
&amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{(n-1)s^2+n(\bar{x}-\theta_0)^2}{2\sigma^2}\right], \text{since the middle term sums to 0.}
\end{aligned}
$$
And
$$
\begin{aligned}
\sup_{-\infty&lt;\theta&lt;\infty}L(\theta|\mathbf{x})&amp;=\sup_{-\infty&lt;\theta&lt;\infty}\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{(x_i-\theta)^2}{2\sigma^2}\right]\\
&amp;=\sup_{-\infty&lt;\theta&lt;\infty}\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\displaystyle\sum_{i=1}^{n}\frac{(x_i-\theta)^2}{2\sigma^2}\right]\\
&amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\displaystyle\sum_{i=1}^{n}\frac{(x_i-\bar{x})^2}{2\sigma^2}\right],\quad\text{since }\bar{x}\text{ is the MLE of }\theta.\\
&amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{n-1}{n-1}\displaystyle\sum_{i=1}^{n}\frac{(x_i-\bar{x})^2}{2\sigma^2}\right]\\
&amp;=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{(n-1)s^2}{2\sigma^2}\right],\\
\end{aligned}
$$
so that
$$
\begin{aligned}
\lambda(\mathbf{x})&amp;=\frac{\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{(n-1)s^2+n(\bar{x}-\theta_0)^2}{2\sigma^2}\right]}{\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{(n-1)s^2}{2\sigma^2}\right]}\\
&amp;=\exp\left[-\frac{n(\bar{x}-\theta_0)^2}{2\sigma^2}\right].\\
\end{aligned}
$$
And from my previous &lt;a href=&quot;http://alstatr.blogspot.com/2015/04/parametric-inference-likelihood-ratio.html&quot; target = &quot;_blank&quot;&gt;entry&lt;/a&gt;, $H_0$ is rejected if $\lambda(\mathbf{x})$ is small, such that $\lambda(\mathbf{x})\leq c$ for some $c\in[0,1]$. Hence,
$$
\begin{aligned}
\lambda(\mathbf{x})&amp;=\exp\left[-\frac{n(\bar{x}-\theta_0)^2}{2\sigma^2}\right]&lt; c\\&amp;\Rightarrow-\frac{n(\bar{x}-\theta_0)^2}{2\sigma^2}&lt;\log c\\
&amp;\Rightarrow\frac{\bar{x}-\theta_0}{\sigma/\sqrt{n}}&gt;\sqrt{-2\log c}.
\end{aligned}
$$
So that $H_0$ is rejected if $\frac{\bar{x}-\theta_0}{\sigma/\sqrt{n}}&gt; c&#39;$ for some $c&#39;=\sqrt{-2\log c}\in[0,\infty)$. Now the power function of the test is the probability of rejecting the null hypothesis as a function of $\theta$; for $\theta\in\Theta_0$ it is the probability of a Type I error. It is given by,
$$
\begin{aligned}
\beta(\theta)&amp;=\mathrm{P}\left[\frac{\bar{x}-\theta_0}{\sigma/\sqrt{n}}&gt; c&#39;\right]\\
&amp;=\mathrm{P}\left[\frac{\bar{x}-\theta+\theta-\theta_0}{\sigma/\sqrt{n}}&gt; c&#39;\right]\\
&amp;=\mathrm{P}\left[\frac{\bar{x}-\theta}{\sigma/\sqrt{n}}+\frac{\theta-\theta_0}{\sigma/\sqrt{n}}&gt; c&#39;\right]\\
&amp;=\mathrm{P}\left[\frac{\bar{x}-\theta}{\sigma/\sqrt{n}}&gt; c&#39;-\frac{\theta-\theta_0}{\sigma/\sqrt{n}}\right]\\
&amp;=1-\mathrm{P}\left[\frac{\bar{x}-\theta}{\sigma/\sqrt{n}}\leq c&#39;+\frac{\theta_0-\theta}{\sigma/\sqrt{n}}\right]\\
&amp;=1-\Phi\left[c&#39;+\frac{\theta_0-\theta}{\sigma/\sqrt{n}}\right].
\end{aligned}
$$
To illustrate this, consider $\theta_0=5.4,\sigma = 1,n=30$ and $c&#39;=1.645$. Then the plot of the power function as a function of $\theta$ is,
&lt;div&gt;
    &lt;a href=&quot;https://plot.ly/~alstated1a61/203/&quot; target=&quot;_blank&quot; title=&quot;Power Function&quot; style=&quot;display: block; text-align: center;&quot;&gt;&lt;img src=&quot;https://plot.ly/~alstated1a61/203.png&quot; alt=&quot;Power Function&quot; style=&quot;max-width: 100%;&quot;  onerror=&quot;this.onerror=null;this.src=&#39;https://plot.ly/404.png&#39;;&quot; /&gt;&lt;/a&gt;
    &lt;script data-plotly=&quot;alstated1a61:203&quot; src=&quot;https://plot.ly/embed.js&quot; async&gt;&lt;/script&gt;
&lt;/div&gt;
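The plotted curve can be reproduced numerically; the sketch below evaluates $\beta(\theta)=1-\Phi\left[c&#39;+\frac{\theta_0-\theta}{\sigma/\sqrt{n}}\right]$ at a few values of $\theta$, using the same $\theta_0=5.4$, $\sigma=1$, $n=30$, and $c&#39;=1.645$ as in the plot:

```python
from math import erf, sqrt

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def power(theta, theta0=5.4, sigma=1.0, n=30, c=1.645):
    # beta(theta) = 1 - Phi(c + (theta0 - theta) * sqrt(n) / sigma)
    return 1.0 - Phi(c + (theta0 - theta) * sqrt(n) / sigma)

for theta in (5.2, 5.4, 5.6, 5.8):
    print(theta, round(power(theta), 4))
```

At $\theta=\theta_0=5.4$ the printed value is the size of the test, about $0.05$.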
Since $\beta$ is an increasing function with unit range, then
$$
\alpha = \sup_{\theta\leq\theta_0}\beta(\theta)=\beta(\theta_0)=1-\Phi(c&#39;).
$$
So that, using the values we set for the above graph, $\alpha=0.049985\approx 0.05$. Here $\alpha$ is called the &lt;i&gt;size of the test&lt;/i&gt; since it is the supremum of the power function over $\theta\leq\theta_0$; see Reference 1 for the &lt;i&gt;level of the test&lt;/i&gt;. Now let&#39;s investigate the power function above: the probability of committing a Type I error, $\beta(\theta), \forall \theta\leq \theta_0$, is acceptably small. However, the probability of committing a Type II error, $1-\beta(\theta)$, for $\theta &gt; \theta_0$ near $\theta_0$, is too high, as we can see in the following plot,
&lt;div&gt;
    &lt;a href=&quot;https://plot.ly/~alstated1a61/230/&quot; target=&quot;_blank&quot; title=&quot;Type II Error&quot; style=&quot;display: block; text-align: center;&quot;&gt;&lt;img src=&quot;https://plot.ly/~alstated1a61/230.png&quot; alt=&quot;Type II Error&quot; style=&quot;max-width: 100%;&quot;  onerror=&quot;this.onerror=null;this.src=&#39;https://plot.ly/404.png&#39;;&quot; /&gt;&lt;/a&gt;
    &lt;script data-plotly=&quot;alstated1a61:230&quot; src=&quot;https://plot.ly/embed.js&quot; async&gt;&lt;/script&gt;
&lt;/div&gt;
Therefore, it&#39;s better to investigate the error structure when considering the power of the test. From Casella and Berger (2001), the ideal power function is 0 $\forall\theta\in\Theta_0$ and 1 $\forall\theta\in\Theta_0^c$. Except in trivial situations, this ideal cannot be attained. Qualitatively, a good test has power function near 1 for most $\theta\in\Theta_0^c$ and near 0 for most $\theta\in\Theta_0$, implying one that has a steeper power curve.&lt;br/&gt;&lt;br/&gt;

Now an interesting fact about the power function is that it depends on the sample size $n$. Suppose in our experiment above we want the Type I error to be 0.05 and the Type II error to be 0.1 if $\theta\geq \theta_0+\sigma/2$. Since the power function is increasing, we have
$$
\beta(\theta_0)=0.05\Rightarrow c&#39;=1.645\quad\text{and}\quad 1 - \beta(\theta_0+\sigma/2)=0.1\Rightarrow\beta(\theta_0+\sigma/2)=0.9.
$$
Where
$$
\begin{aligned}
\beta(\theta_0+\sigma/2)&amp;=1-\Phi\left[c&#39; +\frac{\theta_0-(\theta_0+\sigma/2)}{\sigma/\sqrt{n}}\right]\\
&amp;=1-\Phi\left[c&#39; - \frac{\sqrt{n}}{2}\right]\\
0.9&amp;=1-\Phi\left[1.645 - \frac{\sqrt{n}}{2}\right]\\
0.1&amp;=\Phi\left[1.645 - \frac{\sqrt{n}}{2}\right].\\
\end{aligned}
$$
Hence, $n$ is chosen such that it solves the above equation. That is,
$$
\begin{aligned}
1.645 - \frac{\sqrt{n}}{2}&amp;=-1.28155,\quad\text{since }\Phi(-1.28155)=0.1\\
\frac{3.29 - \sqrt{n}}{2}&amp;=-1.28155\\
3.29 - \sqrt{n}&amp;=-2.5631\\
n&amp;=(3.29+2.5631)^2=34.25878,\;\text{take }n=35.
\end{aligned}
$$
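The same computation can be done with Python&#39;s standard library, solving $1.645 - \frac{\sqrt{n}}{2}=\Phi^{-1}(0.1)$ for $n$ (using exact normal quantiles rather than the rounded values above, so the result differs slightly in the last decimals):

```python
from math import ceil
from statistics import NormalDist

nd = NormalDist()
c = nd.inv_cdf(0.95)    # critical value with beta(theta0) = 0.05
z = nd.inv_cdf(0.10)    # Phi(z) = 0.1
root_n = 2.0 * (c - z)  # solve c - sqrt(n)/2 = z for sqrt(n)
n = root_n ** 2
print(round(n, 5), ceil(n))
```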
For purposes of illustration, we&#39;ll consider the non-rounded value of $n$. Below is the plot of this,
&lt;div&gt;
    &lt;a href=&quot;https://plot.ly/~alstated1a61/238/&quot; target=&quot;_blank&quot; title=&quot;Power Function with Sample Size&quot; style=&quot;display: block; text-align: center;&quot;&gt;&lt;img src=&quot;https://plot.ly/~alstated1a61/238.png&quot; alt=&quot;Power Function with Sample Size&quot; style=&quot;max-width: 100%;&quot;  onerror=&quot;this.onerror=null;this.src=&#39;https://plot.ly/404.png&#39;;&quot; /&gt;&lt;/a&gt;
    &lt;script data-plotly=&quot;alstated1a61:238&quot; src=&quot;https://plot.ly/embed.js&quot; async&gt;&lt;/script&gt;
&lt;/div&gt;
And for different values of $n$, consider the following power functions
&lt;div&gt;
    &lt;a href=&quot;https://plot.ly/~alstated1a61/286/&quot; target=&quot;_blank&quot; title=&quot;Effect of Sample Size on Power Function&quot; style=&quot;display: block; text-align: center;&quot;&gt;&lt;img src=&quot;https://plot.ly/~alstated1a61/286.png&quot; alt=&quot;Effect of Sample Size on Power Function&quot; style=&quot;max-width: 100%;&quot;  onerror=&quot;this.onerror=null;this.src=&#39;https://plot.ly/404.png&#39;;&quot; /&gt;&lt;/a&gt;
    &lt;script data-plotly=&quot;alstated1a61:286&quot; src=&quot;https://plot.ly/embed.js&quot; async&gt;&lt;/script&gt;
&lt;/div&gt;
From the above plot, the larger the sample size $n$, the steeper the curve, implying a better error structure. To see this, try hovering over the lines in the plot, and you&#39;ll witness a faster transition across the unit range for larger values of $n$; this characteristic contributes to the sensitivity of the test.&lt;br/&gt;&lt;br/&gt;
&lt;h3&gt;Plot&#39;s Python Codes&lt;/h3&gt;
In case you want to reproduce the above plots, click &lt;a href=&quot;https://gist.github.com/alstat/cfb927fffea2cc15afe2&quot; target = &quot;_blank&quot;&gt;here&lt;/a&gt; for the source code.
&lt;br/&gt;&lt;br/&gt;
&lt;h3&gt;
Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;a href=&quot;http://www.amazon.com/Statistical-Inference-George-Casella/dp/0534243126&quot; target=&quot;_blank&quot;&gt;Casella, G. and Berger, R.L. (2001). &lt;i&gt;Statistical Inference&lt;/i&gt;. Thomson Learning, Inc.&lt;/a&gt; 
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://plot.ly/python/&quot; target = &quot;_blank&quot;&gt;Plotly Python Library Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/2562292771002515136/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/05/parametric-inference-power-function-of.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/2562292771002515136'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/2562292771002515136'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/05/parametric-inference-power-function-of.html' title='Parametric Inference: The Power Function of the Test'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-6414633907837650938</id><published>2015-04-27T17:21:00.000+08:00</published><updated>2015-08-17T10:46:27.584+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Interactive Visualization"/><category scheme="http://www.blogger.com/atom/ns#" term="Parametric Inference"/><category scheme="http://www.blogger.com/atom/ns#" term="Python"/><title type='text'>Parametric Inference: Likelihood Ratio Test by Example</title><content type='html'>Hypothesis testing has been used extensively in different disciplines of science. And in this post, I will attempt to discuss the basic theory behind this, the Likelihood Ratio Test (LRT), defined below from Casella and Berger (2001); see Reference 1.
&lt;blockquote&gt;
&lt;b&gt;Definition&lt;/b&gt;. The &lt;i&gt;likelihood ratio test statistic&lt;/i&gt; for testing $H_0:\theta\in\Theta_0$ versus $H_1:\theta\in\Theta_0^c$ is
\begin{equation}
\label{eq:lrt}
\lambda(\mathbf{x})=\frac{\displaystyle\sup_{\theta\in\Theta_0}L(\theta|\mathbf{x})}{\displaystyle\sup_{\theta\in\Theta}L(\theta|\mathbf{x})}.
\end{equation}
A &lt;i&gt;likelihood ratio test&lt;/i&gt; (LRT) is any test that has a rejection  region of the form $\{\mathbf{x}:\lambda(\mathbf{x})\leq c\}$, where $c$ is any number satisfying $0\leq c \leq 1$.
&lt;/blockquote&gt;
The numerator of equation (\ref{eq:lrt}) is the supremum of the likelihood of the sample, $\mathbf{x}$, taken over the restricted domain (the null hypothesis, $\Theta_0$) of the parameter space $\Theta$; that is, the largest joint probability of the sample attainable when $\theta$ is confined to $\Theta_0$. The denominator is the supremum of the likelihood over the unrestricted domain, $\Theta$. Therefore, if $\lambda(\mathbf{x})$ is small, say $\lambda(\mathbf{x})\leq c$ for some $c\in [0, 1]$, then the values of the parameter that are plausible in explaining the sample are likely to be in the alternative hypothesis, $\Theta_0^c$.&lt;br/&gt;&lt;br/&gt;
&lt;b&gt;Example 1&lt;/b&gt;. Let $X_1,X_2,\cdots,X_n\overset{r.s.}{\sim}f(x|\theta)=\frac{1}{\theta}\exp\left[-\frac{x}{\theta}\right],x&gt;0,\theta&gt;0$. From this sample, consider testing $H_0:\theta = \theta_0$ vs $H_1:\theta&lt;\theta_0$.
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
&lt;br/&gt;&lt;br/&gt;
&lt;i&gt;Solution:&lt;/i&gt;&lt;br/&gt;
The parameter space $\Theta$ is the set $(0,\theta_0]$, where $\Theta_0=\{\theta_0\}$. Hence, using the likelihood ratio test, we have
$$
\lambda(\mathbf{x})=\frac{\displaystyle\sup_{\theta=\theta_0}L(\theta|\mathbf{x})}{\displaystyle\sup_{\theta\leq\theta_0}L(\theta|\mathbf{x})},
$$
where,
$$
\begin{aligned}
\sup_{\theta=\theta_0}L(\theta|\mathbf{x})&amp;=\sup_{\theta=\theta_0}\prod_{i=1}^{n}\frac{1}{\theta}\exp\left[-\frac{x_i}{\theta}\right]\\
&amp;=\sup_{\theta=\theta_0}\left(\frac{1}{\theta}\right)^n\exp\left[-\displaystyle\frac{\sum_{i=1}^{n}x_i}{\theta}\right]\\
&amp;=\left(\frac{1}{\theta_0}\right)^n\exp\left[-\displaystyle\frac{\sum_{i=1}^{n}x_i}{\theta_0}\right],
\end{aligned}
$$
and
$$
\begin{aligned}
\sup_{\theta\leq\theta_0}L(\theta|\mathbf{x})&amp;=\sup_{\theta\leq\theta_0}\prod_{i=1}^{n}\frac{1}{\theta}\exp\left[-\frac{x_i}{\theta}\right]\\
&amp;=\sup_{\theta\leq\theta_0}\left(\frac{1}{\theta}\right)^n\exp\left[-\displaystyle\frac{\sum_{i=1}^{n}x_i}{\theta}\right]=\sup_{\theta\leq\theta_0}f(\mathbf{x}|\theta).
\end{aligned}
$$
Now the supremum of $f(\mathbf{x}|\theta)$ over all values of $\theta\leq\theta_0$ is attained at the MLE (maximum likelihood estimator) of $\theta$, which is $\bar{x}$, provided that $\bar{x}\leq \theta_0$.&lt;br/&gt;&lt;br/&gt;
So that, 
$$
\begin{aligned}
\lambda(\mathbf{x})&amp;=\frac{\left(\frac{1}{\theta_0}\right)^n\exp\left[-\displaystyle\frac{\sum_{i=1}^{n}x_i}{\theta_0}\right]}
{\left(\frac{1}{\bar{x}}\right)^n\exp\left[-\displaystyle\frac{\sum_{i=1}^{n}x_i}{\bar{x}}\right]},\quad\text{provided that}\;\bar{x}\leq \theta_0\\
&amp;=\left(\frac{\bar{x}}{\theta_0}\right)^n\exp\left[-\displaystyle\frac{\sum_{i=1}^{n}x_i}{\theta_0}\right]\exp[n].
\end{aligned}
$$
We then reject $H_0$ if $\lambda(\mathbf{x})\leq c$. That is,
$$
\begin{aligned}
\left(\frac{\bar{x}}{\theta_0}\right)^n\exp\left[-\displaystyle\frac{\sum_{i=1}^{n}x_i}{\theta_0}\right]\exp[n]&amp;\leq c\\
\left(\frac{\bar{x}}{\theta_0}\right)^n\exp\left[-\displaystyle\frac{\sum_{i=1}^{n}x_i}{\theta_0}\right]&amp;\leq c&#39;,\quad\text{where}\;c&#39;=\frac{c}{\exp[n]}\\
n\log\left(\frac{\bar{x}}{\theta_0}\right)-\frac{n}{\theta_0}\bar{x}&amp;\leq \log c&#39;\\
\log\left(\frac{\bar{x}}{\theta_0}\right)-\frac{\bar{x}}{\theta_0}&amp;\leq \frac{1}{n}\log c&#39;\\
\log\left(\frac{\bar{x}}{\theta_0}\right)-\frac{\bar{x}}{\theta_0}&amp;\leq \frac{1}{n}\log c-1.
\end{aligned}
$$
Now let $h(x)=\log x - x$; then $h&#39;(x)=\frac{1}{x}-1$, so the critical point of $h(x)$ is $x=1$. To test whether this is a maximum or a minimum, we apply the second derivative test. That is,
$$
h&#39;&#39;(x)=-\frac{1}{x^2}&lt;0,\forall x.
$$
Thus, $x=1$ is a maximum. Hence,
$
\log\left(\frac{\bar{x}}{\theta_0}\right)-\frac{\bar{x}}{\theta_0}
$
is maximized when $\frac{\bar{x}}{\theta_0}=1$, that is, when $\bar{x}=\theta_0$. To see this, consider the following plot:
&lt;div&gt;
    &lt;a href=&quot;https://plot.ly/~alstated1a61/152/&quot; target=&quot;_blank&quot; title=&quot;LRT and its Critical Value&quot; style=&quot;display: block; text-align: center;&quot;&gt;&lt;img src=&quot;https://plot.ly/~alstated1a61/152.png&quot; alt=&quot;LRT and its Critical Value&quot; style=&quot;max-width: 100%;&quot;  onerror=&quot;this.onerror=null;this.src=&#39;https://plot.ly/404.png&#39;;&quot; /&gt;&lt;/a&gt;
    &lt;script data-plotly=&quot;alstated1a61:152&quot; src=&quot;https://plot.ly/embed.js&quot; async&gt;&lt;/script&gt;
&lt;/div&gt;
The figure above plots the function $h(\bar{x})$ with $\theta_0=1$. Under the assumption that $\bar{x}\leq \theta_0$, and letting $R=\frac{1}{n}\log c-1$ denote the orange line above, we reject $H_0$ if $h(\bar{x})\leq R$, which happens if and only if $\bar{x}\leq k$. In practice, $k$ is specified to satisfy
$$
\mathrm{P}(\bar{x}\leq k|\theta=\theta_0)\leq \alpha,
$$
where $\alpha$ is called the level of the test.&lt;br/&gt;&lt;br/&gt;
Under $H_0$, $X_i|\theta = \theta_0\overset{r.s.}{\sim}\exp[\theta_0]$, so $\mathrm{E}X_i=\theta_0$ and $\mathrm{Var}X_i=\theta_0^2$. Let $\bar{x}=\frac{1}{n}\sum_{i=1}^{n}X_i$, and let $G_n$ be the distribution of $\frac{(\bar{x}_n-\theta_0)}{\sqrt{\frac{\theta_0^2}{n}}}$. By the CLT (central limit theorem), $G_n$ converges to the standard normal distribution as $n\to\infty$. That is, $\bar{x}|\theta = \theta_0\overset{r.s.}{\sim}AN\left(\theta_0,\frac{\theta_0^2}{n}\right)$, where $AN$ means asymptotically normal. &lt;br/&gt;&lt;br/&gt;
Thus,
$$
\mathrm{P}(\bar{x}\leq k|\theta=\theta_0)=\Phi\left(\frac{k-\theta_0}{\theta_0/\sqrt{n}}\right),\quad\text{for large }n.
$$
So that,
$$
\mathrm{P}(\bar{x}\leq k|\theta=\theta_0)=\Phi\left(\frac{k-\theta_0}{\theta_0/\sqrt{n}}\right)\leq \alpha.
$$
Plotting this gives us,
&lt;div&gt;
    &lt;a href=&quot;https://plot.ly/~alstated1a61/115/&quot; target=&quot;_blank&quot; title=&quot;CDF&quot; style=&quot;display: block; text-align: center;&quot;&gt;&lt;img src=&quot;https://plot.ly/~alstated1a61/115.png&quot; alt=&quot;CDF&quot; style=&quot;max-width: 100%;&quot;  onerror=&quot;this.onerror=null;this.src=&#39;https://plot.ly/404.png&#39;;&quot; /&gt;&lt;/a&gt;
    &lt;script data-plotly=&quot;alstated1a61:115&quot; src=&quot;https://plot.ly/embed.js&quot; async&gt;&lt;/script&gt;
&lt;/div&gt;
with corresponding PDF given by,
&lt;div &gt;
    &lt;a href=&quot;https://plot.ly/~alstated1a61/128/&quot; target=&quot;_blank&quot; title=&quot;Density Function&quot; style=&quot;display: block; text-align: center;&quot;&gt;&lt;img src=&quot;https://plot.ly/~alstated1a61/128.png&quot; alt=&quot;Density Function&quot; style=&quot;max-width: 100%;&quot;  onerror=&quot;this.onerror=null;this.src=&#39;https://plot.ly/404.png&#39;;&quot; /&gt;&lt;/a&gt;
    &lt;script data-plotly=&quot;alstated1a61:128&quot; src=&quot;https://plot.ly/embed.js&quot; async&gt;&lt;/script&gt;
&lt;/div&gt;
Implying,
$$
\frac{k-\theta_0}{\theta_0/\sqrt{n}}=z_{\alpha}\Rightarrow k=\theta_0+z_{\alpha}\frac{\theta_0}{\sqrt{n}}.
$$
Therefore, a level-$\alpha$ test of $H_0:\theta=\theta_0$ vs $H_1:\theta&lt;\theta_0$ is the test that rejects $H_0$ when $\bar{x}\leq\theta_0+z_{\alpha}\frac{\theta_0}{\sqrt{n}}$.&lt;br/&gt;&lt;br/&gt;
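To make the last statement concrete, here is a short simulation (my addition, assuming $\theta_0=1$, $n=100$, $\alpha=0.05$, and using SciPy for the normal quantile) that computes the cutoff $k=\theta_0+z_{\alpha}\frac{\theta_0}{\sqrt{n}}$ and checks that the test rejects roughly $100\alpha\%$ of the time under $H_0$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
theta0, n, alpha = 1.0, 100, 0.05

# z_alpha is the lower alpha-quantile of the standard normal (negative here)
z_alpha = norm.ppf(alpha)
k = theta0 + z_alpha * theta0 / np.sqrt(n)

# Simulate sample means under H0 and estimate the actual size of the test
reps = 20000
xbars = rng.exponential(scale=theta0, size=(reps, n)).mean(axis=1)
size = np.less_equal(xbars, k).mean()  # should be close to alpha
print(k, size)
```

The estimated size differs slightly from $\alpha$ because the normal distribution is only an asymptotic approximation to the distribution of $\bar{x}$.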
&lt;h3&gt;Plot&#39;s Python Codes&lt;/h3&gt;
In case you are wondering how the above plots were generated:
&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/34f4a22120a658e0a980.js&quot;&gt;&lt;/script&gt;
&lt;h3&gt;
Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;a href=&quot;http://www.amazon.com/Statistical-Inference-George-Casella/dp/0534243126&quot; target=&quot;_blank&quot;&gt;Casella, G. and Berger, R.L. (2001). &lt;i&gt;Statistical Inference&lt;/i&gt;. Thomson Learning, Inc.&lt;/a&gt; 
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://plot.ly/python/&quot; target = &quot;_blank&quot;&gt;Plotly Python Library Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/6414633907837650938/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/04/parametric-inference-likelihood-ratio.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/6414633907837650938'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/6414633907837650938'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/04/parametric-inference-likelihood-ratio.html' title='Parametric Inference: Likelihood Ratio Test by Example'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-8297983243016383510</id><published>2015-04-25T15:29:00.000+08:00</published><updated>2015-04-25T15:29:40.614+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Descriptive Statistics"/><category scheme="http://www.blogger.com/atom/ns#" term="Probability Theory"/><category scheme="http://www.blogger.com/atom/ns#" term="SAS"/><title type='text'>SAS&amp;reg;: Getting Started with PROC IML</title><content type='html'>Another powerful SAS procedure, and my favorite one, that I would like to share is PROC IML (Interactive Matrix Language). This procedure treats all objects as matrices and is very useful for scientific computations involving vectors and matrices. To get started, we are going to demonstrate and discuss the following:
&lt;ul&gt;
&lt;li&gt;Creating and Shaping Matrices;&lt;/li&gt;
&lt;li&gt;Matrix Query;&lt;/li&gt;
&lt;li&gt;Subscripts;&lt;/li&gt;
&lt;li&gt;Descriptive Statistics;&lt;/li&gt;
&lt;li&gt;Set Operations;&lt;/li&gt;
&lt;li&gt;Probability Functions and Subroutine;&lt;/li&gt;
&lt;li&gt;Linear Algebra;&lt;/li&gt;
&lt;li&gt;Reading and Creating Data;&lt;/li&gt;
&lt;/ul&gt;
The outline above is based on the IML tip sheet (see Reference 1). To begin with the first bullet, consider the following code:
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/2497c8d1404f99c11d21.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;div id=&quot;div_659b59db-692c-4d47-b4ee-f8af267a4eaf&quot; class=&quot;c body&quot;&gt;
&lt;section data-name=&quot;IML&quot; data-sec-type=&quot;proc&quot;&gt;
&lt;article&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;scalar&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX1&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;6&quot; scope=&quot;colgroup&quot;&gt;row_vec&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;4&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;5&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX2&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;col_vec&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX3&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;3&quot; scope=&quot;colgroup&quot;&gt;num_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;4&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;5&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX4&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;chr_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;Hello,&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;world! :D&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX5&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;6&quot; scope=&quot;colgroup&quot;&gt;i_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX6&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;mat_2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX7&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;trow_vec&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX8&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;2&quot; scope=&quot;colgroup&quot;&gt;mat1&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;3&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;5&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;/section&gt;
&lt;/div&gt;
&lt;/center&gt;

With the help of the comments in the code, it shouldn&#39;t be difficult to understand what each line does, so I will only explain line 33. In SAS, defined variables are not automatically saved to the workspace; one must store them first and then call them in other procedures by loading the storage, which we&#39;ll see in the next entry -- Matrix Query. The functions we&#39;ll discuss in matrix query involve extracting the number of columns, rows, and so on; below is a sample code for this:&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/544f32b4555ea6ea5851.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;div id=&quot;div_7bace06f-a0ca-4a3c-b6e0-8760c483c5ce&quot; class=&quot;c body&quot;&gt;
&lt;section data-name=&quot;IML&quot; data-sec-type=&quot;proc&quot;&gt;
&lt;div id=&quot;IDX&quot; class=&quot;systitleandfootercontainer&quot; style=&quot;border-spacing: 1px&quot;&gt;
&lt;/div&gt;
&lt;article&gt;
&lt;pre class=&quot;batch&quot; style=&quot;border-spacing: 1px; width:500px&quot;&gt; SYMBOL     ROWS   COLS TYPE   SIZE                     
 ------   ------ ------ ---- ------                     
 CHR_MAT       2      1 char      9                     
 COL_VEC       6      1 num       8                     
 I_MAT         6      6 num       8                     
 MAT1          3      2 num       8                     
 MAT_2         3      1 num       8                     
 NUM_MAT       2      3 num       8                     
 ROW_VEC       1      6 num       8                     
 SCALAR        1      1 num       8                     
 TROW_VEC      6      1 num       8                     
  Number of symbols = 10  (includes those without values)
&lt;/pre&gt;&lt;br/&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX1&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;nmat_row&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX2&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;nmat_col&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX3&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;2&quot; scope=&quot;colgroup&quot;&gt;nmat_dim&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX4&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;cmat_len&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX5&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;cmat_nlen&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX6&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;nmat_typ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;N&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX7&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;cmat_typ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;C&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;/section&gt;
&lt;/div&gt;
&lt;/center&gt;
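For readers following along without SAS, the queries shown in the tables above have close NumPy analogues (a rough sketch of my own, not the post's code; the IML functions named in the comments are nrow, ncol, length, and type from the tip sheet):

```python
import numpy as np

# Matrices mirroring the ones created in the gist above
num_mat = np.array([[1, 2, 3], [4, 5, 6]])
chr_mat = np.array(["Hello,", "world! :D"])

nmat_row = num_mat.shape[0]            # cf. IML nrow(num_mat)   -- 2
nmat_col = num_mat.shape[1]            # cf. IML ncol(num_mat)   -- 3
nmat_dim = num_mat.shape               # rows and columns        -- (2, 3)
cmat_len = [len(s) for s in chr_mat]   # cf. IML length(chr_mat) -- [6, 9]
cmat_nlen = max(cmat_len)              # longest element length  -- 9
nmat_typ = "N" if num_mat.dtype.kind in "if" else "C"  # cf. IML type()
print(nmat_row, nmat_col, nmat_dim, cmat_len, cmat_nlen, nmat_typ)
```

The printed values match the SAS output tables above.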
To load all variables stored in the workspace, we use line 3. The succeeding lines are not difficult to understand, and this is what I love about SAS: the statements and functions are self-explanatory. With that, we proceed to subscripting on matrices; below is the code for it:&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/9ec1455cb62d691f99ad.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;div id=&quot;div_c86abb13-bf8d-4735-b91e-f90589cb9998&quot; class=&quot;c body&quot;&gt;
&lt;section data-name=&quot;IML&quot; data-sec-type=&quot;proc&quot;&gt;
&lt;div id=&quot;IDX&quot; class=&quot;systitleandfootercontainer&quot; style=&quot;border-spacing: 1px&quot;&gt;
&lt;/div&gt;
&lt;article&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;3&quot; scope=&quot;colgroup&quot;&gt;NUM_MAT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;4&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;5&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX1&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;n22_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX2&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;3&quot; scope=&quot;colgroup&quot;&gt;nr1_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX3&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;6&quot; scope=&quot;colgroup&quot;&gt;ir12_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX4&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;2&quot; scope=&quot;colgroup&quot;&gt;ic12_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX5&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;ngm_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;3.5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX6&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;3&quot; scope=&quot;colgroup&quot;&gt;ncm_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;2.5&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3.5&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;4.5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX7&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;nrm_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX8&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;ngs_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;21&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX9&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;3&quot; scope=&quot;colgroup&quot;&gt;nrs_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;17&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;29&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;45&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX10&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;ncs_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;77&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX11&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;nss_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;91&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX12&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;3&quot; scope=&quot;colgroup&quot;&gt;nrs_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;17&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;29&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;45&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX13&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;ncs_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;77&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;/section&gt;
&lt;/div&gt;
&lt;/center&gt;
Line 17 computes the grand mean of the matrix by simply inserting the &lt;code&gt;:&lt;/code&gt; symbol in the subscript. So if we have &lt;code&gt;num_mat[:, 1]&lt;/code&gt;, the mean is computed over the row entries, giving us the column mean, in this case for the first column. The same goes for &lt;code&gt;num_mat[1, :]&lt;/code&gt;, which computes the mean over the column entries, giving us the row mean. If we replace the symbol in the subscript with &lt;code&gt;+&lt;/code&gt;, we get the sum of the entries instead. Further, the &lt;code&gt;##&lt;/code&gt; symbol returns the sum of squares of the elements, and reducing this to &lt;code&gt;#&lt;/code&gt; returns the product of the elements.&lt;br/&gt;&lt;br/&gt;
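As a rough cross-check outside SAS, the same subscript-reduction ideas can be sketched in Python with NumPy. The matrix below is a hypothetical example, not the one from the gist:

```python
import numpy as np

# Hypothetical 2x3 matrix standing in for num_mat
num_mat = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]])

grand_mean = num_mat.mean()         # like num_mat[:] in IML
col_means  = num_mat.mean(axis=0)   # like num_mat[:, j], mean over row entries
row_means  = num_mat.mean(axis=1)   # like num_mat[i, :], mean over column entries
total      = num_mat.sum()          # like the + reduction
ssq        = (num_mat ** 2).sum()   # like the ## reduction (sum of squares)
prod       = num_mat.prod()         # like the # reduction (product)
```

The axis argument plays the role of the IML subscript position: axis 0 collapses rows, axis 1 collapses columns.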

Now let&#39;s proceed to the next bullet, which is about Descriptive Statistics.&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/28c4498a94bc8acd622f.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;div id=&quot;div_bc574f49-7da4-479e-a6f4-9f85a2bc330a&quot; class=&quot;c body&quot;&gt;
&lt;section data-name=&quot;IML&quot; data-sec-type=&quot;proc&quot;&gt;
&lt;div id=&quot;IDX&quot; class=&quot;systitleandfootercontainer&quot; style=&quot;border-spacing: 1px&quot;&gt;
&lt;/div&gt;
&lt;article&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;6&quot; scope=&quot;colgroup&quot;&gt;csr_vec&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;6&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;10&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;15&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;21&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX1&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;3&quot; scope=&quot;colgroup&quot;&gt;csn_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;10&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;15&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;21&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX2&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;mnr_vec&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX3&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;mnn_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX4&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;mxr_vec&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX5&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;mxn_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX6&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;smr_vec&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;21&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX7&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;smn_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;21&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX8&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;ssr_vec&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;91&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX9&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;ssn_mat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;91&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;/section&gt;
&lt;/div&gt;
&lt;/center&gt;
To generate random numbers from, say, a normal distribution and compute the mean, standard deviation, and other statistics, consider the following:&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/9466735e59f624f2cf07.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;div id=&quot;div_b4cc1ec2-9911-43bc-9bae-76b53bf609ab&quot; class=&quot;c body&quot;&gt;
&lt;section data-name=&quot;IML&quot; data-sec-type=&quot;proc&quot;&gt;
&lt;div id=&quot;IDX&quot; class=&quot;systitleandfootercontainer&quot; style=&quot;border-spacing: 1px&quot;&gt;
&lt;/div&gt;
&lt;article&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;x1&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.2642335&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1.0747269&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.8179241&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.552775&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1.5401449&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-1.233822&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.141535&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1.0420036&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.0657322&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1.225259&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.148304&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.2901233&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-1.149394&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.482548&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.452974&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.2738675&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.224133&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.218553&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.420015&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.246356&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX1&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;x2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;54.993687&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;58.167325&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;59.147705&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;40.74794&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;45.813645&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;53.460273&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;57.877839&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;51.98273&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;49.875743&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;52.570553&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;54.097005&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;46.936325&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;57.509082&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;50.463228&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;42.775346&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;39.376643&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;53.303455&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;54.494482&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;55.747821&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;44.512206&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX2&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;2&quot; scope=&quot;colgroup&quot;&gt;x12&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.2642335&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;54.993687&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1.0747269&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;58.167325&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.8179241&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;59.147705&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.552775&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;40.74794&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1.5401449&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;45.813645&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-1.233822&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;53.460273&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.141535&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;57.877839&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1.0420036&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;51.98273&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.0657322&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;49.875743&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1.225259&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;52.570553&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.148304&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;54.097005&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.2901233&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;46.936325&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-1.149394&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;57.509082&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.482548&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;50.463228&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.452974&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;42.775346&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.2738675&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;39.376643&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.224133&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;53.303455&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.218553&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;54.494482&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.420015&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;55.747821&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.246356&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;44.512206&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX3&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;2&quot; scope=&quot;colgroup&quot;&gt;x12_cor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.001531&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.001531&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX4&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;2&quot; scope=&quot;colgroup&quot;&gt;x12_cov&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.5645625&lt;/td&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.006864&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.006864&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;35.614684&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX5&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;x1_mu&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.1126712&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX6&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;x2_std&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;5.967804&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;/section&gt;
&lt;/div&gt;
&lt;/center&gt;
Line 2 above sets the initial random seed for the random numbers generated in line 8. Line 5 allocates a 20-by-1 matrix to the &lt;code&gt;x1&lt;/code&gt; variable, and that&#39;s done using the &lt;code&gt;j&lt;/code&gt; function. The number of rows of &lt;code&gt;x1&lt;/code&gt; represents the sample size of the random numbers needed. One can also set &lt;code&gt;x1&lt;/code&gt; to a row vector, in which case the number of columns represents the sample size. The two sets of random samples, &lt;code&gt;x1&lt;/code&gt; and &lt;code&gt;x2&lt;/code&gt;, generated from the same family of distributions (Gaussian/Normal), are then concatenated column-wise (&lt;code&gt;||&lt;/code&gt;) in line 13 to form a 20-by-2 matrix. Using this new matrix, &lt;code&gt;x12&lt;/code&gt;, we can compute the correlation and covariance of the two columns using the &lt;code&gt;corr&lt;/code&gt; and &lt;code&gt;cov&lt;/code&gt; functions, respectively; the output above tells us that there is almost no relationship between the two columns.&lt;br/&gt;&lt;br/&gt;
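A minimal NumPy sketch of the same workflow, assuming the same distributions (N(0, 1) and N(50, 6)) but a hypothetical seed, so the draws will not match the SAS output:

```python
import numpy as np

rng = np.random.default_rng(12345)     # hypothetical seed, analogue of the IML randseed

x1 = rng.normal(0, 1, size=(20, 1))    # 20 draws from N(0, 1)
x2 = rng.normal(50, 6, size=(20, 1))   # 20 draws from N(50, 6)
x12 = np.hstack([x1, x2])              # column-wise concatenation, like || in IML

x12_cor = np.corrcoef(x12, rowvar=False)  # 2x2 correlation matrix, like corr()
x12_cov = np.cov(x12, rowvar=False)       # 2x2 covariance matrix, like cov()
x1_mu   = x1.mean()                       # sample mean of the first column
x2_std  = x2.std(ddof=1)                  # sample standard deviation of the second
```

Since the two columns are drawn independently, the off-diagonal correlation should be near zero, mirroring the SAS result.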
SAS can also perform set operations, and it&#39;s easy. Consider the following:&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/5a3d8e5ee39bc8762858.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;div id=&quot;div_aac492bc-1421-4b71-a5a2-43c5bffcd20d&quot; class=&quot;c body&quot;&gt;
&lt;section data-name=&quot;IML&quot; data-sec-type=&quot;proc&quot;&gt;
&lt;div id=&quot;IDX&quot; class=&quot;systitleandfootercontainer&quot; style=&quot;border-spacing: 1px&quot;&gt;
&lt;/div&gt;
&lt;article&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;4&quot; scope=&quot;colgroup&quot;&gt;B_comp&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;a&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;i&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;m&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX1&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;5&quot; scope=&quot;colgroup&quot;&gt;A_comp&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;e&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;h&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;r&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;t&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;y&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX2&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;10&quot; scope=&quot;colgroup&quot;&gt;AuB&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;a&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;e&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;h&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;i&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;m&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;o&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;r&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;t&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;x&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;y&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX3&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;AnB&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;o&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX4&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;10&quot; scope=&quot;colgroup&quot;&gt;AB_unq&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;a&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;e&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;h&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;i&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;m&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;o&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;r&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;t&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;x&lt;/td&gt;
&lt;td class=&quot;data&quot;&gt;y&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;/section&gt;
&lt;/div&gt;
&lt;/center&gt;
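The same set operations can be sketched with Python&#39;s built-in sets. The two character sets below are chosen to be consistent with the output shown above (their union, intersection, and differences match the AuB, AnB, A_comp, and B_comp tables), though the original gist may define them differently:

```python
# Hypothetical character sets standing in for A and B in the SAS code
A = {"e", "h", "o", "r", "t", "y"}
B = {"a", "i", "m", "o", "x"}

AuB    = sorted(A | B)   # union, like the UNION function
AnB    = sorted(A & B)   # intersection, like XSECT
A_comp = sorted(A - B)   # elements of A not in B, like SETDIF
B_comp = sorted(B - A)   # elements of B not in A
```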
The next bullet is about Probability Functions and Subroutines. For example, consider an experiment defined by the random variable $X$, which follows an exponential distribution with mean $\beta = .5$. What is the probability that $X$ is at most 2, $\mathrm{P}(X\leq 2)$? To solve this we use the &lt;code&gt;CDF&lt;/code&gt; function, but note that the exponential density in SAS is given by
$$f(x|\beta)=\frac{1}{\beta}\exp\left[-\frac{x}{\beta}\right].$$
So to compute the probability, we evaluate the following integral,
$$
\mathrm{P}(X\leq 2)=\int_{0}^{2}\frac{1}{.5}\exp\left[-\frac{x}{.5}\right]\operatorname{d}x = 0.9816844
$$
To confirm this in SAS, run the following
&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/fd66e2ddb1a7918de2b7.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;div id=&quot;div_9eec5826-d049-4fe3-a8eb-5160994efcb6&quot; class=&quot;c body&quot;&gt;
&lt;section data-name=&quot;IML&quot; data-sec-type=&quot;proc&quot;&gt;
&lt;div id=&quot;IDX&quot; class=&quot;systitleandfootercontainer&quot; style=&quot;border-spacing: 1px&quot;&gt;
&lt;/div&gt;
&lt;article&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;px&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.9816844&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;/section&gt;
&lt;/div&gt;
&lt;/center&gt;
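The closed-form CDF of this exponential distribution is $1 - e^{-x/\beta}$, so the SAS value can be checked with a one-liner in Python:

```python
from math import exp

beta = 0.5              # mean of the exponential distribution
x = 2.0
px = 1 - exp(-x / beta) # closed-form CDF: P(X <= 2) = 1 - e^(-4)
```

This agrees with the 0.9816844 reported by the &lt;code&gt;CDF&lt;/code&gt; function above.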
If we take the derivative of the Cumulative Distribution Function (CDF), the resulting expression is what we call the Probability Density Function (PDF). In SAS, we work with this using the &lt;code&gt;PDF&lt;/code&gt; function. For example, we can confirm the above probability by integrating the PDF. To do so, run the following:&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/79470055c1d76cb1926d.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;div id=&quot;div_c6bfb9e4-c541-45b4-a50a-e12563253a89&quot; class=&quot;c body&quot;&gt;
&lt;section data-name=&quot;IML&quot; data-sec-type=&quot;proc&quot;&gt;
&lt;div id=&quot;IDX&quot; class=&quot;systitleandfootercontainer&quot; style=&quot;border-spacing: 1px&quot;&gt;
&lt;/div&gt;
&lt;article&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;px&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.9816844&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;/section&gt;
&lt;/div&gt;
&lt;/center&gt;
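The same numerical integration of the PDF can be sketched in plain Python. Here composite Simpson&#39;s rule stands in for whatever quadrature the SAS subroutine uses internally:

```python
from math import exp

beta = 0.5
def pdf(x):
    """Exponential density f(x | beta) = (1/beta) * exp(-x/beta)."""
    return (1.0 / beta) * exp(-x / beta)

# Composite Simpson's rule on [0, 2] with n (even) subintervals
n = 1000
a, b = 0.0, 2.0
h = (b - a) / n
s = pdf(a) + pdf(b)
for i in range(1, n):
    s += (4 if i % 2 else 2) * pdf(a + i * h)
px = s * h / 3  # numerical value of P(X <= 2)
```

The result matches the CDF value 0.9816844 to well beyond the printed precision.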
To end this topic, consider the inverse of the CDF, which is the quantile function. To compute the quantile at the popular significance level $\alpha = 0.05$ from the standard normal distribution, which is $z_{\alpha} = -1.645$ for the lower tail, run &lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/f5b4a7a6cc66efe8fa41.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;div id=&quot;div_95ef86ca-49ff-4cc1-9df2-eb84e99f317d&quot; class=&quot;c body&quot;&gt;
&lt;section data-name=&quot;IML&quot; data-sec-type=&quot;proc&quot;&gt;
&lt;div id=&quot;IDX&quot; class=&quot;systitleandfootercontainer&quot; style=&quot;border-spacing: 1px&quot;&gt;
&lt;/div&gt;
&lt;article&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;z_a&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-1.644854&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;/section&gt;
&lt;/div&gt;
&lt;/center&gt;
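The lower-tail quantile can be verified with Python&#39;s standard library, which provides the inverse CDF of the normal distribution:

```python
from statistics import NormalDist

alpha = 0.05
z_a = NormalDist().inv_cdf(alpha)  # lower-tail quantile of the standard normal
```

This reproduces the -1.644854 returned by the SAS quantile function.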
The next entry is about Linear Algebra, the topic on which this procedure is based. Linear algebra is very useful in Statistics, especially in Regression, Nonlinear Regression, and Multivariate Analysis. To perform it in SAS, consider &lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/335af0c7dd49cf3bd6a5.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;div id=&quot;div_0dbdf489-6427-4e8f-b189-ef771ec3a453&quot; class=&quot;c body&quot;&gt;
&lt;section data-name=&quot;IML&quot; data-sec-type=&quot;proc&quot;&gt;
&lt;div id=&quot;IDX&quot; class=&quot;systitleandfootercontainer&quot; style=&quot;border-spacing: 1px&quot;&gt;
&lt;/div&gt;
&lt;article&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;xm_det&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX1&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;3&quot; scope=&quot;colgroup&quot;&gt;xm_inv&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-3&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-3&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3&lt;/td&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-1&lt;/td&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;4.441E-16&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX2&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;x_evl&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;11.344814&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.1709152&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.515729&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX3&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;3&quot; scope=&quot;colgroup&quot;&gt;x_evc&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.3279853&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0.591009&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0.7369762&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.591009&lt;/td&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.736976&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0.3279853&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;0.7369762&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0.3279853&lt;/td&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-0.591009&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX4&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;x_coef&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap&quot;&gt;-4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;/section&gt;
&lt;/div&gt;
&lt;/center&gt;
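The determinant, inverse, and eigen-decomposition can be cross-checked with NumPy. The matrix below is an assumption inferred from the output: its determinant (-1) and inverse match the xm_det and xm_inv tables above, though the gist itself may define it differently:

```python
import numpy as np

# Symmetric matrix consistent with the determinant and inverse shown above
xm = np.array([[1.0, 2.0, 3.0],
               [2.0, 4.0, 5.0],
               [3.0, 5.0, 6.0]])

xm_det = np.linalg.det(xm)          # determinant, like DET in IML
xm_inv = np.linalg.inv(xm)          # inverse, like INV
x_evl, x_evc = np.linalg.eigh(xm)   # eigenvalues/eigenvectors of a symmetric matrix

# Solving the linear system xm * b = y, like the SOLVE function
y = np.array([1.0, 2.0, 3.0])
b = np.linalg.solve(xm, y)
```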
Finally, one of the coolest capabilities of SAS/IML is reading and creating SAS data sets. The following code demonstrates how to read a SAS data set.&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/42957027513b0b8f7918.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;div id=&quot;div_6df58228-c26c-4a98-ba2b-4132cfa11cda&quot; class=&quot;c body&quot;&gt;
&lt;section data-name=&quot;IML&quot; data-sec-type=&quot;proc&quot;&gt;
&lt;div id=&quot;IDX&quot; class=&quot;systitleandfootercontainer&quot; style=&quot;border-spacing: 1px&quot;&gt;
&lt;/div&gt;
&lt;article&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;x_dat&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;Acura&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;Acura&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;Acura&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;Acura&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;Acura&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;Acura&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;Acura&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;Audi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;Audi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;data&quot;&gt;Audi&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;article id=&quot;IDX1&quot;&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;c b header&quot; scope=&quot;col&quot;&gt;hp_mean&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;215.88551&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/article&gt;
&lt;/section&gt;
&lt;/div&gt;
&lt;/center&gt;
And to create a SAS data set, run the following:&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/b58693b72de397cc3b50.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/colgroup&gt;&lt;colgroup&gt;&lt;col&gt;&lt;col&gt;&lt;col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Obs&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;COL1&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;COL2&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;COL3&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;1&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;2&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;4&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;5&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/center&gt;
To end this post, I want to say that &lt;i&gt;I am loving SAS because of IML&lt;/i&gt;. There are still hidden capabilities of this procedure that I would love to explore and share with my readers, so stay tuned. Another great blog about SAS/IML is &lt;a href=&quot;http://blogs.sas.com/content/iml&quot; target = &quot;_blank&quot;&gt;The DO Loop&lt;/a&gt;, whose author, &lt;a href=&quot;http://blogs.sas.com/content/iml/author/rickwicklin&quot; target = &quot;_blank&quot;&gt;Dr. Rick Wicklin&lt;/a&gt;, is also the principal developer of the procedure and of &lt;a href=&quot;http://support.sas.com/rnd/app/studio/index.html&quot; target = &quot;_blank&quot;&gt;SAS/IML Studio&lt;/a&gt;; do check it out.&lt;br/&gt;&lt;br/&gt;
&lt;h3&gt;Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;SAS/IML Tip Sheet. &lt;i&gt;&lt;a href=&quot;http://blogs.sas.com/content/iml/files/2011/10/IMLTipSheet.pdf&quot; target = &quot;_blank&quot;&gt;Frequently Used SAS/IML Functions and Subroutines&lt;/a&gt;&lt;/i&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://support.sas.com/documentation/cdl/en/imlug/63541/PDF/default/imlug.pdf&quot; target = &quot;_blank&quot;&gt;SAS/IML 13.2 User Guide&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://blogs.sas.com/content/iml/2011/05/06/how-to-numerically-integrate-a-function-in-sas.html&quot; target = &quot;_blank&quot;&gt;Rick Wicklin. The DO Loop. &lt;i&gt;How to numerically integrate a function in SAS&lt;/i&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;style&gt;
.header {
    background-color: #EDF2F9;
    border-color: #B0B7BB;
    border-style: solid;
    border-width: 0px 1px 1px 0px;
    color: #127;
    font-family: Arial,&quot;Albany AMT&quot;,Helvetica,Helv;
    font-size: x-small;
    font-style: normal;
    font-weight: bold;
    padding: 2px 5px 2px 5px;
}


.rowheader {
    background-color: #EDF2F9;
    border-color: #B0B7BB;
    border-style: solid;
    border-width: 0px 1px 1px 0px;
    color: #127;
    font-family: Arial,&quot;Albany AMT&quot;,Helvetica,Helv;
    font-size: x-small;
    font-style: normal;
    font-weight: bold;
    text-align: center;
    padding: 2px 5px 2px 5px;
}


.data, .dataemphasis {
    background-color: #FFF;
    border-color: #C1C1C1;
    border-style: solid;
    border-width: 0px 1px 1px 0px;
    font-family: Arial,&quot;Albany AMT&quot;,Helvetica,Helv;
    font-size: x-small;
    font-style: normal;
    font-weight: normal;
    text-align: right;
    padding: 2px 5px 2px 5px;
}

.table {
    border-color: #C1C1C1;
    border-style: solid;
    border-width: 1px 1px 1px 1px;
    border-collapse: collapse;
    border-spacing: 0px;
    padding: 5px 5px 5px 5px;
    margin-bottom: 1em;
}

.body {
    color: #000;
    font-family: Arial,&quot;Albany AMT&quot;,Helvetica,Helv;
    font-size: x-small;
    font-style: normal;
    font-weight: normal;
    line-height: 1.231;
}
&lt;/style&gt;</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/8297983243016383510/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/04/sas-getting-started-with-proc-iml.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/8297983243016383510'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/8297983243016383510'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/04/sas-getting-started-with-proc-iml.html' title='SAS&amp;reg;: Getting Started with PROC IML'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-2701868276603194637</id><published>2015-04-16T14:02:00.000+08:00</published><updated>2015-08-17T10:47:45.208+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Interactive Visualization"/><category scheme="http://www.blogger.com/atom/ns#" term="Python"/><category scheme="http://www.blogger.com/atom/ns#" term="R"/><category scheme="http://www.blogger.com/atom/ns#" term="Sampling Analysis"/><title type='text'>Python and R: Basic Sampling Problem</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
In this post, I would like to share a simple problem in sampling analysis and demonstrate how to solve it using Python and R. The first two problems are originally from the book &lt;i&gt;Sampling: Design and Analysis&lt;/i&gt; by Sharon Lohr.&lt;br/&gt;&lt;br/&gt;
&lt;h3&gt;Problems&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Let $N=6$ and $n=3$. For purposes of studying sampling distributions, assume that all population values are known.
&lt;br/&gt;&lt;br/&gt;
&lt;center&gt;
&lt;table width = 50%&gt;
&lt;tr&gt;&lt;td&gt;$y_1 = 98$&lt;/td&gt;&lt;td&gt;$y_2 = 102$&lt;/td&gt;&lt;td&gt;$y_3=154$&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;$y_4 = 133$&lt;/td&gt;&lt;td&gt;$y_5 = 190$&lt;/td&gt;&lt;td&gt;$y_6=175$&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;br/&gt;
We are interested in $\bar{y}_U$, the population mean. Consider the eight possible samples below.
&lt;br/&gt;&lt;br/&gt;
&lt;center&gt;
&lt;table width = 40%&gt;
&lt;tr&gt;&lt;td&gt;Sample No.&lt;/td&gt;&lt;td&gt;Sample, $\mathcal{S}$&lt;/td&gt;&lt;td&gt;$P(\mathcal{S})$&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;center&quot;&gt;1&lt;/td&gt;&lt;td&gt;$\{1,3,5\}$&lt;/td&gt;&lt;td&gt;$1/8$&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;center&quot;&gt;2&lt;/td&gt;&lt;td&gt;$\{1,3,6\}$&lt;/td&gt;&lt;td&gt;$1/8$&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;center&quot;&gt;3&lt;/td&gt;&lt;td&gt;$\{1,4,5\}$&lt;/td&gt;&lt;td&gt;$1/8$&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;center&quot;&gt;4&lt;/td&gt;&lt;td&gt;$\{1,4,6\}$&lt;/td&gt;&lt;td&gt;$1/8$&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;center&quot;&gt;5&lt;/td&gt;&lt;td&gt;$\{2,3,5\}$&lt;/td&gt;&lt;td&gt;$1/8$&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;center&quot;&gt;6&lt;/td&gt;&lt;td&gt;$\{2,3,6\}$&lt;/td&gt;&lt;td&gt;$1/8$&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;center&quot;&gt;7&lt;/td&gt;&lt;td&gt;$\{2,4,5\}$&lt;/td&gt;&lt;td&gt;$1/8$&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;center&quot;&gt;8&lt;/td&gt;&lt;td&gt;$\{2,4,6\}$&lt;/td&gt;&lt;td&gt;$1/8$&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;br/&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
&lt;ol type = &quot;a&quot;&gt;
&lt;li&gt;What is the value of $\bar{y}_U$?&lt;/li&gt;
&lt;li&gt;Let $\bar{y}$ be the mean of the sample values. For each sampling plan, find
&lt;ol type = &quot;i&quot;&gt;
&lt;li&gt;$\mathrm{E}\bar{y}$;&lt;/li&gt;
&lt;li&gt;$\mathrm{Var}\bar{y}$;&lt;/li&gt;
&lt;li&gt;$\mathrm{Bias}(\bar{y})$;&lt;/li&gt;
&lt;li&gt;$\mathrm{MSE}(\bar{y})$;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
Mayr et al. (1994) took an SRS of 240 children who visited their pediatric outpatient clinic. They found the following frequency distribution for the age (in months) of free (unassisted) walking among the children:&lt;br/&gt;&lt;br/&gt;
&lt;center&gt;
&lt;table width = 85%&gt;
&lt;tr&gt;&lt;td align = &quot;right&quot;&gt;Age (months)&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;9&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;10&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;11&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;12&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;13&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;14&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;15&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;16&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;17&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;18&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;19&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;20&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;right&quot;&gt;Number of Children&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;13&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;35&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;44&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;69&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;36&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;24&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;7&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;3&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;2&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;5&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;1&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;1&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;br/&gt;
Find the mean and SE of the age for onset of free walking.
&lt;/li&gt;
&lt;li&gt;
Table 1 gives the cultivated area, in acres, in 1981 for 40 villages in a region (from &lt;i&gt;Theory and Method of Survey&lt;/i&gt;).
Using the (random) arrangement of data in the table, draw a systematic sample of size 8 with random start $r = 2$.&lt;br/&gt;&lt;br/&gt;
&lt;center&gt;
&lt;table width = 80%&gt;
&lt;tr&gt;&lt;td align = &quot;right&quot;&gt;Village&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;$Y_j$&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;Village&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;$Y_j$&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;Village&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;$Y_j$&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;Village&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;$Y_j$&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;right&quot;&gt;1&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;105&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;11&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;319&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;21&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;70&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;31&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;16&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;right&quot;&gt;2&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;625&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;12&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;72&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;22&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;249&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;32&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;439&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;right&quot;&gt;3&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;47&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;13&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;109&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;23&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;384&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;33&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;123&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;right&quot;&gt;4&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;312&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;14&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;91&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;24&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;482&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;34&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;207&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;right&quot;&gt;5&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;327&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;15&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;152&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;25&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;378&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;35&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;145&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;right&quot;&gt;6&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;230&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;16&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;189&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;26&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;111&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;36&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;666&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;right&quot;&gt;7&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;240&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;17&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;365&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;27&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;534&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;37&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;338&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;right&quot;&gt;8&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;203&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;18&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;70&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;28&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;306&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;38&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;624&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;right&quot;&gt;9&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;535&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;19&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;249&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;29&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;655&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;39&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;501&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td align = &quot;right&quot;&gt;10&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;275&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;20&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;384&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;30&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;102&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;40&lt;/td&gt;&lt;td align = &quot;right&quot;&gt;962&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Solutions&lt;/h3&gt;
To appreciate the code, I will first share some of the theory behind the solution; but our main focus here is solving the problem computationally using Python and R.
&lt;ol&gt;
&lt;li&gt;
&lt;ol type = &quot;a&quot;&gt;
&lt;li&gt;The value of $\bar{y}_U$ is coded as follows:&lt;br/&gt;&lt;br/&gt;
Python Code
&lt;script src=&quot;https://gist.github.com/alstat/d98cc60fdb46270f7311.js&quot;&gt;&lt;/script&gt;
R Code
&lt;script src=&quot;https://gist.github.com/alstat/9afce899f14128ea12d0.js&quot;&gt;&lt;/script&gt;
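For readers who prefer an inline example, here is a minimal self-contained Python sketch of the same computation (the gists above hold the original code):

```python
# Population values from the problem (N = 6)
y = [98, 102, 154, 133, 190, 175]

# Population mean, ybar_U
ybar_U = sum(y) / len(y)
print(ybar_U)  # 142.0
```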
&lt;/li&gt;
&lt;li&gt;
To obtain the samples with the indices given in the table above, we first generate all ${6\choose 3} = 20$ combinations of three population indices, where the first two combinations are $\{1,2,3\}$ and $\{1,2,4\}$, and so on. From this list we then draw the combinations listed in the table as our samples; the first sample index, $\{1,3,5\}$, corresponds to the population units $\{98, 154, 190\}$. The following code implements this sampling design:&lt;br/&gt;&lt;br/&gt;
Python Code
&lt;script src=&quot;https://gist.github.com/alstat/bca1e32f6ab9c7e2a6f2.js&quot;&gt;&lt;/script&gt;
R Code
&lt;script src=&quot;https://gist.github.com/alstat/9b67c3e8444c6fcc620c.js&quot;&gt;&lt;/script&gt;
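As an inline sketch of the sampling design just described (the linked gists contain the original code), one way to express it in Python is:

```python
from itertools import combinations

y = [98, 102, 154, 133, 190, 175]

# All 6-choose-3 = 20 combinations of population indices (1-based)
all_combos = list(combinations(range(1, 7), 3))

# The eight samples listed in the table
chosen = [(1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6),
          (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6)]

# Map each sample's indices to the population units
samples = [[y[i - 1] for i in s] for s in chosen]
print(samples[0])  # [98, 154, 190]
```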
&lt;ol type = &quot;i&quot;&gt;
&lt;li&gt;
Now to obtain the expected value of the sample mean, we compute $\mathrm{E}\bar{y}=\sum_{k}\bar{y}_k\mathrm{P}(\bar{y}_k)=\sum_{k}\bar{y}_k\mathrm{P}(\mathcal{S}_k)$, $\forall k\in\{1,\cdots,8\}$. So for $k = 1$, 
$$
\begin{aligned}
\bar{y}_1\mathrm{P}(\mathcal{S}_1)&amp;=\frac{98+154+190}{3}\mathrm{P}(\mathcal{S}_1)\\
&amp;=\frac{98+154+190}{3}\left(\frac{1}{8}\right)=18.41667.
\end{aligned}
$$ 
Applying this to the remaining seven values of $k$ and summing the terms gives $\mathrm{E}\bar{y}$. The following code is the equivalent:&lt;br/&gt;&lt;br/&gt;
Python Code
&lt;script src=&quot;https://gist.github.com/alstat/ef0892e9be3024231b3a.js&quot;&gt;&lt;/script&gt;
R Code
&lt;script src=&quot;https://gist.github.com/alstat/3e5813edd196ab931014.js&quot;&gt;&lt;/script&gt;
From the above code, the output tells us that $\mathrm{E}\bar{y}=142$.
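The expectation can also be sketched inline in Python, independently of the gists above:

```python
y = [98, 102, 154, 133, 190, 175]
chosen = [(1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6),
          (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6)]

# Each of the eight samples has probability 1/8
p = 1 / 8
sample_means = [sum(y[i - 1] for i in s) / 3 for s in chosen]

# E(ybar) as the probability-weighted sum of the sample means
e_ybar = sum(m * p for m in sample_means)
print(round(e_ybar, 4))
```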
&lt;/li&gt;
&lt;li&gt;
Next we compute the variance of $\bar{y}$, which is $\mathrm{Var}\bar{y}=\mathrm{E}\bar{y}^{2}-(\mathrm{E}\bar{y})^2$. We therefore need a function for $\mathrm{E}\bar{y}^2$, whose first term ($k=1$) is $\bar{y}_1^2\mathrm{P}(\mathcal{S}_1)=\left(\frac{98+154+190}{3}\right)^2\mathrm{P}(\mathcal{S}_1)=\left(\frac{98+154+190}{3}\right)^2\left(\frac{1}{8}\right)=2713.3889$. Applying this to the other terms and summing them up, we have the following code:&lt;br/&gt;&lt;br/&gt;
Python Code
&lt;script src=&quot;https://gist.github.com/alstat/965349d8ec12455a96c6.js&quot;&gt;&lt;/script&gt;
R Code
&lt;script src=&quot;https://gist.github.com/alstat/9594aaf87ad6dfe4fde4.js&quot;&gt;&lt;/script&gt;
Using the above output, 20182.94, and subtracting $(\mathrm{E}\bar{y})^2$ from it gives the variance; hence the succeeding code:&lt;br/&gt;&lt;br/&gt;
Python Code:
&lt;script src=&quot;https://gist.github.com/alstat/c213a63513ba21774ec3.js&quot;&gt;&lt;/script&gt;
R Code:
&lt;script src=&quot;https://gist.github.com/alstat/390b6ba65650532f96de.js&quot;&gt;&lt;/script&gt;
So the variance of $\bar{y}$ is $18.9444$.
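The same variance computation can be sketched inline (a minimal Python version, not the gist code):

```python
y = [98, 102, 154, 133, 190, 175]
chosen = [(1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6),
          (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6)]
p = 1 / 8

sample_means = [sum(y[i - 1] for i in s) / 3 for s in chosen]
e_ybar = sum(m * p for m in sample_means)           # E(ybar)
e_ybar_sq = sum(m ** 2 * p for m in sample_means)   # E(ybar squared)

# Var(ybar) = E(ybar squared) minus the square of E(ybar)
var_ybar = e_ybar_sq - e_ybar ** 2
print(round(e_ybar_sq, 2), round(var_ybar, 4))
```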
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
The $\mathrm{Bias}$ is the difference between the expected value of the estimator and the true value. Since the estimator is unbiased ($\mathrm{E}\bar{y}=142=\bar{y}_U$), $\mathrm{Bias}(\bar{y})=142-142=0$.
&lt;/li&gt;
&lt;li&gt;
$\mathrm{MSE}(\bar{y})=\mathrm{Var}\bar{y}+(\mathrm{Bias}\bar{y})^2$, and since $\mathrm{Bias}\bar{y}=0$, $\mathrm{MSE}(\bar{y})=\mathrm{Var}\bar{y}$.
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
First we need the probability of each age, obtained by dividing the number of children at each age by the total number of children. This is what the &lt;code&gt;p_s&lt;/code&gt; function defined below computes. With the probabilities in hand, we can compute the expected value using the &lt;code&gt;expectation&lt;/code&gt; function we defined earlier.&lt;br/&gt;&lt;br/&gt;
Python Code
&lt;script src=&quot;https://gist.github.com/alstat/7c1c0090a03c389fd935.js&quot;&gt;&lt;/script&gt;
R Code
&lt;script src=&quot;https://gist.github.com/alstat/b4eb9a7692ed163538d6.js&quot;&gt;&lt;/script&gt;
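The probability-weighted mean can be sketched inline as follows (a self-contained Python version of the same idea):

```python
ages = list(range(9, 21))  # ages 9 through 20 months
counts = [13, 35, 44, 69, 36, 24, 7, 3, 2, 5, 1, 1]

n = sum(counts)  # 240 children in the SRS
probs = [c / n for c in counts]

# Expected value of the age at onset of free walking
mean_age = sum(a * p for a, p in zip(ages, probs))
print(round(mean_age, 2))  # 12.08
```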

It should be clear from the data that the average age is about 12 months; the plot is shown below.

&lt;iframe src=&quot;http://cdn.rawgit.com/alstat/Analysis-with-Programming/master/2015/Python/Sampling-Design/samp1.html&quot; seamless height=&quot;515px&quot; width=&quot;100%&quot;, frameborder = 0&gt;&lt;/iframe&gt;

For the code of the above plot, please click &lt;a href=&quot;https://gist.github.com/alstat/594c7074890f4dc8b4d1&quot; target = &quot;_blank&quot;&gt;here&lt;/a&gt;. Next we compute the standard error, which is just the square root of the variance of the sample:&lt;br/&gt;&lt;br/&gt;
Python Code
&lt;script src=&quot;https://gist.github.com/alstat/34b12a892e8cb311c9e5.js&quot;&gt;&lt;/script&gt;
R Code
&lt;script src=&quot;https://gist.github.com/alstat/26d69c90d12d68bdd7f8.js&quot;&gt;&lt;/script&gt;
So the standard error of the age is 1.920824.
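The square-root-of-variance computation can be sketched inline as well (again a minimal Python version, not the gist code):

```python
import math

ages = list(range(9, 21))
counts = [13, 35, 44, 69, 36, 24, 7, 3, 2, 5, 1, 1]
n = sum(counts)
probs = [c / n for c in counts]

mean_age = sum(a * p for a, p in zip(ages, probs))
# Variance as E(X squared) minus the squared mean, then the square root
var_age = sum(a ** 2 * p for a, p in zip(ages, probs)) - mean_age ** 2
se_age = math.sqrt(var_age)
print(round(se_age, 6))  # 1.920824
```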
&lt;/li&gt;
&lt;li&gt;
Let me give a brief discussion of systematic sampling to help you understand the code. The idea is that, given the population units numbered from 1 to $N$, we compute the sampling interval $k = \frac{N}{n}$, where $n$ is the number of units needed for the sample. We then choose a random start, a number between $1$ and $k$. The random start is the first sample unit, the second unit is obtained by adding the sampling interval to the random start, and so on. There are two types of systematic sampling: linear and circular. Circular systematic sampling treats the population units numbered $1$ to $N$ as arranged in a circle, so that if an increment step goes beyond $N$, say to $N+2$, the sampled unit is the $2^{nd}$ element of the population, and so on. The code I share below can be used for both linear and circular sampling, but only for this particular problem, since some rules of the linear scheme (for example, the case where $k$ is not a whole number) are not handled by the function; you can always extend it to a more general one.&lt;br/&gt;&lt;br/&gt;
Python Code
&lt;script src=&quot;https://gist.github.com/alstat/06e459265e7c1dfa9b17.js&quot;&gt;&lt;/script&gt;
R Code
&lt;script src=&quot;https://gist.github.com/alstat/77674937787928723e02.js&quot;&gt;&lt;/script&gt;
You may notice in the output above that the index returned in Python is not the same as the index returned in R. This is because Python indexing starts at 0, while R indexing starts at 1. That is why the same population units are sampled in the two languages despite the difference in the returned indices.
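The circular scheme described above can be sketched in Python as follows; the function name and structure are my own for illustration, not the gist code:

```python
def systematic_sample(N, n, r):
    """Circular systematic sampling: returns 1-based indices,
    starting at random start r and stepping by k = N // n,
    wrapping around the population when an index passes N."""
    k = N // n
    return [(r - 1 + i * k) % N + 1 for i in range(n)]

# Acreage of the 40 villages, in village order (Table 1)
acreage = [105, 625, 47, 312, 327, 230, 240, 203, 535, 275,
           319, 72, 109, 91, 152, 189, 365, 70, 249, 384,
           70, 249, 384, 482, 378, 111, 534, 306, 655, 102,
           16, 439, 123, 207, 145, 666, 338, 624, 501, 962]

idx = systematic_sample(40, 8, 2)
sample = [acreage[i - 1] for i in idx]
print(idx)     # [2, 7, 12, 17, 22, 27, 32, 37]
print(sample)  # [625, 240, 72, 365, 249, 534, 439, 338]
```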
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;a href=&quot;http://www.amazon.com/Sampling-Design-Analysis-Advanced-Series/dp/0495105279&quot; target=&quot;_blank&quot;&gt;Lohr, Sharon (2009). &lt;i&gt;Sampling: Design and Analysis&lt;/i&gt;. Cengage Learning.&lt;/a&gt; 
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/2701868276603194637/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/04/python-and-r-basic-sampling-problem.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/2701868276603194637'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/2701868276603194637'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/04/python-and-r-basic-sampling-problem.html' title='Python and R: Basic Sampling Problem'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-6122766998085853717</id><published>2015-03-06T19:27:00.000+08:00</published><updated>2015-03-06T19:27:09.889+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Parametric Inference"/><category scheme="http://www.blogger.com/atom/ns#" term="Probability Theory"/><category scheme="http://www.blogger.com/atom/ns#" term="Python"/><title type='text'>Probability Theory: Convergence in Distribution Problem</title><content type='html'>&lt;div class=&quot;plotdiv&quot; id=&quot;31e7723b-f40b-4bd6-a02f-14762191a179&quot;&gt;
Let&#39;s solve a theoretical problem in probability, specifically on convergence. The problem below is originally Exercise 5.42 of Casella and Berger (2001), and I just want to share my solution. If there is an incorrect argument below, I would be happy if you could point it out to me.&lt;br/&gt;&lt;br/&gt;
&lt;h3&gt;Problem&lt;/h3&gt;
Let $X_1, X_2,\cdots$ be iid (independent and identically distributed) and $X_{(n)}=\max_{1\leq i\leq n}X_i$.
&lt;ol type = &quot;a&quot;&gt;
&lt;li&gt;If $X_i\sim$ beta(1,$\beta$), find a value of $\nu$ so that $n^{\nu}(1-X_{(n)})$ converges in distribution;&lt;/li&gt;
&lt;li&gt;If $X_i\sim$ exponential(1), find a sequence $a_n$ so that $X_{(n)}-a_n$ converges in distribution.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Solution&lt;/h3&gt;
&lt;ol type = &quot;a&quot;&gt;
&lt;li&gt;Let $Y_n=n^{\nu}(1-X_{(n)})$. We say that $Y_n\rightarrow Y$ in distribution if
$$\lim_{n\rightarrow \infty}F_{Y_n}(y)=F_Y(y).$$
Then,
$$
\begin{aligned}
\lim_{n\rightarrow\infty}F_{Y_n}(y)&amp;=\lim_{n\rightarrow\infty}P(Y_n\leq y)=\lim_{n\rightarrow\infty}P(n^{\nu}(1-X_{(n)})\leq y)\\
&amp;=\lim_{n\rightarrow\infty}P\left(1-X_{(n)}\leq \frac{y}{n^{\nu}}\right)\\
&amp;=\lim_{n\rightarrow\infty}P\left(-X_{(n)}\leq \frac{y}{n^{\nu}}-1\right)=\lim_{n\rightarrow\infty}\left[1-P\left(-X_{(n)}&gt; \frac{y}{n^{\nu}}-1\right)\right]\\
&amp;=\lim_{n\rightarrow\infty}\left[1-P\left(\max\{X_1,X_2,\cdots,X_n\}&lt; 1-\frac{y}{n^{\nu}}\right)\right]\\
&amp;=\lim_{n\rightarrow\infty}\left[1-P\left(X_1&lt; 1-\frac{y}{n^{\nu}},X_2&lt; 1-\frac{y}{n^{\nu}},\cdots,X_n&lt; 1-\frac{y}{n^{\nu}}\right)\right]\\
&amp;=\lim_{n\rightarrow\infty}\left[1-P\left(X_1&lt; 1-\frac{y}{n^{\nu}}\right)^n\right],\;\text{since}\;X_i&#39;s\;\text{are iid.}
\end{aligned}
$$
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
And because $X_i\sim$ beta(1,$\beta$), the density is
$$
f_{X_1}(x)=\begin{cases}
\beta(1-x)^{\beta - 1}&amp;\beta&gt;0, 0\leq x\leq 1\\
0,&amp;\mathrm{Otherwise}
\end{cases}
$$
Implies,
$$
\begin{aligned}
\lim_{n\to \infty}P(Y_n\leq y)&amp;=\lim_{n\to \infty}\left\{1-\left[\int_0^{1-\frac{y}{n^{\nu}}}\beta(1-t)^{\beta-1}\,\mathrm{d}t\right]^n\right\}\\
&amp;=\lim_{n\to \infty}\left\{1-\left[-\int_1^{\frac{y}{n^{\nu}}}\beta u^{\beta-1}\,\mathrm{d}u\right]^{n}\right\}\\
&amp;=\lim_{n\to \infty}\left\{1-\left[-\beta\frac{u^{\beta}}{\beta}\bigg|_{u=1}^{u=\frac{y}{n^{\nu}}}\right]^{n}\right\}\\
&amp;=1-\lim_{n\to \infty}\left[1-\left(\frac{y}{n^{\nu}}\right)^{\beta}\right]^{n}
\end{aligned}
$$
We can simplify the limit if $\nu=\frac{1}{\beta}$, that is
$$
\lim_{n\to\infty}P(Y_n\leq y)=1-\lim_{n\to\infty}\left[1-\frac{y^{\beta}}{n}\right]^{n}=1-e^{-y^{\beta}}
$$
To confirm this in Python, run the following code using the sympy module&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/a33ba59e2d9dedf0c1ef.js&quot;&gt;&lt;/script&gt;
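A similar symbolic check can be sketched inline, assuming the sympy module is available (this is my own minimal version; the gist above holds the original):

```python
import sympy as sp

n = sp.symbols('n', positive=True)
y, beta = sp.symbols('y beta', positive=True)

# With nu = 1/beta, the probability in question is [1 - y**beta / n]**n;
# its limit as n grows without bound should be exp(-y**beta)
expr = (1 - y**beta / n)**n
lim = sp.limit(expr, n, sp.oo)
print(lim)
```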
Therefore, if $1-e^{-y^{\beta}}$ is a distribution function of $Y$, then $Y_n=n^{\nu}(1-X_{(n)})$ converges in distribution to $Y$ for $\nu=\frac{1}{\beta}$.&lt;br/&gt;
$\hspace{12.5cm}\blacksquare$&lt;/li&gt;
&lt;li&gt;
$$
\begin{aligned}
P(X_{(n)}-a_{n}\leq y) &amp;= P(X_{(n)}\leq y + a_n)=P(\max\{X_1,X_2,\cdots,X_n\}\leq y+a_n)\\
&amp;=P(X_1\leq y+a_n,X_2\leq y+a_n,\cdots,X_n\leq y+a_n)\\
&amp;=P(X_1\leq y+a_n)^n,\;\text{since}\;X_i&#39;s\;\text{are iid}\\
&amp;=\left[\int_{-\infty}^{y+a_n}f_{X_1}(t)\,\mathrm{d}t\right]^n
\end{aligned}
$$
Since $X_i\sim$ exponential(1), then the density is
$$
f_{X_1}=\begin{cases}
e^{-x},&amp;0\leq x\leq \infty\\
0,&amp;\mathrm{otherwise}
\end{cases}
$$
So that,
$$
\begin{aligned}
P(X_{(n)}-a_{n}\leq y)&amp;=\left[\int_{0}^{y+a_n}e^{-t}\,\mathrm{d}t\right]^n=\left\{-\left[e^{-(y+a_n)}-1\right]\right\}^n\\
&amp;=\left[1-e^{-(y+a_n)}\right]^n
\end{aligned}
$$
If we let $Y_n=X_{(n)}-a_n$, then we say that $Y_n\rightarrow Y$ in distribution if
$$
\lim_{n\to\infty}P(Y_n\leq y)=P(Y\leq y)
$$
Therefore,
$$
\begin{aligned}
\lim_{n\to\infty}P(Y_n\leq y) &amp;= \lim_{n\to\infty}P(X_{(n)}-a_n\leq y)=\lim_{n\to \infty}\left[1-e^{-y-a_n}\right]^n\\
&amp;=\lim_{n\to\infty}\left[1-\frac{e^{-y}}{e^{a_n}}\right]^n
\end{aligned}
$$ 
We can simplify the limit if $a_n=\log n$, that is
$$
\lim_{n\to\infty}\left[1-\frac{e^{-y}}{e^{\log n}}\right]^n=\lim_{n\to\infty}\left[1-\frac{e^{-y}}{n}\right]^n=e^{-e^{-y}}
$$
Check this in Python by running the following code,&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/17eee1aa1839d88e10b7.js&quot;&gt;&lt;/script&gt;
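The limit can likewise be checked symbolically, assuming the sympy module (again my own minimal sketch rather than the gist code):

```python
import sympy as sp

n = sp.symbols('n', positive=True)
y = sp.symbols('y', real=True)

# With a_n = log(n), the probability in question is [1 - exp(-y)/n]**n;
# its limit as n grows without bound should be exp(-exp(-y))
expr = (1 - sp.exp(-y) / n)**n
lim = sp.limit(expr, n, sp.oo)
print(lim)
```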
In conclusion, if $e^{-e^{-y}}$ is a distribution function of $Y$, then $Y_n=X_{(n)}-a_n$ converges in distribution to $Y$ for the sequence $a_n=\log n$.&lt;br/&gt;
$\hspace{12.5cm}\blacksquare$
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;a href=&quot;http://www.amazon.com/Statistical-Inference-George-Casella/dp/0534243126&quot; target=&quot;_blank&quot;&gt;Casella, G. and Berger, R.L. (2001). &lt;i&gt;Statistical Inference&lt;/i&gt;. Thomson Learning, Inc.&lt;/a&gt; 
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/6122766998085853717/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/03/probability-theory-convergence-in.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/6122766998085853717'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/6122766998085853717'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/03/probability-theory-convergence-in.html' title='Probability Theory: Convergence in Distribution Problem'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-48310061215213779</id><published>2015-02-26T22:34:00.001+08:00</published><updated>2015-02-27T19:58:59.090+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Infographic"/><category scheme="http://www.blogger.com/atom/ns#" term="R"/><title type='text'>R: How to Layout and Design an Infographic</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
As promised from my &lt;a href=&quot;http://alstatr.blogspot.com/2015/02/philippine-infographic-recapitulation.html&quot; target=&quot;_blank&quot;&gt;recent article&lt;/a&gt;, here&#39;s my tutorial on how to layout and design an infographic in R. This article will serve as a template for more infographic design that I plan to share on future posts. Hence, we will go through the following sections:

&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Layout - mainly handled by the &lt;a href=&quot;http://cran.r-project.org/web/packages/grid/index.html&quot; target=&quot;_blank&quot;&gt;grid&lt;/a&gt; package.&lt;/li&gt;
&lt;li&gt;Design - style of the elements in the layout.
&lt;ul&gt;
&lt;li&gt;Texts - use the &lt;a href=&quot;http://cran.r-project.org/web/packages/extrafont/index.html&quot; target=&quot;_blank&quot;&gt;extrafont&lt;/a&gt; package for custom fonts;&lt;/li&gt;
&lt;li&gt;Shapes (lines and point characters) - use &lt;a href=&quot;http://cran.r-project.org/web/packages/grid/index.html&quot; target=&quot;_blank&quot;&gt;grid&lt;/a&gt;. Although this package no longer appears on CRAN (as of February 26, 2015; only the archived source remains), grid is in fact a base package included with R by default, so there is usually no need to install it. Check whether it is already available before installing anything.&lt;/li&gt;
&lt;li&gt;Plots - several choices for plotting data in R: base graphics, the &lt;a href=&quot;http://cran.r-project.org/web/packages/lattice/index.html&quot; target=&quot;_blank&quot;&gt;lattice&lt;/a&gt; package, or the &lt;a href=&quot;http://cran.r-project.org/web/packages/ggplot2/index.html&quot; target=&quot;_blank&quot;&gt;ggplot2&lt;/a&gt; package.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
The Infographic&lt;/h3&gt;
We aim to obtain the following layout and design in the final output of our code:
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtxW-CXybvA3BcxzObxK9mqlqRJh9-RuQ6dtAMR6u72Z61b-ZrCHMqOVSLeQhvDt8s3kTSngiLplRC2EMzsJj5yOPW36rxesXylN0RlhXzqtdb_dJArP09KW4OA0v24LRqCKMPhmZ4rUcX/s1600/Infographics.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtxW-CXybvA3BcxzObxK9mqlqRJh9-RuQ6dtAMR6u72Z61b-ZrCHMqOVSLeQhvDt8s3kTSngiLplRC2EMzsJj5yOPW36rxesXylN0RlhXzqtdb_dJArP09KW4OA0v24LRqCKMPhmZ4rUcX/s1600/Infographics.png&quot; height=&quot;640&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
To start with, we need to set up our data first. For illustration purposes, we will use simulated data:
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/0af7bfceae62e5c37115.js&quot;&gt;&lt;/script&gt;
&lt;h3&gt;Design: Colour&lt;/h3&gt;
The aesthetic of an infographic depends not only on the shapes and plots, but also on the colours. So if you are not an artist, I suggest looking first at a list of sample infographics to get some inspiration. Once you have found the theme for your chart, grab its colours. To grab a colour, use the eyedropper tool from software such as Photoshop, Affinity Designer, etc. There is also a free add-on for Mozilla Firefox called &lt;a href=&quot;http://www.colorzilla.com/firefox/&quot; target=&quot;_blank&quot;&gt;ColorZilla&lt;/a&gt;; I haven&#39;t tried it, but you could explore it. For the above theme, there are five colours with the following hexadecimal colour codes: &lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Colour Name&lt;/th&gt;&lt;th&gt;Hexadecimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 1: Colours Used in the Chart.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Dark Violet&lt;/td&gt;&lt;td&gt;&lt;code&gt;#552683&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&quot;alt&quot;&gt;&lt;td&gt;Dark Yellow&lt;/td&gt;&lt;td&gt;&lt;code&gt;#E7A922&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;White&lt;/td&gt;&lt;td&gt;&lt;code&gt;#FFFFFF&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr class=&quot;alt&quot;&gt;&lt;td&gt;Gray (Infographic Text)&lt;/td&gt;&lt;td&gt;&lt;code&gt;#A9A8A7&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Dark Yellow (Crime Text)&lt;/td&gt;&lt;td&gt;&lt;code&gt;#CA8B01&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;br /&gt;
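For convenience in the code that follows, these colours can be collected in a named vector. This is a minimal sketch: the object and entry names here are mine, while the hex codes are the ones listed in Table 1.

```r
# A sketch: the vector and entry names are hypothetical;
# the hex codes are taken from Table 1 above
kobe_colours <- c(
  dark_violet = "#552683",
  dark_yellow = "#E7A922",
  white       = "#FFFFFF",
  gray_text   = "#A9A8A7",
  crime_text  = "#CA8B01"
)
```

Referring to, say, `kobe_colours["dark_violet"]` is then less error-prone than retyping the hex strings throughout the script.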
&lt;h3&gt;
Design: Data Visualization&lt;/h3&gt;
At this point, we&#39;ll prepare the elements in the layout, beginning with the plots. Below is the bar plot of &lt;code&gt;y1&lt;/code&gt; in the data frame &lt;code&gt;dat&lt;/code&gt;, in three groupings, &lt;code&gt;grp&lt;/code&gt;. Note that the plot you&#39;ll obtain will not be the same as the one below, since the data change every time we run the simulation above.
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3603SHYC8RBWqdffrgJuY4mqA1NHDfwCKuekuyZ0Q9VBNi72DaesLYeoN8vCkja4XmIttrCxOXlmdOaYXtjTDzwdTetpbiqz6Sxvt6AODp_jfsf_Cqz5JKNQxyDCTkAXR5ww-YM9m-cNh/s1600/Rplot08.png&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/7f1369f458163f8be96b.js&quot;&gt;&lt;/script&gt;
So that&#39;s the default theme of ggplot2, and we want to customize it using the &lt;code&gt;theme&lt;/code&gt; function. One of the elements in the plot that will be tweaked is the font. To deal with this, we need to import the fonts using the extrafont package. That is,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/650a227c7871f40ecb2e.js&quot;&gt;&lt;/script&gt;
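In case the gist above doesn't load, the import step is roughly the following. This is a sketch; the exact calls may vary with your extrafont version and platform.

```r
library(extrafont)

font_import()              # scans and registers all fonts installed on the machine
loadfonts(device = "win")  # on Windows, make the fonts available to the graphics device
```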
What happens above is that all fonts installed on your machine are imported. It&#39;s better to import all of them so that we&#39;ll have several choices to play with. For the above infographic, the font used is called &lt;a href=&quot;http://en.wikipedia.org/wiki/Impact_(typeface)&quot; target=&quot;_blank&quot;&gt;Impact&lt;/a&gt;, which is available on Windows and, I think, on Mac as well. If you don&#39;t have it, download and install it first before running the above code. To arrive at the design of the bar plot in the infographic, we use the following theme,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/5af5422879f07f6148c6.js&quot;&gt;&lt;/script&gt;
I named it &lt;code&gt;kobe_theme&lt;/code&gt; since, if you recall from my &lt;a href=&quot;http://alstatr.blogspot.com/2015/02/philippine-infographic-recapitulation.html&quot; target=&quot;_blank&quot;&gt;previous article&lt;/a&gt;, the above chart is inspired by the &lt;a href=&quot;http://www.nba.com/lakers/multimedia/121205kobe30Kinfographic&quot; target=&quot;_blank&quot;&gt;Kobe Bryant Infographic&lt;/a&gt;. Applying this to the plot, we&#39;ll have the following,
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjTpqaK8jvjf7wRH8tt0h504pdCNMI_Vmrjv2Ho8iCPp9qAYG_4fkAXcm_ufy7xeeylE46S1LyhFfy0rocrsoRnhZ-Ko81Llfjcsi6pnQBInuaxuAYZclY0YC5rw3bpFWi8QXiLjY2_cZDI/s1600/Rplot09.png&quot; /&gt;&lt;/div&gt;
This is obtained by running &lt;code&gt;p1 + kobe_theme()&lt;/code&gt;. If you want to reorder the ticks on the x-axis, starting with A at the top and ending with L at the bottom, simply run the following,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/635ae74604e94e9e6891.js&quot;&gt;&lt;/script&gt;
And you&#39;ll have
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXAnFULQ0c6PhMHl1F70Hoy9L3xdG_xLnBgUgpeljYjEToauaygI1cRn6f_4yvWjSIrgO45JQNgbJ03725IelEDvpVM-y1LjVw_fR8Q4Rmn8EkThasLKqDDlqbn8bXYzUCWPmlOgYsw-dq/s1600/Rplot10.png&quot; /&gt;&lt;/div&gt;
So that&#39;s our first plot. Next is to plot &lt;code&gt;y2&lt;/code&gt; from the &lt;code&gt;dat&lt;/code&gt; data frame, this time using a line plot.
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwd9r86n0nKbW3CZiXESkMFKYS60_I6WZftZ2iaCIye7172Acc9_yj3BdjOaRA1un23aQvskqat_BWjF9O42qhZTqWl1bWXzlfsMfPOmjdoiQ6lb6qA3SdWwM_XOShl670EIhXFONybIgq/s1600/Rplot11.png&quot; /&gt;&lt;/div&gt;
This is obtained by running the following code:
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/02a2141db4b4c7e49d29.js&quot;&gt;&lt;/script&gt;
Applying &lt;code&gt;kobe_theme&lt;/code&gt; will give us
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDcLFbp87cZtQkMwELEVqYVSol7pVIHpUA5RHQqHZ9Qi9aLXsWaBpKjKLDBIUEecC2Dmitlfjg9W6bWcCyxue76BmROBa5FClRZ34-S8hQwKivt1sNLHc4f_HaSWCS5whHCwZM5z9VD_16/s1600/Rplot12.png&quot; /&gt;&lt;/div&gt;
The above plot is generated by running &lt;code&gt;p2 + kobe_theme()&lt;/code&gt;. We should expect this, since &lt;code&gt;kobe_theme&lt;/code&gt; was designed for the bar plot with the &lt;code&gt;coord_flip&lt;/code&gt; option enabled, which affects the orientation of the grid lines. So instead, we do a little tweak on the current theme; see for yourself the difference:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/91b98a7834f244ba6c47.js&quot;&gt;&lt;/script&gt;
So that we have the following result:
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhsFqLS5o8X78PmdA0OkKfO81zYyGzC_sxmgUe3iLXybmEAKG6znEBxEKSHlMHndpjSfd_hD_ja3Ype9LF9cfFVM6_OWuR_W0YWoYXLIvVuGmwwIJogRmZew_2icKYvgf9KZ8Tu5w1MvlWN/s1600/Rplot13.png&quot; /&gt;&lt;/div&gt;
Now that&#39;s better. One more issue is the title label for the legend. To change the label, run the following code:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/8991af247f658ac346c0.js&quot;&gt;&lt;/script&gt;
And that will give us
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNdepKcVMzC7FISTve7ccS92WzYuSCdXPH-1dXS09SRedamfDoXzZIHc1gqzzYjdkrVcAaR6FEa-OR12zdraoxgySg3k4VzZ2gBB6GQWke-JDv50irNSAPZkXYp9SpD7Ngjo3rcQIKQVlw/s1600/Rplot14.png&quot; /&gt;&lt;/div&gt;
Finally, the &lt;code&gt;y3&lt;/code&gt; variable is plotted using the following code:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/dc8ca2604cc8895fcd95.js&quot;&gt;&lt;/script&gt;
By default, we have
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4O30KgZg0y8U98nyMN9VajpRo56lbj0Hq6uzTWE7y3mVXWsDmTrTBpidAuN4HrM74QGxYrjLNvRTAB3QrRz1sP2ukB0vwtykJImZvHtiIWVyZ2RlIWfWBU0_N2DlIGzZbB3llQ5EVdllg/s1600/Rplot19.png&quot; /&gt;&lt;/div&gt;
Applying &lt;code&gt;kobe_theme2()&lt;/code&gt;,
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfn615hcPRVQyfelTea4uUPQadpc2igjjWddWi9ZsjJPITG9-FFE2IgCjo88z0EoG3CiTbHi3_y32RhhE8w_SgrDwr5gigkX6_qi0M1VxpJsiGGSqnIJhM-d05ZNZQxr1WWdg7RN28gqzD/s1600/Rplot20.png&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/b715c391f075ac5d396f.js&quot;&gt;&lt;/script&gt;
&lt;h3&gt;
Layout&lt;/h3&gt;
All plots are now set; next is to place them in the layout. The following steps explain the procedure:
&lt;ol&gt;
&lt;li&gt;Start by creating a new grid page with &lt;code&gt;grid.newpage()&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;Next, define the layout of the grid. Think of this as a matrix of plots, where a 2 by 2 matrix plot gives us 4 windows (two rows and two columns). These windows will serve as placeholders for the plots. So to achieve a matrix plot with 4 rows and 3 columns, we run&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/887cb68f25adc25acec0.js&quot;&gt;&lt;/script&gt;
&lt;/li&gt;
&lt;li&gt;Next is the background colour; this will be the background colour of the infographic. For the given chart, we run the following:&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/47ae8fe3e88f9222b0ec.js&quot;&gt;&lt;/script&gt;
&lt;/li&gt;
&lt;li&gt;Next is to insert texts into the layout using the &lt;code&gt;grid.text&lt;/code&gt; function. The position of objects/elements such as texts in the grid is defined by (x, y) coordinates. By default the grid is bounded by the unit square (although the aspect ratio can be modified), so the support of x and y is $[0,1]^2$;&lt;/li&gt;
&lt;li&gt;To insert a plot into a specific window of the matrix plot, use the &lt;code&gt;vplayout&lt;/code&gt; function for the coordinates of the placeholder, and &lt;code&gt;print&lt;/code&gt; for pasting. Say we want to insert the first plot in the first row, second column; we code it this way
&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/4a9917a11facd5d34cb1.js&quot;&gt;&lt;/script&gt;
Now, to place it in the first row stretched over all (three) columns, run
&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/9646f44df3cc7029214a.js&quot;&gt;&lt;/script&gt;
&lt;/li&gt;
&lt;/ol&gt;
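Putting the steps above together, a minimal skeleton might look like the following. This is a sketch, not the full infographic code: &lt;code&gt;vplayout&lt;/code&gt; is the usual helper from grid tutorials, the title string is a placeholder, and &lt;code&gt;p1&lt;/code&gt;, &lt;code&gt;p2&lt;/code&gt; stand for the plots built earlier.

```r
library(grid)

# helper: a viewport at row x, column y of the pushed layout
vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y)

grid.newpage()
pushViewport(viewport(layout = grid.layout(4, 3)))       # 4 rows, 3 columns
grid.rect(gp = gpar(fill = "#552683", col = "#552683"))  # background colour
grid.text("MY TITLE", y = 0.92, gp = gpar(col = "#E7A922", cex = 3))
print(p1, vp = vplayout(1, 2))    # first row, second column
print(p2, vp = vplayout(2, 1:3))  # second row, stretched over all three columns
```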
Using the above procedure, we have the following code for the infographic. Enjoy!
&lt;br /&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/6a38f2fd4927d1d892c7.js&quot;&gt;&lt;/script&gt;
&lt;h3&gt;
PNG Output&lt;/h3&gt;&lt;br/&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvUzWRhXAAfprBgXO46dxhwDb5alkPfwLu-vZ7jhI3ntoLDeHpcz-zuSvFdbdyBZb52d8z-pLj8x48Y-YTgY7FSBXQbY_ZI5bScFddf7BqPFIR2f0eMMBXobDxpngTHYqOp-YNEpPnaQkd/s1600/Infographics1.png&quot; height=&quot;640&quot; width=&quot;320&quot; /&gt;&lt;/div&gt;&lt;br/&gt;
&lt;h3&gt;
PDF Output&lt;/h3&gt;&lt;br/&gt;
&lt;iframe class=&quot;scribd_iframe_embed&quot; data-aspect-ratio=&quot;undefined&quot; data-auto-height=&quot;false&quot; frameborder=&quot;0&quot; height=&quot;600&quot; id=&quot;doc_45465&quot; scrolling=&quot;no&quot; src=&quot;https://www.scribd.com/embeds/256799197/content?start_page=1&amp;amp;view_mode=scroll&amp;amp;show_recommendations=true&quot; width=&quot;100%&quot;&gt;&lt;/iframe&gt;&lt;br/&gt;&lt;br/&gt;
&lt;h3&gt;
Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;http://docs.ggplot2.org/current/&quot; target = &quot;_blank&quot;&gt;ggplot2&lt;/a&gt; Documentation.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.cookbook-r.com/Graphs/&quot; target = &quot;_blank&quot;&gt;Cookbook for R&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/48310061215213779/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/02/r-how-to-layout-and-design-infographic.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/48310061215213779'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/48310061215213779'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/02/r-how-to-layout-and-design-infographic.html' title='R: How to Layout and Design an Infographic'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjtxW-CXybvA3BcxzObxK9mqlqRJh9-RuQ6dtAMR6u72Z61b-ZrCHMqOVSLeQhvDt8s3kTSngiLplRC2EMzsJj5yOPW36rxesXylN0RlhXzqtdb_dJArP09KW4OA0v24LRqCKMPhmZ4rUcX/s72-c/Infographics.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-1174353664769461422</id><published>2015-02-18T23:54:00.000+08:00</published><updated>2015-02-25T18:02:43.111+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Infographic"/><category scheme="http://www.blogger.com/atom/ns#" term="R"/><title type='text'>Philippine Infographic: Recapitulation on Incidents Involving Motorcycle Riding in Tandem Criminals for 2011-2013</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
The Philippine government launched &lt;a href=&quot;http://data.gov.ph/&quot; target=&quot;_blank&quot;&gt;Open Data Philippines&lt;/a&gt; (data.gov.ph) last year, on January 16, 2014. Accordingly, data.gov.ph aims to make national government data searchable, accessible, and useful, with the help of the different agencies of government and with the participation of the public.&amp;nbsp;This website consolidates the data sets of different government agencies, allowing users to find specific information from a rich and continuously growing collection of public data sets.&lt;br /&gt;
&lt;br /&gt;
Data.gov.ph provides information on how to access these datasets and tools, such as infographics and other applications, to make the information easy to understand. Users may not only view the datasets, but also share and download them as spreadsheets and in other formats, for their own use.&lt;br /&gt;
&lt;br /&gt;
The primary goal of data.gov.ph is to foster a citizenry empowered to make informed decisions, and to promote efficiency and transparency in government. For more, check out the video:&lt;br /&gt;
&lt;br /&gt;
&lt;center&gt;
&lt;iframe allowfullscreen=&quot;&quot; frameborder=&quot;0&quot; height=&quot;281&quot; mozallowfullscreen=&quot;&quot; src=&quot;//player.vimeo.com/video/84140522&quot; webkitallowfullscreen=&quot;&quot; width=&quot;500&quot;&gt;&lt;/iframe&gt;
&lt;/center&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;&lt;br /&gt;
Admittedly, I only discovered this accidentally a few weeks ago, but it&#39;s still good news for me. I&#39;ve been frustrated about our government data since college; it was difficult to do case studies and research on crime, rainfall, and other interesting variables due to the lack of data available online. With the launch of Open Data Philippines, and believing that data can improve our country, it&#39;s a win-win for me. So, as a first exploration of it, I decided to use the data from the &lt;a href=&quot;http://pnp.gov.ph/portal/&quot; target=&quot;_blank&quot;&gt;Philippine National Police (PNP)&lt;/a&gt; agency about the incidents involving motorcycle riding in tandem criminals. Check out my first infographic below,&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
The Infographic (PNG)&lt;/h3&gt;&lt;br/&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxi-n6FzNc8loi4gE2XFefkJwKviDPDLE1mHZPWQx10M5UBnkAVThkG_bUs5mYHCpEzRD-cl8-NqKUsRaAvqvuj6F98PNXixf7u338CIVWmpL_ZU14cDth3ngjCgsRJRfqh86Gz1uV7iSp/s1600/Infographics.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxi-n6FzNc8loi4gE2XFefkJwKviDPDLE1mHZPWQx10M5UBnkAVThkG_bUs5mYHCpEzRD-cl8-NqKUsRaAvqvuj6F98PNXixf7u338CIVWmpL_ZU14cDth3ngjCgsRJRfqh86Gz1uV7iSp/s1600/Infographics.png&quot; height=&quot;320&quot; width=&quot;160&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;h3&gt;
PDF Version&lt;/h3&gt;
&lt;br /&gt;
&lt;iframe class=&quot;scribd_iframe_embed&quot; data-aspect-ratio=&quot;undefined&quot; data-auto-height=&quot;false&quot; frameborder=&quot;0&quot; height=&quot;600&quot; id=&quot;doc_2732&quot; scrolling=&quot;no&quot; src=&quot;https://www.scribd.com/embeds/256148060/content?start_page=1&amp;amp;view_mode=scroll&amp;amp;show_recommendations=true&quot; width=&quot;100%&quot;&gt;&lt;/iframe&gt;

&lt;br /&gt;
&lt;h3&gt;
What software did I use for creating this infographic?&lt;/h3&gt;
Well, I designed it entirely using &lt;a href=&quot;http://alstatr.blogspot.com/search/label/R&quot; target=&quot;_blank&quot;&gt;R&lt;/a&gt;, with the help of the &lt;a href=&quot;http://cran.r-project.org/web/packages/ggplot2/index.html&quot; target=&quot;_blank&quot;&gt;ggplot2&lt;/a&gt;, &lt;a href=&quot;http://cran.r-project.org/web/packages/grid/index.html&quot; target=&quot;_blank&quot;&gt;grid&lt;/a&gt;, and &lt;a href=&quot;http://cran.r-project.org/package=extrafont&quot; target=&quot;_blank&quot;&gt;extrafont&lt;/a&gt; packages. The above infographic is inspired by the &lt;a href=&quot;http://www.nba.com/lakers/multimedia/121205kobe30Kinfographic&quot; target=&quot;_blank&quot;&gt;Kobe Bryant Infographic&lt;/a&gt;. The hexadecimal colour codes from the said chart were extracted using the eyedropper tool of &lt;a href=&quot;https://affinity.serif.com/en-gb/&quot; target=&quot;_blank&quot;&gt;Affinity Designer&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
I will not share any code in this post, but I will do a tutorial on how to create one, so be notified by subscribing.
&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/1174353664769461422/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/02/philippine-infographic-recapitulation.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/1174353664769461422'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/1174353664769461422'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/02/philippine-infographic-recapitulation.html' title='Philippine Infographic: Recapitulation on Incidents Involving Motorcycle Riding in Tandem Criminals for 2011-2013'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxi-n6FzNc8loi4gE2XFefkJwKviDPDLE1mHZPWQx10M5UBnkAVThkG_bUs5mYHCpEzRD-cl8-NqKUsRaAvqvuj6F98PNXixf7u338CIVWmpL_ZU14cDth3ngjCgsRJRfqh86Gz1uV7iSp/s72-c/Infographics.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-4358173786127971101</id><published>2015-02-09T17:37:00.000+08:00</published><updated>2015-02-22T20:35:05.185+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Descriptive Statistics"/><category scheme="http://www.blogger.com/atom/ns#" term="Parametric Inference"/><category scheme="http://www.blogger.com/atom/ns#" term="Python"/><title type='text'>Python: Getting Started with Data Analysis</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
&lt;a href=&quot;http://alstatr.blogspot.com&quot; target = &quot;_blank&quot;&gt;Analysis with Programming&lt;/a&gt; has recently been syndicated to &lt;a href=&quot;http://planetpython.org&quot; target = &quot;_blank&quot;&gt;Planet Python&lt;/a&gt;. As a first post as a contributing blog on the said site, I would like to share how to get started with data analysis in Python. Specifically, I would like to do the following:
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Importing the data
&lt;ul&gt;
&lt;li&gt;Importing CSV file both locally and from the web;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Data transformation;&lt;/li&gt;
&lt;li&gt;Descriptive statistics of the data;&lt;/li&gt;
&lt;li&gt;Hypothesis testing
&lt;ul&gt;
&lt;li&gt;One-sample t test;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Visualization; and&lt;/li&gt;
&lt;li&gt;Creating custom function.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
Importing the data&lt;/h3&gt;
This is the crucial step: we need to import the data in order to proceed with the succeeding analysis. Oftentimes data are in CSV format; if not, they can at least be converted to CSV. In Python we can do this using the following code:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/c460ddf86c7485a4839b.js&quot;&gt;&lt;/script&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
To read a CSV file locally, we need the &lt;code&gt;pandas&lt;/code&gt; module, which is a Python data analysis library. Its &lt;code&gt;read_csv&lt;/code&gt; function can read data both locally and from the web.&lt;br /&gt;
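As a self-contained illustration, the sketch below reads from an in-memory buffer, since the actual CSV location is specific to each reader; &lt;code&gt;read_csv&lt;/code&gt; accepts a local path or a URL in exactly the same way (the commented path and URL are hypothetical).

```python
import io
import pandas as pd

# A minimal sketch of reading CSV data with pandas.
# Column names follow the provinces used later in this post.
csv_text = "Abra,Apayao,Benguet\n1243,2934,148\n4158,9235,4287\n"
df = pd.read_csv(io.StringIO(csv_text))

# df = pd.read_csv("data.csv")                       # hypothetical local file
# df = pd.read_csv("https://example.com/data.csv")   # hypothetical URL
```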
&lt;br /&gt;
&lt;h3&gt;
Data transformation&lt;/h3&gt;
Now that we have the data in the workspace, next is to do transformations. Statisticians and scientists often do this step to remove unnecessary data not included in the analysis. Let&#39;s view the data first:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/89e95aa0504f25c3d4fd.js&quot;&gt;&lt;/script&gt;
To R programmers, the above is the equivalent of &lt;code&gt;print(head(df))&lt;/code&gt;, which prints the first six rows of the data, and &lt;code&gt;print(tail(df))&lt;/code&gt;, which prints the last six rows. In Python, however, the number of rows shown by head is 5 by default, unlike in R where it is 6. So the equivalent of the R code &lt;code&gt;head(df, n = 10)&lt;/code&gt; in Python is &lt;code&gt;df.head(n = 10)&lt;/code&gt;. The same goes for the tail of the data.&lt;br /&gt;
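The head/tail correspondence can be sketched on a toy data frame (standing in for the post's dataset):

```python
import pandas as pd

# Toy data frame with 12 rows; column names are illustrative
df = pd.DataFrame({"Abra": range(12), "Apayao": range(12)})

print(df.head())      # first 5 rows (R's head() shows 6 by default)
print(df.tail(n=10))  # last 10 rows, mirroring R's tail(df, n = 10)
```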
&lt;br /&gt;
Column and row names of the data are extracted using the &lt;code&gt;colnames&lt;/code&gt; and &lt;code&gt;rownames&lt;/code&gt; functions in R, respectively. In Python, we extract it using the &lt;code&gt;columns&lt;/code&gt; and &lt;code&gt;index&lt;/code&gt; attributes. That is,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/92a80fb9b76c1bc0e5a6.js&quot;&gt;&lt;/script&gt;
Transposing the data is obtained using the &lt;code&gt;T&lt;/code&gt; attribute,
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/c7ca9cce05d8cf41a45f.js&quot;&gt;&lt;/script&gt;
Other transformations, such as sorting, can be done using the &lt;code&gt;sort&lt;/code&gt; attribute. Now let&#39;s extract a specific column. In Python, we do it using either the &lt;code&gt;iloc&lt;/code&gt; or &lt;code&gt;ix&lt;/code&gt; attributes; &lt;code&gt;ix&lt;/code&gt; is more robust, and thus I prefer it. Assuming we want the head of the first column of the data, we have
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/839451ab69658bccdbc8.js&quot;&gt;&lt;/script&gt;
By the way, indexing in Python starts at 0, not 1. To slice the index and the first three columns of the 11th to 21st rows, run the following:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/d2dae542c7edf12c0b23.js&quot;&gt;&lt;/script&gt;
This is equivalent to &lt;code&gt;print df.ix[10:20, [&#39;Abra&#39;, &#39;Apayao&#39;, &#39;Benguet&#39;]]&lt;/code&gt;&lt;br /&gt;
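Note that &lt;code&gt;ix&lt;/code&gt; has since been removed from pandas, so in current versions the same positional slice is written with &lt;code&gt;iloc&lt;/code&gt;. A sketch on a toy data frame (the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Abra": range(30), "Apayao": range(30), "Benguet": range(30)})

# Positional slicing: rows 10..20 (the 11th to 21st rows) and the first
# three columns. iloc's end bound is exclusive, so 10:21 yields 11 rows.
subset = df.iloc[10:21, 0:3]
```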
&lt;br /&gt;
To drop columns in the data, say columns 1 (Apayao) and 2 (Benguet), use the &lt;code&gt;drop&lt;/code&gt; attribute. That is,
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/9bd963e1ff5637b9e693.js&quot;&gt;&lt;/script&gt;
The &lt;code&gt;axis&lt;/code&gt; argument above tells the function to drop with respect to columns; if &lt;code&gt;axis = 0&lt;/code&gt;, the function drops with respect to rows instead.&lt;br/&gt;
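A sketch of the drop step on a toy data frame (column names as in the post's dataset):

```python
import pandas as pd

df = pd.DataFrame({"Abra": [1, 2], "Apayao": [3, 4], "Benguet": [5, 6]})

# axis=1 drops columns; axis=0 would drop row labels instead.
# drop returns a new frame and leaves df itself unchanged.
dropped = df.drop(["Apayao", "Benguet"], axis=1)
```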
&lt;br /&gt;
&lt;h3&gt;
Descriptive Statistics&lt;/h3&gt;
The next step is to do descriptive statistics for a preliminary analysis of our data, using the &lt;code&gt;describe&lt;/code&gt; attribute:
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/a3c6f2aa6e1af7179270.js&quot;&gt;&lt;/script&gt;
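A sketch of what &lt;code&gt;describe&lt;/code&gt; computes, on toy data rather than the post's dataset:

```python
import pandas as pd

df = pd.DataFrame({"Abra": [1243, 4158, 1787], "Apayao": [2934, 9235, 1922]})

# Per column: count, mean, std, min, quartiles, and max
summary = df.describe()
```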
&lt;h3&gt;
Hypothesis Testing&lt;/h3&gt;
Python has a great package for statistical inference: the &lt;a href=&quot;http://docs.scipy.org/doc/scipy/reference/stats.html&quot; target = &quot;_blank&quot;&gt;stats&lt;/a&gt; module of scipy. The one-sample t-test is implemented in the &lt;code&gt;ttest_1samp&lt;/code&gt; function. So, if we want to test the mean of Abra&#39;s volume of palay production against a hypothesized population mean of 15000, we have
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/e84461f482a3a778e8fe.js&quot;&gt;&lt;/script&gt;
The values returned are a tuple of the following:
&lt;ul&gt;
&lt;li&gt;t : float or array&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;t-statistic&lt;/li&gt;
&lt;li&gt;prob : float or array&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;two-tailed p-value&lt;/li&gt;
&lt;/ul&gt;
From the above numerical output, we see that the p-value = 0.2627 is greater than $\alpha=0.05$; hence there is not sufficient evidence to conclude that the average volume of palay production differs from 15000. Applying this test to all variables, again against a population mean of 15000, we have&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/2ac5ae476629eb5a21e5.js&quot;&gt;&lt;/script&gt;
The first array returned is the t-statistic of the data, and the second array is the corresponding p-values.&lt;br/&gt;&lt;br/&gt;
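The test call in the gists above can be sketched on simulated data (not the post's palay dataset, so the resulting statistic and p-value will differ):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=15000, scale=2000, size=50)  # simulated production volumes

# H0: the population mean equals 15000 (two-tailed test)
t_stat, p_value = stats.ttest_1samp(sample, 15000)
```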
&lt;h3&gt;
Visualization&lt;/h3&gt;
There are several modules for visualization in Python, and the most popular one is the matplotlib library. To mention a few more, we also have the bokeh and seaborn modules to choose from. In my previous &lt;a href=&quot;http://alstatr.blogspot.com/2014/03/python-numerical-description-of-data.html&quot; target = &quot;_blank&quot;&gt;post&lt;/a&gt;, I demonstrated the matplotlib package, which produced the following box-whisker plot,
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQmWlqmaVBi8s9mQvecVlgaPOxew07t80JFN7KDj9OgeMdVpWBjKzh-5lrnm5kC3NyAzvAVdwj3OiuftChJtOfoYi02_OGKCKdvHCtfkuiBmRjez-6iYJuO3o-nuZsPbjH8CrIQ1C1VPSY/s1600/myfigmat.png&quot; height=&quot;395&quot; width=&quot;400&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/ea3780959ac7b8a92a37.js&quot;&gt;&lt;/script&gt;
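A runnable approximation of the box-whisker step, on toy data with a headless backend so it runs anywhere; the commented line uses matplotlib's built-in &quot;ggplot&quot; style, a modern alternative for the ggplot-like restyling discussed next:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Toy data standing in for the production dataset
df = pd.DataFrame({"Abra": [1243, 4158, 1787], "Apayao": [2934, 9235, 1922]})

# plt.style.use("ggplot")  # uncomment for a ggplot2-like theme
df.boxplot()               # one box-whisker per column
plt.savefig("boxwhisker.png")
```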
Now, plotting via the pandas module can restyle the above plot after the theme of the popular R plotting package, &lt;a href=&quot;http://docs.ggplot2.org/current/index.html&quot; target = &quot;_blank&quot;&gt;ggplot&lt;/a&gt;. To use the ggplot theme, just add one more line to the above code,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/747394fc4e9ebafebab2.js&quot;&gt;&lt;/script&gt;
And you&#39;ll have the following,
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3iOubBXYXmfG_em5dX5QQvHf0W0CaoAi4CXqnuSo1yHxyKaWozv6E3qxFWJ5pAMPIrszrkYqgHx0nHdS-OmEnLEkhdwzgocuwCbN3zXddndOscSx1KYOHebePOpFb9P2FOKdlZTG7wuMb/s1600/myfigmatg.png&quot; height=&quot;397&quot; width=&quot;400&quot; /&gt;&lt;/div&gt;
Even neater than the default matplotlib.pyplot theme. But in this post, I would like to introduce the seaborn module, which is a statistical data visualization library. With it, we have the following
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgAsKNQ7xRccTGUraDNmQ7Q9cYbPe5jKb-nC-73hiajaiEaDpXUEnTpUcGMdH6m7XOvJcmYckUnANyreQ4sMKdymuX5pb1fX77orN6k3y9cwiTFAKs5Wl6C1v3OLBzAdttJEj6ZorqFPxCu/s1600/myfig3.png&quot; height=&quot;397&quot; width=&quot;400&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/57aeb24456ff6ebb9250.js&quot;&gt;&lt;/script&gt;
Sexy boxplot; scroll down for more.
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3LfhJqVKyx_A4PG8v_PVNbwbmjNQQ-UL1OPXLUUHzNxm1PPSea-IIoP0Wd5uYEGMK3zVKrkaxfTgIo-fdaELriPJbhB_TlSSu8UMGJr1-ucFY2UHQYp33MYphMpY3aiJaPuacn9vLnF3E/s1600/myfig4.png&quot; height=&quot;392&quot; width=&quot;400&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/01c1edd8cb6a8952e2c1.js&quot;&gt;&lt;/script&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPYC_aupHvC_dLAgmpN4iDOtVPMxzLhe54fwVr2Qz1Tbzlsogc_ngzv3qsuyUczKM-uIVsp8pTOroR4NklHq8-1oqQFJgTKtIwhxHKJJwvaoDFYGxQOj7iClEBAdD5Ql0i6GeUpx3HFkzx/s1600/myfig1.png&quot; height=&quot;390&quot; width=&quot;400&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/1c23218afc7f57307918.js&quot;&gt;&lt;/script&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhc-9mp89aJxBNV6l6h-XTCn2RwdxP4WRaGAk8mVefZV4jNvWeBlY-ppgXC-g1plhMxvLRAyfkj-DN0BiMinpYS1bRd3fP6CG-VqnuAy99t7FTkxkdGZm2GdC4oPseXFObYfVR8jGr3mhdp/s1600/myfig2.png&quot; height=&quot;387&quot; width=&quot;400&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/33bc4428309b2c7e392c.js&quot;&gt;&lt;/script&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjraDN3b1_XQbbDDTqDk1-sRWw0CZbWtGonmt3I7xawCbls05_Wgr5OeWF1OWgb9uANV5JnMJRfiyODatxFc7nLjQIt5sj6qe6wiIvAz6EDQHDTgXqaSDigGORYfY3JZSPOzBV6ApT5mj0N/s1600/myfig5.png&quot; height=&quot;400&quot; width=&quot;392&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/65832727c54f560413d2.js&quot;&gt;&lt;/script&gt;
&lt;h3&gt;Creating custom function&lt;/h3&gt;
To define a custom function in Python, we use the &lt;code&gt;def&lt;/code&gt; keyword. For example, say we want a function that adds
two numbers; we define it as follows,&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/ba50893788cfc3a4cb0f.js&quot;&gt;&lt;/script&gt;
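In case the gist does not load, a minimal version of such a function is (the name &lt;code&gt;add_numbers&lt;/code&gt; is mine):

```python
def add_numbers(a, b):
    """Return the sum of two numbers."""
    return a + b

print(add_numbers(2, 3))  # 5
```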
By the way, in Python indentation is important: the indented block defines the scope of the function, which in R we do with braces &lt;code&gt;{...}&lt;/code&gt;. Now here&#39;s an algorithm from my previous &lt;a href=&quot;http://alstatr.blogspot.com/2014/01/python-and-r-is-python-really-faster.html&quot; target = &quot;_blank&quot;&gt;post&lt;/a&gt;,
&lt;ol style=&quot;text-align: left;&quot;&gt;
&lt;li&gt;Generate samples of size 10 from Normal distribution with $\mu$ = 3 and $\sigma^2$ = 5;&lt;/li&gt;
&lt;li&gt;Compute the $\bar{x}$ and $\bar{x}\mp z_{\alpha/2}\displaystyle\frac{\sigma}{\sqrt{n}}$ using the 95% confidence level;&lt;/li&gt;
&lt;li&gt;Repeat the process 100 times; then&lt;/li&gt;
&lt;li&gt;Compute the percentage of the confidence intervals containing the true mean.&lt;/li&gt;
&lt;/ol&gt;
Coding this in Python we have,
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/8725353.js&quot;&gt;&lt;/script&gt;
The above code might be easy to read, but it&#39;s slow at replication. Below is an improved version of the code, thanks to Python gurus; see the &lt;a href=&quot;http://alstatr.blogspot.com/2014/01/python-and-r-is-python-really-faster.html#disqus_thread&quot; target = &quot;_blank&quot;&gt;comments&lt;/a&gt; on my previous post.&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/8748774.js&quot;&gt;&lt;/script&gt;
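In case the gists do not load, here is a self-contained sketch of the four steps using only the Python standard library (the function name and seed are mine):

```python
import random
import statistics

def ci_coverage(reps=100, n=10, mu=3.0, sigma2=5.0, seed=42):
    """Fraction of z-based 95% confidence intervals containing the true mean."""
    random.seed(seed)
    sigma = sigma2 ** 0.5
    z = 1.959964                      # z_{alpha/2} for a 95% confidence level
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        # Step 1: sample of size n from Normal(mu, sigma^2)
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        # Step 2: check whether the interval xbar -/+ margin contains mu
        xbar = statistics.mean(sample)
        if xbar - margin < mu < xbar + margin:
            hits += 1
    # Steps 3-4: repeat reps times and report the coverage percentage
    return hits / reps

print(ci_coverage())  # roughly 0.95
```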
&lt;h3&gt;Update&lt;/h3&gt;
For those who are interested in the IPython notebook version of this article, please click &lt;a href=&quot;http://nuttenscl.be/Python_Getting_Started_with_Data_Analysis.html&quot; target = &quot;_blank&quot;&gt;here&lt;/a&gt;. This article was converted to an IPython notebook by &lt;a href=&quot;https://twitter.com/NuttensC&quot; target=&quot;_blank&quot;&gt;Nuttens Claude&lt;/a&gt;.&lt;br/&gt;&lt;br/&gt;
&lt;h3&gt;
Data Source&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://countrystat.bas.gov.ph/&quot; target=&quot;_blank&quot;&gt;Philippine Bureau of Agricultural Statistics&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;http://pandas.pydata.org/pandas-docs/stable/&quot; target = &quot;_blank&quot;&gt;Pandas&lt;/a&gt;, &lt;a href=&quot;http://docs.scipy.org/doc/&quot; target = &quot;_blank&quot;&gt;Scipy&lt;/a&gt;, and &lt;a href=&quot;http://stanford.edu/~mwaskom/software/seaborn/&quot; target = &quot;_blank&quot;&gt;Seaborn&lt;/a&gt; Documentations.&lt;/li&gt;
&lt;li&gt;Wes McKinney &amp; PyData Development Team (2014). &lt;i&gt;pandas: powerful Python data analysis toolkit&lt;/i&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/4358173786127971101/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/02/python-getting-started-with-data.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/4358173786127971101'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/4358173786127971101'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/02/python-getting-started-with-data.html' title='Python: Getting Started with Data Analysis'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQmWlqmaVBi8s9mQvecVlgaPOxew07t80JFN7KDj9OgeMdVpWBjKzh-5lrnm5kC3NyAzvAVdwj3OiuftChJtOfoYi02_OGKCKdvHCtfkuiBmRjez-6iYJuO3o-nuZsPbjH8CrIQ1C1VPSY/s72-c/myfigmat.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-928073872476658308</id><published>2015-01-30T21:46:00.000+08:00</published><updated>2015-01-31T12:16:03.160+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="LaTeX"/><category scheme="http://www.blogger.com/atom/ns#" term="Probability Theory"/><title type='text'>Multiple Random Variables Problems</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
To probability lovers, I just want to share (and discuss) a few simple problems I solved in Chapter 4 of &lt;a href=&quot;http://www.amazon.com/Statistical-Inference-George-Casella/dp/0534243126&quot; target=&quot;_blank&quot;&gt;Casella, G. and Berger, R.L. (2002). Statistical Inference&lt;/a&gt;.
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;
A random point $(X,Y)$ is distributed uniformly on the square with vertices $(1, 1),(1,-1),(-1,1),$ and $(-1,-1)$. That is, the joint pdf is $f(x,y)=\frac{1}{4}$ on the square. Determine the probabilities of the following events.

&lt;ol type=&quot;a&quot;&gt;
&lt;li&gt;$X^2 + Y^2 &amp;lt; 1$&lt;/li&gt;
&lt;li&gt;$2X-Y&amp;gt;0$&lt;/li&gt;
&lt;li&gt;$|X+Y|&lt;1$ (modified since the original $|X+Y|&lt;2$ is trivial.)&lt;/li&gt;
&lt;/ol&gt;
&lt;i&gt;Solutions:&lt;/i&gt;

&lt;ol type=&quot;a&quot;&gt;
&lt;li&gt;$X^2 + Y^2 &amp;lt; 1$&lt;br /&gt;
We first need to consider the boundary of this inequality within the square, so below is the plot of $X^2 + Y^2 = 1$,
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjazWu4wwnTB929g3g7G3sreEGlCO2DHPI_ROr8mXhyphenhyphenxE8V3y6M9UebdAsQD31Jxf2y0stbHfxEQLPjtgotoLSkNFd-XjdgQ0ULFsRGGYQEC1kUbu8wkWFYWSzLtnzOBTFup6GbDwf6Fn6I/s1600/Screenshot+from+2014-10-29+23:25:05.png&quot; /&gt;&lt;/div&gt;
&lt;center&gt;
&lt;input onclick=&quot;window.open(&#39;https://gist.github.com/alstat/7d60a55d60fba0671f29&#39;, &#39;_blank&#39;)&quot; type=&quot;button&quot; value=&quot;LaTeX Code&quot; /&gt;&lt;/center&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
&lt;br /&gt;
Hence, we are interested in the region inside the circle above, since $X^2 + Y^2$ is less than 1. To compute its probability, notice that the regions in the 4 quadrants are identical except for orientation; thus we can integrate over the first quadrant and simply multiply by 4 to cover the whole region.
\begin{equation}\nonumber
\begin{aligned}
P(X^2 + Y^2 &amp;lt; 1) &amp;amp;= 4\int_{0}^{1}\int_{0}^{\sqrt{1 - x^2}}\frac{1}{4}\operatorname{d}y\operatorname{d}x\\
&amp;amp;= \int_{0}^{1}y\Bigg|_{y=0}^{y=\sqrt{1 - x^2}}\operatorname{d}x = \int_{0}^{1}\sqrt{1 - x^2}\operatorname{d}x\\
&amp;amp;=\left(\frac{x}{2} \sqrt{- x^{2} + 1} + \frac{1}{2} \operatorname{sin}^{-1}{\left (x \right )}\right)\Bigg|_{x=0}^{x=1}\\
&amp;amp;=\frac{\pi}{4}-0=\frac{\pi}{4}.
\end{aligned}
\end{equation}
Confirm this using python symbolic computation,&lt;br /&gt;&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/2d8afd2b01520e4d3eec.js&quot;&gt;&lt;/script&gt;
&lt;/li&gt;
&lt;li&gt;Given $2X-Y&amp;gt;0$, we have&lt;br /&gt;
\begin{equation}\nonumber
P(2X-Y&amp;gt;0)=P(-Y&gt;-2X) = P(Y&lt;2X)
\end{equation}
The plot of $Y=2X$ is shown below,
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxkA3398o3emzT12YzXlimxVxLhRSPzJymDkgtYJJ6VHwmY0CarA_dNbCBy2zx3jW-nm6D_BKsph8L_ONXos9IpKUjNYjcRehuey9KB9opW7AB_AAoNlraszohsKYGWl2cnB4gLB6_1ONI/s1600/Screenshot+from+2014-10-30+12:04:08.png&quot; /&gt;&lt;/div&gt;
&lt;center&gt;
&lt;input onclick=&quot;window.open(&#39;https://gist.github.com/alstat/f98409701254f446d145&#39;, &#39;_blank&#39;)&quot; type=&quot;button&quot; value=&quot;LaTeX Code&quot; /&gt;&lt;/center&gt;
&lt;br/&gt;
The shaded region is the event we are interested in, then
\begin{equation}\nonumber
\begin{aligned}
P(Y &lt; 2X) &amp;= \int_{-1}^{1}\int_{\frac{y}{2}}^{1}\frac{1}{4}\operatorname{d}x\operatorname{d}y=\int_{-1}^{1}\frac{x}{4}\Bigg|_{x=\frac{y}{2}}^{x=1}\operatorname{d}y\\
&amp;=\int_{-1}^{1}\left(\frac{1}{4}-\frac{y}{8}\right)\operatorname{d}y=\left(\frac{y}{4}-\frac{y^2}{16}\right)\Bigg|_{-1}^{1}\\
&amp;=\left(\frac{1}{4}-\frac{1}{16}\right)-\left(-\frac{1}{4}-\frac{1}{16}\right)=\frac{1}{2}.
\end{aligned}
\end{equation}
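This result can be double-checked numerically; a quick Monte Carlo sketch using only the Python standard library (sample size and seed are arbitrary):

```python
import random

# Monte Carlo estimate of P(2X - Y > 0) for (X, Y) uniform on the square
random.seed(2015)
n = 200_000
hits = sum(1 for _ in range(n)
           if 2 * random.uniform(-1, 1) - random.uniform(-1, 1) > 0)
print(hits / n)  # close to 1/2
```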
&lt;/li&gt;
&lt;li&gt; Given $|X+Y| &lt; 1$, we have
\begin{equation}\nonumber
P(|X+Y|&lt;1) = P(-1 &lt; X+Y &lt; 1) = P(-1-X &lt; Y &lt; 1-X)
\end{equation}
The region bounded by the two lines ($Y=1-X$ and $Y=-1-X$) is shaded below
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgptR1LGgZeLQgFn3lMGSIbioDtETuneMKb0GV-pK3PfxSuah4VvtHkPxd1TrIRwlTKN_XdYHkMLxCUbnzuV8PdhrPSD6Lgob50W1sfbWRUpw4vz6jfLa8Ithpa0msZadPng03QgSwUSoXO/s1600/Screenshot+from+2014-10-30+13:40:57.png&quot; /&gt;&lt;/div&gt;
&lt;center&gt;
&lt;input onclick=&quot;window.open(&#39;https://gist.github.com/alstat/aef57fc8bbf4f7056afb&#39;, &#39;_blank&#39;)&quot; type=&quot;button&quot; value=&quot;LaTeX Code&quot; /&gt;&lt;/center&gt;&lt;br/&gt;
Hence, we have
\begin{equation}\nonumber
\begin{aligned}
P(-1-X &lt; Y &lt; 1-X) &amp;= 2\int_{0}^{1}\int_{-1}^{1-x}\frac{1}{4}\operatorname{d}y\operatorname{d}x\\
&amp;=\int_{0}^{1}\frac{y}{2}\Bigg|_{-1}^{1-x}\operatorname{d}x=\int_{0}^{1}\left(\frac{1-x}{2}+\frac{1}{2}\right)\operatorname{d}x\\
&amp;=\left(x-\frac{x^2}{4}\right)\Bigg|_{0}^{1}=\frac{3}{4}.
\end{aligned}
\end{equation}
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
A pdf is defined by
\begin{equation}\nonumber
f(x,y) = \begin{cases}
C (x+2y) &amp; \text{if}\;0 &lt; y &lt; 1\;\text{and}\;0 &lt; x &lt; 2\\
0 &amp; \text{otherwise}.
\end{cases}
\end{equation}
&lt;ol type = &quot;a&quot;&gt;
&lt;li&gt;Find the value of $C$.&lt;/li&gt;
&lt;li&gt;Find the marginal distribution of $X$.&lt;/li&gt;
&lt;li&gt;Find the joint cdf of $X$ and $Y$.&lt;/li&gt;
&lt;/ol&gt;
&lt;i&gt;Solutions:&lt;/i&gt;
&lt;ol type = &quot;a&quot;&gt;
&lt;li&gt;Find the value of $C$.&lt;br/&gt;
To solve for the value of $C$, we integrate the given pdf over $x$ and $y$ and set the result equal to 1, that is
\begin{equation}\nonumber
\begin{aligned}
1&amp;=\int_{0}^{1}\int_{0}^{2}C (x+2y)\operatorname{d}x\operatorname{d}y=
C\int_{0}^{1} \left(\frac{x^2}{2}+2xy\right)\Bigg|_{x=0}^{x=2}\operatorname{d}y\\
&amp;=C\int_{0}^{1}(2+4y)\operatorname{d}y=C\left(2y+4\frac{y^2}{2}\right)\Bigg|_{y=0}^{y=1}\\
1&amp;=4C\Rightarrow C=\frac{1}{4}
\end{aligned}
\end{equation}
&lt;script src=&quot;https://gist.github.com/alstat/24d157ff530416a54c0e.js&quot;&gt;&lt;/script&gt;
&lt;/li&gt;
&lt;li&gt;Find the marginal distribution of $X$.
\begin{equation}\nonumber
\begin{aligned}
f_X(x)&amp;=\int_{0}^{1}f(x,y)\operatorname{d}y = \frac{1}{4}\int_{0}^{1}(x+2y)\operatorname{d}y\\
&amp;=\frac{1}{4}(xy+y^2)\Bigg|_{y=0}^{y=1}=\begin{cases}
\frac{1}{4}(x+1),&amp;0  &lt; x &lt; 2\\
0,&amp;\text{elsewhere}
\end{cases}
\end{aligned}
\end{equation}
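As a sanity check, this marginal should integrate to 1 over $0 &lt; x &lt; 2$; a quick midpoint-rule sketch in Python:

```python
# Midpoint-rule check that f_X(x) = (x + 1)/4 integrates to 1 on (0, 2)
n = 10_000
h = 2 / n
total = sum(((i + 0.5) * h + 1) / 4 * h for i in range(n))
print(total)  # approximately 1.0 (the midpoint rule is exact for a linear integrand)
```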
&lt;/li&gt;
&lt;li&gt;Find the joint cdf of $X$ and $Y$.&lt;br/&gt;
If $0 &lt; x &lt; 2$ and $0 &lt; y &lt; 1$, then
\begin{equation}\nonumber
\begin{aligned}
F_{XY}(x,y)&amp;=P(X\leq x, Y\leq y) = \frac{1}{4}\int_{0}^{x}\int_{0}^{y}(u+2v)\operatorname{d}v\operatorname{d}u\\
&amp;=\frac{1}{4}\int_{0}^{x}(uv+v^2)\Bigg|_{v=0}^{v=y}\operatorname{d}u\\
&amp;=\frac{1}{4}\int_{0}^{x}(uy+y^2)\operatorname{d}u\\
&amp;=\frac{1}{4}\left(\frac{u^2y}{2}+uy^2\right)\Bigg|_{u=0}^{u=x}=\frac{x^2y}{8}+\frac{xy^2}{4}
\end{aligned}
\end{equation}
If $x\geq 2$ and $0 &lt; y &lt; 1$, then
\begin{equation}\nonumber
\begin{aligned}
F_{XY}(x,y)&amp;=P(X\leq x, Y\leq y) = \frac{1}{4}\int_{0}^{2}\int_{0}^{y}(u+2v)\operatorname{d}v\operatorname{d}u\\
&amp;=\frac{1}{4}\int_{0}^{2}(uv+v^2)\Bigg|_{v=0}^{v=y}\operatorname{d}u\\
&amp;=\frac{1}{4}\int_{0}^{2}(uy+y^2)\operatorname{d}u\\
&amp;=\frac{1}{4}\left(\frac{u^2y}{2}+uy^2\right)\Bigg|_{u=0}^{u=2}=\frac{y}{2}+\frac{y^2}{2}
\end{aligned}
\end{equation}
If $0 &lt; x &lt; 2$ and $y \geq 1$, then
\begin{equation}\nonumber
\begin{aligned}
F_{XY}(x,y)&amp;=P(X\leq x, Y\leq y) = \frac{1}{4}\int_{0}^{x}\int_{0}^{1}(u+2v)\operatorname{d}v\operatorname{d}u\\
&amp;=\frac{1}{4}\int_{0}^{x}(uv+v^2)\Bigg|_{v=0}^{v=1}\operatorname{d}u\\
&amp;=\frac{1}{4}\int_{0}^{x}(u+1)\operatorname{d}u\\
&amp;=\frac{1}{4}\left(\frac{u^2}{2}+u\right)\Bigg|_{u=0}^{u=x}=\frac{x^2}{8}+\frac{x}{4}
\end{aligned}
\end{equation}
&lt;/li&gt;
Hence below is the summary of the cdf,
\begin{equation}\nonumber
F_{XY}(x,y)=\begin{cases}
0,&amp;x\leq 0, y\leq 0\\
\frac{x^2y}{8}+\frac{xy^2}{4},&amp; 0 &lt; x &lt; 2\;\text{and}\;0 &lt; y &lt; 1\\
\frac{y}{2}+\frac{y^2}{2},&amp; x\geq 2\;\text{and}\;0 &lt; y &lt; 1\\
\frac{x^2}{8}+\frac{x}{4}, &amp; 0 &lt; x &lt; 2\;\text{and}\;y \geq 1\\
1,&amp;x\geq 2\;\text{and}\;y\geq 1
\end{cases}
\end{equation}
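Note that the piecewise cases above can be collapsed into a single expression by clamping $(x, y)$ to the support, since the cdf is constant beyond it; a Python sketch (the function name is mine):

```python
def joint_cdf(x, y):
    """Joint cdf F_{XY}(x, y) for f(x, y) = (x + 2y)/4 on (0,2) x (0,1).

    Clamping (x, y) to the support reproduces every piecewise case,
    because the cdf is constant outside the support.
    """
    x = min(max(x, 0.0), 2.0)
    y = min(max(y, 0.0), 1.0)
    return x**2 * y / 8 + x * y**2 / 4

print(joint_cdf(2, 1))    # 1.0 -- total probability
print(joint_cdf(1, 0.5))  # 0.125
```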
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;ol type = &quot;a&quot;&gt;
&lt;li&gt;Find $P(X &gt; \sqrt{Y})$ if $X$ and $Y$ are jointly distributed with pdf
\begin{equation}
f(x,y)=x+y,\;0\leq x\leq 1,\;0\leq y\leq 1.
\end{equation}
&lt;/li&gt;
&lt;li&gt;Find $P(X^2 &lt; Y &lt; X)$ if $X$ and $Y$ are jointly distributed with pdf
\begin{equation}
f(x,y)=2x,\;0\leq x\leq 1,\; 0\leq y \leq 1.
\end{equation}&lt;/li&gt;
&lt;/ol&gt;
&lt;i&gt;Solutions:&lt;/i&gt;
&lt;ol type = &quot;a&quot;&gt;
&lt;li&gt;$P(X &gt; \sqrt{Y})=P(Y &lt; X^2)$. Now the plot of $y=x^2$ is shown below
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3X9v9y5AtBNZ1g8tNQL-BqE_3EUuBsPqbaDVD7j-VKp_y78MHZYai9EZGcucyKABQJ4KzbeqCLx5d0X8SqkfaS7PfhdYU3oc81g9ma6Znqwh-VB22jiY-KJtC9opAi1wWAhAAs4MaCw_e/s1600/Screenshot+from+2014-11-01+20:34:27.png&quot; /&gt;&lt;/div&gt;
&lt;center&gt;
&lt;input onclick=&quot;window.open(&#39;https://gist.github.com/alstat/8b6f1689526ff3f8951f&#39;, &#39;_blank&#39;)&quot; type=&quot;button&quot; value=&quot;LaTeX Code&quot; /&gt;&lt;/center&gt;&lt;br/&gt;
The probability of the blue region above is computed as follows,
\begin{equation}\nonumber
\begin{aligned}
P(Y &lt; X^2)&amp;=\int_{0}^{1}\int_{0}^{x^2}(x + y)\operatorname{d}y\operatorname{d}x\\
&amp;=\int_{0}^{1}\left(xy+\frac{y^2}{2}\right)\Bigg|_{y=0}^{y=x^2}\operatorname{d}x\\
&amp;=\int_{0}^{1}\left(x^3+\frac{x^4}{2}\right)\operatorname{d}x\\
&amp;=\left(\frac{x^4}{4}+\frac{x^5}{10}\right)\Bigg|_{0}^{1}\\
&amp;=\frac{1}{4}+\frac{1}{10}=\frac{7}{20}
\end{aligned}
\end{equation}
&lt;script src=&quot;https://gist.github.com/alstat/a26ebf05903731c5d17b.js&quot;&gt;&lt;/script&gt;
&lt;/li&gt;
&lt;li&gt;So we are interested in the region between $y=x$ and $y=x^2$, as shown below
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTh-SHQyflpSzw_EKnox0k4f5BUHI7xusnDb-Yb1b0ZIMRN7FY3siljPvRdziFaTzNpHIkn0_FCHsdzF6LfElz0uY02fCgw-kYliiQ-XZqq3aM9zhzaqCuJ7ZVwftd9LONiM2zkXZ6HzqD/s1600/Screenshot+from+2014-11-01+20:52:39.png&quot; /&gt;&lt;/div&gt;
&lt;center&gt;
&lt;input onclick=&quot;window.open(&#39;https://gist.github.com/alstat/838cc9189d6bbb46d0e2&#39;, &#39;_blank&#39;)&quot; type=&quot;button&quot; value=&quot;LaTeX Code&quot; /&gt;&lt;/center&gt;&lt;br/&gt;
Thus,
\begin{equation}\nonumber
\begin{aligned}
P(X^2 &lt; Y &lt; X) &amp;=\int_{0}^{1}\int_{x^2}^{x}2x\operatorname{d}y\operatorname{d}x\\
&amp;=\int_{0}^{1}2xy\Bigg|_{y=x^2}^{y=x}\operatorname{d}x=\int_{0}^{1}(2x^2-2x^3)\operatorname{d}x\\
&amp;=\left(\frac{2x^3}{3}-\frac{x^4}{2}\right)\Bigg|_{0}^{1}\\
&amp;=\frac{2}{3}-\frac{1}{2}=\frac{1}{6}
\end{aligned}
\end{equation}
&lt;script src=&quot;https://gist.github.com/alstat/16383ce6d85657f31bad.js&quot;&gt;&lt;/script&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/928073872476658308/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/01/multiple-random-variables-problems.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/928073872476658308'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/928073872476658308'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/01/multiple-random-variables-problems.html' title='Multiple Random Variables Problems'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjazWu4wwnTB929g3g7G3sreEGlCO2DHPI_ROr8mXhyphenhyphenxE8V3y6M9UebdAsQD31Jxf2y0stbHfxEQLPjtgotoLSkNFd-XjdgQ0ULFsRGGYQEC1kUbu8wkWFYWSzLtnzOBTFup6GbDwf6Fn6I/s72-c/Screenshot+from+2014-10-29+23:25:05.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-8937118107731420127</id><published>2015-01-15T21:41:00.000+08:00</published><updated>2015-04-20T22:09:17.292+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Descriptive Statistics"/><category scheme="http://www.blogger.com/atom/ns#" term="Parametric Inference"/><category scheme="http://www.blogger.com/atom/ns#" term="SAS"/><title type='text'>New Toy: SAS&amp;reg; University Edition</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
So I started using SAS® University Edition, which is a FREE version of SAS® software. Again, it&#39;s FREE, and that&#39;s the main reason why I want to relearn the language. The software was &lt;a href=&quot;http://www.sas.com/en_us/news/press-releases/2014/march/analytics-u-sgf14.html&quot; target=&quot;_blank&quot;&gt;announced on March 24, 2014&lt;/a&gt;, and the &lt;a href=&quot;http://blogs.sas.com/content/academic/2014/03/24/new-academic-offerings-announced-at-sas-global-forum/&quot; target=&quot;_blank&quot;&gt;download became available in May of that year&lt;/a&gt;. And for that, I salute &lt;a href=&quot;http://en.wikipedia.org/wiki/James_Goodnight&quot; target=&quot;_blank&quot;&gt;Dr. Jim Goodnight&lt;/a&gt;. At least we can learn SAS® without paying the expensive price tag, especially for a single user like me.&lt;br /&gt;
&lt;br /&gt;
The software runs on top of a virtual machine and requires a 64-bit processor. To install it, just follow the instructions in this &lt;a href=&quot;https://www.youtube.com/watch?v=sVFAxyLkc3g&quot; target=&quot;blank&quot;&gt;video&lt;/a&gt;. Although the installation in the video is done on Windows, it also works on Mac. Below is a screenshot of my SAS® Studio running on Safari.
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFh_HkDD_e53fDtoEk-9h-dDC0kum3znqlSEkqPVVGfp3zujUiUbbjyj0GSbmnHBXy4SiyU_1T5_awqvZXYhNP95dlv0GUMdMYxWofx9vgo7mGBS2ky-3r38sSx3PwIRVEVyAfHuWv6Osi/s1600/Screen+Shot+2015-01-12+at+10.26.42+PM.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFh_HkDD_e53fDtoEk-9h-dDC0kum3znqlSEkqPVVGfp3zujUiUbbjyj0GSbmnHBXy4SiyU_1T5_awqvZXYhNP95dlv0GUMdMYxWofx9vgo7mGBS2ky-3r38sSx3PwIRVEVyAfHuWv6Osi/s1600/Screen+Shot+2015-01-12+at+10.26.42+PM.png&quot; height=&quot;250&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;&lt;h3&gt;
What&#39;s in the box?&lt;/h3&gt;
The software includes the following libraries:
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Base SAS® - Make programming fast and easy with the SAS® programming language, ODS graphics and reporting procedure;&lt;/li&gt;
&lt;li&gt;SAS/STAT® - Trust SAS® proven reliability with a wide variety of statistical methods and techniques;&lt;/li&gt;
&lt;li&gt;SAS/IML® - Use this matrix programming language for more specialized analyses and data exploration;&lt;/li&gt;
&lt;li&gt;SAS Studio - Reduce your programming time with autocomplete for hundreds of SAS® statements and procedures, as well as built-in syntax help;&lt;/li&gt;
&lt;li&gt;SAS/ACCESS® - Seamlessly connect with your data, no matter where it resides.&lt;/li&gt;
&lt;/ol&gt;
For more about SAS® University Edition please refer to the &lt;a href=&quot;http://www.sas.com/content/dam/SAS/en_us/doc/factsheet/sas-university-edition-107140.pdf&quot; target=&quot;_blank&quot;&gt;fact sheet&lt;/a&gt;. &lt;br /&gt;
&lt;br /&gt;
If you&#39;ve been following this blog, I have been promoting free software (R, Python, and C/C++) for analysis, and the introduction of SAS® University Edition can only mean one thing: a new topic to discuss in succeeding posts. So let&#39;s welcome this software by doing some analysis in it. &lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Analysis&lt;/h3&gt;
Our goal here is to address the basics in order to proceed with the analysis, and thus we have the following:&lt;br /&gt;
&lt;ol style=&quot;text-align: left;&quot;&gt;
&lt;li&gt;Importing and transforming the data;&lt;/li&gt;
&lt;li&gt;Descriptive statistics;&lt;/li&gt;
&lt;li&gt;Hypothesis testing: one-sample t test;&lt;/li&gt;
&lt;li&gt;Creating a function; and,&lt;/li&gt;
&lt;li&gt;Visualization.&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;h3&gt;
Data&lt;/h3&gt;
We&#39;ll again use the Volume of Palay Production (1994 to 2013, quarterly) data from the Cordillera Administrative Region (CAR), Philippines. To reproduce this article, please click &lt;a href=&quot;https://raw.githubusercontent.com/alstat/Analysis-with-Programming/master/2015/SAS/New%20Toy%20SAS%20University%20Edition/palay.csv&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt; to download the data.
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;&lt;b&gt;Importing and transforming the data&lt;/b&gt;&lt;br /&gt;
Working in SAS® Studio requires you to upload your data into it. To do this, hover over the sidebar, click on the Folders tab, and there you will find the &quot;up arrow&quot; for uploading. See the picture below&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBI9Kx1enwVBlzP3IFcozkyfrRKutw0H7LtMDwwynpYxhjxpAAEilaELf-p8gT1qQzrXRZjLXQW_YUKcm6Rdt7EYhtSPtFB8q3uJSZjCvMOkXJztSl-BhZ1T7GjW9Bjt91FyJwc_wt5loV/s1600/Screen+Shot+2015-01-12+at+10.56.35+PM.png&quot; height=&quot;239&quot; width=&quot;400&quot; /&gt;&lt;/div&gt;
You are now set to import the data using the following code. In my case, the location of the uploaded data, as seen in the photo above, is &quot;/folders/myfolders/palay.csv&quot;,&lt;br /&gt;&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/149ebdab13f603066c18.js&quot;&gt;&lt;/script&gt;
In SAS®, &lt;code&gt;proc&lt;/code&gt; refers to procedure, where in this case we perform the &lt;code&gt;import&lt;/code&gt; procedure. &lt;code&gt;out&lt;/code&gt; is the path where the SAS® data is saved, here we saved it in &quot;Work&quot; folder with filename &quot;palay&quot;. &lt;code&gt;getnames&lt;/code&gt; determines whether to generate SAS® variable names from the data values in the first 
record of the imported file. Finally, &lt;code&gt;datarow&lt;/code&gt; starts reading data from the specified row number in the delimited text file. &lt;br /&gt;&lt;br /&gt;
I want to emphasize that the description of the arguments of the statements and procedures above is available in the software itself; SAS® Studio&#39;s autocomplete for hundreds of SAS® statements and procedures is very handy. So in the succeeding codes, we will describe selected statements only. Below is the autocomplete feature of SAS® Studio seen in action,&lt;br /&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNuIsoWw8Bh2EyZlO2Za3jt8fB1TUrt510F8SUBpwDKiQuBKjdbNmxVMHlkQQDoqYgMXl85mrKtorTh7ADAUlbd0XPEeUxyWG08bzfYK7UtNmltnXP2uMsQvKYP_xUWyqBN6yzN0GczfJm/s1600/Screen+Shot+2015-01-13+at+8.21.57+PM.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNuIsoWw8Bh2EyZlO2Za3jt8fB1TUrt510F8SUBpwDKiQuBKjdbNmxVMHlkQQDoqYgMXl85mrKtorTh7ADAUlbd0XPEeUxyWG08bzfYK7UtNmltnXP2uMsQvKYP_xUWyqBN6yzN0GczfJm/s1600/Screen+Shot+2015-01-13+at+8.21.57+PM.png&quot; height=&quot;178&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
Now that we have the data in our workspace, let&#39;s do some transformation on it. In R, we always start by viewing the head of the data, that is, the first few observations, coded as &lt;code&gt;head(data)&lt;/code&gt;. Out of that habit, here&#39;s how to do it in SAS®, in this case for the first five observations,&lt;br /&gt;&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/03a6634abf6500379eb0.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0;&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;colgroup&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Obs&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Abra&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Apayao&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Benguet&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Ifugao&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Kalinga&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Mt_Province&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;1&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;1243&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2934&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;148&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3300&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;10553&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2675&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;2&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;4158&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;9235&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;4287&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;8063&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;35257&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1920&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;3&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;1787&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1922&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1955&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1074&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;4544&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;6955&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;4&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;17152&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;14501&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3536&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;19607&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;31687&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2715&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;5&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;1266&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2385&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2530&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3315&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;8520&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2601&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/center&gt;
If you want to start and end on specific rows, you can do the following. In this case, from the 5th row to the 10th row:
&lt;br /&gt;&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/5b500443b9270b49a685.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0;&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;colgroup&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Obs&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Abra&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Apayao&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Benguet&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Ifugao&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Kalinga&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Mt_Province&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;5&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;1266&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2385&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2530&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3315&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;8520&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2601&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;6&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;5576&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;7452&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;771&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;13134&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;28252&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1242&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;7&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;927&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1099&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2796&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;5134&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3106&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;9145&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;8&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;21540&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;17038&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2463&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;14226&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;36238&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2465&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;9&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;1039&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1382&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2592&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;6842&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;4973&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2624&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;10&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;5424&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;10588&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1064&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;13828&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;40140&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1237&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/center&gt;
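If you prefer to follow along in Python, here is a rough pandas sketch of the same row-range subset. This is not the SAS code in the gist above; the frame below is a stand-in (the first two Abra values are made up, the rest are copied from the printed tables):

```python
import pandas as pd

# Stand-in for the palay production data; the first two Abra values are
# made up, the rest are copied from the printed tables. Obs is 1-based.
df = pd.DataFrame({"Abra": [1243, 4158, 1787, 17152, 1266,
                            5576, 927, 21540, 1039, 5424]},
                  index=range(1, 11))

# Print from the 5th to the 10th observation (analogous to SAS's
# firstobs=5 obs=10 data set options); .loc is inclusive of both ends.
subset = df.loc[5:10]
print(subset)
```

Note that `.loc` slicing on integer labels includes both endpoints, which conveniently matches the inclusive SAS row range.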
Now, what about subsetting the variables of the data? Say we want to view a specific column only, for example observations 15 to 20 of the Benguet variable. The following code does exactly that,
&lt;br /&gt;&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/72c518c5da87505b7b0f.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0;&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;colgroup&gt;&lt;col&gt;&lt;/col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Obs&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Benguet&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;15&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;2847&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;16&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;2942&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;17&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;2119&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;18&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;734&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;19&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;2302&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;20&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;2598&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/center&gt;
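For cross-reference, a hedged pandas sketch of the same single-column view, using the Benguet values printed above, might be:

```python
import pandas as pd

# Benguet values for observations 15 to 20, as printed in the table above.
df = pd.DataFrame({"Benguet": [2847, 2942, 2119, 734, 2302, 2598]},
                  index=range(15, 21))

# A row range of a single variable: rows 15-20 of the Benguet column.
view = df.loc[15:20, "Benguet"]
print(view)
```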
For viewing multiple columns, simply enumerate the names of the variables using either &lt;code&gt;keep&lt;/code&gt; -- which keeps the listed variables in the output -- or &lt;code&gt;drop&lt;/code&gt; -- which excludes the listed variables from the printing.
&lt;br /&gt;&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/c28f152c6f0a3431fea3.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0;&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;colgroup&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Obs&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Abra&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Apayao&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Benguet&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Ifugao&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Kalinga&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;15&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;1048&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1427&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2847&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;5526&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;4402&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;16&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;25679&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;15661&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2942&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;14452&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;33717&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;17&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;1055&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2191&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2119&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;5882&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;7352&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;18&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;5437&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;6461&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;734&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;10477&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;24494&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;19&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;1029&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1183&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2302&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;6438&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;3316&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;20&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;23710&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;12222&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;2598&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;8446&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;26659&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/center&gt;
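The keep/drop idea maps naturally to column selection versus column dropping in pandas. A small illustrative sketch (the first five values come from the Obs 15 row above; the Mt_Province value is made up):

```python
import pandas as pd

# One illustrative row; the first five values come from the Obs 15 row
# above, the Mt_Province value is made up.
df = pd.DataFrame([[1048, 1427, 2847, 5526, 4402, 1800]],
                  columns=["Abra", "Apayao", "Benguet",
                           "Ifugao", "Kalinga", "Mt_Province"])

# keep: enumerate the variables to return.
kept = df[["Abra", "Apayao", "Benguet", "Ifugao", "Kalinga"]]

# drop: name the variables to exclude instead.
dropped = df.drop(columns=["Mt_Province"])

print(kept.equals(dropped))  # True: both yield the same five columns
```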
I think the demonstrations above are enough for data transformation.
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Perform descriptive statistics&lt;/b&gt;&lt;br /&gt;
And as always, the next step is to look at the descriptive statistics of the data, and here&#39;s how to do it,
&lt;br /&gt;&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/e6103617086bc4c6d1b4.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0;&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;colgroup&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;b header&quot; scope=&quot;col&quot;&gt;Variable&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;N&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Mean&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Std Dev&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Minimum&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Maximum&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&quot;data&quot;&gt;&lt;div class=&quot;stacked-values&quot;&gt;
&lt;div&gt;
Abra&lt;/div&gt;
&lt;div&gt;
Apayao&lt;/div&gt;
&lt;div&gt;
Benguet&lt;/div&gt;
&lt;div&gt;
Ifugao&lt;/div&gt;
&lt;div&gt;
Kalinga&lt;/div&gt;
&lt;div&gt;
Mt_Province&lt;/div&gt;
&lt;/div&gt;
&lt;/th&gt;
&lt;td class=&quot;r data&quot;&gt;&lt;div class=&quot;stacked-values&quot;&gt;
&lt;div&gt;
79&lt;/div&gt;
&lt;div&gt;
79&lt;/div&gt;
&lt;div&gt;
79&lt;/div&gt;
&lt;div&gt;
79&lt;/div&gt;
&lt;div&gt;
79&lt;/div&gt;
&lt;div&gt;
79&lt;/div&gt;
&lt;/div&gt;
&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&lt;div class=&quot;stacked-values&quot;&gt;
&lt;div&gt;
12874.38&lt;/div&gt;
&lt;div&gt;
16860.65&lt;/div&gt;
&lt;div&gt;
3237.39&lt;/div&gt;
&lt;div&gt;
12414.62&lt;/div&gt;
&lt;div&gt;
30446.42&lt;/div&gt;
&lt;div&gt;
4506.20&lt;/div&gt;
&lt;/div&gt;
&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&lt;div class=&quot;stacked-values&quot;&gt;
&lt;div&gt;
16746.47&lt;/div&gt;
&lt;div&gt;
15448.15&lt;/div&gt;
&lt;div&gt;
1588.54&lt;/div&gt;
&lt;div&gt;
5034.28&lt;/div&gt;
&lt;div&gt;
22245.71&lt;/div&gt;
&lt;div&gt;
3815.71&lt;/div&gt;
&lt;/div&gt;
&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&lt;div class=&quot;stacked-values&quot;&gt;
&lt;div&gt;
927.0000000&lt;/div&gt;
&lt;div&gt;
401.0000000&lt;/div&gt;
&lt;div&gt;
148.0000000&lt;/div&gt;
&lt;div&gt;
1074.00&lt;/div&gt;
&lt;div&gt;
2346.00&lt;/div&gt;
&lt;div&gt;
382.0000000&lt;/div&gt;
&lt;/div&gt;
&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;&lt;div class=&quot;stacked-values&quot;&gt;
&lt;div&gt;
60303.00&lt;/div&gt;
&lt;div&gt;
54625.00&lt;/div&gt;
&lt;div&gt;
8813.00&lt;/div&gt;
&lt;div&gt;
21031.00&lt;/div&gt;
&lt;div&gt;
68663.00&lt;/div&gt;
&lt;div&gt;
13038.00&lt;/div&gt;
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/center&gt;
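For readers following along in Python, the same default summary (N, mean, standard deviation, minimum, maximum) can be sketched with pandas. The values below are a small illustrative subset, not the full 79 observations:

```python
import pandas as pd

# Illustrative subset (observations 5-10 from the tables above); the
# real data has 79 rows per province.
df = pd.DataFrame({
    "Abra":    [1266, 5576, 927, 21540, 1039, 5424],
    "Benguet": [2530, 771, 2796, 2463, 2592, 1064],
})

# N, mean, std dev, minimum, maximum -- the statistics PROC MEANS
# prints by default.
summary = df.agg(["count", "mean", "std", "min", "max"])
print(summary)
```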
In case you want to view fewer or additional statistics, you can try
&lt;br /&gt;&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/e6316e9184d360a18e23.js&quot;&gt;&lt;/script&gt;
We&#39;ll end this section with the following scatter plot matrix,&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuKRNLjtgU7wz7BzemVJs8BHZ2ry8zLi8tz26XGrfJ05y9ogwo-OZtxjLiXw_KqRTFc1EU-gDL6HXoprInpkeQmIiELrehAyWF-F2jTCHWf6QK4hKwgQYJQ5sympx4T97lNZOpwDQyXTIl/s1600/SGScatter6.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuKRNLjtgU7wz7BzemVJs8BHZ2ry8zLi8tz26XGrfJ05y9ogwo-OZtxjLiXw_KqRTFc1EU-gDL6HXoprInpkeQmIiELrehAyWF-F2jTCHWf6QK4hKwgQYJQ5sympx4T97lNZOpwDQyXTIl/s1600/SGScatter6.png&quot; height=&quot;400&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/e3f4d5a4e5f6d5413caf.js&quot;&gt;&lt;/script&gt;
As a quick analysis, we see a strong positive relationship between Kalinga and Apayao, and a relationship between Ifugao and Benguet, based on the above scatter plot matrix.
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Hypothesis testing: One-sample t test&lt;/b&gt;&lt;br /&gt;
Let&#39;s perform a simple hypothesis test: the one-sample t test. Using a 0.05 level of significance, we&#39;ll test whether the true mean of Abra differs from 15000.&lt;br /&gt;&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/b785a6c4992aed2d5fc6.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0;&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;N&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Mean&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Std&amp;nbsp;Dev&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Std&amp;nbsp;Err&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Minimum&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Maximum&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;79&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;12874.4&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;16746.5&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;1884.1&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;927.0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;60303.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;center&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0;&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Mean&lt;/th&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;2&quot; scope=&quot;colgroup&quot;&gt;95% CL Mean&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Std&amp;nbsp;Dev&lt;/th&gt;
&lt;th class=&quot;c b header&quot; colspan=&quot;2&quot; scope=&quot;colgroup&quot;&gt;95% CL Std Dev&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;12874.4&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;9123.4&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;16625.4&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;16746.5&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;14480.9&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;19859.1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;center&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0;&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;DF&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;t&amp;nbsp;Value&lt;/th&gt;
&lt;th class=&quot;r b header&quot; scope=&quot;col&quot;&gt;Pr&amp;nbsp;&amp;gt;&amp;nbsp;|t|&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;r data&quot;&gt;78&lt;/td&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap;&quot;&gt;-1.13&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;0.2627&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/center&gt;
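As a cross-check of the mechanics, here is a minimal Python sketch of the one-sample t statistic on simulated data (the sample below is synthetic, not the actual Abra series):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic right-skewed sample standing in for the Abra series (n = 79).
abra = rng.gamma(shape=1.0, scale=12000.0, size=79)

# One-sample t statistic for H0: mu = 15000 (two-sided alternative).
n, mu0 = abra.size, 15000.0
t_stat = (abra.mean() - mu0) / (abra.std(ddof=1) / np.sqrt(n))
print("t =", round(t_stat, 4), "with df =", n - 1)
```

SAS additionally reports Pr &gt; |t|, the two-sided p-value from the t distribution with n - 1 = 78 degrees of freedom; in Python that step would typically use scipy.stats.ttest_1samp.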
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzY9sxw2x9zjEirEtsHPsTio9hiQUh10j8XoDWsLyjvpy-cwREZbBjQTdwCbQDlebOPaAJGbpaR8eM__r6Nod90iiG7CqVIChuZL2Wz1qP3sF3pW1FUar6p32-xXefvdeYCJseol8YAfpZ/s1600/SummaryPanel5.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzY9sxw2x9zjEirEtsHPsTio9hiQUh10j8XoDWsLyjvpy-cwREZbBjQTdwCbQDlebOPaAJGbpaR8eM__r6Nod90iiG7CqVIChuZL2Wz1qP3sF3pW1FUar6p32-xXefvdeYCJseol8YAfpZ/s1600/SummaryPanel5.png&quot; height=&quot;300&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSAj17lfhrzBYKo6hg9vKwOacuKpyVPoOKL-8CKnzJ2xkgxRpZCZZl95u9snHfE3pqaZjJ0Cd1aCRpnGvOW0RAzWHnA9Xvb_iCmBelXeAHNcEGroysNxnfz75BUdgUH7p_ACWPOWg02chA/s1600/QQPlot5.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhSAj17lfhrzBYKo6hg9vKwOacuKpyVPoOKL-8CKnzJ2xkgxRpZCZZl95u9snHfE3pqaZjJ0Cd1aCRpnGvOW0RAzWHnA9Xvb_iCmBelXeAHNcEGroysNxnfz75BUdgUH7p_ACWPOWg02chA/s1600/QQPlot5.png&quot; height=&quot;300&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
From the above numerical output, we see that the p-value = 0.2627 is greater than $\alpha = 0.05$; hence there is insufficient evidence to conclude that the average volume of palay production differs from 15000. Graphically, the observations of the Abra variable are not normally distributed based on the Q-Q plot; although that judgment is subjective, the points clearly deviate from the line.
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Creating a function&lt;/b&gt;&lt;br /&gt;
Let&#39;s create a function using the &lt;code&gt;fcmp&lt;/code&gt; procedure. For illustration purposes, consider the standard normal density function,
$$
\phi(x) = \frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{x^2}{2}\right\}
$$
In SAS&amp;reg; we code it as follows,&lt;br /&gt;&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/68be9e5388302c5d18fb.js&quot;&gt;&lt;/script&gt;
To generate data from this function using a &lt;code&gt;do&lt;/code&gt; loop, consider the following:&lt;br /&gt;&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/b3484eeee29c0edcc68c.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;table class=&quot;table&quot; style=&quot;border-spacing: 0;&quot;&gt;
&lt;colgroup&gt;&lt;col&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;colgroup&gt;&lt;col&gt;&lt;/col&gt;&lt;col&gt;&lt;/col&gt;&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;Obs&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;x&lt;/th&gt;
&lt;th class=&quot;r header&quot; scope=&quot;col&quot;&gt;y&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;1&lt;/th&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap;&quot;&gt;-5.0&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;.000001487&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;2&lt;/th&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap;&quot;&gt;-4.9&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;.000002439&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;3&lt;/th&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap;&quot;&gt;-4.8&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;.000003961&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;4&lt;/th&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap;&quot;&gt;-4.7&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;.000006370&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;r rowheader&quot; scope=&quot;row&quot;&gt;5&lt;/th&gt;
&lt;td class=&quot;r data&quot; style=&quot;white-space: nowrap;&quot;&gt;-4.6&lt;/td&gt;
&lt;td class=&quot;r data&quot;&gt;.000010141&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/center&gt;
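As a sanity check, the same density and do-loop-style grid can be sketched in Python; the first value reproduces the .000001487 in the Obs 1 row of the table above:

```python
import math

def stdnorm(x):
    # phi(x) = exp(-x^2 / 2) / sqrt(2 * pi)
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# Mirror the do loop: x from -5 to 5 in steps of 0.1.
xs = [round(-5.0 + 0.1 * i, 1) for i in range(101)]
ys = [stdnorm(x) for x in xs]

# First observation matches the table: x = -5.0, y = .000001487
print(xs[0], format(ys[0], ".9f"))
```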
And that&#39;s how you create and use a function in SAS®. For me, the function-definition procedure &lt;code&gt;fcmp&lt;/code&gt; is the best addition to SAS® version 9.2, and I&#39;m lucky to be relearning this language with the feature available, especially since it is FREE in SAS® Studio.
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Visualization&lt;/b&gt;&lt;br /&gt;
Now it&#39;s time for us to create some visual art. SAS®, being proprietary software, has a lot to offer. We&#39;ve demonstrated a few plots above already; this time let&#39;s plot the data points of &lt;code&gt;sn_data&lt;/code&gt; generated from the &lt;code&gt;stdnorm&lt;/code&gt; function we defined earlier. Here it is,&lt;br /&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDw0EkNGHtmfYWc9EBGq1Goa5HjTExgnVgm5M9Q4rRKWQt_5ni6tAI_QMorssq1ePZLjFcXlGalXTut3YiX134Me_rYrVF5MzMvWIKX_DpKQhyWTSnoib12FNhZJY-nZC-JTv_E1E9uCCa/s1600/SGPlot2-2.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDw0EkNGHtmfYWc9EBGq1Goa5HjTExgnVgm5M9Q4rRKWQt_5ni6tAI_QMorssq1ePZLjFcXlGalXTut3YiX134Me_rYrVF5MzMvWIKX_DpKQhyWTSnoib12FNhZJY-nZC-JTv_E1E9uCCa/s1600/SGPlot2-2.png&quot; height=&quot;300&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/a5cc13b1c8a89fdfef3e.js&quot;&gt;&lt;/script&gt;
For other types of plots, simply go to the Snippets tab in the sidebar of SAS® Studio, where you will find template code for different plot types. See the picture below,&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_IM6Veoy6ztU0AFE-cnlNChIVY-pgczrnMq8moY1ZEjdSriB3Ab2hM66YJ9dQAP2YJMHJpN1_a4k9gOYt0x9osIhbsExpCwB40czaxPFIMav-KnLPp9Q8oz2yIvhBmFLtwjd8jbVgiqRe/s1600/Screen+Shot+2015-01-15+at+1.15.18+PM.png&quot; height=&quot;400&quot; width=&quot;328&quot; /&gt;&lt;/div&gt;
I will end this section with a histogram and a series plot.
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Histogram&lt;/b&gt;&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiL_ZHgaE-zyaiEWgab62iBiKwlM8nS5TA2oIaLDa15SsLh8dZAjgIdzJK4CH4RGFOK9RidRpylW_f2yz73VAh5gT29aPWmZYl2Al94gj-1hORggdr8fDFYVSnyVXwv0xLro2db_QkdOPrM/s1600/SGPlot5.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiL_ZHgaE-zyaiEWgab62iBiKwlM8nS5TA2oIaLDa15SsLh8dZAjgIdzJK4CH4RGFOK9RidRpylW_f2yz73VAh5gT29aPWmZYl2Al94gj-1hORggdr8fDFYVSnyVXwv0xLro2db_QkdOPrM/s1600/SGPlot5.png&quot; height=&quot;300&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/ade75f9e88c758f257e5.js&quot;&gt;&lt;/script&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Series plot&lt;/b&gt;&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjf8ty4IjPYIphG3J8xzH1_qz8jmWGkeP_Iwx9z5IjhiKmgdjRcmPS-Klv0qqRYaq6VfYm3E6Jdj2teYP8_fbWVffdr73hWv3LE39ZXJirsAQAs_JOYQLqITWyvZQdBjhyIfaVCuQlqpf7t/s1600/SGPlot22.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjf8ty4IjPYIphG3J8xzH1_qz8jmWGkeP_Iwx9z5IjhiKmgdjRcmPS-Klv0qqRYaq6VfYm3E6Jdj2teYP8_fbWVffdr73hWv3LE39ZXJirsAQAs_JOYQLqITWyvZQdBjhyIfaVCuQlqpf7t/s1600/SGPlot22.png&quot; height=&quot;300&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/763f1d6b9932248e1141.js&quot;&gt;&lt;/script&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
Conclusion&lt;/h3&gt;
In conclusion, it wasn&#39;t difficult for me to relearn SAS®, not only because I had used it in a few papers back in college, but also because of my programming background in R and Python, which I used as a basis for understanding the grammar of the language. Overall, SAS® is a high-level language; as we saw above, a simple statement gives you complete results with graphics, without lengthy code. And although I use R and Python as my primary tools for research, I am happy to add SAS® to them. Despite the popularity of R in analysis, I look forward to seeing more learners, students, researchers, and even bloggers using SAS®. That way, we can share ideas and techniques between the R, SAS®, and Python communities.&lt;br /&gt;
&lt;br /&gt;
What about you? How&#39;s your experience with SAS® University Edition?
&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Data Source&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://countrystat.bas.gov.ph/&quot; target=&quot;_blank&quot;&gt;Philippine Bureau of Agricultural Statistics&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;SAS® Documentation&lt;/li&gt;
&lt;li&gt;r4stats.com: Data Import. From &lt;a href=&quot;http://r4stats.com/examples/data-import/&quot; target=&quot;_blank&quot;&gt;http://r4stats.com/examples/data-import/&lt;/a&gt; (accessed January 15, 2015)&lt;/li&gt;
&lt;li&gt;SAS Learning Module: Subsetting data in SAS. From &lt;a href=&quot;http://www.ats.ucla.edu/stat/sas/modules/subset.htm&quot;&gt;http://www.ats.ucla.edu/stat/sas/modules/subset.htm&lt;/a&gt; (accessed January 15, 2015)&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;style&gt;
.header {
    background-color: #EDF2F9;
    border-color: #B0B7BB;
    border-style: solid;
    border-width: 0px 1px 1px 0px;
    color: #127;
    font-family: Arial,&quot;Albany AMT&quot;,Helvetica,Helv;
    font-size: x-small;
    font-style: normal;
    font-weight: bold;
    padding: 2px 5px 2px 5px;
}


.rowheader {
    background-color: #EDF2F9;
    border-color: #B0B7BB;
    border-style: solid;
    border-width: 0px 1px 1px 0px;
    color: #127;
    font-family: Arial,&quot;Albany AMT&quot;,Helvetica,Helv;
    font-size: x-small;
    font-style: normal;
    font-weight: bold;
    text-align: center;
    padding: 2px 5px 2px 5px;
}


.data, .dataemphasis {
    background-color: #FFF;
    border-color: #C1C1C1;
    border-style: solid;
    border-width: 0px 1px 1px 0px;
    font-family: Arial,&quot;Albany AMT&quot;,Helvetica,Helv;
    font-size: x-small;
    font-style: normal;
    font-weight: normal;
    text-align: right;
    padding: 2px 5px 2px 5px;
}

.table {
    border-color: #C1C1C1;
    border-style: solid;
    border-width: 1px 1px 1px 1px;
    border-collapse: collapse;
    border-spacing: 0px;
    padding: 5px 5px 5px 5px;
    margin-bottom: 1em;
}

.body {
    color: #000;
    font-family: Arial,&quot;Albany AMT&quot;,Helvetica,Helv;
    font-size: x-small;
    font-style: normal;
    font-weight: normal;
    line-height: 1.231;
}
&lt;/style&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/8937118107731420127/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/01/new-toy-sas-university-edition.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/8937118107731420127'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/8937118107731420127'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/01/new-toy-sas-university-edition.html' title='New Toy: SAS&amp;reg; University Edition'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFh_HkDD_e53fDtoEk-9h-dDC0kum3znqlSEkqPVVGfp3zujUiUbbjyj0GSbmnHBXy4SiyU_1T5_awqvZXYhNP95dlv0GUMdMYxWofx9vgo7mGBS2ky-3r38sSx3PwIRVEVyAfHuWv6Osi/s72-c/Screen+Shot+2015-01-12+at+10.26.42+PM.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-3346169097209319093</id><published>2015-01-05T15:38:00.000+08:00</published><updated>2015-01-19T15:17:21.158+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Data Mining"/><category scheme="http://www.blogger.com/atom/ns#" term="Image Analysis"/><category scheme="http://www.blogger.com/atom/ns#" term="Multivariate Analysis"/><category scheme="http://www.blogger.com/atom/ns#" term="R"/><title type='text'>R: Canonical Correlation Analysis on Imaging</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; 
trbidi=&quot;on&quot;&gt;
In imaging, we deal with multivariate data, often in array form with several spectral bands. Trying to interpret the correlations across all of its dimensions is very challenging, if not impossible. For example, recall the number of spectral bands of the AVIRIS data we used in the &lt;a href=&quot;http://alstatr.blogspot.com/2014/12/principal-component-analysis-on-imaging.html&quot; target=&quot;_blank&quot;&gt;previous post&lt;/a&gt;. There are 152 bands, so in total there are 152$\cdot$152 = 23104 correlations between pairs of random variables. How would you be able to interpret that huge number of correlations? &lt;br /&gt;
&lt;br /&gt;
To address this, it might be better to group these variables into two sets and study the relationship between the sets. Such a statistical procedure can be done using canonical correlation analysis (CCA). An example from the health sciences (from Reference 2) involves variables related to exercise and health. On one hand you have variables associated with exercise: observations such as the climbing rate on a stair stepper, how fast you can run, the amount of weight lifted on bench press, the number of push-ups per minute, etc. On the other hand you have health variables such as blood pressure, cholesterol levels, glucose levels, body mass index, etc. Two sets of variables are measured, and the relationships between the exercise variables and the health variables are studied.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Methodology&lt;/h3&gt;
Mathematically, the procedure is as follows:
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Divide the random variables into two groups, and assign these to the following random vectors:
\begin{equation}\nonumber
\mathbf{X} = [X_1,X_2,\cdots, X_p]^T\;\text{and}\;\mathbf{Y} = [Y_1,Y_2,\cdots, Y_q]^T
\end{equation}
&lt;/li&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
&lt;li&gt;Analogous to principal component analysis (PCA), we aim to find linear combinations
\begin{equation}\nonumber
\begin{aligned}
U_1 = &amp;amp;\mathbf{a}_1^T\mathbf{X} = a_{11}X_1 + a_{12}X_2+\cdots + a_{1p}X_p\\
U_2 = &amp;amp;\mathbf{a}_2^T\mathbf{X} = a_{21}X_1 + a_{22}X_2+\cdots + a_{2p}X_p\\
&amp;amp;\qquad\quad\qquad\vdots\qquad\qquad\vdots\\
U_p = &amp;amp;\mathbf{a}_p^T\mathbf{X} = a_{p1}X_1 + a_{p2}X_2+\cdots + a_{pp}X_p
\end{aligned}
\end{equation}
and
\begin{equation}\nonumber
\begin{aligned}
V_1 = &amp;amp;\mathbf{b}_1^T\mathbf{Y}=b_{11}Y_1 + b_{12}Y_2+\cdots + b_{1q}Y_q\\
V_2 = &amp;amp;\mathbf{b}_2^T\mathbf{Y}=b_{21}Y_1 + b_{22}Y_2+\cdots + b_{2q}Y_q\\
&amp;amp;\qquad\quad\qquad\vdots\qquad\qquad\vdots\\
V_q = &amp;amp;\mathbf{b}_q^T\mathbf{Y}=b_{q1}Y_1 + b_{q2}Y_2+\cdots + b_{qq}Y_q\\
\end{aligned}
\end{equation}
that will maximize the correlation
\begin{equation}\nonumber
Corr(U_i,V_i)=\frac{Cov(U_i,V_i)}{\sqrt{Var(U_i)}\sqrt{Var(V_i)}},\quad i=1,2,\cdots,n
\end{equation}
where $n = \min{(p, q)}$.
&lt;/li&gt;
&lt;li&gt;The first pair of canonical variables is defined by
\begin{equation}\nonumber
Corr(U_1, V_1)=\rho_1=\sqrt{\rho_1^2},
\end{equation}
where $\rho_1$, the first canonical correlation, is the square root of the largest of the eigenvalues $\rho_1^2\geq \rho_2^2\geq \cdots \geq \rho_n^2$ of the matrix $\mathbf{\Sigma}_{XX}^{-1/2}\mathbf{\Sigma}_{XY}\mathbf{\Sigma}_{YY}^{-1}\mathbf{\Sigma}_{XY}^{T}\mathbf{\Sigma}_{XX}^{-1/2}$. Here $\mathbf{\Sigma}_{XX}$ is the variance-covariance matrix of $\mathbf{X}$, $\mathbf{\Sigma}_{YY}$ is the variance-covariance matrix of $\mathbf{Y}$, and $\mathbf{\Sigma}_{XY}$ is the cross-covariance matrix of $\mathbf{X}$ and $\mathbf{Y}$. The second pair of canonical variables is then given by
\begin{equation}\nonumber
Corr(U_2, V_2)=\rho_2=\sqrt{\rho_2^2},
\end{equation}
and so on.
&lt;/li&gt;
&lt;/ol&gt;
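As a quick numerical illustration of step 3, the sketch below (on simulated data, not the grass image) computes the canonical correlations directly from the eigenvalues of the matrix described above and checks them against R&#39;s built-in cancor function:

```r
# Canonical correlations from first principles (simulated data).
set.seed(1)
n = 200; p = 3; q = 2
X = matrix(rnorm(n * p), n, p)
Y = X[, 1:q] + matrix(rnorm(n * q), n, q)   # Y correlated with X by construction

Sxx = cov(X); Syy = cov(Y); Sxy = cov(X, Y)

# symmetric inverse square root of Sxx via its eigendecomposition
ex = eigen(Sxx, symmetric = TRUE)
Sxx.inv.sqrt = ex$vectors %*% diag(1 / sqrt(ex$values)) %*% t(ex$vectors)

M = Sxx.inv.sqrt %*% Sxy %*% solve(Syy) %*% t(Sxy) %*% Sxx.inv.sqrt
rho = sqrt(sort(eigen(M, symmetric = TRUE)$values, decreasing = TRUE)[1:min(p, q)])

all.equal(rho, cancor(X, Y)$cor)   # should be TRUE
```

The agreement holds because the matrix above is scale-free in the covariance estimates, so the choice of divisor in the sample covariances does not matter.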
For a more detailed theory of CCA, please refer to References 1 and 2 below. To continue, let&#39;s apply this methodology to an image. We will use the Grass data from (Bajorski, 2012) and analyze it using R. Below is the proper description of the data.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Data&lt;/h3&gt;
The Grass data is a spectral image of a grass texture, 64 by 64 pixels. Each pixel is represented by a spectral reflectance curve in 42 spectral bands, with reflectance given in percent.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Analysis&lt;/h3&gt;
To begin, let&#39;s display the data in an image form:&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJeHlSjCEo0GJ8GmjaVNXg5aGtz65OjshUtvYG4JVOO6D6O-a9lATm8FlJOpyaxXh395Y1gCzuJr85-airoiq1H8XyfFMSkbza7V7NjTQ-tl9txgt-SRhBzWH2MdEyqVGNHJU3ADodJety/s1600/Rplot03.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJeHlSjCEo0GJ8GmjaVNXg5aGtz65OjshUtvYG4JVOO6D6O-a9lATm8FlJOpyaxXh395Y1gCzuJr85-airoiq1H8XyfFMSkbza7V7NjTQ-tl9txgt-SRhBzWH2MdEyqVGNHJU3ADodJety/s1600/Rplot03.png&quot; height=&quot;238&quot; width=&quot;320&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/f337ab80631bbc6b4ec8.js&quot;&gt;&lt;/script&gt;
The code generates the first 12 spectral bands of the data, where we observe a significant change in brightness of the twelfth band compared to the first. The signature of all pixels across these bands is shown below:
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgU-MWT8R-bSiPum8u2WxsWKZr3AbUUHg4ODKYRoXE6wlbnpe6DaB7sSMsvRE1uY8M3bVVD1g_5xkN8-RPhrYU5-28EZB899HA5eHfn2KhYL3BjjtVhxn_wKw8VCrGaAjHq9RuYcvXm2KkB/s1600/Rplot04.png&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/3fb1851f99dfd6e76bc0.js&quot;&gt;&lt;/script&gt;
Examining the above plot suggests that almost all bands are correlated; that is, if the reflectance of a given pixel on the $i$th band increases (or decreases), the $j$th band, $i\neq j$, is also expected to increase (or decrease), except on bands 30 and 31, where there seems to be no clear pattern. But that&#39;s subjective; we cannot tell exactly, because there are 4096 signatures (lines in the plot) that are likely to overlap other important information. So to see the relationships between all variables properly, here is the correlation matrix of all the spectral bands,
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8-pj-kHgRgoACRQTUywT02mECRltqgptm1HMk7P_J50HomBMNfjDWbKcxwccBrsU-kNt-3TWae7FiVkQ8QjSmThpPSn97PapLbGduIQwNiew6JqaN2F8lNHxtbyk2WaTl8o4iGLx01Qaa/s1600/Rplot05.png&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/803738fd9d8f6a19d442.js&quot;&gt;&lt;/script&gt;
The cyan colour engulfing almost 60 percent of the region indicates high correlation between the corresponding spectral bands, while the pronounced fuchsia colour tells us those bands have low correlation. Now let&#39;s divide this data into two groups: from 42 bands we could have two equal sets of variables (each with 21 dimensions), but for purposes of illustration we&#39;ll consider unequal sets, say the first 15 bands as the first group and the remaining bands 16-42 as the second group, hence $p=15$ and $q=27$. There are then $\min(p,q)=n=15$ pairs of canonical variables. Applying CCA we have,
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/2017513ca2cbacfa2389.js&quot;&gt;&lt;/script&gt;
The numerical output above contains the $n=15$ canonical correlations. As we can see, the first five canonical correlations are very large, implying that the linear combinations obtained for the first five pairs of canonical variables are highly correlated with each other. Subsequent correlations are interpreted in a similar way. Next, we&#39;ll examine the coefficients of the first five canonical variables to see which bands are most strongly represented in the above canonical correlations. The &lt;code&gt;cancor&lt;/code&gt; function returns the following components:
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;cor&lt;/code&gt; - correlations;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;xcoef&lt;/code&gt; - estimated coefficients for the x variables;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ycoef&lt;/code&gt; - estimated coefficients for the y variables;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;xcenter&lt;/code&gt; - the values used to adjust the x variables; and,&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ycenter&lt;/code&gt; - the values used to adjust the y variables.&lt;/li&gt;
&lt;/ol&gt;
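A minimal sketch of how these components can be accessed, with simulated data standing in for the grass reflectance matrix (the split into bands 1-15 and 16-42 mirrors the grouping above):

```r
# Simulated stand-in for the 4096 x 42 grass reflectance matrix.
set.seed(1)
dat = matrix(rnorm(64 * 64 * 42), ncol = 42)

cc = cancor(dat[, 1:15], dat[, 16:42])   # first group vs second group

length(cc$cor)    # min(p, q) = 15 canonical correlations
head(cc$xcoef)    # coefficients of the U_i (x group)
head(cc$ycoef)    # coefficients of the V_i (y group)
```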
We are interested in &lt;code&gt;xcoef&lt;/code&gt; and &lt;code&gt;ycoef&lt;/code&gt;; the coefficients of the first three canonical variables $U_i$ and $V_i$ are plotted below,
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjkiVRY6sskuXx0i0cBvvgMJb9Jdjyd3KmGLQa8EKS2yfyqDNbSBoUDRqXetx30wzV_yRsiothyt8WaFb-ncaBAmv3830I2SLydyqrCY-T4lwd3Nnxt7HhQIKIhtDRr5ydNzFWid1FsP3cp/s1600/Rplot08.png&quot; /&gt;&lt;/div&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXAtyMg7sTFLNwNnMT0wSDc_D5pqWe8qzW7NLU5OcYjOpBysEKSR_HDOeGMtInCCrES8iIBQ3k2Be8pmqTBgXKMqS5trt62fezZamroTJRtxG1F6Hy_HYFScEmqlHJKkZOMHBNZ5sn4H8U/s1600/Rplot10.png&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/cf7cf403193d80b1ca01.js&quot;&gt;&lt;/script&gt;
A closer look at the plot of the coefficients of the first three $U_i$ random variables shows loadings fluctuating between negative and positive values, so that $U_1, U_2,$ and $U_3$ are contrasts of the spectral bands. A similar situation is observed in the plot of the coefficients of the first three $V_i$ random variables, and because of that we cannot give a more specific interpretation of these bands. 
&lt;br/&gt;&lt;br/&gt;
&lt;h3&gt;Test of Canonical Dimension&lt;/h3&gt;
The dimension of the canonical variates above is $n = 15$; let&#39;s check whether all of these are statistically significant. We&#39;ll use the &lt;a href=&quot;http://cran.r-project.org/web/packages/CCP/index.html&quot; target = &quot;_blank&quot;&gt;CCP&lt;/a&gt; (Significance Tests for Canonical Correlation Analysis) R package, whose &lt;code&gt;p.asym&lt;/code&gt; function will do the job for us.
&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/8455ccc15a92a631104d.js&quot;&gt;&lt;/script&gt;
The above output tells us that at the 0.05 level of significance, only the first 13 of the 15 canonical dimensions are significant. &lt;br/&gt;&lt;br/&gt;
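For reference, here is a self-contained sketch of the test, with simulated data standing in for the grass image (assuming the CCP package has been installed via install.packages(&quot;CCP&quot;)):

```r
library(CCP)   # Significance Tests for Canonical Correlation Analysis

set.seed(1)
dat = matrix(rnorm(64 * 64 * 42), ncol = 42)   # stand-in for the grass data
cc  = cancor(dat[, 1:15], dat[, 16:42])

# Wilks' lambda test; other options are "Hotelling", "Pillai", and "Roy"
out = p.asym(cc$cor, N = nrow(dat), p = 15, q = 27, tstat = "Wilks")
```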
For more on CCA using R, please check Reference 3. If you want to perform it in SAS, you might want to check Reference 2, and for more on imaging I suggest Reference 1. 
&lt;br /&gt;
&lt;br /&gt;
&lt;h3 style=&quot;text-align: left;&quot;&gt;
Reference&lt;/h3&gt;
&lt;ol style=&quot;text-align: left;&quot;&gt;
&lt;li&gt;&lt;a href=&quot;http://www.amazon.com/Statistics-Imaging-Optics-Photonics-Bajorski/dp/0470509457&quot; target=&quot;_blank&quot;&gt;Bajorski, P. (2012). &lt;i&gt;Statistics for Imaging, Optics, and Photonics&lt;/i&gt;. John Wiley &amp;amp; Sons, Inc.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://onlinecourses.science.psu.edu/stat505/node/63&quot;&gt;Stat 505 - Applied Multivariate Statistical Analysis. &lt;i&gt;Lesson 8: Canonical Correlation Analysis&lt;/i&gt;. Eberly College of Science, Pennsylvania State University (Penn State).&lt;/a&gt; (accessed January 2, 2015)&lt;/li&gt;
&lt;li&gt;R Data Analysis Examples: Canonical Correlation Analysis. UCLA: Statistical Consulting Group. From &lt;a href=&quot;http://www.ats.ucla.edu/stat/r/dae/canonical.htm&quot; target = &quot;_blank&quot;&gt;http://www.ats.ucla.edu/stat/r/dae/canonical.htm&lt;/a&gt; (accessed January 4, 2015)&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/3346169097209319093/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2015/01/canonical-correlation-analysis-on.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/3346169097209319093'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/3346169097209319093'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2015/01/canonical-correlation-analysis-on.html' title='R: Canonical Correlation Analysis on Imaging'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJeHlSjCEo0GJ8GmjaVNXg5aGtz65OjshUtvYG4JVOO6D6O-a9lATm8FlJOpyaxXh395Y1gCzuJr85-airoiq1H8XyfFMSkbza7V7NjTQ-tl9txgt-SRhBzWH2MdEyqVGNHJU3ADodJety/s72-c/Rplot03.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-3681315616899008407</id><published>2014-12-25T20:26:00.001+08:00</published><updated>2015-12-27T09:52:43.876+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Data Mining"/><category scheme="http://www.blogger.com/atom/ns#" term="Image Analysis"/><category scheme="http://www.blogger.com/atom/ns#" term="LaTeX"/><category scheme="http://www.blogger.com/atom/ns#" term="Machine Learning"/><category scheme="http://www.blogger.com/atom/ns#" term="Multivariate Analysis"/><category scheme="http://www.blogger.com/atom/ns#" term="R"/><category scheme="http://www.blogger.com/atom/ns#" 
term="Statistical Learning"/><title type='text'>R: Principal Component Analysis on Imaging</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
Ever wonder what&#39;s the mathematics behind face recognition on most gadgets like digital cameras and smartphones? Well, for the most part it has something to do with statistics. One statistical tool capable of such a feature is Principal Component Analysis (PCA). In this post, however, we will not do face recognition (sorry to disappoint you), as we reserve this for a future post while I&#39;m still doing research on it. Instead, we go through its basic concept and use it for data reduction on the spectral bands of an image using R.
&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Let&#39;s view it mathematically&lt;/h3&gt;
Consider a line $L$ in a parametric form described as a set of all vectors $k\cdot\mathbf{u}+\mathbf{v}$ parameterized by $k\in \mathbb{R}$, where $\mathbf{v}$ is a vector orthogonal to a &lt;a href=&quot;http://en.wikipedia.org/wiki/Unit_vector&quot; target=&quot;_blank&quot;&gt;normalized vector&lt;/a&gt; $\mathbf{u}$. Below is the graphical equivalent of the statement:
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgex0TuNA-RxBG_c9TPXYTjNLO30lB-zUqF6A8LxgOGVv11juKjAnrZuBxcD_9NtTui2NWt3h0rLQVzpQcGnyGaEgSzkEg85GeoM1O8QPt6dm7YgrUjSJF3lhDExxse6B6jbZtGGTlh-zka/s1600/Screen+Shot+2014-12-21+at+6.49.01+PM.png&quot; height=&quot;217&quot; width=&quot;320&quot; /&gt;&lt;/div&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;So if given a point $\mathbf{x}=[x_1,x_2]^T$, the orthogonal projection of this point on the line $L$ is given by $(\mathbf{u}^T\mathbf{x})\mathbf{u}+\mathbf{v}$. Graphically, we mean
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiznlFMfI4-7jkpk1X7U4ROYSPt4AC8CocHhHUlrh9-AncvFqOGy-23rukpUlcFz9CXtczlNVCe-SU3hBNPfSSKMhCWsj4wIzxdK9-N1QQlAOQY_-3L4A94wxGRU5wF6rwCTHDHFSpukl-V/s1600/Screen+Shot+2014-12-25+at+2.25.12+PM.png&quot; height=&quot;267&quot; width=&quot;320&quot; /&gt;&lt;/div&gt;
&lt;center&gt;
&lt;input onclick=&quot;window.open(&#39;https://gist.github.com/alstat/b392b73d73fdb1b000ee&#39;, &#39;_blank&#39;)&quot; type=&quot;button&quot; value=&quot;LaTeX Code&quot; /&gt;&lt;/center&gt;
&lt;br /&gt;
$Proj$ is the projection of the point $\mathbf{x}$ onto the line, and its position along the line is given by the scalar $\mathbf{u}^{T}\mathbf{x}$. Therefore, if we let $\mathbf{X}=[X_1, X_2]^T$ be a random vector, then the random variable $Y=\mathbf{u}^T\mathbf{X}$ describes the variability of the data in the direction of the normalized vector $\mathbf{u}$; $Y$ is a linear combination of the $X_i, i=1,2$. &lt;i&gt;The principal component analysis identifies linear combinations of the original variables $\mathbf{X}$ that contain most of the information, in the sense of variability, contained in the data. The general assumption is that useful information is proportional to the variability. PCA is used for data dimensionality reduction and for interpretation of data. (Ref 1. Bajorski, 2012)&lt;/i&gt;&lt;br /&gt;
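The projection formula can be checked numerically with made-up vectors $\mathbf{u}$, $\mathbf{v}$, and point $\mathbf{x}$:

```r
u = c(1, 2) / sqrt(5)    # normalized direction vector of the line L
v = c(-2, 1) / sqrt(5)   # offset vector, orthogonal to u
x = c(3, 1)              # the point to project

proj = as.numeric(crossprod(u, x)) * u + v   # (u^T x) u + v

# the displacement from the projection back to x is orthogonal to u
sum((x - proj) * u)   # 0 (up to floating point)
```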
&lt;br /&gt;
To better understand this, consider two dimensional data set, below is the plot of it along with two lines ($L_1$ and $L_2$) that are orthogonal to each other:
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiaL8EuNiiCDSYPMU-Xxf3Y55qRAfnrcaUIDItJuUb18MNBMzjwEgyBFY8NTo0l9E4nTOO0PqH-XEsAHD4PrfBilrlpr2Ih1r4bs3GJP2LDNZIWVcL3B6vf1xTDRBGwScNzxlaV7l1NUvbO/s1600/Screen+Shot+2014-12-22+at+8.53.11+PM.png&quot; height=&quot;400&quot; width=&quot;313&quot; /&gt;&lt;/div&gt;
If we project the points orthogonally to both lines we have,

&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUeuXn23UhsLlfWIUpqdI2_FEtbvRHlT6xiQY5KhIXahHxYbe89I71wxMXSUdBZc5ot0SLW7LVULKor-JadxFicykR03MIVUrULpAVstm9XpAat3ZhzIagAeLB7uyPUY12wHwz1oS7YoUE/s1600/Screen+Shot+2014-12-22+at+9.11.24+PM.png&quot; height=&quot;400&quot; width=&quot;312&quot; /&gt;&lt;/div&gt;
&lt;center&gt;
&lt;input onclick=&quot;window.open(&#39;https://gist.github.com/alstat/ec5bfa3342b95029b9b7&#39;, &#39;_blank&#39;)&quot; type=&quot;button&quot; value=&quot;LaTeX Code&quot; /&gt;&lt;/center&gt;
&lt;br /&gt;
So that if the normalized vector $\mathbf{u}_1$ defines the direction of $L_1$, then the variability of the points on $L_1$ is described by the random variable $Y_1=\mathbf{u}_1^T\mathbf{X}$. Likewise, if $\mathbf{u}_2$ is a normalized vector that defines the direction of $L_2$, then the variability of the points on this line is described by the random variable $Y_2=\mathbf{u}_2^T\mathbf{X}$. The first principal component is the one with maximum variability. In this case we can see that $Y_2$ is more variable than $Y_1$, since the points projected on $L_2$ are more dispersed than those on $L_1$. In practice, the linear combinations $Y_i = \mathbf{u}_i^T\mathbf{X}, i=1,2,\cdots,p$ are maximized sequentially, so that $Y_1$ is the linear combination of the first principal component, $Y_2$ of the second principal component, and so on. Further, the estimate of the direction vector $\mathbf{u}$ is simply the normalized eigenvector $\mathbf{e}$ of the variance-covariance matrix $\mathbf{\Sigma}$ of the original variable $\mathbf{X}$, and the variability explained by the principal component is the corresponding eigenvalue $\lambda$. For more details on the theory of PCA refer to (Bajorski, 2012) in Reference 1 below.&lt;br /&gt;
&lt;br /&gt;
As promised, we will do dimensionality reduction using PCA. We will use the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data from (Bajorski, 2012); you can use other locations of AVIRIS data that can be downloaded &lt;a href=&quot;http://aviris.jpl.nasa.gov/data/get_aviris_data.html&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;. However, since in most cases the AVIRIS data contains thousands of bands, for simplicity we will stick with the data given in (Bajorski, 2012), which has been cleaned down to 152 bands only.&lt;br /&gt;
&lt;br /&gt;
&lt;h3 style=&quot;text-align: left;&quot;&gt;
What are spectral bands?&lt;/h3&gt;
&lt;div&gt;
In imaging, spectral bands refer to the third dimension of the image, usually denoted $\lambda$. For example, an RGB image contains red, green and blue bands, as shown below along with the first two dimensions $x$ and $y$ that define the resolution of the image.&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqkXXbdzZdToQVa3U37P8hwVlqpF4N0r3BFa_tk5T9LZg57wICyT-bdgatLbt8IUBps6MzZc0RcoHvIfkcx1T8A9qIf2uXT7YG1meqHBUWcn9H_93Rl4eVDabdTwlSx_XPu1QXaH2FN7Bi/s1600/Screen+Shot+2014-12-23+at+12.11.08+AM.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqkXXbdzZdToQVa3U37P8hwVlqpF4N0r3BFa_tk5T9LZg57wICyT-bdgatLbt8IUBps6MzZc0RcoHvIfkcx1T8A9qIf2uXT7YG1meqHBUWcn9H_93Rl4eVDabdTwlSx_XPu1QXaH2FN7Bi/s1600/Screen+Shot+2014-12-23+at+12.11.08+AM.png&quot; height=&quot;200&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;center&gt;
&lt;input onclick=&quot;window.open(&#39;https://gist.github.com/alstat/4c333c590cbead3fa067&#39;, &#39;_blank&#39;)&quot; type=&quot;button&quot; value=&quot;LaTeX Code&quot; /&gt;&lt;/center&gt;
&lt;br /&gt;
&lt;div&gt;
These are a few of the bands that are visible to our eyes; there are other bands that are not visible to us, like infrared, and many others in the &lt;a href=&quot;http://en.wikipedia.org/wiki/Electromagnetic_spectrum&quot; target=&quot;_blank&quot;&gt;electromagnetic spectrum&lt;/a&gt;. That is why in most cases AVIRIS data contains a huge number of bands, each capturing different characteristics of the image. Below is the proper description of the data.&lt;br /&gt;
&lt;br /&gt;
&lt;h3 style=&quot;text-align: left;&quot;&gt;
Data&lt;/h3&gt;
The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), is a sensor collecting spectral radiance in the range of wavelengths from 400 to 2500 nm. It has been flown on various aircraft platforms, and many images of the Earth’s surface are available. A 100 by 100 pixel AVIRIS image of an urban area in Rochester, NY, near the Lake Ontario shoreline is shown below. The scene has a wide range of natural and man-made material including a mixture of commercial/warehouse and residential neighborhoods, which adds a wide range of spectral diversity. Prior to processing, invalid bands (due to atmospheric water absorption) were removed, reducing the overall dimensionality to 152 bands. This image has been used in Bajorski et al. (2004) and Bajorski (2011a, 2011b). The first 152 values in the AVIRIS Data represent the spectral radiance values (a spectral curve) for the top left pixel. This is followed by spectral curves of the pixels in the first row, followed by the next row, and so on. (Ref. 1 Bajorski, 2012)&lt;/div&gt;
&lt;br /&gt;
To load the data, run the following code:
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/5ca5842c5ffe3173e1fb.js&quot;&gt;&lt;/script&gt;

&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihIEpHRRCY4EjRO9RZ1WoEffkthQhXlIwkZRtT2nWjrXhm4YcmkAkDoffIcbxhnQR5IEAW0VYcVmydYeo4W4fvU3guvkYf8IQr1X-tClbeVl582LxgbX7PedXtPxwVS5QNvg1NfroNvPwo/s1600/Rplot.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihIEpHRRCY4EjRO9RZ1WoEffkthQhXlIwkZRtT2nWjrXhm4YcmkAkDoffIcbxhnQR5IEAW0VYcVmydYeo4W4fvU3guvkYf8IQr1X-tClbeVl582LxgbX7PedXtPxwVS5QNvg1NfroNvPwo/s1600/Rplot.png&quot; height=&quot;255&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
The above code uses the EBImage package, which can be installed following my &lt;a href=&quot;http://alstatr.blogspot.com/2014/09/r-image-analysis-using-ebimage.html&quot; target=&quot;_blank&quot;&gt;previous post&lt;/a&gt;.
&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Why do we need to reduce the dimension of the data?&lt;/h3&gt;
Before we jump into our analysis, you may ask: why? Well, sometimes it&#39;s just difficult to do analysis on high dimensional data, especially when it comes to interpretation. This is because there are dimensions that aren&#39;t significant (redundant, for example), which adds to the difficulty of the analysis. So in order to deal with this, we remove those nuisance dimensions and deal with the significant ones.&lt;br /&gt;
&lt;br /&gt;
To perform PCA in R, we use the function &lt;code&gt;princomp&lt;/code&gt; as seen below:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/bec19685dc2a9cb63ce3.js&quot;&gt;&lt;/script&gt;
The output of &lt;code&gt;princomp&lt;/code&gt; is the list shown above; we will describe selected components. Others can be found in the documentation of the function by executing &lt;code&gt;?princomp&lt;/code&gt;.
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;sdev&lt;/code&gt; - standard deviation, the square root of the eigenvalues $\lambda$ of the variance-covariance matrix $\mathbf{\Sigma}$ of the data, &lt;code&gt;dat.mat&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loadings&lt;/code&gt; - eigenvectors $\mathbf{e}$ of the variance-covariance matrix $\mathbf{\Sigma}$ of the data, &lt;code&gt;dat.mat&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;scores&lt;/code&gt; - the principal component scores.&lt;/li&gt;
&lt;/ul&gt;
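These relationships can be verified on simulated data; note that princomp estimates the covariance matrix with divisor $n$ rather than $n-1$:

```r
set.seed(1)
X  = matrix(rnorm(500 * 4), ncol = 4)
pc = princomp(X)

S  = cov(X) * (nrow(X) - 1) / nrow(X)   # sample covariance with divisor n
ev = eigen(S, symmetric = TRUE)

# sdev^2 are the eigenvalues; loadings are the eigenvectors (up to sign)
all.equal(as.numeric(pc$sdev^2), ev$values)
all.equal(abs(unclass(pc$loadings)), abs(ev$vectors), check.attributes = FALSE)
```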
Recall that the objective of PCA is to find a linear combination $Y=\mathbf{u}^T\mathbf{X}$ that maximizes the variance $Var(Y)$. From the output, the estimates of the components of $\mathbf{u}$ are the entries of the &lt;code&gt;loadings&lt;/code&gt;, which is a matrix of eigenvectors whose columns correspond to the eigenvectors of the successive principal components. That is, if the first principal component is given by $Y_1=\mathbf{u}_1^T\mathbf{X}$, then the estimate of $\mathbf{u}_1$, which is $\mathbf{e}_1$ (an eigenvector), is the set of coefficients in the first column of the &lt;code&gt;loadings&lt;/code&gt;. The variability explained by the first principal component is the square of the first standard deviation in &lt;code&gt;sdev&lt;/code&gt;, the variability explained by the second principal component is the square of the second, and so on. Now let&#39;s interpret the loadings (coefficients) of the first three principal components. Below is the plot of this,
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDLVF1vTLJTDq76Q9ySAViej5pJcdGcVniJA7MhfJQMtpY_KxaWuasRx7K0Wwe3R5l8NsGigxY_z6e4UjIIx4KzRVRkiD6HPpDNr-iheliM2YnP-ZGNs5Ywp23nyA9vRvlvZxTOMQXinVs/s1600/Rplot01.png&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/a1ebd7d3ae1661569170.js&quot;&gt;&lt;/script&gt;
Based on the above, the coefficients of the first principal component (PC1) are almost all negative. On closer look, the variability in this principal component is mainly explained by a weighted average of the radiance of spectral bands 35 to 100. Analogously, PC2 mainly represents the variability of a weighted average of the radiance of spectral bands 1 to 34. Further, the fluctuation of the coefficients of PC3 makes it difficult to tell which bands contribute most to its variability. Aside from examining the loadings, another way to see the impact of the PCs is through the &lt;i&gt;impact plot&lt;/i&gt;, where the &lt;i&gt;impact curves&lt;/i&gt; $\sqrt{\lambda_j}\mathbf{e}_j$ are plotted; I encourage you to explore that. &lt;br /&gt;
&lt;br /&gt;
Moving on, let&#39;s investigate the percent of variability in $X_i$ explained by the $j$th principal component, which is given by the formula
\begin{equation}\nonumber
\frac{\lambda_j\cdot e_{ij}^2}{s_{ii}},
\end{equation}
where $s_{ii}$ is the estimated variance of $X_i$. Below is the percent of explained variability in $X_i$ for the first three principal components, including the cumulative percent variability (the sum over PC1, PC2, and PC3),
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivdWT5oKaDMlBQv17qYi20OebfcScGiNEs6j9wp927C_CBNSZnNbRSd1lyrrpcQHbKFTvE5p5xq5zVCeVTxDR1Y80sRMxHXhCFkEkp4BbP7z8s_KCnSusIu4vf8DEnZayloouWp8D4_6ha/s1600/Rplot03.png&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/d8a599e8dd802a009cae.js&quot;&gt;&lt;/script&gt;
For the variability of the first 33 bands, PC2 accounts for about 90 percent of the explained variability, as seen in the above plot, and it also contributes substantially to bands 102 to 152. On the other hand, from bands 37 to 100, PC1 explains almost all the variability, with PC2 and PC3 explaining only 0 to 1 percent. The sum of the percentages of explained variability of these principal components is indicated by the orange line in the above plot, which is the cumulative percent variability.&lt;br /&gt;
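The computation behind such per-band percentages can be sketched as follows on simulated data; over all $p$ principal components the percentages sum to 100 for each variable:

```r
set.seed(1)
X  = matrix(rnorm(300 * 5), ncol = 5)
pc = princomp(X)

lambda = as.numeric(pc$sdev^2)                     # eigenvalues
E      = unclass(pc$loadings)                      # eigenvectors, columns e_j
s.ii   = diag(cov(X)) * (nrow(X) - 1) / nrow(X)    # Var(X_i), divisor n

# pct[i, j] = 100 * lambda_j * e_ij^2 / s_ii
pct = 100 * sweep(E^2 %*% diag(lambda), 1, s.ii, "/")

rowSums(pct)   # each row sums to 100: the PCs jointly explain all of Var(X_i)
```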
&lt;br /&gt;
To wrap up this section, here is the percentage of the explained variability of the first 10 PCs.&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;PC1&lt;/th&gt;&lt;th&gt;PC2&lt;/th&gt;&lt;th&gt;PC3&lt;/th&gt;&lt;th&gt;PC4&lt;/th&gt;&lt;th&gt;PC5&lt;/th&gt;&lt;th&gt;PC6&lt;/th&gt;&lt;th&gt;PC7&lt;/th&gt;&lt;th&gt;PC8&lt;/th&gt;&lt;th&gt;PC9&lt;/th&gt;&lt;th&gt;PC10&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;10&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 1: Variability Explained by the First Ten Principal Components for the AVIRIS data.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;82.057&lt;/td&gt;&lt;td&gt;17.176&lt;/td&gt;&lt;td&gt;0.320&lt;/td&gt;&lt;td&gt;0.182&lt;/td&gt;&lt;td&gt;0.094&lt;/td&gt;&lt;td&gt;0.065&lt;/td&gt;&lt;td&gt;0.037&lt;/td&gt;&lt;td&gt;0.029&lt;/td&gt;&lt;td&gt;0.014&lt;/td&gt;&lt;td&gt;0.005&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;br /&gt;
The above variabilities were obtained by noting that the variability explained by a principal component is simply the corresponding eigenvalue (the square of the &lt;code&gt;sdev&lt;/code&gt;) of the variance-covariance matrix $\mathbf{\Sigma}$ of the original variable $\mathbf{X}$. Hence the percentage of variability explained by the $j$th PC is its eigenvalue $\lambda_j$ divided by the overall variability, the sum of the eigenvalues $\sum_{j=1}^{p}\lambda_j$, as we see in the following code,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/a19e345d766c64eb88ad.js&quot;&gt;&lt;/script&gt;
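The same computation, as a self-contained check on simulated data:

```r
set.seed(1)
X   = matrix(rnorm(200 * 6), ncol = 6)
pc  = princomp(X)

pve = 100 * pc$sdev^2 / sum(pc$sdev^2)   # percent of variability per PC
round(pve, 3)
sum(pve)   # 100
```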

&lt;h3&gt;
Stopping Rules&lt;/h3&gt;
Given the percentages of variability explained by the PCs in Table 1, how many principal components should we retain to best represent the variability of the original data? To answer that, we introduce the following stopping rules that will guide us in deciding the number of PCs:
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Scree plot;&lt;/li&gt;
&lt;li&gt;Simple fair-share;&lt;/li&gt;
&lt;li&gt;Broken-stick; and,&lt;/li&gt;
&lt;li&gt;Relative broken-stick.&lt;/li&gt;
&lt;/ol&gt;
The scree plot is the plot of the variability of the PCs, that is, the plot of the eigenvalues, where we look for an elbow, or sudden drop, of the eigenvalues. For our example we have
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6tXQZGLnx70S-U-E9RBigKg3UyGQAJLx4jJ3Q4vnjb-z5UH1UoNNv6BpYmrIIrtPMfcbzIHHAgSi6JmT2fnQ26loa_RGCZnXrEtDxZcxf2f89KXDfofr49alNLXTfLzBUw7oU_22eISTJ/s1600/Rplot04.png&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/80133cd378bd3ff9ea02.js&quot;&gt;&lt;/script&gt;
Therefore, we retain the first two principal components based on the elbow shape. However, if the eigenvalues differ by orders of magnitude, it is recommended to use the logarithmic scale, which is illustrated below,
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4u9R9Q3mVT3n89iLZgzAwvIRSiZ2KMbsVh4NYp2G-p1wEeguvfNM3X7S6SC-qpiMhRAMJy8tRm3xrh3cSpR-QFNUXc21QuKkKnW0V155KUK_RAixnOpKpdS1U_T0PH4kkRh5sIZ4aXSvL/s1600/Rplot05.png&quot; /&gt;&lt;/div&gt;
Unfortunately, sometimes this won&#39;t work, as we can see here; it&#39;s just difficult to determine where the elbow is.
The succeeding discussions of the last three stopping rules are based on (Bajorski, 2012). The &lt;i&gt;simple fair-share&lt;/i&gt; stopping rule identifies the largest $k$ such that $\lambda_k$ is larger than its fair share, that is, larger than $(\lambda_1+\lambda_2+\cdots+\lambda_p)/p$. To illustrate this, consider the following:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/a1d446cea1dea3f9878e.js&quot;&gt;&lt;/script&gt;
Thus, we need to stop at the second principal component.&lt;br /&gt;
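The arithmetic of the fair-share rule can be sketched in a few lines of Python (NumPy assumed; the eigenvalues below are illustrative, not those of the image data):

```python
import numpy as np

def fair_share_k(eigvals):
    """Largest k such that lambda_1, ..., lambda_k all exceed the
    fair share (lambda_1 + ... + lambda_p) / p."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    share = lam.mean()
    k = 0
    for value in lam:
        if value <= share:          # stop at the first eigenvalue below the fair share
            break
        k += 1
    return k

print(fair_share_k([4.0, 2.0, 0.5, 0.3, 0.2]))  # mean is 1.4, so k = 2
```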
&lt;br /&gt;
If one is concerned that the above method retains too many principal components, a &lt;i&gt;broken-stick rule&lt;/i&gt; can be used. This rule identifies the largest $k$ such that $\lambda_j/(\lambda_1+\lambda_2+\cdots +\lambda_p)&amp;gt;a_j$, for all $j\leq k$, where
\begin{equation}\nonumber
a_j = \frac{1}{p}\sum_{i=j}^{p}\frac{1}{i},\quad j =1,\cdots, p.
\end{equation}
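The thresholds $a_j$ and the resulting $k$ can be sketched in Python (NumPy assumed; illustrative eigenvalues, not those of the image data):

```python
import numpy as np

def broken_stick_k(eigvals):
    """Largest k with lambda_j / sum(lambda) > a_j for all j <= k,
    where a_j = (1/p) * sum_{i=j}^{p} 1/i."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    p = lam.size
    # broken-stick thresholds a_1, ..., a_p
    a = np.array([np.sum(1.0 / np.arange(j, p + 1)) / p for j in range(1, p + 1)])
    prop = lam / lam.sum()          # proportion of variability per PC
    k = 0
    for j in range(p):
        if prop[j] > a[j]:
            k += 1
        else:
            break
    return k

print(broken_stick_k([4.0, 2.0, 0.5, 0.3, 0.2]))  # 2
```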
Let&#39;s try it,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/160b9d116e94a55bb0b2.js&quot;&gt;&lt;/script&gt;
The result above coincides with the first two stopping rules. The drawback of the simple fair-share and broken-stick rules is that they do not work well when the eigenvalues differ by orders of magnitude. In that case, we use the &lt;i&gt;relative broken-stick&lt;/i&gt; rule, where we analyze $\lambda_j$ as the first eigenvalue in the set $\lambda_j\geq \lambda_{j+1}\geq\cdots\geq\lambda_{p}$, where $j &amp;lt; p$. The dimensionality $k$ is chosen as the largest value such that $\lambda_j/(\lambda_j+\cdots +\lambda_p)&amp;gt;b_j$, for all $j\leq k$, where

\begin{equation}\nonumber
b_j = \frac{1}{p-j+1}\sum_{i=1}^{p-j+1}\frac{1}{i}.
\end{equation}
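The relative rule differs from the plain broken-stick rule in that each $\lambda_j$ is compared against the eigenvalues still remaining, not the full set; a Python sketch (NumPy assumed, illustrative eigenvalues):

```python
import numpy as np

def relative_broken_stick_k(eigvals):
    """Largest k with lambda_j / (lambda_j + ... + lambda_p) > b_j
    for all j <= k, where b_j = (1/(p-j+1)) * sum_{i=1}^{p-j+1} 1/i."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    p = lam.size
    k = 0
    for j in range(1, p):                    # the rule requires j < p
        m = p - j + 1                        # number of remaining eigenvalues
        b_j = np.sum(1.0 / np.arange(1, m + 1)) / m
        rel = lam[j - 1] / lam[j - 1:].sum() # share within the remaining set
        if rel > b_j:
            k += 1
        else:
            break
    return k

print(relative_broken_stick_k([4.0, 2.0, 0.5, 0.3, 0.2]))
```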
Applying this to the data we have,
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCqbGmhUro-O3dXSslqOJAfxwovLIHunSblE9buPVTV9dA0I7tHhnaU4tqiXniJk3Q6YQRusJakdgOIae5z4cXIf0CNcdkj1jJJDbvOf_xl41-umkjf0c2ndUUOxPsFGG0zKtuR2hmK6Pa/s1600/Rplot06.png&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/02733c62f0cd338d54dc.js&quot;&gt;&lt;/script&gt;
According to the numerical output, the first 34 principal components are enough to represent the variability of the original data.
&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Principal Component Scores&lt;/h3&gt;
The principal component scores are the new data set obtained from the linear combinations $Y_j=\mathbf{e}_j^{\prime}(\mathbf{x}-\bar{\mathbf{x}}),\; j = 1,\cdots, p$. So if we use the first three stopping rules, then below are the scores (as an image) of PC1 and PC2,
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkegveyLr4vtQkT2KHcEIEHIejQKnWjn3x1mDi62MFY7qeyfW5WNj0HVQkvjTK8Lc7tRzJJwhBJmQnrIrl35vF2H0g5p5m4xzrmiJrKWJ7cseBbLyexuqFSlzk_E2xP5mlTLTdpDe1HgHn/s1600/Rplot07.png&quot; height=&quot;255&quot; width=&quot;400&quot; /&gt;&lt;/div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/acc2d9cfc047332a9a54.js&quot;&gt;&lt;/script&gt;
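The score computation itself is just centering the data and projecting onto the leading eigenvectors of the sample covariance matrix; a minimal Python sketch (the function name &lt;code&gt;pc_scores&lt;/code&gt; and the random data are illustrative, not from the post):

```python
import numpy as np

def pc_scores(X, k):
    """Project the centered rows of X onto the first k eigenvectors
    of the sample covariance matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each column
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    E = eigvecs[:, order[:k]]               # leading k eigenvectors as columns
    return Xc @ E

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
S = pc_scores(X, 2)
print(S.shape)  # (100, 2)
```

Because the data are centered first, each score column has mean zero by construction.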
If we base the decision on the relative broken-stick rule, then we retain the first 34 PCs; below are the corresponding scores (as an image).
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfUjXk8JDKOcG-gA9rMxJ6SQ_skC31eQA2ji5u90tAQtiqI7rwSOuxpJeIOcGYH6P7WTO5vyoHgGzAasnAQL3AbiUdU8o_-bdopsxg6WPYqP3VPAO3X0_3PdE7hoPcvy0pHrI6TL9nkVxG/s1600/Rplot08.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfUjXk8JDKOcG-gA9rMxJ6SQ_skC31eQA2ji5u90tAQtiqI7rwSOuxpJeIOcGYH6P7WTO5vyoHgGzAasnAQL3AbiUdU8o_-bdopsxg6WPYqP3VPAO3X0_3PdE7hoPcvy0pHrI6TL9nkVxG/s400/Rplot08.png&quot; height=&quot;255&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Click on the image to zoom in.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;h3&gt;
Residual Analysis&lt;/h3&gt;
Of course, PCA incurs errors unless one retains all the PCs, which would defeat the purpose: why apply PCA while still keeping all the dimensions? An overview of the errors in PCA, without going through the theory, is that the overall error is simply the excluded variability, that is, the variability explained by the $(k+1)$th to $p$th principal components when the first $k$ are retained.
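Concretely, the unexplained proportion after keeping $k$ PCs is the sum of the excluded eigenvalues over the total; a small Python sketch with illustrative eigenvalues:

```python
import numpy as np

def pca_residual_variance(eigvals, k):
    """Variability left unexplained after keeping the first k PCs:
    the sum of the excluded eigenvalues as a proportion of the total."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return lam[k:].sum() / lam.sum()

# keeping 2 of 5 PCs leaves (0.5 + 0.3 + 0.2) / 7.0 of the variability
print(pca_residual_variance([4.0, 2.0, 0.5, 0.3, 0.2], 2))
```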
&lt;br /&gt;
&lt;br /&gt;

&lt;h3 style=&quot;text-align: left;&quot;&gt;
Reference&lt;/h3&gt;
&lt;div&gt;
&lt;ol style=&quot;text-align: left;&quot;&gt;
&lt;li&gt;&lt;a href=&quot;http://www.amazon.com/Statistics-Imaging-Optics-Photonics-Bajorski/dp/0470509457&quot; target=&quot;_blank&quot;&gt;Bajorski, P. (2012). &lt;i&gt;Statistics for Imaging, Optics, and Photonics&lt;/i&gt;. John Wiley &amp;amp; Sons, Inc.&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
Download PDF Version&lt;/h3&gt;
&lt;center&gt;
&lt;input onclick=&quot;window.open(&#39;https://github.com/alstat/Analysis-with-Programming/raw/master/2014/R/Principal%20Component%20Analysis%20on%20Imaging/Principal%20Component%20Analysis%20on%20Imaging.pdf&#39;, &#39;_blank&#39;)&quot; type=&quot;button&quot; value=&quot;Click here to download&quot; /&gt;&lt;/center&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/3681315616899008407/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2014/12/principal-component-analysis-on-imaging.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/3681315616899008407'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/3681315616899008407'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2014/12/principal-component-analysis-on-imaging.html' title='R: Principal Component Analysis on Imaging'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgex0TuNA-RxBG_c9TPXYTjNLO30lB-zUqF6A8LxgOGVv11juKjAnrZuBxcD_9NtTui2NWt3h0rLQVzpQcGnyGaEgSzkEg85GeoM1O8QPt6dm7YgrUjSJF3lhDExxse6B6jbZtGGTlh-zka/s72-c/Screen+Shot+2014-12-21+at+6.49.01+PM.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-2623482324233122897</id><published>2014-10-26T23:21:00.000+08:00</published><updated>2014-10-27T11:14:56.394+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="ALUES"/><category scheme="http://www.blogger.com/atom/ns#" term="Packages"/><category scheme="http://www.blogger.com/atom/ns#" term="R"/><title type='text'>ALUES: Agricultural Land Use Evaluation System, R package</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
Authors:&lt;br /&gt;
&lt;b&gt;Arnold R. Salvacion&lt;/b&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
&lt;input onclick=&quot;window.open(&#39;https://github.com/alstat/ALUES/raw/master/vignette/ALUES.pdf&#39;, &#39;_blank&#39;)&quot; type=&quot;button&quot; value=&quot;Download PDF Version&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;arsalvacion@gmail.com&lt;/i&gt;&lt;br /&gt;
&lt;a href=&quot;http://r-nold.blogspot.com/&quot; target=&quot;_blank&quot;&gt;Data Analysis and Visualization using R&lt;/a&gt; (blog)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
&lt;input onclick=&quot;window.open(&#39;https://github.com/alstat/ALUES&#39;, &#39;_blank&#39;)&quot; type=&quot;button&quot; value=&quot;Github Repository&quot; /&gt;&lt;br /&gt;&lt;br/&gt;
&lt;b&gt;Al-Ahmadgaid B. Asaad&lt;/b&gt; (maintainer)&lt;br /&gt;
&lt;i&gt;alstated@gmail.com&lt;/i&gt;&lt;br /&gt;
&lt;br /&gt;
Agricultural Land Use Evaluation System (ALUES) is an R package that evaluates land suitability for different crops. The package is based on the Food and Agriculture Organization (FAO) and the International Rice Research Institute (IRRI) methodology for land evaluation. The development of ALUES was inspired by a similar land-evaluation tool, the Land Use Suitability Evaluation Tool (LUSET). The package uses a fuzzy logic approach to evaluate the land suitability of a particular area based on inputs such as rainfall, temperature, topography, and soil properties. The membership functions used for fuzzy modeling are Triangular, Trapezoidal and Gaussian. Methods for computing the overall suitability of a particular area are also included: Minimum, Maximum, Product, Sum, Average, Exponential and Gamma. Finally, ALUES uses the Rcpp library for efficient computation.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
INSTALLATION&lt;/h3&gt;
The package is not yet on CRAN and is currently under development on GitHub. To install it, run the following:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/383a072fc51b4f6e526d.js&quot;&gt;&lt;/script&gt;
We would love to hear your feedback; if you have any suggestions or issues regarding this package, please submit them &lt;a href=&quot;https://github.com/alstat/ALUES/issues&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt;.&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;h3 style=&quot;text-align: left;&quot;&gt;
DATASET&lt;/h3&gt;
The package contains several datasets which can be categorized into two:
&lt;br /&gt;
&lt;ol style=&quot;text-align: left;&quot;&gt;
&lt;li&gt;&lt;b&gt;Land Units&#39; Attributes&lt;/b&gt; - datasets that contain the attributes of the land units of a given location.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Crop Requirements&lt;/b&gt; - datasets that contain the required values of factors of a particular crop for the land units.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
Land Units&#39; Attributes&lt;/h3&gt;
The package contains sample dataset of land units&#39; attributes from two countries:&lt;br /&gt;
&lt;ol style=&quot;text-align: left;&quot;&gt;
&lt;li&gt;&lt;b&gt;Marinduque, Philippines&lt;/b&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;MarinduqueLT&lt;/code&gt; - a dataset consisting of the land and terrain characteristics of the land units of Marinduque, Philippines;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MarinduqueTemp&lt;/code&gt; - a dataset consisting of the temperature characteristics of the land units of Marinduque, Philippines; and&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MarinduqueWater&lt;/code&gt; - a dataset consisting of the water characteristics of the land units of Marinduque, Philippines.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Lao Cai, Vietnam&lt;/b&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;LaoCaiLT&lt;/code&gt; - a dataset consisting of the land and terrain characteristics of the land units of Lao Cai, Vietnam;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;LaoCaiTemp&lt;/code&gt; - a dataset consisting of the temperature characteristics of the land units of Lao Cai, Vietnam; and&lt;/li&gt;
&lt;li&gt;&lt;code&gt;LaoCaiWater&lt;/code&gt; - a dataset consisting of the water characteristics of the land units of Lao Cai, Vietnam.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
For example, the first six land units in &lt;code&gt;MarinduqueLT&lt;/code&gt; are shown below&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/d43faeaec0ed1475c809.js&quot;&gt;&lt;/script&gt;
The complete list of factors is available in the pdf version.
&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Crop Requirements&lt;/h3&gt;
The crops available in the package are listed in Table 1.
&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Code&lt;/th&gt;&lt;th&gt;Crops&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;4&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 1: Crops Dataset Available in ALUES.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr class=&quot;alt&quot;&gt;&lt;td&gt;&lt;code&gt;BANANA-&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Banana&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;CASSAVA-&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Cassava&lt;/td&gt;&lt;/tr&gt;
&lt;tr class=&quot;alt&quot;&gt;&lt;td&gt;&lt;code&gt;COCOA-&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Cocoa&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;COCONUT-&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Coconut&lt;/td&gt;&lt;/tr&gt;
&lt;tr class=&quot;alt&quot;&gt;&lt;td&gt;&lt;code&gt;COFFEEAR-&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Arabica Coffee&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;COFFEERO-&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Robusta Coffee&lt;/td&gt;&lt;/tr&gt;
&lt;tr class=&quot;alt&quot;&gt;&lt;td&gt;&lt;code&gt;RICEBR-&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Rainfed Bunded Rice&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;RICEIW-&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Irrigated Rice&lt;/td&gt;&lt;/tr&gt;
&lt;tr class=&quot;alt&quot;&gt;&lt;td&gt;&lt;code&gt;RICENF-&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Rice Cultivation Under Natural Floods&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;RICEUR-&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Rainfed Upland Rice&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;br /&gt;
From the table, the codes are suffixed with the land units&#39; characteristics (&lt;code&gt;TerrainCR&lt;/code&gt;, &lt;code&gt;SoilCR&lt;/code&gt;, &lt;code&gt;WaterCR&lt;/code&gt; and &lt;code&gt;TemperatureCR&lt;/code&gt;) required for the crop. For example, below are the required values for the terrain characteristics of the land units for cultivating coconut:
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/dcb86c3e65433a27e391.js&quot;&gt;&lt;/script&gt;
For the required soil, water and temperature characteristics for cultivating coconut, the codes are &lt;code&gt;COCONUTSoilCR&lt;/code&gt;, &lt;code&gt;COCONUTWaterCR&lt;/code&gt; and &lt;code&gt;COCONUTTemperatureCR&lt;/code&gt;, respectively.
&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
R FUNCTIONS&lt;/h3&gt;
The package contains the following functions:
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;suitability&lt;/code&gt; - computes the suitability scores and classes of the land units based on the requirements of the crop.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;overall_suit&lt;/code&gt; - computes the overall suitability of the land units, using the suitability scores obtained from the &lt;code&gt;suitability&lt;/code&gt; function.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
Suitability&lt;/h3&gt;
In this section, we go into the details of the &lt;code&gt;suitability&lt;/code&gt; function.&lt;br /&gt;
&lt;b&gt;Usage&lt;/b&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/140638b109dc94d9eadc.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;table style=&quot;width: 550px;&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right;&quot; valign=&quot;top&quot;&gt;&lt;code&gt;x&lt;/code&gt;&lt;/td&gt;&lt;td&gt;a data frame consisting of the properties of the land units;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right;&quot; valign=&quot;top&quot;&gt;&lt;code&gt;y&lt;/code&gt;&lt;/td&gt;&lt;td&gt;a data frame consisting of the crop (e.g. coconut, cassava, etc.) requirements for a given characteristic (terrain, soil, water or temperature);&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right;&quot; valign=&quot;top&quot;&gt;&lt;code&gt;mf&lt;/code&gt;&lt;/td&gt;&lt;td&gt;membership function; the default is &lt;code&gt;&quot;triangular&quot;&lt;/code&gt;. The other fuzzy models are &lt;code&gt;&quot;Trapezoidal&quot;&lt;/code&gt; and &lt;code&gt;&quot;Gaussian&quot;&lt;/code&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right;&quot; valign=&quot;top&quot;&gt;&lt;code&gt;sow.month&lt;/code&gt;&lt;/td&gt;&lt;td&gt;sowing month of the crop. Takes integers from 1 to 12 (inclusive), representing the twelve months of the year. If set to 1, the function assumes a January sowing month.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right;&quot; valign=&quot;top&quot;&gt;&lt;code&gt;min&lt;/code&gt;&lt;/td&gt;&lt;td&gt;factor&#39;s minimum value. If &lt;code&gt;NULL&lt;/code&gt; (default), &lt;code&gt;min&lt;/code&gt; is set to 0. If numeric of length one, say 0.5, then the minimum is set to 0.5 for all factors. If the factors of the land units (&lt;code&gt;x&lt;/code&gt;) have different minimums, these can be concatenated into a vector of &lt;code&gt;min&lt;/code&gt;s whose length equals the number of factors in &lt;code&gt;x&lt;/code&gt;. However, if set to &lt;code&gt;&quot;average&quot;&lt;/code&gt;, then &lt;code&gt;min&lt;/code&gt; is computed as follows:&lt;br /&gt;
&lt;br /&gt;
Let X be a factor with suitability classes S3, S2 and S1, and let the scores of these classes be $a, b$ and $c$, respectively. Then,
$$\mathrm{min} = a - \displaystyle\frac{(b - a) + (c - b)}{2}$$
For factors with suitability classes S3, S2, S1, S1, S2 and S3 and scores $a, b, c, d, e$ and $f$, respectively, &lt;code&gt;min&lt;/code&gt; is computed as,
$$\mathrm{min} = a - \displaystyle\frac{(b - a) + (c - b) + (d - c) + (e - d) + (f - e)}{5}$$&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right;&quot; valign=&quot;top&quot;&gt;&lt;code&gt;max&lt;/code&gt;&lt;/td&gt;&lt;td&gt;factor&#39;s maximum value. The default is &lt;code&gt;&quot;average&quot;&lt;/code&gt;. If numeric of length one, say 50, then the maximum is set to 50 for all factors. If the factors of the land units (&lt;code&gt;x&lt;/code&gt;) have different maximums, these can be concatenated into a vector of &lt;code&gt;max&lt;/code&gt;s whose length equals the number of factors in &lt;code&gt;x&lt;/code&gt;. However, if set to &lt;code&gt;&quot;average&quot;&lt;/code&gt;, then &lt;code&gt;max&lt;/code&gt; is computed from the equation below:
$$\mathrm{max}=c + \displaystyle\frac{(b-a) + (c-b)}{2}$$
For factors with suitability classes S3, S2, S1, S1, S2 and S3 and scores $a, b, c, d, e$ and $f$, respectively,
$$\mathrm{max} = f + \displaystyle\frac{(b - a) + (c - b) + (d - c) + (e - d) + (f - e)}{5}$$&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right;&quot; valign=&quot;top&quot;&gt;&lt;code&gt;interval&lt;/code&gt;&lt;/td&gt;&lt;td&gt;domain for every suitability class (S1, S2, S3, and N). If &lt;code&gt;&quot;fixed&quot;&lt;/code&gt;, the interval would be 0 to 0.25 for N (Not Suitable), 0.25 to 0.50 for S3 (Marginally Suitable), 0.50 to 0.75 for S2 (Moderately Suitable), and 0.75 to 1 for S1 (Highly Suitable). If &lt;code&gt;&quot;unbias&quot;&lt;/code&gt;, then the interval is set to 0 to $\displaystyle\frac{a}{\mathrm{max}}$ for N, $\displaystyle\frac{a}{\mathrm{max}}$ to $\displaystyle\frac{b}{\mathrm{max}}$ for S3, $\displaystyle\frac{b}{\mathrm{max}}$ to $\displaystyle\frac{c}{\mathrm{max}}$ for S2, and $\displaystyle\frac{c}{\mathrm{max}}$ to $\displaystyle\frac{\mathrm{max}}{\mathrm{max}}$ for S1.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
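The &lt;code&gt;&quot;average&quot;&lt;/code&gt; formulas for &lt;code&gt;min&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt; amount to offsetting the outermost class scores by the mean gap between consecutive scores; a Python sketch (the helper names &lt;code&gt;average_min&lt;/code&gt; and &lt;code&gt;average_max&lt;/code&gt; are hypothetical, not part of ALUES, and the scores are illustrative):

```python
def average_min(scores):
    """'Average' minimum: first score minus the mean gap between
    consecutive suitability-class scores."""
    gaps = [scores[i + 1] - scores[i] for i in range(len(scores) - 1)]
    return scores[0] - sum(gaps) / len(gaps)

def average_max(scores):
    """Symmetric 'average' maximum: last score plus the mean gap."""
    gaps = [scores[i + 1] - scores[i] for i in range(len(scores) - 1)]
    return scores[-1] + sum(gaps) / len(gaps)

# scores a=29, b=30, c=50: mean gap is ((30-29) + (50-30)) / 2 = 10.5
print(average_min([29, 30, 50]))  # 29 - 10.5 = 18.5
print(average_max([29, 30, 50]))  # 50 + 10.5 = 60.5
```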
&lt;br /&gt;
&lt;b&gt;Output&lt;/b&gt;&lt;br /&gt;
The function returns the following output:&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Actual Factors Evaluated;&lt;/li&gt;
&lt;li&gt;Suitability Score;&lt;/li&gt;
&lt;li&gt;Suitability Class;&lt;/li&gt;
&lt;li&gt;Factors&#39; Minimum Values; and,&lt;/li&gt;
&lt;li&gt;Factors&#39; Maximum Values.&lt;/li&gt;
&lt;/ol&gt;
&lt;i&gt;Example&lt;/i&gt;: To test the suitability of the land units in Marinduque, Philippines, for terrain requirements of coconut, we have&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/dac48c584bc347a35924.js&quot;&gt;&lt;/script&gt;
Before we run the function, let&#39;s check the possible output. From the land units (&lt;code&gt;MarinduqueLT&lt;/code&gt;), the only factor available to be evaluated against the required soil characteristics of coconut is &lt;code&gt;CFragm&lt;/code&gt;. The first land unit has 11% coarse fragment (CFragm), which falls within the S1 domain of the required soil characteristics, [&lt;code&gt;min&lt;/code&gt;, 15%), where &lt;code&gt;min&lt;/code&gt; defaults to 0. The second to sixth land units are also highly suitable, as they fall within the same domain. Let&#39;s confirm this using the function,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/11a2a689e924e0c901c8.js&quot;&gt;&lt;/script&gt;
Extract the first 6 of the outputs,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/a7d4ed27a086b36dd813.js&quot;&gt;&lt;/script&gt;
Indeed, just as we argued earlier.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;
Options for &lt;code&gt;mf&lt;/code&gt; (Membership Function)&lt;/b&gt;&lt;br /&gt;
The membership function is an option for the type of fuzzy model, the available models are the following:
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Triangular;&lt;/li&gt;
&lt;li&gt;Trapezoidal; and,&lt;/li&gt;
&lt;li&gt;Gaussian.&lt;/li&gt;
&lt;/ol&gt;
The suitability scores are computed based on these fuzzy models.&lt;br /&gt;
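As an illustration of the idea (a generic sketch, not the exact ALUES internals), a triangular membership function rises linearly to 1 at a peak value and falls back to 0 outside its support:

```python
def triangular_mf(x, lo, peak, hi):
    """Triangular membership: 0 outside (lo, hi), rising linearly
    to 1 at the peak and falling linearly back to 0."""
    if x <= lo or x >= hi:
        return 0.0
    if x <= peak:
        return (x - lo) / (peak - lo)
    return (hi - x) / (hi - peak)

print(triangular_mf(5.0, 0.0, 5.0, 10.0))  # 1.0 at the peak
print(triangular_mf(2.5, 0.0, 5.0, 10.0))  # 0.5 halfway up the rising edge
```

The trapezoidal and Gaussian variants differ only in the shape of this curve; the score for each factor is the membership value of the observed measurement.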
&lt;br /&gt;
&lt;b&gt;Options for &lt;code&gt;sow.month&lt;/code&gt; (Sowing Month)&lt;/b&gt;&lt;br /&gt;
The &lt;code&gt;sow.month&lt;/code&gt; argument is the sowing month, which takes integers from 1 to 12 representing the twelve months of the year. If set to 1, the function assumes a January sowing month. This argument is only used for water and temperature characteristics.&lt;br /&gt;
&lt;br /&gt;
To illustrate this, we will test the land units of Marinduque for the required water and temperature for rainfed bunded rice. Thus, we have&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/733313cc615dc901ea92.js&quot;&gt;&lt;/script&gt;
We will first test the land units for water; here are the water requirements for rainfed bunded rice,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/3928701add1a2f612e04.js&quot;&gt;&lt;/script&gt;
The factors to be evaluated here are the following:
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;WmAv1&lt;/code&gt; - Mean precipitation of first month (mm);&lt;/li&gt;
&lt;li&gt;&lt;code&gt;WmAv2&lt;/code&gt; - Mean precipitation of second month (mm);&lt;/li&gt;
&lt;li&gt;&lt;code&gt;WmAv3&lt;/code&gt; - Mean precipitation of third month (mm); and&lt;/li&gt;
&lt;li&gt;&lt;code&gt;WmAv4&lt;/code&gt; - Mean precipitation of fourth month (mm).&lt;/li&gt;
&lt;/ol&gt;
If sowing month is set to November, then we have
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;WmAv1&lt;/code&gt; - November;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;WmAv2&lt;/code&gt; - December;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;WmAv3&lt;/code&gt; - January; and&lt;/li&gt;
&lt;li&gt;&lt;code&gt;WmAv4&lt;/code&gt; - February.&lt;/li&gt;
&lt;/ol&gt;
So for November, we see the first land unit falls within the domain of S1, that is, 277 mm falls within [175, 500 mm). The same holds for the first land unit in December: highly suitable. Let&#39;s fire up the function to confirm that,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/eca0f662e2629689dcb1.js&quot;&gt;&lt;/script&gt;
You will get this error if there are no factors to be evaluated. What happened here is that the function treated the data as neither water nor temperature characteristics, and thus ignored the &lt;code&gt;WmAv1&lt;/code&gt;, &lt;code&gt;WmAv2&lt;/code&gt;, &lt;code&gt;WmAv3&lt;/code&gt; and &lt;code&gt;WmAv4&lt;/code&gt; factors. But if we specify the sowing month (&lt;code&gt;sow.month&lt;/code&gt;) as November (&lt;code&gt;11&lt;/code&gt;), then we have&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/103f35abf3c55b8547f4.js&quot;&gt;&lt;/script&gt;
The first land unit for November is indeed confirmed as S1, but December is not; S2 is given instead. This problem will be discussed in detail under the &lt;code&gt;interval&lt;/code&gt; argument later.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Options for &lt;code&gt;min&lt;/code&gt; (Factors&#39; Minimum Value)&lt;/b&gt;&lt;br /&gt;
By default, &lt;code&gt;min = 0&lt;/code&gt; for all factors. It can be assigned any nonnegative number; for example, using the cassava soil requirements,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/da1400c9412120a91be4.js&quot;&gt;&lt;/script&gt;
Now let&#39;s try different minimums for the factors; we will use the following:&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;CECc&lt;/th&gt;&lt;th&gt;pHH20&lt;/th&gt;&lt;th&gt;CFragm&lt;/th&gt;&lt;th&gt;SoilTe&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;4&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 2: Custom &lt;code&gt;min&lt;/code&gt;.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;0.4&lt;/td&gt;&lt;td&gt;0.6&lt;/td&gt;&lt;td&gt;0.1&lt;/td&gt;&lt;td&gt;0.3&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/c74262c7a59515855b10.js&quot;&gt;&lt;/script&gt;
So we got an error. This is expected, since the length of the &lt;code&gt;min&lt;/code&gt; vector should equal the number of factors in &lt;code&gt;x&lt;/code&gt;, which is 6. Since we are not interested in the latitude (&lt;code&gt;X&lt;/code&gt;) and longitude (&lt;code&gt;Y&lt;/code&gt;) factors of the dataset, we can omit the two and rerun the code,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/839483a52b652c31bd29.js&quot;&gt;&lt;/script&gt;
Only CECc and SoilTe are returned since these are the factors evaluated.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Options for &lt;code&gt;max&lt;/code&gt; (Factors&#39; Maximum Value)&lt;/b&gt;&lt;br /&gt;
By default &lt;code&gt;max = &#39;average&#39;&lt;/code&gt;; just like &lt;code&gt;min&lt;/code&gt;, &lt;code&gt;max&lt;/code&gt; can be assigned any positive number, for example:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/7b6c3213361b7c0c1a59.js&quot;&gt;&lt;/script&gt;
For a different maximum value for every factor, we will use the following, omitting the first two factors in &lt;code&gt;MarinduqueLT&lt;/code&gt; as we did in the previous section.&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;CECc&lt;/th&gt;&lt;th&gt;pHH20&lt;/th&gt;&lt;th&gt;CFragm&lt;/th&gt;&lt;th&gt;SoilTe&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;4&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 3: Custom &lt;code&gt;max&lt;/code&gt;.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;52.5&lt;/td&gt;&lt;td&gt;8.8&lt;/td&gt;&lt;td&gt;40&lt;/td&gt;&lt;td&gt;14&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/1eebf0604e0fccc1f141.js&quot;&gt;&lt;/script&gt;
&lt;b&gt;Options for &lt;code&gt;interval&lt;/code&gt; (Domain of Suitability Scores)&lt;/b&gt;&lt;br /&gt;
The domain of the suitability scores defaults to &lt;code&gt;&#39;fixed&#39;&lt;/code&gt;; with this option, the domains of the suitability scores are,&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Class&lt;/th&gt;&lt;th&gt;N&lt;/th&gt;&lt;th&gt;S3&lt;/th&gt;&lt;th&gt;S2&lt;/th&gt;&lt;th&gt;S1&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;5&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 4: Domain for &lt;code&gt;&#39;fixed&#39;&lt;/code&gt;.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Domain&lt;/td&gt;&lt;td&gt;[0, 0.25)&lt;/td&gt;&lt;td&gt;[0.25, 0.5)&lt;/td&gt;&lt;td&gt;[0.5, 0.75)&lt;/td&gt;&lt;td&gt;[0.75, 1]&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
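Under the &lt;code&gt;&#39;fixed&#39;&lt;/code&gt; option, classification reduces to comparing a score against these cut points; a Python sketch (the function name &lt;code&gt;classify_fixed&lt;/code&gt; is hypothetical):

```python
def classify_fixed(score):
    """Map a suitability score in [0, 1] to its class under the
    fixed domains: [0, 0.25) N, [0.25, 0.5) S3, [0.5, 0.75) S2, [0.75, 1] S1."""
    if score < 0.25:
        return "N"
    if score < 0.50:
        return "S3"
    if score < 0.75:
        return "S2"
    return "S1"

print(classify_fixed(0.3714))  # S3 under the fixed domains
```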
&lt;br /&gt;
An example of &lt;code&gt;interval = &#39;fixed&#39;&lt;/code&gt; is the one illustrated in &lt;i&gt;Options for &lt;code&gt;sow.month&lt;/code&gt; (Sowing Month)&lt;/i&gt; above. Let us investigate that output; here are the crop requirements for water (the crop of interest being rainfed bunded rice),
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/7f6255ca8cf4166ac55e.js&quot;&gt;&lt;/script&gt;
Given that the starting sowing month assigned is November, then the following factors are evaluated: 
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;WmAv1&lt;/code&gt; - November;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;WmAv2&lt;/code&gt; - December;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;WmAv3&lt;/code&gt; - January; and&lt;/li&gt;
&lt;li&gt;&lt;code&gt;WmAv4&lt;/code&gt; - February.&lt;/li&gt;
&lt;/ol&gt;
So we extract these factors from the dataset, &lt;code&gt;MarinduqueWater&lt;/code&gt;,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/ed5fda094d3126c3cfd1.js&quot;&gt;&lt;/script&gt;
The suitability scores and classes of this would be,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/c782e7fe6d7ac8c26a72.js&quot;&gt;&lt;/script&gt;
Focus your attention on the suitability scores of the &lt;code&gt;Feb&lt;/code&gt; factor for the first three land units: 0.3714, 0.3714 and 0.3771. Based on Table 4, these fall into classes S3, S3 and S3. But if we refer to the original data, the first three data points of the &lt;code&gt;Feb&lt;/code&gt; factor are all 65, and &lt;code&gt;WmAv4&lt;/code&gt; is the corresponding requirement for the &lt;code&gt;Feb&lt;/code&gt; factor, with the following scores:&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Factor&lt;/th&gt;&lt;th&gt;S3&lt;/th&gt;&lt;th&gt;S2&lt;/th&gt;&lt;th&gt;S1&lt;/th&gt;&lt;th&gt;S1&lt;/th&gt;&lt;th&gt;S2&lt;/th&gt;&lt;th&gt;S3&lt;/th&gt;&lt;th&gt;Weight&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;8&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 5: &lt;code&gt;WmAv4&lt;/code&gt;’s Suitability Requirements.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;WmAv4&lt;/td&gt;&lt;td&gt;29&lt;/td&gt;&lt;td&gt;30&lt;/td&gt;&lt;td&gt;50&lt;/td&gt;&lt;td&gt;300&lt;/td&gt;&lt;td&gt;500&lt;/td&gt;&lt;td&gt;600&lt;/td&gt;&lt;td&gt;NA&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;br /&gt;
Then it is easy to pinpoint which suitability class the scores of the land units fall into: all first three land units fall within class S1. See the problem with the &lt;code&gt;&#39;fixed&#39;&lt;/code&gt; interval? The same problem occurs for other factors such as &lt;code&gt;Dec&lt;/code&gt; (December), where instead of S1 we got S2. Users can change the domains, though: instead of using the &lt;code&gt;&#39;fixed&#39;&lt;/code&gt; option, users can assign, for example, &lt;code&gt;interval = c(0, 0.33, 0.56, 0.89, 1)&lt;/code&gt;, which is equivalent to:&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Class&lt;/th&gt;&lt;th&gt;N&lt;/th&gt;&lt;th&gt;S3&lt;/th&gt;&lt;th&gt;S2&lt;/th&gt;&lt;th&gt;S1&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;5&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 6: Custom Domains.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Domain&lt;/td&gt;&lt;td&gt;[0, 0.33)&lt;/td&gt;&lt;td&gt;[0.33, 0.56)&lt;/td&gt;&lt;td&gt;[0.56, 0.89)&lt;/td&gt;&lt;td&gt;[0.89, 1]&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;br /&gt;
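To make the score-to-class mapping concrete, here is a small Python sketch of interval-based classification (my own illustration, not ALUES code; the package itself is in R). A score falls in the class whose half-open domain contains it, using either the default breakpoints from the &#39;fixed&#39; option or custom ones like Table 6.

```python
# Hypothetical sketch (not part of ALUES): classify a suitability score
# into N/S3/S2/S1 given interval breakpoints, as described above.
def classify(score, breaks=(0.0, 0.25, 0.50, 0.75, 1.0),
             labels=("N", "S3", "S2", "S1")):
    """Return the class whose domain contains the score."""
    # scan class lower bounds from highest to lowest
    for b, label in reversed(list(zip(breaks[:-1], labels))):
        if score >= b:
            return label
    return labels[0]
```

With the default (fixed) breaks, a score of 0.3714 lands in S3; with the custom breaks c(0, 0.33, 0.56, 0.89, 1) it also lands in S3, which is why changing the breakpoints alone does not fix the issue discussed next.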
Assigning new values to the parameters of &lt;code&gt;interval&lt;/code&gt; won&#39;t solve the problem by itself, but this argument has one more option that does solve it: changing &lt;code&gt;interval = &#39;fixed&#39;&lt;/code&gt; to &lt;code&gt;interval = &#39;unbias&#39;&lt;/code&gt;. Let&#39;s try it,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/e9cd269c31466e6aa57c.js&quot;&gt;&lt;/script&gt;
And that supports our argument above.&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Weighting&lt;/b&gt;&lt;br /&gt;
The function &lt;code&gt;suitability&lt;/code&gt; also considers the weights of the factors. An example of a crop with no weights is the soil requirements for coconut,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/da8acd4b0d0cb8667662.js&quot;&gt;&lt;/script&gt;
The weights are assigned in the last column, &lt;code&gt;Weight.class&lt;/code&gt;. And here are the soil requirements for cassava, with a weight on each factor:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/21f45eb8035e8436b7a7.js&quot;&gt;&lt;/script&gt;
If a given factor has a weight, then the function will compute the corresponding suitability and then use the weighting score to obtain the appropriate suitability score. The weights of the factors for the default interval (&lt;code&gt;interval = &#39;fixed&#39;&lt;/code&gt;) are in Table 7:&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Suitability&lt;/th&gt;&lt;th colspan=&quot;3&quot; style=&quot;text-align: center;&quot;&gt;Factor Weights&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;4&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 7: Weights of the Factors for &lt;code&gt;&#39;fixed&#39;&lt;/code&gt; Interval.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;b&gt;Class&lt;/b&gt;&lt;/td&gt;&lt;td&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;&lt;td&gt;&lt;b&gt;2&lt;/b&gt;&lt;/td&gt;&lt;td&gt;&lt;b&gt;3&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr class=&quot;alt&quot;&gt;&lt;td&gt;S1&lt;/td&gt;&lt;td&gt;0.833&lt;/td&gt;&lt;td&gt;0.916&lt;/td&gt;&lt;td&gt;1.000&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;S2&lt;/td&gt;&lt;td&gt;0.583&lt;/td&gt;&lt;td&gt;0.667&lt;/td&gt;&lt;td&gt;0.750&lt;/td&gt;&lt;/tr&gt;
&lt;tr class=&quot;alt&quot;&gt;&lt;td&gt;S3&lt;/td&gt;&lt;td&gt;0.333&lt;/td&gt;&lt;td&gt;0.416&lt;/td&gt;&lt;td&gt;0.500&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;N&lt;/td&gt;&lt;td&gt;0.083&lt;/td&gt;&lt;td&gt;0.167&lt;/td&gt;&lt;td&gt;0.250&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;br /&gt;
Thus the function simply divides the interval of each suitability class into three equal parts, one for each of the three weights.&lt;br /&gt;
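That rule can be sketched in Python (my own illustration, not ALUES code): each class interval has width 0.25 and starts at a lower bound, and the score under weight k is the lower bound plus k times a third of the width. The entries of Table 7 appear to be these values rounded or truncated to three decimals.

```python
# Hypothetical sketch (not ALUES code): reproduce Table 7 by dividing each
# suitability class interval of width 0.25 into three equal parts, one per weight.
def weight_score(class_lower, weight):
    """Score for a class starting at class_lower, under weight k in {1, 2, 3}."""
    return round(class_lower + weight * 0.25 / 3, 3)

# assumed class lower bounds under the 'fixed' interval
lowers = {"S1": 0.75, "S2": 0.50, "S3": 0.25, "N": 0.00}
table7 = {cls: [weight_score(lo, k) for k in (1, 2, 3)] for cls, lo in lowers.items()}
```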
&lt;br /&gt;
&lt;h3&gt;
Overall Suitability&lt;/h3&gt;
The overall suitability combines the suitability scores of a land unit across characteristics into a single score and class. Its arguments are:&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/fe9ad310f98b203d4361.js&quot;&gt;&lt;/script&gt;
&lt;center&gt;
&lt;table style=&quot;width: 550px;&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right;&quot; valign=&quot;top&quot;&gt;&lt;code&gt;x&lt;/code&gt;&lt;/td&gt;&lt;td&gt;a data frame consisting the suitability scores of a given characteristics (terrain, soil, water and temperature) for a given crop (e.g. coconut, cassava, etc.);&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right;&quot; valign=&quot;top&quot;&gt;&lt;code&gt;method&lt;/code&gt;&lt;/td&gt;&lt;td&gt;the method for computing the overall suitability, which includes the minimum, maximum, sum, product, average, exponential and gamma. If &lt;code&gt;NULL&lt;/code&gt;, minimum is used.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right;&quot; valign=&quot;top&quot;&gt;&lt;code&gt;interval&lt;/code&gt;&lt;/td&gt;&lt;td&gt;if &lt;code&gt;NULL&lt;/code&gt;, the intervals used are the following: 0-0.25 (Not suitable, N), 0.25-0.50 (Marginally Suitable, S3), 0.50-0.75 (Moderately Suitable, S2), and 0.75-1 (Highly Suitable, S1).&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: right;&quot; valign=&quot;top&quot;&gt;&lt;code&gt;output&lt;/code&gt;&lt;/td&gt;&lt;td&gt;the output to be returned, either the scores or class. If &lt;code&gt;NULL&lt;/code&gt;, both are returned.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/center&gt;
&lt;br /&gt;
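Some of the aggregation methods listed above (minimum, maximum, average, product) can be sketched in a few lines of Python; this is my own illustration under assumed names, not the package's R implementation, and it omits the exponential and gamma variants.

```python
# Hypothetical sketch (not ALUES code): combine per-characteristic
# suitability scores into one overall score with a chosen method.
def overall(scores, method="minimum"):
    if method == "minimum":          # most limiting factor governs
        return min(scores)
    if method == "maximum":
        return max(scores)
    if method == "average":
        return sum(scores) / len(scores)
    if method == "product":
        result = 1.0
        for s in scores:
            result *= s
        return result
    raise ValueError("unknown method: " + method)
```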
&lt;h3&gt;
DEMONSTRATION&lt;/h3&gt;
Let&#39;s assume we are interested in the land units of Lao Cai, Vietnam, for cultivating irrigated rice. Here are the first 6 land units in the said location,
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/b020a5802d9a8fb33d24.js&quot;&gt;&lt;/script&gt;
And here are the required values for factors of soil, terrain, temperature and water characteristics for irrigated rice,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/9ae335b43d60d894e683.js&quot;&gt;&lt;/script&gt;
Now, we are going to take the suitability scores for every characteristic,&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/952fca3606fd0e04c2ae.js&quot;&gt;&lt;/script&gt;
Next, we will take the overall suitability on all factors in each land unit using the &lt;code&gt;&quot;average&quot;&lt;/code&gt; method (default is &lt;code&gt;&quot;minimum&quot;&lt;/code&gt;).&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/877596db520ba2688bba.js&quot;&gt;&lt;/script&gt;
Finally, take the overall suitability from these characteristics using the &lt;code&gt;&quot;maximum&quot;&lt;/code&gt; method.&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/5ecf671028fde46b86e9.js&quot;&gt;&lt;/script&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/2623482324233122897/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2014/10/alues-agricultural-land-use-evaluation.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/2623482324233122897'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/2623482324233122897'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2014/10/alues-agricultural-land-use-evaluation.html' title='ALUES: Agricultural Land Use Evaluation System, R package'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-8924714867856909647</id><published>2014-09-21T11:56:00.000+08:00</published><updated>2014-09-21T18:55:33.434+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="LaTeX"/><category scheme="http://www.blogger.com/atom/ns#" term="Probability Theory"/><category scheme="http://www.blogger.com/atom/ns#" term="Python"/><title type='text'>Probability Theory Problems</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
Let&#39;s have some fun with probability theory; here is my first problem set in the said subject.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Problems&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
It was noted that statisticians who follow the deFinetti school do not accept the Axiom of Countable Additivity, instead adhering to the Axiom of Finite Additivity.
&lt;ol type=&quot;a&quot;&gt;
&lt;li&gt;
Show that the Axiom of Countable Additivity implies Finite Additivity.&lt;/li&gt;
&lt;li&gt;Although, by itself, the Axiom of Finite Additivity does not imply Countable Additivity, suppose we supplement it with the following. Let $A_1\supset A_2\supset\cdots\supset A_n\supset \cdots$ be an infinite sequence of nested sets whose limit is the empty set, which we denote by $A_n\downarrow\emptyset$. Consider the following:&lt;br /&gt;&lt;br /&gt;
&lt;div style=&quot;text-align: center;&quot;&gt;
&lt;b&gt;Axiom of Continuity:&lt;/b&gt; If $A_n\downarrow\emptyset$, then $P(A_n)\rightarrow 0$
&lt;/div&gt;
&lt;br /&gt;
Prove that the Axiom of Continuity and the Axiom of Finite Additivity imply Countable Additivity.
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Prove each of the following statements. (Assume that any conditioning event has positive probability.)
&lt;ol type=&quot;a&quot;&gt;
&lt;li&gt; If $P(B)=1$, then $P(A|B)=P(A)$ for any $A$.&lt;/li&gt;
&lt;li&gt; If $A\subset B$, then $P(B|A)=1$ and $P(A|B)=P(A)/P(B)$.&lt;/li&gt;
&lt;li&gt; If $A$ and $B$ are mutually exclusive, then
\begin{equation}\nonumber
P(A|A\cup B) = \displaystyle\frac{P(A)}{P(A)+P(B)}.
\end{equation}&lt;/li&gt;
&lt;li&gt; $P(A\cap B\cap C)=P(A|B\cap C)P(B|C)P(C)$.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
&lt;li&gt;Prove that the following functions are cdfs.
&lt;ol type=&quot;a&quot;&gt;
&lt;li&gt; $\frac{1}{2}+\frac{1}{\pi}\arctan(x), x\in (-\infty, \infty)$&lt;/li&gt;
&lt;li&gt; $(1+e^{-x})^{-1},x\in (-\infty,\infty)$&lt;/li&gt;
&lt;li&gt; $e^{-e^{-x}}, x\in (-\infty, \infty)$&lt;/li&gt;
&lt;li&gt; $1-e^{-x}, x\in (0,\infty)$&lt;/li&gt;
&lt;li&gt; the function defined in (1.5.6), (Check in the reference below.)&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;A cdf $F_X$ is &lt;i&gt;stochastically&lt;/i&gt; greater than a cdf $F_{Y}$ if $F_{X}(t)\leq F_{Y}(t)$ for all $t$ and $F_{X}(t) &amp;lt; F_{Y}(t)$ for some $t$. Prove that if $X\sim F_X$ and $Y\sim F_Y$, then
\begin{equation}\nonumber
P(X&amp;gt;t) \geq P(Y&amp;gt;t)\;\text{for every}\;t
\end{equation}
and 
\begin{equation}\nonumber
P(X&amp;gt;t)&amp;gt;P(Y&amp;gt;t),\;\text{for some}\; t
\end{equation}
that is, $X$ tends to be bigger than $Y$.&lt;/li&gt;
&lt;li&gt; Let $X$ be a continuous random variable with pdf $f(x)$ and cdf $F(x)$. For a fixed number $x_0$, define the function
\begin{equation}\nonumber
g(x) = \begin{cases}
f(x) / [1-F(x_0)]&amp;amp; x \geq x_0\\
0 &amp;amp; x &amp;lt; x_0.
\end{cases}
\end{equation}
Prove that $g(x)$ is a pdf. (Assume that $F(x_0)&lt;1$.)&lt;/li&gt;
&lt;li&gt;For each of the following, determine the value of $c$ that makes $f(x)$ a pdf.
&lt;ol type=&quot;a&quot;&gt;
&lt;li&gt; $f(x)=\mathrm{c}\sin x, 0 &amp;lt; x &amp;lt; \pi/2$&lt;/li&gt;
&lt;li&gt; $f(x)=\mathrm{c}e^{-|x|},-\infty &amp;lt; x &amp;lt; \infty$&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;h3&gt;
Solutions&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;ol type=&quot;a&quot;&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;. Let $\mathscr{B}$ be a $\sigma$-algebra and consider $A_1,A_2,\cdots\in \mathscr{B}$ are pairwise disjoint, then by countable additivity
\begin{equation}\nonumber
P\left(\displaystyle\bigcup_{i=1}^{\infty}A_i\right)=\displaystyle\sum_{i=1}^{\infty}P(A_i).
\end{equation}
Now, 
\begin{equation}
\begin{aligned}
P\left(\displaystyle\bigcup_{i=1}^{\infty}A_i\right)&amp;amp;=
P\left(\displaystyle\bigcup_{i=1}^{n}A_i\cup\displaystyle
\bigcup_{i=n+1}^{\infty}A_i\right)\\
&amp;amp;=
P\left(\displaystyle\bigcup_{i=1}^{n}A_i\right)+P\left(\displaystyle
\bigcup_{i=n+1}^{\infty}A_i\right),\;(\text{since}\;A_i&#39;s\;\text{are disjoints})\\
&amp;amp;=P(A_1)+\cdots+P(A_n)+P\left(\displaystyle
\bigcup_{i=n+1}^{\infty}A_i\right),\\
&amp;amp;\quad(\text{by finite additivity})\\
&amp;amp;=\displaystyle\sum_{i=1}^{n}P(A_i)+P\left(\displaystyle
\bigcup_{i=n+1}^{\infty}A_i\right)
\end{aligned}\nonumber
\end{equation}
Notice that for any $n$, we can take $A_i=\emptyset$ for $i&amp;gt;n$, implying
\begin{equation}\nonumber
P\left(\displaystyle\bigcup_{i=n+1}^{\infty}A_i\right)=\displaystyle
\sum_{i=n+1}^{\infty}P(A_i)=P(\emptyset)+P(\emptyset)+\cdots,
\end{equation}
that is,
\begin{equation}\nonumber
\begin{aligned}
P\left(\displaystyle\bigcup_{i=1}^{\infty}A_i\right)&amp;amp;=
\displaystyle\sum_{i=1}^{n}P(A_i)+\sum_{i=n+1}^{\infty}P(A_i)\\
&amp;amp;=\displaystyle\sum_{i=1}^{n}P(A_i)+P(\emptyset)+P(\emptyset)+\cdots
\end{aligned}
\end{equation}
$\therefore$ countable additivity implies finite additivity. &lt;br /&gt;
$\hspace{12.5cm}\blacksquare$
&lt;/li&gt;
&lt;li&gt;From (a), we have shown that countable additivity implies finite additivity, i.e.,
\begin{equation}
P\left(\displaystyle\bigcup_{i=1}^{\infty}A_i\right)=\displaystyle\sum_{i=1}^{n}P(A_i)+P\left(\displaystyle
\bigcup_{i=n+1}^{\infty}A_i\right)
\nonumber
\end{equation}
Now define the tail unions of the pairwise disjoint sets $A_i$,
\begin{equation}\nonumber
B_k=\bigcup_{i=k}^{\infty}A_i,\;\text{so that}\;B_{k+1}\subset B_k\;\text{and}\;B_k\downarrow\emptyset.
\end{equation}
By the Axiom of Continuity (see also &lt;a href=&quot;http://alstatr.blogspot.com/2014/09/monotonic-sequential-continuity.html&quot; target=&quot;_blank&quot;&gt;Monotone Sequential Continuity&lt;/a&gt;), $\displaystyle\lim_{k\to\infty}P(B_k)=0$.
Thus, by finite additivity plus the Axiom of Continuity, we have
\begin{equation}\nonumber
\begin{aligned}
P\left(\bigcup_{i=1}^{\infty}A_i\right)&amp;amp;=\lim_{n\to\infty}\left(
\sum_{i=1}^{n}P(A_i)+P(B_{n+1})\right)\\
&amp;amp;=\lim_{n\to\infty}\left(\sum_{i=1}^{n}P(A_i)\right)+\lim_{n\to\infty}
P(B_{n+1})\\
&amp;amp;=\sum_{i=1}^{\infty}P(A_i)+0,\;(\text{by axiom of continuity}).
\end{aligned}
\end{equation}
Implying countable additivity.&lt;br /&gt;
$\hspace{12.5cm}\blacksquare$&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;ol type=&quot;a&quot;&gt;
&lt;li&gt;
&lt;i&gt;Proof&lt;/i&gt;. Suppose $P(B)=1$. Then $P(B^c)=0$, and since $A\cap B^c\subseteq B^c$, we have $P(A\cap B^c)=0$, so that $P(A\cap B)=P(A)$. Therefore
\begin{equation}\nonumber
P(A|B)=\displaystyle\frac{P(A\cap B)}{P(B)}=\displaystyle\frac{P(A)}{P(B)}=P(A)
\end{equation}
$\hspace{12.5cm}\blacksquare$
&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;. If $A\subseteq B$ then
\begin{equation}\nonumber
P(B|A)=\displaystyle\frac{P(A\cap B)}{P(A)}=\displaystyle\frac{P(A)}{P(A)}=1
\end{equation}
and,
\begin{equation}\nonumber
P(A|B)=\displaystyle\frac{P(A\cap B)}{P(B)}=\displaystyle\frac{P(A)}{P(B)}
\end{equation}
$\hspace{12.5cm}\blacksquare$
&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;. If $A$ and $B$ are mutually exclusive, then
\begin{equation}
\nonumber
\begin{aligned}
P(A|A\cup B)&amp;amp;=\displaystyle\frac{P(A\cap (A\cup B))}{P(A\cup B)}\\
&amp;amp;=\displaystyle\frac{P(A)}{P(A)+ P(B)},\;\text{since}\;A\cap(A\cup B)=A\;\text{and}\;P(A\cup B)=P(A)+P(B)
\end{aligned}
\end{equation}$\hspace{12.5cm}\blacksquare$&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;. Consider,
\begin{equation}\nonumber
P(A|B\cap C)=\displaystyle\frac{P(A\cap B\cap C)}{P(B\cap C)}
\end{equation}
Hence,
\begin{equation}\nonumber
P(A\cap B\cap C) = P(A|B\cap C)P(B\cap C)
\end{equation}
Now $P(B\cap C)=P(B|C)P(C)$, therefore
\begin{equation}\nonumber
P(A\cap B\cap C) = P(A|B\cap C)P(B|C)P(C)
\end{equation}$\hspace{12.5cm}\blacksquare$&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;$F(x)$ is a cdf if it satisfies the following conditions:
&lt;ol type=&quot;i&quot;&gt;
&lt;li&gt;$\displaystyle\lim_{x\to-\infty}F(x)=0$ and $\displaystyle\lim_{x\to\infty}F(x)=1$&lt;/li&gt;
&lt;li&gt;$F(x)$ is nondecreasing.&lt;/li&gt;
&lt;li&gt;$F(x)$ is right-continuous.&lt;/li&gt;
&lt;/ol&gt;
&lt;ol type=&quot;a&quot;&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;.
&lt;ol type=&quot;i&quot;&gt;
&lt;li&gt; $F(x)=\frac{1}{2}+\frac{1}{\pi}\arctan(x), x\in (-\infty, \infty)$
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjLlRsznedngHHOItJ4pr1dG54MwxmYvM7ugD47BVjBTUACBv1E8gHkcMeVLy5AfVrWoBiResGloZvemrCVNTteWoiRxrl9M9X2TAIWiOLmbEKybXnkqLpIderZ0BMvaAGZ6KqFU75WMe7A/s1600/Screenshot+from+2014-09-20+21:11:29.png&quot; /&gt;&lt;/div&gt;
Above figure was generated by the following $\mathrm{\LaTeX}$ codes:&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/79dee5ee372d810e0e1b.js&quot;&gt;&lt;/script&gt;
\begin{equation}\nonumber
\begin{aligned}
\displaystyle\lim_{x\to-\infty}F(x)&amp;amp;=\displaystyle\lim_{x\to-\infty}
\left(\frac{1}{2}+\frac{1}{\pi}\arctan(x)\right)\\
&amp;amp;=\frac{1}{2}+\frac{1}{\pi}\displaystyle\lim_{x\to-\infty}\left(\arctan(x)\right)\\
&amp;amp;=\frac{1}{2}+\frac{1}{\pi}
\left(\frac{-\pi}{2}\right),\;\text{since}\;\displaystyle\lim_{x\to-\frac{\pi}{2}}\frac{\sin(x)}{\cos(x)}=-\infty\\
&amp;amp;=0\\[0.5cm]
\displaystyle\lim_{x\to\infty}F(x)&amp;amp;=\displaystyle\lim_{x\to\infty}
\left(\frac{1}{2}+\frac{1}{\pi}\arctan(x)\right)\\
&amp;amp;=\frac{1}{2}+\frac{1}{\pi}\displaystyle\lim_{x\to\infty}\left(\arctan(x)\right)\\
&amp;amp;=\frac{1}{2}+\frac{1}{\pi}
\left(\frac{\pi}{2}\right),\;\text{since}\;\displaystyle\lim_{x\to\frac{\pi}{2}}\frac{\sin(x)}{\cos(x)}=\infty\\
&amp;amp;=1
\end{aligned}
\end{equation}&lt;/li&gt;
&lt;li&gt; To test if $F(x)$ is nondecreasing, recall from Calculus that the first derivative tells us whether a function is increasing or decreasing. In particular, $\frac{dF(x)}{dx}&amp;gt;0$ tells us that the function is increasing on a given interval of $x$. Thus,
\begin{equation}
\nonumber
\frac{dF(x)}{dx}=\frac{d}{dx}\left(\frac{1}{2}+\frac{1}{\pi}\arctan(x)\right)=\frac{1}{\pi(1+x^2)}
\end{equation}
Confirm the above differentiation with Python using &lt;a href=&quot;http://sympy.org/&quot; target = &quot;_blank&quot;&gt;sympy&lt;/a&gt; module. &lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/f58b1a6f567cd7b1f302.js&quot;&gt;&lt;/script&gt;
Since $1+x^2&amp;gt;0$ for all $x$, we have $\frac{dF(x)}{dx}&amp;gt;0$, implying $F(x)$ is increasing.&lt;/li&gt;
&lt;li&gt; $F(x)$ is continuous, which implies that $F(x)$ is right-continuous.&lt;/li&gt;
&lt;/ol&gt;
$\hspace{12.5cm}\blacksquare$
&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;.
&lt;ol type=&quot;i&quot;&gt;
&lt;li&gt;$
F(x)=\displaystyle\frac{1}{1+e^{-x}}, x\in(-\infty,\infty)
$
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcKjJQoxB0sHDAxMxz-D9iBn0cwV6WAf-9FohWCx2gdvfCJCn0_L9u5PSdZ8YKMTMfJWj2cIIiNNNx7ZX_p5Zbwj6L8LVp6RHAO6U49mZYAv59tHPkyyvE47nCjN7ZkGv6rwrPtQ9RUr0Q/s1600/Screenshot+from+2014-09-20+21:13:04.png&quot; /&gt;&lt;/div&gt;
\begin{equation}\nonumber
\begin{aligned}
\displaystyle\lim_{x\to-\infty}F(x)&amp;amp;=\displaystyle\lim_{x\to-\infty}
\left(\frac{1}{1+e^{-x}}\right)\\
&amp;amp;=0\\[0.5cm]
\displaystyle\lim_{x\to\infty}F(x)&amp;amp;=\displaystyle\lim_{x\to\infty}
\left(\frac{1}{1+e^{-x}}\right)\\
&amp;amp;=\displaystyle\lim_{x\to\infty}
\left(\frac{1}{1+\frac{1}{e^{x}}}\right)\\
&amp;amp;=1
\end{aligned}
\end{equation}
Confirm these in Python,&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/cc73f0976d9827f4ff0c.js&quot;&gt;&lt;/script&gt;
&lt;/li&gt;
&lt;li&gt; Using the same method we did in (a), we have
\begin{equation}
\nonumber
\begin{aligned}
\frac{dF(x)}{dx}&amp;amp;=\frac{d}{dx}\left(\displaystyle\frac{1}{1+e^{-x}}\right)\\
&amp;amp;=\frac{e^{-x}}{(1+e^{-x})^2}
\end{aligned}
\end{equation}
$\frac{dF(x)}{dx}=\frac{e^{-x}}{(1+e^{-x})^2}&amp;gt;0,\;\forall\;x\in(-\infty,\infty)$. Thus the function is increasing in the interval of $x$.&lt;/li&gt;
&lt;li&gt; $F(x)$ is continuous, which implies the function is right-continuous.
&lt;/li&gt;
&lt;/ol&gt;
$\hspace{12.5cm}\blacksquare$
&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;. 
&lt;ol type=&quot;i&quot;&gt;
&lt;li&gt;$F(x)=e^{-e^{-x}}, x\in (-\infty, \infty)$
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdsi4F01eRC2BXzbvgq4m96rFKRykg-npSjmjS-pF3Md3EGtxhiDUBnaxsXDtm-h4oqoPLm0VVKt-PGn4rTE-m-sJAO-uBoBtMExtFhIelpDKuIFDb4XVnCmeSCumLVFzm_3vExtRKt1Dd/s1600/Screenshot+from+2014-09-20+21:16:51.png&quot; /&gt;&lt;/div&gt;
\begin{equation}\nonumber
\begin{aligned}
\displaystyle\lim_{x\to-\infty}F(x)&amp;amp;=\displaystyle\lim_{x\to-\infty}
\left(e^{-e^{-x}}\right)\\
&amp;amp;=\displaystyle\lim_{x\to-\infty}
\left(\frac{1}{e^{\frac{1}{e^{x}}}}\right)\\
&amp;amp;=0\\[0.5cm]
\displaystyle\lim_{x\to\infty}F(x)&amp;amp;=\displaystyle\lim_{x\to\infty}
\left(e^{-e^{-x}}\right)\\
&amp;amp;=\displaystyle\lim_{x\to\infty}
\left(\frac{1}{e^{\frac{1}{e^{x}}}}\right)\\
&amp;amp;=1
\end{aligned}
\end{equation}&lt;/li&gt;
&lt;li&gt;Like what we did above, $\frac{dF(x)}{dx}$ is,
\begin{equation}
\nonumber
\frac{dF(x)}{dx}=\frac{d}{dx}\left(e^{-e^{-x}}\right)=e^{-x}e^{-e^{-x}}&amp;gt;0
\end{equation}
Because $e^{-x}e^{-e^{-x}}&amp;gt;0,\;\forall\; x\in(-\infty,\infty)$, $F(x)$ is an increasing function on this interval.&lt;/li&gt;
&lt;li&gt;$F(x)$ is continuous, which implies that $F(x)$ is right-continuous.&lt;/li&gt;
&lt;/ol&gt;
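As with parts (a) and (b), the limits and the derivative here can be confirmed with Python's sympy module. The following check is my own addition, in the spirit of the gist confirmations above:

```python
# Sympy check of F(x) = e^(-e^(-x)): limits at the endpoints and its derivative.
import sympy as sp

x = sp.symbols("x")
F = sp.exp(-sp.exp(-x))
lower = sp.limit(F, x, -sp.oo)   # expected 0
upper = sp.limit(F, x, sp.oo)    # expected 1
dF = sp.diff(F, x)               # expected e^(-x) * e^(-e^(-x))
```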
$\hspace{12.5cm}\blacksquare$
&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;.
&lt;ol type=&quot;i&quot;&gt;
&lt;li&gt;$F(x)=1-\displaystyle\frac{1}{e^{x}}, x\in(0,\infty)$
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVTU9Azh43Ni6CNnjphUEyQPN1shEVAKA0epGLkIFaaE-X1wElbrA8Yt-kUQzDgO7GnPCkg49a5vBR3igLPcX_AX5WQNrAOUuf0fwBYAcXAjpWwdpXpSxtUDGd1o4RxfaBLu2jTaSI6zPV/s1600/Screenshot+from+2014-09-20+21:31:10.png&quot; /&gt;&lt;/div&gt;
\begin{equation}\nonumber
\begin{aligned}
\displaystyle\lim_{x\to 0^{+}}F(x)&amp;amp;=1-\displaystyle\lim_{x\to 0^{+}}
\left(\frac{1}{e^{x}}\right)
=0\\[0.5cm]
\displaystyle\lim_{x\to\infty}F(x)&amp;amp;=1-
\displaystyle\lim_{x\to\infty}
\left(\frac{1}{e^{x}}\right)=1
\end{aligned}
\end{equation}
&lt;/li&gt;
&lt;li&gt;
\begin{equation}\nonumber
\frac{dF(x)}{dx}=\frac{d}{dx}\left(1-\frac{1}{e^{x}}\right)=0-(-e^{-x})=\frac{1}{e^{x}}
\end{equation}
$F(x)$ is an increasing function since $\frac{1}{e^{x}}&amp;gt;0,\;\forall\;x\in(0,\infty)$.
&lt;/li&gt;
&lt;li&gt;$F(x)$ is right-continuous, since it is continuous.&lt;/li&gt;
&lt;/ol&gt;
$\hspace{12.5cm}\blacksquare$
&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;.
The function in Equation (1.5.6) is given by,
\begin{equation}
F_Y(y)=\begin{cases}
\displaystyle\frac{1-\varepsilon}{1+e^{-y}}&amp;\text{if}\;y&lt;0,\; \text{for some}\;\varepsilon, 1&gt;\varepsilon&gt;0\\
\varepsilon+\displaystyle\frac{1-\varepsilon}{1+e^{-y}}&amp;\text{if}\;y\geq 0,\;\text{for some}\;\varepsilon, 1&gt;\varepsilon&gt;0
\end{cases}\nonumber
\end{equation}
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvjoqtZNkSivQY-obbSq9fi8qWsqgqRcIKEI8OZt6us2NR13TthdlZ-4DuRPE-v_4BHpWhMRQsqNYMI9e9ipiQBsiqMjxT3uGmUFCSMtZWW0x6-qqf-24uX0xTZ1HTNcsliD0vXeUGwmHQ/s1600/Screenshot+from+2014-09-20+21:51:06.png&quot; /&gt;
&lt;ol type=&quot;i&quot;&gt;
&lt;li&gt;
\begin{equation}\nonumber
\begin{aligned}
\displaystyle\lim_{y\to-\infty}F_Y(y)&amp;amp;=\displaystyle\lim_{y\to-\infty}
\left(\displaystyle\frac{1-\varepsilon}{1+e^{-y}}\right)=\displaystyle\lim_{y\to-\infty}
\left(\displaystyle\frac{1-\varepsilon}{1+\frac{1}{e^{y}}}\right)=0\\[0.5cm]
\displaystyle\lim_{y\to\infty}F(y)&amp;amp;=\displaystyle\lim_{y\to\infty}
\left(\varepsilon+\displaystyle\frac{1-\varepsilon}{1+e^{-y}}\right)=\varepsilon + \displaystyle\lim_{y\to\infty}
\left(\displaystyle\frac{1-\varepsilon}{1+\frac{1}{e^{y}}}\right)=1
\end{aligned}
\end{equation}
&lt;/li&gt;
&lt;li&gt;For $y&lt;0$, we have
\begin{equation}
\begin{aligned}
\frac{d}{dy}\left(\frac{1-\varepsilon}{1+e^{-y}}\right)&amp;=(1-\varepsilon)\frac{d}{dy}\left(\frac{1}{1+e^{-y}}\right)\\
&amp;=(1-\varepsilon)\frac{(1+e^{-y})\cdot 0 - 1\cdot e^{-y}\cdot(-1)}{(1+e^{-y})^2}\\
&amp;=\frac{(1-\varepsilon)e^{-y}}{(1+e^{-y})^2}
\end{aligned}\nonumber
\end{equation}
$(1-\varepsilon)&gt;0$ since $0&lt;\varepsilon&lt;1$. Thus for all $y &lt; 0$, $\frac{(1-\varepsilon)e^{-y}}{(1+e^{-y})^2}&gt;0$, implying that the function is increasing. &lt;br /&gt;&lt;br /&gt;
For $y\geq 0$, 
\begin{equation}
\begin{aligned}
\frac{d}{dy}\left(\varepsilon+\frac{1-\varepsilon}{1+e^{-y}}\right)&amp;=\frac{(1-\varepsilon)e^{-y}}{(1+e^{-y})^2}
\end{aligned}\nonumber
\end{equation}
The function is increasing since $\frac{(1-\varepsilon)e^{-y}}{(1+e^{-y})^2}&gt;0$ for all $y\geq 0$.&lt;/li&gt;
&lt;li&gt;Since the function is continuous, it is right-continuous.&lt;/li&gt;
&lt;/ol&gt;
$\hspace{12.5cm}\blacksquare$
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;. We know that,
\begin{equation}\nonumber
P(X&amp;gt;t)=1-P(X\leq t)=1-F_X(t)
\end{equation}
and
\begin{equation}\nonumber
P(Y&amp;gt;t)=1-P(Y\leq t)=1-F_Y(t)
\end{equation}
Hence we have,
\begin{equation}\nonumber
\begin{aligned}
P(X&amp;gt;t)=1-F_X(t)\;&amp;amp;\overset{?}{\geq}\;1-F_Y(t)=P(Y&amp;gt;t)\\
\end{aligned}
\end{equation}
Since $F_X(t)\leq F_Y(t)$ for all $t$, we have $1-F_X(t)\geq 1-F_Y(t)$. Thus for all $t$, $P(X&amp;gt;t)\geq P(Y&amp;gt;t)$.&lt;br /&gt;&lt;br /&gt;
Now if $F_X(t) &amp;lt; F_Y(t)$ for some $t$, then by the same argument, $P(X&amp;gt;t) &gt; P(Y&amp;gt;t)$ for some $t$. &lt;br /&gt;
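A small numeric illustration of this ordering (my own example, using exponential cdfs not mentioned in the problem): if X is exponential with mean 2 and Y is exponential with mean 1, then F_X lies below F_Y, so X is stochastically greater than Y.

```python
# Illustration (my own example): X ~ Exp(rate 1/2) is stochastically greater
# than Y ~ Exp(rate 1), since F_X(t) = 1 - e^(-t/2) lies below F_Y(t) = 1 - e^(-t).
import math

def surv_X(t):
    return math.exp(-t / 2)   # P(X > t) = 1 - F_X(t)

def surv_Y(t):
    return math.exp(-t)       # P(Y > t) = 1 - F_Y(t)

# survival of X dominates survival of Y at every checked t
checks = all(surv_X(t) >= surv_Y(t) for t in [0.0, 0.5, 1.0, 2.0, 5.0])
```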
$\hspace{13.5cm}\blacksquare$&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;. For a function to be a pdf, it has to satisfy the following:
&lt;ol type=&quot;a&quot;&gt;
&lt;li&gt;$g(x)\geq 0$ for all $x$; and,&lt;/li&gt;
&lt;li&gt; $\displaystyle\int_{-\infty}^{\infty}g(x)\,dx=1$.&lt;/li&gt;
&lt;/ol&gt;
By assumption, $F(x_0)&lt;1$, so $1-F(x_0)&gt;0$; and since $f(x)\geq 0$ for all $x$, it follows that $g(x)\geq 0$ for all $x$. Now,
\begin{equation}
\begin{aligned}
\int_{-\infty}^{\infty}g(x)\,dx&amp;=
\int_{-\infty}^{x_0}g(x)\,dx+
\int_{x_0}^{\infty}g(x)\,dx\\
&amp;=\int_{x_0}^{\infty}g(x)\,dx\\
&amp;=\int_{x_0}^{\infty}\frac{f(x)}{(1-F(x_0))}\,dx\\
&amp;=\frac{1}{1-F(x_0)}\int_{x_0}^{\infty}f(x)\,dx\\
&amp;=\frac{1}{1-F(x_0)}[F(\infty)-F(x_0)]\\
&amp;=\frac{1}{1-F(x_0)}[1-F(x_0)]=1,\;\text{since}\;\lim_{x\to \infty}F(x)=1\\
\end{aligned}\nonumber
\end{equation}
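This can also be sanity-checked with Python's sympy module; the following is my own illustration, using an assumed standard exponential f with F(x) = 1 - e^(-x) and an assumed cutoff x0 = 1:

```python
# Sympy check (an illustration with an assumed f, not part of the problem):
# take f(x) = e^(-x) on (0, oo), truncate at x0 = 1, and verify that
# g(x) = f(x)/(1 - F(x0)) integrates to 1 over [x0, oo).
import sympy as sp

x = sp.symbols("x", positive=True)
x0 = 1
f = sp.exp(-x)
F_x0 = 1 - sp.exp(-x0)          # F(x0) for the assumed exponential cdf
g = f / (1 - F_x0)              # the truncated pdf
total = sp.integrate(g, (x, x0, sp.oo))
```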
$\hspace{13.5cm}\blacksquare$&lt;/li&gt;
&lt;li&gt;In order for $f(x)$ to be a pdf, it must be nonnegative and integrate to 1; nonnegativity holds in both cases for $\mathrm{c}&amp;gt;0$, so it remains to find $\mathrm{c}$ such that the integral equals 1.
&lt;ol type=&quot;a&quot;&gt;
&lt;li&gt;\begin{equation}
\begin{aligned}
\int_{-\infty}^{\infty}f(x)&amp;amp;=\int_{0}^{\frac{\pi}{2}}\mathrm{c}\sin x=\displaystyle\left.-(\mathrm{c})\cos x\displaystyle\right\rvert_{0}^{\frac{\pi}{2}}\\
&amp;amp;=-\mathrm{c}\left(\cos\left(\frac{\pi}{2}\right)-\cos(0)\right)\\
&amp;amp;=-\mathrm{c}(0-1)=\mathrm{c}
\end{aligned}\nonumber
\end{equation}
Hence, $\mathrm{c}=1$. Confirm this with Python,&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/5c547c22bb8584743aed.js&quot;&gt;&lt;/script&gt;
&lt;/li&gt;
&lt;li&gt;
\begin{equation}
\begin{aligned}
\int_{-\infty}^{\infty}f(x)&amp;amp;=\int_{-\infty}^{\infty}
\mathrm{c}\,e^{-|x|}\\
&amp;amp;=\mathrm{c}\left(\int_{-\infty}^{0}
\,e^{x}\,dx+\int_{0}^{\infty}
e^{-x}\,dx\right)\\
&amp;amp;=\mathrm{c}\left[(e^{0}-e^{-\infty})-(e^{-\infty}-e^{0})\right]\\
&amp;amp;=\mathrm{c}(1+1) = 2\mathrm{c}
\end{aligned}\nonumber
\end{equation}
Hence, $\mathrm{c}=\frac{1}{2}$. Confirm this with Python,&lt;br/&gt;&lt;br/&gt;
&lt;script src=&quot;https://gist.github.com/alstat/8f69c125655ec96dedd5.js&quot;&gt;&lt;/script&gt;
&lt;br/&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9PmcmtVXW6eSPhVWRmQz_-ibAIS_U-VuyFFJe2UszyrsCyxqvAoGoj3Lmm_bpM6qUMB-xL62Tzap3TRKmW8sPj6Ab6NZKtARD-ELHR6mO7zd7BplVKHGon1sgSXijelEfHNTwOmHWS-DP/s1600/Screenshot+from+2014-09-20+22:48:04.png&quot; /&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;h3&gt;
Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;a href=&quot;http://www.amazon.com/Statistical-Inference-George-Casella/dp/0534243126&quot; target=&quot;_blank&quot;&gt;Casella, G. and Berger, R.L. (2001). &lt;i&gt;Statistical Inference&lt;/i&gt;. Thomson Learning, Inc.&lt;/a&gt; 
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/8924714867856909647/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2014/09/probability-theory-problems.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/8924714867856909647'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/8924714867856909647'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2014/09/probability-theory-problems.html' title='Probability Theory Problems'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjLlRsznedngHHOItJ4pr1dG54MwxmYvM7ugD47BVjBTUACBv1E8gHkcMeVLy5AfVrWoBiResGloZvemrCVNTteWoiRxrl9M9X2TAIWiOLmbEKybXnkqLpIderZ0BMvaAGZ6KqFU75WMe7A/s72-c/Screenshot+from+2014-09-20+21:11:29.png" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-2606700552526644808</id><published>2014-09-12T11:08:00.000+08:00</published><updated>2015-12-27T09:52:43.872+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Data Mining"/><category scheme="http://www.blogger.com/atom/ns#" term="Image Analysis"/><category scheme="http://www.blogger.com/atom/ns#" term="Machine Learning"/><category scheme="http://www.blogger.com/atom/ns#" term="Multivariate Analysis"/><category scheme="http://www.blogger.com/atom/ns#" term="R"/><category scheme="http://www.blogger.com/atom/ns#" term="Signal Processing"/><category scheme="http://www.blogger.com/atom/ns#" 
term="Statistical Learning"/><title type='text'>R: k-Means Clustering on Imaging</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
Enough with the theory we recently published; let&#39;s take a break and have fun with an application of statistics used in data mining and machine learning: &lt;i&gt;k&lt;/i&gt;-means clustering.&lt;br /&gt;
&lt;blockquote class=&quot;tr_bq&quot;&gt;
&lt;i&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/K-means_clustering&quot; target=&quot;_blank&quot;&gt;k-means clustering&lt;/a&gt;&lt;/i&gt; is a method of &lt;a href=&quot;http://en.wikipedia.org/wiki/Vector_quantization&quot; target=&quot;_blank&quot;&gt;vector quantization&lt;/a&gt;, originally from signal processing, that is popular for cluster analysis in data mining. &lt;i&gt;k&lt;/i&gt;-means clustering aims to partition &lt;i&gt;n&lt;/i&gt; observations into &lt;i&gt;k&lt;/i&gt; clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. (Wikipedia, Ref 1.)
&lt;/blockquote&gt;
We will apply this method to an image, grouping its pixels into &lt;i&gt;k&lt;/i&gt; clusters. Below is the image we are going to use:
&lt;br /&gt;
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgubnFMq4Ex3R0Csr6YexaRJ5NYpgXp94xHo58rlluSPwCTaaBZ_t1J8XUXKjUcF1TAuQ7MxoU5iXRhojop0NPLTjm1_NsaqTWzxT4GawvVRbfvWakepNpqqZohCWrrzIzXMoimpsOMECZa/s1600/ColorfulBird.jpg&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgubnFMq4Ex3R0Csr6YexaRJ5NYpgXp94xHo58rlluSPwCTaaBZ_t1J8XUXKjUcF1TAuQ7MxoU5iXRhojop0NPLTjm1_NsaqTWzxT4GawvVRbfvWakepNpqqZohCWrrzIzXMoimpsOMECZa/s1600/ColorfulBird.jpg&quot; height=&quot;263&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Colorful Bird From &lt;a href=&quot;http://www.wall321.com/Animals/Birds/colorful_birds_tropical_head_3888x2558_wallpaper_6566&quot; target=&quot;_blank&quot;&gt;Wall321&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
We will utilize the following packages for input and output:
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;&lt;b&gt;&lt;a href=&quot;http://cran.r-project.org/package=jpeg&quot; target=&quot;_blank&quot;&gt;jpeg&lt;/a&gt;&lt;/b&gt; - Read and write JPEG images; and,&lt;/li&gt;
&lt;li&gt;&lt;b&gt;&lt;a href=&quot;http://cran.r-project.org/package=ggplot2&quot; target=&quot;_blank&quot;&gt;ggplot2&lt;/a&gt;&lt;/b&gt; - An implementation of the Grammar of Graphics.&lt;/li&gt;
&lt;/ol&gt;&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
&lt;h3&gt;
Download and Read the Image&lt;/h3&gt;
Let&#39;s get started by downloading the image to our workspace and telling R that the data is a JPEG file.&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/1fe9287c70625f74b0ef.js&quot;&gt;&lt;/script&gt;
&lt;h3&gt;
Cleaning the Data&lt;/h3&gt;
Extract the necessary information from the image and organize this for our computation:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/542baaa83523f949adb4.js&quot;&gt;&lt;/script&gt;
The image is represented by a large array of pixels with dimensions &lt;i&gt;rows&lt;/i&gt; by &lt;i&gt;columns&lt;/i&gt; by &lt;i&gt;channels&lt;/i&gt; -- red, green, and blue (RGB).
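Since the actual code lives in the gist above, here is a minimal base-R sketch of this reshaping step, using a small synthetic rows-by-columns-by-channels array in place of the real image; the variable names are illustrative, not necessarily those used in the gist:

```r
# Synthetic stand-in for the image: a 3 x 4 x 3 array
# (rows x columns x channels), with intensities in [0, 1].
set.seed(42)
img <- array(runif(3 * 4 * 3), dim = c(3, 4, 3))

# One row per pixel: its (x, y) position plus its R, G, B values.
# R stores arrays column-major, so as.vector() unrolls each channel
# column by column; x and y are built to match that order.
img_df <- data.frame(
  x = rep(seq_len(dim(img)[2]), each = dim(img)[1]),
  y = rep(rev(seq_len(dim(img)[1])), dim(img)[2]),
  R = as.vector(img[, , 1]),
  G = as.vector(img[, , 2]),
  B = as.vector(img[, , 3])
)
```

Each row of `img_df` now describes one pixel, which is the tidy form that `kmeans` and `ggplot2` expect.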
&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Plotting&lt;/h3&gt;
Plot the original image using the following code:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/9d4f7f19152dabe9e89d.js&quot;&gt;&lt;/script&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqoH2FoQjdEXyorJFolDesk_3VFGeOgN8J7wPZGboSgn7rLie7SsNneZd-3HJiHhV4BGdYkBkYgGXfB2M0O7mpcdsslCfjJvT9diYBiS51rBsQxikSQLmoNtqGyQCVojZ5QYjcYD-t4HMV/s1600/Bird.png&quot; /&gt;&lt;/div&gt;
&lt;h3&gt;
Clustering&lt;/h3&gt;
Apply &lt;i&gt;k&lt;/i&gt;-Means clustering on the image:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/39330b3992c51380143f.js&quot;&gt;&lt;/script&gt;
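As a sketch of what this step does (the exact call in the gist may differ), base R&#39;s `kmeans` from the `stats` package can cluster the pixel rows directly; here `k = 6` and the synthetic `pixels` data frame are assumptions for illustration:

```r
set.seed(1)
# Synthetic pixel data: 200 pixels with R, G, B intensities in [0, 1].
pixels <- data.frame(R = runif(200), G = runif(200), B = runif(200))

k   <- 6
fit <- kmeans(pixels, centers = k)

# Replace each pixel's colour by its cluster centre, then convert
# the centres to hex colours for plotting.
centres  <- fit$centers[fit$cluster, ]
hex_cols <- rgb(centres[, "R"], centres[, "G"], centres[, "B"])
```

After this step at most `k` distinct colours remain, which is why the clustered images below flatten into a few tones.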
Plot the clustered colours:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/57584dc6e72dc7a0875c.js&quot;&gt;&lt;/script&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVsMe1Wb-2zj2L8ZagOT8v-VU2j5mr_WiipAUc6GpmwL8c5No8akD4EPMmsdxFTkTwGhQQnJJ0hKx9iV6A1v-jqfN35KThkIhcY3wwCuhCQ55EFm-Idn5_hK1v67zNok9W3N3zIQfHFqq0/s1600/Bird2.png&quot; /&gt;&lt;/div&gt;
Results for different numbers of clusters &lt;i&gt;k&lt;/i&gt;:
&lt;br /&gt;&lt;br/&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Original&lt;/th&gt;&lt;th&gt;&lt;i&gt;k&lt;/i&gt; = 6&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;3&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 1: &lt;i&gt;k&lt;/i&gt;-Means clustering for different values of &lt;i&gt;k&lt;/i&gt;.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgzesECAb6iy17WPSC2cWzVw0ZV-pnkiWNghXgl0u6QOCK5Nt_WSc-mlWghifrEkoFjIbDaL5jitg5CUQ97QQZcZrPrBXph4wJpa7hOAUOLirFGcvIpiDa6UO6kuYT3rXDfs9mmt4eZuuV/s1600/Bird.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgzesECAb6iy17WPSC2cWzVw0ZV-pnkiWNghXgl0u6QOCK5Nt_WSc-mlWghifrEkoFjIbDaL5jitg5CUQ97QQZcZrPrBXph4wJpa7hOAUOLirFGcvIpiDa6UO6kuYT3rXDfs9mmt4eZuuV/s1600/Bird.png&quot; height=&quot;140&quot; width=&quot;200&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;/td&gt;&lt;td&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgn-RCm4Yx4vsGGHkTSxNIFwJY7SyVpXjlex5OLNEXg2Jqyv-zRdyvCr45Nmk_tX2AGQPy-FYqwleEaX3EDn9Mp4MjNDIxpR7OQw5aUslDUl760Zvfm6u5d3tVifJ_w0GABs76enjJRh9Ym/s1600/Bird3.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgn-RCm4Yx4vsGGHkTSxNIFwJY7SyVpXjlex5OLNEXg2Jqyv-zRdyvCr45Nmk_tX2AGQPy-FYqwleEaX3EDn9Mp4MjNDIxpR7OQw5aUslDUl760Zvfm6u5d3tVifJ_w0GABs76enjJRh9Ym/s1600/Bird3.png&quot; height=&quot;140&quot; width=&quot;200&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;i&gt;k&lt;/i&gt; = 5&lt;/th&gt;&lt;th&gt;&lt;i&gt;k&lt;/i&gt; = 4&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJPse5un5EC_juFT41MX4XWKUvMHX0512Jq-lBrea_NwsrS9JZzE4etpy8tqQeYS8e8A0-BBNeE1TfA-r7udT4zVRhAIYLhyphenhyphenqckSfRHi0yKHHafyCR42HiNUZ8f2RBTh7kAVb3PwLwhpIN/s1600/Bird4.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJPse5un5EC_juFT41MX4XWKUvMHX0512Jq-lBrea_NwsrS9JZzE4etpy8tqQeYS8e8A0-BBNeE1TfA-r7udT4zVRhAIYLhyphenhyphenqckSfRHi0yKHHafyCR42HiNUZ8f2RBTh7kAVb3PwLwhpIN/s1600/Bird4.png&quot; height=&quot;140&quot; width=&quot;200&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;/td&gt;
&lt;td&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpP5g3zXI7UqOlWEi4IuF26HyQM_0RwESm6mfqfWGqUwHC-oGS7P3_1ryAApE1y5VSDtja9FRCuTaz1wESok7t3W_crWRUtdPBG-1jvQ1CWdIrJdU53XsOgqTmcq8DS_e8g3WAxvqcx7QS/s1600/Bird5.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpP5g3zXI7UqOlWEi4IuF26HyQM_0RwESm6mfqfWGqUwHC-oGS7P3_1ryAApE1y5VSDtja9FRCuTaz1wESok7t3W_crWRUtdPBG-1jvQ1CWdIrJdU53XsOgqTmcq8DS_e8g3WAxvqcx7QS/s1600/Bird5.png&quot; height=&quot;140&quot; width=&quot;200&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;i&gt;k&lt;/i&gt; = 3&lt;/th&gt;&lt;th&gt;&lt;i&gt;k&lt;/i&gt; = 2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisnoZEEfRshGVaMkHtXNtQDevH0mdDIO9cZ4pnMhOzcL9sn_DyrCXWgJHT4DDGt5IRY0UCQG1BubJk8Kt3S42lCbZ4B-8rfvcLebNlaxdLwPN_nroTDYLPJurVfWrCBtm0pruQYKnuw60T/s1600/Bird2.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisnoZEEfRshGVaMkHtXNtQDevH0mdDIO9cZ4pnMhOzcL9sn_DyrCXWgJHT4DDGt5IRY0UCQG1BubJk8Kt3S42lCbZ4B-8rfvcLebNlaxdLwPN_nroTDYLPJurVfWrCBtm0pruQYKnuw60T/s1600/Bird2.png&quot; height=&quot;140&quot; width=&quot;200&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;/td&gt;
&lt;td&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnIH96QSX_cVxOTKg_YqlySXtV-RW-RLrMLfFiRVNsHW177FauCcJl16_xwcgHLh5WwSNzWggLODUt6NKQZ02Yha5Zi8NtoU1HjLzWXXXQZEJ0k4KAWYErwlSBucRaBnkATG52UQMFOPBn/s1600/Bird6.png&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: 1em; margin-right: 1em;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnIH96QSX_cVxOTKg_YqlySXtV-RW-RLrMLfFiRVNsHW177FauCcJl16_xwcgHLh5WwSNzWggLODUt6NKQZ02Yha5Zi8NtoU1HjLzWXXXQZEJ0k4KAWYErwlSBucRaBnkATG52UQMFOPBn/s1600/Bird6.png&quot; height=&quot;140&quot; width=&quot;200&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;br/&gt;
I suggest you try it!
&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/k-means_clustering&quot; target=&quot;_blank&quot;&gt;K-means clustering&lt;/a&gt;. &lt;i&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Main_Page&quot; target=&quot;_blank&quot;&gt;Wikipedia&lt;/a&gt;&lt;/i&gt;. Retrieved September 11, 2014.&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/2606700552526644808/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2014/09/r-k-means-clustering-on-image.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/2606700552526644808'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/2606700552526644808'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2014/09/r-k-means-clustering-on-image.html' title='R: k-Means Clustering on Imaging'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgubnFMq4Ex3R0Csr6YexaRJ5NYpgXp94xHo58rlluSPwCTaaBZ_t1J8XUXKjUcF1TAuQ7MxoU5iXRhojop0NPLTjm1_NsaqTWzxT4GawvVRbfvWakepNpqqZohCWrrzIzXMoimpsOMECZa/s72-c/ColorfulBird.jpg" height="72" width="72"/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-7457991516473879219</id><published>2014-09-08T23:09:00.000+08:00</published><updated>2014-10-03T17:43:14.230+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Mathematics"/><category scheme="http://www.blogger.com/atom/ns#" term="Real Analysis"/><title type='text'>Lebesgue Measure and Outer Measure Problems</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
More proofs, still on real analysis. These are my solutions; if you find any errors, do let me know.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Problems&lt;/h3&gt;
&lt;u&gt;Lebesgue Measure&lt;/u&gt;: Let $\mu$ be a set function defined for all sets in a $\sigma$-algebra $\mathscr{F}$, with values in $[0,\infty]$. Assume $\mu$ is countably additive over countable disjoint collections of sets in $\mathscr{F}$.
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Prove that if $A$ and $B$ are two sets in $\mathscr{F}$, with $A\subseteq B$, then $\mu(A)\leq \mu(B)$. This property is called &lt;i&gt;monotonicity&lt;/i&gt;.&lt;/li&gt;
&lt;li&gt;Prove that if there is a set $A$ in the collection $\mathscr{F}$ for which $\mu(A)&amp;lt;\infty$, then $\mu(\emptyset)=0$.&lt;/li&gt;
&lt;li&gt;Let $\{E_{k}\}_{k=1}^{\infty}$ be a countable collection of sets in $\mathscr{F}$. Prove that $\mu\left(\displaystyle\bigcup_{k=1}^{\infty}E_{k}\right)\leq \displaystyle\sum_{k=1}^{\infty}\mu(E_k)$&lt;/li&gt;
&lt;/ol&gt;
&lt;u&gt;Lebesgue Outer Measure&lt;/u&gt;:
&lt;br /&gt;
&lt;ol start=&quot;4&quot;&gt;
&lt;li&gt;Using properties of the outer measure, prove that the interval $[0,1]$ is not countable.&lt;/li&gt;
&lt;li&gt;Let $A$ be the set of irrational numbers in the interval $[0,1]$. Prove that $\mu^{*}(A)=1$.&lt;/li&gt;
&lt;li&gt;Let $B$ be the set of rational numbers in the interval $[0,1]$, and let $\{I_k\}_{k=1}^{n}$ be a finite collection of open intervals that covers $B$. Prove that $\displaystyle\sum_{k=1}^{n}\mu^{*}(I_k)\geq 1$.&lt;/li&gt;
&lt;li&gt;Prove that if $\mu^{*}(A)=0$, then $\mu^{*}(A\cup B)=\mu^{*}(B).$&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
&lt;h3&gt;
Solutions&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;. If $A\subseteq B$, then $B= A\cup (B\cap A^c)\Rightarrow B= A\cup (B\backslash A)$. Thus,
\begin{equation}\nonumber
\begin{aligned}
\mu(B)&amp;amp;= \mu(A\cup (B\backslash A))\\
&amp;amp;= \mu(A)+\mu(B\backslash A)\\
&amp;amp;(\text{since}\;\mu\;\text{is countably additive on disjoint sets})
\end{aligned}
\end{equation}
Hence $\mu(B)\geq \mu(A)$, since $\mu(B\backslash A) \geq 0$.
$\hspace{13.5cm}\blacksquare$&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;. For any set $A$ in $\mathscr{F}$ such that $\mu(A)&lt;\infty$, $A\cup \emptyset = A$. Thus,
\begin{equation}\nonumber
\begin{aligned}
\mu(A)&amp;amp;=\mu(A\cup \emptyset)=\mu(A)+\mu(\emptyset)\\
\mu(\emptyset)&amp;amp;=0,\;\text{since}\;\mu(A)&lt;\infty\;\text{may be subtracted from both sides}.
\end{aligned}
\end{equation}
$\hspace{13.5cm}\blacksquare$&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;. We define a sequence $\{A_n\}_{n=1}^{\infty}\subseteq\mathscr{F}$, such that $A_1=E_1$ and
\begin{equation}\nonumber
A_n = E_n \backslash \bigcup_{k=1}^{n-1}E_k,\;\text{for}\;n&gt;1
\end{equation}
It is easy to see that the $A_n$ are pairwise disjoint, $\bigcup_{n=1}^{\infty}A_n=\bigcup_{k=1}^{\infty}E_k$, and $A_n\subseteq E_n$ for each $n$. Thus, by the countable additivity and monotonicity of $\mu$, we have 
\begin{equation}\nonumber
\begin{aligned}
\mu\left(\bigcup_{k=1}^{\infty}E_k\right)&amp;=\mu\left(\bigcup_{n=1}^{\infty}A_n\right)\\
&amp;=\sum_{n=1}^{\infty}\mu(A_n)\\
&amp;\leq \sum_{k=1}^{\infty}\mu(E_k)\;(\text{by monotonicity}).
\end{aligned}
\end{equation}
$\hspace{13.5cm}\blacksquare$
&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;. We argue by contradiction: assume the interval $[0,1]$ is countable. Every countable set has outer measure zero, so we would have $\mu^{*}([0,1])=0$. Now for any $\varepsilon &amp;gt;0$, the single interval $I = (0-\varepsilon, 1 + \varepsilon)$ covers $[0,1]$, and by the property of outer measure that $\mu^{*}$ of an interval is its length, we have
\begin{equation}\nonumber
\mu^{*}([0,1]) \leq \ell (I) = (1+\varepsilon) - (0-\varepsilon) = 1+2\varepsilon
\end{equation}
Since this bound holds for every $\varepsilon &amp;gt;0$, and no collection of open intervals of total length less than $1$ can cover $[0,1]$, we get $\mu^{*}([0,1])=1\neq 0$, a contradiction. Hence $[0,1]$ is not countable.$\hspace{2.13cm}\blacksquare$
&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;. If $A=\{\mathbb{Q}^c\cap [0,1]\}$ is the set of irrational numbers in the interval $[0,1]$, then $A^c=\{\mathbb{Q}\cap [0,1]\}$ is the set of rational numbers in the interval $[0,1]$. Now consider the following,
\begin{equation}\nonumber
\begin{aligned}
\mu^{*}([0,1])&amp;amp;=\mu^{*}(A)+\mu^{*}(A^{c})\\
\mu^{*}(A)&amp;amp;=\mu^{*}([0,1]) - \mu^{*}(A^{c})\\
&amp;amp;=1 -\mu^{*}(A^{c})
\end{aligned}
\end{equation}
We need to show that $A^{c}$ has outer measure zero. Since the rationals in $[0,1]$ are countable, enumerate them as $a_1,a_2,\cdots$. For $\varepsilon &gt; 0$, take $I_n = \left(a_n - \frac{\varepsilon}{2^{n+1}}, a_n + \frac{\varepsilon}{2^{n+1}}\right)$. Then $\bigcup_{n=1}^{\infty}I_n$ covers $A^{c}$, and by the definition of outer measure,
\begin{equation}\nonumber
\begin{aligned}
\mu^{*}(A^c)&amp; \leq \sum_{n=1}^{\infty}\ell(I_n)\\
&amp;= \sum_{n=1}^{\infty}\left(a_n + \frac{\varepsilon}{2^{n+1}} - a_n + \frac{\varepsilon}{2^{n+1}}\right)\\
&amp;= \sum_{n=1}^{\infty}\frac{\varepsilon}{2^{n}}\\
&amp;= \varepsilon.
\end{aligned}
\end{equation}
Since this holds for every $\varepsilon &gt; 0$, we conclude $\mu^{*}(A^c)=0$.
Thus, $\mu^{*}(A)=1-0=1$.&lt;br/&gt;
$\hspace{13.5cm}\blacksquare$
&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;. The rational numbers are dense in $\mathbb{R}$, so every point of $[0,1]$, rational or irrational, is a point of closure of $B$; that is, $\bar{B}=[0,1]$. Since $B\subseteq \bigcup_{k=1}^{n}I_k$, taking closures gives $\bar{B}\subseteq \overline{\bigcup_{k=1}^{n}I_k} = \bigcup_{k=1}^{n}\bar{I}_k$, the last equality because the union is finite. Thus, by the properties of outer measure,
\begin{equation}\nonumber
\begin{aligned}
1=\mu^{*}([0,1])&amp;=\mu^{*}(\bar{B})\leq \mu^{*}\left(\bigcup_{k=1}^{n}\bar{I}_k\right)\\
&amp;\leq \sum_{k=1}^{n}\mu^{*}(\bar{I}_k)=\sum_{k=1}^{n}\mu^{*}(I_k).
\end{aligned}
\end{equation}
Thus,
\begin{equation}\nonumber\sum_{k=1}^{n}\mu^{*}(I_k)\geq 1.
\end{equation}
$\hspace{13.5cm}\blacksquare$
&lt;/li&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;. We need to show that,
\begin{equation}\nonumber
\begin{aligned}
&amp;\mu^{*}(A\cup B)\leq \mu^{*}(B)\\
&amp;\mu^{*}(B)\leq \mu^{*}(A\cup B)
\end{aligned}
\end{equation}
&lt;ol type = &quot;i&quot;&gt;
&lt;li&gt; By subadditivity of the outer measure,
\begin{equation}
\nonumber
\begin{aligned}
\mu^{*}(A\cup B)&amp;\leq \mu^{*}(A)+\mu^{*}(B)\\
&amp;= \mu^{*}(B),\;\text{since}\;\mu^{*}(A)=0.
\end{aligned}
\end{equation}
&lt;/li&gt;
&lt;li&gt;Since $B\subseteq A\cup B$, then from property of outer measure that if $A\subseteq B$, then $\mu^{*}(A)\leq \mu^{*}(B)$. Hence,
\begin{equation}\nonumber
\mu^{*}(B)\leq \mu^{*}(A\cup B)
\end{equation}
&lt;/li&gt;
&lt;/ol&gt;
$\hspace{13.5cm}\blacksquare$
&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;h3&gt;
Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;a href=&quot;http://www.amazon.co.uk/gp/product/013143747X/ref=pd_lpo_sbs_dp_ss_3/275-0027308-5123953?pf_rd_m=A3P5ROKL5A1OLE&amp;amp;pf_rd_s=lpo-top-stripe&amp;amp;pf_rd_r=1GV7YACR2ZFEJSRSFD3N&amp;amp;pf_rd_t=201&amp;amp;pf_rd_p=479289247&amp;amp;pf_rd_i=0135113555&quot; target=&quot;_blank&quot;&gt;Royden, H.L. and Fitzpatrick, P.M. (2010). &lt;i&gt;Real Analysis&lt;/i&gt;. Pearson Education, Inc.&lt;/a&gt; 
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/7457991516473879219/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2014/09/lebesgue-measure-and-outer-measure.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/7457991516473879219'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/7457991516473879219'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2014/09/lebesgue-measure-and-outer-measure.html' title='Lebesgue Measure and Outer Measure Problems'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-875594146541861571</id><published>2014-09-07T11:15:00.000+08:00</published><updated>2014-09-08T23:20:16.290+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Mathematics"/><category scheme="http://www.blogger.com/atom/ns#" term="Real Analysis"/><title type='text'>Translation Invariant of Lebesgue Outer Measure</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
Another proving problem, this time on Real Analysis.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Problem&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
Prove that the Lebesgue outer measure is translation invariant. (Use the property that the length $l$ of an interval is translation invariant.)
&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;h3&gt;
Solution&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;i&gt;Proof&lt;/i&gt;. The outer measure is translation invariant if for $y\in \mathbb{R}$,
\begin{equation}\nonumber
\mu^{*}(A)=\mu^{*}(A+y)
\end{equation}
Hence, we need to show both Case 1: $\mu^{*}(A+y)\leq \mu^{*}(A)$; and Case 2: $\mu^{*}(A)\leq \mu^{*}(A+y)$.&lt;br /&gt;&lt;br /&gt;
&lt;u&gt;Case 1&lt;/u&gt;: Consider a countable collection $\{I_n\}_{n=1}^{\infty}$, and let
\begin{equation}\nonumber
W = \left\{\displaystyle\sum_{n=1}^{\infty}l(I_n)\mid A\subseteq\displaystyle\bigcup_{n=1}^{\infty}I_n\right\}
\end{equation}
Then the outer measure of $A$ is,
\begin{equation}\nonumber
\mu^{*}(A)=\inf\,\{W\}.
\end{equation}
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;
Now take any $x\in W$; then there is a particular collection $\{\hat{I}_n\}$ that covers $A$ with $\displaystyle\sum_{n=1}^{\infty}l(\hat{I}_n)=x$. The translated collection $\{\hat{I}_n+y\}$ covers $A+y$, that is, $A+y\subseteq \displaystyle\bigcup_{n=1}^{\infty}\{\hat{I}_n + y\}$. And from this cover, we obtain the bound
\begin{equation}\nonumber
\begin{aligned}
\mu^{*}(A+y)&amp;amp;\leq\displaystyle\sum_{n=1}^{\infty}l(\hat{I}_n+y)\\
&amp;amp;=\displaystyle\sum_{n=1}^{\infty}l(\hat{I}_n),\;\text{since}\;l\;\text{is translation invariant}\\
&amp;amp;=x.
\end{aligned}
\end{equation}
And therefore, $W\subseteq\left\{\displaystyle\sum_{n=1}^{\infty}l(I_n)\mid A+y\subseteq \displaystyle\bigcup_{n=1}^{\infty}I_n\right\}$; since the infimum over the larger set is no greater, this implies $\mu^{*}(A+y)\leq \mu^{*}(A)$.&lt;br /&gt;&lt;br /&gt;
&lt;u&gt;Case 2&lt;/u&gt;: Using the same flow of reasoning as in Case 1, consider a countable collection $\{I_n\}_{n=1}^{\infty}$, and let
\begin{equation}\nonumber
V = \left\{\displaystyle\sum_{n=1}^{\infty}l(I_n)\mid A+y\subseteq\displaystyle\bigcup_{n=1}^{\infty}I_n\right\}
\end{equation}
Then the outer measure of $A+y$ is,
\begin{equation}\nonumber
\mu^{*}(A+y)=\inf\,\{V\}.
\end{equation}
Now take any $x\in V$; then there is a particular collection $\{\hat{I}_n\}$ that covers $A+y$ with $\displaystyle\sum_{n=1}^{\infty}l(\hat{I}_n)=x$. The translated collection $\{\hat{I}_n+(-y)\}$ covers $A$, that is, $A\subseteq \displaystyle\bigcup_{n=1}^{\infty}\{\hat{I}_n + (-y)\}$. And from this cover, we obtain the bound
\begin{equation}\nonumber
\begin{aligned}
\mu^{*}(A)&amp;amp;\leq\displaystyle\sum_{n=1}^{\infty}l(\hat{I}_n+(-y))\\
&amp;amp;=\displaystyle\sum_{n=1}^{\infty}l(\hat{I}_n),\;\text{since}\;l\;\text{is translation invariant}\\
&amp;amp;=x.
\end{aligned}
\end{equation}
And therefore, $V\subseteq\left\{\displaystyle\sum_{n=1}^{\infty}l(I_n)\mid A\subseteq \displaystyle\bigcup_{n=1}^{\infty}I_n\right\}$; since the infimum over the larger set is no greater, this implies $\mu^{*}(A)\leq \mu^{*}(A+y)$.&lt;br /&gt;&lt;br /&gt;
Since we have shown both cases, then $\mu^{*}(A)=\mu^{*}(A+y).\hspace{3.7cm}\blacksquare$
&lt;/li&gt;
&lt;/ol&gt;
&lt;br /&gt;
&lt;h3&gt;
Reference&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;a href=&quot;http://www.amazon.co.uk/gp/product/013143747X/ref=pd_lpo_sbs_dp_ss_3/275-0027308-5123953?pf_rd_m=A3P5ROKL5A1OLE&amp;amp;pf_rd_s=lpo-top-stripe&amp;amp;pf_rd_r=1GV7YACR2ZFEJSRSFD3N&amp;amp;pf_rd_t=201&amp;amp;pf_rd_p=479289247&amp;amp;pf_rd_i=0135113555&quot; target=&quot;_blank&quot;&gt;Royden, H.L. and Fitzpatrick, P.M. (2010). &lt;i&gt;Real Analysis&lt;/i&gt;. Pearson Education, Inc.&lt;/a&gt; 
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/875594146541861571/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2014/09/translation-invariant-of-lebesgue-outer.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/875594146541861571'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/875594146541861571'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2014/09/translation-invariant-of-lebesgue-outer.html' title='Translation Invariant of Lebesgue Outer Measure'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5979497974446854318.post-1341742472849113803</id><published>2014-09-05T20:55:00.000+08:00</published><updated>2014-09-12T10:57:54.712+08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Image Analysis"/><category scheme="http://www.blogger.com/atom/ns#" term="Mathematica"/><category scheme="http://www.blogger.com/atom/ns#" term="R"/><category scheme="http://www.blogger.com/atom/ns#" term="Signal Processing"/><title type='text'>R: Image Analysis using EBImage</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;
Currently, I am taking &lt;i&gt;Statistics for Image Analysis&lt;/i&gt; in my master&#39;s program, and I have been exploring this topic in R. One package with capabilities in this field is &lt;a href=&quot;http://www.bioconductor.org/packages/release/bioc/html/EBImage.html&quot; target=&quot;_blank&quot;&gt;EBImage&lt;/a&gt; from &lt;a href=&quot;http://www.bioconductor.org/&quot; target=&quot;_blank&quot;&gt;Bioconductor&lt;/a&gt;, which will be showcased in this post.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Installation&lt;/h3&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/83573359af6c57dcea4f.js&quot;&gt;&lt;/script&gt;
For those using Ubuntu, you are likely to encounter this error:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/3bec32e1ad17515d7b3a.js&quot;&gt;&lt;/script&gt;
It has something to do with the &lt;code&gt;tiff.h&lt;/code&gt; &lt;a href=&quot;http://en.wikipedia.org/wiki/Include_directive&quot; target=&quot;_blank&quot;&gt;C header file&lt;/a&gt;, but it&#39;s not that serious, since &lt;a href=&quot;http://mytechscribblings.wordpress.com/&quot; target=&quot;_blank&quot;&gt;mytechscribblings&lt;/a&gt; has an effective &lt;a href=&quot;http://mytechscribblings.wordpress.com/2013/06/28/installing-ebimage-package-for-r-using-rstudio-in-ubuntu/&quot; target=&quot;_blank&quot;&gt;solution&lt;/a&gt; for it; do check that out.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Importing Data&lt;/h3&gt;
To import a raw image, consider the following codes:&lt;br /&gt;
&lt;a name=&#39;more&#39;&gt;&lt;/a&gt;&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/40778c752b73d5de4186.js&quot;&gt;&lt;/script&gt;
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPEXI_-DFfoRDUEqFOSRSBbeXbiZegBTr41GW35WGMp1UFlpg6Pb3BDJzQ1GhYxYe8EHC0YyyNP9drQhiYddjME3qOue2Ej2KX9hscBNjIj358L1jrUnnGpc4a2CxZs2PBC-dak-ZA3ypB/s1600/tinago.JPG&quot; imageanchor=&quot;1&quot; style=&quot;margin-left: auto; margin-right: auto;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPEXI_-DFfoRDUEqFOSRSBbeXbiZegBTr41GW35WGMp1UFlpg6Pb3BDJzQ1GhYxYe8EHC0YyyNP9drQhiYddjME3qOue2Ej2KX9hscBNjIj358L1jrUnnGpc4a2CxZs2PBC-dak-ZA3ypB/s1600/tinago.JPG&quot; height=&quot;300&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Output of &lt;code&gt;display(Image)&lt;/code&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
Yes, this is the photo that we are going to use for our analysis. Needless to say, that&#39;s me and my friends. In the proceeding section we will do image manipulation and other processing.
&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Image Properties&lt;/h3&gt;
So what do we get from our raw image? To answer that, simply run &lt;code&gt;print(Image)&lt;/code&gt;. This will return the properties of the image, including the array of pixel values. With this information, we can apply mathematical and statistical operations to enhance the image.&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/abd3f84e2eb19fea58f9.js&quot;&gt;&lt;/script&gt;
There are two sections (a summary and the array of pixel values) in the above output, with the following entries in the first section:&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Code&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;3&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 1: Information from 1st section of &lt;code&gt;print(Image)&lt;/code&gt;.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;colormode&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Color&lt;/td&gt;&lt;td&gt;The type (Color/Grayscale) of the color of the image.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&quot;alt&quot;&gt;&lt;td&gt;&lt;code&gt;storage.mode&lt;/code&gt;&lt;/td&gt;&lt;td&gt;double&lt;/td&gt;&lt;td&gt;Type of values in the array.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;dim&lt;/code&gt;&lt;/td&gt;&lt;td&gt;1984 1488 3&lt;/td&gt;&lt;td&gt;Dimension of the array, (x, y, z).&lt;/td&gt;&lt;/tr&gt;
&lt;tr class=&quot;alt&quot;&gt;&lt;td&gt;&lt;code&gt;nb.total.frames&lt;/code&gt;&lt;/td&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;Number of channels in each pixel, the z entry in &lt;code&gt;dim&lt;/code&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;nb.render.frames&lt;/code&gt;&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;Number of channels rendered.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;br /&gt;
The second section contains the values obtained by mapping the pixels of the image onto the real line between 0 and 1 (inclusive). The endpoints of this interval [0, 1] correspond to black and white, respectively; hence, pixels with values closer to 0 appear darker, and those closer to 1 appear lighter. And because the pixels are stored in a large array, we can use all of the matrix manipulations available in R for processing.
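To make this concrete, here is a minimal base-R sketch (no EBImage required) of a tiny grayscale &quot;image&quot; as a matrix of intensities in [0, 1], following the 0 = black, 1 = white convention; the values are made up for illustration:

```r
# A tiny 3 x 3 grayscale "image": 0 = black, 1 = white
img = matrix(c(0.0, 0.5, 1.0,
               0.2, 0.5, 0.8,
               0.0, 0.5, 1.0), nrow = 3, byrow = TRUE)

# Because it is an ordinary matrix, any matrix operation applies,
# e.g. the mean intensity of the whole image:
mean(img)  # 0.5
```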
&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
Adjusting Brightness&lt;/h3&gt;
It is best to start with the basics, one of which is brightness. Since the pixels are numeric values, brightness can be adjusted simply with &lt;code&gt;+&lt;/code&gt; or &lt;code&gt;-&lt;/code&gt;:&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/9994581ad659b5fdb124.js&quot;&gt;&lt;/script&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Lighter&lt;/th&gt;&lt;th&gt;Darker&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;3&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 2: Adjusting Brightness.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBhuneKWbkGzwxDMM9MS1e7Xp0ONwJuH7YAilVMQ4RBODJFhVsV5feHiTNfUcNsPj2d6h8FLJ7LHSfsiAKtphV7SzvBUf-8jFqDbKlCYefBSCap2rbYf1MN6MxfxkMCDXItEegr9sHNT2-/s200/Image3.png&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Output of &lt;code&gt;display(Image1)&lt;/code&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;td&gt;&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZdJ7h_xa8fwAD5H23p0QAl68kVJsELDRQ2d9stbqc2L3AKpNI2Ogdbi2CaMYaZ_-B4t9UJdyek_95bGYVED6RKwziV5LEnILWSQikXBDWk0UezwH4nDmTDl1mtNXQ5_Lrf-WFn9w7FWCh/s200/Image2.png&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Output of &lt;code&gt;display(Image2)&lt;/code&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
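Under the hood, adding a constant shifts every pixel toward white and subtracting shifts it toward black. A minimal base-R sketch of the same idea on a plain intensity matrix; the clipping back into [0, 1] is an assumption added here for illustration:

```r
img = matrix(c(0.2, 0.4, 0.6, 0.8), nrow = 2)

# Shift every pixel by a constant, then clip back into [0, 1]
brighten = function(x, c) pmin(pmax(x + c, 0), 1)

lighter = brighten(img,  0.3)  # shifted toward white
darker  = brighten(img, -0.3)  # shifted toward black

lighter[1, 1]  # 0.5
darker[2, 2]   # 0.5
```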
&lt;br /&gt;
&lt;h3&gt;
Adjusting Contrast&lt;/h3&gt;
Contrast can be adjusted using the multiplication operator (&lt;code&gt;*&lt;/code&gt;):
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/5ece4021aeef8c08d38d.js&quot;&gt;&lt;/script&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Low&lt;/th&gt;&lt;th&gt;High&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;3&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 3: Adjusting Contrast.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhHZcTD4WEqWNOttrO0SZ3fscnA3TOyELCs-TzMapMyiyPT2RcApfaBq_l5FeBxRep-ex8nqp6A49L4u5kDHZE0i5w1oz_tGgAFVa-1OjQeN4jl0Ie-t1bgIphYf48OctMAZqey6G4ntJH0/s200/Image4.png&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Output of &lt;code&gt;display(Image3)&lt;/code&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;td&gt;&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4JjxKyTEwCIA5mkrHHBDg44mM_soleomfG0QBP49awf5GORAjENCCPccV2vlRasKhyWnWxHIG7twIEZZppX5-XW3v6T-a5s6RrRfSMlmD0yoOcd1PsxexDVn1pz74tYnnUcEYPxjWhmPm/s200/Image5.png&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Output of &lt;code&gt;display(Image4)&lt;/code&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
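Multiplication scales every pixel, stretching the spread of intensities away from black (factor above 1) or compressing it toward black (factor below 1). A sketch on a plain matrix, again with illustrative clipping:

```r
img = matrix(c(0.2, 0.4, 0.6, 0.8), nrow = 2)

# Scale every pixel by a factor, then clip back into [0, 1]
contrast = function(x, f) pmin(pmax(x * f, 0), 1)

low  = contrast(img, 0.5)  # values huddle near black: lower contrast
high = contrast(img, 2.0)  # values spread out (and clip): higher contrast

range(low)   # 0.1 0.4
range(high)  # 0.4 1.0
```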
&lt;br /&gt;
&lt;h3&gt;
Gamma Correction&lt;/h3&gt;
Gamma correction is the name of a nonlinear operation used to encode and decode luminance or tristimulus values in video or still image systems, defined by the following power-law expression:
\begin{equation}\nonumber
V_{\mathrm{out}} = AV_{\mathrm{in}}^{\gamma}
\end{equation}
where $A$ is a constant and the input and output values are non-negative real values; in the common case of $A = 1$, inputs and outputs are typically in the range $[0, 1]$. A gamma value $\gamma &amp;lt; 1$ is sometimes called an &lt;b&gt;encoding gamma&lt;/b&gt; (Wikipedia, Ref. 1).&lt;br /&gt;
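With $A = 1$ the power law is just elementwise exponentiation of the pixel array, which is what the operator &lt;code&gt;^&lt;/code&gt; does in R. A small sketch of the formula on a few sample intensities:

```r
# V_out = A * V_in ^ gamma, with A defaulting to 1
gamma_correct = function(v, gamma, A = 1) A * v ^ gamma

v = c(0.25, 0.5, 0.75)

gamma_correct(v, 2)    # gamma above 1 darkens midtones: 0.0625 0.2500 0.5625
gamma_correct(v, 0.7)  # encoding gamma below 1 lightens midtones
```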
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/24632941aba052c44641.js&quot;&gt;&lt;/script&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;$\gamma = 2$&lt;/th&gt;&lt;th&gt;$\gamma = 0.7$&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;3&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 4: Adjusting Gamma Correction.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9gq2y6gBLH7BIocxW-ZzcsfvqSQXcF_eqi1-a-OTVKS6ZYig4-FUntAudcUJoTaVxS9UQaMkPiYn02XYCicWCwTGoJHWoWmeCsztiiDHZ8xOV846LtBy07BrFJfKpFthyphenhyphenKu-PQIcM1MiD/s200/Image6.png&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Output of &lt;code&gt;display(Image5)&lt;/code&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;td&gt;&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDfm8AHAZyurELK69vF76Qo7P3V4_sJq5obzq5yFO4hgQcOV33TMkd_1lhE2vyjmqPCx519SX_qh4oJ_f50IhdUp2SGp_NrPfTx6TcvxQLDFJe0fuHiOPiFkx5fCh3ZrFKDur_CPLNE38f/s200/Image7.png&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Output of &lt;code&gt;display(Image6)&lt;/code&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;br /&gt;
&lt;h3&gt;
Cropping&lt;/h3&gt;
Slicing the array of pixels simply means cropping the image.
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/8a5b5cf36dc74be36574.js&quot;&gt;&lt;/script&gt;
&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEDSwuGs0NR7z1hEdQAA49DhOcOvQbjM1GPgPaT6dWwDCPf7ldrDIG8_OJrkfM69-CGZbkw9d9jAnyWNW268slmGaRo2tvi6dWpr0lzdO-MuwR6QbazI_x8TZSmsgbcWFk-NUzhrI6_3HW/s1600/Image8.png&quot; height=&quot;187&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; width=&quot;320&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Output of the above code.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;/div&gt;
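Since cropping is just array slicing, the same works on any matrix; a sketch (the index ranges here are illustrative, not the ones used in the gist above):

```r
img = matrix(runif(100 * 80), nrow = 100, ncol = 80)

# Keep rows 21-60 and columns 11-50: a 40 x 40 crop
crop = img[21:60, 11:50]

dim(crop)  # 40 40
```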
&lt;h3&gt;
Spatial Transformation&lt;/h3&gt;
Spatial manipulations like rotation (&lt;code&gt;rotate&lt;/code&gt;), flipping (&lt;code&gt;flip&lt;/code&gt;), and translation (&lt;code&gt;translate&lt;/code&gt;) are also available in the package. Check this out:
&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/8fde183b0944c1044fb2.js&quot;&gt;&lt;/script&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhWmXdxfB-CSmX-4RsyU-tEe12uZLL-31ap6lx11E4BlO0X2dTgJub9GHQ9xJN7sww6sh3y5V2fm4a7ownC-LxHudDIISVoxSnw0eHUqGPaH-k8pcp-E7kim1CeUl-MZS1g2ZtmWFkDEpH6/s1600/Image9.png&quot; height=&quot;240&quot; width=&quot;320&quot; /&gt;&lt;/div&gt;
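For intuition, the same transformations can be expressed with base-R index gymnastics on a plain matrix (EBImage&#39;s &lt;code&gt;rotate&lt;/code&gt; is more general, interpolating at arbitrary angles; this sketch only covers the 90-degree case):

```r
img = matrix(1:6, nrow = 2)           # a 2 x 3 toy "image"

flip_vertical = img[nrow(img):1, ]    # reverse the rows: vertical flip
rotate90      = t(img)[, nrow(img):1] # transpose, then reverse the
                                      # columns: 90-degree clockwise turn

dim(rotate90)  # 3 2
```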
&lt;br /&gt;
&lt;h3&gt;
Color Management&lt;/h3&gt;
The array of pixels has three axes; in our case its dimension is 1984 x 1488 x 3. The third axis is the slot for the three channels: red, green, and blue (RGB). Hence, transforming the &lt;code&gt;colormode&lt;/code&gt; from &lt;code&gt;Color&lt;/code&gt; to &lt;code&gt;Grayscale&lt;/code&gt; disjoins the three channels from a single rendered frame (three channels for each pixel) into three separate arrays of pixels for the red, green, and blue frames.&lt;br /&gt;
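In array terms the split is just slicing along the third axis. A sketch with a tiny random RGB array (dimensions shrunk from 1984 x 1488 to keep it readable; the grayscale weights are the usual Rec. 601 luminosity weights, an assumption here rather than what EBImage uses):

```r
rgb = array(runif(4 * 3 * 3), dim = c(4, 3, 3))  # x, y, channel

red   = rgb[, , 1]
green = rgb[, , 2]
blue  = rgb[, , 3]

# One common grayscale conversion: a luminosity-weighted average
gray = 0.299 * red + 0.587 * green + 0.114 * blue
dim(gray)  # 4 3
```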
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/7c2348ec3131ae6a173f.js&quot;&gt;&lt;/script&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Original&lt;/th&gt;&lt;th&gt;Red Channel&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;3&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 5: Color Mode Transformation.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlV1YQolUACP35HDj3oWIde02mlfW2NQXcAGfqkWEVHfdg5b36E9eE6A-uyE02_kb81qB8QfXbntuQecyiribslXdIY6Lkl78GF1rcC8D2DphsIWnWCha23jATPVjQoYvbmdf3my5LgYJr/s200/Image1.png&quot; /&gt;&lt;/div&gt;
&lt;/td&gt;&lt;td&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhRCof9diG8VM1MWZqXXVtS676c8OdGY9wmWmQVw2F_D76icPtyFdhatk2LiLszI6iR1TJHAcZtFyG-ruaavGrmBo8TT6DWvo05sIXzb-qQPUtUtrJiZ5TFu8dMQE5QEYABiE5Dm3WX3-hB/s200/Image10.png&quot; /&gt;&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Green Channel&lt;/th&gt;&lt;th&gt;Blue Channel&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmNAFfvSAcukgR3Efi-lS2GIzzhUtKATA06vxpDbxx6lb_nX-CrLm6jewe3KywLuBTNWv9uTSOpRG8XLVcaL4ENH2W3ztalIt-Zje6JI170UEza4G9Yvpmge4tc0wA3KPwWzgRviinNgsu/s200/Image10-g.png&quot; /&gt;&lt;/div&gt;
&lt;/td&gt;
&lt;td&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg92Qst7awrgOp62tE_WECVewnRieGeUgKPqI0nW13SntXBfsaDt62GbulL9J_7wKoR77wbyKZ-qjArqQtZsHqwslL7ZNTdG-UdYDR9k4fteh68qwry03JnFr3b7z1bPTMB02mKK_tciHoK/s200/Image10-b.png&quot; /&gt;&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;br /&gt;
To revert the color mode, simply run&lt;br /&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/2e92358020609764e8a6.js&quot;&gt;&lt;/script&gt;
&lt;h3&gt;
Filtering&lt;/h3&gt;
In this section, we will do smoothing/blurring using a low-pass filter, and edge detection using a high-pass filter. In addition, we will investigate the median filter for removing noise.&lt;br /&gt;
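Both filters are convolutions with a small kernel: a low-pass kernel averages a pixel with its neighbors (blur), while a high-pass kernel subtracts the neighborhood from the center (edges). A naive base-R convolution sketch; EBImage performs this far more efficiently, and the kernels below are illustrative:

```r
# Naive 2D convolution; borders are left at zero for simplicity
convolve2d = function(img, k) {
  r = (nrow(k) - 1) / 2                  # kernel radius (assumes odd size)
  out = matrix(0, nrow(img), ncol(img))
  for (i in (1 + r):(nrow(img) - r)) {
    for (j in (1 + r):(ncol(img) - r)) {
      patch = img[(i - r):(i + r), (j - r):(j + r)]
      out[i, j] = sum(patch * k)
    }
  }
  out
}

low_pass  = matrix(1 / 9, 3, 3)          # 3 x 3 mean blur
high_pass = matrix(c(-1, -1, -1,
                     -1,  8, -1,
                     -1, -1, -1), 3, 3)  # classic edge-detection kernel

img   = matrix(runif(25), 5, 5)
blur  = convolve2d(img, low_pass)
edges = convolve2d(img, high_pass)       # near zero in flat regions
```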
&lt;br /&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Low-Pass (Blur)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 6: Image Filtering.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-xVmQCajznGSO02H1cJICfhkXK4UNXbQMY3DeUcWsS2-IniFlXtYYjOyneWST63H3gXVTzwFAm3Q6QyJXFgSrLafsu7-yg4HF_YeLVgvs5EUW7aly3YzKWhCb-QYgLg5mb2lMD2JID0mG/s320/Image11.png&quot; /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/8c0f1a2ed48383277f8f.js&quot;&gt;&lt;/script&gt;&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;High Pass&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYgRqNz7FC1-uL8nHRi_Cd8o0Esld1dnHN1fzVAH33C4g3_1FfqoGJ0LhbG_lDCNY9aIgfix_mS7CvDP308gIBxVhQ7w0G6_yDU6EQcaKfAUix-x6VBUTJO4or2PizbR-NNPMR6q_Cfc79/s320/Image12.png&quot; /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;script src=&quot;https://gist.github.com/alstat/1ef92750b21735e39366.js&quot;&gt;&lt;/script&gt;
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;br /&gt;
&lt;div class=&quot;datagrid&quot;&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Original&lt;/th&gt;&lt;th&gt;Filtered&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tfoot&gt;
&lt;tr&gt;&lt;td colspan=&quot;3&quot; style=&quot;text-align: center;&quot;&gt;&lt;div id=&quot;paging&quot;&gt;
&lt;i&gt;Table 7: Median Filter.&lt;/i&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tfoot&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiD-ltkZKGoEewCHPpjj7n81VG3e-pB79XrWIedRnH-XXJuQnSCghPr-CoRnO9JI_4xfQA24lH1s24uzr1B44D9Prud_VDp-MBh1ouPn67PK8mR-ev5J17HPtcOwlcBtKjiS-E0fQ77H__q/s1600/peppersalt.jpg&quot; height=&quot;200&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; width=&quot;200&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;From Google, Link &lt;a href=&quot;http://my.fit.edu/~vkepuska/ece3552/esp_book/adsp/chap10/image_files/noisy_image/peppersalt.bmp&quot; target=&quot;_blank&quot;&gt;Here&lt;/a&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;div class=&quot;separator&quot; style=&quot;margin-left: 1em; margin-right: 1em; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;td&gt;&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot; style=&quot;margin-left: auto; margin-right: auto; text-align: center;&quot;&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td style=&quot;text-align: center;&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXKuanmODf2FVYMWCp8mA9WFeHhNGzjzYCkjVj1ea92BLATQVcAr2h6mej7QTO5jHUB4KQlUZSCBXRBRz8F5YGg5WPATYNsWTXVXRN_Fga6iT3wloMyTRUGq-gByaoK0yS-kiJ1geufuDb/s1600/psfiltered.png&quot; height=&quot;200&quot; style=&quot;margin-left: auto; margin-right: auto;&quot; width=&quot;200&quot; /&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;tr-caption&quot; style=&quot;text-align: center;&quot;&gt;Output of &lt;code&gt;display(medFltr)&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;div class=&quot;separator&quot; style=&quot;margin-left: 1em; margin-right: 1em; text-align: center;&quot;&gt;
&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;br /&gt;
&lt;script src=&quot;https://gist.github.com/alstat/d0630e5c6420fc6cad9a.js&quot;&gt;&lt;/script&gt;
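The median filter replaces each pixel with the median of its neighborhood, which is what removes isolated salt-and-pepper specks. A naive sketch on a first (3 x 3) neighborhood, ignoring borders for simplicity:

```r
# Replace each interior pixel with the median of its 3 x 3 neighborhood
median_filter3 = function(img) {
  out = img
  for (i in 2:(nrow(img) - 1)) {
    for (j in 2:(ncol(img) - 1)) {
      out[i, j] = median(img[(i - 1):(i + 1), (j - 1):(j + 1)])
    }
  }
  out
}

img = matrix(0.5, 5, 5)
img[3, 3] = 1                    # an isolated "salt" speck
median_filter3(img)[3, 3]        # 0.5: the speck is gone
```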
For comparison, I ran a median filter on the first neighborhood in Mathematica, and I got this:
&lt;br /&gt;
&lt;div class=&quot;separator&quot; style=&quot;clear: both; text-align: center;&quot;&gt;
&lt;img border=&quot;0&quot; src=&quot;https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9MU8ILvWLZR_xoHY6K0mtUSgwar5iC-VgjJ1ZDa0JYbIi5hWQkLRgggPoKioCD0bQrnhTUpU4Ce3pE4hKda9NGTsXHkTqL3jS4Ur1Rgp-Qn2oBtC-SvcUPbyPOpgQBqWIEuBB3TiY7Z4b/s1600/M10MedFilter.png&quot; /&gt;&lt;/div&gt;
Clearly, Mathematica gives better enhancement than R for this particular filter. But R already has a good foundation, as we witnessed with EBImage. There are still lots of interesting functions in the said package worth exploring, and I suggest you check them out.&lt;br /&gt;
&lt;br /&gt;
We will stop here for the meantime, hoping to play more with this topic in a succeeding post.&lt;br /&gt;
&lt;br /&gt;
&lt;h3&gt;
References&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Gamma_correction&quot; target=&quot;_blank&quot;&gt;Gamma Correction&lt;/a&gt;. &lt;i&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Main_Page&quot; target=&quot;_blank&quot;&gt;Wikipedia&lt;/a&gt;&lt;/i&gt;. Retrieved August 31, 2014.&lt;/li&gt;
&lt;li&gt;Gregoire Pau, Oleg Sklyar, Wolfgang Huber (2014). &lt;i&gt;&lt;a href=&quot;http://www.bioconductor.org/packages/release/bioc/vignettes/EBImage/inst/doc/EBImage-introduction.pdf&quot; target=&quot;_blank&quot;&gt;Introduction to EBImage, an image processing and analysis toolkit for R&lt;/a&gt;&lt;/i&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</content><link rel='replies' type='application/atom+xml' href='http://alstatr.blogspot.com/feeds/1341742472849113803/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://alstatr.blogspot.com/2014/09/r-image-analysis-using-ebimage.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/1341742472849113803'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5979497974446854318/posts/default/1341742472849113803'/><link rel='alternate' type='text/html' href='http://alstatr.blogspot.com/2014/09/r-image-analysis-using-ebimage.html' title='R: Image Analysis using EBImage'/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPEXI_-DFfoRDUEqFOSRSBbeXbiZegBTr41GW35WGMp1UFlpg6Pb3BDJzQ1GhYxYe8EHC0YyyNP9drQhiYddjME3qOue2Ej2KX9hscBNjIj358L1jrUnnGpc4a2CxZs2PBC-dak-ZA3ypB/s72-c/tinago.JPG" height="72" width="72"/><thr:total>0</thr:total></entry></feed>