The post Distances between observations in two groups appeared first on The DO Loop.

The answer is yes, and this problem occurs frequently in applications. Typically the first group contains the locations of randomly placed individuals and the second contains the locations of resources. For each individual, you want to find the closest resources. Examples include:

- The first group contains the positions of houses. The second group contains the locations of stores. A retailer might want to identify the stores that are closest to customers, so that shipping costs are minimized.
- The first group contains the locations of vehicle accidents on a highway. The second group contains the locations of hospitals that can treat the injured victims.
- The first group contains the location of cell phone users. The second group contains the locations of cell towers.

This article solves the problem in a straightforward way: compute all the pairwise distances between the points in each group. Distances within groups are not needed, so they are not computed.


I previously showed how to compute the pairwise distances between points in different sets. Let S be a set of *n* d-dimensional points and let R be another set of *m* points. The PairwiseDist function in SAS/IML (shown below) returns an *n* x *m* matrix, D, of distances such that D[i,j] is the distance from the i_th point in S to the j_th point in R. The PairwiseNearestNbr function uses the same algorithm that was used to find nearest neighbors, so I will not re-explain how the function works. Suffice it to say that it returns two matrices: a matrix of row numbers and a matrix of distances.

```
proc iml;
/* http://blogs.sas.com/content/iml/2013/03/27/compute-distance.html */
/* Compute Euclidean distance between points in S and points in R.
   S is an n x d matrix, where each row is a point in d dimensions.
   R is an m x d matrix.
   The function returns the n x m matrix of distances, D, such that
   D[i,j] is the distance between S[i,] and R[j,]. */
start PairwiseDist(S, R);
   if ncol(S)^=ncol(R) then return (.);   /* different dimensions */
   n = nrow(S);  m = nrow(R);
   idx = T(repeat(1:n, m));               /* index matrix for S */
   jdx = shape(repeat(1:m, n), n);        /* index matrix for R */
   diff = S[idx,] - R[jdx,];
   return( shape( sqrt(diff[,##]), n ) ); /* sqrt(sum of squares) */
finish;

/* Compute indices (row numbers) of k nearest neighbors.
   INPUT:  S    an (n x d) data matrix
           R    an (m x d) matrix of reference points
           k    specifies the number of nearest neighbors (k>=1)
   OUTPUT: idx  an (n x k) matrix of row numbers. idx[,j] contains
                the row numbers (in R) of the j_th closest elements to S
           dist an (n x k) matrix. dist[,j] contains the distances
                between S and the j_th closest elements in R
*/
start PairwiseNearestNbr(idx, dist, S, R, k=1);
   n = nrow(S);
   idx = j(n, k, .);  dist = j(n, k, .);
   D = PairwiseDist(S, R);       /* n x m */
   do j = 1 to k;
      dist[,j] = D[ ,><];        /* smallest distance in each row */
      idx[,j] = D[ ,>:<];        /* column of smallest distance in each row */
      if j < k then do;          /* prepare for next closest neighbors */
         ndx = sub2ndx(dimension(D), T(1:n)||idx[,j]);
         D[ndx] = .;             /* set elements to missing */
      end;
   end;
finish;
```
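If you prefer to prototype the idea outside SAS, the pairwise-distance computation is a one-line broadcast in NumPy. This sketch is not part of the original program, and the two small point sets are hypothetical:

```python
import numpy as np

def pairwise_dist(S, R):
    """Return the n x m matrix D with D[i,j] = Euclidean distance
    from row i of S to row j of R (the same quantity as PairwiseDist)."""
    S = np.asarray(S, dtype=float)           # n x d
    R = np.asarray(R, dtype=float)           # m x d
    diff = S[:, None, :] - R[None, :, :]     # n x m x d array of differences
    return np.sqrt((diff**2).sum(axis=2))    # n x m matrix of distances

S = [[0.0, 0.0], [3.0, 4.0]]                 # two points
R = [[0.0, 0.0], [0.0, 4.0], [3.0, 0.0]]     # three reference points
D = pairwise_dist(S, R)
print(D)    # D[1,0] = 5 because the distance from (3,4) to (0,0) is 5
```

The broadcast builds the full n x m x d array of coordinate differences, which mirrors the `S[idx,] - R[jdx,]` trick in the SAS/IML module.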

To illustrate how you can use these functions, consider two sets S and R. The set R is the set of resource locations (stores, hospitals, towers, ...). For this example, R contains four points that are the vertices of a square of side length 2 centered at the origin. For each point in S, the following SAS/IML statements compute the closest vertex in R and the second-closest vertex in R:

```
/* R = vertices of a square */
R = {-1 -1,      /* Pt 1: lower left corner  */
      1 -1,      /* Pt 2: lower right corner */
      1  1,      /* Pt 3: upper right corner */
     -1  1};     /* Pt 4: upper left corner  */
S = {-0.3 -0.2,  /* points inside square */
      0.9  0.6,
     -0.5  0.8};
k = 2;           /* find nearest and second nearest points */
run PairwiseNearestNbr(idx, dist, S, R, k);
print idx[c={"Closest" "2nd Closest"} r=("S1":"S3")];
```

The first column in the table shows the indices (row numbers) of the vertices that are nearest to each point of S. The second column shows the second-closest vertices. For example, Pt 1 (the lower left corner) is the nearest vertex to the first row of S, and Pt 4 (the upper left corner) is the second nearest. In a similar way, Pt 3 is the closest vertex to the second row of S, and Pt 4 is the closest vertex to the third row of S.
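You can verify the table by hand, or with a few lines of NumPy (a cross-check, not part of the original post), using the same vertices R and points S:

```python
import numpy as np

R = np.array([[-1., -1.], [1., -1.], [1., 1.], [-1., 1.]])  # Pt 1..Pt 4
S = np.array([[-0.3, -0.2], [0.9, 0.6], [-0.5, 0.8]])

D = np.sqrt(((S[:, None, :] - R[None, :, :])**2).sum(axis=2))  # 3 x 4 distances
order = np.argsort(D, axis=1)       # columns sorted by distance, per row
closest = order[:, 0] + 1           # 1-based indices, to match the SAS/IML output
second  = order[:, 1] + 1
print(closest.tolist(), second.tolist())   # [1, 3, 4] [4, 2, 3]
```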

The PairwiseNearestNbr module also returns the `dist` matrix, which contains the corresponding distances.
The i_th row of `dist` contains the distances from the i_th row of S to the two closest vertices.

In a similar way, you can generate random points in the square and color each point in a scatter plot according to the nearest point in the reference set. The ODS GRAPHICS statement with ATTRPRIORITY=NONE forces the ODS style to cycle through colors and symbols.

```
ods graphics / attrpriority=NONE width=400px height=400px; /* cycle symbols and colors */
/* Generate 500 random points in square [-1,1] x [-1,1] */
call randseed(12345);
S = R // randfun({500 2}, "Uniform", -1, 1);
k = 2;
run PairwiseNearestNbr(idx, dist, S, R, k);
Corner = idx[,1];    /* index of nearest vertex */
title "Points colored by nearest corner";
call scatter(S[,1], S[,2]) group=Corner grid={x y} procopt="aspect=1";
```

The scatter plot shows what you already knew: points in the first quadrant (green Xs) are closest to the vertex at (1,1). Points in the second quadrant (brown triangles) are closest to the vertex at (-1, 1), and so forth. Any points that are equidistant from two or more vertices (such as the origin) are assigned one of the colors arbitrarily. The points were sorted (not shown) so that the order in the legend agrees with the order of the points in R.

The call to the PairwiseNearestNbr subroutine used *k*=2, which requests the closest and the second closest points. The second-closest indices are located in the second column of the `idx` matrix. If you change the title and set `Corner = idx[,2]`, you can create a scatter plot in which each point is colored by the *second-closest* vertex. That coloration is shown in the scatter plot below.
See if you can understand why each marker in the following scatter plot is colored the way it is. For example, for points in the first quadrant, the second-closest vertex is either the second vertex (1, -1) or the fourth vertex (-1, 1).

In summary, last week I showed how you can find the nearest neighbors among a single set of observations. This article solves a similar problem in which observations are split into two different groups. The functions in this article find the nearest observation in a "reference" or "resource" group for each observation in another group.

I'll conclude by mentioning that you can use these computations to compute the nearest distance between two groups of observations. Simply use *k*=1 and take the minimum distance that is computed by the PairwiseNearestNbr subroutine. This gives the smallest distance between any point in the first group and any point in the second group.

tags: Data Analysis


The post Create an ogive in SAS appeared first on The DO Loop.


An ogive (rhymes with "slow jive") is also called a cumulative histogram. You can create an ogive from a histogram by accumulating the frequencies (or relative frequencies) in each histogram bin. The height of an ogive curve at *x* is found by summing the heights of the histogram bins to the left of *x*.

A histogram estimates the density of a distribution; the ogive estimates the cumulative distribution. Both are easy to construct by hand. Both are coarse estimates that depend on your choice of bin width and anchor position.

Ogives are not used much by professional statisticians because modern computers make it easy to compute and visualize the exact empirical cumulative distribution function (ECDF). However, if you are a student learning to analyze data by hand, an ogive is an easy way to approximate the ECDF from binned data. They are also important if you do not have access to the original data, but you have a histogram that appeared in a published report. (See "How to approximate a distribution from published quantiles.")

To demonstrate the construction of an ogive, let's consider the distribution of the MPG_CITY variable in the Sashelp.Cars data set. This variable contains the reported fuel efficiency (in miles per gallon) for 428 vehicle models. The following call to PROC UNIVARIATE in Base SAS uses the OUTHIST= option in the HISTOGRAM statement to create a data set that contains the frequencies and relative frequencies of each bin. By default, the frequencies are reported for the midpoints of the intervals. To create an ogive you need the *endpoints* of each bin, so use the ENDPOINTS option as follows:

```
proc univariate data=sashelp.cars;
   var mpg_city;
   histogram mpg_city / grid vscale=proportion ENDPOINTS OUTHIST=OutHist;
   /* cdfplot mpg_city / vscale=proportion; */  /* optional: create an ECDF plot */
run;
```

The histogram shows that most vehicles get between 15 and 25 mpg in the city. The distribution is skewed to the right, with a few vehicles getting as much as 59 or 60 mpg. A few gas-guzzling vehicles get less than 15 mpg.

You can construct an ogive from the relative frequencies in the 11 histogram bins.
The height of the ogive at *x*=10 (the leftmost endpoint in the histogram) is zero. The height at *x*=15 is the height of the first bar. The height at *x*=20 is the sum of the heights of the first two histogram bars, and so on.

Each row in the OutHist data set contains a left-hand endpoint and the relative frequency (height) of the bar.
However, to construct an ogive, you need to associate the bar height with the *right-hand* endpoints. This is because at the left-hand endpoint none of the density for the bin has accumulated, and for the right-hand endpoint all of the density has accumulated.

Consequently, to construct an ogive from the OUTHIST= data set, you can do the following:

- Associate zero with the leftmost endpoint of the bins.
- Adjust the counts and proportions in the OutHist data so that they are associated with the right-hand endpoint of each bin. You can use the LAG function to do this.
- Accumulate the relative frequencies in each bin to form the cumulative frequencies.
- Add a new observation to the OutHist data that contains the rightmost endpoint of the bins.
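The same construction is easy to sketch outside SAS. The following Python/NumPy snippet (not part of the original post; the data are a hypothetical stand-in for the MPG_City values) bins the data, accumulates the relative frequencies, and associates zero with the leftmost endpoint:

```python
import numpy as np

rng = np.random.default_rng(1)
mpg = rng.normal(20, 5, size=428)     # hypothetical stand-in for MPG_City

counts, edges = np.histogram(mpg, bins=11)   # 11 bin counts, 12 endpoints
cumprop = np.concatenate(([0.0], np.cumsum(counts) / counts.sum()))

# cumprop[i] is the proportion of data to the LEFT of edges[i]:
# 0 at the leftmost endpoint, 1 at the rightmost endpoint, and the
# bar heights attach to the right-hand endpoint of each bin.
for x, p in zip(edges, cumprop):
    print(f"{x:7.2f}  {p:.3f}")
```

Plotting `cumprop` against `edges` (with markers at each endpoint) gives the ogive.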

The following SAS DATA step carries out these adjustments:

```
data Ogive;
set outhist end=EOF;
ogiveX = _MinPt_;        /* left endpoint of bin */
dx = dif(ogiveX);        /* compute bin width */
prop = lag(_OBSPCT_);    /* move relative frequency to RIGHT endpoint */
if _N_=1 then prop = 0;  /* replace missing value by 0 for first obs */
cumProp + prop/100;      /* accumulate proportions */
output;
if EOF then do;          /* append RIGHT endpoint of final bin */
   ogiveX = ogiveX + dx;
   cumProp = 1;
   output;
end;
drop dx _:;              /* drop variables that begin with underscore */
run;
```

The Ogive data set contains all the information that you need to graph an ogive. The following call to PROC SGPLOT uses a VLINE statement, which treats the endpoints of the bins as discrete values. You could also use the SERIES statement, which treats the endpoints as a continuous variable, but might not put a tick mark at each bin endpoint.

```
title "Cumulative Distribution of Binned Values (Ogive)";
proc sgplot data=Ogive;
   vline OgiveX / response=cumProp markers;
   /* series x=_minpt_ y=cumProp / markers; */
   xaxis grid label="Miles per Gallon (City)";
   yaxis grid values=(0 to 1 by 0.1) label="Cumulative Proportion";
run;
```

You can use the graph to estimate the percentiles of the data. For example:

- The 20th percentile is approximately 17 because the curve appears to pass through the point (17, 0.20). In other words, about 20% of the vehicles get 17 mpg or less.
- The 50th percentile is approximately 19 because the curve appears to pass through the point (19, 0.50).
- The 90th percentile is approximately 27 because the curve appears to pass through the point (27, 0.90). Only 10% of the vehicles have a fuel efficiency greater than 27 mpg.

You might wonder how well the ogive approximates the empirical CDF. The following graph overlays the ogive and the ECDF for these data. You can see that the two curves agree closely at the ogive values, shown by the markers. However, there is some deviation because the ogive assumes a linear accumulation (a uniform distribution) of data values within each histogram bin. Nevertheless, this coarse piecewise linear curve that is based on binned data does a good job of showing the basic shape of the empirical cumulative distribution.

tags: Data Analysis


The post Simulate data from a generalized Gaussian distribution appeared first on The DO Loop.

The generalized Gaussian distribution has a standardized probability density of the form f(*x*) = B exp( -|A*x*|^{α} ), where A(α) and B(α) are known functions of the exponent parameter α > 0. When 0 < α < 2, the generalized Gaussian distribution (GGD) is a heavy-tailed distribution that has finite moments. The distribution has applications in finance and signal processing.
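As a sanity check on the formulas for A(α) and B(α), you can verify numerically that the density integrates to 1 for any α, and that α=2 gives the standard normal density. Here is a sketch in Python/NumPy (not part of the original SAS program):

```python
import numpy as np
from math import gamma, sqrt, pi

def ggd_pdf(x, alpha):
    """Standardized GGD density f(x) = B*exp(-|A*x|**alpha), mu=0, sigma=1."""
    A = sqrt(gamma(3/alpha) / gamma(1/alpha))
    B = (alpha/2) / gamma(1/alpha) * A
    return B * np.exp(-(A * np.abs(x))**alpha)

x = np.linspace(-20, 20, 400001)
totals = {}
for alpha in (0.5, 1, 2, 5):
    f = ggd_pdf(x, alpha)
    totals[alpha] = float(np.sum(0.5*(f[1:] + f[:-1]) * np.diff(x)))  # trapezoid rule
print(totals)   # each total is very close to 1

# alpha=2 reduces to the standard normal density: f(0) = 1/sqrt(2*pi)
print(abs(ggd_pdf(0.0, 2) - 1/sqrt(2*pi)))
```

For α=2 the constants reduce to A = 1/√2 and B = 1/√(2π), which is exactly the standard normal density.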

The following SAS statements evaluate the GGD density function for four values of the shape parameter α. The SGPLOT procedure graphs the density functions (adapted from Nardon and Pianca (2009)):

```
data GenGauss;
mu=0; sigma=1;    /* location=mu and scale=sigma */
do alpha = 0.5, 1, 2, 5;
   A = sqrt( Gamma(3/alpha) / Gamma(1/alpha) ) / sigma; /* A(alpha, sigma) */
   B = (alpha/2) / Gamma(1/alpha) * A;  /* B(alpha, sigma) = normalizing constant */
   do x = -3 to 3 by 0.05;
      pdf = B*exp(-(A*abs(x-mu))**alpha);
      output;
   end;
end;
run;

title "Generalized Gaussian Densities";
title2 "(*ESC*){unicode mu}=0; (*ESC*){unicode sigma}=1";
proc sgplot data=GenGauss;
   series x=x y=pdf / group=alpha;
   yaxis grid label="Density of Generalized Gaussian";
   keylegend / across=1 opaque location=inside position=right;
run;
```

The density for α=1/2 is sharply peaked and has heavy tails. The distribution for α=1 is the Laplace distribution, which also has a sharp peak at the mean. The distribution for α=2 is the standard normal distribution. For α=5, the distribution is platykurtic: it has a broad flat central region and thin tails.

This article uses only the standardized distribution that has zero mean and unit variance. However, the functions are all written to support a location parameter μ and a scale parameter σ.

Nardon and Pianca (2009) describe an algorithm for simulating random variates from the generalized Gaussian distribution: simulate from a gamma distribution, raise that variate to a power, and then randomly multiply by ±1. You can implement the simulation in the SAS DATA step or in the SAS/IML language. The following SAS/IML program defines a function for generating random GGD values:

```
proc iml;
start Rand_GenGauss(N, mu=0, sigma=1, alpha=0.5);
   n1 = N[1];  n2 = 1;         /* default dimensions for returned matrix */
   if nrow(N)>1 | ncol(N)>1 then n2 = N[2];  /* support n1 x n2 matrices */
   x = j(n1,n2);  sgn = j(n1,n2);
   /* simulate for mu=0 and sigma=1, then scale and translate */
   A = sqrt( Gamma(3/alpha) / Gamma(1/alpha) );
   b = A##alpha;
   call randgen(x, "Gamma", 1/alpha, 1/b);   /* X ~ Gamma(1/alpha, 1/b) */
   call randgen(sgn, "Bernoulli", 0.5);
   sgn = -1 + 2*sgn;                         /* random {-1, 1} */
   return mu + sigma * sgn # x##(1/alpha);
finish;

/* try it out: simulate 10,000 random variates */
call randseed(12345);
x = Rand_GenGauss(10000, 0, 1, 1/2);  /* X ~ GGD(mu=0, sigma=1, alpha=1/2) */
title "Generalized Gaussian Distribution";
title2 "mu=0, sigma=1, alpha=1/2";
call histogram(x);
```

For α=1/2, most of the simulated values are near 0, which is the mean and mode. The bulk of the remaining values are in the interval [-5,5]. However, a very small percentage (about 0.5%) are more extreme than these values. For this sample, the range of the simulated data is about [-14 , 11], so you can see that this distribution is useful for simulations in which the magnitude of errors can be more extreme than normal errors.
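The three-step algorithm (gamma draw, power transform, random sign) is easy to check in another language. The following Python/NumPy sketch (an illustration, not the original SAS code) simulates a large sample and confirms that the standardized GGD has mean 0 and unit variance:

```python
import numpy as np
from math import gamma, sqrt

def rand_gen_gauss(n, alpha=0.5, mu=0.0, sigma=1.0, seed=None):
    """Gamma draw -> power transform -> random sign (the algorithm above)."""
    rng = np.random.default_rng(seed)
    A = sqrt(gamma(3/alpha) / gamma(1/alpha))
    b = A**alpha
    g = rng.gamma(shape=1/alpha, scale=1/b, size=n)  # G ~ Gamma(1/alpha, 1/b)
    sgn = rng.choice([-1.0, 1.0], size=n)            # random +/- 1
    return mu + sigma * sgn * g**(1/alpha)

x = rand_gen_gauss(200_000, alpha=0.5, seed=12345)
print(round(x.mean(), 3), round(x.var(), 3))   # near 0 and 1
```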

The cumulative distribution function for the generalized Gaussian distribution does not have a closed-form solution in terms of elementary functions. However, there are a few values of α for which the equations simplify, including α=1/2. Chapeau-Blondeau and Monir (2002) note that the generalized Gaussian distribution with α=1/2 is especially useful in practice because its quantile function (inverse CDF) can be explicitly expressed in terms of the Lambert W function.

The CDF for the GGD with α=1/2 can be computed separately for *x* ≤ 0 and for *x* > 0, as shown below:

```
/* CDF for generalized Gaussian distribution with alpha=1/2 */
start CDF_GenGaussHalf(x);
   y = sqrt(abs(2*sqrt(30)*x));
   cdf = 0.5*(1+y)#exp(-y);   /* CDF for x <= 0 */
   idx = loc(x>0);            /* if x > 0, reflect: CDF(x) = 1 - CDF(-x) */
   if ncol(idx)>0 then do;
      cdf[idx] = 1 - cdf[idx];
   end;
   return cdf;
finish;

p = CDF_GenGaussHalf(-5);   /* P(X < -5) for X ~ GGD(0,1, alpha=1/2) */
print p;                    /* 0.0025654 */
```

You can use the CDF function to evaluate the probability that a random GGD observation is less than -5. The answer is 0.0026. By symmetry, the probability that a random variate is outside of [-5,5] is 0.0052, which agrees with the empirical proportion in the random sample in the previous section.
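A quick cross-check of the closed-form CDF outside SAS (a Python sketch, using the same formula with A = 2√30):

```python
from math import sqrt, exp

def cdf_gen_gauss_half(x):
    """CDF of the standardized GGD with alpha=1/2, where A = 2*sqrt(30)."""
    y = sqrt(abs(2 * sqrt(30) * x))
    left = 0.5 * (1 + y) * exp(-y)        # CDF value for x <= 0
    return left if x <= 0 else 1 - left   # reflect for x > 0

p = cdf_gen_gauss_half(-5)
print(p)                            # ~0.0025654, as in the text
print(cdf_gen_gauss_half(5) + p)    # symmetry: exactly 1
```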

The inverse CDF for the GGD with α=1/2 can be implemented by calling the Lambert W function. (Technically, the lower branch of the Lambert W function.) You can download functions that implement the Lambert W function in SAS/IML. The following SAS/IML statements assume that you have downloaded the Lambert W functions and stored the modules, so that they are available by using the LOAD statement. Just as the CDF was defined in two parts, the inverse CDF is defined in two parts.

```
/* Use Lambert W function to compute the quantile function (inverse CDF)
   for generalized Gaussian distribution with alpha=1/2 */
load module=(LambertWm);   /* load function for the Lambert W function */
start Quantile_GenGaussHalf(x);
   q = j(nrow(x), ncol(x), 0);
   idx = loc(x<0.5);          /* invert CDF for x < 0.5 */
   if ncol(idx)>0 then do;
      y = -2 * x[idx] / exp(1);
      q[idx] = -1/(2*sqrt(30)) * (1 + LambertWm(y))##2;
   end;
   if ncol(idx)=nrow(x)*ncol(x) then return q;   /* all found */
   idx = loc(x=0.5);          /* quantile for x=0.5 */
   if ncol(idx)>0 then q[idx] = 0;
   idx = loc(x>0.5);          /* invert CDF for x > 0.5 */
   if ncol(idx)>0 then do;
      y = -2 * (1-x[idx]) / exp(1);
      q[idx] = 1/(2*sqrt(30)) * (1 + LambertWm(y))##2;
   end;
   return q;
finish;

q = Quantile_GenGaussHalf(0.975);   /* find q such that P(X < q) = 0.975 */
print q;                            /* 2.0543476 */
```

The output (not shown) is q = 2.05, which means that the interval [-2.05, 2.05] contains 95% of the probability for the standardized GGD when α=1/2. Compare this interval with the familiar interval [-1.96, 1.96], which contains 95% of the probability for the standard normal distribution.
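If you don't have the SAS/IML Lambert W modules handy, the lower branch W₋₁ is easy to approximate with a few Newton iterations. The following Python sketch (a self-contained illustration with a hand-rolled Lambert W solver, not the downloadable SAS module) reproduces the quantile:

```python
from math import sqrt, exp, log, e

def lambert_w_lower(y):
    """Lower branch W_{-1}(y): solve w*exp(w) = y for w <= -1, -1/e < y < 0."""
    w = log(-y) - log(-log(-y))    # asymptotic starting guess
    for _ in range(50):            # Newton iterations on f(w) = w*exp(w) - y
        ew = exp(w)
        step = (w * ew - y) / ((w + 1) * ew)
        w -= step
        if abs(step) < 1e-14:
            break
    return w

def quantile_gen_gauss_half(prob):
    """Quantile function of the standardized GGD with alpha = 1/2."""
    if prob == 0.5:
        return 0.0
    tail = min(prob, 1 - prob)
    q = (1 + lambert_w_lower(-2 * tail / e))**2 / (2 * sqrt(30))
    return q if prob > 0.5 else -q

q = quantile_gen_gauss_half(0.975)
print(q)   # ~2.054, so [-2.05, 2.05] holds 95% of the probability
```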

To learn more about the generalized Gaussian distribution, with an emphasis on the parameter value α=1/2, see the following well-written papers, which expand on the discussion in this blog post:

- F. Chapeau-Blondeau and A. Monir (2002), "Numerical Evaluation of the Lambert W Function and Application to Generation of Generalized Gaussian Noise With Exponent 1/2," *IEEE Transactions on Signal Processing*.
- Nardon and Pianca (2009; preprint 2006), "Simulation techniques for generalized Gaussian densities," *Journal of Statistical Computation and Simulation*.

tags: Simulation


The post The distribution of nearest neighbor distances appeared first on The DO Loop.

The distribution of the nearest-neighbor distances provides information about the process that generated the spatial data. Let's look at two examples: clustered data for trees in a forest and uniformly distributed (simulated) data.

The Sashelp.Bei data contains the locations for 3,604 trees in a forest. A scatter plot of the data is shown to the left. The trees appear to be clustered in certain regions, whereas other regions contain no trees. This indicates that the trees are not uniformly distributed throughout the region.

Let's use the NearestNbr module from my previous blog post to compute the nearest neighbor distances for these tree locations. The following program saves the distances to a SAS data set named Dist_Trees. For comparison, the program also generates 3,604 random (uniformly distributed) points in the same rectangular region. The nearest neighbor distances for these simulated points are written to a data set named Dist_Unif.

```
proc iml;
use Sashelp.Bei where(Trees=1);
read all var {x y} into T;
close Sashelp.Bei;

/* See http://blogs.sas.com/content/iml/2016/09/14/nearest-neighbors-sas.html */
start NearestNbr(idx, dist, X, k=1);
   N = nrow(X);  p = ncol(X);
   idx = j(N, k, .);  dist = j(N, k, .);
   D = distance(X);
   D[do(1, N*N, N+1)] = .;    /* set diagonal elements to missing */
   do j = 1 to k;
      dist[,j] = D[ ,><];     /* smallest distance in each row */
      idx[,j] = D[ ,>:<];     /* column of smallest distance in each row */
      if j < k then do;       /* prepare for next closest neighbors */
         ndx = sub2ndx(dimension(D), T(1:N)||idx[,j]);
         D[ndx] = .;          /* set elements to missing */
      end;
   end;
finish;

run NearestNbr(nbrIdx, dist, T);
create Dist_Trees var "Dist";  append;  close;   /* tree distances */

/* compare with uniform */
call randseed(12345);
N = nrow(T);
/* generate random uniform points in [0,1000] x [0,500] */
X = 1000*randfun(N, "uniform") || 500*randfun(N, "uniform");
run NearestNbr(nbrIdx, dist, X);
create Dist_Unif var "Dist";  append;  close;    /* random uniform distances */
QUIT;
```

You can use PROC UNIVARIATE to compare the two distributions of distances. The following DATA step concatenates the distances and adds an indicator variable that has the value "Uniform" or "Trees." PROC UNIVARIATE creates a graph of the empirical cumulative distribution function for each set of nearest-neighbor distances:

```
data All;
set Dist_Unif(in=inUnif) Dist_Trees;
if inUnif then Group = "Uniform";
else Group = "Trees ";
run;

proc univariate data=All;
   where dist<=20;   /* excludes 41 trees and 1 random observation */
   class Group;
   cdfplot dist / vscale=proportion
                  odstitle="Compare ECDF for Uniform and Tree Data" overlay;
   ods select cdfplot;
run;
```

The empirical CDFs for the two data sets look quite different. For the tree data, the distribution function rises rapidly, which indicates that many observations have nearest neighbors that are very close. You can see that about 25% of trees have a nearest neighbor within about 1.5 units, and 50% have a nearest neighbor within 3.2 units.

In contrast, the empirical distribution of the uniform data rises less steeply. For the simulated observations, only 5% have a nearest neighbor within 1.5 units. Only 21% have a nearest neighbor within 3.5 units. The graph indicates clumping in the tree data: the typical tree has a neighbor that is closer than would be expected for random uniform locations.
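You can reproduce the qualitative effect with synthetic data. The following Python/NumPy sketch (an illustration with simulated clusters, not the Sashelp.Bei trees) compares nearest-neighbor distances for clustered and uniform point patterns on the same rectangle:

```python
import numpy as np

rng = np.random.default_rng(2016)

def nearest_dist(X):
    """Nearest-neighbor distance for each row of X (brute force)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :])**2).sum(axis=2))
    np.fill_diagonal(D, np.inf)   # a point is not its own neighbor
    return D.min(axis=1)

# 500 uniform points in [0,1000] x [0,500]
unif = rng.uniform([0, 0], [1000, 500], size=(500, 2))

# 500 clustered points: 50 points scattered (sd=10) around each of 10 centers
centers = rng.uniform([50, 50], [950, 450], size=(10, 2))
clust = np.repeat(centers, 50, axis=0) + rng.normal(0, 10, size=(500, 2))

# The typical clustered point has a much closer neighbor
print(np.median(nearest_dist(clust)), np.median(nearest_dist(unif)))
```

Plotting the two empirical CDFs of these distances shows the same pattern as the tree data: the clustered curve rises much more steeply.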

In SAS/STAT 13.2 and beyond you can use the new SPP procedure to perform comparisons like this automatically. The SPP procedure supports several statistical tests for "complete spatial randomness," which enable you to compare the observed distribution to the expected distribution for uniformly distributed data.

The SPP procedure supports tests that are based on nearest-neighbor distances, as well as other tests. The example in this article is essentially a visual representation of the "G function test," which you can carry out by using the following statements:

```
proc spp data=sashelp.bei plots=(G(unpack));
   process trees = (x, y / area=(0,0,1000,500) Event=Trees) / G;
run;
```

The SPP procedure creates a graph of the "edge-corrected G function." The upper curve is the empirical distribution of nearest neighbor distances for the trees, adjusted for edge effects caused by a finite domain. The lower curve shows the expected distribution for random uniform data of the same size on the same domain. The light-blue band is a 95% confidence envelope, which gives you a feeling for the variation due to random sampling. The envelope is formed by 100 simulations of random uniform data, similar to the simulation in the previous section.

The graph indicates that there is structure in the trees data beyond what you would expect from complete spatial randomness. You can use other features of PROC SPP to model the trees data. For an overview of the spatial analyses that you can perform with PROC SPP, see Mohan and Tobias (2015), "Analyzing Spatial Point Patterns Using the New SPP Procedure".

In summary, nearest-neighbor distances are useful for analyzing properties of spatial point patterns. This article compared tree data to a single instance of random uniform data. By using the SPP procedure, you can run a more complete analysis and obtain graphs and related statistics with minimal effort.

tags: Data Analysis


The post Compute nearest neighbors in SAS appeared first on The DO Loop.

Let's create a sample data set. The following DATA step simulates 100 observations that are uniformly distributed in the unit square [0,1] x [0,1]:

```
data Unif;
call streaminit(12345);
do i = 1 to 100;
   x = rand("Uniform");
   y = rand("Uniform");
   output;
end;
run;
```

I have previously shown how to compute the distance between observations in SAS by using PROC DISTANCE or the DISTANCE function in SAS/IML. The following statements read the data into a SAS/IML matrix and computes the pairwise distances between all observations:

```
proc iml;
use Unif;  read all var {x, y} into X;  close;
D = distance(X);    /* N x N distance matrix */
```

The D matrix is a symmetric 100 x 100 matrix. The value `D[i,j]` is the Euclidean distance between the *i*th and *j*th rows of X. An easy way to find the nearest neighbor of observation *i* is to search the *i*th row for the column that contains the smallest distance. Because the diagonal elements of D are all zero, a useful trick is to change the diagonal elements to missing values. Then the smallest value in each row of D corresponds to the nearest neighbor.

The following statements assign missing values to the diagonal elements of the D matrix. You can then use the SAS/IML subscript reduction operators to find the minimum distance in each row:

```
diagIdx = do(1, nrow(D)*ncol(D), ncol(D)+1);  /* index diagonal elements */
D[diagIdx] = .;           /* set diagonal elements to missing */
dist = D[ ,><];           /* smallest distance in each row */
nbrIdx = D[ ,>:<];        /* column of smallest distance in each row */
Neighbor = X[nbrIdx, ];   /* coords of closest neighbor */
```

If you write the nearest neighbors and distances to a SAS data set, you can use the VECTOR statement in PROC SGPLOT to draw a vector that connects each observation to its nearest neighbor. The graph indicates the nearest neighbor for each observation.

```
Z = X || Neighbor || dist;
create NN_Unif from Z[c={"x" "y" "xc" "yc" "dist"}];
append from Z;  close;
QUIT;

title "Nearest Neighbors for 100 Uniformly Distributed Points";
proc sgplot data=NN_Unif aspect=1;
   label dist = "Distance to Nearest Neighbor";
   scatter x=x y=y / colorresponse=dist markerattrs=(symbol=CircleFilled);
   vector x=xc y=yc / xorigin=x yorigin=y transparency=0.5
                      colorresponse=dist;  /* COLORRESPONSE= requires 9.4m3 for VECTOR */
   xaxis display=(nolabel);
   yaxis display=(nolabel);
run;
```

If you don't have SAS/IML but still want to compute nearest neighbors, you can use PROC MODECLUS. The NEIGHBOR option on the PROC MODECLUS statement produces a table that gives the observation number (or ID value) of nearest neighbors. For example, the following statements produce the observation numbers for the nearest neighbors:

```
ods select neighbor;
/* Use K=p to find nearest p-1 neighbors */
proc modeclus data=Unif method=1 k=2 Neighbor;   /* k=2 for nearest nbrs */
   var x y;
run;
```

To get the coordinates of the nearest neighbor, you can create a variable that contains the observation numbers and then use an ID statement to include that ID variable in the PROC MODECLUS output. You can then look up the coordinates. I omit the details.

You can use the ideas in the earlier SAS/IML program to write a program that returns the indices (observation numbers) of the *k* closest neighbors for *k* ≥ 1. The trick is to replace the smallest distance in each row with a missing value and then repeat the process of finding the smallest value (and column) in each row. The following SAS/IML module implements this computation:

```
proc iml;
/* Compute indices (row numbers) of k nearest neighbors.
   INPUT:  X    an (N x p) data matrix
           k    specifies the number of nearest neighbors (k>=1)
   OUTPUT: idx  an (N x k) matrix of row numbers. idx[,j] contains
                the row numbers (in X) of the j_th closest neighbors
           dist an (N x k) matrix. dist[,j] contains the distances
                between X and the j_th closest neighbors
*/
start NearestNbr(idx, dist, X, k=1);
   N = nrow(X);  p = ncol(X);
   idx = j(N, k, .);    /* j_th col contains obs numbers for j_th closest obs */
   dist = j(N, k, .);   /* j_th col contains distance for j_th closest obs */
   D = distance(X);
   D[do(1, N*N, N+1)] = .;   /* set diagonal elements to missing */
   do j = 1 to k;
      dist[,j] = D[ ,><];    /* smallest distance in each row */
      idx[,j] = D[ ,>:<];    /* column of smallest distance in each row */
      if j < k then do;      /* prepare for next closest neighbors */
         ndx = sub2ndx(dimension(D), T(1:N)||idx[,j]);
         D[ndx] = .;         /* set elements to missing */
      end;
   end;
finish;
```

You can use this module to compute the *k* closest neighbors for each observation (row) in a data matrix. For example, the following statements compute the two closest neighbors for each observation. The output shows a few rows of the original data and the coordinates of the closest and next closest observations:

```
use Unif;  read all var {x, y} into X;  close;
k = 2;
run NearestNbr(nbrIdx, dist, X, k);
Near1 = X[nbrIdx[,1], ];   /* 1st nearest neighbors */
Near2 = X[nbrIdx[,2], ];   /* 2nd nearest neighbors */
```
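If you want to check the logic in another language, the same k-nearest-neighbor computation is a few lines of NumPy (an illustration with a tiny hypothetical data matrix, not part of the original post):

```python
import numpy as np

def nearest_nbr(X, k=1):
    """Indices (0-based row numbers) and distances of the k nearest neighbors."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :])**2).sum(axis=2))
    np.fill_diagonal(D, np.inf)           # exclude self-distances
    idx = np.argsort(D, axis=1)[:, :k]    # k smallest distances per row
    dist = np.take_along_axis(D, idx, axis=1)
    return idx, dist

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.1, 0.0]])
idx, dist = nearest_nbr(X, k=2)
print(idx)   # row 0's two closest rows are 3 and 1
```

Sorting each row replaces the iterate-and-mask trick, but it produces the same neighbors.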

In summary, this article defines a short module in the SAS/IML language that you can use to compute the *k* nearest neighbors for a set of *N* numerical observations. Notice that the computation builds an *N* x *N* distance matrix in RAM, so this matrix might consume a lot of memory for large data sets. For example, the distance matrix for a data set with 16,000 observations requires about 1.91 GB of memory. For larger data sets, you might prefer to use PROC DISTANCE or PROC MODECLUS.
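That memory estimate is simple arithmetic: an *N* x *N* matrix of 8-byte doubles requires *N*²·8 bytes. A one-liner confirms it:

```python
n = 16_000
gib = n * n * 8 / 2**30   # one 8-byte double per matrix entry
print(round(gib, 2))      # 1.91
```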

tags: 9.4, Data Analysis, Statistical Programming


The post Overlay a curve on a bar chart in SAS appeared first on The DO Loop.

However, if you try to overlay incompatible plot types, you will get an error message that says

`ERROR: Attempting to overlay incompatible plot or chart types.`

For example, a histogram and a series plot are not compatible in PROC SGPLOT, so you need to use the Graphics Template Language (GTL) to overlay a custom density estimate on a histogram.

A similar limitation exists for bar charts in PROC SGPLOT: you cannot specify the VBAR and SERIES statements in a single call. However, in SAS 9.3 and beyond you can use the VBARPARM statement to overlay a curve and a bar chart.

In SAS 9.4m3 there is yet another VBAR-like statement that enables you to combine bar charts with one or more other plots. The new VBARBASIC and HBARBASIC statements create a bar chart that is compatible with basic plots such as scatter plots, series plots, and box plots. These statements can summarize raw data like the VBAR statement can. In other words, if you use the VBARBASIC and HBARBASIC statements on raw data, the counts are computed automatically. However, you can also use the statements on pre-summarized data: specify the height of the bars by using the RESPONSE= option.

In most situations it doesn't make sense to overlay a continuous curve on a discrete bar chart, which is why the SG routines have the concept of compatible plot types. However, there is a canonical example in elementary statistics that combines continuous and discrete data: the normal approximation to the binomial distribution.

Recall that if X is the number of successes in *n* independent trials for which the probability of success is *p*, then X is binomially distributed: X ~ Binom(*n, p*). A well-known rule says that if *np* > 5 and *n*(1-*p*) > 5, then the binomial distribution is approximated by a normal distribution with mean *np* and standard deviation sqrt(*np*(1-*p*)).

This rule is often illustrated by overlaying the continuous normal PDF on a bar chart that shows the binomial distribution, as shown to the left.
To create this plot, I used the VBARBASIC statement to create the bar chart. Because the VBARBASIC statement creates a "basic plot," you can combine it with another basic plot, such as the line plot created by using a SERIES statement. For fun, I used an INSET statement to overlay a box of parameter values for the graph.
The graph shows that the binomial probability at *j* is approximated by the area under the normal density curve on the interval [*j*-0.5, *j*+0.5].
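You can verify that approximation numerically. The following Python sketch (my own cross-check, using SciPy rather than SAS) compares the binomial probability at an integer *j* with the normal area on [*j*-0.5, *j*+0.5]:

```python
from scipy.stats import binom, norm

n, p = 25, 0.25
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5   # 6.25 and about 2.17

j = 6                                          # an arbitrary integer near the mean
exact = binom.pmf(j, n, p)                     # binomial probability at j
approx = norm.cdf(j + 0.5, mu, sigma) - norm.cdf(j - 0.5, mu, sigma)
print(exact, approx)                           # the two values agree to about 0.001
```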

The following SAS statements use the PDF function to evaluate the binomial probabilities and the normal density for the graph. The values for μ and σ are stored in macro variables for later use.

%let p = 0.25;                    /* probability of success */
%let n = 25;                      /* number of trials */
data Binom;
n = &n;  p = &p;  q = 1 - p;
mu = n*p;  sigma = sqrt(n*p*q);   /* parameters for the normal approximation */
Lower = mu - 3.5*sigma;           /* evaluate normal density on [Lower, Upper] */
Upper = mu + 3.5*sigma;
/* PDF of normal distribution */
do t = Lower to Upper by sigma/20;
   Normal = pdf("normal", t, mu, sigma);
   output;
end;
/* PMF of binomial distribution */
t = .;  Normal = .;               /* these variables are not used for the bar chart */
do j = max(0, floor(Lower)) to ceil(Upper);
   Binomial = pdf("Binomial", j, &p, &n);
   output;
end;
call symput("mu", strip(mu));     /* store mu and sigma in macro variables */
call symput("sigma", strip(round(sigma,0.01)));
label Binomial="Binomial Probability"  Normal="Normal Density";
keep t Normal j Binomial;
run;

The preceding DATA step evaluates the Binom(25, 0.25) probability for the integers *j*=0, 1, ..., 14.
It evaluates the N(6.25, 2.17) PDF on the interval [-1.3, 13.8]. The following call to PROC SGPLOT uses the VBARBASIC statement to overlay the bar chart and the density curve:

title "Binomial Probability and Normal Approximation";
proc sgplot data=Binom;
   vbarbasic j / response=Binomial barwidth=1;   /* requires SAS 9.4M3 */
   series x=t y=Normal / lineattrs=GraphData2(thickness=2);
   inset "n = &n"  "p = &p"  "q = %sysevalf(1-&p)"
         "(*ESC*){unicode mu} = np = &mu"        /* use Greek letters */
         "(*ESC*){unicode sigma} = sqrt(npq) = &sigma" / position=topright border;
   yaxis label="Probability";
   xaxis label="x" integer type=linear;          /* force TYPE=LINEAR */
run;

The TYPE=LINEAR option on the XAXIS statement tells the horizontal axis to use interval tick marks. The BARWIDTH=1 option on the VBARBASIC statement makes the bar chart look more like a histogram by eliminating the gaps between bars. The graph is shown at the top of this section.

If you are content to show only the height of the binomial probability mass function (PMF), you can use
an alternative visualization. The following graph shows a needle plot (the binomial PMF) overlaid with a normal PDF.
This visualization does not require 9.4M3. The SGPLOT statements are the same as before, except the binomial probabilities are represented by using the NEEDLE statement: `needle x=j y=Binomial / markers;`

tags: 14.1, 9.4, Statistical Graphics


The post Coverage probability of confidence intervals: A simulation approach appeared first on The DO Loop.

Recall that a confidence interval (CI) is an interval estimate that contains the population parameter with a specified probability.
Because the CI is an *estimate*, it is computed from a sample. A confidence interval for a parameter is derived by knowing (or approximating) the sampling distribution of a statistic. For symmetric sampling distributions, the CI often has the form *m* ± w(α, *n*), where *m* is an unbiased estimate of the parameter and w(α, *n*) is a width that depends on the significance level α, the sample size *n*, and the standard error of the estimate.

Due to sampling variation, the confidence interval for a particular sample might not contain the parameter. A 95% confidence interval means that if you generate a large number of samples and construct the corresponding confidence intervals, then about 95% of the intervals will contain (or "cover") the parameter.

For example, a well-known formula is the confidence interval of the mean. If the population is normally distributed, then a 95% confidence interval for the population mean, computed from a sample of size *n*, is

[ *xbar* – *t*_{c} *s* / sqrt(*n*),
*xbar* + *t*_{c} *s* / sqrt(*n*) ]

where

- *xbar* is the sample mean
- *t*_{c} = *t*_{1-α/2, n-1} is the critical value of the *t* statistic with significance α and *n*-1 degrees of freedom
- *s* / sqrt(*n*) is the standard error of the mean, where *s* is the sample standard deviation.
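For readers who want to experiment outside SAS, here is a small Python sketch of the same formula (the helper name and the tiny data set are mine, chosen for illustration):

```python
import numpy as np
from scipy import stats

def t_confidence_interval(x, alpha=0.05):
    """CI of the mean: xbar +/- t_{1-alpha/2, n-1} * s / sqrt(n)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    se = x.std(ddof=1) / np.sqrt(n)            # standard error of the mean
    tc = stats.t.ppf(1 - alpha / 2, df=n - 1)  # critical t value
    return xbar - tc * se, xbar + tc * se

lo, hi = t_confidence_interval([1, 2, 3, 4, 5])
print(lo, hi)   # roughly (1.04, 4.96)
```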


The preceding discussion leads to the simulation method for estimating the coverage probability of a confidence interval. The simulation method has three steps:

- Simulate many samples of size *n* from the population.
- Compute the confidence interval for each sample.
- Compute the proportion of samples for which the (known) population parameter is contained in the confidence interval. That proportion is an estimate for the empirical coverage probability for the CI.
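The three steps translate directly into a few lines of Python (a sketch under my own naming; the post itself uses SAS):

```python
import numpy as np
from scipy import stats

def estimate_coverage(sample_fn, true_mean, n=50, num_samples=10000,
                      alpha=0.05, seed=123):
    """Fraction of t-based CIs, computed from simulated samples, that cover true_mean."""
    rng = np.random.default_rng(seed)
    tc = stats.t.ppf(1 - alpha / 2, df=n - 1)
    hits = 0
    for _ in range(num_samples):
        x = sample_fn(rng, n)                             # step 1: simulate a sample
        xbar = x.mean()
        se = x.std(ddof=1) / np.sqrt(n)
        lower, upper = xbar - tc * se, xbar + tc * se     # step 2: compute the CI
        hits += (lower <= true_mean <= upper)             # step 3: does the CI cover?
    return hits / num_samples

cov = estimate_coverage(lambda rng, n: rng.normal(size=n), true_mean=0.0)
print(cov)   # close to the nominal 0.95 for normal data
```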

You might wonder why this is necessary. Isn't the coverage probability always (1-α) = 0.95? No, that is only true when the population is normally distributed (which is never true in practice) or the sample sizes are large enough that you can invoke the Central Limit Theorem. Simulation enables you to estimate the coverage probability for small samples when the population is not normal. You can simulate from skewed or heavy-tailed distributions to see how skewness and kurtosis affect the coverage probability. (See Chapter 16 of *Simulating Data with SAS*.)

Let's use simulation
to verify that the formula for a CI of the mean is valid when you draw samples from a standard normal population.
The following DATA step simulates 10,000 samples of size *n*=50:

%let N = 50;                /* size of each sample */
%let NumSamples = 10000;    /* number of samples */
/* 1. Simulate samples from N(0,1) */
data Normal(keep=SampleID x);
call streaminit(123);
do SampleID = 1 to &NumSamples;   /* simulation loop */
   do i = 1 to &N;                /* N obs in each sample */
      x = rand("Normal");         /* x ~ N(0,1) */
      output;
   end;
end;
run;

The second step is to compute the confidence interval for each sample. You can use PROC MEANS to compute the confidence limits. The LCLM= and UCLM= options write the lower and upper endpoints of the confidence interval to a SAS data set. I also output the sample mean for each sample. Notice that the BY statement is an efficient way to analyze all samples in a simulation study.

/* 2. Compute statistics for each sample */
proc means data=Normal noprint;
   by SampleID;
   var x;
   output out=OutStats mean=SampleMean lclm=Lower uclm=Upper;
run;

The third step is to count the proportion of samples for which the confidence interval contains the value of the parameter. For this simulation study, the value of the population mean is 0. The following DATA step creates an indicator variable that has the value 1 if 0 is within the confidence interval for a sample, and 0 otherwise. You can then use PROC FREQ to compute the proportion of intervals that contain the mean. This is the empirical coverage probability. If you want to get fancy, you can even use the BINOMIAL option to compute a confidence interval for the proportion.

/* 3a. How many CIs include parameter? */
data OutStats;
set OutStats;
label ParamInCI = "Parameter in CI";
ParamInCI = (Lower<0 & Upper>0);   /* indicator variable */
run;

/* 3b. Nominal coverage probability is 95%. Estimate true coverage. */
proc freq data=OutStats;
   tables ParamInCI / nocum binomial(level='1' p=0.95);
run;

The output from PROC FREQ tells you that the empirical coverage (based on 10,000 samples) is 94.66%, which is very close to the theoretical value of 95%. The output from the BINOMIAL option estimates that the true coverage is in the interval [0.9422,0.951], which includes 0.95. Thus the simulation supports the assertion that the standard CI of the mean has 95% coverage when a sample is drawn from a normal population.

You can draw a graph that shows how the confidence intervals depend on the random samples. The following graph shows the confidence intervals for 100 samples. The center of each CI is the sample mean.

proc format;                 /* display 0/1 as "No"/"Yes" */
   value YorN 0="No" 1="Yes";
run;

ods graphics / width=6.5in height=4in;
proc sgplot data=OutStats(obs=100);
   format ParamInCI YorN.;
   title "95% Confidence Intervals for the Mean";
   title2 "Normal Data";
   scatter x=SampleID y=SampleMean / group=ParamInCI markerattrs=(symbol=CircleFilled);
   highlow x=SampleID low=Lower high=Upper / group=ParamInCI legendlabel="95% CI";
   refline 0 / axis=y;
   yaxis display=(nolabel);
run;

The reference line shows the mean of the population. Samples for which the population mean is inside the confidence interval are shown in blue. Samples for which the population mean is not inside the confidence interval are shown in red.

You can see how sample variability affects the confidence intervals. In four random samples (shown in red) the values in the sample are so extreme that the confidence interval does not include the population mean. Thus the estimate of the coverage probability is 96/100 = 96% for these 100 samples. This graph shows why the term "coverage probability" is used: it is the probability that one of the vertical lines in the graph will "cover" the population mean.

The previous simulation confirms that the empirical coverage probability of the CI is 95% for normally distributed data. You can use simulation to understand how that probability changes if you sample from nonnormal data. For example, in the DATA step that simulates the samples, replace the call to the RAND function with the following line:

x = rand("Expo") - 1;   /* x + 1 ~ Exp(1) */

You can then rerun the simulation study. This time the samples are drawn from a (shifted) exponential distribution that has mean 0 and unit variance. The skewness for this distribution is 2 and the excess kurtosis is 6. The result from PROC FREQ is that only about 93.5% of the confidence intervals (using the standard formula) cover the true population mean. Consequently, the formula for the CI, which has 95% coverage for normal data, only has about 93.5% coverage for this exponential data.
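The same experiment is easy to replicate outside SAS. In this Python sketch (my own cross-check, not the post's code), the only change from the normal case is the sampling line, and the estimated coverage drops below the nominal 95%:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)
n, reps = 50, 10000
tc = stats.t.ppf(0.975, df=n - 1)
hits = 0
for _ in range(reps):
    x = rng.exponential(size=n) - 1.0    # shifted Exp(1): mean 0, unit variance
    xbar = x.mean()
    se = x.std(ddof=1) / np.sqrt(n)
    hits += (xbar - tc * se <= 0.0 <= xbar + tc * se)
print(hits / reps)   # typically about 0.93-0.94, below the nominal 0.95
```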

You can create a graph that visualizes the confidence intervals for the exponential data. Again, only the first 100 samples are shown. In this graph, the CIs for nine samples do not contain the population mean, which implies a 91% empirical coverage.

You can also write a SAS/IML program. An example of using SAS/IML to estimate the coverage probability of a confidence interval is posted on the SAS/IML File Exchange.

In summary, you can use simulation to estimate the empirical coverage probability for a confidence interval. In many cases the formula for a CI is based on an assumption about the population distribution, which determines the sampling distribution of the statistic. Simulation enables you to explore how the coverage probability changes when the population does not satisfy the theoretical assumptions.

tags: Simulation, Statistical Thinking


The post Graph a step function in SAS appeared first on The DO Loop.

In statistics, the canonical step function is the empirical cumulative distribution function. Given a set of data values
*x*_{1} ≤ *x*_{2} ≤ ... ≤ *x*_{n}
the empirical distribution function (ECDF) is the step function defined by

F(t) = (number of data values ≤ t) / *n*.

Notice that an ECDF is constant on each half-open interval
[*x*_{i}, *x*_{i+1}).
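As a concrete illustration, the definition of F(t) is one line in Python (a sketch of mine, using the nine data values from the example in this post; the helper name is hypothetical):

```python
import numpy as np

def ecdf(data, t):
    """F(t) = (number of data values <= t) / n."""
    data = np.sort(np.asarray(data, dtype=float))
    return np.searchsorted(data, t, side="right") / len(data)

x = [3.7, 1.0, 2.2, 4.1, 5.0, 1.9, 3.0, 2.2, 4.1]
print(ecdf(x, 2.2))   # 4/9, because four of the nine values are <= 2.2
```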

The easiest way to visualize the ECDF in SAS is to use the CDFPLOT statement in PROC UNIVARIATE. The following DATA step creates a set of nine observations. Two values are repeated. The call to PROC UNIVARIATE creates a graph of the empirical CDF:

data A;
input x @@;
datalines;
3.7 1.0 2.2 4.1 5.0 1.9 3.0 2.2 4.1
;
ods select cdfplot;
proc univariate data=A;
   cdfplot x / vscale=proportion
               odstitle="Empirical CDF" odstitle2="PROC UNIVARIATE";
   ods output cdfplot=outCDF;   /* data set contains ECDF values */
run;

The ECDF jumps by 1/*n* = 1/9 at each sorted data value. Because the values 2.2 and 4.1 appear twice in the data, the ECDF jumps by 2/9 at those data values. The ECDF is 0 for any point less than the minimum data value; it is 1 for any point greater than or equal to the maximum data value.

In the previous call to PROC UNIVARIATE, the ODS OUTPUT statement writes a SAS data set that contains the data values in sorted order and the value of the ECDF at each data value. You can use this output data set and the STEP statement in PROC SGPLOT to create your own graph of the ECDF. This gives you complete control over colors, labels, background grids, and other graphical attributes. You can also overlay other plots on the ECDF. For example, the following call to PROC SGPLOT creates an ECDF, adds a background grid, and overlays a fringe plot that shows individual data values:

title "Empirical CDF";
title2 "STEP and FRINGE Statements";
proc sgplot data=outCDF noautolegend;
   step x=ECDFX y=ECDFY;   /* variable names created by PROC UNIVARIATE */
   fringe ECDFX;
   xaxis grid label="x" offsetmin=0.05 offsetmax=0.05;
   yaxis grid min=0 label="Cumulative Proportion";
run;

For the ECDF, we used PROC UNIVARIATE to create a data set that contains the (X,Y) coordinates of each "corner" in the plot. For a general discontinuous function, you need to create a similar data set manually. If you can create the data set, you can use the STEP statement to visualize an arbitrary piecewise constant function.

Some users might want to omit the vertical lines in the graph in order to emphasize the discontinuous nature of the function. For example, the adjacent graph is an alternative approach to visualizing the ECDF. This graph emphasizes that the function is constant on intervals that are closed on the left and open on the right.

You can use the VECTOR statement in PROC SGPLOT to generate this graph. The VECTOR statement draws a line between two arbitrary points. The output from PROC UNIVARIATE provides the X and Y values for the plot, but you need to modify the data slightly because the VECTOR statement needs four variables: the starting and ending coordinates of each line segment.

The adjacent table shows the shape of the data that is suitable for graphing with the VECTOR statement. You can see that the LAG function is useful for generating the xL column from the xR column. The yL and yR columns are identical except for the first observation because the ECDF is piecewise constant.
The VECTOR statement connects the points (*x*_{L}, *y*_{L}) and (*x*_{R}, *y*_{R}), where the "L" subscript refers to left-hand endpoints and the "R" subscript refers to right-hand endpoints.

/* CDF is a step function. Each interval [x_i, x_{i+1}) is closed
   on the left and open on the right */
title "Empirical CDF";
title2 "VECTOR Statement";
proc sgplot data=ECDF noautolegend;
   vector x=xR y=yR / xorigin=xL yorigin=yL noarrowheads;
   scatter x=xL y=yL / markerattrs=(symbol=CircleFilled color=black);       /* closed */
   scatter x=xR y=yR / filledoutlinedmarkers markerfillattrs=(color=white)  /* open */
                       markerattrs=(symbol=CircleFilled color=black);
   xaxis grid label="x";
   yaxis grid min=0 max=1 label="Quantile";
run;

In conclusion, there are several ways to visualize an empirical CDF in SAS. You can also visualize the graph of a general piecewise constant function. You can choose either the STEP statement or the VECTOR statement in PROC SGPLOT to visualize the graph.


The post The Lambert W function in SAS appeared first on The DO Loop.

This article describes how you can evaluate the Lambert W function in SAS/IML software. The Lambert W function is defined implicitly: given a real value x, the function's value w = W(x) is the value of w that satisfies the equation w exp(w) = x. Thus W is the inverse of the function *g*(w) = w exp(w).

Because the function *g* has a minimum value at w=-1, there are two branches to the Lambert W function. Both branches are shown in the adjacent graph. The top branch is called the principal (or positive) branch and is denoted W_{p}. The principal branch has domain x ≥ -1/e and range W(x) ≥ -1. The lower branch, denoted by W_{m}, is defined on -1/e ≤ x < 0 and has range W(x) ≤ -1. The subscript "m" stands for "minus" since the range contains only negative values. Some authors use W_{0} for the upper branch and W_{-1} for the lower branch.
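If you want to check values against an independent implementation, SciPy exposes both branches through the `k` argument of `scipy.special.lambertw` (k=0 is the principal branch W_{p}; k=-1 is the lower branch W_{m}). This snippet is my own cross-check, not part of the original post:

```python
import numpy as np
from scipy.special import lambertw

x = -0.2                       # a point in [-1/e, 0), where both branches exist
wp = lambertw(x, k=0).real     # principal branch, W_p(x) >= -1
wm = lambertw(x, k=-1).real    # lower branch,     W_m(x) <= -1
print(wp, wm)                  # both values satisfy w * exp(w) = x
```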

Both branches are used in applications, but the lower branch W_{m} is useful for the statistical programmer because you can use it to simulate data from heavy-tailed distributions. Briefly: for a class of heavy-tailed distributions, you can simulate data by using the inverse CDF method, which requires evaluating the Lambert W function.

You can download the SAS/IML functions that evaluate each branch of the Lambert W function. Both functions use an analytical approximation for the Lambert W function, followed by one or two iterations of Halley's method to ensure maximum precision:

- The LambertWp function implements the principal branch W_{p}. The algorithm constructs a direct global approximation on [-1/e, ∞) based on Boyd (1998).
- The LambertWm function implements the second branch W_{m}. The algorithm follows Chapeau-Blondeau and Monir (2002) (hereafter CBM). The CBM algorithm approximates W_{m}(x) on three different domains. The approximation uses a series expansion on [-1/e, -0.333), a rational-function approximation on [-0.333, -0.033], and an asymptotic series expansion on (-0.033, 0).
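Halley's method applied to f(w) = w·exp(w) - x has a simple closed-form update. The following Python sketch is my own minimal version of that refinement step (it is not the downloadable SAS/IML code), using f'(w) = e^w(w+1) and f''(w) = e^w(w+2):

```python
import math

def halley_lambertw(x, w0, iters=4):
    """Refine a starting guess w0 for W(x) by Halley's method on f(w) = w*e^w - x."""
    w = w0
    for _ in range(iters):
        ew = math.exp(w)
        f = w * ew - x
        # Halley update: w - f / (f' - f*f''/(2*f')), which simplifies to:
        w = w - f / (ew * (w + 1) - (w + 2) * f / (2 * w + 2))
    return w

w = halley_lambertw(10.0, math.log(10.0))   # crude guess log(x), then refine
print(w * math.exp(w))                      # recovers 10 to near machine precision
```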

Download the SAS/IML program for this article and save it to a file called LambertW.sas. Then the following SAS/IML program shows how to call the functions for the upper and lower branches and plot the graphs:

%include "C:\MyPath\LambertW.sas";   /* specify path to file */
proc iml;
load module=(LambertWp LambertWm);

title "Lambert W Function: Principal Branch";
x = do(-1/exp(1), 1, 0.01);
Wp = LambertWp(x);
call series(x, Wp) grid={x y} other="refline 0/axis=x; refline 0/axis=y";

title "Lambert W Function: Lower Branch";
x = do(-1/exp(1), -0.01, 0.001);
Wm = LambertWm(x);
call series(x, Wm) grid={x y};

The graphs are not shown because they are indistinguishable from the graph at the beginning of the article.

You can use a simple error analysis to investigate the accuracy of these functions. After computing w = W(x), apply the inverse function *g*(w) = w exp(w) and compare the result with x:

x = do(-1/exp(1), -0.01, 0.001);
Wm = LambertWm(x);      /* default precision: 1 Halley iteration */
g = Wm # exp(Wm);       /* apply inverse function */
maxError1 = max(abs(x - g));   /* maximum error */

Wm = LambertWm(x, 2);   /* higher precision: 2 Halley iterations */
g = Wm # exp(Wm);
maxError2 = max(abs(x - g));
print maxError1 maxError2;

In summary, the SAS/IML language provides an excellent environment for implementing new functions or statistical procedures that are not built into SAS. In this article I implemented methods from journal articles for evaluating the upper and lower branches of the Lambert W function, which is the inverse function of *g*(w) = w exp(w). Although the Lambert W function does not have a closed-form solution, you can implement an approximation to the function and then use one or two Halley iterations to quickly converge within machine precision.

tags: Numerical Analysis


The post Weighted percentiles appeared first on The DO Loop.

I do not discuss survey data in this article. Survey statisticians use weights to make valid inferences in survey data, and you can see the SAS documentation to learn about how to use weights to estimate variance in complex survey designs.


Before we calculate a weighted statistic, let's remember that a weight variable is not the same as a frequency variable. A frequency variable, which associates a positive integer with each observation, specifies that each observation is replicated a certain number of times. There is nothing unintuitive about the statistics that arise from including a frequency variable. They are the same statistics that you would obtain by duplicating each record according to the value of the frequency variable.

Weights are not frequencies. Weights can be fractional values. When comparing weighted and unweighted analyses, the key idea is this: an unweighted analysis is equivalent to a weighted analysis for which the weights are all 1. An "unweighted analysis" is really a misnomer; it should be called an "equally weighted" analysis!

In the computational formulas that SAS uses for weighted percentiles, the weights are divided by the sum of the weights. Therefore only relative weights are important, and the formulas simplify if you choose weights that sum to 1. For the remainder of this article, assume that the weights sum to unity and that an unweighted analysis has weights equal to 1/*n*.

To understand how weights change the computation of percentiles, let's review the standard unweighted computation of the empirical percentiles (or quantiles) of a set of *n* numbers. First, sort the data values from smallest to largest. Then construct the empirical cumulative distribution function (ECDF). Recall that the ECDF is a piecewise-constant step function that increases by 1/*n* at each data point. The quantity 1/*n* represents the fact that each observation is weighted equally in this analysis.

The quantile function is derived from the CDF function, and
the quantile function for a discrete distribution is also a step function.
You can use the graph of the ECDF to compute the quantiles. For example, suppose your data are

{ 1, 1.9, 2.2, 3, 3.7, 4.1, 5 }

The following graph shows the ECDF for these seven values:

The data values are indicated by tick marks along the horizontal axis. Notice that the ECDF jumps by 1/7 at each data value because there are seven unique values.

I should really show you the graph of the quantile function (an "inverse function" to the CDF), but you can visualize the graph of the quantile function if you rotate your head clockwise by 90 degrees. To find a quantile, start at some value on the Y axis, move across until you hit a vertical line, and then drop down to the X axis to find the datum value. For example, to find the 0.2 quantile (=20th percentile), start at Y=0.2 and move right horizontally until you hit the vertical line over the datum 1.9. Thus 1.9 is the 20th percentile. Similarly, the 0.6 quantile is the data value 3.7. (I omit details about what to do if you hit a horizontal line.)

Of course, SAS can speed up this process. The following call to PROC MEANS displays the 20th, 40th, 60th, and 80th percentiles:

data A;
input x wt;
datalines;
1   0.25
1.9 0.05
2.2 0.15
3.0 0.25
3.7 0.15
4.1 0.10
5   0.05
;
proc means data=A p20 p40 p60 p80;
   var x;   /* unweighted analysis: data only */
run;

The previous section shows the relationship between percentile values and the graph of the ECDF. This section describes how the ECDF changes if you specify unequal weights for the data. The change is that the weighted ECDF will jump by the (standardized) weight at each data value. Because the weights sum to unity, the CDF is still a step function that rises from 0 to 1, but now the steps are not uniform in height. Instead, data that have relatively large weights produce a large step in the graph of the ECDF function.

In the previous section, the DATA step defined a weight variable. The weights for this example are

{ 0.25, 0.05, 0.15, 0.25, 0.15, 0.10, 0.05 }

The following graph shows the weighted ECDF for these weights:

By using this weighted ECDF, you can read off the weighted quantiles. For example, to find the 0.2 weighted quantile, start at Y=0.2 and move horizontally until you hit a vertical line, which is over the datum 1.0. Thus 1.0 is the 20th weighted percentile. Similarly, the 0.6 quantile is the data value 3.0. You can confirm this by calling PROC MEANS with a WEIGHT statement, as shown below:
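In Python, this way of reading quantiles off the weighted ECDF takes only a few lines. This sketch is mine (the helper name is hypothetical), and it uses the left-continuous "smallest x whose cumulative weight reaches p" convention described above; note that SAS supports several percentile definitions, so other conventions exist:

```python
import numpy as np

def weighted_percentile(x, w, p):
    """Smallest x whose cumulative standardized weight is >= p."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order]
    cumw = np.cumsum(w) / w.sum()            # heights of the weighted ECDF
    return x[np.searchsorted(cumw, p, side="left")]

x = [1, 1.9, 2.2, 3.0, 3.7, 4.1, 5]
w = [0.25, 0.05, 0.15, 0.25, 0.15, 0.10, 0.05]
print(weighted_percentile(x, w, 0.2))   # 1.0
print(weighted_percentile(x, w, 0.6))   # 3.0
```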

proc means data=A p20 p40 p60 p80;
   weight wt;   /* weighted analysis */
   var x;
run;

You can use a physical model to intuitively understand weighted percentiles. The model is the same as I used to
visualize a weighted mean. Namely, imagine a point-mass of *w*_{i} concentrated at position *x*_{i} along a massless rod. Finding a weighted percentile *p* is equivalent to finding the first location along the rod (moving from left to right) at which the proportion of the weight is greater than *p*. (I omit how to handle special percentiles for which the proportion is equal to *p*.)

The physical model looks like the following:

From the figure you can see that *x*_{1} is the *p*^{th} quantile for *p* < 0.25. Similarly, *x*_{2} is the *p*^{th} quantile for 0.25 < *p* < 0.30, and so forth.

If you want to apply these concepts to your own data, you can download the SAS program that generates the CDF graphs and computes the weighted percentiles.

