The post Difference operators as matrices appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post.
For a time series { y_{1}, y_{2}, …, y_{N} }, the difference operator computes the difference between two observations. The kth-order difference (the difference at lag k) is the series { y_{k+1} – y_{1}, y_{k+2} – y_{2}, …, y_{N} – y_{N-k} }.
In SAS, the DIF function in the DATA step computes differences between observations. The DIF function in the SAS/IML language takes a column vector of values and returns a vector of differences.
For example, the following SAS/IML statements define a column vector that has five observations and calls the DIF function to compute the first-order differences between adjacent observations. By convention, the DIF function returns a vector that is the same size as the input vector and inserts a missing value in the first element.
    proc iml;
    x = {0, 0.1, 0.3, 0.7, 1};
    dif = dif(x);     /* by default DIF(x, 1) ==> first-order differences */
    print x dif;
The difference operator is a linear operator that can be represented by a matrix. The first nonmissing value of the difference is x[2]-x[1], followed by x[3]-x[2], and so forth. Thus the linear operator can be represented by the matrix that has -1 on the main diagonal and +1 on the super-diagonal (above the diagonal). An efficient way to construct the difference operator is to start with the zero matrix and insert ±1 on the diagonal and super-diagonal elements. You can use the DO function to construct the indices for the diagonal and super-diagonal elements in a matrix:
    start DifOp(dim);
       D = j(dim-1, dim, 0);              /* allocate zero matrix */
       n = nrow(D);  m = ncol(D);
       diagIdx  = do(1, n*m, m+1);        /* index diagonal elements */
       superIdx = do(2, n*m, m+1);        /* index superdiagonal elements */
       *subIdx  = do(m+1, n*m, m+1);      /* index subdiagonal elements (optional) */
       D[diagIdx]  = -1;                  /* assign -1 to diagonal elements */
       D[superIdx] =  1;                  /* assign +1 to super-diagonal elements */
       return D;
    finish;

    B = DifOp(nrow(x));
    d = B*x;
    print B, d[L="Difference"];
You can see that the DifOp function constructs an (n-1) x n matrix, which is the correct dimension for transforming an n-dimensional vector into an (n-1)-dimensional vector. Notice that the matrix multiplication omits the element that previously held a missing value.
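As a quick check on the claim above, the following sketch builds the same (n-1) x n operator a second way and compares the product with the output of the DIF function. The construction from two shifted identity blocks is an illustration, not the method used in the article, and the names shiftFwd, shiftBack, dMat, and dFun are hypothetical. Run it as a separate PROC IML step.

```sas
proc iml;
x = {0, 0.1, 0.3, 0.7, 1};
n = nrow(x);

/* assumption for illustration: build the operator from shifted identity blocks */
shiftFwd  = j(n-1, 1, 0) || I(n-1);   /* (n-1) x n: selects x[2], ..., x[n]   */
shiftBack = I(n-1) || j(n-1, 1, 0);   /* (n-1) x n: selects x[1], ..., x[n-1] */
D = shiftFwd - shiftBack;             /* -1 on diagonal, +1 on super-diagonal */

dMat  = D*x;                          /* differences from the matrix product  */
dFull = dif(x);                       /* DIF output; first element is missing */
dFun  = dFull[2:n];                   /* drop the leading missing value       */

maxAbsDiff = max(abs(dMat - dFun));   /* should be zero, up to rounding       */
print dMat dFun maxAbsDiff;
quit;
```

The two columns should agree, which confirms that the matrix product reproduces the nonmissing part of the DIF output.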
You probably would not use a matrix multiplication in place of the DIF function if you needed the first-order difference for a single time series. However, the matrix formulation makes it possible to use one matrix multiplication to find the difference for many time series.
The following matrix contains three time series, one in each column. The B matrix computes the first-order difference for all columns by using a single matrix-matrix multiplication. The same SAS/IML code is valid whether the X matrix has three columns or three million columns.
    /* The matrix can operate on a matrix where each column is a time series */
    x = {0    0    0,
         0.1  0.2  0.3,
         0.3  0.8  0.5,
         0.7  0.9  0.8,
         1    1    1  };
    B = DifOp(nrow(x));
    d = B*x;          /* apply the difference operator */
    print d[L="Difference of Columns"];
Other operators in time series analysis can also be represented by matrices. For example, the first-order lag operator is represented by a matrix that has +1 on the super-diagonal. Moving average operators also have matrix representations.
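To make the moving-average remark concrete, here is a minimal sketch of a two-point moving average written as an (n-1) x n matrix. The alignment is an assumption made for illustration: row i averages x[i] and x[i+1], and the name M is hypothetical.

```sas
proc iml;
x = {0, 0.1, 0.3, 0.7, 1};
n = nrow(x);

/* 2-point moving average: 1/2 on the diagonal and on the super-diagonal */
M = 0.5 * ( (I(n-1) || j(n-1, 1, 0)) + (j(n-1, 1, 0) || I(n-1)) );

ma = M*x;     /* row i is (x[i] + x[i+1]) / 2: here 0.05, 0.2, 0.5, 0.85 */
print M, ma[L="2-Point Moving Average"];
quit;
```

Longer windows follow the same pattern: each row carries the window weights, shifted one column to the right of the row above.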
The matrix formulation is efficient for short time series but is not efficient for a time series that contains thousands of elements. If the time series contains n elements, then the dense-matrix representation of the difference operator contains about n² elements, which consumes a lot of RAM when n is large.
However, as we have seen, the matrix representation of an operator is advantageous when you want to operate on a large number of short time series, as might arise in a simulation.
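For long series, one way to avoid the roughly n² storage cost is to subtract two shifted submatrices instead of forming the operator. This is a sketch of an alternative, not something proposed in the article; it reuses the three-column example data.

```sas
proc iml;
x = {0    0    0,
     0.1  0.2  0.3,
     0.3  0.8  0.5,
     0.7  0.9  0.8,
     1    1    1  };
n = nrow(x);

/* rows 2..n minus rows 1..n-1: first differences of every column at once */
d = x[2:n, ] - x[1:(n-1), ];
print d[L="Difference of Columns (no operator matrix)"];
quit;
```

The temporary storage here grows with the size of x itself rather than with n², so the approach remains practical when each series is long.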
The post Discounted Certification Exams at SAS Analytics Experience 2017 appeared first on SAS Learning Post.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
SAS is hosting the premier analytics conference in the world September 18-20 in Washington DC, and supplementing the event with discounted training and certification exams at SAS Analytics Experience 2017. These offerings will be held before and after the event. As at other SAS events, we will be offering certification exams at […]
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post. |
Employment – that’s been a hot topic here in the US lately. Many of the manufacturing jobs we had in past decades are gone now, and it would be great if there was a crystal ball to predict which jobs might be at risk of disappearing in the future. The […]
The post Risks to US employment – automation and offshoring appeared first on SAS Learning Post.
The post Songs most frequently banned at weddings! appeared first on SAS Learning Post.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
A lot of my friends seem to be getting married these days. Which got me thinking about wedding parties. Which then got me wondering what songs DJs do/don’t play at weddings these days. And what was the outcome of my meandering thoughts … a fun & interesting graph, of course! It […]
The post A quantile definition for skewness appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post.
Skewness is a measure of the asymmetry of a univariate distribution. I have previously shown how to compute the skewness for data distributions in SAS. The previous article computes Pearson’s definition of skewness, which is based on the standardized third central moment of the data.
Moment-based statistics are sensitive to extreme outliers. A single extreme observation can radically change the mean, standard deviation, and skewness of data. It is not surprising, therefore, that there are alternative definitions of skewness. One robust definition of skewness that is intuitive and easy to compute is a quantile definition, which is also known as the Bowley skewness or Galton skewness.
The quantile definition of skewness uses Q1 (the lower quartile value), Q2 (the median value), and Q3 (the upper quartile value). You can measure skewness as the difference between the lengths of the upper quartile (Q3-Q2) and the lower quartile (Q2-Q1), normalized by the length of the interquartile range (Q3-Q1). In symbols, the quantile skewness γ_{Q} is γ_{Q} = ((Q3 – Q2) – (Q2 – Q1)) / (Q3 – Q1).
You can visualize this definition by using the figure to the right.
For a symmetric distribution, the quantile skewness is 0 because the length Q3-Q2 is equal to the length Q2-Q1.
If the right length (Q3-Q2) is larger than the left length (Q2-Q1), then the quantile skewness is positive.
If the left length is larger, then the quantile skewness is negative.
For the extreme cases, the quantile skewness is +1 when Q1=Q2 and –1 when Q2=Q3.
Consequently, whereas the Pearson skewness can be any real value, the quantile skewness is bounded in the interval [-1, 1].
The quantile skewness is not defined if Q1=Q3, just as the Pearson skewness is not defined when the variance of the data is 0.
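The cases listed above can be checked with a few hand-picked quartile values. The following sketch assumes a hypothetical helper module named QuantileSkew that takes the three quartiles directly and returns a missing value when Q1=Q3; the quartile values are made up for illustration.

```sas
proc iml;
/* hypothetical helper: quantile skewness from three quartile values */
start QuantileSkew(Q1, Q2, Q3);
   if Q3 = Q1 then return( . );            /* undefined when Q1 = Q3 */
   return( ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1) );
finish;

symm  = QuantileSkew(1, 3, 5);   /* equal lengths:   (2 - 2)/4 =  0   */
right = QuantileSkew(1, 2, 5);   /* longer right:    (3 - 1)/4 =  0.5 */
upper = QuantileSkew(1, 1, 5);   /* Q1 = Q2:         (4 - 0)/4 =  1   */
lower = QuantileSkew(1, 5, 5);   /* Q2 = Q3:         (0 - 4)/4 = -1   */
undef = QuantileSkew(2, 2, 2);   /* Q1 = Q3:         missing          */
print symm right upper lower undef;
quit;
```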
There is an intuitive interpretation for the quantile skewness formula. Recall that the relative difference between two quantities R and L can be defined as their difference divided by their average value. In symbols, RelDiff = (R – L) / ((R+L)/2). If you choose R to be the length Q3-Q2 and L to be the length Q2-Q1, then quantile skewness is half the relative difference between the lengths.
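The "half the relative difference" statement follows in one line once you notice that R + L is the interquartile range. The derivation below uses only the symbols already defined in the text.

```latex
\begin{align*}
R &= Q_3 - Q_2, \qquad L = Q_2 - Q_1, \qquad R + L = Q_3 - Q_1, \\
\gamma_Q &= \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1}
          = \frac{R - L}{R + L}
          = \frac{1}{2}\cdot\frac{R - L}{(R + L)/2}
          = \frac{1}{2}\,\mathrm{RelDiff}.
\end{align*}
```

For example, with made-up quartiles Q1=1, Q2=2, Q3=5, the lengths are R=3 and L=1, so RelDiff = 2/2 = 1 and the quantile skewness is 0.5.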
It is instructive to simulate some skewed data and compute the two measures of skewness.
The following SAS/IML statements simulate 1000 observations from a Gamma(a=4) distribution. The Pearson skewness of a Gamma(a) distribution is 2/sqrt(a), so the Pearson skewness for a Gamma(4) distribution is 1. For a large sample, the sample skewness should be close to the theoretical value. The QNTL call computes the quantiles of a sample.
    /* compute the quantile skewness for data */
    proc iml;
    call randseed(12345);
    x = j(1000, 1);
    call randgen(x, "Gamma", 4);

    skewPearson = skewness(x);               /* Pearson skewness */
    call qntl(q, x, {0.25 0.5 0.75});        /* sample quartiles */
    skewQuantile = (q[3] - 2*q[2] + q[1]) / (q[3] - q[1]);
    print skewPearson skewQuantile;
For this sample, the Pearson skewness is 1.03 and the quantile skewness is 0.174. If you generate a different random sample from the same Gamma(4) distribution, the statistics will change slightly.
In general, there is no simple relationship between quantile skewness and Pearson skewness for a data distribution. (This is not surprising: there is also no simple relationship between a median and a mean, nor between the interquartile range and the standard deviation.)
Nevertheless, it is interesting to compare the Pearson skewness to the quantile skewness for a particular probability distribution.
For many probability distributions, the Pearson skewness is a function of the parameters of the distribution.
To compute the quantile skewness for a probability distribution, you can use the quantiles for the distribution. The following SAS/IML statements compute the skewness for the Gamma(a) distribution for varying values of a.
    /* For Gamma(a), the Pearson skewness is skewP = 2 / sqrt(a).
       Use the QUANTILE function to compute the quantile skewness for the distribution. */
    skewP = do(0.02, 10, 0.02);      /* Pearson skewness for distribution */
    a = 4 / skewP##2;                /* invert skewness formula for the Gamma(a) distribution */
    skewQ = j(1, ncol(skewP));       /* allocate vector for results */
    do i = 1 to ncol(skewP);
       Q1 = quantile("Gamma", 0.25, a[i]);
       Q2 = quantile("Gamma", 0.50, a[i]);
       Q3 = quantile("Gamma", 0.75, a[i]);
       skewQ[i] = (Q3 - 2*Q2 + Q1) / (Q3 - Q1);   /* quantile skewness for distribution */
    end;

    title "Pearson vs. Quantile Skewness";
    title2 "Gamma(a) Distributions";
    call series(skewP, skewQ) grid={x y} label={"Pearson Skewness" "Quantile Skewness"};
The graph shows a nonlinear relationship between the two skewness measures. This graph is for the Gamma distribution; other distributions would have a different shape. If a distribution has a parameter value for which the distribution is symmetric, then the graph will go through the point (0,0). For highly skewed distributions, the quantile skewness will approach ±1 as the Pearson skewness approaches ±∞.
Several researchers have noted that there is nothing special about using the first and third quartiles to measure skewness. An alternative formula (sometimes called Kelly’s coefficient of skewness) is to use deciles: γ_{Kelly} = ((P90 – P50) – (P50 – P10)) / (P90 – P10). Hinkley (1975) considered the q_th and (1-q)_th quantiles for arbitrary values of q.
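The generalization to arbitrary quantiles is a small change to the earlier computation. The sketch below assumes a hypothetical module named HinkleySkew that takes the data and a value q in (0, 0.5); q=0.25 gives the quartile-based statistic and q=0.1 gives Kelly's coefficient.

```sas
proc iml;
/* hypothetical helper: quantile skewness based on the q-th and (1-q)-th quantiles */
start HinkleySkew(x, q);
   probs = q || 0.5 || (1-q);              /* lower quantile, median, upper quantile */
   call qntl(p, x, probs);
   return( ((p[3] - p[2]) - (p[2] - p[1])) / (p[3] - p[1]) );
finish;

call randseed(12345);
x = j(1000, 1);
call randgen(x, "Gamma", 4);               /* same kind of sample as in the article */

skewBowley = HinkleySkew(x, 0.25);         /* quartiles: Q1, Q2, Q3 */
skewKelly  = HinkleySkew(x, 0.10);         /* deciles: P10, P50, P90 */
print skewBowley skewKelly;
quit;
```

The two statistics generally differ because the decile version looks further into the tails of the data.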
The quantile definition of skewness is easy to compute. In fact, you can compute the statistic by hand without a calculator for small data sets. Consequently, the quantile definition provides an easy way to quickly estimate the skewness of data. Since the definition uses only quantiles, the quantile skewness is robust to extreme outliers.
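The robustness claim is easy to see on a toy example. The following sketch uses made-up data (the integers 1 through 99, then the same values plus one extreme observation) and a hypothetical helper module named QSkew.

```sas
proc iml;
/* hypothetical helper: quartile-based skewness of a data vector */
start QSkew(x);
   call qntl(q, x, {0.25 0.5 0.75});
   return( (q[3] - 2*q[2] + q[1]) / (q[3] - q[1]) );
finish;

x = T(1:99);          /* symmetric data: 1, 2, ..., 99 */
y = x // 1e6;         /* same data plus one extreme outlier */

results = (skewness(x) || QSkew(x)) //
          (skewness(y) || QSkew(y));
print results[r={"Symmetric" "With outlier"} c={"Pearson" "Quantile"}];
quit;
```

The Pearson skewness should jump from near zero to a large positive value when the outlier is added, whereas the quantile skewness should barely move because the quartiles shift by at most half an observation.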
At the same time, the Bowley-Galton quantile definition has several disadvantages. It uses only the central 50% of the data to estimate the skewness. Two different data sets that have the same quartile statistics will have the same quantile skewness, regardless of the shape of the tails of the distribution. And, as mentioned previously, the use of the 25th and 75th percentiles is somewhat arbitrary.
Although the Pearson skewness is widely used in the statistical community, it is worth mentioning that the quantile definition is ideal for use with a box-and-whisker plot.
The Q1, Q2, and Q3 quartiles are part of every box plot.
Therefore you can visually estimate the quantile skewness as half the relative difference between the lengths of the upper and lower boxes.
The post Organize your work with SAS® Enterprise Guide® Projects appeared first on SAS Learning Post.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
Nowadays, whether you write SAS programs or use point-and-click methods to get results, you have choices for how you access SAS. Currently, when you open Base SAS most people get the traditional SAS windowing environment (aka Display Manager) as their interface. But it doesn’t have to be that way. If […]
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post. |
Datasets are rarely ready for analysis, and one of the most prevalent problems is missing data. This post is the first in a short series focusing on how to think about missingness, how JMP13 can help us determine the scope of missing data in a given table, and how to […]
The post How severe is your missing data problem? appeared first on SAS Learning Post.
The post Finding important predictors: Using your data to explain what’s going on appeared first on SAS Learning Post.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
St. Louis Union Station welcomed its first passenger train on Sept. 2, 1894 at 1:45 pm and became one of the largest and busiest passenger rail terminals in the world. Back in those days, the North American railroads widely used a system called Timetable and Train Order Operation to establish […]
The post Customize your keys in SAS Enterprise Guide with AutoHotkey appeared first on The SAS Dummy.
This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post.
SAS power users (and actually, power users of any application) like to customize their environment for maximum productivity. Long-time SAS users remember the KEYS window in SAS display manager, which allows you to assign SAS commands to “hot keys” in your SAS session. These users will invest many hours to come up with the perfect keyboard mappings to suit the type of work that they do.
When using SAS Enterprise Guide, these power users often lament the lack of a similar KEYS window. But these people needn’t suffer with the default keys — a popular tool named AutoHotkey can fill the gap for this and for any other Windows application. I’ve recommended it to many SAS users over the years, and I’ve heard positive feedback from those who have adopted it. AutoHotkey is free, and it’s lightweight and portable; even users with “locked-down” systems can usually make use of it.
AutoHotkey provides its own powerful scripting language, which allows you to define new behaviors for any key combination that you want to repurpose. When you activate these scripts, AutoHotkey gets first crack at processing your keypress, so you can redirect the built-in key mappings for any Windows application. I’ll share two examples of different types of scripts that users have found helpful.
In SAS Enterprise Guide, F3 and F8 are both mapped to “Run program”. A newer user found the F8 mapping confusing because she had a habit of using that key for something else, and so became quite annoyed when she kept accidentally running her process before she was ready.
The following AutoHotkey script “eats” the F8 keypress. The logic first checks to see if the running process is SAS Enterprise Guide (seguide.exe), and if so, it simply stops processing the action, effectively vetoing the F8 action.
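    $F8::
    WinGet, Active_ID, ID, A
    WinGet, Active_Process, ProcessName, ahk_id %Active_ID%
    if ( Active_Process = "seguide.exe" )
    {
        ; eat the keystroke
    }
    else
    {
        ; assumption: other applications should still receive F8;
        ; the $ prefix keeps this Send from re-triggering the hotkey
        Send {F8}
    }
    return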
I recently shared a tip to close all open data sets in SAS Enterprise Guide. It’s a feature on the Tools menu that launches a special window, and some readers wished for a single key mapping to get the job done. Using AutoHotkey, you can map a series of clicks/keystrokes to a single key.
The following script will select the menu item, activate the “View Open Data Sets” window, and then select Close All.
    F12::
    WinGet, Active_ID, ID, A
    WinGet, Active_Process, ProcessName, ahk_id %Active_ID%
    if ( Active_Process = "seguide.exe" )
    {
        Sleep, 100
        Send {Alt Down}{Alt Up}{t}    ; open the Tools menu
        Sleep, 100
        Send, {v}                     ; choose View Open Data Sets
        WinActivate, View Open Data Sets ahk_class WindowsForms10.Window.8.app.0.143a722_r12_ad1
        Send, {Tab}                   ; move to the Close All button
        Sleep, 100
        Send, {Space}                 ; press the button
        Sleep, 500
        Send, {Esc}                   ; close the window
    }
    return
You’ll see that one of the script commands activates the “View Open Data Sets” window. The window “class” is referenced, and the class name is hardly intuitive. AutoHotkey includes a “Window spy” utility called “Active Window Info” that can help you to find the exact name of the window you need to activate.
AutoHotkey can direct mouse movements and clicks, but those directives might not be reliable in different Windows configurations. In my scripts, I rely on simulated keyboard commands. This script activates the top-level menu with Alt+”t” (for Tools), then “v” (for the “View Open Data Sets” window), then TAB to the “Close All” button, space bar to press the button, then Escape to close the window. Each action takes some time to take effect, so “Sleep” commands are inserted (with times in milliseconds) to allow the actions to complete.
Every action in SAS Enterprise Guide is accessible by the keyboard (even if several keystrokes are required). If you want to see all of the already-defined keyboard mappings, search the SAS Enterprise Guide help for “keyboard shortcuts.”
In this article, I’ve only just scratched the surface of how you can customize keys and automate actions in SAS Enterprise Guide. Some of our users have asked us to build in the ability to customize key actions within the application. While that might be a good enhancement within the boundaries of your SAS applications, a tool like AutoHotkey can help you to automate your common tasks within SAS and across other applications that you use. The scripting language presents a bit of a learning curve, but the online help is excellent. And there is a large community of AutoHotkey users who have published hundreds of useful examples.
Have you used AutoHotkey to automate any SAS tasks? If so, please share your tips here in the comments or within the SAS Enterprise Guide community.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post. |
My previous blog post focused on a graph, showing the % of women earning STEM degrees in various fields. While that graph was designed to answer a very specific question, let’s now look at the data from a broader perspective. Let’s look at the total number of STEM degrees […]
The post Tracking STEM degrees – a deeper look! appeared first on SAS Learning Post.