The post Let’s celebrate 40 years of SAS users appeared first on SAS Learning Post.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
Let’s go back to 1976 for a moment. Did you know that was the year Muhammad Ali introduced a line of beauty products called “Knock Out?” And the hottest merchandise, including t-shirts, posters and even beanbags sported the character Arthur Fonzarelli (The Fonz) of TV’s “Happy Days.” But there was something […]
The post Visualize the Cantor function in SAS appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post.
I was a freshman in college the first time I saw the Cantor middle-thirds set and the related Cantor “Devil’s staircase” function. (Shown at left.) These constructions expanded my mind and led me to study fractals, real analysis, topology, and other mathematical areas.
The Cantor function and the Cantor middle-thirds set are often used as counter-examples to mathematical conjectures.
The Cantor set is defined by a recursive process that requires infinitely many steps. However, you can approximate these pathological objects in a matrix-vector language such as SAS/IML with only a few lines of code!
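The Cantor middle-thirds set is defined by the following iterative algorithm. The algorithm starts with the closed interval [0,1], then does the following: in the first step, it removes the open middle third (1/3, 2/3), which leaves the two closed intervals [0,1/3] and [2/3,1]; in each subsequent step, it removes the open middle third of every interval that remains.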
After two steps you have four intervals: [0,1/9], [2/9,1/3], [2/3, 7/9], and [8/9,1]. After three steps you have eight intervals of length 1/27. After k steps you have 2^{k} intervals of length 3^{–k}.
The Cantor set is the set of all points that never get removed during this infinite process. The Cantor set clearly contains infinitely many points because it contains the endpoints of the intervals that are removed: 0, 1, 1/3, 2/3, 1/9, 2/9, 7/9, 8/9, and so forth. Less intuitive is the fact that the cardinality of the Cantor set is uncountably infinite even though it is a set of measure zero.
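The measure-zero claim follows from the interval counts above. As a short sketch, write L_k for the total length that remains after k steps (L_k is notation introduced here, not in the original post):

```latex
L_k \;=\; 2^{k}\cdot 3^{-k} \;=\; \left(\tfrac{2}{3}\right)^{k} \;\longrightarrow\; 0
\quad\text{as } k \to \infty,
\qquad\text{equivalently}\qquad
\sum_{k=1}^{\infty} \frac{2^{k-1}}{3^{k}}
\;=\; \frac{1}{3}\sum_{k=0}^{\infty}\left(\tfrac{2}{3}\right)^{k}
\;=\; 1 .
```

The second sum adds up the lengths of the removed middle thirds: step k removes 2^{k-1} intervals of length 3^{-k}, and together they account for the entire length of [0,1].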
The Cantor function F: [0,1] → [0,1] can be defined iteratively in a way that reflects the construction of the Cantor middle-thirds set. The function is shown at the top of this article.
This is a SAS-related blog, so I want to visualize the Cantor function in SAS. The middle-third intervals during the kth step of the construction have length 3^{–k}, so you can stop the construction after a small number of iterations and get a decent approximation. I’ll use k=8 steps.
Although the Cantor set and function were defined geometrically, they have an equivalent definition in terms of base-3 (ternary) expansions. The Cantor set is the set of values in [0,1] that can be written in base 3 without using the ‘1’ digit. In other words, elements of the Cantor set have the form
x = 0.a_{1}a_{2}a_{3}… (base 3), where a_{i} equals 0 or 2.
An equivalent definition in terms of fractions is x = Σ_{i} a_{i}3^{-i} where a_{i} equals 0 or 2.
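A worked example may help here. The point 1/4 is not an endpoint of any removed interval, yet it belongs to the Cantor set because its base-3 expansion uses only the digits 0 and 2. The choice of 1/4 is an illustration added here; it does not appear in the original post.

```latex
\frac{1}{4} \;=\; 0.020202\ldots_{3}
\;=\; \sum_{j=1}^{\infty} 2\cdot 3^{-2j}
\;=\; 2\cdot\frac{1/9}{1-1/9}
\;=\; 2\cdot\frac{1}{8} .
```

In the notation above, this is the choice a_{i} = 2 for even i and a_{i} = 0 for odd i.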
Although the sum is infinite, you can approximate the Cantor set by truncating the series after finitely many terms. A sum like this can be expressed as an inner product x = a*v` where a is a k-element row vector that contains 0s and 2s and v is a vector that contains the elements {1/3, 1/9, 1/27, …, 1/3^{k}}.
You can define B to be a matrix with k columns and 2^{k} rows that contains all combinations of 0s and 2s. Then the matrix product B*v is an approximation to the Cantor set after k steps of the construction. It contains the right-hand endpoints of the middle-third intervals.
In SAS/IML you can use the EXPANDGRID function to create a matrix whose rows contain all combinations of 0s and 2s. The ## operator raises an element to a power. Therefore the following statements construct and visualize the Cantor function.
proc iml;
/* rows of B contain all 8-digit combinations of 0s and 2s */
B = expandgrid({0 2}, {0 2}, {0 2}, {0 2}, {0 2}, {0 2}, {0 2}, {0 2});
B = B[2:nrow(B),];      /* remove first row of zeros */
k = ncol(B);            /* k = 8 */

v = 3##(-(1:k));        /* vector of powers 3^{-i} */
t = B * v`;             /* x values: right-hand endpts of middle-third intervals */

u = 2##(-(1:k));        /* vector of powers 2^{-i} */
f = B/2 * u`;           /* y values: Cantor function on Cantor set */
call series(t, f);      /* graph the Cantor function */

With a little more effort, you can write a few more statements that improve the approximation and add fractional tick marks to the axes, as shown in the graph at the top of this article.
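To check the construction by hand, the same statements can be run with k = 3 and the result printed instead of plotted. This is only a sketch: the PRINT statement, the FRACT8. format, and the QUIT statement are additions for illustration, and the row of zeros is kept so that all eight digit combinations appear.

```sas
proc iml;
/* Same construction as in the post, but with k = 3 so the output is small */
B = expandgrid({0 2}, {0 2}, {0 2});  /* 8 rows: all 3-digit combinations of 0s and 2s */
k = ncol(B);                          /* k = 3 */

v = 3##(-(1:k));                      /* {1/3 1/9 1/27} */
u = 2##(-(1:k));                      /* {1/2 1/4 1/8}  */

t = B * v`;                           /* ternary value 0.a1a2a3 (base 3)                 */
f = (B/2) * u`;                       /* binary value 0.b1b2b3 (base 2), with b_i=a_i/2  */

/* print the digits next to the x and y values, shown as fractions */
print B t[format=fract8.] f[format=fract8.];
quit;
```

Each printed row pairs a ternary number that uses only the digits 0 and 2 with the binary number obtained by halving every digit. For example, the row whose digits are 2, 0, 2 should give t = 2/3 + 2/27 = 20/27 and f = 1/2 + 1/8 = 5/8.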
I think this is a very cool construction. Although the Cantor function is defined iteratively, there are no loops in this program. The loops are replaced by matrix multiplication and vectors. The power of a matrix language is that it enables you to compute complex quantities with only a few lines of programming.
Do you have a favorite example from math or statistics that changed the way that you look at the world? Leave a comment.
This short article cannot discuss all the mathematically interesting features of the Cantor set and Cantor function. The following references are provided for the readers who want additional information:
The post Could I get a Brexit map over here, please!?! appeared first on SAS Learning Post.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
With the recent vote for the United Kingdom to leave the European Union being all over the news, I was a bit embarrassed to realize I didn’t know exactly what areas are (and aren’t) considered part of the UK. After a few Google searches, I found the following map on the […]
The post Video: Demonstrating the new features in SAS Enterprise Guide 7.1 appeared first on The SAS Dummy.
This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post.
Would you like to see the latest features of SAS Enterprise Guide in action? Of course you would! That’s why it’s well worth the 12 minutes of your time to watch this video from SAS Global Forum 2016.
In the video, Casey Smith (SAS’ R&D manager of the SAS Enterprise Guide team) shows off the favorite new features, including:
Casey also talks about his unique perspective as a second-generation SAS user. His Mom is a long-time SAS user; Casey was raised with SAS in the house! It’s only appropriate that Casey went on to join SAS as an employee. He frequently presents for user groups and you can often find Casey (as CaseyS_SAS) on the SAS Enterprise Guide discussion board in SAS Support Communities.
The post In praise of simple graphics appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post.
‘Tis a gift to be simple.
— Shaker hymn
In June 2015 I published a short article for Significance, a magazine that features statistical and data-related articles that are of general interest to a wide range of scientists.
The title of my article is “In Praise of Simple Graphics.” It is based on a blog post “Visualizing the causes of airline crashes.”
My article compares infographics and statistical graphics.
Infographics are designed to appeal as well as to inform. Unfortunately, a beautiful artistic display can sometimes obscure the data.
In contrast, a statistician usually has a different goal: represent the data objectively and let the data speak for themselves. Standard statistical graphics are purposely free of excess adornment in a Tuftean effort to maximize the data-ink ratio. Their beauty is in their minimalist simplicity.
Yes, I sometimes create complex graphs on my blog. In the past three weeks I’ve featured spaghetti plots, lasagna plots, and effect plots.
However, I create complex graphs only to visualize complex data or models. For simple data, I advocate using a simple graph. I strive to never let the graph get in the way of the data.
To paraphrase Einstein, graphs should be as complex as necessary, but no more complex.
You can read my article “In Praise of Simple Graphics” at the Significance web site. If you like data analysis, graphics, and statistical ideas, the Significance magazine archives are a great resource. All issues of Significance are freely available one year after publication. Enjoy!
The post Assign a SAS library to a different path depending on your OS appeared first on The SAS Dummy.
This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post.
One thing that we have a lot of at SAS: installations of SAS software that we can run. I have SAS for Windows on my laptop, and I have access to many centralized instances of SAS that run on Linux and Windows servers. (I also have access to mainframe SAS, though it’s been a while since I’ve used it. When I log in, I picture a Rube Goldberg-style mechanism that pokes an intern to mount a tape so my profile can be reloaded.)
I often develop programs using my local instance of SAS and SAS Enterprise Guide, but deploy them for use on a central server. I might run them as batch jobs or interactively with SAS Enterprise Guide or SAS Studio or even in SAS/IntrNet.
Our IT department wants SAS employees to have seamless access to their files whether on Windows or on Unix-style file systems, and so they make it easy to access the same network path from Windows (using UNC notation, or “\\server\path” syntax) and Unix (using “/node/usr/path” syntax). As I develop my SAS programs, I want the programs to work the same whether run from Windows or Unix, and I don’t want to have to change LIBNAME paths each time. Fortunately, SAS programs are usually portable across operating systems, and although SAS data sets might have different encodings across systems, SAS can usually read a data set that was created on a different system.
I have a simple technique that references the proper path for the operating system that I’m using. I build a SAS macro variable by using the IFC function and the &SYSSCP automatic variable to check whether I’m running on Windows, then assign the path accordingly.
/* Use the IFC function as a shorthand for if-then, returning a character string */
%let tgtpath = %sysfunc(
  ifc(&SYSSCP. = WIN,
      \\sasprod\root\dept\mydept\project,
      /r/node/vol/vol01/mydept/project
    )
  );
libname tgt "&tgtpath.";
When I run this on SAS for Linux, I see this in the log:
NOTE: Libref TGT was successfully assigned as follows:
      Engine:        V9
      Physical Name: /r/node/vol/vol01/mydept/project
And on Windows:
NOTE: Libref TGT was successfully assigned as follows:
      Engine:        V9
      Physical Name: \\sasprod\root\dept\mydept\project
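The same idea can also be written with explicit macro logic, which may be easier to extend if more than two environments are involved. The following is a minimal sketch, not the author’s method: the macro name AssignTgt is hypothetical, and the paths are simply the two used in the post.

```sas
/* Sketch: choose the LIBNAME path with %IF/%ELSE instead of IFC.        */
/* AssignTgt is a hypothetical macro name; the paths come from the post. */
%macro AssignTgt;
   %if "&SYSSCP" = "WIN" %then %do;
      /* running on Windows: use the UNC path */
      libname tgt "\\sasprod\root\dept\mydept\project";
   %end;
   %else %do;
      /* any other host: use the Unix-style path */
      libname tgt "/r/node/vol/vol01/mydept/project";
   %end;
%mend AssignTgt;

%AssignTgt

/* Write the detected host and the resolved path to the log */
%put NOTE: SYSSCP=&SYSSCP, libref TGT resolves to %sysfunc(pathname(tgt));
```

Quoting both sides of the comparison keeps the %IF test a plain text comparison even when the value of &SYSSCP contains a blank. The macro form also avoids passing the paths as function arguments, which could matter if a path ever contained a comma.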
For more posts like this, see Heuristic Andrew.
This post was kindly contributed by Heuristic Andrew - go there to comment and to read the full post.
SAS can give the error “The SAS System stopped processing this step because of insufficient memory” when querying a single, wide row from a remote SQL Server. The following code fully demonstrates the problem and shows a workaround. I also rule out the explanation that SAS data sets in general do not support rows this wide.
The post Are there Zika mosquitoes in your county? appeared first on SAS Learning Post.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
Let’s create a souped-up SAS map that can track Zika-carrying mosquitoes down to the county level, in the US! A few months ago, I wrote a blog post with a world map of documented locations of the Aedes mosquitoes that could carry the Zika virus. The world map showed a high concentration […]
The post Use the EFFECTPLOT statement to visualize regression models in SAS appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post.
Graphs enable you to visualize how the predicted values for a regression model depend on the model effects. You can gain an intuitive understanding of a model by using the EFFECTPLOT statement in SAS to create graphs like the one shown at the top of this article.
Many SAS regression procedures automatically create ODS graphics for simple regression models. For more complex models (including interaction effects and link functions), you can use the EFFECTPLOT statement to construct effect plots. An effect plot shows the predicted response as a function of certain covariates while other covariates are held constant.
The EFFECTPLOT statement was introduced in SAS 9.22, but it is not as well known as it should be. Although many procedures include an EFFECTPLOT statement as part of their syntax, I will use the PLM procedure (PLM = post-linear modeling) to show how to construct effect plots. I have previously shown how to use the PLM procedure to score regression models. A good introduction to the PLM procedure is Tobias and Cai (2010), “Introducing PROC PLM and Postfitting Analysis for Very General Linear Models.”
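The data for this article is the Sashelp.BWeight data set, which is distributed with SAS. There are 50,000 records. Each row gives information about the birth weight of a baby, including information about the mother. This article uses the following variables: Weight (the baby’s birth weight, in grams), MomAge (the mother’s age, centered at 27 years), CigsPerDay (the number of cigarettes that the mother smoked per day), and Boy (an indicator that has the value 1 for a baby boy).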
The following DATA step defines a SAS view that adds an indicator variable, Underweight, which has the value 1 if the baby’s birth weight was less than 2500 grams and 0 otherwise:
/* Underweight=1 if the birth weight is <2500 grams and Underweight=0 otherwise */
data babyWeight / view=BabyWeight;
   set sashelp.bweight;
   Underweight = (Weight < 2500);
run;
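Before modeling, it can be worth confirming that the indicator splits the data at the intended cutoff. The following check is a sketch added for illustration; it is not part of the original post.

```sas
/* Sketch: summarize Weight within each level of the Underweight indicator. */
/* Expect max(Weight) < 2500 when Underweight=1 and min(Weight) >= 2500     */
/* when Underweight=0.                                                      */
proc means data=babyWeight n nmiss min max;
   class Underweight;
   var Weight;
run;
```

The NMISS column is included because SAS treats a missing numeric value as smaller than any number, so the expression (Weight < 2500) would assign Underweight=1 to a row with a missing weight. Whether Sashelp.BWeight contains such rows is not stated in the post; the NMISS column answers that question directly.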
To illustrate the capabilities of the EFFECTPLOT statement, the following statements use PROC LOGISTIC to model the probability of having an underweight boy baby (less than 2500 grams). The explanatory effects are MomAge, CigsPerDay, and the interaction effect between those two variables.
The STORE statement creates an item store called logiModel. The item store is read by PROC PLM, which creates the effect plot:
proc logistic data=babyWeight;
   where Boy=1;                       /* restrict to baby boys */
   model Underweight(event='1') = MomAge | CigsPerDay;
   store logiModel;
run;

title "Probability of Underweight Boy Baby";
proc plm source=logiModel;
   effectplot fit(x=MomAge plotby=CigsPerDay);
run;
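In model terms, the bar operator in the MODEL statement expands MomAge | CigsPerDay into both main effects plus their interaction. A sketch of the fitted model, where p and the β coefficients are notation introduced here rather than taken from the post:

```latex
\operatorname{logit}(p) \;=\; \log\frac{p}{1-p}
\;=\; \beta_0
\;+\; \beta_1\,\mathrm{MomAge}
\;+\; \beta_2\,\mathrm{CigsPerDay}
\;+\; \beta_3\,(\mathrm{MomAge}\times\mathrm{CigsPerDay}),
\qquad p = \Pr(\mathrm{Underweight}=1).
```

The effect plot displays p, not logit(p), which is why the curves are not straight lines. The interaction term β_3 is what allows the age effect to differ from one panel of CigsPerDay to the next.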
In this example, the output is a panel of plots that show the predicted probability of having an underweight boy baby as a function of the mother’s relative age. (Remember: the age is centered at 27 years.) The panel shows slices of the continuous CigsPerDay variable, which enables you to see how the predicted response changes with increasing cigarette use.
The graphs indicate that the probability of an underweight boy is very low in nonsmoking mothers, regardless of the mother’s age. In smoking mothers, however, the probability of having an underweight boy increases with age. For mothers of a given age, the probability of an underweight boy increases with the number of cigarettes smoked.
The example shows a panel of fit plots, where the paneling variable is determined by the PLOTBY= option. You can also “stack” the predicted probability curves by using a slice plot. You can specify a slice plot by using the SLICEFIT keyword. You specify the slicing variable by using the SLICEBY= option, as follows:
proc plm source=logiModel;
   effectplot slicefit(x=MomAge sliceby=CigsPerDay);
run;
An example of a slice plot is shown in the next section.
You can also use the EFFECTPLOT statement to create a contour plot of the predicted response as a function of the two continuous covariates, which is also shown in the next section.
The effect plot is especially useful when visualizing complex models. When there are several independent variables and interactions, you can create multiple plots that show the predicted response at various levels of categorical or continuous variables. By default, covariates that do not appear in the plots are fixed at their mean level (for continuous variables) or their reference level (for classification variables).
The previous example used a WHERE clause to restrict the data to boy babies. Suppose that you want to include the gender of the baby as a covariate in the regression model. The following call to PROC LOGISTIC includes the main effects and two-way interactions between two continuous variables and one classification variable.
The call to PROC PLM creates a panel of slice plots. Each slice plot shows predicted probability curves for slices of the CigsPerDay variable. The panels are determined by levels of the Boy variable, which is specified on the PLOTBY= option:
proc logistic data=babyWeight;
   class Boy;
   model Underweight(event='1') = MomAge | CigsPerDay | Boy @2;
   store logiModel;
run;

proc plm source=logiModel;
   effectplot slicefit(x=MomAge sliceby=CigsPerDay plotby=Boy);
run;
The output is shown in the graph at the top of this article.
The right side of the panel shows the predicted probabilities for boys. These curves are similar to those in the previous example, but now they are overlaid on a single plot. The left side of the panel shows the corresponding curves for girl babies. In general, the model predicts that girl babies have a higher probability of being underweight (relative to boys) in smoking mothers. The effect is most noticeable for younger mothers.
If you want to add confidence limits for the predicted curves, you can use the CLM option: effectplot slicefit(...) / CLM.
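Written out in full for the model above, the CLM option goes after a slash at the end of the EFFECTPLOT statement. This sketch reuses the logiModel item store and the plot request from the preceding PROC PLM call:

```sas
/* Sketch: the same panel of slice plots, with confidence limits added */
proc plm source=logiModel;
   effectplot slicefit(x=MomAge sliceby=CigsPerDay plotby=Boy) / clm;
run;
```

With several slices overlaid in one panel, the confidence bands can overlap heavily, so the display is likely to be easier to read when only a few slices are requested.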
You can specify the levels of a continuous variable that are used to slice or panel the curves. For example, most cigarettes come in a pack of 20, so the following EFFECTPLOT statement visually compares the effect of smoking for pregnant women who smoke zero, one, or two packs per day:
effectplot slicefit(x=MomAge sliceby=CigsPerDay=0 20 40 plotby=Boy);
Notice that there are no parentheses around the argument to the SLICEBY= option. That is, you might expect the syntax to be sliceby=(CigsPerDay=0 20 40), but that syntax is not supported.
If you want to directly compare the probabilities for boys and girls, you might want to interchange the SLICEBY= and PLOTBY= variables. The following statements create a graph that has three panels, and each panel directly compares boys and girls:
proc plm source=logiModel;
   effectplot slicefit(x=MomAge sliceby=boy plotby=CigsPerDay=0 20 40);
run;
As mentioned previously, you can also create contour plots that display the predicted response as a function of two continuous variables. The following statements create two contour plots, one for boy babies and one for girls:
proc plm restore=logiModel;
   effectplot contour(x=MomAge y=CigsPerDay plotby=Boy);
run;
The EFFECTPLOT statement enables you to create plots that visualize interaction effects in complex regression models. The EFFECTPLOT statement is a hidden gem in SAS/STAT software that deserves more recognition.
The easiest way to create an effect plot is to use the STORE statement in a regression procedure to create an item store, then use PROC PLM to create effect plots. In that way, you only need to fit a model once, but you can create many plots that help you to understand the model.
You can overlay curves, create panels, and even create contour plots. Several other plot types are also possible. See the documentation for the EFFECTPLOT statement for the full syntax, options, and additional examples of how to create plots that visualize interactions in generalized linear models.
The post Will independent voters decide the election? appeared first on SAS Learning Post.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
In recent years, more and more people have been registering as independent voters in the US, rather than Democrat or Republican – the independents now control well over 1/3 of the votes. Will they likely vote for the Democrat or Republican candidates in the upcoming election? Let’s break down some numbers […]