The post Looking for indications of fraud, in North Carolina's absentee ballots appeared first on SAS Learning Post.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
Certain North Carolina counties have been in the news lately, for suspected election fraud involving absentee ballots in the 2018 election. Let’s analyze the voter registration and absentee ballot data, to see if we can detect anything suspicious! In order to definitively determine whether fraud & illegal activity occurred, investigators […]
The post The essential guide to bootstrapping in SAS appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post.
This article describes best practices and techniques that every data analyst should know before bootstrapping in SAS.
The bootstrap method is a powerful statistical technique, but it can be a challenge to implement it efficiently.
An inefficient bootstrap program can take hours to run, whereas a well-written program can give you an answer in an instant.
If you prefer “instants” to “hours,” this article is for you! I’ve compiled dozens of resources that explain how to compute bootstrap statistics in SAS.
Recall that a bootstrap analysis enables you to investigate the sampling variability of a statistic without making any distributional assumptions about the population. For example, if you compute the skewness of a univariate sample, you get an estimate for the skewness of the population. You might want to know the range of skewness values that you might observe from a second sample (of the same size) from the population. If the range is large, the original estimate is imprecise. If the range is small, the original estimate is precise. Bootstrapping enables you to estimate the range by using only the observed data.
In general, the basic bootstrap method consists of four steps:
The links in the previous list provide examples of best practices for bootstrapping in SAS. In particular, do not fall into the trap of using a macro loop to “resample, analyze, and append.” You will eventually get the correct bootstrap estimates, but you might wait a long time to get them!
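As a rough sketch of how the resample, analyze, and summarize workflow can avoid a macro loop, the following program uses PROC SURVEYSELECT to draw all bootstrap samples in one call and BY-group processing to analyze them in one pass. The choice of data (the Virginica rows of Sashelp.Iris), the skewness statistic, the seed, the number of resamples, and the data set names BootSamples, BootDist, and BootCI are assumptions made for illustration.

```sas
/* Sketch: bootstrap standard error and 95% percentile interval for skewness. */
/* Data set, seed, and number of resamples are illustrative assumptions.      */
%let NumSamples = 5000;

/* 1. Compute the statistic on the original data */
proc means data=Sashelp.Iris(where=(Species="Virginica")) nolabels skew;
   var SepalLength;
run;

/* 2. Draw all bootstrap samples at once (sampling with replacement).         */
/*    Without OUTHITS, each selected row is written once, and the variable    */
/*    NumberHits records how many times it was selected.                      */
proc surveyselect data=Sashelp.Iris(where=(Species="Virginica")) noprint seed=12345
     out=BootSamples(rename=(Replicate=SampleID))
     method=urs samprate=1 reps=&NumSamples;
run;

/* 3. Compute the statistic for every bootstrap sample by using a BY statement */
proc means data=BootSamples noprint;
   by SampleID;
   freq NumberHits;
   var SepalLength;
   output out=BootDist skew=Skewness;
run;

/* 4. Summarize the bootstrap distribution */
proc univariate data=BootDist noprint;
   var Skewness;
   output out=BootCI std=BootStdErr
          pctlpre=CI95_ pctlpts=2.5 97.5 pctlname=Lower Upper;
run;

proc print data=BootCI noobs;
run;
```

Because every resample is stored in one data set and identified by SampleID, the analysis procedure is called once rather than once per resample, which is where most of the time savings usually come from.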
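The remainder of this article is organized by the three ways to perform bootstrapping in SAS: programming the bootstrap yourself, using the SAS-supplied bootstrap macros, and using SAS procedures that have built-in support for bootstrap estimates.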
The articles in this section describe how to program the bootstrap method in SAS for basic univariate analyses, for regression analyses, and for related resampling techniques such as the jackknife and permutation tests. This section also links to articles that describe how to generate bootstrap samples in SAS.
When you bootstrap regression statistics, you have two choices for generating the bootstrap samples:
An important part of bootstrapping is generating multiple bootstrap samples from the data. In SAS, there are many ways to obtain the bootstrap samples:
The SAS-supplied macros %BOOT, %JACK, and %BOOTCI can perform basic bootstrap analyses and jackknife analyses. However, they require a familiarity with writing and using SAS macros. If you are interested, I wrote an example that shows how to use the %BOOT and %BOOTCI macros for bootstrapping. The documentation also provides several examples.
Many SAS procedures not only compute statistics but also provide standard errors or confidence intervals that enable you to infer whether an estimate is precise. Many confidence intervals are based on distributional assumptions about the population. (“If the errors are normally distributed, then….”) However, the following SAS procedures provide an easy way to obtain a distribution-free confidence interval by using the bootstrap. See the SAS/STAT documentation for the syntax for each procedure.
Resampling techniques such as bootstrap methods and permutation tests are widely used by modern data analysts. But how you implement these techniques can make a huge difference between getting the results in a few seconds versus a few hours.
This article summarizes and consolidates many previous articles that demonstrate how to perform an efficient bootstrap analysis in SAS. Bootstrapping enables you to investigate the sampling variability of a statistic without making any distributional assumptions. In particular, the bootstrap is often used to estimate standard errors and confidence intervals for parameters.
The post Does SAS support Microsoft Office 365? appeared first on The SAS Dummy.
This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post.
In her role as Product Manager for SAS Platform Technologies (including the SAS Add-In for Microsoft Office), my colleague Amy Peters hears this question often. With many organizations adopting Microsoft Office 365 — the “cloud” version of Office — what does this mean for other processes that integrate with Microsoft Office applications?
Microsoft has used different names for these similar offerings: Office 2016, Office 365, Microsoft 365, Office Online. The bottom line is that most users of a “365” package in the cloud also have access to the Microsoft Office tools on their Windows desktop. They can use the full version of Excel, PowerPoint, Word, etc., and they also have access to these same tools via a web browser. At SAS, we recently experienced this transition ourselves. Have the Office applications on our desktops vanished? No, they have not. While more of our data is now on the cloud (looking at you, OneDrive), it’s not really changing how we work, especially when creating/maintaining content. (Like many organizations, we already had one foot in this world by using Microsoft SharePoint for collaboration.)
Let’s look at an example of how I use SAS with Microsoft Office. First, I create a report in SAS Visual Analytics. Then I open Excel on my desktop and use the SAS Add-In for Microsoft Office to embed the shared report into my spreadsheet. Want to see what that looks like in action? Check out this video Tech Talk with SAS developer Tim Beese.
Now suppose that I share this content in Microsoft OneDrive, and my colleague views it in Excel in a web browser. Yep, the content is still there. The difference is that the content is not dynamic like it is on my desktop. So what do you do when you want to edit that spreadsheet displaying in the browser? You select Open in Excel and the document opens on your desktop. Voila! The content is dynamic and you have all the functionality the SAS Add-In for Microsoft Office provides.
Today, the expectation of most users working with “Office Online” applications in their browsers is that it’s primarily for viewing and basic editing. Will this change? Probably. We’re researching how to provide more of the SAS Add-In for Microsoft Office function in a browser app. If you or your colleagues need this browser-based function – you want to do something specific in Excel with your SAS content — we want to hear from you. And do you have a plan to move completely to browser-based Office apps? Currently you can’t create SAS content from a browser-based Office app. If that’s a pressing need, we would like to know. For now, we’re not hearing of use cases where some form of the desktop app isn’t still in the picture.
SAS integration with these everyday productivity tools, like Microsoft Office, is important to us. Don’t forget about these SAS programming methods to create and read your Microsoft Office content:
How are you using Microsoft Office 365 with SAS? How do you think this workflow will change for you in the next year or two? Leave a comment — we would love to hear from you.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
We got our first ‘big’ snow of the season here at the SAS headquarters in Cary, NC … therefore I thought this would be a great time to dig into some snow data! Follow along and pick up some tips & tricks as I plot our snow data – and […]
The post Plotting snow data for your area appeared first on SAS Learning Post.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
SAS has partnered with Pearson VUE to offer online proctoring for all public SAS exams.
The post Take your SAS exam from home or office appeared first on SAS Learning Post.
The post Visualize Christmas songs appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post.
The best way to spread Christmas cheer
is singing loud for all to hear!
-Buddy in Elf
In the Christmas movie Elf (2003), Jovie (played by Zooey Deschanel) must “spread Christmas cheer” to help Santa. She chooses to sing “Santa Claus is coming to town,” and soon all of New York City is singing along.
The best sing-along songs are short and have lyrics that repeat. Jovie’s choice, “Santa Claus is coming to town,” satisfies both criteria. The musical structure of the song is simple:
There is a fun way to visualize repetition in song lyrics. For a song that has N words, you can define the repetition matrix to be the N x N matrix where the (i,j)th cell has the value 1 if the i_th word is the same as the j_th word. Otherwise, the (i,j)th cell equals 0. You can visualize the matrix by using a two-color heat map. Colin Morris has a web site devoted to these visualizations.
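The definition above translates almost directly into SAS/IML. The following is a minimal sketch, not the program used to create the images in this post: the eight-word input and the names w and M are illustrative, and the HEATMAPDISC call assumes a release of SAS/IML that supports the heat map subroutines.

```sas
proc iml;
/* a few words of a lyric, one word per element (illustrative input) */
w = {"you" "better" "watch" "out" "you" "better" "not" "cry"};
N = ncol(w);

/* repetition matrix: M[i,j] = 1 if the i_th word equals the j_th word */
M = j(N, N, 0);
do i = 1 to N;
   M[i, ] = (w = w[i]);     /* elementwise comparison returns a 0/1 row */
end;
print M[rowname=w colname=w];

/* two-color heat map: 0 is drawn in white and 1 in black */
call heatmapdisc(M) colorramp={white black} title="Repetition matrix";
quit;
```

The matrix is symmetric and has 1s on the diagonal because every word matches itself. Repeated phrases show up as short diagonal streaks away from the main diagonal.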
The following image visualizes the lyrics of “Santa Claus is coming to town.” I have added some vertical and horizontal lines to divide the lyrics into seven sections: the verses (V1 and V2), the tag line (S), and the bridge (B).
The image shows the structure of the repetition in the song lyrics:
Now that you understand what a repetition matrix looks like and how to interpret it, let’s visualize a few other classic Christmas songs that contain repetitive lyrics! To help “spread Christmas cheer,” I’ll use shades of red and green to visualize the lyrics, rather than the boring white and black colors.
If you make a list of Christmas songs that have repetition, chances are “The Twelve Days of Christmas” will be at the top of the list. The song is formulaic: each new verse adds a few new words before repeating the words from the previous verse. As a result, the repetition matrix is almost boring in its regularity. Here is the visualization of the classic song (click to enlarge):
Another highly repetitive Christmas song is “The Little Drummer Boy,” which features an onomatopoeic phrase (Pa rum pum pum pum) that alternates with the other lyrics. A visualization of the classic song is shown below:
In addition to repeating the title, “Silver Bells” repeats several phrases. Most notably, the phrase “Soon it will be Christmas Day” is repeated multiple times at the end of the song. Because only certain phrases are repeated, the visualization has a pleasing structure that complements the song’s lyrical qualities:
To contrast the hustle, bustle, and commercialism of Christmas, I enjoy hearing songs that are musically simple. One of my favorites is “Silent Night.” Each verse is distinct, yet each begins with “Silent night, holy night!” and ends by repeating a phrase. The resulting visualization is devoid of clutter. It is visually empty and matches the lyrical imagery, “all is calm, all is bright.”
You can download the SAS program that creates these images. The program also computes visualizations of some contemporary songs such as “Last Christmas” by Wham!, “Someday at Christmas” (Stevie Wonder version), “Rockin’ Around the Christmas Tree” (Brenda Lee version), and “Happy XMas (War Is Over)” by John Lennon and Yoko Ono. If you have access to SAS, you can even add your own favorite lyrics to the program! If you don’t have access to SAS, Colin Morris’s website enables you to paste in the lyrics and see the visualization.
In a little-known “deleted scene” from Elf, Buddy says that the second-best way to spread Christmas cheer is posting images for all to share! So post a comment and share your favorite visualization of a Christmas song!
Happy holidays to all my readers. I am grateful for you. Merry Christmas to all, and to all a good night!
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
Joseph Woodside discusses using ensemble modeling in SAS for fraud detection and readmissions.
The post Government Healthcare Data Ensemble Modeling appeared first on SAS Learning Post.
The post When is a histogram not a histogram? When it's a table! appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post.
Recently a SAS programmer wanted to obtain a table of counts that was based on a histogram. I showed him how you can use the OUTHIST= option on the HISTOGRAM statement in PROC UNIVARIATE to obtain that information. For example, the following call to PROC UNIVARIATE creates a histogram for the MPG_City variable in the Sashelp.Cars data set. The histogram has 11 bins. The OUTHIST= option writes the counts for each bin to a SAS data set:
proc univariate data=Sashelp.Cars noprint;
   var MPG_City;
   histogram MPG_City / barlabel=count outhist=MidPtOut;
run;

proc print data=MidPtOut label;
   label _MIDPT_ = "Midpoint" _COUNT_ = "Frequency";
   var _MIDPT_ _COUNT_;
run;
As I’ve previously discussed, PROC UNIVARIATE supports two options for specifying the locations of bins. The MIDPOINTS option specifies that “nice” numbers (for example, multiples of 2, 5, or 10) are used for the midpoints of the bins; the ENDPOINTS option specifies that nice numbers are used for the endpoints of the bins. By default, midpoints are used, as shown in the previous section. The following call to PROC UNIVARIATE uses the ENDPOINTS option and writes the new bin counts to a data set. The histogram is not shown.
proc univariate data=Sashelp.Cars noprint;
   var MPG_City;
   histogram MPG_City / barlabel=count endpoints outhist=EndPtOut;
run;

/* the LABEL option is required for PROC PRINT to display the labels */
proc print data=EndPtOut label;
   label _MINPT_ = "Left Endpoint" _COUNT_ = "Frequency";
   var _MINPT_ _COUNT_;
run;
If you want to “manually” count the number of observations in each bin, you have a few choices. If you already know the bin width and anchor position for the bins, then you can use a DATA step array to accumulate the counts. You can also use PROC FORMAT to define a format to bin the observations and use PROC FREQ to tabulate the counts.
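As a sketch of the PROC FORMAT and PROC FREQ approach, suppose the bins are anchored at 10 and have width 5, and that the data lie in [10, 60]. Those values, and the format name MPGBin, are assumptions chosen for illustration with the MPG_City variable.

```sas
/* Sketch: bin MPG_City by using a user-defined format.             */
/* Anchor (10), width (5), and the range [10, 60] are assumptions.  */
proc format;
   value MPGBin
      10 -< 15 = "[10, 15)"
      15 -< 20 = "[15, 20)"
      20 -< 25 = "[20, 25)"
      25 -< 30 = "[25, 30)"
      30 -< 35 = "[30, 35)"
      35 -< 40 = "[35, 40)"
      40 -< 45 = "[40, 45)"
      45 -< 50 = "[45, 50)"
      50 -< 55 = "[50, 55)"
      55 -  60 = "[55, 60]";   /* last interval is closed on the right */
run;

/* PROC FREQ tabulates the formatted values, so each bin is one row */
proc freq data=Sashelp.Cars;
   format MPG_City MPGBin.;
   tables MPG_City / nocum nopercent;
run;
```

The format is applied only for the duration of the PROC FREQ step, so the data set itself is not modified. Bins that contain no observations do not appear in the PROC FREQ table.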
The harder problem is when you do not have a prior set of “nice” values to use as the endpoints of bins. It is usually not satisfactory to use the minimum and maximum data values as endpoints of the binning intervals because that might result in intervals whose endpoints are long decimal values such as [3.4546667 4.0108333].
Fortunately, the SAS/IML language provides the GSCALE subroutine, which computes “nice” values from a vector of data and the number of bins. The GSCALE routine returns a three-element vector. The first element is the minimum value of the leftmost interval, the second element is the maximum value of the rightmost interval, and the third element is the bin width. For example, the following SAS/IML statements compute nice intervals for the data in the MPG_City variable:
proc iml;
use Sashelp.Cars;
read all var "MPG_City" into X;
close;
/* GSCALE subroutine computes "nice" tick values: s[1]<=min(x); s[2]>=max(x) */
call gscale(s, x, 10);   /* ask for about 10 intervals */
print s[rowname={"Start" "Stop" "Increment"}];
The output from the GSCALE subroutine suggests that a good set of intervals to use for binning the data is [10, 15), [15, 20), …, [55, 60]. These are the same endpoints that are generated by using the ENDPOINTS option in PROC UNIVARIATE. (Actually, the procedure uses half-open intervals for all bins, so it adds the extra interval [60, 65) to the histogram.)
I’ve previously shown how to use the BIN and TABULATE functions in SAS/IML to count the observations in a set of bins. The following statements use the values from the GSCALE routine to form evenly spaced cutpoints for the binning:
cutPoints = do(s[1], s[2], s[3]);          /* use "nice" cutpoints from GSCALE */
*cutPoints = do(s[1], s[2]+s[3], s[3]);    /* ALTERNATIVE: add additional cutpoint to match UNIVARIATE */
b = bin(x, cutPoints);                     /* find bin for each obs */
call tabulate(bins, freq, b);              /* count how many obs in each bin */
binLabels = char(cutPoints[bins]);         /* use left endpoint as labels for bins */
print freq[colname = binLabels label="Count"];
Except for the last interval, the counts are the same as for the ENDPOINTS option in PROC UNIVARIATE. It is a matter of personal preference whether you want to treat the last interval as a closed interval or whether you want all intervals to be half open. If you want to exactly match PROC UNIVARIATE, you can modify the definition of the cutPoints variable, as indicated in the program comments.
Notice that the TABULATE routine only reports the bins that have nonzero counts. If you prefer to obtain counts for ALL bins—even bins with zero counts—you can use the TabulateLevels module, which I described in a previous blog post.
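If you do not have the TabulateLevels module at hand, one possible alternative (a sketch, not the module itself) is to allocate a vector of zeros with one element per bin and copy the nonzero counts into the positions that TABULATE reports. The names numBins, allCounts, and labels are introduced here for illustration.

```sas
proc iml;
use Sashelp.Cars;
read all var "MPG_City" into x;
close Sashelp.Cars;

call gscale(s, x, 10);                 /* "nice" start, stop, and increment */
cutPoints = do(s[1], s[2], s[3]);
b = bin(x, cutPoints);                 /* bin number for each observation   */
call tabulate(bins, freq, b);          /* only the nonempty bins            */

/* expand to ALL bins: start with zeros, then fill in the nonzero counts */
numBins = ncol(cutPoints) - 1;
allCounts = j(1, numBins, 0);
allCounts[bins] = freq;

labels = char(cutPoints[1:numBins]);   /* left endpoint of each bin */
print allCounts[colname=labels label="Count (all bins)"];
quit;
```

This sketch assumes that every observation falls inside the cutpoints, which holds when the cutpoints come from GSCALE and the data have no missing values.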
In summary, you can use PROC UNIVARIATE or SAS/IML to create a tabular representation of a histogram. Both procedures provide a way to obtain “nice” values for the bin endpoints. If you already know the endpoints for the bins, you can use other techniques in SAS to produce the table.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
It might snow this weekend here at the SAS headquarters! This would be the first snow of the season for us, and it got me thinking about snow. Apparently these thoughts have manifested themselves in my computer graphics work … in the form of a snow animation. Follow along, and […]
The post Let it snow, let it snow, let it snow! appeared first on SAS Learning Post.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
Find out how to kick-start your volunteering and leadership skills at SAS Global Forum.
The post Take your leadership skills to the next level at SAS Global Forum appeared first on SAS Learning Post.