This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post. |
A question recently popped up in a discussion forum about creating table plots using SAS. So I thought I’d dabble in this topic, and see what I could come up with. If you’re interested in tableplots, or American Community Survey (ACS) data, or data visualization, I invite you to follow […]
The post Tableplots and ACS data appeared first on SAS Learning Post.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post. |
The post How to use SAS to read a range of cells from Excel appeared first on The SAS Dummy.
]]>This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post. |
I’ve said it before: spreadsheets are not databases. However, many of us use spreadsheets as if they were databases, and then we struggle when the spreadsheet layout does not support database-style rigor of predictable rows, columns, and variable types — the basic elements we need for analytics and reporting. If you’re using SAS to read data from Microsoft Excel, what can you do when the data you need doesn’t begin at cell A1?
By design, SAS can read data from any range of cells in your spreadsheet. In this article, I’ll describe how to use the RANGE statement in PROC IMPORT to get the data you need.
With SAS 9.4 and later, SAS recommends using DBMS=XLSX for the most flexibility. It works on all operating systems without the need for additional components like the PC Files Server. Your Excel file must be in the Excel 2007-or-later format (XLSX). You do need a licence for SAS/ACCESS to PC Files. (Just learning? These DBMS=XLSX techniques also work in SAS University Edition.)
If your Excel data does not begin in cell A1 (the default start point for an import process), then you can add a RANGE= value that includes the specific cells. The easiest method is to use a Named Range in Excel to define the exact boundaries of the data.
To define a named range in Excel, highlight the range of cells to include and simply type the new name of the range in the Name Box:
Then save the Excel file.
Then to import into SAS, specify that range name in the RANGE= option:
proc import datafile="/myprojects/myfile.xlsx" out=mydata replace; range="myspecialrange"; run; |
What if you don’t know the range ahead of time? You can use PROC IMPORT to read the entire sheet, but the result will not have the column headers and types you want. Consider a sheet like this:
This code will read it:
proc import datafile="/myprojects/middle.xlsx" out=mid dbms=xlsx replace; run; |
But the result will contain many empty cells, and the values will be read as all character types:
With additional coding, you can “fix” this result in another pass using DATA step. Or, if you’re willing to add the RANGE option with the Excel notation for the specific cell ranges, you can read it properly in the first pass:
proc import datafile="/myprojects/middle.xlsx" out=mid dbms=xlsx replace; range="Sheet1$E7:K17" ; run; |
You can also use LIBNAME XLSX to read entire sheets from Excel, or simply as a discovery step to see what sheets the Excel file contains before you run PROC IMPORT. However, LIBNAME XLSX does not show the Excel named ranges.
On SAS for Windows systems, you can use LIBNAME EXCEL (32-bit) or LIBNAME PCFILES (64-bit) to reveal a little more information about the Excel file.
libname d pcfiles path="c:\myprojects\middle.xlsx"; proc datasets lib=d; quit; /* always clear the libname, as it locks the file */ libname d clear; |
Note that DBMS=XLSX does not support some of the options we see in the legacy DBMS=XLS (which supports only old-format XLS files), such as STARTROW and NAMEROW. DBMS=XLSX does support GETNAMES (treats the first record of the sheet or range as the variable names). See the full reference for Excel file import/export in the SAS documentation.
The post How to use SAS to read a range of cells from Excel appeared first on The SAS Dummy.
This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post. |
The post The bootstrap method in SAS: A t test example appeared first on The DO Loop.
]]>This post was kindly contributed by The DO Loop - go there to comment and to read the full post. |
A previous article provides an example of using the BOOTSTRAP statement in PROC TTEST to compute bootstrap estimates of statistics in a two-sample t test. The BOOTSTRAP statement is new in SAS/STAT 14.3 (SAS 9.4M5). However, you can perform the same bootstrap analysis in earlier releases of SAS by using procedures in Base SAS and SAS/STAT. This article gives an example of how to bootstrap in SAS.
A previous article describes how to construct a bootstrap confidence interval in SAS.
The major steps of a bootstrap analysis follow:
This article uses the same bootstrap example as the previous article.
The following SAS DATA step subsets the Sashelp.Cars data to create a data set that contains two groups: SUV” and “Sedan”. There are 60 SUVs and 262 sedans.
The statistic of interest is the difference of means between the two groups.
A call to PROC TTEST computes the difference between group means for the data:
data Sample; /* create the sample data. The two groups are "SUV" and "Sedan" */ set Sashelp.Cars(keep=Type MPG_City); if Type in ('Sedan' 'SUV'); run; /* 1. Compute statistic (difference of means) for data */ proc ttest data=Sample; class Type; var MPG_City; ods output Statistics=SampleStats; /* save statistic in SAS data set */ run; /* 1b. OPTIONAL: Store sample statistic in a macro variable for later use */ proc sql noprint; select Mean into :Statistic from SampleStats where Method="Satterthwaite"; quit; %put &=Statistic; |
STATISTIC= -4.9840 |
The point estimate for the difference of means between groups is -4.98. The TTEST procedure produces a graph (not shown) that indicates that the MPG_City variable is moderately skewed for the “Sedan” group. Therefore you might question the usefulness of the classical parametric estimates for the standard error and confidence interval for the difference of means. The following bootstrap analysis provides a nonparametric estimate about the accuracy of the difference of means.
For many resampling schemes, PROC SURVEYSELECT is the simplest way to generate bootstrap samples. The documentation for PROC TTEST states,
“In a bootstrap for a two-sample design, random draws of size n1 and n2
are taken with replacement from the first and second groups, respectively, and combined to produce a single bootstrap sample.” One way to carry out this sampling scheme is to use the STRATA statement in PROC SURVEYSELECT to sample (with replacement) from the “SUV” and “Sedan” groups. To perform stratified sampling,
sort the data by the STRATA variable.
The following statements sort the data and generate 10,000 bootstrap samples by drawing random samples (with replacement) from each group:
/* 2. Sample with replacement from each stratum. First sort by the STRATA variable. */ proc sort data=Sample; by Type; run; /* Then perform stratified sampling with replacement */ proc surveyselect data=Sample out=BootSamples noprint seed=123 method=urs /* with replacement */ /* OUTHITS */ /* use OUTHITS option when you do not want a frequency variable */ samprate=1 reps=10000; /* 10,000 resamples */ strata Type; /* sample N1 from first group and N2 from second */ run; |
The BootSamples data set contains 10,000 random resamples. Each sample contains 60 SUVs and 262 sedans, just like the original data. The BootSamples data contains a variable named NumberHits that contains the frequency with which each original observation appears in the resample. If you prefer to use duplicated observations, specify the OUTHITS option in the PROC SURVEYSELECT statement. The different samples are identified by the values of the Replicate variable.
Recall that a BY-group analysis is an efficient way to process 10,000 bootstrap samples. Recall also that it is efficient to suppress output when you perform a large BY-group analysis. The following macros encapsulate the commands that suppress ODS objects prior to a simulation or bootstrap analysis and then permit the objects to appear after the analysis is complete:
/* Define useful macros */ %macro ODSOff(); /* Call prior to BY-group processing */ ods graphics off; ods exclude all; ods noresults; %mend; %macro ODSOn(); /* Call after BY-group processing */ ods graphics on; ods exclude none; ods results; %mend; |
With these definitions, the following call to PROC TTEST computes the Satterthwaite test statistic for each bootstrap sample. Notice that you need to sort the data by the Replicate variable because the BootSamples data are ordered by the values of the Type variable. Note also that the NumberHits variable is used as a FREQ variable.
/* 3. Compute statistics */ proc sort data = BootSamples; by Replicate Type; run; %ODSOff /* suppress output */ proc ttest data=BootSamples; by Replicate; class Type; var MPG_City; freq NumberHits; /* Use FREQ variable in analysis (or use OUTHITS option) */ ods output ConfLimits=BootDist(where=(method="Satterthwaite") keep=Replicate Variable Class Method Mean rename=(Mean=DiffMeans)); run; %ODSOn /* enable output */ |
At this point in the bootstrap example, the data set BootDist contains the bootstrap distribution in the variable DiffMeans. You can use this variable to compute various bootstrap statistics. For example, the bootstrap estimate of the standard error is the standard deviation of the DiffMeans variable. The estimate of bias is the difference between the mean of the bootstrap estimates and the original statistic. The percentiles of the DiffMeans variable can be used to construct a confidence interval. (Or you can use a different interval estimate, such as the bias-adjusted and corrected interval.) You might also want to graph the bootstrap distribution. The following statements use PROC UNIVARIATE to compute these estimates:
/* 4. Plot sampling distribution of difference of sample means. Write stats to BootStats data set */ proc univariate data=BootDist; /* use NOPRINT option to suppress output and graphs */ var DiffMeans; histogram DiffMeans; /* OPTIONAL */ output out=BootStats pctlpts =2.5 97.5 pctlname=P025 P975 pctlpre =Mean_ mean=BootMean std=BootStdErr; run; /* use original sample statistic to compute bias */ data BootStats; set BootStats; Bias = BootMean - &Statistic; label Mean_P025="Lower 95% CL" Mean_P975="Upper 95% CL"; run; proc print data=BootStats noobs; var BootMean BootStdErr Bias Mean_P025 Mean_P975; run; |
The results are shown. The bootstrap distribution appears to be normally distributed. This indicates that the bootstrap estimates will probably be similar to the classical parametric estimates. For this problem, the classical estimate of the standard error is 0.448 and a 95% confidence interval for the difference of means is [-5.87, -4.10]. In comparison, the bootstrap estimates are 0.444 and [-5.87, -4.13]. In spite of the skewness of the MPG_City variable for the “Sedan” group, the two-sample Satterthwaite t provides similar estimates regarding the accuracy of the point estimate for the difference of means.
The bootstrap statistics also are similar to the statistics that you can obtain by using the BOOTSTRAP statement in PROC TTEST in SAS/STAT 14.3.
In summary, you can use Base SAS and SAS/STAT procedures to compute a bootstrap analysis of a two-sample t test.
Although the “manual” bootstrap requires more programming effort than using the BOOTSTRAP statement in PROC TTEST, the example in this article generalizes to other statistics for which a built-in bootstrap option is not supported. This article also shows how to use PROC SURVEYSELECT to perform stratified sampling as part of a bootstrap analysis that involves sampling from multiple groups.
The post The bootstrap method in SAS: A t test example appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post. |
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post. |
When making a new piece of code, I like to use the smallest font I can read. This lets me fit more text on the screen at once. When presenting code to others, especially in a classroom setting, I like to make the font large enough to see from the back of the room. Here’s how I change font size in SAS in our three programming interfaces.
The post Changing font size in SAS appeared first on SAS Learning Post.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post. |
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post. |
Ever since the Moneyball book & movie came out, athletes have been scrambling to use data and analytics to gain a competitive advantage. One of my favorite sports is boat racing – the ones you paddle. Follow along as I lead you through some maps and graphs I created for […]
The post SAS analytics for the Gorge Downwind Champs race appeared first on SAS Learning Post.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post. |
The post The BOOTSTRAP statement for t tests in SAS appeared first on The DO Loop.
]]>This post was kindly contributed by The DO Loop - go there to comment and to read the full post. |
Bootstrap resampling is a powerful way to estimate the standard error for a statistic without making any parametric assumptions about its sampling distribution. The bootstrap method is often implemented by using a sequence of calls to resample from the data, compute a statistic on each sample, and analyze the bootstrap distribution. An example is provided in the article “Compute a bootstrap confidence interval in SAS.”
This process can be lengthy and in Base SAS it requires reading and writing a large amount of data. In SAS/STAT 14.3 (SAS 9.4m5), the TTEST procedure supports the BOOTSTRAP statement, which automatically performs a bootstrap analysis of one-sample and two-sample t tests. The BOOTSTRAP statement also applies to two-sample paired tests.
The BOOTSTRAP statement makes it easy to obtain bootstrap estimates of bias and standard error for a statistic and confidence intervals (CIs) for the underlying parameter. The BOOTSTRAP statement supports
several estimates for the confidence intervals, including normal-based intervals, t-based intervals, percentile intervals, and bias-adjusted intervals.
This section shows how to obtain bootstrap estimates for a two-sample t test. The statistic of interest is the difference between the means of two groups.
The following SAS DATA step subsets the Sashelp.Cars data to create a data set that contains only two types of vehicles: sedans and SUVs. A call to PROC UNIVARIATE displays a comparative histogram that shows the distributions of the MPG_City variable for each group. The MPG_City variable measures the fuel efficiency (in miles per gallon) for each vehicle during typical city driving.
/* create data set that has two categories: 'Sedan' and 'SUV' */ data Sample; set Sashelp.Cars(keep=Type MPG_City); if Type in ('Sedan' 'SUV'); run; proc univariate data=Sample; class Type; histogram MPG_City; inset N Mean Std Skew Kurtosis / position=NE; ods select histogram; run; |
Suppose that you want to test whether the mean MPG of the “SUV” group is significantly different from the mean of the “Sedan” group. The groups appear to have different variances, so you would probably choose the Satterthwaite version of the t test, which accommodates different variances. You can use PROC TTEST to run a two-sample t test for these data, but in looking at the distributions of the groups, you might be concerned that the normality assumptions for the t test are not satisfied by these data. Notice that the distribution of the MPG_City variable for the “Sedan” group has high skewness (1.3) and moderately high kurtosis (1.9). Although the t test is somewhat robust to the normality assumption, you might want to use the bootstrap method to estimate the standard error and confidence interval for the difference of means between the two groups.
If you are using SAS/STAT 14.3, you can compute bootstrap estimates for a t test by using the BOOTSTRAP statement, as follows:
title "Bootstrap Estimates with Percentile CI"; proc ttest data=Sample; class Type; var MPG_City; bootstrap / seed=123 nsamples=10000 bootci=percentile; /* or BOOTCI=BC */ run; |
The BOOTSTRAP statement supports three options:
The TTEST procedure produces several tables and graphs, but I have highlighted a few statistics in two tables. The top table is the “ConfLimits” table, which is based on the data and shows the traditional statistics for the t test. The estimate for the difference in means between the “SUV” and “Sedan” groups is -4.98 and is highlighted in blue. The traditional (parametric) estimate for a 95% confidence interval is highlighted in red. The interval is [-5.87, -4.10], which does not contain 0, therefore you can conclude that the group means are significantly different at the 0.05 significance level.
The lower table is the “Bootstrap” table, which is based on the bootstrap resamples.
The TTEST documentation explains the resampling process and the computation of the bootstrap statistics.
The top row of the table shows estimates for the difference of means. The bootstrap estimate for the standard error is 0.45. The estimate of bias (which subtracts the average bootstrap statistic from the sample statistic) is -0.01, which is small. The percentile estimate for the confidence interval is [-5.87, -4.14], which is similar to the parametric interval estimate in the top table. (For comparison, the bias-adjusted CI is also similar: [-5.85, -4.12].) Every cell in this table will change if you change the SEED= or NSAMPLES= options because the values in this table are based on the bootstrap samples.
Although the difference of means is the most frequent statistic to bootstrap, you can see from the lower table that the BOOTSTRAP statement also estimates the standard error, bias, and confidence interval for the standard deviation of the difference. Although this article focuses on the two-sample t test, the BOOTSTRAP statement also applies to one sample t tests.
In summary, the BOOTSTRAP statement in PROC TTEST in SAS/STAT 14.3 makes it easy to obtain bootstrap estimates for statistics in one-sample or two-sample t tests (and paired t tests). By using the BOOTSTRAP statement, the manual three-step bootstrap process (resample, compute statistics, and summarize) is reduced to a zero-step process. The TTEST procedure handles the details for you.
The post The BOOTSTRAP statement for t tests in SAS appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post. |
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post. |
I recently read an interesting article that claims “a single cremation emits as much carbon dioxide as a 1,000-mile car trip.” This got me wondering about cremation data, and I ended up on the Wikipedia page about cremation rates. They had a map of the US cremation rates by state … but the more […]
The post Cremation rates in the US, by state appeared first on SAS Learning Post.
This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post. |
The post Video: New random number generators in SAS appeared first on The DO Loop.
]]>This post was kindly contributed by The DO Loop - go there to comment and to read the full post. |
My 2018 SAS Global Forum paper was about “how to use the random-number generators (RNGs) in SAS.”
You can read the paper for details, but I recently recorded a short video that summarizes the main ideas in the paper. In particular, the video gives an overview of the new RNGs in SAS, which include the following:
If your browser does not support embedded video, you can go directly to the video on YouTube.
The following references provide more information about the random number generators in SAS:
The post Video: New random number generators in SAS appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post. |
The post Attrs, attrs, everywhere: The interaction between ATTRPRIORITY, CYCLEATTRS, and STYLEATTRS in ODS graphics appeared first on The DO Loop.
]]>This post was kindly contributed by The DO Loop - go there to comment and to read the full post. |
If you use PROC SGPLOT to create ODS graphics, “ATTRS” are everywhere. ATTRS is an abbreviation of “attributes.” Most options that change the attributes of a graphical element end with the ATTRS suffix. For example, the MARKERATTRS option modifies attributes of markers, the LINEATTRS option modifies attributes of lines, and the FILLATTRS option modifies attributes of the filled area of a bar or other region. These options are easy to remember and use.
However, there are three “ATTRS” that you might find more confusing: CYCLEATTRS, ATTRPRIORITY, and STYLEATTRS.
These options determine the colors, line patterns, and marker symbols that are used to represent groups in your data. They interact with each other when you use a GROUP= option to specify groups and also use multiple statements to overlay several graph types such as scatter plots and series plots.
This article summarizes the ATTRPRIORITY, CYCLEATTRS, and STYLEATTRS keywords and provides an example that shows how they interact with each other.
The end of this article presents a list of references for further reading.
In PROC SGPLOT, many statements support the GROUP= option. In order to distinguish one group from another, the groups are assigned different attributes such as colors, line patterns, marker symbols, and so on. The colors, patterns, and other attributes are defined by the current ODS style (or by the STYLEATTRS statement or the DATTRMAP= data set), but the ATTRPRIORITY and CYCLEATTRS keywords determine how the colors, patterns, and symbols combine for each group.
The following describes the syntax and purpose of each keyword:
Before looking at the interaction between these keywords, recall how the ATTRPRIORITY= option affects a graph that has only one statement. The following calls to PROC SGPLOT are identical. The only difference is the ATTRPRIORITY= option.
title "Iris Data: AttrPriority=Color"; ods graphics / AttrPriority=Color; proc sgplot data=Sashelp.Iris; reg x=PetalWidth y=SepalLength / group=Species markerattrs=(size=10) lineattrs=(thickness=4); run; title "Iris Data: AttrPriority=None"; ods graphics / AttrPriority=None; proc sgplot data=Sashelp.Iris; reg x=PetalWidth y=SepalLength / group=Species markerattrs=(size=10) lineattrs=(thickness=4); run; |
The plot on the left shows the result of using ATTRPRIORITY=COLOR with the HtmlBlue style. The three groups are represented by using the first three colors in the style (in this case, blue, red, and green), but the line patterns and marker symbols do not change between groups. In contrast, the plot on the left is the result of using ATTRPRIORITY=NONE. The three groups are represented by using the first three colors in the style and also the first three line patterns (solid, dashed, and dash-dot) and the first three marker symbols (circle, plus sign, and “X”). The results might depend on the current ODS style.
The ATTRPRIORITY and CYCLEATTRS options interact when you use two or more statements to overlay graphs.
In Warren Kuhfeld’s short course on “Advanced ODS Graphics Examples in SAS,” he presents a slide that shows how these options interact. The following example creates a panel of graphs that is inspired by Warren’s slide. Each graph in the example shows the same data (the closing stock price for two companies in 2001–2003) displayed for each of the four combinations of the ATTRPRIORITY and CYCLEATTRS options. Although the LOESS statement can display the markers, I used a separate SCATTER statement because I want to illustrate how the CYCLEATTRS statement affects the attributes.
data Stocks; set Sashelp.Stocks; where '01JAN2001'd <= Date <= '31OCT2003'd and Stock in ("IBM", "Microsoft"); run; /* The code for each call is the same except for the options. Wrap in a macro for brevity. */ %macro GraphIt(priority, cycle); title "AttrPriority=&priority.; &cycle."; ods graphics / AttrPriority=&priority.; proc sgplot data=Stocks &cycle.; loess x=Date y=Close / group=Stock lineattrs=(thickness=4) NoMarkers smooth=0.75; scatter x=Date y=Close / group=Stock markerattrs=(size=10); yaxis label="Stock Price"; run; %mend; ods graphics / width=400px height=300px; ods layout gridded columns=2 advance=table column_gutter=0 row_gutter=0; %GraphIt(Color, NoCycleAttrs); %GraphIt(None, NoCycleAttrs); %GraphIt(Color, CycleAttrs); %GraphIt(None, CycleAttrs); ods layout end; |
The panel for the example shows the interaction between the ATTRPRIORITY= and CYCLEATTRS options.
Depending on your needs, either graph in the first row would be appropriate for color graphs. The lower-right plot is most suitable for ODS styles that create monochrome graphs.
After you understand the interaction between the ATTRPRIORITY= and CYCLEATTRS options, it is straightforward to use the STYLEATTRS statement. The statement merely overrides the group attributes for the current style. If you specify fewer attributes than the number of groups, the attributes are cyclically reused. For example, the following STYLEATTRS statement specifies that all lines should be solid but specifies different colors and symbols for the groups. Because two attributes are specified, the second statement reuses the same colors and symbols even though the NOCYCLEATTRS option is set:
title "AttrPriority=None; NoCycleAttrs; StyleAttrs"; ods graphics / AttrPriority=None; proc sgplot data=Stocks NoCycleAttrs; styleattrs datalinepatterns=(solid) datacontrastcolors=(SteelBlue DarkGreen) datasymbols=(CircleFilled TriangleFilled); loess x=Date y=Close / group=Stock lineattrs=(thickness=4) NoMarkers smooth=0.75; scatter x=Date y=Close / group=Stock markerattrs=(size=10); yaxis label="Stock Price"; run; |
Here’s another trick I learned from Warren’s course: You can use the RESET= option to reset the default value of the ATTRPRIORITY option for the current style:
ods graphics / reset=attrpriority; /* reset the default value for the current style */ |
This article provides an example and summarizes the behavior of the ATTRPRIORITY= option (on the ODS GRAPHICS statement) and the CYCLEATTRS option (on the PROC SGPLOT statement). These options interact when you have several statements in PROC SGPLOT that each use the GROUP= option. You can use the STYLEATTRS statement to override the default attributes for the current ODS style.
Much more can be said about these options and how they interact. In addition to the documentation links in the article, I recommend the following:
The post Attrs, attrs, everywhere: The interaction between ATTRPRIORITY, CYCLEATTRS, and STYLEATTRS in ODS graphics appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post. |
The post 6 ways to use the _NULL_ data set in SAS appeared first on The DO Loop.
]]>This post was kindly contributed by The DO Loop - go there to comment and to read the full post. |
In SAS, the reserved keyword _NULL_ specifies a SAS data set that has no observations and no variables. When you specify _NULL_ as the name of an output data set, the output is not written. The _NULL_ data set is often used when you want to execute DATA step code that displays a result, defines a macro variable, writes a text file, or makes calls to the EXECUTE subroutine. In those cases, you are interested in the “side effect” of the DATA step and rarely want to write a data set to disk.
This article presents six ways to use the _NULL_ data set. Because the _NULL_ keyword is used, no data set is created on disk.
You can compute a quantity in a DATA _NULL_ step and then use the PUT statement to output the answer to the SAS log. For example, the following DATA step evaluates the normal density function at x-0.5 when μ=1 and σ=2. The computation is performed twice: first using the built-in PDF function and again by using the formula for the normal density function. The SAS log shows that the answer is 0.193 in both cases.
data _NULL_; mu = 1; sigma = 2; x = 0.5; pdf = pdf("Normal", x, mu, sigma); y = exp(-(x-mu)**2 / (2*sigma**2)) / sqrt(2*constant('pi')*sigma**2); put (pdf y) (=5.3); run; |
pdf=0.193 y=0.193 |
You can use a null DATA step to display characteristics of a data set.
For example, the following DATA step uses the PUT statement to display the number of numeric and character variables in the Sashelp.Class data set. No data set is created.
data _NULL_; set Sashelp.Class; array char[*} $ _CHAR_; array num[*} _NUMERIC_; nCharVar = dim(char); nNumerVar = dim(num); put "Sashelp.Class: " nCharVar= nNumerVar= ; stop; /* stop processing after first observation */ run; |
Sashelp.Class: nCharVar=2 nNumerVar=3 |
You can also store these values in a macro variable, as shown in the next section.
You can use the SYMPUT or SYMPUTX subroutines to create a SAS macro variable from a value in a SAS data set. For example, suppose you run a SAS procedure that computes some statistic in a table. Sometimes the procedure supports an option to create an output data that contains the statistic. Other times you might need to use the ODS OUTPUT statement to write the table to a SAS data set. Regardless of how the statistic gets in a data set, you can use a DATA _NULL_ step to read the data set and store the value as a macro variable.
The following statements illustrate this technique. PROC MEANS creates a table called Summary, which contains the means of all numerical variables in the Sashelp.Class data. The ODS OUTPUT statement writes the Summary table to a SAS data set called Means. The DATA _NULL_ step finds the row for the Height variable and creates a macro variable called MeanHeight that contains the statistic. You can use that macro variable in subsequent steps of your analysis.
proc means data=Sashelp.Class mean stackods; ods output Summary = Means; run; data _NULL_; set Means; /* use PROC CONTENTS to determine the columns are named Variable and Mean */ if Variable="Height" then call symputx("MeanHeight", Mean); run; %put &=MeanHeight; |
MEANHEIGHT=62.336842105 |
For a second example, see the article “What is a factoid in SAS,” which shows how to perform the same technique with a factoid table.
Sometimes there is no procedure that computes the quantity that you want, or you prefer to compute the quantity yourself. The following DATA _NULL_ step counts the number of complete cases for the numerical variables in the Sashelp.Heart data. It then displays the number of complete cases and the percent of complete cases in the data. You can obtain the same results if you use PROC MI and look at the MissPattern table.
data _NULL_; set Sashelp.Heart end=eof nobs=nobs; NumCompleteCases + (nmiss(of _NUMERIC_) = 0); /* increment if all variables are nonmissing */ if eof then do; /* when all observations have been read ... */ PctComplete = NumCompleteCases / nobs; /* ... find the percentage */ put NumCompleteCases= PctComplete= PERCENT7.1; end; run; |
NumCompleteCases=864 PctComplete=16.6% |
This is a favorite technique of Warren Kuhfeld, who is a master of writing a DATA _NULL_ step that modifies an ODS template. In fact, this technique is at the heart of the %MODSTYLE macro and the SAS macros that modify the Kaplan-Meier survival plot.
Although I am not as proficient as Warren, I wrote a blog post that introduces this template modification technique. The DATA _NULL_ step is used to modify an ODS template. It then uses CALL EXECUTE to run PROC TEMPLATE to compile the modified template.
All the previous tips use _NULL_ as the name of a data set that is not written to disk. It is a curious fact that you can use the _NULL_ data set in almost every SAS statement that expects a data set name!
For example,
you can read from the _NULL_ data set. Although reading zero observations is not always useful, one application is to check the syntax of your SAS code. Another application is to check whether a procedure is installed on your system. For example, you can run the statements PROC ARIMA data=_NULL_; quit; to check whether you have access to the ARIMA procedure.
A third application is to use _NULL_ to suppress debugging output. During the development and debugging phase of your development, you might want to use PROC PRINT, PROC CONTENTS, and PROC MEANS to ensure that your program is working as intended. However, too much output can be a distraction, so sometimes I direct the debugging output to the _NULL_ data set where, of course, it magically vanishes!
For example, the following DATA step subsets the Sashelp.Cars data. I might be unsure as to whether I created the subset correctly. If so, I can use PROC CONTENTS and PROC MEANS to display information about the subset, as follows:
data Cars; set Sashelp.Cars(keep=Type _NUMERIC_); if Type in ('Sedan', 'Sports', 'SUV', 'Truck'); /* subsetting IF statement */ run; /* FOR DEBUGGING ONLY */ %let DebugName = Cars; /* use _NULL_ to turn off debugging output */ proc contents data=&DebugName short; run; proc means data=&DebugName N Min Max; run; |
If I don’t want to this output (but I want the option to see it again later), I can modify the DebugName macro
(%let DebugName = _NULL_;) so that the CONTENTS and MEANS procedures do not produce any output. If I do that and rerun the program, the program does not create any debugging output. However, I can easily restore the debugging output whenever I want.
In summary, the _NULL_ data set name is a valuable tool for SAS programmers. You can perform computations, create macro variables, and manipulate text files without creating a data set on disk. Although I didn’t cover it in this article, you can use DATA _NULL_ in conjunction with ODS for creating customized tables and reports.
What is your favorite application of using the _NULL_ data set? Leave a comment.
The post 6 ways to use the _NULL_ data set in SAS appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post. |