This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
Motivating students is one of the major challenges teachers and student advocates face on a daily basis, and encouraging students to be interested in analytics is a whole other mountain to climb, or is it? What motivates students? Often we make assumptions that students are not motivated or are not […]
The post Using SAS to Help Girl Scouts Grow Cookie Sales! appeared first on SAS Learning Post.
The post Simulate data from the beta-binomial distribution in SAS appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post.
This article shows how to simulate beta-binomial data in SAS and how to compute the density function (PDF).
The beta-binomial distribution is a discrete compound distribution. The “binomial” part of the name means that the discrete random variable X follows a binomial distribution with parameters N (number of trials) and p, but there is a twist: The parameter p is not a constant value but is a random variable that follows the Beta(a, b) distribution.
The beta-binomial distribution is used to model count data where the counts are “almost binomial” but have more variance than can be explained by a binomial model. Therefore this article also compares the binomial and beta-binomial distributions.
To generate a random value from the beta-binomial distribution, use a two-step process. The first step is to draw p randomly from the Beta(a, b) distribution. Then you draw x from the binomial distribution Bin(p, N). The beta-binomial distribution is not natively supported by the RAND function in SAS, but you can call the RAND function twice to simulate beta-binomial data, as follows:
/* simulate a random sample from the beta-binomial distribution */
%let SampleSize = 1000;
data BetaBin;
a = 6; b = 4; nTrials = 10;               /* parameters */
call streaminit(4321);
do i = 1 to &SampleSize;
   p = rand("Beta", a, b);                /* p[i] ~ Beta(a,b) */
   x = rand("Binomial", p, nTrials);      /* x[i] ~ Bin(p[i], nTrials) */
   output;
end;
keep x;
run;
The result of the simulation is shown in the following bar chart. The expected values are overlaid. The next section shows how to compute the expected values.
The Wikipedia article about the beta-binomial distribution contains a formula for the PDF of the distribution. Since the distribution is discrete, some references prefer to use “PMF” (probability mass function) instead of PDF. Regardless, if X is a random variable that follows the beta-binomial distribution, then the probability that X=x is given by

$$ P(X = x) = \binom{N}{x} \frac{B(x + a,\; N - x + b)}{B(a, b)}, \qquad x = 0, 1, \ldots, N $$
where B is the complete beta function.
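As a quick worked check, take the parameters used throughout this article (a = 6, b = 4, N = 10) and evaluate the largest value, x = 10. Using B(a, b) = Γ(a)Γ(b)/Γ(a + b) with integer arguments, the beta functions reduce to ratios of factorials:

$$
\begin{aligned}
B(6, 4) &= \frac{5!\,3!}{9!} = \frac{1}{504}, \qquad
B(16, 4) = \frac{15!\,3!}{19!} = \frac{1}{15504}, \\
P(X = 10) &= \binom{10}{10}\,\frac{B(10 + 6,\; 0 + 4)}{B(6, 4)} = \frac{504}{15504} \approx 0.0325 .
\end{aligned}
$$

For comparison, a binomial distribution with p = 0.6 gives P(X = 10) = 0.6^10 ≈ 0.0060, so the beta-binomial assigns roughly five times as much probability to this extreme count.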
The binomial coefficients (“N choose x”) and the beta function are defined in terms of factorials and gamma functions, which get big fast. For numerical computations, it is usually more stable to compute the log-transform of the quantities and then exponentiate the result. The following DATA step computes the PDF of the beta-binomial distribution. For easy comparison with the distribution of the simulated data, the DATA step also computes the expected count for each value in a random sample that has the same size as the simulated data (SampleSize = 1000).
The PDF and the simulated data are merged and plotted on the same graph by using the VBARBASIC statement in SAS 9.4M3.
The graph was shown in the previous section.
data PDFBetaBinom;                          /* PMF function */
a = 6; b = 4; nTrials = 10;                 /* parameters */
do x = 0 to nTrials;
   logPMF = lcomb(nTrials, x)
          + logbeta(x + a, nTrials - x + b)
          - logbeta(a, b);
   PMF = exp(logPMF);                       /* probability that X=x */
   EX = &SampleSize * PMF;                  /* expected value in random sample */
   output;
end;
keep x PMF EX;
run;

/* Merge simulated data and PMF. Overlay PMF on data distribution. */
data All;
merge BetaBin PDFBetaBinom(rename=(x=t));
run;

title "The Beta-Binomial Distribution";
title2 "Sample Size = &SampleSize";
proc sgplot data=All;
vbarbasic x / barwidth=1 legendlabel='Simulated Sample';   /* requires SAS 9.4M3 */
scatter x=t y=EX / legendlabel='Expected Value'
        markerattrs=GraphDataDefault(symbol=CIRCLEFILLED size=10);
inset "nTrials = 10" "a = 6" "b = 4" / position=topleft border;
yaxis grid;
xaxis label="x" integer type=linear;        /* force TYPE=LINEAR */
run;
One application of the beta-binomial distribution is to model count data that are approximately binomial but have more variance (“thicker tails”) than the binomial model predicts.
The expected value of a Beta(a, b) distribution is a/(a + b), so let’s compare the betabinomial distribution to the binomial distribution with p = a/(a + b).
The following graph overlays the two PDFs for a = 6, b = 4, and nTrials = 10. The blue distribution is the binomial distribution with p = 6/(6 + 4) = 0.6. The pink distribution is the beta-binomial. You can see that the beta-binomial distribution has a shorter peak and thicker tails than the corresponding binomial distribution. The expected value for both distributions is 6, but the variance of the beta-binomial distribution is greater. Thus you can use the beta-binomial distribution as an alternative to the binomial distribution when the data exhibit greater variance than expected under the binomial model (a phenomenon known as overdispersion).
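To put numbers on “the variance is greater”: the binomial variance is N p (1 - p) = 10(0.6)(0.4) = 2.4, whereas the standard beta-binomial variance N a b (a + b + N) / ((a + b)^2 (a + b + 1)) = 10·6·4·20 / (100·11) ≈ 4.36. The following DATA step is a minimal sketch (the data set names Compare and Moments are hypothetical) that computes both PMFs, using the PDF function for the binomial and the same LCOMB/LOGBETA expression as above for the beta-binomial, and then accumulates each mean and variance directly from its PMF.

```sas
/* Compare binomial and beta-binomial PMFs for a=6, b=4, nTrials=10 */
data Compare;
   a = 6; b = 4; nTrials = 10;            /* same parameters as the article */
   p = a / (a + b);                       /* mean of Beta(a,b) = 0.6 */
   do x = 0 to nTrials;
      BinPMF = pdf("Binomial", x, p, nTrials);
      BBPMF  = exp( lcomb(nTrials, x) + logbeta(x + a, nTrials - x + b)
                    - logbeta(a, b) );
      output;
   end;
   keep x BinPMF BBPMF;
run;

/* E[X] = sum of x*f(x); Var[X] = sum of x**2*f(x) - E[X]**2 */
data Moments;
   set Compare end=last;
   BinMean + x*BinPMF;    BinM2 + x**2*BinPMF;   /* sum statements retain totals */
   BBMean  + x*BBPMF;     BBM2  + x**2*BBPMF;
   if last then do;
      BinVar = BinM2 - BinMean**2;
      BBVar  = BBM2  - BBMean**2;
      output;
   end;
   keep BinMean BinVar BBMean BBVar;
run;

proc print data=Moments noobs;
run;
```

Both means should be 6 (up to rounding), and the variances should be about 2.4 (binomial) and 4.36 (beta-binomial), which matches the closed-form values above.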
The beta-binomial distribution is an example of a compound distribution. You can simulate data from a compound distribution by randomly drawing the parameters from some distribution and then using those random parameters to draw the data. For the beta-binomial distribution, the probability parameter p is drawn from a beta distribution and then used to draw x from a binomial distribution where the probability of success is the value of p. You can use the beta-binomial distribution to model data that have greater variance than expected under the binomial model.
Information Dashboards were the hot topic a few years ago, but the hype seems to have died down lately. A good dashboard is still a very useful way to summarize, analyze, and share data – so I thought I’d revisit the topic, and try to improve an old dashboard. Did […]
The post Does your dashboard measure up? appeared first on SAS Learning Post.
The post Catch runtime errors in SAS/IML programs appeared first on The DO Loop.
Did you know that a SAS/IML function can recover from a runtime error? You can specify how to handle runtime errors by using a programming technique that is similar to the modern “try-catch” technique, although the SAS/IML technique is an older implementation.
In general, SAS/IML programmers should try to detect potential problems and prevent errors from occurring.
For example, before you compute the square root of a number, you should test whether the number is greater than or equal to zero. If not, you can handle the bad input. By testing the input value before you call the SQRT function, you prevent a runtime error.
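A minimal SAS/IML sketch of that defensive test, assuming a hypothetical input vector x that contains one negative value: the LOC function finds the nonnegative elements, and SQRT is called only on those, so the bad element is left as a missing value instead of triggering a runtime error.

```sas
proc iml;
x = {4, -1, 9};                  /* hypothetical input; -1 is invalid for SQRT */
y = j(nrow(x), 1, .);            /* result vector, initialized to missing */
idx = loc(x >= 0);               /* indices of valid (nonnegative) elements */
if ncol(idx) > 0 then
   y[idx] = sqrt(x[idx]);        /* call SQRT only on valid input */
print x y;
```

The NCOL check handles the case where no element is valid, because LOC returns an empty matrix in that case.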
However, sometimes you don’t know whether a computation will fail until you attempt it. Other times, the test to determine whether an error will occur is very expensive, and it is cheaper to attempt the computation and handle the error if it occurs.
A canonical example is computing the inverse of a square matrix. It is difficult to test whether a given matrix is numerically singular. You can compute the rank of the matrix or the condition number of the matrix, but both computations are expensive because they rely on the SVD or eigenvalue decomposition. Cheaper methods include computing a determinant or using Gaussian elimination, but these methods can be numerically unstable and are not good indicators of whether a matrix is numerically singular.
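For contrast, here is a minimal SAS/IML sketch of the “expensive” test: it computes the singular values with CALL SVD and forms the condition number as the ratio of the largest to the smallest singular value. The relative tolerance 1e-12 that decides whether a singular value counts as zero is an arbitrary assumption for illustration, not a SAS default.

```sas
proc iml;
A = {1 1, 1 1};                       /* a singular matrix */
call svd(U, Q, V, A);                 /* Q = singular values, descending order */
/* assumed relative tolerance for treating a singular value as zero */
if min(Q) > 1e-12 * max(Q) then condNum = max(Q) / min(Q);
else condNum = .;                     /* numerically singular */
print Q condNum;
```

The SVD does far more work than INV itself, which is why attempting the inverse and handling a failure can be the cheaper strategy.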
To understand how to handle runtime errors in SAS/IML modules, recall how the execution queue works in SAS/IML modules: when a runtime error occurs in a module, the module pauses and waits for additional commands. However, if there are statements in the execution queue, those statements are executed.
You can use these facts to handle errors. At the top of the module, use the PUSH statement to add error-handling statements to the execution queue. If an error occurs, the module pauses, sees that there are statements in the queue, and executes those statements. If one of the statements contains a RESUME statement, the module resumes execution.
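Let’s apply these ideas to computing the inverse of a square matrix. The following SAS/IML function attempts to compute the inverse of the input matrix. If the matrix is singular, the computation fails and the module handles that error. The main steps of the module are as follows: set a flag that assumes an error will occur, push the error-handling statements into the queue, attempt to compute the inverse, and then, if no error occurred, clear the flag and return the inverse.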
proc iml;
/* If the matrix is not invertible, return a missing value.
   Otherwise, return the inverse. */
start InvEx(A);
   errFlag = 1;            /* set flag. Assume we will get an error */
   on_error = "if errFlag then do; AInv = .; resume; end;";
   call push(on_error);    /* PUSH code that will be executed if an error occurs */
   AInv = inv(A);          /* if error, AInv set to missing and function resumes */
   errFlag = 0;            /* remove flag for normal exit from module */
   return ( AInv );
finish;

A1 = {1 1, 0 1};   B1 = InvEx(A1);
A2 = {1 1, 1 1};   B2 = InvEx(A2);
print B1, B2;
The function is called first with a nonsingular matrix, A1. The INV function does not fail, so the module exits normally. When the module exits, it flushes the execution queue. Because ErrFlag=0, the pushed statements have no effect. B1 is equal to the inverse of A1.
The function is called next with a singular matrix, A2. The INV function encounters an error and the module pauses. When the module pauses, it executes the statements in the queue. Because ErrFlag=1, the statements set AINV to a missing value and resume the module execution. The module exits normally and the B2 value is set to missing.
In summary, you can implement statements that handle runtime errors in SAS/IML modules. The key is to push error-handling code into a queue. The error-handling statements should include a RESUME statement. If the module pauses due to an error, the statements in the queue are executed and the module resumes execution. Be aware that the queue is flushed when the module exits normally, so use a flag variable and IF-THEN logic to control how the statements behave when there is no error.
The post A tip for debugging SAS/IML modules: The PAUSE statement appeared first on The DO Loop.
Debugging is the bane of every programmer. SAS supports a DATA step debugger, but that debugger can’t be used for debugging SAS/IML programs.
In lieu of a formal debugger, many SAS/IML programmers resort to inserting multiple PRINT statements into a function definition. However, there is an easier way to query the values of variables inside a SAS/IML function: Use the PAUSE statement to halt the execution of the program inside the misbehaving function. You can then interactively submit PRINT statements or any other valid statements. When you are finished debugging, you can submit the RESUME statement and the function will resume execution. (Or you can submit the STOP statement to halt execution and return to the main scope of the program.)
The SAS/IML language is interactive.
The PAUSE statement pauses the execution of a SAS/IML module. While the program is paused you can print or assign local variables inside the module. When you are ready for the module to resume execution, you submit the RESUME statement. (The SAS/IML Studio application works slightly differently. The PAUSE statement brings up a dialog box that you can use to submit statements. You press the Resume button to resume execution.)
For example, suppose you write a function called ‘Func’ whose purpose is to compute the sum of squares of the elements in a numeric vector. While testing the function, you discover that the answer is not correct when you pass in a row vector. You decide to insert a PAUSE statement (“set a breakpoint”) near the end of the module, just before the RETURN statement, as follows:
proc iml;
start Func(x);
   y = x`*x;
   /* Debugging tip: Use PAUSE to enter INTERACTIVE mode INSIDE module! */
   pause "Inside 'Func'; submit RESUME to resume computations.";
   /* Execution pauses until you submit RESUME, then continues from the next statement. */
   return (y);
finish;

w = Func( {1 2 3} );
When you run the program, it prints the message “Inside 'Func'; submit RESUME to resume computations.” and then waits for you to enter commands. The program remains paused until you submit the RESUME statement (including the semicolon!).
Because the program has paused inside the function, you can query or set the values of local variables. For this function, there are only two local variables, x and y. Highlight the following statement, and press F3 to submit it.
print y; /* inside module; print local variable */

The output shows that the value of y is a matrix, not a scalar, which indicates that the expression x`*x does not compute the sum of squares for this input vector. You might recall that SAS/IML has a built-in SSQ function, which computes the sum of squares. Highlight the following statement and press F3 to submit it:
y = ssq(x); /* assign local variable inside module */ 
This assignment has overwritten the value of the local variable, y. When you submit a RESUME statement, the function will resume execution and return the value of y. Since this program does not contain a QUIT statement, the procedure will remain active at the main scope. You can therefore print the value of w, as follows:
resume;     /* resume execution. Return to main scope */
print w;    /* print result at main scope */
To make the change permanent, you must edit the function definition and redefine the module. Be sure to remove the PAUSE statement when you are finished debugging: If you run a program in batch mode that contains a PAUSE statement, the program will forever wait for input!
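For example, the redefined module might look like the following sketch, which replaces x`*x with the SSQ function and removes the PAUSE statement:

```sas
proc iml;
start Func(x);
   y = ssq(x);        /* sum of squares of all elements, row or column vector */
   return (y);
finish;

w = Func( {1 2 3} );
print w;              /* 1 + 4 + 9 = 14 */
```

Because SSQ sums the squares of every element regardless of the shape of its argument, the function now returns 14 for both the row vector {1 2 3} and the column vector {1, 2, 3}.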
In conclusion, the PAUSE statement enables you to pause the execution of a program inside a module and interactively query and change the local variables in the module. This can help you to debug a function. After you finish investigating the state of the local variables, you can submit the RESUME statement, which tells the function to continue executing from the line after the PAUSE statement. (Or submit STOP to exit the function.) The PAUSE statement can be a useful alternative to inserting many PRINT statements inside a function during the debugging phase of program development.
Here’s a Proc Print trick for grouped data. Suppose your data is divided into groups, such as males and females. You could sort by the grouping variable before printing, like this: Suppose you want to better emphasize the groups. You could add a BY statement, like this: OK, but, personally, […]
The post Simple Proc Print trick for grouped data appeared first on SAS Learning Post.
One of the great things about living in an area that has seasons is you get to see the leaves change colors in the fall. If you’re a big fan of seeing the leaves at their peak, you could actually travel around the country and see the leaves at their […]
The post When are the fall leaves at their peak? appeared first on SAS Learning Post.
I recently saw an alarming article on social media about an outbreak of airborne plague spreading from Madagascar to Africa (and potentially to the rest of the world). The plague?!? – I thought that only happened hundreds of years ago?!? I don’t really trust news on Facebook, so I went […]
The post Graphing mistakes to avoid … like the plague! appeared first on SAS Learning Post.
The post How to format rows of a table in SAS appeared first on The DO Loop.
A SAS programmer wanted to display a table in which the rows have different formats. An example is shown below. The programmer wanted columns that represent statistics and rows that represent variables. She wanted to display formats (such as DOLLAR) for some variables—but only for certain statistics. For example, the number of nonmissing observations (N) should never use the format of the variable, whereas the minimum, maximum, and mean values should.
In SAS you can easily apply a format to a column, but not to a row.
However, SAS provides two techniques that enable you to control the appearance of cells in tables: PROC TEMPLATE and PROC REPORT.
Both PROC TEMPLATE and PROC REPORT have a learning curve. If you are in a hurry, an alternative solution is to use the DATA step to create a factoid. Recall that a factoid is an ODS table that contains character values and (formatted) numbers in the same column. A factoid can display mixed-type data by converting every value to a string. This conversion process gives you complete control over the format of each cell in the table. Consequently, you can use context to format some numbers differently from other numbers.
Suppose that you want to generate the table shown at the top of this article. You can obtain the statistics from PROC MEANS and then transpose the data. The following statements produce an output data set that contains descriptive statistics for three variables in the Sashelp.Cars data:
proc means data=sashelp.cars noprint;
   var Cylinders EngineSize MSRP;
   output out=stats(drop=_TYPE_ _FREQ_ rename=(_STAT_=Label1)
                    where=(Label1 ^= "STD"));
run;

proc print data=stats noobs;
run;
I have highlighted the first row of the output data to indicate why this output is not suitable for a report.
The output variables retain the format of the original variables. However, the ‘N’ statistic is the sample size, and a sample of size ‘$428’ does not make sense! Even ‘428.000’ is a strange value to see for a sample size. It would be great to format the first row with an integer format such as 8.0.
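The next DATA step creates three character variables (CVALUE1-CVALUE3) that contain formatted values of the three numerical variables in the data. The SELECT-WHEN statement does the following: for the ‘N’ row, it uses the PUT function to apply the 8.0 format; for every other row, it uses the VVALUE function to copy the value as formatted by the variable's existing format.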
data Factoid;
set stats;
array nValue[3] Cylinders EngineSize MSRP;   /* numerical variables */
array cValue[3] $12.;      /* cValue[i] is formatted version of nValue[i] */
/* with a little effort you could assign labels dynamically */
label cValue1="Cylinders" cValue2="EngineSize" cValue3="MSRP";
do i = 1 to dim(nValue);
   select (Label1);
      when ('N')  cValue[i] = put(nValue[i], 8.0);
      otherwise   cValue[i] = vvalue(nValue[i]);
   end;
end;
run;

proc print data=Factoid label noobs;
var Label1 cValue:;
run;
The output displays the character columns, which contain formatted numbers. With additional logic, you could display fewer decimal values for the MEAN row.
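Here is a minimal sketch of that additional logic. It assumes the stats data set created above and assumes that MSRP is the third element of the array, so the MEAN row keeps MSRP's own format while the other means use the 8.2 format (an arbitrary choice). The data set name Factoid2 is hypothetical.

```sas
data Factoid2;
   set stats;
   array nValue[3] Cylinders EngineSize MSRP;
   array cValue[3] $12.;
   label cValue1="Cylinders" cValue2="EngineSize" cValue3="MSRP";
   do i = 1 to dim(nValue);
      select (Label1);
         when ('N')    cValue[i] = put(nValue[i], 8.0);     /* integer count */
         when ('MEAN') do;
            if i = 3 then cValue[i] = vvalue(nValue[i]);    /* MSRP: keep its own format */
            else          cValue[i] = put(nValue[i], 8.2);  /* two decimals for other means */
         end;
         otherwise     cValue[i] = vvalue(nValue[i]);
      end;
   end;
   drop i;
run;
```

Because the formatting decision is made per row and per column inside the DO loop, any number of such special cases can be added as extra WHEN clauses.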
The previous table contains the information that the programmer wants. However, the programmer asked to display the transpose of the previous table. For completeness, here are SAS statements that transpose the table:
proc transpose data=Factoid out=TFactoid;
   var cValue:;
   ID Label1;
   IDLABEL Label1;
run;

proc print data=TFactoid(drop=_NAME_) noobs label;
   label _LABEL_ = "Variable";
run;
The final table is displayed at the top of the article.
In summary, this article shows one way to display a column of data in which each cell potentially has a different format. The trick is to create a factoid, which means that you create a character copy of numeric data. As you create the copy, you can use the PUT function to apply a custom format, or you can use the VVALUE function to get the existing formatted value.
In general, you should try to avoid creating copies of data, so use PROC TEMPLATE and PROC REPORT if you know how. However, if you don’t know how to use those tools, the factoid provides an alternative way to control the formatted values that appear in a table. Although not shown in this article, a factoid also enables you to display character and numeric values in the same column.
The post What is a factoid in SAS? appeared first on The DO Loop.
Have you ever seen the “Fit Summary” table from PROC LOESS, as shown to the right? Or maybe you’ve seen the “Model Information” table that is displayed by some SAS analytical procedures?
These tables provide brief interesting facts about a statistical procedure, hence they are called factoids.
In SAS, a “factoid” has a technical meaning: it is an ODS table that can display character values and (formatted) numbers in the same column. I want to emphasize this mixed-type property.
Since variables in a SAS data set must be either character or numeric, you might wonder how to access the data that underlies a factoid. You can use the ODS OUTPUT statement to look at the data object behind any SAS table, as shown below:
proc loess data=sashelp.cars plots=none;
   model mpg_city = weight;
   ods output FitSummary=Fit(drop=SmoothingParameter);
run;

proc print data=Fit noobs;
run;
The PROC PRINT output shows how the factoid manages to display characters and numbers in the same column. The underlying data object contains three columns. The LABEL1 column contains the row headers, which identify each row. The CVALUE1 column is the column that is displayed in the factoid. It is a character column that contains character strings and formatted copies of the numbers in the NVALUE1 column.
The NVALUE1 column contains the raw numeric value of every number in the table. Missing values represent rows for which the table displays a character value.
All factoids use the same naming scheme and display the LABEL1 and CVALUE1 columns.
The form of the data is important when you want to use the numbers from a factoid in a SAS program.
Do not use the CVALUE1 character variable to get numbers! Those values are formatted and possibly truncated, as you can see by looking at the “Smoothing Parameter” row.
Instead, read the numbers from the NVALUE1 variable, which stores the full double-precision number.
For example, if you want to use the AICC statistic (the last row) in a program, read it from the NVALUE1 column, as follows:
data _NULL_;
   set Fit(where=( Label1="AICC" ));   /* get row with AICC value */
   call symputx("aicc", NValue1);      /* read value of NValue1 into a macro variable */
run;
%put &=aicc;                           /* display value in log */
AICC=3.196483775 
Some procedures produce factoids that display multiple columns. For example, PROC CONTENTS creates the “Attributes” table, which is a factoid that displays four columns: two columns of labels and two columns of values. When you use the ODS OUTPUT statement to create a data set, the variables for the first two columns are LABEL1, CVALUE1, and NVALUE1. The variables for the second two columns are LABEL2, CVALUE2, and NVALUE2.
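A minimal sketch that captures the “Attributes” factoid from PROC CONTENTS for the Sashelp.Class data set and prints the six columns described above; the output data set name Attr is hypothetical.

```sas
proc contents data=sashelp.class;
   ods output Attributes=Attr;     /* write the "Attributes" factoid to a data set */
run;

proc print data=Attr noobs;
   var Label1 cValue1 nValue1 Label2 cValue2 nValue2;
run;
```

As with the LOESS example, read numbers (such as the number of observations) from the NVALUE1 and NVALUE2 columns rather than from the formatted CVALUE columns.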
Be aware that the values in the LABEL1 (and LABEL2) columns depend on the LOCALE= option for your SAS session. This means that some values in the LABEL1 column might be translated into French, German, Korean, and so forth. When you use a WHERE clause to extract a value, be aware that the WHERE clause might be invalid in other locales. If you suspect that your program will be run under multiple locales, use the _N_ automatic variable, such as if _N_=14 then call symputx("aicc", NValue1);. Compared with the WHERE clause, using the _N_ variable is less readable but more portable.
Now that you know what a factoid is, you will undoubtedly notice them everywhere in your SAS output. Remember that if you need to obtain numerical values from a factoid, use the ODS OUTPUT statement to create a data set. The NVALUE1 variable contains the full double-precision numbers in the factoid. The CVALUE1 variable contains character values and formatted versions of the numbers.