The post JMP Blog has moved! appeared first on JMP Blog.

We missed you these past few weeks, but we look forward to sharing lots of posts -- starting with today's post by data visualization expert and R&D director Xan Gregg about parallel coordinate plots.

P.S. We are still working on the look and feel of the blog in its new home, so please pardon us while we continue to fine-tune.

tags: JMP Blog

The post The JMP Blog is moving! appeared first on JMP Blog.

The JMP Blog is moving to the JMP User Community soon!

**When will this happen?** By the end of November at the latest, the JMP Blog will make its debut in its new home with a new look.

**Why is it moving?** When you visit the JMP User Community, you will have easy access to the blog. Plus, you will be able to find relevant blog posts when you search for answers to your JMP questions in the Community.

**What should you do?** Once the JMP Blog has moved, we'd love it if you'd come by and check it out, subscribe to it there and update your bookmark to the blog.

I'll update you one final time when the blog move is complete!

Thanks for reading!

tags: JMP - General, JMP User Community

The post 4 ways to use fixed/baseline (historical) control limits in Control Chart Builder appeared first on JMP Blog.

In my blog post "Generating control limits using Control Chart Builder," I introduced a printing process. Let’s review. Variations in the printing process can cause distortion in the line, including skew, thickness, and length problems. This example considers the length of the line. The line is considered good if it has a printed length of 16 cm +/- 0.2 cm. Any longer, and the sentence may run off the page. Any shorter, and there would be a lot of wasted space on the page.

The blog post detailed creating a control chart for this data. The limits summary for the control chart is given below.

We want to use these baseline limits with new data.

One method of using fixed limits is with a Control Limits column property. Open the table that contains your new data. Select the Length column and click Cols->Column Info. Click on the Column Properties drop down and choose Control Limits. Enter 15.99825 for Avg, 15.90519 for LCL, and 16.09131 for UCL. These are the calculated limits from Figure 1.

Click on the XBar drop down and choose R. Now enter the fixed limits for the R chart. Enter 0.0495 for Avg, 0 for the LCL, and 0.161693 for UCL. These are the calculated limits from Figure 1.

Click OK. You have just entered fixed control limits for XBar and R charts for the Length column.
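For readers curious where fixed limits like these come from, here is a minimal Python sketch of the standard XBar-R limit calculations. The measurements below are hypothetical, not the actual printing data, and the constants are the usual tabulated values for subgroups of size 4:

```python
import numpy as np

# Hypothetical baseline data: one row per subgroup (e.g., per print run)
subgroups = np.array([
    [15.98, 16.01, 16.03, 15.95],
    [16.02, 15.97, 16.00, 16.04],
    [15.96, 16.05, 15.99, 16.01],
])

n = subgroups.shape[1]             # subgroup size (picks the constants below)
A2, D3, D4 = 0.729, 0.0, 2.282     # standard control chart constants for n = 4

xbar = subgroups.mean(axis=1)                        # subgroup means
r = subgroups.max(axis=1) - subgroups.min(axis=1)    # subgroup ranges

xbar_avg, r_avg = xbar.mean(), r.mean()
xbar_lcl, xbar_ucl = xbar_avg - A2 * r_avg, xbar_avg + A2 * r_avg
r_lcl, r_ucl = D3 * r_avg, D4 * r_avg                # LCL for R is 0 when n <= 6
```

Limits computed this way from baseline data are what you would then freeze via the column property.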

Create your control chart by going to Analyze->Quality and Process->Control Chart Builder. Drag Length to Y and Run to the subgroup role.

Rather than calculating limits from the data, JMP used the fixed limits defined in the column properties. Note that the Limits Sigma says “User Defined” in the Limit Summaries table. We see that many points fall outside of the limits. Furthermore, the averages are higher than those of the baseline process. This process is different from the original process that we used to calculate the baseline control limits.

Another method of fixing the control limits is using the Set Control Limits command. (Note that if you tried the above method of using column properties to set control limits, you will want to delete those column properties before continuing with this example. To delete the column properties, return to the data table, select the Length column, and go to Cols->Column Info. Make sure the Control Limits column property is highlighted and click Remove. See Figure 2 above.) In my opinion, Set Control Limits is the easiest method to use. I do not have to remember to define any column properties beforehand, and I only need my data table.

Create your control chart using the steps provided in the Column Property section of this blog post. Right-click in the top chart and select Limits->Set Control Limits. Enter your baseline limits and press OK. Do the same for the R chart.

You are presented with the same graph as in Figure 4.

The Get Limits method is by far the most flexible method. If you have fixed limits for many different processes, you should use the Get Limits method. If you have different fixed control limits for each phase, you should use the Get Limits method. To use the Get Limits method, you need a data table that defines your limits.

A limits data table contains a minimum of two columns. One column must be called _LimitsKey. This column contains keywords that are used to define the limits. Additionally, you need one column for each process that defines the values for each of these keywords. These additional columns must have the same column name as the process of interest.

The keywords needed to define limits for the XBar chart are _Mean (for the average), _LCL (for the lower control limit), and _UCL (for the upper control limit). The keywords needed to define limits for the R chart are _AvgR (for the average), _LCLR (for the lower control limit), and _UCLR (for the upper control limit).
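As an illustration of this layout (assembled by hand, not generated by JMP), a limits table could be sketched in Python with pandas, using the baseline values quoted earlier in the post:

```python
import pandas as pd

# Hypothetical construction of a limits table for the Length process.
# The _LimitsKey keywords are fixed; the value column must share the
# name of the process column ("Length").
limits = pd.DataFrame({
    "_LimitsKey": ["_Mean", "_LCL", "_UCL", "_AvgR", "_LCLR", "_UCLR"],
    "Length":     [15.99825, 15.90519, 16.09131, 0.0495, 0.0, 0.161693],
})
```

A table with this shape, saved as a JMP data table, is what Get Limits expects to read.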

These limit data tables can be created by the old control chart platforms in JMP, or you can create them yourself using File->New->Data Table. In the future, Control Chart Builder will also be able to create these limit data tables.

Create your control chart by following the steps given in the Column Property section of this blog. Click on the red triangle next to Control Chart Builder and choose Get Limits. Pick the Length Limits.jmp data table. You are presented with the same graph as in Figure 4.

The excluded row state method can be used when your new and old data reside in the same data table. This method can only be used if your historical data *and* your new data all have equal subgroup sizes. In your combined data table, make sure the new observations have the excluded row state. To do this, select the new rows and choose Rows->Exclude/Unexclude. Create your control chart as described in the Column Property section. JMP uses only the unexcluded rows (historical data) to create the control limits. The new data (excluded rows) are still plotted on the graph (dimmed), but they were not used in any of the calculations.
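The idea can be mimicked outside JMP with a simple mask. The sketch below uses hypothetical subgroup means and a 3-sigma shortcut in place of the range-based estimate: limits are computed from the historical (unexcluded) rows only, while every row is still checked against them:

```python
import numpy as np

# Hypothetical subgroup means: 4 historical runs, then 3 new (excluded) runs
subgroup_means = np.array([15.99, 16.00, 15.98, 16.01,
                           16.10, 16.12, 16.08])
excluded = np.array([False] * 4 + [True] * 3)

baseline = subgroup_means[~excluded]       # only historical rows set the limits
center = baseline.mean()
sigma_est = baseline.std(ddof=1)           # stand-in for the range-based estimate
lcl, ucl = center - 3 * sigma_est, center + 3 * sigma_est

# All rows, excluded or not, are still plotted and flagged
out_of_control = (subgroup_means > ucl) | (subgroup_means < lcl)
```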

One advantage of this method is that you can see both the baseline data and the new data in the same graph. This may help you determine differences.

JMP provides four different methods of defining fixed/baseline control limits. The column property method requires you to define limits for each control chart as column properties in the data table. Set Control Limits is the easiest method and allows you to define control limits with a right mouse click in each control chart. Get Limits is the most flexible method and requires a separate data table that defines the limits for each process. Get Limits should be used if there are many processes and if there are phase variables. The excluded row state method can be used when your data resides in the same table and your subgroup sizes are equal. Since the historical and new data are plotted in the same graph with the excluded method, comparisons are more straightforward with this method.

JMP Software: Statistical Process Control Course Notes

tags: baseline limits, Control Chart, Control Chart Builder, Control Limits, fixed limits, historical limits, QC, Quality Control, SPC, SQC

The post Creating a capability analysis in JMP using your specification limits appeared first on JMP Blog.

We will use Process Capability in JMP to generate a capability analysis. We will then use this analysis to make decisions about our process. You can read more about Process Capability in a post by my colleague Laura Lancaster.

In an earlier post on generating control limits using Control Chart Builder, we saw that stability should **always** be checked **prior** to performing a capability analysis. I introduced an example that involved a printing process. Let’s review this example.

Variations in the printing process can cause distortion in the line, including skew, thickness, and length problems. For our purposes, we are considering the length of the line. The line is considered good if it has a printed length of 16 cm +/- 0.02 cm. Any longer, and the sentence may run off the page. Any shorter, and there would be a lot of wasted space on the page. For every print run, the first and last books are taken for measurement. The line lengths are measured on a specified page in the middle of each book.

We determined in the previous blog post on control charts that the process was *stable*. Now that we know the process is stable, we can determine whether or not our process is *capable*. The capability of the process is defined by how well the process produces product that is within specification. In our example, the line is considered good if it has a printed length of 16 cm +/- 0.02 cm. Even though we knew this and defined it prior to creating our control charts, this information was not used in our control chart analysis. This information defines our specification limits. These values are not calculated; we know, given the size of the page, that a good line has the length stated above.

We can use our same data to perform the capability analysis. To generate the analysis, go to Analyze->Quality and Process->Process Capability. Select Length as the Y, Process. In the Column Roles section, select Length. Open the Process Subgrouping section by clicking on the triangle next to Process Subgrouping. Select Run in the Select Columns section and click Nest Subgroup ID Column. In the Within-Subgroup Variation Statistic section, choose Average of Ranges. Since we used ranges for our control chart, let’s use ranges here as well.

Click OK. You are presented with a dialog that allows you to define the spec limits. You can either select a data table that contains your spec limits, or you can enter your spec limits directly. We will enter them directly. Enter 15.98 (16 - 0.02) for the LSL, 16 for the Target, and 16.02 (16 + 0.02) for the USL. Click OK.

You are first presented with a Goal Plot.

The y-axis is the Std Dev Standardized to Spec, which shows the variability in the data. We notice that the point falls well above the red triangle, indicating high variability. The x-axis is the Mean Shift Standardized to Spec. Since the point occurs near the apex of the triangle, we note that the process is close to target (a small amount below target). The area under the red triangle denotes a non-conformance rate of 0.0027 or better (this corresponds to Ppk = 1), if the distribution is normal. We see that our process has a much higher non-conformance rate. The Ppk slider can be adjusted so that the triangle denotes different non-conformance rates.
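To make the Ppk/nonconformance connection concrete, here is a rough Python sketch using hypothetical measurements and a simple normal approximation with overall sigma (JMP's actual report distinguishes within-subgroup from overall variation):

```python
import numpy as np
from math import erf, sqrt

lsl, target, usl = 15.98, 16.00, 16.02   # spec limits from the post

# Hypothetical line-length measurements (not the actual data set)
data = np.array([15.95, 16.06, 15.92, 16.08, 16.01, 15.97, 16.04, 15.99])
mean, sigma = data.mean(), data.std(ddof=1)

# Ppk uses the nearer spec limit relative to overall variation
ppk = min(usl - mean, mean - lsl) / (3 * sigma)

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Normal-approximation estimate of the fraction outside spec
p_out = norm_cdf((lsl - mean) / sigma) + 1 - norm_cdf((usl - mean) / sigma)
```

With specs this tight relative to the spread, Ppk comes out well below 1 and the estimated fraction outside spec is large, mirroring the situation in the post.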

The Capability Box Plot shows that not all of our data meets the specification limits. This box plot is very wide and spans a width much larger than the green standardized specification limits. We also notice that the process is close to the target value, but a little on the lower side. We see this because the solid black line that appears inside the box plot is slightly to the left of the solid green line which denotes the standardized target value.

To look at this process in more detail, select Individual Detail Reports from the red triangle next to Process Capability. The histogram suggests that the data is normal. If you wanted to perform an actual goodness of fit test for normality, you could use the Distribution platform.

The blue density curve for Within Sigma falls pretty close to the black dotted density curve for Overall Sigma. This indicates that the process is stable, which we already showed via Control Chart Builder. In the nonconformance report, we see that Total Outside is 67.5, meaning 67.5% of our measurements do not meet the specification limits.

To further investigate this process, click on the red triangle next to Process Capability and select Out of Spec Values->Color Out of Spec Values.

We see in the data table that less than half of the observations met the specification limits. The number of observations that fell above the specification limit is about the same as the number that fell below it. This problem needs further investigation. Perhaps we need to take a closer look at how the spec limits were determined. Perhaps there are other variables that we did not take into account. Remember from the description of the process that measurements were taken on the first and last book of each run. In our analysis, we only used the variables Length and Run. We did not take into account book, which may also be a source of variation that we need to account for.

The printing process is stable, but not capable. So while we can predict what the process is going to do, we can’t consistently produce lines of the appropriate length (between 15.98 and 16.02 cm). This process needs further investigation.

JMP Software: Statistical Process Control Course Notes

The post JMP Categorical features you never knew about: Aligned Responses appeared first on JMP Blog.

Many surveys have questions about a person’s attitudes toward different ideas, or their satisfaction with different attributes of a good or service. Most of these are rated on an ordinal scale (e.g., 1-5 where 1 is “Poor” and 5 is “Excellent”, or 1-5 where 1 = “Strongly Disagree” and 5 = “Strongly Agree”). Sensory analysis often uses a “Just About Right” (JAR) scale, where negative numbers indicate “not enough” of a particular flavor (Saltiness, Sweetness, etc.), positive numbers indicate “too much” of the flavor, and values near 0 are “Just About Right”.

All of these cases have two features in common: 1) The values have an inherent order, and 2) they are all measured on the same scale. You can use Aligned Responses in the Categorical platform to get a streamlined comparison for many of these columns at once.

As an example, let’s take a look at a famous survey about attitudes and social activities called the “Bowling Alone” data. This data set has a lot of questions that would be good candidates for Aligned Responses. Let’s take a look at a series of questions that measures how people feel about themselves:

- I am the kind of person who knows what I want to accomplish in life and how to achieve it (ACCOMPLISH)
- My friends and neighbors often come to me for advice about products and brands (ADVICE)
- I have better taste than most people (BETTASTE)
- I don't like to take chances (CHANCES)
- I would do better than average in a fist fight (FISTFGT).

People were asked to rate how much they agreed with each of these statements on a scale of 1 (Definitely Disagree) to 6 (Definitely Agree). The way the questions are worded, you can think of these as measures of how people feel about themselves: Do they see themselves as competent? Someone who is respected by others? A resource for friends and neighbors?

It’s interesting to compare these different measures of self-perception. Aligned Responses is tailored for this type of analysis.

On the Analyze menu, go to Consumer Studies and choose Categorical. By default, the first tab you see is the “Simple” tab (you may see a different one if you’ve set your Preferences). Clicking on the “Related” tab, we see that there are several types of responses JMP considers related, including Aligned Responses, Repeated Measures, and Rater Agreement. They all give the same basic Crosstab and Share Charts, but they report different statistics. See the JMP Consumer Research book for more details.

In our case, Aligned Responses is sufficient for what we need.

- Using Ctrl-Click in the columns pane, select the five columns for the report.
- Click Aligned Responses to add the crosstab definition.
- Click “OK” to see the report.

The crosstabs from the Related tab look different from the crosstabs created by other tabs in the Categorical platform. Instead of having separate tables for each response column, they are all placed into one. There is a row for each response, and each value is given a column. The values are common across all the responses, because the platform expects Aligned Responses to work this way.

A centered Likert share chart appears below the crosstab and has the same layout. There is a bar for each row in the table. For each row, the mean is considered the "center" of the response, and the bar is shifted left or right, depending on whether there are more responses at the higher levels or the lower levels. The colors for the values across the top of the table match the colors in each section of the bar. You can easily see that more people agree with “I am the kind of person who knows what I want to accomplish in life and how to achieve it (ACCOMPLISH)” than with any of the other statements. Most people are not confident about their ability to perform better than average in a fist fight (FISTFGT). The bar for CHANCES is shifted to the right, and the bright red bar is wider than the others, showing that more people “Definitely Agree” that they don’t like to take chances than they “Definitely Agree” with the other statements.
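The “center” logic behind the share chart can be sketched in a few lines of Python. The counts below are invented for illustration, not the Bowling Alone data:

```python
import numpy as np

levels = np.arange(1, 7)   # 1 = Definitely Disagree ... 6 = Definitely Agree

# Hypothetical response counts for two of the statements
counts = {
    "ACCOMPLISH": np.array([5, 8, 12, 20, 30, 25]),   # agreement-heavy
    "FISTFGT":    np.array([30, 25, 20, 12, 8, 5]),   # disagreement-heavy
}

means = {}
for statement, c in counts.items():
    share = c / c.sum()                               # segment widths in the bar
    means[statement] = float((levels * share).sum())  # the bar's "center"
```

A higher mean shifts the bar toward the “Agree” end, which is exactly the visual pattern described above.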

It’s often important to see how different categories of people responded to different questions, i.e., if there is a predictor variable that can be used as a proxy for other variables. Let’s change the report slightly. Instead of looking at everything as a response, let’s use one of the questions, “I have better taste than most people” (BETTASTE) as a proxy for a person’s self-esteem. Instead of assigning it as part of the Aligned Responses, we’ll make it an X Column.

Now when we click OK, we get a series of crosstabs: One for each of the questions designated as a response, with a row for each value of BETTASTE within each table. The tables are stacked on top of each other, creating a long crosstab that can be difficult to read. Sometimes, I turn it off using the option from the Categorical platform’s red triangle menu.

The Share chart clearly shows what might be difficult to see in the numbers: Each question has its own mean measuring how much, on average, respondents agreed with the statement, but if we examine each of the questions based on how much respondents agreed with “I have better taste than most people”, we see that the bars shift to the right as you move down the row within each sub-table.

People who responded “Definitely Agree” to the BETTASTE question were more likely to “Definitely Agree” that “I am the kind of person who knows what I want to accomplish in life and how to achieve it” and they were more likely to “Definitely Agree” with “I would do better than average in a fist fight”.

The only question in this group that appears to be unrelated to BETTASTE is the statement “I don’t like to take Chances,” since the bars in that table do not shift with the answer for BETTASTE.

Aligned Responses is a convenient way to explore surveys that have many questions with the same ordinal coding (such as “Strongly agree” – “Strongly Disagree” or “Poor” to “Excellent” ratings on a product or service). The Share chart, in particular, is a quick and easy visual that can help you spot patterns in your data that might be hidden in a large table of counts and percentages.

tags: Aligned Responses, Categorical, consumer and market research, Consumer Research, Survey research

The post To explain or predict with Galit Shmueli appeared first on JMP Blog.

Galit Shmueli's research interests span a number of interesting topics, most notably her acclaimed research To Explain or Predict, as well as noteworthy research on statistical strategy, bio-surveillance, online auctions, count data models, quality control and more.

In the Analytically Speaking interview, we’ll focus on her most interesting Explain or Predict work as well as her research on Information Quality and Behavioral Big Data, which was the basis of her plenary talk at the Stu Hunter conference earlier this year. I'll also ask about her books and teaching.

Galit has authored and co-authored many books, two of which — just out this year — include some JMP. First is *Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro*, with co-authors Peter C. Bruce, Nitin R. Patel, and Mia Stephens of JMP. This first edition release coincides with the third edition release of *Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner*, with the first two co-authors listed above. As Michael Rappa says so well in the foreword of the JMP Pro version of the book, “Learning analytics is ultimately about doing things to and with data to generate insights. Mastering one's dexterity with powerful statistical tools is a necessary and critical step in the learning process.”

The second book is *Information Quality: The Potential of Data and Analytics to Generate Knowledge*, which Galit co-authored with Professor Ron S. Kenett, CEO and founder of KPA and research professor at the University of Turin in Italy (you may recognize Ron and KPA colleagues as guest bloggers on the JMP Blog on the topic of QbD). As David Hand notes in his foreword, the book explains that “the same data may be high quality for one purpose and low quality for another, and that the adequacy of an analysis depends on the data and the goal, as well as depending on other less obvious aspects, such as the accessibility, completeness, and confidentiality of the data.”

Both Ron and Galit will be plenary speakers at Discovery Summit Prague in March. You can download a chapter from their book, which discusses information quality support with JMP and features an add-in for Information Quality, both written by Ian Cox of JMP. You can see a short demo of JMP support for information quality during the Analytically Speaking webcast on Nov. 16.

Whether your analysis is seeking to explain some phenomena and/or to make useful predictions, you will want to hear Galit’s thoughtful perspective on the tensions between these two goals, as well as what Galit has to say on other topics up for discussion. Join us! If Nov. 16 doesn’t suit your schedule, you can always view the archived version when convenient.

The post Data table tools part 1: Custom Date Formula Writer appeared first on JMP Blog.

To begin, install the Data Table Tools add-in and navigate to the formula writer:

Now the new date column is just four steps away:

- Choose the table and column containing the character date/time data.
- Point and click to delimit the "words" in the data.
- Specify the meaning of each word, and various options, using drop-down menus and radio buttons.
- Press the "Build formula column" button.

Here's what the process looks like:

Step 1: Choose the table and the column containing the character date/time data.

Step 2: Point and click the text to delimit the data, then press the "Apply delimiting and choose words" button.

Step 3: Complete the dialog using the radio buttons and drop-down menus to select options and word roles.

Step 4: Click the "Build Formula Column" button to write the new formula column to the data table.

The column formula is written automatically. Isn't that nice? Hopefully, your date worries are now a thing of the past.
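Outside JMP, the same character-to-date conversion boils down to a format string. A minimal Python sketch, with made-up date strings and a made-up format, looks like this:

```python
from datetime import datetime

# Hypothetical character date/time values and a matching format string
raw = ["03 Nov 2016 14:22", "15 Oct 2016 09:05"]
parsed = [datetime.strptime(s, "%d %b %Y %H:%M") for s in raw]
```

Each "word" the add-in asks you to identify (day, month name, year, and so on) corresponds to one of the `%`-codes in the format string.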

This add-in, along with many others, is available for free on the JMP User Community's File Exchange. (A free SAS profile is required for access.)

I'll be blogging on more table tools in the future, so stay tuned!

Note: This blog post is first in a series exploring the various features of the Data Table Tools add-in.

tags: Add-Ins, Date/Time Format, Tips and Tricks

The post Want scientists and engineers to make more discoveries? Here's how to help them appeared first on JMP Blog.

Analytical software has surfaced a new world of analytics that is characterized by these important traits:

- Data are “self-provisioned.” Users are able to get the data they need without assistance and without delay.
- The analytics are visual and interactive. As a result …
- Users can now conduct advanced analytics without a PhD in statistics.
- Analysts conduct their work “in the moment.” Insights often surface new questions that analysts can explore immediately, creating an active dynamic that further spawns discovery.
- Analytical thinking is completely coupled to the business thinking.
- More than descriptive, analytics are inferential.

Consider this insurance example. Demographic information from many thousands of current and potential clients was collected and maintained in a database. The insurance company was able to download the data into a spreadsheet and summarize it, but did they get the best exploitable insights? Answering even the simplest questions took days of acquiring, splicing and arranging the data.

Today, with integrated, interactive and visual analytics, insights are revealed in seconds. The big question when it comes to prospective clients is: How many of them were converted to new business, and what factors drive the conversion? By knowing this, focus can be brought to business practices that lead to higher rates of success.

We started by loading the data. With only a few clicks, tens of thousands of prospective client encounters, including demographic information such as income, education, age, marital status, etc., were loaded. You can see from the image above that overall about 12.5% (the blue area) of these prospects were converted into paying customers.

Now to the question at hand: What factors determine success in winning new business? One more click (on the Split button in the lower-left) and an “aha” moment ensued.

The chart above shows that a particular factor (which, due to confidentiality, I can’t disclose, so we’ll call it “factor Xn”) leads to an incredibly high conversion rate (about 90%, as seen in the blue bar on the right) for a good number of prospects, and that the remaining prospects had little chance of converting.

The analysts were stunned at seeing this. This insight had eluded them because the overall conversion rate was masking a major distinction, identified by factor Xn, among the prospects. Keep in mind that these analysts spend day-in and day-out poring over data, but this important insight and others that were to follow remained locked within.

This insight spawned a bunch of questions. First, it appeared that a change to the sales representatives' instructions was in order. Second, why was the conversion rate for the other customers so incredibly low? This led to questions about pricing, packaging and the like, in combination with demographics, that would be investigated with designed experiments.
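The kind of split that produced the “aha” moment can be imitated with a simple group-by. The numbers below are invented to echo the 12.5% overall rate and the roughly 90% conversion among prospects with factor Xn:

```python
import pandas as pd

# Invented prospect data: 800 encounters, 100 of them with factor Xn present
df = pd.DataFrame({
    "factor_xn": [1] * 100 + [0] * 700,
    "converted": [1] * 90 + [0] * 10 + [1] * 10 + [0] * 690,
})

overall = df["converted"].mean()                         # overall conversion rate
by_factor = df.groupby("factor_xn")["converted"].mean()  # rate within each group
```

The overall rate hides the split entirely, which is exactly why the partition was such a surprise to analysts who had only looked at aggregates.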

Looking back at the six traits above, we can see that in this case:

- IT established systems that allowed users to get the data themselves: "self-provisioned data."
- Indeed the analytics were highly visual. Yes, all the statistical information is provided, but it is made accessible through graphics and interactivity.
- No PhD in statistics was necessary. The analysis above involves recursive partitioning with cross-validation. A mouthful to be sure, but that complexity (and statistical jargon) does not get in the way of a business analyst or engineer gaining the highest possible number and quality of exploitable insights. They can focus on their subject matter unfettered. In fact, my experience is that the tool almost becomes invisible as the focus is on the subject matter.
- Unlike the old days, when I started in this game, there was no need to submit a request instructing programmers in IT to amend a report that would arrive several days later. The elapsed time between question and answer was gone, and so was the dependency.
- The old division of labor between analytics and business was gone. They must be welded together to be effective and efficient at finding exploitable business, engineering and scientific insights.
- Notice that the analysis is not simply descriptive, as it was in the old days. It is inferential because it leads analysts to predict future outcomes and ask further questions.

Not only were the analysts impressed with the insight, but they were also excited about how readily it was derived.

What does it take to bring the new world of analytics into your organization and support a culture of analytics?

This is where IT comes in -- obviously, they have a major role to play. IT no longer needs to worry about conducting analytics. It’s best left to the analysts. Instead, IT are now enablers of analytics. They can do this by:

- Maintaining the hardware and software infrastructure that supports operational and analytical needs.
- Making data available in an analytically-friendly way so that data may be self-provisioned. We do lots of work in this area to ensure that analytical data demands do not affect operations. For example, in pharmaceutical, semiconductor, solar and other industries, unimpeded real-time data must be collected for traceability. Analytical demand on IT infrastructure cannot affect operational systems.
- Supporting the likes of our company, Predictum, in developing integrated analytical applications that further facilitate analysis, store and transfer knowledge and insights, and gain other efficiencies and cost savings in areas of operations, research and compliance.
- Securing all systems.

Securing systems is a rapidly growing and increasingly demanding responsibility for IT -- so much so that we find that IT folks are usually very happy to be relieved of the burden of conducting analytics or involving themselves with analytics that analysts can better support themselves. Their enabling role is much more consistent with their other activities and responsibilities. For example, IT supports order/shipping/billing systems, but they do not order, ship or bill themselves -- so why should they conduct business, science or engineering analytics?

With the Internet of Things, new more capable equipment and the internet’s expanding reach, we can expect an exponential increase in the amount and quality of data well into the future. It’s best to prepare for the opportunities presented by building a culture of analytics now. That involves designing the right data architecture, providing JMP and enabling business analysts, scientists and engineers to advance their subject matter expertise with analytics.

**Editor's Note:** A version of this blog post first appeared in the Predictum blog. Thanks to Wayne Levin for sharing it here as well.

tags: Analytic Culture, Analytics, Discovery

The post Formulations involving both mixture and process variables appeared first on JMP Blog.

The following tip is from this new book, which focuses on providing the essential information needed to successfully conduct formulation studies in the chemical, biotech and pharmaceutical industries:

Although most journal articles present mixture experiments and models that only involve the formulation components, most real applications also involve process variables, such as temperature, pressure, flow rate and so on. How should we modify our experimental and modeling strategies in this case? A key consideration is whether the formulation components and process variables interact. If there is no interaction, then an additive model, fitting the mixture and process effects independently, can be used:

c(x,z) = f(x) + g(z)    (1)

where f(x) is the mixture model and g(z) is the process variable model. Independent designs could also be used. However, in our experience, there is typically interaction between mixture and process variables. What should we do in this case? Such interaction is typically modeled by replacing the additive model in Equation 1 with a multiplicative model:

c(x,z) = f(x)*g(z)    (2)

Note that this multiplicative model is actually non-linear in the parameters. Most authors, including Cornell (2002), therefore suggest multiplying out the individual terms in f(x) and g(z) from Equation 2, creating a linear hybrid model. However, this tends to be a large model, since the number of terms in the linearized version of c(x,z) will be the number in f(x) times the number in g(z). In Cornell's (2002) famous fish patty experiment, there were three mixture variables (7 terms) and three process variables (8 terms), so the linearized c(x,z) had 7*8 = 56 terms, requiring a 56-run hybrid design.

Recent research by Snee et al. (2016) has shown that by considering hybrid models that are non-linear in the parameters, the number of terms required, and therefore the size of the designs required, can be significantly reduced, often on the order of 50%. For example, if we fit Equation 2 directly as a non-linear model, the number of terms to estimate is the number in f(x) plus the number in g(z): 7 + 8 = 15 in the fish patty case. Snee et al. (2016) showed using real data that this approach can often provide reasonable models, allowing the use of much smaller fractional hybrid designs. We therefore recommend an overall sequential strategy: start with fractional designs and non-linear models, with the option of moving to linearized models if necessary.
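To make the contrast concrete, here is a minimal sketch (in Python rather than JMP, and entirely invented: the synthetic data, the linear Scheffé form of f(x), the one-term g(z) and all coefficient values are assumptions for illustration, not from the book) of fitting the multiplicative model in Equation 2 directly as a non-linear model:

```python
# Hypothetical illustration: fit c(x,z) = f(x)*g(z) directly as a
# non-linear model instead of multiplying out all the cross terms.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)

# Synthetic data: 3 mixture components (each row sums to 1), 1 process variable.
n = 30
x = rng.dirichlet([1, 1, 1], size=n)   # mixture proportions
z = rng.uniform(-1, 1, size=n)         # coded process variable

def model(p, x, z):
    b1, b2, b3, g1 = p
    f = b1 * x[:, 0] + b2 * x[:, 1] + b3 * x[:, 2]  # linear Scheffe mixture model
    g = 1.0 + g1 * z   # intercept of g fixed at 1: f*g is unchanged if f is
                       # scaled by c and g by 1/c, so one scale must be pinned
    return f * g

true_p = np.array([10.0, 6.0, 3.0, 0.5])
y = model(true_p, x, z) + rng.normal(0, 0.1, size=n)

fit = least_squares(lambda p: model(p, x, z) - y, x0=np.ones(4))
print(fit.x)  # estimates close to [10, 6, 3, 0.5]
# 4 parameters here (3 + 1), versus 6 (3 * 2) after linearizing -- and the
# gap grows quickly (7 + 8 = 15 versus 7 * 8 = 56 in the fish patty example).
```

Fixing g's intercept at 1 is one way to handle the scale ambiguity inherent in any product of two fitted functions; any equivalent constraint would do.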

**Want more information?** You can read a free chapter from the book and learn about authors Ronald D. Snee and Roger W. Hoerl. In addition, the details of this approach can be found in Chapter 9 of Snee and Hoerl's book (2016).

**References**

- Cornell, J.A. (2002), *Experiments with Mixtures: Designs, Models, and Analysis of Mixture Data, 3rd Edition*, John Wiley & Sons, Hoboken, NJ.
- Snee, R.D., Hoerl, R.W., and Bucci, G. (2016), "A Statistical Engineering Approach to Mixture Experiments with Process Variables," *Quality Engineering*, 28:3, 263-279.
- Snee, R.D., and Hoerl, R.W. (2016), *Strategies for Formulations Development: A Step by Step Guide Using JMP*, SAS Institute, Cary, NC.

tags: Analytics, Books, Design of Experiments (DOE), Formulation, jmp books, Mixture, Modeling, Statistics


The post Empirical power calculations for designed experiments with 1-click simulate in JMP 13 appeared first on JMP Blog.

If our response is continuous, and we are assuming a linear regression model, we can use results from the Power Analysis outline under Design Evaluation. However, what if our response is based on pass/fail data, where we are planning to do 10 trials at each experimental run? For this response, we can fit a logistic regression model, but we cannot use the results in the Design Evaluation outline. Nevertheless, we’re still interested in the power...

**What to do?**

We could do a literature review to see about estimating the power, and hope to find something that applies (and do so for each specific case that comes up in the future). But it is more straightforward to run a Monte Carlo simulation. To do so, we need to be able to generate responses according to a specified logistic regression model. For each of these generated responses, we fit the model and, for each effect, check whether the p-value falls below a certain threshold (say, 0.05). This has been possible in previous versions of JMP using JSL, but it requires a certain level of comfort with scripting, in particular with scripting formulas and extracting information from JMP reports. Also, you need to find the time to write the script. In JMP Pro 13, you can now perform Monte Carlo simulations with just a few mouse clicks.

**That sounds awesome**

The first time I saw the new one-click simulate, I was ecstatic, thinking of the possible uses with designed experiments. A key element needed to use the one-click simulate feature is a column containing a formula with a random component. If you read my previous blog post on the revamped Simulate Responses in DOE, then you know we already have a way to generate such a formula without having to write it ourselves.

**1. Create the Design, and then Make Table with Simulate Responses checked**

In this example, we have four factors (A-D), and plan an eight-run experiment. I’ll assume that you’re comfortable using the Custom Designer, but if not, you can read about the Custom Design platform here. This example can essentially be set up the same way as an example in our documentation.

Before you click the Make Table button, you need to make sure that Simulate Responses has been selected from the hotspot at the top of the Custom Design platform.

**2. Set up the simulation**

Once the data table is created, we now have to set up our simulation via the Simulate Responses dialog described previously. Under Distribution, we select Binomial and set N to 10 (i.e., 10 trials for each row of the design). Here, I’ve chosen a variety of coefficients for A-D, with factor D having a coefficient of 0 (i.e., that factor is inactive). The Simulate Responses dialog I will use is:

Clicking the Apply button, we get a Y Simulated column simulating the number of successes out of 10 trials, and a column indicating the number of trials (which is used in Fit Model). For modeling purposes, I copied the Y Simulated column into Y.

If we look at the formula for Y Simulated, we see that it can generate a response vector based on the model given in the Simulate Responses dialog.

**3. Fit the Model**

Now that we have a formula for simulating responses, we need to set up the modeling for the simulated responses. In this case, we want to collect p-values for the effects from repeated logistic regression analyses on the simulated responses. We first need to do the analysis for a single response. If we launch the Fit Model platform, we can add the number of trials to the response role (Y), and change the Personality to Generalized Linear Model with a Binomial Distribution. My Fit Model launch looks like this:

Click the Run button to fit the model. The section of the report that we’re interested in is the Parameter Estimates outline.

For the initial simulated response, A, B, and C were found to be active, and D was not (which is correct). Of course, this is just for a single simulation. We could keep simulating a new response vector, and keeping track of these p-values for each effect, or, we could use one-click simulate and let it do this for us.

**4. Run the simulation**

The column we’re interested in for this blog post is Prob>ChiSq. We right-click on that column to bring up the menu, and (if you have JMP Pro) at the bottom, above Bootstrap, we see an option for Simulate.

The dialog that pops up has a choice for Column to Switch Out and a choice for Column to Switch In. For our simulations, instead of using the response Y, we want to use Y Simulated, as it contains the formula with the Random Binomial. (If we had used Y Simulated when we first launched Fit Model, we could simply switch it out with itself.) The Number of Samples refers to how many times to simulate a response. Here I’ve left it at 2500.

Now we just click OK, and let it run. After a short wait, we’re presented with a data table containing a column for each effect from the Fit Model dialog (as well as a simulation ID, SimID), and 2501 rows – the first is the original fit and is marked as excluded, while each of the other rows corresponds to the results from one of our 2500 simulated responses. The values are the p-values for each effect from Fit Model.

The one-click Simulate has also pre-populated the data table with a distribution script, and, because it recognizes the results are p-values, another script called Power Analysis. Running the Power Analysis script provides a distribution of the p-values for each effect, as well as a summary titled Simulated Power with the rejection rate at different levels of alpha. For example, if we look at the effect of factor B, we see that at alpha = 0.05, 2103 times out of 2500 the null hypothesis of no effect was rejected for a rejection rate (empirical power) of about 84%.
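Empirical power is nothing more than the fraction of simulated p-values falling below each alpha level. A tiny stand-alone sketch makes that explicit (the p-values here are placeholders chosen to reproduce the factor B tally quoted above, not real simulation output):

```python
import numpy as np

# Placeholder p-values: 2103 "significant" fits and 397 others, matching
# the factor B result above (2103 rejections out of 2500 at alpha = 0.05).
pvals = np.concatenate([np.full(2103, 0.001), np.full(397, 0.50)])

for alpha in (0.01, 0.05, 0.10, 0.20):
    print(f"alpha = {alpha:.2f}  rejection rate = {np.mean(pvals < alpha):.4f}")
# at alpha = 0.05 the empirical power is 2103/2500 = 0.8412, about 84%
```

This is exactly the computation behind each cell of the Simulated Power table.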

I typically right-click on one of the Simulated Power tables and select Make Combined Data Table. This produces a data table with the rejection rates for each term at the four different alpha levels, which makes it easier to view the results in Graph Builder, such as the results for alpha = 0.05.

Now we can see that we have high power to detect the effects for A and B (recall that they had the largest coefficients), while C and the intercept are around 50%. Since D was inactive in our simulation, the rejection rate is around 0.05, as we would expect. We may be concerned about the 50% power for C, assuming its effect size is correct. With the ease of performing these simulations, it’s simple to go back to Simulate Responses and change the number of trials for each row of the design before running another one-click simulation. Likewise, we could create a larger design to see how that affects the power. We could even try modeling using generalized regression with a binomial distribution.

**Final thoughts**

To me, a key aspect of this new feature is that it allows you to go through different “what if” scenarios with ease. This is especially true if you are in a screening situation, where it’s not unusual to be using model selection techniques when analyzing data. Now you can have empirical power calculations that match the analysis you plan to use, and help alert you to pitfalls that can arise during analysis. While this was possible prior to JMP 13, I typically didn’t find the time to create a custom formula each time I was considering a design. In the short time I’ve been using the one-click simulate, the ease with which I can create the formula and run the simulations has led me to insights I would not have gleaned otherwise, and has become an important tool in my toolbox.

