The post The QbD Column: Applying QbD to make analytic methods robust appeared first on JMP Blog.

We begin with a quick review of analytic methods and a brief summary of the experiment described in that previous blog post, and then show what is learned by using a DSD.

Analytic methods are used to carry out essential product and process measurements. Such measurement systems are critical in the pharmaceutical industry, where understanding of the process monitoring and control requirements is important for developing sound analytic methods. The typical requirements for evaluating analytic methods include:

- **Precision:** This requirement makes sure that method variability is only a small proportion of the specification range (upper specification limit – lower specification limit).
- **Selectivity:** This determines which impurities to monitor at each production step and specifies design methods that adequately discriminate the relative proportions of each impurity.
- **Sensitivity:** This relates to the need for methods that accurately reflect changes in CQA's that are important relative to the specification limits, which is essential for effective process control.

QbD implementation in the development of analytic methods is typically a four-stage process addressing both design and control of the methods[2]. The stages are:

- *Method Design Intent*: Identify and specify the analytical method performance.
- *Method Design Selection*: Select the method work conditions to achieve the design intent.
- *Method Control Definition*: Establish and define appropriate controls for the components with the largest contributions to performance variability.
- *Method Control Validation*: Demonstrate acceptable method performance with robust and effective controls.

We continue here the discussion of how to use statistically designed experiments to achieve robustness, which we began in our previous blog post.

The case study we presented in our previous post concerns the development of a High Performance Liquid Chromatography (HPLC) method[3]. The specific system consists of an Agilent 1050, with a variable-wavelength UV detector and a model 3396-A integrator. Table 1 lists the factors and their levels used in the designed experiments of this case study. The original experimental array was a 2^{7-4} Fractional Factorial experiment with three center points (see Table 2). The levels "-1" and "1" correspond to the lower and upper levels listed in Table 1, and "0" corresponds to the nominal level. The lower and upper levels are chosen to reflect variation that might naturally occur about the nominal setting during regular operation.

The fractional factorial experiment (Table 2) consists of 11 runs that combine the design factor levels in a balanced set of combinations, including three center points.

In our previous post, we analyzed the data from the factorial experiment and found that the experiment provided answers to several important questions:

- How sensitive is the method to natural variation in the input settings?
- Which inputs have the largest effect on the outputs from the method?
- Are there different inputs that dominate the sensitivity of different responses?
- Is the variation transmitted from factor variation large relative to natural run-to-run variation?

All of the above answers relate to our ability to assess the effects of factor variation when the factors are at their nominal settings. However, they do not address the possibility of improving robustness by moving the nominal setting to one that is less sensitive to factor variation.

Robustness has a close link to nonlinearity. We saw this feature in the previous blog post. There the initial analysis of the factorial experiment showed clear lack-of-fit, which the team attributed to the "gradient" factor. We used a model with a quadratic term for gradient and found that situating the nominal value near the "valley" of the resulting curve could effectively reduce the amount of transmitted variation. Thus, the added quadratic term gave valuable information about where to set the gradient to achieve a robust method.
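The "situate the nominal at the valley" logic can be made concrete with a first-order (delta-method) variance calculation: for a quadratic response, the variation transmitted by a factor is approximately the local slope times the factor's natural SD, and the slope vanishes at the bottom of the curve. The coefficients below are illustrative stand-ins, not the fitted HPLC values:

```python
# Illustrative coefficients in coded units -- NOT the fitted HPLC estimates.
b, c = 1.8, 5.0       # linear and quadratic coefficients for the factor
sigma = 0.4           # natural SD of the factor about its nominal setting

def transmitted_sd(x0):
    """First-order (delta-method) SD transmitted by variation around x0."""
    slope = b + 2 * c * x0        # derivative of b*x + c*x**2 at x0
    return abs(slope) * sigma

x_valley = -b / (2 * c)           # slope is zero at the bottom of the valley
print(transmitted_sd(0.0))        # at the original nominal: 0.72
print(transmitted_sd(x_valley))   # at the valley: ~0 to first order
```

Moving the nominal from 0 to the valley drives the first-order transmitted variation to zero, which is exactly the robustness gain described above.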

The presence of interactions is another form of nonlinearity that has consequences for method robustness. Two factors have an interaction effect on a response when the slope of either factor's effect depends on the setting of the other factor. In a robustness experiment, the slope is a direct reflection of method sensitivity. So when there is an interaction, we can typically set the nominal level of one of the factors to a level that moderates the slope of the second factor, thereby reducing its contribution to transmitted variation. Exploiting interactions in this manner is a basic tool in the quality engineering experiments of Genichi Taguchi[4].
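The same calculation shows how an interaction can be exploited. In the illustrative model y = b1*x1 + b12*x1*x2 (coefficients hypothetical, not from the HPLC study), the slope with respect to x1 is b1 + b12*x2, so moving the nominal of x2 to -b1/b12 cancels the method's sensitivity to x1:

```python
# Hypothetical coefficients for a two-factor interaction model (coded units).
b1, b12 = 2.0, 4.0
sigma1 = 0.4                       # natural SD of factor x1

def sd_from_x1(x2):
    """First-order SD transmitted by x1-variation when x2 sits at nominal x2."""
    return abs(b1 + b12 * x2) * sigma1

print(sd_from_x1(0.0))             # x2 at the original nominal: 0.8
print(sd_from_x1(-b1 / b12))       # x2 moved to -0.5: 0.0
```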

The fractional factorial experiment that we analyzed in the previous post was effective for estimating linear effects of the factors – and this was sufficient for assessing robustness. However, to improve robustness, we need a design that is large enough to let us estimate both linear and nonlinear effects. The natural first step is to consider estimating "second order effects", which include pure quadratic effects like the one for gradient in our earlier post and two-factor interactions.

There are three ways we can think about enlarging the experiment to estimate additional terms in a model of the analytic method’s performance. Specifically, we can use a design that is appropriate for estimating:

- All two-factor interactions and pure quadratic effects.
- All two-factor interactions but no pure quadratics.
- All pure quadratics but no interactions.

Effective designs exist for option 1, like the central composite and Box-Behnken designs. Similarly, the two-factor interactions can be estimated from two-level fractional factorial designs (option 2). The main drawback to both of these choices is that they require too many experimental runs. With K factors in the experiment, there are K main effects, K pure quadratics and K(K-1)/2 two-factor interactions. We also need to estimate the overall mean, so we need at least 1+K(K+1)/2 runs to estimate all the main effects and two-factor interactions. If K is small, this may be perfectly feasible. However, with K=7, as in the HPLC experiment, that adds up to at least 29 runs (and at least 36 to also estimate the pure quadratics). These experiments are about three times as large as the fractional factorial design analyzed in the previous blog post and would be too expensive to implement.
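The run-count arithmetic in the paragraph above is easy to verify. A small helper (the function name is ours, for illustration):

```python
def min_runs(K, interactions=True, quadratics=False):
    """Minimum runs: intercept + K main effects
    (+ K*(K-1)//2 two-factor interactions) (+ K pure quadratics)."""
    runs = 1 + K
    if interactions:
        runs += K * (K - 1) // 2   # all two-factor interactions
    if quadratics:
        runs += K                  # all pure quadratic effects
    return runs

print(min_runs(7))                    # 29 runs for mains + interactions
print(min_runs(7, quadratics=True))   # 36 with the pure quadratics as well
```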

Here we consider option 3, designs to estimate all the pure quadratics, but no interactions. A very useful class of experimental designs for this purpose is the Definitive Screening Designs (DSD's). We show in the next section how to use a DSD for studying and improving the robustness of an analytic method.

A Definitive Screening Design (DSD) for K factors requires 2K+1 runs if K is even and 2K+3 if K is odd (to ensure main-effect orthogonality). The design needs to estimate 2K+1 regression terms, so this is at or near the minimum number of runs needed. In such a design, all factors are run at three levels; main effects are orthogonal to one another and free of aliasing (partial or full) with quadratic effects and two-factor interaction effects; and no quadratic or two-factor interaction effect is fully aliased with another quadratic or two-factor interaction effect. With a DSD we can estimate all linear and quadratic main effects. Further, if some factors prove to have negligible effects, we may be able to estimate some two-factor interactions. The HPLC study had seven factors, so a DSD requires 17 experimental runs (see Table 3). For robustness studies, it is important to estimate the magnitude of run-to-run variation. The DSD in this application has two degrees of freedom for error, so no additional runs are needed. Were K even, it would be advisable to add at least two runs to permit estimation of error; a simple way to do this is to add more center points to the design.
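The DSD run-count rule above can be written down directly (the helper name is ours; for odd K, the standard construction adds a dummy factor, which is what yields the two extra runs):

```python
def dsd_runs(K):
    """Runs in a Definitive Screening Design for K three-level factors:
    2K+1 when K is even; 2K+3 when K is odd (a dummy factor is added
    to preserve main-effect orthogonality)."""
    return 2 * K + 1 if K % 2 == 0 else 2 * K + 3

print(dsd_runs(7))   # 17 runs for the seven-factor HPLC study
print(dsd_runs(6))   # 13 runs
```

With K=7 there are 2K+1 = 15 regression terms (intercept, 7 linear, 7 quadratic), leaving 17 - 15 = 2 degrees of freedom for error, as noted above.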

As in the previous blog post, we will illustrate the analysis by looking at the results for the peakHeight response in the HPLC application. Throughout, we divide the actual peakHeights by 1000 for ease of presentation. We fit a model to the DSD experimental data that includes all main effects and pure quadratic effects. The analysis shows that the only significant quadratic effect is that for Gradient. In addition to the Gradient quadratic effect, we kept four linear main effects in the model: Gradient, Column Temperature, Detection Wavelength and Triethylamine Percentage. In Figure 1, we show parameter estimates from fitting this reduced model to the peakHeight responses. All terms are statistically significant, the adjusted R^{2} is 87%, and the run-to-run variation has an estimated standard deviation of 2.585.

We show a Profiler for the reduced quadratic model in Figure 2, below.

In order to improve robustness, we need to identify nonlinear effects. Here the only nonlinear effect is for gradient. Figure 2 shows us that the quadratic response curve for gradient reaches a minimum quite close to the nominal value (0 in the coded units of Figure 2). Consequently, setting the nominal level of Gradient to that level is a good choice for robustness. The other factors can also be kept at their nominal settings. They have only minor quadratic effects, so moving them to other settings will have no effect on method robustness.

We can assess the level of variation, as in the previous post, by assigning normal distributions to the input factors. As in that post, we use the default option in JMP, which assigns to each input a normal distribution with standard deviation of 0.4 (in coded units). Figure 3 shows the results of this simulation. The standard deviation of peakHeight associated with variation in the factor levels is 2.697, very similar in magnitude to the SD for run-to-run variation from the experimental data. The estimate of the overall method SD is then 3.736 (the square root of 2.697^{2} + 2.585^{2}).
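For readers without JMP, the same bookkeeping can be mimicked in a few lines. The model coefficients below are purely illustrative stand-ins, not the fitted estimates of Figure 1; only the final combination of the two reported SDs uses the actual numbers from the text:

```python
import math
import random

random.seed(1)

# Hypothetical reduced model for peakHeight/1000 in coded units --
# illustrative coefficients only, NOT the fitted estimates of Figure 1.
def peak_height(grad, col_temp, det_wave, trie_pct):
    return 57 + 1.8*grad + 4.2*col_temp - 3.1*det_wave + 1.2*trie_pct + 5.0*grad**2

# JMP's default simulator: each input ~ Normal(0, 0.4) in coded units.
sims = [peak_height(*(random.gauss(0, 0.4) for _ in range(4)))
        for _ in range(50_000)]
mean = sum(sims) / len(sims)
sd_factors = math.sqrt(sum((y - mean)**2 for y in sims) / (len(sims) - 1))

# Combining the two independent variance sources reported in the post:
overall_sd = math.hypot(2.697, 2.585)   # sqrt(2.697**2 + 2.585**2)
print(round(overall_sd, 3))             # 3.736, as in the text
```

The simulated `sd_factors` will not match the post's 2.697 (the coefficients are made up); the point is the mechanics: simulate factor noise through the model, then add the run-to-run variance in quadrature.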

It is instructive to compare the results from analyzing the DSD to those from analyzing the fractional factorial in the previous blog post. Both experiments ended with the conclusion that gradient has a nonlinear effect on peakHeight, and that setting gradient close to its planned nominal level is a good choice for robustness of the analytic method. The fractional factorial, however, could not by itself identify gradient as the interesting factor; that identification came only after substantial discussion by the experimental team. And even then, there was concern that the decision to attribute all the nonlinearity to the gradient might be completely off the mark. The DSD, on the other hand, with just a few more runs, was able to support a firm conclusion that gradient is the only factor with a nonlinear effect. There was no need for debate and assumptions; the issue could be settled from the experimental data.

The DSD and the fractional factorial are both able to assess variance from factor uncertainty and both agree that the three factors with the most important contributions are gradient, column temperature and detection wavelength. The DSD identified a fourth factor, the percent of Triethylamine, as playing a significant role.

The DSD, by estimating all the pure quadratic effects, was also able to fully confirm that there would be no direct gain in method robustness by shifting any of the factors to different nominal values. Improvement might still be possible due to two-factor interactions; but as we pointed out, only a much larger experiment could detect those interactions.

The DSD has shown us that changing nominal levels is not a solution. An alternative is to institute tighter control on the process parameters, thereby limiting their natural variation. Moreover, the DSD helps us prioritize the choice of which factors to control. Figures 1 and 3 show us that the strongest linear effect is due to the column temperature. They also show that the strong and nonlinear effect of gradient may be contributing some of the most extreme high values of peakHeight. Thus these two variables appear to be the primary candidates for enhanced control. Figures 4 and 5 use the simulator option with the profiler to see the effect of reducing the natural spread of each of these factors, in turn, by a factor of 2. With enhanced control of the column temperature, the SD related to factor uncertainty drops from 2.697 to 2.550. Reducing the variation of the gradient leads to a much more substantial improvement. The SD drops by about 40%, to 1.667.
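To first order, the factor contributions form a sum-of-squares budget: each factor adds (slope × SD)² to the transmitted variance, which is why tightening the dominant factor pays off most. A sketch with made-up slopes (the true model also has a quadratic gradient term, ignored in this first-order illustration):

```python
import math

# Illustrative first-order slopes in coded units -- NOT the fitted values.
slopes = {"Gradient": 5.5, "Col Temp": 4.0, "Det Wave": 2.5, "Trie Perc": 1.5}

def transmitted_sd(sds):
    """First-order SD transmitted by independent factor variation."""
    return math.sqrt(sum((slopes[f] * sds[f]) ** 2 for f in slopes))

base = {f: 0.4 for f in slopes}        # JMP's default spread for each input
tight = dict(base, Gradient=0.2)       # tighter control: halve gradient spread
print(round(transmitted_sd(base), 3))
print(round(transmitted_sd(tight), 3)) # the dominant contributor shrinks most
```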

Experiments on robustness are an important stage in the development of an analytic method. These experiments intentionally vary process factors that cannot be perfectly controlled about their nominal value. Experiments that are geared to fitting a linear regression model are useful for assessing robustness, but have limited value for improving robustness.

We have shown here how to exploit nonlinear effects to achieve more robust analytic methods. The Definitive Screening Design can be especially useful for such experiments. For a minimal experimental cost, it provides enough data to estimate curvature with respect to each input factor. When curvature is present, we have seen how to exploit it to improve robustness. When curvature has been exploited, we have seen how to use the experimental results to achieve further improvements via tighter control of one or more input factors.

**Notes**

[1] Jones, B. and Nachtsheim, C. J. (2011) “A Class of Three-Level Designs for Definitive Screening in the Presence of Second-Order Effects” *Journal of Quality Technology*, **43**. 1-15.

[2] Borman, P., Nethercote, P., Chatfield, M., Thompson, D., Truman, K. (2007), Pharmaceutical Technology. http://pharmtech.findpharma.com/pharmtech/Peer-Reviewed+Research/The-Application-of-Quality-by-Design-to-Analytical/ArticleStandard/Article/detail/463580

[3] Romero, R., Gasquez, D., Sanchez, M., Rodriguez, L. and Bagur, M. (2002), "A geometric approach to robustness testing in analytical HPLC," *LCGC North America*, **20**, 72-80, www.chromatographyonline.com.

[4] Steinberg, D.M., Bursztyn, D. (1994). Dispersion effects in robust-design experiments with noise factors, *Journal of Quality Technology*, 26, 12-20.

## About the Authors

This blog post is brought to you by members of the KPA Group: Ron Kenett and David Steinberg. Read the whole QbD Column series.

tags: Analytics, Definitive Screening Design, Design of Experiments (DOE), QbD, Statistics, The QbD Column


The post Using Virtual Join in JMP 13 to explore adverse events data appeared first on JMP Blog.

This new feature can help with large data tables, and save you time in trying to figure out the best way to physically join them together. Virtual joins allow you to link tables, making variables available for use together in your analysis, graphs and many other platforms in JMP.

This blog post presents an example of using Virtual Join on data from a randomized controlled clinical trial for the drug nicardipine for treatment of patients with rare, life-threatening aneurysmal subarachnoid hemorrhage (SAH). This was a clinical trial carried out from 1987 to 1989 on very sick patients experiencing bleeding between the brain and the tissues that cover it. Understandably, the patients experienced several adverse events while in the trial. It is the job of the medical monitor of the clinical trial to look at the distribution of those events. Understanding adverse event occurrence across demographic and treatment groups along with their severity and possibility of a relationship to study drug administration is a key component to meeting safety standards to keep patients safe.

This example uses an adverse events data table along with a demographics data table to quickly show the age, sex and race of subjects experiencing certain events in the study, and which treatment they received. Each table has a unique identifier (USUBJID) column that can be used to link them; once the tables are linked, you can analyze the data and pull together valuable information.

The demographics table (demographic.jmp) contains a unique row for each subject. Therefore, the USUBJID column in this table will be the “Link ID” column. The Link ID column property marks a column in the auxiliary data table as the ID column; that is, the rows of that table are uniquely identified by the values of the ID column. There are several ways to assign this column property, but you can simply right-click on the column and select it, which places a check mark beside Link ID, as shown below.

Once that Link ID is set in this table, you can reference it from the adverse events table (adverseevents.jmp) with the same identifier column (USUBJID). To assign your “Link Reference” column property, use the right-click method again on that column to easily select the demographic.jmp table that you want to reference or link back to, as shown below.
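Conceptually, the Link ID / Link Reference pair acts as a keyed lookup from the adverse events table into the demographics table. A plain-Python sketch with hypothetical subjects (JMP performs this lookup without materializing a merged table):

```python
# Toy stand-ins for demographic.jmp (one row per subject, keyed by USUBJID)
# and adverseevents.jmp (many rows per subject). All values are hypothetical.
demog = {
    "S001": {"ARM": "NIC .15", "SEX": "F", "AGE": 52},
    "S002": {"ARM": "Placebo", "SEX": "M", "AGE": 61},
}
adverse_events = [
    {"USUBJID": "S001", "AEDECOD": "HYPOTENSION"},
    {"USUBJID": "S001", "AEDECOD": "HYPERTENSION"},
    {"USUBJID": "S002", "AEDECOD": "HYPERTENSION"},
]

# Each AE row "sees" the demographic columns through the USUBJID link,
# much as the virtual join exposes ARM[USUBJID], SEX[USUBJID], etc.
linked = [{**row, **demog[row["USUBJID"]]} for row in adverse_events]
print(linked[0]["ARM"])   # NIC .15
```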

When the Link ID and the Link Reference are set up properly in both tables, you will see a gold key in one table and a blue key in the other.

From demographic.jmp: From adverseevents.jmp:

Below you can see the adverseevents.jmp table, with the referencing columns from the demographic.jmp table in the columns panel:

Nicardipine is a calcium channel blocker that works by relaxing blood vessels so adverse events related to the vascular system are of interest. You can use a Data Filter for AEBODSYS and select VASCULAR DISORDERS, and Distribution lets you easily see those adverse events alongside the demographic information via the virtual join.

You can see that both hypertension and hypotension were common occurrences in subjects. With treatment ARM available via the USUBJID link, selecting Hypertension shows a reduced occurrence in the treatment group (NIC .15), while selecting Hypotension shows an increased occurrence in the treatment group. This suggests the drug was perhaps working too well!

Using Graph Builder, you can explore the relationship between adverse event severity and whether the placebo was administered. This graph seems to show that there were fewer severe events with the placebo.

You can also use the Summary platform to count how many subjects received the Placebo vs. NIC .15, grouped by ARM[USUBJID], SEX[USUBJID] and AESEV, and see that the majority of cases reported mild severity in adverse events. Using Tables -> Summary, you can select the following and then see the table produced below.
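The Tables -> Summary step amounts to a grouped count. A standard-library sketch with hypothetical records:

```python
from collections import Counter

# Hypothetical (ARM, SEX, AESEV) triples, one per adverse-event record.
records = [
    ("NIC .15", "F", "MILD"), ("NIC .15", "F", "MILD"),
    ("NIC .15", "M", "MODERATE"), ("Placebo", "F", "MILD"),
    ("Placebo", "M", "SEVERE"), ("Placebo", "M", "MILD"),
]

# Analogous to Tables -> Summary grouped by ARM[USUBJID], SEX[USUBJID], AESEV.
summary = Counter(records)
for (arm, sex, sev), n in sorted(summary.items()):
    print(arm, sex, sev, n)
```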

This is just one example of data that can be referenced from one table to another. I have found Virtual Join very useful when I have multiple tables and need to pull data from different sources. I hope you will find it useful, too.

You can get more information about this new feature from my JMP Discovery Summit poster and from a presentation by Daniel Valente that focuses on Query Builder but discusses Virtual Join as well.

Many thanks to Kelci Miclaus from the JMP Life Sciences team for her substantial contributions to this blog post.

**Reference**

Haley E.C., Kassell N.F. and Torner J.C. (1993). A randomized controlled trial of high-dose intravenous nicardipine in aneurysmal subarachnoid hemorrhage. Journal of Neurosurgery 78: 537-547.

tags: Adverse Events, Clinical Trials, JMP 13, Virtual Join


The post Interactive HTML: Profilers in 3 more platforms in JMP 13 appeared first on JMP Blog.

After users got to try this tool, the response was overwhelmingly positive. They found it a great way to explore cross-sections of predicted responses across multiple factors with other people who don’t have JMP yet. However, the feedback was that users would like to see Profilers available in other platforms as well.

In JMP 13, three more platforms have embedded Profilers that are available in interactive HTML.

**Neural**

In JMP, you can analyze your data using Neural Networks. I will use the Diabetes data set from the sample data library to illustrate some of the differences between this platform and Generalized Regression below. Note the curved responses for Age, BMI, and BP as well as the elongated report (only the first five factors out of 10 are shown).

**Generalized Regression**

Generalized Regression embedded Profilers are supported for export from JMP Pro 13. This example also shows an additional enhancement for Interactive HTML in JMP 13 that allows you to pick how many plots are displayed in a row when you have a lot of factors. You'd do this in JMP by selecting the red triangle, going to Appearance and selecting Arrange in Rows to provide the number you want before exporting. This lets you explore many factors in Interactive HTML with a nice layout (which can be useful on a mobile device with a smaller screen). You can see the same factors analyzed as in the Neural platform above, but more are visible in the same display width due to this feature.

**Generalized Linear Model**

Generalized Linear Model is the third platform to support interactive HTML embedded Profilers in JMP 13.

In addition to making embedded Profilers in those three platforms available in interactive HTML, JMP 13 includes new features to make exploring your data a little easier. That's what I'll cover in the following sections.

**Adapt Y Axis**

In JMP 12, you could explore data outside of the initial range of the numeric factors by typing in a value in the edit box below the curve. But what if this causes the curve to move outside the initial range of the response? You could see the value displayed in red on the Y axis, but no longer see the curve itself. Now there is an option to have the Y axis automatically adapt to show the min and max values of the curve. Simply click the menu button above the Profiler and check “Adapt Y Axis”.

**Formatted Variables**

Some data requires analyzing a formatted X factor such as a date, time, or geographic location. In JMP 12, you could click or drag anywhere within the Profiler to change the value, but there was no way to provide a precise value for this type of data. Now X variables in these formats are displayed as a button that, when clicked, launches a dialog to enter the individual fields of the format.

**Apply Mixture**

Similarly, in JMP 12, if you tried to precisely set a Profiler with a mixture constraint to a set of values that you knew satisfied the constraint, you couldn’t do it; every time you set one value, the others were altered to satisfy the mixture. In JMP 13, mixture values are applied by clicking an apply button.

For example, the amounts of three ingredients used to make a plastic in the following Profiler must sum to 1 and stay within the ranges shown. The values 0.7, 0.1, and 0.2 sum to 1 exactly. So, by entering these values in the edit boxes and then clicking apply, the Profiler is set to those precise values.

The images shown here as well as a few other examples are available as live interactive HTML files to explore on the web.

JMP offers a wide variety of math functions, special features and powerful algorithms that haven’t all been implemented in HTML, so not every Profiler will come out interactively. If you need to share work with someone who doesn’t have JMP and export your reports to Interactive HTML, we’ve added messages to the log to try to indicate why a particular Profiler has come out as a static image. Armed with this knowledge, we hope you will try your own Profilers and give us feedback on what features and platforms you want to see in the future.

tags: Data Visualization, Interactive HTML, JMP 13, Modeling, Profiler


The post JMP User Community redesign and relaunch is underway appeared first on JMP Blog.

That's because the Community is being redesigned and upgraded. All content is read-only until the relaunch of the Community.

We want you to have a better experience in the Community, including:

- A high-quality mobile experience.
- A new system of ranks and badges that reward you for contributions to the community.
- Better spam filters, keeping the noise level down so you can enjoy the good content.
- Faster, more intuitive searches.

Don't worry! Your login information will stay the same, and the login process in the redesigned Community will be streamlined. And, all of the content you created will still be there and be credited to you.

Until the relaunch, you can still search and read the current Community site, but no new posts or members can be added. Sorry!

We will send out another announcement when the redesigned community is launched in a couple of weeks.

Thanks for being part of the JMP User Community!

tags: JMP User Community


The post The QbD Column: Is QbD applicable for developing analytic methods? appeared first on JMP Blog.

- **Precision:** This requirement makes sure that method variability is only a small proportion of the specification range (upper specification limit – lower specification limit). This is also called Gage Repeatability and Reproducibility (GR&R).
- **Selectivity:** This determines which impurities to monitor at each production step and specifies design methods that adequately discriminate the relative proportions of each impurity.
- **Sensitivity:** To achieve effective process control, this requires methods that accurately reflect changes in CQA's that are important relative to the specification limits.

These criteria establish the reliability of methods for use in routine operations. This has implications for analysis time, acceptable solvents and available equipment. To develop an analytic method with QbD principles, the method’s performance criteria must be understood, as well as the desired operational intent of the eventual end-user. Limited understanding of a method can lead to poor technology transfer from the laboratory into use in commercial manufacturing facilities or from an existing facility to a new one. Failed transfers often require significant additional resources to remedy the causes of the failure, usually at a time when there is considerable pressure to move ahead with the launch of a new product. Applying Quality by Design (QbD) to analytic methods aims to prevent such problems.

QbD implementation in the development of analytic methods is typically a four-stage process, addressing both design and control of the methods[1]. The stages are:

- *Method Design Intent*: Identify and specify the analytical method performance.
- *Method Design Selection*: Select the method work conditions to achieve the design intent.
- *Method Control Definition*: Establish and define appropriate controls for the components with the largest contributions to performance variability.
- *Method Control Validation*: Demonstrate acceptable method performance with robust and effective controls.

Testing robustness of analytical methods involves evaluating the influence of small changes in the operating conditions[2]. Ruggedness testing identifies the degree of reproducibility of test results obtained by the analysis of the same sample under various normal test conditions such as different laboratories, analysts, and instruments. In the following case study, we focus on the use of experiments to assess and improve robustness.

The case study presented here is from the development of a High Performance Liquid Chromatography (HPLC) method [3]. It is a typical example of testing the robustness of analytical methods. The specific system consists of an Agilent 1050, with a variable-wavelength UV detector and a model 3396-A integrator.

The goal of the robustness study is to find out whether deviations from the nominal operating conditions affect the results. Table 1 lists the factors and their levels used in this case study. The experimental array is a 2^{7-4} Fractional Factorial experiment with three center points (see Table 2). The levels "-1" and "1" correspond to the lower and upper levels listed in Table 1, and "0" corresponds to the nominal level. The lower and upper levels are chosen to reflect variation that might naturally occur about the nominal setting during regular operation. For examples of QbD applications of Fractional Factorial experiments to formulation and drug product development, see the second and third blog posts in this series.

The experimental array consists of 11 experimental runs that combine the design factor levels in a balanced set of combinations, including three center points.

In analyzing the HPLC experiment, we have the following goals:

- Find expected method measurement prediction variance for recommended setups of the method (the measurement uncertainty).
- Identify the best settings of the experimental factors to achieve acceptable performance.
- Determine the factors that impact the performance of the method on one or more responses.
- Assess the impact of variability in the experimental factors on the measured responses.
- Make the HPLC process robust by exploiting nonlinearity in the factor effects to achieve performance that is not sensitive to changes about nominal levels.

If we consider only the eight experimental runs of the 2^{7-4} fractional factorial, without the center points, we get an average prediction variance of 0.417 and 100% efficiency for fitting a first-order model, a consequence of the design's balance (see Figure 1, left). The design in Table 2, with three center points, reduces prediction uncertainty near the center of the region and has a lower average prediction variance of 0.38. However, the center points don't contribute to estimating slopes, as seen in the lower efficiency for fitting the first-order model (see Figure 1, right).

The JMP Prediction Variance Profile in Figure 2 shows the ratio of the prediction variance to the error variance, also called the relative variance of prediction, at various factor level combinations. Relative variance is minimized at the center of the design. Adding three center points reduces prediction variance by 25%, from 0.12 to 0.09. This is an advantage derived by adding experimental runs at the center points. Another advantage that we will see later is that the center points permit us to assess nonlinear effects, or lack-of-fit for the linear regression model. A third advantage is that the center points give us a model-free estimate of the extent of natural variation in the system.
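The relative prediction variances quoted above can be reproduced from first principles: for a linear model, the relative variance at a point x0 is x0'(X'X)^{-1}x0. The sketch below builds the 2^{7-4} fraction from its standard generators (D=AB, E=AC, F=BC, G=ABC) and exploits the fact that the design columns are mutually orthogonal:

```python
from itertools import product

# 8-run 2^(7-4) fraction: full factorial in A, B, C plus generated columns.
frac = []
for a, b, c in product((-1, 1), repeat=3):
    frac.append([a, b, c, a*b, a*c, b*c, a*b*c])

def rel_variances(n_center):
    runs = frac + [[0] * 7] * n_center
    X = [[1] + row for row in runs]               # intercept + 7 main effects
    p = 8
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    # X'X is diagonal for this design (columns are mutually orthogonal),
    # so its inverse is just the reciprocal of each column's sum of squares.
    assert all(xtx[i][j] == 0 for i in range(p) for j in range(p) if i != j)
    inv_diag = [1 / xtx[i][i] for i in range(p)]
    at_center = inv_diag[0]                       # x0 = (1, 0, ..., 0)
    # average over the cube [-1,1]^7: E[x_i] = 0, E[x_i^2] = 1/3
    avg = inv_diag[0] + sum(inv_diag[1:]) / 3
    return at_center, avg

print(rel_variances(0))   # approx. (0.125, 0.417): the values quoted above
print(rel_variances(3))   # approx. (0.091, 0.383): with three center points
```

This reproduces both the drop at the center (0.125 to about 0.091, roughly the 25% reduction noted above) and the average prediction variances 0.417 and 0.38.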

At each factor level combination, the experiments produced five responses: 1) Area of chromatogram at peak (peakArea), 2) Height of chromatogram at peak (peakHeight), 3) Minimum retention time adjusted to standard (tRmin), 4) Unadjusted minimum retention time (unad tRmin) and 5) Chromatogram resolution (Res).

Our first concern in analyzing the data is to identify proper models linking factors and responses.

Linear regression models are the simplest models to consider. They represent changes in responses between two levels of factors; in our case, these correspond to the levels labeled “-1” and “+1”. Since we also have three center points, at levels labeled “0”, we can also assess nonlinear effects. We do so, as in our second blog post, by adding a synthetic indicator variable designed to assess lack-of-fit (LOF) that is equal to “1” at the center points and “0” elsewhere. The JMP Effect Summary report, for all five responses with linear effects on all seven factors, and the LOF indicator, is presented in Figure 3.
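For a design like this one, which is orthogonal apart from the center points, the LOF coefficient has a simple interpretation: it estimates the gap between the mean response at the center points and the mean of the factorial runs, i.e., the curvature. A sketch with a synthetic response (the values are made up):

```python
# 8 factorial runs followed by 3 center runs; the LOF indicator is 1
# only at the centers. The response values here are synthetic.
y = [50.0, 54.0, 49.0, 55.0, 51.0, 53.0, 48.0, 56.0,   # factorial runs
     58.0, 59.0, 57.5]                                  # center points

# With mutually orthogonal main-effect columns that are zero at the centers,
# the least-squares LOF coefficient reduces to this difference of means.
lof_estimate = sum(y[8:]) / 3 - sum(y[:8]) / 8
print(round(lof_estimate, 3))   # positive => curvature (lack of fit)
```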

The Effect Summary table lists the model effects across the full set of five responses, sorted by ascending p-values. The LogWorth for each effect is defined as -log10(p-value), which adjusts p-values to provide an appropriate scale for graphics. A LogWorth that exceeds 2 is significant at the 0.01 level because -log10(0.01)=2. The report includes a bar graph of the LogWorth with dashed vertical lines at integer values and a blue reference line at 2. The displayed p-values correspond to the significance test displayed in the Effect Tests table of the model report. The report in Figure 3 shows that, overall, four factors and LOF are significant at the 0.01 level (Col Temp, Gradient, Buf pH and Dim Perc), while Buf Conc, Det Wave and Trie Perc are non-significant. From the experimental plan in Table 2, one can estimate the main effects of the seven factors and the LOF indicator on the five responses with a linear model.

Figure 4 presents parameter estimates for peakHeight with an adjusted R^{2} of 93%, a very good fit.

The peakHeight response is most sensitive to variations in Col Temp, Det Wave and Gradient.

We observe, via the LOF variable, a statistically significant difference between the predicted value at the center point of the experimental design and the three measurements actually performed there.

Figure 5 displays a profiler plot showing the linear effects of each factor on all five responses. The plot is very helpful in highlighting which conditions might affect the HPLC method. We see that Col Temp and Gradient, the two most important factors, affect several different responses. Buf pH, Buf Conc and Dim Perc have especially strong effects on the retention responses, but have weak effects on the other CQA's. The factors give good fits to the retention responses and to peakHeight, but not to peakArea or Res, which is reflected in the wide confidence bands for those CQA's and in high p-values for the overall model F-tests in the Analysis of Variance line of the model output.

What should we do about the nonlinearity? Our analysis found a significant effect of the LOF indicator, which points to a nonlinear effect that is not accounted for in the profiler of Figure 5. The center points we added to the two-level fractional factorial design let us detect the nonlinearity, but they don’t provide enough information to determine what causes it – any one of the seven factors (and possibly several of them) could be responsible for the nonlinear effect on peakHeight. In our next blog, we will discuss some design options to address the problem. For now, we show what we achieved with the current experiment.

After much brainstorming, the HPLC team decided that it was very likely that the Gradient was the factor causing the nonlinearity. This important assumption, based only on process knowledge, is crucial to all our subsequent conclusions. We proceeded to fit a model to the original experimental data that includes a quadratic effect for Gradient. The team also decided to retain only the factors with the strongest main effects for each response; for peakHeight, the factors were Gradient, Column Temperature and Detection Wavelength. In Figure 6, we show parameter estimates from fitting this reduced model to the peakHeight response. With this model, all terms are significant with an adjusted R^{2} of 89%. The root mean squared error, which estimates run-to-run variation at the same settings of the factors, is 1.754, slightly less than 1% of the magnitude of peakHeight itself (after dividing peakHeight by 1,000).

We show a Profiler for the reduced quadratic model in Figure 7.

One of the main goals of the experiment was to assess the robustness of the system to variation in the input factors. We explore this question by introducing normal noise to the three factors in the reduced quadratic model. For each factor, we assumed a standard deviation of 0.4 (in the coded units), which is the default option in JMP. This places the ±1 experimental settings about 2.5 SD from the nominal level, so they represent rather extreme deviations that might be encountered in practice.

Figure 8 presents the effect of noise on peakHeight for a set-up at the center point, which was initially identified as the nominal setting. We can compute the SD of the simulated outcomes by saving them to a table and using the Distribution platform in JMP. The SD turns out to be 2.397, somewhat larger than the run-to-run SD of 1.754 computed earlier. The overall SD associated with the analytic system involves both of these components. To combine them, we first square them, then add them (because variances, not SDs, are additive) and then take a square root to return to the original measurement scale. The resulting combined SD is 2.970, so the anticipated variation in factor settings leads to an SD about 70% larger than the one from run-to-run variation alone. The overall SD is less than 1.5% of the typical values of peakHeight, which was considered acceptable for this process.
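The root-sum-of-squares combination is easy to verify:

```python
import math

sd_run_to_run = 1.754   # RMSE from the reduced model (run-to-run variation)
sd_factors    = 2.397   # SD of the simulated outcomes under factor noise

# Variances add; SDs do not. Square, sum, then take the square root.
combined = math.sqrt(sd_run_to_run**2 + sd_factors**2)
print(round(combined, 3))                      # 2.97
print(round(combined / sd_run_to_run - 1, 2))  # 0.69 -> about 70% larger
```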

Figure 8 also helps explain why the transmitted variation is as small as it is. The important factor here is Gradient, through its non-linear relationship to peakHeight. The “valley” of that relationship is near the nominal choice of 0. Our simulation of factor variation generates values of Gradient that cover the range from -1 to 1. When those values are in the valley, they transmit very little variation to peakHeight. By contrast, when they are near the extremes, there is substantial variation in peakHeight. So the fact that the bottom of the valley is close to the nominal setting assures us that the transmitted variation will be about as small as possible. We can test this feature by shifting the nominal value of Gradient. When the nominal is -0.5, the simulator shows that the SD from factor variation increases to 4.282, almost 80% more than for the nominal setting at 0.
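A small Monte Carlo sketch makes the valley argument concrete (Python; the quadratic below is a hypothetical stand-in, not the fitted peakHeight model). Noise in Gradient transmits little variation when the nominal sits at the bottom of the valley, and much more when it sits on the slope:

```python
import numpy as np

rng = np.random.default_rng(7)

def peak_height(g):
    """Hypothetical quadratic in Gradient, with its valley at g = 0."""
    return 100 + 5 * g**2

def transmitted_sd(nominal, sd=0.4, n=100_000):
    """SD of the response when Gradient varies normally about a nominal value."""
    g = nominal + rng.normal(0, sd, n)
    return peak_height(g).std()

sd_at_valley = transmitted_sd(0.0)    # nominal at the bottom of the valley
sd_on_slope  = transmitted_sd(-0.5)   # nominal shifted onto the slope
print(round(sd_at_valley, 2), round(sd_on_slope, 2))
```

The shifted nominal transmits roughly twice the variation, the same qualitative effect the JMP simulator showed when Gradient's nominal moved from 0 to -0.5.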

The dependence of peakHeight on Col Temp and on Det Wave is linear. So regardless of how we choose the nominal settings of these factors, they will transmit the same degree of variation to the peakHeight output. The experiment lets us assess how they affect robustness, but does not provide any opportunity to exploit the results to improve robustness.

In reviewing the questions originally posed, we can now provide the following answers:

1. What is the expected method measurement prediction variance for recommended set-ups of the method (the measurement uncertainty)?

**Answer:** We looked at this question most closely for peakHeight, where we found that the SD is 2,970 in the original units (2.970 after scaling by 1,000), with variation in the measurement process factors contributing somewhat more than run-to-run variation.

2. What setup of the experimental factors will achieve acceptable performance?

**Answer:** With all factors at their nominal settings (coded value at 0), the SD of 2,970 is less than 1.5% of the size of the values measured, which is an acceptable level of variation in this application.

3. What are the factors that impact the performance of the method in one or more responses?

**Answer:** The three factors with the highest impact on the method’s performance are gradient profile, column temperature and detection wavelength.

4. Can we make the setup of the experimental factors robust in order to achieve performance that is not sensitive to changes in factor levels?

**Answer:** We saw that we can improve robustness for peakHeight by setting the gradient to its coded level of 0 (the nominal level in the experiment). That setting helps us to take advantage of the non-linear effect of gradient and reduce transmitted variation.

5. Can we assess the impact of variability in the experimental factors on the analytical method?

**Answer:** As we noted earlier, the natural variability of the input factors is responsible for slightly more than half the variation in peakHeight.

To answer these questions, we first fit a linear regression model. After finding an unaccounted-for nonlinear effect, we moved to a reduced quadratic model and found that it fits the data well. By inducing variability in the factors of the reduced quadratic model (gradient profile, column temperature and detection wavelength), we could estimate the variability due to the method and assess the robustness of the recommended setup.

The team’s assumption that gradient is responsible for the non-linearity is clearly important here. If other factors also have non-linear effects, there could be consequences for how to best improve the robustness of the method. We will explore this issue further in our next blog post.

**Notes**

[1] Borman, P., Nethercote, P., Chatfield, M., Thompson, D. and Truman, K. (2007), The Application of Quality by Design to Analytical Methods, Pharmaceutical Technology. http://pharmtech.findpharma.com/pharmtech/Peer-Reviewed+Research/The-Application-of-Quality-by-Design-to-Analytical/ArticleStandard/Article/detail/463580

[2] Kenett, R.S. and Kenett, D.A. (2008), Quality by Design Applications in Biosimilar Technological Products, Accreditation and Quality Assurance, Springer Verlag, Vol. 13, No. 12, pp. 681-690.

[3] Romero, R., Gasquez, D., Sanchez, M., Rodriguez, L. and Bagur, M. (2002), A geometric approach to robustness testing in analytical HPLC, LCGC North America, **20**, pp. 72-80, www.chromatographyonline.com.

## About the Authors

This blog post is brought to you by members of the KPA Group: Ron Kenett and David Steinberg.

The post The QbD Column: Is QbD applicable for developing analytic methods? appeared first on JMP Blog.

The post 13 reasons data access is better than ever in JMP 13 appeared first on JMP Blog.

The feedback we’ve gotten from users about **Query Builder** suggests that you are finding it useful. We have also gotten suggestions for fixes and enhancements, both for **Query Builder** and other aspects of data access. With JMP 13, we are delivering a boatload of such fixes and enhancements. The 13 most important such fixes and enhancements are detailed below.

The first four enhancements all relate to filtering data.

**Careful, my data could be huge** – When you create a filter for a categorical column, Query Builder retrieves the values to display in a list. With large tables, this can take a long time. In JMP 12, value retrieval was unconditional, and there was no way to cancel it. In JMP 13, we have made several changes to prevent long waits:

- **Cancelable value retrieval** – JMP 13 puts up a progress bar with a **Cancel** button when retrieving categorical column values. This is supported for all ODBC drivers we have tested when JMP is running on Windows. It is not supported when connecting to SAS or for most ODBC drivers available for the Macintosh.
- **Too big to attempt** – If there are more than 1,000,000 rows in a table, JMP will not even attempt to retrieve unique column values. The 1,000,000 threshold can be changed via a preference.
- **Simpler list** – In JMP 12, the **Check Box List** was the only type of filter available for selecting from a list of values. In JMP 13, we have added a plain **List Box** filter type. The **List Box** filter is less resource-intensive than the **Check Box List** filter, which makes it better suited for larger lists. The **List Box** is the default filter type for categorical columns in JMP 13.

**New filter types** – In addition to the new, simpler **List Box** filter type, two more filter types have been added for categorical columns in JMP 13:

- **Contains** filter – Enter some text, and JMP will match all rows that contain that text. You can also ask to match rows that do **not** contain the text.
- **Manual List** filter – Allows you to create the list of selections yourself, avoiding the need for values to be looked up.

**List filters are now invertible** – All of the list-type filters (**List Box, Manual List, Check Box List** and **Match Column Values**) now have a **Not in List** check box. This allows you to select a couple of items and retrieve all rows that do **not** match the selected values. For example, this filter will return all movies rated something other than **“G”**:

**List filters can now be conditional** – This one is sort of a big deal. Using the red-triangle menu on a list-type filter, you can set the filter to be **conditional**. Conditional filters display only values that match the other filters that precede them in the list. Below is an example using movie **Rating** and movie **Genre**. In this example, I have asked for the **Genre** filter to be conditional. When I select **G** in the list for **Rating**, the **Genre** filter changes to list only genres that contain at least one G-rated film:

This symbol indicates that the filter is conditional. Only filters for columns **from the same table** affect the values displayed in a conditional filter.
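In SQL terms, a conditional filter restricts one column's value list using the filters already applied to other columns. Here is a sketch of that logic using Python's built-in sqlite3 (the movie data is made up, and the actual SQL that Query Builder generates may differ):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE movies (title TEXT, rating TEXT, genre TEXT)")
con.executemany("INSERT INTO movies VALUES (?, ?, ?)", [
    ("Cartoon Caper", "G", "Animation"),
    ("Puppy Tales",   "G", "Animation"),
    ("Farm Friends",  "G", "Family"),
    ("Night Terror",  "R", "Horror"),
    ("The Heist",     "R", "Crime"),
])

# Values a conditional Genre filter would offer once Rating = 'G' is selected:
rows = con.execute("""
    SELECT DISTINCT genre FROM movies
    WHERE genre IN (SELECT genre FROM movies WHERE rating = 'G')
    ORDER BY genre
""").fetchall()
genres = [g for (g,) in rows]
print(genres)   # ['Animation', 'Family']
```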

After using Query Builder, some users would ask us, “What if I just have a folder full of JMP data tables. Can I use Query Builder on them?” In JMP 13, the answer is a resounding **“Yes!”** Or perhaps you use ODBC Query Builder, Text Import or the Excel Import Wizard to import several tables. It would be nice to be able to use Query Builder to join the results. With JMP 13, you can!

To use Query Builder on JMP data tables, first open the tables, and then select **JMP Query Builder** from the **Tables **menu.

For example, JMP has two sample data tables, **SATByYear.jmp** and **CrimeData.jmp,** that both have **State** and **Year** columns. Another sample table, **US Demographics.jmp**, has a **State** column. I can easily join these three tables with JMP Query Builder:

JMP Query Builder allows up to **64** tables to be joined together. If you ever get that many tables into one query, please send me a screenshot.

All of the other features of **Query Builder**, such as filters and prompting, are also available with **JMP Query Builder.**

We built a SQL engine into JMP to allow Query Builder to work on JMP data tables. A new JSL function, **Query()**, gives you direct access to that SQL engine. You can use the **Query()** function to manipulate JMP data tables using SQL statements. Here is an example using **SATByYear** and **CrimeData** sample data tables:
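As an analogy for what the Query() function does, the same kind of SQL-over-tables operation can be sketched with Python's built-in sqlite3 rather than JSL (the columns and values below are illustrative stand-ins for the SATByYear and CrimeData tables, not the actual sample data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sat   (state TEXT, year INT, sat_total INT)")
con.execute("CREATE TABLE crime (state TEXT, year INT, crime_rate REAL)")
con.executemany("INSERT INTO sat VALUES (?, ?, ?)",
                [("NC", 2004, 1006), ("VA", 2004, 1024)])
con.executemany("INSERT INTO crime VALUES (?, ?, ?)",
                [("NC", 2004, 447.2), ("VA", 2004, 275.9)])

# Join the two tables on State and Year, much as Query() joins JMP tables
rows = con.execute("""
    SELECT s.state, s.year, s.sat_total, c.crime_rate
    FROM sat AS s
    JOIN crime AS c ON s.state = c.state AND s.year = c.year
    ORDER BY s.state
""").fetchall()
print(rows)   # [('NC', 2004, 1006, 447.2), ('VA', 2004, 1024, 275.9)]
```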

In JMP 13, you can configure a query to run immediately when you open it instead of opening the Query Builder window. Simply check the **Run on Open** option on the red-triangle menu at the top of the Query Builder window:

This is especially useful for queries that have prompted filters. You can send these queries to others (or incorporate them into a JMP add-in), and when the other user opens them, they will just see the filter prompt. This allows them to make their filter selections without having to wade through the complexities of Query Builder.

When a query has been set to **Run on Open**, but you need to open it into Query Builder to make changes, you have a few options. If you hold down the **Ctrl** key while opening the query, it will open into the Query Builder window. Alternatively, you can right-click on the query file in the **JMP Home Window** and select **Edit Query**.

One caveat to all these neat new JMP 13 Query Builder features – if you create queries that use them, you will not be able to open those queries in JMP 12. At the same time, you may get JMP 13 earlier than your co-workers and still need to share queries with them.

To help with this scenario, we have added a preference in JMP 13 that hides all of the new JMP 13 features of Query Builder so that the queries you build will still be compatible with JMP 12. The preference is on the **Query Builder Preferences page**:

Any ODBC or SAS queries you build after setting that preference will only allow features that are compatible with JMP 12. If you want to relax that rule for a particular query, there is an option on Query Builder’s red-triangle menu that you can uncheck to allow JMP 13 features for that query:

The **Tables** panel on the Query Builder window in JMP 12 did not have much functionality other than showing you the list of tables in your query. In JMP 13, that panel gains a number of features:

- Selecting one or more tables in the **Tables** panel restricts the columns listed in the **Available Columns** panel to just columns from the selected tables, making columns easier to find.
- The **Tables** panel now displays the Venn diagram icon corresponding to the join type for each table, and you can edit the join, change the table alias, or remove the table from the query from the context menu.
- When querying JMP data tables, double-clicking a table in the **Tables** panel makes the table visible and brings it to the front (or select the **View** item on the context menu).

When querying large tables from databases, sometimes it is helpful to retrieve just the first thousand or so rows of data for a query to experiment with before you spend the time and resources to retrieve all the data.

In JMP 12, **First N Rows** sampling was supported for the **Oracle** and **SQL Server** databases. In JMP 13, support has been added for most other databases, including **PostgreSQL**, **MySQL**, **Microsoft Access**, **SQLite**, **Apache Hive**, and **Cloudera Impala**.

More and more data is being stored in “big data” databases these days. JMP 13 improves date support for sources like Apache Hive, Cloudera Impala and HortonWorks. Also, saving tables with **File > Database > Save Table** did not work well with some of these data sources. That has been improved in JMP 13, with the caveat that using ODBC to save data to Hadoop-based data sources is not a very efficient way to get data to them.

If you do a lot with CSV files, support for the **Microsoft Text Files** ODBC driver has been improved in JMP 13.

Keeping data in a database makes it convenient to provide access to whoever needs it. For many releases, JMP has supported saving JMP data tables to databases via the **File > Database > Save Table** feature. However, with data sizes getting larger and larger, we have had reports that saving JMP tables to a database was taking much longer than people felt that it should. We listened and investigated, and we are happy to report that, in JMP 13, the performance of saving JMP tables to databases has significantly improved, in some cases dramatically. Please try this feature again and let us know what you experience.

With JMP, all of the data you are analyzing has to fit in memory. When you join JMP data tables with either **Tables > Join** or the new **JMP Query Builder**, data tends to get duplicated from smaller “look-up” tables into the larger join result. To help prevent this duplication, the **Virtual Join** feature has been added in JMP 13. For example, a DVD store might have an **inventory** table that knows where all the DVDs are and a **film** table with details about each title. In the **film** table, I can set the **film_id** column to be the **Link ID** for the table:

Then, in the **inventory** table, I can set **film_id** to be a **link reference** to the **film** table. This action effectively joins the two tables based on the **film_id** column.

Once I’ve set that up, columns from the **film** table now appear in the column list for **inventory. **They are designated "**referenced** columns" and are initially hidden. I can unhide whichever columns I want to appear in the **inventory** table, in this case **title[film_id]**:

**Virtual Join** allows me to see the values from the **film** table in the **inventory** table. However, they have not been physically copied. They are looked up as needed, which saves memory.
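The core idea, keeping a single copy of the look-up data and resolving referenced values on demand, can be illustrated in plain Python (this is an analogy for what Virtual Join does, not its implementation; the film titles are sample-data stand-ins):

```python
# One copy of the look-up ("film") data, keyed by film_id
film = {
    1: {"title": "ACADEMY DINOSAUR", "rating": "PG"},
    2: {"title": "ACE GOLDFINGER",   "rating": "G"},
}

# The large table stores only the key, never the duplicated film columns
inventory = [
    {"inventory_id": 10, "film_id": 1, "store": "A"},
    {"inventory_id": 11, "film_id": 1, "store": "B"},
    {"inventory_id": 12, "film_id": 2, "store": "A"},
]

def title_of(row):
    """Resolve the referenced column on demand instead of copying it."""
    return film[row["film_id"]]["title"]

for row in inventory:
    print(row["inventory_id"], title_of(row))
```

A physical join would copy the title into every inventory row; the lookup keeps one copy no matter how many rows reference it.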

This just scratches the surface of **Virtual Join**, which is worthy of a blog post all on its own.

So, there you have it – a look at the many enhancements for accessing and manipulating data in JMP 13. Which feature is your favorite? What feature were you hoping to see that was not mentioned? Let me know in the comments.

For more information on using Query Builder for JMP data tables, check out my Discovery Summit poster presentation in the JMP User Community. While you're there, you can also see the slides from my Discovery Summit tutorial titled, "Wrangling All Your Data With Query Builder in JMP 13."

tags: Data access, Databases, JMP 13, Query Builder


The post Formulation success: Getting the right data in the right amount at the right time appeared first on JMP Blog.

We have worked with formulation scientists and engineers for decades and have seen many different types of formulation development programs. This has shown us what formulation scientists really *need* to know rather than what is nice to know. Because JMP data analysis software is used in the examples in the book, readers get valuable guidance on the software for the proposed methodology. That means JMP users can immediately apply what they learn in the book.

Key takeaways from the book include:

- Approach the development process from a strategic viewpoint, with the overall end in mind. Don’t necessarily run the largest design possible. An experimentation plan that implements the strategy provides the right road map for developing a successful formulation.
- Focus on developing an understanding of how the components blend together. Use designs and models that help find the dominant components, components with large effects, and components with small effects.
- Use screening experiments early on to identify those components that are most important to the performance of the formulation. This strategy creates a broad view and helps ensure that no important components are overlooked. It also saves significant experimental effort.
- Analyze both screening and optimization experiments using graphical and numerical methods, which is easily done with JMP. The right graphics can extract additional information from the data.
- Consider integration of both formulation components and process variables in designs and models, using recently published methods that reduce the required experimentation by up to 50 percent.

This is how you speed up the formulation development process and produce high-quality formulations in a timely manner. Upcoming blog posts will show how to address each of these important issues.

**Want more information?** You can read a free chapter from the book and learn about authors Ronald D. Snee and Roger W. Hoerl**.**


The post Interactive HTML: Points, box plots and more for Graph Builder appeared first on JMP Blog.

Since this blog post describes interactive web pages output from JMP, images and animations below were captured from a web browser.

**Points**

Points exist in many graphs in JMP where you can customize the point color, shape, and size, usually by opening a dialog box. Graph Builder’s drag-and-drop interface makes it easy to create colorful graphs with points of all shapes and sizes. The example below using Diamonds Data from the sample data library in JMP sets the following point attributes:

- Size based on the Table column data
- Color based on the Depth column data
- Shape based on the Clarity column data

In addition to these attributes, Price versus Cut and grouping by Carat Weight was employed to understand what influences diamond prices the most. Of course, JMP provides capabilities that specifically target this question, but that’s a topic for another blog post.

This combination of attributes made supporting Graph Builder point plots in Interactive HTML challenging because there are now more ways to determine the size, shape and color of each point. The challenge was increased additionally by the fact that each point in Graph Builder can represent a statistical summary of multiple rows of data.

In the following Interactive HTML example, each point represents diamonds of a given Cut and Clarity. Although the legend is rearranged, the shape and color are still determined by Clarity and Depth respectively. To accentuate the difference between the diamonds' Table dimensions, a column transform named Relative Table was used in the Size role rather than the raw Table column data.

**Box Plots **

The summarized points above may provide too little information and the raw points may be too busy, so how about a compromise using box plots? In the graph below, we see the distribution of prices for each Cut, Clarity, and Carat Weight combination. The legend was moved to the bottom and drawn horizontally to match the arrangement of box plots in each group.

**Heat Maps**

So far, it might be difficult to see what influences diamond prices the most. We’ve only covered three of the four C’s in diamond quality. So, here’s a heat map including all four. Maybe now it’s easier to make some conclusions.

Adding support for heat maps in Graph Builder gave us a bonus outside of Graph Builder: The Uplift graph in the Uplift Platform is now interactive and can display X Factors and X/Y ranges.

**Map Shapes**

Map shapes can be used in Graph Builder for location-based data, like population data. Grouping can help the viewer focus on one region at a time. With the ‘Show Missing Shapes’ option enabled, the region of interest can be seen in context of the whole country.

Map Shapes can be scaled according to a size variable (Population) while being colored by another variable (Vegetable Consumption).

**Combinations**

To see that some of the interactive power of JMP is available in Interactive HTML, it helps to interact with combined graphs. In JMP, this can be accomplished with Combined Windows, Application Builder, or Dashboard Builder. Below are some combination examples using the graph types described above.

This example explores Crime data with a Heat Map and geographical Map Shapes.

The following example uses Points, Box Plots, a Heat Map, and a custom Map Shape to explore office temperatures.

One new feature for Points and Map Shapes in Interactive HTML is the ability to display images in tooltips.

Note that these are all just animations. You can interact with the Interactive HTML files shown in this blog at a page we created for this purpose. Let us know what you think!


The post Is that the best (distribution) you've got? appeared first on JMP Blog.

Fishing for the best distribution can lead you into a trap. Just because one option appears to be best – that doesn’t mean that it’s correct! For example, consider this data set:

What is the best distribution we can use to describe this data? JMP can help us answer this question. From the Distribution platform, we can choose to fit a number of common distributions to the data: Normal, Weibull, Gamma, Exponential, and others. To fit all possible continuous distributions to this data in JMP, go to the red triangle hotspot for this variable in the Distribution report, and choose “Continuous Fit > All”. Here is the result:

JMP has compared 11 potential distributions for this data, and ranked them from best (Gamma) to worst (Exponential). The metric used to perform the ranking is the corrected Akaike Information Criterion (AICc). Lower values of AICc indicate better fit, and so the Gamma distribution is the winner here.

This data set was generated by drawing a random sample of size 50 from a population that is normally distributed with a mean of 50 and a standard deviation of 10. The Normal distribution is the correct answer by definition, but our fishing expedition gave us a misleading result.

How often is there a mismatch like this? One way we can approach this question is through simulation. I wrote a small JMP script to draw samples of various sizes from a normally distributed population. I investigated sample sizes of 5, 10, 20, 30, 50, 75, 100, 250, and 500 observations; for each of these, I drew 1,000 independent samples and had JMP compute the fit for all possible continuous distributions. Last, for each sample I recorded the name of the best-fitting distribution, as measured by AICc. (JSL script available in the JMP File Exchange).
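For readers who want to try a scaled-down version of this experiment, here is a sketch using Python and SciPy rather than JSL (it compares only three candidate distributions, and the parameter counts used in the AICc penalty are assumptions about how the counting is done):

```python
import numpy as np
from scipy import stats

def aicc(logL, k, n):
    """Corrected Akaike Information Criterion; smaller is better."""
    return -2 * logL + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def best_fit(sample):
    """Fit each candidate by maximum likelihood and return the AICc winner."""
    n = len(sample)
    candidates = {
        "Normal":  (stats.norm, 2),          # mean, sd
        "Gamma":   (stats.gamma, 3),         # shape, location, scale
        "Weibull": (stats.weibull_min, 3),   # shape, location, scale
    }
    scores = {}
    for name, (dist, k) in candidates.items():
        params = dist.fit(sample)
        scores[name] = aicc(dist.logpdf(sample, *params).sum(), k, n)
    return min(scores, key=scores.get)

rng = np.random.default_rng(0)
n, reps = 20, 100
wins = sum(best_fit(rng.normal(50, 10, n)) == "Normal" for _ in range(reps))
print(f"Normal chosen in {wins} of {reps} samples of size {n}")
```

Every sample here really is normal, yet in the author's fuller simulation the Normal distribution was not even the most commonly chosen winner at this sample size.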

The results were quite surprising!

- Remember, the correct answer in each case is “Normal”. If our fishing expedition was yielding good results across the board, the line for the Normal distribution should be high and flat, hovering near 100%.
- Instead, the wrong distribution was chosen with disturbing frequency. For sample sizes under 50, the Normal distribution was not even the most commonly chosen. That honor belongs to the Weibull distribution.
- For a sample size of 5 observations from a Normal distribution, the correct identification was not made a single time out of 1,000 samples.
- If you want to have at least a 50% chance of correctly identifying normally distributed data by this method, you’ll need more than 100 observations!
- Even at a sample size of 500 observations, the likelihood of the normal distribution being correctly called the best is only about 80%.

When comparing the fit of different distributions to a data set, don’t assume that the distribution with the smallest AICc is the correct one. *Relative* magnitudes of the AICc statistics are what counts. A rule of thumb (used elsewhere in JMP) is that models whose values of AICc are within 10 units of the “best” one are roughly equivalent.* In our first example above, the Gamma distribution is nominally the best, but its AICc is only 0.2 units lower than that of the Normal distribution. There is not good statistical evidence to choose the Gamma over the Normal.
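That rule of thumb is easy to put into code (the AICc values below are hypothetical, echoing the 0.2-unit gap described above):

```python
def plausible(scores, window=10):
    """Models whose AICc is within `window` units of the best are kept."""
    best = min(scores.values())
    return {name for name, s in scores.items() if s - best <= window}

# Hypothetical AICc values mirroring the Gamma-vs-Normal example above
scores = {"Gamma": 396.2, "Normal": 396.4, "Exponential": 460.0}
print(sorted(plausible(scores)))   # ['Gamma', 'Normal']
```

Both the Gamma and the Normal survive the screen; only then should subject-matter knowledge pick between them.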

More generally, as a best practice it is wise to consider only distributions that make sense in the context of the problem. Your own knowledge and expertise are usually the best guides. Don’t choose an exotic distribution that has a slightly better fit over one that makes sense and has a proven track record in your field of work.

*This rule is used to compare models built in the Generalized Regression personality of the Fit Model platform in JMP Pro. See Burnham, K.P. and Anderson, D.R. (2002), *Model Selection And Multimodel Inference: A Practical Information Theoretic Approach*. Springer, New York.

tags: Distribution, Statistics, Tips and Tricks


The post Entertaining thing-explaining with Randall Munroe appeared first on JMP Blog.

Because Randall’s talk was followed by the book signing for his newest book, many didn’t submit feedback on his talk in the JMP Discovery Summit app (probably so they could quickly get in line for the book). I took the comments submitted and used the new Text Explorer platform in JMP 13 to show you the very positive terms from the comments and how the powerful regex handles all those enthusiastic exclamation points!!!!

Below left, you see the most popular terms and phrases listed. And at right, you can see the regular expression editor with default tokenizing options highlighted in different colors under the Word Separator List. These settings can be further customized, but for this simple example, we see that Randall Munroe (using simple words) evoked very positive and enthusiastic comments. For more on text exploration, check out these previous posts.

During the book signing, Randall asked what JMP users do at their organizations. Upon hearing a few answers about some of the “complicated stuff” JMP users do, he actually flipped to a place in his book where he “explained” what they did!

At the close of his talk, I tweeted, “Awesome keynote by Randall Munroe @ #jmpDiscoverySummit. His curiosity is contagious.” We hope you will tune in Oct. 12 for some entertaining thing-explaining! Or, you can watch the archive along with other episodes of Analytically Speaking.

tags: Analytically Speaking, Discovery Summit, Discovery Summit Keynote, Text Analysis, Text Explorer
