=Batchmaturity=
__TOC__
===Purpose===

Batch process modeling and monitoring, identifying outliers.

===Synopsis===

: model = batchmaturity(x,ncomp_pca,options);
: model = batchmaturity(x,y,ncomp_pca,options);
: model = batchmaturity(x,y,ncomp_pca,ncomp_reg,options);
: pred  = batchmaturity(x,model,options);
: pred  = batchmaturity(x,model);
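For example, a typical calibrate-and-apply sequence might look like the following sketch (the variables <tt>x</tt>, <tt>y</tt>, and <tt>xnew</tt> are illustrative names, and it is assumed here that the options input may be omitted to use defaults):

<pre>
% Calibration: 2-component PCA on x, with PLS predicting batch maturity y
model = batchmaturity(x, y, 2);

% Application: score new batch data against the model's limits
pred = batchmaturity(xnew, model);
ok   = all(pred.inlimits, 2);   % true where a sample is within limits on every PC
</pre>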


===Description===

Analyzes multivariate batch process data to quantify the acceptable variability of the process variables during normal processing conditions as a function of the percent of batch completion. The resulting model can be used on new batch process data to identify measurements which indicate abnormal processing behavior (see the <tt>pred.inlimits</tt> field for this indicator).

The progression through a batch is described in terms of the "Batch Maturity", BM, which is often defined as the percentage of completion of the process. In the BM analysis method a PCA model is built on the calibration batch dataset and the samples' PCA scores are binned according to the samples' BM values. The resulting model contains confidence limits on PCA scores as a function of batch maturity, which reflect the normal range of variability of the data at any stage of progression through the batch. If one of the measured variables represents BM, or if BM is a known function of the measured variables, then a sample from a new batch can be tested against the model's score limits at its known BM value to see whether it is a normal sample. However, BM is often not measurable or known as a function of the measured variables in new batches. Instead, this relationship is estimated by building a PLS model using the calibration dataset, where BM is provided. The PLS model is then used to predict the BM value for any sample.

For assistance in preparing batch data for use in Batchmaturity please see [[bspcgui]].


====Methodology====

Given multivariate X data and a Y variable which represents the corresponding state of batch maturity (BM), build a model as follows:
# Build a PLS model on X and Y using the specified preprocessing. Use its self-prediction of Y, ypred, as the indicator of BM.
# Simplify the X data by performing PCA (with the specified preprocessing). We now have PC scores and a measure of BM (ypred) for each sample.
# Sort the samples in order of increasing BM. Calculate a running mean of each PC's ordered scores ("smoothed score means"). Calculate deviations of scores from the smoothed means for each PC.
# Form a set of equally-spaced BM values over the range (BMstart, BMend). For each BM point, find the ''n'' samples which have BM closest to that value.
# For each BM point, calculate low and high score limit values corresponding to the (1-cl)/2 and 1-(1-cl)/2 percentiles of the ''n'' sample score deviations just selected (with the default cl = 0.90, the 5th and 95th percentiles); repeat for each PC. Add the smoothed scores to these limits to get the actual limits for each PC at each BM point. These BM points and corresponding low/high score limits constitute a lookup table for score limits for each PC in terms of BM value.
# The score limits lookup table thus contains upper and lower score limits for each PC, for every equally-spaced BM point over the BM range.
# The batch maturity model contains the PLS and PCA sub-models and the score limits lookup table. It is applied to a new batch processing dataset, X1, by applying the PLS sub-model to get BM (ypred), then applying the PCA sub-model to get scores. The upper and lower score limits (for each PC) for each sample are obtained by using the sample's BM value to query the score limits lookup table. A sample is considered an inlier if its score values are within the score limits for each PC.
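The following rough MATLAB sketch illustrates steps 4 and 5 above (this is not toolbox code: <tt>bm</tt>, <tt>scoredev</tt>, and <tt>smoothmean</tt> are hypothetical stand-ins for the intermediate quantities described in steps 1-3, and <tt>prctile</tt> requires the Statistics Toolbox):

<pre>
% Sketch of Methodology steps 4-5. Hypothetical inputs:
%   bm         (nsample x 1)        PLS-predicted BM values
%   scoredev   (nsample x nPC)      score deviations from the smoothed means
%   smoothmean (bmlookuppts x nPC)  smoothed score means on the BM grid
bmgrid = linspace(min(bm), max(bm), bmlookuppts);  % step 4: equally-spaced BM points
alpha  = 1 - cl;                                   % expected fraction of outliers
low    = zeros(nPC, bmlookuppts);
high   = zeros(nPC, bmlookuppts);
for i = 1:bmlookuppts
    [~, order] = sort(abs(bm - bmgrid(i)));        % n samples nearest this BM point
    nearest = order(1:nearestpts);
    for j = 1:nPC
        d = scoredev(nearest, j);                  % step 5: percentile limits,
        low(j,i)  = smoothmean(i,j) + prctile(d, 100*alpha/2);       % plus the
        high(j,i) = smoothmean(i,j) + prctile(d, 100*(1 - alpha/2)); % smoothed mean
    end
end
</pre>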


Fig. 1 shows an example of the Batch Maturity Scores Plot (obtained from the BATCHMATURITY Analysis window's Scores Plot). This shows the second PC's upper and lower score limits as a function of BM, as calculated from the "Dupont_BSPC" demo dataset using cl = 0.9 and step 2 only from batches 1 to 36. These batches had normal processing conditions, so the shaded zone enclosed by the limit lines indicates the range where a measured sample's PC 2 scores should occur if processing is evolving "normally". Similar plots result for the other modeled PCs. The data points shown are the PCA model scores, which are accessible from the batchmaturity model or pred's <tt>t</tt> field.

<gallery caption="Fig. 1. Batchmaturity Scores Plot." widths="400px" heights="300px" perrow="1">
File:BMScoreScoresPlot.png|Plot showing Scores for PC 2 as a function of batch maturity (Ypred from the PLS model).
</gallery>
====Inputs====

* '''x''' = X-block (2-way array class "double" or "dataset").
* '''y''' = Y-block (vector class "double" or "dataset").
* '''ncomp_pca''' = Number of components to be calculated in the PCA model (positive integer scalar).
* '''ncomp_reg''' = Number of latent variables for the regression method.


====Outputs====

* '''model''' = standard model structure containing the PCA and regression models (See MODELSTRUCT).
* '''pred''' = prediction structure containing the scores from the PCA model for the input test data as pred.t.

Model and pred contain the following fields, which relate to score limits and whether samples are within normal ranges or not:
:'''limits''' : struct with fields:
::'''cl''': value used for the cl option
::'''bm''': (1 x bmlookuppts) bm values for score limits
::'''low''': (nPC x bmlookuppts) lower score limit of inliers
::'''high''': (nPC x bmlookuppts) upper score limit of inliers
::'''median''': (nPC x bmlookuppts) median trace of scores
:'''inlimits''' : (nsample x nPC) logical indicating if samples are inliers.
:'''t''' : (nsample x nPC) scores from the PCA submodel.
:'''t_reduced''' : (nsample x nPC) scores scaled by the limits, mapping to +/- 1 at the upper/lower limits and to 0 at the median score.
:'''submodelreg''' : regression model built to predict bm. Only PLS is currently supported.
:'''submodelpca''' : PCA model used to calculate X-block scores.
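As an illustration, a limits plot similar to Fig. 1 above can be drawn directly from these fields (a sketch; <tt>model</tt> is assumed to be a calibrated batchmaturity model, and PC 2 is chosen arbitrarily):

<pre>
pc = 2;                                   % which PC's limits to display
plot(model.limits.bm, model.limits.low(pc,:),    'r--', ...
     model.limits.bm, model.limits.high(pc,:),   'r--', ...
     model.limits.bm, model.limits.median(pc,:), 'k-');
xlabel('Batch Maturity (ypred)');
ylabel(sprintf('Scores on PC %d', pc));
</pre>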


===Options===

options = a structure array with the following fields:

* '''regression_method''' : [ {'pls'} ] A string indicating the type of regression method to use. Currently, only 'pls' is supported.
* '''preprocessing''' : { [] } preprocessing structure applied to both the PCA and PLS models. PLS Y-block preprocessing will always be autoscale.
* '''zerooffsety''' : [ 0 | {1} ] transform y by resetting it to start at zero for each batch.
* '''stretchy''' : [ 0 | {1} ] transform y to have range = 100 for each batch.
* '''cl''' : [ 0.90 ] Confidence limit (2-sided) for the moving limits (defined as 1 - expected fraction of outliers).
* '''nearestpts''' : [ {25} ] number of nearby scores used in calculating the limits.
* '''smoothing''' : [ {0.05} ] smoothing of the limit lines; the width of the window used in Savgol smoothing, as a fraction of the BM range.
* '''bmlookuppts''' : [ {1001} ] number of equally-spaced points in the BM lookup table mentioned in Methodology step 4 above. The default gives lookup values spaced every 0.1% over the BM range.
* '''plots''' : [ 'none' | 'detailed' | {'final'} ] governs production of plots when the model is built. 'final' shows standard scores and loadings plots; 'detailed' gives individual scores plots with limits for all PCs.
* '''waitbar''' : [ 'off' | {'auto'} ] governs display of a waitbar when calculating confidence limits ('auto' shows the waitbar only when the calculation will take longer than 15 seconds).
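For example, options might be adjusted before calibration as in this sketch (assuming the usual PLS_Toolbox pattern where calling a function with the string 'options' returns its default options structure):

<pre>
opts = batchmaturity('options');   % default options structure (assumed pattern)
opts.cl = 0.95;                    % expect 5% outliers instead of 10%
opts.nearestpts = 50;              % use more nearby samples for each limit
opts.plots = 'detailed';           % per-PC scores plots with limits
model = batchmaturity(x, y, 2, opts);
</pre>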


===See Also===

[[batchfold]], [[batchdigester]], [[bspcgui]]

=Model Optimizer=

=Introduction=

Optimizing modeling conditions and settings requires building and comparing multiple models. The Model Optimizer allows you to both automate the building of models and visualize a comparison of the results. See the [[modeloptimizer]] and [[comparemodels]] functions for additional command-line information.

The interface is based around the idea of a "Snapshot" of the Analysis model-building interface. Each snapshot contains all the settings for building a model. Once a set of snapshots has been created, the Model Optimizer can assemble all combinations of the settings into a collection of models to calculate. This collection can then be run at a convenient time, or saved and run on new data.

After all models have been calculated, the results are displayed in a comparison table. By default the table shows only columns that differ between the models, but it can be customized. Each column can be sorted to investigate results.

'''Unsupported Model Types'''

The following model types are not supported by the Model Optimizer:
* CLUSTER
* PURITY
* Batch Maturity

=Model Optimizer Interface=

==Getting Started==

Snapshots can be taken from the Analysis toolbar or the Model Optimizer toolbar by clicking the snapshot button (camera icon). Each time a snapshot is taken, its settings are added to the snapshot list as well as the model list, and the combinations are updated. A Model Optimizer window will open if one is not already open.

Once you've added the desired snapshots, you can either calculate the list of models immediately, or click the '''Add Combinations''' button to add all unique combinations of the model settings to the models list and then calculate.

After the models are calculated, the model comparison table is updated, showing the columns that differ between models. Sort the columns to find models with the desired value. For additional information on a model, expand its leaf in the tree.

==Taking Snapshots==

A snapshot can be taken before or after a model is calculated (by clicking the Snapshot button).

* If the snapshot is taken before calculation, it is added to both the Snapshot List and the Model List, but no results are available until it is calculated. Any additional combinations resulting from the snapshot are not added to the Model List; add those by clicking the '''Add Combinations''' button.
* If the snapshot is taken after calculation, the model is added as above, but its results are displayed immediately in the Comparison Table.

If you have large data and/or time-consuming model settings, it is often useful to take a snapshot '''without''' calculating the model. Then adjust the settings and continue assembling snapshots '''without''' calculating. Once you have all the models assembled, you can calculate them (all at once) from the Model Optimizer.

Use Combinations to generate all unique combinations of model settings. For example, if you change the cross-validation and preprocessing, take a snapshot, then change the cross-validation and preprocessing again and take another snapshot, there will be 4 possible combinations (2 cross-validation settings x 2 preprocessing settings). Clicking the '''Add Combinations''' button will add all 4.

==Model Optimizer Window==

The main Optimizer Window has 4 sections:

# Model comparison table.
# Model list.
# Snapshot list.
# Combinations list.
 
[[Image:ModelOptimizerMain.png|700px|GUI Window]]
 
==Managing Snapshots and Models==
 
Clicking the snapshot button adds the snapshot to the snapshot list, updates the combinations list, and adds the snapshot to the model list. A snapshot or model can be removed by clicking the '''Remove''' node in the list tree or by using the right-click menu.
 
The list right-click menu allows you to:
* Collapse all leaves in the tree.
* Run all models.
* Add a single snapshot to the model list.
* Add all snapshots to model list.
* Clear a single snapshot or model from the list.
* Clear all snapshots or models from the list.
* Clear everything.
 
Expanding a node "drills down" into the snapshot/model settings and provides more information about that item.
==Survey Preprocessing==
 
Preprocessing in a snapshot can be "surveyed" over a range of parameter values via the [[preprocessiterator | preprocess iteration]] function.
 
# Create a snapshot with the preprocessing you wish to survey.
# Right-click on the snapshot and select "Survey Preprocessing" from the menu.
# For each of the preprocessing methods available for surveying (see [[preprocessiterator]] for a list), select the min, max and interval to use. '''Note:''' a full factorial of the combinations will be created, so the total number of models will be the product of the "count" column (e.g., counts of 3, 2, and 2 yield 3 × 2 × 2 = 12 models).
# Click '''OK''' and the new snapshots should appear. You can expand the snapshots to confirm preprocess settings.
# Click '''Add Combinations''' then '''Calculate Models'''.
<br>
 
Survey Menu:
 
[[Image:SurveyPreprocessingMenu.jpg|300px]]
 
Survey Figure:
 
[[Image:PreprocessIteratorFig.jpg|500px]]
 
==Adding Models from Cache or Workspace==
 
Models can be added to the Optimizer from the cache or workspace by dragging and dropping them onto the window. If a model is added from the workspace the associated data is not added.
 
A model can be added from the cache using the right-click menu "Compare" item.
 
==Working with the Comparison Table==
Once models have been added and calculated the Comparison Table will be updated. By default, only the columns with differing values will be displayed (columns with the same value for all models will be hidden). Investigate the results by sorting columns.
 
The display of columns and table behavior can be modified via the Edit menu's Include/Exclude Columns items and the Options editor. Excluded columns do not appear in the include list, so remove a column from the exclude list before attempting to include it.
 
Right-click on a column header to display a context menu for sorting. Clicking on a particular row will expand the model's corresponding node in the model tree. Double-clicking the row number will open the model in Analysis.
 
[[Image:OptimizerTable.jpg|500px|Optimizer Table]]
 
Click the plot button to generate a plot of the table data.
 
[[Image:modeloptimizer_plot.jpg|400px|Optimizer Plot]]
 
==Saving an Optimizer Model==
Optimizer models can be saved from the '''File''' menu. These models are also "cached" in the model cache each time the '''Calculate''' button is pushed.
 
=Applying New Data=
 
New data can be applied to an Optimizer model by dragging and dropping new data onto a model in the [[Workspace_Browser | Workspace Browser]].
 
* Locate the Model Optimizer model in the model cache and double-click it to load it into the workspace.
* Load the data you want to apply.
* Drag and drop new data onto your model.
* Once the model is calculated you'll be prompted to save the model to the workspace.
* Double-click the new model to open it in the Model Optimizer window.
 
[[Image:ApplyModelOptimizer.jpg|600px|Apply Optimizer]]
 
 
 
 
 
 
=Applying Validation Data=
 
[[Image:modeloptimizer1.jpg|600px|Validate Optimizer]]
