Batchmaturity and Faq import three-way data: Difference between pages

From Eigenvector Research Documentation Wiki
(Difference between pages)
imported>Donal
imported>Bob
===Purpose===
===Issue:===


Batch process model and monitoring, identifying outliers.
How do I import three-way data into Solo or PLS_Toolbox?


===Synopsis===
===Possible Solutions:===
: model = batchmaturity(x,ncomp_pca,options);
: model = batchmaturity(x,y,ncomp_pca,options);
: model = batchmaturity(x,y,ncomp_pca,ncomp_reg,options);
: pred  = batchmaturity(x,model,options);
: pred  = batchmaturity(x,model);


===Description===
'''Solution 1) Built-in EEM importers:'''
Analyzes multivariate batch process data to quantify the acceptable
variability of the process variables during normal processing conditions as a function of the percent of batch completion.
The resulting model can be used on new batch process data to identify
measurements which indicate abnormal processing behavior (See the
pred.inlimits field for this indicator.)


The progression through a batch is described in terms of the "Batch Maturity" which is often defined in terms of percentage of completion. The resulting model contains confidence limits on PCA scores which are a function of batch maturity and reflect the normal range of variability for the given cross-section of progression through the batch.
If applicable to your file type, use one of the built-in EEM importers. There are importers for EEM data from Hitachi, Shimadzu, Horiba and Jasco. For more information, see the wiki entry on [[Data_Importing_Formats | Data Importing Formats]].


====Methodology:====
EEM data needs to be configured in a specific way such that:
Given multivariate X data and a Y variable that represents the
corresponding state of batch maturity (BM), the model is built as follows:
# Build a PLS model on X and Y using specified preprocessing. Use its self-prediction of Y, ypred, as the indicator of BM.
# Simplify the X data by performing PCA analysis (with specified preprocessing). We now have PC scores and a measure of BM (ypred) for each sample.
# Sort the samples to be in order of increasing BM. Calculate a running-mean of each PC's ordered scores ("smoothed score means"). Calculate deviations of scores from the smoothed means for each PC.
# Form a set of equally-spaced BM values over the range (BMstart, BMend). For each BM point, find the ''n'' samples which have BM closest to that value.
# For each BM point, calculate low and high score limit values corresponding to the (1-cl)/2 and 1-(1-cl)/2 percentiles of the ''n'' sample score deviations just selected (repeat for each PC), where cl is the two-sided confidence limit defined under Options (for cl = 0.9 these are the 5th and 95th percentiles). Add the smoothed scores to these limits to get the actual limits for each PC at each BM point. These BM points and corresponding low/high score limits constitute a lookup table for score limits for each PC in terms of BM value.
# The score limits lookup table contains upper and lower score limits for each PC, for every equally-spaced BM point over the BM range.
# The batch maturity model contains the PLS and PCA sub-models and the score limits lookup table. It is applied to a new batch processing dataset, X1, by applying the PLS sub-model to get BM (ypred), then applying the PCA sub-model to get scores. The upper and lower score limits (for each PC) for each sample are obtained by using the sample's BM value and querying the score limits lookup table. A sample is considered to be an inlier if its score values are within the score limits for each PC.
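
Steps 3 to 5 can be sketched in base MATLAB for a single PC. This is a rough illustration and not the BATCHMATURITY implementation: the data are synthetic, the running-mean window <tt>win</tt> and the simple rank-based percentile are assumptions, <tt>cl</tt> is taken as 1 minus the expected fraction of outliers (so the limits sit at the (1-cl)/2 and (1+cl)/2 quantiles), and the Savgol smoothing of the limit lines is omitted.

```matlab
% Synthetic stand-ins (assumptions): bm plays the role of ypred from the
% PLS sub-model, t the scores of one PC from the PCA sub-model.
rng(0);
nSamp = 500;
bm = 100*rand(nSamp,1);
t  = 0.05*bm + randn(nSamp,1);

cl   = 0.90;   % two-sided confidence limit (cf. the cl option)
n    = 25;     % nearby samples per BM point (cf. nearestpts)
nPts = 101;    % lookup points (cf. bmlookuppts; fewer here for brevity)
win  = 51;     % running-mean window in samples (hypothetical choice)

% Step 3: sort by BM, running mean of the ordered scores, deviations.
[bmSorted, order] = sort(bm);
tSorted = t(order);
tSmooth = movmean(tSorted, win);
dev     = tSorted - tSmooth;

% Step 4: equally spaced BM values over the observed BM range.
bmGrid = linspace(bmSorted(1), bmSorted(end), nPts);

low  = zeros(1,nPts);
high = zeros(1,nPts);
iLow  = max(1, round(n*(1-cl)/2));   % rank of the lower percentile
iHigh = min(n, round(n*(1+cl)/2));   % rank of the upper percentile
for k = 1:nPts
    % The n samples whose BM is closest to this grid value.
    [~, idx] = sort(abs(bmSorted - bmGrid(k)));
    d = sort(dev(idx(1:n)));
    % Step 5: percentile of the deviations plus the smoothed score
    % (here taken from the single nearest sample).
    centre  = tSmooth(idx(1));
    low(k)  = centre + d(iLow);
    high(k) = centre + d(iHigh);
end
```

The vectors <tt>bmGrid</tt>, <tt>low</tt> and <tt>high</tt> then play the role of one row of the score limits lookup table.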


Fig. 1 shows an example of the Batch Maturity Scores Plot (obtained from the BATCHMATURITY Analysis window's Scores Plot). It shows the upper and lower limits of the second PC's scores as a function of BM, as calculated from the "Dupont_BSPC" demo dataset using cl = 0.9 and step 2 only from batches 1 to 36. These batches had normal processing conditions, so the shaded zone enclosed by the limit lines indicates the range where a measured sample's PC 2 scores should fall if processing is evolving "normally". Similar plots result for the other modeled PCs. The data points shown are the PCA model scores, which are accessible from the <tt>t</tt> field of the batchmaturity model or pred structure.
* '''mode 1''' corresponds to '''samples'''


<gallery caption="Fig. 1. Batchmaturity Scores Plot." widths="400px" heights="300px" perrow="1">
* <div>'''mode 2''' corresponds to '''emission'''</div>
File:BMScoreScoresPlot.png|Plot showing Scores for PC 2 as a function of batch maturity (Ypred from the PLS model).
</gallery>


====Inputs====
* <div>'''mode 3''' corresponds to '''excitation'''</div>
* '''x''' = X-block (2-way array class "double" or "dataset").
* '''y''' = Y-block (vector class "double" or "dataset").
* '''ncomp_pca''' = Number of components to be calculated in the PCA model (positive integer scalar).
* '''ncomp_reg''' = Number of latent variables for regression method.


====Outputs====
The built-in EEM importers will handle this configuration automatically. When importing manually (see below), further manipulation will likely be necessary. Use the Transform &rarr; Permute modes and Transform &rarr; Reshape menu items to modify your imported data as appropriate.
* '''model''' = standard model structure containing the PCA and Regression model (See MODELSTRUCT).
* '''pred''' = prediction structure containing the scores from the PCA model for the input test data as pred.t.


Model and pred contain the following fields which relate to score limits and
'''Solution 2) For three-way data with few slabs:'''  
whether samples are within normal ranges or not:
:'''limits''' : struct with fields:
::'''cl''': value used for cl option
::'''bm''': (1 x bmlookuppts) bm values for score limits
::'''low''': (nPC x bmlookuppts) lower score limit of inliers
::'''high''': (nPC x bmlookuppts) upper score limit of inliers
::'''median''': (nPC x bmlookuppts) median trace of scores
:'''inlimits''' : (nsample x nPC) logical indicating if samples are inliers.
:'''t''' : (nsample x nPC) scores from the PCA submodel.
:'''t_reduced''' : (nsample x nPC) scores scaled by limits, with limits -> +/- 1 at upper/lower limit, -> 0 at the median score.
:'''submodelreg''' : regression model built to predict bm. Only PLS currently.
:'''submodelpca''' : PCA model used to calculate X-block scores.
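
The relationship between the <tt>limits</tt>, <tt>t</tt> and <tt>inlimits</tt> fields can be illustrated with a small base-MATLAB sketch. The numbers are made up, and the nearest-point lookup via <tt>interp1</tt> is an assumption about how the table might be queried, not a statement of what BATCHMATURITY does internally.

```matlab
% Hypothetical lookup table for 2 PCs at 5 BM points (nPC x points).
limits.bm   = [0 25 50 75 100];
limits.low  = [-1 -1 -2 -2 -1; -3 -3 -3 -3 -3];
limits.high = [ 1  1  2  2  1;  3  3  3  3  3];

% Hypothetical new samples: BM values and PCA scores (nsample x nPC).
bmNew = [10; 50; 90];
tNew  = [0.5 0; 2.5 1; 0 -4];

% Look up the limits at the nearest BM point for every sample.
% interp1 interpolates each column, so the tables are transposed.
lowNew  = interp1(limits.bm(:), limits.low',  bmNew, 'nearest');
highNew = interp1(limits.bm(:), limits.high', bmNew, 'nearest');

% A score is within limits if it lies between the low and high values.
inlimits = tNew >= lowNew & tNew <= highNew;   % nsample x nPC logical

% A sample is an inlier only if every PC is within its limits.
isInlier = all(inlimits, 2);
```

Here only the first sample is an inlier: the second exceeds the upper limit on PC 1 and the third falls below the lower limit on PC 2.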


===Options===
<ol style="list-style-type:lower-alpha">
  <li>Import the data slabs into the workspace (browser). The workspace browser is available from the main analysis user interface from the menu item FigBrowser.</li>
  <li>Each slab, i.e. each matrix of data, is imported individually. Hence, if you have a '''10x8x3''' array, you will import three slabs each of size '''10x8'''.</li>
  <li>Use the mouse to drag slab two onto slab one. In the window that opens choose Augment and then choose augment in the Slabs direction.</li>
  <li>A two-slab three-way array has now replaced the first data matrix. More slabs can be added in the same fashion.</li>
</ol>


options =  a structure array with the following fields:
Alternatively, you may also open one slab in the dataset editor and then add additional slabs using File &rarr; Import. After selecting the next slab to import, answer the same questions as in step c above. Repeat for each slab.


* '''regression_method''' : [ {'pls'} ] A string indicating type of regression method to use. Currently, only 'pls' is supported.
'''Solution 3) For larger three-way data:'''  
* '''preprocessing''' : { [] } preprocessing structure applied to the X-block in both the PCA and PLS sub-models. PLS Y-block preprocessing will always be autoscale.
* '''zerooffsety''' : [ 0 | {1}] transform y resetting to zero per batch
* '''stretchy''' : [ 0 | {1}] transform y to have range=100 per batch
* '''cl''' : [ 0.90 ] Confidence limit (2-sided) for moving limits (defined as 1 - Expected fraction of outliers.)
* '''nearestpts''' : [{25}] number of nearby scores used in calculating limits
* '''smoothing''' : [{0.05}] smoothing of limit lines. Width of window used in Savgol smoothing as a fraction of BM range.
* '''bmlookuppts''' : [{1001}] number of equally-spaced points in BM lookup table mentioned in Methodology Step 4 above. Default gives lookup values spaced every 0.1% over the BM range.
* '''plots''' : [ 'none' | 'detailed' | {'final'} ] governs production of plots when model is built. 'final' shows standard scores and loadings plots. 'detailed' gives individual scores plots with limits for all PCs.
* '''waitbar''' : [ 'off' | {'auto'} ] governs display of waitbar when calculating confidence limits ('auto' shows waitbar only when the calculation will take longer than 15 seconds)


===See Also===
In the DataSet editor, you can import a full three-way array if you have it organized as a two-way matrix. Upon importing the two-way data, you can reshape to a three-way array using the menu item: Transform &rarr; Fold into 3-way.


[[batchfold]], [[batchdigester]]
For example, suppose you have the above matrices (three slabs) in one table/matrix:
 
  [ Slab1;
  Slab2;
  Slab3 ]
 
That is, the three slabs are stacked below each other. Upon importing, use the menu option described above ("Fold into 3-way"), choose three as the number of slabs, and the data will be rearranged accordingly. If you are familiar with the MATLAB function <code>reshape</code>, you may also use Transform &rarr; Reshape for other types of rearrangements.
 
Note: the result of this command will give you slabs in the 3rd mode of the DataSet. If these slabs are separate samples (such as with EEMs), you'll want to use the Transform &rarr; Permute menu to reorder the dimensions. For example, permuting to the order [3 2 1] would swap the order of the 1st and 3rd modes, putting slabs as the first mode.
 
 
'''Still having problems? Please contact our helpdesk at [mailto:helpdesk@eigenvector.com helpdesk@eigenvector.com]'''
 
[[Category:FAQ]]

Revision as of 10:45, 20 June 2019

Issue:

How do I import three-way data into Solo or PLS_Toolbox?

Possible Solutions:

Solution 1) Built-in EEM importers:

If applicable to your file type, use one of the built-in EEM importers. There are importers for EEM data from Hitachi, Shimadzu, Horiba and Jasco. For more information, see the wiki entry on Data Importing Formats.

EEM data needs to be configured in a specific way such that:

  • mode 1 corresponds to samples
  • mode 2 corresponds to emission
  • mode 3 corresponds to excitation

The built-in EEM importers will handle this configuration automatically. When importing manually (see below), further manipulation will likely be necessary. Use the Transform → Permute modes and Transform → Reshape menu items to modify your imported data as appropriate.

Solution 2) For three-way data with few slabs:

  1. Import the data slabs into the workspace (browser). The workspace browser is available from the main analysis user interface from the menu item FigBrowser.
  2. Each slab, i.e. each matrix of data, is imported individually. Hence, if you have a 10x8x3 array, you will import three slabs each of size 10x8.
  3. Use the mouse to drag slab two onto slab one. In the window that opens choose Augment and then choose augment in the Slabs direction.
  4. A two-slab three-way array has now replaced the first data matrix. More slabs can be added in the same fashion.

Alternatively, you may also open one slab in the dataset editor and then add additional slabs using File → Import. After selecting the next slab to import, answer the same questions as in step 3 above. Repeat for each slab.
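
For readers working at the MATLAB command line, augmenting in the slabs direction corresponds roughly to concatenating along the third dimension. The sketch below uses plain arrays and the base function cat; the slab contents are placeholders, and the menu-driven workflow above operates on DataSet objects rather than plain arrays.

```matlab
% Three placeholder slabs, each 10x8, filled with 1s, 2s and 3s so they
% are easy to tell apart.
slab1 = ones(10,8);
slab2 = 2*ones(10,8);
slab3 = 3*ones(10,8);

% Augment slab two onto slab one: a two-slab three-way array.
X = cat(3, slab1, slab2);   % 10 x 8 x 2

% Further slabs are added in the same fashion.
X = cat(3, X, slab3);       % 10 x 8 x 3
```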

Solution 3) For larger three-way data:

In the DataSet editor, you can import a full three-way array if you have it organized as a two-way matrix. Upon importing the two-way data, you can reshape to a three-way array using the menu item: Transform → Fold into 3-way.

For example, suppose you have the above matrices (three slabs) in one table/matrix:

 [ Slab1;
 Slab2;
 Slab3 ]

That is, the three slabs are stacked below each other. Upon importing, use the menu option described above ("Fold into 3-way"), choose three as the number of slabs, and the data will be rearranged accordingly. If you are familiar with the MATLAB function reshape, you may also use Transform → Reshape for other types of rearrangements.
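
The same folding can be sketched with plain arrays in base MATLAB. This is an illustration of the rearrangement, assuming three 10x8 slabs stacked vertically; it is not necessarily how the menu item is implemented.

```matlab
% Three placeholder 10x8 slabs stacked below each other in one table.
slab1 = rand(10,8);
slab2 = rand(10,8);
slab3 = rand(10,8);
stacked = [slab1; slab2; slab3];      % 30 x 8

nSlabs = 3;
nRows  = size(stacked,1) / nSlabs;    % 10 rows per slab
nCols  = size(stacked,2);             % 8 columns

% reshape fills column by column, so transpose first, split the columns
% into slabs, then transpose each slab back with permute.
folded = permute(reshape(stacked', nCols, nRows, nSlabs), [2 1 3]);
% folded is 10 x 8 x 3, and folded(:,:,k) is the k-th slab.
```

A direct reshape(stacked, 10, 8, 3) would not work here, because it would mix rows from different slabs.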

Note: the result of this command will give you slabs in the 3rd mode of the DataSet. If these slabs are separate samples (such as with EEMs), you'll want to use the Transform → Permute menu to reorder the dimensions. For example, permuting to the order [3 2 1] would swap the order of the 1st and 3rd modes, putting slabs as the first mode.
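
The effect of permuting to the order [3 2 1] can be checked on a plain array with the base MATLAB function permute (the array here is a random placeholder):

```matlab
% Placeholder three-way array: rows x columns x slabs.
folded = rand(10,8,3);

% Swap the 1st and 3rd modes so that slabs become the first mode.
bySlab = permute(folded, [3 2 1]);    % 3 x 8 x 10

% Element (i,j,k) of the original is element (k,j,i) of the result.
a = folded(7,5,2);
b = bySlab(2,5,7);
```

Note that this order also moves the original rows to the third mode, so it is worth confirming that modes 2 and 3 end up in the arrangement your analysis expects.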


Still having problems? Please contact our helpdesk at helpdesk@eigenvector.com