Batchmaturity

From Eigenvector Research Documentation Wiki
===Purpose===
__TOC__


Batch process modeling and monitoring, with identification of outliers.


===Synopsis===
: model = batchmaturity(x,ncomp_pca,options);
: model = batchmaturity(x,y,ncomp_pca,options);
: model = batchmaturity(x,y,ncomp_pca,ncomp_reg,options);
: pred  = batchmaturity(x,model,options);
: pred  = batchmaturity(x,model);


===Description===
Analyzes multivariate batch process data to quantify the acceptable
variability of the process variables during normal processing conditions as a function of the percent of batch completion.
The resulting model can be used on new batch process data to identify
measurements which indicate abnormal processing behavior (see the
pred.inlimits field for this indicator).


The progression through a batch is described in terms of the "Batch Maturity", BM, which is often defined as the percentage of completion of the process. In the BM analysis method, a PCA model is built on the calibration batch dataset and the samples' PCA scores are binned according to the samples' BM values. The resulting model contains confidence limits on the PCA scores as a function of batch maturity, which reflect the normal range of variability of the data at any stage of progression through the batch. If one of the measured variables represents BM, or if BM is a known function of the measured variables, then a sample from a new batch can be tested against the model's score limits at its known BM value to see whether it is a normal sample or not. However, BM is often not measurable, or not known as a function of the measured variables, in new batches. Instead, this relationship is estimated by building a PLS model on the calibration dataset, where BM is provided. The PLS model is then used to predict the BM value for any sample.


For assistance in preparing batch data for use in Batchmaturity, please see [[bspcgui]].


====Methodology====
Given multivariate X data and a Y variable representing the
corresponding state of batch maturity (BM), build a model by:
# Build a PLS model on X and Y using specified preprocessing. Use its self-prediction of Y, ypred, as the indicator of BM.
# Simplify the X data by performing PCA analysis (with specified preprocessing). We now have PC scores and a measure of BM (ypred) for each sample.
# Sort the samples to be in order of increasing BM. Calculate a running-mean of each PC's ordered scores ("smoothed score means"). Calculate deviations of scores from the smoothed means for each PC.
# Form a set of equally-spaced BM values over the range (BMstart, BMend). For each BM point, find the ''n'' samples which have BM closest to that value.
# For each BM point, calculate low and high score limit values corresponding to the (1-cl)/2 and (1+cl)/2 percentiles of the ''n'' sample score deviations just selected (repeat for each PC). Add the smoothed scores to these limits to get the actual limits for each PC at each BM point. These BM points and corresponding low/high score limits constitute a lookup table of score limits for each PC in terms of BM value.
# The score limits lookup table contains upper and lower score limits for each PC, for every equally-spaced BM point over the BM range.
# The batch maturity model contains the PLS and PCA sub-models and the score limits lookup table. It is applied to a new batch processing dataset, X1, by applying the PLS sub-model to get BM (ypred), then applying the PCA sub-model to get scores. The upper and lower score limits (for each PC) for each sample are obtained by using the sample's BM value and querying the score limits lookup table. A sample is considered to be an inlier if its score values are within the score limits for each PC.
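The limits-table construction in steps 3–5 can be sketched in NumPy. This is a simplified illustration, not the PLS_Toolbox implementation: the helper name <tt>score_limit_table</tt> is hypothetical, a plain running mean stands in for the Savitzky-Golay smoothing described in the options, and edge handling is naive.

```python
import numpy as np

def score_limit_table(scores, bm, cl=0.90, nearestpts=25, bmlookuppts=1001):
    """Sketch of steps 3-5: build a lookup table of lower/upper score
    limits versus batch maturity (BM).

    scores : (nsample, nPC) PCA scores of the calibration batches
    bm     : (nsample,) BM values (e.g. ypred from the PLS sub-model)
    Returns the BM grid and (nPC x bmlookuppts) low/high limit arrays.
    """
    # Step 3: sort samples into order of increasing BM
    order = np.argsort(bm)
    bm_s, t_s = bm[order], scores[order]
    nPC = t_s.shape[1]

    # Running mean of each PC's ordered scores ("smoothed score means");
    # the real code uses Savitzky-Golay smoothing instead
    k = np.ones(nearestpts) / nearestpts
    smoothed = np.column_stack(
        [np.convolve(t_s[:, j], k, mode="same") for j in range(nPC)])
    dev = t_s - smoothed            # deviations from the smoothed means

    # Step 4: equally spaced BM grid over the observed BM range
    grid = np.linspace(bm_s[0], bm_s[-1], bmlookuppts)
    centers = np.column_stack(
        [np.interp(grid, bm_s, smoothed[:, j]) for j in range(nPC)])

    # Step 5: percentile limits of the n nearest deviations, re-centered
    low = np.empty((bmlookuppts, nPC))
    high = np.empty((bmlookuppts, nPC))
    p_lo, p_hi = 100 * (1 - cl) / 2, 100 * (1 + cl) / 2
    for i, b in enumerate(grid):
        near = np.argsort(np.abs(bm_s - b))[:nearestpts]
        low[i] = centers[i] + np.percentile(dev[near], p_lo, axis=0)
        high[i] = centers[i] + np.percentile(dev[near], p_hi, axis=0)
    return grid, low.T, high.T      # limits are (nPC x bmlookuppts)
```

With cl = 0.90 the limits enclose the central 90% of nearby score deviations at each BM grid point, matching the definition of the <tt>cl</tt> option below.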


Fig. 1 shows an example of the Batch Maturity Scores Plot (obtained from the BATCHMATURITY Analysis window's Scores Plot). It shows the second PC's upper and lower score limits as a function of BM, as calculated from the "Dupont_BSPC" demo dataset using cl = 0.9 and step 2 only from batches 1 to 36. These batches had normal processing conditions, so the shaded zone enclosed by the limit lines indicates the range where a measured sample's PC 2 scores should occur if processing is evolving "normally". Similar plots result for the other modeled PCs. The data points shown are the PCA model scores, which are accessible from the batchmaturity model's or pred's <tt>t</tt> field.


<gallery caption="Fig. 1. Batchmaturity Scores Plot." widths="400px" heights="300px" perrow="1">
File:BMScoreScoresPlot.png|Plot showing Scores for PC 2 as a function of batch maturity (Ypred from the PLS model).
</gallery>


====Inputs====
* '''x''' = X-block (2-way array class "double" or "dataset").
* '''y''' = Y-block (vector class "double" or "dataset").
* '''ncomp_pca''' = Number of components to be calculated in the PCA model (positive integer scalar).
* '''ncomp_reg''' = Number of latent variables for the regression method.


====Outputs====
* '''model''' = standard model structure containing the PCA and regression models (see MODELSTRUCT).
* '''pred''' = prediction structure containing the scores from the PCA model for the input test data as pred.t.


Model and pred contain the following fields, which relate to score limits and
whether samples are within normal ranges or not:
:'''limits''' : struct with fields:
::'''cl''': value used for cl option
::'''bm''': (1 x bmlookuppts) bm values for score limits
::'''low''': (nPC x bmlookuppts) lower score limit of inliers
::'''high''': (nPC x bmlookuppts) upper score limit of inliers
::'''median''': (nPC x bmlookuppts) median trace of scores
:'''inlimits''' : (nsample x nPC) logical indicating if samples are inliers.
:'''t''' : (nsample x nPC) scores from the PCA submodel.
:'''t_reduced''' : (nsample x nPC) scores scaled by limits, with limits -> +/- 1 at upper/lower limit, -> 0 at the median score.
:'''submodelreg''' : regression model built to predict bm. Only PLS currently.
:'''submodelpca''' : PCA model used to calculate X-block scores.
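The relationship between the <tt>limits</tt> lookup table and the <tt>inlimits</tt> and <tt>t_reduced</tt> fields can be sketched as follows. This is a hypothetical NumPy helper (<tt>apply_limits</tt> is not a PLS_Toolbox function) showing one plausible way the per-sample limits are interpolated and applied; the actual implementation may differ.

```python
import numpy as np

def apply_limits(t_new, bm_new, grid, low, high, median):
    """Flag inliers and compute 'reduced' scores for new samples.

    t_new  : (nsample, nPC) PCA sub-model scores of the new batch data
    bm_new : (nsample,) predicted BM (PLS sub-model output)
    grid, low, high, median : the model's limits lookup table, with
        low/high/median shaped (nPC x bmlookuppts)
    """
    nPC = t_new.shape[1]
    # Interpolate each PC's limits at every sample's BM value
    lo = np.column_stack([np.interp(bm_new, grid, low[j]) for j in range(nPC)])
    hi = np.column_stack([np.interp(bm_new, grid, high[j]) for j in range(nPC)])
    med = np.column_stack([np.interp(bm_new, grid, median[j]) for j in range(nPC)])

    # inlimits: sample score within [low, high] for each PC
    inlimits = (t_new >= lo) & (t_new <= hi)

    # t_reduced: 0 at the median trace, +1 at the upper limit, -1 at the
    # lower limit (assumes limits are strictly separated from the median)
    t_reduced = np.where(t_new >= med,
                         (t_new - med) / (hi - med),
                         (t_new - med) / (med - lo))
    return inlimits, t_reduced
```

A sample whose <tt>t_reduced</tt> magnitude exceeds 1 on any PC lies outside the corresponding score limit and is flagged as out-of-limits for that PC.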


===Options===


options =  a structure array with the following fields:


* '''regression_method''' : [ {'pls'} ] A string indicating type of regression method to use. Currently, only 'pls' is supported.
* '''preprocessing''' : { [] } preprocessing structure applied to both the PCA and PLS sub-models. The PLS Y-block preprocessing is always autoscale.
* '''zerooffsety''' : [ 0 | {1}] transform y by resetting it to zero for each batch.
* '''stretchy''' : [ 0 | {1}] transform y to have a range of 100 for each batch.
* '''cl''' : [ 0.90 ] Confidence limit (2-sided) for moving limits, defined as 1 - expected fraction of outliers.
* '''nearestpts''' : [{25}] number of nearby scores used when calculating limits.
* '''smoothing''' : [{0.05}] smoothing of limit lines. Width of window used in Savgol smoothing as a fraction of BM range.
* '''bmlookuppts''' : [{1001}] number of equally-spaced points in BM lookup table mentioned in Methodology Step 4 above. Default gives lookup values spaced every 0.1% over the BM range.
* '''plots''' : [ 'none' | 'detailed' | {'final'} ] governs production of plots when model is built. 'final' shows standard scores and loadings plots. 'detailed' gives individual scores plots with limits for all PCs.
* '''waitbar''' : [ 'off' | {'auto'} ] governs display of waitbar when calculating confidence limits ('auto' shows waitbar only when the calculation will take longer than 15 seconds)
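One plausible reading of the <tt>zerooffsety</tt> and <tt>stretchy</tt> options is the per-batch transform sketched below. This is an assumption for illustration (the helper <tt>normalize_bm</tt> is hypothetical, and the exact transform in PLS_Toolbox may differ): each batch's y is shifted to start at zero, then scaled so its range is 100, i.e. percent completion.

```python
import numpy as np

def normalize_bm(y, batch_ids, zerooffsety=True, stretchy=True):
    """Hypothetical per-batch BM transform: shift each batch's y to zero
    (zerooffsety), then scale it to a range of 100 (stretchy)."""
    y = np.asarray(y, dtype=float).copy()
    batch_ids = np.asarray(batch_ids)
    for b in np.unique(batch_ids):
        m = batch_ids == b
        if zerooffsety:
            y[m] = y[m] - y[m].min()        # batch now starts at zero
        if stretchy:
            rng = y[m].max() - y[m].min()
            if rng > 0:                     # avoid divide-by-zero for flat y
                y[m] = 100.0 * (y[m] - y[m].min()) / rng
    return y
```

With both options on (the defaults), every calibration batch spans the same 0–100 BM range regardless of its raw y units or duration.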


===See Also===


[[batchfold]], [[batchdigester]], [[bspcgui]].

Revision as of 13:26, 5 December 2018
