Bspcgui and Evri faq: Difference between pages

From Eigenvector Research Documentation Wiki
(Difference between pages)
imported>Jeremy
 
imported>Lyle
 
__TOC__
__TOC__
==Importing / Exporting==


=Introduction=
[[faq_concatenate_multiple_files|How do I concatenate multiple files into a single DataSet?]]
Batch Statistical Process Control (BSPC) is the analysis of process data as a function of both correlation among the measured variables and correlation in time (also known as the batch trajectory). The data is subdivided into "batches" (experiments), each of which may be further subdivided into "steps" (subdivisions of a batch indicating processing segments or other divisions). BSPC goes by many names, including process monitoring, fault detection, and anomaly detection. Methods generally rely on a model that describes normal and/or desirable operation. Often much is learned about the process simply from creating the model. Given a process model, future operating data can be compared to the model to determine whether the process condition is nominal.


The BSPC interface prompts the user to select the analysis method they want to use, then guides the user through the steps necessary to use that method, including:
[[faq_create_multivariate_image_from_separate_images|How do I create a multivariate image from separate images?]]
# Importing and organizing the batch data
# Assuring the batch and step labels (if desired) are assigned
# Aligning the time axes of the batches (if needed for the specific analysis method)
# Choosing other data manipulation settings as needed for the method
# Rearranging the data to the appropriate format for the specific analysis method


[[faq_export_PCA_scores_and_loadings_to_text_file|How do I export PCA scores and loadings to a text file (to read into MS Excel, for example)?]]


=Getting Started=
[[faq_import_three-way_data|How do I import three-way data into Solo or PLS_Toolbox?]]
Models are derived directly from process data, with the goal of summarizing high-dimensional data with a handful of factors that capture important directions in the data. Success is highly dependent upon the quantity and quality of the process data.


Raw data is presumed to be in a two-dimensional DataSet, with rows being samples in time and columns being variables.
[[faq_import_horiba_NGC_64bit |Why can't I import a Horiba NGC file on my 64-bit computer?]]


[[Image:bspc_data_config.png|200px|Data Configuration]]
[[faq_SPCREADR_cant_read_multiple_files |Why can't SPCREADR read multiple files I've selected?]]


===Model Types===
[[faq_some_EXCEL_files_fail_to_import |Why do some Excel files fail to import?]]


The following describes the model types available as targets of the BSPC processing. The dimensions of the resulting processed data and other considerations are listed along with a brief description of the unique characteristics of the model type.
==General==


{| class="wikitable" border="1"
[[faq_PARALIND_in_PLS_Toolbox |Can I do PARALIND in PLS_Toolbox?]]
|+ BSPC Model Types
! Model !! Modes (Dimensions) !! Equal Length Batches !! Steps Aligned !! Data Shape !! Model Comments
|-
| Summary PCA || 2 || No || No || Batch x (Step/Summary) || PCA on statistics summarizing the change in measured variables over the batch progress. Less sensitive to specific batch trajectory.
|-
| [[Batchmaturity|Batch Maturity]] || 2 || No || No || (Batch/Step) x Variable, Can have Y-Block to indicate maturity || PCA with heterogeneous confidence limits based on percent progression through batch.
|-
| [[Mpca|MPCA]] || 3 || Yes || Yes || Time (step) x Variable x Batch || Multiway PCA - captures correlation between variables and their changes through time (trajectory). Very sensitive to trajectory differences.
|-
| [[Parafac|PARAFAC]] || 3 || Yes || Yes || Batch x Variable x Time (step) || Parallel Factor Analysis (multiway). Imposes stronger expectation of similarity between variable trajectories than MPCA.
|-
| Summary PARAFAC || 3 || No || No ||  Batch x Step x Summary || PARAFAC on summary statistics of variables over time. Less sensitive to specific batch trajectory, imposes expectation of correlation between steps of process.
|-
| [[Parafac2|PARAFAC2]] || 3 || No || No ||  Cell Array of Batches || PARAFAC with relaxed multiway structures (only available at PLS_Toolbox command line). Much less sensitive to specific batch trajectory than PARAFAC or MPCA.
|}
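
The "Data Shape" column can be made concrete with a small sketch. The Python below is an illustration only, not PLS_Toolbox code: <code>fold_batches</code> is a hypothetical helper that takes time-ordered rows plus one batch label per row and folds them into the Time x Variable x Batch arrangement listed for MPCA. It assumes all batches have equal length, which is why the table marks MPCA and PARAFAC as requiring equal length batches.

```python
def fold_batches(rows, batch_labels):
    """Fold time-ordered rows (samples x variables) into Time x Variable x Batch.

    Hypothetical helper for illustration; not part of PLS_Toolbox.
    """
    batches = {}
    for row, label in zip(rows, batch_labels):
        # dicts preserve insertion order, so batches keep first-seen order
        batches.setdefault(label, []).append(row)

    lengths = {len(b) for b in batches.values()}
    if len(lengths) != 1:
        raise ValueError("batches differ in length; align them before folding")

    n_time = lengths.pop()
    n_vars = len(rows[0])
    order = list(batches)
    # cube[t][v][b] = value of variable v at time t in batch b
    cube = [[[batches[lab][t][v] for lab in order]
             for v in range(n_vars)]
            for t in range(n_time)]
    return cube, order


# Two batches ("A" and "B"), three time points each, two variables.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0],
        [1.5, 11.0], [2.5, 21.0], [3.5, 31.0]]
labels = ["A", "A", "A", "B", "B", "B"]

cube, order = fold_batches(rows, labels)
shape = (len(cube), len(cube[0]), len(cube[0][0]))
print(shape)  # (time, variable, batch)
```

If the batches had different lengths, the helper would raise an error instead of folding, which mirrors why the summary methods and PARAFAC2 are listed as not needing equal length batches.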


See Also: [[batchmaturity|Batch Maturity]], [[mpca|MPCA]], [[MSPC_and_Identification_of_Finite_Impulse_Response_Models|MSPC]], [[parafac|PARAFAC]], [[parafac2|PARAFAC2]]
[[faq_install_on_more_than_one_PC | Can I install PLS_Toolbox (or Solo) on more than one PC, such as on my desktop and laptop computer?]]


=Batch Processor Window=
[[faq_multiple_class_sets_together_in_SIMCA_PLSDA_LDA | Can I use multiple class sets (categorical variables) together in a SIMCA, PLSDA, or LDA model?]]


The goal of the Batch Processor interface is to make it easier to assemble batch data for multivariate analysis. Different analyses and conditions require different data manipulation. This interface attempts to simplify the assembling of data for batch analysis, which might otherwise be [[media:Bspc_diagram_roadmap.png | very complicated]].
[[faq_more_info_on_R_Squared_statistic | Can you give me more information on the R-Squared statistic?]]


The workflow of the interface moves from left to right across the tabs at the top of the interface. Loading data and choosing an Analysis Type will enable relevant tabs. Clicking the '''Next''' button will open the next enabled tab. Batches and steps are defined, then alignment and summary information is added. When finished, "folded" data can be saved or exported to the [[Analysis GUI|analysis]] interface and a model for folding new data can be saved.
[[faq_how_RMSEC_and_RMSECV_related to R2Y_and_Q2Y_seen_other_software | How are RMSEC and RMSECV related to R2Y and Q2Y I see in other software?]]


[[Image:BSPCGUI main.png| BSPC GUI]]
[[faq_convergence_of_PARAFAC| Convergence of PARAFAC. How much variation between models is expected when a particular PARAFAC model is fit multiple times with the same settings?]]


==Start==
[[faq_does_software_stop_working_if_maintenance_expires | Does the software stop working if my maintenance expires?]]
Load, append, edit, and/or clear data. Selecting the Analysis Type will automatically enable or disable the relevant tabs.


* Dropping data onto the status area will load data. If previously loaded data exists, a prompt for overwrite or augment will appear.
[[faq_report_a_problem_with_PLS_Toolbox | How and where do I report a problem with PLS_Toolbox?]]
** If augment is chosen, two options are given: augment as a new batch or not. Augmenting as a new batch adds a class for the data being augmented; otherwise a "normal" augment occurs, and if the new DataSet has a matching class it is merged.
* Dragging and dropping multiple-selected (Excel) files from the system browser (e.g., Windows Explorer or Finder) will pre-augment the files and create a label indicating file name. This label can be used to identify batches in the '''Batches''' tab.
* Data can be edited in the [[DataSet Editor]] by clicking the '''Edit''' button. Editing will cause the model to be cleared.


==Batch==
[[faq_how_are_T_contributions_calculated | How are T-contributions calculated?]]
Indicates which samples belong to which batch based on information in the loaded DataSet. Sources can be Class, Label, or Axisscale sets, or a single Variable (column). If manually loaded, a class is created from the loaded content. If the DataSet contains a class with the default name of "BSPC Batch" then it will be automatically selected after loading.


* If variable is used, data for that column will be excluded (not deleted) so other mechanisms (preprocessing) can work.
[[faq_how_are_ROC_curves_calculated_for_PLSDA | How are the ROC curves calculated for PLSDA?]]
* Once Batches have been identified, one or more batches can be plotted in the lower plot.


All methods except [[batchmaturity|Batch Maturity]] require a means of identifying the different batches, because the batches are used to form the samples of the input matrices (and multiple samples are required for all methods other than Batch Maturity).
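
As a rough illustration of using a single variable (column) as the batch source, the sketch below groups row indices by the value in one column and reports which columns remain included for modeling. It is plain Python written for this page, not the interface's implementation; <code>batches_from_column</code> and the example data are hypothetical.

```python
def batches_from_column(rows, batch_col):
    """Group row indices by the value in one column.

    Returns the batch membership and the list of columns still included
    for modeling. The batch column is excluded, not deleted: the rows
    themselves are left untouched.
    """
    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(row[batch_col], []).append(i)
    include = [j for j in range(len(rows[0])) if j != batch_col]
    return groups, include


# Column 0 holds a batch ID; columns 1 and 2 are measured variables.
rows = [[101, 0.5, 7.1],
        [101, 0.6, 7.3],
        [102, 0.4, 7.0],
        [102, 0.7, 7.4],
        [102, 0.8, 7.6]]

groups, include = batches_from_column(rows, batch_col=0)
print(groups)   # batch ID -> row indices
print(include)  # columns left for preprocessing and modeling
```

Keeping the batch column in the data but out of the include list is the point of the "excluded (not deleted)" note above: downstream steps only see the measured variables.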
[[faq_how_are_error_bars_calculated_regression_model | How are the error bars calculated for a regression model and can they be related to a confidence limit (confidence in the prediction)?]]


==Steps==
[[faq_improve_performance_with_PLS_Toolbx_and_Matlab_on_Mac | How can I improve performance with PLS_Toolbox and Matlab on the Mac platform?]]
Steps (subdivisions of batches) can be indicated on the '''Steps''' tab. Steps can be created in the same manner as '''Batches''' or indicated manually. Particular steps to be included or excluded can be selected.


Manual selection is done by selecting a primary variable and batch to align '''to''', then designating '''steps''' for that primary variable/batch. After the steps are set, the [[batchalign]] function is used to "map" the step locations (as a DataSet class) onto each batch.
[[faq_assign_classes_for_samples_in_a_DataSet | How do I assign classes for samples in a DataSet?]]


===Manually Selecting Steps===
[[faq_build_a_classification_model_from_class_set_other_than_the_first | How do I build a classification model from a class set other than the first?]]


[[Image:bspc_manual_select.png|500px|Manual Selection Interface]]
[[faq_choose_between_different_cross_validation_leave_out_options | How do I choose between the different cross-validation leave-out options?]]


To manually select steps:
[[faq_reference_Eigenvector| How do I cite/reference Eigenvector?]]


# Select the variable and batch to use from the plot list boxes at the bottom of the interface. These become the variable and batch to which all others are aligned (designated by a "*" next to the list item).
[[faq_interpret_ROC_curves_and_Sensitivity_Specificity_plots_from_PLSDA | How do I interpret the ROC curves and Sensitivity / Specificity plots from PLSDA?]]
# Click the '''Select''' button and the interface will switch.
# Click the '''Add''' button to place the first step marker.
# Drag this marker to the first step location.
# Repeat until all steps are placed.
# Select a different batch from the list menu to display the "aligned" step positions.
# Adjust alignment algorithm as needed using toolbar button.
# Click check-mark button to finish and save steps.


===Selected Steps Menu===
[[faq_make_DataSet_backwards_compatible | How do I make a DataSet backwards compatible?]]


[[Image:bspc_selected_steps.png|300px|]]
[[faq_obtain_or_use_recompilation_license_for_PLS_Toolbox | How do I obtain or use a recompilation license for PLS_Toolbox?]]


Once steps have been designated, they will appear in the '''Step Selection''' list. If one or more steps should be ignored, they can be deselected in this menu. Selected steps appear in the batch plot as solid green lines and unselected steps appear as red dashed lines.
[[faq_use_custom_cross_validation_option | How do I use the "custom" cross-validation option?]]


==Align==
[[faq_out_of_memory_error_when_analyzing_data | I keep getting "out of memory" errors when analyzing my data. What can I do?]]


Methods that require equal-length batches use the tools on the '''Align''' tab, which are provided by the [[batchalign]] function.
[[faq_java_lang_OutOfMemoryError| What can I do if I get a java.lang.OutOfMemoryError error?]]


The process to configure the alignment is:
[[faq_why_get_negative_scores_when_all_modes_are_set_to_nonnegativity | Nonnegativity (PARAFAC, PARAFAC2, Tucker): Why do I get negative scores when all modes are set to nonnegativity?]]


# Select the type of alignment:
[[faq_what_are_relative_contributions | What are "Relative Contributions"?]]
#* '''Linear''' - Linear interpolation or decimation to match selected batch's length.
#* '''COW''' - [[cow|Correlation Optimized Warping]] with Alignment Settings values.
#* '''Pad With NaN''' - Append each batch with NaN to make all batches equal length.
# Select the Batch and Variable to use as a reference (target) or Load a vector.
# Select alignment settings (if using COW).
# Click '''Update Plots''' to see the results.
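
To give a feel for what two of the alignment types do to a single variable trajectory, here is a minimal Python sketch. It is an assumption-laden illustration, not the [[batchalign]] implementation: <code>align_linear</code> and <code>pad_with_nan</code> are hypothetical helpers, the interpolation uses evenly spaced positions between the first and last sample, and COW is not shown.

```python
import math


def align_linear(values, target_len):
    """Resample a trajectory to target_len points by linear interpolation.

    Stretches short batches and decimates long ones so that every batch
    matches the reference length.
    """
    n = len(values)
    if n == 0 or target_len < 1:
        raise ValueError("need at least one sample and a positive target length")
    if n == 1 or target_len == 1:
        return [values[0]] * target_len
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)  # fractional index into values
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def pad_with_nan(values, target_len):
    """Append NaN until the trajectory reaches target_len (never truncates)."""
    return list(values) + [math.nan] * max(0, target_len - len(values))


reference = [0.0, 1.0, 2.0, 3.0, 4.0]   # trajectory of the reference batch
short_batch = [0.0, 2.0, 4.0]           # same ramp sampled with fewer points

stretched = align_linear(short_batch, len(reference))
decimated = align_linear(reference, 3)
padded = pad_with_nan(short_batch, len(reference))
print(stretched, decimated, padded)
```

Linear alignment changes the values so the whole trajectory spans the reference length, while padding leaves the measured values alone and marks the missing tail as NaN.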


The plots switch to displaying selected variables and batches with the pre-aligned data on top and post-aligned data on the bottom. Click the '''Update Plots''' button to refresh the plot after making any changes.
[[faq_what_are_reduced_T^2_and_Q_Statistics | What are the "Reduced" T<sup>2</sup> and Q Statistics?]]


[[Image:bspc_align_settings.png|Align Settings ]]
[[faq_units_for_RMSEC_and_RMSECV_for_PLSDA | What are the units used for RMSEC and RMSECV when cross-validating PLSDA models?  Why do the cross-validation curves look strange for PLSDA?]]


NOTE: In the image above, the alignment batch is Class 0 (the default) which has no members. This must be changed before alignment will work.
[[faq_what_do_the_four_Fit_/_Unique_Fit_stats_mean_in_MCR_PARAFAC | What do the four Fit/Unique Fit statistics mean in MCR and PARAFAC models?]]


==Summarize==
[[faq_internal_tests_used_to_select_suggested_number_of_PCs | What internal tests are used to select "suggested" number of PCs?]]


The summary PCA and summary PARAFAC methods make use of statistical summary functions to capture the trends in the trajectories of the variables. A Summary PCA or Summary PARAFAC model does not require alignment of batches and is generally less sensitive to the exact trajectory of the batches, providing some model robustness.
[[faq_what_is_PLS1_v_PLS2_and_how_to_create_separate_PLS1_models_from_multi_column_y_block | What is PLS1 vs PLS2 and how do I create separate PLS1 models when I have a multi-column y-block?]]


The statistics are calculated by the [[summary]] function and have different sensitivities to profile changes. The ability of each statistic to capture useful information from a trajectory depends on the dynamics and the statistic. Often it is useful to include a number of the statistics and decide, while modeling, which seem to be providing information, then exclude the remaining statistics. If the statistics are sufficiently sensitive to the trajectory profile to detect out-of-specification batches, then the model will likely provide longer-term performance than an equivalent MPCA or PARAFAC model.
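
The idea that statistics differ in their sensitivity to the profile can be seen in a small sketch. The Python below is not the [[summary]] function; <code>slope</code> and <code>summarize_step</code> are hypothetical helpers, the standard deviation is assumed to be the sample (n-1) form, and the slope is assumed to be a least-squares fit against the sample index.

```python
from statistics import mean, stdev


def slope(values):
    """Least-squares slope of values against sample index 0..n-1 (needs n >= 2)."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    sxx = sum((i - x_bar) ** 2 for i in range(n))
    sxy = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(values))
    return sxy / sxx


def summarize_step(values):
    """Summarize one variable over one step with three example statistics."""
    return {"mean": mean(values), "std": stdev(values), "slope": slope(values)}


ramp = [1.0, 2.0, 3.0, 4.0]   # steadily rising trajectory
hump = [2.0, 3.0, 3.0, 2.0]   # rises, then falls back

print(summarize_step(ramp))
print(summarize_step(hump))
```

Both trajectories have the same mean, so a model built only on means could not tell them apart, whereas the slope separates them clearly. This is the kind of comparison that helps decide which statistics to keep while modeling.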
[[faq_difference_between_a_loading_and_a_weighting | What is the difference between a loading and a weighting?]]


The statistics which are available include:
[[faq_why_some_axis_labels_and_titles_upside_down_in_MIA_Toolbox | Why are some axis labels and titles on my axes upside-down when I'm viewing images in MIA_Toolbox?]]


[[Image:Bspc_summarize.png|Summary Options]]
[[faq_why_can't_I_recompile_PLS_Toolbox_functions | Why can't I recompile the PLS_Toolbox functions?]]


All stats summarize each column and each step (if specified) except for:
[[faq_why_get_missing_data_warning| Why do I get the warning/notice "Missing Data Found - Replacing with "best guess" from existing model. Results may be affected by this action."]]
* '''Length''': the length of the step, a single number for each step (irrespective of the number of variables).
* '''Five-Number Summary''': the 10th, 25th, 50th, 75th, and 90th percentiles, giving 5 values per step per variable.


For example, with the [[Demonstration_Datasets | Dupont]] demo calibration data (dupont_cal), if you choose mean, std, slope, skewness, and length, the size of your folded Summary PCA data will be:
[[faq_why_PLS_Toolbox_have_a_boxplot_function_that_conflicts_with_Stats_Toolbox | Why does PLS_Toolbox have a "boxplot" function that conflicts with the Mathworks Statistics Toolbox function of the same name?]]


(10 variables x 4 stats) + 1 length = 41 values per step; 41 values per step x 5 steps = 205 columns
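
The same count can be written as a small calculation. This is only a sketch of the arithmetic above; <code>folded_summary_columns</code> is a hypothetical helper, and it assumes every selected statistic contributes a fixed number of values per variable (1 for mean, std, slope, and skewness; 5 for the Five-Number Summary) while Length contributes one value per step.

```python
def folded_summary_columns(n_variables, values_per_variable, n_steps, include_length=True):
    """Number of columns in folded summary data.

    values_per_variable is the total number of values the selected
    statistics produce for one variable in one step.
    """
    per_step = n_variables * values_per_variable + (1 if include_length else 0)
    return per_step * n_steps


# mean, std, slope, skewness -> 4 values per variable, plus Length
dupont_example = folded_summary_columns(n_variables=10, values_per_variable=4, n_steps=5)

# Adding the Five-Number Summary raises this to 4 + 5 = 9 values per variable
with_five_number = folded_summary_columns(n_variables=10, values_per_variable=9, n_steps=5)

print(dupont_example, with_five_number)
```

The column count grows quickly as statistics are added, which is one reason to drop statistics that do not appear to carry information.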
[[faq_why_R2014b_give_error_cannot_convert_double_value_to_a_handle | Why does R2014b give me "Error using matlab.ui.Figure... Cannot convert double value to a handle" (or similar)?]]


==Finish==
==Command Line==
[[faq_specify_all_options_of_a_function_or_only_those_different_from_defaults | Do I have to specify all the options to a function or only the ones that are different from the defaults?]]


When completed, there are four options:
[[faq_how_are_Q_residuals_and_Hotellings_T2_calculated_in_PLS_models | How are the Q-residuals and Hotelling's T<sup>2</sup> values calculated for PLS models?]]


* Send data directly to a new [[Analysis]] window.
[[faq_how_to_automate_PCA_analysis_for_multiple_images  | How do I automate PCA analysis for multiple images?]]
* Save the data to the workspace.
 
* Save a model for future data application. NOTE: In some more complicated instances (loading outside information) the model may not be able to fully capture each step taken in the interface.
[[faq_how_do_I_calculate_my_own_T2_and_Q_limits | How do I calculate my own T<sup>2</sup> and Q limits?]]
* Cancel and close the window.
 
[[faq_how_do_manually_calculate_the_limits_for_scores | How do I manually calculate the limits for scores?]]
 
[[faq_how_do_I_calculate_scores_from_a_PLS_or_PLSDA_model | How do I calculate scores from a PLS or PLSDA model?]]
 
[[faq_how_do_I_change_the_default_options_for_a_function | How do I change the default options for a function?]]
 
[[faq_how_do_I_interpret_the_misclassification_results_reported_by_crossval | How do I interpret the Misclassification results reported by crossval?]]
 
[[faq_how_do_I_make_PlotGUI_send_plot_to_a_new_figure_and_not_overwrite_current_figure | How do I make PlotGUI send its plot to a new figure and not overwrite the current figure?]]
 
[[faq_how_do_I_retrieve_and_display_predictions_from_a_model_structure_in_command_window | How do I retrieve and display predictions from a model structure in the command window?]]
 
==Manual==
==GUI==
==Installation==
 
 
 
 
 
 
 
[[Category:FAQ]]

Revision as of 13:26, 5 December 2018


Importing / Exporting

How do I concatenate multiple files into a single DataSet?

How do I create a multivariate image from separate images?

How do I export PCA scores and loadings to a text file (to read into MS Excel, for example)?

How do I import three-way data into Solo or PLS_Toolbox?

Why can't I import a Horiba NGC file on my 64-bit computer?

Why can't SPCREADR read multiple files I've selected?

Why do some Excel files fail to import?

General

Can I do PARALIND in PLS_Toolbox?

Can I install PLS_Toolbox (or Solo) on more than one PC, such as on my desktop and laptop computer?

Can I use multiple class sets (categorical variables) together in a SIMCA, PLSDA, or LDA model?

Can you give me more information on the R-Squared statistic?

How are RMSEC and RMSECV related to R2Y and Q2Y I see in other software?

Convergence of PARAFAC. How much variation between models is expected when a particular PARAFAC model is fit multiple times with the same settings?

Does the software stop working if my maintenance expires?

How and where do I report a problem with PLS_Toolbox?

How are T-contributions calculated?

How are the ROC curves calculated for PLSDA?

How are the error bars calculated for a regression model and can they be related to a confidence limit (confidence in the prediction)?

How can I improve performance with PLS_Toolbox and Matlab on the Mac platform?

How do I assign classes for samples in a DataSet?

How do I build a classification model from a class set other than the first?

How do I choose between the different cross-validation leave-out options?

How do I cite/reference Eigenvector?

How do I interpret the ROC curves and Sensitivity / Specificity plots from PLSDA?

How do I make a DataSet backwards compatible?

How do I obtain or use a recompilation license for PLS_Toolbox?

How do I use the "custom" cross-validation option?

I keep getting "out of memory" errors when analyzing my data. What can I do?

What can I do if I get a java.lang.OutOfMemoryError error?

Nonnegativity (PARAFAC, PARAFAC2, Tucker): Why do I get negative scores when all modes are set to nonnegativity?

What are "Relative Contributions"?

What are the "Reduced" T<sup>2</sup> and Q Statistics?

What are the units used for RMSEC and RMSECV when cross-validating PLSDA models? Why do the cross-validation curves look strange for PLSDA?

What do the four Fit/Unique Fit statistics mean in MCR and PARAFAC models?

What internal tests are used to select "suggested" number of PCs?

What is PLS1 vs PLS2 and how do I create separate PLS1 models when I have a multi-column y-block?

What is the difference between a loading and a weighting?

Why are some axis labels and titles on my axes upside-down when I'm viewing images in MIA_Toolbox?

Why can't I recompile the PLS_Toolbox functions?

Why do I get the warning/notice "Missing Data Found - Replacing with "best guess" from existing model. Results may be affected by this action."

Why does PLS_Toolbox have a "boxplot" function that conflicts with the Mathworks Statistics Toolbox function of the same name?

Why does R2014b give me "Error using matlab.ui.Figure... Cannot convert double value to a handle" (or similar)?

Command Line

Do I have to specify all the options to a function or only the ones that are different from the defaults?

How are the Q-residuals and Hotelling's T<sup>2</sup> values calculated for PLS models?

How do I automate PCA analysis for multiple images?

How do I calculate my own T<sup>2</sup> and Q limits?

How do I manually calculate the limits for scores?

How do I calculate scores from a PLS or PLSDA model?

How do I change the default options for a function?

How do I interpret the Misclassification results reported by crossval?

How do I make PlotGUI send its plot to a new figure and not overwrite the current figure?

How do I retrieve and display predictions from a model structure in the command window?

Manual

GUI

Installation