Analysis Menu
Revision as of 17:58, 13 November 2008

The Analysis Menu in the Analysis GUI provides access to various analysis methods within the GUI. Some of these methods are "one-block" methods (meaning they operate on only the X data) and others are "two-block" methods (meaning they require both X and Y data to be loaded). Most methods also create a "model" when they are executed.

In all cases, once the analysis method has been selected and appropriate data loaded, a model can be built by clicking on the Calibrate button (the gears icon) in the toolbar, or by clicking on the "Model" button in the Status Panel.

For many methods, particular method options can be selected or modified using the Options button.

The following methods are available in most versions of Solo and PLS_Toolbox (some special versions of the software may have fewer or additional methods). The methods are divided into groups based on their typical application:

Exploratory and Cluster Analysis Methods

These methods are one-block methods and require only the X block to operate. The Y block is not used in these methods.

  • PCA Principal Component Analysis: used for exploratory data analysis and Multivariate Statistical Process Control as well as general pattern recognition and fault detection applications. PLS_Toolbox users, see also: pca
  • Purity: Interactive mixture analysis method used to resolve mixtures of unknown responses. Most useful on data where some samples and/or variables represent "pure" responses or components (non-mixtures). The goal is to provide more physically-interpretable results than PCA. PLS_Toolbox users, see also: purity
  • MCR Multivariate Curve Resolution: Automated mixture analysis method used to resolve mixtures of unknown responses and provide more physically-interpretable results than PCA. Uses an algorithm with successive approximations which can take some time to complete and has some ambiguity, but can operate with complicated mixtures of unknown components. PLS_Toolbox users, see also: mcr
  • PARAFAC PARAllel FACtor analysis: Very similar to MCR, but can be applied to multiway data (data with 3 or more dimensions) as well as typical 2-way data. Results for 2-way data are essentially the same as MCR, but results on multiway data can be far more deterministic (less ambiguous). PLS_Toolbox users, see also: parafac
  • MPCA Multiway Principal Component Analysis: used for exploratory analysis of 3-way batch data in which the first mode is usually time, the second mode is variables, and the third mode is sample (e.g. batch number, or wafer number in the case of the semiconductor industry). MPCA identifies not only trends between variables but also changes in variables through time, known as trajectories. Models can be more complicated to interpret than PCA models and may be more sensitive to minor variations, but can provide improved selectivity in some cases. PLS_Toolbox users, see also: mpca
  • Cluster: Performs a variety of unsupervised cluster analysis methods. Used to look for similarities between samples, with results displayed as a dendrogram in which similar samples are grouped together and attached by short "branches". A number of similarity metrics are available with different sensitivities. PLS_Toolbox users, see also: cluster
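The one-block idea above can be sketched outside the GUI. The following is an illustrative example only, using scikit-learn and SciPy as stand-ins (not PLS_Toolbox functions) on invented synthetic data: PCA decomposes the X block into scores and loadings, while hierarchical clustering groups similar samples as a dendrogram would.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Synthetic X block: 20 samples x 10 variables, forming two loose groups
X = np.vstack([rng.normal(0, 1, (10, 10)),
               rng.normal(3, 1, (10, 10))])

# PCA: scores describe the samples, loadings describe the variables
pca = PCA(n_components=2)
scores = pca.fit_transform(X)     # sample coordinates in PC space
loadings = pca.components_        # variable contributions to each PC

# Cluster: Ward linkage builds the dendrogram structure;
# fcluster cuts it into a requested number of groups
Z = linkage(X, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(scores.shape, labels)
```

Note that only the X block appears anywhere in this sketch, which is exactly what "one-block" means for these methods.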

Quantitative Analysis Methods

These methods are used in quantitative problems where one needs to determine the amount of a component, property, or other value based on the measured X-block responses. They are all two-block methods and most require both an X and a Y block to operate.

  • PLS Partial Least Squares: Factor-based regression method using an inverse regression equation. PLS identifies latent variables (factors or patterns) in the X block which can be used to predict the column(s) of the Y block. Inverse methods are often used when not all underlying sources of variation are known and quantified. PLS_Toolbox users, see also: pls
  • PCR Principal Component Regression: Inverse regression method closely related to PLS with similar goals. PCR may be less sensitive to random and systematic error in the Y block but more sensitive to systematic error in the X block. PLS_Toolbox users, see also: pcr
  • MLR Multiple Linear Regression: Non-factor based inverse regression method. MLR uses raw variable responses in X to predict Y. This method requires that all columns of X be unique (not highly correlated) and may be highly unstable or unusable with many variables. Models do not provide quality of fit statistics. PLS_Toolbox users, see also: mlr
  • CLS Classical Least Squares: Factor-based classical regression method based on a simple linearly additive model. CLS works well when all responses in a system are known or can be determined experimentally. Often works well when several underlying sources of variance exist and their quantity needs to be determined. PLS_Toolbox users, see also: cls
Unlike the other methods, CLS can operate on an X block alone. If no Y block is loaded, CLS assumes that the samples in the X block are "pure component responses" (i.e. each row of the X block represents what an individual component of the system looks like when measured on its own.)

Classification Methods

These methods are used to identify a sample as belonging to one or more groups of previously-classified samples. The samples in the calibration set must be assigned to class(es). These class assignments are then used to help identify the class for an unknown sample. They are mostly one-block methods which operate on a single X block with the calibration samples' "class" field assigned to indicate the class membership.

  • KNN K-Nearest Neighbors: A classification method which assigns an unknown sample to a class by identifying the "k" closest samples in the calibration set and tallying a "vote" of the classes of these samples. The class which receives the highest vote count is determined to be the class of the unknown. PLS_Toolbox users, see also: knn
  • PLSDA Partial Least Squares Discriminant Analysis: A classification method which identifies the differences between two or more classes by identifying what is different between the classes. PLSDA is a factor-based method very similar to Linear Discriminant Analysis (LDA) but does not suffer from problems with collinear (highly related) variables. PLS_Toolbox users, see also: plsda
  • SIMCA Soft Independent Modeling of/by Class Analogy: A classification method in which a PCA model is created for each class in the calibration data. Unknown samples are then projected into each PCA model and classified as in or not in each class based on whether the sample falls "inside" each PCA model. PLS_Toolbox users, see also: simca
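The voting mechanism KNN uses can be sketched briefly. This is an illustrative example with synthetic data and scikit-learn as a stand-in (not the PLS_Toolbox knn function): calibration samples carry class labels, and an unknown is assigned the class that wins the vote among its "k" closest calibration samples.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
# Calibration X block: two groups of samples with class labels assigned
X_cal = np.vstack([rng.normal(0, 0.5, (15, 3)),
                   rng.normal(4, 0.5, (15, 3))])
classes = np.array([1] * 15 + [2] * 15)

# The "k" nearest calibration samples vote on the class of an unknown
knn = KNeighborsClassifier(n_neighbors=3).fit(X_cal, classes)
unknown = np.array([[3.8, 4.1, 4.0]])
predicted = knn.predict(unknown)  # neighbours of this point are class 2
print(predicted)
```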