Analysis Menu: Difference between revisions

From Eigenvector Research Documentation Wiki
imported>Jeremy
Revision as of 18:50, 13 November 2008

The Analysis Menu in the Analysis GUI provides access to various analysis methods within the GUI. Some of these methods are "one-block" methods (meaning they operate on only the X data) and others are "two-block" methods (meaning they require that both X and Y data be loaded). Most methods also create a "model" when they are executed.

In all cases, once the analysis method has been selected and appropriate data loaded, a model can be built by clicking on the Calibrate button in the toolbar (the gears icon) or by clicking on the "Model" button in the Status Panel.

For many methods, particular method options can be selected or modified using the Options button.

Available Methods

The following methods are available in most versions of Solo and PLS_Toolbox (some special versions of the software may have fewer or additional methods). The methods are divided into groups based on their typical application:

Exploratory and Cluster Analysis Methods

These methods are one-block methods and require only the X block to operate. The Y block is not used in these methods.

  • PCA Principal Component Analysis: used for exploratory data analysis and Multivariate Statistical Process Control as well as general pattern recognition and fault detection applications.
  • Purity: Interactive mixture analysis method used to resolve mixtures of unknown responses and provide more physically-interpretable results than PCA. Most useful on data where some samples and/or variables represent "pure" responses or components (non-mixtures).
  • MCR Multivariate Curve Resolution: Automated mixture analysis method used to resolve mixtures of unknown responses and provide more physically-interpretable results than PCA. Uses an algorithm with successive approximations which can take some time to complete and has some ambiguity, but can operate with complicated mixtures of unknown components.
  • PARAFAC PARAllel FACtor analysis: Very similar to MCR, but can be applied to multiway data (data with 3 or more dimensions) as well as typical 2-way data. Results for 2-way data are essentially the same as MCR, but results on multiway data can be very deterministic.
  • MPCA Multiway Principal Component Analysis: used for exploratory analysis of 3-way batch data in which the first mode is usually time, the second mode is variables, and the third mode is sample (e.g. batch number, or wafer number in the semiconductor field). MPCA identifies trends between variables as well as changes in variables through time, known as trajectories. Models can be more complicated to interpret than PCA models and may be more sensitive to minor variations, but can provide improved selectivity in some cases.
  • Cluster: Performs a variety of unsupervised cluster analysis methods. Used to look for similarities between samples, with results displayed as a dendrogram in which similar samples are grouped together and attached by short "branches". A number of similarity metrics are available with different sensitivities.
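As a rough illustration of what a one-block method such as PCA does with the X block alone, the sketch below mean-centers X and finds the first principal component by power iteration. It is a minimal sketch, not the algorithm used by Solo or PLS_Toolbox; the function name `first_principal_component` and its arguments are hypothetical.

```python
import math


def first_principal_component(X, iterations=200):
    """Return (loading, scores) for the first principal component of X.

    X is a list of rows (samples) by columns (variables). Illustrative
    only: hypothetical helper, not a PLS_Toolbox function.
    """
    n_rows, n_cols = len(X), len(X[0])

    # Mean-center each column (variable).
    means = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]
    Xc = [[row[j] - means[j] for j in range(n_cols)] for row in X]

    # Cross-product matrix X'X, proportional to the covariance matrix.
    C = [[sum(r[i] * r[j] for r in Xc) for j in range(n_cols)]
         for i in range(n_cols)]

    # Power iteration: repeatedly multiply by C and renormalize.
    p = [1.0] * n_cols  # starting guess for the loading vector
    for _ in range(iterations):
        p = [sum(C[i][j] * p[j] for j in range(n_cols)) for i in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in p))
        if norm == 0.0:
            raise ValueError("X has no variance along the starting direction")
        p = [v / norm for v in p]

    # Scores are the projections of the centered samples onto the loading.
    scores = [sum(r[j] * p[j] for j in range(n_cols)) for r in Xc]
    return p, scores
```

For example, with `X = [[1, 2], [2, 4], [3, 6], [4, 8]]` the second variable is always twice the first, so the loading vector points along the direction (1, 2) and a single component describes all of the variation.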

Quantitative Analysis Methods

Most of these methods are two-block methods and require both an X and a Y block to operate.

  • PLS Partial Least Squares: Factor-based regression method using an inverse regression equation. PLS identifies latent variables (factors or patterns) in the X block which can be used to predict the column(s) of the Y block. Inverse methods are often used when not all underlying sources of variation are known and quantified.
  • PCR Principal Component Regression: Inverse regression method closely related to PLS with similar goals. PCR may be less sensitive to random and systematic error in the Y block but more sensitive to systematic error in the X block.
  • MLR Multiple Linear Regression: Non-factor based inverse regression method. MLR uses raw variable responses in X to predict Y. This method requires that all columns of X be unique (not highly correlated) and may be highly unstable or unusable with many variables. Models do not provide quality of fit statistics.
  • CLS Classical Least Squares: Factor-based classical regression method based on a simple linearly additive model. CLS works well when all responses in a system are known or can be determined experimentally. Often works well when several underlying sources of variance exist and their quantity needs to be determined.
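The linearly additive model behind CLS can be sketched for the simplest case of two components whose pure responses are already known. The measured response is assumed to be a weighted sum of the two pure responses, and the weights (the amounts of each component) are found by least squares. This is only a sketch under those assumptions; `cls_predict` is a hypothetical name, not a PLS_Toolbox function.

```python
def cls_predict(pure_spectra, measured):
    """Estimate the amounts of two known components in a measured response.

    Assumes measured = c1 * s1 + c2 * s2 (plus noise), where s1 and s2
    are the known pure responses. Illustrative helper only.
    """
    s1, s2 = pure_spectra

    # Normal equations (S S') c = S x, written out for two components.
    a11 = sum(v * v for v in s1)
    a12 = sum(u * v for u, v in zip(s1, s2))
    a22 = sum(v * v for v in s2)
    b1 = sum(u * v for u, v in zip(s1, measured))
    b2 = sum(u * v for u, v in zip(s2, measured))

    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("pure responses are collinear; amounts are not unique")

    # Solve the 2x2 system by Cramer's rule.
    c1 = (b1 * a22 - b2 * a12) / det
    c2 = (a11 * b2 - a12 * b1) / det
    return c1, c2
```

Calling `cls_predict(([1, 0, 1], [0, 1, 1]), [2, 3, 5])` recovers the amounts 2 and 3, because the measured response was constructed as 2 times the first pure response plus 3 times the second. If a source of variation is missing from the pure responses, the estimates are biased, which is why the inverse methods above are often preferred when not all sources are known.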

Classification Methods

These methods are used to identify a sample as belonging to one or more groups of previously-classified samples. The samples in the calibration set must be assigned to class(es). These class assignments are then used to help identify the class for an unknown sample. They are mostly one-block methods which operate on a single X block with the calibration samples' "class" field assigned to indicate the class membership.

  • KNN K-Nearest Neighbors: A classification method which assigns an unknown sample to a class by identifying the "k" closest samples in the calibration set and tallying a "vote" of the classes of these samples. The class which receives the highest vote count is determined to be the class of the unknown.
  • PLSDA Partial Least Squares Discriminant Analysis: A classification method which identifies what differs between two or more classes. PLSDA is a factor-based method very similar to Linear Discriminant Analysis (LDA) but does not suffer from problems with collinear (highly related) variables.
  • SIMCA Soft Independent Modeling of/by Class Analogy: A classification method in which a PCA model is created for each class in the calibration data. Unknown samples are then projected into each PCA model and classified as in or not in each class based on whether the sample falls "inside" each PCA model.
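The KNN voting procedure described above can be sketched in a few lines. The sketch assumes Euclidean distance and a simple majority vote; the function name `knn_classify` and its arguments are hypothetical, and the distance metric and tie-breaking rules used by Solo and PLS_Toolbox may differ.

```python
import math
from collections import Counter


def knn_classify(calibration, classes, unknown, k=3):
    """Assign `unknown` to a class by a vote of its k nearest neighbors.

    calibration: list of samples (each a sequence of numbers)
    classes:     class label for each calibration sample
    Illustrative helper only; assumes Euclidean distance.
    """
    # Distance from the unknown to every calibration sample, nearest first.
    distances = sorted(
        (math.dist(sample, unknown), label)
        for sample, label in zip(calibration, classes)
    )

    # Tally the classes of the k closest samples.
    votes = Counter(label for _, label in distances[:k])

    # The class with the highest vote count wins. On a tied vote,
    # Counter returns the class that was counted first.
    return votes.most_common(1)[0][0]
```

With calibration samples `[(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]` labeled `["A", "A", "A", "B", "B", "B"]`, an unknown at `(0.5, 0.5)` has three class A neighbors and is assigned to A, while an unknown at `(5.5, 5.5)` is assigned to B.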