Simca

From Eigenvector Research Documentation Wiki
Revision as of 10:15, 11 November 2017 by imported>Donal (→‎Cross-validation)

Purpose

Create Soft Independent Modeling of Class Analogy (SIMCA) models for classification.

Synopsis

model = simca(x,ncomp,options) %creates simca model on dataset x
model = simca(x,classid,labels) %models double x with class id
pred = simca(x,model,options); %predictions on x with model
simca % Launches an Analysis window with simca as the selected method.

Description

The function SIMCA develops a SIMCA model for supervised pattern recognition. A SIMCA model is really a collection of PCA models, one for each class of data in the data set.

When optional input ncomp is not supplied, SIMCA operates in an interactive mode. In this mode, the user is prompted for basic preprocessing and number of components to keep in each model. Individual models are built for each class and the PCA model of each class is cross-validated (using leave-one-out if the number of samples in the class is <= 20 or contiguous blocks if more than 20 samples in a given class).
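The cross-validation choice described above can be sketched in plain MATLAB. This is only an illustration of the stated rule (leave-one-out for 20 or fewer samples, contiguous blocks otherwise), not the toolbox's own code; the variable names `classid`, `classes`, and `cvmethod` and the example data are hypothetical.

```matlab
% Hypothetical class identifiers: 8 samples in class 1, 22 in class 2
classid = [ones(8,1); 2*ones(22,1)];

classes  = unique(classid);            % distinct class numbers
cvmethod = cell(numel(classes),1);     % chosen method for each class

for k = 1:numel(classes)
    nsamp = sum(classid == classes(k));    % number of samples in this class
    if nsamp <= 20
        cvmethod{k} = 'leave-one-out';     % small class
    else
        cvmethod{k} = 'contiguous blocks'; % more than 20 samples
    end
    fprintf('class %d: %d samples -> %s\n', classes(k), nsamp, cvmethod{k});
end
```

The number of contiguous blocks is not stated on this page, so the sketch only selects the method name.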

For more automatic SIMCA model building, please see the pca or simcasub functions.

Inputs

  • x = M x N matrix of class "dataset" where class information is extracted from x.class{1,1} and labels from x.label{1,1}, or an M x N data matrix of class "double"
  • classid = M x 1 vector of class identifiers where each element is an integer identifying the class number of the corresponding sample.
  • model = when making predictions, input model is a SIMCA model structure.

Optional Inputs

  • ncomp = integer, number of PCs to use in each model. This is rarely known a priori. When ncomp=[] {default} the user is queried for the number of PCs for each class.
  • labels = a character array with M rows that is used to label samples on Q vs. T2 plots, otherwise the class identifiers are used.

  • options = a structure array discussed below.

Outputs

  • model = model structure array with the following fields:
    • modeltype: 'SIMCA',
    • datasource: structure array with information about input data,
    • date: date of creation,
    • time: time of creation,
    • info: additional model information,
    • description: cell array with text description of model,
    • submodel: structure array with each record containing the PCA model of each class (see PCA), and
    • detail: sub-structure with additional model details and results.
  • pred = a structure, similar to model, that contains the SIMCA predictions. Additional or different fields in pred are:
    • rtsq: the reduced T2 (T2 divided by its 95% confidence limit), where each column corresponds to a class in the SIMCA model,
    • rq: the reduced Q (Q divided by its 95% confidence limit), where each column corresponds to a class in the SIMCA model,
    • nclass: the predicted class number (class to which the sample was closest when considering T2 and Q combined),
    • submodelpred: structure array with each record containing the PCA model predictions for each class (see PCA), and
    • classification: information about the classification of X-block samples (see description at Standard Model). For more information on class predictions, see Sample Classification Predictions.
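To make the reduced statistics concrete, the following plain-MATLAB sketch divides hypothetical Q and T2 values by hypothetical 95% limits and picks the closest class using the combined distance described under the rule option. It is an illustration only: all numbers and the names `qlim`, `t2lim`, and `d` are assumptions, and the toolbox computes the actual statistics and limits from each PCA sub-model.

```matlab
% Hypothetical raw statistics: 3 samples (rows) x 2 class sub-models (columns)
Q     = [0.5  4.0;  6.0 1.0; 2.0 2.4];
T2    = [3.0 20.0; 30.0 4.0; 8.0 9.0];

% Hypothetical 95% confidence limits, one per class sub-model
qlim  = [2.0 2.0];
t2lim = [10.0 10.0];

% Reduced statistics: each column divided by that sub-model's limit
rq   = Q  ./ repmat(qlim,  size(Q,1),  1);
rtsq = T2 ./ repmat(t2lim, size(T2,1), 1);

% Combined distance to each class, then the closest class per sample
d = sqrt(rq.^2 + rtsq.^2);
[~, nclass] = min(d, [], 2);   % column index of the smallest distance

disp(d);
disp(nclass');
```

Here `nclass` is a column index, which matches the class number only when classes are numbered 1 through the number of sub-models in column order.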


Note: Calling simca with no inputs starts the graphical user interface (GUI) for this analysis method.

Cross-validation

SIMCA in the Analysis window does not perform cross-validation at the SIMCA model level. The "Cross-Validation" entry in the "Analysis Flowchart" lets the user select the cross-validation method that is applied when building the PCA sub-models only.

The specific cross-validation settings used for an individual PCA sub-model can be modified to the user's preference in the PCA Analysis window while that PCA model is being fitted, along with any other settings for that PCA model.

Options

options = a structure array with the following fields:

  • display: [ {'on'} | 'off' ], governs level of display,
  • plots: ['none' | {'final'} ], governs level of plotting,
  • staticplots: ['no' | {'yes'} ], produce old-style "static" plots,
  • rule: [{'combined'} | 'T2' | 'Q' | 'both'], governs how a sample's distance from sub-class is measured. 'Q' means reduced Q is used as distance measure. 'T2' means reduced T2 is used. 'both' means both T2 and Q are used (if either is outside the limit, the sample will be considered outside the class). 'combined' uses sqrt(Q^2 + T2^2), each reduced, as the distance measure,
  • preprocessing: { [ ] }, a preprocessing structure (see preprocess) that is used to preprocess data in each class.
  • classset: [ 1 ] indicates which class set in x to use.
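The four settings of rule can be compared side by side with a small plain-MATLAB sketch. It assumes reduced values are already available (a reduced value of 1 or less means the statistic is within its 95% limit) and uses hypothetical numbers. The page does not state the threshold applied to the 'combined' distance, so only the distance itself is computed for that rule.

```matlab
% Hypothetical reduced statistics for 3 samples against one class sub-model
rq   = [0.25; 3.0; 1.2];   % reduced Q
rtsq = [0.30; 0.4; 0.9];   % reduced T2

dQ        = rq;                        % rule 'Q': reduced Q is the distance
dT2       = rtsq;                      % rule 'T2': reduced T2 is the distance
dCombined = sqrt(rq.^2 + rtsq.^2);     % rule 'combined': sqrt(Q^2 + T2^2), each reduced
inBoth    = (rq <= 1) & (rtsq <= 1);   % rule 'both': in class only if neither exceeds its limit

disp([dQ dT2 dCombined]);
disp(inBoth');
```

Sample 2 shows why the choice matters: its reduced T2 is well inside the limit, but its reduced Q is not, so 'T2' alone would place it close to the class while 'Q' and 'both' would not.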

Note: with display='off', plots='none', ncomp=(>0 integer), and preprocessing specified, SIMCA can be run without command line interaction.
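A non-interactive call along these lines might look like the sketch below. The simca signatures and option fields are taken from this page; the data are hypothetical, and the call `preprocess('default','autoscale')`, the `dataset(data)` constructor usage, and passing a partially filled options structure are assumptions that should be checked against the installed toolbox. The toolbox calls are guarded so the script does nothing when simca is not on the path.

```matlab
% Hypothetical data: 30 samples x 4 variables, two classes of 15 samples each
base    = repmat((1:15)', 1, 4) + repmat([0 0.5 1 1.5], 15, 1);
data    = [sin(base); sin(base) + 3];
classid = [ones(15,1); 2*ones(15,1)];

% Settings named in the note above
opts  = struct('display', 'off', 'plots', 'none');
ncomp = 2;   % same number of PCs for every class sub-model

if exist('simca', 'file') ~= 0 && exist('preprocess', 'file') ~= 0
    % Assumed call: default autoscaling preprocessing structure
    opts.preprocessing = preprocess('default', 'autoscale');

    x = dataset(data);          % assumed DataSet constructor usage
    x.class{1,1} = classid;     % class information is read from x.class{1,1}

    model = simca(x, ncomp, opts);   % build one PCA sub-model per class
    pred  = simca(x, model, opts);   % apply the model to the same data
    disp(pred.nclass);               % predicted class numbers
end
```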

See Also

cluster, crossval, discrimprob, knn, modelselector, pca, plsda, svmda