Simca

From Eigenvector Research Documentation Wiki
Revision as of 12:52, 30 September 2011 by imported>Donal (→‎Outputs)

Purpose

Create Soft Independent Modeling of Class Analogy (SIMCA) models for classification.

Synopsis

model = simca(x,ncomp,options) %creates simca model on dataset x
model = simca(x,classid,labels) %models double x with class id
pred = simca(x,model,options); %predictions on x with model

Description

The function SIMCA develops a SIMCA model for supervised pattern recognition. The model is a collection of PCA models, one for each class of data in the data set.

When the optional input ncomp is not supplied, SIMCA operates in an interactive mode. In this mode, the user is prompted for basic preprocessing and the number of components to keep in each model. Individual models are built for each class, and the PCA model of each class is cross-validated (using leave-one-out if the class contains 20 or fewer samples, or contiguous blocks if it contains more than 20).
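The cross-validation choice described above can be sketched in plain MATLAB. This is only an illustration of the 20-sample threshold, not the toolbox's internal code; the variable names (classid, cvscheme) and the example class sizes are hypothetical.

```matlab
% Hypothetical class vector: class 1 has 5 samples, class 2 has 25.
classid = [ones(5,1); 2*ones(25,1)];

classes  = unique(classid);
cvscheme = cell(numel(classes), 1);
for k = 1:numel(classes)
    n = sum(classid == classes(k));   % number of samples in this class
    if n <= 20
        cvscheme{k} = 'leave-one-out';
    else
        cvscheme{k} = 'contiguous blocks';
    end
    fprintf('class %d: %d samples -> %s\n', classes(k), n, cvscheme{k});
end
```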

For more automatic SIMCA model building, please see the pca or simcasub functions.

Inputs

  • x = M x N matrix of class "dataset" where class information is extracted from x.class{1,1} and labels from x.label{1,1}, or an M x N data matrix of class "double"
  • classid = M x 1 vector of class identifiers where each element is an integer identifying the class number of the corresponding sample.
  • model = when making predictions, input model is a SIMCA model structure.

Optional Inputs

  • ncomp = integer, number of PCs to use in each model. This is rarely known a priori. When ncomp=[] {default} the user is queried for the number of PCs for each class.
  • labels = a character array with M rows that is used to label samples on Q vs. T2 plots, otherwise the class identifiers are used.

  • options = a structure array discussed below.

Outputs

  • model = model structure array with the following fields:
    • modeltype: 'SIMCA',
    • datasource: structure array with information about input data,
    • date: date of creation,
    • time: time of creation,
    • info: additional model information,
    • description: cell array with text description of model,
    • submodel: structure array with each record containing the PCA model of each class (see PCA), and
    • detail: sub-structure with additional model details and results.
  • pred = a structure, similar to model, that contains the SIMCA predictions. Additional or differing fields in pred are:
    • rtsq: the reduced T2 (T2 divided by its 95% confidence limit), where each column corresponds to a class in the SIMCA model,
    • rq: the reduced Q (Q divided by its 95% confidence limit), where each column corresponds to a class in the SIMCA model,
    • nclass: the predicted class number (the class to which the sample was closest when considering T2 and Q combined),
    • submodelpred: structure array with each record containing the PCA model predictions for each class (see PCA), and
    • classification: information about the classification of X-block samples (see description at Standard Model). For more information on class predictions, see Sample Classification Predictions.
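How rtsq, rq and nclass relate can be illustrated with a small plain-MATLAB sketch. It assumes that "T2 and Q combined" means the Euclidean norm of the two reduced statistics; the text above does not state the formula, so treat that, the variable names, and the example numbers as hypothetical. The only part that follows directly from the definitions is that a reduced value of 1 or less means the statistic is within its 95% confidence limit.

```matlab
% Hypothetical reduced statistics: 3 samples (rows) x 2 classes (columns).
rtsq = [0.4 2.1; 1.8 0.3; 0.9 1.1];   % reduced T2
rq   = [0.5 3.0; 2.5 0.6; 1.2 0.7];   % reduced Q

% Assumed combined distance: Euclidean norm of the two reduced statistics.
d = sqrt(rtsq.^2 + rq.^2);

% Closest class per sample = column index of the smallest combined distance.
[dmin, nearest] = min(d, [], 2);

% A sample is within a class's 95% limits when both reduced values are <= 1.
inside = (rtsq <= 1) & (rq <= 1);

disp(nearest);
disp(inside);
```

In this example the third sample is closest to class 2 but lies outside the limits of both classes, which is why the nearest class and class membership are worth inspecting separately.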

Note: Calling simca with no inputs starts the graphical user interface (GUI) for this analysis method.

Options

options = a structure array with the following fields:

  • display: [ {'on'} | 'off' ], governs level of display,
  • plots: ['none' | {'final'} ], governs level of plotting,
  • staticplots: ['no' | {'yes'} ], produce old-style "static" plots,
  • rule: [{'combined'} | 'final' | 'T2' | 'Q'], decision rule,
  • preprocessing: { [ ] }, a preprocessing structure (see preprocess) that is used to preprocess data in each class.

Note: with display='off', plots='none', ncomp set to an integer greater than 0, and preprocessing specified, SIMCA can be run without command-line interaction.
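A non-interactive call might look like the sketch below. The data, class sizes, and ncomp value are hypothetical. The sketch assumes that PLS_Toolbox is on the path, that a partial options structure is accepted, and that preprocess('default','autoscale') returns a suitable preprocessing structure. The toolbox calls are guarded so the script does nothing further when simca is not available.

```matlab
% Hypothetical data: 40 samples x 6 variables, two classes of 20 samples each.
data    = [randn(20,6); randn(20,6) + 3];
classid = [ones(20,1); 2*ones(20,1)];

% Options fields as documented above, with display and plotting switched off.
options = struct('display', 'off', 'plots', 'none');

ncomp = 2;   % hypothetical choice; the right number is rarely known a priori

% Run the toolbox calls only if PLS_Toolbox appears to be installed.
if exist('simca', 'file') && exist('preprocess', 'file')
    x = dataset(data);          % DataSet object (assumed PLS_Toolbox constructor)
    x.class{1,1} = classid;     % class information read by simca

    % Assumed form: one preprocessing structure wrapped in a cell.
    options.preprocessing = {preprocess('default', 'autoscale')};

    model = simca(x, ncomp, options);   % one PCA submodel per class
    pred  = simca(x, model, options);   % predictions (here on the same data)
    disp(pred.nclass);                  % predicted class number per sample
end
```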

See Also

cluster, crossval, discrimprob, knn, modelselector, pca, plsda