Plsda

From Eigenvector Research Documentation Wiki
Revision as of 08:36, 27 January 2011 by Jeremy

Purpose

Partial least squares discriminant analysis.

Synopsis

model = plsda(x,y,ncomp,options)
model = plsda(x,ncomp,options)
pred = plsda(x,model,options)
valid = plsda(x,y,model,options)

Description

PLSDA is a multivariate inverse least squares discrimination method used to classify samples. The y-block in a PLSDA model indicates which samples are in the class(es) of interest through either:

  • (A) a column vector of class numbers indicating class assignments:
   y = [1 1 3 2]';
NOTE: If x is a dataset whose rows have classes assigned, y can be omitted. Form (A) is then assumed, using the first class set of the x-block rows.
  • (B) a matrix of one or more columns containing a logical zero (= not in class) or one (= in class) for each sample (row):
   y = [1 0 0;
        1 0 0;
        0 0 1;
        0 1 0]
NOTE: When a vector of class numbers is used (case A, above), class zero (0) is reserved for "unknown" samples. Samples of class zero are therefore never used when calibrating a PLSDA model, although the model does include predictions for them.
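The relationship between forms (A) and (B) can be sketched in a few lines of Python. This is a hypothetical helper (`classes_to_logical`), not the toolbox's class2logical function. It assumes the columns are ordered by ascending class number, which reproduces the example matrix shown for form (B), and it flags class-zero rows so they can be left out of calibration.

```python
def classes_to_logical(y):
    """Convert a class-number vector (form A) into a 0/1 matrix (form B).

    Hypothetical helper for illustration only. Columns follow the sorted
    nonzero class numbers; class 0 ("unknown") gets no column, and its
    rows are flagged as not usable for calibration.
    """
    labels = sorted({c for c in y if c != 0})
    matrix = [[1 if c == lab else 0 for lab in labels] for c in y]
    calibrate = [c != 0 for c in y]  # False for class-0 (unknown) samples
    return labels, matrix, calibrate


labels, Y, use = classes_to_logical([1, 1, 3, 2])
print(labels)  # [1, 2, 3]
print(Y)       # [[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

A class-zero sample simply becomes an all-zero row with its calibration flag set to False; it can still be predicted, but it contributes nothing to the model fit.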

The prediction from a PLSDA model is a value of nominally zero or one. A value closer to zero indicates that the new sample is NOT in the modeled class; a value closer to one indicates that the sample is in the modeled class. In practice, a threshold between zero and one is determined: a sample predicted above the threshold is assigned to the class, and a sample predicted below it is not (see, for example, plsdthres). Similarly, the probability of a sample being inside or outside the class can be calculated using discrimprob. The predicted probability of each class is included in the output model structure in the field:

model.details.predprobability
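A minimal sketch of how thresholded predictions turn into class calls, and how per-class probabilities could be reduced to a single assignment. The helper names and the fixed threshold of 0.5 in the usage example are hypothetical; in practice the threshold would come from a tool such as plsdthres. The probability layout (one row per sample, one column per class) is an assumption about how values like those in model.details.predprobability might be arranged.

```python
def assign_by_threshold(y_pred, thresholds):
    """Return 0/1 class-membership calls for each sample and class.

    y_pred: list of rows, one predicted value per modeled class.
    thresholds: one threshold per class, between 0 and 1.
    A sample is called "in class" when its prediction is above the threshold.
    """
    return [[1 if v > t else 0 for v, t in zip(row, thresholds)]
            for row in y_pred]


def most_probable_class(probabilities, labels):
    """For each sample, pick the label whose class probability is highest."""
    return [labels[max(range(len(row)), key=row.__getitem__)]
            for row in probabilities]


preds = [[0.92, 0.10], [0.35, 0.71], [0.48, 0.45]]
print(assign_by_threshold(preds, [0.5, 0.5]))  # [[1, 0], [0, 1], [0, 0]]
print(most_probable_class([[0.8, 0.2], [0.3, 0.7]], [1, 2]))  # [1, 2]
```

Note that the third sample falls below both thresholds and is assigned to neither class, which is a legitimate outcome of thresholding each class independently.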

Inputs

  • x = X-block (predictor block), class "double" or "dataset".
  • y = Y-block
    • OPTIONAL if x is a dataset containing classes for sample mode (mode 1)
    • otherwise, y is one of the following:
      • (A) column vector of sample classes for each sample in x
      • (B) a logical array with '1' indicating class membership for each sample (rows) in one or more classes (columns), or
      • (C) a cell array of groupings of x-block classes. For example, {[1 2] [3]} would model classes 1 and 2 as a single group against class 3.
  • ncomp = the number of latent variables to be calculated (positive integer scalar).
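The cell-array form (C) can be pictured as building one logical column per group. The sketch below is a hypothetical helper (`groups_to_logical`) that mirrors {[1 2] [3]} with the Python list [[1, 2], [3]]. Samples whose class appears in no group get an all-zero row here; how plsda itself treats such samples is not stated on this page.

```python
def groups_to_logical(classes, groups):
    """Build one 0/1 column per group of x-block classes.

    Hypothetical helper for illustration only.
    classes: class number of each sample.
    groups: list of groups, each a list of class numbers; for example
            [[1, 2], [3]] mirrors the cell array {[1 2] [3]}.
    """
    return [[1 if c in group else 0 for group in groups] for c in classes]


print(groups_to_logical([1, 2, 3, 3, 2], [[1, 2], [3]]))
# [[1, 0], [1, 0], [0, 1], [0, 1], [1, 0]]
```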

Optional Inputs

  • options = an optional input options structure (see below)

Outputs

  • model = standard model structure containing the PLSDA model (See MODELSTRUCT).
  • pred = structure array with predictions
  • valid = structure array with predictions that also includes the known class information (y-block data) of the test samples

Note: Calling plsda with no inputs starts the graphical user interface (GUI) for this analysis method.

Options

options = a structure that can contain the following fields:

  • display: [ 'off' | {'on'} ] governs level of display to command window.
  • plots: [ 'none' | {'final'} ] governs level of plotting.
  • preprocessing: {[] []} preprocessing structures for x and y blocks (see PREPROCESS).
  • orthogonalize: [ {'off'} | 'on' ] Orthogonalize model to condense y-block variance into first latent variable; 'on' = produce orthogonalized model. Regression vector and predictions are NOT changed by this option, only the loadings, weights, and scores.
  • priorprob: [ ] Vector of prior probabilities of observing each class. If any class prior is "Inf", the frequency with which that class is observed in the calibration set is used as its prior probability. If all priors are Inf, this has the effect of providing the fewest incorrect predictions, assuming that the probability of observing a given class in future samples is similar to the frequency of that class in the calibration set. The default [] uses all ones, i.e., equal priors. NOTE: The "prior" option from older versions of the software had a bug that caused inverted behavior for this feature. The field name was changed to avoid confusion after the bug was fixed.
  • algorithm: [ 'nip' | {'sim'} | 'dspls' | 'robustpls' ] PLS algorithm to use: NIPALS, SIMPLS, DSPLS, or robust PLS.
  • blockdetails: [ 'compact' | {'standard'} | 'all' ] Extent of detail included in model. 'standard' keeps only the y-block; 'all' keeps both the x- and y-blocks.
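The priorprob rules (empty means equal priors; Inf means "use the calibration frequency") can be sketched as follows. Both helpers are hypothetical. In particular, `weight_by_priors` assumes the priors enter through Bayes' rule applied to per-class likelihoods, which is the standard way priors are used but is not spelled out on this page, so treat it as an illustration of the idea rather than the toolbox's computation.

```python
import math


def resolve_priors(priorprob, calibration_classes, labels):
    """Return the prior used for each class in labels (hypothetical helper).

    An empty priorprob means equal priors (all ones). Any entry equal to
    math.inf is replaced by that class's observed frequency among the
    calibration samples.
    """
    if not priorprob:
        return [1.0] * len(labels)
    n = len(calibration_classes)
    return [calibration_classes.count(lab) / n if math.isinf(p) else p
            for p, lab in zip(priorprob, labels)]


def weight_by_priors(likelihoods, priors):
    """Combine per-class likelihoods with priors via Bayes' rule (normalized).

    Assumption for illustration: posterior is proportional to prior * likelihood.
    """
    weighted = [lik * p for lik, p in zip(likelihoods, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]


priors = resolve_priors([math.inf, math.inf], [1, 1, 1, 2], [1, 2])
print(priors)                                # [0.75, 0.25]
print(weight_by_priors([0.5, 0.5], priors))  # [0.75, 0.25]
```

With all priors set to Inf, an ambiguous sample (equal likelihoods) is pulled toward the class that was more common in calibration, which is why this setting minimizes misclassifications when future samples follow the calibration class frequencies.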

See Also

class2logical, compressmodel, crossval, discrimprob, pls, plsdaroc, plsdthres, simca, svmda