Plsda: Difference between revisions

From Eigenvector Research Documentation Wiki
imported>Chuck

Revision as of 16:36, 8 October 2008

Purpose

Partial least squares discriminant analysis.

Synopsis

model = plsda(x,y,ncomp,options)
model = plsda(x,ncomp,options)
pred = plsda(x,model,options)
valid = plsda(x,y,model,options)

Description

PLSDA is a multivariate inverse least squares discrimination method used to classify samples. The y-block in a PLSDA model indicates which samples are in the class(es) of interest through either:

  • (A) a column vector of class numbers indicating class assignments:
   y = [1 1 3 2]';
NOTE: If classes are already assigned in the input (x), y can be omitted; option (A) is then assumed, using the first class set of the x-block rows.
  • (B) a matrix of one or more columns containing a logical zero (= not in class) or one (= in class) for each sample (row):
   y = [1 0 0;
        1 0 0;
        0 0 1;
        0 1 0]
NOTE: When a vector of class numbers is used (case A, above), class zero (0) is reserved for "unknown" samples and, thus, samples of class zero are never used when calibrating a PLSDA model. The model will include predictions for these samples.
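
The relationship between forms (A) and (B) can be illustrated with a minimal sketch in base MATLAB. It is not a description of how the toolbox performs the conversion internally (class2logical, listed under See Also, is the toolbox function for this kind of task); the variable names are hypothetical.

```matlab
% Class vector (form A); 0 marks an "unknown" sample.
y = [1 1 3 2 0]';

% Unknown samples are left out of calibration.
known = (y ~= 0);

% One column per modeled class, in sorted order: [1 2 3].
classes = unique(y(known))';

% Logical membership matrix (form B): rows = samples, columns = classes.
% bsxfun is used so the line also runs on releases without implicit expansion.
ylogical = bsxfun(@eq, y, classes);
```

The first four rows of ylogical reproduce the matrix shown for form (B); the fifth row is all zeros because the class-zero sample belongs to no modeled class.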

The prediction from a PLSDA model is a value of nominally zero or one. A value closer to zero indicates the new sample is NOT in the modeled class; a value closer to one indicates the sample is in the modeled class. In practice, a threshold between zero and one is determined: a sample whose prediction is above the threshold is assigned to the class, and a sample whose prediction is below it is not (see, for example, plsdthres). Similarly, the probability of a sample being inside or outside the class can be calculated using discrimprob. The predicted probability of each class is included in the output model structure in the field:

model.details.predprobability
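
As a minimal sketch of the thresholding step described above, the following base-MATLAB lines assign samples to a single modeled class. The predicted values are made up for illustration, and the threshold of 0.5 is a hypothetical placeholder; in practice the threshold would be estimated, for example with plsdthres.

```matlab
% Made-up predicted y-values for five samples and one modeled class.
ypred = [0.92; 0.08; 0.55; -0.10; 0.47];

% Hypothetical threshold (an assumption, not a toolbox default).
threshold = 0.5;

% true = sample assigned to the modeled class, false = not in the class.
inclass = ypred > threshold;
```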

Inputs

  • x = X-block (predictor block), class "double" or "dataset".
  • y = Y-block
    • OPTIONAL if x is a dataset containing classes for sample mode (mode 1)
    • otherwise, y is one of the following:
      • (A) column vector of sample classes for each sample in x
      • (B) a logical array with '1' indicating class membership for each sample (rows) in one or more classes (columns), or
      • (C) a cell array that groups classes from the x-block data. For example, {[1 2] [3]} would model classes 1 and 2 as a single group against class 3.
  • ncomp = the number of latent variables to be calculated (positive integer scalar).
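
The effect of the class grouping in form (C) can be pictured with a short base-MATLAB sketch. It only shows which samples would fall into each group; how the toolbox handles the grouping internally is not described here, and the variable names are hypothetical.

```matlab
% Sample classes as they might be stored with the x-block.
y = [1 2 3 3 1 2]';

% Grouping from the example above: classes 1 and 2 together, class 3 alone.
groups = {[1 2] [3]};

% One logical column per group: true where the sample's class is in that group.
ygrouped = false(numel(y), numel(groups));
for k = 1:numel(groups)
    ygrouped(:,k) = ismember(y, groups{k});
end
```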

Optional Inputs

  • options = an optional input options structure (see below)

Outputs

  • model = standard model structure containing the PLSDA model (See MODELSTRUCT).
  • pred = structure array with predictions
  • valid = structure array with predictions, includes known class information (Y block data) of test samples
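
The three outputs correspond to the call forms in the Synopsis. The following is a minimal sketch on synthetic data, not a recommended workflow. It requires the toolbox, and it assumes that a partially filled options structure is accepted, with unspecified fields taking their defaults. The data and variable names are made up for illustration.

```matlab
% Synthetic data for illustration only: 30 samples, 10 variables, 3 classes.
rng(0);
ycal = [ones(10,1); 2*ones(10,1); 3*ones(10,1)];
xcal = randn(30,10) + ycal*ones(1,10);    % offset each class by its class number

ytest = [1; 2; 3; 3];
xtest = randn(4,10) + ytest*ones(1,10);

% Assumption: fields not set here fall back to the documented defaults.
options.display = 'off';
options.plots   = 'none';

ncomp = 2;                                     % number of latent variables
model = plsda(xcal, ycal, ncomp, options);     % calibrate a model
pred  = plsda(xtest, model, options);          % predictions for new samples
valid = plsda(xtest, ytest, model, options);   % predictions plus known classes

% Field named in the Description above.
prob = model.details.predprobability;
```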

Note: Calling plsda with no inputs starts the graphical user interface (GUI) for this analysis method.

Options

  • options = a structure that can contain the following fields:
    • display: [ 'off' | {'on'} ] governs level of display to command window.
    • plots: [ 'none' | {'final'} ] governs level of plotting.
    • preprocessing: {[] []} preprocessing structures for x and y blocks (see PREPROCESS).
    • prior: [ ] Vector of prior probabilities of observing each class. If any class prior is "Inf", the frequency of observation of that class in the calibration set is used as its prior probability. If all priors are Inf, this has the effect of providing the fewest incorrect predictions, assuming that the probability of observing a given class in future samples is similar to the frequency of that class in the calibration set. The default, [], uses all ones, i.e., equal priors.
    • algorithm: [ 'nip' | {'sim'} ] PLS algorithm to use: NIPALS or SIMPLS
    • blockdetails: [ 'compact' | {'standard'} | 'all' ] Extent of detail included in model. 'standard' keeps only the y-block; 'all' keeps both x- and y-blocks.
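
The "frequency of observation" that stands in for an Inf prior can be illustrated with a few lines of base MATLAB. This sketch only computes class frequencies in a made-up calibration vector. Whether the toolbox normalizes them in exactly this way is an assumption, and excluding class zero follows the note on unknown samples in the Description.

```matlab
% Made-up calibration classes; 0 = unknown, not used in calibration.
ycal = [1 1 1 2 2 3 0]';

known   = ycal(ycal ~= 0);
classes = unique(known);                             % [1; 2; 3]

% Number of calibration samples in each class.
counts = arrayfun(@(c) sum(known == c), classes);    % [3; 2; 1]

% Relative frequency of each class (assumed normalization: sums to one).
freq = counts / sum(counts);
```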

See Also

class2logical, compressmodel, crossval, discrimprob, pls, plsdaroc, plsdthres, simca