Plsda
Purpose
Partial least squares discriminant analysis.
Synopsis
- model = plsda(x,y,ncomp,options)
- model = plsda(x,ncomp,options)
- pred = plsda(x,model,options)
- valid = plsda(x,y,model,options)
Description
PLSDA is a multivariate inverse least squares discrimination method used to classify samples. The y-block in a PLSDA model indicates which samples are in the class(es) of interest through either:
- (A) a column vector of class numbers indicating class assignments:
y = [1 1 3 2]';
- NOTE: If classes are assigned to the samples of the X-block (x is a dataset), y can be omitted. Option (A) is then assumed, using the first class set of the x-block rows (or the class set specified by the "classset" option). For information on assigning classes to the X-block, see Assigning Sample Classes.
- (B) a matrix of one or more columns containing a logical zero (= not in class) or one (= in class) for each sample (row):
y = [1 0 0; 1 0 0; 0 0 1; 0 1 0]
- NOTE: When a vector of class numbers is used (case A, above), class zero (0) is reserved for "unknown" samples. Samples of class zero are therefore never used to calibrate a PLSDA model, although the model will still include predictions for them.
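The relationship between forms (A) and (B) can be sketched as follows. This is written in Python rather than MATLAB so that it runs without the toolbox. The helper name `classes_to_logical` is hypothetical. It is similar in spirit to class2logical (see See Also), but it does not reproduce that function's exact behavior. It builds one 0/1 column per nonzero class, sorted by class number. It also flags class-0 ("unknown") rows so they can be left out of calibration.

```python
def classes_to_logical(y):
    """Convert a class-number vector (form A) to a 0/1 matrix (form B).

    Hypothetical helper for illustration only. Class 0 is treated as
    "unknown": it gets no column, its rows are all zeros, and it is
    flagged as not usable for calibration.
    """
    labels = sorted({c for c in y if c != 0})          # one column per nonzero class
    Y = [[1 if c == lab else 0 for lab in labels] for c in y]
    use_for_calibration = [c != 0 for c in y]          # class-0 rows are excluded
    return labels, Y, use_for_calibration


labels, Y, use = classes_to_logical([1, 1, 3, 2])
print(labels)  # [1, 2, 3]
print(Y)       # [[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

With y = [1 1 3 2]', the result matches the form (B) matrix shown above.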
The raw prediction from a PLSDA model is a value that is nominally zero or one. A value closer to zero indicates that the new sample is NOT in the modeled class; a value closer to one indicates that it is. In practice, a threshold between zero and one is determined, above which a sample is assigned to the class and below which it is not (see, for example, plsdthres). Similarly, the probability of a sample being inside or outside the class can be calculated using discrimprob. The predicted probability of each class, as well as class assignments made with various rules, can be found in the field:
- model.classification
For more details, see the classification field in the Standard Model Structure description.
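One common way to turn raw predictions into class probabilities is Bayes' rule. The sketch below applies it, assuming that the predictions for in-class and out-of-class calibration samples each follow a normal distribution. It is a Python illustration of that general idea and not the code of discrimprob or plsdthres. The helper names (`fit_normal`, `prob_in_class`, `most_probable`) and the calibration numbers are made up. The prediction where the probability crosses 0.5 is one possible choice of threshold. "Most probable" is shown as one example of an assignment rule, not necessarily one of the rules stored in model.classification.

```python
import math


def fit_normal(values):
    """Mean and sample standard deviation of a list of predictions."""
    mu = sum(values) / len(values)
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / (len(values) - 1))
    return mu, sd


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def prob_in_class(yhat, in_fit, out_fit, prior_in=0.5):
    """Bayes' rule: P(in class | yhat), assuming normal prediction distributions.

    Note: a + b can underflow to 0 far from both means; a more robust
    version would work with log densities.
    """
    a = prior_in * normal_pdf(yhat, *in_fit)
    b = (1 - prior_in) * normal_pdf(yhat, *out_fit)
    return a / (a + b)


def most_probable(class_probs):
    """Index of the most probable class for one sample (one possible rule)."""
    return max(range(len(class_probs)), key=lambda k: class_probs[k])


# Made-up calibration predictions for samples known to be in / out of the class.
in_fit = fit_normal([0.99, 0.91, 1.07])
out_fit = fit_normal([0.01, -0.07, 0.09])
p_new = prob_in_class(0.8, in_fit, out_fit)  # close to 1 for these made-up fits
```

The `prior_in` argument is where a prior probability, such as one set through the priorprob option, would enter the calculation.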
Inputs
- x = X-block (predictor block), class "double" or "dataset".
- y = Y-block
- OPTIONAL if x is a dataset containing classes for sample mode (mode 1)
- otherwise, y is one of the following:
- (A) column vector of sample classes for each sample in x
- (B) a logical array with '1' indicating class membership for each sample (rows) in one or more classes (columns), or
- (C) a cell array of groups of x-block classes. For example, {[1 2] [3]} would model classes 1 and 2 together as a single group against class 3.
- ncomp = the number of latent variables to be calculated (positive integer scalar).
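Form (C) amounts to building one 0/1 y-block column per group rather than one per class. The Python sketch below shows that mapping for a cell array such as {[1 2] [3]}, written here as a list of lists. The helper name `groups_to_logical` is hypothetical. So is the handling of samples whose class is in no group: this sketch flags them as unused, and the toolbox may treat them differently.

```python
def groups_to_logical(classes, groups):
    """One 0/1 column per group of classes, mimicking a cell array like {[1 2] [3]}.

    Samples whose class belongs to no group get an all-zero row and are
    flagged as unused (an assumption of this sketch).
    """
    Y = [[1 if c in group else 0 for group in groups] for c in classes]
    used = [any(row) for row in Y]
    return Y, used


Y, used = groups_to_logical([1, 2, 3, 3, 1, 4], [[1, 2], [3]])
print(Y)     # [[1, 0], [1, 0], [0, 1], [0, 1], [1, 0], [0, 0]]
print(used)  # [True, True, True, True, True, False]
```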
Optional Inputs
- options = an optional input options structure (see below)
Outputs
- model = standard model structure containing the PLSDA model (See MODELSTRUCT).
- pred = structure array with predictions
- valid = structure array with predictions, includes known class information (Y block data) of test samples
For more information on class predictions, see Sample Classification Predictions. Note: Calling plsda with no inputs starts the graphical user interface (GUI) for this analysis method.
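The model and pred outputs come from fitting a PLS regression to the 0/1 y-block and then applying it to samples. To illustrate what that involves for a single class column, here is a textbook NIPALS PLS1 sketch in Python. It is not PLS_Toolbox code, and the default algorithm option is SIMPLS, not NIPALS. The helper names and the calibration data are made up, and the fixed threshold of 0.5 is hypothetical; plsdthres would choose a threshold from the data. The `ncomp` argument plays the role of the ncomp input.

```python
def _mean(v):
    return sum(v) / len(v)


def pls1_fit(X, y, ncomp):
    """Textbook NIPALS PLS1 for one y column (illustrative, not PLS_Toolbox)."""
    n, m = len(X), len(X[0])
    xmean = [_mean([row[j] for row in X]) for j in range(m)]
    ymean = _mean(y)
    E = [[row[j] - xmean[j] for j in range(m)] for row in X]  # centered X residual
    f = [yi - ymean for yi in y]                              # centered y residual
    comps = []
    for _ in range(ncomp):
        # weights: x-space direction with maximal covariance with the y residual
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(wj * wj for wj in w) ** 0.5
        if norm < 1e-12:  # y residual is fully explained; stop early
            break
        w = [wj / norm for wj in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]       # scores
        tt = sum(ti * ti for ti in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        # deflate X and y before extracting the next latent variable
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return xmean, ymean, comps


def pls1_predict(model, x):
    """Predict y for one sample by replaying the deflation steps."""
    xmean, ymean, comps = model
    e = [xj - mj for xj, mj in zip(x, xmean)]
    yhat = ymean
    for w, p, q in comps:
        t = sum(ej * wj for ej, wj in zip(e, w))
        yhat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return yhat


# Made-up calibration data: the first three samples are in the class (y = 1).
X_cal = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [3.0, 0.5], [3.2, 0.4], [2.9, 0.7]]
y_cal = [1, 1, 1, 0, 0, 0]
model = pls1_fit(X_cal, y_cal, ncomp=1)
preds = [pls1_predict(model, x) for x in X_cal]
in_class = [yh > 0.5 for yh in preds]  # hypothetical fixed threshold of 0.5
```

For this data, `preds` lands close to, but not exactly on, 1 for the first three samples and 0 for the last three. This is why a threshold or a probability model is still needed to make class assignments.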
Options
options = a structure that can contain the following fields:
- display: [ 'off' | {'on'} ] governs level of display to command window.
- plots: [ 'none' | {'final'} ] governs level of plotting.
- preprocessing: {[] []} preprocessing structures for x and y blocks (see PREPROCESS).
- orthogonalize: [ {'off'} | 'on' ] Orthogonalize the model to condense y-block variance into the first latent variable; 'on' = produce an orthogonalized model. The regression vector and predictions are NOT changed by this option; only the loadings, weights, and scores are. See orthogonalizepls for more information.
- priorprob: [ ] Vector of prior probabilities of observing each class. If any class prior is "Inf", the frequency with which that class is observed in the calibration set is used as its prior probability. If all priors are Inf, this has the effect of providing the fewest incorrect predictions, assuming that the probability of observing a given class in future samples is similar to the frequency of that class in the calibration set. The default [] uses all ones, i.e., equal priors. NOTE: the "prior" option from older versions of the software had a bug that caused inverted behavior for this feature. The field name was changed to avoid confusion after the bug was fixed.
- classset: [ 1 ] indicates which class set in x to use when no y-block is provided.
- algorithm: [ 'nip' | {'sim'} | 'dspls' | 'robustpls' ] PLS algorithm to use: NIPALS, SIMPLS, DSPLS, or robust PLS.
- blockdetails: [ 'compact' | {'standard'} | 'all' ] Extent of detail included in the model. 'standard' keeps only the y-block; 'all' keeps both the x- and y-blocks.
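The priorprob rules can be summarized in a short Python sketch: an empty vector means all ones, and an Inf entry means that class's calibration frequency. The helper name `resolve_priors` is hypothetical. The sketch uses finite values as given and does not normalize them, because this page does not say whether or how the toolbox rescales mixed finite and Inf priors. The calibration class list should exclude class 0, since those samples are not used in calibration.

```python
import math


def resolve_priors(priorprob, calibration_classes, labels):
    """Turn a priorprob-style vector into one prior per class (sketch).

    - empty          -> all ones (equal priors)
    - math.inf entry -> that class's frequency in the calibration set
    - finite entry   -> used as given (no normalization in this sketch)
    calibration_classes should already exclude class 0 ("unknown") samples.
    """
    if not priorprob:
        return [1.0] * len(labels)
    n = len(calibration_classes)
    priors = []
    for lab, p in zip(labels, priorprob):
        if math.isinf(p):
            priors.append(calibration_classes.count(lab) / n)
        else:
            priors.append(float(p))
    return priors


print(resolve_priors([math.inf, math.inf], [1, 1, 1, 2], [1, 2]))  # [0.75, 0.25]
```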
See Also
class2logical, compressmodel, crossval, discrimprob, knn, modelselector, pls, plsdaroc, plsdthres, simca, svmda, vip