Plsda

From Eigenvector Research Documentation Wiki
Latest revision as of 09:19, 8 December 2023

Purpose

Partial least squares discriminant analysis.

Synopsis

plsda - Launches an Analysis window with the PLSDA method selected
model = plsda(x,y,ncomp,options)
model = plsda(x,ncomp,options)
pred = plsda(x,model,options)
valid = plsda(x,y,model,options)

Note that the recommended way to build and apply a PLSDA model from the command line is to use the Model Object; see the EVRIModel_Objects wiki page on building and applying models using the Model Object.

Description

PLSDA is a multivariate inverse least squares discrimination method used to classify samples. The y-block in a PLSDA model indicates which samples are in the class(es) of interest through either:

  • (A) a column vector of class numbers indicating class assignments:
   y = [1 1 3 2]';
NOTE: if classes are assigned in the input (x), y can be omitted and this option will be assumed using the first class set of the x-block rows (or other set if the option "classset" is used). For information on assigning classes to the X-block, see Assigning Sample Classes.
  • (B) a matrix of one or more columns containing a logical zero (= not in class) or one (= in class) for each sample (row):
   y = [1 0 0;
        1 0 0;
        0 0 1;
        0 1 0]
NOTE: When a vector of class numbers is used (case A, above), class zero (0) is reserved for "unknown" samples and, thus, samples of class zero are never used when calibrating a PLSDA model. The model will include predictions for these samples.
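
The two y-block forms above carry the same information. The following is a minimal sketch in plain MATLAB (no toolbox calls) of how the class-number vector of form (A) maps onto the logical matrix of form (B). It assumes one column per class in ascending class order and leaves out the reserved class 0. It only illustrates the relationship and is not the toolbox's own conversion routine.

```matlab
% Class-number form (A); a value of 0 would mark an "unknown" sample.
y = [1 1 3 2]';

% Sorted row vector of modeled classes, leaving out the reserved class 0.
classes = unique(y(y ~= 0))';

% Logical form (B): one column per class, true where the sample belongs.
ylogical = bsxfun(@eq, y, classes);

disp(double(ylogical))
```

For this y the result is the 4-by-3 matrix shown under (B).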

Probability-based Predictions

The raw prediction from a PLSDA model is a value of nominally zero or one. A value closer to zero indicates the new sample is NOT in the modeled class; a value closer to one indicates the sample is in the modeled class. In practice, a threshold between zero and one is determined, above which a sample is assigned to the class and below which it is not (see, for example, plsdthres). Similarly, the probability of a sample being inside or outside the class can be calculated using discrimprob. The predicted probability of each class, as well as class assignments made with various rules, can be found in the field:

model.classification

For more details, see Sample Classification Predictions, and the description of the model's classification field in the Standard Model Structure.
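
As a rough illustration of the thresholding idea described above, the sketch below assigns class membership from raw y-predictions in plain MATLAB. The prediction values and the threshold of 0.5 are made-up placeholders; in practice the threshold would be estimated from calibration data (for example with plsdthres, whose calling syntax is not shown here).

```matlab
% Hypothetical raw y-predictions for one modeled class (illustrative values).
ypred = [0.92; 0.08; 0.61; 0.35];

% Placeholder threshold; an assumption of this sketch, not a toolbox default.
threshold = 0.5;

% Samples above the threshold are assigned to the class, the rest are not.
inclass = ypred > threshold;

disp(inclass')
```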

Threshold-based Predictions

The classification results can also be viewed based on each sample's prediction relative to the threshold for that class. These can differ slightly from the probability-based predictions. The probability-based predictions are likely to be more accurate when one class is narrowly distributed in the y-prediction range while other classes are broadly distributed, so that the broad classes are more probable for y-prediction values far from the narrow class's probable y range (see http://www.eigenvector.com/faq/index.php?id=38).

  • In the PLSDA Analysis window, the threshold-based classification results can be viewed by using the menu "Tools"->"Show Details"->"Model", or by mousing over the model icon. This reports the Sensitivity, Specificity, and Class Error for each modeled class. The "Class Err." is defined as the mean of the false positive and false negative rates (see the definitions at https://en.wikipedia.org/wiki/Sensitivity_and_specificity#Definitions).
  • For command line usage, these are found in the model object as model.detail.misclassedc, a cell array containing a matrix for each class, and model.detail.classerrc. For class j, with ncomp = number of latent variables used in the model:
    False positive rate (1 - specificity): model.detail.misclassedc{j}(1, ncomp)
    False negative rate (1 - sensitivity): model.detail.misclassedc{j}(2, ncomp)
    Class Error: model.detail.classerrc(j, ncomp)
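
To make these quantities concrete, here is a small worked sketch in plain MATLAB that computes the false positive rate, false negative rate, and class error for a single class from known and assigned memberships. The membership vectors are invented for illustration, and class error is taken as the mean of the false positive and false negative rates; the code does not read anything from a model structure.

```matlab
% Known membership (true = in class) and threshold-based assignment
% for eight hypothetical samples.
ytrue = logical([1 1 1 0 0 0 0 0]');
yhat  = logical([1 1 0 0 0 0 1 0]');

% False positive rate = 1 - specificity.
fpr = sum(yhat & ~ytrue) / sum(~ytrue);

% False negative rate = 1 - sensitivity.
fnr = sum(~yhat & ytrue) / sum(ytrue);

% Class error as the mean of the two rates.
classerr = mean([fpr fnr]);

fprintf('FPR = %.3f, FNR = %.3f, class error = %.3f\n', fpr, fnr, classerr);
```

Here one of five out-of-class samples is wrongly accepted and one of three in-class samples is wrongly rejected.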

Inputs

  • x = X-block (predictor block), class "double" or "dataset".
  • y = Y-block
    • OPTIONAL if x is a dataset containing classes for sample mode (mode 1)
    • otherwise, y is one of the following:
      • (A) column vector of sample classes for each sample in x
      • (B) a logical array with '1' indicating class membership for each sample (rows) in one or more classes (columns), or
      • (C) a cell array of class groupings of classes from the x-block data. For example: {[1 2] [3]} would model classes 1 and 2 as a single group against class 3.
  • ncomp = the number of latent variables to be calculated (positive integer scalar).
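
The class grouping of form (C) can be pictured as follows. This plain-MATLAB sketch shows what the grouping {[1 2] [3]} means in terms of a logical membership matrix, using a made-up vector of class numbers. It illustrates the intended meaning only and is not how the toolbox processes the cell array internally.

```matlab
% Hypothetical class numbers assigned to six samples in the x-block.
classes = [1 2 3 1 2 3]';

% Grouping from the example above: classes 1 and 2 together, against class 3.
groups = {[1 2] [3]};

% One logical column per group, true where the sample's class is in the group.
ygrouped = false(numel(classes), numel(groups));
for g = 1:numel(groups)
    ygrouped(:, g) = ismember(classes, groups{g});
end

disp(double(ygrouped))
```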

Optional Inputs

  • options = an optional input options structure (see below)

Outputs

  • model = standard model structure containing the PLSDA model (See Standard Model Structure).
  • pred = structure array with predictions
  • valid = structure array with predictions, includes known class information (Y block data) of test samples

Note: Calling plsda with no inputs starts the graphical user interface (GUI) for this analysis method.

For more information on class predictions, see Sample Classification Predictions.
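
The calling forms listed in the Synopsis might be combined as in the sketch below. It assumes PLS_Toolbox is on the MATLAB path; the data are random placeholders, the variable names are arbitrary, and passing an options structure with only some fields set is an assumption of this sketch. The calls are guarded so that the script does nothing harmful where plsda is unavailable.

```matlab
% Random placeholder data: 20 calibration and 5 test samples, 10 variables.
xcal  = randn(20, 10);
ycal  = [ones(10, 1); 2 * ones(10, 1)];   % class numbers, form (A)
xtest = randn(5, 10);
ytest = [1 1 2 2 1]';                     % known classes of the test samples
ncomp = 2;                                % number of latent variables

% Fields as documented under Options; a partial structure is an assumption.
options = struct('display', 'off', 'plots', 'none');

% Only call the toolbox function when it is actually installed.
if exist('plsda', 'file')
    model = plsda(xcal, ycal, ncomp, options);    % calibrate a model
    pred  = plsda(xtest, model, options);         % apply to new samples
    valid = plsda(xtest, ytest, model, options);  % apply with known classes
end
```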

Options

options = a structure that can contain the following fields:

  • display: [ 'off' | {'on'} ] governs level of display to command window.
  • plots: [ 'none' | {'final'} ] governs level of plotting.
  • preprocessing: {[] []} preprocessing structures for x and y blocks (see PREPROCESS).
  • orthogonalize: [ {'off'} | 'on' ] Orthogonalize model to condense y-block variance into first latent variable; 'on' = produce orthogonalized model. Regression vector and predictions are NOT changed by this option, only the loadings, weights, and scores. See orthogonalizepls for more information.
  • priorprob: [ ] Vector of prior probabilities of observing each class. If any class prior is "Inf", the frequency of observation of that class in the calibration set is used as its prior probability. If all priors are Inf, this has the effect of providing the fewest incorrect predictions, assuming that the probability of observing a given class in future samples is similar to the frequency of that class in the calibration set. The default [] uses all ones, i.e. equal priors. NOTE: the "prior" option from older versions of the software had a bug which caused inverted behavior for this feature. The field name was changed to avoid confusion after the bug was fixed.
  • classset: [ 1 ] indicates which class set in x to use when no y-block is provided.
  • algorithm: [ 'nip' | {'sim'} | 'dspls' | 'robustpls' ] PLS algorithm to use: NIPALS, SIMPLS, DSPLS, or robust PLS.
  • blockdetails: [ 'compact' | {'standard'} | 'all' ] level of detail (predictions, raw residuals, and calibration data) included in the model.
    • 'standard' = keep predictions and raw residuals for the Y-block only (Y-block included).
    • 'compact' = for this function, 'compact' is identical to 'standard'.
    • 'all' = keep predictions and raw residuals for both X- and Y-blocks, as well as the X- and Y-blocks themselves.
  • strictthreshold: Probability threshold value to use in strict class assignment, see Sample_Classification_Predictions#Class_Pred_Strict. Default = 0.5.
  • confidencelimit: [ {0.95} ] confidence level for Q and T2 limits. A value of zero (0) disables calculation of confidence limits.
  • roptions: structure of options to pass to rsimpls (robust PLS engine from the Libra Toolbox).
    • alpha: [ {0.75} ], (1-alpha) measures the number of outliers the algorithm should resist. Any value between 0.5 and 1 may be specified. These options are only used when algorithm is 'robustpls'.
  • weights: [ {'none'} | 'hist' | 'custom' ] governs sample weighting. 'none' does no weighting. 'hist' performs histogram weighting in which large numbers of samples at individual y-values are down-weighted relative to small numbers of samples at other values. 'custom' uses the weighting specified in the weightsvect option.
  • weightsvect: [ ] Used only with custom weights. The vector specified must be equal in length to the number of samples in the y block and each element is used as a weight for the corresponding sample. If empty, no sample weighting is done.
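
As an illustration of the fields above, the following sketch assembles an options structure in plain MATLAB, including custom sample weights. The inverse-class-frequency weighting is a hypothetical choice made for this example, not a toolbox recommendation, and whether plsda accepts a structure with only these fields set is an assumption. The structure would then be passed as the last input, e.g. plsda(x,y,ncomp,options).

```matlab
% Hypothetical class numbers for seven calibration samples.
y = [1 1 1 1 2 2 3]';

options = struct();
options.display   = 'off';          % no command-window output
options.plots     = 'none';         % no plots
options.algorithm = 'sim';          % SIMPLS (the documented default)
options.priorprob = [Inf Inf Inf];  % use calibration class frequencies as priors
options.weights   = 'custom';       % take weights from weightsvect

% One weight per sample: the inverse of the size of that sample's class.
[~, ~, idx] = unique(y);
counts = accumarray(idx, 1);
options.weightsvect = 1 ./ counts(idx);

disp(options.weightsvect')
```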

See Also

analysis, class2logical, compressmodel, crossval, discrimprob, knn, lda, modelselector, pls, plsdaroc, plsdthres, preprocess, simca, svmda, vip, EVRIModel_Objects