Pls: Difference between revisions

From Eigenvector Research Documentation Wiki

Latest revision as of 13:52, 6 February 2020

Purpose

Partial least squares regression for univariate or multivariate y-block.

Synopsis

model = pls(x,y,ncomp,options) %identifies model (calibration step)
pred = pls(x,model,options) %makes predictions with a new X-block
valid = pls(x,y,model,options) %makes predictions with new X- & Y-block
pls % launches analysis window with PLS selected

Please note that the recommended way to build and apply a PLS model from the command line is to use the Model Object. See the EVRIModel_Objects wiki page on building and applying models using the Model Object.

Description

PLS calculates a single partial least squares regression model using the given number of components ncomp to predict a dependent variable y from a set of independent variables x.

Alternatively, PLS can be used in 'prediction mode' to apply a previously built PLS model, model, to an external set of test data in x (2-way array class "double" or "dataset") in order to generate y-values for these data.

Furthermore, if matching x-block and y-block measurements are available for an external test set, PLS can be used in 'validation mode' to predict the y-values of the test data from model and x, and to compare these predicted y-values with the known y-values y.
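
The three calling modes described above can be sketched as follows. This is a minimal illustration, assuming PLS_Toolbox is on the MATLAB path; the random data, the variable names (xcal, ycal, xtest, ytest) and the choice of three components are placeholders, not recommendations.

```matlab
% Hypothetical calibration and test data (random placeholders).
rng(0);                                   % reproducible random numbers
xcal  = randn(30,10);                     % 30 calibration samples, 10 x-variables
ycal  = xcal(:,1:3)*[1; -2; 0.5] + 0.05*randn(30,1);
xtest = randn(8,10);                      % 8 external test samples
ytest = xtest(:,1:3)*[1; -2; 0.5] + 0.05*randn(8,1);

options         = pls('options');         % default options structure
options.display = 'off';                  % suppress command-window output
options.plots   = 'none';                 % suppress plots

ncomp = 3;                                % number of components (illustrative)
model = pls(xcal,ycal,ncomp,options);     % calibration step
pred  = pls(xtest,model,options);         % prediction mode: new X-block only
valid = pls(xtest,ytest,model,options);   % validation mode: new X- and Y-block

yhat = pred.pred{2};                      % y-block predictions for the test samples
```

Here pred.pred{2} is read on the assumption that the prediction structure mirrors the pred field documented for the model under Outputs.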

Note: Calling pls with no inputs starts the graphical user interface (GUI) for this analysis method.

Inputs

  • x = the independent variable (X-block) data (2-way array class "double" or class "dataset")
  • y = the dependent variable (Y-block) data (2-way array class "double" or class "dataset")
  • ncomp = the number of components to be calculated (positive integer scalar)

Outputs

  • model = a standard model structure model with the following fields (see Standard Model Structure):
    • modeltype: 'PLS',
    • datasource: structure array with information about input data,
    • date: date of creation,
    • time: time of creation,
    • info: additional model information,
    • reg: regression vector,
    • loads: cell array with model loadings for each mode/dimension,
    • pred: 2 element cell array with model predictions for each input block:
      • the x-block predictions (when options.blockdetails is 'standard', x-block predictions are not saved and this will be an empty array), and
      • the y-block predictions.
    • wts: double array with X-block weights,
    • tsqs: cell array with T^2 values for each mode,
    • ssqresiduals: cell array with sum of squares residuals for each mode,
    • description: cell array with text description of model, and
    • detail: sub-structure with additional model details and results.
  • pred = a structure, similar to model, that contains scores, predictions, etc. for the new data.
  • valid = a structure, similar to model, that contains scores, predictions, additional y-block statistics, etc. for the new data.

Options

options = a structure array with the following fields:

  • display: [ 'off' | {'on'} ], governs level of display to command window,
  • plots: [ 'none' | {'final'} ], governs level of plotting,
  • outputversion: [ 2 | {3} ], governs output format (see below),
  • preprocessing: {[] []}, two element cell array containing preprocessing structures (see PREPROCESS) defining preprocessing to use on the x- and y-blocks (first and second elements respectively)
  • algorithm: [ 'nip' | {'sim'} | 'dspls' | 'robustpls' ], PLS algorithm to use: NIPALS, SIMPLS {default}, Direct Scores, or robust pls (with automatic outlier detection).
  • orthogonalize: [ {'off'} | 'on' ] Orthogonalize model to condense y-block variance into first latent variable; 'on' = produce orthogonalized model. Regression vector and predictions are NOT changed by this option, only the loadings, weights, and scores. See orthogonalizepls for more information.
  • blockdetails: [ 'compact' | {'standard'} | 'all' ] level of detail (predictions, raw residuals, and calibration data) included in the model.
    • 'standard' = the predictions and raw residuals for the X-block, as well as the X-block itself, are not stored in the model to reduce its size in memory. Specifically, these fields in the model object are left empty: 'model.pred{1}', 'model.detail.res{1}', 'model.detail.data{1}'.
    • 'compact' = for this function, 'compact' is identical to 'standard'.
    • 'all' = keep predictions and raw residuals for both X- & Y-blocks as well as the X- & Y-blocks themselves.
  • confidencelimit: [ {'0.95'} ], confidence level for Q and T2 limits, a value of zero (0) disables calculation of confidence limits,
  • weights: [ {'none'} | 'hist' | 'custom' ] governs sample weighting. 'none' does no weighting. 'hist' performs histogram weighting in which large numbers of samples at individual y-values are down-weighted relative to small numbers of samples at other values. 'custom' uses the weighting specified in the weightsvect option.
  • weightsvect: [ ] Used only with custom weights. The vector specified must be equal in length to the number of samples in the y block and each element is used as a weight for the corresponding sample. If empty, no sample weighting is done.
  • roptions: structure of options to pass to rsimpls (robust PLS engine from the Libra Toolbox).
    • alpha: [ {0.75} ], (1-alpha) measures the number of outliers the algorithm should resist. Any value between 0.5 and 1 may be specified. These options are only used when algorithm is 'robustpls'.

The default options can be retrieved using: options = pls('options');
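
As a rough sketch of how the options listed above might be combined, the following retrieves the defaults, keeps full block detail, and applies custom sample weights. It assumes PLS_Toolbox is on the MATLAB path; the data, the weight values, and the variable names (x, y, w) are hypothetical, and passing the weights as a column vector is an assumption.

```matlab
% Hypothetical data: 30 samples, 10 x-variables, one y-variable.
rng(1);
x = randn(30,10);
y = x(:,1) - 0.5*x(:,2) + 0.05*randn(30,1);

options              = pls('options');   % start from the documented defaults
options.display      = 'off';            % no command-window output
options.plots        = 'none';           % no plots
options.blockdetails = 'all';            % keep X-block predictions, residuals and data

% Custom sample weighting: one weight per sample in the y-block.
w          = ones(30,1);                 % hypothetical weights
w(1:5)     = 0.5;                        % down-weight the first five samples
options.weights     = 'custom';
options.weightsvect = w;

model = pls(x,y,2,options);              % two components chosen for illustration only

xhat = model.pred{1};                    % X-block predictions, kept because blockdetails = 'all'
```

With the default blockdetails = 'standard', model.pred{1} would be left empty, as described in the list above.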

Outputversion

By default (options.outputversion = 3) the output of the function is a standard model structure model. If options.outputversion = 2, the output format is:

[b,ssq,p,q,w,t,u,bin] = pls(x,y,ncomp,options)

where the outputs are as defined for the nippls function. This is provided for backwards compatibility. It is recommended that users call the simpls or nippls functions directly.

There is also a difference in the scores and loadings returned by the old version and the new (default) version. The old version (outputversion=2) keeps the variance in the loadings and normalizes the scores. The new version (outputversion=3) keeps the variance in the scores and normalizes the loadings. The older format follows the usage in the original algorithm publications. The newer format is used to maintain a standardized format across all PLS algorithms (including robust PLS and DSPLS).
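
A minimal sketch of requesting the older output format is shown below, assuming PLS_Toolbox is on the MATLAB path. The data and the number of components are hypothetical placeholders; the meaning of each output is as defined for the nippls function and is not restated here.

```matlab
% Hypothetical data: 25 samples, 6 x-variables, one y-variable.
rng(2);
x = randn(25,6);
y = x(:,1) + 0.5*x(:,3) + 0.05*randn(25,1);

options               = pls('options');  % default options structure
options.display       = 'off';
options.plots         = 'none';
options.outputversion = 2;               % request the backwards-compatible outputs

ncomp = 2;                               % illustrative choice
[b,ssq,p,q,w,t,u,bin] = pls(x,y,ncomp,options);
```

For new code, calling simpls or nippls directly, as recommended above, avoids depending on this compatibility mode.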

Algorithm

Note that unlike previous versions of the PLS function, the default algorithm (see Options, above) is the faster SIMPLS algorithm. If the alternate NIPALS algorithm is to be used, the options.algorithm field should be set to 'nip'.

Option 'robustpls' enables a robust method for Partial Least Squares Regression based on the SIMPLS algorithm. This uses the function 'rsimpls' from the well-known LIBRA Toolbox, developed by Mia Hubert's research group at the Katholieke Universiteit Leuven (kuleuven.be). The RSIMPLS method is described in: Hubert, M., and Vanden Branden, K. (2003), "Robust Methods for Partial Least Squares Regression", Journal of Chemometrics, 17, 537-549.
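
The algorithm choices described above might be selected as in the following sketch. It assumes PLS_Toolbox (including the bundled robust PLS engine) is on the MATLAB path; the data, the variable names, and the number of components are hypothetical, and alpha is simply left at its documented default.

```matlab
% Hypothetical data: 40 samples, 8 x-variables, one y-variable.
rng(3);
x = randn(40,8);
y = x(:,1) - x(:,2) + 0.05*randn(40,1);
ncomp = 2;                                 % illustrative choice

% NIPALS instead of the default SIMPLS.
opts_nip           = pls('options');
opts_nip.display   = 'off';
opts_nip.plots     = 'none';
opts_nip.algorithm = 'nip';
model_nip = pls(x,y,ncomp,opts_nip);

% Robust PLS (RSIMPLS); roptions.alpha is only used with this algorithm.
opts_rob                = pls('options');
opts_rob.display        = 'off';
opts_rob.plots          = 'none';
opts_rob.algorithm      = 'robustpls';
opts_rob.roptions.alpha = 0.75;            % documented default, shown explicitly
model_rob = pls(x,y,ncomp,opts_rob);
```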

Studentized Residuals

From version 8.8 onwards, the Studentized Residuals shown in the PLS Scores Plot are calculated for calibration samples as:

 MSE   = sum((res).^2)./(m-ncomp);
 syres = res./sqrt(MSE.*(1-L));

where res = y residual, m = number of samples, ncomp = number of LV components, and L = sample leverage. This represents a constant multiplier change from how Studentized Residuals were previously calculated.

For test datasets, the semi-Studentized residuals are calculated as:

 MSE   = sum((res).^2)./(m-ncomp);
 syres = pres./sqrt(MSE);

This represents a constant multiplier change from how the semi-Studentized Residuals were previously calculated.
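
The formulas above can be worked through with a small numeric example in plain MATLAB (no toolbox needed). All numbers are made up for illustration. Treating pres as the y residuals of the test samples, and reusing the calibration MSE for them, are assumptions; the page does not define pres explicitly.

```matlab
% Hypothetical calibration quantities (made-up numbers).
res   = [0.1; -0.2; 0.3; -0.1; 0.1];      % y residuals of the calibration samples
L     = [0.36; 0.19; 0.36; 0.19; 0.19];   % sample leverages
m     = numel(res);                       % number of samples (5)
ncomp = 1;                                % number of LV components

MSE   = sum((res).^2)./(m-ncomp);         % 0.16/4 = 0.04
syres = res./sqrt(MSE.*(1-L));            % Studentized residuals (calibration)

% Hypothetical test-set residuals (assumed meaning of pres).
pres       = [0.2; -0.4];
syres_test = pres./sqrt(MSE);             % semi-Studentized residuals (test)
```

For the first calibration sample this gives 0.1/sqrt(0.04*0.64) = 0.625, and the two test samples give 1 and -2 because sqrt(MSE) = 0.2.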

See Also

analysis, crossval, mlr, modelstruct, nippls, pcr, plsda, preprocess, ridge, simpls, EVRIModel_Objects