Rpls: Difference between revisions

From Eigenvector Research Documentation Wiki
imported>Benjamin
(Created page with "===Purpose=== RPLS Recursive PLS and PCR variable selection. ===Synopsis=== :results = rpls(X,Y,ncomps,options) ===Description=== Perform variable selection based on PLS wei...")
 
imported>Donal
Line 6: Line 6:


===Description===
===Description===
Perform variable selection based on PLS weights and RMSECV values obtained at each iteration. The interval set which provides the lowest RMSECV is selected. RPLS has three modes: “specified”, “intelligent”, and “comprehensive”. The default is “specified” mode which will construct PLS models using the specified number of components (latent variables), ncomps, exclusively. The “intelligent” mode runs PLS and crossval on the entire data, determines the most appropriate number of latent variables, and then proceeds with RPLS as in "specified" mode. The “comprehensive” mode runs PLS from 1 latent variable to the maximum number of latent variables, the set of results with the lowest RMSECV value is returned. The “algorithm” options allows this function to behave as an RPLS or RPCR algorithm. The default is PLS, but options.algorithm=’pcr’ changes the algorithm to PCR.
Perform model-based variable selection based on PLS, iterative re-weighting of X by normalized regression coefficients. The final iteration, or the iteration which provides the lowest RMSECV is selected. RPLS has three modes: “specified”, “suggested”, and “surveyed”. The default is “specified” mode which will construct PLS models using the specified number of components (latent variables), ncomps, exclusively. The “suggested” mode runs PLS and crossval on the entire data, determines the most appropriate number of latent variables, and then proceeds with rPLS as in "specified" mode. The “surveyed” mode runs PLS from 1 latent variable to the maximum number of latent variables, the set of results with the lowest RMSECV value is returned. The “algorithm” options allows this function to behave as an rPLS or rPCR algorithm. The default is PLS, but options.algorithm=’pcr’ changes the algorithm to PCR.


Inputs are (X,Y) the X and Y data, (ncomps) the number of latent variables to be used (or maximum number of latent variables to be used in ‘intelligent’ and ‘comprehensive’ modes), (options) is the options structure for RPLS.
Inputs are (X,Y) the X and Y data, (ncomps) the number of latent variables to be used (or maximum number of latent variables to be used in ‘intelligent’ and ‘comprehensive’ modes), (options) is the options structure for rPLS.


If Options.plots is ‘final’, a plot is given displaying the RPLS weights with an overlay of the mean sample for reference. The iteration number is displayed on the y-axis, and the dataset axisscale is on the x-axis, if X has no axisscale, then variable indexes is used.
If Options.plots is ‘final’, a plot is given displaying the rPLS weights with an overlay of the mean sample for reference. The iteration number is displayed on the y-axis, and the dataset axisscale is on the x-axis, if X has no axisscale, then variable indexes is used.
 
Based on "Recursive weighted partial least squares (rPLS): an efficient variable selection method using PLS", Rinnan et al, J. Chemo. (2013).


====Inputs====
====Inputs====
Line 16: Line 18:
* '''Y''' = Y-block
* '''Y''' = Y-block
* '''ncomps''' = the specified number of latent variables. In intelligent and comprehensive modes, it sets the maxLV settings in the options (minimum 8 in intelligent and comprehensive modes).
* '''ncomps''' = the specified number of latent variables. In intelligent and comprehensive modes, it sets the maxLV settings in the options (minimum 8 in intelligent and comprehensive modes).
* '''options''' = options structure for RPLS. (optional).
* '''options''' = options structure for rPLS. (optional).


====Outputs====
====Outputs====

Revision as of 15:54, 20 April 2017

Purpose

RPLS Recursive PLS and PCR variable selection.

Synopsis

results = rpls(X,Y,ncomps,options)

Description

Perform model-based variable selection using PLS with iterative re-weighting of X by the normalized regression coefficients. Either the final iteration or the iteration that gives the lowest RMSECV is selected. rPLS has three modes: “specified”, “suggested”, and “surveyed”. The default, “specified” mode, constructs PLS models using exactly the specified number of components (latent variables), ncomps. The “suggested” mode runs PLS and crossval on the entire data set, determines the most appropriate number of latent variables, and then proceeds with rPLS as in “specified” mode. The “surveyed” mode runs PLS for every number of latent variables from 1 to the maximum and returns the set of results with the lowest RMSECV value. The “algorithm” option lets this function behave as an rPLS or rPCR algorithm. The default is PLS; setting options.algorithm = ‘pcr’ changes the algorithm to PCR.

The inputs are (X,Y), the X and Y data; (ncomps), the number of latent variables to use (or the maximum number of latent variables in ‘intelligent’ and ‘comprehensive’ modes); and (options), the options structure for rPLS.

If options.plots is ‘final’, a plot displays the rPLS weights with the mean sample overlaid for reference. The iteration number is shown on the y-axis and the dataset axisscale on the x-axis; if X has no axisscale, the variable indexes are used.

Based on "Recursive weighted partial least squares (rPLS): an efficient variable selection method using PLS", Rinnan et al, J. Chemo. (2013).
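
The re-weighting idea described above can be sketched in plain MATLAB. This is an illustration only, not the rpls source: it assumes a single y column, mean-centered data, a NIPALS PLS1 fit, and a particular way of normalizing and accumulating the weights. The data are synthetic, and the variable names (cumw, neww, selectedIdx) are hypothetical. The values of maxiter, stopcrt, and wtlimit are the defaults listed under Options. Cross-validation (RMSECV) is left out to keep the sketch short.

```matlab
% rPLS-style re-weighting loop: a sketch under stated assumptions,
% not the PLS_Toolbox implementation. Needs no toolbox.
nSamp = 40;  nVar = 12;  ncomps = 2;
X = sin(0.37 * (1:nSamp)' * (1:nVar));     % deterministic synthetic X-block
y = 2*X(:,3) - X(:,7);                     % y depends on variables 3 and 7 only

maxiter = 100;  stopcrt = 1e-12;  wtlimit = 1e-16;
Xc = bsxfun(@minus, X, mean(X, 1));        % mean-center X
yc = y - mean(y);                          % mean-center y
cumw = ones(1, nVar);                      % cumulative variable weights

for iter = 1:maxiter
    Xw = bsxfun(@times, Xc, cumw);         % re-weight the columns of X
    % NIPALS PLS1 on the re-weighted data
    E = Xw;  f = yc;
    W = zeros(nVar, ncomps);  P = zeros(nVar, ncomps);  q = zeros(ncomps, 1);
    for a = 1:ncomps
        w = E' * f;
        if norm(w) < eps, break; end       % nothing left to model
        w = w / norm(w);
        t = E * w;   tt = t' * t;
        P(:,a) = E' * t / tt;              % X loading
        q(a)   = f' * t / tt;              % y loading
        E = E - t * P(:,a)';               % deflate X
        f = f - q(a) * t;                  % deflate y
        W(:,a) = w;
    end
    b = W * (pinv(P' * W) * q);            % regression vector for weighted X
    if max(abs(b)) == 0, break; end        % no usable coefficients
    r = abs(b)' / max(abs(b));             % normalized coefficients in [0, 1]
    neww = cumw .* r;                      % accumulate the weights
    neww = neww / max(neww);               % keep the largest weight at 1
    neww(neww < wtlimit) = 0;              % drop variables below the limit
    delta = max(abs(neww - cumw));         % change since the last iteration
    cumw = neww;
    if delta < stopcrt, break; end         % weights have stopped changing
end
selectedIdx = find(cumw > 0);              % variables that kept a nonzero weight
```

In this sketch the weights are multiplied together from one iteration to the next, so variables with consistently small coefficients shrink toward zero and fall below wtlimit, while the loop ends early once the weights change by less than stopcrt.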

Inputs

  • X = X-block
  • Y = Y-block
  • ncomps = the specified number of latent variables. In intelligent and comprehensive modes, it sets the maxlv option instead (minimum of 8 in those modes).
  • options = options structure for rPLS. (optional).

Outputs

The output is a results structure with the following fields:

  • LVsUsed: the number of latent variables (ncomps) used.
  • selected: the selected rPLS iteration with the lowest RMSECV value.
  • RMSECVs: the RMSECV values for each rPLS iteration.
  • iterativeReg: the regression coefficients (weights) for each rPLS iteration.
  • cumulativeReg: the cumulative weights at each rPLS iteration.
  • selectedIdxs: the selected indexes for each rPLS iteration.
  • regDifferences: the calculated differences between weights.

Options

  • options = options structure containing the fields:
      • plots: [‘none’|{‘final’}] governs the level of plotting.
      • mode: [{‘specified’}|‘intelligent’|‘comprehensive’] defines the mode of rPLS to be performed:
          • specified mode: runs rPLS at the specified number of components.
          • intelligent mode: performs PLS and cross-validation on the entire dataset to determine the most appropriate number of LVs, then runs rPLS as in ‘specified’ mode.
          • comprehensive mode: runs rPLS for every number of latent variables from 1 to the designated maximum and selects the rPLS results with the lowest RMSECV value.
      • algorithm: [{‘pls’}|‘pcr’] determines which algorithm to use in rPLS.
      • wtlimit: [1e-16] the lower limit for retaining variables.
      • cvi: [{‘vet’, [], 3}] three-element cell giving the cross-validation leave-out settings to use, {method splits iterations}. For valid methods, see the “cvi” input to crossval. If splits (the second element of the cell) is empty, the square root of the number of samples is used. cvi can also be a (non-cell) vector of indexes indicating leave-out groupings (see crossval for more information).
      • stopcrt: stop criterion; the rPLS iteration process stops if the difference between iterations is less than the stopcrt value {default: 1e-12}.
      • maxlv: maximum number of latent variables to use in cross-validation {default: 10}.
      • maxiter: maximum number of rPLS iterations {default: 100}.
      • preprocessing: defines the preprocessing and can be one of the following:
          • ‘none’ : no preprocessing {default}
          • ‘meancenter’ : mean centering
          • ‘autoscale’ : autoscaling
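
As an illustration of how these fields might be set before a call, the sketch below builds an options structure and passes it to rpls. It assumes PLS_Toolbox is on the MATLAB path and that rpls('options') returns the default options structure; that call follows the usual toolbox convention but is not stated on this page, so treat it as an assumption. The data are synthetic and the chosen values are arbitrary.

```matlab
% Synthetic data for illustration only: 30 samples, 50 variables.
X = sin(0.37 * (1:30)' * (1:50));
Y = 2*X(:,10) - X(:,25);

if exist('rpls', 'file')                    % run only when PLS_Toolbox is available
    options = rpls('options');              % ASSUMPTION: returns the default options
    options.plots         = 'none';         % suppress the final weights plot
    options.mode          = 'specified';    % use exactly ncomps latent variables
    options.algorithm     = 'pls';          % 'pcr' switches to rPCR
    options.preprocessing = 'autoscale';    % 'none' | 'meancenter' | 'autoscale'
    options.cvi           = {'vet', 5, 1};  % {method splits iterations}
    options.maxiter       = 50;             % cap on the number of rPLS iterations

    results = rpls(X, Y, 3, options);       % ncomps = 3
    disp(results.LVsUsed);                  % number of latent variables used
    disp(results.selected);                 % selected rPLS iteration
end
```

Leaving the second element of cvi empty, as in the default {‘vet’, [], 3}, would let the function choose the number of splits from the square root of the number of samples.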

See Also

gaselctr, genalg, ipls, sratio, vip, Interval PLS (IPLS) for Variable Selection