Rpls

From Eigenvector Research Documentation Wiki

Latest revision as of 17:56, 10 January 2018

Purpose

rPLS: recursive PLS and PCR variable selection.

Synopsis

results = rpls(X,Y,ncomps,options)

Description

rpls performs model-based variable selection by iteratively re-weighting X with the normalized regression coefficients of a PLS model. The result returned is either the final iteration or the iteration that provides the lowest RMSECV.

rPLS has three modes: 'specified', 'suggested', and 'surveyed'. The default, 'specified', constructs PLS models using exactly the specified number of components (latent variables), ncomps. The 'suggested' mode first runs PLS and crossval on the entire data set to determine the most appropriate number of latent variables, and then proceeds as in 'specified' mode. The 'surveyed' mode runs rPLS for each number of latent variables from 1 to the maximum and returns the set of results with the lowest RMSECV value.

The 'algorithm' option allows this function to behave as either an rPLS or an rPCR algorithm. The default is PLS; setting options.algorithm = 'pcr' changes the algorithm to PCR.

Inputs are the X and Y data (X,Y), the number of latent variables to be used (ncomps), and the options structure for rPLS (options). In 'suggested' and 'surveyed' modes, ncomps is the maximum number of latent variables to be used.
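As an illustration of the three modes and the algorithm switch described above, the following is a minimal sketch rather than an authoritative recipe. The data are synthetic, the variable names are hypothetical, and two things are assumed rather than stated on this page: that rpls('options') returns a default options structure (the usual PLS_Toolbox convention) and that plain numeric matrices are accepted for X and Y.

```matlab
% Synthetic data for illustration only: 40 samples, 50 variables,
% with Y depending on variables 5 and 20.
rng(0);
X = randn(40, 50);
Y = X(:, [5 20]) * [1; -2] + 0.1 * randn(40, 1);

% ASSUMPTION: rpls('options') returns the default options structure.
options = rpls('options');
options.plots = 'none';            % suppress the final plot

% 'specified' mode (default): use exactly 3 latent variables.
options.mode = 'specified';
resSpecified = rpls(X, Y, 3, options);

% 'suggested' mode: ncomps acts as the maximum number of latent variables.
options.mode = 'suggested';
resSuggested = rpls(X, Y, 10, options);

% 'surveyed' mode: rPLS is run for 1..10 latent variables and the
% results with the lowest RMSECV are returned.
options.mode = 'surveyed';
resSurveyed = rpls(X, Y, 10, options);

% rPCR instead of rPLS.
options.mode = 'specified';
options.algorithm = 'pcr';
resPcr = rpls(X, Y, 3, options);
```

The 'surveyed' call repeats the whole recursive procedure once per candidate number of latent variables, so it can be expected to take noticeably longer than the other two modes.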

If options.plots is 'final', a plot is given displaying the rPLS weights with an overlay of the mean sample for reference. The iteration number is displayed on the y-axis and the dataset axisscale on the x-axis. If X has no axisscale, variable indexes are used instead.

Based on "Recursive weighted partial least squares (rPLS): an efficient variable selection method using PLS", Rinnan et al., J. Chemometrics (2013).
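The recursive re-weighting idea can be sketched in a few lines. This is a hypothetical illustration, not the rpls source code: it uses plsregress from the Statistics and Machine Learning Toolbox instead of PLS_Toolbox routines, omits cross-validation entirely, and assumes that the cumulative weights are updated by multiplying with the absolute regression coefficients scaled to a maximum of 1. The names wtlimit, stopcrt, and maxiter are borrowed from the options on this page; all other names are made up for the example.

```matlab
% Hypothetical sketch of recursive re-weighting (NOT the rpls implementation).
% Requires plsregress (Statistics and Machine Learning Toolbox).
rng(0);
X = randn(40, 20);
y = X(:, 3) - 2 * X(:, 7) + 0.1 * randn(40, 1);

ncomps  = 2;       % number of latent variables
maxiter = 100;     % maximum number of re-weighting iterations
stopcrt = 1e-12;   % stop when weights change by less than this
wtlimit = 1e-16;   % variables with weights at or below this are dropped

w = ones(1, size(X, 2));                  % cumulative variable weights
for iter = 1:maxiter
    keep = w > wtlimit;                   % variables still retained
    if ~any(keep), break; end
    nlv = min(ncomps, nnz(keep));         % cannot exceed retained variables
    Xw = bsxfun(@times, X(:, keep), w(keep));   % re-weight retained columns
    [~, ~, ~, ~, beta] = plsregress(Xw, y, nlv);
    b = abs(beta(2:end))';                % drop the intercept (first row)
    if max(b) == 0, break; end
    wnew = zeros(size(w));
    wnew(keep) = w(keep) .* (b / max(b)); % assumed update rule
    change = max(abs(wnew - w));
    w = wnew;
    if change < stopcrt, break; end       % weights have stabilized
end
selectedIdx = find(w > wtlimit);          % variables that survived
```

Because each update multiplies by a factor between 0 and 1, the weights of uninformative variables shrink towards zero over the iterations while the dominant variables stay close to 1.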

Inputs

  • X = X-block
  • Y = Y-block
  • ncomps = the specified number of latent variables. In 'suggested' and 'surveyed' modes, it instead sets the maxlv setting in the options (minimum of 8 in these two modes).
  • options = options structure for rPLS. (optional).

Outputs

The output is a results structure with the following fields:

  • LVsUsed: the number of components (latent variables) used.
  • selected: the selected rPLS iteration (the one with the lowest RMSECV value).
  • RMSECVs: the RMSECV values for each rPLS iteration.
  • iterativeReg: the regression coefficients (weights) for each rPLS iteration.
  • cumulativeReg: the cumulative weights at each rPLS iteration.
  • selectedIdxs: the selected variable indexes for each rPLS iteration.
  • regDifferences: the calculated differences between the weights of successive iterations.
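A short sketch of how the results structure might be inspected after a run. The field names are those listed above; their exact types and shapes are not stated on this page, so the code only displays them and assumes that RMSECVs is a numeric array. The synthetic data and the rpls('options') call are assumptions made for the example.

```matlab
% Synthetic data for illustration only.
rng(0);
X = randn(40, 50);
Y = X(:, [5 20]) * [1; -2] + 0.1 * randn(40, 1);

% ASSUMPTION: rpls('options') returns the default options structure.
options = rpls('options');
options.plots = 'none';
results = rpls(X, Y, 3, options);

disp(fieldnames(results));     % list the available output fields
disp(results.LVsUsed);         % number of latent variables used
disp(results.selected);        % the selected rPLS iteration

% ASSUMPTION: RMSECVs is numeric; (:) flattens it whatever its orientation.
figure;
plot(results.RMSECVs(:), '-o');
xlabel('rPLS iteration');
ylabel('RMSECV');
```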

Options

  • options = options structure containing the fields:
      • plots: ['none'|{'final'}], governs level of plotting.
      • mode: [{'specified'}|'suggested'|'surveyed'], defines the mode of rPLS to be performed:
          • specified mode: runs rPLS at the specified number of components.
          • suggested mode: performs PLS and cross-validation on the entire dataset to determine the most appropriate number of LVs, then runs rPLS as in 'specified' mode.
          • surveyed mode: runs rPLS for each number of latent variables from 1 to the designated maximum, and selects the rPLS results with the lowest RMSECV value.
      • algorithm: [{'pls'}|'pcr'], determines which algorithm to use in rPLS.
      • wtlimit: [1e-16], the lower limit for retaining variables.
      • cvi: {'vet', [], 3}, three-element cell indicating the cross-validation leave-out settings to use: {method splits iterations}. For valid methods, see the "cvi" input to crossval. If splits (the second element in the cell) is empty, the square root of the number of samples is used. cvi can also be a vector (non-cell) of indexes indicating leave-out grouping (see crossval for more information).
      • stopcrt: stop criterion; the rPLS iteration process stops if the difference between iterations is less than the stopcrt value {default: 1e-12}.
      • maxlv: maximum number of latent variables to use in cross-validation {default: 10}.
      • maxiter: maximum number of rPLS iterations {default: 100}.
      • preprocessing: defines preprocessing and can be one of the following:
          • 'none' : no preprocessing {default}
          • 'meancenter' : mean centering
          • 'autoscale' : autoscaling
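The options above might be combined as in the following sketch. It is illustrative only: the data are synthetic, the particular values are arbitrary, and it assumes that rpls('options') returns a default options structure (the usual PLS_Toolbox convention). The grouping vector is a hypothetical example of the non-cell form of cvi.

```matlab
% Synthetic data for illustration only.
rng(0);
X = randn(40, 50);
Y = X(:, [5 20]) * [1; -2] + 0.1 * randn(40, 1);

% ASSUMPTION: rpls('options') returns the default options structure.
options = rpls('options');
options.plots         = 'none';        % no plotting
options.algorithm     = 'pls';         % or 'pcr' for rPCR
options.preprocessing = 'autoscale';   % 'none', 'meancenter' or 'autoscale'
options.cvi           = {'vet', 5, 1}; % {method splits iterations}
options.maxiter       = 50;            % at most 50 rPLS iterations
options.stopcrt       = 1e-10;         % looser stop criterion than the default

results = rpls(X, Y, 3, options);

% Alternative: cvi as a plain vector assigning each of the 40 samples
% to one of 5 leave-out groups (hypothetical grouping).
groups = repmat((1:5)', 8, 1);
options.cvi = groups;
resultsGrouped = rpls(X, Y, 3, options);
```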

See Also

selectvars, gaselctr, genalg, ipls, sratio, vip, Interval PLS (IPLS) for Variable Selection, Genetic Algorithms for Variable Selection, Sample and Variable Selection, Variable Selection