Rpls
Purpose
RPLS Recursive PLS and PCR variable selection.
Synopsis
- results = rpls(X,Y,ncomps,options)
Description
Performs variable selection based on PLS weights and the RMSECV values obtained at each iteration. The interval set that gives the lowest RMSECV is selected. RPLS has three modes: 'specified', 'intelligent', and 'comprehensive'. The default, 'specified', builds PLS models using exactly the specified number of components (latent variables), ncomps. The 'intelligent' mode runs PLS and crossval on the entire data set, determines the most appropriate number of latent variables, and then proceeds with RPLS as in 'specified' mode. The 'comprehensive' mode runs PLS from 1 latent variable up to the maximum number of latent variables and returns the set of results with the lowest RMSECV value. The 'algorithm' option lets this function behave as either an RPLS or an RPCR algorithm. The default is PLS; setting options.algorithm = 'pcr' changes the algorithm to PCR.
The inputs are the X and Y data (X, Y); the number of latent variables to use (ncomps), which is the maximum number of latent variables in 'intelligent' and 'comprehensive' modes; and the options structure for RPLS (options).
If options.plots is 'final', a plot shows the RPLS weights with the mean sample overlaid for reference. The iteration number is on the y-axis and the dataset axisscale is on the x-axis; if X has no axisscale, variable indices are used.
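A minimal usage sketch of the default 'specified' mode follows. The synthetic data, the variable names, and the exist guard are illustrative assumptions; only the call signature and the option and output field names come from this page. The sketch also assumes that plain numeric arrays are accepted for X and Y and that a partially filled options structure is completed with default values.

```matlab
% Synthetic data (illustrative only): 40 samples, 50 variables,
% with Y depending mainly on variables 10 and 20.
X = randn(40, 50);
Y = 2*X(:,10) - X(:,20) + 0.05*randn(40, 1);

ncomps = 3;                       % number of latent variables to use

options = struct();
options.mode      = 'specified';  % default mode
options.algorithm = 'pls';        % default algorithm; 'pcr' gives RPCR
options.plots     = 'none';       % suppress the final weights plot

% Guard the call so the script still runs where rpls is not on the path.
if exist('rpls', 'file')
    results = rpls(X, Y, ncomps, options);
    disp(results.LVsUsed);        % number of latent variables used
    disp(results.selected);       % selected iteration (lowest RMSECV)
end
```

Switching options.mode to 'intelligent' or 'comprehensive' leaves the call unchanged; ncomps is then treated as the maximum number of latent variables.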
Inputs
- X = X-block
- Y = Y-block
- ncomps = the specified number of latent variables. In 'intelligent' and 'comprehensive' modes, it sets the maxlv option (minimum of 8 in these modes).
- options = options structure for RPLS (optional).
Outputs
The output is a results structure with the following fields:
- LVsUsed: the number of latent variables (ncomps) used.
- selected: The selected RPLS iteration with the lowest RMSECV value.
- RMSECVs: RMSECV values for each RPLS iteration.
- iterativeReg: the regression weights for each RPLS iteration.
- cumulativeReg: The cumulative weights at each RPLS iteration.
- selectedIdxs: the selected indexes for each RPLS iteration.
- regDifferences: the calculated differences between weights.
Options
- options = options structure containing the fields:
- plots: ['none'|{'final'}], governs level of plotting.
- mode: [{'specified'}|'intelligent'|'comprehensive'], defines the mode of RPLS to be performed:
- specified mode: runs RPLS at the specified number of components.
- intelligent mode: performs PLS and cross-validation on the entire dataset to determine the most appropriate number of LVs, then runs RPLS as in 'specified' mode.
- comprehensive mode: runs comprehensively from 1 to the designated maximum number of latent variables, and selects the RPLS results with the lowest RMSECV value.
- algorithm: [{'pls'}|'pcr'], determines which algorithm to use in RPLS.
- wtlimit: [1e-16] the lower limit for retaining variables.
- cvi: {'vet', [], 3} three-element cell indicating the cross-validation leave-out settings to use: {method splits iterations}. For valid modes, see the 'cvi' input to crossval. If splits (the second element of the cell) is empty, the square root of the number of samples is used. cvi can also be a (non-cell) vector of indices indicating leave-out groupings (see crossval for more information).
- stopcrt: stopping criterion; the RPLS iteration process stops if the difference between iterations is less than the stopcrt value {default: 1e-12}
- maxlv: max number of latent variables to use in cross-validation {default: 10}
- maxiter: max number of RPLS iterations {default: 100}
- preprocessing: defines preprocessing and can be one of the following:
- 'none' : no preprocessing {default}
- 'meancenter' : mean centering
- 'autoscale' : autoscaling
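The options above can be combined as in the following sketch, which sets up a 'comprehensive' RPCR run with autoscaling. The sample count and variable names are hypothetical, and the rounding applied to the square root for the number of splits is an assumption, since this page does not say how a non-integer value is handled.

```matlab
nSamples = 40;                            % hypothetical number of samples

options = struct();
options.mode          = 'comprehensive';  % run from 1 LV up to the maximum
options.algorithm     = 'pcr';            % RPCR instead of RPLS
options.maxlv         = 10;               % default
options.maxiter       = 100;              % default
options.stopcrt       = 1e-12;            % default
options.wtlimit       = 1e-16;            % default
options.preprocessing = 'autoscale';
options.plots         = 'none';

% Cross-validation settings with an explicit number of splits.
% Leaving the second element empty ([]) would instead let rpls use the
% square root of the number of samples; the rounding here is an assumption.
splits = round(sqrt(nSamples));
options.cvi = {'vet', splits, 3};         % {method splits iterations}
```

The structure is then passed as the fourth argument, as in results = rpls(X,Y,ncomps,options).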
See Also
gaselctr, genalg, ipls, sratio, vip, Interval PLS (IPLS) for Variable Selection