Choosecomp
Purpose
Returns a suggestion for the number of components to include in a model.
Synopsis
- lvs = choosecomp(model,options)
Description
Automatically suggests a number of factors based on the information available in a given model. How the suggestion is made depends on the model type:
- PCA
- Without cross-validation: selection is based on looking for a "knee" (drop) in the eigenvalues. The PC just before the drop is selected.
- With cross-validation: an initial suggestion is made based on the eigenvalues (as described above). The suggestion is then refined by looking at the change in RMSECV from adding or removing factors. If changing the number of factors provides more than a given % improvement in RMSECV (relative to the maximum RMSECV observed), the suggestion is changed. The threshold for change is defined by the options (see below).
- PLS/PCR
- Without cross-validation: No suggestion will be made.
- With cross-validation: a "knee" in RMSEC is searched for (if none is found, 1 LV is suggested). The suggestion is then refined using the RMSECV values and the search algorithm described above for PCA.
- PLSDA
- Without cross-validation: No suggestion will be made.
- With cross-validation: an initial suggestion is determined by searching for a "knee" in the mean RMSECV (note the difference from PLS/PCR). This suggestion is then refined based on the mean misclassification error reported from cross-validation.
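The two-stage procedure described above (find a knee, then refine with cross-validation results) can be sketched as follows. This is an illustrative Python sketch, not the toolbox's code: the function names are hypothetical, the knee is taken to be the largest relative drop between consecutive values, and the refinement moves one factor at a time. None of those choices are confirmed by this page.

```python
def knee_before_largest_drop(values):
    """Return the 1-based index of the value just before the largest
    relative drop. ASSUMPTION: "knee" means largest relative drop, and
    all values are strictly positive."""
    if len(values) < 2:
        return 1  # nothing to compare, so fall back to one component
    drops = [(a - b) / a for a, b in zip(values, values[1:])]
    return drops.index(max(drops)) + 1


def refine_suggestion(rmsecv, initial, threshold_pct):
    """Move a 1-based suggestion by one factor at a time while doing so
    improves RMSECV by more than threshold_pct, where improvement is
    measured in percent of the maximum RMSECV observed.
    ASSUMPTION: a one-step neighbor search stands in for the search
    algorithm, which this page does not spell out."""
    if threshold_pct < 0:
        raise ValueError("threshold_pct must be non-negative")
    peak = max(rmsecv)
    rel = [100.0 * v / peak for v in rmsecv]  # percent of max RMSECV
    k = initial
    while True:
        best, gain = k, threshold_pct
        for cand in (k - 1, k + 1):
            if 1 <= cand <= len(rel):
                improvement = rel[k - 1] - rel[cand - 1]
                if improvement > gain:
                    best, gain = cand, improvement
        if best == k:
            return k  # no neighbor beats the threshold
        k = best


# Made-up numbers for illustration only.
eigenvalues = [5.0, 3.0, 2.5, 0.4, 0.3, 0.25]
rmsecv = [1.0, 0.6, 0.5, 0.49, 0.52]
start = knee_before_largest_drop(eigenvalues)
print(refine_suggestion(rmsecv, start, threshold_pct=5.0))
```

Because every accepted move must lower RMSECV by more than a non-negative threshold, the loop cannot revisit a position and always terminates.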
In all cases, a suggestion is offered only for models with more than 7 factors, and only if the suggestion includes less than 50% of the estimated rank of the data. No suggestion is made for unlisted model types.
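The final check can be pictured as a simple gate on the candidate suggestion. The Python sketch below is one reading of the rule, with hypothetical names: `n_factors` is the number of factors in the model, `est_rank` is the estimated rank of the data, and `None` stands in for the empty [ ] output.

```python
def offer_suggestion(suggestion, n_factors, est_rank):
    """Return the suggestion only if the model has more than 7 factors
    and the suggestion is below 50% of the estimated rank.
    ASSUMPTION: "includes less than 50% of the estimated rank" is read
    as suggestion < 0.5 * est_rank."""
    if n_factors > 7 and suggestion < 0.5 * est_rank:
        return suggestion
    return None  # corresponds to an empty result: no suggestion made


print(offer_suggestion(3, n_factors=10, est_rank=20))
print(offer_suggestion(3, n_factors=7, est_rank=20))
```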
Inputs
- model = standard model structure.
Optional Inputs
- options = options structure defined below.
Outputs
- lvs = suggested number of components. Will be empty [ ] if no suggestion can be made.
Options
The options structure can contain one or more of the following fields:
- plscvthreshold : [ ] Percent improvement in relative RMSECV required to change the number of LVs from the initial suggestion (for PLS models only). If empty, the average absolute change in RMSECV over all observed factors is used, i.e. mean(abs(diff(CV))), where CV is the relative RMSECV.
- plsdacvthreshold : [ ] Same as above but used for PLSDA models.
- pcacvthreshold : [ ] Same as above but used for PCA models.
The default values for these options can also be set using the setplspref command or the preferences expert interface.
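To make the empty-threshold default concrete, the expression mean(abs(diff(CV))) can be written out step by step. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes the relative CV is RMSECV expressed as a percentage of the maximum RMSECV observed, which this page implies but does not state outright.

```python
def default_cv_threshold(rmsecv):
    """Mean absolute change between consecutive relative RMSECV values.
    ASSUMPTION: relative CV = 100 * RMSECV / max(RMSECV)."""
    if len(rmsecv) < 2:
        raise ValueError("need RMSECV for at least two factors")
    peak = max(rmsecv)
    rel = [100.0 * v / peak for v in rmsecv]           # relative CV
    steps = [abs(b - a) for a, b in zip(rel, rel[1:])]  # abs(diff(CV))
    return sum(steps) / len(steps)                      # mean(...)


# Made-up RMSECV values: relative CV is 100, 50, 50, so the
# consecutive changes are 50 and 0.
print(default_cv_threshold([2.0, 1.0, 1.0]))  # prints 25.0
```

A curve that changes steeply from factor to factor therefore yields a larger default threshold than a flat one, so larger RMSECV gains are needed before the suggestion moves.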