Choosecomp
Purpose
Returns a suggestion for the number of components to include in a model.
Synopsis
- lvs = choosecomp(model,options)
Description
Automatically suggests the number of factors using the information available in a given model. The method used depends on the model type:
- PCA
  - Without cross-validation: the suggestion is based on finding a "knee" (a sharp drop) in the eigenvalues. The PC just before the drop is selected.
  - With cross-validation: an initial suggestion is made from the eigenvalues (as described above). The suggestion is then refined by examining the change in RMSECV when factors are added or removed. If changing the number of factors improves RMSECV by more than a given percentage (relative to the maximum RMSECV observed), the suggestion is changed. The threshold for change is defined by options (see below).
- PLS/PCR
  - Without cross-validation: no suggestion is made.
  - With cross-validation: an initial suggestion is made by searching for a "knee" in RMSEC (if no knee is found, 1 LV is suggested). The suggestion is then refined using the RMSECV values and the search algorithm described above for PCA.
- PLSDA
  - Without cross-validation: no suggestion is made.
  - With cross-validation: an initial suggestion is made by searching for a "knee" in the mean RMSECV (note the difference from PLS/PCR, which uses RMSEC). This suggestion is then refined based on the mean misclassification error reported from cross-validation.
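The knee-then-refine procedure described above can be sketched roughly as follows. This is a minimal illustration, not the choosecomp implementation. Several details are assumptions: the "knee" is taken as the largest ratio between adjacent values, the sketch does not model the "no knee found" case, RMSECV is expressed as a percentage of its maximum, and the refinement is a simple local search that steps up or down one factor at a time. The function names `knee_count` and `refine_with_cv` are hypothetical.

```python
from typing import Sequence


def knee_count(values: Sequence[float]) -> int:
    """Return the 1-based number of components just before the sharpest drop.

    ASSUMPTION: the "knee" is the largest ratio values[i] / values[i + 1];
    the exact rule used by choosecomp is not documented on this page.
    """
    if len(values) < 2:
        return 1
    ratios = [
        values[i] / values[i + 1] if values[i + 1] > 0 else float("inf")
        for i in range(len(values) - 1)
    ]
    # index of the sharpest drop; component i_drop + 1 (1-based) sits just before it
    i_drop = max(range(len(ratios)), key=lambda i: ratios[i])
    return i_drop + 1


def refine_with_cv(initial: int, rmsecv: Sequence[float], threshold_pct: float) -> int:
    """Step the suggestion up or down while each step lowers the relative
    RMSECV by more than threshold_pct (a percentage of the maximum RMSECV).

    ASSUMPTION: one-factor-at-a-time local search; each accepted step strictly
    lowers RMSECV, so the loop always terminates.
    """
    peak = max(rmsecv)
    rel = [100.0 * v / peak for v in rmsecv]  # relative RMSECV, in percent
    k = min(initial, len(rel))  # 1-based number of components
    while True:
        current = rel[k - 1]
        if k < len(rel) and current - rel[k] > threshold_pct:
            k += 1  # adding a factor improves RMSECV enough
        elif k > 1 and current - rel[k - 2] > threshold_pct:
            k -= 1  # removing a factor improves RMSECV enough
        else:
            return k
```

For a PCA model with cross-validation, the two steps would be chained as `refine_with_cv(knee_count(eigenvalues), rmsecv, threshold_pct)`. For PLS/PCR the knee would be searched in RMSEC instead of the eigenvalues.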
In all cases, a suggestion is offered only for models with more than 7 factors, and only if the suggested number of components is less than 50% of the estimated rank of the data. No suggestion is made for model types not listed above.
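The two eligibility conditions above amount to a simple gate on the suggestion. The sketch below assumes the number of model factors and the estimated rank are already known (how the rank is estimated is not described here), and uses Python's `None` in place of the empty `[ ]` output. The function and parameter names are hypothetical.

```python
from typing import Optional


def gate_suggestion(
    suggestion: Optional[int], n_model_factors: int, estimated_rank: int
) -> Optional[int]:
    """Return the suggestion only if both documented conditions hold.

    None stands in for the empty [ ] output ("no suggestion").
    """
    if suggestion is None:
        return None
    if n_model_factors <= 7:  # model must have more than 7 factors
        return None
    if suggestion >= 0.5 * estimated_rank:  # must be under 50% of the rank
        return None
    return suggestion
```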
Inputs
- model = standard model structure.
Optional Inputs
- options = options structure defined below.
Outputs
- lvs = number of suggested components. Will be empty [ ] if no suggestion can be made.
Options
The options structure can contain one or more of the following fields:
- plscvthreshold : [ ] Percent improvement in relative RMSECV required to change the number of LVs from the initial suggestion (for PLS models only). If not specified (i.e., passed as empty [ ]), the algorithm uses a threshold equal to the mean absolute difference between adjacent relative RMSECV values, i.e., mean(abs(diff(CV))), where CV is the relative RMSECV.
- plsdacvthreshold : [ ] Same as above but used for PLSDA models.
- pcacvthreshold : [ ] Same as above but used for PCA models.
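The default threshold rule, mean(abs(diff(CV))), can be sketched in Python as below. It assumes that "relative RMSECV" means RMSECV expressed as a percentage of its maximum value, consistent with the description of the refinement step; the name `cv_threshold` is hypothetical. Passing `None` mirrors passing `[ ]` for the option.

```python
from typing import Optional, Sequence


def cv_threshold(rmsecv: Sequence[float], threshold: Optional[float] = None) -> float:
    """Return the RMSECV threshold, in percent.

    An explicit threshold is used as-is; None (like [ ]) falls back to
    mean(abs(diff(CV))) over the relative RMSECV curve.
    """
    if threshold is not None:
        return threshold
    peak = max(rmsecv)
    cv = [100.0 * v / peak for v in rmsecv]  # ASSUMPTION: percent of max RMSECV
    if len(cv) < 2:
        return 0.0  # no adjacent pairs to compare
    diffs = [abs(b - a) for a, b in zip(cv, cv[1:])]
    return sum(diffs) / len(diffs)
```

For example, RMSECV values of 2.0, 1.0 and 0.5 give a relative curve of 100, 50 and 25, adjacent differences of 50 and 25, and hence a default threshold of 37.5%.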
The default values for these options can also be set using the setplspref command or the preferences expert interface.