Choosecomp

From Eigenvector Research Documentation Wiki

Purpose

Returns a suggestion for the number of components to include in a model.

Synopsis

lvs = choosecomp(model)
lvs = choosecomp(model,options)

Description

Makes an automatic suggestion for the number of factors, based on the information available in a given model. How the suggestion is made depends on the model type:

  • PCA
      • Without cross-validation: the suggestion is based on finding a "knee" (sharp drop) in the eigenvalues. The PC just before the drop is selected.
      • With cross-validation: an initial suggestion is made from the eigenvalues (as described above). It is then refined by examining the change in RMSECV when factors are added or removed. If adding or removing factors improves RMSECV by more than a given percentage (relative to the maximum RMSECV observed), the suggestion is changed. The threshold for a change is set by the options (see below).
  • PLS/PCR
      • Without cross-validation: no suggestion is made.
      • With cross-validation: a "knee" in RMSEC is searched for (if none is found, 1 LV is suggested). The suggestion is then refined using the RMSECV values and the search algorithm described above for PCA.
  • PLSDA
      • Without cross-validation: no suggestion is made.
      • With cross-validation: an initial suggestion is determined by searching for a "knee" in the mean RMSECV (note the difference from PLS/PCR, which use RMSEC). This suggestion is then refined based on the mean misclassification error reported from cross-validation.
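The PCA search with cross-validation can be pictured with the minimal MATLAB sketch below. It is not the PLS_Toolbox code. Several choices are assumptions made for illustration: the knee rule (select the PC just before the largest ratio between successive eigenvalues), expressing RMSECV as a percentage of its maximum, the fixed 5% threshold, and the data. The eligibility rules in the next paragraph are also omitted.

```matlab
% Hypothetical sketch, NOT the PLS_Toolbox implementation of choosecomp.
% Knee rule, percent-of-maximum scaling, threshold and data are assumptions.
ev     = [10.0 6.0 4.0 0.8 0.5 0.4 0.3 0.25 0.2];            % hypothetical eigenvalues, PC 1..9
rmsecv = [1.00 0.70 0.45 0.30 0.24 0.23 0.23 0.24 0.25];     % hypothetical RMSECV, 1..9 factors

% Initial suggestion: the PC just before the largest relative drop in eigenvalue
ratio  = ev(1:end-1) ./ ev(2:end);
[~, k] = max(ratio);                 % here the 4.0 -> 0.8 drop gives k = 3

% Express RMSECV as a percentage of the maximum RMSECV observed
cv = 100 * rmsecv ./ max(rmsecv);
threshold = 5;                       % hypothetical % improvement required to move

% Add factors while each addition improves relative RMSECV by more than the threshold
while k < numel(cv) && (cv(k) - cv(k+1)) > threshold
  k = k + 1;
end
% Remove factors while each removal improves relative RMSECV by more than the threshold
while k > 1 && (cv(k) - cv(k-1)) > threshold
  k = k - 1;
end
fprintf('Suggested number of components: %d\n', k);
```

With these numbers the eigenvalue knee gives 3 components. The refinement then moves the suggestion to 5, because adding the 4th and 5th factors improves relative RMSECV by 15% and 6% respectively, and both exceed the 5% threshold.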

In all cases, a suggestion is offered only if the model includes more than 7 factors and the suggested number of factors is less than 50% of the estimated rank of the data. No suggestion is made for model types not listed above.
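The two conditions work as a simple gate on a candidate suggestion. In the MATLAB lines below, the variable names and values are hypothetical. This page does not describe how choosecomp estimates the rank of the data, so the rank is simply assumed to be known.

```matlab
% Hypothetical values for illustration; these names are not choosecomp internals
nfactors   = 9;    % number of factors included in the model
estrank    = 20;   % estimated rank of the data (assumed to be known here)
suggestion = 5;    % candidate suggestion from the search described above

% Offer the suggestion only if both stated conditions hold
if nfactors > 7 && suggestion < 0.5 * estrank
  lvs = suggestion;
else
  lvs = [];        % empty output: no suggestion is made
end
```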

Inputs

  • model = standard model structure.

Optional Inputs

  • options = options structure defined below.

Outputs

  • lvs = suggested number of components. Empty ([ ]) if no suggestion can be made.

Options

The options structure can contain one or more of the following fields:

  • plscvthreshold : [ ] Percent improvement in relative RMSECV required to change the number of LVs from the initial suggestion (PLS models only). If empty ([ ]), the threshold is the mean absolute difference between adjacent relative RMSECV values, i.e. mean(abs(diff(CV))), where CV is the vector of relative RMSECV values.
  • plsdacvthreshold : [ ] Same as above but used for PLSDA models.
  • pcacvthreshold : [ ] Same as above but used for PCA models.
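The MATLAB lines below illustrate the default threshold by computing mean(abs(diff(CV))) for a hypothetical set of RMSECV values. Treating the relative CV as RMSECV expressed as a percentage of its maximum is an assumption. It is based on these options being described as percent thresholds.

```matlab
% Hypothetical RMSECV values for models with 1..9 factors (illustrative only)
rmsecv = [1.00 0.70 0.45 0.30 0.24 0.23 0.23 0.24 0.25];
% Relative CV, assumed here to be RMSECV as a percentage of its maximum
CV = 100 * rmsecv ./ max(rmsecv);
% Default threshold used when the option is passed as [ ]
threshold = mean(abs(diff(CV)));
fprintf('Default threshold: %.4f%%\n', threshold);
```

For these values the default threshold is 9.875%. A change in the number of LVs would therefore need to improve relative RMSECV by more than that amount.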

The default values for these options can also be set using the setplspref command or the preferences expert interface.

See Also

crossval, estimatefactors, pca, pls, plsda