IPLS Variable Selection Interface

From Eigenvector Research Documentation Wiki

Stepwise Interval Variable Selection Panel

The Interval Variable Selection panel in the Analysis Window provides access to the interval variable selection functionality.

Variable selection provides a simple way to discard variables that may be adding complexity to a model. Discarding such variables may improve the performance of the final model. Variable selection can also be used as an exploratory tool to help identify the variables that are most interesting for a given task. For a detailed description of how interval variable selection works and practical considerations, see Interval PLS for Variable Selection. The following documentation describes the basic use of the controls on the Interval Variable Selection Panel.

Note that, in addition to the controls on the panel, the Interval Variable Selection algorithm also uses the current preprocessing and cross-validation settings of Analysis.



Mode

(Note: Mode is only visible if the "All Options" checkbox has been checked.) Defines the variable selection direction. "Forward" starts with no variables selected and looks for the best single interval to add. "Reverse" starts with all variables selected and looks for the worst single interval to discard. Use the "No. of Intervals" option below to add or remove more than one interval, and the "Interval Size" option to add or remove "blocks" of adjacent variables as a single interval.
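The difference between the two directions can be sketched for a single step. This is only an illustration under stated assumptions, not the algorithm behind the panel: `cv_error` is a hypothetical callable that returns a cross-validation error for a set of interval indices, and the toy error values are made up for the example.

```python
def forward_step(n_candidates, selected, cv_error):
    """Return the unselected interval whose addition gives the lowest error."""
    candidates = [i for i in range(n_candidates) if i not in selected]
    return min(candidates, key=lambda i: cv_error(selected | {i}))


def reverse_step(selected, cv_error):
    """Return the selected interval whose removal gives the lowest error."""
    return min(sorted(selected), key=lambda i: cv_error(selected - {i}))


# Hypothetical stand-in for cross-validation: negative values mean the
# interval lowers the error, positive values mean it only adds noise.
CONTRIB = {0: 0.02, 1: -0.5, 2: 0.08, 3: -0.3, 4: 0.04}


def toy_cv_error(selected):
    return 1.0 + sum(CONTRIB[i] for i in selected)


# "Forward": start from nothing and find the best single interval to add.
best_to_add = forward_step(5, frozenset(), toy_cv_error)

# "Reverse": start from everything and find the worst single interval to drop.
worst_to_drop = reverse_step(frozenset(range(5)), toy_cv_error)
```

With these toy values, the forward step picks the most informative interval and the reverse step picks the noisiest one.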

No. of Intervals

Defines the total number of intervals to be added or removed. When a specific number is entered, the algorithm will continue to add or remove intervals up to this number. A value of 1 limits the algorithm to selecting only one interval to add or remove, a value of 2 looks for the two best (or worst) intervals to add (or remove), and so on. When "Automatic" is checked, the algorithm will continue to add or remove intervals until the cross-validation results cease to improve.
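The two stopping rules can be sketched with a greedy forward loop. This is a minimal illustration, not the panel's actual implementation: `cv_error` is a hypothetical callable returning a cross-validation error for a list of interval indices, `n_intervals=None` stands in for the "Automatic" checkbox, and the toy error values are invented.

```python
def forward_select(n_candidates, cv_error, n_intervals=None):
    """Greedily add intervals; n_intervals=None mimics the "Automatic" rule."""
    selected = []
    best_error = float("inf")
    limit = n_candidates if n_intervals is None else min(n_intervals, n_candidates)
    while len(selected) < limit:
        remaining = [i for i in range(n_candidates) if i not in selected]
        # Pick the interval that gives the lowest error when added.
        trial = min(remaining, key=lambda i: cv_error(selected + [i]))
        trial_error = cv_error(selected + [trial])
        # Automatic: stop as soon as the error no longer improves.
        if n_intervals is None and trial_error >= best_error:
            break
        selected.append(trial)
        best_error = trial_error
    return selected


# Hypothetical stand-in for cross-validation (made-up contributions).
CONTRIB = {0: 0.02, 1: -0.5, 2: 0.08, 3: -0.3, 4: 0.04}


def toy_cv_error(selected):
    return 1.0 + sum(CONTRIB[i] for i in selected)


one = forward_select(5, toy_cv_error, n_intervals=1)
three = forward_select(5, toy_cv_error, n_intervals=3)
auto = forward_select(5, toy_cv_error)
```

In this sketch a fixed count keeps adding intervals until the count is reached, even if the last one makes the error slightly worse, while the automatic rule stops after the two intervals that actually help.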

Interval Size (vars)

Specifies how many adjacent variables should be included in each interval. For example, a value of 5 will cause each interval to contain a set of 5 adjacent variables which must be added or removed as a group. A value of one (the default) will cause each interval to contain a single variable.

Step Size

(Note: Step Size is only visible if the "All Options" checkbox has been checked.) Specifies the number of variables between the start of one interval and the start of the next. When the Step Size is less than the Interval Size, intervals will "overlap" and variables may belong to more than one interval (this allows a "sliding window" style of variable selection). When it is greater than the Interval Size, there will be a gap of unused variables between intervals (this can be used for quick, coarse selection of variable windows). The automatic setting produces neither overlap nor gaps between intervals.
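How Interval Size and Step Size interact can be sketched by listing the variable indices in each interval. This is an illustration only: `build_intervals` is a hypothetical helper, indices are 0-based, and dropping a trailing window that would run past the last variable is an assumption, since the handling of leftover variables is not described here.

```python
def build_intervals(n_vars, interval_size=1, step_size=None):
    """Return intervals as lists of 0-based variable indices."""
    if step_size is None:
        # "Automatic": step equals interval size, so no overlap and no gaps.
        step_size = interval_size
    if interval_size < 1 or step_size < 1:
        raise ValueError("interval_size and step_size must be at least 1")
    intervals = []
    # Assumption: only full-size windows are kept.
    for start in range(0, n_vars - interval_size + 1, step_size):
        intervals.append(list(range(start, start + interval_size)))
    return intervals


contiguous = build_intervals(10, interval_size=5)           # automatic step
sliding = build_intervals(6, interval_size=3, step_size=1)   # overlapping windows
gapped = build_intervals(10, interval_size=2, step_size=4)   # gaps between windows
```

Here `sliding` places variable 2 in three different intervals, while `gapped` never uses variables 2, 3, 6, and 7.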

Max LVs

For IPLS, IPLSDA, and IPCR, this setting defines the maximum number of Latent Variables or Principal Components which can be used for each variable subset model.

Use Selected Intervals

Once a variable selection has been performed, the list of selected variable(s) will appear in the box at the bottom of the interface. To change the X-block to include only the selected variables, click the "Use" button. This will set the include field on the calibration X-block to the selected variables. To discard the list of selected variables and start variable selection afresh, click the "Discard" button.
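Conceptually, applying a selection amounts to merging the chosen intervals into one list of variables and keeping only those columns. The sketch below illustrates that idea with plain Python lists; it is not how the "Use" button is implemented, and `include_indices`, the intervals, and the data are all hypothetical.

```python
def include_indices(intervals, selected):
    """Merge the chosen intervals into a sorted list of unique variable indices."""
    return sorted({v for i in selected for v in intervals[i]})


# Made-up overlapping intervals (0-based variable indices).
intervals = [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
include = include_indices(intervals, selected=[0, 1])

# Made-up calibration data: 2 samples by 7 variables.
X = [
    [10, 11, 12, 13, 14, 15, 16],
    [20, 21, 22, 23, 24, 25, 26],
]

# Keep only the included columns.
X_reduced = [[row[j] for j in include] for row in X]
```

Because the intervals may overlap, the merge removes duplicates so each variable is included at most once.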