Automatic sample selection

From Eigenvector Research Documentation Wiki
Revision as of 21:55, 8 October 2012 by imported>Jeremy

The Calibration/Validation Sample selection interface allows the user to choose which samples to keep in the calibration set (Cal) and which to move to the validation set (Val).

Selection can be done manually, by setting the "Sample Type" Class set (under the Row Labels tab) to either Calibration or Validation for each sample, or automatically, by selecting the Automatic split button (gear) in the toolbar.

The sample selection interface is opened by choosing "Split into Calibration / Validation" from any of the data blocks in the Analysis status window. The resulting interface is a customized DataSet editor which shows one row for each sample in the current calibration and validation blocks and allows the user to modify the status of each sample.

Once set selection is done, the "Accept Experiment Setup" toolbar button can be used to automatically sort the data into the calibration and validation blocks. All data marked as "Calibration" will be moved to the X/Y blocks in the calibration section of the Analysis window and all data marked as "Validation" will be moved to the X/Y blocks in the validation section of the Analysis window. Clicking the "Discard Experiment Setup" button will discard all Cal / Val changes.
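
The sorting step described above can be pictured as a simple partition on the "Sample Type" label. The following Python sketch is only an illustration of that idea, not the toolbox's own code; the function name split_by_sample_type and the list-based data layout are assumptions made for the example.

```python
def split_by_sample_type(samples, sample_types):
    """Partition samples into (calibration, validation) lists by label.

    Hypothetical helper: `samples` is any list of rows and `sample_types`
    holds "Calibration" or "Validation" for each row, in the same order.
    """
    if len(samples) != len(sample_types):
        raise ValueError("samples and sample_types must have the same length")
    cal, val = [], []
    for sample, kind in zip(samples, sample_types):
        if kind == "Calibration":
            cal.append(sample)
        elif kind == "Validation":
            val.append(sample)
        else:
            raise ValueError(f"unknown sample type: {kind!r}")
    return cal, val


# Three samples, the second one marked for the validation set.
cal, val = split_by_sample_type(
    ["s1", "s2", "s3"],
    ["Calibration", "Validation", "Calibration"],
)
```

Here cal holds the samples that would go to the calibration blocks and val those that would go to the validation blocks.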

Manual Sample Selection

Each sample can be moved to either the Calibration or Validation set by simply changing the "Sample Type" class. If there are labels for the samples, these will be shown in the Label field of the interface.

To move more than one sample at a time, click the button at the left of each row you want to move to select that row. Once all the desired rows are selected, use the Class pull-down menu on one of the selected rows to choose Calibration or Validation, as desired. All selected samples will be switched to the indicated set.

Automatic Sample Selection

Automatic sample selection walks the user through the selection by asking the series of questions outlined below.

Disposition of Previous Selection Changes

First, if there are any samples which have been manually or automatically moved from Cal to Val, or vice versa, the user is asked if they want to Reset all samples back to their original set before automatic selection is done. Choosing "Reset" will restore all the samples to the set they were in when the sample selection interface was opened. Choosing "Select from Current Split" will keep the samples in their current split and allow further selection automatically. "Cancel" stops all selection.


Direction for Sample Selection

Next, if there are any samples marked as Validation, the user is asked which "direction" they want to select: either removing samples FROM the calibration set (to the validation set), or adding samples TO the calibration set (out of the validation set). The first option is used when there are more samples in the calibration set than are desired or when the user wishes to create a test set for their model. The second option is used when new data has been measured and the user wishes to add some subset of these samples to a previous set of calibration samples (to improve model performance on the new types of samples).

If all the samples are in the calibration set already, Remove From Calibration is assumed.


Selection Method

Next, the selection method must be chosen from:

  • Nearest Neighbor Thinning - based on reducennsamples, this method selects samples by discarding samples which are very similar to existing samples. The result is a set of samples that spans the same range as the original data, but with an even distribution of samples across that range.
  • Onion - based on distselect, this method first selects a ring of highly unique samples based on distance (similar to the D-Optimal criterion), then leaves out a ring of the next most unique samples, then finally selects a random subset of samples inside the boundaries selected in the "onion".
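
To make the idea behind Nearest Neighbor Thinning concrete, the Python sketch below repeatedly finds the closest pair of remaining samples and discards one of them until the requested number is left. It is a minimal illustration under stated assumptions, not the reducennsamples implementation: the function name thin_nearest_neighbors, the use of Euclidean distance, and the rule of dropping the later sample of each pair are all choices made for this example.

```python
import math


def thin_nearest_neighbors(points, n_keep):
    """Return the sorted indices of the points kept after thinning.

    Hypothetical sketch: at each step the two closest remaining points are
    found (Euclidean distance) and the later-indexed one is discarded.
    """
    if not 1 <= n_keep <= len(points):
        raise ValueError("n_keep must be between 1 and len(points)")
    kept = list(range(len(points)))
    while len(kept) > n_keep:
        best = None  # (distance, index of the point to drop)
        for a in range(len(kept)):
            for b in range(a + 1, len(kept)):
                d = math.dist(points[kept[a]], points[kept[b]])
                if best is None or d < best[0]:
                    best = (d, kept[b])
        kept.remove(best[1])
    return kept


# Two tight clusters at each end and one isolated point in the middle.
points = [(0.0, 0.0), (0.5, 0.0), (5.0, 0.0), (10.0, 0.0), (10.25, 0.0)]
kept = thin_nearest_neighbors(points, 3)
```

In this small example one point from each tight cluster is discarded, so the retained samples still span the full range but are more evenly spaced. The brute-force pair search is quadratic per step and is only meant for illustration on small data sets.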


Choosing Percentage to Keep

Finally, the user must select the percentage of samples to "select". In the case of Removing From Calibration, this is the percentage of Calibration samples to keep in the calibration set. In the case of Adding To Calibration, this is the percentage of Validation samples to add to the calibration set. The value must be between 1 and 100.
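
The effect of the percentage in each direction can be worked through with a little arithmetic. The Python sketch below is a hypothetical helper (count_to_select is not a toolbox function), and the rounding to the nearest whole sample is an assumption, since this page does not say how fractional counts are handled.

```python
def count_to_select(n_cal, n_val, percent, direction="remove"):
    """Return (n_selected, new_n_cal, new_n_val) for a requested percentage.

    direction="remove": keep `percent` of the calibration samples and move
    the rest to validation. direction="add": move `percent` of the
    validation samples into calibration. Rounding is an assumption.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    if direction == "remove":
        n_selected = round(n_cal * percent / 100)  # calibration samples kept
        moved = n_cal - n_selected                 # samples sent to validation
        return n_selected, n_selected, n_val + moved
    if direction == "add":
        n_selected = round(n_val * percent / 100)  # validation samples added
        return n_selected, n_cal + n_selected, n_val - n_selected
    raise ValueError("direction must be 'remove' or 'add'")


# Keep 75% of 40 calibration samples: 30 stay, 10 move to validation.
kept, new_cal, new_val = count_to_select(40, 0, 75, "remove")

# Add 50% of 20 validation samples to 40 calibration samples.
added, cal_after, val_after = count_to_select(40, 20, 50, "add")
```

In the first case the calibration set shrinks to 30 samples and the validation set grows to 10; in the second, 10 validation samples join the calibration set, leaving 50 and 10 respectively.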


Finishing the Selection

Once all settings have been defined, the selection will take place and the samples will be marked in their new sets. It may be useful to create a plot (click on the Plot toolbar button, or the Plot tab) to view which samples are in which sets. Accepting the changes will move all samples to their new sets and ensure that the Analysis window is configured appropriately for the data.