Duplex

From Eigenvector Research Documentation Wiki

Purpose

Select a subset of samples from a data set by the Duplex algorithm.

Synopsis

[selCal, selTest] = duplex(x, k)

Description

The selected samples should provide uniform coverage of the data set and include samples on its boundary. Duplex starts by selecting the two samples furthest from each other and assigning them to the calibration set. It then finds the two remaining samples furthest from each other and assigns them to the test set. Next it iterates over the rest of the samples: the sample furthest from those already in the calibration set is added to the calibration set, and then the sample furthest from those already in the test set is added to the test set. This alternation is repeated until the calibration set contains the desired number of samples.
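The selection steps described above can be sketched on a small plain matrix using base MATLAB only. This is an illustrative sketch, not the toolbox source. It assumes Euclidean distances, reads "furthest from a set" as the largest minimum distance to the members of that set, adds one sample to each set per pass, and expects k >= 2 and at least four samples. The data values and the variable names D, Dfree, free, a, b and p are hypothetical.

```matlab
% Illustrative Duplex sketch (assumptions: Euclidean distance, maximin rule)
x = [0 0; 10 1; 1 9; 11 11; 5 5; 2 2; 6 4; 3 7];  % hypothetical data, rows = samples
k = 3;                                            % desired calibration-set size
n = size(x, 1);

% Pairwise Euclidean distance matrix
sq = sum(x.^2, 2);
D  = sqrt(max(bsxfun(@plus, sq, sq') - 2*(x*x'), 0));

selCal  = false(1, n);
selTest = false(1, n);

% Step 1: the two mutually furthest samples start the calibration set
[~, idx] = max(D(:));
[a, b]   = ind2sub([n n], idx);
selCal([a b]) = true;

% Step 2: the two mutually furthest remaining samples start the test set
Dfree = D;
Dfree(selCal, :) = -Inf;      % mask samples already assigned
Dfree(:, selCal) = -Inf;
[~, idx] = max(Dfree(:));
[a, b]   = ind2sub([n n], idx);
selTest([a b]) = true;

% Step 3: alternate additions until the calibration set holds k samples
while sum(selCal) < k && any(~selCal & ~selTest)
    free = find(~selCal & ~selTest);
    [~, p] = max(min(D(selCal, free), [], 1));    % furthest from calibration set
    selCal(free(p)) = true;

    free = find(~selCal & ~selTest);
    if isempty(free), break; end
    [~, p] = max(min(D(selTest, free), [], 1));   % furthest from test set
    selTest(free(p)) = true;
end
```

With this data, find(selCal) and find(selTest) list the row numbers assigned to each set; samples left unassigned belong to neither set.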

References:

  • R.D. Snee, Validation of regression models: methods and examples, Technometrics 19 (1977) 415-428
  • M. Daszykowski, B. Walczak, D.L. Massart, Representative subset selection, Analytica Chimica Acta 468 (2002) 91-103

Inputs

  • x = array or dataset object containing the data to select samples from.
  • k = number of samples to select for the calibration set.

Outputs

  • selCal = logical vector of length nsamples indicating which samples are selected for the calibration set (true = selected). If input x was a dataset object, then selCal has size (1, nincluded), where nincluded is the number of included samples, and indicates which of the included samples are selected.
  • selTest = (1, nsamples) logical vector indicating which samples are selected for the test set (true = selected). If input x was a dataset object, then selTest has size (1, nincluded) and indicates which of the included samples are selected.
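When x is a dataset object with excluded samples, the outputs index the included samples rather than the original rows. A minimal sketch of mapping them back to original row numbers is shown below with plain vectors. Here incl is a hypothetical stand-in for the list of included row indices (for a DataSet object this is assumed to be available as x.include{1}), and the values of selCal and selTest are made up for illustration.

```matlab
% Hypothetical case: 7 samples in total, rows 3 and 6 excluded
incl    = [1 2 4 5 7];            % included row indices (assumed given)
selCal  = logical([1 0 1 0 0]);   % made-up output, one entry per included sample
selTest = logical([0 1 0 0 1]);

calRows  = incl(selCal);          % original row numbers in the calibration set
testRows = incl(selTest);         % original row numbers in the test set
```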

Example

>> load arch;                              % load the arch data set
>> [selCal,selTest] = duplex(arch, 50);    % select 50 calibration samples
>> arch_subset = arch(selCal,:);           % keep only the calibration samples

See Also

distslct, reducennsamples, splitcaltest, doptimal, stdsslct, randomsplit, spxy