Doptimal
Latest revision as of 16:36, 8 October 2008
Purpose
Selects samples from a candidate matrix that satisfy the d-optimal condition.
Synopsis
- isel = doptimal(x,nosamps,iint,tol)
Description
DOPTIMAL selects nosamps samples from a candidate matrix x so as to maximize det(x(isel,:)'*x(isel,:)), where isel is a vector of indices of the selected samples.
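As a minimal illustration of the criterion (not of DOPTIMAL itself), the sketch below evaluates det(x(isel,:)'*x(isel,:)) for two hand-picked selections. The data and the names iselA, iselB, dA, and dB are made up for this example.

```matlab
% Illustrative data only: 6 candidate samples, 2 variables.
x = [1 0; 0 1; 1 1; 2 0; 0 2; 1 2];

% Two hypothetical selections of nosamps = 3 samples each.
iselA = [1 2 3];
iselB = [4 5 6];

% D-optimality criterion for each selection.
dA = det(x(iselA,:)'*x(iselA,:));
dB = det(x(iselB,:)'*x(iselB,:));

fprintf('det for iselA = %g, det for iselB = %g\n', dA, dB);
```

Here iselB gives the larger determinant (36 versus 3), so under the d-optimal condition it is the better of these two selections.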
The optional input iint is a vector of indices used to initialize the optimization algorithm. If iint is not supplied, the algorithm is initialized with samples identified as being on the exterior of the data set by the DISTSLCT function. This is in contrast to the random initial subset used in many algorithms. The reason is that the routine is based on Fedorov's algorithm (de Aguiar, P.F., Bourguignon, B., Khots, M.S., Massart, D.L., and Phan-Than-Luu, R., "D-optimal designs", Chemo. Intell. Lab. Sys., 30, 199-210, 1995), which requires calculating inv(x(isel,:)'*x(isel,:)), and the inverse may not exist for a random subset. The routine then exchanges the 'least informative' sample in the selected set with a 'more informative' sample in the candidate set. The optional input tol sets the tolerance for the minimum increase in the determinant {default = 1x10^-4}.
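The exchange idea can be sketched with a brute-force search. This is not the DOPTIMAL source code and not Fedorov's update formulas: it simply recomputes the determinant for every possible swap and accepts the best one. The data, the starting set, the variable names, and the reading of tol as a relative increase are all assumptions made for illustration.

```matlab
% Sketch of a simple exchange search (NOT the DOPTIMAL implementation).
x       = [1 0; 0 1; 1 1; 2 0; 0 2; 1 2; 2 2; 0.5 0.5];  % illustrative candidates
nosamps = 3;
tol     = 1e-4;       % assumed here to be a minimum RELATIVE increase
isel    = [1 2 3];    % assumed non-singular starting set (plays the role of iint)

dcur     = det(x(isel,:)'*x(isel,:));
improved = true;
while improved
  improved = false;
  best  = dcur; bi = 0; bj = 0;
  icand = setdiff(1:size(x,1), isel);   % samples not currently selected
  for i = 1:numel(isel)
    for j = icand
      itry    = isel;
      itry(i) = j;                      % swap selected sample i for candidate j
      dtry    = det(x(itry,:)'*x(itry,:));
      if dtry > best
        best = dtry; bi = i; bj = j;
      end
    end
  end
  if bi > 0 && (best - dcur) > tol*dcur
    isel(bi) = bj;                      % accept the best single exchange
    dcur     = best;
    improved = true;
  end
end
isel = sort(isel);
```

With these illustrative candidates the search ends at rows 4, 5, and 7 with a determinant of 48. A different starting set can lead to a different answer on other data, because the search only guarantees that no single exchange improves the result further.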
Note that nosamps must be ≥ rank(x) (it is necessary but not sufficient that nosamps ≥ size(x,2)) for a good solution to be found. This is required so that a good estimate of inv(x(isel,:)'*x(isel,:)) can be obtained. When nosamps < size(x,2), the scores from PCA or PLS can be used instead, provided that nosamps ≥ the number of factors (principal components or latent variables) used. Also note that the solution can depend on the initial guess and that isel does not necessarily represent a global optimum.
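One way to apply the scores suggestion is sketched below: x is replaced by a small number of PCA scores so that nosamps is at least the number of factors. The scores are computed here with a plain SVD of the mean-centered data, which is an assumption; the preprocessing and the PCA routine actually used may differ. The data and the names xc, nfac, and t are illustrative.

```matlab
% Illustrative data only: 10 candidate samples, 8 variables.
x = magic(10);
x = x(:,1:8);

nosamps = 4;   % fewer samples than columns of x
nfac    = 3;   % number of principal components kept (nosamps >= nfac)

% Mean-center and compute PCA scores via the SVD (assumed preprocessing).
xc      = x - repmat(mean(x,1), size(x,1), 1);
[u,s,v] = svd(xc, 'econ');
t       = u(:,1:nfac)*s(1:nfac,1:nfac);   % scores, one row per sample

% Select samples using the scores in place of x.
if exist('doptimal','file')   % only runs when DOPTIMAL is on the path
  isel = doptimal(t, nosamps);
end
```

The indices returned for the scores refer to the same rows of x, so isel can be used directly to pick the original samples.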
Inputs
- x: data matrix
- nosamps: number of samples to select
Optional Inputs
- iint: vector of initialization indices
- tol: tolerance for minimum increase in the determinant {default: 1x10^-4}
Outputs
- isel: vector of selected indices
Examples
For an input matrix x that is m by 5:
    isel5 = doptimal(x,5);
    isel6 = doptimal(x,6);