# Gaselctr

### Purpose

Genetic algorithm for variable selection with PLS.

### Synopsis

- model = gaselctr(x,y,*options*)
- [fit,pop,cavfit,cbfit] = gaselctr(x,y,*options*)

### Description

GASELCTR uses a genetic algorithm to search for the subset of variables that minimizes cross-validation error.

#### Inputs

- **x** = the predictor block (x-block), and
- **y** = the predicted block (y-block).

Note that all scaling should be done prior to running GASELCTR.

#### Outputs

**model** = a standard GENALG model structure with the following fields:

- **modeltype**: 'GENALG'. This field will always have this value.
- **datasource**: {[1x1 struct] [1x1 struct]}, structures defining where the X- and Y-blocks came from.
- **date**: date stamp for when GASELCTR was run.
- **time**: time stamp for when GASELCTR was run.
- **info**: 'Fit results in "rmsecv", population included variables in "icol"', information field describing where the fitness results for each member of the population are contained.
- **rmsecv**: fitness results for each member of the population. For an *M* x *N* X-block and *Mp* unique populations at convergence, rmsecv will be 1 x *Mp*.
- **icol**: each row of icol corresponds to the variables used for that member of the population (a 1 [one] means that variable was used and a 0 [zero] means that it was not). For an *M* x *N* X-block and *Mp* unique populations at convergence, icol will be *Mp* x *N*.
- **detail**: [1x1 struct], a structure containing model details including the following fields:
  - **avefit**: the average fitness at each generation,
  - **bestfit**: the best fitness at each generation, and
  - **options**: a structure corresponding to the options described in the Options section.

For the second output syntax shown above,

- **fit** is the same as `model.rmsecv`,
- **pop** is the same as `model.icol`,
- **cavfit** is the same as `model.detail.avefit`, and
- **cbfit** is the same as `model.detail.bestfit`.
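As a minimal sketch of how these outputs might be read, the snippet below picks the population member with the lowest fitness value and lists the variables it includes. The `model` structure here is a hypothetical stand-in with invented values, built by hand so the snippet runs without the toolbox; only the field names `rmsecv` and `icol` come from the description above.

```matlab
% Hypothetical stand-in for a GASELCTR result (values invented for
% illustration only): 3 unique population members, 5 variables.
model.rmsecv = [0.42 0.37 0.40];            % 1 x Mp fitness values
model.icol   = logical([1 0 1 1 0; ...
                        0 1 1 0 0; ...
                        1 1 0 0 1]);        % Mp x N inclusion flags

% Member with the lowest cross-validation error
[bestfit, best] = min(model.rmsecv);

% Column indices of the variables used by that member
selected = find(model.icol(best, :));

% Fraction of population members that include each variable
freq = mean(model.icol, 1);
```

With the second output syntax, the same steps would apply to `fit` and `pop` in place of `model.rmsecv` and `model.icol`.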

### Options

*options* is a structure array with the following fields:

- **plots**: [ 'none' | {'intermediate'} | 'replicates' | 'final' ] governs plots:
  - 'final' gives only a final summary plot,
  - 'replicates' gives plots at the end of each replicate,
  - 'intermediate' gives plots during analysis, and
  - 'none' gives no plots.
- **display**: [ {'on'} | 'off' ] governs output to the command window.
- **popsize**: {64} the population size (16 ≤ popsize ≤ 256, and popsize must be divisible by 4).
- **maxgenerations**: {100} the maximum number of generations (25 ≤ mg ≤ 500).
- **mutationrate**: {0.005} the mutation rate (typically 0.001 ≤ mt ≤ 0.01).
- **windowwidth**: {1} the number of variables in a window (integer window width).
- **convergence**: {50} percent of population the same at convergence (typically cn=80).
- **initialterms**: {30} percent terms included at initiation (10 ≤ it ≤ 50).
- **crossover**: {2} breeding cross-over rule (cr = 1: single cross-over; cr = 2: double cross-over).
- **algorithm**: [ 'mlr' | {'pls'} ] regression algorithm.
- **ncomp**: {10} maximum number of latent variables for PLS models.
- **cv**: [ 'rnd' | {'con'} ] cross-validation option ('rnd': random subset cross-validation; 'con': contiguous block subset cross-validation).
- **split**: {5} number of subsets to divide data into for cross-validation.
- **iter**: {1} number of iterations for cross-validation at each generation.
- **preprocessing**: {[ ] [ ]} a cell containing standard preprocessing structures for the X- and Y-blocks respectively (see PREPROCESS).
- **preapply**: [ {0} | 1 ] If 1, preprocessing is applied to the data prior to the GA. This speeds up the selection, but may reduce the accuracy of the cross-validation results. Output "fit" values should only be compared to each other. A full cross-validation should be run after analysis to get more accurate RMSECV values.
- **reps**: {1} the number of replicate runs to perform.
- **target**: a two-element vector [target_min target_max] describing the target range for the number of variables/terms, n, included in a model. Outside of this range, the penaltyslope option is applied by multiplying the fitness for each member of the population by `penaltyslope*(target_min-n)` when n<target_min, or `penaltyslope*(n-target_max)` when n>target_max. Field `target` is used to bias models towards a given range of included variables (see penaltyslope below).
- **targetpct**: {1} flag indicating if values in field target are given in percent of variables (1) or in absolute number of variables (0).
- **penaltyslope**: {0} the slope of the penalty function (see target above).
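As a rough illustration of the `target` and `penaltyslope` expressions above, the sketch below evaluates the two documented terms for a given number of included variables n. The helper name `penaltyterm` is hypothetical and is not part of the toolbox. Returning 0 inside the target range is an assumption used here only to signal that no penalty applies, and the numbers assume targetpct = 0 (absolute variable counts).

```matlab
% Hypothetical helper, not part of GASELCTR: evaluates the documented
% expressions penaltyslope*(target_min-n) and penaltyslope*(n-target_max).
% Assumption: returns 0 when n lies inside [target_min target_max].
penaltyterm = @(n, target, penaltyslope) ...
    penaltyslope * max([target(1) - n, n - target(2), 0]);

target = [5 10];       % [target_min target_max], absolute counts (targetpct = 0)
penaltyslope = 0.5;    % illustrative value only

below  = penaltyterm(3,  target, penaltyslope);   % 0.5*(5-3)
inside = penaltyterm(7,  target, penaltyslope);   % inside range, no penalty term
above  = penaltyterm(12, target, penaltyslope);   % 0.5*(12-10)
```

The term grows linearly with the distance from the target range, which is how the option biases the search towards models with a number of variables inside that range.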

### Examples

To use mean centering outside the genetic algorithm (no additional centering will be performed within the algorithm) do the following:

    x2 = mncn(x);
    y2 = mncn(y);
    [fit,pop] = gaselctr(x2,y2);

To use mean centering inside the genetic algorithm (centering will be performed for each cross-validation subset) do the following:

    options = gaselctr('options');
    options.preprocessing{1} = preprocess('default', 'mean center');
    options.preprocessing{2} = preprocess('default', 'mean center');
    [fit,pop] = gaselctr(x,y,options);

### See Also

calibsel, fullsearch, genalg, genalgplot, ipls, Genetic Algorithms for Variable Selection