Svmoc

From Eigenvector Research Documentation Wiki

Purpose

Support Vector Machine for one-class analysis.

Synopsis

model = svmoc(x,options);  % calibrates a model from x, whose samples all belong to a single class
pred = svmoc(x,model,options);  % applies the model to a new X-block and identifies outliers

Description

One-class analysis using SVM. SVMOC calibrates and applies one-class Support Vector Machine models. It represents the distribution of a data set in which all data points belong to the same class by identifying an envelope that surrounds most of the calibration data. A small, user-specified fraction (nu) of the calibration data is left outside this envelope and represents the expected outlier fraction in the data. When the model is applied to a new data set, any data points that fall outside the envelope are considered outliers from the class. SVMOC is implemented using the LIBSVM package.
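The calibrate-then-apply workflow described above might look like the following minimal sketch. The data are synthetic and made up for illustration. The call `svmoc('options')` is an assumption that svmoc returns its default options structure when called this way, and the `exist` guard is only there so the script still runs where svmoc is not on the path.

```matlab
% Synthetic calibration data: 200 samples, 5 variables, one class (made-up data).
rng(0);                          % reproducible random numbers
xcal = randn(200, 5);

% New data: 20 samples like the calibration set plus 5 shifted far away.
xnew = [randn(20, 5); randn(5, 5) + 8];

% Guard the toolbox calls so the script still runs if svmoc is not on the path.
if exist('svmoc', 'file')
    options = svmoc('options');  % ASSUMPTION: returns the default options structure
    options.nu = 0.05;           % about 5% of calibration samples left outside the envelope
    options.plots = 'none';
    options.display = 'off';
    model = svmoc(xcal, options);        % calibrate on the one-class data
    pred  = svmoc(xnew, model, options); % apply to new data to flag outliers
end
```

With a well-fitted envelope, the five shifted samples would be expected to fall outside it, although the exact result depends on the gamma value chosen during calibration.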

Note: svmoc is currently only available as a command-line function and is not available as a graphical user interface (GUI) analysis method.

Options

options = a structure array with the following fields:

  • display: [ 'off' | {'on'} ], governs level of display to command window.
  • plots: [ 'none' | {'final'} ], governs level of plotting.
  • preprocessing: {[]} preprocessing structures for x block (see PREPROCESS). NOTE that y-block preprocessing is NOT used with SVMOC. Any y-preprocessing will be ignored.
  • blockdetails: [ {'standard'} | 'all' ], extent of predictions and residuals included in model, 'standard' = only y-block, 'all' = x- and y-blocks.
  • algorithm: [ 'libsvm' ] algorithm to use. 'libsvm' is the default and currently the only option.
  • kerneltype: [ 'linear' | {'rbf'} ], SVM kernel to use. 'rbf' is default. 'linear' is not recommended for SVMOC.
  • svmtype: [ 'one-class svm' ] Type of SVM to apply. 'one-class svm' is the default and only allowed value.
  • probabilityestimates: [ 0 | {1} ], whether to train the SVM model for probability estimates, 0 or 1 (default 1).
  • cvtimelimit: Time limit (seconds) on each individual cross-validation sub-calculation when searching over supplied SVM parameter ranges for optimal parameters. Only relevant if a range of values is supplied for an SVM parameter such as gamma. Default is 10 (seconds).
  • splits: Number of subsets to divide data into when applying n-fold cross validation. Default is 5.
  • gamma: Value(s) to use for LIBSVM kernel gamma parameter. Default is 15 values from 10^-6 to 10, uniformly spaced on a log scale.
  • nu: Value to use for LIBSVM '-n' parameter (nu). Default = 0.05 indicates 5% of calibration data are considered outliers from the distribution.
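As a sketch of how these fields might be set by hand, the block below builds an options structure using only field names from the list above. The gamma grid reproduces the documented default of 15 log-spaced values from 10^-6 to 10, so each value is sqrt(10) times the previous one. Whether svmoc fills in omitted fields with their defaults is an assumption not confirmed here.

```matlab
% Default gamma grid as described: 15 log-spaced values from 10^-6 to 10.
gammagrid = logspace(-6, 1, 15);

% Options structure using only field names listed above.
% ASSUMPTION: fields left out here are filled with defaults by svmoc.
options = struct();
options.kerneltype  = 'rbf';      % recommended kernel for one-class models
options.nu          = 0.05;       % expected outlier fraction in the calibration data
options.gamma       = gammagrid;  % a range, so gamma is chosen by cross-validation
options.splits      = 5;          % number of cross-validation subsets
options.cvtimelimit = 10;         % seconds allowed per cross-validation sub-calculation
```

Replacing `gammagrid` with a single scalar would skip the cross-validated search and use that gamma value directly.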

Algorithm

SVMOC calls the LIBSVM implementation with the user-specified values for the LIBSVM parameters (see options above). See [[1]] for further details of these options.

The default SVMOC gamma parameter is a range of values rather than a single value. When a range is supplied, SVMOC searches over it using cross-validation, selects the optimal gamma value, and builds an SVM model using that value. This is the recommended usage. The user can avoid the search, however, by passing in a single desired gamma value.

When a range of gamma values is input, the cross-validation misclassification rate is used to compare the SVMOC models built at each gamma value. The search proceeds from the largest gamma value to the smallest. The first gamma encountered whose misclassification rate is less than 1.25 times the overall minimum misclassification rate found for any gamma value is chosen. If no gamma meets this criterion, the gamma that gives the smallest misclassification rate is chosen. This approach is taken because large gamma values give undesirable overfitted models (with a large CV misclassification rate), while small gamma values give the simplest solution, a bounding hyper-sphere with fraction nu of outliers and a CV misclassification rate approximately equal to nu. The selection method therefore attempts to pick an intermediate gamma that leads to a bounding hyper-surface capturing some of the shape of the calibration data distribution.
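The selection rule described above can be sketched in a few lines of plain MATLAB. This is an illustration of the rule as stated, not the toolbox's internal code. The gamma values and misclassification rates are made-up numbers, and the variable names are hypothetical.

```matlab
% Hypothetical CV results: one misclassification rate per candidate gamma.
gammas  = [1e-4 1e-3 1e-2 1e-1 1 10];             % candidate gamma values
cvrates = [0.050 0.050 0.040 0.045 0.200 0.600];  % made-up CV misclassification rates

% Visit candidates from the largest gamma to the smallest.
[gsorted, order] = sort(gammas, 'descend');
rsorted = cvrates(order);

% Accept the first gamma whose rate is below 1.25 times the overall minimum.
threshold = 1.25 * min(rsorted);
idx = find(rsorted < threshold, 1, 'first');

% Fallback: no rate passes the strict test (this happens when the minimum is 0),
% so take the gamma with the smallest rate. Tie-breaking here is an ASSUMPTION.
if isempty(idx)
    [~, idx] = min(rsorted);
end

bestgamma = gsorted(idx);
fprintf('Selected gamma = %g (CV rate %.3f)\n', bestgamma, rsorted(idx));
```

With these numbers the minimum rate is 0.040, so the threshold is 0.050. Scanning downward from gamma = 10, the first rate below the threshold is 0.045 at gamma = 0.1, which is selected in preference to the minimum-rate gamma of 0.01.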

See Also

svm, svmda