Svm: Difference between revisions
Latest revision as of 10:29, 8 December 2023
Purpose
SVM: Support Vector Machine (LIBSVM) for regression. Use SVMDA for SVM classification (see Svmda). The Svmda page is also worth reading, since it has more detailed information, much of which also applies to SVM for regression.
Synopsis
- model = svm(x,y,options); %identifies model (calibration step).
- pred = svm(x,model,options); %makes predictions with a new X-block
- pred = svm(x,y,model,options); %performs a "test" call with a new X-block and known y-values
- svm % Launches an Analysis window with SVM as the selected method.
Please note that the recommended way to build and apply an SVM model from the command line is to use the Model Object. Please see the EVRIModel_Objects wiki page on building and applying models using the Model Object.
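The calls in the synopsis can be strung together as in the following sketch. The data are synthetic and the single parameter values are arbitrary illustrations. Passing a partially filled options structure assumes that unspecified fields fall back to their defaults, which is not stated on this page. The svm calls are skipped when PLS_Toolbox is not on the MATLAB path.

```matlab
% Synthetic, deterministic data for illustration only.
t     = linspace(0, 1, 60)';
xcal  = [t, t.^2, sin(2*pi*t)];        % calibration X-block: 60 samples x 3 variables
ycal  = 2*t + sin(2*pi*t);             % continuous y-block
tt    = linspace(0.05, 0.95, 20)';
xtest = [tt, tt.^2, sin(2*pi*tt)];     % new X-block
ytest = 2*tt + sin(2*pi*tt);           % known y-values for the "test" call

% Single values for cost, epsilon and gamma (arbitrary choices) avoid the grid search.
% Assumption: fields left out of this structure take their default values.
options = struct('display', 'off', 'plots', 'none', ...
                 'svmtype', 'epsilon-svr', 'kerneltype', 'rbf', ...
                 'cost', 10, 'epsilon', 0.1, 'gamma', 0.1);

if exist('svm', 'file')                          % only run with PLS_Toolbox installed
    model = svm(xcal, ycal, options);            % calibration step
    pred  = svm(xtest, model, options);          % predictions for a new X-block
    valid = svm(xtest, ytest, model, options);   % "test" call with known y-values
end
```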
Description
The SVM function or analysis method performs calibration and application of Support Vector Machine (SVM) regression models. The model consists of a number of support vectors (essentially samples selected from the calibration set) and non-linear model coefficients which define the non-linear mapping of variables in the input x-block. The model allows prediction of a continuous y-block variable. Classification should be done with the svmda function instead.
Svm is implemented using the LIBSVM package which provides both epsilon-support vector regression (epsilon-SVR) and nu-support vector regression (nu-SVR). Linear and Gaussian Radial Basis Function kernel types are supported by this function.
Note: Calling svm with no inputs starts the graphical user interface (GUI) for this analysis method.
Inputs
- x = X-block (predictor block) class "double" or "dataset", containing numeric values,
- y = Y-block (predicted block) class "double" or "dataset", containing numeric values,
- model = previously generated model (when applying model to new data).
Outputs
- model = a standard model structure with the following fields (see Standard Model Structure):
  - modeltype: 'SVM',
  - datasource: structure array with information about input data,
  - date: date of creation,
  - time: time of creation,
  - info: additional model information,
  - pred: 2 element cell array with model predictions for each input block (when options.blockdetails='standard', x-block predictions are not saved and that element will be an empty array),
  - detail: sub-structure with additional model details and results, including:
    - model.detail.svm.model: Matlab version of the libsvm svm_model (Java). The number of support vectors used is given by model.detail.svm.model.l. It is useful to check this value: if most of the calibration samples are used as support vectors the model may be overfit, and if there are no support vectors the model could not be fit properly (all predictions will then equal a constant value, a weighted mean).
    - model.detail.svm.cvscan: Results of the CV parameter scan.
    - model.detail.svm.svindices: Indices of X-block samples which are support vectors.
- pred = a structure, similar to model, for the new data.
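The support-vector check described above might be scripted as follows. The counts here are hypothetical stand-ins so that the snippet runs on its own; with a real model they would come from model.detail.svm.model.l and the number of calibration samples. The 0.9 threshold for "most" is an arbitrary illustration, not a toolbox rule.

```matlab
% Hypothetical stand-in values. With a fitted model, use instead:
%   nsv = model.detail.svm.model.l;   % number of support vectors
nsv  = 57;     % number of support vectors (stand-in)
ncal = 60;     % number of calibration samples (stand-in)

frac = nsv / ncal;                 % fraction of calibration samples used as support vectors
if nsv == 0
    status = 'no support vectors: predictions will all equal a constant';
elseif frac > 0.9                  % arbitrary threshold for "most samples"
    status = 'most samples are support vectors: possible overfitting';
else
    status = 'support vector count looks reasonable';
end
fprintf('%d of %d samples (%.0f%%) are support vectors; %s\n', ...
        nsv, ncal, 100*frac, status);
```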
Options
options = a structure array with the following fields:
- display: [ 'off' | {'on'} ], governs level of display to command window,
- plots: [ 'none' | {'final'} ], governs level of plotting,
- preprocessing: {[] []} preprocessing structures for x and y blocks (see PREPROCESS).
- compression: [ {'none'} | 'pca' | 'pls' ] type of data compression to perform on the x-block prior to calculating or applying the SVM model. 'pca' uses a simple PCA model to compress the information. 'pls' uses either a pls or plsda model (depending on the svmtype). Compression can make the SVM more stable and less prone to overfitting.
- compressncomp: [1] Number of latent variables (or principal components) to include in the compression model.
- blockdetails: [ {'standard'} | 'all' ], extent of predictions and residuals included in model, 'standard' = only y-block, 'all' x- and y-blocks.
- algorithm: [ 'libsvm' ] algorithm to use. libsvm is default and currently only option.
- kerneltype: [ 'linear' | {'rbf'} ], SVM kernel to use. 'rbf' is default.
- svmtype: [ {'epsilon-svr'} | 'nu-svr' ] Type of SVM to apply. The default is 'epsilon-svr' for regression.
- probabilityestimates: [ 0 | {1} ], whether to train the SVR model for probability estimates, 0 or 1 (default 1).
- cvtimelimit: Set a time limit (seconds) on individual cross-validation sub-calculation when searching over supplied SVM parameter ranges for optimal parameters. Only relevant if parameter ranges are used for SVM parameters such as cost, epsilon, gamma or nu. Default is 10.
- splits: Number of subsets to divide data into when applying n-fold cross validation. Default is 5. This option is only used when the "cvi" option is empty.
- cvi: {{}} Standard cross-validation cell (see crossval) defining a split method, number of splits, and number of iterations. This cross-validation is used both for parameter optimization and for the error estimate on the final selected parameter values. If empty (the default), then random cross-validation is done based on the "splits" option.
- gamma: Value(s) to use for LIBSVM kernel gamma parameter. Default is 15 values from 10^-6 to 10, spaced uniformly in log.
- cost: Value(s) to use for LIBSVM 'c' parameter. Default is 11 values from 10^-3 to 100, spaced uniformly in log.
- epsilon: Value(s) to use for LIBSVM 'p' parameter (epsilon in loss function). Default is the set of values [1.0, 0.1, 0.01].
- nu: Value(s) to use for LIBSVM 'n' parameter (nu of nu-SVC, and nu-SVR). Default is the set of values [0.2, 0.5, 0.8].
- random_state: [1] Random seed number. Set this to a number for reproducibility.
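The default parameter ranges described above can be reproduced in plain MATLAB, which is a convenient starting point for building narrower or coarser grids. This is a sketch based only on the descriptions in the option list; whether the toolbox generates its defaults with logspace internally is an assumption.

```matlab
gamma_grid   = logspace(-6, 1, 15);   % 15 values from 10^-6 to 10, uniform in log
cost_grid    = logspace(-3, 2, 11);   % 11 values from 10^-3 to 100, uniform in log
epsilon_grid = [1.0, 0.1, 0.01];      % default epsilon values
nu_grid      = [0.2, 0.5, 0.8];       % default nu values (used by nu-SVR instead of epsilon)

% Assign to an options structure (field names as listed above).
options = struct();
options.svmtype    = 'epsilon-svr';
options.kerneltype = 'rbf';
options.gamma      = gamma_grid;
options.cost       = cost_grid;
options.epsilon    = epsilon_grid;

% Parameter combinations searched for epsilon-SVR with an RBF kernel.
ncombos = numel(gamma_grid) * numel(cost_grid) * numel(epsilon_grid);
fprintf('%d parameter combinations in the default grid\n', ncombos);
```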
Algorithm
Svm calls the LIBSVM implementation with the user-specified values for the LIBSVM parameters (see options above). See http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf for further details of these options.
The default SVM parameters cost, epsilon, nu and gamma have value ranges rather than single values. The svm function searches over the grid of applicable parameter values, using cross-validation to select the optimal combination, and then builds an SVM model using those values. This is the recommended usage, although the user can avoid the grid search by passing in single values for these parameters. If you are using the command line SVM function to build a model then the optimal SVM parameters are shown in model.detail.svm.cvscan.best. If you are using the graphical Analysis SVM then the optimal parameters are reported in the summary window which is shown when you mouse-over the model icon, once the model is built.
Model building performance
Building a single SVM model can sometimes be slow, especially if the calibration dataset is large. Using ranges for the SVM parameters to search for the optimal parameter combination increases the final model building time significantly. If cross-validation is used the calculation time increases again, possibly dramatically if the number of CV subsets is large. Some suggestions for faster SVM building include:
- 1) Turning CV off ("none") during preliminary analyses. This is much faster, and cross-validation is still performed using a default "Random Subsets" with 5 data splits and 1 iteration,
- 2) Using a coarse grid of SVM parameter values to search over for optimal values,
- 3) Choosing the CV method carefully, at least initially. For example, use "Random Subsets" with a small number of data splits (e.g. 5) and a small "Number of Iterations" (e.g. 1).
- 4) Using the compression option if the number of variables is large.
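To get a feel for suggestion 2, the size of the search can be estimated before fitting anything. The sketch below assumes one SVM fit per parameter combination per cross-validation split, which is a simplification of whatever the toolbox does internally, and the coarse grid values are arbitrary examples.

```matlab
% Rough number of SVM fits: combinations x CV splits (simplifying assumption).
nfits = @(gamma, cost, epsilon, splits) ...
    numel(gamma) * numel(cost) * numel(epsilon) * splits;

% Default ranges from the option list, with the default 5 splits.
default_fits = nfits(logspace(-6, 1, 15), logspace(-3, 2, 11), [1.0 0.1 0.01], 5);

% A coarse grid for preliminary work (values are illustrative only).
coarse_gamma = logspace(-4, 0, 3);    % 1e-4, 1e-2, 1
coarse_cost  = logspace(-2, 2, 3);    % 1e-2, 1, 100
coarse_fits  = nfits(coarse_gamma, coarse_cost, 0.1, 5);

fprintf('default grid: %d fits, coarse grid: %d fits\n', default_fits, coarse_fits);
```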
epsilon-SVR and nu-SVR
There are two commonly used versions of SVM regression, 'epsilon-SVR' and 'nu-SVR'. The original SVM formulation for regression (SVR) used parameters C [0, inf) and epsilon [0, inf) to apply a penalty to the optimization for points which were not correctly predicted. An alternative version of SVM regression was later developed in which the epsilon penalty parameter was replaced by an alternative parameter, nu (0, 1], which applies a slightly different penalty. The main motivation for the nu version of SVM is that nu has a more meaningful interpretation: it represents an upper bound on the fraction of training samples which are errors (badly predicted) and a lower bound on the fraction of samples which are support vectors. Some users feel nu is more intuitive to use than C or epsilon.
Epsilon and nu are just different versions of the penalty parameter. The same optimization problem is solved in either case, so it should not matter which form of SVM you use, epsilon or nu. PLS_Toolbox uses epsilon-SVR by default since this was the original formulation and is the most commonly used form. For more details on 'nu' SVM regression see http://www.csie.ntu.edu.tw/~cjlin/papers/newsvr.pdf
The user must provide parameters (or parameter ranges) for SVM regression as:
- 'epsilon-SVR':
  - epsilon, C (using linear kernel), or
  - epsilon, C, gamma (using radial basis function kernel),
- 'nu-SVR':
  - nu, C (using linear kernel), or
  - nu, C, gamma (using radial basis function kernel).
SVM Parameters
- cost: Cost [0, inf) represents the penalty associated with errors larger than epsilon. Increasing the cost value causes closer fitting to the calibration/training data.
- gamma: The kernel gamma parameter controls the width of the radial basis function kernel, and therefore how flexible the fitted regression function can be. Increasing gamma usually increases the number of support vectors.
- epsilon: In training the regression function there is no penalty associated with points which are predicted within distance epsilon from the actual value. Decreasing epsilon forces closer fitting to the calibration/training data.
- nu: Nu (0, 1] indicates a lower bound on the number of support vectors to use, given as a fraction of total calibration samples, and an upper bound on the fraction of training samples which are errors (poorly predicted).
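For reference, the roles of these parameters can be read off the standard epsilon-SVR formulation used by LIBSVM, written here in common textbook notation (the symbols w, b, \phi and \xi are not used elsewhere on this page):

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}}\quad
  & \tfrac{1}{2}\, w^{T} w + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right) \\
\text{subject to}\quad
  & w^{T}\phi(x_i) + b - y_i \le \epsilon + \xi_i, \\
  & y_i - w^{T}\phi(x_i) - b \le \epsilon + \xi_i^{*}, \\
  & \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, n, \\[4pt]
K(x_i, x_j) &= \phi(x_i)^{T}\phi(x_j)
  = \exp\!\left( -\gamma\, \lVert x_i - x_j \rVert^{2} \right)
  \quad \text{(RBF kernel)}
\end{aligned}
```

Residuals smaller than epsilon cost nothing, C weights the residuals beyond epsilon, and gamma sets how quickly the kernel decays with the distance between samples.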
See Also
analysis, ann, mlr, lwr, pls, pcr, svmda, preprocess, EVRIModel_Objects