Svm: Difference between revisions

From Eigenvector Research Documentation Wiki
imported>Jeremy
(Created page with ''''WARNING:''' Placeholder page for SVM function ===Purpose=== Partial least squares regression for univariate or multivariate y-block. ===Synopsis=== :model = pls(x,y,ncomp,…')
 
imported>Donal
No edit summary
Line 1: Line 1:
'''WARNING:''' Placeholder page for SVM function
===Purpose===
===Purpose===


Partial least squares regression for univariate or multivariate y-block.
SVM Support Vector Machine (LIBSVM) for regression or classification.


===Synopsis===
===Synopsis===


:model = pls(x,y,ncomp,''options'')       %identifies model (calibration step)
:model = svm(x,y,options);          %identifies model (calibration step)
:pred  = pls(x,model,''options'')         %makes predictions with a new X-block
:pred  = svm(x,model,options);      %makes predictions with a new X-block
:valid = pls(x,y,model,''options'')       %makes predictions with new X- & Y-block
:pred  = svm(x,y,model,options)%performs a "test" call with a new X-block and known y-values


===Description===
===Description===


PLS calculates a single partial least squares regression model using the given number of components <tt>ncomp</tt> to predict a dependent variable <tt>y</tt> from a set of independent variables <tt>x</tt>.
SVM performs calibration and application of Support Vector Machine (SVM) models. These are non-linear models which can be used for regression or classification problems. The model consists of a number of support vectors (essentially samples selected from the calibration set) and non-linear model coefficients which define the non-linear mapping of variables in the input x-block to allow prediction of either the continuous y-block variable (for regression problems), or the classification as passed in either the classes of the x-block or in a y-block which contains numerical classes.


Alternatively, PLS can be used in 'predicton mode' to apply a previously built PLS model in <tt>model</tt> to an external set of test data in <tt>x</tt> (2-way array class "double" or "dataset"), in order to generate y-values for these data.  
To choose between regression and classification, use the svmtype option:
* ''regression'' : svmtype = 'epsilon-svr' or 'nu-svr'
* ''classification'' : svmtype = 'c-svc' or 'nu-svc'
It is recommended that classification be done through the svmda function.


Furthermore, if matching x-block and y-block measurements are available for an external test set, then PLS can be used in 'validation mode' to predict the y-values of the test data from the model <tt>model</tt> and <tt>x</tt>, and allow comparison of these predicted y-values to the known y-values <tt>y</tt>.
Note: Calling svm with no inputs starts the graphical user interface (GUI) for this analysis method.  
 
Note: Calling pls with no inputs starts the graphical user interface (GUI) for this analysis method.  


====Inputs====
====Inputs====


* '''x''' = the independent variable (X-block) data (2-way array class "double" or class "datadet")
* '''x''' = X-block (predictor block) class "double" or "dataset",
* '''y''' = the dependent variable (Y-block) data (2-way array class "double" or class "datadet")
* '''y''' = Y-block (predicted block) class "double" or "dataset",
* '''ncomp''' = the number of components to to be calculated (positive integer scalar)
* '''model''' = previously generated model (when applying model to new data).


====Outputs====
====Outputs====
Line 51: Line 50:


===Options===
===Options===
 
'''***TODO***'''
''options'' =  a structure array with the following fields:
''options'' =  a structure array with the following fields:


Line 66: Line 65:


The default options can be retrieved using: options = pls('options');.
The default options can be retrieved using: options = pls('options');.
OUTPUTVERSION
By default (options.outputversion = 3) the output of the function is a standard model structure model. If options.outputversion = 2, the output format is:
:[b,ssq,p,q,w,t,u,bin] = pls(x,y,ncomp,''options'')
where the outputs are
* '''b''' = matrix of regression vectors or matrices for each number of principal components up to ncomp,
* '''ssq''' = the sum of squares information,
* '''p''' = x-block loadings,
* '''q''' = y-block loadings,
* '''w''' = x-block weights,
* '''t''' = x-block scores
* '''u''' = y-block scores, and
* '''bin''' = inner relation coefficients.
Note: The regression matrices are ordered in b such that each ''Ny'' (number of y-block variables) rows correspond to the regression matrix for that particular number of principal components.


===Algorithm===
===Algorithm===

Revision as of 14:21, 25 January 2010

Purpose

SVM Support Vector Machine (LIBSVM) for regression or classification.

Synopsis

model = svm(x,y,options); %identifies model (calibration step)
pred = svm(x,model,options); %makes predictions with a new X-block
pred = svm(x,y,model,options); %performs a "test" call with a new X-block and known y-values

Description

SVM performs calibration and application of Support Vector Machine (SVM) models. These are non-linear models which can be used for regression or classification problems. The model consists of a number of support vectors (essentially samples selected from the calibration set) and non-linear model coefficients which define the non-linear mapping of variables in the input x-block. This mapping allows prediction of either the continuous y-block variable (for regression problems) or the class of each sample (for classification problems). For classification, the classes are passed either as the classes of the x-block or in a y-block which contains numerical classes.
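As a rough illustration of how support vectors and coefficients combine into a prediction, the sketch below evaluates a kernel regression of the form f(x) = sum_i coef_i * K(sv_i, x) + b using base MATLAB only. It does not call the svm function or reproduce its internals. The radial basis function (RBF) kernel, the variable names, and all numeric values are assumptions made purely for illustration.

```matlab
% Illustrative only: forming a kernel SVM regression prediction from
% support vectors and coefficients. All names and values are hypothetical.
sv    = [0 0; 1 1; 2 0];      % support vectors (rows = samples kept from calibration)
coef  = [0.5; -1.0; 0.75];    % one non-linear model coefficient per support vector
b     = 0.1;                  % bias (offset) term
gamma = 0.5;                  % RBF kernel width parameter (assumed kernel)

xnew  = [1 0; 0 2];           % new samples to predict (rows = samples)

ypred = zeros(size(xnew,1),1);
for i = 1:size(xnew,1)
  % squared Euclidean distance from this sample to each support vector
  d2 = sum((sv - repmat(xnew(i,:),size(sv,1),1)).^2, 2);
  k  = exp(-gamma*d2);        % RBF kernel value for each support vector
  ypred(i) = coef'*k + b;     % weighted sum of kernel values plus bias
end
disp(ypred)
```

Only the support vectors enter the sum, which is why the model can be stored as a subset of the calibration samples plus their coefficients.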

To choose between regression and classification, use the svmtype option:

  • regression : svmtype = 'epsilon-svr' or 'nu-svr'
  • classification : svmtype = 'c-svc' or 'nu-svc'

It is recommended that classification be done through the svmda function.
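A minimal sketch of a regression calibration followed by prediction, using the call forms listed in the Synopsis. It assumes PLS_Toolbox is on the MATLAB path and that svm('options') returns the default options structure; that call is an assumption made by analogy with other toolbox functions and is not confirmed on this page. The data are random and purely hypothetical.

```matlab
% Sketch only: requires PLS_Toolbox. svm('options') is an assumed call.
x = rand(30,5);                            % hypothetical calibration X-block (30 samples, 5 variables)
y = x*[1 2 0 0 1]' + 0.05*randn(30,1);     % hypothetical continuous y-block

options = svm('options');                  % assumed: returns default options structure
options.svmtype = 'epsilon-svr';           % regression; 'nu-svr' is the other regression type

model = svm(x,y,options);                  % identifies model (calibration step)

xnew = rand(10,5);                         % hypothetical new X-block
pred = svm(xnew,model,options);            % makes predictions with the new X-block
```

Setting options.svmtype to 'c-svc' or 'nu-svc' instead would request classification, although the page recommends svmda for that case.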

Note: Calling svm with no inputs starts the graphical user interface (GUI) for this analysis method.

Inputs

  • x = X-block (predictor block) class "double" or "dataset",
  • y = Y-block (predicted block) class "double" or "dataset",
  • model = previously generated model (when applying model to new data).

Outputs

  • model = a standard model structure model with the following fields (see MODELSTRUCT):
    • modeltype: 'PLS',
    • datasource: structure array with information about input data,
    • date: date of creation,
    • time: time of creation,
    • info: additional model information,
    • reg: regression vector,
    • loads: cell array with model loadings for each mode/dimension,
    • pred: 2 element cell array with
      • model predictions for each input block (when options.blockdetails='standard', x-block predictions are not saved and this will be an empty array), and
      • the y-block predictions.
    • wts: double array with X-block weights,
    • tsqs: cell array with T2 values for each mode,
    • ssqresiduals: cell array with sum of squares residuals for each mode,
    • description: cell array with text description of model, and
    • detail: sub-structure with additional model details and results.
  • pred = a structure, similar to model, that contains scores, predictions, etc. for the new data.
  • valid = a structure, similar to model, that contains scores, predictions, and additional y-block statistics, etc. for the new data.

Options

***TODO***

options = a structure array with the following fields:

  • display: [ 'off' | {'on'} ], governs level of display to command window,
  • plots: [ 'none' | {'final'} ], governs level of plotting,
  • outputversion: [ 2 | {3} ], governs output format (see below),
  • preprocessing: {[] []}, two element cell array containing preprocessing structures (see PREPROCESS) defining preprocessing to use on the x- and y-blocks (first and second elements respectively)
  • algorithm: [ 'nip' | {'sim'} | 'robustpls' ], PLS algorithm to use: NIPALS, SIMPLS {default}, or robust PLS,
  • blockdetails: [ {'standard'} | 'all' ], extent of predictions and residuals included in model, 'standard' = only y-block, 'all' = x- and y-blocks,
  • confidencelimit: [ {'0.95'} ], confidence level for Q and T2 limits, a value of zero (0) disables calculation of confidence limits,
  • weights: [ 'hist' | [vector] ], governs sample weighting. If set to the string 'hist', y-block histogram weighting is done on the samples. If set to a vector, each element is used as a weight for the corresponding sample. If empty, no sample weighting is done.
  • roptions: structure of options to pass to rsimpls (robust PLS engine from the Libra Toolbox). These options are only used when algorithm is 'robustpls'.
    • alpha: [ {0.75} ], (1-alpha) measures the number of outliers the algorithm should resist. Any value between 0.5 and 1 may be specified.

The default options can be retrieved using: options = pls('options');.

Algorithm

Note that unlike previous versions of the PLS function, the default algorithm (see Options, above) is the faster SIMPLS algorithm. If the alternate NIPALS algorithm is to be used, the options.algorithm field should be set to 'nip'.

See Also

analysis, crossval, modelstruct, nippls, pcr, plsda, preprocess, ridge, simpls