Annda: Difference between revisions

From Eigenvector Research Documentation Wiki

Revision as of 22:42, 9 December 2021

Purpose

Predictions based on Artificial Neural Network (ANNDA) classification models. ANNDA is an Artificial Neural Network for classification. Use ANN for Artificial Neural Network regression (Ann).

Synopsis

annda - Launches an Analysis window with ANNDA as the selected method.
[model] = annda(x, opts);
[model] = annda(x,y,options);
[model] = annda(x,y, nhid, options);
[pred] = annda(x,model,options);
[valid] = annda(x,model,options);
[valid] = annda(x,y,model,options);

Please note that the recommended way to build and apply an ANNDA model from the command line is to use the Model Object. Please see this wiki page on building and applying models using the Model Object.

Description

Build an ANNDA model from input dataset X, or input X and Y if classes are in Y, using the specified number of layers and layer nodes. Alternatively, if a model is passed in, ANNDA makes a prediction for an input test X block. The ANNDA model contains quantities (weights, etc.) calculated from the calibration data. When a model structure is passed in to ANNDA, these weights do not need to be calculated.

ANNDA is implemented using 'BPN', a feedforward ANN that uses backpropagation training and is written in MATLAB.
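
The two modes described above (calibration, then prediction) might look like the following minimal command-line sketch. The synthetic data, variable names and option values are illustrative assumptions; only the annda call forms and option field names are taken from this page. Passing a partially filled options structure is also an assumption. The toolbox calls are guarded so the script still runs where PLS_Toolbox is not on the path.

```matlab
% Synthetic two-class data (illustrative only): 20 samples x 5 variables.
xcal  = [randn(10,5); randn(10,5) + 3];
ycal  = [ones(10,1); 2*ones(10,1)];    % class numbers in the y-block
xtest = [randn(3,5);  randn(3,5) + 3];

% Options as a plain structure; field names are from the Options list.
% Assumption: fields left unset fall back to their defaults.
options = struct();
options.display     = 'off';
options.plots       = 'none';
options.learncycles = 20;

nhid = 3;   % one hidden layer with 3 nodes (takes precedence over nhid1)

% Only call the toolbox if annda is available on the path.
if exist('annda', 'file')
    model = annda(xcal, ycal, nhid, options);   % calibrate
    pred  = annda(xtest, model, options);       % apply model to new data
end
```

As noted in the Synopsis, the Model Object is the recommended route for production use; the direct calls above are shown only because they mirror the Synopsis line by line.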

Inputs

  • x = X-block (predictor block) class "double" or "dataset", containing numeric values,
  • y = Y-block (optional) class "double" sample class values,
  • nhid = number of nodes in a single hidden layer ANN, or a vector of two numbers giving the number of nodes in each layer of a two hidden layer ANN (this takes precedence over options nhid1 and nhid2),
  • model = previously generated model (when applying model to new data).

Outputs

  • model = a standard model structure with the following fields (see Standard Model Structure):
    • modeltype: 'ANNDA',
    • datasource: structure array with information about input data,
    • date: date of creation,
    • time: time of creation,
    • info: additional model information,
    • pred: 2 element cell array with
      • model predictions for each input block (when options.blockdetails='standard' x-block predictions are not saved and this will be an empty array)
    • detail: sub-structure with additional model details and results, including:
      • model.detail.ann.W: Structure containing details of the ANN, including the ANN type, number of hidden layers and the weights.
  • pred = a structure, similar to model, for the new data.

Calculation of Class Probabilities

The raw predictions from the ANN model are values ideally close to zero or one. A value close to zero indicates the new sample is NOT in the modeled class; a value close to one indicates the sample is in the modeled class. The distributions of the calibration sample predictions are modeled as Gaussians and used to provide a probability of the sample being in each class based on the raw prediction values. Please see this description of converting raw ANN outputs into class probabilities. The raw prediction values are in model.pred{2} and the classification probabilities are in model.detail.classification and model.detail.cvclassification.
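
The idea can be sketched for a single class as follows. This is an illustration only, not the toolbox code: it assumes one Gaussian is fitted to the calibration predictions of class members and one to those of non-members, and that the two are combined with equal weights. All numbers and variable names are hypothetical.

```matlab
% Hypothetical raw calibration predictions for one modeled class.
ypredIn  = [0.92 1.05 0.88 0.97 1.10];    % samples that are in the class
ypredOut = [0.03 -0.08 0.12 0.05 -0.02];  % samples that are not

% Gaussian probability density, written out to avoid toolbox dependencies.
gauss = @(y, mu, sigma) exp(-0.5*((y - mu)./sigma).^2) ./ (sigma*sqrt(2*pi));

% Fit one Gaussian to each group of calibration predictions.
muIn  = mean(ypredIn);   sdIn  = std(ypredIn);
muOut = mean(ypredOut);  sdOut = std(ypredOut);

% Raw predictions for three new samples.
ynew = [0.1 0.5 0.9];

% Probability of class membership, assuming equal weighting of both groups.
pIn    = gauss(ynew, muIn,  sdIn);
pOut   = gauss(ynew, muOut, sdOut);
probIn = pIn ./ (pIn + pOut);
```

A raw prediction near zero gives a membership probability near zero and one near one gives a probability near one; values in between depend on the widths of the two fitted distributions.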

Training Termination

The ANN is trained on a calibration dataset to minimize prediction error, RMSEC. It is important not to over-train, however, so some criteria for ending training are needed.

BPN determines the optimal number of learning iteration cycles by selecting the minimum RMSECV based on the calibration data over a range of learning iteration values (1 to options.learncycles). The cross-validation used is determined by option cvi, or else by cvmethod. If neither of these is specified then the minimum RMSEP using a single subset of samples from a 5-fold random split of the calibration data is used. This RMSECV value is based on pre-processed, scaled values and so it is not saved in the model.rmsecv field. Apply cross-validation (see below) to add this information to the model. Note these RMSE values refer to the internal preprocessed and scaled y values.
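
The default single-subset case can be pictured with the sketch below. It assumes a plain random assignment of samples to five subsets with the first subset held out, and shows the RMSE that would be evaluated on that subset; the toolbox's actual splitting code may differ in detail, and all names and values are hypothetical.

```matlab
nsamples = 23;   % hypothetical number of calibration samples
nsplits  = 5;

% Randomly assign each sample to one of the 5 subsets.
subset  = mod(randperm(nsamples) - 1, nsplits) + 1;
testIdx = find(subset == 1);    % the single held-out subset
calIdx  = find(subset ~= 1);    % samples used for training

% Root-mean-square error on the held-out samples (scaled y values).
rmse = @(ytrue, ypred) sqrt(mean((ytrue - ypred).^2));

% Example with made-up scaled y values for the held-out samples.
ytrue = [0 1 1 0 1];
ypred = [0.1 0.8 0.9 0.2 0.7];
rmsepHeldOut = rmse(ytrue, ypred);
```

This error would be evaluated after each learning cycle, and the cycle with the smallest value kept.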

Cross-validation

Cross-validation can be applied to ANNDA when using either the ANNDA Analysis window or the command line.

From the Analysis window specify the cross-validation method in the usual way (clicking on the model icon's red check-mark, or the "Choose Cross-Validation" link in the flowchart).

From the command line use the model.crossvalidate method to add cross-validation information to an existing model. For example, from the anndademo.m function:

  model = model.crossvalidate(arch, {'vet' 6 1}, 10);

Options

options = a structure array with the following fields:

  • display : [ 'off' |{'on'}] Governs display
  • plots: [ {'none'} | 'final' ] governs plotting of results.
  • blockdetails : [ {'standard'} | 'all' ] extent of detail included in model. 'standard' keeps only y-block, 'all' keeps both x- and y- blocks.
  • waitbar : [ 'off' |{'auto'}| 'on' ] governs use of waitbar during analysis. 'auto' shows waitbar if delay will likely be longer than a reasonable waiting period.
  • algorithm : [{'bpn'}] ANN implementation to use.
  • nhid1 : [{2}] Number of nodes in first hidden layer.
  • nhid2 : [{0}] Number of nodes in second hidden layer.
  • learnrate : [0.125] ANN backpropagation learning rate (bpn only).
  • learncycles : [20] Number of ANN learning iterations (bpn only).
  • preprocessing: {[] []} preprocessing structures for x and y blocks (see PREPROCESS).
  • compression: [{'none'}| 'pca' | 'pls' ] type of data compression to perform on the x-block prior to calculating or applying the ANNDA model. 'pca' uses a simple PCA model to compress the information. 'pls' uses a PLS model. Compression can make the ANNDA more stable and less prone to overfitting.
  • compressncomp: [1] Number of latent variables (or principal components) to include in the compression model.
  • compressmd: [{'yes'} | 'no'] Use Mahalanobis Distance correction.
  • cvmethod : [{'con'} | 'vet' | 'loo' | 'rnd'] CV method, OR [] for Kennard-Stone single split.
  • cvsplits : [{5}] Number of CV subsets.
  • cvi : M element vector with integer elements allowing user defined subsets. cvi is a vector with the same number of elements as x has rows, i.e., length(cvi) = size(x,1). Each cvi(i) is defined as:
cvi(i) = -2 the sample is always in the test set,
cvi(i) = -1 the sample is always in the calibration set,
cvi(i) = 0 the sample is never used, and
cvi(i) = 1,2,3... defines each test subset.
  • activationfunction : For the default algorithm, 'bpn', this option uses a 'sigmoid' activation function, f(x) = 1/(1+exp(-x)). For the 'encog' algorithm this activationfunction option has two choices, 'tanh' as default, or 'sigmoid'.
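
As an illustration of the cvi convention listed above, the following sketch builds a user-defined subset vector for a hypothetical 12-sample x-block. The sample assignments are arbitrary; only the meaning of the codes (-2, -1, 0, 1,2,3...) comes from the option description.

```matlab
nsamples = 12;                 % must equal size(x,1) for the real x-block
cvi = zeros(1, nsamples);      % 0 = never used (default until assigned)

cvi(1:2)  = -1;                % samples 1-2: always in the calibration set
cvi(3)    = -2;                % sample 3: always in the test set
cvi(4)    = 0;                 % sample 4: never used
cvi(5:12) = repmat(1:4, 1, 2); % samples 5-12: four test subsets, 1 2 3 4 1 2 3 4

% The vector is passed through the options structure.
options = struct();
options.cvi = cvi;
```

Because cvi takes priority over cvmethod when choosing the cross-validation used during training, a vector like this gives full control over which samples are held out.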

‘BPN’ ANNDA

Implementation Details

The “BPN” implementation of ANNDA is a conventional feedforward back-propagation neural network where the weights are updated, or ‘trained’, so as to reduce the magnitude of the prediction error, except that the gradient-descent method of updating the weights is different from the usual “delta rule” approach. In the traditional delta-rule method the weights are changed at each increment of training time by a constant fraction of the contributing error gradient terms, leading to a reduced prediction error. In this “BPN” implementation the search for optimal weights by gradient-descent is treated as a continuous system, rather than incremental. The evolution of the weights with respect to training time is solved as a set of differential equations using a solver appropriate for systems where the solution (weights) may involve very different timescales. Most weights evolve slowly towards their final values but some weights may have periods of faster change. A reference paper for the BPN implementation is:

Owens A J and Filkin D L 1989 Efficient training of the back propagation network by solving a system of stiff ordinary differential equations Proc. Int. Joint Conf. on Neural Networks vol II (IEEE Press) pp 381–6.

Algorithm parameters: learncycles and learnrate

This BPN technique results in much faster training than with the traditional delta-rule approach. The training is governed by two parameters, ‘learncycles’ and ‘learnrate’. The learnrate parameter specifies the training time duration of the first learncycle. Each subsequent learncycle’s time duration is twice the previous learncycle’s duration. The performance of the ANN is evaluated at the end of each learncycle interval by calculating the cross-validation prediction error, RMSECV. The RMSECV initially decreases rapidly with training time but eventually starts to increase again as the ANN begins to overfit the data. The number of training cycles which yields the minimum RMSECV therefore provides an estimate of the optimal ANN training duration, for the given learnrate value. The ANN model contains these RMSECV values in model.detail.ann.rmsecviter, and the optimal, minimum RMSECV occurs at index model.detail.ann.niter, which will be smaller than or equal to the learncycles value. It is useful to check rmsecviter to see if a minimum RMSECV has been attained, but also to see if you are using too many learn cycles. Reducing the number of learncycles can significantly speed up ANN training. Note, the model.detail.ann.rmsecviter values are only used to pick the optimal number of learncycles. These rmsecviter values are calculated using scaled y and should not be compared to the reported RMSEC, RMSECV or RMSEP.
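
The doubling schedule and the choice of the optimal cycle can be worked through numerically. In the sketch below the learnrate and learncycles values are the documented defaults, while the RMSECV curve is a made-up stand-in for model.detail.ann.rmsecviter, chosen only to show the typical fall-then-rise shape.

```matlab
learnrate   = 0.125;   % duration of the first learn cycle (default)
learncycles = 20;      % number of learn cycles (default)

k          = 1:learncycles;
duration   = learnrate * 2.^(k - 1);   % each cycle lasts twice the previous
cumulative = cumsum(duration);         % total training time after cycle k

% Hypothetical RMSECV per cycle: fast initial drop, slow rise from overfitting.
rmsecviter = 0.2 + 0.8*exp(-k/2) + 0.01*k;

% The cycle with minimum RMSECV plays the role of model.detail.ann.niter.
[rmsemin, niter] = min(rmsecviter);
```

Since the total training time after k cycles is learnrate*(2^k - 1), the final cycles dominate the cost. If the minimum falls well below learncycles, lowering learncycles to just above that index removes most of the training time without changing the selected model.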

Summary of model building speed-up settings

The time required to build ANNDA models using the 'BPN' method increases significantly when using training datasets having more than about a thousand samples or variables. Some tips on speeding up ANNDA model building include the following:

From the Analysis window:

1. Turn CV off or use a small number of CV splits.

2. Don't use 2 hidden layers. This is very slow.

From the command line:

1. Initially build ANNDA without cross-validation so as to decide on values for learnrate and learncycles by examining where the minimum value of model.detail.ann.rmsecviter occurs versus learncycles. Note this uses a single-split CV to estimate RMSECV when the ANNDA cross-validation is set as "None". It is inefficient to use a larger than necessary value for option "learncycles".

2. Determine the number of hidden layer nodes to use by building a range of models with different numbers of nodes, nhid1, nhid2. At this survey stage it is best to use a simple cross-validation with a small number of splits and iterations.

See Also

ann, analysis, crossval, preprocess, EVRIModel_Objects