Anndl

From Eigenvector Research Documentation Wiki
Revision as of 08:58, 2 September 2021

Purpose

Predictions based on Artificial Deep Learning Neural Network (ANNDL) regression models.

Synopsis

anndl - Launches an Analysis window with ANNDL as the selected method.
[model] = anndl(x,y,options);
[pred] = anndl(x,model,options);
[valid] = anndl(x,y,model,options);

Please note that the recommended way to build and apply an ANNDL model from the command line is to use the Model Object. Please see this wiki page on building and applying models using the Model Object.

Description

Build an ANNDL model from input X- and Y-block data using the specified number of layers and nodes per layer. Alternatively, if a model is passed in, ANNDL makes a Y prediction for an input test X-block. The ANNDL model contains quantities (weights, etc.) calculated from the calibration data; when a model structure is passed in to ANNDL, these weights do not need to be recalculated.

Two ANNDL implementations are available: 'sklearn' and 'tensorflow'.

'sklearn' is the default, but the user can set the option 'algorithm' = 'tensorflow' to use Tensorflow instead. The Scikit-Learn implementation is fast, while Tensorflow is slower but allows more customization of the network architecture. Comparisons are discussed in more detail below.

Inputs

  • x = X-block (predictor block) class "double" or "dataset", containing numeric values,
  • y = Y-block (predicted block) class "double" or "dataset", containing numeric values,
  • model = previously generated model (when applying model to new data).

Outputs

  • model = a standard model structure with the following fields (see Standard Model Structure):
    • modeltype: 'ANNDL',
    • datasource: structure array with information about input data,
    • date: date of creation,
    • time: time of creation,
    • info: additional model information,
    • pred: 2 element cell array with
      • model predictions for each input block (when options.blockdetails='standard', x-block predictions are not saved and this will be an empty array)
    • detail: sub-structure with additional model details and results.
  • pred = a structure, similar to model, for the new data.

Training Termination

The ANNDL is trained on a calibration dataset to minimize prediction error, RMSEC. It is important to not overtrain, however, so some criteria for ending training are needed.

Sklearn's max_iter parameter is the maximum number of iterations for weight optimization, but this number may not be reached. One reason is that sklearn early stopping is enabled: the sklearn method automatically sets 10% of the calibration data aside as validation data, and optimization stops if the validation score fails to improve by at least tol (an adjustable parameter in PLS_Toolbox) for n_iter_no_change (hard set to 10) consecutive iterations. Decreasing tol can increase accuracy on the calibration set, but it leads to overfitting, which shows up when cross-validating or predicting on a validation set.

Tensorflow training termination follows the same convention as the sklearn implementation, under that software's own parameter names. Training terminates when either options.tf.epochs is reached or the improvement does not exceed options.tf.min_delta for 20 consecutive epochs. Note that these RMSE values refer to the internally preprocessed and scaled y values.
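Both rules above follow the same patience pattern: training ends at the iteration/epoch limit, or once a run of consecutive epochs fails to beat the best value seen so far by more than a tolerance. The sketch below applies that pattern to a precomputed list of validation losses (lower is better). It is a simplified illustration, not the code inside PLS_Toolbox, scikit-learn, or Keras; `stopping_epoch` is a hypothetical name, and the libraries differ in details (scikit-learn, for instance, monitors a validation score where higher is better and counts slightly differently at the patience boundary). Patience 10 with tol mirrors the sklearn description, and patience 20 with min_delta mirrors the Tensorflow one.

```python
from typing import Sequence


def stopping_epoch(losses: Sequence[float], min_delta: float,
                   patience: int, max_epochs: int) -> int:
    """Return how many epochs run before a patience-based rule stops training.

    losses[i] is the validation loss after epoch i + 1 (lower is better).
    Training stops once `patience` consecutive epochs fail to beat the best
    loss so far by more than `min_delta`, or when `max_epochs` is reached.
    """
    best = float("inf")
    stalled = 0  # consecutive epochs without a large-enough improvement
    for epoch, loss in enumerate(losses[:max_epochs], start=1):
        if loss < best - min_delta:
            best = loss
            stalled = 0
        else:
            stalled += 1
            if stalled >= patience:
                return epoch
    # The rule never fired: training ran to the epoch limit (or the data ran out).
    return min(len(losses), max_epochs)


# Loss improves for 5 epochs, then plateaus at 0.45.
losses = [1.0, 0.8, 0.6, 0.5, 0.45] + [0.45] * 50
print(stopping_epoch(losses, min_delta=1e-4, patience=10, max_epochs=200))  # 15 (sklearn-like)
print(stopping_epoch(losses, min_delta=1e-4, patience=20, max_epochs=200))  # 25 (tf-like)
```

Because the plateau has to last for the whole patience window, the larger Tensorflow patience trains longer on the same loss curve.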

Cross-validation

Cross-validation can be applied to ANNDL from either the ANNDL Analysis window or the command line. From the Analysis window, specify the cross-validation method in the usual way (clicking the model icon's red check-mark, or the "Choose Cross-Validation" link in the flowchart). In the cross-validation window, "Maximum Number of Nodes" specifies the largest number of nodes in the first hidden layer (nhid1) to test. Viewing RMSECV versus the number of nhid1 nodes (toolbar icon to the left of Scores Plot) is useful for choosing the number of layer 1 nodes. From the command line, use the crossval method to add cross-validation information to an existing model. Because these networks generally need much larger node sizes than ANN, testing every value in 1:nhid1 would take a long time. Instead, the node sizes tested for nhid1 are chosen by the following rule:

  • If nhid1 <= 10, cross-validation looping is done over [1:nhid1]
    • e.g. for nhid1 = 8, the looping array is [1:8]
  • If 10 < nhid1 <= 100, looping is done over [1 2 3 5, every multiple of 25 up to nhid1, nhid1]
    • e.g. for nhid1 = 95, the looping array is [1 2 3 5 25 50 75 95]
  • If nhid1 > 100, looping is done over [10 20 30 50 100, every multiple of 100 up to nhid1, nhid1]
    • e.g. for nhid1 = 250, the looping array is [10 20 30 50 100 200 250]

Again, this is to avoid doing cross-validation over every possible value in 1:nhid1.
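The rule can be written as a short function. This is a sketch reconstructed from the rule and the examples above, not PLS_Toolbox code; `nhid1_grid` is a hypothetical name, and values that would appear twice (for example 100 when nhid1 > 100) are assumed to be tested only once, as the nhid1 = 250 example suggests.

```python
from typing import List


def nhid1_grid(nhid1: int) -> List[int]:
    """First-hidden-layer node counts tested during ANNDL cross-validation."""
    if nhid1 < 1:
        raise ValueError("nhid1 must be a positive integer")
    if nhid1 <= 10:
        candidates = list(range(1, nhid1 + 1))  # every value 1..nhid1
    elif nhid1 <= 100:
        # fixed small sizes plus every multiple of 25 up to nhid1
        candidates = [1, 2, 3, 5] + list(range(25, nhid1 + 1, 25))
    else:
        # fixed sizes plus every multiple of 100 up to nhid1
        candidates = [10, 20, 30, 50, 100] + list(range(100, nhid1 + 1, 100))
    candidates.append(nhid1)  # the requested size is always tested
    return sorted(set(candidates))  # drop duplicates, keep ascending order


print(nhid1_grid(8))    # [1, 2, 3, 4, 5, 6, 7, 8]
print(nhid1_grid(95))   # [1, 2, 3, 5, 25, 50, 75, 95]
print(nhid1_grid(250))  # [10, 20, 30, 50, 100, 200, 250]
```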

Options

options = a structure array with the following fields:

  • display : [ 'off' |{'on'}] Governs display
  • plots: [ {'none'} | 'final' ] governs plotting of results.
  • blockdetails : [ {'standard'} | 'all' ] extent of detail included in model. 'standard' keeps only y-block, 'all' keeps both x- and y- blocks.
  • waitbar : [ 'off' |{'auto'}| 'on' ] governs use of waitbar during analysis. 'auto' shows waitbar if delay will likely be longer than a reasonable waiting period.
  • algorithm : [{'sklearn'} | 'tensorflow'] ANNDL implementation to use.
  • preprocessing: {[] []} preprocessing structures for x and y blocks (see PREPROCESS).
  • compression: [{'none'}| 'pca' | 'pls' ] type of data compression to perform on the x-block prior to calculating or applying the ANNDL model. 'pca' uses a simple PCA model to compress the information. 'pls' uses a pls model. Compression can make the ANNDL more stable and less prone to overfitting.
  • compressncomp: [1] Number of latent variables (or principal components) to include in the compression model.
  • compressmd: [{'yes'} | 'no'] Use Mahalanobis distance corrected scores from the compression model.
  • cvi : M element vector with integer elements allowing user-defined subsets. cvi is a vector with the same number of elements as x has rows, i.e., length(cvi) = size(x,1). Each cvi(i) is defined as:
    • cvi(i) = -2: the sample is always in the test set,
    • cvi(i) = -1: the sample is always in the calibration set,
    • cvi(i) = 0: the sample is never used, and
    • cvi(i) = 1,2,3...: defines each test subset.
  • sk : structure representing the input parameters for when algorithm=‘sklearn’
    • sk.activation : [ {'relu'} | 'tanh' | 'logistic' | 'identity' ] Type of activation function applied in the hidden layers.
    • sk.solver : [ {'adam'} | 'lbfgs' | 'sgd' ] Solver for weight optimization. The lbfgs optimizer belongs to the family of quasi-Newton methods; it does especially well on smaller datasets, where it tends to converge faster. The sgd solver does traditional stochastic gradient descent. The adam solver is another flavor of stochastic gradient descent and does well on large datasets in terms of speed and score.
    • sk.alpha : [ {'1.0000e-04'} ] L2 Penalty parameter.
    • sk.max_iter : [ {'200'} ] Maximum number of iterations for weight optimization.
    • sk.hidden_layer_sizes : [ {'100'} ] Vector of node sizes. The ith element represents the number of nodes in the ith hidden layer in the network.
    • sk.random_state : [ {'1'} ] Random seed number. Set this to a number for reproducibility.
    • sk.tol : [ {'1.0000e-04'} ] Tolerance for optimization.
    • sk.learning_rate_init : [ {'1.0000e-03'} ] Initial learning rate.
    • sk.batch_size : [ {'12'} ] Number of samples in each of the minibatches.
  • tf : structure representing the input parameters for when algorithm=‘tensorflow’
    • tf.activation : [ {'relu'} | 'tanh' | 'sigmoid' | 'linear' ] Type of activation function applied in the hidden layers.
    • tf.optimizer : [{'adam'} | 'adamax' | 'rmsprop' | 'sgd'] Solver for weight optimization. The adam solver is a flavor of stochastic gradient descent and does well on large datasets in terms of speed and score. The adamax optimizer is an extension of adam based on the infinity norm. The rmsprop optimizer keeps a moving average of the squared gradients and divides each gradient by the root of that average; momentum can optionally be added. The sgd solver does traditional stochastic gradient descent.
    • tf.loss : [{'mean_squared_error'} | 'mean_absolute_error' | 'log_cosh'] Choice of loss function to be minimized.
    • tf.epochs : [ {'200'} ] Maximum number of epochs (passes through the training data) for weight optimization.
    • tf.hidden_layer : [ {struct('type','Dense','units',100)} ] Cell array of structs, where each struct represents a hidden layer in the network. The struct accepts 3 possible fields: 'type', 'units', and 'size'. These layers are further explained below.
    • tf.random_state : [ {'1'} ] Random seed number. Set this to a number for reproducibility.
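    • tf.min_delta : [ {'1.0000e-04'} ] Tolerance for optimization: the minimum improvement that counts as progress when deciding whether to stop training early (see Training Termination).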
    • tf.learning_rate : [ {'1.0000e-03'} ] Initial learning rate.
    • tf.batch_size : [ {'12'} ] Number of samples in each of the minibatches.
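To make the cvi codes concrete, the sketch below shows how a cvi vector partitions the rows of x into calibration and test sets for each cross-validation split. It illustrates the convention listed under cvi above and is not PLS_Toolbox code; `cvi_splits` is a hypothetical name, and row indices are 0-based as in Python rather than 1-based as in MATLAB.

```python
from typing import Dict, List, Sequence, Tuple


def cvi_splits(cvi: Sequence[int]) -> Dict[int, Tuple[List[int], List[int]]]:
    """Map each test-subset id k to (calibration_rows, test_rows), 0-based.

    -2: always in the test set        -1: always in the calibration set
     0: never used                    k > 0: test rows for subset k, calibration otherwise
    """
    splits = {}
    for k in sorted({c for c in cvi if c > 0}):
        cal = [i for i, c in enumerate(cvi) if c == -1 or (c > 0 and c != k)]
        test = [i for i, c in enumerate(cvi) if c == -2 or c == k]
        splits[k] = (cal, test)
    return splits


# Two test subsets, one forced-calibration row, one forced-test row, one unused row.
for k, (cal, test) in cvi_splits([1, 2, 1, 2, -1, -2, 0]).items():
    print(k, cal, test)
# 1 [1, 3, 4] [0, 2, 5]
# 2 [0, 2, 4] [1, 3, 5]
```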

Additional information on the ‘tensorflow’ ANNDL

PLS_Toolbox does not include everything Tensorflow has to offer, but it includes more than enough to get started building deep neural networks. Tensorflow offers a wide variety of layer types, loss functions, optimizers, and activation functions. The lists below show what has been adapted from Tensorflow so far:

  • Layers (more info: https://www.tensorflow.org/api_docs/python/tf/keras/layers)
    1. Dense (fully connected layer)
    2. Flatten (flattens the previous layer's output into a 1-dimensional vector)
    3. Dropout (randomly sets a fraction of its inputs to 0 during training)
    4. BatchNormalization (normalizes its inputs so that the output mean is close to 0 and the output standard deviation is close to 1)
    5. Conv[123]D (all three dimensions included)
    6. AveragePooling[123]D (all three dimensions included)
    7. MaxPooling[123]D (all three dimensions included)
  • Optimizers (more info: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers)
    1. Adam
    2. Adamax
    3. RMSProp
    4. SGD
  • Loss functions (more info: https://www.tensorflow.org/api_docs/python/tf/keras/losses)
    1. Mean Square Error
    2. Mean Absolute Error
    3. Logcosh
  • Activation functions (more info: https://www.tensorflow.org/api_docs/python/tf/keras/activations)
    1. Relu
    2. Tanh
    3. Sigmoid
    4. Linear

How to use these layers from the command-line and ANNDL Analysis Window

Each layer type included in PLS_Toolbox performs a specific task and therefore requires different arguments. As noted above, the tf.hidden_layer parameter is a cell array of structs, each struct representing a hidden layer. The supported field names for these layers are:

  1. ‘type’
  2. ‘units’
  3. ‘size’

Each hidden layer requires the ‘type’ field to be set to one of the supported layer types listed above. This table shows, by layer type, how the ‘units’ and ‘size’ fields map to the corresponding Tensorflow parameter names:

Layer Type | Required Field(s) | Tensorflow Parameter(s) | Input data type
Dense | units | units | integer in [1,∞)
Flatten | N/A | N/A | N/A
Dropout | units | rate | float in (0,1]
BatchNormalization | N/A | N/A | N/A
Conv1D | units, size | filters, kernel_size | units: integer in [1,∞); size: integer in [1,∞)
Conv2D | units, size | filters, kernel_size | units: integer in [1,∞); size: [x y], integers specifying length and width
Conv3D | units, size | filters, kernel_size | units: integer in [1,∞); size: [x y z], integers specifying length, width, and height
MaxPooling1D | size | pool_size | integer in [1,∞)
MaxPooling2D | size | pool_size | [x y], integers specifying length and width
MaxPooling3D | size | pool_size | [x y z], integers specifying length, width, and height
AveragePooling1D | size | pool_size | integer in [1,∞)
AveragePooling2D | size | pool_size | [x y], integers specifying length and width
AveragePooling3D | size | pool_size | [x y z], integers specifying length, width, and height
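The mapping in the table can be expressed as a small translation step from a hidden_layer struct to Keras constructor arguments. This is a hypothetical sketch, not how PLS_Toolbox implements it: `to_keras_args` and `FIELD_MAP` are illustrative names, MATLAB structs are modeled as Python dicts, and the Keras argument names (units, rate, filters, kernel_size, pool_size) are the public tf.keras.layers names.

```python
from typing import Any, Dict, Tuple

# Struct field -> Keras constructor argument, per layer type (following the table above).
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "Dense": {"units": "units"},
    "Dropout": {"units": "rate"},
    "Flatten": {},
    "BatchNormalization": {},
}
for dims in ("1D", "2D", "3D"):
    FIELD_MAP["Conv" + dims] = {"units": "filters", "size": "kernel_size"}
    FIELD_MAP["MaxPooling" + dims] = {"size": "pool_size"}
    FIELD_MAP["AveragePooling" + dims] = {"size": "pool_size"}


def to_keras_args(layer: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Translate one hidden_layer struct (modeled as a dict) into a Keras layer name and kwargs."""
    layer_type = layer["type"]
    if layer_type not in FIELD_MAP:
        raise ValueError(f"unsupported layer type: {layer_type}")
    kwargs: Dict[str, Any] = {}
    for field, keras_arg in FIELD_MAP[layer_type].items():
        if field not in layer:
            raise ValueError(f"{layer_type} requires the '{field}' field")
        value = layer[field]
        # MATLAB row vectors such as [3 3] become tuples such as (3, 3).
        kwargs[keras_arg] = tuple(value) if isinstance(value, (list, tuple)) else value
    return layer_type, kwargs


print(to_keras_args({"type": "Conv2D", "units": 64, "size": [3, 3]}))
# ('Conv2D', {'filters': 64, 'kernel_size': (3, 3)})
print(to_keras_args({"type": "Dropout", "units": 0.25}))
# ('Dropout', {'rate': 0.25})
```

With TensorFlow installed, `getattr(tf.keras.layers, name)(**kwargs)` applied to the returned pair would construct the corresponding layer object.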

Build from command-line

With the information from the above table in mind, here is an example of a 4-layer ANNDL using Tensorflow. The network consists of 3 fully connected layers followed by a dropout layer that randomly sets 25% of the previous layer's outputs to 0 during training.

options = anndl('options');          % default options structure
options.algorithm = 'tensorflow';    % use the Tensorflow implementation
options.tf.hidden_layer{1} = struct('type','Dense','units',64);
options.tf.hidden_layer{2} = struct('type','Dense','units',32);
options.tf.hidden_layer{3} = struct('type','Dense','units',16);
options.tf.hidden_layer{4} = struct('type','Dropout','units',0.25);  % dropout rate of 25%
model = anndl(x,y,options);
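For a sense of scale, the number of trainable parameters in this network's hidden layers follows directly from the layer sizes: a Dense layer with n inputs and u units has n*u weights plus u biases, and Dropout adds none. The sketch below assumes a hypothetical X-block with 100 variables and leaves out the output layer that maps the last hidden layer to y, since this page does not describe it; `dense_param_count` is a hypothetical helper, not a PLS_Toolbox function.

```python
from typing import Dict, List


def dense_param_count(n_inputs: int, layers: List[Dict]) -> int:
    """Count weights and biases in a stack of Dense/Dropout hidden layers."""
    total = 0
    width = n_inputs
    for layer in layers:
        if layer["type"] == "Dense":
            units = layer["units"]
            total += width * units + units  # weight matrix plus bias vector
            width = units
        elif layer["type"] == "Dropout":
            continue  # no trainable parameters; the width is unchanged
        else:
            raise ValueError(f"this sketch only handles Dense and Dropout, got {layer['type']}")
    return total


hidden = [
    {"type": "Dense", "units": 64},
    {"type": "Dense", "units": 32},
    {"type": "Dense", "units": 16},
    {"type": "Dropout", "units": 0.25},
]
print(dense_param_count(100, hidden))  # 6464 + 2080 + 528 = 9072
```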


Here is an example of a 2D convolutional neural network (CNN):

options = anndl('options');
options.algorithm = 'tensorflow';
options.tf.hidden_layer{1} = struct('type','Conv2D','units',64, 'size', [3 3]);  % 64 filters, 3x3 kernels
options.tf.hidden_layer{2} = struct('type','MaxPooling2D','size',[3 3]);         % 3x3 max pooling
options.tf.hidden_layer{3} = struct('type','Conv2D','units',64, 'size', [3 3]);
options.tf.hidden_layer{4} = struct('type','MaxPooling2D','size',[3 3]);
options.tf.hidden_layer{5} = struct('type','Flatten');                            % feature maps to a vector
options.tf.hidden_layer{6} = struct('type','Dense','units', 16);                  % fully connected, 16 units
model = anndl(x,y,options);
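Each convolution and pooling step shrinks the spatial size of the data, which determines how long the flattened vector is and how large the input must be. The sketch below traces a hypothetical 32 x 32 single-channel image through the layers above, assuming Keras defaults (stride 1 and 'valid' padding for Conv2D; stride equal to the pool size and 'valid' padding for MaxPooling2D). Whether PLS_Toolbox overrides these defaults is not stated on this page, and `conv_valid` and `pool_valid` are hypothetical helpers.

```python
from typing import Sequence, Tuple


def conv_valid(shape: Sequence[int], kernel: Sequence[int]) -> Tuple[int, ...]:
    """Spatial output size of a stride-1 convolution with 'valid' (no) padding."""
    return tuple(s - k + 1 for s, k in zip(shape, kernel))


def pool_valid(shape: Sequence[int], pool: Sequence[int]) -> Tuple[int, ...]:
    """Spatial output size of pooling with stride equal to the pool size, 'valid' padding."""
    return tuple((s - p) // p + 1 for s, p in zip(shape, pool))


# Trace a hypothetical 32 x 32 image through the 2D CNN example above.
shape: Tuple[int, ...] = (32, 32)
shape = conv_valid(shape, (3, 3))  # Conv2D, 64 filters -> (30, 30)
shape = pool_valid(shape, (3, 3))  # MaxPooling2D       -> (10, 10)
shape = conv_valid(shape, (3, 3))  # Conv2D, 64 filters -> (8, 8)
shape = pool_valid(shape, (3, 3))  # MaxPooling2D       -> (2, 2)
flat_length = shape[0] * shape[1] * 64  # Flatten keeps all 64 feature maps
print(shape, flat_length)  # (2, 2) 256
```

By the same formulas, each side of the input needs at least 17 pixels for this layer stack to produce a non-empty output.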

Build from Analysis Window

Open the ANNDL Analysis window and switch the Framework to Tensorflow. Create layers by clicking the Add Hidden Layer button in the middle of the panel; each click adds a new row, and each row corresponds to a hidden layer. Switch between layer types using a row's Layer Type dropdown menu. Provide value(s) for the Units column by clicking the box and entering arguments. For the Pool/Kernel Size column (convolutional and pooling layers), enter space-separated arguments, e.g. 3 3. Remove a layer by clicking the checkbox in that layer's row.


ANNDL and ANN

The two Python neural network implementations in ANNDL share some features with our ANN implementation and differ in others. Unlike ANN, ANNDL can build more than 2 hidden layers, which can help when a complex dataset calls for a more complex network architecture. Node sizes should also be treated differently. In ANN, it is advised to keep node sizes small and to avoid a second hidden layer if possible. In initial testing, our staff found that the Python neural networks in ANNDL do well with node sizes much larger than those used in ANN. These ANNDL models can perform comparably to ANN, and their training speed scales very well as node sizes increase. The breadth of parameters to tune is both an advantage and a disadvantage: it offers a wider variety of options, but building the best Python neural network can be time-consuming.


Usage from ANNDL Analysis window

When using the ANNDL Analysis window, as in the ANN Analysis window, it is possible to scan over a range of node counts for the first hidden layer. This is enabled by setting the “Maximum number of Nodes” value in the cross-validation window. ANNDL models are then built for the range of first-layer node counts up to the specified number, and the resulting RMSECV versus number of nodes is shown by clicking the “Plot cross-validation results” icon in the ANNDL Analysis window’s toolbar. This can be useful for deciding how many nodes to use in the first hidden layer. While cross-validating over the first hidden layer's node sizes, the sizes of the remaining hidden layers stay fixed. Note that this plot is only advisory: the resulting model is built with the input parameter number of nodes, ‘nhid’, and its model.detail.rmsecv value relates to this number of nodes. It is important to check for the optimal number of nodes to use in the ANNDL, but this feature can greatly lengthen the time taken to build the ANNDL model, so it should be set to 1 once the number of hidden nodes is decided.

Summary of model building speed-up settings

From the Analysis window:

ANN in PLS_Toolbox or Solo version 8.2 and earlier can be very slow if you use cross-validation (CV). This is mostly because the CV settings window also specifies a search for the optimal number of hidden layer 1 nodes, testing ANN models with 1, 2, …, 20 nodes, each with CV. This search is set by the top slider field “Maximum Number of Nodes L1”. For example, if you build an ANN model with 4 layer 1 nodes (using the “ANN Settings” field) but leave the CV settings window’s top slider set to 20, you will actually build 20 models, each with CV, and save the RMSECV from each. This can be very slow, especially for the models with many nodes.

To make ANN perform faster, it is recommended that you drag this CV window’s “Maximum Number of Nodes L1” slider to the left, setting it to 1, unless you really want to see the results of a parameter search over the range specified by this slider. This is the default in PLS_Toolbox and Solo versions after 8.2. The RMSECV versus number of Layer 1 Nodes can be seen by clicking the “Plot cross-validation results” icon (next to the Scores Plot icon).

Summary: To make ANNDL perform faster:

1. Move the top CV slider to the left, setting its value to 1.

2. Turn CV off or use a small number of CV splits.

3. Use a small number of L1 nodes in the ANNDL settings window.

4. Increase the batch size.
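Increasing the batch size helps because the minibatch solvers (adam, sgd, and the other stochastic optimizers) update the weights once per minibatch, so one pass over n calibration samples takes about n / batch_size updates. The sketch below counts those updates for a hypothetical 1200-sample calibration set; `updates_per_epoch` is a hypothetical helper, it ignores the data sklearn holds out for early stopping, and the lbfgs solver, which does not use minibatches, is not affected. Larger batches mean fewer updates per epoch, though they can also change how well the network converges.

```python
import math


def updates_per_epoch(n_samples: int, batch_size: int) -> int:
    """Weight updates in one pass over the data when training with minibatches."""
    # The final minibatch may be smaller than batch_size, so round up.
    return math.ceil(n_samples / min(batch_size, n_samples))


print(updates_per_epoch(1200, 12))   # 100
print(updates_per_epoch(1200, 120))  # 10
```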

From the command line

1. Initially build ANN without cross-validation to decide on values for learnrate and learncycles, by examining where the minimum of model.detail.ann.rmscviter occurs versus learncycles. Note that this uses a single-split CV to estimate RMSECV when the ANN cross-validation is set to "None". It is inefficient to use a larger than necessary value for the option "learncycles".

2. Determine the number of hidden layer nodes to use by building a range of models with different numbers of nodes, nhid1 and nhid2. If using the ANN Analysis window and the ANN has a single hidden layer, this can be done conveniently with the “Maximum number of Nodes L1” setting in the cross-validation settings window. At this survey stage, it is best to use a simple cross-validation with a small number of splits and iterations.

See Also

anndlda, annda, analysis, crossval, lwr, modelselector, pls, pcr, preprocess, svm, EVRIModel_Objects