Ann and Evri faq: Difference between pages

From Eigenvector Research Documentation Wiki
(Difference between pages)
imported>Jeremy
 
imported>Lyle
 
Line 1: Line 1:
===Purpose===
__TOC__
==Importing / Exporting==


Predictions based on Artificial Neural Network (ANN) regression models.
[[faq_concatenate_multiple_files|How do I concatenate multiple files into a single DataSet?]]


===Synopsis===
[[faq_create_multivariate_image_from_separate_images|How do I create a multivariate image from separate images?]]


: [model] = ann(x,y,options);
[[faq_export_PCA_scores_and_loadings_to_text_file|How do I export PCA scores and loadings to a text file (to read into MS Excel, for example)?]]
: [model] = ann(x,y, nhid, options);
: [pred] = ann(x,model,options);
: [valid] = ann(x,y,model,options);


===Description===
[[faq_import_three-way_data|How do I import three-way data into Solo or PLS_Toolbox?]]


Build an ANN model from input X and Y block data using the specified number of layers and layer nodes.
[[faq_import_horiba_NGC_64bit |Why can't I import a Horiba NGC file on my 64-bit computer?]]
Alternatively, if a model is passed in, ANN makes a Y prediction for an input test X block. The ANN model
contains quantities (weights, etc.) calculated from the calibration data. When a model structure is passed in
to ANN, these weights do not need to be recalculated.


There are two implementations of ANN available referred to as 'BPN' and 'Encog'.
[[faq_SPCREADR_cant_read_multiple_files |Why can't SPCREADR read multiple files I've selected?]]
:BPN is a feedforward ANN using backpropagation training and is implemented in Matlab.
:Encog is a feedforward ANN using Resilient Backpropagation training. See [http://en.wikipedia.org/wiki/Rprop Rprop] for further details.
Encog is implemented using the Encog framework [http://www.heatonresearch.com/encog Encog] provided by
Heaton Research, Inc, under the Apache 2.0 license. Further details of Encog Neural Network features are
available at [http://www.heatonresearch.com/wiki/Main_Page#Encog_Documentation Encog Documentation].
BPN is the ANN version used by default but the user can specify the option 'algorithm' = 'encog' to use Encog instead.
Both implementations should give similar results but one may be faster than the other for different datasets.
BPN is currently the only version which calculates RMSECV.


====Inputs====
[[faq_some_EXCEL_files_fail_to_import |Why do some Excel files fail to import?]]


* '''x''' = X-block (predictor block) class "double" or "dataset", containing numeric values,
==General==
* '''y''' = Y-block (predicted block) class "double" or "dataset", containing numeric values,
* '''nhid''' = number of nodes in a single hidden layer ANN, or a vector of two numbers giving the number of nodes in each hidden layer of a two hidden layer ANN (this takes precedence over options nhid1 and nhid2),
* '''model''' = previously generated model (when applying model to new data).
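
The precedence of '''nhid''' over the nhid1 and nhid2 options can be pictured with a small Python sketch. This is only an illustration of the rule as described here, not the toolbox's code: the function name <code>resolve_hidden_layers</code> is hypothetical, the defaults (2 and 0) are taken from the Options section, and treating a scalar nhid as "no second layer" is an assumption based on the phrase "single hidden layer ANN".

```python
def resolve_hidden_layers(nhid=None, nhid1=2, nhid2=0):
    """Return (nodes in layer 1, nodes in layer 2) for the network.

    Hypothetical helper: an explicit nhid argument wins over the
    nhid1 / nhid2 option values, as the input description states.
    """
    if nhid is None:
        # No explicit argument: fall back to the option values.
        return nhid1, nhid2
    sizes = [nhid] if isinstance(nhid, int) else list(nhid)
    if len(sizes) not in (1, 2):
        raise ValueError("nhid must be one number or a vector of two numbers")
    # Assumption: a single number means a single hidden layer (0 nodes in layer 2).
    second = sizes[1] if len(sizes) == 2 else 0
    return sizes[0], second


layers = resolve_hidden_layers([4, 3], nhid1=7, nhid2=7)  # nhid overrides both options
```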


====Outputs====
[[faq_PARALIND_in_PLS_Toolbox |Can I do PARALIND in PLS_Toolbox?]]


* '''model''' = a standard model structure with the following fields (see [[Standard Model Structure]]):
[[faq_install_on_more_than_one_PC | Can I install PLS_Toolbox (or Solo) on more than one PC, such as on my desktop and laptop computer?]]
** '''modeltype''': 'ANN',
** '''datasource''': structure array with information about input data,
** '''date''': date of creation,
** '''time''': time of creation,
** '''info''': additional model information,
** '''pred''': 2 element cell array with
*** model predictions for each input block (when options.blockdetails = 'standard', x-block predictions are not saved and this will be an empty array)
** '''detail''': sub-structure with additional model details and results, including:
*** model.detail.ann.W: Structure containing details of the ANN, including the ANN type, number of hidden layers and the weights.


* '''pred''' = a structure, similar to '''model''', for the new data.
[[faq_multiple_class_sets_together_in_SIMCA_PLSDA_LDA | Can I use multiple class sets (categorical variables) together in a SIMCA, PLSDA, or LDA model?]]


====Training Termination====
[[faq_more_info_on_R_Squared_statistic | Can you give me more information on the R-Squared statistic?]]
The ANN is trained on a calibration dataset to minimize the prediction error, RMSEC. It is important not to overtrain, however, so some criteria for ending training are needed.


BPN determines the optimal number of learning iteration cycles by selecting the minimum RMSECV using the calibration data over a range of learning iteration values. The cross-validation used is determined by option cvi, or else by cvmethod. If neither of these is specified, then the minimum RMSEP using a single subset of samples from a 5-fold random split of the calibration data is used. This value is not saved in the model.rmsecv field. Apply cross-validation (see below) to add this information to the model.
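
The selection rule for BPN (pick the number of learning cycles with the lowest cross-validated error) can be sketched in a few lines of Python. This is a minimal illustration, not the toolbox implementation: the function name <code>best_learning_cycles</code> is hypothetical, the RMSECV values are made up for the example, and breaking ties in favor of fewer cycles is an assumption.

```python
def best_learning_cycles(rmsecv_by_cycles):
    """Return the learning-cycle count with the lowest RMSECV.

    rmsecv_by_cycles maps a number of learning cycles to the RMSECV
    obtained with that many cycles. Ties go to the smaller cycle count
    (an assumption made for this sketch).
    """
    # Iterating over sorted keys makes min() return the smallest
    # cycle count among equal RMSECV values.
    return min(sorted(rmsecv_by_cycles), key=lambda n: rmsecv_by_cycles[n])


# Illustrative values only: error falls, then rises as the network overtrains.
example_curve = {5: 0.42, 10: 0.31, 15: 0.29, 20: 0.33}
chosen = best_learning_cycles(example_curve)
```

In this made-up curve the error starts rising again after 15 cycles, which is the overtraining behavior the termination criterion is meant to avoid.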
[[faq_how_RMSEC_and_RMSECV_related to R2Y_and_Q2Y_seen_other_software | How are RMSEC and RMSECV related to R2Y and Q2Y I see in other software?]]


Encog training terminates whenever either a) RMSE becomes smaller than the option 'terminalrmse' value, or b) the rate of improvement of RMSE per 100 training iterations
[[faq_convergence_of_PARAFAC| Convergence of PARAFAC. How much variation between models is expected when a particular PARAFAC model is fit multiple times with the same settings?]]
becomes smaller than the option 'terminalrmserate' value, or c) time exceeds the option 'maxseconds' value (though results are not optimal if training is stopped prematurely by this time limit).
Note these RMSE values refer to the internal preprocessed and scaled y values.
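
The three Encog stopping conditions can be written out as a single check, sketched here in Python. This is an illustration under stated assumptions, not Encog's or the toolbox's code: the function name <code>stop_reason</code> is hypothetical, the default thresholds are the option defaults listed under Options, and "rate of improvement per 100 iterations" is interpreted as the RMSE 100 iterations ago minus the current RMSE.

```python
def stop_reason(rmse_history, elapsed_seconds,
                terminalrmse=0.05, terminalrmserate=1e-9, maxseconds=20):
    """Return which criterion ends training, or None to keep training.

    rmse_history holds the RMSE of the scaled y values after each
    training iteration, oldest first.
    """
    current = rmse_history[-1]
    # a) error is already small enough
    if current < terminalrmse:
        return "terminalrmse"
    # b) improvement over the last 100 iterations has stalled
    #    (assumed definition: RMSE 100 iterations ago minus RMSE now)
    if len(rmse_history) > 100:
        improvement = rmse_history[-101] - current
        if improvement < terminalrmserate:
            return "terminalrmserate"
    # c) time budget exhausted; the result may not be optimal
    if elapsed_seconds > maxseconds:
        return "maxseconds"
    return None


stalled = stop_reason([0.2] * 101, elapsed_seconds=3.0)
```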


====Cross-validation====
[[faq_does_software_stop_working_if_maintenance_expires | Does the software stop working if my maintenance expires?]]
Cross-validation can be applied to ANN when using either the ANN Analysis window or the command line. From the Analysis window, specify the cross-validation method in the usual way (clicking on the model icon's red check-mark, or the "Choose Cross-Validation" link in the flowchart). In the cross-validation window, the "Maximum Number of Nodes" setting specifies how many hidden-layer 1 nodes to test over. Viewing RMSECV versus the number of hidden-layer 1 nodes (toolbar icon to the left of Scores Plot) is useful for choosing the number of layer 1 nodes. From the command line, use the crossval method to add cross-validation information to an existing model.


===Options===
[[faq_report_a_problem_with_PLS_Toolbox | How and where do I report a problem with PLS_Toolbox?]]


options = a structure array with the following fields:
[[faq_how_are_T_contributions_calculated | How are T-contributions calculated?]]
* '''display''' : [ 'off' |{'on'}] Governs display
* '''plots''': [ {'none'} | 'final' ] governs plotting of results.
* '''blockdetails''' : [ {'standard'} | 'all' ] extent of detail included in model. 'standard' keeps only y-block, 'all' keeps both x- and y- blocks.
* '''waitbar''' : [ 'off' |{'auto'}| 'on' ] governs use of waitbar during analysis. 'auto' shows waitbar if delay will likely be longer than a reasonable waiting period.
* '''algorithm''' : [{'bpn'} | 'encog'] ANN implementation to use.
* '''nhid1''' : [{2}] Number of nodes in first hidden layer.
* '''nhid2''' : [{0}] Number of nodes in second hidden layer.
* '''learnrate''' : [{0.125}] ANN backpropagation learning rate (bpn only).
* '''learncycles''' : [{20}] Number of ANN learning iterations (bpn only).
* '''terminalrmse''' : [{0.05}] Termination RMSE value (of scaled y) for ANN iterations (encog only).
* '''terminalrmserate''' : [{1e-9}] Termination rate of change of RMSE per 100 iterations (encog only).
* '''maxseconds''' : [{20}] Maximum duration of ANN training in seconds (encog only).
* '''preprocessing''': {[] []} preprocessing structures for x and y blocks (see PREPROCESS).
* '''compression''': [{'none'}| 'pca' | 'pls' ] type of data compression to perform on the x-block prior to calculating or applying the ANN model. 'pca' uses a simple PCA model to compress the information. 'pls' uses a PLS model. Compression can make the ANN more stable and less prone to overfitting.
* '''compressncomp''': [1] Number of latent variables (or principal components) to include in the compression model.
* '''compressmd''': [{'yes'} | 'no'] Use Mahalanobis distance corrected scores from the compression model.
* '''cvmethod''' : [{'con'} | 'vet' | 'loo' | 'rnd'] CV method, OR [] for Kennard-Stone single split.
* '''cvsplits''' : [{5}] Number of CV subsets.
* '''cvi''' : ''M'' element vector with integer elements allowing user defined subsets. (cvi) is a vector with the same number of elements as x has rows i.e., length(cvi) = size(x,1). Each cvi(i) is defined as:
::cvi(i) = -2  the sample is always in the test set,
::cvi(i) = -1  the sample is always in the calibration set,
::cvi(i) =  0  the sample is never used, and
::cvi(i) =  1,2,3... defines each test subset.
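
To make the cvi codes concrete, the following Python sketch expands a cvi vector into calibration and test sets, one pair per test subset. It is an illustration of the rules listed above, not the toolbox's cross-validation code: the function name <code>cvi_splits</code> is hypothetical, and indices are 0-based here, whereas Matlab rows are 1-based.

```python
def cvi_splits(cvi):
    """Expand a cvi vector into (subset_id, calibration_rows, test_rows).

    Codes follow the list above: -2 always test, -1 always calibration,
    0 never used, positive integers label the test subsets.
    """
    subset_ids = sorted({c for c in cvi if c > 0})
    splits = []
    for s in subset_ids:
        # Test set: this subset's samples plus the always-test samples.
        test = [i for i, c in enumerate(cvi) if c == s or c == -2]
        # Calibration set: always-calibration samples plus the other subsets.
        cal = [i for i, c in enumerate(cvi) if c == -1 or (c > 0 and c != s)]
        splits.append((s, cal, test))
    return splits


# Seven samples: two test subsets, one fixed calibration sample,
# one unused sample, and one sample that is always tested.
example = cvi_splits([1, 2, 1, -1, 0, -2, 2])
```

For this example vector, the sample coded 0 (index 4) appears in neither set of either split, and the sample coded -2 (index 5) appears in the test set of both.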


===See Also===
[[faq_how_are_ROC_curves_calculated_for_PLSDA | How are the ROC curves calculated for PLSDA?]]


[[analysis]], [[crossval]], [[lwr]], [[modelselector]], [[pls]], [[pcr]], [[svm]]
[[faq_how_are_error_bars_calculated_regression_model | How are the error bars calculated for a regression model and can they be related to a confidence limit (confidence in the prediction)?]]
 
[[faq_improve_performance_with_PLS_Toolbx_and_Matlab_on_Mac | How can I improve performance with PLS_Toolbox and Matlab on the Mac platform?]]
 
[[faq_assign_classes_for_samples_in_a_DataSet | How do I assign classes for samples in a DataSet?]]
 
[[faq_build_a_classification_model_from_class_set_other_than_the_first | How do I build a classification model from a class set other than the first?]]
 
[[faq_choose_between_different_cross_validation_leave_out_options | How do I choose between the different cross-validation leave-out options?]]
 
[[faq_reference_Eigenvector| How do I cite/reference Eigenvector?]]
 
[[faq_interpret_ROC_curves_and_Sensitivity_Specificity_plots_from_PLSDA | How do I interpret the ROC curves and Sensitivity / Specificity plots from PLSDA?]]
 
[[faq_make_DataSet_backwards_compatible | How do I make a DataSet backwards compatible?]]
 
[[faq_obtain_or_use_recompilation_license_for_PLS_Toolbox | How do I obtain or use a recompilation license for PLS_Toolbox?]]
 
[[faq_use_custon_cross_validation_option | How do I use the "custom" cross-validation option?]]
 
[[faq_out_of_memory_error_when_analyzing_data | I keep getting "out of memory" errors when analyzing my data. What can I do?]]
 
[[faq_java_lang_OutOfMemoryError| What can I do if I get a java.lang.OutOfMemoryError error?]]
 
[[faq_why_get_negative_scores_when_all_modes_are_set_to_nonnegativity | Nonnegativity (PARAFAC, PARAFAC2, Tucker): Why do I get negative scores when all modes are set to nonnegativity?]]
 
[[faq_what_are_relative_contributions | What are "Relative Contributions"?]]
 
==Command Line==
==Manual==
==GUI==
==Installation==
 
 
 
 
 
 
 
[[Category:FAQ]]

Revision as of 14:15, 29 November 2018


Importing / Exporting

How do I concatenate multiple files into a single DataSet?

How do I create a multivariate image from separate images?

How do I export PCA scores and loadings to a text file (to read into MS Excel, for example)?

How do I import three-way data into Solo or PLS_Toolbox?

Why can't I import a Horiba NGC file on my 64-bit computer?

Why can't SPCREADR read multiple files I've selected?

Why do some Excel files fail to import?

General

Can I do PARALIND in PLS_Toolbox?

Can I install PLS_Toolbox (or Solo) on more than one PC, such as on my desktop and laptop computer?

Can I use multiple class sets (categorical variables) together in a SIMCA, PLSDA, or LDA model?

Can you give me more information on the R-Squared statistic?

How are RMSEC and RMSECV related to R2Y and Q2Y I see in other software?

Convergence of PARAFAC. How much variation between models is expected when a particular PARAFAC model is fit multiple times with the same settings?

Does the software stop working if my maintenance expires?

How and where do I report a problem with PLS_Toolbox?

How are T-contributions calculated?

How are the ROC curves calculated for PLSDA?

How are the error bars calculated for a regression model and can they be related to a confidence limit (confidence in the prediction)?

How can I improve performance with PLS_Toolbox and Matlab on the Mac platform?

How do I assign classes for samples in a DataSet?

How do I build a classification model from a class set other than the first?

How do I choose between the different cross-validation leave-out options?

How do I cite/reference Eigenvector?

How do I interpret the ROC curves and Sensitivity / Specificity plots from PLSDA?

How do I make a DataSet backwards compatible?

How do I obtain or use a recompilation license for PLS_Toolbox?

How do I use the "custom" cross-validation option?

I keep getting "out of memory" errors when analyzing my data. What can I do?

What can I do if I get a java.lang.OutOfMemoryError error?

Nonnegativity (PARAFAC, PARAFAC2, Tucker): Why do I get negative scores when all modes are set to nonnegativity?

What are "Relative Contributions"?

Command Line

Manual

GUI

Installation