Tools Cross-Validation and Evri faq: Difference between pages

From Eigenvector Research Documentation Wiki

__TOC__
[[TableOfContents|Table of Contents]] | [[ModelApplication_ValidationPhase|Previous]] | [[Tools_ModelRobustness|Next]]
==Importing / Exporting==

[[faq_concatenate_multiple_files|How do I concatenate multiple files into a single DataSet?]]

[[faq_create_multivariate_image_from_separate_images|How do I create a multivariate image from separate images?]]

[[faq_export_PCA_scores_and_loadings_to_text_file|How do I export PCA scores and loadings to a text file (to read into MS Excel, for example)?]]

[[faq_import_three-way_data|How do I import three-way data into Solo or PLS_Toolbox?]]

[[faq_import_horiba_NGC_64bit|Why can't I import a Horiba NGC file on my 64-bit computer?]]

[[faq_SPCREADR_cant_read_multiple_files|Why can't SPCREADR read multiple files I've selected?]]

[[faq_some_EXCEL_files_fail_to_import|Why do some Excel files fail to import?]]

==General==

[[faq_PARALIND_in_PLS_Toolbox|Can I do PARALIND in PLS_Toolbox?]]

[[faq_install_on_more_than_one_PC|Can I install PLS_Toolbox (or Solo) on more than one PC, such as on my desktop and laptop computer?]]

[[faq_multiple_class_sets_together_in_SIMCA_PLSDA_LDA|Can I use multiple class sets (categorical variables) together in a SIMCA, PLSDA, or LDA model?]]

[[faq_more_info_on_R_Squared_statistic|Can you give me more information on the R-Squared statistic?]]

[[faq_how_RMSEC_and_RMSECV_related to R2Y_and_Q2Y_seen_other_software|How are RMSEC and RMSECV related to R2Y and Q2Y I see in other software?]]

[[faq_convergence_of_PARAFAC|Convergence of PARAFAC: how much variation between models is expected when a particular PARAFAC model is fit multiple times with the same settings?]]

[[faq_does_software_stop_working_if_maintenance_expires|Does the software stop working if my maintenance expires?]]

[[faq_report_a_problem_with_PLS_Toolbox|How and where do I report a problem with PLS_Toolbox?]]

[[faq_how_are_T_contributions_calculated|How are T-contributions calculated?]]

[[faq_how_are_ROC_curves_calculated_for_PLSDA|How are the ROC curves calculated for PLSDA?]]

[[faq_how_are_error_bars_calculated_regression_model|How are the error bars calculated for a regression model, and can they be related to a confidence limit (confidence in the prediction)?]]

[[faq_improve_performance_with_PLS_Toolbx_and_Matlab_on_Mac|How can I improve performance with PLS_Toolbox and Matlab on the Mac platform?]]

[[faq_assign_classes_for_samples_in_a_DataSet|How do I assign classes for samples in a DataSet?]]

[[faq_build_a_classification_model_from_class_set_other_than_the_first|How do I build a classification model from a class set other than the first?]]

[[faq_choose_between_different_cross_validation_leave_out_options|How do I choose between the different cross-validation leave-out options?]]

[[faq_reference_Eigenvector|How do I cite/reference Eigenvector?]]

[[faq_interpret_ROC_curves_and_Sensitivity_Specificity_plots_from_PLSDA|How do I interpret the ROC curves and Sensitivity / Specificity plots from PLSDA?]]

[[faq_make_DataSet_backwards_compatible|How do I make a DataSet backwards compatible?]]

[[faq_obtain_or_use_recompilation_license_for_PLS_Toolbox|How do I obtain or use a recompilation license for PLS_Toolbox?]]

[[faq_use_custon_cross_validation_option|How do I use the "custom" cross-validation option?]]

[[faq_out_of_memory_error_when_analyzing_data|I keep getting "out of memory" errors when analyzing my data. What can I do?]]

[[faq_java_lang_OutOfMemoryError|What can I do if I get a java.lang.OutOfMemoryError error?]]

[[faq_why_get_negative_scores_when_all_modes_are_set_to_nonnegativity|Nonnegativity (PARAFAC, PARAFAC2, Tucker): Why do I get negative scores when all modes are set to nonnegativity?]]

[[faq_what_are_relative_contributions|What are "Relative Contributions"?]]

[[faq_what_are_reduced_T^2_and_Q_Statistics|What are the "Reduced" T^2 and Q Statistics?]]

==Command Line==

==Manual==

==GUI==

==Installation==

[[Category:FAQ]]

==Cross-Validation Tool==

You use the Cross-Validation tool to:

* Assess the optimal complexity of a model (for example, the number of principal components in a PCA or PCR model, or the number of latent variables in a PLS model).
* Estimate the performance of a model when you apply the model to unknown data.

For a given set of data, cross-validation involves a series of steps called subvalidation steps, in which you remove a subset of objects from the data set (the test set), build a model using the remaining objects (the model-building set), and then apply the resulting model to the removed objects. You note how the errors accumulate as you leave out samples to determine the number of principal components/latent variables/factors to retain in the model. Cross-validation typically involves more than one subvalidation step, each of which involves the selection of a different subset of samples for model building and model testing. In Solo, five different cross-validation methods are available; these methods differ in how the sample subsets are selected for the subvalidation steps.

1. To open the Cross-Validation tool, do one of the following:

* On the Analysis window, click Tools > Cross-Validation.
* Click the Cross-Validation icon in the Analysis window.
* In the Analysis window Flowchart pane, click Choose Cross-Validation.

Note: You must load data into the Analysis window before the Cross-Validation icon is available.

::''Cross-validation icon in the Analysis window''

::[[Image:Cross_validation_icon_Analysis_window.png|406x83px]]

2. In the Cross-Validation dialog box, select the method of cross-validation that you want to use.

::''Cross-Validation dialog box''

::[[Image:Cross_validation_icon_dialog_box.png|288x138px]]

3. Use the slider bars to change the default values for the available parameters.

Note: Not all parameters are relevant for all cross-validation methods. The initial values for the available parameters are defaults based on the dimensionality of the data. You can click Reset at any time to restore the default settings. For the following descriptions:

* ''n'' is the total number of objects in the data set.
* ''s'' is the number of data splits specified for the cross-validation procedure, which must be less than ''n''/2.
* ''r'' is the number of iterations.
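The subvalidation loop described above can be sketched in code. The following is a minimal illustration only (Python, with truncated-SVD regression standing in for PCR/PLS; `cross_validate` is a hypothetical helper, not part of Solo or PLS_Toolbox):

```python
import numpy as np

def cross_validate(X, y, test_sets, max_rank):
    """Run one subvalidation step per test set and accumulate RMSECV.

    test_sets : list of index arrays; each array is the test set for one
                subvalidation step (how these are chosen is what
                distinguishes the cross-validation methods).
    max_rank  : maximum number of components to try (assumed not to
                exceed the rank of any model-building set).
    Assumes the test sets partition the objects, one prediction each.
    """
    press = np.zeros(max_rank)  # predicted residual sum of squares per rank
    for test in test_sets:
        train = np.setdiff1d(np.arange(len(y)), test)
        # Build a model on the model-building set for each rank 1..max_rank
        # (truncated SVD regression as a stand-in for PCR/PLS).
        U, s, Vt = np.linalg.svd(X[train], full_matrices=False)
        for k in range(1, max_rank + 1):
            b = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T @ y[train]
            resid = y[test] - X[test] @ b  # apply model to removed objects
            press[k - 1] += np.sum(resid ** 2)
    return np.sqrt(press / len(y))  # RMSECV for each rank
```

The rank at which RMSECV reaches its minimum, or stops decreasing appreciably, is a common choice for the number of components to retain.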
 
::''Cross-validation methods compared''

{| class="wikitable"
|-
!
! Leave One Out
! Venetian Blinds
! Contiguous Block
! Random Subsets
! Custom
|-
| Cross-validation method
| [[Image:CV_Leave_One_Out.jpg|86x126px]]
| [[Image:CVB_VenetianBlinds.jpg|84x126px]]
| [[Image:CV_ContinguousBlocks.jpg|85x126px]]
| [[Image:CV_RandomSubsets.jpg|85x127px]]
|
|-
| Description
| The default value. All samples in the set of data are used to build the model.
| Each test set is determined by selecting every ''s''th object in the set of data, starting at objects numbered 1 through ''s''.
| An alternative to Venetian Blinds. Each test set is determined by selecting contiguous blocks of ''n''/''s'' objects in the set of data, starting at object number 1.
| ''s'' different test sets are determined through random selection of ''n''/''s'' objects in the set of data, such that no single object is in more than one test set. This procedure is repeated ''r'' times, where ''r'' is the number of iterations.
| You manually define each of the test sets. You can assign specific objects in your set of data in one of three ways:
* To be in every test set.
* To never be in a test set.
* To not be used in the cross-validation procedure at all.
|-
| Available Parameters
|
* Maximum Number of LVs
|
* Maximum Number of LVs
* Number of Data Splits
|
* Maximum Number of LVs
* Number of Data Splits
|
* Maximum Number of LVs
* Number of Data Splits
* Number of Iterations
|
* Number of Data Splits
* Object membership for each split
* Total number of objects
|-
| <nowiki>#</nowiki> of Subvalidation Steps
| ''n''
| ''s''
| ''s''
| ''s''&times;''r''
| ''s''
|-
| <nowiki>#</nowiki> of Test Samples per Subvalidation
| 1
| ''n''/''s''
| ''n''/''s''
| ''n''/''s''
| Varies. User-defined.
|}
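As a concrete illustration of the three automatic splitting methods in the table above, the test-set index patterns can be sketched as follows (Python; hypothetical helpers, with objects numbered 0 to ''n''&minus;1 rather than 1 to ''n''):

```python
import numpy as np

def venetian_blinds(n, s):
    # Test set i: every s-th object, starting at object i
    # -> s subvalidation steps of about n/s objects each.
    return [np.arange(i, n, s) for i in range(s)]

def contiguous_blocks(n, s):
    # Test set i: the i-th contiguous block of about n/s objects,
    # starting at object 0 -> s subvalidation steps.
    return list(np.array_split(np.arange(n), s))

def random_subsets(n, s, r, seed=None):
    # Each of r iterations partitions the objects into s disjoint random
    # test sets -> s*r subvalidation steps in total.
    rng = np.random.default_rng(seed)
    sets = []
    for _ in range(r):
        sets.extend(np.array_split(rng.permutation(n), s))
    return sets
```

For example, `venetian_blinds(8, 4)` produces the test sets [0, 4], [1, 5], [2, 6], [3, 7]: ''s'' = 4 subvalidation steps with ''n''/''s'' = 2 test samples each, matching the table, while `random_subsets` produces ''s''&times;''r'' test sets in total.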
 
4. Do one of the following:

* Click Apply to apply these settings and keep the Cross-Validation dialog box open.
* Click OK to apply these settings and close the Cross-Validation dialog box.

Revision as of 14:25, 29 November 2018
