Automatic sample selection and Evri faq: Difference between pages

From Eigenvector Research Documentation Wiki
(Difference between pages)
imported>Jeremy
 
imported>Lyle
 
The Calibration/Validation Sample selection interface allows the user to choose which samples to keep in the calibration set (Cal) and which to move to the validation set (Val).
__TOC__
==Importing / Exporting==


Selection can be done manually, by setting the "Sample Type" Class set (under the Row Labels tab) to either Calibration or Validation for each sample, or automatically, by clicking the Automatic split button (gear) in the toolbar.
[[faq_concatenate_multiple_files|How do I concatenate multiple files into a single DataSet?]]


The sample selection interface is opened by choosing "Split into Calibration / Validation" from any of the data blocks in the Analysis status window. The resulting interface is a customized DataSet editor which shows one row for each sample in the current calibration and validation blocks and allows the user to modify the status of each sample.
[[faq_create_multivariate_image_from_separate_images|How do I create a multivariate image from separate images?]]


Once set selection is done, the "Accept Experiment Setup" toolbar button can be used to automatically sort the data into the calibration and validation blocks. All data marked as "Calibration" will be moved to the X/Y blocks in the calibration section of the Analysis window and all data marked as "Validation" will be moved to the X/Y blocks in the validation section of the Analysis window. Clicking the "Discard Experiment Setup" button will discard all Cal / Val changes.
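The sorting step described above can be pictured with a small Python sketch. This is only an illustration of the idea, not the code used by the Analysis window: the function name <code>split_by_sample_type</code> and the plain lists standing in for DataSet rows and the "Sample Type" class are hypothetical.

```python
def split_by_sample_type(rows, sample_types):
    """Sort rows into calibration and validation lists by their label.

    rows and sample_types are parallel lists; both are hypothetical
    stand-ins for the DataSet rows and the "Sample Type" class set.
    """
    calibration = []
    validation = []
    for row, sample_type in zip(rows, sample_types):
        if sample_type == "Calibration":
            calibration.append(row)
        elif sample_type == "Validation":
            validation.append(row)
        else:
            raise ValueError(f"unknown sample type: {sample_type!r}")
    return calibration, validation


rows = ["s1", "s2", "s3", "s4"]
types = ["Calibration", "Validation", "Calibration", "Validation"]
cal_rows, val_rows = split_by_sample_type(rows, types)
```

In this sketch the original row order is preserved within each block.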
[[faq_export_PCA_scores_and_loadings_to_text_file|How do I export PCA scores and loadings to a text file (to read into MS Excel, for example)?]]


__TOC__
[[faq_import_three-way_data|How do I import three-way data into Solo or PLS_Toolbox?]]


==Manual Sample Selection==
[[faq_import_horiba_NGC_64bit |Why can't I import a Horiba NGC file on my 64-bit computer?]]


Each sample can be moved to either the Calibration or Validation set by simply changing the "Sample Type" class. If there are labels for the samples, these will be shown in the Label field of the interface.
[[faq_SPCREADR_cant_read_multiple_files |Why can't SPCREADR read multiple files I've selected?]]


To move more than one sample at a time, click the button at the left of each row you want to move to select that row. Once all the desired rows are selected, use the Class pull-down menu on one of the selected rows to choose Calibration or Validation, as desired. All selected samples will be switched to the indicated set.
[[faq_some_EXCEL_files_fail_to_import |Why do some Excel files fail to import?]]


==Automatic Sample Selection==
==General==


Automatic sample selection walks the user through the selection by asking a series of questions, outlined below.
[[faq_PARALIND_in_PLS_Toolbox |Can I do PARALIND in PLS_Toolbox?]]


===Disposition of Previous Selection Changes===
[[faq_install_on_more_than_one_PC | Can I install PLS_Toolbox (or Solo) on more than one PC, such as on my desktop and laptop computer?]]


First, if there are any samples which have been manually or automatically moved from Cal to Val, or vice versa, the user is asked if they want to Reset all samples back to their original set before automatic selection is done. Choosing "Reset" will restore all the samples to the set they were in when the sample selection interface was opened. Choosing "Select from Current Split" will keep the samples in their current split and allow '''further''' selection automatically. "Cancel" stops all selection.
[[faq_multiple_class_sets_together_in_SIMCA_PLSDA_LDA | Can I use multiple class sets (categorical variables) together in a SIMCA, PLSDA, or LDA model?]]


[[Image:Selreset.png]]
[[faq_more_info_on_R_Squared_statistic | Can you give me more information on the R-Squared statistic?]]


===Direction for Sample Selection===
[[faq_how_RMSEC_and_RMSECV_related to R2Y_and_Q2Y_seen_other_software | How are RMSEC and RMSECV related to R2Y and Q2Y I see in other software?]]


Next, if there are any samples marked as Validation, the user is asked which "direction" they want to select: either removing samples FROM the calibration set (to the validation set), or adding samples TO the calibration set (out of the validation set). The first option is used when there are more samples in the calibration set than are desired ''or'' when the user wishes to create a test set for their model. The second option is used when new data has been measured and the user wishes to add some subset of these samples to a previous set of calibration samples (to improve model performance on the new types of samples).
[[ faq_convergence_of_PARAFAC| Convergence of PARAFAC. How much variation between models is expected when a particular PARAFAC model is fit multiple times with the same settings?]]


If all the samples are in the calibration set already, Remove From Calibration is assumed.
[[ faq_does_software_stop_working_if_maintenance_expires | Does the software stop working if my maintenance expires?]]


[[Image:Seldirection.png]]
==Command Line==
==Manual==
==GUI==
==Installation==




===Selection Method===


Next, the selection method must be chosen from:
* Nearest Neighbor Thinning - based on [[reducennsamples]], this method selects samples by discarding those that are very similar to existing samples. The result is a set of samples that spans the same range as the original data, but with an even distribution of samples across that range.
* Onion - based on [[distslct]] and [[Splitcaltest]], this method first selects a ring of highly unique samples based on distance (similar to the D-Optimal criterion), then leaves out a ring of the next most unique samples, and finally selects a random subset of the samples inside the boundaries selected in the "onion".
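
The idea behind nearest neighbor thinning can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm implemented in [[reducennsamples]]: it uses plain Euclidean distance, repeatedly finds the closest remaining pair, and drops the later sample of that pair. The function name <code>thin_nearest_neighbors</code> and the toy data are hypothetical.

```python
import math


def thin_nearest_neighbors(samples, n_keep):
    """Return indices of the samples kept after greedy thinning.

    Assumption: at each step the closest pair of remaining samples is
    found and the later member of that pair is discarded. The real
    toolbox routine may choose differently.
    """
    kept = list(range(len(samples)))
    while len(kept) > n_keep:
        best = None  # (distance, position in kept of the sample to drop)
        for a in range(len(kept)):
            for b in range(a + 1, len(kept)):
                d = math.dist(samples[kept[a]], samples[kept[b]])
                if best is None or d < best[0]:
                    best = (d, b)
        kept.pop(best[1])
    return kept


# Two tight clusters plus one isolated point (one variable per sample).
data = [(0.0,), (0.1,), (5.0,), (5.05,), (10.0,)]
kept = thin_nearest_neighbors(data, 3)
```

Here one sample from each tight cluster is discarded, so the remaining samples still span the original range but are spread more evenly across it.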


[[Image:Selmethod.png]]


===Choosing Percentage to Keep===


Finally, the user must select the percentage of samples to "select". In the case of Removing From Calibration, this is the percentage of Calibration samples to '''keep''' in the calibration set. In the case of Adding To Calibration, this is the percentage of Validation samples to '''add''' to the calibration set. The value must be between 1 and 100.
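
How the percentage is interpreted in each direction can be shown with a short Python sketch. The function names, the direction strings, and the rounding rule (round to the nearest whole sample, with at least one selected) are assumptions made for illustration; the interface may round differently.

```python
def select_count(n_available, percent):
    """Number of samples that a percentage corresponds to (assumed rounding)."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    return max(1, round(n_available * percent / 100))


def plan_selection(n_cal, n_val, percent, direction):
    """Return the set sizes after selection for the given direction."""
    if direction == "remove_from_cal":
        # Percentage of Calibration samples to KEEP; the rest move to Validation.
        keep = select_count(n_cal, percent)
        return {"cal": keep, "val": n_val + (n_cal - keep)}
    if direction == "add_to_cal":
        # Percentage of Validation samples to ADD to Calibration.
        add = select_count(n_val, percent)
        return {"cal": n_cal + add, "val": n_val - add}
    raise ValueError(f"unknown direction: {direction!r}")


removed = plan_selection(100, 0, 70, "remove_from_cal")
added = plan_selection(80, 20, 50, "add_to_cal")
```

With 100 calibration samples and 70 percent kept, 30 samples move to validation; with 20 validation samples and 50 percent added, 10 samples move to calibration.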


[[Image:Selpct.png]]
[[Category:FAQ]]
 
 
===Finishing the Selection===
 
Once all settings have been defined, the selection will take place and the samples will be marked in their new sets. It may be useful to create a plot (click on the Plot toolbar button, or the Plot tab) to view which samples are in which sets. Accepting the changes will move all samples to the new sets and make sure Analysis is in the appropriate configuration for analysis of the data.

Revision as of 08:21, 21 November 2018