Model Building: Analysis Phases Overview and Tools Cross-Validation

From Eigenvector Research Documentation Wiki
imported>Jeremy
__TOC__
[[TableOfContents|Table of Contents]] | [[ModelBuilding_PreProcessingMethods|Previous]] | [[ModelBuilding_CalibrationPhase|Next]]

==Analysis Phases==

The Analysis window is the core interface to the Solo modeling and analysis functions. You create, apply, and explore your models in an Analysis window. Modeling and analysis in the Analysis window are carried out in three phases: the [[ModelBuilding_AnalysisPhasesOverview#Calibration phase|Calibration phase]], the [[ModelBuilding_AnalysisPhasesOverview#Test and Validation phase|Test and Validation phase]], and the [[ModelBuilding_AnalysisPhasesOverview#Model Application phase|Model Application phase]].

===Calibration phase===

The Calibration phase consists of model building and exploratory analysis. This phase affects only the Calibration side of the Status pane. In it, you must load data into the X calibration control. This data is referred to as x-block data, and it is a set of multivariate measurements on your data samples. Some analysis methods also require you to load data into the Y calibration control. This data is referred to as y-block data, and it is a set of secondary or reference measurements on the same samples. During analysis, you identify patterns, trends, and any other information that you consider relevant, such as relationships between the x data and the y data. You then use this information to build a model. See [[ModelBuilding_CalibrationPhase|Building the Model in the Calibration Phase]].

===Test and Validation phase===

The Test and Validation phase consists of applying the model that you built in the Calibration phase to validation data. Validation data is data whose physical and/or chemical characteristics are known. This phase affects the Validation side of the Status pane. In it, you must load data into the X validation control and, if applicable, the Y validation control. As in the Calibration phase, the data that you load into the X control is x-block data (multivariate measurements on your data samples). The data that you load into the Y control is y-block data (secondary or reference measurements on the same samples). You use this validation data to confirm that the model captures valid patterns and trends. To do so, you apply the model to the validation data and verify that the results are acceptable.

For example, PCA is typically used for pattern recognition. A correctly built PCA model can therefore identify instances in which the pattern is broken, such as material that does not meet specifications. When you test and validate a PCA model, some of the validation samples should meet specifications and some should be "out of spec." A well-built PCA model flags these "out of spec" samples. If the test results are acceptable, you can continue to the Model Application phase. If they are not, you must return to the Calibration phase. See [[ModelApplication_ValidationPhase|Applying the Model in the Test and Validation Phase]].

===Model Application phase===

The Model Application phase consists of applying the tested and validated model to new data. New data has unknown characteristics, so the results of applying the model cannot be known in advance. However, if the test results in the Test and Validation phase were acceptable, the results of the Model Application phase are also likely to be accurate. For example, a correctly built PCA model that was successfully tested and validated should identify "out of spec" samples during the Model Application phase. See [[ModelApplication_ValidationPhase|Applying the Model in the Test and Validation Phase]].

[[TableOfContents|Table of Contents]] | [[ModelApplication_ValidationPhase|Previous]] | [[Tools_ModelRobustness|Next]]

==Cross-Validation Tool==

You use the Cross-Validation tool to:

{|
|-
<td style="width: 18pt">*</td>
|Assess the optimal complexity of a model (for example, the number of principal components in a PCA or PCR model, or the number of latent variables in a PLS model).
|}

{|
|-
<td style="width: 18pt">*</td>
|Estimate the performance of a model when you apply it to unknown data.
|}
 
For a given set of data, cross-validation consists of a series of subvalidation steps. In each step, you first remove a subset of objects from the data (the test set). You then build a model using the remaining objects (the model building set) and apply the resulting model to the removed objects. Track how the prediction errors on the removed objects accumulate for models of increasing complexity. This tells you how many principal components, latent variables, or factors to retain in the model. Cross-validation typically involves more than one subvalidation step, and each step uses a different subset of objects for model building and model testing. Solo provides five cross-validation methods. They differ in how the sample subsets are selected for the subvalidation steps.
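The subvalidation loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Solo's implementation:

* `rmsecv` and `toy_fit` are hypothetical names.
* Test sets are given as lists of 0-based object indices.
* `toy_fit` stands in for a real model. It treats complexity 0 as "predict the mean" and complexity 1 as a straight-line fit, instead of using principal components or latent variables.

The squared errors accumulated over all left-out objects (often called PRESS) are summarized as a root-mean-square error of cross-validation (RMSECV), one value per complexity.

```python
import math


def toy_fit(xs, ys, k):
    """Hypothetical stand-in for a model with complexity k.

    k == 0 predicts the mean of the training y values; k >= 1 fits a
    least-squares straight line. A real PCR/PLS model would use k components.
    """
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    if k == 0:
        return lambda x: my
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def rmsecv(x, y, test_sets, fit, max_complexity):
    """Return one RMSECV value per complexity 0..max_complexity.

    Each entry of test_sets is one subvalidation step: those objects are
    removed, a model is built on the remaining objects, and the squared
    prediction errors on the removed objects are accumulated (PRESS).
    """
    n = len(y)
    curve = []
    for k in range(max_complexity + 1):
        press = 0.0
        predictions = 0
        for test in test_sets:
            left_out = set(test)
            build = [i for i in range(n) if i not in left_out]
            model = fit([x[i] for i in build], [y[i] for i in build], k)
            for i in test:
                press += (y[i] - model(x[i])) ** 2
                predictions += 1
        curve.append(math.sqrt(press / predictions))
    return curve


if __name__ == "__main__":
    x = [float(i) for i in range(10)]
    y = [2.0 * xi + 1.0 for xi in x]      # exactly linear toy data
    loo = [[i] for i in range(len(x))]    # leave-one-out test sets
    curve = rmsecv(x, y, loo, toy_fit, max_complexity=1)
    best = min(range(len(curve)), key=curve.__getitem__)
    print(curve, "-> retain complexity", best)
```

On this exactly linear toy data, the straight-line model (complexity 1) predicts every left-out object almost perfectly. With real, noisy data you would instead look for the complexity at which the error curve stops improving.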
 
{|
 
|-
<td style="width: 18pt">1.</td>
 
|To open the Cross-Validation tool, do one of the following:
 
|}
 
{| style="margin-left:18pt" 
 
|-
<td style="width: 18pt">*</td>
 
|In the Analysis window, click Tools &gt; Cross-Validation.
 
|}
 
{| style="margin-left:18pt" 
 
|-
<td style="width: 18pt">*</td>
 
|Click the Cross-Validation icon in the Analysis window.
 
|}
 
Note: You must load data into the Analysis window before the Cross-Validation icon is available.
 
::''Cross-validation icon in the Analysis window''
 
::[[Image:Cross_validation_icon_Analysis_window.png|406x83px]]
::
 
{| style="margin-left:18pt" 
 
|-
<td style="width: 18pt">*</td>
 
|In the Analysis window Flowchart pane, click Choose Cross-Validation.
 
|}
 
{|
 
|-
<td style="width: 18pt">2.</td>
 
|In the Cross-Validation dialog box, select the method of cross-validation that you want to use.
 
|}
 
::''Cross-Validation dialog box''
 
::[[Image:Cross_validation_icon_dialog_box.png|288x138px]]
::
 
{|
 
|-
<td style="width: 18pt">3.</td>
 
|Use the slider bars to change the default values for the available parameters.
 
|}
 
Note: Not all parameters are relevant for all cross-validation methods. The initial parameter values are defaults based on the dimensionality of the data. You can click Reset at any time to restore the default settings. In the following descriptions:
 
{|
 
|-
<td style="width: 18pt">*</td>
 
|n is the total number of objects in the set of data.
 
|}
 
{|
 
|-
<td style="width: 18pt">*</td>
 
|s is the number of data splits specified for the cross-validation procedure, which must be less than n/2.
 
|}
 
{|
 
|-
<td style="width: 18pt">*</td>
 
|r is the number of iterations.
 
|}
 
::''Cross-validation methods compared''
 
::
::
<table style="border-collapse: collapse; margin-bottom: 12.0pt; margin-left: 0pt; margin-right: 0pt; margin-top: 3.0pt; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; text-align: left; width: 683.2512pt;" cellspacing="0" summary="">
 
<tr>
<td style="background-color: #E8E8E8; border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top; width: 83.25pt;">
 
</td>
<td style="background-color: #E8E8E8; border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top; width: 120.00024pt;">
 
'''Leave One Out'''
</td>
<td style="background-color: #E8E8E8; border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top; width: 120.00024pt;">
 
'''Venetian Blinds'''
</td>
<td style="background-color: #E8E8E8; border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top; width: 120.00024pt;">
 
'''Contiguous Block'''
</td>
<td style="background-color: #E8E8E8; border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top; width: 120.00024pt;">
 
'''Random Subsets'''
</td>
<td style="background-color: #E8E8E8; border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top; width: 120.00024pt;">
 
'''Custom'''
</td>
 
<tr>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
Cross-validation method
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
[[Image:CV_Leave_One_Out.jpg|86x126px]]
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
[[Image:CVB_VenetianBlinds.jpg|84x126px]]
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
[[Image:CV_ContinguousBlocks.jpg|85x126px]]
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
[[Image:CV_RandomSubsets.jpg|85x127px]]
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
 
</td>
 
<tr>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
Description
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
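The default method. Each test set contains a single object. The model is built from the remaining n-1 objects and applied to the object that was left out. This is repeated until every object has been left out once.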
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
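Each test set is determined by selecting every s-th object in the set of data. The first test set starts at object 1, the second at object 2, and so on through object s. For example, the first test set contains objects 1, s+1, 2s+1, and so on.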
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
An alternative to Venetian Blinds. Each test set is a contiguous block of n/s objects in the set of data, starting at object 1. The first test set contains objects 1 through n/s, the second contains the next n/s objects, and so on.
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
The objects are randomly divided into s test sets of n/s objects each, such that no object is in more than one test set. This procedure is repeated r times, where r is the number of iterations.
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
You manually define each of the test sets. You can assign specific objects in your set of data in one of three ways:
 
{|
 
|-
<td style="width: 10.8pt">*</td>
 
|To be in every test set.
 
|}
 
{|
 
|-
<td style="width: 10.8pt">*</td>
 
|To never be in a test set.
 
|}
 
{|
 
|-
<td style="width: 10.8pt">*</td>
 
|To not be used in the cross-validation procedure at all.
 
|}
 
</td>
 
<tr>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
Available Parameters
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
Maximum Number of LVs
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
 
{|
 
|-
<td style="width: 10.8pt">*</td>
 
|Maximum Number of LVs
 
|}
 
{|
 
|-
<td style="width: 10.8pt">*</td>
 
|Number of Data Splits
 
|}
 
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
 
{|
 
|-
<td style="width: 10.8pt">*</td>
 
|Maximum Number of LVs
 
|}
 
{|
 
|-
<td style="width: 10.8pt">*</td>
 
|Number of Data Splits
 
|}
 
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
 
{|
 
|-
<td style="width: 10.8pt">*</td>
 
|Maximum Number of LVs
 
|}
 
{|
 
|-
<td style="width: 10.8pt">*</td>
 
|Number of Data Splits
 
|}
 
{|
 
|-
<td style="width: 10.8pt">*</td>
 
|Number of Iterations
 
|}
 
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
 
{|
 
|-
<td style="width: 10.8pt">*</td>
 
|Number of Data Splits

|}

{|

|-
<td style="width: 10.8pt">*</td>

|Object Membership for Each Split

|}

{|

|-
<td style="width: 10.8pt">*</td>

|Total Number of Objects
 
|}
 
</td>
 
<tr>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
<nowiki>#</nowiki> of Subvalidation Steps
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
n
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
s
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
s
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
(s*r)
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
s
</td>
 
<tr>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
<nowiki>#</nowiki> of Test Samples per Subvalidation
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
1
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
n/s
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
n/s
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
n/s
</td>
<td style="border-bottom-color: #000000; border-bottom-style: solid; border-bottom-width: 1px; border-left-color: #000000; border-left-style: solid; border-left-width: 1px; border-right-color: #000000; border-right-style: solid; border-right-width: 1px; border-top-color: #000000; border-top-style: solid; border-top-width: 1px; padding-bottom: 1pt; padding-left: 6pt; padding-right: 6pt; padding-top: 6pt; vertical-align: top;">
Varies. User-defined.
</td>
 
</table>
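The test-set selection rules in the table can be sketched in Python. This is an illustration under assumptions, not Solo's code:

* The function names are hypothetical.
* Objects are numbered from 0 rather than 1.
* When n is not a multiple of s, leftover objects go to the last contiguous block.
* The Custom encoding is a convention invented for this example: a positive k means test set k, 0 means never in a test set, -1 means in every test set, and -2 means not used.

```python
import random


def leave_one_out(n):
    """n subvalidation steps, each testing a single object."""
    return [[i] for i in range(n)]


def venetian_blinds(n, s):
    """Test set k contains every s-th object, starting at object k."""
    return [list(range(k, n, s)) for k in range(s)]


def contiguous_blocks(n, s):
    """Test set k is the k-th contiguous block of about n/s objects."""
    bounds = [k * n // s for k in range(s + 1)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(s)]


def random_subsets(n, s, r, seed=None):
    """r iterations of s disjoint random test sets (s*r steps in total)."""
    rng = random.Random(seed)
    test_sets = []
    for _ in range(r):
        order = list(range(n))
        rng.shuffle(order)
        # Dealing the shuffled objects round-robin keeps the s sets disjoint
        # within one iteration; a new shuffle is drawn for each iteration.
        test_sets.extend(sorted(order[k::s]) for k in range(s))
    return test_sets


def custom_splits(codes):
    """Build (test, model-building) index pairs from per-object codes.

    Hypothetical encoding: k >= 1 -> in test set k; 0 -> never in a test set;
    -1 -> in every test set; -2 -> not used in cross-validation at all.
    """
    always_test = [i for i, c in enumerate(codes) if c == -1]
    splits = []
    for k in range(1, max(codes, default=0) + 1):
        test = sorted([i for i, c in enumerate(codes) if c == k] + always_test)
        build = [i for i, c in enumerate(codes) if c == 0 or (c >= 1 and c != k)]
        splits.append((test, build))
    return splits
```

For example, with n = 10 and s = 3, `venetian_blinds` tests objects [0, 3, 6, 9], [1, 4, 7], and [2, 5, 8]. `contiguous_blocks` tests [0, 1, 2], [3, 4, 5], and [6, 7, 8, 9].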
 
{|
 
|-
<td style="width: 18pt">4.</td>
 
|Do one of the following:
 
|}
 
{| style="margin-left:18pt" 
 
|-
<td style="width: 18pt">*</td>
 
|Click Apply to apply these settings and keep the Cross-Validation dialog box open.
 
|}
 
{| style="margin-left:18pt" 
 
|-
<td style="width: 18pt">*</td>
 
|Click OK to apply these settings and close the Cross-Validation dialog box.
 
|}

Revision as of 12:22, 29 July 2010
