Tools: Permutation Test: Difference between revisions

Revision as of 13:53, 7 October 2011

Permutation Test Tool

Some regression and preprocessing methods are so exceptionally good at finding correlation between the measured data (X- and Y-blocks) that the model becomes too specific and will only apply to that exact data. Such over-fit models are often useless for predictive applications or even exploratory interpretation purposes. In many cases, careful use of cross-validation and/or separate validation data will help identify when this has happened. Permutation tests are another way to help identify an overfit model as well as provide a probability that the given model is significantly different from one built under the same conditions but on random data. If the modeling conditions are over-fitting, they will often provide a fit to random data which is better than would be expected. Permutation tests use this condition to test for over-fitting.

Permutation tests involve repeatedly and randomly reordering the y-block and rebuilding the model with the current modeling settings after each reordering. For a regression problem, this means each sample is assigned a nominally "incorrect" y-value (although the distribution of y-values is maintained, because every sample's y-value is simply re-assigned to a different sample). In the case of classification models, reordering the y-block is equivalent to shuffling the class assignments, so that samples are assigned to the "wrong" classes.
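The reordering described above can be sketched in a few lines. This is only an illustration: the helper name `permute_y`, the example values, and the fixed seed are assumptions made for the sketch, not part of the tool.

```python
import random

def permute_y(y, rng):
    """Return a reordered copy of y; every value is kept, only its sample changes."""
    shuffled = list(y)
    rng.shuffle(shuffled)
    return shuffled

rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# Regression: numeric y-values are re-assigned to different samples.
y_reg = [1.2, 3.4, 2.2, 5.0, 4.1, 0.7]
y_reg_perm = permute_y(y_reg, rng)

# Classification: class labels are shuffled across samples.
y_cls = ["A", "A", "B", "B", "C", "C"]
y_cls_perm = permute_y(y_cls, rng)

# The distribution of values is unchanged by the reordering.
print(sorted(y_reg_perm) == sorted(y_reg))
print(sorted(y_cls_perm) == sorted(y_cls))
```

Because only the order changes, any fit the model finds to the permuted y-block has to come from chance correlation rather than from a real relationship.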

Such permutation examines the extent to which the modeling conditions might be finding "chance correlation" between the x-block and the y-block or over-filtering the data. After each permutation of the y-block, the predictions for each sample from cross-validation and self-prediction are recorded, along with the RMSEC and RMSECV (see Using Cross-Validation). The shuffling is repeated multiple times and several statistics are calculated for each permutation. The result is reported in two forms:

A table of "Probability of Model Insignificance"
A plot of Sum Squared Y (SSQ Y) captured versus Y-block correlation
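The overall loop (permute, rebuild, record RMSEC and RMSECV) might look like the sketch below. It is not the tool's implementation: a one-variable least-squares line stands in for the real model, leave-one-out is assumed as the cross-validation scheme, and all names and data are hypothetical.

```python
import math
import random

def fit_line(x, y):
    """Least-squares slope and intercept for one x-variable (stand-in model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def rmse(residuals):
    """Root mean squared error of a list of residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def rmsec_rmsecv(x, y):
    """Self-prediction error (RMSEC) and leave-one-out error (RMSECV)."""
    slope, intercept = fit_line(x, y)
    res_c = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    res_cv = []
    for i in range(len(x)):
        # Rebuild the model without sample i, then predict sample i.
        s, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        res_cv.append(y[i] - (s * x[i] + b))
    return rmse(res_c), rmse(res_cv)

# Hypothetical data: y depends linearly on x plus a little noise.
x = [float(i) for i in range(1, 13)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1, 0.2, -0.2]
y = [2.0 * xi + e for xi, e in zip(x, noise)]

original = rmsec_rmsecv(x, y)  # un-permuted model

rng = random.Random(1)  # fixed seed only for repeatability
n_iterations = 50
permuted = []
for _ in range(n_iterations):
    y_perm = y[:]
    rng.shuffle(y_perm)
    permuted.append(rmsec_rmsecv(x, y_perm))

mean_rmsec = sum(p[0] for p in permuted) / n_iterations
mean_rmsecv = sum(p[1] for p in permuted) / n_iterations
print("un-permuted RMSEC, RMSECV:", original)
print("mean permuted RMSEC, RMSECV:", (mean_rmsec, mean_rmsecv))
```

With a model that is not over-fit, the un-permuted errors should be clearly smaller than the permuted ones; if they are comparable, the modeling conditions are probably fitting chance correlation.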

When the tests are performed, the user is prompted for the number of iterations to use. The statistics calculated for the table results are designed to operate with very few iterations (as few as one, in fact), but additional iterations help to confirm the results. Iterations are more critical for the SSQ Y plot. If the plot is not of interest, the number of iterations can be greatly reduced (down to 5 or 10, for example). Otherwise, 50, 100, 200 or more iterations should be used.

Probability Table

The probability table shows the probabilities (calculated using several different methods) that the predictions for the original, unperturbed model could have come from random chance. Put another way: these are the probabilities that the original model is not significantly different from one created from randomly shuffling the y-block. Results from three tests are shown:

Wilcoxon - Pairwise Wilcoxon signed rank test
Sign Test - Pairwise sign test
Rand t-test - Randomization t-test

Each result is reported as the probability that the two models are not distinguishable. Thus a value of 0.05 indicates that the models are indistinguishable only at the 5% limit, which is equivalent to saying they are significantly different at the 95% confidence level. Values above 0.05 mean the models cannot be distinguished at that level.

The tests utilize the prediction residuals:

$$\text{residuals} = y - \hat{y}$$

where $y$ is the y-values the model was built with (permuted or un-permuted) and $\hat{y}$ is the model-estimated values for y. The tests compare the residuals obtained with the un-permuted y-block to those obtained with the permuted y-block.

In most publications, the tests are performed on the self-prediction residuals (those obtained when the model is built from all data and applied to the same data), but the permutation tests used here also include results for the cross-validated residuals (obtained when the model is built from a subset of data and applied to the left-out data). The sensitivity of these two sources of residuals to over-fitting depends on the noise sources and modeling conditions. The cross-validated residuals may be slightly more sensitive in detecting over-fitting because both permuted and un-permuted residuals will begin to grow significantly as the data are over-fit.
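To make the residual comparison concrete, the sketch below applies two generic paired tests to hypothetical residuals: a one-sided sign test on absolute residuals, and a sign-flipping randomization test on differences of squared residuals, in the spirit of the randomization test cited in the references. The function names, the one-sided formulation, the trial count, and the residual values are all assumptions; the tool's own calculations may differ.

```python
import math
import random

def sign_test_p(res_a, res_b):
    """One-sided sign test: chance that A beats B at least this often
    if the two sets of residuals were really equivalent."""
    wins = sum(abs(a) < abs(b) for a, b in zip(res_a, res_b))
    losses = sum(abs(a) > abs(b) for a, b in zip(res_a, res_b))
    n = wins + losses  # tied pairs are dropped
    if n == 0:
        return 1.0
    return sum(math.comb(n, k) for k in range(wins, n + 1)) / 2 ** n

def randomization_test_p(res_a, res_b, n_trials=1999, seed=0):
    """One-sided randomization test on differences of squared residuals."""
    d = [a * a - b * b for a, b in zip(res_a, res_b)]
    observed = sum(d) / len(d)
    rng = random.Random(seed)
    count = 0
    for _ in range(n_trials):
        # Randomly flip the sign of each paired difference.
        trial = sum(di if rng.random() < 0.5 else -di for di in d) / len(d)
        if trial <= observed:
            count += 1
    return (count + 1) / (n_trials + 1)

# Hypothetical residuals for the same ten samples.
res_unpermuted = [0.1, -0.2, 0.05, 0.3, -0.15, 0.2, -0.1, 0.25, -0.05, 0.1]
res_permuted = [2.1, -3.0, 1.5, -2.2, 0.9, 3.3, -1.8, 2.6, -0.7, 1.2]

p_sign = sign_test_p(res_unpermuted, res_permuted)
p_rand = randomization_test_p(res_unpermuted, res_permuted)
print("sign test:", p_sign)
print("randomization test:", p_rand)
```

Small probabilities here mean the un-permuted residuals are consistently smaller than the permuted ones, which is the situation expected for a model that is not simply fitting chance correlation.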

The example below shows a case in which the original model is very unlikely to be the result of random chance.

Probability of Model Insignificance vs. Permuted Samples
For model with 1 component(s)

Y-column:  1
                     Wilcoxon     Sign Test     Rand t-test
Self-Pred (RMSEC) :   0.000        0.000          0.005
Cross-Val (RMSECV):   0.000        0.000          0.005

Compare this to the result obtained when the number of samples is decreased to 1/3 and the number of latent variables is raised to 2. All three tests on self-predictions now indicate that the un-permuted model is probably not significantly different (at the 95% confidence level) from models created with randomly permuted samples. Both the Sign and Randomization t-tests indicate that the permuted and un-permuted models are similar for cross-validated residuals.

Probability of Model Insignificance vs. Permuted Samples
For model with 2 component(s)

Y-column:  1
                     Wilcoxon     Sign Test     Rand t-test
Self-Pred (RMSEC) :   0.085        0.186          0.076
Cross-Val (RMSECV):   0.021        0.060          0.099
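As a small worked reading of this second table, the sketch below flags every test whose probability exceeds the 5% limit. The dictionary layout and the function name are assumptions made for illustration; only the numbers come from the table.

```python
ALPHA = 0.05  # 5% limit, i.e. the 95% confidence level

# Values copied from the 2-component table above.
table = {
    "Self-Pred (RMSEC)": {"Wilcoxon": 0.085, "Sign Test": 0.186, "Rand t-test": 0.076},
    "Cross-Val (RMSECV)": {"Wilcoxon": 0.021, "Sign Test": 0.060, "Rand t-test": 0.099},
}

def indistinguishable(table, alpha=ALPHA):
    """For each row, list the tests whose probability exceeds alpha."""
    return {
        row: [name for name, p in tests.items() if p > alpha]
        for row, tests in table.items()
    }

flags = indistinguishable(table)
for row, names in flags.items():
    print(row, "->", names)
```

All three self-prediction tests are flagged, while for the cross-validated residuals only the Wilcoxon test (0.021) still separates the un-permuted model from the permuted ones.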

For more details on the statistics shown in these tests, see the following publications:

Edward V. Thomas, "Non-parametric statistical methods for multivariate calibration model selection and comparison", J. Chemometrics 2003; 17: 653–659.
Hilko van der Voet "Comparing the predictive accuracy of models using a simple randomization test", Chemometrics and Intelligent Laboratory Systems 25 (1994) 313-323.

SSQ Y Plot

The SSQ Y plot shows the fraction of the y-block captured by self-prediction (calibration) and by cross-validation, plotted against the correlation of the y-block used to the original y-block. For a non-permuted y-block the correlation is one (1). For any permuted y-block the correlation should be significantly less.
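The horizontal axis of the plot can be illustrated with a plain Pearson correlation between the original and the permuted y-block. The choice of Pearson correlation, the helper name, and the data are assumptions for this sketch.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

y = [1.2, 3.4, 2.2, 5.0, 4.1, 0.7, 2.9, 3.8, 1.5, 4.6]  # hypothetical y-block

rng = random.Random(2)  # fixed seed only for repeatability
correlations = []
for _ in range(5):
    y_perm = y[:]
    rng.shuffle(y_perm)
    correlations.append(pearson(y, y_perm))

print("un-permuted:", pearson(y, y))
print("permuted:", correlations)
```

The un-permuted y-block sits at a correlation of one, and the permuted y-blocks scatter at lower values, which spreads the permuted models along the horizontal axis.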

For each permuted y-block, the root mean squared error of calibration and cross-validation (RMSEC and RMSECV, respectively) are calculated and stored. From these values, the fractional sum squared Y captured (SSQ Y) for the calibration (self-predictions) can be calculated from:

   SSQ_Y,C = 1-(RMSEC/SSQ_Y,Total)

where SSQ_Y,Total is the total sum squared Y response, and for cross-validated predictions from:

   SSQ_Y,CV = 1-(RMSECV/SSQ_Y,Total)

The SSQ_Y,C is expected to increase up to a value of "1" when the model is capturing all the y-block response. The SSQ_Y,CV is expected to be about the same as the SSQ_Y,C as long as the model is not overfit.

Thus, when examining SSQ_Y,C and SSQ_Y,CV, the values should be similar for a given model. However, both SSQ_Y values should be higher for the model built on the non-permuted y-block than for models built on permuted data, because the permuted models are expected to perform worse.
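A fraction-captured value of this kind can be sketched as follows. The formulas above are written compactly, so this sketch makes two assumptions that should be checked against the tool: the RMSE is converted back to a residual sum of squares (n times RMSE squared), and SSQ_Y,Total is taken on the mean-centered y-block. The function name and data are hypothetical.

```python
import math

def fraction_captured(y, y_hat):
    """Fraction of the mean-centered y sum of squares captured by y_hat
    (assumed scaling: residual sum of squares = n * RMSE**2)."""
    n = len(y)
    mean_y = sum(y) / n
    ssq_total = sum((yi - mean_y) ** 2 for yi in y)
    rmse = math.sqrt(sum((yi - hi) ** 2 for yi, hi in zip(y, y_hat)) / n)
    return 1.0 - (n * rmse ** 2) / ssq_total

y = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical y-block

perfect = fraction_captured(y, y)            # predictions equal to y
mean_only = fraction_captured(y, [3.0] * 5)  # predictions equal to the mean of y
print("perfect predictions:", perfect)
print("mean-only predictions:", mean_only)
```

Under these assumptions a model that reproduces y exactly captures all of the y-block, while one that only predicts the mean captures essentially none of it. Applying the same function to self-predictions and to cross-validated predictions gives values analogous to SSQ_Y,C and SSQ_Y,CV.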

