Applying a Model Quick Start: Difference between revisions

Revision as of 15:57, 13 March 2009

Congratulations! You have collected calibration data and built a model that meets your objectives. Now you want to subject it to one of the most stringent tests: applying it to new data. If you have just completed the model-building process, all you need to do is load some new data as validation data. Another scenario is that you have a model that was built a while ago and you wish to apply it to some new data.

In this example, there are three variables in the workspace:

mymodel - a PLS model that has been built on spectral data to predict a concentration
spec2 - a new set of spectral data to be used to validate the model
conc - concentration data for the validation spectra

The concentration data contains values for five separate components. The model mymodel predicts only one of these concentration values. First, click on the icon for mymodel in the Workspace browser and drag it into the Analysis GUI (or double-click it to open a new Analysis GUI). You will see the SSQ table populated with values, indicating that the model has been loaded. If the model cache was active while the model was being built and is still active, the calibration data will also be loaded. You can confirm this by noting that the X and Y buttons appear depressed; hovering the mouse cursor over either button reveals information on the corresponding data block.

Now that the model has been loaded:

click and drag the icon for spec2 in the Workspace browser into the Analysis GUI; when asked how you want this data loaded, choose "Validation X"
click and drag the icon for conc in the Workspace browser into the Analysis GUI; when asked how you want this data loaded, choose "Validation Y"
click on the "Apply Model" button under the Analysis Flowchart
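
What these steps amount to can be sketched in code. The following is a minimal Python illustration, not PLS_Toolbox code. It assumes the model can be summarized by a regression vector plus the calibration means used for mean-centering; the names `apply_model`, `select_component`, `x_mean`, `reg_vector`, and `y_mean` are hypothetical, and the numbers are made up. It also shows picking out the single concentration column that the model predicts from the five columns in conc.

```python
from typing import List, Sequence


def apply_model(
    x_new: Sequence[Sequence[float]],
    x_mean: Sequence[float],
    reg_vector: Sequence[float],
    y_mean: float,
) -> List[float]:
    """Predict one value per row of x_new with a mean-centered linear model.

    Assumption: the calibration model is summarized by a regression vector
    and the calibration means; real preprocessing may involve more steps.
    """
    predictions = []
    for row in x_new:
        # Center the new spectrum with the calibration mean, not its own mean.
        centered = [value - mean for value, mean in zip(row, x_mean)]
        predictions.append(y_mean + sum(c * b for c, b in zip(centered, reg_vector)))
    return predictions


def select_component(conc: Sequence[Sequence[float]], column: int) -> List[float]:
    """Pull out the one concentration column that the model predicts."""
    return [row[column] for row in conc]


# Toy stand-ins for spec2 and conc (made-up numbers, two samples).
spec2 = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]
conc = [[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]]

y_predicted = apply_model(
    spec2, x_mean=[1.0, 1.0, 1.0], reg_vector=[0.5, 0.0, -0.5], y_mean=10.0
)
y_measured = select_component(conc, column=2)
```

The key point of the sketch is that new data are preprocessed with values stored from calibration, so the validation set never influences the model itself.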

When you click on the "Review Scores" button, a multiplot figure will open. Try double-clicking on each subplot to create separate figures. One useful plot is Q residuals versus Hotelling's T², with both the validation and calibration data visible. In the Plot Controls window, select "Hotelling T^2" for the x-axis, and "Q Residuals" for the y-axis. Make sure that the "Show Cal Data with Test" box is checked toward the bottom of the Plot Controls window. Finally, select "View" under the Plot Controls menu, then "Classes", followed by "Cal/Test Samples"; this will apply color/symbol coding for the two classes of samples. It is sometimes useful to use log scales for Q residuals and/or T²; in this example, a log scale is used for the Q residuals. For the second plot, "Y Predicted" is selected for the y-axis along with "Y Measured" for the x-axis. As in the Q residuals vs. T² plot, the black circles represent the calibration samples and the red triangles denote the validation samples.
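
The two statistics on the first plot can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions, not the toolbox's implementation: the loadings are taken to be orthonormal (as in PCA), the data are mean-centered only, and the function and argument names are hypothetical. Q is the sum of squared residuals left after projecting a sample onto the model, and T² is the sum of squared scores, each divided by the variance of that factor's calibration scores.

```python
from typing import List, Sequence, Tuple


def q_and_t2(
    x: Sequence[float],
    x_mean: Sequence[float],
    loadings: Sequence[Sequence[float]],
    score_variances: Sequence[float],
) -> Tuple[float, float]:
    """Return (Q, T2) for a single sample.

    Assumptions: `loadings` holds one orthonormal loading vector per factor,
    and `score_variances` holds the variance of the calibration scores for
    each factor.
    """
    centered = [value - mean for value, mean in zip(x, x_mean)]

    # Scores: projection of the centered sample onto each loading vector.
    scores: List[float] = [
        sum(c * p for c, p in zip(centered, loading)) for loading in loadings
    ]

    # Reconstruct the sample from its scores and take what is left over.
    reconstructed = [
        sum(t * loading[j] for t, loading in zip(scores, loadings))
        for j in range(len(centered))
    ]
    residual = [c - r for c, r in zip(centered, reconstructed)]

    q = sum(e * e for e in residual)
    t2 = sum(t * t / var for t, var in zip(scores, score_variances))
    return q, t2


# One-factor toy model in three variables (made-up numbers).
q, t2 = q_and_t2(
    x=[2.0, 3.0, 0.0],
    x_mean=[0.0, 0.0, 0.0],
    loadings=[[1.0, 0.0, 0.0]],
    score_variances=[4.0],
)
```

A sample with a large Q contains variation the model does not describe, while a large T² marks a sample that is unusual within the model space; plotting one against the other separates the two kinds of outlier.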

We note from these two plots that:

the validation samples are markedly different from the calibration samples - the Q residuals of the validation set differ from those of the calibration set by at least two orders of magnitude, and several of the validation samples have T² values above the 95% confidence limit
the predictions for the validation samples are biased to lower values, although the correlation as measured by R² is still high
a suggested next step is to determine which factors contribute to the high values of Q residuals and T²; these are readily obtained by using the Q con and T con buttons in the Plot Controls window
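
The second observation, predictions biased low while R² stays high, can also be checked numerically. The sketch below is a plain Python illustration with made-up numbers; the function name `prediction_summary` is hypothetical, and R² is taken here to be the squared Pearson correlation between measured and predicted values, which is an assumption about how the plotted R² is defined.

```python
import math
from typing import Dict, Sequence


def prediction_summary(
    y_measured: Sequence[float], y_predicted: Sequence[float]
) -> Dict[str, float]:
    """Summarize validation predictions with bias, RMSEP, and R^2."""
    n = len(y_measured)
    errors = [p - m for p, m in zip(y_predicted, y_measured)]

    bias = sum(errors) / n  # negative when predictions run low
    rmsep = math.sqrt(sum(e * e for e in errors) / n)

    mean_m = sum(y_measured) / n
    mean_p = sum(y_predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(y_measured, y_predicted))
    var_m = sum((m - mean_m) ** 2 for m in y_measured)
    var_p = sum((p - mean_p) ** 2 for p in y_predicted)
    r2 = cov * cov / (var_m * var_p)  # squared Pearson correlation

    return {"bias": bias, "rmsep": rmsep, "r2": r2}


# Every prediction is 0.5 too low, yet the points still fall on a straight line.
summary = prediction_summary(
    y_measured=[1.0, 2.0, 3.0, 4.0],
    y_predicted=[0.5, 1.5, 2.5, 3.5],
)
```

Because correlation ignores a constant offset, a high R² alone does not show that a model transfers well to new data; the bias and RMSEP should be inspected alongside it.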

@@ Line 1: / Line 1: @@
+{| border="1" cellpadding="5" cellspacing="0" align="left"
+|-
+|width="40%" valign="top" |
 Congratulations!  You have collected calibration data and gone through the exercise of building a model that meets your objectives.  Now, you want to exert one of the most stringent tests - applying your model to new data.  If you have just completed the model building process, all that needs to be done is to load some new data as validation data.  Another scenario is that you have a model that has been built awhile ago, and you wish to apply it to some new data.
@@ Line 9: / Line 12: @@
 The concentration data contains values for five separate components.  The model ''mymodel'' predicts only one of these concentration values.  First, click on the icon for ''mymodel'' in the Workspace browser and drag it into the Analysis GUI (or double-click it to open a new Analysis GUI).  You will see the SSQ table populated with values, indicating that the model has been loaded.  If the model cache was activated during the course of building the model and remains so, the calibration data will also be loaded.  You can see this by noting that the '''X''' and '''Y''' buttons appear depressed, and when you pass the mouse cursor over either information on the respective data blocks is revealed.
-[[Image:apply_model.013.png]]
+|[[Image:apply_model.013.png| |500px]]
+|-
+| valign="top"|
 Now that the model has been loaded
@@ Line 18: / Line 22: @@
-[[Image:apply_model.014.png]]
+|[[Image:apply_model.014.png| |500px]]
+|-
+| valign=500px|
 When you click on the "Review Scores" button, a multiplot figure will open.  Try double-clicking on each subplot to create separate figures.  One useful plot is Q residuals versus Hotelling's T<sup>2</sup>, with both the validation and calibration data visible.  In the Plot Controls window, select "Hotelling T^2" for the x-axis, and "Q Residuals" for the y-axis.  Make sure that the "Show Cal Data with Test" box is checked toward the bottom of the Plot Controls window.  Finally, select "View" under the Plot Controls menu, then "Classes", followed by "Cal/Test Samples"; this will apply color/symbol coding for the two classes of samples.  It is sometimes useful to use log scales for Q residuals and/or T<sup>2</sup>; in this example, a log scale is used for the Q residuals.  For the second plot, "Y Predicted" is selected for the y-axis along with "Y Measured" for the x-axis.  As in the Q residuals vs. T<sup>2</sup> plot, the black circles represent the calibration samples and the red triangles denote the validation samples.
-[[Image:apply_model.015.png]]
 We note from these two plots that:
@@ Line 29: / Line 32: @@
 * the predictions for the validation samples are biased to lower values, although the correlation as measured by R<sup>2</sup> is still high
 * a suggested step would be to determine what are the factors that contribute to the high values of Q residuals and T<sup>2</sup>; these are readily obtained by using the '''Q con''' and '''T con''' buttons on the Plot Controls window
+|[[Image:apply_model.015.png| |500px]]
+|}
