Scores and Sample Statistics
Latest revision as of 12:15, 6 December 2011
Decomposition Methods
Decomposition methods generally use only the X-block. The following statistics are available:
Property | Description |
---|---|
Scores on PC/Comp/LV | Scores give the amount that each PC or component ("Latent Variable" or LV, generically) contributes to each sample. In models like Purity, MCR, and PARAFAC, this is theoretically proportional to chemical concentration or another quantitative property (depending on the physics of the measurements being analyzed). This is the T term in the equation X = TP + E. |
Q Residuals | The sum of squared residuals (aka Q Residuals) is a scalar value for each sample that describes how much of the signal in that sample is left unexplained by the model. The higher this value, the more likely the sample contains some other systematic response that the model failed to capture. Q is the sum across variables of the squared E term from the equation X = TP + E: $Q_i = \sum_{j=1}^{n} e_{i,j}^2$, where i is the index for samples, j is the index for variables, n is the number of variables, and $e_{i,j}$ is the [i,j] element of the E matrix. |
Hotelling T^2 | Hotelling T-squared is a scalar value for each sample that describes the sum of squared scores, corrected for the variance captured by each component (PC, LV, etc.). It gives the distance to the multivariate center of the model. The larger this value, the further the sample is from the center and, if the sample is part of the calibration set, the more influence it had on the model's fit. Hotelling T-squared can be considered the counterpart to Q Residuals. Taken together, these two statistics give how much of a sample's variation the model captured (T^2) and how much was left over (Q). |
KNN Score Distance (k=3) | Gives the average distance to the k nearest neighbors (in most cases, k=3) in score space for each sample. This value indicates how well the given region of the score space was sampled in the original model. If a sample is fairly unique, it will be alone in a region of the scores plot and its KNN Score Distance will be high. A high KNN Score Distance for a test or prediction sample may indicate that the sample is not sufficiently like the calibration set to trust the predictions, particularly with mildly or highly non-linear responses. For more information, see the description in knnscoredistance. |
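As a rough illustration of how Q Residuals and Hotelling T^2 relate to the terms in X = TP + E, the following is a minimal NumPy sketch for a mean-centered PCA model. It is not the software's own implementation: the function name `pca_sample_statistics`, the use of SVD, and the choice to scale T^2 by the calibration variance of each score column are all assumptions made for the example.

```python
import numpy as np

def pca_sample_statistics(X, n_components):
    """Sketch: scores, Q residuals, and Hotelling T^2 for a PCA model of X.

    Assumes mean-centering only and X = T P + E with P stored as
    (n_components x n_variables). Hypothetical helper, for illustration.
    """
    Xc = X - X.mean(axis=0)                        # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components]                          # loadings
    T = Xc @ P.T                                   # scores (the T term)
    E = Xc - T @ P                                 # residuals (the E term)

    # Q_i: sum over variables of squared residuals for sample i
    Q = np.sum(E**2, axis=1)

    # Variance captured by each component (eigenvalues of the covariance matrix)
    eigvals = s[:n_components]**2 / (X.shape[0] - 1)

    # T^2_i: sum of squared scores, each scaled by its component's variance
    T2 = np.sum(T**2 / eigvals, axis=1)
    return T, Q, T2

# Example usage on random data (values are arbitrary, not real measurements)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 5))
T_demo, Q_demo, T2_demo = pca_sample_statistics(X_demo, n_components=2)
```

In this sketch, a sample with a large `Q_demo` value has signal the two retained components do not describe, while a large `T2_demo` value marks a sample far from the model center within the retained components.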
Property | PCA | Purity | MCR | PARAFAC |
---|---|---|---|---|
Scores on PC / Comp | X | X | X | X |
Q Residuals | X | X | X | X |
Hotelling T^2 | X | | | X |
KNN Score Distance (k=3) | X | X | X | X |
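The KNN Score Distance described in the first table can be sketched in a similar way. The version below assumes plain Euclidean distance on unscaled scores and excludes each calibration sample from its own neighbor list. These are assumptions for illustration; the actual knnscoredistance function may scale or weight the scores differently. The name `knn_score_distance` is hypothetical.

```python
import numpy as np

def knn_score_distance(T_cal, T_new=None, k=3):
    """Sketch: mean Euclidean distance to the k nearest calibration samples
    in score space. If T_new is None, the calibration samples themselves
    are evaluated and each sample is excluded from its own neighbors.
    """
    cal_only = T_new is None
    T_query = T_cal if cal_only else T_new

    # Pairwise distances: rows are query samples, columns are calibration samples
    diff = T_query[:, None, :] - T_cal[None, :, :]
    D = np.sqrt(np.sum(diff**2, axis=2))

    if cal_only:
        np.fill_diagonal(D, np.inf)   # a sample is not its own neighbor

    D.sort(axis=1)                    # nearest neighbors first
    return D[:, :k].mean(axis=1)

# Example usage: four calibration samples on a unit square in score space
T_cal_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
cal_dist = knn_score_distance(T_cal_demo, k=3)
new_dist = knn_score_distance(T_cal_demo, np.array([[10.0, 10.0]]), k=3)
```

Here the new sample at (10, 10) sits far from every calibration sample, so `new_dist` is much larger than any entry of `cal_dist`, which is the situation in which predictions for that sample would deserve less trust.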