T-Squared Q residuals and Contributions

From Eigenvector Research Documentation Wiki
imported>Jeremy
Revision as of 13:38, 28 June 2012

Hotelling's <math>T^2</math> (T-Squared), Q residuals, and their corresponding contributions are summary statistics that describe how well a model fits the data in a given sample and why that sample has its observed scores for that model. These statistics are generally available only for factor-based models such as Principal Components Analysis, Partial Least Squares, Principal Components Regression, or Multivariate Curve Resolution, although some non-factor-based models (e.g. Multiple Linear Regression) provide similar statistics.

Given a model which describes data generically using loadings (P) and scores (T), the following equation describes the relationship between the data and the model:

::<math>X = TP^T + E</math>

where X is a matrix of m rows (samples) and n columns (variables), and E is the matrix of residuals not captured by the model. Based on this equation, the Q residuals, Hotelling's <math>T^2</math>, and the contributions for each can be calculated. This page describes the theory behind these statistics and how to calculate and use them in the software.
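As a minimal sketch of the decomposition X = TP^T + E, the following Python/NumPy example builds a PCA model by singular value decomposition. The function name `pca_decompose`, the random data, and the mean-centering step are assumptions made for illustration; they are not taken from the software described on this page.

```python
import numpy as np

def pca_decompose(X, k):
    """Split X into scores T, loadings P (k columns), and residuals E."""
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:k].T        # n x k loadings with orthonormal columns
    T = X @ P           # m x k scores
    E = X - T @ P.T     # m x n residuals left over by the k-factor model
    return T, P, E

# Hypothetical data: 20 samples, 5 variables, mean-centered (an assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X = X - X.mean(axis=0)

T, P, E = pca_decompose(X, k=2)
print(np.allclose(X, T @ P.T + E))  # checks that X = T P^T + E holds
```

Other factor-based models produce different T and P, but the residual matrix E is obtained the same way once the model part has been subtracted from X.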

Q Residuals

Q residuals are a lack-of-fit statistic calculated as the sum of squares of each row (sample) of E; i.e., for the i-th sample in <math>X</math>, <math>x_i</math>:

::<math>Q_i = e_ie_i^T = x_i(I - P_kP_k^T)x_i^T</math>

where <math>e_i</math> is the i-th row of <math>E</math>, <math>P_k</math> is the matrix of the k loadings vectors retained in the model (where each vector is a column of <math>P_k</math>), and I is the identity matrix of appropriate size (n by n). The Q statistic indicates how well each sample conforms to the model. It is a measure of the difference, or residual, between a sample and its projection into the k factors retained in the model.
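The two forms of the Q statistic, the row sum of squares of E and the quadratic form x_i(I - P_kP_k^T)x_i^T, agree when the columns of P_k are orthonormal, because I - P_kP_k^T is then a symmetric, idempotent projector. A hedged NumPy sketch, assuming PCA loadings obtained by SVD and hypothetical random data (the name `q_residuals` is illustrative only):

```python
import numpy as np

def q_residuals(X, P):
    """Q for every sample: sum of squared residuals in each row of E."""
    E = X - X @ P @ P.T          # residual of each sample after projection
    return np.sum(E ** 2, axis=1)

# Hypothetical mean-centered data and a 2-factor PCA model (assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:2].T                     # n x k loadings, orthonormal columns

Q = q_residuals(X, P)

# Same quantity for the first sample via the quadratic form.
n = X.shape[1]
Q0 = X[0] @ (np.eye(n) - P @ P.T) @ X[0]
print(np.isclose(Q[0], Q0))
```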

Q Contributions

A single row of the m by n E matrix, <math>e_i</math>, represents the Q contributions for a given sample. The Q contributions show how much each variable contributes to the overall Q statistic for the sample while retaining the sign of the deviation in the given variable. Such contributions can be useful in identifying the variables which contribute most to a given sample's sum-squared residual error. They can often be used to determine whether the lack of fit is due to a systematic deviation of that sample or to simple random variation.
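To illustrate how the signed residuals point to the responsible variable, the sketch below fits a one-factor model to hypothetical data and then examines a new sample with an artificial offset in one variable. The data, the offset, and the name `q_contributions` are assumptions for illustration and do not reproduce the "Q con" button.

```python
import numpy as np

def q_contributions(x, P):
    """Signed residual per variable for a single sample x (length n)."""
    return x - (x @ P) @ P.T

# Hypothetical calibration data: one underlying factor plus small noise.
rng = np.random.default_rng(1)
p_true = np.ones(5) / np.sqrt(5)
X = np.outer(rng.normal(size=30), p_true) + 0.01 * rng.normal(size=(30, 5))
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:1].T                      # 5 x 1 loadings of the one-factor model

# New sample that follows the model except for an offset in variable 3.
x_new = 2.0 * p_true
x_new[3] += 1.0

contrib = q_contributions(x_new, P)
worst = int(np.argmax(np.abs(contrib)))   # variable with the largest |residual|
Q_new = float(np.sum(contrib ** 2))       # Q is the sum of squared contributions
print(worst, contrib[worst] > 0)
```

Because the sign is retained, the contribution for the offset variable is positive here, showing that the sample reads higher in that variable than the model expects.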

Q contributions are plotted while viewing a scores plot. Use the "Q con" button on the Plot Controls to request Q contributions. You will be prompted to select one or more samples for which you wish to see contributions. Select the samples and a new figure will be created showing the contributions.

Hints:
  • Left-click on a variable in the figure to view the raw data for that variable (column of X).
  • Right-clicking the figure will make the figure "interactive" allowing you to view different subsets of samples when more than one has been selected.

Relative Q Contributions

Another form of contributions is "relative" Q contributions. Normally, the Q contributions are calculated as the residuals relative to the model. In some cases, comparing the Q contributions of two or more samples relative to each other can be useful in determining whether the same measurement phenomenon is responsible for both samples' Q residuals. In this case, relative contributions can be used, in which one or more samples are selected as a "reference" set and the Q contributions of other samples are then calculated relative to that reference set.

Relative Q contributions can be viewed when viewing a Scores plot. On the Plot Controls, select the "Q con Ref." button and, when prompted, select the sample(s) you want to use as the reference. When complete, a new class set will be created showing you which sample(s) are being used for the relative contributions. Next, use the "Q con" button to select the samples for which you want to view contributions. The result will be a contributions plot with relative contributions calculated and noted on the Y-axis.
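This page does not state the formula used for relative contributions. The sketch below shows one plausible reading only, assuming that the mean Q contribution of the reference samples is subtracted from the Q contributions of each selected sample. Both that formula and the name `relative_q_contributions` are assumptions and may differ from what the software computes.

```python
import numpy as np

def relative_q_contributions(X_sel, X_ref, P):
    """ASSUMED formula: residuals of the selected samples minus the
    mean residual of the reference samples (not confirmed by the source)."""
    def residuals(A):
        A = np.atleast_2d(A)
        return A - A @ P @ P.T
    ref_mean = residuals(X_ref).mean(axis=0)   # average reference contribution
    return residuals(X_sel) - ref_mean

# Hypothetical mean-centered data and a 2-factor PCA model (assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:2].T

# Samples 0 and 1 act as the reference set; samples 5 to 7 are inspected.
rel = relative_q_contributions(X[5:8], X[:2], P)

# A sample compared against itself has zero relative contributions.
self_rel = relative_q_contributions(X[0], X[0], P)
print(rel.shape, np.allclose(self_rel, 0.0))
```

Under this assumption, two samples whose residuals come from the same phenomenon give relative contributions close to zero, while differing patterns remain visible.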