T-Squared Q residuals and Contributions: Difference between revisions

Revision as of 08:47, 4 August 2022

Introduction

Hotelling's T² (T-Squared), Q residuals, and their corresponding contributions, are summary statistics which help explain how well a model is describing a given sample, and why that sample has its observed scores in a given model.

The statistics are described here for factor-based models such as Principal Components Analysis, Partial Least Squares, and Principal Components Regression. Similar statistics can also be calculated for non-factor-based models (e.g. Multiple Linear Regression) and for non-orthogonal models such as classical least squares, but these are not shown here.

Given a model which describes data generically using loadings (P) and scores (T), the following equation describes the relationship between the data and the model:

X=TP^{T}+E

where X is a matrix of m rows (samples) and n columns (variables), and E is the matrix of residuals not captured by the model. Based on this equation, the Q residuals, Hotelling's T², and contributions for each can be calculated. This page describes the theory behind these statistics, how they are calculated, and how to obtain and use them in the software.
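As an illustration of the decomposition above, the following is a minimal NumPy sketch rather than the software's own implementation. It assumes synthetic random data, mean-centering as the only preprocessing, and loadings taken from a singular value decomposition; the variable names and sizes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 20, 5, 2                  # hypothetical sizes: samples, variables, factors
X = rng.normal(size=(m, n))         # synthetic data, not from any real measurement
X -= X.mean(axis=0)                 # mean-centering (assumed preprocessing)

# Right singular vectors give orthonormal PCA loadings.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:k].T                        # n x k loadings, one loading vector per column
T = X @ P                           # m x k scores
E = X - T @ P.T                     # m x n residuals

# The model equation X = T P^T + E holds by construction.
print(np.allclose(X, T @ P.T + E))
```

With orthonormal loadings the residuals are orthogonal to the retained factors, so `E @ P` is zero to within rounding error.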

The following discussion is generally written with respect to a Principal Components Analysis (PCA) model. When the statistics are calculated for other models, the calculations are somewhat different, but their interpretation is the same.

Q Residuals

Q Residuals are a lack-of-fit statistic calculated as the sum of squares of each row (sample) of E; i.e., for the i^th sample in $X$ , $x_{i}$ :

Q_{i}=e_{i}e_{i}^{T}=x_{i}(I-P_{k}P_{k}^{T})x_{i}^{T}

where $e_{i}$ is the i^th row of $E$ , $P_{k}$ is the matrix of the k loadings vectors retained in the model (where each vector is a column of $P_{k}$ ) and I is the identity matrix of appropriate size (n by n). The Q statistic indicates how well each sample conforms to the model. It is a measure of the difference, or residual, between a sample and its projection into the k factors retained in the model.

Note: For models in which the $P$ loadings are not orthogonal, the second part of the above equation cannot be used. Instead, an orthogonalization must be used.
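The two forms of the Q equation can be compared numerically. The sketch below assumes orthonormal PCA loadings obtained from an SVD of synthetic, mean-centered data; it is not the software's code, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 20, 5, 2                  # hypothetical sizes
X = rng.normal(size=(m, n))         # synthetic data
X -= X.mean(axis=0)                 # mean-centering (assumed preprocessing)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
P_k = Vt[:k].T                      # n x k orthonormal loadings

# Form 1: Q_i = e_i e_i^T, the sum of squares of each row of E.
E = X - X @ P_k @ P_k.T
Q_from_E = np.sum(E**2, axis=1)

# Form 2: Q_i = x_i (I - P_k P_k^T) x_i^T, valid only for orthonormal loadings.
M = np.eye(n) - P_k @ P_k.T
Q_from_X = np.einsum("ij,jk,ik->i", X, M, X)

print(np.allclose(Q_from_E, Q_from_X))
```

The first form is usually the more convenient one, since E is needed for the Q contributions anyway.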

Q Contributions

A single row of the m by n E matrix, $e_{i}$ , represents the Q contributions for a given sample. The Q contributions show how much each variable contributes to the overall Q statistic for the sample while retaining the sign of the deviation in the given variable. Such contributions can be useful in identifying the variables which contribute most to a given sample's sum-squared residual error. Such residuals can often be used to determine whether the lack-of-fit is due to systematic deviation of that sample or simple random variation.
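The relationship between a row of E and the Q value for that sample can be sketched as follows. This is an illustrative NumPy example on synthetic data with SVD-based loadings (both assumptions); the sample index `i` is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 20, 5, 2                  # hypothetical sizes
X = rng.normal(size=(m, n))         # synthetic data
X -= X.mean(axis=0)                 # mean-centering (assumed preprocessing)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
P_k = Vt[:k].T                      # n x k orthonormal loadings
E = X - X @ P_k @ P_k.T             # m x n residual matrix
Q = np.sum(E**2, axis=1)            # Q residual for every sample

i = 0                               # sample of interest (arbitrary choice)
q_con = E[i]                        # signed Q contributions, one per variable

# Variables ordered from largest to smallest absolute contribution.
order = np.argsort(-np.abs(q_con))

print(np.isclose(np.sum(q_con**2), Q[i]))
print(order)
```

Because the contributions keep their sign, a bar plot of `q_con` shows both which variables deviate from the model and in which direction.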

Q contributions are plotted while viewing a scores plot. Use the "Q con" button on the Plot Controls to request Q contributions. You will be prompted to select one or more samples for which you wish to see contributions. Select the samples and a new figure will be created showing the contributions.

Hints:

Left-click on a variable in the figure to view the raw data for that variable (column of X).
Right-clicking the figure will make the figure "interactive" allowing you to view different subsets of samples when more than one has been selected.

Relative Q Contributions

Another form of contribution is the "relative" Q contribution. Normally, the Q contributions are calculated as the residuals relative to the model. In some cases, comparing the Q contributions of two or more samples relative to each other can be useful in determining whether the same measurement phenomenon is responsible for the Q residuals of those samples. For this purpose, one or more samples are selected as a "reference" set, Q contributions are calculated for a sample of interest, and the reference contributions are subtracted from them. The result indicates what is different in their residuals.
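A minimal sketch of the subtraction is shown below. It assumes that when several reference samples are chosen their Q contributions are averaged before subtracting; the text above does not state how multiple references are combined, so that choice, like the synthetic data and the indices used, is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 20, 5, 2                  # hypothetical sizes
X = rng.normal(size=(m, n))         # synthetic data
X -= X.mean(axis=0)                 # mean-centering (assumed preprocessing)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
P_k = Vt[:k].T                      # n x k orthonormal loadings
E = X - X @ P_k @ P_k.T             # Q contributions for all samples

i = 0                               # sample of interest (arbitrary)
ref = [1, 2, 3]                     # reference samples (arbitrary)

# Relative Q contributions: sample minus mean of the reference set (assumed rule).
q_con_rel = E[i] - E[ref].mean(axis=0)

# Equivalent view: residual of the difference between the sample and the
# reference mean, because the projection is linear.
diff = X[i] - X[ref].mean(axis=0)
q_con_rel_alt = diff - diff @ P_k @ P_k.T

print(np.allclose(q_con_rel, q_con_rel_alt))
```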

Relative Q contributions can be plotted when viewing a Scores plot. On the Plot Controls, select the "Q con Ref." button and, when prompted, select the sample(s) you want to use as the reference. When complete, a new class set will be created showing you which sample(s) are being used for the relative contributions. Next, use the "Q con" button to select the samples for which you want to view contributions. The result will be a contributions plot with relative contributions calculated and noted on the Y-axis.

Hotelling's T-Squared

Whereas Q residuals represent the magnitude of the variation remaining in each sample after projection through the model, Hotelling's T² values measure the variation in each sample within the model. They indicate how far each sample is from the center (scores = 0) of the model.

Hotelling’s T² is the sum of the normalized squared scores and is defined as:

T_{i}^{2}=t_{i}\lambda ^{-1}t_{i}^{T}=x_{i}P_{k}\lambda ^{-1}P_{k}^{T}x_{i}^{T}

where $t_{i}$ refers to the i^th row of $T_{k}$ , the m by k matrix of scores from the model, and $\lambda$ is a diagonal matrix containing the eigenvalues ( $\lambda _{1}$ through $\lambda _{k}$ ) corresponding to the k principal components, factors, or latent variables retained in the model. The eigenvalues are given by $\lambda =T_{k}^{T}T_{k}/(m-1)$ , which accounts for scaling by $(m-1)$ .

Note: The equations are specific to orthogonal loadings $P$ and diagonal $\lambda$ .
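The T² equation can be sketched in NumPy as follows, assuming orthonormal PCA loadings from an SVD of synthetic, mean-centered data (so that $T_{k}^{T}T_{k}$ is diagonal). This is an illustration under those assumptions, not the software's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 20, 5, 2                  # hypothetical sizes
X = rng.normal(size=(m, n))         # synthetic data
X -= X.mean(axis=0)                 # mean-centering (assumed preprocessing)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
P_k = Vt[:k].T                      # n x k orthonormal loadings
T_k = X @ P_k                       # m x k scores

# Eigenvalues: diagonal of T_k^T T_k / (m - 1).
lam = np.sum(T_k**2, axis=0) / (m - 1)

# T2_i = t_i lam^-1 t_i^T, the sum of the normalized squared scores.
T2_from_scores = np.sum(T_k**2 / lam, axis=1)

# T2_i = x_i P_k lam^-1 P_k^T x_i^T, computed directly from the data.
W = P_k @ np.diag(1.0 / lam) @ P_k.T
T2_from_X = np.einsum("ij,jk,ik->i", X, W, X)

print(np.allclose(T2_from_scores, T2_from_X))
```

A useful sanity check on calibration data is that the T² values sum to k(m-1), since each normalized score column has a sum of squares of m-1.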

T-Squared Contributions

T² contributions (a.k.a. T Contributions) describe how individual variables contribute to the Hotelling's T² value for a given sample. The contributions to $T_{i}^{2}$ for the i^th sample, $t_{con,i}$ , form a 1 by n vector (one element per variable) calculated from:

t_{con,i}=t_{i}\lambda ^{-1/2}P_{k}^{T}=x_{i}P_{k}\lambda ^{-1/2}P_{k}^{T}

The term $t_{con}$ can be considered a scaled version of the data within the model. The data are scaled to equalize the variance captured by each factor. Note that this formulation for T² contributions has the property that the sum-squared contributions give $T_{i}^{2}$ for the given sample. That is:

T_{i}^{2}=t_{con,i}t_{con,i}^{T}
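The sum-of-squares property can be checked numerically. The sketch below assumes orthonormal PCA loadings from an SVD of synthetic, mean-centered data; names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 20, 5, 2                  # hypothetical sizes
X = rng.normal(size=(m, n))         # synthetic data
X -= X.mean(axis=0)                 # mean-centering (assumed preprocessing)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
P_k = Vt[:k].T                      # n x k orthonormal loadings
T_k = X @ P_k                       # m x k scores
lam = np.sum(T_k**2, axis=0) / (m - 1)

# t_con = T_k lam^(-1/2) P_k^T: one row per sample, one column per variable.
t_con = T_k @ np.diag(1.0 / np.sqrt(lam)) @ P_k.T

# Hotelling's T2 from the normalized squared scores, for comparison.
T2 = np.sum(T_k**2 / lam, axis=1)

print(t_con.shape)
print(np.allclose(np.sum(t_con**2, axis=1), T2))
```

The identity relies on $P_{k}^{T}P_{k}=I$ , so it is not expected to hold for non-orthogonal loadings.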

T² contributions are plotted while viewing a scores plot. Use the "T con" button on the Plot Controls to request T² contributions. You will be prompted to select one or more samples for which you wish to see contributions. Select the samples and a new figure will be created showing the contributions.

Hints:

Left-click on a variable in the figure to view the raw data for that variable (column of X).
Right-clicking the figure will make the figure "interactive" allowing you to view different subsets of samples when more than one has been selected.

Relative T-Squared Contributions

Normally, T² contributions are calculated relative to the center of the model, which is often the average sample. They describe how the original variables combined to give the scores that were observed for a given sample. In some cases, it is of more interest to determine what makes a sample different from the other samples within the model. That is the utility of relative T² contributions. Relative contributions show how two or more samples differ from each other. Put another way, relative T² contributions show which changes in the underlying X variables cause a sample to move from one point to another in the "score space" of the model.

Numerically, relative contributions are simply the T² contributions for a reference sample (or the mean contributions from a group of reference samples) subtracted from the T² contributions for a sample of interest.
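The subtraction described above can be sketched as follows. The data are synthetic, the loadings come from an SVD, and the sample and reference indices are arbitrary; all of these are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 20, 5, 2                  # hypothetical sizes
X = rng.normal(size=(m, n))         # synthetic data
X -= X.mean(axis=0)                 # mean-centering (assumed preprocessing)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
P_k = Vt[:k].T                      # n x k orthonormal loadings
T_k = X @ P_k
lam = np.sum(T_k**2, axis=0) / (m - 1)
S = P_k @ np.diag(1.0 / np.sqrt(lam)) @ P_k.T   # maps x_i to t_con,i
t_con = X @ S                       # T2 contributions for all samples

i = 0                               # sample of interest (arbitrary)
ref = [1, 2, 3]                     # reference samples (arbitrary)

# Relative T2 contributions: sample minus mean contributions of the references.
t_con_rel = t_con[i] - t_con[ref].mean(axis=0)

# Equivalent view: contributions of the move from the reference mean to the
# sample, since t_con is linear in x.
t_con_rel_alt = (X[i] - X[ref].mean(axis=0)) @ S

print(np.allclose(t_con_rel, t_con_rel_alt))
```

The second form makes the interpretation explicit: the relative contributions describe the change in the X variables that carries the reference point to the sample of interest within the score space.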

Relative T² contributions can be plotted when viewing a Scores plot. On the Plot Controls, select the "T con Ref." button and, when prompted, select the sample(s) you want to use as the reference. When complete, a new class set will be created showing you which sample(s) are being used for the relative contributions. Next, use the "T con" button to select the samples for which you want to view contributions. The result will be a contributions plot with relative contributions calculated and noted on the Y-axis.

