Asca
Purpose
ANOVA-simultaneous component analysis (ASCA) is a method for determining which factors within a fixed-effects experimental design are significant relative to the residual error. Unlike ANOVA, which applies to a single response variable, ASCA permits an ANOVA-like analysis even when there are many more measured response variables than samples. ASCA is implemented following Smilde et al., "ANOVA-simultaneous component analysis (ASCA): a new tool for analyzing designed metabolomics data", Bioinformatics, 2005. See also Zwanenburg et al., "ANOVA-principal component analysis and ANOVA-simultaneous component analysis: a comparison", J. Chemometrics, 2011.
Version 8.9 includes the ASCA+ extension to ASCA (Thiel, Feraud, and Govaerts, 2017) to handle datasets from unbalanced designs. This approach uses a general linear model to estimate the ANOVA model parameters by regression rather than from differences between level means, as in conventional ANOVA. With unbalanced designs the conventional ANOVA estimates of the factor effects become biased, whereas ASCA+ estimates them correctly.
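The bias described above can be illustrated outside the toolbox. The following Python sketch is not the ASCA+ implementation; it assumes a single response variable, two two-level factors coded as -1/+1 (sum coding), noise-free additive data, and hypothetical cell counts chosen to make the design unbalanced. It compares the level-mean estimate of factor A with the least-squares (regression) estimate.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


# Hypothetical unbalanced 2x2 design: (level of A, level of B, replicates).
cells = [(-1, -1, 3), (-1, 1, 1), (1, -1, 1), (1, 1, 3)]
rows = [(a, b) for a, b, count in cells for _ in range(count)]

# Noise-free response with known effects: mean 10, A effect 2, B effect 3.
y = [10.0 + 2.0 * a + 3.0 * b for a, b in rows]

# Conventional estimate: half the difference between the level means of A.
hi = [yi for (a, _), yi in zip(rows, y) if a == 1]
lo = [yi for (a, _), yi in zip(rows, y) if a == -1]
naive_effect_a = (sum(hi) / len(hi) - sum(lo) / len(lo)) / 2

# Regression estimate: least squares on the design [1, A, B] via the
# normal equations (X'X) beta = X'y.
design = [[1.0, a, b] for a, b in rows]
xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(3)]
mu_hat, glm_effect_a, glm_effect_b = solve(xtx, xty)

print(naive_effect_a, glm_effect_a)
```

Because the cell counts differ, the level means of A absorb part of the B effect, so the level-mean estimate departs from the true value of 2 while the regression estimate recovers it.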
Synopsis
- [model] = asca(X, F);
- [model] = asca(X, F, ncomp);
- [model] = asca(X, F, ncomp, options);
Description
Build an ASCA model by applying ASCA to X-block data, X, measured according to an experimental design, F. Here X represents the measured responses, rows being the experiments, columns being the response variables. F has the same number of rows as X and a column for each factor, so a row shows the factor levels used for that experiment. An ASCA model is intended to show which factors have a significant effect in explaining the experimental data. A P-value estimating the significance of each factor or interaction is calculated based on a permutation test of the factor's levels.
The helper function "plotscores_asca" is useful for viewing the results of ASCA. An example of its use is seen in the ASCA demo ("ascademo" function).
Statistical Significance
The statistical significance of a measured factor effect is estimated by testing the null hypothesis H0 of no experimental effect against the alternative hypothesis that there is an experimental effect, at a chosen significance level. The null hypothesis translates into the statement that there is no difference between the level averages of the effect for a factor or interaction. If the factor has little effect, its level means will be similar, so the sum of squares of the level means for the factor, SSQ, will be small. If the factor effect is large, the level means differ widely and SSQ is large.
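As an illustration of the SSQ statistic, here is a minimal Python sketch. It is not the toolbox's code: the function name effect_ssq and the toy data are hypothetical, and it assumes a main effect estimated from level means as in conventional ANOVA.

```python
def column_means(rows):
    """Mean of each column of a list-of-lists matrix."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def effect_ssq(X, levels):
    """Sum of squares of the level-mean effect matrix for one factor.

    X is a list of rows (experiments x response variables) and levels gives
    the factor level of each row.
    """
    grand = column_means(X)
    centered = [[x - g for x, g in zip(row, grand)] for row in X]
    ssq = 0.0
    for level in sorted(set(levels)):
        members = [row for row, lv in zip(centered, levels) if lv == level]
        level_mean = column_means(members)
        # Every row at this level contributes the same level-mean vector.
        ssq += len(members) * sum(m * m for m in level_mean)
    return ssq


# Toy data: 4 experiments, 2 response variables, one two-level factor.
X = [[1.0, 2.0], [1.2, 2.1], [3.0, 4.0], [3.1, 4.2]]
strong = effect_ssq(X, ["a", "a", "b", "b"])  # levels separate the rows
weak = effect_ssq(X, ["a", "b", "a", "b"])    # levels mix the rows
print(strong, weak)
```

The first assignment of levels follows the structure in the data and gives a large SSQ; the second mixes high and low rows within each level and gives an SSQ close to zero.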
The P-values for the main effects in a two-way ANOVA are obtained by randomizing the levels of the factor under consideration within the levels of the other factor. Each permutation leads to an SSQ value, which is compared with the value realized in the experiment, SSQobs. The probability p of finding the observed value by chance is estimated as the number of permutations that give an SSQ value larger than SSQobs, divided by the number of permutations sampled. If p is smaller than a predetermined probability value, the no-effect hypothesis H0 is rejected; otherwise it is not rejected. See section 2.3 of Zwanenburg et al., J. Chemometrics 2011; 25: 561–567, for more details.
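The restricted permutation procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the toolbox's implementation: the function names, the fixed random seed, the number of permutations, and the toy data are all hypothetical, and the effect is estimated from level means.

```python
import random


def effect_ssq(X, levels):
    """Sum of squares of the level means of one factor (X is rows x variables)."""
    n = len(X)
    grand = [sum(col) / n for col in zip(*X)]
    ssq = 0.0
    for level in sorted(set(levels)):
        members = [row for row, lv in zip(X, levels) if lv == level]
        means = [sum(col) / len(members) for col in zip(*members)]
        ssq += len(members) * sum((m - g) ** 2 for m, g in zip(means, grand))
    return ssq


def permute_within(levels, within, rng):
    """Shuffle one factor's levels separately inside each level of the other."""
    permuted = list(levels)
    for block in sorted(set(within)):
        idx = [i for i, lv in enumerate(within) if lv == block]
        shuffled = [levels[i] for i in idx]
        rng.shuffle(shuffled)
        for i, lv in zip(idx, shuffled):
            permuted[i] = lv
    return permuted


def permutation_pvalue(X, levels, within, n_perm=200, seed=0):
    """Fraction of permutations whose SSQ is larger than the observed SSQ."""
    rng = random.Random(seed)
    ssq_obs = effect_ssq(X, levels)
    larger = sum(
        effect_ssq(X, permute_within(levels, within, rng)) > ssq_obs
        for _ in range(n_perm)
    )
    return larger / n_perm


# Toy 2x2 design with two replicates; factor A shifts the responses, B does not.
A = ["lo", "lo", "hi", "hi", "lo", "lo", "hi", "hi"]
B = ["x", "x", "x", "x", "y", "y", "y", "y"]
X = [[1.0, 2.0], [1.1, 2.1], [3.0, 4.0], [3.1, 4.2],
     [1.2, 2.2], [0.9, 1.9], [3.2, 4.1], [2.9, 3.9]]

p_a = permutation_pvalue(X, A, within=B)
p_b = permutation_pvalue(X, B, within=A)
print(p_a, p_b)
```

In this toy example no relabeling of A can separate the rows better than the observed design, so the estimated p for A is small, while the estimate for B is large because most relabelings of B give an SSQ above the observed one.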
Inputs
- X = the measured responses, with one row per experiment (matching the rows of F) and one column per response variable.
- F = array or dataset containing the experimental design matrix, which describes the setting of each design factor (columns) for each sample (rows). Note: when F is a dataset, the origin/identity of each column should be described in the options field "interactions" or in the userdata.DOE.col_ID field.
Optional Inputs
- ncomp = a cell array of integer values indicating the number of Principal Components to use in each sub-model, or a single integer value which will be used as the number of Principal Components for each sub-model. If omitted, the maximum number of components for each submodel will be calculated.
Outputs
- model = an ASCA standard model structure containing fields (when input matrix X has size mxn):
- submodel: {1xnsub cell} of evrimodels. nsub is the number of PCA sub-models used (nsub = total number of factors and interactions modeled).
- combinedscores: [mxp dataset], combination of sub-model scores. p is the cumulative number of PCs used over all PCA sub-models. Column class sets identify which sub-model and PC number each column is associated with.
- combinedprojected: [mxp dataset] combination of scores of projecting the ANOVA residuals matrix onto each PCA sub-model. Column class sets identify which sub-model and PC number each column is associated with.
- combinedloads: [nxp dataset], combination of sub-model loads. p is the total number of PCs used over all PCA submodels. Column class sets identify which sub-model and PC number each column is associated with.
- detail: structure which contains field:
- effects: The percentage each effect (overall mean, factors, interactions and residuals) contributes to the sum of squares of the data matrix X.
- pvalues: P-values for significance of the factor or interaction's effect obtained by using a permutation test.
- data: cell array containing input X and F.
- decomp: [1x1 struct] containing internal quantities.
- decompdata: {1x(nsub+1) cell} of ANOVA decomposed arrays, including array of means (first entry), each size [mxn].
- decompresiduals: [mxn] containing the residuals term in the ANOVA model. This is the variability not modeled by the factors and interactions.
- decompnames: {mx1 cell} names of the ANOVA factor levels.
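The relationship between the decomposed arrays, the residuals, and the effect percentages listed above can be sketched in Python. This is an illustration only, not the toolbox's code: it assumes a balanced design with main effects only, effects estimated from level means, and hypothetical function names and toy data.

```python
def column_means(rows):
    """Mean of each column of a list-of-lists matrix."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def subtract(P, Q):
    """Element-wise difference of two equally sized matrices."""
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def ssq(M):
    """Sum of squared entries of a matrix."""
    return sum(v * v for row in M for v in row)


def level_mean_matrix(M, levels):
    """Replace each row of M by the mean of the rows sharing its factor level."""
    means = {
        level: column_means([r for r, lv in zip(M, levels) if lv == level])
        for level in set(levels)
    }
    return [list(means[lv]) for lv in levels]


def decompose(X, factors):
    """Split X into an overall-mean array, one effect array per factor, and residuals."""
    grand = column_means(X)
    mean_part = [list(grand) for _ in X]
    centered = subtract(X, mean_part)
    parts = {"mean": mean_part}
    residuals = centered
    for name, levels in factors.items():
        effect = level_mean_matrix(centered, levels)
        parts[name] = effect
        residuals = subtract(residuals, effect)
    parts["residuals"] = residuals
    return parts


def effect_percentages(X, parts):
    """Percentage of the sum of squares of X carried by each term."""
    total = ssq(X)
    return {name: 100.0 * ssq(M) / total for name, M in parts.items()}


# Toy balanced 2x2 design with two replicates and two response variables.
factors = {
    "A": ["lo", "lo", "hi", "hi", "lo", "lo", "hi", "hi"],
    "B": ["x", "x", "x", "x", "y", "y", "y", "y"],
}
X = [[1.0, 2.0], [1.1, 2.1], [3.0, 4.0], [3.1, 4.2],
     [1.2, 2.2], [0.9, 1.9], [3.2, 4.1], [2.9, 3.9]]

parts = decompose(X, factors)
pct = effect_percentages(X, parts)
print(pct)
```

Every array in parts has the same size as X, and the arrays sum back to X. In a balanced design the terms are mutually orthogonal, so the percentages add up to 100.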
Options
options = a structure array with the following fields:
- display: [ {'off'} | 'on' ] governs output to the command window.
- interactions: cell array of numerical vectors indicating which factors contribute to each column of F, or an integer which specifies the maximum order of interactions to include. For example, interactions = 3 includes two-way and three-way interactions. As a cell array example, interactions = {1 2 [1 2]} indicates that the first two columns of F represent factors while the third column represents the interaction of these two factors.
- npermutations: [ {0} ] number of permutations to use when applying the permutation test to each main factor. The test yields a P-value under the null hypothesis that the factor has no effect on the experimental outcome. P-values are stored in model.detail.pvalues. This value determines the smallest resolvable P-value (= 1/npermutations). The default, 0, means no P-values are calculated.
- nocenterpoints: [ 'off' | {'on'} ] governs automatic filtering of center points. If a design contains additional center points, these are typically removed before calculating the factor effects. However, some other packages do not do this filtering, and the only way to match their results is to disable the filtering by setting this option to 'off'. Note that filtering can only be done if the input F is a DOE DataSet object.
- blockdetails: [ {'standard'} | 'all' ] extent of detail included in the model. 'standard' keeps only the y-block; 'all' keeps both x- and y-blocks.
- preprocessing: {[]} preprocessing structure for x block (see PREPROCESS).
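To show what an integer value of the interactions option implies, the following Python sketch enumerates the main effects and interactions up to a given order. It is an illustration only: the function name model_terms is hypothetical, and the ordering of terms inside the toolbox is an assumption.

```python
from itertools import combinations


def model_terms(n_factors, max_order):
    """List main effects and interactions up to max_order.

    Each term is a tuple of 1-based factor indices, e.g. (1, 2) is the
    two-way interaction between factors 1 and 2.
    """
    terms = []
    for order in range(1, min(max_order, n_factors) + 1):
        terms.extend(combinations(range(1, n_factors + 1), order))
    return terms


two_factor_terms = model_terms(2, 2)    # same terms as {1 2 [1 2]}
three_factor_terms = model_terms(3, 3)  # adds two-way and three-way terms
print(two_factor_terms)
print(len(three_factor_terms))
```

The number of terms produced this way corresponds to nsub, the number of PCA sub-models, under the assumption that one sub-model is built per factor or interaction.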
Example
>> ascademo