Asca
Purpose
ANOVA-simultaneous component analysis (ASCA) is a method for determining which factors in a fixed-effects experimental design are significant relative to the residual error. ASCA permits an ANOVA-like analysis even when there are many more variables than samples. ASCA is implemented following Smilde et al., "ANOVA-simultaneous component analysis (ASCA): a new tool for analyzing designed metabolomics data", Bioinformatics, 2005. See also Zwanenburg et al., "ANOVA-principal component analysis and ANOVA-simultaneous component analysis: a comparison", J. Chemometrics, 2011.
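ASCA first splits X into an overall mean, one effect matrix per factor, and residuals, and then models each effect matrix by PCA. The Python sketch below illustrates only that ANOVA decomposition step. It is not the PLS_Toolbox implementation. It assumes a balanced design with main effects only, so any interaction between factors remains in the residuals. The helper names `column_means` and `anova_decompose` are hypothetical.

```python
from statistics import fmean

# Hypothetical helpers for illustration; not part of PLS_Toolbox.
# Assumes a balanced design with main effects only: any interaction
# between factors is left in the residuals.

def column_means(rows):
    """Mean of each column of a matrix stored as a list of rows."""
    return [fmean(col) for col in zip(*rows)]

def anova_decompose(X, F):
    """Split X (m rows) into its overall mean, one effect matrix per column
    of the design F, and the residuals."""
    m = len(X)
    mean = column_means(X)
    centered = [[x - mu for x, mu in zip(row, mean)] for row in X]
    effects = []
    for j in range(len(F[0])):
        # Mean of the centered data over the samples at each level of factor j.
        level_means = {
            lev: column_means([centered[i] for i in range(m) if F[i][j] == lev])
            for lev in {f[j] for f in F}
        }
        # Each sample's row in the effect matrix is the mean for its level.
        effects.append([list(level_means[F[i][j]]) for i in range(m)])
    # Residuals: whatever the mean and main effects do not explain.
    residuals = [
        [centered[i][k] - sum(E[i][k] for E in effects) for k in range(len(mean))]
        for i in range(m)
    ]
    return mean, effects, residuals

if __name__ == "__main__":
    # 2x2 full factorial, one response; y = 1 + 4*A + 2*B is purely additive.
    F = [[0, 0], [0, 1], [1, 0], [1, 1]]
    X = [[1.0], [3.0], [5.0], [7.0]]
    mean, effects, residuals = anova_decompose(X, F)
    print(mean, effects, residuals)
```

Because the example response is purely additive in the two factors, the residuals come out as zero. With real data, the residual matrix carries the noise against which each factor's effect is judged.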
Synopsis
- [model] = asca(X, F);
- [model] = asca(X, F, ncomp);
- [model] = asca(X, F, ncomp, options);
Description
Build an ASCA model from X-block data, X, measured according to an experimental design, F. An ASCA model shows which factors have a significant effect in explaining the experimental data. A P-value estimating the significance of each factor or interaction is calculated with a permutation test of the factor's levels.
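The permutation test can be sketched as follows. This is a minimal illustration, not the statistic `asca` uses internally, which is not specified here. The sketch assumes that the test statistic is the sum of squares of a factor's effect matrix, which is a common choice in the ASCA literature. The factor's level labels are shuffled across samples. The P-value is the fraction of permutations whose statistic is at least the observed one, so its resolution is 1/n_permutations. The functions `effect_ss` and `permutation_pvalue` are hypothetical.

```python
import random
from statistics import fmean

# Hypothetical helpers for illustration; not part of PLS_Toolbox.

def effect_ss(X, levels):
    """Sum of squares of the main-effect matrix for one factor.

    Each row of the effect matrix is the mean of the column-centered X over
    the samples that share that row's level.
    """
    mean = [fmean(col) for col in zip(*X)]
    centered = [[x - mu for x, mu in zip(row, mean)] for row in X]
    ss = 0.0
    for lev in set(levels):
        rows = [c for c, l in zip(centered, levels) if l == lev]
        level_mean = [fmean(col) for col in zip(*rows)]
        # Every sample at this level contributes the same row to the effect matrix.
        ss += len(rows) * sum(v * v for v in level_mean)
    return ss

def permutation_pvalue(X, levels, n_permutations, seed=0):
    """Fraction of level permutations whose effect SS is >= the observed SS."""
    rng = random.Random(seed)
    observed = effect_ss(X, levels)
    shuffled = list(levels)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point round-off.
        if effect_ss(X, shuffled) >= observed - 1e-9:
            count += 1
    return count / n_permutations

if __name__ == "__main__":
    levels = [0] * 6 + [1] * 6
    values = [0.0, 0.1, -0.1, 0.2, -0.2, 0.05, 10.0, 10.1, 9.9, 10.2, 9.8, 10.05]
    X = [[v] for v in values]
    print(permutation_pvalue(X, levels, 200))
```

A factor with a strong effect gives a P-value near zero. A factor with no effect at all (for example, a constant response) gives a P-value of 1.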
Inputs
- X = the experimental data: one row of measured values (size mxn overall) for each experiment/row of F.
- F = array or dataset experiment design matrix describing the setting of each factor (columns) for each sample (rows). Note: when F is a dataset, the origin/identity of each column should be described either in the options field "interactions" or in the userdata.DOE.col_ID field.
Optional Inputs
- ncomp = a cell array of integers giving the number of principal components to use in each sub-model, or a single integer used as the number of principal components for every sub-model. If omitted, the maximum number of components is calculated for each sub-model.
Outputs
- model = a standard ASCA model structure containing the following fields (when input matrix X has size mxn):
  - submodel: {1xnsub cell} of evrimodels. nsub is the number of PCA sub-models used (nsub = total number of factors and interactions modeled).
  - combinedscores: [mxp dataset], combination of sub-model scores. p is the total number of PCs used over all PCA sub-models. Column class sets identify which sub-model and PC number each column is associated with.
  - combinedprojected: [mxp dataset], combination of the scores obtained by projecting the ANOVA residuals matrix onto each PCA sub-model. Column class sets identify which sub-model and PC number each column is associated with.
  - combinedloads: [nxp dataset], combination of sub-model loadings. p is the total number of PCs used over all PCA sub-models. Column class sets identify which sub-model and PC number each column is associated with.
  - detail, which contains the fields:
    - effects: the percentage of the sum of squares of the data matrix X contributed by each effect (overall mean, factors, interactions and residuals).
    - pvalues: P-values for the significance of each factor's or interaction's effect, obtained with a permutation test.
    - data: cell array containing the inputs X and F.
    - decomp: [1x1 struct] containing internal quantities.
    - decompdata: {1x(nsub+1) cell} of ANOVA-decomposed arrays, each of size [mxn]; the first entry is the array of means.
    - decompresiduals: [mxn] array of the residuals term in the ANOVA model, i.e. the variability not modeled by the factors and interactions.
    - decompnames: {mx1 cell} names of the ANOVA factor levels.
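The percentages reported in the effects field can be pictured as follows. Each part of the decomposition (the matrix of means, each effect matrix, and the residuals) is reduced to its sum of squares and divided by the total sum of squares of X. This is a hedged sketch with hypothetical function names, not the toolbox code. The percentages add up to 100 only when the parts are mutually orthogonal, as they are in a balanced design.

```python
# Hypothetical helpers for illustration; not part of PLS_Toolbox.

def sum_of_squares(M):
    """Sum of squared entries of a matrix stored as a list of rows."""
    return sum(v * v for row in M for v in row)

def effect_percentages(X, parts):
    """Percentage of SS(X) carried by each named part of the decomposition.

    parts: dict mapping a label to a matrix with the same shape as X.
    """
    total = sum_of_squares(X)
    return {name: 100.0 * sum_of_squares(M) / total for name, M in parts.items()}

if __name__ == "__main__":
    # Balanced 2x2 design, y = 1 + 4*A + 2*B, already decomposed by hand.
    X = [[1.0], [3.0], [5.0], [7.0]]
    parts = {
        "mean": [[4.0]] * 4,
        "A": [[-2.0], [-2.0], [2.0], [2.0]],
        "B": [[-1.0], [1.0], [-1.0], [1.0]],
        "residuals": [[0.0]] * 4,
    }
    print(effect_percentages(X, parts))
```

The overall mean is included as one of the parts, so the percentages are taken relative to the uncentered sum of squares of X. This matches the list of effects given for the effects field.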
Options
options = a structure array with the following fields:
- display: [{'off'}| 'on' ] governs output to the command window.
- interactions: either a cell array of numerical vectors, one per column of F, indicating which factors contribute to that column, or an integer specifying the maximum order of interactions to include. For example, interactions = 3 includes two-way and three-way interactions. As a cell array, interactions = {1 2 [1 2]} indicates that the first two columns of F represent factors and the third column represents the interaction of these two factors.
- npermutations: [{0}] number of permutations used in the permutation test of each main factor. The resulting P-value tests the null hypothesis that the factor has no effect on the experimental outcome. P-values are returned in model.detail.pvalues. This value determines the smallest resolvable P-value (= 1/npermutations). The default, 0, means no P-values are calculated.
- nocenterpoints: [ 'off' |{'on'}] governs automatic filtering of center points. If a design contains added center points, these are typically removed before the factor effects are calculated. Some other packages do not do this filtering, and the only way to match their results is to disable it by setting this option to 'off'. Note that filtering can only be done if the input F is a DOE DataSet object.
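An integer interactions setting can be read as shorthand for a list of factor combinations. The sketch below expands it into 1-based index tuples: all main effects first, then every combination of 2 up to max_order factors. The tuple (1, 2) corresponds to the [1 2] entry in the cell-array form. The function `expand_interactions` is hypothetical, and the ordering that `asca` uses internally may differ.

```python
from itertools import combinations

def expand_interactions(n_factors, max_order):
    """List the effect terms implied by an integer 'interactions' setting.

    Hypothetical helper, not part of PLS_Toolbox. Returns 1-based factor
    index tuples: all main effects, then each combination of 2..max_order
    factors in increasing order.
    """
    terms = []
    for order in range(1, max_order + 1):
        terms.extend(combinations(range(1, n_factors + 1), order))
    return terms

if __name__ == "__main__":
    # Three factors with interactions = 3: main effects plus two- and three-way terms.
    print(expand_interactions(3, 3))
```

For two factors and max_order = 2, the result has the same three terms as the cell-array example {1 2 [1 2]}.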
Example
>>ascademo