Asca

From Eigenvector Research Documentation Wiki

Purpose

ANOVA-simultaneous component analysis (ASCA) is a method to determine which factors within a fixed-effects experimental design are significant relative to the residual error. ASCA permits an ANOVA-like analysis even when there are many more variables than samples. ASCA is implemented following Smilde et al., "ANOVA-simultaneous component analysis (ASCA): a new tool for analyzing designed metabolomics data", Bioinformatics, 2005. See also Zwanenburg et al., "ANOVA-principal component analysis and ANOVA-simultaneous component analysis: a comparison", J. Chemometrics, 2011.

Version 8.9 includes the ASCA+ extension to ASCA (Thiel, Feraud, and Govaerts, 2017) to handle unbalanced designs. This approach uses a general linear model to estimate the ANOVA model parameters by regression rather than by using differences between level means as in conventional ANOVA. With unbalanced designs, the conventional ANOVA estimates of the factor effects become biased, whereas ASCA+ estimates them correctly.
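The difference between the two estimation approaches can be illustrated with a small noise-free example. The following is a minimal sketch in plain MATLAB, not the toolbox's internal code: it assumes two two-level factors with sum (+1/-1) coding, and all variable names (A, B, alphaA, effA_glm, and so on) are hypothetical. Because the cell counts are unequal, the level-mean estimate of the factor A effect is contaminated by factor B, while the regression estimate recovers the true value.

```matlab
% Illustrative sketch only (plain MATLAB, not the toolbox's internal code).
% Two two-level factors with unequal cell counts (unbalanced design).
A = [1 1 1 1 2 2 2]';          % factor A level of each sample
B = [1 1 1 2 1 2 2]';          % factor B level of each sample
n = numel(A);

% Sum (deviation) coding: level 1 -> +1, level 2 -> -1.
a = 3 - 2*A;
b = 3 - 2*B;

% Noise-free response (2 variables) built from known effects.
mu     = [10 20];              % overall mean
alphaA = [ 2 -1];              % true effect of factor A
betaB  = [ 1  3];              % true effect of factor B
X = ones(n,1)*mu + a*alphaA + b*betaB;

% Conventional estimate: mean of level 1 of A minus the grand mean.
effA_means = mean(X(A==1,:),1) - mean(X,1);

% GLM estimate: regress X on the coded design matrix.
D        = [ones(n,1) a b];
Theta    = D \ X;              % rows: mean, A effect, B effect
effA_glm = Theta(2,:);

% Rows: true effect, level-mean estimate, GLM estimate.
disp([alphaA; effA_means; effA_glm]);
```

In a balanced design the two estimates coincide, which is why the distinction only matters for unbalanced data.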

Synopsis

[model] = asca(X, F);
[model] = asca(X, F, ncomp);
[model] = asca(X, F, ncomp, options);

Description

Build an ASCA model by applying ASCA to X-block data, X, measured according to an experimental design, F. An ASCA model is intended to show which factors have a significant effect in explaining the experimental data. A P-value estimating the significance of each factor or interaction is calculated based on a permutation test of the factor's levels.
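The idea behind a permutation test on factor levels can be sketched in plain MATLAB. This is an illustration under simplifying assumptions (a single two-level factor, the effect sum of squares as test statistic, and a plain count fraction as the P-value); the toolbox's actual statistic and P-value formula may differ, and names such as effectSS and ssPerm are hypothetical.

```matlab
% Illustrative permutation test for one two-level factor (not toolbox code).
rng(0);                                   % reproducible random numbers
levels = [1 1 1 1 2 2 2 2]';              % factor level of each sample
m = numel(levels);
X = randn(m, 5);                          % 8 samples x 5 variables of noise
X(levels==1,:) = X(levels==1,:) + 3;      % add a real factor effect

Xc = X - ones(m,1)*mean(X,1);             % remove the overall mean

% Effect sum of squares: for each level, n_level * ||level mean||^2.
effectSS = @(lv) sum(lv==1)*sum(mean(Xc(lv==1,:),1).^2) + ...
                 sum(lv==2)*sum(mean(Xc(lv==2,:),1).^2);

nperm  = 200;
ssObs  = effectSS(levels);                % statistic for the true labels
ssPerm = zeros(nperm,1);
for k = 1:nperm
    % Shuffle the level labels to break any link between factor and data.
    ssPerm(k) = effectSS(levels(randperm(m)));
end

% Fraction of permutations at least as extreme as the observed statistic.
p = sum(ssPerm >= ssObs) / nperm;
fprintf('Estimated P-value: %.3f (resolution 1/%d)\n', p, nperm);
```

Because the P-value is a count divided by the number of permutations, it can only take values in steps of 1/nperm, so more permutations are needed to resolve smaller P-values.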

The helper function "plotscores_asca" is useful for viewing the results of ASCA. An example of its use is seen in the ASCA demo ("ascademo" function).

Inputs

  • X = the X-block data: the experimental values measured for each experiment, with one row of X per row of F.
  • F = array or dataset experimental design matrix describing the setting of each design factor (columns) for each sample (rows). Note: when F is a dataset, the origin/identity of each column should be described in the options field "interactions" or in the userdata.DOE.col_ID field.


Optional Inputs

  • ncomp = a cell array of integer values indicating the number of Principal Components to use in each sub-model, or a single integer value which will be used as the number of Principal Components for each sub-model. If omitted, the maximum number of components for each submodel will be calculated.
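As a hedged illustration of the inputs, the following sketch builds a small synthetic data set and calls asca with and without ncomp. It requires PLS_Toolbox; the data, the design, and the variable names are invented for the example, and passing a plain numeric array for F with two factor columns is an assumption based on the input description above.

```matlab
% Requires PLS_Toolbox (asca is not part of base MATLAB).
rng(1);                                    % reproducible synthetic data

% Full factorial design: factor 1 (2 levels) x factor 2 (3 levels), 4 replicates.
[f1, f2] = ndgrid(1:2, 1:3);
F = repmat([f1(:) f2(:)], 4, 1);           % 24 samples x 2 factors

% More variables than samples: 24 samples x 50 variables.
X = randn(24, 50);
X(F(:,1)==2, 1:10) = X(F(:,1)==2, 1:10) + 2;   % built-in effect of factor 1

modelDefault = asca(X, F);                 % ncomp omitted
modelOnePC   = asca(X, F, 1);              % 1 PC in every sub-model
```

A single integer applies the same number of components to every sub-model; a cell array would instead give one value per sub-model.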

Outputs

  • model = an ASCA standard model structure containing fields (when input matrix X has size mxn):
submodel: {1xnsub cell} of evrimodels. nsub is the number of PCA sub-models used (nsub = the total number of factors and interactions modeled).
combinedscores: [mxp dataset], combination of sub-model scores. p is the cumulative number of PCs used over all PCA sub-models. Column class sets identify which sub-model and PC number each column is associated with.
combinedprojected: [mxp dataset] combination of scores of projecting the ANOVA residuals matrix onto each PCA sub-model. Column class sets identify which sub-model and PC number each column is associated with.
combinedloads: [nxp dataset], combination of sub-model loads. p is the total number of PCs used over all PCA submodels. Column class sets identify which sub-model and PC number each column is associated with.
detail: structure which contains fields:
effects: The percentage each effect (overall mean, factors, interactions and residuals) contributes to the sum of squares of the data matrix X.
pvalues: P-values for significance of the factor or interaction's effect obtained by using a permutation test.
data: cell array containing input X and F.
decomp: [1x1 struct] containing internal quantities.
decompdata: {1x(nsub+1) cell} of ANOVA decomposed arrays, including array of means (first entry), each size [mxn].
decompresiduals: [mxn] containing the residuals term in the ANOVA model. This is the variability not modeled by the factors and interactions.
decompnames: {mx1 cell} names of the ANOVA factor levels.
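To make the decomposition quantities above more concrete, here is a minimal plain-MATLAB sketch of an ANOVA decomposition for a balanced two-factor design with main effects only. It is not the toolbox's code: the variable names (M, XA, XB, E, pct) are hypothetical, and interactions are left in the residuals for brevity. It shows how percentage contributions to the sum of squares of X can be computed and how a PCA of one effect matrix yields sub-model scores and loads.

```matlab
% Illustrative ANOVA decomposition, balanced 2x3 design (not toolbox code).
rng(2);
[f1, f2] = ndgrid(1:2, 1:3);
F = repmat([f1(:) f2(:)], 2, 1);          % 12 samples x 2 factors
X = randn(12, 4) + 5;                     % 12 samples x 4 variables
m = size(X,1);

M  = ones(m,1) * mean(X,1);               % overall-mean term
Xc = X - M;

% Effect matrices: each row holds the mean of its factor level.
XA = zeros(size(X));
XB = zeros(size(X));
for lv = 1:2
    idx = F(:,1)==lv;
    XA(idx,:) = ones(sum(idx),1) * mean(Xc(idx,:),1);
end
for lv = 1:3
    idx = F(:,2)==lv;
    XB(idx,:) = ones(sum(idx),1) * mean(Xc(idx,:),1);
end
E = Xc - XA - XB;                         % residuals (interaction not modeled)

% Percentage of the total sum of squares carried by each term.
ss  = @(Z) sum(Z(:).^2);
pct = 100 * [ss(M) ss(XA) ss(XB) ss(E)] / ss(X);
fprintf('mean %.1f%%, A %.1f%%, B %.1f%%, residual %.1f%%\n', pct);

% PCA of one effect matrix via SVD gives that sub-model's scores and loads.
[U, S, V] = svd(XA, 'econ');
scoresA = U(:,1) * S(1,1);                % factor A has 2 levels -> rank 1
loadsA  = V(:,1);
```

In a balanced design the terms are mutually orthogonal, so the percentages add up to 100.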

Options

options = a structure array with the following fields:

  • display: [ {'off'} | 'on' ] governs output to the command window.
  • interactions: cell array of numerical vectors indicating which factors contribute to columns of F, or an integer which specifies the maximum order of interactions to include. For example, interactions = 3 includes two-way and three-way interactions. As a cell array, interactions = {1 2 [1 2]} indicates that the first two columns of F represent factors while the third column represents the interaction of these two factors.
  • npermutations: [{0}] Number of permutations to use in the permutation test applied to each main factor to obtain a P-value under the null hypothesis that the factor has no effect on the experimental outcome. P-values are stored in model.detail.pvalues. This value determines the smallest resolvable P-value (= 1/npermutations). The default, 0, means no P-values are calculated.
  • nocenterpoints: [ 'off' | {'on'} ] governs automatic filtering of center points. If a design contains additional center points, these are typically removed before calculating the factor effects. However, some other packages do not do this filtering, and the only way to match their results is to disable the filtering by setting this option to 'off'. Note that filtering can only be done if the input F is a DOE DataSet object.
  • blockdetails: [ {'standard'} | 'all' ] Extent of detail included in model. 'standard' keeps only the y-block; 'all' keeps both x- and y-blocks.
  • preprocessing: {[]} preprocessing structure for x block (see PREPROCESS).
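A hedged sketch of setting these options follows. It requires PLS_Toolbox and assumes that default options can be obtained with asca('options'), following the usual PLS_Toolbox convention; that call, the synthetic data, and the variable names are assumptions rather than something stated on this page. The option fields and the model.detail fields used are the ones described above.

```matlab
% Requires PLS_Toolbox. Assumes asca('options') returns the default options.
rng(3);                                    % reproducible synthetic data
[f1, f2] = ndgrid(1:2, 1:3);
F = repmat([f1(:) f2(:)], 4, 1);           % 24 samples x 2 factors
X = randn(24, 50);                         % 24 samples x 50 variables
X(F(:,2)==3, 20:30) = X(F(:,2)==3, 20:30) + 2;   % built-in effect of factor 2

options = asca('options');                 % assumed way to get defaults
options.display       = 'off';
options.interactions  = 2;                 % include up to two-way interactions
options.npermutations = 100;               % smallest resolvable P-value = 0.01

model = asca(X, F, 1, options);

disp(model.detail.effects);                % percent contribution of each term
disp(model.detail.pvalues);                % permutation-test P-values
```

With npermutations left at its default of 0, no P-values are calculated, so the permutation test has to be requested explicitly.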

Example

>>ascademo

See Also

analysis, anovadoe, doegen, doeinteractions, mlsca