Pca: Difference between revisions

From Eigenvector Research Documentation Wiki
imported>Chuck
No edit summary
 
(24 intermediate revisions by 6 users not shown)
Latest revision as of 10:30, 5 March 2021

Purpose

Perform principal components analysis.

Synopsis

model = pca(x,ncomp,options); %identifies model (calibration step)
pred = pca(x,model,options); %projects a new X-block onto existing model
pca  % Launches Analysis window with PCA selected

Please note that the recommended way to build and apply a PCA model from the command line is to use the Model Object. See the EVRIModel_Objects wiki page on building and applying models using the Model Object.

Description

Performs a principal component analysis (PCA) decomposition of the input array x, returning ncomp principal components. For example, for an M by N matrix X the PCA model is X = TP^T + E, where the scores matrix T is M by K, the loadings matrix P is N by K, the residuals matrix E is M by N, and K is the number of factors or principal components ncomp. The output model is a PCA model structure. This model can be applied to new data by passing the model structure to pca along with the new data x, or by using pcapro.
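
The decomposition described above can be sketched in base MATLAB with a truncated singular value decomposition. This is only an illustration of the X = TP^T + E relationship, not the toolbox's implementation: the variable names and test data are made up for the example, and mean-centering is assumed as the preprocessing step (pca itself applies only what options.preprocessing specifies).

```matlab
% Illustrative sketch only: PCA by truncated SVD in base MATLAB.
% Names (X, Xc, T, P, E) and the data are assumptions for this example.
M = 20; N = 6; K = 2;                 % samples, variables, components (ncomp)
X  = sin((1:M)' * (1:N) / 3);         % deterministic stand-in for an M-by-N X-block
Xc = X - repmat(mean(X, 1), M, 1);    % assumed preprocessing: mean-center columns

[U, S, V] = svd(Xc, 'econ');          % Xc = U*S*V'
P = V(:, 1:K);                        % loadings, N-by-K
T = U(:, 1:K) * S(1:K, 1:K);          % scores,   M-by-K
E = Xc - T * P';                      % residuals, M-by-N
```

By construction, the preprocessed data equal T*P' + E, the columns of P are orthonormal, and the residuals are orthogonal to the loadings.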

Inputs

  • x = X-block (2-way array class "double" or "dataset"), and
  • ncomp = number of components to be calculated (positive integer scalar).

Optional Inputs

  • model = existing PCA model, onto which new data x is to be applied.
  • options = discussed below.

Outputs

The output of PCA is a model structure with the following fields (see Standard Model Structure for additional information):

  • modeltype: 'PCA',
  • datasource: structure array with information about input data,
  • date: date of creation,
  • time: time of creation,
  • info: additional model information,
  • loads: cell array with model loadings for each mode/dimension,
  • pred: cell array with model predictions for the input block (when options.blockdetails = 'standard', x-block predictions are not saved and this will be an empty array),
  • tsqs: cell array with T^2 values for each mode,
  • ssqresiduals: cell array with sum of squares residuals for each mode,
  • description: cell array with text description of model, and
  • detail: sub-structure with additional model details and results.

If the inputs are an Mnew by N matrix of new data x and a PCA model model, then pca applies the model to the new data. Preprocessing included in model will be applied to the new data. The output pred is a structure, similar to model, that contains the new scores and other predictions for the new data.
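
What applying a model to new data means can be sketched in base MATLAB: the new samples are preprocessed with the calibration parameters and projected onto the existing loadings. This is a hedged illustration, not the toolbox's code; the names (mu, P, Xnew, Tnew, Enew) and the data are assumptions for the example, and mean-centering stands in for whatever preprocessing the model stores.

```matlab
% Illustrative sketch only: project new samples onto existing loadings.
% Calibration step: loadings P and column means mu from calibration data.
M = 20; N = 6; K = 2;
X  = sin((1:M)' * (1:N) / 3);            % made-up calibration X-block
mu = mean(X, 1);                         % calibration means (assumed preprocessing)
[~, ~, V] = svd(X - repmat(mu, M, 1), 'econ');
P  = V(:, 1:K);                          % loadings, N-by-K

% Application step: Mnew-by-N new data, centered with the CALIBRATION means.
Mnew = 5;
Xnew = cos((1:Mnew)' * (1:N) / 4);       % made-up new data
Xnc  = Xnew - repmat(mu, Mnew, 1);
Tnew = Xnc * P;                          % new scores, Mnew-by-K
Enew = Xnc - Tnew * P';                  % residuals of the new samples
```

The key design point is that the means come from the calibration data, never from the new data, so the new scores are expressed in the same coordinate system as the calibration scores.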

Note: Calling pca with no inputs starts the graphical user interface (GUI) for this analysis method.

Options

options = a structure array with the following fields:

  • display: [ 'off' | {'on'} ], governs level of display to command window,
  • plots: [ 'none' | {'final'} ], governs level of plotting,
  • outputversion: [ 2 | {3} ], governs output format (discussed below),
  • algorithm: [ {'svd'} | 'maf' | 'robustpca' ], algorithm for decomposition. Note that algorithm 'maf' (Maximum Autocorrelation Factors for hyperspectral images) requires Eigenvector's MIA_Toolbox,
  • preprocessing: {[]}, cell array containing a preprocessing structure (see PREPROCESS) defining preprocessing to use on the data (discussed below),
  • blockdetails: [ 'compact' | {'standard'} | 'all' ], level of detail (predictions, raw residuals, and calibration data) included in the model:
      • 'standard' = the predictions and raw residuals for the X-block, as well as the X-block itself, are not stored in the model, to reduce its size in memory. Specifically, these fields in the model object are left empty: model.pred{1}, model.detail.res{1}, model.detail.data{1}.
      • 'compact' = for this function, identical to 'standard'.
      • 'all' = keep the predictions and raw residuals for the X-block as well as the X-block dataset itself.
  • confidencelimit: [ {'0.95'} ], confidence level for Q and T^2 limits. A value of zero (0) disables calculation of confidence limits.
  • roptions: structure of options to pass to robpca (robust PCA engine from the Libra Toolbox). These options are only used when algorithm is 'robustpca':
      • alpha: [ {0.75} ], (1-alpha) measures the fraction of outliers the algorithm should resist. Any value between 0.5 and 1 may be specified.
      • cutoff: [], similar to confidencelimit, this confidence level is used by the robust algorithm to indicate which sample(s) are considered outside the limits and, therefore, likely outliers. It does NOT indicate which samples were actually left out (see alpha above), but only those samples which appear to be more unusual. The default value is the same as confidencelimit (if non-zero) or alpha (if confidencelimit is zero).

The default options can be retrieved using: options = pca('options');
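
A typical way to use the options structure is to retrieve the defaults, change only the fields of interest, and pass the structure to pca. The fragment below is a sketch that assumes PLS_Toolbox is on the MATLAB path (other toolboxes ship a pca function with a different signature); the data x and the choice of three components are placeholders, and only fields listed above are set.

```matlab
% Sketch only: requires PLS_Toolbox; x and ncomp = 3 are placeholder choices.
x = randn(30, 8);                 % made-up 30-by-8 X-block

options = pca('options');         % start from the default options structure
options.display      = 'off';     % no output to the command window
options.plots        = 'none';    % no plots
options.algorithm    = 'svd';     % default decomposition algorithm
options.blockdetails = 'all';     % keep predictions, residuals, and data in the model

model = pca(x, 3, options);       % calibration step with the modified options
```

Starting from the defaults rather than building the structure by hand keeps any fields not mentioned here at their documented default values.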

OUTPUTVERSION

By default (options.outputversion = 3) the output of the function is a standard model structure model. If options.outputversion = 2, the output format is:

[scores,loads,ssq,res,reslim,tsqlim,tsq] = pca(xblock1,2,options);

where the outputs are

  • scores = x-block scores,
  • loads = x-block loadings,
  • ssq = the sum of squares information,
  • res = the Q residuals,
  • reslim = the estimated 95% confidence limit line for Q residuals,
  • tsqlim = the estimated 95% confidence limit line for T^2, and
  • tsq = the Hotelling's T^2 values.
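
For readers unfamiliar with the two sample statistics in this list, the sketch below computes Q residuals and Hotelling's T^2 from a truncated-SVD PCA in base MATLAB, using their standard textbook definitions. It is an illustration under assumptions (made-up data, mean-centering, variable names chosen to mirror the outputs above), not the toolbox's code. The confidence limits reslim and tsqlim are deliberately not computed, because this page does not state how they are estimated.

```matlab
% Illustrative sketch only: Q residuals and Hotelling's T^2 per sample.
M = 20; N = 6; K = 2;
X  = sin((1:M)' * (1:N) / 3);              % made-up X-block
Xc = X - repmat(mean(X, 1), M, 1);         % assumed preprocessing: mean-center
[U, S, V] = svd(Xc, 'econ');
P = V(:, 1:K);                             % loadings
T = U(:, 1:K) * S(1:K, 1:K);               % scores
E = Xc - T * P';                           % residuals

res    = sum(E.^2, 2);                     % Q: sum of squared residuals per sample
lambda = diag(S(1:K, 1:K)).^2 / (M - 1);   % variance of the scores on each PC
tsq    = sum((T.^2) ./ repmat(lambda', M, 1), 2);   % T^2: variance-scaled scores
```

Q measures how far a sample lies from the model plane, while T^2 measures how far its projection lies from the center of the model within that plane.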

PREPROCESSING

The preprocessing field can be empty [] (indicating that no preprocessing of the data should be used), or it can contain a preprocessing structure output from the PREPROCESS function. For example, options.preprocessing = {preprocess('default', 'autoscale')}. This information is echoed in the output model in the model.detail.preprocessing field and is used when applying the PCA model to new data.
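
To make concrete why the preprocessing is stored with the model, the base-MATLAB sketch below performs autoscaling by hand (mean-centering followed by scaling each column to unit standard deviation) and then reuses the calibration parameters on new data. This is an assumption-labeled illustration of the idea, not the PREPROCESS implementation; the names mu, sd, Xa, and Xnewa and the data are made up for the example.

```matlab
% Illustrative sketch only: autoscaling with stored calibration parameters.
M = 20; N = 4;
colscale = [1 10 100 1000];                              % columns on very different scales
X  = sin((1:M)' * (1:N) / 3) .* repmat(colscale, M, 1);  % made-up calibration data

mu = mean(X, 1);                                         % calibration means
sd = std(X, 0, 1);                                       % calibration standard deviations
Xa = (X - repmat(mu, M, 1)) ./ repmat(sd, M, 1);         % autoscaled calibration data

% New data must be scaled with the stored mu and sd, not its own statistics.
Mnew  = 3;
Xnew  = cos((1:Mnew)' * (1:N) / 4) .* repmat(colscale, Mnew, 1);
Xnewa = (Xnew - repmat(mu, Mnew, 1)) ./ repmat(sd, Mnew, 1);
```

After autoscaling, every calibration column has zero mean and unit standard deviation, so variables measured on large scales no longer dominate the principal components.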

See Also

analysis, browse, evolvfa, ewfa, explode, parafac, plotloads, plotscores, preprocess, ssqtable, EVRIModel_Objects