Exteriorpts

From Eigenvector Research Documentation Wiki
Revision as of 21:48, 24 May 2016 by imported>Donal (→‎Description)

Purpose

Finds points on the exterior of a normalized data space.

Synopsis

[isel,loads] = exteriorpts(x,ncomp,options)

Description

Given a two-way or higher-order data set (X), the most exterior samples or variables are identified and their indices returned.

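For a two-way data set, the data (X) are assumed to follow the bilinear model X = CS' + E, where C holds the component contributions, S the pure-component spectra, and E the residuals.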
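As a minimal illustration of this model, the sketch below builds a synthetic X from non-negative C and S plus a small residual. The sizes, peak positions, and noise level are assumptions chosen only for the example; nothing here comes from the toolbox.

```matlab
% Hypothetical sizes: 200 samples, 50 channels, 3 components.
M = 200; N = 50; K = 3;
C = rand(M, K);                    % non-negative contributions (M x K)

ch = 1:N;                          % channel index
centers = [10 25 40];              % assumed peak positions
S = zeros(N, K);                   % pure-component spectra (N x K)
for k = 1:K
    S(:, k) = exp(-0.5*((ch - centers(k))/4).^2)';   % Gaussian-shaped spectrum
end

E = 0.001*randn(M, N);             % small residual term
X = C*S' + E;                      % M x N two-way data set
```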

The method works as follows. Non-negative data all lie in the multivariate analog of the upper-right-hand quadrant and, given sufficient selectivity in the data, the pure-component spectra (a.k.a. end-members) must lie at the exterior of the data cloud.

A) First, normalize each data point to unit 1-norm, which constrains the responses to a hyper-plane, and
B) remove data points with low norm, which are the most likely to be affected by noise [see options.minnorm]. (An alternative is to add a small offset to all the data to 'push them' towards the center of the data cloud.)

At this point the data are transformed from looking like a "snow-cone" with its point at the origin to looking like a "hyper-pyramid" with the end-members corresponding to the corners.

C) Next, the 1-normed data are mean-centered so that the hyper-plane is centered at [0,0,...]. This transforms the problem from finding points on the exterior of a data cloud to finding points at the vertices of a hyper-polygon, which is done using the DISTSLCT function (called from EXTERIORPTS).
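Steps A to C can be sketched in plain MATLAB as shown below. This is an illustration under stated assumptions, not the EXTERIORPTS source: the toy data are invented, the low-norm threshold is one reading of the minnorm description, and the final loop is a simple greedy farthest-point selection used as a stand-in for DISTSLCT, whose actual algorithm is not described here.

```matlab
% Toy non-negative data: mixtures of 3 assumed end-members (rows of St).
St = [1 0 0 0; 0 1 1 0; 0 0 1 2];             % 3 end-members x 4 channels
C  = [eye(3); 0.5 0.5 0; 0.2 0.3 0.5; 1/3 1/3 1/3; 0.01 0 0];
X  = C*St;                                    % 7 samples x 4 channels; row 7 is very weak

minnorm = 0.03;                               % mirrors the options.minnorm default
n1   = sum(abs(X), 2);                        % 1-norm of each row
keep = find(n1 >= minnorm*max(X(:)));         % B) drop low-norm rows (assumed threshold rule)
Xn   = bsxfun(@rdivide, X(keep,:), n1(keep)); % A) scale each kept row to unit 1-norm
Xc   = bsxfun(@minus, Xn, mean(Xn, 1));       % C) mean-center the normalized rows

% Stand-in for DISTSLCT: greedy farthest-point selection (an assumption).
ncomp = 3;
sel = zeros(1, ncomp);
[~, sel(1)] = max(sum(Xc.^2, 2));             % start with the point farthest from the center
d = inf(size(Xc, 1), 1);
for k = 2:ncomp
    dk = bsxfun(@minus, Xc, Xc(sel(k-1),:));
    d  = min(d, sqrt(sum(dk.^2, 2)));         % distance to the nearest selected point
    [~, sel(k)] = max(d);                     % next point is the one farthest from all selected
end
isel = keep(sel);                             % indices into the original rows of X
```

In this toy case the weak seventh row is removed before selection, and the selected indices are the three pure rows, which sit at the corners of the normalized data.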

For additional information see Gallagher, NB, Shaver, JM, Martin, EB, Morris, J, Wise, BM, Windig, W, "Curve resolution for images with applications to TOF-SIMS and Raman," Chemo. and Intell. Lab. Sys., 77(1), 105-117 (2003).

Inputs

  • x = MxN matrix (or a higher-order array, see Description).
  • ncomp = number of components to extract.

Optional Inputs

  • options = a standard options structure containing one or more of the fields discussed in the Options section below.

Outputs

  • isel = if selectdim option was non-empty, isel is a vector of the selected indices. Otherwise, isel is a cell array with the indices selected on each mode of the data.
  • loads = cell array with the extracted points/factors. Modes other than selectdim are determined via projection.
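The projection mentioned for loads can be pictured with an ordinary least-squares fit. The sketch below is an assumption about the general idea, not the toolbox code: the toy data and the "selected" indices are invented, and plain least squares is used rather than the non-negative variant controlled by usennls.

```matlab
X    = [1 0 0; 0 2 0; 0 0 3; 1 2 3];            % 4 samples x 3 channels (toy data)
isel = [1 2 3];                                  % pretend these rows were selected

St = X(isel, :);                                 % selected rows, one factor per row (K x N)
St = bsxfun(@rdivide, St, sqrt(sum(St.^2, 2)));  % scale each factor to unit 2-norm
C  = X / St;                                     % least-squares solution of C*St = X
resid = norm(X - C*St, 'fro');                   % size of the unexplained part
```

Here the selected rows play the role of the factors for one mode, and C is the matching set of factors for the other mode obtained by projecting X onto them.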

Options

options = a structure array with the following fields:

  • selectdim: [1] mode of the data from which items should be selected (i.e., 1=rows, 2=columns, ...). If empty [], all modes are analyzed and the mode with the largest sum-squared captured value is used.
  • waitbar: [ 'off' | 'on' | {'auto'} ] governs use of the waitbar while processing. 'auto' uses the waitbar only if multiple modes are being analyzed with n-way data.
  • minnorm: [ 0.03 ] approximate noise level; points with unit area smaller than this (as a fraction of the maximum value in x) are ignored during selection.
  • usepca: [{'no'}| 'yes' ] governs use of PCA as a pre-filtering step on the data prior to selection.
  • usennls: [{'no'}| 'yes' ] governs use of non-negative least squares when calculating loadings for other-than-sample modes. Only used when (loads) output is requested.
  • distmeasure: [ {'Euclidian'} | 'Mahalanobis' ] Governs the type of distance measurement to use. Mahalanobis requires the usepca option to be 'yes'.
  • samplemode: [ 1 ] mode that contains variance (factors for other modes are normalized to unit 2-norm). Only used when loads output is requested.
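A short usage sketch follows, using only the field names listed above. The data are a random placeholder, and the call is guarded so the snippet also runs when PLS_Toolbox is not on the path; treat it as an illustration of the calling pattern rather than a recommended configuration.

```matlab
x     = abs(randn(200, 50));          % placeholder non-negative data (hypothetical)
ncomp = 3;                            % number of components to extract

options = struct();
options.selectdim   = 1;              % select from rows
options.minnorm     = 0.03;           % ignore low-signal points
options.usepca      = 'yes';          % needed for the Mahalanobis distance
options.distmeasure = 'Mahalanobis';

if exist('exteriorpts', 'file')       % call only if the function is available
    [isel, loads] = exteriorpts(x, ncomp, options);
end
```

Because selectdim is non-empty here, isel would be returned as a vector of selected row indices rather than a cell array.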

See Also

als, distslct, mcr, parafac, purity, purityengine