Exteriorpts

From Eigenvector Research Documentation Wiki
Revision as of 09:04, 29 September 2011 by imported>Neal

Purpose

Finds points on the exterior of a normalized data space.

Synopsis

[isel,loads] = exteriorpts(x,ncomp,options)

Description

Given a two-way or higher-order data set (X), the most exterior samples or variables are identified and their indices returned.

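For a two-way data set, the data (X) are assumed to follow the bilinear model X = CS' + E, where C holds the component contributions, S the pure-component spectra, and E the residuals.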

The method works as follows. Non-negative data all lie in the multivariate analog of the upper right-hand quadrant and, given sufficient selectivity in the data, the pure-component spectra (a.k.a. end-members) must lie on the exterior of the data cloud.

A) First, normalize every data point to unit 1-norm, which constrains the responses to a hyper-plane, and
B) remove data points with a low norm, since these are the most likely to be affected by noise [see options.minnorm]. (An alternative is to add a small offset to all the data to 'push them' towards the center of the data cloud.)

At this point the data have been transformed from looking like a "snow-cone" with its point at the origin to looking like a "hyper-pyramid" with the end-members at the corners.

C) Next, the 1-normed data are mean-centered so that the hyper-plane is centered at [0,0,...]. This procedure transforms the problem from finding points on the exterior of a data cloud to finding points at the vertices of a hyper-polygon, which is done using the DISTSLCT function (called from EXTERIORPTS).
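
Steps A to C can be sketched in a few lines of base MATLAB. This is an illustrative sketch under stated assumptions, not the EXTERIORPTS source: the synthetic data, the variable names, the rule used to drop low-norm rows (a fraction of the largest row norm), and the greedy farthest-point loop standing in for DISTSLCT are all assumptions made for the example.

```matlab
% Synthetic non-negative data following X = C*S' (no noise).
S = [eye(3); 0.2 0.2 0.2];        % 4 variables x 3 hypothetical pure spectra
C = rand(50,3);                   % non-negative contributions
C(1:3,:) = eye(3);                % rows 1-3 are pure samples (the end-members)
X = C*S';                         % 50 samples x 4 variables

% A) 1-norm of every row; dividing by it puts each row on a hyper-plane.
nrm = sum(abs(X),2);

% B) Drop low-norm rows (assumed rule: below a fraction of the largest norm).
minnorm = 0.03;
keep = nrm >= minnorm*max(nrm);
Xn = bsxfun(@rdivide, X(keep,:), nrm(keep));

% C) Mean-center so the hyper-plane is centered on the origin.
Xc = bsxfun(@minus, Xn, mean(Xn,1));

% Stand-in for DISTSLCT (assumption): greedy farthest-point selection.
ncomp = 3;
sel = zeros(ncomp,1);
[~, sel(1)] = max(sum(Xc.^2,2));  % start with the row farthest from the center
dmin = inf(size(Xc,1),1);
for k = 2:ncomp
    d = sum(bsxfun(@minus, Xc, Xc(sel(k-1),:)).^2, 2);  % squared distance to latest pick
    dmin = min(dmin, d);          % squared distance to the nearest selected row
    [~, sel(k)] = max(dmin);      % next pick: row farthest from all picks so far
end
kept = find(keep);
isel = kept(sel);                 % indices into the original rows of X
```

In this noise-free construction the three pure rows are the only vertices of the normalized data, so the loop is expected to pick rows 1 to 3 in some order. With real, noisy data the choice of threshold and distance measure matters much more, which is what the minnorm, usepca, and distmeasure options address.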

Inputs

  • x = MxN matrix.
  • ncomp = number of components to extract.

Optional Inputs

  • options = a standard options structure containing one or more of the fields discussed in the Options section below.

Outputs

  • isel = if the selectdim option is non-empty, isel is a vector of the selected indices. Otherwise, isel is a cell array with the indices selected on each mode of the data.
  • loads = cell array with the extracted points/factors. Modes other than selectdim are determined via projection.

Options

options = a structure array with the following fields:

  • selectdim: [1] mode of the data from which items should be selected (i.e. 1=rows, 2=columns, ...). If empty [], all modes are analyzed and the mode with the largest sum-squared captured value is used.
  • waitbar: [ 'off' | 'on' | {'auto'} ] governs use of a waitbar during processing. 'auto' uses a waitbar only if multiple modes are being analyzed with n-way data.
  • minnorm: [ 0.03 ] approximate noise level. Points with unit area smaller than this (as a fraction of the maximum value in x) are ignored during selection.
  • usepca: [{'no'}| 'yes' ] governs use of PCA as a pre-filtering step on the data prior to selection.
  • usennls: [{'no'}| 'yes' ] governs use of non-negative least squares when calculating loadings for other-than-sample modes. Only used when (loads) output is requested.
  • distmeasure: [ {'Euclidian'} | 'Mahalanobis' ] governs the type of distance measurement to use. 'Mahalanobis' requires the usepca option to be 'yes'.
  • samplemode: [ 1 ] mode that contains variance (factors for other modes are normalized to unit 2-norm). Only used when loads output is requested.
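
A call with non-default options might look like the sketch below. It requires PLS_Toolbox, since exteriorpts is not a base MATLAB function. The data matrix is a random placeholder, and passing a struct holding only the fields to be changed is an assumption based on the statement above that the options structure contains "one or more" of the listed fields.

```matlab
% Placeholder data: 100 samples x 20 variables, non-negative (illustration only).
x = rand(100,20);
ncomp = 3;                        % number of components to extract

% Select rows (mode 1) and raise the noise threshold above the 0.03 default.
options = struct('selectdim',1,'minnorm',0.05);
[isel,loads] = exteriorpts(x,ncomp,options);

% Mahalanobis distance is documented as requiring the PCA pre-filtering step.
options.usepca      = 'yes';
options.distmeasure = 'Mahalanobis';
isel_mahal = exteriorpts(x,ncomp,options);
```

Because selectdim is non-empty in both calls, isel and isel_mahal should be vectors of selected row indices rather than cell arrays.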

See Also

als, distslct, mcr, parafac, purity, purityengine