Cluster img

From Eigenvector Research Documentation Wiki

Purpose

Perform automatic clustering of an image using sample distances.

Synopsis

[clas,basis,info,h] = cluster_img(x,nclusters,options)
clas = cluster_img(x,basis,options)

Description

Performs partitional clustering (also known as Divisive Cluster Analysis, or DCA) in which samples are segregated into some pre-defined number of clusters (input as nclusters) based on their distance to the mean of each cluster.

The algorithm first uses the DISTSLCT method to identify the specified number of unique samples (pixels) in the image and takes their responses as the initial target spectra for the clusters. Every other pixel is then assigned to the cluster whose target spectrum it is closest to.

When using the K-means algorithm, the mean of each cluster is then calculated and used as the new target spectrum for that cluster, and all pixels are re-evaluated for their cluster assignments (some pixels may change assignment because the target spectra have changed). This process is repeated until pixels stop changing classes or the target spectra stop changing. The final mean of each cluster is considered the basis, which can be used to make class assignments for new samples.
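The loop described above can be sketched in a few lines. This is an illustrative Python sketch, not the toolbox's code: the helper names (`select_seeds`, `assign`, `cluster`) are hypothetical, Euclidean distance is assumed, and the seed selection is a simple farthest-point rule standing in for DISTSLCT, whose actual selection criterion is not described here.

```python
import math

def dist(a, b):
    """Euclidean distance between two spectra (assumed distance measure)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def select_seeds(pixels, k):
    """Farthest-point selection: a stand-in for DISTSLCT (assumption)."""
    n = len(pixels)
    mean = [sum(col) / n for col in zip(*pixels)]
    # Start with the pixel farthest from the overall mean.
    seeds = [max(range(n), key=lambda i: dist(pixels[i], mean))]
    while len(seeds) < k:
        # Add the pixel whose nearest already-chosen seed is farthest away.
        seeds.append(max(
            range(n),
            key=lambda i: min(dist(pixels[i], pixels[s]) for s in seeds)))
    return seeds

def assign(pixels, targets):
    """Assign each pixel to the index of its closest target spectrum."""
    return [min(range(len(targets)), key=lambda c: dist(p, targets[c]))
            for p in pixels]

def cluster(pixels, k, algorithm="kmeans", max_iter=100):
    seeds = select_seeds(pixels, k)
    targets = [list(pixels[i]) for i in seeds]
    classes = assign(pixels, targets)
    if algorithm == "kmeans":
        for _ in range(max_iter):
            # Replace each target with the mean of its current members.
            new_targets = []
            for c in range(len(targets)):
                members = [p for p, cl in zip(pixels, classes) if cl == c]
                if members:
                    new_targets.append(
                        [sum(col) / len(members) for col in zip(*members)])
                else:
                    new_targets.append(targets[c])
            new_classes = assign(pixels, new_targets)
            # Stop when assignments or targets no longer change.
            done = new_classes == classes or new_targets == targets
            targets, classes = new_targets, new_classes
            if done:
                break
    # With algorithm="distslct" the seed spectra are returned unchanged.
    return classes, targets, seeds

# Two well-separated groups of three "pixels", two channels each.
pixels = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
classes, basis, seeds = cluster(pixels, 2)
```

In this sketch the returned `basis` plays the role of the final cluster means, and passing `algorithm="distslct"` skips the mean-update loop so that the selected pixels themselves remain the targets.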

If the algorithm is DISTSLCT, the class means do not replace the target spectra: the samples initially selected remain the targets.

This function normally also applies a robustness test, discarding any target spectra that accumulate less than a specified percentage of the samples. Thus, very unusual pixels (samples) that are unlike any significant portion of the other pixels cannot create useless clusters with no appreciable membership. See the minarea option, below.
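As a rough illustration of this test, the Python sketch below keeps only the targets whose share of the samples reaches a threshold given in percent. The function name `targets_to_keep` is hypothetical, and the sketch says nothing about how the toolbox reassigns the pixels of a discarded target; it only shows the membership check itself.

```python
def targets_to_keep(classes, n_targets, minarea=1.0):
    """Return indices of targets holding at least `minarea` percent of samples."""
    n_samples = len(classes)
    counts = [0] * n_targets
    for c in classes:
        counts[c] += 1
    return [t for t in range(n_targets)
            if 100.0 * counts[t] / n_samples >= minarea]

# 200 pixels: class 2 holds a single pixel (0.5 % of the image).
classes = [0] * 99 + [1] * 100 + [2] * 1
kept = targets_to_keep(classes, n_targets=3, minarea=1.0)
```

Here the single-pixel class falls below the 1 % threshold, so only targets 0 and 1 would be retained.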

Working With the Results Plot

The plot generated by a call to Partitional Clustering contains the classification results as well as information about the target spectra (basis) and correlations to those targets:

  • Class Assignments - The plot is based on a DataSet object in which each sample/pixel has been assigned to a class corresponding to the target it was closest to during the clustering. The numerical data in the DataSet is the correlation of each sample/pixel with each of the target spectra (class means) plus the mean of all those correlations. The correlation serves as a rough indication of how strongly each pixel is assigned to each class. When classes are being shown (View > Classes in Plot Controls) the color of each pixel indicates its class and the intensity indicates the correlation of that pixel to the class. In many cases this intensity relates to the signal level in the sample.
    The underlying classes can be viewed and extracted by editing the DataSet object itself (Menu: File > Edit Data in the Plot Controls) and then viewing the Row Labels. The Class column contains the class assignments of each sample. This can be copied and pasted into the original X-data, or exported for other analysis.
  • Class Basis / Targets - A toolbar button on the figure provides access to the basis spectra (i.e. the targets, which are the means of each class). They can be saved or exported through the Plot Controls or DataSet Editor and can be used as the calibration data in, for example, a PLSDA, KNN, or CLS model.
  • Class Statistics - The second toolbar button on the figure plots a histogram of how many samples were assigned to each class. This information is also available in text form by right-clicking any sample and choosing "View Class Statistics".
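To make the plotted quantities concrete, the Python sketch below builds one row per pixel containing its correlation with each target followed by the mean of those correlations, and counts the members of each class as the histogram does. It assumes the Pearson correlation coefficient, which is not stated above, and the function names are hypothetical.

```python
import math
from collections import Counter

def pearson(a, b):
    """Pearson correlation of two spectra (assumed correlation measure)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat spectrum has no defined correlation
    return cov / math.sqrt(var_a * var_b)

def correlation_table(pixels, targets):
    """One row per pixel: correlation to each target, then their mean."""
    rows = []
    for p in pixels:
        corrs = [pearson(p, t) for t in targets]
        rows.append(corrs + [sum(corrs) / len(corrs)])
    return rows

def class_counts(classes):
    """Number of samples assigned to each class, sorted by class index."""
    return sorted(Counter(classes).items())

pixels = [[1, 2, 3], [3, 2, 1]]
targets = [[2, 4, 6], [6, 4, 2]]
table = correlation_table(pixels, targets)
counts = class_counts([0, 1, 1, 2, 1])
```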

Inputs

  • [clas,basis,info,h] = cluster_img(x,nclusters,options) - The data to cluster (x), the total number of clusters to form (nclusters), and an optional (options) structure (see below), or
  • [clas,basis,info,h] = cluster_img(x,max_fract,options) - The data to cluster (x), the maximum fraction of samples allowed in a cluster (max_fract), and an optional (options) structure (see below), or
  • clas = cluster_img(x,basis,options) - A "prediction" call given the data to cluster (x), a previously calculated basis set returned by cluster_img (basis), and an optional (options) structure.

Outputs

Outputs are a vector of integer classes for each sample in x (clas), the corresponding normalized basis spectra (basis), information on the targets and correlations (info), and the handle of the figure, if one was created (h). The output info is a structure containing the following fields:

  • corr = the correlation of each pixel to each target.
  • targets = the indices of the pixels used as targets.

Options

options = a structure array with the following fields:

  • plots: [ 'none' |{'final'}] governs plotting of results.
  • algorithm: [ 'distslct' |{'kmeans'}] algorithm for determining classes. 'distslct' is based solely on the most unique samples; 'kmeans' iteratively adjusts each class target to the mean of the class.
  • minarea: { 1 } minimum area (in %) that a class must account for to be retained as a unique class.
  • preprocessing: { [] } preprocessing structure (see PREPROCESS).
  • pca: [ {'no'} | 'yes' ] When 'yes', a PCA model is calculated from the data and the scores are used to perform clustering. The output basis is in terms of the original variables.
  • ncomp: [ 1 ] Number of PCs (components) to retain from the PCA model. Only used if options.pca is 'yes'.

See Also

cluster, distslct, knn