Function Reference Manual: Difference between revisions

Revision as of 15:34, 11 August 2008

cluster

Purpose

Agglomerative and K-means cluster analysis with dendrograms.

Synopsis

[results,fig] = cluster(data,labels,options)

[results,fig] = cluster(data,options)

options = cluster('options')

Description

cluster(data) performs a cluster analysis using either one of six agglomerative methods (including K-Nearest Neighbor (KNN), furthest neighbor, and Ward's method) or the K-means clustering algorithm, and plots a dendrogram. The input data can be class “double” or “dataset”.
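The following is a minimal, toolbox-free sketch of how agglomerative merging works. It is not the implementation used by cluster. The function name simple_linkage and its 'nearest'/'furthest' method names are hypothetical. We also assume that the 'knn' and 'fn' options correspond to nearest-neighbor (single-linkage) and furthest-neighbor (complete-linkage) merging. The sketch repeatedly merges the two closest groups and records each merge in the three-column linkage layout described under Outputs, where the group formed in row j is numbered m+j.

```matlab
function Z = simple_linkage(X, method)
% SIMPLE_LINKAGE  Hypothetical, toolbox-free sketch of agglomerative
% clustering (not the cluster implementation).
%   X      : M-by-N data matrix, one sample per row
%   method : 'nearest'  - merge distance = closest pair of members
%            'furthest' - merge distance = farthest pair of members
%   Z      : (M-1)-by-3 linkage table [groupA groupB distance]; the
%            group formed in row j is numbered M+j
m = size(X, 1);
D = zeros(m);
for i = 1:m                                  % Euclidean distance matrix
    delta = bsxfun(@minus, X, X(i,:));
    D(:,i) = sqrt(sum(delta.^2, 2));
end
D(1:m+1:end) = Inf;                          % never merge a group with itself
ids = 1:m;                                   % group number held by each row
Z = zeros(m-1, 3);
for j = 1:m-1
    [dmin, k] = min(D(:));                   % closest pair of active groups
    [a, b] = ind2sub([m m], k);
    Z(j,:) = [sort([ids(a) ids(b)]) dmin];
    switch method                            % distance from the merged group
        case 'nearest',  dnew = min(D(a,:), D(b,:));
        case 'furthest', dnew = max(D(a,:), D(b,:));
        otherwise, error('simple_linkage: unknown method "%s"', method);
    end
    D(a,:) = dnew;  D(:,a) = dnew';          % row a now holds the merged group
    D(a,a) = Inf;
    D(b,:) = Inf;   D(:,b) = Inf;            % retire row/column b
    ids(a) = m + j;
end
end
```

Consider simple_linkage([0; 1; 5; 6.5], 'nearest'). It merges samples 1 and 2 at distance 1, then samples 3 and 4 at 1.5, and finally groups 5 and 6 at 4. With 'furthest', the last merge occurs at 6.5 instead. If the Statistics Toolbox is available, linkage(pdist(X),'single') and linkage(pdist(X),'complete') produce tables with the same layout, which can serve as a cross-check.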

Optional input labels can be used to put labels on the dendrogram plots. If data is M by N, labels must be a character array with M rows (one label per sample). When labels is not specified and data is class “double”, the dendrogram is plotted using sample numbers. When labels is not specified and data is class “dataset”, the dendrogram is plotted using the dataset's sample labels; if those labels are empty, sample numbers are used instead.
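The labels input must be a padded character array, not a cell array of strings, so it can help to build it explicitly. The sketch below uses hypothetical random data and made-up label text. The call to cluster is left commented out because it requires the toolbox that provides this function.

```matlab
% Hypothetical example data: M = 5 samples (rows) by N = 3 variables
data = rand(5, 3);
% Build one label per sample. char() pads the strings with trailing
% blanks so they form a character array with one row per sample.
names = arrayfun(@(k) sprintf('Sample %d', k), 1:size(data,1), ...
                 'UniformOutput', false);
labels = char(names);
% labels must have exactly M rows, matching the rows of data
assert(size(labels, 1) == size(data, 1));
% With the toolbox that provides cluster on the path:
% [results, fig] = cluster(data, labels);
```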

By default (plots = 'final'), cluster also plots a dendrogram showing the distances between samples.

Note: Calling cluster with no inputs starts the graphical user interface (GUI) for this analysis method.

Outputs

The outputs are results, a structure containing the results of the clustering (described below), and fig, the handle to any plot created. The results structure contains the following fields:

dist : the distance threshold at which each cluster forms.

class : the class assigned to each sample (columns of class) at each distance (rows of class).

order : the order of the samples which locates similar samples nearest to each other (this is the order used for the plots).

linkage : a table of linkages in which each row represents one group, formed by linking one sample or group to another. The first two columns give the sample or group numbers that were linked to form the group, and the final column gives the distance between the linked items. Group numbers start at m+1 (where m is the number of samples in the input data matrix); thus, row j of this matrix is group number m+j. This matrix can be used with the Statistics Toolbox dendrogram function.

The results.class matrix can be used together with results.dist to determine the clusters of samples at any distance, using:

results = cluster(data);                      %do cluster
threshold = 1;                                %user-desired distance threshold (example value)
ind = find(results.dist<threshold,1,'last');  %last distance below the threshold
thisclass = results.class(ind,:);             %class of each sample at that distance
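The linkage table can also be traversed without any toolbox. The hypothetical helper linkage_members (not part of cluster) lists the original sample numbers contained in a given group. It relies only on the layout described above: columns 1 and 2 hold the linked sample or group numbers, and row j describes group m+j. A table alone does not reliably reveal m, so the helper takes the number of samples as an argument.

```matlab
function members = linkage_members(Z, g, m)
% LINKAGE_MEMBERS  Hypothetical helper (not part of cluster): return the
% sample numbers contained in group g of the linkage table Z.
%   Z : linkage table; columns 1-2 hold the linked sample/group numbers
%   g : sample number (1..m) or group number (m+1 and above)
%   m : number of samples in the data that was clustered
if g <= m
    members = g;                       % an original sample is its own group
else
    pair = Z(g - m, 1:2);              % row j of Z defines group m+j
    members = sort([linkage_members(Z, pair(1), m), ...
                    linkage_members(Z, pair(2), m)]);
end
end
```

For example, with m = size(data,1), linkage_members(results.linkage, m+1, m) returns the two samples joined by the first merge.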

Options

Input options is a structure array with the following fields:

plots : ['none' | {'final'}] governs plotting. When set to 'none', only the distance/cluster results are returned; 'final' also produces a dendrogram plot showing the sample distances.
algorithm : clustering algorithm, one of:
  'knn' : K-Nearest Neighbor {default}
  'fn' : Furthest Neighbor
  'avgpair' : Average Paired Distance
  'med' : Median
  'cnt' : Centroid
  'ward' : Ward's Method
  'kmeans' : K-means
preprocessing : {[]} preprocessing structure or keyword (see PREPROCESS).
pca : [{'off'} | 'on'] if 'on', CLUSTER performs PCA first and clusters the scores.
ncomp : {[]} number of PCA factors to use. When empty (the default), the user is prompted to select the number of factors from the SSQ table.
mahalanobis : [{'off'} | 'on'] if 'on', a Mahalanobis distance on the PCA scores is used.
slack : [0] integer indicating how many samples can be "overridden" when two class branches merge. If the smaller of the two classes has no more than this number of samples, its branch is absorbed into the larger class. This feature is only valid when classes are supplied in the input data. A value of 0 (zero) disables this feature.
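The following hypothetical sketch shows what combining the pca and mahalanobis options amounts to. It is not a toolbox function. It makes two assumptions: the data are mean-centered before PCA (cluster itself applies whatever options.preprocessing specifies), and k does not exceed the rank of the centered data. Because PCA scores are uncorrelated, their covariance matrix is diagonal. Dividing each score column by its standard deviation therefore turns the Mahalanobis distance into an ordinary Euclidean distance.

```matlab
function D = mahal_score_dist(X, k)
% MAHAL_SCORE_DIST  Hypothetical sketch: project samples onto k principal
% components, then compute Mahalanobis distances between samples in that
% score space. Assumes mean-centering and k <= rank of the centered data.
n = size(X, 1);
Xc = bsxfun(@minus, X, mean(X, 1));      % mean-center each variable
[U, S] = svd(Xc, 'econ');                % PCA via the SVD
T = U(:,1:k) * S(1:k,1:k);               % scores on the first k components
% Scores are uncorrelated, so scaling each column by its standard
% deviation makes Euclidean distance equal Mahalanobis distance.
Tn = bsxfun(@rdivide, T, std(T, 0, 1));
D = zeros(n);
for i = 1:n
    delta = bsxfun(@minus, Tn, Tn(i,:));
    D(:,i) = sqrt(sum(delta.^2, 2));
end
end
```

When all components are kept, the result matches the Mahalanobis distance computed on the original data. It is therefore unchanged if a variable is rescaled, which is the usual motivation for this option.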

The default options can be retrieved using: options = cluster('options');
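As a sketch of the two-argument form from the Synopsis, the defaults can be modified field by field before calling cluster. The data are random and hypothetical, and the chosen option values are only illustrative. Setting ncomp explicitly avoids the interactive SSQ-table prompt that occurs when pca is 'on' and ncomp is empty.

```matlab
% Hypothetical data: 20 samples by 5 variables
data = rand(20, 5);
options = cluster('options');   % default options structure
options.algorithm = 'ward';     % Ward's method instead of the 'knn' default
options.pca       = 'on';       % cluster on PCA scores ...
options.ncomp     = 3;          % ... using 3 factors, so no SSQ-table prompt
options.plots     = 'none';     % return results without a dendrogram
results = cluster(data, options);
```

The Statistics Toolbox also ships a function named cluster. If both toolboxes are installed, which cluster shows which one MATLAB will call.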

See Also

agcluster, analysis, corrmap, dendrogram, gcluster, simca
