Function Reference Manual: Difference between revisions

From Eigenvector Research Documentation Wiki
imported>WikiAdmin
 
imported>Jeremy
(#REDIRECT PLS_Toolbox_Topics)
 
(13 intermediate revisions by one other user not shown)
== cluster ==
 
 
'''Purpose'''
 
Agglomerative and K-means cluster analysis with dendrograms.
 
'''Synopsis'''
 
 
:[results,fig] = cluster(data,labels,options)

:[results,fig] = cluster(data,options)
 
:options = cluster('options')
 
'''Description'''
 
''cluster(data)'' performs a cluster analysis using either one of six agglomerative
methods (including K-Nearest-Neighbor (KNN), furthest neighbor, and Ward's
method) or the K-means clustering algorithm, and plots a dendrogram. The input ''data'' is of class double or
dataset.
 
The optional input ''labels'' puts labels on the
dendrogram plots. For ''M'' by ''N'' data, ''labels'' must be a
character array with ''M'' rows. When ''labels'' is not specified and ''data'' is class "double", the
dendrogram is plotted using sample numbers. When ''labels'' is not specified
and ''data'' is class
"dataset", the dendrogram is plotted using the sample labels stored in the dataset. If the labels field of the dataset is empty,
sample numbers are used instead.
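As a minimal sketch of the labelled call described above: the data values and label strings below are made up for illustration, and the snippet assumes that the PLS_Toolbox version of cluster is the one found first on the MATLAB path (other toolboxes also ship a function with this name).

```matlab
% Illustrative data: M = 4 samples (rows) by N = 2 variables (columns).
data = [1.0 1.1;
        1.2 0.9;
        5.0 5.2;
        5.1 4.9];

% labels must be a character array with one row per sample (M rows).
% char() pads shorter strings with blanks so all rows have equal length.
labels = char('A1','A2','B1','B2');

% Sanity check before calling: one label row per sample.
assert(size(labels,1) == size(data,1));

% Requires PLS_Toolbox; plots a dendrogram labelled with the strings above.
[results,fig] = cluster(data,labels);
```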
 
The output is a dendrogram showing the sample distances.
 
Note: Calling cluster with no inputs starts the graphical user interface (GUI) for this analysis
method.
 
OUTPUTS:
 
The outputs are (results) a structure containing results of
the clustering (defined below) and the handle (fig) to any plot created. The
results structure will contain the following fields:
 
<p class="optionsbody"><span style="font-size: 10.0; font-family: Monaco,Courier">dist</span>: the distance threshold at which each
cluster forms.

<p class="optionsbody"><span style="font-size: 10.0; font-family: Monaco,Courier">class</span>: the classes of each sample (columns of class) for each distance
(rows of class).

<p class="optionsbody"><span style="font-size: 10.0; font-family: Monaco,Courier">order</span>: the
order of the samples which locates similar samples nearest to each other (this
is the order used for the plots).

<p class="optionsbody"><span style="font-size: 10.0; font-family: Monaco,Courier">linkage</span>: a
table of linkages where each row indicates a linkage of one group to another.
Each row in the matrix represents one group. The first two columns indicate the
sample or group numbers which were linked to form the group. The final column
indicates the distance between linked items. Group numbers start at m+1 (where
m is the number of samples in the input data matrix); thus, row j of this matrix
is group number m+j. This matrix can be used with the Statistics Toolbox
dendrogram function.
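The group numbering in the linkage table can be illustrated with a small hand-written matrix. The values below are made up and are not output from cluster; they only follow the row layout described above (two linked items, then the distance).

```matlab
% Hypothetical linkage table for m = 4 samples (values are illustrative only).
m = 4;
linkage = [1 2 0.5;    % row 1: samples 1 and 2 join -> group m+1 = 5
           3 4 0.8;    % row 2: samples 3 and 4 join -> group m+2 = 6
           5 6 1.7];   % row 3: groups 5 and 6 join  -> group m+3 = 7

% Row j of the table defines group number m+j.
groupnum = m + (1:size(linkage,1))';

for j = 1:size(linkage,1)
    fprintf('group %d = item %d + item %d at distance %.2f\n', ...
        groupnum(j), linkage(j,1), linkage(j,2), linkage(j,3));
end

% With the Statistics Toolbox installed, a matrix in this layout can be
% passed to its dendrogram function, e.g. dendrogram(linkage).
```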
 
The (results.class) matrix can be used with the
(results.dist) matrix to determine clusters of samples for any distance using:
 
<p class="MATLABCommand">results = cluster(data); % perform the cluster analysis

<p class="MATLABCommand">ind = max(find(results.dist&lt;threshold)); % threshold is the user-desired distance

<p class="MATLABCommand">thisclass = results.class(ind,:); % class of each sample at that distance
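The indexing in the commands above can be tried without the toolbox by using a hand-built stand-in for the results structure. The dist and class values below are hypothetical, and they assume that dist is sorted in increasing order with one row of class per entry of dist.

```matlab
% Hypothetical stand-in for the output of cluster (illustrative values only).
results.dist  = [0; 0.5; 0.8; 1.7];   % distance at which each cluster forms
results.class = [1 2 3 4;             % rows: one per distance
                 1 1 2 3;             % columns: one per sample
                 1 1 2 2;
                 1 1 1 1];

threshold = 1.0;                      % user-desired distance threshold

% Last row whose distance is below the threshold (same result as max(find(...))).
ind = find(results.dist < threshold, 1, 'last');

if isempty(ind)
    error('No distance in results.dist is below the threshold.');
end

thisclass = results.class(ind,:);     % class of each sample at that distance
nclusters = numel(unique(thisclass)); % number of distinct clusters
```

The guard on an empty index is an addition for the sketch: if the threshold is below every entry of dist, indexing with an empty ind would otherwise return an empty class vector without any warning.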
 
<p class="Ref2">Options
 
<p class="optionsbody">''options'' = a structure array with the following fields:

<p class="optionsbody"><span style="font-size: 10.0; font-family: Monaco,Courier">plots</span>: [ 'none' | {'final'} ] Governs plotting. When set to 'none', only the
distance/cluster matrix is returned; 'final' also produces a dendrogram plot showing
sample distances.
 
<p class="optionsbody"><span style="font-size: 10.0; font-family: Monaco,Courier">algorithm</span>: [] clustering algorithm, one of:

<p class="optionsbody">&nbsp;&nbsp;&nbsp;&nbsp;'knn' : K-Nearest Neighbor {DEFAULT}

<p class="optionsbody">&nbsp;&nbsp;&nbsp;&nbsp;'fn' : Furthest Neighbor

<p class="optionsbody">&nbsp;&nbsp;&nbsp;&nbsp;'avgpair' : Average Paired Distance

<p class="optionsbody">&nbsp;&nbsp;&nbsp;&nbsp;'med' : Median

<p class="optionsbody">&nbsp;&nbsp;&nbsp;&nbsp;'cnt' : Centroid

<p class="optionsbody">&nbsp;&nbsp;&nbsp;&nbsp;'ward' : Ward's Method

<p class="optionsbody">&nbsp;&nbsp;&nbsp;&nbsp;'kmeans' : K-means
 
<p class="optionsbody"><span style="font-size: 10.0; font-family: Monaco,Courier">preprocessing</span>: {[]} Preprocessing structure
or keyword (see PREPROCESS),

<p class="optionsbody"><span style="font-size: 10.0; font-family: Monaco,Courier">pca</span>: [ {'off'} | 'on' ] if 'on' then CLUSTER performs PCA first and clustering on the
scores,

<p class="optionsbody"><span style="font-size: 10.0; font-family: Monaco,Courier">ncomp</span>: [] number of PCA factors to use {default = [], the user is
prompted to select the number of factors from the SSQ table},

<p class="optionsbody"><span style="font-size: 10.0; font-family: Monaco,Courier">mahalanobis</span>: [ {'off'} | 'on' ] if 'on'
then a Mahalanobis distance on the scores is used,

<p class="optionsbody"><span style="font-size: 10.0; font-family: Monaco,Courier">slack</span>: [0] integer number indicating how many samples can be
"overridden" when two class branches merge. If the smaller of the two
classes has no more than this number of samples, the branch will be absorbed
into the larger class. This feature is only valid when classes are supplied in
the input data. A value of 0 (zero) disables this feature.
 
<p class="optionsbody">&nbsp;
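To make the pca and mahalanobis options more concrete, the following sketch computes PCA scores by hand and compares the Euclidean and Mahalanobis distances between two samples. It is plain MATLAB, not the toolbox's internal code: the data are made up, the scores come from a singular value decomposition of the mean-centered data, and scaling each score column to unit variance is assumed to be what "a Mahalanobis distance on the scores" amounts to.

```matlab
% Illustrative data: 6 samples by 3 variables (values are made up).
X = [ 2  0  1;
      0  1  0;
     -2  0 -1;
      0 -1  0;
      1  1  1;
     -1 -1 -1];

% Mean-center the columns (bsxfun keeps this compatible with older releases).
Xc = bsxfun(@minus, X, mean(X,1));

% PCA via SVD: columns of V are the loadings, T holds the scores.
ncomp = 2;
[U,S,V] = svd(Xc,'econ');
T = Xc*V(:,1:ncomp);

% Euclidean distance between samples 1 and 2 in score space.
dEuc = norm(T(1,:) - T(2,:));

% Mahalanobis distance: scale each score column to unit variance first.
Tw   = bsxfun(@rdivide, T, std(T,0,1));
dMah = norm(Tw(1,:) - Tw(2,:));

% Cross-check against the textbook form sqrt(d*inv(C)*d') with C = cov(T).
d    = T(1,:) - T(2,:);
dRef = sqrt((d/cov(T))*d');
```

Because PCA scores are uncorrelated, their covariance matrix is diagonal, so dividing each score column by its standard deviation gives the same distance as the full covariance-based formula.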
 
The default options can be retrieved using: options = cluster('options');
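A typical way to use the options structure is to retrieve the defaults, change a few fields, and pass the structure back in. The sketch below uses only field names and values listed above; the data matrix is random and purely illustrative, and the call requires PLS_Toolbox on the MATLAB path.

```matlab
% Illustrative data: 10 samples by 5 variables of random numbers.
data = rand(10,5);

% Start from the defaults, then override selected fields.
options           = cluster('options');
options.algorithm = 'ward';   % Ward's method instead of the default 'knn'
options.pca       = 'on';     % cluster on PCA scores instead of the raw data
options.ncomp     = 2;        % fix the number of factors to avoid the prompt
options.plots     = 'none';   % return results without drawing a dendrogram

% Requires PLS_Toolbox.
results = cluster(data,options);
```

Setting ncomp explicitly matters for scripted use: with the default of [] the user is prompted to choose the number of factors.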
 
<p class="Ref2">See Also
<span style="font-size: 10.0; font-family: Monaco,Courier"> agcluster, [analysis.html analysis], [corrmap.html corrmap], dendrogram, [gcluster.html gcluster], [simca.html simca]
</span>

Latest revision as of 18:34, 2 September 2008
