Tsne

Purpose

Create t-distributed Stochastic Neighbor Embeddings for visualization.

Synopsis

model = tsne(x,options); %identifies model
tsne %Launches Analysis window with TSNE selected

Please note that the recommended way to build a TSNE model from the command line is to use the Model Object. Please see this wiki page on building and applying models using the Model Object.
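For reference, a minimal command-line sketch of the Model Object workflow is shown below. It assumes the evrimodel constructor and calibrate step described on the EVRIModel Objects wiki page; exact property names may differ in your PLS_Toolbox version.

m = evrimodel('tsne');   %create an empty TSNE model object (assumed constructor)
m.x = x;                 %assign the x-block (class "double" or "dataset")
m.calibrate;             %build (calibrate) the model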

Description

TSNE is one of many tools to visualize high-dimensional data. Our software utilizes the Scikit-Learn implementation of the TSNE method. Their documentation can be found here: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html. Similarities between data samples in the original space are converted to joint probabilities, and the method minimizes the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and those of the data in the original space. The model returns embeddings with n_components dimensions. For example, for an M by N matrix, if the dimension of the embedded space (n_components) is K, the embeddings will be of shape M by K. This method cannot be applied to new data.

Note: The PLS_Toolbox Python virtual environment must be configured in order to use this method. Find out more here: Python configuration. At this time, Python methods cannot be terminated with the conventional CTRL+C once they have started building. Please take this into account and mind the workspace when using this method.

Inputs

  • x = X-block (2-way array class "double" or "dataset").

Optional Inputs

  • options = discussed below.

Outputs

The output of TSNE is a model structure with the following fields (see Standard Model Structure for additional information):

  • modeltype: 'TSNE',
  • datasource: structure array with information about input data,
  • date: date of creation,
  • time: time of creation,
  • info: additional model information,
  • description: cell array with text description of model, and
  • detail: sub-structure with additional model details and results.

Note: The embeddings of the TSNE model can be found under detail.tsne.embeddings.
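For example, a minimal sketch of building a model and pulling out the embeddings (the final plot call is illustrative only):

options = tsne('options');            %default options (see Options below)
model = tsne(x,options);              %x is an M by N "double" or "dataset"
emb = model.detail.tsne.embeddings;   %M by n_components embeddings (two columns by default)
plot(emb(:,1),emb(:,2),'.')           %quick scatter view of the embedded space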

Options

options = a structure array with the following fields:

  • display: [ 'off' | {'on'} ], Governs level of display to command window.
  • plots: [ 'none' | {'final'} ], Governs level of plotting.
  • warnings: [ {'off'} | 'on' ], Silence or display any potential Python warnings. Only visible in the MATLAB command window.
  • preprocessing: {[]}, Cell array containing a preprocessing structure (see PREPROCESS) defining preprocessing to use on the data (discussed below).
  • n_components: [ {'2'} ], Dimension of the low dimensional embedded space.
  • perplexity: [ {'30'} ], Number of nearest neighbors TSNE considers when calculating conditional probabilities.
  • learning_rate: [ {'200'} ], The learning rate for TSNE, usually in the range [10.0, 1000.0] (as recommended by Scikit-Learn).
  • early_exaggeration: [ {'12'} ], Controls the tightness of clusters in the embedded space and the distance between clusters.
  • n_iter: [ {'1000'} ], Maximum number of iterations for optimization.
  • n_iter_without_progress: [ {'300'} ], Maximum number of iterations without progress before aborting the optimization.
  • min_grad_norm: [ {'1e-7'} ], Gradient norm threshold; if the gradient norm falls below this value, the optimization is aborted.
  • metric: [ {'euclidean'} | 'manhattan' | 'cosine' | 'mahalanobis' ], The metric used to calculate distance between data samples.
  • init: [ {'random'} | 'pca' ], Initialization method for the embeddings.
  • random_state: [ {'1'} ], Random seed number. Set this to a number for reproducibility.
  • method: [ {'barnes_hut'} | 'exact' ], Gradient calculation algorithm.
  • angle: [ {'0.5'} ], Angular size of a distant node as measured from a point.
  • compression: [ {'none'} | 'pca' ], Type of data compression to perform on the x-block prior to calculating or applying the TSNE model. 'pca' uses a simple PCA model to compress the information.
  • compressncomp: [ {'2'} ], Number of latent variables (or principal components to include in the compression model).
  • compressmd: [ {'yes'} | 'no' ], Use Mahalanobis distance corrected scores from the compression model.

The default options can be retrieved using: options = tsne('options');.
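As a sketch, the defaults can be retrieved, individual fields changed, and the modified structure passed back to tsne. The particular settings shown here ('pca' initialization and 'cosine' distance) are illustrative, not recommendations:

options = tsne('options');   %retrieve the default options structure
options.init = 'pca';        %initialize the embeddings from PCA instead of random
options.metric = 'cosine';   %use cosine distance between data samples
model = tsne(x,options);     %build the model with the modified options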

PREPROCESSING

The preprocessing field can be empty [] (indicating that no preprocessing of the data should be used), or it can contain a preprocessing structure output from the PREPROCESS function. For example, options.preprocessing = {preprocess('default', 'autoscale')}. This information is echoed in the output model in the model.detail.preprocessing field.
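A short sketch combining the autoscale example above with a TSNE calibration:

options = tsne('options');                                     %start from the defaults
options.preprocessing = {preprocess('default','autoscale')};   %autoscale the x-block before embedding
model = tsne(x,options);                                       %preprocessing is echoed in model.detail.preprocessing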

See Also

umap, pca, python