Assigning Sample Classes

From Eigenvector Research Documentation Wiki

In many analysis applications, the groups or "classes" of samples in a data set are critical to the modeling and/or interpretation of results. In PCA, scores plots can be quickly interpreted for clustering if samples can be labeled/colored according to class. Likewise, the SIMCA, SVM-DA and PLS-DA methods rely on sample class information to generate models and assess results.

Solo and PLS_Toolbox make use of the free DataSet Object to associate a numerical class to each sample in a given set of data. These classes are automatically used when modeling and viewing your data and models. When viewing scores plots, for example, selecting View/Classes automatically uses unique symbols for each class.

Note that a class of 0 (zero) represents "unknown class" and should be used only for samples whose class you actually do not know.

Assigning classes is similar to assigning labels and axisscales. PLS_Toolbox offers several ways to assign classes to samples, described below.

DataSet Editor

From the Analysis window, the DataSet Editor can be opened using the menu item: Edit => Calibration or Validation => X-block Data (similarly, the "editds" command can be used from the MATLAB command line).

Because samples are associated with rows of the data, the classes for samples are assigned in the "Row Labels" tab. The column headed "Class" contains any classes for the samples. There are five ways of entering classes in the DataSet Editor:

(a) By Hand: Classes can be entered by hand in the cells of the Class column of the "Mode 1 Labels" tab. In PLS_Toolbox version 4.0+, you can also click the "Class" tab and select the Edit/Edit Class menu item to enter all classes at once as a MATLAB-formatted vector of numbers or strings.
(b) Use Label As Class: If the classes were text labels when you read in your data, they will usually be included in the DataSet as label sets. In the DataSet editor, you can quickly make these classes by simply viewing the label set you want to make into a class set (select the sets just below the "Label" header on the Row Labels tab of the DataSet Editor), then right-clicking the Label column header and selecting "Use As Class". The labels will be automatically added as classes.
(c) Cut-and-Paste: If the class values are already entered in a separate application such as Microsoft Excel or a text editor, start by using the Edit/Copy function of that program to copy the class values there. Next, select the "Class" header button at the top of the Class column. Finally, select the Edit/Paste menu item.
  • Numerical Classes: If a sufficient number of numerical classes were copied onto the clipboard, they will be pasted into the Class column.
  • Alphanumeric (String) Classes: If a set of strings is pasted into the class field, the strings will be converted into numerical classes, and the strings will be added to the class "lookup" table (so the strings will appear in legends and in the DataSet Editor).
(d) Use Column of Data: If the class values were loaded in as a column of your data (i.e. when you imported your data, the classes were one of the columns and now appear in the Data table), select the "Data" tab of the DataSet Editor, locate the column which contains the class values, click on the column header button, and select the menu item: Edit/Use as Class. The selected column will be moved from the data table into the class field.
(e) Load from MATLAB Workspace: If the class values are stored in a variable in the MATLAB workspace, they can be loaded directly. From the "Mode 1 Labels" tab, select the "Class" header button at the top of the Class column. Select the menu item: File/Load Class. Locate the MATLAB variable using the load dialog and select "Load". (Hint: if the classes have already been read into another DataSet object, they can be copied from that object using this same method.)
Note that strings can be assigned to numerically-assigned classes by editing the Class Lookup table. Select the Class column header (or right-click it), then select the Edit/Class Lookup Table menu item.
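
The relationship between string classes, numerical classes, and the class lookup table can be illustrated with a minimal sketch in base MATLAB. It does not call PLS_Toolbox, and the numbering PLS_Toolbox actually assigns may differ; the variable names (labels, classes, lookup) are hypothetical and chosen only for illustration.

```matlab
% Illustrative only: base MATLAB, no PLS_Toolbox calls.
labels = {'A' 'A' 'B' 'B' 'B' 'B' 'C' 'C'};   % hypothetical string classes, one per sample

% unique returns the sorted distinct strings and, for each sample,
% the index of its string within that sorted list.
[names, ~, idx] = unique(labels);
classes = idx(:)';                             % numerical class per sample (row vector)

% Lookup table: one row per class, pairing a class number with its string.
lookup = [num2cell(1:numel(names))' names(:)];

disp(classes)
disp(lookup)
```

A vector such as classes could then be pasted or loaded into the Class column by any of the methods above, with the strings restored through the Class Lookup table.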

Graphically

When viewing scores or even a plot of your data, you can use the Plot Controls to select points (Click "Select" and drag a box around the points to select), and then choose a numerical class for the selected samples by selecting Edit/Set Class of Selection from the Plot Controls figure. Note that you must be viewing the columns of the data to select classes for samples.

Command-line

Given a DataSet object in the MATLAB workspace, classes for samples can be set using the command:

x.class{1} = [1 1 2 2 2 2 3 3];

where x is the DataSet object and the values between the square brackets [ ] give the class of each sample, one value per row of the data.

Alternatively, you can assign string classes using a similar notation. Simply provide a cell array of strings (one string for each object/sample):

x.class{1} = {'A' 'A' 'B' 'B' 'B' 'B' 'C' 'C'};
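
As a fuller sketch of the command-line route, the example below builds a small DataSet object from random data, assigns string classes, and reads the result back. It assumes PLS_Toolbox (or the standalone DataSet Object) is on the MATLAB path. The data matrix is hypothetical, and the classid and classlookup field names are taken from the DataSet object's field list as understood here, so verify them against your installed version.

```matlab
% Assumes PLS_Toolbox (or the standalone DataSet Object) is on the MATLAB path.
data = rand(8, 5);                      % hypothetical data: 8 samples, 5 variables
x = dataset(data);                      % wrap the matrix in a DataSet object

ids = {'A' 'A' 'B' 'B' 'B' 'B' 'C' 'C'};
assert(numel(ids) == size(data, 1), 'Provide one class string per sample.');

x.class{1} = ids;                       % string classes for mode 1 (samples)

% Field names below are assumptions to check against your version.
nums   = x.class{1};                    % numerical class stored for each sample
names  = x.classid{1};                  % class string for each sample
lookup = x.classlookup{1};              % table pairing class numbers with strings
```

Checking the length before assignment is a design choice of this sketch: it makes a mismatch between the number of classes and the number of samples fail early with a clear message.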