DataSet Object Examples

From Eigenvector Research Documentation Wiki
Revision as of 15:27, 10 October 2008 by imported>Jeremy

The following shows an example using the 'wine' data set in the PLS_Toolbox. Other examples can be found in the datasetdemo.m script.

The first step in the example is to load the 'wine' data set and examine the variables. The MATLAB commands are:

»load wine_raw
»whos
  Name  Size Bytes Class
  dat   10x5  400  double array
  names 10x6  120  char array
  vars   5x6   60  char array

The variable 'dat' contains the data array: 10 samples (countries) by 5 variables (wine, beer, and liquor consumption, life expectancy, and heart disease rate).
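
The 'whos' listing shows that 'names' and 'vars' are padded character arrays with one row per label. As a minimal sketch in plain MATLAB, the rows of such an array can be converted to a cell array of trimmed strings with 'cellstr'. The array 'demoNames' below is a hypothetical stand-in built for illustration and is not the contents of wine_raw.

```matlab
% Hypothetical stand-in for a padded char array such as 'names'
demoNames = char('France', 'Italy', 'Switz');   % char pads rows to equal length
disp(size(demoNames))                           % 3 rows, 6 columns

% cellstr removes the trailing padding from each row
demoCell = cellstr(demoNames);
disp(demoCell{2})                               % 'Italy' without trailing blanks
```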

The country names are contained in the variable 'names' and the variable names are contained in 'vars'. The next step creates a DataSet object, assigns it a name, author, and description, and attaches the country and variable names as labels.

»wined = dataset(dat);
»wined.name = 'Wine';
»wined.author = 'A.E. Newman';
»wined.description= ...
{'Wine, beer, and liquor consumption (gal/yr)',...
'life expectancy (years), and heart disease rate', ...
'(cases/100,000/yr) for 10 countries.'};
»wined.label{1} = names;
»wined.label{2} = vars;
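
Once the object exists, its contents can be read back through the same field names. The following is a small sketch that assumes PLS_Toolbox is on the MATLAB path (for 'dataset' and wine_raw); the variable 'rawValues' is introduced here only for illustration.

```matlab
% Assumes PLS_Toolbox is on the MATLAB path (provides dataset and wine_raw)
load wine_raw                     % loads dat, names, vars
wined = dataset(dat);             % wrap the numeric array in a DataSet object
wined.name = 'Wine';

rawValues = wined.data;           % the .data field holds the numeric array
disp(isequal(rawValues, dat))     % the stored values should equal the input
disp(wined.name)                  % read back the name assigned above
```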

Additional assignments can also be made. Here the labels for the first mode (rows) are listed explicitly, one per sample, as they would appear next to the rows of the data array. A title, classes, and an axis scale are also assigned.

»wined.labelname{1} = 'Countries';
»wined.label{1} = ...
{'France', ...
'Italy', ...
'Switz', ...
'Austra', ...
...
'Mexico'};
»wined.title{1} = 'Country';
»wined.class{1} = [1 1 1 2 3];
»wined.classname{1} = 'Continent';
»wined.axisscale{1} = 1:10;
»wined.axisscalename{1} = 'Country Number';
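
Per-sample fields such as classes and axis scales describe the rows of the data array, so it can help to check their lengths before assigning them. The sketch below is plain MATLAB and assumes that each mode 1 field is expected to carry one entry per row; 'demoDat', 'demoClass', and 'demoAxis' are hypothetical placeholders, not values from the wine data.

```matlab
% Hypothetical stand-in with the same shape as 'dat' (10 samples, 5 variables)
demoDat  = zeros(10, 5);
nSamples = size(demoDat, 1);

% Hypothetical per-sample metadata; the class values are placeholders only
demoClass = ones(1, nSamples);
demoAxis  = 1:nSamples;

% Check the lengths before assigning to class{1} and axisscale{1}
assert(numel(demoClass) == nSamples, 'class vector needs one entry per sample');
assert(numel(demoAxis)  == nSamples, 'axis scale needs one entry per sample');
```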

Additional assignments can also be made for mode 2. Here the labels for the second mode (columns) are listed explicitly, as they would appear above the data array as column headings. Titles, classes, and axis scales can be assigned in the same way as for mode 1.

»wined.labelname{2} = 'Variables';
»wined.label{2} = ...
{'Liquor','Wine','Beer','LifeExp','HeartD'};

If the data array is N-way, the assignment process can be extended to Mode 3, Mode 4, ... Mode N. It can also be extended to multiple sets of labels and axis scales for the same mode, for example:

»wined.labelname{2,2} = 'Alcohol Content and Quality';
»wined.label{2,2} = {'high','medium','low','good','bad'};

An individual label can be replaced by further indexing into a given label set using curly braces followed by the string replacement:

»wined.label{2,2}{4} = 'excellent';
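
The two levels of curly braces can be easier to follow on an ordinary cell array. The sketch below uses a plain MATLAB cell array, 'demoLabel', as a hypothetical stand-in for the label field (rows as modes, columns as label sets); it only illustrates the indexing pattern and does not show how the DataSet object stores labels internally.

```matlab
% Plain cell array standing in for the label field:
% rows are modes, columns are label sets
demoLabel = cell(2, 2);
demoLabel{2,1} = {'Liquor','Wine','Beer','LifeExp','HeartD'};
demoLabel{2,2} = {'high','medium','low','good','bad'};

% The first brace pair picks mode 2, set 2;
% the second brace picks the fourth label in that set
demoLabel{2,2}{4} = 'excellent';
disp(demoLabel{2,2})
```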

Sub-portions of the DataSet can be retrieved by indexing into the main DataSet object. For example, here the first three columns ('Liquor', 'Wine', and 'Beer') are extracted into a new DataSet named "alcohol":

»alcohol = wined(:,1:3);
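
Rows can be selected with the same syntax. The following is a sketch that assumes PLS_Toolbox is on the MATLAB path and that labels follow the selected rows; the name 'firstThree' is introduced here only for illustration.

```matlab
% Assumes PLS_Toolbox is on the MATLAB path (provides dataset and wine_raw)
load wine_raw                     % loads dat, names, vars
wined = dataset(dat);
wined.label{1} = names;
wined.label{2} = vars;

firstThree = wined(1:3, :);       % first three samples, all variables
disp(size(firstThree.data))       % size of the extracted numeric array
disp(firstThree.label{1})         % labels for the selected rows
```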

Similarly, a shortcut to extract a single variable or sample out of the DataSet is to index into the main DataSet object using the label for the requested item. For example, to extract a DataSet containing only the Liquor values, you could use:

»alcohol = wined.liquor;

Note that labels are matched without regard to case. If there are any spaces or mathematical symbols in the label, you must enclose the label in parentheses and quotes:

»alcohol = wined.('liquor');
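
The parenthesized form is MATLAB's dynamic field name syntax, so the label can also be held in a variable. This is a sketch only: it assumes PLS_Toolbox is on the MATLAB path and that label lookup behaves as described above; 'requested' and 'oneVar' are hypothetical names.

```matlab
% Assumes PLS_Toolbox is on the MATLAB path (provides dataset and wine_raw)
load wine_raw                     % loads dat, names, vars
wined = dataset(dat);
wined.label{2} = {'Liquor','Wine','Beer','LifeExp','HeartD'};

requested = 'beer';               % label chosen at run time
oneVar = wined.(requested);       % dynamic field name syntax with a variable
disp(size(oneVar.data))           % one column for the requested variable
```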

Additionally, any field in the DataSet can be indexed into directly. Here the second country name is pulled out of the labels by extracting the entire second row of the mode 1 labels:

»country2 = wined.label{1}(2,:);
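
Because a row extracted this way comes from a padded character array, it may carry trailing blanks. A short sketch for cleaning it up with standard MATLAB string functions follows; it assumes PLS_Toolbox is on the MATLAB path, and 'allCountries' is a hypothetical name.

```matlab
% Assumes PLS_Toolbox is on the MATLAB path (provides dataset and wine_raw)
load wine_raw                     % loads dat, names, vars
wined = dataset(dat);
wined.label{1} = names;

country2 = wined.label{1}(2,:);   % second row of the mode 1 labels
country2 = strtrim(country2);     % drop any leading or trailing blanks

% Convert all mode 1 labels to a cell array of trimmed strings
allCountries = cellstr(wined.label{1});
disp(allCountries{2})
```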