DataSet Construction
imported>Mathias
Revision as of 12:30, 23 May 2016
Getting Started
In general, data is stored in a dataset object.
From a GUI
Using PLS_Toolbox and Solo, it is easy to import data into a dataset object using the data importer. From the workspace browser select File/Import Data to launch the GUI.
Alternatively, this can be achieved by dragging the desired file into the Workspace Browser. For text-based file formats such as CSV, this launches the following window.
This window lets the user choose options specific to the file, such as the number of header rows to ignore and which delimiter to use. Clicking OK launches the data import tool pictured below. The user can specify which rows and columns will be used as the dataset's axisscales and labels. In this example, the first row and the second column have been specified as axisscales.
From the MATLAB Command Line
Datasets can be created by passing an array to the dataset function. In this example we will use the data field from the wine demo dataset.
>> load wine
>> dat = wine.data;
>> names = wine.label{1};
>> vars = wine.label{2};
>> whos
  Name       Size       Bytes  Class      Attributes
  dat       10x5          400  double
  names     10x6          120  char
  vars       5x6           60  char
  wine      10x5        12156  dataset
The variable 'dat' contains the 10x5 data array: 5 variables (wine, beer, and liquor consumption, life expectancy, and heart disease) for 10 samples (countries).
The country names are contained in the variable 'names' and the variable names in 'vars'. The next step creates a DataSet object, assigns it a name, author, and description, and attaches the sample and variable labels.
»wined = dataset(dat);
»wined.name = 'Wine';
»wined.author = 'A.E. Newman';
»wined.description = ...
   {'Wine, beer, and liquor consumption (gal/yr)', ...
    'life expectancy (years), and heart disease rate', ...
    '(cases/100,/yr) for 10 countries.'};
»wined.label{1} = names;
»wined.label{2} = vars;
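The same construction steps can be tried without the wine demo. The following is a minimal sketch, assuming PLS_Toolbox is installed and on the MATLAB path; the matrix, names, and labels are hypothetical and only illustrate the fields used above.

```matlab
% Minimal sketch, assuming PLS_Toolbox is on the MATLAB path.
% All values, names, and labels below are hypothetical.
X  = [1.2 70; 3.4 75; 0.5 68];                 % 3 samples x 2 variables
ds = dataset(X);                                % wrap the array in a DataSet object
ds.name        = 'Example';
ds.author      = 'Hypothetical Author';
ds.description = {'Illustrative data only'};
ds.label{1}    = {'SampleA','SampleB','SampleC'};  % one label per row (mode 1)
ds.label{2}    = {'Consumption','LifeExp'};        % one label per column (mode 2)
```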
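Additional assignments can also be made for mode 1. Here the label for the first mode (rows) is shown explicitly next to the data array (like sample labels). A title, classes, a class set name, an axis scale, and an axis scale name are also assigned. The country list is abbreviated with "..."; in a complete script the labels, the class vector, and the axis scale must each contain one entry per sample (10 for this dataset).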
»wined.labelname{1} = 'Countries';
»wined.label{1} = ...
   {'France', ...
    'Italy', ...
    'Switz', ...
    'Austra', ...
    ...
    'Mexico'};
»wined.title{1} = 'Country';
»wined.class{1} = [1 1 1 2 3];
»wined.classname{1} = 'Continent';
»wined.axisscale{1} = 1:5;
»wined.axisscalename{1} = 'Country Number';
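Because each mode-1 property needs one entry per sample, a complete version of these assignments is easiest to see on a small matrix. This is a minimal sketch, assuming PLS_Toolbox is on the MATLAB path; the 4-sample data, labels, and class numbers are hypothetical.

```matlab
% Minimal sketch, assuming PLS_Toolbox is on the MATLAB path.
% Hypothetical 4-sample data: every mode-1 property has exactly one entry per row.
X  = [1 2; 3 4; 5 6; 7 8];                      % 4 samples x 2 variables
ds = dataset(X);
ds.labelname{1}     = 'Samples';
ds.label{1}         = {'S1','S2','S3','S4'};    % 4 labels
ds.title{1}         = 'Sample';
ds.class{1}         = [1 1 2 2];                % 4 class numbers
ds.classname{1}     = 'Group';
ds.axisscale{1}     = 1:4;                      % 4 axis-scale values
ds.axisscalename{1} = 'Sample Number';
```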
Additional assignments can also be made for mode 2. Here the label for the second mode (columns) is shown explicitly above the data array (like column headings). Titles, classes, and axis scales can be assigned to mode 2 in the same way as for mode 1.
»wined.labelname{2} = 'Variables';
»wined.label{2} = ...
   {'Liquor','Wine','Beer','LifeExp','HeartD'};
If the data array is N-way, the assignment process extends to mode 3, mode 4, ... mode N. It can also be extended to multiple sets of labels and axis scales, e.g.:
»wined.labelname{2,2} = 'Alcohol Content and Quality';
»wined.label{2,2} = {'high','medium','low','good','bad'};
An individual label can be replaced by indexing further into a given label set with curly braces and assigning the replacement string:
»wined.label{2,2}{4} = 'excellent';
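A second label set and a single-entry replacement can be sketched together on a small example. This assumes PLS_Toolbox is on the MATLAB path and that the `{2,2}{k}` assignment behaves as described above; the data and label text are hypothetical.

```matlab
% Minimal sketch, assuming PLS_Toolbox; data and label text are hypothetical.
ds = dataset([1 2 3; 4 5 6]);                   % 2 samples x 3 variables
ds.label{2}       = {'Liquor','Wine','Beer'};   % first label set for mode 2
ds.labelname{2,2} = 'Quality';                  % name of the second label set
ds.label{2,2}     = {'good','bad','good'};      % second label set for mode 2
ds.label{2,2}{3}  = 'excellent';                % replace the third entry of set 2
```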
Indexing Into DataSets
Sub-portions of the DataSet can be retrieved by indexing into the main DataSet object. For example, here the first three columns ('Liquor', 'Wine', and 'Beer') are extracted into a new DataSet named "alcohol":
»alcohol = wined(:,1:3);
Additionally, any field in the DataSet can also be indexed into directly. Here the second country name is pulled out of the labels by extracting the entire second row of the mode 1 labels:
»country2 = wined.label{1}(2,:);
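Both kinds of indexing can be checked on a small object. This is a minimal sketch, assuming PLS_Toolbox is on the MATLAB path; the `magic(4)` data and the labels are hypothetical.

```matlab
% Minimal sketch, assuming PLS_Toolbox; data and labels are hypothetical.
ds = dataset(magic(4));                         % 4x4 example data
ds.label{1} = {'r1','r2','r3','r4'};
ds.label{2} = {'c1','c2','c3','c4'};
sub  = ds(:, 1:3);                              % new DataSet holding columns 1-3
row2 = ds.label{1}(2, :);                       % second mode-1 label as a char row
```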
Indexing using Labels and Classes
A shortcut to extract a subset of a DataSet is to index into the main DataSet object using labels and/or classes for the requested item(s). For example, to extract a DataSet containing only the Liquor values, you could use:
»alcohol = wined.liquor;
Note that label matching is not case-sensitive. If the label or class starts with a number or contains any non-alphanumeric characters, you must enclose it in parentheses and quotes:
»alcohol = wined.('liquor');
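Both forms of label indexing can be compared on a small example. This is a minimal sketch, assuming PLS_Toolbox is on the MATLAB path and that label indexing returns the matching column as a DataSet, as described above; the data and labels are hypothetical.

```matlab
% Minimal sketch, assuming PLS_Toolbox; data and labels are hypothetical.
ds = dataset([10 1; 20 2; 30 3]);               % 3 samples x 2 variables
ds.label{2} = {'Liquor','Wine'};
liq1 = ds.liquor;                               % plain dot form; case does not matter
liq2 = ds.('Liquor');                           % parenthesized, quoted form
```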
Indexing with Class or Label Set Names
Class and label information can be extracted using the set name with dot notation.
»mylabels = wine.Country
mylabels =
France
Italy
Switz
Austra
Brit
U.S.A.
Russia
Czech
Japan
Mexico
Note that class names will be checked first, before label names.
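Retrieving labels by their set name can be sketched on a small object. This assumes PLS_Toolbox is on the MATLAB path; the data, the label set name 'Country', and the labels are hypothetical.

```matlab
% Minimal sketch, assuming PLS_Toolbox; data, set name, and labels are hypothetical.
ds = dataset([1 2; 3 4; 5 6]);                  % 3 samples x 2 variables
ds.labelname{1} = 'Country';                    % set name used for dot access
ds.label{1}     = {'France','Italy','Japan'};
names = ds.Country;                             % labels retrieved via the set name
```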
Creating 3-Way Data
There are several ways to create 3-way data in PLS_Toolbox.
If the data is given in separate text-based files such as .csv, the data can easily be imported into a 3-way dataset using the Text import data tool. By dragging all files into the Workspace Browser and then selecting