DataSet Object Fields

From Eigenvector Research Documentation Wiki

The following is a list of the DataSet Object properties (fields). The value of any field can be assigned or retrieved either by direct assignment (see dataset_subsasgn and dataset_subsref) or through the dataset_set and dataset_get DataSet Object Methods; direct assignment is the preferred of the two approaches. See DataSet Object Examples for a walk-through example of how to build a DataSet object.

.name

char array (row vector)

This is usually the original variable name but can be set by the user. For example:

h = dataset(randn(5));
h.name = 'mydata';
h.name
ans =
   mydata

where ans is a 1x6 character array.

.type

string with one of the following values

'data', 'image', or 'batch'. See field '.data' for more information on these data types.

When .type = 'image', four additional fields are available in the DataSet: .imagemode, .imagesize, .imagedata, and .imageinclude. The .imagemode, .imagesize, and .imagedata fields are described below.

.author

char array (row vector)

Used to assign authorship to a DataSet.

h.author = 'Joan Smith / Big Name Pharma';

.date

double array (1x6)

Six-element date/time stamp [year, month, day, hour, minute, second] indicating when the DataSet was created.
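The six-element layout is the same as the date vector returned by MATLAB's clock function, so the standard date functions can be applied to it. A minimal sketch using a plain vector (the variable name stamp is hypothetical; with a DataSet h one would presumably pass h.date instead):

```matlab
% Hypothetical date vector in the [year month day hour minute second] layout
stamp = [2009 9 8 17 14 52];

% Convert to a serial date number and back to confirm the layout round-trips
serial = datenum(stamp);
roundtrip = datevec(serial);

% Human-readable string, e.g. for reports or file names
stampstr = datestr(stamp, 'yyyy-mm-dd HH:MM:SS');
```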

.moddate

double array (1x6)

Six-element date/time stamp indicating when the DataSet was last modified.

.data

Field holding the numerical data of the DataSet. Can take one of two forms:

1) Fixed-size array (i.e. the data matrix is N1xN2x…xNM) of class: double, single, logical, [u]int8, [u]int16, or [u]int32.

When the data are contained in a standard fixed-size array (field .type = 'data' or 'image'), M is the number of modes and the indices are nm = 1, 2, …, Nm for m = 1, 2, …, M. This notation follows, but is slightly more general than, the notation proposed by Kiers (2000). For example, when M=2 the data matrix is N1xN2.

With multi-block analyses, the first mode (m=1) of the .data array corresponds to the common dimension between blocks (e.g. samples). Any function that changes this convention requires an optional control for sample/object mode.

For example, the following forms a three-slab multiway DataSet:

mwa(:,:,1) = randn(5);
mwa(:,:,2) = magic(5);
mwa(:,:,3) = sin([1:5]')*cos(1:5);
mydatb = dataset(mwa);
size(mydatb.data,3)
ans =
  3

For more information on type 'image' DataSet objects, see the .imagemode field below.

Indexing into the .data field can be done in several ways.

  1. Using .data alone returns the entire matrix of data.
  2. The data can be indexed into with standard notation:
    column_1 = mydatb.data(:,1); % Return first column of data.
  3. Using named labels or classes. If the DataSet contains labels and/or classids, these can be used as indices:
    load arch; fe_column = arch.Fe.data;
  4. Included data can be accessed with an additional .include suffix:
    include_data = arch.data.include;
    Note: Data cannot be assigned using this method. Data cannot be indexed into via this field.

2) cell array (Length I with i = 1, 2, …, I. Each cell contains a double array with m = 2, 3, …, M. The size of mode 1 (N1,i) for each array can vary from cell to cell. All other modes must be of equal size from cell to cell.)

This data construct allows for matrices that are all the same size except in the first mode, which can be of variable length. Such data cannot be stored in a multiway array and are instead stored in a cell array (field .type = 'batch'). This results in differences from double-array data that are reflected in SET, GET, and the field .axisscale.

A standard DataSet object (type='data') can be created from a type='batch' DataSet object by using the standard cell-array indexing on the main DataSet object. For example, the first batch of a multi-batch DataSet object called allbatches can be extracted using the command:

 batch_1 = allbatches{1}

The .data field and all other fields will be reduced down to include only the information related to the batch specified.
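The size constraint on batch data can be illustrated with a plain cell array. This is only a sketch of the layout described above, not the DataSet object itself; the variable names and batch sizes are hypothetical:

```matlab
% Three hypothetical batches: different numbers of time points (mode 1),
% but the same 4 variables (mode 2) in every batch
batches = {randn(10,4), randn(7,4), randn(12,4)};

% Mode-1 length of each cell; these are allowed to differ
n1 = cellfun(@(b) size(b,1), batches);

% All remaining modes must be the same size from cell to cell
ncols = cellfun(@(b) size(b,2), batches);
consistent = all(ncols == ncols(1));

% Extracting one cell yields an ordinary fixed-size array
batch_1 = batches{1};
```

Here n1 holds the varying first-mode lengths and consistent confirms that the second mode matches across cells.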

.size

double vector {1xndims}

Returns a vector containing the size of each dimension (mode) of the data.


>> wine.size
ans =
    10     5


.sizestr

char array (row vector)

Returns the size vector formatted as a string.

>> wine.sizestr
ans =
10x5

.imagemode

double scalar {1x1}

Mode into which the image data is unfolded. For example, a typical RGB image (.jpg) would, by default, be unfolded into .imagemode=1 by buildimage.

>> dat = imread('EchoRidgeClouds.jpeg','jpeg');
size(dat)
ans =
   768   512     3
>> imgdso = buildimage(dat, [1 2], 1);
>> imgdso.imagemode
ans = 
     1

.imagesize

double vector {1xMimg}

Size of image modes. The "spatial" size of the image before it was unfolded. Using the example from above, the image size would be:

>> imgdso.imagesize
ans =
   768   512

.imagesizestr

char array (row vector)

Returns string of image size vector. From the example above:

>> imgdso.imagesizestr
ans =
768x512

.foldedsize

double vector {1xMimg}

Size of the folded image. This size respects the .imagemode value, so if an image was unfolded into a mode other than 1, that is reflected in this field. In this example the EchoRidgeClouds image is unfolded into mode 2 of the dataset, so the folded size has the "slabs" in mode 1:

>> imgdso = buildimage(dat, [1 2], 2);
>> imgdso.foldedsize
ans =
     3   768   512

.foldedsizestr

char array (row vector)

Returns string of folded image size (from above).

.imagedata

double array

The .imagemode, .imagesize, and .imagedata fields are only available for type 'image' DataSet objects. In image DataSet objects, the .data field contains "unfolded" image data. Image data is usually contained in a second- or higher-order array in which several modes are used to describe a spatial relationship between pieces of information. For example, many standard JPEG images are three-way images of size M x N x 3. The first two dimensions are the spatial dimensions in that the actual image is M pixels high by N pixels wide. The third dimension is the wavelength dimension and, in this case, contains 3 slabs, one for each of the Red, Green, and Blue image components.

Working with such image data in DataSet objects is made easier by unfolding such multi-way images so that all the spatial modes are stacked on top of each other in a single mode. Unfolding is done so that all the spatial information is contained in a single mode and can be handled together – often so that each pixel can be analyzed as an individual sample (or even sometimes as variables). Individual pixels in an unfolded image are independent and can be individually included and excluded (see .include field) or assigned particular classes (see .class field), for example. In the case of the JPEG mentioned above, the unfolded image would be stored as an (MN) x 3 matrix where the first mode was M times N elements (i.e. pixels) in size.

The .imagemode field contains a scalar value indicating which mode of the .data field contains the spatial information. In the JPEG example, .imagemode would be 1 (one). Similarly, the .imagesize field contains a vector describing the original size of the spatial mode before unfolding. The JPEG example would contain the two-element vector [M N]. Note that .imagesize contains only the size of the image mode, not the entire data matrix. Note also that the product of the elements of .imagesize must equal the size of the .imagemode mode of the .data field. That is, the number of pixels contained in the spatial mode of the data must be such that the mode can be reshaped into an array of size .imagesize. See the .foldedsize field for the size of the entire folded matrix.

The .imagedata field is a special read-only field which returns the contents of the .data field refolded back into the original image-sized matrix. In the JPEG example, the .data field would return a matrix of size (MN) x 3 (the unfolded image) but the .imagedata field would return the original M x N x 3 matrix. Any changes you make to the contents of the .data field will automatically be reflected in the contents returned by .imagedata. .imagedata, however, can not be written to.
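The relationship between the unfolded data, the image size, and the refolded image can be sketched with plain arrays and reshape. This is an illustration of the layout described above rather than the toolbox's own implementation; the image is random and kept small so the result is easy to inspect:

```matlab
% Hypothetical M x N x 3 image
img = rand(4, 5, 3);
sz = size(img);

% Unfold: stack the two spatial modes into mode 1 -> (M*N) x 3
data = reshape(img, sz(1)*sz(2), sz(3));
imagemode = 1;
imagesize = sz(1:2);

% Consistency rule: the product of imagesize equals the size of the image mode
ok = prod(imagesize) == size(data, imagemode);

% Refold: recover the original M x N x 3 array from the unfolded data
refolded = reshape(data, [imagesize size(data, 2)]);
```

Because reshape only reinterprets the element order, refolded is identical to img, which is the behavior described for reading image data back in its folded form.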

To create a type 'image' DataSet object:

(a) Unfold the spatial mode of the original data (the following example is for three way data but is easily extended to multi-way images):

sz = size(x);
x = reshape(x, sz(1)*sz(2), sz(3));

(b) Create the standard DataSet object (type='data') from the unfolded data:

imgdso = dataset(x);

(c) Change the DataSet object type to 'image':

imgdso.type = 'image';

(d) Store the original image size in the DataSet:

imgdso.imagemode = 1;
imgdso.imagesize = sz(1:2);

Note that although this example shows the image being unfolded into the first mode, there is no requirement that the first mode be the pixel mode. The first mode is, however, the most consistent with the concept that individual pixels are often considered separate samples. Use the permute function on x prior to creating the DataSet (before step (b) above) to move the spatial mode to a different dimension.

.imagedataf

double array

By default the .imagedata field folds data with the spatial dimensions first (e.g., m pixels by n pixels by z values) without regard to the .imagemode field. The .imagedataf field returns folded data in which the spatial modes are inserted beginning at the .imagemode dimension. For example, if you have a 250x250x3 image dataset where imagemode = 2, your DSO will look similar to:

...
       data: 3x62500 [double]
  imagesize: 250x250
  imagemode: 2
...

The .imagedataf field will return folded data of size 3x250x250, whereas the .imagedata field will return 250x250x3.
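The two folded layouts differ only by a permutation of dimensions. The following sketch reproduces both from plain arrays, assuming the pixels were unfolded into mode 2; it uses a small hypothetical 4x5x3 image in place of the 250x250x3 case and is not the toolbox's implementation:

```matlab
% Hypothetical image: 4 x 5 pixels with 3 channels
img = rand(4, 5, 3);
imagesize = [4 5];

% Unfold so that the pixels lie in mode 2 (imagemode = 2): 3 x 20
data = reshape(img, prod(imagesize), 3).';

% Spatial modes inserted at the image mode: 3 x 4 x 5
folded_at_mode = reshape(data, [size(data,1) imagesize]);

% Spatial modes first: 4 x 5 x 3
folded_spatial_first = permute(folded_at_mode, [2 3 1]);
```

folded_at_mode corresponds to the layout described for .imagedataf and folded_spatial_first to the layout described for .imagedata.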

.label

cell array {MxSlbls}

Each column of the cell contains char arrays of labels for each mode of '.data' (the cell entries may be empty.) The first row corresponds to labels for the first mode of '.data' and contains char arrays with N1 rows. Subsequent rows of the cell contain labels for corresponding modes of '.data' each of which are char arrays with Nm rows. Additional columns contain alternate sets of labels for all modes.

Assignments to individual cell elements may be a character array or a cell array. Labels are always converted to character arrays when retrieved.
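As an illustration of the two representations, the sketch below converts a cell array of labels to a character array and back using only base MATLAB functions. It assumes the conversion behaves like MATLAB's char function (shorter labels padded with trailing blanks); that detail is an assumption and is not confirmed by the text above:

```matlab
% Labels supplied as a cell array of strings
lbls = {'firstlabel','secondlabel','thirdlabel'};

% char pads shorter strings with trailing blanks: one row per label
lblchar = char(lbls);

% cellstr strips the trailing blanks again when a cell array is needed
lblcell = cellstr(lblchar)';
```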

For example, suppose there is a DataSet object x. Its labels for the mth mode are contained in x.label{m,1}. If there is a second set of labels for this mode, they are contained in x.label{m,2}. The following sets the second set of labels for the first mode of DataSet x, assuming size(x.data,1) = 3:

x.label{1,2} = {'firstlabel','secondlabel','thirdlabel'};

An individual label in any set can be changed by adding an additional index using "curly braces" (the ones used for cell indexing) and the string which should be used. For example, changing the second row's label (in the second set) would be accomplished using:

x.label{1,2}{2} = 'New second label';

.labelname

cell array {MxSlbls}

Each column of the cell is associated with the corresponding column of '.label' and contains string descriptions (each a row char array) of the corresponding labels for each mode of '.data' (e.g. 'Mode 1 Lbls', 'Mode 2 Lbls', …). Additional columns contain alternate sets of label names.

For example, the label name associated with the first set of labels for mode m is contained in x.labelname{m,1}.

.axisscale

1) cell array {MxSscale} (when '.data' is a fixed-size array)

When '.data' is a fixed-size array (i.e. not a cell array), each column of the cell contains a set of axis scales (M vectors in total, one per mode, each Nmx1 of class double). Rows of the cell correspond to modes of '.data' (e.g. a time axis or sample number for the first mode, and a wavelength axis or variable number for the second mode). Additional columns of '.axisscale' contain alternate sets of axis scales for all modes.

For example, suppose there is a DataSet object x. Its axis scale for the mth mode is contained in x.axisscale{m,1}. If there is a second set of axis scales, they are contained in x.axisscale{m,2}. The following sets the first set of axis scales for the third mode of DataSet x, assuming size(x.data,3) = 10:

x.axisscale{3,1} = 1:10;

2) cell array {MxSscale} (when '.data' is a cell array)

When '.data' is class cell, the contents of '.axisscale' are similar to those described above, except for mode 1 (i.e. first row of the '.axisscale' cell). In this case, the contents of the first row contain cell arrays (in contrast to vectors of class double) whose contents are axis scales (Nn1x1 vectors). Each axis scale corresponds to the first mode of each cell of '.data' (e.g. a time axis, or sample number). This allows for variable axis scales when the data matrices are of variable length in the first mode. Subsequent rows of the cell contain scales (Nmx1 vectors) for corresponding modes of '.data' (e.g. a wavelength axis, or variable number) just as when '.data' is a fixed-size array.

.axisscalename

cell array {MxSscale}

Each column of the cell is associated with the corresponding column of '.axisscale' and contains string descriptions (each a row char array) of the corresponding axis scale for each mode of '.data' (e.g. 'Mode 1 Axis', 'Mode 2 Axis', …). Multiple columns contain alternate sets of axis scale names.

For example, the axis scale name associated with the first set of axis scales for mode m is contained in x.axisscalename{m,1}.

.axistype

cell array {MxSscale}

The 'axistype' field is informational only and does not affect other properties of the DataSet object. The axistype field identifies the relationship between adjacent elements in the corresponding axisscale (same mode and set). Axistype identifies this relationship using one of the following keywords:

none : {default} a relationship between values is not known. Software may default to whatever display and calculation mode desired.

discrete : Individual items (e.g. columns) are independent of each other and should not be interpolated between in plots or other numerical operations. Show as individual points.

stick : Individual items (e.g. columns) are independent of each other and should not be interpolated between in plots or other numerical operations; however, each item has a connection to a value of zero at the given axisscale position (e.g. sticks down to the zero line on the y-axis).

continuous : Individual items (e.g. columns) are considered points on a continuous axis and may be interpolated between in plots and other numerical operations.

When concatenating two DataSets with different axistypes for the concatenation mode, the least-presumptive axistype in this list will be used. That is, "none" will be selected over "discrete" which will be selected over "stick" which would be selected over "continuous".
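The precedence rule can be expressed as picking the keyword that appears earliest in the list. A minimal sketch in base MATLAB, with hypothetical variable names; it illustrates the rule only and is not the code the DataSet object uses:

```matlab
% Keywords ordered from least to most presumptive
order = {'none','discrete','stick','continuous'};

% Hypothetical axistypes of the two DataSets being concatenated
typeA = 'stick';
typeB = 'continuous';

% Position of each keyword in the ordered list
[~, position] = ismember({typeA, typeB}, order);

% The least-presumptive keyword is the one that appears first
merged = order{min(position)};
```

With the values above, merged is 'stick', matching the rule that "stick" is selected over "continuous".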

Although some software makes use of this field to help customize plots and numerical operations, doing so is not required.

.imageaxisscale

1) cell array {MxSscale} (when '.data' is a fixed-size array)

This field applies to the image modes (i.e. the spatial modes of the image), and there is one entry per image mode. For general information see axisscale. With the EchoRidgeClouds example:

dat = imread('EchoRidgeClouds.jpeg','jpeg');
imgdso = buildimage(dat, [1 2], 1);
imgdso.imageaxisscale{1} = [1:768]/10;%Change image scale to tenths.
imgdso.imageaxisscale{2} = [1:512]/10;%Change image scale to tenths.

.imageaxisscalename

cell array {MxSscale}

This field applies to the image modes (i.e. the spatial modes of the image), and there is one entry per image mode. See axisscalename.

.imageaxistype

cell array {MxSscale}

This field applies to the image modes (i.e. the spatial modes of the image), and there is one entry per image mode. See axistype.

.title

cell array {MxStitle}

Each column of the cell contains a set of mode titles (each a char array) for each mode of '.data'. Rows of the cell correspond to modes of '.data' and additional columns of '.title' contain alternate sets of mode titles for all modes.

For example, suppose there is a DataSet object x. Its mode title for the mth mode is contained in x.title{m,1}. If there is a second set of mode titles they are contained in x.title{m,2}.

.titlename

cell array {MxStitle}

Each column of the cell is associated with the corresponding column of '.title' and contains string descriptions (each a row char array) of the corresponding mode titles for each mode of '.data' (e.g. 'Mode 1 Title', 'Mode 2 Title', …). Additional columns contain alternate sets of mode title names.

For example, the mode title name associated with the first set of mode titles for mode m is contained in x.titlename{m,1}.

.class

cell array {MxSclass}

Each column of the cell contains a set of class identifiers (each an Nmx1 vector) for each mode of '.data'. Rows of the cell correspond to modes of '.data' and additional columns contain alternate sets of class identifiers for all modes.

For example, suppose there is a DataSet object x. Its class identifier for the mth mode is contained in x.class{m,1}. If there is a second set of class identifiers they are contained in x.class{m,2}. See example for axisscale field.

Assignments into class can be either numeric (assigning an appropriate-length vector of numbers) or string (using an appropriate-length cell array of strings). However, retrieving values from the .class field will always return numerical values. See the classid field to retrieve classes as strings instead of numerical values and the classlookup field to retrieve the table used to translate numerical classes to string values.

.classid

cell array {MxSclass}

This field is a pseudonym for the class field and is indexed exactly the same except that when retrieving values from this field, the returned value is a cell array of strings where each string represents the class of the object (row, column, etc). Assignments to this field can also be either class strings or numerical class values. See also the classlookup field to retrieve the table used to translate numerical classes to string values.

.classlookup

cell array {MxSclass}

The classlookup field is sized the same as the class field but each entry contains a lookup table, structured as a k by 2 cell array. This lookup table contains the numerical class values in the first column and the corresponding class name in the second column. A shortcut to look up a class number or class string in this table is provided by adding the indexing command find() onto the end of the classlookup extraction command. Given a DataSet object, x, with a set of row classes:

x.classlookup{1,1}.find(val)

returns the string associated with the numerical value "val". Likewise:

x.classlookup{1,1}.find('string')

returns the numerical class value associated with the class description string 'string'.
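The structure of the lookup table, and the two directions of translation, can be sketched with a plain k by 2 cell array. The class numbers and names below are hypothetical, and the code uses only base MATLAB functions rather than the find() shortcut:

```matlab
% Hypothetical k x 2 lookup table: numeric class, class name
lookup = {0 'Class 0'; 1 'Red'; 2 'White'};

% Numeric classes for five samples
cls = [1 2 2 0 1];

% Number -> string: locate each class number in column 1 of the table
[~, row] = ismember(cls, [lookup{:,1}]);
clsid = lookup(row, 2)';

% String -> number: locate a class name in column 2 of the table
val = lookup{strcmp(lookup(:,2), 'White'), 1};
```

clsid is the cell array of strings that corresponds to the numeric vector cls, which is the same relationship that holds between the classid and class fields.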

.classlookup.assign

Assignment into the classlookup field is specially controlled. Individual entries can not be directly edited but, instead, must be changed using the indexing: assignstr or assignval. These commands are given after the .classlookup{mode,set} indexing and are followed by an assignment which gives the new values. For example, given a DataSet object, x, the following notation is used to change the string associated with a given class number:

x.classlookup{1,1}.assignstr = {3 'new string'};

The above would change the string associated with class 3 (of mode 1, set 1) to be 'new string'. Similarly:

x.classlookup{1,1}.assignval = {18 'current string'};

would change the class number associated with the class 'current string' to be 18.

If the entire lookup table is replaced by direct assignment, any class number which appears in the class field but does not appear in the new lookup table will be automatically changed to class 0 (zero) in the class field. Thus, replacing the lookup table can be used to effectively delete a class from the class field.
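The effect of replacing the table can be mimicked on plain arrays. This sketch assumes a hypothetical class vector and replacement table and only illustrates the rule stated above:

```matlab
% Hypothetical numeric classes; class 3 is in use
cls = [1 2 3 2 3];

% Replacement lookup table that no longer lists class 3
newlookup = {0 'Class 0'; 1 'Red'; 2 'White'};

% Any class number missing from the new table is reset to 0
missing = ~ismember(cls, [newlookup{:,1}]);
cls(missing) = 0;
```

After the reset, the samples that belonged to class 3 carry class 0, which effectively deletes that class.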

.classname

cell array {MxSclass}

Each column of the cell is associated with the corresponding column of '.class' and contains string descriptions (each a row char array) of the corresponding class identifiers for each mode of '.data' (e.g. 'Mode 1 Classes', 'Mode 2 Classes', …). Additional columns contain alternate sets of class identifier names.

For example, the class name associated with the first set of class identifiers for mode m is contained in x.classname{m,1}.

.include

cell array {Mx1} with vector contents

Each row of '.include' contains a vector of indices indicating which samples or variables to use in an analysis. When the DataSet is first constructed, the contents of each cell in '.include' are [1:Nm]. The '.include' field allows for sample or variable exclusion (i.e. soft delete) of '.data' rows or columns without actually removing or modifying the raw data in the data set. Note: In earlier versions of the DataSet object this field was named '.includ'. Although that field name will be translated into '.include', use of the full field name is recommended.
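The soft-delete idea can be sketched with a plain matrix and a cell array of index vectors. The variable names are hypothetical and the code does not use the DataSet object itself:

```matlab
% Hypothetical 5 x 5 data matrix
data = magic(5);

% Initial include vectors: every row and every column is used
include = {1:size(data,1); 1:size(data,2)};

% Soft-delete rows 2 and 4 by removing them from the mode-1 index vector
include{1} = setdiff(include{1}, [2 4]);

% Data actually passed to an analysis; the raw data are unchanged
used = data(include{1}, include{2});
```

Restoring the excluded rows only requires resetting include{1} to 1:5, since data was never modified.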

.uniqueid

char array (row vector) read-only

Unique identifier for a dataset.

.description

char array

Contains any text as a description of the data set.

.history

cell array of strings

Running history of commands that have modified the DataSet contents. Users can add comments using:

>> mydataset.history = 'Comment here.';
>> mydataset.history{end}
ans =
%%% Comment: Comment here. (08-Sep-2009 17:14:52.538)

.userdata

Can contain data of any class; serves as a container for additional user data.

Note that when DataSet objects are concatenated this field may be turned into a cell array containing user data from each concatenated object in the cell containers.