Multiblocktool

From Eigenvector Research Documentation Wiki
Revision as of 17:02, 11 June 2015 by imported>Jeremy (→‎Getting Started)

Multiblock Tool

Introduction

Joining data blocks from multiple sources can be challenging depending on where the data has come from, the attributes included with the data, and the manner in which you want to join the blocks.

There are several ways to join data blocks that break down into two categories: (A) Joining as new observations (samples or rows) and (B) Joining as new variables (columns). Joining data as new observations is discussed in the joining new data topic, but joining data as new variables is done using the Multiblock Tool and is discussed here.

The Multiblock Tool is an interface designed to make joining data as new variables easier and more transparent. It can handle cases where data sets are collected over the same time range but one or more is sampled at a higher frequency than the others, where some samples are included in only some blocks, where a block has missing data, or where the blocks need to be "reduced" via a multivariate model prior to joining (also known as model fusion).

Multiblock Tool Interface

Getting Started

The purpose of this tool is to join blocks as new variables, so the samples (observations, rows) of the blocks must correspond to one another. There are several ways in which blocks can be joined. The simplest case is when the blocks have the same number of rows and each row corresponds to the same observation or sample. In other cases the blocks are not equally sized but have samples in common, and these can be matched by sample label. Finally, blocks may be sampled at different rates over the same period of time. If a time axis scale is included with both datasets, samples can be automatically co-added to match the size.

The diagram below shows conceptually how variable axis scales, labels or simple data size are used to join two blocks.

MultiBlockOverview.jpg
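
The following is a minimal Python sketch of two of the cases described above: matching blocks by sample label, and co-adding a faster-sampled block onto a slower time axis. It is only a conceptual illustration, not the Multiblock Tool's own code. The function names, the dict and list layouts, and the choice to average each fast sample into its nearest slow time point are all assumptions made for the example.

```python
def join_by_label(block_a, block_b):
    """Join two blocks as new variables, keeping only samples whose labels
    appear in both. Each block is a dict: sample label -> list of values."""
    common = [label for label in block_a if label in block_b]
    return {label: block_a[label] + block_b[label] for label in common}


def coadd_to_axis(fast_times, fast_rows, slow_times):
    """Average the rows of a faster-sampled block onto a slower time axis.
    Each fast row is assigned to the nearest slow time point (an assumption)."""
    groups = [[] for _ in slow_times]
    for t, row in zip(fast_times, fast_rows):
        nearest = min(range(len(slow_times)), key=lambda i: abs(slow_times[i] - t))
        groups[nearest].append(row)
    reduced = []
    for rows in groups:
        if rows:
            # Column-wise mean of all fast rows assigned to this time point.
            reduced.append([sum(col) / len(rows) for col in zip(*rows)])
        else:
            reduced.append(None)  # no fast samples near this time point
    return reduced


# Label matching: only "s2" and "s3" occur in both blocks.
nir = {"s1": [0.1, 0.2], "s2": [0.3, 0.4], "s3": [0.5, 0.6]}
lab = {"s2": [7.0], "s3": [7.4], "s4": [6.9]}
joined = join_by_label(nir, lab)

# Co-adding: four fast samples are reduced to two rows on the slow axis.
fast_times = [0.0, 1.0, 2.0, 3.0]
fast_rows = [[1.0], [3.0], [5.0], [7.0]]
slow_times = [0.5, 2.5]
coadded = coadd_to_axis(fast_times, fast_rows, slow_times)
```

After co-adding, the reduced block has one row per slow time point and can be joined to the slower block row by row.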

Adding Data

To add data to the Multiblock Tool, drag and drop it from the Workspace Browser, the MATLAB Workspace, or a file. If the data has variable class information, you will be asked whether you would like to split the block into separate datasets for each class. The data will then be loaded. Default preprocessing is added automatically: group scaling for data and autoscale for models.
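
To illustrate what splitting a block by variable class means, here is a small Python sketch. It assumes a block stored as a list of rows plus one class name per column; the function name and data layout are hypothetical and are not taken from the tool itself.

```python
def split_by_variable_class(rows, var_classes):
    """Split a block into one sub-block per variable class.

    rows        : list of samples, each a list of values (one per variable)
    var_classes : one class name per variable (column)
    Returns a dict mapping class name -> list of rows restricted to that class.
    """
    # Collect the column indices that belong to each class.
    columns_by_class = {}
    for col, cls in enumerate(var_classes):
        columns_by_class.setdefault(cls, []).append(col)
    # Build one sub-block per class, keeping every sample (row).
    return {
        cls: [[row[c] for c in cols] for row in rows]
        for cls, cols in columns_by_class.items()
    }


block = [
    [1.0, 2.0, 10.0],
    [3.0, 4.0, 20.0],
]
classes = ["NIR", "NIR", "temperature"]
parts = split_by_variable_class(block, classes)
```

Each resulting sub-block keeps all samples but only the columns of one class, so each can be preprocessed or modeled separately before joining.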

Setting Up the Join

Once data is loaded, use the right-click menus to edit and model the raw data. Next, adjust preprocessing as appropriate. If a model is being used, choose the fields to be used from the model. The first time you choose, default fields will be selected. Finally, click the gears button to join the data. If any problems occur, an error message will appear indicating the problem.

MultiblockWindow.jpg

Repeating a Join

Repeating a join can be done in the same window by adding data to the New Data area. Note that where models are used, they will be applied to the new data and fields extracted as above.
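
The idea behind applying an existing model to new data is that parameters estimated from the original data are reused, not re-estimated. The Python sketch below shows this with autoscaling parameters standing in for a stored model. That stand-in, the function names, and the dict layout are assumptions for illustration only.

```python
from statistics import mean, stdev


def fit_autoscale(rows):
    """Learn per-column mean and standard deviation from the original data."""
    cols = list(zip(*rows))
    return {"mean": [mean(c) for c in cols], "std": [stdev(c) for c in cols]}


def apply_autoscale(params, rows):
    """Scale rows using stored parameters; nothing is re-estimated."""
    return [
        [(x - m) / s for x, m, s in zip(row, params["mean"], params["std"])]
        for row in rows
    ]


original = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
params = fit_autoscale(original)  # estimated once, from the original data

new_data = [[2.0, 40.0]]
new_scaled = apply_autoscale(params, new_data)  # reuses the stored parameters
```

Because the new data is scaled with the original means and standard deviations, its joined result is on the same footing as the original join.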

MultiblocktoolJoinedNewData.jpg

Analyzing Joined Data

Use the right-click menus on the joined data icons at the top of the window to send the data to the Analysis window. The joined data will have a variable class indicating the original block that each variable came from.
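
As a conceptual sketch of how a variable class can record where each joined column came from, the Python example below concatenates equally sized blocks column-wise and builds a matching class list. The function name and data layout are hypothetical, and the tool's internal representation may differ.

```python
def join_with_block_class(blocks):
    """Concatenate blocks column-wise and record each column's source block.

    blocks : dict mapping block name -> list of rows (all with equal row count)
    Returns (joined_rows, var_class), where var_class has one entry per column.
    """
    names = list(blocks)
    n_rows = len(blocks[names[0]])
    if any(len(blocks[name]) != n_rows for name in names):
        raise ValueError("all blocks must have the same number of rows")
    joined = [[] for _ in range(n_rows)]
    var_class = []
    for name in names:
        rows = blocks[name]
        for i, row in enumerate(rows):
            joined[i].extend(row)  # append this block's columns to sample i
        var_class.extend([name] * len(rows[0]))  # one class entry per column
    return joined, var_class


joined, var_class = join_with_block_class({
    "NIR": [[0.1, 0.2], [0.3, 0.4]],
    "Raman": [[5.0], [6.0]],
})

# Pick out the columns that came from one block using the class list.
raman_cols = [i for i, name in enumerate(var_class) if name == "Raman"]
```

With such a class list, columns from a given source block can be selected, excluded, or colored in plots after the join.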