Multiblocktool

From Eigenvector Research Documentation Wiki

Multiblock Tool

Introduction

Joining data blocks from multiple sources can be challenging depending on where the data has come from, the attributes included with the data, and the manner in which you want to join the blocks.

There are several ways to join data blocks, and they fall into two categories: (A) joining as new observations (samples or rows) and (B) joining as new variables (columns). Joining data as new observations is discussed in the joining new data topic. Joining data as new variables is done using the Multiblock Tool and is discussed here.

The Multiblock Tool is an interface designed to make joining data as new variables easier and more transparent. It can handle cases where data sets are collected over a time range but one or more is sampled at a higher frequency than the others, where some samples are included in only some blocks, where a block has missing data, or where the blocks need to be "reduced" via a multivariate model prior to joining (also known as model fusion).

Multiblock Tool Interface

Getting Started

The purpose of this tool is to join blocks as new variables, so the samples (observations, rows) of the blocks must be matched to one another. There are several ways to do this. The simplest case is when the blocks have the same number of rows and each row corresponds to the same observation or sample. In other cases the blocks are not equally sized but have samples in common that can be matched by sample label. Finally, blocks may be sampled at different rates over the same period of time. If a time axisscale is included with both datasets, samples can be automatically co-added to match the size.
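The label-matching case can be illustrated outside the tool with a short Python sketch. This is not the Multiblock Tool's own code: the function name `join_by_labels` and the list-of-rows layout are hypothetical, and the sketch assumes that only samples present in both blocks are kept, which the tool may handle differently.

```python
def join_by_labels(labels_a, rows_a, labels_b, rows_b):
    """Join two blocks as new variables, matching rows by sample label.

    Assumption: samples that appear in only one block are dropped.
    """
    # Map each label in block B to its row position for fast lookup.
    index_b = {label: i for i, label in enumerate(labels_b)}

    joined_labels, joined_rows = [], []
    for label, row in zip(labels_a, rows_a):
        if label in index_b:
            joined_labels.append(label)
            # New variables from block B are appended as extra columns.
            joined_rows.append(list(row) + list(rows_b[index_b[label]]))
    return joined_labels, joined_rows


labels, rows = join_by_labels(
    ["s1", "s2", "s3"], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    ["s3", "s1"], [[30.0], [10.0]],
)
print(labels)  # samples common to both blocks, in block A order
print(rows)
```

The joined block keeps the sample order of the first block and gains one column per variable in the second block.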

The diagram below shows conceptually how sample axis scales, labels or simple data size are used to join two blocks.

MultiBlockOverview.jpg

This diagram also indicates the order of the tests used to match samples. The tool first looks for overlapping time axisscales, or other monotonically increasing sample axisscales, between the data blocks. If none are found, it next looks for matching sample labels and matches samples using the labels the blocks have in common. Finally, if no such labels exist, it checks whether the data blocks have the same number of samples and, if so, matches them directly. When matching on time axisscales, if one data block is sampled at a higher frequency than the other, it is sub-sampled to match the time axisscale of the lower-frequency block before the blocks are joined.
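The order of testing described above can be sketched in Python as follows. All names here (`choose_match_strategy`, `subsample_to`, and the dictionary keys `time`, `labels`, `data`) are hypothetical and are not part of the tool. The sketch treats "monotonically increasing" as strictly increasing and sub-samples by picking the nearest time point, both of which are assumptions about details the text does not specify.

```python
def is_increasing(values):
    """True when every value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def ranges_overlap(a, b):
    """True when two increasing axes share at least part of their range."""
    return a[0] <= b[-1] and b[0] <= a[-1]


def choose_match_strategy(block_a, block_b):
    """Return 'axisscale', 'labels', or 'size', tested in that order."""
    time_a, time_b = block_a.get("time"), block_b.get("time")
    if (time_a and time_b and is_increasing(time_a) and is_increasing(time_b)
            and ranges_overlap(time_a, time_b)):
        return "axisscale"

    labels_a, labels_b = block_a.get("labels"), block_b.get("labels")
    if labels_a and labels_b and set(labels_a) & set(labels_b):
        return "labels"

    if len(block_a["data"]) == len(block_b["data"]):
        return "size"

    raise ValueError("no way to match samples between the two blocks")


def subsample_to(times_low, times_high, rows_high):
    """Pick, for each low-frequency time, the nearest high-frequency row.

    Assumption: nearest-neighbour selection; the tool's rule may differ.
    """
    picked = []
    for t in times_low:
        nearest = min(range(len(times_high)),
                      key=lambda i: abs(times_high[i] - t))
        picked.append(rows_high[nearest])
    return picked
```

A block with a time axis of 0, 10, 20 and a second block sampled every 5 time units would be matched by axisscale, with the second block reduced to the three rows nearest those times.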

Adding Data

To add data to the Multiblock Tool, drag and drop it from the Workspace Browser, the MATLAB Workspace, or a file. The data will be loaded into the tool automatically. Default preprocessing will be added: group scaling for data and autoscale for models.

Note: If the data being imported has classes defined for the variables (identifying different groups of variables), you will be asked whether you want to split the block into separate DataSets, one for each class. This allows you to handle each group of classed variables with different preprocessing and/or models.
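Conceptually, splitting a block by variable class groups its columns by their class value. The Python sketch below illustrates the idea only. The function name `split_by_variable_class` and the plain list representation are hypothetical and do not reflect how the tool or the DataSet object stores classes.

```python
def split_by_variable_class(rows, var_classes):
    """Split one block into sub-blocks, one per variable class.

    rows        : list of samples, each a list of variable values
    var_classes : one class value per column of the block
    """
    # Collect the column positions belonging to each class.
    columns = {}
    for j, cls in enumerate(var_classes):
        columns.setdefault(cls, []).append(j)

    # Build one sub-block per class, keeping every sample (row).
    return {cls: [[row[j] for j in idx] for row in rows]
            for cls, idx in columns.items()}


blocks = split_by_variable_class(
    [[1, 2, 3], [4, 5, 6]],
    ["temperature", "pressure", "temperature"],
)
print(blocks)
```

Each resulting sub-block has the same samples as the original and can then be given its own preprocessing or model.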


Block-Specific Preprocessing

Each block can have its own preprocessing assigned to it. This preprocessing often includes filtering, normalizing, or other scaling methods suited to the type of data in that block, and it is applied only to that block. Block variance scaling (a type of group scaling) is recommended at a minimum.
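To show why block scaling matters, the Python sketch below uses one common definition of block variance scaling: mean-center each variable, then divide the whole block by the square root of its total variance so that every block contributes unit total variance to the joined data. This definition and the function name `block_variance_scale` are assumptions made for illustration. The tool's own implementation may differ in detail.

```python
import math


def block_variance_scale(rows):
    """Mean-center a block and scale it to unit total variance.

    Assumption: total variance is the sum of the column variances,
    each computed with an n - 1 denominator.
    """
    n = len(rows)
    n_cols = len(rows[0])

    # Column means, then mean-centered values.
    means = [sum(row[j] for row in rows) / n for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in rows]

    # Total variance of the block across all columns.
    total_var = sum(v * v for row in centered for v in row) / (n - 1)
    if total_var == 0:
        raise ValueError("block has zero variance and cannot be scaled")

    scale = math.sqrt(total_var)
    return [[v / scale for v in row] for row in centered]


scaled = block_variance_scale([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(scaled)
```

With this scaling, a block with hundreds of variables and a block with a handful start with the same overall weight in the joined data.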

Block-Specific Models

Each block can optionally be decomposed or otherwise manipulated with a model first. The result is model fusion, in which the outputs of the models are joined together rather than the original variables from the blocks.
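As a minimal sketch of model fusion, the Python code below projects one block onto the loadings of an existing linear model to obtain scores, then joins those scores with the variables of a second block. The model representation (a dictionary with `mean` and `loadings`) and the function names `apply_model` and `fuse` are hypothetical. Which model outputs are joined in the tool depends on the fields chosen from the model.

```python
def apply_model(model, rows):
    """Project a block onto a model's loadings to get scores.

    model["mean"]     : per-variable mean used for centering
    model["loadings"] : one list of weights per component
    """
    scores = []
    for row in rows:
        centered = [x - m for x, m in zip(row, model["mean"])]
        scores.append([sum(c * p for c, p in zip(centered, component))
                       for component in model["loadings"]])
    return scores


def fuse(block_outputs):
    """Join per-block outputs side by side, one row per sample."""
    n = len(block_outputs[0])
    if any(len(out) != n for out in block_outputs):
        raise ValueError("all blocks must have the same number of samples")
    return [[x for out in block_outputs for x in out[i]] for i in range(n)]


# Hypothetical one-component model for a two-variable block.
model = {"mean": [1.0, 1.0], "loadings": [[1.0, 0.0]]}
scores = apply_model(model, [[2.0, 5.0], [0.0, 1.0]])

# The second block is joined as-is, without a model.
fused = fuse([scores, [[7.0], [8.0]]])
print(fused)
```

The joined data here has one score column in place of the two original variables of the first block.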

Setting Up the Join

Once data is loaded, use the right-click menus to edit and model the raw data. Next, adjust preprocessing as appropriate. If a model is being used, choose the fields to be used from the model. The first time you choose, default fields will be selected. Finally, click the gears button to join the data. If any problems occur, an error message will appear indicating the problem.

MultiblockWindow.jpg

Repeating a Join

Repeating a join can be done in the same window by adding data to the New Data area. Note that where models are used, they will be applied to the new data and the fields extracted as above.
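The idea of repeating a join can be sketched in Python as a set of per-block steps that is defined once and then applied unchanged to any new data. The function name `make_join` and the use of plain callables for the per-block steps are hypothetical and only illustrate the concept.

```python
def make_join(block_steps):
    """Build a reusable join from one processing step per block.

    Each step takes a block (list of rows) and returns the rows to join,
    for example a fixed preprocessing or a stored model applied to the block.
    """
    def join(blocks):
        if len(blocks) != len(block_steps):
            raise ValueError("expected one block per stored step")
        outputs = [step(rows) for step, rows in zip(block_steps, blocks)]
        n = len(outputs[0])
        if any(len(out) != n for out in outputs):
            raise ValueError("blocks do not have matching samples")
        return [[x for out in outputs for x in out[i]] for i in range(n)]
    return join


# Hypothetical steps: double the first block, pass the second through.
join = make_join([
    lambda rows: [[2 * v for v in row] for row in rows],
    lambda rows: rows,
])

calibration = join([[[1], [2]], [[5], [6]]])
new_data = join([[[3]], [[9]]])  # same steps, new samples
print(calibration, new_data)
```

Because the steps are stored rather than re-derived, new samples are treated in the same way as the original data.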

MultiblocktoolJoinedNewData.jpg

Analyzing Joined Data

Use the right-click menus on the joined data icons at the top of the window to send the data to the Analysis window. The joined data will have a variable class indicating the original block of each variable.
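The role of the variable class can be illustrated with a short Python sketch that joins named blocks and records, for each joined column, the block it came from. The function name `join_with_classes` and the block names are hypothetical. In the tool this information is carried by the joined data itself.

```python
def join_with_classes(named_blocks):
    """Join blocks as new variables and track each column's source block.

    named_blocks : list of (block_name, rows) pairs with matching samples
    Returns the joined rows and one class value per joined column.
    """
    n = len(named_blocks[0][1])
    joined = [[] for _ in range(n)]
    var_class = []
    for name, rows in named_blocks:
        if len(rows) != n:
            raise ValueError("all blocks must have the same number of samples")
        for i, row in enumerate(rows):
            joined[i].extend(row)
        # One class entry per variable contributed by this block.
        var_class.extend([name] * len(rows[0]))
    return joined, var_class


joined, var_class = join_with_classes([
    ("block_A", [[1, 2], [3, 4]]),
    ("block_B", [[5], [6]]),
])
print(joined)
print(var_class)
```

The class values make it possible to select or color variables by source block during later analysis.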