Multi-block Multi-set and Data Fusion

From Eigenvector Research Documentation Wiki
Revision as of 16:55, 11 June 2015 by Jeremy

Introduction

This page gives an overview of combining data blocks in PLS_Toolbox and Solo. There are various reasons for joining multiple data blocks together, and multiple ways in which they can be joined. Typical reasons include:

  • Adding samples to an existing set of data to improve a calibration data set
  • Combining data from different experiments measured using the same technique(s)
  • Combining multiple measurements taken on the same objects (samples) or over the same or similar periods of time (for a time-dependent system)
  • Examining whether one or another technique provides more information on a given set of samples

These joins are often a step in multi-level, multi-set, data fusion, or model fusion studies.

In many cases, a join is done in one of two ways:

  1. Joining the data blocks as new variables for the same or similar objects or samples
  2. Joining the data blocks as new objects or samples over the same or similar measured variables
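To make the two directions concrete, here is a minimal sketch in Python with NumPy. Python is used only for illustration, since PLS_Toolbox itself runs in MATLAB. The function name join and its mode argument are hypothetical. Mode 1 means samples and mode 2 means variables, following the DataSet convention that mode 1 is samples. The sketch shows only the size rule that each direction imposes and leaves out alignment and scaling.

```python
import numpy as np

def join(blocks, mode):
    """Join 2-D blocks as new samples (mode=1, stacked as rows) or
    as new variables (mode=2, stacked as columns). Hypothetical helper."""
    if mode == 1:
        # New samples: every block must have the same number of variables (columns).
        if len({b.shape[1] for b in blocks}) != 1:
            raise ValueError("blocks must have the same number of variables")
        return np.vstack(blocks)
    if mode == 2:
        # New variables: every block must have the same number of samples (rows).
        if len({b.shape[0] for b in blocks}) != 1:
            raise ValueError("blocks must have the same number of samples")
        return np.hstack(blocks)
    raise ValueError("mode must be 1 (samples) or 2 (variables)")

a = np.ones((4, 3))   # 4 samples x 3 variables
b = np.zeros((2, 3))  # 2 more samples, same 3 variables
c = np.zeros((4, 5))  # same 4 samples, 5 new variables
print(join([a, b], 1).shape)  # (6, 3)
print(join([a, c], 2).shape)  # (4, 8)
```

Stacking rows requires matching column counts, and stacking columns requires matching row counts. The explicit ValueError checks state that requirement directly instead of relying on NumPy's own error message.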

The basic concepts for each of these processes are described below.

Joining as New Samples

Joining as new samples is often a fairly simple process: the new data are appended as new rows of the data matrix. In general, the number and type of variables must match, but tools (such as matchvars) exist to align differing variable sets. The process is shown schematically in the image below.

In practice, a second data block (or several additional blocks) can be joined as new samples in the Workspace Browser, Analysis Window, or DataSet Editor. To do so, drag the new data files or a loaded data object (usually from the Workspace Browser) onto the existing loaded data. When you drop the data, you will be asked whether to join it as new samples, new variables, or sometimes new "slabs" (for 3D data). The available options depend on the sizes of the data, and only "Samples" may be offered if the sizes don't match. Variable alignment is generally handled automatically and the blocks are then joined.

From the MATLAB command line with PLS_Toolbox, the DataSet method augment() can be used to perform joins:

 newdata = augment(1,data1,data2,data3)

joins the three data blocks (data1, data2, data3) along the first mode (samples), handling the variable alignment automatically.
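Conceptually, a sample-mode join with variable alignment does three things: it keeps the variables the blocks share, puts them in a common order, and stacks the rows. The Python/NumPy sketch below makes two assumptions. Each block is a NumPy array paired with a list of variable labels, and alignment means intersecting those labels. The helper join_samples and the element labels are hypothetical. This is not how augment or matchvars are implemented, and those tools may handle cases (such as numeric axis scales) that this sketch ignores.

```python
import numpy as np

def join_samples(blocks):
    """Join (data, labels) blocks as new samples, keeping only the variables
    whose labels appear in every block, in the order of the first block.
    Hypothetical helper for illustration."""
    common = [lab for lab in blocks[0][1]
              if all(lab in labels for _, labels in blocks[1:])]
    rows = []
    for data, labels in blocks:
        index = {lab: j for j, lab in enumerate(labels)}
        # Reorder (and subset) this block's columns to the shared variable order.
        rows.append(data[:, [index[lab] for lab in common]])
    return np.vstack(rows), common

# Hypothetical blocks: the second one lacks "Cu" and lists its variables in a different order.
data1 = (np.array([[1.0, 2.0, 3.0]]), ["Fe", "Cu", "Zn"])
data2 = (np.array([[30.0, 10.0], [31.0, 11.0]]), ["Zn", "Fe"])
joined, labels = join_samples([data1, data2])
print(labels)  # ['Fe', 'Zn']
print(joined)
```

Dropping unshared variables is only one possible policy. A tool could instead fill missing variables or interpolate onto a common axis, which is why a dedicated alignment step such as matchvars is useful.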

[Figure: joining data blocks as new samples (Samplejoin.png)]

Joining as New Variables

Joining data blocks as new variables is often a more complex task. It usually involves at least scaling the blocks to correct for differences in magnitude and variance. Sometimes the join also involves block-specific preprocessing. It may even involve decomposing or analyzing each block with a multivariate model and using that model's outputs in the join. The simplest join is shown conceptually below.

These operations are best carried out with the Multiblock Tool, which handles the various alignment, preprocessing, and modeling options.
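One common way to carry out the scaling step is block scaling. Each block is autoscaled, then divided by the square root of its number of variables, so that a block with hundreds of variables does not swamp one with only a few. This is a generic Python/NumPy sketch under that assumption, not necessarily the scaling the Multiblock Tool applies. The helper block_scale_join and the example blocks (nir, temps) are hypothetical.

```python
import numpy as np

def block_scale_join(blocks):
    """Autoscale each block, divide it by sqrt(number of variables), then join
    the blocks as new variables. Hypothetical helper for illustration."""
    scaled = []
    for X in blocks:
        X = np.asarray(X, dtype=float)
        mean = X.mean(axis=0)
        std = X.std(axis=0, ddof=1)
        std[std == 0] = 1.0  # constant columns end up centered (all zero) but unscaled
        Z = (X - mean) / std  # each non-constant column: mean 0, variance 1
        # Each column now has variance 1/p, so the block's total variance is 1
        # when no column is constant.
        scaled.append(Z / np.sqrt(X.shape[1]))
    return np.hstack(scaled)

rng = np.random.default_rng(0)
nir = rng.normal(size=(10, 200)) * 0.01  # hypothetical: many small-magnitude variables
temps = rng.normal(size=(10, 3)) * 50.0  # hypothetical: few large-magnitude variables
X = block_scale_join([nir, temps])
print(X.shape)  # (10, 203)
```

After scaling, each block's column variances sum to 1, so the 200-variable block and the 3-variable block carry equal total weight in any subsequent model.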

[Figure: joining data blocks as new variables (Variablejoin.png)]
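When each block is first decomposed with a multivariate model, the model outputs are joined instead of the raw variables. The minimal sketch below assumes that PCA scores are the outputs being joined. It computes them with NumPy's SVD, not with PLS_Toolbox's own modeling functions. The helpers pca_scores and fuse_scores and the block names are hypothetical, and the number of components per block is an arbitrary illustrative choice.

```python
import numpy as np

def pca_scores(X, ncomp):
    """Scores of a PCA model on mean-centered X, computed via the SVD."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :ncomp] * s[:ncomp]  # T = U S, one column per component

def fuse_scores(blocks, ncomps):
    """Model each block separately, then join the score matrices as new variables."""
    return np.hstack([pca_scores(np.asarray(X, dtype=float), k)
                      for X, k in zip(blocks, ncomps)])

rng = np.random.default_rng(1)
spectra = rng.normal(size=(12, 100))  # hypothetical block 1: 12 samples x 100 variables
process = rng.normal(size=(12, 6))    # hypothetical block 2: the same 12 samples
fused = fuse_scores([spectra, process], ncomps=[3, 2])
print(fused.shape)  # (12, 5)
```

Joining a few scores per block keeps the fused matrix small (5 columns here instead of 106) while each block still contributes its main directions of variation.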