Multi-block Multi-set and Data Fusion
Introduction
This page gives an overview of combining data blocks in PLS_Toolbox and Solo. There are various reasons for joining multiple data blocks together, and multiple ways in which they can be joined. Some typical examples include:
- Adding samples to an existing set of data to improve a calibration data set
- Combining data from different experiments measured using the same technique(s)
- Combining multiple measurements taken on the same objects (aka samples) or over the same or similar periods of time (for a time-dependent system)
- Examining whether one or another technique provides more information on a given set of samples
The goal of these joining processes is often a multi-level, multi-set, data fusion, or model fusion study.
In many cases, the join process tends to be one of two methods:
- Joining the data blocks as new variables for the same or similar objects or samples
- Joining the data blocks as new objects or samples over the same or similar measured variables
The basic concepts for each of these processes are described below.
Joining as New Samples
Joining as new samples is often a fairly simple process: the new data are appended as new rows of a data matrix. In general, the number and type of variables must match, but tools (such as matchvars) exist to handle alignment of differing variable sets. Schematically, the process is shown in the image below.
The practical task of joining a second data block (or several additional data blocks) as new samples can be accomplished within the Workspace Browser, Analysis Window, or DataSet Editor by dragging the new data files or loaded data objects (usually from the Workspace Browser) onto the existing loaded data. When the data are dropped, you will be asked whether to join them as new samples, as new variables, or sometimes as new "slabs" (for 3D data). The options available depend on the size of the data; only "Samples" may be offered if the sizes do not otherwise match. The alignment of variables will generally be handled automatically and the blocks will be joined.
From the MATLAB command line with PLS_Toolbox, the DataSet method augment() can be used to do joins. For example,
newdata = augment(1,data1,data2,data3)
would join three data blocks (data1, data2, data3) in the first mode (samples), automatically handling the variable alignment.
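As a rough illustration of what a sample-mode join involves, the sketch below uses plain numeric matrices in base MATLAB rather than DataSet objects. The variable axes (ax1, ax2), the example matrices, and the use of intersect to keep only the shared variables are assumptions made for this example; they are not a description of how augment() or matchvars align variables internally.

```matlab
% Two hypothetical blocks measured on overlapping variable axes.
ax1 = [400 410 420 430];          % variable axis of block 1 (e.g. wavelengths)
ax2 = [410 420 430 440];          % variable axis of block 2
X1  = [1 2 3 4; 5 6 7 8];         % 2 samples x 4 variables
X2  = [10 20 30 40];              % 1 sample  x 4 variables

% Keep only the variables present in both blocks (illustrative choice).
[common, i1, i2] = intersect(ax1, ax2);

% Append block 2 as new rows (samples) under block 1.
Xjoined = [X1(:, i1); X2(:, i2)];

disp(common);    % shared variable axis
disp(Xjoined);   % 3 samples x 3 shared variables
```

Dropping the non-shared variables is only one possible alignment strategy; others, such as interpolating onto a common axis or padding with missing values, may suit a given data set better.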
Joining as New Variables
Joining data blocks as new variables is often a more complex task and usually involves at least a scaling of the blocks (correcting for magnitude and variance). Sometimes the join involves additional block-specific preprocessing, or even decomposing or analyzing the block with a multivariate model and using the outputs of that model in the join. Conceptually, the simplest join is shown below.
These operations are best carried out using the Multiblock Tool which handles the various alignment, preprocessing and modeling options.
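For intuition about why block scaling matters in a variable-mode join, here is a minimal sketch in base MATLAB with plain matrices. It assumes one common scheme (autoscale each column, then divide each block by the square root of its number of variables so that every block contributes the same total variance); the matrices and helper names (autoscale, blockscale) are hypothetical, and this is not presented as what the Multiblock Tool does internally. The implicit expansion used here requires MATLAB R2016b or later.

```matlab
% Two hypothetical blocks measured on the same 4 samples.
Xa = [1 10; 2 20; 3 30; 4 40];                 % 4 samples x 2 variables
Xb = [0.1 5 7; 0.2 3 9; 0.4 8 2; 0.3 6 4];     % 4 samples x 3 variables

% Autoscale a block: mean-center and scale every column to unit variance.
autoscale = @(X) (X - mean(X, 1)) ./ std(X, 0, 1);

% Divide by sqrt(number of variables) so each block has total variance 1.
blockscale = @(X) autoscale(X) ./ sqrt(size(X, 2));

% Join the scaled blocks side by side as new variables.
Xjoined = [blockscale(Xa), blockscale(Xb)];

disp(size(Xjoined));                 % 4 samples x 5 variables
disp(sum(var(Xjoined(:, 1:2))));     % total variance of block a
disp(sum(var(Xjoined(:, 3:5))));     % total variance of block b
```

Without a step like this, the block with more variables or larger magnitudes would tend to dominate any model built on the joined data.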