Multiblocktool and Variableselectiongui: Difference between pages

From Eigenvector Research Documentation Wiki
(Difference between pages)
imported>Jeremy
 
imported>Scott
No edit summary
 
Line 1: Line 1:
=Multiblock Tool=
==Introduction==
__TOC__


=Introduction=
The Variable Selection panel provides an interface to several methods for performing variable selection. The goal is to find subsets of variables that improve predictions when compared to using all variables. Finding the best method and option settings will take some experimentation. Use the links below for more information on particular methods.


Joining data blocks from multiple sources can be challenging depending on where the data has come from, the attributes included with the data, and the manner in which you want to join the blocks.
==Methods==


There are several ways to join data blocks that break down into two categories: (A) Joining as new observations (samples or rows) and (B) Joining as new variables (columns). Joining data as new observations is discussed in the [[joining new data]] topic, but joining data as new variables is done using the Multiblock Tool and is discussed here.
* Automatic (VIP or sRatio)
* GA - Genetic Algorithm
* iPLS - Interval PLS
* rPLS - Recursive PLS
* sRatio - Selectivity Ratio
* VIP - Variable Importance in Projection


The Multiblock Tool is an interface designed to make joining data as new variables easier and more transparent. It can handle cases where data sets are collected over a time range but one or more is sampled at a higher frequency than the others, where some samples are included in only some blocks, where a block has missing data, or where the blocks need to be "reduced" via a multivariate model prior to joining (also known as model fusion).
==Work Flow==


=Multiblock Tool Interface=
* <u>Select a Method</u> - Select a method from the drop-down menu. Options for the method will be displayed. If a previous calculation has been done, its results will be displayed.
==Getting Started==
* <u>Adjust Options</u> - By default, a simplified set of options is displayed. If the "Show All Options" checkbox is selected, all available options will be displayed. Depending on the options set, a particular method can take an extended amount of time to complete. For example, decreasing the window width in GA will increase the amount of time it takes to complete. See the documentation for more details on optional settings.
The purpose of this tool is to join blocks as new ''variables'', so the samples (observations, rows) of the blocks must match. There are several ways to match them. The simplest case is when the blocks have the same number of rows and each row corresponds to the same observation or sample. In some cases the blocks are not equally sized but have samples in common, and these can be matched by sample label name. Finally, blocks may be sampled at different rates over the same period of time. If a time axisscale is included with both datasets, samples can be automatically [[coadd | co-added]] to match the size.
* <u>Run Variable Selection</u> - Clicking the "Execute" button will run the current variable selection method with the values specified in the options. A waitbar will be displayed indicating that the method is running. Some methods will display a waitbar with a message indicating that it can be closed to cancel execution. NOTE: It can take some time for the method to finish a calculation loop and detect that the user has canceled. If "Show Plots" is checked, any additional plots will be displayed in separate windows. This is useful for GA because it shows the progress of the calculation.
 
* <u>View Results</u> - When a calculation is complete, the selected variables will be displayed as green bars under a plot of the data mean.
The diagram below shows conceptually how variable axis scales, labels or simple data size are used to join two blocks.
 
[[Image:MultiBlockOverview.jpg|500px]]
 
==Adding Data==
To add data to the Multiblock Tool, drag and drop it from the Workspace Browser, the MATLAB Workspace, or a file. The data will automatically be loaded into the tool. Default preprocessing will be added: [[gscale|group scaling]] for data and [[auto|autoscale]] for models.
 
Note: If the data being imported has classes defined for the variables (identifying different groups of variables), you will be asked whether you want to split the block into separate DataSets, one for each class. This allows you to handle each group of classed variables with different preprocessing and/or models.
 
==Setting Up the Join==
Once data is loaded, use the right-click menus to edit and model raw data. Next, adjust preprocessing as appropriate. If a model is being used, choose the fields to be used from the model; the first time you choose, default fields will be selected. Finally, click the gears button to join the data. If any problems occur, an error message will appear indicating the problem.
 
[[Image:MultiblockWindow.jpg| |800px | ]]
 
==Repeating a Join==
 
Repeating a join can be done in the same window by adding data to the New Data area. Note that where models are used they will be applied to new data and fields extracted as above.
 
[[Image:MultiblocktoolJoinedNewData.jpg| |700px| ]]
 
==Analyzing Joined Data==
 
Use the right-click menus on the joined data icons at the top of the window to send the data to the Analysis window. Data will have a variable class indicating the original data.
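
As a rough illustration of joining two blocks as new variables by sample label, the following Python sketch matches rows by label and concatenates their columns. It is not the Multiblock Tool's implementation: the function name <code>join_by_label</code> is hypothetical, and keeping only the samples found in both blocks is an assumption made here for simplicity. The returned variable class list mirrors the idea that the joined data records which original block each variable came from.

```python
def join_by_label(labels_a, block_a, labels_b, block_b):
    """Join two blocks as new variables (columns), matching rows by sample label.

    Assumption for this sketch: only samples present in BOTH blocks are kept,
    in the order they appear in block A. Labels are assumed to be unique.
    """
    n_vars_a = len(block_a[0]) if block_a else 0
    n_vars_b = len(block_b[0]) if block_b else 0

    # Look-up table from sample label to its row in block B.
    rows_b = {label: row for label, row in zip(labels_b, block_b)}

    joined_labels = []
    joined_rows = []
    for label, row_a in zip(labels_a, block_a):
        if label in rows_b:
            joined_labels.append(label)
            joined_rows.append(list(row_a) + list(rows_b[label]))

    # Variable class: 0 for columns that came from block A, 1 for block B.
    variable_class = [0] * n_vars_a + [1] * n_vars_b
    return joined_labels, joined_rows, variable_class


if __name__ == "__main__":
    labels, rows, var_class = join_by_label(
        ["s1", "s2", "s3"], [[1, 2], [3, 4], [5, 6]],
        ["s3", "s1"], [[30], [10]],
    )
    print(labels, rows, var_class)
```

Blocks that already have equal numbers of rows in the same order need no look-up at all; the rows can simply be concatenated pairwise.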

Revision as of 14:24, 11 January 2018

Introduction

The Variable Selection panel provides an interface to several methods for performing variable selection. The goal is to find subsets of variables that improve predictions when compared to using all variables. Finding the best method and option settings will take some experimentation. Use the links below for more information on particular methods.

Methods

  • Automatic (VIP or sRatio)
  • GA - Genetic Algorithm
  • iPLS - Interval PLS
  • rPLS - Recursive PLS
  • sRatio - Selectivity Ratio
  • VIP - Variable Importance in Projection

Work Flow

  • Select a Method - Select a method from the drop-down menu. Options for the method will be displayed. If a previous calculation has been done, its results will be displayed.
  • Adjust Options - By default, a simplified set of options is displayed. If the "Show All Options" checkbox is selected, all available options will be displayed. Depending on the options set, a particular method can take an extended amount of time to complete. For example, decreasing the window width in GA will increase the amount of time it takes to complete. See the documentation for more details on optional settings.
  • Run Variable Selection - Clicking the "Execute" button will run the current variable selection method with the values specified in the options. A waitbar will be displayed indicating that the method is running. Some methods will display a waitbar with a message indicating that it can be closed to cancel execution. NOTE: It can take some time for the method to finish a calculation loop and detect that the user has canceled. If "Show Plots" is checked, any additional plots will be displayed in separate windows. This is useful for GA because it shows the progress of the calculation.
  • View Results - When a calculation is complete, the selected variables will be displayed as green bars under a plot of the data mean.
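
To make the idea behind one of the listed methods more concrete, here is a minimal Python sketch of VIP-based selection for a single response variable. It is an illustration under stated assumptions, not the code the Variable Selection panel runs: the function names <code>pls1_vip</code> and <code>select_by_vip</code> are hypothetical, the PLS model is fitted with a plain NIPALS loop on mean-centered data, and the cutoff of 1 is a commonly used convention that may differ from the panel's own settings.

```python
import math


def pls1_vip(X, y, n_components):
    """Return VIP scores from a single-response PLS model (NIPALS, mean-centered)."""
    n, p = len(X), len(X[0])

    # Mean-center the columns of X and the response y.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]

    weights, explained = [], []
    for _ in range(n_components):
        # Weight vector: covariance direction between X residuals and y residuals.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in y for X to explain
        w = [v / norm for v in w]

        # Scores, X loadings, and the y regression coefficient for this component.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt

        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]

        weights.append(w)
        explained.append(q * q * tt)  # y variance captured by this component

    total = sum(explained)
    if total == 0:
        raise ValueError("no PLS component could be extracted")

    # VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a))
    return [
        math.sqrt(p * sum(ss * w[j] ** 2 for ss, w in zip(explained, weights)) / total)
        for j in range(p)
    ]


def select_by_vip(vip, threshold=1.0):
    """Indices of variables whose VIP score exceeds the threshold."""
    return [j for j, v in enumerate(vip) if v > threshold]
```

Because each weight vector has unit length, the squared VIP scores sum to the number of variables, which is why a cutoff near 1 separates variables of above-average importance from the rest.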