Bspcgui and Faq import three-way data: Difference between pages

From Eigenvector Research Documentation Wiki
(Difference between pages)
imported>Scott
 
imported>Bob
 
__TOC__
===Issue:===


=Introduction=
How do I import three-way data into Solo or PLS_Toolbox?
Batch Statistical Process Control (BSPC) is the analysis of process data as a function of both correlation among the measured variables and correlation in time (also known as the batch trajectory). The data is subdivided into "batches" (experiments), each of which may be further subdivided into "steps" (subdivisions of a batch indicating processing segments or other divisions). BSPC goes by many names: process monitoring, fault detection, and anomaly detection, to name a few. Methods generally rely on a model that describes normal and/or desirable operation. Often much is learned about the process simply from creating a model. Given a process model, future operating data can be compared to the model to determine whether the process condition is nominal.


The BSPC interface is available in Solo or the PLS_Toolbox browser window in the "Analysis Tools" panel under the "TRANSFORM" label as "Batch Processor". This interface prompts the user to select the type of analysis method to use, then guides the user through the steps necessary to apply that method, including:
===Possible Solutions:===
# Importing and organizing the batch data
# Assuring the batch and step labels (if desired) are assigned
# Aligning the time axes of the batches (if needed for the specific analysis method)
# Choosing other data manipulation settings as needed for the method
# Rearranging the data to the appropriate format for the specific analysis method using the [[batchfold]] function


=Getting Started=
'''Solution 1) Built-in EEM importers:'''
Data is derived directly from process data with the goal being to summarize high-dimensional data with a handful of factors that capture important directions in the data. Success is highly dependent upon the quantity and quality of process data.


Raw data is presumed to be in a 2 dimensional DataSet with rows being samples in time and columns being variables.
If applicable to your file type, use one of the built-in EEM importers. There are importers for EEM data from Hitachi, Shimadzu, Horiba and Jasco. Please see this wiki entry for more information on [[Data_Importing_Formats | Data Importing Formats]].


[[Image:bspc_data_config.png|200px|Data Configuration]]
EEM data needs to be configured in a specific way such that:


===Model Types===
* '''mode 1''' corresponds to '''samples'''


The following describes the model types available as targets of the BSPC processing. The dimensions of the resulting processed data and other considerations are listed along with a brief description of the unique characteristics of the model type.
* <div>'''mode 2''' corresponds to '''emission'''</div>


{| class="wikitable" border="1"
* <div>'''mode 3''' corresponds to '''excitation'''</div>
|+ BSPC Model Types
! Model !! Modes (Dimensions) !! Equal Length Batches !! Steps Aligned !! Data Shape !! Model Comments
|-
| Summary PCA || 2 || No || No || Batch x (Step/Summary) || PCA on statistics summarizing the change in measured variables over the batch progress. Less sensitive to specific batch trajectory.
|-
| [[Batchmaturity|Batch Maturity]] || 2 || No || No || (Batch/Step) x Variable, Can have Y-Block to indicate maturity || PCA with heterogeneous confidence limits based on percent progression through batch.
|-
| [[Mpca|MPCA]] || 3 || Yes || Yes || Time (step) x Variable x Batch || Multiway PCA - captures correlation between variables and their changes through time (trajectory). Very sensitive to trajectory differences.
|-
| [[Parafac|PARAFAC]] || 3 || Yes || Yes || Batch x Variable x Time (step) || Parallel Factor Analysis (multiway). Imposes stronger expectation of similarity between variable trajectories than MPCA.
|-
| Summary PARAFAC || 3 || No || No ||  Batch x Step x Summary || PARAFAC on summary statistics of variables over time. Less sensitive to specific batch trajectory, imposes expectation of correlation between steps of process.
|-
| [[Parafac2|PARAFAC2]] || 3 || No || No ||  Cell Array of Batches || PARAFAC with relaxed multiway structures (only available at PLS_Toolbox command line). Much less sensitive to specific batch trajectory than PARAFAC or MPCA.
|}


See Also: [[batchmaturity|Batch Maturity]], [[mpca|MPCA]], [[MSPC_and_Identification_of_Finite_Impulse_Response_Models|MSPC]], [[parafac|PARAFAC]], [[parafac2|PARAFAC2]]
The built-in EEM importers will handle this configuration automatically. When importing manually (see below), further manipulation will likely be necessary. Use the Transform &rarr; Permute modes and Transform &rarr; Reshape menu items to modify your imported data as appropriate.


=Batch Processor Window=
'''Solution 2) For three-way data with few slabs:'''


The goal of the Batch Processor interface is to make it easier to assemble batch data for multivariate analysis. Different analyses and conditions require different data manipulation. This interface attempts to simplify the assembling of data for batch analysis which might otherwise be [[media:Bspc_diagram_roadmap.png | very complicated]].
<ol style="list-style-type:lower-alpha">
  <li>Import the data slabs into the workspace (browser). The workspace browser is available from the main analysis user interface from the menu item FigBrowser.</li>
  <li>Each slab, i.e., each matrix of data, is imported individually. Hence, if you have a '''10x8x3''' array, you will import three slabs, each of size '''10x8'''.</li>
  <li>Use the mouse to drag slab two onto slab one. In the window that opens choose Augment and then choose augment in the Slabs direction.</li>
  <li>A two-slab three-way array has now replaced the first data matrix. More slabs can be added in the same fashion.</li>
</ol>


The workflow of the interface moves from left to right across the tabs at the top of the interface. Loading data and choosing an Analysis Type will enable relevant tabs. Clicking the '''Next''' button will open the next enabled tab. Batches and steps are defined, then alignment and summary information is added. When finished, "folded" data can be saved or exported to the [[Analysis GUI|analysis]] interface and a model for folding new data can be saved.
Alternatively, you may also open one slab in the dataset editor and then add additional slabs using File &rarr; Import. After selecting the next slab to import, answer the same questions as in step c above. Repeat for each slab.  


[[Image:BSPCGUI main.png| BSPC GUI]]
'''Solution 3) For larger three-way data:'''


==Start==
In the DataSet editor, you can import a full three-way array if you have it organized as a two-way matrix. Upon importing the two-way data, you can reshape to a three-way array using the menu item: Transform &rarr; Fold into 3-way.  
Load, append, edit, and/or clear data. Selecting the Analysis type will automatically enable/disable relevant tabs.


* Dropping data onto the status area will load data. If previously loaded data exists, a prompt for overwrite or augment will appear.
For example, you have the above matrices (three slabs) in one table/matrix:
** If augment is chosen, two options are given: augment as a new batch or not. Augmenting as a new batch adds a class for the data being augmented; otherwise a "normal" augment occurs and, if the new dataset has a matching class, it is merged.
* Dragging and dropping multiple-selected (Excel) files from the system browser (e.g., Windows Explorer or Finder) will pre-augment the files and create a label indicating file name. This label can be used to identify batches in the '''Batches''' tab.
* Data can be edited in the [[DataSet Editor]] by clicking the '''Edit''' button. Editing will cause the model to be cleared.


==Batch==
  [ Slab1;
Indicates which samples belong to which batch based on information in the loaded DataSet. Sources can be Class, Label, or Axisscale sets, or a single Variable (column). If manually loaded, a class is created from the loaded content. If the DataSet contains a class with the default name of "BSPC Batch" then it will be automatically selected after loading.
  Slab2;
  Slab3 ]


* If variable is used, data for that column will be excluded (not deleted) so other mechanisms (preprocessing) can work.
That is, the three slabs are stacked below each other. Upon importing, use the menu option described above to "Fold into 3-way" and choose three as the number of slabs, and the data will be rearranged accordingly. If you are familiar with the MATLAB function <code>reshape</code>, you may also use Transform &rarr; Reshape for other types of rearrangements.
* Once Batches have been identified, one or more batches can be plotted in the lower plot.


All methods, except [[batchmaturity|Batch Maturity]] require defining a means to identify the different batches because these are used to form the samples of the input matrices (and multiple samples are required for all methods other than Batch Maturity).
Note: the result of this command will give you slabs in the 3rd mode of the DataSet. If these slabs are separate samples (such as with EEMs), you'll want to use the Transform &rarr; Permute menu to reorder the dimensions. For example, permuting to the order [3 2 1] would swap the order of the 1st and 3rd modes, putting slabs as the first mode.  


==Steps==
Steps (subdivisions of batches) can be indicated on the '''Steps''' tab. Steps can be created in the same manner as '''Batches''' or indicated manually. Particular steps to be included or excluded can be selected.


Manual selection is done by selecting a primary variable and batch to align '''to''' then designating '''steps''' for the primary variable/batch. After the steps are set the [[batchalign]] function is used to "map" step location (as dataset class) for each batch.
'''Still having problems? Please contact our helpdesk at [mailto:helpdesk@eigenvector.com helpdesk@eigenvector.com]'''


===Manually Selecting Steps===
[[Category:FAQ]]
 
[[Image:bspc_manual_select.png|500px|Manual Selection Interface]]
 
To manually select steps:
 
 # Select the variable and batch to use from the plot list boxes at the bottom of the interface. These will become the variable and batch to which all others are aligned (designated by a "*" next to the list item).
# Click the '''Select''' button and the interface will switch.
# Click the '''Add''' button to place the first step marker.
# Drag this marker to the first step location.
# Repeat until all steps are placed.
 # Select a different batch from the list menu to display the "aligned" step position.
 # Adjust the alignment algorithm as needed using the toolbar button.
 # Click the check-mark button to finish and save the steps.
 
===Selected Steps Menu===
 
[[Image:bspc_selected_steps.png|300px|]]
 
 Once steps have been designated, they will appear in the '''Step Selection''' list. If one or more steps should be ignored, they can be deselected in this menu. Selected steps appear in the batch plot as solid green lines and unselected steps appear as red dashed lines.
 
==Align==
 
Methods that require equal length batches use the tools available on the '''Align''' tab from the [[batchalign]] function.
 
The process to configure the alignment is:
 
# Select the type of alignment:
#* '''Linear''' - Linear interpolation or decimation to match selected batch's length.
#* '''COW''' - [[cow|Correlation Optimized Warping]] with Alignment Settings values.
#* '''Pad With NaN''' - Append each batch with NaN to make all batches equal length.
# Select the Batch and Variable to use as a reference (target) or Load a vector.
# Select alignment settings (if using COW).
# Click Update Plot to see the results.
 
The plots switch to displaying selected variables and batches with the pre-aligned data on top and post-aligned data on the bottom. Click the '''Update Plots''' button to refresh the plot after making any changes.
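As a rough illustration of what the '''Linear''' and '''Pad With NaN''' options do to a trajectory, the following Python sketch resamples one batch to a reference length and pads a set of batches to a common length. It is not the [[batchalign]] implementation: the function names <code>align_linear</code> and <code>pad_with_nan</code>, and the choice to match the first and last points when interpolating, are assumptions made only for this example.

```python
import math


def align_linear(batch, target_length):
    """Resample a 1-D trajectory to target_length points.

    Assumption: the first and last samples are kept and the points in
    between are linearly interpolated (or decimated when shortening).
    """
    n = len(batch)
    if n == 0:
        raise ValueError("batch must contain at least one sample")
    if n == 1 or target_length == 1:
        return [float(batch[0])] * target_length
    aligned = []
    for i in range(target_length):
        # Position of output point i on the original sample axis (0 .. n-1).
        pos = i * (n - 1) / (target_length - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        aligned.append(batch[lo] * (1.0 - frac) + batch[hi] * frac)
    return aligned


def pad_with_nan(batches):
    """Append NaN to each batch so that all batches share the longest length."""
    longest = max(len(b) for b in batches)
    return [list(b) + [math.nan] * (longest - len(b)) for b in batches]


reference = [0.0, 1.0, 2.0, 3.0, 4.0]          # hypothetical reference batch
short_batch = [0.0, 2.0, 4.0]                  # hypothetical shorter batch
stretched = align_linear(short_batch, len(reference))
padded = pad_with_nan([reference, short_batch])
```

Linear alignment changes the time axis of every batch, whereas padding leaves the measured values untouched and only marks the missing tail, so the two options are not interchangeable for trajectory-sensitive models.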
 
[[Image:bspc_align_settings.png|Align Settings ]]
 
NOTE: In the image above, the alignment batch is Class 0 (the default) which has no members. This must be changed before alignment will work.
 
==Summarize==
 
The summary PCA and summary PARAFAC methods make use of statistical summary functions to capture the trends in the trajectories of the variables. A Summary PCA or Summary PARAFAC model does not require alignment of batches and is generally less sensitive to the exact trajectory of the batches, providing some model robustness.
 
 The statistics are calculated by the [[summary]] function and have different sensitivities to profile changes. The ability of each statistic to capture useful information from a trajectory depends on the dynamics and the statistic. Often it is useful to include a number of the statistics and decide, while modeling, which seem to be providing information, then exclude the remaining statistics. If the statistics are sufficiently sensitive to the trajectory profile to detect out-of-specification batches, then the model will likely provide better long-term performance than an equivalent MPCA or PARAFAC model.
 
The statistics which are available include:
 
[[Image:Bspc_summarize.png|Summary Options]]
 
All stats summarize each column and each step (if specified) except for:
 * '''Length''': the length of the step, a single number for each step (irrespective of the number of variables).
 * '''Five-Number Summary''': the 10th, 25th, 50th, 75th, and 90th percentiles, giving 5 values per step per variable.
 
 For example, with the [[Demonstration_Datasets | Dupont]] demo calibration data (dupont_cal), if you choose mean, std, slope, skewness, and length, the size of your folded Summary PCA data will be:
 
10 variables x 4 stats + length = 41 values per step * 5 steps = 205 columns
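The column count above can be reproduced with a few lines of Python. This is only a bookkeeping sketch based on the rules stated in this section; the function <code>summary_pca_columns</code> and its parameters are hypothetical and are not part of PLS_Toolbox.

```python
def summary_pca_columns(n_variables, per_variable_stats, n_steps,
                        include_length=False, five_number=False):
    """Count the columns of folded Summary PCA data.

    Each ordinary statistic gives one value per variable per step, the
    five-number summary gives five values per variable per step, and
    length gives one value per step regardless of the variable count.
    """
    values_per_variable = len(per_variable_stats) + (5 if five_number else 0)
    values_per_step = n_variables * values_per_variable
    if include_length:
        values_per_step += 1
    return values_per_step * n_steps


# Dupont example from the text: 10 variables, 4 statistics plus length, 5 steps.
columns = summary_pca_columns(
    n_variables=10,
    per_variable_stats=["mean", "std", "slope", "skewness"],
    n_steps=5,
    include_length=True,
)
print(columns)  # 205
```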
 
==Finish==
 
When completed there are 4 options:
 
* Send data directly to a new [[Analysis]] window.
* Save the data to the workspace.
* Save a model for future data application. NOTE: In some more complicated instances (loading outside information) the model may not be able to fully capture each step taken in the interface.
* Cancel and close the window.

Revision as of 10:45, 20 June 2019

Issue:

How do I import three-way data into Solo or PLS_Toolbox?

Possible Solutions:

Solution 1) Built-in EEM importers:

If applicable to your file type, use one of the built-in EEM importers. There are importers for EEM data from Hitachi, Shimadzu, Horiba and Jasco. Please see this wiki entry for more information on Data Importing Formats.

EEM data needs to be configured in a specific way such that:

  • mode 1 corresponds to samples
  • mode 2 corresponds to emission
  • mode 3 corresponds to excitation

The built-in EEM importers will handle this configuration automatically. When importing manually (see below), further manipulation will likely be necessary. Use the Transform → Permute modes and Transform → Reshape menu items to modify your imported data as appropriate.

Solution 2) For three-way data with few slabs:

  1. Import the data slabs into the workspace (browser). The workspace browser is available from the main analysis user interface from the menu item FigBrowser.
  2. Each slab, i.e., each matrix of data, is imported individually. Hence, if you have a 10x8x3 array, you will import three slabs, each of size 10x8.
  3. Use the mouse to drag slab two onto slab one. In the window that opens choose Augment and then choose augment in the Slabs direction.
  4. A two-slab three-way array has now replaced the first data matrix. More slabs can be added in the same fashion.

Alternatively, you may also open one slab in the dataset editor and then add additional slabs using File → Import. After selecting the next slab to import, answer the same questions as in step 3 above. Repeat for each slab.
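For readers who think in terms of array indices, augmenting in the Slabs direction amounts to stacking equally sized matrices along a third mode. The Python sketch below illustrates that idea with plain nested lists; it does not use or reproduce any Solo or PLS_Toolbox function, and the name augment_slabs and the example values are assumptions made for illustration.

```python
def augment_slabs(slabs):
    """Stack equally sized 2-D matrices along a new third mode.

    The result is indexed as array[row][column][slab].
    """
    n_rows = len(slabs[0])
    n_cols = len(slabs[0][0])
    for slab in slabs:
        if len(slab) != n_rows or any(len(row) != n_cols for row in slab):
            raise ValueError("all slabs must have the same size")
    return [[[slab[i][j] for slab in slabs]
             for j in range(n_cols)]
            for i in range(n_rows)]


# Three hypothetical 10x8 slabs; slab k holds the value 1000*k + 10*row + column.
slabs = [[[1000 * k + 10 * i + j for j in range(8)] for i in range(10)]
         for k in range(3)]
three_way = augment_slabs(slabs)
shape = (len(three_way), len(three_way[0]), len(three_way[0][0]))
```

Here shape is (10, 8, 3), matching the 10x8x3 array described in step 2.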

Solution 3) For larger three-way data:

In the DataSet editor, you can import a full three-way array if you have it organized as a two-way matrix. Upon importing the two-way data, you can reshape to a three-way array using the menu item: Transform → Fold into 3-way.

For example, you have the above matrices (three slabs) in one table/matrix:

 [ Slab1;
 Slab2;
 Slab3 ]

That is, the three slabs are stacked below each other. Upon importing, use the menu option described above to "Fold into 3-way" and choose three as the number of slabs, and the data will be rearranged accordingly. If you are familiar with the MATLAB function reshape, you may also use Transform → Reshape for other types of rearrangements.

Note: the result of this command will give you slabs in the 3rd mode of the DataSet. If these slabs are separate samples (such as with EEMs), you'll want to use the Transform → Permute menu to reorder the dimensions. For example, permuting to the order [3 2 1] would swap the order of the 1st and 3rd modes, putting slabs as the first mode.
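The index bookkeeping behind folding and permuting can be sketched in Python with nested lists. This is an illustration of the rearrangement described above, not the code behind the Transform menu items; the function names fold_into_3way and permute_321 and the example values are hypothetical.

```python
def fold_into_3way(stacked, n_slabs):
    """Fold vertically stacked slabs into array[row][column][slab]."""
    total_rows = len(stacked)
    if total_rows % n_slabs != 0:
        raise ValueError("row count must be a multiple of the number of slabs")
    rows_per_slab = total_rows // n_slabs
    n_cols = len(stacked[0])
    return [[[stacked[k * rows_per_slab + i][j] for k in range(n_slabs)]
             for j in range(n_cols)]
            for i in range(rows_per_slab)]


def permute_321(array):
    """Reorder modes to [3 2 1]: result[k][j][i] equals array[i][j][k]."""
    n1, n2, n3 = len(array), len(array[0]), len(array[0][0])
    return [[[array[i][j][k] for i in range(n1)]
             for j in range(n2)]
            for k in range(n3)]


# Hypothetical 30x8 table holding three 10x8 slabs stacked below each other.
stacked = [[100 * r + c for c in range(8)] for r in range(30)]
folded = fold_into_3way(stacked, n_slabs=3)      # 10 x 8 x 3, slabs in mode 3
samples_first = permute_321(folded)              # 3 x 8 x 10, slabs in mode 1
```

Note that the order [3 2 1] also leaves the original rows in the third mode, so each sample ends up indexed as column by row.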


Still having problems? Please contact our helpdesk at helpdesk@eigenvector.com