Batchdigester: Difference between revisions

Latest revision as of 10:26, 6 September 2017

Purpose

Parse wafer or batch data for MPCA or into summary variables for use in PCA.

Synopsis

[out,options] = batchdigester(data,options);
batchdigester %prompt user for input and output

Description

Rearranges and optionally summarizes a two-way dataset of batch (or wafer) data. The input (data) must be a DataSet object containing labels identifying each batch/wafer. Classes in (data) are optionally used to divide each time profile of the batch/wafer into individual steps. The steps, and the summary variables estimated for each step, can then be selected for inclusion in the output.

MPCA mode: If data is rearranged into MPCA data, each wafer/batch is arranged as one slab of a 3-way matrix. Each row is a time point and each column is one of the original variables. Only selected steps are included in the output.
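The folding described for MPCA mode can be pictured with a few lines of plain MATLAB. This is only an illustrative sketch, not the batchdigester implementation: it assumes equal-length batches stacked row-wise, uses made-up sizes and variable names (X, X3, ntime), and places batches along the third mode, which may differ from the orientation batchdigester actually returns.

```matlab
% Illustrative sketch only (plain MATLAB, no DataSet object involved).
% Assumption: 3 batches of 4 time points each, stacked row-wise, 2 variables.
nbatches = 3;  ntime = 4;  nvars = 2;
X = reshape(1:(nbatches*ntime*nvars), nbatches*ntime, nvars);  % (batch*time) x variables

% Fold so that each batch becomes one slab: time x variables x batch
X3 = permute(reshape(X, ntime, nbatches, nvars), [1 3 2]);

% Slab 2 holds rows 5 to 8 of the original two-way matrix
disp(isequal(X3(:,:,2), X(5:8,:)))
```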

Summary PCA mode: If data is summarized into Summary PCA data, all time points for a given step in a given wafer are summarized using one or more statistics:

Mean
Standard Deviation
Minimum
Maximum
Range
Slope
Length (of step)

If steps are not used, the time profile for each original variable is summarized using the given statistic(s) and turned into a single variable (column) of the output data. If steps are used, the summary statistics are calculated within every step for each variable, creating a separate set of new variables for each step. Each wafer/batch is thus a single row of the output data with all of the steps and original variables summarized as new variables.
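As a rough illustration of how one batch collapses into a single row, the sketch below computes the listed statistics per step in plain MATLAB. It is not the batchdigester code: the example data, the variable names, the column ordering of the resulting row, and the definition of slope (a least-squares line against the sample index within the step) are all assumptions made for the example.

```matlab
% Illustrative sketch only. One batch: 7 time points x 2 variables,
% with a hypothetical step indicator (5 points in step 1, 2 in step 2).
batch = [1 10; 2 12; 3 11; 4 13; 5 14; 6 20; 7 22];
step  = [1 1 1 1 1 2 2]';

row = [];                                  % summary row for this batch
for s = unique(step)'                      % loop over the steps present
  seg = batch(step == s, :);               % time points belonging to step s
  t   = (1:size(seg,1))';                  % sample index within the step
  coef = [t ones(size(t))] \ seg;          % least-squares line per variable
  mn = min(seg, [], 1);
  mx = max(seg, [], 1);
  % Assumed ordering: mean, std, min, max, range, slope, length
  row = [row, mean(seg,1), std(seg,0,1), mn, mx, mx - mn, coef(1,:), size(seg,1)];
end

disp(numel(row))   % 13 summary values per step, 2 steps
```

With more steps or more statistics the row simply grows; stacking one such row per batch gives the two-way table that is then passed to PCA.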

Outputs are the digested data, (out), and the options which can be used to reproduce the digestion process, (options) (see below).

NOTE: The batchfold function replaces and enhances much of this data handling functionality.

Options

options = structure with one or more of the following fields:

  • object : { 'batch' | 'wafer' } A string specifying the type of object being digested. This is used for display ONLY. The same algorithms are used in both cases but this option allows customization of the wording in the user prompts.
  • stepclassname : A string specifying the name of the class which should be used to indicate steps in the process.
  • stepsdesired : A vector of steps which should be included in the digestion.
  • labelname : A string specifying the name of the label set which should be used to split data into batches/wafers. Use the keyword 'fixed' to specify that the batches are of fixed length and can be split using the nbatches option.
  • nbatches : The number of equally-sized batches to split the data into. Used ONLY when labelname is 'fixed'.
  • digestiontype : [ 'mpca' | 'spca' ] Specifies which digestion algorithm to use on the data.
  • statistics : A cell array specifying the statistics to be used on the data. Used ONLY when digestiontype = 'spca'.

If sufficient information is provided in these options, the processing of data will be automatic and the user will not have to respond to any prompts in the GUIs. Otherwise, prompts will be given only for the missing information. The options needed to re-process data using a given digestion "recipe" are returned as the second output of any digestion request.
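A recipe-style call might look like the following sketch. Only the field names and the call signature come from this page; the class name 'Step', the step numbers, the batch count, and the workspace variables data and newdata (DataSet objects) are hypothetical, and whether these fields alone suppress every prompt is an assumption. The call is guarded so the snippet does nothing if batchdigester or the data are not available.

```matlab
% Hypothetical recipe: fixed-length batches folded for MPCA.
options = struct();
options.object        = 'batch';    % wording used in prompts only
options.labelname     = 'fixed';    % batches are of fixed length...
options.nbatches      = 20;         % ...so split into 20 equal blocks (made-up value)
options.stepclassname = 'Step';     % hypothetical class name marking the steps
options.stepsdesired  = [1 2 3];    % hypothetical steps to keep
options.digestiontype = 'mpca';     % 'statistics' is only used for 'spca'

% Assumes 'data' and 'newdata' are DataSet objects already in the workspace.
if exist('batchdigester','file') == 2 && exist('data','var') && exist('newdata','var')
  [out, usedoptions] = batchdigester(data, options);
  % Reuse the returned options to digest new data the same way
  outnew = batchdigester(newdata, usedoptions);
end
```

Passing the returned options rather than the hand-built structure means any choices made interactively during the first call are carried over to the second.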

See Also

batchfold, mpca, pca