Batchdigester

From Eigenvector Research Documentation Wiki

Purpose

Parse wafer or batch data for MPCA or into summary variables for use in PCA.

Synopsis

[out,options] = batchdigester(data,options);
batchdigester %prompt user for input and output

Description

Rearranges and optionally summarizes a two-way dataset of batch (or wafer) data. Input (data) must be a DataSet object containing labels identifying each batch/wafer. Classes in (data) are (optionally) used to divide the time profile of each batch/wafer into individual steps. The steps, and the summary variables estimated for each step, can then be selected for inclusion in the output.

MPCA mode: If data is rearranged into MPCA data, each wafer/batch is arranged as one slab of a 3-way matrix. Within each slab, each row is a time point and each column is one of the original variables. Only selected steps are included in the output.
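The rearrangement described above can be illustrated in plain MATLAB. This is only a minimal sketch of the idea, not the code batchdigester itself uses: it assumes every batch has the same number of time points, and the variables x, batchid and x3 are hypothetical names introduced for the example.

```matlab
% Hypothetical stacked two-way data: 3 batches x 4 time points, 2 variables
nbatches = 3; ntime = 4; nvars = 2;
x = reshape(1:(nbatches*ntime*nvars), nbatches*ntime, nvars);

% Batch identifier for each row (stands in for the batch/wafer labels)
batchid = kron((1:nbatches)', ones(ntime,1));

% Fold: one slab per batch, rows = time points, columns = variables
x3 = zeros(ntime, nvars, nbatches);
for b = 1:nbatches
    x3(:,:,b) = x(batchid == b, :);
end

disp(size(x3))   % time points x variables x batches
```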

Summary PCA mode: If data is summarized into Summary PCA data, all time points for a given step in a given wafer/batch are summarized using one or more of the following statistics:

Mean
Standard Deviation
Minimum
Maximum
Range
Slope
Length (of step)

If steps are not used, the time profile for each original variable is summarized using the given statistic(s), and each summary becomes a single variable (column) of the output data. If steps are used, summary statistics are calculated within every step for each variable, creating one new output variable per combination of step, original variable, and statistic. Each wafer/batch is thus a single row of the output data, with all of the steps and original variables summarized as new variables.
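The summarization can be sketched in plain MATLAB as follows. This is an illustrative sketch under stated assumptions, not the toolbox implementation: the data, the names batchid and stepid, the choice of three statistics (mean, range, slope), and the column ordering of the output are all hypothetical. Slope is taken here as the least-squares slope against the time-point index within the step, which is one reasonable reading of the statistic.

```matlab
% Hypothetical data: 2 batches x 6 time points, 2 variables, 2 steps per batch
x = [(1:12)', 2*(1:12)'];
batchid = kron((1:2)', ones(6,1));        % batch identifier per row
stepid  = repmat([1;1;1;2;2;2], 2, 1);    % step (class) identifier per row

batches = unique(batchid);
steps   = unique(stepid);
nstats  = 3;                              % mean, range, slope
out = zeros(numel(batches), numel(steps)*size(x,2)*nstats);

for i = 1:numel(batches)
    row = [];
    for s = steps'
        % All time points of this step in this batch
        seg = x(batchid == batches(i) & stepid == s, :);
        t = (1:size(seg,1))';
        for v = 1:size(seg,2)
            p = polyfit(t, seg(:,v), 1);  % p(1) is the slope per time point
            row = [row, mean(seg(:,v)), max(seg(:,v)) - min(seg(:,v)), p(1)];
        end
    end
    out(i,:) = row;                       % one row per batch
end

disp(size(out))   % batches x (steps * variables * statistics)
```

Each batch ends up as one row, and each (step, variable, statistic) combination as one column, which is the layout a subsequent PCA model would consume.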

Outputs are the digested data, (out), and the options which can be used to reproduce the digestion process, (options) (see below).

NOTE: The batchfold function replaces and enhances much of this data handling functionality.

Options

options = structure with one or more of the following fields:

  • object : { 'batch' | 'wafer' } A string specifying the type of object being digested. This is used for display ONLY. The same algorithms are used in both cases but this option allows customization of the wording in the user prompts.
  • stepclassname : A string specifying the name of the class which should be used to indicate steps in the process.
  • stepsdesired : A vector of steps which should be included in the digestion.
  • labelname : A string specifying the name of the label set which should be used to split data into batches/wafers. Use the keyword 'fixed' to specify that the batches are of fixed length and can be split using the nbatches option.
  • nbatches : The number of equally-sized batches to split the data into. Used ONLY when labelname is 'fixed'.
  • digestiontype : [ 'mpca' | 'spca' ] Specifies which digestion algorithm to use on the data.
  • statistics : A cell array specifying the statistics to be used on the data. Used ONLY when digestiontype is 'spca'.

If sufficient information is provided in these options, the data will be processed automatically and the user will not have to respond to any prompts in the GUIs. Otherwise, prompts will be given only for the missing information. The options needed to re-process data using a given digestion "recipe" are returned as the second output of any digestion request.
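A fully specified options structure might look like the sketch below. The field names come from the list above, but the values are assumptions made for illustration: 'Step' and 'Batch ID' are hypothetical class and label set names, and the exact strings accepted in the statistics cell are not given on this page, so 'mean' and 'slope' are guesses. The call is guarded so that the snippet does nothing when PLS_Toolbox or a DataSet object named data is not available.

```matlab
% Summary PCA recipe (values are hypothetical; adjust to your DataSet)
options = struct;
options.object        = 'batch';
options.stepclassname = 'Step';        % assumed name of the class set marking steps
options.stepsdesired  = [1 2 3];       % steps to include
options.labelname     = 'Batch ID';    % assumed name of the label set marking batches
options.digestiontype = 'spca';
options.statistics    = {'mean' 'slope'};   % assumed spelling of statistic names

% Variant for fixed-length batches, split by count instead of by labels
fixedopts = options;
fixedopts.labelname = 'fixed';
fixedopts.nbatches  = 20;              % hypothetical number of equal-sized batches

% Run only if the toolbox function and a DataSet object "data" exist
if exist('batchdigester','file') && exist('data','var')
    [out, recipe] = batchdigester(data, options);
    % "recipe" can be passed back in to digest new data the same way
end
```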

See Also

batchfold, mpca, pca