Preprouser

From Eigenvector Research Documentation Wiki
Revision as of 17:25, 9 October 2008 by imported>Chuck (→‎See Also)

Purpose

Enables user-defined items to be added to the preprocess catalog.

Synopsis

preprouser(fig)

Description

Each method available in the preprocess function has an associated 'methodname' such as those listed in the help for preprocess. Each method is defined using a preprocessing structure that contains all the necessary information to perform calculations for that method. The standard methods are defined in the preprocatalog file, which should not be edited by the user. Additional user-defined methods can be defined in the preprouser file and the following text describes how the user can add custom preprocessing methods. A few example methods already exist in the preprouser file to guide the user.

To add a custom user-defined preprocessing method, the user must:

  1) open the PREPROUSER.M file,
  2) edit the file to create a structure with the fields described below,
  3) after defining the structure, add the line preprocess('addtocatalog',fig,usermethod), and
  4) save and close the PREPROUSER.M file.

The line added in Step 3

 preprocess('addtocatalog',fig,usermethod)

makes the new custom method available to preprocess. The input usermethod is the preprocessing structure containing the user-defined method, and fig is a figure handle passed to PREPROUSER by preprocess.

The methods defined in the preprocatalog and preprouser files are available to all functions making use of the preprocess function.
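As a rough illustration of steps 2 and 3, the fragment below sketches what a user-defined entry inside PREPROUSER.M might look like. The log10 method, its keyword, and the empty settingsgui string (assumed here to mean "no settings GUI") are hypothetical and are not part of the shipped file. The guard around the addtocatalog call is only there so that the fragment can also be run on its own, outside of PREPROUSER.M.

```matlab
% Hypothetical user-defined method: base-10 log transform (illustration only).
usermethod.description   = 'Log10 (example)';
usermethod.calibrate     = {'data = log10(data);'};   % nothing is estimated from the data
usermethod.apply         = {'data = log10(data);'};   % same command for new data
usermethod.undo          = {'data = 10.^data;'};      % inverse transform
usermethod.out           = {};
usermethod.settingsgui   = '';       % assumption: empty string means no settings GUI
usermethod.settingsonadd = 0;
usermethod.usesdataset   = 0;        % commands receive a plain array
usermethod.caloutputs    = 0;        % no calibration results are stored in out
usermethod.keyword       = 'log10example';
usermethod.userdata      = [];

% Inside PREPROUSER.M, fig is supplied by preprocess. The guard lets this
% fragment run stand-alone when fig or PLS_Toolbox is not available.
if exist('fig','var') && exist('preprocess','file')
    preprocess('addtocatalog',fig,usermethod);
end
```

Because calibrate and apply are identical and nothing is stored in out, caloutputs is zero, in the same way as the normalize example later on this page.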

Preprocessing Structure

The fields in a preprocessing structure are listed here. Detailed descriptions and examples follow this list.

  • description: text string containing a description for the method,
  • calibrate: cell containing the line(s) of code to execute during a calibration operation (see command-line case 2 of preprocess),
  • apply: cell containing the line(s) of code to execute during an apply operation (see command-line case 3 of preprocess),
  • undo: cell containing the line(s) of code to execute during an undo operation (see command-line case 4 of preprocess),
  • out: cell used to hold calibration-phase results for use in apply or undo (these are the parameters estimated from the calibration data and used to preprocess new data),
  • settingsgui: text string containing the function name of a method-specific GUI to invoke when the Settings button is pressed in the preprocessing GUI,
  • settingsonadd: [ 0 | {1} ], boolean: 1 indicates that the settings GUI should be brought up automatically when the method is "added" in the preprocessing GUI,
  • usesdataset: [ {0} | 1 ], boolean: indicates whether this method should be passed a dataset object (1) or an array (0) (e.g. class "double" or "uint8"),
  • caloutputs: integer: number of expected items in field out after calibration has been performed. This field is set by the user to tell preprocess what the length of the cell in field out will be after calibration,
  • keyword: text string containing the 'methodname'; this string is used in the call to preprocess so that it will return the custom preprocessing structure (see command-line case 1 of preprocess), and
  • userdata: user-defined variable often used to store method options.

Detailed descriptions and examples for each field follow:

Description Field

The description is a short (1-2 word) text string containing a description for the preprocessing method. The string will be displayed in the GUI and can also be used as a string keyword (see also keyword) to refer to this method.

Example:

pp.description = 'Mean Center';

Calibrate, Apply, Undo Fields

Each of these "command" fields contains a single cell consisting of a command string to be executed by preprocess when performing calibration, apply, or undo operations (see command-line cases 2, 3, and 4 of preprocess). Calibrate actions operate on original calibration data with the output parameters stored in the out field, whereas apply actions operate on new data using parameters stored in the out field as input(s). For methods which act on a single sample at a time, the calibrate and apply operations are often identical (for example, see the normalize example below). The undo action uses parameters stored in the out field as input(s) to remove preprocessing from previously preprocessed data. However, the undo action may be undefined for certain methods. If this is the case, the undo field should be an empty cell.

To ensure that all samples (rows) in the data have been appropriately preprocessed, an apply command is automatically performed following a calibrate call. Note that excluded variables are replaced with NaN.

The command strings should be one or more valid MATLAB commands, each separated by a semicolon ';' (see EVAL). Each command will be executed inside the preprocess environment in which the following variables are available:

  • data: The data field contains the data on which to operate and in which to return modified results.
    If the field usesdataset is 1 (one) then data will be a DataSet object. In this case, it is expected that the function will calibrate using only included rows but apply and undo the preprocessing to all rows.
    If the field usesdataset is 0 (zero) then data will be an array (e.g. class "double"). In this case, the function will calibrate using all rows and columns and will apply and undo the preprocessing to all rows and columns.
  • out: Contents of the preprocessing structure field out (described below). Any changes will be stored in the preprocessing structure for use in subsequent apply and undo commands.
  • userdata: Contents of the preprocessing structure field userdata (described below). Any changes will be stored in the preprocessing structure for later retrieval.

Several variables are available for use during command operations (calibrate, apply, and undo). However, these variables should not be changed by the commands and are considered "read-only".

  • include: When the field usesdataset = 1, the data is passed as a dataset object. In this case, include contains the contents of the original dataset object's includ field.
  • otherdata: Cell array of any inputs to PREPROCESS which followed the data in the input list. For example, it is used by PLS_Toolbox regression functions to pass the y-block for use in methods which require that information.
  • originaldata: A dataset object which contains the original data unmodified by any preprocessing steps. For example, originaldata can be used to retrieve axis scale or class information even when usesdataset is 0 (zero).

Examples:

The following calibrate field performs mean-centering on data, returning both the mean-centered data as well as the mean values which are stored in out{1}:

pp.calibrate = { '[data,out{1}] = mncn(data);' };

The following apply and undo fields use the scale and rescale functions to subtract the previously determined mean values (stored by the calibrate operation in out{1}) from new data and to restore them afterwards:

pp.apply = { 'data = scale(data,out{1});' };
pp.undo = { 'data = rescale(data,out{1});' };
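To see how the three command fields relate to one another, the following base-MATLAB sketch imitates the way command strings might be evaluated with the variables data, out, and userdata in scope. It is only a stand-in for the preprocess environment, not PLS_Toolbox code: the mean-centering commands are written with mean and repmat instead of mncn, scale, and rescale, and the explicit eval loops are an assumption about how the strings are run.

```matlab
% Hypothetical mean-centering method written with base MATLAB commands only.
pp.calibrate = {'out{1} = mean(data,1); data = data - repmat(out{1},size(data,1),1);'};
pp.apply     = {'data = data - repmat(out{1},size(data,1),1);'};
pp.undo      = {'data = data + repmat(out{1},size(data,1),1);'};
pp.out       = {};
pp.userdata  = [];

% Calibrate: the commands see data, out and userdata and may modify them.
data = [1 2; 3 4; 5 6];
out = pp.out;
userdata = pp.userdata;
for k = 1:numel(pp.calibrate)
    eval(pp.calibrate{k});
end
pp.out  = out;        % calibration results are kept for later apply/undo calls
caldata = data;       % mean-centered calibration data

% Apply: new data are centered with the stored calibration means.
data = [7 8];
for k = 1:numel(pp.apply)
    eval(pp.apply{k});
end
newdata = data;

% Undo: the stored means are added back to the preprocessed data.
for k = 1:numel(pp.undo)
    eval(pp.undo{k});
end
restored = data;
```

After the calibrate step, pp.out{1} holds the column means of the calibration data. The apply and undo steps read these means but do not change them.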

Out Field

The out field is a cell array that contains the output parameters returned during the calibration operation. For example, if the following commands are run

load wine
s = preprocess('default','autoscale');
[dp,sp] = preprocess('calibrate',s,wine);

then the out field of sp is a 1 by 2 cell array: the first cell, out{1}, contains the means of the variables in the dataset wine, and the second cell, out{2}, contains the standard deviations. These parameters are used in subsequent apply and undo commands. See the related field caloutputs. Prior to the calibration operation both the out and caloutputs fields are empty.

Settingsgui Field

The name of a graphical user interface (GUI) function that allows the user to set options for this method. The function is expected to take as its only input a standard preprocessing structure from which it should take the current settings. The function should output the same preprocessing structure modified to meet the user's specification. Typically, these changes are made to the userdata field and the commands in the calibrate, apply and undo fields use that field's contents as input options.

The design of GUIs for selection of options is beyond the scope of this document and the user is directed to the following example files, both of which use GUIs to modify the userdata field of a preprocessing structure: autoset.m and savgolset.m.

Example:

pp.settingsgui = 'autoset';

Settingsonadd Field

The settingsonadd field contains a boolean (1=true, 0=false) value. If it is 1=true, then when the user adds the method in the PREPROCESS GUI, the method's settingsgui will be automatically invoked. If a method requires the user to make a selection of options, settingsonadd=1 will guarantee that the user has an opportunity to modify the options or at least choose the default settings.

Example:

pp.settingsonadd = 1;

Usesdataset Field

The usesdataset field contains a boolean (1=true, 0=false) value.

If it is 1=true, the preprocessing method is capable of handling dataset objects and preprocess will pass the data as a dataset. It is the responsibility of the function(s) called by the method to appropriately handle the dataset's includ field.

If it is 0=false, the preprocessing method expects standard MATLAB classes (double, uint8, etc). preprocess, which uses a dataset object internally to hold the data, will extract data from the dataset object prior to calling this method. It will then reinsert the preprocessed data back into the dataset object after the method has been invoked.

Excluded columns are never extracted. Excluded rows are not extracted when performing calibration operations, but they are passed when performing apply and undo operations.

Example:

pp.usesdataset = 0;
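The extraction and reinsertion described for usesdataset = 0 can be pictured with the following base-MATLAB sketch. It uses a plain matrix and two index vectors as hypothetical stand-ins for a dataset object and its included rows and columns. The variable names and the explicit indexing are assumptions made for illustration and do not reproduce the internal code of preprocess.

```matlab
% Plain-array stand-in for a dataset with one excluded row and one excluded column.
X         = [1 10 100; 2 20 200; 3 30 300; 4 40 400];
incl_rows = [1 2 3];      % row 4 is "excluded"
incl_cols = [1 3];        % column 2 is "excluded"
out       = {};

% Calibration: only included rows and included columns are extracted.
data   = X(incl_rows, incl_cols);
out{1} = mean(data,1);

% Apply: all rows are passed, but still only the included columns.
data = X(:, incl_cols);
data = data - repmat(out{1}, size(data,1), 1);

% Reinsertion: excluded variables are filled with NaN.
Xp = NaN(size(X));
Xp(:, incl_cols) = data;
```

In this sketch the excluded row does not influence the stored means but is still preprocessed, and the excluded column comes back as NaN.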

Caloutputs Field

For functions which require a calibrate operation prior to an apply or undo (see the fields: calibrate and out), this field indicates how many values are expected in the out field. For example, in the case of mean centering the mean values stored in the field out are required to apply or undo the operation. Initially, out is an empty cell ({}). Following the calibration operation for mean centering, it becomes a single-item cell (length of one). For other calibration operations out may be a cell of length greater than one.

By examining this cell's length, preprocess can determine if a preprocessing structure has already been calibrated and contains the necessary information. The caloutputs field, when greater than zero, indicates to preprocess that it should test the out field prior to attempting an apply or undo.

Example: in the case of mean-centering, the length of out should be 1 (one) after calibration.

pp.caloutputs = 1;
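The length test described above might be pictured as follows. The helper iscalibrated and the exact comparison it uses are assumptions made for illustration; the test performed inside preprocess may differ. The example uses an autoscale-like method that stores two items (means and standard deviations), computed here with base MATLAB functions.

```matlab
% Hypothetical check: a method counts as calibrated when it needs no stored
% outputs, or when out already holds at least caloutputs items.
iscalibrated = @(p) p.caloutputs == 0 || numel(p.out) >= p.caloutputs;

pp.caloutputs = 2;        % autoscale-like method: means and standard deviations
pp.out        = {};
before = iscalibrated(pp);

% Stand-in for a calibration step that fills both expected outputs.
X      = [1 2; 3 6; 5 10];
pp.out = {mean(X,1), std(X,0,1)};
after  = iscalibrated(pp);
```

Here before is false because out is still empty, and after is true once both expected items have been stored.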

Keyword Field

The field keyword is a string that can be used to retrieve the default preprocessing structure for this method. When retrieving a structure by keyword, preprocess ignores any spaces and is case-insensitive. The keyword field (or the description string, discussed above) can be used in place of any preprocessing structure in calibrate and default calls to preprocess:

pp = preprocess('default','meancenter');

Example:

pp.keyword = 'mncn';
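The space- and case-insensitive lookup can be imitated in base MATLAB as shown below. The helpers squash and matches are hypothetical and only illustrate the matching rule stated above; they are not the code used by preprocess.

```matlab
pp.description = 'Mean Center';
pp.keyword     = 'mncn';

% Remove spaces and ignore case before comparing (hypothetical helpers).
squash  = @(s) lower(strrep(s,' ',''));
matches = @(name,p) any(strcmp(squash(name), {squash(p.keyword), squash(p.description)}));

hit_description = matches('meancenter', pp);   % compared against the description
hit_keyword     = matches('MNCN', pp);         % compared against the keyword
miss            = matches('autoscale', pp);    % matches neither string
```

Under this rule, 'meancenter' is matched through the description string and 'MNCN' through the keyword.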

Userdata Field

The field userdata contains additional user-defined data that can be changed during a calibration operation and retrieved for use in apply and undo operations. This field is often used to hold options for the preprocessing method which are then used by the commands in the calibrate, apply, and undo fields.

Example: in savgol, several input variables are defined with various method options, then they are assembled into a vector in userdata:

pp.userdata = [windowsize order derivative];

Examples

The following is the preprocessing structure used for sample normalization (see normaliz). The calibrate and apply commands are identical and there is no information that is stored during the calibration phase, thus caloutputs is zero. There is no undo defined for this operation (this is because the normalization information required to undo the action is not being stored anywhere). The norm type (e.g. a 2-norm) of the normalization is set in userdata and is used in both calibrate and apply steps.

pp.description = 'Normalize';
pp.calibrate = {'data = normaliz(data,0,userdata(1));'};
pp.apply = {'data = normaliz(data,0,userdata(1));'};
pp.undo = {};
pp.out = {};
pp.settingsgui = 'normset';
pp.settingsonadd = 0;
pp.usesdataset = 0;
pp.caloutputs = 0;
pp.keyword = 'Normalize';
pp.userdata = 2;
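A base-MATLAB analogue may help show why calibrate and apply can share one command here: each row is scaled by its own norm, so nothing has to be learned from the calibration set. The command string below is a hypothetical replacement for the call to normaliz (it is not that function's implementation), and the eval calls imitate the preprocess environment under the same assumption.

```matlab
% Row-wise p-norm normalization, with the norm type p taken from userdata(1).
cmd = 'data = data ./ repmat(sum(abs(data).^userdata(1),2).^(1/userdata(1)),1,size(data,2));';
pp.calibrate  = {cmd};
pp.apply      = {cmd};
pp.out        = {};
pp.caloutputs = 0;        % nothing is stored during calibration
pp.userdata   = 2;        % 2-norm

userdata = pp.userdata;

data = [3 4; 6 8];        % "calibration" samples
eval(pp.calibrate{1});
caldata = data;

data = [0 5];             % a "new" sample, handled in exactly the same way
eval(pp.apply{1});
newdata = data;
```

Both rows of caldata come out with unit 2-norm, and the new sample is normalized without reference to the calibration data, which is consistent with out remaining empty.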

The following is the preprocessing structure used for Savitzky-Golay smoothing and derivatives (see savgol). In many ways this structure is similar to the normalize structure except that savgol takes a dataset object as input and, thus, usesdataset is set to 1. Also note that because of the various settings required by savgol, this method makes use of the settingsonadd feature to bring up the settings GUI as soon as the method is added.

pp.description = 'SG Smooth/Derivative';
pp.calibrate = {'data=savgol(data,userdata(1),userdata(2),userdata(3));'};
pp.apply = {'data=savgol(data,userdata(1),userdata(2),userdata(3));'};
pp.undo = {};
pp.out = {};
pp.settingsgui = 'savgolset';
pp.settingsonadd = 1;
pp.usesdataset = 1;
pp.caloutputs = 0;
pp.keyword = 'sg';
pp.userdata = [ 15 2 0 ];

The following example creates a preprocessing structure to invoke multiplicative scatter correction (MSC, see mscorr) using the mean of the calibration data as the target spectrum. The calibrate cell here contains two separate operations. The first calculates the mean spectrum and the second performs the MSC. The third input to the mscorr function is a flag indicating whether an offset should also be removed. This flag is stored in the userdata field so that the settingsgui (mscorrset) can change the value easily. Note that there is no undo defined for this function.

pp.description = 'MSC (mean)';
pp.calibrate = { 'out{1}=mean(data); data=mscorr(data,out{1},userdata);' };
pp.apply = { 'data = mscorr(data,out{1},userdata);' };
pp.undo = {};
pp.out = {};
pp.settingsgui = 'mscorrset';
pp.settingsonadd = 0;
pp.usesdataset = 0;
pp.caloutputs = 1;
pp.keyword = 'MSC (mean)';
pp.userdata = 1;

See Also

preprocess