Preprocess: Difference between revisions

From Eigenvector Research Documentation Wiki
imported>Jeremy
(Importing text file)
imported>Chuck
No edit summary
Line 1: Line 1:
===Purpose===
===Purpose===


Selection and application of preprocessing methods.
Selection and application of standard preprocessing methods.


===Synopsis===
===Synopsis===
Line 15: Line 14:
===Description===
===Description===


PREPROCESS is a general tool to choose preprocessing steps and to perform these steps on data. See PREPROUSER for a description on how custom preprpocessing can be added to the standard proprocessings listed below. PREPROCESS has four basic command-line forms which include:
PREPROCESS is a general tool to choose preprocessing steps and to perform these steps on data. See [[preprouser]] for a description on how custom preprocessing can be added to the standard preprocessing options listed below. PREPROCESS can be used to perform 4 different tasks:


1) SELECTION OF PREPROCESSING.
* 1) Specification of Preprocessing
* 2) Estimate preprocessing parameters (calibrate)
* 3) Apply preprocessing to new data (apply)
* 4) Remove the effect of previously-done preprocessing on data (undo)
 
====Case 1) Specification of Preprocessing====


The purpose of the following calls to PREPROCESS is to generate standard structure arrays that contain the desired preprocessing steps.
The purpose of the following calls to PREPROCESS is to generate standard structure arrays that contain the desired preprocessing steps.
Line 23: Line 27:
:s = preprocess;
:s = preprocess;


generates a GUI and allows the user to select preprocessing steps interactively. The output s is a standard preprocessing structure.
generates a GUI and allows the user to select preprocessing steps interactively. The output <tt>s</tt> is a standard preprocessing structure.


:s = preprocess(s);
:s = preprocess(s);


allows the user to interactively edit a previously identified preprocessing structure s. The output s is the edited preprocessing structure.
allows the user to interactively edit a previously-built preprocessing structure <tt>s</tt>. The output <tt>s</tt> is the edited preprocessing structure.


:s = preprocess('default','methodname');
:s = preprocess('default','methodname');


returns the default structure for method 'methodname'. A list of strings that can be used for 'methodname' can be viewed using the command:
returns the default structure for method <tt>methodname</tt>. A list of strings that can be used for <tt>methodname</tt> can be viewed using the command:


:preprocess('keywords')
:preprocess('keywords')


A list of standard methods 'methodname' follow:
Below is list of standard methods that can be used for 'methodname':
 
* ''''abs'''': takes the absolute value of the data (see ABS),
 
* ''''autoscale'''': centers columns to zero mean and scales to unit variance (see AUTO),
 
* ''''simple''' baseline': baseline (specified points),
 
* ''''baseline'''': baseline (weighted least squares),
 
* ''''derivative'''': derivative (SavGol),
 
* ''''detrend'''': remove a linear trend (see BASELINE),
 
* ''''gls''' weighting': generalized least squares weighting (see GLSW),
 
* ''''groupscale'''': group/block scaling (see GSCALE),
 
* ''''logdecay'''': log decay scaling,
 
* ''''log10'''': calculate base 10 logarithm of data,
 
* ''''mean''' center': center columns to have zero mean (see MNCN),
 
* ''''msc''' (mean)': multiplicative scatter correction with offset, the mean is the reference spectrum (see MSCORR),
 
* ''''median''' center': center columns to have zero median (see MEDIAN),
 
* ''''centering'''': multiway center,
 
* ''''scaling'''': multiway scale,
 
* ''''normalize'''': normalization of the rows (see NORMALIZ),
 
* ''''osc'''': orthogonal signal correction (see OSCCALC and OSCAPP),
 
* ''''smooth'''': Savitsky-Golay smoothing and deriviatives (see SAVGOL), and
 
* ''''snv'''': standard normal deviate (autoscale the rows, see SNV),


* ''''sqmnsc'''': sqrt mean scale, scale each variable by the square root of its mean.
* 'abs': takes the absolute value of the data (see [[abs]]),
* 'autoscale': centers columns to zero mean and scales to unit variance (see [[auto]]),
* 'simple baseline': baseline (specified points),
* 'baseline': baseline (weighted least squares),
* 'derivative': derivative [[savgol]],
* 'detrend': remove a linear trend (see [[baseline]]),
* 'gls weighting': generalized least squares weighting (see [[glsw]]),
* 'groupscale': group/block scaling (see [[gscale]]),
* 'logdecay': log decay scaling,
* 'log10': calculate base 10 logarithm of data,
* 'mean center': center columns to have zero mean (see [[mncn]]),
* 'msc (mean)': multiplicative scatter correction with offset, the mean is the reference spectrum (see [[mscorr]]),
* 'median center': center columns to have zero median (see [[median]]),
* 'centering': multiway center,
* 'scaling': multiway scale,
* 'normalize': normalization of the rows (see [[normaliz]]),
* 'osc': orthogonal signal correction (see [[osccalc]] and [[oscapp]]),
* 'smooth': Savitsky-Golay smoothing and deriviatives (see [[savgol]]), and
* 'snv': standard normal deviate (autoscale the rows, see [[snv]]),
* 'sqmnsc': sqrt mean scale, scale each variable by the square root of its mean.


The output is a standard preprocessing structure array s where each method to apply is a separate record.
The output is a standard preprocessing structure array <tt>s</tt>, where each preprocessing method to apply is contained in a separate record.


2) CALIBRATE.
====Case 2) Estimate preprocessing parameters (calibrate)====


The objective of the following calls to PREPROCESS is to estimate preprocessing parameters, if any, from a calibration data set and perform preprocessing on the calibration data set. The I/O format is:
The objective of the following calls to PREPROCESS is to estimate preprocessing parameters, if any, from a calibration data set and perform preprocessing on the calibration data set. The I/O format is:
Line 85: Line 70:
:[datap,sp] = preprocess('calibrate',s,data);
:[datap,sp] = preprocess('calibrate',s,data);


The inputs are s a standard preprocessing structure and data the calibration data. The preprocessed data is returned in datap, and preprocessing parameters are returned in a modified preprocessing structure sp. Note that sp is used as an input with the 'apply' and 'undo' commands described below.
The inputs are <tt>s</tt> a standard preprocessing structure and <tt>data</tt> the calibration data. The preprocessed data is returned in <tt>datap</tt>, and preprocessing parameters are returned in a modified preprocessing structure <tt>sp</tt>. Note that <tt>sp</tt> is used as an input with the 'apply' and 'undo' commands described below.


Short cuts for each method can also be used. Examples for 'mean center' and 'autoscale' are
Short-cuts for each method can also be used. Examples for 'mean center' and 'autoscale' are


:[datap,sp] = preprocess('calibrate','mean center',data);
:[datap,sp] = preprocess('calibrate','mean center',data);
Line 93: Line 78:
:[datap,sp] = preprocess('calibrate','autoscale',data);
:[datap,sp] = preprocess('calibrate','autoscale',data);


Preprocessing for some multi-block methods require that the y-block be passed also. The I/O format in these cases is:
Preprocessing for some multi-block methods (specifically, 'osc' and 'gls weighting') require that the y-block be passed also. The I/O format in these cases is:


:[datap,sp] = preprocess('calibrate',s,xblock,yblock);
:[datap,sp] = preprocess('calibrate',s,xblock,yblock);


Preprocessing 'methodname' that require a y-block are:
====Case 3) Apply preprocessing to new data (apply)====
 
: 'osc'
 
: 'gls weighting'
 
3) APPLY.


The objective of the following call to PREPROCESS
The objective of the following call to PREPROCESS
Line 109: Line 88:
:datap = preprocess('apply',sp,data)
:datap = preprocess('apply',sp,data)


is to apply the calibrated preprocessing in sp to new data. Inputs are sp the modified preprocessing structure (See 2 above) and the data, data, to apply the preprocessing to. The output is preprocessed data datap that is class "dataset".
is to '''apply''' the calibrated preprocessing in <tt>sp</tt> to new data. Inputs are <tt>sp</tt>, the modified preprocessing structure (See Case 2 above) and the data, <tt>data</tt>, to apply the preprocessing to. The output is preprocessed data <tt>datap</tt> that is class "dataset".


4) UNDO.
====Case 4) Remove the effect of previously-done preprocessing on data (undo)====


The inverse of applying preprocessing is perfromed in the following call to PREPROCESS
The inverse operation of applying preprocessing is performed in the following call to PREPROCESS


:data = preprocess('undo',sp,datap);
:data = preprocess('undo',sp,datap);


Inputs are sp the modified preprocessing structure (See 2 above) and the data, datap, (class "double" or "dataset") from which the preprocessing is removed. Note that for some preprocessing methods an inverse does not exist or has not been defined and an 'undo' call will cause an error to occur. For example, 'osc' and 'sg'. One reason for not defining an inverse, or undo, is because it would require a significant amount of memory storage when data sets get large.
Inputs are <tt>sp</tt>, the modified preprocessing structure (See Case 2 above) and the data, <tt>datap</tt>, (class "double" or "dataset") from which the preprocessing is removed. Note that for some preprocessing methods (for example, 'osc' and 'sg') an inverse does not exist or has not been defined, and in such cases an 'undo' call will cause an error to occur. One reason for not defining an inverse, or undo, is because it would require a significant amount of memory storage when data sets get large.


===See Also===
===See Also===


[[crossval]], [[pca]], [[pcr]], [[pls]], [[preprouser]]
[[crossval]], [[pca]], [[pcr]], [[pls]], [[preprouser]], [[preprocatalog]]

Revision as of 16:52, 9 October 2008

Purpose

Selection and application of standard preprocessing methods.

Synopsis

s = preprocess(s) %GUI preprocessing selection
s = preprocess('default','methodname') %Non-GUI selection
[datap,sp] = preprocess('calibrate',s,data) %single block calibrate
[datap,sp] = preprocess('calibrate',s,xblock,yblock) %multi-block
datap = preprocess('apply',sp,data) %apply to new data
data = preprocess('undo',sp,datap) %undo preprocessing

Description

PREPROCESS is a general tool to choose preprocessing steps and to perform these steps on data. See preprouser for a description of how custom preprocessing can be added to the standard preprocessing options listed below. PREPROCESS can be used to perform four different tasks:

  • 1) Specification of Preprocessing
  • 2) Estimate preprocessing parameters (calibrate)
  • 3) Apply preprocessing to new data (apply)
  • 4) Remove the effect of previously-done preprocessing on data (undo)

Case 1) Specification of Preprocessing

The purpose of the following calls to PREPROCESS is to generate standard structure arrays that contain the desired preprocessing steps.

s = preprocess;

generates a GUI and allows the user to select preprocessing steps interactively. The output s is a standard preprocessing structure.

s = preprocess(s);

allows the user to interactively edit a previously-built preprocessing structure s. The output s is the edited preprocessing structure.

s = preprocess('default','methodname');

returns the default structure for method methodname. A list of strings that can be used for methodname can be viewed using the command:

preprocess('keywords')

Below is a list of standard methods that can be used for 'methodname':

  • 'abs': takes the absolute value of the data (see abs),
  • 'autoscale': centers columns to zero mean and scales to unit variance (see auto),
  • 'simple baseline': baseline (specified points),
  • 'baseline': baseline (weighted least squares),
  • 'derivative': derivative (Savitzky-Golay, see savgol),
  • 'detrend': remove a linear trend (see baseline),
  • 'gls weighting': generalized least squares weighting (see glsw),
  • 'groupscale': group/block scaling (see gscale),
  • 'logdecay': log decay scaling,
  • 'log10': calculate base 10 logarithm of data,
  • 'mean center': center columns to have zero mean (see mncn),
  • 'msc (mean)': multiplicative scatter correction with offset, the mean is the reference spectrum (see mscorr),
  • 'median center': center columns to have zero median (see median),
  • 'centering': multiway center,
  • 'scaling': multiway scale,
  • 'normalize': normalization of the rows (see normaliz),
  • 'osc': orthogonal signal correction (see osccalc and oscapp),
  • 'smooth': Savitzky-Golay smoothing and derivatives (see savgol),
  • 'snv': standard normal deviate (autoscale the rows, see snv), and
  • 'sqmnsc': sqrt mean scale, scale each variable by the square root of its mean.

The output is a standard preprocessing structure array s, where each preprocessing method to apply is contained in a separate record.
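Several methods in the list above act on columns (variables), while others act on rows (samples). The following is a minimal sketch of that distinction in plain MATLAB, without calling PREPROCESS. It assumes N-1 normalization for the standard deviation and reads "scale by" as "divide by"; the toolbox's own routines may differ in details such as the handling of missing or zero-variance data.

```matlab
% Toy data: 2 samples (rows) by 3 variables (columns)
x = [1 2 3; 4 6 8];
[m, n] = size(x);

% 'mean center': subtract each column's mean so every column has zero mean
xmc = x - ones(m,1) * mean(x, 1);

% 'snv': autoscale each ROW (subtract the row mean, divide by the row
% standard deviation); N-1 normalization is an assumption of this sketch
xsnv = (x - mean(x, 2) * ones(1,n)) ./ (std(x, 0, 2) * ones(1,n));

% 'sqmnsc': divide each column by the square root of that column's mean
xsq = x ./ (ones(m,1) * sqrt(mean(x, 1)));

disp(xmc);
disp(xsnv);
disp(xsq);
```

Here both rows of xsnv come out as the same pattern because the two samples differ only by an offset and a scale factor, which is the kind of variation SNV is intended to remove.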

Case 2) Estimate preprocessing parameters (calibrate)

The objective of the following calls to PREPROCESS is to estimate preprocessing parameters, if any, from a calibration data set and perform preprocessing on the calibration data set. The I/O format is:

[datap,sp] = preprocess('calibrate',s,data);

The inputs are s, a standard preprocessing structure, and data, the calibration data. The preprocessed data is returned in datap, and the preprocessing parameters are returned in a modified preprocessing structure sp. Note that sp is used as an input with the 'apply' and 'undo' commands described below.

Shortcuts for each method can also be used. Examples for 'mean center' and 'autoscale' are:

[datap,sp] = preprocess('calibrate','mean center',data);
[datap,sp] = preprocess('calibrate','autoscale',data);

Preprocessing for some multi-block methods (specifically, 'osc' and 'gls weighting') requires that the y-block also be passed. The I/O format in these cases is:

[datap,sp] = preprocess('calibrate',s,xblock,yblock);
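As a rough illustration of what "estimating preprocessing parameters" means, the sketch below does by hand what an 'autoscale' calibration amounts to numerically. It is plain MATLAB and does not call PREPROCESS; the variables mu and sigma are hypothetical stand-ins for the parameters that would be stored in sp, and N-1 normalization is assumed.

```matlab
% Calibration data: 3 samples (rows) by 2 variables (columns)
data = [1 10; 2 20; 3 30];
m = size(data, 1);

% Parameter estimation: these values play the role of what sp would store
mu    = mean(data, 1);      % column means
sigma = std(data, 0, 1);    % column standard deviations (N-1, assumed)

% Preprocess the calibration data itself with the estimated parameters
% (no guard against zero-variance columns in this sketch)
datap = (data - ones(m,1)*mu) ./ (ones(m,1)*sigma);

disp(mu);
disp(sigma);
disp(datap);
```

The point of returning both outputs is visible here: datap is the scaled calibration data, while mu and sigma must be kept so that the same transformation can later be applied to other data.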

Case 3) Apply preprocessing to new data (apply)

The objective of the following call to PREPROCESS

datap = preprocess('apply',sp,data)

is to apply the calibrated preprocessing in sp to new data. The inputs are sp, the modified preprocessing structure (see Case 2 above), and data, the data to which the preprocessing is applied. The output datap is the preprocessed data, returned as class "dataset".
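The key property of the apply step is that new data is transformed with the parameters estimated from the calibration set, not with statistics recomputed from the new data. A minimal plain-MATLAB sketch for the autoscale case follows; mu and sigma are again hypothetical stand-ins for the contents of sp, and N-1 normalization is assumed.

```matlab
data    = [1 10; 2 20; 3 30];   % calibration data
newdata = [4 40; 5 50];         % new data with the same two variables

% Parameters come from the calibration data only
mu    = mean(data, 1);
sigma = std(data, 0, 1);

% Apply the stored parameters to the new samples
k    = size(newdata, 1);
newp = (newdata - ones(k,1)*mu) ./ (ones(k,1)*sigma);

disp(newp);
```

Note that the columns of newp do not have zero mean. That is expected, because the new samples are expressed relative to the calibration set.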

Case 4) Remove the effect of previously-done preprocessing on data (undo)

The inverse operation of applying preprocessing is performed in the following call to PREPROCESS

data = preprocess('undo',sp,datap);

The inputs are sp, the modified preprocessing structure (see Case 2 above), and datap, the data (class "double" or "dataset") from which the preprocessing is removed. Note that for some preprocessing methods (for example, 'osc' and 'sg') an inverse does not exist or has not been defined, and in such cases an 'undo' call will cause an error. One reason for not defining an inverse, or undo, is that it would require a significant amount of memory when data sets get large.
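For a method such as autoscale, the inverse is cheap because only one mean and one standard deviation per variable need to be stored. The following plain-MATLAB sketch shows the round trip by hand. It does not call PREPROCESS, and mu and sigma are hypothetical stand-ins for the stored parameters.

```matlab
data = [1 10; 2 20; 3 30];
m = size(data, 1);

% Calibrate and preprocess (autoscale, N-1 normalization assumed)
mu    = mean(data, 1);
sigma = std(data, 0, 1);
datap = (data - ones(m,1)*mu) ./ (ones(m,1)*sigma);

% Undo: multiply by the stored standard deviations, then add back the means
recovered = datap .* (ones(m,1)*sigma) + ones(m,1)*mu;

% Largest absolute difference between the original and recovered data
err = max(abs(recovered(:) - data(:)));
disp(err);
```

The error is zero up to floating-point rounding. A method that removes information from the data cannot be reversed from a handful of stored parameters in this way.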

See Also

crossval, pca, pcr, pls, preprouser, preprocatalog