Registerspec: Difference between revisions

From Eigenvector Research Documentation Wiki
Jump to navigation Jump to search
imported>Jeremy
(Importing text file)
imported>Chuck
No edit summary
Line 1: Line 1:
===Purpose===
===Purpose===


Line 7: Line 6:


:[data_i,axaxis,foundat] = registerspec(data,xaxis,peaks,options)
:[data_i,axaxis,foundat] = registerspec(data,xaxis,peaks,options)
:peaks = registerspec(data,xaxis,options)
:peaks = registerspec(data,xaxis,options)   % "peak-find" mode


===Description===
===Description===


REGISTERSPEC is used to correct spectra for shifts in x-axis (e.g. wavelength or frequency) registration. The alignment is based on either a polynomial or constrained-spline fit of reference peaks' observed position to their expected position. In contrast to other alignment methods (e.g. piecewise direct standardization or dynamic time warping), REGISTERSPEC may be more useful when 1) x-axis shifts are small and potentially non-linear, 2) only a few consistant reference peaks exist, and/or 3) when some of the spectral bands are expected to undergo significant shape changes in the normal range of observations.  
REGISTERSPEC is used to correct spectra for shifts in x-axis (e.g. wavelength or frequency) registration. The alignment is based on either a polynomial or constrained-spline fit of reference peaks' observed position to their expected position. In contrast to other alignment methods (e.g. piecewise direct standardization or dynamic time warping), REGISTERSPEC may be more useful when
*1) x-axis shifts are small and potentially non-linear,
*2) only a few consistent reference peaks exist, and/or
*3) when some of the spectral bands are expected to undergo significant shape changes in the normal range of observations.  


There are two modes used to call REGISTERSPEC. The first mode is used to align new spectra given a set of reference peaks. The second mode is used to help identify peaks in a calibration set that might be useful as reference peaks:
There are two modes used to call REGISTERSPEC:
*A) The first mode is used to align new spectra given a set of reference peaks.
*B) The second mode is used to help identify peaks in a calibration set that might be useful as reference peaks:


====Spectral Alignment:====
====Usage Mode 1: Spectral Alignment:====
 
A typical function call for spectral alignment is the following:


:[data_i,axaxis,foundat] = registerspec(data,xaxis,peaks,options)
:[data_i,axaxis,foundat] = registerspec(data,xaxis,peaks,options)


When aligning new spectra to known reference peak positions, REGISTERSPEC takes as input a matrix or DataSet object containing spectra to be aligned, data, an x-axis reference for those spectra, xaxis, and a vector containing the expected positions of previously-identified reference peaks, peaks. Outputs are the spectra aligned to the reference peaks, data_i, the x-axis scale for those spectra, axaxis (generally the same as xaxis, except as discussed below) and an array, foundat, of the observed shifts for each reference peak (columns) and each spectra in data (rows).
When aligning new spectra to known reference peak positions, REGISTERSPEC takes as input a matrix or DataSet object containing spectra to be aligned, <tt>data</tt>, an x-axis reference for those spectra, <tt>xaxis</tt>, and a vector containing the expected positions of previously-identified reference peaks, <tt>peaks</tt>. Outputs are the spectra aligned to the reference peaks, data_i, the x-axis scale for those spectra, axaxis (generally the same as xaxis, except as discussed below) and an array, foundat, of the observed shifts for each reference peak (columns) and each spectra in data (rows).


If the input xaxis is omitted and data is a DataSet object containing axisscale information for the variables (data.axisscale{2}), this axis will be used as xaxis. Otherwise, a lack of input for xaxis will cause REGISTERSPEC to assume that the spectral channels are evenly spaced starting from a value of 1.
If the input <tt>xaxis</tt> is omitted and <tt>data</tt> is a DataSet object containing axisscale information for the variables (data.axisscale{2}), this axis will be used as xaxis. Otherwise, a lack of input for xaxis will cause REGISTERSPEC to assume that the spectral channels are evenly spaced starting from a value of 1.


In addition to correcting peak shifts, the sampling rate of the output spectra can be increased through cubic-spline interpolation. The options.interpolate setting (see below) controls the sampling rate of the output spectra. Generally the output axaxis is the same as the input xaxis. However, when interpolation is performed, the output axaxis will contain the x-axis values that correspond to the interpolated spectra in the input data.
In addition to correcting peak shifts, the sampling rate of the output spectra can be increased through cubic-spline interpolation. The options.interpolate setting (see below) controls the sampling rate of the output spectra. Generally the output axaxis is the same as the input xaxis. However, when interpolation is performed, the output axaxis will contain the x-axis values that correspond to the interpolated spectra in the input data.
Line 27: Line 33:
Various options can be set through the optional input structure options. These are described in detail below. It is recommended that options.order, options.maxshift, and options.window be reviewed prior to use. Note that options.maxshift and options.window are input in absolute x-axis units and the desired input values will vary depending on the original x-axis interval (i.e. data-point spacing) and expected peak widths. In addition, the order of polynomial used to correct for shifts should be reviewed (options.order). It is generally best to keep the order as low as possible (<3 is preferable) to avoid over-fitting and unusual shifting at the ends of the spectrum.
Various options can be set through the optional input structure options. These are described in detail below. It is recommended that options.order, options.maxshift, and options.window be reviewed prior to use. Note that options.maxshift and options.window are input in absolute x-axis units and the desired input values will vary depending on the original x-axis interval (i.e. data-point spacing) and expected peak widths. In addition, the order of polynomial used to correct for shifts should be reviewed (options.order). It is generally best to keep the order as low as possible (<3 is preferable) to avoid over-fitting and unusual shifting at the ends of the spectrum.


====Reference Peak Identification:====
====Usage Mode 2: Reference Peak Identification:====
 
A typical function call for reference peak identification is the following:


:peaks = registerspec(data,xaxis,options)
:peaks = registerspec(data,xaxis,options)

Revision as of 12:39, 9 October 2008

Purpose

Shift spectra based on expected peak locations.

Synopsis

[data_i,axaxis,foundat] = registerspec(data,xaxis,peaks,options)
peaks = registerspec(data,xaxis,options)  % "peak-find" mode

Description

REGISTERSPEC is used to correct spectra for shifts in x-axis (e.g. wavelength or frequency) registration. The alignment is based on either a polynomial or constrained-spline fit of reference peaks' observed position to their expected position. In contrast to other alignment methods (e.g. piecewise direct standardization or dynamic time warping), REGISTERSPEC may be more useful when

  • 1) x-axis shifts are small and potentially non-linear,
  • 2) only a few consistent reference peaks exist, and/or
  • 3) when some of the spectral bands are expected to undergo significant shape changes in the normal range of observations.

There are two modes used to call REGISTERSPEC:

  • A) The first mode is used to align new spectra given a set of reference peaks.
  • B) The second mode is used to help identify peaks in a calibration set that might be useful as reference peaks:

Usage Mode 1: Spectral Alignment:

A typical function call for spectral alignment is the following:

[data_i,axaxis,foundat] = registerspec(data,xaxis,peaks,options)

When aligning new spectra to known reference peak positions, REGISTERSPEC takes as input a matrix or DataSet object containing spectra to be aligned, data, an x-axis reference for those spectra, xaxis, and a vector containing the expected positions of previously-identified reference peaks, peaks. Outputs are the spectra aligned to the reference peaks, data_i, the x-axis scale for those spectra, axaxis (generally the same as xaxis, except as discussed below) and an array, foundat, of the observed shifts for each reference peak (columns) and each spectra in data (rows).

If the input xaxis is omitted and data is a DataSet object containing axisscale information for the variables (data.axisscale{2}), this axis will be used as xaxis. Otherwise, a lack of input for xaxis will cause REGISTERSPEC to assume that the spectral channels are evenly spaced starting from a value of 1.

In addition to correcting peak shifts, the sampling rate of the output spectra can be increased through cubic-spline interpolation. The options.interpolate setting (see below) controls the sampling rate of the output spectra. Generally the output axaxis is the same as the input xaxis. However, when interpolation is performed, the output axaxis will contain the x-axis values that correspond to the interpolated spectra in the input data.

Various options can be set through the optional input structure options. These are described in detail below. It is recommended that options.order, options.maxshift, and options.window be reviewed prior to use. Note that options.maxshift and options.window are input in absolute x-axis units and the desired input values will vary depending on the original x-axis interval (i.e. data-point spacing) and expected peak widths. In addition, the order of polynomial used to correct for shifts should be reviewed (options.order). It is generally best to keep the order as low as possible (<3 is preferable) to avoid over-fitting and unusual shifting at the ends of the spectrum.

Usage Mode 2: Reference Peak Identification:

A typical function call for reference peak identification is the following:

peaks = registerspec(data,xaxis,options)

When using REGISTERSPEC to identify reference peaks, the spectral data and x-axis information is supplied alone without a list of reference peaks. In this mode, a set of spectra (often those used for a multivariate calibration model) are searched for peaks which show relatively consistant maxima. The algorithm first locates peaks on the mean spectrum by automatically identifying positions that show a clear inflection point as a peak maximum. Peaks located in the first step are then tested on the individual spectra and must meet the following criteria:

(1) For all obesrved spectra, the peak must contain a maximum value (i.e. the peak cannot be a shoulder without an inflection point).

(2) For all observed spectra, the peak must not shift more than the value set by options.maxshift (default is 4 x-axis units) from the peak's position in the mean spectrum.

The output is a list of potential reference peaks. These should be examined carefully. There is no constraint that a peak have a signal to noise or signal to background level above that which permits the maximum to be found. Thus, very low-signal peaks could be returned as stable but not be observable in future spectra. Additionally, it may be useful to take the list of reference peaks and execute REGISTERSPEC on the calibration data itself to examine the extent and nature of shifting on the calibration data itself.

Often this routine is used as a preprocessing step for a calibration model. In these cases, REGISTERSPEC should be run both on the original calibration data (first to locate reference bands, then a second time to subject the calibration data to the shift algorithm), as well as on future data prior to prediction.

INPUTS

  • data = matrix or DataSet of spectra
  • xaxis = optional frequencies or energies associated with each 'variable in data {optional; default = use DataSet values, otherwise use 1:n}
  • peaks = expected locations of peaks to use for shifting. If omitted, 'findpeaks' mode will be invoked and stable peaks will be found in the data (see below).
  • OUTPUTS
  • data_i = shifted, interpolated data
  • axaxis = interpolated xaxis (will be equal to xaxis if no interpolation is requested)
  • foundat = matrix of peak shifts found for each peak (columns) in each spectrum (rows)
  • peaks = (only for 'findpeaks' mode) Locations of found peaks in xaxis units.

Notes: If input (peaks) is omitted, the algorithm identifies peaks in the mean spectrum by setting peaks at every variable and allowing these to drift to the nearest maximum. It then locates the same peaks in each of the individual spectra and keeps only those peaks which could be located in all spectra with less shift than specified in options.maxshift.

Examples

To locate stable peaks in (unshifted) calibration data

 peaks = registerspec(calibrationdata);

To correct x-axis shift in new data using previously identified peaks

 newdata_unshifted = registerspec(newdata,peaks);

Options

  • display: [ {'on'} | 'off' ] governs command-line output
  • plots: [ {'none'} | 'fit' | 'final' ] governs plotting options
  • nopeaks: [ 'none' | {'warning'} | 'error' ] governs behavior when none of the reference peaks can be located.
  • shiftby: [{-0.1}] minimum shifting interval. A positive value is interpreted as being in absolute xaxis units and a negative value as relative to the smallest xaxis interval.
  • interpolate: [{[]}] interpolation interval for output spectra. Empty [] does no interpolation. A positive value is interpreted as being in absolute xaxis units and a negative value as relative to the smallest xaxis interval.
  • maxshift: [in xaxis units, {4}] maximum allowed peak shift (peaks which require more shift than this will NOT be used for xaxis correction).
  • window: [in xaxis units, {[]}] size of window to search for each peak. Empty [] uses automatic window based on maxshift.
  • order: order of polynomial (only used for polynomial algorithms)
  • algorithm: xaxis correction algorithm. One of:
    • 'pchip': constrained picewise spline (well behaved)
    • 'poly': {default} standard polynomial fit to found peaks
    • 'iterativepoly': iterative polynomial fitting (order increased in each cycle - works better for badly shifted spectra)
    • 'findpeaks': locate non-moving peaks in whole dataset. Triggered by omission of the (peaks) input.
  • smoothing: [ 'off' | {'on'} ] governs use of smoothing algorithm during peak location. If 'on' each sub-window is smoothed prior to locating maximum in window.
  • smoothinfo: [width order] smoothing parameters to be passed to smoothing function (savgol) if enabled by smoothing option above. width is width of window in number of variables, order is order of polynomial. Default is width of 5 and order 2: [5 2].

Algorithm

Correction of x-axis shift in a given spectrum is achieved by first locating the maximum value nearest to the expected peak locations using localized spline interpolation nearby the expected location (within options.maxshift axis units from the expected position). The observed peak locations are then compared to the expected peak locations and the difference is fit with the desired function (see options). The difference is finally removed from the spectrum using interpolation back to the expected frequency or wavelength values.

Automatic peak location is achieved by attempting to locate peaks across the entire spectrum, then searching those peaks which show less than options.maxshift change in position throughout the set of calibration spectra.

See Also

alignmat, coadd, deresolv, stdfir, stdgen