Registerspec: Difference between revisions

From Eigenvector Research Documentation Wiki
Jump to navigation Jump to search
imported>Chuck
No edit summary
imported>Chuck
Line 27: Line 27:
When aligning new spectra to known reference peak positions, REGISTERSPEC takes as input:
When aligning new spectra to known reference peak positions, REGISTERSPEC takes as input:


: a matrix or DataSet object containing spectra to be aligned, <tt>data</tt>,  
* a matrix or DataSet object containing spectra to be aligned, <tt>data</tt>,  
: an x-axis reference for those spectra, <tt>xaxis</tt>, and
* an x-axis reference for those spectra, <tt>xaxis</tt>, and
: a vector containing the expected positions of previously-identified reference peaks, <tt>peaks</tt>
* a vector containing the expected positions of previously-identified reference peaks, <tt>peaks</tt>


'''Note:''' If the input <tt>xaxis</tt> is omitted and <tt>data</tt> is a DataSet object containing axisscale information for the variables (data.axisscale{2}), this axis will be used as <tt>xaxis</tt>. Otherwise, a lack of input for <tt>xaxis</tt> will cause REGISTERSPEC to assume that the spectral channels are evenly spaced starting from a value of 1.
'''Note:''' If the input <tt>xaxis</tt> is omitted and <tt>data</tt> is a DataSet object containing axisscale information for the variables (data.axisscale{2}), this axis will be used as <tt>xaxis</tt>. Otherwise, a lack of input for <tt>xaxis</tt> will cause REGISTERSPEC to assume that the spectral channels are evenly spaced starting from a value of 1.
Line 35: Line 35:
Resulting outputs are  
Resulting outputs are  


: the spectra aligned to the reference peaks <tt>data_i</tt>,
* the spectra aligned to the reference peaks <tt>data_i</tt>,
: the x-axis scale for those spectra <tt>axaxis</tt> (generally the same as <tt>xaxis</tt>, except as discussed below) and
* the x-axis scale for those spectra <tt>axaxis</tt> (generally the same as <tt>xaxis</tt>, except as discussed below) and
: an array, <tt>foundat</tt>, of the observed shifts for each reference peak (columns) and each spectrum in <tt>data</tt> (rows).
* an array, <tt>foundat</tt>, of the observed shifts for each reference peak (columns) and each spectrum in <tt>data</tt> (rows).


In addition to correcting for peak shifts, the sampling rate of the output spectra can be increased through cubic-spline interpolation. The options.interpolate setting (see below) controls the sampling rate of the output spectra. Generally, the output <tt>axaxis</tt> is the same as the input <tt>xaxis</tt>. However, when interpolation is performed, the output <tt>axaxis</tt> will contain the x-axis values that correspond to the interpolated spectra in the input data.
In addition to correcting for peak shifts, the sampling rate of the output spectra can be increased through cubic-spline interpolation. The options.interpolate setting (see below) controls the sampling rate of the output spectra. Generally, the output <tt>axaxis</tt> is the same as the input <tt>xaxis</tt>. However, when interpolation is performed, the output <tt>axaxis</tt> will contain the x-axis values that correspond to the interpolated spectra in the input data.

Revision as of 15:09, 9 October 2008

Purpose

Shift spectra based on expected peak locations.

Synopsis

[data_i,axaxis,foundat] = registerspec(data,xaxis,peaks,options)
peaks = registerspec(data,xaxis,options)  % "peak-find" mode

Description

REGISTERSPEC is used to correct spectra for shifts in x-axis (e.g. wavelength or frequency) registration. The alignment is based on either a polynomial or constrained-spline fit of reference peaks' observed position to their expected position. In contrast to other alignment methods (e.g. piecewise direct standardization or dynamic time warping), REGISTERSPEC may be more useful when

  • a) x-axis shifts are small and potentially non-linear,
  • b) only a few consistent reference peaks exist, and/or
  • c) when some of the spectral bands are expected to undergo significant shape changes in the normal range of observations.

There are two modes used to call REGISTERSPEC:

  • 1) The first mode is used to align new spectra given a set of reference peaks.
  • 2) The second mode is used to help identify peaks in a calibration set that might be useful as reference peaks:

Usage Mode 1: Spectral Alignment:

A typical function call for spectral alignment is the following:

[data_i,axaxis,foundat] = registerspec(data,xaxis,peaks,options)

When aligning new spectra to known reference peak positions, REGISTERSPEC takes as input:

  • a matrix or DataSet object containing spectra to be aligned, data,
  • an x-axis reference for those spectra, xaxis, and
  • a vector containing the expected positions of previously-identified reference peaks, peaks

Note: If the input xaxis is omitted and data is a DataSet object containing axisscale information for the variables (data.axisscale{2}), this axis will be used as xaxis. Otherwise, a lack of input for xaxis will cause REGISTERSPEC to assume that the spectral channels are evenly spaced starting from a value of 1.

Resulting outputs are

  • the spectra aligned to the reference peaks data_i,
  • the x-axis scale for those spectra axaxis (generally the same as xaxis, except as discussed below) and
  • an array, foundat, of the observed shifts for each reference peak (columns) and each spectrum in data (rows).

In addition to correcting for peak shifts, the sampling rate of the output spectra can be increased through cubic-spline interpolation. The options.interpolate setting (see below) controls the sampling rate of the output spectra. Generally, the output axaxis is the same as the input xaxis. However, when interpolation is performed, the output axaxis will contain the x-axis values that correspond to the interpolated spectra in the input data.

Various options can be set through the optional input structure options. These are described in detail below. It is recommended that three options fields- options.order, options.maxshift, and options.window- be reviewed prior to use. Note that options.maxshift and options.window are input in absolute x-axis units and the desired input values will vary depending on the original x-axis interval (i.e. data-point spacing) and expected peak widths. In addition, the order of polynomial used to correct for shifts should be reviewed (options.order). It is generally best to keep the order as low as possible (<3 is preferable) to avoid over-fitting and unusual shifting at the ends of the spectrum.

Often, this usage of REGISTERSPEC is for "preprocessing" of spectral data before developing a calibration model. In these cases, REGISTERSPEC should be run both on the original calibration data (first to locate reference bands, then a second time to subject the calibration data to the shift algorithm), as well as on any future data that is to be applied to the calibration.

Usage Mode 2: Reference Peak Identification:

A typical function call for reference peak identification is the following:

peaks = registerspec(data,xaxis,options)

When using REGISTERSPEC to identify reference peaks, the spectral data and x-axis information is supplied alone without a list of reference peaks. In this mode, a set of spectra (often those used for a multivariate calibration model) are searched for peaks which show relatively consistent maxima. The algorithm first locates peaks on the mean spectrum by automatically identifying positions that show a clear inflection point as a peak maximum. Peaks located in the first step are then tested on the individual spectra, and must meet the following criteria:

(1) For all obesrved spectra, the peak must contain a maximum value (i.e. the peak cannot be a shoulder without an inflection point).
(2) For all observed spectra, the peak must not shift more than the value set by options.maxshift (default is 4 x-axis units) from the peak's position in the mean spectrum.

The output is a list of potential reference peaks. These should be examined carefully. There is no constraint that a peak have a signal to noise or signal to background level above that which permits the maximum to be found. Thus, very low-signal peaks could be returned as stable but not be observable in future spectra. Additionally, it may be useful to take the list of reference peaks and execute REGISTERSPEC on the calibration data itself to examine the extent and nature of shifting on the calibration data itself.

Inputs

  • data = matrix or DataSet containing spectral data
  • xaxis = optional frequencies or energies associated with each 'variable in data {optional; default = use DataSet values, otherwise use 1:n}
  • peaks = expected locations of peaks to use for shifting. If omitted, 'findpeaks' mode will be invoked and stable peaks will be found in the data (see below).

Notes: If input (peaks) is omitted, the algorithm identifies peaks in the mean spectrum by setting peaks at every variable and allowing these to drift to the nearest maximum. It then locates the same peaks in each of the individual spectra and keeps only those peaks which could be located in all spectra with less shift than specified in options.maxshift.

Outputs

  • data_i = shifted, interpolated data
  • axaxis = interpolated xaxis (will be equal to xaxis if no interpolation is requested)
  • foundat = matrix of peak shifts found for each peak (columns) in each spectrum (rows)
  • peaks = (only for 'findpeaks' mode) Locations of found peaks in xaxis units.

Examples

To locate stable peaks in (unshifted) calibration data

peaks = registerspec(calibrationdata);

To correct x-axis shift in new data using previously identified peaks

newdata_unshifted = registerspec(newdata,peaks);

Options

options = a structure array with the following fields:

  • display: [ {'on'} | 'off' ] governs command-line output
  • plots: [ {'none'} | 'fit' | 'final' ] governs plotting options
  • waitbar: [ 'off' | 'on' | {'auto'} ] governs use of waitbar. If set to 'auto', waitbar will display if excessive time to process is expected.
  • nopeaks: [ 'none' | {'warning'} | 'error' ] governs behavior when none of the reference peaks can be located.
  • shiftby: [{-0.1}] minimum shifting interval. A positive value is interpreted as being in absolute xaxis units and a negative value as relative to the smallest xaxis interval.
  • interpolate: [{[]}] interpolation interval for output spectra. Empty [] does no interpolation. A positive value is interpreted as being in absolute xaxis units and a negative value as relative to the smallest xaxis interval.
  • maxshift: [in xaxis units, {4}] maximum allowed peak shift (peaks which require more shift than this will NOT be used for xaxis correction).
  • window: [in xaxis units, {[]}] size of window to search for each peak. Empty [] uses automatic window based on maxshift.
  • order: order of polynomial (only used for polynomial algorithms)
  • algorithm: xaxis correction algorithm. One of:
    • pchip: constrained piecewise spline (well behaved)
    • poly: {default} standard polynomial fit to found peaks
    • iterativepoly: iterative polynomial fitting (order increased in each cycle - works better for badly shifted spectra)
    • findpeaks: locate non-moving peaks in whole dataset. Triggered by omission of the peaks input.
  • smoothing: [ 'off' | {'on'} ] governs use of smoothing algorithm during peak location. If 'on' each sub-window is smoothed prior to locating maximum in window.
  • smoothinfo: [width order] smoothing parameters to be passed to smoothing function savgol if enabled by smoothing option above. width is width of window in number of variables, order is order of polynomial. Default is width of 5 and order 2: [5 2].

Algorithm

Correction of x-axis shift in a given spectrum is achieved by first locating the maximum value nearest to the expected peak locations using localized spline interpolation nearby the expected location (within options.maxshift axis units from the expected position). The observed peak locations are then compared to the expected peak locations and the difference is fit with the desired function (see options). The difference is finally removed from the spectrum using interpolation back to the expected frequency or wavelength values.

Automatic peak location is achieved by attempting to locate peaks across the entire spectrum, then searching those peaks which show less than options.maxshift change in position throughout the set of calibration spectra.

See Also

alignmat, coadd, deresolv, stdfir, stdgen