Datafit engine

From Eigenvector Research Documentation Wiki

Revision as of 16:20, 17 December 2018

Purpose

Asymmetric least squares with smoothing, baselining & robust fitting.

Synopsis

[yb,z,options] = datafit_engine(y,options);
options = datafit_engine('options');

Description

DATAFIT_ENGINE is used to A) provide a smoothed estimate (z) of the rows of input (y), or B) provide a baselined estimate of the rows of input (y). In each case the objective function to be minimized is given by:

O(z) = (y-z)'*W0*(y-z) + lambdas*z'*Ds'*Ws*Ds*z + lambdae*(y-ye-z)'*We*(y-ye-z) + lambdab*(Pa-z)'*Wb*(Pa-z) (1).

The weights W are diagonal matrices with entries 0 <= w <= 1, allowing individual points in the optimization to be weighted differently. The penalty magnitude is given by the scalar lambda factors.

For example, the first term in (1) includes W0, which has zeros where data are missing and ones elsewhere. The second term uses a smoothing operator Ds and penalty factor lambdas to penalize roughness (by default Ds is the second-difference operator). The third term includes equality constraints with penalty factor lambdae (if used), and the fourth term allows basis functions (P) to be fit (if used).
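To see how the pieces of (1) interact, the following is a minimal sketch of the fit obtained from the first two terms only (weighted fit plus roughness penalty), assuming Ds is a plain second-difference operator with Ws = I and using synthetic data; the actual function builds Ds from the widths/orders/derivs options via SAVGOL.

% Minimal sketch: weighted roughness-penalized fit for one row, keeping only
% the first two terms of (1); illustrative only, not the shipped code.
y  = sin(linspace(0,4*pi,500))' + 0.05*randn(500,1);   % synthetic noisy row (as a column)
N  = numel(y);
lambdas = 1e4;                           % roughness penalty (options.lambdas)
w0 = ones(N,1);                          % fit weights (options.w0); set 0 where data are missing
W0 = spdiags(w0,0,N,N);
Ds = diff(speye(N),2);                   % (N-2) x N second-difference operator
z  = (W0 + lambdas*(Ds'*Ds)) \ (W0*y);   % minimizer of (y-z)'*W0*(y-z) + lambdas*z'*Ds'*Ds*z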

A) Smoothing is obtained using options.trbflag = 'none' or 'middle'. In this case the output (z), the smoothed estimate of (y), is likely the desired result, and output yb = y - z then corresponds to the fit residuals. If options.trbflag = 'middle', a robust fitting strategy is used.
B) Baselining is obtained using options.trbflag = 'top' or 'bottom' via an asymmetric least squares fitting strategy. In this case the output (yb) is the baselined estimate of (y) and output (z) corresponds to the baseline.

See additional information on (options.trbflag) below.
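As an illustrative sketch of the two cases (given data y as described under Inputs; the lambdas value shown is arbitrary, not a recommended setting):

% A) Smoothing: z is the smoothed estimate, yb = y - z are the residuals.
opts         = datafit_engine('options');
opts.trbflag = 'none';                   % plain smoothing (use 'middle' for a robust fit)
opts.lambdas = 1e3;                      % heavier smoothing than the default
[yb,z]       = datafit_engine(y,opts);

% B) Baselining: yb is the baselined data, z is the estimated baseline.
opts         = datafit_engine('options');
opts.trbflag = 'bottom';                 % asymmetric fit to the bottom of the data
[yb,z]       = datafit_engine(y,opts);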

Inputs

  • y = M by N matrix of ROW vectors to be smoothed/fit (class "double" or "dataset").

Options

options = a structure array with the following fields:

  • display : [ {'off'} | 'on'] governs level of display (waitbar on/off).
  • trbflag : [ {'none'} | 'middle' | 'bottom' | 'top' ]
'none' runs the smoother once (see WSMOOTH),
'middle' uses robust least squares to fit a curve to the middle of the data,
'bottom' uses asymmetric least squares to fit a curve to the bottom of the data (e.g., baselining spectra),
'top' uses asymmetric least squares to fit a curve to the top of the data (e.g., fitting a Planck function).

Constraints, penalties and weights.

  • w0: {ones(1,N)} weights on the fit term (y-z)'*W0*(y-z).
If w0 is empty ([]) or not of length N, then w0(y.include{2}) := 1 when (y) is class DataSet, or w0 := ones(1,N) when (y) is class double.
  • lambdas: {1} Positive scalar defining how hard to smooth. Typically 10 < lambdas < 1e6, with larger values giving more smoothing.
  • ws: {ones(1,N)} weights on the smoothing term lambdas*z'*Ds'*Ws*Ds*z. If empty ([]) or not of length N, then ws := ones(1,N).
  • widths: {3} Smoothing window width for Ds (see SAVGOL). (Typically=3).
  • orders: {2} Smoothing polynomial order for Ds (see SAVGOL). (Typically=2).
  • derivs: {2} Smoothing derivative for Ds (see SAVGOL). (Typically=2).
  • lambdae: {0} Positive scalar defines how hard to fit to equality constraints. Typically 0 <= lambdae <1e6.
  • ye: {NaN(1,N)} non-NaN entries are the equality constraints on z, NaN entries are unconstrained.
  • we: {zeros(1,N)} weights on the equality constraint term lambdae*(z-y-ye)'*We*(z-y-ye); by default we(isnan(ye)) = 0 and we(~isnan(ye)) = 1.
  • lambdab: {0} Positive scalar defines how hard to fit to basis function(s). Typically 0 < lambdab <1e6.
  • wb: {zeros(1,N)} weights on basis function term lambdab*(z-Pa)'*Wb*(z-Pa)
  • orderp: {[]} polynomial order (scalar >=0) (see POLYBASIS).
  • knotp: {[]} Either a scalar integer >= 1 defining the number of uniformly distributed INTERIOR knots, giving knotp+2 knots positioned at t = linspace(1,N,knotp+2) (see SMOOTHBASIS), or a 1xKt vector defining manually placed knot positions on [1:N].
Note that knot positions must be spaced so that there are at least 3 unique data points between each knot.
  • basisp: {[]} NxKb custom basis. It is recommended that each column be normalized to unit length (2-norm).
  • nonneg: [ {0} | 1 ] scalar indicating whether non-negativity should be applied to the coefficients on (options.basisp); the default is 0 = no. Alternatively, a Kbx1 vector of ones and zeros indicating which coefficients should be non-negative (see FASTERNNLS). A combined sketch of these constraint and penalty fields follows this list.
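A hedged sketch combining several of these fields for an M by N data matrix y (the numeric values are arbitrary illustrations; only field names documented above are used):

% Weight all channels, smooth moderately, and pull the fit toward y at the
% first and last channels via the equality-constraint term.
N              = size(y,2);
opts           = datafit_engine('options');
opts.w0        = ones(1,N);        % fit weights; set entries to 0 to ignore channels
opts.lambdas   = 1e4;              % roughness penalty
opts.lambdae   = 1e3;              % activate the equality-constraint term
opts.ye        = NaN(1,N);         % NaN entries are unconstrained
opts.ye([1 N]) = 0;                % constrain the endpoints (ye = 0 pulls z toward y there)
[yb,z]         = datafit_engine(y,opts);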

Robust and asymmetric iterative fitting parameters.

  • tol: {[]} the base fit tolerance (scalar). If empty, it is estimated from the mean absolute deviation of y.data - savgol(y.data,3,1,0) (see MADAC).
  • tfac: {1} tolerance factor; the scaled residual res = (y.data-z)/(tol*tfac) is used to decide which residuals are considered large.
  • p: {0.01} "asymmetry factor", where at each iteration w0 = (res>tol*tfac)*p + (res<=tol*tfac) (the reweighting loop is sketched after this list).
  • ittol: {1e-3} relative tolerance exit criterion norm(w0-w0old)/norm(w0)<ittol.
  • itmax: {100} maximum iterations exit criterion.
  • timemax: {600} maximum time (s) for each row of y exit criterion.
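To make the iteration concrete, here is a minimal sketch of an asymmetric reweighting loop for a single row, assuming trbflag = 'bottom' and the simple second-difference smoother sketched earlier; the synthetic data, the crude tol estimate, and the simplified weight update are illustrative assumptions rather than the shipped implementation.

% Sketch of asymmetric reweighting for one row (baseline hugs the bottom of the data).
y   = linspace(0,2,500)' + exp(-((1:500)' - 250).^2/200) + 0.01*randn(500,1);  % baseline + peak + noise
N   = numel(y);
p   = 0.01;  tfac = 1;  ittol = 1e-3;  itmax = 100;  lambdas = 1e4;
tol = std(diff(y))/sqrt(2);        % crude noise estimate (the real code uses MADAC on a SAVGOL residual)
Ds  = diff(speye(N),2);
DtD = Ds'*Ds;
w0  = ones(N,1);
for it = 1:itmax
    w0old = w0;
    z     = (spdiags(w0,0,N,N) + lambdas*DtD) \ (w0.*y);   % weighted smooth fit with current weights
    r     = y - z;                                          % fit residuals
    w0    = (r > tol*tfac)*p + (r <= tol*tfac);             % weight p above the fit, 1 at or below it
    if norm(w0 - w0old)/norm(w0) < ittol, break; end        % relative-change exit criterion
end
yb = y - z;                        % baselined row; z approximates the baseline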

Outputs

  • baselined_data (output yb) = data with the baseline removed; when smoothing, this corresponds to the fit residuals y - z.
  • baselines (output z) = the baseline(s) removed from the data; when smoothing, z is the smoothed estimate of y.

See Also

baseline, datafit_engine, preprocess, wlsbaseline