Datafit engine

From Eigenvector Research Documentation Wiki
Jump to navigation Jump to search

Purpose

Asymmetric least squares with smoothing, baselining & robust fitting.

Synopsis

[yb,z,options] = datafit_engine(y,options);
options = datafit_engine('options');

Description

DATAFIT_ENGINE is used to A) provide a smoothed estimate (z) of the rows of input (y) or B) a baselined estimate the rows of input (y). In each case the objective function to be minimized is given by:

O(z) = (y-z)'*W0*(y-z) + lambdas*z'*Ds'*Ws*Ds*z + lambdae*(y-ye-z)'*We*(y-ye-z) + lambdab*(Pa-z)'*Wb*(Pa-z) (1).

The weights W are diagonal matrices with entries 0<w<1 allowing individual points in the optimization to be weighted differently. The penalty magnitude is given by the scalar lambda factors.

E.g., the first term in (1) includes W0 that has zeros where data are missing and ones elsewhere. The second term uses a smoothing operator Ds and penalty factor lambdas to penalize roughness (by default Ds is the second difference operator). The third term is used to include equality constriants with penalty factor lambdae (if included) and the fourth term allows for basis functions (P) to be used (if included).

A) Smoothing is obtained using options.trbflag = 'none' or 'middle'. In this case the output (z), the smoothed estimate of (y) is likey desired. Output yb = y - z would then correspond to fit residuals. if options.trbflag = 'middle' a robust fitting strategy is used.
B) Baselining is obtained using options.trbflag = 'top' or 'bottom' via an asymmetric least squares fitting strategy. In this case the output (yb) is the baselined estimate of (y) and output (z) corresponds to the baseline.

See additional information on (options.trbflag) below.

Inputs

  • y = M by N matrix of ROW vectors to be smoothed/fit (class "double" or "dataset").

Options

options = a structure array with the following fields:

  • display : [ {'off'} | 'on'] governs level of display (waitbar on/off).
  • trbflag : [ {'none'} | 'middle' | 'bottom' | 'top' ]
'none' runs the smoother once (see WSMOOTH),
'middle' uses robust least squares to fit a curve to the middle of the data,
'bottom' uses asymmetric least squares to fit a curve to the bottom of the data (e.g., baselining spectra),
'top' uses asymmetric least squares to fit a curve to the top of the data (e.g., fitting a Plank function).

Constraints, penalties and weights.

  • w0 : {ones(1,N)} weights on the fit term: (y-z)'*W0*(y-z);
[] if empty or not length N then
w0(y.include{2}) := 1 for (y) class DataSet or
w0 := ones(1,N) for (y) class double.
  • lambdas : {1} Positive scalar defines how hard to smooth. Typically 10 < lambdas <1e6 with large values giving more smoothing.
  • ws : {ones(1,N)} weights on the smoothing term lambdas*z'*Ds'*Ws*Ds*z; [] if empty or not length N then ws := ones(1,N).
  • widths : {3} Smoothing window width for Ds (see SAVGOL). (Typically=3).
  • orders : {2} Smoothing polynomial order for Ds (see SAVGOL). (Typically=2).
  • derivs : {2} Smoothing derivative for Ds (see SAVGOL). (Typically=2).
  • lambdae : {0} Positive scalar defines how hard to fit to equality constraints. Typically 0 <= lambdae <1e6.
  • ye : {NaN(1,N)} non-NaN entries are the equality constraints on z, NaN entries are unconstrained.
  • we : {zeros(1,N)} weights on equality constraint term lambdae*(z-y-ye)'*We*(z-y-ye) we(isnan(ye)) = 0, we(~isnan(ye)) = 1.
  • lambdab : {0} Positive scalar defines how hard to fit to basis function(s). Typically 0 < lambdab <1e6.
  • wb : {zeros(1,N)} weights on basis function term lambdab*(z-Pa)'*Wb*(z-Pa)
  • orderp : {[]} polynomial order (scalar >=0) (see POLYBASIS).
  • knotp : {[]} (scalar integer >=1) integer defining the number of uniformly distributed INTERIOR knots there are then knotp+2 knots positioned at t = linspace(1,N,knotp+2); (see SMOOTHBASIS) (1xKt vector) defining manually placed knot positions on [1:N].
Note: that knot positions must be such that there are at least 3 unique data points between eachknot.
  • basisp : {[]} NxKb custom basis. It is recommended that each column normalized to unit length (2-norm).
  • nonneg : {[0] | 1 } scalar indiacting if non-negativity should be applied to the coefficients on (options.basisp). The default is 0 = no. Kbx1 vector of indices (ones and zeros) indicating which coefficients should be non-negative. (See FASTERNNLS);

Robust and asymmetric iterative fitting parameters.

  • tol : {[]} is the base fit tolerance (scalar). If empty it is estimated by the mean absolute deviation of y.data - savgol(y.data,3,1,0); (see MADC).
  • tfac : {[1]} tolerance factor where res = (y.data-z)/(tol*tfac) is a considered large residual.
  • p : {0.01} "asymmetry factor", where at each iteration w0 = (res>tol*tfac)*p+(res<=tol*tofac) .
  • ittol : {1e-3} relative tolerance exit criterion norm(w0-w0old)/norm(w0)<ittol.
  • itmax : {100} maximum iterations exit criterion.
  • timemax : {600} maximum time (s) for each row of y exit criterion.

Outputs

  • yb = y - z.
  • z = the smoothed estiamte of (y)..

See Also

baseline, baselineds, preprocess, wlsbaseline, madc