Lmoptimize

===Purpose===

Levenberg-Marquardt non-linear optimization.

===Synopsis===

<pre>[x,fval,exitflag,out] = lmoptimize(fun,x0,options,params)</pre>

===Description===

Starting at (x0), LMOPTIMIZE finds (x) that minimizes the function defined by the function handle (fun), where (x) is an ''N''x1 vector of parameters. The function (fun) must supply the Jacobian and Hessian, i.e., they are not estimated by LMOPTIMIZE (an example is provided in the Algorithm Section below).

===Inputs===

* '''fun''' = function handle; the call to fun is <tt>[fval,jacobian,hessian] = fun(x)</tt> (see the Algorithm Section for tips on writing (fun)).
** (fval) is a scalar objective function value,
** (jacobian) is an ''N''x1 vector of Jacobian values, and
** (hessian) is an ''N''x''N'' matrix of Hessian values.
* '''x0''' = ''N''x1 initial guess of the function parameters.

===Optional Inputs===

* '''options''' = discussed below in the Options Section.
* '''params''' = comma-separated list of additional parameters passed to the objective function (fun); the call to (fun) is then <tt>[fval,jacobian,hessian] = fun(x,params1,params2,...)</tt> (see the example below).
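For illustration only, a hypothetical objective function <tt>myobj</tt> with one extra fixed parameter (here called <tt>target</tt>; both names are made up for this sketch) could be written as:

<pre>
function [p,p1,p2] = myobj(x,target)
%MYOBJ Hypothetical objective: squared distance from x to a fixed target vector.
p = sum((x - target).^2);            % fval (scalar)
if nargout>1
  p1 = 2*(x - target);               % Jacobian (Nx1)
  p2 = 2*eye(length(x));             % Hessian (NxN)
end
</pre>

It could then be minimized with the extra parameter appended to the argument list:

<pre>
options  = lmoptimize('options');
target   = [1; 2; 3];                                      % extra parameter
[x,fval] = lmoptimize(@myobj,zeros(3,1),options,target);   % target is passed through to myobj
</pre>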

===Outputs===

* '''x''' = ''N''x1 vector of parameter value(s) at the function minimum.
* '''fval''' = scalar value of the function evaluated at (x).
* '''exitflag''' = describes the exit condition with the following values:
** 1: converged to a solution (x) based on one of the tolerance criteria.
** 0: convergence terminated based on maximum iterations or maximum time.
* '''out''' = structure array with the following fields:
** '''critfinal''': final values of the stopping criteria (see options.stopcrit below).
** '''x''': intermediate values of (x) if options.x=='on'.
** '''fval''': intermediate values of (fval) if options.fval=='on'.
** '''Jacobian''': last evaluation of the Jacobian if options.Jacobian=='on'.
** '''Hessian''': last evaluation of the Hessian if options.Hessian=='on'.

===Algorithm===

The objective function is defined as <math>f\left( \mathbf{x} \right)</math>, where <math>\mathbf{x}</math> is an ''N''x1 vector. The Jacobian <math>\mathbf{J}</math> and the symmetric Hessian <math>\mathbf{H}</math> are defined as

<math>\mathbf{J}=\frac{\text{d}f}{\text{d}\mathbf{x}}=\left[ \begin{matrix}
  {\partial f}/{\partial x_{1}}\;  \\
  {\partial f}/{\partial x_{2}}\;  \\
  \vdots  \\
  {\partial f}/{\partial x_{N}}\;  \\
\end{matrix} \right]</math>

<math>\mathbf{H}=\frac{\text{d}}{\text{d}\mathbf{x}}\left( \frac{\text{d}f}{\text{d}\mathbf{x}} \right)^{T}=\left[ \begin{matrix}
  {\partial ^{2}f}/{\partial x_{1}^{2}}\; & {\partial ^{2}f}/{\partial x_{1}\partial x_{2}}\; & \cdots  & {\partial ^{2}f}/{\partial x_{1}\partial x_{N}}\;  \\
  {\partial ^{2}f}/{\partial x_{2}\partial x_{1}}\; & {\partial ^{2}f}/{\partial x_{2}^{2}}\; & \cdots  & {\partial ^{2}f}/{\partial x_{2}\partial x_{N}}\;  \\
  \vdots  & \vdots  & \ddots  & \vdots  \\
  {\partial ^{2}f}/{\partial x_{N}\partial x_{1}}\; & {\partial ^{2}f}/{\partial x_{N}\partial x_{2}}\; & \cdots  & {\partial ^{2}f}/{\partial x_{N}^{2}}\;  \\
\end{matrix} \right]</math>


Two types of calls to the function fun are made. The first type is used often and is a simple evaluation of the function at x given by

<pre>fval = fun(x,params1,params2,...);</pre>

The second type of call returns the Jacobian and Hessian

<pre>[fval,jacobian,hessian] = fun(x,params1,params2,...);</pre>

Therefore, to enhance the speed of the optimization, the M-file that evaluates the objective function should only evaluate the Jacobian and Hessian if nargout>1 as in the following example:

<pre>
function [p,p1,p2] = banana(x)
%BANANA Rosenbrock's function
%  INPUT:
%    x  = 2 element vector [x1 x2]
%  OUTPUTS:
%    p  = P(x)  = 100(x1^2-x2)^2 + (x1-1)^2
%    p1 = P'(x) = [400(x1^3-x1x2) + 2(x1-1); -200(x1^2-x2)]
%    p2 = P"(x) = [1200x1^2-400x2+2, -400x1; -400x1, 200]
%    p  is (fval)
%    p1 is (jacobian)
%    p2 is (Hessian)
%
%I/O: [p,p1,p2] = banana(x);

x12 = x(1)*x(1);
x13 = x(1)*x12;
x22 = x(2)*x(2);
alpha = 10; %1 is not very stiff, 10 is the stiff function

p   = 10*alpha*(x13*x(1)-2*x12*x(2)+x22) + x12-2*x(1)+1;
if nargout>1
  p1  = [40*alpha*(x13-x(1)*x(2)) + 2*(x(1)-1);
         -20*alpha*(x12-x(2))];
  p2  = [120*x12-40*x(2) + 2, -40*x(1);
         -40*x(1),            20]*alpha;
end
</pre>

This example shows that the Jacobian and Hessian are not evaluated unless explicitly called for by utilizing the nargout command. Since estimating the Jacobian <math>\mathbf{J}</math> (output p1) and the Hessian <math>\mathbf{H}</math> (output p2) can be time consuming, this coding practice is expected to speed up the optimization.
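As a sanity check when writing such an objective function (this is not part of LMOPTIMIZE), the analytic Jacobian can be compared against a central-difference estimate. The evaluation point and step size below are arbitrary choices for illustration:

<pre>
x0  = [-1.2; 1];                     % example evaluation point
[fval,J] = banana(x0);               % analytic Jacobian from the function above
h   = 1e-6;                          % finite-difference step
Jfd = zeros(2,1);
for i = 1:2
  e      = zeros(2,1);
  e(i)   = h;
  Jfd(i) = (banana(x0+e) - banana(x0-e))/(2*h);   % central difference
end
disp([J Jfd])                        % the two columns should agree closely
</pre>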

A single step in a Gauss-Newton (G-N) optimization, <math>\Delta \mathbf{x}_{k}</math>, is given as

<math>\Delta \mathbf{x}_{k}=-\mathbf{H}_{k}^{-1}\mathbf{J}_{k}</math>

where the index <math>k</math> corresponds to the step number.
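As an illustration only (not the LMOPTIMIZE algorithm itself), an undamped G-N iteration on the banana example could be written as:

<pre>
x = [-1.2; 1];                       % arbitrary starting guess
for k = 1:20                         % a fixed, arbitrary number of steps
  [fval,J,H] = banana(x);            % objective, Jacobian, Hessian at the current x
  dx = -(H\J);                       % Gauss-Newton step, dx = -inv(H)*J
  x  = x + dx;                       % x(k+1) = x(k) + dx
end
</pre>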

A problem with the G-N methods is that the inverse of the Hessian may not exist at every step, or it can converge to a saddle point if the Hessian is not positive definite [T.F. Edgar, D.M. Himmelblau, Optimization of Chemical Processes, 1st ed., McGraw-Hill Higher Education, New York, NY, 1988]. As an alternative, the Levenberg-Marquardt (L-M) method was used for CMF [K. Levenberg, Q. Appl. Math 2 (1944) 164; D. Marquardt, S.I.A.M. J. Appl. Math 11 (1963) 431; Edgar et al.]. A single step for the L-M method is given by


<math>\Delta \mathbf{x}_{k}=-\left( \mathbf{H}_{k}+\theta \mathbf{I} \right)^{-1}\mathbf{J}_{k}</math>

where <math>\theta</math> is a damping parameter and <math>\mathbf{I}</math> is an ''N''x''N'' identity matrix. This has a direct analogy to ridge regression [A.E. Hoerl, R.W. Kennard, K.F. Baldwin, Commun. Statist. 4 (1975) 105] with <math>\theta</math>, the ridge parameter, constraining the size of the step. This method is also called a damped G-N method [G. Tomasi, R. Bro, Comput. Stat. Data Anal. in press (2005)]. There are several details to implementing the L-M approach [M. Lampton, Comput. Phys. 11 (1997) 110]. Details associated with the LMOPTIMIZE function are discussed here.

At each iteration in the algorithm, the inverse of <math>\mathbf{H}_{k}+\theta \mathbf{I}</math> must be estimated. As a part of this process the singular value decomposition (SVD) of <math>\mathbf{H}_{k}</math> is calculated as

<math>\mathbf{VSV}^{T}=\mathbf{H}_{k}</math>



Note that the left and right singular vectors are the same (and equal to <math>\mathbf{V}</math>) because the Hessian is symmetric. If the optimization surface is convex, <math>\mathbf{H}_{k}</math> will be positive definite and the diagonal matrix <math>\mathbf{S}</math> will have all positive values on the diagonal. However, the optimization problem may be such that this is not the case at every step. Therefore a small number <math>\alpha</math> is added to the diagonal of <math>\mathbf{S}</math> in an effort to ensure that the Hessian will always be positive definite. In the algorithm <math>\alpha ={\mathbf{S}_{1,1}}/{ncond}\;</math>, where <math>\mathbf{S}_{1,1}</math> is the largest singular value and ''ncond'' is the maximum condition number desired for the Hessian [''ncond'' is input as options.ncond]. This can be viewed as adding a small dampening to the optimization and is always included at every step. In contrast, an additional damping factor that is allowed to adapt at each step is also included. The adapting dampening factor is given by <math>\theta =\lambda _{1}\mathbf{S}_{1,1}</math>, where the initial <math>\lambda _{1}</math> is input to the algorithm as <tt>options.lamb(1)</tt>. It is typical that <math>\theta</math> is much larger than <math>\alpha</math>. The inverse for the L-M step is then estimated as

<math>\left( \mathbf{H}_{k}+\theta \mathbf{I} \right)^{-1}\approx \mathbf{V}\left( \mathbf{S}+\left( \theta +\alpha \right)\mathbf{I} \right)^{-1}\mathbf{V}^{T}</math>

and is used to estimate a step distance <math>\Delta \mathbf{x}_{k}</math>.
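A minimal sketch of this damped step for the banana example, using the default values of options.ncond and options.lamb(1) (again an illustration of the formulas above, not the exact LMOPTIMIZE code):

<pre>
x = [-1.2; 1];                        % current estimate
N = length(x);
[fval,J,H] = banana(x);               % objective, Jacobian, Hessian
[U,S,V]    = svd(H);                  % for a symmetric Hessian, U and V coincide (up to sign)
ncond  = 1e6;                         % options.ncond default
lamb1  = 0.01;                        % options.lamb(1) default
alpha  = S(1,1)/ncond;                % fixed dampening from the condition-number limit
theta  = lamb1*S(1,1);                % adaptive damping factor
dx     = -V*((S + (theta+alpha)*eye(N))\(V'*J));   % L-M step via the SVD-based inverse
</pre>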

The ratio <math>r={\left[ f\left( \mathbf{x}_{k} \right)-f\left( \mathbf{x}_{k}+\Delta \mathbf{x}_{k} \right) \right]}/{\left[ -\mathbf{J}\Delta \mathbf{x}_{k} \right]}\;</math> is a measure of the improvement in the objective function relative to the improvement if the objective function decreased linearly. If <math>r<r_{1}</math> then a line search is initiated [<math>r_{1}>0</math> is a small number input as <tt>options.ramb(1)</tt>]. In this case, the damping factor <math>\lambda _{1}</math> is increased (so that the step size is reduced) by setting <math>\lambda _{1}={\lambda _{1}}/{\lambda _{2}}\;</math> where <math>\lambda _{2}<1</math> [<math>\lambda _{2}</math> is input as <tt>options.lamb(2)</tt>], and a new step distance <math>\Delta \mathbf{x}_{k}</math> is estimated. The ratio ''r'' is then estimated again. The damping factor <math>\lambda _{1}</math> is increased until <math>r\ge r_{1}</math> or the maximum number of line search steps ''k''<sub>max</sub> is reached [''k''<sub>max</sub> is input as <tt>options.kmax</tt>]. (If <math>\lambda _{1}</math> increases sufficiently, the optimization resembles a damped steepest descent method.) If the maximum number of line search steps ''k''<sub>max</sub> is reached, the step is "rejected" and only a small movement is made such that <math>\Delta \mathbf{x}_{k}=r_{3}{\Delta \mathbf{x}_{k}}/{\left| \Delta \mathbf{x}_{k} \right|}\;</math> [''r''<sub>3</sub> is input as <tt>options.ramb(3)</tt>].
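Continuing the sketch above, the gain-ratio test and damping update might look like the following (the variable names and acceptance logic are illustrative; default option values are assumed):

<pre>
r1 = 1e-4;  r3 = 1e-6;               % options.ramb(1), options.ramb(3) defaults
lamb2 = 0.7;  kmax = 50;             % options.lamb(2), options.kmax defaults
for ks = 1:kmax                      % line search
  r = (fval - banana(x+dx))/(-J'*dx);   % actual vs. linearly predicted improvement
  if r >= r1
    break                            % acceptable step; stop the line search
  end
  lamb1 = lamb1/lamb2;               % increase damping (lamb2 < 1), shrinking the step
  theta = lamb1*S(1,1);
  dx    = -V*((S + (theta+alpha)*eye(N))\(V'*J));   % recompute the step
end
if r < r1
  dx = r3*dx/norm(dx);               % step "rejected": take only a tiny move along dx
end
</pre>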

If instead the first estimate of the ratio is large enough such that <math>r\ge r_{1}</math>, then the line search is not initiated. If the ratio is sufficiently large such that ''r''>''r''<sub>2</sub>, where ''r''<sub>2</sub>>''r''<sub>1</sub>, then the damping factor is decreased by setting <math>\lambda _{1}={\lambda _{1}}/{\lambda _{3}}\;</math> where <math>\lambda _{3}>1</math> [''r''<sub>2</sub> is input as <tt>options.ramb(2)</tt>; <math>\lambda _{3}</math> is input as <tt>options.lamb(3)</tt>].

A new value for '''x''' is then estimated from <math>\mathbf{x}_{k+1}=\mathbf{x}_{k}+\Delta \mathbf{x}_{k}</math>, and the next step is repeated from that point. The process is repeated until one of the stopping criteria [<tt>options.stopcrit</tt>] is met.
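The documentation does not spell out the exact form of each stopping test, but a plausible reading of the four <tt>options.stopcrit</tt> entries is sketched below (the function name and the precise comparisons are assumptions):

<pre>
function stop = checkstop(fold,fnew,iter,t0,stopcrit)
%CHECKSTOP Illustrative stopping test patterned after options.stopcrit.
%  stopcrit = [rel tol, abs tol, max iterations, max time (s)]
stop = abs(fold-fnew)/max(abs(fold),eps) < stopcrit(1) || ...  % relative change in fval
       abs(fold-fnew)                    < stopcrit(2) || ...  % absolute change in fval
       iter            >= stopcrit(3)                  || ...  % iteration limit
       etime(clock,t0) >= stopcrit(4);                         % elapsed-time limit
</pre>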

===Options===

options is a structure array with the following fields (a short example of adjusting them follows the list):

* '''display''': [ 'off' | {'on'} ] governs level of display to the command window.
* '''dispfreq''': ''N'', displays results every ''N''<sup>th</sup> iteration {default ''N''=10}.
* '''stopcrit''': [1e-8 1e-10 10000 3600] defines the stopping criteria as [(relative tolerance) (absolute tolerance) (maximum number of iterations) (maximum time in seconds)].
* '''x''': [ {'off'} | 'on' ] saves (x) at each step.
* '''fval''': [ {'off'} | 'on' ] saves (fval) at each step.
* '''Jacobian''': [ {'off'} | 'on' ] saves last evaluation of the Jacobian.
* '''Hessian''': [ {'off'} | 'on' ] saves last evaluation of the Hessian.
* '''ncond''': 1e6, maximum condition number for the Hessian (see Algorithm Section).
* '''lamb''': [0.01 0.7 1.5], 3-element vector used for damping factor control (see Algorithm Section):
** lamb(1): lamb(1) times the biggest eigenvalue of the Hessian is added to the Hessian eigenvalues when taking the inverse; the result is damping.
** lamb(2): lamb(1) = lamb(1)/lamb(2) causes deceleration in the line search.
** lamb(3): lamb(1) = lamb(1)/lamb(3) causes acceleration in the line search.
* '''ramb''': [1e-4 0.5 1e-6], 3-element vector used to control the line search (see Algorithm Section):
** ramb(1): if fullstep < ramb(1)*[linear step], back up (start the line search).
** ramb(2): if fullstep > ramb(2)*[linear step], accelerate [change lamb(1) by the acceleration parameter lamb(3)].
** ramb(3): if the line search is rejected, make a small movement in the direction of the L-M step: ramb(3)*[L-M step].
* '''kmax''': 50, maximum steps in the line search (see Algorithm Section).
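For example, the damping and line-search behavior can be tuned by editing these fields before calling LMOPTIMIZE; the particular values below are arbitrary:

<pre>
opts = lmoptimize('options');           % default options structure
opts.ncond    = 1e8;                    % allow a larger condition number (less fixed dampening)
opts.lamb     = [0.1 0.7 1.5];          % start with heavier adaptive damping
opts.ramb     = [1e-3 0.5 1e-6];        % enter the line search a little sooner
opts.kmax     = 25;                     % allow fewer line-search steps
opts.stopcrit = [1e-8 1e-10 5000 600];  % tighter iteration and time limits
</pre>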

===Examples===

<pre>
x0        = [-1.2 1];                % example starting guess (any 2-element vector)
options   = lmoptimize('options');
options.x = 'on';
options.display = 'off';
[x,fval,exitflag,out] = lmoptimize(@banana,x0,options);
plot(out.x(:,1),out.x(:,2),'-o','color', ...
 [0.4 0.7 0.4],'markersize',2,'markerfacecolor', ...
 [0 0.5 0],'markeredgecolor',[0 0.5 0])
</pre>

===See Also===

function_handle, lmoptimizebnd