iFit: iOptim Optimization methods


  1. Description of the optimization process and general syntax
    1. Optimizers syntax
    2. Optimization quality
  2. Recommended optimizers
    1. In short, for continuous problems: Powell (fminpowell)
    2. In short, for noisy problems: Particle Swarm Optimization (fminpso)
  3. List of optimizers implemented in iFit/iOptim
  4. Benchmarking Optimization methods
    1. Swarm and Genetic algorithms
    2. Gradient methods (Newton)
    3. Other optimizers (and simplex)
  5. Model parameter constraints
    1. Fixed parameters
    2. Parameters varying within limits
    3. Limiting parameter change
  6. References


Commands we use in this page: fminsearch, and many other compatible optimizers

Description of the optimization process and general syntax


The iOptim sub-library provides a set of minimization routines, that is, computational methods which attempt to minimize, in d dimensions, an objective function, often referred to as the criteria or cost function, by varying a set of parameters.

minimizer: { vary the vector p of d parameters so that objective(p) is minimal }

A usual cost function is the least-squares difference between the model and the data. All of these optimizers use the same syntax as the default Matlab fminsearch function:
>> [parameters,criteria,message,output] = minimizer(objective, guess, options)
where the input arguments are:
  1. objective: the name of the function to minimize, objective(p). This can be a string or a function handle.
  2. guess: the starting parameter set that the objective depends on (this is a vector of numerical values).
  3. options: a configuration structure as obtained from optimset
The objective function should use the vector of parameters in order to compute a value (the criteria) to minimize. This value is usually a positive scalar that should tend towards zero. When the objective returns a vector, the sum of its elements is used as criteria, except for the Levenberg-Marquardt for which it is used as is.
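As an illustration of the objective/guess/options pattern, here is a sketch in Python with SciPy's Nelder-Mead simplex standing in for fminsearch (the linear model and data are made up for the example; this is not iFit code):

```python
import numpy as np
from scipy.optimize import minimize

# synthetic data from a known model, y = a*x + b with a=2, b=1
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0

def objective(p):
    """Least-squares criteria between model and data: a positive scalar."""
    model = p[0] * x + p[1]
    return np.sum((y - model) ** 2)

guess = [0.0, 0.0]
res = minimize(objective, guess, method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-8})
# res.x approaches [2, 1] and res.fun tends towards zero
```

The criteria here is the least-squares sum mentioned above; in iFit the same objective would be written as a Matlab function handle.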

The options structure, as obtained from optimset, may contain members such as TolX, TolFun, MaxIter and MaxFun.

The results from the minimizer are as follows:
  1. parameters: the parameter set which minimizes the objective function
  2. criteria: the criteria/objective value at the minimum solution
  3. message: the final optimization routine state
  4. output: a structure that holds information gathered during the minimization procedure, such as the optimization history and the uncertainty estimates described below
All supplied iOptim optimizers can handle constraints (or more specifically restraints, that is, constraints on parameter values), specified as a 4th input argument. These are often lower and upper bounds. Refer to the documentation of each minimizer for more information, and to the documentation about fitting models onto data sets. The number of optimization parameters is not restricted.

Optimizers syntax

All of these optimizers follow the same syntax, just as the default Matlab fminsearch optimizer (which is based upon the simplex method).
>> [parameters, fval, exitflag, output] = fminimfil(objective, starting_parameters, options, {constraints})
where the options and output arguments are structures, and {constraints} is optional. All of these methods support constraints on parameters during the optimization process.

The iData fits method is a wrapper to any of the iOptim optimizers, with a specified fit criteria.

Optimization quality

An optimization procedure is associated to a criteria minimization. Once a local/global minimum has been found, the best found model parameters are returned. However, any optimization result is located in an uncertainty volume in which a small change of the parameters does not produce a change of the criteria larger than the tolerance used during the optimization procedure (options.TolX and options.TolFun).

In practice, if we assume the criteria and parameter distributions around the minimum to be roughly Gaussian, we may estimate the parameter change for which the minimized criteria value is doubled (that is, extended by its 'half-width'):

{ Δp so that χ²(p) < 2*min(χ²) }

This may be obtained from the optimization history, and the result is stored in output.parsHistoryUncertainty, where we use a Gaussian weighting for the parameter sets around the minimal criteria value:

exp[ - (χ²(p)-min(χ²))²/8/min(χ²)² ]
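A sketch of how such a weighted uncertainty can be computed from an optimization history (Python illustration; the history values below are made up, iOptim stores the real ones internally):

```python
import numpy as np

# hypothetical optimization history: visited parameter sets and their chi2
pars_history = np.array([[1.0, 2.0], [1.1, 1.9], [0.9, 2.2], [1.05, 2.05]])
chi2_history = np.array([1.0, 1.2, 1.8, 1.05])

chi2_min = chi2_history.min()
# Gaussian weighting of each parameter set around the minimal criteria value
w = np.exp(-((chi2_history - chi2_min) ** 2) / (8 * chi2_min ** 2))
mean = np.average(pars_history, axis=0, weights=w)
# weighted spread per parameter: the history-based uncertainty estimate
uncertainty = np.sqrt(np.average((pars_history - mean) ** 2, axis=0, weights=w))
```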

Alternatively, if we consider the local criteria hyper-surface to be approximated by a quadratic, then the error bar is twice the inverse of the Hessian matrix at the optimized solution (which is the criteria curvature - very sensitive to noise):

Δp ∼ √[ 2 ‖∂²χ²/∂pᵢ∂pⱼ‖⁻¹ ]

This value is computed and stored in output.parsHessianUncertainty, and the corresponding error matrix (covariance) is stored in output.parsHessianCovariance (the square root of its diagonal is the projected parameter uncertainty). This step is only performed when the Hessian matrix requires less than about a minute to compute. Additionally, output.parsHessianCorrelation indicates the cross-correlations between parameters (off-diagonal values larger than 0.7 indicate cross-correlated parameters).
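The Hessian-based estimate can be sketched as follows (Python, with a finite-difference Hessian and a simple quadratic criteria of known curvature; this mirrors the formula above but is not the iOptim implementation):

```python
import numpy as np

def chi2(p):
    # quadratic criteria with known half-widths 0.1 and 0.2
    return (p[0] - 1.0) ** 2 / 0.01 + (p[1] + 2.0) ** 2 / 0.04

p_opt = np.array([1.0, -2.0])   # the optimized solution

def hessian(f, p, h=1e-4):
    """Central finite-difference Hessian of f at p."""
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            pp = p.copy(); pp[i] += h; pp[j] += h; fpp = f(pp)
            pm = p.copy(); pm[i] += h; pm[j] -= h; fpm = f(pm)
            mp = p.copy(); mp[i] -= h; mp[j] += h; fmp = f(mp)
            mm = p.copy(); mm[i] -= h; mm[j] -= h; fmm = f(mm)
            H[i, j] = (fpp - fpm - fmp + fmm) / (4 * h * h)
    return H

H = hessian(chi2, p_opt)
cov = 2.0 * np.linalg.inv(H)    # error (covariance) matrix: twice the inverse Hessian
dp = np.sqrt(np.diag(cov))      # projected parameter uncertainties
```

For this quadratic, dp recovers the half-widths of the criteria; the off-diagonal terms of the normalized covariance would reveal cross-correlated parameters.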

Recommended optimizers

In short, for continuous problems: Powell (fminpowell)

Table 1a and Figure 1a present a selection of the 'best' optimization methods for continuous functions (non-noisy problems). For each of them, we indicate the typical success ratio and solving time.

Figure 1a: Best optimizers for continuous (non-noisy) problems, as a function of the number of free parameters. The overall success ratio, as well as the mean execution time is shown. The Powell optimizer is both excellent in success ratio and execution time.


Optimizer | Description (continuous problems) | Mean success ratio (%) | Mean execution time (s)
fminpso [8] | Particle Swarm Optimization | 97.0 | 18.0
fminpowell [13] | Powell with Coggins line search | 96.7 | 0.7
fminsimpsa [1] | simplex/simulated annealing | 94.7 | 26.5
fminimfil [3] | Unconstrained Implicit filtering | 92.8 | 9.8
fminralg [4] | Shor R-algorithm | 88.4 | 0.04
fminnewton [21] | Newton gradient search | 79.1 | 0.02
Table 1a: A selection among the most efficient and fast optimization methods, for continuous (non-noisy) problems.

In short, for noisy problems: Particle Swarm Optimization (fminpso)

Table 1b and Figure 1b present a selection of the 'best' optimization methods for noisy functions. For each of them, we indicate the typical success ratio and solving time.
Figure 1b: Best optimizers for noisy problems, as a function of the number of free parameters. The overall success ratio, as well as the mean execution time is shown. The Particle Swarm optimizer is good in success ratio, with fair execution time.


Optimizer | Description (noisy problems) | Mean success ratio (%) | Mean execution time (s)
fminpso [8] | Particle Swarm Optimization | 69.7 | 13.8
fminsimpsa [1] | simplex/simulated annealing | 67.7 | 24.2
fminsce [10] | Shuffled Complex Evolution | 65.7 | 18.7
fmincmaes [9] | Evolution Strategy with Covariance Matrix Adaptation | 59.6 | 9.8
fminimfil [3] | Unconstrained Implicit filtering | 40.5 | 6.0
fminhooke [12] | Hooke-Jeeves direct search | 38.9 | 8.1
Table 1b: A selection among the most efficient and fast optimization methods, for noisy problems.

List of optimizers implemented in iFit/iOptim

We have performed extended tests of all the optimizers provided in the iOptim sub-library. For each optimizer, we have minimized a set of test functions, including sharp, broad and multi-minima problems. The same function set has also been used after adding a 10% Gaussian noise.
This amounts to about 80000 optimizations. We then list these optimizers with their mean solving efficiency for both continuous and noisy problems. This provides a quantitative comparison of optimizers, which helps in choosing good methods and setting aside bad ones. More detailed results are shown in the plots below.

Generally, solving noisy problems requires longer search times than continuous problems, and the optimizers that perform best in this case are those that intrinsically make use of random numbers.
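To see why random-number-based methods cope with noise, here is a small Python sketch using SciPy's differential evolution (an evolutionary method in the same family as fminpso/fminsce; the noisy test function is made up for the example):

```python
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)

def noisy_sphere(p):
    """Sphere function with ~10% multiplicative Gaussian noise."""
    value = np.sum(np.asarray(p) ** 2)
    return value * (1.0 + 0.1 * rng.standard_normal())

bounds = [(-5, 5)] * 2
# the stochastic population-based search still localizes the minimum near [0, 0]
res = differential_evolution(noisy_sphere, bounds, seed=1, tol=0, maxiter=200)
```

A deterministic gradient method on the same noisy criteria would typically stall on spurious local features of the noise, while the population average filters it out.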

The list of all available optimizer methods can be obtained from the command:
>> fits(iData)

Function Name | Description | Efficiency (continuous) [%] | Efficiency (noisy) [%] | Comments
fminanneal [18] | Simulated annealing | 53.6 | 5.3 |
fminbfgs [19] | Broyden-Fletcher-Goldfarb-Shanno | 83.9 | 2.5 | fast
fmincgtrust [5] | Steihaug Newton-CG-Trust | 87.4 | 4.1 | fast
fmincmaes [9] | Evolution Strategy with Covariance Matrix Adaptation | 86.3 | 59.5 |
fminga [15] | Genetic Algorithm (real coding) | 84.1 | 55.5 |
fmingradrand [6] | Random Gradient | 62.6 | 13.1 | rather slow for a gradient method
fminhooke [12] | Hooke-Jeeves direct search | 94.6 | 38.8 | fast and efficient
fminimfil [3] | Implicit filtering | 92.7 | 40.5 | the slowest of the gradient methods
fminkalman [20] | unscented Kalman filter | 63.6 | 7.6 |
fminlm [7] | Levenberg-Marquardt | 14.2 | 1.8 | performs very well when the objective is a vector, e.g. from least-squares errors, on non-noisy problems; rather slow when not used properly
fminnewton [21] | Newton gradient search | 79.1 | 1.6 | fast
fminpowell [13] | Powell Search with Coggins line search | 96.6 | 30.7 | very fast, especially for d>16
 | Powell Search with Golden rule line search | | | slower than with Coggins
fminpso [8] | Particle Swarm Optimization | 97.0 | 69.7 |
fminralg [4] | Shor r-algorithm | 88.3 | 3.3 | fast
fminrand [22] | adaptive random search | 60.7 | 44.7 |
fminsce [10] | Shuffled Complex Evolution | 88.0 | 65.7 |
fminsearchbnd [2] | Nelder-Mead simplex (fminsearch) | 55.3 | 5.4 | Matlab default
fminsimplex [14] | Nelder-Mead simplex (alternate implementation to fminsearch) | 73.3 | 30.0 |
fminsimpsa [1] | simplex/simulated annealing | 94.7 | 67.7 |
fminswarm [11] | Particle Swarm Optimizer (alternate implementation to fminpso) | 78.4 | 50.2 |
fminswarmhybrid [11] | Hybrid Particle Swarm Optimizer | 80.5 | 24.1 |
Table 2: Optimization methods in iFit/iOptim, with mean success ratio in the dimensionality range d=1-64. Default parameters from the Matlab implementations are used, with MaxFun=2500*d, MaxIter=250*d and TolFun=1e-4. The recommended optimizers are those listed in Tables 1a and 1b; methods with low success ratios should be avoided.

Benchmarking Optimization methods

Most of the optimizers provided with iOptim have been gathered from the open-source community. In a few cases, the same mathematical method has been implemented in several variants (swarm, simplex) with different overall efficiency: there is no unique way to code an optimization algorithm. We do not provide any guarantee regarding the effectiveness of the optimizers, but we may compare them statistically.
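The statistical comparison can be sketched as a success-ratio measurement: run an optimizer from many random starting points on a known multimodal test function, and count the runs that reach the global minimum. A minimal Python version (the Rastrigin function, run count and tolerance are illustrative choices, not the iFit benchmark itself):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)

def rastrigin(p):
    """Multimodal test function with global minimum 0 at the origin."""
    p = np.asarray(p)
    return 10 * p.size + np.sum(p ** 2 - 10 * np.cos(2 * np.pi * p))

n_runs, successes = 50, 0
for _ in range(n_runs):
    start = rng.uniform(-2, 2, size=2)       # random starting parameters
    res = minimize(rastrigin, start, method="Nelder-Mead")
    if res.fun < 1e-3:                       # reached the global minimum
        successes += 1

success_ratio = successes / n_runs
```

Repeating this over many test functions, noise levels and dimensionalities yields the mean success ratios and timings reported in the tables above.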

Swarm and Genetic algorithms

Figure 2: Mean efficiency and solving time for swarm and genetic algorithm methods, as a function of the number of parameters to optimize.

The swarm methods (aka particle swarm optimizer) provide a very good success ratio. Among these, the shuffled complex evolution fminsce and the particle swarm optimizer fminpso are among the best, followed by the covariance matrix adaptation evolution strategy fmincmaes (which has a lower efficiency but is much faster).

Gradient methods (Newton)

Figure 3: Mean efficiency and solving time for gradient methods, as a function of the number of parameters to optimize.

The gradient methods (derived from Newton's method) are often considered the fastest, and indeed they are for a reduced number of model parameters. The well known Levenberg-Marquardt algorithm is an implementation of an adaptive Newton's method with a least-squares criteria (see below). However, these methods often fail, as they easily get trapped in local minima, and they are notoriously inefficient with noisy data.

Note: The Levenberg-Marquardt method fminlm is designed for objective functions/criteria returned as vectors, which greatly improves its efficiency. However, its memory requirements and execution time for large dimensionalities are significantly higher than for other methods.

Among these methods, we provide two excellent optimizers for continuous problems: the Implicit Filtering fminimfil, and the Shor R-algorithm fminralg, which is slightly less efficient but much faster. Only the Implicit Filtering should be used when optimizing noisy problems, as all the other gradient methods fail.

Other optimizers (and simplex)

 
Figure 4: Mean efficiency and solving time for other non-conventional methods, as a function of the number of parameters to optimize.

The Hooke-Jeeves optimizer fminhooke is one of the most efficient methods for all continuous problems, with among the fastest solving times.
The Powell optimizer fminpowell is a well known, simple and efficient method for continuous problems. Its version with the Coggins line search is the fastest of all the optimizers proposed here. The Golden rule line search slightly improves its efficiency, but dramatically increases the execution time.

The default optimizer shipped with Matlab is fminsearch, a pure simplex implementation. We do not recommend this method: all the others are more efficient, and some are as fast. A better alternative is a mix of the simplex with a simulated annealing process, implemented as fminsimpsa.

Model parameter constraints

In many cases, the model parameters are to be constrained. Optimization specialists call these restraints, that is, constraints on parameter values. This includes fixing some values, or bounds within which parameters should be restricted. This is specified to the optimizer method by means of a 4th input argument constraints:
>> [parameters,criteria,message,output]= fminimfil(model, initial_parameters, options, constraints)

Fixed parameters

To fix some of the model parameters to their starting value, define constraints as a vector with 0 for free parameters and 1 for fixed parameters, e.g.:
>> p=fminimfil(objective, starting, options, [ 1 0 0 0 ])

will fix the first model parameter. A similar behavior is obtained when setting constraints as a structure with a fixed member :
>> constraints.fixed = [ 1 0 0 0 ];
The constraints vector should have the same length as the model parameter vector.
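The effect of constraints.fixed can be emulated with any plain optimizer by exposing only the free parameters to the search, as in this Python sketch (the objective and values are made up; iOptim handles this internally):

```python
import numpy as np
from scipy.optimize import minimize

def objective(p):
    return (p[0] - 1) ** 2 + (p[1] - 2) ** 2 + (p[2] + 3) ** 2

starting = np.array([0.5, 0.0, 0.0])
fixed = np.array([1, 0, 0], dtype=bool)   # fix the first parameter

def wrapped(free_values):
    p = starting.copy()
    p[~fixed] = free_values               # fixed entries keep their start value
    return objective(p)

res = minimize(wrapped, starting[~fixed], method="Nelder-Mead")
p_best = starting.copy()
p_best[~fixed] = res.x                    # p_best[0] stays at 0.5
```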

Parameters varying within limits

If one needs to restrict the exploration range of parameters, it is possible to define the lower and upper bounds of the model parameters. This can be done by setting the 4th argument to the lower bounds lb, and the 5th to the upper bounds ub, e.g.:
>> p=fminimfil(objective, starting, options, [ 0.5 0.8 0 0 ], [ 1 1.2 1 1 ])
A similar behavior is obtained by setting constraints as a structure with members min and max :
>> constraints.min = [ 0.5 0.8 0 0 ];
>> constraints.max = [ 1 1.2 1 1 ];
The constraints vectors should have the same length as the model parameter vector; NaN values can be used to skip the min/max constraint on specific parameters.
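A sketch of the min/max restraints in Python, mapping NaN entries to unbounded directions (illustrative only; SciPy's L-BFGS-B stands in for a bounded iOptim minimizer):

```python
import numpy as np
from scipy.optimize import minimize

def objective(p):
    # unconstrained minimum at [2, 2, 2, 2]
    return np.sum((np.asarray(p) - 2.0) ** 2)

lb = np.array([0.5, 0.8, np.nan, np.nan])   # NaN: no lower bound
ub = np.array([1.0, 1.2, np.nan, np.nan])   # NaN: no upper bound
bounds = [(-np.inf if np.isnan(l) else l, np.inf if np.isnan(u) else u)
          for l, u in zip(lb, ub)]

res = minimize(objective, [0.6, 1.0, 0.0, 0.0],
               method="L-BFGS-B", bounds=bounds)
# the first two parameters are clamped to their upper bounds (1.0 and 1.2),
# the unbounded ones reach the true minimum at 2.0
```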

Limiting parameter change

Last, it is possible to restrict the rate of change of parameters by assigning the constraints.steps field to a vector. Each non-zero value then specifies the maximum absolute change of the corresponding parameter between two optimizer iterations. NaN values can be used to skip the step constraint on specific parameters.
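The idea behind constraints.steps can be sketched with a hand-rolled, step-clamped descent loop (hypothetical Python illustration, not the iOptim implementation; NaN again means no limit):

```python
import numpy as np

target = np.array([3.0, -1.0])

def objective(p):
    return np.sum((p - target) ** 2)

def gradient(p):
    return 2.0 * (p - target)

steps = np.array([0.1, np.nan])   # max |change| per iteration; NaN = no limit
p = np.zeros(2)
for _ in range(200):
    dp = -0.2 * gradient(p)                    # proposed update
    limit = np.where(np.isnan(steps), np.inf, steps)
    p = p + np.clip(dp, -limit, limit)         # enforce the step restraint
```

The first parameter crawls towards its optimum in clamped 0.1 increments, while the second moves freely; both reach the minimum, the constrained one just takes more iterations.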

In short, the constraints structure can have the members fixed, min, max and steps, which all should have the same length as the model parameter vector:
All these constraints may be used simultaneously.

References

  1. Simplex/simulated annealing fminsimpsa [SIMPSA by Donckels] Cardoso, Salcedo, Feyo de Azevedo and Bardosa, Comp. Chem Engng, 21 (1997) 1349; Kirkpatrick, J. Stat. Phys. 34 (1984) 975.
  2. Nelder-Mead simplex fminsearch(bnd) [fminsearch by Mathworks] Jeffrey C. Lagarias, James A. Reeds, Margaret H. Wright,  Paul E. Wright, "Convergence Properties of the Nelder-Mead Simplex  Method in Low Dimensions", SIAM Journal of Optimization, 9(1):  p.112-147, 1998.
  3. Unconstrained Implicit filtering fminimfil [imfil by Kelley, 1998 version] C. T. Kelley, Iterative Methods for Optimization, no. 18 in  Frontiers in Applied Mathematics, SIAM, Philadelphia, 1999.
  4. Shor's r-algorithm minimization fminralg [solvopt by Kuntsevich and Kappel] Shor N.Z., Minimization Methods for Non-Differentiable Functions, Springer Series in Computational Mathematics, Vol. 3, Springer-Verlag (1985)
  5. Steihaug Newton-CG-Trust region algorithm fmincgtrust [cgtrust from Kelley] Broyden, C. G., J. of the Inst of Math and Its Appl 1970, 6, 76-90; Fletcher, R., Computer Journal 1970, 13, 317-322; Goldfarb, D., Mathematics of Computation 1970, 24, 23-26; Shanno, D. F.,Mathematics of Computation 1970, 24, 647-656; C. T. Kelley, 1998, Iterative Methods for Optimization no. 18 in  Frontiers in Applied Mathematics, SIAM, Philadelphia, 1999.
  6. Random gradient optimizer fmingradrand [ossrs by Belur] Computer Methods in Applied Mechanics & Engg, Vol  19, (1979) 99
  7. Levenberg-Marquardt search fminlm [LMFsolve by Balda] Fletcher, R., (1971) Rpt. AERE-R 6799, Harwell; Fletcher, R., Computer Journal 1970, 13, 317-322
  8. Particle swarm optimization fminpso [PSO by Donckels] Kennedy J., Eberhart R.C. (1995): Particle swarm optimization. In: Proc. IEEE Conf. on Neural Networks, IV, Piscataway, NJ, pp. 1942-1948
  9. Evolution Strategy with Covariance Matrix Adaptation fmincmaes [cmaes by Hansen] Hansen, N. and S. Kern (2004). Evaluating the CMA Evolution Strategy on Multimodal Test Functions. Eighth International Conference on Parallel Problem Solving from Nature PPSN VIII, Proceedings, pp. 282-291, Springer; Hansen, N. and A. Ostermeier (2001). Completely Derandomized Self-Adaptation in Evolution Strategies. Evolutionary Computation, 9(2), pp. 159-195; Hansen, N., S.D. Mueller and P. Koumoutsakos (2003). Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation, 11(1).
  10. Shuffled complex evolution fminsce [SCE by Donckels] Q. Y. Duan et al, J. of Opt. Th. and Appl. 76 (1993) 501.
  11. Particle Swarm Optimization fminswarm and fminswarmhybrid [hPSO by Leontitsis] Kennedy J., Eberhart R.C. (1995): Particle swarm optimization. In: Proc. IEEE Conf. on Neural Networks, IV, Piscataway, NJ, pp. 1942-1948; Shi, Y. and Eberhart, R. C. A modified particle swarm optimizer. Proc.  IEEE Int Conf. on Evol Comput pp. 69-73. IEEE Press, Piscataway, NJ, 1998
  12. Hooke-Jeeves direct search fminhooke [hooke by Kelley] Arthur F. Kaupe Jr., Communications of the ACM, Vol 6. (1963) 313; R. Hooke and T. A. Jeeves, Journal of the ACM, Vol. 8, April 1961, pp. 212 ; C. T. Kelley, 1998, Iterative Methods for Optimization, no. 18 in  Frontiers in Applied Mathematics, SIAM, Philadelphia, 1999.
  13. Powell minimization fminpowell [powell by Secchi] Brent, Algorithms for minimization without derivatives, Prentice-Hall (1973)
  14. Nelder-Mead simplex state machine fminsimplex [Simplex by Sigworth] Nelder and Mead, Computer J., 7 (1965) 308
  15. Genetic algorithm optimizer with parameter real coding, fminga [GA by Ivakpour]
  16. Hansen, N; Kern, S, Evaluating the CMA evolution strategy on multimodal test functions, PARALLEL PROBLEM SOLVING FROM NATURE - PPSN VIII- LECTURE NOTES IN COMPUTER SCIENCE, 3242 (2004) 282-291. [pdf]
  17. P.N. Suganthan, N. Hansen et al, Problem definition and evaluation criteria for the CEC 2005 Special Session on Real-Parameter Optimization (2005) [pdf]
  18. Simulated Annealing fminanneal fminanneal by joachim.vandekerckhove@psy.kuleuven.be  Kirkpatrick, S., Gelatt, C.D., & Vecchi, M.P. (1983). Optimization by Simulated Annealing. Science, 220, 671-680
  19. Broyden-Fletcher-Goldfarb-Shanno fminbfgs by A.R. Secchi Broyden, C. G., J. of the Inst of Math and Its Appl 1970, 6, 76-90 ; Fletcher, R., Computer Journal 1970, 13, 317-322 ; Goldfarb, D., Mathematics of Computation 1970, 24, 23-26 ; Shanno, D. F.,Mathematics of Computation 1970, 24, 647-656
  20. Unscented Kalman filter fminkalman by Yi Cao Kalman, R. E. "A New Approach to Linear Filtering and Prediction Problems", Transactions of the ASME - J. of Basic Engineering Vol. 82: pp. 35 (1960)
  21. Newton search fminnewton by C.T. Kelley Kelley 1998, Iterative Methods for Optimization, no. 18 in  Frontiers in Applied Mathematics, SIAM, Philadelphia, 1999. ; W. Press, Numerical Recipes, Cambridge (1988)
  22. Adaptive random search fminrand by A.R. Secchi 2001 A.R. Secchi and C.A. Perlingeiro, "Busca Aleatoria Adaptativa", in Proc. of XII Congresso Nacional de Matematica Aplicada e Computacional, Sao Jose do Rio Preto, SP, pp. 49-52 (1989).


E. Farhi - iFit/optimization methods - $Date: 2011-09-22 12:37:47 $ $Revision: 1.19 $ - back to Main iFit Page ILL, Grenoble, France <www.ill.eu>