1.1. Definition
A stochastic iterative global optimization algorithm can be seen as a stochastic process that iteratively samples from a search space, producing a sequence of samples. The first sample is drawn from a probability measure inherent to the algorithm, and subsequent samples are generated based on Markov kernels that depend on the previous samples and their associated evaluations by a continuous function. We define the following structure to represent such an algorithm.
structure Algorithm (α β : Type*) [MeasurableSpace α] [MeasurableSpace β] where
  ν : Measure α
  prob_measure : IsProbabilityMeasure ν
  kernel_iter (n : ℕ) : Kernel (prod_iter_image α β n) α
  markov_kernel (n : ℕ) : IsMarkovKernel (kernel_iter n)
The type prod_iter_image is the product type (Iic n → α) × (Iic n → β), which represents the sequence of samples and their evaluations up to iteration n. The measure ν is the initial probability measure from which the first sample is drawn, and kernel_iter is the Markov kernel that defines how to sample the next element based on the previous ones.
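The source does not spell out prod_iter_image itself; the following Lean reconstruction is a plausible sketch, assuming Mathlib and modelling the index set with Finset.Iic (both that choice and the exact shape of the definition are assumptions, not taken from the source):

```lean
import Mathlib

/-- Hypothetical reconstruction of `prod_iter_image`: pairs of finite
sequences of samples and evaluations, indexed by iterations `0, …, n`.
The `MeasurableSpace` instance required by `Algorithm` then follows
from Mathlib's `Pi` and `Prod` instances. -/
def prod_iter_image (α β : Type*) (n : ℕ) : Type _ :=
  (Finset.Iic n → α) × (Finset.Iic n → β)
```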
To assess the validity of this definition, we show that three well-known algorithms from the literature fit into this framework.
1.1.1. Pure Random Search
The Pure Random Search algorithm is a simple stochastic iterative global optimization algorithm that samples uniformly from the search space at each iteration. It can be defined as follows:
- ν := $\mathcal{U}(\alpha)$ is the uniform measure on the search space $\alpha$.
- kernel_iter n := $\_ \mapsto \mathcal{U}(\alpha)$ is the Markov kernel that maps any pair of samples/evaluations to the uniform measure on $\alpha$.
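Pure Random Search can then be written against the Algorithm structure above. The following is a minimal Lean sketch, with a probability measure μ standing in for $\mathcal{U}(\alpha)$ and Mathlib's Kernel.const playing the role of the history-discarding kernel (a MeasurableSpace instance on prod_iter_image is assumed available):

```lean
-- Sketch: Pure Random Search as an `Algorithm`. `μ` stands in for the
-- uniform measure 𝒰(α); `Kernel.const` ignores the past
-- samples/evaluations, and a constant kernel built from a probability
-- measure is a Markov kernel, so both proof fields are found by
-- instance search.
noncomputable def pureRandomSearch {α β : Type*} [MeasurableSpace α]
    [MeasurableSpace β] (μ : Measure α) [IsProbabilityMeasure μ] :
    Algorithm α β where
  ν := μ
  prob_measure := inferInstance
  kernel_iter n := Kernel.const (prod_iter_image α β n) μ
  markov_kernel n := inferInstance
```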
1.1.2. AdaLIPO
The AdaLIPO algorithm was introduced in (Malherbe and Vayatis, 2017, "Global optimization of Lipschitz functions", International Conference on Machine Learning) and is a more sophisticated stochastic iterative global optimization algorithm designed for Lipschitz functions. It uses an estimate of the Lipschitz constant to adaptively sample the search space. The algorithm can be defined as follows:
- ν := $\mathcal{U}(\alpha)$ is the uniform measure on the search space $\alpha$.
- kernel_iter n := $(X, f(X)) \mapsto \mathcal{U}(E(X, f(X)))$ is the Markov kernel that maps any pair of samples/evaluations to the uniform measure on the set $E(X, f(X))$, defined as
$$E(X, f(X)) := \left\{ x : \alpha \;\middle|\; \max_{1 \le i \le n} f(X_i) \le \min_{1 \le i \le n} \bigl( f(X_i) + \kappa(X, f(X)) \cdot d(x, X_i) \bigr) \right\},$$
where $(X, f(X)) \triangleq \left[(X_1, \dots, X_n), (f(X_1), \dots, f(X_n))\right]$ and $\kappa(X, f(X))$ is an estimate of the Lipschitz constant of the function $f$ based on the previous samples and their evaluations.
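For intuition about $\kappa$, note that any Lipschitz constant consistent with the observed data must dominate every observed slope, so a natural lower-bound estimate (Malherbe and Vayatis use a discretized variant of this quantity) is

$$\kappa(X, f(X)) \;\approx\; \max_{i \ne j} \frac{\lvert f(X_i) - f(X_j) \rvert}{d(X_i, X_j)}.$$

With such an estimate, $E(X, f(X))$ retains exactly the points whose Lipschitz upper bound still allows them to exceed the current best observed value, so sampling is concentrated on potential maximizers.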
1.1.3. CMA-ES
The CMA-ES (Covariance Matrix Adaptation Evolution Strategy) is a popular stochastic iterative global optimization algorithm introduced in (Hansen and Ostermeier, 1996, "Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation", Proceedings of the IEEE International Conference on Evolutionary Computation). It samples from a multivariate normal distribution whose mean and covariance matrix are adapted based on the previous samples. The algorithm can be defined as follows:
- ν := $\mathcal{N}(\mu, \Sigma)$ is the multivariate normal measure on the search space $\alpha$ with mean $\mu$ and covariance matrix $\Sigma$.
- kernel_iter n := $(X, f(X)) \mapsto \mathcal{N}(\mu(X, f(X)), \Sigma(X, f(X)))$ is the Markov kernel that maps any pair of samples/evaluations to the multivariate normal distribution whose mean $\mu(X, f(X))$ and covariance matrix $\Sigma(X, f(X))$ are adapted based on the previous samples and their evaluations.
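As a concrete illustration of the adaptation rule (standard CMA-ES recombination, stated here informally and not as part of the formal definition above), the updated mean is a weighted average of the best-ranked previous samples:

$$\mu(X, f(X)) \;=\; \sum_{i=1}^{m} w_i\, X_{(i)}, \qquad w_1 \ge \dots \ge w_m > 0, \quad \sum_{i=1}^{m} w_i = 1,$$

where $X_{(i)}$ denotes the samples sorted by their evaluations and $m$ is the number of selected samples; $\Sigma(X, f(X))$ is updated from the same ranked samples through the rank-one and rank-$\mu$ covariance updates.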