Consistency of global optimization algorithms

1.1. Definition

A stochastic iterative global optimization algorithm can be seen as a stochastic process that iteratively samples from a search space, producing a sequence of samples. The first sample is drawn from a probability measure inherent to the algorithm, and each subsequent sample is generated by a Markov kernel that depends on the previous samples and their evaluations under a continuous objective function. We define the following structure to represent such an algorithm.

structure Algorithm (α β : Type*) [MeasurableSpace α] [MeasurableSpace β] where
  ν : Measure α
  prob_measure : IsProbabilityMeasure ν
  kernel_iter (n : ℕ) : Kernel (prod_iter_image α β n) α
  markov_kernel (n : ℕ) : IsMarkovKernel (kernel_iter n)

The type prod_iter_image α β n is the product type (Iic n → α) × (Iic n → β), which represents the sequence of samples and their evaluations up to iteration n. The measure ν is the initial probability measure from which the first sample is drawn, and kernel_iter n is the Markov kernel that specifies how the next sample is drawn from the previous ones.
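For concreteness, here is a minimal Lean sketch of this type, assuming it is indexed by Set.Iic n and that Mathlib is imported; the names follow the prose above, and the actual development may phrase them differently.

import Mathlib

open MeasureTheory ProbabilityTheory

/-- Histories at iteration `n`: the samples drawn so far paired with their
evaluations. The product measurable-space structure is inferred by Mathlib,
which is what allows `kernel_iter n` to be stated as a kernel out of it. -/
abbrev prod_iter_image (α β : Type*) (n : ℕ) : Type _ :=
  (Set.Iic n → α) × (Set.Iic n → β)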

To assess the validity of this definition, we show that three well-known algorithms from the literature fit into this framework.

1.1.1. Pure Random Search

The Pure Random Search algorithm is a simple stochastic iterative global optimization algorithm that samples uniformly from the search space at each iteration, independently of the history. It can be defined as follows (a Lean sketch is given after the list):

  • ν := \mathcal{U}(\alpha) is the uniform measure on the search space \alpha.

  • kernel_iter n := \_ \mapsto \mathcal{U}(\alpha) is the constant Markov kernel that maps any history of samples and evaluations to the uniform measure on \alpha.
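Assuming a uniform probability measure on \alpha is available (abstracted below as a parameter unif, since its construction depends on \alpha), this instantiation can be written as a minimal Lean sketch, reusing the structure and imports above:

/-- Pure Random Search: every iterate is drawn from the same uniform measure,
so each kernel is constant and the history is ignored entirely. -/
noncomputable def pureRandomSearch (α β : Type*) [MeasurableSpace α]
    [MeasurableSpace β] (unif : Measure α) [IsProbabilityMeasure unif] :
    Algorithm α β where
  ν := unif
  prob_measure := inferInstance
  kernel_iter _ := Kernel.const _ unif
  -- A constant kernel built from a probability measure is a Markov kernel.
  markov_kernel _ := inferInstance

The constant kernel makes the memoryless character of Pure Random Search explicit: the produced samples form an i.i.d. sequence of law unif.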

1.1.2. AdaLIPO

The AdaLIPO algorithm, introduced in (Malherbe and Vayatis, 2017), is a more sophisticated stochastic iterative global optimization algorithm designed for Lipschitz functions. It uses an estimate of the Lipschitz constant to adaptively sample the search space. The algorithm can be defined as follows (the estimate \kappa is recalled after the list):

  • ν := \mathcal{U}(\alpha) is the uniform measure on the search space \alpha.

  • kernel_iter n := (X, f(X)) \mapsto \mathcal{U}(E(X, f(X))) is the Markov kernel that maps any history of samples and evaluations to the uniform measure on the set of potential maximizers E(X, f(X)) := \{x : \alpha \;|\; \max_{1 \le i \le n} f(X_i) \le \min_{1 \le i \le n} \left( f(X_i) + \kappa(X, f(X)) \cdot d(x, X_i) \right)\}, where (X, f(X)) \triangleq \left[(X_1, \dots, X_n), (f(X_1), \dots, f(X_n))\right] and \kappa(X, f(X)) is an estimate of the Lipschitz constant of f based on the previous samples and their evaluations.
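For concreteness, we recall the estimate used by Malherbe and Vayatis (2017), up to notation (the original is stated for the Euclidean distance, while here d denotes the metric of the search space): \kappa is the smallest element of a geometric grid of parameter \gamma > 0 that dominates the largest observed difference quotient,

\kappa(X, f(X)) \;=\; \inf\left\{(1+\gamma)^k \;\middle|\; k \in \mathbb{Z}, \;\; (1+\gamma)^k \ge \max_{i \ne j} \frac{|f(X_i) - f(X_j)|}{d(X_i, X_j)}\right\}.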

1.1.3. CMA-ES

The CMA-ES (Covariance Matrix Adaptation Evolution Strategy) algorithm is a popular stochastic iterative global optimization algorithm introduced in (Hansen and Ostermeier, 1996). It samples from a multivariate normal distribution whose mean and covariance matrix are adapted based on the previous samples. The algorithm can be defined as follows (a schematic Lean sketch is given after the list):

  • ν := \mathcal{N}(\mu, \Sigma) is the multivariate normal measure on the search space \alpha with initial mean \mu and covariance matrix \Sigma.

  • kernel_iter n := (X, f(X)) \mapsto \mathcal{N}(\mu(X, f(X)), \Sigma(X, f(X))) is the Markov kernel that maps any history of samples and evaluations to the multivariate normal distribution whose mean \mu(X, f(X)) and covariance matrix \Sigma(X, f(X)) are adapted based on the previous samples and their evaluations.
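Since a concrete multivariate normal construction is not assumed here, the sketch below abstracts the distribution and the update rules as parameters: gauss m C is a hypothetical stand-in for \mathcal{N}(m, C), and mean, cov, together with the measurability and probability side conditions, are hypothetical inputs. The sketch only records the shape of the instantiation:

/-- CMA-ES schematically: each kernel samples from a normal distribution whose
parameters are recomputed from the history of samples and evaluations. -/
noncomputable def cmaES (α β : Type*) [MeasurableSpace α] [MeasurableSpace β]
    {Mean Cov : Type*}
    (gauss : Mean → Cov → Measure α)                -- hypothetical 𝒩(·, ·)
    (μ₀ : Mean) (Σ₀ : Cov)                          -- initial parameters
    (mean : ∀ n : ℕ, prod_iter_image α β n → Mean)  -- mean update rule
    (cov : ∀ n : ℕ, prod_iter_image α β n → Cov)    -- covariance update rule
    (h₀ : IsProbabilityMeasure (gauss μ₀ Σ₀))
    (hmeas : ∀ n : ℕ,
      Measurable fun h : prod_iter_image α β n => gauss (mean n h) (cov n h))
    (hprob : ∀ (n : ℕ) h, IsProbabilityMeasure (gauss (mean n h) (cov n h))) :
    Algorithm α β where
  ν := gauss μ₀ Σ₀
  prob_measure := h₀
  kernel_iter n := ⟨fun h => gauss (mean n h) (cov n h), hmeas n⟩
  markov_kernel n := ⟨hprob n⟩

Any concrete multivariate normal construction with measurable parameter maps can then be plugged in for gauss, mean, and cov to obtain the algorithm described in the list above.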