Consistency of global optimization algorithms

2. Consistency and sampling

In Malherbe and Vayatis (2017, “Global optimization of Lipschitz functions”, International Conference on Machine Learning), the authors provided definitions for the consistency of a stochastic iterative global optimization algorithm and for what it means for such an algorithm to sample the whole search space.

2.1. Consistency of algorithms

In the paper, the consistency of an algorithm is defined using convergence in probability. A stochastic iterative global optimization algorithm is said to be consistent over f if the maximum of the evaluations of the samples produced by the algorithm converges in probability to the maximum of f over the search space, i.e. \max_{0 \le i \le n} f(X_i) \xrightarrow{p} \max_{x \in \mathcal{X}} f(x), where X_i is the i-th sample produced by the algorithm. Note that we consider here the case where an algorithm seeks to maximize a function, but the definition can be adapted to minimization.

The latter definition can be unfolded as follows: \forall \varepsilon > 0, \; \mathbb{P}(|\max_{0 \le i \le n} f(X_i) - \max_{x \in \mathcal{X}} f(x)| > \varepsilon) \xrightarrow[n \to \infty]{} 0, where \mathbb{P} denotes the probability measure on sequences of samples of size n + 1 produced by the algorithm. In our formalization, we showed that \mathbb{P} is equal to Algorithm.fin_measure.

To formally define consistency using Algorithm.fin_measure, we define the set of sequences of samples of size n + 1 for which the maximum of the evaluations is more than \varepsilon away from the maximum of f over the compact search space (denoted fmax).

def set_dist_max {f : α → β} (hf : Lipschitz f) {n : ℕ} (ε : ℝ) := {u : iter α n | dist (Tuple.max (f ∘ u)) (fmax hf) > ε}
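Here, fmax hf is the maximum \max_{x \in \mathcal{X}} f(x): a Lipschitz function is continuous, so it attains its maximum on the compact search space by the extreme value theorem. The membership condition dist (Tuple.max (f ∘ u)) (fmax hf) > ε is then the Lean counterpart of |\max_{0 \le i \le n} f(X_i) - \max_{x \in \mathcal{X}} f(x)| > \varepsilon.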

We can now define the measure of this set of sequences.

noncomputable def measure_dist_max (A : Algorithm α β) {f : α → β} (hf : Lipschitz f) := fun ε n => A.fin_measure hf.continuous (set_dist_max hf (n := n) ε)
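By construction, measure_dist_max A hf ε n is precisely the probability \mathbb{P}(|\max_{0 \le i \le n} f(X_i) - \max_{x \in \mathcal{X}} f(x)| > \varepsilon) appearing in the unfolded definition above.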

Finally, an algorithm A is consistent over a Lipschitz function f if, for every \varepsilon > 0, measure_dist_max A hf ε tends to 0 as n tends to infinity.

def is_consistent (A : Algorithm α β) {f : α → β} (hf : Lipschitz f) := ∀ ε, 0 < ε → Tendsto (measure_dist_max A hf ε) atTop (𝓝 0)
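Unfolding Tendsto (measure_dist_max A hf ε) atTop (𝓝 0) recovers the textbook limit statement: for every \varepsilon > 0 and every \delta > 0, there exists N such that for all n \ge N, measure_dist_max A hf ε n < \delta, which is exactly the convergence in probability required by the paper's definition.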

2.2. Sampling the whole space

In the original paper, the authors also defined what it means for an algorithm to sample the whole search space given a function. An algorithm is said to sample the whole search space, given f, if the supremum over the search space of the minimum distance between a point and the samples tends to 0 in probability, i.e. \sup_{x \in \mathcal{X}} \min_{0 \le i \le n} \| x - X_i\| \xrightarrow{p} 0, where \| \cdot \| denotes a norm on the search space (usually the Euclidean norm), and X_i is the i-th sample produced by the algorithm given f.

This definition can be unfolded as follows: \forall \varepsilon > 0, \; \mathbb{P}(\sup_{x \in \mathcal{X}} \min_{0 \le i \le n} \| x - X_i\| > \varepsilon) \xrightarrow[n \to \infty]{} 0, where \mathbb{P} is equal to Algorithm.fin_measure.

To formalize this definition, we define the minimum distance between a point in the search space and a sequence of samples.

noncomputable def min_dist_x := fun {n : ℕ} (u : iter α n) (x : α) => Tuple.min ((fun xi => dist xi x) ∘ u)
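For a sequence of samples u = (X_0, \ldots, X_n), this computes the inner minimum of the definition above: min_dist_x u x = \min_{0 \le i \le n} \operatorname{dist}(X_i, x), where dist generalizes the norm \| x - X_i \| to an arbitrary metric space.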

As min_dist_x u is continuous in x (it is the minimum of finitely many continuous functions x ↦ dist X_i x), it attains its maximum over the compact search space. Hence, the \sup in the above definition can be replaced by a \max over the search space.

noncomputable def max_min_dist {n : ℕ} (u : iter α n) := min_dist_x u (compact_argmax (min_dist_x_continuous u))
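Two helpers from the development appear here: min_dist_x_continuous, which states that min_dist_x u is continuous, and compact_argmax, which returns a maximizer of a continuous function over the compact search space. Plausible shapes for their statements, specialized to real-valued functions, are sketched below (the actual statements may be more general, and le_compact_argmax is a hypothetical name for the accompanying specification lemma):

-- continuity of x ↦ min_i dist X_i x, inferred from its use above (sketch)
lemma min_dist_x_continuous {n : ℕ} (u : iter α n) : Continuous (min_dist_x u) := sorry

-- hypothetical: a point attaining the maximum of a continuous function
-- on the compact search space (extreme value theorem)
noncomputable def compact_argmax {g : α → ℝ} (hg : Continuous g) : α := sorry

-- hypothetical specification lemma for compact_argmax
lemma le_compact_argmax {g : α → ℝ} (hg : Continuous g) (x : α) : g x ≤ g (compact_argmax hg) := sorry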

Finally, given a continuous function f, an algorithm A samples the whole search space if the measure of the set of sequences of samples of size n + 1 for which max_min_dist is greater than \varepsilon tends to 0 as n tends to infinity.

noncomputable def sample_whole_space (A : Algorithm α β) {f : α → β} (hf : Continuous f) := ∀ ε > 0, Tendsto (fun n => A.fin_measure hf {u : iter α n | max_min_dist u > ε}) atTop (𝓝 0)
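Up to the replacement of \sup by \max justified above, the set {u : iter α n | max_min_dist u > ε} is exactly the event \sup_{x \in \mathcal{X}} \min_{0 \le i \le n} \| x - X_i\| > \varepsilon from the unfolded definition. Note also that sample_whole_space only requires f to be continuous, whereas is_consistent requires it to be Lipschitz.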

We now have all the definitions needed to formalize Proposition 3 of Malherbe and Vayatis (2017).
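To give an idea of what lies ahead, one direction of that proposition — an algorithm that samples the whole space is consistent, which holds for any fixed Lipschitz f since f is uniformly continuous on the compact search space — could plausibly take the following shape in our formalization (a sketch with a hypothetical theorem name; the statement as actually formalized may differ):

-- hypothetical: sampling the whole space implies consistency (sketch)
theorem consistent_of_sample_whole_space (A : Algorithm α β) {f : α → β}
    (hf : Lipschitz f) (h : sample_whole_space A hf.continuous) :
    is_consistent A hf := sorry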