Formalization of optimization algorithms

2. Some examples🔗

On this page, we present three examples of algorithms encompassed by our framework: Pure Random Search (PRS), LIPO, and RankOpt. The formalization of these algorithms in our framework relies on the definition of the initial probability measure and of the Markov kernels that specify how to sample the next element given the previous ones.

2.1. Pure Random Search🔗

The Pure Random Search (PRS) algorithm is a simple stochastic iterative global optimization algorithm that, at each iteration, draws a point from the input space according to a fixed probability measure μ, independently of all previous samples. It can be represented in our framework as follows:

🔗def
PRS.{u_1, u_2} {α : Type u_1} {β : Type u_2} [MeasurableSpace α] [MeasurableSpace β] (μ : MeasureTheory.Measure α) [MeasureTheory.IsProbabilityMeasure μ] : Algorithm α β

The Pure Random Search (PRS) algorithm for global optimization. This baseline algorithm samples uniformly from the input space at each iteration using a fixed probability measure μ.

noncomputable def PRS : Algorithm α β where
  ν := μ
  kernel_iter _ := Kernel.const _ μ
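As a usage sketch, PRS can be instantiated with the uniform measure on the unit interval. This is a hypothetical example: `unif01` is our own name, while `Measure.restrict` and `Real.volume_Icc` are Mathlib lemmas; the instance proof is only sketched and may need adjustment.

```lean
open MeasureTheory

-- Lebesgue measure restricted to [0, 1]: the uniform measure on the interval.
noncomputable def unif01 : Measure ℝ := volume.restrict (Set.Icc 0 1)

-- [0, 1] has Lebesgue measure 1, so the restriction is a probability measure.
instance : IsProbabilityMeasure unif01 :=
  ⟨by simp [unif01, Measure.restrict_apply_univ, Real.volume_Icc]⟩

-- A PRS instance that samples i.i.d. uniform points on [0, 1].
noncomputable example : Algorithm ℝ ℝ := PRS unif01
```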

2.2. LIPO🔗

The LIPO algorithm was introduced by Cédric Malherbe and Nicolas Vayatis in “Global optimization of Lipschitz functions” (International Conference on Machine Learning, 2017) and is a more sophisticated stochastic iterative global optimization algorithm designed for Lipschitz functions. It uses the Lipschitz constant to adaptively construct an upper bound that guides the sampling of the search space. More formally, at each iteration, LIPO samples uniformly from the set of potential maximizers, defined as \left \{ x \in \alpha \; \middle| \; \max_{1 \le i \le n} f(X_i) \le \min_{1 \le i \le n} \left( f(X_i) + \kappa \, \mathrm{dist}(X_i, x) \right) \right\}, where α is a pseudo-metric, measurable, Borel, and second-countable space and \kappa is the Lipschitz constant of the function f : \alpha \to \mathbb{R}.
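To make the set of potential maximizers concrete, here is a one-dimensional toy instance (the numbers are ours, chosen for illustration):

```latex
\text{Take } \alpha = [0, 1],\ \kappa = 1,\ n = 2, \text{ with }
X_1 = 0,\ f(X_1) = 0.2 \text{ and } X_2 = 1,\ f(X_2) = 0.6. \\
x \text{ is a potential maximizer} \iff
0.6 \le \min\bigl(0.2 + x,\; 0.6 + (1 - x)\bigr). \\
\text{The second term is always} \ge 0.6 \text{ on } [0, 1],
\text{ so the condition reduces to } 0.2 + x \ge 0.6. \\
\text{Hence the set of potential maximizers is } [0.4, 1],
\text{ and LIPO draws its next point uniformly from it.}
```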

It can be represented in our framework as follows:

🔗def
LIPO.{u_1} {α : Type u_1} [PseudoMetricSpace α] [MeasurableSpace α] [BorelSpace α] [SecondCountableTopology α] (μ : MeasureTheory.Measure α) [MeasureTheory.IsProbabilityMeasure μ] (κ : NNReal) (h : ∀ (n : ℕ) (data : prod_iter_image α ℝ n), μ (LIPO.potential_max κ data) ≠ 0) : Algorithm α ℝ

The LIPO (LIPschitz Optimization) algorithm for global optimization. This algorithm optimizes an unknown function assuming only that it has a finite Lipschitz constant κ. It starts with an arbitrary probability measure μ as initial distribution and iteratively samples from the set of potential maximizers, ensuring consistency and convergence to the global optimum (Malherbe et al., 2017).

noncomputable def LIPO : Algorithm α ℝ where
  ν := μ
  kernel_iter _ := potential_max_kernel μ κ
  markov_kernel n := fun data => cond_isProbabilityMeasure (h n data)

2.3. RankOpt🔗

The RankOpt algorithm was introduced by Cédric Malherbe and Nicolas Vayatis in “A Ranking Approach to Global Optimization” (International Conference on Machine Learning, 2017) and is a more complex stochastic iterative global optimization algorithm. It is based on the notion of a ranking rule, a function induced by another function f and defined as r_f (x, y) = \begin{cases} 1 & \text{if } f(x) > f(y) \\ 0 & \text{if } f(x) = f(y) \\ -1 & \text{if } f(x) < f(y) \end{cases}. Ranking rules define equivalence classes of functions that share the same induced ranking. To use RankOpt, one needs to fix a countable set of ranking rules \mathcal{R} (e.g. a subset of the ranking rules induced by continuous functions). The algorithm samples points x such that, for some ranking rule r \in \mathcal{R} consistent with the observed data, 0 \le r(x, \arg \max_{1 \le i \le n} f(X_i)). The set of such points is the algorithm's set of potential maximizers, defined as \left \{x \in \alpha \; \middle| \; \exists r \in \mathcal{R}, \; \mathcal{L}_n(r) = 0 \land 0 \le r(x, \arg \max_{1 \le i \le n} f(X_i))\right\}, where α is a measurable space and \mathcal{L}_n(r) is the ranking loss of r on the observed data, defined as \mathcal{L}_n(r) \triangleq \frac{2}{n (n + 1)} \sum_{1 \le i \le j \le n} \mathbb{I}\left[r(X_i, X_j) \neq r_f(X_i, X_j)\right]. Note that RankOpt does not require knowing r_f explicitly, since it is evaluated only on observed data points.
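A minimal worked instance of the ranking loss may help (the data below is ours, for illustration):

```latex
\text{Let } n = 2 \text{ with } f(X_1) = 0.2 < f(X_2) = 0.6,
\text{ so } r_f(X_1, X_2) = -1 \text{ and } \arg\max_i f(X_i) = X_2. \\
\text{For a rule } r = r_g \text{ induced by some } g,
\text{ the diagonal terms vanish since } r_g(x, x) = 0, \text{ hence } \\
\mathcal{L}_2(r) = \tfrac{2}{2 \cdot 3} \,
\mathbb{I}\left[r(X_1, X_2) \neq -1\right]. \\
\text{Thus } r \text{ is consistent } (\mathcal{L}_2(r) = 0)
\iff g(X_1) < g(X_2), \\
\text{and RankOpt samples from }
\{x \in \alpha \mid \exists r \in \mathcal{R},\;
\mathcal{L}_2(r) = 0 \land 0 \le r(x, X_2)\}.
```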

RankOpt can be represented in our framework as follows:

🔗def
RankOpt.{u_1, u_2} {α : Type u_1} {β : Type u_2} [MeasurableSpace α] (μ : MeasureTheory.Measure α) [MeasureTheory.IsProbabilityMeasure μ] [TopologicalSpace β] [MeasurableSpace β] [BorelSpace β] [LinearOrder β] [SecondCountableTopology β] [OpensMeasurableSpace β] [OrderClosedTopology β] {𝓡 : Set (RankRule α)} (h𝓡 : 𝓡.Countable) (h : ∀ (n : ℕ) (data : prod_iter_image α β n), μ (RankOpt.potential_max data 𝓡) ≠ 0) : Algorithm α β

The RankOpt algorithm for global optimization. This algorithm uses a ranking approach to optimize an unknown function. It maintains a hypothesis class 𝓡 of ranking rules. At each iteration, it samples from the set of points that could be optimal according to ranking rules consistent with the observed data (Malherbe et al., 2017).

noncomputable def RankOpt : Algorithm α β where
  ν := μ
  kernel_iter _ := potential_max_kernel μ h𝓡
  markov_kernel n := fun data => cond_isProbabilityMeasure (h n data)

where RankRule is defined as the subtype of \{-1, 0, 1\}-valued functions that are jointly measurable:

def RankRule (α : Type*) [MeasurableSpace α] :=
  {f : α → α → ({-1, 0, 1} : Set ℤ) // Measurable <| Function.uncurry f}
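For illustration, here is a sketch of a concrete `RankRule ℝ`: the rule induced by the identity function, i.e. r(x, y) = \mathrm{sign}(x - y). The name `compRank` is ours, and the measurability proof is deliberately left as `sorry`; it is routine (the uncurried map is piecewise constant on the measurable regions {x > y}, {x < y}, {x = y}) but omitted here.

```lean
-- Sketch: the ranking rule on ℝ induced by the identity function,
-- packaged as a `RankRule ℝ`.  Membership in {-1, 0, 1} is discharged
-- by `simp`; measurability is only sketched.
noncomputable def compRank : RankRule ℝ :=
  ⟨fun x y =>
    if x > y then ⟨1, by simp⟩
    else if x < y then ⟨-1, by simp⟩
    else ⟨0, by simp⟩,
   sorry⟩
```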