2. Some examples
On this page, we present three examples of algorithms encompassed by our framework: the Pure Random Search (PRS), LIPO, and RankOpt algorithms. Formalizing an algorithm in our framework amounts to providing the initial probability measure and the Markov kernels that define how the next element is sampled from the previous ones.
2.1. Pure Random Search
The Pure Random Search (PRS) algorithm is a simple stochastic iterative global optimization algorithm that samples uniformly from the search space at each iteration. It can be represented in our framework as follows:
noncomputable def PRS : Algorithm α β where
  ν := uniform univ
  prob_measure := uniform_is_prob_measure (by
    -- goal: ℙ univ ≠ 0, which holds since ℙ is a nonzero measure; proof elided here
    sorry)
  kernel_iter _ := Kernel.const _ (uniform univ : Measure α)
  markov_kernel _ := ⟨fun _ => uniform_is_prob_measure (by
    -- same goal as above, one instance per iteration; proof elided here
    sorry)⟩

2.2. LIPO
The LIPO algorithm was introduced in (Malherbe and Vayatis, 2017: Cédric Malherbe and Nicolas Vayatis, “Global optimization of Lipschitz functions”, International Conference on Machine Learning) and is a more sophisticated stochastic iterative global optimization algorithm designed for Lipschitz functions. It uses the Lipschitz constant to adaptively construct an upper bound on the objective that guides the sampling of the search space. More formally, at each iteration, LIPO samples uniformly from the set of potential maximizers, defined as
\left\{ x \in \alpha \;\middle|\; \max_{1 \le i \le n} f(X_i) \le \min_{1 \le i \le n} \left( f(X_i) + \kappa \|X_i - x\|_2 \right) \right\},
where \alpha is a measurable subset of \mathbb{R}^d with finite and non-zero measure and \kappa is the Lipschitz constant of the function f : \alpha \to \mathbb{R}.
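As a concrete illustration, the set above can be phrased directly in Lean. The following is a hypothetical sketch: the name `potentialMaxSketch` and the `Fin`-indexed encoding of the samples are ours, whereas the development's own `potential_max` consumes the observed data through `prod_iter_image`.

```lean
import Mathlib

open scoped NNReal

variable {d : ℕ}

/-- Hypothetical sketch of the set of potential maximizers: `x` is kept when no
observed value `f (X i)` exceeds the Lipschitz upper bound `f (X j) + κ‖X j - x‖`.
The pairwise condition below is equivalent to
`max_i f(X_i) ≤ min_j (f(X_j) + κ‖X_j - x‖₂)`. -/
def potentialMaxSketch {n : ℕ} (X : Fin (n + 1) → EuclideanSpace ℝ (Fin d))
    (f : EuclideanSpace ℝ (Fin d) → ℝ) (κ : ℝ≥0) :
    Set (EuclideanSpace ℝ (Fin d)) :=
  {x | ∀ i j, f (X i) ≤ f (X j) + κ * ‖X j - x‖}
```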
It can be represented in our framework as follows:
variable {d : ℕ} {κ : ℝ≥0} {α : Set (ℝᵈ d)} (mes_α : MeasurableSet α)
  (mα₀ : ℙ α ≠ 0) (mα₁ : ℙ α ≠ ⊤)
  (h : ∀ (n : ℕ) (data : prod_iter_image ↑α ℝ n), ℙ (potential_max data κ) ≠ 0)

noncomputable def LIPO : Algorithm α ℝ where
  ν := uniform univ
  prob_measure := by
    have : IsFiniteMeasure ℙ := i₁ mes_α mα₁
    apply uniform_is_prob_measure
    rwa [Measure.Subtype.volume_univ mes_α.nullMeasurableSet]
  kernel_iter _ := potential_max_kernel mes_α mα₁ κ
  markov_kernel n := by
    constructor
    intro data
    have : IsFiniteMeasure ℙ := i₁ mes_α mα₁
    -- each kernel output is a probability measure because the set of
    -- potential maximizers has nonzero measure by hypothesis h; proof elided here
    sorry

2.3. RankOpt
The RankOpt algorithm was introduced in (Malherbe and Vayatis, 2017: Cédric Malherbe and Nicolas Vayatis, “A Ranking Approach to Global Optimization”, International Conference on Machine Learning) and is a more elaborate stochastic iterative global optimization algorithm. It is based on the notion of a ranking rule, a function induced by another function f and defined as
r_f (x, y) = \begin{cases}
1 & \text{if } f(x) > f(y) \\
0 & \text{if } f(x) = f(y) \\
-1 & \text{if } f(x) < f(y)
\end{cases}.
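As a hypothetical sketch (the name `inducedRank` is ours, not part of the development), the induced ranking rule can be written in Lean as:

```lean
import Mathlib

/-- Hypothetical sketch of the ranking rule `r_f` induced by `f`:
`1` if `f x > f y`, `-1` if `f x < f y`, and `0` on ties. -/
noncomputable def inducedRank {α : Type} (f : α → ℝ) (x y : α) : ℝ :=
  if f x > f y then 1 else if f x < f y then -1 else 0
```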
Ranking rules define equivalence classes of functions that share the same induced ranking. To use RankOpt, one must fix a countable set of ranking rules \mathcal{R} (e.g. a subset of the ranking rules induced by continuous functions). The algorithm samples points x for which some ranking rule r \in \mathcal{R} that is consistent with the observed data satisfies 0 \le r(x, \arg \max_{1 \le i \le n} f(X_i)). The set of such points is the algorithm's set of potential maximizers, defined as
\left \{x \in \alpha \; \middle| \; \exists r \in \mathcal{R}, \; \mathcal{L}_n(r) = 0 \land 0 \le r(x, \arg \max_{1 \le i \le n} f(X_i))\right\},
where \alpha is a measurable subset of \mathbb{R}^d with finite and non-zero measure and \mathcal{L}_n(r) is the ranking loss of r on the observed data, defined as
\mathcal{L}_n(r) \triangleq \frac{2}{n (n + 1)} \sum_{1 \le i \le j \le n} \mathbb{I}\left[r(X_i, X_j) \neq r_f(X_i, X_j)\right].
Note that RankOpt does not require explicit knowledge of r_f, since r_f is only ever evaluated at the observed data points.
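The ranking loss can likewise be sketched in Lean. This is a hypothetical rendering (the names `rankingLoss`, `r`, and `rf` are ours), not the development's definition:

```lean
import Mathlib

open Classical in
/-- Hypothetical sketch of the empirical ranking loss `𝓛_n(r)`: the number of
pairs `i ≤ j` on which `r` disagrees with the reference rule `rf`, normalized
by the number `n(n+1)/2` of such pairs. -/
noncomputable def rankingLoss {α : Type} {n : ℕ} (X : Fin n → α)
    (r rf : α → α → ℝ) : ℝ :=
  (2 / (n * (n + 1))) *
    ∑ i : Fin n, ∑ j : Fin n,
      if i ≤ j ∧ r (X i) (X j) ≠ rf (X i) (X j) then 1 else 0
```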
RankOpt can be represented in our framework as follows:
variable {α : Type} [MeasurableSpace α] {d : ℕ} {α : Set (ℝᵈ d)}
  (mes_α : MeasurableSet α) (mα₀ : ℙ α ≠ 0) (mα₁ : ℙ α ≠ ⊤)
  {𝓡 : Set (RankRule α)} (h𝓡 : 𝓡.Countable)
  (h : ∀ (n : ℕ) (data : prod_iter_image ↑α ℝ n), ℙ (potential_max data 𝓡) ≠ 0)

noncomputable def RankOpt : Algorithm α ℝ where
  ν := uniform Set.univ
  prob_measure := by
    have : IsFiniteMeasure ℙ := i₁ mes_α mα₁
    apply uniform_is_prob_measure
    rwa [Measure.Subtype.volume_univ mes_α.nullMeasurableSet]
  kernel_iter _ := potential_max_kernel mes_α mα₁ h𝓡
  markov_kernel n := by
    constructor
    intro data
    have : IsFiniteMeasure ℙ := i₁ mes_α mα₁
    -- as for LIPO, the remaining probability-measure goal follows from
    -- hypothesis h; proof elided here
    sorry
where RankRule is defined as the subtype of \{-1, 0, 1\}-valued functions that are jointly measurable:
def RankRule (α : Type) [MeasurableSpace α] :=
{f : α → α → ({-1, 0, 1} : Set ℝ) // Measurable <| Function.uncurry f}