The Pure Random Search (PRS) algorithm for global optimization.
This baseline algorithm samples uniformly from the input space at each iteration using a fixed
probability measure ΞΌ.
2.Β Some examples
In this page, we present two examples of algorithms encompassed by our framework: the Pure Random Search (PRS) and the LIPO algorithms. The formalization of these algorithms in our framework relies on the definition of the initial probability measure and the Markov kernels that define how to sample the next element based on the previous ones.
2.1.Β Pure Random Search
The Pure Random Search (PRS) algorithm is a simple stochastic iterative global optimization algorithm that samples uniformly from the input space at each iteration using a fixed
probability measure ΞΌ. It can be represented in our framework as follows:
PRS.{u_1, u_2} {Ξ± : Type u_1} {Ξ² : Type u_2} [MeasurableSpace Ξ±] [MeasurableSpace Ξ²] (ΞΌ : MeasureTheory.Measure Ξ±) [MeasureTheory.IsProbabilityMeasure ΞΌ] : Algorithm Ξ± Ξ²PRS.{u_1, u_2} {Ξ± : Type u_1} {Ξ² : Type u_2} [MeasurableSpace Ξ±] [MeasurableSpace Ξ²] (ΞΌ : MeasureTheory.Measure Ξ±) [MeasureTheory.IsProbabilityMeasure ΞΌ] : Algorithm Ξ± Ξ²
noncomputable def PRS : Algorithm Ξ± Ξ² where
Ξ½ := ΞΌ
kernel_iter _ := Kernel.const _ ΞΌ2.2.Β Decision-based methods
We provide an interface for decision-based optimization algorithms. Such algorithm are described by a sequence of decision rules to decide wether to accept or reject a candidate solution at each iteration, based on the observed data. Decision rules are functions with the signature D_n : \Omega^{n + 1} \times \beta^{n + 1} \to [0, 1], where \Omega is the search space and \beta is the space of function values (most likely \mathbb{R}). More formally, given a function to optimize V : \Omega \to \beta at each iteration n > 0, the algorithm samples a candidate x from a distribution \mu on \Omega and accepts it with probability D_n(C_n, V(C_n), x).
We define Decision as a special case of the Algorithm structure, for which the Markov kernel at iteration n is defined through the decision rules (D_n)_{n \in \mathbb{N}}. Measurability assumptions are required on the decision rules for the kernels to be well-defined and we only support \{0, 1\}-valued decision rules at the moment.
Decision.{u_1, u_2} {Ξ± : Type u_1} {Ξ² : Type u_2} [MeasurableSpace Ξ±] [MeasurableSpace Ξ²] (ΞΌ : MeasureTheory.Measure Ξ±) [MeasureTheory.IsProbabilityMeasure ΞΌ] {decision : (n : β) β prod_iter_image Ξ± Ξ² n β Set Ξ±} (measurableSet_decision_prod : β (n : β), MeasurableSet {p | p.2 β decision n p.1}) (h : β (n : β) (data : prod_iter_image Ξ± Ξ² n), ΞΌ (decision n data) β 0) : Algorithm Ξ± Ξ²Decision.{u_1, u_2} {Ξ± : Type u_1} {Ξ² : Type u_2} [MeasurableSpace Ξ±] [MeasurableSpace Ξ²] (ΞΌ : MeasureTheory.Measure Ξ±) [MeasureTheory.IsProbabilityMeasure ΞΌ] {decision : (n : β) β prod_iter_image Ξ± Ξ² n β Set Ξ±} (measurableSet_decision_prod : β (n : β), MeasurableSet {p | p.2 β decision n p.1}) (h : β (n : β) (data : prod_iter_image Ξ± Ξ² n), ΞΌ (decision n data) β 0) : Algorithm Ξ± Ξ²
The interface for decision-based optimization algorithms.
2.3.Β LIPO
The LIPO algorithm has been introduced in Β (Malherbe and Vayatis,Β 2017)CΓ©dric Malherbe and Nicolas Vayatis,Β 2017. βGlobal optimization of Lipschitz functionsβ. InΒ International conference on machine learning. and is a decision-based global optimization algorithm made for Lipschitz functions. It uses the Lipschitz constant to adaptively construct an upper bound that guides the sampling of the search space. More formally, at each iteration, LIPO samples from the set of potential maximizers defined as
\left \{ x \in \alpha \; \middle| \; \max_{1 \le i \le n} f(X_i) \le \min_{1 \le i \le n} f(X_i) + \kappa \|X_i - x \|_2 \right\},
where Ξ± is a pseudo-metric, measurable, Borel, and second-countable space and \kappa is the Lipschitz constant of the function f : \alpha \to \mathbb{R}.
It can be represented in our framework as a special case of the Decision structure, where the decision rules return 1 if the candidate solution is in the set of potential maximizers and 0 otherwise:
noncomputable def LIPO : Algorithm Ξ± β :=
Decision ΞΌ (fun _ β¦ measurableSet_potential_max_prod ΞΊ) h
Note that LIPO requires the set of potential maximizers to have non-zero measure at each iteration, ensuring that the algorithm can sample from it. This is a non-trivial assumption that depends on the choice of the initial probability measure ΞΌ, the function to optimize, and the \sigma-algebra on the search space.
LIPO.{u_1} {Ξ± : Type u_1} [PseudoMetricSpace Ξ±] [MeasurableSpace Ξ±] [BorelSpace Ξ±] [SecondCountableTopology Ξ±] (ΞΌ : MeasureTheory.Measure Ξ±) [MeasureTheory.IsProbabilityMeasure ΞΌ] (ΞΊ : NNReal) (h : β (n : β) (data : prod_iter_image Ξ± β n), ΞΌ (LIPO.potential_max ΞΊ data) β 0) : Algorithm Ξ± βLIPO.{u_1} {Ξ± : Type u_1} [PseudoMetricSpace Ξ±] [MeasurableSpace Ξ±] [BorelSpace Ξ±] [SecondCountableTopology Ξ±] (ΞΌ : MeasureTheory.Measure Ξ±) [MeasureTheory.IsProbabilityMeasure ΞΌ] (ΞΊ : NNReal) (h : β (n : β) (data : prod_iter_image Ξ± β n), ΞΌ (LIPO.potential_max ΞΊ data) β 0) : Algorithm Ξ± β
The LIPO (LIPschitz Optimization) algorithm for global optimization.
This algorithm optimizes an unknown function assuming only that it has a finite Lipschitz
constant ΞΊ. It starts with an arbitrary probability measure ΞΌ as initial distribution and
iteratively samples from the set of potential maximizers, ensuring consistency and convergence to
the global optimum (Malherbe et al., 2017).
2.4.Β RankOpt
The RankOpt algorithm has been introduced in Β (Malherbe and Vayatis,Β 2017)CΓ©dric Malherbe and Nicolas Vayatis,Β 2017. βA Ranking Approach to Global Optimizationβ. InΒ International conference on machine learning. and is a sophisticated decision-based global optimization algorithm. It is based on the notion of a ranking rule, which is a function induced by another function f and is defined as
r_f (x, y) = \begin{cases}
1 & \text{if } f(x) > f(y) \\
0 & \text{if } f(x) = f(y) \\
-1 & \text{if } f(x) < f(y)
\end{cases}.
Ranking rules define equivalence classes of functions that share the same induced ranking. To use RankOpt, one needs to define a countable set of ranking rules \mathcal{R} (e.g. a subset of the ranking rules induced by continuous functions). The algorithm samples points x such that, for any ranking rule r \in \mathcal{R} that is consistent with the observed data, 0 \le r(x, \arg \max_{1 \le i \le n} f(X_i)). The set of such points is the set of potential maximizers of the algorithm and is defined as
\left \{x \in \alpha \; \middle| \; \exists r \in \mathcal{R}, \; \mathcal{L}_n(r) = 0 \land 0 \le r(x, \arg \max_{1 \le i \le n} f(X_i))\right\},
where Ξ± is a measurable space and \mathcal{L}_n(r) is the ranking loss of r on the observed data, defined as
\mathcal{L}_n(r) \triangleq \frac{2}{n (n + 1)} \sum_{1 \le i \le j \le n} \mathbb{I}\left[r(X_i, X_j) \neq r_f(X_i, X_j)\right].
Note that RankOpt does not require to know r_f explicitly as it is evaluated only on observed data points.
It can be represented in our framework as a special case of the Decision structure, where the decision rules return 1 if the candidate solution is in the set of potential maximizers and 0 otherwise:
noncomputable def RankOpt : Algorithm Ξ± Ξ² :=
Decision ΞΌ (fun _ β¦ measurableSet_potential_max_prod hπ‘) h
Note that RankOpt requires the set of potential maximizers to have non-zero measure at each iteration, ensuring that the algorithm can sample from it. This is a non-trivial assumption that depends on the choice of the initial probability measure ΞΌ, the function to optimize, and the \sigma-algebra on the search space.
RankOpt.{u_1, u_2} {Ξ± : Type u_1} {Ξ² : Type u_2} [MeasurableSpace Ξ±] (ΞΌ : MeasureTheory.Measure Ξ±) [MeasureTheory.IsProbabilityMeasure ΞΌ] [TopologicalSpace Ξ²] [MeasurableSpace Ξ²] [BorelSpace Ξ²] [LinearOrder Ξ²] [SecondCountableTopology Ξ²] [OpensMeasurableSpace Ξ²] [OrderClosedTopology Ξ²] {π‘ : Set (RankRule Ξ±)} (hπ‘ : π‘.Countable) (h : β (n : β) (data : prod_iter_image Ξ± Ξ² n), ΞΌ (RankOpt.potential_max data π‘) β 0) : Algorithm Ξ± Ξ²RankOpt.{u_1, u_2} {Ξ± : Type u_1} {Ξ² : Type u_2} [MeasurableSpace Ξ±] (ΞΌ : MeasureTheory.Measure Ξ±) [MeasureTheory.IsProbabilityMeasure ΞΌ] [TopologicalSpace Ξ²] [MeasurableSpace Ξ²] [BorelSpace Ξ²] [LinearOrder Ξ²] [SecondCountableTopology Ξ²] [OpensMeasurableSpace Ξ²] [OrderClosedTopology Ξ²] {π‘ : Set (RankRule Ξ±)} (hπ‘ : π‘.Countable) (h : β (n : β) (data : prod_iter_image Ξ± Ξ² n), ΞΌ (RankOpt.potential_max data π‘) β 0) : Algorithm Ξ± Ξ²
The RankOpt algorithm for global optimization.
This algorithm uses a ranking approach to optimize an unknown function. It maintains a hypothesis
class π‘ of ranking rules. It starts with an arbitrary probability measure ΞΌ as initial
distribution and samples from the set of points that could be optimal according to ranking rules
consistent with the observed data (Malherbe et al., 2017).
The term RankRule is defined as the subtype of \{-1, 0, 1\}-valued functions that are jointly measurable:
def RankRule (Ξ± : Type*) [MeasurableSpace Ξ±] :=
{f : Ξ± β Ξ± β ({-1, 0, 1} : Set β) // Measurable <| Function.uncurry f}2.5.Β CMA-ES
A general implementation of the CMA-ES algorithm in any dimension. As CMA-ES samples \lambda points at each iteration, the input space of the algorithm is \mathbb{R}^{d \times \lambda}, which represents a sequence of \lambda points in \mathbb{R}^{d}. The initial measure is the product of \lambda standard multivariate Gaussian measures on \mathbb{R}^{d}, and the kernel is defined as a product of \lambda multivariate Gaussian measures, where the mean and covariance matrix are given by measurable functions of the past evaluations. These functions can be anything as long as they are measurable w.r.t. the history of the algorithm, thus allowing for any CMA-ES variant to be implemented in this framework.
CMA_ES (d lam : β) {mean : (n : β) β prod_iter_image (β_ d lam) (β_ d lam) n β EuclideanSpace β (Fin d)} (hmean : β (n : β), Measurable (mean n)) {covar : (n : β) β prod_iter_image (β_ d lam) (β_ d lam) n β Matrix (Fin d) (Fin d) β} (hcovar : β (n : β), Measurable (covar n)) (m : EuclideanSpace β (Fin d)) (S : Matrix (Fin d) (Fin d) β) : Algorithm (β_ d lam) (β_ d lam)CMA_ES (d lam : β) {mean : (n : β) β prod_iter_image (β_ d lam) (β_ d lam) n β EuclideanSpace β (Fin d)} (hmean : β (n : β), Measurable (mean n)) {covar : (n : β) β prod_iter_image (β_ d lam) (β_ d lam) n β Matrix (Fin d) (Fin d) β} (hcovar : β (n : β), Measurable (covar n)) (m : EuclideanSpace β (Fin d)) (S : Matrix (Fin d) (Fin d) β) : Algorithm (β_ d lam) (β_ d lam)
The Covariance Matrix Adaptation - Evolution Strategy (CMA-ES) algorithm for global optimization, given the mean and covariance update rules as measurable functions of the history.
noncomputable def CMA_ES : Algorithm (β_ d lam) (β_ d lam) where
Ξ½ := Measure.pi (fun _ β¦ multivariateGaussian m S)
kernel_iter := CMAKernel d lam hmean hcovar
markov_kernel n := β¨fun a => d:βlam:βmean:(n : β) β prod_iter_image (β_ d lam) (β_ d lam) n β EuclideanSpace β (Fin d)hmean:β (n : β), Measurable (mean n)covar:(n : β) β prod_iter_image (β_ d lam) (β_ d lam) n β Matrix (Fin d) (Fin d) βhcovar:β (n : β), Measurable (covar n)m:EuclideanSpace β (Fin d)S:Matrix (Fin d) (Fin d) βn:βa:prod_iter_image (β_ d lam) (β_ d lam) nβ’ IsProbabilityMeasure ((CMAKernel d lam hmean hcovar n) a) d:βlam:βmean:(n : β) β prod_iter_image (β_ d lam) (β_ d lam) n β EuclideanSpace β (Fin d)hmean:β (n : β), Measurable (mean n)covar:(n : β) β prod_iter_image (β_ d lam) (β_ d lam) n β Matrix (Fin d) (Fin d) βhcovar:β (n : β), Measurable (covar n)m:EuclideanSpace β (Fin d)S:Matrix (Fin d) (Fin d) βn:βa:prod_iter_image (β_ d lam) (β_ d lam) nβ’ IsProbabilityMeasure (Measure.pi fun x => multivariateGaussian (mean n a) (covar n a)); All goals completed! πβ©