nmoo.benchmark
A benchmarking utility. The following is perhaps the one of simplest benchmark one could make
from pymoo.problems.multi import ZDT1
from pymoo.algorithms.moo.nsga2 import NSGA2
from nmoo import Benchmark, GaussianNoise, KNNAvg, WrappedProblem
zdt1 = WrappedProblem(ZDT1())
noisy_zdt1 = GaussianNoise(zdt1, np.zeros(2), 1)
knnavg_zdt1 = KNNAvg(noisy_zdt1, max_distance=1.0)
benchmark = Benchmark(
problems={
"knnavg": {
"problem": knnavg_zdt1
}
},
algorithms={
"nsga2": {
"algorithm": NSGA2()
}
},
n_runs=3,
output_dir_path="./out",
)
which simply runs vanilla NSGA2
against a KNN-Averaging-denoised
Gaussian-noised synthetic
ZDT1 against NSGA21, 3
times. The benchmark can be executed with
benchmark.run()
and the ./out
directory will be populated with various artefact, see below.
Refer to https://github.com/altaris/noisy-moo/blob/main/example.ipynb to get started, or to https://github.com/altaris/noisy-moo/blob/main/example.py for a more complete example.
Artefact specification
After the benchmark above is run, the ./out
directory is populated with the
following artefacts:
benchmark.csv
the main result file. It has one row per (algorithm, problem, run number, generation). The columns are:n_gen
,n_eval
,timedelta
,algorithm
,problem
,n_run
,perf_igd
. Here is a samplen_gen,n_eval,timedelta,algorithm,problem,n_run,perf_igd 1,100,0 days 00:00:00.046010,nsga2,knnavg,1,2.7023936601855274 2,200,0 days 00:00:00.110027,nsga2,knnavg,1,2.9920028540271617 3,300,0 days 00:00:00.170194,nsga2,knnavg,1,2.808592743167947 4,400,0 days 00:00:00.234336,nsga2,knnavg,1,2.7716447570482603 5,500,0 days 00:00:00.300136,nsga2,knnavg,1,2.76605547730596 6,600,0 days 00:00:00.367092,nsga2,knnavg,1,2.016998447316908 7,700,0 days 00:00:00.432571,nsga2,knnavg,1,2.025674566580406 8,800,0 days 00:00:00.501700,nsga2,knnavg,1,1.7875644431157067 9,900,0 days 00:00:00.571355,nsga2,knnavg,1,2.5705921276809542
<problem>.<algorithm>.<run number>.csv
: same asbenchmark.csv
but only for a given (algorithm, problem, run number) triple.<problem>.<algorithm>.<run number>.pi-<perf. indicator>.csv
: performance indicator file. Contains one row per generation. The columns areperf_<perf. indicator name>
,algorithm
,problem
,n_gen
,n_run
. Here is a sample fromknnavg.nsga2.1.pi-igd.csv
:,perf_igd,algorithm,problem,n_gen,n_run 0,2.7023936601855274,nsga2,knnavg,1,1 1,2.9920028540271617,nsga2,knnavg,2,1 2,2.808592743167947,nsga2,knnavg,3,1 3,2.7716447570482603,nsga2,knnavg,4,1 4,2.76605547730596,nsga2,knnavg,5,1 5,2.0169984473169076,nsga2,knnavg,6,1 6,2.025674566580406,nsga2,knnavg,7,1 7,1.7875644431157067,nsga2,knnavg,8,1 8,2.5705921276809542,nsga2,knnavg,9,1 9,2.245542743713137,nsga2,knnavg,10,1
<problem>.<algorithm>.<run number>.<layer number>-<layer name>.npz
: NPZ archive containing the history of all calls to a given layer of a given problem. In the example above, problemknnavg_zda1
has three layers:knn_avg
(layer 1, the outermost one),gaussian_noise
(layer 2), andwrapped_problem
(layer 3, the innermost one). Recall that you can set the name of a layer using thename
argument inWrappedProblem.__init__
. The keys areX
,F
,_batch
,_run
. It may also contain keysG
,dF
,dG
,ddF
,ddG
,CV
,feasible
depending on the ground pymoo problem. The arrays at each keys have the same length (shape[0]
), which is the number of individuals that have been evaluated throughout that run. In our example above,out/knnavg.nsga2.1.1-knn_avg.npz
has keysX
,F
,_batch
,_run
, and the arrays have shape(19600, 30)
,(19600, 2)
,(19600,)
,(19600,)
, respectively. 30 is the number of variables of ZDT1, while 2 is the number of objectives.<problem>.<algorithm>.<run number>.pp.npz
: Pareto population of a given (algorithm, problem, run number) triple. The keys areX
,F
,G
,dF
,dG
,ddF
,ddG
,CV
,feasible
,_batch
, and all arrays have the same length (shape[0]
). Rowi
corresponds to an individual that was Pareto-ideal at generation_batch[i]
.<problem>.<algorithm>.gpp.npz
: Global Pareto population of a given problem-algorithm pair. It is the Pareto population of the population of all individuals designed across all runs and all generations of a given problem-algorithm pair. It is used to compute certain performance indicators in the absence of a baseline Pareto front. The keys areX
,F
,G
,dF
,dG
,ddF
,ddG
,CV
,feasible
,_batch
.
-
Deb, K., Agrawal, S., Pratap, A., Meyarivan, T. (2000). A Fast Elitist Non-dominated Sorting Genetic Algorithm for Multi-objective Optimization: NSGA-II. In: , et al. Parallel Problem Solving from Nature PPSN VI. PPSN 2000. Lecture Notes in Computer Science, vol 1917. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45356-3_83 ↩
1# pylint: disable=too-many-lines 2 3""" 4A benchmarking utility. The following is perhaps the one of simplest benchmark 5one could make 6```py 7from pymoo.problems.multi import ZDT1 8from pymoo.algorithms.moo.nsga2 import NSGA2 9from nmoo import Benchmark, GaussianNoise, KNNAvg, WrappedProblem 10 11zdt1 = WrappedProblem(ZDT1()) 12noisy_zdt1 = GaussianNoise(zdt1, np.zeros(2), 1) 13knnavg_zdt1 = KNNAvg(noisy_zdt1, max_distance=1.0) 14 15benchmark = Benchmark( 16 problems={ 17 "knnavg": { 18 "problem": knnavg_zdt1 19 } 20 }, 21 algorithms={ 22 "nsga2": { 23 "algorithm": NSGA2() 24 } 25 }, 26 n_runs=3, 27 output_dir_path="./out", 28) 29``` 30which simply runs vanilla `NSGA2` against a KNN-Averaging-denoised 31Gaussian-noised synthetic 32[ZDT1](https://pymoo.org/problems/multi/zdt.html#ZDT1) against NSGA2[^nsga2], 3 33times. The benchmark can be executed with 34``` 35benchmark.run() 36``` 37and the `./out` directory will be populated with various artefact, see below. 38 39Refer to https://github.com/altaris/noisy-moo/blob/main/example.ipynb to get 40started, or to https://github.com/altaris/noisy-moo/blob/main/example.py for a 41more complete example. 42 43## Artefact specification 44 45After the benchmark above is run, the `./out` directory is populated with the 46following artefacts: 47 48* `benchmark.csv` the main result file. It has one row per (algorithm, problem, 49 run number, generation). The columns are: `n_gen`, `n_eval`, `timedelta`, 50 `algorithm`, `problem`, `n_run`, `perf_igd`. Here is a sample 51 52 n_gen,n_eval,timedelta,algorithm,problem,n_run,perf_igd 53 1,100,0 days 00:00:00.046010,nsga2,knnavg,1,2.7023936601855274 54 2,200,0 days 00:00:00.110027,nsga2,knnavg,1,2.9920028540271617 55 3,300,0 days 00:00:00.170194,nsga2,knnavg,1,2.808592743167947 56 4,400,0 days 00:00:00.234336,nsga2,knnavg,1,2.7716447570482603 57 5,500,0 days 00:00:00.300136,nsga2,knnavg,1,2.76605547730596 58 6,600,0 days 00:00:00.367092,nsga2,knnavg,1,2.016998447316908 59 7,700,0 days 00:00:00.432571,nsga2,knnavg,1,2.025674566580406 60 8,800,0 days 00:00:00.501700,nsga2,knnavg,1,1.7875644431157067 61 9,900,0 days 00:00:00.571355,nsga2,knnavg,1,2.5705921276809542 62 63* `<problem>.<algorithm>.<run number>.csv`: same as `benchmark.csv` but only 64 for a given (algorithm, problem, run number) triple. 65 66* `<problem>.<algorithm>.<run number>.pi-<perf. indicator>.csv`: performance 67 indicator file. Contains one row per generation. The columns are `perf_<perf. 68 indicator name>`, `algorithm`, `problem`, `n_gen`, `n_run`. Here is a sample 69 from `knnavg.nsga2.1.pi-igd.csv`: 70 71 ,perf_igd,algorithm,problem,n_gen,n_run 72 0,2.7023936601855274,nsga2,knnavg,1,1 73 1,2.9920028540271617,nsga2,knnavg,2,1 74 2,2.808592743167947,nsga2,knnavg,3,1 75 3,2.7716447570482603,nsga2,knnavg,4,1 76 4,2.76605547730596,nsga2,knnavg,5,1 77 5,2.0169984473169076,nsga2,knnavg,6,1 78 6,2.025674566580406,nsga2,knnavg,7,1 79 7,1.7875644431157067,nsga2,knnavg,8,1 80 8,2.5705921276809542,nsga2,knnavg,9,1 81 9,2.245542743713137,nsga2,knnavg,10,1 82 83* `<problem>.<algorithm>.<run number>.<layer number>-<layer name>.npz`: NPZ 84 archive containing the history of all calls to a given layer of a given 85 problem. In the example above, problem `knnavg_zda1` has three layers: 86 `knn_avg` (layer 1, the outermost one), `gaussian_noise` (layer 2), and 87 `wrapped_problem` (layer 3, the innermost one). Recall that you can set the 88 name of a layer using the `name` argument in `WrappedProblem.__init__`. The 89 keys are `X`, `F`, `_batch`, `_run`. It may also contain keys `G`, `dF`, 90 `dG`, `ddF`, `ddG`, `CV`, `feasible` depending on the ground pymoo problem. 91 The arrays at each keys have the same length (`shape[0]`), which is the 92 number of individuals that have been evaluated throughout that run. In our 93 example above, `out/knnavg.nsga2.1.1-knn_avg.npz` has keys `X`, `F`, 94 `_batch`, `_run`, and the arrays have shape `(19600, 30)`, `(19600, 2)`, 95 `(19600,)`, `(19600,)`, respectively. 30 is the number of variables of ZDT1, 96 while 2 is the number of objectives. 97 98* `<problem>.<algorithm>.<run number>.pp.npz`: Pareto population of a given 99 (algorithm, problem, run number) triple. The keys are `X`, `F`, `G`, `dF`, 100 `dG`, `ddF`, `ddG`, `CV`, `feasible`, `_batch`, and all arrays have the same 101 length (`shape[0]`). Row `i` corresponds to an individual that was 102 Pareto-ideal at generation `_batch[i]`. 103 104* `<problem>.<algorithm>.gpp.npz`: *Global Pareto population* of a given 105 problem-algorithm pair. It is the Pareto population of the population of all 106 individuals designed across all runs and all generations of a given 107 problem-algorithm pair. It is used to compute certain performance indicators 108 in the absence of a baseline Pareto front. The keys are `X`, `F`, `G`, `dF`, 109 `dG`, `ddF`, `ddG`, `CV`, `feasible`, `_batch`. 110 111[^nsga2]: Deb, K., Agrawal, S., Pratap, A., Meyarivan, T. (2000). A Fast 112 Elitist Non-dominated Sorting Genetic Algorithm for Multi-objective 113 Optimization: NSGA-II. In: , et al. Parallel Problem Solving from Nature 114 PPSN VI. PPSN 2000. Lecture Notes in Computer Science, vol 1917. Springer, 115 Berlin, Heidelberg. https://doi.org/10.1007/3-540-45356-3_83 116 117""" 118__docformat__ = "google" 119 120import os 121from copy import deepcopy 122from dataclasses import dataclass 123from itertools import product 124from pathlib import Path 125from typing import Any, Callable, Dict, List, Optional, Union 126 127import numpy as np 128import pandas as pd 129from joblib import Parallel, delayed 130from loguru import logger as logging 131from pymoo.config import Config 132from pymoo.factory import get_performance_indicator 133from pymoo.optimize import minimize 134 135from nmoo.callbacks import TimerCallback 136from nmoo.denoisers import ResampleAverage 137from nmoo.indicators import DeltaF, DeltaFPareto 138from nmoo.utils.population import pareto_frontier_mask, population_list_to_dict 139from nmoo.utils.logging import configure_logging 140 141Config.show_compile_hint = False 142 143_PIC = Callable[[Dict[str, np.ndarray]], Optional[float]] 144""" 145Performance Indicator Callable. Type of a function that takes a state (dict of 146`np.ndarray` with keys e.g. `F`, `X`, `pF`, etc.) and returns the value of a 147performance indicator. See `Benchmark._compute_performance_indicator`. 148""" 149 150 151@dataclass 152class PAPair: 153 """ 154 Represents a problem-algorithm pair. 155 """ 156 157 algorithm_description: Dict[str, Any] 158 algorithm_name: str 159 problem_description: Dict[str, Any] 160 problem_name: str 161 162 def __str__(self) -> str: 163 return f"{self.problem_name}|{self.algorithm_name}" 164 165 def global_pareto_population_filename(self) -> str: 166 """Returns `<problem_name>.<algorithm_name>.gpp.npz`.""" 167 return f"{self.problem_name}.{self.algorithm_name}.gpp.npz" 168 169 170@dataclass 171class PARTriple(PAPair): 172 """ 173 Represents a problem-algorithm-(run number) triple. 174 """ 175 176 n_run: int 177 178 def __str__(self) -> str: 179 return f"{self.problem_name}|{self.algorithm_name}|{self.n_run}" 180 181 def denoised_top_layer_history_filename(self) -> str: 182 """ 183 Returns 184 `<problem_name>.<algorithm_name>.<n_run>`.1-<top_layer_name>.denoised.npz`. 185 """ 186 prefix = self.filename_prefix() 187 name = self.problem_description["problem"]._name 188 return f"{prefix}.1-{name}.denoised.npz" 189 190 def denoised_top_layer_pareto_history_filename(self) -> str: 191 """ 192 Returns 193 `<problem_name>.<algorithm_name>.<n_run>`.1-<top_layer_name>.pareto-denoised.npz`. 194 """ 195 prefix = self.filename_prefix() 196 name = self.problem_description["problem"]._name 197 return f"{prefix}.1-{name}.pareto-denoised.npz" 198 199 def filename_prefix(self) -> str: 200 """Returns `<problem_name>.<algorithm_name>.<n_run>`.""" 201 return f"{self.problem_name}.{self.algorithm_name}.{self.n_run}" 202 203 def innermost_layer_history_filename(self) -> str: 204 """Returns the filename of the innermost layer history.""" 205 prefix = self.filename_prefix() 206 problem = self.problem_description["problem"] 207 inner = problem.innermost_wrapper() 208 name, depth = inner._name, problem.depth() 209 return f"{prefix}.{depth}-{name}.npz" 210 211 def pareto_population_filename(self) -> str: 212 """Returns `<problem_name>.<algorithm_name>.<n_run>.pp.npz`.""" 213 return self.filename_prefix() + ".pp.npz" 214 215 def pi_filename(self, pi_name: str) -> str: 216 """ 217 Returns `<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`. 218 """ 219 return self.filename_prefix() + f".pi-{pi_name}.csv" 220 221 def result_filename(self) -> str: 222 """Returns `<problem_name>.<algorithm_name>.<n_run>.csv`.""" 223 return self.filename_prefix() + ".csv" 224 225 def top_layer_history_filename(self) -> str: 226 """Returns the filename of the top layer history.""" 227 prefix = self.filename_prefix() 228 name = self.problem_description["problem"]._name 229 return f"{prefix}.1-{name}.npz" 230 231 232# pylint: disable=too-many-instance-attributes 233class Benchmark: 234 """ 235 A benchmark is constructed with a list of problems and pymoo algorithms 236 descriptions, and run each algorithm against each problem, storing all 237 histories for later analysis. 238 """ 239 240 SUPPORTED_PERFOMANCE_INDICATORS = [ 241 "df", 242 "dfp", 243 "gd", 244 "gd+", 245 "ggd", 246 "ggd+", 247 "ghv", 248 "gigd", 249 "gigd+", 250 "hv", 251 "igd", 252 "igd+", 253 "ps", 254 "rggd", 255 "rggd+", 256 "rghv", 257 "rgigd", 258 "rgigd+", 259 ] 260 261 _algorithms: Dict[str, dict] 262 """ 263 List of algorithms to be benchmarked. 264 """ 265 266 _dump_histories: bool 267 """ 268 Wether the history of each `WrappedProblem` involved in this benchmark 269 should be written to disk. 270 """ 271 272 _max_retry: int 273 """ 274 Maximum number of attempts to run a given problem-algorithm-(run number) 275 triple before giving up. 276 """ 277 278 _n_runs: int 279 """ 280 Number of times to run a given problem-algorithm pair. 281 """ 282 283 _output_dir_path: Path 284 """ 285 Path of the output directory. 286 """ 287 288 _performance_indicators: List[str] 289 """ 290 List of performance indicator to calculate during the benchmark. 291 """ 292 293 _problems: Dict[str, dict] 294 """ 295 List of problems to be benchmarked. 296 """ 297 298 _results: pd.DataFrame 299 """ 300 Results of all runs. 301 """ 302 303 _seeds: List[Optional[int]] 304 """ 305 List of seeds to use. Must be of length `_n_runs`. 306 """ 307 308 def __init__( 309 self, 310 output_dir_path: Union[Path, str], 311 problems: Dict[str, dict], 312 algorithms: Dict[str, dict], 313 n_runs: int = 1, 314 dump_histories: bool = True, 315 performance_indicators: Optional[List[str]] = None, 316 max_retry: int = -1, 317 seeds: Optional[List[Optional[int]]] = None, 318 ): 319 """ 320 Constructor. The set of problems to be benchmarked is represented by a 321 dictionary with the following structure: 322 323 problems = { 324 <problem_name>: <problem_description>, 325 <problem_name>: <problem_description>, 326 } 327 328 where `<problem_name>` is a user-defined string (but stay reasonable 329 since it may be used in filenames), and `<problem_description>` is a 330 dictionary with the following keys: 331 * `df_n_evals` (int, optional): see the explanation of the `df` 332 performance indicator below; defaults to `1`; 333 * `evaluator` (optional): an algorithm evaluator object that will be 334 applied to every algorithm that run on this problem; if an 335 algorithm already has an evaluator attached to it (see 336 `<algorithm_description>` below), the evaluator attached to this 337 problem takes precedence; note that the evaluator is deepcopied for 338 every run of `minimize`; 339 * `hv_ref_point` (optional, `np.ndarray`): a reference point for 340 computing hypervolume performance, see `performance_indicators` 341 argument; 342 * `pareto_front` (optional, `np.ndarray`): a Pareto front subset; 343 * `problem`: a `WrappedProblem` instance. 344 345 The set of algorithms to be used is specified similarly:: 346 347 algorithms = { 348 <algorithm_name>: <algorithm_description>, 349 <algorithm_name>: <algorithm_description>, 350 } 351 352 where `<algorithm_name>` is a user-defined string (but stay reasonable 353 since it may be used in filenames), and `<algorithm_description>` is a 354 dictionary with the following keys: * `algorithm`: a pymoo `Algorithm` 355 object; note that it is deepcopied 356 for every run of `minimize`; 357 * `display` (optional): a custom `pymoo.util.display.Display` object 358 for customization purposes; 359 * `evaluator` (optional): an algorithm evaluator object; note that it 360 is deepcopied for every run of `minimize`; 361 * `return_least_infeasible` (optional, bool): if the algorithm cannot 362 find a feasible solution, wether the least infeasible solution 363 should still be returned; defaults to `False`; 364 * `termination` (optional): a pymoo termination criterion; note that it 365 is deepcopied for every run of `minimize`; 366 * `verbose` (optional, bool): wether outputs should be printed during 367 during the execution of the algorithm; defaults to `False`. 368 369 Args: 370 algorithms (Dict[str, dict]): Dict of all algorithms to be 371 benchmarked. 372 dump_histories (bool): Wether the history of each 373 `WrappedProblem` involved in this benchmark should be written 374 to disk. Defaults to `True`. 375 max_retries (int): Maximum number of attempts to run a given 376 problem-algorithm-(run number) triple before giving up. Set it 377 to `-1` to retry indefinitely. 378 n_runs (int): Number of times to run a given problem-algorithm 379 pair. 380 problems (Dict[str, dict]): Dict of all problems to be benchmarked. 381 performance_indicators (Optional[List[str]]): List of perfomance 382 indicators to be calculated and included in the result 383 dataframe (see `Benchmark.final_results`). Supported indicators 384 are 385 * `df`: ΔF metric, see the documentation of 386 `nmoo.indicators.delta_f.DeltaF`; `df_n_eval` should be set 387 in the problem description, but it default to 1 if not; 388 * `dfp`: ΔF-Pareto metric, see the documentation of 389 `nmoo.indicators.delta_f_pareto.DeltaFPareto`; `df_n_eval` 390 should be set in the problem description, but it default to 1 391 if not; 392 * `gd`: [generational 393 distance](https://pymoo.org/misc/indicators.html#Generational-Distance-(GD)), 394 requires `pareto_front` to be set in the problem description 395 dictionaries, otherwise the value of this indicator will be 396 `NaN`; 397 * `gd+`: [generational distance 398 plus](https://pymoo.org/misc/indicators.html#Generational-Distance-Plus-(GD+)), 399 requires `pareto_front` to be set in the problem description 400 dictionaries, otherwise the value of this indicator will be 401 `NaN`; 402 * `hv`: 403 [hypervolume](https://pymoo.org/misc/indicators.html#Hypervolume), 404 requires `hv_ref_point` to be set in the problem discription 405 dictionaries, otherwise the value of this indicator will be 406 `NaN`; 407 * `igd`: [inverted generational 408 distance](https://pymoo.org/misc/indicators.html#Inverted-Generational-Distance-(IGD)), 409 requires `pareto_front` to be set in the problem description 410 dictionaries, otherwise the value of this indicator will be 411 `NaN`; 412 * `igd+`: [inverted generational distance 413 plus](https://pymoo.org/misc/indicators.html#Inverted-Generational-Distance-Plus-(IGD+)), 414 requires `pareto_front` to be set in the problem description 415 dictionaries, otherwise the value of this indicator will be 416 `NaN`; 417 * `ps`: population size, or equivalently, the size of the 418 current Pareto front; 419 * `ggd`: ground generational distance, where the ground 420 problem's predicted objective values are used instead of the 421 outer problem's; requires `pareto_front` to be set in the 422 problem description dictionaries, otherwise the value of this 423 indicator will be `NaN`; 424 * `ggd+`: ground generational distance plus; requires 425 `pareto_front` to be set in the problem description 426 dictionaries, otherwise the value of this indicator will be 427 `NaN`; 428 * `ghv`: ground hypervolume; requires `hv_ref_point` to be set 429 in the problem discription dictionaries, otherwise the value 430 of this indicator will be `NaN`; 431 * `gigd`: ground inverted generational distance; requires 432 `pareto_front` to be set in the problem description 433 dictionaries, otherwise the value of this indicator will be 434 `NaN`; 435 * `gigd+`: ground inverted generational distance plus; requires 436 `pareto_front` to be set in the problem description 437 dictionaries, otherwise the value of this indicator will be 438 `NaN`. 439 * `rggd`: resampled ground generational distance, where the 440 ground problem's predicted objective values (resampled and 441 averaged a given number of times) are used instead of the 442 outer problem's; requires `pareto_front` to be set in the 443 problem description dictionaries, otherwise the value of this 444 indicator will be `NaN`; `rg_n_eval` should also be set in 445 the problem description, but defaults to 1 if not; 446 * `rggd+`: resampled ground generational distance plus; 447 requires `pareto_front` to be set in the problem description 448 dictionaries, otherwise the value of this indicator will be 449 `NaN`; `rg_n_eval` should also be set in the problem 450 description, but defaults to 1 if not; `rg_n_eval` should 451 also be set in the problem description, but defaults to 1 if 452 not; 453 * `rghv`: resampled ground hypervolume; requires `hv_ref_point` 454 to be set in the problem discription dictionaries, otherwise 455 the value of this indicator will be `NaN`; `rg_n_eval` should 456 also be set in the problem description, but defaults to 1 if 457 not; 458 * `rgigd`: resampled ground inverted generational distance; 459 requires `pareto_front` to be set in the problem description 460 dictionaries, otherwise the value of this indicator will be 461 `NaN`; `rg_n_eval` should also be set in the problem 462 description, but defaults to 1 if not; 463 * `rgigd+`: resampled ground inverted generational distance 464 plus; requires `pareto_front` to be set in the problem 465 description dictionaries, otherwise the value of this 466 indicator will be `NaN`; `rg_n_eval` should also be set in 467 the problem description, but defaults to 1 if not. 468 469 In the result dataframe, the corresponding columns will be 470 named `perf_<name of indicator>`, e.g. `perf_igd`. If left 471 unspecified, defaults to `["igd"]`. 472 473 seeds (Optional[List[Optional[int]]]): List of seeds to use. The 474 first seed will be used for the first run of every 475 algorithm-problem pair, etc. 476 """ 477 self._output_dir_path = Path(output_dir_path) 478 self._set_problems(problems) 479 self._set_algorithms(algorithms) 480 if n_runs <= 0: 481 raise ValueError( 482 "The number of run (for each problem-algorithm pair) must be " 483 "at least 1." 484 ) 485 self._n_runs = n_runs 486 self._dump_histories = dump_histories 487 self._set_performance_indicators(performance_indicators) 488 self._max_retry = max_retry 489 if seeds is None: 490 self._seeds = [None] * n_runs 491 elif len(seeds) < n_runs: 492 raise ValueError( 493 f"Not enough seeds: provided {len(seeds)} seeds but specified " 494 f"{n_runs} runs." 495 ) 496 else: 497 if len(seeds) > n_runs: 498 logging.warning( 499 "Too many seeds: provided {} but only need {} " 500 "(i.e. n_run)", 501 len(seeds), 502 n_runs, 503 ) 504 self._seeds = seeds 505 506 def _compute_global_pareto_population(self, pair: PAPair) -> None: 507 """ 508 Computes the global Pareto population of a given problem-algorithm 509 pair. See `compute_global_pareto_populations`. Assumes that the global 510 Pareto population has not already been calculated, i.e. that 511 `<output_dir_path>/<problem>.<algorithm>.gpp.npz` does not exist. 512 """ 513 configure_logging(prefix=f"[{pair}]") 514 logging.debug("Computing global Pareto population") 515 gpp_path = ( 516 self._output_dir_path / pair.global_pareto_population_filename() 517 ) 518 populations: Dict[str, List[np.ndarray]] = {} 519 for n_run in range(1, self._n_runs + 1): 520 triple = PARTriple( 521 algorithm_description=pair.algorithm_description, 522 algorithm_name=pair.algorithm_name, 523 n_run=n_run, 524 problem_description=pair.problem_description, 525 problem_name=pair.problem_name, 526 ) 527 path = self._output_dir_path / triple.pareto_population_filename() 528 if not path.exists(): 529 logging.debug( 530 "File {} does not exist. The corresponding triple runs " 531 "most likely have not finished or all failed", 532 path, 533 triple, 534 ) 535 continue 536 data = np.load(path, allow_pickle=True) 537 for k, v in data.items(): 538 populations[k] = populations.get(k, []) + [v] 539 data.close() 540 541 consolidated = {k: np.concatenate(v) for k, v in populations.items()} 542 if "F" not in consolidated: 543 logging.error( 544 "No Pareto population file found. The corresponding triple " 545 "runs most likely have not finished or all failed", 546 ) 547 return 548 mask = pareto_frontier_mask(consolidated["F"]) 549 np.savez_compressed( 550 gpp_path, 551 **{k: v[mask] for k, v in consolidated.items()}, 552 ) 553 554 # pylint: disable=too-many-branches 555 # pylint: disable=too-many-locals 556 def _compute_performance_indicator( 557 self, triple: PARTriple, pi_name: str 558 ) -> None: 559 """ 560 Computes a performance indicators for a given problem-algorithm-(run 561 number) triple and stores it under 562 `<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`. Assumes 563 that the this performance indicator has not already been calculated, 564 i.e. that that file does not exist. 565 566 Warning: 567 This fails if either the top layer history or the pareto population 568 artefact (`<problem_name>.<algorithm_name>.<n_run>.pp.npz`) could 569 not be loaded as numpy arrays. 570 571 Todo: 572 Refactor (again) 573 """ 574 configure_logging(prefix=f"[{triple}|{pi_name}]") 575 logging.debug("Computing PI") 576 577 pic: _PIC = lambda _: np.nan 578 if pi_name in ["df", "dfp"]: 579 problem = triple.problem_description["problem"] 580 n_evals = triple.problem_description.get("df_n_evals", 1) 581 Class, fn = { 582 "df": (DeltaF, triple.denoised_top_layer_history_filename()), 583 "dfp": ( 584 DeltaFPareto, 585 triple.denoised_top_layer_pareto_history_filename(), 586 ), 587 }[pi_name] 588 delta_f = Class( 589 problem, 590 n_evals, 591 self._output_dir_path / fn, 592 ) 593 pic = lambda s: delta_f.do(s["F"], s["X"]) 594 elif pi_name in ["gd", "gd+", "igd", "igd+"]: 595 pic = self._get_pic_gd_type(triple, pi_name) 596 elif pi_name in ["ggd", "ggd+", "gigd", "gigd+"]: 597 pic = self._get_pic_gd_type(triple, pi_name[1:]) 598 elif pi_name in ["rggd", "rggd+", "rgigd", "rgigd+"]: 599 pic = self._get_pic_gd_type(triple, pi_name[2:]) 600 elif ( 601 pi_name in ["hv", "ghv", "rghv"] 602 and "hv_ref_point" in triple.problem_description 603 ): 604 ref_point = triple.problem_description["hv_ref_point"] 605 pi = get_performance_indicator("hv", ref_point=ref_point) 606 pic = lambda s: pi.do(s["F"]) 607 elif pi_name == "ps": 608 pic = lambda s: s["X"].shape[0] 609 else: 610 logging.warning( 611 "Unprocessable performance indicator {}. This could be " 612 "because some required arguments (e.g. 'hv_ref_point') are " 613 "missing", 614 pi_name, 615 ) 616 617 # On which history is the PIC going to be called? By default, it is on 618 # the top layer history. 619 if pi_name in ["ps"]: 620 history = np.load( 621 self._output_dir_path / triple.pareto_population_filename() 622 ) 623 elif pi_name in ["ggd", "ggd+", "ghv", "gigd", "gigd+"]: 624 history = np.load( 625 self._output_dir_path 626 / triple.innermost_layer_history_filename() 627 ) 628 elif pi_name in ["rggd", "rggd+", "rghv", "rgigd", "rgigd+"]: 629 history = self._get_rg_history(triple) 630 else: 631 history = np.load( 632 self._output_dir_path / triple.top_layer_history_filename() 633 ) 634 635 states: List[Dict[str, np.ndarray]] = [] 636 for i in range(1, history["_batch"].max() + 1): 637 idx = history["_batch"] == i 638 states.append({"X": history["X"][idx], "F": history["F"][idx]}) 639 df = pd.DataFrame() 640 df["perf_" + pi_name] = list(map(pic, states)) 641 df["algorithm"] = triple.algorithm_name 642 df["problem"] = triple.problem_name 643 df["n_gen"] = range(1, len(states) + 1) 644 df["n_run"] = triple.n_run 645 logging.debug( 646 "Writing result to {}", 647 self._output_dir_path / triple.pi_filename(pi_name), 648 ) 649 df.to_csv(self._output_dir_path / triple.pi_filename(pi_name)) 650 651 def _get_pic_gd_type(self, triple: PARTriple, pi_name: str) -> _PIC: 652 """ 653 Returns the `_PIC` corresponding to the either the `gd`, `gd+`, `igd`, 654 or `igd+` performance indicator. As a reminder, a `_PIC`, or 655 Performance Indicator Callable, is a function that takes a dict of 656 `np.ndarray` and returns an optional `float`. In this case, the dict 657 must have the key `F`. 658 """ 659 if "pareto_front" in triple.problem_description: 660 pf = triple.problem_description.get("pareto_front") 661 else: 662 path = ( 663 self._output_dir_path 664 / triple.global_pareto_population_filename() 665 ) 666 data = np.load(path) 667 pf = data["F"] 668 data.close() 669 pi = get_performance_indicator(pi_name, pf) 670 return lambda s: pi.do(s["F"]) 671 672 def _get_rg_history(self, triple: PARTriple) -> Dict[str, np.ndarray]: 673 """ 674 Returns the `X` and `F` history of the ground problem of the triple, 675 but where `F` has been resampled a given number of times (`rg_n_evals` 676 parameter in the problem's description). This involves wrapping the 677 ground problem in a `nmoo.denoisers.ResampleAverage` and evaluating the 678 history's `X` array. 679 """ 680 history = dict( 681 np.load( 682 self._output_dir_path / triple.top_layer_history_filename() 683 ) 684 ) 685 rgp = ResampleAverage( 686 triple.problem_description["problem"].ground_problem(), 687 triple.problem_description.get("rg_n_eval", 1), 688 ) 689 history["F"] = rgp.evaluate(history["X"], return_values_of="F") 690 return history 691 692 def _par_triple_done(self, triple: PARTriple) -> bool: 693 """ 694 Wether a problem-algorithm-(run number) has been successfully executed. 695 This is determined by checking if 696 `_output_dir_path/triple.result_filename()` exists or not. 697 """ 698 return (self._output_dir_path / triple.result_filename()).is_file() 699 700 def _run_par_triple( 701 self, 702 triple: PARTriple, 703 ) -> None: 704 """ 705 Runs a given algorithm against a given problem. See 706 `nmoo.benchmark.Benchmark.run`. Immediately dumps the history of the 707 problem and all wrapped problems with the following naming scheme: 708 709 output_dir_path/<problem_name>.<algorithm_name>.<n_run>.<level>.npz 710 711 where `level` is the depth of the wrapped problem, starting at `1`. See 712 `nmoo.wrapped_problem.WrappedProblem.dump_all_histories`. It also dumps 713 the compounded Pareto population for every at every generation (or just 714 the last generation of `set_history` is set to `False` in the algorithm 715 description) in 716 717 output_dir_path/<problem_name>.<algorithm_name>.<n_run>.pp.npz 718 719 Additionally, it generates a CSV file containing various statistics 720 named: 721 722 output_dir_path/<problem_name>.<algorithm_name>.<n_run>.csv 723 724 The existence of this file is also used to determine if the 725 problem-algorithm-(run number) triple has already been run when 726 resuming a benchmark. 727 728 Args: 729 triple: A `PARTriple` object representing the 730 problem-algorithm-(run number) triple to run. 731 """ 732 configure_logging(prefix=f"[{str(triple)}]") 733 logging.info("Starting run") 734 735 triple.problem_description["problem"].start_new_run() 736 evaluator = triple.problem_description.get( 737 "evaluator", 738 triple.algorithm_description.get("evaluator"), 739 ) 740 try: 741 seed = self._seeds[triple.n_run - 1] 742 problem = deepcopy(triple.problem_description["problem"]) 743 problem.reseed(seed) 744 results = minimize( 745 problem, 746 triple.algorithm_description["algorithm"], 747 termination=triple.algorithm_description.get("termination"), 748 copy_algorithm=True, 749 copy_termination=True, 750 # extra Algorithm.setup kwargs 751 callback=TimerCallback(), 752 display=triple.algorithm_description.get("display"), 753 evaluator=deepcopy(evaluator), 754 return_least_infeasible=triple.algorithm_description.get( 755 "return_least_infeasible", False 756 ), 757 save_history=True, 758 seed=seed, 759 verbose=triple.algorithm_description.get("verbose", False), 760 ) 761 except Exception as e: # pylint: disable=broad-except 762 logging.error("Run failed: {}", e) 763 return 764 else: 765 logging.success("Run successful") 766 767 # Dump all layers histories 768 logging.debug("Writing layer histories to {}", self._output_dir_path) 769 if self._dump_histories: 770 results.problem.dump_all_histories( 771 self._output_dir_path, 772 triple.filename_prefix(), 773 ) 774 775 # Dump pareto sets 776 logging.debug( 777 "Writing pareto sets to {}", 778 self._output_dir_path / triple.pareto_population_filename(), 779 ) 780 np.savez_compressed( 781 self._output_dir_path / triple.pareto_population_filename(), 782 **population_list_to_dict([h.opt for h in results.history]), 783 ) 784 785 # Create and dump CSV file 786 logging.debug( 787 "Writing result CSV to {}", 788 self._output_dir_path / triple.result_filename(), 789 ) 790 df = pd.DataFrame() 791 df["n_gen"] = [a.n_gen for a in results.history] 792 df["n_eval"] = [a.evaluator.n_eval for a in results.history] 793 df["timedelta"] = results.algorithm.callback._deltas 794 # Important to create these columns once the dataframe has its full 795 # length 796 df["algorithm"] = triple.algorithm_name 797 df["problem"] = triple.problem_name 798 df["n_run"] = triple.n_run 799 df.to_csv( 800 self._output_dir_path / triple.result_filename(), 801 index=False, 802 ) 803 804 def _set_algorithms(self, algorithms: Dict[str, dict]) -> None: 805 """Validates and sets the algorithms dict""" 806 if not algorithms: 807 raise ValueError("A benchmark requires at least 1 algorithm.") 808 for k, v in algorithms.items(): 809 if not isinstance(v, dict): 810 raise ValueError( 811 f"Description for algorithm '{k}' must be a dict." 812 ) 813 if "algorithm" not in v: 814 raise ValueError( 815 f"Description for algorithm '{k}' is missing mandatory " 816 "key 'algorithm'." 817 ) 818 self._algorithms = algorithms 819 820 def _set_performance_indicators( 821 self, performance_indicators: Optional[List[str]] 822 ) -> None: 823 """Validates and sets the performance indicator list""" 824 if performance_indicators is None: 825 self._performance_indicators = ["igd"] 826 else: 827 self._performance_indicators = [] 828 for pi in set(performance_indicators): 829 if pi not in Benchmark.SUPPORTED_PERFOMANCE_INDICATORS: 830 raise ValueError(f"Unknown performance indicator '{pi}'") 831 self._performance_indicators.append(pi) 832 self._performance_indicators = sorted(self._performance_indicators) 833 834 def _set_problems(self, problems: Dict[str, dict]) -> None: 835 """Validates and sets the problem dict""" 836 if not problems: 837 raise ValueError("A benchmark requires at least 1 problem.") 838 for k, v in problems.items(): 839 if not isinstance(v, dict): 840 raise ValueError( 841 f"Description for problem '{k}' must be a dict." 842 ) 843 if "problem" not in v: 844 raise ValueError( 845 f"Description for problem '{k}' is missing mandatory key " 846 "'problem'." 847 ) 848 self._problems = problems 849 850 def all_pa_pairs(self) -> List[PAPair]: 851 """Generate the list of all problem-algorithm pairs.""" 852 everything = product( 853 self._algorithms.items(), 854 self._problems.items(), 855 ) 856 return [ 857 PAPair( 858 algorithm_description=aa, 859 algorithm_name=an, 860 problem_description=pp, 861 problem_name=pn, 862 ) 863 for (an, aa), (pn, pp) in everything 864 ] 865 866 def all_par_triples(self) -> List[PARTriple]: 867 """Generate the list of all problem-algorithm-(run number) triples.""" 868 everything = product( 869 self._algorithms.items(), 870 self._problems.items(), 871 range(1, self._n_runs + 1), 872 ) 873 return [ 874 PARTriple( 875 algorithm_description=aa, 876 algorithm_name=an, 877 n_run=r, 878 problem_description=pp, 879 problem_name=pn, 880 ) 881 for (an, aa), (pn, pp), r in everything 882 ] 883 884 def compute_global_pareto_populations( 885 self, n_jobs: int = -1, **joblib_kwargs 886 ) -> None: 887 """ 888 The global Pareto population of a problem-algorithm pair is the merged 889 population of all pareto populations across all runs of that pair. This 890 function calculates global Pareto population of all pairs and dumps it 891 to `<output_dir_path>/<problem>.<algorithm>.gpp.npz`. If that file 892 exists for a given problem-algorithm pair, then the global Pareto 893 population (of that pair) is not recalculated. 894 """ 895 logging.info("Computing global Pareto populations") 896 executor = Parallel(n_jobs=n_jobs, **joblib_kwargs) 897 executor( 898 delayed(Benchmark._compute_global_pareto_population)(self, p) 899 for p in self.all_pa_pairs() 900 if not ( 901 self._output_dir_path / p.global_pareto_population_filename() 902 ).is_file() 903 ) 904 905 def compute_performance_indicators( 906 self, n_jobs: int = -1, **joblib_kwargs 907 ) -> None: 908 """ 909 Computes all performance indicators and saves the corresponding 910 dataframes in 911 `output_path/<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`. 912 If that file exists for a given problem-algorithm-(run 913 number)-(performance indicator) tuple, then it is not recalculated. 914 """ 915 logging.info( 916 "Computing performance indicators: {}", 917 ", ".join(self._performance_indicators), 918 ) 919 everything = product( 920 self.all_par_triples(), self._performance_indicators 921 ) 922 executor = Parallel(n_jobs=n_jobs, **joblib_kwargs) 923 executor( 924 delayed(Benchmark._compute_performance_indicator)(self, t, pi) 925 for t, pi in everything 926 if not (self._output_dir_path / t.pi_filename(pi)).is_file() 927 ) 928 929 def consolidate(self) -> None: 930 """ 931 Merges all statistics dataframes 932 (`<problem_name>.<algorithm_name>.<n_run>.csv`) and all PI dataframes 933 (`<problem_name>.<algorithm_name>.<n_run>.pi.csv`) into a single 934 dataframe, and saves it under `output_dir_path/benchmark.csv`. 935 """ 936 logging.info("Consolidating statistics") 937 all_df = [] 938 for triple in self.all_par_triples(): 939 path = self._output_dir_path / triple.result_filename() 940 if not path.exists(): 941 logging.debug( 942 "Statistic file {} does not exist. The corresponding " 943 "triple [{}] most likely hasn't finished or failed", 944 path, 945 triple, 946 ) 947 continue 948 all_df.append(pd.read_csv(path)) 949 self._results = pd.concat(all_df, ignore_index=True) 950 self._results["timedelta"] = pd.to_timedelta( 951 self._results["timedelta"] 952 ) 953 self._results = self._results.astype( 954 { 955 "algorithm": "category", 956 "n_gen": "uint32", 957 "n_run": "uint32", 958 "problem": "category", 959 } 960 ) 961 962 logging.info("Consolidating performance indicators") 963 all_df = [] 964 for triple in self.all_par_triples(): 965 df = pd.DataFrame() 966 for pi_name in self._performance_indicators: 967 path = self._output_dir_path / triple.pi_filename(pi_name) 968 if not path.exists(): 969 logging.debug("PI file {} does not exist.", path) 970 continue 971 tmp = pd.read_csv(path) 972 if df.empty: 973 df = tmp 974 else: 975 col = "perf_" + pi_name 976 df[col] = tmp[col] 977 all_df.append(df) 978 979 self._results = self._results.merge( 980 pd.concat(all_df, ignore_index=True), 981 how="outer", 982 on=["algorithm", "problem", "n_gen", "n_run"], 983 ) 984 985 # ??? 986 if "Unnamed: 0" in self._results: 987 del self._results["Unnamed: 0"] 988 989 path = self._output_dir_path / "benchmark.csv" 990 logging.info("Writing results to {}", path) 991 self.dump_results(path, index=False) 992 993 def dump_results(self, path: Union[Path, str], fmt: str = "csv", **kwargs): 994 """ 995 Dumps the internal `_result` dataframe. 996 997 Args: 998 path (Union[Path, str]): Path to the output file. 999 fmt (str): Text or binary format supported by pandas, see 1000 `here <https://pandas.pydata.org/docs/user_guide/io.html>`_. 1001 CSV by default. 1002 kwargs: Will be passed on the `pandas.DataFrame.to_<fmt>` method. 1003 """ 1004 saver = { 1005 "csv": pd.DataFrame.to_csv, 1006 "excel": pd.DataFrame.to_excel, 1007 "feather": pd.DataFrame.to_feather, 1008 "gbq": pd.DataFrame.to_gbq, 1009 "hdf": pd.DataFrame.to_hdf, 1010 "html": pd.DataFrame.to_html, 1011 "json": pd.DataFrame.to_json, 1012 "parquet": pd.DataFrame.to_parquet, 1013 "pickle": pd.DataFrame.to_pickle, 1014 }[fmt] 1015 saver(self._results, path, **kwargs) 1016 1017 def final_results( 1018 self, 1019 timedeltas_to_microseconds: bool = True, 1020 reset_index: bool = True, 1021 ) -> pd.DataFrame: 1022 """ 1023 Returns a dataframe containing the final row of each 1024 algorithm/problem/n_run triple, i.e. the final record of each run of 1025 the benchmark. 1026 1027 If the `reset_index` argument set to `False`, the resulting dataframe 1028 will have a multiindex given by the (algorithm, problem, n_run) tuples, 1029 e.g. 1030 1031 n_gen timedelta perf_gd ... 1032 algorithm problem n_run ... 1033 nsga2 bnh 1 155 886181 0.477980 ... 1034 2 200 29909 0.480764 ... 1035 zdt1 1 400 752818 0.191490 ... 1036 2 305 979112 0.260930 ... 1037 1038 (note tha the `timedelta` column has been converted to microseconds, 1039 see the `timedeltas_to_microseconds` argument below). If `reset_index` 1040 is set to `True` (the default), then the index is reset, giving 1041 something like this: 1042 1043 algorithm problem n_run n_gen timedelta perf_gd ... 1044 0 nsga2 bnh 1 155 886181 0.477980 ... 1045 1 nsga2 bnh 2 200 29909 0.480764 ... 1046 2 nsga2 zdt1 1 400 752818 0.191490 ... 1047 3 nsga2 zdt1 2 305 979112 0.260930 ... 1048 1049 This form is easier to plot. 1050 1051 Args: 1052 reset_index (bool): Wether to reset the index. Defaults to 1053 `True`. 1054 timedeltas_to_microseconds (bool): Wether to convert the 1055 timedeltas column to microseconds. Defaults to `True`. 1056 1057 """ 1058 df = self._results.groupby(["algorithm", "problem", "n_run"]).last() 1059 if timedeltas_to_microseconds: 1060 df["timedelta"] = df["timedelta"].dt.microseconds 1061 return df.reset_index() if reset_index else df 1062 1063 def run( 1064 self, 1065 n_jobs: int = -1, 1066 n_post_processing_jobs: int = 2, 1067 **joblib_kwargs, 1068 ): 1069 """ 1070 Runs the benchmark sequentially. Makes your laptop go brr. The 1071 histories of all problems are progressively dumped in the specified 1072 output directory as the benchmark run. At the end, the benchmark 1073 results are dumped in `output_dir_path/benchmark.csv`. 1074 1075 Args: 1076 n_jobs (int): Number of processes to use. See the `joblib.Parallel`_ 1077 documentation. Defaults to `-1`, i.e. all CPUs are used. 1078 n_post_processing_jobs (int): Number of processes to use for post 1079 processing tasks (computing global Pareto populations and 1080 performance indicators). These are memory-intensive tasks. 1081 Defaults to `2`. 1082 joblib_kwargs (dict): Additional kwargs to pass on to the 1083 `joblib.Parallel`_ instance. 1084 1085 .. _joblib.Parallel: 1086 https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html 1087 """ 1088 if not os.path.isdir(self._output_dir_path): 1089 os.mkdir(self._output_dir_path) 1090 triples = self.all_par_triples() 1091 executor = Parallel(n_jobs=n_jobs, **joblib_kwargs) 1092 current_round = 0 1093 while ( 1094 self._max_retry < 0 or current_round <= self._max_retry 1095 ) and any(not self._par_triple_done(t) for t in triples): 1096 executor( 1097 delayed(Benchmark._run_par_triple)(self, t) 1098 for t in triples 1099 if not self._par_triple_done(t) 1100 ) 1101 current_round += 1 1102 if any(not self._par_triple_done(t) for t in triples): 1103 logging.warning( 1104 "Benchmark finished, but some triples could not be run " 1105 "successfully within the retry budget ({}):", 1106 self._max_retry, 1107 ) 1108 for t in filter(lambda x: not self._par_triple_done(x), triples): 1109 logging.warning(" [{}]", t) 1110 self.compute_global_pareto_populations( 1111 n_post_processing_jobs, **joblib_kwargs 1112 ) 1113 self.compute_performance_indicators( 1114 n_post_processing_jobs, **joblib_kwargs 1115 ) 1116 self.consolidate()
153class PAPair: 154 """ 155 Represents a problem-algorithm pair. 156 """ 157 158 algorithm_description: Dict[str, Any] 159 algorithm_name: str 160 problem_description: Dict[str, Any] 161 problem_name: str 162 163 def __str__(self) -> str: 164 return f"{self.problem_name}|{self.algorithm_name}" 165 166 def global_pareto_population_filename(self) -> str: 167 """Returns `<problem_name>.<algorithm_name>.gpp.npz`.""" 168 return f"{self.problem_name}.{self.algorithm_name}.gpp.npz"
Represents a problem-algorithm pair.
172class PARTriple(PAPair): 173 """ 174 Represents a problem-algorithm-(run number) triple. 175 """ 176 177 n_run: int 178 179 def __str__(self) -> str: 180 return f"{self.problem_name}|{self.algorithm_name}|{self.n_run}" 181 182 def denoised_top_layer_history_filename(self) -> str: 183 """ 184 Returns 185 `<problem_name>.<algorithm_name>.<n_run>`.1-<top_layer_name>.denoised.npz`. 186 """ 187 prefix = self.filename_prefix() 188 name = self.problem_description["problem"]._name 189 return f"{prefix}.1-{name}.denoised.npz" 190 191 def denoised_top_layer_pareto_history_filename(self) -> str: 192 """ 193 Returns 194 `<problem_name>.<algorithm_name>.<n_run>`.1-<top_layer_name>.pareto-denoised.npz`. 195 """ 196 prefix = self.filename_prefix() 197 name = self.problem_description["problem"]._name 198 return f"{prefix}.1-{name}.pareto-denoised.npz" 199 200 def filename_prefix(self) -> str: 201 """Returns `<problem_name>.<algorithm_name>.<n_run>`.""" 202 return f"{self.problem_name}.{self.algorithm_name}.{self.n_run}" 203 204 def innermost_layer_history_filename(self) -> str: 205 """Returns the filename of the innermost layer history.""" 206 prefix = self.filename_prefix() 207 problem = self.problem_description["problem"] 208 inner = problem.innermost_wrapper() 209 name, depth = inner._name, problem.depth() 210 return f"{prefix}.{depth}-{name}.npz" 211 212 def pareto_population_filename(self) -> str: 213 """Returns `<problem_name>.<algorithm_name>.<n_run>.pp.npz`.""" 214 return self.filename_prefix() + ".pp.npz" 215 216 def pi_filename(self, pi_name: str) -> str: 217 """ 218 Returns `<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`. 219 """ 220 return self.filename_prefix() + f".pi-{pi_name}.csv" 221 222 def result_filename(self) -> str: 223 """Returns `<problem_name>.<algorithm_name>.<n_run>.csv`.""" 224 return self.filename_prefix() + ".csv" 225 226 def top_layer_history_filename(self) -> str: 227 """Returns the filename of the top layer history.""" 228 prefix = self.filename_prefix() 229 name = self.problem_description["problem"]._name 230 return f"{prefix}.1-{name}.npz"
Represents a problem-algorithm-(run number) triple.
182 def denoised_top_layer_history_filename(self) -> str: 183 """ 184 Returns 185 `<problem_name>.<algorithm_name>.<n_run>`.1-<top_layer_name>.denoised.npz`. 186 """ 187 prefix = self.filename_prefix() 188 name = self.problem_description["problem"]._name 189 return f"{prefix}.1-{name}.denoised.npz"
Returns
<problem_name>.<algorithm_name>.<n_run>
.1-
191 def denoised_top_layer_pareto_history_filename(self) -> str: 192 """ 193 Returns 194 `<problem_name>.<algorithm_name>.<n_run>`.1-<top_layer_name>.pareto-denoised.npz`. 195 """ 196 prefix = self.filename_prefix() 197 name = self.problem_description["problem"]._name 198 return f"{prefix}.1-{name}.pareto-denoised.npz"
Returns
<problem_name>.<algorithm_name>.<n_run>
.1-
200 def filename_prefix(self) -> str: 201 """Returns `<problem_name>.<algorithm_name>.<n_run>`.""" 202 return f"{self.problem_name}.{self.algorithm_name}.{self.n_run}"
Returns <problem_name>.<algorithm_name>.<n_run>
.
204 def innermost_layer_history_filename(self) -> str: 205 """Returns the filename of the innermost layer history.""" 206 prefix = self.filename_prefix() 207 problem = self.problem_description["problem"] 208 inner = problem.innermost_wrapper() 209 name, depth = inner._name, problem.depth() 210 return f"{prefix}.{depth}-{name}.npz"
Returns the filename of the innermost layer history.
212 def pareto_population_filename(self) -> str: 213 """Returns `<problem_name>.<algorithm_name>.<n_run>.pp.npz`.""" 214 return self.filename_prefix() + ".pp.npz"
Returns <problem_name>.<algorithm_name>.<n_run>.pp.npz
.
216 def pi_filename(self, pi_name: str) -> str: 217 """ 218 Returns `<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`. 219 """ 220 return self.filename_prefix() + f".pi-{pi_name}.csv"
Returns <problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv
.
222 def result_filename(self) -> str: 223 """Returns `<problem_name>.<algorithm_name>.<n_run>.csv`.""" 224 return self.filename_prefix() + ".csv"
Returns <problem_name>.<algorithm_name>.<n_run>.csv
.
226 def top_layer_history_filename(self) -> str: 227 """Returns the filename of the top layer history.""" 228 prefix = self.filename_prefix() 229 name = self.problem_description["problem"]._name 230 return f"{prefix}.1-{name}.npz"
Returns the filename of the top layer history.
Inherited Members
234class Benchmark: 235 """ 236 A benchmark is constructed with a list of problems and pymoo algorithms 237 descriptions, and run each algorithm against each problem, storing all 238 histories for later analysis. 239 """ 240 241 SUPPORTED_PERFOMANCE_INDICATORS = [ 242 "df", 243 "dfp", 244 "gd", 245 "gd+", 246 "ggd", 247 "ggd+", 248 "ghv", 249 "gigd", 250 "gigd+", 251 "hv", 252 "igd", 253 "igd+", 254 "ps", 255 "rggd", 256 "rggd+", 257 "rghv", 258 "rgigd", 259 "rgigd+", 260 ] 261 262 _algorithms: Dict[str, dict] 263 """ 264 List of algorithms to be benchmarked. 265 """ 266 267 _dump_histories: bool 268 """ 269 Wether the history of each `WrappedProblem` involved in this benchmark 270 should be written to disk. 271 """ 272 273 _max_retry: int 274 """ 275 Maximum number of attempts to run a given problem-algorithm-(run number) 276 triple before giving up. 277 """ 278 279 _n_runs: int 280 """ 281 Number of times to run a given problem-algorithm pair. 282 """ 283 284 _output_dir_path: Path 285 """ 286 Path of the output directory. 287 """ 288 289 _performance_indicators: List[str] 290 """ 291 List of performance indicator to calculate during the benchmark. 292 """ 293 294 _problems: Dict[str, dict] 295 """ 296 List of problems to be benchmarked. 297 """ 298 299 _results: pd.DataFrame 300 """ 301 Results of all runs. 302 """ 303 304 _seeds: List[Optional[int]] 305 """ 306 List of seeds to use. Must be of length `_n_runs`. 307 """ 308 309 def __init__( 310 self, 311 output_dir_path: Union[Path, str], 312 problems: Dict[str, dict], 313 algorithms: Dict[str, dict], 314 n_runs: int = 1, 315 dump_histories: bool = True, 316 performance_indicators: Optional[List[str]] = None, 317 max_retry: int = -1, 318 seeds: Optional[List[Optional[int]]] = None, 319 ): 320 """ 321 Constructor. The set of problems to be benchmarked is represented by a 322 dictionary with the following structure: 323 324 problems = { 325 <problem_name>: <problem_description>, 326 <problem_name>: <problem_description>, 327 } 328 329 where `<problem_name>` is a user-defined string (but stay reasonable 330 since it may be used in filenames), and `<problem_description>` is a 331 dictionary with the following keys: 332 * `df_n_evals` (int, optional): see the explanation of the `df` 333 performance indicator below; defaults to `1`; 334 * `evaluator` (optional): an algorithm evaluator object that will be 335 applied to every algorithm that run on this problem; if an 336 algorithm already has an evaluator attached to it (see 337 `<algorithm_description>` below), the evaluator attached to this 338 problem takes precedence; note that the evaluator is deepcopied for 339 every run of `minimize`; 340 * `hv_ref_point` (optional, `np.ndarray`): a reference point for 341 computing hypervolume performance, see `performance_indicators` 342 argument; 343 * `pareto_front` (optional, `np.ndarray`): a Pareto front subset; 344 * `problem`: a `WrappedProblem` instance. 345 346 The set of algorithms to be used is specified similarly:: 347 348 algorithms = { 349 <algorithm_name>: <algorithm_description>, 350 <algorithm_name>: <algorithm_description>, 351 } 352 353 where `<algorithm_name>` is a user-defined string (but stay reasonable 354 since it may be used in filenames), and `<algorithm_description>` is a 355 dictionary with the following keys: * `algorithm`: a pymoo `Algorithm` 356 object; note that it is deepcopied 357 for every run of `minimize`; 358 * `display` (optional): a custom `pymoo.util.display.Display` object 359 for customization purposes; 360 * `evaluator` (optional): an algorithm evaluator object; note that it 361 is deepcopied for every run of `minimize`; 362 * `return_least_infeasible` (optional, bool): if the algorithm cannot 363 find a feasible solution, wether the least infeasible solution 364 should still be returned; defaults to `False`; 365 * `termination` (optional): a pymoo termination criterion; note that it 366 is deepcopied for every run of `minimize`; 367 * `verbose` (optional, bool): wether outputs should be printed during 368 during the execution of the algorithm; defaults to `False`. 369 370 Args: 371 algorithms (Dict[str, dict]): Dict of all algorithms to be 372 benchmarked. 373 dump_histories (bool): Wether the history of each 374 `WrappedProblem` involved in this benchmark should be written 375 to disk. Defaults to `True`. 376 max_retries (int): Maximum number of attempts to run a given 377 problem-algorithm-(run number) triple before giving up. Set it 378 to `-1` to retry indefinitely. 379 n_runs (int): Number of times to run a given problem-algorithm 380 pair. 381 problems (Dict[str, dict]): Dict of all problems to be benchmarked. 382 performance_indicators (Optional[List[str]]): List of perfomance 383 indicators to be calculated and included in the result 384 dataframe (see `Benchmark.final_results`). Supported indicators 385 are 386 * `df`: ΔF metric, see the documentation of 387 `nmoo.indicators.delta_f.DeltaF`; `df_n_eval` should be set 388 in the problem description, but it default to 1 if not; 389 * `dfp`: ΔF-Pareto metric, see the documentation of 390 `nmoo.indicators.delta_f_pareto.DeltaFPareto`; `df_n_eval` 391 should be set in the problem description, but it default to 1 392 if not; 393 * `gd`: [generational 394 distance](https://pymoo.org/misc/indicators.html#Generational-Distance-(GD)), 395 requires `pareto_front` to be set in the problem description 396 dictionaries, otherwise the value of this indicator will be 397 `NaN`; 398 * `gd+`: [generational distance 399 plus](https://pymoo.org/misc/indicators.html#Generational-Distance-Plus-(GD+)), 400 requires `pareto_front` to be set in the problem description 401 dictionaries, otherwise the value of this indicator will be 402 `NaN`; 403 * `hv`: 404 [hypervolume](https://pymoo.org/misc/indicators.html#Hypervolume), 405 requires `hv_ref_point` to be set in the problem discription 406 dictionaries, otherwise the value of this indicator will be 407 `NaN`; 408 * `igd`: [inverted generational 409 distance](https://pymoo.org/misc/indicators.html#Inverted-Generational-Distance-(IGD)), 410 requires `pareto_front` to be set in the problem description 411 dictionaries, otherwise the value of this indicator will be 412 `NaN`; 413 * `igd+`: [inverted generational distance 414 plus](https://pymoo.org/misc/indicators.html#Inverted-Generational-Distance-Plus-(IGD+)), 415 requires `pareto_front` to be set in the problem description 416 dictionaries, otherwise the value of this indicator will be 417 `NaN`; 418 * `ps`: population size, or equivalently, the size of the 419 current Pareto front; 420 * `ggd`: ground generational distance, where the ground 421 problem's predicted objective values are used instead of the 422 outer problem's; requires `pareto_front` to be set in the 423 problem description dictionaries, otherwise the value of this 424 indicator will be `NaN`; 425 * `ggd+`: ground generational distance plus; requires 426 `pareto_front` to be set in the problem description 427 dictionaries, otherwise the value of this indicator will be 428 `NaN`; 429 * `ghv`: ground hypervolume; requires `hv_ref_point` to be set 430 in the problem discription dictionaries, otherwise the value 431 of this indicator will be `NaN`; 432 * `gigd`: ground inverted generational distance; requires 433 `pareto_front` to be set in the problem description 434 dictionaries, otherwise the value of this indicator will be 435 `NaN`; 436 * `gigd+`: ground inverted generational distance plus; requires 437 `pareto_front` to be set in the problem description 438 dictionaries, otherwise the value of this indicator will be 439 `NaN`. 440 * `rggd`: resampled ground generational distance, where the 441 ground problem's predicted objective values (resampled and 442 averaged a given number of times) are used instead of the 443 outer problem's; requires `pareto_front` to be set in the 444 problem description dictionaries, otherwise the value of this 445 indicator will be `NaN`; `rg_n_eval` should also be set in 446 the problem description, but defaults to 1 if not; 447 * `rggd+`: resampled ground generational distance plus; 448 requires `pareto_front` to be set in the problem description 449 dictionaries, otherwise the value of this indicator will be 450 `NaN`; `rg_n_eval` should also be set in the problem 451 description, but defaults to 1 if not; `rg_n_eval` should 452 also be set in the problem description, but defaults to 1 if 453 not; 454 * `rghv`: resampled ground hypervolume; requires `hv_ref_point` 455 to be set in the problem discription dictionaries, otherwise 456 the value of this indicator will be `NaN`; `rg_n_eval` should 457 also be set in the problem description, but defaults to 1 if 458 not; 459 * `rgigd`: resampled ground inverted generational distance; 460 requires `pareto_front` to be set in the problem description 461 dictionaries, otherwise the value of this indicator will be 462 `NaN`; `rg_n_eval` should also be set in the problem 463 description, but defaults to 1 if not; 464 * `rgigd+`: resampled ground inverted generational distance 465 plus; requires `pareto_front` to be set in the problem 466 description dictionaries, otherwise the value of this 467 indicator will be `NaN`; `rg_n_eval` should also be set in 468 the problem description, but defaults to 1 if not. 469 470 In the result dataframe, the corresponding columns will be 471 named `perf_<name of indicator>`, e.g. `perf_igd`. If left 472 unspecified, defaults to `["igd"]`. 473 474 seeds (Optional[List[Optional[int]]]): List of seeds to use. The 475 first seed will be used for the first run of every 476 algorithm-problem pair, etc. 477 """ 478 self._output_dir_path = Path(output_dir_path) 479 self._set_problems(problems) 480 self._set_algorithms(algorithms) 481 if n_runs <= 0: 482 raise ValueError( 483 "The number of run (for each problem-algorithm pair) must be " 484 "at least 1." 485 ) 486 self._n_runs = n_runs 487 self._dump_histories = dump_histories 488 self._set_performance_indicators(performance_indicators) 489 self._max_retry = max_retry 490 if seeds is None: 491 self._seeds = [None] * n_runs 492 elif len(seeds) < n_runs: 493 raise ValueError( 494 f"Not enough seeds: provided {len(seeds)} seeds but specified " 495 f"{n_runs} runs." 496 ) 497 else: 498 if len(seeds) > n_runs: 499 logging.warning( 500 "Too many seeds: provided {} but only need {} " 501 "(i.e. n_run)", 502 len(seeds), 503 n_runs, 504 ) 505 self._seeds = seeds 506 507 def _compute_global_pareto_population(self, pair: PAPair) -> None: 508 """ 509 Computes the global Pareto population of a given problem-algorithm 510 pair. See `compute_global_pareto_populations`. Assumes that the global 511 Pareto population has not already been calculated, i.e. that 512 `<output_dir_path>/<problem>.<algorithm>.gpp.npz` does not exist. 513 """ 514 configure_logging(prefix=f"[{pair}]") 515 logging.debug("Computing global Pareto population") 516 gpp_path = ( 517 self._output_dir_path / pair.global_pareto_population_filename() 518 ) 519 populations: Dict[str, List[np.ndarray]] = {} 520 for n_run in range(1, self._n_runs + 1): 521 triple = PARTriple( 522 algorithm_description=pair.algorithm_description, 523 algorithm_name=pair.algorithm_name, 524 n_run=n_run, 525 problem_description=pair.problem_description, 526 problem_name=pair.problem_name, 527 ) 528 path = self._output_dir_path / triple.pareto_population_filename() 529 if not path.exists(): 530 logging.debug( 531 "File {} does not exist. The corresponding triple runs " 532 "most likely have not finished or all failed", 533 path, 534 triple, 535 ) 536 continue 537 data = np.load(path, allow_pickle=True) 538 for k, v in data.items(): 539 populations[k] = populations.get(k, []) + [v] 540 data.close() 541 542 consolidated = {k: np.concatenate(v) for k, v in populations.items()} 543 if "F" not in consolidated: 544 logging.error( 545 "No Pareto population file found. The corresponding triple " 546 "runs most likely have not finished or all failed", 547 ) 548 return 549 mask = pareto_frontier_mask(consolidated["F"]) 550 np.savez_compressed( 551 gpp_path, 552 **{k: v[mask] for k, v in consolidated.items()}, 553 ) 554 555 # pylint: disable=too-many-branches 556 # pylint: disable=too-many-locals 557 def _compute_performance_indicator( 558 self, triple: PARTriple, pi_name: str 559 ) -> None: 560 """ 561 Computes a performance indicators for a given problem-algorithm-(run 562 number) triple and stores it under 563 `<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`. Assumes 564 that the this performance indicator has not already been calculated, 565 i.e. that that file does not exist. 566 567 Warning: 568 This fails if either the top layer history or the pareto population 569 artefact (`<problem_name>.<algorithm_name>.<n_run>.pp.npz`) could 570 not be loaded as numpy arrays. 571 572 Todo: 573 Refactor (again) 574 """ 575 configure_logging(prefix=f"[{triple}|{pi_name}]") 576 logging.debug("Computing PI") 577 578 pic: _PIC = lambda _: np.nan 579 if pi_name in ["df", "dfp"]: 580 problem = triple.problem_description["problem"] 581 n_evals = triple.problem_description.get("df_n_evals", 1) 582 Class, fn = { 583 "df": (DeltaF, triple.denoised_top_layer_history_filename()), 584 "dfp": ( 585 DeltaFPareto, 586 triple.denoised_top_layer_pareto_history_filename(), 587 ), 588 }[pi_name] 589 delta_f = Class( 590 problem, 591 n_evals, 592 self._output_dir_path / fn, 593 ) 594 pic = lambda s: delta_f.do(s["F"], s["X"]) 595 elif pi_name in ["gd", "gd+", "igd", "igd+"]: 596 pic = self._get_pic_gd_type(triple, pi_name) 597 elif pi_name in ["ggd", "ggd+", "gigd", "gigd+"]: 598 pic = self._get_pic_gd_type(triple, pi_name[1:]) 599 elif pi_name in ["rggd", "rggd+", "rgigd", "rgigd+"]: 600 pic = self._get_pic_gd_type(triple, pi_name[2:]) 601 elif ( 602 pi_name in ["hv", "ghv", "rghv"] 603 and "hv_ref_point" in triple.problem_description 604 ): 605 ref_point = triple.problem_description["hv_ref_point"] 606 pi = get_performance_indicator("hv", ref_point=ref_point) 607 pic = lambda s: pi.do(s["F"]) 608 elif pi_name == "ps": 609 pic = lambda s: s["X"].shape[0] 610 else: 611 logging.warning( 612 "Unprocessable performance indicator {}. This could be " 613 "because some required arguments (e.g. 'hv_ref_point') are " 614 "missing", 615 pi_name, 616 ) 617 618 # On which history is the PIC going to be called? By default, it is on 619 # the top layer history. 620 if pi_name in ["ps"]: 621 history = np.load( 622 self._output_dir_path / triple.pareto_population_filename() 623 ) 624 elif pi_name in ["ggd", "ggd+", "ghv", "gigd", "gigd+"]: 625 history = np.load( 626 self._output_dir_path 627 / triple.innermost_layer_history_filename() 628 ) 629 elif pi_name in ["rggd", "rggd+", "rghv", "rgigd", "rgigd+"]: 630 history = self._get_rg_history(triple) 631 else: 632 history = np.load( 633 self._output_dir_path / triple.top_layer_history_filename() 634 ) 635 636 states: List[Dict[str, np.ndarray]] = [] 637 for i in range(1, history["_batch"].max() + 1): 638 idx = history["_batch"] == i 639 states.append({"X": history["X"][idx], "F": history["F"][idx]}) 640 df = pd.DataFrame() 641 df["perf_" + pi_name] = list(map(pic, states)) 642 df["algorithm"] = triple.algorithm_name 643 df["problem"] = triple.problem_name 644 df["n_gen"] = range(1, len(states) + 1) 645 df["n_run"] = triple.n_run 646 logging.debug( 647 "Writing result to {}", 648 self._output_dir_path / triple.pi_filename(pi_name), 649 ) 650 df.to_csv(self._output_dir_path / triple.pi_filename(pi_name)) 651 652 def _get_pic_gd_type(self, triple: PARTriple, pi_name: str) -> _PIC: 653 """ 654 Returns the `_PIC` corresponding to the either the `gd`, `gd+`, `igd`, 655 or `igd+` performance indicator. As a reminder, a `_PIC`, or 656 Performance Indicator Callable, is a function that takes a dict of 657 `np.ndarray` and returns an optional `float`. In this case, the dict 658 must have the key `F`. 659 """ 660 if "pareto_front" in triple.problem_description: 661 pf = triple.problem_description.get("pareto_front") 662 else: 663 path = ( 664 self._output_dir_path 665 / triple.global_pareto_population_filename() 666 ) 667 data = np.load(path) 668 pf = data["F"] 669 data.close() 670 pi = get_performance_indicator(pi_name, pf) 671 return lambda s: pi.do(s["F"]) 672 673 def _get_rg_history(self, triple: PARTriple) -> Dict[str, np.ndarray]: 674 """ 675 Returns the `X` and `F` history of the ground problem of the triple, 676 but where `F` has been resampled a given number of times (`rg_n_evals` 677 parameter in the problem's description). This involves wrapping the 678 ground problem in a `nmoo.denoisers.ResampleAverage` and evaluating the 679 history's `X` array. 680 """ 681 history = dict( 682 np.load( 683 self._output_dir_path / triple.top_layer_history_filename() 684 ) 685 ) 686 rgp = ResampleAverage( 687 triple.problem_description["problem"].ground_problem(), 688 triple.problem_description.get("rg_n_eval", 1), 689 ) 690 history["F"] = rgp.evaluate(history["X"], return_values_of="F") 691 return history 692 693 def _par_triple_done(self, triple: PARTriple) -> bool: 694 """ 695 Wether a problem-algorithm-(run number) has been successfully executed. 696 This is determined by checking if 697 `_output_dir_path/triple.result_filename()` exists or not. 698 """ 699 return (self._output_dir_path / triple.result_filename()).is_file() 700 701 def _run_par_triple( 702 self, 703 triple: PARTriple, 704 ) -> None: 705 """ 706 Runs a given algorithm against a given problem. See 707 `nmoo.benchmark.Benchmark.run`. Immediately dumps the history of the 708 problem and all wrapped problems with the following naming scheme: 709 710 output_dir_path/<problem_name>.<algorithm_name>.<n_run>.<level>.npz 711 712 where `level` is the depth of the wrapped problem, starting at `1`. See 713 `nmoo.wrapped_problem.WrappedProblem.dump_all_histories`. It also dumps 714 the compounded Pareto population for every at every generation (or just 715 the last generation of `set_history` is set to `False` in the algorithm 716 description) in 717 718 output_dir_path/<problem_name>.<algorithm_name>.<n_run>.pp.npz 719 720 Additionally, it generates a CSV file containing various statistics 721 named: 722 723 output_dir_path/<problem_name>.<algorithm_name>.<n_run>.csv 724 725 The existence of this file is also used to determine if the 726 problem-algorithm-(run number) triple has already been run when 727 resuming a benchmark. 728 729 Args: 730 triple: A `PARTriple` object representing the 731 problem-algorithm-(run number) triple to run. 732 """ 733 configure_logging(prefix=f"[{str(triple)}]") 734 logging.info("Starting run") 735 736 triple.problem_description["problem"].start_new_run() 737 evaluator = triple.problem_description.get( 738 "evaluator", 739 triple.algorithm_description.get("evaluator"), 740 ) 741 try: 742 seed = self._seeds[triple.n_run - 1] 743 problem = deepcopy(triple.problem_description["problem"]) 744 problem.reseed(seed) 745 results = minimize( 746 problem, 747 triple.algorithm_description["algorithm"], 748 termination=triple.algorithm_description.get("termination"), 749 copy_algorithm=True, 750 copy_termination=True, 751 # extra Algorithm.setup kwargs 752 callback=TimerCallback(), 753 display=triple.algorithm_description.get("display"), 754 evaluator=deepcopy(evaluator), 755 return_least_infeasible=triple.algorithm_description.get( 756 "return_least_infeasible", False 757 ), 758 save_history=True, 759 seed=seed, 760 verbose=triple.algorithm_description.get("verbose", False), 761 ) 762 except Exception as e: # pylint: disable=broad-except 763 logging.error("Run failed: {}", e) 764 return 765 else: 766 logging.success("Run successful") 767 768 # Dump all layers histories 769 logging.debug("Writing layer histories to {}", self._output_dir_path) 770 if self._dump_histories: 771 results.problem.dump_all_histories( 772 self._output_dir_path, 773 triple.filename_prefix(), 774 ) 775 776 # Dump pareto sets 777 logging.debug( 778 "Writing pareto sets to {}", 779 self._output_dir_path / triple.pareto_population_filename(), 780 ) 781 np.savez_compressed( 782 self._output_dir_path / triple.pareto_population_filename(), 783 **population_list_to_dict([h.opt for h in results.history]), 784 ) 785 786 # Create and dump CSV file 787 logging.debug( 788 "Writing result CSV to {}", 789 self._output_dir_path / triple.result_filename(), 790 ) 791 df = pd.DataFrame() 792 df["n_gen"] = [a.n_gen for a in results.history] 793 df["n_eval"] = [a.evaluator.n_eval for a in results.history] 794 df["timedelta"] = results.algorithm.callback._deltas 795 # Important to create these columns once the dataframe has its full 796 # length 797 df["algorithm"] = triple.algorithm_name 798 df["problem"] = triple.problem_name 799 df["n_run"] = triple.n_run 800 df.to_csv( 801 self._output_dir_path / triple.result_filename(), 802 index=False, 803 ) 804 805 def _set_algorithms(self, algorithms: Dict[str, dict]) -> None: 806 """Validates and sets the algorithms dict""" 807 if not algorithms: 808 raise ValueError("A benchmark requires at least 1 algorithm.") 809 for k, v in algorithms.items(): 810 if not isinstance(v, dict): 811 raise ValueError( 812 f"Description for algorithm '{k}' must be a dict." 813 ) 814 if "algorithm" not in v: 815 raise ValueError( 816 f"Description for algorithm '{k}' is missing mandatory " 817 "key 'algorithm'." 818 ) 819 self._algorithms = algorithms 820 821 def _set_performance_indicators( 822 self, performance_indicators: Optional[List[str]] 823 ) -> None: 824 """Validates and sets the performance indicator list""" 825 if performance_indicators is None: 826 self._performance_indicators = ["igd"] 827 else: 828 self._performance_indicators = [] 829 for pi in set(performance_indicators): 830 if pi not in Benchmark.SUPPORTED_PERFOMANCE_INDICATORS: 831 raise ValueError(f"Unknown performance indicator '{pi}'") 832 self._performance_indicators.append(pi) 833 self._performance_indicators = sorted(self._performance_indicators) 834 835 def _set_problems(self, problems: Dict[str, dict]) -> None: 836 """Validates and sets the problem dict""" 837 if not problems: 838 raise ValueError("A benchmark requires at least 1 problem.") 839 for k, v in problems.items(): 840 if not isinstance(v, dict): 841 raise ValueError( 842 f"Description for problem '{k}' must be a dict." 843 ) 844 if "problem" not in v: 845 raise ValueError( 846 f"Description for problem '{k}' is missing mandatory key " 847 "'problem'." 848 ) 849 self._problems = problems 850 851 def all_pa_pairs(self) -> List[PAPair]: 852 """Generate the list of all problem-algorithm pairs.""" 853 everything = product( 854 self._algorithms.items(), 855 self._problems.items(), 856 ) 857 return [ 858 PAPair( 859 algorithm_description=aa, 860 algorithm_name=an, 861 problem_description=pp, 862 problem_name=pn, 863 ) 864 for (an, aa), (pn, pp) in everything 865 ] 866 867 def all_par_triples(self) -> List[PARTriple]: 868 """Generate the list of all problem-algorithm-(run number) triples.""" 869 everything = product( 870 self._algorithms.items(), 871 self._problems.items(), 872 range(1, self._n_runs + 1), 873 ) 874 return [ 875 PARTriple( 876 algorithm_description=aa, 877 algorithm_name=an, 878 n_run=r, 879 problem_description=pp, 880 problem_name=pn, 881 ) 882 for (an, aa), (pn, pp), r in everything 883 ] 884 885 def compute_global_pareto_populations( 886 self, n_jobs: int = -1, **joblib_kwargs 887 ) -> None: 888 """ 889 The global Pareto population of a problem-algorithm pair is the merged 890 population of all pareto populations across all runs of that pair. This 891 function calculates global Pareto population of all pairs and dumps it 892 to `<output_dir_path>/<problem>.<algorithm>.gpp.npz`. If that file 893 exists for a given problem-algorithm pair, then the global Pareto 894 population (of that pair) is not recalculated. 895 """ 896 logging.info("Computing global Pareto populations") 897 executor = Parallel(n_jobs=n_jobs, **joblib_kwargs) 898 executor( 899 delayed(Benchmark._compute_global_pareto_population)(self, p) 900 for p in self.all_pa_pairs() 901 if not ( 902 self._output_dir_path / p.global_pareto_population_filename() 903 ).is_file() 904 ) 905 906 def compute_performance_indicators( 907 self, n_jobs: int = -1, **joblib_kwargs 908 ) -> None: 909 """ 910 Computes all performance indicators and saves the corresponding 911 dataframes in 912 `output_path/<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`. 913 If that file exists for a given problem-algorithm-(run 914 number)-(performance indicator) tuple, then it is not recalculated. 915 """ 916 logging.info( 917 "Computing performance indicators: {}", 918 ", ".join(self._performance_indicators), 919 ) 920 everything = product( 921 self.all_par_triples(), self._performance_indicators 922 ) 923 executor = Parallel(n_jobs=n_jobs, **joblib_kwargs) 924 executor( 925 delayed(Benchmark._compute_performance_indicator)(self, t, pi) 926 for t, pi in everything 927 if not (self._output_dir_path / t.pi_filename(pi)).is_file() 928 ) 929 930 def consolidate(self) -> None: 931 """ 932 Merges all statistics dataframes 933 (`<problem_name>.<algorithm_name>.<n_run>.csv`) and all PI dataframes 934 (`<problem_name>.<algorithm_name>.<n_run>.pi.csv`) into a single 935 dataframe, and saves it under `output_dir_path/benchmark.csv`. 936 """ 937 logging.info("Consolidating statistics") 938 all_df = [] 939 for triple in self.all_par_triples(): 940 path = self._output_dir_path / triple.result_filename() 941 if not path.exists(): 942 logging.debug( 943 "Statistic file {} does not exist. The corresponding " 944 "triple [{}] most likely hasn't finished or failed", 945 path, 946 triple, 947 ) 948 continue 949 all_df.append(pd.read_csv(path)) 950 self._results = pd.concat(all_df, ignore_index=True) 951 self._results["timedelta"] = pd.to_timedelta( 952 self._results["timedelta"] 953 ) 954 self._results = self._results.astype( 955 { 956 "algorithm": "category", 957 "n_gen": "uint32", 958 "n_run": "uint32", 959 "problem": "category", 960 } 961 ) 962 963 logging.info("Consolidating performance indicators") 964 all_df = [] 965 for triple in self.all_par_triples(): 966 df = pd.DataFrame() 967 for pi_name in self._performance_indicators: 968 path = self._output_dir_path / triple.pi_filename(pi_name) 969 if not path.exists(): 970 logging.debug("PI file {} does not exist.", path) 971 continue 972 tmp = pd.read_csv(path) 973 if df.empty: 974 df = tmp 975 else: 976 col = "perf_" + pi_name 977 df[col] = tmp[col] 978 all_df.append(df) 979 980 self._results = self._results.merge( 981 pd.concat(all_df, ignore_index=True), 982 how="outer", 983 on=["algorithm", "problem", "n_gen", "n_run"], 984 ) 985 986 # ??? 987 if "Unnamed: 0" in self._results: 988 del self._results["Unnamed: 0"] 989 990 path = self._output_dir_path / "benchmark.csv" 991 logging.info("Writing results to {}", path) 992 self.dump_results(path, index=False) 993 994 def dump_results(self, path: Union[Path, str], fmt: str = "csv", **kwargs): 995 """ 996 Dumps the internal `_result` dataframe. 997 998 Args: 999 path (Union[Path, str]): Path to the output file. 1000 fmt (str): Text or binary format supported by pandas, see 1001 `here <https://pandas.pydata.org/docs/user_guide/io.html>`_. 1002 CSV by default. 1003 kwargs: Will be passed on the `pandas.DataFrame.to_<fmt>` method. 1004 """ 1005 saver = { 1006 "csv": pd.DataFrame.to_csv, 1007 "excel": pd.DataFrame.to_excel, 1008 "feather": pd.DataFrame.to_feather, 1009 "gbq": pd.DataFrame.to_gbq, 1010 "hdf": pd.DataFrame.to_hdf, 1011 "html": pd.DataFrame.to_html, 1012 "json": pd.DataFrame.to_json, 1013 "parquet": pd.DataFrame.to_parquet, 1014 "pickle": pd.DataFrame.to_pickle, 1015 }[fmt] 1016 saver(self._results, path, **kwargs) 1017 1018 def final_results( 1019 self, 1020 timedeltas_to_microseconds: bool = True, 1021 reset_index: bool = True, 1022 ) -> pd.DataFrame: 1023 """ 1024 Returns a dataframe containing the final row of each 1025 algorithm/problem/n_run triple, i.e. the final record of each run of 1026 the benchmark. 1027 1028 If the `reset_index` argument set to `False`, the resulting dataframe 1029 will have a multiindex given by the (algorithm, problem, n_run) tuples, 1030 e.g. 1031 1032 n_gen timedelta perf_gd ... 1033 algorithm problem n_run ... 1034 nsga2 bnh 1 155 886181 0.477980 ... 1035 2 200 29909 0.480764 ... 1036 zdt1 1 400 752818 0.191490 ... 1037 2 305 979112 0.260930 ... 1038 1039 (note tha the `timedelta` column has been converted to microseconds, 1040 see the `timedeltas_to_microseconds` argument below). If `reset_index` 1041 is set to `True` (the default), then the index is reset, giving 1042 something like this: 1043 1044 algorithm problem n_run n_gen timedelta perf_gd ... 1045 0 nsga2 bnh 1 155 886181 0.477980 ... 1046 1 nsga2 bnh 2 200 29909 0.480764 ... 1047 2 nsga2 zdt1 1 400 752818 0.191490 ... 1048 3 nsga2 zdt1 2 305 979112 0.260930 ... 1049 1050 This form is easier to plot. 1051 1052 Args: 1053 reset_index (bool): Wether to reset the index. Defaults to 1054 `True`. 1055 timedeltas_to_microseconds (bool): Wether to convert the 1056 timedeltas column to microseconds. Defaults to `True`. 1057 1058 """ 1059 df = self._results.groupby(["algorithm", "problem", "n_run"]).last() 1060 if timedeltas_to_microseconds: 1061 df["timedelta"] = df["timedelta"].dt.microseconds 1062 return df.reset_index() if reset_index else df 1063 1064 def run( 1065 self, 1066 n_jobs: int = -1, 1067 n_post_processing_jobs: int = 2, 1068 **joblib_kwargs, 1069 ): 1070 """ 1071 Runs the benchmark sequentially. Makes your laptop go brr. The 1072 histories of all problems are progressively dumped in the specified 1073 output directory as the benchmark run. At the end, the benchmark 1074 results are dumped in `output_dir_path/benchmark.csv`. 1075 1076 Args: 1077 n_jobs (int): Number of processes to use. See the `joblib.Parallel`_ 1078 documentation. Defaults to `-1`, i.e. all CPUs are used. 1079 n_post_processing_jobs (int): Number of processes to use for post 1080 processing tasks (computing global Pareto populations and 1081 performance indicators). These are memory-intensive tasks. 1082 Defaults to `2`. 1083 joblib_kwargs (dict): Additional kwargs to pass on to the 1084 `joblib.Parallel`_ instance. 1085 1086 .. _joblib.Parallel: 1087 https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html 1088 """ 1089 if not os.path.isdir(self._output_dir_path): 1090 os.mkdir(self._output_dir_path) 1091 triples = self.all_par_triples() 1092 executor = Parallel(n_jobs=n_jobs, **joblib_kwargs) 1093 current_round = 0 1094 while ( 1095 self._max_retry < 0 or current_round <= self._max_retry 1096 ) and any(not self._par_triple_done(t) for t in triples): 1097 executor( 1098 delayed(Benchmark._run_par_triple)(self, t) 1099 for t in triples 1100 if not self._par_triple_done(t) 1101 ) 1102 current_round += 1 1103 if any(not self._par_triple_done(t) for t in triples): 1104 logging.warning( 1105 "Benchmark finished, but some triples could not be run " 1106 "successfully within the retry budget ({}):", 1107 self._max_retry, 1108 ) 1109 for t in filter(lambda x: not self._par_triple_done(x), triples): 1110 logging.warning(" [{}]", t) 1111 self.compute_global_pareto_populations( 1112 n_post_processing_jobs, **joblib_kwargs 1113 ) 1114 self.compute_performance_indicators( 1115 n_post_processing_jobs, **joblib_kwargs 1116 ) 1117 self.consolidate()
A benchmark is constructed with a list of problems and pymoo algorithms descriptions, and run each algorithm against each problem, storing all histories for later analysis.
309 def __init__( 310 self, 311 output_dir_path: Union[Path, str], 312 problems: Dict[str, dict], 313 algorithms: Dict[str, dict], 314 n_runs: int = 1, 315 dump_histories: bool = True, 316 performance_indicators: Optional[List[str]] = None, 317 max_retry: int = -1, 318 seeds: Optional[List[Optional[int]]] = None, 319 ): 320 """ 321 Constructor. The set of problems to be benchmarked is represented by a 322 dictionary with the following structure: 323 324 problems = { 325 <problem_name>: <problem_description>, 326 <problem_name>: <problem_description>, 327 } 328 329 where `<problem_name>` is a user-defined string (but stay reasonable 330 since it may be used in filenames), and `<problem_description>` is a 331 dictionary with the following keys: 332 * `df_n_evals` (int, optional): see the explanation of the `df` 333 performance indicator below; defaults to `1`; 334 * `evaluator` (optional): an algorithm evaluator object that will be 335 applied to every algorithm that run on this problem; if an 336 algorithm already has an evaluator attached to it (see 337 `<algorithm_description>` below), the evaluator attached to this 338 problem takes precedence; note that the evaluator is deepcopied for 339 every run of `minimize`; 340 * `hv_ref_point` (optional, `np.ndarray`): a reference point for 341 computing hypervolume performance, see `performance_indicators` 342 argument; 343 * `pareto_front` (optional, `np.ndarray`): a Pareto front subset; 344 * `problem`: a `WrappedProblem` instance. 345 346 The set of algorithms to be used is specified similarly:: 347 348 algorithms = { 349 <algorithm_name>: <algorithm_description>, 350 <algorithm_name>: <algorithm_description>, 351 } 352 353 where `<algorithm_name>` is a user-defined string (but stay reasonable 354 since it may be used in filenames), and `<algorithm_description>` is a 355 dictionary with the following keys: * `algorithm`: a pymoo `Algorithm` 356 object; note that it is deepcopied 357 for every run of `minimize`; 358 * `display` (optional): a custom `pymoo.util.display.Display` object 359 for customization purposes; 360 * `evaluator` (optional): an algorithm evaluator object; note that it 361 is deepcopied for every run of `minimize`; 362 * `return_least_infeasible` (optional, bool): if the algorithm cannot 363 find a feasible solution, wether the least infeasible solution 364 should still be returned; defaults to `False`; 365 * `termination` (optional): a pymoo termination criterion; note that it 366 is deepcopied for every run of `minimize`; 367 * `verbose` (optional, bool): wether outputs should be printed during 368 during the execution of the algorithm; defaults to `False`. 369 370 Args: 371 algorithms (Dict[str, dict]): Dict of all algorithms to be 372 benchmarked. 373 dump_histories (bool): Wether the history of each 374 `WrappedProblem` involved in this benchmark should be written 375 to disk. Defaults to `True`. 376 max_retries (int): Maximum number of attempts to run a given 377 problem-algorithm-(run number) triple before giving up. Set it 378 to `-1` to retry indefinitely. 379 n_runs (int): Number of times to run a given problem-algorithm 380 pair. 381 problems (Dict[str, dict]): Dict of all problems to be benchmarked. 382 performance_indicators (Optional[List[str]]): List of perfomance 383 indicators to be calculated and included in the result 384 dataframe (see `Benchmark.final_results`). Supported indicators 385 are 386 * `df`: ΔF metric, see the documentation of 387 `nmoo.indicators.delta_f.DeltaF`; `df_n_eval` should be set 388 in the problem description, but it default to 1 if not; 389 * `dfp`: ΔF-Pareto metric, see the documentation of 390 `nmoo.indicators.delta_f_pareto.DeltaFPareto`; `df_n_eval` 391 should be set in the problem description, but it default to 1 392 if not; 393 * `gd`: [generational 394 distance](https://pymoo.org/misc/indicators.html#Generational-Distance-(GD)), 395 requires `pareto_front` to be set in the problem description 396 dictionaries, otherwise the value of this indicator will be 397 `NaN`; 398 * `gd+`: [generational distance 399 plus](https://pymoo.org/misc/indicators.html#Generational-Distance-Plus-(GD+)), 400 requires `pareto_front` to be set in the problem description 401 dictionaries, otherwise the value of this indicator will be 402 `NaN`; 403 * `hv`: 404 [hypervolume](https://pymoo.org/misc/indicators.html#Hypervolume), 405 requires `hv_ref_point` to be set in the problem discription 406 dictionaries, otherwise the value of this indicator will be 407 `NaN`; 408 * `igd`: [inverted generational 409 distance](https://pymoo.org/misc/indicators.html#Inverted-Generational-Distance-(IGD)), 410 requires `pareto_front` to be set in the problem description 411 dictionaries, otherwise the value of this indicator will be 412 `NaN`; 413 * `igd+`: [inverted generational distance 414 plus](https://pymoo.org/misc/indicators.html#Inverted-Generational-Distance-Plus-(IGD+)), 415 requires `pareto_front` to be set in the problem description 416 dictionaries, otherwise the value of this indicator will be 417 `NaN`; 418 * `ps`: population size, or equivalently, the size of the 419 current Pareto front; 420 * `ggd`: ground generational distance, where the ground 421 problem's predicted objective values are used instead of the 422 outer problem's; requires `pareto_front` to be set in the 423 problem description dictionaries, otherwise the value of this 424 indicator will be `NaN`; 425 * `ggd+`: ground generational distance plus; requires 426 `pareto_front` to be set in the problem description 427 dictionaries, otherwise the value of this indicator will be 428 `NaN`; 429 * `ghv`: ground hypervolume; requires `hv_ref_point` to be set 430 in the problem discription dictionaries, otherwise the value 431 of this indicator will be `NaN`; 432 * `gigd`: ground inverted generational distance; requires 433 `pareto_front` to be set in the problem description 434 dictionaries, otherwise the value of this indicator will be 435 `NaN`; 436 * `gigd+`: ground inverted generational distance plus; requires 437 `pareto_front` to be set in the problem description 438 dictionaries, otherwise the value of this indicator will be 439 `NaN`. 440 * `rggd`: resampled ground generational distance, where the 441 ground problem's predicted objective values (resampled and 442 averaged a given number of times) are used instead of the 443 outer problem's; requires `pareto_front` to be set in the 444 problem description dictionaries, otherwise the value of this 445 indicator will be `NaN`; `rg_n_eval` should also be set in 446 the problem description, but defaults to 1 if not; 447 * `rggd+`: resampled ground generational distance plus; 448 requires `pareto_front` to be set in the problem description 449 dictionaries, otherwise the value of this indicator will be 450 `NaN`; `rg_n_eval` should also be set in the problem 451 description, but defaults to 1 if not; `rg_n_eval` should 452 also be set in the problem description, but defaults to 1 if 453 not; 454 * `rghv`: resampled ground hypervolume; requires `hv_ref_point` 455 to be set in the problem discription dictionaries, otherwise 456 the value of this indicator will be `NaN`; `rg_n_eval` should 457 also be set in the problem description, but defaults to 1 if 458 not; 459 * `rgigd`: resampled ground inverted generational distance; 460 requires `pareto_front` to be set in the problem description 461 dictionaries, otherwise the value of this indicator will be 462 `NaN`; `rg_n_eval` should also be set in the problem 463 description, but defaults to 1 if not; 464 * `rgigd+`: resampled ground inverted generational distance 465 plus; requires `pareto_front` to be set in the problem 466 description dictionaries, otherwise the value of this 467 indicator will be `NaN`; `rg_n_eval` should also be set in 468 the problem description, but defaults to 1 if not. 469 470 In the result dataframe, the corresponding columns will be 471 named `perf_<name of indicator>`, e.g. `perf_igd`. If left 472 unspecified, defaults to `["igd"]`. 473 474 seeds (Optional[List[Optional[int]]]): List of seeds to use. The 475 first seed will be used for the first run of every 476 algorithm-problem pair, etc. 477 """ 478 self._output_dir_path = Path(output_dir_path) 479 self._set_problems(problems) 480 self._set_algorithms(algorithms) 481 if n_runs <= 0: 482 raise ValueError( 483 "The number of run (for each problem-algorithm pair) must be " 484 "at least 1." 485 ) 486 self._n_runs = n_runs 487 self._dump_histories = dump_histories 488 self._set_performance_indicators(performance_indicators) 489 self._max_retry = max_retry 490 if seeds is None: 491 self._seeds = [None] * n_runs 492 elif len(seeds) < n_runs: 493 raise ValueError( 494 f"Not enough seeds: provided {len(seeds)} seeds but specified " 495 f"{n_runs} runs." 496 ) 497 else: 498 if len(seeds) > n_runs: 499 logging.warning( 500 "Too many seeds: provided {} but only need {} " 501 "(i.e. n_run)", 502 len(seeds), 503 n_runs, 504 ) 505 self._seeds = seeds
Constructor. The set of problems to be benchmarked is represented by a dictionary with the following structure:
problems = {
<problem_name>: <problem_description>,
<problem_name>: <problem_description>,
}
where <problem_name>
is a user-defined string (but stay reasonable
since it may be used in filenames), and <problem_description>
is a
dictionary with the following keys:
df_n_evals
(int, optional): see the explanation of thedf
performance indicator below; defaults to1
;evaluator
(optional): an algorithm evaluator object that will be applied to every algorithm that run on this problem; if an algorithm already has an evaluator attached to it (see<algorithm_description>
below), the evaluator attached to this problem takes precedence; note that the evaluator is deepcopied for every run ofminimize
;hv_ref_point
(optional,np.ndarray
): a reference point for computing hypervolume performance, seeperformance_indicators
argument;pareto_front
(optional,np.ndarray
): a Pareto front subset;problem
: aWrappedProblem
instance.
The set of algorithms to be used is specified similarly::
algorithms = {
<algorithm_name>: <algorithm_description>,
<algorithm_name>: <algorithm_description>,
}
where <algorithm_name>
is a user-defined string (but stay reasonable
since it may be used in filenames), and <algorithm_description>
is a
dictionary with the following keys: * algorithm
: a pymoo Algorithm
object; note that it is deepcopied
for every run of minimize
;
display
(optional): a custompymoo.util.display.Display
object for customization purposes;evaluator
(optional): an algorithm evaluator object; note that it is deepcopied for every run ofminimize
;return_least_infeasible
(optional, bool): if the algorithm cannot find a feasible solution, wether the least infeasible solution should still be returned; defaults toFalse
;termination
(optional): a pymoo termination criterion; note that it is deepcopied for every run ofminimize
;verbose
(optional, bool): wether outputs should be printed during during the execution of the algorithm; defaults toFalse
.
Arguments:
- algorithms (Dict[str, dict]): Dict of all algorithms to be benchmarked.
- dump_histories (bool): Wether the history of each
WrappedProblem
involved in this benchmark should be written to disk. Defaults toTrue
. - max_retries (int): Maximum number of attempts to run a given
problem-algorithm-(run number) triple before giving up. Set it
to
-1
to retry indefinitely. - n_runs (int): Number of times to run a given problem-algorithm pair.
- problems (Dict[str, dict]): Dict of all problems to be benchmarked.
performance_indicators (Optional[List[str]]): List of perfomance indicators to be calculated and included in the result dataframe (see
Benchmark.final_results
). Supported indicators aredf
: ΔF metric, see the documentation ofnmoo.indicators.delta_f.DeltaF
;df_n_eval
should be set in the problem description, but it default to 1 if not;dfp
: ΔF-Pareto metric, see the documentation ofnmoo.indicators.delta_f_pareto.DeltaFPareto
;df_n_eval
should be set in the problem description, but it default to 1 if not;gd
: generational distance, requirespareto_front
to be set in the problem description dictionaries, otherwise the value of this indicator will beNaN
;gd+
: generational distance plus, requirespareto_front
to be set in the problem description dictionaries, otherwise the value of this indicator will beNaN
;hv
: hypervolume, requireshv_ref_point
to be set in the problem discription dictionaries, otherwise the value of this indicator will beNaN
;igd
: inverted generational distance, requirespareto_front
to be set in the problem description dictionaries, otherwise the value of this indicator will beNaN
;igd+
: inverted generational distance plus, requirespareto_front
to be set in the problem description dictionaries, otherwise the value of this indicator will beNaN
;ps
: population size, or equivalently, the size of the current Pareto front;ggd
: ground generational distance, where the ground problem's predicted objective values are used instead of the outer problem's; requirespareto_front
to be set in the problem description dictionaries, otherwise the value of this indicator will beNaN
;ggd+
: ground generational distance plus; requirespareto_front
to be set in the problem description dictionaries, otherwise the value of this indicator will beNaN
;ghv
: ground hypervolume; requireshv_ref_point
to be set in the problem discription dictionaries, otherwise the value of this indicator will beNaN
;gigd
: ground inverted generational distance; requirespareto_front
to be set in the problem description dictionaries, otherwise the value of this indicator will beNaN
;gigd+
: ground inverted generational distance plus; requirespareto_front
to be set in the problem description dictionaries, otherwise the value of this indicator will beNaN
.rggd
: resampled ground generational distance, where the ground problem's predicted objective values (resampled and averaged a given number of times) are used instead of the outer problem's; requirespareto_front
to be set in the problem description dictionaries, otherwise the value of this indicator will beNaN
;rg_n_eval
should also be set in the problem description, but defaults to 1 if not;rggd+
: resampled ground generational distance plus; requirespareto_front
to be set in the problem description dictionaries, otherwise the value of this indicator will beNaN
;rg_n_eval
should also be set in the problem description, but defaults to 1 if not;rg_n_eval
should also be set in the problem description, but defaults to 1 if not;rghv
: resampled ground hypervolume; requireshv_ref_point
to be set in the problem discription dictionaries, otherwise the value of this indicator will beNaN
;rg_n_eval
should also be set in the problem description, but defaults to 1 if not;rgigd
: resampled ground inverted generational distance; requirespareto_front
to be set in the problem description dictionaries, otherwise the value of this indicator will beNaN
;rg_n_eval
should also be set in the problem description, but defaults to 1 if not;rgigd+
: resampled ground inverted generational distance plus; requirespareto_front
to be set in the problem description dictionaries, otherwise the value of this indicator will beNaN
;rg_n_eval
should also be set in the problem description, but defaults to 1 if not.
In the result dataframe, the corresponding columns will be named
perf_<name of indicator>
, e.g.perf_igd
. If left unspecified, defaults to["igd"]
.- seeds (Optional[List[Optional[int]]]): List of seeds to use. The first seed will be used for the first run of every algorithm-problem pair, etc.
851 def all_pa_pairs(self) -> List[PAPair]: 852 """Generate the list of all problem-algorithm pairs.""" 853 everything = product( 854 self._algorithms.items(), 855 self._problems.items(), 856 ) 857 return [ 858 PAPair( 859 algorithm_description=aa, 860 algorithm_name=an, 861 problem_description=pp, 862 problem_name=pn, 863 ) 864 for (an, aa), (pn, pp) in everything 865 ]
Generate the list of all problem-algorithm pairs.
867 def all_par_triples(self) -> List[PARTriple]: 868 """Generate the list of all problem-algorithm-(run number) triples.""" 869 everything = product( 870 self._algorithms.items(), 871 self._problems.items(), 872 range(1, self._n_runs + 1), 873 ) 874 return [ 875 PARTriple( 876 algorithm_description=aa, 877 algorithm_name=an, 878 n_run=r, 879 problem_description=pp, 880 problem_name=pn, 881 ) 882 for (an, aa), (pn, pp), r in everything 883 ]
Generate the list of all problem-algorithm-(run number) triples.
885 def compute_global_pareto_populations( 886 self, n_jobs: int = -1, **joblib_kwargs 887 ) -> None: 888 """ 889 The global Pareto population of a problem-algorithm pair is the merged 890 population of all pareto populations across all runs of that pair. This 891 function calculates global Pareto population of all pairs and dumps it 892 to `<output_dir_path>/<problem>.<algorithm>.gpp.npz`. If that file 893 exists for a given problem-algorithm pair, then the global Pareto 894 population (of that pair) is not recalculated. 895 """ 896 logging.info("Computing global Pareto populations") 897 executor = Parallel(n_jobs=n_jobs, **joblib_kwargs) 898 executor( 899 delayed(Benchmark._compute_global_pareto_population)(self, p) 900 for p in self.all_pa_pairs() 901 if not ( 902 self._output_dir_path / p.global_pareto_population_filename() 903 ).is_file() 904 )
The global Pareto population of a problem-algorithm pair is the merged
population of all pareto populations across all runs of that pair. This
function calculates global Pareto population of all pairs and dumps it
to <output_dir_path>/<problem>.<algorithm>.gpp.npz
. If that file
exists for a given problem-algorithm pair, then the global Pareto
population (of that pair) is not recalculated.
906 def compute_performance_indicators( 907 self, n_jobs: int = -1, **joblib_kwargs 908 ) -> None: 909 """ 910 Computes all performance indicators and saves the corresponding 911 dataframes in 912 `output_path/<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`. 913 If that file exists for a given problem-algorithm-(run 914 number)-(performance indicator) tuple, then it is not recalculated. 915 """ 916 logging.info( 917 "Computing performance indicators: {}", 918 ", ".join(self._performance_indicators), 919 ) 920 everything = product( 921 self.all_par_triples(), self._performance_indicators 922 ) 923 executor = Parallel(n_jobs=n_jobs, **joblib_kwargs) 924 executor( 925 delayed(Benchmark._compute_performance_indicator)(self, t, pi) 926 for t, pi in everything 927 if not (self._output_dir_path / t.pi_filename(pi)).is_file() 928 )
Computes all performance indicators and saves the corresponding
dataframes in
output_path/<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv
.
If that file exists for a given problem-algorithm-(run
number)-(performance indicator) tuple, then it is not recalculated.
930 def consolidate(self) -> None: 931 """ 932 Merges all statistics dataframes 933 (`<problem_name>.<algorithm_name>.<n_run>.csv`) and all PI dataframes 934 (`<problem_name>.<algorithm_name>.<n_run>.pi.csv`) into a single 935 dataframe, and saves it under `output_dir_path/benchmark.csv`. 936 """ 937 logging.info("Consolidating statistics") 938 all_df = [] 939 for triple in self.all_par_triples(): 940 path = self._output_dir_path / triple.result_filename() 941 if not path.exists(): 942 logging.debug( 943 "Statistic file {} does not exist. The corresponding " 944 "triple [{}] most likely hasn't finished or failed", 945 path, 946 triple, 947 ) 948 continue 949 all_df.append(pd.read_csv(path)) 950 self._results = pd.concat(all_df, ignore_index=True) 951 self._results["timedelta"] = pd.to_timedelta( 952 self._results["timedelta"] 953 ) 954 self._results = self._results.astype( 955 { 956 "algorithm": "category", 957 "n_gen": "uint32", 958 "n_run": "uint32", 959 "problem": "category", 960 } 961 ) 962 963 logging.info("Consolidating performance indicators") 964 all_df = [] 965 for triple in self.all_par_triples(): 966 df = pd.DataFrame() 967 for pi_name in self._performance_indicators: 968 path = self._output_dir_path / triple.pi_filename(pi_name) 969 if not path.exists(): 970 logging.debug("PI file {} does not exist.", path) 971 continue 972 tmp = pd.read_csv(path) 973 if df.empty: 974 df = tmp 975 else: 976 col = "perf_" + pi_name 977 df[col] = tmp[col] 978 all_df.append(df) 979 980 self._results = self._results.merge( 981 pd.concat(all_df, ignore_index=True), 982 how="outer", 983 on=["algorithm", "problem", "n_gen", "n_run"], 984 ) 985 986 # ??? 987 if "Unnamed: 0" in self._results: 988 del self._results["Unnamed: 0"] 989 990 path = self._output_dir_path / "benchmark.csv" 991 logging.info("Writing results to {}", path) 992 self.dump_results(path, index=False)
Merges all statistics dataframes
(<problem_name>.<algorithm_name>.<n_run>.csv
) and all PI dataframes
(<problem_name>.<algorithm_name>.<n_run>.pi.csv
) into a single
dataframe, and saves it under output_dir_path/benchmark.csv
.
994 def dump_results(self, path: Union[Path, str], fmt: str = "csv", **kwargs): 995 """ 996 Dumps the internal `_result` dataframe. 997 998 Args: 999 path (Union[Path, str]): Path to the output file. 1000 fmt (str): Text or binary format supported by pandas, see 1001 `here <https://pandas.pydata.org/docs/user_guide/io.html>`_. 1002 CSV by default. 1003 kwargs: Will be passed on the `pandas.DataFrame.to_<fmt>` method. 1004 """ 1005 saver = { 1006 "csv": pd.DataFrame.to_csv, 1007 "excel": pd.DataFrame.to_excel, 1008 "feather": pd.DataFrame.to_feather, 1009 "gbq": pd.DataFrame.to_gbq, 1010 "hdf": pd.DataFrame.to_hdf, 1011 "html": pd.DataFrame.to_html, 1012 "json": pd.DataFrame.to_json, 1013 "parquet": pd.DataFrame.to_parquet, 1014 "pickle": pd.DataFrame.to_pickle, 1015 }[fmt] 1016 saver(self._results, path, **kwargs)
Dumps the internal _result
dataframe.
Arguments:
- path (Union[Path, str]): Path to the output file.
- fmt (str): Text or binary format supported by pandas, see here . CSV by default.
- kwargs: Will be passed on the
pandas.DataFrame.to_<fmt>
method.
1018 def final_results( 1019 self, 1020 timedeltas_to_microseconds: bool = True, 1021 reset_index: bool = True, 1022 ) -> pd.DataFrame: 1023 """ 1024 Returns a dataframe containing the final row of each 1025 algorithm/problem/n_run triple, i.e. the final record of each run of 1026 the benchmark. 1027 1028 If the `reset_index` argument set to `False`, the resulting dataframe 1029 will have a multiindex given by the (algorithm, problem, n_run) tuples, 1030 e.g. 1031 1032 n_gen timedelta perf_gd ... 1033 algorithm problem n_run ... 1034 nsga2 bnh 1 155 886181 0.477980 ... 1035 2 200 29909 0.480764 ... 1036 zdt1 1 400 752818 0.191490 ... 1037 2 305 979112 0.260930 ... 1038 1039 (note tha the `timedelta` column has been converted to microseconds, 1040 see the `timedeltas_to_microseconds` argument below). If `reset_index` 1041 is set to `True` (the default), then the index is reset, giving 1042 something like this: 1043 1044 algorithm problem n_run n_gen timedelta perf_gd ... 1045 0 nsga2 bnh 1 155 886181 0.477980 ... 1046 1 nsga2 bnh 2 200 29909 0.480764 ... 1047 2 nsga2 zdt1 1 400 752818 0.191490 ... 1048 3 nsga2 zdt1 2 305 979112 0.260930 ... 1049 1050 This form is easier to plot. 1051 1052 Args: 1053 reset_index (bool): Wether to reset the index. Defaults to 1054 `True`. 1055 timedeltas_to_microseconds (bool): Wether to convert the 1056 timedeltas column to microseconds. Defaults to `True`. 1057 1058 """ 1059 df = self._results.groupby(["algorithm", "problem", "n_run"]).last() 1060 if timedeltas_to_microseconds: 1061 df["timedelta"] = df["timedelta"].dt.microseconds 1062 return df.reset_index() if reset_index else df
Returns a dataframe containing the final row of each algorithm/problem/n_run triple, i.e. the final record of each run of the benchmark.
If the reset_index
argument set to False
, the resulting dataframe
will have a multiindex given by the (algorithm, problem, n_run) tuples,
e.g.
n_gen timedelta perf_gd ...
algorithm problem n_run ...
nsga2 bnh 1 155 886181 0.477980 ...
2 200 29909 0.480764 ...
zdt1 1 400 752818 0.191490 ...
2 305 979112 0.260930 ...
(note tha the timedelta
column has been converted to microseconds,
see the timedeltas_to_microseconds
argument below). If reset_index
is set to True
(the default), then the index is reset, giving
something like this:
algorithm problem n_run n_gen timedelta perf_gd ...
0 nsga2 bnh 1 155 886181 0.477980 ...
1 nsga2 bnh 2 200 29909 0.480764 ...
2 nsga2 zdt1 1 400 752818 0.191490 ...
3 nsga2 zdt1 2 305 979112 0.260930 ...
This form is easier to plot.
Arguments:
- reset_index (bool): Wether to reset the index. Defaults to
True
. - timedeltas_to_microseconds (bool): Wether to convert the
timedeltas column to microseconds. Defaults to
True
.
1064 def run( 1065 self, 1066 n_jobs: int = -1, 1067 n_post_processing_jobs: int = 2, 1068 **joblib_kwargs, 1069 ): 1070 """ 1071 Runs the benchmark sequentially. Makes your laptop go brr. The 1072 histories of all problems are progressively dumped in the specified 1073 output directory as the benchmark run. At the end, the benchmark 1074 results are dumped in `output_dir_path/benchmark.csv`. 1075 1076 Args: 1077 n_jobs (int): Number of processes to use. See the `joblib.Parallel`_ 1078 documentation. Defaults to `-1`, i.e. all CPUs are used. 1079 n_post_processing_jobs (int): Number of processes to use for post 1080 processing tasks (computing global Pareto populations and 1081 performance indicators). These are memory-intensive tasks. 1082 Defaults to `2`. 1083 joblib_kwargs (dict): Additional kwargs to pass on to the 1084 `joblib.Parallel`_ instance. 1085 1086 .. _joblib.Parallel: 1087 https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html 1088 """ 1089 if not os.path.isdir(self._output_dir_path): 1090 os.mkdir(self._output_dir_path) 1091 triples = self.all_par_triples() 1092 executor = Parallel(n_jobs=n_jobs, **joblib_kwargs) 1093 current_round = 0 1094 while ( 1095 self._max_retry < 0 or current_round <= self._max_retry 1096 ) and any(not self._par_triple_done(t) for t in triples): 1097 executor( 1098 delayed(Benchmark._run_par_triple)(self, t) 1099 for t in triples 1100 if not self._par_triple_done(t) 1101 ) 1102 current_round += 1 1103 if any(not self._par_triple_done(t) for t in triples): 1104 logging.warning( 1105 "Benchmark finished, but some triples could not be run " 1106 "successfully within the retry budget ({}):", 1107 self._max_retry, 1108 ) 1109 for t in filter(lambda x: not self._par_triple_done(x), triples): 1110 logging.warning(" [{}]", t) 1111 self.compute_global_pareto_populations( 1112 n_post_processing_jobs, **joblib_kwargs 1113 ) 1114 self.compute_performance_indicators( 1115 n_post_processing_jobs, **joblib_kwargs 1116 ) 1117 self.consolidate()
Runs the benchmark sequentially. Makes your laptop go brr. The
histories of all problems are progressively dumped in the specified
output directory as the benchmark run. At the end, the benchmark
results are dumped in output_dir_path/benchmark.csv
.
Arguments:
- n_jobs (int): Number of processes to use. See the joblib.Parallel
documentation. Defaults to
-1
, i.e. all CPUs are used. - n_post_processing_jobs (int): Number of processes to use for post
processing tasks (computing global Pareto populations and
performance indicators). These are memory-intensive tasks.
Defaults to
2
. - joblib_kwargs (dict): Additional kwargs to pass on to the joblib.Parallel instance.