nmoo.benchmark

A benchmarking utility. The following is perhaps one of the simplest benchmarks one could make:

import numpy as np

from pymoo.problems.multi import ZDT1
from pymoo.algorithms.moo.nsga2 import NSGA2
from nmoo import Benchmark, GaussianNoise, KNNAvg, WrappedProblem

zdt1 = WrappedProblem(ZDT1())
noisy_zdt1 = GaussianNoise(zdt1, np.zeros(2), 1)
knnavg_zdt1 = KNNAvg(noisy_zdt1, max_distance=1.0)

benchmark = Benchmark(
    problems={
        "knnavg": {
            "problem": knnavg_zdt1
        }
    },
    algorithms={
        "nsga2": {
            "algorithm": NSGA2()
        }
    },
    n_runs=3,
    output_dir_path="./out",
)

which simply runs vanilla NSGA2 [1] against a KNN-Averaging-denoised, Gaussian-noised synthetic ZDT1, 3 times. The benchmark can be executed with

benchmark.run()

and the ./out directory will be populated with various artefacts; see below.
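
Once the run is complete, the consolidated results can also be inspected directly from the Benchmark object. The following is only a minimal sketch (it assumes the benchmark defined above has finished running); final_results is part of the Benchmark API documented further down this page.

df = benchmark.final_results()  # one row per (algorithm, problem, run number)
print(df[["algorithm", "problem", "n_run", "n_gen", "perf_igd"]])

# Mean final IGD over the 3 runs of each problem-algorithm pair
print(df.groupby(["algorithm", "problem"])["perf_igd"].mean())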

Refer to https://github.com/altaris/noisy-moo/blob/main/example.ipynb to get started, or to https://github.com/altaris/noisy-moo/blob/main/example.py for a more complete example.

Artefact specification

After the benchmark above is run, the ./out directory is populated with the following artefacts (two short loading sketches follow the list):

  • benchmark.csv: the main result file. It has one row per (algorithm, problem, run number, generation) tuple. The columns are n_gen, n_eval, timedelta, algorithm, problem, n_run, and perf_igd. Here is a sample:

    n_gen,n_eval,timedelta,algorithm,problem,n_run,perf_igd
    1,100,0 days 00:00:00.046010,nsga2,knnavg,1,2.7023936601855274
    2,200,0 days 00:00:00.110027,nsga2,knnavg,1,2.9920028540271617
    3,300,0 days 00:00:00.170194,nsga2,knnavg,1,2.808592743167947
    4,400,0 days 00:00:00.234336,nsga2,knnavg,1,2.7716447570482603
    5,500,0 days 00:00:00.300136,nsga2,knnavg,1,2.76605547730596
    6,600,0 days 00:00:00.367092,nsga2,knnavg,1,2.016998447316908
    7,700,0 days 00:00:00.432571,nsga2,knnavg,1,2.025674566580406
    8,800,0 days 00:00:00.501700,nsga2,knnavg,1,1.7875644431157067
    9,900,0 days 00:00:00.571355,nsga2,knnavg,1,2.5705921276809542
    
  • <problem>.<algorithm>.<run number>.csv: same as benchmark.csv but only for a given (algorithm, problem, run number) triple.

  • <problem>.<algorithm>.<run number>.pi-<perf. indicator>.csv: performance indicator file. It contains one row per generation. The columns are perf_<perf. indicator name>, algorithm, problem, n_gen, n_run (the leading unnamed column is the pandas index). Here is a sample from knnavg.nsga2.1.pi-igd.csv:

    ,perf_igd,algorithm,problem,n_gen,n_run
    0,2.7023936601855274,nsga2,knnavg,1,1
    1,2.9920028540271617,nsga2,knnavg,2,1
    2,2.808592743167947,nsga2,knnavg,3,1
    3,2.7716447570482603,nsga2,knnavg,4,1
    4,2.76605547730596,nsga2,knnavg,5,1
    5,2.0169984473169076,nsga2,knnavg,6,1
    6,2.025674566580406,nsga2,knnavg,7,1
    7,1.7875644431157067,nsga2,knnavg,8,1
    8,2.5705921276809542,nsga2,knnavg,9,1
    9,2.245542743713137,nsga2,knnavg,10,1
    
  • <problem>.<algorithm>.<run number>.<layer number>-<layer name>.npz: NPZ archive containing the history of all calls to a given layer of a given problem. In the example above, problem knnavg_zdt1 has three layers: knn_avg (layer 1, the outermost one), gaussian_noise (layer 2), and wrapped_problem (layer 3, the innermost one). Recall that you can set the name of a layer using the name argument in WrappedProblem.__init__. The keys are X, F, _batch, _run. The archive may also contain keys G, dF, dG, ddF, ddG, CV, feasible, depending on the ground pymoo problem. The arrays at each key have the same length (shape[0]), which is the number of individuals that have been evaluated throughout that run. In the example above, out/knnavg.nsga2.1.1-knn_avg.npz has keys X, F, _batch, _run, and the arrays have shape (19600, 30), (19600, 2), (19600,), (19600,), respectively: 30 is the number of variables of ZDT1, while 2 is the number of objectives.

  • <problem>.<algorithm>.<run number>.pp.npz: Pareto population of a given (algorithm, problem, run number) triple. The keys are X, F, G, dF, dG, ddF, ddG, CV, feasible, _batch, and all arrays have the same length (shape[0]). Row i corresponds to an individual that was Pareto-ideal at generation _batch[i].

  • <problem>.<algorithm>.gpp.npz: Global Pareto population of a given problem-algorithm pair, i.e. the Pareto population of all individuals generated across all runs and all generations of that pair. It is used to compute certain performance indicators in the absence of a baseline Pareto front. The keys are X, F, G, dF, dG, ddF, ddG, CV, feasible, _batch.
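
As a first loading sketch, here is one way to read the main result file back with pandas. This is just an illustration (not part of the nmoo API); note that the timedelta column is stored as a plain string in the CSV and must be parsed explicitly.

import pandas as pd

results = pd.read_csv("./out/benchmark.csv")
results["timedelta"] = pd.to_timedelta(results["timedelta"])

# Performance indicator values at the last generation of each run
last = results.groupby(["algorithm", "problem", "n_run"]).last()
print(last["perf_igd"])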

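The NPZ archives can be read back with numpy, and the _batch array makes it convenient to slice a history generation by generation. A minimal sketch, using the top-layer history file from the example above:

import numpy as np

history = np.load("./out/knnavg.nsga2.1.1-knn_avg.npz")
print(history["X"].shape, history["F"].shape)  # e.g. (19600, 30) and (19600, 2)

# Objective values of the individuals evaluated during generation 5
F_gen5 = history["F"][history["_batch"] == 5]
print(F_gen5.shape)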

  [1] Deb, K., Agrawal, S., Pratap, A., Meyarivan, T. (2000). A Fast Elitist Non-dominated Sorting Genetic Algorithm for Multi-objective Optimization: NSGA-II. In: Parallel Problem Solving from Nature PPSN VI (PPSN 2000). Lecture Notes in Computer Science, vol 1917. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45356-3_83

   1# pylint: disable=too-many-lines
   2
   3"""
   4A benchmarking utility. The following is perhaps one of the simplest
   5benchmarks one could make
   6```py
   7from pymoo.problems.multi import ZDT1
   8from pymoo.algorithms.moo.nsga2 import NSGA2
   9from nmoo import Benchmark, GaussianNoise, KNNAvg, WrappedProblem
  10
  11zdt1 = WrappedProblem(ZDT1())
  12noisy_zdt1 = GaussianNoise(zdt1, np.zeros(2), 1)
  13knnavg_zdt1 = KNNAvg(noisy_zdt1, max_distance=1.0)
  14
  15benchmark = Benchmark(
  16    problems={
  17        "knnavg": {
  18            "problem": knnavg_zdt1
  19        }
  20    },
  21    algorithms={
  22        "nsga2": {
  23            "algorithm": NSGA2()
  24        }
  25    },
  26    n_runs=3,
  27    output_dir_path="./out",
  28)
  29```
  30which simply runs vanilla `NSGA2`[^nsga2] against a KNN-Averaging-denoised,
  31Gaussian-noised synthetic
  32[ZDT1](https://pymoo.org/problems/multi/zdt.html#ZDT1), 3
  33times. The benchmark can be executed with
  34```
  35benchmark.run()
  36```
  37and the `./out` directory will be populated with various artefacts, see below.
  38
  39Refer to https://github.com/altaris/noisy-moo/blob/main/example.ipynb to get
  40started, or to https://github.com/altaris/noisy-moo/blob/main/example.py for a
  41more complete example.
  42
  43## Artefact specification
  44
  45After the benchmark above is run, the `./out` directory is populated with the
  46following artefacts:
  47
  48* `benchmark.csv` the main result file. It has one row per (algorithm, problem,
  49  run number, generation). The columns are: `n_gen`, `n_eval`, `timedelta`,
  50  `algorithm`, `problem`, `n_run`, `perf_igd`. Here is a sample
  51
  52        n_gen,n_eval,timedelta,algorithm,problem,n_run,perf_igd
  53        1,100,0 days 00:00:00.046010,nsga2,knnavg,1,2.7023936601855274
  54        2,200,0 days 00:00:00.110027,nsga2,knnavg,1,2.9920028540271617
  55        3,300,0 days 00:00:00.170194,nsga2,knnavg,1,2.808592743167947
  56        4,400,0 days 00:00:00.234336,nsga2,knnavg,1,2.7716447570482603
  57        5,500,0 days 00:00:00.300136,nsga2,knnavg,1,2.76605547730596
  58        6,600,0 days 00:00:00.367092,nsga2,knnavg,1,2.016998447316908
  59        7,700,0 days 00:00:00.432571,nsga2,knnavg,1,2.025674566580406
  60        8,800,0 days 00:00:00.501700,nsga2,knnavg,1,1.7875644431157067
  61        9,900,0 days 00:00:00.571355,nsga2,knnavg,1,2.5705921276809542
  62
  63* `<problem>.<algorithm>.<run number>.csv`: same as `benchmark.csv` but only
  64  for a given (algorithm, problem, run number) triple.
  65
  66* `<problem>.<algorithm>.<run number>.pi-<perf. indicator>.csv`: performance
  67  indicator file. Contains one row per generation. The columns are `perf_<perf.
  68  indicator name>`, `algorithm`, `problem`, `n_gen`, `n_run`. Here is a sample
  69  from `knnavg.nsga2.1.pi-igd.csv`:
  70
  71        ,perf_igd,algorithm,problem,n_gen,n_run
  72        0,2.7023936601855274,nsga2,knnavg,1,1
  73        1,2.9920028540271617,nsga2,knnavg,2,1
  74        2,2.808592743167947,nsga2,knnavg,3,1
  75        3,2.7716447570482603,nsga2,knnavg,4,1
  76        4,2.76605547730596,nsga2,knnavg,5,1
  77        5,2.0169984473169076,nsga2,knnavg,6,1
  78        6,2.025674566580406,nsga2,knnavg,7,1
  79        7,1.7875644431157067,nsga2,knnavg,8,1
  80        8,2.5705921276809542,nsga2,knnavg,9,1
  81        9,2.245542743713137,nsga2,knnavg,10,1
  82
  83* `<problem>.<algorithm>.<run number>.<layer number>-<layer name>.npz`: NPZ
  84  archive containing the history of all calls to a given layer of a given
  85  problem. In the example above, problem `knnavg_zdt1` has three layers:
  86  `knn_avg` (layer 1, the outermost one), `gaussian_noise` (layer 2), and
  87  `wrapped_problem` (layer 3, the innermost one). Recall that you can set the
  88  name of a layer using the `name` argument in `WrappedProblem.__init__`. The
  89  keys are `X`, `F`, `_batch`, `_run`. It may also contain keys `G`, `dF`,
  90  `dG`, `ddF`, `ddG`, `CV`, `feasible` depending on the ground pymoo problem.
  91  The arrays at each key have the same length (`shape[0]`), which is the
  92  number of individuals that have been evaluated throughout that run. In our
  93  example above, `out/knnavg.nsga2.1.1-knn_avg.npz` has keys `X`, `F`,
  94  `_batch`, `_run`, and the arrays have shape `(19600, 30)`, `(19600, 2)`,
  95  `(19600,)`, `(19600,)`, respectively. 30 is the number of variables of ZDT1,
  96  while 2 is the number of objectives.
  97
  98* `<problem>.<algorithm>.<run number>.pp.npz`: Pareto population of a given
  99  (algorithm, problem, run number) triple. The keys are `X`, `F`, `G`, `dF`,
 100  `dG`, `ddF`, `ddG`, `CV`, `feasible`, `_batch`, and all arrays have the same
 101  length (`shape[0]`). Row `i` corresponds to an individual that was
 102  Pareto-ideal at generation `_batch[i]`.
 103
 104* `<problem>.<algorithm>.gpp.npz`: *Global Pareto population* of a given
 105  problem-algorithm pair. It is the Pareto population of the population of all
 106  individuals designed across all runs and all generations of a given
 107  problem-algorithm pair. It is used to compute certain performance indicators
 108  in the absence of a baseline Pareto front. The keys are `X`, `F`, `G`, `dF`,
 109  `dG`, `ddF`, `ddG`, `CV`, `feasible`, `_batch`.
 110
 111[^nsga2]: Deb, K., Agrawal, S., Pratap, A., Meyarivan, T. (2000). A Fast
 112    Elitist Non-dominated Sorting Genetic Algorithm for Multi-objective
 113    Optimization: NSGA-II. In: , et al. Parallel Problem Solving from Nature
 114    PPSN VI. PPSN 2000. Lecture Notes in Computer Science, vol 1917. Springer,
 115    Berlin, Heidelberg. https://doi.org/10.1007/3-540-45356-3_83
 116
 117"""
 118__docformat__ = "google"
 119
 120import os
 121from copy import deepcopy
 122from dataclasses import dataclass
 123from itertools import product
 124from pathlib import Path
 125from typing import Any, Callable, Dict, List, Optional, Union
 126
 127import numpy as np
 128import pandas as pd
 129from joblib import Parallel, delayed
 130from loguru import logger as logging
 131from pymoo.config import Config
 132from pymoo.factory import get_performance_indicator
 133from pymoo.optimize import minimize
 134
 135from nmoo.callbacks import TimerCallback
 136from nmoo.denoisers import ResampleAverage
 137from nmoo.indicators import DeltaF, DeltaFPareto
 138from nmoo.utils.population import pareto_frontier_mask, population_list_to_dict
 139from nmoo.utils.logging import configure_logging
 140
 141Config.show_compile_hint = False
 142
 143_PIC = Callable[[Dict[str, np.ndarray]], Optional[float]]
 144"""
 145Performance Indicator Callable. Type of a function that takes a state (dict of
 146`np.ndarray` with keys e.g. `F`, `X`, `pF`, etc.) and returns the value of a
 147performance indicator. See `Benchmark._compute_performance_indicator`.
 148"""
 149
 150
 151@dataclass
 152class PAPair:
 153    """
 154    Represents a problem-algorithm pair.
 155    """
 156
 157    algorithm_description: Dict[str, Any]
 158    algorithm_name: str
 159    problem_description: Dict[str, Any]
 160    problem_name: str
 161
 162    def __str__(self) -> str:
 163        return f"{self.problem_name}|{self.algorithm_name}"
 164
 165    def global_pareto_population_filename(self) -> str:
 166        """Returns `<problem_name>.<algorithm_name>.gpp.npz`."""
 167        return f"{self.problem_name}.{self.algorithm_name}.gpp.npz"
 168
 169
 170@dataclass
 171class PARTriple(PAPair):
 172    """
 173    Represents a problem-algorithm-(run number) triple.
 174    """
 175
 176    n_run: int
 177
 178    def __str__(self) -> str:
 179        return f"{self.problem_name}|{self.algorithm_name}|{self.n_run}"
 180
 181    def denoised_top_layer_history_filename(self) -> str:
 182        """
 183        Returns
  184        `<problem_name>.<algorithm_name>.<n_run>.1-<top_layer_name>.denoised.npz`.
 185        """
 186        prefix = self.filename_prefix()
 187        name = self.problem_description["problem"]._name
 188        return f"{prefix}.1-{name}.denoised.npz"
 189
 190    def denoised_top_layer_pareto_history_filename(self) -> str:
 191        """
 192        Returns
  193        `<problem_name>.<algorithm_name>.<n_run>.1-<top_layer_name>.pareto-denoised.npz`.
 194        """
 195        prefix = self.filename_prefix()
 196        name = self.problem_description["problem"]._name
 197        return f"{prefix}.1-{name}.pareto-denoised.npz"
 198
 199    def filename_prefix(self) -> str:
 200        """Returns `<problem_name>.<algorithm_name>.<n_run>`."""
 201        return f"{self.problem_name}.{self.algorithm_name}.{self.n_run}"
 202
 203    def innermost_layer_history_filename(self) -> str:
 204        """Returns the filename of the innermost layer history."""
 205        prefix = self.filename_prefix()
 206        problem = self.problem_description["problem"]
 207        inner = problem.innermost_wrapper()
 208        name, depth = inner._name, problem.depth()
 209        return f"{prefix}.{depth}-{name}.npz"
 210
 211    def pareto_population_filename(self) -> str:
 212        """Returns `<problem_name>.<algorithm_name>.<n_run>.pp.npz`."""
 213        return self.filename_prefix() + ".pp.npz"
 214
 215    def pi_filename(self, pi_name: str) -> str:
 216        """
 217        Returns `<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`.
 218        """
 219        return self.filename_prefix() + f".pi-{pi_name}.csv"
 220
 221    def result_filename(self) -> str:
 222        """Returns `<problem_name>.<algorithm_name>.<n_run>.csv`."""
 223        return self.filename_prefix() + ".csv"
 224
 225    def top_layer_history_filename(self) -> str:
 226        """Returns the filename of the top layer history."""
 227        prefix = self.filename_prefix()
 228        name = self.problem_description["problem"]._name
 229        return f"{prefix}.1-{name}.npz"
 230
 231
 232# pylint: disable=too-many-instance-attributes
 233class Benchmark:
 234    """
  235    A benchmark is constructed with a list of problem and pymoo algorithm
  236    descriptions, and runs each algorithm against each problem, storing all
 237    histories for later analysis.
 238    """
 239
 240    SUPPORTED_PERFOMANCE_INDICATORS = [
 241        "df",
 242        "dfp",
 243        "gd",
 244        "gd+",
 245        "ggd",
 246        "ggd+",
 247        "ghv",
 248        "gigd",
 249        "gigd+",
 250        "hv",
 251        "igd",
 252        "igd+",
 253        "ps",
 254        "rggd",
 255        "rggd+",
 256        "rghv",
 257        "rgigd",
 258        "rgigd+",
 259    ]
 260
 261    _algorithms: Dict[str, dict]
 262    """
 263    List of algorithms to be benchmarked.
 264    """
 265
 266    _dump_histories: bool
 267    """
  268    Whether the history of each `WrappedProblem` involved in this benchmark
 269    should be written to disk.
 270    """
 271
 272    _max_retry: int
 273    """
 274    Maximum number of attempts to run a given problem-algorithm-(run number)
 275    triple before giving up.
 276    """
 277
 278    _n_runs: int
 279    """
 280    Number of times to run a given problem-algorithm pair.
 281    """
 282
 283    _output_dir_path: Path
 284    """
 285    Path of the output directory.
 286    """
 287
 288    _performance_indicators: List[str]
 289    """
  290    List of performance indicators to calculate during the benchmark.
 291    """
 292
 293    _problems: Dict[str, dict]
 294    """
 295    List of problems to be benchmarked.
 296    """
 297
 298    _results: pd.DataFrame
 299    """
 300    Results of all runs.
 301    """
 302
 303    _seeds: List[Optional[int]]
 304    """
 305    List of seeds to use. Must be of length `_n_runs`.
 306    """
 307
 308    def __init__(
 309        self,
 310        output_dir_path: Union[Path, str],
 311        problems: Dict[str, dict],
 312        algorithms: Dict[str, dict],
 313        n_runs: int = 1,
 314        dump_histories: bool = True,
 315        performance_indicators: Optional[List[str]] = None,
 316        max_retry: int = -1,
 317        seeds: Optional[List[Optional[int]]] = None,
 318    ):
 319        """
 320        Constructor. The set of problems to be benchmarked is represented by a
 321        dictionary with the following structure:
 322
 323            problems = {
 324                <problem_name>: <problem_description>,
 325                <problem_name>: <problem_description>,
 326            }
 327
 328        where `<problem_name>` is a user-defined string (but stay reasonable
 329        since it may be used in filenames), and `<problem_description>` is a
 330        dictionary with the following keys:
 331        * `df_n_evals` (int, optional): see the explanation of the `df`
 332            performance indicator below; defaults to `1`;
 333        * `evaluator` (optional): an algorithm evaluator object that will be
 334            applied to every algorithm that run on this problem; if an
 335            algorithm already has an evaluator attached to it (see
 336            `<algorithm_description>` below), the evaluator attached to this
 337            problem takes precedence; note that the evaluator is deepcopied for
 338            every run of `minimize`;
 339        * `hv_ref_point` (optional, `np.ndarray`): a reference point for
 340            computing hypervolume performance, see `performance_indicators`
 341            argument;
 342        * `pareto_front` (optional, `np.ndarray`): a Pareto front subset;
 343        * `problem`: a `WrappedProblem` instance.
 344
 345        The set of algorithms to be used is specified similarly::
 346
 347            algorithms = {
 348                <algorithm_name>: <algorithm_description>,
 349                <algorithm_name>: <algorithm_description>,
 350            }
 351
 352        where `<algorithm_name>` is a user-defined string (but stay reasonable
 353        since it may be used in filenames), and `<algorithm_description>` is a
  354        dictionary with the following keys:
  355        * `algorithm`: a pymoo `Algorithm` object; note that it is deepcopied
  356            for every run of `minimize`;
 357        * `display` (optional): a custom `pymoo.util.display.Display` object
 358            for customization purposes;
 359        * `evaluator` (optional): an algorithm evaluator object; note that it
 360            is deepcopied for every run of `minimize`;
 361        * `return_least_infeasible` (optional, bool): if the algorithm cannot
  362            find a feasible solution, whether the least infeasible solution
 363            should still be returned; defaults to `False`;
 364        * `termination` (optional): a pymoo termination criterion; note that it
 365            is deepcopied for every run of `minimize`;
  366        * `verbose` (optional, bool): whether outputs should be printed
  367            during the execution of the algorithm; defaults to `False`.
 368
 369        Args:
 370            algorithms (Dict[str, dict]): Dict of all algorithms to be
 371                benchmarked.
  372            dump_histories (bool): Whether the history of each
 373                `WrappedProblem` involved in this benchmark should be written
 374                to disk. Defaults to `True`.
  375            max_retry (int): Maximum number of attempts to run a given
 376                problem-algorithm-(run number) triple before giving up. Set it
 377                to `-1` to retry indefinitely.
 378            n_runs (int): Number of times to run a given problem-algorithm
 379                pair.
 380            problems (Dict[str, dict]): Dict of all problems to be benchmarked.
  381            performance_indicators (Optional[List[str]]): List of performance
 382                indicators to be calculated and included in the result
 383                dataframe (see `Benchmark.final_results`). Supported indicators
 384                are
 385                * `df`: ΔF metric, see the documentation of
  386                  `nmoo.indicators.delta_f.DeltaF`; `df_n_evals` should be set
  387                  in the problem description, but it defaults to 1 if not;
 388                * `dfp`: ΔF-Pareto metric, see the documentation of
  389                  `nmoo.indicators.delta_f_pareto.DeltaFPareto`; `df_n_evals`
  390                  should be set in the problem description, but it defaults to 1
  391                  if not;
 392                * `gd`: [generational
 393                  distance](https://pymoo.org/misc/indicators.html#Generational-Distance-(GD)),
 394                  requires `pareto_front` to be set in the problem description
 395                  dictionaries, otherwise the value of this indicator will be
 396                  `NaN`;
 397                * `gd+`: [generational distance
 398                  plus](https://pymoo.org/misc/indicators.html#Generational-Distance-Plus-(GD+)),
 399                  requires `pareto_front` to be set in the problem description
 400                  dictionaries, otherwise the value of this indicator will be
 401                  `NaN`;
 402                * `hv`:
 403                  [hypervolume](https://pymoo.org/misc/indicators.html#Hypervolume),
  404                  requires `hv_ref_point` to be set in the problem description
 405                  dictionaries, otherwise the value of this indicator will be
 406                  `NaN`;
 407                * `igd`: [inverted generational
 408                  distance](https://pymoo.org/misc/indicators.html#Inverted-Generational-Distance-(IGD)),
 409                  requires `pareto_front` to be set in the problem description
 410                  dictionaries, otherwise the value of this indicator will be
 411                  `NaN`;
 412                * `igd+`: [inverted generational distance
 413                  plus](https://pymoo.org/misc/indicators.html#Inverted-Generational-Distance-Plus-(IGD+)),
 414                  requires `pareto_front` to be set in the problem description
 415                  dictionaries, otherwise the value of this indicator will be
 416                  `NaN`;
 417                * `ps`: population size, or equivalently, the size of the
 418                  current Pareto front;
 419                * `ggd`: ground generational distance, where the ground
 420                  problem's predicted objective values are used instead of the
 421                  outer problem's; requires `pareto_front` to be set in the
 422                  problem description dictionaries, otherwise the value of this
 423                  indicator will be `NaN`;
 424                * `ggd+`: ground generational distance plus; requires
 425                  `pareto_front` to be set in the problem description
 426                  dictionaries, otherwise the value of this indicator will be
 427                  `NaN`;
 428                * `ghv`: ground hypervolume; requires `hv_ref_point` to be set
  429                  in the problem description dictionaries, otherwise the value
 430                  of this indicator will be `NaN`;
 431                * `gigd`: ground inverted generational distance; requires
 432                  `pareto_front` to be set in the problem description
 433                  dictionaries, otherwise the value of this indicator will be
 434                  `NaN`;
 435                * `gigd+`: ground inverted generational distance plus; requires
 436                  `pareto_front` to be set in the problem description
 437                  dictionaries, otherwise the value of this indicator will be
 438                  `NaN`.
 439                * `rggd`: resampled ground generational distance, where the
 440                  ground problem's predicted objective values (resampled and
 441                  averaged a given number of times) are used instead of the
 442                  outer problem's; requires `pareto_front` to be set in the
 443                  problem description dictionaries, otherwise the value of this
 444                  indicator will be `NaN`; `rg_n_eval` should also be set in
 445                  the problem description, but defaults to 1 if not;
 446                * `rggd+`: resampled ground generational distance plus;
 447                  requires `pareto_front` to be set in the problem description
 448                  dictionaries, otherwise the value of this indicator will be
  449                  `NaN`; `rg_n_eval` should also be set in the problem
  450                  description (it gives the number of resamplings used to
  451                  average the ground objective values), but defaults to 1 if
  452                  not;
 453                * `rghv`: resampled ground hypervolume; requires `hv_ref_point`
  454                  to be set in the problem description dictionaries, otherwise
 455                  the value of this indicator will be `NaN`; `rg_n_eval` should
 456                  also be set in the problem description, but defaults to 1 if
 457                  not;
 458                * `rgigd`: resampled ground inverted generational distance;
 459                  requires `pareto_front` to be set in the problem description
 460                  dictionaries, otherwise the value of this indicator will be
 461                  `NaN`; `rg_n_eval` should also be set in the problem
 462                  description, but defaults to 1 if not;
 463                * `rgigd+`: resampled ground inverted generational distance
 464                  plus; requires `pareto_front` to be set in the problem
 465                  description dictionaries, otherwise the value of this
 466                  indicator will be `NaN`; `rg_n_eval` should also be set in
 467                  the problem description, but defaults to 1 if not.
 468
 469                In the result dataframe, the corresponding columns will be
 470                named `perf_<name of indicator>`, e.g. `perf_igd`. If left
 471                unspecified, defaults to `["igd"]`.
 472
 473            seeds (Optional[List[Optional[int]]]): List of seeds to use. The
 474                first seed will be used for the first run of every
 475                algorithm-problem pair, etc.
 476        """
 477        self._output_dir_path = Path(output_dir_path)
 478        self._set_problems(problems)
 479        self._set_algorithms(algorithms)
 480        if n_runs <= 0:
 481            raise ValueError(
  482                "The number of runs (for each problem-algorithm pair) must be "
 483                "at least 1."
 484            )
 485        self._n_runs = n_runs
 486        self._dump_histories = dump_histories
 487        self._set_performance_indicators(performance_indicators)
 488        self._max_retry = max_retry
 489        if seeds is None:
 490            self._seeds = [None] * n_runs
 491        elif len(seeds) < n_runs:
 492            raise ValueError(
 493                f"Not enough seeds: provided {len(seeds)} seeds but specified "
 494                f"{n_runs} runs."
 495            )
 496        else:
 497            if len(seeds) > n_runs:
 498                logging.warning(
 499                    "Too many seeds: provided {} but only need {} "
 500                    "(i.e. n_run)",
 501                    len(seeds),
 502                    n_runs,
 503                )
 504            self._seeds = seeds
 505
 506    def _compute_global_pareto_population(self, pair: PAPair) -> None:
 507        """
 508        Computes the global Pareto population of a given problem-algorithm
 509        pair. See `compute_global_pareto_populations`. Assumes that the global
 510        Pareto population has not already been calculated, i.e. that
 511        `<output_dir_path>/<problem>.<algorithm>.gpp.npz` does not exist.
 512        """
 513        configure_logging(prefix=f"[{pair}]")
 514        logging.debug("Computing global Pareto population")
 515        gpp_path = (
 516            self._output_dir_path / pair.global_pareto_population_filename()
 517        )
 518        populations: Dict[str, List[np.ndarray]] = {}
 519        for n_run in range(1, self._n_runs + 1):
 520            triple = PARTriple(
 521                algorithm_description=pair.algorithm_description,
 522                algorithm_name=pair.algorithm_name,
 523                n_run=n_run,
 524                problem_description=pair.problem_description,
 525                problem_name=pair.problem_name,
 526            )
 527            path = self._output_dir_path / triple.pareto_population_filename()
 528            if not path.exists():
 529                logging.debug(
 530                    "File {} does not exist. The corresponding triple runs "
 531                    "most likely have not finished or all failed",
 532                    path,
 533                    triple,
 534                )
 535                continue
 536            data = np.load(path, allow_pickle=True)
 537            for k, v in data.items():
 538                populations[k] = populations.get(k, []) + [v]
 539            data.close()
 540
 541        consolidated = {k: np.concatenate(v) for k, v in populations.items()}
 542        if "F" not in consolidated:
 543            logging.error(
 544                "No Pareto population file found. The corresponding triple "
 545                "runs most likely have not finished or all failed",
 546            )
 547            return
 548        mask = pareto_frontier_mask(consolidated["F"])
 549        np.savez_compressed(
 550            gpp_path,
 551            **{k: v[mask] for k, v in consolidated.items()},
 552        )
 553
 554    # pylint: disable=too-many-branches
 555    # pylint: disable=too-many-locals
 556    def _compute_performance_indicator(
 557        self, triple: PARTriple, pi_name: str
 558    ) -> None:
 559        """
  560        Computes a performance indicator for a given problem-algorithm-(run
 561        number) triple and stores it under
 562        `<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`. Assumes
  563        that this performance indicator has not already been calculated,
 564        i.e. that that file does not exist.
 565
 566        Warning:
 567            This fails if either the top layer history or the pareto population
 568            artefact (`<problem_name>.<algorithm_name>.<n_run>.pp.npz`) could
 569            not be loaded as numpy arrays.
 570
 571        Todo:
 572            Refactor (again)
 573        """
 574        configure_logging(prefix=f"[{triple}|{pi_name}]")
 575        logging.debug("Computing PI")
 576
 577        pic: _PIC = lambda _: np.nan
 578        if pi_name in ["df", "dfp"]:
 579            problem = triple.problem_description["problem"]
 580            n_evals = triple.problem_description.get("df_n_evals", 1)
 581            Class, fn = {
 582                "df": (DeltaF, triple.denoised_top_layer_history_filename()),
 583                "dfp": (
 584                    DeltaFPareto,
 585                    triple.denoised_top_layer_pareto_history_filename(),
 586                ),
 587            }[pi_name]
 588            delta_f = Class(
 589                problem,
 590                n_evals,
 591                self._output_dir_path / fn,
 592            )
 593            pic = lambda s: delta_f.do(s["F"], s["X"])
 594        elif pi_name in ["gd", "gd+", "igd", "igd+"]:
 595            pic = self._get_pic_gd_type(triple, pi_name)
 596        elif pi_name in ["ggd", "ggd+", "gigd", "gigd+"]:
 597            pic = self._get_pic_gd_type(triple, pi_name[1:])
 598        elif pi_name in ["rggd", "rggd+", "rgigd", "rgigd+"]:
 599            pic = self._get_pic_gd_type(triple, pi_name[2:])
 600        elif (
 601            pi_name in ["hv", "ghv", "rghv"]
 602            and "hv_ref_point" in triple.problem_description
 603        ):
 604            ref_point = triple.problem_description["hv_ref_point"]
 605            pi = get_performance_indicator("hv", ref_point=ref_point)
 606            pic = lambda s: pi.do(s["F"])
 607        elif pi_name == "ps":
 608            pic = lambda s: s["X"].shape[0]
 609        else:
 610            logging.warning(
 611                "Unprocessable performance indicator {}. This could be "
 612                "because some required arguments (e.g. 'hv_ref_point') are "
 613                "missing",
 614                pi_name,
 615            )
 616
 617        # On which history is the PIC going to be called? By default, it is on
 618        # the top layer history.
 619        if pi_name in ["ps"]:
 620            history = np.load(
 621                self._output_dir_path / triple.pareto_population_filename()
 622            )
 623        elif pi_name in ["ggd", "ggd+", "ghv", "gigd", "gigd+"]:
 624            history = np.load(
 625                self._output_dir_path
 626                / triple.innermost_layer_history_filename()
 627            )
 628        elif pi_name in ["rggd", "rggd+", "rghv", "rgigd", "rgigd+"]:
 629            history = self._get_rg_history(triple)
 630        else:
 631            history = np.load(
 632                self._output_dir_path / triple.top_layer_history_filename()
 633            )
 634
 635        states: List[Dict[str, np.ndarray]] = []
 636        for i in range(1, history["_batch"].max() + 1):
 637            idx = history["_batch"] == i
 638            states.append({"X": history["X"][idx], "F": history["F"][idx]})
 639        df = pd.DataFrame()
 640        df["perf_" + pi_name] = list(map(pic, states))
 641        df["algorithm"] = triple.algorithm_name
 642        df["problem"] = triple.problem_name
 643        df["n_gen"] = range(1, len(states) + 1)
 644        df["n_run"] = triple.n_run
 645        logging.debug(
 646            "Writing result to {}",
 647            self._output_dir_path / triple.pi_filename(pi_name),
 648        )
 649        df.to_csv(self._output_dir_path / triple.pi_filename(pi_name))
 650
 651    def _get_pic_gd_type(self, triple: PARTriple, pi_name: str) -> _PIC:
 652        """
  653        Returns the `_PIC` corresponding to either the `gd`, `gd+`, `igd`,
 654        or `igd+` performance indicator. As a reminder, a `_PIC`, or
 655        Performance Indicator Callable, is a function that takes a dict of
 656        `np.ndarray` and returns an optional `float`. In this case, the dict
 657        must have the key `F`.
 658        """
 659        if "pareto_front" in triple.problem_description:
 660            pf = triple.problem_description.get("pareto_front")
 661        else:
 662            path = (
 663                self._output_dir_path
 664                / triple.global_pareto_population_filename()
 665            )
 666            data = np.load(path)
 667            pf = data["F"]
 668            data.close()
 669        pi = get_performance_indicator(pi_name, pf)
 670        return lambda s: pi.do(s["F"])
 671
 672    def _get_rg_history(self, triple: PARTriple) -> Dict[str, np.ndarray]:
 673        """
 674        Returns the `X` and `F` history of the ground problem of the triple,
  675        but where `F` has been resampled a given number of times (`rg_n_eval`
 676        parameter in the problem's description). This involves wrapping the
 677        ground problem in a `nmoo.denoisers.ResampleAverage` and evaluating the
 678        history's `X` array.
 679        """
 680        history = dict(
 681            np.load(
 682                self._output_dir_path / triple.top_layer_history_filename()
 683            )
 684        )
 685        rgp = ResampleAverage(
 686            triple.problem_description["problem"].ground_problem(),
 687            triple.problem_description.get("rg_n_eval", 1),
 688        )
 689        history["F"] = rgp.evaluate(history["X"], return_values_of="F")
 690        return history
 691
 692    def _par_triple_done(self, triple: PARTriple) -> bool:
 693        """
  694        Whether a problem-algorithm-(run number) triple has been successfully executed.
 695        This is determined by checking if
 696        `_output_dir_path/triple.result_filename()` exists or not.
 697        """
 698        return (self._output_dir_path / triple.result_filename()).is_file()
 699
 700    def _run_par_triple(
 701        self,
 702        triple: PARTriple,
 703    ) -> None:
 704        """
 705        Runs a given algorithm against a given problem. See
 706        `nmoo.benchmark.Benchmark.run`. Immediately dumps the history of the
 707        problem and all wrapped problems with the following naming scheme:
 708
 709            output_dir_path/<problem_name>.<algorithm_name>.<n_run>.<level>.npz
 710
 711        where `level` is the depth of the wrapped problem, starting at `1`. See
 712        `nmoo.wrapped_problem.WrappedProblem.dump_all_histories`. It also dumps
  713        the compounded Pareto population at every generation (or just
  714        the last generation if `set_history` is set to `False` in the algorithm
 715        description) in
 716
 717            output_dir_path/<problem_name>.<algorithm_name>.<n_run>.pp.npz
 718
 719        Additionally, it generates a CSV file containing various statistics
 720        named:
 721
 722            output_dir_path/<problem_name>.<algorithm_name>.<n_run>.csv
 723
 724        The existence of this file is also used to determine if the
 725        problem-algorithm-(run number) triple has already been run when
 726        resuming a benchmark.
 727
 728        Args:
 729            triple: A `PARTriple` object representing the
 730                problem-algorithm-(run number) triple to run.
 731        """
 732        configure_logging(prefix=f"[{str(triple)}]")
 733        logging.info("Starting run")
 734
 735        triple.problem_description["problem"].start_new_run()
 736        evaluator = triple.problem_description.get(
 737            "evaluator",
 738            triple.algorithm_description.get("evaluator"),
 739        )
 740        try:
 741            seed = self._seeds[triple.n_run - 1]
 742            problem = deepcopy(triple.problem_description["problem"])
 743            problem.reseed(seed)
 744            results = minimize(
 745                problem,
 746                triple.algorithm_description["algorithm"],
 747                termination=triple.algorithm_description.get("termination"),
 748                copy_algorithm=True,
 749                copy_termination=True,
 750                # extra Algorithm.setup kwargs
 751                callback=TimerCallback(),
 752                display=triple.algorithm_description.get("display"),
 753                evaluator=deepcopy(evaluator),
 754                return_least_infeasible=triple.algorithm_description.get(
 755                    "return_least_infeasible", False
 756                ),
 757                save_history=True,
 758                seed=seed,
 759                verbose=triple.algorithm_description.get("verbose", False),
 760            )
 761        except Exception as e:  # pylint: disable=broad-except
 762            logging.error("Run failed: {}", e)
 763            return
 764        else:
 765            logging.success("Run successful")
 766
 767        # Dump all layers histories
 768        logging.debug("Writing layer histories to {}", self._output_dir_path)
 769        if self._dump_histories:
 770            results.problem.dump_all_histories(
 771                self._output_dir_path,
 772                triple.filename_prefix(),
 773            )
 774
 775        # Dump pareto sets
 776        logging.debug(
 777            "Writing pareto sets to {}",
 778            self._output_dir_path / triple.pareto_population_filename(),
 779        )
 780        np.savez_compressed(
 781            self._output_dir_path / triple.pareto_population_filename(),
 782            **population_list_to_dict([h.opt for h in results.history]),
 783        )
 784
 785        # Create and dump CSV file
 786        logging.debug(
 787            "Writing result CSV to {}",
 788            self._output_dir_path / triple.result_filename(),
 789        )
 790        df = pd.DataFrame()
 791        df["n_gen"] = [a.n_gen for a in results.history]
 792        df["n_eval"] = [a.evaluator.n_eval for a in results.history]
 793        df["timedelta"] = results.algorithm.callback._deltas
 794        # Important to create these columns once the dataframe has its full
 795        # length
 796        df["algorithm"] = triple.algorithm_name
 797        df["problem"] = triple.problem_name
 798        df["n_run"] = triple.n_run
 799        df.to_csv(
 800            self._output_dir_path / triple.result_filename(),
 801            index=False,
 802        )
 803
 804    def _set_algorithms(self, algorithms: Dict[str, dict]) -> None:
 805        """Validates and sets the algorithms dict"""
 806        if not algorithms:
 807            raise ValueError("A benchmark requires at least 1 algorithm.")
 808        for k, v in algorithms.items():
 809            if not isinstance(v, dict):
 810                raise ValueError(
 811                    f"Description for algorithm '{k}' must be a dict."
 812                )
 813            if "algorithm" not in v:
 814                raise ValueError(
 815                    f"Description for algorithm '{k}' is missing mandatory "
 816                    "key 'algorithm'."
 817                )
 818        self._algorithms = algorithms
 819
 820    def _set_performance_indicators(
 821        self, performance_indicators: Optional[List[str]]
 822    ) -> None:
 823        """Validates and sets the performance indicator list"""
 824        if performance_indicators is None:
 825            self._performance_indicators = ["igd"]
 826        else:
 827            self._performance_indicators = []
 828            for pi in set(performance_indicators):
 829                if pi not in Benchmark.SUPPORTED_PERFOMANCE_INDICATORS:
 830                    raise ValueError(f"Unknown performance indicator '{pi}'")
 831                self._performance_indicators.append(pi)
 832            self._performance_indicators = sorted(self._performance_indicators)
 833
 834    def _set_problems(self, problems: Dict[str, dict]) -> None:
 835        """Validates and sets the problem dict"""
 836        if not problems:
 837            raise ValueError("A benchmark requires at least 1 problem.")
 838        for k, v in problems.items():
 839            if not isinstance(v, dict):
 840                raise ValueError(
 841                    f"Description for problem '{k}' must be a dict."
 842                )
 843            if "problem" not in v:
 844                raise ValueError(
 845                    f"Description for problem '{k}' is missing mandatory key "
 846                    "'problem'."
 847                )
 848        self._problems = problems
 849
 850    def all_pa_pairs(self) -> List[PAPair]:
 851        """Generate the list of all problem-algorithm pairs."""
 852        everything = product(
 853            self._algorithms.items(),
 854            self._problems.items(),
 855        )
 856        return [
 857            PAPair(
 858                algorithm_description=aa,
 859                algorithm_name=an,
 860                problem_description=pp,
 861                problem_name=pn,
 862            )
 863            for (an, aa), (pn, pp) in everything
 864        ]
 865
 866    def all_par_triples(self) -> List[PARTriple]:
 867        """Generate the list of all problem-algorithm-(run number) triples."""
 868        everything = product(
 869            self._algorithms.items(),
 870            self._problems.items(),
 871            range(1, self._n_runs + 1),
 872        )
 873        return [
 874            PARTriple(
 875                algorithm_description=aa,
 876                algorithm_name=an,
 877                n_run=r,
 878                problem_description=pp,
 879                problem_name=pn,
 880            )
 881            for (an, aa), (pn, pp), r in everything
 882        ]
 883
 884    def compute_global_pareto_populations(
 885        self, n_jobs: int = -1, **joblib_kwargs
 886    ) -> None:
 887        """
 888        The global Pareto population of a problem-algorithm pair is the merged
 889        population of all pareto populations across all runs of that pair. This
 890        function calculates global Pareto population of all pairs and dumps it
 891        to `<output_dir_path>/<problem>.<algorithm>.gpp.npz`. If that file
 892        exists for a given problem-algorithm pair, then the global Pareto
 893        population (of that pair) is not recalculated.
 894        """
 895        logging.info("Computing global Pareto populations")
 896        executor = Parallel(n_jobs=n_jobs, **joblib_kwargs)
 897        executor(
 898            delayed(Benchmark._compute_global_pareto_population)(self, p)
 899            for p in self.all_pa_pairs()
 900            if not (
 901                self._output_dir_path / p.global_pareto_population_filename()
 902            ).is_file()
 903        )
 904
 905    def compute_performance_indicators(
 906        self, n_jobs: int = -1, **joblib_kwargs
 907    ) -> None:
 908        """
 909        Computes all performance indicators and saves the corresponding
 910        dataframes in
 911        `output_path/<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`.
 912        If that file exists for a given problem-algorithm-(run
 913        number)-(performance indicator) tuple, then it is not recalculated.
 914        """
 915        logging.info(
 916            "Computing performance indicators: {}",
 917            ", ".join(self._performance_indicators),
 918        )
 919        everything = product(
 920            self.all_par_triples(), self._performance_indicators
 921        )
 922        executor = Parallel(n_jobs=n_jobs, **joblib_kwargs)
 923        executor(
 924            delayed(Benchmark._compute_performance_indicator)(self, t, pi)
 925            for t, pi in everything
 926            if not (self._output_dir_path / t.pi_filename(pi)).is_file()
 927        )
 928
 929    def consolidate(self) -> None:
 930        """
 931        Merges all statistics dataframes
 932        (`<problem_name>.<algorithm_name>.<n_run>.csv`) and all PI dataframes
  933        (`<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`) into a single
 934        dataframe, and saves it under `output_dir_path/benchmark.csv`.
 935        """
 936        logging.info("Consolidating statistics")
 937        all_df = []
 938        for triple in self.all_par_triples():
 939            path = self._output_dir_path / triple.result_filename()
 940            if not path.exists():
 941                logging.debug(
 942                    "Statistic file {} does not exist. The corresponding "
 943                    "triple [{}] most likely hasn't finished or failed",
 944                    path,
 945                    triple,
 946                )
 947                continue
 948            all_df.append(pd.read_csv(path))
 949        self._results = pd.concat(all_df, ignore_index=True)
 950        self._results["timedelta"] = pd.to_timedelta(
 951            self._results["timedelta"]
 952        )
 953        self._results = self._results.astype(
 954            {
 955                "algorithm": "category",
 956                "n_gen": "uint32",
 957                "n_run": "uint32",
 958                "problem": "category",
 959            }
 960        )
 961
 962        logging.info("Consolidating performance indicators")
 963        all_df = []
 964        for triple in self.all_par_triples():
 965            df = pd.DataFrame()
 966            for pi_name in self._performance_indicators:
 967                path = self._output_dir_path / triple.pi_filename(pi_name)
 968                if not path.exists():
 969                    logging.debug("PI file {} does not exist.", path)
 970                    continue
 971                tmp = pd.read_csv(path)
 972                if df.empty:
 973                    df = tmp
 974                else:
 975                    col = "perf_" + pi_name
 976                    df[col] = tmp[col]
 977            all_df.append(df)
 978
 979        self._results = self._results.merge(
 980            pd.concat(all_df, ignore_index=True),
 981            how="outer",
 982            on=["algorithm", "problem", "n_gen", "n_run"],
 983        )
 984
 985        # ???
 986        if "Unnamed: 0" in self._results:
 987            del self._results["Unnamed: 0"]
 988
 989        path = self._output_dir_path / "benchmark.csv"
 990        logging.info("Writing results to {}", path)
 991        self.dump_results(path, index=False)
 992
 993    def dump_results(self, path: Union[Path, str], fmt: str = "csv", **kwargs):
 994        """
  995        Dumps the internal `_results` dataframe.
 996
 997        Args:
 998            path (Union[Path, str]): Path to the output file.
 999            fmt (str): Text or binary format supported by pandas, see
1000                `here <https://pandas.pydata.org/docs/user_guide/io.html>`_.
1001                CSV by default.
 1002            kwargs: Will be passed on to the `pandas.DataFrame.to_<fmt>` method.
1003        """
1004        saver = {
1005            "csv": pd.DataFrame.to_csv,
1006            "excel": pd.DataFrame.to_excel,
1007            "feather": pd.DataFrame.to_feather,
1008            "gbq": pd.DataFrame.to_gbq,
1009            "hdf": pd.DataFrame.to_hdf,
1010            "html": pd.DataFrame.to_html,
1011            "json": pd.DataFrame.to_json,
1012            "parquet": pd.DataFrame.to_parquet,
1013            "pickle": pd.DataFrame.to_pickle,
1014        }[fmt]
1015        saver(self._results, path, **kwargs)
1016
1017    def final_results(
1018        self,
1019        timedeltas_to_microseconds: bool = True,
1020        reset_index: bool = True,
1021    ) -> pd.DataFrame:
1022        """
1023        Returns a dataframe containing the final row of each
1024        algorithm/problem/n_run triple, i.e. the final record of each run of
1025        the benchmark.
1026
 1027        If the `reset_index` argument is set to `False`, the resulting dataframe
1028        will have a multiindex given by the (algorithm, problem, n_run) tuples,
1029        e.g.
1030
1031                                     n_gen  timedelta   perf_gd  ...
1032            algorithm problem n_run                              ...
1033            nsga2     bnh     1        155     886181  0.477980  ...
1034                              2        200      29909  0.480764  ...
1035                      zdt1    1        400     752818  0.191490  ...
1036                              2        305     979112  0.260930  ...
1037
 1038        (note that the `timedelta` column has been converted to microseconds,
1039        see the `timedeltas_to_microseconds` argument below). If `reset_index`
1040        is set to `True` (the default), then the index is reset, giving
1041        something like this:
1042
1043              algorithm problem  n_run  n_gen  timedelta   perf_gd  ...
1044            0     nsga2     bnh      1    155     886181  0.477980  ...
1045            1     nsga2     bnh      2    200      29909  0.480764  ...
1046            2     nsga2    zdt1      1    400     752818  0.191490  ...
1047            3     nsga2    zdt1      2    305     979112  0.260930  ...
1048
1049        This form is easier to plot.
1050
1051        Args:
 1052            reset_index (bool): Whether to reset the index. Defaults to
1053                `True`.
 1054            timedeltas_to_microseconds (bool): Whether to convert the
1055                timedeltas column to microseconds. Defaults to `True`.
1056
1057        """
1058        df = self._results.groupby(["algorithm", "problem", "n_run"]).last()
1059        if timedeltas_to_microseconds:
1060            df["timedelta"] = df["timedelta"].dt.microseconds
1061        return df.reset_index() if reset_index else df
1062
1063    def run(
1064        self,
1065        n_jobs: int = -1,
1066        n_post_processing_jobs: int = 2,
1067        **joblib_kwargs,
1068    ):
1069        """
 1070        Runs the benchmark. Makes your laptop go brr. The
 1071        histories of all problems are progressively dumped in the specified
 1072        output directory as the benchmark runs. At the end, the benchmark
1073        results are dumped in `output_dir_path/benchmark.csv`.
1074
1075        Args:
1076            n_jobs (int): Number of processes to use. See the `joblib.Parallel`_
1077                documentation. Defaults to `-1`, i.e. all CPUs are used.
1078            n_post_processing_jobs (int): Number of processes to use for post
1079                processing tasks (computing global Pareto populations and
1080                performance indicators). These are memory-intensive tasks.
1081                Defaults to `2`.
1082            joblib_kwargs (dict): Additional kwargs to pass on to the
1083                `joblib.Parallel`_ instance.
1084
1085        .. _joblib.Parallel:
1086            https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html
1087        """
1088        if not os.path.isdir(self._output_dir_path):
1089            os.mkdir(self._output_dir_path)
1090        triples = self.all_par_triples()
1091        executor = Parallel(n_jobs=n_jobs, **joblib_kwargs)
1092        current_round = 0
1093        while (
1094            self._max_retry < 0 or current_round <= self._max_retry
1095        ) and any(not self._par_triple_done(t) for t in triples):
1096            executor(
1097                delayed(Benchmark._run_par_triple)(self, t)
1098                for t in triples
1099                if not self._par_triple_done(t)
1100            )
1101            current_round += 1
1102        if any(not self._par_triple_done(t) for t in triples):
1103            logging.warning(
1104                "Benchmark finished, but some triples could not be run "
1105                "successfully within the retry budget ({}):",
1106                self._max_retry,
1107            )
1108            for t in filter(lambda x: not self._par_triple_done(x), triples):
1109                logging.warning("    [{}]", t)
1110        self.compute_global_pareto_populations(
1111            n_post_processing_jobs, **joblib_kwargs
1112        )
1113        self.compute_performance_indicators(
1114            n_post_processing_jobs, **joblib_kwargs
1115        )
1116        self.consolidate()
class PAPair:
153class PAPair:
154    """
155    Represents a problem-algorithm pair.
156    """
157
158    algorithm_description: Dict[str, Any]
159    algorithm_name: str
160    problem_description: Dict[str, Any]
161    problem_name: str
162
163    def __str__(self) -> str:
164        return f"{self.problem_name}|{self.algorithm_name}"
165
166    def global_pareto_population_filename(self) -> str:
167        """Returns `<problem_name>.<algorithm_name>.gpp.npz`."""
168        return f"{self.problem_name}.{self.algorithm_name}.gpp.npz"

Represents a problem-algorithm pair.

PAPair( algorithm_description: Dict[str, Any], algorithm_name: str, problem_description: Dict[str, Any], problem_name: str)
def global_pareto_population_filename(self) -> str:
166    def global_pareto_population_filename(self) -> str:
167        """Returns `<problem_name>.<algorithm_name>.gpp.npz`."""
168        return f"{self.problem_name}.{self.algorithm_name}.gpp.npz"

Returns <problem_name>.<algorithm_name>.gpp.npz.

class PARTriple(PAPair):
172class PARTriple(PAPair):
173    """
174    Represents a problem-algorithm-(run number) triple.
175    """
176
177    n_run: int
178
179    def __str__(self) -> str:
180        return f"{self.problem_name}|{self.algorithm_name}|{self.n_run}"
181
182    def denoised_top_layer_history_filename(self) -> str:
183        """
184        Returns
185        `<problem_name>.<algorithm_name>.<n_run>.1-<top_layer_name>.denoised.npz`.
186        """
187        prefix = self.filename_prefix()
188        name = self.problem_description["problem"]._name
189        return f"{prefix}.1-{name}.denoised.npz"
190
191    def denoised_top_layer_pareto_history_filename(self) -> str:
192        """
193        Returns
194        `<problem_name>.<algorithm_name>.<n_run>.1-<top_layer_name>.pareto-denoised.npz`.
195        """
196        prefix = self.filename_prefix()
197        name = self.problem_description["problem"]._name
198        return f"{prefix}.1-{name}.pareto-denoised.npz"
199
200    def filename_prefix(self) -> str:
201        """Returns `<problem_name>.<algorithm_name>.<n_run>`."""
202        return f"{self.problem_name}.{self.algorithm_name}.{self.n_run}"
203
204    def innermost_layer_history_filename(self) -> str:
205        """Returns the filename of the innermost layer history."""
206        prefix = self.filename_prefix()
207        problem = self.problem_description["problem"]
208        inner = problem.innermost_wrapper()
209        name, depth = inner._name, problem.depth()
210        return f"{prefix}.{depth}-{name}.npz"
211
212    def pareto_population_filename(self) -> str:
213        """Returns `<problem_name>.<algorithm_name>.<n_run>.pp.npz`."""
214        return self.filename_prefix() + ".pp.npz"
215
216    def pi_filename(self, pi_name: str) -> str:
217        """
218        Returns `<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`.
219        """
220        return self.filename_prefix() + f".pi-{pi_name}.csv"
221
222    def result_filename(self) -> str:
223        """Returns `<problem_name>.<algorithm_name>.<n_run>.csv`."""
224        return self.filename_prefix() + ".csv"
225
226    def top_layer_history_filename(self) -> str:
227        """Returns the filename of the top layer history."""
228        prefix = self.filename_prefix()
229        name = self.problem_description["problem"]._name
230        return f"{prefix}.1-{name}.npz"

Represents a problem-algorithm-(run number) triple.

PARTriple( algorithm_description: Dict[str, Any], algorithm_name: str, problem_description: Dict[str, Any], problem_name: str, n_run: int)
def denoised_top_layer_history_filename(self) -> str:
182    def denoised_top_layer_history_filename(self) -> str:
183        """
184        Returns
185        `<problem_name>.<algorithm_name>.<n_run>.1-<top_layer_name>.denoised.npz`.
186        """
187        prefix = self.filename_prefix()
188        name = self.problem_description["problem"]._name
189        return f"{prefix}.1-{name}.denoised.npz"

Returns <problem_name>.<algorithm_name>.<n_run>.1-<top_layer_name>.denoised.npz.

def denoised_top_layer_pareto_history_filename(self) -> str:
191    def denoised_top_layer_pareto_history_filename(self) -> str:
192        """
193        Returns
194        `<problem_name>.<algorithm_name>.<n_run>.1-<top_layer_name>.pareto-denoised.npz`.
195        """
196        prefix = self.filename_prefix()
197        name = self.problem_description["problem"]._name
198        return f"{prefix}.1-{name}.pareto-denoised.npz"

Returns <problem_name>.<algorithm_name>.<n_run>.1-<top_layer_name>.pareto-denoised.npz.

def filename_prefix(self) -> str:
200    def filename_prefix(self) -> str:
201        """Returns `<problem_name>.<algorithm_name>.<n_run>`."""
202        return f"{self.problem_name}.{self.algorithm_name}.{self.n_run}"

Returns <problem_name>.<algorithm_name>.<n_run>.

def innermost_layer_history_filename(self) -> str:
204    def innermost_layer_history_filename(self) -> str:
205        """Returns the filename of the innermost layer history."""
206        prefix = self.filename_prefix()
207        problem = self.problem_description["problem"]
208        inner = problem.innermost_wrapper()
209        name, depth = inner._name, problem.depth()
210        return f"{prefix}.{depth}-{name}.npz"

Returns the filename of the innermost layer history.

def pareto_population_filename(self) -> str:
212    def pareto_population_filename(self) -> str:
213        """Returns `<problem_name>.<algorithm_name>.<n_run>.pp.npz`."""
214        return self.filename_prefix() + ".pp.npz"

Returns <problem_name>.<algorithm_name>.<n_run>.pp.npz.

def pi_filename(self, pi_name: str) -> str:
216    def pi_filename(self, pi_name: str) -> str:
217        """
218        Returns `<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`.
219        """
220        return self.filename_prefix() + f".pi-{pi_name}.csv"

Returns <problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv.

def result_filename(self) -> str:
222    def result_filename(self) -> str:
223        """Returns `<problem_name>.<algorithm_name>.<n_run>.csv`."""
224        return self.filename_prefix() + ".csv"

Returns <problem_name>.<algorithm_name>.<n_run>.csv.

def top_layer_history_filename(self) -> str:
226    def top_layer_history_filename(self) -> str:
227        """Returns the filename of the top layer history."""
228        prefix = self.filename_prefix()
229        name = self.problem_description["problem"]._name
230        return f"{prefix}.1-{name}.npz"

Returns the filename of the top layer history.
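
For illustration, the filename helpers simply compose artefact names from the triple's components. A minimal sketch (assuming `PARTriple` is imported from `nmoo.benchmark`; helpers that inspect the wrapped problem, such as top_layer_history_filename, would additionally need a real `WrappedProblem` under the "problem" key of the problem description):

from nmoo.benchmark import PARTriple

triple = PARTriple(
    algorithm_description={},  # placeholder, not inspected by the helpers below
    algorithm_name="nsga2",
    problem_description={},    # placeholder
    problem_name="knnavg",
    n_run=1,
)
print(triple.filename_prefix())   # knnavg.nsga2.1
print(triple.result_filename())   # knnavg.nsga2.1.csv
print(triple.pi_filename("igd"))  # knnavg.nsga2.1.pi-igd.csv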

class Benchmark:
 234class Benchmark:
 235    """
 236    A benchmark is constructed with a list of problem and pymoo algorithm
 237    descriptions, and runs each algorithm against each problem, storing all
 238    histories for later analysis.
 239    """
 240
 241    SUPPORTED_PERFOMANCE_INDICATORS = [
 242        "df",
 243        "dfp",
 244        "gd",
 245        "gd+",
 246        "ggd",
 247        "ggd+",
 248        "ghv",
 249        "gigd",
 250        "gigd+",
 251        "hv",
 252        "igd",
 253        "igd+",
 254        "ps",
 255        "rggd",
 256        "rggd+",
 257        "rghv",
 258        "rgigd",
 259        "rgigd+",
 260    ]
 261
 262    _algorithms: Dict[str, dict]
 263    """
 264    List of algorithms to be benchmarked.
 265    """
 266
 267    _dump_histories: bool
 268    """
 269    Whether the history of each `WrappedProblem` involved in this benchmark
 270    should be written to disk.
 271    """
 272
 273    _max_retry: int
 274    """
 275    Maximum number of attempts to run a given problem-algorithm-(run number)
 276    triple before giving up.
 277    """
 278
 279    _n_runs: int
 280    """
 281    Number of times to run a given problem-algorithm pair.
 282    """
 283
 284    _output_dir_path: Path
 285    """
 286    Path of the output directory.
 287    """
 288
 289    _performance_indicators: List[str]
 290    """
 291    List of performance indicators to calculate during the benchmark.
 292    """
 293
 294    _problems: Dict[str, dict]
 295    """
 296    List of problems to be benchmarked.
 297    """
 298
 299    _results: pd.DataFrame
 300    """
 301    Results of all runs.
 302    """
 303
 304    _seeds: List[Optional[int]]
 305    """
 306    List of seeds to use. Must be of length `_n_runs`.
 307    """
 308
 309    def __init__(
 310        self,
 311        output_dir_path: Union[Path, str],
 312        problems: Dict[str, dict],
 313        algorithms: Dict[str, dict],
 314        n_runs: int = 1,
 315        dump_histories: bool = True,
 316        performance_indicators: Optional[List[str]] = None,
 317        max_retry: int = -1,
 318        seeds: Optional[List[Optional[int]]] = None,
 319    ):
 320        """
 321        Constructor. The set of problems to be benchmarked is represented by a
 322        dictionary with the following structure:
 323
 324            problems = {
 325                <problem_name>: <problem_description>,
 326                <problem_name>: <problem_description>,
 327            }
 328
 329        where `<problem_name>` is a user-defined string (but stay reasonable
 330        since it may be used in filenames), and `<problem_description>` is a
 331        dictionary with the following keys:
 332        * `df_n_evals` (int, optional): see the explanation of the `df`
 333            performance indicator below; defaults to `1`;
 334        * `evaluator` (optional): an algorithm evaluator object that will be
 335        applied to every algorithm that runs on this problem; if an
 336            algorithm already has an evaluator attached to it (see
 337            `<algorithm_description>` below), the evaluator attached to this
 338            problem takes precedence; note that the evaluator is deepcopied for
 339            every run of `minimize`;
 340        * `hv_ref_point` (optional, `np.ndarray`): a reference point for
 341            computing hypervolume performance, see `performance_indicators`
 342            argument;
 343        * `pareto_front` (optional, `np.ndarray`): a Pareto front subset;
 344        * `problem`: a `WrappedProblem` instance.
 345
 346        The set of algorithms to be used is specified similarly::
 347
 348            algorithms = {
 349                <algorithm_name>: <algorithm_description>,
 350                <algorithm_name>: <algorithm_description>,
 351            }
 352
 353        where `<algorithm_name>` is a user-defined string (but stay reasonable
 354        since it may be used in filenames), and `<algorithm_description>` is a
 355        dictionary with the following keys:
 356        * `algorithm`: a pymoo `Algorithm` object; note that it is deepcopied
 357            for every run of `minimize`;
 358        * `display` (optional): a custom `pymoo.util.display.Display` object
 359            for customization purposes;
 360        * `evaluator` (optional): an algorithm evaluator object; note that it
 361            is deepcopied for every run of `minimize`;
 362        * `return_least_infeasible` (optional, bool): if the algorithm cannot
 363            find a feasible solution, whether the least infeasible solution
 364            should still be returned; defaults to `False`;
 365        * `termination` (optional): a pymoo termination criterion; note that it
 366            is deepcopied for every run of `minimize`;
 367        * `verbose` (optional, bool): whether outputs should be printed
 368            during the execution of the algorithm; defaults to `False`.
 369
 370        Args:
 371            algorithms (Dict[str, dict]): Dict of all algorithms to be
 372                benchmarked.
 373            dump_histories (bool): Whether the history of each
 374                `WrappedProblem` involved in this benchmark should be written
 375                to disk. Defaults to `True`.
 376            max_retry (int): Maximum number of attempts to run a given
 377                problem-algorithm-(run number) triple before giving up. Set it
 378                to `-1` to retry indefinitely.
 379            n_runs (int): Number of times to run a given problem-algorithm
 380                pair.
 381            problems (Dict[str, dict]): Dict of all problems to be benchmarked.
 382            performance_indicators (Optional[List[str]]): List of performance
 383                indicators to be calculated and included in the result
 384                dataframe (see `Benchmark.final_results`). Supported indicators
 385                are
 386                * `df`: ΔF metric, see the documentation of
 387                  `nmoo.indicators.delta_f.DeltaF`; `df_n_evals` should be set
 388                  in the problem description, but it defaults to 1 if not;
 389                * `dfp`: ΔF-Pareto metric, see the documentation of
 390                  `nmoo.indicators.delta_f_pareto.DeltaFPareto`; `df_n_evals`
 391                  should be set in the problem description, but it defaults to
 392                  1 if not;
 393                * `gd`: [generational
 394                  distance](https://pymoo.org/misc/indicators.html#Generational-Distance-(GD)),
 395                  requires `pareto_front` to be set in the problem description
 396                  dictionaries, otherwise the value of this indicator will be
 397                  `NaN`;
 398                * `gd+`: [generational distance
 399                  plus](https://pymoo.org/misc/indicators.html#Generational-Distance-Plus-(GD+)),
 400                  requires `pareto_front` to be set in the problem description
 401                  dictionaries, otherwise the value of this indicator will be
 402                  `NaN`;
 403                * `hv`:
 404                  [hypervolume](https://pymoo.org/misc/indicators.html#Hypervolume),
 405                  requires `hv_ref_point` to be set in the problem description
 406                  dictionaries, otherwise the value of this indicator will be
 407                  `NaN`;
 408                * `igd`: [inverted generational
 409                  distance](https://pymoo.org/misc/indicators.html#Inverted-Generational-Distance-(IGD)),
 410                  requires `pareto_front` to be set in the problem description
 411                  dictionaries, otherwise the value of this indicator will be
 412                  `NaN`;
 413                * `igd+`: [inverted generational distance
 414                  plus](https://pymoo.org/misc/indicators.html#Inverted-Generational-Distance-Plus-(IGD+)),
 415                  requires `pareto_front` to be set in the problem description
 416                  dictionaries, otherwise the value of this indicator will be
 417                  `NaN`;
 418                * `ps`: population size, or equivalently, the size of the
 419                  current Pareto front;
 420                * `ggd`: ground generational distance, where the ground
 421                  problem's predicted objective values are used instead of the
 422                  outer problem's; requires `pareto_front` to be set in the
 423                  problem description dictionaries, otherwise the value of this
 424                  indicator will be `NaN`;
 425                * `ggd+`: ground generational distance plus; requires
 426                  `pareto_front` to be set in the problem description
 427                  dictionaries, otherwise the value of this indicator will be
 428                  `NaN`;
 429                * `ghv`: ground hypervolume; requires `hv_ref_point` to be set
 430                  in the problem description dictionaries, otherwise the value
 431                  of this indicator will be `NaN`;
 432                * `gigd`: ground inverted generational distance; requires
 433                  `pareto_front` to be set in the problem description
 434                  dictionaries, otherwise the value of this indicator will be
 435                  `NaN`;
 436                * `gigd+`: ground inverted generational distance plus; requires
 437                  `pareto_front` to be set in the problem description
 438                  dictionaries, otherwise the value of this indicator will be
 439                  `NaN`.
 440                * `rggd`: resampled ground generational distance, where the
 441                  ground problem's predicted objective values (resampled and
 442                  averaged a given number of times) are used instead of the
 443                  outer problem's; requires `pareto_front` to be set in the
 444                  problem description dictionaries, otherwise the value of this
 445                  indicator will be `NaN`; `rg_n_eval` should also be set in
 446                  the problem description, but defaults to 1 if not;
 447                * `rggd+`: resampled ground generational distance plus;
 448                  requires `pareto_front` to be set in the problem description
 449                  dictionaries, otherwise the value of this indicator will be
 450                  `NaN`; `rg_n_eval` should also be set in the problem
 451                  description, but defaults to 1 if not;
 454                * `rghv`: resampled ground hypervolume; requires `hv_ref_point`
 455                  to be set in the problem description dictionaries, otherwise
 456                  the value of this indicator will be `NaN`; `rg_n_eval` should
 457                  also be set in the problem description, but defaults to 1 if
 458                  not;
 459                * `rgigd`: resampled ground inverted generational distance;
 460                  requires `pareto_front` to be set in the problem description
 461                  dictionaries, otherwise the value of this indicator will be
 462                  `NaN`; `rg_n_eval` should also be set in the problem
 463                  description, but defaults to 1 if not;
 464                * `rgigd+`: resampled ground inverted generational distance
 465                  plus; requires `pareto_front` to be set in the problem
 466                  description dictionaries, otherwise the value of this
 467                  indicator will be `NaN`; `rg_n_eval` should also be set in
 468                  the problem description, but defaults to 1 if not.
 469
 470                In the result dataframe, the corresponding columns will be
 471                named `perf_<name of indicator>`, e.g. `perf_igd`. If left
 472                unspecified, defaults to `["igd"]`.
 473
 474            seeds (Optional[List[Optional[int]]]): List of seeds to use. The
 475                first seed will be used for the first run of every
 476                algorithm-problem pair, etc.
 477        """
 478        self._output_dir_path = Path(output_dir_path)
 479        self._set_problems(problems)
 480        self._set_algorithms(algorithms)
 481        if n_runs <= 0:
 482            raise ValueError(
 483                "The number of runs (for each problem-algorithm pair) must be "
 484                "at least 1."
 485            )
 486        self._n_runs = n_runs
 487        self._dump_histories = dump_histories
 488        self._set_performance_indicators(performance_indicators)
 489        self._max_retry = max_retry
 490        if seeds is None:
 491            self._seeds = [None] * n_runs
 492        elif len(seeds) < n_runs:
 493            raise ValueError(
 494                f"Not enough seeds: provided {len(seeds)} seeds but specified "
 495                f"{n_runs} runs."
 496            )
 497        else:
 498            if len(seeds) > n_runs:
 499                logging.warning(
 500                    "Too many seeds: provided {} but only need {} "
 501                    "(i.e. n_run)",
 502                    len(seeds),
 503                    n_runs,
 504                )
 505            self._seeds = seeds
 506
 507    def _compute_global_pareto_population(self, pair: PAPair) -> None:
 508        """
 509        Computes the global Pareto population of a given problem-algorithm
 510        pair. See `compute_global_pareto_populations`. Assumes that the global
 511        Pareto population has not already been calculated, i.e. that
 512        `<output_dir_path>/<problem>.<algorithm>.gpp.npz` does not exist.
 513        """
 514        configure_logging(prefix=f"[{pair}]")
 515        logging.debug("Computing global Pareto population")
 516        gpp_path = (
 517            self._output_dir_path / pair.global_pareto_population_filename()
 518        )
 519        populations: Dict[str, List[np.ndarray]] = {}
 520        for n_run in range(1, self._n_runs + 1):
 521            triple = PARTriple(
 522                algorithm_description=pair.algorithm_description,
 523                algorithm_name=pair.algorithm_name,
 524                n_run=n_run,
 525                problem_description=pair.problem_description,
 526                problem_name=pair.problem_name,
 527            )
 528            path = self._output_dir_path / triple.pareto_population_filename()
 529            if not path.exists():
 530                logging.debug(
 531                    "File {} does not exist. The corresponding triple [{}] "
 532                    "most likely has not finished or failed",
 533                    path,
 534                    triple,
 535                )
 536                continue
 537            data = np.load(path, allow_pickle=True)
 538            for k, v in data.items():
 539                populations[k] = populations.get(k, []) + [v]
 540            data.close()
 541
 542        consolidated = {k: np.concatenate(v) for k, v in populations.items()}
 543        if "F" not in consolidated:
 544            logging.error(
 545                "No Pareto population file found. The corresponding triple "
 546                "runs most likely have not finished or all failed",
 547            )
 548            return
 549        mask = pareto_frontier_mask(consolidated["F"])
 550        np.savez_compressed(
 551            gpp_path,
 552            **{k: v[mask] for k, v in consolidated.items()},
 553        )
 554
 555    # pylint: disable=too-many-branches
 556    # pylint: disable=too-many-locals
 557    def _compute_performance_indicator(
 558        self, triple: PARTriple, pi_name: str
 559    ) -> None:
 560        """
 561        Computes a performance indicator for a given problem-algorithm-(run
 562        number) triple and stores it under
 563        `<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`. Assumes
 564        that this performance indicator has not already been calculated,
 565        i.e. that that file does not exist.
 566
 567        Warning:
 568            This fails if either the top layer history or the pareto population
 569            artefact (`<problem_name>.<algorithm_name>.<n_run>.pp.npz`) could
 570            not be loaded as numpy arrays.
 571
 572        Todo:
 573            Refactor (again)
 574        """
 575        configure_logging(prefix=f"[{triple}|{pi_name}]")
 576        logging.debug("Computing PI")
 577
 578        pic: _PIC = lambda _: np.nan
 579        if pi_name in ["df", "dfp"]:
 580            problem = triple.problem_description["problem"]
 581            n_evals = triple.problem_description.get("df_n_evals", 1)
 582            Class, fn = {
 583                "df": (DeltaF, triple.denoised_top_layer_history_filename()),
 584                "dfp": (
 585                    DeltaFPareto,
 586                    triple.denoised_top_layer_pareto_history_filename(),
 587                ),
 588            }[pi_name]
 589            delta_f = Class(
 590                problem,
 591                n_evals,
 592                self._output_dir_path / fn,
 593            )
 594            pic = lambda s: delta_f.do(s["F"], s["X"])
 595        elif pi_name in ["gd", "gd+", "igd", "igd+"]:
 596            pic = self._get_pic_gd_type(triple, pi_name)
 597        elif pi_name in ["ggd", "ggd+", "gigd", "gigd+"]:
 598            pic = self._get_pic_gd_type(triple, pi_name[1:])
 599        elif pi_name in ["rggd", "rggd+", "rgigd", "rgigd+"]:
 600            pic = self._get_pic_gd_type(triple, pi_name[2:])
 601        elif (
 602            pi_name in ["hv", "ghv", "rghv"]
 603            and "hv_ref_point" in triple.problem_description
 604        ):
 605            ref_point = triple.problem_description["hv_ref_point"]
 606            pi = get_performance_indicator("hv", ref_point=ref_point)
 607            pic = lambda s: pi.do(s["F"])
 608        elif pi_name == "ps":
 609            pic = lambda s: s["X"].shape[0]
 610        else:
 611            logging.warning(
 612                "Unprocessable performance indicator {}. This could be "
 613                "because some required arguments (e.g. 'hv_ref_point') are "
 614                "missing",
 615                pi_name,
 616            )
 617
 618        # On which history is the PIC going to be called? By default, it is on
 619        # the top layer history.
 620        if pi_name in ["ps"]:
 621            history = np.load(
 622                self._output_dir_path / triple.pareto_population_filename()
 623            )
 624        elif pi_name in ["ggd", "ggd+", "ghv", "gigd", "gigd+"]:
 625            history = np.load(
 626                self._output_dir_path
 627                / triple.innermost_layer_history_filename()
 628            )
 629        elif pi_name in ["rggd", "rggd+", "rghv", "rgigd", "rgigd+"]:
 630            history = self._get_rg_history(triple)
 631        else:
 632            history = np.load(
 633                self._output_dir_path / triple.top_layer_history_filename()
 634            )
 635
 636        states: List[Dict[str, np.ndarray]] = []
 637        for i in range(1, history["_batch"].max() + 1):
 638            idx = history["_batch"] == i
 639            states.append({"X": history["X"][idx], "F": history["F"][idx]})
 640        df = pd.DataFrame()
 641        df["perf_" + pi_name] = list(map(pic, states))
 642        df["algorithm"] = triple.algorithm_name
 643        df["problem"] = triple.problem_name
 644        df["n_gen"] = range(1, len(states) + 1)
 645        df["n_run"] = triple.n_run
 646        logging.debug(
 647            "Writing result to {}",
 648            self._output_dir_path / triple.pi_filename(pi_name),
 649        )
 650        df.to_csv(self._output_dir_path / triple.pi_filename(pi_name))
 651
 652    def _get_pic_gd_type(self, triple: PARTriple, pi_name: str) -> _PIC:
 653        """
 654        Returns the `_PIC` corresponding to either the `gd`, `gd+`, `igd`,
 655        or `igd+` performance indicator. As a reminder, a `_PIC`, or
 656        Performance Indicator Callable, is a function that takes a dict of
 657        `np.ndarray` and returns an optional `float`. In this case, the dict
 658        must have the key `F`.
 659        """
 660        if "pareto_front" in triple.problem_description:
 661            pf = triple.problem_description.get("pareto_front")
 662        else:
 663            path = (
 664                self._output_dir_path
 665                / triple.global_pareto_population_filename()
 666            )
 667            data = np.load(path)
 668            pf = data["F"]
 669            data.close()
 670        pi = get_performance_indicator(pi_name, pf)
 671        return lambda s: pi.do(s["F"])
 672
 673    def _get_rg_history(self, triple: PARTriple) -> Dict[str, np.ndarray]:
 674        """
 675        Returns the `X` and `F` history of the ground problem of the triple,
 676        but where `F` has been resampled a given number of times (`rg_n_eval`
 677        parameter in the problem's description). This involves wrapping the
 678        ground problem in a `nmoo.denoisers.ResampleAverage` and evaluating the
 679        history's `X` array.
 680        """
 681        history = dict(
 682            np.load(
 683                self._output_dir_path / triple.top_layer_history_filename()
 684            )
 685        )
 686        rgp = ResampleAverage(
 687            triple.problem_description["problem"].ground_problem(),
 688            triple.problem_description.get("rg_n_eval", 1),
 689        )
 690        history["F"] = rgp.evaluate(history["X"], return_values_of="F")
 691        return history
 692
 693    def _par_triple_done(self, triple: PARTriple) -> bool:
 694        """
 695        Whether a problem-algorithm-(run number) triple has been successfully
 696        executed. This is determined by checking if
 697        `_output_dir_path/triple.result_filename()` exists or not.
 698        """
 699        return (self._output_dir_path / triple.result_filename()).is_file()
 700
 701    def _run_par_triple(
 702        self,
 703        triple: PARTriple,
 704    ) -> None:
 705        """
 706        Runs a given algorithm against a given problem. See
 707        `nmoo.benchmark.Benchmark.run`. Immediately dumps the history of the
 708        problem and all wrapped problems with the following naming scheme:
 709
 710            output_dir_path/<problem_name>.<algorithm_name>.<n_run>.<level>-<layer_name>.npz
 711
 712        where `level` is the depth of the wrapped problem, starting at `1`. See
 713        `nmoo.wrapped_problem.WrappedProblem.dump_all_histories`. It also dumps
 714        the compounded Pareto population at every generation (or just
 715        the last generation if `set_history` is set to `False` in the algorithm
 716        description) in
 717
 718            output_dir_path/<problem_name>.<algorithm_name>.<n_run>.pp.npz
 719
 720        Additionally, it generates a CSV file containing various statistics
 721        named:
 722
 723            output_dir_path/<problem_name>.<algorithm_name>.<n_run>.csv
 724
 725        The existence of this file is also used to determine if the
 726        problem-algorithm-(run number) triple has already been run when
 727        resuming a benchmark.
 728
 729        Args:
 730            triple: A `PARTriple` object representing the
 731                problem-algorithm-(run number) triple to run.
 732        """
 733        configure_logging(prefix=f"[{str(triple)}]")
 734        logging.info("Starting run")
 735
 736        triple.problem_description["problem"].start_new_run()
 737        evaluator = triple.problem_description.get(
 738            "evaluator",
 739            triple.algorithm_description.get("evaluator"),
 740        )
 741        try:
 742            seed = self._seeds[triple.n_run - 1]
 743            problem = deepcopy(triple.problem_description["problem"])
 744            problem.reseed(seed)
 745            results = minimize(
 746                problem,
 747                triple.algorithm_description["algorithm"],
 748                termination=triple.algorithm_description.get("termination"),
 749                copy_algorithm=True,
 750                copy_termination=True,
 751                # extra Algorithm.setup kwargs
 752                callback=TimerCallback(),
 753                display=triple.algorithm_description.get("display"),
 754                evaluator=deepcopy(evaluator),
 755                return_least_infeasible=triple.algorithm_description.get(
 756                    "return_least_infeasible", False
 757                ),
 758                save_history=True,
 759                seed=seed,
 760                verbose=triple.algorithm_description.get("verbose", False),
 761            )
 762        except Exception as e:  # pylint: disable=broad-except
 763            logging.error("Run failed: {}", e)
 764            return
 765        else:
 766            logging.success("Run successful")
 767
 768        # Dump all layers histories
 769        logging.debug("Writing layer histories to {}", self._output_dir_path)
 770        if self._dump_histories:
 771            results.problem.dump_all_histories(
 772                self._output_dir_path,
 773                triple.filename_prefix(),
 774            )
 775
 776        # Dump pareto sets
 777        logging.debug(
 778            "Writing pareto sets to {}",
 779            self._output_dir_path / triple.pareto_population_filename(),
 780        )
 781        np.savez_compressed(
 782            self._output_dir_path / triple.pareto_population_filename(),
 783            **population_list_to_dict([h.opt for h in results.history]),
 784        )
 785
 786        # Create and dump CSV file
 787        logging.debug(
 788            "Writing result CSV to {}",
 789            self._output_dir_path / triple.result_filename(),
 790        )
 791        df = pd.DataFrame()
 792        df["n_gen"] = [a.n_gen for a in results.history]
 793        df["n_eval"] = [a.evaluator.n_eval for a in results.history]
 794        df["timedelta"] = results.algorithm.callback._deltas
 795        # Important to create these columns once the dataframe has its full
 796        # length
 797        df["algorithm"] = triple.algorithm_name
 798        df["problem"] = triple.problem_name
 799        df["n_run"] = triple.n_run
 800        df.to_csv(
 801            self._output_dir_path / triple.result_filename(),
 802            index=False,
 803        )
 804
 805    def _set_algorithms(self, algorithms: Dict[str, dict]) -> None:
 806        """Validates and sets the algorithms dict"""
 807        if not algorithms:
 808            raise ValueError("A benchmark requires at least 1 algorithm.")
 809        for k, v in algorithms.items():
 810            if not isinstance(v, dict):
 811                raise ValueError(
 812                    f"Description for algorithm '{k}' must be a dict."
 813                )
 814            if "algorithm" not in v:
 815                raise ValueError(
 816                    f"Description for algorithm '{k}' is missing mandatory "
 817                    "key 'algorithm'."
 818                )
 819        self._algorithms = algorithms
 820
 821    def _set_performance_indicators(
 822        self, performance_indicators: Optional[List[str]]
 823    ) -> None:
 824        """Validates and sets the performance indicator list"""
 825        if performance_indicators is None:
 826            self._performance_indicators = ["igd"]
 827        else:
 828            self._performance_indicators = []
 829            for pi in set(performance_indicators):
 830                if pi not in Benchmark.SUPPORTED_PERFOMANCE_INDICATORS:
 831                    raise ValueError(f"Unknown performance indicator '{pi}'")
 832                self._performance_indicators.append(pi)
 833            self._performance_indicators = sorted(self._performance_indicators)
 834
 835    def _set_problems(self, problems: Dict[str, dict]) -> None:
 836        """Validates and sets the problem dict"""
 837        if not problems:
 838            raise ValueError("A benchmark requires at least 1 problem.")
 839        for k, v in problems.items():
 840            if not isinstance(v, dict):
 841                raise ValueError(
 842                    f"Description for problem '{k}' must be a dict."
 843                )
 844            if "problem" not in v:
 845                raise ValueError(
 846                    f"Description for problem '{k}' is missing mandatory key "
 847                    "'problem'."
 848                )
 849        self._problems = problems
 850
 851    def all_pa_pairs(self) -> List[PAPair]:
 852        """Generate the list of all problem-algorithm pairs."""
 853        everything = product(
 854            self._algorithms.items(),
 855            self._problems.items(),
 856        )
 857        return [
 858            PAPair(
 859                algorithm_description=aa,
 860                algorithm_name=an,
 861                problem_description=pp,
 862                problem_name=pn,
 863            )
 864            for (an, aa), (pn, pp) in everything
 865        ]
 866
 867    def all_par_triples(self) -> List[PARTriple]:
 868        """Generate the list of all problem-algorithm-(run number) triples."""
 869        everything = product(
 870            self._algorithms.items(),
 871            self._problems.items(),
 872            range(1, self._n_runs + 1),
 873        )
 874        return [
 875            PARTriple(
 876                algorithm_description=aa,
 877                algorithm_name=an,
 878                n_run=r,
 879                problem_description=pp,
 880                problem_name=pn,
 881            )
 882            for (an, aa), (pn, pp), r in everything
 883        ]
 884
 885    def compute_global_pareto_populations(
 886        self, n_jobs: int = -1, **joblib_kwargs
 887    ) -> None:
 888        """
 889        The global Pareto population of a problem-algorithm pair is the merged
 890        population of all pareto populations across all runs of that pair. This
 891        function calculates the global Pareto population of each pair and dumps it
 892        to `<output_dir_path>/<problem>.<algorithm>.gpp.npz`. If that file
 893        exists for a given problem-algorithm pair, then the global Pareto
 894        population (of that pair) is not recalculated.
 895        """
 896        logging.info("Computing global Pareto populations")
 897        executor = Parallel(n_jobs=n_jobs, **joblib_kwargs)
 898        executor(
 899            delayed(Benchmark._compute_global_pareto_population)(self, p)
 900            for p in self.all_pa_pairs()
 901            if not (
 902                self._output_dir_path / p.global_pareto_population_filename()
 903            ).is_file()
 904        )
 905
 906    def compute_performance_indicators(
 907        self, n_jobs: int = -1, **joblib_kwargs
 908    ) -> None:
 909        """
 910        Computes all performance indicators and saves the corresponding
 911        dataframes in
 912        `output_dir_path/<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`.
 913        If that file exists for a given problem-algorithm-(run
 914        number)-(performance indicator) tuple, then it is not recalculated.
 915        """
 916        logging.info(
 917            "Computing performance indicators: {}",
 918            ", ".join(self._performance_indicators),
 919        )
 920        everything = product(
 921            self.all_par_triples(), self._performance_indicators
 922        )
 923        executor = Parallel(n_jobs=n_jobs, **joblib_kwargs)
 924        executor(
 925            delayed(Benchmark._compute_performance_indicator)(self, t, pi)
 926            for t, pi in everything
 927            if not (self._output_dir_path / t.pi_filename(pi)).is_file()
 928        )
 929
 930    def consolidate(self) -> None:
 931        """
 932        Merges all statistics dataframes
 933        (`<problem_name>.<algorithm_name>.<n_run>.csv`) and all PI dataframes
 934        (`<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`) into a single
 935        dataframe, and saves it under `output_dir_path/benchmark.csv`.
 936        """
 937        logging.info("Consolidating statistics")
 938        all_df = []
 939        for triple in self.all_par_triples():
 940            path = self._output_dir_path / triple.result_filename()
 941            if not path.exists():
 942                logging.debug(
 943                    "Statistic file {} does not exist. The corresponding "
 944                    "triple [{}] most likely hasn't finished or failed",
 945                    path,
 946                    triple,
 947                )
 948                continue
 949            all_df.append(pd.read_csv(path))
 950        self._results = pd.concat(all_df, ignore_index=True)
 951        self._results["timedelta"] = pd.to_timedelta(
 952            self._results["timedelta"]
 953        )
 954        self._results = self._results.astype(
 955            {
 956                "algorithm": "category",
 957                "n_gen": "uint32",
 958                "n_run": "uint32",
 959                "problem": "category",
 960            }
 961        )
 962
 963        logging.info("Consolidating performance indicators")
 964        all_df = []
 965        for triple in self.all_par_triples():
 966            df = pd.DataFrame()
 967            for pi_name in self._performance_indicators:
 968                path = self._output_dir_path / triple.pi_filename(pi_name)
 969                if not path.exists():
 970                    logging.debug("PI file {} does not exist.", path)
 971                    continue
 972                tmp = pd.read_csv(path)
 973                if df.empty:
 974                    df = tmp
 975                else:
 976                    col = "perf_" + pi_name
 977                    df[col] = tmp[col]
 978            all_df.append(df)
 979
 980        self._results = self._results.merge(
 981            pd.concat(all_df, ignore_index=True),
 982            how="outer",
 983            on=["algorithm", "problem", "n_gen", "n_run"],
 984        )
 985
 986        # Drop the index column inherited from the performance indicator CSVs
 987        if "Unnamed: 0" in self._results:
 988            del self._results["Unnamed: 0"]
 989
 990        path = self._output_dir_path / "benchmark.csv"
 991        logging.info("Writing results to {}", path)
 992        self.dump_results(path, index=False)
 993
 994    def dump_results(self, path: Union[Path, str], fmt: str = "csv", **kwargs):
 995        """
 996        Dumps the internal `_results` dataframe.
 997
 998        Args:
 999            path (Union[Path, str]): Path to the output file.
1000            fmt (str): Text or binary format supported by pandas, see
1001                `here <https://pandas.pydata.org/docs/user_guide/io.html>`_.
1002                CSV by default.
1003            kwargs: Will be passed on to the `pandas.DataFrame.to_<fmt>` method.
1004        """
1005        saver = {
1006            "csv": pd.DataFrame.to_csv,
1007            "excel": pd.DataFrame.to_excel,
1008            "feather": pd.DataFrame.to_feather,
1009            "gbq": pd.DataFrame.to_gbq,
1010            "hdf": pd.DataFrame.to_hdf,
1011            "html": pd.DataFrame.to_html,
1012            "json": pd.DataFrame.to_json,
1013            "parquet": pd.DataFrame.to_parquet,
1014            "pickle": pd.DataFrame.to_pickle,
1015        }[fmt]
1016        saver(self._results, path, **kwargs)
1017
1018    def final_results(
1019        self,
1020        timedeltas_to_microseconds: bool = True,
1021        reset_index: bool = True,
1022    ) -> pd.DataFrame:
1023        """
1024        Returns a dataframe containing the final row of each
1025        algorithm/problem/n_run triple, i.e. the final record of each run of
1026        the benchmark.
1027
1028        If the `reset_index` argument set to `False`, the resulting dataframe
1029        will have a multiindex given by the (algorithm, problem, n_run) tuples,
1030        e.g.
1031
1032                                     n_gen  timedelta   perf_gd  ...
1033            algorithm problem n_run                              ...
1034            nsga2     bnh     1        155     886181  0.477980  ...
1035                              2        200      29909  0.480764  ...
1036                      zdt1    1        400     752818  0.191490  ...
1037                              2        305     979112  0.260930  ...
1038
1039        (note that the `timedelta` column has been converted to microseconds,
1040        see the `timedeltas_to_microseconds` argument below). If `reset_index`
1041        is set to `True` (the default), then the index is reset, giving
1042        something like this:
1043
1044              algorithm problem  n_run  n_gen  timedelta   perf_gd  ...
1045            0     nsga2     bnh      1    155     886181  0.477980  ...
1046            1     nsga2     bnh      2    200      29909  0.480764  ...
1047            2     nsga2    zdt1      1    400     752818  0.191490  ...
1048            3     nsga2    zdt1      2    305     979112  0.260930  ...
1049
1050        This form is easier to plot.
1051
1052        Args:
1053            reset_index (bool): Whether to reset the index. Defaults to
1054                `True`.
1055            timedeltas_to_microseconds (bool): Whether to convert the
1056                timedeltas column to microseconds. Defaults to `True`.
1057
1058        """
1059        df = self._results.groupby(["algorithm", "problem", "n_run"]).last()
1060        if timedeltas_to_microseconds:
1061            df["timedelta"] = df["timedelta"].dt.microseconds
1062        return df.reset_index() if reset_index else df
1063
1064    def run(
1065        self,
1066        n_jobs: int = -1,
1067        n_post_processing_jobs: int = 2,
1068        **joblib_kwargs,
1069    ):
1070        """
1071        Runs the benchmark. Makes your laptop go brr. The histories of all
1072        problems are progressively dumped in the specified output directory
1073        as the benchmark runs. At the end, the benchmark results are dumped
1074        in `output_dir_path/benchmark.csv`.
1075
1076        Args:
1077            n_jobs (int): Number of processes to use. See the `joblib.Parallel`_
1078                documentation. Defaults to `-1`, i.e. all CPUs are used.
1079            n_post_processing_jobs (int): Number of processes to use for post
1080                processing tasks (computing global Pareto populations and
1081                performance indicators). These are memory-intensive tasks.
1082                Defaults to `2`.
1083            joblib_kwargs (dict): Additional kwargs to pass on to the
1084                `joblib.Parallel`_ instance.
1085
1086        .. _joblib.Parallel:
1087            https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html
1088        """
1089        if not os.path.isdir(self._output_dir_path):
1090            os.mkdir(self._output_dir_path)
1091        triples = self.all_par_triples()
1092        executor = Parallel(n_jobs=n_jobs, **joblib_kwargs)
1093        current_round = 0
1094        while (
1095            self._max_retry < 0 or current_round <= self._max_retry
1096        ) and any(not self._par_triple_done(t) for t in triples):
1097            executor(
1098                delayed(Benchmark._run_par_triple)(self, t)
1099                for t in triples
1100                if not self._par_triple_done(t)
1101            )
1102            current_round += 1
1103        if any(not self._par_triple_done(t) for t in triples):
1104            logging.warning(
1105                "Benchmark finished, but some triples could not be run "
1106                "successfully within the retry budget ({}):",
1107                self._max_retry,
1108            )
1109            for t in filter(lambda x: not self._par_triple_done(x), triples):
1110                logging.warning("    [{}]", t)
1111        self.compute_global_pareto_populations(
1112            n_post_processing_jobs, **joblib_kwargs
1113        )
1114        self.compute_performance_indicators(
1115            n_post_processing_jobs, **joblib_kwargs
1116        )
1117        self.consolidate()

A benchmark is constructed with a list of problem and pymoo algorithm descriptions, and runs each algorithm against each problem, storing all histories for later analysis.

Benchmark( output_dir_path: Union[pathlib.Path, str], problems: Dict[str, dict], algorithms: Dict[str, dict], n_runs: int = 1, dump_histories: bool = True, performance_indicators: Union[List[str], NoneType] = None, max_retry: int = -1, seeds: Union[List[Union[int, NoneType]], NoneType] = None)
309    def __init__(
310        self,
311        output_dir_path: Union[Path, str],
312        problems: Dict[str, dict],
313        algorithms: Dict[str, dict],
314        n_runs: int = 1,
315        dump_histories: bool = True,
316        performance_indicators: Optional[List[str]] = None,
317        max_retry: int = -1,
318        seeds: Optional[List[Optional[int]]] = None,
319    ):
320        """
321        Constructor. The set of problems to be benchmarked is represented by a
322        dictionary with the following structure:
323
324            problems = {
325                <problem_name>: <problem_description>,
326                <problem_name>: <problem_description>,
327            }
328
329        where `<problem_name>` is a user-defined string (but stay reasonable
330        since it may be used in filenames), and `<problem_description>` is a
331        dictionary with the following keys:
332        * `df_n_evals` (int, optional): see the explanation of the `df`
333            performance indicator below; defaults to `1`;
334        * `evaluator` (optional): an algorithm evaluator object that will be
335        applied to every algorithm that runs on this problem; if an
336            algorithm already has an evaluator attached to it (see
337            `<algorithm_description>` below), the evaluator attached to this
338            problem takes precedence; note that the evaluator is deepcopied for
339            every run of `minimize`;
340        * `hv_ref_point` (optional, `np.ndarray`): a reference point for
341            computing hypervolume performance, see `performance_indicators`
342            argument;
343        * `pareto_front` (optional, `np.ndarray`): a Pareto front subset;
344        * `problem`: a `WrappedProblem` instance.
345
346        The set of algorithms to be used is specified similarly::
347
348            algorithms = {
349                <algorithm_name>: <algorithm_description>,
350                <algorithm_name>: <algorithm_description>,
351            }
352
353        where `<algorithm_name>` is a user-defined string (but stay reasonable
354        since it may be used in filenames), and `<algorithm_description>` is a
355        dictionary with the following keys:
356        * `algorithm`: a pymoo `Algorithm` object; note that it is deepcopied
357            for every run of `minimize`;
358        * `display` (optional): a custom `pymoo.util.display.Display` object
359            for customization purposes;
360        * `evaluator` (optional): an algorithm evaluator object; note that it
361            is deepcopied for every run of `minimize`;
362        * `return_least_infeasible` (optional, bool): if the algorithm cannot
363            find a feasible solution, whether the least infeasible solution
364            should still be returned; defaults to `False`;
365        * `termination` (optional): a pymoo termination criterion; note that it
366            is deepcopied for every run of `minimize`;
367        * `verbose` (optional, bool): whether outputs should be printed
368            during the execution of the algorithm; defaults to `False`.
369
370        Args:
371            algorithms (Dict[str, dict]): Dict of all algorithms to be
372                benchmarked.
373            dump_histories (bool): Whether the history of each
374                `WrappedProblem` involved in this benchmark should be written
375                to disk. Defaults to `True`.
376            max_retry (int): Maximum number of attempts to run a given
377                problem-algorithm-(run number) triple before giving up. Set it
378                to `-1` to retry indefinitely.
379            n_runs (int): Number of times to run a given problem-algorithm
380                pair.
381            problems (Dict[str, dict]): Dict of all problems to be benchmarked.
382            performance_indicators (Optional[List[str]]): List of performance
383                indicators to be calculated and included in the result
384                dataframe (see `Benchmark.final_results`). Supported indicators
385                are
386                * `df`: ΔF metric, see the documentation of
387                  `nmoo.indicators.delta_f.DeltaF`; `df_n_evals` should be set
388                  in the problem description, but it defaults to 1 if not;
389                * `dfp`: ΔF-Pareto metric, see the documentation of
390                  `nmoo.indicators.delta_f_pareto.DeltaFPareto`; `df_n_evals`
391                  should be set in the problem description, but it defaults to
392                  1 if not;
393                * `gd`: [generational
394                  distance](https://pymoo.org/misc/indicators.html#Generational-Distance-(GD)),
395                  requires `pareto_front` to be set in the problem description
396                  dictionaries, otherwise the value of this indicator will be
397                  `NaN`;
398                * `gd+`: [generational distance
399                  plus](https://pymoo.org/misc/indicators.html#Generational-Distance-Plus-(GD+)),
400                  requires `pareto_front` to be set in the problem description
401                  dictionaries, otherwise the value of this indicator will be
402                  `NaN`;
403                * `hv`:
404                  [hypervolume](https://pymoo.org/misc/indicators.html#Hypervolume),
405                  requires `hv_ref_point` to be set in the problem description
406                  dictionaries, otherwise the value of this indicator will be
407                  `NaN`;
408                * `igd`: [inverted generational
409                  distance](https://pymoo.org/misc/indicators.html#Inverted-Generational-Distance-(IGD)),
410                  requires `pareto_front` to be set in the problem description
411                  dictionaries, otherwise the value of this indicator will be
412                  `NaN`;
413                * `igd+`: [inverted generational distance
414                  plus](https://pymoo.org/misc/indicators.html#Inverted-Generational-Distance-Plus-(IGD+)),
415                  requires `pareto_front` to be set in the problem description
416                  dictionaries, otherwise the value of this indicator will be
417                  `NaN`;
418                * `ps`: population size, or equivalently, the size of the
419                  current Pareto front;
420                * `ggd`: ground generational distance, where the ground
421                  problem's predicted objective values are used instead of the
422                  outer problem's; requires `pareto_front` to be set in the
423                  problem description dictionaries, otherwise the value of this
424                  indicator will be `NaN`;
425                * `ggd+`: ground generational distance plus; requires
426                  `pareto_front` to be set in the problem description
427                  dictionaries, otherwise the value of this indicator will be
428                  `NaN`;
429                * `ghv`: ground hypervolume; requires `hv_ref_point` to be set
430                  in the problem description dictionaries, otherwise the value
431                  of this indicator will be `NaN`;
432                * `gigd`: ground inverted generational distance; requires
433                  `pareto_front` to be set in the problem description
434                  dictionaries, otherwise the value of this indicator will be
435                  `NaN`;
436                * `gigd+`: ground inverted generational distance plus; requires
437                  `pareto_front` to be set in the problem description
438                  dictionaries, otherwise the value of this indicator will be
439                  `NaN`.
440                * `rggd`: resampled ground generational distance, where the
441                  ground problem's predicted objective values (resampled and
442                  averaged a given number of times) are used instead of the
443                  outer problem's; requires `pareto_front` to be set in the
444                  problem description dictionaries, otherwise the value of this
445                  indicator will be `NaN`; `rg_n_eval` should also be set in
446                  the problem description, but defaults to 1 if not;
447                * `rggd+`: resampled ground generational distance plus;
448                  requires `pareto_front` to be set in the problem description
449                  dictionaries, otherwise the value of this indicator will be
450                  `NaN`; `rg_n_eval` should also be set in the problem
451                  description, but defaults to 1 if not;
454                * `rghv`: resampled ground hypervolume; requires `hv_ref_point`
455                  to be set in the problem description dictionaries, otherwise
456                  the value of this indicator will be `NaN`; `rg_n_eval` should
457                  also be set in the problem description, but defaults to 1 if
458                  not;
459                * `rgigd`: resampled ground inverted generational distance;
460                  requires `pareto_front` to be set in the problem description
461                  dictionaries, otherwise the value of this indicator will be
462                  `NaN`; `rg_n_eval` should also be set in the problem
463                  description, but defaults to 1 if not;
464                * `rgigd+`: resampled ground inverted generational distance
465                  plus; requires `pareto_front` to be set in the problem
466                  description dictionaries, otherwise the value of this
467                  indicator will be `NaN`; `rg_n_eval` should also be set in
468                  the problem description, but defaults to 1 if not.
469
470                In the result dataframe, the corresponding columns will be
471                named `perf_<name of indicator>`, e.g. `perf_igd`. If left
472                unspecified, defaults to `["igd"]`.
473
474            seeds (Optional[List[Optional[int]]]): List of seeds to use. The
475                first seed will be used for the first run of every
476                algorithm-problem pair, etc.
477        """
478        self._output_dir_path = Path(output_dir_path)
479        self._set_problems(problems)
480        self._set_algorithms(algorithms)
481        if n_runs <= 0:
482            raise ValueError(
483                "The number of runs (for each problem-algorithm pair) must be "
484                "at least 1."
485            )
486        self._n_runs = n_runs
487        self._dump_histories = dump_histories
488        self._set_performance_indicators(performance_indicators)
489        self._max_retry = max_retry
490        if seeds is None:
491            self._seeds = [None] * n_runs
492        elif len(seeds) < n_runs:
493            raise ValueError(
494                f"Not enough seeds: provided {len(seeds)} seeds but specified "
495                f"{n_runs} runs."
496            )
497        else:
498            if len(seeds) > n_runs:
499                logging.warning(
500                    "Too many seeds: provided {} but only need {} "
501                    "(i.e. n_run)",
502                    len(seeds),
503                    n_runs,
504                )
505            self._seeds = seeds

Constructor. The set of problems to be benchmarked is represented by a dictionary with the following structure:

problems = {
    <problem_name>: <problem_description>,
    <problem_name>: <problem_description>,
}

where <problem_name> is a user-defined string (but stay reasonable since it may be used in filenames), and <problem_description> is a dictionary with the following keys (a sketch follows the list):

  • df_n_evals (int, optional): see the explanation of the df performance indicator below; defaults to 1;
  • evaluator (optional): an algorithm evaluator object that will be applied to every algorithm that runs on this problem; if an algorithm already has an evaluator attached to it (see <algorithm_description> below), the evaluator attached to this problem takes precedence; note that the evaluator is deepcopied for every run of minimize;
  • hv_ref_point (optional, np.ndarray): a reference point for computing hypervolume performance, see performance_indicators argument;
  • pareto_front (optional, np.ndarray): a Pareto front subset;
  • problem: a WrappedProblem instance.
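
For concreteness, here is a minimal sketch of such a dictionary. The key names are the ones listed above; the reference point, the df_n_evals value, and the choice of ZDT1 are purely illustrative.

import numpy as np
from pymoo.problems.multi import ZDT1
from nmoo import WrappedProblem

ground = ZDT1()
problems = {
    "zdt1": {
        "problem": WrappedProblem(ground),       # required
        "pareto_front": ground.pareto_front(),   # optional, enables the (i)gd-type indicators
        "hv_ref_point": np.array([11.0, 11.0]),  # optional, enables the hypervolume indicators
        "df_n_evals": 5,                         # optional, used by the df and dfp indicators
    },
}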

The set of algorithms to be used is specified similarly:

algorithms = {
    <algorithm_name>: <algorithm_description>,
    <algorithm_name>: <algorithm_description>,
}

where <algorithm_name> is a user-defined string (but stay reasonable since it may be used in filenames), and <algorithm_description> is a dictionary with the following keys (a sketch follows the list):

  • algorithm: a pymoo Algorithm object; note that it is deepcopied for every run of minimize;
  • display (optional): a custom pymoo.util.display.Display object for customization purposes;
  • evaluator (optional): an algorithm evaluator object; note that it is deepcopied for every run of minimize;
  • return_least_infeasible (optional, bool): if the algorithm cannot find a feasible solution, whether the least infeasible solution should still be returned; defaults to False;
  • termination (optional): a pymoo termination criterion; note that it is deepcopied for every run of minimize;
  • verbose (optional, bool): whether outputs should be printed during the execution of the algorithm; defaults to False.
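
A corresponding sketch for the algorithms dictionary; only the algorithm key is required, and the population size and optional keys shown here are illustrative.

from pymoo.algorithms.moo.nsga2 import NSGA2

algorithms = {
    "nsga2": {
        "algorithm": NSGA2(),                 # required, and sufficient
    },
    "nsga2_small_pop": {
        "algorithm": NSGA2(pop_size=20),      # required
        "return_least_infeasible": True,      # optional
        "verbose": False,                     # optional
    },
}
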
Arguments:
  • algorithms (Dict[str, dict]): Dict of all algorithms to be benchmarked.
  • dump_histories (bool): Whether the history of each WrappedProblem involved in this benchmark should be written to disk. Defaults to True.
  • max_retry (int): Maximum number of attempts to run a given problem-algorithm-(run number) triple before giving up. Set it to -1 to retry indefinitely.
  • n_runs (int): Number of times to run a given problem-algorithm pair.
  • problems (Dict[str, dict]): Dict of all problems to be benchmarked.
  • performance_indicators (Optional[List[str]]): List of performance indicators to be calculated and included in the result dataframe (see Benchmark.final_results, and the constructor sketch after the argument list). Supported indicators are

    • df: ΔF metric, see the documentation of nmoo.indicators.delta_f.DeltaF; df_n_eval should be set in the problem description, but it defaults to 1 if not;
    • dfp: ΔF-Pareto metric, see the documentation of nmoo.indicators.delta_f_pareto.DeltaFPareto; df_n_eval should be set in the problem description, but it defaults to 1 if not;
    • gd: generational distance, requires pareto_front to be set in the problem description dictionaries, otherwise the value of this indicator will be NaN;
    • gd+: generational distance plus, requires pareto_front to be set in the problem description dictionaries, otherwise the value of this indicator will be NaN;
    • hv: hypervolume, requires hv_ref_point to be set in the problem description dictionaries, otherwise the value of this indicator will be NaN;
    • igd: inverted generational distance, requires pareto_front to be set in the problem description dictionaries, otherwise the value of this indicator will be NaN;
    • igd+: inverted generational distance plus, requires pareto_front to be set in the problem description dictionaries, otherwise the value of this indicator will be NaN;
    • ps: population size, or equivalently, the size of the current Pareto front;
    • ggd: ground generational distance, where the ground problem's predicted objective values are used instead of the outer problem's; requires pareto_front to be set in the problem description dictionaries, otherwise the value of this indicator will be NaN;
    • ggd+: ground generational distance plus; requires pareto_front to be set in the problem description dictionaries, otherwise the value of this indicator will be NaN;
    • ghv: ground hypervolume; requires hv_ref_point to be set in the problem description dictionaries, otherwise the value of this indicator will be NaN;
    • gigd: ground inverted generational distance; requires pareto_front to be set in the problem description dictionaries, otherwise the value of this indicator will be NaN;
    • gigd+: ground inverted generational distance plus; requires pareto_front to be set in the problem description dictionaries, otherwise the value of this indicator will be NaN;
    • rggd: resampled ground generational distance, where the ground problem's predicted objective values (resampled and averaged a given number of times) are used instead of the outer problem's; requires pareto_front to be set in the problem description dictionaries, otherwise the value of this indicator will be NaN; rg_n_eval should also be set in the problem description, but defaults to 1 if not;
    • rggd+: resampled ground generational distance plus; requires pareto_front to be set in the problem description dictionaries, otherwise the value of this indicator will be NaN; rg_n_eval should also be set in the problem description, but defaults to 1 if not;
    • rghv: resampled ground hypervolume; requires hv_ref_point to be set in the problem description dictionaries, otherwise the value of this indicator will be NaN; rg_n_eval should also be set in the problem description, but defaults to 1 if not;
    • rgigd: resampled ground inverted generational distance; requires pareto_front to be set in the problem description dictionaries, otherwise the value of this indicator will be NaN; rg_n_eval should also be set in the problem description, but defaults to 1 if not;
    • rgigd+: resampled ground inverted generational distance plus; requires pareto_front to be set in the problem description dictionaries, otherwise the value of this indicator will be NaN; rg_n_eval should also be set in the problem description, but defaults to 1 if not.

    In the result dataframe, the corresponding columns will be named perf_<name of indicator>, e.g. perf_igd. If left unspecified, defaults to ["igd"].

  • seeds (Optional[List[Optional[int]]]): List of seeds to use. The first seed will be used for the first run of every algorithm-problem pair, etc.
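
Putting it together, a hedged construction sketch reusing the problems and algorithms dictionaries sketched above; the indicator choice and seed values are arbitrary, and hv only makes sense here because hv_ref_point was set in the problem description.

from nmoo import Benchmark

benchmark = Benchmark(
    output_dir_path="./out",
    problems=problems,                     # as sketched above
    algorithms=algorithms,                 # as sketched above
    n_runs=3,
    performance_indicators=["igd", "hv"],  # -> columns perf_igd and perf_hv
    seeds=[41, 42, 43],                    # one seed per run
)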

def all_pa_pairs(self) -> List[nmoo.benchmark.PAPair]:
851    def all_pa_pairs(self) -> List[PAPair]:
852        """Generate the list of all problem-algorithm pairs."""
853        everything = product(
854            self._algorithms.items(),
855            self._problems.items(),
856        )
857        return [
858            PAPair(
859                algorithm_description=aa,
860                algorithm_name=an,
861                problem_description=pp,
862                problem_name=pn,
863            )
864            for (an, aa), (pn, pp) in everything
865        ]

Generate the list of all problem-algorithm pairs.

def all_par_triples(self) -> List[nmoo.benchmark.PARTriple]:
867    def all_par_triples(self) -> List[PARTriple]:
868        """Generate the list of all problem-algorithm-(run number) triples."""
869        everything = product(
870            self._algorithms.items(),
871            self._problems.items(),
872            range(1, self._n_runs + 1),
873        )
874        return [
875            PARTriple(
876                algorithm_description=aa,
877                algorithm_name=an,
878                n_run=r,
879                problem_description=pp,
880                problem_name=pn,
881            )
882            for (an, aa), (pn, pp), r in everything
883        ]

Generate the list of all problem-algorithm-(run number) triples.
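
As a small usage sketch, the triples can be enumerated to preview what a benchmark will run; the attribute names below follow the constructor arguments shown in the source above.

# One line per problem-algorithm-(run number) triple.
for t in benchmark.all_par_triples():
    print(t.problem_name, t.algorithm_name, t.n_run)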

def compute_global_pareto_populations(self, n_jobs: int = -1, **joblib_kwargs) -> None:
885    def compute_global_pareto_populations(
886        self, n_jobs: int = -1, **joblib_kwargs
887    ) -> None:
888        """
889        The global Pareto population of a problem-algorithm pair is the merged
890        population of all Pareto populations across all runs of that pair. This
891        function calculates the global Pareto population of all pairs and dumps it
892        to `<output_dir_path>/<problem>.<algorithm>.gpp.npz`. If that file
893        exists for a given problem-algorithm pair, then the global Pareto
894        population (of that pair) is not recalculated.
895        """
896        logging.info("Computing global Pareto populations")
897        executor = Parallel(n_jobs=n_jobs, **joblib_kwargs)
898        executor(
899            delayed(Benchmark._compute_global_pareto_population)(self, p)
900            for p in self.all_pa_pairs()
901            if not (
902                self._output_dir_path / p.global_pareto_population_filename()
903            ).is_file()
904        )

The global Pareto population of a problem-algorithm pair is the merged population of all Pareto populations across all runs of that pair. This function calculates the global Pareto population of every pair and dumps it to <output_dir_path>/<problem>.<algorithm>.gpp.npz. If that file exists for a given problem-algorithm pair, then the global Pareto population (of that pair) is not recalculated.
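
Since existing files are skipped, this step can safely be re-invoked, for instance after an interrupted benchmark; a minimal call sketch:

# Existing <problem>.<algorithm>.gpp.npz files are not recomputed.
benchmark.compute_global_pareto_populations(n_jobs=2)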

def compute_performance_indicators(self, n_jobs: int = -1, **joblib_kwargs) -> None:
906    def compute_performance_indicators(
907        self, n_jobs: int = -1, **joblib_kwargs
908    ) -> None:
909        """
910        Computes all performance indicators and saves the corresponding
911        dataframes in
912        `output_dir_path/<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv`.
913        If that file exists for a given problem-algorithm-(run
914        number)-(performance indicator) tuple, then it is not recalculated.
915        """
916        logging.info(
917            "Computing performance indicators: {}",
918            ", ".join(self._performance_indicators),
919        )
920        everything = product(
921            self.all_par_triples(), self._performance_indicators
922        )
923        executor = Parallel(n_jobs=n_jobs, **joblib_kwargs)
924        executor(
925            delayed(Benchmark._compute_performance_indicator)(self, t, pi)
926            for t, pi in everything
927            if not (self._output_dir_path / t.pi_filename(pi)).is_file()
928        )

Computes all performance indicators and saves the corresponding dataframes in output_dir_path/<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv. If that file exists for a given problem-algorithm-(run number)-(performance indicator) tuple, then it is not recalculated.
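
As with the global Pareto populations, files that already exist are skipped, so this can be used to fill in missing indicator files; a minimal call sketch:

# Only the missing .pi-<pi_name>.csv files are computed.
benchmark.compute_performance_indicators(n_jobs=2)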

def consolidate(self) -> None:
930    def consolidate(self) -> None:
931        """
932        Merges all statistics dataframes
933        (`<problem_name>.<algorithm_name>.<n_run>.csv`) and all PI dataframes
934        (`<problem_name>.<algorithm_name>.<n_run>.pi.csv`) into a single
935        dataframe, and saves it under `output_dir_path/benchmark.csv`.
936        """
937        logging.info("Consolidating statistics")
938        all_df = []
939        for triple in self.all_par_triples():
940            path = self._output_dir_path / triple.result_filename()
941            if not path.exists():
942                logging.debug(
943                    "Statistic file {} does not exist. The corresponding "
944                    "triple [{}] most likely hasn't finished or has failed",
945                    path,
946                    triple,
947                )
948                continue
949            all_df.append(pd.read_csv(path))
950        self._results = pd.concat(all_df, ignore_index=True)
951        self._results["timedelta"] = pd.to_timedelta(
952            self._results["timedelta"]
953        )
954        self._results = self._results.astype(
955            {
956                "algorithm": "category",
957                "n_gen": "uint32",
958                "n_run": "uint32",
959                "problem": "category",
960            }
961        )
962
963        logging.info("Consolidating performance indicators")
964        all_df = []
965        for triple in self.all_par_triples():
966            df = pd.DataFrame()
967            for pi_name in self._performance_indicators:
968                path = self._output_dir_path / triple.pi_filename(pi_name)
969                if not path.exists():
970                    logging.debug("PI file {} does not exist.", path)
971                    continue
972                tmp = pd.read_csv(path)
973                if df.empty:
974                    df = tmp
975                else:
976                    col = "perf_" + pi_name
977                    df[col] = tmp[col]
978            all_df.append(df)
979
980        self._results = self._results.merge(
981            pd.concat(all_df, ignore_index=True),
982            how="outer",
983            on=["algorithm", "problem", "n_gen", "n_run"],
984        )
985
986        # Drop the spurious index column carried over from the PI CSV files
987        if "Unnamed: 0" in self._results:
988            del self._results["Unnamed: 0"]
989
990        path = self._output_dir_path / "benchmark.csv"
991        logging.info("Writing results to {}", path)
992        self.dump_results(path, index=False)

Merges all statistics dataframes (<problem_name>.<algorithm_name>.<n_run>.csv) and all PI dataframes (<problem_name>.<algorithm_name>.<n_run>.pi-<pi_name>.csv) into a single dataframe, and saves it under output_dir_path/benchmark.csv.
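
Once consolidated, the result file can be loaded back as a regular dataframe; note that the timedelta column is stored as text in the CSV. A hedged sketch:

import pandas as pd

df = pd.read_csv("./out/benchmark.csv")
df["timedelta"] = pd.to_timedelta(df["timedelta"])  # restore the timedelta dtype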

def dump_results(self, path: Union[pathlib.Path, str], fmt: str = 'csv', **kwargs):
 994    def dump_results(self, path: Union[Path, str], fmt: str = "csv", **kwargs):
 995        """
 996        Dumps the internal `_results` dataframe.
 997
 998        Args:
 999            path (Union[Path, str]): Path to the output file.
1000            fmt (str): Text or binary format supported by pandas, see
1001                `here <https://pandas.pydata.org/docs/user_guide/io.html>`_.
1002                CSV by default.
1003            kwargs: Will be passed on to the `pandas.DataFrame.to_<fmt>` method.
1004        """
1005        saver = {
1006            "csv": pd.DataFrame.to_csv,
1007            "excel": pd.DataFrame.to_excel,
1008            "feather": pd.DataFrame.to_feather,
1009            "gbq": pd.DataFrame.to_gbq,
1010            "hdf": pd.DataFrame.to_hdf,
1011            "html": pd.DataFrame.to_html,
1012            "json": pd.DataFrame.to_json,
1013            "parquet": pd.DataFrame.to_parquet,
1014            "pickle": pd.DataFrame.to_pickle,
1015        }[fmt]
1016        saver(self._results, path, **kwargs)

Dumps the internal _results dataframe.

Arguments:
  • path (Union[Path, str]): Path to the output file.
  • fmt (str): Text or binary format supported by pandas, see https://pandas.pydata.org/docs/user_guide/io.html. CSV by default.
  • kwargs: Will be passed on to the pandas.DataFrame.to_<fmt> method.
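
For example, the results can be dumped in a binary format; this assumes the corresponding optional pandas dependency (here, a parquet engine such as pyarrow) is installed.

benchmark.dump_results("./out/benchmark.parquet", fmt="parquet")
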
def final_results(self, timedeltas_to_microseconds: bool = True, reset_index: bool = True) -> pandas.core.frame.DataFrame:
1018    def final_results(
1019        self,
1020        timedeltas_to_microseconds: bool = True,
1021        reset_index: bool = True,
1022    ) -> pd.DataFrame:
1023        """
1024        Returns a dataframe containing the final row of each
1025        algorithm/problem/n_run triple, i.e. the final record of each run of
1026        the benchmark.
1027
1028        If the `reset_index` argument is set to `False`, the resulting dataframe
1029        will have a multiindex given by the (algorithm, problem, n_run) tuples,
1030        e.g.
1031
1032                                     n_gen  timedelta   perf_gd  ...
1033            algorithm problem n_run                              ...
1034            nsga2     bnh     1        155     886181  0.477980  ...
1035                              2        200      29909  0.480764  ...
1036                      zdt1    1        400     752818  0.191490  ...
1037                              2        305     979112  0.260930  ...
1038
1039        (note that the `timedelta` column has been converted to microseconds,
1040        see the `timedeltas_to_microseconds` argument below). If `reset_index`
1041        is set to `True` (the default), then the index is reset, giving
1042        something like this:
1043
1044              algorithm problem  n_run  n_gen  timedelta   perf_gd  ...
1045            0     nsga2     bnh      1    155     886181  0.477980  ...
1046            1     nsga2     bnh      2    200      29909  0.480764  ...
1047            2     nsga2    zdt1      1    400     752818  0.191490  ...
1048            3     nsga2    zdt1      2    305     979112  0.260930  ...
1049
1050        This form is easier to plot.
1051
1052        Args:
1053            reset_index (bool): Whether to reset the index. Defaults to
1054                `True`.
1055            timedeltas_to_microseconds (bool): Whether to convert the
1056                `timedelta` column to microseconds. Defaults to `True`.
1057
1058        """
1059        df = self._results.groupby(["algorithm", "problem", "n_run"]).last()
1060        if timedeltas_to_microseconds:
1061            df["timedelta"] = df["timedelta"].dt.microseconds
1062        return df.reset_index() if reset_index else df

Returns a dataframe containing the final row of each algorithm/problem/n_run triple, i.e. the final record of each run of the benchmark.

If the reset_index argument is set to False, the resulting dataframe will have a multiindex given by the (algorithm, problem, n_run) tuples, e.g.

                         n_gen  timedelta   perf_gd  ...
algorithm problem n_run                              ...
nsga2     bnh     1        155     886181  0.477980  ...
                  2        200      29909  0.480764  ...
          zdt1    1        400     752818  0.191490  ...
                  2        305     979112  0.260930  ...

(note that the timedelta column has been converted to microseconds, see the timedeltas_to_microseconds argument below). If reset_index is set to True (the default), then the index is reset, giving something like this:

  algorithm problem  n_run  n_gen  timedelta   perf_gd  ...
0     nsga2     bnh      1    155     886181  0.477980  ...
1     nsga2     bnh      2    200      29909  0.480764  ...
2     nsga2    zdt1      1    400     752818  0.191490  ...
3     nsga2    zdt1      2    305     979112  0.260930  ...

This form is easier to plot.

Arguments:
  • reset_index (bool): Whether to reset the index. Defaults to True.
  • timedeltas_to_microseconds (bool): Whether to convert the timedelta column to microseconds. Defaults to True.
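
A usage sketch, assuming the benchmark has been run and that igd is among the computed performance indicators:

df = benchmark.final_results()
# Average final IGD of each problem-algorithm pair across its runs
print(df.groupby(["algorithm", "problem"])["perf_igd"].mean())
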
def run(self, n_jobs: int = -1, n_post_processing_jobs: int = 2, **joblib_kwargs):
1064    def run(
1065        self,
1066        n_jobs: int = -1,
1067        n_post_processing_jobs: int = 2,
1068        **joblib_kwargs,
1069    ):
1070        """
1071        Runs the benchmark. Makes your laptop go brr. The
1072        histories of all problems are progressively dumped in the specified
1073        output directory as the benchmark runs. At the end, the benchmark
1074        results are dumped in `output_dir_path/benchmark.csv`.
1075
1076        Args:
1077            n_jobs (int): Number of processes to use. See the `joblib.Parallel`_
1078                documentation. Defaults to `-1`, i.e. all CPUs are used.
1079            n_post_processing_jobs (int): Number of processes to use for post
1080                processing tasks (computing global Pareto populations and
1081                performance indicators). These are memory-intensive tasks.
1082                Defaults to `2`.
1083            joblib_kwargs (dict): Additional kwargs to pass on to the
1084                `joblib.Parallel`_ instance.
1085
1086        .. _joblib.Parallel:
1087            https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html
1088        """
1089        if not os.path.isdir(self._output_dir_path):
1090            os.mkdir(self._output_dir_path)
1091        triples = self.all_par_triples()
1092        executor = Parallel(n_jobs=n_jobs, **joblib_kwargs)
1093        current_round = 0
1094        while (
1095            self._max_retry < 0 or current_round <= self._max_retry
1096        ) and any(not self._par_triple_done(t) for t in triples):
1097            executor(
1098                delayed(Benchmark._run_par_triple)(self, t)
1099                for t in triples
1100                if not self._par_triple_done(t)
1101            )
1102            current_round += 1
1103        if any(not self._par_triple_done(t) for t in triples):
1104            logging.warning(
1105                "Benchmark finished, but some triples could not be run "
1106                "successfully within the retry budget ({}):",
1107                self._max_retry,
1108            )
1109            for t in filter(lambda x: not self._par_triple_done(x), triples):
1110                logging.warning("    [{}]", t)
1111        self.compute_global_pareto_populations(
1112            n_post_processing_jobs, **joblib_kwargs
1113        )
1114        self.compute_performance_indicators(
1115            n_post_processing_jobs, **joblib_kwargs
1116        )
1117        self.consolidate()

Runs the benchmark. Makes your laptop go brr. The histories of all problems are progressively dumped in the specified output directory as the benchmark runs. At the end, the benchmark results are dumped in output_dir_path/benchmark.csv. A call sketch follows the argument list below.

Arguments:
  • n_jobs (int): Number of processes to use. See the joblib.Parallel documentation. Defaults to -1, i.e. all CPUs are used.
  • n_post_processing_jobs (int): Number of processes to use for post processing tasks (computing global Pareto populations and performance indicators). These are memory-intensive tasks. Defaults to 2.
  • joblib_kwargs (dict): Additional kwargs to pass on to the joblib.Parallel instance.
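
A typical invocation; the job counts are illustrative, and any extra keyword argument (such as verbose) is forwarded to joblib.Parallel.

benchmark.run(
    n_jobs=4,                  # parallel problem-algorithm-run triples
    n_post_processing_jobs=2,  # memory-heavy post-processing steps
    verbose=10,                # joblib.Parallel progress output
)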