---
myst:
  html_meta:
    "description": "Guide for developers on running and writing ASV benchmarks for eOn, including local usage, CI integration, and adding new benchmarks."
    "keywords": "eOn benchmarks, ASV, airspeed velocity, performance testing, CI"
---

# Benchmarks

```{versionadded} 2.5
```

We use [ASV (Airspeed Velocity)](https://asv.readthedocs.io/) to track the performance of the `eonclient` binary across commits. Benchmarks live in the `benchmarks/` directory and are configured via `asv.conf.json` at the repository root.

## Benchmark suite

The current suite covers four workloads, each measuring wall-clock time and peak memory:

| Class | System | Job type |
|-------|--------|----------|
| `TimeSaddleSearchMorseDimer` | 337-atom Pt slab (Morse) | Saddle search (dimer) |
| `TimePointMorsePt` | 337-atom Pt slab (Morse) | Single point evaluation |
| `TimeMinimizationLJCluster` | 997-atom LJ cluster | LBFGS minimization |
| `TimeNEBMorsePt` | 337-atom Pt slab (Morse) | NEB (5 images) |

Input data for each benchmark is stored under `benchmarks/data/<name>/` and contains a `config.ini` plus the necessary `.con` geometry files.

## Running locally

ASV expects `eonclient` to be on `PATH`. Build and install it first:

```bash
meson setup builddir --prefix=$CONDA_PREFIX --libdir=lib --buildtype release
meson install -C builddir
```

Then install ASV and run the benchmarks against the current working tree:

```bash
pip install asv
asv machine --yes
asv run -E "existing:$(which python)" --set-commit-hash $(git rev-parse HEAD) --quick
```

The `--quick` flag runs each benchmark once; drop it for full statistical sampling (controlled by each class's `repeat` attribute). To compare two result files (e.g.
after running on two different commits):

```bash
MACHINE=$(ls .asv/results/ | grep -v benchmarks.json | head -1)
# TODO: switch to `uvx asv-spyglass` once the labels feature is released
uvx --from "asv-spyglass @ git+https://github.com/HaoZeke/asv_spyglass.git@enh-multiple-comparisons" \
  asv-spyglass compare --label-before before --label-after after \
  .asv/results/$MACHINE/*.json \
  .asv/results/$MACHINE/*.json
```

The `benchmarks.json` metadata file is auto-discovered from the results directory, providing units and parameter names. You can also export a single result to a DataFrame:

```bash
uvx asv-spyglass to-df .asv/results/$MACHINE/*.json --csv results.csv
```

Results are stored in `.asv/results/` and can be browsed as HTML with:

```bash
asv publish
asv preview
```

## CI integration

Every pull request targeting `main` triggers the **Benchmark PR** workflow (`.github/workflows/ci_benchmark.yml`). It:

1. Builds and installs `eonclient` at the `main` HEAD
2. Runs the full benchmark suite against `main`
3. Builds and installs `eonclient` at the PR HEAD
4. Runs the suite again against the PR
5. Compares the two runs using [asv-spyglass](https://github.com/airspeed-velocity/asv_spyglass) and posts a summary table as a PR comment

The comment is updated in place on subsequent pushes to the same PR.

## Adding a new benchmark

1. Create a data directory under `benchmarks/data/<name>/` containing a `config.ini` and any required `.con` files. You can reuse geometry files from `client/tests/` or `tests/data/`.
2. Add a new class in `benchmarks/bench_eonclient.py` following the existing pattern:

   ```python
   class TimeMyBenchmark:
       """Short description of the workload being benchmarked."""

       timeout = 120
       repeat = 5
       number = 1
       warmup_time = 0

       def setup(self):
           self.tmpdir = tempfile.mkdtemp(prefix="asv_eon_")
           _copy_data(BENCH_DATA / "my_benchmark", self.tmpdir)

       def teardown(self):
           shutil.rmtree(self.tmpdir, ignore_errors=True)

       def time_my_benchmark(self):
           """Wall-clock time."""
           subprocess.run(
               ["eonclient"],
               cwd=self.tmpdir,
               check=True,
               capture_output=True,
           )

       def peakmem_my_benchmark(self):
           """Peak memory."""
           subprocess.run(
               ["eonclient"],
               cwd=self.tmpdir,
               check=True,
               capture_output=True,
           )
   ```

3. Methods prefixed with `time_` measure wall-clock seconds; those prefixed with `peakmem_` measure peak RSS in bytes. ASV discovers both by naming convention.
4. Adjust `timeout` and `repeat` to match the expected cost of the workload. Cheap benchmarks (point evaluation) can use a higher `repeat`; expensive ones (NEB, saddle search) should use lower values.
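The same class convention also supports parameterized benchmarks via ASV's `params`/`param_names` attributes, which run each method once per parameter value. Below is a minimal sketch of what a size-parameterized variant could look like; the class name, the per-size data directories (e.g. `point_55/`), and the parameter values are hypothetical, not part of the existing suite:

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Mirrors the suite's data layout; the per-size subdirectories are illustrative.
BENCH_DATA = Path("benchmarks") / "data"


class TimePointMorsePtSized:
    """Hypothetical parameterized benchmark: ASV calls each method per param."""

    params = [55, 147, 337]   # illustrative system sizes
    param_names = ["natoms"]
    timeout = 120
    repeat = 5
    number = 1
    warmup_time = 0

    def setup(self, natoms):
        # Every parameter value gets its own fresh scratch directory.
        self.tmpdir = tempfile.mkdtemp(prefix="asv_eon_")
        # Assumes per-size inputs such as benchmarks/data/point_55/.
        shutil.copytree(BENCH_DATA / f"point_{natoms}", self.tmpdir,
                        dirs_exist_ok=True)

    def teardown(self, natoms):
        shutil.rmtree(self.tmpdir, ignore_errors=True)

    def time_point(self, natoms):
        """Wall-clock time of one point evaluation at the given size."""
        subprocess.run(["eonclient"], cwd=self.tmpdir,
                       check=True, capture_output=True)
```

ASV then reports one timing per parameter value (labeled `natoms=55`, `natoms=147`, ...), which makes scaling regressions visible without adding a separate class per system size.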