EESSI overlay vs. the default pixi environment

Two reproducible workflows

gpr_optim ships two pixi manifests that give the same source tree two coherent build environments. Both are versioned; either reproduces a working gprd binary and the test suite. Pick the one that matches the host:

| Environment | Manifest | When to use |
|---|---|---|
| Default pixi env | pixi.toml | Portable on any Linux/macOS box; the entire toolchain (compiler, BLAS, Eigen, HDF5, MPI, …) comes from conda-forge as a generic x86_64-v3 binary. |
| EESSI overlay | eessi/pixi.toml | HPC hosts or Linux boxes where /cvmfs/software.eessi.io can be mounted. Toolchain comes from EESSI’s foss/2025a stack, microarchitecture-tuned by archdetect at init. pixi pins only the packages EESSI does not yet ship. |

The two are not alternatives in the “one is deprecated” sense. The conda-forge env is the default because it needs no privileges and works everywhere; the EESSI overlay opts into the microarch-tuned HPC stack when available. Both are tested under CI-equivalent paths in local development.
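A minimal sketch of that choice as a shell helper, assuming the only signal needed is whether the EESSI repository is visible under /cvmfs. The function name pick_manifest and the test for a versions/ subdirectory are illustrative assumptions, not something gpr_optim ships.

```shell
# Hypothetical helper: print the manifest that matches this host.
# $1 optionally overrides the CVMFS repository path (useful for testing).
pick_manifest() {
    repo=${1:-/cvmfs/software.eessi.io}
    if [ -d "$repo/versions" ]; then
        echo "eessi/pixi.toml"   # EESSI tree is visible: use the overlay
    else
        echo "pixi.toml"         # no CVMFS: fall back to conda-forge
    fi
}
```

On a host without a CVMFS mount, pick_manifest prints pixi.toml; inside a cvmfsexec shell it should print eessi/pixi.toml.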

Dependency provenance (where each dep comes from)

This is the point of the overlay. Under the default env every line in [dependencies] is a conda-forge x86_64-v3 binary. Under the EESSI overlay, only the “shame list” (the packages EESSI does not yet ship) still comes from conda-forge; the rest resolves from the CVMFS tree.

| Dependency | Default pixi.toml (conda-forge) | EESSI overlay (eessi/pixi.toml) |
|---|---|---|
| GCC / gfortran / g++ | generic x86_64-v3 GCC | foss/2025a -> GCC 14.2, microarch-tuned |
| CMake | conda-forge | CMake/3.31.3-GCCcore-14.2.0 |
| meson / ninja / pkgconf | conda-forge | foss/2025a toolchain |
| Eigen | eigen | Eigen/3.4.0-GCCcore-14.2.0 |
| OpenBLAS | openblas | OpenBLAS/0.3.29-GCC-14.2.0 |
| HDF5 | hdf5 | HDF5/1.14.6-gompi-2025a |
| OpenMPI / ScaLAPACK / FFTW | openmpi, scalapack, fftw | foss/2025a |
| libhwy (Google Highway) | libhwy | conda-forge (not in EESSI) |
| google-benchmark | benchmark | conda-forge (not in EESSI) |
| gtest | gtest | conda-forge (EESSI builds with gtest but does not expose it as a consumable module) |
| CapnProto (opt) | capnproto | conda-forge |
| chemfiles (opt) | chemfiles | conda-forge |
| rust (for cbindgen) | rust | conda-forge |

The “shame list” (the rows whose EESSI overlay entry still reads conda-forge) is deliberately short. Each row is a standing invitation to upstream an EasyBuild recipe against foss/2025a; once EESSI ships a package, drop it from eessi/pixi.toml.

HighFive used to be on the shame list until its conda-forge CMake config was found to SONAME-pin downstream binaries to conda HDF5 at link time, defeating the EESSI HDF5 story. gpr_optim now calls the HDF5 C API directly via managers/io/H5Dump.{h,cpp}, so no HDF5 wrapper library sits in the way.

Default pixi workflow

Works on any Linux or macOS box with pixi installed. No privileges, no CVMFS mount needed.

git clone https://github.com/TheochemUI/gpr_optim.git
cd gpr_optim

pixi install                                  # ~15 dep resolution
pixi shell
meson setup build -Dwith_tests=true -Duse_openblas=true -Duse_hdf5=true \
    -Duse_highway=true -Duse_openmp=true
meson compile -C build
meson test -C build --print-errorlogs

For MPI / ScaLAPACK builds, use the scalapack feature:

pixi shell -e scalapack
meson setup build-scalapack -Dlinalg_backend=scalapack \
    -Duse_openblas=true -Duse_hdf5=true -Dwith_tests=true
meson compile -C build-scalapack

EESSI overlay workflow

Requires /cvmfs/software.eessi.io mounted. On HPC clusters the admin usually provides this; on a laptop, the overlay ships a one-shot bootstrap script (eessi/scripts/bootstrap_eessi.sh) that sets up cvmfsexec mode 3 (unprivileged user-namespace fuse mount).

Arch Linux prerequisite: cvmfsexec with the el9 libfuse3 fix

Arch’s fuse3 bumped the SONAME to libfuse3.so.4 (v4 ABI); EESSI’s RHEL9-built cvmfs-fuse3 is linked against libfuse3.so.3. A symlink does not resolve the mismatch (the SONAME in the v4 ELF overrides the file name). Fix: extract the el9 fuse3-libs RPM into cvmfsexec/dist/ so the bundled cvmfs-fuse3 sees a real v3-ABI libfuse3.so.3.

The script also installs the real EESSI pub key (the default ships a 404-body placeholder in some older makedist runs) and switches the server URL list from Stratum-0 (release-only) to the three published Stratum-1s with GeoAPI.

cd eessi
pixi run bootstrap-eessi          # idempotent; ~2 min first time

The script applies three fixes:

  • Extracts the libfuse3-3.10.2-9.el9.x86_64.rpm AppStream package into cvmfsexec/dist/usr/lib64/ so the bundled libcvmfs_fuse3.so resolves its NEEDED libfuse3.so.3.

  • Overwrites dist/etc/cvmfs/keys/eessi.io/software.eessi.io.pub with the real EESSI domain key (inlined from the filesystem-layer Ansible inventory).

  • Overwrites dist/etc/cvmfs/config.d/software.eessi.io.conf with the three-Stratum-1 URL list and CVMFS_USE_GEOAPI=yes.
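The first fix hinges on which SONAME a given libfuse3 file actually carries. A small sketch for checking that, assuming binutils' readelf is available on the host; the helper name soname_of is hypothetical and the library path in the comment is only an example.

```shell
# Hypothetical helper: extract the SONAME from `readelf -d` output on stdin.
# Prints nothing when the input has no (SONAME) entry.
soname_of() {
    sed -n 's/.*(SONAME).*\[\([^]]*\)].*/\1/p'
}

# Example invocation (needs readelf; adjust the path to your system):
#   readelf -d /usr/lib/libfuse3.so.3 | soname_of
```

If the printed name ends in .so.4 for a file that is supposed to satisfy libfuse3.so.3, the file is the v4 library under another name, which is the situation the el9 RPM extraction is meant to avoid.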

The debug narrative for the original investigation of the three stacked failures lives in obsidian-notes/Software/HPC/cvmfsexec_arch_linux_eessi_mount_debug.org (internal).

Full EESSI build flow

# One-time: build cvmfsexec + inject Arch fixes (skip on native-CVMFS hosts).
cd eessi
pixi run bootstrap-eessi

# Every session: drop into an EESSI-aware shell.
~/cvmfsexec/cvmfsexec -N software.eessi.io -- bash -l

# Inside the cvmfsexec shell:
cd ~/path/to/gpr_optim/eessi
pixi shell -e dev

# Verify EESSI resolved the microarch and toolchain correctly:
pixi run probe
# expected on x86_64/intel/haswell:
#   EESSI subdir:   x86_64/intel/haswell
#   gcc:            gcc (GCC) 14.2.0
#   openblas:       0.3.29
#   eigen3:         3.4.0
#   hdf5:           1.14.6
# On x86_64/amd/zen2 the subdir changes to x86_64/amd/zen2 and the same
# tools resolve to the zen2-tuned binaries. No source edits needed.

# Configure + compile + test:
pixi run setup
pixi run build
pixi run test

eessi/pixi.toml’s [activation.env] sets:

  • EESSI_VERSION=2025.06 (CVMFS tree epoch).

  • EESSI_TOOLCHAIN=foss/2025a (GCC 14.2 + OpenBLAS + FFTW + OpenMPI + ScaLAPACK coherent bundle).

  • GPR_SRC=$PIXI_PROJECT_ROOT/.. (points meson at the repo root, keeps the build tree under eessi/build/).

  • GPR_BUILDDIR=$PIXI_PROJECT_ROOT/build.

  • GPR_PREFIX=$PIXI_PROJECT_ROOT/install.

Override any of them via shell env before pixi run if you need a different EESSI epoch, toolchain, or install prefix.
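As a sketch of how such an override can propagate, the helper below builds the init-script path from EESSI_VERSION and falls back to the documented default. The function name is hypothetical, and whether eessi/env.sh uses this exact fallback pattern is an assumption; 2023.06 in the comment is only an example value.

```shell
# Hypothetical helper: resolve the EESSI init script for the active epoch.
# Falls back to the documented default when EESSI_VERSION is unset or empty.
eessi_init_script() {
    printf '/cvmfs/software.eessi.io/versions/%s/init/bash\n' \
        "${EESSI_VERSION:-2025.06}"
}

# Example: EESSI_VERSION=2023.06 would yield
#   /cvmfs/software.eessi.io/versions/2023.06/init/bash
```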

What the overlay does at build time

  1. eessi/env.sh sources /cvmfs/software.eessi.io/versions/${EESSI_VERSION}/init/bash which runs archdetect and prepends the matching microarch module tree to $MODULEPATH.

  2. module load foss/2025a runs first, followed by explicit GCCcore-14.2-generation pins: OpenBLAS/0.3.29-GCC-14.2.0, Eigen/3.4.0-GCCcore-14.2.0, HDF5/1.14.6-gompi-2025a, CMake/3.31.3-GCCcore-14.2.0. Without the pins, Lmod silently swaps in GCCcore-14.3-built siblings.

  3. LD_LIBRARY_PATH is prepended with every $EBROOT* install root’s lib subdir that publishes one, and with $EBROOTGCCCORE/lib64 so our own (non-RPATH) binaries resolve libstdc++.so.6 to GCC 14.

  4. CMAKE_PREFIX_PATH is prepended with the EESSI install roots so find_package(HDF5) etc. pick the CVMFS versions. The pixi conda env (if any) is appended after, so shame-list packages stay reachable but do not override EESSI.

  5. meson setup is invoked with --native-file nativeFiles/eessi.ini, a zero-path meson native file that refers to gcc, g++, gfortran, cmake, and pkg-config by bare name, so each is resolved against the PATH that module load just prepended.
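Steps 3 and 4 can be sketched in portable shell as follows. This is not the contents of eessi/env.sh: the function names prepend_path and add_eb_roots are hypothetical, and the sketch assumes install roots without spaces in their paths.

```shell
# Prepend DIR to the colon-separated variable named VAR, skipping duplicates.
prepend_path() {
    var=$1 dir=$2
    eval "cur=\${$var:-}"
    case ":$cur:" in
        *":$dir:"*) ;;                                   # already listed
        *) eval "$var=\"\$dir\${cur:+:\$cur}\""; export "$var" ;;
    esac
}

# Walk every exported EBROOT* variable (set by module load) and register
# the directories the numbered steps describe.
add_eb_roots() {
    for root in $(env | sed -n 's/^EBROOT[A-Za-z0-9_]*=//p'); do
        if [ -d "$root/lib" ]; then
            prepend_path LD_LIBRARY_PATH "$root/lib"     # step 3
        fi
        prepend_path CMAKE_PREFIX_PATH "$root"           # step 4
    done
    # GCC's runtime libraries live under lib64 rather than lib.
    if [ -n "${EBROOTGCCCORE:-}" ]; then
        prepend_path LD_LIBRARY_PATH "$EBROOTGCCCORE/lib64"
    fi
}
```

Because prepend_path skips directories that are already listed, calling add_eb_roots twice leaves both variables unchanged, which matters when the environment is re-activated within one session.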

Verifying “all deps from EESSI” on the built binary

ldd eessi/build/gprd | grep -E 'openblas|hdf5|libstdc\+\+|libgomp|libgfortran'
# expected under the overlay, all from /cvmfs:
#   libstdc++.so.6 => .../GCCcore-14.2.0/lib64/libstdc++.so.6
#   libopenblas.so.0 => .../OpenBLAS/0.3.29-GCC-14.2.0/lib/libopenblas.so.0
#   libhdf5.so.310 => .../HDF5/1.14.6-gompi-2025a/lib/libhdf5.so.310
#   libgomp.so.1 => .../GCCcore-14.2.0/lib64/libgomp.so.1
#   libgfortran.so.5 => .../GCCcore-14.2.0/lib64/libgfortran.so.5
# libhwy.so.1 and aws-c-* stay in .pixi/envs/default/lib/ (shame list).

If libhdf5 or libopenblas resolves back to .pixi/envs/default/lib, either env.sh was not sourced or a conda HDF5/OpenBLAS is shadowing EESSI. Rerun pixi run probe to surface the mismatch.
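The same check can be automated. The sketch below reads ldd output on stdin and prints only the watched libraries that resolve outside /cvmfs, so empty output means the overlay held. The function name non_cvmfs_libs is hypothetical, and the list of watched libraries simply mirrors the grep pattern above.

```shell
# Hypothetical helper: list watched libraries that did NOT resolve from /cvmfs.
# Expects `ldd <binary>` output on stdin ("name => path (address)").
non_cvmfs_libs() {
    awk '$2 == "=>" &&
         $1 ~ /openblas|hdf5|libstdc[+][+]|libgomp|libgfortran/ &&
         $3 !~ /^\/cvmfs\// { print $1, $3 }'
}

# Example invocation:
#   ldd eessi/build/gprd | non_cvmfs_libs
```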

When the overlay helps and when it doesn’t

The microarch advantage varies by workload. On scalar-dominated pair potentials (Morse, Lennard-Jones) that SSE4 already vectorises well, the gap between the generic conda binary and EESSI’s Haswell/Zen2 tree is typically single-digit percent. On BLAS- or SIMD-heavy paths (GPR-dimer Cholesky, Highway distance kernels, FFT-heavy workloads), the gap is meaningful because EESSI’s OpenBLAS is built with AVX2+FMA (Haswell) or Zen-specific kernels and is the same BLAS the rest of the foss/2025a tree was linked against.

If you are on generic hardware (cloud CI, a VM with only SSE2 available) the overlay has nothing to offer over the conda env – archdetect will pick the x86_64/generic subtree and the binaries are indistinguishable from conda-forge’s x86_64-v3 build.
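For a quick sanity check before mounting anything, the sketch below reports whether a CPU flags line advertises both AVX2 and FMA. It is far coarser than archdetect, which does its own detection; the helper names are hypothetical and the /proc/cpuinfo example applies to Linux on x86_64 only.

```shell
# Return success if flag $1 appears as a whole word in the flag list $2.
has_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: classify a CPU flags line read from stdin.
simd_tier() {
    flags=$(tr '\n' ' ')
    if has_flag avx2 "$flags" && has_flag fma "$flags"; then
        echo "avx2+fma"
    else
        echo "baseline"
    fi
}

# Example invocation (Linux, x86_64):
#   grep -m1 '^flags' /proc/cpuinfo | simd_tier
```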

Measured performance (bench_distance)

Both builds were produced with identical meson options (--buildtype=release -Duse_openblas=true -Duse_hdf5=true -Duse_highway=true -Duse_openmp=true -Dwith_tests=true). The default pixi build compiles against conda-forge openblas / libhwy / gcc x86_64-v3; the EESSI overlay resolves the same names against foss/2025a (GCC 14.2.0, OpenBLAS/0.3.29-GCC-14.2.0, Eigen/3.4.0-GCCcore-14.2.0, HDF5/1.14.6-gompi-2025a) on the Haswell subtree.

Reproduce:

cd eessi
# EESSI build (inside cvmfsexec shell):
pixi shell -e dev
meson setup build .. --reconfigure --native-file nativeFiles/eessi.ini \
    --buildtype=release -Duse_openblas=true -Duse_hdf5=true \
    -Duse_highway=true -Duse_openmp=true -Dwith_tests=true
meson compile -C build

# Matching default-env build (outside cvmfsexec, top-level pixi):
cd ..
pixi run bash -c 'meson setup bench-pixi-release --wipe \
    --buildtype=release -Duse_openblas=true -Duse_hdf5=true \
    -Duse_highway=true -Duse_openmp=true -Dwith_tests=true \
    && meson compile -C bench-pixi-release'

# Side-by-side compare (inside cvmfsexec shell):
cd eessi && pixi run bench-compare
# Results: eessi/results/bench_distance_{eessi,pixi}.json and
# eessi/results/bench_distance_summary.md

Results from a 2026-04-14 run on an Intel Haswell laptop under non-trivial load (the large σ-bars reflect the load, not a methodology flaw – re-running on an idle box tightens them but does not change the qualitative story):

| benchmark | pixi mean ± σ [us] | EESSI mean ± σ [us] | delta | classification |
|---|---|---|---|---|
| BM_dist_at/5/2/0 | 35 ± 1 | 30 ± 5 | +14.5% | noise |
| BM_dist_at/10/2/0 | 36 ± 2 | 35 ± 4 | +3.1% | noise |
| BM_dist_at/20/2/0 | 30 ± 10 | 37 ± 4 | -23.8% | noise |
| BM_dist_at/40/2/0 | 42 ± 5 | 37 ± 6 | +11.4% | noise |
| BM_dist_at/5/2/10 | 681 ± 105 | 698 ± 55 | -2.5% | noise |
| BM_dist_at/10/2/10 | 708 ± 73 | 753 ± 29 | -6.3% | noise |
| BM_dist_at/20/2/10 | 704 ± 40 | 679 ± 75 | +3.5% | noise |
| BM_dist_at/40/2/10 | 691 ± 148 | 696 ± 27 | -0.7% | noise |
| BM_dist_at/10/2/50 | 3592 ± 690 | 2646 ± 470 | +26.3% | EESSI faster |
| BM_dist_at/20/2/50 | 5969 ± 1160 | 5428 ± 969 | +9.1% | noise |
| BM_dist_at/40/2/50 | 6083 ± 670 | 9847 ± 3639 | -61.9% | pixi faster |
| BM_dist_at/10/2/200 | 27887 ± 8176 | 12420 ± 1702 | +55.5% | EESSI faster |
| BM_dist_at/20/2/200 | 14904 ± 1473 | 16255 ± 4868 | -9.1% | noise |

“Classification” is “EESSI faster” / “pixi faster” when the absolute difference between the two means exceeds the larger of the two σ bars; otherwise it is “noise”. Naming convention: BM_dist_at/<n_obs>/<n_mov>/<n_fro>.

Small configs (n_fro=0 or n_fro=10) land in noise because the loop is short enough that CPU-clock and cache jitter dominate. Larger configs (n_fro=50 or n_fro=200) exercise the distance kernel hard enough for the EESSI AVX2/FMA OpenBLAS to show wins (BM_dist_at/10/2/200 shows +55%, BM_dist_at/10/2/50 shows +26%).

One outlier at BM_dist_at/40/2/50 shows pixi faster, but EESSI’s σ=3639 us on that config is 37% of the mean: that run hit a load spike during the EESSI side and is not reproducible on subsequent runs.
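The classification rule can be sketched as a small shell function around awk. The sign convention (delta = (pixi − EESSI) / pixi, positive when EESSI is faster) is inferred from the table rather than taken from the bench-compare task, and the function name classify is hypothetical. Percentages recomputed from the rounded means in the table can differ from the printed delta in the last digit for the smallest configs.

```shell
# Hypothetical helper: classify one benchmark row.
# usage: classify PIXI_MEAN PIXI_SIGMA EESSI_MEAN EESSI_SIGMA
classify() {
    awk -v pm="$1" -v ps="$2" -v em="$3" -v es="$4" 'BEGIN {
        diff  = pm - em                       # positive: EESSI is faster
        sigma = (ps + 0 > es + 0) ? ps : es   # larger of the two sigma bars
        absd  = (diff < 0) ? -diff : diff
        if (absd > sigma)
            label = (diff > 0) ? "EESSI faster" : "pixi faster"
        else
            label = "noise"
        printf "%+.1f%% %s\n", 100 * diff / pm, label
    }'
}
```

For the BM_dist_at/10/2/50 row, classify 3592 690 2646 470 prints +26.3% EESSI faster, matching the table.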

Summary: on the workload sizes where the hot path actually matters, EESSI is equal-or-better; on small configs it is statistically indistinguishable, which is the honest outcome – a generic x86_64-v3 compiler does an adequate job of the small SSE4 loops too.

Raw benchmark --benchmark_out_format=json captures live in eessi/results/bench_distance_{eessi,pixi}.json for re-analysis with other statistical tools.

Cross-references

  • Upstream packaging strategy (Spack / EasyBuild / EESSI candidate assessment): obsidian-notes/Software/eOn/packaging-strategy-spack-easybuild-eessi.org.

  • cvmfsexec mount debug (the origin of the Arch libfuse3 fix): obsidian-notes/Software/HPC/cvmfsexec_arch_linux_eessi_mount_debug.org.

  • eOn’s analogous overlay and the first-pass benchmark results that motivated the gpr_optim overlay: obsidian-notes/Software/HPC/eon_eessi_overlay_session.org.

  • In-repo overlay README: eessi/README.md.