EESSI overlay vs. the default pixi environment
Two reproducible workflows
gpr_optim ships two pixi manifests that give the same source tree two
coherent build environments. Both are versioned; either reproduces a
working gprd binary and the test suite. Pick the one that matches the
host:
| Environment | Manifest | When to use |
|---|---|---|
| Default | | Portable on any Linux/macOS box; the entire toolchain (compiler, BLAS, Eigen, HDF5, MPI, …) comes from conda-forge as a generic |
| EESSI overlay | | HPC hosts or Linux boxes where |
The two are not alternatives in the “one is deprecated” sense. The conda-forge env is the default because it needs no privileges and works everywhere; the EESSI overlay opts into the microarch-tuned HPC stack when available. Both are tested under CI-equivalent paths in local development.
Dependency provenance (where each dep comes from)
This is the point of the overlay. Under the default env every line in
[dependencies] is a conda-forge x86_64-v3 binary. Under the EESSI
overlay, only the “shame list” remains; the rest resolves from the
CVMFS tree.
| Dependency | Default | EESSI overlay ( |
|---|---|---|
| GCC / gfortran / g++ | generic | |
| CMake | conda-forge | |
| meson / ninja / pkgconf | conda-forge | |
| Eigen | | |
| OpenBLAS | | |
| HDF5 | | |
| OpenMPI / ScaLAPACK / FFTW | | |
| libhwy (Google Highway) | | conda-forge (not in EESSI) |
| google-benchmark | | conda-forge (not in EESSI) |
| gtest | | conda-forge (EESSI builds with gtest but does not expose it as a consumable module) |
| CapnProto (opt) | | conda-forge |
| chemfiles (opt) | | conda-forge |
| rust (for cbindgen) | | conda-forge |
The “shame list” column is deliberately short. Each row is a standing
invitation to upstream an EasyBuild recipe against foss/2025a; once
EESSI ships a package, drop it from eessi/pixi.toml.
HighFive used to be on the shame list until its conda-forge CMake
config was found to SONAME-pin downstream binaries to conda HDF5 at
link time, defeating the EESSI HDF5 story. gpr_optim now calls the
HDF5 C API directly via managers/io/H5Dump.{h,cpp}, so no HDF5
wrapper library sits in the way.
Default pixi workflow
Works on any Linux or macOS box with pixi installed. No privileges,
no CVMFS mount needed.
```bash
git clone https://github.com/TheochemUI/gpr_optim.git
cd gpr_optim
pixi install # ~15 dep resolution
pixi shell
meson setup build -Dwith_tests=true -Duse_openblas=true -Duse_hdf5=true \
  -Duse_highway=true -Duse_openmp=true
meson compile -C build
meson test -C build --print-errorlogs
```
For MPI / ScaLAPACK builds, use the scalapack feature:
```bash
pixi shell -e scalapack
meson setup build-scalapack -Dlinalg_backend=scalapack \
  -Duse_openblas=true -Duse_hdf5=true -Dwith_tests=true
meson compile -C build-scalapack
```
EESSI overlay workflow
Requires /cvmfs/software.eessi.io mounted. On HPC clusters the admin
usually provides this; on a laptop, the overlay ships a one-shot
bootstrap script (eessi/scripts/bootstrap_eessi.sh) that sets up
cvmfsexec mode 3 (unprivileged user-namespace fuse mount).
Arch Linux prerequisite: cvmfsexec with the el9 libfuse3 fix
Arch’s fuse3 bumped the SONAME to libfuse3.so.4 (v4 ABI); EESSI’s
RHEL9-built cvmfs-fuse3 is linked against libfuse3.so.3. A symlink
does not resolve the mismatch (the SONAME in the v4 ELF overrides the file
name). Fix: extract the el9 fuse3-libs RPM into cvmfsexec/dist/ so
the bundled cvmfs-fuse3 sees a real v3-ABI libfuse3.so.3.
The script also installs the real EESSI pub key (the default ships a
404-body placeholder in some older makedist runs) and switches the
server URL list from Stratum-0 (release-only) to the three published
Stratum-1s with GeoAPI.
```bash
cd eessi
pixi run bootstrap-eessi # idempotent; ~2 min first time
```
Three fixes it applies:
1. Extracts the libfuse3-3.10.2-9.el9.x86_64.rpm AppStream package into cvmfsexec/dist/usr/lib64/ so the bundled libcvmfs_fuse3.so resolves its NEEDED libfuse3.so.3.
2. Overwrites dist/etc/cvmfs/keys/eessi.io/software.eessi.io.pub with the real EESSI domain key (inlined from the filesystem-layer Ansible inventory).
3. Overwrites dist/etc/cvmfs/config.d/software.eessi.io.conf with the three-Stratum-1 URL list and CVMFS_USE_GEOAPI=yes.
Debug narrative with the original three-stacked-failures investigation
lives in obsidian-notes/Software/HPC/cvmfsexec_arch_linux_eessi_mount_debug.org
(internal).
Full EESSI build flow
```bash
# One-time: build cvmfsexec + inject Arch fixes (skip on native-CVMFS hosts).
cd eessi
pixi run bootstrap-eessi

# Every session: drop into an EESSI-aware shell.
~/cvmfsexec/cvmfsexec -N software.eessi.io -- bash -l

# Inside the cvmfsexec shell:
cd ~/path/to/gpr_optim/eessi
pixi shell -e dev

# Verify EESSI resolved the microarch and toolchain correctly:
pixi run probe
# expected on x86_64/intel/haswell:
# EESSI subdir: x86_64/intel/haswell
# gcc: gcc (GCC) 14.2.0
# openblas: 0.3.29
# eigen3: 3.4.0
# hdf5: 1.14.6
# On x86_64/amd/zen2 the subdir changes to x86_64/amd/zen2 and the same
# tools resolve to the zen2-tuned binaries. No source edits needed.

# Configure + compile + test:
pixi run setup
pixi run build
pixi run test
```
eessi/pixi.toml’s [activation.env] sets:
- EESSI_VERSION=2025.06 (CVMFS tree epoch).
- EESSI_TOOLCHAIN=foss/2025a (GCC 14.2 + OpenBLAS + FFTW + OpenMPI + ScaLAPACK coherent bundle).
- GPR_SRC=$PIXI_PROJECT_ROOT/.. (points meson at the repo root, keeps the build tree under eessi/build/).
- GPR_BUILDDIR=$PIXI_PROJECT_ROOT/build.
- GPR_PREFIX=$PIXI_PROJECT_ROOT/install.
Override any of them via shell env before pixi run if you need a
different EESSI epoch, toolchain, or install prefix.
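The override pattern can be sketched in a few lines of shell. This is an illustration, not code from the repository: eessi_init_script is a hypothetical helper, the 2025.06 fallback is the default quoted above, and 2023.06 stands in for "some other epoch".

```bash
# Hypothetical helper: print the EESSI init script path for the epoch in use.
# Falls back to the overlay's documented default when EESSI_VERSION is unset.
eessi_init_script() {
  local version="${EESSI_VERSION:-2025.06}"
  printf '/cvmfs/software.eessi.io/versions/%s/init/bash\n' "$version"
}

# Override for a single command, as described above (illustrative value):
#   EESSI_VERSION=2023.06 pixi run probe
```

Setting the variable on the command line keeps the override scoped to that one invocation, so the next plain pixi run falls back to the manifest defaults.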
What the overlay does at build time
1. eessi/env.sh sources /cvmfs/software.eessi.io/versions/${EESSI_VERSION}/init/bash, which runs archdetect and prepends the matching microarch module tree to $MODULEPATH.
2. module load foss/2025a, then loads explicit GCCcore-14.2-generation pins: OpenBLAS/0.3.29-GCC-14.2.0, Eigen/3.4.0-GCCcore-14.2.0, HDF5/1.14.6-gompi-2025a, CMake/3.31.3-GCCcore-14.2.0. Without the pins Lmod silently swaps in GCCcore-14.3-built siblings.
3. LD_LIBRARY_PATH is prepended with every $EBROOT* install root’s lib subdir that publishes one, and with $EBROOTGCCCORE/lib64 so our own (non-RPATH) binaries resolve libstdc++.so.6 to GCC 14.
4. CMAKE_PREFIX_PATH is prepended with the EESSI install roots so find_package(HDF5) etc. pick the CVMFS versions. The pixi conda env (if any) is appended after, so shame-list packages stay reachable but do not override EESSI.
5. meson setup is invoked with --native-file nativeFiles/eessi.ini, a zero-path meson native file that just names gcc, g++, gfortran, cmake, pkg-config by name – resolved against the PATH module load just prepended.
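The LD_LIBRARY_PATH step can be sketched as follows. This is an illustration under assumptions, not the contents of eessi/env.sh: ebroot_lib_dirs is a hypothetical helper, and it relies only on the convention mentioned above that each loaded module exports an EBROOT* variable pointing at its install root.

```bash
# Hypothetical helper: print a colon-separated list of "<root>/lib" for every
# exported EBROOT* variable whose install root actually has a lib directory.
ebroot_lib_dirs() {
  env | sed -n 's/^EBROOT[A-Za-z0-9_]*=//p' | while IFS= read -r root; do
    if [ -d "$root/lib" ]; then
      printf '%s\n' "$root/lib"
    fi
  done | paste -sd: -
}

# Possible use, mirroring the prepend order described above:
#   LD_LIBRARY_PATH="$EBROOTGCCCORE/lib64:$(ebroot_lib_dirs)${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
#   export LD_LIBRARY_PATH
```

Printing the list instead of mutating LD_LIBRARY_PATH inside the function keeps the helper free of side effects, which makes it easy to inspect before committing to the new search path.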
Verifying “all deps from EESSI” on the built binary
```bash
ldd eessi/build/gprd | grep -E 'openblas|hdf5|libstdc\+\+|libgomp|libgfortran'
# expected under the overlay, all from /cvmfs:
# libstdc++.so.6 => .../GCCcore-14.2.0/lib64/libstdc++.so.6
# libopenblas.so.0 => .../OpenBLAS/0.3.29-GCC-14.2.0/lib/libopenblas.so.0
# libhdf5.so.310 => .../HDF5/1.14.6-gompi-2025a/lib/libhdf5.so.310
# libgomp.so.1 => .../GCCcore-14.2.0/lib64/libgomp.so.1
# libgfortran.so.5 => .../GCCcore-14.2.0/lib64/libgfortran.so.5
# libhwy.so.1 and aws-c-* stay in .pixi/envs/default/lib/ (shame list).
```
If libhdf5 or libopenblas resolves back to .pixi/envs/default/lib,
either env.sh was not sourced or a conda HDF5/OpenBLAS is shadowing
EESSI. Rerun pixi run probe to surface the mismatch.
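The same check can be automated so that a stray conda library fails loudly instead of being spotted by eye. The sketch below is not part of the repository; non_eessi_libs is a hypothetical filter that reads ldd output on stdin and assumes the usual "name => path (address)" line layout.

```bash
# Hypothetical filter: print every BLAS/HDF5/GCC-runtime line from `ldd`
# output that does not resolve under /cvmfs, and return non-zero if any exist.
non_eessi_libs() {
  awk '
    /openblas|hdf5|libstdc\+\+|libgomp|libgfortran/ {
      # Field 3 is the resolved path ("not" when ldd reports "not found").
      if ($3 !~ /^\/cvmfs\//) { print; bad = 1 }
    }
    END { exit bad + 0 }
  '
}

# Possible use inside the overlay shell:
#   ldd eessi/build/gprd | non_eessi_libs || echo "some libraries are not from EESSI"
```

Shame-list libraries such as libhwy are deliberately left out of the pattern, since they are expected to resolve from the pixi environment.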
When the overlay helps and when it doesn’t
The microarch advantage varies by workload. On scalar-dominated pair potentials (Morse, Lennard-Jones) that SSE4 already vectorises well, the gap between the generic conda binary and EESSI’s Haswell/Zen2 tree is typically single-digit percent. On BLAS- or SIMD-heavy paths (GPR-dimer Cholesky, Highway distance kernels, FFT-heavy workloads), the gap is meaningful because EESSI’s OpenBLAS is built with AVX2+FMA (Haswell) or Zen-specific kernels and is the same BLAS the rest of the foss/2025a tree was linked against.
If you are on generic hardware (cloud CI, a VM with only SSE2
available) the overlay has nothing to offer over the conda env –
archdetect will pick the x86_64/generic subtree and the binaries are
indistinguishable from conda-forge’s x86_64-v3 build.
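A quick way to see which side of this divide a Linux host falls on is to look at the CPU flags the kernel reports. The sketch below is not how archdetect makes its decision; cpu_has_flags is a hypothetical helper that only checks whether the first flags line of a cpuinfo-style file lists every requested flag.

```bash
# Hypothetical helper: succeed only if every named flag appears on the first
# "flags" line of the given cpuinfo-style file (e.g. /proc/cpuinfo on x86 Linux).
cpu_has_flags() {
  local cpuinfo="$1"; shift
  local flags flag
  # Pad with spaces so each flag can be matched as a whole word.
  flags=" $(grep -m1 '^flags' "$cpuinfo" | cut -d: -f2) "
  for flag in "$@"; do
    case "$flags" in
      *" $flag "*) ;;
      *) return 1 ;;
    esac
  done
  return 0
}

# Possible use:
#   cpu_has_flags /proc/cpuinfo avx2 fma && echo "AVX2+FMA reported by the kernel"
```

If the check fails, the host is unlikely to benefit from the Haswell-class OpenBLAS kernels discussed above.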
Measured performance (bench_distance)
Both builds were produced with identical meson options
(--buildtype=release -Duse_openblas=true -Duse_hdf5=true -Duse_highway=true -Duse_openmp=true -Dwith_tests=true). The default
pixi build compiles against conda-forge openblas / libhwy / gcc
x86_64-v3; the EESSI overlay resolves the same names against
foss/2025a (GCC 14.2.0, OpenBLAS/0.3.29-GCC-14.2.0,
Eigen/3.4.0-GCCcore-14.2.0, HDF5/1.14.6-gompi-2025a) on the Haswell
subtree.
Reproduce:
```bash
cd eessi
# EESSI build (inside cvmfsexec shell):
pixi shell -e dev
meson setup build .. --reconfigure --native-file nativeFiles/eessi.ini \
  --buildtype=release -Duse_openblas=true -Duse_hdf5=true \
  -Duse_highway=true -Duse_openmp=true -Dwith_tests=true
meson compile -C build

# Matching default-env build (outside cvmfsexec, top-level pixi):
cd ..
pixi run bash -c 'meson setup bench-pixi-release --wipe \
  --buildtype=release -Duse_openblas=true -Duse_hdf5=true \
  -Duse_highway=true -Duse_openmp=true -Dwith_tests=true \
  && meson compile -C bench-pixi-release'

# Side-by-side compare (inside cvmfsexec shell):
cd eessi && pixi run bench-compare
# Results: eessi/results/bench_distance_{eessi,pixi}.json and
# eessi/results/bench_distance_summary.md
```
Results from a 2026-04-14 run on an Intel Haswell laptop under non-trivial load (the large σ-bars reflect the load, not a methodology flaw – re-running on an idle box tightens them but does not change the qualitative story):
| benchmark | pixi mean ± σ [us] | EESSI mean ± σ [us] | delta | classification |
|---|---|---|---|---|
| | 35 ± 1 | 30 ± 5 | +14.5% | noise |
| | 36 ± 2 | 35 ± 4 | +3.1% | noise |
| | 30 ± 10 | 37 ± 4 | -23.8% | noise |
| | 42 ± 5 | 37 ± 6 | +11.4% | noise |
| | 681 ± 105 | 698 ± 55 | -2.5% | noise |
| | 708 ± 73 | 753 ± 29 | -6.3% | noise |
| | 704 ± 40 | 679 ± 75 | +3.5% | noise |
| | 691 ± 148 | 696 ± 27 | -0.7% | noise |
| | 3592 ± 690 | 2646 ± 470 | +26.3% | EESSI faster |
| | 5969 ± 1160 | 5428 ± 969 | +9.1% | noise |
| | 6083 ± 670 | 9847 ± 3639 | -61.9% | pixi faster |
| | 27887 ± 8176 | 12420 ± 1702 | +55.5% | EESSI faster |
| | 14904 ± 1473 | 16255 ± 4868 | -9.1% | noise |
“Classification” is “EESSI faster” / “pixi faster” when the mean-delta
exceeds the larger of the two σ bars; otherwise “noise”. Naming
convention: BM_dist_at/<n_obs>/<n_mov>/<n_fro>. Small configs
(n_fro=0 or n_fro=10) land in noise because the loop is short
enough that CPU-clock and cache jitter dominate. Larger configs
(n_fro=50 or n_fro=200) exercise the distance kernel hard enough
for the EESSI AVX2/FMA OpenBLAS to show wins (BM_dist_at/10/2/200
shows +55%, BM_dist_at/10/2/50 shows +26%). One outlier at
BM_dist_at/40/2/50 shows pixi faster, but EESSI’s σ=3639 us on that
config is 37% of the mean – that run hit a load spike during the
EESSI side and is not reproducible on subsequent runs.
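The classification rule is simple enough to recompute by hand from the table. The sketch below is not the bench-compare implementation; classify_delta is a hypothetical helper, and the delta formula (pixi mean minus EESSI mean, relative to the pixi mean) is an assumption inferred from the tabulated values.

```bash
# Hypothetical helper. Arguments: pixi_mean pixi_sigma eessi_mean eessi_sigma
# (all in the same unit). Prints the signed relative delta and a label.
classify_delta() {
  awk -v pm="$1" -v ps="$2" -v em="$3" -v es="$4" 'BEGIN {
    diff  = pm - em                      # positive when EESSI is faster
    sigma = (ps > es) ? ps : es          # larger of the two sigma bars
    absd  = (diff < 0) ? -diff : diff
    label = "noise"
    if (absd > sigma) label = (diff > 0) ? "EESSI faster" : "pixi faster"
    printf "%+.1f%% %s\n", 100 * diff / pm, label
  }'
}

# Rows taken from the table above:
#   classify_delta 3592 690 2646 470     # prints "+26.3% EESSI faster"
#   classify_delta 5969 1160 5428 969    # prints "+9.1% noise"
```

Because the threshold is the larger σ of the pair, a single noisy side is enough to push a row into "noise", which is why the rule is conservative on a loaded machine.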
Summary: on the workload sizes where the hot path actually matters, EESSI is equal-or-better; on small configs it is statistically indistinguishable, which is the honest outcome – a generic x86_64-v3 compiler does an adequate job of the small SSE4 loops too.
Raw benchmark --benchmark_out_format=json captures live in
eessi/results/bench_distance_{eessi,pixi}.json for re-analysis with
other statistical tools.
Cross-references
- Upstream packaging strategy (Spack / EasyBuild / EESSI candidate assessment): obsidian-notes/Software/eOn/packaging-strategy-spack-easybuild-eessi.org.
- cvmfsexec mount debug (the origin of the Arch libfuse3 fix): obsidian-notes/Software/HPC/cvmfsexec_arch_linux_eessi_mount_debug.org.
- eOn’s analogous overlay and the first-pass benchmark results that motivated the gpr_optim overlay: obsidian-notes/Software/HPC/eon_eessi_overlay_session.org.
- In-repo overlay README: eessi/README.md.