gproptim
    • eOn Saddle point finding on potential energy surfaces
    • rgpot Universal RPC potential interface
/

Tutorial

  • Tutorial: GP-Accelerated Saddle Point Search
  • Tutorial: Fortran Interop
  • MPI-Parallel Saddle Point Search

How-to

  • How to: Use the T-Process for Robust GP Regression
  • How to: Run GP-NEB with OIE Acquisition
  • How-to: Use the C API from Fortran

Guide

  • Installation
  • EESSI overlay vs. the default pixi environment
  • Usage
  • Performance Optimisation
  • RPC and Serialization

Reference

  • Contributing
    • Performance Development Guidelines
  • Changelog

Python API

  • API Reference
    • gpr_optim

C++ API

  • C++ API Reference
    • Global Namespace
      • namespace atmd
        • class atmd::AtomicDimer
      • namespace aux
        • struct aux::AtomSignature
        • struct aux::InverseDistanceCache
        • class aux::AuxiliaryFunctionality
        • class aux::Distance
        • class aux::Gradient
        • class aux::ProblemSetUp
      • namespace ceres
      • namespace dimer
        • class dimer::Dimer
      • namespace fmin
      • namespace funcmin
        • class funcmin::ADAM
        • class funcmin::Ceres
        • template class funcmin::GprCostFunction
        • class funcmin::SCG
      • namespace gpr
      • namespace gpr
      • namespace gpr
      • namespace gpr
      • namespace gpr
        • namespace gpr::coord
        • namespace gpr::field
        • namespace gpr::io
        • namespace gpr::laplace
        • namespace gpr::linalg
        • namespace gpr::neb
        • namespace gpr::optim
        • namespace gpr::potentials
        • namespace gpr::priors
        • namespace gpr::sfc
        • namespace gpr::tests
        • enum gpr::HyperparameterOptimizationMode
        • enum gpr::ScgCurvatureMode
        • enum gpr::UncertaintyMode
        • struct gpr::ARDSearchResult
        • struct gpr::AdamOptimizationSettings
        • struct gpr::AtomsConfiguration
        • struct gpr::AtomsPositionAndType
        • struct gpr::BMAPrediction
        • struct gpr::CeresOptimizationSettings
        • template struct gpr::Derivatives
        • struct gpr::EnergyAndGradient
        • struct gpr::EnergySurfaceOutputInfo
        • struct gpr::GPRSetup
        • struct gpr::INLAPrediction
        • struct gpr::Indices2D
        • struct gpr::InputParameters
        • struct gpr::IterationsLimits
        • template struct gpr::KeyValuePair
        • struct gpr::LBFGSInfo
        • struct gpr::NoiseSearchResult
        • struct gpr::Observation
        • struct gpr::OptimizerCommonSettings
        • template struct gpr::Pair
        • struct gpr::ScgOptimizationSettings
        • struct gpr::SexpatLocalOracleScope
        • struct gpr::StopCriteriaForDimer
        • struct gpr::StructuredPermutation
        • struct gpr::TransitionParameters
        • template struct gpr::Triplet
        • class gpr::ARDGridSearch
        • class gpr::BayesianModelAveraging
        • class gpr::ConstantCF
        • class gpr::FactorizedLikelihood
        • class gpr::GaussianProcessRegression
        • class gpr::InverseDistanceDescriptor
        • class gpr::LaplaceINLA
        • class gpr::LikGaussian
        • class gpr::NoiseGridSearch
        • class gpr::ObservationLikelihood
        • class gpr::PriorBase
        • class gpr::PriorGaussian
        • class gpr::PriorLogNormal
        • class gpr::PriorLogUnif
        • class gpr::PriorSqrtt
        • class gpr::PriorT
        • class gpr::PriorUnif
        • class gpr::RFFModel
        • class gpr::RandomFourierFeatures
        • class gpr::SexpatCF
        • class gpr::StudentTProcessRegression
        • class gpr::TrustRadius
      • namespace laplace
      • namespace linalg
      • namespace math
        • class math::DistributionFunctions
        • class math::MatrixProperties
      • namespace pot
        • class pot::PotentialWrapper
      • namespace std
      • enum ConvergenceNormType
      • enum DebugLevels
      • enum DimerAlgorithm
      • enum Directions
      • enum DistanceMetricType
      • enum FrozenAndMovingAtoms
      • enum OptimizationAlgorithms
      • enum OptionsForGradCalculation
      • enum Potentials
      • enum gpr_acquisition_t
      • enum gpr_conv_norm_t
      • enum gpr_model_type_t
      • enum gpr_neb_mode_t
      • enum gpr_optim_alg_t
      • enum gpr_uncertainty_mode_t
      • struct gpr_atoms_config_t
      • struct gpr_atoms_s
      • struct gpr_dimer_config_t
      • struct gpr_lj_params_t
      • struct gpr_minimize_config_t
      • struct gpr_model_config_t
      • struct gpr_model_s
      • struct gpr_neb_config_t
      • struct gpr_neb_s

On this page

  • What you will learn
  • Prerequisites
  • Building with MPI support
  • Preparing input files
    • Geometry: con
    • Configuration: capnp binary
    • Dimer orientation: dat
  • Running the search
    • Single rank (serial)
    • Multiple ranks (MPI parallel)
  • What gets parallelized
  • Scaling results
    • CuH2 (218 atoms, EAM potential)
    • Cyclopentadienyl radical (14 atoms, PET-MAD-xs-v1.5.0)
  • Convergence path
  • Troubleshooting
    • Warning! Matrix L is not positive definite
    • Warning! Function value at xnew not finite
    • Dimer not converging
gpr_optim 0 0
Edit this page
  1. gproptim /
  2. MPI-Parallel Saddle Point Search
View Source Open in ChatGPT Open in Claude

MPI-Parallel Saddle Point Search¶

What you will learn¶

This tutorial walks through running a GP-accelerated dimer saddle point search with MPI parallelism. By the end you will be able to:

  1. Build gprd with the ScaLAPACK backend for distributed dense linear algebra

  2. Prepare input files (.con geometry, capnp config, orientation) for any molecular system

  3. Run the search with a PET-MAD universal ML potential via rgpot

  4. Scale across MPI ranks and measure the speedup

Prerequisites¶

  • A working pixi installation

  • An ML potential model file (PET-MAD .pt or any metatomic-compatible model)

  • A .con geometry file for your system

Building with MPI support¶

The ScaLAPACK backend distributes both the GP kernel assembly and the Cholesky factorization across MPI ranks. Install the environment and build:

pixi install -e scalapack
pixi run -e scalapack meson setup bld \
    -Dlinalg_backend=scalapack \
    -Duse_readcon=true \
    -Duse_capnp=true \
    -Duse_hdf5=false \
    -Dwith_tests=true
pixi run -e scalapack ninja -C bld

For MetatomicPot (PET-MAD), you also need to pass cmake prefix paths for torch, metatensor, and metatomic. These are found via Python:

SP=$(pixi run -e scalapack python3 -c \
    'import site; print(site.getsitepackages()[0])')
cat > torch_native.ini << EOF
[built-in options]
cmake_prefix_path = [
    '$SP/torch/share/cmake',
    '$SP/metatensor/lib/cmake',
    '$SP/metatensor/torch/torch-2.7/lib/cmake',
    '$SP/metatomic/torch/torch-2.7/lib/cmake'
]
cpp_args = [
    '-I$SP/metatensor/include',
    '-I$SP/metatensor/torch/torch-2.7/include',
    '-I$SP/metatomic/torch/torch-2.7/include',
    '-I$SP/vesin/include'
]
EOF
LIBRARY_PATH="$SP/vesin/lib:$LIBRARY_PATH" \
pixi run -e scalapack meson setup bld \
    --native-file torch_native.ini \
    -Dlinalg_backend=scalapack \
    -Duse_readcon=true \
    -Duse_capnp=true
pixi run -e scalapack ninja -C bld

Verify the build:

pixi run -e scalapack meson test -C bld
# Expect 41/41 tests pass (or 43 with scaling tests)

Preparing input files¶

Geometry: pos.con¶

The .con format is used by eOn and stores atomic positions, types, cell dimensions, and frozen/moving flags. Example for oxadiazole (9 atoms):

Generated by eOn

20.000000 20.000000 20.000000
90.000000 90.000000 90.000000


4
2 1 2 4
14.000000 15.980000 15.980000 1.008000
N
Coordinates of component   1
12.709927 12.943985 10.896305 0 1
...

The fifth column (0 or 1) marks each atom as moving (0) or frozen (1). gprd reads this via readcon-core and automatically assigns atom types from atomic numbers.

Configuration: capnp binary¶

GPR dimer parameters are stored as a Cap’n Proto binary. Generate one from a TOML config:

# Convert an eOn config.ini to the TOML format:
python3 schema/eon_ini_to_toml.py config.ini --con pos.con -o config.toml

# Generate the capnp binary:
uv run schema/gen_capnp_params.py config.toml params.bin

Key parameters to tune per system:

Parameter

TOML key

Typical value

Convergence force

problem.finalConvergenceForce

0.01

Max outer iterations

problem.relaxation.maxOuterIterations

300

SCG max iterations

optimizer.maxIter

400

SCG lambda

optimizer.scg.lambda

100

GP noise variance

gpr.gpSigma2

1e-5

Active radius

problem.actdistFro

30.3

Dimer orientation: direction.dat¶

One line per atom with dx dy dz displacement components. This sets the initial dimer orientation along a physically meaningful direction (e.g. a reaction coordinate). If omitted, gprd generates a random orientation.

Running the search¶

Single rank (serial)¶

gprd --pos pos.con \
     --orient direction.dat \
     --config params.bin \
     --potential metatomic \
     --model pet-mad-xs-v1.5.0.pt

Multiple ranks (MPI parallel)¶

mpirun -np 4 gprd --pos pos.con \
     --orient direction.dat \
     --config params.bin \
     --potential metatomic \
     --model pet-mad-xs-v1.5.0.pt

Available potentials: xtb (GFN2-xTB), gfn2, gfnff (GFN force field), lj (Lennard-Jones), cuh2 (EAM for Cu+H2), metatomic (any metatrain model).

What gets parallelized¶

The GP pipeline distributes work across MPI ranks at three levels:

  1. Kernel matrix assembly: The covariance matrix K is built tile by tile directly into a 2D block-cyclic distributed buffer. Each rank walks the lower-triangle of observation tile pairs and the BLACS ownership predicate selects exactly one rank per tile; owned tiles are evaluated locally and written into the rank’s block-cyclic slice. No replicated full K is materialized and no MPI_Allreduce fires on the assembled matrix. A legacy round-robin + Allreduce assembly (do_reduce=true) remains for non-ScaLAPACK callers and unit tests.

  2. Cholesky factorization: K = L L^T via ScaLAPACK pdpotrf over the same 2D block-cyclic BLACS process grid that owns the assembled tiles – no scatter step needed. Triangular solves (pdtrsm) and inverse (pdpotri) are also distributed and reuse the cached factor.

  3. Gradient matrices: The dK/d\theta matrices for each hyperparameter follow the same tile-based ownership pattern; the per-hyperparameter trace term is computed locally per rank and a single ordered_sum_bcast reduces all n_hp traces in one collective.

The force evaluations (PET-MAD inference) are replicated on all ranks. For systems where force evaluation is cheap relative to GP operations (e.g. ML potentials on small molecules), the MPI scaling is dominated by the GP linear algebra.

Scaling results¶

CuH2 (218 atoms, EAM potential)¶

The gprd binary on the hardcoded CuH2 system shows near-linear scaling:

Ranks

Wall time (s)

Speedup

1

26.1

1.0x

2

14.4

1.8x

4

8.3

3.1x

8

4.7

5.6x

Cyclopentadienyl radical (14 atoms, PET-MAD-xs-v1.5.0)¶

A real saddle point search on a hydrocarbon radical. The GP training set grows from 86 to 500+ data points as the dimer converges. With PET-MAD as the potential:

Ranks

Wall time (s)

Speedup

Outer iterations

Force evals

1

4829.5

1.0x

17

23

4

558.7

8.6x

13

19

16

263.8

18.3x

13

19

All multi-rank runs use OMP_NUM_THREADS=2 to control torch thread contention. The np=16 run uses rank-0-only force evaluation with MPI_Bcast, eliminating 15 redundant ML inference calls.

The np=4 run converges in 9.3 minutes with the dimer finding a saddle point at energy -6.952 eV. The observation count grows to 500+ during the search, giving kernel matrices of dimension 500 x 43 = 21,500 – well into the regime where distributed linear algebra matters.

Convergence path¶

The maximum force on any atom decreases across outer relaxation iterations:

Iteration   np=1     np=4
0          52.18    52.18
1           8.99     8.65
2           4.01     4.08
3           3.75     4.07
4           3.37     5.78
5           2.50     1.70
6           2.80     0.96
7           2.87     0.81
8           2.61     0.26
9           2.57     0.16
10          4.69     0.05
11          1.12     0.02
12          0.55     0.01  (np=4 converged)
13          0.25
14          0.06
15          0.03
16          0.02     (np=1 converged)

The slight differences between ranks (e.g. 8.99 vs 8.65 at iteration 1) are from floating-point non-associativity in the distributed reductions (ordered_sum_bcast and MPI_Allreduce inside the BLAS calls). Both runs converge to the same saddle point within the force tolerance.

Troubleshooting¶

Warning! Matrix L is not positive definite¶

The GP covariance matrix became numerically singular during SCG optimization. The SCG automatically increases its damping parameter (lambda) and retries. This is normal for ill-conditioned systems and does not indicate a bug.

Warning! Function value at xnew not finite¶

SCG took a step that produced NaN or infinity in the log-marginal likelihood. This happens when hyperparameters drift to extreme values (e.g. lengthscale near zero). The SCG damping mechanism handles it automatically.

Dimer not converging¶

If Current Max is not decreasing across outer iterations:

  • Check that --orient provides a physically meaningful direction (along a reaction coordinate, not random)

  • Increase max_outer_iterations

  • Try a smaller dimer_separation (default 0.01)

  • Ensure the potential gives reasonable forces for your system (test with a single-point evaluation first)

Previous
Tutorial: Fortran Interop
Next
How to: Use the T-Process for Robust GP Regression

2022--present, gproptim developers

Made with Sphinx and Shibuya theme.