SpECTRE
v2026.04.01
SpECTRE executables can write checkpoints that save their instantaneous state to disk; the run can later be restarted from a saved checkpoint. This is useful for expensive simulations that would otherwise exceed the wallclock limit of a job on a supercomputer.
Executables can checkpoint when:
To restart an executable from a checkpoint file, run a command like this:
where 0123 is the number of the checkpoint to restart from. You can also restart through the command-line interface (CLI):
There are a number of caveats in the current implementation of checkpointing and restarting:
Certain simulation parameters can be modified when restarting from a checkpoint file. This is done by parsing a new input file that contains only the options to modify; all other options keep their values from the original run.
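As a rough illustration of these semantics, the sketch below models the options of a run as a map from option name to value. This is an assumption made only for illustration: SpECTRE parses typed options from YAML and stores them in the global cache. The hypothetical function apply_overlay replaces the value of every option named in the overlay and leaves every other option unchanged. Rejecting names that did not exist in the original run is a further simplifying assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model of a run's options: option name -> value. Real SpECTRE
// options are typed and parsed from YAML; strings keep the sketch small.
using Options = std::map<std::string, std::string>;

// Returns the options to use after a restart. Every option named in the
// overlay replaces the value from the original run; all other options keep
// their original values. Assumption of this sketch: an overlay may not
// introduce an option the original run did not have, so such a name makes
// std::map::at throw std::out_of_range. Because `original` is taken by
// value, the caller's map is never modified, even if an exception is thrown.
Options apply_overlay(Options original, const Options& overlay) {
  for (const auto& [name, value] : overlay) {
    original.at(name) = value;
  }
  return original;
}
```

The sketch says nothing about which options may be overlaid in the first place; that restriction is a separate rule.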
Not all tags can be modified, however. In the current implementation, only tags that are in the const_global_cache_tags and also have the member variable static constexpr bool is_overlayable = true; can be modified. The reason for this "opt-in" design is that most tags interact with past or current simulation data in a way that would invalidate the simulation state if the tag were changed on restart. For example, changing the domain invalidates all spatial data, and changing the time stepper invalidates the time-stepper history. Only tags that do not interact with the state should be modifiable; activation thresholds of various algorithms or the frequency of data observation, for example, are safe to change.
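The opt-in rule can be expressed as a compile-time type trait. The following C++17 sketch is not SpECTRE's implementation: the tags ObservationInterval, TimeStepperChoice, and DomainChoice, and the trait is_overlayable_v, are hypothetical. A tag counts as overlayable only if it declares static constexpr bool is_overlayable = true; a tag without that member defaults to not overlayable.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tags for illustration only; they are not SpECTRE's tags.
struct ObservationInterval {
  using type = double;
  // Opts in: this option only controls how often data is written.
  static constexpr bool is_overlayable = true;
};

struct TimeStepperChoice {
  using type = int;
  // No is_overlayable member: changing the time stepper would invalidate
  // the stored history, so this tag must not change on restart.
};

struct DomainChoice {
  using type = int;
  // Explicitly not overlayable: changing the domain invalidates spatial data.
  static constexpr bool is_overlayable = false;
};

// Primary template: tags without an is_overlayable member are not
// overlayable, which makes overlaying opt-in.
template <typename Tag, typename = void>
struct is_overlayable : std::false_type {};

// Chosen when Tag::is_overlayable exists; forwards its value.
template <typename Tag>
struct is_overlayable<Tag, std::void_t<decltype(Tag::is_overlayable)>>
    : std::bool_constant<Tag::is_overlayable> {};

template <typename Tag>
inline constexpr bool is_overlayable_v = is_overlayable<Tag>::value;

static_assert(is_overlayable_v<ObservationInterval>);
static_assert(not is_overlayable_v<TimeStepperChoice>);
static_assert(not is_overlayable_v<DomainChoice>);
```

Defaulting to false means that a newly added tag cannot be changed on restart until its author deliberately opts in, which matches the reasoning above.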
The executable updates the global cache with the new input-file values during the UpdateOptionsAtRestartFromCheckpoint phase. The CheckpointAndExitAfterWallclock phase control automatically directs the code to this phase after a restart.
In this option-updating phase, the code tries to read an "overlay" input file whose name is derived from the original input file and the number of the checkpoint used for the restart. For example, if the original input file is path/to/Input.yaml and the code is restarted with +restart Checkpoints/Checkpoint_0123, the overlay input file is path/to/Input.overlay_0123.yaml. If this file does not exist, the executable continues with the previous parameter values.
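The naming rule for the overlay file can be sketched in C++17 with std::filesystem. The helper overlay_input_file is hypothetical. It assumes that the checkpoint directory name ends in an underscore followed by the checkpoint number, as in Checkpoint_0123, and that the overlay file sits in the same directory as the original input file.

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the overlay input file that would be read when
// restarting from checkpoint_dir, following the rule described above:
// path/to/Input.yaml + Checkpoint_0123 -> path/to/Input.overlay_0123.yaml.
std::filesystem::path overlay_input_file(
    const std::filesystem::path& input_file,
    const std::filesystem::path& checkpoint_dir) {
  // Use the last path component, tolerating a trailing slash such as
  // "Checkpoints/Checkpoint_0123/".
  std::filesystem::path name = checkpoint_dir.filename();
  if (name.empty()) {
    name = checkpoint_dir.parent_path().filename();
  }
  // Assumption: the checkpoint number follows the last underscore.
  const std::string dir_name = name.string();
  const auto underscore = dir_name.rfind('_');
  if (underscore == std::string::npos || underscore + 1 == dir_name.size()) {
    throw std::invalid_argument("Cannot find checkpoint number in: " +
                                dir_name);
  }
  const std::string number = dir_name.substr(underscore + 1);

  // Keep the directory and extension of the original input file and insert
  // ".overlay_<number>" before the extension.
  std::filesystem::path overlay = input_file;
  overlay.replace_filename(input_file.stem().string() + ".overlay_" + number +
                           input_file.extension().string());
  return overlay;
}
```

For example, overlay_input_file("path/to/Input.yaml", "Checkpoints/Checkpoint_0123") yields path/to/Input.overlay_0123.yaml. A caller would then check std::filesystem::exists on the result and keep the previous option values if the file is absent.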