SpECTRE  v2024.04.12
PhaseControl::CheckpointAndExitAfterWallclock Struct Reference

Phase control object that runs the WriteCheckpoint and Exit phases after a specified amount of wallclock time has elapsed.

#include <CheckpointAndExitAfterWallclock.hpp>

Classes

struct  WallclockHours
 

Public Types

using options = tmpl::list< WallclockHours >
 
using argument_tags = tmpl::list<>
 
using return_tags = tmpl::list<>
 
using phase_change_tags_and_combines = tmpl::list< Tags::RestartPhase, Tags::WallclockHoursAtCheckpoint, Tags::CheckpointAndExitRequested >
 
template<typename Metavariables >
using participating_components = typename Metavariables::component_list
 

Public Member Functions

 CheckpointAndExitAfterWallclock (const std::optional< double > wallclock_hours, const Options::Context &context={})
 
 CheckpointAndExitAfterWallclock (CkMigrateMessage *msg)
 
template<typename... DecisionTags>
void initialize_phase_data_impl (const gsl::not_null< tuples::TaggedTuple< DecisionTags... > * > phase_change_decision_data) const
 
template<typename ParallelComponent , typename ArrayIndex , typename Metavariables >
void contribute_phase_data_impl (Parallel::GlobalCache< Metavariables > &cache, const ArrayIndex &array_index) const
 
template<typename... DecisionTags, typename Metavariables >
std::optional< std::pair< Parallel::Phase, ArbitrationStrategy > > arbitrate_phase_change_impl (const gsl::not_null< tuples::TaggedTuple< DecisionTags... > * > phase_change_decision_data, const Parallel::Phase current_phase, const Parallel::GlobalCache< Metavariables > &) const
 
void pup (PUP::er &p) override
 
Public Member Functions inherited from PhaseChange
 PhaseChange (CkMigrateMessage *msg)
 
 WRAPPED_PUPable_abstract (PhaseChange)
 
template<typename ParallelComponent , typename DbTags , typename Metavariables , typename ArrayIndex >
void contribute_phase_data (const gsl::not_null< db::DataBox< DbTags > * > box, Parallel::GlobalCache< Metavariables > &cache, const ArrayIndex &array_index) const
 Send data from all participating_components to the Main chare for determining the next phase.
 
template<typename... DecisionTags, typename Metavariables >
std::optional< std::pair< Parallel::Phase, PhaseControl::ArbitrationStrategy > > arbitrate_phase_change (const gsl::not_null< tuples::TaggedTuple< DecisionTags... > * > phase_change_decision_data, const Parallel::Phase current_phase, const Parallel::GlobalCache< Metavariables > &cache) const
 Determine a phase request and PhaseControl::ArbitrationStrategy based on aggregated phase_change_decision_data on the Main chare.
 
template<typename Metavariables , typename... Tags>
void initialize_phase_data (const gsl::not_null< tuples::TaggedTuple< Tags... > * > phase_change_decision_data) const
 Initialize the phase_change_decision_data on the Main chare to starting values.
 

Static Public Attributes

static constexpr Options::String help
 

Detailed Description

Phase control object that runs the WriteCheckpoint and Exit phases after a specified amount of wallclock time has elapsed.

When the executable exits from here, it does so with Parallel::ExitCode::ContinueFromCheckpoint.

This phase control is useful for running SpECTRE executables performing lengthy computations that may exceed a supercomputer's wallclock limits. Writing a single checkpoint at the end of the job's allocated time allows the computation to be continued, while minimizing the disc space taken up by checkpoint files.

Note that this phase control is not a trigger on wallclock time. Rather, it checks the elapsed wallclock time when called, likely from a global sync point triggered by some other mechanism, e.g., at some slab boundary. Therefore, the WriteCheckpoint and Exit phases will run the first time this phase control is called after the specified wallclock time has been reached.
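
The "check when called" behaviour described above can be illustrated with a minimal, standalone sketch. This is not SpECTRE's implementation: the function names `should_checkpoint_and_exit` and `elapsed_hours_since` are hypothetical, and treating an unset threshold as "never checkpoint and exit" is an assumption made for the example.

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <ratio>

// Hypothetical stand-in for the decision made each time the phase control is
// called at a global sync point. Illustrative only; not the SpECTRE code.
inline bool should_checkpoint_and_exit(
    const std::optional<double>& wallclock_hours_threshold,
    const double elapsed_hours) {
  assert(elapsed_hours >= 0.0);
  // Assumption: an unset threshold means the check never fires.
  if (not wallclock_hours_threshold.has_value()) {
    return false;
  }
  // The check only sees the clock when it is called, so the first call at or
  // after the threshold is the one that requests the phases.
  return elapsed_hours >= wallclock_hours_threshold.value();
}

// Convert the difference between two steady_clock readings to fractional
// hours.
inline double elapsed_hours_since(
    const std::chrono::steady_clock::time_point start,
    const std::chrono::steady_clock::time_point now) {
  return std::chrono::duration<double, std::ratio<3600>>(now - start).count();
}
```

In this sketch nothing happens between calls: if the threshold is 11.5 hours and the function is evaluated at 11.4 and then at 11.6 hours, only the second evaluation returns true.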

Warning
The global sync points must be triggered often enough that at least one sync point (i.e., one call to this phase control) falls in the window between the requested checkpoint-and-exit time and the time at which the batch system will kill the executable. As a concrete example, when running on a 12-hour queue with a checkpoint-and-exit requested after 11.5 hours, there is a 0.5-hour window in which a global sync must occur, the checkpoint files must be written to disc, and the executable must clean up. In this case, triggering a global sync every 2-10 minutes might be desirable. Matching the global sync frequency to the time window for checkpoint and exit is the responsibility of the user!
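
The arithmetic in this warning can be checked with a small planning helper. The sketch below is an assumption-laden illustration, not part of SpECTRE: `JobWindow`, `window_minutes`, and `window_is_sufficient` are hypothetical names, and the worst-case model (a sync point lands just before the requested time, so the next one comes a full sync interval later) is a simplification.

```cpp
#include <cassert>

// Hypothetical description of a batch job; all names are illustrative.
struct JobWindow {
  double queue_limit_hours;       // when the batch system kills the job
  double checkpoint_after_hours;  // requested checkpoint-and-exit time
};

// Minutes between the requested checkpoint-and-exit time and the kill time.
inline double window_minutes(const JobWindow& job) {
  assert(job.queue_limit_hours >= job.checkpoint_after_hours);
  return (job.queue_limit_hours - job.checkpoint_after_hours) * 60.0;
}

// Assumed worst case: the last sync point before the requested time occurs
// just before it, so the triggering sync point arrives one full interval
// after it. Writing the checkpoint and cleaning up must still fit afterwards.
inline bool window_is_sufficient(const JobWindow& job,
                                 const double sync_interval_minutes,
                                 const double write_and_cleanup_minutes) {
  return sync_interval_minutes + write_and_cleanup_minutes <=
         window_minutes(job);
}
```

For the 12-hour queue with a request at 11.5 hours, `window_minutes` gives 30. If one assumes, purely for illustration, that writing the checkpoint and cleaning up takes 15 minutes, a 10-minute sync interval fits in the window while a 20-minute interval does not.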

Member Data Documentation

◆ help

constexpr Options::String PhaseControl::CheckpointAndExitAfterWallclock::help
static constexpr
Initial value:
{
"Once the wallclock time has exceeded the specified amount, trigger "
"writing a checkpoint and then exit with the 'ContinueFromCheckpoint' "
"exit code."}

The documentation for this struct was generated from the following file: