CheckpointAndExitAfterWallclock< Metavariables, PhaseChangeRegistrars > Struct Template Reference

Phase control object that runs the WriteCheckpoint and Exit phases after a specified amount of wallclock time has elapsed.

#include <CheckpointAndExitAfterWallclock.hpp>

Classes

struct  WallclockHours
 

Public Types

using options = tmpl::list< WallclockHours >
 
using argument_tags = tmpl::list<>
 
using return_tags = tmpl::list<>
 
using phase_change_tags_and_combines = tmpl::list< Tags::RestartPhase< typename Metavariables::Phase >, Tags::WallclockHoursAtCheckpoint, Tags::CheckpointAndExitRequested >
 
template<typename LocalMetavariables >
using participating_components = typename LocalMetavariables::component_list
 

Public Member Functions

 CheckpointAndExitAfterWallclock (const std::optional< double > wallclock_hours, const Options::Context &context={})
 
 CheckpointAndExitAfterWallclock (CkMigrateMessage *msg) noexcept
 
template<typename... DecisionTags>
void initialize_phase_data_impl (const gsl::not_null< tuples::TaggedTuple< DecisionTags... > * > phase_change_decision_data) const noexcept
 
template<typename ParallelComponent , typename ArrayIndex , typename LocalMetavariables >
void contribute_phase_data_impl (Parallel::GlobalCache< LocalMetavariables > &cache, const ArrayIndex &array_index) const noexcept
 
template<typename... DecisionTags, typename LocalMetavariables >
std::optional< std::pair< typename Metavariables::Phase, ArbitrationStrategy > > arbitrate_phase_change_impl (const gsl::not_null< tuples::TaggedTuple< DecisionTags... > * > phase_change_decision_data, const typename LocalMetavariables::Phase current_phase, const Parallel::GlobalCache< LocalMetavariables > &) const noexcept
 
void pup (PUP::er &p) noexcept override
 

Static Public Attributes

static constexpr Options::String help
 

Detailed Description

template<typename Metavariables, typename PhaseChangeRegistrars = tmpl::list< Registrars::CheckpointAndExitAfterWallclock<Metavariables>>>
struct CheckpointAndExitAfterWallclock< Metavariables, PhaseChangeRegistrars >

Phase control object that runs the WriteCheckpoint and Exit phases after a specified amount of wallclock time has elapsed.

This phase control is useful for running SpECTRE executables performing lengthy computations that may exceed a supercomputer's wallclock limits. Writing a single checkpoint at the end of the job's allocated time allows the computation to be continued, while minimizing the disc space taken up by checkpoint files.

Note that this phase control is not a trigger on wallclock time. Rather, it checks the elapsed wallclock time when called, likely from a global sync point triggered by some other mechanism, e.g., at some slab boundary. Therefore, the WriteCheckpoint and Exit phases will run the first time this phase control is called after the specified wallclock time has been reached.
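The check-at-sync-point behaviour described above can be illustrated with a minimal sketch. The names `WallclockCheck`, `should_checkpoint_and_exit`, and `first_firing_sync` are hypothetical and are not part of SpECTRE. The sketch also assumes that an empty `std::optional` disables the check and that the comparison is inclusive; neither detail is confirmed by this page.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the decision made at each global sync point.
// This is NOT the SpECTRE implementation.
struct WallclockCheck {
  // Requested checkpoint-and-exit time in hours.
  // Assumption: an empty optional means "never checkpoint and exit".
  std::optional<double> wallclock_hours;

  // Evaluated only when a sync point calls it; nothing fires in between.
  bool should_checkpoint_and_exit(const double elapsed_hours) const {
    assert(elapsed_hours >= 0.0);
    // Assumption: inclusive comparison ("has been reached").
    return wallclock_hours.has_value() && elapsed_hours >= *wallclock_hours;
  }
};

// Returns the index of the first sync point at which the check fires,
// or std::nullopt if no sync point occurs late enough.
inline std::optional<std::size_t> first_firing_sync(
    const WallclockCheck& check, const std::vector<double>& sync_times_hours) {
  for (std::size_t i = 0; i < sync_times_hours.size(); ++i) {
    if (check.should_checkpoint_and_exit(sync_times_hours[i])) {
      return i;
    }
  }
  return std::nullopt;
}
```

With a requested time of 11.5 hours and sync points at 11.0, 11.4, 11.6, and 11.8 hours, the check first fires at the 11.6-hour sync point, not at 11.5 hours exactly.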

Warning
The global sync points must be triggered often enough to ensure that at least one sync point (i.e., one call to this phase control) falls in the window between the requested checkpoint-and-exit time and the time at which the batch system will kill the executable. As a concrete example, when running on a 12-hour queue with a checkpoint-and-exit requested after 11.5 hours, there is a 0.5-hour window in which a global sync must occur, the checkpoint files must be written to disc, and the executable must clean up. In this case, triggering a global sync every 2-10 minutes might be desirable. Matching the global sync frequency with the time window for checkpoint and exit is the responsibility of the user!
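The arithmetic in this warning can be sketched as follows. The helper `worst_case_slack_minutes` is hypothetical and not part of SpECTRE, and it assumes evenly spaced sync points, so that the first sync after the requested time arrives at most one sync interval late.

```cpp
#include <cassert>

// Hypothetical helper, not part of SpECTRE.
// Returns the minutes left for writing the checkpoint and cleaning up in the
// worst case, where the first sync after the requested time arrives a full
// sync interval late. Assumes evenly spaced global sync points.
inline double worst_case_slack_minutes(const double queue_limit_hours,
                                       const double checkpoint_after_hours,
                                       const double sync_interval_minutes) {
  assert(sync_interval_minutes > 0.0);
  assert(checkpoint_after_hours <= queue_limit_hours);
  const double window_minutes =
      (queue_limit_hours - checkpoint_after_hours) * 60.0;
  // A negative result means a sync point may not occur before the job is
  // killed, so no checkpoint would be written.
  return window_minutes - sync_interval_minutes;
}
```

For the 12-hour queue with a request at 11.5 hours, a 10-minute sync interval leaves about 20 minutes in the worst case, while a 45-minute interval gives a negative slack and so risks losing the checkpoint. How much slack is enough depends on how long the checkpoint takes to write, which this sketch does not model.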

Member Data Documentation

◆ help

template<typename Metavariables , typename PhaseChangeRegistrars = tmpl::list< Registrars::CheckpointAndExitAfterWallclock<Metavariables>>>
constexpr Options::String CheckpointAndExitAfterWallclock< Metavariables, PhaseChangeRegistrars >::help
static constexpr
Initial value:
{
"Once the wallclock time has exceeded the specified amount, trigger "
"writing a checkpoint and then exit."}
