Parallelization

Functions, classes and documentation related to parallelization and Charm++.

## Classes

class ElementIndex< VolumeDim >
A class for indexing a Charm++ array by Element.

class Parallel::AlgorithmImpl< ParallelComponent, tmpl::list< ActionsPack... > >
A distributed object (Charm++ Chare) that executes a series of Actions and is capable of sending and receiving data. Acts as an interface to Charm++.

struct Parallel::ArrayIndex< Index >
The array index used for indexing Chare Arrays, mostly an implementation detail.

class Parallel::ConstGlobalCache< Metavariables >
A Charm++ chare that caches constant data once per Charm++ node.

struct Parallel::Tags::ConstGlobalCache
Tag to retrieve the Parallel::ConstGlobalCache from the DataBox.

struct Parallel::Tags::FromConstGlobalCache< CacheTag >
Tag used to retrieve data from the Parallel::ConstGlobalCache. This is the recommended way for compute tags to retrieve data out of the global cache.

class Parallel::Main< Metavariables >
The main function of a Charm++ executable. See the Parallelization documentation for an overview of Metavariables, Phases, and parallel components.

struct Parallel::ReductionDatum< T, InvokeCombine, InvokeFinal, InvokeFinalExtraArgsIndices >
The data to be reduced, and invokables to be called whenever two reduction messages are combined and after the reduction has been completed.

struct Parallel::ReductionData< ReductionDatum< Ts, InvokeCombines, InvokeFinals, InvokeFinalExtraArgsIndices >... >
Used for reducing a possibly heterogeneous collection of types in a single reduction call.

struct Parallel::is_array_proxy< T >
Check if T is a Charm++ proxy for an array chare.

struct Parallel::is_chare_proxy< T >
Check if T is a Charm++ proxy for a chare.

struct Parallel::is_group_proxy< T >
Check if T is a Charm++ proxy for a group chare.

struct Parallel::is_node_group_proxy< T >
Check if T is a Charm++ proxy for a node group chare.

struct Parallel::is_bound_array< T, typename >
Check if T is a ParallelComponent for a Charm++ bound array.

struct Parallel::has_pup_member< T, typename >
Check if T has a pup member function.

struct Parallel::is_pupable< T, U >
Check if type T has operator| defined for Charm++ serialization.
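The detection traits listed above (for example Parallel::has_pup_member) answer a yes/no question about a type at compile time. The following is a minimal sketch of how such a check can be written with the standard `std::void_t` detection idiom; it is not SpECTRE's implementation, and `PUPer` is a hypothetical stand-in for Charm++'s `PUP::er` so that the example compiles without Charm++.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for Charm++'s PUP::er serializer class.
struct PUPer {};

// Primary template: by default, assume T has no pup member function.
template <typename T, typename = std::void_t<>>
struct has_pup_member : std::false_type {};

// Selected only if the expression `t.pup(p)` is well formed for T.
template <typename T>
struct has_pup_member<
    T, std::void_t<decltype(std::declval<T&>().pup(std::declval<PUPer&>()))>>
    : std::true_type {};

template <typename T>
constexpr bool has_pup_member_v = has_pup_member<T>::value;

// Two example types: one serializable via a pup member, one not.
struct WithPup {
  void pup(PUPer& /*p*/) {}
};
struct WithoutPup {};
```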

## Macros

#define WRAPPED_PUPable_decl_template(className)   PUPable_decl_template(SINGLE_ARG(className))
Mark derived classes as serializable.

#define WRAPPED_PUPable_decl_base_template(baseClassName, className)   PUPable_decl_base_template(SINGLE_ARG(baseClassName), SINGLE_ARG(className))
Mark derived template classes as serializable.

## Typedefs

template<class Action >
using Parallel::get_inbox_tags_from_action = typename Parallel_detail::get_inbox_tags_from_action< Action >::type
Given an Action returns the list of inbox tags for that action.

template<class ActionsList >
using Parallel::get_inbox_tags = tmpl::remove_duplicates< tmpl::join< tmpl::transform< ActionsList, Parallel_detail::get_inbox_tags_from_action< tmpl::_1 > >> >
Given a list of Actions, get a list of the unique inbox tags.

template<class Action >
using Parallel::get_const_global_cache_tags_from_action = typename Parallel_detail::get_const_global_cache_tags_from_action< Action >::type
Given an Action returns the contents of the const_global_cache_tags alias for that action, or an empty list if no such alias exists.

template<class ActionsList >
using Parallel::get_const_global_cache_tags = tmpl::remove_duplicates< tmpl::join< tmpl::transform< ActionsList, Parallel_detail::get_const_global_cache_tags_from_action< tmpl::_1 > >> >
Given a list of Actions, get a list of the unique tags specified in the actions' const_global_cache_tags aliases.

## Functions

void Parallel::abort (const std::string &message)
Abort the program with an error message.

void Parallel::exit ()
Exit the program normally. This should be called only once across all processors.

int Parallel::number_of_procs ()
Number of processing elements.

int Parallel::my_proc ()
Index of my processing element.

int Parallel::number_of_nodes ()
Number of nodes.

int Parallel::my_node ()
Index of my node.

int Parallel::procs_on_node (const int node_index)
Number of processing elements on the given node.

int Parallel::my_local_rank ()
The local index of my processing element on my node. This is in the interval 0, ..., procs_on_node(my_node()) - 1.

int Parallel::first_proc_on_node (const int node_index)
Index of first processing element on the given node.

int Parallel::node_of (const int proc_index)
Index of the node for the given processing element.

int Parallel::local_rank_of (const int proc_index)
The local index for the given processing element on its node.
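The topology queries above are related by simple index arithmetic. As a sketch, assuming (unlike Charm++ in general) that every node hosts the same number of processing elements, they reduce to the hypothetical `UniformTopology` helper below; the real functions query the Charm++ runtime instead.

```cpp
#include <cassert>

// Hypothetical model of the processor topology, ASSUMING every node hosts
// the same number of processing elements (Charm++ does not require this).
struct UniformTopology {
  int number_of_nodes;
  int procs_per_node;

  int number_of_procs() const { return number_of_nodes * procs_per_node; }
  int procs_on_node(const int /*node_index*/) const { return procs_per_node; }
  // Processing elements are numbered consecutively node by node.
  int first_proc_on_node(const int node_index) const {
    return node_index * procs_per_node;
  }
  int node_of(const int proc_index) const {
    return proc_index / procs_per_node;
  }
  // Offset of the processing element within its own node.
  int local_rank_of(const int proc_index) const {
    return proc_index - first_proc_on_node(node_of(proc_index));
  }
};
```

With 4 nodes of 8 processing elements, processing element 19 lives on node 2, whose first processing element is 16, so its local rank is 3.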

double Parallel::wall_time ()
The current wall time in seconds.

CmiNodeLock Parallel::create_lock () noexcept
Create a Converse CmiNodeLock.

void Parallel::free_lock (const gsl::not_null< CmiNodeLock *> node_lock) noexcept
Free a Converse CmiNodeLock. Using the lock after freeing it is undefined behavior.

void Parallel::lock (const gsl::not_null< CmiNodeLock *> node_lock) noexcept
Lock a Converse CmiNodeLock.

bool Parallel::try_lock (const gsl::not_null< CmiNodeLock *> node_lock) noexcept
Returns true if the lock was successfully acquired and false if the lock is already acquired by another processor.

void Parallel::unlock (const gsl::not_null< CmiNodeLock *> node_lock) noexcept
Unlock a Converse CmiNodeLock.
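Every successful lock or try_lock must be paired with an unlock, and a common way to enforce this pairing is an RAII guard. The sketch below models only the call sequence: `FakeNodeLock` and the free functions `lock`, `try_lock`, and `unlock` are hypothetical single-threaded stand-ins for the Parallel:: functions and CmiNodeLock, and they provide no actual mutual exclusion.

```cpp
#include <cassert>

// Hypothetical single-threaded stand-in for a CmiNodeLock: it only records
// whether the lock is held, which is enough to model the call sequence.
struct FakeNodeLock {
  bool held = false;
};

void lock(FakeNodeLock* const node_lock) {
  assert(!node_lock->held);  // a real lock would block here instead
  node_lock->held = true;
}

bool try_lock(FakeNodeLock* const node_lock) {
  if (node_lock->held) {
    return false;  // already acquired elsewhere
  }
  node_lock->held = true;
  return true;
}

void unlock(FakeNodeLock* const node_lock) { node_lock->held = false; }

// RAII guard: acquires in the constructor and releases in the destructor,
// so every lock() is paired with an unlock(), even on early return.
class ScopedNodeLock {
 public:
  explicit ScopedNodeLock(FakeNodeLock* const node_lock)
      : node_lock_(node_lock) {
    lock(node_lock_);
  }
  ~ScopedNodeLock() { unlock(node_lock_); }
  ScopedNodeLock(const ScopedNodeLock&) = delete;
  ScopedNodeLock& operator=(const ScopedNodeLock&) = delete;

 private:
  FakeNodeLock* node_lock_;
};
```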

template<typename... Args>
void Parallel::printf (const std::string &format, Args &&... args)
Print an atomic message to stdout with C printf usage.

template<typename... Args>
void Parallel::printf_error (const std::string &format, Args &&... args)
Print an atomic message to stderr with C printf usage.

template<class Action , class SenderProxy , class TargetProxy , class... Ts>
void Parallel::contribute_to_reduction (ReductionData< Ts... > reduction_data, const SenderProxy &sender_component, const TargetProxy &target_component) noexcept
Perform a reduction from the sender_component (typically your own parallel component) to the target_component, performing the Action upon receiving the reduction.
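A reduction combines one contribution per sender using an associative combine operation, applied in whatever order the runtime chooses, and then applies a final operation exactly once. The serial sketch below models that contract; `reduce` is a hypothetical helper, and its `combine` and `finalize` arguments only loosely mirror the roles of InvokeCombine and InvokeFinal in Parallel::ReductionDatum.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Serial model of a reduction over a NON-EMPTY set of contributions.
// `combine` merges two partial results and must be associative, because a
// parallel runtime may combine them in any tree order; `finalize` is applied
// once to the fully combined value.
template <typename T, typename Combine, typename Finalize>
auto reduce(const std::vector<T>& contributions, Combine combine,
            Finalize finalize) {
  std::vector<T> level = contributions;
  // Pairwise (tree-like) combination, level by level.
  while (level.size() > 1) {
    std::vector<T> next;
    for (std::size_t i = 0; i + 1 < level.size(); i += 2) {
      next.push_back(combine(level[i], level[i + 1]));
    }
    if (level.size() % 2 == 1) {
      next.push_back(level.back());  // odd element carried to next level
    }
    level = std::move(next);
  }
  return finalize(level.front());
}
```

Computing an average this way sums with `std::plus` and divides by the number of contributions in the final step.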

template<typename T , typename U >
std::vector< char > serialize (const U &obj)
Serialize an object using PUP.

template<typename T >
T deserialize (const void *const data)
Deserialize an object using PUP.
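serialize and deserialize convert an object to a std::vector<char> and back. For a trivially copyable type that round trip can be sketched with `std::memcpy`, as below; the names `serialize_pod`, `deserialize_pod`, and `Point` are hypothetical, and in SpECTRE it is PUP that handles general (not trivially copyable) types.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>
#include <vector>

// Copy the raw bytes of a trivially copyable object into a buffer.
template <typename T>
std::vector<char> serialize_pod(const T& obj) {
  static_assert(std::is_trivially_copyable_v<T>,
                "This sketch only handles trivially copyable types");
  std::vector<char> buffer(sizeof(T));
  std::memcpy(buffer.data(), &obj, sizeof(T));
  return buffer;
}

// Rebuild an object from a buffer produced by serialize_pod<T>.
// T must also be default constructible in this sketch.
template <typename T>
T deserialize_pod(const void* const data) {
  static_assert(std::is_trivially_copyable_v<T>,
                "This sketch only handles trivially copyable types");
  T result;
  std::memcpy(&result, data, sizeof(T));
  return result;
}

struct Point {
  double x;
  double y;
  int id;
};
```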

template<typename ParallelComponentTag , typename Metavariables >
auto Parallel::get_parallel_component (ConstGlobalCache< Metavariables > &cache) noexcept -> Parallel::proxy_from_parallel_component< ConstGlobalCache_detail::get_component_if_mocked< typename Metavariables::component_list, ParallelComponentTag >> &
Access the Charm++ proxy associated with a ParallelComponent.

template<typename ParallelComponentTag , typename Metavariables >
auto Parallel::get_parallel_component (const ConstGlobalCache< Metavariables > &cache) noexcept -> const Parallel::proxy_from_parallel_component< ConstGlobalCache_detail::get_component_if_mocked< typename Metavariables::component_list, ParallelComponentTag >> &
Access the Charm++ proxy associated with a ParallelComponent.

template<typename ConstGlobalCacheTag , typename Metavariables >
auto Parallel::get (const ConstGlobalCache< Metavariables > &cache) noexcept -> const ConstGlobalCache_detail::type_for_get< ConstGlobalCacheTag, Metavariables > &
Access data in the cache.

template<typename ReceiveTag , typename Proxy , typename ReceiveDataType , Requires< detail::has_ckLocal_method< std::decay_t< Proxy >>::value > = nullptr>
Send the data args... to the algorithm running on proxy, and tag the message with the identifier temporal_id.

template<typename ReceiveTag , typename Proxy , typename ReceiveDataType , Requires< detail::has_ckLocal_method< std::decay_t< Proxy >>::value > = nullptr>
Send the data args... to the algorithm running on proxy, and tag the message with the identifier temporal_id.

template<typename Action , typename Proxy >
void Parallel::simple_action (Proxy &&proxy) noexcept
Invoke a simple action on proxy

template<typename Action , typename Proxy , typename Arg0 , typename... Args>
void Parallel::simple_action (Proxy &&proxy, Arg0 &&arg0, Args &&... args) noexcept
Invoke a simple action on proxy

template<typename Action , typename Proxy >
void Parallel::threaded_action (Proxy &&proxy) noexcept
Invoke a threaded action on proxy, where the proxy must be a nodegroup.

template<typename Action , typename Proxy , typename Arg0 , typename... Args>
void Parallel::threaded_action (Proxy &&proxy, Arg0 &&arg0, Args &&... args) noexcept
Invoke a threaded action on proxy, where the proxy must be a nodegroup.

## Detailed Description

Functions, classes and documentation related to parallelization and Charm++.

SpECTRE builds a layer on top of Charm++ that performs various safety checks and initialization for the user; without them, mistakes can lead to difficult-to-debug undefined behavior. The central concept is the Parallel Component: a struct with several type aliases that SpECTRE uses to set up the Charm++ chares and the allowed communication patterns. Parallel Components are compile-time inputs, from which the compiler generates the parallelization infrastructure requested for the executable. There is no restriction on the number of Parallel Components, though in practice it is best to have at most around 10.

Here is an overview of what is described in detail in the sections below:

• Metavariables: Provides high-level configuration to the compiler, e.g. the physical system to be simulated.
• Phase: Defines distinct simulation phases separated by a global synchronization point, e.g. Initialize, Evolve and Exit.
• Algorithm: In each phase, iterates over a list of actions until the current phase ends.
• Parallel component: Maintains and executes its algorithm.
• Action: Performs a computational task, e.g. evaluating the right hand side of the time evolution equations. May require data to be received from another action potentially being executed on a different core or node.

### The Metavariables Class

SpECTRE takes a different approach to input options passed to an executable than is common. SpECTRE not only reads an input file at runtime but also has many choices made at compile time. The compile time options are specified by what is referred to as the metavariables. What exactly the metavariables struct specifies depends somewhat on the executable, but all metavariables structs must specify the following:

• help: a static constexpr OptionString that will be printed as part of the help message. It should describe the executable and its basic usage, as well as any non-standard options that must be specified in the metavariables and their current values. An example of a help string for one of the testing executables is:
```cpp
static constexpr OptionString help =
    "An executable for testing the core functionality of the Algorithm. "
    "Actions that do not perform any operations (no-ops), invoking simple "
    "actions, mutating data in the DataBox, adding and removing items from "
    "the DataBox, receiving data from other parallel components, and "
    "out-of-order execution of Actions are all tested. All tests are run "
    "just by running the executable, no input file or command line arguments "
    "are required";
```
• component_list: a tmpl::list of the parallel components (described below) that are to be created. Most evolution executables will have the DgElementArray parallel component listed. An example of a component_list for one of the test executables is:
```cpp
using component_list = tmpl::list<NoOpsComponent<TestMetavariables>,
                                  MutateComponent<TestMetavariables>,
                                  AnyOrderComponent<TestMetavariables>>;
```
• const_global_cache_tag_list: a (possibly empty) tmpl::list of OptionTags that are needed by the metavariables.
• Phase: an enum class that must contain at least Initialization and Exit. Phases are described in the next section.
• determine_next_phase: a static function with the signature
```cpp
static Phase determine_next_phase(
    const Phase& current_phase,
    const Parallel::CProxy_ConstGlobalCache<EvolutionMetavars>& cache_proxy)
    noexcept;
```
What this function does is described below in the discussion of phases.

There are also several optional members:

• input_file: a static constexpr OptionString that is the default name of the input file that is to be read. This can be overridden at runtime by passing the --input-file argument to the executable.
• ignore_unrecognized_command_line_options: a static constexpr bool that defaults to false. If set to true then unrecognized command line options are ignored. Ignoring unrecognized options is generally only necessary for tests where arguments for the testing framework, Catch, are passed to the executable.

### Phases of an Execution

Global synchronization points, where all cores wait for each other, are undesirable for scalability reasons. However, they are sometimes inevitable for algorithmic reasons: some algorithms need a global synchronization in order to produce a correct solution. SpECTRE executables can have multiple phases, and a global synchronization occurs after each phase. By global synchronization we mean that no parallel components are executing or have more tasks to execute: every component is idle, waiting for a task to perform.

Every executable must have at least two phases, Initialization and Exit. The next phase is decided by the static member function determine_next_phase in the metavariables. Currently this function has access to the phase that is ending, and also the global cache. In the future we will add support for receiving data from various components to allow for more complex decision making. Here is an example of a determine_next_phase function and the Phase enum class:

```cpp
enum class Phase {
  Initialization,
  NoOpsStart,
  NoOpsFinish,
  MutateStart,
  MutateFinish,
  AnyOrderStart,
  AnyOrderFinish,
  Exit
};
static Phase determine_next_phase(
    const Phase& current_phase,
    const Parallel::CProxy_ConstGlobalCache<
        TestMetavariables>& /*cache_proxy*/) noexcept {
  switch (current_phase) {
    case Phase::Initialization:
      return Phase::NoOpsStart;
    case Phase::NoOpsStart:
      return Phase::NoOpsFinish;
    case Phase::NoOpsFinish:
      return Phase::MutateStart;
    case Phase::MutateStart:
      return Phase::MutateFinish;
    case Phase::MutateFinish:
      return Phase::AnyOrderStart;
    case Phase::AnyOrderStart:
      return Phase::AnyOrderFinish;
    case Phase::AnyOrderFinish:
      return Phase::Exit;
    case Phase::Exit:
      return Phase::Exit;
    default:
      ERROR("Unknown Phase...");
  }
  return Phase::Exit;
}
```

In contrast, an evolution executable might have the phases Initialization, SetInitialData, Evolve, and Exit, with similar switch or if-else logic in its determine_next_phase function. The first phase that is entered is always Initialization. During the Initialization phase the initialize function is called on all parallel components. Once the initialize functions of all parallel components have completed, the next phase is determined and the execute_next_phase function is called on all parallel components.

At the end of an execution the Exit phase has the executable wait to make sure no parallel components are performing or need to perform any more tasks, and then exits. An example where this approach is important is if we are done evolving a system but still need to write data to disk. We do not want to exit the simulation until all data has been written to disk, even though we've reached the final time of the evolution.
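The interplay of determine_next_phase and execute_next_phase can be modeled serially. In the sketch below, `run_phases` is a hypothetical stand-in for the Main component: it starts in Initialization, repeatedly asks for the next phase, and stops at Exit. It does not model the global synchronization that separates phases in a real execution.

```cpp
#include <cassert>
#include <vector>

enum class Phase { Initialization, Evolve, Exit };

// Plays the role of the metavariables' determine_next_phase.
Phase determine_next_phase(const Phase current_phase) {
  switch (current_phase) {
    case Phase::Initialization:
      return Phase::Evolve;
    case Phase::Evolve:
      return Phase::Exit;
    case Phase::Exit:
      return Phase::Exit;
  }
  return Phase::Exit;
}

// Serial stand-in for Main. In a real execution each transition happens only
// after a global synchronization point, i.e. once every component is idle.
std::vector<Phase> run_phases() {
  std::vector<Phase> executed{Phase::Initialization};
  Phase phase = Phase::Initialization;
  while (phase != Phase::Exit) {
    phase = determine_next_phase(phase);
    // Main would call execute_next_phase(phase, ...) on every component here.
    executed.push_back(phase);
  }
  return executed;
}
```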

### The Algorithm

Since most numerical algorithms repeat steps until some criterion such as the final time or convergence is met, SpECTRE's parallel components are designed to do such iterations for the user. An Algorithm executes an ordered list of actions until one of the actions cannot be evaluated, typically because it is waiting on data from elsewhere. When an algorithm can no longer evaluate actions it passively waits by handing control back to Charm++. Once an algorithm receives data, typically done by having another parallel component call the receive_data function, the algorithm will try again to execute the next action. If the algorithm is still waiting on more data then the algorithm will again return control to Charm++ and passively wait for more data. This is repeated until all required data is available. The actions that are iterated over by the algorithm are called iterable actions and are described below.

Note
Currently all Algorithms must execute the same actions (described below) in all phases. We plan to relax this restriction if the need arises.
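The run-until-blocked behavior of an Algorithm can be illustrated with a toy model. `ToyAction` and `ToyAlgorithm` below are hypothetical and much simpler than Parallel::AlgorithmImpl: the action list is traversed once rather than repeatedly, and `perform_algorithm` simply returns when the next action is not ready, which plays the role of handing control back to Charm++.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// A toy action: `is_ready` says whether it can run now; `apply` does the work.
struct ToyAction {
  std::function<bool()> is_ready;
  std::function<void()> apply;
};

// Toy Algorithm: executes actions in order and returns ("passively waits")
// as soon as the next action is not ready. Unlike the real Algorithm, it
// does not loop back to the start of the action list.
class ToyAlgorithm {
 public:
  explicit ToyAlgorithm(std::vector<ToyAction> actions)
      : actions_(std::move(actions)) {}

  // Runs as many actions as possible; returns how many ran in this call.
  std::size_t perform_algorithm() {
    std::size_t ran = 0;
    while (next_ < actions_.size() && actions_[next_].is_ready()) {
      actions_[next_].apply();
      ++next_;
      ++ran;
    }
    return ran;
  }

  bool finished() const { return next_ == actions_.size(); }

 private:
  std::vector<ToyAction> actions_;
  std::size_t next_ = 0;
};
```

Calling `perform_algorithm` again after the missing data arrives corresponds to the algorithm being woken up by receive_data.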

### Parallel Components

Each Parallel Component struct must have the following type aliases:

1. using chare_type is set to one of:
1. Parallel::Algorithms::Singletons have one object in the entire execution of the program.
2. Parallel::Algorithms::Arrays hold zero or more elements, each of which is an object distributed to some core. An array can grow and shrink in size dynamically if need be and can also be bound to another array. A bound array has the same number of elements as the array it is bound to, and elements with the same ID are on the same core. See Charm++'s chare arrays for details.
3. Parallel::Algorithms::Groups are arrays with one element per core which are not able to be moved around between cores. These are typically useful for gathering data from array elements on their core, and then processing or reducing the data further. See Charm++'s group chares for details.
4. Parallel::Algorithms::Nodegroups are similar to groups except that there is one element per node. For Charm++ SMP (shared memory parallelism) builds, a node corresponds to the usual definition of a node on a supercomputer. However, for non-SMP builds nodes and cores are equivalent. We ensure that all entry method calls done through the Algorithm's simple_action and receive_data functions are threadsafe. User controlled threading is possible by calling the non-entry method member function threaded_action.
2. using metavariables is set to the Metavariables struct that stores the global metavariables. It is often easiest to have the Parallel Component struct have a template parameter Metavariables that is the global metavariables struct. Examples of this technique are given below.
3. using action_list is set to a tmpl::list of the Actions (described below) that the Algorithm running on the Parallel Component executes. The Actions are executed in the order that they are given in the tmpl::list.
4. using initial_databox is set to the type of the DataBox that will be passed to the first Action of the action_list. Typically it is the output of some simple action called during the Initialization Phase.
5. using options is set to a (possibly empty) tmpl::list of the option structs. The options are read in from the input file specified in the main Metavariables struct. After being read in they are passed to the initialize function of the parallel component, which is described below.
6. using const_global_cache_tag_list is set to a tmpl::list of OptionTags that are required by the parallel component. This is usually obtained from the action_list using the Parallel::get_const_global_cache_tags metafunction.
Note
Array parallel components must also specify the type alias using array_index, which is set to the type that indexes the Parallel Component Array. Charm++ allows arrays to be 1 through 6 dimensional or be indexed by a custom type. The Charm++ provided indexes are wrapped as Parallel::ArrayIndex1D through Parallel::ArrayIndex6D. When writing custom array indices, the Charm++ manual tells you to write your own CkArrayIndex, but we have written a general implementation that provides this functionality; all that you need to provide is a plain-old-data (POD) struct of the size of at most 3 integers.
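The custom array index requirement (a plain-old-data struct no larger than three integers) can be checked at compile time. The sketch below uses standard type traits; `ElementId2D` and `is_valid_custom_array_index_v` are hypothetical names, and the exact checks SpECTRE performs may differ.

```cpp
#include <type_traits>

// Hypothetical custom array index: two block coordinates plus a refinement
// level, i.e. exactly three ints.
struct ElementId2D {
  int block_x;
  int block_y;
  int refinement_level;
};

// Compile-time checks mirroring the stated requirements: plain old data
// (trivial and standard layout) and no larger than three ints.
template <typename Index>
constexpr bool is_valid_custom_array_index_v =
    std::is_trivial_v<Index> && std::is_standard_layout_v<Index> &&
    sizeof(Index) <= 3 * sizeof(int);

static_assert(is_valid_custom_array_index_v<ElementId2D>);
```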

Parallel Components have a static initialize function that is used effectively as the constructor of the components. The signature of the initialize functions must be:

```cpp
static void initialize(
    Parallel::CProxy_ConstGlobalCache<metavariables>& global_cache, opts...);
```

The initialize function is called by the Main Parallel Component when the execution starts and will typically call a simple Action to set up the initial state of the Algorithm, similar to what a constructor does for classes. The initialize function also receives the arguments read from the input file that were specified in the options typelist described above. The options are usually used to initialize the Parallel Component's DataBox, or even the component itself. An example of initializing the component itself would be using the value of an option to control the size of the Parallel Component Array. The initialize functions of different Parallel Components are called in no particular order, so they must not depend on each other.

Each parallel component must also decide what to do in the different phases of the execution. This is controlled by an execute_next_phase function with signature:

```cpp
static void execute_next_phase(
    const typename metavariables::Phase next_phase,
    const Parallel::CProxy_ConstGlobalCache<metavariables>& global_cache);
```

The determine_next_phase function in the Metavariables determines the next phase, after which the execute_next_phase function gets called. The execute_next_phase function determines what the Parallel Component should do during the next phase. For example, it may simply call perform_algorithm, call a series of simple actions, perform a reduction over an Array, or not do anything at all. Note that perform_algorithm performs the same actions (the ones in action_list) no matter what Phase it is called in.

An example of a singleton Parallel Component is:

```cpp
template <class Metavariables>
struct SingletonParallelComponent {
  using chare_type = Parallel::Algorithms::Singleton;
  using const_global_cache_tag_list = tmpl::list<>;
  using metavariables = Metavariables;
  using initial_databox = db::compute_databox_type<tmpl::list<>>;
  using options = tmpl::list<>;
  static void initialize(
      Parallel::CProxy_ConstGlobalCache<Metavariables>& global_cache) {
    auto& local_cache = *(global_cache.ckLocalBranch());
    Parallel::simple_action<SingletonActions::Initialize>(
        Parallel::get_parallel_component<SingletonParallelComponent>(
            local_cache));
  }
  static void execute_next_phase(
      const typename Metavariables::Phase next_phase,
      const Parallel::CProxy_ConstGlobalCache<Metavariables>& global_cache) {
    if (next_phase == Metavariables::Phase::PerformSingletonAlgorithm) {
      auto& local_cache = *(global_cache.ckLocalBranch());
      Parallel::get_parallel_component<SingletonParallelComponent>(local_cache)
          .perform_algorithm();
      return;
    }
  }
};
```

An example of an array Parallel Component is:

```cpp
template <class Metavariables>
struct ArrayParallelComponent {
  using chare_type = Parallel::Algorithms::Array;
  using const_global_cache_tag_list = tmpl::list<>;
  using metavariables = Metavariables;
  using action_list =
      tmpl::list<ArrayActions::RemoveInt0, ArrayActions::SendToSingleton>;
  using array_index = int;
  using initial_databox = /* DataBox type set up by ArrayActions::Initialize */;
  using options = tmpl::list<>;
  static void initialize(
      Parallel::CProxy_ConstGlobalCache<Metavariables>& global_cache) {
    auto& local_cache = *(global_cache.ckLocalBranch());
    auto& array_proxy =
        Parallel::get_parallel_component<ArrayParallelComponent>(local_cache);
    // Distribute the elements round-robin over all processing elements.
    const int number_of_procs = Parallel::number_of_procs();
    for (int i = 0, which_proc = 0; i < number_of_1d_array_elements; ++i) {
      array_proxy[i].insert(global_cache, which_proc);
      which_proc = which_proc + 1 == number_of_procs ? 0 : which_proc + 1;
    }
    array_proxy.doneInserting();
    Parallel::simple_action<ArrayActions::Initialize>(array_proxy);
  }
  static void execute_next_phase(
      const typename Metavariables::Phase next_phase,
      const Parallel::CProxy_ConstGlobalCache<Metavariables>& global_cache) {
    auto& local_cache = *(global_cache.ckLocalBranch());
    if (next_phase == Metavariables::Phase::PerformArrayAlgorithm) {
      Parallel::get_parallel_component<ArrayParallelComponent>(local_cache)
          .perform_algorithm();
    }
  }
};
```

Elements are inserted into the Array by using the Charm++ insert member function of the CProxy for the array. The insert function is documented in the Charm++ manual. In the above Array example array_proxy is a CProxy and so all the documentation for Charm++ array proxies applies. SpECTRE always creates empty Arrays with the constructor and requires users to insert however many elements they want and on which cores they want them to be placed. Note that load balancing calls may result in Array elements being moved.
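The insertion loop in the array example distributes elements round-robin over the processing elements. Pulled out into a hypothetical helper that does not depend on Charm++, the placement rule can be checked directly; it is equivalent to element i going to processing element i % number_of_procs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the processing element chosen for each of `number_of_elements`
// array elements, using the same wrap-around rule as the initialize loop.
std::vector<int> round_robin_placement(const int number_of_elements,
                                       const int number_of_procs) {
  std::vector<int> placement;
  placement.reserve(static_cast<std::size_t>(number_of_elements));
  for (int i = 0, which_proc = 0; i < number_of_elements; ++i) {
    placement.push_back(which_proc);
    which_proc = which_proc + 1 == number_of_procs ? 0 : which_proc + 1;
  }
  return placement;
}
```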

### Actions

For those familiar with Charm++, actions should be thought of as effectively being entry methods. They are functions that can be invoked on a remote object (chare/parallel component) using a CProxy (see the Charm++ manual), which is retrieved from the ConstGlobalCache using the parallel component struct and the Parallel::get_parallel_component() function. Actions are structs with a static apply method and come in three variants: simple actions, iterable actions, and reduction actions. One important thing to note is that actions cannot return any data to the caller of the remote method. Instead, "returning" data must be done via callbacks or a callback-like mechanism.

The simplest signature of an apply method is for iterable actions:

```cpp
template <typename DbTags, typename... InboxTags, typename Metavariables,
          typename ArrayIndex, typename ActionList,
          typename ParallelComponent>
static auto apply(db::DataBox<DbTags>& box,
                  tuples::TaggedTuple<InboxTags...>& /*inboxes*/,
                  const Parallel::ConstGlobalCache<Metavariables>& /*cache*/,
                  const ArrayIndex& /*array_index*/,
                  const ActionList /*meta*/,
                  const ParallelComponent* const /*meta*/) noexcept
```

The return type is discussed at the end of each section describing a particular type of action. Simple actions can have additional arguments but must have at least the arguments shown above. Reduction actions must have the above arguments and an argument taken by value that is of the type the reduction was made over. The db::DataBox should be thought of as the member data of the parallel component while the actions are the member functions. The combination of a db::DataBox and actions allows building up classes with arbitrary member data and methods using template parameters and invocation of actions. This approach allows us to eliminate the need for users to work with Charm++'s interface files, which can be error prone and difficult to use.

The ConstGlobalCache is passed to each action so that the action has access to global data and is able to invoke actions on other parallel components. The ParallelComponent template parameter is the tag of the parallel component that invoked the action. A proxy to the calling parallel component can then be retrieved from the ConstGlobalCache. The remote entry method invocations are slightly different for different types of actions, so they will be discussed below. However, one thing that is disallowed for all actions is calling an action locally from within an action on the same parallel component. Specifically,

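```cpp
// Not allowed: invoking an action directly on the local (currently
// executing) parallel component from within an action.
auto& local_parallel_component =
    *Parallel::get_parallel_component<ParallelComponent>(cache).ckLocal();
Parallel::simple_action<error_call_single_action_from_action>(
    local_parallel_component);
```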

Here ckLocal() is a Charm++ provided method that returns a pointer to the local (currently executing) parallel component. See the Charm++ manual for more information. However, you are able to queue a new action to be executed later on the same parallel component by getting your own parallel component from the ConstGlobalCache (Parallel::get_parallel_component<ParallelComponent>(cache)). The difference between the two calls is that by calling an action through the parallel component you will first finish the series of actions you are in, then when they are complete Charm++ will call the next queued action.
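The difference between a direct local call and a queued action can be modeled with a first-in, first-out task queue. `ToyScheduler` below is a hypothetical, much simplified stand-in for the Charm++ scheduler on one processing element: each task runs to completion before the next one starts, so an action enqueued from inside a running task only executes afterwards.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// FIFO stand-in for the message queue of one processing element.
class ToyScheduler {
 public:
  void enqueue(std::function<void()> task) {
    queue_.push_back(std::move(task));
  }

  // Runs tasks one at a time, each to completion, until the queue is empty.
  void run() {
    while (!queue_.empty()) {
      std::function<void()> task = std::move(queue_.front());
      queue_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> queue_;
};
```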

Array, group, and nodegroup parallel components can have actions invoked in two ways. First is a broadcast where the action is called on all elements of the array:

auto& group_parallel_component = Parallel::get_parallel_component<
GroupParallelComponent<Metavariables>>(cache);
group_parallel_component,
db::get<Tags::CountActionsCalled>(box) + 100 * array_index,
db::get<Tags::CountActionsCalled>(box));

The second case is invoking an action on a specific array element by using the array element's index. The below example shows how a broadcast would be done manually by looping over all elements in the array:

auto& array_parallel_component =
Parallel::get_parallel_component<ArrayParallelComponent<Metavariables>>(
cache);
for (int i = 0; i < number_of_1d_array_elements; ++i) {
0, 101, true);
}

Note that in general you will not know what all the elements in the array are and so a broadcast is the correct method of sending data to or invoking an action on all elements of an array parallel component.

The array_index argument passed to all apply methods is the index into the parallel component array. If the parallel component is not an array the value and type of array_index is implementation defined and cannot be relied on. The ActionList type is the tmpl::list of iterable actions run on the algorithm. That is, it is equal to the action_list type alias in the parallel component.

#### 1. Simple Actions

Simple actions are designed to be called in a similar fashion to member functions of classes. They are the direct analog of entry methods in Charm++ except that the member data is stored in the db::DataBox that is passed in as the first argument. There are a couple of important things to note with simple actions:

1. A simple action must return void but can use db::mutate to change values of items in the DataBox if the DataBox is taken as a non-const reference. There is one exception: if the input DataBox is empty, then the simple action can return a DataBox of type initial_databox. That is, an action taking an empty DataBox and returning the initial_databox is effectively constructing the DataBox in its initial state.
2. A simple action is instantiated once for an empty db::DataBox<tmpl::list<>>, once for a DataBox of type initial_databox (listed in the parallel component), and once for each returned DataBox from the iterable actions in the action_list in the parallel component. In some cases you will need specific items to be in the DataBox otherwise the action won't compile. To restrict which DataBoxes can be passed you should use Requires in the action's apply function template parameter list. For example,
template <typename... DbTags, typename... InboxTags, typename Metavariables,
typename ArrayIndex, typename ActionList,
typename ParallelComponent,
cpp17::is_same_v<CountActionsCalled, DbTags>...>> = nullptr>
static void apply(db::DataBox<tmpl::list<DbTags...>>& box,
const ArrayIndex& /*array_index*/,
const ActionList /*meta*/,
const ParallelComponent* const /*meta*/) noexcept {
where the conditional checks if any element in the parameter pack DbTags is CountActionsCalled.
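The Requires constraint above removes apply from overload resolution unless the DataBox tags contain CountActionsCalled. The sketch below shows the same technique with standard C++17 tools, `std::enable_if_t` and a fold expression, in place of SpECTRE's Requires and its metaprogramming helpers; `TagList`, `OtherTag`, `ToyAction`, and `can_apply` are hypothetical.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a DataBox's list of tags.
template <typename... Tags>
struct TagList {};

struct CountActionsCalled {};
struct OtherTag {};

struct ToyAction {
  // Participates in overload resolution only if CountActionsCalled is one
  // of the tags (the fold over || is false for an empty pack).
  template <typename... DbTags,
            std::enable_if_t<(std::is_same_v<CountActionsCalled, DbTags> ||
                              ...)>* = nullptr>
  static int apply(TagList<DbTags...> /*box*/) {
    return 1;
  }
};

// Detects whether ToyAction::apply is callable with a given tag list.
template <typename List, typename = void>
struct can_apply : std::false_type {};

template <typename List>
struct can_apply<
    List, std::void_t<decltype(ToyAction::apply(std::declval<List>()))>>
    : std::true_type {};
```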

A simple action that does not take any arguments can be called using a CProxy from the ConstGlobalCache as follows:

Parallel::get_parallel_component<MutateComponent>(cache));

If the simple action takes arguments then the arguments must be passed to the simple_action method as a std::tuple (because Charm++ doesn't yet support variadic entry method templates). For example,

Multiple arguments can be passed to the std::make_tuple call.

Note
You must be careful about type deduction when using std::make_tuple because std::make_tuple(0) will be of type std::tuple<int>, which will not work if the action is expecting to receive a size_t as its extra argument. Instead, you can get a std::tuple<size_t> in one of two ways: pass in std::tuple<size_t>(0), or include the header Utilities/Literals.hpp and pass in std::make_tuple(0_st).
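The deduction pitfall can be verified at compile time. The `_st` literal below is a local stand-in written for this sketch; the definition in Utilities/Literals.hpp may differ, but it is assumed to produce a size_t in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// Local stand-in for the _st literal from Utilities/Literals.hpp.
constexpr std::size_t operator""_st(unsigned long long int value) {
  return static_cast<std::size_t>(value);
}

// std::make_tuple deduces int from the literal 0 ...
static_assert(std::is_same_v<decltype(std::make_tuple(0)), std::tuple<int>>);
// ... whereas both suggested alternatives yield std::tuple<std::size_t>.
static_assert(std::is_same_v<decltype(std::tuple<std::size_t>(0)),
                             std::tuple<std::size_t>>);
static_assert(std::is_same_v<decltype(std::make_tuple(0_st)),
                             std::tuple<std::size_t>>);
```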

#### 2. Iterable Actions

Actions in the algorithm that are part of the action_list are executed one after the other until one of them cannot be evaluated. Iterable actions may have an is_ready method that returns true or false depending on whether or not the action is ready to be evaluated. If no is_ready method is provided then the action is assumed to be ready to be evaluated. The is_ready method typically checks that required data from other parallel components has been received. For example, it may check that all data from neighboring elements has arrived to be able to continue integrating in time. The signature of an is_ready method must be:

template <typename DbTags, typename... InboxTags, typename Metavariables,
typename ArrayIndex>
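static bool is_ready(const db::DataBox<DbTags>& box,
const tuples::TaggedTuple<InboxTags...>& inboxes,
const Parallel::ConstGlobalCache<Metavariables>& /*cache*/,
const ArrayIndex& /*array_index*/) noexcept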

The inboxes are a collection of the tags passed to receive_data, and are specified in the iterable action's member type alias inbox_tags, which must be a tmpl::list. Each tag in inbox_tags must have two member type aliases: temporal_id, which identifies when the data was sent, and type, which is the type of the data stored in the inboxes. The type is typically a std::unordered_map<temporal_id, DATA>. In the scenario described above, where an element waits for its neighbors to send their data, DATA would be a std::unordered_map<TheElementIndex, DataSent>. Having DATA be a std::unordered_multiset is currently also supported. Here is an example of a receive tag:

using temporal_id = int;
};
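A complete, standalone version of such a tag might look like the sketch below. ExampleIntReceiveTag and store are hypothetical names, and using std::unordered_multiset<int> as DATA is an assumption based on the description above; store only shows what placing one received value into an inbox of this type amounts to.

```cpp
#include <cassert>
#include <unordered_map>
#include <unordered_set>

// Hypothetical receive tag with the two required member aliases.
struct ExampleIntReceiveTag {
  // Identifies when the data was sent, e.g. a step or iteration number.
  using temporal_id = int;
  // Inbox layout: for each temporal_id, every int received at that id.
  using type = std::unordered_map<temporal_id, std::unordered_multiset<int>>;
};

// Hypothetical helper: file one received value under its temporal_id.
void store(ExampleIntReceiveTag::type& inbox,
           const ExampleIntReceiveTag::temporal_id id, const int value) {
  inbox[id].insert(value);
}
```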

The inbox_tags type alias for the action is:

and the is_ready function is:

template <typename DbTags, typename... InboxTags, typename Metavariables,
typename ArrayIndex>
const db::DataBox<DbTags>& /*box*/,
const ArrayIndex& /*array_index*/) noexcept {
}

Once all of the ints have been received, the iterable action is executed, not before.

Warning
It is the responsibility of the iterable action to remove data from the inboxes that will no longer be needed. The removal of unneeded data should be done in the apply function.
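The readiness check and the cleanup the warning asks for can be sketched on a bare inbox map. IntInbox, ints_ready, and consume_ints are hypothetical helpers, not part of the framework; in a real action the check would live in is_ready and the erase in apply.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>

// Hypothetical inbox layout: temporal_id -> all ints received for that id.
using IntInbox = std::unordered_map<int, std::unordered_multiset<int>>;

// is_ready-style check: true only once `expected` ints have arrived for `id`.
bool ints_ready(const IntInbox& inbox, const int id,
                const std::size_t expected) {
  const auto it = inbox.find(id);
  return it != inbox.end() && it->second.size() == expected;
}

// apply-style consumer: sum the ints for `id`, then erase the entry so the
// inbox does not keep data that will never be read again.
int consume_ints(IntInbox& inbox, const int id) {
  int sum = 0;
  for (const int value : inbox.at(id)) {
    sum += value;
  }
  inbox.erase(id);
  return sum;
}
```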

Iterable actions can change the type of the DataBox by adding or removing elements/tags from the DataBox. The only requirement is that the last action in the action_list returns a DataBox that is the same type as the initial_databox. Iterable actions can also request that the algorithm no longer be executed, and choose which action in the ActionList/action_list to execute next. This is all done via the return value from the apply function. The apply function for iterable actions must return a std::tuple of one, two, or three elements. The first element of the tuple is the new DataBox, which can be the same as the type passed in or a DataBox with different tags. Most iterable actions will simply return:

return std::forward_as_tuple(std::move(box));

By returning the DataBox as a reference in a std::tuple we avoid any unnecessary copying of the DataBox. The second element is an optional bool that controls whether or not the algorithm is terminated: if it is true, the algorithm is terminated; by default it is false. Here is an example of how to return a DataBox with the same type that is passed in and also terminate the algorithm:

return std::tuple<db::DataBox<DbTags>&&, bool>(std::move(box), true);

Notice that we again return a reference to the DataBox, which is done to avoid any copying. After an algorithm has been terminated it can be restarted by passing false to the set_terminate method followed by calling the perform_algorithm or receive_data methods.
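Why a tuple of references avoids copying can be checked with a standalone sketch. FakeBox is a hypothetical stand-in for a DataBox that counts its copies, and finish is a hypothetical function mimicking the (box, terminate) return form.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a DataBox that counts how often it was copied.
struct FakeBox {
  int copies = 0;
  FakeBox() = default;
  FakeBox(const FakeBox& other) : copies(other.copies + 1) {}
  FakeBox(FakeBox&&) = default;
};

// forward_as_tuple keeps an rvalue reference to the box; nothing is copied.
static_assert(
    std::is_same<decltype(std::forward_as_tuple(std::declval<FakeBox>())),
                 std::tuple<FakeBox&&>>::value,
    "the tuple holds a reference, not a FakeBox");

// Mimics the (box, terminate) return form of an iterable action's apply().
std::tuple<FakeBox&&, bool> finish(FakeBox& box) {
  return std::tuple<FakeBox&&, bool>(std::move(box), true);
}
```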

The third optional element in the returned std::tuple is a size_t whose value is the index of the action to be called next in the action_list. The metafunction tmpl::index_of<list, element> can be used to get a tmpl::integral_constant whose value is the index of element in the typelist list. For example,

return std::tuple<db::DataBox<DbTags>&&, bool, size_t>(
std::move(box), true,
tmpl::index_of<ActionList, iterate_increment_int0>::value + 1);

Again a reference to the DataBox is returned, while the termination bool and the next-action size_t are returned by value. The metafunction call tmpl::index_of<ActionList, iterate_increment_int0>::value gives a size_t equal to the index of the action iterate_increment_int0 in the action_list, so adding 1 selects the action that follows it. The indexing of actions in the action_list starts at 0.
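The index lookup itself is ordinary template metaprogramming. The standalone sketch below implements a minimal index_of over a hypothetical type_list; it is not the tmpl implementation, only the same idea, and the action structs are hypothetical list entries.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical type list standing in for tmpl::list.
template <typename... Ts>
struct type_list {};

template <typename List, typename T>
struct index_of;  // no definition: looking up a type not in the list fails

// T is at the front of the list: its index is 0.
template <typename T, typename... Rest>
struct index_of<type_list<T, Rest...>, T>
    : std::integral_constant<std::size_t, 0> {};

// Otherwise drop the head and add one to the index found in the tail.
template <typename T, typename Head, typename... Rest>
struct index_of<type_list<Head, Rest...>, T>
    : std::integral_constant<std::size_t,
                             1 + index_of<type_list<Rest...>, T>::value> {};

// Hypothetical actions, used only as list entries.
struct first_action {};
struct iterate_increment_int0 {};
struct last_action {};
using ExampleActionList =
    type_list<first_action, iterate_increment_int0, last_action>;

// Index of the action that follows iterate_increment_int0.
constexpr std::size_t next_action =
    index_of<ExampleActionList, iterate_increment_int0>::value + 1;
static_assert(next_action == 2, "iterate_increment_int0 is at index 1");
```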

Iterable actions are invoked as part of the algorithm, so the only way to request that they be invoked is by having the algorithm run on the parallel component. The algorithm can be explicitly evaluated by calling the perform_algorithm method:

Parallel::get_parallel_component<NoOpsComponent>(local_cache)
.perform_algorithm();

The algorithm is also evaluated by calling the receive_data function, either on an entire array or singleton (this does a broadcast), or on an individual element of the array. Here is an example of a broadcast call:

auto& group_parallel_component = Parallel::get_parallel_component<
GroupParallelComponent<Metavariables>>(cache);
group_parallel_component,
db::get<Tags::CountActionsCalled>(box) + 100 * array_index,
db::get<Tags::CountActionsCalled>(box));

and of calling individual elements:

auto& array_parallel_component =
Parallel::get_parallel_component<ArrayParallelComponent<Metavariables>>(
cache);
for (int i = 0; i < number_of_1d_array_elements; ++i) {
0, 101, true);
}

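The receive_data function always takes a ReceiveTag as a template parameter, which is set in the action's inbox_tags type alias as described above. After the proxy, the first argument is the temporal identifier and the second is the data to be sent; an optional final bool, enable_if_disabled, re-enables the algorithm on the parallel component if it was previously disabled.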

Normally when remote functions are invoked they go through the Charm++ runtime system, which adds some overhead. The receive_data function tries to elide the call to the Charm++ RTS for calls into array components. Charm++ refers to these types of remote calls as "inline entry methods". With the Charm++ method of eliding the RTS, the code becomes susceptible to stack overflows because of infinite recursion. The receive_data function is limited to at most 64 RTS elided calls, though in practice reaching this limit is rare. When the limit is reached the remote method invocation is done through the RTS instead of being elided.
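The depth cap can be modelled as follows. This is a hypothetical, single-threaded sketch of the idea, not the Charm++ or receive_data implementation: InlineCallLimiter runs calls inline until a depth limit is reached, queues any further call, and later drains the queue from a fresh stack, standing in for routing the call through the RTS.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>

// Hypothetical model of eliding the runtime system with a depth limit.
class InlineCallLimiter {
 public:
  explicit InlineCallLimiter(const std::size_t max_depth)
      : max_depth_(max_depth) {}

  void invoke(const std::function<void()>& call) {
    if (depth_ < max_depth_) {
      ++depth_;
      call();  // may recursively call invoke() again
      --depth_;
    } else {
      deferred_.push_back(call);  // fall back to "going through the RTS"
    }
  }

  // Drain deferred calls starting from zero inline depth.
  void run_deferred() {
    while (!deferred_.empty()) {
      const std::function<void()> call = deferred_.front();
      deferred_.pop_front();
      invoke(call);
    }
  }

  std::size_t deferred_count() const { return deferred_.size(); }

 private:
  std::size_t max_depth_;
  std::size_t depth_ = 0;
  std::deque<std::function<void()>> deferred_;
};
```

With a limit of 64, a chain of 200 messages that each trigger the next stops after 64 inline calls, and the remainder are delivered when the deferred queue is drained.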

#### 3. Reduction Actions

Finally, there are reduction actions, which are used when reducing data over an array. For example, you may want to know the sum of an int from every element in the array. You can do this as follows:

Parallel::ReductionData<Parallel::ReductionDatum<int, funcl::Plus<>>>
my_send_int{array_index};
Parallel::contribute_to_reduction<ProcessReducedSumOfInts>(
my_send_int, my_proxy, singleton_proxy);

This reduces over the parallel component ArrayParallelComponent<Metavariables>, reduces to the parallel component SingletonParallelComponent<Metavariables>, and calls the action ProcessReducedSumOfInts after the reduction has been performed. The reduction action is:

struct ProcessReducedSumOfInts {
template <typename DbTags, typename... InboxTags, typename Metavariables,
typename ArrayIndex, typename ActionList,
typename ParallelComponent>
static void apply(db::DataBox<DbTags>& /*box*/,
const ArrayIndex& /*array_index*/,
const ActionList /*meta*/,
const ParallelComponent* const /*meta*/,
const int& value) noexcept {
SPECTRE_PARALLEL_REQUIRE(number_of_1d_array_elements *
(number_of_1d_array_elements - 1) / 2 ==
value);
}
};

As you can see, the last argument to the apply function is of type int, and is the reduced value.
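Assuming, as my_send_int{array_index} suggests, that element i contributes its own index i, the reduced sum over n elements is 0 + 1 + ... + (n - 1) = n(n - 1)/2, which is the value ProcessReducedSumOfInts checks. The standalone model below shows the combine step being applied pairwise; tree_reduce is a hypothetical helper, and the actual Charm++ reduction tree may pair contributions differently (the result is the same for an associative, commutative operation such as addition).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of a reduction: partial results are merged pairwise with
// a binary combine operation (the role funcl::Plus<> plays above) until one
// value remains. Precondition: `values` is not empty.
template <typename T, typename Combine>
T tree_reduce(std::vector<T> values, Combine combine) {
  while (values.size() > 1) {
    std::vector<T> next;
    for (std::size_t i = 0; i + 1 < values.size(); i += 2) {
      next.push_back(combine(values[i], values[i + 1]));
    }
    if (values.size() % 2 == 1) {
      next.push_back(values.back());  // odd one out waits for the next round
    }
    values = std::move(next);
  }
  return values.front();
}
```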

You can also broadcast the result back to an array, including the array that contributed to the reduction. For example,

Parallel::contribute_to_reduction<ProcessReducedSumOfInts>(
my_send_int, my_proxy, array_proxy);

It is often necessary to reduce custom data types, such as std::vector or std::unordered_map. Charm++ supports such custom reductions, and so does our layer on top of Charm++. Custom reductions require one step in addition to calling contribute_to_reduction: writing a reduction function that reduces the custom data. We provide a generic type for custom reductions, Parallel::ReductionData, which takes a series of Parallel::ReductionDatum as template parameters and the corresponding ReductionDatum::value_types as constructor arguments.

Each ReductionDatum takes up to four template parameters, of which the first two are required. The first is the type of data to reduce, and the second is a binary invokable that is called at each step of the reduction to combine two messages. The last two template parameters are used after the reduction has completed. The third is an n-ary invokable that is called once the reduction is complete; its first argument is the result of the reduction, and any additional arguments must be ReductionDatum::value_types that appear before the current one in the ReductionData. The fourth template parameter specifies which of those data are passed, as a std::index_sequence of indices into the ReductionData.
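The roles of the four template parameters can be illustrated without Charm++. The sketch below is hypothetical and does not use Parallel::ReductionData: a tuple of two reduced values is combined elementwise, and a final step for the second entry receives the first entry as an extra argument selected by std::index_sequence<0>, turning a reduced sum into a mean. Reduced, combine, apply_final, and mean_of are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical two-entry reduction, modelled on the ReductionDatum roles:
//   entry 0: number of contributions, combined with +
//   entry 1: sum of contributed values, combined with +, whose final step
//            divides by entry 0 (selected with std::index_sequence<0>)
using Reduced = std::tuple<std::size_t, double>;

// Combine step: called whenever two partial results meet.
Reduced combine(const Reduced& a, const Reduced& b) {
  return Reduced{std::get<0>(a) + std::get<0>(b),
                 std::get<1>(a) + std::get<1>(b)};
}

// Final step: call final_op(result, std::get<Is>(data)...), passing only the
// earlier entries chosen by the index sequence as extra arguments.
template <typename Final, typename Result, typename Data, std::size_t... Is>
auto apply_final(Final final_op, const Result& result, const Data& data,
                 std::index_sequence<Is...> /*extra_args*/) {
  return final_op(result, std::get<Is>(data)...);
}

// Turns the reduced sum into a mean using the reduced count.
double mean_of(const Reduced& reduced) {
  return apply_final(
      [](const double sum, const std::size_t count) {
        return sum / static_cast<double>(count);
      },
      std::get<1>(reduced), reduced, std::index_sequence<0>{});
}
```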

The action that is invoked with the result of the reduction is:

struct ProcessCustomReductionAction {
template <typename DbTags, typename... InboxTags, typename Metavariables,
typename ArrayIndex, typename ActionList,
typename ParallelComponent>
static void apply(db::DataBox<DbTags>& /*box*/,
const ArrayIndex& /*array_index*/,
const ActionList /*meta*/,
const ParallelComponent* const /*meta*/, int reduced_int,
std::unordered_map<std::string, int>&& reduced_map,
std::vector<int>&& reduced_vector) noexcept {
SPECTRE_PARALLEL_REQUIRE(reduced_int == 10);
SPECTRE_PARALLEL_REQUIRE(reduced_map.at("unity") ==
number_of_1d_array_elements - 1);
SPECTRE_PARALLEL_REQUIRE(reduced_map.at("double") ==
2 * number_of_1d_array_elements - 2);
SPECTRE_PARALLEL_REQUIRE(reduced_map.at("negative") == 0);
SPECTRE_PARALLEL_REQUIRE(
reduced_vector ==
(std::vector<int>{-reduced_int * number_of_1d_array_elements *
(number_of_1d_array_elements - 1) / 2,
-reduced_int * number_of_1d_array_elements * 10,
8 * reduced_int * number_of_1d_array_elements}));
}
};

Note that the reduced values held in the Parallel::ReductionData are passed to the action as separate trailing arguments, one per ReductionDatum.

Warning
All elements of the array must call the same reductions in the same order. It is defined behavior to do multiple reductions at once as long as all contribute calls on all array elements occurred in the same order. It is undefined behavior if the contribute calls are made in different orders on different array elements.

### Charm++ Node and Processor Level Initialization Functions

Charm++ allows running functions once per core and once per node before the construction of any parallel components. This is commonly used for setting up error handling and enabling floating point exceptions, though other functions can also be run. The functions to run on each node and on each core are set by defining two std::vector<void (*)()> objects, charm_init_node_funcs and charm_init_proc_funcs, holding pointers to the functions to be called. For example,

static const std::vector<void (*)()> charm_init_node_funcs{
&setup_error_handling};
static const std::vector<void (*)()> charm_init_proc_funcs{
&enable_floating_point_exceptions};

Finally, the user must include the Parallel/CharmMain.tpp file at the end of the executable's main cpp file. The end of an executable's main cpp file will therefore typically look as follows:

static const std::vector<void (*)()> charm_init_node_funcs{
&setup_error_handling};
static const std::vector<void (*)()> charm_init_proc_funcs{
&enable_floating_point_exceptions};
using charmxx_main_component = Parallel::Main<TestMetavariables>;
#include "Parallel/CharmMain.tpp" // IWYU pragma: keep
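The two vectors simply hold plain function pointers. The sketch below shows that shape with hypothetical setup functions and a hypothetical run_init_funcs that calls each pointer in order; it does not show how Parallel/CharmMain.tpp actually hands the lists to Charm++.

```cpp
#include <cassert>
#include <vector>

// Hypothetical setup functions that just count how often they ran.
int node_setup_calls = 0;
int proc_setup_calls = 0;
void example_node_setup() { ++node_setup_calls; }
void example_proc_setup() { ++proc_setup_calls; }

// Same container type as charm_init_node_funcs / charm_init_proc_funcs.
static const std::vector<void (*)()> example_init_node_funcs{
    &example_node_setup};
static const std::vector<void (*)()> example_init_proc_funcs{
    &example_proc_setup};

// Hypothetical runner: call every registered function pointer in order.
void run_init_funcs(const std::vector<void (*)()>& funcs) {
  for (const auto func : funcs) {
    func();
  }
}
```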

## ◆ WRAPPED_PUPable_decl_base_template

 #define WRAPPED_PUPable_decl_base_template ( baseClassName, className ) PUPable_decl_base_template(SINGLE_ARG(baseClassName), SINGLE_ARG(className))

Mark derived template classes as serializable.

Any class that derives from a class template base class must contain this macro if it is to be serialized.

## ◆ WRAPPED_PUPable_decl_template

 #define WRAPPED_PUPable_decl_template ( className ) PUPable_decl_template(SINGLE_ARG(className))

Mark derived classes as serializable.

Any class that derives from a base class that is not a class template must contain this macro if it is to be serialized.

## ◆ deserialize()

template<typename T >
 T deserialize ( const void *const data )

Deserialize an object using PUP.

Template Parameters
 T the type to deserialize to

## ◆ get()

template<typename ConstGlobalCacheTag , typename Metavariables >
 auto Parallel::get ( const ConstGlobalCache< Metavariables > & cache ) -> const ConstGlobalCache_detail::type_for_get&
noexcept

Access data in the cache.

Requires: ConstGlobalCacheTag is a tag in tag_list

Returns: a constant reference to an object in the cache

## ◆ get_parallel_component() [1/2]

template<typename ParallelComponentTag , typename Metavariables >
 auto Parallel::get_parallel_component ( ConstGlobalCache< Metavariables > & cache ) -> Parallel::proxy_from_parallel_component< ConstGlobalCache_detail::get_component_if_mocked< typename Metavariables::component_list, ParallelComponentTag>>&
noexcept

Access the Charm++ proxy associated with a ParallelComponent.

Requires: ParallelComponentTag is a tag in component_list

Returns: a Charm++ proxy that can be used to call an entry method on the chare(s)

## ◆ get_parallel_component() [2/2]

template<typename ParallelComponentTag , typename Metavariables >
 auto Parallel::get_parallel_component ( const ConstGlobalCache< Metavariables > & cache ) -> const Parallel::proxy_from_parallel_component< ConstGlobalCache_detail::get_component_if_mocked< typename Metavariables::component_list, ParallelComponentTag>>&
noexcept

Access the Charm++ proxy associated with a ParallelComponent.

Requires: ParallelComponentTag is a tag in component_list

Returns: a Charm++ proxy that can be used to call an entry method on the chare(s)

## ◆ printf()

template<typename... Args>
 void Parallel::printf ( const std::string & format, Args &&... args )
inline

Print an atomic message to stdout with C printf usage.

Similar to Python, you can print any object that's streamable by passing it in as an argument and using the formatter "%s". For example,

std::vector<double> a{0.8, 73, 9.8};
Parallel::printf("%s\n", a);
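The "%s" approach can be sketched in standard C++ by converting each argument to a string with operator<< before handing it to std::snprintf. This illustrates the idea only: to_text, format_streamables, and Point are hypothetical, this is not how Parallel::printf is implemented, and the sketch does nothing to make the output atomic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Convert any streamable value to text with its operator<<.
template <typename T>
std::string to_text(const T& value) {
  std::ostringstream os;
  os << value;
  return os.str();
}

// Hypothetical formatter: every argument becomes a string first, so the
// format string must use "%s" for each of them.
template <typename... Args>
std::string format_streamables(const std::string& format,
                               const Args&... args) {
  // Measure the formatted length first, then write into a sized buffer.
  const int length =
      std::snprintf(nullptr, 0, format.c_str(), to_text(args).c_str()...);
  if (length < 0) {
    return {};
  }
  std::vector<char> buffer(static_cast<std::size_t>(length) + 1);
  std::snprintf(buffer.data(), buffer.size(), format.c_str(),
                to_text(args).c_str()...);
  return std::string(buffer.data(), static_cast<std::size_t>(length));
}

// Hypothetical streamable type, showing that user types work too.
struct Point {
  double x;
  double y;
};
std::ostream& operator<<(std::ostream& os, const Point& p) {
  return os << "(" << p.x << ", " << p.y << ")";
}
```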

## ◆ printf_error()

template<typename... Args>
 void Parallel::printf_error ( const std::string & format, Args &&... args )
inline

Print an atomic message to stderr with C printf usage.

See Parallel::printf for details.

## ◆ receive_data() [1/2]

template<typename ReceiveTag , typename Proxy , typename ReceiveDataType , Requires< detail::has_ckLocal_method< std::decay_t< Proxy >>::value > = nullptr>
noexcept

Send the data args... to the algorithm running on proxy, and tag the message with the identifier temporal_id.

If the algorithm was previously disabled, set enable_if_disabled to true to enable the algorithm on the parallel component.

Note
There are two separate functions because Charm++ does not allow defaulted arguments for group and nodegroup chares.

## ◆ receive_data() [2/2]

template<typename ReceiveTag , typename Proxy , typename ReceiveDataType , Requires< detail::has_ckLocal_method< std::decay_t< Proxy >>::value > = nullptr>
noexcept

Send the data args... to the algorithm running on proxy, and tag the message with the identifier temporal_id.

If the algorithm was previously disabled, set enable_if_disabled to true to enable the algorithm on the parallel component.

Note
There are two separate functions because Charm++ does not allow defaulted arguments for group and nodegroup chares.

## ◆ serialize()

template<typename T , typename U >
 std::vector<char> serialize ( const U & obj )

Serialize an object using PUP.

The type to serialize as must be explicitly specified. We require this because a mismatch between the serialize and deserialize calls causes undefined behavior and we do not want this to depend on inferred types for safety.

Template Parameters
 T type to serialize
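Why the type to serialize as is spelled out explicitly can be shown with a toy pair of functions. toy_serialize and toy_deserialize are hypothetical, handle only trivially copyable types, and copy raw bytes instead of using PUP; the point is that toy_serialize<double>(3) writes a double even though 3 is an int, so a matching toy_deserialize<double> reads back the right bytes.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>
#include <vector>

// Toy serializer: converts obj to the explicitly requested type T first, so
// the bytes written always describe a T, whatever U was deduced to be.
template <typename T, typename U>
std::vector<char> toy_serialize(const U& obj) {
  static_assert(std::is_trivially_copyable<T>::value,
                "sketch only supports trivially copyable types");
  const T as_t = obj;
  std::vector<char> bytes(sizeof(T));
  std::memcpy(bytes.data(), &as_t, sizeof(T));
  return bytes;
}

// Toy deserializer: reinterprets the bytes as a T.
template <typename T>
T toy_deserialize(const void* const data) {
  static_assert(std::is_trivially_copyable<T>::value,
                "sketch only supports trivially copyable types");
  T result{};  // this sketch requires T to be default constructible
  std::memcpy(&result, data, sizeof(T));
  return result;
}
```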