SpECTRE Documentation Coverage Report
File: docs/DevGuide/Parallelization.md
Commit: f1ddee3e40d81480e49140855d2b0e66fafaa908
Date: 2020-12-02 17:35:08
Coverage: 0 of 1 lines (0.0 %)

       1           0 : \cond NEVER
       2             : Distributed under the MIT License.
       3             : See LICENSE.txt for details.
       4             : \endcond
       5             : # Parallelization, Charm++, and Core Concepts {#dev_guide_parallelization_foundations}
       6             : 
       7             : \tableofcontents
       8             : 
       9             : # Introduction {#dev_guide_parallelization_introduction}
      10             : 
      11             : SpECTRE builds a layer on top of Charm++ that performs various safety checks and
      12             : initialization for the user that can otherwise lead to difficult-to-debug
      13             : undefined behavior. The central concept is what is called a %Parallel
      14             : Component. A %Parallel Component is a struct with several type aliases that
      15             : is used by SpECTRE to set up the Charm++ chares and allowed communication
       16             : patterns. %Parallel Components are inputs to the compile-time machinery,
       17             : which generates the parallelization infrastructure requested for the
       18             : executable. There is no restriction on the number of %Parallel Components,
       19             : though in practice it is best to have about ten at most.
      20             : 
      21             : Here is an overview of what is described in detail in the sections below:
      22             : 
      23             : - \ref dev_guide_parallelization_metavariables_class "Metavariables": Provides
      24             :   high-level configuration to the compiler, e.g. the physical system to be
      25             :   simulated.
      26             : - \ref dev_guide_parallelization_phases_of_execution "Phases": Defines distinct
      27             :   simulation phases separated by a global synchronization point,
      28             :   e.g. `Initialization`, `Evolve` and `Exit`.
      29             : - \ref dev_guide_parallelization_core_algorithm "Algorithm": In each phase,
      30             :   repeatedly iterates over a list of actions until the current phase ends.
      31             : - \ref dev_guide_parallelization_parallel_components "Parallel component":
      32             :   Maintains and executes its algorithm.
      33             : - \ref dev_guide_parallelization_actions "Action": Performs a computational
      34             :   task, e.g. evaluating the right hand side of the time evolution equations. May
      35             :   require data to be received from another action potentially being executed on
      36             :   a different core or node.
      37             : 
      38             : # The Metavariables Class {#dev_guide_parallelization_metavariables_class}
      39             : 
       40             : SpECTRE handles the options passed to an executable differently than most
       41             : codes: not only is an input file read at runtime, but many choices are also
       42             : made at compile time. The compile-time options are specified by what is
      43             : referred to as the metavariables. What exactly the metavariables struct
      44             : specifies depends on the executable, but all metavariables structs must
      45             : specify the following:
      46             : 
      47             : - `help`: a `static constexpr Options::String` that will be printed as part of
      48             :   the help message. It should describe the executable and basic usage of it, as
      49             :   well as any non-standard options that must be specified in the metavariables
      50             :   and their current values. An example of a help string for one of the testing
      51             :   executables is:
      52             :   \snippet Test_AlgorithmCore.cpp help_string_example
      53             : - `component_list`: a `tmpl::list` of the parallel components (described below)
      54             :   that are to be created. Most evolution executables will have the
      55             :   `DgElementArray` parallel component listed. An example of a `component_list`
      56             :   for one of the test executables is:
      57             :   \snippet Test_AlgorithmCore.cpp component_list_example
      58             : - `using const_global_cache_tags`: a `tmpl::list` of tags that are
      59             :   used to place const items in the GlobalCache.  The alias may be
      60             :   omitted if the list is empty.
      61             : - `using mutable_global_cache_tags`: a `tmpl::list` of tags that are
      62             :   used to place mutable items in the GlobalCache.  The alias may be
      63             :   omitted if the list is empty.
      64             : - `Phase`: an `enum class` that must contain at least `Initialization` and
      65             :   `Exit`. Phases are described in the next section.
      66             : - `determine_next_phase`: a static function with the signature
      67             :   \code
      68             :     static Phase determine_next_phase(
      69             :       const Phase& current_phase,
      70             :       const Parallel::CProxy_GlobalCache<EvolutionMetavars>& cache_proxy)
      71             :       noexcept;
      72             :   \endcode
      73             :   What this function does is described below in the discussion of
      74             :   \ref dev_guide_parallelization_phases_of_execution "phases".
      75             : 
      76             : There are also several optional members:
      77             : 
      78             : - `input_file`: a `static constexpr Options::String` that is the default name of
      79             :   the input file that is to be read. This can be overridden at runtime by
      80             :   passing the `--input-file` argument to the executable.
      81             : - `ignore_unrecognized_command_line_options`: a `static constexpr bool` that
      82             :   defaults to `false`. If set to `true` then unrecognized command line options
      83             :   are ignored. Ignoring unrecognized options is generally only necessary for
      84             :   tests where arguments for the testing framework,
      85             :   [Catch](https://github.com/catchorg/Catch2/), are passed to the executable.
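
Putting these pieces together, a minimal metavariables struct might be
sketched as follows (the struct name, help text, and component here are
hypothetical illustrations, not taken from a real executable):

\code
struct TestMetavariables {  // hypothetical name
  static constexpr Options::String help{
      "A minimal executable.\n"
      "Options must be specified in the input file."};

  // SomeComponent is a hypothetical parallel component.
  using component_list = tmpl::list<SomeComponent<TestMetavariables>>;

  // const_global_cache_tags and mutable_global_cache_tags are omitted
  // because the lists are empty.

  enum class Phase { Initialization, Exit };

  static Phase determine_next_phase(
      const Phase& /*current_phase*/,
      const Parallel::CProxy_GlobalCache<TestMetavariables>&
      /*cache_proxy*/) noexcept {
    return Phase::Exit;
  }
};
\endcode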
      86             : 
      87             : # Phases of an Execution {#dev_guide_parallelization_phases_of_execution}
      88             : 
      89             : Global synchronization points, where all cores wait for each other, are
      90             : undesirable for scalability reasons. However, they are sometimes inevitable for
      91             : algorithmic reasons. That is, in order to actually get a correct solution you
      92             : need to have a global synchronization. SpECTRE executables can have multiple
      93             : phases, where after each phase a global synchronization occurs. By global
      94             : synchronization we mean that no parallel components are executing or have more
      95             : tasks to execute: everything is waiting to be told what tasks to perform next.
      96             : 
      97             : Every executable must have at least two phases, `Initialization` and
      98             : `Exit`. The next phase is decided by the static member function
      99             : `determine_next_phase` in the metavariables. Currently this function has access
     100             : to the phase that is
     101             : ending, and also the global cache. In the future we will add support for
     102             : receiving data from various components to allow for more complex decision
     103             : making. Here is an example of a `determine_next_phase` function and the `Phase`
     104             : enum class:
     105             : \snippet Test_AlgorithmCore.cpp determine_next_phase_example
     106             : 
     107             : In contrast, an evolution executable might have phases
     108             : `Initialization`, `SetInitialData`, `Evolve`, and `Exit`, but have a
     109             : similar `switch` or `if-else` logic in the `determine_next_phase`
     110             : function. The first phase that is entered is always
     111             : `Initialization`. During the `Initialization` phase the
     112             : `Parallel::GlobalCache` is created, all non-array components are created,
     113             : and empty array components are created.  Next, the function
     114             : `allocate_array_components_and_execute_initialization_phase` is called
     115             : which allocates the elements of each array component, and then starts
     116             : the `Initialization` phase on all parallel components. Once all
     117             : parallel components' `Initialization` phase is complete, the next
     118             : phase is determined and the `execute_next_phase` function is called on
     119             : all the parallel components.
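
The phase-switching itself is ordinary C++. As a SpECTRE-independent sketch
(the enum and function below are standalone stand-ins for illustration, not
SpECTRE's actual types), the kind of `switch` such a `determine_next_phase`
performs looks like:

```cpp
// Standalone stand-in for a metavariables Phase enum.
enum class Phase { Initialization, SetInitialData, Evolve, Exit };

// Simplified analogue of a determine_next_phase function, without the
// GlobalCache argument; a real executable may also consult the cache.
Phase next_phase(const Phase current_phase) {
  switch (current_phase) {
    case Phase::Initialization:
      return Phase::SetInitialData;
    case Phase::SetInitialData:
      return Phase::Evolve;
    case Phase::Evolve:
      // A real executable would check a termination criterion here
      // (e.g. reaching the final time) before deciding to exit.
      return Phase::Exit;
    default:
      return Phase::Exit;
  }
}
```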
     120             : 
      121             : At the end of an execution the `Exit` phase makes the executable wait until
      122             : no parallel components are performing or need to perform any more tasks, and
      123             : then exit. An example where this approach is important is if we are done
     124             : evolving a system but still need to write data to disk. We do not want to exit
     125             : the simulation until all data has been written to disk, even though we've
     126             : reached the final time of the evolution.
     127             : 
      128             : \warning Currently deadlocks are treated as successful termination. In the
      129             : future, checks against deadlocks will be performed before terminating.
     130             : 
     131             : # The Algorithm {#dev_guide_parallelization_core_algorithm}
     132             : 
     133             : Since most numerical algorithms repeat steps until some criterion such as the
     134             : final time or convergence is met, SpECTRE's parallel components are designed to
     135             : do such iterations for the user. An Algorithm executes an ordered list of
     136             : actions until one of the actions cannot be evaluated, typically because it is
     137             : waiting on data from elsewhere. When an algorithm can no longer evaluate actions
     138             : it passively waits by handing control back to Charm++. Once an algorithm
     139             : receives data, typically done by having another parallel component call the
     140             : `receive_data` function, the algorithm will try again to execute the next
     141             : action. If the algorithm is still waiting on more data then the algorithm will
     142             : again return control to Charm++ and passively wait for more data. This is
     143             : repeated until all required data is available. The actions that are iterated
     144             : over by the algorithm are called iterable actions and are described below.
     145             : Since the action list is phase dependent we refer to them generally as
     146             : phase-dependent action lists (PDALs, pronounced "pedals").
     147             : 
     148             : # Parallel Components {#dev_guide_parallelization_parallel_components}
     149             : 
     150             : Each %Parallel Component struct must have the following type aliases:
     151             : 1. `using chare_type` is set to one of:
     152             :    1. `Parallel::Algorithms::Singleton`s have one object in the entire execution
     153             :       of the program.
     154             :    2. `Parallel::Algorithms::Array`s hold zero or more elements, each of which
     155             :       is an object distributed to some core. An array can grow and shrink in
     156             :       size dynamically if need be and can also be bound to another array. A
     157             :       bound array has the same number of elements as the array it is bound to,
     158             :       and elements with the same ID are on the same core. See Charm++'s chare
     159             :       arrays for details.
     160             :    3. `Parallel::Algorithms::Group`s are arrays with
      161             :       one element per core that cannot be moved between
     162             :       cores. These are typically useful for gathering data from array elements
     163             :       on their core, and then processing or reducing the data further. See
     164             :       [Charm++'s](http://charm.cs.illinois.edu/help) group chares for details.
     165             :    4. `Parallel::Algorithms::Nodegroup`s are similar to
     166             :       groups except that there is one element per node. For Charm++ SMP (shared
     167             :       memory parallelism) builds, a node corresponds to the usual definition of
     168             :       a node on a supercomputer. However, for non-SMP builds nodes and cores are
     169             :       equivalent. We ensure that all entry method calls done through the
     170             :       Algorithm's `simple_action` and `receive_data` functions are
      171             : thread-safe. User-controlled threading is possible by calling the non-entry
     172             :       method member function `threaded_action`.
     173             : 2. `using metavariables` is set to the Metavariables struct that stores the
     174             :    global metavariables. It is often easiest to have the %Parallel
     175             :    Component struct have a template parameter `Metavariables` that is the
     176             :    global metavariables struct. Examples of this technique are given below.
     177             : 3. `using phase_dependent_action_list` is set to a `tmpl::list` of
     178             :    `Parallel::PhaseActions<PhaseType, Phase, tmpl::list<Actions...>>`
     179             :    where each `PhaseAction` represents a PDAL that will be executed on
     180             :    the parallel component during the specified phase. The %Actions are
     181             :    executed in the order that they are given in the `tmpl::list`s of
     182             :    the PDALs, but the phases need not be run in linear order. However,
     183             :    `db::DataBox` types are constructed assuming the phases are
      184             :    performed in order from the first in the `phase_dependent_action_list` to the
     185             :    last. Simple actions (described below) can be executed in any
     186             :    phase. If there are no iterable actions in a phase then a
     187             :    `PhaseAction` need not be specified for that phase. However, at
     188             :    least one `PhaseAction`, even if it is empty, must be specified.
     189             : 4. `using initialization_tags` which is a `tmpl::list` of all the tags
     190             :    that will be inserted into the initial `db::DataBox` of each component.
      191             :    These tags are db::SimpleTag%s that have a `using option_tags`
     192             :    type alias and a static function `create_from_options` (see the
     193             :    example below).  This list can usually be constructed from the
     194             :    initialization actions of the component (i.e. the list of actions
     195             :    in the `PhaseAction` list for the `Initialization` phase) using the
     196             :    helper function `Parallel::get_initialization_tags` (see the
     197             :    examples of components below).  Each initialization action may
     198             :    specify a type alias `using initialization_tags` which are a
     199             :    `tmpl::list` of tags that will be fetched from the db::DataBox by the
     200             :    action.  All `initialization_tags` are removed from the db::DataBox of
     201             :    the component at the end of the `Initialization` phase, except for
     202             :    tags listed in a type alias `using initialization_tags_to_keep` that
     203             :    may appear in each initialization action.
     204             : 5. `using const_global_cache_tags` is set to a `tmpl::list` of tags
     205             :    that are required by the `allocate_array` function of an array
     206             :    component, or simple actions called on the parallel component.
     207             :    These tags correspond to const items that are stored in the
     208             :    Parallel::GlobalCache (of which there is one copy per Charm++
     209             :    node).  The alias can be omitted if the list is empty.  (See
      210             :    `array_allocation_tags` below for specifying tags that are needed by the
      211             :    `allocate_array` function but will not be added to the
      212             :    Parallel::GlobalCache.)
     213             : 6. `using mutable_global_cache_tags` is set to a `tmpl::list` of tags
     214             :    that correspond to mutable items that are stored in the
     215             :    Parallel::GlobalCache (of which there is one copy per Charm++
     216             :    core).  The alias can be omitted if the list is empty.
     217             : 
     218             : \note Array parallel components must also specify the type alias `using
     219             : array_index`, which is set to the type that indexes the %Parallel Component
     220             : Array. Charm++ allows arrays to be 1 through 6 dimensional or be indexed by a
     221             : custom type. The Charm++ provided indexes are wrapped as
     222             : `Parallel::ArrayIndex1D` through `Parallel::ArrayIndex6D`. When writing custom
     223             : array indices, the [Charm++ manual](http://charm.cs.illinois.edu/help) tells you
     224             : to write your own `CkArrayIndex`, but we have written a general implementation
     225             : that provides this functionality (see `Parallel::ArrayIndex`); all that you need
     226             : to provide is a plain-old-data
      227             : ([POD](http://en.cppreference.com/w/cpp/concept/PODType)) struct whose size is
      228             : at most that of three integers.
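
Combining the required aliases, a skeleton of an array parallel component
might be sketched as follows (the component and action names are
hypothetical; complete, tested examples are given further below):

\code
template <class Metavariables>
struct ElementArray {  // hypothetical component
  using chare_type = Parallel::Algorithms::Array;
  using metavariables = Metavariables;
  using array_index = int;  // could also be a multidimensional or custom index

  using phase_dependent_action_list = tmpl::list<Parallel::PhaseActions<
      typename Metavariables::Phase, Metavariables::Phase::Initialization,
      tmpl::list<InitializeElement>>>;  // hypothetical action

  // Usually constructed from the initialization actions using the helper
  // Parallel::get_initialization_tags; written out empty here for brevity.
  using initialization_tags = tmpl::list<>;

  // const_global_cache_tags and mutable_global_cache_tags omitted
  // because the lists are empty.

  // The static allocate_array and execute_next_phase functions,
  // described below, also go here.
};
\endcode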
     229             : 
     230             : %Parallel array components have a static `allocate_array` function
     231             : that is used to construct the elements of the array. The
     232             : signature of the `allocate_array` functions must be:
     233             : \code
     234             : static void allocate_array(
     235             :     Parallel::CProxy_GlobalCache<metavariables>& global_cache,
     236             :     const tuples::tagged_tuple_from_typelist<initialization_tags>&
     237             :     initialization_items) noexcept;
     238             : \endcode
     239             : The `allocate_array` function is called by the Main parallel component
     240             : when the execution starts and will typically insert elements into
     241             : array parallel components. If the `allocate_array` function depends
     242             : upon input options, the array component must specify a `using
     243             : array_allocation_tags` type alias that is a `tmpl::list` of tags which
      244             : are db::SimpleTag%s that have a `using option_tags` type alias
     245             : and a static function `create_from_options`. An example is:
     246             : \snippet DistributedLinearSolverAlgorithmTestHelpers.hpp array_allocation_tag
     247             : 
      248             : The `allocate_array` functions of different
      249             : array components are called in an unspecified order, so it is not
      250             : safe to have them depend on each other.
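
As an illustration, an `allocate_array` body that distributes elements
round-robin over cores might be sketched as follows (the `ElementArray`
component and the `InitialNumberOfElements` tag are hypothetical, and the
`insert` arguments follow the Charm++ dynamic array insertion API):

\code
static void allocate_array(
    Parallel::CProxy_GlobalCache<metavariables>& global_cache,
    const tuples::tagged_tuple_from_typelist<initialization_tags>&
    initialization_items) noexcept {
  auto& array_proxy = Parallel::get_parallel_component<ElementArray>(
      *(global_cache.ckLocalBranch()));
  // InitialNumberOfElements is a hypothetical initialization tag.
  const size_t number_of_elements =
      tuples::get<InitialNumberOfElements>(initialization_items);
  for (size_t i = 0; i < number_of_elements; ++i) {
    // Place element i on core i modulo the number of cores.
    array_proxy[static_cast<int>(i)].insert(
        global_cache, initialization_items,
        static_cast<int>(i % Parallel::number_of_procs()));
  }
  array_proxy.doneInserting();
}
\endcode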
     251             : 
     252             : Each parallel component must also decide what to do in the different phases of
     253             : the execution. This is controlled by an `execute_next_phase` function with
     254             : signature:
     255             : \code
     256             : static void execute_next_phase(
     257             :     const typename metavariables::Phase next_phase,
     258             :     const Parallel::CProxy_GlobalCache<metavariables>& global_cache);
     259             : \endcode
     260             : The `determine_next_phase` function in the Metavariables determines the next
     261             : phase, after which the `execute_next_phase` function gets called. The
     262             : `execute_next_phase` function determines what the parallel component should do
     263             : during the next phase. Typically the `execute_next_phase` function should just
     264             : call `start_phase(phase)` on the parallel component. In the future
     265             : `execute_next_phase` may be removed.
     266             : 
     267             : An example of a singleton parallel component is:
     268             : \snippet Test_AlgorithmParallel.cpp singleton_parallel_component
     269             : 
     270             : An example of an array parallel component is:
     271             : \snippet Test_AlgorithmParallel.cpp array_parallel_component
     272             : Elements are inserted into the array by using the Charm++ `insert` member
     273             : function of the CProxy for the array. The `insert` function is documented in
     274             : the Charm++ manual. In the above Array example `array_proxy` is a `CProxy` and
     275             : so all the documentation for Charm++ array proxies applies. SpECTRE always
     276             : creates empty arrays with the constructor and requires users to insert however
      277             : many elements they want, on whichever cores they want them placed. Note
     278             : that load balancing calls may result in array elements being moved.
     279             : 
     280             : # Actions {#dev_guide_parallelization_actions}
     281             : 
     282             : For those familiar with Charm++, actions should be thought of as effectively
     283             : being entry methods. They are functions that can be invoked on a remote object
     284             : (chare/parallel component) using a `CProxy` (see the [Charm++
     285             : manual](http://charm.cs.illinois.edu/help)), which is retrieved from the
     286             : Parallel::GlobalCache using the parallel component struct and the
     287             : `Parallel::get_parallel_component()` function. %Actions are structs with a
     288             : static `apply` method and come in three variants: simple actions, iterable
     289             : actions, and reduction actions. One important thing to note
     290             : is that actions cannot return any data to the caller of the remote method.
     291             : Instead, "returning" data must be done via callbacks or a callback-like
     292             : mechanism.
     293             : 
     294             : The simplest signature of an `apply` method is for iterable actions:
     295             : \snippet Test_AlgorithmCore.cpp apply_iterative
     296             : The return type is discussed at the end of each section describing a particular
     297             : type of action. Simple actions can have additional arguments, do not receive
     298             : the `inboxes` or `ActionList`, and take the `ParallelComponent` as an
     299             : explicit first template parameter. Reduction actions have the same signature as
     300             : simple actions except that the additional arguments must be of the types reduced
     301             : over.
     302             : The `db::DataBox` should be thought of as the member data of the parallel
     303             : component while the actions are the member functions. The combination of a
     304             : `db::DataBox` and actions allows building up classes with arbitrary member data
     305             : and methods using template parameters and invocation of actions. This approach
     306             : allows us to eliminate the need for users to work with Charm++'s interface
     307             : files, which can be error prone and difficult to use.
     308             : 
     309             : The Parallel::GlobalCache is passed to each action so that the
     310             : action has access to global data and is able to invoke actions on
     311             : other parallel components. The `ParallelComponent` template parameter
     312             : is the tag of the parallel component that invoked the action. A proxy
     313             : to the calling parallel component can then be retrieved from the
     314             : Parallel::GlobalCache. The remote entry method invocations are
     315             : slightly different for different types of actions, so they will be
     316             : discussed below. However, one thing that is disallowed for all actions
     317             : is calling an action locally from within an action on the same
     318             : parallel component.  Specifically,
     319             : 
     320             : \snippet Test_AlgorithmNestedApply1.cpp bad_recursive_call
     321             : 
     322             : Here `ckLocal()` is a Charm++ provided method that returns a pointer
     323             : to the local (currently executing) parallel component. See the
     324             : [Charm++ manual](http://charm.cs.illinois.edu/help) for more
     325             : information.  However, you are able to queue a new action to be
     326             : executed later on the same parallel component by getting your own
     327             : parallel component from the Parallel::GlobalCache
     328             : (`Parallel::get_parallel_component<ParallelComponent>(cache)`).  The
     329             : difference between the two calls is that by calling an action through
     330             : the parallel component you will first finish the series of actions you
     331             : are in, then when they are complete Charm++ will call the next queued
     332             : action.
     333             : 
     334             : Array, group, and nodegroup parallel components can have actions invoked in two
     335             : ways. First is a broadcast where the action is called on all elements of the
     336             : array:
     337             : 
     338             : \snippet Test_AlgorithmParallel.cpp broadcast_to_group
     339             : 
     340             : The second case is invoking an action on a specific array element by using the
     341             : array element's index. The below example shows how a broadcast would be done
     342             : manually by looping over all elements in the array:
     343             : 
     344             : \snippet Test_AlgorithmParallel.cpp call_on_indexed_array
     345             : 
     346             : Note that in general you will not know what all the elements in the array are
     347             : and so a broadcast is the correct method of sending data to or invoking an
     348             : action on all elements of an array parallel component.
     349             : 
     350             : The `array_index` argument passed to all `apply` methods is the index into the
     351             : parallel component array. If the parallel component is not an array the value
     352             : and type of `array_index` is implementation defined and cannot be relied on. The
     353             : `ActionList` type is the `tmpl::list` of iterable actions in the current phase.
     354             : That is, it is equal to the `action_list` type alias in the current PDAL.
     355             : 
     356             : ## 1. Simple Actions {#dev_guide_parallelization_simple_actions}
     357             : 
     358             : Simple actions are designed to be called in a similar fashion to
     359             : member functions of classes. They are the direct analog of entry
     360             : methods in Charm++ except that the member data is stored in the
     361             : `db::DataBox` that is passed in as the first argument. A simple action
     362             : must return void but can use `db::mutate` to change values of items in
     363             : the `db::DataBox` if the `db::DataBox` is taken as a non-const
     364             : reference. In some cases you will need specific items to be in the
     365             : `db::DataBox` otherwise the action won't compile. To restrict which
     366             : `db::DataBox`es can be passed you should use `Requires` in the
     367             : action's `apply` function template parameter list. For example,
     368             : \snippet Test_AlgorithmCore.cpp requires_action
     369             : where the conditional checks if any element in the parameter pack `DbTags` is
     370             : `CountActionsCalled`.
     371             : 
     372             : A simple action that does not take any arguments can be called using a `CProxy`
     373             : from the Parallel::GlobalCache as follows:
     374             : 
     375             : \snippet Test_AlgorithmCore.cpp simple_action_call
     376             : 
     377             : If the simple action takes arguments then the arguments must be as follows:
     378             : 
     379             : \snippet Test_AlgorithmNodelock.cpp simple_action_with_args
     380             : 
     381             : ## 2. Iterable Actions {#dev_guide_parallelization_iterable_actions}
     382             : 
     383             : %Actions in the algorithm that are part of the current PDAL are executed one
     384             : after the other until one of them cannot be evaluated. Iterable
     385             : actions may have an `is_ready` method that returns `true` or `false` depending
     386             : on whether or not the action is ready to be evaluated. If no `is_ready` method
     387             : is provided then the action is assumed to be ready to be evaluated. The
     388             : `is_ready` method typically checks that required data from other parallel
     389             : components has been received. For example, it may check that all data from
     390             : neighboring elements has arrived to be able to continue integrating in time.
     391             : The signature of an `is_ready` method must be:
     392             : 
     393             : \snippet Test_AlgorithmCore.cpp is_ready_example
     394             : 
      395             : The `inboxes` argument is a collection of the tags passed to `receive_data`,
      396             : specified in the iterable action's member type alias `inbox_tags`, which must
      397             : be a `tmpl::list`. Each inbox tag must have two member type aliases: a
     398             : `temporal_id` which is used to identify when the data was sent, and a `type`
     399             : which is the type of the data to be stored in the `inboxes`. The types are
     400             : typically a `std::unordered_map<temporal_id, DATA>`. In the discussed scenario
     401             : of waiting for neighboring elements to send their data the `DATA` type would be
     402             : a `std::unordered_map<TheElementId, DataSent>`. Inbox tags must also specify
     403             : a `static void insert_into_inbox()` function. For example,
     404             : 
     405             : \snippet Test_AlgorithmParallel.cpp int_receive_tag
     406             : 
For common types of `DATA`, such as a `map`, a data structure with an `insert`
function, a data structure with a `push_back` function, or copy/move assignment
that is used to insert the received data, inserters are available in
`Parallel::InboxInserters`. For example, there is
`Parallel::InboxInserters::Map` for `map` data structures. The inbox tag can
inherit publicly from the inserters to gain the required insertion
capabilities:

\snippet Test_AlgorithmCore.cpp int receive tag insert

The `inbox_tags` type alias for the action is:

\snippet Test_AlgorithmParallel.cpp int_receive_tag_list

and the `is_ready` function is:

\snippet Test_AlgorithmParallel.cpp int_receive_tag_is_ready

The iterable action is executed only once all of the `int`s have been received,
not before.

\warning
It is the responsibility of the iterable action to remove data from the inboxes
that will no longer be needed. The removal of unneeded data should be done in
the `apply` function.

Iterable actions can change the type of the db::DataBox by adding or
removing elements/tags from the db::DataBox. The only requirement is
that the last action in each PDAL returns a db::DataBox that is the
same type for each iteration. Iterable actions can also request that
the algorithm no longer be executed, and control which action in the
current PDAL will be executed next. This is all done via the return
value from the `apply` function. The `apply` function for iterable
actions must return a `std::tuple` of one, two, or three elements. The
first element of the tuple is the new db::DataBox, which can be the
same type as the one passed in or a db::DataBox with different tags. Most
iterable actions will simply return:

\snippet Test_AlgorithmParallel.cpp return_forward_as_tuple

By returning the db::DataBox as a reference in a `std::tuple` we avoid
any unnecessary copying of the db::DataBox. The second element is an
optional `bool` that controls whether or not the algorithm is
terminated. If the bool is `true` then the algorithm is terminated; by
default it is `false`. Here is an example of how to return a
db::DataBox with the same type that is passed in and also terminate
the algorithm:

\snippet Test_AlgorithmParallel.cpp return_with_termination

Notice that we again return a reference to the db::DataBox, which is
done to avoid any copying. After an algorithm has been terminated it
can be restarted by passing `false` to the `set_terminate` method or
by calling `receive_data(..., true)`. Since the order in which
messages are received is undefined, in most cases the
`receive_data(..., true)` call should be used to restart the
algorithm.

The third optional element in the returned `std::tuple` is a `size_t` whose
value corresponds to the index of the action to be called next in the
PDAL. The metafunction `tmpl::index_of<list, element>` can be used to
get a `tmpl::integral_constant` with the value of the index of the element
`element` in the typelist `list`. For example,

\snippet Test_AlgorithmCore.cpp out_of_order_action

Again a reference to the db::DataBox is returned, while the
termination `bool` and next-action `size_t` are returned by value. The
metafunction call `tmpl::index_of<ActionList,
iterate_increment_int0>::%value` returns a `size_t` whose value is the
index of the action `iterate_increment_int0` in the PDAL. The indexing
of actions in the PDAL starts at `0`.

Iterable actions are invoked as part of the algorithm and so the only way
to request they be invoked is by having the algorithm run on the parallel
component. The algorithm can be explicitly evaluated in a new phase by calling
`start_phase(Phase::TheCurrentPhase)`:

\snippet Test_AlgorithmCore.cpp start_phase

Alternatively, to evaluate the algorithm without changing phases the
`perform_algorithm()` method can be used:

\snippet Test_AlgorithmParallel.cpp perform_algorithm

By passing `true` to `perform_algorithm` the algorithm will be restarted if it
was terminated.

The algorithm is also evaluated by calling the `receive_data` function, either
on an entire array or singleton (this does a broadcast), or on an individual
element of the array. Here is an example of a broadcast call:

\snippet Test_AlgorithmParallel.cpp broadcast_to_group

and of calling individual elements:

\snippet Test_AlgorithmParallel.cpp call_on_indexed_array

The `receive_data` function always takes a `ReceiveTag`, which is set in the
action's `inbox_tags` type alias as described above. The first argument is the
temporal identifier, and the second is the data to be sent.

Normally when remote functions are invoked they go through the Charm++ runtime
system, which adds some overhead. The `receive_data` function tries to elide
the call to the Charm++ RTS for calls into array components. Charm++ refers to
these types of remote calls as "inline entry methods". With the Charm++ method
of eliding the RTS, the code becomes susceptible to stack overflows because of
unbounded recursion. The `receive_data` function is limited to at most 64 RTS
elided calls, though in practice reaching this limit is rare. When the limit is
reached the remote method invocation is done through the RTS instead of being
elided.

## 3. Reduction Actions {#dev_guide_parallelization_reduction_actions}

Finally, there are reduction actions which are used when reducing data over an
array. For example, you may want to know the sum of an `int` from every
element in the array. You can do this as follows:

\snippet Test_AlgorithmReduction.cpp contribute_to_reduction_example

This reduces over the parallel component
`ArrayParallelComponent<Metavariables>`, reduces to the parallel component
`SingletonParallelComponent<Metavariables>`, and calls the action
`ProcessReducedSumOfInts` after the reduction has been performed. The reduction
action is:

\snippet Test_AlgorithmReduction.cpp reduce_sum_int_action

As you can see, the last argument to the `apply` function is of type `int`, and
is the reduced value.

You can also broadcast the result back to an array, even to yourself. For
example,

\snippet Test_AlgorithmReduction.cpp contribute_to_broadcast_reduction

It is often necessary to reduce custom data types, such as `std::vector` or
`std::unordered_map`. Charm++ supports such custom reductions, and so does our
layer on top of Charm++.
Custom reductions require one additional step beyond calling
`contribute_to_reduction`, which is writing a reduction function to reduce the
custom data. We provide a generic type that can be used in custom reductions,
`Parallel::ReductionData`, which takes a series of `Parallel::ReductionDatum` as
template parameters and `ReductionDatum::value_type`s as the arguments to the
constructor. Each `ReductionDatum` takes up to four template parameters (two
are required). The first is the type of data to reduce, and the second is a
binary invokable that is called at each step of the reduction to combine two
messages. The last two template parameters are used after the reduction has
completed. The third parameter is an n-ary invokable that is called once the
reduction is complete, whose first argument is the result of the reduction. The
additional arguments can be any `ReductionDatum::value_type` in the
`ReductionData` that are before the current one. The fourth template parameter
of `ReductionDatum` is used to specify which data should be passed. It is a
`std::index_sequence` indexing into the `ReductionData`.

The action that is invoked with the result of the reduction is:

\snippet Test_AlgorithmReduction.cpp custom_reduction_action

Note that it takes objects of the types that the reduction was done over as
additional arguments.

\warning
All elements of the array must call the same reductions in the same order. It is
*defined* behavior to do multiple reductions at once as long as all contribute
calls on all array elements occurred in the same order. It is **undefined**
behavior if the contribute calls are made in different orders on different array
elements.

# Mutable items in the GlobalCache

Most items in the GlobalCache are constant, and are specified
by type aliases called `const_global_cache_tags` as
described above. However, the GlobalCache can also store mutable
items. Because of asynchronous execution, care must be taken when
mutating items in the GlobalCache, as described below.

A mutable item can be of any type, as long as that type is something
that can be checked for whether it is "up-to-date". Here "up-to-date"
means that the item can be safely used (even read-only) without
needing to be mutated first. For example, a mutable item might be a
function of time that knows the range of times for which it is valid;
the mutable item is then deemed up-to-date if it will be called for a
time within its range of validity, and it is deemed not up-to-date if
it will be called for a time outside its range of validity. Thus the
up-to-date status of a mutable item is determined by both the state of
the item itself and by the code that wishes to use that item.

## 1. Specification of mutable GlobalCache items

Mutable GlobalCache items are specified by a
type alias `mutable_global_cache_tags`, which is treated the same way
as `const_global_cache_tags` is for const items.

## 2. Use of mutable GlobalCache items

### 1. Checking if the item is up-to-date

Because execution is asynchronous, any code that uses a mutable item
in the GlobalCache must first check whether that item is up-to-date.
The information about whether an item is up-to-date is assumed to be
stored in the item itself. For example, a mutable object stored in
the GlobalCache might have type `std::map<temporal_id,T>` (for some
type `T`), and then any code that uses the stored object can check
whether an entry exists for a particular `temporal_id`. To avoid race
conditions, it is important that up-to-date checks are based on
something that is independent of the order of mutation (like a
`temporal_id`, and not like checking the size of a vector).

To check an item, use the function
`Parallel::mutable_cache_item_is_ready`, which returns a bool
indicating whether the item is up-to-date. If the item is up-to-date,
then it can be used. `Parallel::mutable_cache_item_is_ready` takes a
lambda as an argument. This lambda is passed a single argument: a
const reference to the item being retrieved. The lambda should
determine whether the item is up-to-date. If so, it should return a
default-constructed `std::optional<CkCallback>`; if not, it should
return a `std::optional<CkCallback>` holding a callback function that
will be called on the next `Parallel::mutate` of that item. The
callback will typically check again if the item is up-to-date and, if
so, execute some code that gets the item via `Parallel::get`.

For the case of iterable actions, `Parallel::mutable_cache_item_is_ready`
is typically called from the `is_ready` function of the iterable action,
and the callback is `perform_algorithm()`. In the example below, the
vector is considered up-to-date if it is non-empty:

\snippet Test_AlgorithmGlobalCache.cpp check_mutable_cache_item_is_ready

Note that `Parallel::mutable_cache_item_is_ready` is called on a local
core and does no parallel communication.

### 2. Retrieving the item

The item is retrieved using `Parallel::get` just like for constant items.
For example, to retrieve the item `Tags::VectorOfDoubles`:

\snippet Test_AlgorithmGlobalCache.cpp retrieve_mutable_cache_item

Note that `Parallel::get` is called on a local core and does no
parallel communication.

Whereas we support getting *non-mutable* items in the GlobalCache from
a DataBox via `db::get`, we intentionally do not support
`db::get` of *mutable* items in the GlobalCache from a DataBox.
The reason is that mutable
items should be retrieved only after a `Parallel::mutable_cache_item_is_ready`
check, and being able to retrieve a mutable item from a DataBox makes it
difficult to enforce that check, especially when automatically-executing
compute items are considered.

## 3. Modifying a mutable GlobalCache item

To modify a mutable item, pass `Parallel::mutate` two template
parameters: the tag to mutate, and a struct with an `apply` function
that does the mutating. `Parallel::mutate` takes two arguments:
a proxy to the GlobalCache, and a tuple that is passed into the
mutator function. For the following example,

\snippet Test_AlgorithmGlobalCache.cpp mutate_global_cache_item

the mutator function is defined as below:

\snippet Test_AlgorithmGlobalCache.cpp mutate_global_cache_item_mutator

`Parallel::mutate` broadcasts to every core, where it calls the
mutator function and then calls all the callbacks that have been set
on that core by `Parallel::mutable_cache_item_is_ready`. The
`Parallel::mutate` operation is guaranteed to be thread-safe without
any further action by the developer.

# Charm++ Node and Processor Level Initialization Functions {#dev_guide_parallelization_charm_node_processor_level_initialization}

Charm++ allows running functions once per core and once per node before the
construction of any parallel components. This is commonly used for setting up
error handling and enabling floating point exceptions. Other functions could
also be run. Which functions are run on each node and core is set by specifying
a `std::vector<void (*)()>` called `charm_init_node_funcs` and
`charm_init_proc_funcs` with function pointers to the functions to be called.
For example,
\snippet Test_AlgorithmCore.cpp charm_init_funcs_example

Finally, the user must include the `Parallel/CharmMain.tpp` file at the end of
the main executable cpp file. So the end of an executable's main cpp file will
typically look as follows:
\snippet Test_AlgorithmParallel.cpp charm_include_example
