\cond NEVER
Distributed under the MIT License.
See LICENSE.txt for details.
\endcond
# Parallelization, Charm++, and Core Concepts {#dev_guide_parallelization_foundations}

\tableofcontents

# Introduction {#dev_guide_parallelization_introduction}

SpECTRE builds a layer on top of Charm++ that performs various safety checks and
initialization for the user that can otherwise lead to difficult-to-debug
undefined behavior. The central concept is what is called a %Parallel
Component. A %Parallel Component is a struct with several type aliases that
is used by SpECTRE to set up the Charm++ chares and allowed communication
patterns. %Parallel Components are input arguments to the compiler, which then
writes the parallelization infrastructure that you requested for the executable.
There is no restriction on the number of %Parallel Components, though in
practice it is best to have around 10 at most.

Here is an overview of what is described in detail in the sections below:

- \ref dev_guide_parallelization_metavariables_class "Metavariables": Provides
  high-level configuration to the compiler, e.g. the physical system to be
  simulated.
- \ref dev_guide_parallelization_phases_of_execution "Phases": Defines distinct
  simulation phases separated by a global synchronization point,
  e.g. `Initialization`, `Evolve` and `Exit`.
- \ref dev_guide_parallelization_core_algorithm "Algorithm": In each phase,
  repeatedly iterates over a list of actions until the current phase ends.
- \ref dev_guide_parallelization_parallel_components "Parallel component":
  Maintains and executes its algorithm.
- \ref dev_guide_parallelization_actions "Action": Performs a computational
  task, e.g. evaluating the right hand side of the time evolution equations.
  May require data to be received from another action potentially being
  executed on a different core or node.

# The Metavariables Class {#dev_guide_parallelization_metavariables_class}

SpECTRE takes a different approach to input options passed to an executable than
is common. SpECTRE not only reads an input file at runtime but also has many
choices made at compile time. The compile time options are specified by what is
referred to as the metavariables. What exactly the metavariables struct
specifies depends on the executable, but all metavariables structs must
specify the following:

- `help`: a static constexpr `Options::String` that will be printed as part of
  the help message. It should describe the executable and basic usage of it, as
  well as any non-standard options that must be specified in the metavariables
  and their current values. An example of a help string for one of the testing
  executables is:
  \snippet Test_AlgorithmCore.cpp help_string_example
- `component_list`: a `tmpl::list` of the parallel components (described below)
  that are to be created. Most evolution executables will have the
  `DgElementArray` parallel component listed. An example of a `component_list`
  for one of the test executables is:
  \snippet Test_AlgorithmCore.cpp component_list_example
- `using const_global_cache_tags`: a `tmpl::list` of tags that are
  used to place const items in the `GlobalCache`. The alias may be
  omitted if the list is empty.
- `using mutable_global_cache_tags`: a `tmpl::list` of tags that are
  used to place mutable items in the `GlobalCache`. The alias may be
  omitted if the list is empty.
- `Phase`: an enum class that must contain at least `Initialization` and
  `Exit`. Phases are described in the next section.
- `determine_next_phase`: a static function with the signature
  \code
  static Phase determine_next_phase(
      const Phase& current_phase,
      const Parallel::CProxy_GlobalCache& cache_proxy) noexcept;
  \endcode
  What this function does is described below in the discussion of
  \ref dev_guide_parallelization_phases_of_execution "phases".

There are also several optional members:

- `input_file`: a static constexpr `Options::String` that is the default name of
  the input file that is to be read. This can be overridden at runtime by
  passing the `--input-file` argument to the executable.
- `ignore_unrecognized_command_line_options`: a static constexpr `bool` that
  defaults to `false`. If set to `true` then unrecognized command line options
  are ignored. Ignoring unrecognized options is generally only necessary for
  tests where arguments for the testing framework,
  [Catch](https://github.com/catchorg/Catch2/), are passed to the executable.

# Phases of an Execution {#dev_guide_parallelization_phases_of_execution}

Global synchronization points, where all cores wait for each other, are
undesirable for scalability reasons. However, they are sometimes inevitable for
algorithmic reasons. That is, in order to actually get a correct solution you
need to have a global synchronization. SpECTRE executables can have multiple
phases, where after each phase a global synchronization occurs. By global
synchronization we mean that no parallel components are executing or have more
tasks to execute: everything is waiting to be told what tasks to perform next.

Every executable must have at least two phases, `Initialization` and
`Exit`. The next phase is decided by the static member function
`determine_next_phase` in the metavariables. Currently this function has access
to the phase that is ending, and also the global cache.
In the future we will add support for
receiving data from various components to allow for more complex decision
making. Here is an example of a `determine_next_phase` function and the `Phase`
enum class:

\snippet Test_AlgorithmCore.cpp determine_next_phase_example

In contrast, an evolution executable might have phases
`Initialization`, `SetInitialData`, `Evolve`, and `Exit`, but have a
similar switch or if-else logic in the `determine_next_phase`
function. The first phase that is entered is always
`Initialization`. During the `Initialization` phase the
`Parallel::GlobalCache` is created, all non-array components are created,
and empty array components are created. Next, the function
`allocate_array_components_and_execute_initialization_phase` is called,
which allocates the elements of each array component and then starts
the `Initialization` phase on all parallel components. Once all
parallel components' `Initialization` phase is complete, the next
phase is determined and the `execute_next_phase` function is called on
all the parallel components.

At the end of an execution the `Exit` phase has the executable wait to make sure
no parallel components are performing or need to perform any more tasks, and
then exits. An example where this approach is important is if we are done
evolving a system but still need to write data to disk. We do not want to exit
the simulation until all data has been written to disk, even though we've
reached the final time of the evolution.

\warning Currently deadlocks are treated as successful termination. In the
future checks against deadlocks will be performed before terminating.
# The Algorithm {#dev_guide_parallelization_core_algorithm}

Since most numerical algorithms repeat steps until some criterion such as the
final time or convergence is met, SpECTRE's parallel components are designed to
do such iterations for the user. An Algorithm executes an ordered list of
actions until one of the actions cannot be evaluated, typically because it is
waiting on data from elsewhere. When an algorithm can no longer evaluate actions
it passively waits by handing control back to Charm++. Once an algorithm
receives data, typically done by having another parallel component call the
`receive_data` function, the algorithm will try again to execute the next
action. If the algorithm is still waiting on more data then the algorithm will
again return control to Charm++ and passively wait for more data. This is
repeated until all required data is available. The actions that are iterated
over by the algorithm are called iterable actions and are described below.
Since the action list is phase dependent we refer to them generally as
phase-dependent action lists (PDALs, pronounced "pedals").

# Parallel Components {#dev_guide_parallelization_parallel_components}

Each %Parallel Component struct must have the following type aliases:
1. `using chare_type` is set to one of:
   1. `Parallel::Algorithms::Singleton`s have one object in the entire
      execution of the program.
   2. `Parallel::Algorithms::Array`s hold zero or more elements, each of which
      is an object distributed to some core. An array can grow and shrink in
      size dynamically if need be and can also be bound to another array. A
      bound array has the same number of elements as the array it is bound to,
      and elements with the same ID are on the same core. See Charm++'s chare
      arrays for details.
   3. `Parallel::Algorithms::Group`s are arrays with
      one element per core which are not able to be moved around between
      cores. These are typically useful for gathering data from array elements
      on their core, and then processing or reducing the data further. See
      [Charm++'s](http://charm.cs.illinois.edu/help) group chares for details.
   4. `Parallel::Algorithms::Nodegroup`s are similar to
      groups except that there is one element per node. For Charm++ SMP (shared
      memory parallelism) builds, a node corresponds to the usual definition of
      a node on a supercomputer. However, for non-SMP builds nodes and cores
      are equivalent. We ensure that all entry method calls done through the
      Algorithm's `simple_action` and `receive_data` functions are
      threadsafe. User controlled threading is possible by calling the
      non-entry method member function `threaded_action`.
2. `using metavariables` is set to the Metavariables struct that stores the
   global metavariables. It is often easiest to have the %Parallel
   Component struct have a template parameter `Metavariables` that is the
   global metavariables struct. Examples of this technique are given below.
3. `using phase_dependent_action_list` is set to a `tmpl::list` of
   `Parallel::PhaseActions`,
   where each `PhaseActions` represents a PDAL that will be executed on
   the parallel component during the specified phase. The %Actions are
   executed in the order that they are given in the `tmpl::list`s of
   the PDALs, but the phases need not be run in linear order. However,
   `db::DataBox` types are constructed assuming the phases are
   performed from first in the `phase_dependent_action_list` to the
   last. Simple actions (described below) can be executed in any
   phase. If there are no iterable actions in a phase then a
   `PhaseActions` need not be specified for that phase. However, at
   least one `PhaseActions`, even if it is empty, must be specified.
4. `using initialization_tags`, which is a `tmpl::list` of all the tags
   that will be inserted into the initial `db::DataBox` of each component.
   These tags are `db::SimpleTag`%s that have a `using option_tags`
   type alias and a static function `create_from_options` (see the
   example below). This list can usually be constructed from the
   initialization actions of the component (i.e. the list of actions
   in the `PhaseActions` list for the `Initialization` phase) using the
   helper function `Parallel::get_initialization_tags` (see the
   examples of components below). Each initialization action may
   specify a type alias `using initialization_tags`, which is a
   `tmpl::list` of tags that will be fetched from the `db::DataBox` by the
   action. All `initialization_tags` are removed from the `db::DataBox` of
   the component at the end of the `Initialization` phase, except for
   tags listed in a type alias `using initialization_tags_to_keep` that
   may appear in each initialization action.
5. `using const_global_cache_tags` is set to a `tmpl::list` of tags
   that are required by the `allocate_array` function of an array
   component, or simple actions called on the parallel component.
   These tags correspond to const items that are stored in the
   `Parallel::GlobalCache` (of which there is one copy per Charm++
   node). The alias can be omitted if the list is empty. (See
   `array_allocation_tags` below for specifying tags needed for the
   `allocate_array` function that will not be added to the
   `Parallel::GlobalCache`.)
6. `using mutable_global_cache_tags` is set to a `tmpl::list` of tags
   that correspond to mutable items that are stored in the
   `Parallel::GlobalCache` (of which there is one copy per Charm++
   core). The alias can be omitted if the list is empty.
\note Array parallel components must also specify the type alias `using
array_index`, which is set to the type that indexes the %Parallel Component
Array. Charm++ allows arrays to be 1 through 6 dimensional or be indexed by a
custom type. The Charm++ provided indexes are wrapped as
`Parallel::ArrayIndex1D` through `Parallel::ArrayIndex6D`. When writing custom
array indices, the [Charm++ manual](http://charm.cs.illinois.edu/help) tells you
to write your own `CkArrayIndex`, but we have written a general implementation
that provides this functionality (see `Parallel::ArrayIndex`); all that you need
to provide is a plain-old-data
([POD](http://en.cppreference.com/w/cpp/concept/PODType)) struct of the size of
at most 3 integers.

%Parallel array components have a static `allocate_array` function
that is used to construct the elements of the array. The
signature of the `allocate_array` functions must be:
\code
static void allocate_array(
    Parallel::CProxy_GlobalCache& global_cache,
    const tuples::tagged_tuple_from_typelist& initialization_items) noexcept;
\endcode
The `allocate_array` function is called by the Main parallel component
when the execution starts and will typically insert elements into
array parallel components. If the `allocate_array` function depends
upon input options, the array component must specify a `using
array_allocation_tags` type alias that is a `tmpl::list` of tags, which
are `db::SimpleTag`%s that have a `using option_tags` type alias
and a static function `create_from_options`. An example is:

\snippet DistributedLinearSolverAlgorithmTestHelpers.hpp array_allocation_tag

The `allocate_array` functions of different
array components are called in random order and so it is not safe to
have them depend on each other.
Each parallel component must also decide what to do in the different phases of
the execution. This is controlled by an `execute_next_phase` function with
signature:
\code
static void execute_next_phase(
    const typename metavariables::Phase next_phase,
    const Parallel::CProxy_GlobalCache& global_cache);
\endcode
The `determine_next_phase` function in the Metavariables determines the next
phase, after which the `execute_next_phase` function gets called. The
`execute_next_phase` function determines what the parallel component should do
during the next phase. Typically the `execute_next_phase` function should just
call `start_phase(phase)` on the parallel component. In the future
`execute_next_phase` may be removed.

An example of a singleton parallel component is:

\snippet Test_AlgorithmParallel.cpp singleton_parallel_component

An example of an array parallel component is:

\snippet Test_AlgorithmParallel.cpp array_parallel_component

Elements are inserted into the array by using the Charm++ `insert` member
function of the `CProxy` for the array. The `insert` function is documented in
the Charm++ manual. In the above Array example `array_proxy` is a `CProxy` and
so all the documentation for Charm++ array proxies applies. SpECTRE always
creates empty arrays with the constructor and requires users to insert however
many elements they want and on which cores they want them to be placed. Note
that load balancing calls may result in array elements being moved.

# Actions {#dev_guide_parallelization_actions}

For those familiar with Charm++, actions should be thought of as effectively
being entry methods.
They are functions that can be invoked on a remote object
(chare/parallel component) using a `CProxy` (see the [Charm++
manual](http://charm.cs.illinois.edu/help)), which is retrieved from the
`Parallel::GlobalCache` using the parallel component struct and the
`Parallel::get_parallel_component()` function. %Actions are structs with a
static `apply` method and come in three variants: simple actions, iterable
actions, and reduction actions. One important thing to note
is that actions cannot return any data to the caller of the remote method.
Instead, "returning" data must be done via callbacks or a callback-like
mechanism.

The simplest signature of an `apply` method is for iterable actions:

\snippet Test_AlgorithmCore.cpp apply_iterative

The return type is discussed at the end of each section describing a particular
type of action. Simple actions can have additional arguments, do not receive
the inboxes or `ActionList`, and take the `ParallelComponent` as an
explicit first template parameter. Reduction actions have the same signature as
simple actions except that the additional arguments must be of the types reduced
over.
The `db::DataBox` should be thought of as the member data of the parallel
component while the actions are the member functions. The combination of a
`db::DataBox` and actions allows building up classes with arbitrary member data
and methods using template parameters and invocation of actions. This approach
allows us to eliminate the need for users to work with Charm++'s interface
files, which can be error prone and difficult to use.

The `Parallel::GlobalCache` is passed to each action so that the
action has access to global data and is able to invoke actions on
other parallel components. The `ParallelComponent` template parameter
is the tag of the parallel component that invoked the action.
A proxy
to the calling parallel component can then be retrieved from the
`Parallel::GlobalCache`. The remote entry method invocations are
slightly different for different types of actions, so they will be
discussed below. However, one thing that is disallowed for all actions
is calling an action locally from within an action on the same
parallel component. Specifically,

\snippet Test_AlgorithmNestedApply1.cpp bad_recursive_call

Here `ckLocal()` is a Charm++ provided method that returns a pointer
to the local (currently executing) parallel component. See the
[Charm++ manual](http://charm.cs.illinois.edu/help) for more
information. However, you are able to queue a new action to be
executed later on the same parallel component by getting your own
parallel component from the `Parallel::GlobalCache`
(`Parallel::get_parallel_component(cache)`). The
difference between the two calls is that by calling an action through
the parallel component you will first finish the series of actions you
are in, then when they are complete Charm++ will call the next queued
action.

Array, group, and nodegroup parallel components can have actions invoked in two
ways. First is a broadcast where the action is called on all elements of the
array:

\snippet Test_AlgorithmParallel.cpp broadcast_to_group

The second case is invoking an action on a specific array element by using the
array element's index. The below example shows how a broadcast would be done
manually by looping over all elements in the array:

\snippet Test_AlgorithmParallel.cpp call_on_indexed_array

Note that in general you will not know what all the elements in the array are
and so a broadcast is the correct method of sending data to or invoking an
action on all elements of an array parallel component.
The `array_index` argument passed to all `apply` methods is the index into the
parallel component array. If the parallel component is not an array the value
and type of `array_index` is implementation defined and cannot be relied on.
The `ActionList` type is the `tmpl::list` of iterable actions in the current
phase. That is, it is equal to the `action_list` type alias in the current
PDAL.

## 1. Simple Actions {#dev_guide_parallelization_simple_actions}

Simple actions are designed to be called in a similar fashion to
member functions of classes. They are the direct analog of entry
methods in Charm++ except that the member data is stored in the
`db::DataBox` that is passed in as the first argument. A simple action
must return `void` but can use `db::mutate` to change values of items in
the `db::DataBox` if the `db::DataBox` is taken as a non-const
reference. In some cases you will need specific items to be in the
`db::DataBox`, otherwise the action won't compile. To restrict which
`db::DataBox`es can be passed you should use `Requires` in the
action's `apply` function template parameter list. For example,

\snippet Test_AlgorithmCore.cpp requires_action

where the conditional checks if any element in the parameter pack `DbTags` is
`CountActionsCalled`.

A simple action that does not take any arguments can be called using a `CProxy`
from the `Parallel::GlobalCache` as follows:

\snippet Test_AlgorithmCore.cpp simple_action_call

If the simple action takes arguments then the arguments must be as follows:

\snippet Test_AlgorithmNodelock.cpp simple_action_with_args

## 2. Iterable Actions {#dev_guide_parallelization_iterable_actions}

%Actions in the algorithm that are part of the current PDAL are executed one
after the other until one of them cannot be evaluated.
Iterable
actions may have an `is_ready` method that returns `true` or `false` depending
on whether or not the action is ready to be evaluated. If no `is_ready` method
is provided then the action is assumed to be ready to be evaluated. The
`is_ready` method typically checks that required data from other parallel
components has been received. For example, it may check that all data from
neighboring elements has arrived to be able to continue integrating in time.
The signature of an `is_ready` method must be:

\snippet Test_AlgorithmCore.cpp is_ready_example

The `inboxes` is a collection of the tags passed to `receive_data` and are
specified in the iterable action's member type alias `inbox_tags`, which must
be a `tmpl::list`. The inbox tags must have two member type aliases, a
`temporal_id` which is used to identify when the data was sent, and a `type`
which is the type of the data to be stored in the inboxes. The types are
typically a `std::unordered_map`. In the discussed scenario
of waiting for neighboring elements to send their data the `DATA` type would be
a `std::unordered_map`. Inbox tags must also specify
a static `void insert_into_inbox()` function. For example,

\snippet Test_AlgorithmParallel.cpp int_receive_tag

For common types of `DATA`, such as a map, a data structure with an `insert`
function, a data structure with a `push_back` function, or copy/move assignment
that is used to insert the received data, inserters are available in
`Parallel::InboxInserters`. For example, there is
`Parallel::InboxInserters::Map` for map data structures.
The inbox tag can
inherit publicly off the inserters to gain the required insertion capabilities:

\snippet Test_AlgorithmCore.cpp int receive tag insert

The `inbox_tags` type alias for the action is:

\snippet Test_AlgorithmParallel.cpp int_receive_tag_list

and the `is_ready` function is:

\snippet Test_AlgorithmParallel.cpp int_receive_tag_is_ready

Once all of the `int`s have been received, the iterable action is executed, not
before.

\warning
It is the responsibility of the iterable action to remove data from the inboxes
that will no longer be needed. The removal of unneeded data should be done in
the `apply` function.

Iterable actions can change the type of the `db::DataBox` by adding or
removing elements/tags from the `db::DataBox`. The only requirement is
that the last action in each PDAL returns a `db::DataBox` that is the
same type for each iteration. Iterable actions can also request that
the algorithm no longer be executed, and control which action in the
current PDAL will be executed next. This is all done via the return
value from the `apply` function. The `apply` function for iterable
actions must return a `std::tuple` of one, two, or three elements. The
first element of the tuple is the new `db::DataBox`, which can be the
same as the type passed in or a `db::DataBox` with different tags. Most
iterable actions will simply return:

\snippet Test_AlgorithmParallel.cpp return_forward_as_tuple

By returning the `db::DataBox` as a reference in a `std::tuple` we avoid
any unnecessary copying of the `db::DataBox`. The second argument is an
optional `bool`, and controls whether or not the algorithm is
terminated. If the `bool` is `true` then the algorithm is terminated; by
default it is `false`.
Here is an example of how to return a
`db::DataBox` with the same type that is passed in and also terminate
the algorithm:

\snippet Test_AlgorithmParallel.cpp return_with_termination

Notice that we again return a reference to the `db::DataBox`, which is
done to avoid any copying. After an algorithm has been terminated it
can be restarted by passing `false` to the `set_terminate` method or
by calling `receive_data(..., true)`. Since the order in which
messages are received is undefined in most cases the
`receive_data(..., true)` call should be used to restart the
algorithm.

The third optional element in the returned `std::tuple` is a `size_t` whose
value corresponds to the index of the action to be called next in the
PDAL. The metafunction `tmpl::index_of` can be used to
get a `tmpl::integral_constant` with the value of the index of the
element in the typelist. For example,

\snippet Test_AlgorithmCore.cpp out_of_order_action

Again a reference to the `db::DataBox` is returned, while the
termination `bool` and next action `size_t` are returned by value. The
metafunction call `tmpl::index_of::%value` returns a `size_t` whose value is
that of the action `iterate_increment_int0` in the PDAL. The indexing
of actions in the PDAL starts at 0.

Iterable actions are invoked as part of the algorithm and so the only way
to request they be invoked is by having the algorithm run on the parallel
component.
The algorithm can be explicitly evaluated in a new phase by calling
`start_phase(Phase::TheCurrentPhase)`:

\snippet Test_AlgorithmCore.cpp start_phase

Alternatively, to evaluate the algorithm without changing phases the
`perform_algorithm()` method can be used:

\snippet Test_AlgorithmParallel.cpp perform_algorithm

By passing `true` to `perform_algorithm` the algorithm will be restarted if it
was terminated.

The algorithm is also evaluated by calling the `receive_data` function, either
on an entire array or singleton (this does a broadcast), or on an individual
element of the array. Here is an example of a broadcast call:

\snippet Test_AlgorithmParallel.cpp broadcast_to_group

and of calling individual elements:

\snippet Test_AlgorithmParallel.cpp call_on_indexed_array

The `receive_data` function always takes a `ReceiveTag`, which is set in the
action's `inbox_tags` type alias as described above. The first argument is the
temporal identifier, and the second is the data to be sent.

Normally when remote functions are invoked they go through the Charm++ runtime
system, which adds some overhead. The `receive_data` function tries to elide
the call to the Charm++ RTS for calls into array components. Charm++ refers to
these types of remote calls as "inline entry methods". With the Charm++ method
of eliding the RTS, the code becomes susceptible to stack overflows because
of infinite recursion. The `receive_data` function is limited to at most 64 RTS
elided calls, though in practice reaching this limit is rare. When the limit is
reached the remote method invocation is done through the RTS instead of being
elided.

## 3. Reduction Actions {#dev_guide_parallelization_reduction_actions}

Finally, there are reduction actions which are used when reducing data over an
array. For example, you may want to know the sum of an `int` from every
element in the array. You can do this as follows:

\snippet Test_AlgorithmReduction.cpp contribute_to_reduction_example

This reduces over the parallel component
`ArrayParallelComponent`, reduces to the parallel component
`SingletonParallelComponent`, and calls the action
`ProcessReducedSumOfInts` after the reduction has been performed. The reduction
action is:

\snippet Test_AlgorithmReduction.cpp reduce_sum_int_action

As you can see, the last argument to the `apply` function is of type `int`, and
is the reduced value.

You can also broadcast the result back to an array, even yourself. For example,

\snippet Test_AlgorithmReduction.cpp contribute_to_broadcast_reduction

It is often necessary to reduce custom data types, such as `std::vector` or
`std::unordered_map`. Charm++ supports such custom reductions, and so does our
layer on top of Charm++.
Custom reductions require one additional step to calling
`contribute_to_reduction`, which is writing a reduction function to reduce the
custom data. We provide a generic type that can be used in custom reductions,
`Parallel::ReductionData`, which takes a series of `Parallel::ReductionDatum`
as template parameters and `ReductionDatum::value_type`s as the arguments to
the constructor. Each `ReductionDatum` takes up to four template parameters
(two are required). The first is the type of data to reduce, and the second is
a binary invokable that is called at each step of the reduction to combine two
messages. The last two template parameters are used after the reduction has
completed.
The third template parameter is an n-ary invokable that is called once the
reduction is complete, whose first argument is the result of the reduction.
The additional arguments can be any `ReductionDatum::value_type` in the
`ReductionData` that comes before the current one. The fourth template
parameter of `ReductionDatum` is used to specify which data should be passed:
it is a `std::index_sequence` indexing into the `ReductionData`.

The action that is invoked with the result of the reduction is:

\snippet Test_AlgorithmReduction.cpp custom_reduction_action

Note that it takes objects of the types that the reduction was done over as
additional arguments.

\warning
All elements of the array must call the same reductions in the same order. It
is *defined* behavior to do multiple reductions at once as long as the
contribute calls on all array elements occurred in the same order. It is
**undefined** behavior if the contribute calls are made in different orders on
different array elements.

# Mutable items in the GlobalCache

Most items in the GlobalCache are constant and are specified by type aliases
called `const_global_cache_tags`, as described above. However, the GlobalCache
can also store mutable items. Because of asynchronous execution, care must be
taken when mutating items in the GlobalCache, as described below.

A mutable item can be of any type, as long as that type is something that can
be checked for whether it is "up-to-date". Here "up-to-date" means that the
item can be safely used (even read-only) without needing to be mutated first.
For example, a mutable item might be a function of time that knows the range
of times for which it is valid; the mutable item is then deemed up-to-date if
it will be called for a time within its range of validity, and not up-to-date
if it will be called for a time outside its range of validity. Thus the
up-to-date status of a mutable item is determined both by the state of the
item itself and by the code that wishes to use that item.

## 1. Specification of mutable GlobalCache items

Mutable GlobalCache items are specified by a type alias
`mutable_global_cache_tags`, which is treated the same way as
`const_global_cache_tags` is for const items.

## 2. Use of mutable GlobalCache items

### 1. Checking if the item is up-to-date

Because execution is asynchronous, any code that uses a mutable item in the
GlobalCache must first check whether that item is up-to-date. The information
about whether an item is up-to-date is assumed to be stored in the item
itself. For example, a mutable object stored in the GlobalCache might have
type `std::map` (for some type `T`), and then any code that uses the stored
object can check whether an entry exists for a particular `temporal_id`. To
avoid race conditions, it is important that up-to-date checks are based on
something that is independent of the order of mutation (like a `temporal_id`,
and not like checking the size of a `std::vector`).

To check an item, use the function `Parallel::mutable_cache_item_is_ready`,
which returns a `bool` indicating whether the item is up-to-date. If the item
is up-to-date, then it can be used. `Parallel::mutable_cache_item_is_ready`
takes a lambda as an argument. This lambda is passed a single argument: a
`const` reference to the item being retrieved. The lambda should determine
whether the item is up-to-date.
If so, it should return a default-constructed (i.e. empty) `std::optional`;
if not, it should return a `std::optional` containing a callback function
that will be called on the next `Parallel::mutate` of that item. The callback
will typically check again whether the item is up-to-date and, if so, will
execute some code that gets the item via `Parallel::get`.

For the case of iterable actions, `Parallel::mutable_cache_item_is_ready` is
typically called from the `is_ready` function of the iterable action, and the
callback is `perform_algorithm()`. In the example below, the vector is
considered up-to-date if it is non-empty:

\snippet Test_AlgorithmGlobalCache.cpp check_mutable_cache_item_is_ready

Note that `Parallel::mutable_cache_item_is_ready` is called on a local core
and does no parallel communication.

### 2. Retrieving the item

The item is retrieved using `Parallel::get`, just like for constant items.
For example, to retrieve the item `Tags::VectorOfDoubles`:

\snippet Test_AlgorithmGlobalCache.cpp retrieve_mutable_cache_item

Note that `Parallel::get` is called on a local core and does no parallel
communication.

Whereas we support getting *non-mutable* items in the GlobalCache from a
DataBox via `db::get`, we intentionally do not support `db::get` of *mutable*
items in the GlobalCache from a DataBox. The reason is that mutable items
should be retrieved only after a `Parallel::mutable_cache_item_is_ready`
check, and being able to retrieve a mutable item from a DataBox makes it
difficult to enforce that check, especially when automatically-executing
compute items are considered.

## 3. Modifying a mutable GlobalCache item

To modify a mutable item, pass `Parallel::mutate` two template parameters:
the tag to mutate, and a struct with an `apply` function that does the
mutating.
`Parallel::mutate` takes two arguments: a proxy to the GlobalCache, and a
tuple that is passed into the mutator function. For the following example,

\snippet Test_AlgorithmGlobalCache.cpp mutate_global_cache_item

the mutator function is defined as below:

\snippet Test_AlgorithmGlobalCache.cpp mutate_global_cache_item_mutator

`Parallel::mutate` broadcasts to every core, where it calls the mutator
function and then calls all the callbacks that have been set on that core by
`Parallel::mutable_cache_item_is_ready`. The `Parallel::mutate` operation is
guaranteed to be thread-safe without any further action by the developer.

# Charm++ Node and Processor Level Initialization Functions {#dev_guide_parallelization_charm_node_processor_level_initialization}

Charm++ allows running functions once per core and once per node before the
construction of any parallel components. This is commonly used for setting up
error handling and enabling floating point exceptions, but other functions can
also be run. Which functions are run on each node and core is set by
specifying `std::vector`s named `charm_init_node_funcs` and
`charm_init_proc_funcs` that hold pointers to the functions to be called. For
example,

\snippet Test_AlgorithmCore.cpp charm_init_funcs_example

Finally, the user must include the `Parallel/CharmMain.tpp` file at the end of
the main executable cpp file. The end of an executable's main cpp file will
then typically look as follows:

\snippet Test_AlgorithmParallel.cpp charm_include_example
