SpECTRE
v2023.05.16

Distribution strategy for assigning elements to CPUs using a Morton ('Z-order') space-filling curve to determine placement within each block, where Elements are distributed across CPUs.
#include <ElementDistribution.hpp>
Public Member Functions  
BlockZCurveProcDistribution (const std::unordered_map< ElementId< Dim >, double > &element_costs, size_t number_of_procs_with_elements, const std::vector< Block< Dim > > &blocks, const std::vector< std::array< size_t, Dim > > &initial_refinement_levels, const std::vector< std::array< size_t, Dim > > &initial_extents, const std::unordered_set< size_t > &global_procs_to_ignore={})  
The number_of_procs_with_elements argument represents how many procs will have elements. This is not necessarily equal to the total number of procs, because some global procs may be ignored via the sixth argument, global_procs_to_ignore.  
size_t  get_proc_for_element (const ElementId< Dim > &element_id) const 
Gets the suggested processor number for a particular ElementId , determined by the Morton curve weighted element assignment described in detail in the parent class documentation.  
const std::vector< std::vector< std::pair< size_t, size_t > > > &  block_element_distribution () const 
Distribution strategy for assigning elements to CPUs using a Morton ('Z-order') space-filling curve to determine placement within each block, where Elements are distributed across CPUs.
The element distribution attempts to assign a balanced total computational cost to each processor that is allowed to have Elements. First, each Block's Elements are ordered by their Z-curve index (see more below). Elements are traversed in this order and assigned to CPUs in sequence, moving on to the next CPU once the target cost per CPU is met. The target cost per CPU is defined as the remaining cost to distribute divided by the remaining number of CPUs to distribute to. This is an important distinction from simply having one constant target cost per CPU defined as the total cost divided by the total number of CPUs with elements. Since the total cost of Elements on a processor will almost never add up to exactly the average cost per CPU, a constant target would force us either to overshoot or to undershoot the average as we iterate over the CPUs and assign Elements. If we overshoot the average on each processor, the final processor could have a much lower cost than the rest, and we run the risk of overshooting so much that one or more of the requested processors are not assigned any Elements at all. If we undershoot the average on each processor, the final processor could have a much higher cost than the others as the remainder cost piles up. This algorithm avoids these risks by instead adjusting the target cost per CPU as we finish assigning cost to previous CPUs.
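The moving-target idea described above can be sketched in a standalone function (a simplified illustration, not SpECTRE's actual implementation; the function name and plain cost-vector interface are assumptions for this sketch):

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Assign Z-curve-ordered element costs to `num_procs` CPUs. Advance to
// the next CPU once the current CPU meets its target cost, where the
// target is recomputed as (remaining cost) / (remaining CPUs) so that
// remainder cost neither piles up on the last CPU nor starves it.
std::vector<size_t> assign_procs(const std::vector<double>& ordered_costs,
                                 const size_t num_procs) {
  std::vector<size_t> proc_of_element(ordered_costs.size());
  double remaining_cost =
      std::accumulate(ordered_costs.begin(), ordered_costs.end(), 0.0);
  size_t proc = 0;
  double target = remaining_cost / static_cast<double>(num_procs);
  double cost_on_proc = 0.0;
  for (size_t i = 0; i < ordered_costs.size(); ++i) {
    proc_of_element[i] = proc;
    cost_on_proc += ordered_costs[i];
    remaining_cost -= ordered_costs[i];
    // Once this CPU's target is met, move on and recompute the target
    // from what is left to distribute.
    if (cost_on_proc >= target and proc + 1 < num_procs) {
      ++proc;
      cost_on_proc = 0.0;
      target = remaining_cost / static_cast<double>(num_procs - proc);
    }
  }
  return proc_of_element;
}
```

For example, costs {3, 1, 1, 1, 2} over 2 CPUs yields a 4/4 split ({0, 0, 1, 1, 1}) rather than the 5/3 or 3/5 split a fixed average-cost target would produce depending on whether it overshoots or undershoots.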
Morton curves are a simple and easily-computed space-filling curve that (unlike Hilbert curves) permit diagonal traversal. See, for instance, [138] for a discussion of mesh partitioning using space-filling curves. A concrete example of the use of a Morton curve in 2D is given below.
A sketch of a 2D block with 4x2 elements, with each element labeled according to its order on the Morton curve, forms a zigzag path that, under some rotation/reflection, has a 'Z' shape.
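A 2D Morton index can be computed by interleaving the bits of an element's (x, y) indices within the block. The sketch below puts x in the even (least-significant) bit positions; the library's actual bit-ordering convention may differ by a transpose or reflection, which only rotates/reflects the 'Z' pattern:

```cpp
#include <cstddef>

// Interleave the bits of x and y to form the Morton (Z-order) index.
// With this convention, a 4x2 block of elements is labeled
//   y=0:  0 1 4 5
//   y=1:  2 3 6 7
// i.e. a sequence of small 'Z' shapes traversed in 'Z' order.
size_t morton_index_2d(const size_t x, const size_t y) {
  size_t result = 0;
  for (size_t bit = 0; bit < 4 * sizeof(size_t); ++bit) {
    result |= ((x >> bit) & size_t{1}) << (2 * bit);      // even bits: x
    result |= ((y >> bit) & size_t{1}) << (2 * bit + 1);  // odd bits: y
  }
  return result;
}
```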
The Morton curve method is a quick way of getting acceptable spatial locality – usually, for approximately even distributions, it will ensure that elements are assigned in large volume chunks, and the structure of the Morton curve ensures that for a given processor and block, the elements will be assigned in no more than two orthogonally connected clusters. In principle, a Hilbert curve could potentially improve upon the gains obtained by this class by guaranteeing that all elements within each block form a single orthogonally connected cluster.
The assignment of portions of blocks to processors may use partial blocks and/or multiple blocks to ensure an even distribution of elements to processors. We currently make no distinction between dividing elements between processors within a node and dividing elements between processors across nodes. The current technique aims for a simple method of reducing communication globally, though it would likely be more efficient to prioritize minimizing internode communication, because communication across interconnects is the primary cost of communication in Charm++ runs.
Dim  the number of spatial dimensions of the Blocks 