SpECTRE  v2024.04.12
domain::BlockZCurveProcDistribution< Dim > Struct Template Reference

Distribution strategy for assigning elements to CPUs using a Morton ('Z-order') space-filling curve to determine placement within each block, where Elements are distributed across CPUs. More...

#include <ElementDistribution.hpp>

Public Member Functions

 BlockZCurveProcDistribution (const std::unordered_map< ElementId< Dim >, double > &element_costs, size_t number_of_procs_with_elements, const std::vector< Block< Dim > > &blocks, const std::vector< std::array< size_t, Dim > > &initial_refinement_levels, const std::vector< std::array< size_t, Dim > > &initial_extents, const std::unordered_set< size_t > &global_procs_to_ignore={})
 The number_of_procs_with_elements argument represents how many procs will have elements. This is not necessarily equal to the total number of procs because some global procs may be ignored by the sixth argument global_procs_to_ignore.
 
size_t get_proc_for_element (const ElementId< Dim > &element_id) const
 Gets the suggested processor number for a particular ElementId, determined by the Morton curve weighted element assignment described in detail in the parent class documentation.
 
const std::vector< std::vector< std::pair< size_t, size_t > > > & block_element_distribution () const
 

Detailed Description

template<size_t Dim>
struct domain::BlockZCurveProcDistribution< Dim >

Distribution strategy for assigning elements to CPUs using a Morton ('Z-order') space-filling curve to determine placement within each block, where Elements are distributed across CPUs.

Details

The element distribution attempts to assign a balanced total computational cost to each processor that is allowed to have Elements. First, each Block's Elements are ordered by their Z-curve index (see more below). Elements are traversed in this order and assigned to CPUs in order, moving onto the next CPU once the target cost per CPU is met. The target cost per CPU is defined as the remaining cost to distribute divided by the remaining number of CPUs to distribute to. This is an important distinction from simply having one constant target cost per CPU defined as the total cost divided by the total number of CPUs with elements. Since the total cost of Elements on a processor will nearly never add up to be exactly the average cost per CPU, this means that we would either have to decide to overshoot or undershoot the average as we iterate over the CPUs and assign Elements. If we overshoot the average on each processor, the final processor could have a much lower cost than the rest of the processors and we run the risk of overshooting so much that one or more of the requested processors don't get assigned any Elements at all. If we undershoot the average on each processor, the final processor could have a much higher cost than the others due to remainder cost piling up. This algorithm avoids these risks by instead adjusting the target cost per CPU as we finish assigning cost to previous CPUs.

Morton curves are a simple and easily-computed space-filling curve that (unlike Hilbert curves) permit diagonal traversal. See, for instance, [154] for a discussion of mesh partitioning using space-filling curves. A concrete example of the use of a Morton curve in 2d is given below.

A sketch of a 2D block with 4x2 elements, with each element labeled according to the order on the Morton curve:

x-->
0 1 2 3
----------------
y 0 | 0 2 4 6
| | | / | / | / |
v 1 | 1 3 5 7

(forming a zig-zag path, that under some rotation/reflection has a 'Z' shape).

The Morton curve method is a quick way of getting acceptable spatial locality – usually, for approximately even distributions, it will ensure that elements are assigned in large volume chunks, and the structure of the Morton curve ensures that for a given processor and block, the elements will be assigned in no more than two orthogonally connected clusters. In principle, a Hilbert curve could potentially improve upon the gains obtained by this class by guaranteeing that all elements within each block form a single orthogonally connected cluster.

The assignment of portions of blocks to processors may use partial blocks, and/or multiple blocks to ensure an even distribution of elements to processors. We currently make no distinction between dividing elements between processors within a node and dividing elements between processors across nodes. The current technique aims to have a simple method of reducing communication globally, though it would likely be more efficient to prioritize minimization of inter-node communication, because communication across interconnects is the primary cost of communication in charm++ runs.

Warning
The use of the Morton curve to generate a well-clustered element distribution currently assumes that the refinement is uniform over each block, with no internal structure that would be generated by, for instance AMR. This distribution method will need alteration to perform well for blocks with internal structure from h-refinement. Morton curves can be defined recursively, so a generalization of the present method is possible for blocks with internal refinement
Template Parameters
Dimthe number of spatial dimensions of the Blocks

The documentation for this struct was generated from the following file: