\cond NEVER
Distributed under the MIT License.
See LICENSE.txt for details.
\endcond

# Notes on SpECTRE load-balancing using Charm++'s built-in load balancers {#load_balancing_notes}

\tableofcontents

The goal of load-balancing (LB) is to ensure that HPC resources are well-used
while performing large inhomogeneous simulations. In 2020-2021, Jordan Moxon
(JM) and Francois Hebert (FH) performed a number of tests using Charm++'s
built-in load balancers with SpECTRE.

These notes highlight the key points and give (at the bottom) some general
recommendations.

### Overview of how LBs work with SpECTRE

In late 2020, FH tested Charm's LBs on simple, homogeneous SpECTRE test cases.
These tests revealed the following broad behavior patterns:
- Without any LB (no LB command-line args, or `+balancer NullLB`),
  SpECTRE's performance depends sensitively on how the DG elements are
  distributed over the HPC system. This indicates that communication costs are
  very important in SpECTRE runs. This remains true for more expensive
  evolution systems like Generalized Harmonic.
- Charm's LBs that are not communication-aware (e.g., `GreedyLB`, `RefineLB`,
  ...) all result in low parallel efficiency (20-30%). This is consistent with
  the understanding that communication costs are large. These LBs degrade a
  good initial distribution of elements over processors, leading to a more
  complicated communication graph and a loss of performance.
- Some of Charm's communication-aware LBs perform well: they approach the
  efficiency of a "manually tuned" initial distribution of elements onto
  processors. This suggests these LBs do a good job of partitioning the
  communication graph. In FH's simple tests, the best results were from
  `RecBipartLB`, which came within 10-20% of a manual initial distribution.
  However, it is a slow algorithm and is best used infrequently, or only a few
  times near the start of the simulation.

Note that at the 2020 Charm++ Workshop, the Charm team recommended that we use
`MetisLB`, or that we combine `MetisLB` with `RefineLB` (syntax:
`+balancer MetisLB +balancer RefineLB`, which applies the first balancer on the
first invocation and the second balancer on all subsequent invocations, to
'polish' the results of the first LB). In practice, this failed for two reasons:
- `MetisLB` tends to fail with floating-point exceptions (FPEs).
- When falling back to the pairing of `RecBipartLB` followed by `RefineLB`, the
  run starts with good performance. However, within a few applications of
  `RefineLB`, the performance is heavily degraded (down to 20-30% efficiency).
  It appears that we should stick to the comm-aware LB strategies.
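
The chained `+balancer` syntax described above can be modeled with a short
Python sketch. The helper names `balancer_args` and `strategy_for_invocation`
are hypothetical and exist only for this illustration. The rule that the last
listed strategy is reused for every later invocation follows the two-balancer
description above; extending it to longer chains is an assumption.

```python
def balancer_args(strategies):
    """Build command-line args with one `+balancer NAME` pair per strategy."""
    args = []
    for name in strategies:
        args += ["+balancer", name]
    return args


def strategy_for_invocation(strategies, invocation):
    """Return the strategy applied on the given 0-based LB invocation.

    Models the behavior described in these notes: strategies are used in
    the order listed, and the last one is reused from then on (assumed to
    generalize beyond two strategies). Returns None if none were requested.
    """
    if not strategies:
        return None
    return strategies[min(invocation, len(strategies) - 1)]


strategies = ["RecBipartLB", "RefineLB"]
# Arguments to append to the executable's command line.
print(" ".join(balancer_args(strategies)))
# Strategy applied at each of the first four LB invocations.
print([strategy_for_invocation(strategies, i) for i in range(4)])
```

With this pairing, only the first invocation uses the communication-aware
strategy, which matches the degradation seen after a few `RefineLB` steps.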

### Contaminated LB measurements on first invocation

There is reason to suspect that the Charm load balancing may incorrectly balance
the load when applied near the start of some simulations. This is because the
'one-time' setup of the system may involve nontrivial computation and (e.g.,
for numeric initial data) communication patterns between components that differ
significantly from the patterns during evolution. The first balancer
invocation, based partially on the measurements taken during the
non-generalizable initialization phase, can then give rise to a poorly chosen
balance and hurt performance.
This is the suspected cause of the poor performance noticed in cases of
homogeneous load and numeric initial data in Generalized Harmonic
tests performed by Geoffrey Lovelace. JM confirmed the problem.
The balance does not appear to be similarly degraded in cases that do not
involve numeric initial data: some basic 2-node re-tests with Generalized
Harmonic by JM seem to produce a useful balance (improved performance) when not
using numeric initial data.

This issue has not been investigated to a completely satisfactory conclusion,
though the above explanation seems most plausible.

In cases for which it appears that the LB data is problematically impacted by
the setup of the evolution system, we can try two main strategies to mitigate
the problem:
- Apply the load balancer at least two times near the start of the simulation,
  with sufficient gaps to collect useful balancing information. The LB database
  in Charm is cleared every time a balance is applied, so the later balances
  during the evolution should be uncontaminated. This strategy has not yet
  been carefully tested. To do this, use an input file similar to
```
PhaseChangeAndTriggers:
  - Trigger:
      Slabs:
        Specified:
          Values: [5, 10, 15]
    PhaseChanges:
      - VisitAndReturn(LoadBalancing)
```
- Use `LBTurnInstrumentOff` and `LBTurnInstrumentOn` to specifically exclude
  setup procedures from the LB instrumentation. First attempts indicate that
  this might be challenging to do correctly, and may require
  correspondence with the Charm developers to clarify at what points in the
  code execution those commands may be used, and precisely how they affect
  the load-balancing database. A first attempt by JM was to turn instrumentation
  off during array element construction, then turn instrumentation on for each
  element at the start of the `Evolve` phase. That attempt led to a
  hang of the system, so the utility must have more constraints than were
  initially apparent.
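
The reasoning behind the first strategy (repeated balancing near the start)
can be made concrete with a small Python sketch. It assumes, as stated above,
that the LB database is cleared at every balance, so each balance sees only the
work measured since the previous one. The function name `measurement_windows`
and the use of slab 0 as a stand-in for the start of the run (including setup)
are choices made for this illustration only.

```python
def measurement_windows(balance_slabs):
    """Return (start, end, includes_setup) for each LB measurement window.

    Assumes the LB database is cleared at every balance, so only the first
    window contains the one-time setup work. Slab 0 stands in for the start
    of the run.
    """
    windows = []
    start = 0
    for index, end in enumerate(sorted(balance_slabs)):
        # Only the first window can contain measurements from setup.
        windows.append((start, end, index == 0))
        start = end
    return windows


for start, end, includes_setup in measurement_windows([5, 10, 15]):
    label = "includes setup" if includes_setup else "evolution only"
    print(f"slabs {start}-{end}: {label}")
```

In this model, the balances at slabs 10 and 15 are based purely on evolution
measurements, which is why a second or third early balance may repair a poor
first one.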

### Scotch load balancer

JM tested `ScotchLB` and found better performance than with `RecBipartLB`. The
margin varied considerably with the number of nodes used, but at multiple
points tested, the runtime was less than 65% of the `RecBipartLB` runtime.
The tests were performed with homogeneous load, but starting from the
round-robin element distribution. The indication is therefore that `ScotchLB`
is very effective at minimizing communication costs.

However, in practical applications, JM found that `ScotchLB` often generates
FPEs during the graph partition step and causes the simulation to crash.
[Charm++ issue #3401](https://github.com/UIUC-PPL/charm/issues/3401)
tracks the progress in determining the cause of the problem and fixing it in
Charm. The source of the problem has largely been identified, but the fix is
still pending.

`ScotchLB` will likely replace `RecBipartLB` as the most frequently recommended
centralized, communication-aware balancer for SpECTRE once the FPE bugs have
been fixed.

### General recommendations

#### Homogeneous loads

For homogeneous loads, it is likely best to omit load-balancing and rely on
the default z-curve distribution, which gives a good initial distribution, for
the entire evolution. This means calling the SpECTRE executable with
no LB-related command-line args, or with `+balancer NullLB`.

You may find modest gains from using a communication-aware load balancer, but
likely only from the 'extra' parallel components of the system that cause the
load to be not completely homogeneous (e.g., components like the interpolator
or horizon finder).
If you need a very long evolution or intend to submit a large number of
evolutions, it may be worth experimenting to see whether 1-3 applications of
`RecBipartLB` (or `ScotchLB` once its bugs are fixed, see above) improve
performance for the system, for instance by using the input file:
```
PhaseChangeAndTriggers:
  - Trigger:
      Slabs:
        Specified:
          Values: [5, 10, 15]
    PhaseChanges:
      - VisitAndReturn(LoadBalancing)
```
and command-line args `+balancer RecBipartLB` (or `ScotchLB` when its
bugs are fixed). This may be particularly relevant for cases with numeric
initial data or other complicated setup procedures.

#### Inhomogeneous loads

Based on our experiments, we anticipate that using a load balancer may
significantly improve runtimes with inhomogeneous loads. Our testing of this
case is far more sparse, but for SpECTRE executables it probably remains
true that managing communication costs will be an important goal for the
balancer. It is likely worth attempting the evolution with a
periodically applied, centralized, communication-aware balancer, e.g.:
```
PhaseChangeAndTriggers:
  - Trigger:
      Slabs:
        EvenlySpaced:
          Interval: 1000
          Offset: 5
    PhaseChanges:
      - VisitAndReturn(LoadBalancing)
```
paired with command-line args `+balancer RecBipartLB` (or `ScotchLB` when its
bugs are fixed).
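
As a quick check of when the balancer would run with this input, the following
Python sketch lists the slab numbers selected by an `EvenlySpaced` sequence. It
assumes the trigger fires at slab numbers `Offset`, `Offset + Interval`,
`Offset + 2 * Interval`, and so on, with a non-negative `Offset`; the function
name and the `last_slab` cutoff are hypothetical.

```python
def evenly_spaced_slabs(interval, offset, last_slab):
    """List slab numbers up to last_slab at which the balancer is invoked.

    Assumes the trigger fires at offset, offset + interval, and so on.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(offset, last_slab + 1, interval))


# Values from the input file above, for a run of 3500 slabs.
print(evenly_spaced_slabs(1000, 5, 3500))  # [5, 1005, 2005, 3005]
```

Under this assumption, the `Offset` of 5 places the first balance shortly after
the start of the run, and later balances follow every 1000 slabs.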

Important considerations when choosing the interval with which to balance are:
- You will want to ensure that the balancer is applied frequently enough to
  prioritize expensive parts of the simulation before any relevant features
  'move' to other elements. For example, if a shock is moving across the
  simulation domain, causing certain elements to be more expensive to compute,
  you want to balance often enough that the LB 'keeps up' with the movement of
  the shock.
- You will want to avoid balancing so frequently that the synchronization
  and balancer calculation itself become a significant portion of the runtime.
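
The trade-off between these two considerations can be illustrated with a toy
cost model in Python. Everything in it is an assumption made for illustration,
not a measurement: each balance is taken to cost `balance_cost` (in units of one
slab's runtime), and the load is taken to drift so that a slab computed `k`
slabs after the last balance costs an extra `drift_rate * k`.

```python
def overhead_per_slab(interval, balance_cost, drift_rate):
    """Average extra cost per slab when balancing every `interval` slabs.

    Toy model (assumed, not measured): costs are in units of one slab's
    runtime, and imbalance grows linearly with slabs since the last balance.
    """
    # Slabs since the last balance run over k = 0 .. interval - 1,
    # so the average k is (interval - 1) / 2.
    drift = drift_rate * (interval - 1) / 2
    return balance_cost / interval + drift


def best_interval(candidates, balance_cost, drift_rate):
    """Pick the candidate interval with the smallest average overhead."""
    return min(
        candidates,
        key=lambda n: overhead_per_slab(n, balance_cost, drift_rate),
    )


# Hypothetical numbers: a balance costs as much as 50 slabs, and each slab
# gets 0.01% slower per slab elapsed since the last balance.
print(best_interval([10, 100, 1000, 10000], balance_cost=50.0, drift_rate=1e-4))
```

In this model the overhead is dominated by balancing cost at short intervals
and by accumulated imbalance at long ones, so the useful interval lies in
between. Real values for both inputs would have to come from timing runs.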

We have not yet collected much detailed data on using the load balancers for
inhomogeneous loads, so more detailed tests of their efficacy would be
valuable.
