\cond NEVER
Distributed under the MIT License.
See LICENSE.txt for details.
\endcond
# Observers Infrastructure {#observers_infrastructure_dev_guide}

The observers infrastructure works with two parallel components: a group and a
nodegroup. We have two types of observations: `Reduction` and `Volume` (see the
enum `observers::TypeOfObservation`). `Reduction` data is anything that is
written once per time or iteration identifier per simulation. Some examples of
reduction data are integrals or L2 norms over the entire domain, integrals or
L2 norms over part of the domain, and integrals over lower-dimensional surfaces
such as apparent horizons or slices through the domain. Volume data is anything
that has physical extent, such as any of the evolved variables (or derived
quantities thereof) across all or part of the domain, or quantities on
lower-dimensional surfaces in the domain (e.g. the rest mass density in the
xy-plane). Reduction and volume data both use the group and the nodegroup to
get the data to disk, but do so in slightly different ways.

### Reduction Data

Reduction data requires combining information from many or all cores of a
supercomputer to obtain a single value. Reductions are tagged by some temporal
value, which for hyperbolic systems is the time and for elliptic systems some
combination of linear and nonlinear iteration counts. The reduction data is
stored in an object of type `Parallel::ReductionData`, which takes as template
parameters a series of `Parallel::ReductionDatum`. A `Parallel::ReductionDatum`
takes as template parameters the type of the data and the operators that define
how data from the different cores are combined into a single value. See the
paragraphs below for more detail, and the documentation of
`Parallel::ReductionDatum` for examples.
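
For orientation, here is a rough sketch of what a reduction type for an L2
error norm could look like. The `funcl` operator choices and include paths are
assumptions made for illustration; check the `Parallel::ReductionDatum`
documentation for the exact template parameters and available operators.

```cpp
#include <cstddef>
#include <utility>

#include "Parallel/Reduction.hpp"    // Parallel::ReductionData/Datum (path assumed)
#include "Utilities/Functional.hpp"  // funcl::* function objects (path assumed)

// Sketch: observation time, total number of grid points, and the sum of
// squared errors from which an L2 norm is formed.
using L2ErrorReductionData = Parallel::ReductionData<
    // Time is identical on every core, so combining just checks equality.
    Parallel::ReductionDatum<double, funcl::AssertEqual<>>,
    // Number of grid points is summed over cores.
    Parallel::ReductionDatum<size_t, funcl::Plus<>>,
    // The sum of squares is summed while the reduction is in flight; once all
    // contributions have arrived on node 0, the final (n-ary) operator divides
    // by the total number of grid points (datum index 1) and takes the square
    // root.
    Parallel::ReductionDatum<double, funcl::Plus<>,
                             funcl::Sqrt<funcl::Divides<>>,
                             std::index_sequence<1>>>;
```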

At the start of a simulation, every component and event that wants to perform a
reduction for observation, or will be part of a reduction observation, must
register with the `observers::Observer` component. The `observers::Observer` is
a group, which means there is one per core. The registration is used so that
the `Observer` knows when all data for a specific reduction (identified both by
time and by name/ID) has been contributed. Reduction data is combined on each
core as it is contributed, using the binary operator from
`Parallel::ReductionDatum`'s second template parameter. Once all the data is
collected on the core, it is copied to the local `observers::ObserverWriter`
nodegroup, which keeps track of how many cores on the node will contribute to a
specific observation, and again combines the data as it is contributed. Once
all the node's data is collected on the nodegroup, the data is sent to node
`0`, which combines the reduction data as it arrives, again using the binary
operator from `Parallel::ReductionDatum`'s second template parameter. Using
node `0` to collect the final reduction data is an arbitrary choice, but we are
always guaranteed to have a node `0`.

Once all the reductions have been received on node `0`, the `ObserverWriter`
invokes the `InvokeFinal` (third) template parameter, an n-ary operator, on
each `Parallel::ReductionDatum` in order to finalize the data before writing.
This is used, for example, to divide by the total number of grid points in an
L1 or L2 norm. The reduction data is then written to an HDF5 file whose name is
set in the input file using the option `observers::Tags::ReductionFileName`.
Specifically, the data is written into an `h5::Dat` subfile; the subfile name
must be passed through the reductions along with the data.

The actions used for registering reductions are
`observers::Actions::RegisterEventsWithObservers`,
`observers::Actions::RegisterSingletonWithObserverWriter`, and
`observers::Actions::RegisterWithObservers`. There is a separate `Registration`
phase at the beginning of all simulations during which everything must register
with the observers. The action `observers::Actions::ContributeReductionData` is
used to send data to the `observers::Observer` component when a reduction is
done across an array, or a subset of an array. If a singleton parallel
component needs to write data directly to disk, it should use the
`observers::ThreadedActions::WriteReductionData` action called on the zeroth
element of the `observers::ObserverWriter` component.
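
As a rough sketch of the contribution step, an element of an array component
might invoke `observers::Actions::ContributeReductionData` along the following
lines. The argument list, the subfile name, the legend, and the helper
variables (`cache`, `observation_id`, `array_component_id`, and the
`L2ErrorReductionData` type sketched above) are illustrative assumptions; the
exact signature is documented with the action itself.

```cpp
// Illustrative only: the exact arguments and their order may differ; see the
// documentation of observers::Actions::ContributeReductionData. `cache` is
// the Parallel::GlobalCache, and observation_id, array_component_id, time,
// num_points, and sum_of_squares are assumed to be set up by the caller.
auto& local_observer =
    *Parallel::get_parallel_component<observers::Observer<Metavariables>>(
         cache)
         .ckLocalBranch();
Parallel::simple_action<observers::Actions::ContributeReductionData>(
    local_observer, observation_id, array_component_id,
    // Name of the h5::Dat subfile the data will end up in, plus its column
    // legend (both names are hypothetical).
    std::string{"/ErrorNorms"},
    std::vector<std::string>{"Time", "NumberOfPoints", "L2Error"},
    // The reduction data itself, e.g. the type sketched above.
    L2ErrorReductionData{time, num_points, sum_of_squares});
```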

### Volume Data

Volume data is loosely defined as anything that has some extent. For example,
in a 3d simulation, data on 2d surfaces is still considered volume data for
observation purposes. The spectral coefficients can also be written as volume
data, though some care must be taken in that case to correctly identify which
mode is associated with which term in the basis-function expansion. Whatever
component will contribute volume data to be written must register with the
`observers::Observer` component (there is currently no tested support for
registering directly with the `observers::ObserverWriter`). This registration
is the same as in the reduction data case.

Once the observers are registered, data is contributed to the
`observers::Observer` component using the
`observers::Actions::ContributeVolumeData` action. The data is packed into a
`std::vector<TensorComponent>`, where each `TensorComponent` holds the data of
a single tensor component or a reduction over a tensor. The `extents`,
`Spectral::Basis`, and `Spectral::Quadrature` are currently also passed to the
`ContributeVolumeData` action. Once all the elements on a single core have
contributed their volume data to the `observers::Observer` group, the
`observers::Observer` group moves its data to the `observers::ObserverWriter`
component to be written. We write one file per node, appending the node ID to
the HDF5 file name to distinguish between files written by different nodes. The
HDF5 file name is specified in the input file using the
`observers::Tags::VolumeFileName` option. The data is written into a subfile of
the HDF5 file using the `h5::VolumeFile` class.
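
To make the packing step concrete, a sketch of assembling the
`std::vector<TensorComponent>` for one element might look like the following.
The field names, the subfile name, and the exact `ContributeVolumeData`
argument list are assumptions for illustration; the actual call sites in the
volume-observation events are the authoritative reference.

```cpp
// Illustrative sketch of packaging volume data for a single element; the
// argument order of ContributeVolumeData shown here is approximate.
std::vector<TensorComponent> components{};
// One entry per tensor component; the names are hypothetical. get(psi) and
// get<i>(shift) return the underlying DataVectors.
components.emplace_back("Psi", get(psi));
components.emplace_back("Shift_x", get<0>(shift));
components.emplace_back("Shift_y", get<1>(shift));

Parallel::simple_action<observers::Actions::ContributeVolumeData>(
    local_observer, observation_id,
    // h5 subfile the volume data is grouped under (name is hypothetical).
    std::string{"/element_data"}, array_component_id, std::move(components),
    // Grid structure of this element: extents, basis, and quadrature in each
    // logical direction.
    mesh.extents(), mesh.basis(), mesh.quadrature());
```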

### Threading and NodeLocks

Since the `observers::ObserverWriter` class is a nodegroup, its entry methods
can be invoked simultaneously on different cores of the node. However, this can
lead to race conditions if care isn't taken. The most important caveat is that
the `DataBox` cannot be mutated on one core while being simultaneously accessed
on another. This is because, in order to guarantee a reasonable state for the
data in the `DataBox`, it must be impossible to perform a `db::get` on a
`DataBox` from inside, or concurrently with, a `db::mutate`. What this means in
practice is that all entry methods on a nodegroup must put their `DataBox`
accesses inside a `node_lock.lock()`/`node_lock.unlock()` block. To achieve
better parallel performance and threading, the amount of work done while the
entire node is locked should be minimized. To this end, we have additional
locks: one for the HDF5 files (`observers::Tags::H5FileLock`), since we do not
require a threadsafe HDF5 build, as well as locks for the objects mutated when
contributing reduction data (`observers::Tags::ReductionDataLock`) and for the
objects mutated when contributing volume data
(`observers::Tags::VolumeDataLock`).
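
The locking pattern inside a nodegroup entry method then looks roughly like the
sketch below. The surrounding function, the tag `SomeDataTag`, and the lambda
are hypothetical placeholders; only the bracketing of `DataBox` access by the
node lock, and of the file write by the HDF5 file lock, mirrors the pattern
described above.

```cpp
// Hypothetical skeleton of a nodegroup entry method's locking discipline.
template <typename DbTags>
void contribute_and_maybe_write(const gsl::not_null<db::DataBox<DbTags>*> box,
                                Parallel::NodeLock& node_lock,
                                Parallel::NodeLock& file_lock) {
  node_lock.lock();
  // Touch the DataBox only while the node lock is held, and keep this short.
  db::mutate<SomeDataTag>(box, [](const auto data_ptr) {
    // ... record the contributed data in *data_ptr ...
    (void)data_ptr;
  });
  node_lock.unlock();

  // Expensive work, e.g. the actual HDF5 write, happens outside the node
  // lock, guarded by the HDF5 file lock (observers::Tags::H5FileLock) since
  // we do not require a threadsafe HDF5 build.
  file_lock.lock();
  // ... open the HDF5 file and append the data ...
  file_lock.unlock();
}
```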

### Future changes
- It would be preferable to make the `Observer` and `ObserverWriter` parallel
  components more general and have them act as the core (node)group. Since
  arbitrary simple actions can be run on them, it should be possible to use
  them for most, if not all, cases where we need a (node)group.
