\cond NEVER
Distributed under the MIT License.
See LICENSE.txt for details.
\endcond
# Observers Infrastructure {#observers_infrastructure_dev_guide}

\tableofcontents

The observers infrastructure works with two parallel components: a group and a
nodegroup. We have two types of observations: `Reduction` and `Volume` (see the
enum `observers::TypeOfObservation`). `Reduction` data is anything that is
written once per time/iteration identifier per simulation. Some examples of
reduction data are integrals or L2 norms over the entire domain, integrals or
L2 norms over part of the domain, and integrals over lower-dimensional surfaces
such as apparent horizons or slices through the domain. Volume data is anything
that has physical extent, such as any of the evolved variables (or derived
quantities thereof) across all or part of the domain, or quantities on
lower-dimensional surfaces in the domain (e.g. the rest mass density in the
xy-plane). Reduction and volume data both use the group and the nodegroup to
actually get the data to disk, but do so in slightly different manners.

### Reduction Data

Reduction data requires combining information from many or all cores of a
supercomputer to obtain a single value. Reductions are tagged by some temporal
value, which for hyperbolic systems is the time and for elliptic systems some
combination of linear and nonlinear iteration counts. The reduction data is
stored in an object of type `Parallel::ReductionData`, which takes as template
parameters a series of `Parallel::ReductionDatum`. A `Parallel::ReductionDatum`
takes as template parameters the type of the data and the operators that define
how data from the different cores are combined into a single value. See the
paragraphs below for more detail, and the documentation of
`Parallel::ReductionDatum` for examples.
At the start of a simulation, every component and event that wants to perform
a reduction for observation, or will be part of a reduction observation, must
register with the `observers::Observer` component. The `observers::Observer` is
a group, which means there is one per core. The registration lets the
`Observer` know once all data for a specific reduction (identified both by its
temporal value and by its name/ID) has been contributed. Reduction data is
combined on each core as it is contributed, using the binary operator from
`Parallel::ReductionDatum`'s second template parameter. Once all the data is
collected on the core, it is copied to the local `observers::ObserverWriter`
nodegroup, which keeps track of how many of the cores on the node will
contribute to a specific observation and again combines the data as it is
contributed. Once all the node's data is collected on the nodegroup, the data
is sent to node `0`, which combines the reduction data as it arrives using the
same binary operator. Using node `0` to collect the final reduction data is an
arbitrary choice, but we are always guaranteed to have a node `0`.

Once all the reductions are received on node `0`, the `ObserverWriter` invokes
the `InvokeFinal` (third) template parameter, an n-ary operator, on each
`Parallel::ReductionDatum` in order to finalize the data before writing. This
is used, for example, to divide by the total number of grid points in an L1 or
L2 norm. The reduction data is then written to an HDF5 file whose name is set
in the input file using the option `observers::Tags::ReductionFileName`.
Specifically, the data is written into an `h5::Dat` subfile since, along with
the data, the subfile name must be passed through the reductions.
The actions used for registering reductions are
`observers::Actions::RegisterEventsWithObservers` and
`observers::Actions::RegisterWithObservers`. There is a separate `Registration`
phase at the beginning of all simulations during which everything must register
with the observers. The action `observers::Actions::ContributeReductionData` is
used to send data to the `observers::Observer` component when a reduction is
done across an array or a subset of an array. If a singleton parallel component
or a specific chare needs to write data directly to disk, it should use the
`observers::ThreadedActions::WriteReductionDataRow` action called on the zeroth
element of the `observers::ObserverWriter` component.

### Volume Data

Volume data is loosely defined as anything that has some spatial extent. For
example, in a 3d simulation, data on 2d surfaces is still considered volume
data for the purposes of observing data. The spectral coefficients can also be
written as volume data, though some care must be taken in that case to
correctly identify which mode is associated with which terms in the basis
function expansion. Whatever component will contribute volume data to be
written must register with the `observers::Observer` component (there currently
isn't tested support for registering directly with the
`observers::ObserverWriter`). This registration is the same as in the reduction
data case.

Once the observers are registered, data is contributed to the
`observers::Observer` component using the
`observers::Actions::ContributeVolumeData` action. The data is packed into an
`ElementVolumeData` object that carries `TensorComponent`s on a grid.
Information on the grid, such as its extents, basis, and quadrature, is stored
alongside the `TensorComponent`s.
Once all the elements on a single core have contributed their volume data to
the `observers::Observer` group, the `observers::Observer` group moves its data
to the `observers::ObserverWriter` component to be written. We write one file
per node, appending the node ID to the HDF5 file name to distinguish between
files written by different nodes. The HDF5 file name is specified in the input
file using the `observers::Tags::VolumeFileName` option. The data is written
into a subfile of the HDF5 file using the `h5::VolumeFile` class.

If a singleton parallel component or a specific chare needs to write volume
data directly to disk, such as surface data from an apparent horizon, it
should use the `observers::ThreadedActions::WriteVolumeData` action called on
the zeroth element of the `observers::ObserverWriter` component. For surface
data (such as output from horizon finds), the data should be written to the
file specified by the `observers::Tags::SurfaceFileName` option.

### Threading and NodeLocks

Since the `observers::ObserverWriter` class is a nodegroup, its entry methods
can be invoked simultaneously on different cores of the node. However, this
can lead to race conditions if care isn't taken. The most important caveat is
that the `DataBox` cannot be mutated on one core while being simultaneously
accessed on another. To guarantee a consistent state for the data in the
`DataBox`, it must be impossible to perform a `db::get` on a `DataBox` from
inside, or concurrently with, a `db::mutate`. In practice, this means that all
entry methods on a nodegroup must put their `DataBox` accesses inside a
`node_lock.lock()` and `node_lock.unlock()` block. To achieve better parallel
performance and threading, the amount of work done while the entire node is
locked should be minimized.
To this end, we have additional locks: one for the HDF5 files, because we do
not require a threadsafe HDF5 (`observers::Tags::H5FileLock`); one for the
objects mutated when contributing reduction data
(`observers::Tags::ReductionDataLock`); and one for the objects mutated when
contributing volume data (`observers::Tags::VolumeDataLock`).

### Future changes
- It would be preferable to make the `Observer` and `ObserverWriter` parallel
  components more general and have them act as the core (node)group. Since any
  simple action can be run on them, it should be possible to use them for
  most, if not all, cases where we need a (node)group.