Streaming a.k.a. where is my data?
Hlt2 output data is split into streams. While the majority of lines are routed [1] to the TURBO stream, generic/inclusive lines go to the FULL stream and calibration lines end up in the TURCAL stream. Dividing the data into streams is required because we cannot afford to write of order 10 GB/s of data to disk. However, the available tape storage makes it possible to park about three quarters of the data. That data can then be staged to disk, periodically or on demand, and processed in Sprucing campaigns. The following picture shows this dataflow schematically:
It is important to note that the streaming settings are fully configurable. They can be optimized, as there is a tradeoff between storage space and offline processing time. For optimal storage, all data would go to the same stream, as the objects would then be stored only once. If trigger lines from different streams are positive on an event, some of the information [2] is duplicated. To speed up offline processing, on the other hand, it would be best to run over samples that are as small as possible.
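The tradeoff can be illustrated with a toy model. The sketch below is not part of Moore; the line names, the stream assignment, and the four events are hypothetical, and it assumes that an event is copied in full to every stream with a positive line, which is an upper bound on the duplication.

```python
from collections import Counter


def stored_copies(events, line_to_stream):
    """Count event copies per stream (toy model, not a Moore API).

    An event is written once to every stream that has at least one
    positive line on it.
    """
    copies = Counter()
    for fired_lines in events:
        for stream in {line_to_stream[line] for line in fired_lines}:
            copies[stream] += 1
    return copies


# Hypothetical lines and stream assignments, for illustration only.
split = {"LineA": "TURBO", "LineB": "FULL", "LineC": "TURCAL"}
merged = {line: "ALL" for line in split}  # everything in one stream

# Four toy events, each given as the set of lines that fired.
events = [{"LineA"}, {"LineA", "LineB"}, {"LineB", "LineC"}, {"LineC"}]

split_copies = stored_copies(events, split)
merged_copies = stored_copies(events, merged)

# Total copies written (storage) and largest single sample (processing).
print(sum(split_copies.values()), sum(merged_copies.values()))
print(max(split_copies.values()), max(merged_copies.values()))
```

In this toy case the split configuration stores six event copies instead of four, but no single stream holds more than two events, whereas the merged stream has to be processed as one sample of four.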
Stream settings for 2022 data taking
Data taken in 2022 is special in several respects. As far as streaming is concerned, all streams contain all detector raw banks. Also, events that have been selected by lines from multiple streams are copied to each of those streams. This is important to understand, as you might find candidates of your TURCAL line in TURBO, but not all of them. So when you want to produce ntuples, you need to know which stream your lines went to. The following table summarizes the stream settings in 2022:
| Stream | Routing bit | Lines |
|---|---|---|
| FULL | 87 | Topo, inclusive detached dileptons |
| TURBO | 88 | lines from physics WGs |
| TURCAL | 90 | PID, monitoring, commissioning and tracking efficiency lines |
| NOBIAS | 91 | prescaled (0.1) passthrough of |
| PASSTHROUGH | 92 | passthrough of Hlt1 physics lines |
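To find out where the candidates of a line may have ended up, the table can be turned into a simple lookup. This is a minimal sketch in plain Python, not a Moore API: the dictionary copies the routing bits from the table, and `streams_for_event` is a hypothetical helper that takes the routing bits set on an event.

```python
# Routing bit per stream, copied from the 2022 table above.
STREAM_ROUTING_BITS = {
    "FULL": 87,
    "TURBO": 88,
    "TURCAL": 90,
    "NOBIAS": 91,
    "PASSTHROUGH": 92,
}


def streams_for_event(fired_bits):
    """Return the streams whose routing bit is among the fired bits.

    Hypothetical helper for illustration; bits that do not belong to a
    stream (such as 94 and 95) are ignored.
    """
    fired = set(fired_bits)
    return [name for name, bit in STREAM_ROUTING_BITS.items() if bit in fired]


# An event with both the TURBO and the TURCAL bit set is copied to both.
print(streams_for_event([88, 90, 95]))
```

For the example event the helper returns `['TURBO', 'TURCAL']`, which mirrors the 2022 behaviour that an event selected in several streams appears in each of them.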
There are additional routing bits that are not used for streaming itself. Those are the “physics” bit, 95, which is set for all streams above but not for special calibration, monitoring, or error streams and the like; and the luminosity bit, 94, which has an associated line, Hlt2Lumi. The luminosity line is part of each of the above streams, to propagate the lumi information offline.
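Checking the physics and luminosity bits amounts to testing single bits in a bit field. The sketch below assumes that the routing bits are available as a sequence of 32-bit words, with bit n stored in word n // 32 at position n % 32; this layout and both helper functions are assumptions made for illustration, not the interface of any LHCb tool.

```python
PHYSICS_BIT = 95  # set for all streams listed above
LUMI_BIT = 94     # associated with the Hlt2Lumi line


def set_bits(bits, n_words=3):
    """Pack bit numbers into 32-bit words (assumed layout)."""
    words = [0] * n_words
    for bit in bits:
        word_index, offset = divmod(bit, 32)
        words[word_index] |= 1 << offset
    return words


def bit_is_set(words, bit):
    """Test a single routing bit in a sequence of 32-bit words."""
    word_index, offset = divmod(bit, 32)
    return bool((words[word_index] >> offset) & 1)


# Toy event: TURBO bit (88) plus the lumi and physics bits.
words = set_bits([88, LUMI_BIT, PHYSICS_BIT])
print(bit_is_set(words, PHYSICS_BIT))
print(bit_is_set(words, LUMI_BIT))
print(bit_is_set(words, 87))  # FULL bit is not set in this toy event
```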
The PASSTHROUGH stream is meant for developments that require re-running Moore on the raw input data; that is, the stream does not contain reconstructed objects. A test that runs the 2022 commissioning settings on PASSTHROUGH data has been created in Hlt/Moore/tests/qmtest/test_lbexec_hlt2_pp_hlt1passthrough_data.qmt
Footnotes
- [1] Splitting (a set of) lines into streams is done by reading RoutingBits from the Hlt2 output files. Those immediate Hlt2 output files are buffered in the online storage system and contain all as yet “unstreamed” outputs of Hlt2 processing. Further processing steps stream and move the data, such that they will eventually appear in the bookkeeping.
- [2] Not all objects are necessarily written to both streams. For example, if one of the lines is a Turbo line, only the reconstruction objects used to make the decision, the header, and the Sel- and DecReports are persisted.