Streaming a.k.a. where is my data?

Hlt2 output data is split into streams. While the majority of lines are routed [1] to the TURBO stream, generic/inclusive lines go to the FULL stream and calibration lines end up in the TURCAL stream. Dividing the data into streams is required because we cannot afford to write of order 10 GB/s of data to disk. However, the available tape storage makes it possible to park about three quarters of the data. That data can then be staged to disk, periodically or on demand, and processed in Sprucing campaigns. The following picture shows this dataflow schematically:

../_images/DPA_dataflow.png: Offline dataflow. About 10 GB/s of data arrive from Hlt2 processing and are split into the 3 gray streams. Sprucing will skim and prune the data on tape and pass through Turbo data that is written directly to disk for offline analysis. At this stage, the data can be split into further streams, drawn in purple. Numbers are indicative. Taken from LHCb-FIGURE-2020-016.

It is important to note that the streaming settings are fully configurable. They can be optimized, as there is a tradeoff between storage space and offline processing time. For optimal storage, all data would go to the same stream, as the objects would then be stored only once. If trigger lines from different streams are positive on an event, some of the information [2] is duplicated. To speed up offline processing, it would be best to run over samples that are as small as possible.

Stream settings for 2022 data taking

Data taken in 2022 is special in several respects. As far as the streaming is concerned, all streams contain all detector raw banks. Also, events that have been selected by lines from multiple streams are copied to each of those streams. This is important to understand, as you might find candidates of your TURCAL line in TURBO, but not all of them. So when you want to produce ntuples, you need to know which stream your lines went to. The following table summarizes the stream settings in 2022:

Streaming settings in 2022
Stream	Routing bit	Lines
FULL	87	Topo, inclusive detached dileptons
TURBO	88	lines from physics WGs
TURCAL	90	PID, monitoring, commissioning and tracking efficiency lines
NOBIAS	91	prescaled (0.1) passthrough of `Hlt1ODINNoBiasDecision`
PASSTHROUGH	92	passthrough of Hlt1 physics lines `^Hlt1(?!ODINNoBias|ODINLumi).*Decision`
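
As noted above, candidates of a TURCAL line can show up in TURBO, but only some of them. The following minimal Python sketch illustrates why. The line names and the line-to-stream assignment are hypothetical and chosen only for illustration; the real assignment is defined in the Hlt2 configuration.

```python
# Hypothetical line-to-stream assignment (illustration only).
LINE_TO_STREAM = {
    "Hlt2Example_TurboLine": "TURBO",
    "Hlt2Example_PIDLine": "TURCAL",
    "Hlt2Example_TopoLine": "FULL",
}


def streams_for_event(positive_lines):
    """Return the set of streams an event is copied to, given its positive lines."""
    return {LINE_TO_STREAM[line] for line in positive_lines}


# Three toy events, each described by the set of lines that fired.
events = [
    {"Hlt2Example_PIDLine"},                           # goes to TURCAL only
    {"Hlt2Example_PIDLine", "Hlt2Example_TurboLine"},  # copied to TURCAL and TURBO
    {"Hlt2Example_TurboLine"},                         # goes to TURBO only
]

# Count in which streams the events selected by the PID line end up.
pid_events_in = {"TURCAL": 0, "TURBO": 0}
for lines in events:
    if "Hlt2Example_PIDLine" in lines:
        for stream in streams_for_event(lines):
            if stream in pid_events_in:
                pid_events_in[stream] += 1

print(pid_events_in)
```

In this toy setup both events selected by the PID line are in TURCAL, while only the one that also fired a TURBO line is in TURBO. Reading the PID line from TURBO would therefore give an incomplete sample.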

There are additional routing bits that are not used for streaming itself. These are the “physics” bit, 95, which is set for all streams above but not for special calibration, monitoring, or error streams and the like; and the luminosity bit, 94, which has an associated line, Hlt2Lumi. The luminosity line is part of each of the above streams, to propagate the lumi information offline.
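
The routing-bit assignments can be collected in a small lookup, sketched here in Python. The bit numbers are taken from the table and the paragraph above. The function name `describe` and the representation of the bits as a set of integers are assumptions made for illustration; how the bits are actually read from the Hlt2 output is not covered here.

```python
# Routing bits used for streaming, as listed in the 2022 table.
STREAM_BITS = {
    "FULL": 87,
    "TURBO": 88,
    "TURCAL": 90,
    "NOBIAS": 91,
    "PASSTHROUGH": 92,
}
LUMI_BIT = 94     # luminosity bit
PHYSICS_BIT = 95  # "physics" bit


def describe(set_bits):
    """Summarize an event from the routing-bit numbers that are set.

    Assumption: `set_bits` is an iterable of integers, one per set bit.
    """
    bits = set(set_bits)
    return {
        "streams": sorted(s for s, b in STREAM_BITS.items() if b in bits),
        "physics": PHYSICS_BIT in bits,
        "lumi": LUMI_BIT in bits,
    }


# Example: an event with the TURBO, TURCAL and physics bits set.
info = describe({88, 90, 95})
print(info)
```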

The PASSTHROUGH stream is meant for developments that require re-running Moore on the raw input data; i.e. the stream does not contain reconstructed objects. A test that runs the 2022 commissioning settings on PASSTHROUGH data has been created in Hlt/Moore/tests/qmtest/test_lbexec_hlt2_pp_hlt1passthrough_data.qmt.
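
To see which Hlt1 decisions the PASSTHROUGH selection pattern from the table accepts, it can be tried out with Python's `re` module. This is only a sketch: the pattern is written with an unescaped `|`, assuming that alternation between the two ODIN lines is intended, and `Hlt1ExampleLineDecision` is a hypothetical name used for illustration.

```python
import re

# PASSTHROUGH pattern from the 2022 table; the negative lookahead rejects
# names that continue with ODINNoBias or ODINLumi right after "Hlt1".
PASSTHROUGH_PATTERN = re.compile(r"^Hlt1(?!ODINNoBias|ODINLumi).*Decision")


def is_passthrough(decision_name):
    """Return True if the Hlt1 decision name matches the PASSTHROUGH pattern."""
    return PASSTHROUGH_PATTERN.match(decision_name) is not None


names = [
    "Hlt1ODINNoBiasDecision",   # excluded by the lookahead
    "Hlt1ODINLumiDecision",     # excluded by the lookahead
    "Hlt1ExampleLineDecision",  # hypothetical physics line, accepted
]
selected = [name for name in names if is_passthrough(name)]
print(selected)
```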

Footnotes

1: Splitting (a set of) lines into streams is done by reading RoutingBits from the Hlt2 output files. Those immediate Hlt2 output files are buffered in the online storage system and contain all the as yet “unstreamed” outputs of Hlt2 processing. Further processing steps stream and move the data, such that they will eventually appear in the bookkeeping.
2: Not all objects are necessarily written to both streams. For example, if one line is a Turbo line, only the reconstruction objects used to make the decision, the header, and the Sel- and DecReports are persisted.