Line authoring guidelines

This page motivates and explains the various guidelines you should follow when writing HLT2 lines. If you just want a bird's-eye view of the key points, skip to the Summary, otherwise read on!

When writing an HLT2 line to select your physics of interest, there are several questions worth considering before you start writing any code.

  • Are there any similar lines, filters, or combiners already configured in Moore that I can re-use?

  • Do I want an exclusive selection, capturing a specific physics process and final state, or an inclusive selection, capturing a set of many related processes?

  • What backgrounds should I try to suppress?

There are more such questions, all about matching the line to the analysis you intend to perform. These questions should be discussed amongst your research group and in physics working groups. Once answered, you can follow the Writing an HLT2 line tutorial to get a feeling for how to write the code.

With an HLT2 line written and running, there are further questions to consider, based on the fact that HLT2 must run within certain operational limits, and must be maintained by you and others into the future.

  • What is an acceptable accept rate for my line?

  • What is an acceptable output bandwidth for my line?

  • How can I measure these things myself?

This page gives some guidelines to help you answer these questions. In general, there are no hard-and-fast rules, and we must consider things on a case-by-case basis. If things still aren’t clear after reading this page, or you think your usage is not addressed, you can ask a question.

Efficiency

Your primary concern as an analyst should be maximising signal efficiency. A larger efficiency means more signal for a given production rate and fewer opportunities for the selection to bias your signal.

To evaluate your signal efficiency you need a simulated sample of events containing your signal process. Many simulated samples already exist, so a suitable one may be available; if not, you should request a sample.

Write a draft selection once you have a sample in hand, following the Writing an HLT2 line tutorial if this is your first line. Check that the output looks sensible to you using the Analysing HLT2 output tutorial, and then use the output ntuples to check the efficiencies you care about.

If the efficiencies look good, it’s time to move on to measuring the rate.

Rate

There is a natural push and pull between the constraints of operating the trigger and the desire of the analyst to keep as much data as possible. For the former, there is a fairly hard limit on how many events per second can be managed by the online and offline systems, and so we must ensure we do not exceed this. For the latter, it can be convenient to retain as much data-mining and data-exploration capability as possible, and to be able to study the effect of selections by imposing them only offline, where they can be relaxed and tightened at whim to study potential biases. All of this leads to a tendency to want to increase the event rate.

So, for the analyst, the most efficient and most minimally-biasing signal selection is no selection at all. However, not only is this prohibited operationally online, it would also leave every analyst having to process vast amounts of data offline, which is similarly impossible. There are also practical concerns for the analyst, as manipulating a compact and clean signal sample is much more manageable than wrangling terabytes of ntuples for each decay mode of interest. There are typically many ‘obvious’ selections which remove some backgrounds cleanly at the expense of little signal, and so it’s in everyone’s interests to apply these in the trigger.

Clearly, there is a balance to be struck. How do we achieve this?

Your line shares resources with every other. Having a hard cut-off rate for every line will result in some selecting far more data than is reasonable, and others being statistically limited. We need another metric.

The guideline for tuning your HLT2 line is to judge the rate against the true signal rate. That is, if the physics process of interest occurs around once per second, it would be unreasonable for your line to select one thousand events per second as almost all events will be background, and you will have to remove most of that offline anyway.

As the trigger is probably not 100% efficient on your signal, using the true signal rate as a guideline already includes some tolerance for your line to accept non-signal processes (background).

So, compute the true signal rate and use that as a benchmark against which to judge whether your line is being far too greedy or whether you can afford to relax your selection (or, indeed, whether you have a bug that’s rejecting all your signal!).

True signal rate

In the usual proton-proton running scheme of the LHC, the instantaneous luminosity of the colliding beams at the LHCb interaction point is around \(2 \times 10^{33} \mathrm{cm}^{-2}\mathrm{s}^{-1}\). Knowing this along with the production cross-section of the signal particle \(\sigma\) and the branching fraction to the final-state of interest \(\mathcal{B}\) gives you an estimate of the signal rate

\[\mathrm{Signal\ rate} = \mathcal{L}_{\mathrm{Inst.}} \times \sigma(pp \to H) \times \mathcal{B}(H \to f).\]

If the production cross-section or branching fractions are not known, this usually means they are quite small, and so your signal rate will be correspondingly small. An existing measurement of a similar, more common process can be used to compute a reasonable upper bound or estimate for your signal.

Use the LHCb publications page to see what measurements exist for production cross-sections, and consult the Particle Data Group for branching fractions. If in doubt, discuss with your research group.

Note

As an example, here we compute an estimate for the signal rate of the process \(B^{+} \to J/\psi K^{+}\), with \(J/\psi \to \mu^{+} \mu^{-}\). We know these factors, uncertainties omitted:

  • The instantaneous luminosity in Run 3 proton-proton running is \(2 \times 10^{33} \mathrm{cm}^{-2}\mathrm{s}^{-1}\).

  • The \(B^{+}\) production cross-section is 87 microbarn, or \(87 \times 10^{-30} \mathrm{cm}^{2}\).

  • The \(B^{+} \to J/\psi K^{+}\) branching fraction is around \(10^{-3}\).

  • The \(J/\psi \to \mu^{+} \mu^{-}\) branching fraction is around \(6 \times 10^{-2}\).

Multiplying these factors together gives us an estimated signal rate of 10 Hz, which is quite high as beauty rates go.

Remember that this rate assumes 100% reconstruction and selection efficiency. In reality the efficiency can be considerably lower, by an order of magnitude or more. So if our HLT2 line's selection rate matches the true signal rate, the shortfall in efficiency tells us that some of what the line selects must be background.
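A quick cross-check of this arithmetic, in plain Python, using only the factors quoted above:

```python
# Back-of-the-envelope estimate of the B+ -> J/psi K+ signal rate,
# using the factors quoted above (uncertainties omitted).
lumi = 2e33     # instantaneous luminosity / cm^-2 s^-1
xsec = 87e-30   # B+ production cross-section (87 microbarn) / cm^2
bf_bu = 1e-3    # B(B+ -> J/psi K+)
bf_jpsi = 6e-2  # B(J/psi -> mu+ mu-)

signal_rate = lumi * xsec * bf_bu * bf_jpsi
print(f"Estimated signal rate: {signal_rate:.1f} Hz")  # ~10.4 Hz
```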

Computing rate

One typically uses signal Monte Carlo samples to estimate signal efficiencies and to study the effect of the HLT2 reconstruction and selection on quantities of interest. Signal MC contains one signal process per event, and so is not representative of real data.

Minimum bias MC, with event type 30000000, is more representative of real data, and is used to estimate the rate of HLT lines. By assuming that each event in such a sample corresponds to one event out of the 30 million events per second in real data, you can scale the accepted fraction of minimum bias MC events by 30 MHz to get an estimate of the rate of your line

\[\mathrm{HLT2\ line\ rate} = 30\,\mathrm{MHz} \times \frac{N_{\mathrm{Accepted}}}{N_{\mathrm{Processed}}}.\]

If you process one million minimum bias MC events and your HLT2 line selects 100 of them, your line has an estimated rate of 3 kHz, which is very high for a typical exclusive line. If no events are selected, one can estimate an upper limit on the rate based on the number of processed events, as sketched below.
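A minimal sketch of both computations in plain Python; the factor 2.3 in the zero-selected case is the 90% confidence-level Poisson upper limit on the number of accepted events:

```python
# Rate estimate from a minimum bias MC sample, and a simple upper limit
# for the case where the line accepts no events at all.
INPUT_RATE = 30e6  # Hz; the nominal 30 MHz of events in real data

def hlt2_line_rate(n_accepted, n_processed):
    """Estimated line rate from the accepted fraction of minimum bias MC."""
    return INPUT_RATE * n_accepted / n_processed

def rate_upper_limit(n_processed, n_max=2.3):
    """90% CL upper limit on the rate when zero events are accepted."""
    return INPUT_RATE * n_max / n_processed

print(hlt2_line_rate(100, 1_000_000))  # 3000.0 Hz, i.e. the 3 kHz above
print(rate_upper_limit(1_000_000))     # ~69 Hz at 90% CL
```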

In real running conditions, HLT2 only sees events processed by HLT1, which is not 100% efficient for many decay modes. There are HLT1-filtered MC samples available which can partly account for the HLT1 efficiency. As the Real-time Analysis group finalises the HLT1 menu, you will be able to make more precise estimates of the HLT1 efficiency.

Follow the Studying HLT efficiencies, rates and overlaps tutorial to learn more about measuring signal efficiencies and minimum bias rates. In the future we hope to automate the execution of such tools in the centrally-run nightly tools and monitor the trend of the rates of all lines over time.

Other considerations

The HLT2 output rate in Run 3 is expected to be in the tens of kilohertz, up to around 100 kHz, and we expect to run one to two thousand HLT2 lines. This can give some context to your signal rate. If it is around one hertz or below, this is very small in the context of the total rate, and the rate of your line can reasonably be rounded up to several hertz, if this is useful for your analysis.

Use your best judgement when comparing the rate of your line to the signal rate. If your analysis strategy necessitates selecting a very wide sideband region in some invariant mass spectrum, a reasonable rate may be considerably higher than the signal rate as the selection captures a lot more background. Similarly, if your line selects several different processes, each of these may bring in their own backgrounds, increasing the rate. Conversely, if you know there are cuts you will apply before analysing the ntuples, apply these upfront in the trigger.

If a process is considered particularly critical, for example if it is used by many analyses or is a control or calibration channel, it may be justifiable for the corresponding HLT2 line to take considerably more rate than the signal rate to be able to meet the various needs.

Ultimately, the distribution of the total HLT2 output rate across the lines is decided amongst the physics working groups, taking the sorts of considerations above into account. Still, the task of distributing rates is much easier if each line already has a reasonable rate, so you should try to do this sooner rather than later.

Note

We have discussed evaluating the trigger performance using simulated data, which is an approximation of what we think the real data will look like. We will revisit the balance of rates once we have a better understanding of operating the trigger in real conditions.

Bandwidth

The total HLT2 output bandwidth is the product of the total output rate and the average event size

\[\mathrm{Bandwidth}\ [\mathrm{GB/s}] \propto \mathrm{Output\ rate}\ [\mathrm{kHz}] \times \mathrm{Avg.\ event\ size}\ [\mathrm{kB}].\]
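To make the unit conversion explicit, here is a small illustrative sketch; the example rate and event size are placeholders, not recommendations:

```python
# kHz x kB = MB/s, so a factor of 1000 separates this from GB/s.
def bandwidth_mb_per_s(rate_khz, avg_event_size_kb):
    """Bandwidth contribution in MB/s of a line with the given rate and event size."""
    return rate_khz * avg_event_size_kb

# e.g. a 10 Hz line persisting 50 kB per event contributes 0.5 MB/s
print(bandwidth_mb_per_s(10e-3, 50))  # 0.5
```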

The Upgrade Computing Model TDR states that the HLT2 output bandwidth in Run 3 will be ten gigabytes per second, with an average event size of around XX kilobytes. Each time your HLT2 line makes a positive decision, it contributes to the output bandwidth. As this is a limited resource, it’s important to understand the magnitude of this contribution and to limit it as much as possible.

Today, we do not have a good way of measuring the HLT2 output bandwidth, and so you do not need to worry about computing this exact number. However, one important component of bandwidth is the rate, and so you can already start getting a handle on your line’s output bandwidth by ensuring your output rate is sensible. See the Rate section for advice.

The Turbo and Full streams

As described in the Upgrade Computing Model TDR (chapter 3), there are two output streams of HLT2 intended for almost all physics analysis. Most lines send their selected events to the Turbo stream, whilst a few lines send their selected events to the Full stream.

In the Turbo stream, analysts have near-immediate access to their data, and can access every event and every object persisted by their HLT2 lines. Typically this means only the candidate decay tree that actually fired the line, with perhaps a few other objects in the event as well.

In the Full stream, the entire reconstructed event is persisted. Analysts must wait for this data to go through an additional filtering step called Sprucing. This filters both the number of events and the number of objects within each event in order to reduce the total data size saved to disk. (The input data to the Sprucing is saved to tape, which analysts do not have access to.)

The vast majority of HLT2 lines will send their data to the Turbo stream. The few lines that output an enormous bandwidth, such as the inclusive ‘topological beauty’ lines, send their data to the Full stream, because we do not have the resources to store their full output on disk. Unless you know your line is a special case, it will send events to the Turbo stream.

Further documentation on the streaming can be found in a dedicated section.

Timing and performance

As HLT2 must process around one megahertz of HLT1-accepted events, it must be fast. In computing the total HLT2 decision, each HLT2 line is executed as a series of steps, with each step determining whether the next should run. To keep HLT2 fast, each step should do as little work as possible, and a line should abort processing as early as possible.

As almost every HLT2 line requires the full reconstruction, this is factored out of the processing time taken by an individual line. Most of the time spent computing a line decision then typically goes into combinatorics, where N-body vertices are created to form candidate decay chains. Here is the log produced by an example two-body vertex creation algorithm:

CombineParticles#6                  SUCCESS Number of counters : 9
|    Counter                                      |     #     |    sum     | mean/eff^* | rms/err^*  |     min     |     max     |
| "# D0 -> K-  K+ "                               |        73 |          0 |     0.0000 |     0.0000 |      0.0000 |      0.0000 |
| "# FilterDesktop#6/Particles"                   |        73 |        349 |     4.7808 |     2.7210 |      0.0000 |      12.000 |
| "# K+"                                          |        73 |        160 |     2.1918 |     1.6019 |      0.0000 |      6.0000 |
| "# K-"                                          |        73 |        189 |     2.5890 |     1.7108 |      0.0000 |      8.0000 |
| "# input particles"                             |        73 |        349 |     4.7808 |     2.7210 |      0.0000 |      12.000 |
| "# selected"                                    |        73 |          0 |     0.0000 |     0.0000 |      0.0000 |      0.0000 |
|*"#accept"                                       |        73 |          0 |( 0.000000 +- 0.000000)% |             |             |
|*"#pass combcut"                                 |       484 |          8 |( 1.652893 +- 0.5795360)%|   -------   |   -------   |
|*"#pass mother cut"                              |         8 |          0 |( 0.000000 +- 0.000000)% |   -------   |   -------   |

This algorithm processed 73 events and created a total of 484 two-body combinations. All but 8 of these were filtered out by the combination cuts, so only 8 entered the computationally expensive vertex fit.

It is good that very few candidates entered the vertex fit, but this algorithm is still computing the combination cut against many two-body combinations. Tighter selections on the input objects, kaons in this case, would reduce the number of combinations per event, and hence reduce the total processing time taken by this component. In conclusion, try to put cuts as early as possible.

When developing a line, it is good practice to keep an eye on the timing table, which you can find in the stdout of your Moore job. The table’s header looks like this:

| Name of Algorithm | Execution Count | Total Time / s | Avg. Time / us |

To speed up your selection, you are mainly interested in the Total Time of your filters and combiners. Try to:

  1. tighten the selection on the inputs

  2. apply individual Combination(12[34]) cuts

  3. order cuts by efficiency and functor evaluation speed

  4. configure the control flow

  5. share selections (not builders)

As mentioned above, combination cuts allow background candidates to be rejected early. In decays with three or more prongs, combination cuts are carried out in sequence, starting with the sub-combination of the first two particles in the decay descriptor: Combination12Cut, then Combination123Cut, and so on up to the final CombinationCut. The individual cuts within any combination or mother cut are likewise carried out in sequence, and functor evaluation stops as soon as a result is false.

In practice, mass cuts on (sub-)combinations often speed up the selection considerably. For example, a [D0 -> K- pi- pi+ pi+]cc 4-body combiner will profit from cuts like Combination12Cut=F.MASS < comb_m_max - pi_mass - pi_mass and Combination123Cut=F.MASS < comb_m_max - pi_mass. Such cuts are fast and 100% signal-efficient. Instances of MAX(S)DOCACUT are also recommended, to ensure that the tracks make a good vertex. Evaluating MAX(S)DOCACUT is computationally more expensive than kinematic cuts like masses, so the kinematic cuts should be applied first. In some cases it makes sense to move functors like MAX(S)DOCACUT to the final CombinationCut.
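As a concrete illustration, here is a minimal sketch of such a 4-body combiner. It assumes Moore's ThOr-style ParticleCombiner from Hlt2Conf.algorithms_thor and the Functors module; the algorithm name, mass window, and cut values are illustrative only, not prescriptive:

```python
import Functors as F
from GaudiKernel.SystemOfUnits import MeV, mm
from Hlt2Conf.algorithms_thor import ParticleCombiner

PI_MASS = 139.57 * MeV     # charged pion mass
COMB_M_MAX = 2065.0 * MeV  # loose upper edge of an assumed D0 mass window

def make_dzero_to_kpipipi(kaons, pions):
    """Hypothetical builder for [D0 -> K- pi- pi+ pi+]cc with early cuts."""
    return ParticleCombiner(
        [kaons, pions, pions, pions],
        name="Tutorial_D0ToKPiPiPi_Combiner",
        DecayDescriptor="[D0 -> K- pi- pi+ pi+]cc",
        # Cheap, 100% signal-efficient sub-combination mass cuts: the first
        # two (three) particles cannot weigh more than the mass window
        # minus the remaining two pions (one pion).
        Combination12Cut=F.MASS < COMB_M_MAX - 2 * PI_MASS,
        Combination123Cut=F.MASS < COMB_M_MAX - PI_MASS,
        # Kinematic cut first, then the more expensive DOCA evaluation.
        CombinationCut=(F.MASS < COMB_M_MAX) & F.MAXDOCACUT(0.2 * mm),
        # Applied to the candidate after the vertex fit.
        CompositeCut=F.CHI2DOF < 10.0,
    )
```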

If you configure the control flow in a clever way, you can avoid executing expensive combiners at all in certain events. Consider the 4-body \(D^0\) decay from above, coming from a semileptonic \(B^- \to D^0 \mu^- \bar{\nu}_\mu\) decay, and suppose you have written a tight muon filter that on average finds a good muon in only every second event. If you insert the muon filter into the control flow before the \(D^0\) combiner and the final \(B\) candidate, the heavy \(D^0\) combiner only needs to be executed on a positive muon decision.
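A sketch of this arrangement, assuming the Moore.lines.Hlt2Line constructor; make_tight_muons, make_bminus, kaons, and pions are hypothetical placeholders for your own selection code, and make_dzero_to_kpipipi is the hypothetical builder from the example above:

```python
from Moore.lines import Hlt2Line
from RecoConf.reconstruction_objects import upfront_reconstruction

def b_to_d0munu_line(name="Hlt2Tutorial_BToD0MuNuLine", prescale=1):
    # `kaons` and `pions` stand in for standard particle containers.
    muons = make_tight_muons()                    # cheap; rejects ~half of events
    dzeros = make_dzero_to_kpipipi(kaons, pions)  # expensive 4-body combiner
    bs = make_bminus(dzeros, muons)
    return Hlt2Line(
        name=name,
        # Placing `muons` before `dzeros` in the control flow means the
        # heavy D0 combiner only runs on events with a positive muon decision.
        algs=upfront_reconstruction() + [muons, dzeros, bs],
        prescale=prescale,
    )
```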

Finally, it is important to note that algorithms like filters and combiners are de-duplicated by PyConf. This means that an algorithm used in several places is only executed once, provided it is configured identically: the same inputs and the same properties (cuts). What PyConf cannot de-duplicate are, for instance, combiners with slightly different cuts (e.g. called through builder functions with different arguments); in that case every non-identical combiner runs from scratch. In some cases it may be worthwhile to run filters on top of common combiners that share a loose selection.
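For instance, a single loosely-cut combiner can be shared between two lines and refined with cheap per-line filters; because its configuration is identical in both places, PyConf runs it only once per event. This sketch reuses the hypothetical builder from above and assumes ParticleFilter from Hlt2Conf.algorithms_thor; the names and cut values are illustrative:

```python
import Functors as F
from GaudiKernel.SystemOfUnits import MeV
from Hlt2Conf.algorithms_thor import ParticleFilter

# One loose combiner, configured identically wherever it is used, so it is
# de-duplicated and executed once per event.
loose_dzeros = make_dzero_to_kpipipi(kaons, pions)

# Cheap per-line refinements on top of the shared combiner.
prompt_dzeros = ParticleFilter(
    loose_dzeros, name="MyWG_PromptD0_Filter", Cut=F.FILTER(F.PT > 2000 * MeV))
semilep_dzeros = ParticleFilter(
    loose_dzeros, name="MyWG_SemilepD0_Filter", Cut=F.FILTER(F.PT > 500 * MeV))
```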

Code

It should be possible for any analyst to understand any HLT2 line. This helps decrease the maintenance burden of HLT2 and helps us all understand how we are selecting the events that enter our analyses.

To this end, there are a few coding conventions you should follow to keep the source coherent. The Moore coding conventions give an overview of Python-specific things, and you should try to get a feeling for these conventions by looking at the files already in Moore. The following are conventions specific to writing HLT2 lines.

Note

You can find a list of best practices based on the example in “Writing an HLT2 line”.

Naming

Almost all our names, like function names, function arguments, variables within functions, and file names, are in lowercase snake case (like snake_case). Exceptions are module-level constants, which are in uppercase snake case (like SNAKE_CASE), and class names, which are in camel case (like CamelCase).

HLT2 line names, the name argument you give to the Moore.lines.Hlt2Line constructor, also have their own conventions. These are discussed in Moore#60.

For debugging purposes, it has proven useful to overwrite the default names of combiners (e.g. renaming TwoBodyCombiner#123 to Tutorial_Lb0_Combiner). For (machine-)readability it is useful to have names like MyWG_MyModule_MyCombiner.

Note

If you use a combiner multiple times with different cut configurations, for example via builder functions, and you give the combiner algorithm a name, you must give each configuration a unique name.

The reason is that functors from the functor cache are looked up by name, and the instantiation order of the functor cache can differ from the actual configuration of the selections, such that the wrong set of cuts might be applied. See LHCb#267 for details.

We therefore recommend either moving away from passing cuts to builder functions, or passing (part of) the name as an argument to the builder function.
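A minimal sketch of the second option, in which part of the name is an argument to a hypothetical builder function so that each cut configuration gets a unique algorithm name (ParticleFilter from Hlt2Conf.algorithms_thor is assumed, as above):

```python
import Functors as F
from GaudiKernel.SystemOfUnits import MeV
from Hlt2Conf.algorithms_thor import ParticleFilter

def make_filtered_kaons(kaons, name, pt_min):
    """Hypothetical builder: the caller supplies a unique `name` per cut set."""
    return ParticleFilter(kaons, name=name, Cut=F.FILTER(F.PT > pt_min))

# Two instances with different cuts get two distinct names, so the functor
# cache cannot mix up their configurations.
soft_kaons = make_filtered_kaons(kaons, "MyWG_SoftKaons_Filter", 250 * MeV)
hard_kaons = make_filtered_kaons(kaons, "MyWG_HardKaons_Filter", 1000 * MeV)
```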

The naming of the builder, filter, and line-defining functions themselves is, apart from the leading underscore for module-local functions, not of great concern. We recommend keeping them in snake_case, short and descriptive.

Style

Follow the Moore coding conventions. Python module imports should be grouped together at the top of the file. These are typically grouped by ‘source’ (first Python standard library, then from Gaudi/Configurables, then PyConf or Moore, then your own code) and then alphabetically within groups.
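A sketch of this grouping; the modules shown are typical of a Moore line file, and the final import stands in for your own code:

```python
# 1. Python standard library
from functools import partial

# 2. Gaudi / Configurables
from GaudiKernel.SystemOfUnits import GeV, MeV

# 3. PyConf / Moore
from Moore.config import register_line_builder
from Moore.lines import Hlt2Line
from RecoConf.reconstruction_objects import upfront_reconstruction

# 4. Your own code (hypothetical local module)
from .builders import make_tight_muons
```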

Summary

All nuance aside, below are the steps typically considered when adding an HLT2 line.

  1. Discuss the idea with your research group and your physics analysis working group.

  2. Look at what’s available in Moore. If a selection already exists for your physics process, contact the author and work together to make sure it suits everyone’s needs. If a selection does not exist, see if you can re-use parts from existing selections for similar processes.

  3. See if simulated samples exist for your signal, and request some if not.

  4. Write a draft selection for your signal, following the structure and style shown in existing HLT2 lines.

  5. Compute the signal efficiencies you care about using the signal MC sample. For many analyses you will care not just about the integrated or ‘total’ efficiency, but also about the efficiency as a function of one or more variables. Think about this and check.

  6. Compute the selection rate using minimum bias MC samples.

  7. Using the ‘true signal rate’ as a guide, tune your selection, referring to step 5, until the rate seems acceptable.

  8. Check that you apply cuts as early as possible, and that you do not create lots of combinations per event.

  9. Discuss your findings and iterate with your working group.

  10. Present your results at an RTA WP3 meeting.

You do not need to make detailed bandwidth and timing considerations at this point.