ThOr functors


List of all ThOr functors: ThOr functors reference

The ThOr ('Throughput Oriented') selection framework is made up of a set of C++ algorithms, tools, and functors which are composed together in Python in the HLT2 application.

This page explains what ThOr functors are, how they work, and how they relate to the functors from the LoKi selection framework which was used in Runs 1 and 2 and in the original Run 3 HLT2 configuration.

What are functors?

In an abstract sense when we say ‘functor’ we mean a function object. This is an object which can be called using the usual open-close-parenthesis () notation used in C++, Python, and other programming languages.

More specifically, functors can be composed with other functors through operations like addition and comparison. Composing functors creates new functors.

As an example, say we had a functor square which, when called, returns the square of a number, and we have a similar functor cube. Here are some behaviours we could reasonably expect from such functors:

>>> square(2)
4
>>> cube(2)
8
>>> (square + cube)(2)
12
>>> (square > 5)(2)
False
>>> ((square + cube) > 10)(2)
True

The result of square + cube is a new functor. We can call that new functor with a value and get the same result as if we had called the individual functors with the same value and added up the results ourselves:

>>> square(2) + cube(2)
12
# Composition, stored in a variable
>>> square_plus_cube = square + cube
>>> square_plus_cube(2)
12
# Composition with no intermediate variable
>>> (square + cube)(2)
12

Looks neat enough, but why is this useful? Why use functors?

Functors allow you to construct complex compositions without having to know the input value.

Look at the functor square_plus_cube in the example above. We can pass any value into that functor and get the result. If we were to call the square and cube functors individually, we would have to pass in the new value we’re interested in each time. Even more complex operations, such as chaining and binding, can be used when composing functors.

This is powerful because we can then construct functors and pass them around without having to even know what value will be passed in! One piece of code can be responsible for creating functors, while another can accept a functor and pass in the value it got from somewhere.
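This idea can be sketched in a few lines of Python. The Functor class below is a hypothetical illustration of composable function objects, not the actual ThOr implementation:

```python
class Functor:
    """A callable object supporting composition via operators (illustrative only)."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, value):
        return self.fn(value)

    def __add__(self, other):
        # Calling the composed functor calls both operands and adds the results
        return Functor(lambda value: self(value) + other(value))

    def __gt__(self, threshold):
        # Comparing against a constant yields a boolean-valued functor
        return Functor(lambda value: self(value) > threshold)


square = Functor(lambda x: x**2)
cube = Functor(lambda x: x**3)

# One piece of code composes the functor without knowing the input...
selection = (square + cube) > 10
# ...and another piece of code supplies the value later
print(selection(2))  # True
```

Note that no input value appears until the very last line: the composition is built entirely up front and can be passed around freely before being evaluated.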

It is for exactly these benefits that we use functors in the LHCb selection frameworks.

Functors in a selection framework

Most LHCb applications run in two stages:

  1. Configuration, where Python objects are used to define the data and control flow of the application.

  2. Execution, where the configuration specified by the first step is used to create C++ objects, like algorithms, which are executed in the order defined by the data and control flow.

So, the Python configuration does not manipulate event data directly. The configuration only tells the C++ algorithms what to do when they run, and it does so via configuration parameters. The algorithms are responsible for the details, like looping over particles and saving the result.

Because of this, the configuration cannot run functions against the event data. This raises a question: how can we construct complex selections in the configuration if we can’t directly manipulate the objects to be selected?

In other words: how can we compose selection functions without having access to their inputs? We’ve seen the answer to this in the previous section: functors!

Look at this example Python configuration of an algorithm which will filter particles:

ParticleFilter(Input=make_long_pions(), Cut=F.PT > 500 * MeV)

The Cut property defines a functor expression. This will be translated to a string behind the scenes which the C++ algorithm will receive as a configuration parameter when it runs. The algorithm will use this expression to build a C++ functor, and will use that functor to create filtered output. Very roughly, it does something that looks like this:

// Convert the string from the configuration to a C++ functor
auto functor = make_functor( get_property( "Cut" ) );
// Create the output container and filter the input
std::vector<const LHCb::Particle*> output;
for ( const auto* particle : input ) {
  // Evaluate the functor with the current particle
  if ( functor( particle ) ) {
    output.push_back( particle );
  }
}
return output;

Using functors in the configuration, and more broadly in the selection framework, allows us to separate the configuration, like what specific cuts to apply, from the execution, where an algorithm does the heavy lifting.

This separation of concerns means we don’t need to write a brand new C++ algorithm for every different selection. That’s a big win!

If you’re curious as to how we go from Python configuration to C++ functor object, read on to the next section.

How ThOr functors work

This is a technical section which explains how ThOr functors work. It might be useful if you’re looking to develop ThOr functors or trying to understand why your functor expression isn’t working.

It explains how the Python representation in the configuration ends up as a C++ object which is evaluated using some input object, how the C++ is structured, and how the functor cache used in production works.

From configuration to C++

To begin, we must understand two things:

  1. The goal of functors.

  2. How a Gaudi application, like Moore, is configured.

With functors we want to be able to express complex selection requirements as part of the Python configuration. This has many benefits, such as not requiring a curious analyst to jump back and forth between Python and C++ to understand what their HLT2 line is doing. This requirement means we need a way to translate from whatever the representation is in the configuration to some C++ object we can run inside an algorithm. But flexibility typically comes at the cost of speed. We run thousands of selections in every event, and we want them to be fast. ThOr tries to meet these competing requirements.

A Gaudi application broadly consists of two stages: configuration and execution. The goal of the configuration is to define which C++ components should be run, in what order, and what data should be passed between them. Fundamentally, the goal of the Python configuration is to construct a big string which defines these things. A dedicated C++ component in Gaudi then decodes this string to figure out which other C++ components should be instantiated and what the values of their various properties should be.

Note

There are lots of important details which make things easy to use, but at a high level this really is all Gaudi Configurables do! A configuration like this:

from Configurables import SomeAlg
SomeAlg(PropA=250, PropB=True, PropC="PT > 250")

gets translated during the execution of gaudirun.py into a string which looks very much like a dictionary of strings:

"{'SomeAlg': {'PropA': '250', 'PropB': 'true', 'PropC': 'PT > 250'}}"

The Gaudi JobOptionsSvc component parses this string before the Gaudi::Application component uses the resulting C++ map to instantiate and configure each specified C++ component.

Putting these two things together, we understand that whatever the representation of a functor is in Moore, it must eventually be converted to a string, and this string must somehow become a C++ object which is used in an algorithm.

ThOr implements this behaviour by dynamically creating the required C++ functor objects during Moore’s initialisation phase, using just-in-time (JIT) compilation. In essence, this lets you define some C++ code in a string and then compile and execute that string at runtime. This allows ThOr to operate as follows:

  1. Construct a string in the Python configuration which represents the full C++ functor expression to be evaluated.

  2. Inside an algorithm, convert the string to a C++ object.

We can play around to see this in action. For the first step, we can see that the Python ‘functor’ objects are really just data classes which hold the information needed for compilation:

# In a Moore environment, e.g. lb-run Moore/v52r0 python
>>> import Functors as F
>>> str(F.PT)
"('::Functors::Track::TransverseMomentum{}', ['Functors/TrackLike.h'], 'PT')"

Python functor objects, which get bound to algorithm configurables like Filter(Cut=F.PT > 250), eventually have str called on them to convert them into a string. We see here that this string contains three pieces of information:

  1. The C++ code, as a string, which shows how the C++ functor should be instantiated.

  2. The header files necessary to compile the C++ code.

  3. A pretty representation that can be useful for debugging.

The Python functor object exposes each of these pieces of information:

>>> F.PT.code()
'::Functors::Track::TransverseMomentum{}'
>>> F.PT.headers()
['Functors/TrackLike.h']
>>> F.PT.code_repr()
'PT'

The Python functors also know how to create more complex functors through composition. But there’s nothing fundamentally different about these more complex Python functor expressions; they just result in a correspondingly more complex string representation:

>>> str((F.PT > 250) & (F.MINIPCHI2("/Event/PVs") > 4))
'(\'operator&( operator>( ::Functors::Track::TransverseMomentum{}, std::integral_constant<int, 250>{} ), operator>( ::Functors::Track::MinimumImpactParameterChi2<>( /* TES location of input [primary] vertices */ std::string{"/Event/PVs"} ), std::integral_constant<int, 4>{} ) )\', [\'<string>\', \'Functors/TrackLike.h\'], \'( ( PT > 250 ) & ( MINIPCHI2(Vertices=/Event/PVs) > 4 ) )\')'

Now that we understand how the string representation is generated, we can move on to how the C++ side works.

Algorithms that want to use ThOr functors typically include the with_functors helper mixin. The details aren’t important, but it adds properties to the algorithm against which Python functor objects are set, and a decode method which converts the property value to a bona fide C++ functor object.

The decoding is handled by a ‘functor factory’ service called FunctorFactory. It’s this service that takes the various components embedded in the string representation and returns a C++ functor object which can then be called. While doing this it decides whether to JIT compile the C++ functor or retrieve it from a functor cache to create the final C++ object.

Once the C++ object has been created it is bound to an algorithm. This allows the functor to communicate with the algorithm that will be using/owning it. One important use case for this is to attach a functor’s data dependencies to the owning algorithm itself, which allows the application scheduler to discover those dependencies. This ensures that the functor-holding algorithm does not run before the producers of the functor’s dependencies have run.

The algorithm now holds a C++ functor to which it can pass objects, doing with the result whatever it pleases.

C++ implementation

This section is currently under construction 🚧 See Moore#284.

Chaining and binding operations

We have already discussed the possibility of composing functors with each other to create new functors. The operations between functors are not limited to a few arithmetic ones; other operators help us handle more complex compositions. We can take advantage of the chaining and binding operations to create ad-hoc functors for our selections!

The chaining and binding operations provide powerful tools for composing functors, letting users write expressive and concise code covering a wide range of functionality.

Chaining operation (@)

The chaining operator @ is a binary operator between functors that applies the output of the functor on the right as the input of the functor on the left. Mathematically, if we have two functors B and C and we define a new functor A = B @ C, it means A(input) = B(C(input)) when called.

As an example, if we have two functors add_two and multiply_three and we chain them to compose a new functor, we will have:

>>> add_two(3)
5
>>> multiply_three(3)
9
>>> (add_two @ multiply_three)(3)
11

Hence the multiply_three functor’s output becomes the input to the add_two functor!
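A minimal sketch of how this chaining behaviour could work, using a hypothetical Python Functor class (an illustration, not the ThOr implementation):

```python
class Functor:
    """Illustrative functor supporting chaining via the @ operator."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, value):
        return self.fn(value)

    def __matmul__(self, other):
        # (A @ B)(x) evaluates as A(B(x)): the right-hand functor runs first
        return Functor(lambda value: self(other(value)))


add_two = Functor(lambda x: x + 2)
multiply_three = Functor(lambda x: x * 3)

print((add_two @ multiply_three)(3))  # 11
```

Note that chaining is not commutative: `(multiply_three @ add_two)(3)` would instead give 15.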

Binding operation (bind)

The binding operation bind is a method applied to a given functor that expects a list of functors to bind. The net effect when the composed functor is called is that the functor before the bind receives as input arguments the outputs of the bound functors. More explicitly, if we have a functor A = B.bind(C, D), with B, C and D functors themselves, the expression evaluates as A(input) = B(C(input), D(input)).

Extending the above example, if we have a binary functor sum (i.e. one that expects two inputs):

>>> sum(5,9)
14
>>> sum.bind(add_two,multiply_three)(3)
14

In this case, add_two and multiply_three functors’ outputs are used as inputs to the sum functor!
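The bind behaviour can be sketched in the same hypothetical style (again, an illustration rather than the ThOr implementation; the binary functor is named total here to avoid shadowing Python’s built-in sum):

```python
class Functor:
    """Illustrative functor supporting the bind operation."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, *args):
        return self.fn(*args)

    def bind(self, *functors):
        # B.bind(C, D)(x) evaluates as B(C(x), D(x)): each bound functor
        # receives the same input, and their outputs feed the outer functor
        return Functor(lambda value: self(*(f(value) for f in functors)))


add_two = Functor(lambda x: x + 2)
multiply_three = Functor(lambda x: x * 3)
# A binary functor, playing the role of `sum` in the example above
total = Functor(lambda a, b: a + b)

print(total.bind(add_two, multiply_three)(3))  # 14
```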

Functor cache

Using JIT compilation in a production setting has the drawback of slowing down application initialisation as the required functors must be compiled on demand.

That’s why we use a functor cache when running HLT2 in production. This is a shared object library created during the CMake build step of the Moore package, as defined in the MooreCache package:

  1. The production configuration of the Moore application is run up to and including the C++ initialisation phase. The configuration is identical except for a few flags which tell the C++ functor helpers we are running in ‘functor cache mode’.

  2. Instead of compiling the C++ functor expression strings, the C++ strings are written out to files. A hash is associated with each C++ string so that it can be found in the cache later.

  3. The files containing the C++ are compiled using the same compiler and compilation flags we use for compiling the rest of the stack.

When HLT2 is then run to start processing events we explicitly disable JIT compilation, instead requiring that functors be taken from the functor cache.

During initialisation the functor helpers use the functor string hash to load the corresponding C++ functor object from the functor cache, which the algorithm can then execute as normal.
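The cache lookup idea can be illustrated with a small Python sketch. The names (functor_hash, decode) and the use of SHA-1 are assumptions for illustration only; the real FunctorFactory logic lives in C++:

```python
import hashlib


def functor_hash(code: str) -> str:
    # Hypothetical: hash the functor's C++ code string to get a cache key
    return hashlib.sha1(code.encode()).hexdigest()


# Pretend this was built at compile time: hash -> compiled functor
# (here a plain lambda stands in for a compiled C++ object)
cache = {functor_hash("PT > 250"): lambda particle: particle["pt"] > 250}


def decode(code: str, allow_jit: bool):
    key = functor_hash(code)
    if key in cache:
        # Production path: retrieve the precompiled functor
        return cache[key]
    if allow_jit:
        raise NotImplementedError("would JIT-compile the code string here")
    raise RuntimeError(f"functor {code!r} not found in the functor cache")


# In production JIT compilation is disabled, so only cached functors resolve
cut = decode("PT > 250", allow_jit=False)
print(cut({"pt": 300.0}))  # True
```

This also shows why a functor missing from the cache is a hard error in production: with JIT disabled, there is no fallback path.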

Comparisons between ThOr and LoKi

Both ThOr functors and LoKi functors exist for very similar reasons but differ in several ways. These differences are largely details for most HLT2 line authors, but in case you’re curious this section outlines the biggest ones and explains why they exist.

Configuration

As we’ve seen, the configuration of ThOr functors looks quite different from that of LoKi functors:

>>> loki = "(PT > 250 * MeV) & (MIPCHI2DV(PRIMARY) > 4)"
>>> thor = (F.PT > 250 * MeV) & (F.MINIPCHI2(pvs) > 4)

LoKi functors are expressed as strings in the configuration whereas ThOr functors are expressed using Python representations. The latter approach allows for configuration-time checks of expressions: if the FAKE functor you’re trying to use doesn’t exist then F.FAKE will raise an error as soon as the configuration runs this line.

Using Python representations directly in the configuration also allows for intuitive introspection, for example using help(F.PT) in a Python prompt.

Throughput

LoKi functors, like ThOr, are safe and flexible, but achieve this in a different way with different assumptions.

Each LoKi functor implementation takes great care to verify that its input is of the type it expects and that the information it is trying to compute makes sense. These checks were very important given the flexibility of the C++ event model from Runs 1 and 2 and the wide range of application styles that were used.

LoKi functors are implemented in a way that relies heavily on virtual function calls, exploiting inheritance trees to share functionality.

Both of these techniques have clear benefits, but they also have a drawback: speed. Sanity checks and virtual function calls each take time, and this adds up over hundreds or thousands of functor calls per event.

In addition, ThOr functors were designed to accommodate the structure-of-arrays (SOA) data model being adopted for Run 3. This model is optimised for fast data access and better CPU cache utilisation, but results in some considerable API differences compared to the array-of-structures (AOS) model used in Runs 1 and 2. That is why LoKi functors were not adapted for the SOA model, and a new functor framework, ThOr, was made instead. ThOr functors operating on SOA data can execute much faster than on AOS data, over 4 times faster!

Technical implementation

The principle of how LoKi functors work is similar to that for ThOr functors: they must go from a string-based representation, passed to an algorithm as a configuration parameter, to a C++ object.

However, LoKi C++ functors have a corresponding Python binding. There is a PT LoKi functor accessible in a Python interpreter which corresponds to the same object as the PT functor in C++. Unlike the ThOr Python functors, which are just representations of the C++ definitions, LoKi Python functors are the real deal: they are the C++ objects. (This works in the same way you can access ROOT.TTree in Python; you’re manipulating the C++ object using Python bindings.)

So, LoKi functor expression strings are actually strings of Python code:

>>> loki_functor = "PT > 250 * MeV"

Algorithms convert this string to C++ by embedding it inside a small Python program, launching a Python interpreter to evaluate the program, and then extracting the resulting C++ object which is created.
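The key point, that the cut string is Python code evaluated in a namespace of functor objects, can be sketched as follows. The Functor class and bindings here are stand-ins, not real LoKi objects:

```python
class Functor:
    """Stand-in for a LoKi functor exposed to Python via bindings."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, particle):
        return self.fn(particle)

    def __gt__(self, threshold):
        return Functor(lambda particle: self(particle) > threshold)


# Stand-in bindings; in LoKi these resolve to the real C++ objects
MeV = 1.0
PT = Functor(lambda particle: particle["pt"])

# The algorithm's cut string is evaluated as Python code in this namespace,
# yielding a composed functor object
cut = eval("PT > 250 * MeV", {"PT": PT, "MeV": MeV})
print(cut({"pt": 300.0}))  # True
```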

This approach is nice for several reasons.

  1. If you need to, you can use the functors in a Python script. The Bender application is commonly used to perform analysis in this way.

  2. You do not need to write a separate Python representation for each functor. Bindings are generated automatically from their C++ counterparts.

There are also some drawbacks:

  1. Because of the way Python bindings are generated, some composition operations require additional Python code to be written and maintained.

  2. There is some overhead to running a Python interpreter inside a C++ algorithm.

  3. One cannot validate functor expressions at configuration time. A non-existent functor inside a string will only raise an error when the algorithm starts to run.

    • In some cases it may be that incorrect functors only raise errors once they are actually executed, that is when they are fed their input. This may happen very infrequently if the functor is in an algorithm that does not run very often, e.g. a post-fit vertex cut inside a combiner looking for a rare physics process.

Of course, ThOr functors have their pros and cons as well! Suffice it to say that LoKi functors have served LHCb extremely well for many years, which stands as a testament to their utility, robustness, and effective implementation.

Functor translation tables

This section is useful if you’re Converting an HLT2 line to ThOr functors or just want to understand the relationship between specific LoKi and ThOr functors.

These are not exhaustive lists of all ThOr functors. Check out the ThOr functors reference for that.

If there are missing LoKi functors or translations you think might be incorrect, check out the advice in the Missing functors documentation.

There are some conventions followed in each table for compactness:

  1. The symbol F corresponds to the convention of importing the Functors module:

    import Functors as F
    
  2. The symbol pvs corresponds to a Python data handle representing the container of primary vertices within the event. These are typically created as:

    from RecoConf.reconstruction_objects import make_pvs
    
    def particle_maker(make_pvs=make_pvs):
        pvs = make_pvs()
        # ...
    

See the Converting an HLT2 line to ThOr functors tutorial for more details on using ThOr in HLT2.

Standard LoKi functors

Standard LoKi functors are those evaluated on individual LHCb::Particle objects. They are used in filter algorithms, like FilterDesktop, and in the ‘child cuts’ and ‘post-vertex fit cuts’ of a combiner algorithm like CombineParticles. Full list of ThOr functors: ThOr functors reference.

| LoKi functor | ThOr equivalent | Equal values? | Comments |
| --- | --- | --- | --- |
| ALL/PTRUE | F.ALL | ✔️ | |
| ABSID | F.IS_ABS_ID(<int>) | ✔️ | |
| BPV | F.BPV(pvs) | ✔️ | |
| BPVCORRM | F.BPVCORRM(pvs) | | |
| BPVDIRA | F.BPVDIRA(pvs) | | |
| BPVETA() | F.BPVETA(pvs) | | Assumes input objects are composites (they have an associated end vertex). |
| BPVVDCHI2() | F.BPVFDCHI2(pvs) | | ‘VD’/‘FD’ for ‘vertex/flight distance’. |
| BPVVDZ() | F.BPVVDZ(pvs) | | |
| BPVIP()/BESTPVIP() | F.BPVIP(pvs) | ✔️ | |
| BPVIPCHI2()/BESTPVIPCHI2() | F.BPVIPCHI2(pvs) | ✔️ | |
| BPVLTIME() | F.BPVLTIME(pvs) | | |
| CHI2VXNDOF | F.CHI2DOF | ⚪ | In principle but not numerically. See checks in https://indico.cern.ch/event/995287/contributions/4633380/attachments/2354933/4018715/WP3%20JieWu%2020211129.pdf |
| CHI2IP | F.IPCHI2 | ⚪ | |
| CHILD | F.CHILD | | |
| CL | F.IS_NOT_H/F.IS_PHOTON | | For photons: LoKi CL has been an alias of LoKi IS_NOT_H since S21, and F.IS_NOT_H replicates it. For merged pi0: LoKi CL has been an alias of LoKi (1 - IS_PHOTON), and (1 - F.IS_PHOTON) replicates it. |
| DELTAR2 | F.DR2 | | Assumes inputs are charged basics. |
| DETA | F.DETA | ✔️ | Assumes inputs are charged basics. |
| DIRA | F.BPVDIRA(pvs) | ✔️ | |
| DPHI | F.DPHI | | Assumes inputs are charged basics. |
| E | F.ENERGY | ✔️ | |
| ETA | F.ETA | ✔️ | Assumes input objects are charged basics (they have an associated track). |
| ID | F.IS_ID(<int>) | ✔️ | |
| INGENERATION | F.INGENERATION(<int>) | ✔️ | Assumes input objects are composite objects. |
| INMUON | F.INMUON | ✔️ | |
| INTREE | F.INTREE | ✔️ | Assumes input objects are composite objects. |
| IP | F.IP | ✔️ | |
| IPCHI2 | F.IPCHI2 | ✔️ | |
| ISDOWN | F.TRACKISDOWN | ✔️ | |
| ISLONG | F.TRACKISLONG | ✔️ | |
| ISMUON | F.ISMUON | ✔️ | |
| KEY/PKEY/TrKEY/MCVKEY | F.OBJECT_KEY | | |
| M | F.MASS | ✔️ | |
| MAXTREE | F.MAXTREE | ✔️ | Assumes input objects are composite objects. |
| MCMOTHER | F.MC_MOTHER | ✔️ | |
| MCREC | F.MC_RECONSTRUCTIBLE | | |
| MINTREE | F.MINTREE | ✔️ | Assumes inputs are composite objects. |
| MIPCHI2DV(PRIMARY) | F.MINIPCHI2(pvs) | ✔️ | F.MINIPCHI2CUT(pvs) may be a more efficient alternative as it can stop early. |
| MIPDV(PRIMARY) | F.MINIP(pvs) | ✔️ | F.MINIPCUT(pvs) may be a more efficient alternative as it can stop early. |
| MTDOCACHI2 | F.MTDOCACHI2(<int>, pvs) | ✔️ | |
| NDAUGHTERS | F.NINGENERATION(<predicate>, 1) | ✔️ | Assumes it is applied to a composite object. |
| NINGENERATION | F.NINGENERATION(<predicate>, <int>) | ✔️ | Assumes inputs are composite objects. |
| NINTREE | F.NINTREE(<predicate>) | ✔️ | Assumes inputs are composite objects. |
| NONE | F.NONE | ✔️ | |
| ODIN_BUNCH | BUNCHCROSSING_ID(odin) | ✔️ | |
| ODIN_BXTYP | BUNCHCROSSING_TYPE(odin) | ✔️ | |
| ODIN_EVTNUMBER | EVENTNUMBER(odin) | ✔️ | |
| ODIN_EVTTYPE | EVENTTYPE(odin) | ✔️ | |
| ODIN_RUN | RUNNUMBER(odin) | ✔️ | |
| ODIN_TCK | ODINTCK(odin) | ✔️ | |
| P | F.P | ✔️ | |
| PCOV2 | F.COV | | |
| PHI | F.PHI | ✔️ | |
| PID/MCID | F.PARTICLE_ID | ✔️ | |
| PIDK | F.PID_K | ✔️ | |
| PIDe | F.PID_E | ✔️ | |
| PIDmu | F.PID_MU | ✔️ | |
| PIDp | F.PID_P | ✔️ | |
| PIDpi | F.PID_PI | ✔️ | |
| PROBNNk | F.PROBNN_K | | |
| PROBNNe | F.PROBNN_E | | |
| PROBNNmu | F.PROBNN_MU | | |
| PROBNNp | F.PROBNN_P | | |
| PROBNNpi | F.PROBNN_PI | | |
| PROBNNghost | F.PROBNN_GHOST | | |
| PT | F.PT | ✔️ | |
| PX | F.PX | ✔️ | |
| PY | F.PY | ✔️ | |
| PZ | F.PZ | ✔️ | |
| Q | F.CHARGE | ✔️ | |
| SIZE | F.SIZE(container) | | container is a data handle to the container whose size to measure. |
| SUMCONE | F.SUMCONE(<functor>, rels) | | rels is a data handle to the relation table. |
| TrTYPE | F.TRACKTYPE | ✔️ | |
| TrCHI2DOF | F.CHI2DOF | ✔️ | |
| TrCHI2 | F.CHI2 | ⚪ | |
| TrCLONE | F.TRACKISCLONE | ✔️ | |
| TrGHOSTPROB | F.GHOSTPROB | ✔️ | |
| TrHAST | F.TRACKHAST | ✔️ | |
| TrHASUT | F.HASUT | ✔️ | |
| TrHASVELO | F.HASVELO | ✔️ | |
| TrIPSELECTED | F.TRACKISSELECTED | ✔️ | |
| TRTTRACK | F.TRACKISTTRACK | ✔️ | |
| TRUP | F.TRACKISUPSTREAM | ✔️ | |
| VCHI2/CHI2VX | F.CHI2 | | In principle but not numerically. See checks in https://indico.cern.ch/event/995287/contributions/4633380/attachments/2354933/4018715/WP3%20JieWu%2020211129.pdf |
| VD | F.BPVFD | | With respect to the best PV. |
| VDCHI2 | F.BPVFDCHI2 | | With respect to the best PV. |
| VDOF | F.NDOF | | |
| VMINVDDV(PRIMARY) | F.MIN_ELEMENT @ F.ALLPV_FD(pvs) | ✔️ | |
| VX | F.END_VX | | |
| VY | F.END_VY | | |
| VZ | F.END_VZ | | |
| in_range | in_range | ✔️ | The ThOr functor is in the Functors.math module. |
| log | log | ✔️ | Natural logarithm (base e). The ThOr functor is in the Functors.math module. |

Array LoKi functors

Array functors act on lists or vectors of particles. These are most commonly encountered in combiner algorithms where the pre-fit composite is described by an array of candidate children.

An example array LoKi functor is APT, which evaluates the transverse momentum of the transient ‘combination’ object as the transverse component of the sum of the child momenta.

| LoKi functor | ThOr equivalent | Equal values? | Comments |
| --- | --- | --- | --- |
| AALL/ATRUE | F.ALL | | |
| ACHILD | F.CHILD | | |
| ACHI2DOCA(x, y) | F.DOCACHI2(x, y) | | Indices x and y are child indices (starting from one). |
| ACUTDOCACHI2 | F.MAXDOCACHI2CUT | | |
| ACUTDOCA | F.MAXDOCACUT | | |
| ADOCA(x, y) | F.DOCA(x, y) | | Indices x and y are child indices (starting from one). |
| ADOCACHI2 | N/A | | No direct equivalent; use F.MAXDOCACHI2CUT instead. |
| ADOCAMAX | N/A | | No direct equivalent; use F.MAXDOCACUT instead. |
| AHASCHILD(<predicate>) | F.SUM(<predicate>) > 0 | | |
| AMAXCHILD | F.MAX | | |
| AMINCHILD | F.MIN | | |
| AM/AM0/AMASS | F.MASS | | |
| ANONE/AFALSE | F.NONE | | |
| ANUM(<predicate>) | F.SUM(<predicate>) | | |
| APT | F.PT | | |
| ASUM | F.SUM | | |

Mathematical operations on functors

Mathematical functions are implemented in the Functors.math module, for example:

import Functors.math as fmath
fmath.log(F.MINIPCHI2(pvs))