Running with Ganga

Running Moore by hand on your machine is good for small runs over a file or two, but can quickly become too slow as the number of input files you need to process increases.

Running Moore on the Grid lets you run multiple instances simultaneously. Ganga is the interface we use for submitting jobs to the Grid, and Moore is compatible with Ganga.

Note

This page assumes you are already familiar with Ganga and how to submit Ganga jobs using other Gaudi-based LHCb applications, such as DaVinci. The Starterkit lesson on Ganga is a good place to start if you’re unsure.

Configuration

Let’s begin with a typical configuration of Moore for running HLT2 on some Monte Carlo DST files:

options.input_files = ["xrootd://...", "xrootd://..."]
options.input_type = "ROOT"
options.input_raw_format = 4.3
options.data_type = "Upgrade"
options.dddb_tag = "dddb-20171126"
options.conddb_tag = "sim-20171127-vc-md100"
options.simulation = True

# Somewhere later
run_moore(options, lines_maker)

We specify the paths to the input data, the file format (ROOT or MDF), the raw event format, the data type, the tags, and that we’re running over MC.

Because Ganga specifies the locations of the input data for us, we don’t need to give options.input_files:

# Don't need this when using Ganga
# options.input_files = ["xrootd://...", "xrootd://..."]
options.input_type = "ROOT"
options.input_raw_format = 4.3
options.data_type = "Upgrade"
options.dddb_tag = "dddb-20171126"
options.conddb_tag = "sim-20171127-vc-md100"
options.simulation = True

# Somewhere later
run_moore(options, lines_maker)

That one line is all we need to change.
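
If you would rather keep a single options file for both local and Grid runs, one possibility is to set the input files only when they are explicitly requested. The following is a minimal sketch under stated assumptions: `configure_input` and the `MOORE_LOCAL_INPUT` environment variable are hypothetical names invented for this example, not part of Moore or Ganga, and a `SimpleNamespace` stands in for the real `options` object so that the snippet runs on its own.

```python
import os
from types import SimpleNamespace


def configure_input(options, env=None):
    """Set options.input_files only when local files are requested.

    Hypothetical helper, not part of Moore. It reads a comma-separated list
    of paths from the made-up MOORE_LOCAL_INPUT environment variable. When
    the variable is unset or empty, input_files is left untouched so that
    Ganga can supply the input data instead.
    """
    env = os.environ if env is None else env
    raw = env.get("MOORE_LOCAL_INPUT", "")
    files = [path.strip() for path in raw.split(",") if path.strip()]
    if files:
        options.input_files = files
    return options


# Local run: the variable lists the files to process.
# SimpleNamespace is only a stand-in for Moore's real options object.
local = configure_input(SimpleNamespace(), env={"MOORE_LOCAL_INPUT": "a.dst, b.dst"})

# Ganga run: the variable is unset, so input_files is never assigned.
grid = configure_input(SimpleNamespace(), env={})
```

In a real options file you would pass the actual `options` object and leave `env` at its default, so that the same file behaves correctly in both settings.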

You can also enable HLT2 output if you wish:

options.output_file = 'hlt2_example.dst'
options.output_type = 'ROOT'

# Somewhere later
run_moore(options, lines_maker)

Build

Ganga bundles up a local build of Moore, which is then downloaded and used by each Grid worker node. This build is different from the lb-stack-setup-based build we normally use, as in Developing Moore. We must use an lb-dev-based build instead.

Note

If you want to submit a version of Moore that includes your own changes, which are not yet part of the master branch, you must first push your changes to a branch in Moore.

You can read more about working with lb-dev in the Starterkit lesson.

At the moment, we recommend that you create an lb-dev project using a version of Moore from the latest lhcb-master, which is deployed on CVMFS with the highest priority:

$ lb-dev --platform x86_64_v2-centos7-gcc11-opt --nightly lhcb-master/latest Moore/master
$ cd ./MooreDev_master

If you want to modify packages, you can now check out those packages, for example:

$ git lb-use Moore
$ git lb-checkout Moore/master Hlt

If you want to use existing modifications in your own branch, use your own branch name above rather than master.

Finally, run the build:

$ make

Job definition

You can now define the job as usual. The Ganga job application type should be GaudiExec, as for any other Gaudi-based LHCb application, like DaVinci:

# Inside a Ganga prompt
In [1]: app = GaudiExec(
   ...:     directory="/path/to/your/MooreDev_master",
   ...:     options=["/path/to/your/hlt2_example.py"],
   ...:     platform=["x86_64_v2-centos7-gcc11-opt"],
   ...: )

In [2]: j = Job(name="MooreJobXYZ", application=app)

Because we took a build of Moore from the nightlies, the Grid jobs must have access to the /cvmfs/lhcbdev.cern.ch CVMFS repository (this is where nightly builds are installed). All Tier 1 (T1) Grid sites have /cvmfs/lhcbdev.cern.ch mounted, and all MC samples are required to have replicas at T1 sites, so we can require our jobs to run there.

Here is an example of fetching a dataset from the bookkeeping, targeting sites with /cvmfs/lhcbdev.cern.ch mounted:

# Inside a Ganga prompt, after setting up our Job object `j`
In [3]: bkq = BKQuery("/MC/Upgrade/Beam7000GeV-Upgrade-MagDown-Nu7.6-25ns-Pythia8/Sim09c-Up02/Reco-Up01/27163002/LDST")

In [4]: ds = bkq.getDataset()

In [5]: j.inputdata = ds

In [6]: j.backend = Dirac()

In [7]: j.backend.diracOpts = 'j.setTag(["/cvmfs/lhcbdev.cern.ch/"])'
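
The diracOpts value above is a Python statement held in a string, so quoting mistakes are easy to make, particularly if you ever need more than one tag. A minimal sketch of building that string from a list follows; `make_tag_opts` is a hypothetical helper name, not part of Ganga, and it only reproduces the `j.setTag([...])` form already shown.

```python
import json


def make_tag_opts(tags):
    """Return a 'j.setTag([...])' string for the given list of tags."""
    if not tags:
        raise ValueError("at least one tag is required")
    # json.dumps renders a list of plain strings with double quotes, which
    # is also a valid Python list literal.
    return f"j.setTag({json.dumps(list(tags))})"


dirac_opts = make_tag_opts(["/cvmfs/lhcbdev.cern.ch/"])
print(dirac_opts)
```

Inside the Ganga prompt, the result could then be assigned with `j.backend.diracOpts = make_tag_opts(["/cvmfs/lhcbdev.cern.ch/"])`.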

And that’s it. Configure the rest of the Job properties, such as j.splitter, as you normally would. Don’t forget to specify the output location if HLT2 output is enabled:

# In [8]: j.outputfiles = [DiracFile('hlt2_example.dst')]
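
When choosing a splitter setting, it can help to know how many subjobs it implies, since each subjob writes its own copy of the output files. The sketch below is plain Python rather than Ganga code: it assumes a splitter that groups a fixed number of input files per subjob, and `chunk_files` and the file names are made up for illustration.

```python
def chunk_files(files, files_per_job):
    """Group input files into consecutive chunks, one chunk per subjob."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    return [
        files[start:start + files_per_job]
        for start in range(0, len(files), files_per_job)
    ]


# Twelve made-up file names, split five at a time.
files = [f"file_{n}.ldst" for n in range(12)]
chunks = chunk_files(files, 5)
print(len(chunks), [len(chunk) for chunk in chunks])
```

The last chunk is simply shorter when the number of files is not a multiple of the chunk size.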

Note

As described here, a JSON file needs to be written out for DaVinci. This can be done by adding a few lines at the end of your options file, but the file should be obtained from a run on the Grid. However, the JSON files are identical for all subjobs, and Ganga will fail to replicate one file to different remote locations. There are two ways to resolve this:

1. Submit only one subjob to obtain the JSON file, setting the job's output files with
j.outputfiles = [DiracFile('hlt2_example.dst'), DiracFile('hlt2_tck.json')].
2. Save the JSON files locally with
j.outputfiles = [DiracFile('hlt2_example.dst'), LocalFile('hlt2_tck.json')].

Note

If you want to run over files at other sites, you will need to base your lb-dev environment on a released version of Moore. For example:

$ lb-dev Moore/v52r0

You can then check out the Hlt package from your branch and build the project as usual.

Using a released version of Moore, rather than the latest nightly build, is a viable option if you do not rely on features added to Moore since the release you are using. The simplest way to find out if this applies to you is to try using lb-dev with a released version and check that your lines run and produce the output you expect.