Skip to content

cernops/higgs-demo

Repository files navigation

Higgs Demo

Binder

Reperforming the Higgs analysis using Kubernetes.

This tool helps with the submission and management of the Higgs Analysis against any existing Kubernetes cluster. It relies on the Higgs dataset being accessible either in a S3 or GCS bucket.

Dataset

The CMS Higgs dataset is available as open data, and has 70TB over 26920 files.

It is expected that it is made available in the same cloud provider you plan to run the computation - S3 or GCS are supported.

Configuration Files

If you look into the config directory you can find a few examples of demo configuration files. It offers enough flexibility to scale the demo and to split the submission across multiple clusters if needed.

Each JSON entry in the list includes the information for a corresponding cluster named kubecon-demo-$i, where $i is the list index. The remaining fields include the node flavor to be used, the number of cluster nodes, and the path to use for the stage-in data (/dev/shm for shared memory, which is the recommended setup).

Command Line

The higgsdemo command line offers the following functionality.

Clusters Create

clusters-create will create one cluster per list entry in the config file, matching the given flavor and number of nodes. Clusters will be named kubecon-demo-$i, where $i is the list index in the config.

This is mostly useful where splitting the submission across multiple clusters is required.

Prepare

This is useful to speed up the actual demo, and makes sure the docker image is available in advance on every cluster node.

higgsdemo prepare --dataset-mapping config/demo-highmem-minimal.json

Submit

Here's an example submission for a scaled down analysis:

higgsdemo submit --dataset-mapping config/demo-highmem-minimal.json --access-key ... --secret-key ... --redis-host ...

The dataset-mapping should point to one of the config files. It will spawn a set of parallel processes (one per cluster) doing the submission. The remaining params are the access and secret keys to either S3 or GCS.

Watch and Cleanup

To check the evolution of the computation you can trigger a watch in a specific cluster:

higgsdemo watch --cluster kubecon-demo-0
{'pods': {'Running': 48, 'total': 56, 'Succeeded': 8}, 'jobs': {'total': 56, 'succeeded': 8}}
...

To reperform the computation in an existing cluster, a cleanup is required:

higgsdemo cleanup --cluster kubecon-demo-0

Sample Run

higgsdemo clusters-create --dataset-mapping config/demo-high-mem.json

for i in 0 1 2 3 4 5 6 7 8 9; do gcloud container clusters get-credentials --region europe-west4 kubecon-demo-$i; done

higgsdemo prepare --dataset-mapping config/demo-high-mem.json

# monitor prepull progress in a single cluster
kubectl --context ... -n prepull get po

higgsdemo submit --access-key ... --secret-key ... --dataset-mapping config/demo-high-mem.json

Installation

virtualenv .venv
. .venv/bin/activate
pip install -r requirements
python setup.py install