Higgs Demo

Reperforming the Higgs analysis using Kubernetes.

This tool helps submit and manage the Higgs analysis on any existing Kubernetes cluster. It requires the Higgs dataset to be accessible in either an S3 or a GCS bucket.

Dataset

The CMS Higgs dataset is available as open data and totals 70 TB across 26,920 files.
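As a quick sanity check on how much data each job stages in, the average file size follows directly from these two figures. This sketch assumes decimal units (1 TB = 10^12 bytes); the actual spread of file sizes is not stated here and may be uneven.

```python
# Figures quoted above for the CMS Higgs open dataset.
TOTAL_BYTES = 70 * 10**12  # 70 TB, assuming decimal terabytes
NUM_FILES = 26_920


def average_file_size_gb(total_bytes: int, num_files: int) -> float:
    """Return the mean file size in decimal gigabytes."""
    return total_bytes / num_files / 10**9


if __name__ == "__main__":
    size = average_file_size_gb(TOTAL_BYTES, NUM_FILES)
    print(f"average file size: {size:.2f} GB")
```

This works out to roughly 2.6 GB per file, which is worth keeping in mind when sizing memory-backed stage-in such as /dev/shm.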

The dataset is expected to be hosted with the same cloud provider where you plan to run the computation; S3 and GCS are supported.

Configuration Files

The config directory contains a few example demo configuration files. The format is flexible enough to scale the demo and to split the submission across multiple clusters if needed.

Each JSON entry in the list describes one cluster, named kubecon-demo-$i, where $i is the entry's index in the list. The remaining fields specify the node flavor, the number of cluster nodes, and the path used for stage-in data (/dev/shm, i.e. shared memory, is the recommended setup).
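A minimal sketch of reading such a config list and deriving the cluster names from the list index. The keys "flavor", "nodes" and "stagein_path" are hypothetical placeholders, as is the example flavor value; check the files in config/ for the real field names.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterSpec:
    name: str
    flavor: str
    nodes: int
    stagein_path: str


def load_cluster_specs(text: str) -> List[ClusterSpec]:
    """Parse a config list, naming each cluster kubecon-demo-<index>."""
    entries = json.loads(text)
    return [
        ClusterSpec(
            name=f"kubecon-demo-{i}",
            # Hypothetical key names; the real config may use different ones.
            flavor=entry["flavor"],
            nodes=int(entry["nodes"]),
            # Assumed default: fall back to shared memory, the recommended setup.
            stagein_path=entry.get("stagein_path", "/dev/shm"),
        )
        for i, entry in enumerate(entries)
    ]


if __name__ == "__main__":
    example = '[{"flavor": "n1-highmem-16", "nodes": 10}, {"flavor": "n1-highmem-16", "nodes": 10}]'
    for spec in load_cluster_specs(example):
        print(spec)
```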

Command Line

The higgsdemo command line offers the following functionality.

Clusters Create

clusters-create will create one cluster per list entry in the config file, matching the given flavor and number of nodes. Clusters will be named kubecon-demo-$i, where $i is the list index in the config.

This is mostly useful when the submission has to be split across multiple clusters.
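Before creating clusters it can help to see what a config will request in total, for example to check cloud quotas. This sketch mirrors the naming rule above (one cluster per entry, named by index) and sums the nodes per flavor; the "flavor" and "nodes" keys are hypothetical, and this is not the tool's own code.

```python
from collections import Counter
from typing import Dict, List, Tuple


def plan_clusters(entries: List[Dict]) -> Tuple[List[Tuple[str, str, int]], Counter]:
    """Name each cluster by list index and total the requested nodes per flavor."""
    plan = []
    nodes_per_flavor: Counter = Counter()
    for i, entry in enumerate(entries):
        # Hypothetical keys "flavor" and "nodes"; check config/ for the real ones.
        name, flavor, nodes = f"kubecon-demo-{i}", entry["flavor"], int(entry["nodes"])
        plan.append((name, flavor, nodes))
        nodes_per_flavor[flavor] += nodes
    return plan, nodes_per_flavor


if __name__ == "__main__":
    entries = [{"flavor": "highmem", "nodes": 10}, {"flavor": "highmem", "nodes": 10}]
    plan, per_flavor = plan_clusters(entries)
    for name, flavor, nodes in plan:
        print(f"{name}: {nodes} x {flavor}")
    print("total nodes per flavor:", dict(per_flavor))
```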

Prepare

prepare makes sure the Docker image is already available on every cluster node before the run, which speeds up the actual demo.

higgsdemo prepare --dataset-mapping config/demo-highmem-minimal.json
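A common Kubernetes pattern for pre-pulling an image onto every node is a DaemonSet whose init container uses the target image, so each node's kubelet has to pull it. The prepull namespace monitored in the sample run suggests a similar approach, but this generator is an assumption, not the project's own manifest; the image name is a placeholder you supply, and the init command assumes the image ships a `true` binary.

```python
import json


def prepull_daemonset(image: str, namespace: str = "prepull") -> dict:
    """Build a DaemonSet manifest that pulls `image` onto every schedulable node."""
    labels = {"app": "prepull"}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "prepull", "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Forces the pull on each node, then exits immediately
                    # (assumes the image provides `true`).
                    "initContainers": [{"name": "pull", "image": image, "command": ["true"]}],
                    # Tiny long-running container that keeps the pod alive afterwards.
                    "containers": [{"name": "pause", "image": "registry.k8s.io/pause:3.9"}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(prepull_daemonset("IMAGE"), indent=2))
```

The output can be piped to `kubectl apply -f -` once the namespace exists; progress then shows up in `kubectl -n prepull get po`.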

Submit

Here's an example submission for a scaled down analysis:

higgsdemo submit --dataset-mapping config/demo-highmem-minimal.json --access-key ... --secret-key ... --redis-host ...

--dataset-mapping should point to one of the config files. The command spawns a set of parallel processes (one per cluster) that perform the submission. --access-key and --secret-key take the access and secret keys for either S3 or GCS.
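The one-process-per-cluster fan-out can be sketched with the standard library's ProcessPoolExecutor. Here submit_to_cluster is a hypothetical stand-in for the real per-cluster submission logic, which is not shown in this README.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List


def submit_to_cluster(cluster: str) -> str:
    """Hypothetical placeholder for the per-cluster submission work."""
    # A real implementation would create the analysis jobs in `cluster`.
    return f"submitted to {cluster}"


def submit_all(clusters: List[str]) -> List[str]:
    """Run one worker process per cluster; results come back in input order."""
    with ProcessPoolExecutor(max_workers=max(1, len(clusters))) as pool:
        return list(pool.map(submit_to_cluster, clusters))
```

Call submit_all, for example with `[f"kubecon-demo-{i}" for i in range(3)]`, from under an `if __name__ == "__main__":` guard, which multiprocessing requires on platforms that spawn rather than fork worker processes. Processes rather than threads keep each cluster's submission fully isolated.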

Watch and Cleanup

To follow the progress of the computation, run a watch against a specific cluster:

higgsdemo watch --cluster kubecon-demo-0
{'pods': {'Running': 48, 'total': 56, 'Succeeded': 8}, 'jobs': {'total': 56, 'succeeded': 8}}
...
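Each watch line is a Python dict literal, so it can be parsed with ast.literal_eval to get a completion percentage. This assumes the format stays as in the sample above; the .get default for a missing "succeeded" count is a defensive assumption.

```python
import ast


def job_progress(watch_line: str) -> float:
    """Fraction of jobs that have succeeded, parsed from one watch output line."""
    status = ast.literal_eval(watch_line)
    jobs = status["jobs"]
    total = jobs.get("total", 0)
    # Assumed: a missing "succeeded" key means no job has finished yet.
    return jobs.get("succeeded", 0) / total if total else 0.0


if __name__ == "__main__":
    line = "{'pods': {'Running': 48, 'total': 56, 'Succeeded': 8}, 'jobs': {'total': 56, 'succeeded': 8}}"
    print(f"{job_progress(line):.1%} of jobs done")  # prints "14.3% of jobs done"
```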

To reperform the computation in an existing cluster, clean it up first:

higgsdemo cleanup --cluster kubecon-demo-0

Sample Run

higgsdemo clusters-create --dataset-mapping config/demo-high-mem.json

for i in 0 1 2 3 4 5 6 7 8 9; do gcloud container clusters get-credentials --region europe-west4 kubecon-demo-$i; done

higgsdemo prepare --dataset-mapping config/demo-high-mem.json

# monitor prepull progress in a single cluster
kubectl --context ... -n prepull get po

higgsdemo submit --access-key ... --secret-key ... --dataset-mapping config/demo-high-mem.json
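The credentials loop above hardcodes ten clusters. A small sketch that derives the same gcloud commands from the number of entries in the config instead; the europe-west4 region is taken from the sample run and may differ for your setup.

```python
import json
import shlex
import sys
from typing import List


def credential_commands(config_text: str, region: str = "europe-west4") -> List[str]:
    """One get-credentials command per cluster entry in the config list."""
    count = len(json.loads(config_text))
    return [
        shlex.join(["gcloud", "container", "clusters", "get-credentials",
                    "--region", region, f"kubecon-demo-{i}"])
        for i in range(count)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for cmd in credential_commands(f.read()):
            print(cmd)
```

For example, `python credentials.py config/demo-high-mem.json | sh` (the script name is hypothetical) fetches credentials for exactly as many clusters as the config defines.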

Installation

virtualenv .venv
. .venv/bin/activate
pip install -r requirements.txt
python setup.py install

cernops/higgs-demo

Folders and files

Latest commit

History

Repository files navigation

Higgs Demo

Dataset

Configuration Files

Command Line

Clusters Create

Prepare

Submit

Watch and Cleanup

Sample Run

Installation

About

Resources

License

Stars

Watchers

Forks

Languages