LCG Web>WLCGGDBDocs>GDBMeetingNotes20170412 (2017-04-23, MaartenLitmaath)

Summary of April GDB, April 12, 2017 (CERN) DRAFT

Notes compiled by C. Biscarat & M. Jouvin ; any and all mistakes are ours.

Agenda
Introduction - I. Collier
Argus - A. Ceccanti
LHCONE/OPN Report - E. Martelli
pre-GDB "Collaborating with Other Communities" Report - I. Collier
Benchmarking
- Benchmarking WG Update - D. Giordano
- CPU Units Proposal - A. McNab
Containers
OSG All Hands Meeting - E. Fajardo

Introduction - I. Collier

WLCG Manchester workshop: registration is open ; starts at lunchtime on June 19th, ends late Wednesday afternoon ; Thursday morning: IPv6 hands-on ; afternoon: SKA
eduGAIN login to the CERN TWiki has been enabled ; users in the “wlcg-external” e-group now have edit rights on LCG topics.
“Computing and Software for Big Science” journal has now been launched (one issue available): http://www.springer.com/physics/particle+and+nuclear+physics/journal/41781

Argus - A. Ceccanti

Argus 1.7.0 is the current release in UMD, stable (bug fixes in production + port to CentOS 7).
Argus 1.7.1 targeting the May UMD release: make Argus aware of authentication profiles
Argus is being integrated in INDIGO-DataCloud AAI (OpenID Connect token)
Funding: main funding comes from INDIGO (ends in Sept. 2017) ; if EOSC-hub is approved, funding is secured for the next 3 years

LHCONE/OPN Report - E. Martelli

LHCOPN

RAL: 3rd 10G link, IPv6 configured
CERN: provisioning 3rd 100G link to Wigner

LHCONE

Traffic stable in the last months (no increase): probably due to the LHC shutdown
5 additional sites connected (PL, USA, TW)
ESnet now sees more traffic on LHCONE than on OPN

perfSonar

Waiting for v4 to stabilize the services and the underlying infrastructure
WLCG working on ETF probes to monitor the service
ATLAS working on integrating network metrics into analytics

New collaborations

Belle II: perfSonar infrastructure with MadDash setup
- Some asymmetric performances identified
NOvA: traffic between two main sites (FNAL and FZU NOvA) rerouted over LHCONE in October 2016 - modest load
Pierre Auger Observatory: experiment in Argentina, data storage in Lyon (FR), asked to go through LHCONE
Xenon: 1.5 TB/day produced at LNGS (Gran Sasso, IT), just started to connect to LHCONE
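
For scale, the Xenon daily volume quoted above can be converted into a sustained network rate. The short sketch below assumes decimal terabytes (1 TB = 10^12 bytes) and a perfectly even transfer over 24 hours; both are simplifying assumptions, not statements from the talk.

```python
def average_rate_mbit_s(terabytes_per_day: float) -> float:
    """Average rate in Mbit/s for a daily volume (decimal TB assumed)."""
    bits_per_day = terabytes_per_day * 1e12 * 8
    seconds_per_day = 24 * 60 * 60
    return bits_per_day / seconds_per_day / 1e6


# 1.5 TB/day spread evenly over a day
print(round(average_rate_mbit_s(1.5), 1))  # 138.9
```

Real transfers are bursty, so the peak rate needed would be higher than this average.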

Developments in Asia

More sites connected or willing to connect to LHCONE, including China (IHEP and CCNU)
TransPAC: initiative to offer Asia a transpacific 100G link to LHCONE (Seattle)

Network operators confident they are ready for Run3 and started to discuss Run4 challenges

Idea to build a network brokering service based on SDN to improve the usage efficiency of the network infrastructure

Next meeting

Co-located with HEPiX Fall 2017

pre-GDB "Collaborating with Other Communities" Report - I. Collier

Agenda: https://indico.cern.ch/event/578969/

Idea of the pre-GDB: invite communities already using WLCG or having upcoming requirements, to see how we can share/coexist better. Some make good use of the grid (IceCube is happy and comfortable), LIGO is struggling a bit more, and SKA is perhaps just at the beginning of deciding what to do.

Organised as a panel with a few questions ; participants: Michel Jouvin, Maarten Litmaath, Anna Scaife (SKA), John White (EISCAT 3D)

Questions:

Something particularly useful?
One thing you learned?
One next step?

A. Scaife

Good to get here and see other projects and storage needs.
To use the grid, it is useful to have a local contact: communities with contacts among WLCG people make faster progress
Looking for more collaboration: confirms that SKA would welcome WLCG participation in its next all-hands meeting in September in Manchester

M. Jouvin

Aware of some projects, good to have better overviews of the current changes.
Distributed computing is the basis of these projects within different constraints.
People are important to establish links: often ad hoc contacts so far, e.g. IceCube with the role of G. Merino
A lot of potential for a fruitful collaboration both on the infrastructure side (WLCG) and on the SW side (HSF).
We could establish a list of contacts to facilitate distribution of information, invite each other to relevant meetings/workshops and add the SW side (HSF) to future initiatives (and the current CWP initiative)

J. White (EISCAT)

Need to pick the best bits of WLCG.
One particular challenge for EISCAT is to change their culture in terms of computing: small community not used to the scale of computing they require
EISCAT may be interested in joining some WLCG meetings to learn more from WLCG experience: currently attending mainly EGI meetings, which are not as focused

Maarten

The various projects are not worlds apart, we see good opportunities for converging on technologies, some communities use the same infrastructure or same SW.
We could share our experience in setting up our model: data challenges, service challenges.
Increase collaboration and cross-participation in meetings to move to facing the big data challenges together. Avoid duplicating a lot of work.

A. McNab

Impressed by the commonalities already existing in terms of the kind of resources needed
Collaboration also important for sites as we'd like to avoid providing different sets of resources to various communities with similar needs

Ian C.

There was a bit of discussion on network activities: even if LHCONE specifically targets the LHC communities, the technology is valuable for other communities.
Wanted to demonstrate that coming to CERN was not so difficult!
Next opportunity to meet is the WLCG workshop in Manchester
Need to identify what the other opportunities are

Benchmarking

Benchmarking WG Update - D. Giordano

HEPiX WG web site: https://twiki.cern.ch/twiki/bin/view/HEPIX/CpuBenchmark

WG mandate

Investigate scaling issues in HS06 compared to real workloads
Next generation of long-running benchmark
Evaluate fast benchmarks, in particular to estimate the performance of a VM

Since February:

DB12 underwent a deep set of analyses
CMS: preliminary comparison of CMSSW performance vs. DB12, KV, HS06
Study Tier-0 job performance with passive methods

Fast benchmark

DB12 (Dirac Benchmark) shows a good correlation for simulation jobs
- Good agreement for ALICE and LHCb
- Work in progress for ATLAS and CMS
See slides for detailed analysis of DB12 performance profile
- Haswell perf boost now understood: related to the improved branch prediction in Haswell, as DB12 is dominated by branch prediction (the +45% boost appears only when the number of slots running DB12 equals the number of physical cores, and decreases when SMT is used)
- Alternative implementations done with NumPy and in C++: both are dominated by the mathematical functions rather than branch prediction, and are much faster to run. See https://twiki.cern.ch/twiki/bin/view/HEPIX/DB12VsPythonVersion#DB12np_py_A_python_version_based
Impact of Python and OS versions on DB12
- Effect related to Python version (up to 18%) but marginal effect of the number of parallel processes launched (hyperthreading)
- C++ version not really affected by OS/compiler versions
- C++ and Numpy versions have a better scaling with HS06
Also a detailed profiling study of other fast benchmark candidates
- ATLAS KV (mostly based on GEANT) exhibiting a problem similar to DB12 with SMT
- Geant4: very sensitive to non-locality of the data: exposes cache architecture differences
- HS06: CMS found it more greedy than its simulation code (ttbar simulation)
Replacement of HS06 by a fast benchmark: large divergence of opinions
- Not a large enough mix of instructions: exposed to microarchitecture optimizations
- Not a clear understanding of the medium/long-term consequences of such a choice
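
To make the fast-benchmark discussion concrete, here is a simplified sketch in the spirit of DB12: a tight pure-Python loop that draws Gaussian random numbers and accumulates sums, timed in CPU seconds. The loop body, the iteration count and the calibration constant are illustrative assumptions, not the DIRAC source, so scores from this sketch are not comparable with real DB12 values.

```python
import random
import time


def fast_benchmark(n_iterations: int = 1_000_000, calibration: float = 1.0) -> dict:
    """Time an interpreter-bound loop; a higher score means a faster slot."""
    total = 0.0
    total_sq = 0.0
    start = time.process_time()  # CPU time of this process only
    for _ in range(n_iterations):
        t = random.normalvariate(10, 1)
        total += t
        total_sq += t * t
    cpu_seconds = time.process_time() - start
    # Guard against a zero reading on very short runs
    cpu_seconds = max(cpu_seconds, 1e-9)
    # Score: iterations per CPU-second, scaled by an arbitrary calibration
    return {"cpu_seconds": cpu_seconds,
            "score": calibration * n_iterations / cpu_seconds}


if __name__ == "__main__":
    print(fast_benchmark())
```

Running one copy per job slot, first with as many copies as physical cores and then with as many as hardware threads, is one way to look at the SMT dependence mentioned above.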

Also an analysis of T0 activity performance correlation with HS06, using reco jobs (A. Sciaba)

Scaling generally good but 2 exceptions
- Not true on Opteron (6276) by a significant factor
- Haswell tends to perform better than predicted by HS06 at a 10% level
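
A check of this kind can be sketched as comparing, per CPU model, the measured job throughput with what HS06 would predict from a reference model. The model labels and all numbers below are placeholders chosen for illustration; they are not measurements from the talk.

```python
def scaling_deviation(hs06_per_core, events_per_second, reference):
    """Relative deviation of measured throughput from the HS06 prediction.

    The prediction assumes throughput is proportional to HS06, with the
    proportionality constant taken from the reference model.
    """
    ref_ratio = events_per_second[reference] / hs06_per_core[reference]
    deviations = {}
    for model, hs06 in hs06_per_core.items():
        predicted = hs06 * ref_ratio
        deviations[model] = events_per_second[model] / predicted - 1.0
    return deviations


# Placeholder inputs (hypothetical models and values)
hs06 = {"model_a": 10.0, "model_b": 12.0, "model_c": 8.0}
throughput = {"model_a": 2.0, "model_b": 2.64, "model_c": 1.2}

for model, dev in scaling_deviation(hs06, throughput, "model_a").items():
    print(f"{model}: {dev:+.0%}")
```

A positive deviation means the model delivers more to the jobs than HS06 credits it for; a negative one means less.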

HS06 successor: starting to draft requirements for its validation

Could be HS17 or a HEP suite based on experiment's workloads
Main target architecture will be x86_64
- But plan to explore other architectures of interest like ARM
As in HS06, there will be a fixed OS version and a fixed compiler version and options
Perform reproducible studies, including sharing experiment codes through CVMFS or containers
- Includes building a testbed with representative HW and apps/benchmarks

ATLAS and CMS are instrumenting pilots to run, collect and study DB12 scores vs. production jobs (including multi-threaded jobs)

Discussion:

The original DB12 shows some problems, while the NumPy and C++ versions scale better and avoid the Haswell problem: are you considering moving to NumPy or C++ for the fast benchmark?
One could provide a suite of several tools and weight the results ; one should not compare to HS06 but to the experiments payload. In this respect the proposed testbed is important and will give the opportunity to collect the results.
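
One possible way to "weight the results" of a suite is a weighted geometric mean of per-benchmark ratios against a reference machine, a common convention for combining benchmark ratios. This is only a sketch of that idea: the benchmark names, weights and ratios are hypothetical, and the WG did not settle on any aggregation method.

```python
import math


def weighted_geometric_mean(ratios, weights):
    """Combine per-benchmark speed ratios into a single score."""
    total_weight = sum(weights[name] for name in ratios)
    log_sum = sum(weights[name] * math.log(ratios[name]) for name in ratios)
    return math.exp(log_sum / total_weight)


# Hypothetical suite: ratio = machine score / reference machine score
ratios = {"sim_like": 1.20, "reco_like": 1.05, "fast_loop": 1.45}
weights = {"sim_like": 0.5, "reco_like": 0.4, "fast_loop": 0.1}
print(round(weighted_geometric_mean(ratios, weights), 3))
```

Weights could be derived from the share of each workload type in an experiment's production mix, which ties the score to the payload rather than to HS06.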

CPU Units Proposal - A. McNab

Need to prepare the ability to change/update the benchmark we use for accounting and pledges ; it could also help when there is a new HW generation delivering more CPU units to apps than reflected by the benchmark

Example of Haswell new branch prediction change
Impact on site procurements: either sites deliver more than what they are credited for, or they may buy older HW that looks less expensive but is more expensive relative to the performance it delivers

Proposal to introduce a new unit called "CPU Units (CU)" in parallel with HS06 in the accounting system (APEL, portal...) with 1.0 CU = 1.0 HS06 when introduced

Then WLCG has the freedom to change what a CU is without changing the accounting/pledging infrastructure
- Can respond to evolving experiment code
CU definition should be based on empirical evidence about experiment software performance across relevant hardware
A site should pledge in the current value of the CU, even if using old HW
- But a revision should scale up the numbers in such a way that a site will never publish less than before
On newer hardware if the new definition is sensitive to improvements in technology, then new CU value may go up
- Encourage sites to buy HW that deliver the most to experiments
Benchmark could be based on a suite of different microbenchmarks or apps
- Should be open-source, easy to run and distribute
Lower the cost of updating the benchmark: convert benchmarking from a commissioning activity into an operational activity
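
The "never publish less than before" constraint can be read as choosing a single normalisation factor whenever the CU definition changes. The sketch below is one possible interpretation, not part of the proposal: given each hardware type's current CU rating and its raw score under a new benchmark, it picks the smallest factor k such that k times the new score is at least the old rating for every type. Hardware names and numbers are hypothetical.

```python
def redefine_cu(old_cu, new_raw):
    """Return (k, new ratings) with k * new_raw[hw] >= old_cu[hw] for all hw."""
    k = max(old_cu[hw] / new_raw[hw] for hw in old_cu)
    return k, {hw: k * new_raw[hw] for hw in old_cu}


# Hypothetical per-core values
old_cu = {"older_hw": 10.0, "newer_hw": 12.0}   # current ratings (CU == HS06)
new_raw = {"older_hw": 5.0, "newer_hw": 7.5}    # raw scores, new benchmark

k, new_cu = redefine_cu(old_cu, new_raw)
print(k, new_cu)
```

With these inputs the older hardware keeps its rating and the newer hardware gains 25%, which is the incentive described above for buying hardware that delivers the most to the experiments.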

Discussion

Many of the people present were not really convinced that it solves the problem: it just shifts it
- Will allow more optimized changes with a more difficult comparison over time
Ale: thinks that if we convince APEL to support more units, we should do it in a way that allows more units, not just one more

This proposal has been presented in the Benchmarking WG and in the accounting WG already. This topic should be followed up at the Manchester workshop.

Containers

Session about what we can do with containers regarding sites.

Introduction - J. Blomer

VM vs. container virtualizations

Less overhead but less isolation: larger surface exposed to attacks
No privileged operations possible as the container user is treated as a normal user (e.g. no mount)
Not one feature but a set of features that make Linux containers: every component moving at its own pace... adding to the complexity

Container engines

Main products : Docker, Singularity
Dominated by Docker, introduced the push-pull model (from/to registry)
Singularity: new engine from HPC world, very lightweight, removing the unnecessary parts from Docker in our context
Other engines not as popular in our community: Linux containers (lxc), Rocket (rkt), systemd

Ability to build container clusters with orchestrators

Mesos (good for long-running services), Kubernetes (ability to build small clusters), ...

Containers and CVMFS

Bind mounts: mount in the container filesystems mounted in the host machine
- Can be used to access CVMFS: one shared cache for all containers running on a host (and other processes on the host)
Docker volume driver
- Integrates with Kubernetes
- Can be used also to access CVMFS from a container
From inside the container if started as a privileged container: not really used in practice
Coming soon: Docker graph driver that will allow container images to be fetched from an outside source, e.g. CVMFS
- Container images not stored as a single file in CVMFS
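
As an illustration of the bind-mount approach, the sketch below only assembles a `docker run` command line that exposes the host's /cvmfs read-only inside a container; it does not execute anything. The image name and payload command are placeholders, and a real deployment may need different mount options depending on how CVMFS is mounted on the host (for example with autofs).

```python
import shlex


def docker_cvmfs_command(image, payload, cvmfs_path="/cvmfs"):
    """Build a docker command that bind-mounts the host CVMFS read-only."""
    return ["docker", "run", "--rm",
            "-v", f"{cvmfs_path}:{cvmfs_path}:ro",  # host path : container path : mode
            image, *payload]


# "example/sl6-worker" is a placeholder image name
cmd = docker_cvmfs_command("example/sl6-worker", ["ls", "/cvmfs"])
print(shlex.join(cmd))
```

Because every container on the host sees the same mount, they all share the host's single CVMFS cache, as noted above.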

Possible dangers with containers: containers foster an attitude of "capturing the mess"

More moving parts (and moving targets) in your system
Automation required: containers must be disposable items
- In particular not carriers for data, databases...

Discussion

If containers are instantiated as a cluster, what is the benefit in terms of performance?
- The container cluster (VMs) can be instantiated in advance and shared by many containers that will start almost immediately

RAL Experience - A. Lahiff

Since RAL moved to HTCondor, a lot of improvements in the ability to contain user processes with OS features (cgroups...) but still sharing the same root file system

HTCondor introduced the Docker universe in 2015 to run payload in Docker containers

Successfully tested by RAL to run SL6 WN (in containers) on SL7 machines
Nebraska T2 migrated fully to Docker universe last summer
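
From the job side, the Docker universe is selected in the submit description. The sketch below renders a minimal description as text; the image name, executable and file names are placeholders, and how a site such as RAL routes grid jobs into containers on the worker nodes is not covered here.

```python
def docker_universe_submit(image, executable, arguments=""):
    """Render a minimal HTCondor submit description for the Docker universe."""
    lines = [
        "universe = docker",
        f"docker_image = {image}",   # image the payload runs in
        f"executable = {executable}",
        f"arguments = {arguments}",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Placeholder image and payload
print(docker_universe_submit("example/sl6-worker", "/bin/cat", "/etc/redhat-release"))
```

The resulting text would be saved to a file and handed to condor_submit.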

RAL current configuration

HTCondor 8.6.1
CVMFS bind mounted into containers
40% of the batch farm now moved to SL7 machines, with payloads from many VOs (4 LHC + others) running in containers

Container cluster managers

Using Mesos to manage many different computing activities/resources
Starting to use Kubernetes to implement a single API across on-premise resources and multiple commercial clouds
- Successfully demonstrated for CMS, LHCb and ATLAS

Plans

Add an xrootd gateway to worker nodes (requires SL7 machines)
Provide access to RHEL7 via CEs
- Easy for ATLAS and CMS
- Still need to figure out how to do it with DIRAC and ALICE
Give access to Singularity
- CMS interested in migrating from glexec to Singularity ; useful for other experiments, e.g. ATLAS
Get rid of pool accounts

Containers in ATLAS - A. Filipcic

Motivations and benefits

Similar to VMs but more flexible and no performance loss
Independence of execution environment from the OS
- Isolate ATLAS from site choices/upgrades
- Isolate sites from ATLAS constraints
Easy to make test environments
- Several different environments can be used at the same time on the same site
Common approach for execution, software distribution for all sites (including HPC)

Currently concentrating on Singularity

Docker only for specific use cases (more difficult to deploy)
Singularity easy to deploy by site: one RPM
- No specific UID required: current UID preserved when the container is started
Large-scale Singularity deployment already decided (last ATLAS S&C Workshop), starting with all sites running a modern OS
Already some good experience at several sites: encouraging all sites to deploy it on recent OS versions
- Some specific steps needed on EL6
Bind mounts: some default ones added, sites must use their local one (scratch space...)
AGIS is container-ready using the 'catchall' parameter
- May consider adding new parameters if needed
Pilots will be improved to allow a per-payload selection of the container to use, based on AGIS settings for the site
- A few weeks ahead...

Long-term plans

All ATLAS jobs will use containers
No more than a basic OS will be requested from sites (CoreOS will be enough)
- Libs, grid MW... will be added in the container - easier for sites and centralised SW deployment
- CVMFS will be used as the main distribution point for container images

Open questions

Image management
- How to manage them? Enable private images?
- Images with a common core and a VO specific part
Security
- Tracing the container activity: instructions for sysadmins
- Handling/fixing of security vulnerabilities
Deployment model:
- Minimal host OS - not compatible with WLCG site requirement ; need to agree in WLCG and if possible others (e.g. Belle II)

Time for a task force?

Jakob:

ATLAS wants to start with image files of ~2 GB in CVMFS: this is big, so why not use flat files?
Andrej: ATLAS prefers to distribute one single file.

Jeff: concerned about the impact of such a move on site responsibility in case of job errors, as the site will have less information about the problems

Brian: part of the answer may be a central syslog
Jeff: cannot log all the information about every possible error to syslog... or syslog will become the problem!

Singularity in CMS - B. P. Bockelman

Nebraska (Brian's site) runs Singularity inside Docker

All the WNs run as Docker containers

Main objective: simple isolation

Isolate pilot from payload and vice versa
- Processes that can be interacted with, files/filesystems access
Replace glexec, the current and problematic solution to isolation
Make user OS environment as minimal and identical as possible

Singularity

Provides the isolation needed by CMS, does not do resource management (the batch system does)
No daemons, no UID switching
Easy to install: default configuration appropriate, no need to edit config files
User gains no privilege being inside the container
- E.g. all setuid binaries disabled in the container

glexec replacement: Singularity meets CMS needs in terms of isolation

In fact adopted by the Isolation and Traceability TF as the glexec replacement
Ironing out the last details to allow sites to adopt it
- Currently a few sites running with this configuration

Will allow the OS installed by the site (and used by the pilot) to be decoupled from the one used to execute the payload.

The pilot is in charge of instantiating the appropriate container: can use a different container for each payload it schedules
Sites can run EL7 WNs as soon as they provide Singularity
- Otherwise, CMS may be unable to utilize the site.

Singularity images

CMS decided to use Docker images rather than native ones
Singularity can use directories (unpacked images) rather than single image file
- Image can be pretty big (a few GB)
- CMS used to distribute the directory: benefit from the resulting caching

CMS image main characteristics

EL6 image with default passwd, group, shadow for a sane environment
payload run as the pilot user
Mounting user working directory as /srv in the image
Need to figure out the most appropriate way for a site to pass CMS the information about the local file systems that must be mounted in the container
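
The characteristics above (payload run as the pilot user, working directory mounted as /srv, image distributed as a directory) map naturally onto a `singularity exec` call made by the pilot. The sketch below only builds such a command line and does not run it; the image path and the payload are hypothetical, and the actual CMS pilot may pass different or additional options.

```python
def singularity_payload_command(image, workdir, payload):
    """Build the command a pilot could use to run one payload in a container."""
    return ["singularity", "exec",
            "--bind", f"{workdir}:/srv",  # user working directory seen as /srv
            "--pwd", "/srv",              # start the payload in that directory
            image, *payload]


# Hypothetical unpacked image directory and payload script
cmd = singularity_payload_command(
    "/cvmfs/example.org/images/el6",
    "/tmp/pilot/job_1",
    ["./run_payload.sh"],
)
print(" ".join(cmd))
```

Since the pilot builds this command per payload, it can pick a different image for each job it schedules, and no UID switch is involved.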

Singularity and SAM tests

Singularity disables all setuid binaries, including glexec
but glexec is a mandatory/critical SAM test for CMS
- Have a SAM probe for Singularity: need to figure out how to OR it with the glexec test
Not all CMS tests are EL7 ready anyway

Traceability: glexec provides isolation and traceability but Singularity provides only isolation

Solution 1: sites rely on the VOs to do the appropriate logging and contact them in case of problems
- In fact already happens and some sites comfortable with it
Solution 2: VO asserts to the site what the user will run
- Basically with glexec when setting GLEXEC_CLIENT_CERT
- Work in progress to do this with HTCondor and HTCondor-CE: should be ready end of Spring. No reason to be CMS specific.

Conclusions

Sites may be able to decommission glexec as soon as they deploy Singularity. Nebraska will hopefully do this in April!
Looking for interested sites to participate. It’s an exciting but young effort: there will be some speed bumps, but it will benefit from your help!

LHCb Perspective - A. McNab

Currently 2 sites running with containers: RAL and Skygrid at Yandex

Both use containers derived from DIRAC VMs
From this experience, developing a generic LHCb container definition
- Uses Docker
- CERNVM root image (via CVMFS)
- CVMFS and init scripts to run in the container provided as Docker volumes
- Format supported by Vac and Vcycle

Singularity as a glexec replacement

Need to add a Singularity-based wrapper to replace the glexec-based one in DIRAC: no major difficulty foreseen
Plan to test the approach with LHCb DIRAC VMs replacing the sudo wrapper approach currently used
Singularity is not a requirement to support EL6 environment on EL7 hosts: Docker or VM are other possible approaches
Singularity may also be used to allow users to package their jobs as Docker images
- May help to make analysis more reproducible

Discussion:

The idea of users packaging their jobs in container images is very interesting and could be extended to other VOs ; question to be clarified: shall we let any user container run on the grid?

Security Strengths and Issues - V. Brillault

Containers decouple provisioning and VOs

OS/library independent from VOs
No VOs libraries leaking to provisioning

Containers provide a better isolation than UID switch (glexec)

WN processes and files invisible/not accessible
cgroups to manage resources used

Potential issues

Young technology: new classes of bugs in the kernel, missing support and the ecosystem changing fast
Most kernel bugs can still be exploited with containers: still need the ability to update quickly (emergency updates)
Singularity is still suid: could disappear in 7.4 but a sysctl configuration might be needed
- Disabling suid will disable OverlayFS
Singularity is an attractive technology to replace glexec but would rely on kernel security updates
- No central callout/service required: simpler configuration means fewer failures but at the price of no traceability to the end-user (see Brian/CMS talk)
- Potential impact on the way central banning is done nowadays: move from site-based central banning to VO-based central banning?

Conclusion on containers (Ian Collier + Maarten Litmaath)

There are things to explore still. A WG should be proposed to iron out the subject.

OSG All Hands Meeting - E. Fajardo

See slides.

LHC is still the most CPU consuming community.

Xenon1t (experiment at Gran Sasso) leveraged different technologies from LHC

ATLAS Rucio
GlideinWMS

Year of retirement for many components!

GRAM: replaced by HTCondor-CE
glexec: Singularity as the replacement
GIP/BDII: replaced by OSG Collector integrated into HTCondor
Gratia: replaced by a decentralized ElasticSearch
Bestman2: replaced by load-balanced GridFTP

Big newcomers:

Singularity

Opportunistic storage replaced by StashCache

Used by LIGO

Future work:

Monitoring: SAM is too CMS-specific, move to component self-testing
Simplify the VO zoo since all workflows are pilot-based
- Long-term goal: throw away GUMS, the authorization service which is adding a lot of complexity

Current OSG project ends June 2018: discussions in progress for another 5-year extension

Topic revision: r5 - 2017-04-23 - MaartenLitmaath
