Summary of October GDB, October 11, 2017 (CERN)
Agenda
https://indico.cern.ch/event/578991
Introduction - I. Collier
presentation
See slides, in particular for next workshop details
- Workshop date: March 26-29 2018
Batch Systems and CE - M. Wadenstein
presentation
Editorial board set up: Maarten, Mattias, Alexandra, Helge
- An egroup will be created to make communication easier
DPHEP Update - J. Shiers
presentation
Last DPHEP workshop (March): HEP is no longer alone in having large production LTDP repositories
FAIR data principles
- Findable, Accessible, Interoperable and Re-usable
- Based on globally unique and eternally persistent identifiers!
- Interoperable: not between HEP experiments but between all sciences!
- Trustworthy Data Repositories (TDRs) at the core of FAIR and ability to share data
- FAIR evolved to include SW in addition to data+metadata
ISO 16363 certification, based on the OAIS breakdown: CERN is preparing for it
- Many elements covered by current CERN practices but some weak areas to be addressed, sometimes just a matter of better documenting what is done
- Not only scientific data but also organizational data
- See slides for details about certification contents and challenges
- Some topics require close collaboration with experiments
- Some projects, like EUDAT, encourage their members to get certified
- CERN goal: complete first certification prior to next ESPP update in 2020
DPHEP: agreement that LTDP includes data + documentation + SW + environment
- Invenio-based services often used for documentation
- CVMFS+CERNVM used for SW and environment
- Less clear: which sites offer bit preservation as a service
- Difficulty: motivating people to tackle a problem that will only show up far in the future, when most of the current people will no longer be there.
- Collaboration less lively than before: a workshop every 2 years
- Most active sites: CERN and FNAL (neutrino experiments)
CRIC - A. D. Girolamo
presentation
See slides.
Application Perf vs. Core Number - M. Alef
presentation
See slides.
WLCG Storage Accounting - O. Keeble
presentation
Convergence reached on recommendations! Time to start implementing a prototype!
- DPM and EOS already have the recommendations implemented: experiments can start using them
- Also need to get dCache and Xrootd on board for the prototype implementation: they were involved in the recommendations and no problem is foreseen
- Not yet time for deployment: the implementations need to be integrated into the standard product distributions first
CVMFS Update - J. Blomer
presentation
2.4 released last August
- Versioning: has always been present but never exposed. Changes between any two snapshots can now be displayed on the release manager machine (see the sketch after this list)
- On the client side, the history is exposed under the .cvmfs subdirectory (.cvmfs is not shown by ls but is properly handled if specified explicitly)
- Branching: allows creating a new snapshot from any existing revision rather than just the latest one
- Cache plugins: offer the possibility to locate the cache somewhere other than a local disk on the worker node
- Can be in-memory, Xrootd, Hadoop...
- Support for HPC systems, diskless WN...
- Plugins are in charge of ensuring cache consistency: a key property the CVMFS client relies on
- Cache hierarchy: for example a small hot cache in memory and a warm cache on a cluster file system
- Repository update time reduced from 30 min to 5 min
- Support for triggered replication (as opposed to the current pull-based replication by the Stratum 1s)
- Some refinements to come to address possible security concerns in push mode
- Debian 8/9 support
- Collaboration with LIGO: CVMFS used for data access, a setup developed by OSG
- Support for Yubikey
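A minimal sketch (an assumption, not from the slides) of how the new history features might be driven on the release manager machine, using Python via subprocess; the repository name is hypothetical and the exact cvmfs_server sub-command options should be checked against the 2.4 documentation:

    # Hedged sketch: inspect CVMFS 2.4 snapshot history on the release manager machine.
    # "example.cern.ch" is a hypothetical repository name; the tag/diff options are
    # assumptions to be verified against the CVMFS 2.4 documentation.
    import subprocess

    REPO = "example.cern.ch"

    def cvmfs_server(*args):
        """Run a cvmfs_server sub-command and return its standard output."""
        result = subprocess.run(["cvmfs_server", *args],
                                capture_output=True, text=True, check=True)
        return result.stdout

    # List the named snapshots (tags) recorded for the repository.
    print(cvmfs_server("tag", "-l", REPO))

    # Display the changes between two existing snapshots.
    print(cvmfs_server("diff", "-s", "snapshot-A", "-d", "snapshot-B", REPO))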
CVMFS deployment status: 36% of sites still at 2.1.19/20
- Can we set the new WLCG baseline to 2.4? Probably yes... but we need to be cautious about the operational implications
- Preference is:
- Check that the main WLCG sites are running 2.4 (are part of the 26%)
- Encourage 2.4 deployment
- ALICE may want to use the new versioning/branching features: start wide deployment at ALICE sites and get feedback
- Add an ETF test for the CVMFS version?
- Once we are confident enough, change the baseline
System performance and cost model working group - M. Schulz
presentation
Introduction to SciTokens - B. Bockelman
presentation
SciTokens: project started last July aiming at introducing a capabilities-based authz infrastructure
- Also provides a reference platform combining CILogon, HTCondor and CVMFS
Capabilities-based authz: rather than globally exposing identity and policy, just share capabilities
- Token-based infrastructure: tokens describe the capabilities a bearer has (illustrated in the sketch after this list)
- For traceability, the token can contain an identifier that the VO/user can use to recover the identity associated with the token. Better privacy preservation.
- Access is granted only to a subset of the resources
- The whole world moved authorization to capabilities: e.g. OAuth2
- In OAuth2, 3 components: the authz server (also the identity provider), the resource owner (end-user) who approves the authz, and the client that receives tokens (e.g. a web app)
- In fact the concepts are very close to what ALICE has been doing for a decade...
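A minimal sketch (an illustration, not from the presentation) of what such a capability-bearing token could look like, using the PyJWT library; the issuer, audience, scope paths and opaque subject identifier are invented for the example, and a shared secret is used instead of the issuer's asymmetric key purely to keep the snippet self-contained:

    # Hedged sketch: a capability-bearing token in the SciTokens spirit, built with PyJWT.
    # Issuer, audience, scope paths and the opaque "sub" are illustrative assumptions;
    # a real deployment would sign with the VO issuer's private key, not a shared secret.
    import time
    import jwt  # pip install PyJWT

    SECRET = "demo-secret"  # stand-in for the issuer's signing key

    claims = {
        "iss": "https://tokens.example-vo.org",  # the VO's token issuer
        "sub": "opaque-id-1234",                 # opaque identifier; only the VO can map it to a user
        "aud": "https://storage.example-site.org",
        "exp": int(time.time()) + 3600,          # short lifetime; renewal handled elsewhere
        "scope": "read:/store/vo write:/store/vo/user/opaque-id-1234",  # the capabilities granted
    }
    token = jwt.encode(claims, SECRET, algorithm="HS256")

    # The resource only checks the capabilities carried by the token, not who the bearer is.
    def authorized(tok, operation, path):
        decoded = jwt.decode(tok, SECRET, algorithms=["HS256"],
                             audience="https://storage.example-site.org")
        return any(op == operation and path.startswith(prefix)
                   for op, _, prefix in (c.partition(":") for c in decoded["scope"].split()))

    print(authorized(token, "read", "/store/vo/data/file.root"))   # True
    print(authorized(token, "write", "/store/vo/data/file.root"))  # False

The point of the example is that the resource grants the read because the token carries a matching capability, without ever learning which user made the request.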
SciTokens main work areas:
- Integrate OAuth2 into HTCondor submit host
- CILogon OAuth2 support being enhanced to support VO scopes
- HTCondor will manage token lifetime, renewal...
- Xrootd and CVMFS are being enhanced to access data using a token rather than a grid credential (see the sketch after this list)
- Tokens for distributed infrastructure: the token is not for one particular "client" but for all possible clients in the VO
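A minimal sketch (an assumption, not part of the presentation) of how a job could present such a token when reading data over a storage element's HTTPS interface; the endpoint URL and the environment variable pointing at the token file are invented for the example:

    # Hedged sketch: token-based data access over HTTPS using a standard bearer header.
    # The endpoint and the SCITOKEN_FILE environment variable are illustrative assumptions;
    # in practice the submit host / HTCondor would place the renewed token in the job sandbox.
    import os
    import requests

    token = open(os.environ["SCITOKEN_FILE"]).read().strip()

    resp = requests.get(
        "https://storage.example-site.org:1094/store/vo/data/file.root",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    data = resp.content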
See demo on
http://demo.scitokens.org
--
IanCollier - 2017-10-23