WLCG Operations Coordination Minutes, Feb 4, 2021
Highlights
Agenda
https://indico.cern.ch/event/999296/
Attendance
- local:
- remote: Adrian (APEL), Alessandra (Napoli), Alessandro Di G (ATLAS), Alessandro P (EGI), Andreas H (DESY-ZN), Andreas P (KIT), Andrew (TRIUMF), Benjamin (EGI), Catalin (EGI), Christoph (CMS), Concezio (LHCb), Daniel (security), David Cameron (ATLAS), David Cohen (Technion), David South (ATLAS), Federico (LHCb), Gavin (CERN computing), Giuseppe (CMS), Hannah (CERN SSO), Julia (WLCG), Maarten (ALICE + WLCG), Marian (networks + monitoring), Matt (Lancaster), Panos (WLCG), Paolo (CERN SSO), Thomas (DESY-HH)
- apologies:
Operations News
- the next meeting is planned for March 4
Special topics
Remaining dependencies on lcg-bdii.cern.ch
See the APEL presentation.
ALICE
None
ATLAS
None
CMS
None
LHCb
LHCb/DIRAC queries lcg-bdii.cern.ch:2170 to get CE info
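For illustration, a query of this kind can be sketched as an anonymous LDAP search against the BDII. This is a minimal sketch and not necessarily how DIRAC issues it: the function name `bdii_ce_query` and the attribute selection are assumptions made here, while the `o=grid` base and the `GlueCE` object class are those of the GLUE 1.3 schema published by a BDII.

```python
import shlex


def bdii_ce_query(host="lcg-bdii.cern.ch", port=2170,
                  attrs=("GlueCEUniqueID", "GlueCEStateStatus")):
    """Return the argv of an anonymous ldapsearch listing GLUE 1.3 CE entries.

    The helper name and the default attribute list are illustrative
    assumptions; only the host and port come from the minutes.
    """
    return [
        "ldapsearch",
        "-x",                      # simple (anonymous) bind
        "-LLL",                    # plain LDIF output without comments
        "-H", f"ldap://{host}:{port}",
        "-b", "o=grid",            # GLUE 1.3 base of a BDII
        "(objectClass=GlueCE)",    # select CE entries only
        *attrs,
    ]


if __name__ == "__main__":
    # Print the command so that it can be pasted into a shell.
    print(shlex.join(bdii_ce_query()))
```

Passing a different `host` would point the same query at another top-level BDII.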
Discussion
- Alessandro P:
- most sites have been using lcg-bdii.cern.ch by default
- they can switch to the top-level BDIIs of their NGIs instead
- or they can define the accounting message broker explicitly
- Adrian: the explicit definition has been supported for quite a while
- Julia:
- sites probably should just do that
- will EGI still consider setting up a top-level BDII then?
- Federico: we use the BDII to discover new CEs as well as CE details
- Maarten:
- there are risks associated with running on resources discovered in the BDII
- normally VOs should first validate the resources they want to entrust with jobs
- Federico:
- we use the BDII to discover if our list of CEs for a site must be updated
- we also use it to discover opportunistic resources for MC simulation jobs
- Thomas: our HTCondor CEs are not published in the BDII
- Julia: the GOCDB also has semi-static info
- Federico: it lacks queues and whether to use single- or multi-core jobs
- Gavin: what do other experiments do?
- Julia: they first run tests and negotiate with the service providers
- Adrian: in principle the GOCDB could be enhanced
- Maarten:
- we already looked into that in past years
- missing functionality was put into CRIC instead
- Julia: CRIC can bridge the gap, but site admins would have to update it
- Andreas P:
- the information in the BDII cannot always be considered reliable
- that is why there are systems like AGIS and CRIC
- other DIRAC VOs do not have to be affected by the decommissioning of the CERN BDII
- Alessandro P:
- DIRAC can use a list of BDII services provided by big NGIs
- note that EGI needs site- and top-level BDII services for various use cases
- Julia:
- DIRAC can use other BDII instances
- sites can adjust their APEL configuration or use a different BDII
- we will follow up on these conclusions
- Federico:
- I can change the default in DIRAC, which will then be taken by all DIRAC VOs
- the definition can be a list of BDIIs
- Catalin: if CERN stops its top-level BDII, other NGIs might follow?
- Maarten:
- other NGIs have other communities to support
- running a BDII may be one of the requirements
- Catalin: EGI may set up a catch-all BDII later this year
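The idea of defining the BDII as a list, as discussed above, can be sketched as a simple ordered failover. This is only a sketch under stated assumptions, not the DIRAC implementation: the names `pick_bdii` and `tcp_reachable` and the example host names are hypothetical, and entries are assumed to have the form "host:port".

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_bdii(endpoints: Iterable[str],
              probe: Callable[[str, int], bool] = tcp_reachable) -> Optional[str]:
    """Return the first "host:port" entry accepted by the probe, else None.

    Entries are tried in the order given, so the preferred BDII goes first.
    """
    for endpoint in endpoints:
        host, _, port = endpoint.rpartition(":")
        if probe(host, int(port)):
            return endpoint
    return None


# Hypothetical top-level BDIIs of two NGIs; these are not real host names.
CANDIDATES = ["bdii.ngi-a.example:2170", "bdii.ngi-b.example:2170"]
```

A caller would use `pick_bdii(CANDIDATES)` and treat `None` as "no BDII available". The probe is a parameter so that the selection logic can be exercised without network access.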
Experiment use cases for altsecurityidentities in XLDAP service
See the presentation.
ALICE
None
ATLAS
Only through CRIC
CMS
- certificate mapping to CERN user name for CMS Grid tools
- require CERN (or WLCG) foreign certificate registration and mapping to CERN usernames and email addresses with well defined API (not necessarily LDAP)
LHCb
- Federico: LHCb does not use that functionality
Discussion
- Hannah: we are trying to move away from certificates
- Giuseppe: could the functionality be implemented in CRIC?
- Panos:
- CMS do not want to have the information mixed into their instance
- we could set up a separate instance or use the WLCG CRIC for it
- as ATLAS also have a dependency, it does not look specific to a VO
- Paolo:
- we implemented the functionality only to support CERN SSO use cases
- it was never meant for storing external certificate details
- Hannah: what would be an argument against using CRIC?
- Julia: CRIC was meant for topology use cases, not authorization
- Julia: we need to discuss this internally
- Christoph: could the functionality be added to the new SSO?
- Paolo:
- it would imply significant extra development effort
- we currently depend a lot on the legacy infrastructure for it
- Hannah: the current schema is even specific to Windows
- Julia: what is the timeline for the new SSO to replace the legacy system?
- Hannah: a few more years before the old back-end can be stopped
- Julia: that is very similar to the timeline for phasing out certificates!
- Alessandro Di G: how may we keep the functionality?
- Hannah:
- as it is related to the grid, maybe IAM could handle it instead?
- please bring it up in the Authorization WG
- Maarten: we need to optimize the overall effort spent on this legacy use case
- Julia: we will take it offline
Middleware News
- Useful Links
- Baselines/News
Tier 0 News
Tier 1 Feedback
Tier 2 Feedback
Experiments Reports
ALICE
- High to very high activity on average in the last weeks.
- New record reached on Jan 24 and 31: 176k concurrent jobs.
- No major problems.
ATLAS
- Stable running over the last two months, including the Xmas break
- Improvements in upgrade software mean a lot fewer jobs with very high memory requirements
- Problem with Swiss CA affected data transfers to/from Uni Bern and CSCS
- TPC: ATLAS will concentrate on HTTP as the only protocol; all xrootd TPC transfers have stopped
- Migration status: dCache: 17, DPM: 23, StoRM: 1, EOS: 1, Xrootd: 2, Total: 44
CMS
- CMS collaboration meeting this week
- running smoothly at around 340k cores
- KIT, CNAF and RAL contributed beyond pledge
- usual production/analysis split of 3:1
- main processing activities:
- Run 2 ultra-legacy Monte Carlo
- Run 2 pre-UL Monte Carlo
- on track or beyond on HPC allocation use
- sustained contribution from US HPCs
- prefer a CentOS replacement with longevity, i.e. a support cycle of >>5 years
- no BDII dependence
- require CERN (or WLCG) foreign certificate registration and mapping to CERN usernames and email addresses with well defined API (not necessarily LDAP)
LHCb
- Federico: essentially NTR
Task Forces and Working Groups
GDPR and WLCG services
Accounting TF
- Discussing with the APEL developers the integration of the new benchmark into the accounting flow
Archival Storage WG
Containers WG
CREAM migration TF
Details here
Summary:
- 90 tickets
- 57 done: 29 ARC, 27 HTCondor, 1 none
- 8 sites plan for ARC, 6 are considering it
- 11 sites plan for HTCondor, 6 are considering it, 5 consider using SIMPLE
- 1 ticket without reply
Discussion
- Marian:
- when can CREAM support be switched off in ETF?
- already done for CMS
- Maarten:
- EGI are ticketing sites that did not replace their CREAM CEs yet
- sites have until the end of Feb to migrate without penalty
- if a few more weeks can be tolerated, let's support CREAM until March
- ATLAS, LHCb: no objections
- Julia:
- CREAM support in ETF can be switched off at the start of March
- the remaining open tickets should be updated with that information
dCache upgrade TF
- Almost done. 37 out of 41 instances migrated to 5.2.15 or higher
DPM upgrade TF
- 34 out of 49 DPM sites have migrated to DPM 1.14 and enabled macaroons
StoRM upgrade TF
- 10 out of 24 sites upgraded to 1.11.19
Information System Evolution TF
- CMS CRIC has been upgraded to the latest release
- WLCG CRIC has been upgraded to the latest release
- MONIT team is planning to switch to the CRIC API instead of the experiments' VO feeds
- Improved home page
- Discussed with the network and perfSONAR experts the functionality needed for CRIC to become a WLCG network topology source
- Migration of AGIS to ATLAS CRIC is ongoing
IPv6 Validation and Deployment TF
Detailed status here.
Monitoring
Network Throughput WG
- perfSONAR infrastructure - 4.3.2 is the latest release
- WLCG/OSG Network Monitoring Platform
- Discussing with the CRIC team the possibility of using it to store the aggregated perfSONAR topology (GOCDB/OSG/NREN/etc.)
- Work on publishing directly from perfSONAR toolkits - tests are ongoing
- An issue was identified with central configuration (psconfig/PWA), which is being investigated in collaboration with perfSONAR developers (psconfig degraded for now)
- EU project ARCHIVER will use perfSONAR to test cloud connectivity
- WLCG Network Throughput Support Unit: see the twiki for a summary of recent activities.
Traceability WG
Action list
Specific actions for experiments
Specific actions for sites
AOB