DRAFT
WLCG Operations Coordination Minutes, Nov 11, 2021
Highlights
Agenda
https://indico.cern.ch/event/1094950/
Attendance
- local:
- remote:
- apologies:
Operations News
Special topics
Monitoring of Data Challenges. Lessons learned and plans for improvements.
Middleware News
- Useful Links
- Baselines/News
- Follow-up of the fallout from the Brazilian CA renewal:
- canl-java v2.6.0 has been tested OK with dCache
- it will be backported to supported branches, ETA Dec
- the StoRM team have been asked to update from v2.5.0
Tier 0 News
Tier 1 Feedback
Tier 2 Feedback
Experiments Reports
ALICE
- Mostly business as usual
- The tape challenge was very successful
ATLAS
CMS
- Good CPU usage, above 300k cores on average, with a sizable contribution from HPC
- Main activity: Run 2 ultra-legacy Monte Carlo
- WebDAV deployment done for Tier-1/2 sites, now proceeding at the Tier-3 level
- SAM tests for WebDAV are ready to go into production
- Planning to add IAM to the CMS production VOMSes list today
- Data challenges and tape tests finished, final report in preparation
- (Partial) network outage at CERN late on Friday afternoon (Oct 15th): OTG:0066817
- Several CMS services affected, particularly CMS webservices
- Most issues could quickly be fixed
- Main issue: voms-admin clients failing (voms-proxy-init was working, though)
- MonIT monitoring unavailable due to an HDFS outage (Nov 1st) caused by DNS lookup errors: OTG:0067144
LHCb
- Smooth running at 140k cores
- Low number of MC, WG and Analysis production requests in the queue
- Reprocessing (stripping) of 2016 data is waiting for the request for validation
- Tape data challenge finished on 22/10/2021
- A throughput of ~10 GB/s was achieved
- Cleaning of the test data is still ongoing
- The challenge helped to detect and solve various problems and bottlenecks
Task Forces and Working Groups
GDPR and WLCG services
Accounting TF
Archival Storage WG
Containers WG
CREAM migration TF
Details here
Summary:
- 90 tickets
- 84 done: 39 ARC, 40 HTCondor, 1 both, 1 K8s, 3 none
- 1 site plans for ARC, 1 is considering it
- 2 sites have or plan for HTCondor, 1 is considering it
No change since last month.
dCache upgrade TF
DPM upgrade TF
StoRM upgrade TF
Information System Evolution TF
IPv6 Validation and Deployment TF
Detailed status here.
Monitoring
Network Throughput WG
- perfSONAR infrastructure: 4.4.1 is the latest release (please update ASAP; we also recommend rebooting all nodes after the update)
- WLCG/OSG Network Monitoring Platform
- Work is ongoing to resolve issues reported to the perfSONAR team; a number of issues are already fixed, but some are still open
- Recent and upcoming WG updates:
- WLCG Network Throughput Support Unit: see the twiki for a summary of recent activities.
Traceability WG
Transition to Tokens and Globus Retirement WG
- CMS are starting to use their IAM VOMS endpoint in production
Action list
Specific actions for experiments
Specific actions for sites
AOB