Week of 230717
WLCG Operations Call details
- The connection details for remote participation are provided on this agenda page.
General Information
- The purpose of the meeting is:
- to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
- to announce or schedule interventions at Tier-1 sites;
- to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
- to provide important news about the middleware;
- to communicate any other information considered interesting for WLCG operations.
- The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
- The SCOD rota for the next few weeks is at ScodRota
- Whenever a particular topic needs to be discussed at the operations meeting requiring information from sites or experiments, it is highly recommended to announce it by email to the wlcg-scod list (at cern.ch) to allow the SCOD to make sure that the relevant parties have the time to collect the required information, or invite the right people to the meeting.
Best practices for scheduled downtimes
Monday
Attendance:
- local: Maarten (WLCG + Alice + GGUS), Panos (WLCG + CMS + Chair)
- remote: Andrew (TRIUMF), Daniele (CNAF), Darren S. (NDGF), Darren M. (RAL), David M. (FNAL), Julia (WLCG), Mark (LHCb), Onno (NL-T1), Peter (ATLAS), Xavier (KIT)
Experiments round table:
- LHCb reports:
- Tier 1:
- CNAF:
- Chris H working with site to do significant remounting/reconfiguration of storage on Wednesday (GGUS:162740)
Sites / Services round table:
- ASGC:
- BNL: NTR
- CNAF:
- GGUS:162189 ticket still open; is the "waiting for reply" status addressed to us or to FTS? We don't have a recent log for a failed transfer since transfers have not been submitted for some weeks.
- Mark will have a look, the ticket can most probably be closed
- LHCb Storage Area reorganization - disk and tape filesystems unmounted: https://goc.egi.eu/portal/index.php?Page_Type=Downtime&id=34238 - keeping track of this activity on the ticket: GGUS:162740
- EGI:
- FNAL:
- IN2P3:
- JINR:
- KISTI:
- KIT: NTR
- NDGF:
- NL-T1: After last Thursday's kernel downgrade, the Sara dCache now seems to be running smoothly. Contrary to what we announced, IPv6 is still the preferred protocol. If there are still problems, we could plan another downtime to change that.
- Maarten asked what was actually wrong with the kernel. The new kernel is a bleeding-edge one, so problems have arisen; NL-T1 is migrating to the LT kernel, which should be more stable.
- Maarten suggested that the IPv6 issues could be raised with a larger audience on a HEPiX mailing list, probably that of the IPv6 working group. Onno will bring this up with the network team.
- NRC-KI:
- OSG:
- PIC: We have an incident with our IBM TS4500 library: it seems the tape robot arm broke on Wednesday last week. This is a serious incident and technicians are working on it. We expect the library to come back in a few days. For the moment writes are stopped (thanks to ATLAS, CMS and LHCb for stopping writes to the tape system at PIC). Sorry for the inconvenience.
- RAL: NTR
- TRIUMF: NTR
- CERN computing services:
- CMS IAM DB down for an OS upgrade for approx. 7 minutes, within the window July 18 07:00 - 08:30 (OTG:0078587)
- CERN storage services:
- CERN databases:
- GGUS: NTR
- Monitoring:
- Middleware: NTR
- Networks:
- Security:
AOB: