Week of 130408
Daily WLCG Operations Call details
To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:
- Dial +41227676000 (Main) and enter access code 0119168, or
- To have the system call you, click here
The SCOD rota for the next few weeks is at ScodRota
WLCG Availability, Service Incidents, Broadcasts, Operations Web
General Information
Monday
Attendance:
- local: Alessandro, Belinda, Jarka, Maarten, Maria D, Simone
- remote: David, Joel, Kyle, Lisa, Michael, Onno, Pepe, Rolf, Salvatore, Thomas, Tiju, Wei-Jen, Xavier
Experiments round table:
- ATLAS reports
- T0
- GGUS:92166 (transfers to CERN failing with "Error with credential") is still open and causing trouble. The ticket was opened at the beginning of March; the issue has been intermittent and, AFAIK, never really understood. CMS observed the same issue at some point. ATLAS updated the ticket today with the most recent failures. Please investigate.
- Maarten: the expert was overloaded with other urgent matters and then away on holidays; will follow up offline
- Simone: the matter has not been critical because the transfers usually make it OK later, but the monitoring very often has a lot of red, which makes other issues difficult to spot
- T1s
- Issue with file staging at CNAF. GGUS:93165 has been submitted.
- CMS reports
- LHC / CMS
- Re-reconstruction of 2012 data is in the tails; load at the T1 sites is small
- CERN / central services and T0
- We are beginning to treat CERN more as a T1 in terms of transfers and processing
- Tier-1:
- IN2P3 HammerCloud and SAM test failures over the weekend seem solved, but (as of last night) the GGUS tickets are still open
- Tier-2:
- LHCb reports
- Mainly user jobs with some MC ongoing.
- T0:
- SLS sensor for LHCb LFC flickering every 5 minutes.
- T1:
Sites / Services round table:
- ASGC - ntr
- BNL - ntr
- CNAF
- the issue reported by ATLAS appears to be due to the StoRM configuration; the ticket should be updated soon
- FNAL - ntr
- IN2P3 - ntr
- KIT
- the issues with ATLAS storage reported last week are not yet resolved and the cause is still unknown; a downtime may be needed for updating GPFS
- NDGF - ntr
- NLT1 - ntr
- OSG - ntr
- PIC - ntr
- RAL
- at-risk downtime tomorrow morning for network maintenance; 2 short breaks are expected and FTS will be drained beforehand
- dashboards - ntr
- GGUS/SNOW - ntr
- storage
- the CASTOR file update feature has been disabled today as announced
AOB:
Thursday
Attendance:
- local: Alex, Jarka, Joel, Luca M, Maarten, Simone
- remote: Gareth, Jeff, Kyle, Lisa, Lucia, Michael, Rolf, Stefano, Thomas, Wei-Jen, Xavier
Experiments round table:
- CMS reports
- LHC / CMS
- Re-reconstruction of 2012 data is in the tails; load at the T1 sites is small. User analysis continues at a constant pace.
- CERN / central services and T0
- We are beginning to treat CERN more as a T1 in terms of transfers and processing
- Tier-1:
- Tier-2:
- LHCb reports
- Mainly user jobs with some MC ongoing.
- T0:
- SLS sensor for LHCb LFC flickering every 5 minutes. (RQF:0190901)
- T1:
Sites / Services round table:
- ASGC
- network uplink interrupted Tue afternoon, fixed Tue night
- BNL - ntr
- CNAF - ntr
- FNAL - ntr
- IN2P3
- local ALICE contact reported MonALISA tests being in error
- Maarten: the job numbers are known to be underreported; that will be debugged
- after the meeting: the issue with job numbers remains to be fixed, while test results look OK
- KIT - ntr
- NDGF - ntr
- NLT1 - ntr
- OSG - ntr
- RAL
- yesterday's planned intervention affecting FTS went OK
- dashboards - ntr
- grid services
- storage
- Apr 18 10:00 proposed for EOS-ALICE upgrade with ~30 min downtime
AOB: