Week of 130527
Daily WLCG Operations Call details
To join the call at 15:00 CE(S)T, Monday to Friday inclusive (in CERN 513 R-068), do one of the following:
- Dial +41227676000 (Main) and enter access code 0119168, or
- To have the system call you, click here
The SCOD rota for the next few weeks is at ScodRota
WLCG Availability, Service Incidents, Broadcasts, Operations Web
General Information
Monday
Attendance:
- local: Felix, Jarka, Vladimir, Stefan, Maarten, Alexandre
- remote: Roger, Paolo, Rolf, Onno, Stefano, Pepe
Experiments round table:
- ATLAS reports (raw view) -
- Central services
- One of the pilot factories (voatlas171) at CERN stalled on 24 May after 13:00 UTC. It was restarted on Sunday 26 May shortly after 01:00 UTC. The issue is in the APF itself and is being addressed with the pilot factory developers. A temporary fix (auto-restart) was deployed on Sunday.
- T0
- ALICE -
- IN2P3: 110 TB added through 2 additional xrootd servers, thanks!
- LHCb reports (raw view) -
- Incremental stripping campaign in progress and MC productions ongoing
- T0: (GGUS:94346) CVMFS doesn't update the local cache correctly
- T1:
- Other: (GGUS:93966) request to GGUS to allow fine-grained SE reporting by sites. The current suggestion is to have different endpoints for Tape and Disk.
Sites / Services round table:
- IN2P3: NTR
- NDGF: some pools went down during Saturday/Sunday night; the problem was fixed on Sunday afternoon. Downtime today at 16:00 for some pools.
- CNAF: NTR
- ASGC: NTR
- NL-T1: NTR
- PIC: NTR
- Dashboard: NTR
- Grid services: NTR
AOB:
Thursday
Attendance:
- local: Maria, Alessandro, David, Jan, Maarten, Stefan
- remote: David, Vladimir, Ronald, Lisa, Roger, Tiju, Rolf, Jeremy, Paolo, Rob, Pepe, Felix
Experiments round table:
- ATLAS
- Central services
- APF: the issue from Monday has been understood and fixed. More resiliency has been set up on the APF machines so that at least 2 nodes always cover each PandaResource.
- T0/1
- INFN-T1: high percentage of failures; problems getting the TURL of files within a few seconds (GGUS:94389)
- RRC-KI-T1: transfer failures (GGUS:94363), now solved
- CMS
- Production activity at moderate levels at the T1s and reasonably heavy at CERN.
- GGUS:94330 for IN2P3 over the weekend: CVMFS problem quickly fixed (ticket solved)
- LHCb
- Incremental stripping campaign in progress and MC productions ongoing
- T0: (GGUS:94346) CVMFS doesn't update the local cache correctly
- T1:
- Other: (GGUS:93966) request to GGUS to allow fine-grained SE reporting by sites. The current suggestion is to have different endpoints for Tape and Disk.
Sites / Services round table:
- ASGC: NTR
- NL-T1: NTR
- FNAL: NTR
- NDGF: some pools in Slovenia unavailable. Alessandro: how many TB? Roger: around 650-700 TB of ATLAS data is stored on these pools.
- RAL: NTR
- IN2P3: NTR
- GridPP: NTR
- CNAF: problem with SRM, to be fixed today
- OSG: NTR
- PIC: Announcement: downtime declared for 10 June for an upgrade of the dCache head nodes and an intervention on the network.
- Dashboard: NTR
- Storage: EOS/CMS has been unstable lately; a fix is available and needs to be deployed. SRM also showed instabilities. Dropped out of the redirector in Bari; the problem is fixed. Reminder: several transparent interventions on the Castor databases for all VOs will be done next week.
- GGUS: important GGUS release next Wednesday, 2013/06/05, especially due to the number of Support Units being decommissioned. Check the development items list at http://bit.ly/14Jhw0C for details.
AOB: