Week of 131111
WLCG Operations Call details
To join the call, at 15.00 CE(S)T, by default on Monday and Thursday (at CERN in 513 R-068), do one of the following:
- Dial +41227676000 (Main) and enter access code 0119168, or
- To have the system call you, click here
The SCOD rota for the next few weeks is at ScodRota
WLCG Availability, Service Incidents, Broadcasts, Operations Web
General Information
Thursday
Attendance:
- local: Arne, Ignacio, Maarten, Marcin, Massimo, Stefano B
- remote: Alexander, Felix, Kyle, Lisa, Michael, Pepe, Roger, Rolf, Stefano, Tiju, WooJin
Experiments round table:
- ATLAS reports (raw view) -
- Central services
- T0/T1
- TAIWAN-LCG2 data loss tracked in BUG:103058.
- PIC GGUS:98821: a few files were unavailable; now solved. The site's explanation: "We restarted ATLAS pools in order to apply new xrootd plugins. In theory it was a transparent action, but seems some unique files were requested during the restart of the dCache process."
- CMS reports (raw view) -
- No more data from detector until next Spring.
- Central Production and Analysis running normally. Still having CVMFS pains here and there: so far convenient, but not as solid as a local installation. New "feature": correctly installed and working software starts misbehaving at random times.
- From GGUS land:
- GGUS:98817 T2_IN_TIFR came out of a long downtime and was not in the WLCG BDII. Eventually understood by ASIA ROC as "site was blacklisted because working badly in the past". Being retested now. Basically a communication problem.
- GGUS:98733 T1_US_FNAL problem with a local disk: a discussion between FNAL admins and FNAL CMS computing operators; not sure why it ended up in GGUS to begin with. Being followed up by CMS Data Ops.
- ALICE -
- ALICE sites that care about the SAM Availability and Reliability reports should ensure they look OK in the ALICE SAM tests, which will be used instead of the Ops tests as of Jan 2014.
Sites / Services round table:
- ASGC - ntr
- BNL - ntr
- CNAF - ntr
- FNAL - ntr
- IN2P3
- dCache has been upgraded to a SHA-2 compliant version; the downtime needed to be extended until Wed morning for a DB conversion
- KIT - ntr
- NDGF
- downtime today for the dCache head node upgrade, which went OK
- Sun Nov 17 network maintenance affecting all Swedish pools hosting ALICE and ATLAS data for 10 min
- Mon Nov 18 network maintenance for OPN at HPC2N site, affecting access to ALICE and ATLAS data hosted there
- NLT1
- trouble with one dCache file server hosting data for ALICE, ATLAS and LHCb; being worked on
- OSG - ntr
- PIC
- see ATLAS report
- lots of CMS jobs with heavy use of the SE caused SAM tests to time out; being looked into
- RAL - ntr
- CERN central services: the CERN AFS team plan to replace the AFS service key with one that has a stronger encryption type next Tuesday Nov 19th around noon
- SSB notice
- this is expected to be mostly transparent
- databases - ntr
- grid services
- LFC migration to Puppet
- LHCb R/O instance has been OK with 1 Puppet-managed node included for over a week; will move its remaining nodes today
- shared instance (for Ops): idem
- ATLAS: feedback needed
- switch of myproxy to SLC6 Puppet-managed servers on 19/11/2013. Cf. SSB notice
- storage
- on Tue Nov 12 EOS-CMS was upgraded OK to the Beryl version
- early this morning there was an EOS-ALICE instability
- Maarten: I did not see any complaints
AOB: