Week of 140217

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Alcatel system. At 15.00 CE(S)T on Monday and Thursday (by default) do one of the following:
    1. Dial +41227676000 (just 76000 from a CERN office) and enter access code 0119168, or
    2. To have the system call you, click here

  • In case of problems with Alcatel, we will use Vidyo as backup. Instructions can be found here. The SCOD will email the WLCG operations list in case the Vidyo backup should be used.

General Information

  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web

Monday

Attendance:

  • local: Alessandro, Felix, Luca M, Maarten, Nacho
  • remote: Gareth, Kyle, Lisa, Michael, Onno, Pavel, Pepe, Rolf, Sang-Un, Stefano, Vladimir

Experiments round table:

  • ATLAS reports -
    • Central Services
      • NTR
    • T1
      • BNL-ATLAS: Sat-Sun the BNL dCache suffered from high I/O load in the name space database, caused by the PostgreSQL auto-vacuum. dCache was down and the site was blacklisted for transfers and production (GGUS:101275). Fixed now.
      • INFN-T1: Sat-Sun thousands of jobs were failing at INFN-T1 (failure rate about 60%) with get errors: staging of input files failed (GGUS:101281).

  • ALICE -
    • NTR

Sites / Services round table:

  • ASGC
    • downtime Mon 24 04:00-10:00 to fix 2 issues:
      • firmware bug in storage back-end for CASTOR DB
      • DPM HW failure
  • BNL
    • over the weekend the dCache name service provider became unresponsive due to behavior similar to what was seen for the SRM: a massive vacuum operation launched by the PostgreSQL DB interfered with name server queries from Chimera; parameters were adjusted and the situation has been stable since; the resolution will be communicated to the dCache admin forum
  • FNAL - ntr
  • IN2P3 - ntr
  • KISTI
    • downtime tomorrow 0:00-9:00 for network maintenance
  • KIT - ntr
  • NLT1 - ntr
  • OSG - ntr
  • PIC
    • today there was a downtime of the tape back-end system; we tried to use the new GOCDB mechanism to declare that only the tape back-end was affected, but did not succeed
      • Maarten: to be followed up in the Ops Coordination meeting on Thu
  • RAL - ntr

  • GGUS
    • Activity for the last 4 weeks attached to this page for tomorrow's MB.
  • grid services
    • transparent WMS updates tomorrow
  • storage
    • LHCb EOS SRM upgraded OK to new SHA-2 compliant version

AOB:

Thursday

Attendance:

  • local: Felix, Maarten, Nacho, Oliver
  • remote: Dennis, Jeremy, John, Lisa, Michael, Pavel, Rolf, Sang-Un, Saverio, Vladimir

Experiments round table:

  • CMS reports -
    • global xrootd redirector stopped serving responses, fixed by a restart (GGUS:101414)
      • work has started to introduce redundancy and provide proper critical service instructions
    • CNAF T1 is in downtime to upgrade storage; production activities were stopped, but queues were kept open for analysis jobs reading input via AAA, which seems to work fine

  • ALICE -
    • NTR

  • LHCb reports -
    • MC simulation and user jobs. Stripping verification.
    • T0: NTR
    • T1: NTR

Sites / Services round table:

  • ASGC
    • reminder: downtime Mon 24 04:00-10:00 to fix storage-related issues
  • BNL - ntr
  • CNAF - ntr
  • FNAL - ntr
  • GridPP - ntr
  • IN2P3
    • on March 18 there will be maintenance affecting various services:
      • batch will be down for at least half a day
      • mass storage downtime duration not yet known
  • KISTI - ntr
  • KIT - ntr
  • NLT1
    • short downtime yesterday to replace a broken disk controller; may have affected availability of ATLAS data
  • RAL
    • on Tue the FTS-3 service twice suffered a downtime of ~2h, due to a failed move of the MySQL DB; the service is OK now
      • Oliver: beware this service is becoming steadily more important for production

  • GGUS
    • NB! Monthly Release next Wed 2014/02/26 with ALARM tests.
  • grid services
    • fts-t2-service.cern.ch service certs had expired, fixed on Mon (GGUS:101301)
    • FTS-3 was updated as agreed with the users
    • WMS updates went OK

AOB:

Topic attachments
  • 2014-02-18.pptx (pptx, r1, 2869.2 K, 2014-02-17 12:10, MariaDimou): Final GGUS slides for the 2014/02/18 WLCG MB
Topic revision: r11 - 2014-02-20 - MaartenLitmaath
 