Week of 150105

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web
  • Whenever a particular topic needs to be discussed at the daily meeting requiring information from sites or experiments, it is highly recommended to announce it by email to wlcg-operations@cern.ch to make sure that the relevant parties have the time to collect the required information or invite the right people to the meeting.

Monday

Attendance:

  • local: Alessandro (ATLAS), Andrej (ATLAS), Herve (storage), Luca (databases), Maarten (SCOD + ALICE), Manuel (grid services)
  • remote: Christian (NDGF), Christoph (CMS), Gareth (RAL), Joel (LHCb), Lisa (FNAL), Michael (BNL), Onno (NLT1), Rob (OSG), Rolf (IN2P3), Sang-Un (KISTI)

Experiments round table:

  • ATLAS reports (raw view) -
    • Alessandro:
      • Happy new year from ATLAS!
      • there were not enough jobs to fill all ATLAS resources during the last 2 weeks
      • a big MC campaign could not be launched yet because its physics validation has not finished
      • nonetheless peaks of ~130k concurrent jobs were reached for a few days
      • the sites were essentially in good shape: thanks!
      • activities should soon ramp up again

  • CMS -
    • Best wishes for 2015!
    • Quite some activity during the holiday period
    • Rather good performance and stability of the sites - Thanks
    • Some tickets opened to a few T2 sites
    • CCIN2P3 ran out of disk space
      • Solved among CMS experts and Lyon site contact (Thanks Sebastien et al.)
      • GGUS:110950, can be closed I guess

  • ALICE -
    • Happy New Year!
    • very high activity during the end-of-year break
      • our thanks go to the sites for their good performance!
    • 1 team ticket needed to be opened for CERN: cvmfs-alice.cern.ch was down (GGUS:110968)
      • fixed a few hours later, on a Fri evening!
      • instability of that host is to be followed up

  • LHCb reports (raw view) -
    • HAPPY NEW YEAR 2015.
    • "Legacy Run1 Stripping" campaign running full steam and progressing well, prestaging of input data was restarted + MC and user jobs.
    • T0:
    • T1:
      • RAL : 1 disk server issue, 1 network glitch during the Christmas period

Sites / Services round table:

  • ASGC:
  • BNL: ntr
  • CNAF:
  • FNAL:
    • dCache tape back-end outage Wed from 10 AM to 2 PM CST
  • GridPP:
  • IN2P3: ntr
  • JINR:
  • KISTI: ntr
  • KIT:
  • NDGF: ntr
  • NL-T1: ntr
  • NRC-KI:
  • OSG:
    • 1 ATLAS user managed to open an OSG alarm ticket for a problem that does not look critical for the experiment; this matter is followed up in GGUS:110922
      • Alessandro: we will look into how to prevent such misuse
    • a new version of the messaging system is being tested to support the transfer of multi-core accounting records; the aim is to put it in production on the 4th Tue of this month
  • PIC:
  • RAL:
    • there were network breaks on Dec 24 and 25
    • 1 disk server for CMS is currently out
    • the CASTOR service used by ALICE will have its OS upgraded tomorrow
  • TRIUMF:

  • CERN batch and grid services: nta
  • CERN storage services: ntr
  • Databases: ntr
  • GGUS:
  • Grid Monitoring:
  • MW Officer:

AOB:

Thursday

Attendance:

  • local: Herve (storage), Joel (LHCb), Maarten (SCOD + ALICE), Manuel (grid services)
  • remote: Andrej (ATLAS), Antonio (CNAF), Christian (NDGF), Christoph (CMS), Dennis (NLT1), Di (TRIUMF), Gareth (RAL), Lisa (FNAL), Pepe (PIC), Rolf (IN2P3), Thomas (KIT)

Experiments round table:

  • ATLAS reports (raw view) -
    • CentralServices/T0
      • spotted a bug with FTS3 and CASTOR causing some files not to be prestaged. This is being discussed in the FTS3 steering meeting.
      • ProdSys2 validation still ongoing; all the jobs are now running and we expect news by tomorrow morning.
      • bulk submission of MC tasks is happening now.

  • CMS
    • Nobody can join the call today - sorry
    • Disk endpoints at KIT, PIC and FNAL are rather full
      • Some clean-up has to be done from the CMS production side
      • In contact with CMS site contacts

  • ALICE -
    • continued high activity

  • LHCb reports (raw view) -
    • "Legacy Run1 Stripping" campaign running full steam and progressing well, prestaging of input data was restarted + MC and user jobs.
    • T0: Problem with EOS access on Wednesday between 03:00 and 05:00. Severe problem with FTS3: FTS tried to transfer files which were not yet staged (bug in the SRM interface of GFAL2, which was discovered and fixed yesterday).
    • T1:
      • RAL : a few files lost
      • CNAF : a few files lost

Sites / Services round table:

  • ASGC:
  • BNL:
  • CNAF: ntr
  • FNAL: ntr
  • GridPP:
  • IN2P3: ntr
  • JINR:
  • KISTI:
  • KIT:
    • the dedicated link between KIT and IN2P3 has been phased out: the connectivity is now provided through LHCONE
  • NDGF: ntr
  • NL-T1: ntr
  • NRC-KI:
  • OSG:
  • PIC: ntr
  • RAL:
    • there was a 15-min network break around 09:30 local time
    • high load from LHCb jobs was observed; it has eased off in the meantime
    • we will follow up on the inaccessible LHCb files
  • TRIUMF: ntr

  • CERN batch and grid services: nta
  • CERN storage services: ntr
  • Databases:
  • GGUS:
  • Grid Monitoring:
  • MW Officer:

AOB:

-- AndreaSciaba - 2014-12-16

Topic revision: r8 - 2015-01-08 - MaartenLitmaath