Week of 151109

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web
  • Whenever a particular topic requiring information from sites or experiments needs to be discussed at the daily meeting, it is highly recommended to announce it by email to wlcg-operations@cern.ch so that the relevant parties have time to collect the required information or invite the right people to the meeting.

Links to Tier-1 downtimes

ALICE ATLAS CMS LHCB
  BNL FNAL  

Monday

Attendance:

  • local: Andrea Sciabà, Krystof Borkovec, Maarten Litmaath (ALICE), Luca Mascetti (IT-DSS), Jerome Belleman (IT-PES), Katarzyna Dziedziniewicz-Wojcik (IT-DB), Andrea Manzi (MW officer)
  • remote: Michael Ernst (BNL), Asa (ASGC), Christoph Wissing (CMS), Dmytro Karpenko (NDGF), Rolf Rumler (CC-IN2P3), Lisa Giacchetti (FNAL), Onno Zweers (NL-T1), Sang Un Ahn (KISTI), Gareth Smith (RAL), Francesco Noferini (CNAF), Pepe Flix (PIC), Kyle Gross (OSG)

Experiments round table:

  • ATLAS reports (raw view) -
    • Grid production activities ongoing, T0 spillover test almost finished

  • CMS reports (raw view) -
    • Good resource utilization: sustained 120k cores busy
    • No major issues to report

  • ALICE -
    • NTR

Sites / Services round table:

  • ASGC: ntr
  • BNL: ntr
  • CNAF: We have again planned an OS upgrade on the frontier general-purpose (non-LHC) router to fix an OS bug. We will schedule a downtime for the service and an "at risk" for LCG because of the problems encountered in the previous attempt. The intervention is scheduled for 24 November at 18:00. It should last 5 minutes but has been declared for 30 minutes.
  • FNAL: ntr
  • GridPP:
  • IN2P3: ntr
  • JINR:
  • KISTI: ntr
  • KIT: ntr
  • NDGF: ntr
  • NL-T1: last week's downtime did not solve the Storage Manager timeout problem. Investigation is continuing.
  • NRC-KI:
  • OSG: ntr
  • PIC: ntr
  • RAL: ntr
  • TRIUMF:

  • CERN batch and grid services:
    • IPv6 dual-stack Myproxy node available for testing: px510.cern.ch. The plan is to have the same setup on myproxy.cern.ch by January.
    • myproxy.cern.ch will receive an update on Monday 16 November (please read the announcement on the ITSSB).
    • Job failures affecting all VOs were observed; they reappeared even after some mitigation measures had been applied.
  • CERN storage services: updated EOSATLAS to the latest version
  • Databases: ntr
  • GGUS:
  • Grid Monitoring:
  • MW Officer: Critical vulnerability affecting NSS broadcast by SVG on Friday (https://wiki.egi.eu/wiki/SVG:Advisory-SVG-2015-CVE-2015-7183). All software whose SSL handshaking is based on Mozilla Network Security Services (NSS) is affected; this includes Red Hat 6 and 7 and their derivatives (for instance, libcurl uses NSS). All running resources based on Red Hat and its derivatives MUST be patched by 2015-11-13T21:00+01:00. Sites failing to act and/or failing to respond to requests from the EGI CSIRT team risk site suspension.

Sites, please read the full advisory at the link above, as it explains which services are affected. After the patch has been applied, the services must be restarted.

For CERN, the patch will be installed automatically on Puppet-based nodes, but the restart needs to be done manually.
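
Since the patch only takes effect once services are restarted, a site may want to list the processes that still map the pre-update library. The following is a minimal sketch for Linux, not part of the advisory: it relies on the kernel marking replaced files as "(deleted)" in /proc/<pid>/maps. The library names in the pattern (libnss3, libnssutil3, libnspr4, libssl3, libsmime3) are an assumption and should be checked against the full advisory; the function names are hypothetical.

```python
import os
import re

# Assumption: these are the NSS/NSPR shared objects of interest; adjust the
# list according to the advisory. The kernel appends " (deleted)" to a
# mapping whose backing file has been replaced on disk.
STALE_NSS = re.compile(
    r"/lib(nss3|nssutil3|nspr4|ssl3|smime3)\.so\S*\s+\(deleted\)\s*$"
)


def has_stale_nss(maps_text):
    """Return True if any mapping refers to a deleted NSS library file."""
    return any(STALE_NSS.search(line) for line in maps_text.splitlines())


def pids_needing_restart(proc_root="/proc"):
    """Yield PIDs of processes that still map a pre-update NSS library."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "maps")) as handle:
                if has_stale_nss(handle.read()):
                    yield int(entry)
        except OSError:
            # The process exited or we lack permission to read it; skip.
            continue
```

Run as root, `sorted(pids_needing_restart())` gives the candidate PIDs; mapping them back to services is left to the site's own tooling.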

AOB:

Thursday

Attendance:

  • local: Luca (SCOD+Storage), Jerome and Krystof (Batch&Grid), Kate (DB), Giuseppe (CMS)
  • remote: Dario (ATLAS), Zoltan (LHCb), Asa (ASGC), Michael (BNL), Matteo (CNAF), Lisa (FNAL), Rolf (IN2P3), Sang (KISTI), Dmytro (NDGF), Andrew (NL-T1), Kyle (OSG), Jose (PIC), Gareth (RAL), Thomas (KIT)

Experiments round table:

  • ATLAS reports (raw view) -
    • Grid production activities ongoing; some T0 spillover tasks are still running and transferring. Went over 250k slots used (!) thanks to opportunistic resources (HPC).

  • CMS reports (raw view) -
    • No major issues to report
    • Tier0 updated to CMSSW_7_5_5
    • Preparing for Heavy Ion run

  • ALICE -
    • low activity today, now ramping up again

  • LHCb reports (raw view) -
    • Data Processing:
      • Data processing of pp data at T0/T1/T2 sites.
      • Monte Carlo mostly at T0/T1/T2/T2D sites, user analysis at T0/T1/T2D sites.
    • T0
      • NTR
    • T1

Sites / Services round table:

  • ASGC: NTR
  • BNL: NTR
  • CNAF:
    • New storage for ALICE was physically delivered
    • Downtime for the LHCb storage due to a firmware upgrade (it should end in the afternoon).
  • FNAL: Downtime next Monday from 9:00 to 16:00 CST to patch NSS and upgrade the machines' kernels; dCache and EOS affected.
  • GridPP:
  • IN2P3: published SIR on network incident (3/11/2015)
  • JINR:
  • KISTI: running with half of the job slots from next week until early February 2016
  • KIT: NTR
  • NDGF: NTR
  • NL-T1: NTR
  • NRC-KI:
  • OSG: problem assigning ticket GGUS:116787
  • PIC: NTR
  • RAL: NTR
  • TRIUMF:

  • CERN batch and grid services:
    • The myproxy.cern.ch intervention has been rescheduled to Thu 19; please read the announcement on the ITSSB.
  • CERN storage services:
    • EOS and CASTOR upgrades done during the week
  • Databases:
    • Delay on the CMS DataGuard on Tuesday evening due to network congestion
    • Ongoing intervention on the ALICE DataGuard
  • GGUS:
  • Grid Monitoring:
  • MW Officer:

AOB:

Topic revision: r20 - 2015-11-12 - MaartenLitmaath
 