Week of 200217

WLCG Operations Call details

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Portal
  • Whenever a particular topic that requires information from sites or experiments needs to be discussed at the operations meeting, it is highly recommended to announce it by email to wlcg-scod@cern.ch, so that the SCOD can make sure the relevant parties have time to collect the required information, or can invite the right people to the meeting.

Best practices for scheduled downtimes

Monday

Attendance:

  • local: Kate (WLCG, chair), Julia (WLCG), Maarten (WLCG, ALICE), Michal (ATLAS), Olga (computing), Marian (network, monitoring), Remy (storage), Nicolo (storage)
  • remote: Andrew (NL-T1), Jens (NDGF), Vladimir (LHCb), Darren (NDGF), Elena (CNAF), Christoph (CMS), Dave M (FNAL), Sang Un (KISTI)

Experiments round table:

  • ATLAS reports -
    • Activities:
    • Issues
      • ATLAS dashboards showing no data between Wednesday evening and Thursday morning (RQF:1527003)
        • The monitoring project on HDFS ran out of quota for the "number of files" (OTG:0054839)
      • A few bins not fully filled in the running jobs plots (RQF:1513867, RQF:1529407)
        • The InfluxDB database had been unresponsive since last night. The database is operational again and the gaps have been recovered.
      • atlas-in2p3-cc-frontier degraded (GGUS:145575)
      • pilots failing at IN2P3-CC - mistake in HTCondorCE config
      • pilots at BNL_PROD_UCORE are failing with "atlas software repository NOT found" (GGUS:145539, GGUS:145529) - few broken WNs
      • Transfer timeouts from FZK-LCG2_MCTAPE (GGUS:145564)

  • ALICE -
    • NTR

  • LHCb reports -
    • Activity:
      • Stripping campaign ongoing, occupying most of T0/1 capacity (no T2s)
      • Staging for 2016 is almost finished. Staging for 2017 is ongoing.
    • Issues:
      • no significant issues

Sites / Services round table:

  • ASGC: nc
  • BNL: nc
  • CNAF: NTR
  • EGI: nc
  • FNAL: dCache upgrades finished for both disk and tape
  • IN2P3: IN2P3-CC will be in maintenance on Tuesday, March 17th. As usual, details will be available one week before the event. CEs and SEs are foreseen to be in downtime for the whole day.
  • JINR: NTR
  • KISTI: NTR
  • KIT: NTR
  • NDGF: NTR
  • NL-T1: NTR
  • NRC-KI: nc
  • OSG: nc
  • PIC: Tomorrow, Tuesday 18th February, PIC's firewall will be reconfigured. The intervention should be transparent to users.
  • RAL: nc
  • TRIUMF: (Feb17 is a holiday here)
    • TRIUMF_SIM had no jobs running for ~1 day: worker nodes (3K cores) located at TRIUMF were inadvertently using legacy configurations for the CVMFS cache squid servers.
    • Three tape cartridges had media errors; all 324 files were recovered.

  • CERN computing services: NTR
  • CERN storage services: NTR
  • CERN databases: NTR
  • GGUS: NTR
  • Monitoring:
    • Final reports for the January availability sent around
    • ETF LHCb/ALICE: GGUS:145475 - glite-ce clients broken in latest UMD4; glite-ce and condor clients coexistence appears challenging going forward
    • ETF LHCb: change of VOMS role used in testing is planned (to /lhcb/Role=samtest)
  • MW Officer: nc
  • Networks: EELA-UTFSM and CBFP networking issues were resolved
  • Security: nc

AOB: Christoph asked why LHCb is planning the change of the VOMS role. Marian explained that the change is intended to avoid using the production credential in testing. Maarten will check whether such a high-effort change is necessary.

Topic revision: r19 - 2020-02-17 - JosepFlix
 