Week of 180709

WLCG Operations Call details

At CERN the meeting room is 513-R-068.

For remote participation we use the Vidyo system. Instructions can be found here.

General Information

The purpose of the meeting is:
- to report significant operational issues (i.e. issues that could degrade, or did degrade, experiment or site operations) which are ongoing or were resolved after the previous meeting;
- to announce or schedule interventions at Tier-1 sites;
- to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
- to provide important news about the middleware;
- to communicate any other information considered interesting for WLCG operations.
The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
The SCOD rota for the next few weeks is at ScodRota.
General information about the WLCG Service can be accessed from the Operations Portal.
Whenever a particular topic that requires information from sites or experiments needs to be discussed at the operations meeting, it is highly recommended to announce it by email to wlcg-scod@cern.ch, so that the SCOD can make sure the relevant parties have time to collect the required information, or can invite the right people to the meeting.

Best practices for scheduled downtimes

Monday - virtual meeting

NOTE: This week the CHEP 2018 Conference is being held in Sofia.

You may provide relevant incidents, announcements etc. here for the operations record.

Attendance:

local:
remote:

Experiments round table:

ATLAS reports (raw view) -

CMS reports (raw view) -

ALICE -
- NTR

LHCb reports (raw view) -

Sites / Services round table:

ASGC:
BNL:
CNAF:
EGI:
FNAL:
IN2P3:
JINR:
KISTI:
KIT:
NDGF:
- One CE (ce01.grid.uio.no) is currently down due to filesystem problems. It should return on Tuesday, or later if the problems persist. The vendor is involved in sorting out the problems.
- Another CE (atlas.triolith.nsc.liu.se) is going down on Thursday for two weeks and will return with a new name.
NL-T1:
- A dCache pool node broke down on Monday and was only fixed on Friday. We apologize for the inconvenience. A short timeline:
  - Monday afternoon: we reported the issue to the vendor at 15:08 CEST.
  - Wednesday afternoon: an engineer arrived on site (rather late, given that we have next-business-day (NBD) support), but without parts.
  - Wednesday evening: we escalated the case with the vendor. The vendor explained that one of the three replacement parts was out of stock.
  - Friday morning: the engineer arrived with the two replacement parts that were in stock. One of those fixed the problem, and around lunchtime the node was up again.
  - This service level is not what we expect from this vendor, and we have pointed this out to them.
- dCache 3.2 seems to have introduced a bug: when a file is deleted while dCache is writing it to tape, the pool holding the file is disabled and has to be restarted. SAM tests are a typical trigger. IN2P3 provided a partial workaround (increasing the flush interval) on the dCache user forum, thanks!
NRC-KI:
OSG:
PIC:
RAL: NTR.
TRIUMF:

CERN computing services:
CERN storage services:
CERN databases:
GGUS: NTR
Monitoring:
- Draft reports for the June 2018 availability have been sent around.
MW Officer:
Networks:
Security: NTR

AOB:

Topic revision: r11 - 2018-07-12 - VincentBrillault
