Week of 150202

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web
  • Whenever a particular topic that requires information from sites or experiments needs to be discussed at the daily meeting, it is highly recommended to announce it by email to wlcg-operations@cern.ch, to make sure that the relevant parties have time to collect the required information or to invite the right people to the meeting.

Monday

Attendance:

  • local: Stefan (SCOD), Maarten (ALICE), Alessandro (Storage), Xavi (Storage), Maciej (DB), Jerome (Grid Services)
  • remote: Dimitri (KIT), Felix (ASGC), Rolf (IN2P3), Lisa (FNAL), Onno (NL-T1), Sang Un (KISTI), Christoph (CMS), Christian (NDGF), Di Qing (TRIUMF), Gareth (RAL), Elisabeth (OSG)
  • apologies: Alessandro (ATLAS), Joel (LHCb)

Experiments round table:

  • ATLAS
    • Central Services/T0/T1
      • Nothing specific for WLCG Ops beyond the daily meeting

  • CMS reports (raw view) -
    • Continue the efforts from last week
      • Tape staging tests at KIT and PIC
      • Consistency checks for storage at all Tier-1s

  • ALICE -
    • NTR

  • LHCb
    • "Legacy Run1 Stripping" is waiting on FTS transfers and recovery of lost files. A large MC campaign is upcoming, plus user jobs, but not much is happening right now.
    • T0: NTR
    • T1: Downtime at RAL; the forthcoming CE intervention at IN2P3 should be transparent for users.

Sites / Services round table:

  • ASGC: DT of the server farm next Monday to upgrade the servers to SLC6.
  • BNL: NR
  • CNAF: NR
  • FNAL: A GGUS alarm went to the wrong person last week. FNAL believes the cause is on the CERN side; the ticket was updated. Is there anything else to do? Maarten: is the ticket still open? Lisa: we believe it is closed. Maarten: the cleanest approach is to open a new ticket in your tracker or in GGUS and assign it to the GGUS team. Feb 11: DT 8am-4pm for various patching.
  • GridPP: NR
  • IN2P3: Two-day outage from March 3rd, details to come. In addition, a new dCache release is to be deployed on Feb 24th (to be confirmed).
  • JINR: NR
  • KISTI: NTR
  • KIT: NTR
  • NDGF: NTR
  • NL-T1: Follow-up on the broken pool node: it is not yet back in production. More than the backplane is broken, so the machine has gone back to the vendor to be investigated. Next Tuesday there will be a DT; among other things (OS updates and a dCache upgrade), the fan speed of the pool nodes will be increased to improve cooling, since insufficient cooling cannot be excluded as the cause of the failure.
  • NRC-KI: NR
  • OSG: NTR
  • PIC: NR
  • RAL: Castor outage this morning for 1.5 hours to apply security patches. Tomorrow the backup OPN link to CERN will be reconnected; during the day traffic will be flipped back and forth, which should be transparent for users.
  • TRIUMF: After the planned power shutdown, one of the DCS units (IBM storage system) did not come back properly: there were issues with two disk drawers, which hold 24 disk drives. After the vendor came to repair it and replaced one drawer, the problem with that DCS unit was gone and the LUNs started rebuilding. We are waiting for the parts for the other drawer. All LUNs of that DCS unit are still in read-only mode for the moment.

  • CERN batch and grid services: NTR
  • CERN storage services: EOS/CMS update planned next
  • Databases: NTR
  • GGUS: NR
  • Grid Monitoring: NR
  • MW Officer: NR

AOB:

Thursday

Attendance:

  • local: Stefan (SCOD), Christoph (CMS), Thomas (KIT), Jerome (Grid Services), Maarten (ALICE), Alessandro (Storage), Kacper (DB), Maciej (DB), Andrea (MW)
  • remote: Dennis (NL-T1), Felix (ASGC), Rolf (IN2P3), John (RAL), Sang Un (KISTI), Christoph (NDGF), Di Qing (TRIUMF), Jeremy (GridPP), Rob (OSG), Antonio (CNAF)
  • apologies: Alessandro (ATLAS)

Experiments round table:

  • ATLAS
    • absent because of the software week

  • CMS reports (raw view) -
    • Bigger MC production campaign for Run2 launched this week
    • Some FTS error messages are being investigated by the FTS team (GGUS:111594)

  • ALICE -
    • NTR

  • LHCb
    • "Legacy Run1 Stripping" is waiting on FTS transfers and recovery of lost files. A large MC campaign is upcoming, plus user jobs, but not much is happening right now.
    • T0: Many jobs failing at CERN (GGUS:111565)
    • T1: SARA: any news about the incident report?

Sites / Services round table:

  • ASGC: NTR
  • BNL: NR
  • CNAF: NTR
  • FNAL: NR
  • GridPP: NTR
  • IN2P3: NTR
  • JINR: NR
  • KISTI: NTR
  • KIT: New drives for one of the tape libraries were received last week and should be ready now.
  • NDGF: NTR
  • NL-T1: NTR
  • NRC-KI: NR
  • OSG: Following up on the GGUS ticket that was wrongly dispatched. Multi-core reporting to APEL should start soon, with some test records.
  • PIC: NR
  • RAL: NTR
  • TRIUMF: The storage problem mentioned on Monday is fixed. Some WNs were shut down because the temperature went above threshold.

  • CERN batch and grid services:
    • Monday Feb 9th from 10:15 CET, fts3.cern.ch will stop accepting requests for up to an hour while the database is migrated (OTG0018274)
    • Looking into LHCb failing jobs (mentioned above)
  • CERN storage services: EOS/CMS was updated this morning, all fine
  • Databases: NTR
  • GGUS: Investigating the FNAL issue where an email was sent to the wrong person
  • Grid Monitoring: Availability and Reliability draft reports have been sent out
  • MW Officer: Problems observed for Argus with the latest Java release; more details in the WLCG Ops Coord meeting minutes

AOB:

  • WLCG Operations Coordination meeting today at 15.30

-- AndreaSciaba - 2014-12-16

Topic revision: r13 - 2015-02-05 - MaartenLitmaath