Week of 130401

Daily WLCG Operations Call details

To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here

The SCOD rota for the next few weeks is at ScodRota.

WLCG Availability, Service Incidents, Broadcasts, Operations Web

| VO Summaries of Site Usability | SIRs | Broadcasts | Operations Web |
| ALICE ATLAS CMS LHCb | WLCG Service Incident Reports | Broadcast archive | Operations Web |

General Information

| General Information | GGUS Information | LHC Machine Information |
| CERN IT status board, WLCG Baseline Versions, WLCG Blogs | GgusInformation | Sharepoint site - LHC Page 1 |


Monday: Easter Monday holiday

  • The meeting will be held on Tuesday instead.

Tuesday

Attendance:

  • local: Raja, Maarten, Jan, Jerome, Stefan
  • remote: Peter, Xavier, Rolf, Wei-Jen, Oliver, Onno, Lisa, Lucia, Rob, Pepe, Gareth, Jeremy, Roger

Experiments round table:

  • CMS reports -
    • LHC / CMS
      • Re-reconstruction of 2012 data is in its tails; load at the T1 sites is small.
    • CERN / central services and T0
      • The Frontier system was under high load over the weekend. A FastSim workflow was misconfigured to use the FullSim job splitting, which produced very short jobs that accessed the SQUID caches heavily to fetch alignment and calibration constants. If you see failures in SAM tests and/or HammerCloud tests because of failed access to Frontier, please open a Savannah ticket to get the SiteReadiness calculation corrected.
    • Tier-1:
      • ntr
    • Tier-2:
      • ntr

  • ALICE -
    • NTR
      • Xavier: ALICE asked why jobs were lost last week. The reason was a reboot of the VOBOX.
      • Maarten: Yes, this was also reported in the WLCG Ops meeting last week, but some instabilities were also seen after the reboot.

  • LHCb reports -
    • Mainly user jobs with some MC ongoing.
    • T0:
      • No SAM tests displayed on the SUM dashboard - solved now (GGUS:92924). Solution not very clear though.
    • T1:
      • RAL: Continuing to have occasional problems with setting up the job environment.

Sites / Services round table:

  • FNAL: NTR
  • KIT: One fileserver showed issues at 8.30 am today. The hardware is currently being replaced; for the moment 6 x 30 TB are not available for ATLAS.
  • CNAF: Kernel upgrade is finished. Pledges are available as of today
  • ASGC: NTR
  • RAL: NTR
  • NL-T1: NTR
  • NDGF: NTR
  • PIC: Last week's scheduled DT went well, but after coming back online the Chimera system was unstable, so it was rolled back to the previous version. CPU pledges are installed as of today.
  • IN2P3: NTR
  • OSG: NTR
  • GridPP: NTR

  • Batch Services: NTR
  • Storage: Announcement: next Monday the "file update functionality" for CASTOR will be removed for ATLAS, CMS and LHCb. ALICE was already running without it.

AOB:

Thursday

Attendance:

  • local: Maarten, Jan, Jarka, Raja, Jerome, MariaD, Stefan
  • remote: Salvatore, Michael, John, David, Jeremy, Pepe, Ronald, Lisa, Rob, Xavier, Wei-Jen, Roger, Peter

Experiments round table:

  • ALICE -
    • NTR

  • LHCb reports -
    • Mainly user jobs with some MC ongoing.
    • T0:
      • Nothing to report
    • T1:
      • RAL: Overnight, many jobs failed while setting up the job environment.

Sites / Services round table:

  • KIT: NTR
  • BNL: On Tuesday there was a brief SRM outage because the SRM DB had grown to a size that was no longer serviceable. The size has been reduced and adjusted now.
  • RAL: Currently looking at the CVMFS problem reported by LHCb.
  • ASGC: NTR
  • GridPP: NTR
  • PIC: NTR
  • NDGF: NTR
  • NL-T1: Announcing a DT in 2 weeks (18th April) because of network maintenance; both storage and CPU will be down.
  • FNAL: NTR
  • OSG: NTR

  • Storage: NTR
  • Dashboards:
    • Raja: Do you know why the tests were not displayed last week?
    • Maarten: The SAM machine was down and needed to be rebooted.
  • GGUS: NTR
  • Grid Services: NTR

AOB:

Topic revision: r10 - 2013-04-04 - MaartenLitmaath
 