LHCOPN Operations Telecom 2011-04-12

The scope of this phoneconf is [2011-01-01, 2011-03-31].

Participation

Sites represented:

  • CH-CERN: Edoardo Martelli, John Shade
  • DE-KIT: Aurelie Reymund
  • ES-PIC: Fernando Lopez
  • FR-CCIN2P3: Guillaume Cessieux (Chair)
  • IT-INFN-CNAF: Stefano Zani
  • NDGF: Dennis Wallberg
  • NL-T1: Sander Boele, Pieter de Boer
  • TW-ASGC: Wen-Shui Chen
  • US-FNAL-CMS: Vyto Grigaliunas

Apologies:

  • CA-TRIUMF: Vitaliy Kondratenko
  • UK-T1-RAL: Nick Moore
  • US-T1-BNL: John Bigrow

Operations Overview

During time-window [2011-01-01, 2011-03-31]:

  • 43 tickets: 15 IL2 (35%), 5 IL3 (12%), 5 Info (12%), 14 ML2 (32%), 4 ML3 (9%)
  • Kind of problem: 41 connectivity issues (95%), 1 performance issue, 1 none
  • 12 tickets (28%) reported impact on services: 4 Loss of service and 6 performance degradation
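
The category shares quoted above can be rechecked from the raw counts. The following is a minimal sketch; the counts are copied from the summary, while the helper name `share` and the one-decimal rounding are choices made here for illustration.

```python
# Ticket counts per category, copied from the summary above.
counts = {"IL2": 15, "IL3": 5, "Info": 5, "ML2": 14, "ML3": 4}

total = sum(counts.values())


def share(n, total):
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100.0 * n / total, 1)


shares = {category: share(n, total) for category, n in counts.items()}

for category, pct in shares.items():
    print(f"{category}: {counts[category]} tickets ({pct}%)")
```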

The distribution of ticket assignments was as follows:

[Figure LHCOPN.png: distribution of ticket assignments]

Pending issues (https://gus.fzk.de/pages/all_lhcopn.php):

Operations KPIs

Monitoring report

Computed by Sander Boele from LHCOPN dashboard: http://casper.grid.sara.nl/

Top 20 source/destination pairs by number of measurements with packet loss > 0.1% during the last three months (less than 2% of data missing in the period):

SRC                  DEST                 # of measurements
DE-KIT-HADES         CA-TRIUMF-HADES      1118
DE-KIT-HADES         US-FNAL-CMS-HADES    913
NL-T1-HADES          US-FNAL-CMS-HADES    901
US-T1-BNL-HADES      TW-ASGC-HADES        666
TW-ASGC-HADES        US-T1-BNL-HADES      651
IT-INFN-CNAF-HADES   TW-ASGC-HADES        592
TW-ASGC-HADES        IT-INFN-CNAF-HADES   565
TW-ASGC-HADES        CA-TRIUMF-HADES      460
US-FNAL-CMS-HADES    TW-ASGC-HADES        363
US-FNAL-CMS-HADES    NL-T1-HADES          291
US-FNAL-CMS-HADES    DE-KIT-HADES         282
TW-ASGC-HADES        US-FNAL-CMS-HADES    269
US-T1-BNL-HADES      CA-TRIUMF-HADES      209
CA-TRIUMF-HADES      UK-T1-RAL-HADES      182
CA-TRIUMF-HADES      ES-PIC-HADES         180
TW-ASGC-HADES        ES-PIC-HADES         174
US-FNAL-CMS-HADES    ES-PIC-HADES         172
US-T1-BNL-HADES      ES-PIC-HADES         168
US-FNAL-CMS-HADES    US-T1-BNL-HADES      166
DE-KIT-HADES         ES-PIC-HADES         164
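
To see which sites are involved most often in the lossy pairs, the rows above can be summed per site. This is only an illustrative sketch: it counts a row for both its source and its destination, and the helper names `site_name` and `per_site_totals` are introduced here, not taken from the dashboard.

```python
from collections import Counter

# Rows copied from the table above: source, destination, number of measurements.
TABLE = """\
DE-KIT-HADES CA-TRIUMF-HADES 1118
DE-KIT-HADES US-FNAL-CMS-HADES 913
NL-T1-HADES US-FNAL-CMS-HADES 901
US-T1-BNL-HADES TW-ASGC-HADES 666
TW-ASGC-HADES US-T1-BNL-HADES 651
IT-INFN-CNAF-HADES TW-ASGC-HADES 592
TW-ASGC-HADES IT-INFN-CNAF-HADES 565
TW-ASGC-HADES CA-TRIUMF-HADES 460
US-FNAL-CMS-HADES TW-ASGC-HADES 363
US-FNAL-CMS-HADES NL-T1-HADES 291
US-FNAL-CMS-HADES DE-KIT-HADES 282
TW-ASGC-HADES US-FNAL-CMS-HADES 269
US-T1-BNL-HADES CA-TRIUMF-HADES 209
CA-TRIUMF-HADES UK-T1-RAL-HADES 182
CA-TRIUMF-HADES ES-PIC-HADES 180
TW-ASGC-HADES ES-PIC-HADES 174
US-FNAL-CMS-HADES ES-PIC-HADES 172
US-T1-BNL-HADES ES-PIC-HADES 168
US-FNAL-CMS-HADES US-T1-BNL-HADES 166
DE-KIT-HADES ES-PIC-HADES 164
"""


def site_name(host):
    """Strip the '-HADES' measurement-host suffix to get the site name."""
    suffix = "-HADES"
    return host[:-len(suffix)] if host.endswith(suffix) else host


def per_site_totals(table):
    """Sum the measurement counts of every row a site appears in (as SRC or DEST)."""
    totals = Counter()
    for line in table.splitlines():
        if not line.strip():
            continue
        src, dest, count = line.split()
        totals[site_name(src)] += int(count)
        totals[site_name(dest)] += int(count)
    return totals


totals = per_site_totals(TABLE)
for site, n in totals.most_common():
    print(f"{site:<14}{n:>6}")
```

Because only the top 20 pairs are listed, these totals say nothing about pairs below the cut-off; they are a rough indicator, not a per-site loss figure.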

Backup tests league table

Site           Date of last backup test report   Report within the last year?
CA-TRIUMF      2008-06-03                        KO
CH-CERN        2008-06-03                        KO
DE-KIT         2009-10-14                        KO
ES-PIC         2010-04-22                        OK
FR-CCIN2P3     2010-03-08                        OK
IT-INFN-CNAF   2008-04-09                        KO
NDGF           2008-04-09                        KO
NL-T1          2009-02-10                        KO
TW-ASGC        2010-12-28                        OK
UK-T1-RAL      2010-08-24                        OK (but was an issue reported about it during the last ops phoneconf?)
US-FNAL-CMS    2008-04-24                        KO
US-T1-BNL      2008-06-03                        KO
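
The OK/KO column can be derived from the report dates with a simple age check. The sketch below is an assumption-laden illustration: the minutes do not state the reference date used, so `REFERENCE` is set here to 2011-01-01 (the start of the reporting window), and "one year" is taken as 365 days. With those assumptions the sketch reproduces the column above; with the telecom date 2011-04-12 as reference, FR-CCIN2P3 would come out as KO.

```python
from datetime import date

# Dates of the last backup test report, copied from the table above.
LAST_REPORT = {
    "CA-TRIUMF": date(2008, 6, 3),
    "CH-CERN": date(2008, 6, 3),
    "DE-KIT": date(2009, 10, 14),
    "ES-PIC": date(2010, 4, 22),
    "FR-CCIN2P3": date(2010, 3, 8),
    "IT-INFN-CNAF": date(2008, 4, 9),
    "NDGF": date(2008, 4, 9),
    "NL-T1": date(2009, 2, 10),
    "TW-ASGC": date(2010, 12, 28),
    "UK-T1-RAL": date(2010, 8, 24),
    "US-FNAL-CMS": date(2008, 4, 24),
    "US-T1-BNL": date(2008, 6, 3),
}

# Assumption: the reference date is not given in the minutes.
REFERENCE = date(2011, 1, 1)


def has_recent_report(last_report, reference, max_age_days=365):
    """True if the last report is at most max_age_days older than the reference date."""
    return (reference - last_report).days <= max_age_days


def league_table(reports, reference):
    """Map each site to 'OK' or 'KO' depending on the age of its last report."""
    return {
        site: "OK" if has_recent_report(last, reference) else "KO"
        for site, last in reports.items()
    }


status = league_table(LAST_REPORT, REFERENCE)
for site, verdict in status.items():
    print(f"{site:<14}{LAST_REPORT[site].isoformat()}  {verdict}")
```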

Site Reports

CA-TRIUMF

For all GGUS tickets assigned to TRIUMF between 1-04-2011 and 30-06-2011:
  • 14 GGUS tickets

  • Solved:
    • IL2, 4
    • ML2, 4
    • ML3, 1

Opened last month and not closed yet:

  • In progress
    • ML2, 3
    • IL, 1

  • On hold
    • ML2, 1

On June 16, 2011, TRIUMF migrated from the CWDM system to dedicated fibre circuits.

The following circuits were affected:

  • BNL-TRIUMF-LHCOPN-001 - 10G
  • CERN-TRIUMF-LHCOPN-001 - B - 1G
  • CERN-TRIUMF-LHCOPN-002 - P - 5G
  • SARA-TRIUMF-LHCOPN-001 - T1 - 1G

CH-CERN

  • CERN LHCOPN routers will be upgraded with new hardware. The first one will be replaced in May and the second in June; exact dates will be provided. Sites with two links to CERN will be migrated to the new hardware one link at a time to avoid full disconnection. This will be discussed site by site, and for most sites it should be a 10-minute maintenance. It will also be a good use case for backup tests.

DE-KIT

No service-impacting event on any link:

  • 0 GGUS tickets assigned to DE-KIT
  • 1 planned maintenance at DFN (#65897)
  • 1 link down event on the link DE-KIT/NL-T1 (#67086, duplicate #67087)

Aurelie recalled that it was decided not to open GGUS tickets for non-service-impacting events (for example, maintenance announcements), which is why DE-KIT now has few tickets.

ES-PIC

  • All GGUS tickets opened are related to ML2
  • PIC's yearly electrical maintenance will take place from 19th to 20th April

FR-CCIN2P3

No network-service-impacting event during the time window.

  • Link CERN-IN2P3-LHCOPN-001: No event!
  • Link GRIDKA-IN2P3-LHCOPN-001:
    • 1 fiber cut around Besançon-Dijon #RENATER-2130329, 2011-02-15 19:43 -> 2011-02-16 04:19
    • 16 flaps (they appear regular: nearly all occurred on a Tuesday or Thursday between 02:00 and 03:00)

CCIN2P3 would like some brief backup tests to be carried out, since the routing between FR-CCIN2P3, DE-KIT and NL-T1 has turned out to be really complex; the tests should in particular verify that paths are symmetric.

IT-INFN-CNAF

The second link between CNAF and CERN was activated on January 27th. The two links are used with round-robin load balancing. Because the two links follow very different paths, their RTTs are not the same, and this may lead to some issues. This is being investigated. The efficiency of the redundancy was fully tested but not reported.
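
The minutes do not say which issue is being investigated. One commonly cited risk of round robin over paths with unequal delay is packet reordering, and the toy simulation below illustrates it. It assumes per-packet round robin (not confirmed by the minutes), and the delays and send interval are made-up values, not measurements from the CNAF links.

```python
def arrival_order(n_packets, send_interval_ms, one_way_delay_ms):
    """Send packets at a fixed interval, alternating over the links in turn,
    and return the sequence numbers in the order they arrive."""
    arrivals = []
    for seq in range(n_packets):
        link = seq % len(one_way_delay_ms)  # per-packet round robin
        arrival_time = seq * send_interval_ms + one_way_delay_ms[link]
        arrivals.append((arrival_time, seq))
    arrivals.sort()
    return [seq for _, seq in arrivals]


def count_reordered(order):
    """Count packets that arrive after a packet with a higher sequence number."""
    highest_seen = -1
    reordered = 0
    for seq in order:
        if seq < highest_seen:
            reordered += 1
        else:
            highest_seen = seq
    return reordered


# Hypothetical figures: 1 ms between packets, links with 10 ms and 14 ms delay.
unequal = arrival_order(6, 1.0, [10.0, 14.0])
equal = arrival_order(6, 1.0, [10.0, 10.0])

print("unequal delays:", unequal, "reordered:", count_reordered(unequal))
print("equal delays:  ", equal, "reordered:", count_reordered(equal))
```

In this model reordering appears as soon as the delay difference between the links exceeds the interval between consecutive packets, which is why per-flow balancing is often preferred when path delays differ.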

NDGF

Our backup connection was re-routed internally in NORDUnet to gain physical redundancy since the main connection was running in the same physical fibre trunk provided by GEANT. Other than that, nothing out of the ordinary to report.

NL-T1

For all GGUS tickets assigned to NL-T1 between 1-1-2011 and 31-3-2011 this is the report based on the CSV export:

  • 16 GGUS tickets
    • IL2, 8
    • IL3, 1
    • Info, 1
    • ML2, 4
    • ML3, 2
  • Closed 16 (on 1-3-2011)

Link related problems:

  • FERMI-SARA-LHCOPN-001 - T1 - 1G - (1x IL2, 1x ML3)
  • GRIDKA-SARA-LHCOPN-001 - 10G - (2x IL2)
  • NDGF-SARA-LHCOPN-001 - 10G - (1x IL2, 1x ML2)
  • SARA-TRIUMF-LHCOPN-001 - T1 - 1G - (2x IL2, 3x ML2, 1x ML3)

Despite our new policy of separating the work of the SURFnet6/Netherlight NOC from NL-T1 operations, a few tickets have still been logged for non-NL-T1 links. It is our policy not to do so.

  • CERN-TRIUMF-LHCOPN-002 - P - 5G - (3x IL2, 1x Info)

Ticket 68777 was incorrectly opened as IL2; it should have been ML2.

On January 18th we implemented a new routing setup allowing us to better serve Nordugrid and DE-KIT. In March we closed the long-standing ticket 62381; we fixed this issue by carefully applying route metrics.

Pieter de Boer -- NL-T1 / SARA 12/04/2011

TW-ASGC

Link CERN-ASGC-LHCOPN-003: two unscheduled downtime events and one scheduled downtime event:
  • The international carrier reported multiple fiber cuts on the backhaul near Amsterdam. 2011-03-02 23:08 - 2011-03-04 11:30
  • Fiber replacement maintenance scheduled at the request of the international carrier. The requested maintenance window was 8 hours, but the actual downtime was less than 5 minutes. 2011-02-28 00:00 - 2011-02-28 08:00
  • Unscheduled downtime due to a Japan-US submarine cable cut. 2011-02-16 22:47 - 2011-02-17 12:00

The procurement of the 2.5G link from Taipei to Amsterdam and CERN is delayed to mid-July because of price negotiations.

UK-T1-RAL

High utilization was noted on traffic incoming from CERN to RAL, and discussions are being held about load balancing the two 10G trunks.

Scheduled internal RAL site router upgrade on 15-3-11 and 22-3-11. No Tier1 traffic via JANET during the upgrade.

  • Since the upgrade, reported RAL internal performance problems have affected traffic via JANET, taking the form of packet loss. It is believed that a misconfiguration of Microsoft NLB is the cause; this is being investigated.

US-FNAL-CMS

Some events within the USLHCnet domain, but nothing affecting service or L3.

US-T1-BNL

[Mail from John after the phoneconf: from BNL we've had minimal / no problems]

AOB

  • Conclusions from Ops WG meeting 7: http://indico.cern.ch/materialDisplay.py?contribId=11&materialId=1&confId=129691
    • Two items to go ahead
      • Discussion with GGUS on how to precisely improve interactions between LHCOPN helpdesk and WLCG GGUS
      • Discussion with Sander about gathering monitoring information indicating service impacting events happening on the LHCOPN
  • John asked how the monitoring system behaves with the round-robin balancing over the two links at IT-INFN-CNAF
    • Sander plans to correlate OWD with the Traceroute database to avoid issues and to be able to support RTT discrepancies
  • The next LHCONE/LHCOPN meeting is on June 13th and 14th in Washington DC, http://indico.cern.ch/conferenceDisplay.py?confId=131550

Next Ops Phoneconf

The next teleconference was scheduled for Tuesday, July 5th, 2011, at 16:30 Geneva time (CEST).

Topic revision: r20 - 2011-07-07 - BrunoHoeft
 