Present: Dario Barberis, Simone Campana, Alastair Dewhurst, Alessandro Di Girolamo, Dave Dykstra, Luis Linares, Stefan Roiser, Andrea Valassi
Not present: Alexandre Beche, Doug Benjamin, Barry Blumenfeld

Discussed the difficulty of changing the IP address of the squid monitoring machines (currently a single, virtual address): a change affects every squid's Access Control List and the firewalls at a majority of sites, which must allow incoming UDP/SNMP queries from MRTG. The suggestion is to make the change this time so as to allow all IP addresses at CERN, giving maximum flexibility in where the monitoring machines are located.

  • Barry asked CMS T2 mailing list, very little objection so far
  • Simone will ask at ATLAS Jamboree December 10/11
  • Stefan will ask LHCb regarding CVMFS squid monitoring (and it would be good to include such squids in the MRTG monitoring)
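
As a rough illustration of the suggestion above (allowing a set of address ranges rather than one monitoring IP), the check a site-side ACL or firewall rule performs can be sketched as follows. The networks are documentation placeholders (RFC 5737), not real CERN ranges, and `is_allowed` is a hypothetical helper, not part of squid or MRTG.

```python
import ipaddress

# Placeholder networks from the RFC 5737 documentation ranges.
# These are assumptions for illustration only, NOT real CERN address ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed(source_ip, networks=ALLOWED_NETWORKS):
    """Return True if source_ip falls inside any of the allowed networks."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    # A monitoring machine can move anywhere inside an allowed range
    # without the site having to update its rule.
    for ip in ("192.0.2.17", "198.51.100.200", "203.0.113.5"):
        print(ip, is_allowed(ip))
```

With a range-based rule, relocating the monitoring machines within the allowed ranges would not require sites to touch their ACLs or firewalls again.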

SAM/SUM tests currently read from an XML file known as ATP

  • The squid monitoring service could also use it, and probably should
  • GOCDB & OIM are only for declaring services and up/down time
  • ATP accepts VO-specific additions from another XML file; ATLAS generates this from AGIS
  • Where would the XML file come from for other VOs? One possibility is to extend AGIS to be for all VOs

The user-based monitoring (SAM/SUM) uses the proxy port 3128, but MRTG monitoring uses port 3401. The names and IP addresses of the same squid are fairly often different too, with one on a private network and one on the public network. There are known cases of non-standard ports for MRTG monitoring (for example, servers running multiple squid processes), and there may also be cases of non-standard user ports. The user port is definitely different for the reverse-proxy squids used on servers, although that's not relevant to the SAM/SUM tests.
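
One way to picture the point above is a record that keeps the user-facing and monitoring endpoints of a single squid separate, with the usual ports as defaults that can be overridden. This is only a sketch: the class name, field names, and host names are hypothetical and do not come from ATP, AGIS, or any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SquidEndpoints:
    """Hypothetical record describing one squid as seen by users and by MRTG."""

    proxy_host: str            # name used by jobs (often on a private network)
    monitor_host: str          # name queried by MRTG (often on the public network)
    proxy_port: int = 3128     # usual user/proxy port
    monitor_port: int = 3401   # usual SNMP monitoring port

    def proxy_url(self):
        """URL a user-based test such as SAM/SUM would go through."""
        return f"http://{self.proxy_host}:{self.proxy_port}"

    def snmp_target(self):
        """(host, port) pair an SNMP poller would query."""
        return (self.monitor_host, self.monitor_port)


# Standard ports, but different names for the same squid (made-up host names).
standard = SquidEndpoints("squid.internal.example.org", "squid.example.org")

# A server running a second squid process on a non-standard monitoring port.
second_process = SquidEndpoints(
    "squid.internal.example.org", "squid.example.org", monitor_port=3402
)
```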

Discussed SquidMonitoringTaskForceQuestions

  1. There are no plans to change the failover mechanism to reconcile the differences between the CMS & ATLAS experiments. There are failovers to the reverse-proxy squids used for all services, and these should be monitored in a common way. This is not yet done for CVMFS stratum 1 nor for ATLAS Frontier server squids, only CMS Frontier (based on the awstats squid monitor). CMS also wants to monitor their centralized backup proxies this way.
  2. & 6. There's mostly a consensus that squids should be separately tested in SAM/SUM as a first-class service of their own, not only associated with CEs and servers as they are now, to help narrow down root causes when there's a problem. This implies they need to be defined in GOCDB/OIM. One possibility is to use the output of MRTG to determine if they are up or down, but if so, in order to avoid false alarms due to dropped UDP packets, it should look at them multiple times to see if a problem persists. Or we could have the squid monitoring machines generate another web page that only shows errors if there are multiple failures in a row and have the SAM/SUM tests use those.
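
The "multiple failures in a row" idea from item 2 & 6 could look roughly like the sketch below. It assumes the poll results are already available as a per-squid list of booleans; the function name, the data layout, and the threshold of 3 are assumptions made for illustration, not something decided at the meeting.

```python
def persistent_failures(history, threshold=3):
    """Return the squids whose most recent `threshold` polls all failed.

    history maps a squid name to its poll results, oldest first,
    where True means the squid answered and False means it did not.
    A single missed poll (for example a dropped UDP packet) is ignored.
    """
    down = []
    for name, results in history.items():
        recent = results[-threshold:]
        # Require a full window of failures before reporting a problem.
        if len(recent) == threshold and not any(recent):
            down.append(name)
    return sorted(down)


if __name__ == "__main__":
    # Made-up squid names and poll histories.
    history = {
        "squid-a": [True, False, True, True],     # one dropped packet
        "squid-b": [True, False, False, False],   # persistent failure
        "squid-c": [False, False],                # too little data so far
    }
    print(persistent_failures(history))
```

A page generated from such a list would show only squids with a persisting problem, which is what the SAM/SUM tests would then read.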

We will continue discussing the questions at the next meeting in 2 weeks.

Topic attachments

  • squid20121130AV.txt (r1, 6.4 K, 2012-12-12 18:17, AndreaValassi): Andrea's notes from the meeting
Topic revision: r2 - 2012-12-12 - AndreaValassi
 