This page describes the tests that are run regularly to check the availability of the ETICS services, and explains the most frequent error messages.

Tests

The tests are submitted from a machine at INFN (etics-01.cnaf.infn.it). They run every 5 minutes, except for job submissions, which run every hour. The results are published on the same node (http://etics-01.cnaf.infn.it/SLS) as XML files in the format required by the SLS monitoring tools. The availability history of each service and the overall availability of the ETICS infrastructure are shown on the SLS page: https://lemonweb.cern.ch/sls/service.php?id=ETICS (NICE authentication is required). The tests are run against the three official sites of the project (CERN, INFN, UOW).

Types of tests

There are six types of tests:

  • ETICS_DB accesses the ETICS DB with a mysql command (run only for CERN)
  • ETICS_WS checks the availability of the Web Service by issuing an https request with wget
  • ETICS_WEBAPP checks the availability of the Web Application by downloading its main page
  • ETICS_REMOTE checks the availability of the remote submission service (it fails if an error is returned when submitting a build)
  • ETICS_REMOTE_OUTPUT checks whether the submitted builds succeeded. The builds are submitted on two different platforms (slc3_ia32_gcc323, slc4_ia32_gcc346)
  • ETICS_REMOTE_OUTPUT_CROSS is like the previous one, but builds on platforms that are not available locally (it tests the cross-site submission)
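As an illustration of the ETICS_WEBAPP style of check, the sketch below downloads a page and maps the outcome to an availability value. It is not the script used on etics-01: the function name `check_webapp`, the `opener` parameter, the 30 second timeout and the 100/0 availability values are assumptions made for this example, and the URL in the usage note is a placeholder.

```python
from urllib.request import urlopen


def check_webapp(url, opener=urlopen, timeout=30):
    """Return 100 if the page at `url` can be downloaded, 0 otherwise.

    `opener` defaults to urllib's urlopen; it can be replaced in tests.
    The 100/0 scale is an assumption, not the documented SLS convention.
    """
    try:
        with opener(url, timeout=timeout) as response:
            # Any status other than 200 is treated as "not available".
            return 100 if response.status == 200 else 0
    except OSError:
        # URLError and HTTPError are subclasses of OSError, so this also
        # covers refused connections, TLS failures and HTTP error codes.
        return 0
```

A call such as `check_webapp("https://webapp.example.org/")` (hypothetical address) would return 100 or 0 depending on whether the main page can be fetched.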

Each test is identified by an ID made of the test name, the symbol "_" and the site name (e.g. ETICS_REMOTE_OUTPUT_INFN).
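The naming rule can be sketched in a few lines. The helper names `make_test_id` and `all_test_ids` are hypothetical; the test names, the site names and the restriction of ETICS_DB to CERN come from the description above.

```python
SITES = ["CERN", "INFN", "UOW"]
TESTS = [
    "ETICS_DB",
    "ETICS_WS",
    "ETICS_WEBAPP",
    "ETICS_REMOTE",
    "ETICS_REMOTE_OUTPUT",
    "ETICS_REMOTE_OUTPUT_CROSS",
]
# ETICS_DB is run only for CERN.
CERN_ONLY = {"ETICS_DB"}


def make_test_id(test, site):
    """Build the ID of a test: test name, "_", site name."""
    return f"{test}_{site}"


def all_test_ids():
    """List the IDs of every test that is run, site by site."""
    ids = []
    for site in SITES:
        for test in TESTS:
            if test in CERN_ONLY and site != "CERN":
                continue
            ids.append(make_test_id(test, site))
    return ids
```

With these definitions, `make_test_id("ETICS_REMOTE_OUTPUT", "INFN")` gives `ETICS_REMOTE_OUTPUT_INFN`, and `all_test_ids()` contains 16 entries (six tests on three sites, minus ETICS_DB at INFN and UOW).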

On the SLS page there are special metaservices that show the overall availability of a site or of ETICS as a whole. Their names are:

  • ETICS
  • ETICS_INFN
  • ETICS_UOW
  • ETICS_CERN

The availability of ETICS is the average of the availabilities of the three sites. The availability of a site is a weighted average of the availabilities of its services. The weights of the services are as follows:

  • ETICS_DB : 5
  • ETICS_WS : 5
  • ETICS_WEBAPP : 5
  • ETICS_REMOTE : 3
  • ETICS_REMOTE_OUTPUT : 2
  • ETICS_REMOTE_OUTPUT_CROSS : 1
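The following sketch works through the calculation with the weights listed above. The function names are hypothetical, and one point is an assumption: a service that is not tested on a site (ETICS_DB outside CERN) is simply left out of that site's weighted average. How SLS actually treats this case is not stated here.

```python
WEIGHTS = {
    "ETICS_DB": 5,
    "ETICS_WS": 5,
    "ETICS_WEBAPP": 5,
    "ETICS_REMOTE": 3,
    "ETICS_REMOTE_OUTPUT": 2,
    "ETICS_REMOTE_OUTPUT_CROSS": 1,
}


def site_availability(results):
    """Weighted average of the service availabilities of one site.

    `results` maps a test name to its availability in percent (0-100).
    Tests missing from `results` are ignored (assumption, see above).
    """
    total_weight = sum(WEIGHTS[name] for name in results)
    if total_weight == 0:
        raise ValueError("no test results given")
    weighted = sum(WEIGHTS[name] * value for name, value in results.items())
    return weighted / total_weight


def etics_availability(site_values):
    """Plain average of the availabilities of the sites."""
    return sum(site_values) / len(site_values)


# Example: at INFN only ETICS_REMOTE_OUTPUT is failing.
infn = site_availability({
    "ETICS_WS": 100,
    "ETICS_WEBAPP": 100,
    "ETICS_REMOTE": 100,
    "ETICS_REMOTE_OUTPUT": 0,
    "ETICS_REMOTE_OUTPUT_CROSS": 100,
})
print(infn)  # 1400 / 16 = 87.5
```

Because the weight of ETICS_REMOTE_OUTPUT is small, a failed build lowers the site availability far less than a failure of the Web Service or the Web Application would.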

SLS notification mail

A mail is sent to the ETICS site administrators if an error is detected on a site (availability < 100%). The mail is sent every 5 minutes, except for problems in the remote submission, for which it is sent every hour.
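The notification rule can be sketched as a small decision function. This is only an illustration: it assumes that an error means an availability below 100%, that the hourly interval applies to all three ETICS_REMOTE* tests, and that the caller keeps track of the time elapsed since the last mail. The function names are hypothetical.

```python
# Tests that depend on job submission, which runs once per hour (assumption:
# all three remote tests follow the hourly notification interval).
REMOTE_TESTS = {
    "ETICS_REMOTE",
    "ETICS_REMOTE_OUTPUT",
    "ETICS_REMOTE_OUTPUT_CROSS",
}


def notification_interval_minutes(test_name):
    """Minutes between two mails for the same failing test."""
    return 60 if test_name in REMOTE_TESTS else 5


def should_notify(test_name, availability, minutes_since_last_mail):
    """Decide whether a mail should be sent for this test result."""
    if availability >= 100:
        return False  # no error detected
    return minutes_since_last_mail >= notification_interval_minutes(test_name)
```

For example, a failing ETICS_WS test triggers a new mail 5 minutes after the previous one, while a failing ETICS_REMOTE test waits for the next hourly submission.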

Understanding the errors

When a site administrator receives an error notification mail, he/she must try to understand and solve the problem as soon as possible. The most frequent causes of errors are listed below:

  • ETICS_DB : mysql port closed on the firewall, changed password in the DB, maximum number of open connections reached
  • ETICS_WS : problems with the https protocol (try to contact the insecure instance of the WS), tomcat died (restart condor)
  • ETICS_WEBAPP : problems with the https protocol (try to contact the insecure instance of the WebApp), tomcat died (restart condor)
  • ETICS_REMOTE : problems in the configuration of the connection between WS and Metronome (check the WS configuration files), impossible to submit to Metronome (try a local Metronome submission: nmi_submit)
  • ETICS_REMOTE_OUTPUT : first check on which platform the jobs fail. If the jobs are still queued, add new worker nodes (WNs) for that platform. If the WNs with that platform are not running, restart them. If the jobs failed, there may be problems in the WN configuration.
  • ETICS_REMOTE_OUTPUT_CROSS : if ETICS_REMOTE_OUTPUT has also failed on the same site, the cause is probably related to the job submission; otherwise there is something wrong in the cross-site submission.
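The reasoning for the two ETICS_REMOTE_OUTPUT tests can be written down as a small triage helper. The function names, the job state strings ("queued", "failed") and the returned messages are hypothetical; checking for stopped nodes before looking at the job state is also a choice made for this sketch.

```python
def remote_output_action(job_state, nodes_running):
    """Suggest an action for a failing ETICS_REMOTE_OUTPUT test.

    `job_state` is the state of the jobs on the failing platform;
    `nodes_running` tells whether the nodes of that platform are up.
    """
    if not nodes_running:
        return "restart the nodes of that platform"
    if job_state == "queued":
        return "add new nodes for that platform"
    if job_state == "failed":
        return "check the node configuration"
    return "no action suggested"


def cross_failure_cause(remote_output_failed):
    """Guess the cause of a failing ETICS_REMOTE_OUTPUT_CROSS test."""
    if remote_output_failed:
        # The local build test fails too, so the problem is probably
        # in the job submission itself.
        return "job submission"
    return "cross-site submission"
```

For instance, `cross_failure_cause(False)` points to the cross-site submission, because local builds on the same site still succeed.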

-- Main.mselmi - 09 Aug 2007

Topic revision: r1 - 2007-08-09 - unknown
 