-- JamieShiers - 03 Aug 2005

R-GMA Smoke Tests and Actions

Registry - Remote Tests

Check that Tomcat is running: http://lcgic01.gridpp.rl.ac.uk:8080 (this should return the Apache Tomcat web page).

Check that the R-GMA Registry is running: http://lcgic01.gridpp.rl.ac.uk:8080/R-GMA/RegistryServlet/getNewStatus

N.B. This is an unsupported function and may be removed in the next release.

This should return an XML page containing general information about the Registry, including data about memory use.
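The two checks above can be scripted from a monitoring host. The following is a minimal sketch using only the Python standard library; the function names, the 10 second timeout and the injectable `opener` argument are assumptions made for illustration and are not part of R-GMA.

```python
import urllib.request
import urllib.error

BASE = "http://lcgic01.gridpp.rl.ac.uk:8080"
CHECKS = {
    "tomcat": BASE,
    "registry": BASE + "/R-GMA/RegistryServlet/getNewStatus",
}

def probe(url, timeout=10, opener=urllib.request.urlopen):
    """Return (ok, detail) for one URL; ok is True on an HTTP 200 reply."""
    try:
        with opener(url, timeout=timeout) as reply:
            status = reply.getcode()
            return status == 200, "HTTP %d" % status
    except (urllib.error.URLError, OSError) as exc:
        # Covers refused connections, timeouts and HTTP error statuses.
        return False, str(exc)

def run_checks(opener=urllib.request.urlopen):
    """Probe every URL in CHECKS and return {name: (ok, detail)}."""
    return {name: probe(url, opener=opener) for name, url in CHECKS.items()}
```

Calling `run_checks()` returns one entry per check. This only confirms that each page is reachable; the content of the returned page still needs to be inspected by eye.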

Check the message queues on the Registry: http://lcgic01.gridpp.rl.ac.uk:8080/R-GMA/RegistryServlet/newGetNewStatus

N.B. This is an unsupported function and may be removed in the next release.

The returned web page contains information about messages that are waiting to be sent. There are three queues: RM-fast, RM-medium and RM-slow. The messages in bold are the ones that are currently being sent. Initially all messages are in the fast queue; if the send times out, the message is moved to the medium queue for retry. A further timeout results in the message moving to the slow queue.
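The queue behaviour described above can be pictured with a small model. This is an illustrative sketch only, not the Registry's implementation: the function names are hypothetical, and what happens to a message that times out while already in the slow queue is not stated here, so the sketch assumes it stays in RM-slow.

```python
QUEUES = ["RM-fast", "RM-medium", "RM-slow"]

def next_queue(current):
    """Return the queue a message moves to after its send times out."""
    index = QUEUES.index(current)
    # Assumption: a message already in the slow queue stays there.
    return QUEUES[min(index + 1, len(QUEUES) - 1)]

def route(send_outcomes):
    """Follow one message through successive send attempts.

    send_outcomes is a list of booleans: True means the send succeeded,
    False means it timed out. Returns the queue the message ends up in,
    or None once it has been delivered.
    """
    queue = QUEUES[0]  # every message starts in the fast queue
    for delivered in send_outcomes:
        if delivered:
            return None
        queue = next_queue(queue)
    return queue
```

For example, `route([False])` gives `"RM-medium"` and `route([False, False])` gives `"RM-slow"`.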

If the request for this web page takes more than 5 seconds to return, there is likely to be a problem with the Registry. Check whether there is an excessive number of messages in the fast queue, and continue to check over a 10 minute period. Since the introduction of the three queues in July we have not witnessed any major problems with the Registry. If there is still a problem after 10 minutes, try flushing the queues:

http://lcgic01.gridpp.rl.ac.uk:8080/R-GMA/RegistryServlet/flushQueue?queueName=name&host=host

The queueName and host parameters are both optional. If you omit both, the messages for all hosts in all queues will be deleted! You can find the names of the queues from newGetNewStatus. The host corresponds to the servletURL part of a servlet connection.
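The 5 second and 10 minute thresholds and the flushQueue URL above can be combined into a small helper. This is a sketch under stated assumptions: the function names, the 60 second polling interval and the `allow_all` safety flag are hypothetical, and the exact form of the host value should be taken from the newGetNewStatus page.

```python
import time
from urllib.parse import urlencode

REGISTRY = "http://lcgic01.gridpp.rl.ac.uk:8080/R-GMA/RegistryServlet"
SLOW_AFTER = 5.0      # seconds, from the text above
WATCH_PERIOD = 600.0  # 10 minutes, from the text above

def still_slow(fetch_status, interval=60.0, clock=time.monotonic, sleep=time.sleep):
    """Poll the status page for WATCH_PERIOD; True if the last poll was slow.

    fetch_status is any callable that requests newGetNewStatus and returns
    once the page has arrived, or raises OSError if the request fails.
    """
    deadline = clock() + WATCH_PERIOD
    while True:
        started = clock()
        try:
            fetch_status()
            slow = clock() - started > SLOW_AFTER
        except OSError:
            slow = True  # no answer at all counts as a problem
        if clock() >= deadline:
            return slow
        sleep(interval)

def flush_queue_url(queue_name=None, host=None, allow_all=False):
    """Build the flushQueue URL; refuse to flush everything by accident."""
    params = {}
    if queue_name:
        params["queueName"] = queue_name
    if host:
        params["host"] = host
    if not params and not allow_all:
        raise ValueError("no queueName or host given: this would flush every queue")
    query = urlencode(params)
    return REGISTRY + "/flushQueue" + ("?" + query if query else "")
```

For example, `flush_queue_url("RM-fast")` builds a URL that flushes only the fast queue. Requiring `allow_all=True` before building the parameterless URL is a design choice of this sketch, intended to guard against the delete-everything case described above.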

Registry - Local Intervention

If there is still a problem after carrying out the remote tests, there are two other options: ban a site or restart Tomcat.

Check the server. You can check that R-GMA is up and running ok using:

/opt/glite/bin/rgma-server-check

GLITE_LOCATION is the installation directory of R-GMA; its default location, /opt/glite, is used in the path above.

Check for error messages and restart Tomcat if necessary.
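When R-GMA is installed somewhere other than the default location, the path of the check script follows GLITE_LOCATION. The sketch below resolves the path and runs the script; the function names are hypothetical, and because the meaning of the script's exit code is not documented here, the output should still be read for error messages.

```python
import os
import subprocess

def server_check_command(environ=os.environ):
    """Return the path of rgma-server-check under GLITE_LOCATION."""
    glite = environ.get("GLITE_LOCATION", "/opt/glite")  # default from the text
    return os.path.join(glite, "bin", "rgma-server-check")

def run_server_check(environ=os.environ):
    """Run the check and return (exit_code, combined_output)."""
    result = subprocess.run(
        [server_check_command(environ)],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout
```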

Restart Tomcat. If you failed to get a response from any of the remote tests, it may be necessary to restart Tomcat:

/sbin/service tomcat5 restart

Ban a site. If the fast queue blocks up again after the message queues on the Registry have been flushed, it is possible to ban the offending site, usually the site shown in bold. This is done by adding it to the deny section of the access control list:

/opt/glite/etc/rgma-server/access-control-list.xml

It will then be necessary to flush the queues of any offending messages.

Archivers - Remote Tests

There may be several archivers running on the grid which constitute critical services. A list of such archivers should be provided by the managers of said archivers.

When a problem is reported with the results from a query of an archived table, first check if any archivers are available for that table and whether or not they are working.

Go to an R-GMA browser: http://hostname:8080/R-GMA/

Click on “Table Sets”.

Click on the table name for which the problem was reported.

If there are any history producer URLs listed:

  • Click on “Type of query: History”;

  • Click on “Select producers you want to query:”;

  • Click on the first URL listed under the heading history;

  • Click on “Query”;

  • Check that the results are as expected;

  • Repeat this for all of the URLs for the history producers.

If there are any latest producer URLs listed:

  • Click on “Type of query: Latest”;

  • Then query each of the latest producers in the same manner described for the history producers.

If a bad producer is found, check that the R-GMA servlets are running:

  • or

  • If there is a problem with R-GMA:

    • Restart Tomcat on that machine;

    • Restart the archiver.

  • Otherwise:

    • Restart the archiver.

If no history or latest producers are found, check the machines where they are supposed to be running:

  • or

  • If there is a problem with R-GMA:

    • Restart Tomcat on that machine;

    • Restart the archiver.

  • Otherwise:

    • Restart the archiver.
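The recovery decision in the two lists above can be summarised as a short function. This is a sketch only: the function names are hypothetical, and the two callables stand in for the manual browser query and the servlet check, which are not automated here.

```python
def recovery_actions(rgma_ok):
    """Return the ordered recovery steps for a bad or missing producer."""
    if rgma_ok:
        return ["restart archiver"]
    return ["restart Tomcat", "restart archiver"]

def triage(producers, expected, query_ok, rgma_ok):
    """Map each location needing attention to its recovery steps.

    producers: history and latest producer URLs listed for the table.
    expected:  machines where the archivers are supposed to be running,
               used only when no producers are listed at all.
    query_ok:  callable(url) -> True if the query results are as expected.
    rgma_ok:   callable(url) -> True if the R-GMA servlets there are running.
    """
    if producers:
        suspects = [url for url in producers if not query_ok(url)]
    else:
        suspects = list(expected)
    return {url: recovery_actions(rgma_ok(url)) for url in suspects}
```

In both branches Tomcat is restarted only when R-GMA itself has a problem; the archiver is restarted in every case.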

Other things to check

Check the disk space on the machine where the archiver's database is located. Usually this will be the local R-GMA server.
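A disk space check can be scripted with the Python standard library. This is a minimal sketch: the function names and the 90% threshold are assumptions, and the path of the database directory must be supplied by the operator because it is not given here.

```python
import shutil

def disk_usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def disk_is_low(path, max_fraction=0.9):
    """True if the filesystem holding path is fuller than max_fraction."""
    return disk_usage_fraction(path) > max_fraction
```

Run on the archiver machine, `disk_is_low(db_dir)` (with `db_dir` set to the database directory) reports whether that filesystem is more than 90% full.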

Contacting the third level support

Instructions on how and when to do this are given in: https://twiki.cern.ch/twiki/bin/view/LCG/RGMAThirdLevelProcedure
Topic revision: r5 - 2006-02-07 - PeterJones
 