-- JamieShiers - 03 Aug 2005
R-GMA Smoke Tests and Actions
Registry - Remote Tests
Check that Tomcat is running
http://lcgic01.gridpp.rl.ac.uk:8080
This should return the Apache Tomcat web page.
Check the R-GMA Registry is running
http://lcgic01.gridpp.rl.ac.uk:8080/R-GMA/RegistryServlet/getNewStatus
N.B. This is an unsupported function and may be removed in the next release.
This should return an XML page containing general information about the Registry, including data about memory use.
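The two checks above can also be scripted. The following is a minimal Python 3 sketch using only the standard library. It requests the Tomcat front page and the getNewStatus page and reports whether each answered with HTTP 200. The layout of the Registry's XML is not documented here, so the sketch only checks that the status page is well-formed XML and does not read any particular field. The function names are hypothetical.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# Registry host taken from the URLs above; change it for a different Registry.
REGISTRY_BASE = "http://lcgic01.gridpp.rl.ac.uk:8080"
STATUS_PATH = "/R-GMA/RegistryServlet/getNewStatus"


def fetch(url, timeout=10.0):
    """GET url and return (HTTP status, body bytes); status is None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code, b""
    except OSError:
        # URLError and socket timeouts are both subclasses of OSError.
        return None, b""


def is_well_formed_xml(body):
    """True if body parses as XML; says nothing about what the XML contains."""
    try:
        ET.fromstring(body)
        return True
    except ET.ParseError:
        return False


def check_registry(base=REGISTRY_BASE):
    """Return (tomcat_ok, registry_ok) for the two remote tests."""
    tomcat_status, _ = fetch(base)
    status, body = fetch(base + STATUS_PATH)
    return tomcat_status == 200, status == 200 and is_well_formed_xml(body)
```

Calling check_registry() by hand or from a cron job returns two booleans. A False in the first position points to Tomcat, and a False only in the second points to the Registry servlet.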
Check the message queues on the Registry
http://lcgic01.gridpp.rl.ac.uk:8080/R-GMA/RegistryServlet/newGetNewStatus
N.B. This is an unsupported function and may be removed in the next release.
The returned web page contains information about messages that are waiting to be sent. There are three queues: RM-fast, RM-medium and RM-slow. The messages in bold are the ones currently being sent. Initially all messages are placed in the fast queue; if a send times out, the message is moved to the medium queue for retry. A further timeout moves the message to the slow queue.
If the request for this web page takes more than 5 seconds to return, there is likely to be a problem with the Registry. Check whether there is an excessive number of messages in the fast queue, and keep checking over a 10-minute period. Since the three queues were introduced in July, we have not seen any major problems with the Registry. If the problem persists after 10 minutes, try flushing the queues:
http://lcgic01.gridpp.rl.ac.uk:8080/R-GMA/RegistryServlet/flushQueue?queueName=name&host=host
The queueName and host parameters are both optional. If you omit both, all hosts will be flushed from all queues! You can find the names of the queues from newGetNewStatus. The host value corresponds to the servletURL part of a servlet connection.
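The 5-second rule, the 10-minute observation period and the optional flushQueue parameters described above can be sketched in Python as follows. This is a minimal sketch that uses only the standard library. The helper names and the one-minute polling interval are hypothetical choices and are not part of R-GMA. The sketch also assumes the servlet accepts URL-encoded parameter values. flush_queue_url deliberately refuses to build the parameter-less URL unless explicitly asked, because that form flushes every host from every queue.

```python
import time
import urllib.request
from urllib.parse import urlencode

SERVLET = "http://lcgic01.gridpp.rl.ac.uk:8080/R-GMA/RegistryServlet"
SLOW_THRESHOLD_S = 5.0  # a slower response suggests a Registry problem


def _default_fetch(url, timeout=30.0):
    # A request that times out raises an exception instead of returning a time.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def time_status_page(fetch=_default_fetch):
    """Return how many seconds newGetNewStatus took to return."""
    start = time.monotonic()
    fetch(SERVLET + "/newGetNewStatus")
    return time.monotonic() - start


def watch_status_page(period_s=600, interval_s=60, fetch=_default_fetch, sleep=time.sleep):
    """Sample the response time repeatedly over period_s seconds (default 10 minutes)."""
    samples = max(1, period_s // interval_s)
    times = []
    for i in range(samples):
        times.append(time_status_page(fetch))
        if i < samples - 1:
            sleep(interval_s)
    return times


def problem_persists(times, threshold=SLOW_THRESHOLD_S):
    """True if the most recent sample is still slower than the threshold."""
    return bool(times) and times[-1] > threshold


def flush_queue_url(queue_name=None, host=None, allow_flush_all=False):
    """Build a flushQueue URL; queueName and host are both optional."""
    params = {}
    if queue_name is not None:
        params["queueName"] = queue_name
    if host is not None:
        params["host"] = host
    if not params and not allow_flush_all:
        raise ValueError("no queueName or host: this would flush all hosts from all queues")
    url = SERVLET + "/flushQueue"
    return url + "?" + urlencode(params) if params else url
```

For example, if problem_persists(watch_status_page()) is still true at the end of the watch, flush_queue_url("RM-fast") gives the URL for flushing only the fast queue. Confirm the exact queue names on newGetNewStatus first; "RM-fast" is taken from the description above.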
Registry - Local Intervention
If a problem remains after carrying out the remote tests, there are two further options: banning a site and restarting Tomcat.
Check the server. You can confirm that R-GMA is up and running correctly using:
/opt/glite/bin/rgma-server-check
GLITE_LOCATION is the installation directory of R-GMA; the default location is /opt/glite.
Check the output for error messages and restart Tomcat if necessary.
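Because the script lives under GLITE_LOCATION, a wrapper should resolve its path from that variable instead of hard-coding /opt/glite. The following is a minimal Python sketch. It assumes the script sits in $GLITE_LOCATION/bin, as the default path above suggests. The helper names are hypothetical. The meaning of the script's exit code is not described here, so the sketch only captures the output for you to read.

```python
import os
import subprocess


def server_check_path(env=None):
    """Path to rgma-server-check, honouring GLITE_LOCATION (default /opt/glite)."""
    env = os.environ if env is None else env
    glite_location = env.get("GLITE_LOCATION", "/opt/glite")
    return os.path.join(glite_location, "bin", "rgma-server-check")


def run_server_check(env=None):
    """Run the check and return (exit code, combined stdout and stderr) for inspection."""
    result = subprocess.run(
        [server_check_path(env)], capture_output=True, text=True, check=False
    )
    return result.returncode, result.stdout + result.stderr
```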
Restart Tomcat. If you failed to get a response from any of the remote tests, it may be necessary to restart Tomcat:
/sbin/service tomcat5 restart
Ban a site. If the fast queue blocks up again after the Registry's message queues have been flushed, you can ban the offending site, which is usually the one shown in bold. To do this, add it to the deny section of the access control list:
/opt/glite/etc/rgma-server/access-control-list.xml
You will then need to flush any offending messages from the queues.
Archivers - Remote Tests
Several archivers running on the grid may constitute critical services. The managers of these archivers should provide a list of them.
When a problem is reported with the results of a query on an archived table, first check whether any archivers are available for that table and whether they are working.
Go to an R-GMA browser:
http://hostname:8080/R-GMA/
Click on “Table Sets”.
Click on the table name for which the problem was reported.
If there are any history producer URLs listed:
- Click on “Type of query: History”;
- Click on “Select producers you want to query:”;
- Click on the first URL listed under the heading history;
- Check that the results are as expected;
- Repeat this for all of the URLs for the history producers.
If there are any latest producer URLs listed:
- Click on “Type of query: Latest”;
- Then query each of the latest producers in the same manner described for the history producers.
If a bad producer is found, check that the R-GMA servlets are running:
- If there is a problem with the R-GMA:
- Restart Tomcat on that machine;
If no history or latest producers are found, check the machines where they are supposed to be running:
- If there is a problem with the R-GMA:
- Restart Tomcat on that machine;
Other things to check
Check the disk space on the machine that hosts the archiver database. Usually this will be the local R-GMA server.
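One quick way to check the disk space from Python is shutil.disk_usage. The sketch below assumes you know the mount point that holds the archiver database. The path ("/") and the 90% threshold are illustrative placeholders, not R-GMA settings.

```python
import shutil

# Hypothetical values: point DB_PATH at the filesystem holding the archiver
# database on your R-GMA server, and pick a threshold that suits your site.
DB_PATH = "/"
THRESHOLD_PERCENT = 90.0


def disk_used_percent(path=DB_PATH):
    """Percentage of the filesystem containing path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def low_on_space(path=DB_PATH, threshold=THRESHOLD_PERCENT):
    """True if usage of the filesystem containing path has reached the threshold."""
    return disk_used_percent(path) >= threshold
```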
Contacting third-level support
Instructions on how and when to do this are given in:
https://twiki.cern.ch/twiki/bin/view/LCG/RGMAThirdLevelProcedure