SC4 Service Technical Questionnaire for VOMS/VOMRS

The questions below will allow an appropriate infrastructure, operations and procedures to be established for a service. It is recommended to answer the questions with a collection of experts from the full service stack (system administrators, storage administrators, application administrators).

Where the answers are not known, a guess based on current experience and tests should be entered followed by a question mark (?). As more experience is gained, an improved figure can be entered. Naturally, some attributes of the service such as performance may not be possible to attain if the data provided is not precise. While an answer of 'do not know' is not ideal, it is better than an incorrect but confident answer.

This is intended for use before the solution is implemented. The ScFourQualityAssurance step ensures that the requested service has been implemented or highlights open activities.

Additional information can be found at ScFourServiceDefinition and ScFourServiceTechnicalFactors.

Question Answer
Service
What service class is requested during which calendar periods ? An answer for AP, AS, OP and OS is required
OS=C
Who is providing second level support for the application (e.g. when there is an application problem, which organisation is responsible for resolution) CERN
project-lcg-vo-dteam-admin@cern.ch
By what mechanism should the second level support organisation be contacted email
What is the agreed response time of the second level organisation 3 hrs (9-5,Mon-Fri)
Is the service level defined in ScFourServiceDefinition Yes VOMS
Configuration
What are the interfaces for this application vomrs & voms-admin: HTTP-based user interface and SOAP API, implemented as a Java web application.
voms core: Internet ports, configurable, 15000 by default. We already use 15000-15010.
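A reachability check for the voms core ports could look roughly like the sketch below. The port range 15000-15010 comes from the answer above; the function names and the 3-second timeout are illustrative assumptions, and a successful TCP connection only shows that something is listening, not that voms is answering correctly.

```python
import socket

# Port range taken from the answer above (15000 by default, 15000-15010 in use).
VOMS_CORE_PORTS = list(range(15000, 15011))


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def closed_ports(host, ports=VOMS_CORE_PORTS, timeout=3.0):
    """List the ports in `ports` that do not accept a TCP connection."""
    return [p for p in ports if not is_port_open(host, p, timeout)]
```

Calling `closed_ports("lcg-voms.cern.ch")` from a machine that is allowed to reach the server would return the subset of ports that refused or timed out.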
What machines does it/could it run on Linux SL3
What are the configuration parameters vomrs: See http://computing.fnal.gov/docs/products/vomrs/vomrs1_2/configfile.html for the configuration parameters.
voms-admin: See http://cern.ch/dimou/lcg/voms/voms-admin-config-parameters
voms core: port, backlog, backend, socket timeout, etc...
Hardware Sizing
How much CPU power does the application need Dependent on the number of client connections but basically anything >=Dual CPU 1GHz (what we have now). We can expect ~60 simultaneous client connections.
How much real memory does the application require Total for our configuration should be >=3GB
How much swap space does the application require vomrs: needs about 60 MiB of virtual memory per VO. There is also one vomrs server per VO that requires about 20-30 MiB per VO.
voms-admin: needs about 50-75MiB of virtual memory per VO.
voms core: Dependent on the number of client connections.
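The per-VO figures above can be turned into a rough total, as in the sketch below. It assumes that the three per-VO figures (vomrs web application, vomrs server, voms-admin) simply add up, which the answer does not state explicitly, and it leaves out voms core because no figure is given for it. The count of 9 VOs is the one mentioned under Monitoring.

```python
# Per-VO figures in MiB, copied from the answers above as (low, high) ranges.
VOMRS_WEBAPP_MIB = (60, 60)
VOMRS_SERVER_MIB = (20, 30)
VOMS_ADMIN_MIB = (50, 75)


def virtual_memory_range_mib(num_vos):
    """Return (low, high) virtual memory in MiB for num_vos VOs, excluding voms core."""
    parts = (VOMRS_WEBAPP_MIB, VOMRS_SERVER_MIB, VOMS_ADMIN_MIB)
    low = num_vos * sum(p[0] for p in parts)
    high = num_vos * sum(p[1] for p in parts)
    return low, high


if __name__ == "__main__":
    low, high = virtual_memory_range_mib(9)
    print(f"9 VOs: {low}-{high} MiB")  # prints "9 VOs: 1170-1485 MiB"
```

Under these assumptions, 9 VOs need between 1170 and 1485 MiB of virtual memory before voms core is counted.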
What is the additional disk space required for the application (local logs, state data) local logs, size is configurable.
vomrs: 60 MiB
voms-admin: 100 MiB
What are the database setup requirements Database_Type=Oracle
Database_Name=? (living on grid8/voms-pilot.cern.ch host)
Database_Size=vomrs and voms-admin applications need at least one separate database account per VO. Normally one account capable of doing selects on all the DB, and another capable of doing updates on a single table (the table will contain just one record). Expected size for a VO with 5000 users, 10 groups and 10 roles is less than 2MB, plus the data for the admin interface.
Software Components
What software components make up the solution ? Web Servers, Databases, Code vomrs (uses tomcat, needs database backend, now on grid8)
foundation-view (uses CERN HR Oracle DB)
voms-admin (uses tomcat,soap,java)
voms
http://cern.ch/dimou/lcg/voms/server.html
Is there any licensed software which is part of the solution Oracle client, but OK for CERN
Is there a diagram explaining the role of the application in the total deliverable There are diagrams describing the links between the components but I don't know if there is a picture of the "total deliverable" where we could plug our part.
Data
Is the application stateful ? Where is the state data stored ? vomrs: The web UI retains session states for some configurable amount of time.
vomrs and voms-admin: Yes. The state is stored in the configured database.
The user interface is stateless, i.e., the service does not retain session states across user requests.
voms core: Yes, the state is stored in DB, the size is fixed.
Is there a replication procedure so that the state data can be copied to another system ? vomrs: a script exists that copies the configuration. 2 or more vomrs web application/servers can share the same database.
voms-admin: Yes. Two or more voms-admin servers can share the same database.
Backup/Restore
What files and directories should be backed up daily Everything under /opt and /var
Is there any requirement for a backup more frequently than once a day No
What files need to be archived (i.e. kept for legal, security or accounting) ? Everything in /var/log
How long should the archive data be kept Two years. See http://edms.cern.ch/document/428034 section 4 point 2
What databases need to be backed up All under accounts voms_[VOname] and vomrs_[VOname]
Is there a requirement for a coherent backup between files and databases No
Is off-site data storage required for any of the data being backed up Not for now. Replication of the VO DB is still being discussed in the Security group. Normally it is allowed by policy but not yet required.
Networking
Are IP aliases supported for the service rather than the hostname of the machines ? Not now.
What is the expected network bandwidth requirement for the machine 100MB/sec sustained bandwidth at maximum.
Is connectivity from the application to the outside of CERN required ? If so, for what purpose TCP/IP connectivity in http://network.cern.ch is INCOMING
Is connectivity from outside of CERN to the application required ? If so, for what purpose Yes, on port 8443 for authorised registration and ports 15xyz for voms-proxy
What external systems is the product dependent on for correct function lxb2051 (where the configuration files are stored), the central Physics (?) db servers (grid8 for now) and the CERN HR db.
Monitoring
What processes need to be running for the service to be up ? tomcat5, edg-voms, sshd (from within CERN), crond, java (vomrs)
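A check that the listed processes are present might be sketched as follows. The process names come from the answer above; the substring matching against `ps` output is a simplifying assumption (for example, tomcat5 may appear only as an argument of a java command line), and the helper names are hypothetical.

```python
import subprocess

# Process names as listed in the answer above.
REQUIRED = ["tomcat5", "edg-voms", "sshd", "crond", "java"]


def missing_processes(required, running_commands):
    """Return required names that match no running command line (substring match)."""
    return [name for name in required
            if not any(name in cmd for cmd in running_commands)]


def running_command_lines():
    """Return the command line of every process, one string each, via `ps`."""
    out = subprocess.run(["ps", "-e", "-o", "args="],
                         capture_output=True, text=True, check=True).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]
```

On a server, `missing_processes(REQUIRED, running_command_lines())` returns an empty list when every required name is found.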
What file systems need to be monitored and to what thresholds to avoid operation issues local /opt and /var up to 80%
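The 80% threshold on /opt and /var could be checked along the lines of the sketch below. The threshold and the paths come from the answer above; the function names are illustrative, and the fraction computed here (used/total) can differ slightly from the percentage reported by `df`, which accounts for reserved blocks.

```python
import shutil

THRESHOLD = 0.80  # 80% as stated in the answer above


def usage_fraction(path):
    """Fraction of the file system holding `path` that is in use (0.0-1.0)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def over_threshold(paths, threshold=THRESHOLD, probe=usage_fraction):
    """Return {path: fraction} for every path whose usage exceeds threshold."""
    result = {}
    for path in paths:
        fraction = probe(path)
        if fraction > threshold:
            result[path] = fraction
    return result
```

`over_threshold(["/opt", "/var"])` returns an empty dictionary while both file systems are below the threshold.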
Is there an application level check (such as a simulation of a user query) which can be used to check that the application is responding to user requests Yes. The command 'service gLite status' and https://[VOMSserver].cern.ch:8443/vo/[VOname]/vomrs where VOMSserver == lcg-voms | voms | voms-slave and [VOname] == a list of 9 VOs visible in https://lcg-voms.cern.ch:8443/vomses (requires personal certificate).
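The set of URLs to probe follows from the pattern above and can be generated as in this sketch. The server names and the URL pattern are taken from the answer; the VO names in the usage note are placeholders ("dteam" is inferred from the support mailing list name, "example" is hypothetical). Actually fetching a URL requires a personal certificate and is not shown.

```python
SERVERS = ["lcg-voms", "voms", "voms-slave"]  # from the answer above


def vomrs_url(server, vo_name):
    """Build the vomrs URL for one server and one VO."""
    return f"https://{server}.cern.ch:8443/vo/{vo_name}/vomrs"


def all_vomrs_urls(vo_names, servers=SERVERS):
    """Build the vomrs URL for every (server, VO) combination."""
    return [vomrs_url(s, vo) for s in servers for vo in vo_names]
```

For instance, `all_vomrs_urls(["dteam", "example"])` yields six URLs, two per server.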
Automation
What automatic processes run when (cron, acron) cron runs various jobs, which differ on every [VOMSserver].
Testing
Is a test environment defined Yes. On testbed004 == voms-test
Procedures
Is there an administration guide which explains
  • Installation
  • Configuration
  • Update
  • Problem Solving

Installation: https://uimon.cern.ch/twiki/bin/view/LCG/VomsCernSetup

Configuration & Update: https://uimon.cern.ch/twiki/bin/view/LCG/VomsConfiguration

Problem Solving: https://uimon.cern.ch/twiki/bin/view/LCG/VomsProblemSolving

Are there procedures defined for the operators to
  • Start
  • Stop
  • Check status
the product
https://uimon.cern.ch/twiki/bin/view/LCG/VomsStartStopCheck
Is there automatic monitoring of the service and a procedure to react in the event of a problem r-gma configured, maybe not used (?). Periodic restart of tomcat5.
In the event of an extended failure, what processes must be executed retro-actively (such as accounting catchup) None I know of.
What regular tasks need to be performed by the operators (cleanup of file systems, reboot of servers,...) None, hopefully, once the system and software are stable. So far, we have been doing continuous upgrades to install bug fixes.
What regular tasks need to be performed by administrators (change configuration files, tuning) Creation of new VOs, log checking and voms-admin parameter adjustment have been necessary so far.
For planned changes, how can the service be drained so that there are no new requests arriving ? What is the maximum lifetime of a request, i.e. how long after draining can the application be stopped ? Stopping tomcat and voms (== edg-voms) at the scheduled time. We do not know about running jobs that might require proxy renewal during the intervention. Normally the default lifetime of a proxy is 12 hours.
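The effect of the 12-hour default proxy lifetime on an intervention can be estimated with simple date arithmetic, as sketched below. Only the 12-hour default comes from the answer above; the function names are hypothetical, and the sketch ignores VO-specific lifetimes and any renewal mechanism.

```python
from datetime import datetime, timedelta

DEFAULT_PROXY_LIFETIME = timedelta(hours=12)  # default stated in the answer above


def last_proxy_expiry(stop_time, lifetime=DEFAULT_PROXY_LIFETIME):
    """Latest expiry of a proxy issued just before the service stops."""
    return stop_time + lifetime


def renewal_at_risk(proxy_issued, stop_time, restart_time,
                    lifetime=DEFAULT_PROXY_LIFETIME):
    """True if a proxy issued at proxy_issued expires while the service is down."""
    expiry = proxy_issued + lifetime
    return stop_time <= expiry < restart_time
```

For a stop at 09:00 and a restart at 13:00, a proxy issued at midnight expires at 12:00 and falls inside the intervention, while one issued at 02:00 expires at 14:00 and does not.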
Users
Who are the users of the service ? All grid registered users of all VOs.
What declarations of users / groups / roles are required This is a decision of the VO Admins and is declared via vomrs.
How will the users access the service Via the web https://lcg-voms.cern.ch:8443/vo/[VOname]/vomrs and the command line of their UI (voms-proxy-[init|info])
What super user / high access rights are required by the application administrators root (the way we have installed now) and the passwd for the CERN HR db view and the Oracle db accounts where the data are stored (just for direct testing of the databases' content with sql commands).
What technical users are required for the installation and administration I don't understand this question
Support
Who are the users of the service All grid registered users of all VOs.
What channel do the users have for reporting problems email to project-lcg-vo-[VOname]-admin@cern.ch
Escalation
Who should be informed when the service goes down The LCG-ROLLOUT@listserv.cclrc.ac.uk list and the VO Admins project-lcg-vo-[VOname]-admin@cern.ch, via the EGEE BROADCAST <egee-broadcast@cern.ch>
When the service window for recovery will not be achieved, who should comprise the crisis committee project-lcg-vo-dteam-admin@cern.ch
What other services should be stopped in order to reduce impact of outage The cron jobs for running edg-mkgridmap.pl on every CE and RB, but it is hard to arrange that at all sites.
Changes
Who is authorized to request an update or change The VO Admins, the developers, the Grid Deployment members responsible for LCG2 and/or gLite releases.
What are the procedures for announcing the change to the community Announcement via EGEE broadcast tool under https://cic.in2p3.fr
What is the lifetime of the current product version (i.e. when should it be changed) Between May and October 2005 we had to change 6 times. We sincerely wish to stabilise if the code is now robust.
When are the maintenance windows for this product ? I don't understand. We make changes during working hours but we announce them before.



-- Maria Dimou 1 Nov 2005