SC4 Service Technical Questionnaire

The questions below allow appropriate infrastructure, operations and procedures to be established for a service. They are best answered together with a group of experts covering the full service stack (system administrators, storage administrators, application administrators).

Where the answers are not known, a guess based on current experience and tests should be entered followed by a question mark (?). As more experience is gained, an improved figure can be entered. Naturally, some attributes of the service such as performance may not be possible to attain if the data provided is not precise. While an answer of 'do not know' is not ideal, it is better than an incorrect but confident answer.
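As an illustration of the question-mark convention, the following minimal sketch lists the answers that are still empty or marked as guesses, so they can be revisited as experience is gained. It assumes the answers are kept as plain strings keyed by a short question label; the function names and the sample values are hypothetical and not part of the questionnaire.

```python
def is_estimate(answer: str) -> bool:
    """Return True when an answer is marked as a guess by a trailing '?'."""
    return answer.strip().endswith("?")


def open_items(answers: dict[str, str]) -> list[str]:
    """List the questions whose answers are empty or still marked as guesses."""
    return [q for q, a in answers.items() if not a.strip() or is_estimate(a)]


# Hypothetical answers; the values are illustrative only.
answers = {
    "Real memory": "2 GB?",
    "Swap space": "4 GB",
    "Network bandwidth": "",
}
print(open_items(answers))
```

An answer of 'do not know' is deliberately not flagged here, since it carries no trailing question mark; whether to treat it as open is a choice left to whoever maintains the answers.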

This is intended for use before the solution is implemented. The ScFourQualityAssurance step ensures that the requested service has been implemented or highlights open activities.

Additional information can be found at ScFourServiceDefinition and ScFourServiceTechnicalFactors.

Question Answer
Service
What service class is requested during which calendar periods ? An answer for AP, AS, OP and OS is required.
AP=
AS=
OP=
OS=
Who is providing second level support for the application (e.g. when there is an application problem, which organisation is responsible for resolution)  
By what mechanism should the second level support organisation be contacted  
What is the agreed response time of the second level organisation  
Is the service level defined in ScFourServiceDefinition  
Configuration
What are the interfaces for this application  
What machines does it/could it run on  
What are the configuration parameters  
Hardware Sizing
How much CPU power does the application need  
How much real memory does the application require  
How much swap space does the application require  
What is the additional disk space required for the application (local logs, state data)  
What are the database setup requirements
Database_Type=
Database_Name=
Database_Size=
Software Components
What software components make up the solution (e.g. web servers, databases, code) ?  
Is there any licensed software which is part of the solution  
Is there a diagram explaining the role of the application in the total deliverable  
Data
Is the application stateful ? Where is the state data stored ?  
Is there a replication procedure so that the state data can be copied to another system ?  
Backup/Restore
What files and directories should be backed up daily  
Is there any requirement for a backup more frequently than once a day  
What files need to be archived (i.e. kept for legal, security or accounting reasons) ?  
How long should the archive data be kept  
What databases need to be backed up  
Is there a requirement for a coherent backup between files and databases  
Is off-site data storage required for any of the data being backed up  
Networking
Are IP aliases supported for the service rather than the hostname of the machines ?  
What is the expected network bandwidth requirement for the machine  
Is connectivity from the application to the outside of CERN required ? If so, for what purpose  
Is connectivity from outside of CERN to the application required ? If so, for what purpose  
What external systems is the product dependent on for correct function  
Monitoring
What processes need to be running for the service to be up ?  
What file systems need to be monitored, and to what thresholds, to avoid operational issues  
Is there an application level check (such as a simulation of a user query) which can be used to check that the application is responding to user requests  
Automation
What automatic processes run, and when (cron, acron)  
Testing
Is a test environment defined  
Procedures
Is there an administration guide which explains
  • Installation
  • Configuration
  • Update
  • Problem Solving
 
Are there procedures defined for the operators to
  • Start
  • Stop
  • Check status
the product
 
Is there automatic monitoring of the service and a procedure to react in the event of a problem  
In the event of an extended failure, what processes must be executed retroactively (such as accounting catch-up)  
What regular tasks need to be performed by the operators (cleanup of file systems, reboot of servers,...)  
What regular tasks need to be performed by administrators (change configuration files, tuning)  
For planned changes, how can the service be drained so that there are no new requests arriving ? What is the maximum lifetime of a request, i.e. how long after draining must one wait before the application can be stopped ?  
Users
Who are the users of the service ?  
What declarations of users / groups / roles are required  
How will the users access the service  
What super user / high access rights are required by the application administrators  
What technical users are required for installation and administration  
Support
Who are the users of the service  
What channel do the users have for reporting problems  
Escalation
Who should be informed when the service goes down  
When the service window for recovery will not be achieved, who should comprise the crisis committee  
What other services should be stopped in order to reduce the impact of the outage  
Changes
Who is authorized to request an update or change  
What are the procedures for announcing the change to the community  
What is the lifetime of the current product version (i.e. when should it be changed)  
When are the maintenance windows for this product ?  
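To make the Monitoring question on file-system thresholds more concrete, the following minimal sketch reports which monitored paths have reached their threshold. It assumes the thresholds are expressed as a percentage of used space per mount point; the function names and the example paths and limits are hypothetical, and a production service would normally rely on the site's own monitoring tools rather than a standalone script.

```python
import shutil


def percent_used(path: str) -> float:
    """Return the percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo file systems report a size of zero.
        return 0.0
    # Note: this ratio can differ slightly from the figure shown by `df`,
    # which accounts for blocks reserved for the super user.
    return 100.0 * usage.used / usage.total


def over_threshold(thresholds: dict[str, float]) -> dict[str, float]:
    """Return the paths whose usage has reached or passed their threshold."""
    result = {}
    for path, limit in thresholds.items():
        used = percent_used(path)
        if used >= limit:
            result[path] = used
    return result


# Hypothetical thresholds (percent used); adjust to the answers given above.
if __name__ == "__main__":
    for path, used in over_threshold({"/": 90.0, "/tmp": 80.0}).items():
        print(f"{path}: {used:.1f}% used")
```

A script of this kind could be one of the automatic processes listed under Automation, run periodically from cron.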

-- TimBell - 06 Sep 2005

Topic revision: r6 - 2005-09-13 - TimBell
 