Tier0 Services Required for Service Challenge 4 and Initial LHC Service

The following table gives a high-level overview of the services required for SC4 and the Initial LHC Service.

Some of these services are required at Tier1s and / or Tier2s. A list of services required per site will be produced at a later date.

It is intended as a first step towards understanding the service issues and the implications for middleware enhancements, hardware requirements, etc. The focus is on redundancy, high availability and scalability, achieved where possible in software (which makes the hardware side much easier and much more flexible).

Please see An Overview of LCG 2 Middleware (Oct 2004); an update will be prepared on the timescale of end September 2005.

Issues that need to be addressed include:

  • criticality (critical, high, medium, low), expressed as the acceptable downtime, where
    • C = critical: < 1 hour,
    • H = high: < 4 hours,
    • M = medium: < 24 hours,
    • L = low: < 1 week (or some similar scale)

(proposed by Tim Bell - maybe these should be aligned with the parameters for minimum levels of T0 service in the MoU (page A3.2) - Jamie)

  • disaster recovery (e.g. is it necessary to have the machines for the service in different locations?)

  • service supports high availability (e.g. whether, as for the BDII, the software can automatically provide HA, or whether it needs to be implemented with a standby machine)

  • external accessibility required?

Many of the services also include or rely on a database component, some Oracle, some MySQL; these issues also have to be addressed. One way the attributes above could be recorded per service is sketched below.
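
Purely as an illustration of how the checklist above (criticality, disaster recovery, HA mode, external access, database backend, recovery procedures) could be recorded per service, the Python sketch below uses hypothetical names (ServiceDescriptor, MAX_DOWNTIME) that do not exist in any deployed middleware. In the FTS example only the acronym, criticality and Oracle backend come from this page; the remaining flags are assumptions.

<verbatim>
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

# Acceptable downtime per criticality class, following the scale proposed
# above (to be aligned with the MoU minimum levels of T0 service if needed).
MAX_DOWNTIME = {
    "C": timedelta(hours=1),    # critical
    "H": timedelta(hours=4),    # high
    "M": timedelta(hours=24),   # medium
    "L": timedelta(weeks=1),    # low
}

@dataclass
class ServiceDescriptor:
    """One entry of the service inventory plus the issue checklist above."""
    name: str                        # e.g. "FileTransferService"
    acronym: Optional[str]           # e.g. "FTS"
    criticality: str                 # one of "C", "H", "M", "L"
    ha_in_software: bool             # can the software itself provide HA (like the BDII)?
    standby_machine_needed: bool     # otherwise: is a standby machine required?
    separate_location_needed: bool   # disaster recovery: machines in different locations?
    externally_accessible: bool      # must the service be reachable from outside?
    db_backend: Optional[str]        # "Oracle", "MySQL" or None
    recovery_procedure_defined: bool = False
    recovery_procedure_tested: bool = False

    def max_downtime(self) -> timedelta:
        """Acceptable downtime implied by the criticality class."""
        return MAX_DOWNTIME[self.criticality]

# Example entry for FTS; the flags below are illustrative assumptions,
# only the acronym, criticality and Oracle backend come from the tables.
fts = ServiceDescriptor(
    name="FileTransferService", acronym="FTS", criticality="C",
    ha_in_software=False, standby_machine_needed=True,
    separate_location_needed=True, externally_accessible=True,
    db_backend="Oracle",
)
print(fts.max_downtime())  # 1:00:00
</verbatim>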

To be added:

  • recovery procedures defined Y/N, tested Y/N
  • expected lifetime of service; foreseen replacement service

Also need:

  • Level 1, 2 & 3 procedures;
  • Mailing lists (standards?)
  • Documentation, FAQ, ...
  • Monitoring, including comparison of the delivered service level with the agreed level (a possible check is sketched after this list)
  • ...
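
A minimal sketch of what the comparison of delivered vs. agreed service level could look like, assuming the downtime scale above as the agreed level; check_service_level is a hypothetical name, not an existing monitoring tool.

<verbatim>
from datetime import timedelta

# Acceptable downtime per criticality class (scale proposed above); the
# agreed target would ultimately come from the MoU minimum service levels.
AGREED_MAX_DOWNTIME = {
    "C": timedelta(hours=1),
    "H": timedelta(hours=4),
    "M": timedelta(hours=24),
    "L": timedelta(weeks=1),
}

def check_service_level(criticality, outages):
    """Compare the delivered service level with the agreed one.

    Simplifying assumption: the agreed level is taken to mean that no
    single outage may exceed the acceptable downtime for the service's
    criticality class.
    """
    target = AGREED_MAX_DOWNTIME[criticality]
    worst = max(outages, default=timedelta(0))
    return {
        "target": target,
        "worst_outage": worst,
        "total_downtime": sum(outages, timedelta(0)),
        "within_agreed_level": worst <= target,
    }

# Example: a critical (C) service with two short outages stays within target.
report = check_service_level("C", [timedelta(minutes=20), timedelta(minutes=35)])
print(report["within_agreed_level"])  # True
</verbatim>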

Need also to identify a service manager / coordinator for each service and assign it to an organisational unit.

Assign ownership of each service to carry deployment forward

Software supplier(s), dependencies, etc. also to be added.

gLite components: R-GMA, VOMS, FTS

gLite migration TBD: RB, CE

Others: N/A

| ID | Service Name | Acronym | Purpose | Contact Information | Current Situation | Growth | Availability Issues | Criticality (C/H/M/L) |
| 1 | ResourceBroker | RB | Farms out jobs to sites + logging and book-keeping | David Smith | 20 machines with raid array | | Concern | C |
| 2 | MyProxy | | Renew/acquire credentials | Maarten Litmaath | | | Long-running jobs cannot renew proxy; FTS uses it directly (hence C) | C |
| 3 | BdiiService | BDII | Grid information system | Lawrence Field | 4 farm nodes, DNS alias | Depends on query rate; add commodity boxes | No automatic failover to external BDIIs if CERN site down. Some sites have their own BDIIs. State kept (4MB) in memory and on disk | C |
| 4 | SiteBdii | | | Lawrence Field | 1 | | Need at least one additional machine | H |
| 5 | ComputeElement | CE | | | | | | C |
| 6 | RgmaService | R-GMA | Grid monitoring | Lawrence Field | See below | | | M |
| 7 | MonboxService | | See above | Lawrence Field | 1 farm node, 2GB memory | | Properly configured clients OK - see below | M |
| 8 | ArchiverService | | See above | Lawrence Field | 4 as above. Local MySQL DB | | Permanently lose monitoring info after client timeout | M |
| 9 | GridView | | | | | | | M |
| 10 | SftService | SFT | Regular tests of components per site | Piotr Nyczyk, Judit Novak | 2 farm nodes, MySQL | Depends on need for historical data / number of tests | Detailed site status unavailable | M |
| 11 | GridPeek | | Storage of log files of running jobs (to provide visibility prior to job end) | Patricia Mendez | 1 DPM instance | Add additional servers / storage as required | Log files of current jobs not visible | M |
| 12 | VomsService | VOMS | Manages users / roles / VOs | Maria Dimou | Pilot - farm node running application server + DB | Separate DB from app server | Current jobs OK, new jobs cannot be submitted | H |
| 13 | LcgFileCatalog | LFC | Site-local file catalog for ALICE, ATLAS, CMS; global catalog for LHCb | hep-service-lfc@cern.ch | 5 farm nodes (LFC servers) + Oracle DB | | | C |
| 14 | FileTransferService | FTS | Reliable file transfer service (CMS currently using PhEDEx) | fts-support@cern.ch | 2 disk servers (lxshare021d and 026d) + pilot | | Key service offered by Tier0 for T0<->T1 production data transfers | C |
| 15 | CastorGrid | CASTORGRID | Low-level service which runs the actual SRM and GridFTP to perform data transfers in and out of CASTOR | Wan-Data.Operations@cern.ch | 8 load-balanced worker nodes connected via 2 x 1Gb link | Can grow as needed provided there is enough network capacity | This model will probably be replaced by the CASTOR WAN pools setup used for SC3 | C |
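
On the "HA in software" case above: where a service such as the BDII sits behind a load-balanced DNS alias, the set of nodes currently published behind the alias can be inspected with standard tooling. The sketch below uses a hypothetical alias name; it is not the real BDII alias.

<verbatim>
import socket

# Hypothetical alias name for illustration; the real BDII alias is not
# given on this page.
ALIAS = "lcg-bdii.example.cern.ch"

def nodes_behind_alias(alias):
    """Return the IP addresses currently published behind a DNS alias.

    For a load-balanced alias this lists the farm nodes that clients may
    be directed to; if the list shrinks to zero, the alias no longer
    provides HA in software and a standby machine would be needed.
    """
    _name, _aliases, addresses = socket.gethostbyname_ex(alias)
    return addresses

try:
    print(nodes_behind_alias(ALIAS))
except socket.gaierror:
    print("alias does not resolve (placeholder name)")
</verbatim>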

Services that use Oracle

  • LFC - shared backend across all VOs
  • FTS - ditto
  • CASTOR
  • Gridview
  • VOMS - porting from MySQL in progress (target for SC4)

Services that use MySQL

  • RB
  • R-GMA
  • SFT

Tier1 Services

Tier2 Services

-- TimBell - 05 Sep 2005
