Tier0 Services Required for Service Challenge 4 and Initial LHC Service
The following table gives a high-level overview of the services required for SC4 and the Initial LHC Service.
Some of these services are required at Tier1s and / or Tier2s. A list of services required per site will be produced at a later date.
It is intended as a first step toward understanding the service issues and their implications for middleware enhancements, hardware requirements, etc. The focus is redundancy, high availability and scalability, achieved where possible in software (which makes the hardware part much easier and much more flexible).
Please see An Overview of LCG 2 Middleware (Oct 2004); an update on the timescale of end Sep 2005 will be prepared.
Issues that need to be addressed include:
- criticality (critical, high, medium, low), defining the acceptable downtime, where:
- C = critical: < 1 hour,
- H = high: < 4 hours,
- M = medium: < 24 hours,
- L = low: < 1 week (or some similar scale)
(proposed by Tim Bell - maybe these should be aligned with the parameters for minimum levels of T0 service in the MoU (page A3.2) - Jamie)
- disaster recovery (e.g. is it necessary to have the machines for the service in different locations?)
- service supports high availability (i.e. like BDII where the software can automatically provide for HA or where this needs to be implemented as a standby machine)
- external accessibility required?
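As an illustration only, the proposed C/H/M/L downtime scale above could be encoded as follows; the function and constant names are hypothetical, not part of any existing tool:

```python
from datetime import timedelta

# Illustrative sketch of the proposed criticality scale: maximum
# acceptable outage duration per criticality class.
MAX_DOWNTIME = {
    "C": timedelta(hours=1),   # critical: < 1 hour
    "H": timedelta(hours=4),   # high:     < 4 hours
    "M": timedelta(hours=24),  # medium:   < 24 hours
    "L": timedelta(weeks=1),   # low:      < 1 week
}

def within_target(criticality: str, outage: timedelta) -> bool:
    """Return True if an outage stayed within the proposed target."""
    return outage < MAX_DOWNTIME[criticality]
```

For example, a 30-minute outage of a critical (C) service is within target, while a 2-hour one is not.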
Many of the services also include / rely on a database component: some Oracle, some MySQL. These issues also have to be addressed.
To be added:
- recovery procedures defined Y/N, tested Y/N
- expected lifetime of service; foreseen replacement service
Also need:
- Level 1, 2 & 3 procedures;
- Mailing lists (standards?)
- Documentation, FAQ, ...
- Monitoring, including comparison of delivered service level with agreed level
- ...
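The monitoring bullet above (comparing the delivered service level with the agreed level) could be sketched like this; the 99% target and the function names are illustrative assumptions, not figures from the MoU:

```python
# Hypothetical sketch: compare delivered availability over a period
# against an agreed target. The 0.99 default is an example value only.
def delivered_availability(total_hours: float, downtime_hours: float) -> float:
    """Fraction of the period the service was actually up."""
    return (total_hours - downtime_hours) / total_hours

def meets_agreed_level(total_hours: float, downtime_hours: float,
                       agreed: float = 0.99) -> bool:
    """True if delivered availability meets or exceeds the agreed level."""
    return delivered_availability(total_hours, downtime_hours) >= agreed
```

For a 30-day month (720 hours), 5 hours of downtime meets a 99% target; 10 hours does not.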
Also need to identify a service manager / coordinator for each service and assign it to an organisational unit
Assign ownership of each service to carry deployment forward
Software supplier(s) also to be added, dependencies etc.
gLite components: R-GMA, VOMS, FTS
gLite migration TBD: RB, CE
Others: N/A
| ID | Service Name | Acronym | Purpose | Contact Information | Current Situation | Growth | Availability Issues | Criticality (C/H/M/L) |
| 1 | ResourceBroker | RB | Farms out jobs to sites + logging and book-keeping | David Smith | 20 machines with raid array | | Concern | C |
| 2 | MyProxy | | Renew / acquire credentials | Maarten Litmaath | | | Long-running jobs cannot renew proxy; FTS uses directly (hence C) | C |
| 3 | BdiiService | BDII | Grid information system | Lawrence Field | 4 farm nodes, dns alias | depends on query rate, add commodity boxes | no automatic failover to external BDIIs if CERN site down. Some sites have their own BDIIs. State kept (4MB) in memory and on disk | C |
| 4 | SiteBdii | | | Lawrence Field | 1 | | Need at least one additional machine | H |
| 5 | ComputeElement | CE | | | | | | C |
| 6 | RgmaService | R-GMA | Grid monitoring | Lawrence Field | see below | | | M |
| 7 | MonboxService | | see above | Lawrence Field | 1 farm node, 2GB memory | | Properly configured clients ok - see below | M |
| 8 | ArchiverService | | see above | Lawrence Field | 4 as above. Local MySQL DB | | Permanently lose monitoring info after client timeout | M |
| 9 | GridView | | | | | | | M |
| 10 | SftService | SFT | Regular tests of components per site | Piotr Nyczyk, Judit Novak | 2 farm nodes, MySQL | Depends on need for historical data / number of tests | Detailed site status unavailable | M |
| 11 | GridPeek | | For storage of log files of running jobs (to provide visibility prior to job end) | Patricia Mendez | 1 DPM instance | add additional servers / storage as required | Log files of current jobs not visible | M |
| 12 | VomsService | VOMS | Manages users / roles / VOs | Maria Dimou | Pilot - farm node running application server + DB | Separate DB from app server | Current jobs ok, new jobs cannot be submitted | H |
| 13 | LcgFileCatalog | LFC | Site-local file catalog for ALICE, ATLAS, CMS; global catalog for LHCb | hep-service-lfc@cernNOSPAMPLEASE.ch | 5 farm nodes (LFC servers) + Oracle DB | | | C |
| 14 | FileTransferService | FTS | Reliable file transfer service - CMS currently using PhEDEx | fts-support@cernNOSPAMPLEASE.ch | 2 disk servers (lxshare021d and 026d) + pilot | | Key service offered by Tier0 for T0<->T1 production data transfers | C |
| 15 | CastorGrid | CASTORGRID | The low-level service which runs the actual SRM and gridFTP to perform data transfers in and out of CASTOR | Wan-Data.Operations@cernNOSPAMPLEASE.ch | 8 load-balanced worker nodes connected via 2 x 1Gb link | Can grow as needed provided there is enough network capacity | This model will probably be replaced by CASTOR WAN pools setup as used for SC3 | C |
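The BDII entry in the table notes that there is no automatic failover to external BDIIs if the CERN site is down. A client-side workaround could be sketched as below; the fallback hostname is a hypothetical example, not an endorsed endpoint:

```python
import socket

# Client-side sketch of BDII failover: try an ordered list of
# information-system endpoints and use the first one that accepts a
# TCP connection. The hostnames below are illustrative only.
BDII_ENDPOINTS = [
    ("lcg-bdii.cern.ch", 2170),       # primary (DNS alias over farm nodes)
    ("site-bdii.example.org", 2170),  # hypothetical fallback at another site
]

def first_reachable(endpoints, timeout=2.0):
    """Return the first (host, port) that accepts a connection, else None."""
    for host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            continue
    return None
```

A client configured this way keeps working when the primary alias is unreachable, at the cost of a connection timeout per dead endpoint.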
Services that use Oracle
| LFC | shared backend across all VOs |
| FTS | ditto |
| CASTOR | |
| Gridview | |
| VOMS | porting from MySQL in progress - target for SC4 |
Services that use MySQL
Tier1 Services
Tier2 Services
--
TimBell - 05 Sep 2005