Architecture and planning for the full data flow chain DAQ-T0-T1.
1. General Architecture
The general data flow in the T0 system can be disentangled into several parts:
* DAQ --> T0
issues: the size and access pattern of the DAQ disk buffer
* T0 --> Tape
issues: file sizes, IO scheduling of read and write streams, disk server congestion
* T0 --> Reconstruction Farm
issues: direct access to the disk servers versus local caching on the worker nodes
* T0 --> Tier 1 export
issues: high number of parallel IO and TCP streams, Tier 1 HSM characteristics
and then, of course, the combination of the different flows (a back-of-envelope traffic estimate is sketched below).
In addition we have to face the complication of the start-up period, where the data flow is most probably different
from the stable running period in 2009.
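The combined flows multiply the traffic seen by the T0 disk servers. Here is a minimal back-of-envelope sketch, assuming each byte accepted from the DAQ is written once to the T0 disk buffer and read back once each for tape migration, reconstruction and T1 export; the one-write-plus-three-reads factor is an assumption for illustration, but it reproduces the internal-traffic figures quoted in the IT plans below.

```python
# Back-of-envelope internal T0 disk traffic, assuming each byte from
# the DAQ is written once and read back once per downstream consumer
# (tape migration, reconstruction, T1 export). Illustrative only.

def internal_traffic(daq_rate_gbs: float, readers: int = 3) -> float:
    """Total disk-server traffic in GB/s: one write plus 'readers' reads."""
    return daq_rate_gbs * (1 + readers)

for daq_rate in (1.0, 1.6):
    print(f"DAQ input {daq_rate:.1f} GB/s -> "
          f"{internal_traffic(daq_rate):.1f} GB/s internal disk traffic")
```

With these assumptions, 1.0 GB/s of DAQ input implies 4 GB/s of internal traffic, and 1.6 GB/s implies 6.4 GB/s, matching the IT plan figures in section 2.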
Jul 2005 *
ArchitectureandTechnologiesforPhaseIIoftheLCGprojectatCERN_v10.doc: General Architecture overview paper, CERN T0 and CAF
Sep 2005 *
CDRandT0scenarios3.doc: Central Data Recording scenarios and considerations
Mar 2006 *
BuildingBlocks.doc: Description of the basic computing fabric building blocks
2. Experiment plans and architectures
ALICE
tentative time schedule:
The original schedule (starting in May) had to be revised because the ALICE installations at the pit
will only be fully available by the end of August 2006. The revised schedule is based on a meeting
between ALICE and IT on 24 July.
14 September 2006 update: The installation at point 5 is still not ready, but first tests have been done. The alimdc disk pool
in Castor2 has had 28 disk servers with 135 TB for about 4 weeks.
The whole schedule is now shifted by about 4-6 weeks.
week 31 (31 July - 6 August)
-- upgrade the ALICE WAN pool by 15 disk servers (the WAN pool is now 17 servers)
-- increase the tape migration streams to 10 (done)
-- first functional and low performance (~300 MB/s) DAQ-T0-tape tests
-- in parallel, the TCP commissioning will use the same setup to transfer at 20 MB/s continuously
-- start the export to the T1 sites at 300 MB/s from the WAN pool
weeks 32 - 34 (7 - 27 August)
-- installation of the local ALICE disk pool at the pit (~20 fibre channel disk arrays in total)
-- there are still some problems with the correct network connectivity to IT (BPS will discuss with CS)
-- start functional tests of the xrootd-Castor2 interface with a few reconstruction
jobs, using a different pool than for the DAQ tests
-- continue the export to the T1 sites at 300 MB/s from the WAN pool
week 35 (28 August - 3 September)
-- increase the WAN pool to 30 disk servers
-- increase the migration streams to 30 (equal to the number of tape drives in use; a per-drive estimate is sketched after this schedule)
-- increase the performance to 1 GB/s
week 36 (4 - 10 September)
-- 1 GB/s full DAQ-T0-tape test for one consecutive week (no xrootd involved)
-- no other data access in parallel to the DAQ-T0 tests
week 37 (11 - 17 September)
-- backup week in case of problems
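As a rough consistency check on the week 35/36 numbers above, a minimal sketch, assuming the 1 GB/s aggregate is spread evenly over the 30 migration streams (one per tape drive); the even split is an assumption, and real per-drive rates will vary.

```python
# Rough per-drive rate if a 1 GB/s aggregate is split evenly over 30
# migration streams (one per tape drive); even distribution is assumed.
aggregate_mb_s = 1000.0  # target aggregate migration rate in MB/s
streams = 30             # migration streams == tape drives in use
print(f"~{aggregate_mb_s / streams:.0f} MB/s per tape drive")  # ~33 MB/s
```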
Twiki page for the ALICE DC 7 exercise:
https://uimon.cern.ch/twiki/bin/view/ALICE/AliceDataChallengeVII
Mar 2006 *
ALICE__DAQ-T0-T1_architecture_and_time_schedules.doc:
ATLAS
tentative time schedule:
The ATLAS T0 test was successfully finished in July 2006. Further WAN tests are continuing, as during the T0
test the maximum data rate achieved over short time periods was about 500-600 MB/s, while the goal
was 780 MB/s.
Results:
http://indico.cern.ch/getFile.py/access?contribId=5&resId=0&materialId=slides&confId=a063069 T0 test experience summary
http://indico.cern.ch/getFile.py/access?contribId=6&resId=0&materialId=slides&confId=a063069 T1 export and data management summary
https://uimon.cern.ch/twiki/bin/view/Atlas/AtlasTierZero twiki history of the details during the test
http://indico.cern.ch/getFile.py/access?sessionId=1&resId=0&materialId=0&confId=4960 post-mortem of the T0 exercise
http://indico.cern.ch/getFile.py/access?sessionId=1&resId=0&materialId=0&confId=4959 post-mortem of the T1 export exercise
A new plan for the next T0 data challenge in September 2006 has been prepared:
*
Planning_proposal_for_the_ATLAS__Sept_DC_v1.doc: T0 data challenge planning for ATLAS, September 2006
Jan 2006 *
https://uimon.cern.ch/twiki/bin/view/Atlas/Tier1DataFlow
Mar 2006 *
ATLAS__DAQ-T0-T1_architecture_and_time_schedules.doc:
CMS
tentative time schedule:
week 27 (03 - 09 July)
-- creation of a new CMS pool, T0input, with 5 disk servers
-- first tests with the new T0 software prototype
-- BPS provides disk server performance scenarios based on previous
measurements to estimate the performance of a simple round-robin scheduling (see the sketch after this schedule)
week 28 (10 - 16 July)
-- increase the T0input pool to 10 nodes
-- scalability tests (3, 4, 6, 8, 10 server setups):
comparison of CASTOR2 standard scheduling with a
possible direct round-robin scheduling
(this would require a special Castor2 setup with 5 overlapping pools)
-- continue functionality tests of the T0 software
at limited performance
-- use the default CMS pool (or WAN) for extended T0
functionality tests
week 29 (17 - 23 July)
-- continue tests
-- create a new pool, T0export, with 10 disk servers,
and use it as part of the tests;
this would be the pool where reconstruction, tape migration and
T1 export would take place
week 30 (24 - 30 July)
-- continue tests
week 31 (31 July - 6 August)
-- continue tests
week 32 (7 - 13 August)
-- continue tests
-- increase the T0export pool to 20 disk servers
-- full functional test of the T0 software, aim for nominal speed (225 MB/s)
week 33 (14 - 20 August)
-- increase the T0export pool to 35 disk servers
-- run at full nominal speed for one or two days
-- the T1 export question needs to be clarified (real transfers or emulation)
-- clarify the expected performance of the reconstruction program
The resources (disk servers) for the two pools mentioned will come from 3 different sources:
1. to reach the 2006 resource allocation for CMS, another 10 disk servers need to be added from IT (T0input buffer)
2. the resources for the T0export buffer will come from a re-arrangement of the existing CMS pools (cmsprod, wan)
3. for the 2 weeks of the full test (weeks 32+33) another ~20 servers will be added from the special DC IT resources
(on loan for 2 weeks)
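For the round-robin comparison mentioned in weeks 27 and 28, the sketch below illustrates the idea of direct round-robin placement of write streams over a fixed set of disk servers; the server names and stream counts are hypothetical, and the real CASTOR2 (LSF-based) scheduling is considerably more involved.

```python
# Hypothetical sketch of direct round-robin placement of write streams
# over N disk servers, as a baseline against scheduler-based placement.
# Server names and stream counts are made up for illustration.
from itertools import cycle

servers = [f"diskserver{i:02d}" for i in range(1, 11)]  # a 10-server setup
next_server = cycle(servers)

load = {s: 0 for s in servers}
for _ in range(25):            # place 25 incoming write streams
    load[next(next_server)] += 1

for server, n in load.items():
    print(f"{server}: {n} streams")
```

Round-robin gives an even stream count per server by construction, but unlike scheduler-based placement it cannot react to servers that are slow, full or congested, which is exactly what the scalability comparison is meant to quantify.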
14 September 2006 update:
This part of the tests has now been finished and the first results have been presented:
http://indico.cern.ch/getFile.py/access?contribId=59&sessionId=4&resId=0&materialId=slides&confId=5878
A new detailed plan for the CSA06 exercise has been prepared:
*
Resource_allocation_plan_for_CMS_during_Sept-Nov2006.doc: CMS resource schedule for CSA06
Mar 2006 *
CMS__DAQ-T0-T1_architecture_and_time_schedules.doc:
June 2006 *
https://twiki.cern.ch/twiki/bin/view/CMS/CMST0Project
LHCb
Mar 2006 *
LHCb__DAQ-T0-T1_architecture_and_time_schedules_v2.doc:
IT plans
October 2006 DAQ->T0->Tape+REC+EXP at 1.0 GBytes/s (= 4 GBytes/s internal traffic)
(delayed from April due to load balancing problems in Castor2)
October 2006 DAQ->T0->Tape at > 2 GBytes/s, equal number of tape drives from IBM and STK
(delayed from May due to load balancing problems in Castor2)
October 2006 DAQ->T0->Tape+REC+EXP at 1.6 GBytes/s (= 6.4 GBytes/s internal traffic)
After quite some tests inside IT and together with the experiments, it has become clear that these
remaining three tests are only of very limited use. They focused on the possibility of running all
4 experiments together on the same resources. In the meantime the different experiment models have
become much more detailed, and there are subtle but significant differences in the operation and data flow
of the T0. The focus is now on detailed per-experiment data challenges, like the one for ATLAS in June and now
in September.
3. Status of tests
December 2005 IT DAQ->T0->Tape at 950 MB/s for more than one week, emulation
January 2006 ATLAS DAQ->T0->Tape+REC at nominal and higher rates (>320 MB/s)
February 2006 IT DAQ->T0->Tape at 1 GByte/s using only the new IBM 3592B drives, peak at 1.8 GBytes/s (24h)
March 2006 IT DAQ->T0->Tape+EXP+REC at 2.2 GBytes/s input + 2.2 GBytes/s output
The following table shows in a color-coded manner the status of the tests which have already
been done and of those which are planned. It is constantly updated.
(There is still a problem with visualizing this decently on this web page, thus here is also
the PowerPoint original view
--> *
DC_status_planning_diagram.ppt: multi-dimensional planning diagram )
* View of the planning timetable for the Data Challenges:
Apr 2006 *
panzer_MB_04April2006_DAQ-T0-T1_status_v3.ppt: Status presentation of the DAQ-T0-T1 tests to the MB
Mar 2006 *
status_DCs_march06_v3.ppt: Brief status report on the data challenges
Feb 2006 *
Castor2_talk_lhcc_meeting_06Jan2006.ppt: Data Challenge Status, LHCC meeting
Feb 2006 *
atlas_t0_test_jan2006_lhcc_meeting.ppt: ATLAS Data Challenge status report, LHCC meeting
June 2006 *
BPS_performance+stability_review_castor2_8June06.ppt: Castor2 review T0 performance and reliability
4. Available resources
There are currently three different resource systems available for all the planned tests:
- 1. a large dedicated Data Challenge setup (48 disk servers, 40 tape drives, 120 CPU nodes)
- 2. a large dedicated setup for the SC4 throughput tests (42 disk servers)
- 3. the 'normal' Castor2 setup for each of the 4 experiments
Mar 2006 *
ResourceallocationplanningforCERNin2006__v4.doc: Resource availability in 2006
5. Benchmarks
To understand the complex data flow behaviour of the full system, one has to have a good
understanding of the performance characteristics of the different components:
disk servers, CPU servers, tape servers, tape drives and the network (a simple stream benchmark is sketched below).
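As an illustration of the kind of measurement behind the sequential disk stream benchmarks listed below, here is a minimal single-stream sequential write test; the target path, block size and file size are placeholders, and a real benchmark would bypass the page cache (e.g. with O_DIRECT) and average over several runs.

```python
# Minimal single-stream sequential write benchmark; path, block size and
# total size are placeholders. A production benchmark would use O_DIRECT
# (or at least fsync, as done here) and repeat the measurement.
import os
import time

PATH = "/tmp/streamtest.dat"   # placeholder file on the file system under test
BLOCK = 1024 * 1024            # 1 MiB write blocks
TOTAL = 512 * BLOCK            # 512 MiB test file

buf = os.urandom(BLOCK)
start = time.time()
with open(PATH, "wb") as f:
    for _ in range(TOTAL // BLOCK):
        f.write(buf)
    f.flush()
    os.fsync(f.fileno())       # include flush-to-disk time in the measurement
elapsed = time.time() - start

print(f"{TOTAL / elapsed / 1e6:.1f} MB/s sequential write")
os.remove(PATH)
```

The same loop with read() instead of write() gives the sequential read figure; mixing several such streams at different block sizes is the basis for the low speed / high speed mixture benchmark in preparation below.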
Apr 2006 *
http://hepix.caspur.it/spring2006/TALKS/6apr.kelemen.locfs.pdf : file system performance values
Apr 2006 *
https://twiki.cern.ch/twiki/bin/view/FIOgroup/TsiTpSrvPerf : benchmarks for tape server and tape drives
Feb 2006 *
Diskserverbenchmarks_v2.doc: Sequential disk server stream benchmarks
Mar 2006 * disk server benchmark with a mixture of low speed and high speed streams (in preparation)
Mar 2006 * performance effects when using the local disk space on the worker nodes (in preparation)