Architecture and planning for the full data flow chain DAQ-T0-T1.
1. General Architecture
The general data flow in the T0 system can be separated into several parts:
* DAQ --> T0
issues: the size and access pattern of the DAQ disk buffer
* T0 --> Tape
issues: file sizes, IO scheduling of read and write streams, disk server congestion
* T0 --> Reconstruction Farm
issues: direct access to the disk servers versus local caching on the worker nodes
* T0 --> Tier 1 export
issues: high number of parallel IO and TCP streams, Tier 1 HSM characteristics
and then of course the combination of the different flows.
In addition, we have to face the complication of the start-up period, where the data flow will most probably differ
from that of the stable running period in 2009.
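As a back-of-the-envelope illustration of how the combined flows load the disk buffer, here is a minimal sketch. The function and the assumption that each downstream flow re-reads the data exactly once are illustrative, not a description of the actual system:

```python
# Rough bandwidth model for the combined T0 flows (illustrative only).
# Assumption: the DAQ writes each byte into the disk buffer once, and
# every enabled downstream flow (tape migration, reconstruction,
# Tier 1 export) reads it back once from the disk servers.

def internal_traffic(daq_rate_gb_s, tape=False, reco=False, export=False):
    """Aggregate disk-server traffic in GBytes/s for a given DAQ input rate."""
    streams = 1 + sum([tape, reco, export])  # 1 write + one read per enabled flow
    return daq_rate_gb_s * streams

# Example: DAQ->T0->Tape+REC+EXP at 1.0 GByte/s gives 4 GBytes/s of
# internal traffic, the factor of four quoted in the IT test list below.
print(internal_traffic(1.0, tape=True, reco=True, export=True))  # 4.0
```

Under this simple model the tape-only chain (DAQ->T0->Tape) doubles the input rate on the disk servers, and the full chain quadruples it; contention between the read and write streams is of course not captured here.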
Jul 2005 *
ArchitectureandTechnologiesforPhaseIIoftheLCGprojectatCERN_v10.doc: General Architecture overview paper, CERN T0 and CAF
Sep 2005 *
CDRandT0scenarios3.doc: Central Data recording scenarios and considerations
Mar 2006 *
BuildingBlocks.doc: Description of the basic computing fabric building blocks
2. Experiment plans and architectures
ALICE
Plans
There are no concrete plans for 2007 yet.
Achievements
The original schedule (start in May) had to be revised, because the ALICE installations at the pit will
only be fully available by the end of August 2006. The revised schedule is based on a meeting
between ALICE and IT on 24 July 2006.
14 September 2006 Update: The installation at point 5 is still not ready, although first tests have been done. The alimdc disk pool
in Castor2 has had 28 disk servers with 135 TB for about four weeks.
The whole schedule has now shifted by about 4-6 weeks.
The ALICE DC VII finally started in October and ran until the end of December.
Twiki page for the ALICE DC 7 exercise: https://uimon.cern.ch/twiki/bin/view/ALICE/AliceDataChallengeVII
First summary of the ALICE DC 7 exercise :
Background information
Mar 2006 *
ALICE__DAQ-T0-T1_architecture_and_time_schedules.doc:
ATLAS
Plans
The next full-scale ATLAS T0 data challenge is planned for February 2007.
Achievements
The second ATLAS T0 test finished successfully in July 2006. Further WAN tests are continuing, because during the T0
test the maximum data rate achieved over short periods was only about 500-600 MB/s, while the goal
was 780 MB/s.
Results:
http://indico.cern.ch/getFile.py/access?contribId=5&resId=0&materialId=slides&confId=a063069 T0 test experience summary
http://indico.cern.ch/getFile.py/access?contribId=6&resId=0&materialId=slides&confId=a063069 T1 export and data management summary
https://uimon.cern.ch/twiki/bin/view/Atlas/AtlasTierZero twiki history of the details during the test
http://indico.cern.ch/getFile.py/access?sessionId=1&resId=0&materialId=0&confId=4960 post-mortem of the T0 exercise
http://indico.cern.ch/getFile.py/access?sessionId=1&resId=0&materialId=0&confId=4959 post-mortem of the T1 export exercise
A new plan for the next T0 data challenge in September 2006 has been prepared:
*
Planning_proposal_for_the_ATLAS__Sept_DC_v1.doc: T0 data challenge planning for ATLAS, September 2006
The third ATLAS T0 test took place in October 2006:
Summary of the results:
Background information
Jan 2006 *
https://uimon.cern.ch/twiki/bin/view/Atlas/Tier1DataFlow
Mar 2006 *
ATLAS__DAQ-T0-T1_architecture_and_time_schedules.doc:
CMS
Plans
Achievements
Background information
Tentative time schedule:
14 September 2006 Update:
This part of the tests has now been finished and the first results have been presented:
http://indico.cern.ch/getFile.py/access?contribId=59&sessionId=4&resId=0&materialId=slides&confId=5878
A new detailed plan for the CSA06 exercise has been prepared:
*
Resource_allocation_plan_for_CMS_during_Sept-Nov2006.doc: CMS resource schedule for CSA06
The CSA06 exercise took place in October and November 2006:
Summary of results:
Mar 2006 *
CMS__DAQ-T0-T1_architecture_and_time_schedules.doc:
June 2006 *
https://twiki.cern.ch/twiki/bin/view/CMS/CMST0Project
LHCb
Plans
Achievements
Background information
Mar 2006 *
LHCb__DAQ-T0-T1_architecture_and_time_schedules_v2.doc:
IT
Plans
Achievements
Background information
October 2006 DAQ->T0->Tape+REC+EXP at 1.0 GBytes/s (= 4 GBytes/s internal traffic)
(delayed from April due to load balancing problems in Castor2)
October 2006 DAQ->T0->Tape at > 2 GBytes/s, equal number of tape drives from IBM and STK
(delayed from May due to load balancing problems in Castor2)
October 2006 DAQ->T0->Tape+REC+EXP at 1.6 GBytes/s (= 6.4 GBytes/s internal traffic)
After a number of tests within IT and together with the experiments, it has become clear that these
three remaining tests are of very limited use. They focused on the possibility of running all
four experiments together on the same resources. In the meantime, the experiment models have
become much more detailed, and there are subtle but significant differences in the operation and data flow
of the T0. The focus is now on detailed per-experiment data challenges, such as the ATLAS ones in June and now
in September.
3. Status of tests
December 2005 IT DAQ->T0->Tape at 950 MB/s for more than one week, emulation
January 2006 ATLAS DAQ->T0->Tape+REC at nominal and higher rates (>320 MB/s)
February 2006 IT DAQ->T0->Tape at 1 GByte/s using only the new IBM 3592B drives, peak at 1.8 GBytes/s (24h)
March 2006 IT DAQ->T0->Tape+EXP+REC at 2.2 GBytes/s input + 2.2 GBytes/s output
The following table shows, color-coded, the status of the tests that have already
been done and of those that are planned. It is constantly updated.
(There is still a problem displaying this decently on this web page, so the
PowerPoint original is also available
--> *
DC_status_planning_diagram.ppt: multi-dimensional planning diagram )
* View of the planning time table for the Data Challenges:
Apr 2006 *
panzer_MB_04April2006_DAQ-T0-T1_status_v3.ppt: Status presentation of the DAQ-T0-T1 tests to the MB
Mar 2006 *
status_DCs_march06_v3.ppt: Brief status report on the data challenges
Feb 2006 *
Castor2_talk_lhcc_meeting_06Jan2006.ppt: Data Challenge Status, LHCC meeting
Feb 2006 *
atlas_t0_test_jan2006_lhcc_meeting.ppt: ATLAS Data Challenge status report, LHCC meeting
June 2006 *
BPS_performance+stability_review_castor2_8June06.ppt: Castor2 review T0 performance and reliability
4. Resources
Mar 2006 *
ResourceallocationplanningforCERNin2006__v4.doc: Resources available in 2006
Dec 2006 Resources available in 2007
5. Benchmarks
To understand the complex data flow behaviour in the full system, one has to have a good
understanding of the performance characteristics of the different components:
Disk server, CPU server, Tape server, Tape drives, Network
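A first number for one of these components can be obtained with a simple sequential-write benchmark. The sketch below is illustrative only (file size, block size and path are arbitrary assumptions); the disk server benchmarks referenced below do this far more thoroughly:

```python
# Minimal sequential-write benchmark sketch (illustrative parameters).
import os
import time

def sequential_write_mb_s(path, total_mb=256, block_kb=1024):
    """Write total_mb of zeros in block_kb chunks and return the rate in MB/s."""
    block = b"\0" * (block_kb * 1024)
    n_blocks = (total_mb * 1024) // block_kb
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the flush to disk in the timing
    elapsed = time.time() - start
    os.remove(path)
    return total_mb / elapsed
```

Note that a single sequential stream like this measures the best case; the T0 workload mixes concurrent read and write streams, which is exactly where the congestion effects mentioned in section 1 appear.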
Apr 2006 *
http://hepix.caspur.it/spring2006/TALKS/6apr.kelemen.locfs.pdf : file system performance values
Apr 2006 *
https://twiki.cern.ch/twiki/bin/view/FIOgroup/TsiTpSrvPerf : benchmarks for tape server and tape drives
Feb 2006 *
Diskserverbenchmarks_v2.doc: Sequential disk server stream benchmarks