RAL Setup

RAL has used Castor as its tape backend since 2006. In 2019 it was decided to replace Castor with CTA and the Oracle tape library with a Spectra Logic TFinity library using IBM enterprise tapes. Hardware was ordered in December 2019, but unfortunately not all of it had been delivered and installed by the time the lockdowns started. We focused on getting the new tape robot into production and the data migrated. Migration of data to the TFinity library was completed in April 2021. A virtual instance of CTA has been running since January 2021, and the production hardware was finally ready in April 2021. An instance of CTA has been available at RAL for internal testing since July 2021. We intend to open this up for external VO tests in September. Note that the migration from Castor to CTA will be done via a database migration, hopefully in Q4 2021 and certainly before data taking starts for Run 3. We have a strong preference for testing CTA rather than Castor, as it will allow us to optimise performance and resolve a large number of teething problems.

Castor

  • Will your site be willing to participate in the joint tape test?
Yes, for all VOs, although we do not believe there is much to be gained by testing Castor as it will be decommissioned for the LHC experiments before Run 3 starts.

  • What is the maximum available tape throughput of the site?
Previous ATLAS Data Carousel testing has shown 20Gb/s throughput. We have not invested any effort in the last few years into improving Castor performance.

  • Number of tape drives and drive types
The TFinity tape robot has 20 x TS1160 drives. Castor is currently configured to use 14 of them. This can be changed, although it is unfortunately not just a configuration change: tape servers need to be re-installed to move them between Castor and CTA. However, this is a practised procedure, so a few days' notice should be sufficient.

  • Does the site have hard-partition between read and write?
Castor does not have a hard-partition of the disk buffers for reads and writes.

  • For multi-VOs, does the site have hard-partition between VOs?
Castor does not have a hard-partition between the VOs.

  • How the tape drives are allocated among VOs and for write and read?
Writes are given priority, although they are restricted to a maximum of 4 drives per tape family. There are some limits that prevent a VO from using all the drives, but Castor has proven quite effective at balancing the load between the VOs. Depending on the exact use case, a VO can use around 80% of the tape drives if no one else is busy.

  • Tape disk buffer size
Castor has a single buffer for reads and writes that is shared by all VOs and is around 1.1PB. It is made up of 2018 hardware.

  • Any other info the site finds relevant and wants to share. For example, how much of the tape infrastructure is shared between experiments, what is the impact of other communities, etc.
We also support other VOs such as DUNE and NA62. For NA62 we store a complete copy of their raw data (as a back-up for CERN), which amounts to a few PB a year. It should be noted that Castor does not support WebDAV TPC transfers.

CTA

  • Will your site be willing to participate in the joint tape test?
    • Yes, for all VOs. Given that it is a new endpoint, it may be sensible to start by testing with a few VOs, e.g. ATLAS and CMS. If ALICE and LHCb were to test Castor, it would still stress the underlying tape hardware.

  • What is the maximum available tape throughput of the site?
We do not yet know, and it would be good to test it! We anticipate that CTA will regularly need to cope with 50Gb/s of input during data taking, and we have specified it to be able to cope with 200Gb/s+, although that throughput will not be possible until we have the next generation of tape drives (or simply more drives).
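As a rough illustration only (not a measured figure), the drive counts implied by those targets can be estimated from the nominal ~400MB/s native rate of a TS1160 drive quoted further down this page, ignoring mount, seek and buffer overheads:

    # Back-of-envelope estimate of the TS1160 drives needed to sustain the
    # throughput targets quoted above, assuming a nominal ~400 MB/s native
    # rate per drive and ignoring mount/seek time and buffer limitations.
    import math

    DRIVE_RATE_GBIT_S = 0.4 * 8              # ~400 MB/s per drive ~= 3.2 Gb/s

    for target_gbit_s in (50, 200):          # targets quoted above, in Gb/s
        drives = math.ceil(target_gbit_s / DRIVE_RATE_GBIT_S)
        print(f"{target_gbit_s} Gb/s needs roughly {drives} drives")

    # Prints: 50 Gb/s needs roughly 16 drives
    #         200 Gb/s needs roughly 63 drives

With 20 drives in the library, the 50Gb/s target is within reach, while 200Gb/s+ would indeed require more (or faster) drives.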

  • Number of tape drives and drive types
The TFinity tape robot has 20 x TS1160 drives. CTA is currently configured to use 6 of them. This can be changed, although it is unfortunately not just a configuration change: tape servers need to be re-installed to move them between Castor and CTA. However, this is a practised procedure, so a few days' notice should be sufficient.

  • Does the site have hard-partition between read and write?
Yes, CTA has different read and write partitions.

  • For multi-VOs, does the site have hard-partition between VOs?
CTA does not currently have a hard-partition between VOs, although we note that CERN does. This is the main difference between RAL's and CERN's setups. RAL chose a shared setup for the VOs as we are significantly smaller than CERN and do not have quite as high reliability requirements for each VO during data taking.

  • How the tape drives are allocated among VOs and for write and read?
Writes are given priority. Other settings are still to be decided.

  • Tape disk buffer size
We currently have around 400TB of SSD disk buffer capacity. We expect to assign this as ~2/3 reads, ~1/3 write.

  • Any other info the site finds relevant and wants to share. For example, how much of the tape infrastructure is shared between experiments, what is the impact of other communities, etc.
The non-LHC communities will be the last to migrate to CTA, so this instance is free to be tested to its limits without impacting other users. We have allocated the LHC VOs additional tape capacity to run any tests they like against CTA. We gave CMS 3PB of extra capacity, and any of the other VOs can have next year's pledge early if they want.

Some additional questions to answer:

  • According to the LHC experiments' expected throughputs YY on site XX, how much could you offer for the 2nd week of October, and what percentage of the hardware will you have in place by that date?
We have decided to split our tape drives 50:50 between Castor and CTA during the data challenge. This means there will be 10 IBM TS1160 drives in CTA for the data challenge to make use of (we will also deploy only half of the EOS buffer). These have a theoretical combined write performance of 4GB/s (10 x 400MB/s). As it happens, the VO requirements for RAL during the data challenge also sum to approximately 4GB/s. The challenge will therefore stress CTA to its limits!
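A minimal sketch of the arithmetic behind that statement, using the nominal per-drive rate quoted above and making no allowance for mount, seek or buffer overheads:

    # Data-challenge capacity check: 10 TS1160 drives at a nominal ~400 MB/s
    # each, versus the ~4 GB/s summed VO requirement for RAL.
    drives = 10
    per_drive_gb_s = 0.4                        # nominal native rate, GB/s
    capacity_gb_s = drives * per_drive_gb_s     # 4.0 GB/s (~32 Gb/s)
    requirement_gb_s = 4.0                      # approximate sum of VO targets
    print(f"capacity {capacity_gb_s:.1f} GB/s vs requirement {requirement_gb_s:.1f} GB/s")
    # -> capacity 4.0 GB/s vs requirement 4.0 GB/s, i.e. no headroom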

  • When do you expect to have required hardware in place?
The hardware is currently being put in place. Given that CTA is so new, we expect some teething problems, but we hope that for a significant fraction of September the VOs will be able to carry out functional testing against CTA.