Will your site be willing to participate in the joint tape test?

Yes (maybe). We are performing a massive data migration from the old ORACLE library to the new IBM library (around 14 PB, for LHC), so we have limited manpower and resources to offer for the test. However, if the test only lasts a few days, these migrations could either be paused or we could offer a limited number of drives for the test. It would be very beneficial for us to know the details of the tests in advance (reads/writes, volumes, etc.) so we can better assess our position.

At PIC we support ATLAS, CMS and LHCb, and very recently (March 2021) both ATLAS and CMS already performed a tape test at the site. ATLAS data is still in the old ORACLE library, which is being deprecated, and we doubt we can learn more about the PIC tape library system than we did in those last tests. We think it does not make sense to keep targeting the old system in tests, since we are retiring it. CMS data is being migrated at the moment, and LHCb data has been fully migrated to the new IBM library.

A link with the detailed information from the last test by CMS is available here:

https://docs.google.com/presentation/d/1sZJS7pGLiQVhsvrwAXw6twEnDeD2EYNmbtTTHumV3WM/edit?usp=sharing

For a multi-VO site, for which VOs then?

PIC would be available to participate in the tape test for all the WLCG VOs it supports, namely ATLAS, CMS and LHCb, if necessary. However, please take note of our concerns regarding the lessons we can draw from this exercise. In particular, it does not make sense for us to test ATLAS again, since (as detailed below) its data is on an old technology that we are retiring.

What is the maximum available tape throughput of the site?

With the current setup at PIC, the combined aggregate tape read throughput is ~2.3 GB/s, while the aggregate tape write throughput is ~3.5 GB/s.

Number of tape drives and drive types

The old ORACLE library at PIC is equipped with T10K technology: 8 T10KC drives and 6 T10KD drives. In the last tests made by CMS and ATLAS (*), the read throughput for the T10KC and T10KD drives was ~125 MB/s per drive, while writes ran at ~150 MB/s per drive.

The new IBM library at PIC is equipped with LTO technology: a total of 10 LTO8 drives handling the LTO7 M8 and LTO8 tapes installed in the library. In the last test made by CMS (*), the read throughput was ~60 MB/s per drive, and writes ran at ~150 MB/s per drive.

(*) These March 2021 tests occurred in parallel to other production activities at the site.
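As a rough cross-check, the aggregate throughput figures quoted above can be approximated from the drive counts and per-drive rates listed here. This is a minimal sketch: it simply multiplies the numbers given on this page and does not account for production load or drive availability.

```python
# Rough cross-check of the aggregate tape throughput quoted above, using the
# per-drive rates observed in the March 2021 tests (MB/s). Estimate only.

oracle_drives = 8 + 6          # T10KC + T10KD drives
ibm_drives = 10                # LTO8 drives

oracle_read, oracle_write = 125, 150   # MB/s per drive (T10K)
ibm_read, ibm_write = 60, 150          # MB/s per drive (LTO8)

read_total = oracle_drives * oracle_read + ibm_drives * ibm_read     # ~2350 MB/s
write_total = oracle_drives * oracle_write + ibm_drives * ibm_write  # ~3600 MB/s

print(f"Estimated aggregate read : {read_total / 1000:.1f} GB/s")   # ~2.4 GB/s
print(f"Estimated aggregate write: {write_total / 1000:.1f} GB/s")  # ~3.6 GB/s
```

These estimates (~2.4 GB/s read, ~3.6 GB/s write) are in line with the ~2.3 GB/s and ~3.5 GB/s figures quoted above.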

Does the site have a hard partition between reads and writes?

No

For multi-VO sites, does the site have a hard partition between VOs?

Since we have two libraries with different tape technologies, the usage by the VOs is distributed across these libraries as follows:

ATLAS has 100% of the data (8.3 PB) in the ORACLE library, and uses it for new writes. 98% of the data is placed in T10KD tapes. The remaining 2% of data sits in T10KC tapes.

CMS has 83% of its data in the ORACLE library (8.1 PB), stored entirely on T10KC tapes. The remaining 17% (800 TB) is already in the IBM library, stored on LTO7 M8 tapes; this library is used for new writes. New LTO8 tapes have been added, and migrations from the ORACLE library are currently performed in the background.

LHCb has 100% of its data in the IBM library (2.1 PB), stored on LTO8 tapes; this library is used for new writes.

For shared technologies, Enstore dynamically allocates drives to VOs on the basis of free drives and queued recall processes. If all tape drives are occupied, recalls remain queued, so write activities effectively have higher priority (a simplified sketch of this policy is given below, after this answer).

At the end of the massive migration (a process that will take approx. 2 years), all of the data will be stored in the IBM library, in which we will likely have introduced LTO9 drives and tapes by then.
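For illustration, the drive-allocation behaviour described above (dynamic allocation, queued recalls, priority for writes) can be sketched as follows. This is a simplified, hypothetical model, not Enstore code; the class and method names are invented for the example.

```python
from collections import deque

class DriveAllocator:
    """Illustrative model only: drives of one technology are allocated
    dynamically, writes are served first, and recalls (reads) stay
    queued while all drives are busy."""

    def __init__(self, free_drives):
        self.free_drives = free_drives
        self.write_queue = deque()
        self.recall_queue = deque()

    def submit(self, request, is_write):
        # Queue the request, then try to start whatever fits on free drives.
        (self.write_queue if is_write else self.recall_queue).append(request)
        return self.dispatch()

    def release(self):
        # A drive finished its transfer: free it and serve the queues again.
        self.free_drives += 1
        return self.dispatch()

    def dispatch(self):
        started = []
        while self.free_drives > 0 and (self.write_queue or self.recall_queue):
            # Pending writes are served before queued recalls.
            queue = self.write_queue if self.write_queue else self.recall_queue
            started.append(queue.popleft())
            self.free_drives -= 1
        return started
```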

How are the tape drives allocated among VOs and between writes and reads?

We do not separate drives by reads/writes or by VO. If VOs share a library and a technology, a fairshare mechanism ensures an appropriate usage split, based on the data volumes the VOs typically move in and out of the tape library. So all of the drives are shared among the VOs on the same technology (see the illustrative sketch below).
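As a minimal sketch of the fairshare idea described above: the per-VO volumes and drive count below are made-up example numbers, not PIC accounting data, and this is not the actual fairshare implementation.

```python
# Hypothetical illustration of volume-based fairshare on a shared technology.
typical_volume_tb = {"ATLAS": 300, "CMS": 500, "LHCb": 200}  # example per-VO traffic
shared_drives = 10                                           # drives of one technology

total = sum(typical_volume_tb.values())
shares = {vo: vol / total for vo, vol in typical_volume_tb.items()}
drive_targets = {vo: round(share * shared_drives, 1) for vo, share in shares.items()}

print(drive_targets)   # e.g. {'ATLAS': 3.0, 'CMS': 5.0, 'LHCb': 2.0}
```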

Tape disk buffer size

For read:

  • ATLAS: 63 TB
  • CMS: 67 TB
  • LHCb: 12 TB

For write:

  • ATLAS: 106 TB
  • CMS: 67 TB
  • LHCb: 50 TB
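For reference, the per-VO buffers listed above add up as follows (simple arithmetic on the figures given):

```python
# Totals of the tape disk buffer sizes listed above (TB).
read_buffer = {"ATLAS": 63, "CMS": 67, "LHCb": 12}
write_buffer = {"ATLAS": 106, "CMS": 67, "LHCb": 50}

print("Total read buffer :", sum(read_buffer.values()), "TB")    # 142 TB
print("Total write buffer:", sum(write_buffer.values()), "TB")   # 223 TB
```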

Any other info the site finds relevant and wants to share. For example, how much of the tape infrastructure is shared between experiments, what is the impact of other communities, etc.

Generally, all tape drives are shared among all the VOs running at PIC. We will be glad to participate in the tests; however, we need to better understand the benefits of this particular test compared to the last one made in March 2021, since we have limited resources and manpower and we are in the middle of a massive data migration between the old and the new library.

Tape system performance is highly dependent on parameter tuning, both in Rucio/FTS and locally at the site (queue lengths), and the rates are very sensitive to file sizes. We understand that the tests are useful; however, we would like an assessment of the benefits to be made, since many sites are involved in massive data migrations on their tape systems before Run 3 starts. Also, please consider that PIC would not like to spend effort testing a technology that is being deprecated, which means that ATLAS tape tests at this stage are not very meaningful at PIC.
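To illustrate the file-size sensitivity mentioned above, the following sketch shows how a fixed per-file overhead reduces the effective drive rate for small files. The streaming rate and overhead values are assumptions chosen for illustration, not measured PIC figures.

```python
# Hypothetical illustration of how per-file overhead (mount, positioning)
# reduces the effective tape drive rate for small files.
# The numbers below are assumptions, not measured PIC values.

drive_rate_mb_s = 300      # assumed streaming rate of a modern drive
overhead_s = 30            # assumed per-file overhead (positioning, etc.)

for file_size_gb in (0.5, 2, 10, 50):
    transfer_s = file_size_gb * 1000 / drive_rate_mb_s
    effective = file_size_gb * 1000 / (transfer_s + overhead_s)
    print(f"{file_size_gb:5.1f} GB file -> effective rate ~{effective:4.0f} MB/s")
```

Under these assumed numbers, a 0.5 GB file is read at only ~16 MB/s effective, while a 50 GB file approaches the full streaming rate.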

Please see some additional questions to answer below:

  • According to the LHC experiments' expected throughputs YY at site XX, how much could you offer for the 2nd week of October, and what percentage of the hardware will you have in place by that date?

With the current setup at PIC, the combined aggregate tape read throughput is ~2.3 GB/s, while the aggregate tape write throughput is ~3.5 GB/s.

  • When do you expect to have the required hardware in place?

Two drives are being installed, and they will be available for the test. All of the hardware that we need to install in 2021 will already be deployed by the date of the tests.
