DOMA Rucio Instance

The DOMA Rucio instance is intended for automated tests (and, eventually, scale tests) of the various third-party-copy (TPC) implementations. It is operated by the Rucio team on the CERN Kubernetes infrastructure; the point of contact is Thomas Beerman.

Rucio configuration

Installation of Rucio is covered here.

The rucio.cfg of the Rucio client should be as follows:

[client]
rucio_host = https://rucio-doma.cern.ch:443
auth_host = https://rucio-doma.cern.ch:443
auth_type = userpass
username = 
password = 
account = root
ca_cert = /etc/pki/tls/certs/CERN-bundle.pem
request_retries = 3

[policy]
permission = generic
support_rucio = https://github.com/rucio/rucio/issues/

You must contact Thomas to obtain a username and password, and fill them in above. Note that the REST interface is not reachable from outside CERN: you must be inside the CERN firewall to issue rucio commands.
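Before issuing commands, it can help to confirm that the credentials were actually filled in. The following is a minimal Python sketch that uses only the standard configparser module. The helper name check_rucio_cfg and the list of required keys are hypothetical. They mirror the sample file above and are not Rucio's own validation.

```python
import configparser

# Keys the [client] section above is expected to carry. This list mirrors the
# sample rucio.cfg on this page and is an assumption, not Rucio's own schema.
REQUIRED_CLIENT_KEYS = ["rucio_host", "auth_host", "auth_type",
                        "username", "password", "account", "ca_cert"]


def check_rucio_cfg(text):
    """Return a list of problems found in a rucio.cfg given as a string.

    Hypothetical helper: it only checks that each [client] key is present
    and non-empty, e.g. that the username and password have been filled in.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("client"):
        return ["missing [client] section"]
    problems = []
    for key in REQUIRED_CLIENT_KEYS:
        # An option written as "username = " parses to an empty string.
        value = parser.get("client", key, fallback="").strip()
        if not value:
            problems.append("client." + key + " is empty or missing")
    return problems
```

Pass it the contents of the client's rucio.cfg and print whatever it returns. An empty list means every [client] key shown above has a value.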

The scripts managing this instance have been uploaded to a small GitHub repository: https://github.com/bbockelm/rucio-doma-tests

Monitoring Links

Proposal for Rucio-based Scale Tests

Criteria for participation

To encourage sites to fix technical issues before the scale tests begin, endpoints must first meet these criteria:

  • Admins must declare the endpoint scale-test ready. The endpoint should be designed to send and receive more than 1 Gbps of transfers and to hold approximately 10 TB of DOMA test data. It should be production quality, not a developer testbed.
  • The endpoint must demonstrate 7 days of successful transfers. In the interoperability transfers, the success rate for a given protocol should be above 90% on all inbound and outbound links. If there is a technical reason a link cannot function (e.g., a known problem caused by a software release), that link should be blacklisted from the interoperability transfers.
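The link criterion above can be expressed as a simple check. This Python sketch makes two assumptions: per-link success and failure counts have already been aggregated over the 7-day window, and a link with no transfers at all counts as failing. The LinkStats record and the meets_participation_criteria function are hypothetical and are not part of Rucio or FTS. Blacklisted links are skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkStats:
    """Transfer counts for one link over the 7-day window (hypothetical schema)."""
    source: str
    destination: str
    protocol: str
    succeeded: int
    failed: int

    @property
    def success_rate(self):
        total = self.succeeded + self.failed
        # A link with no transfers has no evidence of success: treat as 0%.
        return self.succeeded / total if total else 0.0


def meets_participation_criteria(endpoint, protocol, links,
                                 blacklist=frozenset(), threshold=0.90):
    """True if every non-blacklisted inbound and outbound link of `endpoint`
    for `protocol` has a success rate strictly above `threshold`."""
    relevant = [
        link for link in links
        if link.protocol == protocol
        and endpoint in (link.source, link.destination)
        and (link.source, link.destination) not in blacklist
    ]
    if not relevant:
        return False  # no transfers observed for this protocol at all
    return all(link.success_rate > threshold for link in relevant)
```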

Stress Test Mechanism

  • Each source site in the stress tests will receive one or more 1 TB datasets named $SITENAME-stress-N (for example, NEBRASKA-XRD_H-stress-1 and NEBRASKA-XRD_H-stress-2). These will be uploaded from the test client endpoint and pinned permanently to the source site via a Rucio rule.
  • Then, every hour:
    • For each fully uploaded source dataset and each destination site holding zero file replicas of it, a rule will be created to subscribe the dataset to that destination site.
    • For each source dataset with 100% of its file replicas at a destination, the rule that created the transfer will be deleted.

This setup will continuously retransfer the same source dataset to every destination. Each time the dataset is completely transferred to a destination, the corresponding rule is deleted and the destination replica is no longer pinned. This causes Rucio to delete the destination replica. Once it is fully deleted, a new rule is created in the next cycle, triggering a fresh round of transfers.
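The hourly cycle can be sketched as a pure planning function. This is a minimal Python illustration that rests on three assumptions:

* The replica counts and rule state for each (dataset, destination) pair have already been fetched from Rucio.
* The names DestinationState, Action and plan_cycle are hypothetical.
* The has_rule guard is an extra condition, not stated above. It keeps the cycle from creating a second rule while the first one has not yet produced any replicas.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CREATE_RULE = "create_rule"
    DELETE_RULE = "delete_rule"


@dataclass(frozen=True)
class DestinationState:
    dataset: str            # e.g. "NEBRASKA-XRD_H-stress-1"
    destination: str        # destination RSE name
    replicated_files: int   # files of the dataset with a replica at the destination
    total_files: int        # files in the dataset
    has_rule: bool          # whether the test rule for this pair still exists


def plan_cycle(states, fully_uploaded):
    """Return (action, dataset, destination) tuples for one hourly cycle.

    `fully_uploaded` is the set of source datasets whose upload has finished.
    """
    actions = []
    for s in states:
        if s.dataset not in fully_uploaded:
            continue  # source not complete yet: never subscribe it
        if s.replicated_files == 0 and not s.has_rule:
            # Destination empty (previous copy fully deleted): start a new round.
            actions.append((Action.CREATE_RULE, s.dataset, s.destination))
        elif s.replicated_files == s.total_files and s.has_rule:
            # Transfer finished: drop the rule so Rucio deletes the copy.
            actions.append((Action.DELETE_RULE, s.dataset, s.destination))
        # Anything else is a transfer or a deletion still in progress.
    return actions
```

The planner makes no Rucio calls, so the create/delete logic can be tested on its own. A separate step would then apply the returned actions through the Rucio client or CLI.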

We will rely on FTS's built-in mechanism for tuning the number of concurrent transfers. The number of 1 TB source datasets in flight will be reviewed at the biweekly meetings and adjusted up or down to set the scale.
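For rough sizing, the 1 Gbps figure from the participation criteria can be turned into transfer times. This Python arithmetic sketch makes four assumptions:

* Units are decimal (1 TB = 10^12 bytes).
* The rate is sustained.
* There is no protocol overhead.
* There are no retries.

The function names are illustrative only.

```python
# 1 TB = 10**12 bytes = 8 * 10**12 bits (decimal units, an assumption).
BITS_PER_TB = 8e12


def transfer_hours(datasets_in_flight, rate_gbps=1.0, dataset_tb=1.0):
    """Hours needed to move the datasets at a sustained rate in Gbit/s."""
    seconds = datasets_in_flight * dataset_tb * BITS_PER_TB / (rate_gbps * 1e9)
    return seconds / 3600


def required_gbps(datasets_in_flight, hours, dataset_tb=1.0):
    """Sustained rate in Gbit/s needed to move the datasets within `hours`."""
    return datasets_in_flight * dataset_tb * BITS_PER_TB / (hours * 3600) / 1e9
```

Under these assumptions, a single 1 TB dataset keeps a 1 Gbps link busy for 8000 s, or about 2.2 hours.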

Criteria for Removal from Stress Tests

  • Admin complaints about stress tests causing stability issues.
  • >10% failure rate.
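The two removal criteria can be checked mechanically once stress-test counts are available. This is a minimal Python sketch. It assumes per-site (succeeded, failed) counts and a set of sites whose admins have complained. sites_to_remove is a hypothetical helper. The threshold is strict, matching the ">10%" wording, so a site at exactly 10% stays in.

```python
def sites_to_remove(stats, complaints, max_failure_rate=0.10):
    """Return {site: reason} for every site meeting a removal criterion.

    stats maps site -> (succeeded, failed) stress-test transfer counts;
    complaints is the set of sites whose admins reported stability issues.
    """
    removals = {}
    for site in set(stats) | set(complaints):
        if site in complaints:
            removals[site] = "admin complaint"
            continue
        succeeded, failed = stats[site]
        total = succeeded + failed
        failure_rate = failed / total if total else 0.0
        if failure_rate > max_failure_rate:  # strictly above 10%
            removals[site] = f"failure rate {failure_rate:.0%}"
    return removals
```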

Monitoring

  • Stress tests for different protocols should not run concurrently, and they should be monitored separately from the functional tests.

Technical Scratchpad

To add a new RSE for HTTP, do this:

rucio-admin rse add FLORIDA-XRD_H
rucio-admin rse set-attribute --rse FLORIDA-XRD_H --key type --value HTTPS
rucio-admin rse set-attribute --rse FLORIDA-XRD_H --key fts --value https://fts3-devel.cern.ch:8446
rucio-admin rse set-attribute --rse FLORIDA-XRD_H --key lfn2pfn_algorithm --value identity
rucio-admin rse add-protocol --hostname cmsio3.rc.ufl.edu --port 1094 --scheme https --prefix /store/user/dteam --domain-json '{"wan": {"read": 1, "write": 1, "third_party_copy": 1, "delete": 1}, "lan": {"read": 1, "write": 1, "delete": 0}}' FLORIDA-XRD_H
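Every new HTTP RSE needs the same five commands, so they can be generated from a few parameters. This Python sketch only builds the command strings and does not run them. The helper rse_setup_commands is hypothetical. Its defaults (FTS server, type attribute, identity lfn2pfn algorithm, protocol domains) are copied from the FLORIDA-XRD_H example above and may not suit every site.

```python
import json
import shlex

# Protocol domains copied from the FLORIDA-XRD_H example above.
DOMAINS = {
    "wan": {"read": 1, "write": 1, "third_party_copy": 1, "delete": 1},
    "lan": {"read": 1, "write": 1, "delete": 0},
}


def rse_setup_commands(rse, hostname, port, prefix,
                       fts="https://fts3-devel.cern.ch:8446", scheme="https"):
    """Return the rucio-admin command lines (as strings) that register an
    HTTP RSE the same way as the manual example above."""
    attributes = {"type": "HTTPS", "fts": fts, "lfn2pfn_algorithm": "identity"}
    argv_list = [["rucio-admin", "rse", "add", rse]]
    for key, value in attributes.items():
        argv_list.append(["rucio-admin", "rse", "set-attribute",
                          "--rse", rse, "--key", key, "--value", value])
    argv_list.append(["rucio-admin", "rse", "add-protocol",
                      "--hostname", hostname, "--port", str(port),
                      "--scheme", scheme, "--prefix", prefix,
                      "--domain-json", json.dumps(DOMAINS), rse])
    # shlex.quote single-quotes arguments that contain spaces or quotes.
    return [" ".join(shlex.quote(arg) for arg in argv) for argv in argv_list]
```

For the Florida parameters, the helper reproduces the commands above verbatim, including the single-quoted --domain-json argument. Review the output before pasting it into a shell.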

The RSE name should be of the form SITENAME-TECHNOLOGY_PROTOCOL. SITENAME is a human-readable site name (e.g., "FLORIDA"). TECHNOLOGY is a 3-letter abbreviation for the storage implementation (DCA, DPM, XRD). PROTOCOL is either H or X, for HTTP and xrootd respectively.
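A name can be checked against this convention with a regular expression. The Python sketch below makes two assumptions:

* SITENAME consists of upper-case letters and digits with no hyphens, since the hyphen separates it from the technology code.
* Any 3-letter upper-case technology code is accepted.

parse_rse_name is a hypothetical helper.

```python
import re

# Assumed pattern: upper-case alphanumeric site, 3-letter technology, H or X.
RSE_NAME = re.compile(r"(?P<site>[A-Z0-9]+)-(?P<tech>[A-Z]{3})_(?P<proto>[HX])")
PROTOCOLS = {"H": "HTTP", "X": "xrootd"}


def parse_rse_name(name):
    """Split an RSE name like 'FLORIDA-XRD_H' into (site, technology, protocol).

    Raises ValueError if the name does not follow SITENAME-TECHNOLOGY_PROTOCOL.
    """
    match = RSE_NAME.fullmatch(name)  # fullmatch: no trailing characters allowed
    if match is None:
        raise ValueError(f"{name!r} is not of the form SITENAME-TECHNOLOGY_PROTOCOL")
    return match["site"], match["tech"], PROTOCOLS[match["proto"]]
```

For example, parse_rse_name("FLORIDA-XRD_H") returns ("FLORIDA", "XRD", "HTTP").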

Topic revision: r6 - 2020-03-13 - BrianBockelman
 
