No file left behind - monitoring transfer latencies in PhEDEx

The CMS experiment has to move Petabytes of data among dozens of computing centres with low latency in order to make efficient use of its resources. Transfer operations are well established to achieve the desired level of throughput, but operators lack a system to identify early on transfers that will need manual intervention to reach completion.

File transfer latencies are sensitive to the underlying problems in the transfer infrastructure, and their measurement can be used as prompt trigger for preventive actions. For this reason, PhEDEx, the CMS transfer management system, has recently implemented a monitoring system to measure the transfer latencies at the level of individual files. For the first time now, the system can predict the completion time for the transfer of a data set. The operators can detect abnormal patterns in transfer latencies early, and correct the issues while the transfer is still in progress. Statistics are aggregated for blocks of files, recording a historical log to monitor the long-term evolution of transfer latencies, which are used as cumulative metrics to evaluate the performance of the transfer infrastructure, and to plan the global data placement strategy.

In this contribution, we present the typical patterns of transfer latencies that have been identified in the operational experience acquired with the latency monitor. We show how we are able to detect the sources of latency arising from the underlying infrastructure (such as stuck files) which need operator intervention, and we identify the areas in PhEDEx where a development effort can reduce the latency. The improvement in transfer completion times achieved since the implementation of the latency monitoring in 2011 is demonstrated.

Authors:

  • Nicolo Magini (IT-ES)
  • Tony Wildish, Chih-Hao Huang, Natalia Ratnikova, Thorsten Chwalek, Alberto Sanchez (CMS PhEDEx development team)
  • Stefan Piperov, Paul Rossman, Federica Moscato, Mingming Yang, Andrea Sartirana, Rapolas Kaselis, Si Xie (CMS transfer operations team)

-- NicoloMagini - 22-Sep-2011

Edit | Attach | Watch | Print version | History: r6 < r5 < r4 < r3 < r2 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r6 - 2011-10-27 - NicoloMagini
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    LCG All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright &© 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback