Dashboard Task Monitoring abstract for EGI CF 2012

Title

User-centric monitoring of the analysis and production activities within the ATLAS and CMS Virtual Organisations using the Experiment Dashboard system

Overview

The Experiment Dashboard is a monitoring system developed for the LHC experiments to provide a view of the Grid infrastructure from the perspective of the Virtual Organisation (VO). It gives a transparent view of the experiment activities across different middleware implementations and combines Grid monitoring data with VO-specific information. Job processing is the core of the VO computing activities. Scientists must be able to monitor the execution status and the application-level and Grid-level messages of their tasks, which may run at any site within the VO. The Dashboard Task Monitoring applications collect this information about submitted tasks and present it to users in a user-centric way. They provide a clear and precise view of how task status evolves and of the reasons for failure, as a function of time or site. Advanced graphical plots give analysis and production users a more usable and attractive interface.

Description of Work (abstract)

Various fully distributed job submission methods and execution backends are used within both the ATLAS and CMS VOs. More than 700,000 ATLAS jobs and 300,000 CMS jobs are submitted daily to the Worldwide LHC Computing Grid (WLCG) and are processed on different middleware platforms. LHC job processing falls into two categories: large-scale Monte Carlo production and user analysis. The main difference between them is that production is a well-organised activity performed by a group of experts, while analysis is chaotic processing by diverse and geographically widespread members of the physics community. The behaviour of analysis jobs is particularly difficult to predict, as they are normally submitted by users who are not necessarily experienced in using the Grid. All of these factors make monitoring job processing within these VOs more complex. Most existing monitoring applications are coupled to a specific Workload Management System (WMS), such as CRAB Monitoring for CMS and PanDA Monitoring for ATLAS. The Dashboard Task Monitoring applications instead support different middleware implementations and job submission systems. They combine Grid monitoring data with experiment-specific information collected from various sources, such as the WMS user interface, the job submission systems and the jobs themselves. They then present all of it coherently, as if it came from a single source. Development was user-driven: physicists were invited to test the prototypes in order to gather further requirements and identify weaknesses in the applications. This talk will describe the current status of job processing monitoring and cover the Dashboard Task Monitoring applications for analysis and production users, which are widely used by the ATLAS and CMS communities. It will also give an insight into future development plans.
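
To illustrate what "presenting information from several sources as if it came from one" can mean, here is a minimal Python sketch. It is not the Dashboard's actual schema or algorithm. The report fields, source labels, site names and the "most recent non-empty value wins" merge policy are all hypothetical assumptions chosen for illustration. A per-site failure summary then gives the kind of view described above.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class JobReport:
    # Hypothetical shape of one status report about a single job.
    job_id: str
    source: str               # illustrative labels, e.g. "submission_tool", "wms", "job_wrapper"
    timestamp: int            # seconds since epoch
    status: Optional[str] = None
    site: Optional[str] = None
    exit_code: Optional[int] = None


def merge_reports(reports: Iterable[JobReport]) -> dict[str, dict]:
    """Fold reports from all sources into one record per job.

    Assumed policy: reports are applied in timestamp order, so for each
    field the most recent non-empty value wins, whatever its source.
    """
    merged: dict[str, dict] = {}
    for r in sorted(reports, key=lambda rep: rep.timestamp):
        record = merged.setdefault(r.job_id, {"job_id": r.job_id})
        for field in ("status", "site", "exit_code"):
            value = getattr(r, field)
            if value is not None:  # an exit code of 0 is still a real value
                record[field] = value
    return merged


def failures_by_site(jobs: dict[str, dict]) -> Counter:
    """Count failed jobs per site (hypothetical status value "failed")."""
    return Counter(
        job.get("site", "unknown")
        for job in jobs.values()
        if job.get("status") == "failed"
    )


if __name__ == "__main__":
    reports = [
        JobReport("j1", "submission_tool", 100, status="submitted"),
        JobReport("j1", "wms", 200, status="running", site="SITE_A"),
        JobReport("j1", "job_wrapper", 300, status="failed", exit_code=1),
        JobReport("j2", "wms", 150, status="running", site="SITE_B"),
        JobReport("j2", "job_wrapper", 250, status="done", exit_code=0),
    ]
    jobs = merge_reports(reports)
    print(jobs["j1"])
    print(failures_by_site(jobs))
```

In this example, job "j1" ends up as a single record with status "failed", site "SITE_A" and exit code 1. The site comes from the WMS report and the exit code from the job's own report. Last-write-wins is the simplest policy to show. A real system would likely need per-source precedence rules for conflicting reports.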

Impact

The Dashboard Task Monitoring applications for analysis and production users have become very popular within the ATLAS and CMS communities and play an important role in LHC analysis and production operations. They are also important for the support infrastructure, as they ensure that only serious issues are escalated to the support teams. In CMS alone, more than 250 distinct users rely on them daily. Close collaboration with users and production teams has kept the tools focused on their exact monitoring needs.

Conclusions

Major progress has been made since 2009 in developing applications for monitoring user analysis and production activities. This work is important because it contributes to the overall success of the LHC offline computing effort. During the first year of data taking, the Dashboard Task Monitoring applications proved to be an essential component of LHC computing operations. They are developed in very close collaboration with the physicists who use the Grid infrastructure to submit analysis and production jobs. As a result, they respond well to the needs of the LHC experiments.

URL

http://dashboard.cern.ch

Presentation Type

Presentation/Paper

Track

Software services for users and communities

-- EdwardKaravakis - 17-Nov-2011

Topic revision: r2 - 2011-11-24 - EdwardKaravakis