We foresee three different use cases:

Requirements for MSG for ATLAS job monitoring

Reports from the worker nodes

We estimate that ATLAS has up to 50-60K jobs running in parallel across the WLCG infrastructure. This does not mean that each job has to keep a connection open during its lifetime.

In the current model a job sends information only at the beginning and at the end of its run, so two short connections per job are sufficient.

A more complicated scenario is foreseen as well, but only if the simple one proves to work: we would like to enable bpeek-like functionality or an infrequent heartbeat, every 30 minutes or so.

In the first (simple) scenario the messages carry status information about jobs (started running, finished, exit code, name of the worker node, etc.) and some processing details such as CPU and memory consumption. The processing details are sent only at the end of the run, not as a heartbeat.
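As a rough illustration of what the heartbeat scenario would mean for the broker, the average message rate can be estimated from the number of parallel jobs and the heartbeat interval. This is only a back-of-the-envelope sketch: it assumes heartbeats are spread evenly over time, and the function name is ours, not part of any MSG tool.

```python
def heartbeat_rate(parallel_jobs: int, interval_minutes: float) -> float:
    """Average messages per second if every running job sends
    one heartbeat per interval (assumes an even spread over time)."""
    return parallel_jobs / (interval_minutes * 60)


# 50-60K parallel jobs and a 30-minute heartbeat, as quoted in the text.
for jobs in (50_000, 60_000):
    rate = heartbeat_rate(jobs, 30)
    print(f"{jobs} jobs, 30 min heartbeat: {rate:.1f} msg/s")
```

Real traffic would be burstier than this average, so the figure is a lower bound on the peak rate the service would have to absorb.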

1 or 2 consumers (one is Dashboard; another potential consumer is the GANGA/Hammer client monitor).

Information is sent as key-value pairs using Stomp, 10-20 pairs per message. We need one-way reliable data transfer, with the data recorded in the DB. Time to keep a message before deleting it: 24 hours would be great, but it can be shorter (6 hours or so). A latency of 10 minutes is acceptable.

The consumer will be permanently connected, recording data in the DB. No particular security requirements.

This is already prototyped for a subset of GANGA jobs (see https://twiki.cern.ch/twiki/bin/view/ArdaGrid/ATLASJobMonMSG) and appears to work well. However, the scale will be completely different as soon as PANDA pilots are instrumented for reporting.
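The key-value reporting over Stomp described above could look roughly like the sketch below, which builds a single Stomp SEND frame (command line, headers, blank line, body, terminating NUL byte) using only the Python standard library. The destination name, the key names and the one "key=value" per line body encoding are assumptions made for illustration; the actual encoding used by the prototype is not specified here.

```python
def build_stomp_send(destination: str, pairs: dict) -> bytes:
    """Build a Stomp SEND frame whose body holds one key=value pair per line."""
    for key, value in pairs.items():
        # The line-based body encoding (an assumption) cannot carry these characters.
        if "\n" in str(key) or "\n" in str(value) or "=" in str(key):
            raise ValueError(f"unsupported character in pair {key!r}")
    body = "\n".join(f"{key}={value}" for key, value in pairs.items())
    frame = f"SEND\ndestination:{destination}\n\n{body}"
    # A Stomp frame is terminated by a single NUL byte.
    return frame.encode("utf-8") + b"\x00"


# Hypothetical end-of-run report; the key names are illustrative only.
report = {
    "job_id": "example-0001",
    "status": "finished",
    "exit_code": 0,
    "worker_node": "wn001.example.org",
    "cpu_seconds": 5321,
    "memory_mb": 1480,
}
frame = build_stomp_send("/topic/example.atlas.jobmon", report)
print(frame.decode("utf-8").rstrip("\x00"))
```

In practice a job would open a connection, send a CONNECT frame, send one such SEND frame and disconnect, which matches the "two short connections per job" model.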

Reports from the GANGA client

Very preliminary estimate:

~100 producers per day distributed over WLCG sites, not necessarily all working in parallel. ~10K messages per producer, randomly distributed over time.

1 or 2 consumers (one is Dashboard; another potential consumer is the GANGA/Hammer client monitor).

Information is sent as key-value pairs using Stomp or OpenWire (currently Stomp, but we are considering OpenWire for this use case), 10-20 pairs per message. We need one-way reliable data transfer, with the data recorded in the Dashboard DB. Time to keep a message before deleting it: 24 hours would be great, but it can be shorter (6 hours or so). A latency of 10 minutes is acceptable.

The consumer will be permanently connected, recording data in the DB. No particular security requirements.

This is already prototyped for a subset of GANGA jobs (see https://twiki.cern.ch/twiki/bin/view/ArdaGrid/ATLASJobMonMSG) and appears to work well. We foresee some increase in the number of GANGA clients and messages compared to what we have now, but this increase will not be dramatic once PANDA instrumentation is in place.

Reports from the PANDA server

For the moment we are not sure we will go for it. Since the PANDA DB and the Dashboard DB are both located at CERN, reporting via MSG may be unnecessary overhead. It is still under discussion.

In case we decide to go for it, below is what we have in mind.

Most probably we will use OpenWire, since performance is important here.

1 producer -> 1 or 2 consumers (one is Dashboard; another potential consumer is the GANGA/Hammer client monitor). Producer and consumer are both located at CERN.

Each message contains a JSON object describing a single PANDA job. According to the examples we received, a message is ~4K symbols. An example of the content is attached. We estimate ~3 million messages per day, not necessarily equally distributed over time.

We need one-way reliable data transfer, with the data recorded in the DB. Time to keep a message before deleting it: 24 hours would be great, but it can be shorter (6 hours or so). A latency of 10 minutes is acceptable.

The consumer will be permanently connected. No particular security requirements.
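To give a feeling for the volumes implied by the PANDA server figures above, the sketch below computes the average message rate and the worst-case backlog the broker would hold if no consumer drained it for the whole retention period. It assumes one symbol is roughly one byte and that messages are evenly spread over the day, which the text says is not guaranteed. The JSON fields are hypothetical and do not reproduce the attached example.

```python
import json

MESSAGES_PER_DAY = 3_000_000   # estimate quoted in the text
SYMBOLS_PER_MESSAGE = 4_000    # "~4K symbols", treated here as ~4 kB


def backlog_gb(retention_hours: float) -> float:
    """Worst-case data (in GB) accumulated over the retention period
    if the consumer is disconnected the whole time."""
    messages = MESSAGES_PER_DAY * retention_hours / 24
    return messages * SYMBOLS_PER_MESSAGE / 1e9


# Hypothetical job record; the real PANDA job description has many more fields.
job = {"job_id": 123456, "status": "finished", "site": "EXAMPLE_SITE"}
payload = json.dumps(job)

print(f"example payload: {payload}")
print(f"average rate: {MESSAGES_PER_DAY / 86_400:.1f} msg/s")
print(f"24 h backlog: {backlog_gb(24):.1f} GB, 6 h backlog: {backlog_gb(6):.1f} GB")
```

Under these assumptions the difference between the 24-hour and the 6-hour retention options is roughly a factor of four in storage on the broker side.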

Questions we would like to clarify

The main question for us is whether the MSG service will be provided as part of the middleware stack and supported like any other production-quality service. Experiments appear hesitant to invest in development that relies on MSG unless this is confirmed.

Outcome of the meeting with MSG developers/supporters 28.05.2010

Present: Julia, Pablo, Marco, David, Laura and Irina from the Dashboard team; Lionel and Konstantin from the MSG team.

As far as we (the Dashboard team) understood, it is currently not clear whether MSG can satisfy the requirements described above. Tests run against MSG in summer 2009 gave fairly optimistic results, but apparently messages sent from the same client did not create a new connection every time, and this could have affected the test results. A presentation describing the results of last summer's tests can be found here.

The secure connection that is planned to be enabled for MSG can create a big overhead and have a bad impact on performance. As far as Dashboard is concerned, we do not need a secure connection. But the MSG people objected that if we share a common service, common rules should apply; a secure connection can also make it easy to find who is misusing the system, etc.

A possible solution could be to have a dedicated MSG server for Dashboard. Lionel asked who would pay for it. We think that it should be provided as part of the standard WLCG infrastructure and handled like any of the gLite services (WMS, VOMS, etc.). One of the attractive points of MSG for possible clients is that it will be provided as part of the infrastructure. Lionel said that the GT group provides development effort but will not run the MSG service. When MSG is production ready (as we understood, this is not yet the case), it will be the responsibility of Greece to run the MSG service for EGI. Julia objected that Dashboard clients are LHC experiments and that we are discussing WLCG needs; EGI has a slightly different scope.

To understand whether MSG can handle the load of messages sent by ATLAS or CMS jobs from the WN, it was decided to run another round of tests, creating a new connection every time. However, it is difficult to simulate the real job processing activity of 300K jobs submitted per day. The Dashboard team will think about how to organize a test of reasonable scope and will coordinate this test with the MSG team.

Topic revision: r6 - 2011-08-07 - EdwardKaravakis
 