Requirements for ACE from the LHC experiments

Here we summarize the outcome of meetings and discussions, together with pieces of existing documentation related to this subject:

Are the experiments happy with a single algorithm for VO-specific availability calculation that is consistent with the GridView algorithm?

In general, people tend to agree that we should keep a single algorithm (based on the discussion at the meeting on 23.06.2010). Most of the limitations of the current system relate to the topology description rather than to the availability calculation algorithm.

The GridView Service Availability Computation algorithm is described in this document.

What should be done differently compared to the current algorithm is the following:

1). If a site has two services of the same type, one down and the other in maintenance, GridView currently considers the service type to be in maintenance. It should be the other way around: the service type should be considered down rather than in maintenance, because otherwise a site could register one fake service, keep it permanently in maintenance, and never be reported as down. While writing this page, I asked myself what should happen if there are 10 services of the same type, 9 of them in maintenance and 1 down: should the overall state of the service type then be considered down? Or should the state held by the majority of services of that type be used?

2). When there are several services of the same flavour, they are combined with a logical 'OR' in the availability calculation. VOs should be able to override this default behavior if needed, replacing 'OR' with 'AND'. This option should be provided in the UI where VOs define a profile.

3). A profile should be linked to a certain group. For example, certain tests or service types may be critical ONLY for T1s, but not for T2s.
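Points 1 and 2 can be sketched as follows. This is only an illustration under stated assumptions, not the GridView implementation: the `Status` values, the `combine` function and its `mode` parameter are hypothetical names, and the precedence of "down" over "maintenance" is the change proposed above rather than current behaviour. The majority rule raised as an open question in point 1 is not modelled.

```python
from enum import Enum


class Status(Enum):
    UP = "up"
    DOWN = "down"
    MAINTENANCE = "maintenance"


def combine(instances, mode="OR"):
    """Combine the statuses of all instances of one service type.

    mode="OR"  : the type is UP if at least one instance is UP (default).
    mode="AND" : the type is UP only if every instance is UP.
    When the type is not UP, DOWN takes precedence over MAINTENANCE, so an
    instance kept permanently in maintenance cannot hide a failing one.
    """
    if not instances:
        raise ValueError("a service type needs at least one instance")
    if mode not in ("OR", "AND"):
        raise ValueError(f"unknown mode: {mode!r}")

    ups = [s is Status.UP for s in instances]
    is_up = any(ups) if mode == "OR" else all(ups)
    if is_up:
        return Status.UP
    if Status.DOWN in instances:
        return Status.DOWN
    return Status.MAINTENANCE
```

With this sketch, one down instance plus one instance in maintenance yields `Status.DOWN`, and a VO that needs every instance to work would call `combine(instances, mode="AND")`.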

How to handle VO-specific service types, such as the CRAB server

David said that there was an agreement, recorded in the document approved by the MB, that all services to be tested by SAM should be registered in GOCDB. Alessandro mentioned that registration in GOCDB is not a straightforward process and pointed to the corresponding Savannah bug https://savannah.cern.ch/support/?113592. Andrea expressed some doubts that the experiments would be happy to register experiment-specific services in GOCDB. It was suggested to discuss this question again inside the experiments in order to understand whether they agree that all experiment-specific services which need to be tested by SAM should be registered in GOCDB.

Some remarks:

The validity of a test result should be defined at the test level. The default is 24 hours. Where and how should it be defined?

If no critical tests are defined for a site, the site will always be green.
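The two remarks above can be illustrated with a short sketch. The 24-hour default comes from the text; everything else is an assumption made for illustration and not SAM behaviour: the function and parameter names, the layout of `results`, and in particular the choice to report a missing or expired critical result as "unknown".

```python
from datetime import datetime, timedelta

DEFAULT_VALIDITY = timedelta(hours=24)  # default taken from the text


def is_valid(timestamp, now, validity=DEFAULT_VALIDITY):
    """A test result counts only while it is no older than its validity."""
    return now - timestamp <= validity


def site_status(results, critical_tests, now, validity_by_test=None):
    """Return "green", "red" or "unknown" for one site.

    results          : {test_name: (timestamp, passed)}, latest result per test
    critical_tests   : names of the tests the VO profile marks as critical
    validity_by_test : optional {test_name: timedelta} per-test overrides
    """
    validity_by_test = validity_by_test or {}
    if not critical_tests:
        return "green"  # no critical tests defined: nothing can fail
    status = "green"
    for name in critical_tests:
        validity = validity_by_test.get(name, DEFAULT_VALIDITY)
        result = results.get(name)
        if result is None or not is_valid(result[0], now, validity):
            status = "unknown"  # assumption: missing or expired result
            continue
        if not result[1]:
            return "red"  # a valid, failed critical test decides the status
    return status
```

Keeping the validity in a per-test mapping is one possible answer to the "where and how" question: the value travels with the test definition, and tests without an entry fall back to the default.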

SAM won't use the BDII for availability calculation, since the information that can be taken from the BDII (which services are used by a VO) should come with the VO topology description. However, if no topology description is provided, the BDII can be used for this purpose.

Whether the whole site or only a particular service type is considered to be in downtime is defined by the site admin. VOs are free to define different profiles in order to decouple the various functionalities of the sites into separate profiles.

GridView considers only scheduled downtime as maintenance; during unscheduled downtime the site is regarded as down.
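As a sketch of this rule, assuming each downtime record carries a boolean `scheduled` flag (the record layout and all names here are hypothetical, not the GOCDB or GridView schema):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Downtime:
    start: datetime
    end: datetime
    scheduled: bool


def status_during_downtime(downtime, when):
    """Classify a point in time relative to one declared downtime.

    Returns "maintenance" for a scheduled downtime, "down" for an
    unscheduled one, and None if `when` falls outside the downtime.
    """
    if not (downtime.start <= when < downtime.end):
        return None
    return "maintenance" if downtime.scheduled else "down"
```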

Meetings

Meetings are recorded on this twiki only from 23.06.2010 onwards; there were many discussions before this date.

Meeting 23.06.2010

Attended by: David Collados, Phool Chand, Andrea Sciaba, Roberto Santinelli, Alessandro Di Girolamo, Akshat Kakkar, Pablo Saiz, Julia Andreeva

The goal of the meeting was to understand the requirements of the experiments for VO-specific availability calculations. The discussion started from the old document written by William Olliver two years ago. The issues discussed are described above.

We did not get as far as discussing the timeline for having a prototype of VO-specific availability calculation in place. It would be nice if Andrea, Alessandro and Roberto could look through William's document again and think about the issues discussed today, so that by the next meeting we have an updated version of the requirements for GridView and can then estimate how much time the implementation requires and whether we can have some intermediate milestones.

Topic attachments
ExtensionsToSam.pdf (r1, 196.5 K, 2010-06-23 12:29, DavidCollados)
Topic revision: r4 - 2010-06-23 - JuliaAndreeva
 