System Management Working Group (SMWG)

One of the problems observed (by EGEE and LCG) in providing a reliable grid service is the reliability of the local fabric services at participating sites. The SMWG should bring together the existing expertise in different areas of fabric management to build a common repository of tools and knowledge for the benefit of the HEP system manager community. The idea is neither to present all possible tools nor to create new ones, but to recommend specific tools for specific problems according to the best practices already in use at sites. Although this group is proposed in order to help improve grid site reliability, the results should be useful to any site running similar local services. The group should improve two areas: tools and documentation.

Goals

  • Improve the overall level of grid site reliability by improving system management practices and sharing expertise, experience and tools
  • Provide a repository for
    • Management tools
    • Local fabric monitoring sensors
    • HOWTOs
  • Provide site manager input to requirements on grid monitoring and management tools
  • Propose existing tools to the grid monitoring working group as solutions to general problems
  • Produce a Grid Site Fabric Management cook-book
    • Recommend basic tools to cover essential practices, including security management
    • Identify the problems common to sites and document how experienced sites solve them
    • Collate and document best practices for grid sites
  • Point out holes in existing documentation sets
  • Identify training needs
    • To be addressed in a workshop or by EGEE

Scope

  • Initially should focus on improving reliability of the basic fabric services (compute clusters, batch systems, storage systems, database services, etc.) needed by the grid community, but could later broaden in scope to include other aspects.
  • Should cover basic security practices and tools required for a secure and trustworthy infrastructure.
  • This group will clearly overlap with the Grid Service Monitoring Working Group (GSMWG), particularly in the area of local fabric monitoring and in supplying feedback on missing grid service monitoring tools or necessary improvements. The two groups are required to work in close contact, and the boundaries and division of responsibility should be discussed between them so that work is not duplicated.

Chairs

  • Alessandra Forti - University of Manchester
  • Michel Jouvin - LAL

Anticipated Participation

  • System managers from "HEPiX" and other grid sites.

Related documents

  • Grid Service Monitoring Working Group Mandate
  • System Analysis Working Group Mandate

-- Main.aforti - 19 Dec 2006

Topic revision: r4 - 2007-02-05 - IanRobertNeilson
 