Grid Deployment Board (GDB)
2 April 2008 - GDB Meeting - Agenda
The GDB in April covered the progress of Nagios Monitoring
for Grid Services at Tier-1 sites, a system used to monitor online and
display information about the services at the WLCG Tier-1 sites.
An update on HEP CPU
Benchmarking was presented; there is a test bed with different
processors available to the Experiments for benchmarking their typical
production and analysis applications.
The status and progress of the Review of LHC Multi-User Pilot Job
Frameworks, OPN Networks and Tape Efficiency
were presented. The plans for CCRC08
May were also presented and discussed during the meeting.
1 April 2008 - Pre-GDB - Agenda
CCRC08 F2F Meeting - The F2F Meeting focused on the Communication between Services, Sites
and Experiments and reviewed many of the WLCG services used during the
CCRC08 Challenges (FTS, LFC, SRM, DB,
etc.). The four LHC Experiments
reported on the issues they had found and presented their plans for CCRC08-May.
5 March 2008 - GDB Meeting - Agenda
Results on the efficiency
of CPU usage by the Experiments' software, as seen from the CERN site and
compared with the Experiments' own views, were presented.
There was also an update from the group reviewing the Experiments' Pilot Job
Frameworks. The mission/mandate is now agreed: review security issues; define
a minimum set of security requirements; advise on improvements. The Experiments
are to produce a document about their systems, and a security questionnaire
is being discussed.
The strategy for WLCG
Services Monitoring with Nagios and also some examples of usage were
presented.
4 March 2008 – Pre-GDB - Agenda
CCRC08 F2F Meeting - The F2F Meeting focused on reviewing
all WLCG services during the CCRC08-Feb challenge. The presentations covered
Service/Operations, gLite Middleware, FTS, MSS and Tier-0 Tape Usage,
and each of the four LHC Experiments
reported on the issues found and lessons learned during CCRC08-Feb.
14 April 2008 – Agenda, Minutes
OPS Tests Problems
- Many sites failed the SAM tests due to a wrongly advertised global LFC
for the OPS VO. This is a weak point of the infrastructure: a single site can
publish anything and make all sites fail the OPS tests. Investigations are
ongoing to come up with a proposal on how to fix this issue.
7 April 2008 – Agenda, Minutes
GFAL and multiple BDII
- The GFAL client has been able to use multiple BDIIs since version 1.10.6. This will
be documented in the user guide that is in preparation.
WMS and LB into
Pre-Production - The gLite WMS and LB services for SL4 were released to
the Pre-Production service. In parallel with the usual certification cycle,
a pilot service of the new WMS and LB will be run at CERN-PROD. Experiments
are invited to use the service in a real production context and to provide
feedback.
31 March 2008 – Agenda, Minutes
ATLAS SAM Tests -
ATLAS has developed a SAM test to verify the correct version of lcg-utils on
the WN. The results can be seen on the SAM web page by selecting the ATLAS VO.
The sites that give ERROR in this test have not upgraded to an SRM2-compatible
version of lcg-utils, an upgrade that has been due for several weeks.
17 March 2008 – Agenda, Minutes
AMGA-oracle to
Pre-Production - The new glite-AMGA-oracle component was released to the
Pre-Production Service with gLite 3.1.0 PPS Update 21 for certification.
10 March 2008 – Agenda, Minutes
3 March 2008 – Agenda, Minutes
XFS File System
Recommended - It has been demonstrated that the EXT3 file system performs far
worse than the XFS file system for file deletion operations. For
example, deleting 2048 files of 1.5 GB each takes 5 seconds on XFS vs. 90
minutes on EXT3. Therefore all sites running DPM must migrate from EXT3 to
XFS as soon as possible. Running XFS has no adverse effects, only
benefits.
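As a rough check of the figures quoted above, the following Python sketch converts both deletion times to seconds and computes their ratio. Only the numbers come from the text; the function and variable names are illustrative.

```python
# Figures quoted in the text: deleting 2048 files of 1.5 GB each
# takes 5 seconds on XFS and 90 minutes on EXT3.
N_FILES = 2048
FILE_SIZE_GB = 1.5
XFS_SECONDS = 5
EXT3_SECONDS = 90 * 60  # 90 minutes expressed in seconds


def deletion_summary():
    """Return derived quantities for the quoted deletion test."""
    return {
        "total_gb": N_FILES * FILE_SIZE_GB,            # data volume deleted
        "ext3_over_xfs": EXT3_SECONDS / XFS_SECONDS,   # slowdown factor of EXT3
        "xfs_s_per_file": XFS_SECONDS / N_FILES,       # mean time per file, XFS
        "ext3_s_per_file": EXT3_SECONDS / N_FILES,     # mean time per file, EXT3
    }


if __name__ == "__main__":
    for name, value in deletion_summary().items():
        print(f"{name}: {value:g}")
```

With these inputs the test deletes about 3 TB of data, and EXT3 is slower by a factor of roughly one thousand.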
Architects Forum (AF)
3 April 2008 – Minutes
LCG_54d
Configuration - A new configuration with fixes to POOL has been produced
(LCG_54d). It also includes a new version of dCache.
Removing SEAL Dependencies - Substantial progress was reported in removing the
SEAL dependency of the POOL packages. POOL is completed. CORAL is not finished
but very advanced. COOL: very few decisions remain to be made.
Changes in the Nightly Build - Building SEAL will be removed from the
"dev" slot. The "dev1" slot, dedicated to patches to LCG_53,
will also be stopped. Building for SLC3 will also be stopped since no new
requests are expected from the Experiments.
Python Installation - LHCb (Dirac) requested that a
statically linked version of Python be provided in the Applications Area release area.
20 March 2008 – Minutes
ROOT Schema
Evolution – R.Brun circulated the initial version of the proposal
describing the specifications for an advanced ROOT schema evolution.
Migration to ROOT
5.20 - Initial discussion on whether or not the Experiments would migrate
to ROOT 5.20 (summer version) and what effort would be required. A
major release of the POOL projects (POOL, CORAL, and COOL) removing the
dependency on SEAL is already scheduled.
Moving to GCC 4.3
- Since the 'official' gcc 4.3 compiler has been released, it was decided
to skip gcc 4.2. A 4.3 preview will be set up soon in the nightly builds.
For Windows, despite some concerns, it was agreed to move to VC 9.0 on a
time scale of a few months.
6 March 2008 – Minutes
Changes in the
Reflex API – There will be a proposal for some changes to the Reflex
API. The new API header files will be proposed and the impact of these
changes on the applications code will need to be studied carefully.
Geant4 9.1 - All
Experiments plan to converge on Geant4 version 9.1 in the long
term. The June release of Geant4 is only a bug-fix release of
9.1.
Mac OSX 10.5 Support
- The migration towards MacOSX 10.5 (Leopard) has started with the newly
available build server. The tag for that platform is
"osx105_x86_64_gcc40".
Management Board (MB)
15 April 2008 – Agenda
LCG 3D status and maintenance –
D.Duellmann presented the status of the LCG 3D project and proposed the
next steps in order to move to standard maintenance of the database services
at the Tier-0 and Tier-1 sites.
Tier-1 and Tier-2 Reliability and Availability March 2008 – A.Aimar distributed the availability
and reliability reports of the Tier-1 and Tier-2 sites. The reports now
again include the tests specific to each of the Experiments (VO-specific
SAM tests).
1 April 2008 - Agenda, Minutes
CCRC08 Activities - J.Shiers presented an update
of the CCRC08 activities. The semi-automatic reporting strategy continues
successfully, with different degrees of completeness depending on the Experiment.
The new CASTOR version (2.1.7) will be ready for pre-production certification on
the 1st of April as originally planned; all functional and stress tests have been
successful.
The fixes deployed at CERN to the LCG CE, which reduce the load by an order of
magnitude, are being packaged for external distribution and should be ready
next week. This patch is important for outside sites and should be deployed
as soon as it becomes available.
OSG/SAM Milestones - R.Quick presented the
planning for the next few months on the implementation of the SAM tests for
the OSG Tier-2 sites.
OSG will be working with the SAM developers on defining the transport mechanisms to
get scheduled downtime into the SAM database. Design sessions and
discussions are planned at the next WLCG Collaboration Meeting on April 21-24 and
during dedicated meetings in May in Madison, Wisconsin.
Overview Board Summary - The Overview Board met on the
day before the MB Meeting. I. Bird presented a brief summary of the
discussion.
Contact Information of the Tier-1 Sites and Experiments – Each Site and Experiment confirmed
its contact information for operations and alarms (web, phone and/or
email).
25 March 2008 - Agenda, Minutes
CCRC08 Activities - J.Shiers presented an
update of the CCRC08 activities.
The Services reported some
disk failures on the ATLAS integration RAC. Oracle published the 64-bit
version of 10.2.0.4, which is deployed on a test RAC at CERN. The first
standard tests were performed without problems. CASTOR was upgraded to
2.1.6-11 on the pools for ATLAS, CMS and LHCb.
HEP Benchmarking - H.Meinhard presented a
summary of the status and progress of the HEP CPU Benchmarking working
group. The working group is making progress in benchmarking the typical hardware
used at the WLCG sites. Currently there is a dedicated test cluster at CERN
(of 7 different machines/processors) and a few machines available for
benchmarking at other laboratories (DESY Zeuthen, RAL). The Experiments are
running their benchmark applications on each of the configurations available.
OSG Site Functional Tests - R.Quick presented the
status of the integration of the OSG test system (RSV) with the SAM site
monitoring system developed at CERN. The RSV test results are uploaded into
SAM; the process has been stable for several weeks. The few problems
encountered have been solved but reliability data still needs to be
published.
18 March 2008 - Agenda, Minutes
Tier-0 Power Plans - T.Cass presented a summary of
the electrical power situation of the Computer Centre at CERN over the next
few years.
GDB Summary - J.Gordon summarized the last GDB meeting
in March. The main topics discussed were: Monitoring of the LCG services,
CPU Efficiency, Tape Efficiency and Job Priorities.
CPU Usage limits per Job - There are two conflicting
points of view between Sites and VOs. Sites would like to have a maximum
wall time allowed for the execution of a job (i.e. a job running longer than
the maximum allowed is terminated). The agreement was to have a 24h maximum CPU
time and a 36h maximum wall time. To inform the VOs of the
termination of a job, a GGUS ticket is raised when the maximum time is
exceeded.
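The agreed limits can be illustrated with a minimal Python sketch. The two thresholds (24h CPU, 36h wall) come from the text; the `JobUsage` record and the helper functions are hypothetical names introduced only for illustration, and no real batch system or GGUS interface is called.

```python
from dataclasses import dataclass

# Limits agreed in the meeting summary above.
MAX_CPU_HOURS = 24.0
MAX_WALL_HOURS = 36.0


@dataclass
class JobUsage:
    """Hypothetical record of the resources consumed by one job."""
    job_id: str
    cpu_hours: float
    wall_hours: float


def exceeded_limits(job: JobUsage) -> list[str]:
    """Return the names of the limits the job has exceeded (possibly empty)."""
    reasons = []
    if job.cpu_hours > MAX_CPU_HOURS:
        reasons.append("cpu")
    if job.wall_hours > MAX_WALL_HOURS:
        reasons.append("wall")
    return reasons


def should_raise_ticket(job: JobUsage) -> bool:
    """A job that exceeds any limit is terminated and the VO is notified."""
    return bool(exceeded_limits(job))


if __name__ == "__main__":
    for job in (JobUsage("a", 10.0, 20.0), JobUsage("b", 25.0, 30.0),
                JobUsage("c", 5.0, 40.0)):
        print(job.job_id, exceeded_limits(job), should_raise_ticket(job))
```

Keeping the check separate from the notification step means the same function could be used by a site to decide on termination and by a reporting tool to decide whether a ticket is due.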
11 March 2008 - Agenda, Minutes
CCRC08 Progress
- CCRC08 is now in phase 1.5 (i.e. between the February phase
1 run and phase 2 in May). There are no formally coordinated activities or metrics in
this phase 1.5 as yet.
Currently it involves individual Experiments separately performing functionality,
throughput and stress testing of their computing models, verifying their components
and their sites.
Pilot Jobs WG Update - The Pilot Jobs Frameworks
working group, launched by the GDB, was mandated by the WLCG MB. Its mission is
to review security issues in the pilot job framework of each Experiment. The
goal is to verify a minimum set of security requirements and advise on
improvements if needed.
Update of the HL Milestones - The MB verified all due
milestones in the High Level Milestones dashboard.
4 March 2008 - Agenda, Minutes
CCRC08-Feb Review
- In the last week of February, CCRC08 progressed without major problems.
Successful combined data exports ran for several days at rates
of 1-2 GB/s. CMS alone ran at close to 1 GB/s. All the CCRC08 work continued in parallel with many
other activities. People, although busy with other activities and meetings,
were also able to “run the challenge” for extended periods. This is a very
positive sign that WLCG is moving to disciplined and controlled run
practices.
LHCb QR Report - Ph.Charpentier presented the status and
progress of LHCb during the last quarter. LHCb completed the testing of
their Core Software and the Application Area’s packages, mainly testing the
latest release of ROOT and verifying that the LCG 54 configuration suits
the needs of the LHCb applications.
SAM Tests Update - M.Schulz presented a summary
of the status of the SAM Availability tests and how the reliability calculations
are performed.
26 February 2008 - Agenda, Minutes
CCRC08 Will Continue Beyond
February - J.Shiers presented the progress of the CCRC08
challenge. Data exports from CERN are now running at 1-2 GB/s. The peak by
the Experiments is ~1 GB/s greater than the average achieved during SC4
(1.3 GB/s). CMS alone has managed more than 1 GB/s and ATLAS is starting to run
at similar rates. The number of problems reported is only a few per
VO per day.
CMS QR Report - M.Kasemann
presented the CMS grid activities since October 2007, covering the CSA07
activities and the CCRC08-February tests. The CMS Computing infrastructure
is fully utilized by ongoing production.
Tape Efficiency Metrics
- Site Roundtable. The LHCC Referees asked that all Tier-1 sites also
collect tape efficiency metrics in the near future. During CCRC08 in May
these metrics should be available in order to analyze how efficiently tapes
are used by the typical read/write patterns.
Sites Availability and Reliability
Algorithm - A.Aimar described how the Availability and
Reliability calculations are performed in SAM and GridView; the results are
then used for the reports on site availability and reliability to the
Overview Board.
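The summary does not give the formulas themselves. As a minimal sketch, assuming the commonly quoted definitions (availability as uptime over the period with known status, reliability as uptime over the period excluding scheduled downtime), the two figures could be computed as below. The function names and the example hours are illustrative and are not taken from SAM or GridView.

```python
def availability(up_hours, total_hours, unknown_hours=0.0):
    """Assumed definition: uptime divided by the time with known status."""
    known = total_hours - unknown_hours
    if known <= 0:
        raise ValueError("no period with known status")
    return up_hours / known


def reliability(up_hours, total_hours, scheduled_down_hours=0.0,
                unknown_hours=0.0):
    """Assumed definition: uptime divided by the time the site was
    expected to be up (scheduled downtime and unknown time excluded)."""
    expected = total_hours - scheduled_down_hours - unknown_hours
    if expected <= 0:
        raise ValueError("no period in which the site was expected to be up")
    return up_hours / expected


if __name__ == "__main__":
    # Illustrative month of 744 hours: 24 h scheduled downtime,
    # 36 h unscheduled downtime, hence 684 h of uptime.
    print(f"availability = {availability(684, 744):.3f}")
    print(f"reliability  = {reliability(684, 744, scheduled_down_hours=24):.3f}")
```

Under these assumed definitions, reliability is never lower than availability, because scheduled downtime is removed from the denominator only.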