https://twiki.cern.ch/twiki/pub/LCG/WebHome/LCGlogo.jpg

LCG Bulletin


Grid Deployment Board (GDB)
Web - Wiki - Agendas - Minutes

6 February 2008 - GDB Meeting - Agenda

The GDB in January covered the progress of HEP CPU Benchmarking: a test bed with different processors is being provided to the Experiments to benchmark their applications. The GSSD Final Report summarized the work still to be completed, and some interesting new issues were raised about space tokens. The Worker Node Working Group was launched; it will investigate future WN characteristics in terms of memory and hard disk space, with subsequent matching of such resources. The output of the WG should describe the deployment process, with details on how to describe and advertise heterogeneous WNs at the sites. A set of Security Documents, on VO Operations and Pilot Jobs policies, was presented and should be approved in the near future. The GDB also launched the Review of LHC Multi-User Pilot Job Frameworks.

5 February 2008 - Pre-GDB Meeting - Agenda

CCRC'08 F2F Meeting - The all day meeting focused on the baseline versions of the gLite Middleware and of the SRM implementations. Presentations on progress, plans and issues covered gLite, CASTOR, dCache, DPM and StoRM. Readiness of the Sites and Experiments was also discussed in detail as well as tracking the challenge, including monitoring, logging and reporting.

9 January 2008 - GDB Meeting - Agenda

The main topics presented and discussed at the GDB in January 2008 were the Handling of Persistent Storage Classes (T1D0), the status of the SRM2 Deployment and the preparation for the CCRC08 Challenges.

There were also reports about CPU Benchmarking in HEPiX, Security Policies, Pilot Jobs and the Worker Node Strategies.

10 January 2008 - Post-GDB Meeting - Agenda

CCRC 2008 F2F Meeting - The all-day F2F meeting focused on the preparation of the CCRC challenges for February and May 2008. The Experiments presented their requirements in terms of storage at the different sites (Tier-0 and Tier-1). The SRM 2.2 Deployment at every WLCG Site was discussed and Tape Usage Statistics at CERN were presented. Proposals for basic operations, bug fixing, upgrades, metrics, monitoring and reporting were also discussed.

WLCG-OSG-EGEE Operations (OPS)
Agendas - Minutes - Action List

25 February 2008 - Agenda, Minutes  

Moving to gLite 3.1 - One month of proven good performance of the gLite 3.1 service is the criterion for withdrawing the previous version (gLite 3.0). Withdrawal of support means no more bug fixes or functional updates (other than for security issues).

Pre-Release of SAM Services - As announced on Friday 22 February through the sam-announce mailing list, a new pre-release version of the SAM web services (lcg-sam-server-ws-0.11.0) is installed on the SAM Validation instance. Link
People are encouraged to review these changes, adapt their code (if necessary) and test the new interfaces as soon as possible.

GLUE 2.0 Draft Available - The initial draft of GLUE 2.0 is available at the following Link. Send feedback to L.Field.

dCache Updates for CCRC08 - It was clarified that during CCRC '08 WLCG sites should take dCache updates from the official dCache repositories. Link

18 February 2008 - Agenda, Minutes 

64-bit SL4 WN - The 64-bit SL4 WN will be available relatively soon, and there was a discussion on how sites should publish whether they have 32-bit nodes, 64-bit nodes or a mixture. Proposals will be discussed at the next meeting.

Running MW Services on a Single Node - Sites and VOs were asked to give any feedback they have on problems seen with combining several middleware services on one physical node.

Down-time Scheduling Procedures - LHCb raised the issue that some sites seem not to be adhering to the agreed WLCG rules for announcing service downtime. This will be monitored more closely.

11 February 2008 - Agenda, Minutes  

GOCDB Outage - There was an analysis of the down-time of RAL (due to a power cut) and the effect it had on grid operations due to the outage of the GOC DB. Lessons learned will be forthcoming.

Problems with Jobs Submission - There was an analysis of the VOMS issue which prevented many users in the UK from submitting jobs for ~24 hours. Procedures will be changed to ensure that a similar problem doesn't occur again.

Retirement of the Classic SE - It was announced that the classic SE will be retired at the end of May. Any VOs or sites that have an issue with this should contact N.Thackray or Maria Barroso Lopez. An announcement containing this information was broadcast.

All ATLAS WNs Move to SLC4 - ATLAS requested that all its WNs be moved from SLC3 to SLC4. The deadline will be 15 March 2008.

Site Shared Installation Area - ATLAS also requested that all sites increase their shared software installation area from 10 to 100 GB.

4 February 2008 - Agenda, Minutes

Sites Suspended - Some sites had to be suspended due to many problems encountered when operating them.

28 January 2008 - Agenda, Minutes

GridView Availability Algorithm - Decision made to amend the GridView site availability algorithm so that "transparent" interventions are handled correctly (i.e. they will not be counted as down-time).
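The rule stated above can be illustrated with a minimal sketch. It assumes a simplified model in which a site's history is a list of hourly samples; the `Sample` type, its fields and the `availability` function are hypothetical names introduced here for illustration, and this is not the GridView implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One hourly observation of a site (hypothetical structure)."""
    tests_passed: bool          # did the monitoring tests succeed in this hour?
    intervention: bool = False  # was an intervention declared for this hour?
    transparent: bool = False   # was that intervention declared "transparent"?


def counts_as_down(s: Sample) -> bool:
    # Assumption: a declared intervention counts as down-time unless it is
    # transparent; otherwise the test result alone decides.
    if s.intervention and not s.transparent:
        return True
    return not s.tests_passed


def availability(samples: List[Sample]) -> float:
    """Fraction of samples not counted as down-time."""
    if not samples:
        raise ValueError("no samples")
    up = sum(1 for s in samples if not counts_as_down(s))
    return up / len(samples)


history = [
    Sample(tests_passed=True),
    Sample(tests_passed=True, intervention=True, transparent=True),  # not down
    Sample(tests_passed=True, intervention=True),                    # down
    Sample(tests_passed=False),                                      # down
]
print(availability(history))
```

In this model the transparent intervention leaves the second hour counted as up, while the ordinary intervention and the failed test each count as one hour of down-time.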

Timeouts of SE and SRM Tests - SE and SRM tests are failing at FNAL (with timeouts after 600s). After discussion with the site and the monitoring team, the site will try to address this through the use of better hardware.

21 January 2008 - Agenda, Minutes

Middleware for CCRC08 - The base versions for CCRC '08 of the various middleware services were advertised to all sites and discussed. The gLite 3.1 VOBox was released into production.

14 January 2008 - Agenda, Minutes

Space Tokens Needed - ATLAS requested the following for CCRC '08: each site should publish updated information in some GLUE fields of the Information System for all storage areas, including their space token association.

Issues at Some Sites - LHCb reported RFIO problems at both CNAF and RAL that were seriously hindering production.

Architects Forum (AF)
Minutes - Web

21 February 2008 - Minutes

ROOT Will Support Qt4 Only - The ROOT interface to version 3 of the Qt GUI toolkit (Qt3) will no longer be supported as of the next production release. Only Qt4 will be supported.

New Projects in PH-SFT - It was agreed that the Architects Forum will also monitor/drive the developments for the two new R&D projects hosted in the PH-SFT group. Both project leaders will be invited to the meetings and short status reports will be given regularly.

LCG_54a Configuration Ready - It includes some changes with respect to LCG_54. A new patch configuration will be prepared in order to include the new version of COOL.

Experiments Releases - Experiments are picking up configuration LCG_54(x) for building their production releases during the next weeks.

7 February 2008 - Minutes

Experiments Agree to Abandon Support for Qt3 - The experiments were asked for their support needs with respect to the version of the Qt GUI toolkit and its interoperability with ROOT. It was decided that if ATLAS or CMS identify no reason to keep Qt3 support, then ROOT will move to Qt4 exclusively.

CORAL Server Decision - Discussion about CORAL server development. Decided to go ahead with the proposal and develop the prototype. This prototype needs to be delivered quickly because of the urgent needs from ATLAS online.

LCG_54a in Preparation - A new configuration is in preparation with the latest bug fixes. ATLAS would like to defer the release of LCG_54a for a week. This was agreed and the target date for the bug-fix release was set to February 18th.

24 January 2008 - Minutes

Geant4 Funding - J.Apostolakis summarized his analysis of the effects that the DoE budget cut at SLAC has on Geant4. He has also written a memo with more details, which has been sent to ATLAS and CMS management.

Changes to the Nightly Builds - There will be some changes in the nightly builds, with new slots for patches of LCG_54 and a new version of the gcc compiler.

CORAL Server Proposal - D.Duellmann presented the CORAL server proposal in the last AA meeting. The experiments should comment on the proposal. The decision eventually could be taken at the next AF.

10 January 2008 - Minutes

ROOT Release in Preparation - The production release of ROOT is scheduled for Wednesday 16th, assuming all pending problems are solved and no new ones are detected by the ongoing experiment validations. A few days later we should get the complete LCG_54 release.

Python 2.5 in Candidate Release - Python 2.5 has been validated and is going to be added to the candidate configuration for release ("dev" slot in the nightly builds).

Mac OS X 10.5 New Supported Platform - Agreed to introduce Mac OS X 10.5 (Leopard) as a new platform after the release of LCG_54.

GCC 4.2 Future Support - Agreed to start working on gcc 4.2 after the release of LCG_54. Eventually we will drop gcc 4.1, once the complete software stack has been validated with gcc 4.2.

Adding Geant4 and Genser in the Nightly Builds - Suggested to have Geant4 and Genser MC generators as part of the nightly builds to facilitate the validation by ATLAS. 

 

Management Board (MB)
Web - Wiki - Members - Agendas - Minutes

26 February 2008 - Agenda, Minutes 

CCRC08 Will Continue Beyond February - J.Shiers presented the progress of the CCRC08 challenge. Data exports from CERN are now running at 1-2 GB/s. The peak reached by the Experiments is ~1 GB/s greater than the average achieved during SC4 (1.3 GB/s). CMS alone has managed more than 1 GB/s and ATLAS is starting to run at similar rates. The number of problems reported is only a few per VO per day. The CCRC operations will not stop after February but will continue in March and beyond. Site participation in the daily operations calls needs to be improved.

CMS QR Report - M.Kasemann presented the CMS grid activities since October 2007, covering the CSA07 activities and the CCRC08-February tests. The CMS Computing infrastructure is fully utilized by ongoing production. The CSA07 production (and much more) was finished and a detailed analysis of the performance was carried out. The newly formed Processing and Data Access (PADA) taskforce addresses deployment, integration, commissioning and scale testing. It will bring the elements of the CMS Computing Program into stable and scalable operations. The CCRC08 functional tests in February have actually complemented CSA07 and tested important additional functionality.

Tape Efficiency Metrics - Site Roundtable. The LHCC Referees asked that all Tier-1 sites also collect tape efficiency metrics in the near future. During CCRC08 in May these metrics should be available in order to analyze how efficiently tapes are used by the typical read/write patterns.

Sites Availability and Reliability Algorithm - A.Aimar described how the Availability and Reliability calculations are performed in SAM and GridView. The results are then used for the reports on site availability and reliability to the Overview Board.

19 February 2008 - Agenda, Minutes

CCRC08 and WLCG Services - J.Shiers presented an update on the CCRC08 challenge. Within its current scope and timeline, CCRC08 will not achieve sustained exports from ATLAS+CMS (+others) at nominal 2008 rates for 2 weeks by the end of February 2008. These goals can be achieved soon after February; therefore the proposal is to continue CCRC08 through March, April and beyond. The WLCG Computing Service is in full production mode, and running permanently is its actual purpose. One needs to move from the mind-set of “challenge then relax” to “full production all the time”. Experiments should remember that there are no GGUS TPMs on weekends, holidays or nights: a problem submitted to GGUS on a Friday evening will be answered only the following Monday.

LHCC Referees Feedback - I.Bird summarized the meeting with the LHCC Referees that took place the same morning. The questions and concerns of the referees were about: Status of the Sites, the SRM installations and why the Experiments were not all running at the same time - the Referees would like this to be demonstrated. They also asked that tape efficiency and overall metrics for site performance should be defined and measured.

Availability and Reliability in January 2008 - Tier-1 Reliability and Availability. Below are the reliability metrics since April 2007. One should note that the target has moved to 93%; against the old 91% target, 10 sites would have been above it. The averages for the best 8 sites over the last 6 months were: Aug 94%, Sept 93%, Oct 93%, Nov 95%, Dec 96%, Jan 95%.
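As a quick check of the figures quoted above, the following sketch computes the six-month mean of the best-8 averages and compares each month with the 93% target. It assumes the quoted percentages are reliability values; the variable names are illustrative only.

```python
# Monthly averages for the best 8, as quoted above (percent).
best8 = {"Aug": 94, "Sep": 93, "Oct": 93, "Nov": 95, "Dec": 96, "Jan": 95}
TARGET = 93  # target quoted above (percent)

# Arithmetic mean over the six months.
mean = sum(best8.values()) / len(best8)

# Months in which the best-8 average met or exceeded the target.
months_meeting_target = [m for m, v in best8.items() if v >= TARGET]

print(f"six-month mean: {mean:.1f}%")
print("months at or above target:", ", ".join(months_meeting_target))
```

With these numbers the mean is 566 / 6, and every month sits at or above the target.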

I.Bird proposed that, from next month, the MB also start verifying the reliability and availability of Tier-2 sites.

12 February 2008 - Agenda, Minutes

CCRC08 Started - J.Shiers presented a summary of the initial days of CCRC08. The February CCRC08 run started on Monday 4 February. The preparations for this challenge have proceeded (largely) smoothly. The execution is still manpower intensive and schedules remain extremely tight.

ATLAS QR Report - D.Barberis presented the ATLAS QR report, an update on the information presented in November. ATLAS considers the FDR-1 exercise very useful and has learned important information on data concentration at CERN and event mixing (jobs with many input files). The data quality loop was tried and basically works. The calibration procedures were also attempted for the first time and still need testing. The ATLAS Tier-0 internals are not a worry except for operations manpower; shifts are being organized.

5 February 2008 - Agenda, Minutes

CCRC08 Preparation - J.Shiers reported on the status and progress of the CCRC08. With respect to previous challenges it is better prepared and executed. The information flow has been considerably improved; a lot of work was focused on ensuring that middleware and storage-ware are ready. Accurate and timely reporting of problems and their resolution still needs to be standardized.

CASTOR Metrics - T.Cass proposed the set of metrics that could be collected to measure and monitor the performance of the MSS storage at the Tier-0 site. There are many different CASTOR related performance measurements that can be collected, but these are not easily available from a single location. Metrics will now be tracked consistently in order to show performance issues, and hopefully improvements, over time.

Post SLC4 Options at CERN - T.Cass presented the possible options for the future platform to support. The choice discussed is between progressively delivering SLC5 services (test clusters, build services, etc.), or skipping the RHES5-based platform and introducing SLC6 services from the release of RHES6, expected Q4 08.

ALICE Quarterly Report - L.Betev presented the QR for ALICE, from Nov 07 to Jan 08. The ALICE MC production continued with very good site and services availability. The Conditions data collection was in operation from day one (Shuttle system to Offline CondDB).  All data source components are ready and integrated, including the DAQ/DCS/HLT databases and fileservers. The focus is now on having a full complement of conditions data - and the corresponding online software - for all detectors.

Applications Area before CCRC - P.Mato presented the status of the Applications Area software. The AA software is very weakly coupled with grid services, with very few points of contact (e.g. access to event and conditions data). Typically each experiment will use the version they have managed to fully integrate and validate with their applications. The CCRC February run was based on last year’s releases while the CCRC May run will be based on the new configuration.

29 January 2008 - Agenda, Minutes

OSG Site Availability and Storage Services - In the last couple of weeks OSG started uploading RSV records to SAM, publishing information on the US Tier-2 production resources listed in the WLCG MOUs for ATLAS and CMS. The logs on the SAM side have revealed that uploads are steady and accessible. The SAM web interface still does not present the expected results. Joint debugging is actively taking place.

Interim User Accounting Policy - J.Gordon presented the status of user level accounting and the current plans. The proposal is to have an interim solution approved by the MB. ATLAS would like to be granted access to their user data. ATLAS also uses VOMS Roles/Groups (a related issue). The interim policy (see Policy Document below) is that selected ATLAS people should agree before being given access to ATLAS data. Sites are informed that if they publish UserDN data, then the ATLAS VRM will have access to the ATLAS data.

22 January 2008 - Agenda, Minutes

CCRC08 Preparation - The services are not only going to be tested for scaling but also new updates are still being tested or certified (LFC, Gfal, lcg-utils). One should agree on a target set of versions for all components of middleware and storage software no later than the April Face-to-Face Meetings (April 1/2).

Pilot Jobs Framework Review - J.Gordon presented the updated mandate resulting from the last GDB discussion.  The proposed mandate is to review the Multi-User Pilot Job Frameworks of the LHC Experiments (Draft 1) and to produce a report to the Management Board about the safety and effect of the framework.

GDB Summary - J.Gordon presented a summary of the January GDB.

15 January 2008 - Agenda, Minutes

Metrics for Tape Efficiency - During the previous meeting T.Bell presented the metrics used at CERN and proposed that some of those metrics should be collected by the Tier-1 sites. The Tier-1 sites should comment on whether they can collect the same metrics.

WLCG Service Interventions / Interruptions - J.Shiers summarized the issues of (1) “expected response time from the Experiments” and (2) what could be achieved by defining clear procedures and common standards. Some Experiments (CMS and LHCb) have requested a maximum downtime of 30 minutes for the most critical services. As has been stated on several occasions, including at the WLCG Service Reliability workshop and at the OB, a maximum downtime of 30 minutes is impossible to guarantee at an affordable cost. Even an intervention within 30 minutes is not obvious.

Update on CPU Benchmarking - H.Meinhard reported the progress at the HEP CPU Benchmarking working group: Several different systems (currently based on seven different processors) are available at CERN. The standard SPEC benchmarks were run at CERN on those seven hosts.

NDGF SAM Tests - O.Smirnova presented an update about the specific SAM tests that have been developed in order to calculate the standard Site Availability at NDGF.

8 January 2008 - Agenda, Minutes

OSG Resource and Service Validation (RSV) - OSG is developing a service testing infrastructure and probes that will calculate the availability and reliability of the OSG sites. The results collected will also be passed to the standard WLCG SAM availability monitoring system and published via GridView.

CCRC08 F2F Meeting - J.Shiers presented the status of the CCRC planning and the preparation of the F2F Meeting scheduled for 10 January 2008 at CERN. Link

Improving Tape Efficiency at Tier-0 Site - T.Bell summarized the issues concerning storage tape handling and explained which are the major areas for improvement, both in the configuration of the Tier-0 storage system and in the usage patterns of the storage in the Experiments’ applications. The presentation was followed by a discussion and by feedback from the Experiments on how their applications could improve tape usage. A similar kind of analysis of tape efficiency will be performed at the Tier-1 Sites in the coming weeks.

General News and Events
LCG Meetings - Calendar

4 March 2008 

CCRC'08 F2F Meeting. Agenda 

5 March 2008

Grid Deployment Board (GDB) Meeting at CERN. Agenda

2 April 2008

Grid Deployment Board (GDB) Meeting at CERN. Agenda

21-25 April 2008

WLCG Collaboration Workshop (Tier0/Tier1/Tier2) at CERN. Link

14 May 2008 

Grid Deployment Board (GDB) Meeting at CERN. Agenda  

12-13 June 2008

WLCG CCRC-08 Workshop at CERN. Link