WLCG Operations Coordination Minutes, July 5th, 2018
Highlights
Agenda
Attendance
- local: Julia (WLCG), Maarten (ALICE + WLCG), Mayank (WLCG), Renato (LHCb)
- remote: Alessandro (CNAF), Catherine (LPSC + IN2P3), Dimitrios (WLCG), Eric (IN2P3-CC), Gareth (RAL), Johannes (ATLAS), Stephan (CMS)
- apologies:
Operations News
- the next meeting will be on Sep 13
- please let us know if that date would pose a significant problem
Special topics
- Follow-up on GDPR and WLCG services. This topic was discussed at the June MB. The latest proposal consists of the following:
- Produce a lightweight “Code of Conduct” for WLCG (and EGI/EOSC-hub). This implies replacing the existing EGI/WLCG Data Protection Policy Framework and providing a general WLCG Data Privacy Statement plus a template for others to use.
- The EOSC-hub/AARC2/WLCG policy team will prepare draft documents (expected to be approved in early autumn).
- WLCG Ops should continue building its list of services that need a Privacy Statement.
Middleware News
- Useful Links
- Baselines/News
- Issues:
Discussion
- Maarten: EGI and OSG have just sent a security advisory concerning Singularity
Tier 0 News
Tier 1 Feedback
Tier 2 Feedback
Experiments Reports
ALICE
- Normal to high activity levels on average
- No major problem
ATLAS
- Stable grid production over the last weeks with up to ~300-350k concurrently running job slots. Additional HPC contributions with peaks of 100k concurrently running job slots.
- The usual mix of grid workflows is ongoing: MC generation, simulation, and data and MC derivation production. MC reconstruction is currently at a smaller scale, with a larger campaign planned to start in August or September.
- The first larger-scale test of MC pile-up simulation and digitisation with MC overlay is upcoming.
- Commissioning of the Harvester submission system via PanDA is ongoing: US HPCs; Grid: CERN, BNL, Iberian cloud
- EOSATLAS: ATLAS is concerned about EOS stability.
- Several short (20 min to 1 h) EOS issues in the past month.
- Waiting for a post-mortem report (as a Twiki page with ServiceIncidentReports) on the current EOS instabilities
- Julia: we will ask the EOS team to give a presentation in our next meeting
CMS
- LHC in beta*=90m run
- Tier-1 keeping up with incoming data
- compute system busy at about 250k cores
- usual mix of about 20% analysis 80% production
- CMS EOS crash last week triggered by an eosdump of one of our legacy cleaning scripts
LHCb
- Productions:
- Collision18 production ongoing
- User jobs and simulations running
- No major problems
Ongoing Task Forces and Working Groups
Accounting TF
- Following Di's question at the last WLCG Operations Coordination meeting about accounting for jobs submitted via BOINC, the Accounting Task Force twiki page has been updated with short instructions from Andrew McNab. Di will try them and report on his experience at the September Accounting Task Force meeting.
Archival Storage WG
Update on providing tape info
PLEASE CHECK AND UPDATE THIS TABLE
| Site | Info enabled | Plans | Comments |
| CERN | YES | | |
| BNL | YES | | |
| CNAF | YES | | Space accounting info is integrated in the portal. Other metrics are on the way |
| FNAL | YES | | |
| IN2P3 | YES | | Space accounting info is integrated in the portal. Other metrics are on the way |
| JINR | YES | | |
| KISTI | NO | | KISTI has been contacted. Will enable it soon |
| KIT | YES | | |
| NDGF | NO | | NDGF has a distributed storage which complicates the task. Discuss with NDGF possibility to do aggregation on the storage space accounting server side |
| NLT1 | NO | | |
| NRC-KI | YES | | |
| PIC | YES | | |
| RAL | YES | | Space accounting info is integrated in the portal. Other metrics are on the way |
| TRIUMF | YES | | |
- Julia: we will contact SARA by e-mail
One can see all sites integrated in storage space accounting for tapes here
- Dimitrios: mind that the plots may show some jagged lines due to recent network issues
- Julia: we will soon move the prototype to production
Information System Evolution TF
IPv6 Validation and Deployment TF
Detailed status here.
Machine/Job Features TF
Monitoring
MW Readiness WG
Network Throughput WG
Squid Monitoring and HTTP Proxy Discovery TFs
- Just a measurement: CMS@Home is now using the Cloudflare caching (openhtc.io) and measurements show nearly 5 minutes of savings in job startup time on average.
Traceability WG
Container WG
Action list
| Creation date | Description | Responsible | Status | Comments |
| 03 Nov 2016 | Review VO ID Card documentation and make sure it is suitable for multicore | WLCG Operations | In progress | GGUS:133915 |
| 07 Jun 2018 | Followup of OSG service URL changes | WLCG Operations | Ongoing | We suggest that for all middleware using various OSG-related URLs the experiments look at this page and inform operations in case you need more help |
| 07 Jun 2018 | GDPR policy implementation across WLCG and experiment services | WLCG Operations + experiments | Ongoing | |
Specific actions for experiments
Specific actions for sites
AOB