Summary of March GDB, March 21, 2018 (ISGC)
DRAFT
Agenda
https://indico.cern.ch/event/651351/
Introduction - I. Collier
presentation
Notes
Main topics
Upcoming meetings
AMS Report - F. Lee
presentation
Notes
- 230 TB raw data so far
- ~16 billion events per year
- data reconstruction rate = 100 times data collection rate
Questions
- Philippe Charpentier: which tools are used for reconstruction and simulation?
- Felix: two approaches
IHEP Grid Report - W. Wu
presentation
Questions
- Jiri Chudoba: why are both DPM and dCache used? Why not unify them?
- WW: historic reason; no appetite for merging; both are stable for the amount of storage used
- Philippe Charpentier: why does ATLAS not fill the grid queues, so that BOINC is needed to do it?
- WW: there is a conservative approach in PanDA (ATLAS)
- PC: more aggressive pilot submission is needed to fill the 400 cores… jobs are highly inefficient
Security Report - R. Wartel
presentation
Notes
- specific Asia-related issues
- discussion on how to further improve collaboration in Asia
Questions
- RW: maybe a new operational security trust group?
- Ian Collier: maybe Asia Tier Forum could help?
- Sang-Un Awn: we recognise there are problems and are working on them
- David Kelsey: create a mailing list?
- RW: need to start with a list of volunteers; others (other countries) will surely follow; security contacts from GOCDB do not always work
IPv6 Status - D. Kelsey
presentation
Notes
- Motivation includes IPv6-only WN/VM sites and opportunistic resources; many providers are finding it hard to provide enough IPv4 addresses
- Track progress using the central BDII, then experiment VO feeds
- SiGNET (Slovenia) is in the vanguard
- EOS instances all IPv6
- Public internet: turned on for the ATLAS EOS instance on 28 January
- Replacement of firewall to make it fully IPv6 capable (public)
- Tier 1 status
- connectivity: all except KISTI (NB: to be done, planned by end of May)
- Dual-stack storage: less good progress
- (status is 2 weeks old, not rechecked)
- What's the % of storage per VO?
- Fermilab/Brookhaven FTS IPv4
- In deployment, better to fix IPv6 than switch back to IPv4!
- Tier 2
- Tickets asking for clarification have been issued
- hand-holding/checking/advice; can't help everyone, but...
- Monthly stats
- "Done" has gone from 10% to 20%
- FTS transfer monitoring
- 12% of data transferred in 30 day window over IPv6
- efficiency increases with IPv6: is there an obvious cause?
- newer plot shows ~ same results (11% data transferred over IPv6)
- IPv6 transition doesn't always go smoothly
- lack of training?
- things put in DNS, but firewall problems later
- 43% of UK data available over IPv6
- IPv6 seems to be faster? (Brunel)
- MW: IPv4 goes via the firewall, while IPv6 bypasses it?
- PC: Different protocols lead to different results?
- EOS components IPv6 internally
Questions
- WW: How much of this data is going to appear in Grafana? The ATLAS Kibana dashboard does not have all sites or all data. Is the information in this one more complete?
- DK: Whenever I've seen it personally it has been on these dashboards; don't know if it is going somewhere else
Belle II Report - T. Hara
presentation
Notes
- Would have liked to have joined in person
- Colleagues include tech/support staff
- CHEP2015 in Okinawa showed expected resources
- CPU: have to reprocess data many times
- Expected bandwidth: 10 Gb/s
- Compute model has layered structure
- BelleDIRAC extensions
- production management, monitoring, fabrication...
- KEK site - several servers for DIRAC
- 2 catalogues running at KEK
- CondDB -> BNL
- Belle uses LHCONE
- JPN-US: previously 20, now 100
- JPN-EUR: previously no direct link; 20G in 2015 -> upgrade in 2019
- handle different kinds of production
- DDM is working well
- expecting more transfer so need to make sure
- implement visual monitoring system
- need a new dashboard, as the new monitor requires CERN accounts
- End of April: observe first collisions -> data taking
- Site issues
- BNL became main centre for Belle in US
- BNL only T1
- all raw data processed there
- decided by DOE in August
- graph shows transition
- looking for 3-4 times the number of running jobs
- developing features; aim to reach the current LHCb level in the future
Questions
Middleware report - E. M. Wadenstein
presentation
Notes
- Batch system poll in room
- pbs/torque: some
- htcondor: more
- lsf: couple
- sge: some
- cream: some
- arc: some
- htcondorce: couple
- other:
- BDII representation
- Torque/Maui: trending down
- SLURM: trending up
- HTCondor: trending up
- GE: slightly down
- LSF: even
- Recommendations
- HPC -> Slurm
- HTC -> HTCondor
- new features -> at least these systems should support them
- HTCondor: CERN and RAL have moved to it
- HTCondorCE only really supports HTCondor for WLCG
- WLCG could help get developers to Asian meetings, e.g. the Asian Tier Forum?
- HTCondor or ARC?
- Not sure about Slurm contacts
- Offer is there to connect right people
- training, personal discussions, etc...
Questions
- IC (to the room): need you to tell us
- Sang-Un Awn: don't know the next plan yet; it is on the agenda for the next planning meeting
- end of April
- after that make contacts
- next forum
- In any case, a short co-located workshop at the next ISGC?
- MW: OK, HTCondor/ARC CE most interesting?
- Sang-Un Awn: good to have this at a well-established venue like ISGC
LHCOPN/LHCONE Report - E. Martelli/I. Collier
presentation
Notes
- different communities: LHCOPN up to sites
- LHCONE has a different user agreement
- pragmatic to include other communities
Questions
- DK: What was the global identity federation problem?
- Observation that this will be an issue with DTNs (data transfer nodes)
- DK: Node to node?
- Came up in more than one talk. Effectively transfer nodes, but still need an interface
- DK: I remember suggesting they should not use their own PKI; will look at the slides
EISCAT 3D Report - E. M. Wadenstein
presentation
Notes
- Measuring interactions between the Earth and the Sun via the ionosphere
- Big radar dishes
- Replace dishes with antenna fields ... software-defined
- Land acquisition and permits mostly done
- Transmission, reception from several sites
- 64 Gbps out from each node
- 3 racks of DAQ
- planning for accessibility of site
Questions
- IC: Scalable in terms of people?
- NEIC project 2 (1.5) FTE
- Hardware/antennas: scaled up to 12 people
- + more
- + outsourcing: FPGAs come with software from the company
Storage Accounting Update - D. Christidis
presentation
Notes
- CRIC under heavy development
- SRR: good progress, but still under development
- WLCG Archival Storage WG
- Enable sharing of metrics and statistics between sites
- EOS wants to remove SRM this year
Questions
- IC: For last steps, timeframe for CMS?
- Not yet. Expect T0 soon since the network is already there, but can't give an estimate
--
DavidCrooks - 2018-03-23