Summary of September 2019 GDB, September 11th, 2019


DRAFT

Agenda

For all slides etc. see:

Agenda

Introduction

Speaker: Ian Collier (Science and Technology Facilities Council STFC (GB))

slides

  • January: the first week back at CERN is not the best time for a meeting, hence it will be one week later
  • GDB @ ISGC to be confirmed

Q&A

  • Matthias: one more thing for events: an ARC CE workshop on the Friday after HEPiX (will email details to Ian)
  • Edoardo: in Taiwan, following ISGC, an LHCONE meeting at ASGC on 8/9 March (will send details to Ian)

2020 Joint WLCG-HSF Workshop

Speakers: Christoph Wissing (Deutsches Elektronen-Synchrotron (DE)), Erik Mattias Wadenstein (University of Umeå (SE)), Mihai Carabas (University Politehnica of Bucharest (RO))

Lund/Umeå proposal

DESY proposal

UPB proposal

Ian: Proposals will also be sent to HSF meeting

Notes:

Brian: Have we had a WLCG meeting at any of the 3 locations?

Ian: DESY had a WLCG Workshop in 2011.

Graeme: Matthias, you also had an option at the end of June. That wasn't a week that we had considered, but it's still early enough that we could consider it. There might be more sunlight at that time in Umeå?

Matthias: It would also shave 20€/person off the cost.

Michal: 300€: does the price change with the number of people?
A: It depends. The biggest cost is the venue; slightly higher if there are fewer people, but not by much.
Michal: Good to have those details.
Matthias: The bid was done for 200-250 people; most of the costs are per person, so it scales pretty well for the Umeå part.

Simone: The week at the end of June is very close to July 4. How much of a problem would this be for US participants?
IC: Is the end of June a problem?
BB: What was the end date?

Matthias: ending 3 July

Ian: It sounds from the room that this would be an issue and act as a blocker.
Frank: Or everyone will leave on the morning of the 3rd.

Ian: So back to the weeks in May.

Andrew M: We are trying to keep the costs down so that more early-career people can come; important to keep that in mind.
Ian: It occurs to me that I'd be delighted to come to Sweden, but I know it can be expensive. Hamburg has hostels at 33€/night, and they were estimating 200€, so Hamburg looks attractive.

Ian: Graeme/Simone, decision process?
Graeme: The process would be to take input from people privately, any comments, up until the beginning of next week, then in one week's time meet to discuss.
Ian: I will remind people that we have an email address in the slides, so send any questions or comments to that address: wlcg-hsf-workshop-2020-organistion@cern.ch
All good proposals, recognise that it is real work to prepare them.

By next GDB will be some announcements.

ACTION: People should send comments to wlcg-hsf-workshop-2020-organistion@cern.ch before 16th September.

DUNE Computing Outlook

Speaker: Andrew McNab (University of Manchester)

slides

When DUNE is running at sites it will not just be Monte Carlo but real data.

Plot: mostly to show the number of sites, peaking at 8500 jobs.
It shows that quite a lot is happening, with peak reprocessing. Quite a variety of sites, some in the US, some in the EU.

Tier-1/Tier-2: because custodial requirements are different, there may be a role for larger Tier-2s to act differently.

Notes

Brian: You said supernovae were 1TB; it strikes me that this is exactly the sort of thing HDF5 is for. There is overlap with IRIS-HEP; have you thought about it?

Andrew M: It has been mentioned.
Brian: It works well with data much larger than memory.
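
As a minimal sketch of the out-of-core access pattern Brian alludes to (HDF5 handling data much larger than memory), assuming a hypothetical supernova dump stored as a single chunked HDF5 dataset; the file name, dataset path and block size are illustrative only:

  import h5py
  import numpy as np

  # Hypothetical ~1 TB supernova readout stored as one chunked HDF5 dataset.
  # h5py only materialises the slices that are requested, so the file can be
  # far larger than the worker node's memory.
  with h5py.File("supernova_dump.h5", "r") as f:
      waveforms = f["raw/waveforms"]        # e.g. shape (n_channels, n_ticks)
      block = 10_000                        # rows pulled into RAM per step
      total = 0.0
      for start in range(0, waveforms.shape[0], block):
          chunk = waveforms[start:start + block, :]   # reads only this slice
          total += np.square(chunk.astype(np.float64)).sum()
      print("summed ADC power:", total)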

Brian: On coordinated workflows: LIGO uses Condor; the devil is in the details but it might be worth checking with them.

Andrew: It is said you need the site to do something, but with enough jobs and network paths you can work out how to do this.

Rucio - news & outlook

Speaker: Martin Barisits (CERN)

slides

QA

Edgar: slide 8, how to scale. You're creating new threads?
Martin: More like distribution of the workload among threads already working.

E: Where are the scalability limits: datasets/containers, etc.? I remember a case where, with a container with too many files, MySQL gave up. Where are the limits?
M: In general we always scale with the database, whatever it can handle. In terms of creating transfers: for a billion-size dataset, making a replication rule will not go through; it takes too long to specify. These are things we are trying to address. There are limits in terms of size, but it is largely the DB.

E: Have you documented these?
M: We tried to scale this as a test; datasets of 10 million and 20 million files went through fine. That is probably not the way you want to go in a practical approach to data management.
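
For context, "making a replication rule" as discussed above looks roughly like the following through the Rucio Python client; this is a sketch only, with placeholder scope, dataset name and RSE expression, and it assumes an already configured Rucio client environment:

  from rucio.client.ruleclient import RuleClient

  # Sketch: ask Rucio to keep one copy of a dataset at any RSE matching the
  # expression.  Scope, dataset name and RSE expression are placeholders.
  rule_client = RuleClient()
  rule_ids = rule_client.add_replication_rule(
      dids=[{"scope": "user.jdoe", "name": "test.dataset.2019"}],
      copies=1,
      rse_expression="tier=1&type=TAPE",
      lifetime=None,        # keep the rule until it is explicitly removed
  )
  print("created rule(s):", rule_ids)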

Doug: We heard in the previous DUNE talk about object stores; storage technology is changing from POSIX to object, and there are robust systems out there. What is the Rucio thinking about that?

M: Currently object stores have always been used as file stores, which is probably not the way you want to go, but we don't have specific plans about this for the future. Possibly going to the event level, which would be orders of magnitude of difference in row storage for the DB. But for now no specific plans.

Multi-VO Rucio instance

Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

slides

QA

Q: Rucio authentication: which method are you using to authenticate users, and what happens with a user that is in two VOs?
Katy: Data security? That has been put in place.
Q: PKI? Certificate access? Password?
Martin: In multi-VO, auth is the same as in usual Rucio: X.509/password/Kerberos/token-based/etc. When authenticating to Rucio you also specify the VO you want to attach to. A user could be mapped to multiple VOs and then choose.
Q: If that user has access to data, can they see data from other people, but not write?
A: They will only be able to see data within their VO, not other VOs.
Q: When will isolation for read/write within Rucio be implemented?
A: Astro and other domains have been asking for this and we're looking into it; the question is how fine-grained this needs to be. We're looking into it but there is nothing just now.
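
As an illustration of Martin's point about attaching to a VO at authentication time, here is a sketch of the client side; the vo keyword, host names and account are assumptions for illustration rather than the exact multi-VO interface described in the talk:

  from rucio.client import Client

  # Sketch: one Rucio server hosts several VOs; the user states which VO to
  # attach to when authenticating.  Host names, account and the vo value are
  # placeholders.
  client = Client(
      rucio_host="https://multi-vo-rucio.example.org",
      auth_host="https://multi-vo-rucio-auth.example.org",
      account="kellis",
      auth_type="x509_proxy",   # could equally be userpass, Kerberos, token
      vo="dteam",               # VO chosen at authentication time
  )
  print(client.whoami())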

Edgar: Multi-VO instance: you mentioned the DB; are the multiple VOs sharing one database? We heard that the database is the limiting factor. I'm not sure how the big experiments would feel about sharing a database if transfers are being slowed down.

Ian: Katy highlighted that we support VOs that don't have much data yet, so we don't need to bring up an instance per VO. They might migrate later when their data becomes larger.

Q: Why did you choose an architecture with one database?
A: The use case is one where it's unlikely to be an issue. When they have more data and files, then bring up a new instance. It keeps the cost of entry down for them and for us.
Katy: A development would be to carve off an instance from the multi-VO one when needed.

DUNE Rucio plans

Speaker: Robert Illingworth (Fermi National Accelerator Lab. (US))

slides

Brian: The thing with the tape backend requiring SRM: what's the way forward?
Martin: [Rucio] We don't have a good answer. Right now SRM is used for everything tape related; not sure what the plans are.
Robert: Stage with SRM with something else then switch to SRM?
Katy: CTA is using xroot only, not SRM.
Brian: It sounds like an amorphous thing where no one knows the timeline. Try to nail down in DOMA who owns this. There are multiple paths to a solution, but someone needs to own it and have a clock that counts down.

Ian: Nominees?
Brian: I nominate Simone

ACTION: Communicate to Simone, to delegate as appropriate.

MultiONE

Speakers: Edoardo Martelli (CERN), Tony Cass (CERN)

slides

Brian: We have had several funded R&D projects along these lines: Condor jobs, network namespaces.

Another approach was to experiment with GridFTP packet tagging, etc. What we found was that the best way to do this was to have GridFTP tag flows and have the SDN structure decide what to reroute. Instead of tagging packets, tagging flows and deciding what you want to do differently was for us the most pragmatic approach.

This is something we'll be looking at in the SAND project.
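
A small sketch of the packet-tagging side of what Brian describes: marking a transfer's traffic with a DSCP value so that downstream policy (or an SDN controller classifying flows) can treat it differently. The DSCP value and endpoint are arbitrary placeholders, and this illustrates only the marking, not the rerouting decision:

  import socket

  # Sketch: mark the outgoing packets of one transfer socket with a DSCP
  # value so that network equipment can classify and reroute this traffic.
  # DSCP 32 (CS4) and the endpoint are arbitrary placeholders.
  DSCP = 32
  TOS = DSCP << 2            # DSCP sits in the upper 6 bits of the TOS byte

  sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS)
  sock.connect(("transfer.example.org", 2811))   # e.g. a GridFTP control port
  sock.sendall(b"...")       # payload elided; only the marking matters here
  sock.close()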

Shawn: We should go through this kind of thing and other things in gory detail. There is work to explore what would pay off in our community. There is a rich set of options; the issue is going to be identifying the easiest versus the most complex. With multiple data centres and VOs it is very complex.

PeteC: A good thing to do, Edoardo. A caution that one should write down the problem you want to solve: what exactly is the problem that SKA shares with WLCG? There is a risk of it turning into a solution looking for a problem.

Ian: Good point.

Steve: LHCONE has monthly problems with routing. I understand the logical reason you want to do this; the problem is that it is n^2: multiple networks for one SE.

Brian: Tagged flows worked the best for us

Tony: For NOvA you don't have to do the work; if people want to do the work to separate, it's not a requirement because the traffic goes to CERN only. If we do the separation work at CERN, that can be a demonstration of how to do this.

Ian: Are you content that it's just Fermilab and CERN working on this, "we're going to do this", or do you want other sites involved?
Edoardo: At least one; any other site that wants to join, particularly if you use something different.

Tony: And people like Brian

Ian: Explicitly not requiring other sites, but interested sites with bandwidth should get in touch.

ACTION: Sites with interest in this should get in touch

Global VO Configuration for new VOs

Speaker: David Crooks (Science and Technology Facilities Council STFC (GB))

slides

QA:

Brian: In OSG the VO data is in Topology and the VOMS data is in osg-vo-config. Topology has a web API; it has long been on the list to move all the data into Topology.

David: I should have said: I'm presenting my understanding.

Andrea: The VO concept is VOMS-specific, and we are moving to a model where more information is needed. How flexible is this for the future needs of OAuth/OIDC VOs who have to show different things?

David: This is "here is what we currently have", and then "here is the future".

Brian: With the VO identified as a URL the question is whether I trust that endpoint, but it is easier to refer to without clashing between infrastructures.

Edgar: How would this work today? You have a VO with resources in EGI and in OSG: you go and register the VO in the VO config of OSG and then in GOCDB, then have to...

Maarten?: Note that VOMS information goes into the Operations Portal, not GOCDB.

David: The current way it works: VOMS information is in the Operations Portal and the OSG repos. You populate the Ops Portal and then inform the site and they'd add you.

Edgar: Once it's in OSG, I just tell the site "these are the files from my repo go and get them"

D: It's a question of how best we can streamline: focus on the well-understood areas so that if a site wants to check details they can do it without contacting anyone.

Andrew: It would be nice if we could have one way. The EGI way doesn't have an automated mechanism where OSG does; could we adopt the OSG approach and bring it out to the WLCG level, rather than having multiple systems?

Ian: Possible. Ponder that for a bit as it will come up again!

Brian: It is a lot of effort to combine, but there are things we can do to reduce friction. Trust roots and policy: the VOMS stuff is easy, but that part is hard. Figuring out the trust roots etc. would be the way forward.

Ian: True; if we knew where to look for the information, that is a simple piece that isn't quite there, but there are also more difficult things.

AuthZ WG F2F report

Speaker: Hannah Short (CERN)

slides

No Questions

Ian: The crucial point: read and comment on the schema within the next week and a bit.
Hannah: I'll be pinging some people for review.

ACTION: Read schema document and give comments over the next week

OSG WLCG topics

Speaker: Frank Würthwein (UCSD/SDSC)

slides

QA:

Maarten: CVMFS in planning, if not production

Ian: I wonder if it's backwards: sites could get their entire worker node environment from CVMFS. CVMFS is easy to install in a way worker nodes are not.

Shawn: That is what Frank was implying.

Frank: we distribute our containers

Brian: One thing we have working is a non-Singularity CVMFS and Singularity all in one; it requires things which need brand-new training, but things move forward. How do we distribute containers better for things which aren't WLCG sites? CVMFS is then a convenient way to distribute containers. It smooths over configuration differences. Back to VOMS: there are a lot of config states, even more opaque than the CAs to install.

Steve: Are self-contained containers making things harder?

Frank: The objective is growing raw capacity. We can't expect universities to support tools at a cost of effort. Therefore to grow capacity you need a means to do so without cost; if I can achieve the aspiration, there's no reason people can't offer access. But if you can't, then some clusters will always be closed.

Matthias: Have you seen the work done by NCF?

Ian: There is a lot here; can we agree on an approach to bear down on some of it? Some differences are irreducible, but there are better ways to share information and some of it can probably be done fairly easily. Is this something where it would be worth setting up one working group to work on this convergence, to spend time chipping away?

Frank: Delegate some of them, e.g. the CILogon items, to joint security.

Dave K: There are several approaches, such as the combined assurance David spoke about; we don't need to do it at the AuthN level. We are already talking to LIGO, and doing it for DUNE & Fermilab. But there is a risk assessment to be done. For the average end user you don't need a higher level of assurance; it is up to a VO to decide who to trust and what they can do.

Maarten: This is not a quick thing; we now have 20 years of experience, do we need to change?

Frank:

  • Delegate ways to deal with Auth better to the security group
  • CVMFS is less obvious: some aspects are plain deployment, but why do some go to CERN and some to EGI?

Brian: People with stakes in this are trying to work together. There is a CVMFS working group, but it's not under a wider umbrella. Maybe bring this to them and ask them to charge the hill.

Ian: There may be a piece here about gathering information. Say to CVMFS at CERN: maybe gather the information. We need to remind the CVMFS people about the things to be done to bring this together.

Andrew: Using the CVMFS from EGI there is stuff you need to know; it is easy to get working but with some things missing. We need a central place for CVMFS to make sure it's more standardised.
Maarten: This can be done.

Brian: There isn't really a mechanism for feedback.

Andrew: The theme is to get people to be permissive enough. You need a "definitive" list for people to look at.

Brian: Some policy issues. Repos hosted by OSG need to be OSG affiliated.

Andrew: But that's just hosting - people need to be permissive about things they're not involved with and allow people to contribute to config.

Brian: Need to separate things we host and things we keep

Andrew: chasing things side by side

Ian: We start doing things assuming a limited scope, then realise there are resources everywhere and it's not so limited. For our broader WLCG/EGI/OSG community, working out a way to unify CVMFS is not intractable, though not everyone will want it. We should get our heads together and just do this. Highlight to those running CVMFS that we are seeing issues; it needs some light shone on it.

Alessandra [via Zoom]: Not all sites deploy the central [CVMFS] configs; it should be a requirement.

Things to delegate:

  • CVMFS stuff to the people running the services
  • How do we tackle the VO things David was talking about? There are reasons why things are how they are, but how can we be more seamless?

Brian: Some things are irreducible, e.g. established trust. However there are things where, once done, do we need to do the others? Do I trust people, and what are the technical steps needed? As the trust infrastructure becomes more diverse, we need to consider what different things are used.

Dave K: We have not had a single level of assurance; the trust level is between individual VOs. It would be nice if OSG and EGI did it the same way.

Ian: Or at least not in conflict

Dave K: Agree on what we're recording and compare.

David C: If there is a source on the EGI side and the OSG side it doesn't need to be one place, just well-understood access mechanisms and information.

DK: OSG should continue to maintain their own; the issue is shared VOs.

Andrew: I don't see why we couldn't have one place.

Brian: Reduce overlaps, even if you can't overlap

DK: We may agree, but convincing EGI is harder.

DC: One way we could do it, given that we have read APIs: if the WLCG wanted to do something, as long as it had the right read permissions, there could be something which pulls from common sources.

Ian: Then the important thing is that the information is equally understandable and compatible.

Brian: Detecting conflicts etc. would deal with much of the hardship [automation isn't always the be-all and end-all].

Ian: Could be enough that the two point to each other -- then you know where what you need is.

Network Virtualisation WG report

Speaker: Shawn Mc Kee (University of Michigan (US))

slides

Q: Is there a standard?

Shawn: There is particular momentum behind P4, but there isn't really a production-quality drive which covers multiple networks.

SAND project status

Speakers: Brian Paul Bockelman (University of Nebraska Lincoln (US)), Shawn Mc Kee (University of Michigan (US))

slides

Q: Where is the transatlantic data being copied from?

Shawn: It doesn't identify individual sources, just the volume.

Frank: It's not useful for asking why we are using the transatlantic network the way we do; there is limited ability to do network-use diagnostics.

Brian: As we start to gain Condor data points we will gain some more information, though it won't answer your question.

ICECube Computing Outlook

Speaker: Benedikt Riedel (University of Wisconsin-Madison)

slides

Brian: Burst work on the order of 60,000 cores within a handful of minutes: how far are you?

Benedikt: We can sit on around 3000 cores in about 30 seconds, but we have issues with Amazon API limits; we have to email them and ask to increase the limits.

Brian: Do you perceive limits between 3,000 and 60,000, or is it just a case of dialing up?

Benedikt: We are not using a scheduler; it can be more pinpointed, and we just used the AWS interface. There are more issues with the monitoring; we want the infrastructure to pop into existence and then disappear instantly.
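
As a hedged sketch of the burst provisioning and API-limit back-off Benedikt describes, using the EC2 API through boto3; the AMI, instance type, region and counts are placeholders and the real IceCube setup may well differ:

  import time
  import boto3
  from botocore.exceptions import ClientError

  # Sketch: request a burst of instances directly through the EC2 API and
  # back off when Amazon's request/instance limits push back.  AMI ID,
  # instance type, region and counts are placeholders.
  ec2 = boto3.client("ec2", region_name="us-east-1")

  def launch_burst(total, batch=100):
      launched = 0
      while launched < total:
          want = min(batch, total - launched)
          try:
              resp = ec2.run_instances(
                  ImageId="ami-0123456789abcdef0",   # placeholder AMI
                  InstanceType="c5.4xlarge",
                  MinCount=want,
                  MaxCount=want,
              )
              launched += len(resp["Instances"])
          except ClientError as err:
              code = err.response["Error"]["Code"]
              if code in ("RequestLimitExceeded", "InstanceLimitExceeded"):
                  time.sleep(10)   # API/quota limit hit: back off and retry
              else:
                  raise
      return launched

  print("instances launched:", launch_burst(total=1000))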

DavidCrooks - 2019-09-11
