Summary of July 2019 GDB, July 10th, 2019

Agenda

For all slides etc. see:

Agenda

Introduction - Ian Collier

Speaker: Ian Collier (Science and Technology Facilities Council)

slides

GDB Chair (IB) - no slides

  • IC chaired GDB for 4 years (formal position with no fixed term)
  • Current position ends around end-2019
  • 25-30% FTE
  • Opportunity to review the role and to revisit the role of the GDB
  • more "operational" than "deployment" (there is an ops group)
  • lots of ongoing R&D: how will it be deployed, and what are the policy implications (eg security)?
  • more formal collaboration with other experiments (eg DUNE, Belle2, SKA) but they are not dependent on the WLCG-specific infrastructure
  • more scope for the GDB and responsibilities for the Chair?
  • could be co-Chairs (or expanded steering committee)?
  • in the past, Chair was not supposed to be a CERN person (continue?)
  • formal process with nominations and a vote by (some group???)
  • meetings still monthly at CERN but could be more virtual

Comments by the group

  • still feeling that CERN is the ideal meeting venue
  • two co-chairs to help with time-zone issues (help get chair from outside Europe)
  • two chairs could bring different expertise
  • meeting frequency - could be every two months rather than every month?
  • group/clustering other meetings (eg DOMA) with the GDB weeks
  • GDB has not been focal point for R&D (maybe reconsider)
  • eg HEPIX has a US and European co-chair
  • GDB is still very euro-centric
  • co-chair with complementary locations/group/experience?

Clarification 2019-08-01 IanC - depending on details and how you account time could be more like 20%

  • send nominations and comments to Ian Bird
  • timescale for identifying candidates by October 2019
  • plan is to proceed with search for co-chairs with the co-chairs managing the frequency
  • current position is 2 years (maybe have overlap) with renewable term

CernVM Workshop report (JB)

  • Review of CernVM workshop in June 2019 held at CERN
  • no questions/comments

Dynafed pre-GDB report (AD)

  • Review of Dynafed workshop in July 2019 held at CERN
  • Dynafed itself is in a good state and much of the effort is integrating its use
  • some outstanding challenges remain: authentication, and integration with the experiments' workload and data management systems

Security Challenge Final Report (SG)

  • final report on the EGI security challenge
  • previous report during the May 2019 GDB
  • summary said security is generally good but challenges remain

Traceability & Isolation WG report (VB)

  • report and recommendations about the traceability of jobs (pilots/payloads)
  • xcache use case needs to be considered

DPM Workshop report (FF and OK)

  • report on Bern workshop in June 2019
  • concern about the long term support of DPM (limited manpower)

ALICE Computing Outlook

Speaker: Latchezar Betev

ALICE upgrading its read-out system, offline and online software. Some highlights (see slides for details):

  • Completely new read-out system, no trigger farm, all "compressed" (40x) by new "O2" facility
  • Planned O2 facility similar in compute power to a T1 (hoping for more optimization)
  • New processing algorithm (for Run3) already uses 8x less CPU (compared to Run2) and 2.7x less space for its output
  • Ongoing work to optimize MC simulation: more details in a report in September
  • Grid Middleware, AliEn being completely rewritten into jAliEn
  • Compared to the current 72% used for MC simulations, only 25% is planned for Run 3

Question/Answers/Comments:

Edoardo Martelli: Are the 9000 fibers used? What is the data rate? Latchezar Betev: These 9000 fibers are the current plan. The estimated rate is given on the slides (3.5TB/s).

Ian Collier: When you say it fits into the "standard Grid resource growth", which model are you using? Latchezar Betev: It's hard to say, it's a crystal ball in any case, there will be adjustments. The main message is that we are not factors away in terms of resources; we put a lot of effort into fitting reasonable estimates.

Alessandro Di Girolamo: About the new EOS erasure-encoded configuration, instead of the standard 2 copies, how much are you saving? Latchezar Betev: Currently the overhead is 20% (8+2). It's already done in fact and seems to work well.
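
As a rough check of the numbers in the last exchange, here is a minimal sketch. It assumes that "8+2" means 8 data stripes plus 2 parity stripes per file; the function names are hypothetical and not part of EOS. Under that assumption the quoted 20% corresponds to parity as a share of raw capacity (2/10), while the extra space relative to user data is 25%, against 100% for two full copies.

```python
from fractions import Fraction


def raw_per_user_byte_replicated(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data when keeping `copies` full replicas."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return Fraction(copies)


def raw_per_user_byte_erasure(data_stripes: int, parity_stripes: int) -> Fraction:
    """Raw bytes stored per byte of user data for a data+parity layout."""
    if data_stripes < 1 or parity_stripes < 0:
        raise ValueError("invalid stripe counts")
    return Fraction(data_stripes + parity_stripes, data_stripes)


def parity_share(data_stripes: int, parity_stripes: int) -> Fraction:
    """Fraction of the raw capacity that holds parity rather than user data."""
    return Fraction(parity_stripes, data_stripes + parity_stripes)


def saving_vs_replication(data_stripes: int, parity_stripes: int, copies: int) -> Fraction:
    """Relative reduction in raw storage when moving from replication to erasure coding."""
    erasure = raw_per_user_byte_erasure(data_stripes, parity_stripes)
    replicated = raw_per_user_byte_replicated(copies)
    return 1 - erasure / replicated


if __name__ == "__main__":
    # Assumed layout: 8 data + 2 parity stripes, compared with 2 full copies.
    extra = raw_per_user_byte_erasure(8, 2) - 1
    print(f"parity share of raw capacity: {float(parity_share(8, 2)):.1%}")
    print(f"extra space relative to user data: {float(extra):.1%}")
    print(f"raw storage saved vs 2 copies: {float(saving_vs_replication(8, 2, 2)):.1%}")
```

Under these assumptions, 8+2 needs 1.25 raw bytes per byte of user data instead of 2, i.e. 37.5% less raw space than two copies.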

ATLAS Computing Outlook

Speaker: Alessandro Di Girolamo

See slides for details. Highlights:

  • Compute: ATLAS using more resource than the pledge (but ratio is decreasing)
  • Storage activity: 80% synchronous, currently LAN, planning to use "Rucio Mover" -- 20% asynchronous, managed by FTS through Rucio
  • For Run3, no major change but lot of work on updating the infrastructure itself
  • Real challenge is for Run4/HL-LHC: CPU resources required still above "flat budget" (that will not happen), but improving. Disk also above.
  • Software preparation and validation ongoing in order to be ready for Run 3
  • Disks: no pledge increase until 2021, will be tight. Study Group working to decrease data footprint (30%)
  • GPU/ML: Opportunistic resources, hard to use. Making sure that Panda can support GPU/ML jobs and execute them
  • HPC providing quite a bit of resource. Requires a lot of support and future resources unclear.
  • Facilities change: Plan to decommission storage at small sites (<500TB), which seem to generate more problems/support load on average
  • Facilities change: New deployment idea: sites deploy Kubernetes, a central team deploys and maintains the services

Question/Answers/Comments:

Latchezar Betev: What is "Rucio Mover"? Alessandro Di Girolamo: It's a tool that configures access to data on the WN based on the available protocols. For production it normally results in a copy to scratch, but for analysis that is not the case.

Pepe Flix: When you say sending ML jobs to GPU resources, do you include HPC? Alessandro Di Girolamo: Yes, definitely, it's everything we can get

Latchezar Betev: Lost files are due not only to hardware failure, but also to software and human error.

Romain Wartel: On SLATE, the discussion is not only about the central team and systems, but also about the problem of managing Kubernetes at every site. Alessandro Di Girolamo: This is only the beginning, there will be a lot of challenges. Remote(?): This is the first time we start discussing this model, it would be a good idea to have a pre-GDB on it. Ian Collier: Yes, we could schedule one (but maybe not before December, a lot is already booked). Matthias Wadenstein: As a site representative, I want to express my worry that Kubernetes seems to be more complex to deploy, maintain and keep secure than the grid stack, for which we have experience.

Xrootd Workshop report

Speaker: Michal Kamil Simon

See slides for details

No question or comments

Privacy Policy Discussion

Speaker: David Kelsey

Discussion of a few issues/problematic points in the Privacy Policy (see slides for details)

  • Must we enforce the deletion or anonymization of accounting logs at 18 months?

Matthias Wadenstein: Are you talking only about job accounting or also about storage access logs (who created/accessed/deleted files, kept much longer)? David Kelsey: So far, it's only about job accounting. Is this documented everywhere? Is there a need to keep them? Matthias Wadenstein: Keeping, for example, who deleted a file is very important when a VO is asking why it's not there, and it has been used in multiple cases.

Remote(?): Why can't we simply use the same wording as the GDPR, "no longer than is necessary", and then sites can define what is necessary? Ian Bird: The responsibility lies in the end with the site, which is holding the data. But we have to be very careful not to create a myriad of policies. David Kelsey: I believe that only saying "necessary" without the exact duration is not "open and transparent" and thus not in line with GDPR.

Alessandro Di Girolamo: What about central systems, at the VO, at CERN? Ian Bird: The data should be anonymized at that time or there should be a privacy notice for it. Simone Campana: For example a Panda user should be made aware that their accesses will be kept and processed at CERN

  • Why do we need to keep the user registration data for 36 months?

Ian Collier: We should make it explicit that the reason here is "for allowing the users to come back".

  • Other comments on Data Retention

Latchezar Betev: Shouldn't we fight to get an exception instead of fighting to fit when it doesn't? David Kelsey: The best outcome for us indeed would be for the Géant Code of Conduct to be finalized and then to get an EU ruling that scientific research and data is something different ....

[Long and heated discussions not reaching consensus or conclusions, partially cut short due to timing]

Latchezar Betev: We are keeping all the logs of who ran production jobs in the past. We are using them long after: we analysed them recently. ? (CMS): CMS will store all the job records on tape, would we need to delete them?

Ian Bird: We have to differentiate what we did in the past and what we should do in the future. We should keep reasonable policies on how long we keep data. If a VO/VO service wants to keep data for longer, e.g. 30 years, they should write it in their privacy notice and make sure that the user is notified/aware of it. Instead of arguing between us on finding a date, VOs should set their own and wait to be challenged.

Next step: Dave to produce a document and present it to the Management Board.

-- IanCollier - 2019-07-12

Topic revision: r3 - 2019-08-01 - IanCollier