Summary of February GDB, February 13th, 2019 (CERN)

Agenda

Introduction - Ian Collier

slides

Introduction – no comments or questions. A lot of upcoming meetings!

WLCG Information System Evolution (Julia Andreeva)

slides

On SRR/CRR: we are not pushing sites to get rid of the BDII and stop using it. We simply want storage providers to publish the SRR. For the CRR, sites that are happy with the site-BDII can carry on as they wish; we provide only a complementary way of publishing, because OSG has already stopped using the BDII.

Oxana Smirnova: Regarding the installed capacity, now in REBUS: is this part of this plan, or does it go away completely?

Julia Andreeva: The group could not agree on the term 'installed capacity'. The good news is that the new system should provide something similar: the CRR schema provides the same numbers as the BDII (number of cores and performance), so the capacity can be calculated. For storage, the accounting will already have the total storage space; for tape it does not make much sense, but for disk it is there. We discovered that some areas do not overlap and cannot be provided by all storage providers, so only indicative numbers will be possible.
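
To illustrate the capacity calculation mentioned above, here is a minimal sketch in Python; the core count and per-core score are assumed values for the example, using only the two quantities the CRR schema publishes:

    # Illustrative only: the numbers are assumed, not taken from any site.
    cores = 4000            # number of cores published in the CRR
    hs06_per_core = 10.0    # per-core performance score (assumed value)
    installed_capacity = cores * hs06_per_core
    print(f"Installed CPU capacity: {installed_capacity:.0f} HS06")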

Alessandra Forti: Comment on the capacity. Installed capacity has more value in operations than in accounting.

Maarten Litmaath: Agreed. Need to come to a conclusion.

JA: In the new system we see the possibility to record the data, but people should be aware that there are going to be issues for reporting both storage and CPU.

AF: Separating accounting from operations is best.

Alessandro di Girolamo: We should remove the installed-capacity tab from REBUS and/or migrate to a new system with more flexibility. This will not resolve the issue, because it is a complicated matter, but we will offer the operations teams the ability to tune the values. Everything is logged, so the history of changes can be found.

JA: Perhaps OS can try CRIC to demonstrate whether it works.

Do you have a checklist of what to do to turn off the BDII? That would be nice.

JA: We need descriptions of the CRR and SRR. Then there are concerns from EGI, as they have a lot of functions, such as security testing, that depend on the BDII. As soon as a site provides its information to CRIC, we can enable an interface similar to what the BDII provides – a configuration task to change the contact point for the information.

IC: Can we exploit the meeting in May to discuss this with EGI?

JA: Yes there will be a session at the meeting.

HEP Reference workloads in Containers (Domenico Giordano)

slides

Alessandra Forti: Singularity will be able to run all the workloads.

Maarten Litmaath: The HS06 benchmark is also used by EGI. Do you think they would also benefit from this work? Weighting different workflows to make a benchmark for all VOs?

IC: If the May timeline works, it could be presented at the GDB co-located with EGI and we can follow up.

??: The infrastructure, yes, you can fit it in. But I agree the benchmarks should be more field-specific.

The framework could be used for field-specific benchmarks.

EGI does go out and request resources for activities in HS06, but they do not have the allocation mechanism.

Alessandro: You mention tagging and snapshotting is standalone. Could experiments start running these periodically to validate resources? (Experiments can't currently validate HS06.) Take the suite, put it inside the ATLAS pilot systems, and fetch the results. We might want to think about the broader impact and where the results are stored.

Do you still need more contributions? – We think we are reaching critical mass, and it will attract interest as people start using it.

Containers WG update (Gavin McCance)

slides

  • Singularity: a simple command line to run your job inside a containerised environment (see the sketch after this list). It enables isolation for multiple payloads of the same pilot and allows separation of the job's OS from the system one. Currently one single tool: Singularity (SLC6 / CC7). The recommendation is to stick with Singularity v2.6.x for now.
  • Unprivileged containers: this is a desirable goal (RH 7.6 user namespaces). It makes Singularity run as a standard process (i.e. directly out of CVMFS). Some areas still need privileged mode (e.g. HPC); unprivileged mode is now being tested with the experiments.
  • Container distribution: general agreement that CVMFS is the most efficient and where we want to go. Works for both Singularity and docker/containerd. Other solutions are needed for HPC. There is less obvious need for a "common inter-experiment base image", but within an experiment it is important to build images as a hierarchy of docker layers, to maximise cache-ability.
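
As a minimal sketch of the Singularity usage described in the first bullet above, the snippet below wraps a "singularity exec" call from Python; the image path, bind mount, and payload command are illustrative assumptions, not the WG's recommended configuration:

    # Hedged sketch: run one payload inside a Singularity container,
    # bind-mounting CVMFS so shared software and unpacked images are visible.
    import subprocess

    def run_payload(image, payload, binds=("/cvmfs",)):
        cmd = ["singularity", "exec"]
        for b in binds:
            cmd += ["--bind", b]      # expose shared areas such as CVMFS
        cmd += [image] + payload
        return subprocess.call(cmd)   # payload runs with its own OS image

    # Example invocation (image path is hypothetical):
    run_payload("/cvmfs/unpacked.example.org/centos6",
                ["/bin/sh", "-c", "echo payload running"])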

Mattias Wadenstein: HPC needs extra consideration for data access, as the data needs to be mounted. Still waiting for a good solution to this.

IanC: This is outside the scope of the Containers WG.

IanC: Is there enough work going on in this area?

Gavin: The experiments are testing and involved. There is work going on.

IanC: What is the prognosis from the developers?

Gavin: Common desire to meet the same goals, being worked on.

Maarten: We also need to wait for the security review and its outcome (e.g. it could be that the Go implementation is more secure), and we might need to adapt the current versions accordingly, or use a different solution.

Gavin: Regarding security, that is why CERN is pushing hard on the unprivileged use case.

Alessandra Forti: There are a good number of runtimes becoming competitive with Singularity.

WLCG Privacy Policy Update (Ian Collier for Dave Kelsey)

slides

Maarten Litmaath: Do you know of any hard timeline that is looming, or are we good as long as we continue working on it at some pace?

Ian C: Pragmatic approach. Something may happen to cause us to move much faster. Our existing policies are not bad: not quite at the GDPR level, but the gap is not large. For services based at CERN there is a different schedule.

ML: Yes, we have new 'house rules', but adjusting services will take months if not years. So you have not heard any worries from countries wondering about the services running and …

IC: There is some risk of formal complaints. (Added later - possibly small)

Ian Bird: Having this policy does not remove the need for local policies to be in place at other sites, e.g. RAL.

GATES – a general A/B testing service (Ilija Vukotic)

slides

A/B testing is a way to compare two versions of a single variable: a randomized experiment with two variants, which includes the application of statistical hypothesis testing ("two-sample hypothesis testing" as used in the field of statistics).
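
For illustration, a minimal two-sample hypothesis test in Python using SciPy; the metric and the numbers are invented for the example and do not come from the talk:

    # Compare a metric (e.g. job wall-time in seconds) between variant A
    # (current setup) and variant B (candidate change). Values are made up.
    from scipy import stats

    variant_a = [512, 498, 530, 505, 521, 489, 517]
    variant_b = [471, 455, 480, 462, 476, 449, 468]

    t_stat, p_value = stats.ttest_ind(variant_a, variant_b, equal_var=False)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print("The change has a statistically significant effect.")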

Experiments have large data stores collecting data from different computing systems: job scheduling, data distribution, FTS, perfSONAR, etc. While that is great for monitoring, accounting, and finding issues, it is not sufficient for system optimization. One can try to guess what kind of effect a change will make, but without validation it does not mean much. We need a way to quickly test different options and get actionable answers.

The goal is to have a testing service to do simple and fast A/B tests and hence get relatively quick feedback on changes in the system, rather than make the change and wait for months to see the impact (e.g. ATLAS C3PO).

Proposing GATES: an infrastructure to centrally collect all data from the systems (PanDA, FTS, Rucio, pilots, etc.) and produce analysis based on correlations of the different inputs.
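
One way such a service might split traffic between the two variants is deterministic assignment based on a job identifier, so that records from PanDA, FTS, Rucio, pilot logs, etc. can later be correlated per variant. The sketch below is hypothetical, not the actual GATES design:

    # Hypothetical A/B assignment: hashing the job ID gives a stable,
    # approximately uniform split that every service can reproduce
    # without central coordination.
    import hashlib

    def assign_variant(job_id: str, fraction_b: float = 0.5) -> str:
        h = int(hashlib.sha256(job_id.encode()).hexdigest(), 16)
        return "B" if (h % 10000) / 10000.0 < fraction_b else "A"

    print(assign_variant("panda-job-4242"))   # same answer everywhere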

Questions and discussion

Maarten Litmaath: Nice idea. If we had it, we could profit nicely. Realistic examples. What was your motivation to present it here? Is this idea floating around in ATLAS?

Ilija Vukotic: I am doing analytics for ATLAS and this is really needed, especially for optimization. It is the same for all other systems, not just within experiments.

Alessandro di Girolamo: We asked Ilija to present this in the GDB steering discussion. It is difficult to understand the wider impacts. We are trying to structure what we are doing; this is connected with the Operational Intelligence discussion. We need to do something that is not experiment specific: the infrastructure is shared, and we need a set of people to abstract the problem.

ML: A quick-and-dirty prototype can be used to prove the approach and gain experience. We could use this at the workshop as an example of what we could aim at.

The request is for people to be aware of this, think about it, and consider how we as a collaboration can take this forward.

Workshop on Cloud Storage Synchronization and Sharing Services Report, Rome 2019 (Jakub Moscicki)

slides

Summary:

Sites/products:

  • DESYBox: HA Nextcloud instance, pNFS access for selected users, 9 kHz stats.
  • CERNBox: direct filesystem access via EOS FUSE, eoshome sharding, accumulating lots of use-cases across the laboratory.
  • BNLBox: mobile users: ATLAS collaborators spend half their time in the US and half at CERN.
  • Dropbox: Magic Pocket: immutable content-addressable block storage; 1 PB/day, 650K+ disks. Optimizing the full stack down to the lowest disk layer: not using filesystems, writing directly to disk to fully exploit SMR technology (14 TB HDDs).

Notebooks for analysis are growing: data-analysis applications are integrated with EFSS (Enterprise File Sync & Share), with 6 sites integrating Jupyter Notebooks. The SWAN service at CERN averages 200 user sessions per day, with 1300 unique users over six months.

EOSC and FAIR: integration with the European Open Science Cloud (EOSC) is a hot topic. Two strategies at the moment: on-premise and commercial. FAIR principles are encouraged. There are funding opportunities.

Good feedback from the CS3 community. The workshop is extremely well received and was well evaluated by the participants.

Comment:

Maarten Litmaath: Is storage part of the name? CS4? This is much bigger than storage, so I am not sure you should put storage in the name!

-- IanCollier - 2019-03-06
