WLCG Workshop, Okinawa, April 11-12, 2015

Agenda

https://indico.cern.ch/event/345619/other-view?view=standard

WLCG Status

Tokyo T2 Site Report - T. Nakamura

ICEPP regional analysis center, located at the University of Tokyo, created in 2007

  • ATLAS only
  • Upgraded every 3 years: next upgrade at end of 2015
    • Goal: reach 7-8 PB by 2018
  • 46 kHS06, 2.5 PB pledged (3.2 PB including LocalGroupDisk)
    • Storage: DPM

WN: 10 GbE connection

WAN: 10 Gb/s connection

  • Can be saturated by FTS3
  • Japan connected to NY by a 10 Gb/s line + a new 10 Gb/s line through Osaka and Washington DC
    • Also a backup line through LA
  • International connection to be shared in the future with Belle2 and ITER
    • 100 Gb/s upgrade planned in 2016: LA
    • 2 more 10 Gb/s to GEANT in 2016
    • 20 Gb/s for Tokyo T2 in 2016

WLCG Status and Readiness for Run 2 - J. Flix

Operations Coordination well established since its creation in 2012

  • 1.5 FTE
  • Manages operational issues and service deployment in synergy with OSG and EGI
  • Discusses experiment plans and requests
  • Defines and follows up actions
    • Task forces and WGs created as needed: ~10 FTEs
  • Meetings: fortnightly meeting + shorter meeting twice a week

Achievements in the last year

  • Federated Xrootd deployment for FAX and AAA
  • perfSonar deployment
  • Coordination of SHA2 readiness effort + replacement of VOMRS by VOMS-Admin
  • FTS3 testing and deployment
  • WMS decommissioning
  • Multicore job support, with multicore and single-core jobs sharing resources
    • High efficiency of resource utilization maintained

WLCG critical services list updated for Run2

  • Used by T0 to allocate the appropriate support effort
  • May be extended to cover T1s and T2s in the future: some critical services are hosted outside CERN

glexec deployment done and moving to production

  • Already done in CMS
  • In progress in ATLAS

Other ongoing activities

  • Squid monitoring
  • Machine/job features: pass information to running jobs
  • IPv6 validation and deployment
  • http deployment in WLCG
  • MW readiness WG: readiness assessment using experiment workflows
    • WLCG MW Officer role created: defines baselines
    • Experts from all partners participating
  • Network and transfer metrics

WLCG Operational Costs

  • Feedback from experiments + site survey end of 2014
  • Huge effort by all experiments to optimize their computing models and achieve physics objectives despite the budget restrictions in many countries
    • Improved SW
    • Multicore jobs
    • Optimized usage of disks
    • T2s used for some workflow activities previously handled by T1s
    • Exploit new type of resources: HPC, clouds

Main challenges foreseen for Run 2

  • MW support
  • Migration to new batch systems
    • Also pass job parameters to batch systems
  • Migration to a new major OS version
  • IPv6 + more demanding networks
  • Cloud resources (community and commercial) part of standard WLCG resources and operations
  • Unsecured budgets... in particular for personnel

Need to begin thinking about long-term planning for HL-LHC: expect significant changes in the computing models

Discussion

  • Jeff T.: unsecured budget for personnel is a major threat. Can live with lower HW resources than expected but cannot live without the people to run the infrastructure and support the experiments
  • Philippe C.: it is time to stop speaking about T1s and T2s. The MONARC model is dead: experiments look only at the functions/services delivered by a site
    • Ian B.: agree technically but politically we need to continue to deal with the difference. A requirement from funding agencies.
    • Pepe: T1s are committing to higher level of support, impact on manpower

WLCG Security - R. Wartel

Identity federation status

  • A lot of apps impacted and requiring changes: Vidyo, online CA, VOMS
  • Some progress: communities and projects better organized, Code of Conduct should help establish trust, policy work well received, AARC (H2020) bringing some hope...
    • AARC: 2 years, 19 partners, outreach/training, technical and policy work. Priorities: working international authN, harmonization of attributes

Global computing: also adopted by criminal organisations!

  • Cybercrime is highly profitable and the risks are minimal
  • Specialized markets
  • Malware-as-a-service
  • Strong consolidation of the underground economy: severe competition between a handful of exploit kits (EK)
    • Huge progress in time-to-market for exploits: a vulnerability can be exploited in EKs a few hours after being identified
  • Email remains the leading source of compromise: 90+% of breaches caused by spear phishing
    • Targeted phishing: ~70% efficiency
    • Exploits launched before antivirus updates: typically 24h ahead of an antivirus signature update

Old security ("medieval fortress") approach does not work anymore

  • Landscape has changed: datacenter security and laptop security are equally important, main attacks target both and all platforms
  • Need to focus on procedure and people
  • Protect both services and people
  • Windigo example: ongoing for 4 years, 30K servers compromised in the last 2 years (including big names!), a full ecosystem of advanced malware

New threats: ransomware, doxing

  • Recently happened in HEP communities: multiple staff targeted, including death threats
  • Our community exposed: very open, a lot of articles with personal information...

Need to learn and adapt

  • International collaboration is our main asset
  • Don't overlook mobile device security/protection
  • Incident handling as part of normal operations
    • Importance of traceability
    • Also has a cost
  • Global adversaries require dedicated WLCG experts
    • Sites will deal with traceability requests
  • Global incident response: need appropriate legal, policy and technical tools
    • Also remove community/organisation boundaries

DB Services during Run 2 - L. Canali

Oracle: remains the proven solution for high-availability, concurrent transaction DBs

  • New NetApp backend (FAS8060) with more memory and SSDs: improved performance
  • Oracle production version at CERN is 11.2.0.4
    • Preparing upgrade to 12.1.0.2
  • New critical databases
    • QPSR (Quench Protection): 150 krows/s, 1 Mrows/s achieved during stress testing
    • SCADAR: WinCC/PVSS archiving
  • Replication evolution: moved to GoldenGate for the ATLAS conditions DB replication to T1s, Active Data Guard for online-to-offline replication
  • CERN 24x7 piquet during Run 1: will restart in May 2015
    • The need will be reevaluated in 2016

Trends: servers with more cores and memory, SSDs becoming affordable

  • Consolidate HW to reduce management costs
    • Balance with the ability to upgrade services one by one: also adding the possibility to run several Oracle instances on the same box
  • Review applications to take advantage of workloads that can fit in memory
    • Reduce application complexity

DB on demand: self-service for provisioning, management of backups

  • Mainly MySQL (85%) but also Postgres and Oracle
  • Monitoring tools provided to users to troubleshoot performance problems

Scale-out DBs: shared-nothing architectures targeting high performance and low cost

  • Backend at CERN: Hadoop
  • Query engines: both SQL (declarative) and imperative (MapReduce and Spark)
  • Currently offloading some Oracle DBs to Hadoop for data that is "write once, read many": LHC logs, SCADA, CMS popularity (see the sketch after this list)
    • Data warehouse, reporting and analytics
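
As an illustration of the "write once, read many" offloading above, a minimal PySpark sketch that counts accesses per dataset from a popularity log already copied to HDFS; the HDFS path and the record layout (timestamp, dataset, user) are assumptions made for the example, not the actual CERN setup.

  # Minimal PySpark sketch: count accesses per dataset in a hypothetical
  # tab-separated popularity log stored in HDFS (path and layout are made up).
  from operator import add
  from pyspark import SparkContext

  sc = SparkContext(appName="popularity-aggregation")
  records = sc.textFile("hdfs:///data/popularity/2015/*.tsv")

  accesses = (records
              .map(lambda line: line.split("\t"))   # [timestamp, dataset, user]
              .filter(lambda f: len(f) == 3)        # drop malformed lines
              .map(lambda f: (f[1], 1))             # key by dataset name
              .reduceByKey(add))                    # count reads per dataset

  for dataset, count in accesses.top(10, key=lambda kv: kv[1]):
      print("%s\t%d" % (dataset, count))
  sc.stop()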

Network Update - T. Cass

CERN news

  • WiFi: Campus-wide 802.11ac rollout planned in 2016/7
    • Controller-based "wave 2" solution with one access point per 3 offices
    • Market survey in progress to select the HW in December 2015 or early 2016
  • CERN network: remove the strong difference between LCG and GPN networks, move Campus network off GPN
    • LCG, GPN and other networks in the future as subsets of the computer center network
  • Data center connections: 260 Gb/s to experiment pits, 200 Gb/s to Wigner, 210 Gb/s to LHCOPN, 100 Gb/s to LHCONE, 80 Gb/s to GP internet
  • ESnet extended to CERN (and Amsterdam)

Ethernet roadmap: 1 Tb/s after 2020...

LHCOPN: main paths remain, backup paths moved to LHCONE

LHCONE: moving to a global infrastructure for HEP (Belle2, Auger...)

  • Many common sites
  • Working on extension to South America (Argentina for Auger), Africa and Middle East
  • AUP now agreed/available

IPv6 deployment in WLCG progressing but T2s readiness remains a concern

  • Only 20% of T2s currently ready and 20% with a plan in the next 2 years
  • Experiments working on IPv6 support: CMS wants AAA sites to support IPv6 during 2015, ATLAS requesting T1s and T2Ds to provide dual-stack perfSonar instances
    • IPv6 WNs expected soon at several places

SDN

  • Project started with Brocade in OpenLab
  • Focus on expected network evolution and new use cases we could face in the future
    • CDN4LHC: reduce load on long-distance links, improved performance for poorly connected sites. Network of cache servers based on peering, IPv6 multicast
    • SDN may play a role in implementing these new strategies

WLCG Monitoring - J. Andreeva

See slides

Batch Systems - I. Collier

RAL is running 560 WNs, 12K cores

  • 40-60k jobs submitted every day

Started looking at Torque/MAUI alternatives in August 2012

  • Scalability, reliability, high-availability, dynamic resources
  • Concentrated on open-source solutions
    • Open-source GE: long-term future uncertain, community not very active
    • SLURM: found various issues in our use case
  • HTCondor chosen as replacement

Step by step migration to HTCondor

  • Started a new CE with decommissioned HW
  • Tested with a first VO: ATLAS
    • Hardened the configuration
  • Then CMS, LHCb, ALICE
  • Adaptation of operation tools: monitoring, accounting...
  • After validation by all the LHC VOs, migrated 50% of the capacity
    • 1 year after starting investigation
    • Migration completed 2 months later

Very good experience over the past 2 years: very stable operation

  • No change needed to the configuration when ramping up the number of WNs
  • Higher job start rate
  • Easy upgrades
  • Strong and good community support

Main HTCondor features used

  • Hierarchical accounting groups to achieve fairshare
    • dteam/ops treated as high priority jobs: also possible to flag other groups
  • partitionable slots + defrag daemon
    • defrag daemon: its configuration is updated by a cron job so that only the necessary amount of draining is done (see the sketch after this list)
    • Also use a sort expression to ensure that multicore jobs are considered before single-core jobs
  • HA central managers
    • Several collectors running concurrently: submit machines and WNs report to all of them
    • One active negotiator: managed by condor_had
  • PID namespace + mount under scratch
  • cgroups for CPU and memory
    • memory cgroups: issues found, disabled until fixed
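
A minimal sketch of the kind of accounting such a cron job has to do, assuming the HTCondor Python bindings are available on the central manager; this is not RAL's actual script (they tune the defrag daemon rather than drain machines directly), and the target number of 8-core openings is an invented example value.

  # Count partitionable slots that can still fit an 8-core job and derive how
  # much extra draining is really needed (e.g. to feed a knob such as
  # DEFRAG_MAX_CONCURRENT_DRAINING). All thresholds are illustrative only.
  import htcondor

  TARGET_OPENINGS = 10   # hypothetical number of 8-core openings to keep available

  coll = htcondor.Collector()
  pslots = coll.query(htcondor.AdTypes.Startd,
                      constraint="PartitionableSlot =?= True",
                      projection=["Machine", "Cpus"])

  # "Cpus" on a partitionable slot is the number of cores not yet claimed
  openings = sum(1 for ad in pslots if ad.get("Cpus", 0) >= 8)
  to_drain = max(TARGET_OPENINGS - openings, 0)

  print("whole-node openings: %d, additional machines to drain: %d"
        % (openings, to_drain))
  # A wrapper would then raise or lower the defrag daemon's draining limit (or
  # call condor_drain) only when to_drain > 0, i.e. "do only what is necessary".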

Monitoring historically based on Nagios and Ganglia

  • Used startd cron to implement a WN health check and prevent jobs from starting in case of problems, with per-VO granularity (see the health-check sketch after this list)
    • Information published as ClassAd attributes: easy to check with different tools
  • Nagios checks: mainly for condor_master daemon on all machines
    • Alarming differs depending on the type of machine
  • condor_gangliad: collects information from HTCondor ClassAds into Ganglia; runs on one dedicated machine
  • Recently added Elasticsearch for displaying/analysing completed jobs
    • Source: condor history files
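
A minimal sketch of such a health-check script as it could be run from startd cron; the attribute names, the CVMFS repositories checked and the free-space threshold are assumptions for illustration, not RAL's actual checks.

  #!/usr/bin/env python
  # Startd-cron style health check: print ClassAd attributes to stdout so that
  # the startd merges them into the slot ads; the START expression can then
  # refuse jobs of an affected VO only.
  import os

  def cvmfs_ok(repo):
      """A CVMFS repository is considered healthy if its mountpoint is listable."""
      try:
          os.listdir(os.path.join("/cvmfs", repo))
          return True
      except OSError:
          return False

  def scratch_ok(path="/pool", min_free_gb=10):
      """Require a minimum amount of free scratch space."""
      st = os.statvfs(path)
      return st.f_bavail * st.f_frsize >= min_free_gb * 1e9

  for vo, repo in [("ATLAS", "atlas.cern.ch"),
                   ("CMS", "cms.cern.ch"),
                   ("LHCB", "lhcb.cern.ch")]:
      healthy = cvmfs_ok(repo) and scratch_ok()
      print("NODE_HEALTHY_%s = %s" % (vo, "True" if healthy else "False"))

  # START could then include e.g. (TARGET.VO =!= "atlas" || MY.NODE_HEALTHY_ATLAS),
  # assuming jobs advertise a VO attribute.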

Integration with cloud (OpenNebula): allows taking advantage of unused cloud resources

  • About to move it into production: absence of static membership helps a lot

Future plans

  • Move WNs to SL7 and run SL6 job environments in a chroot using the NAMED_CHROOT functionality of HTCondor
  • Simplification of WN configuration/installation using CVMFS grid.cern.ch: 800 fewer RPMs
  • Use PID namespaces as an alternative to pool accounts: should provide the same level of job isolation and traceability

Volunteer Computing - L. Field

Motivation

  • Free resources: 100K hosts achievable...
  • Community engagement: outreach channel, offering people a chance to participate

But also many challenges

  • Cost of using free resources: integration, operations
  • Attracting/retaining volunteers: advertisement, engagement
  • Low level of identity assurance: anyone can register

Virtualization opens a way to address the challenges: tested with Test4Theory

  • Vacuum model is very close to the BOINC model: the VM started by BOINC starts an agent/pilot that connects to the experiment central queue

vLHC@home: CERN BOINC central service for several projects

  • ATLAS@Home started 2 years ago without any effort to attract volunteers: already 5K volunteers, 2nd simulation site
  • vLHC@home includes a Drupal portal as the common entry point for all projects
  • DataBridge: a scalable and efficient service to download job parameters and upload job results
  • A common platform allows coordinating/sharing the outreach effort and the development/operation costs

WLCG Operational Costs - A. Forti

First summary of the survey presented at March GDB, concentrating on FTEs

MW Support

  • CEs and SEs concentrating most of the negative feedback about deployment and troubleshooting: poor documentation, lack of log mining tools
  • YAIM future unclear but many sites still relying on it
  • Sites would like to see the WLCG-specific services reduced
  • Concerns about ARGUS and Torque/Maui support

Torque/MAUI: sites waiting for recommendations

Toward a "simple T2"

  • Recommend ARC CE: no need for an APEL box, simpler to configure/manage
  • Simplify/reduce the number of information published into the BDII
    • Also missing a YAIM replacement to fill the values in the BDII
  • Keep up with the work to reduce the number of storage protocols

Virtualization/clouds: provide sites with a few images containing everything required, without the site having to configure specific services

  • Containers are another promising technology to relieve sites from configuring WLCG-specific services

Storage

  • http-based federation: stick to industry standards
  • Ceph more and more attractive: need to properly support it as a storage system in WLCG

Monitoring: should better publicize the integration of SAM tests into local Nagios

  • Local monitoring is essential to catch problems before the jobs are affected
  • SAM tests: lack of documentation about the errors detected, time wasted googling; also not possible to manually rerun the tests after fixing a problem
    • Also the issue of experiment specific pages being protected: sometimes difficult to access existing information

WLCG OpsCoord should be the channel to make requests to sites

  • Many sites support several VOs: direct requests carry the risk of conflicting requirements
  • Also need to ensure that there are not too many urgent requirements put on sites

Sites asking for clearer OpsCoord directions

  • Proposal: add a 'site actions' section to the minutes
  • Should also try to consolidate the information available at one entry point

Also a need for site to site communication

  • Mailing list: risk of overlap with lcg-rollout
  • Open and searchable wikis
  • WLCG Jamborees?
  • HEPiX and GDB

Improve participation of sites in OpsCoord meetings, including TFs/WGs

  • Start a bit later (4pm) to allow a larger US participation
  • Asian participation

Experiment Session

ALICE - M. Litmaath

Run 2: more detectors added, new LHC conditions will lead to a doubled event rate

  • Also results in a 25% increase of the event size
  • Efforts concentrated on improving SW performance

Simulation is still mainly based on G3 for performance reasons

  • G4 is still 2x slower...
  • ... but also gives access to multicore resources (multithreaded capabilities in v10)
    • v10 validation has started

Distributed computing and analysis: no news, good news! Things work and continue to grow

  • Migration to CVMFS fully completed
  • AliEn: new ARC interface, ongoing consolidation work
  • Testing "opportunistic" use of HLT by AliEn
  • Analysis: organised analysis (trains) now more than half of the analysis load
    • Individual analysis down 50% in 1 year

Data popularity monitored: almost no inactive data left

Run 2 preparation

  • Re-commissioning in progress, in particular with cosmics data
  • Reprocessing of RAW from Run 1 with the latest SW

CPU efficiency quite constant now for all types of jobs: ~80%

  • Several times > 70K concurrent jobs

Several new sites joined or about to join

  • Including a few candidate T1s: UNAM (Mexico), Sao Paulo
    • After KISTI and the Russian KI
  • Several T2s with a significant increase of their resources: Hiroshima, Torino, COMSTAT

Storage changes

  • Xrootd v4: IPv6 support
  • EOS: 4 external sites running it now...
  • Xrootd proxy to allow xrootd access from clusters without outbound connectivity: GSI, HPC clusters

RFC proxies required on VOBOX to move to latest openssl

  • Didn't manage to get legacy proxy working
  • Work in progress, done at many sites
  • Latest AliEn on VOBOX as soon as migration is completed

SAM3: will use a new A/R formula

  • Basically any CE (or a VOBOX) & all SEs

R&D work in progress

  • Ceph as an xrootd backend
  • Virtual Analysis Facility (VAF): Proof on Demand in a cloud
    • Using HTCondor, Elastiq, CernVM Online

ATLAS - A. Filipcic

New computing model: less difference between T1 and T2

  • Long-lived data stored on well connected stable SEs: 90% availability required
    • Sites with availability < 80% not considered for data placement
  • Jobs executed everywhere: intermediate datasets left distributed over SEs
  • New services (JEDI, Prodsys-2, Rucio) all in production since December 2014
    • New system faster and more flexible

A lot of effort put on monitoring tools tailored to ATLAS needs

  • Generic tools generally not matching ATLAS needs: FTS3 dashboard, WLCG transfer... and FAX dashboard

Analysis share: 5% at T1s, 50% at T2s

Production: most jobs with 2 GB/core and 6-12 hours

  • Some specific jobs require extreme resources: 4-8 GB, 4-6 days
  • AthenaMP: works with up to 32 cores, improves memory footprint
    • Unsupported job types: merge jobs (but fast, low memory), event generation
    • Initialisation/completion done on a single core: ~15 min. Should allow up to 96% efficiency for a 6-8 h job (e.g. for an 8-core, 6 h job, ~15 min with 7 cores idle costs ~1.75 of 48 core-hours, i.e. ~4%)
  • ATLAS targets having 80% of its resources able to run multicore jobs but is currently at 50%
  • Jobs monitor their RSS usage against the limit set and terminate themselves if they go over (see the sketch after this list)
  • Plan to use more opportunistic resources in the future: ATLAS production system can now handle them transparently
    • For non I/O intensive workloads
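
A minimal sketch of that memory watchdog idea; purely illustrative, not the actual ATLAS pilot code (a real implementation would sum the RSS over the whole payload process tree).

  # Periodically compare a payload's resident memory, read from /proc, with the
  # limit set for the job and stop it if the limit is exceeded.
  # PID, limit and polling interval are example values.
  import os
  import signal
  import time

  def rss_kb(pid):
      """Resident set size of one process in kB, from /proc/<pid>/status."""
      with open("/proc/%d/status" % pid) as f:
          for line in f:
              if line.startswith("VmRSS:"):
                  return int(line.split()[1])
      return 0

  def watch(pid, limit_kb, interval=60):
      while True:
          try:
              usage = rss_kb(pid)
          except IOError:                     # payload already finished
              return
          if usage > limit_kb:
              os.kill(pid, signal.SIGTERM)    # let the payload clean up and exit
              return
          time.sleep(interval)

  # e.g. watch(payload_pid, limit_kb=8 * 2 * 1024 * 1024)  # 8 cores x 2 GB/core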

Data access through WAN: job overflow

  • Level controlled by JEDI

Shift changes: new Computing Run Coordinator created

  • Less ATLAS SW knowledge required
  • Call to site admins as volunteers: counted as a Class-2 shift

Sites invited to attend weekly ADC meeting

CMS - C. Wissing

Event rate from 400 Hz to 1 kHz: computing during Run 2 will be resource-constrained

  • Also increased pileup

Multicore processing: not only for improving the memory footprint but also to stay within the 48 h window for RECO jobs

  • Prompt RECO is a 4-thread application

Lowering site boundaries

  • Data federation to allow remote access to data
  • Compute resources in one global HTCondor pool used for production and analysis
    • Sharing between prod and analysis done by HTCondor instead of being configured at sites
    • Local fairshare configuration: see slides
  • Provisioning of opportunistic resources (cloud, HPC...) through glideinWMS too

HLT for processing and production: HLT size larger than all T1s, configured as an OpenStack cloud

  • Data storage: direct access to EOS
  • Network connection upgraded from 60 Gb/s to 120 Gb/s

To face the resource constraint expected, working on being able to use any non pledged resources: clouds, HPC, ...

Simplified management of disk space at sites

  • All spaces previously managed by groups now transferred into centrally managed space: central space is 60% of the pledged disk space
  • Still 40% of the pledged space is basically unregistered and unmanaged: deploying a space management service at sites

Disk/tape separation achieved: will allow using T1s for chaotic user analysis

  • No risk to trigger tape access
  • Recent tape exercise done: all T1s achieved performance far beyond expectations

New data format: mini-AOD

  • To replace group ntuples
  • 50 kB/event
  • Should satisfy 80% of the analysis cases

LHCb - S. Roiser

LHCb uses "intensity leveling" with a reduced luminosity constant over a fill: with the new LHC conditions, this will reduce the pileup

New trigger scheme

  • 1 MHz after the HW L0 trigger
  • HLT1: real time partial reconstruction, buffered to disks
  • HLT2: (slightly) deferred full event selection, with calibration data. Output very close to offline reconstruction
    • Output: 12.5 kHz
    • 10 kHz (with some parked events) intended to go through the offline reconstruction
    • 2.5 kHz TurboStream: events that can be directly processed by physics analysis, without going through offline reconstruction

Offline reconstruction: now supposed to be the final processing, no reprocessing foreseen before end of Run 2

  • Done using the same calibration/alignment as in the HLT
  • Longer retention expected for stripping output: compensated by more physics moved to MDST
  • Some T2s used for reconstruction: no more tight coupling between a T2 and a T1

Analysis can be run at T0, T1s or T2Ds

  • A small fraction of the LHCb workload but the highest priority in the central task queue

Can take advantage of any computing infrastructure, virtualized or not: all environments served by the same pilot infrastructure connected to DIRAC

Data storage: no more direct processing from tape caches/disk buffers

  • Data copied from tape (disk buffers) to disk-only storage through FTS3
  • Should lead to a reduction of tape disk buffers
  • LFC replaced by DIRAC File Catalog: bookkeeping unchanged
  • Data popularity monitored

Data access through "Gaudi federation"

  • List of replicas created for each analysis job, with the ability to fail over to a non-local replica in case of problems (see the sketch after this list)
  • SRM used for tape interactions and for writing to storage (job output, data replication)
  • Xrootd: used for reading only. SRM-less.
  • http/DAV: deployed at all sites, could be an alternative protocol to SRM
    • http federation started: development in progress for doing data consistency checks
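
A minimal sketch of the failover idea behind such a replica list; purely illustrative, not the Gaudi federation implementation (the opener callable stands in for the real xrootd/http client).

  # Try the replicas of a file in order of preference (local first) and fall
  # back to a remote copy if opening fails.
  def open_first_available(replicas, opener):
      """replicas: list of (url, is_local) tuples from the file catalogue;
      opener: callable that opens a URL and raises IOError on failure."""
      ordered = sorted(replicas, key=lambda r: not r[1])   # local replicas first
      last_error = None
      for url, _ in ordered:
          try:
              return opener(url)
          except IOError as exc:
              last_error = exc          # remember the failure, try the next replica
      raise IOError("no usable replica, last error: %s" % last_error)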

Computing at HL-LHC Timescale

Introduction - I. Bird

CERN Council decided that HL-LHC should apply to be an ESFRI project

  • Opportunity for new sources of funding
  • ESFRI didn't exist when LHC project was started

Planning towards HL-LHC: need to agree on common baselines and expectations

  • Need to discuss potentially controversial topics: will the current computing models scale? physics costs vs. computing costs?
  • Distributed computing is here to stay
  • General-purpose x86 Linux is coming to an end: more efficient to specialize
    • GPU, HPC, ARM...
    • Still a role to play but only for some specialized workflows/workloads
  • Datacenters: O(100) is not very efficient, concentrate on O(10) large data facilities with associated computing resources
    • Potential role for commercial providers
  • "T2s role": today providing > 50% of the computing resources and the engagement of a lot of skilled people. Don't want to lose that
    • A lot of workflows still appropriate for this kind of resources

Also the recognized need to evolve/reengineer SW: HEP SW Foundation

WLCG must think about its role in a HEP-wide infrastructure serving future HEP projects (ILC, Belle2, FCC...), Intensity Frontier experiments and other related sciences (astro-particles...)

  • Need a common repository/library of proven tools and MW to allow reuse of existing solutions: HSF can help with this
    • We also need to adopt standards whenever possible
  • Need strong input from experiments
  • Failing to do this quickly will lead to too high costs: we are under pressure from funding agencies and other bodies who are following this closely and asking hard questions...

ALICE View - T. Chuyo

At HL-LHC, from 40 MHz collisions to 50 MHz

  • No more data selection in the trigger: continuous read out. More than 1 TB/s from the detectors.
  • x100 output to storage: 100 GB/s to storage, 13 GB/s to computing centers at Run 3
  • Needs for local data storage higher than anticipated

Common HW and SW teams for DAQ, HLT and offline

  • O2 facility for online (synchronous) processing

Synchronous reconstruction (online) followed by offline, asynchronous refined reconstruction with quality control

  • zero suppression, compression: TPC still accounts for 60% of event size
  • Asynchronous reconstruction of raw data at T0 + T1s
  • Asynchronous reconstruction of MC data at T2s

Analysis at Analysis Facilities: input is AOD produced by T0/T1 (from raw data) and T2 (from MC data)

New approach already exercised during Run 1.

ATLAS - G. Stewart

ATLAS upgrade will happen during LS3

  • Replacement of the inner detector
  • Rate increase: x10
    • x10 in raw storage: 75 PB/year

Impact on the different workflows

  • Event generation and simulation not really affected
    • Simulation is scaling with energy, a lot of places that are good target for concurrency/vectorization (GeantV)
    • Event Generation : CPU intensive, a good candidate for HPC (some preliminary work at Argonne)
  • Digitization: linear scaling with pileup
  • Reconstruction: factorial scaling with pileup
  • Analysis: linear scaling with pileup

Integrated Simulation Framework: a framework that can combine different simulation engines, using the most appropriate one for each part of the event simulation

  • Including fast simulation

Tracking: the key component at the heart of the battle against combinatorics!

  • Currently highly serialized to allow early rejection of poor candidates and avoid wasting CPU cycles
  • Probably need to sacrifice some serial efficiency to benefit from more concurrency: but quickly hitting memory limits...
  • Deep learning may play a role in the future

Framework: GaudiHive, multi-threaded version of Gaudi

  • No easy migration path from the current framework: progressive plan towards Run 3, including a lot of training

Analysis: moving to train model

  • Required to be smarter with I/O

Computing evolution foreseen

  • More disks but also more tape to manage derived data more efficiently
  • New computing resource types: classic WLCG sites will remain a key part with bigger facilities
    • Smaller sites to move to lighter MW like BOINC?
  • Details are uncertain concerning the HW but multi-threading, data oriented design, parallel algorithms will be the keys for the success

The new generation of data management and workload management tools developed for Run 2 has been designed with the scalability/flexibility required for HL-LHC

Discussion

  • Jeff: what about future programming languages? Any chance to move to something other than C++?
    • Graeme: I don't think so. Would be a 100 M$ effort at least. I don't think FAs will buy/fund this. Fortunately, C++ is improving.

CMS - D. Lange

2 phase upgrade

  • Run 3: deal with high pileup
  • Run 4: deal with extreme pileup
    • Planning 5-7.5 kHz of events

Computing resource needs estimate for Run 4: x200 compared to Run 2

  • Depending on the exact estimate for HW evolution and SW improvements: 3-15x deficit
  • Almost the same for storage
  • Reconstruction is the key part to optimize

Multithreaded CMSSW framework being commissioned now

  • Short term (Run 2): will allow processing higher trigger rates
  • Longer term: explore new approaches based on more parallelism
    • Have ported CMS track reconstruction to Intel Phi

Analysis: miniAOD format, 5-10x smaller than Run 1 format

  • Potential for big analysis improvements in Run 2
  • Basis for R&D toward more I/O performant analysis data models

CMS R&D active and organized around weekly meetings, open outside CMS

  • Working with TechLab for benchmarking new HW architectures
  • CMS members actively involved in HSF

Discussion

  • Need to take into account that new ideas will be required for reconstruction and other activities, and these always have lower performance at the beginning: this may partly offset the gains achieved with existing algorithms

LHCb - M. Cattaneo

LHCb upgrades will be completed for Run 3

  • In fact LHCb runs at a "reduced luminosity" that doesn't require HL-LHC: need to redesign several sub-detectors
  • Goal: improve precision
  • Software trigger at 40 MHz: 2 levels HLT with a fully reconstructed event produced by online

Need to be ready in 2020: no time for major changes in technology

  • R&D based on existing experience
  • Run 2 as a testbed for new ideas
  • New TDR in 2018

Reconstruction

  • HLT1 will run something very close to current offline reconstruction at event rate (30 MHz)
  • HLT1 data buffered to disk for deferred processing with calibration/alignment data: will already be used in Run 2
    • No need for offline reconstruction
    • Redefine RAW data as reconstructed data?
  • Doing reconstruction in only one place allows for HW optimizations but the code must continue to run on x86 architectures for MC events
  • 2.5-5 GB/s to storage
  • Changing the game for skimming and analysis: all events written out by reconstruction are interesting and will be analysed
    • For some analyses interested only in the decay of the triggering signal, a new smaller format: TurboStream

Offline resources: mainly for simulation

  • Active work in progress for fast MC: LHCb relies on a high volume of simulation (x50 expected)

Storage model: few datacenters both for tapes and disks

  • Tapes: 3 sites would be enough
  • Disks: no need for many smaller sites but recognize that this can be an important sociological/funding issue

Current distributed model for CPU works well for LHCb

  • Also includes more opportunistic resources
  • No need for coupling with data

Discussion

Simone (SC): all experiments share the goal of moving to a limited number of datacenters with CPU everywhere. How do we move forward? The main issue is probably the sociological one: how to reduce the number of datacenters and keep the expertise/know-how that is spread across all the sites?

  • P. Charpentier (PC): we are facing the problem that many sites and funding agencies consider that doing physics is running analysis and don't understand the importance of running MC.
    • Need to convince FAs and sites that you can have a very valuable contribution to physics without operating storage
  • Ian Bird (IB): agree with Philippe but FAs are not obscure bodies but people close to us. It's up to us to explain what we consider a useful contribution to physics.

D. Britton: we must be careful about consolidation. In the UK, by getting the involvement of institutions like universities, GridPP was able to deliver 300% of the resources directly funded by the project. Consolidating on fewer centers may lead to a reduction of what is delivered.

  • G. Stewart (GS): operational costs come mainly from storage. Can probably maintain a very diverse/distributed contribution to CPU while consolidating storage

L. Sexton-Kennedy: some steps of the analysis require local storage, it's a small amount but end users may need it at their sites

HW Trends - B. Panzer

Semiconductor market saturating: no growth anymore

Server market small but very profitable: 99% Intel

  • HEP is 0.3% of the server market, which is becoming a niche...
  • Not many companies with the ability to spend a large part of their revenue in R&D: Intel spends 25%
  • Most IC companies are fab-less: only 4 companies with leading edge fabs
  • A few companies dominate the different markets: Intel (processors, graphics), Samsung (disks, memories...)
    • Not necessarily competing with each other

May have reached the point where HS06/$ will not improve anymore: the cost of producing new generations that keep up with Moore's law has increased significantly

  • Impact on server prices not yet clear as the processor market is highly profitable. But the price/performance ratio improvement may be only 10%

Microserver developments: currently dominated by ARM but Intel is coming

  • XeonD adopted by Facebook instead of ARM: easier software port
  • Game may change if Samsung buys AMD...

Still a few new architectures in the field but nothing has materialized yet...

GPUs for HPC: a very small market (10K units) financed by integrated graphics cards whose market (revenue) is decreasing

Tape drive: LTO now has 96% of the market, 1 cent/GB

Memory: a new technology called "memory stacks" is coming that will improve performance by x15

  • Volatile DRAM market
  • NAND Flash: reached the limit with 2D, going 3D
  • Disruptive technologies complicated: many projects dropped due to the cost of producing them

Disks: 6 TB and 8 TB available but future unclear

  • Several technologies available to produce higher density disks but costs are very high and impossible to predict what will happen
  • SSD vs. HDD cost/size: x3 to x25. Not an affordable replacement.
    • Even producing the same volume as HDDs with SSDs would be a huge investment: 0.5 T$!

Several knobs for savings in the total envelope of systems

  • Storage: should speak about "storage units" defined as a combination of capacity and performance
    • Significant gains possible for providing large space without high performance, for example for simulation
  • Current Haswell CPUs can execute 32 instructions per cycle, HEP SW typically uses 1: improving the SW is the main path for savings...

New Storage Technologies - L. Mascetti

See slides.

Resource Provisioning: Clouds - L. Field

An extension of the pilot job paradigm: the pilot VM

Need for consolidation in the way clouds are used across VOs

  • CernVM: a VM image (including OS) through CVMFS
    • CVMFS is already a requirement
  • Capacity management: the vacuum model is a robust generic approach (the VO is not responsible for starting the VM; the VM pops up and connects to the VO central queue)
    • Removes the need for all the frameworks to know about all resources types: parameter space reduction
  • Monitoring: fabric management is the responsibility of the capacity manager. Should be common to all VOs.
  • Accounting: need to map jobs (VO view) to resources offered to the VO (VMs, site view)
    • Need a unique solution for all VOs: need to give a unified view to resource/budget holders

Commercial clouds: a lot of different initiatives

  • Helix Nebula
  • Microsoft Azure Pilot to start soon with CERN Openlab
  • Amazon/BNL joint project for ATLAS and CMS
    • New Scientific Computing group at AWS
  • PICSE
  • European Science Cloud Pilot: H2020 PCP proposal
    • Buyers group: organizations member of WLCG

CPU Resource Provisioning towards 2022 - A. McNab

Virtual "grid with pilot jobs" site

  • The site manages only the virtual infrastructure: nothing VO specific
    • Lot simpler than managing WNs
  • VO is managing its execution environment: CernVM
  • Building a "virtual grid" is just starting a VM with a pilot job
    • With the Vacuum model, starting the VM is no longer handled by a VO central service: simplification

Vacuum model: only a small user_data file must be supplied by the site to define what to run and when.

3 VM lifecycle managers implementing the Vacuum model: Vac, Vcycle, HTCondor Vacuum

  • Vac: standalone implementation (IaaC). No IaaS involved.
  • Vcycle: Vac features for an IaaS cloud. Currently supporting OpenStack.
    • Can be run centrally or by a site
  • HTCondor Vacuum: injects jobs which create VMs that coexist with normal jobs

Vac and Vcycle implement target shares to enable dynamic sharing of resources (see the sketch below)
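
A minimal sketch of how such target shares can drive the decision of what to start next; illustrative only, not the actual Vac/Vcycle code.

  # When a slot becomes free, start the machinetype (VO payload) that is
  # furthest below its target share of the currently running VMs.
  def next_machinetype(targets, running):
      """targets: {machinetype: share}, shares summing to ~1.0;
      running: {machinetype: number of VMs currently running}."""
      total = sum(running.values()) or 1
      def deficit(mt):
          return targets[mt] - float(running.get(mt, 0)) / total
      return max(targets, key=deficit)

  # Example: ATLAS is entitled to 60%, LHCb to 40%, but both currently run 5 VMs.
  print(next_machinetype({"atlas": 0.6, "lhcb": 0.4}, {"atlas": 5, "lhcb": 5}))
  # -> atlas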

Software Evolution - L. Sexton-Kennedy

Almost all experiments are dealing with the complexity of events by increasing the granularity of the detectors: ILD, LAr, maybe CMS...

  • Need for a global collaboration in HEP: a challenge in itself

How HEP SW Foundation can help: mechanism to facilitate collaboration around SW

  • Collaboration is the only affordable way to address the challenges

SLAC workshop: the real kick-off meeting for HSF

  • Good non EU participation
  • Many non HEP/IF experiments represented
  • Community and project views: different focus, no conflict
  • Decided to adopt the Apache model: bottom-up, project-based, do-ocracy
    • Transparency essential
    • Darwinian approach: HSF provides an infrastructure to projects, users decide projects that survive

Several WG formed that started to work

  • Training: consensus that it must be the initial focus; several types of training needed; learn from other initiatives like Software Carpentry
  • Packaging and Building WG: define a build protocol to orchestrate the combination of various SW projects
    • Role of new technologies like Docker
    • Allow adoption by existing projects
    • Discussions in the issues of the HSF/packaging GitHub repository
  • Licensing WG: many SW projects without a license...
    • An open-source license is mandatory to participate in HSF
    • Build upon CERN recent work on the topic
  • SW Project WG: work on incubator idea
  • Development Tools: give access to tools/platforms available at certain labs, like CERN TechLab, FNAL...
  • Communication and Exchange WG: SW Knowledge Base
    • Everybody can contribute, add their project, write a review... just request an account

More information: see http://hepsoftwarefoundation.org

  • Everybody interested is welcome to join: see web sites for mailing list addresses

WLCG Partners

OSG - R. Quick

Current status of OSG

  • Ready for LHC Run 2
  • Ready to embrace the Intensity Frontier as a new major stakeholder
  • Ready to make a big leap forward in shrinking the geek gap in data analysis
  • Ready to work with bioinformatics to move it to DHTC through science gateways

Current usage figures: 75% HEP, 67% LHC

  • Footprints on 120 campuses
  • Strong opportunistic usage
    • OSG considers it part of its mission to address long-tail science needs

Recent changes in OSG leaders: see slides

HTCondorCE: lessons learned from 10 years of GRAM; turns a CE into a particular configuration of HTCondor

  • Ownership by HTCondor team, some contributions by OSG
    • Authz by voms/gsi, support for multiple batch systems...
  • Easily delivered in OSG stack
  • Since Dec. 2014: default OSG-CE
    • GRAM-CE support expected to end in July 2016

OSG CA planning to transition to CILogon

  • CILogon: an NCSA project in conjunction with XD and XSEDE
  • OSG CA will be accredited by IGTF
  • Hope to complete the transition early 2016: smooth transition expected apart from DN changes

OSG as a service: OSG-Connect gateway

  • Abstract complexities of using DHTC
  • Generic service easily customized for each community
  • Starting collaboration with EGI competence centers

Data movement stays in the hands of the big communities: no serious effort in OSG to offer a full-fledged data service for long-tail science

HPC resources: collaboration with XSEDE, currently mainly XSEDE offloading work to OSG but working on the other direction too.

  • Good will on both sides

Network Services at OSG and WLCG - S. McKee

Network monitoring for WLCG through a standard open source tool: perfSONAR

  • 260 instances deployed
  • Wide deployment by WLCG was a driver for significant improvements in v3.4
    • Current version is 3.4.2 and addresses all the known issues in 3.4
  • Made huge progress in perfSONAR config management with meshes: central configuration of instances
    • Dynamic reconfiguration is possible
    • perfSONAR instances can participate in more than one mesh
  • Instances monitored by OMD: https://psomd.grid.iu.edu/WLCGperfSONAR/check_mk/

MadDash for metrics visualisation: http://psmad.grid.iu.edu/maddash-webui/

Several concrete examples where perfSONAR infrastructure helped to isolate and fix tricky problems

  • See slides for a recent example between AGLT2 and SARA

OSG working on exploiting the perfSONAR data to raise alarm from a central archive: PuNDIT

Future plan

  • Build tools above perfSONAR to help diagnose/troubleshoot topology issues
  • Datastore access through ActiveMQ for applications to use the data in their decisions
    • Pilot planned with FTS
    • Also a plan to integrate data from perfSONAR into FTS metrics in the SSB dashboard

Belle 2 - T. Hara

Belle 2: 50 ab-1

  • 100 PB of raw data
    • compared to 1 PB for Belle
  • Should start in 2018

Need to adopt a world-wide distributed computing to meet the computing challenges

  • Distributed Computing infrastructure should be ready mid-2017
  • Next KEK computing system replacement planned mid-2016
    • Phased replacements, like for the Tokyo T2

Computing resources very close to those of ATLAS or CMS

  • More for tapes
  • 3 main data sites: KEK, PNNL, GridKA/DESY

Belle 2 joined LHCONE

  • Similar requirements to LHC for networks
  • Many Belle 2 sites are WLCG sites
  • Achieved 1 GB/s between KEK and PNNL

Significant overlap between WLCG and Belle 2 sites: exposed to the MW diversity in WLCG

  • Using DIRAC as the experiment framework
  • Catalogs: not yet decided between DFC and AMGA+LFC
  • Developing their own monitoring tools above DIRAC: in particular to do site testing
  • Using GGUS

Wants to be an observer in WLCG and participate in GDB/WLCG workshops

  • Ian/Michel: welcome to participate in GDBs!

Wrap-Up

Follow-up of discussions to be announced later

  • Several topics will be discussed during GDB
  • A specific initiative to progress on our infrastructure evolution

Next workshop in 9 months: details to be announced later

Thanks to the CHEP organizers for making this meeting possible!

-- MichelJouvin - 2015-04-12
