DRAFT

WLCG Operations Coordination Minutes, April 12th, 2018

Highlights

Agenda

Attendance

  • local:
  • remote:
  • apologies:

Operations News

  • T1 sites are requested to complete the tape survey
  • We welcome Balazs Konya in his new role as WLCG Middleware Officer

SAM recalculation policy

  • The deadline for submitting GGUS tickets requesting a recalculation must be respected: tickets are accepted up to 10 days after the monthly draft availability reports are sent around by the WLCG Project Office; tickets submitted after that deadline are not accepted.
  • A/R recalculations will be accepted if they are relevant to the site's MoU commitment or concern time ranges of sufficient length (a minimal sketch of this acceptance logic follows below):
    • For T1 sites:
      • if only the corrected A/R meets the 97% threshold;
      • or if the total concerned time ranges exceed 20 hours.
    • For T2 sites:
      • if only the corrected A/R meets the 95% threshold;
      • or if the total concerned time ranges exceed 20 hours.
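
A minimal sketch of the acceptance rule above, assuming the uncorrected and corrected A/R values and the total affected time are known; all names are illustrative, not an official tool:

    def recalculation_accepted(tier, ar_original, ar_corrected, affected_hours):
        """Sketch of the A/R recalculation acceptance policy described above.

        tier: 1 or 2; A/R values are fractions (e.g. 0.968 for 96.8%);
        affected_hours: total length of the concerned time ranges in hours.
        """
        threshold = 0.97 if tier == 1 else 0.95
        # Accept if only the corrected A/R meets the MoU threshold...
        crosses_threshold = ar_original < threshold <= ar_corrected
        # ...or if the concerned time ranges are long enough on their own.
        return crosses_threshold or affected_hours > 20

    # Example: a T1 site at 96.5% that reaches 97.2% after correction qualifies.
    assert recalculation_accepted(1, 0.965, 0.972, 8.0)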

Middleware News

  • Useful Links
  • Baselines/News
  • Issues:
    • It was discovered that the voms-clients-java-3.3.0 package in EPEL was broken: the update to that package caused essential commands (voms-proxy-info, voms-proxy-init) to be removed. The investigation revealed that the cause of the problem was a change of the package name and the post-install scripts. Mattias Ellert provided a fixed package (voms-clients-java-3.3.0-2.el6), which is now in EPEL testing. See the sketch below for a quick post-update check.
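
As a quick post-update sanity check, a site could verify that the essential VOMS client commands are still on the PATH; a minimal sketch, with an illustrative command list:

    #!/usr/bin/env python3
    """Sanity check that essential VOMS client commands survive a package update."""
    import shutil
    import sys

    # Commands affected by the broken update; extend as needed for local use.
    ESSENTIAL = ["voms-proxy-init", "voms-proxy-info"]

    missing = [cmd for cmd in ESSENTIAL if shutil.which(cmd) is None]
    if missing:
        sys.exit("Missing after update: " + ", ".join(missing))
    print("All essential VOMS client commands found.")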

Discussion

Tier 0 News

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

  • Activity levels have been rather low on average
    • Normal levels in the last days
    • New productions will be prepared
    • Large numbers of analysis jobs in preparation for Quark Matter 2018 (May 13-19)

ATLAS

  • Stable grid production over the last weeks, with up to ~300-350k concurrently running job slots, including the HLT farm. Additional HPC contributions with peaks of more than 1 million concurrently running job slots, though the actual core power of the HPC CPUs is up to a factor of 10 lower than that of regular grid-site CPUs.
  • The usual mix of "CPU-heavy" grid workflows is ongoing: MC generation and simulation, with a smaller fraction of MC digitization and reconstruction. A small campaign processing a delayed 2017 data stream was completed.
  • Minor operational hiccups due to certificate/proxy renewals in various systems.
  • EOS reported potential corruption of files during an 8h period on 30 March; files may be corrupted even if their adler32 checksum is correct.
  • Ongoing discussion within ATLAS to evaluate MD5 vs adler32 checksums (see the sketch after this list).
  • No operational problems with FTS in the past weeks, since the large data reprocessing with high transfer rates finished. FTS at CERN and BNL are in use at large scale; FTS at RAL is in use for one site.
  • Tier0 is ready for LHC data taking.
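
On the checksum discussion above: adler32 is a fast checksum with far weaker collision resistance than a cryptographic digest such as MD5, which is why a corrupted file can still match its recorded adler32 value. A minimal sketch computing both in one pass over a file; the path argument and chunk size are illustrative:

    #!/usr/bin/env python3
    """Compute adler32 and MD5 checksums of a file in a single pass."""
    import hashlib
    import sys
    import zlib

    def checksums(path, chunk_size=1024 * 1024):
        adler = 1  # adler32 starting value
        md5 = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                adler = zlib.adler32(chunk, adler)
                md5.update(chunk)
        # adler32 is conventionally reported as 8 hex digits
        return "%08x" % (adler & 0xffffffff), md5.hexdigest()

    if __name__ == "__main__":
        a32, m5 = checksums(sys.argv[1])
        print("adler32: %s  md5: %s" % (a32, m5))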

CMS

  • Cosmic data were taken during the last month, with and without magnetic field
  • LHC collisions since April, but no physics data yet
  • Had one round of transfer tests T0->T1
    • Might repeat a few tests after adjustments
  • Compute systems busy at around 220k slots last month
    • usual 70% production / 30% analysis split
  • Kyungpook National University (KNU) in Daegu, Korea, informed us that they need to end their Tier-2 service on April 30th due to lost funding; KNU was an excellent site and we are sad to lose them
  • Singularity deployment almost complete
    • over 80% of Tier-1 and Tier-2 sites are ready
    • one Tier-2 site and the HLT still need to set it up
    • getting close to making Singularity mandatory in SAM (after the ongoing installation activity at sites is complete)
  • SAM corrections are done as needed to make sure results and site evaluations are representative; do we need a cross-experiment policy, or can we leave it to VO discretion?

LHCb

  • Ready to restart data taking
    • Use of the HLT farm for offline MC production has been close to 100% during the past few months; we expect a reduction soon
  • Productions:
    • Several Stripping productions are close to the end. Several productions could be finished once CNAF was back in business, running in "mesh" mode (nearly all Tier-1s "helping" CNAF process its data).
    • MC simulation activities are still taking close to 90% of the distributed computing CPU
  • CNAF:
    • Almost all data have been recovered, including from the wet tapes. Staging for DataStripping went on without too many issues.
  • DIRAC services
    • 6-7 weeks ago we experienced big problems with a DBOD update (MySQL and host): everything has been operationally OK since then, but we will still need to go through more upgrades fairly soon
    • Excluding the above, we have been running at 100% availability with an average of 120k concurrently running jobs
    • Support for Glue2 is being added (late...); reports are that the same "mix of info" as in BDII/Glue1 persists
    • 1 new HPC site is being integrated these days

Ongoing Task Forces and Working Groups

Accounting TF

  • The Accounting Task Force meeting (https://indico.cern.ch/event/711469/) in March reviewed how the experiments prepare the RRB reports.
  • The CERN batch accounting problem is still under investigation.

Archival Storage WG

Update on providing tape info

Site        | Info enabled | Plans | Comments
CERN        | YES          |       |
ASGC        | NO           |       |
BNL         | YES          |       |
CNAF        | NO           |       |
FNAL        | YES          |       |
IN2P3       | NO           |       |
JINR        | NO           |       |
KISTI       | NO           |       |
KIT         | YES          |       |
NDGF        | NO           |       |
NIKHEF-SARA | NO           |       |
NRC-KI      | YES          |       |
PIC         | YES          |       |
RAL         | NO           |       |
TRIUMF      | NO           |       |

Information System Evolution TF

IPv6 Validation and Deployment TF

Detailed status here.

Machine/Job Features TF

Monitoring

NTR.

MW Readiness WG

Network Throughput WG


  • perfSONAR 4.0.2 and CC7 campaign: 210 instances updated to 4.0.2; 81 instances already on CC7
    • A WLCG broadcast will be sent to remind sites to plan an upgrade to CC7 and review their firewall port openings
    • The perfSONAR 4.1 release, planned for Q2 2018, will no longer ship SL6 packages
  • Attended perfSONAR developers F2F meeting in Amsterdam and presented feedback from OSG/WLCG
  • WG reports planned for upcoming HEPiX and CHEP
  • Networking and perfSONAR were also major topics at the OSG All-Hands meeting (https://indico.fnal.gov/event/15344/)
    • 4 presentations were given on various topics related to the WG
    • One of the outcomes was a proposal to create dedicated site-based documentation showing all links relevant to a given site
  • WLCG/OSG network services
  • Outreach and other activities:
    • GEANT has added several perfSONAR instances on LHCONE at their major network hubs (ams, gva, lon, par, fra) - both IPv4 and IPv6
    • Advania was added to the HNSciCloud test mesh
    • MGHPCC (http://www.mghpcc.org/) plans to deploy up to 22 perfSONAR instances; discussions are ongoing on how we can help
  • WLCG Network Throughput Support Unit: see the twiki for a summary of recent activities.

Squid Monitoring and HTTP Proxy Discovery TFs

Traceability WG

Container WG

Special topics

Action list

Creation date | Description | Responsible | Status | Comments
01 Sep 2016 | Collect plans from sites to move to EL7 | WLCG Operations | Ongoing | [ older comments suppressed ] Dec 7 update: Tier-1 plans are documented in the Nov 2 minutes. Jan 18 update: CREAM and the UI were released in UMD-4 on Dec 18.
03 Nov 2016 | Review VO ID Card documentation and make sure it is suitable for multicore | WLCG Operations | In progress | GGUS:133915
14 Sep 2017 | Follow up on CVMFS configuration changes; check effects on sites in Asia | WLCG Operations | Pending | March 1st update: this might imply significant effort; low priority for now.

Specific actions for experiments

Creation date | Description | Affected VO | Affected TF/WG | Comments | Deadline | Completion

Specific actions for sites

Creation date | Description | Affected VO | Affected TF/WG | Comments | Deadline | Completion

AOB
