DRAFT

WLCG Operations Coordination Minutes, Jan 27, 2022

Highlights

  • A pre-GDB on operational effort and possible optimization will take place on the 24th of February, 3:00-5:30 PM CET.

Agenda

https://indico.cern.ch/event/1120678/

Attendance

  • local:
  • remote:
  • apologies:

Operations News

  • IAM service support
    • IAM has been added to the table of critical services at CERN
    • Its urgency and impact numbers have been copied from the VOMS row
      • They are expected to change in the course of this year
    • The details for submitting a ticket have been added to these pages:
    • For the next few months, the support level is closer to 8/5 than 24/7
      • That is expected to improve in the course of this year
      • It implies we should not yet rely on very short-lived tokens
      • Rather imitate what is done with VOMS proxies for now

Special topics

XRootD monitoring

Middleware News

Tier 0 News

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

  • Normal activity levels on average in the last 8 weeks
  • No major issues
  • Site VOboxes are being switched from legacy AliEn to JAliEn
    • Most sites and ~85% of the resources are done
      • Progress can be tracked here
      • Issues at a few of the remaining sites are being followed up
    • JAliEn is needed for Run-3 multi-core jobs
    • Most sites should only see 8-core jobs eventually
      • Some already do, some others receive a mix
    • Such jobs also can run up to 8 single-core (legacy) tasks
    • For each task, Singularity is tried from CVMFS
      • If that fails, a local system installation is tried
      • If that fails, the task is run in the classic way
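
The per-task fallback order described above (Singularity from CVMFS, then a local Singularity installation, then the classic way) can be illustrated with a minimal sketch. This is not JAliEn code: the function names, the launcher labels and the stub launchers are all hypothetical and only show the "try in order, fall through on failure" pattern.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical launcher type: returns True if the task could be started this way.
Launcher = Callable[[str], bool]


def run_with_fallback(task: str, launchers: List[Tuple[str, Launcher]]) -> Optional[str]:
    """Try each launcher in order and return the label of the first that succeeds."""
    for name, launch in launchers:
        try:
            if launch(task):
                return name
        except Exception:
            # Treat any error as "this option is unavailable" and try the next one.
            continue
    return None


def make_stub(available: bool) -> Launcher:
    """Build a stand-in launcher that only reports whether it 'worked'."""
    def launch(task: str) -> bool:
        return available
    return launch


def fallback_chain(cvmfs_ok: bool, local_ok: bool) -> List[Tuple[str, Launcher]]:
    """Order taken from the minutes; the labels themselves are illustrative."""
    return [
        ("singularity-cvmfs", make_stub(cvmfs_ok)),
        ("singularity-local", make_stub(local_ok)),
        ("classic", make_stub(True)),  # assumed to always be possible in this sketch
    ]


if __name__ == "__main__":
    # CVMFS Singularity unavailable, local installation present.
    print(run_with_fallback("task-1", fallback_chain(cvmfs_ok=False, local_ok=True)))
    # prints "singularity-local"
```

In a real setup each stub would be replaced by whatever actually starts the task; the sketch only fixes the order of attempts.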

ATLAS

  • Smooth running over the Xmas break and in the last few weeks with 700-800k slots
    • Main activities: Run 2 data and MC reprocessing
    • Including running MC reprocessing on 50% of the HLT farm
  • Another CA update (SlovakGrid) means all dCache services need to be restarted (same as with Swiss and Brazilian updates last year)
  • Switch to IAM VOMS server seemed to go smoothly; it required some cleanup of tools still using legacy (non-RFC) proxies
  • AGIS servers were shut down last week
  • Problems with slow transfers to and from RAL, hard to debug (GGUS:154436)
  • SRR storage reporting is shaky especially at dCache sites. Several times storage got full because SRR was not up to date.
  • Planning a Run 3 commissioning data transfer test ~end Feb/begin March involving full T0 and export to T1 tapes

CMS

  • running smoothly with 300-350k cores
    • no significant issues during the holidays
    • transatlantic link Fermilab--CERN down to the tertiary (20 Gbit/s) link during most of the holidays
      • Waiting for last pieces of information about the chain of responsibility linked to the machine that failed at CERN causing the issue
    • internal saturation for analysis jobs during the holidays
      • traced to large number of jobs with low number of sub-jobs
    • usual production/analysis split of 3:1
    • HPC allocations contributing up to 30k cores
      • 2021 allocations for machines in the US all consumed well before the end of the allocation period
    • production activity mainly Run 2 ultra-legacy Monte Carlo
    • re-reconstruction of parked B data, 11B of 12B processed
  • SRM+WebDAV commissioning at Tier-1 sites started
  • accidental deletion of SAM/HC datasets in the middle of December
    • at about a third of sites
    • all files restored by middle of January
  • HammerCloud instabilities for several weeks
    • job status queries failed, causing multiple jobs and an empty status page; corrected
    • no new jobs being submitted for series, still being investigated
    • working on updating HC jobs for Run 3 software/input datasets
  • big thanks to all sites contributing above the pledge!
    • this is much appreciated while sites struggle to get new machines
    • a very welcome boost of the CMS physics program

LHCb

  • smooth running over Xmas break and in January
  • using 140-160k cores
    • re-processing campaign of Run 2 ended on Dec 31st, 2021 (!)
    • simulation jobs at 95%, user jobs at 5%
  • some data movements / replicas to
    • recover disk space at PIC
    • deal with long-term downtime of storage at CBPF
  • planning tape throughput tests at Tier0
    • tape read test at Tier1 should also be planned

Task Forces and Working Groups

GDPR and WLCG services

  • Updated list of services
  • A review of the status of publishing the CERN RoPOs for WLCG services hosted at CERN and the WLCG Data Privacy Notice for other WLCG services was given at the January WLCG MB. We were asked to go ahead and to accelerate this process. WLCG Ops Coordination will follow up.

Accounting TF

dCache upgrade TF

Information System Evolution TF

  • WLCG CRIC has been bootstrapped with initial information for network topology. Tickets will be submitted to the sites asking them to validate this data.

IPv6 Validation and Deployment TF

Detailed status here.

Monitoring

  • Kick-off meeting of the Monitoring Task Force took place on the 13th of January. Agreed on the main directions of work. Jira project WLCGMONTF has been created to follow up on the progress.

Network Throughput WG

  • perfSONAR infrastructure - 4.4.2 is the latest release
  • WLCG/OSG Network Monitoring Platform
    • Weekly meetings focusing on operations and infrastructure and alerts/alarms improvements
  • 100G perfSONAR meeting took place last week
    • We have agreed to tune up performance in order to reach ~10% of capacity (right now it is ~3-5%)
  • Recent and upcoming WG updates:
  • WLCG Network Throughput Support Unit: see twiki for summary of recent activities.

WG for Transition to Tokens and Globus Retirement

  • Progressing via Authorization WG meetings

Action list

Creation date | Description | Responsible | Status | Comments

Specific actions for experiments

Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments

Specific actions for sites

Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments

AOB

-- JuliaAndreeva - 2022-01-25

Topic revision: r8 - 2022-01-27 - JuliaAndreeva
 