WLCG Operations Coordination Minutes, March 2, 2023

Highlights

Agenda

https://indico.cern.ch/event/1259089/

Attendance

  • local:
  • remote: David B (IN2P3-CC), David Cameron (ATLAS), Stephan (CMS), Borja (WLCG Monitoring), David Cohen (IL T2 sites), Andrea (TRIUMF), David Mason (FNAL), Eric (IN2P3), Panos (WLCG), Adrian Coveney (APEL), Federico Stagni (LHCb), Domenico Giordano (WLCG benchmarking), Gonzalo (WLCG benchmarking), Giuseppe Bagliesi (CMS), Matthew Steve Dodge (EGI), David South (ATLAS), Marian Babik (WLCG networks), Stefano Dal Pra (CNAF), Christoph Wissing (DOMA, CMS), Alessandra Forti (ATLAS, UK T2s), Max Fischer (KIT), Thomas Hartmann (DESY), Natalia (WLCG), Julia (WLCG)
  • apologies:

Operations News

Special topics

Readiness for switching to a new benchmark

Discussion after presentation of Domenico and Gonzalo

  • For how long do we intend to keep the benchmark?

Domenico: The expectation is to keep the proposed configuration as long as possible. As with HS06, the 23 is just a tag; no HEPScore24 is expected. Necessary changes will be discussed and approved by the MB.

  • Thomas mentioned that for new hardware HEPScore and HEPSPEC give almost the same result, while for old hardware HEPScore is lower than HEPSPEC. Is this expected?

Domenico: Yes, this is expected and understood. It is the reason why re-benchmarking of old resources is not recommended.

  • Stefano: Can we report two benchmarks?

Julia: The specification allows two benchmarks to be reported for the same resource in the dictionary of the normalized records, though APEL will consume only one for the time being. If two are reported, APEL will consume HEPScore.
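
The consumption rule described above can be sketched as follows. This is only an illustration, not APEL code: the dictionary layout and the key names `HEPscore23` and `HEPSPEC` are assumptions made for this example, and the real names are whatever the record specification defines.

```python
from typing import Dict, Optional, Tuple

# Hypothetical benchmark keys, listed in order of preference.
# The actual key names in the normalized records may differ.
PREFERRED_ORDER = ("HEPscore23", "HEPSPEC")


def select_benchmark(benchmarks: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return the (name, value) pair that would be consumed, or None.

    When both benchmarks are reported, HEPScore takes precedence.
    """
    for name in PREFERRED_ORDER:
        if name in benchmarks:
            return name, benchmarks[name]
    return None


# A resource reporting both benchmarks: only HEPScore is picked up.
both = {"HEPSPEC": 12.5, "HEPscore23": 11.8}
print(select_benchmark(both))  # ('HEPscore23', 11.8)
```

A site that reports only one benchmark is unaffected by the preference order; the single value present is the one returned.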

  • Thomas H.: A Puppet configuration for the benchmark is needed for the sites.

Domenico: It already exists at CERN. Will help other sites with it.

  • Domenico: We would like benchmarking results to be reported to the central repository at CERN. For this, the site needs to be authenticated to the CERN message queue (MQ). This can work with a host certificate; the DN has to be communicated to the support team of the MQ at CERN.

  • Token authentication is not yet enabled for the message queue at CERN; some development would be needed.

  • Julia: Do we need to broadcast to the sites the call for validation of the benchmarking suite during this month?

Domenico: Yes, it is needed.

Discussion after presentation of Julia and Adrian

  • Adrian: The APEL client won't be ready for the 1st of April.

Julia: What is the timeline for the new version of the APEL client?

Adrian: End of April, since it will be implemented by the new team member, who needs some time to get familiar with the code. This would not include the upgrade to Python 3, which will come later. That should not be a big problem, since many sites might use third-party clients to generate the records.

  • Authentication to the message queue used by APEL - AMS (different from the one used for reporting benchmarking results to CERN).

Adrian: In order to authenticate, the DNs have to be registered per service in GOCDB.

  • Is token authentication foreseen?

Adrian: Yes, it is foreseen. AMS is ready to use tokens, but for production accounting there is a component which translates certificates to tokens.

  • Thomas: We see scalability issues with APEL. Will the DB stay the same?

Adrian: For the immediate future, yes.

  • Julia: We are preparing complete documentation for the pioneer sites, which will validate the new dataflow and start testing as soon as possible.

Middleware News

  • Next week a broadcast will be sent about WLCG advice on
    the next Linux OS options and associated MW plans
    • The text is under review this week

  • On Tuesday, the following broadcast was sent about (RH)EL 9 vs. SHA-1 CAs:
    • As has been discussed in recent WLCG Ops meetings, there is a mismatch
      between the default security policies of RHEL 9 + derivatives and
      the use of SHA-1 by a number of CAs in IGTF.
    • RHEL 9 + derivatives and other recent Linux versions come with
      OpenSSL v3, which disables a number of legacy algorithms.
      In addition, RHEL 9 + derivatives disable SHA-1 by default.
    • Unfortunately, SHA-1 is still used in root certificates of various CAs.
    • Though their number is steadily decreasing, at this time it appears
      we cannot yet declare SHA-1 unsupported across the infrastructure,
      because too many sites and resources would be affected.
    • Instead of re-enabling SHA-1, Red Hat have suggested the IGTF CA
      distribution could be adjusted in a way that should cause its CAs
      to be trusted irrespective of any dependencies on SHA-1.
    • However, such adjustments would need to be tested with all
      relevant middleware that has any business with certificates,
      which may take a considerable amount of time.
    • This then implies that clients and services running RHEL 9 or a
      derivative (AlmaLinux 9, Rocky Linux 9) will need to enable SHA-1
      for the time being.
    • The minimal way to do that is as follows:
      update-crypto-policies --set DEFAULT:SHA1
    • This matter is to be revisited when either Red Hat's suggestion has
      been made to work across our infrastructure, or the remaining use
      cases depending on SHA-1 have become negligible, which may take
      many more months.
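
A site wanting to verify that the workaround is in place could check the active policy string. The sketch below assumes the string has been obtained from `update-crypto-policies --show` and only looks for the `SHA1` subpolicy; the function name is made up for this example, and it does not evaluate what the base policy itself allows.

```python
def sha1_subpolicy_enabled(policy: str) -> bool:
    """Return True if the policy string carries the SHA1 subpolicy.

    `policy` is assumed to be the text printed by
    `update-crypto-policies --show`, e.g. "DEFAULT" or "DEFAULT:SHA1".
    Only subpolicies are inspected, not the base policy.
    """
    _base, *subpolicies = policy.strip().split(":")
    return "SHA1" in subpolicies


print(sha1_subpolicy_enabled("DEFAULT:SHA1"))  # True
print(sha1_subpolicy_enabled("DEFAULT"))       # False
```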

  • Discussion:
Stephan: Can we have a list of CAs having troubles?

After the meeting, Maarten added the list (see below the CMS report).

Tier 0 News

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

  • Lowish to normal activity on average in the last weeks
  • More sites have been switched from single- to 8-core pilots
    • Other sites are planned to follow in the coming weeks
    • Some sites will be able to support whole-node jobs instead
  • Job submission token configuration details will soon be communicated

ATLAS

  • Mostly smooth running with 500-700k slots on average
  • Issues on Alma9 with old CA certificates using SHA-1
  • Planning to move Harvester to HTCondor10 this month
    • Only token auth with HTCondorCE
    • Only ARC REST interface
    • Some sites still have not upgraded their HTCondorCEs to support tokens so will be cut off, but no real impact on total resources

CMS

  • started taking cosmic data with the CMS detector
  • overall smooth running, no major issues
    • good core usage between 170k and 410k cores
    • production pressure lost for about 10 hours last Thursday
    • usual production/analysis split of about 3:1
    • significant contribution from HPCs peaking at over 70k
    • main production activity Run 2 ultra-legacy Monte Carlo and Run 3
  • tape writing backlog at JINR decreasing nicely with config adjustment
  • waiting on python3 version/port of HammerCloud
  • working with our DPM sites to migrate to other storage technology
    • limited by operations manpower/expertise
  • token migration progressing steadily
    • going through old, unused clients with excessive scope in IAM and cleaning things up
    • plan to explore IAM logging and traceability in the coming weeks
    • ETF updated with new xrootd version and probes including IAM-issued token probes moved to production instance; Many Thanks to Marian Babik!
    • native xrootd config ready; work on dCache config continuing
  • looking forward to 24x7 production IAM support by CERN
  • does WLCG have a list of root CAs with SHA-1 that could be shared?
    • or what is the latest expiration?

Provided after the meeting

  • These CA root certificates still feature SHA1:
    ASGCCA-2007, ArmeSFo, BYGCA, CESNET-CA-Root, CNIC, DFN-GridGermany-Root,
    DZeScience, DigiCertAssuredIDRootCA-Root, DigiCertGridCA-1-Classic,
    DigiCertGridRootCA-Root, DigiCertGridTrustCA-Classic, GermanGrid,
    IHEP-2013, KEK, LIPCA, MARGI, QuoVadis-Root-CA2, RDIG, RomanianGRID,
    SRCE, SiGNET-CA, TRGrid, seegrid-ca-2013
  • Their end dates typically are (sometimes many) years from now
  • They will need to be re-issued using SHA2 instead,
    which is not a trivial process in IGTF,
    but many others already did so and more are to follow
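
To check whether a given certificate is among the affected ones, its signature algorithm can be inspected. The sketch below assumes the input is the text printed by `openssl x509 -noout -text -in <file>`; the function names and the abbreviated sample text are made up for illustration.

```python
import re
from typing import Optional

# Matches the first "Signature Algorithm:" line of the text dump.
_SIG_ALG = re.compile(r"Signature Algorithm:\s*(\S+)")


def signature_algorithm(openssl_text: str) -> Optional[str]:
    """Return the signature algorithm name found in the dump, or None."""
    match = _SIG_ALG.search(openssl_text)
    return match.group(1) if match else None


def is_sha1_signed(openssl_text: str) -> bool:
    """Return True if the reported signature algorithm is SHA-1 based."""
    algorithm = signature_algorithm(openssl_text)
    return algorithm is not None and "sha1" in algorithm.lower()


# Abbreviated, made-up excerpt in the style of an openssl text dump.
SAMPLE = """Certificate:
    Data:
        Version: 3 (0x2)
        Signature Algorithm: sha1WithRSAEncryption
"""
print(signature_algorithm(SAMPLE))  # sha1WithRSAEncryption
print(is_sha1_signed(SAMPLE))       # True
```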

LHCb

  • Full system again for the past few weeks -- this followed a rather long period of under-utilization of the resources
  • Still using single-core queues almost everywhere. Single-core jobs constitute 99% of the jobs run. Things are about to change: we will verify and then activate multi-core queues at all Tier1s and Tier2Ds (Tier2s with disk resources).
  • Moving to use "SingularityCE" everywhere (this is following a long-standing security issue -- isolation).
    • We ticketed those few sites where this was not possible yet.
    • From that point on, LHCb Pilots will start failing whenever Singularity is not available
  • DIRACOS2 (the conda-based environment for DIRAC installations) is dropping support for OpenSSL 1.1 (moving to OpenSSL 3.0.0). Most notably, this means a new xrootd version; the only site affected at the moment is RAL.
  • New ETF tests in pre-prod. Plan to add more token-related tests.
  • DIRAC support for submission to HTCondor and AREX CEs with tokens validated this morning.

Task Forces and Working Groups

GDPR and WLCG services

Accounting TF

  • Progressing with the integration of the new benchmark in the accounting workflow. See presentation.

Information System Evolution TF

  • Most CRIC instances have been upgraded to use the new SSO
  • Following the request of LHCb to provide CE and queue information via CRIC API, a prototype of the loader that will pull data from BDII is ready. Next step is to iterate with LHCb experts to see whether all corner-cases are covered.
  • In order to support the CMS migration to tokens, CMS CRIC has been populated with all the new groups that should be synced to IAM. The sync script is ready to be deployed as a cron job; it should be deployed in the OpenShift cluster that runs the actual service.
  • According to our information, the CERN BDII instance is not used by any clients. We are planning to switch it off shortly.

IPv6 Validation and Deployment TF

Detailed status here.

Monitoring

  • The plan for the monitoring flow of dCache storage with the XRootD protocol has been discussed between the WLCG Monitoring Task Force, dCache experts and the FNAL dCache support team. The implementation and next steps for the development of the prototype have been agreed. The dCache developers will prepare a forwarding script to send data to the CERN messaging system.

Network Throughput WG


WG for Transition to Tokens and Globus Retirement

  • Further progress with the CE token support campaign on EGI
    • 115 of 133 tickets have already been solved
    • only a few sizable T2 sites remain, the rest are small sites

Action list

Creation date | Description | Responsible | Status | Comments

Specific actions for experiments

Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments

Specific actions for sites

Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments

AOB

  • Next meeting is planned for the 6th of April
Topic revision: r14 - 2023-03-06 - MaartenLitmaath
 