(2018-02-28, MaartenLitmaath)
---+!! WLCG Operations Coordination Minutes, May 21st 2015
<br />%TOC{depth="4"}%

---++ Agenda
   * https://indico.cern.ch/event/393606/

---++ Attendance
   * local: Oliver Keeble, Marian Babik, Maarten Litmaath, Andrea Sciaba, Maite Barroso, Maria Alandes, Alberto Aimar, David Cameron, Maria Dimou
   * remote: Christoph Wissing, Jeremy Coles, Pepe Flix, Massimo Sgaravatto, Renaud Vernet, Thomas Hartmann, Antonio Maria Perez Calero Yzquierdo, Alessandra Forti, Gareth Smith, Catherine Biscarat

---++ Operations News

---++ Middleware News
   * Useful Links:
      * [[WLCGBaselineVersions][Baseline Versions]]
      * [[WLCGBaselineVersions#Issues_Affecting_the_WLCG_Infras][MW Issues]]
      * [[WLCGT0T1GridServices#Storage_deployment][Storage Deployment]]
   * Baselines:
      * EMI update today containing StoRM 1.11.8. This version has already been verified by MW Readiness and will be set as baseline as soon as it gets into UMD (end of May)
      * dCache 2.10.28/2.12.8 verified by MW Readiness and set as baseline (fixes for DB leak)
      * As discussed in the previous meeting, Torque 2.5.13 has been added to the baselines table
   * MW Issues:
      * NTR
   * T0 and T1 services
      * CERN
         * CASTOR for LHC has been updated to 2.1.15. Small delta releases (2.1.15-8) are being rolled out
         * SRM validation going on. xroot is now the main access protocol (RFIO is obsolete and its possible decommissioning will be discussed at the end of 2015, to take place in 2016 or later)
      * KIT
         * Updated all dCache setups to 2.11.19 last week. Very urgent update due to a leak in the Chimera database
      * IN2P3
         * Plan to upgrade dCache to 2.10.30+ on core servers (16/06/2015)
      * RRC-KI-T1
         * dCache upgrade to 2.10.29

---++ Tier 0 News
   * LFC decommissioning: To be stopped 22nd of June: https://cern.service-now.com/service-portal/view-outage.do?from=CSP-Service-Status-Board&&n=OTG0021439
   * CMS pilot role mappings changed from a static to a pool account mapping to be able to identify the CMS functional tests and ensure proper scheduling of these jobs.
A side effect of the intervention was that, when the mapping changed, incoming CMS jobs failed with a delegation error. This was subsequently fixed on all CEs.

Pepe asks for a clarification of what "functional tests" means. Maite confirms these refer to CMS SAM tests.

Maarten asks whether there is any problem in EGI with the LFC decommissioning at CERN. Maite confirms EGI doesn't rely on the CERN LFC. It was checked with them months ago.

---++ Tier 1 Feedback
None

---++ Tier 2 Feedback
None

---++ Experiments Reports

---+++ ALICE
   * normal to high activity
   * no major operational issues

---+++ ATLAS
   * Very high activity (> 200k running job slots today)
      * Only possible through mix of single and multi-core jobs (see later)
   * Running final stress tests of data transfer this week (inside CERN and outside) before data taking
   * Request T1 sites to avoid major downtimes from now until late summer

---+++ CMS
   * Production overview
      * Finished with Upgrade DIGI-RECO (for now)
         * More memory-intensive production compared to usual workloads
         * Needed to run in multi-core pilots not using all cores
      * Finally started Run2 DIGI-RECO campaign
         * Will keep resources rather busy for next weeks/months
         * Successfully extended to stronger Tier-2 sites
   * Several issues with EOS at CERN last week
      * Affected Tier-0 operations (luckily no LHC collisions yet)
      * List of tickets (all solved now): GGUS:113687, GGUS:113678, GGUS:113664, GGUS:113657
   * Global Xrootd redirector at CERN
      * Is an important component for CMS
         * Increased usage and higher dependency on the service
         * Many users requesting files via Global redirector
         * Production jobs occasionally sent to sites not hosting data
      * CMS requests raising the impact score to *8* (from 5) and the urgency score to *6* (from 5) for this WLCG critical service
      * Quite long iteration to get a configuration settled recently GGUS:113032

Pepe asks for a clarification on the usage of multicore pilots not using all of the cores.
Christoph explains that there was indeed a mix of single-core and multi-core jobs.

Maarten suggests that the requested change to the global xrootd redirector should be recorded in the [[https://twiki.cern.ch/twiki/bin/view/LCG/WLCGCritSvc][Critical Services Table]]. Maite will follow up, but she explains the service is already in the list of critical services.

Pepe asks how changes to the critical services table should be communicated. There was a meeting in December to define this; Maite adds that the table shouldn't change very often in any case.

---+++ LHCb
   * LHCb Computing workshop going on, so not much operations follow-up
      * Downtime today though due to the Oracle DB upgrade
   * T1
      * Problem contacting the SARA SRM. It seems to be back. They claim that the fetch-crl problem could now be on the CERN side. Upgrades on the FTS servers and our VObox are planned.

---++ Ongoing Task Forces and Working Groups

---+++ gLExec Deployment TF
   * NTR

---+++ RFC proxies
   * SAM-Nagios refresh_proxy probe should handle RFC proxies transparently when *UMD-3* is used
      * preprod already has UMD-3, prod still has UMD-2
      * we will try this out on ALICE preprod
   * the readiness of sites then can be observed per experiment
      * no failures are expected to be due to the change of proxy type
   * when the sites are checked OK, the central services of each experiment are next
      * plus pilot factories at T1 etc.
   * experiment status
      * ALICE: done
      * ATLAS:
         * check central services
         * ensure all pilot factories run a recent Condor-G with a UMD-3 !CREAM client
      * CMS: ditto
      * LHCb: check DIRAC

Pepe asks whether the experiments have given any feedback on the timeline to implement this. Maarten explains there is no timeline yet, but in any case it's not critical. Maarten explains that in the future a new version of the VOMS clients will be needed and this will require coordination with EGI.

David Cameron asks whether MW services are certified to work with RFC proxies. Maarten confirms this is the case.
Pepe asks whether that means that all MW services deployed at sites are able to support RFC proxies. Maarten explains that this should be the case if the MW version is the one currently supported and not an old version, and that EGI normally follows up on sites running obsolete versions.

---+++ Machine/Job Features

---+++ Middleware Readiness WG
<br />%INCLUDE{ "MiddlewareReadinessArchive" section="20150521" }%

---+++ Multicore Deployment
%INCLUDE{ "MulticoreTFReports" section="21052015" }%

Pepe asks how many sites are offering static multicore resources for ATLAS. Alessandra says that concrete numbers could be checked in the monitoring tools, but that most sites do not offer static resources.

---+++ IPv6 Validation and Deployment TF
<br />%INCLUDE{ "WlcgIpv6" section="20150521" }%

---+++ Squid Monitoring and HTTP Proxy Discovery TFs
   * No news; the main developers haven't been able to give it any time lately because of other priorities

---+++ Network and Transfer Metrics WG
<br />%INCLUDE{ "NetworkTransferMetrics" section="21052015" }%

Pepe asks whether this new GGUS SU is the preferred channel to report issues. Marian confirms this is the case. Pepe asks how many incidents have been discussed so far. Marian replies that only one has, the AGLT2 to SARA incident.

---+++ HTTP Deployment TF
The first TF meeting has taken place; the minutes are attached to the agenda, all reachable from the TF home page - https://twiki.cern.ch/twiki/bin/view/LCG/HTTPDeployment

Some first steps were agreed, in particular the identification of an appropriate monitoring solution and the compilation of a full list of functionality that the experiments would like to see delivered via HTTP. The next meeting will focus on the latter topic.

Maria Alandes asks how the monitoring is going to be organised and where the HTTP endpoints are going to be taken from. Oliver explains this hasn't been decided yet. At least LHCb and ATLAS have these defined in their configuration DBs.
Maria adds that Alessandro has been in touch with her to check whether HTTP endpoints could be taken from the BDII instead of being manually inserted in AGIS by sysadmins. She can report on her findings to the TF. Oliver explains this could indeed be interesting, but the configuration DBs show the endpoints that are actually used by the experiments and the TF should focus on those.

---++ Action list
| Description | Responsible | Status | Comments |
| CMS instructions to shifters to be changed so that tickets are not opened if just one CE is red. | C. Wissing | CLOSED | Christoph explains this is now understood among the different parties involved and can now be closed |

---++ AOB

-- Main.MariaALANDESPRADILLO - 2015-05-06

-- Main.MariaALANDESPRADILLO - 2015-05-20
Topic revision: r17 - 2018-02-28 - MaartenLitmaath
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.