Question
Question 4. If you want to, please let us know the approximate effort (in terms of FTEs) needed to support storage at your site, as currently deployed and at the current capacity. If it makes sense, you may split the effort between any VO-specific support effort and generic (VO-agnostic) effort.
Answers
CERN
EOS - Difficult to quantify and not very representative (complex mix of dev, ops and fabric management).
Ceph - ~2 FTE on operations
Castor - ~2 FTE across disk and tape ops
hephy-Vienna
0.5 FTE.
KI-LT2-QMUL
2/3 FTE (including procurement, installation, configuration, maintenance)
UKI-LT2-RHUL
0.25 FTE
RO-13-ISS
Nebraska
Storage support varies from week to week; it likely averages between 0.25 and 0.5 FTE. Commissioning new storage devices can be somewhat time-consuming, but the bulk of our storage effort is in monitoring and replacing failed media.
INFN-ROMA1
About 0.5 FTE
NDGF-T1
2
BEgrid-ULB-VUB
0.8 FTE
NCG-INGRID-PT
1 FTE
The effort required to provide the storage is very low, as most of the actions are automated. It is less than 0.2 FTE.
LRZ-LMU
1.0
CA-WATERLOO-T2
1 FTE (0.8 FTE for the main grid admin + 0.2 FTE from other sysadmins)
CA-VICTORIA-WESTGRID-T2
0.5
Taiwan_LCG2
1. We will need at least two FTEs to maintain our storage system. However, the ideal number would be three.
2. Yes, it makes some sense to split the effort between VO-specific support and generic support.
If we estimate storage management at 50% of the total effort for the site, this would be 0.5 FTE or less
MPPMU
0.5 FTE for 3 PB
INFN-LNL-2
Australia-ATLAS
0.3
0.2 FTE for hardware support only. Central management performed by NDGF-T1 team.
KR-KISTI-GSDC-02
VO-specific support effort : generic effort = 20:80
UKI-LT2-IC-HEP
0.5
UKI-SOUTHGRID-BRIS-HEP
0.5
GR-07-UOI-HEPLAB
UKI-SOUTHGRID-CAM-HEP
Approx. 0.3 FTE (all generic)
USC-LCG2
EELA-UTFSM
DESY-ZN
0.2
PSNC
2 FTE, but this will be extended due to specific VO requests
UAM-LCG2
0.5 FTE
T2_HU_BUDAPEST
INFN-Bari
0.5 FTE
IEPSAS-Kosice
Approximate effort (in terms of FTEs) needed to support storage: 2
At current capacity: 1
It is not easy to answer this question, because we also provide many other storage solutions for non-WLCG experiments, sometimes with the same technology (dCache, XRootD) and sometimes not (iRODS, GPFS, …). The storage team is in charge of providing this full set of solutions, and the split cannot be done easily.
WEIZMANN-LCG2
10%
RU-SPbSU
USCMS_FNAL_WC1
Approximately 3 FTE supporting hardware and services for disk storage (the questions seem to imply we're only talking about disk here).
RRC-KI-T1
vanderbilt
1.0
UNIBE-LHEP
CA-SFU-T2
About 0.1 FTE (we've had many problems with dCache lately), for 3 VOs
CSCS-LCG2
1.5 FTE
T2_BR_SPRACE
T2_BR_UERJ
1 FTE
GSI-LCG2
About 3-4 FTEs, also including the maintenance of the Lustre backend.
CIEMAT-LCG2
Aggregating the effort required for hardware-related tasks (deployment and maintenance), OS installations, middleware deployment and management, and general operation tasks, probably ~2 FTEs are required to support storage (this does not include networking-related activities). In our case, CMS represents ~90% of storage usage, and supporting other (basically, local) communities probably only adds ~10% of additional effort.
T2_US_Purdue
25%
0.8 FTE. This should be VO-agnostic but, in reality, it is more ATLAS-specific, in the sense that 90% of the total capacity is for ATLAS.
TRIUMF-LCG2
Overall, a baseline of 2 FTEs to support all aspects of the Tier-1 storage infrastructure. We only support ATLAS.
KR-KISTI-GSDC-01
As mentioned above, since we are only supporting the ALICE VO now, we cannot split our effort between different VOs. Currently we have 1.5 FTEs for storage management and operations: 1 FTE for storage hardware (daily maintenance, hardware procurement, installation and configuration, vendor contact, etc.) and 0.5 FTE for storage operation (XRootD and EOS maintenance and operations, updates and upgrades, etc.).
GRIF
0.8 FTE
0.5 FTE
ATLAS and ALICE VO support => ~0.3 FTE
ZA-CHPC
0.1
JINR-T1
3 FTE
praguelcg2
2 FTEs
UKI-NORTHGRID-LIV-HEP
Approximately 0.25 FTE, mostly maintaining the current system.
INDIACMS-TIFR
The entire site is managed by 2 admins.
TR-10-ULAKBIM
0.2
prague_cesnet_lcg2
0.5 FTE
TR-03-METU
0.2
aurora-grid.lunarc.lu.se
SARA-MATRIX / NIKHEF-ELPROD (NL-T1)
1 FTE VO-agnostic, 0.7 FTE VO-specific
DESY-HH
T3_PSI_CH
1 FTE for the whole support of T3
SAMPA
10 FTE per day
INFN-T1
4 FTE for SW and HW support + 2 FTE for VO-specific support
GLOW
0.5
UNI-FREIBURG
n.a.
Ru-Troitsk-INR-LCG2
T2_Estonia
HDFS needs an occasional disk swap when a disk fails; otherwise it runs quite well and rarely requires maintenance.
Ceph still has multiple problems with load/configuration and needs more attention. It is hard to say how much time; it depends on the problem/incident.
pic
We have approximately 1.5 FTEs to operate disk and tape resources. Regular system updates are VO-agnostic, and they require time for testing and validation before applying changes and/or deploying new versions. New hardware installations might be VO-dependent (indeed, 75% of the deployed disk resources are used by the LHC). Tape recycles/repacks are VO-dependent; however, the deployed tools are VO-agnostic. We would say a 60-40 split between VO-specific and VO-agnostic effort applies in our case.
ifae
The ifae site is hosted at PIC; the same answer applies (see pic above).
NCBJ-CIS
Not much - 0.1 FTE for all grid-related activities.
Echo: 2.5 FTE
Castor: 2.1 FTE
T2_IT_Rome
BNL-ATLAS
N/A
FZK-LCG2
INFN-NAPOLI-ATLAS
about 0.33 FTE
--
OliverKeeble - 2019-08-22