Question
Question 7. Have you explored, or are you exploring, alternative storage technology that you believe could provide significant improvements, such as enhanced performance or reliability, or reduced monetary or support cost (e.g., low-endurance SSDs)? Have you seen, or do you imagine, any impediments to adopting that technology? For example, the technology cannot be deployed because it would not live up to the WLCG MoU requirements as they are currently phrased.
Answers
CERN
Hardware
Consumer disk drives - inconclusive results, and what’s the future of rotating consumer drives anyway? They are difficult to buy in bulk, and the cost advantage vs enterprise is reduced under these circumstances
SMR similarly inconclusive; trade-offs between price, complexity, write speed, etc.
SSDs - we use them in targeted situations, but they are not threatening spinning disk for capacity.
Investigating high-capacity, low-endurance SSDs, e.g. for the CTA use case as a tape buffer. 3.8 TB drives under consideration.
Infrastructure
Infrastructure refinement - reduce overheads by managing network/chassis/head etc per disk. Global price per PB is always the consideration in purchasing. New systems have 192 12TB disks per node.
Redundancy
Erasure coding for EOS has been demonstrated. It can be reconsidered as a purely internal optimisation; under active investigation this year for production use.
Different Ceph S3 regions set up for redundancy, with mirroring. Considered using external cloud in the same way (not pursued).
Static QoS
Different service instances deployed with different QoS.
Ceph specialisation, e.g. 0.5 PB of all-flash CephFS storage. Creation of all-flash and bulk storage classes.
Data lifecycle
Ceph also has data-lifecycle features which may be interesting. No concrete use case yet.
WLCG pledges privilege raw capacity and thus constrain the flexibility to provide smaller, faster installations. This affects many aspects of our systems, e.g. i/o dimensioning for tape (number of drives), configuring disk cache for tape (where throughput is more important than capacity), deploying SSDs for targeted use (e.g. system disks, journals, high-performance analysis interfaces).
There are often infrastructure constraints on available options - e.g. geographically distributed systems cannot realistically use erasure coding.
Without an economic model, users will always simply choose the fastest option (observed with Ceph block devices where different classes are available). Users currently have no notion of cost. Many dimensions of performance are not accounted (currently it’s only capacity) - e.g. metadata i/o is not accounted, this changes how we dimension the services.
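The point above - that users gravitate to the fastest class when only capacity is accounted - can be illustrated with a toy cost model. All rates and user profiles below are invented placeholders; the survey answer only states that capacity is currently the sole accounted dimension.

```python
# Toy storage cost model: account for more dimensions than raw capacity.
# The rates are purely illustrative assumptions, not real CERN tariffs.

def monthly_cost(capacity_tb, metadata_iops, data_iops,
                 rate_capacity=2.0,    # per TB-month (assumed)
                 rate_md_iops=0.01,    # per provisioned metadata IOPS (assumed)
                 rate_io_iops=0.005):  # per provisioned data IOPS (assumed)
    """Cost that charges capacity, metadata I/O and data I/O separately."""
    return (capacity_tb * rate_capacity
            + metadata_iops * rate_md_iops
            + data_iops * rate_io_iops)

# With only capacity accounted, a metadata-heavy user looks "cheap";
# once metadata I/O is charged, the picture changes:
bulk_user = monthly_cost(capacity_tb=500, metadata_iops=100, data_iops=1000)
md_heavy  = monthly_cost(capacity_tb=50, metadata_iops=50000, data_iops=1000)
print(f"bulk: {bulk_user:.2f}  metadata-heavy: {md_heavy:.2f}")
```

Under these assumed rates, the small-capacity but metadata-heavy user accounts for a substantial share of the cost, which would otherwise be invisible in a capacity-only pledge.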
Procurement rules which promote bulk purchasing of standard blocks reduce our ability to tailor services and QoS. For example, dimensioning the Castor disk pool was done on spindles and NICs, and given that all building blocks are standard, the capacity was a by-product of this calculation (and is higher than necessary).
hephy-Vienna
No
UKI-LT2-QMUL
Looked at Ceph as a possible replacement for Lustre, but Lustre fulfils our needs.
UKI-LT2-RHUL
Exploring the idea of no SRM, just a cache.
RO-13-ISS
Depending on the experiment's needs, the main indicator is the cost/TiB, with a certain network/HDD threshold available.
Nebraska
We have not looked at alternate technologies.
INFN-ROMA1
We did explore SSD systems and other technologies, although the funding available is not enough to have large-scale testbeds. We do not foresee any problem to integrate several technologies, as we do it already.
NDGF-T1
No.
BEgrid-ULB-VUB
We have looked into Ceph as a way to replace RAID and enhance the reliability of our mass storage, but it requires a lot more memory than standard RAID+dCache, which was not conceivable given memory prices.
For now, costs are regularly reduced by increasing disk capacity and the number of disks per node. AFAIK, only something like Ceph with erasure coding could increase net storage compared to our current small RAID6 pools while keeping the same safety margins, and hence reduce our storage costs.
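The net-storage comparison hinted at above can be made concrete with a small sketch. The disk counts and erasure-coding profiles below are assumptions chosen for illustration, not this site's actual layout.

```python
# Usable fraction of raw disk: small RAID6 pools vs a Ceph
# erasure-coded pool. Disk counts and EC profiles are assumed
# examples, not the site's real configuration.

def raid6_usable_fraction(disks_per_group):
    # RAID6 dedicates 2 parity disks per group.
    return (disks_per_group - 2) / disks_per_group

def ec_usable_fraction(k, m):
    # An EC profile k+m stores m coding chunks per k data chunks.
    return k / (k + m)

print(f"RAID6 (8 disks): {raid6_usable_fraction(8):.0%}")   # 75%
print(f"Ceph EC 8+3:     {ec_usable_fraction(8, 3):.0%}")   # 73%
print(f"Ceph EC 16+3:    {ec_usable_fraction(16, 3):.0%}")  # 84%
```

The sketch shows why wide EC profiles are attractive: a 16+3 profile yields more usable capacity than an 8-disk RAID6 group while tolerating three concurrent failures instead of two, at the price of wider stripes and more nodes involved per object.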
NCG-INGRID-PT
We are using NVMe storage in certain nodes for higher performance. Low endurance might be useful for high-read, low-write access; it likely will not work for caching.
We are looking at new software-defined storage, like OpenIO, to improve the performance of nearline SAS and cut the cost of performance.
LRZ-LMU
We did not test on a larger scale
CA-WATERLOO-T2
Don't see non-spinning storage costs being competitive for ~2PB+ for a while
CA-VICTORIA-WESTGRID-T2
Tentative minimal Ceph exploration
Taiwan_LCG2
Nothing but caching mechanisms. So, I think WLCG storage evolution is on the right track.
No
MPPMU
INFN-LNL-2
Australia-ATLAS
No, because we have no money. We would like to use the University's S3-compatible storage.
For permanent storage we will stick with HDDs; for cache we plan to migrate to SSDs (smaller volume). For sequential access (streaming, transfers), HDD performance is sufficient; for direct/random I/O, SSDs provide a significant performance boost. Ceph tiering technologies will be explored, in addition to automatic cache prefetching from HDD to SSD.
KR-KISTI-GSDC-02
I have heard that there will be an increase in JBODs with software RAID because of the cost savings. I agree that cost is a big problem; first of all, we need to focus on this problem.
UKI-LT2-IC-HEP
No
UKI-SOUTHGRID-BRIS-HEP
no
GR-07-UOI-HEPLAB
UKI-SOUTHGRID-CAM-HEP
No - site has to move to "storage-less" solution for viability.
USC-LCG2
Not in the context of WLCG
EELA-UTFSM
DESY-ZN
We are looking at it, but the current technology still provides the best ratio in price/performance for us
PSNC
No.
UAM-LCG2
No. No impediments.
T2_HU_BUDAPEST
no
INFN-Bari
NO
IEPSAS-Kosice
No, we haven't explored alternative storage technology.
We already use some alternative storage technologies to provide storage services for non-WLCG activities (SSD, black-box-type NAS or SAN). Nevertheless, these usages are today not in the same range as WLCG usage (smaller capacities, lower performance, other requirements). We have not yet performed the exercise of imagining this type of solution in the WLCG context.
WEIZMANN-LCG2
No. For us the simplicity of a uniform storage system, despite costing more for the equipment, is paramount because of our limited manpower.
RU-SPbSU
USCMS_FNAL_WC1
Certainly there are cheaper storage technologies that do not live up to the WLCG MoU requirements as they are currently phrased.
RRC-KI-T1
vanderbilt
no
UNIBE-LHEP
No
CA-SFU-T2
Yes. Deploying dCache on Dell SBB3s seems to be the most economical way, and at the same time quite reliable.
_CSCS-LCG2
No
T2_BR_SPRACE
No
T2_BR_UERJ
We do not explore those alternatives. At the moment it is not possible to think about adopting/testing new storage alternatives (financial crisis in Brazil).
GSI-LCG2
These technologies are not yet attractive for us.
UKI-NORTHGRID-LIV-HEP
No
CIEMAT-LCG2
We are beginning to purchase SSDs for our worker nodes as local disk, and we are thinking (as a future goal) about using Ceph as a standalone global file system and/or underlying our dCache instance.
T2_US_Purdue
We have tested SMR (shingled) disks as a direct replacement for our current HDDs.
SMR technology promises cost savings of the order of 10-15%, which we consider significant.
Current support for host-managed (HM) SMR disks in CentOS 7 is minimal, and it required significant effort on our side to accommodate that technology. Availability of HM-SMR disks is not great. Device-managed (DM) SMR works out of the box, but currently offers no enterprise-class disks, and I/O performance is significantly lower, albeit compatible with our typical CMS workflows.
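To put the quoted 10-15% saving in context, a back-of-the-envelope calculation shows what it means at site scale. The capacity and per-TB price below are invented placeholders; only the 10-15% range comes from the answer above.

```python
# Rough estimate of site-level savings from SMR's lower per-TB cost.
# The 5 PB capacity and 20 currency-units/TB price are assumptions
# for illustration; only the 10-15% saving range is from the survey.

def smr_saving(capacity_pb, price_per_tb, saving_fraction):
    """Absolute saving for a given deployment size and disk price."""
    raw_tb = capacity_pb * 1000
    return raw_tb * price_per_tb * saving_fraction

for frac in (0.10, 0.15):
    saving = smr_saving(capacity_pb=5, price_per_tb=20.0, saving_fraction=frac)
    print(f"{frac:.0%} saving on 5 PB at 20/TB: {saving:.0f}")
```

Even at the low end of the range, the saving scales linearly with deployed capacity, which is why a single-digit-percentage per-disk discount can be "significant" for a multi-PB site.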
We are also exploring CEPH as a replacement of HDFS, to benefit from its erasure coding capability.
No
TRIUMF-LCG2
From a hardware technology perspective, we have been using broadly the same technology for the bulk of our storage systems over the span of many years (i.e. Fibre Channel) and followed industry trends and standards, which has been effective in achieving stable Tier-1 operations. dCache has been quite good for a number of years.
We have explored clustered file systems like Ceph, which provide a certain level of performance and reliability; however, a full and optimal Tier-1 deployment at scale, with proper integration with the ATLAS data management system, is not trivial. The protocols supported are also limited.
We do have an XCache setup for testing, but how to fit it into ATLAS production is not clear. One of the intents was to set up a regional cache that could be used by the west-coast Tier-2 (or Tier-3) centres. The cached storage capacity would be of the order of ~PB scale if needed.
KR-KISTI-GSDC-01
We are currently exploring a high-density JBOD enclosure (at least 60 disks in a box) to ensure maximum storage capacity within a budget. We are also looking for a reliable software-defined storage solution, such as Ceph, Gluster, or EOS, that can be deployed on top of JBOD enclosures. One concern is the reliability of the storage with the new configuration, since we must meet the WLCG MoU requirements as a Tier-1.
GRIF
No
no
No pertinent inputs on this question
No
ZA-CHPC
We did, but have reverted to EOS for everything, for simplicity of administration.
JINR-T1
N/A
praguelcg2
No.
UKI-NORTHGRID-LIV-HEP
We don’t see any possibility of using storage systems other than magnetic HDDs due to budgetary constraints.
INDIACMS-TIFR
exploring EOS
TR-10-ULAKBIM
I am not sure they will provide improvements.
prague_cesnet_lcg2
We have not explored this, but I do not see any problem with it.
TR-03-METU
No
aurora-grid.lunarc.lu.se
I'm testing BeeGFS.
SARA-MATRIX_NKHEF-ELPROD__NL-T1_
No. Yes.
No.
DESY-HH
We explore storage technologies and will continue in the future.
T3_PSI_CH
SAMPA
Yes, I've been looking at some storage technology alternatives. The only impediment could be the WLCG MoU requirements.
INFN-T1
We are investigating alternative deployment models of distributed filesystems on top of RAID-less systems (disk servers + JBODs). The aim is to understand whether we can maintain high-availability standards at a lower price (considering both cost of ownership and operation). Work is in progress and we still do not have a definite answer.
GLOW
No
UNI-FREIBURG
In 2016 we tried low-endurance SSDs as /tmp (I know - this is NOT storage) in our non-WLCG cluster - not reliable. We replaced them all in 2018 with semi-enterprise-quality SSDs.
I see no point in using SSDs for basic storage yet.
Ru-Troitsk-INR-LCG2
No
T2_Estonia
We are happy with our storage technology. After we moved scratch from local to central storage, we have had no more performance issues, and it has been quite reliable. It is already very cost-effective. Changing that technology would mean changing our whole site; I think that is neither possible nor cost-effective.
pic
We are exploring Ceph, but no alternative storage technology is being tested at PIC. The main impediment is monetary: given the available funding, exploring new endeavours is typically difficult. We provide the WLCG MoU requirements in the most optimal way we've found, and we keep an eye on and participate in working groups dealing with new technologies, which could eventually be adopted at our site.
ifae
We are exploring Ceph, but no alternative storage technology is being tested at PIC. The main impediment is monetary: given the available funding, exploring new endeavours is typically difficult. We provide the WLCG MoU requirements in the most optimal way we've found, and we keep an eye on and participate in working groups dealing with new technologies, which could eventually be adopted at our site.
NCBJ-CIS
We are not exploring any alternatives at the moment.
RAL-LCG2
RAL has already done this with the deployment of Echo, our Ceph-backed disk storage. We have optimised it for high throughput, and it has massively improved performance over our previous setup. It also has far better reliability and data availability.
The object store model is successfully used in the commercial sector to scale far beyond the current needs of the WLCG. We believe our storage is comfortably scalable to meet the requirements of the HL-LHC.
Ceph is used widely within the RAL site (far beyond the Tier-1). It provides a backend for the STFC Cloud, as well as multiple CephFS instances for local users. The effort to maintain a Ceph cluster is just a small part of a large group.
There were significant hurdles to implement this solution. The main issues were:
1) We did not offer an SRM or support for deprecated functionality such as the lcg-* commands. Despite experiments saying they didn't need them, it was often found that they still relied on an old feature, and we had to wait for them to fix it.
2) Experiments still basically work on the assumption that the underlying storage is a file system. Object stores are different from traditional file systems; however, they meet the needs of WLCG workflows very well. In some cases, things need to be done slightly differently, and this has met with a mixed response from experiments. Mostly they are very supportive, but there are always edge cases.
3) There was a lot of misunderstanding about how it worked, and a general resistance to change. We are still told it is more expensive/less performant/harder to support than other solutions, despite this not being the case.
T2_IT_Rome
We have not explored alternatives.
BNL-ATLAS
We are exploring Ceph with erasure-code support, and dCache using the libRados interface, to serve storage at the 10 PB scale. We are also evaluating Lustre for potential use in HPC and HTC environments in the future.
FZK-LCG2
We have been investigating object storage as a low-performance storage tier; however, at the high capacities we have to provide, the price of the disks is the dominating factor. We will keep investigating cheaper storage solutions based on erasure coding.
INFN-NAPOLI-ATLAS
No alternative technologies explored yet
--
OliverKeeble - 2019-08-22