Question
Question 7. Have you explored, or are you exploring, alternative storage technology that you believe could provide significant improvements, such as enhanced performance or reliability, or reduced monetary or support cost (e.g., low-endurance SSDs)? Have you seen, or do you imagine, any impediments to adopting that technology? For example, the technology cannot be deployed because it would not live up to the WLCG MoU requirements as they are currently phrased.
Answers
CERN
Hardware
Consumer disk drives - inconclusive results, and what’s the future of rotating consumer drives anyway? They are difficult to buy in bulk, and the cost advantage vs enterprise is reduced under these circumstances
SMR similarly inconclusive; trade-offs between price, complexity, write speed, etc.
SSDs - we use them in targeted situations, but they are not threatening spinning disk for capacity.
Investigating high-capacity, low-endurance SSDs, e.g. for the CTA use case as a tape buffer. 3.8 TB drives under consideration.
Infrastructure
Infrastructure refinement - reduce overheads by managing network/chassis/head etc per disk. Global price per PB is always the consideration in purchasing. New systems have 192 12TB disks per node.
Redundancy
Erasure coding for EOS has been demonstrated. It can be reconsidered as a purely internal optimisation; under active investigation this year for production use.
Different Ceph S3 regions set up for redundancy, with mirroring. Considered using external cloud in the same way (not pursued).
Static QoS
Different service instances deployed with different QoS.
Ceph specialisation, e.g. 0.5 PB of all-flash CephFS storage. Creation of all-flash and bulk storage classes.
Data lifecycle
Ceph also has data-lifecycle features which may be interesting. No concrete use case yet.
WLCG pledges privilege raw capacity and thus constrain the flexibility to provide smaller, faster installations. This affects many aspects of our systems, e.g. i/o dimensioning for tape (number of drives), configuring disk cache for tape (where throughput is more important than capacity), deploying SSDs for targeted use (e.g. system disks, journals, high-performance analysis interfaces).
There are often infrastructure constraints on available options - e.g. geographically distributed systems cannot realistically use erasure coding.
Without an economic model, users will always simply choose the fastest option (observed with Ceph block devices where different classes are available). Users currently have no notion of cost. Many dimensions of performance are not accounted (currently it’s only capacity) - e.g. metadata i/o is not accounted, this changes how we dimension the services.
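The point above - that users gravitate to the fastest class when only capacity is accounted - can be illustrated with a toy cost model. All rates and user profiles below are invented placeholders; the survey answer only states that capacity is currently the sole accounted dimension.

```python
# Toy storage cost model: account for more dimensions than raw capacity.
# The rates are purely illustrative assumptions, not real CERN tariffs.

def monthly_cost(capacity_tb, metadata_iops, data_iops,
                 rate_capacity=2.0,    # per TB-month (assumed)
                 rate_md_iops=0.01,    # per provisioned metadata IOPS (assumed)
                 rate_io_iops=0.005):  # per provisioned data IOPS (assumed)
    """Cost that charges capacity, metadata I/O and data I/O separately."""
    return (capacity_tb * rate_capacity
            + metadata_iops * rate_md_iops
            + data_iops * rate_io_iops)

# With only capacity accounted, a metadata-heavy user looks "cheap";
# once metadata I/O is charged, the picture changes:
bulk_user = monthly_cost(capacity_tb=500, metadata_iops=100, data_iops=1000)
md_heavy  = monthly_cost(capacity_tb=50, metadata_iops=50000, data_iops=1000)
print(f"bulk: {bulk_user:.2f}  metadata-heavy: {md_heavy:.2f}")
```

Under these assumed rates, the small-capacity but metadata-heavy user accounts for a substantial share of the cost, which would otherwise be invisible in a capacity-only pledge.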
Procurement rules which promote bulk purchasing of standard blocks reduce our ability to tailor services and QoS. For example, dimensioning the Castor disk pool was done on spindles and NICs, and given that all building blocks are standard, the capacity was a by-product of this calculation (and is higher than necessary).
hephy-Vienna
No
UKI-LT2-QMUL
Looked at Ceph as a possible replacement for Lustre, but Lustre fulfils our needs.
UKI-LT2-RHUL
Exploring the idea of no SRM, just a cache.
RO-13-ISS
Depending on the experiment's needs, the main indicator is the cost/TiB, with a certain network/HDD threshold available.
Nebraska
We have not looked at alternate technologies.
INFN-ROMA1
We did explore SSD systems and other technologies, although the funding available is not enough to have large-scale testbeds. We do not foresee any problem to integrate several technologies, as we do it already.
NDGF-T1
No.
BEgrid-ULB-VUB
We have looked into Ceph as a way to replace RAID and enhance the reliability of our mass storage, but it requires a lot more memory than standard RAID+dCache, which was not conceivable given memory prices.
For now, costs are regularly reduced by increasing disk capacity and the number of disks per node. AFAIK, only something like Ceph with erasure coding could increase net storage compared to our current small RAID6 pools while keeping the same safety margins, and hence reduce our storage costs.
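The net-storage comparison hinted at above can be made concrete with a small sketch. The disk counts and erasure-coding profiles below are assumptions chosen for illustration, not this site's actual layout.

```python
# Usable fraction of raw disk: small RAID6 pools vs a Ceph
# erasure-coded pool. Disk counts and EC profiles are assumed
# examples, not the site's real configuration.

def raid6_usable_fraction(disks_per_group):
    # RAID6 dedicates 2 parity disks per group.
    return (disks_per_group - 2) / disks_per_group

def ec_usable_fraction(k, m):
    # An EC profile k+m stores m coding chunks per k data chunks.
    return k / (k + m)

print(f"RAID6 (8 disks): {raid6_usable_fraction(8):.0%}")   # 75%
print(f"Ceph EC 8+3:     {ec_usable_fraction(8, 3):.0%}")   # 73%
print(f"Ceph EC 16+3:    {ec_usable_fraction(16, 3):.0%}")  # 84%
```

The sketch shows why wide EC profiles are attractive: a 16+3 profile yields more usable capacity than an 8-disk RAID6 group while tolerating three concurrent failures instead of two, at the price of wider stripes and more nodes involved per object.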
NCG-INGRID-PT
We are using NVMe storage in certain nodes for higher performance. Low endurance might be useful for high-read, low-write access; it likely will not work for caching.
We are looking at new software-defined storage, like OpenIO, to improve the performance of nearline SAS and cut the cost of performance.
LRZ-LMU
We did not test on a larger scale
CA-WATERLOO-T2
Don't see non-spinning storage costs being competitive for ~2PB+ for a while
CA-VICTORIA-WESTGRID-T2
Tentative minimal Ceph exploration
Taiwan_LCG2
Nothing but caching mechanisms. So, I think WLCG storage evolution is on the right track.
No
MPPMU
INFN-LNL-2
Australia-ATLAS
No, because we have no money. We would like to use the University's S3-compatible storage.
For permanent storage we will stick with HDDs; for cache we plan to migrate to SSDs (smaller volume). For sequential access (streaming, transfers), HDD performance is sufficient; for direct/random I/O, SSDs provide a significant performance boost. Ceph tiering technologies will be explored, in addition to automatic cache prefetching from HDD to SSD.
KR-KISTI-GSDC-02
I have heard that there will be an increase in JBODs with software RAID because of the cost savings. I agree that cost is a big problem; first of all, we need to focus on this problem.
UKI-LT2-IC-HEP
No
UKI-SOUTHGRID-BRIS-HEP
no
GR-07-UOI-HEPLAB
UKI-SOUTHGRID-CAM-HEP
No - site has to move to "storage-less" solution for viability.
USC-LCG2
Not in the context of WLCG
EELA-UTFSM
DESY-ZN
We are looking at it, but the current technology still provides the best ratio in price/performance for us
PSNC
No.
UAM-LCG2
No. No impediments.
T2_HU_BUDAPEST
no
INFN-Bari
NO
IEPSAS-Kosice
No, we haven't explored alternative storage technology.
We already use some alternative storage technologies to provide storage services for non-WLCG activities (SSD, black-box-type NAS or SAN). Nevertheless, these usages are today not in the same range as WLCG usage (smaller capacities, lower performance, other requirements). We have not yet performed the exercise of imagining this type of solution in the WLCG context.
WEIZMANN-LCG2
No. For us the simplicity of a uniform storage system, despite costing more for the equipment, is paramount because of our limited manpower.
RU-SPbSU
USCMS_FNAL_WC1
Certainly there are cheaper storage technologies that do not live up to the WLCG MoU requirements as they are currently phrased.
RRC-KI-T1
vanderbilt
no
UNIBE-LHEP
No
CA-SFU-T2
Yes. Deploying dCache on Dell SBB3s seems to be the most economical way, and at the same time quite reliable.
_CSCS-LCG2
No
T2_BR_SPRACE
No
T2_BR_UERJ
We do not explore those alternatives. At the moment it is not possible to think about adopting/testing new storage alternatives (financial crisis in Brazil).
GSI-LCG2
These technologies are not yet attractive for us.
UKI-NORTHGRID-LIV-HEP
No
CIEMAT-LCG2
We are beginning to purchase SSDs for our worker nodes as local disk, and we are thinking (as a future goal) about using Ceph as a standalone global file system and/or underlying our dCache instance.
T2_US_Purdue
We have tested SMR (shingled) disks as a direct replacement for our current HDDs.
SMR technology promises cost savings of the order of 10-15%, which we consider significant.
Current support for host-managed (HM) SMR disks in CentOS 7 is minimal, and it required significant effort on our side to accommodate that technology. Availability of HM-SMR disks is not great. Device-managed (DM) SMR works out of the box, but currently offers no enterprise-class disks, and I/O performance is significantly lower, albeit compatible with our typical CMS workflows.
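To put the quoted 10-15% saving in context, a back-of-the-envelope calculation shows what it means at site scale. The capacity and per-TB price below are invented placeholders; only the 10-15% range comes from the answer above.

```python
# Rough estimate of site-level savings from SMR's lower per-TB cost.
# The 5 PB capacity and 20 currency-units/TB price are assumptions
# for illustration; only the 10-15% saving range is from the survey.

def smr_saving(capacity_pb, price_per_tb, saving_fraction):
    """Absolute saving for a given deployment size and disk price."""
    raw_tb = capacity_pb * 1000
    return raw_tb * price_per_tb * saving_fraction

for frac in (0.10, 0.15):
    saving = smr_saving(capacity_pb=5, price_per_tb=20.0, saving_fraction=frac)
    print(f"{frac:.0%} saving on 5 PB at 20/TB: {saving:.0f}")
```

Even at the low end of the range, the saving scales linearly with deployed capacity, which is why a single-digit-percentage per-disk discount can be "significant" for a multi-PB site.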
We are also exploring CEPH as a replacement of HDFS, to benefit from its erasure coding capability.
No
TRIUMF-LCG2
From a hardware technology perspective, we have been using broadly the same technology for the bulk of our storage systems over the span of many years (i.e. Fibre Channel) and followed industry trends and standards, which has been effective in achieving stable Tier-1 operations. dCache has been quite good for a number of years.
We have explored clustered file systems like Ceph, which provide a certain level of performance and reliability; however, a full and optimal Tier-1 deployment at scale, with proper integration with the ATLAS data management system, is not trivial. The protocols supported are also limited.
We do have an XCache setup for testing, but how to fit it into ATLAS production is not clear. One of the intents was to set up a regional cache that could be used by the west-coast Tier-2 (or Tier-3) centres. The cached storage capacity would be of the order of ~PB scale if needed.
KR-KISTI-GSDC-01
We are currently exploring a high-density JBOD enclosure (at least 60 disks in a box) to ensure maximum storage capacity within a budget. We are also looking for a reliable software-defined storage solution, such as Ceph, Gluster, or EOS, that can be deployed on top of JBOD enclosures. One concern is the reliability of the storage with the new configuration, since we must meet the WLCG MoU requirements as a Tier-1.
GRIF
No
no
No pertinent inputs on this question
No
ZA-CHPC
We did, but have reverted to EOS for everything, for simplicity of administration.
JINR-T1
N/A
praguelcg2
No.
UKI-NORTHGRID-LIV-HEP
We don’t see any possibility of using storage systems other than magnetic HDDs due to budgetary constraints.
INDIACMS-TIFR
exploring EOS
TR-10-ULAKBIM
I am not sure they will provide improvements.
prague_cesnet_lcg2
We have not explored this, but I do not see any problem with it.
TR-03-METU
No
aurora-grid.lunarc.lu.se
I'm testing BeeGFS.
SARA-MATRIX_NKHEF-ELPROD__NL-T1_
No. Yes.
No.
DESY-HH
We explore storage technologies and will continue in the future.
T3_PSI_CH
SAMPA
Yes, I've been looking at some storage technology alternatives. The only impediment could be the WLCG MoU requirements.
INFN-T1
We are investigating alternative deployment models of distributed filesystems on top of RAID-less systems (disk servers + JBODs). The aim is to understand whether we can maintain high-availability standards at a lower price (considering both cost of ownership and operation). Work is in progress and we still do not have a definite answer.
GLOW
No
UNI-FREIBURG
In 2016 we tried low-endurance SSDs as /tmp (I know - this is NOT storage) in our non-WLCG cluster - not reliable. We replaced them all in 2018 with semi-enterprise-quality SSDs.
I see no point in using SSDs for basic storage yet.
Ru-Troitsk-INR-LCG2
No
T2_Estonia
We are happy with our storage technology. After we moved scratch from local to central storage, we have had no more performance issues, and it has been quite reliable. It is already very cost-effective. Changing that technology would mean changing our whole site; I think that is neither possible nor cost-effective.
pic
We are exploring Ceph, but no alternative storage technology is being tested at PIC. The main impediment is monetary: given the available funding, exploring new endeavours is typically difficult. We provide the WLCG MoU requirements in the most optimal way we've found, and we keep an eye on and participate in working groups dealing with new technologies, which could eventually be adopted at our site.
ifae
We are exploring Ceph, but no alternative storage technology is being tested at PIC. The main impediment is monetary: given the available funding, exploring new endeavours is typically difficult. We provide the WLCG MoU requirements in the most optimal way we've found, and we keep an eye on and participate in working groups dealing with new technologies, which could eventually be adopted at our site.
NCBJ-CIS
We are not exploring any alternatives at the moment.
RAL-LCG2
RAL has already done this with the deployment of Echo, our Ceph-backed disk storage. We have optimised it for high throughput, and it has massively improved performance over our previous setup. It also has far better reliability and data availability.
The object store model is successfully used in the commercial sector to scale far beyond the current needs of the WLCG. We believe our storage is comfortably scalable to meet the requirements of the HL-LHC.
Ceph is used widely within the RAL site (far beyond the Tier-1). It provides a backend for the STFC Cloud, as well as multiple CephFS instances for local users. The effort to maintain a Ceph cluster is just a small part of a large group.
There were significant hurdles to implement this solution. The main issues were:
1) We did not offer an SRM or support for deprecated functionality such as the lcg-* commands. Despite experiments saying they didn't need them, it was often found that they still relied on an old feature, and we had to wait for them to fix it.
2) Experiments still basically work on the assumption that the underlying storage is a file system. Object stores are different from traditional file systems; however, they meet the needs of WLCG workflows very well. In some cases, things need to be done slightly differently, and this has met with a mixed response from experiments. Mostly they are very supportive, but there are always edge cases.
3) There was a lot of misunderstanding about how it worked, and a general resistance to change. We are still told it is more expensive/less performant/harder to support than other solutions, despite this not being the case.
T2_IT_Rome
We have not explored alternatives.
BNL-ATLAS
We are exploring Ceph with erasure-code support, and dCache using the libRados interface, to serve storage at the 10 PB scale. We are also evaluating Lustre for potential use in HPC and HTC environments in the future.
FZK-LCG2
We have been investigating object storage as a low-performance storage tier; however, at the high capacities we have to provide, the price of the disks is the dominating factor. We will keep investigating cheaper storage solutions based on erasure coding.
INFN-NAPOLI-ATLAS
No alternative technologies explored yet
--
OliverKeeble - 2019-08-22