My collection point for Hadoop-related projects around CERN-IT
(personal notes, no official information here; the "real" page is at
ItHadoop)
Infrastructure
- old "ahc" cluster in CDB:
- "hadoop" cluster in Puppet: Cloudera CDH4.1.2
- Namenode: lxbrf39c04
- see "CDBHosts --puppet_hostgroup=hadoop/datanode" for the rest of the nodes
- IT-DB ex-RAC7
Projects / Test
The WLCG Database TEG, in its final report (Apr 2012), recommended that IT
"..deploy a suitably sized Hadoop cluster, [.. and] Hadoop clients, including pig/hive should be made
available on user interfaces (lxplus), together with a reasonably sized HBase installation. We make
no operational requirements on the cluster [..]"
(This TEG was chaired by Dario, hence some overlap with ATLAS wishes.)
- chat with IanB: under which conditions could IT offer this as a service?
- clearly manpower-limited: application-level support and everything above it must come from the experiment.
- split instances might work (a self-serve model for new Hadoop instances), as long as this does not mean 1 SM per instance, and as long as the resources come from the experiment allocation (probably wall-clock time for CPU while the instance is up, plus reserved storage)
- ideal: a shared instance, e.g. on batch nodes (use some cores for Hadoop and the rest for batch; closer to the sweet spot for Hadoop). Need to see which productionizing steps are required, and whether the (inevitable) clashes and overlapping utilization are harmful
- security on
- accounting: need to report the actual storage and CPU use (and ideally deduct it from the CASTOR/EOS and LXBATCH quotas)
ATLAS
ATLAS formally asked for IT support around Hadoop (memo from B. Kersevan and H. von der Schmitt
to I. Bird, 2013-01-30), initially for prototyping, in 3 areas:
- PanDA logging: archival and querying of job and file records in PanDA (in development)
- EventIndex (starting design)
- Distributed Data Management (DDM) accounting and related activities (already in test production)
ATLAS-TAG
ATLAS-DDM
(see memo for current setup)
IT-internal
AI monitoring/GNI
CASTOR logviewer
OpenLab
- Bob: interest expressed by 2 partners, no project yet
- initial testing by Maaike (openlab fellow, worked with IT-DB/Oracle): "physics analysis inside databases"
Topic revision: r5 - 2013-05-28 - JanIven