Welcome to the CERN-Tier0 Analysis Facility (CAF) for (p)DUNE Users TWiki Home page

General

The (p)DUNE Analysis Facility is located at CERN Tier-0 to provide fast response to latency-critical activities:

  • Diagnosis of detector problems
  • Prompt alignment and calibration
    • export of new constants to Tier-0 and other computing centres worldwide (FNAL) for future data reprocessing
  • Performance services
  • Hot physics analysis

The task of the (p)DUNE Tier-0 system is to perform the prompt reconstruction of the raw data coming from the online data-acquisition system, and to register the raw and derived data with the File Transfer Service (FTS), which then distributes them to the FNAL centre and beyond.

Introduction

Tier0-core/nodes/EOS access

The Neutrino Platform and its prototype experiments at CERN, NP02/NP04, have dedicated cores and storage at CERN Tier-0. Currently available: 1 PB of EOS disk, 6 PB of tape and 1500 cores; from August onwards NP will provide 3 PB of EOS disk space. The machines are normal batch worker nodes with 2 GB of memory per core, and the jobs run alongside other jobs on the batch farm. The batch system is based on HTCondor. The CPUs are a mix of newer and older hardware, typically less than 3 years old, and the typical machine size is 8 cores. For more information on how to access the NP experiments' EOS space, follow this link.

Software Installation

Submitting jobs to NP Tier0 cores

After logging in to the lxplus cluster, you can install larsoft/dunetpc in the same way as on the neutplatform cluster. See instructions 1, 2.
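The linked instructions are authoritative; as a rough orientation only, the setup on lxplus typically looks something like the lines below (the CVMFS path, dunetpc version and qualifiers here are examples/assumptions, not fixed values):

# sketch only: set up larsoft/dunetpc from CVMFS on lxplus (version/qualifiers are examples)
source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh
setup dunetpc v06_68_00 -q e15:prof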

You can find a set of example scripts in the NP GitLab repository. If you have problems accessing it, let me know.

To submit the condor job: 
condor_submit nptest_htcondorjob.sub
Have a look at the self-explanatory comments of the scripts.
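For orientation, a stripped-down submit file might look roughly like the sketch below; the executable name, EOS paths and job flavour are placeholders, not the contents of the actual nptest_htcondorjob.sub:

# minimal HTCondor submit-file sketch (all names and paths below are placeholders)
executable  = run_np_job.sh                                            # hypothetical wrapper script
arguments   = $(ProcId)
output      = /eos/user/y/yourname/np/job.$(ClusterId).$(ProcId).out   # example EOS location
error       = /eos/user/y/yourname/np/job.$(ClusterId).$(ProcId).err
log         = /eos/user/y/yourname/np/job.$(ClusterId).log
+JobFlavour = "workday"                                                # CERN batch run-time category
queue 1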
To examine running jobs, you have several options. The closest analogue to "bpeek" (from the lxbatch system) is "condor_tail <jobID>", which can be used to inspect the standard output (or other files condor knows about) of running jobs. Alternatively, "condor_ssh_to_job <jobID>" drops you into the same sandbox as the running job, allowing you to inspect it as you see fit.
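As a quick sketch (the job ID below is just an example):

condor_q                      # list your jobs and note the ClusterId.ProcId
condor_tail 1234567.0         # peek at the running job's standard output
condor_ssh_to_job 1234567.0   # open a shell inside the job's sandbox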

You can also use EOS for the input/output/log files.
 
Keep in mind: "condor_submit -spool" will just take your files and submit them to the schedd, and won't write to them in the meantime. You can then retrieve the outputs when your job completes using "condor_transfer_data". Alternatively, you can move output files at the end of the job by having the script you submit as the "executable" do it for you.
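A rough sequence for the spool workflow (the cluster ID below is an example):

condor_submit -spool nptest_htcondorjob.sub   # input files are copied to the schedd at submission
condor_q                                      # wait for the job to complete; note the ClusterId
condor_transfer_data 1234567                  # fetch the output files back from the schedd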

To monitor NP02/NP04 batch jobs, follow link1 and link2, respectively.

A collection of useful HTCondor commands can be found here. For more information, have a look at the Quick Start Guide from CERN HTCondor.
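A few commonly used commands, for quick reference (a generic HTCondor selection, not necessarily the full list behind the link):

condor_q                     # show your queued and running jobs
condor_q -better-analyze     # explain why a job is still idle
condor_status                # show the state of the worker nodes
condor_rm <jobID>            # remove a job
condor_history <jobID>       # look up a finished job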

CAF system accounts

Specific CAF subsystem accounts and job-priority schedulers/queues:

For the HTCondor schedds:

The standard condor_schedds that IT provides are not available for login because they hold people's credentials, but they are load-balanced, so in principle they should be fine. There is an option later, if we all agree/want it, to run our own schedd (I'm in favour of this option). This can be handy if, for example, lots of production jobs are submitted from the same machine: a local schedd gives a much faster response.

CAF batch groups with priority shares

The following lxbatch batch groups with priority shares are available for the detector systems and the combined performance groups. The batch group managers are responsible for adding and removing members.

| Detector system | Batch group name (bugroup) | Batch group manager |

| Performance group | Batch group name (bugroup) | Batch group manager |

Data Model

Useful Commands

Monitoring

CERN's HTCondor monitoring is here. The following monitoring links show whether users are using our resources.

Batch Monitoring for NP02

HTCondor monitoring for NP02

Batch Monitoring for NP04

HTCondor monitoring for NP04

Operations

EOS and TAPE

  • For more information on how to access the NP experiments' EOS space, follow this link.
  • For more information on how to access the NP experiments' CASTOR (CERN Advanced STORage manager) space, the data tape storage system used at CERN, follow this link. A minimal copy sketch is shown after this list.
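As a rough sketch of moving files with xrdcp (the host names and paths below are placeholders/assumptions; use the experiment-specific paths given in the links above):

# copy a local file to NP EOS space (example path)
xrdcp myfile.root root://eospublic.cern.ch//eos/experiment/neutplatform/myfile.root
# copy it back from EOS
xrdcp root://eospublic.cern.ch//eos/experiment/neutplatform/myfile.root .
# read a file from CASTOR tape storage (example path)
xrdcp root://castorpublic.cern.ch//castor/cern.ch/neutplatform/myfile.root .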

Miscellanea

Useful links

Contacts

Tier-0 contacts:

In case of problems with the on-call phone, contact the experts directly:

Back to Neutrino Platform Computing Twiki Main Page


Major updates:

-- NectarB - 2017-02-03
