ArdaGrid Web>DIANE>DIANETutorial (2011-02-10, JakubMoscicki)

DIANE Tutorial

Introduction
Querying the master
Submitting more workers (adding resources to the master)
Using the worker agent factory
Task monitoring
Configure runtime parameters and job scheduling policies
Advanced scheduling with job requirements: matching task and worker capabilities

Introduction

DIANE is a tool for managing a large number of small independent tasks (typically for parametric studies). It is based on a master-worker scheme, which can shorten application execution time and provide partial fault tolerance (in comparison with the built-in gLite parametric job). DIANE is very flexible, and power users may customize almost any aspect of it, including scheduling and application wrappers.

This tutorial covers only the basics: how to run simple executable applications from an ordinary user's point of view.

If you want to know more, here is some further reading. You do not have to read it now to continue the tutorial:

more on the DIANE computing model in the reference manual at http://cern.ch/diane/reference.php
in case of problems, first see the FAQ and Cookbook

The basics

A DIANE run consists of a master and worker agents. The master is a mini-server which is started automatically when the processing starts and goes away when the processing terminates. The worker agents typically run as jobs (on the Grid or on a batch system). The master supervises the processing of your tasks, making sure that all tasks are completed. The worker agents provide the CPU power to do the tasks. If tasks fail or workers die for any reason, the master automatically reassigns the tasks to other workers. As a user you can easily control the master's policies in this respect.
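
The reassignment behaviour described above can be illustrated with a toy model. This is only a sketch of the master-worker idea, not DIANE's scheduler: the function run_master, the worker names and the "fails" set are all hypothetical, and a worker is simply assumed to die on its first failed task.

```python
from collections import deque

def run_master(tasks, workers, fails):
    """Toy master loop (hypothetical, not DIANE code).

    fails is a set of (worker, task) pairs: that attempt fails and the
    worker is assumed to die, so the master hands the task to another worker.
    Returns (done, leftover): task -> worker that completed it, and the
    tasks that could not be completed because no workers were left.
    """
    pending = deque(tasks)
    alive = list(workers)
    done = {}
    turn = 0
    while pending and alive:
        worker = alive[turn % len(alive)]  # simple round-robin assignment
        task = pending.popleft()
        if (worker, task) in fails:
            alive.remove(worker)    # the worker agent is gone
            pending.append(task)    # the task goes back to the queue
        else:
            done[task] = worker
            turn += 1
    return done, list(pending)

done, leftover = run_master(range(6), ["w1", "w2"], fails={("w2", 1)})
print(sorted(done), done[1], leftover)
```

In this run worker w2 dies while processing task 1, so the task is completed by w1 instead and nothing is left over. The only visible effect of the failure is that one worker does all the remaining work.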

Worker agents are submitted as jobs to the EGEE Grid using the Ganga interface (http://cern.ch/ganga). Using Ganga is not mandatory but it is very convenient and flexible because it allows you to easily use your batch system or other local resources. We will come back to this later.

To run DIANE you must have:

at least one open network port which accepts incoming TCP connections (talk to your system administrator if needed)
if you want to submit to the Grid, the Grid job submission commands (EDG or gLite) must be available. This is the so-called Grid User Interface (UI).
if you want to submit to a batch system (LSF, PBS or SGE), the batch system commands (bsub, qsub, etc.) must be available

Of course if you just want to use your local batch system then you do not need to have the Grid UI and vice-versa.

Installation

Follow the instructions at http://cern.ch/diane/install.php

Initial configuration

Ganga (submitter interface)

The diane-submitter command encapsulates the Ganga interface. Before you submit any job you must configure Ganga correctly. Ganga is installed in the background by the DIANE installation script.

Note: the diane-submitter script is new in version 2.4, for older versions you should use diane-env -d ganga instead.

First-time users: run diane-submitter -i to enter the Ganga prompt. Type ^D (Control-D) to exit.

Then edit the configuration file ~/.gangarc

To run on the Grid you should define at least the following parameters:

[LCG]VirtualOrganisation=YourVO
[LCG]GLITE_ENABLE=True if you want to use gLite middleware
[LCG]EDG_ENABLE=True if you want to use EDG middleware
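
The [LCG]Option=Value notation above suggests an INI-style file with an [LCG] section. Assuming that layout, the following sketch renders such a section with Python's standard configparser module. The helper name lcg_fragment and the "middleware" argument are hypothetical; only the option names come from this page, and the sketch is written for Python 3.

```python
import configparser
import io

def lcg_fragment(vo, middleware="glite"):
    """Render an INI-style [LCG] section (assumed ~/.gangarc layout)."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names case-sensitive
    cfg["LCG"] = {"VirtualOrganisation": vo}
    if middleware == "glite":
        cfg["LCG"]["GLITE_ENABLE"] = "True"
    elif middleware == "edg":
        cfg["LCG"]["EDG_ENABLE"] = "True"
    else:
        raise ValueError("middleware must be 'glite' or 'edg'")
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()

fragment = lcg_fragment("YourVO")
print(fragment)
```

The printed text is meant to be compared with the [LCG] section of your own ~/.gangarc, not pasted over it, because the real file contains many other sections.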

For a batch system you typically do not need to configure anything, unless your batch system is installed in an unusual way. See the corresponding section of the ~/.gangarc file ([LSF], [PBS] or [SGE]).

Simple Example

Here is an example of a simple executable application.

Suppose that you have a hello script in your current working directory and it looks like this:

#!/usr/bin/env bash
rm -f message.out
echo hello $* > message.out
echo "I said hello $* and saved it in message.out"

After making the script executable (chmod u+x hello) you can run it like this: ./hello 123.

Now suppose that you want to run the "hello" executable script 20 times, changing its arguments every time. So you have 20 almost identical tasks. In DIANE you define the work to be done using a run file, which is a simple Python file.

File hello.run:

# tell DIANE that we are just running executables
# the ExecutableApplication module is a standard DIANE test application

from diane_test_applications import ExecutableApplication as application

# the run function is called when the master is started
# input.data stands for run parameters
def run(input,config):
	d = input.data.task_defaults # this is just a convenience shortcut

	# all tasks will share the default parameters (unless set otherwise in individual task)
	d.input_files = ['hello']
	d.output_files = ['message.out']
	d.executable = 'hello'

	# here are tasks differing by arguments to the executable
	for i in range(20):
		t = input.data.newTask()
		t.args = [str(i)]
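
To see which tasks such a run file defines without starting a master, the run function can be called against stand-in objects. The classes Task, Data and Input below are hypothetical mock-ups written for this illustration (Python 3). They only mimic the attributes used in hello.run and are not part of DIANE.

```python
class Task:
    """Hypothetical stand-in for a DIANE task: a plain bag of attributes."""
    def __init__(self):
        self.args = []

class Data:
    """Hypothetical stand-in for input.data."""
    def __init__(self):
        self.task_defaults = Task()
        self.tasks = []

    def newTask(self):
        task = Task()
        self.tasks.append(task)
        return task

class Input:
    """Hypothetical stand-in for the 'input' argument of run()."""
    def __init__(self):
        self.data = Data()

def run(input, config):
    # same body as in hello.run
    d = input.data.task_defaults
    d.input_files = ['hello']
    d.output_files = ['message.out']
    d.executable = 'hello'
    for i in range(20):
        t = input.data.newTask()
        t.args = [str(i)]

def command_lines(inp):
    """Combine the shared defaults with the per-task arguments."""
    defaults = inp.data.task_defaults
    return [[getattr(t, 'executable', defaults.executable)] + t.args
            for t in inp.data.tasks]

inp = Input()
run(inp, config=None)
lines = command_lines(inp)
print(len(lines), lines[0], lines[-1])
```

The output shows 20 tasks, from hello 0 to hello 19, all sharing the same input and output file lists.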

Since release 2.1 there is a possibility to add extra monitoring information to tasks. For example you may add some application-specific details or labels to easily keep track of work done by the tasks. This new functionality is described on a separate DIANETaskMonitoring page.

Now you can start the master using the run file:

$ diane-run hello.run

The master will start in its own run directory (this information is printed by the master - check the output). The rundir is typically located in ~/diane/runs/nnn. The default location may be changed with $DIANE_USER_WORKSPACE environment variable.
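
If you lose track of the master's output, the run directory can also be located from a script. The sketch below assumes that nnn is a decimal run number and that the highest number belongs to the most recent run. The function latest_run_dir is a hypothetical helper, not a DIANE command, and the demonstration uses a temporary directory instead of a real workspace.

```python
import os
import tempfile

def latest_run_dir(runs_root):
    """Return the path of the highest-numbered run directory, or None."""
    numbered = [name for name in os.listdir(runs_root)
                if name.isdigit()
                and os.path.isdir(os.path.join(runs_root, name))]
    if not numbered:
        return None
    # compare as integers so that 10 sorts after 2
    return os.path.join(runs_root, max(numbered, key=int))

# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    for name in ("1", "2", "10", "notes"):
        os.mkdir(os.path.join(root, name))
    newest = os.path.basename(latest_run_dir(root))
    print(newest)
```

With the default workspace you would call latest_run_dir(os.path.expanduser("~/diane/runs")).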

Note: If you do not specify the port, the master will be started on a random port (selected by the operating system). This may not work if you are behind a firewall and are allowed to use only certain ports. Check DIANEQuestionsAndAnswers on how to set the master's port number.

You may now start a couple of worker agents:

$ diane-submitter Local --diane-worker-number=2

This command will start 2 worker agents locally on your computer. You will see the master producing quite a lot of output. After a while the processing should finish and you can look at the results. All results are stored by the master in the run directory (this behaviour may be customized and depends on the application plugins).

diane-submitter uses the Ganga tool to submit and run worker agent jobs. Each worker agent job can process multiple DIANE tasks. If you have many worker agent jobs, the run will complete sooner. If you have fewer worker agent jobs, or if some of them crash for some reason, then the only noticeable effect is that the run slows down: everything continues to run without your intervention. You may also add new worker agents at any time.
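
As a rough illustration of how the number of worker agents affects the completion time, consider an idealized model in which all tasks take equally long and scheduling overhead is ignored. Both are simplifying assumptions made here, and estimated_rounds is a hypothetical helper. The run then takes as many "rounds" as the number of tasks divided by the number of workers, rounded up.

```python
import math

def estimated_rounds(num_tasks, num_workers):
    """Rounds of equally long tasks needed when num_workers run in parallel."""
    if num_workers < 1:
        raise ValueError("at least one worker is needed")
    return math.ceil(num_tasks / num_workers)

# The 20 hello tasks from the example with different pool sizes.
for workers in (1, 2, 5, 20):
    print(workers, estimated_rounds(20, workers))
```

With 2 workers the 20 tasks need 10 rounds. If one worker crashes, the remaining worker still finishes everything, but in up to 20 rounds.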

Running diane-submitter -i enters interactive mode and is equivalent to running diane-env -d ganga without any arguments. In this mode you can inspect the status of your worker agent jobs, kill them if you like, inspect the stdout/stderr once they have terminated, and so on. More in the Ganga tutorial: http://cern.ch/ganga/user/html/GangaIntroduction

Quick recipe to get the stderr of the worker job:

start diane-submitter -i and wait until the worker job status is completed
then use j.peek() method or ls -l $j.outputdir.

Querying the master

The full index of commands is provided in the Related Pages of the Reference Manual.

Check the -h and --help options of each command.

Every time you execute diane-run a new run directory is created. In this way you may start a number of masters which will not clash with one another.

Here are some commands which directly talk to the master. Unless you specify otherwise the commands always apply to the last started master.

diane-master-ping : checks if the master is alive,
diane-master-ping getStatusReport : gets the summary of the master status,
diane-master-ping getStatusReport : gets more detailed information about the master status,
diane-master-ping kill : kills the master.

Use the -f option to select a different master (if you have many of them running concurrently).

If you have started a number of masters and you are lost, you may use the diane-ls command, which gives a summary of all locally started masters.

Submitting more workers (adding resources to the master)

Submission of worker agent jobs is easily handled with Ganga submitter scripts.

List all available submitters with diane-submitter -l

Note: the diane-submitter script is new in version 2.4, for older versions you should use diane-env -d ganga instead. In that case the -l option is not available.

A few predefined submitters are distributed with the release: here is the list in SVN.

User-defined submitter scripts may be placed in ~/diane/submitters.

Here are some examples:

Submitting 1 more worker to local batch system (LSF):
- diane-submitter LSF
Submitting 1 more worker locally:
- diane-submitter Local
Submitting 5 more workers on the EGEE/EGI/LCG Grid which will connect to the last started master (corresponding to the latest directory in $DIANE_WORKSPACE/runs):
- diane-submitter LCG --diane-worker-number 5
Submitting 5 workers which will connect to the master number XXX (corresponding to $DIANE_WORKSPACE/runs/XXX):
- diane-submitter LCG --diane-worker-number 5 --diane-master=workspace:XXX
Starting a worker on an arbitrary host (e.g. selected node of a private cluster)
- Detailed instructions are here: DIANERemoteSubmitter
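
When submissions are scripted, the command lines above can be assembled programmatically. The sketch below is a hypothetical helper (submitter_command is not part of DIANE) that uses only the options shown on this page. The run number 7 in the demonstration is made up.

```python
def submitter_command(backend, workers=1, master=None):
    """Build a diane-submitter argument list from the options shown above.

    backend: name of a submitter, e.g. "Local", "LSF" or "LCG"
    workers: number of worker agents to submit
    master:  run number of the master to connect to (None = last started)
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    cmd = ["diane-submitter", backend]
    if workers != 1:
        cmd.append("--diane-worker-number=%d" % workers)
    if master is not None:
        cmd.append("--diane-master=workspace:%s" % master)
    return cmd

print(" ".join(submitter_command("LSF")))
print(" ".join(submitter_command("LCG", workers=5, master=7)))
```

The returned list can be passed to subprocess.run() on a machine where DIANE is installed. Building a list rather than a single string avoids shell quoting problems.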

In some cases you may want to use the --diane-run-file option, which passes additional configuration parameters to the worker agent jobs. Examples:

you may want both the master and the workers to be started in authenticated mode (GSI). You can manage the configuration parameters from a single place - the run file.
your application may require customization of the worker agent job submission - such customization is made available via the run file.

You may also easily write your own submitter scripts to customize the system to your needs. Look at LocalSubmitter.py for an easy example or at LCGSubmitter.py for a more structured implementation.

Using the worker agent factory

The AgentFactory is a special submitter script which automates the submission of the worker jobs. If you submit a bunch of worker jobs, then over a longer period of time you will see that some of them terminate (for different reasons). You may, however, be interested in keeping a certain number of worker agents always in the pool. The worker factory does exactly that on your behalf: if the number of worker agents drops, the worker factory automatically submits some more. The worker factory may be run from cron or directly from the command line.

The current implementation of the agent factory works with LCGSubmitter.py and uses heuristics to choose the best possible Computing Elements.
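
The idea of keeping the pool topped up can be sketched in a few lines. This is a toy model and not the AgentFactory implementation: the functions workers_to_submit and simulate are hypothetical, and the pool sizes in the demonstration are made up.

```python
def workers_to_submit(target, alive):
    """How many new worker agents are needed to get back to the target."""
    return max(0, target - alive)

def simulate(target, alive_over_time):
    """For each observed pool size, report how many workers would be submitted."""
    return [workers_to_submit(target, alive) for alive in alive_over_time]

# Target pool of 10 workers, observed at five successive checks.
print(simulate(10, [10, 7, 9, 12, 0]))
```

Each check submits only the shortfall, so the pool is never deliberately grown beyond the target. When more workers than the target are alive, nothing is submitted.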

Try: diane-env -d ganga AgentFactory.py --help

Task monitoring

Configure runtime parameters and job scheduling policies

Most applications use the SimpleTaskScheduler, which lets you set several simple scheduling policies.

Core framework runtime configuration parameters define other advanced settings.

Scheduler policies and configuration parameters are set in the run file. A canonical example:

def run(input,config):
    input.scheduler.policy.STOP_IF_FAILED_TASKS = True
    input.scheduler.policy.FAILED_TASK_MAX_ASSIGN = 1
    config.WorkerAgent.HEARTBEAT_DELAY = 10
    ....

Several examples are provided in the test directory.

Advanced scheduling with job requirements: matching task and worker capabilities

DIANERequirementsCapabilityScheduling

OUTDATED: If you want your master to join the directory service, then you should specify an additional --ds option. Read more on DIANEDirectoryService

-- JakubMoscicki - 08 Jun 2007


Topic revision: r17 - 2011-02-10 - JakubMoscicki
