GangaInputFiles < ArdaGrid

ArdaGrid Web>GangaIndex>GangaInputFiles (2013-10-29, MattWilliams)

The Idea of input files

At the last Ganga developer days at CERN, the idea of using the output files of one job as input for other jobs was discussed.

A new inputfiles field should be added to the Job class (Job.py). It will hold a list of file objects: SandboxFile, MassStorageFile, LCGSEFile, DiracFile.

In the input files mechanism we should use the same objects, same configuration and same automatic file type detection as in the output files mechanism.

The inputfiles field is intended to replace the inputsandbox job field. Backward compatibility should be preserved in the same way as for outputfiles: for each job the user can use either the inputfiles field or the inputsandbox field.

These requirements are implemented now.

Adapting a file type for the inputfiles mechanism

Imagine we have a file type class that is already used in the outputfiles mechanism (described in the sections above) and that we now want to adapt for use in the inputfiles mechanism.

All you have to do in your file type class is override the getWNScriptDownloadCommand method:

getWNScriptDownloadCommand() : returns the script that needs to be injected into the job's script to download the input files before the application's script runs.

MassStorageFile.py and LCGSEFile.py can be used as reference as this method is implemented there.
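
As a rough illustration of such an override, here is a minimal sketch. It assumes that the job script is Python and that the method receives an indent string, as in the call shown in getWNCodeForDownloadingInputFiles. The class ExampleCopyFile, its constructor and the generated copy commands are hypothetical and not part of Ganga; the real implementations in MassStorageFile.py and LCGSEFile.py will differ.

```python
class ExampleCopyFile:
    """Hypothetical file type, used only to illustrate the override."""

    def __init__(self, namePattern, locations):
        self.namePattern = namePattern
        self.locations = list(locations)

    def getWNScriptDownloadCommand(self, indent):
        """Return Python source, each line prefixed with `indent`, that
        copies every known location into the WN working directory."""
        lines = [indent + "import shutil"]
        for location in self.locations:
            # %r quotes the path so it becomes a valid string literal
            lines.append(indent + "shutil.copy(%r, '.')" % location)
        return "\n".join(lines) + "\n"


example = ExampleCopyFile("input1.txt", ["/tmp/example/input1.txt"])
snippet = example.getWNScriptDownloadCommand("    ")
print(snippet)
```

The returned string is only generated on the client; it is meant to be executed later on the worker node as part of the job script.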

Input files mechanism

File objects (either standalone or used as part of the outputfiles mechanism) can be stored in the box and used as input for other jobs:

In [2]:j.inputfiles = [box[-1], box[-2]]

In [3]:j.inputfiles
Out[3]: [MassStorageFile (
 namePattern = 'input1.txt' ,
 locations = ['/afs/cern.ch/user/i/idzhunov/gangamass/1074/input1.txt'] 
 ), LCGSEFile (
 locations = ['guid:8b55a764-8bee-46ae-be8c-cf7bd803fa81'] ,
 namePattern = 'input.txt' ,
 lfc_host = 'lfc-dteam.cern.ch' ,
 se = 'srm-public.cern.ch' 
 )]

In the master_prepare method in IBackend.py we go through the inputfiles list and:

if the file is a SandboxFile, we add it to the inputsandbox
if the file is not a SandboxFile and has to be processed on the client for the job's backend (this comes from the configuration), we download the file (by calling its get() method) into a temporary directory and add it to the inputsandbox. The temporary directory is deleted after the inputsandbox has been tar archived.
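
The two cases above can be sketched as follows. This is a simplified stand-alone illustration, not the actual master_prepare code: the SandboxFile and RemoteFile classes are stand-ins, the localDir attribute and the process_on_client callback (which plays the role of the configuration lookup) are assumptions, and get() writes a fake payload instead of contacting any storage.

```python
import os
import shutil
import tempfile


class SandboxFile:
    """Stand-in for a file that already exists on the client."""
    def __init__(self, name):
        self.name = name


class RemoteFile:
    """Hypothetical non-sandbox file type exposing a get() method."""
    def __init__(self, namePattern, content=""):
        self.namePattern = namePattern
        self.localDir = None      # assumed: where get() stores the file
        self._content = content   # fake payload so the sketch runs offline

    def get(self):
        # A real file type would contact its storage here.
        with open(os.path.join(self.localDir, self.namePattern), "w") as f:
            f.write(self._content)


def collect_input_sandbox(inputfiles, process_on_client):
    """Return (sandbox_paths, tmp_dir) for the given input files."""
    tmp_dir = tempfile.mkdtemp()
    sandbox = []
    for f in inputfiles:
        if isinstance(f, SandboxFile):
            sandbox.append(f.name)
        elif process_on_client(f.__class__.__name__):
            f.localDir = tmp_dir
            f.get()
            sandbox.append(os.path.join(tmp_dir, f.namePattern))
        # Any other file is left for the download script on the WN.
    return sandbox, tmp_dir


files = [SandboxFile("local.txt"), RemoteFile("input1.txt", "data")]
sandbox, tmp_dir = collect_input_sandbox(files, lambda cls: cls == "RemoteFile")
print(sandbox)
shutil.rmtree(tmp_dir)  # done only after the sandbox has been archived
```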

In every backend class, when the job script is created, we should make sure to include the script for downloading the input files (those that should be processed on the WN according to the configuration).

For this purpose we call the getWNCodeForDownloadingInputFiles method from OutputFileManager.py. This is done for the Localhost, Batch, LCG and CREAM backends. For Dirac and other backends it should be done in the same manner.

Definition of the getWNCodeForDownloadingInputFiles method:

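def getWNCodeForDownloadingInputFiles(job, indent):
    """Generate the code that downloads the job's input files on the WN."""

    # Nothing to inject when the job has no input files
    if len(job.inputfiles) == 0:
        return ""

    insertScript = "\n"

    for inputFile in job.inputfiles:

        inputfileClassName = inputFile.__class__.__name__

        # Same configuration check as for output files: only file types
        # that are processed on the WN contribute a download command
        if outputFilePostProcessingOnWN(job, inputfileClassName):
            insertScript += inputFile.getWNScriptDownloadCommand(indent)

    return insertScript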
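
A backend could then splice the returned string into its job script. The sketch below only illustrates the idea and is not taken from any backend. It assumes a Python job script template with a placeholder line; the placeholder name, the template and the inject_download_code helper are all hypothetical, and download_code stands in for the value returned by getWNCodeForDownloadingInputFiles(job, '    ').

```python
# Hypothetical placeholder; real backends may mark the insertion point differently.
PLACEHOLDER = "###DOWNLOADINPUTFILES###"

JOB_SCRIPT_TEMPLATE = """#!/usr/bin/env python
import os

def run_job():
###DOWNLOADINPUTFILES###
    os.system('./application.sh')

run_job()
"""


def inject_download_code(template, download_code):
    """Replace the placeholder line with the generated download code."""
    return template.replace(PLACEHOLDER + "\n", download_code)


# Stand-in for the string produced for a job with one WN-processed input file.
download_code = "\n    print('downloading input1.txt')\n"
job_script = inject_download_code(JOB_SCRIPT_TEMPLATE, download_code)
print(job_script)
```

Passing the indent used inside the template (four spaces here) keeps the injected code aligned with the body of the function it lands in.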

Topic revision: r1 - 2013-10-29 - MattWilliams
