Metadata Access from the Shell

You can access the metadata catalogue either with the mdclient metadata terminal tool, as configured in the last section, or with the mdcli command line tool (mdjavacli for those using the Java package), which lets you issue metadata commands directly from the shell. Its output is intended to be easily parseable by scripting languages:

  $ mdcli -p8822 -slocalhost listattr /
  t
  text
  f
  float
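
Because the output is line-oriented, it is simple to consume from a script. The following is a minimal Python sketch, assuming that listattr prints each attribute name followed by its type on the next line, as the sample output above suggests. The helper name parse_listattr is hypothetical and the sketch does not invoke mdcli itself; it only processes text that has already been captured.

```python
def parse_listattr(output):
    """Turn alternating name/type lines into (name, type) pairs."""
    # Drop empty lines and surrounding white space before pairing.
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected an even number of lines (name, type, ...)")
    # Even positions hold names, odd positions hold the matching types.
    return list(zip(lines[0::2], lines[1::2]))


sample = "t\ntext\nf\nfloat\n"
print(parse_listattr(sample))  # [('t', 'text'), ('f', 'float')]
```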

Metadata commands are parsed into pieces separated by white space, similarly to shell commands. If you want white space to be part of one piece of the command, for example when setting an attribute to a string that contains white space, you must enclose that piece in single quotes: ' '. Single quotes are part of the command syntax and are used when parsing a command into its parts; you need them every time a part contains spaces. Double quotes, however, are used in queries (see Metadata Queries; these are expressions evaluated by the database backend) to distinguish strings from variable references and common values. To put a single quote, or any other character, into a command argument, you can use an octal code, e.g. to get a "'".
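
The splitting rule can be illustrated with a short Python sketch. This is not AMGA's actual parser: split_command is a hypothetical helper that only models white-space separation and single-quote grouping, and it deliberately ignores the octal codes mentioned above.

```python
def split_command(line):
    """Split a command into pieces; single quotes keep white space together."""
    pieces = []
    current = []
    in_quotes = False
    started = False  # True once a piece has begun, even if it is still empty
    for ch in line:
        if ch == "'":
            # A quote toggles grouping and is not part of the piece itself.
            in_quotes = not in_quotes
            started = True
        elif ch.isspace() and not in_quotes:
            if started:
                pieces.append("".join(current))
                current = []
                started = False
        else:
            current.append(ch)
            started = True
    if in_quotes:
        raise ValueError("unterminated single quote")
    if started:
        pieces.append("".join(current))
    return pieces


print(split_command("find /files '/files:producer=\"CERN\"'"))
# ['find', '/files', '/files:producer="CERN"']
```

Note that the double quotes survive inside the piece, which is what allows the query to tell the string "CERN" apart from a variable reference.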

Note that quotes may be removed by the shell when it parses a shell command. If you are using the mdcli tool, where the AMGA command is given on the command line, you need to protect these single quotes from the shell by enclosing them in double quotes: " ". The Python and Java APIs, in contrast, automatically quote any arguments you pass to them with single quotes, so you must not add the quotes yourself when using those APIs; the C API does not do this. The following example shows how quotes are used with mdclient, mdcli and the Python API:

  $ mdclient
  Query> find /files '/files:producer="CERN"'

  $ mdcli find /files "'/files:producer=\"CERN\"'"

And in Python:

  mdclient.find('/files', '/files:producer="CERN"')
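
When the mdcli command line is generated from a script, the two quoting levels can be applied mechanically. The sketch below is one possible approach in Python, not part of AMGA: mdcli_command_line is a hypothetical helper that first adds the AMGA-level single quotes and then uses the standard shlex module for the shell-level protection. It assumes that mdcli receives each command piece as a separate argument, as in the examples above, and it rejects embedded single quotes rather than guessing how they should be encoded.

```python
import shlex


def mdcli_command_line(*pieces):
    """Return a shell command line that passes AMGA pieces to mdcli intact."""
    argv = ["mdcli"]
    for piece in pieces:
        if "'" in piece:
            raise ValueError("embedded single quotes are not handled in this sketch")
        # AMGA-level quoting: single quotes around pieces that contain white
        # space or double-quoted strings, as in the mdclient example.
        if piece == "" or '"' in piece or any(c.isspace() for c in piece):
            piece = "'" + piece + "'"
        argv.append(piece)
    # Shell-level quoting: shlex.join protects every argument from the shell.
    return shlex.join(argv)


line = mdcli_command_line("find", "/files", '/files:producer="CERN"')
print(line)
```

The shell-level quoting produced by shlex differs in style from the hand-written double quotes, but after the shell has parsed the line mdcli sees the same arguments.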

The metadata server uses a streaming protocol. Some APIs (for example the Java one) allow you to interrupt the streaming of a response. The same is true for mdclient: pressing CTRL-C once while a result is being transmitted interrupts the streaming of that result, and only pressing CTRL-C a second time terminates the client.

The following is a list of metadata commands. Additional commands may be available for group or user access management, as described in Users, Groups and ACLs or in Management of Users using the database backend, depending on the server setup. To find out which commands are available on the server you are connected to, use the help command.

Manipulating Collections

Commands for Manipulating Attributes

Commands for entry manipulation

Finding and Updating Entries

As of AMGA 1.2, selectattr also supports constraints on the query, similar to the clauses of an SQL SELECT statement. Queries can now take the form:

    query [distinct] [limit xx [offset yy]] [order exp] [group_by exp]

where the distinct keyword translates into a SELECT DISTINCT, the limit and offset clauses restrict the number of rows returned, and the order clause sorts the rows according to the given expression. The group_by construct corresponds to the SQL GROUP BY clause.
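
A script can assemble such a query string from its optional parts. The following Python sketch simply follows the syntax line above; build_query is a hypothetical helper, the attribute /files:size is an invented example, and how the resulting string is quoted when passed to selectattr is not covered here.

```python
def build_query(query, distinct=False, limit=None, offset=None,
                order=None, group_by=None):
    """Append the optional constraints in the order given by the syntax."""
    # The syntax nests offset inside the limit clause, so it cannot stand alone.
    if offset is not None and limit is None:
        raise ValueError("offset is only valid together with limit")
    parts = [query]
    if distinct:
        parts.append("distinct")
    if limit is not None:
        parts.append(f"limit {int(limit)}")
        if offset is not None:
            parts.append(f"offset {int(offset)}")
    if order is not None:
        parts.append(f"order {order}")
    if group_by is not None:
        parts.append(f"group_by {group_by}")
    return " ".join(parts)


print(build_query("/files:size > 100", distinct=True, limit=10, offset=20))
# /files:size > 100 distinct limit 10 offset 20
```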

Native SQL Query

Support for native SQL queries greatly eases the work needed to port existing SQL-based database applications to the Grid using AMGA, and complements AMGA's existing metadata query language.

The following commands are currently supported for native SQL queries.

They conform to the SQL-92 Entry Level <direct_data_statements>. All keywords in native SQL queries should be given in capital letters; table names, aliases and column names, however, can be in either upper or lower case. For details, refer to Native SQL Queries.

Permission Handling

Capabilities

Capabilities are additional attributes assigned to individual users. They are used for example to allow a user to replicate login information. Currently no mapping of VOMS capabilities is done, but this could be a future use-case.

Index Management

Sudo

Since AMGA 2.0 a sudo command exists, which allows the root user to become any other user. The syntax is: sudo <user>

Table Constraints

Views

Views allow you to create virtual new tables (directories) that combine the information of other tables, similar to what selectattr does. AMGA uses the native support of the database to provide views, so the actual behaviour depends on the database backend. For example some backends (like PostgreSQL) allow you to update an existing view, which actually updates the tables behind it.

An important use-case of views is to support access restrictions on attributes (the columns of the underlying table). This is a typical use-case for views in normal database usage as well.

Views can be accessed and deleted like normal directories.

In the following examples, the first one shows a use case where a view (view1) is created from all the entries in the current directory, but using only the attr1 and attr2 columns. After assigning the right permissions to the resulting new directory (./view1), it can be made readable for users who need to read these attributes, while they will not have access to the rest of the attributes in the . directory. In the second example, a view (view2) is created that combines attributes from the current directory and the dir subdirectory.

  Query> view_create view1 . attr1 attr2 ''
  Query> view_create view2 . attr1 ./dir:attr2 'dir:FILE = FILE'
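
Since AMGA relies on the views of the database backend, the column-restriction idea behind view1 can be tried out in plain SQL. The sketch below uses Python's built-in sqlite3 module purely as an illustration; the table name entries, its columns and the sample row are invented, and it shows neither the AMGA view_create implementation nor permission handling (SQLite has no per-object permissions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A table with two public attributes and one that should stay hidden.
conn.execute("CREATE TABLE entries (file TEXT, attr1 TEXT, attr2 TEXT, secret TEXT)")
conn.execute("INSERT INTO entries VALUES ('a.dat', 'x', 'y', 'hidden')")
# The view exposes only the selected columns of the underlying table.
conn.execute("CREATE VIEW view1 AS SELECT file, attr1, attr2 FROM entries")

cursor = conn.execute("SELECT * FROM view1")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)  # ['file', 'attr1', 'attr2']
print(rows)     # [('a.dat', 'x', 'y')]
```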

Sequences

Sequences allow the creation of a sequence of integer numbers which are guaranteed to be unique. They are also monotonically increasing, at least within a single AMGA connection. The exact implementation depends on the database backend, which can optimize by handing out parts of the sequence in batches, so two consecutive connections do not necessarily receive the smaller number first and the larger one second. Sequences are not supported by MySQL <5.0 and SQLite. On MySQL and Oracle, sequences are implemented through stored procedures. In PostgreSQL, the native mechanism is used.
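
The effect of batching can be seen in a small simulation. The Python sketch below is a toy model, not the behaviour of any particular backend: the classes BatchedSequence and Connection and the batch size of 10 are assumptions made for illustration only.

```python
class BatchedSequence:
    """Toy backend that hands out sequence values in fixed-size batches."""

    def __init__(self, batch_size=10):
        self.batch_size = batch_size
        self._next_start = 1

    def reserve(self):
        # Each reservation claims a disjoint block, so values stay unique.
        start = self._next_start
        self._next_start += self.batch_size
        return iter(range(start, start + self.batch_size))


class Connection:
    """Toy connection that draws values from its own reserved batch."""

    def __init__(self, sequence):
        self._sequence = sequence
        self._batch = iter(())

    def next_value(self):
        try:
            return next(self._batch)
        except StopIteration:
            # Batch exhausted (or never reserved): claim a fresh block.
            self._batch = self._sequence.reserve()
            return next(self._batch)


seq = BatchedSequence(batch_size=10)
a, b = Connection(seq), Connection(seq)
print(a.next_value(), b.next_value(), a.next_value())  # 1 11 2
```

Each connection sees increasing values, and no value is handed out twice, but connection b receives 11 before connection a receives 2.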

Sequences behave like another directory in a directory. They cannot be deleted with rmdir, however; sequence_remove must be used instead. The name of the sequence must be lower case due to limitations in some backends.

Backing Up Data

Site management

To use replication, each AMGA server needs to know about the other servers that take part in the system in order to communicate with them. These will be referred to as sites. The information about other sites is stored in the backend. Sites have the following configuration properties:

    id
    name
    hostname
    port
    login
    password
    use_ssl
    authenticate_with_certificate
    cert_file
    key_file
    use_grid_proxy
    verify_server_cert
    trusted_cert_dir
    require_data_encryption

id is a numeric identifier internal to each AMGA instance and generated automatically by AMGA when the site is inserted in the configuration. name is a human-readable identifier of the site, which can be freely chosen by the administrator. hostname and port are the network address of the remote site. The rest of the properties control the security settings of the connection to the master and have a meaning similar to the properties of the same name in the mdclient.config file. Section Configuration of the C++ and Java command line clients describes their usage.
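
For scripts that keep track of several sites, the properties can be gathered in a small record type. The Python sketch below is only an illustration: the field names are taken from the list above, but the Python types, the default values and the address helper are assumptions, and the class is not part of any AMGA API.

```python
from dataclasses import dataclass


@dataclass
class Site:
    # Identification and network address of the remote site.
    id: int
    name: str
    hostname: str
    port: int
    # Security settings; types and defaults here are assumptions.
    login: str = ""
    password: str = ""
    use_ssl: bool = False
    authenticate_with_certificate: bool = False
    cert_file: str = ""
    key_file: str = ""
    use_grid_proxy: bool = False
    verify_server_cert: bool = False
    trusted_cert_dir: str = ""
    require_data_encryption: bool = False

    def address(self):
        """Return the network address as host:port."""
        return f"{self.hostname}:{self.port}"


site = Site(id=1, name="site-a", hostname="amga.example.org", port=8822)
print(site.address())  # amga.example.org:8822
```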

The following commands can be used to manage the sites and their configuration:

Various administrative commands

Replication

The following are the commands used to control replication from the AMGA command line interface. Some of them are for nodes acting as slaves, others for nodes acting as masters. Nodes acting both as slave and master can use all of them. Section Replication in AMGA provides the background information required to understand these commands.

Commands for Slave Nodes

Slave nodes are responsible for initiating replication by contacting the master and requesting the replication of the directories they are interested in. This is done using the following commands:

Commands for Master Nodes

The main responsibility of master nodes is to configure the access control rights of slaves. Slaves connect to the master using the standard AMGA users and authenticate in the same way. Access control is done using the replication right, which is granted to users to control the directories they are allowed to replicate. The following commands allow granting and removing this right:

AMGA also has special commands for user and group management. They are optional and may not be available on your installation, for example if it collaborates with a file catalogue and uses the permission system of that catalogue. For more information see Users, Groups and ACLs and User Management.

Federation

The following are the commands used to control federation from the AMGA command line interface. Section Federation in AMGA provides the background information required to understand these commands. One AMGA server may initiate federation by contacting another AMGA server and requesting the federation of the directories it is interested in. This is done using the following commands:

Commands for Replica Management

The set of commands below provides functionality for the management of replicas, in terms of a global storage index as well as a local replica lookup system.

To support replica management, the entries in the table need to have support for GUIDs. This is enabled by using the makedir command instead of the createdir command. makedir takes exactly the same options, except that permissions and the guid feature are switched on by default.

GUIDs are inserted as root via:

  Query> addentry /fdir/entry GUID beda818d-1090-489c-b9f9-6f3156a81828

If you want to experiment with this, use the uuidgen UNIX command to create new guids. To list the guid values, use the -g switch of ls or stat.
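
Where uuidgen is not at hand, the Python standard library can generate GUIDs in the same textual format. The sketch below only builds the command string shown above; addentry_with_guid is a hypothetical helper and sending the command to the server is left out.

```python
import uuid


def addentry_with_guid(entry):
    """Build an addentry command that assigns a freshly generated GUID."""
    # uuid4() yields a random UUID; str() gives the hyphenated hex form.
    guid = str(uuid.uuid4())
    return f"addentry {entry} GUID {guid}"


print(addentry_with_guid("/fdir/entry"))
```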

The following commands allow the management of replicas of files, similar to a file catalogue local to a site. This functionality makes it possible to map GUIDs to SURLs (also known as PFNs, Physical File Names). Note that AMGA only supports one SURL per guid. This means that if you want to support multiple replicas (e.g. on multiple Storage Elements), you need to have one AMGA catalogue per SE:

The next set of commands allows you to assign sites as locations of replicas to guids. This is useful if the catalogue is being used in a global mode, in which case it can easily provide information on the sites that hold replicas of a given file. This functionality is also referred to as a SiteIndex (SI), and AMGA stores the information in a highly-optimized bitfield. The Site Index functionality can be combined with the AMGA functionality for local replica management above. Note that the ideal mode of operation would be for local AMGA catalogues to store SURLs and be updated locally, replicating the information back to a global SI. However, this mode of operation is currently not supported by AMGA's replication functionality.

The SI functionality in AMGA is highly optimized for use with Oracle, but PostgreSQL has also been tested successfully and performs very well. Both the guids and the storage index bitmap are stored in binary format, so the information is kept in the most compact form. Indexes are used for row lookup.
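
The idea of a site bitfield keyed by binary guids can be sketched in a few lines of Python. This is a conceptual model only and says nothing about AMGA's actual storage layout: the class SiteIndex, its method names and the use of the numeric site id as bit position are assumptions.

```python
import uuid


class SiteIndex:
    """Toy site index: one integer bitmap per guid, one bit per site id."""

    def __init__(self):
        self._bitmaps = {}

    @staticmethod
    def _key(guid):
        # Store the guid as 16 raw bytes instead of a 36-character string.
        return uuid.UUID(guid).bytes

    def add_site(self, guid, site_id):
        key = self._key(guid)
        self._bitmaps[key] = self._bitmaps.get(key, 0) | (1 << site_id)

    def remove_site(self, guid, site_id):
        key = self._key(guid)
        self._bitmaps[key] = self._bitmaps.get(key, 0) & ~(1 << site_id)

    def sites(self, guid):
        bitmap = self._bitmaps.get(self._key(guid), 0)
        # Collect the positions of all bits that are set.
        return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


index = SiteIndex()
guid = "beda818d-1090-489c-b9f9-6f3156a81828"
index.add_site(guid, 0)
index.add_site(guid, 5)
print(index.sites(guid))  # [0, 5]
```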

The function lfn_lookup guid is used to retrieve the LFN and SURL for a given guid; in AMGA 2.0, however, it only returns the directory in which a file resides.


Generated on Mon Apr 16 13:59:18 2012 for AMGA by  doxygen 1.4.7