Grid Monitoring Data Exchange Standard (Draft)
General concepts
Data types
To avoid compatibility problems when web services written in different programming languages are called from various clients (web browser, application, etc.), we suggest restricting the data types to the following set:
- Scalar values:
  - string - quite intuitive
  - number - optionally two sub-types: int, float
  - boolean - can be represented as string "true" or "false"
  - timestamp - W3C date and time, default precision: up to seconds (optionally fractions of seconds)
- Structures:
  - list - sequence of elements of any type, preserving order of elements (optionally random access with indexing from 0)
  - dictionary - unordered associative array (hash) with key of any scalar type and value of any type (optionally may preserve order of pairs)
Conventions
Passing list in URL
A list of values should be passed as a single parameter in the URL using the following rules:
- the parameter name MAY have a [] suffix (for compatibility with PHP)
- individual values from the list should be passed as separate param=value or param[]=value expressions separated by the & sign (CGI multiple selection box form)
Example:
- goal: passing [val1, val2, val3] as parA
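- solution: parA=val1&parA=val2&parA=val3 (or, with the PHP-compatible suffix, parA[]=val1&parA[]=val2&parA[]=val3)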
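The list-passing rules above can be sketched in Python as follows. This is only an illustration, not part of the standard: the helper names encode_list_param and decode_list_param are hypothetical, and the parameter name is assumed to be a plain identifier that needs no escaping.

```python
from urllib.parse import parse_qs, quote_plus


def encode_list_param(name, values, php_suffix=False):
    """Encode a list as repeated name=value (or name[]=value) pairs joined by '&'."""
    # The name is assumed to be a plain identifier; only the values are escaped.
    key = name + "[]" if php_suffix else name
    return "&".join(f"{key}={quote_plus(str(v))}" for v in values)


def decode_list_param(query, name):
    """Collect all values passed for name, accepting both the plain and the [] form."""
    parsed = parse_qs(query)
    return parsed.get(name, []) + parsed.get(name + "[]", [])


print(encode_list_param("parA", ["val1", "val2", "val3"]))
print(encode_list_param("parA", ["val1", "val2", "val3"], php_suffix=True))
print(decode_list_param("parA[]=val1&parA[]=val2&other=x", "parA"))
```

Accepting both forms on the receiving side keeps a service usable from PHP clients as well as from plain CGI-style forms.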
Timestamp format
Suggestions for all cases (including XML):
- W3C date and time format (based on ISO8601)
- all components from year to second (no fractions if possible)
- all values in UTC ("Z" as timezone)
- example: ...?startTime=2007-02-09T14:42:20Z
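A minimal Python sketch of the timestamp convention (UTC, second precision, "Z" suffix). The function names are hypothetical, and rejecting naive datetimes is an assumption made for this sketch rather than a requirement stated in this document.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def format_timestamp(dt):
    """Render an aware datetime in UTC with second precision and a 'Z' suffix."""
    if dt.tzinfo is None:
        # Assumption of this sketch: refuse values whose timezone is unknown.
        raise ValueError("naive datetime: timezone information is required")
    return dt.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)


def parse_timestamp(text):
    """Parse a timestamp written in the convention above into an aware UTC datetime."""
    return datetime.strptime(text, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


stamp = format_timestamp(datetime(2007, 2, 9, 14, 42, 20, tzinfo=timezone.utc))
print("...?startTime=" + stamp)
```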
Boolean value
Boolean values in URLs and XML should be passed as one of the strings: true, false
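As an illustration, the boolean convention could be handled with two small Python helpers. The names are hypothetical, and treating any other string (including "True" or "1") as an error is an assumption of this sketch.

```python
def format_bool(value):
    """Serialise a Python bool as the string 'true' or 'false'."""
    return "true" if value else "false"


def parse_bool(text):
    """Parse 'true' or 'false'; any other string is rejected (assumption of this sketch)."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a boolean string: {text!r}")


print(format_bool(True), parse_bool("false"))
```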
Data model
The following picture shows the data model used for standardised request and response formats:
The data model introduces three categories:
- class
- attribute
- role (marked as a label over associations)
All services following the standard should comply with the following rules:
- accept parameters related to the data model (request)
- build a query over the service specific data repository using the given parameters as a filter
- deliver a result of the query in standard format (response)
Request format
As the most typical request format for all web services we suggest an HTTP URL with parameters encoded in GET or POST query string:
- should be supported by all components that don't require complex types for parameters (flat lists at most)
- procedure name can be encoded either as part of URL (URI path element) or one of parameters
- all parameters related to the data model (common parameters) should be constructed following the recommendations given in the next section of this document
- in addition, a service can expect any number of service specific parameters outside the common data model
Common parameters
All parameters related to the data model defined in this document should be constructed using one of two notations (note the underscore used as a separator and the capitalisation of identifiers):
- ClassName_attributeName - for example Site_name
- roleName_ClassName_attributeName - for example criticalFor_VO_name
The optional role in the parameter has to be used when the class alone for a given request is not enough to construct a query without ambiguity.
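The two notations can be recognised mechanically. The sketch below assumes the capitalisation suggested in this document (ClassName starting with an upper-case letter, attributeName and roleName with a lower-case one) and tolerates a trailing [] on list parameters. The names CommonParam and parse_common_param are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class CommonParam(NamedTuple):
    role: Optional[str]
    class_name: str
    attribute: str


# Optional roleName_, then ClassName_, then attributeName.
_COMMON_PARAM = re.compile(
    r"^(?:(?P<role>[a-z][A-Za-z0-9]*)_)?"
    r"(?P<cls>[A-Z][A-Za-z0-9]*)_"
    r"(?P<attr>[a-z][A-Za-z0-9]*)$"
)


def parse_common_param(name):
    """Split a common parameter name into its parts, or return None if it does not match."""
    if name.endswith("[]"):
        name = name[:-2]  # list parameters may carry the PHP-style suffix
    match = _COMMON_PARAM.match(name)
    if match is None:
        return None
    return CommonParam(match.group("role"), match.group("cls"), match.group("attr"))


print(parse_common_param("Site_name"))
print(parse_common_param("criticalFor_VO_name"))
```

A name that does not match, such as startTime, returns None, so a service can treat it as a non-model or service specific parameter.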
In addition, the current standard defines a set of parameters that are not directly related to the data model but are used to narrow down a query:
- startTime - beginning of the time range for historical queries
- endTime - end of the time range for historical queries
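A service might apply startTime and endTime as in the sketch below. Whether the range boundaries are inclusive is not defined in this document; the sketch assumes both ends are inclusive and that an omitted parameter leaves that side of the range open. The helper names are hypothetical.

```python
from datetime import datetime, timezone


def _parse(text):
    """Parse a UTC timestamp such as 2007-02-26T11:00:00Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def in_time_range(timestamp, start_time=None, end_time=None):
    """Return True if timestamp lies within [start_time, end_time] (inclusive, by assumption)."""
    moment = _parse(timestamp)
    if start_time is not None and moment < _parse(start_time):
        return False
    if end_time is not None and moment > _parse(end_time):
        return False
    return True


stamps = ["2007-02-26T10:00:00Z", "2007-02-26T11:00:00Z", "2007-02-26T12:00:00Z"]
selected = [s for s in stamps if in_time_range(s, start_time="2007-02-26T10:30:00Z")]
print(selected)
```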
Composing request URL
The URL composed as a valid request must contain the following components:
- base url containing host and path components (may contain procedure name)
- query string, starting with ? in case of the GET method and passed as POST data in case of POST:
  - service specific parameters (may contain procedure name)
  - subset of common parameters
The exact list of service specific parameters and supported common parameters together with the exact semantics should be defined in the specification of a given service.
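The composition rules for a GET request could be sketched as follows. The function build_request_url is hypothetical, list values are sent in the PHP-compatible [] form, and treating return as a service specific parameter is an assumption of this sketch.

```python
from urllib.parse import quote


def build_request_url(base_url, service_params=None, common_params=None):
    """Compose a GET URL from a base URL, service specific and common parameters."""
    pairs = []
    for params in (service_params or {}, common_params or {}):
        for key, value in params.items():
            is_list = isinstance(value, (list, tuple))
            name = key + "[]" if is_list else key  # lists use the [] suffix
            for item in (value if is_list else [value]):
                # Percent-encode everything outside the unreserved character set.
                pairs.append(f"{name}={quote(str(item), safe='')}")
    query = "&".join(pairs)
    return f"{base_url}?{query}" if query else base_url


url = build_request_url(
    "http://server.org/current_status",
    service_params={"return": "criticalFor_VO_name"},
    common_params={
        "Region_name": "CERN",
        "Site_name": "CERN-PROD",
        "Service_endpoint": "https://ce101.cern.ch:2119/",
        "calculatedFor_VO_name": ["OPS", "Atlas"],
    },
)
print(url)
```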
Example:
Response Format
Examples
Current status of services
Request URL:
http://server.org/current_status?return=criticalFor_VO_name&Region_name=CERN&Site_name=CERN-PROD& \\
Service_endpoint=https%3A%2F%2Fce101.cern.ch%3A2119%2F&calculatedFor_VO_name[]=OPS&calculatedFor_VO_name[]=Atlas
In the URL above the parameters have the following meaning:
- return - additional role_Class_attribute to be included in the output; here: return the VO names for which each metric result is critical
- Region_name - selected value for region.name, triggers the output of the region element
- Site_name - selected value for site.name, triggers the output of the site element
- Service_endpoint - selected value for service.endpoint
- calculatedFor_VO_name[] - list of selected VO names for which metric results should be returned
Response XML:
<?xml version="1.0"?>
<root xmlns="http://cern.ch/grid-mon/2007/05/mon-exchange-schema/">
  <Region name="CERN">
    <Site name="CERN-PROD">
      <type>Production</type>
      <status>Certified</status>
      <SiteMetric name="site-daily-avail">
        <measurement>
          <status>ok</status>
          <summary>0.3</summary>
          <timestamp>2007-02-25T00:00:00Z</timestamp>
        </measurement>
      </SiteMetric>
      <Service endpoint="https://ce101.cern.ch:2119/" type="CE">
        <isMonitored>true</isMonitored>
        <inMaintenance>false</inMaintenance>
        <metricGroup groupBy="calculatedForVO" value="OPS">
          <ServiceMetric name="service-daily-avail">
            <measurement>
              <status>ok</status>
              <summary>0.3</summary>
              <timestamp>2007-02-25T00:00:00Z</timestamp>
            </measurement>
          </ServiceMetric>
          <ServiceMetric name="CE-sft-job">
            <measurement>
              <status>ok</status>
              <criticalForVO>OPS</criticalForVO>
              <criticalForVO>Atlas</criticalForVO>
              <timestamp>2007-02-26T13:00:00Z</timestamp>
            </measurement>
          </ServiceMetric>
          <ServiceMetric name="CE-totalcpus">
            <measurement>
              <status>ok</status>
              <summary>2433</summary>
              <criticalForVO>OPS</criticalForVO>
              <timestamp>2007-02-26T13:20:00Z</timestamp>
            </measurement>
          </ServiceMetric>
          <ServiceMetric name="CE-freecpus">
            <measurement>
              <status>ok</status>
              <summary>200</summary>
              <criticalForVO>Atlas</criticalForVO>
              <timestamp>2007-02-26T11:30:00Z</timestamp>
            </measurement>
          </ServiceMetric>
        </metricGroup>
        <metricGroup groupBy="calculatedForVO" value="Atlas">
          <ServiceMetric name="CE-sft-job">
            <measurement>
              <status>ok</status>
              <criticalForVO>Atlas</criticalForVO>
              <timestamp>2007-02-26T12:11:00Z</timestamp>
            </measurement>
          </ServiceMetric>
        </metricGroup>
      </Service>
    </Site>
  </Region>
</root>
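A client could flatten a current status response with Python's standard xml.etree.ElementTree module, as sketched below. The function name and the tuple layout are hypothetical, and SAMPLE is a shortened copy of the response above so that the snippet is self-contained.

```python
import xml.etree.ElementTree as ET

# Namespace used by the example responses in this document.
NS = {"mon": "http://cern.ch/grid-mon/2007/05/mon-exchange-schema/"}

SAMPLE = """<?xml version="1.0"?>
<root xmlns="http://cern.ch/grid-mon/2007/05/mon-exchange-schema/">
  <Region name="CERN">
    <Site name="CERN-PROD">
      <Service endpoint="https://ce101.cern.ch:2119/" type="CE">
        <metricGroup groupBy="calculatedForVO" value="OPS">
          <ServiceMetric name="CE-sft-job">
            <measurement>
              <status>ok</status>
              <criticalForVO>OPS</criticalForVO>
              <criticalForVO>Atlas</criticalForVO>
              <timestamp>2007-02-26T13:00:00Z</timestamp>
            </measurement>
          </ServiceMetric>
        </metricGroup>
        <metricGroup groupBy="calculatedForVO" value="Atlas">
          <ServiceMetric name="CE-sft-job">
            <measurement>
              <status>ok</status>
              <criticalForVO>Atlas</criticalForVO>
              <timestamp>2007-02-26T12:11:00Z</timestamp>
            </measurement>
          </ServiceMetric>
        </metricGroup>
      </Service>
    </Site>
  </Region>
</root>"""


def service_metric_rows(xml_text):
    """Return one (endpoint, vo, metric, status, timestamp, critical_for) tuple per measurement."""
    root = ET.fromstring(xml_text)
    rows = []
    for service in root.iterfind(".//mon:Service", NS):
        for group in service.iterfind("mon:metricGroup", NS):
            for metric in group.iterfind("mon:ServiceMetric", NS):
                for m in metric.iterfind("mon:measurement", NS):
                    critical = [c.text for c in m.iterfind("mon:criticalForVO", NS)]
                    rows.append((
                        service.get("endpoint"),
                        group.get("value"),
                        metric.get("name"),
                        m.findtext("mon:status", namespaces=NS),
                        m.findtext("mon:timestamp", namespaces=NS),
                        critical,
                    ))
    return rows


for row in service_metric_rows(SAMPLE):
    print(row)
```

Because the response declares a default namespace, every element name has to be qualified in the search paths; the mon prefix is only a local alias chosen for this sketch.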
History of selected test for a service
Request URL:
http://server.org/metric_history?Service_endpoint=https%3A%2F%2Fce101.cern.ch%3A2119%2F& \\
calculatedFor_VO_name=OPS&ServiceMetric_name=CE-sft-job
Response XML:
<?xml version="1.0"?>
<root xmlns="http://cern.ch/grid-mon/2007/05/mon-exchange-schema/">
  <Service endpoint="https://ce101.cern.ch:2119/" type="CE">
    <metricGroup groupBy="calculatedForVO" value="OPS">
      <ServiceMetric name="CE-sft-job">
        <measurement>
          <timestamp>2007-02-26T10:00:00Z</timestamp>
          <status>ok</status>
        </measurement>
        <measurement>
          <timestamp>2007-02-26T11:00:00Z</timestamp>
          <status>error</status>
        </measurement>
        <measurement>
          <timestamp>2007-02-26T12:00:00Z</timestamp>
          <status>ok</status>
        </measurement>
      </ServiceMetric>
    </metricGroup>
  </Service>
</root>
History of selected test for a host
Request URL:
http://server.org/metric_history?Host_name=bdii001.cern.ch&HostMetric_name=host-cpu-load
Response XML:
<?xml version="1.0"?>
<root xmlns="http://cern.ch/grid-mon/2007/05/mon-exchange-schema/">
  <Host name="bdii001.cern.ch">
    <HostMetric name="host-cpu-load">
      <measurement>
        <timestamp>2007-02-26T10:00:00Z</timestamp>
        <summary>0.1</summary>
        <status>ok</status>
      </measurement>
      <measurement>
        <timestamp>2007-02-26T11:00:00Z</timestamp>
        <summary>10.4</summary>
        <status>warning</status>
      </measurement>
      <measurement>
        <timestamp>2007-02-26T12:00:00Z</timestamp>
        <summary>0.8</summary>
        <status>ok</status>
      </measurement>
    </HostMetric>
  </Host>
</root>
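Both history responses share the same measurement structure, so a client can read them with one routine. The sketch below is an illustration only: read_history is a hypothetical name, summary is kept as a string because its type is not defined in this document, and SAMPLE repeats the host response above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace of the example responses, in ElementTree's {uri}tag notation.
NS = "{http://cern.ch/grid-mon/2007/05/mon-exchange-schema/}"

SAMPLE = """<?xml version="1.0"?>
<root xmlns="http://cern.ch/grid-mon/2007/05/mon-exchange-schema/">
  <Host name="bdii001.cern.ch">
    <HostMetric name="host-cpu-load">
      <measurement>
        <timestamp>2007-02-26T10:00:00Z</timestamp>
        <summary>0.1</summary>
        <status>ok</status>
      </measurement>
      <measurement>
        <timestamp>2007-02-26T11:00:00Z</timestamp>
        <summary>10.4</summary>
        <status>warning</status>
      </measurement>
      <measurement>
        <timestamp>2007-02-26T12:00:00Z</timestamp>
        <summary>0.8</summary>
        <status>ok</status>
      </measurement>
    </HostMetric>
  </Host>
</root>"""


def read_history(xml_text):
    """Return (timestamp, status, summary) tuples sorted by time; summary may be None."""
    root = ET.fromstring(xml_text)
    history = []
    for m in root.iter(NS + "measurement"):
        stamp = datetime.strptime(
            m.findtext(NS + "timestamp"), "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        history.append((stamp, m.findtext(NS + "status"), m.findtext(NS + "summary")))
    history.sort(key=lambda entry: entry[0])
    return history


history = read_history(SAMPLE)
problems = [(t.isoformat(), status) for t, status, _ in history if status != "ok"]
print(problems)
```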
TODO
- the whole Response format section; however, the current examples are a good starting point
- semantics of the data model: how it maps to GOC DB, Glue Schema, SAM and the authoritative sources of values
- naming convention for metric names, suggestion: type:name format, examples:
- I decided not to use the prefixes and to leave this service (repository) specific. If we observe problems with clashing metric names in the future, we can introduce namespacing, with the default namespace (no namespace prefix) being of local scope (referring to this monitoring tool) [-- PiotrNyczyk - 04 Apr 2007]
- decide on capitalization of identifiers, suggestion: ClassName, attributeName, roleName
Comments
Please feel free to add any comments below:
- -- Emir Imamagic - 17 May 2007
- I'm aware that the semantics of the data is still on the TODO list, but I would like to add a comment regarding the attribute type in class Service. I see in the examples that for type values you suggest using nodetype values (e.g. CE, SE, MON). I think that nodetype isn't the best source of values for service type because it doesn't give a clear description of what the service really is (e.g. the service on port 2119 is either a standard or a gLiteCE-flavour Gatekeeper). Also, if you have multiple nodetypes deployed on a single host, a single service can be associated with multiple nodetypes (e.g. the MDS service is deployed on CE, SE and MON). My suggestion would be to use real service names (e.g. GridFTP, MDS, ...) or GlueServiceType values (where applicable).
- If I understood correctly, nodetype values are associated with hosts. Shouldn't the class Host include a value (or an additional class) which defines which nodetype that host is? Or do you plan to exclude it from the standard completely, because it doesn't have to be used in general (e.g. some grid implementations might not use it at all)?
- -- IanNeilson - 06 Mar 2007
- What is the string encoding used? Should this be defined?
- IMHO we don't have to define that, as there are standards for both URIs and XML. In URIs it is clear that everything outside the accepted set of characters should be passed as %hex (for example %20 for a space). In XML you have a default character set or the encoding attribute at the top (<?xml ...) [-- PiotrNyczyk - 04 Apr 2007]
- Why is it necessary to "recommend" alternatives to XML-RPC? (just asking)
- No, and that's why I removed all references to XML-RPC, SOAP, etc. [-- PiotrNyczyk - 04 Apr 2007]
- -- IanNeilson - 27 Mar 2007
- A 'pseudo'-metric is needed in the data model to account for the derived (from critical tests) overall status of the service.
- I think we shouldn't introduce a new class for that, as it doesn't change anything in the attributes or the relations to other classes. This is just a matter of semantics: one metric uses values measured on a "real" system and another uses the results coming from other metrics. [-- PiotrNyczyk - 04 Apr 2007]
- What is the timestamp of the above metric? (now, of latest metric, of earliest metric ..... )
- I would say the timestamp should always be the "time of measurement", and the details can only be defined for each particular metric individually. [-- PiotrNyczyk - 04 Apr 2007]
Change Log
- -- PiotrNyczyk - 04 Apr 2007
- answered the comments
- slightly modified the XML format according to Paul Millar's comments (by mail)
- removed prefixes for metric names
- -- PiotrNyczyk - 12 Apr 2007 - modifications after exchange of ideas with Paul Millar
- changed the namespace in XML to a working one
- changed the structure of history XML: additional element "historyEntry"
- -- PiotrNyczyk - 16 May 2007 - modifications after phone conference (Piotr, James, Paul)
- changed data model: Service identified by an endpoint
- changed the namespace in XML to 2007/05
- changed the structure of XMLs:
- replaced element "historyEntry" with "measurement"
- "measurement" obligatory even for current status query
- added example of SiteMetric in current status response
- example of HostMetric response