LCG Web>WLCGCommonComputingReadinessChallenges>CCRC08SSWGStorageBusyMeeting090212 (2009-02-25, FlaviaDonno)

First meeting on "busy" storage services (12/02/2009)

Participants

J.P. Baud, G. Behrman, B. Bockelman, Shaun De Witt, F. Donno, A. Frohner, E. Lanciotti, G. Lo Presti, L. Magnoni, R. Mollon, T. Perelmutov, A. Sciabà, A. Sim, A. Shoshani, D. Smith, P. Tedesco, R. Zappi

Assumptions

In what follows, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC 2119.

desiredTotalRequestTime, remainingTotalRequestTime and estimatedWaitTime are expressed in seconds.

The value of desiredTotalRequestTime, if specified, MUST be zero or positive. If it is 0, each file request must be tried at least once. If it is unspecified, the request may be retried for a duration that depends on the SRM implementation, in case a transient error occurs (5.1.2).

The initial TotalRequestTime MAY differ from the desiredTotalRequestTime specified by the user.

remainingTotalRequestTime indicates how much of the initial TotalRequestTime is left. If it is 0, the request has timed out. If it is -1, each file request will be tried at least once (5.1.2).

For estimatedWaitTime, -1 stands for "unknown" (1.33).

desiredTotalRequestTime MUST be relative to the time the request is initially processed by the server. remainingTotalRequestTime and estimatedWaitTime, if returned, MUST be relative to the time when the response from the server is provided to the client.

The desiredTotalRequestTime and the returned remainingTotalRequestTime MUST NOT determine the remainingPinTime of TURLs and/or copies, or the states of SURLs. In fact, the remainingPinTime determines the lifetime of a copy or TURL after the request that created it has been executed. In the case of an srmPrepareToPut request, the request itself completes independently of when a PutDone is issued.
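
As an illustration of the conventions above, the following minimal Python sketch interprets the special values of remainingTotalRequestTime and estimatedWaitTime and converts a relative time into an absolute one. The function names and the returned strings are hypothetical and are not part of the SRM specification; None is assumed to stand for "value not returned".

```python
from typing import Optional


def describe_remaining(remaining: Optional[int]) -> str:
    """Interpret a remainingTotalRequestTime value (in seconds)."""
    if remaining is None:
        return "not returned"
    if remaining == 0:
        return "timed out"
    if remaining == -1:
        return "each file request tried at least once"
    if remaining > 0:
        return f"{remaining} s left"
    raise ValueError("remainingTotalRequestTime must be -1, 0 or positive")


def describe_wait(estimated: Optional[int]) -> str:
    """Interpret an estimatedWaitTime value (in seconds); -1 means unknown."""
    if estimated is None or estimated == -1:
        return "unknown"
    return f"poll again in about {estimated} s"


def absolute_time(relative_seconds: int, reference_time: float) -> float:
    """Convert a relative time into an absolute one.

    For remainingTotalRequestTime and estimatedWaitTime the reference is the
    moment the server response reaches the client.
    """
    return reference_time + relative_seconds
```

For example, describe_remaining(0) reports a timed-out request, while describe_wait(-1) reports an unknown wait time.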

Conclusions

Prescriptions for the SRM server

Synchronous requests

The SRM server MUST return SRM_INTERNAL_ERROR when it is too busy to process the request.

Asynchronous requests

The SRM server SHOULD clean up a request after a reasonable time, but not before the request is completed, failed or aborted.
The SRM server SHOULD do its best to abort a request after the remainingTotalRequestTime has expired (=0). Aborting the request implies following the WLCG recommended behavior of srmAbort. This means that completed files MUST NOT be aborted or removed, while uncompleted files MUST be aborted and removed.
The SRM server SHOULD return a remainingTotalRequestTime less than or equal to the time until the request times out.
The SRM server MUST return SRM_INTERNAL_ERROR when it is too busy (to queue the request?). It is up to the specific implementation to establish what it means for a server (or any of its components) to be too busy. The status of the single files SHOULD NOT be returned.
If the request will be processed (request status equal to SRM_REQUEST_QUEUED or SRM_REQUEST_INPROGRESS), the SRM server SHOULD return an estimatedWaitTime for each file in the request to tell the client when the next polling SHOULD happen in order to have a new update on the status of each file.
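
The abort-on-expiry rule in the list above can be sketched as follows. This is only an illustration in Python under stated assumptions: the FileRequest record and the abort_expired function are hypothetical names, and a real server would also have to remove the uncompleted files from storage.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileRequest:
    """Hypothetical per-file record kept by the server."""
    surl: str
    completed: bool
    aborted: bool = False


def abort_expired(files: List[FileRequest], remaining_total_request_time: int) -> List[str]:
    """Abort the uncompleted files of a request whose time has expired (=0).

    Completed files are left untouched; the SURLs of the aborted files are
    returned so that the caller can remove them.
    """
    if remaining_total_request_time != 0:
        return []  # the request has not timed out, nothing to abort
    aborted = []
    for f in files:
        if not f.completed:
            f.aborted = True
            aborted.append(f.surl)
    return aborted
```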

Prescriptions for the polling algorithm of a client application

Synchronous requests

If the client application receives an SRM_INTERNAL_ERROR from the SRM server, it SHOULD repeat the request with an exponential retry time until a timeout is reached.

Asynchronous requests

A client application SHOULD specify a desiredTotalRequestTime equal to the total time for which the client would like the request to be processed.
A client SHOULD abort a request after the desiredTotalRequestTime has expired, in order to be backward compatible with the behaviour of some existing SRM server implementations.
A client application SHOULD stop polling the status of the request when one of these conditions is satisfied (in this order):
- the SRM server returns a remainingTotalRequestTime equal to 0 (zero) or SRM_REQUEST_TIMED_OUT
- the desiredTotalRequestTime has elapsed.
If the client application receives an SRM_INTERNAL_ERROR from the SRM server, it SHOULD repeat the request with an exponential retry time until a timeout is reached.
The client application SHOULD poll the status of a request again after a time of the order of the estimatedWaitTime of the files in the request, if available, or after an exponential polling time if the estimatedWaitTime is -1 or undefined.
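
The polling rules above can be sketched in Python as follows. This is a minimal sketch, not a client implementation: the Status tuple, the get_status and is_final callables and the 5 s initial backoff are assumptions made for the example, and a single estimatedWaitTime per response is used for simplicity although the server returns one per file.

```python
import time
from typing import Callable, NamedTuple, Optional


class Status(NamedTuple):
    """Hypothetical, simplified view of a status response."""
    code: str
    remaining_total_request_time: Optional[int] = None
    estimated_wait_time: Optional[int] = None


def should_stop_polling(status: Status, elapsed: float, desired_total_request_time: float) -> bool:
    """Apply the two stop conditions in the order given in the text."""
    if status.remaining_total_request_time == 0 or status.code == "SRM_REQUEST_TIMED_OUT":
        return True
    return elapsed >= desired_total_request_time


def poll(get_status: Callable[[], Status],
         desired_total_request_time: float,
         is_final: Callable[[Status], bool],
         sleep: Callable[[float], None] = time.sleep,
         clock: Callable[[], float] = time.monotonic,
         initial_backoff: float = 5.0) -> Status:
    """Poll until a final state or a stop condition is reached."""
    start = clock()
    backoff = initial_backoff
    while True:
        status = get_status()
        if is_final(status) or should_stop_polling(
                status, clock() - start, desired_total_request_time):
            return status
        ewt = status.estimated_wait_time
        if ewt is not None and ewt > 0:
            delay = float(ewt)   # the server told us when to come back
        else:
            delay = backoff      # -1 or undefined: exponential polling time
            backoff *= 2
        sleep(delay)
```

An SRM_INTERNAL_ERROR response carries no estimatedWaitTime, so in this sketch it falls into the exponential branch.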

Questions still to be answered

Is dCache taking into account desiredTotalRequestTime for all asynchronous requests?
Is dCache correctly returning remainingTotalRequestTime for all asynchronous requests?
Is it possible for an SRM server to update the status of a request promptly, so that an asynchronous request can return a reasonably up-to-date status? For instance, in case of a file being UNAVAILABLE or LOST, an srmBringOnline request SHOULD return almost immediately with the correct file status.
Some participants in the phone conference had a different impression of the decision reached on how the client should handle an SRM_INTERNAL_ERROR returned by the server, namely whether the client should repeat the request with an exponential retry time or rather back off and retry some time later. Which one should we consider? Please note that SRM_FAILURE corresponds to a fatal error and is a final status that forces the SRM server to process a request, while with SRM_INTERNAL_ERROR we would like to signal to clients a very critical status of the server, which might have non-working components, hardware failures, or resource usage at the very limit. If we do not want to force the client to back off totally, and therefore treat SRM_INTERNAL_ERROR almost as a fatal error, we can suggest using an exponential retry algorithm with a very steep exponential curve. The point of concern is that servers now suffer a lot from DoS attacks.

Topics for the next meeting

Discuss the correctness of the implementations of the srmStatusOf... calls

Miscellaneous considerations

It is agreed NOT to reuse SRM_FILE_BUSY at the request level, but use SRM_INTERNAL_ERROR
estimatedWaitTime is returned by CASTOR (but it is not very reliable), and it is NOT returned by StoRM, DPM and dCache
It was suggested to use srmGetRequestSummary to define the polling time, since it is lighter on the SRM server than a srmStatusOf... call. The idea is to use longer polling times when the number of completed files does not change, and shorter times when it changes.
It is agreed that srmPing cannot possibly return an estimate of the time to process a request because it would be strongly dependent on the type of request
The idea of conveying information via otherInfo of srmPing has been discarded: clients must not parse text strings
When SRM_INTERNAL_ERROR is returned for an asynchronous request, no structure about files is returned, hence estimatedWaitTime is not returned
estimatedWaitTime was explicitly considered optional by the WLCG addendum
In the current SRM specifications the following sentence appears: "The desiredTotalRequestTime can be negotiated with the SRM implementation. This functionality is desired for the longer term but not required for the start of LHC production." This sentence must be changed with an appropriate WLCG addendum specifying that "The desiredTotalRequestTime and remainingRequestTime functionality is desired for the start of LHC production."
An asynchronous request can return SRM_INTERNAL_ERROR if SRM is so busy it cannot even queue the request
Need to modify the WLCG addendum to clarify the meaning of estimatedWaitTime and that all times are in seconds (see Assumptions)
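
The suggestion above to derive the polling time from srmGetRequestSummary could be sketched as follows. The function name, the doubling and halving factors and the 5 s and 300 s bounds are all assumptions chosen for the example; the meeting only agreed on the general idea of longer times when nothing changes and shorter times when something does.

```python
def next_polling_time(current: float,
                      completed_now: int,
                      completed_before: int,
                      minimum: float = 5.0,
                      maximum: float = 300.0) -> float:
    """Adapt the polling time to the progress seen in the request summary.

    completed_now and completed_before are the numbers of completed files
    reported by two consecutive summary calls.
    """
    if completed_now == completed_before:
        # No progress: poll less often, up to the maximum.
        return min(current * 2, maximum)
    # Progress: poll more often, down to the minimum.
    return max(current / 2, minimum)
```

With these assumed values, a 10 s polling time becomes 20 s when the number of completed files is unchanged and 5 s when it has changed.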

Reference pseudo-code

Synchronous requests

sleep_time = 5 // reset at each call
total_time = 0 // only reset at the first time
ret = srmSyncMethod
while (ret == SRM_INTERNAL_ERROR and total_time < total_timeout) {
    sleep(sleep_time)
    total_time += sleep_time
    sleep_time *= 2
    ret = srmSyncMethod
}
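
For readers who want to try the synchronous loop, a direct Python transcription might look like the sketch below. The retry_sync name, the call argument (any function performing the synchronous SRM call and returning its status code as a string) and the injectable sleep are assumptions made for the example.

```python
import time
from typing import Callable

SRM_INTERNAL_ERROR = "SRM_INTERNAL_ERROR"


def retry_sync(call: Callable[[], str],
               total_timeout: float,
               initial_sleep: float = 5.0,
               sleep: Callable[[float], None] = time.sleep) -> str:
    """Repeat a synchronous call with a doubling retry time until it stops
    returning SRM_INTERNAL_ERROR or the total timeout is reached."""
    sleep_time = initial_sleep   # reset at each call
    total_time = 0.0
    ret = call()
    while ret == SRM_INTERNAL_ERROR and total_time < total_timeout:
        sleep(sleep_time)
        total_time += sleep_time
        sleep_time *= 2
        ret = call()
    return ret
```

As in the pseudo-code, only the time spent sleeping is counted against total_timeout, not the duration of the calls themselves.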

Asynchronous requests

sleep_time = 5 // default is 1 in FTS; reset at each call
total_time = 0
ret = srmAsyncMethod
while (ret == SRM_INTERNAL_ERROR and total_time < total_timeout) {
    sleep(sleep_time)
    total_time += sleep_time
    sleep_time *= 2
    ret = srmAsyncMethod
}
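if (ret == SRM_INTERNAL_ERROR) stop // timed out: the request was never queued, so there is no requestId to poll
req = ret->requestId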
ret = srmStatusOfAsyncMethod(req)
while ((ret == SRM_INTERNAL_ERROR or not finalState(ret)) and total_time < total_timeout) {
    sleep(sleep_time)
    total_time += sleep_time
    sleep_time *= 2 // could be something smaller
    ret = srmStatusOfAsyncMethod(req)
}

Example of a randomized exponential backoff

This plot describes the total request rate in a case with 1000 clients contacting a server every 5 s when the server is up, and polling with a randomized exponential backoff time when the server returns SRM_INTERNAL_ERROR. The server is down between 200 s and 1000 s. The horizontal scale is in tens of seconds and the vertical scale in tens of Hz. The backoff is effective in quickly bringing down the total request rate to a negligible level, and the recovery after the server is up again is gradual, as desired. The backoff algorithm implemented is this
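
The listing of the algorithm is not included in this copy of the page. Purely as an illustration, and not as the algorithm used to produce the plot, a randomized exponential backoff can be sketched in Python as below. The uniform jitter between half the ceiling and the ceiling, the 5 s base, the doubling factor and the 600 s cap are all assumptions.

```python
import random
from typing import List, Optional


def randomized_backoff_delays(base: float = 5.0,
                              factor: float = 2.0,
                              cap: float = 600.0,
                              attempts: int = 6,
                              rng: Optional[random.Random] = None) -> List[float]:
    """Return the successive waiting times of a randomized exponential backoff.

    Each delay is drawn uniformly between half the current ceiling and the
    ceiling itself, and the ceiling grows by `factor` up to `cap`.
    """
    rng = rng or random.Random()
    delays = []
    ceiling = base
    for _ in range(attempts):
        delays.append(rng.uniform(ceiling / 2, ceiling))
        ceiling = min(ceiling * factor, cap)
    return delays
```

The randomization spreads out clients that were all rejected at the same moment, so that they do not all come back at once when the server recovers.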

-- Flavia Donno, Akos Frohner, Elisa Lanciotti and Andrea Sciaba - 17 Feb 2009

Attachments

Topic attachments
I	Attachment	History	Action	Size	Date	Who	Comment
png	test.png	r1	manage	21.8 K	2009-02-20 - 10:54	AndreaSciaba	Exponential backoff

Topic revision: r11 - 2009-02-25 - FlaviaDonno
