Follow-up of the meeting of 12 Feb 2009 about busy storage services:

Feb 13th 2009: Minutes sent by Akos (text file 'mail_akos.txt' attached to this page)

16 Feb 2009 - Alex Sim:
We probably need to mention what the server-side automatic "request abort" means/does after the total request time expires. It could be different from when the client calls for the request abort or a file abort, or it could have the same effect.

17 Feb 2009 - David Smith:
As we're recommending an exponential back off for the client (when the client doesn't have more information, like the estimatedWaitTime), the client will soon have backed off to a long time between queries. I don't think it is necessary to add that INTERNAL_ERROR means to back off even more quickly.
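
For concreteness, a minimal sketch (Python; the parameter names and default values are illustrative assumptions, not part of the recommendation) of how a client could pick its next polling delay along these lines: honour a server-supplied estimatedWaitTime when it is present, and otherwise keep doubling the previous wait up to a cap:

def next_poll_delay(estimated_wait, previous_delay,
                    initial_delay=10.0, max_delay=3600.0):
    """Seconds to wait before the next status poll.

    estimated_wait  -- estimatedWaitTime returned by the server, or None
    previous_delay  -- delay used before the previous poll, or None
    """
    if estimated_wait is not None:
        # The server said when to come back: use that.
        return estimated_wait
    if previous_delay is None:
        return initial_delay
    # No hint from the server: exponential back-off, capped.
    return min(previous_delay * 2.0, max_delay)

With such a rule the interval between status queries grows quickly on its own, which is David's point: a special, faster back-off for INTERNAL_ERROR would add little.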

18 Feb 2009 - Alex Sim:
Aborting the request implies following the WLCG-recommended behavior of srmAbort. This means that completed files MUST NOT be aborted or removed, while uncompleted files MUST be aborted and removed. E.g. 5.11.2.e: when aborting an srmPrepareToGet request, all uncompleted files must be aborted, and all successfully completed files must be released. Flavia agrees.
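
Purely as an illustration, a small runnable sketch (Python; the request/file structures and the print statements stand in for real storage-system calls) of that recommended abort behavior:

from dataclasses import dataclass, field

@dataclass
class FileRequest:
    surl: str
    completed: bool      # True once the file was successfully staged/pinned

@dataclass
class Request:
    files: list = field(default_factory=list)

def abort_request(request):
    """WLCG-recommended srmAbort handling (cf. 5.11.2.e): successfully
    completed files are released, uncompleted files are aborted and removed."""
    for f in request.files:
        if f.completed:
            print(f"release {f.surl}")            # release only, never remove
        else:
            print(f"abort and remove {f.surl}")   # abort and clean up

abort_request(Request(files=[FileRequest("srm://se.example/f1", True),
                             FileRequest("srm://se.example/f2", False)]))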

Shaun: I believe desiredTotalRequestTime is negotiable and can be set to a value different from the input value, so the returned remainingRequestTime should be set accordingly.

18 Feb - Shaun:
If a server returns INTERNAL_ERROR from an asynchronous request, is this a request to back off only for that request? Should a client interpret an INTERNAL_ERROR from a synchronous request as a general request to back off from sending anything? Would it make sense to say that an INTERNAL_ERROR from a Ping request is an indicator that all requests should be delayed? This is an item for discussion; I don't expect any answer immediately...

A more precise definition of 'the client must back off when receiving an INTERNAL_ERROR' is needed. Proposal by Andrea: I think that the only meaningful thing is to suggest some kind of exponential back off, preferably with randomized increments. This is because there could well be cases where SRM_INTERNAL_ERROR is due to a very transient problem, and waiting N hours (as you seem to suggest) would be very inefficient.
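
A minimal sketch of such a randomized exponential back-off (Python; the initial delay, factor and cap are arbitrary illustrative values, not part of the proposal):

import random

def backoff_delays(initial=10.0, factor=2.0, cap=3600.0):
    """Yield successive wait times for re-polling after SRM_INTERNAL_ERROR:
    each nominal delay is the previous one multiplied by `factor`, and the
    actual wait is drawn at random around it so that many clients hit by
    the same transient problem do not retry in lock-step."""
    delay = initial
    while True:
        yield random.uniform(0.5 * delay, 1.5 * delay)
        delay = min(delay * factor, cap)

# example: the first few waits (in seconds) of one client
gen = backoff_delays()
print([round(next(gen), 1) for _ in range(6)])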

According to my calculations, if you have N clients using an exponential backoff, the total polling frequency goes like

f =~ N / ((r-1)*t)

(apart from numerical constants)

where t is total time elapsed and r the factor multiplying the previous backoff time interval. So, with 1000 clients and r=2, after 100 seconds the polling frequency would be 10 Hz, after 1000 seconds 1 Hz, etc. If you choose to wait for "a good while", meaning by that some large constant amount of time T, the total frequency will go as

f = N / T

so it will be constant with time. The bad thing here is that you must tune T as a function of N, and you are very inefficient if the problem is resolved quickly. All things considered, I think that the exponential backoff is much preferable.
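
As a rough numerical check of the estimate above (a simulation sketch; the randomized starting interval and the measurement window are my own choices, not part of the argument):

import random

def poll_times(r, first_interval, horizon):
    """Poll times of one client whose wait interval is multiplied by r
    after every poll, starting from first_interval seconds."""
    times, t, d = [], 0.0, first_interval
    while t < horizon:
        t += d
        times.append(t)
        d *= r
    return times

random.seed(1)
N, r = 1000, 2.0
# give each client a random starting interval so they are not in phase
clients = [poll_times(r, 2 ** random.uniform(0.0, 1.0), 1200.0) for _ in range(N)]

for t in (100.0, 1000.0):
    window = 0.1 * t                               # count polls in [t, 1.1*t]
    polls = sum(1 for c in clients for x in c if t <= x < t + window)
    estimate = N / ((r - 1.0) * t)                 # f =~ N / ((r-1)*t)
    print(f"t={t:6.0f}s  simulated={polls / window:5.2f} Hz  estimate={estimate:5.2f} Hz")

The simulated rate falls off roughly like 1/t, in line with the estimate up to a numerical constant of order one, as stated above.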

-- ElisaLanciotti - 23 Feb 2009
