Best practices for scheduled downtimes
Tier-1 downtimes
Experiments may experience problems when two or more of their Tier-1 sites are inaccessible at the same time. Therefore Tier-1 sites should do their best to avoid scheduling a downtime classified as "outage" in a time slot overlapping with an "outage" downtime already declared by another Tier-1 site supporting the same VO(s). The following procedure is recommended:
- A Tier-1 should check the downtimes calendar (see below) to see if another Tier-1 already has an "outage" downtime in the desired time slot.
- If there is a conflict, the preferred solution is to pick another time slot.
- If stronger constraints rule out another time slot, the Tier-1 should report the conflict to the SCOD mailing list and raise it at the next WLCG operations call, so that it can be discussed with the representatives of the experiments involved, and possibly with the other Tier-1, to see whether a less disruptive scenario can be arranged.
As an additional precaution, the SCOD will check the downtimes calendar for Tier-1 "outage" downtime conflicts at least once during their shift, covering the current week and the following two weeks; if a conflict is found, the SCOD will follow up with the parties involved.
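The conflict check described above amounts to an interval-overlap test restricted to "outage" downtimes at other sites that support at least one common VO. The following is a minimal sketch of that logic, not an official tool: the `Downtime` record, the `conflicts` function, and the site names are hypothetical, and the sketch assumes the calendar entries have already been loaded into memory.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Downtime:
    """Hypothetical record for one calendar entry (not a real schema)."""
    site: str
    severity: str      # e.g. "outage" or "warning"
    start: datetime
    end: datetime
    vos: frozenset     # VOs supported by the site


def conflicts(planned, declared):
    """Return declared outages at other sites that overlap `planned`
    in time and share at least one VO with it."""
    if planned.severity != "outage":
        return []  # only "outage" downtimes are subject to the rule
    return [
        d for d in declared
        if d.severity == "outage"
        and d.site != planned.site
        and d.vos & planned.vos                               # common VO
        and d.start < planned.end and planned.start < d.end   # time overlap
    ]


# Illustrative data only: site names and dates are made up.
declared = [
    Downtime("T1_A", "outage", datetime(2024, 5, 6, 8), datetime(2024, 5, 6, 18),
             frozenset({"atlas", "cms"})),
    Downtime("T1_B", "warning", datetime(2024, 5, 6, 8), datetime(2024, 5, 6, 18),
             frozenset({"cms"})),
]
planned = Downtime("T1_C", "outage", datetime(2024, 5, 6, 12), datetime(2024, 5, 7, 12),
                   frozenset({"cms", "lhcb"}))

for d in conflicts(planned, declared):
    print(f"Conflict with {d.site}: {d.start} to {d.end}")
```

A non-empty result means the site should either pick another time slot or report the conflict as described in the procedure. The same function could be applied to each declared outage in turn to reproduce the SCOD's periodic check.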
Links to Tier-1 downtimes
Advance notifications of downtimes
The experiments have stated how far in advance they would like to be informed of a scheduled downtime, depending on its duration. In general, the earlier, the better. The following table summarizes the advance notification (N) that each experiment would appreciate for downtimes (DT) of various lengths:
Note that CMS would like to have sufficient time to migrate data away from a site that will have a downtime longer than 1 month; such downtimes should be discussed with CMS operations.