Real time downstream was not set for LFC replication
Description
After the SARA database recovery, LFC replication was restored without real time downstream turned on.
Impact
- The latency of LFC replication (i.e. the time gap between data insertion at the source and its apply time) grew to a maximum of one hour between 10.09 12:00 and 14.09 17:00
Timeline of the incident
- Wednesday 8th of September, 14:00 - SARA replication was restored (for both ATLAS and LHCB)
- Friday 10th of September, 16:00 - SARA replication for LFC added back to the main setup
- Tuesday 14th of September, 17:00 - Real time downstream parameter re-enabled for LFC replication
Analysis
LFC replication (unlike LHCB replication) is configured to use a special 'real time downstream' optimization in order to minimize replication latency through the downstream database. After the re-instantiation of conditions replication to SARA by the DBAs (see SARA downtime incident), this parameter was not set back to the correct value. The reasons why the parameter was not set properly after the re-instantiation of SARA are under investigation. The streams re-instantiation procedures are also being reviewed to avoid further occurrences of the issue.
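
As an illustration of the kind of check a reviewed re-instantiation procedure could include, the sketch below compares the capture parameters that are expected for each replication with the values actually in place and reports every difference. It is a minimal sketch, not the procedure used by the DBAs. The capture name LFC_CAPTURE is hypothetical, and the parameter name downstream_real_time_mine with the value 'Y' assumes an Oracle Streams setup, where the current values could be read from a view such as DBA_CAPTURE_PARAMETERS; here they are simply passed in as a dictionary.

```python
# Expected capture parameters per replication.
# ASSUMPTION: the capture name is hypothetical; the parameter name and the
# value 'Y' assume Oracle Streams real-time downstream capture.
EXPECTED = {
    "LFC_CAPTURE": {"downstream_real_time_mine": "Y"},
}


def find_mismatches(expected, actual):
    """Return (capture, parameter, expected_value, actual_value) for each difference.

    A parameter that is missing from `actual` is reported with actual_value None.
    """
    mismatches = []
    for capture, params in expected.items():
        current = actual.get(capture, {})
        for name, want in params.items():
            got = current.get(name)  # None when the parameter is not present
            if got != want:
                mismatches.append((capture, name, want, got))
    return mismatches


if __name__ == "__main__":
    # Illustrative state after a re-instantiation that left the parameter off.
    actual = {"LFC_CAPTURE": {"downstream_real_time_mine": "N"}}
    for capture, name, want, got in find_mismatches(EXPECTED, actual):
        print(f"{capture}: {name} is {got!r}, expected {want!r}")
```

Running such a comparison as the last step of a re-instantiation would flag a parameter left at the wrong value before the replication is added back to the main setup, rather than after latency has already grown.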
--
MarcinBlaszczyk - 27-Sep-2010