
DB Web>PostMortems>PostMortem28Feb13 (2013-03-03, AntonTopurov)

ACCLOG frozen instance 2, LHCLOGDB service unavailable

Description

ACCLOG instance 2 became inaccessible due to a memory problem. The LHCLOGDB service running on this instance was unavailable as a result.

Impact

The LHCLOGDB service was unavailable from 04:03 until it was manually relocated at 04:44 (about 41 minutes).

Time line of the incident

28-Feb-13 04:03 - ACCLOG2 instance froze due to problems with memory
28-Feb-13 04:19 - RACMON SMS alert sent to the shift phone.
28-Feb-13 04:25 - Investigation started
28-Feb-13 04:44 - Service LHCLOGDB manually relocated to the surviving instance.
28-Feb-13 04:48 - Instance killed by the person on shift.
28-Feb-13 04:48 - Automatic restart by Clusterware failed with "unable to allocate Large Pages".
28-Feb-13 04:53 - Successful manual start of the instance.
28-Feb-13 05:12 - Service LHCLOGDB relocated to its preferred instance.
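
The intervals implied by this time line can be checked with a few lines of Python. This is only a sketch: the event labels (`froze`, `alert`, `relocated`, `restored`) are names chosen here for illustration, and the timestamps are copied from the entries above. Month parsing with `%b` assumes an English locale.

```python
from datetime import datetime

# Timestamps copied from the time line; the keys are illustrative labels.
EVENTS = {
    "froze": "28-Feb-13 04:03",
    "alert": "28-Feb-13 04:19",
    "relocated": "28-Feb-13 04:44",
    "restored": "28-Feb-13 05:12",
}

def minutes_between(start, end, fmt="%d-%b-%y %H:%M"):
    """Return the whole minutes elapsed between two time line entries."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

def summarize(events=EVENTS):
    """Compute the main intervals of the incident, in minutes."""
    return {
        "time_to_alert": minutes_between(events["froze"], events["alert"]),
        "service_outage": minutes_between(events["froze"], events["relocated"]),
        "until_preferred_instance": minutes_between(events["froze"], events["restored"]),
    }

if __name__ == "__main__":
    for name, minutes in summarize().items():
        print(f"{name}: {minutes} min")
```

With the values above, the alert arrived 16 minutes after the freeze, the service was unavailable for 41 minutes, and it was back on its preferred instance 69 minutes after the freeze.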

Analysis

ACCLOG2 instance was inaccessible from 04:03. Monitoring reported:

acclog: Error monitoring service lhclog: ORA-01034: ORACLE not available
acclog: Error monitoring service lhclog: ORA-27123: unable to attach to shared memory segment

The alert.log was full of the following messages:

Process W000 died, see its trace file
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process

Clusterware did not detect any anomaly, so the LHCLOGDB service was not automatically relocated to the surviving instance 1 and remained unavailable.
It was not possible to connect to the frozen instance, so it had to be killed. The subsequent restart by Clusterware failed because huge pages could not be allocated.
A manual restart a few minutes later did not encounter memory problems and succeeded.
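
The analysis does not state why huge pages were unavailable at 04:48 but available at 04:53. A pre-start check could at least make that condition visible. The following is a minimal sketch, assuming a Linux host where `/proc/meminfo` reports `HugePages_Free` and `Hugepagesize` (in kB); the function names and the SGA size passed in are hypothetical, not part of any existing IT-DB script.

```python
def parse_hugepages(meminfo_text):
    """Extract the huge page counters from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or not rest.split():
            continue
        if key.startswith("HugePages_") or key == "Hugepagesize":
            # Counters are plain integers; Hugepagesize is followed by "kB".
            fields[key] = int(rest.split()[0])
    return fields

def free_hugepage_bytes(meminfo_text):
    """Return the memory currently available as free huge pages, in bytes."""
    fields = parse_hugepages(meminfo_text)
    return fields.get("HugePages_Free", 0) * fields.get("Hugepagesize", 0) * 1024

def can_start_instance(meminfo_text, sga_bytes):
    """True if the free huge pages could hold an SGA of sga_bytes (assumed size)."""
    return free_hugepage_bytes(meminfo_text) >= sga_bytes

if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        text = handle.read()
    # 1 GiB is a placeholder; the real SGA size is not given in this report.
    print(can_start_instance(text, 1 << 30))
```

Polling such a check for a short while before a restart is one possible way to avoid a failed first attempt, but whether that matches the actual cause here is not established.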

Follow up

All IT-DB monitoring tools (RACMON, EM11, legacy scripts) detected the problem very quickly.
Clusterware does not detect this kind of problem.
Service relocation in such cases is to be implemented with scripts, which is not very straightforward.
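
As a starting point for such a script, the sketch below only decides whether a relocation looks warranted and builds the command; it does not execute anything. It assumes the two ORA errors reported by monitoring in this incident are a sufficient trigger, and it uses the 11g-style `srvctl relocate service` options (`-d`, `-s`, `-i`, `-t`), which differ in other Oracle versions. The database, service and instance names in the example are guesses based on this report.

```python
# Error codes seen in the monitoring output during this incident.
FROZEN_INSTANCE_ERRORS = ("ORA-01034", "ORA-27123")

def instance_looks_frozen(monitor_output):
    """True if the monitoring text contains one of the known error codes."""
    return any(code in monitor_output for code in FROZEN_INSTANCE_ERRORS)

def build_relocate_command(database, service, from_instance, to_instance):
    """Build the srvctl argument list for moving a service between instances."""
    return [
        "srvctl", "relocate", "service",
        "-d", database,
        "-s", service,
        "-i", from_instance,
        "-t", to_instance,
    ]

def plan_relocation(monitor_output, database, service, from_instance, to_instance):
    """Return the command to run, or None if no relocation is warranted."""
    if not instance_looks_frozen(monitor_output):
        return None
    return build_relocate_command(database, service, from_instance, to_instance)

if __name__ == "__main__":
    message = "acclog: Error monitoring service lhclog: ORA-01034: ORACLE not available"
    # All four names below are assumptions for illustration.
    command = plan_relocation(message, "ACCLOG", "LHCLOGDB", "ACCLOG2", "ACCLOG1")
    print(" ".join(command) if command else "no action")
```

A real script would also need to confirm that the target instance is healthy and guard against repeated or concurrent relocations, which is part of why this is not straightforward.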

Topic revision: r1 - 2013-03-03 - AntonTopurov
