CMSR node broken, May 26th 2010

Description

On Wednesday 26.05, CMSR instance 3 crashed at around 9:20 am. The crash was caused by a hardware problem: a failed memory module.

Impact

  • The CMS offline database remained up throughout. Only sessions connected to instance 3 were affected, and they reconnected to the remaining database instances.

Timeline of the incident

  • Wednesday 26.05.2010: vendor call opened; intervention postponed to Friday
  • Friday 28.05.2010: vendor postponed the intervention to Monday
  • Monday 31.05.2010: vendor removed the failed memory and again postponed the intervention, to Friday
  • Tuesday 01.06.2010: new vendor call opened to exchange the memory with memory from another node
  • Friday 04.06.2010: vendor did not replace the memory; still waiting

Analysis

  • A vendor call was opened on Wednesday 26.05, after the node crashed, to replace the failed memory. The vendor should have replaced the memory within 12 working hours. However, the intervention was postponed to Friday 28.05, and on Friday it was postponed again to Monday 31.05. In the meantime the CMSR database was running on 3 nodes (out of 4), which was enough to handle the CMS database load. On Monday, the vendor only removed the failed memory module without replacing it, and the intervention was postponed to the end of the week (Friday 04.06). No spare nodes were available on RAC7 to replace the crashed node, because of delays with the installation of new hardware. On Tuesday 01.06, a new vendor call was opened to replace the failed memory with memory from a healthy node (used by a test database on RAC7).

Follow up

  • The hardware problem was escalated with Dell.
  • On Wednesday 02.06, the memory was exchanged and the node was added back to the CMSR cluster.

-- EvaDafonte - 07-Jun-2010

Topic revision: r1 - 2010-06-07 - EvaDafonte