Backlinks to PostMortems in DB Web

Results from DB web retrieved at 10:00 (GMT)

Database issues after patching, June 2nd-3rd 2010 Description Oracle PSU APR 2010 patch, although it has passed validation on test and integration databases, showed...
Local filesystem got full on two LHCBR nodes Description Local filesystem ORA/dbs00 got full on LHCBR 1st and 2nd node. Impact LHCBR database was not...
SIR application not displaying correctly (missing static resources) Description The static content of the SIR application was not displayed correctly. This problem...
Accelerator databases unavailability Description Accelerator databases (acclog, acccon, accmeas, eniaar, laser, tim, zora, susi, encvorcl) could not be accessed from...
OVM hypervisor pool reboot This post mortem is work in progress! Description CHANGE ME Brief (a few lines) introducing the incident. Focus on what the end users...
IT DB Virtualisation Infrastructure degraded Description A pool of 19 OracleVM hypervisors located at SafeHost, hosting various production AIS application servers...
ATLAS Online database not accessible from the ATLAS Technical and Coordination Network (ATCN) Description The ATLAS Online Database (ATONR) was accessible from...
Cinder database restored from backup Description The Cinder database hosted in the DBOD OpenStack CRS was relocated to the second node of the cluster, generating...
Extended downtime of LHCB online DB following power cut of August 9th, 2010 Description LHCB online database was down from Monday 9.8.2010 at 21:41 till Tuesday...
Spontaneous reboots of different nodes of CMSR production database Description During the last 3 weeks the CMS offline production database was affected by numerous reboots...
Replication for ATLAS conditions and LHCB conditions to SARA stopped Description The database at SARA has been unavailable since a storage corruption issue on 18.08....
SHORT UNAVAILABILITY OF ACCMEAS LASER DATABASES Description Oracle database virtual IPs (vip) and listeners went down on the ACCMEAS and LASER databases on Wednesday...
Description On 11th August, following instructions from NetApp Support (NetApp Log #2001633149), we tried to change the shelf ID loop identifier of two disk shelves. One...
Unavailability of the CMS offline production DB (CMSR), 11th March 2011 Description The CMS offline production database (CMSR) went down due to a local power cut...
ATLAS offline production database (ATLR) high load Description Atlas offline DB, ATLR, suffered high load and instance reboots during the nights of 11th and 12th...
Few short interruptions of replication of CMS data from online to offline Description On Tuesday 13th July at around 1:30 AM the replication of CMS data failed due...
Real time downstream was not set for LFC replication Description After SARA database recovery LFC replication was restored without real time downstream turned on...
Instability of node 2 and 4 of CMSR database affecting online to offline replication Description CMSR rebooted 3 times on Monday evening 13.09.2010 (node 4 twice...
Performance problems on the Opendays ticketing application infrastructure Description From August 15th to August 20th, the Opendays ticketing application infrastructure...
Description Instance 4 of Atlas offline was rebooted 4 times over a period of 8 days. All reboots happened between 4 AM and 5 AM and occurred every second day. Impact...
ADCR database suffered from lost writes Description In the period of 9th November 2012 to 15th January 2013 ADCR production database has suffered from several lost...
BAAN LDAP login broken Description Users could not login on BAAN on Thu, July 15th from 9:55 to 10:25. Only the users defined in LDAP were affected (all system related...
Unavailability of the CMS offline production DB (CMSR), 15th March 2011 Description The CMS offline production database (CMSR) went down due to problems with SAN...
Real time downstream was not set for LFC replication Description After SARA database recovery LFC replication was restored without real time downstream turned on...
Description On Wednesday afternoon, at 2pm, the Atlas offline production database (ATLR) was affected by a scheduled intervention aimed at replacing a defective redundant...
Power cut in the computer center This is to keep track of issues raised after the power cut that occurred on Saturday, December 18th 2010 from 12:00 to 16:00. TODO...
DBoD VOMS database unavailable Description The DBoD VOMS database was unavailable on Saturday 18th October from 05:54 to 17:10. Impact CERN VOMS server not...
Database services unavailable or degraded after network intervention Description Some database services were unavailable or degraded as a side effect of the network...
Loss of connectivity with accelerator databases in TN due to a network issue Description On 19th July while preparing an intervention on accelerator database NAS...
RAC52 storage issue Description Database services running on RAC52 were in general unavailable to users. In some cases existing connections were working but no new...
Instability of node 3 and 4 of CMSR database affecting online to offline replication Description On Friday 20th August in the morning nodes 3 and 4 of the CMSR production...
Storage upgrade on NetApp Vault clusters. Description As recommended by NetApp support and following its procedure, an upgrade has been tried on 2 storage clusters...
RAC51 storage switch extension outage Description Planned intervention to add additional storage switches to RAC51 resulted in a major service outage. All AIS services...
Unavailability of MySQL backend used by Drupal, 21st July 2011 Description MySQL VMs are on dbvrtg046 and dbvrtg047. Impact No user access to Drupal sites, including...
Unexpected problems during Qualiac migration to Safe Host (DB nodes AND Application servers). Description Date proposed (having in mind absences, upcoming easter...
Two OVM servers were rebooted by the high availability system Description During what was supposed to be routine maintenance the OVS agent rebooted two nodes (dbsrvg...
Storage intervention severely affecting OVM infrastructure Description A clean-up action on the storage affected several production volumes used by production OVM...
CMSONR database blockage during snapshot taking (in progress) Description On Tuesday a snapshot requested by CMS was finishing export from CMSONR database. The job...
Replication of ATLAS data to Tier1 sites stopped because of capture crashing permanently Description On Monday, 23rd of August at about 8.15 AM, the ATLAS data capture...
ACCLOG activity suspended due to outage of one NAS Filer. Description Database activity in ACCLOG became suspended when the NFS access to one of the filers was cut...
Problem during NAS upgrade. Some databases: Drupal, EDH, AIS RAC IMPACT, Foundation, PPT, etc., NOVA, hammercloud (Atlas), etc. affected. Description This intervention...
DB outage caused by problems in filer dbnasr1132 Description Several databases (PDBR, LHCBR, ATLR, ATONR, SUSI TEST, ZORA TEST, TIM DB) remained in a `suspended` state...
ACCCON partially not reachable Description ACCCON has been partially not reachable (acc settings db) between 5:29 and 6:10. Impact The acc settings db database...
LHCb Streaming to PIC hung, June 24th-25th 2010 Description The Streams process responsible for propagation of transactions of LHCb experiment to PIC got stuck...
Description Atlas offline DB, ADCR, went down due to disk failure on 23rd of November 2011 around 21:45. After emergency failover to standby hardware, ADCR database...
LCGR extended downtime during db upgrade to 10.2.0.5 Description The LCGR database was down on Tuesday 25.01 from 09:30 to 17:00 due to the problems encountered during...
LHCBR database got stuck Description LHCB offline production database (LHCBR) hung completely following a disk failure Impact The database was not available to...
Services were not failed over properly after node eviction on ATLR, June 26th 2010 Description 4th node of ATLAS offline database (ATLR) suffered from high load...
CMSR node broken, May 26th 2010 Description On Wednesday 26.05, CMSR instance 3 crashed around 9:20 am. It was caused by a hw problem related to a memory module...
Qualiac and other long term export files deleted from TSM Description Because of deletion request sent to TSM team, apart from old and not used filesystem backup...
Inconsistency of data at SARA (after recovery) Description Replication of LHCb conditions to SARA stopped because streams apply process aborted due to inconsistency...
Database storage incident Description After a network cable replacement on the database storage, part of the Database on Demand instances lost access to the specific...
CMSR database hung following vendor mistake during broken disk replacement Description CMS offline production database (CMSR) got stuck around 14:00 following a mistake...
Unavailability of the CMS offline production DB (CMSR), 28th April 2011 Description The CMS offline production database (CMSR) got stuck on Thursday 28th April...
ADCR ADG RAC7 database suffered from multiple block corruptions after disk failures Description In the last 2 weeks of December 2013 we have observed multiple read...
ACCLOG frozen instance 2, LHCLOGDB service unavailable Description ACCLOG instance number 2 was inaccessible due to a memory problem. Service LHCLOGDB running on this...
AISLogin not available Description One connection out of two to AISLogin and all the websites served by the application server ias ais03 prod ( aismedia.cern.ch...
Unavailability of Apex applications running in ITCORE Description There was an intervention scheduled for Wednesday 17.30 to upgrade APEX installation in ITCORE database...
Castor Name Server not available due to problems with one of the database datafiles. Description Castor software was blocked because it could not modify data in the...
CHANGE ME TITLE describing the incident (not the underlying problem that caused the incident) Description CHANGE ME Brief (a few lines) introducing the incident...
Intervention on the physical power in vault area Description A planned intervention on the physical power in the vault area, resulted in an outage of nine blade...
Database issues during patching, May 31st 2010 - June 2nd 2010 Description Several issues observed/discovered during the patching of different databases. Some issues...
FTS3 database down due to a hardware problem Description FTS3 connectivity loss during weekend. Impact FTS3 users unable to connect to the database. Any application...
Multiple node restarts after power cut Description Several databases have been affected following the power issue (cf the CF C5 report) of 24 servers in RAC50/barn...
Qualiac database unavailable Description Qualiac database was not available in the early morning on 07.03.2012. Interactive Qualiac users were not able to...
ACCelerators LOGging database blocked due to lack of space in archivelog area Description Due to lack of space in the archive log area, ACCLOG, an Oracle RAC db...
ACCelerators LOGging database blocked due to lack of space in archivelog area Description Due to lack of space in the archive log area the archiving of redologs...
ACCelerators LOGging (ACCLOG) database blocked due to lack of space in archivelog area Description Due to lack of space in the archive log area, ACCLOG, an Oracle...
CASTOR storage controllers upgrade Description As recommended by NetApp support an upgrade of CASTOR NAS boxes was performed successfully on all FAS3140 and FAS...
CHANGE ME TITLE describing the incident (not the underlying problem that caused the incident) Description CHANGE ME Brief (a few lines) introducing the incident...
TWeeder info for DB Total Number of topics: 128 1 Topics updated during the last 7 days Days Web Topic Date S7 DB DevelopingOracleRestfulServices...
Welcome to the 1 web CERN DB blog Private (restricted access) Available Information ServiceDocuments PostMortems Introduction to the DB blog...
Statistics for DB Web Month: Topic views: Topic saves: File uploads: Most popular topic views: Top viewers: Top contributors for...
Number of topics: 73

 