Site | Status | Recent changes | Planned changes |
---|---|---|---|
CERN | CASTOR 2.1.11-2 (SL5); SRM 2.10-x (SL4); xrootd 2.1.11-1; FTS: SL4, 3.2.1 (i.e. old); EOS 0.1.0/xrootd-3.0.4 | All instances are now on the latest CASTOR version. All of them use the Transfer Manager for internal scheduling (LSF replacement). The end-of-August technical stop was used to perform several DB maintenance interventions (security patches applied on all, including the Name Server; CMS stager DB moved to new, better-performing hardware; ATLAS stager DB defragmented). | Complete the migration to new hardware (all instances but CMS, and the Name Server). A resource move (disk space) from CASTOR to EOS is being planned (time scale: October 2011). |
ASGC | CASTOR 2.1.11-2; SRM 2.11-0; DPM 1.8.0-1 | 29-30/8: downtime due to CASTOR upgrade | None |
BNL | dCache 1.9.5-23 (PNFS, Postgres 9) | None | Transition to Chimera during next TS (Nov) |
CNAF | StoRM 1.7.0 | ||
FNAL | dCache 1.9.5-23 (PNFS); httpd=1.9.5-25; Scalla xrootd 2.9.1/1.4.2-4; Oracle Lustre 1.8.3 | ||
IN2P3 | dCache 1.9.5-26 (Chimera) on core servers. Mix of 1.9.5-24 and 1.9.5-26 on pool nodes | ||
KIT | dCache (admin nodes): 1.9.5-25 (ATLAS, Chimera), 1.9.5-26 (CMS, Chimera), 1.9.5-26 (LHCb, PNFS); dCache (pool nodes): 1.9.5-9 through 1.9.5-26 | ||
NDGF | dCache 1.9.12 | ||
NL-T1 | dCache 1.9.5-23 (Chimera) (SARA), DPM 1.7.3 (NIKHEF) | ||
PIC | dCache 1.9.12-8; PNFS, Postgres 9.0 | 13/9: upgrade to dCache 1.9.12-10 | |
RAL | CASTOR 2.1.10-2; 2.1.10-0 (tape servers); SRM 2.10-0 | 7/9: will apply DB patches to 3D ATLAS and LHCb, FTS and LFC. Services will be "at risk" | |
TRIUMF | dCache 1.9.5-21 with Chimera namespace | None | None |

Site | Version | OS, n-bit | Backend | Upgrade plans |
---|---|---|---|---|
ASGC | 1.8.0-1 | SLC5 64-bit | Oracle | None |
BNL | 1.8.0-1 | SL5, 64-bit | Oracle | None |
CERN | 1.7.3 | SLC4 64-bit | Oracle | Upgrade to SLC5 64-bit pending |
CNAF | 1.7.4-7 (ATLAS, to be decommissioned); 1.8.0-1 (LHCb, recently updated) | SL5 64-bit | Oracle | |
FNAL | N/A | Not deployed at Fermilab | ||
IN2P3 | 1.8.0-1 | SL5 64-bit | Oracle 11g | Oracle DB migrated to 11g on Feb. 8th |
KIT | 1.7.4-7 | SL5 64-bit | Oracle | Oracle backend migration pending |
NDGF | 1.7.4.7-1 | Ubuntu 9.10 64-bit | MySQL | None |
NL-T1 | 1.7.4-7 | CentOS5 64-bit | Oracle | |
PIC | 1.7.4-7 | SL5 64-bit | Oracle | |
RAL | 1.7.4-7 | SL5 64-bit | Oracle | |
TRIUMF | 1.7.3-1 | SL5 64-bit | MySQL | |

Site | Status, recent changes, incidents, ... | Planned interventions |
---|---|---|
BNL | Oracle database services at BNL were turned off and the related hardware powered off last Saturday as part of the preventive shutdown of the BNL Tier 1 due to critical weather conditions. ATLAS Conditions database: streams propagation, apply process and database service were disabled on Saturday at ~11:47 AM EDT and re-enabled on Monday 29 at ~10:34 AM EDT. BNL LFC and FTS, US ATLAS Tier 3 LFC databases: disabled at ~11:15 AM EDT Saturday and re-enabled at 10:00 AM EDT Monday. Standby database for BNL LFC and FTS, US ATLAS Tier 3 LFC databases: disabled at ~11:30 EDT Saturday and re-enabled at 11:00 AM EDT Monday. VOMS and Priority Stager: disabled at ~11:15 AM EDT Saturday and re-enabled at 10:05 AM EDT Monday. So far no problems (hardware, database or replication) have been observed after re-enabling the database services. Activities: (1) Migrated VOMS and Priority Stager to newer disk storage. (2) Oracle diagnosed, in Service Requests 3-3535183751 and 3-3396390491, a possible cause of the contention between the apply process and the gather_stats job reported on 04/13/11 and 06/15/11. Oracle claims that this problem is due to bug 6011045, which affects DBMS_STATS and causes a deadlock between 'cursor: pin S wait on X' and 'library cache lock'. Oracle also provided a patch (P6011045) and recommends applying it to fix this bug. The patch appears to be rolling. | (1) Apply CPU patches, initially to the Conditions Database, together with the patch proposed by Oracle (P6011045). (2) Relocate the LFC and FTS cluster to a new datacenter room; this intervention has been postponed until the next Technical Stop (November) and will be scheduled in conjunction with maintenance of other services (dCache) to minimize overall disruption of site services. |
CNAF | ||
KIT | Nothing to report | None |
IN2P3 | On 18 July 2011, hard disk failures caused corruption at several levels (control file, undo, redo logs, archived logs and datafiles). The only way to restore the databases was to apply an incomplete restore to all of them. Streaming was restored together with the CERN DBAs. The storage team is in contact with Pillar support to understand the root cause of the problem. Note that all LUNs presented to ASM are in RAID5, so hard disk failures should not have impacted Oracle. | CPU patching schedule: DBLHCB on 6 Sept 2011, DBATL on 7 Sept 2011, DBAMI on 8 Sept 2011. We may foresee a downtime on 20 Sept for a micro-code upgrade on the FC switches and disk array. This intervention involves shutting down all 3D databases for around 12 hours. We are waiting for confirmation from ATLAS France before fixing a date. To be scheduled. |
NDGF | Nothing to report | None |
PIC | Complete stop of the ATLAS database. Apply the July CPU on the remaining instances (maybe next week). | |
RAL | ||
SARA | Nothing to report (not attending) | No interventions |
TRIUMF | Nothing to report | No interventions |