Please log all the interventions performed on the production servers listed below.

Production Server

| who, when (CET) | why (reason for intervention) | what | result |
| Marian, Tue Mar 4 12:07:22 CET 2008 | test line | OK | - |
| Marian, Thu Apr 3 08:13:11 CEST 2008 | observation: noticed a queue of 550+ jobs | set MAX_JOBS_RUNNING=700 (condor_config) | |
| Marian/Lorenzo, Thu Apr 3 12:00:00 CEST 2008 | deployment of release v2.0.3-1 | using the etics-deployment-server v1.3.8-4 | troubles with hostcert.chain - resolved |
| Marian, Fri Apr 11 21:12:00 CEST 2008 | job queue building up with around 100 registrations waiting | observation | no action taken as increased load was expected |
| Marian, Mon Apr 20 08:17:00 CEST 2008 | some artefact registrations failing after timeout (e.g. runid=69376) | observation | no action taken |
| Marian, Wed Apr 23 11:29:32 CEST 2008 | request for registration of new platforms | inserted 3 new platforms+mappings (ubuntu8_ia32_gcc423, ubuntu8_ia64_gcc423, ubuntu8_x86_64_gcc423) | success |
| Marian, Wed May 7 10:11:32 CEST 2008 | some registrations blocked since May 4th, 17:25 (nmi runid=72526) | | no action taken (informing Lorenzo) |
| Marian, Tue May 13 22:30:00 CEST 2008 | insufficient command length for client<->WS communication, limited service performance | deployment of webservice-1.3.5-1, db schema alteration, for Tomcat set JAVA_OPTS="-Xms2048m -Xmx2048m" | success |
| Marian, Fri May 16, 09:20:00 CEST 2008 | condor queue building up | removed jobs targeting non-existent platforms | queue reduced/load decreased |
| Marian, Fri June 6, 18:50:00 CEST 2008 | server slow | tuning MySQL database: cache buffer parameters tuned | better server performance |
| Marian, Tue July 15, 09:50:00 CEST 2008 | server loaded with many jobs | 2 WN disconnected from the pool | restarting WNs |
| Marian, Wed July 16, 22:00:00 CEST 2008 | bug fixing in WA | deployed WA v1.3.12-1 as part of etics_R_2_0_9_1 | operating correctly |
| Marian/Lorenzo, Fri Oct 30, 11:00:00 CEST 2008 | repository deployment | deployed new repository server v1.2.4 | operating correctly |
| Marian/all, Wed Nov 09, 08:00 - 18:00 CEST 2008 | poor external connectivity | investigation/restarts of services | temporary service improvement |
| Marian/all, Wed Nov 10, 09:00 CEST 2008 | | AFS client upgraded, switch replaced, AFS volumes moved | temporary service improvement |
| Alberto, Marian, Wed Dec 13, 09:00 CEST 2008 | jobs waiting in the queue | removing jobs using nmi | nominal operational throughput restored |
| Marian/Lorenzo, Mon Jan 19, 12:00 CEST 2009 | deploying new release v2.3.0-1 | adding "submissions" tab | operating correctly |
| Marian/Lorenzo, Wed Feb 04, 12:00 CEST 2009 | old reference to etics-repository not refreshed | cleaning the Tomcat cache | operating correctly |
| Marian, Mon Mar 30, 11:00 CEST 2009 | resubmissions, bug fixes | deploying release etics_R_2_4_0_1 | operating correctly |
| Marian, Thu Apr 02, 12:00 CEST 2009 | resubmissions, bug fix (locking) | deploying release etics_R_2_4_1_1 | operating correctly |
| Marian, Mon May 04, 12:00 CEST 2009 | non-blocking submission (WA) | deploying release etics_R_2_4_2_1 | operating correctly |

Repository Server


| who, when (CET) | why (reason for intervention) | what | result |
| Marian, Tue Mar 4 12:12:13 CET 2008 | test line | OK | -- |
| Lorenzo, Thu Apr 3 09:03:11 CEST 2008 | registration blocked | restarting repository service | condor queue reduced by 300 jobs after repository service restart |
| Lorenzo/Marian, Thu Apr 3 12:00:00 CEST 2008 | upgrade of repository service | upgraded to etics-repository-webservice v1.1.3-1 | troubles with cache refreshing - bug identified |
| Marian/Lorenzo, Fri Oct 30, 11:00:00 CEST 2008 | repository deployment | deployed new repository server v1.2.4 | operating correctly |
| Marian/all, Wed Nov 10, 09:00 CEST 2008 | | AFS client upgraded, switch replaced, AFS volumes moved | temporary service improvement |
| Marian/Lorenzo, Wed Feb 02, 12:00 CEST 2009 | occasional problems with registration on AFS | swapping etics-repository and etics3-repository | operating correctly |
| Marian, Lorenzo, Mon Mar 30, 11:00 CEST 2009 | yum repository, bug fixes | deploying release etics_R_2_4_0_1 | operating correctly |
| Marian, Wed 1 Apr, 15:45 CEST 2009 | AFS access occasionally failing | applying new operational parameters | no access failures (under observation) |
| Marian, * Apr, * CEST 2009 | error messages from repository WS | looking into log files, reporting to developers | workaround provided to users |
| Marian, Wed May 13, 10:45 CEST 2009 | server not responsive | restarted server, verified operational parameters | correct operation |


-- MarianZUREK - 04 Feb 2009

Topic revision: r18 - 2009-05-25 - MarianZUREK
 