Week of 051114

Open Actions from last week:
  • Look at queries in FTS for locking problem (Gavin/Kris) IN PROGRESS
  • QF for FTS memory leak code - beginning of this week (Paolo) IN PROGRESS
  • Lyon wants to switch to SRM Copy (fts-support)
  • New LFC sensor to detect current thread usage, and external service availability via CLI tools (James)

Chair: Harry Renshall

On Call: David + Roberto

Monday:

Log: Nothing

New Actions: Kris has not found the locking problem in production FTS. They will now load up the test node, fts006, and work on that. The new LFC sensor will be written this week. Meanwhile there is a new operator alarm, on which they will reboot, and there is an automatic reboot in place. The castor client will be updated today on lxplus and lxbatch and the castor2 request handler will be restarted; all this should be transparent.

Discussion: FTS data rates dropped over the weekend, but probably because the experiment queues emptied. Today/tomorrow is the LHC committee review of the LCG project, during which there will be some demos, so service managers should check all their service components. For the same reason the swap over of the global and site bdii and the edg ce will probably be left till Wednesday.

Tuesday:

Log: Nothing

New Actions: Patricia to switch IN2P3 channel to use srm-copy. Harry to follow up how to improve problem reporting flow from GGUS to CERN workflows.

Discussion: Current traffic is a steady 10 MB/s to BNL. We are also seeing at BNL the old problem of a dcache advisoryDelete failing to remove files left after an incomplete transfer, causing subsequent transfer attempts to fail with 'file exists, cannot overwrite'. Gavin will follow up. SRM puts to Sara were reported timing out by Atlas. Sara have cleaned out their postgres database and restarted the srm door, the usual fix for this problem. The castor2 request handler restart was postponed till tomorrow since the prerequisite move of the logging database has not yet happened. A report of srmcp failures to RAL that was submitted to GGUS exposed some confusion in how to interface GGUS and CERN problem management workflows.

Wednesday:

Log: Nothing

Actions: Patricia to set up a new FTS channel for BNL T1 to T2 testing. Swap the production lcg-bdii to a new mid-range server (this was announced Monday to be done today after the LHCC review). Harry to warn experiments (esp cms) of a 5-10 minute router stoppage at 12.00 affecting some disk pools.

Discussion: Patricia switched the IN2P3 channel to use srm-copy - thanks. There was a report from Flavia Donno that Atlas (M.Branco) had mailed many site problems to a support list and got few replies. James thought they might be trapped in Shiva and agreed to check. Simone reported one Atlas problem of poor performance transferring to BNL which they (BNL) said was due to a high load on their dcache services. James said this was due to traffic during the Supercomputing conference when many networks get busy. This was not known to Atlas.

Thursday:

Log: Nothing

Actions:

Discussion: Restart of the F10 router went ok. Harry forgot to notify CMS, but no traffic was noted on the link before the intervention and there were no operator alarms. Lyon and BNL configurations were done and tests run overnight. No report yet on progress.

Friday:

Log: Nothing

Actions: lxgate24, which runs the production GRIDVIEW showing FTS transfer summaries, is not responding. An operations ticket is to be followed up.

Discussion: Outgoing FTS traffic now running at about 200 MB/s. There is a fix for the FTS memory leak that will be tested today with a view to going in production on Monday. The LFC pilot service uses the grid7 server which needs the recent Oracle security patch. This requires a 20 minute downtime and it was agreed to perform this at 14.00 today.

Topic revision: r5 - 2005-11-18 - HarryRenshall
 