SCWeeklyPhoneCon060116 < LCG


LCG Web>LCGServiceChallenges>ServiceChallengeMeetings>SCWeeklyPhoneCon060116 (2006-01-17, GavinMcCance)


-- GavinMcCance- 16 Jan 2006

Present: Jan, Olof, Maarten, Harry, James, Gavin, Jamie (via phone)
Sites called in: INFN, ASCC, FNAL, GRIDKA, IN2P3, BNL, NDGF, PIC, DESY, TRIUMF, SARA/NIKHEF, GSI.
Experiments called in: LHCb, CMS, Alice
Absent: Atlas

Outlook for SC3 Disk-Disk re-run

Goal: At least 800MB/s stable with current software, sustained for 3-4 days.
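For scale, the goal above translates into a total data volume. The sketch below is only back-of-the-envelope arithmetic, assuming decimal units (1 MB = 10^6 bytes) and a perfectly constant rate; neither assumption is stated in the minutes.

```python
MB = 10**6  # assumption: decimal megabytes; the minutes do not say MB vs MiB


def volume_tb(rate_mb_s, days):
    """Total data moved, in decimal terabytes, at a constant rate."""
    seconds = days * 24 * 3600
    return rate_mb_s * MB * seconds / 10**12


for days in (3, 4):
    print(f"{days} days at 800 MB/s: {volume_tb(800, days):.1f} TB")
```

Under these assumptions the re-run would move a little over 200 TB in 3 days.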

If we can do that, then try SRM copy to see how that affects the rate.

Main work for next 24 hours is optimising the rates to obtain stable running.

Issues

For deleting files (to avoid filling up space), the recommendation is to do this locally (via some script or other).
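A local clean-up script of the kind recommended above might look like the following minimal sketch. The function name, the age threshold and the dry-run default are all assumptions for illustration; a real site would adapt this to its own storage system (for example by running it from cron).

```python
import time
from pathlib import Path


def purge_old_files(root, max_age_s, now=None, dry_run=True):
    """Return files under root older than max_age_s; delete them unless dry_run."""
    now = time.time() if now is None else now
    removed = []
    for path in sorted(Path(root).rglob("*")):
        # Only regular files are considered; directories are left in place.
        if path.is_file() and now - path.stat().st_mtime > max_age_s:
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed
```

Calling `purge_old_files("/some/pool/dir", 6 * 3600)` (a hypothetical path) lists the candidates without deleting anything; pass `dry_run=False` once the list looks right.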

Would be nice to see all the parameters for all the channels somewhere, e.g. on a web page (made dynamically by FTS).

Would be good to document changes/problems in the wiki to keep track of all the changes.

Reports from Sites

CERN

Since the afternoon, the rate has been 50% of what it should be, scaled equally across sites. CERN is investigating.

Threading problems on the new DLF server caused a break in service. The problem is seen only at production loads. The Castor team downgraded the DLF server version. A fix is available and will be deployed soon.

Noted that only a small subset of the test files is being used (only 370 of 8000), causing poor load distribution. This is a problem in the test-load generator.
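One way a load generator can avoid this kind of skew is to cycle through a shuffled copy of the whole file list, so every file is used before any is repeated. This is a sketch of that idea only, not the actual test-load generator; the file names and the function are hypothetical.

```python
import itertools
import random


def round_robin_load(files, n_transfers, seed=0):
    """Return n_transfers source files, cycling through a shuffled copy of the full list."""
    order = list(files)
    random.Random(seed).shuffle(order)  # fixed seed keeps the run reproducible
    return list(itertools.islice(itertools.cycle(order), n_transfers))


files = [f"testfile_{i:04d}" for i in range(8000)]  # hypothetical names
picked = round_robin_load(files, 20000)
print(len(set(picked)))  # number of distinct files touched
```

With this scheme the usage counts of any two files differ by at most one, whatever the number of transfers.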

ASCC

Running well.

BNL

Doing 50 files with 10 streams - still need to find optimal values for this. BNL have two people on call for monitoring.

CNAF

Out-of-memory problems on Castor1, probably from too many files / streams. We will reduce this number and start to tune it. 2 streams should be OK (previous tuning).

Currently running with asymmetric routing: CNAF->CERN uses the new 10 gig link, CERN->CNAF the previous 2 x 1 gig network. If the tuning doesn't work, maybe we should change the routing to fully symmetric 2 x 1 gig.

DESY

Working well, then hit by a firewall issue. Fix underway - hopefully engineers will have it fixed tonight.

Michael: Changing to SRM copy should give at least 50% more.

FNAL

Running at 80 MB/s. Increasing the number of transfers should increase rate linearly. Unbalanced queuing issue on the pools - this is being looked at by experts. Recommend 20 streams per transfer with 2MB TCP buffer.
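The stream count and TCP buffer size interact through the usual window/round-trip-time bound: a single TCP stream cannot exceed its buffer size divided by the RTT. The sketch below works this out for the recommended settings. The 120 ms round-trip time is purely an assumed figure for a transatlantic path (the minutes give no RTT), and "2MB" is taken here as 2 MiB.

```python
def stream_ceiling_mb_s(buffer_bytes, rtt_s):
    """Upper bound on one TCP stream's throughput (MB/s): window / round-trip time."""
    return buffer_bytes / rtt_s / 10**6


def transfer_ceiling_mb_s(streams, buffer_bytes, rtt_s):
    """Upper bound for one transfer using several parallel streams."""
    return streams * stream_ceiling_mb_s(buffer_bytes, rtt_s)


BUFFER = 2 * 1024**2   # 2 MiB TCP buffer, as recommended above
RTT = 0.120            # assumption: 120 ms round trip; not from the minutes

print(f"per stream:   {stream_ceiling_mb_s(BUFFER, RTT):.1f} MB/s")
print(f"per transfer: {transfer_ceiling_mb_s(20, BUFFER, RTT):.1f} MB/s")
```

These are ceilings set by the window alone; disk, pool queuing and the shared link will normally limit the achieved rate well before them.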

50 vs. 15 transfers issue: FTS was asked to run 50 concurrent transfers, but FNAL sees only 15. CERN will check. If we do SRM copy, FNAL can monitor what's going on.

GRIDKA

Pool nodes got totally filled up. Files were deleted and the pool nodes had to be restarted. The fix for this is in the 1.6.6-4 dCache release (currently running 1.6.6-1; will deploy after the rerun to maintain stability of the system). A cron job is now running to clean up files.

IN2P3

80MB/s - good given 1-gig link.
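To put that figure in context, a 1 Gbit/s link carries at most 125 MB/s in decimal units, before any protocol overhead. A small sketch of the arithmetic (the function name is illustrative only):

```python
def link_utilisation(rate_mb_s, link_gbit_s):
    """Fraction of raw link capacity used, with decimal units (1 Gbit/s = 125 MB/s)."""
    capacity_mb_s = link_gbit_s * 1000 / 8
    return rate_mb_s / capacity_mb_s


print(f"{link_utilisation(80, 1):.0%}")  # prints 64%
```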

NDGF

Firewall problems being looked at. CERN + NDGF network experts investigating.

PIC

Running well.

RAL

Reconfiguring: increased the number of disk servers 4 -> 9. gridFTP memory problems: too many streams? Best effort at the weekend, with 2 people keeping an eye on the cluster.

SARA/NIKHEF

Reconfiguring. Almost done. Added 4 more pool nodes. Ready to start with more transfers. There was a DNS problem, now resolved (propagation delay).

TRIUMF

Working well. 80MB/s. Can increase number of files. Suggest buffer size increase, and reduce # streams.

Reports from Experiments

LHCb

Want to schedule the phase-1 rerun. The SC3-rerun hardware setup isn't appropriate for this, so we will return to the production castorgridsc cluster and give LHCb priority.

CMS

No issues.

Alice

No issues.

AOB

Experiment / tier-1 Software / deployment priorities document is underway to give realistic planning for SC4.

Topic revision: r5 - 2006-01-17 - GavinMcCance
