SCWeeklyPhoneCon060613 < LCG

Date and Location -------------------------

1. July 2005
2. :00 - 17:30

Attendees ---------------------------------

ASCC: Di Qing

BNL
FNAL: Jon

FZK: Jens Rehn

IN2P3

INFN: Luca, Michele Michelotto, Daniele Bonacorsi

SARA: Ron

NDGF: Lars

TRIUMF: Reda

PIC: Gonzalo, Jose Hernandez

RAL: Andrew Sansum, Martin, Derek

CERN: Jamie Shiers, James Casey, Gavin McCance, Patricia Mendez Lorenzo, Sophie Lemaitre, Vlado Bahyl, Roberto

DESY: Michael Ernst, Patrick Fuhrmann

Expts: Lassi Tuura, Nick Brook, Tim Barrass

Subject -------------------------------------

CERN overview: Not as far along as we want, but the major castor2 problems are now solved and the rate is back up.

Need to work on ASCC

RAL: Problem with gridftp servers hanging from time to time.

TRIUMF: Pool node hangs at 99% CPU on one transfer and accepts no more connections. Are other people seeing that?

Michael: Can you send us the logs for that node?

SARA: Ron: We saw a lot of put entries - restarting didn't really help - we had to clean up the postgres database. We've also had a pool fill up, and it seems some other pool nodes froze. We didn't get enough information from the log files, but the debug limit is up again.

This morning a pool node crashed twice because it ran out of memory when we tried to get up to 150 MB/s. It is now throttled back down.

Michael: What is running out of memory - the JVM?

Ron: Yes.

Lassi: We see some problems with timeouts at RAL and CNAF.

Derek: We allocated 12 TB, 10 TB was already allocated.

Michael: 10 streams per transfer - on average 8 transfers per pool node.

Jon: At FNAL we see 20 streams per transfer - 5 per pool with 2 pool partitions per system.
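As a rough comparison of the two configurations quoted above, the number of concurrent gridftp streams can be estimated by multiplying the figures out. This is only a sketch: the function name is made up for illustration, and it assumes that Jon's "5 per pool" means 5 transfers per pool and that Michael's figures describe the DESY setup.

```python
def concurrent_streams(streams_per_transfer, transfers_per_pool, pools_per_node=1):
    """Estimate simultaneous streams on one node (hypothetical helper)."""
    return streams_per_transfer * transfers_per_pool * pools_per_node

# Michael: 10 streams per transfer, on average 8 transfers per pool node.
desy_streams = concurrent_streams(10, 8)

# Jon: 20 streams per transfer, 5 per pool (assumed to mean transfers),
# 2 pool partitions per system.
fnal_streams = concurrent_streams(20, 5, pools_per_node=2)

print("DESY streams per pool node:", desy_streams)
print("FNAL streams per system:", fnal_streams)
```

Under these assumptions the estimate is 80 streams per pool node for Michael's figures and 200 streams per system at FNAL.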

IN2P3: We had problems for the last two days - srmcp worked fine, but glite-url-copy works.

FZK: Jos: We now have transfers - only 3 pool nodes are connected because of network problems on the FZK side.

NDGF

PIC: Gonzalo: Saw many problems yesterday - started again this morning - now running smoothly.

James: Tuning might be different.

ASCC: Di: Network bandwidth very low - only one machine being used, running gridftp. Need to get SRM working.

Michael: We've seen some problems with individual nodes at the castorgridsc cluster - are these resolved?

James: Can we get a path at DESY we can write to? It is faster to fix it ourselves rather than round-tripping.

Jamie: We won't talk about the service phase now - we are focussing on the throughput phase. But we need to prepare for the GDB - sites can co-ordinate with Jeremy/Jamie/Kors. One issue is that the sites will come back with their sample jobs for validation.

James: Load generator can be run from either end. If you want to run your own, feel free - fill in the table in SCThreeThroughputHowto to say if you want to run it yourself.

Jamie: No meeting next week due to the GDB - people can dial into that.

Topic revision: r1 - 2005-07-13 - unknown
