Date and Location -------------------------
-
- July 2005
- :00 - 17:30
Attendees ---------------------------------
- ASCC
- Di Qing
- BNL
-
- FNAL
- Jon
- FZK
- Jens Rehn
- IN2P3
-
- INFN
- Luca, Michele Michelotto, Daniele Bonacorsi
- SARA
- Ron
- NDGF
- Lars
- TRIUMF
- Reda
- PIC
- Gonzalo, Jose Hernandez
- RAL
- Andrew Sansum, Martin, Derek
- CERN
- Jamie Shiers, James Casey, Gavin McCance, Patricia Mendez Lorenzo, Sophie Lemaitre, Vlado Bahyl, Roberto
- DESY
- Michael Ernst, Patrick Fuhrmann
- Expts
- Lassi Tuura, Nick Brook, Tim Barrass
Subject -------------------------------------
CERN Overview
Not as far along as we want, but the major castor2 problems are now solved and the rate is back up.
Need to work on ASCC
- RAL
- Problem with gridftp servers hanging from time to time.
- TRIUMF
- Pool node hangs at 99% CPU on one transfer and accepts no more connections. Are other people seeing that?
Michael: Can you send us the logs for that node?
- SARA
- Ron: We saw a lot of put entries. Restarting didn't really help; we had to clean up the postgres database. We've also had a pool fill up, and it seems some other pool nodes froze. We didn't get enough information from the log files, but the debug limit is up again.
This morning a pool node crashed twice because it ran out of memory when we tried to get up to 150 MB/s. We have now throttled back down.
- Michael
- What is running out of memory - the JVM?
- Ron
- Yes.
Lassi: See some problems with timeouts at RAL and CNAF.
- Derek
- We allocated 12 TB; 10 TB was already allocated.
- Michael
- 10 streams per transfer - on average 8 transfers per pool node.
- Jon
- At FNAL we see 20 streams per transfer - 5 per pool with 2 pool partitions per system.
- IN2P3
- We had problems for the last two days - srmcp worked fine, but glite-url-copy works.
- FZK
- Jos: Now have transfers - only 3 pool nodes connected due to network problems at the FZK side.
- NDGF
-
- PIC
- Gonzalo: Saw many problems yesterday. Started again this morning and now running smoothly.
- James
- Tuning might be different.
- ASCC
- Di: Network bandwidth very low. Only one machine is being used, running gridftp. Need to get SRM working.
- Michael
- We've seen some problems with individual nodes at the castorgridsc cluster - are these resolved?
- James
- Can we get a path at DESY we can write to? It is faster to fix it ourselves than round-tripping.
- Jamie
- Won't talk about service phase now - we are focussing on throughput phase. But we need to prepare for GDB - sites can co-ordinate with Jeremy/Jamie/Kors. One issue is that the sites will come back with their sample jobs for validation.
- James
- Load generator can be run from either end. If you want to run your own, feel free - fill in the table in SCThreeThroughputHowto to say if you want to run it yourself.
- Jamie
- No meeting next week due to the GDB - people can dial into that.
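Note on the stream figures: the per-transfer settings quoted by Michael and Jon above can be turned into a rough count of simultaneous streams per node. The sketch below is only illustrative arithmetic. The function name is hypothetical, and it assumes that "5 per pool with 2 pool partitions per system" means 5 transfers on each of 2 pools on one machine, which the minutes do not state explicitly.

```python
def concurrent_streams(streams_per_transfer, transfers_per_pool, pools_per_node=1):
    """Rough number of simultaneous streams handled by one node."""
    return streams_per_transfer * transfers_per_pool * pools_per_node


# Figures quoted by Michael: 10 streams per transfer,
# on average 8 transfers per pool node.
michael = concurrent_streams(10, 8)

# Figures quoted by Jon for FNAL: 20 streams per transfer, 5 transfers per pool,
# 2 pool partitions per system (assumed here to mean 2 pools on one node).
fnal = concurrent_streams(20, 5, pools_per_node=2)

print(michael, fnal)
```

Under these assumptions the two configurations differ by a factor of 2.5 in streams per node, which may be worth keeping in mind when comparing memory use or tuning between sites.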
Topic revision: r1 - 2005-07-13 - unknown