-- HarryRenshall - 26 Feb 2008

Week of 080225

Open Actions from last week:

Monday:

See the weekly joint operations meeting minutes

Additional material:

ATLAS: is surveying Tier-1 tape activities via the following email from K. Bos:

During the Jamboree (ATLAS week) I would also like to discuss the use of tapes for CCRC at the Tier-1s. WLCG has also asked for this, so it will probably be raised again next week in the CCRC meeting. For the past few days at least, we have been writing systematically to all Tier-1s using the ATLASDATATAPE space token.

Could each Tier-1 please prepare one slide with some initial information about:
- Have all data that were sent been written to tape, and with what delay?
- Have you noticed that we also tried to delete files from tape? Has that been done, and again with what delay?
- What is the size of the T1D0 buffer in front of the tapes, and how big is the buffer for reading from tape?
- How did you manage (or not) to separate ATLAS data from that of the other VOs?
- How many tape drives did you use, and what was the writing speed?
- Do you have a way to display these results, and can we see them from CERN?
- Any other issue (problem, bug) that you believe is relevant?

ALICE detailed report (PM):

1. ALICE transfers stopped on Friday night and remained stopped for the whole weekend. The cause was an upgrade of the AliEn software at CERN that broke the FTD at CERN. The issue was solved this morning, and transfers to FZK and SARA have been running since 10:00.

2. The NFS problem with the CNAF VOBOX is now solved, and the SE setup for CNAF is being performed. After restarting the ALICE SE service on the VOBOX, "cp" still fails, which may be a server problem. This issue is still being investigated.

3. The Lyon VOBOX is still not accessible to me, and this morning I submitted a ticket about this problem to the elog. Meanwhile, several issues are being investigated with the site manager:

3.1) There were no pool accounts configured for non-ALICE SRM endpoints. This issue has already been solved.
3.2) Several issues related to access through port 1094 were also solved by the site manager during this morning's site downtime.
3.3) Access to the SRM endpoint fails with the message: Last server error 3012 (' Cannot find write pool : Pool manager error: No write pools configured for <disk-sc4:alice@osm>'). This issue still has to be solved by the site.

4. There is no response yet to the messages I sent to all T1 sites about the proxy workaround to be installed on the VOBOXes. FZK and Lyon should be doing it today, but I will follow up with the site managers.

Tuesday:

elog review:

Experiment report(s):

-ATLAS:

    1. AOD exports within clouds (T1->T2) gave very good results: most sites (>90%) received approximately the full set of AODs. The US cloud did not participate, at Michael Ernst's request. There were replication problems in the NL cloud.
    2. Many sites are upgrading to the new version of dCache, which fixes several issues; this explains the high failure rates at FZK and LYON.
    3. Reprocessing activity is ongoing. Input datasets are being replicated to T1 tape areas (T1->T1 transfers). This accounts for an additional 400 MB/s of aggregated traffic to the T1s, which shows up in the ATLAS production dashboard.
    4. Plans: tomorrow a general cleanup (data on T1s, data on CASTOR@CERN, subscriptions in DDM, datasets in DDM) will be performed. From Thursday, a focused exercise on data completion will be run: data will be produced at peak rate and pushed out of the T1s, but the focus will be on understanding how many datasets fail to be transferred after 24h (in addition to the throughput measurements). This exercise should also help to understand how much the new version of the ATLAS site services (v0.6) improves dataset completion with respect to the current version (v0.5). The exercise will continue through the whole weekend. On Monday, T0 traffic will have to stop to make room for M6 data taking.
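The "fraction of datasets not complete after 24h" metric planned for the Thursday exercise can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the record layout (dataset name, subscription time, completion time or None) and the names `Subscription` and `fraction_incomplete_after` are hypothetical and are not part of the ATLAS DDM site services.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class Subscription(NamedTuple):
    # Hypothetical record: one dataset subscription to a destination site.
    dataset: str
    subscribed_at: datetime
    completed_at: Optional[datetime]  # None if the dataset never completed


def fraction_incomplete_after(subs, now, window=timedelta(hours=24)):
    """Fraction of subscriptions not complete within `window` of subscribing.

    Subscriptions younger than `window` at time `now` are skipped, since
    they have not yet had the full window in which to complete.
    """
    eligible = [s for s in subs if now - s.subscribed_at >= window]
    if not eligible:
        return 0.0
    failed = sum(
        1
        for s in eligible
        if s.completed_at is None or s.completed_at - s.subscribed_at > window
    )
    return failed / len(eligible)


# Usage with made-up example records (not real CCRC'08 data).
t0 = datetime(2008, 2, 28, 0, 0)
subs = [
    Subscription("ds1", t0, t0 + timedelta(hours=3)),    # complete in time
    Subscription("ds2", t0, t0 + timedelta(hours=30)),   # complete, but late
    Subscription("ds3", t0, None),                       # never completed
    Subscription("ds4", t0 + timedelta(hours=40), None), # too recent, skipped
]
now = t0 + timedelta(hours=48)
print(fraction_incomplete_after(subs, now))  # 2 of 3 eligible datasets failed
```

Skipping subscriptions younger than the window avoids counting datasets as failures before they have had a fair chance to complete.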

-LHCb: is emulating LHC operations with a cycle of 6 hours on, 6 hours off. Many problems were cleared up this weekend, but failures of the gsidcap servers at IN2P3 and NL-T1 are still being seen. dCache space is still not released after file deletion in T1D0 spaces; the dirty workaround is to add more 'logical' space.

-ALICE: data is beginning to arrive at IN2P3. Problems are ongoing at CNAF, and NDGF needs to set up host certificates.

Core services (CERN) report:

DB services (CERN) report: there is a high load on the DB side of the FTS servers, but it is not causing service degradation. It was decided to leave tuning (SQL changes) until after the February run.

Monitoring / dashboard report:

Release update:

Questions/comments from sites/experiments:

AOB: Some sites have problems due to the lack of dCache support for a mixed-mode API on space tokens (some calls give a space token while others do not).

Wednesday:

elog review:

Experiment report(s):

Core services (CERN) report:

DB services (CERN) report:

Monitoring / dashboard report:

Release update:

Questions/comments from sites/experiments:

AOB:

Thursday:

elog review:

Experiment report(s):

Core services (CERN) report:

DB services (CERN) report:

Monitoring / dashboard report:

Release update:

Questions/comments from sites/experiments:

AOB:

Friday:

elog review:

Experiment report(s):

ALICE (PM):

1. Yesterday afternoon the last issues with CNAF were solved, and the first batch of transfers began to arrive at that site.

2. The central FTD server at CERN went down last night, so, as the graphics show, transfers disappeared for the whole night. The system is now back, and the first transfers are beginning to arrive.

3. Regarding the problems with NDGF: yesterday Pablo and I tried to set up the transfers, with zero success, because of a connection problem from NDGF to the SRM at CERN. We were able to transfer from CERN, but not to pull from NDGF. Today Gavin sent the following email; the problem was not at the NDGF site but an FTS server issue:

"We've noticed an issue with the new-style delegated proxy certificates when used in FTS. Currently the only person using this on the CERN-PROD FTS service is Pablo. The issue comes when contacting any SRM (v1 or v2) using the delegated proxy certificate. On attempting to contact the SRM, we get:

[srm308] > /usr/bin/srm2_testPing httpg://srm-alice.cern.ch:8443

SOAP 1.1 fault: SOAP-ENV:Client [no subcode] "CGSI-gSOAP: Error reading token data header: Connection closed"

The FTS server sees the same problem when trying to use this proxy. Both client machines (srm308 and the FTS server) are using vdt1.2.2. We've tested against SRM servers running both vdt 1.2 and vdt 1.6. The same proxy can be used to contact the Java-based FTS web-service with no problem."

ATLAS (SC): CCRC'08 activities finished today to allow time to prepare for the M6 cosmics run starting on Tuesday (e.g. to clean up disks).

LHCb (RS): A DIRAC problem is preventing reconstruction jobs from running at the T1 sites that hold the required data. Transfers from CERN to SARA are failing, and from time to time all T1 transfers out of CERN fail. FD said they are running the SRM and dcap servers on the same machine, which is not recommended. There will be a dCache site conference call at 16:00.

Core services (CERN) report:

DB services (CERN) report:

Monitoring / dashboard report:

Release update:

Questions/comments from sites/experiments:

AOB: JS reminded everyone that next Monday's call will take place within the EGEE operations meeting, and that there will be no calls on Tuesday or Wednesday due to all-day meetings at CERN. The intention is to continue the daily meetings through March and April.

Topic revision: r5 - 2008-02-29 - HarryRenshall