PPS Pilot Follow-up Meeting Minutes Thu 22 May 2008

  • Date: Thu 22 May 2008
  • Agenda: 34504
  • Description: pilot of SL4 WMS at CERN-PROD and INFN-CNAF
  • Chair: Antonio Retico

Attendance

  • PPS: Antonio Retico
  • CERN-PROD: Maarten Litmaath
  • CNAF-INFN: Daniele Cesini, Danilo Dongiovanni
  • CMS: Enzo Miccio
  • Atlas: David Rebatto
  • Certification/SA3: Di Qing

Status and results of the pilot service (by VOs and sites)

Two pairs of WMS 3.1 nodes (one for Atlas and one for CMS) were set up at CNAF and CERN and included in the pools of machines used by CMS and Atlas for standard production activity. CMS in particular used them in test mode for production MC analysis.

Both CMS and Atlas agree that the general behaviour of the service looked correct and that the performance of the system is considerably better than that of the version currently in production.

(Off-line) Maarten confirmed that he had been monitoring wms117 at CERN and that the system did not break during operations.

Two notable issues were observed, due to BUG:36669 and BUG:36224. Both were fixed manually at CERN, whereas the manual workaround for the second one was not applied at CNAF. These issues and other minor ones are discussed in detail in the next section.

Open Issues (by VOs, sites, deployment teams)


BUG:35244 - Can't submit jobs using voms proxies with multiple roles due to a mapping problem

Severity: Critical

Status: No Patch yet

David: This was found earlier (just after certification), and the manual workaround was applied to the CERN pilot only at a later stage. It was also successfully applied to the WMS in Milano.

Antonio: The clean fix requires changes both in yaim core and in yaim-wms. This could take an additional two weeks of certification, because the new configuration produced by yaim core would have to be re-certified against all nodes.

The option of introducing a hack in yaim-wms to fix the problem, to be removed in a later release, was considered and discarded (Di, Maarten, Antonio).

Proposal: As the workaround is simple to apply, and in order not to delay the release further, the decision is made to report the issue among the known issues and to advertise the workaround clearly in the release notes.


BUG:36669 - 3.1 WMS submission fails for static accounts

Severity: Normal

Status: No Patch yet

Maarten: This is a regression of an old bug, already fixed in some branch and now re-introduced. The workaround is to apply a legal, although not enforced, configuration of the pool accounts on the WMS. Applying this workaround at the sites is even desirable, because it fits better with the JSPG advice on job traceability.

The required configuration changes affect only the groups.conf file in yaim (users.conf is left untouched). This may however be inconvenient for sites where the content of this file is shared among many nodes (e.g. CERN).

Decision: Reported in the release notes as a known issue, with an explicit mention that sites need to apply a new configuration of the accounts on the WMS. The example file provided by yaim must reflect this configuration.


BUG:36757 - WMProxy API Python: local proxy in api constructor does not work

Severity: Normal

Status: fix candidate for integration

Discussion:

Antonio: this was opened by the developers and does not seem to have affected the activity of the VOs.

Decision: Describe among Known Issues


BUG:35357 - WMProxy API Python: getOuputFileList empty list if only one file in OutputSandbox

Severity: Normal

Status: Patch Incomplete

Discussion:

Maarten: this was never seen because most of the JDLs used for tests have at least two output files. Is this a common use case?

David: The method is actually used to describe the output, which is then retrieved via globus-url-copy. A single-file output sandbox used to be a fairly common use case in the past, when the output was packed into a tar archive. This is no longer the default, but it is still possible, in which case users must make sure to add a dummy file to their output sandbox.

Decision: Describe among Known Issues. This issue is possibly relevant for users, so it should be properly documented; the release notes alone are not sufficient for that.
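As an illustration of the dummy-file workaround David describes, here is a minimal Python sketch. It assumes a job description is assembled programmatically, and it pads a single-file OutputSandbox with an extra entry before rendering the JDL attribute. The helper names and the dummy.txt file name are hypothetical, and the job itself is assumed to actually create the dummy file (e.g. with touch) so that it exists among the outputs.

```python
from typing import List

# Hypothetical placeholder name; any small file the job always produces would do.
DUMMY_FILE = "dummy.txt"


def pad_output_sandbox(files: List[str], dummy: str = DUMMY_FILE) -> List[str]:
    """Return a copy of `files`, appending `dummy` when exactly one file is listed.

    Sketch of the workaround for BUG:35357: never submit a single-file
    OutputSandbox, so the output file list is not returned empty.
    """
    padded = list(files)
    if len(padded) == 1:
        padded.append(dummy)
    return padded


def output_sandbox_jdl(files: List[str]) -> str:
    """Render the OutputSandbox attribute as a JDL line, e.g. OutputSandbox = {"a", "b"};"""
    quoted = ", ".join('"%s"' % name for name in files)
    return "OutputSandbox = {%s};" % quoted


if __name__ == "__main__":
    # A single tarball, as in the older compressed-output use case.
    print(output_sandbox_jdl(pad_output_sandbox(["output.tar.gz"])))
```

Run as a script, this prints `OutputSandbox = {"output.tar.gz", "dummy.txt"};`. Sandboxes that already list two or more files are left unchanged.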

BUG:36432 - "/etc/init.d/gLite start" command modifies terminal settings

Severity: Minor

Status: No patch available yet

Discussion:

Maarten: I saw something very similar when installing another service, so this issue might be related to yaim. To be seen.

Decision: Describe among Known Issues


BUG:36336 - GLITE_LOCATION_VAR set wrongly on LB 3.1 node

Severity: Normal

Status: No patch available yet

Discussion:

Di: The bug is already addressed in the known issues on the Savannah patch pages, and a workaround is available there as well.

Decision: Describe among Known Issues


David also reported an issue they had with the cron job used to purge stale jobs from the LB. Apparently this cron job affects the performance of the LB node and sometimes needs to be disabled. The developers also frequently advise disabling it.

Daniele and Danilo confirmed that they had received the same advice and had temporarily disabled the cron job when the LB was under stress.

Danilo also reported a suggestion from one of the developers to increase the frequency of the cron job (it currently runs every 6 hours). According to the developer, running it more often could make the queries perform better.

Decision: This issue has to be reported among the known issues and properly tracked via a bug.

Update (23/5) by David: The bug already exists: BUG:24690


Recommendations for release and deployment

The extra configuration (special indexes) needed by Atlas and CMS on the LB and WMS respectively has to be described in the release notes.

A wiki page describing it was written by Yvan Calas (thanks): https://twiki.cern.ch/twiki/bin/view/FIOgroup/ScLCGWms31ConfigVO

Danilo and Daniele are kindly requested to confirm that the information there is correct.

Decision about termination/extension of the pilot

The decision is made to terminate the pilot and release the WMS on Thursday 29 May 2008.

Thanks to all participants.

AOB

Actions

  • Main.Antonio (due 2008-05-29): Verify with Andrea Sciaba' whether an appropriate section is available in the User Guide to describe known issues affecting users.
  • Main.David (due 2008-05-27, closed 2008-05-23): Verify whether a bug exists or, if not, open one to be referenced in the release notes among the known issues.
    Update 23/May by David: The bug already exists: BUG:24690
  • Main.Daniele, Main.Danilo (due 2008-05-27, closed 2008-05-27): Confirm that the instructions in https://twiki.cern.ch/twiki/bin/view/FIOgroup/ScLCGWms31ConfigVO are correct.
    Update 26-5-08: The information is correct, but that script must be run on an empty DB, otherwise it may crash. When run at CNAF on a 2 GB DB, it crashed.


Topic revision: r5 - 2008-05-27 - AntonioRetico
 