Ian Bird: OK. RFC proxies and glexec deployment are 2 issues that need follow-up, maybe in the "Operations Coordination Team"
Michel: Next GDB there will be a report by Maarten on where we are wrt RFC proxies
Ian Bird: put cloud as a pre-GDB discussion forum for the moment
Michel: For the batch systems, there is a group in HEPIX. Should benefit from their work and not duplicate
Wahid: SRM is also missing. The TEG report identified the required SRM functionalities. What is not in the report is the alternatives on which people are working now
Ian Bird: yes, most probably need a group for this
Philippe: bottom line: do we keep SRM? do we replace it?
Ian Bird: maybe SRM will be used more for archive and less for data analysis systems (more towards xrootd there).
Markus: for the Managed transfers, we could work with gridftp only, for instance
Philippe: reporting space used/available is something we currently get from the SRM, and we need to keep this
Ian Bird: ACTION: Markus and Wahid will write down what the Storage Interface (avoid calling it SRM) Group should do.
Philippe: a more fundamental problem with these WGs is the lack of a managerial body that can enforce what sites and m/w are doing/deploying.
Ian Bird: Indeed. This body does not exist. WLCG does not enforce, it needs to look for consensus
Jeff: how do we pass from recommendations to make it happen? We do not have an enforcement body
Ian Bird: The Operations Coordination Team proposed by the OPS TEG should be empowered by the MB to follow up on the deployment of all these
Storage Accounting Update, John Gordon
StAR: storage accounting record, proposed by EMI. Scope is limited to consumption of storage
published version, link URL
being implemented by EMI storage providers
Implementations
All the EMI providers have implementation roadmap (see slides for timeline. Accounting sensors released in EMI3)
Need to talk to other providers. John talked to Castor&EOS developers and they were not aware.
Fallback solution: if native publishing is missing from storage providers, records can be published through Dashboard, Gstat or similar from bdii info
John presents a PROPOSAL: try and cover the WLCG use case in the Installed Capacities document (http://cern.ch/go/6JPK). See slides for proposed fields.
Note that both allocated and used storage capacity will be accounted for. Also, we should make sure that SITE info is included in the record
Need to decide what to publish. Eg: total allocated and used, per site, per VO, per month
Need to decide what to view. Eg: min/max/average over time period
Gstat is already collecting information from the BDII. We could do some reports to try and push the improvement in the quality of the data
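The "what to publish" and "what to view" items above can be illustrated with a minimal sketch. It assumes periodic samples of allocated and used capacity and groups them per site, per VO, per month, reporting min/max/average. The record layout, site names and values are hypothetical and do not follow the StAR schema.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (site, vo, month, allocated_tb, used_tb).
# Field names and values are illustrative only, not the StAR schema.
SAMPLES = [
    ("SITE-A", "atlas", "2012-05", 100.0, 60.0),
    ("SITE-A", "atlas", "2012-05", 100.0, 80.0),
    ("SITE-A", "cms", "2012-05", 50.0, 10.0),
]


def summarise(samples):
    """Group samples per (site, VO, month) and report min/max/average usage."""
    groups = defaultdict(list)
    for site, vo, month, allocated, used in samples:
        groups[(site, vo, month)].append((allocated, used))

    summary = {}
    for key, values in groups.items():
        used = [u for _, u in values]
        summary[key] = {
            "allocated_max": max(a for a, _ in values),
            "used_min": min(used),
            "used_max": max(used),
            "used_avg": mean(used),
        }
    return summary
```

Calling summarise(SAMPLES) returns one entry per (site, VO, month) key. Per-DN/user grouping is deliberately left out, in line with the scoping comment in the discussion.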
Discussion:
Ian Fisk: We use the CPU accounting to justify the usage of the CPU, not for decision making. We should scope for the storage accounting in the same way. For instance: do not invest time in per-DN/user accounting, since it is not used
Michel: This service looks a bit far away in time. Should we think about an interim solution?
Ian Bird: are the interfaces to the storage systems sufficient for the experiments to know the usage of the storage systems?
Philippe: We have an agent polling SRM every 10 min. We use this for monitoring & accounting. BUT the only info that SRM does not provide is tape usage. We have only the disk information
Claudio: in CMS we are trying to use the bdii information and aggregate it in the dashboard. Unfortunately, most of the sites publish unreliable information
Michel: what do we do?
John: we should have prototypes and reports from a subset of sites and implementations
Maarten: we have to ask EMI about the time scales when we can start testing some of these implementations
ACTION: John Gordon agreed to ask EMI about these time scales
Information system, Maria Alandes
Caching mode in the bdii is available. It is the default config from the EMI1 update in March
Deployment status: around 80 top-bdiis (19 EMI, 17 gLite, 29 publishing wrong version information). We have no way of knowing whether caching is enabled
NOTE: it would be good to reduce the number of top-bdiis. Will check the status of the old WLCG proposal along these lines
Feedback from the CERN site and the experiments is that the bdii seems to work better than before
Plan for the future: automate the run of glue-validator
Issue: glue-info-provider-service is used by most services. Its future support is not clear. Need to come to a conclusion soon.
Glue 2.0 status:
Benefits: simplified model, adds m/w version, support for multicore jobs
Deployment status: now at 53%. Progressing very slowly
EGI has set a deadline for non-Glue2 site-bdiis: end Sep 2012
From EMI2, all services must publish in Glue2
Clients (ie GFAL) can not migrate until all info is published in Glue2
Evolution of the Information System
Recommendations from OPS and WM TEGs (see slides). Long term plan: refactor the IS into "three pillars": 1) service data (static); 2) state data (dynamic) and 3) metadata (quasi-static)
EMIR, The EMI Registry: will address the service discovery problem. Distributed in EMI2. See main features in slides.
"ginfo" is a client developed at CERN. Will be able to query EMIR and also the BDII. Candidate for replacing lcg-info and lcg-infosites
IS future panorama (see diagram in slides): get rid of the site and top bdiis
EMIR: candidate for static information
For the state data: message based Resource Information System. Still a lot of open questions
Discussion
Maarten: where is OSG? Now they are in the bdii; we need to make sure we can do the aggregation at the WLCG level as well. Maybe we need to pass the requirement to EMI that they need to be compliant with OSG.
Markus: This is not that urgent. We still need to evaluate EMIR. If we are not convinced it is robust enough to be used as a core service, we will not go away from the caching bdii
Alberto: EMI and OSG are organizing some meetings to discuss issues like this. The issue is not being ignored.
Davide: The WM TEG recommendation was to have an IS that is as simple as possible. We would like to implement multicore support quite early. Do we need to wait for Glue2.0?
Markus: sure, it will take time to move to Glue2.0. But we cannot patch the old glue. Maxcores, etc., need to go to Glue2.0
A. Girolamo: from the experiment side, we would like to see the maxcores attribute associated with a CE+queue (GlueCE), or maybe with a SubCluster (discussion)
Michel: This is something we will need to discuss in the technical groups dedicated to this (FOLLOW)
WN Security pre-gdb, Romain
Authentication (CRLs, etc.) cannot be used for banning, because a user can be malicious without being compromised. Authorization is the correct way: removing a user from VOMS is not enough, due to long-lived proxies. Central banning is needed to ensure appropriate incident response
Central banning deployment PROPOSAL: "All WLCG sites must implement necessary mechanisms to pull central banning lists from the central Argus instance, for example by deploying Argus locally when applicable. The deployment of these solutions should be followed up in the GDB"
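As a minimal sketch of the authorization-side check described above: a site keeps a locally pulled copy of a central banning list and refuses any DN on it, regardless of whether the credential itself is still valid. This does not use the Argus API. The one-DN-per-line format, the function names and the example DNs are assumptions made for illustration.

```python
def load_banned(lines):
    """Parse a banning list: one DN per line, '#' starts a comment (assumed format)."""
    banned = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            banned.add(entry)
    return banned


def is_authorized(dn, banned):
    """A DN with a valid credential is still refused if it is on the banning list."""
    return dn not in banned


# Hypothetical content of a locally cached copy of the central list.
EXAMPLE_LIST = [
    "# hypothetical central banning list",
    "/DC=org/DC=example/CN=Banned User",
]
```

The point of the sketch is that the decision is taken at authorization time, so it still applies to a long-lived proxy issued before the user was removed from VOMS.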
Traceability: necessary for VOs and sites to collaborate to share information on this.
Concern with traceability of data: e.g. storage. This needs to be followed up.
Virtualization on the WN: A WG should be appointed to make recommendations for sites to fulfill the logging and traceability policy on the WN (FOLLOW)
Using external clouds: There are significant security concerns in using external cloud providers. A WG should be appointed to understand the policy & operational issues it raises (FOLLOW)
Discussion:
Philippe: Why do you care about WLCG policies at sites that have nothing to do with WLCG?
Romain: The WLCG policy says that if you provide resources to the Grid, you should be compliant with the WLCG security policies
Ian Fisk: Need to get this clarified. We have lots of examples of sites that "burst" to third-party sites. These may be opportunistic resources now, but it will be public clouds at some point. What is the worry?
Romain: The entire security model has been based on the fact that we control the resources.
Ian Fisk: The key issue is to realise who is taking the risk. If a site decides to provide its pledge through Amazon, the contract is between the site and Amazon. The site needs to be aware
Conclusion: Some discussion is needed. Agreement that it is all about risk. Make clear which are the risks and who is taking them (FOLLOW)
Proxy lifetime: can we reduce the VOMS proxy lifetimes? consensus is that this is a good idea, but followup at the GDB is needed at the technical level (FOLLOW)
Philippe: why do you want to enforce this? it makes life more complicated!
Romain: it is always a compromise between security and ease of use
Pool account recycling: PROPOSAL to recycle pool accounts only after they have been unused for 6 months.
Michel: how do we do this? ask EMI to configure this by default, or chase the sites?
Maarten: both. Maarten: ACTION to pass this requirement to EMI.
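The recycling rule in the proposal can be sketched as follows, assuming a site can obtain the last-use time of each pool account. The account names, the helper names and the approximation of "6 months" as 183 days are illustrative assumptions, not an existing tool or configuration option.

```python
from datetime import datetime, timedelta

# Assumption: "6 months" is approximated as 183 days.
UNUSED_PERIOD = timedelta(days=183)


def recyclable(last_used, now):
    """True if the account has been unused for at least the required period."""
    return now - last_used >= UNUSED_PERIOD


def accounts_to_recycle(last_used_by_account, now):
    """Return the sorted names of pool accounts that may be recycled."""
    return sorted(
        name
        for name, last_used in last_used_by_account.items()
        if recyclable(last_used, now)
    )
```

For example, with a hypothetical mapping {"pool001": datetime(2011, 11, 1), "pool002": datetime(2012, 5, 1)} evaluated on datetime(2012, 6, 20), only pool001 qualifies.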
Very long term future: Identity Federation, i.e. having a model where x509 would be hidden from users. No conclusion, no proposal, just highlighting that it will be worth participating in these discussions in the future
EMI News, Cristina
See details of EMI updates and timelines in the slides
EMI2 released 21st May. Full support for SL5 and SL6 64bit. Partial Debian6 support.
5 new products: CANL, EMIR, EMI-Nagios, Pseudonymity, WNoDeS
Full integration of ARGUS authz in the CEs and Storage.
Better support of Glue2 in all relevant services.
Initial implementation of the EMI Execution Service interface in all CEs
Support for NFS4.1, http, webdav in the SEs
Messaging-based SE-FileCatalog sync service (SEMsg)
EMI updates schedule: continue providing continuous delivery of EMI1 and EMI2 updates. 1 update cycle/month.
Planned product updates: top bdii, blah, dpm and lfc, gfal/lcg_utils, WMS (see version numbers and details in slides)
Plans for Y3: UI/WN tarballs, Debian 6 porting (aim is to have it by the end of October), Java APIs on Maven Central, release into EPEL/Debian repositories
Discussion
Michel: is the WN in SL6 ready? Cristina: yes, in EMI2, but 32bit binaries are missing
Markus: proposes that the 5/6 sites which have deployed special queues to test the EMI1 WNs should now move to testing the EMI2 WN.
Maarten: the tests from ATLAS and CMS have not reached a conclusion yet. So propose to keep the test EMI1 resources as they are.
Markus: better to put the effort in testing EMI2.
Maarten: do we want to send the message to sites: never upgrade to EMI1, go directly to EMI2?
Markus: maybe yes. Otherwise there is the risk that we will never have time to certify EMI2 before the end of the project
Cristina: confirms that the EMI2 WN works with the EMI1 CREAM CE
Globus s/w support at OSG, Brian
The grant providing globus support dedicated for OSG expires this summer
Organizations with interest (i.e. WLCG) should push to get commit access
Ian Bird: we did this 10 years ago. Maarten was allowed to commit. It did not work very well. Then we collaborated with VDT to do this. Not sure if this will work
Technical dependencies:
GSI and GridFTP: widely used. For gridftp alternate implementations and alternate solutions exist => Not concerned
GRAM: not widely used at our scale. Bugs discovered weekly => we are concerned! Alternatives exist; need some time to investigate them.
OSG CE: OSG working to investigate the possibility of switching the OSG-CE to using CREAM or Condor-CE technologies. The OSG Executive Team will make a decision. Will know about 1 or 2 weeks after July GDB.
Note there are real impacts of adopting one of these. E.g. could be that if we adopt CREAM, direct gliteWMS submission to OSG will be dropped.
Maarten: moving to CREAM does not break the gliteWMS, otherwise EMI sites would not work
Brian: OSG has no concrete plan (yet?) to provide support for Glue2.0 (Maarten asks for clarification)
ACTION: clarify with OSG what their plans are regarding Glue2.0
EMI Sustainability Plans, Alberto di Meglio
Software products
Clarification: EMI is a collaboration project (collaborating product teams). The end of EMI is not the end of the product teams.
The list of sw products has been circulated: it contains all the products, with partners committing effort to each product. This list can now be made public, and will be circulated.
A few gaps have been identified:
Yaim: there is no plan in EMI to move away from yaim. However, there is not much enthusiasm for supporting yaim beyond the end of EMI. This is an open issue.
RAL-SAGA-SD: will not be supported. But from this morning's presentation it seems that there is an alternative: ginfo
Also mentioned as minor problems: delegation java, torque config
Global Tasks
Build: ETICS is not required anymore for building, from EMI2. Instead, standard tools are used (mock, pbuilder)
Certification: now it is done by PTs, but testbed not really large-scale. This is an open issue
Coordination: being discussed currently. The plan is to converge by September. One idea is to do it similar to ScienceSoft. Still under discussion
Collaborations:
WLCG: proposal to have a meeting among WLCG, EGI, EMI. A date that suits all the people who need to attend still has to be found.
EGI: same as above, plus discussion regarding non-HEP
OSG: meeting being planned for end June to discuss common activities, etc.
Discussion
Michel: as WLCG, it is very important for us to make the list of products outside EMI that are important for us (dashboard, etc.), which are related to the EGI HUC effort that will also end
Ian Bird: the outcome of the above WLCG-EMI-EGI meeting needs to be how we manage software in the future; also to discuss: how we do certification, staged rollout and deployment in general.
Communicating Machine features to batch jobs, Tony Cass
This talk was already presented in the GDB 14 months ago.
The HEPIX VWG proposed the creation of 3 files in /etc/machinefeatures for communicating VM features to jobs (see slides for details). Felt to be useful for real machines as well
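A minimal sketch of how a job might pick up such features, assuming each file in the directory holds one value as plain text. The individual file names are given only in the slides, so the sketch reads whatever files are present rather than naming them. The function name is hypothetical.

```python
import os


def read_machine_features(directory="/etc/machinefeatures"):
    """Return {file name: stripped content} for each regular file in directory."""
    features = {}
    if not os.path.isdir(directory):
        # e.g. a node that does not publish machine features
        return features
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path) as handle:
                features[name] = handle.read().strip()
    return features
```

Returning an empty dictionary when the directory is absent lets the same job code run on nodes that do not provide the files.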
ATLAS: configuration issues for BNL and IN2P3 being addressed
Experiment plans:
CMS: started opening tickets where glexec does not work
Claudio: main reason for CMS pushing this is the security challenge in July. In the long term, it should be WLCG who asks the site to support glexec
ATLAS: working on glideinWMS backend for Panda, has been running glexec tests for a few weeks
ALICE: will look into implementing special proxies with critical extension only understood by glexec. This will require some development
LHCb: will check again if glexec support in DIRAC needs adjustments
Michel: suggests to send e-mail to the CB list for the site representatives to pass the message to the technical people that glexec should be installed and configured at all sites.
Need to decide on a pilot project: non-browser based. Services enabling WLCG resources using home-issued generated credentials (see slides for diagrams)
Proposed plan: 1) proof of concept; 2) architecture design; 3) pilot service
CALL for interested experts, sites, VOs to join this effort.
Discussion
Alberto: STS (Security Token Service) is a translation service between different credential systems. The implementation of STS in EMI is late. Asks if there is room for collaboration, and in particular if the effort in EMI can be stopped
David Kelsey: we should add STS to the list of possible solutions, and EMI STS experts are more than welcome to join this effort