Auth and Auth (Authentication and Authorisation) Reports for LCG SCM from 2007 and earlier.

December 12th 2007

  • The upgrade to the new version of glite-VOMS_oracle and the migration to SLC4 were completed on Monday, December 10th
    • voms-admin-2.0.8 is now used on the production servers (it fixes some security flaws)
      • it is no longer possible to register the same user twice with 2 different syntaxes for the "Email=" field, and this version is not yet able to generate both syntaxes (2.0.9 is). Hence, for the moment, a few users cannot use some sites (especially OSG sites) until the new version of edg-mkgridmap is installed there.
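For illustration, the two DN spellings at stake would appear in a grid-mapfile like this (the DN is invented, and "emailAddress=" as the second spelling is an assumption about the variant in question; only the exact string published by the server matches):

 "/C=CH/O=CERN/OU=GRID/CN=Some User/Email=some.user@cern.ch" .dteam
 "/C=CH/O=CERN/OU=GRID/CN=Some User/emailAddress=some.user@cern.ch" .dteam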

November 21st 2007

  • An EGEE broadcast was sent to let sites know that it is possible to trust the VOMS servers' DNs instead of their certificates
    • lcg-vomscerts cannot be removed as announced in the broadcast due to a dependency problem
    • Some sites are misconfigured (host name and domain name) and so have problems making it work.
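For reference, a 'vomses' entry is what carries the server DN that clients and sites must trust; a sketch with illustrative host, port and DN values (not the production ones):

 "dteam" "lcg-voms.cern.ch" "15004" "/DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch" "dteam"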

November 14th 2007

  • The certificate of voms.cern.ch will expire on Nov. 20th. Because the announcement to upgrade lcg-vomscerts came at short notice, some sites may not upgrade lcg-vomscerts quickly enough, and proxies coming from voms.cern.ch will be rejected on those sites. 2 solutions:
    • Delay the change of the host certificate to give sites 2 or 3 more days, and let VOs know that their proxies' lifetime will be shorter
    • Stop VOMS proxy generation on voms.cern.ch, so that lcg-voms.cern.ch handles all requests correctly (after discussing it with the other attendees, this solution was chosen; VOMS proxy generation will be stopped on voms.cern.ch after an EGEE broadcast announcing it)
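A minimal check of the kind sites can run to see when a VOMS host certificate expires; the certificate path under /etc/grid-security/vomsdir is an assumption about where lcg-vomscerts installs it:

 openssl x509 -in /etc/grid-security/vomsdir/voms.cern.ch.pem -noout -enddate -subject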

October 17th 2007

  • A security fix for voms-admin 1.2.19 is ready but won't be certified; efforts are focused on voms-admin 2.0.8.
  • The motherboard of voms104 must be replaced, so voms.cern.ch is pointing to voms101 again

September 19th 2007

  • VOMRS was upgraded on the production servers to fix some security issues in the web UI
  • A new node (voms104) was added to the gridvoms cluster. voms104 is currently replacing voms101, which went into maintenance due to a hardware problem. This problem is now fixed.

August 22nd 2007

(By M.Dimou)
  • Neither voms.cern.ch nor lcg-voms.cern.ch recovered properly from the Oracle wlcg servers' problem of August 17th (Friday, around 5pm). The reasons are still being discussed with the voms core developers in savannah #19770
  • The vomrs developers wish the upgrade to vomrs-1.3.1 to proceed because of its new features. We hope to agree on a date early in September.
  • The voms-admin certification status will be discussed today at the EMT. The latest information we have dates from June 25th.
  • A vom(r)s workshop is being prepared for the week of October 22nd. Here is the draft agenda.

July 11th 2007

  • VOMS servers have been upgraded to the latest production versions
    • This fixes many bugs
    • It also enables Generic Attribute (GA) support in VOMS core (can be used for testing purposes)

June 20th 2007

  • LHCb asks for longer VOMS proxies (1 month; see the sketch after this list)
    • the current proxy lifetime is 1 week (already too long according to the security people)
    • the reason is that with the current LCG RBs and 1-week proxies, some jobs fail
    • under discussion in the OSCT
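For context, a sketch of how such a longer proxy would be requested once allowed; the VO name is real, but the server still caps the attribute lifetime at its configured maximum:

 voms-proxy-init --voms lhcb --valid 720:00   # ask for 1 month (720 hours)
 voms-proxy-info --all                        # check the lifetime actually granted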

May 30th 2007

  • The transition to the new certificate of lcg-voms.cern.ch seems to have gone quite smoothly
    • Some people still haven't updated their 'vomses' files for the certificate change of voms.cern.ch, which happened in January
    • AFAIK, a site-info.def with the new DN has not yet been published with glite-yaim

May 16th

  • Reminder - lcg-voms.cern.ch has a new certificate
    • the first broadcast was done on May 15th
    • it is already available in lcg-vomscerts-4.5.0.1
    • the new certificate will be used starting May 24th

May 9th

  • The grid certificate of lcg-voms.cern.ch will expire on May 29th
  • Oracle DB passwords were changed on May 8th because they were about to expire
    • voms.cern.ch had a problem that night (reported by Yvan Calas) [solved]
      • It was due to a misconfiguration following the password change
      • No users have complained yet, but gridmap file generation must have failed

April 25th

  • New vomrs-ping script to check VOMRS status
    • It greatly increases the number of detected errors
    • It must be run on the master node, alongside VOMRS
      • Changes must be made in the LinuxHA configuration (in the CERN-CC-gridvoms RPM package)

April 18th

  • Reinstallation of voms102 and voms103 done
    • The CDB templates need some improvements for a smoother reinstallation
    • VOMRS was broken on lcg-voms for a few minutes, but VOMS and VOMS-Admin remained fully available.

April 4th

  • voms102 and voms103 are on the same switch. It may be a good idea to put them on different switches.
  • Reinstallation of voms102 is scheduled for April 16th, and voms103 for April 17th if there is no problem with voms102.
  • Intervention on April 2nd:
    • The switch replacements finished at 09:15, but there was no communication about it before 11:30
    • VOMS services were available at 10:40, but with a high load for about 30 minutes

March 21st

  • March 20th:
    • "DBDirty Connection" appeared in VOMRS log, and made all mail notifications fail.

March 14th

  • March 5th: due to an error in CDB, SPMA removed jdk-1.5. This made tomcat5 stop running on the production VOMS server for several minutes, until the JDK was reinstalled by hand

February 28th

  • Following the Oracle intervention, all VOM(R)S services are back, after updating the connection string in all configuration files
  • An SPMA problem appeared on voms102, caused by a problem with the automounter and NFS (?!?)
    • The master node was switched to voms103, to be able to investigate without breaking VOMS services
    • It is now fixed!

February 14th

January 30th

The VOM(R)S Workshop took place at CERN in the period 22-26/1 with developers' participation and was very useful.
  • Bugs were discussed thoroughly.
  • jdk-1.5 was installed on all 3 production servers for better tomcat debugging, with no problem for the servers (so far).
  • patch 869 was installed in an emergency to apply the fix for bug 13888.
  • Major progress was made in understanding the OCI connections to Oracle from vomrs, voms-admin and voms core. A working configuration is under test now.
LCGVomsCernSetup will reflect the new rpms, when they appear on the gLite production repository, for FIO information and CDB update.

Another very useful part of this workshop was the discussion with several VOs of their requirements concerning VOMS Generic Attributes at the WLCG BOF

This workshop marks the end of the LCG User Registration Task Force (TF). Notes and actions are being prepared and will be available from the TF meetings' index.

January 16th

  • Happy New Year 2007
  • The Xmas period was smooth for the VOM(R)S services.
  • A new tomcat catalina option, discovered by the VOMRS developer Tanya Levshina, made the service perform infinitely better (no tomcat restarts from cron!!).
  • The split we applied in December (voms.cern.ch for gridmap and proxy only, lcg-voms.cern.ch for vomrs and proxy only) seems beneficial.
  • Some sites, including CERN nodes, haven't applied the VOConfigForSites2007 changes despite intense publicity.

December 18th

No more LCGSCM meetings this year but important vom(r)s events to note for the record:
  • Dec 19th: Increased timeout for ATLAS and CMS on voms.cern.ch to the values of lcg-voms.cern.ch. Instructions in page VomsFAQforServiceManagers
  • Dec 19th: Wrote documentation LCGVomsConFiles and linked all recent twiki pages from LCGSecurity.
  • Dec 18th: Introduced the parameters (ENABLE=BROKEN)(LOAD_BALANCE=yes) in /var/glite/etc/voms-admin/[VOname]/voms.database.properties on all servers (3 servers per 10 VOs = 30 files). This is part of Solution 2 in https://twiki.cern.ch/twiki/bin/view/PSSGroup/OCIClientHangProtection, which we had applied for voms core but not for voms-admin. Now, the above voms-admin files for all VOs look like this:
# The JDBC connection string.
jdbc.URL                        jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=lcgr4-v.cern.ch)(PORT=10121))(ADDRESS=(PROTOCOL=TCP)(HOST=lcgr1-v.cern.ch)(PORT=10121))(ADDRESS=(PROTOCOL=TCP)(HOST=lcgr2-v.cern.ch)(PORT=10121))(ADDRESS=(PROTOCOL=TCP)(HOST=lcgr3-v.cern.ch)(PORT=10121))(ENABLE=BROKEN)(LOAD_BALANCE=yes)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=lcg_voms.cern.ch)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=200)(DELAY=15))))
vomrs doesn't use these 2 parameters yet. We plan to introduce them in vomrs as well, together with the use of the lcg_vom(r)s _w accounts in the vomrs and voms-admin configurations, when we are back in full operation in January.
  • Dec 18th: Informed the GD Group and the ROC managers about all actions related to the voms.cern.ch certificate change on Jan. 9th. Changed the parameters in page VOConfigForSites2007.
  • Dec 14th: tomcat suffered unrepairable, repetitive OutOfMemory errors. Emergency telephone meeting with the vomrs and voms-admin developers. We decided to re-configure vomrs on lcg-voms.cern.ch (voms102 or voms103) to synchronise with voms-admin on voms.cern.ch. In this way:
    • voms.cern.ch is used for gridmap file generation and voms proxies.
    • lcg-voms.cern.ch is used for vomrs updates (via the web ui) and voms proxies.
  • Dec 13th: Changed the frequency of the tomcat restart from cron to 4 times per day on all 3 servers, in view of the Xmas shutdown at CERN (see the sketch after this list).
  • Dec 13th: Published (with R.Harakaly) page LCGVOConfigForSites to help sites remove the ldap-related lines after the lcg-registrar.cern.ch withdrawal.
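The cron entry behind the Dec 13th change could look like this; a sketch only, since the real entries live in /etc/cron.d/voms-maint and the exact times and restart command are assumptions:

 # restart tomcat 4 times per day
 15 0,6,12,18 * * * root /sbin/service tomcat5 restart > /dev/null 2>&1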

December 12th

Dec 12, 2006 5:07:10 PM org.apache.catalina.startup.HostConfig undeployApps
 SEVERE: Error undeploying web application at context path /vo/dteam
 java.lang.OutOfMemoryError
Tomcat was restarted.
  • Dec 12th: vomrs patched to fix https://savannah.cern.ch/bugs/index.php?22272. Details in page VomrsUpdateLog.
  • Dec 11th: A large number of complaints from users who couldn't obtain a voms-proxy. The problem, summarised the next day by M.Anjo, was on the Oracle side: "yesterday around 13:30 the clusterware of the LCG RAC entered in a strange state and start giving this error for which there is almost no information both on web and metalink. We do not have any information on the server logs(...)".
  • Dec 11th: The OPS VO vomrs interface was found broken with "Error:null" around 3pm. By the time we restarted this specific VO, tomcat was found to be OutOfMemory altogether.
  • Dec 11th: The ALICE VO voms-admin list-users interface was found broken around 11am, while the others continued working. This makes the gridmap file refresh for ALICE impossible while it lasts. See details in https://savannah.cern.ch/bugs/?func=detailitem&item_id=21808#comment6
  • Dec 11th: The LDAP Authentication server was switched off at 10am, as announced multiple times since Spring 2005, according to the plan. More details in LcgRegistrar.

December 5th

  • The upgrade to vomrs-1.3.0, as published in GmodJournal#20061201_Actions, was done on Dec. 4th as planned. VO Admins were prompted to submit further improvement requests, if any, in savannah. Tomcat performance is being monitored (OutOfMemory error on Dec 5th at 2:57PM CET). Several questions were addressed to vomrs-grid-support@cern.ch.
  • The stoppage of lcg-registrar.cern.ch on Dec 11th at 10AM CET is published on all possible fora. Details in LcgRegistrar.

November 29th

  • Multiple intermittent errors from the voms servers when clients build the gridmap file. One case was due to bogus characters in a user DN, bug #21932. Other cases are not yet explained and are still being discussed with the voms-admin developer: bug #21808, GGUS 15120, GGUS 15730 and more.
  • Despite the adoption of Solution 2 in the OCIClientHangProtection recipe (see the previous report below), SAM jobs couldn't run during the weekend of Nov 25th due to multiple hanging edg-voms processes on lcg-voms (voms103) for the OPS VO: bug #21930 and remedy 384540.
  • A new certificate has been signed by the new CERN CA (IT/IS group) for voms.cern.ch. It is needed on Jan 9th. The new vomses files for deployment at the sites are being prepared by M.Dimou, and a new rpm for distribution will be published by M.Litmaath.
  • vomrs will be upgraded to v.1.3.0 on Monday Dec. 4th. The announcement will be broadcast to all relevant parties on Friday Dec 1st.

November 15th

November 7th

  • On Nov. 3rd pm and during the weekend that followed, the same Oracle problems as on October 28th and 30th caused 279 processes to hang on voms.cern.ch, and users became unable to obtain a voms-proxy. The developers were asked why the replica server (lcg-voms.cern.ch) is not tried instead after a time-out.
  • On Nov. 1st at 1:22am tomcat stopped working on voms.cern.ch (voms101).
  • On Nov. 1st at 9am, as a consequence of bug #16236, the voms-admin interfaces of ALICE, CMS and DTEAM were only displaying their VO homepages and nothing else. This bug has been fixed for some time, but the relevant patch #869 is not yet released.

October 31st

An Oracle instance hung on the database backend on Saturday Oct. 28th and Monday morning Oct. 30th. The dteam voms sessions hung forever; as a result, no dteam user could obtain a voms-proxy. A large number of edg-voms processes were found hanging for dteam on both servers (299 processes on voms.cern.ch and 249 on lcg-voms.cern.ch). The database experts have observed such problems often recently and are working with Oracle support on a solution. The voms developers were informed of this voms-ping 'inefficiency' and are working on it. Savannah bug #19770 contains the details.
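A sketch of the kind of check that exposes this symptom; the threshold is an assumption, and the alarm address is the admins list that appears elsewhere on this page:

 # Alarm if edg-voms processes for one VO pile up (a sign of hung db sessions)
 COUNT=$(pgrep -fc 'edg-voms.*dteam')
 if [ "$COUNT" -gt 50 ]; then
     echo "voms: $COUNT hung edg-voms processes for dteam" \
       | mail -s "voms alarm on $(hostname)" grid-cern-prod-admins@cern.ch
 fi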

After some basic vomrs and voms testing on INT3R, we broadcast on http://cic.in2p3.fr and the relevant mailing lists a 15-minute interruption of vomrs on Nov. 1st during the security patch installation on the Oracle servers. As vom(r)s service managers we are concerned about the testing effort required:

  • every 3 months: basic tests (!?) for the Oracle security patches.
  • every 6 months: the whole test suite (which takes a week!) for Oracle patchset fixes (the next one is imminent, announced for the end of October).
  • every time we receive a new vomrs version (see VomrsUpdateLog for the vomrs update frequency).
It seems that no vom(r)s-on-Oracle testing takes place elsewhere, so the required testing resources exceed what we can afford.

October 25th

Alarm from voms102 on Sat. Oct 21st at 8:03am due to an incomplete restart from cron. No service problem, as voms102 has been the slave lcg-voms since Oct 5th. Fixed at the next restart.

CA-1.10 updated on all voms servers and lcg-registrar.cern.ch. NB!! The latter service goes out of production on Dec 11th according to LcgRegistrar.

An important coordination meeting with the voms/voms-admin/vomrs developers was held on Oct 24th. The notes (still in draft form as of Oct 25th) will be in http://cern.ch/dimou/lcg/registrar/TF/meetings/2006-10-24

October 17th

The voms DNS alias was taken out of voms101 and given to prod-voms (which also holds the lcg-voms alias) at 9:30am CEST on October 16th according to the plan. A last broadcast of news and email on the subject was issued via http://cic.in2p3.fr on Oct 12th.

voms101.cern.ch was re-installed from scratch according to the requirements of the security group. The re-configuration was completed, but a tomcat file ownership problem prevents the relevant process from starting. This is now being investigated. As soon as it is solved, the voms DNS alias will be given back to voms101. This VOMS server will remain available in the future for gridmap file building, but not for user registration (no vomrs) nor for voms-proxy (no fall-back service). Details on page LCGVomsLdapServer.

October 10th

namely:
fetch-crl-2.0-1.noarch.rpm
glite-security-utils-config-1.2.5-1.noarch.rpm
glite-security-voms-clients-1.6.16-2.i386.rpm
glite-VOMS_oracle-3.0.4-1.noarch.rpm
glite-security-voms-api-cpp-1.6.16-4.i386.rpm
glite-security-voms-oracle-2.0.8-0.i386.rpm

September 26th

  • Monitoring work is being done by R. Bonvallet, using voms103, with emphasis on the memory leak that possibly grows after every gridmap file generation. Updates are in the relevant tickets linked from VomsServiceMonitor.
  • Clarifications are being exchanged offline and in savannah with the developers to achieve VomsOracleImprove.
  • 21/9: https://savannah.cern.ch/bugs/?func=detailitem&item_id=13888 caused a problem in correctly constructing the gridmap file for CMS. A work-around, suggested by the voms-admin developer A.Ceccanti, was applied.
  • 20/9 at 18:08: an additional OutOfMemory error was discovered, on top of the one reported last week.

September 21st

Out of Memory again on lcg-voms.cern.ch. Forced unscheduled tomcat restarts on:
  • 20/09 at 15:30
  • 15/09 at 18:00
  • 13/09 at 07:49 (didn't restart from cron)

Testing of VOMRS 1.3 partially completed and reported

September 12th

  • We need to decide on a date for the voms101,2,3 kernel upgrade (currently running Linux voms10x.cern.ch 2.4.21-40.EL.cernsmp #1 SMP Fri Mar 17 00:53:42 CET 2006 i686 i686 i386 GNU/Linux) in order to broadcast the scheduled interruption.

  • Optimising the access to Oracle is quite complex and affects multiple databases, vom(r)s components and glite config. files. Summary in VomsOracleImprove and the savannah bugs linked from that table.

September 6th

  • Work is continuing on changing voms to understand long connection strings to the database backend and OCI. Example in bug #17456. Progress is monitored in the EMT meeting.

  • About 10 messages per day are being sent to a mailing list, but some of the members don't know the hosts concerned at all:

Subject: Cron <root@lxb7026> . /etc/glite/profile.d/glite_setenv.sh ; sh $GLITE_LOCATION/libexec/glite-wms-check-daemons.sh > /dev/null
 Date: Tue, 5 Sep 2006 18:45:17 +0200
 From: root@lxb7026.cern.ch (Cron Daemon)
To: grid-cern-prod-admins@cern.ch

warning, got bogus unix line.

Sep 4th: Thanks to Jan Iven for noticing that the firewall permissions for the VOMS servers were set with an expiry date; the servers would have become unreachable from the outside if we hadn't taken immediate action to change that.

August 30th

Aug 29th:

  • voms101 (voms.cern.ch) suffered the long-standing voms-admin/tomcat bug #16843. Maria Dimou (MD) changed the CATALINA_OPTIONS in the tomcat configuration file and arranged a daily restart of tomcat from cron (a sketch follows at the end of this entry).

  • vomrs-1.2-3 was upgraded on voms102 and voms103 by the developer (Tanya Levshina from FNAL) at 21:00 CEST, to complete the use of Fully Qualified Object Names.

  • MD and Roberto Bonvallet (GD security team) are working on enhancements to the voms monitoring daemons and met with Tim Bell for advice. Some suggestions have been submitted to the developers in bug #19311. The new tools will be documented in VomsServiceMonitor.
The "recycling" of voms101.cern.ch after Oct. 16th (end of voms-ldap life) was also briefly discussed. The suggestions for future use will be recorded in LCGVomsLdapServer.

Aug 27th: The voms servers had problems connecting to the database because some part of the vom(r)s software stack does not use the full connection string and only contacts one listener. MD and Miguel Anjo are meeting on Aug 31st to plan the software change.

Aug 24th: After the CMS VO re-configuration and restart, the other VOs defined on the same server died. This is a new VOMS core bug, #19349.

Aug 23rd: An alarm was raised at 04:45 from voms102 (lcg-voms.cern.ch at that time). Email was sent only to maria.dimou@cern.ch. /var was 91% full. MD cleaned the voms-admin log files, which are not rotated. Could the LEMON and LinuxHA related files in /var/lib/heartbeat/cores/root/core.28xx and /var/edg-fmon-agent/voms102/2006_08-23_0000xxxx also be cleaned? (Although they were certainly not the cause of the problem.)
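Since the voms-admin logs are not rotated, a logrotate stanza along these lines would prevent the /var fill-up; the log path and retention are assumptions:

 # /etc/logrotate.d/voms-admin (sketch)
 /var/glite/log/voms-admin/*.log {
     weekly
     rotate 4
     compress
     missingok
     notifempty
 }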

August 22nd

Aug 21st: Some lemon expertise is growing in GD, so we hope to complete and enhance vom(r)s monitoring tools in VomsServiceMonitor with advice from FIO.

Aug 18th: Weekly reminders started from cron to LHC Experiment and DTEAM users who use voms-proxy-init without being registered on lcg-voms.cern.ch. The text of the personal email to the user is in http://cern.ch/dimou/lcg/voms/voms-ldap.end

Aug 17th: VOMRS was re-started, with management agreement, after the code was changed to use the Oracle-suggested Fully Qualified Object Names. Monitoring on the database side showed that some queries running against the VOMRS schemas are still not using Fully Qualified Names, so another rpm from the developer is imminent.

August 15th

Aug 15th:

Aug 14th: Tests are not conclusive on whether the Oracle workaround fixes the bug published below, which obliges VOMRS to remain shut down on lcg-voms.cern.ch

Aug 11th: lcg-voms.cern.ch tomcat was OutOfMemory around 17:00, despite a successful restart from cron at 7:49am, as VomsServiceMonitor explains. No news reported in savannah bug #16843

Aug 11th: Published on the CIC portal: We need to temporarily shut down vomrs on lcg-voms.cern.ch, which hit Oracle bug 2508682, described on Oracle https://metalink.oracle.com as follows:

> ****************************************************************
> Under very heavy load a session may end up using the wrong shared cursor
> if concurrent sessions are all executing the same statement but use
> different versions of the statement (eg: for different schemas).
> Eg: If different users all issue the same INSERT statement into a
> privately owned table then under heavy load one of the sessions
> may end up using the wrong shared cursor and may insert data
> into the wrong schema table
> Workaround:
> Fully qualify object names.
> ***************************************************************** 
We will publish the service restart on http://cic.in2p3.fr a.s.a.p.

Aug 10th:

Aug. 9th: Together with M.Anjo, reconfigured and moved the voms.cern.ch (voms101) data from grid8.cern.ch to:

db host: lcgr4-v.cern.ch
db name: voms_pilot.cern.ch
db port: 10121
db accounts: voms_ldap_alice, voms_ldap_atlas, voms_ldap_cms, voms_ldap_lhcb, voms_ldap_dteam

Aug 2nd - by Ian Neilson

1st Aug - All VOMS administration interfaces and the gridmapfile export were unavailable for approx 1hr due to a misunderstanding: Oracle port 1521 was expected to remain available 'inside cern' for some time after today's closure to external connections. Apparently this applied only to 137.138.x.x, and the voms services are on 128.142.x.x. Manual reconfiguration was necessary since the configuration scripts are currently hardwired to the old port.

31st July - A reconfiguration of a 'test vo' on lcg-voms.cern.ch (voms102) to use different database ports, in combination with bugs in the scripts that reconfigure and restart a single VO, caused the service to be unavailable for approx 1hr. Updates as part of gLite-3.0.1 apparently fix these problems and will be applied.

30th July - Although the primary service lcg-voms.cern.ch came back after the weekend power failures, voms.cern.ch was delayed due to the lower Oracle service level offered by the backend database for this machine (as discussed in the week of 12th July). Action on this, either to decommission the service or to move the database, will be discussed next week with Maria Dimou

July 26th - by Ian Neilson

24th July - Database corruption/inconsistency related to incomplete registration fixed manually by developer.

21st July - VOMRS interface patch applied to fix bug preventing Representative re-assignment (need rpm!).

20th July - Tomcat "Out of memory error" at approx 17:20 caused unscheduled interface restart.

July 19th - by Ian Neilson

July 11th - lcg-voms.cern.ch planned upgrade to VOMRS version 1.2.3 completed within announced timeslot.

July 12th - by Ian Neilson

July 11th - voms.cern.ch service restored.

July 7th - One of the high availability Oracle cluster machines used behind lcg-voms.cern.ch was 'off'. Due to a bug in the current VOMRS configuration, the necessary failover connection string cannot be used. This caused the VOMRS web interface to be unavailable for new registrants and VO admins. Proxy generation was not affected. Manual reconfiguration to use one of the other machines restored the service.

July 6th - On the evening of Thurs. July 5th the database behind voms.cern.ch crashed with hardware problems. This VOMS instance, providing voms credentials only to those not re-registered in lcg-voms.cern.ch, was never on high availability, so it is down pending hardware repair (latest estimate: 12th July). No updates were being made through this machine, and users should be able to run using non-voms proxies.

July 6th 2006 - by Ian Neilson

Upgrade to VOMRS version 1.2.3 now provisionally planned for next week due to testing schedule.

3rd July - CRL update was once again found to be failing, causing denial of service to some users. Fixed by a crond restart.

3rd July - Short unscheduled service interruption due to reboot after power supply replacement. High availability switch made correctly.

4th July - As a result of the HA switch of the 3rd, a configuration discrepancy in the max. proxy lifetime for cms and atlas was found on the new service instance; it had not been propagated from the primary server. Fixed, but an unscheduled restart was required.

June 29th 2006

A VOMRS upgrade to version 1.2.3 is planned for the 1st week of July on the production service lcg-voms.cern.ch. Detailed plan:

  1. Decide on the date/time/duration of the rpm upgrade and inform <vomrs-grid-support@cern.ch> (Tanya Levshina, FNAL)
  2. Broadcast the scheduled intervention 24 hrs earlier on http://cic.in2p3.fr (Maria this week or Ian next week).
  3. rpm -U on voms102 and voms103 (Tanya).
  4. Send the location of the new rpm (including the Foundation db api) to <vomrs-grid-support@cern.ch> (Tanya)
  5. Update the twiki page for the FIO installation server https://uimon.cern.ch/twiki/bin/view/LCG/VomsCernSetup for vomrs (Lanxin).
  6. Inform Thorsten Kleinwort (FIO) about the change of rpm on the hosts (Lanxin).

************************************************
> Release 1.2.3 Changes (04/14/06)
> -------------------------------
> New features and bug fixes
>
> 1. Web UI - allows to specified the status of the certificate in search criteria for all pages dealing with certificate 
> 2. Web UI - shows subset of CN information instead certificate subject (DN) when selection representative, group owner,
> managers, etc
> 3. Web UI - LCG registration type do not have to add Institution and Site menu and do not allow to select
> "institution", displays "Institute" information fetched from CERN HR DB if chosen (savannah bug#15134) 
> 4. Web UI - better(?) layout for group/role selection
> 5. Added subset of CN information in mail subject when it is relevant (savannah bug #14653) 
> 6. Added Registration Type for each event, so different types of registration can have 
> own set of events.
>
> Bug fixes:
>
> 1. Fixed the generation of the link in notification email, so the "(" in the certificate subject can be handled correctly
> 2. Fixed some error and warning, as well as some wording and labels in help pages 3. Fixed handling of expiration date for
> LCG Type of registration (savannah bug #15146)
> 4. Implemented work around for Oracle bug (savannah bug #14286)
> ************************************************************** 

EMT discussed critical bug https://savannah.cern.ch/bugs/?func=detailitem&item_id=16843 on June 21st. java 1.5 with voms-admin-server-1.2.17 will be tested at CNAF. No change planned on the production servers until the next official gLite release is certified.

TCG discussed the future of VOMRS for the whole EGEE community, based on https://uimon.cern.ch/twiki/bin/view/LCG/VomrsFunctionality and the additional effort required for the bugs listed in http://cern.ch/dimou/lcg/vomrs/ . The outcome was:

  • M.Schulz (SA3) to decide on future VOMRS distribution (VDT or gLite).
  • J. White (SA1) to report on necessary effort for making voms-admin compliant with the Security requirements.
  • E.Laure to contact D.Kelsey (JSPG) for general implementation of Security requirements in products.

On June 21st lcg-voms.cern.ch suffered from bug https://savannah.cern.ch/bugs/?func=detailitem&item_id=15638

On June 27th voms.cern.ch couldn't contact its database due to work on the Oracle server grid8.cern.ch. Users who have not yet registered on lcg-voms.cern.ch (the primary VOMS service) were affected.

June 21st 2006

About the part of VOMS service run by FIO:

https://uimon.cern.ch/twiki/bin/view/LCG/LCGSecurity contains 14 documentation links about the VOM(R)S service.

Depending on the output of the https://uimon.cern.ch/twiki/bin/view/LCG/VomsServiceMonitor scripts, the operators or SMoD will run the re-start procedures as documented in https://uimon.cern.ch/twiki/bin/view/LCG/VomsStartStopCheck or will contact the VOMS Service Manager (alias 3rd level support) as documented in https://uimon.cern.ch/twiki/bin/view/LCG/VomsProblemSolving

On June 8th pm: Maria installed http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/rhel30/RPMS.Release3.0/glite-VOMS_oracle-3.0.3-0.noarch.rpm on voms101,2,3 as a replacement for glite-VOMS_oracle-2.3.0-1.noarch.rpm, as announced. This contains the fix for https://savannah.cern.ch/bugs/?func=detailitem&item_id=15567

Updated bug status in savannah for all bugs listed in the June 8th report below.

Updated pages LCGVomsCernSetup and VomsConfiguration.

Following a request by the ATLAS VO manager to make updates in voms.cern.ch, the ldap-sync procedure should remain disabled in the future. Page LCGVomsLdapServer was updated. The end of lcg-registrar.cern.ch is planned for YearEnd 2006 and has been communicated to all VO managers. By then, voms.cern.ch will become obsolete.

The rpm containing the critical bug fix: https://savannah.cern.ch/bugs/?func=detailitem&item_id=16781 is now ready for testing.

Critical https://savannah.cern.ch/bugs/?func=detailitem&item_id=16843 was discussed at the June 21st EMT. A test installation at CNAF will use java1.5 before porting the tests to other testbeds.

June 8th 2006

As the number of twiki pages on VOMS is becoming high, an index was made under https://twiki.cern.ch/twiki/bin/view/LCG/LCGSecurity .

We started building VOM(R)S FAQs and HowTos that are not specific to the CERN installation on http://goc.grid.sinica.edu.tw/gocwiki/ for use by users, VOMS service managers and GGUS supporters.

Review of high-priority bugs that will require rpm replacement on voms101,2,3 when fixes released:

  • bug #15567: Ready. Maria will install a.s.a.p.
  • bug #16781: Important bug. Fix available. Sub-system not available. Fix not released.
  • bug #16843: Left for gLite 3.1 (October 2006).
  • bug #16236: Left for gLite 3.1 (October 2006).
  • bug #16742: Not ready. Fix to be negotiated with the developers (Maria in savannah).

On Friday June 2nd, we discovered that the voms server on lcg-voms.cern.ch for the DTEAM and OPS VOs couldn't connect to the database (db) and users were considered 'unknown'. This is due to bug #16781, but it shouldn't occur because we keep db connections 'alive' from cron (/etc/cron.d/voms-maint). As we never managed to explain this, we left it as a one-off incident.
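The keep-alive mentioned above sits in /etc/cron.d/voms-maint; a sketch of what such an entry can look like, where the probe invocation is an assumption (voms-ping being the home-made script documented in VomsServiceMonitor):

 # poke each VO's db connection so the idle-account timeout of bug #16781 never triggers
 */30 * * * * root /usr/local/sbin/voms-ping dteam ops > /dev/null 2>&1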

May 31st 2006

Documentation on voms101.cern.ch (running the VOMS-LDAP synchronisation service) is now available in LCGVomsLdapServer.

Documentation on alarms and daemons is now available in VomsServiceMonitor.

Pages LCGVomsCernSetup, VomsStartStopCheck, VomsWlcgHa were improved or updated.

Page VomsProblemSolving was changed following the final list on 3rd level support produced by IT/GD/OPS section.

On May 24th we discovered that the CA upgrade to version 1.4-1 was not recognised by VOMS unless the core service gets restarted. This is a TrustManager bug now filed as https://savannah.cern.ch/bugs/?func=detailitem&item_id=17046

On May 26th, the /etc/cron.d/voms-maint jobs that restart tomcat automatically had vanished for the second time from voms102.cern.ch (lcg-voms). Relaxed mode is now set for SPMA, to accept what is on the host and not force synchronisation with CDB.

May 23rd 2006

  • The voms server was found dead on voms.cern.ch (voms101) after the power cut of May 17th around 17:00.
  • Tomcat was found dead on voms.cern.ch (voms101) on May 22nd at 12 (noon).

We plan to enhance existing voms monitoring tools that notify us in case of problems. The present tools are:

  • voms-check: a script using voms-ping
  • voms-maint: a cron table grouping preventive restart jobs

Pending fixes in https://savannah.cern.ch/bugs/?func=detailitem&item_id=16742 and https://savannah.cern.ch/bugs/?func=detailitem&item_id=16842

Do we need to add a component for vomrs-check, or does LinuxHA (with the operators' alarm) take care of this? A twiki page documenting all vom(r)s monitoring tools will follow decisions on the above.

Page LCGVomsCernSetup was updated with:

  • the new glite repository location of the voms rpms.
  • the home-made voms-maint rpm (Action on Thorsten to copy on swrep).
  • Logfile locations and config. files that define the filenames and rotators.

Many thanks to Veronique for upgrading the CA rpms to ca_*-1.4-1 on voms101,2,3.

May 16th 2006

Since the migration to the production Oracle db servers (on May 9th) we suffer from voms core bug https://savannah.cern.ch/bugs/?func=detailitem&item_id=16781 (the db disconnects idle accounts to save resources). This is a very big problem for us, because users suddenly can't get a proxy after the db account for their VO has been idle for the production db timeout (normally 4 hours!). We now try to keep the connections alive from cron, so far unsuccessfully. We installed an rpm with /etc/cron.d entries, http://cern.ch/dimou/lcg/voms/voms-maint-3.0-1.noarch.rpm, on voms102 and voms103 (it is not needed on voms101, because that host doesn't use the production db, but it does no harm if FIO wants to put it there for homogeneity). It must be put on the installation server, because the May 15th scheduled reboot removed it!! Page LCGVomsCernSetup is updated.

Otherwise, the vomrs startup and the fail-over of lcg-voms from voms102 to voms103, tested on May 15th, were a success. The only question is why voms core doesn't run on voms103 now. The last trace in the log is "Mon May 15 15:23:04 2006:voms103.cern.ch:vomsd(30037):INFO:STARTUP:VOMSServer", from when we started it by hand. As explained in pages VomsStartStopCheck and VomsWlcgHa, vomrs should run on the master only.

The glite configuration script included in glite-VOMS_oracle-2.3.0-1.noarch.rpm has to be upgraded again on all hosts (fix for bug https://savannah.cern.ch/bugs/?func=detailitem&item_id=15567). Maria will install glite-VOMS_oracle-3.0.2-0 and update page LCGVomsCernSetup. Then, Thorsten, please update the installation server.

A number of rpms, like security-utils-config and the CAs, are out of date on voms101,2,3. https://uimon.cern.ch/twiki/pub/LCG/Trash.LCGVomsCernSetup/rpm.list is a snapshot of what was installed mid-March and is already quite out of date. Who updates rpms which are necessary but not on the LCGVomsCernSetup page?

May 9th 2006

The migration to the production Oracle db servers and the new hardware took place today as planned. Problems we faced:

  • the crl was not being updated due to https://savannah.cern.ch/bugs/?func=detailitem&item_id=15638. Fixed manually by I. Neilson. This requires installation of rpm glite-security-utils-config-1.2.4-1; it will be done by M.Dimou but should also be copied to the installation server. (Action: FIO).
  • the database password agreed and configured in /opt/glite/etc/voms/[VO_Name]/voms.pass for all 10 VOs (40 oracle accounts) was not known to the database (why?), so when trying to start the service we got "Initialization error: wrong database version" (misleading). Fixed by M.Anjo on the Oracle servers. (A pre-flight check for this is sketched after this list.)
  • the ports on the firewall were open for all the individual hosts voms101,2,3 but not for the 'wrapper' prod-voms.cern.ch. We didn't know this was necessary when we took the relevant action and wrote the 'connectivity' section in LCGVomsCernSetup. Fixed by computer.security and netops after an urgent request by M.Dimou.
  • openafs was upgraded by T.Kleinwort, given that the machines were in a scheduled service interruption. vomrs didn't start on the 'master' voms102.cern.ch after the reboot, as instructed by VomsStartStopCheck; it was started manually by M.Dimou. This should be solved before the next reboot is needed and/or before the master changes due to a LinuxHA signal. (Action: FIO).
  • the system error messages reported in the May 2nd report (below) are still a concern. Emailing them to the 3rd level support as in VomsProblemSolving is not helpful. (Action: FIO).
  • we discovered that all 4 rpms (rpm names marked with NB!!! on LCGVomsCernSetup) installed and configured by M.Dimou before May 2nd (see the report further down this page) disappeared from voms102 and voms103 but not from voms101 (WHY?). This is a very big problem because at least one of them contained an important bug fix, and because reconfiguring voms takes quite long and requires a service interruption. FIO help is necessary to understand the reasons for this.
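Regarding the voms.pass mismatch above, a quick pre-flight test of one VO's credentials against Oracle could look like this; the account name and TNS alias are illustrative:

 # verify the password in voms.pass before starting the service
 PASS=$(cat /opt/glite/etc/voms/dteam/voms.pass)
 echo "select 1 from dual;" | sqlplus -L "lcg_voms_dteam/${PASS}@lcg_voms"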

May 2nd 2006

The voms configuration problems were solved on voms101,2,3. Rpms and configuration are now up-to-date on all 3 hosts. The rpm list on twiki page LCGVomsCernSetup is also updated. Actions 1, 2 and 3 in the previous report (below) were completed by Maria. The ports are not yet open on the firewall, so pressure should be kept on the network group to do that. The machines will enter production on May 9th at 10am if the pending actions are completed. Two more actions were added to the list:

  1. upgrade vomrs on voms102 and voms103 (Tanya)
  2. create (different) crontabs on voms101,2,3.

FIO, please update the installation server with the following rpm changes:

Erase bogus rpm: CERN-CC-gridvoms-1.1-1

Include/replace 4 rpms (see twiki page LCGVomsCernSetup for details):

glite-VOMS_oracle-2.3.0-1.noarch.rpm

glite-security-voms-admin-server-1.2.16-1.noarch.rpm

glite-security-voms-oracle-2.0.7-0.i386.rpm

voms-tools-0.0.1-1.noarch.rpm

Also, FIO, please explain the error messages I've been receiving for weeks (forwarded in email):

Subject: voms103 - SPMA_ERROR for a few mns & a lot of times over weekend. (CM000000000111326)

[ voms103 - GD GRIDVOMS ]


Forwarded message ----------

Subject: voms102.cern.ch: /usr/local/sbin/sysacct-handle error

voms102.cern.ch: /usr/local/sbin/sysacct-handle error (Mon May 1 00:02:01 2006)

/usr/sbin/sa --other-usracct-file /var/account/usracct --other-savacct-file /var/account/savacct /var/account/pacct -a > /var/account/pacct-report-week.17 failed No such file or directory

April 25th 2006

The complete wipe-out of voms103 took place last week, but the rebuild was not successful because the configuration files were not being backed up. After installing 4 new rpms with patches (LCGVomsCernSetup page updated, new rpms marked with "NB!!! Installed ... Replaces ..."), we are trying to re-create the configuration, so far unsuccessfully, because of errors in the procedure (VomsConfiguration) that we don't yet understand. We have arranged the copy of the data to the production Oracle database, and the entry of voms102,3 into production, for Tuesday May 2nd starting at 10am. This means (actions on people in parentheses):

  1. voms103 must be successfully rebuilt (Maria)
  2. ports have to be opened on the firewall. LCGVomsCernSetup page updated, section "Connectivity". (FIO to computer.security)
  3. voms102 must be identical to voms103 (FIO)
  4. vomrs on voms102 and voms103 must be mostly the same as on lcg-voms.cern.ch today, namely, cover the same 10 VOs with the same configuration parameters but use the production Oracle db. Check that!!! (Lanxin)
  5. vomrs should be started at boot time via /opt/vomrs-1.2/etc/init.d/vomrs start until https://savannah.cern.ch/bugs/?func=detailitem&item_id=15788 is done. (FIO)
  6. the DNS alias lcg-voms.cern.ch has to go out of tbed0152 to voms102/3 (FIO)
  7. broadcast news for the service interruption (Maria)
  8. copy the data out of grid8.cern.ch (pre-production Oracle db) to the production db. LCGVomsCernSetup page updated, section "Oracle accounts". (Miguel Anjo, informed)

April 4th 2006

The LinuxHA tests ended on April 4th, but a complete wipe-out of voms103 will take place during the next 2 days, to test whether the installation server contents https://uimon.cern.ch/twiki/bin/view/LCG/Trash.LCGVomsCernSetup and a restore from backup, plus the https://uimon.cern.ch/twiki/bin/view/LCG/VomsStartStopCheck, will give us a working service without intervention.

March 28th 2006

High Availability testing between voms102,3 will take place during the week of April 3rd. The rpm including the voms-ping script is available on page LCGVomsCernSetup and linked from VomsWlcgHa. The configuration and the memory upgrade on voms103 are still pending.

The documentation is ready in pages LCGVomsCernSetup, VomsConfiguration, VomsStartStopCheck and is linked from WlcgScDash in the relevant sections of the ScFourTechnicalQuestionnaire.

Since the introduction of the glite3.0 voms (http://cern.ch/dimou/voms/glite3.html), the production servers suffer terrible memory problems, as described in https://savannah.cern.ch/bugs/index.php?func=detailitem&item_id=15714

To understand the reasons for these problems we need intensive testing. For this purpose we need the extension of the present 15 accounts to 40 Oracle test accounts on the production servers (format lcg_voms_voms1-40 for 10 VOs), for voms (20 accounts with the _W) and vomrs (another 20 accounts with the _W). It is very important that we keep these accounts for as long as we use VOMS.

We will need to change one rpm on voms101,2,3 when the package is certified. It corresponds to a quick fix described in https://savannah.cern.ch/bugs/?func=detailitem&item_id=15692.

March 21st 2006

Since March 15th the CERN VOMS servers run the glite3.0 VOMS version as made available for the PPS in http://lxb2042.cern.ch/gLite/APT/PPS/rhel30/ Hosts involved: lcg-voms.cern.ch and voms.cern.ch. Detailed rpm versions in: http://cern.ch/dimou/lcg/voms/glite3.html The backend database used for both is still grid8/voms-pilot.cern.ch (pre-production).

Installation of glite3.0 VOMS is done on the new hardware:

voms101.cern.ch (identical configuration to voms.cern.ch; it will become the voms-ldap server and will keep using the pre-production db backend).

voms102.cern.ch (identical configuration to lcg-voms.cern.ch; it is ready as one of the master/slave pair of the new lcg-voms.cern.ch and is configured to use the production db backend).

voms103.cern.ch (should become identical to voms102). The rpms are now there, but the service must be configured (Maria). 2GB of memory should be added to this host to have a configuration identical to voms101. (FIO)

Information concerning the hosts, rpms, accounts and logs continues to be added in https://uimon.cern.ch/twiki/bin/view/LCG/CernVomsSetup?topic=Trash.LCGVomsCernSetup (Maria).

Improvements by the developers are pending for the voms-ping script https://uimon.cern.ch/twiki/bin/view/LCG/VomsWlcgHa

Documentation on start/stop procedures is pending (Maria).

March 1st 2006

We have daily changes in the VOMS code in order to make a solid version available for glite3. We now have to stop and freeze the code for installation on the new hardware. Latest situation in http://cern.ch/dimou/lcg/voms/glite3.html

A VOMS/VOMRS workshop is approaching: http://cern.ch/dimou/lcg/registrar/TF/meetings/2006-03-13 We count a lot on FIO and database experts' participation, to freeze the servers in a production-quality set-up. We need to install the new code on new hardware and a different backend.

No more LDAP updates are possible for LHC Experiment VOs and DTEAM since last Friday 24 Feb. All registrations are done in VOMRS.

VOMRS development requests are in http://cern.ch/dimou/lcg/vomrs/savannah_entries_on_VOMRS.html

February 21st 2006

The GDB on Feb 8th, following a presentation on VOMS/VOMRS http://agenda.cern.ch/askArchive.php?base=agenda&categ=a057702&id=a057702s1t10/moreinfo, concluded that we have to go ahead with the LHC Experiments & DTEAM migration to VOMS as planned.

VOMRS is used more and more now. A number of bugs and enhancements are registered in savannah: see https://savannah.cern.ch/search/?words=vomrs&type_of_search=bugs&Search=Search&exact=1 (for bugs) and https://savannah.cern.ch/search/?words=vomrs&type_of_search=task&Search=Search&exact=1 (for tasks). The issue of VOMRS support at CERN is not yet clarified.

We have daily changes in the VOMS code in order to make a solid version available for glite3. Latest situation in http://cern.ch/dimou/lcg/voms/glite3.html

A VOMS/VOMRS workshop is approaching: http://cern.ch/dimou/lcg/registrar/TF/meetings/2006-03-13 We count a lot on FIO and database experts' participation, to freeze the servers in a production-quality set-up.

January 31st 2006

VOMS/VOMRS installation on the new hardware purchased by FIO hasn't yet started, for the following reasons:

1. gLite R1.5 came out on Sunday Jan. 22nd, but it included the same voms components as gLite R1.4 (voms-admin 1.2.10 and voms 1.6.10), containing bugs we need to get away from, e.g. https://savannah.cern.ch/bugs/?func=detailitem&item_id=13863 and https://savannah.cern.ch/bugs/?func=detailitem&item_id=13899. http://cern.ch/dimou/lcg/voms/glite1.5 explains the details.

2. as a result of the above, testing is not completed and certification has not started; therefore, there is no question of installing the software on the (future) production servers.

3. most importantly, the principal voms tester reported:

> I guess the VOMS installation script deletes all the entries
> in the database when it hits a clean machine. Clean in the
> sense that there is not VOMS installed.

As we are now in full Experiment members' migration and numerous genuine, recent entries are being added to the VOMS db, I can't take the risk of installing the product before this problem is proven to have gone away. - maria

January 17th 2006

http://cern.ch/dimou/lcg/voms/StatusFall2005/ reflects the current situation with VOMS. The tomcat performance problem was finally identified as a voms-admin bug, after 6 months of impossible operation. DTEAM VO updates in VOMRS have started. Some changes are required in the VOMRS code to satisfy delegation requirements from the ROC to the site managers for this VO.

A VOMS/VOMRS workshop at CERN is being prepared for the week of March 13th.

November 30th 2005

Slow progress in ticket https://savannah.cern.ch/bugs/?func=detailitem&item_id=12613, which is a problem because its completion is a prerequisite for moving the VOMS/VOMRS data to the central databases' production service.

voms-admin for the DTEAM and ATLAS VOs suffered from bug https://savannah.cern.ch/bugs/?func=detailitem&item_id=13863, which manifested itself in the latest official release, issued with glite R1.4.1.

Prepared http://cern.ch/dimou/lcg/voms/Xmas2005.html and http://cern.ch/dimou/lcg/registrar/Xmas2005.html for the service availability over Xmas.

Prepared http://cern.ch/dimou/lcg/voms/voms-lcg2-2_7_0.html for the next LCG2 release.

November 14th 2005

Updated ticket https://savannah.cern.ch/bugs/?func=detailitem&item_id=12613 as requested at the last meeting: "Further to my comment of Nov. 1st in this ticket: "The database services consider this bug a show-stopper for the move of the VOMRS/VOMS databases from the pre-production to the productions servers.", the CERN Physics' database services and the LCG Service Coordination Meeting https://twiki.cern.ch/twiki/bin/view/LCG/LcgScm require a date by which this development will be completed and packaged in a gLite release for installation on the VOMS servers. Please update this ticket with the date information. Thank you - maria"

M.Girone mentioned at the last meeting the existence of a person in CERN/IT with tomcat expertise. As I submitted apologies for all the Auth-and-Auth people who will miss the next meeting, please send me the name of the tomcat expert by email.

Extensive discussions are taking place in the Weekly Operations' Meeting on new DTEAM set-up and population via VOMRS/VOMS. Full report in http://egee-docs.web.cern.ch/egee-docs/operational_tools/Operations_Meetings/Weekly_Operations_Meeting_minutes_2005-11-07.htm

- maria

November 2nd 2005

Activities completed since last week

Memory upgrade on the primary VOMS server lcg-voms.cern.ch on Fri Oct. 28th. The tomcat problem is still not explained.

Work in progress

  • Debugging performance problems on the VOMS servers: https://savannah.cern.ch/bugs/?func=detailitem&item_id=10970
  • Discussing the move of the VOMS data from grid8 (pre-production) to the production servers; https://savannah.cern.ch/bugs/?func=detailitem&item_id=12613 seems to be a show-stopper.
  • Preparing notes from the VOMS workshop of 17-21 Oct. that describe the new servers' architecture. They will live in http://cern.ch/dimou/lcg/registrar/TF/meetings/2005-10-17

Issues

-- SteveTraylen - 09 Jul 2008
