APEL FAQ and Troubleshooting

General Information

Q. How can I communicate issues to the APEL team?

A. Please use the GGUS helpdesk and provide the following information:

  • Your Site Name
  • A description of the problem
  • The version of the APEL packages installed
  • Any relevant log files

For other issues you can contact APEL SUPPORT.

Q. Where can I find the APEL Users Guide?

A. For the EMI2 APEL Publisher, you can find the documentation in the APEL Client page.

For the EMI3 APEL Client, you can find the documentation in the APEL Client page.

Q. Where can I check the accounting data for my site?

A. Please check the Accounting Portal.

Q. How do I access my local accounting database?

A. You can log on to MySQL by running:

EMI2> mysql -u accounting -p accounting
EMI3> mysql -u apel -p apelclient

You will be prompted for your accounting database password, which was defined when configuring the node.

Q. How do I check the accounting status of my site?

A. You can check the synchronisation pages for your site at:

For these pages to be created, your site must:

  • Be registered in GOCDB with production/certified status
  • Have at least one CE defined with the service type "APEL" in GOCDB. Please note that this is different from the "glite-APEL" service needed for your publishing box.

Q. How do I change the MySQL default temporary directory?

A. MySQL uses the value of the TMPDIR environment variable as the pathname of the directory in which to store temporary files. If you don't have TMPDIR set, MySQL uses the system's default, which is normally /tmp, /var/tmp, or /usr/tmp. If the filesystem containing your temporary file directory is too small, you can use the --tmpdir option to mysqld to specify a directory in a filesystem where you have enough space.
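
Before moving the temporary directory, it can help to see how much free space the current candidates have. The following is a minimal sketch in Python, not part of APEL or MySQL. The helper names are hypothetical, and it reads the environment of the process that runs it, which may differ from the environment mysqld was started with.

```python
import os
import shutil

# Fallback directories named in the answer above, in the order listed there.
FALLBACK_TMP_DIRS = ("/tmp", "/var/tmp", "/usr/tmp")


def candidate_tmp_dirs(environ=os.environ):
    """Return [TMPDIR] if it is set, otherwise the fallback directories that exist."""
    tmpdir = environ.get("TMPDIR")
    if tmpdir:
        return [tmpdir]
    return [d for d in FALLBACK_TMP_DIRS if os.path.isdir(d)]


def free_gib(path):
    """Free space on the filesystem holding path, in GiB."""
    return shutil.disk_usage(path).free / 2**30


# Example use:
#   for d in candidate_tmp_dirs():
#       print(d, round(free_gib(d), 1), "GiB free")
```

If the reported space is too small, point mysqld at a larger filesystem with the --tmpdir option described above.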

Q. How do I change the Sitename?

A. Please open a GGUS ticket and we'll guide you through the procedure.

Q. How do I make a mysqldump of the database?

A. You can create a dump of your accounting database with the following command:

EMI2> mysqldump -u accounting -p accounting > sitename.sql
EMI3> mysqldump -u apel -p apelclient > sitename.sql

The APEL support team may ask you for a copy of the dump file. As this file is often very large we recommend that you make the file available on an HTTP server for the team to download. If this isn't possible please contact the team to discuss other file transfer options.

Note that the database can contain private job information so please don't make the dump publicly available.
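
Because the dump can contain private job information, one option is to create the dump file readable only by its owner. The sketch below is an illustration under stated assumptions, not an APEL tool: the function names are hypothetical, and it simply wraps the mysqldump command shown above. mysqldump will still prompt for the password on the terminal.

```python
import os
import subprocess


def dump_command(user, database):
    """Build the mysqldump command line shown above (-p makes it prompt for the password)."""
    return ["mysqldump", "-u", user, "-p", database]


def dump_to_private_file(command, out_path):
    """Run command and write its standard output to out_path with mode 0600.

    O_EXCL makes this fail if out_path already exists, so an older dump
    is never silently overwritten.
    """
    fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as out:
        subprocess.run(command, stdout=out, check=True)


# Example use (EMI3 names from the answer above):
#   dump_to_private_file(dump_command("apel", "apelclient"), "sitename.sql")
```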

EMI2 APEL Questions

Q. How do I upgrade my glite-APEL box to emi-apel?

A. If you have a working glite-APEL node (not shared with another glite service on the same box) and want to upgrade to the EMI version, follow these steps:

  • Install the EMI yum repository/EPEL repository by following the instructions on the EMI webpage.
  • Remove the old apel packages:
# yum remove glite-APEL glite-apel-core glite-apel-publisher glite-apel-yaim glite-yaim-core
  • Install the new metapackage:
# yum install emi-apel
  • The node now needs to be reconfigured with YAIM. You can find the information about configuring the glite-APEL service in the APEL Publisher admin guide. Once you have the correct variables in the file site-info.def, you'll need to run the following command:
# /opt/glite/yaim/bin/yaim -c -n glite-APEL -s /opt/glite/yaim/examples/siteinfo/site-info.def

YAIM will configure the publisher configuration file. The database does not need to be reconfigured, so please ignore any YAIM error messages about creating the 'accounting' database.

If you had your publisher cron job running at a specific time, you'll need to change it in /etc/cron.d/glite-apel-publisher, as YAIM will have replaced it during configuration.

The APEL publisher is now ready.

Q. How do I upgrade my APEL parser from gLite to EMI version?

A. The APEL parsers (LSF, Condor, SGE and PBS/Torque) are distributed as part of the CREAM-CE or the batch system utils metapackages. You will need to check the instructions for the metapackage on how to upgrade (if possible).

The location of the APEL packages has changed from gLite to EMI, so reconfiguration with YAIM is needed.

Q. How do I archive processed/old data from my local database?

A. Tables in MySQL may crash when they reach 4 GB. Records that have already been processed/published in APEL can be automatically deleted or manually archived to reduce the size of the APEL tables.

Automatic Deletion

  • Make a copy of your parser configuration file (if you have more than one just select any).
  • Modify the configuration file to enable the DBDeleteProcessor (with option cleanAll="no"). The rest of the processors can be disabled.

<?xml version="1.0" encoding="UTF-8"?>
<ApelConfiguration enableDebugLogging="yes">
    <SiteName>place your site name here</SiteName>
    <DBURL>jdbc:mysql://localhost:3306/accounting</DBURL>
    <DBUsername>no-default</DBUsername>
    <DBPassword>no-default</DBPassword>
    <DBDeleteProcessor cleanAll="no"/>
</ApelConfiguration>

  • Run the apel parser with that configuration file:

env APEL_HOME=/ /usr/bin/apel-batchsystem-log-parser -f /etc/glite-apel-batchsystem/parser-config_delete.xml

Manual Archiving of Tables

  • Create archive tables: In the local MySQL database run the following commands:

mysql> create table EventRecords_archive like EventRecords;
mysql> create table MessageRecords_archive like MessageRecords;
mysql> create table GkRecords_archive like GkRecords;
mysql> create table BlahdRecords_archive like BlahdRecords;
mysql> create table LcgRecords_archive like LcgRecords;

  • Copy old processed data to the archive tables:
Please replace YYYY-MM-DD with the date up to which you want to archive data. Please note that the table LcgRecords doesn't keep a flag indicating whether a record has been published/processed. This information is stored in the RepublishInfo table instead. Do not archive any records in LcgRecords with a MeasurementDate newer than the date recorded in the RepublishInfo table. You can find the latest successful publishing date by running the following query:

mysql> select * from RepublishInfo;

The following queries copy the old data into the archive tables:

mysql> insert into EventRecords_archive (select * from EventRecords where EventDate < 'YYYY-MM-DD' and Processed = 1);
mysql> insert into MessageRecords_archive (select * from MessageRecords where ValidFrom < 'YYYY-MM-DD' and Processed = 1);
mysql> insert into GkRecords_archive (select * from GkRecords where ValidFrom < 'YYYY-MM-DD' and Processed = 1);
mysql> insert into BlahdRecords_archive (select * from BlahdRecords where ValidFrom < 'YYYY-MM-DD' and Processed = 1);
mysql> insert into LcgRecords_archive (select * from LcgRecords where MeasurementDate < 'YYYY-MM-DD');

  • Remove the old records from the live tables: If the previous queries run successfully we can now delete the old records from the live tables.

mysql> delete from EventRecords where EventDate < 'YYYY-MM-DD' and Processed = 1;
mysql> delete from MessageRecords where ValidFrom < 'YYYY-MM-DD' and Processed = 1;
mysql> delete from GkRecords where ValidFrom < 'YYYY-MM-DD' and Processed = 1;
mysql> delete from BlahdRecords where ValidFrom < 'YYYY-MM-DD' and Processed = 1;
mysql> delete from LcgRecords where MeasurementDate < 'YYYY-MM-DD'; 

  • Release unused space from the live tables:

mysql> optimize table EventRecords;
mysql> optimize table MessageRecords;
mysql> optimize table GkRecords;
mysql> optimize table BlahdRecords;
mysql> optimize table LcgRecords;
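
Since the same cutoff date has to be typed into ten statements, it can be less error-prone to generate them. The following is a minimal sketch, not an APEL tool: the function name is hypothetical, the table and column names are copied from the queries above, and it only prints text for you to review and paste into the mysql prompt. It does not create the archive tables and does not check the RepublishInfo date for you.

```python
import datetime

# (table, date column, has a Processed flag), as used in the queries above.
TABLES = [
    ("EventRecords", "EventDate", True),
    ("MessageRecords", "ValidFrom", True),
    ("GkRecords", "ValidFrom", True),
    ("BlahdRecords", "ValidFrom", True),
    ("LcgRecords", "MeasurementDate", False),
]


def archive_statements(cutoff):
    """Return the copy, delete and optimize statements for one cutoff date."""
    # Parse the date first so that a malformed value raises ValueError
    # before any SQL text is built.
    day = datetime.date.fromisoformat(cutoff).isoformat()
    copies, deletes, optimizes = [], [], []
    for table, column, has_flag in TABLES:
        where = f"{column} < '{day}'"
        if has_flag:
            where += " and Processed = 1"
        copies.append(
            f"insert into {table}_archive (select * from {table} where {where});"
        )
        deletes.append(f"delete from {table} where {where};")
        optimizes.append(f"optimize table {table};")
    # All copies come first, so every delete is preceded by its copy.
    return copies + deletes + optimizes


# Example use:
#   print("\n".join(archive_statements("2012-01-01")))
```

Run the delete statements only after the insert statements have completed successfully, as described above.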

EMI2 APEL Troubleshooting

Q. ApelException: Unable to setup a database connection

In /var/log/apel.log, you see an exception like this:

Fri Mar  2 07:43:01 UTC 2012: apel-publisher - ------ Starting the APEL Publisher ------
Fri Mar  2 07:43:01 UTC 2012: apel-publisher - program aborted
org.glite.apel.core.ApelException: Unable to setup a database connection: org.gjt.mm.mysql.Driver
   at org.glite.apel.core.db.MySQLImpl.<init>(Unknown Source)
   at org.glite.apel.core.processor.DBProcessor.<init>(Unknown Source)
   at org.glite.apel.publisher.ApelPublisher.run(Unknown Source)
   at org.glite.apel.publisher.ApelPublisher.main(Unknown Source)

A. There is a bug in the script /usr/bin/apel-publisher. The variable MYSQL_DRIVER_CP should be edited to read

MYSQL_DRIVER_CP="/usr/share/java/mysql-connector-java.jar"

Q. Out of Memory exception when running the publisher

A. The following can be checked/modified:

  • Make sure you have the latest version of glite-apel-publisher and glite-apel-core installed.
  • Please lower the value of Limit in the publisher configuration file (/etc/glite-apel-publisher/publisher-config-yaim.xml).
  • The memory assigned to the APEL Publisher (-Xmx1024m) can be manually modified in the APEL Publisher script (/usr/bin/apel-publisher). Please note two lines need to be modified.

Q. SQLException: Can't open file: 'tableName.MYI'. (errno: 145)

A. One of the MySQL tables in the accounting database is corrupted.

mysql> select count(*) from tableName;
 ERROR 1016: Can't open file: 'tableName.MYI'. (errno: 145)

This can be fixed by repairing the table. Please note that depending on the size of the table, this process might take a long time.

mysql> REPAIR TABLE tableName;

Q. SecurityUtils Exception while trying to randomise

A. The publisher was not able to encrypt the UserDN field, which is too long for the public key.

Mon Mar 12 15:21:12 UTC 2007: apel-publisher - SecurityUtils Exception while trying to randomise datajava.lang.ArrayIndexOutOfBoundsException: 1
Mon Mar 12 15:21:12 UTC 2007: apel-publisher - program aborted
Mon Mar 12 15:21:12 UTC 2007: apel-publisher - Internal program error - please report bug
java.lang.NullPointerException

This bug has now been fixed. Please make sure you're running the latest version of glite-apel-publisher and glite-apel-core.

Q. SQLException: Can't find file: 'tableName.frm'. (errno:13)

program aborted
org.glite.apel.core.ApelException: java.sql.SQLException: Can't find file: './accounting/BlahdRecords.frm' (errno: 13)

A. This is probably a permissions issue. Please check that the files in the APEL accounting database directory (/var/lib/mysql/accounting/) are owned by the mysql user.
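
One way to find the offending files is to walk the database directory and report anything with a different owner. The sketch below is an illustration only; the function name is hypothetical and it takes a numeric uid, which you would look up for the mysql user on your own host.

```python
import os


def files_not_owned_by(directory, uid):
    """Return the paths under directory whose owner is not the given numeric uid."""
    wrong = []
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            path = os.path.join(root, name)
            # lstat reports the owner of a symlink itself, not of its target.
            if os.lstat(path).st_uid != uid:
                wrong.append(path)
    return sorted(wrong)


# Example use (assumes a local "mysql" account exists):
#   import pwd
#   mysql_uid = pwd.getpwnam("mysql").pw_uid
#   print(files_not_owned_by("/var/lib/mysql/accounting", mysql_uid))
```

An empty list means every file and subdirectory is owned by that uid.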

Q. I have no errors on my publisher log file, but no new data appears in the Accounting Portal

A. Please run the following query in your local database:

mysql> select Max(EventDate) from LcgRecords;

If the date obtained is not current, the accounting data obtained by parsing the log files at your site is not being correctly joined. This can have several causes:

  • Check that the APEL Parser is correctly parsing all the log files. You can check this by either looking into the parser log file (/var/log/apel.log in your CE/LRMS) or running the following queries in your local database:

mysql> select Max(EventDate) from EventRecords;
mysql> select Max(ValidFrom) from BlahdRecords;

If either of these dates is not current, the associated log files are not being correctly parsed (batch system log files for EventRecords, accounting log files for BlahdRecords). Check that the parser configuration (/etc/glite-apel-_batchsystem_/parser-config-yaim.xml) is pointing to the correct directories and that the directories specified contain up-to-date log files. Please also check that the log files have supported filenames (the supported names are listed in the APEL User Guide).

  • Check that the Sitename is correct and the same in all the different configuration files (both in the parser and the publisher).

  • Check that you are publishing CPU benchmark information for your CE(s) in your site BDII. Please check the APEL User Guide to understand what you should be publishing.

Q. I have no errors on my publisher, but my site is publishing a very small number of CPU Hours

A. Check the Sync page for your site (http://goc-accounting.grid-support.ac.uk/rss/_yoursitename__Sync.html). If there are no gaps in the data, please check what value of SpecInt you are publishing. You can check this in the Accounting Portal.

Q. User name or password invalid exception

Thu Oct 21 21:05:21 UTC 2010: apel-publisher - program aborted
org.glite.apel.core.ApelException:  org.glite.apel.core.ApelException: javax.jms.JMSException: User name or  password is invalid: No user for client certificate:  *******************************
        at org.glite.apel.publisher.AccountPublisher.<init>(Unknown Source)
        at org.glite.apel.publisher.AccountManager.run(Unknown Source)
        at org.glite.apel.publisher.ApelPublisher.runJoinProcessor(Unknown Source)
        at org.glite.apel.publisher.ApelPublisher.run(Unknown Source)
        at org.glite.apel.publisher.ApelPublisher.main(Unknown Source)
Caused  by: org.glite.apel.core.ApelException: javax.jms.JMSException: User  name or password is invalid: No user for client certificate:  ****************************************
        at org.glite.apel.publisher.AccountPublisher.createActiveMQProducer(Unknown Source)

A. Please check that your emi-apel node is registered in GOCDB with the service "glite-APEL". You must also specify the DN of your host certificate. After you make the change in GOCDB, it takes about an hour to propagate to the APEL broker Access Control List. If you still don't have access after this time, please open a GGUS ticket for APEL.

There is one more thing to check. If your certificate's DN includes an email address, the DN may look like this (whether you read it from the file itself or retrieve it using OpenSSL):

/C=UK/O=eScience/OU=CLRC/L=RAL/CN=raptest.esc.rl.ac.uk/emailAddress=sct-certificates@stfc.ac.uk

The APEL broker expects all of the keys in the DN to be upper case. So you must make sure that you change this to EMAILADDRESS as follows:

/C=UK/O=eScience/OU=CLRC/L=RAL/CN=raptest.esc.rl.ac.uk/EMAILADDRESS=sct-certificates@stfc.ac.uk
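
The key conversion above can be sketched as a small function. This is an illustration, not part of APEL: the function name is hypothetical, and it assumes the slash-separated DN form shown in the example.

```python
def uppercase_dn_keys(dn):
    """Upper-case the key of every KEY=value component in a slash-separated DN."""
    parts = []
    for component in dn.split("/"):
        if "=" in component:
            # Split on the first "=" only, so values containing "=" are kept intact.
            key, value = component.split("=", 1)
            parts.append(key.upper() + "=" + value)
        else:
            # Leading empty component, or a fragment of a value that contained "/".
            parts.append(component)
    return "/".join(parts)


# Example use:
#   print(uppercase_dn_keys("/C=UK/O=eScience/CN=host.example.org/emailAddress=admin@example.org"))
```

Only the keys are changed; the values, including the email address itself, are left exactly as they appear in the certificate.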

Q. Certificate_unknown error

Sat Oct 30 10:50:26 UTC 2010: apel-publisher - program aborted
org.glite.apel.core.ApelException:
 org.glite.apel.core.ApelException: javax.jms.JMSException: Could not
connect to broker URL: ssl://apel-broker.esc.rl.ac.uk:61617. Reason:
javax.net.ssl.SSLHandshakeException: Received fatal alert:
certificate_unknown
        at org.glite.apel.publisher.AccountPublisher.<init>(Unknown Source)
        at org.glite.apel.publisher.AccountManager.run(Unknown Source)
        at org.glite.apel.publisher.ApelPublisher.runJoinProcessor(Unknown Source)
        at org.glite.apel.publisher.ApelPublisher.run(Unknown Source)
        at org.glite.apel.publisher.ApelPublisher.main(Unknown Source)

A. Please check that your certificate allows both client and server authentication. You can check this by looking at the "X509v3 Extended Key Usage" section of your certificate:

openssl x509 -in hostcert.pem -text

If your host certificate lacks either the client or the server usage, please contact your CA to obtain a certificate with both usages.
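
The text output can be long, so a small helper that picks out the usage line may save time. The following is a minimal sketch with hypothetical function names. It assumes the usual OpenSSL text layout, in which the usages appear on the line after "X509v3 Extended Key Usage" as "TLS Web Client Authentication" and "TLS Web Server Authentication". It only looks for those two strings, so a certificate with no Extended Key Usage section at all is reported as having neither.

```python
import subprocess

CLIENT_AUTH = "TLS Web Client Authentication"
SERVER_AUTH = "TLS Web Server Authentication"


def extended_key_usages(cert_text):
    """Return the usages listed on the line after 'X509v3 Extended Key Usage'."""
    lines = cert_text.splitlines()
    for i, line in enumerate(lines):
        if "X509v3 Extended Key Usage" in line and i + 1 < len(lines):
            return [u.strip() for u in lines[i + 1].split(",") if u.strip()]
    return []


def has_client_and_server_auth(cert_text):
    """True only if both client and server authentication are listed."""
    usages = extended_key_usages(cert_text)
    return CLIENT_AUTH in usages and SERVER_AUTH in usages


def read_cert_text(path="hostcert.pem"):
    """Run the openssl command shown above and return its text output."""
    result = subprocess.run(
        ["openssl", "x509", "-in", path, "-noout", "-text"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Example use:
#   print(has_client_and_server_auth(read_cert_text("hostcert.pem")))
```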

Q. The trustAnchors parameter must be non-empty

Fri Oct 29 05:43:10 UTC 2010: apel-publisher - program aborted
org.glite.apel.core.ApelException: org.glite.apel.core.ApelException: javax.jms.JMSException: Could not connect to broker URL:
ssl://apel-broker.esc.rl.ac.uk:61617. Reason: javax.net.ssl.SSLException: java.lang.RuntimeException: Unexpected error:
java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
        at org.glite.apel.publisher.AccountPublisher.<init>(Unknown Source)
        at org.glite.apel.publisher.AccountManager.run(Unknown Source)
        at org.glite.apel.publisher.ApelPublisher.runJoinProcessor(Unknown Source)
        at org.glite.apel.publisher.ApelPublisher.run(Unknown Source)
        at org.glite.apel.publisher.ApelPublisher.main(Unknown Source)

A. Make sure that lcg-CA is installed on your host.

Q. The SpecRecords table is full of duplicated/invalid entries

A. From glite-apel-core-2.0.13-8, the way APEL reads the benchmarking values from the GRIS/GIIS information system has changed. If your SpecRecords table contains many duplicated entries, please follow these steps:

  • Make a backup of the table, then empty it:
mysql> create table SpecRecords_backup like SpecRecords;
mysql> insert into SpecRecords_backup (select * from SpecRecords);
mysql> delete from SpecRecords;

  • Enable the DBProcessor in either the APEL parser or the APEL publisher:

<DBProcessor inspectTables="yes"/>

  • Run the parser or the publisher (whichever you've modified). This will update the schema of the SpecRecords table and obtain the correct information from the GIIS/GRIS.

Q. SocketException when trying to publish

org.glite.apel.core.ApelException: org.glite.apel.core.ApelException: javax.jms.JMSException: Could not connect to broker URL: ssl://apel-broker.esc.rl.ac.uk:61617. Reason: java.net.SocketException: Socket closed
at org.glite.apel.publisher.AccountPublisher.<init>(Unknown Source)
at org.glite.apel.publisher.AccountManager.run(Unknown Source)
at org.glite.apel.publisher.ApelPublisher.runJoinProcessor(Unknown Source)
at org.glite.apel.publisher.ApelPublisher.run(Unknown Source)
...

A. Make sure that lcg-CA is installed on your node.

Q. java.lang.SecurityException in the publisher

java.lang.SecurityException: JCE cannot authenticate the provider BC
        at javax.crypto.Cipher.getInstance(DashoA13*..)
        at javax.crypto.Cipher.getInstance(DashoA13*..)
        at org.glite.apel.core.SecurityUtils.<init>(Unknown Source)
        at org.glite.apel.core.SecurityUtils.<clinit>(Unknown Source)
        at org.glite.apel.publisher.AccountPublisher.publish(Unknown Source)
...

A. Make sure you are using OpenJDK. In the publisher cron job (/etc/cron.d/glite-apel-publisher) check that JAVA_HOME is pointing to the OpenJDK installation.

Q. MysqlDataTruncation in the PBS log parser

org.glite.apel.core.ApelException: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Out of range value for column 'MemoryVirtual' at row x
at org.glite.apel.core.db.MySQLImpl.insertEventRecords(Unknown Source)
...

A. This is a known bug: https://savannah.cern.ch/bugs/?98979. Until a fix is released in EMI, you can work around the problem by substituting the glite-apel-core.jar file attached to this page.

Replace the version in /usr/share/java (on the same machine as the parser) with the one attached to this page. It puts Integer.MAX_VALUE (it's the same in MySQL and Java; 2147483647) in the DB instead of any values bigger than that.
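
The capping behaviour described above amounts to the following. This is an illustration in Python of the idea only, not the code in the patched jar, and the function name is hypothetical.

```python
# Integer.MAX_VALUE in Java, quoted above as the same limit in MySQL.
INT_MAX = 2147483647


def clamp_to_int_max(value):
    """Return value unchanged unless it exceeds INT_MAX, in which case return INT_MAX."""
    return min(value, INT_MAX)
```

A record whose MemoryVirtual value is larger than the limit is therefore stored with the limit itself, so the true value is lost but the insert no longer fails.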


EMI3 APEL Questions

Q. How do I upgrade from EMI2 APEL to EMI3 APEL?

A. Please follow the Client Upgrade plan here. Please note that some sites have reported problems with duplicate records after using the migration script. A fix procedure will be published here soon.

Q. Where do I find the EMI3 APEL rpm packages?

A. You can find the EMI APEL packages in the EMI3 repository here and in the UMD3 repository here. There are 4 packages:

  • apel-client - the software which sends data to the accounting repository. Previously called the APEL Publisher.
  • apel-parser - the log file parsers.
  • apel-server - the software for the central accounting repository or regional APEL servers.
  • apel-lib - a library which all of the above depend on.

In addition, sites will need the EPEL repository enabled.

For the most up-to-date packages, see our GitHub page apel.github.io/apel.

EMI3 APEL Troubleshooting

Q. After upgrading to EMI3 APEL, my Sync record reports an error which republishing will not fix (duplicate record problem)

A. Work in progress. Please create a GGUS ticket. Instructions will follow here.

  • Disable the APEL parsers and APEL client
    • Comment out the cron job which runs the parsers and client
    • The parsers may run on several different machines
    • We don't want the database to be changed during this process
  • Back up the database on the APEL client machine.
  • Note the Sync value discrepancy at:
  • Check the value matches the value returned by the following query:
  • Run the following query to delete the duplicate records:
  • Disable the unloader and ssm message sender in the client.cfg:
  • Run the client
  • Run the following query to extract new sync records and compare with the Sync page above. The values should now match.
    • If they match, then proceed
    • If they are still not correct, contact the APEL Team. If you cannot wait for help, restore the database from the dump, and uncomment the cron jobs for the parsers and client.
  • Enable the unloader and ssm message sender in the client.cfg; configure a gap publish for the affected month(s):
  • Run the client
  • Check with the APEL team that the data has arrived (via GGUS or email) or wait 24-48 hours for the Sync page to update.
  • Configure the client with interval = latest.
  • Restore the parser and client cron jobs.

Topic attachments

  • glite-apel-core.jar (r1, 61.7 K, 2012-11-21 15:02, UnknownUser): Updated apel core jar file to fix PBS parsing error.

Topic revision: r13 - 2014-03-07 - StuartPullinger