S3 logging

Access logs from the Træfik reverse proxy are collected by a side-car process called fluentbit, which pushes them to the Monit Logs infrastructure. There they are filtered and enriched by Logstash running on Monit Marathon. Finally, the logs are pushed to HDFS (/project/monitoring/archive/s3/logs) and to Elasticsearch for storage and visualization.

fluentbit on S3 RadosGWs

Since late April 2022 we have been running fluentbit on the RadosGWs+Træfik frontends, as it is much lighter on memory than Logstash (which we used previously).

fluentbit tails the log files produced by Træfik (both the HTTP access logs and the Træfik daemon logs), adds a few fields of context as metadata, and pushes the records over TLS to the Monit Logs infrastructure at monit-logs-s3.cern.ch:10013/s3.
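
A quick sanity check on one of the frontends, assuming shell access to the node (the fluentbit service name is an assumption and may differ from what the Puppet class deploys):

# check that the Træfik log files fluentbit tails are actually being written
tail -n 5 /var/log/traefik/access.log /var/log/traefik/service.log
# check the fluentbit service (service name assumed)
systemctl status fluent-bit
# verify that the Monit Logs TLS endpoint is reachable from the node
openssl s_client -connect monit-logs-s3.cern.ch:10013 </dev/null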

It is installed via Puppet (example for Gabe) using the shared class fluentbit.pp, which is responsible for installing and configuring the fluentbit service.

fluentbit on the RadosGWs+Træfik frontends is configured to tail two input files: the Træfik access log (/var/log/traefik/access.log) and the Træfik daemon log (/var/log/traefik/service.log). Logs from the access file are tagged traefik.access.* and labelled s3_access; logs from the daemon file are tagged traefik.service.* and labelled s3_daemon. Before being sent to the Monit infrastructure, each message is prepared to define its payload data and metadata (see monit.lua; a hedged example of the resulting record follows the list):

  • producer is s3 (used to build the path on HDFS); it must be whitelisted on the Monit infrastructure;
  • type defines whether the logs are access or daemon logs (also used to build the path on HDFS);
  • index_prefix defines the index for the logs (used by Logstash on Monit Marathon and by Elasticsearch).
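
As a rough illustration only (the authoritative logic lives in monit.lua, and the payload field name below is an assumption), a prepared record could look something like this:

# example record shape only -- field values illustrative, payload field name assumed
cat <<'EOF'
{
  "producer": "s3",
  "type": "access",
  "index_prefix": "ceph_s3_access",
  "data": "<original Træfik access-log line from /var/log/traefik/access.log>"
}
EOF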

Logstash on Monit Marathon

Logstash is the tool that reads the aggregated log stream from Kafka, does most of the transformation and writes to Elasticsearch.

This Logstash process runs in a Docker container on the Monit Marathon cluster (see Applications --> storage --> s3logs-to-es). For debugging purposes, the stdout and stderr of the container are available on monit-spark-master.cern.ch:5050/ -- they are not accessible from the Marathon UI.
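
If the web UI is inconvenient, the task can also be located through the standard Mesos master API (a sketch; it assumes the endpoint is reachable over plain HTTP from inside CERN and that the task name contains s3logs-to-es):

# locate the s3logs-to-es task in the Mesos master state
curl -s 'http://monit-spark-master.cern.ch:5050/master/state' | python -m json.tool | grep -i -A 3 s3logs-to-es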

The Dockerfile, pipeline configuration, etc., are stored in s3logs-to-es.

This Logstash instance:

  • removes the additional fields introduced by the Monit infrastructure (metadata we do not use)
  • parses the original message as a JSON document
  • adds costing information
  • adds geographical information for the client IP (GeoIP)
  • copies a subset of fields relevant for CSIR to a different index
  • ...and pushes the results (full logs and the stripped CSIR version) to Elasticsearch -- see the query sketch after this list
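
A hedged way to inspect the output is to pull one recent document from the access index; the Elasticsearch endpoint below is a placeholder, and the ceph_ro credentials are described in the Elasticsearch section:

# fetch a single enriched access-log document (placeholder endpoint and password)
curl -s -u 'ceph_ro:<password>' 'https://<es-endpoint>/ceph_s3_access*/_search?size=1' | python -m json.tool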

Elasticsearch

At the end of the pipeline, the logs land in our dedicated Elasticsearch instance, managed by the Elasticsearch Service.

There is not much to configure on our side beyond a few useful links and the endpoint configuration repository.

Data is kept for:

  • 10 days on fast SSD storage, local to the ES cluster
  • a further 20 days (30 days in total) on Ceph storage
  • 13 months for CSIR purposes (a stripped-down version with some fields filtered out -- see below)

Indexes on ES must start with ceph_s3; this is the only whitelisted pattern. We currently use the following indexes (a query sketch to list them on the cluster follows the list):

  • ceph_s3_access: Access logs for Gabe (s3.cern.ch)
  • ceph_s3_daemon: Træfik service logs for Gabe
  • ceph_s3_access-csir: Stripped-down version of the Gabe access logs for CSIR, retained for 13 months
  • ceph_s3_fr_access: Access logs for Nethub (s3-fr-prevessin-1.cern.ch)
  • ceph_s3_fr_daemon: Træfik service logs for Nethub
  • ceph_s3_fr_access-csir: Stripped-down version of the Nethub access logs for CSIR, retained for 13 months
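
To see which of these indexes exist and how large they are, a hedged query against the cluster (placeholder endpoint; ceph_ro credentials from Teigi, see below):

# list all ceph_s3* indexes with document counts and on-disk sizes
curl -s -u 'ceph_ro:<password>' 'https://<es-endpoint>/_cat/indices/ceph_s3*?v&s=index'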

ES is also a data source for Monit Grafana dashboards:

  • Grafana uses basic auth to ES with the user ceph_ro:<password> (the password is stored in Teigi under ceph/gabe/es-ceph_ro-password)
  • ES must have the internal user ceph_ro configured with permission to read the ceph* indexes (a minimal check follows)
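
A minimal check (same placeholder endpoint as above) that the ceph_ro user can actually read the indexes Grafana queries:

# should return a document count rather than an authorization error
curl -s -u 'ceph_ro:<password>' 'https://<es-endpoint>/ceph_s3_access/_count'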

HDFS

HDFS is used solely as a storage backend keeping the logs for 13 months for CSIR purposes. As of July 2021, HDFS stores the full logs (to be verified that they do not consume too much space on HDFS; a usage check is sketched after the commands below). To check or read logs on HDFS you must have access to the HDFS cluster (see prerequisites); then, from lxplus:

# set up the LCG software environment with the Hadoop client tools
source /cvmfs/sft.cern.ch/lcg/views/LCG_99/x86_64-centos7-gcc8-opt/setup.sh
# point the Hadoop configuration at the analytix cluster
source /cvmfs/sft.cern.ch/lcg/etc/hadoop-confext/hadoop-swan-setconf.sh analytix 3.2 spark3
# obtain a Kerberos ticket
kinit
# list the archived S3 logs
hdfs dfs -ls /project/monitoring/archive/s3/logs
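
To follow up on the space question above, a hedged check of how much the archived logs currently occupy:

# summarise the space used by the archived S3 logs (human-readable sizes)
hdfs dfs -du -s -h /project/monitoring/archive/s3/logs
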
Improve me !