Using the kubernetes DAQ Services cluster in NP04

This is the OLD cluster, no longer used as of 4.2

Note that the web proxy may interfere with access to the Kubernetes API, so make sure it is not active in your session.

Log into np04-srv-014

Set up your kubernetes environment

As yourself:

export KUBECONFIG=/nfs/home/np04daq/.kube/config.014 
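The two preparation steps above (no proxy, KUBECONFIG pointing at the shared config) could be wrapped in a small shell function. This is only a sketch: the name setup_kube_env is made up for illustration, and it assumes the web proxy is configured through the usual http_proxy/https_proxy environment variables, which this page does not confirm.

```shell
# Hypothetical helper, not part of the cluster tooling.
# Assumption: the web proxy is set through the standard proxy variables.
setup_kube_env() {
    # Drop proxy settings for this session only.
    unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
    # Default to the config file named on this page; an argument overrides it.
    KUBECONFIG="${1:-/nfs/home/np04daq/.kube/config.014}"
    export KUBECONFIG
    if [ ! -r "$KUBECONFIG" ]; then
        echo "warning: cannot read $KUBECONFIG" >&2
    fi
}
```

After sourcing the function, run setup_kube_env once per login session before calling kubectl.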

Try kubectl

$ kubectl get nodes

NAME           STATUS   ROLES                  AGE   VERSION
np04-srv-014   Ready    control-plane,master   27d   v1.23.4
np04-srv-023   Ready    <none>                 27d   v1.23.4
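To check the node listing without reading it by eye, something like the following could be used. It is a minimal sketch: not_ready_nodes is a hypothetical name, and the check relies only on the STATUS column shown above.

```shell
# Hypothetical helper: print the names of nodes whose STATUS is not exactly "Ready".
# A cordoned node shows "Ready,SchedulingDisabled" and is therefore listed too.
not_ready_nodes() {
    kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1 }'
}
```

An empty result means both np04-srv-014 and np04-srv-023 report Ready.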

DQM also runs here. There is not yet a DNS alias.

Boot Basic Services

sh /scratch/kube/boot_basic_services.sh

This will apply the manifests currently being used here.

Manifests

Manifests (yaml files) are in /scratch/kube/manifests, which is a symlink to the manifests in https://github.com/DUNE-DAQ/pocket/

/manifests/opmon (InfluxDB is in here)
/manifests/dunedaqers (ERS Postgres and Kafka manifests are both here and need to be reorganized)

Print Credentials and ports for running services

sh /scratch/kube/print-creds.sh

Available services:
Kafka
   address (in-cluster): kafka-svc.kafka-kraft:9092
   address (out-cluster): 10.73.136.34:30092
ERS Postgres
   address (in-cluster): postgres-svc.ers:5432
   User: admin   Password: xxxxx
   ASP Password: Password=xxxxx
Kubernetes dashboard
   URL (in-cluster): http://kubernetes-dashboard.kubernetes-dashboard
   URL (out-cluster): http://10.73.136.34:31001
   Password: none. click 'skip' in login window

Examine Pod (Kafka)

$ kubectl -n kafka-kraft describe pod kafka-0

Examine Pod (DQM)

List pods in dqm namespace

[boking@np04-srv-014 ~]$ kubectl -n dqm get pods
NAME                         READY   STATUS    RESTARTS   AGE
dqmbackend-c478bc775-vjzf7   1/1     Running   0          87m
dqmredis-5958646bd6-jfps4    1/1     Running   0          87m

Get detailed pod info (substitute the pod names from the listing above)

$ kubectl -n dqm describe pod dqmbackend-c478bc775-vjzf7

$ kubectl -n dqm describe pod dqmredis-5958646bd6-jfps4

get pod logs

$ kubectl -n dqm logs deploy/dqmbackend

Get TTY on a pod

$ kubectl -n dqm exec -it deploy/dqmbackend -- bash

Update/Start/Stop Pod

$ kubectl apply -f manifests/dunedaqers/kafka.yaml

$ kubectl delete -f manifests/dunedaqers/kafka.yaml
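Before applying a manifest it can be useful to see what would change. The sketch below uses kubectl diff, which exits with 0 when there are no differences, 1 when there are differences, and a larger value on error. The function name apply_if_changed is hypothetical and not part of the scripts in /scratch/kube.

```shell
# Hypothetical helper: show the diff for a manifest and apply it only if something changed.
apply_if_changed() {
    manifest="$1"
    rc=0
    kubectl diff -f "$manifest" || rc=$?
    if [ "$rc" -gt 1 ]; then
        # kubectl or the diff program failed; do not apply anything.
        echo "kubectl diff failed for $manifest (exit $rc)" >&2
        return "$rc"
    fi
    if [ "$rc" -eq 0 ]; then
        echo "no changes in $manifest"
        return 0
    fi
    kubectl apply -f "$manifest"
}
```

For example, apply_if_changed manifests/dunedaqers/kafka.yaml prints the pending changes and then applies them.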

useful commands

$ kubectl get namespaces

$ kubectl get pods --all-namespaces

$ kubectl -n kafka-kraft logs kafka-0

$ kubectl -n ers get pods

$ kubectl -n ers logs postgresql-<uuid>
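The commands above can be combined into a quick health check across all namespaces. This is a sketch only: unhealthy_pods is a hypothetical name, and it assumes the default column layout of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: list pods whose STATUS is neither Running nor Completed.
# Output format: <namespace>/<pod> <status>
unhealthy_pods() {
    kubectl get pods --all-namespaces --no-headers |
        awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}
```

Any pod it prints is a candidate for kubectl describe pod and kubectl logs in the corresponding namespace.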

Accessing the Control Plane UI

Kubernetes Dashboard: http://np04-srv-014.cern.ch:31001/

Shutting down the cluster

Drain the worker node (023)

export KUBECONFIG=/nfs/home/np04daq/.kube/config.014 
kubectl drain np04-srv-023
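To confirm that the drain has finished, the pods still scheduled on the node can be counted. The following is a sketch with a hypothetical function name; it uses the spec.nodeName field selector. Pods managed by a DaemonSet are not evicted by a drain, so the count may not reach zero. If kubectl drain stops because of such pods, it has an --ignore-daemonsets flag; whether that is appropriate for this cluster is not stated on this page.

```shell
# Hypothetical helper: count the pods still scheduled on a node.
pods_left_on_node() {
    node="$1"
    kubectl get pods --all-namespaces --no-headers \
        --field-selector "spec.nodeName=$node" | awk 'END { print NR }'
}
```

For example, pods_left_on_node np04-srv-023 can be rerun until the number stops decreasing.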

once all pods have been drained, shut down kubelet and containerd on worker node (023)

$ systemctl stop kubelet

$ systemctl stop containerd

shut down kubelet and containerd on the control-plane node (014)

$ systemctl stop kubelet

$ systemctl stop containerd

reboot nodes

if kubelet and containerd start properly, uncordon nodes

export KUBECONFIG=/nfs/home/np04daq/.kube/config.014 
kubectl uncordon np04-srv-023

if kubelet won't start

This may be caused by a timeout while containerd cleans up. A shortcut is to move /scratch/containerd aside, restart containerd, and start kubelet once containerd is up.

cd /scratch
mv containerd containerd.foo
systemctl restart containerd
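The whole workaround, including the wait for containerd and the kubelet start, might be scripted roughly as follows. This is a sketch under assumptions: the function name, the timestamped directory name, and the retry limits are all made up here, and it must be run as root on the affected node.

```shell
# Hypothetical helper: move the containerd state directory aside, restart containerd,
# wait until it is active, then start kubelet.
recover_containerd() {
    base="${1:-/scratch}"
    # A timestamped name avoids clobbering an earlier copy.
    aside="$base/containerd.$(date +%Y%m%d-%H%M%S)"
    mv "$base/containerd" "$aside" || return 1
    systemctl restart containerd || return 1
    tries=0
    until systemctl is-active --quiet containerd; do
        tries=$((tries + 1))
        if [ "$tries" -ge 30 ]; then
            echo "containerd did not become active" >&2
            return 1
        fi
        sleep 2
    done
    systemctl start kubelet
}
```

The directory that was moved aside is left in place so it can be inspected or removed later.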

-- BonnieKing - 2022-03-23

Topic revision: r11 - 2023-10-26 - BonnieKing