Using the Kubernetes DAQ Services cluster in NP04
This is the OLD cluster, no longer used as of 4.2
Note that the web proxy may interfere with access to the Kubernetes API, so it should not be active in your session.
Log into np04-srv-014.
Set up your Kubernetes environment
As yourself:
export KUBECONFIG=/nfs/home/np04daq/.kube/config.014
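The two session steps above (no web proxy, KUBECONFIG pointing at the cluster) can be wrapped in a small shell function. This is a minimal sketch: the function name k8s_np04_env is hypothetical, and it assumes the web proxy is configured through the standard http_proxy/https_proxy environment variables, which may not be how your session sets it.

```shell
# Hypothetical helper: prepare the current shell for the NP04 DAQ Services cluster.
# ASSUMPTION: the web proxy is set via the standard proxy environment variables.
k8s_np04_env() {
    # Drop proxy settings so kubectl talks to the API server directly
    unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
    # Cluster configuration shared under the np04daq account
    export KUBECONFIG=/nfs/home/np04daq/.kube/config.014
}
```

Define it in your shell (for example in ~/.bashrc) and call k8s_np04_env before running kubectl; because it is a function, the changes apply to the current shell.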
Try kubectl:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
np04-srv-014 Ready control-plane,master 27d v1.23.4
np04-srv-023 Ready <none> 27d v1.23.4
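To spot a problem node quickly, the kubectl get nodes output above can be filtered for anything whose STATUS is not plain Ready. A sketch; the helper name nodes_not_ready is hypothetical.

```shell
# Hypothetical helper: print nodes whose STATUS is anything other than plain
# "Ready" (e.g. NotReady, or Ready,SchedulingDisabled for a cordoned node).
# Reads `kubectl get nodes` output on stdin and skips the header line.
nodes_not_ready() {
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}
```

For example, kubectl get nodes | nodes_not_ready prints nothing when both nodes are healthy and schedulable; a cordoned node is listed because its STATUS becomes Ready,SchedulingDisabled.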
DQM also runs here. There is not yet a DNS alias.
Boot Basic Services
$ sh /scratch/kube/boot_basic_services.sh
This will apply the manifests currently being used here.
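After boot_basic_services.sh has applied the manifests, it can help to wait until the pods are actually Ready before using the services. A sketch using kubectl wait; the helper name is hypothetical, KUBECTL is only an override hook for testing, and the namespaces to pass are an assumption (this page mentions kafka-kraft and ers, but does not list exactly what the boot script starts).

```shell
# Hypothetical helper: wait until every pod in the given namespaces reports
# the Ready condition, or the timeout (WAIT_TIMEOUT, default 180s) expires.
# KUBECTL can be overridden for testing; it defaults to kubectl.
wait_for_namespaces() {
    local timeout=${WAIT_TIMEOUT:-180s} ns rc=0
    for ns in "$@"; do
        echo "waiting for pods in namespace $ns"
        ${KUBECTL:-kubectl} -n "$ns" wait --for=condition=Ready pod --all --timeout="$timeout" || rc=1
    done
    # Non-zero if any namespace failed or timed out
    return "$rc"
}
```

For example, wait_for_namespaces kafka-kraft ers. kubectl wait also fails if a namespace has no pods at all, which is usually worth knowing right after a boot.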
Manifests
Manifests (YAML files) are in /scratch/kube/manifests, which is a symlink to the manifests in
https://github.com/DUNE-DAQ/pocket/
/manifests/opmon (InfluxDB is in here)
/manifests/dunedaqers (the ERS Postgres and Kafka manifests are both here and need to be reorganized)
Print credentials and ports for running services
sh /scratch/kube/print-creds.sh
Available services:
Kafka
address (in-cluster): kafka-svc.kafka-kraft:9092
address (out-cluster): 10.73.136.34:30092
ERS Postgres
address (in-cluster): postgres-svc.ers:5432
User: admin Password: xxxxx
ASP Password: Password=xxxxx
Kubernetes dashboard
URL (in-cluster): http://kubernetes-dashboard.kubernetes-dashboard
URL (out-cluster): http://10.73.136.34:31001
Password: none. Click 'Skip' in the login window.
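Since the web proxy can get in the way, a quick TCP check of the out-cluster addresses listed above can tell a network problem from a service problem. A sketch, assuming bash (it relies on bash's /dev/tcp redirection) and the coreutils timeout command; check_endpoint is a hypothetical name.

```shell
# Hypothetical helper: check that an out-of-cluster address (host:port, as
# printed by print-creds.sh) accepts TCP connections within 3 seconds.
# Requires bash for the /dev/tcp redirection.
check_endpoint() {
    local addr=$1 host port
    host=${addr%:*}
    port=${addr##*:}
    # Reject input without a colon or with an empty host or port
    if [ -z "$host" ] || [ -z "$port" ] || [ "$host" = "$addr" ]; then
        echo "usage: check_endpoint host:port" >&2
        return 2
    fi
    if timeout 3 bash -c ": < /dev/tcp/$host/$port" 2>/dev/null; then
        echo "$addr reachable"
    else
        echo "$addr NOT reachable"
        return 1
    fi
}
```

For example, check_endpoint 10.73.136.34:30092 for Kafka or check_endpoint 10.73.136.34:31001 for the dashboard. A successful connection only shows that something is listening on the port, not that the service behind it is healthy.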
Examine Pod (Kafka)
$ kubectl -n kafka-kraft describe pod kafka-0
Examine Pod (DQM)
List pods in the dqm namespace
[boking@np04-srv-014 ~]$ kubectl -n dqm get pods
NAME READY STATUS RESTARTS AGE
dqmbackend-c478bc775-vjzf7 1/1 Running 0 87m
dqmredis-5958646bd6-jfps4 1/1 Running 0 87m
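get detailed info (this describes the deployment; for an individual pod, run kubectl -n dqm describe pod <pod-name> with a name from the list above)
$ kubectl -n dqm describe deploy/dqmbackend
$ kubectl -n dqm describe deploy/dqmredis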
get pod logs
$ kubectl -n dqm logs deploy/dqmbackend
Get a TTY on a pod
$ kubectl -n dqm exec -it deploy/dqmbackend -- bash
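With more pods than the two in the listing above, the kubectl get pods output can be filtered for pods that are not fully ready or not Running. A sketch; pods_needing_attention is a hypothetical helper name, and it expects single-namespace output (with --all-namespaces the columns shift by one).

```shell
# Hypothetical helper: list pods that are not fully ready (READY x/y with x != y)
# or not in the Running state. Reads `kubectl get pods` output on stdin and
# skips the header line.
pods_needing_attention() {
    awk 'NR > 1 {
        split($2, r, "/")              # READY column, e.g. "1/1"
        if (r[1] != r[2] || $3 != "Running") print $1
    }'
}
```

kubectl -n dqm get pods | pods_needing_attention prints nothing when everything is up. Pods from completed Jobs (STATUS Completed) would also be listed, so treat the output as a starting point for describe and logs.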
Update/Start/Stop Pod
$ kubectl apply -f /scratch/kube/manifests/dunedaqers/kafka.yaml
$ kubectl delete -f /scratch/kube/manifests/dunedaqers/kafka.yaml
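Before re-applying an edited manifest, kubectl diff can show what would change in the running cluster. A sketch of a preview-then-apply wrapper; the helper name is hypothetical, and KUBECTL is only an override hook for testing.

```shell
# Hypothetical helper: show what would change, then apply the manifest.
# `kubectl diff` exits 0 when there is no difference, 1 when there is one,
# and >1 on error. KUBECTL can be overridden for testing.
apply_with_preview() {
    local manifest=$1 rc=0
    ${KUBECTL:-kubectl} diff -f "$manifest" || rc=$?
    case $rc in
        0) echo "no changes in $manifest"; return 0 ;;
        1) ${KUBECTL:-kubectl} apply -f "$manifest" ;;
        *) echo "kubectl diff failed for $manifest" >&2; return "$rc" ;;
    esac
}
```

For example, apply_with_preview /scratch/kube/manifests/dunedaqers/kafka.yaml. The wrapper applies nothing when the diff is empty or when kubectl diff itself fails.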
useful commands
$ kubectl get namespaces
$ kubectl get pods --all-namespaces
$ kubectl -n kafka-kraft logs kafka-0
$ kubectl -n ers get pods
$ kubectl -n ers logs postgresql-<uuid>
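Along the lines of the kubectl logs commands above, the logs of every pod in a namespace can be saved to files, for example before restarting a service. A sketch; save_namespace_logs is a hypothetical name, and KUBECTL is only an override hook for testing.

```shell
# Hypothetical helper: save the current logs of every pod in a namespace
# to <outdir>/<pod>.log. KUBECTL can be overridden for testing.
save_namespace_logs() {
    local ns=$1 outdir=$2 pod
    mkdir -p "$outdir" || return 1
    for pod in $(${KUBECTL:-kubectl} -n "$ns" get pods -o name); do
        # `-o name` prints entries like pod/kafka-0; strip the prefix for the file name
        ${KUBECTL:-kubectl} -n "$ns" logs "$pod" --all-containers=true > "$outdir/${pod#pod/}.log"
    done
}
```

For example, save_namespace_logs kafka-kraft /tmp/kafka-logs. The --all-containers=true flag avoids having to name a container in multi-container pods.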
Accessing the Control Plane UI
Kubernetes Dashboard:
http://np04-srv-014.cern.ch:31001/
Shutting down the cluster
Drain the worker node (023)
export KUBECONFIG=/nfs/home/np04daq/.kube/config.014
kubectl drain np04-srv-023
Once all pods have been drained, shut down kubelet and containerd on the worker node (023), as root:
$ systemctl stop kubelet
$ systemctl stop containerd
Shut down kubelet and containerd on the control-plane node (014), as root:
$ systemctl stop kubelet
$ systemctl stop containerd
Reboot the nodes.
If kubelet and containerd start properly, uncordon the worker node:
export KUBECONFIG=/nfs/home/np04daq/.kube/config.014
kubectl uncordon np04-srv-023
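The drain and uncordon steps above can be wrapped as follows. This sketch adds --ignore-daemonsets and --delete-emptydir-data to the drain, because kubectl drain normally refuses to proceed when DaemonSet-managed pods (for example kube-proxy or a network plugin, if present) or pods using emptyDir volumes are on the node. Whether these flags are needed on this cluster is an assumption, and evicting emptyDir pods discards their emptyDir data. The helper names are hypothetical.

```shell
# Hypothetical helpers for the worker-node maintenance steps.
# ASSUMPTION: DaemonSet pods and/or emptyDir pods run on the worker node.
# KUBECTL can be overridden for testing.
drain_worker() {
    ${KUBECTL:-kubectl} drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# After the reboot: wait for the node to report Ready, then make it schedulable again.
restore_worker() {
    ${KUBECTL:-kubectl} wait --for=condition=Ready "node/$1" --timeout=300s &&
        ${KUBECTL:-kubectl} uncordon "$1"
}
```

Run drain_worker np04-srv-023 before stopping the services, and restore_worker np04-srv-023 after the reboot; if kubectl wait times out, uncordon is skipped.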
If kubelet won't start
It may be a timeout issue cleaning up containerd. The shortcut (as root) is to move /scratch/containerd aside, restart containerd and, once it comes up, start kubelet.
cd /scratch
mv containerd containerd.foo
systemctl restart containerd
systemctl start kubelet
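The recovery shortcut above, written out as a function to run as root on the affected node. A sketch: the function name is hypothetical, the wait of about 60 seconds for containerd is an arbitrary choice, and it renames the old directory with a timestamp suffix rather than a fixed name so that repeated runs do not collide. SCRATCH and SYSTEMCTL are only override hooks for testing.

```shell
# Hypothetical helper for the containerd recovery shortcut (run as root).
# SCRATCH and SYSTEMCTL can be overridden for testing.
recover_containerd() {
    local scratch=${SCRATCH:-/scratch} sc=${SYSTEMCTL:-systemctl}
    local aside="$scratch/containerd.$(date +%Y%m%d%H%M%S)"
    # Move the old containerd state aside instead of deleting it
    mv "$scratch/containerd" "$aside" || return 1
    $sc restart containerd || return 1
    # Poll for up to about 60 s until containerd is active, then start kubelet
    local i
    for i in $(seq 1 30); do
        if $sc is-active --quiet containerd; then
            $sc start kubelet
            return
        fi
        sleep 2
    done
    echo "containerd did not become active; kubelet not started" >&2
    return 1
}
```

If the node recovers, the moved-aside directory can be inspected or removed later.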
--
BonnieKing - 2022-03-23