Last mod: 2026.03.05
Kafka on Kubernetes
Prerequisites
We need a Kubernetes cluster configured according to the instructions described here. In the examples, the hosts are: 192.168.3.22 (msi), 192.168.3.23 (hp), and 192.168.3.24 (x510). Other hosts can of course be used, but the scripts must then be adjusted accordingly.
Download
Download package kafka-k8s.tar.gz and unpack:
wget https://dziak.tech/content/DevOps/kafka_on_kubernetes/downloads/kafka-k8s.tar.gz
tar -xzvf kafka-k8s.tar.gz
Cluster topology
| Role | Name | IP | CPU | RAM | Disk |
|---|---|---|---|---|---|
| Worker node | msi | 192.168.3.22 | i7 | 32 GB | SSD 250 GB |
| Worker node | hp | 192.168.3.23 | i5 | 32 GB | SSD 250 GB |
| Control plane | x510 | 192.168.3.24 | i5 | 8 GB | SSD 120 GB + HDD 900 GB (NFS) |
Shared NFS directory: /mnt/nfs-k8s exported by x510.
Node preparation (each sub-step below names the node(s) it applies to)
1. NFS Configuration on the control-plane (x510)
# On x510 – NFS server installation
sudo apt install -y nfs-kernel-server
# Add export
echo "/mnt/nfs-k8s 192.168.3.0/24(rw,sync,no_subtree_check,no_root_squash)" \
| sudo tee -a /etc/exports
sudo mkdir -p /mnt/nfs-k8s
sudo chmod 777 /mnt/nfs-k8s
sudo exportfs -ra
sudo systemctl enable --now nfs-kernel-server
2. NFS client on worker nodes (hp, msi)
# On hp and msi
sudo apt install -y nfs-common
sudo mkdir -p /mnt/nfs-k8s
# Mount test (optional)
sudo mount -t nfs 192.168.3.24:/mnt/nfs-k8s /mnt/nfs-k8s
df -h /mnt/nfs-k8s
sudo umount /mnt/nfs-k8s
3. Node labels (on the control-plane, using kubectl)
kubectl label node x510 kubernetes.io/hostname=x510
kubectl label node hp kubernetes.io/hostname=hp
kubectl label node msi kubernetes.io/hostname=msi
4. Control-plane taint (optional – kubeadm sets it by default)
Strimzi and the other workloads are deployed to hp and msi.
The Kafka controller can run on x510; if the control-plane carries a NoSchedule taint,
either remove it temporarily for the installation or restrict the controller to the worker nodes.
# Check taints
kubectl describe node x510 | grep Taint
# Remove taint if needed (for the Kafka controller on x510):
kubectl taint nodes x510 node-role.kubernetes.io/control-plane:NoSchedule-
Component installation
Step 1 – Namespaces
kubectl apply -f 00-namespaces/namespaces.yaml
Step 2 – NFS StorageClass and Provisioner
kubectl apply -f 01-nfs/nfs-provisioner.yaml
# Verification
kubectl get pods -n kube-system | grep nfs
kubectl get storageclass
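The manifest in the tarball is authoritative; if you need to recreate the StorageClass portion of 01-nfs/nfs-provisioner.yaml, it might look like the sketch below. The provisioner string and class name assume the community nfs-subdir-external-provisioner and are not taken from the package:

```yaml
# Sketch only – assumes nfs-subdir-external-provisioner; match names to the tarball.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
  archiveOnDelete: "false"   # discard PV data when the PVC is deleted
reclaimPolicy: Delete
```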
Step 3 – Strimzi Operator
Download and install the Strimzi Cluster Operator. Kafka 4.x needs a recent release – 0.46.0 is the first to support Kafka 4.0; check the supported-versions table in the Strimzi release notes:
STRIMZI_VER="0.46.0"
kubectl create -f \
"https://github.com/strimzi/strimzi-kafka-operator/releases/download/${STRIMZI_VER}/strimzi-cluster-operator-${STRIMZI_VER}.yaml" \
-n kafka
# or a shorter alias:
kubectl apply -f "https://strimzi.io/install/latest?namespace=kafka" -n kafka
# Wait for readiness
kubectl rollout status deployment/strimzi-cluster-operator -n kafka --timeout=180s
kubectl get pods -n kafka
Step 4 – Kafka 4.x with KRaft (ZooKeeper-less)
# ConfigMap with JMX Metrics
kubectl apply -f 03-kafka/kafka-metrics-configmap.yaml
# Kafka Cluster (KafkaNodePool + Kafka CR)
kubectl apply -f 03-kafka/kafka-cluster.yaml
# Monitor status – this may take 3–5 minutes
kubectl get kafka -n kafka -w
kubectl get kafkanodepool -n kafka
kubectl get pods -n kafka -w
# Ready when:
kubectl wait kafka/kafka-cluster --for=condition=Ready --timeout=600s -n kafka
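The kafka-cluster.yaml applied in this step pairs KafkaNodePool resources with a KRaft-mode Kafka CR. The tarball's file is authoritative; the sketch below shows the expected shape, with the cluster and pool names taken from the kubectl commands in this guide, while storage sizes, the StorageClass name, and listener details are assumptions:

```yaml
# Sketch only – 03-kafka/kafka-cluster.yaml from the tarball is authoritative.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controller
  namespace: kafka
  labels:
    strimzi.io/cluster: kafka-cluster
spec:
  replicas: 1
  roles: [controller]
  storage:
    type: persistent-claim
    size: 10Gi            # assumption
    class: nfs-client     # assumed StorageClass name
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: broker
  namespace: kafka
  labels:
    strimzi.io/cluster: kafka-cluster
spec:
  replicas: 2
  roles: [broker]
  storage:
    type: persistent-claim
    size: 50Gi            # assumption
    class: nfs-client
---
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-cluster
  namespace: kafka
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  kafka:
    version: 4.0.0
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: external
        port: 9094
        type: nodeport
        tls: false
        configuration:
          bootstrap:
            nodePort: 32100   # external address used later in this guide
  entityOperator:
    topicOperator: {}
```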
Step 5 – Kafka Topics
kubectl apply -f 03-kafka/kafka-topics.yaml
kubectl get kafkatopic -n kafka
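The Topic Operator reconciles each KafkaTopic CR into a real Kafka topic. The tarball's 03-kafka/kafka-topics.yaml is authoritative; a sketch of one entry, with the topic name taken from the producer/consumer test later in this guide and the partition/replica counts and retention as assumptions:

```yaml
# Sketch only – partition, replica, and retention values are assumptions.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: test-topic
  namespace: kafka
  labels:
    strimzi.io/cluster: kafka-cluster   # must match the Kafka CR name
spec:
  partitions: 3
  replicas: 2
  config:
    retention.ms: "604800000"   # 7 days
```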
Step 6 – Prometheus
kubectl apply -f 04-prometheus/prometheus.yaml
# Verification
kubectl rollout status deployment/prometheus -n monitoring
kubectl get svc prometheus -n monitoring
# Access: http://192.168.3.22:30090
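Prometheus scrapes the broker JMX exporter on port 9404 (see the data-flow diagram later in this guide). The tarball's 04-prometheus/prometheus.yaml is authoritative; if you need to recreate the scrape job, a minimal pod-discovery version might look like this (job name and relabeling are assumptions, though the meta-label spelling follows the standard Kubernetes SD convention of replacing dots and slashes with underscores):

```yaml
# Sketch of a scrape job for the Strimzi metrics port (9404).
scrape_configs:
  - job_name: kafka
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [kafka]
    relabel_configs:
      # Keep only pods managed by the Strimzi Kafka CR ...
      - source_labels: [__meta_kubernetes_pod_label_strimzi_io_kind]
        regex: Kafka
        action: keep
      # ... and only their metrics port
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: "9404"
        action: keep
```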
Optional (if you have Prometheus Operator / kube-prometheus-stack):
kubectl apply -f 04-prometheus/strimzi-podmonitor.yaml
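A PodMonitor requires the Prometheus Operator CRDs. The tarball's strimzi-podmonitor.yaml is authoritative; the sketch below follows the shape of the Strimzi metrics examples (the metrics port is named tcp-prometheus there), with the resource name and namespace as assumptions:

```yaml
# Sketch only – requires Prometheus Operator; selector follows Strimzi examples.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-resources-metrics
  namespace: monitoring
spec:
  selector:
    matchExpressions:
      - key: strimzi.io/kind
        operator: In
        values: [Kafka]
  namespaceSelector:
    matchNames: [kafka]
  podMetricsEndpoints:
    - path: /metrics
      port: tcp-prometheus   # metrics port name used by Strimzi pods
```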
Step 7 – Grafana
kubectl apply -f 05-grafana/grafana.yaml
kubectl rollout status deployment/grafana -n monitoring
# Access: http://192.168.3.22:30300
# Login: admin / kafka-admin-2024
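For the dashboards to render, Grafana needs Prometheus configured as a datasource. If 05-grafana/grafana.yaml does not already provision one, a provisioning fragment (mounted under /etc/grafana/provisioning/datasources/) looks like this; the service URL is an assumption based on the prometheus Service in the monitoring namespace:

```yaml
# Grafana datasource provisioning sketch – URL and port are assumptions.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus.monitoring.svc:9090
    isDefault: true
```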
Import Kafka dashboards for Strimzi:
In Grafana → Dashboards → Import → paste the ID or URL:
- Kafka Overview:
  https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-kafka.json
- KRaft / ZooKeeper:
  https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-kraft.json
- Kafka Operator:
  https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-operators.json
Or via kubectl:
# Download Strimzi dashboards
curl -sL \
https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-kafka.json \
-o /tmp/strimzi-kafka.json
kubectl create configmap grafana-kafka-dashboard \
--from-file=strimzi-kafka.json=/tmp/strimzi-kafka.json \
-n monitoring
Step 8 – OpenTelemetry collector
kubectl apply -f 06-otel/otel-collector.yaml
kubectl rollout status deployment/otel-collector -n otel
# Verification
kubectl get pods -n otel
kubectl logs -n otel -l app=otel-collector --tail=20
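The collector bridges OTLP ingest and the Kafka cluster, as in the data-flow diagram later in this guide. The tarball's 06-otel/otel-collector.yaml is authoritative; the embedded collector config might have roughly this shape (the topic name and exporter details are assumptions based on the contrib Kafka exporter):

```yaml
# Sketch of a collector config: OTLP in, Kafka out. Names are assumptions.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
exporters:
  kafka:
    brokers: ["kafka-cluster-kafka-bootstrap.kafka:9092"]
    topic: otlp-traces        # assumed topic name
    encoding: otlp_proto
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [kafka]
```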
Post-installation verification
Status of all components
kubectl get pods -n kafka
kubectl get pods -n monitoring
kubectl get pods -n otel
kubectl get pvc -A
Kafka test
# Producer (send a test message)
kubectl run kafka-producer -it --rm \
--image=quay.io/strimzi/kafka:latest-kafka-4.0.0 \
--restart=Never \
-n kafka \
-- bin/kafka-console-producer.sh \
--bootstrap-server kafka-cluster-kafka-bootstrap:9092 \
--topic test-topic
# Consumer (receive message)
kubectl run kafka-consumer -it --rm \
--image=quay.io/strimzi/kafka:latest-kafka-4.0.0 \
--restart=Never \
-n kafka \
-- bin/kafka-console-consumer.sh \
--bootstrap-server kafka-cluster-kafka-bootstrap:9092 \
--topic test-topic \
--from-beginning
# Information about the KRaft Cluster
kubectl run kafka-admin -it --rm \
--image=quay.io/strimzi/kafka:latest-kafka-4.0.0 \
--restart=Never \
-n kafka \
-- bin/kafka-metadata-quorum.sh \
--bootstrap-server kafka-cluster-kafka-bootstrap:9092 \
describe --status
OTel test – sending a sample trace
# From any node in the 192.168.3.x network
curl -X POST http://192.168.3.22:30318/v1/traces \
-H "Content-Type: application/json" \
-d '{
"resourceSpans": [{
"resource": {"attributes": [{"key":"service.name","value":{"stringValue":"test-service"}}]},
"scopeSpans": [{
"spans": [{
"traceId":"0102030405060708090a0b0c0d0e0f10",
"spanId":"0102030405060708",
"name":"test-span",
"startTimeUnixNano":"1700000000000000000",
"endTimeUnixNano":"1700000001000000000"
}]
}]
}]
}'
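OTLP/JSON expects traceId as 16 bytes (32 hex characters) and spanId as 8 bytes (16 hex characters), hex-encoded with no 0x prefix. Rather than reusing the fixed IDs above, fresh ones can be generated with openssl:

```shell
# Generate well-formed OTLP trace/span IDs for the payload above
TRACE_ID=$(openssl rand -hex 16)   # 16 random bytes -> 32 hex chars
SPAN_ID=$(openssl rand -hex 8)     # 8 random bytes  -> 16 hex chars
echo "traceId=$TRACE_ID spanId=$SPAN_ID"
```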
Service addresses
| Service | Address | Type |
|---|---|---|
| Prometheus | http://192.168.3.22:30090 | NodePort |
| Grafana | http://192.168.3.22:30300 | NodePort (admin/kafka-admin-2024) |
| OTel gRPC | 192.168.3.22:30317 | NodePort |
| OTel HTTP | 192.168.3.22:30318 | NodePort |
| Kafka external | 192.168.3.22:32100 | NodePort |
| Kafka internal | kafka-cluster-kafka-bootstrap.kafka:9092 | ClusterIP |
Data architecture / workflow
Applications/Services
│
▼ OTLP (gRPC :4317 / HTTP :4318)
┌──────────────────┐
│ OTel Collector │ (2 replicas – hp and msi)
│ namespace: otel │
└──────┬───────────┘
│ Kafka Producer
▼
┌────────────────────┐ ┌─────────────────────┐
│ Kafka 4.x KRaft │◄────│ Strimzi Operator │
│ namespace: kafka │ │ (manages CR) │
│ - controller(x510)│ └─────────────────────┘
│ - broker(hp) │
│ - broker(msi) │
└──────┬─────────────┘
│ JMX /metrics :9404
▼
┌──────────────────┐ ┌──────────────────────┐
│ Prometheus │────►│ Grafana │
│ namespace: │ │ Dashboards Kafka │
│ monitoring │ │ + OTel metrics │
└──────────────────┘ └──────────────────────┘
Troubleshooting
Kafka does not start
kubectl describe kafka kafka-cluster -n kafka
kubectl describe kafkanodepool broker -n kafka
kubectl logs -n kafka -l strimzi.io/kind=Kafka --tail=50
PVC in Pending state
kubectl get pvc -A
kubectl describe pvc <name> -n kafka
# Check if the NFS provisioner is working:
kubectl get pods -n kube-system | grep nfs
kubectl logs -n kube-system -l app=nfs-client-provisioner
No metrics in Prometheus
# Check if port 9404 is accessible on the Kafka pods
kubectl get svc -n kafka
kubectl exec -n monitoring -it deploy/prometheus -- \
wget -qO- http://kafka-cluster-kafka-brokers.kafka.svc:9404/metrics | head -20
Full system reset
kubectl delete kafka kafka-cluster -n kafka
kubectl delete kafkanodepool controller broker -n kafka
kubectl delete pvc --all -n kafka
kubectl delete pvc --all -n monitoring