Security & Observability
RBAC (Role-Based Access Control)
RBAC controls who can do what in the cluster. Two scopes: namespace-level (Role) and cluster-level (ClusterRole).
Role + RoleBinding (Namespace-scoped)
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
- kind: ServiceAccount
  name: ci-bot
  namespace: dev
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```
Use ClusterRole + ClusterRoleBinding for cluster-wide access (e.g., namespace management, node monitoring).
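As a sketch, a ClusterRole granting read-only access to nodes, bound cluster-wide (the names and the `monitoring-agent` subject are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader            # illustrative name
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-nodes
subjects:
- kind: ServiceAccount
  name: monitoring-agent       # illustrative subject
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: node-reader
  apiGroup: rbac.authorization.k8s.io
```

Note that ClusterRoles and ClusterRoleBindings are not namespaced, so `metadata` has no `namespace` field.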
RBAC Best Practices
| Practice | Why |
|---|---|
| Least privilege | Grant minimum permissions needed |
| Namespace-scoped first | Use Role before ClusterRole |
| ServiceAccounts per workload | Not the default SA |
| Audit regularly | kubectl auth can-i --list --as=system:serviceaccount:dev:ci-bot |
Pod Security
SecurityContext
```yaml
spec:
  securityContext:                   # pod-level
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
  containers:
  - name: app
    securityContext:                 # container-level
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
```
Pod Security Standards (PSS)
Pod Security Standards replace the deprecated PodSecurityPolicy (removed in Kubernetes v1.25) and are enforced per namespace by the built-in Pod Security Admission controller. Three levels:
| Level | Description |
|---|---|
| Privileged | No restrictions |
| Baseline | Prevents known privilege escalations |
| Restricted | Hardened, best practice (recommended for production) |
```bash
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted
```
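The same labels can also be set declaratively in the Namespace manifest; a sketch adding the third mode, `audit`, which records violations without blocking pods:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted   # log violations to the audit log
```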
Resource Quotas and Limits
Namespace Quotas
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
```
LimitRange (Per-Pod Defaults)
Use LimitRange to set default requests/limits for containers that don't specify them. Prevents runaway pods.
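A minimal LimitRange sketch (the values are illustrative defaults, not recommendations):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: dev
spec:
  limits:
  - type: Container
    default:               # applied as limits when a container specifies none
      cpu: 500m
      memory: 256Mi
    defaultRequest:        # applied as requests when a container specifies none
      cpu: 100m
      memory: 128Mi
```

Combined with a ResourceQuota, this ensures every pod in the namespace counts against the quota even if its spec omits resource requests.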
Observability Stack
Metrics: Prometheus + Grafana
```bash
# Install via Helm (kube-prometheus-stack)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring --create-namespace
```
Provides: Prometheus (metrics collection), Grafana (dashboards), Alertmanager (alerts), Node Exporter, kube-state-metrics.
| Component | Purpose |
|---|---|
| Prometheus | Scrapes metrics from pods/nodes via /metrics endpoints |
| Grafana | Visualization dashboards, alerting |
| Alertmanager | Routes alerts to Slack, PagerDuty, email |
| kube-state-metrics | Cluster state → Prometheus metrics |
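With kube-prometheus-stack, scrape targets are typically declared via the ServiceMonitor CRD rather than static Prometheus config. A sketch, assuming an application Service labeled `app: my-app` that exposes `/metrics` on a port named `http` (all names here are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    release: monitoring      # must match the Helm release's ServiceMonitor selector
spec:
  selector:
    matchLabels:
      app: my-app            # assumed Service label
  namespaceSelector:
    matchNames: ["dev"]
  endpoints:
  - port: http               # named port on the Service
    path: /metrics
    interval: 30s
```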
Logging: EFK Stack
```bash
# Fluent Bit (DaemonSet — log collector)
helm repo add fluent https://fluent.github.io/helm-charts
helm install fluent-bit fluent/fluent-bit -n logging --create-namespace

# Elasticsearch + Kibana
helm repo add elastic https://helm.elastic.co
helm install elasticsearch elastic/elasticsearch -n logging
helm install kibana elastic/kibana -n logging
```
| Component | Role |
|---|---|
| Fluent Bit | Lightweight log collector (DaemonSet on every node) |
| Elasticsearch | Log storage and full-text search |
| Kibana | Log visualization and debugging UI |
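Fluent Bit forwards logs to Elasticsearch via an `[OUTPUT]` section in its config. A sketch of the `es` output, assuming the Elastic chart's default in-cluster Service name `elasticsearch-master`:

```ini
[OUTPUT]
    Name            es
    Match           kube.*
    Host            elasticsearch-master   # assumed in-cluster Service name
    Port            9200
    Logstash_Format On
    Replace_Dots    On
```

When installing via Helm, this section is usually supplied through the chart's values rather than edited directly.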
Health Checks
kubectl top requires metrics-server installed in the cluster.
```bash
kubectl top nodes                                 # node CPU/memory
kubectl top pods -A --sort-by=memory              # pod resource usage
kubectl get events --sort-by='.lastTimestamp' -A  # cluster events
```
Debugging Checklist
| Symptom | First Commands |
|---|---|
| Pod Pending | kubectl describe pod <p> → check events, resources, taints |
| Pod CrashLoopBackOff | kubectl logs <p> --previous → check exit code |
| Pod ImagePullBackOff | Check image name, registry auth, imagePullSecrets |
| Service no endpoints | kubectl get endpoints <svc> → labels mismatch? |
| High latency | kubectl top pods, check resource limits, HPA status |
| OOMKilled | Increase limits.memory or optimize app memory usage |
Debug Commands
```bash
kubectl describe pod <name>                    # events, conditions, mounts
kubectl logs <pod> -f --tail=200               # follow logs
kubectl logs <pod> -c <container>              # multi-container pod
kubectl logs <pod> --previous                  # previous crash
kubectl exec -it <pod> -- /bin/sh              # shell access
kubectl debug -it <pod> --image=busybox        # ephemeral debug container
kubectl debug node/<node> -it --image=busybox  # node-level debug
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
```