Enhancing Security in Kubernetes with Linux Capabilities

SysCalls Are the Attack Surface!

Linux capabilities split the privileges traditionally reserved for root into distinct units that can be enabled or disabled independently for each process. A process can then perform one specific privileged operation, such as setting the system clock or opening a raw socket, without needing full root access. This gives finer-grained control over the privileged operations a process or container may perform and supports the principle of least privilege.
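To make the "distinct units" concrete: the kernel tracks each capability as one bit in a set of per-process masks, which it reports as the hexadecimal CapInh, CapPrm, CapEff, CapBnd and CapAmb values in /proc/<pid>/status. The sketch below decodes such a mask into capability names. The function name decode_caps is made up for this illustration, and the name table assumes the bit numbering from linux/capability.h up to cap_checkpoint_restore (bit 40); newer kernels may define more.

```shell
# Capability names indexed by bit number (bit 0 first), per linux/capability.h.
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm
block_suspend audit_read perfmon bpf checkpoint_restore"

# decode_caps HEXMASK: print the comma-separated names of all bits set in HEXMASK.
decode_caps() {
    mask=$(( 0x$1 ))
    bit=0
    out=""
    for name in $CAP_NAMES; do
        # Test whether the bit for this capability is set in the mask.
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out,}cap_$name"
        fi
        bit=$(( bit + 1 ))
    done
    printf '%s\n' "$out"
}

# Decode the effective set of the current shell, if /proc is available.
if [ -r "/proc/$$/status" ]; then
    decode_caps "$(awk '/^CapEff:/ { print $2 }' "/proc/$$/status")"
fi
```

For example, decode_caps 2000 prints cap_net_raw and decode_caps 2000000 prints cap_sys_time, the two capabilities used in the examples in this article.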

In Kubernetes, capabilities are set per container in the securityContext of a pod manifest, where they help define and enforce a security policy tailored to each workload. To narrow the attack surface further, it is advisable to combine capabilities with seccomp, which filters the system calls a container may make, and with a mandatory access control (MAC) framework such as SELinux. Together these define precise, enforceable rules that limit what a container can do, reducing the risk of exploitation.
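As a minimal sketch of how these pieces can sit together in one security context, the snippet below writes a pod manifest that drops all capabilities, disables privilege escalation and selects the container runtime's default seccomp profile. The pod name getcaps-seccomp and the file name are hypothetical, and the image is the one built later in this article; treat it as a starting point under those assumptions, not a complete hardening policy.

```shell
# Write a hypothetical pod manifest; pod name and file name are illustrative.
cat <<'EOF' > getcaps-seccomp.yaml
apiVersion: v1
kind: Pod
metadata:
  name: getcaps-seccomp
spec:
  restartPolicy: Never
  containers:
  - name: getcaps
    image: kvad/getcaps:latest
    securityContext:
      privileged: false
      allowPrivilegeEscalation: false
      # Use the container runtime's default seccomp (system call) filter.
      seccompProfile:
        type: RuntimeDefault
      capabilities:
        drop: ["ALL"]
EOF

# Apply it against a test cluster (requires kubectl and a running cluster):
# kubectl apply -f getcaps-seccomp.yaml
```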

Kubernetes is open and flexible, but it does not enforce strict security settings by default. A deployment without a properly defined security context should prompt security teams to reassess and harden it, and administrators need to put in place security measures that match their operational requirements.

Linux DateTime Example

$ ls -l /bin/date
-rwxr-xr-x 1 root root 108920 Sep  5  2019 /bin/date
root@master:~# useradd -m kurtis
root@master:~# su kurtis

$ date -s "Mon May 11 15:58:41 CEST 2018"
date: cannot set date: Operation not permitted
Fri 11 May 2018 02:58:41 PM +01
kurtis@master:/root$ exit
exit

root@master:~# setcap cap_sys_time=ep /bin/date
root@master:~# su kurtis

$ date -s "Mon May 11 15:58:41 CEST 2018"
Fri 11 May 2018 02:58:41 PM +01
$ getcap /usr/bin/date
/usr/bin/date = cap_sys_time+ep
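
A file capability stays on the binary until it is removed, so after this experiment every user on the machine can change the clock through /bin/date. A minimal sketch for auditing and cleaning up follows, assuming the getcap and setcap tools from libcap2-bin are installed; the helper name list_file_caps is hypothetical.

```shell
# List every file under a directory that carries file capabilities.
# Prints nothing when no file has any; warns on stderr if getcap is missing.
list_file_caps() {
    dir=${1:-/usr/bin}
    if ! command -v getcap >/dev/null 2>&1; then
        echo "getcap not found (install libcap2-bin)" >&2
        return 1
    fi
    # -r walks the directory tree; errors for unreadable paths are discarded.
    getcap -r "$dir" 2>/dev/null
}

# Example: audit the usual binary directory.
# list_file_caps /usr/bin

# Remove the capability granted to date earlier (run as root):
# setcap -r /bin/date
```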

Docker Example + Dockerfile to Test With Ping:

$ cat <<EOF > Dockerfile
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y libcap2-bin inetutils-ping
CMD ["/sbin/getpcaps", "1"]
EOF

# Build
$ docker build . -t getcaps


# Test
$ docker run --rm getcaps
Capabilities for `1': = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+ep

$ docker run -it --rm getcaps /bin/sh -c 'whoami'
root

Drop all capabilities, then add back only the one ping needs:

$ docker run --rm  --cap-drop ALL getcaps /bin/sh -c 'ping -c1 -w2 127.0.0.1'
ping: Lacking privilege for raw socket.

$ docker run --rm  --cap-drop ALL --cap-add CAP_NET_RAW getcaps /bin/sh -c 'ping -c1 -w2 127.0.0.1'
PING 127.0.0.1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.067 ms
--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.067/0.067/0.067/0.000 ms

$ docker run --rm  --cap-drop ALL --cap-add CAP_NET_RAW getcaps
Capabilities for `1': = cap_net_raw+ep

$ docker run --rm  --cap-drop ALL --cap-add CAP_NET_RAW getcaps /bin/sh -c 'capsh --print'
Current: = cap_net_raw+ep
Bounding set =cap_net_raw
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=0(root)
gid=0(root)
groups=

Capabilities are a fundamental building block for managing the security context of containers, whether they are run by Docker or by another Linux container runtime.

Kubernetes Example:

# Tag and push image
$ docker tag getcaps kvad/getcaps
$ docker push kvad/getcaps
Using default tag: latest
The push refers to repository [docker.io/kvad/getcaps]
0ef81cb2551d: Pushing [==================================================>]  43.73MB
...

Start a Minikube or kind cluster for testing, then save the following manifest as getcaps.yaml.

PodSpec:

apiVersion: v1
kind: Pod
metadata:
  name: getcaps
  labels:
    app: getcaps
spec:
  hostNetwork: false
  hostPID: false
  hostIPC: false
  containers:
  - name: getcaps
    image: kvad/getcaps:latest
    securityContext:
      privileged: false
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "ping -c2 -w2 127.0.0.1" ]

$ kubectl apply -f getcaps.yaml
$ kubectl logs getcaps
PING 127.0.0.1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.050 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.051 ms
--- 127.0.0.1 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.050/0.051/0.051/0.000 ms

Drop all capabilities and add back NET_RAW:

apiVersion: v1
kind: Pod
metadata:
  name: getcaps
  labels:
    app: getcaps
spec:
  hostNetwork: false
  hostPID: false
  hostIPC: false
  containers:
  - name: getcaps
    image: kvad/getcaps:latest
    securityContext:
      privileged: false
      capabilities:
        drop:
          - ALL
        add: ["NET_RAW"]
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "ping -c2 -w2 127.0.0.1" ]
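
One detail that is easy to trip over: the Kubernetes documentation specifies capability names in manifests without the CAP_ prefix (NET_RAW rather than CAP_NET_RAW), whereas getpcaps and capsh print them as cap_net_raw. The small helper below, whose name k8s_cap_name is invented for this sketch, converts either spelling into the manifest form.

```shell
# k8s_cap_name NAME: convert a Linux capability name (cap_net_raw, CAP_NET_RAW)
# into the form used in a Kubernetes manifest (NET_RAW).
k8s_cap_name() {
    # Uppercase the name, then strip a leading CAP_ if there is one.
    name=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    printf '%s\n' "${name#CAP_}"
}

k8s_cap_name cap_net_raw     # prints NET_RAW
k8s_cap_name CAP_SYS_TIME    # prints SYS_TIME
```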