Docker Security Examples

Containers share the host kernel — the only barriers between a container process and full host compromise are Linux security primitives: namespaces, cgroups, capabilities, seccomp, and mandatory access control (AppArmor/SELinux). Understanding how these controls work, and how attackers bypass them, is fundamental to securing any container workload.

This tutorial walks through practical Docker security scenarios:

Linux capabilities: drop all, add back only what's needed, and observe the difference
Seccomp profiles: restrict which syscalls a container can make
AppArmor profiles: mandatory access control for file and network operations
Privilege escalation techniques: how --privileged, --pid=host, and nsenter give attackers full host access
Read-only filesystems: prevent runtime tampering with --read-only and tmpfs
Host mount and host network dangers: why -v /:/host is equivalent to giving away root, and what --network=host exposes
Image inspection: audit layers and scan for vulnerabilities before running

Each example is designed to be run locally with Docker — no Kubernetes cluster required.

Prerequisites

Docker installed and running
Basic familiarity with Linux permissions and processes
A terminal (all examples use standard Docker CLI)

mkdir -p ~/docker-security-lab && cd ~/docker-security-lab

Linux Capabilities

Linux capabilities split the monolithic root privilege into discrete units. Instead of granting a process full root power, you grant only the specific capabilities it needs. Docker containers start with a reduced capability set, but it's still more than most workloads require.

View Default Container Capabilities

# Run a container and list its capabilities
docker run --rm -it alpine:latest /bin/sh -c "apk add -q libcap && capsh --print"

The output shows the capability bounding set — these are the maximum privileges available to processes inside the container.
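To read the same information without installing libcap, the bounding set can also be decoded from the hexadecimal CapBnd mask in /proc/self/status. The following is a minimal sketch in POSIX shell. The decode_caps helper is hypothetical (it is not part of Docker or libcap), and the sample mask is the value commonly reported for Docker's default capability set, which may differ on your version.

```shell
#!/bin/sh
# Capability names indexed by bit position (0 = chown ... 40 = checkpoint_restore),
# following the numbering in linux/capability.h.
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid \
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw \
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct \
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease \
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm \
block_suspend audit_read perfmon bpf checkpoint_restore"

# decode_caps HEXMASK: print the names of the bits set in HEXMASK (hypothetical helper)
decode_caps() {
    mask=$((0x$1))
    bit=0
    out=""
    for name in $CAP_NAMES; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$name"
        fi
        bit=$((bit + 1))
    done
    printf '%s\n' "$out"
}

# Inside a container the mask comes from: grep CapBnd /proc/self/status
decode_caps 00000000a80425fb
```

With --cap-drop ALL the mask is all zeros and the function prints an empty line.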

Drop All Capabilities

Dropping all capabilities is the most restrictive baseline. Most applications will still run — they just can't perform privileged operations like binding to low ports, changing file ownership, or sending raw network packets.

# Drop ALL capabilities — ping fails because it needs CAP_NET_RAW
docker run --rm --cap-drop ALL alpine:latest /bin/sh -c \
    "ping -c1 -W2 127.0.0.1"

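Expected output: ping: permission denied (are you running as root?)

On Docker Engine 20.10 and later the ping may succeed anyway, because the daemon sets net.ipv4.ping_group_range inside the container so that unprivileged ICMP sockets work without CAP_NET_RAW. Setting that sysctl back to the kernel default (1 0) with --sysctl should restore the failure on those versions.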

Add Back Only What's Needed

# Add CAP_NET_RAW back — ping works again
docker run --rm --cap-drop ALL --cap-add NET_RAW alpine:latest /bin/sh -c \
    "ping -c1 -W2 127.0.0.1"

Expected output: 1 packets transmitted, 1 packets received, 0% packet loss

Common Capabilities and Their Risk

Capability	What it allows	Risk level
`NET_RAW`	Raw sockets (ping, packet crafting)	Medium — enables ARP spoofing
`NET_ADMIN`	Network configuration changes	High — can sniff traffic, modify routes
`SYS_ADMIN`	Mount filesystems, manage namespaces	Critical — near-root access
`SYS_PTRACE`	Trace and debug processes	High — can read memory of other processes
`DAC_OVERRIDE`	Bypass file permission checks	High — read/write any file
`SETUID` / `SETGID`	Change process UID/GID	High — escalate to any user

Best practice: Always use --cap-drop ALL as the baseline and add back only the specific capabilities your application requires. Document why each capability is needed.

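# Example: web server that binds to port 80
# The official nginx image starts as root, chowns its cache directories, and
# drops its workers to the nginx user, so it also needs CHOWN, SETUID, and SETGID
docker run --rm --cap-drop ALL \
    --cap-add NET_BIND_SERVICE --cap-add CHOWN --cap-add SETUID --cap-add SETGID \
    -p 8080:80 nginx:alpine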
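One way to keep that documentation next to the configuration is a small allowlist file with one capability per line and the justification as a trailing comment. The sketch below assumes such a file; the cap_flags helper and the file layout are illustrative, not a Docker feature.

```shell
#!/bin/sh
# cap_flags FILE: turn a commented capability allowlist into docker run flags.
# Each line holds one capability; everything after '#' is the justification.
cap_flags() {
    flags="--cap-drop ALL"
    while IFS= read -r line || [ -n "$line" ]; do
        cap=${line%%#*}                                  # drop the comment
        cap=$(printf '%s' "$cap" | tr -d '[:space:]')    # drop surrounding blanks
        if [ -n "$cap" ]; then
            flags="$flags --cap-add $cap"
        fi
    done < "$1"
    printf '%s\n' "$flags"
}

# Demo with a throwaway allowlist file
caps_file=$(mktemp)
cat > "$caps_file" << 'EOF'
# capability        why it is needed
NET_BIND_SERVICE    # listen on port 80
CHOWN               # entrypoint fixes ownership of cache directories
EOF
cap_flags "$caps_file"
rm -f "$caps_file"
```

The output can be spliced into a command line unquoted, for example docker run --rm $(cap_flags caps.txt) IMAGE, so that the shell splits it into separate arguments.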

Seccomp Profiles

Seccomp (Secure Computing Mode) filters which system calls a container process can execute. Docker applies a default seccomp profile that blocks approximately 44 of the 300+ Linux syscalls, including dangerous ones like mount, reboot, and kexec_load.

Running Without Seccomp (Dangerous)

Disabling seccomp removes all syscall filtering — the container process can invoke any syscall the kernel supports.

# Without seccomp: unshare succeeds — the process can create new namespaces
docker run --rm -it --security-opt seccomp=unconfined alpine:latest \
    unshare --map-root-user --user /bin/sh -c "whoami && id"

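This works only because seccomp is disabled. Creating a user namespace means calling the unshare syscall with the CLONE_NEWUSER flag, and the default seccomp profile permits that call only for processes that hold CAP_SYS_ADMIN.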

Running With the Default Profile

# With default seccomp: unshare is blocked
docker run --rm -it alpine:latest \
    unshare --map-root-user --user /bin/sh -c "whoami && id"

Expected: unshare: unshare(0x10000000): Operation not permitted
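The hexadecimal argument in that error is the flag mask passed to unshare. Below is a short sketch for decoding it, using the CLONE_NEW* values from linux/sched.h; the decode_ns_flags helper is hypothetical.

```shell
#!/bin/sh
# decode_ns_flags MASK: name the namespace flags set in an unshare/clone mask.
decode_ns_flags() {
    mask=$(($1))
    out=""
    # value:name pairs for the namespace-related CLONE_* flags
    for pair in 0x00000080:CLONE_NEWTIME 0x00020000:CLONE_NEWNS \
                0x02000000:CLONE_NEWCGROUP 0x04000000:CLONE_NEWUTS \
                0x08000000:CLONE_NEWIPC 0x10000000:CLONE_NEWUSER \
                0x20000000:CLONE_NEWPID 0x40000000:CLONE_NEWNET; do
        value=${pair%%:*}
        name=${pair##*:}
        if [ $(( mask & $value )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    printf '%s\n' "$out"
}

decode_ns_flags 0x10000000   # prints CLONE_NEWUSER
```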

Using a Custom Seccomp Profile

Download Docker's default profile and customize it:

# Download the default seccomp profile
curl -sO https://raw.githubusercontent.com/moby/moby/master/profiles/seccomp/default.json

# Rough size check: count the lines that contain "name" (not an exact syscall count)
python3 -m json.tool default.json | grep -c "name"

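Create a stricter profile that also blocks chmod and chown. The profile is an allowlist: defaultAction rejects every syscall with EPERM (errno 1) unless it appears in names, and chmod and chown are deliberately left out: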

cat > strict-seccomp.json << 'SECCOMP'
{
    "defaultAction": "SCMP_ACT_ERRNO",
    "defaultErrnoRet": 1,
    "architectures": [
        "SCMP_ARCH_X86_64",
        "SCMP_ARCH_AARCH64"
    ],
    "syscalls": [
        {
            "names": [
                "accept", "accept4", "access", "bind", "brk", "capget",
                "capset", "chdir", "clone", "close", "connect", "dup",
                "dup2", "dup3", "epoll_create", "epoll_create1",
                "epoll_ctl", "epoll_wait", "epoll_pwait", "execve",
                "exit", "exit_group", "faccessat", "fchdir", "fcntl",
                "fstat", "fstatfs", "futex", "getcwd", "getdents64",
                "getegid", "geteuid", "getgid", "getpeername",
                "getpid", "getppid", "getrandom", "getsockname",
                "getsockopt", "getuid", "ioctl", "listen", "lseek",
                "madvise", "mmap", "mprotect", "munmap", "nanosleep",
                "newfstatat", "open", "openat", "pipe", "pipe2",
                "poll", "ppoll", "prctl", "pread64", "prlimit64",
                "pwrite64", "read", "readlink", "readlinkat",
                "recvfrom", "recvmsg", "rename", "rt_sigaction",
                "rt_sigprocmask", "rt_sigreturn", "select",
                "sendmsg", "sendto", "set_robust_list",
                "set_tid_address", "setsockopt", "shutdown",
                "sigaltstack", "socket", "stat", "statfs",
                "sysinfo", "tgkill", "uname", "unlink", "wait4",
                "write", "writev"
            ],
            "action": "SCMP_ACT_ALLOW"
        }
    ]
}
SECCOMP

# Run with the strict profile — chmod is now blocked
docker run --rm -it --security-opt seccomp=./strict-seccomp.json \
    alpine:latest /bin/sh -c "touch /tmp/test && chmod 777 /tmp/test"

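Expected: chmod: /tmp/test: Operation not permitted

An allowlist this short is tightly coupled to the libc, architecture, and runtime version. If the container fails to start, or touch fails before chmod runs, a syscall the image needs is missing from names (arch_prctl and utimensat are likely candidates on x86_64 Alpine) and has to be added.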
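Before running a container under an allowlist profile, it can help to check whether a given syscall is on the list. The sketch below does this with plain text tools. The syscall_allowed helper is hypothetical, and it assumes a flat profile like strict-seccomp.json where every rule uses SCMP_ACT_ALLOW and the names array is spread over several lines; Docker's default.json has conditional rules and needs a JSON-aware tool instead.

```shell
#!/bin/sh
# syscall_allowed PROFILE NAME: succeed if NAME appears in a "names" array.
# Text-based on purpose; only meaningful for flat allowlists whose rules all
# use SCMP_ACT_ALLOW, not for profiles with conditional or deny rules.
syscall_allowed() {
    sed -n '/"names"/,/\]/p' "$1" |
        grep -o '"[a-z0-9_]*"' |
        tr -d '"' |
        grep -Ev '^(names|action)$' |
        grep -qx "$2"
}

# Demo against a tiny stand-in profile (hypothetical file, same layout as above)
profile=$(mktemp)
cat > "$profile" << 'EOF'
{
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        { "names": [ "read", "write",
                     "openat" ],
          "action": "SCMP_ACT_ALLOW" }
    ]
}
EOF
for sc in read chmod; do
    if syscall_allowed "$profile" "$sc"; then
        echo "$sc: allowed"
    else
        echo "$sc: blocked"
    fi
done
rm -f "$profile"
```

The demo prints "read: allowed" and "chmod: blocked". Against the real file, syscall_allowed strict-seccomp.json chmod should fail in the same way.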

AppArmor Profiles

AppArmor provides mandatory access control at the filesystem and network level. Docker's default AppArmor profile (docker-default) restricts containers from writing to /proc and /sys, mounting filesystems, and accessing certain devices.

Check the Default AppArmor Profile

# Verify AppArmor is loaded (Linux hosts only)
sudo aa-status 2>/dev/null | head -20

# Run with the default profile explicitly
docker run --rm -it --security-opt apparmor=docker-default \
    alpine:latest /bin/sh -c "cat /proc/sysrq-trigger"

Create a Custom AppArmor Profile

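Write a restrictive profile that prevents a container from writing to anything except /tmp. AppArmor denies whatever a profile does not explicitly allow, so the explicit deny rules below mainly document intent and guard against allow rules added later: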

sudo tee /etc/apparmor.d/docker-restricted << 'EOF'
#include <tunables/global>

profile docker-restricted flags=(attach_disconnected,mediate_deleted) {
    #include <abstractions/base>

    # Allow read access to most paths
    / r,
    /** r,

    # Only allow writes to /tmp
    /tmp/** rw,

    # Deny writes everywhere else
    deny /etc/** w,
    deny /usr/** w,
    deny /var/** w,
    deny /home/** w,
    deny /root/** w,

    # Deny raw network access
    deny network raw,

    # Deny mount operations
    deny mount,

    # Allow necessary capabilities
    capability net_bind_service,
    capability setuid,
    capability setgid,
}
EOF

# Load the profile
sudo apparmor_parser -r /etc/apparmor.d/docker-restricted

# Test: writing to /etc fails
docker run --rm -it --security-opt apparmor=docker-restricted \
    alpine:latest /bin/sh -c "echo test > /etc/test.txt"

# Test: writing to /tmp succeeds
docker run --rm -it --security-opt apparmor=docker-restricted \
    alpine:latest /bin/sh -c "echo test > /tmp/test.txt && cat /tmp/test.txt"
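To confirm which profile a process actually runs under, AppArmor exposes the current label in /proc/self/attr/current, in the form "name (mode)" or "unconfined". The sketch below parses such a label; the parse_aa_label helper is hypothetical, and the file is only meaningful on hosts where AppArmor is the active security module.

```shell
#!/bin/sh
# parse_aa_label LABEL: split an AppArmor label such as
# "docker-restricted (enforce)" into "profile=... mode=...".
parse_aa_label() {
    label=$1
    case $label in
        *" ("*")")
            profile=${label%" ("*}     # text before " ("
            mode=${label##*" ("}       # text after " ("
            mode=${mode%")"}           # strip the closing parenthesis
            ;;
        *)
            profile=$label             # e.g. "unconfined" carries no mode
            mode=none
            ;;
    esac
    printf 'profile=%s mode=%s\n' "$profile" "$mode"
}

# Demo on canned labels; inside a container use:
#   parse_aa_label "$(cat /proc/self/attr/current)"
parse_aa_label "docker-restricted (enforce)"
parse_aa_label "unconfined"
```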

Privilege Escalation Techniques

Understanding how attackers escalate privileges from inside a container helps you defend against these patterns. Each technique below exploits a specific Docker misconfiguration.

Technique 1: `--privileged` Flag

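The --privileged flag removes almost every restriction Docker places on a container: all capabilities are granted, seccomp and AppArmor confinement are disabled, the device cgroup allows access to every host device, and /proc and /sys are no longer masked or read-only. The container still runs in its own namespaces, but root inside it has effectively the same power as root on the host.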

# Privileged container can see ALL host devices
docker run --rm -it --privileged alpine:latest /bin/sh -c \
    "fdisk -l 2>/dev/null | head -20"

The container can see and interact with host block devices, mount host filesystems, load kernel modules, and access every hardware device.

Technique 2: `nsenter` with Host PID Namespace

When a container shares the host's PID namespace (--pid=host), it can see all host processes. Combined with --privileged, nsenter lets you enter the host's mount, UTS, network, and IPC namespaces — effectively escaping the container entirely.

# Full host escape: nsenter into PID 1 (the host's init process)
docker run --rm -it --privileged --pid=host alpine:latest \
    nsenter -t 1 -m -u -n -i /bin/sh -c "hostname && whoami && cat /etc/hostname"

What each nsenter flag does:

Flag	Namespace	Effect
`-t 1`	Target	Attach to PID 1 (host init)
`-m`	Mount	See the host filesystem
`-u`	UTS	See the host hostname
`-n`	Network	See the host network stack
`-i`	IPC	See the host IPC resources

This command runs as root inside the host's namespaces. Drop the -c argument and nsenter starts an interactive root shell on the host. From there, you can read /etc/shadow, install packages, modify systemd services, or pivot to other machines on the network.

Technique 3: Host Filesystem Mount

Mounting the host root filesystem into a container provides direct read/write access to everything on the host:

# Mount the host root filesystem at /host
docker run --rm -it -v /:/host alpine:latest /bin/sh -c \
    "chroot /host /bin/sh -c 'cat /etc/shadow | head -5'"

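This works even without --privileged: the Docker daemon runs as root, so anyone allowed to talk to it can bind-mount /. The chroot then switches the process's root directory to the host filesystem. The process stays in the container's other namespaces, but as root with write access to every host file it can read password hashes, add SSH keys, or install cron jobs.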

Technique 4: Docker Socket Mount

Mounting the Docker socket gives the container control over the Docker daemon — it can create new privileged containers, access volumes, and effectively control the host:

# Mount the Docker socket — the container can now manage Docker
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock \
    docker:latest docker ps

From here, an attacker can launch a privileged container with the host filesystem mounted — achieving full host compromise in two steps.

Defense: Never mount the Docker socket into application containers. Use rootless Docker or Podman for environments where containers need to build other containers.
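The four misconfigurations above can be spotted on existing containers from docker inspect output. The following is a minimal sketch: flag_risks and audit_container are hypothetical helper names, and the template fields used (.HostConfig.Privileged, .HostConfig.PidMode, .Mounts) are the ones docker inspect reports for containers.

```shell
#!/bin/sh
# flag_risks PRIVILEGED PIDMODE MOUNT_SOURCE...: report the misconfigurations
# covered by techniques 1 to 4 (hypothetical helper).
flag_risks() {
    privileged=$1
    pidmode=$2
    shift 2
    if [ "$privileged" = "true" ]; then echo "RISK: --privileged"; fi
    if [ "$pidmode" = "host" ]; then echo "RISK: --pid=host"; fi
    for src in "$@"; do
        case $src in
            /) echo "RISK: host root filesystem mounted" ;;
            /var/run/docker.sock|/run/docker.sock) echo "RISK: Docker socket mounted" ;;
        esac
    done
    return 0
}

# audit_container NAME: feed docker inspect fields into flag_risks.
audit_container() {
    priv=$(docker inspect --format '{{.HostConfig.Privileged}}' "$1")
    pid=$(docker inspect --format '{{.HostConfig.PidMode}}' "$1")
    mounts=$(docker inspect --format '{{range .Mounts}}{{.Source}} {{end}}' "$1")
    # $mounts is left unquoted on purpose so each source becomes one argument
    # (this breaks on paths that contain spaces)
    flag_risks "$priv" "$pid" $mounts
}

# Demo without Docker: a container that combines all four problems
flag_risks true host / /var/run/docker.sock
```

Running audit_container over the names printed by docker ps --format '{{.Names}}' gives a quick audit of a host.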

Read-Only Filesystems and tmpfs

Running containers with --read-only prevents any writes to the container's root filesystem. This blocks attackers from downloading tools, modifying configs, or writing persistence mechanisms.

Basic Read-Only Container

# Read-only root filesystem — writes fail
docker run --rm -it --read-only alpine:latest /bin/sh -c \
    "echo 'malware' > /tmp/payload"

Expected: can't create /tmp/payload: Read-only file system

Read-Only with tmpfs for Required Write Paths

Most applications need to write to a few specific paths (temp files, PID files, logs). Use tmpfs mounts for these: they are backed by memory (and swap, if the host has any) rather than the container's writable layer, and their contents vanish when the container stops.

# Read-only with tmpfs for /tmp — application writes work, nothing persists
docker run --rm -it --read-only --tmpfs /tmp:rw,noexec,nosuid \
    alpine:latest /bin/sh -c \
    "echo 'tempdata' > /tmp/test && cat /tmp/test && ls -la /tmp/"

Key tmpfs mount options:

Option	Effect
`rw`	Allow read/write (default for tmpfs)
`noexec`	Prevent executing binaries from tmpfs — blocks attackers from running downloaded payloads
`nosuid`	Ignore SUID/SGID bits — prevents privilege escalation via setuid binaries
`size=64m`	Limit tmpfs size — prevents memory exhaustion attacks
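Whether these options are really in effect can be checked from inside the container by looking at the fourth field of the matching /proc/mounts entry. The sketch below compares such an option string against the options you expect; missing_mount_opts is a hypothetical helper and the sample string is made up for the demo.

```shell
#!/bin/sh
# missing_mount_opts OPTIONS WANTED...: print each WANTED option that is absent
# from a comma-separated mount option string (fourth field of /proc/mounts).
missing_mount_opts() {
    opts=",$1,"
    shift
    for want in "$@"; do
        case $opts in
            *",$want,"*) ;;                 # present
            *) echo "missing: $want" ;;
        esac
    done
}

# Sample option string, deliberately lacking noexec
missing_mount_opts "rw,nosuid,nodev,relatime,size=65536k" noexec nosuid
```

The demo prints "missing: noexec". Inside a container, the real option string is available with awk '$2 == "/tmp" {print $4}' /proc/mounts.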

Running as Non-Root with Read-Only Filesystem

Combine --read-only, --tmpfs, non-root user, and dropped capabilities for a hardened container:

# Create a test image with a non-root user
cat > /tmp/Dockerfile.secure << 'EOF'
FROM alpine:latest
RUN adduser -D -u 1000 appuser
USER appuser
WORKDIR /home/appuser
CMD ["sh"]
EOF

docker build -t secure-test -f /tmp/Dockerfile.secure /tmp

# Run with full hardening
docker run --rm -it \
    --read-only \
    --tmpfs /tmp:rw,noexec,nosuid,size=64m \
    --cap-drop ALL \
    --security-opt no-new-privileges \
    -u 1000:1000 \
    secure-test /bin/sh -c "whoami && id && touch /tmp/ok && echo 'write works in /tmp'"

The --security-opt no-new-privileges flag prevents the process and its children from gaining privileges across execve: setuid and setgid bits and file capabilities are ignored. Even if an attacker finds a setuid binary inside the container, they cannot use it to escalate.
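The kernel reports this setting per process as the NoNewPrivs field in /proc/PID/status (present on Linux 4.10 and later). A small sketch for checking it follows; nnp_enabled is a hypothetical helper that reads the status text on standard input.

```shell
#!/bin/sh
# nnp_enabled: read /proc/<pid>/status text on stdin and succeed only if the
# NoNewPrivs field is present and set to 1.
nnp_enabled() {
    awk '$1 == "NoNewPrivs:" { found = ($2 == "1") } END { exit(found ? 0 : 1) }'
}

# Demo on canned input; inside a container use: nnp_enabled < /proc/self/status
if printf 'Name:\tsh\nNoNewPrivs:\t1\n' | nnp_enabled; then
    echo "no-new-privileges is active"
else
    echo "no-new-privileges is NOT active"
fi
```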

Host Network Dangers

Using --network=host removes network namespace isolation — the container shares the host's network stack, can bind to any port, and can sniff all host traffic.

# Host network: nginx binds directly to the host's port 80
docker run --rm -d --network=host --name host-nginx nginx:alpine

# Verify it's listening on the host's network interface
curl -s -o /dev/null -w "%{http_code}" http://localhost:80

# Cleanup
docker stop host-nginx

Why this is dangerous:

The container can bind to any port on the host, potentially hijacking services
Network monitoring tools inside the container can capture all host traffic
The container can communicate with services bound to 127.0.0.1 on the host (databases, admin interfaces)
No port mapping is needed — the container bypasses Docker's network proxy entirely
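To see what a host-network container can reach, listening sockets can be read from /proc/net/tcp, which stores addresses as hexadecimal ADDR:PORT pairs. The sketch below decodes them; decode_sock and list_listeners are hypothetical helpers, and the byte order handling assumes a little-endian host such as x86_64 or typical arm64.

```shell
#!/bin/sh
# decode_sock HEXADDR: convert an IPv4 "ADDR:PORT" entry from /proc/net/tcp
# (for example 0100007F:0CEA) to dotted form. Assumes a little-endian host.
decode_sock() {
    ip=$((0x${1%%:*}))
    port=$((0x${1##*:}))
    printf '%d.%d.%d.%d:%d\n' \
        $(( ip & 255 )) $(( (ip >> 8) & 255 )) \
        $(( (ip >> 16) & 255 )) $(( (ip >> 24) & 255 )) "$port"
}

# list_listeners: print every listening IPv4 TCP socket (state 0A) visible
# in the current network namespace.
list_listeners() {
    awk '$4 == "0A" { print $2 }' /proc/net/tcp | while read -r addr; do
        decode_sock "$addr"
    done
}

decode_sock 0100007F:0CEA   # prints 127.0.0.1:3306
```

Run list_listeners in a container started with --network=host and again in one on the default bridge network: the first shows the host's loopback-only services, the second does not.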

Best practice: Use Docker's bridge network (default) or custom networks. Only use --network=host when absolutely required for performance-critical network applications, and combine it with other security controls.

Image Inspection and Vulnerability Scanning

Before running any image — especially from public registries — inspect its contents and scan for known vulnerabilities.

Inspect Image Layers and History

# Pull an image
docker pull ubuntu/squid:latest

# View the full build history — shows every Dockerfile instruction
docker history --no-trunc ubuntu/squid:latest

Look for suspicious patterns in the history:

curl or wget fetching unknown URLs
Scripts executed during build (RUN bash -c "...")
Environment variables containing credentials or tokens
Layers that install tools like netcat, nmap, or socat
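A first pass over the history can be automated with a keyword filter. The sketch below is deliberately crude: flag_history is a hypothetical helper, the keyword list is illustrative, and it will produce false positives that still need a human to review.

```shell
#!/bin/sh
# flag_history: read `docker history --no-trunc` output on stdin and print
# the lines that match the patterns listed above, prefixed with line numbers.
flag_history() {
    grep -nEi 'curl|wget|bash -c|netcat|nmap|socat|password|secret|token' || true
}

# Demo on canned input; real use: docker history --no-trunc IMAGE | flag_history
printf '%s\n' \
    'RUN /bin/sh -c apt-get update' \
    'RUN /bin/sh -c curl -s http://example.invalid/x.sh | sh' \
    'ENV API_TOKEN=abc123' | flag_history
```

The demo prints the second and third lines, each prefixed with its line number.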

Scan with Trivy

# Scan for HIGH and CRITICAL vulnerabilities
trivy image --severity HIGH,CRITICAL ubuntu/squid:latest

# Scan and output as JSON for pipeline integration
trivy image --severity HIGH,CRITICAL -f json -o scan-results.json \
    ubuntu/squid:latest

# Scan a local Dockerfile for misconfigurations
trivy config /tmp/Dockerfile.secure

Inspect a Running Container's Filesystem

# Start a container to inspect
docker run -d --name inspect-target alpine:latest sleep 3600

# Export the filesystem as a tar and examine
docker export inspect-target | tar -tf - | head -50

# Look for suspicious files in writable locations
docker export inspect-target | tar -tf - | grep -E "(tmp|dev/shm|var/tmp)/"

# Check for SUID binaries
docker exec inspect-target find / -perm -4000 -type f 2>/dev/null

# Cleanup
docker stop inspect-target && docker rm inspect-target

Verify Image Signatures with Cosign

For images signed with Sigstore cosign:

# Verify an image signature
cosign verify --key cosign.pub <registry>/<image>:<tag>

# Verify with keyless signing (Sigstore transparency log)
cosign verify \
    --certificate-identity <signer-email> \
    --certificate-oidc-issuer https://accounts.google.com \
    <registry>/<image>:<tag>

Putting It All Together: Secure Container Checklist

Run containers with the maximum practical restrictions:

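# Production-hardened container run command
# Assumes the network exists (docker network create my-app-net) and that the
# seccomp and AppArmor profiles from the earlier sections are in place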
docker run -d \
    --name my-app \
    --read-only \
    --tmpfs /tmp:rw,noexec,nosuid,size=64m \
    --cap-drop ALL \
    --cap-add NET_BIND_SERVICE \
    --security-opt no-new-privileges \
    --security-opt seccomp=./strict-seccomp.json \
    --security-opt apparmor=docker-restricted \
    --memory 512m \
    --cpus 1 \
    --pids-limit 100 \
    -u 1000:1000 \
    --network my-app-net \
    -p 8080:8080 \
    my-app:latest

What each flag does:

Flag	Security control
`--read-only`	Immutable root filesystem
`--tmpfs /tmp:rw,noexec,nosuid,size=64m`	Writable temp that blocks binary execution and is capped at 64 MB
`--cap-drop ALL`	Remove all Linux capabilities
`--cap-add NET_BIND_SERVICE`	Add back only what's needed
`--security-opt no-new-privileges`	Block SUID/capability escalation
`--security-opt seccomp=`	Custom syscall filter
`--security-opt apparmor=`	Mandatory access control profile
`--memory 512m`	Prevent memory exhaustion
`--cpus 1`	Prevent CPU exhaustion
`--pids-limit 100`	Prevent fork bombs
`-u 1000:1000`	Run as non-root user
`--network my-app-net`	Isolated network (not host)
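The checklist can be turned into a simple lint for run commands kept in scripts or documentation. The sketch below uses plain substring matching, so it is an aid rather than a parser: it does not notice values such as seccomp=unconfined. The missing_hardening helper is hypothetical.

```shell
#!/bin/sh
# missing_hardening "DOCKER RUN COMMAND LINE": print checklist flags that the
# command line does not mention.
missing_hardening() {
    cmd=" $1 "
    for flag in "--read-only" "--cap-drop ALL" "no-new-privileges" \
                "seccomp=" "--memory" "--pids-limit"; do
        case $cmd in
            *"$flag"*) ;;                   # mentioned somewhere in the command
            *) echo "missing: $flag" ;;
        esac
    done
    case $cmd in
        *" -u "*|*" --user "*|*" --user="*) ;;
        *) echo "missing: -u/--user" ;;
    esac
}

missing_hardening "docker run -d --read-only --cap-drop ALL -u 1000:1000 my-app:latest"
```

For the sample command it reports no-new-privileges, seccomp=, --memory, and --pids-limit as missing.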

Lab Cleanup

# Remove any lab containers that are still around
docker rm -f inspect-target host-nginx 2>/dev/null

# Remove lab images
docker rmi secure-test 2>/dev/null

# Remove generated files
rm -f strict-seccomp.json default.json scan-results.json
rm -f /tmp/Dockerfile.secure

# Remove custom AppArmor profile (if created)
sudo apparmor_parser -R /etc/apparmor.d/docker-restricted 2>/dev/null
sudo rm -f /etc/apparmor.d/docker-restricted

# Remove lab directory
rm -rf ~/docker-security-lab/

Next Steps

Apply these patterns to Kubernetes using Pod Security Standards and the eBPF and Seccomp Container Hardening tutorial
Add runtime monitoring with Falco to detect when containers violate security baselines — see the Gatekeeper and Falco tutorial
Scan images in CI/CD using the Vulnerability Scanning with SBOM, Syft, and Grype tutorial
Explore fileless attacks that bypass read-only filesystems using the Memfd Syscall In-Memory Execution tutorial — these demonstrate why seccomp and Falco are essential even with --read-only
Understand Linux capabilities in depth with the Linux Capabilities and Containers tutorial
