Runtime K8s Monitoring with Gatekeeper and Falco

In the dynamic landscape of cloud-native applications, securing production environments is paramount, and proper threat modeling of workloads is essential to establish a baseline security posture. Tools like Falco and Gatekeeper are indispensable in this context: each provides a powerful rule syntax for defining and enforcing the desired state of runtime workloads, alerting administrators whenever a container enters an undesired state.

In today's multi-cloud environments, where package provenance is often unclear, such capabilities are crucial for maintaining compliance and ensuring security. These tools are fundamental for professionals in the modern cloud-native ecosystem, covering the areas Gartner labels "Cloud Configuration: Kubernetes Security Posture Management" and "Runtime Protection."

This tutorial builds a complete Kubernetes runtime security lab:

  1. Deploy OPA Gatekeeper as an admission controller to enforce security policies at deploy time
  2. Deploy Falco with eBPF for real-time runtime detection of malicious behavior
  3. Write and test admission policies that block privileged containers, enforce image registries, and require security labels
  4. Write and test runtime rules that detect shell execution, suspicious network traffic, and container escape attempts
  5. Simulate attacks with malicious pod manifests and observe both layers of defense in action

Policy test cases for malicious Kubernetes use cases are available in the companion repository: https://github.com/kurtiepie/k8s_test_cases. These manifests exercise malicious workload patterns so you can assess how well your policy defenses hold up.


How the Two Guard Rails Work Together

Kubernetes security requires defense in depth — no single tool covers every attack vector. Gatekeeper and Falco operate at different points in the workload lifecycle and complement each other:

  • Gatekeeper operates at the admission stage — it intercepts API requests before pods are scheduled and rejects workloads that violate policy. Think of it as the bouncer at the door.
  • Falco operates at runtime — it monitors system calls via eBPF on running containers and alerts when behavior deviates from baseline. Think of it as the security camera inside.

Together, they create a layered defense: Gatekeeper prevents known-bad configurations from entering the cluster, and Falco detects unknown-bad behavior once workloads are running.


Prerequisites

  • A running Kubernetes cluster (minikube, kind, or a managed cluster)
  • kubectl configured and connected to your cluster
  • helm v3 installed
  • Basic understanding of Kubernetes RBAC and pod security

Create a working directory for the lab:

mkdir -p ~/k8s-runtime-lab && cd ~/k8s-runtime-lab

Threat Model: Malicious Pod Attack Scenario

Before deploying defenses, it helps to understand what we're defending against. The following threat model illustrates a typical attack scenario in Kubernetes:

  1. Initial access — an attacker deploys a privileged container from a malicious container registry, either through a compromised CI/CD pipeline or a misconfigured RBAC policy
  2. Privilege escalation — the container runs with elevated privileges (privileged: true, hostPID: true, or dangerous capabilities like SYS_ADMIN)
  3. Discovery — the attacker enumerates the node filesystem, service accounts, and network from inside the container
  4. Lateral movement — using the compromised service account token or node-level access, the attacker pivots to other workloads or the control plane
  5. Command and control — the container establishes outbound connections to an external C2 server

Each of these stages generates indicators of compromise that Gatekeeper and Falco can detect and block. Malicious pod manifest examples from the excellent BishopFox badPods project demonstrate these attack patterns.

Gatekeeper blocks steps 1-2 by rejecting pods with dangerous security configurations at admission time. Falco detects steps 3-5 by monitoring syscalls and network activity at runtime.


Kubernetes API Request Lifecycle

Understanding where these tools intercept requests is critical. The Kubernetes API request lifecycle flows through several stages:

Client Request → Authentication → Authorization → Admission Controllers → etcd
                                                   ↑
                                            Gatekeeper operates here
                                            (Validating Webhook)

Pod Scheduled → Container Starts → Process Execution → System Calls
                                                        ↑
                                                 Falco operates here
                                                 (eBPF kernel probes)

During admission, Gatekeeper's validating webhook evaluates the pod spec against OPA policies written in Rego. If the pod violates any constraint, the API server returns an error and the pod is never created.

During runtime, Falco's eBPF probes attach to kernel tracepoints and monitor syscalls such as execve, connect, and open. When a syscall matches a Falco rule condition, an alert fires in real time.
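To make the runtime-matching model concrete, here is a minimal Python sketch of the idea. This is not Falco code; the event dictionaries and field names are simplified stand-ins for the kernel events Falco's probes actually deliver:

```python
# Illustrative sketch: evaluate a shell-detection "rule" against a
# stream of syscall-like events, roughly how Falco matches conditions.

SHELLS = {"bash", "sh", "csh", "zsh"}

def shell_in_container(event: dict) -> bool:
    """Mimics: spawned_process and container and proc.name in (bash, sh, ...)."""
    return (
        event.get("evt_type") == "execve"        # a new process was spawned
        and event.get("container_id") is not None  # event came from a container
        and event.get("proc_name") in SHELLS       # and the process is a shell
    )

events = [
    {"evt_type": "execve", "proc_name": "nginx", "container_id": "abc123"},
    {"evt_type": "execve", "proc_name": "bash", "container_id": "abc123"},
    {"evt_type": "connect", "proc_name": "bash", "container_id": "abc123"},
    {"evt_type": "execve", "proc_name": "bash", "container_id": None},  # host shell
]

alerts = [e for e in events if shell_in_container(e)]
for e in alerts:
    print(f"ALERT: shell spawned in container (proc={e['proc_name']})")
```

Only the second event fires: the first is not a shell, the third is not a process spawn, and the fourth happened on the host rather than in a container.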


Step 1: Deploy Gatekeeper

Install Gatekeeper using Helm:

# Add the Gatekeeper Helm repo
helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm repo update

# Install Gatekeeper in its own namespace
helm install gatekeeper gatekeeper/gatekeeper \
    --namespace gatekeeper-system \
    --create-namespace

# Verify the deployment
kubectl get pods -n gatekeeper-system

You should see the gatekeeper-controller-manager and gatekeeper-audit pods running. The controller manager handles admission webhooks, and the audit controller periodically scans existing resources for policy violations.

# Confirm the webhook is registered
kubectl get validatingwebhookconfigurations | grep gatekeeper

Step 2: Deploy Falco with eBPF

Install Falco using Helm with eBPF mode and the Falcosidekick UI for visualizing alerts:

# Add the Falco Helm repo
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update

# Install Falco with eBPF driver, Falcosidekick, and the web UI
helm install falco falcosecurity/falco \
    --namespace falco \
    --create-namespace \
    --set driver.kind=ebpf \
    --set falcosidekick.enabled=true \
    --set falcosidekick.webui.enabled=true \
    --set auditLog.enabled=true

# Verify the deployment
kubectl get pods -n falco

Wait for all pods to reach Running status. Falco runs as a DaemonSet — one pod per node — so you should see one falco pod for each node in your cluster.

# Check Falco logs to confirm eBPF probes loaded
kubectl logs -l app.kubernetes.io/name=falco -n falco --tail=20

Look for lines indicating the eBPF probe was loaded successfully and rules are active.

Access the Falcosidekick UI

# Port-forward the Falcosidekick UI
kubectl port-forward svc/falco-falcosidekick-ui -n falco 2802:2802 &

# Browse to http://localhost:2802
# Default credentials: admin / admin

The UI provides a real-time dashboard of all Falco alerts across your cluster.


Step 3: Guard Rail 1 — Admission Controller Policies

With Gatekeeper deployed, we define policies using two Kubernetes custom resources:

  • ConstraintTemplate — defines the Rego policy logic and the parameters it accepts
  • Constraint — applies the template to specific resources with specific parameter values

Policy: Block Privileged Containers

Privileged containers bypass nearly all container security controls: they receive every Linux capability, run without seccomp or AppArmor confinement, and get direct access to host devices. They should almost never be allowed in production.
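Before writing the Rego, it can help to see the check in plain code. This hypothetical Python sketch walks a pod manifest the same way the policy does: it iterates both containers and initContainers and flags any with privileged set to true:

```python
def privileged_violations(pod: dict) -> list[str]:
    """Return a violation message for every privileged container or
    initContainer in the pod spec (mirrors the Rego policy's logic)."""
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    return [
        f"Privileged container not allowed: {c['name']} in {pod['metadata']['name']}"
        for c in containers
        if c.get("securityContext", {}).get("privileged") is True
    ]

pod = {
    "metadata": {"name": "attack-pod"},
    "spec": {
        "containers": [
            {"name": "app", "image": "nginx"},
            {"name": "attacker", "image": "alpine",
             "securityContext": {"privileged": True}},
        ]
    },
}
print(privileged_violations(pod))
# One violation, for the "attacker" container
```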

Create block-privileged-template.yaml:

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sblockprivileged
spec:
  crd:
    spec:
      names:
        kind: K8sBlockPrivileged
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sblockprivileged

        violation[{"msg": msg}] {
            c := input_containers[_]
            c.securityContext.privileged == true
            msg := sprintf(
                "Privileged container not allowed: %v in %v",
                [c.name, input.review.object.metadata.name]
            )
        }

        input_containers[c] {
            c := input.review.object.spec.containers[_]
        }

        input_containers[c] {
            c := input.review.object.spec.initContainers[_]
        }

Create block-privileged-constraint.yaml:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sBlockPrivileged
metadata:
  name: block-privileged-containers
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system
      - falco
  parameters: {}

Apply the policy:

kubectl apply -f block-privileged-template.yaml
kubectl apply -f block-privileged-constraint.yaml

Policy: Enforce Approved Container Registries

Only allow images from trusted registries — this prevents attackers from deploying containers from public or malicious registries.
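The key semantics: an image is acceptable if it matches any one trusted prefix, and it is a violation only when no prefix matches. A hypothetical Python sketch of that check (prefix values borrowed from the constraint below):

```python
TRUSTED_PREFIXES = [
    "gcr.io/your-project/",
    "docker.io/library/",
    "registry.k8s.io/",
]

def untrusted_images(pod_spec: dict) -> list[str]:
    """An image passes if it starts with ANY trusted prefix; it is a
    violation only when NO prefix matches."""
    return [
        c["image"]
        for c in pod_spec.get("containers", [])
        if not any(c["image"].startswith(p) for p in TRUSTED_PREFIXES)
    ]

spec = {"containers": [
    {"name": "ok", "image": "docker.io/library/nginx:latest"},
    {"name": "bad", "image": "evil.registry.io/backdoor:latest"},
]}
print(untrusted_images(spec))  # ['evil.registry.io/backdoor:latest']
```

Getting this quantifier right matters when translating to Rego, where a naive `not startswith(image, repos[_])` inverts the meaning.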

Create allowed-repos-template.yaml:

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
      validation:
        openAPIV3Schema:
          type: object
          properties:
            repos:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sallowedrepos

        violation[{"msg": msg}] {
            container := input.review.object.spec.containers[_]
            satisfied := [good | repo = input.parameters.repos[_]; good = startswith(container.image, repo)]
            not any(satisfied)
            msg := sprintf(
                "Container image '%v' comes from an untrusted registry",
                [container.image]
            )
        }

Create allowed-repos-constraint.yaml:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: require-trusted-registries
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system
      - falco
  parameters:
    repos:
      - "gcr.io/your-project/"
      - "docker.io/library/"
      - "registry.k8s.io/"

Apply the policy:

kubectl apply -f allowed-repos-template.yaml
kubectl apply -f allowed-repos-constraint.yaml

Policy: Require Security Labels

Enforce that all workloads carry labels identifying their environment and owner — critical for auditing and incident response.
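The logic is a simple set difference: required label keys minus the keys actually present on the object. A hypothetical Python sketch, using the same env/owner keys as the constraint below:

```python
REQUIRED_LABELS = ["env", "owner"]

def missing_labels(obj: dict) -> list[str]:
    """Return the required label keys that are absent from the object's
    metadata.labels (mirrors the required-labels policy)."""
    labels = obj.get("metadata", {}).get("labels", {})
    return [key for key in REQUIRED_LABELS if key not in labels]

pod = {"metadata": {"name": "web", "labels": {"env": "production"}}}
print(missing_labels(pod))  # ['owner']
```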

Create required-labels-template.yaml:

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg}] {
            required := input.parameters.labels[_]
            not input.review.object.metadata.labels[required]
            msg := sprintf(
                "Missing required label: '%v' on %v '%v'",
                [required,
                 input.review.object.kind,
                 input.review.object.metadata.name]
            )
        }

Create required-labels-constraint.yaml:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-env-and-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system
      - falco
  parameters:
    labels:
      - "env"
      - "owner"

Apply the policy:

kubectl apply -f required-labels-template.yaml
kubectl apply -f required-labels-constraint.yaml

Test the Admission Policies

Try deploying pods that violate each policy:

# Test 1: Privileged container — should be DENIED
kubectl run priv-test --image=nginx \
    --overrides='{
        "spec": {
            "containers": [{
                "name": "priv-test",
                "image": "nginx",
                "securityContext": {"privileged": true}
            }]
        }
    }' --restart=Never

# Expected: Error — "Privileged container not allowed"

# Test 2: Untrusted registry — should be DENIED
kubectl run bad-registry --image=evil.registry.io/backdoor:latest \
    --restart=Never

# Expected: Error — "Container image comes from an untrusted registry"

# Test 3: Missing labels — should be DENIED
kubectl run no-labels --image=docker.io/library/nginx:latest \
    --restart=Never

# Expected: Error — "Missing required label: 'env'"

Check the Gatekeeper audit log for all current violations across existing resources:

kubectl get k8sblockprivileged block-privileged-containers \
    -o jsonpath='{.status.violations}' | jq .

kubectl get k8sallowedrepos require-trusted-registries \
    -o jsonpath='{.status.violations}' | jq .

Step 4: Guard Rail 2 — Runtime Policies with Falco

With admission policies establishing the baseline, Falco monitors runtime behavior. Falco ships with a comprehensive default ruleset, but custom rules tailored to your environment are essential.

Understanding Falco Rule Structure

A Falco rule has three components:

- rule: Descriptive Rule Name
  desc: Explanation of what the rule detects
  # condition: a filter expression evaluated against every syscall event
  condition: >
    spawned_process and container and
    proc.name in (bash, sh, csh, zsh)
  # output: the alert message, with event fields interpolated
  output: >
    Shell spawned in container
    (user=%user.name container=%container.name image=%container.image.repository
     command=%proc.cmdline)
  priority: WARNING
  tags: [container, shell, mitre_execution]

The condition field uses Sysdig's filter syntax to match against kernel events. Key fields include:

  • spawned_process — a new process was execve'd
  • container — the event occurred inside a container
  • proc.name — the process name
  • fd.sip / fd.sport — the server-side (remote) IP address and port of a connection
  • evt.type — the syscall type (open, connect, execve, etc.)
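The %field placeholders in output are filled from the matching event. A rough Python sketch of that interpolation (field names are illustrative; this is not how Falco is implemented internally):

```python
import re

def render_output(template: str, event: dict) -> str:
    """Replace %field.name placeholders with values from the event,
    roughly how Falco fills its alert output strings."""
    return re.sub(
        r"%([a-z_.]+)",
        lambda m: str(event.get(m.group(1), "<NA>")),
        template,
    )

event = {"user.name": "root", "container.name": "web", "proc.cmdline": "bash -i"}
msg = render_output(
    "Shell spawned in container (user=%user.name container=%container.name command=%proc.cmdline)",
    event,
)
print(msg)
# Shell spawned in container (user=root container=web command=bash -i)
```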

Custom Rule: Detect Shell in Production Containers

Create custom-falco-rules.yaml:

customRules:
  custom-rules.yaml: |-

    - rule: Shell Spawned in Production Container
      desc: >
        Detect interactive shell execution in containers labeled env=production.
        Production containers should never need interactive shells.
      condition: >
        spawned_process and container and
        proc.name in (bash, sh, csh, zsh, ksh) and
        k8s.pod.label.env = "production"
      output: >
        Shell spawned in production container
        (user=%user.name pod=%k8s.pod.name ns=%k8s.ns.name
         image=%container.image.repository command=%proc.cmdline
         parent=%proc.pname)
      priority: CRITICAL
      tags: [container, shell, mitre_execution, production]

    - rule: Outbound Connection to Non-RFC1918
      desc: >
        Detect containers making outbound TCP connections to addresses
        outside RFC 1918 private ranges. Indicates potential C2 communication
        or data exfiltration.
      condition: >
        outbound and container and
        fd.typechar = 4 and
        not (fd.snet = "10.0.0.0/8" or
             fd.snet = "172.16.0.0/12" or
             fd.snet = "192.168.0.0/16")
      output: >
        Outbound connection to non-private IP from container
        (command=%proc.cmdline connection=%fd.name
         container=%container.name image=%container.image.repository
         pod=%k8s.pod.name ns=%k8s.ns.name)
      priority: WARNING
      tags: [container, network, mitre_command_and_control]

    - rule: Suspicious Reconnaissance Tool in Container
      desc: >
        Detect execution of common reconnaissance and exploitation tools
        inside containers. These tools are rarely legitimate in production.
      condition: >
        spawned_process and container and
        proc.name in (nmap, socat, nc, ncat, netcat, tcpdump, mitmproxy,
                      wireshark, tshark, masscan, nikto, sqlmap)
      output: >
        Suspicious tool executed in container
        (tool=%proc.name command=%proc.cmdline
         container=%container.name image=%container.image.repository
         pod=%k8s.pod.name ns=%k8s.ns.name user=%user.name)
      priority: CRITICAL
      tags: [container, mitre_discovery, mitre_lateral_movement]

    - rule: Container Drift Detected
      desc: >
        Detect execution of a binary that was not part of the original
        container image. Indicates an attacker has downloaded and executed
        a payload inside the container.
      condition: >
        spawned_process and container and
        (proc.exe startswith "/tmp/" or
         proc.exe startswith "/dev/shm/" or
         proc.exe startswith "/var/tmp/" or
         proc.exe startswith "/run/")
      output: >
        Binary executed from suspicious path in container
        (command=%proc.cmdline exe=%proc.exe
         container=%container.name image=%container.image.repository
         pod=%k8s.pod.name ns=%k8s.ns.name)
      priority: CRITICAL
      tags: [container, mitre_execution, drift]

    - rule: Cloud Metadata Service Access from Container
      desc: >
        Detect attempts to access cloud instance metadata services (AWS, GCP, Azure)
        from containers. Common technique for credential theft in cloud environments.
      condition: >
        outbound and container and
        fd.sip = "169.254.169.254"
      output: >
        Container attempted to access cloud metadata service
        (command=%proc.cmdline container=%container.name
         image=%container.image.repository
         pod=%k8s.pod.name ns=%k8s.ns.name)
      priority: CRITICAL
      tags: [container, cloud, mitre_credential_access]
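The non-RFC1918 rule above treats the three private ranges as benign. A quick sanity check with Python's standard ipaddress module shows which destinations such a rule would flag, and why CIDR matching is safer than string-prefix matching (172.2.0.5 is public even though it starts with "172.2"):

```python
import ipaddress

# RFC 1918 private address space
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_suspicious(dest_ip: str) -> bool:
    """True when the destination falls outside RFC 1918 private space."""
    addr = ipaddress.ip_address(dest_ip)
    return not any(addr in net for net in PRIVATE_NETS)

for ip in ["10.1.2.3", "172.20.0.5", "172.2.0.5", "192.168.1.1", "8.8.8.8"]:
    print(ip, "suspicious" if is_suspicious(ip) else "private")
```

Running this shows 172.20.0.5 is private (inside 172.16.0.0/12) while 172.2.0.5 and 8.8.8.8 are flagged as suspicious.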

Upgrade the Falco deployment with custom rules:

helm upgrade falco falcosecurity/falco \
    --namespace falco \
    --reuse-values \
    -f custom-falco-rules.yaml

Test the Runtime Rules

Deploy a test pod and trigger alerts:

# Deploy a test pod with production label
kubectl run falco-test \
    --image=docker.io/library/alpine:latest \
    --labels="env=production,owner=security-team" \
    --restart=Never \
    -- sleep 3600

# Wait for the pod to be running
kubectl wait --for=condition=Ready pod/falco-test --timeout=60s

# Test 1: Spawn a shell — triggers "Shell Spawned in Production Container"
kubectl exec falco-test -- /bin/sh -c "whoami"

# Test 2: Run a recon tool — triggers "Suspicious Reconnaissance Tool"
kubectl exec falco-test -- /bin/sh -c "apk add --no-cache nmap && nmap --version"

# Test 3: Access metadata service — triggers "Cloud Metadata Service Access"
kubectl exec falco-test -- /bin/sh -c "wget -q -O- http://169.254.169.254/ 2>/dev/null || true"

# Test 4: Download and execute from /tmp — triggers "Container Drift Detected"
kubectl exec falco-test -- /bin/sh -c "cp /bin/ls /tmp/payload && /tmp/payload"

Check Falco alerts:

# View Falco logs for recent alerts
kubectl logs -l app.kubernetes.io/name=falco -n falco --tail=50 | grep -E "Warning|Critical"

# Or check the Falcosidekick UI at http://localhost:2802

Step 5: Simulating an Attack with BishopFox badPods

The badPods repository provides pod manifests that exploit various Kubernetes misconfigurations. Use these to validate your policies.

Create malicious-pod.yaml — a pod attempting multiple privilege escalations:

apiVersion: v1
kind: Pod
metadata:
  name: attack-pod
  labels:
    env: production
    owner: attacker
spec:
  hostNetwork: true
  hostPID: true
  containers:
    - name: attacker
      image: docker.io/library/alpine:latest
      securityContext:
        privileged: true
        capabilities:
          add: ["SYS_ADMIN", "NET_ADMIN"]
      volumeMounts:
        - name: host-root
          mountPath: /host
      command: ["/bin/sh", "-c"]
      args:
        - |
          echo "=== Node filesystem ==="
          ls /host/etc/kubernetes/
          echo "=== Host processes ==="
          ps aux
          echo "=== Service account token ==="
          cat /var/run/secrets/kubernetes.io/serviceaccount/token
          sleep 3600
  volumes:
    - name: host-root
      hostPath:
        path: /
        type: Directory

Attempt to deploy:

kubectl apply -f malicious-pod.yaml

Expected result with Gatekeeper active: The pod is rejected at admission with an error indicating the privileged security context is not allowed.

If Gatekeeper were not present and this pod somehow ran, Falco would fire multiple alerts:

  • Shell spawned in production container
  • Access to sensitive host paths
  • Reading of Kubernetes service account tokens
  • Potential container breakout via host filesystem mount

This demonstrates the value of layered defense — Gatekeeper blocks the deployment entirely, but Falco serves as the safety net if any misconfiguration allows the pod through.


Gatekeeper Audit: Scanning Existing Resources

Gatekeeper doesn't just block new deployments — it also audits existing resources. This catches workloads that were deployed before policies were in place.

# Check all constraint violations across the cluster
kubectl get constraints -o json | jq '.items[] | {
    kind: .kind,
    name: .metadata.name,
    violations: (.status.violations // [] | length),
    details: [.status.violations[]? | {
        name: .name,
        namespace: .namespace,
        message: .message
    }]
}'

This audit capability is critical for brownfield clusters where legacy workloads may not comply with newly applied policies.


Falco Alert Integration

Falco alerts are only useful if they reach the right people. Falcosidekick supports dozens of output channels:

# Upgrade Falco with Slack integration
helm upgrade falco falcosecurity/falco \
    --namespace falco \
    --reuse-values \
    --set falcosidekick.config.slack.webhookurl="https://hooks.slack.com/services/YOUR/WEBHOOK/URL" \
    --set falcosidekick.config.slack.minimumpriority="warning"

Other common integrations include:

  • Prometheus/Grafana — graph alert frequency over time, set up dashboards
  • PagerDuty/OpsGenie — page on-call for critical alerts
  • AWS Security Hub / GCP SCC — centralize findings in cloud-native SIEM
  • Elasticsearch/OpenSearch — full-text search and correlation across alerts

In a mature setup, Falco alerts feed into a SOAR platform that can automatically respond — for example, cordoning a node, killing a pod, or revoking a service account token when a critical alert fires.
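As a sketch of the routing logic such a pipeline applies, here is a hypothetical Python handler that parses a Falco JSON alert and dispatches on its priority. The responder functions are stand-ins; a real setup would call Slack, PagerDuty, or SOAR APIs rather than print:

```python
import json

# Hypothetical responders, standing in for real integrations.
def page_oncall(alert: dict) -> None:
    print(f"PAGE: {alert['rule']}")

def log_only(alert: dict) -> None:
    print(f"LOG: {alert['rule']}")

SEVERITY_ROUTES = {"Critical": page_oncall, "Warning": log_only}

def handle(raw: str) -> None:
    """Parse a Falco JSON alert and dispatch on its priority field."""
    alert = json.loads(raw)
    SEVERITY_ROUTES.get(alert["priority"], log_only)(alert)

sample = json.dumps({
    "rule": "Shell Spawned in Production Container",
    "priority": "Critical",
    "output_fields": {"k8s.pod.name": "falco-test"},
})
handle(sample)  # a Critical alert pages on-call
```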


Lab Cleanup

# Remove test pods
kubectl delete pod falco-test --ignore-not-found
kubectl delete pod attack-pod --ignore-not-found

# Remove Gatekeeper constraints and templates
kubectl delete k8sblockprivileged block-privileged-containers --ignore-not-found
kubectl delete k8sallowedrepos require-trusted-registries --ignore-not-found
kubectl delete k8srequiredlabels require-env-and-owner --ignore-not-found
kubectl delete constrainttemplate k8sblockprivileged k8sallowedrepos k8srequiredlabels --ignore-not-found

# Uninstall Falco and Gatekeeper
helm uninstall falco -n falco
helm uninstall gatekeeper -n gatekeeper-system

# Remove namespaces
kubectl delete namespace falco gatekeeper-system --ignore-not-found

# Remove lab files
rm -rf ~/k8s-runtime-lab/

Next Steps

  • Expand the Rego policy library using the Gatekeeper Library — pre-built templates for dozens of common security policies
  • Tune Falco rules for your environment using the Falco Rules Repository — reduce false positives by adding exceptions for legitimate processes
  • Implement automated response by wiring Falcosidekick to a responder (for example, Falco Talon or a serverless function) that deletes pods triggering critical alerts
  • Add network policies to complement Falco's network monitoring — enforce east-west traffic restrictions at the CNI level
  • Cross-reference with the eBPF and Seccomp Container Hardening tutorial for kernel-level container hardening that complements runtime monitoring
  • Test with the Trojanized Docker Image tutorial to see how Falco detects supply chain attacks at runtime