Docker and YARA Malware Scanning in the SDLC

Vulnerability scanners like Grype and Trivy check known CVE databases — they tell you if a library has a published vulnerability. But they don't detect novel malware, backdoored binaries, or trojanized packages that haven't been assigned a CVE yet. That gap is where YARA comes in.

YARA is a pattern matching engine that lets you write custom rules to detect specific byte sequences, strings, and structural patterns in files. It doesn't need a CVE database — you define what "suspicious" looks like, and YARA finds it.

This tutorial covers two labs:

  1. Lab 1 — Scanning Docker images for known malware using the EICAR test file
  2. Lab 2 — Writing YARA rules to detect the Shai-Hulud 2.0 npm supply chain worm, a real-world attack that compromised npm maintainer accounts and propagated through the ecosystem

For vulnerability and SBOM scanning (the other half of build security), see the Vulnerability & SBOM Scanning with Syft and Grype tutorial.

Example prototype available: https://github.com/kurtiepie/MalScan


What is YARA?

A YARA rule has up to three sections (meta and strings are optional; condition is required):

rule ExampleRule
{
    meta:
        description = "What this rule detects"
        author = "Your name"
        severity = "critical"

    strings:
        $text1 = "suspicious_string" ascii
        $hex1 = { 4D 5A 90 00 }
        $regex1 = /eval\(atob\(['"]/

    condition:
        any of them
}
  • meta — Human-readable metadata. Not used for matching, but useful for triage and reporting.
  • strings — The patterns to search for. Can be plain text (ascii/wide), hex byte sequences, or regular expressions.
  • condition — Boolean logic that decides if the rule fires. Can reference string matches, file size, file magic bytes, and more.
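Condition logic is easier to reason about with a toy model. The Python sketch below assumes a rule whose strings are all plain text and whose condition is `N of them`, and it mimics how such a rule decides to fire: each distinct string counts once, however many times it occurs. It is a teaching aid, not YARA's engine, which compiles strings into an automaton and also handles hex strings, regexes and modifiers. The name `n_of_them` is hypothetical. The three example strings are the ones the NPM_ShaiHulud_Loader rule in Lab 2 uses.

```python
from typing import Iterable


def n_of_them(data: bytes, strings: Iterable[bytes], n: int) -> bool:
    """Hypothetical model of a YARA condition such as `2 of them`.

    Counts how many distinct literal strings occur anywhere in `data`
    (a string that appears several times still counts once) and
    fires when at least `n` of them are present.
    """
    hits = sum(1 for s in set(strings) if s in data)
    return hits >= n


if __name__ == "__main__":
    loader_strings = [b"setup_bun.js", b"bun_environment", b"preinstall"]
    package_json = b'{"scripts": {"preinstall": "node setup_bun.js"}}'
    print(n_of_them(package_json, loader_strings, 2))  # True: 2 of 3 strings present
    print(n_of_them(b"console.log('hi')", loader_strings, 2))  # False: none present
```

A file that mentions only one loader string, such as a script that prints "bun_environment.js", does not satisfy `2 of them`. This is why the expected output of a scan depends on exactly which strings each file contains.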

YARA vs antivirus signatures: AV engines use proprietary signature databases that update automatically. YARA puts you in control — you write rules for the specific threats relevant to your environment. This makes it ideal for detecting supply chain attacks where the malware is novel and hasn't been signatured by AV vendors yet.


Lab 1: Docker Image Malware Scanning with EICAR

The EICAR test file is a harmless file developed by the European Institute for Computer Antivirus Research. Antivirus engines deliberately detect it as if it were malware, which makes it ideal for testing detection pipelines without handling real malicious code.

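Prerequisites

You need Docker, jq, and the yara command-line tool (Lab 2 runs it directly). Create a working directory:

mkdir -p ~/yara-lab && cd ~/yara-lab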

Step 1: Download the EICAR Test File

curl -O https://secure.eicar.org/eicar.com.txt

The file contains a single string that triggers AV detection:

X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
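Corporate proxies and endpoint antivirus often block that download. If you are allowed to create the file locally, a minimal Python alternative writes the same 68-byte string. Local antivirus may quarantine the file as soon as it lands on disk. The function name `write_eicar` is hypothetical.

```python
from pathlib import Path

# The standard 68-byte EICAR test string; the raw string keeps the backslash literal.
EICAR = r"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"


def write_eicar(path: str = "eicar.com.txt") -> Path:
    """Write the EICAR test string as ASCII bytes with no trailing newline."""
    target = Path(path)
    target.write_bytes(EICAR.encode("ascii"))
    return target


if __name__ == "__main__":
    out = write_eicar()
    print(out, out.stat().st_size)  # eicar.com.txt 68
```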

Step 2: Write a YARA Rule for EICAR

Create rules/eicar.yar:

mkdir -p rules
cat > rules/eicar.yar << 'EOF'
rule EICAR_Test_File
{
    meta:
        description = "Detects the EICAR antivirus test file"
        author = "YARA Lab"
        reference = "https://www.eicar.org/download-anti-malware-testfile/"
        severity = "info"

    strings:
        $eicar = "X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*" ascii

    condition:
        $eicar
}
EOF

Step 3: Create a Docker Image with the EICAR File

Create Dockerfile-eicar:

FROM alpine:3.19
WORKDIR /app
COPY eicar.com.txt /app/
CMD ["cat", "/app/eicar.com.txt"]

Build and tag:

docker build -f Dockerfile-eicar -t eicar-test:0.1 .

Step 4: Scan with YARA

Option A: Direct scan with blacktop/yara

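If you run a local Docker registry, you can scan its storage, with one caveat. repositories/<name>/ holds only small link files, while the layer content lives under blobs/ as gzip-compressed tarballs, and YARA matches raw bytes without decompressing them. Decompress the layer blobs first. The path below assumes the registry's data directory is mounted from /opt/docker-registry/data, and the loop covers every image in the registry, not only eicar-test:

# Push to local registry
docker tag eicar-test:0.1 localhost:5000/eicar-test:0.1
docker push localhost:5000/eicar-test:0.1

# Decompress gzip layer blobs into a scratch directory (non-gzip blobs such as manifests are skipped)
REGISTRY=/opt/docker-registry/data/docker/registry/v2
mkdir -p /tmp/registry-layers
find "$REGISTRY/blobs" -type f -name data | while IFS= read -r blob; do
    if gzip -t "$blob" 2>/dev/null; then
        gunzip -c "$blob" > "/tmp/registry-layers/$(basename "$(dirname "$blob")").tar"
    fi
done

# Scan the decompressed layers
docker run --rm \
    -v "$PWD/rules:/rules:ro" \
    -v /tmp/registry-layers:/malware:ro \
    blacktop/yara -r /rules/eicar.yar /malware/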

Option B: Scan exported image layers

Without a local registry, export and scan the image layers:

# Save the image as a tar
docker save eicar-test:0.1 -o /tmp/eicar-test.tar

# Extract layers
mkdir -p /tmp/eicar-layers
tar -xf /tmp/eicar-test.tar -C /tmp/eicar-layers

# Scan all layers
docker run --rm \
    -v "$PWD/rules:/rules:ro" \
    -v /tmp/eicar-layers:/scan:ro \
    blacktop/yara -r /rules/eicar.yar /scan/

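Expected output. The layer path varies by Docker version: older releases write <layer-id>/layer.tar, while newer ones use an OCI layout with layers under blobs/sha256/: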

EICAR_Test_File /scan/.../layer.tar
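YARA treats each layer.tar as one file. A hit therefore names the layer rather than the file inside it, and a multi-string rule can fire on strings that sit in different files of the same layer. Layer-aware scanners such as YaraHunter unpack layers first. The Python sketch below illustrates that idea with the standard library only:

- It opens a `docker save` archive.
- It tries every member as a nested, optionally compressed, tarball.
- It reports which files contain a literal byte pattern.

It is an illustration, not a YARA replacement: it matches one literal pattern, has no rules engine, and reads each file into memory, so it suits small images only. `scan_saved_image` and `build_demo_archive` are hypothetical names.

```python
import io
import tarfile


def scan_saved_image(archive_path: str, pattern: bytes) -> list[tuple[str, str]]:
    """Return (layer member, file path) pairs whose content contains `pattern`.

    Every regular member of the outer archive is tried as a tarball, so this
    works for both the legacy `<id>/layer.tar` layout and OCI `blobs/sha256/`.
    """
    hits = []
    with tarfile.open(archive_path, "r") as image:
        for member in image.getmembers():
            if not member.isfile():
                continue
            raw = image.extractfile(member).read()
            try:
                # "r:*" transparently handles uncompressed, gzip, bz2 and xz layers
                layer = tarfile.open(fileobj=io.BytesIO(raw), mode="r:*")
            except tarfile.TarError:
                continue  # manifests, image configs and other non-tar members
            with layer:
                for entry in layer.getmembers():
                    if entry.isfile() and pattern in layer.extractfile(entry).read():
                        hits.append((member.name, entry.name))
    return hits


def build_demo_archive(path: str) -> None:
    """Write a tiny fake `docker save` archive (legacy layout) for trying the scanner."""

    def tar_bytes(files: dict[str, bytes]) -> bytes:
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as tar:
            for name, data in files.items():
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
        return buf.getvalue()

    layer = tar_bytes({
        "app/eicar.com.txt": b"...EICAR-STANDARD-ANTIVIRUS-TEST-FILE...",
        "app/readme.txt": b"hello",
    })
    outer = tar_bytes({"manifest.json": b"[]", "abc123/layer.tar": layer})
    with open(path, "wb") as f:
        f.write(outer)


if __name__ == "__main__":
    build_demo_archive("demo-image.tar")
    print(scan_saved_image("demo-image.tar", b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE"))
    # [('abc123/layer.tar', 'app/eicar.com.txt')]
```

To try it on the image from this lab, call `scan_saved_image("/tmp/eicar-test.tar", b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE")`. Each hit pairs the layer member with the path of the file inside it.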

Option C: Deep scan with Deepfence YaraHunter

Deepfence YaraHunter unpacks Docker image layers automatically and runs a comprehensive rule set:

docker run -i --rm --name=deepfence-yarahunter \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /tmp:/home/deepfence/output \
    deepfenceio/deepfence_malware_scanner_ce:2.0.0 \
    --image-name eicar-test:0.1 \
    --output=json > eicar-results.json

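Inspect the results. The field names below assume YaraHunter's JSON output is an array of match objects; if the filter prints nothing, check the raw JSON for your release:

jq '.[] | {rule: .rule_name, path: .complete_filename, severity: .severity}' eicar-results.json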

Lab 2: Detecting npm Supply Chain Attacks (Shai-Hulud 2.0)

In late 2025, the Shai-Hulud 2.0 campaign demonstrated what a fully automated npm supply chain worm looks like. Attackers compromised npm maintainer accounts and published trojanized versions of legitimate packages. Every developer who ran npm install on a compromised dependency got hit — before their application code even ran.

How the Attack Worked

The attack chain exploited npm's lifecycle hooks — scripts in package.json that execute automatically during npm install:

npm install
  └─> preinstall hook fires
        └─> node setup_bun.js
              └─> installs the Bun runtime, then executes bun_environment.js
                    ├─> harvests GitHub tokens (env vars, gh CLI)
                    ├─> harvests npm tokens (NPM_TOKEN, .npmrc)
                    ├─> creates GitHub repo for exfiltration
                    ├─> deploys self-hosted GitHub Actions runner
                    ├─> runs TruffleHog to scan filesystem for secrets
                    ├─> extracts cloud secrets (AWS IMDS, Azure Key Vault, GCP)
                    ├─> uploads stolen data as base64-encoded JSON
                    └─> uses npm tokens to trojanize MORE packages (worm propagation)

Key IOCs (Indicators of Compromise):

Indicator                                          Description
-------------------------------------------------  ----------------------------------------
setup_bun.js                                       Loader script injected into packages
bun_environment.js                                 Obfuscated primary payload
"preinstall": "node setup_bun.js"                  Malicious lifecycle hook in package.json
contents.json, environment.json, cloud.json        Exfiltration data files
truffleSecrets.json, actionsSecrets.json           Stolen secrets files
TruffleHog execution: trufflehog filesystem $HOME  Secret-scanning tool abuse
AWS IMDS access: 169.254.169.254                   Cloud credential theft
AWS IMDS access: 169.254.169.254 Cloud credential theft

Step 1: Write YARA Rules for Shai-Hulud Detection

Create rules/npm_supply_chain.yar:

rule NPM_ShaiHulud_Loader
{
    meta:
        description = "Detects Shai-Hulud 2.0 loader script (setup_bun.js)"
        author = "YARA Lab"
        reference = "https://www.sweet.security/blog/shai-hulud-2-0-escalating-npm-supply-chain-threat-demands-immediate-attention"
        severity = "critical"

    strings:
        $filename = "setup_bun.js" ascii
        $payload = "bun_environment" ascii
        $preinstall = "preinstall" ascii

    condition:
        2 of them
}

rule NPM_ShaiHulud_Payload
{
    meta:
        description = "Detects Shai-Hulud 2.0 primary payload patterns"
        severity = "critical"

    strings:
        $env_file = "bun_environment.js" ascii
        $gh_token = "GITHUB_TOKEN" ascii
        $npm_token = "NPM_TOKEN" ascii
        $npmrc = ".npmrc" ascii
        $trufflehog = "trufflehog" ascii
        $imds = "169.254.169.254" ascii

    condition:
        3 of them
}

rule NPM_ShaiHulud_Exfil
{
    meta:
        description = "Detects Shai-Hulud 2.0 exfiltration artifacts"
        severity = "high"

    strings:
        $contents = "contents.json" ascii
        $environment = "environment.json" ascii
        $cloud = "cloud.json" ascii
        $truffleSec = "truffleSecrets.json" ascii
        $actionsSec = "actionsSecrets.json" ascii

    condition:
        3 of them
}

rule NPM_Suspicious_Preinstall_Hook
{
    meta:
        description = "Detects suspicious preinstall hooks in package.json"
        severity = "high"

    strings:
        $preinstall = /"preinstall"\s*:\s*"node\s+[a-zA-Z_]+\.js"/ ascii
        $setup = "setup_" ascii
        $bun = "_bun" ascii

    condition:
        $preinstall and ($setup or $bun)
}

rule NPM_Credential_Harvester
{
    meta:
        description = "Detects credential harvesting patterns common in npm supply chain attacks"
        severity = "critical"

    strings:
        $gh_token = "GITHUB_TOKEN" ascii
        $npm_token = "NPM_TOKEN" ascii
        $npmrc = ".npmrc" ascii
        $aws_key = "AWS_SECRET_ACCESS_KEY" ascii
        $aws_imds = "169.254.169.254" ascii
        $az_token = "az account get-access-token" ascii
        $gcp_meta = "metadata.google.internal" ascii
        $base64 = "base64" ascii

    condition:
        filesize < 500KB and 4 of them
}

rule NPM_Obfuscated_Malicious_JS
{
    meta:
        description = "Detects common obfuscation patterns in malicious npm packages"
        severity = "medium"

    strings:
        $eval_atob = /eval\s*\(\s*atob\s*\(/ ascii
        $eval_buffer = /eval\s*\(\s*Buffer\.from\s*\(/ ascii
        $char_concat = /String\.fromCharCode\s*\(\s*\d+\s*(,\s*\d+\s*){10,}\)/ ascii
        $hex_array = /\[0x[0-9a-f]+\s*(,\s*0x[0-9a-f]+\s*){20,}\]/ ascii
        $reverse_exec = /\]\s*\.\s*reverse\s*\(\s*\)\s*\.\s*join/ ascii

    condition:
        any of them
}
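The regex string in NPM_Suspicious_Preinstall_Hook is the easiest part of this rule set to get subtly wrong. For a simple pattern like this one, you can sanity-check it with Python's re module before running YARA. This assumes the pattern only uses syntax both engines treat the same way (\s, character classes, +). That holds here but not for every YARA regex. The sample inputs are illustrative.

```python
import re

# Same pattern as $preinstall in NPM_Suspicious_Preinstall_Hook, applied to bytes
PREINSTALL = re.compile(rb'"preinstall"\s*:\s*"node\s+[a-zA-Z_]+\.js"')

# Illustrative package.json fragments and whether the pattern should match them
SAMPLES = {
    b'"preinstall": "node setup_bun.js"': True,     # the Shai-Hulud hook
    b'"preinstall":"node   setup_bun.js"': True,    # whitespace variations still match
    b'"preinstall": "node setup-bun.js"': False,    # "-" is outside [a-zA-Z_]
    b'"preinstall": "node ./setup_bun.js"': False,  # so is a "./" path prefix
    b'"postinstall": "node setup_bun.js"': False,   # other hooks are not covered
}

for text, expected in SAMPLES.items():
    matched = PREINSTALL.search(text) is not None
    assert matched == expected, text
    print("match   " if matched else "no match", text.decode())
```

The last three cases show what this string does not catch. A loader named with a hyphen, a ./ path prefix, or a hook moved to postinstall would all slip past it, so consider broadening the pattern if you adapt the rule.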

Step 2: Create a Test npm Package (Simulated Trojan)

Create a harmless simulated trojanized package for testing:

mkdir -p test-npm-package

# Simulated malicious package.json
cat > test-npm-package/package.json << 'EOF'
{
  "name": "totally-legit-package",
  "version": "1.0.0",
  "description": "A helpful utility",
  "main": "index.js",
  "scripts": {
    "preinstall": "node setup_bun.js"
  }
}
EOF

# Simulated loader (harmless — just echoes)
cat > test-npm-package/setup_bun.js << 'EOF'
// Simulated Shai-Hulud loader for YARA testing
// In the real attack, this installs the Bun runtime and then executes bun_environment.js
console.log("SIMULATED: Would load bun_environment.js");
console.log("SIMULATED: Would harvest GITHUB_TOKEN and NPM_TOKEN");
console.log("SIMULATED: Would access 169.254.169.254 for AWS IMDS");
EOF

# Simulated payload (harmless)
cat > test-npm-package/bun_environment.js << 'EOF'
// Simulated Shai-Hulud payload for YARA testing
// In the real attack, this is heavily obfuscated
var targets = ["contents.json", "environment.json", "cloud.json",
               "truffleSecrets.json", "actionsSecrets.json"];
console.log("SIMULATED: Would create exfiltration files:", targets);
console.log("SIMULATED: Would run trufflehog filesystem $HOME");
EOF

Step 3: Scan with YARA

# Scan the test package directory
yara -r rules/npm_supply_chain.yar test-npm-package/

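Expected output (file order may vary):

NPM_ShaiHulud_Loader test-npm-package/package.json
NPM_Suspicious_Preinstall_Hook test-npm-package/package.json
NPM_ShaiHulud_Payload test-npm-package/setup_bun.js
NPM_ShaiHulud_Exfil test-npm-package/bun_environment.js

The simulated setup_bun.js contains only one of the three loader strings (bun_environment), so NPM_ShaiHulud_Loader does not fire on it. It does contain four payload strings (bun_environment.js, GITHUB_TOKEN, NPM_TOKEN, 169.254.169.254), so NPM_ShaiHulud_Payload fires there. The simulated bun_environment.js contains only one payload string (trufflehog), so it triggers NPM_ShaiHulud_Exfil alone.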

Step 4: Scan a Docker Image Containing npm Packages

Build a Docker image that includes npm dependencies, then scan it:

Create Dockerfile-node-app:

FROM node:20-slim
WORKDIR /app
COPY test-npm-package/ /app/
CMD ["node", "index.js"]

# Build the image
docker build -f Dockerfile-node-app -t node-app-test:0.1 .

# Export and scan
docker save node-app-test:0.1 -o /tmp/node-app.tar
mkdir -p /tmp/node-layers
tar -xf /tmp/node-app.tar -C /tmp/node-layers

# Scan all layers for supply chain IOCs
yara -r rules/npm_supply_chain.yar /tmp/node-layers/

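# Or use Deepfence YaraHunter with custom rules (the rules mount point below is an assumption; check the YaraHunter docs for where your release loads custom rules)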
docker run -i --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v "$PWD/rules:/home/deepfence/rules:ro" \
    -v /tmp:/home/deepfence/output \
    deepfenceio/deepfence_malware_scanner_ce:2.0.0 \
    --image-name node-app-test:0.1 \
    --output=json > npm-scan-results.json

Step 5: Scan node_modules Before Building

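Catch compromised packages before they get baked into an image. A plain npm install already runs preinstall hooks, so install with --ignore-scripts and scan before any package script executes:

# Install without running lifecycle scripts, then scan before docker build
npm install --ignore-scripts
yara -r rules/npm_supply_chain.yar node_modules/ 2>/dev/null

# Check specifically for suspicious preinstall hooks
find node_modules -name "package.json" -exec \
    grep -l '"preinstall"' {} + | while IFS= read -r pkg; do
    echo "=== $pkg ==="
    yara rules/npm_supply_chain.yar "$pkg"
done

# Once the scan is clean, run the skipped install scripts if your dependencies need them
npm rebuild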
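The grep in the loop above matches the word "preinstall" anywhere in a file, including a description field, and it ignores the other install-time hooks. A slightly more precise Python sketch parses each package.json and reads its scripts object, where npm lifecycle hooks are declared. It then reports the preinstall, install and postinstall entries that npm runs during installation. The function name and output format are hypothetical. Use it alongside the YARA scan, not instead of it.

```python
import json
import os
import tempfile
from pathlib import Path

# Lifecycle hooks that npm runs automatically while installing a package
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_hooks(root: str) -> list[tuple[str, str, str]]:
    """Return sorted (package.json path, hook name, command) tuples under `root`."""
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "package.json" not in filenames:
            continue
        path = os.path.join(dirpath, "package.json")
        try:
            with open(path, encoding="utf-8") as f:
                manifest = json.load(f)
        except (OSError, UnicodeDecodeError, json.JSONDecodeError):
            continue  # unreadable or invalid JSON; the YARA scan still covers it
        scripts = manifest.get("scripts") if isinstance(manifest, dict) else None
        if not isinstance(scripts, dict):
            continue
        for hook in INSTALL_HOOKS:
            command = scripts.get(hook)
            if isinstance(command, str):
                findings.append((path, hook, command))
    return sorted(findings)


if __name__ == "__main__":
    # Demo on a throwaway node_modules tree with one hooked package
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp, "node_modules", "totally-legit-package")
        pkg.mkdir(parents=True)
        (pkg / "package.json").write_text(json.dumps(
            {"name": "totally-legit-package", "scripts": {"preinstall": "node setup_bun.js"}}
        ))
        for path, hook, command in find_install_hooks(tmp):
            print(f"{hook}: {command}  ({path})")
```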

Writing Effective YARA Rules

String Selection

Choose strings that are specific to the threat and unlikely to appear in legitimate code:

// BAD: too generic, will fire on every Node.js app
strings:
    $s1 = "require" ascii

// GOOD: specific to the attack pattern
strings:
    $s1 = "preinstall" ascii
    $s2 = "setup_bun.js" ascii
    $s3 = "bun_environment" ascii

Condition Logic

Use conditions that reduce false positives:

condition:
    filesize < 500KB and    // Don't report matches in huge files
    3 of ($ioc_*) and       // Require multiple IOC matches
    not $legitimate         // Exclude known good patterns

File Type Filters

Restrict rules to relevant file types:

// Only match ELF binaries (magic bytes 7F 45 4C 46, read little-endian)
condition:
    uint32(0) == 0x464C457F and $reverse_shell

// Only match files that start with "{", such as package.json
condition:
    uint8(0) == 0x7B and $preinstall_hook
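The ELF constant looks byte-reversed because YARA's uint32() reads a little-endian integer (uint32be() is the big-endian variant). Python's struct module performs the same conversion on the four ELF magic bytes, which is a quick way to derive the constant for any other file signature:

```python
import struct

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF binary

# "<I" = little-endian unsigned 32-bit integer, the same reading as YARA's uint32(0)
little = struct.unpack("<I", ELF_MAGIC)[0]
# ">I" = big-endian, the same reading as YARA's uint32be(0)
big = struct.unpack(">I", ELF_MAGIC)[0]

print(hex(little))  # 0x464c457f
print(hex(big))     # 0x7f454c46
```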

Performance

  • Avoid expensive regex patterns in hot paths
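  • Remember that filesize conditions only stop a rule from firing; YARA still reads and scans the file. To skip large files entirely, use the CLI's size limit (recent yara releases provide --skip-larger)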
  • Put the cheapest conditions first (short-circuit evaluation)
  • Test rules against a large benign corpus before deploying

Integrating YARA into CI/CD

GitHub Actions Example

name: YARA Malware Scan
on:
  push:
    branches: [main]
  pull_request:

jobs:
  yara-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

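      - name: Install YARA
        run: sudo apt-get update && sudo apt-get install -y yara

      - name: Scan source code for supply chain IOCs
        run: |
          # Skip rules/ (the rule file itself contains every IOC string), .git and node_modules
          find . \( -path ./rules -o -path ./.git -o -path ./node_modules \) -prune -o \
            -type f -exec yara rules/npm_supply_chain.yar {} \; | tee "$RUNNER_TEMP/yara-results.txt"
          if [ -s "$RUNNER_TEMP/yara-results.txt" ]; then
            echo "::error::YARA detected suspicious patterns"
            exit 1
          fi

      # --ignore-scripts keeps preinstall/postinstall hooks from running before the scan
      - name: Install dependencies
        run: npm ci --ignore-scripts

      - name: Scan node_modules
        run: |
          yara -r rules/npm_supply_chain.yar node_modules/ | tee "$RUNNER_TEMP/yara-deps-results.txt"
          if [ -s "$RUNNER_TEMP/yara-deps-results.txt" ]; then
            echo "::error::YARA detected malicious dependencies"
            exit 1
          fi

      - name: Run dependency install scripts after a clean scan
        run: npm rebuild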

      - name: Build Docker image
        run: docker build -t app:${{ github.sha }} .

      - name: Scan Docker image with YaraHunter
        run: |
          docker run -i --rm \
            -v /var/run/docker.sock:/var/run/docker.sock \
            -v /tmp:/home/deepfence/output \
            deepfenceio/deepfence_malware_scanner_ce:2.0.0 \
            --image-name app:${{ github.sha }} \
            --output=json > /tmp/yara-image-scan.json
          HITS=$(jq length /tmp/yara-image-scan.json)
          if [ "$HITS" -gt 0 ]; then
            echo "::error::YaraHunter found $HITS matches in Docker image"
            jq '.[].rule_name' /tmp/yara-image-scan.json
            exit 1
          fi

Combined Pipeline with Syft and Grype

For a complete build security pipeline, combine YARA (malware detection) with Syft (SBOM generation) and Grype (CVE scanning). See the Vulnerability & SBOM Scanning with Syft and Grype tutorial for setting up the SBOM/CVE side.

A full pipeline looks like:

# 1. YARA: install dependencies without running lifecycle scripts, then scan them
npm ci --ignore-scripts
yara -r rules/npm_supply_chain.yar node_modules/

# 2. Build the image
docker build -t myapp:latest .

# 3. Syft — generate SBOM
syft myapp:latest -o spdx-json > sbom.json

# 4. Grype — scan SBOM for CVEs
grype sbom:sbom.json --fail-on critical

# 5. YaraHunter — deep scan the built image
docker run -i --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    deepfenceio/deepfence_malware_scanner_ce:2.0.0 \
    --image-name myapp:latest

Each tool covers a different gap:

Tool   Covers                                              Misses
-----  --------------------------------------------------  -------------------------------------------
YARA   Novel malware, backdoors, IOC patterns              Known CVEs, version-based vulnerabilities
Grype  Published CVEs, with fix versions where available   Zero-days, trojanized packages without CVEs
Syft   Full dependency inventory (SBOM) with license data  Whether the code is malicious
Trivy  CVEs plus some misconfigurations                    Custom malware patterns

Cleanup

# Remove test images
docker rmi eicar-test:0.1 node-app-test:0.1 2>/dev/null

# Remove exported layers
rm -rf /tmp/eicar-layers /tmp/node-layers /tmp/eicar-test.tar /tmp/node-app.tar

# Remove test files
rm -rf test-npm-package/

Next Steps

  • Expand the YARA rule set — Add rules for cryptominers (stratum+tcp://), data exfiltration patterns, and post-exploitation tools
  • Integrate with Falco — Detect runtime execution of YARA-flagged binaries. See the Runtime K8s Monitoring with Gatekeeper and Falco tutorial
  • Build a malware sample repository — Collect IOCs from real supply chain incidents and write rules against them
  • Test against the Debian Package Supply Chain lab — Use the YARA reverse shell detection rules from that tutorial to scan .deb packages
  • Automate SBOM + YARA in your pipeline — Follow the Vulnerability & SBOM Scanning with Syft and Grype tutorial to add CVE scanning alongside YARA