Vulnerability scanners like Grype and Trivy check known CVE databases — they tell you if a library has a published vulnerability. But they don't detect novel malware, backdoored binaries, or trojanized packages that haven't been assigned a CVE yet. That gap is where YARA comes in.
YARA is a pattern matching engine that lets you write custom rules to detect specific byte sequences, strings, and structural patterns in files. It doesn't need a CVE database — you define what "suspicious" looks like, and YARA finds it.
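This tutorial covers two labs: detecting the EICAR test file inside a Docker image, and detecting indicators of the Shai-Hulud 2.0 npm supply chain attack.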
For vulnerability and SBOM scanning (the other half of build security), see the Vulnerability & SBOM Scanning with Syft and Grype tutorial.
Example prototype available: https://github.com/kurtiepie/MalScan
YARA rules have three sections:
rule ExampleRule
{
meta:
description = "What this rule detects"
author = "Your name"
severity = "critical"
strings:
$text1 = "suspicious_string" ascii
$hex1 = { 4D 5A 90 00 }
$regex1 = /eval\(atob\(['"]/
condition:
any of them
}
meta: Human-readable metadata. Not used for matching, but useful for triage and reporting.
strings: The patterns to search for. Can be plain text (ascii/wide), hex byte sequences, or regular expressions.
condition: Boolean logic that decides if the rule fires. Can reference string matches, file size, file magic bytes, and more.
YARA vs antivirus signatures: AV engines use proprietary signature databases that update automatically. YARA puts you in control: you write rules for the specific threats relevant to your environment. This makes it ideal for detecting supply chain attacks where the malware is novel and hasn't been signatured by AV vendors yet.
The EICAR test file is a safe, non-destructive file developed by the European Institute for Computer Antivirus Research. Every AV engine recognizes it as malicious, making it perfect for testing detection pipelines without handling real malware.
mkdir -p ~/yara-lab && cd ~/yara-lab
curl -O https://secure.eicar.org/eicar.com.txt
The file contains a single string that triggers AV detection:
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
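If the download is blocked in transit, the file can be recreated locally, because it is nothing more than the 68-byte string above. The following is a minimal sketch; `write_eicar` and `is_eicar` are hypothetical helper names introduced here, not part of any tool.

```shell
# Hypothetical helpers: recreate and verify the EICAR test file locally.
# Single quotes keep the backslash, "$" and "!" characters literal.
EICAR='X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*'

write_eicar() {
  # printf '%s' writes the string without a trailing newline (68 bytes).
  printf '%s' "$EICAR" > "$1"
}

is_eicar() {
  # -F: fixed-string match, -q: quiet. Succeeds if the file contains the string.
  grep -qF -- "$EICAR" "$1"
}
```

For example, `write_eicar eicar.com.txt && is_eicar eicar.com.txt && echo ok` creates the file in the current directory and confirms its contents.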
Create rules/eicar.yar:
mkdir -p rules
cat > rules/eicar.yar << 'EOF'
rule EICAR_Test_File
{
meta:
description = "Detects the EICAR antivirus test file"
author = "YARA Lab"
reference = "https://www.eicar.org/download-anti-malware-testfile/"
severity = "info"
strings:
$eicar = "X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*" ascii
condition:
$eicar
}
EOF
Create Dockerfile-eicar:
FROM alpine:3.19
WORKDIR /app
COPY eicar.com.txt /app/
CMD ["cat", "/app/eicar.com.txt"]
Build and tag:
docker build -f Dockerfile-eicar -t eicar-test:0.1 .
If you run a local Docker registry, you can scan the registry storage directly:
# Push to local registry
docker tag eicar-test:0.1 localhost:5000/eicar-test:0.1
docker push localhost:5000/eicar-test:0.1
# Scan the registry storage
docker run --rm \
-v "$PWD/rules:/rules:ro" \
-v /opt/docker-registry/data/docker/registry/v2/:/malware:ro \
blacktop/yara -r /rules/eicar.yar /malware/repositories/eicar-test/
Without a local registry, export and scan the image layers:
# Save the image as a tar
docker save eicar-test:0.1 -o /tmp/eicar-test.tar
# Extract layers
mkdir -p /tmp/eicar-layers
tar -xf /tmp/eicar-test.tar -C /tmp/eicar-layers
# Scan all layers
docker run --rm \
-v "$PWD/rules:/rules:ro" \
-v /tmp/eicar-layers:/scan:ro \
blacktop/yara -r /rules/eicar.yar /scan/
Expected output:
EICAR_Test_File /scan/.../layer.tar
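Because YARA scans each layer archive as a single file, the match is reported against the layer rather than the file inside it, and a compressed layer would not match plain-string rules at all. One way to get per-file paths is to unpack every nested archive first. A rough sketch, assuming GNU or BSD `tar`; `unpack_layers` and the `/tmp/eicar-files` directory are hypothetical names, and the layer numbering simply follows `find` order rather than image order.

```shell
# unpack_layers SRC DEST (hypothetical helper): extract every nested tar archive
# found under SRC into its own directory under DEST.
unpack_layers() {
  src=$1; dest=$2; n=0
  mkdir -p "$dest"
  find "$src" -type f | while IFS= read -r f; do
    # Skip anything tar cannot list (manifest.json, image configs, ...).
    tar -tf "$f" >/dev/null 2>&1 || continue
    n=$((n + 1))
    mkdir -p "$dest/layer-$n"
    # Extraction errors (device nodes, ownership) are ignored on purpose.
    tar -xf "$f" -C "$dest/layer-$n" 2>/dev/null || true
    # Report which archive went into which directory.
    printf '%s\t%s\n' "layer-$n" "$f"
  done
}
```

A possible use is `unpack_layers /tmp/eicar-layers /tmp/eicar-files`, followed by pointing the scanner at `/tmp/eicar-files` instead of `/tmp/eicar-layers`. Keep DEST outside SRC so extracted files are not picked up again.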
Deepfence YaraHunter unpacks Docker image layers automatically and runs a comprehensive rule set:
docker run -i --rm --name=deepfence-yarahunter \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /tmp:/home/deepfence/output \
deepfenceio/deepfence_malware_scanner_ce:2.0.0 \
--image-name eicar-test:0.1 \
--output=json > eicar-results.json
Inspect the results:
jq '.[] | {rule: .rule_name, path: .complete_filename, severity: .severity}' eicar-results.json
In late 2025, the Shai-Hulud 2.0 campaign demonstrated what a fully automated npm supply chain worm looks like. Attackers compromised npm maintainer accounts and published trojanized versions of legitimate packages. Every developer who ran npm install on a compromised dependency got hit — before their application code even ran.
The attack chain exploited npm's lifecycle hooks — scripts in package.json that execute automatically during npm install:
npm install
└─> preinstall hook fires
    └─> node setup_bun.js
        └─> downloads and executes bun_environment.js
            ├─> harvests GitHub tokens (env vars, gh CLI)
            ├─> harvests npm tokens (NPM_TOKEN, .npmrc)
            ├─> creates GitHub repo for exfiltration
            ├─> deploys self-hosted GitHub Actions runner
            ├─> runs TruffleHog to scan filesystem for secrets
            ├─> extracts cloud secrets (AWS IMDS, Azure Key Vault, GCP)
            ├─> uploads stolen data as base64-encoded JSON
            └─> uses npm tokens to trojanize MORE packages (worm propagation)
Key IOCs (Indicators of Compromise):
| Indicator | Type |
|---|---|
| setup_bun.js | Loader script injected into packages |
| bun_environment.js | Obfuscated primary payload |
| "preinstall": "node setup_bun.js" | Malicious lifecycle hook in package.json |
| contents.json, environment.json, cloud.json | Exfiltration data files |
| truffleSecrets.json, actionsSecrets.json | Stolen secrets files |
| TruffleHog execution: trufflehog filesystem $HOME | Secret scanning tool abuse |
| AWS IMDS access: 169.254.169.254 | Cloud credential theft |
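Several of these indicators are file names, and YARA string rules match file contents, not names. A plain `find` can cover the name-based IOCs. This is a sketch only: `find_ioc_files` is a hypothetical helper, and the more generic names (contents.json, environment.json, cloud.json) are left out here on the assumption that they also occur in legitimate projects.

```shell
# find_ioc_files DIR (hypothetical helper): list files whose *names* match the
# more distinctive Shai-Hulud 2.0 IOC file names from the table above.
find_ioc_files() {
  find "$1" -type f \( \
    -name 'setup_bun.js' -o \
    -name 'bun_environment.js' -o \
    -name 'truffleSecrets.json' -o \
    -name 'actionsSecrets.json' \
  \) -print
}
```

Running `find_ioc_files node_modules` prints one path per hit and nothing when the tree is clean.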
Create rules/npm_supply_chain.yar:
rule NPM_ShaiHulud_Loader
{
meta:
description = "Detects Shai-Hulud 2.0 loader script (setup_bun.js)"
author = "YARA Lab"
reference = "https://www.sweet.security/blog/shai-hulud-2-0-escalating-npm-supply-chain-threat-demands-immediate-attention"
severity = "critical"
strings:
$filename = "setup_bun.js" ascii
$payload = "bun_environment" ascii
$preinstall = "preinstall" ascii
condition:
2 of them
}
rule NPM_ShaiHulud_Payload
{
meta:
description = "Detects Shai-Hulud 2.0 primary payload patterns"
severity = "critical"
strings:
$env_file = "bun_environment.js" ascii
$gh_token = "GITHUB_TOKEN" ascii
$npm_token = "NPM_TOKEN" ascii
$npmrc = ".npmrc" ascii
$trufflehog = "trufflehog" ascii
$imds = "169.254.169.254" ascii
condition:
3 of them
}
rule NPM_ShaiHulud_Exfil
{
meta:
description = "Detects Shai-Hulud 2.0 exfiltration artifacts"
severity = "high"
strings:
$contents = "contents.json" ascii
$environment = "environment.json" ascii
$cloud = "cloud.json" ascii
$truffleSec = "truffleSecrets.json" ascii
$actionsSec = "actionsSecrets.json" ascii
condition:
3 of them
}
rule NPM_Suspicious_Preinstall_Hook
{
meta:
description = "Detects suspicious preinstall hooks in package.json"
severity = "high"
strings:
$preinstall = /"preinstall"\s*:\s*"node\s+[a-zA-Z_]+\.js"/ ascii
$setup = "setup_" ascii
$bun = "_bun" ascii
condition:
$preinstall and ($setup or $bun)
}
rule NPM_Credential_Harvester
{
meta:
description = "Detects credential harvesting patterns common in npm supply chain attacks"
severity = "critical"
strings:
$gh_token = "GITHUB_TOKEN" ascii
$npm_token = "NPM_TOKEN" ascii
$npmrc = ".npmrc" ascii
$aws_key = "AWS_SECRET_ACCESS_KEY" ascii
$aws_imds = "169.254.169.254" ascii
$az_token = "az account get-access-token" ascii
$gcp_meta = "metadata.google.internal" ascii
$base64 = "base64" ascii
condition:
filesize < 500KB and 4 of them
}
rule NPM_Obfuscated_Malicious_JS
{
meta:
description = "Detects common obfuscation patterns in malicious npm packages"
severity = "medium"
strings:
$eval_atob = /eval\s*\(\s*atob\s*\(/ ascii
$eval_buffer = /eval\s*\(\s*Buffer\.from\s*\(/ ascii
$char_concat = /String\.fromCharCode\s*\(\s*\d+\s*(,\s*\d+\s*){10,}\)/ ascii
$hex_array = /\[0x[0-9a-f]+\s*(,\s*0x[0-9a-f]+\s*){20,}\]/ ascii
$reverse_exec = /\]\s*\.\s*reverse\s*\(\s*\)\s*\.\s*join/ ascii
condition:
any of them
}
Create a harmless simulated trojanized package for testing:
mkdir -p test-npm-package
# Simulated malicious package.json
cat > test-npm-package/package.json << 'EOF'
{
  "name": "totally-legit-package",
  "version": "1.0.0",
  "description": "A helpful utility",
  "scripts": {
    "preinstall": "node setup_bun.js"
  },
  "main": "index.js"
}
EOF
# Simulated loader (harmless — just echoes)
cat > test-npm-package/setup_bun.js << 'EOF'
// Simulated Shai-Hulud loader for YARA testing
// In the real attack, this downloads and executes bun_environment.js
console.log("SIMULATED: Would load bun_environment.js");
console.log("SIMULATED: Would harvest GITHUB_TOKEN and NPM_TOKEN");
console.log("SIMULATED: Would access 169.254.169.254 for AWS IMDS");
EOF
# Simulated payload (harmless)
cat > test-npm-package/bun_environment.js << 'EOF'
// Simulated Shai-Hulud payload for YARA testing
// In the real attack, this is heavily obfuscated
var targets = ["contents.json", "environment.json", "cloud.json",
"truffleSecrets.json", "actionsSecrets.json"];
console.log("SIMULATED: Would create exfiltration files:", targets);
console.log("SIMULATED: Would run trufflehog filesystem $HOME");
EOF
# Scan the test package directory
yara -r rules/npm_supply_chain.yar test-npm-package/
Expected output (YARA matches file contents rather than file names, so each rule fires only on the file whose contents contain its strings; line order may vary):
NPM_ShaiHulud_Loader test-npm-package/package.json
NPM_Suspicious_Preinstall_Hook test-npm-package/package.json
NPM_ShaiHulud_Payload test-npm-package/setup_bun.js
NPM_ShaiHulud_Exfil test-npm-package/bun_environment.js
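On a real dependency tree the output can run to many lines. Since YARA's default output is one `RULE path` pair per line, a small filter can collapse it into per-rule counts for triage. A minimal sketch; `summarize_hits` is a hypothetical name.

```shell
# summarize_hits (hypothetical helper): read "RULE path" lines on stdin and
# print "count rule", most frequent rule first (ties sorted by rule name).
summarize_hits() {
  awk 'NF >= 2 { count[$1]++ } END { for (r in count) print count[r], r }' |
    sort -k1,1nr -k2,2
}
```

For example: `yara -r rules/npm_supply_chain.yar test-npm-package/ | summarize_hits`.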
Build a Docker image that includes npm dependencies, then scan it:
# Dockerfile-node-app
FROM node:20-slim
WORKDIR /app
COPY test-npm-package/ /app/
CMD ["node", "index.js"]
# Build the image
docker build -f Dockerfile-node-app -t node-app-test:0.1 .
# Export and scan
docker save node-app-test:0.1 -o /tmp/node-app.tar
mkdir -p /tmp/node-layers
tar -xf /tmp/node-app.tar -C /tmp/node-layers
# Scan all layers for supply chain IOCs
yara -r rules/npm_supply_chain.yar /tmp/node-layers/
# Or use Deepfence YaraHunter with custom rules
docker run -i --rm \
-v /var/run/docker.sock:/var/run/docker.sock \
-v "$PWD/rules:/home/deepfence/rules:ro" \
-v /tmp:/home/deepfence/output \
deepfenceio/deepfence_malware_scanner_ce:2.0.0 \
--image-name node-app-test:0.1 \
--output=json > npm-scan-results.json
Catch compromised packages before they get baked into an image:
# After npm install, before docker build
yara -r rules/npm_supply_chain.yar node_modules/ 2>/dev/null
# Check specifically for suspicious preinstall hooks
find node_modules -name "package.json" -exec \
grep -l '"preinstall"' {} \; | while IFS= read -r pkg; do
echo "=== $pkg ==="
yara rules/npm_supply_chain.yar "$pkg"
done
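The loop above looks only at `preinstall`, but npm also runs `install` and `postinstall` scripts automatically during installation. A slightly broader listing might look like the sketch below. `list_install_hooks` is a hypothetical helper, and because it is a text match, a dependency literally named `install` would also be listed; a JSON-aware check is more precise.

```shell
# list_install_hooks DIR (hypothetical helper): print every package.json under
# DIR that contains a "preinstall", "install", or "postinstall" key.
list_install_hooks() {
  # grep exits non-zero when a batch has no match; "|| true" hides that so
  # the helper is safe under "set -e".
  find "$1" -type f -name package.json -exec \
    grep -lE '"(preinstall|install|postinstall)"[[:space:]]*:' {} + || true
}
```

`list_install_hooks node_modules` gives a short review list that can then be fed to YARA file by file.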
Choose strings that are specific to the threat and unlikely to appear in legitimate code:
# BAD — too generic, will fire on every Node.js app
strings:
$s1 = "require" ascii
# GOOD — specific to the attack pattern
strings:
$s1 = "preinstall" ascii
$s2 = "setup_bun.js" ascii
$s3 = "bun_environment" ascii
Use conditions that reduce false positives:
condition:
filesize < 500KB and // Don't scan huge files
3 of ($ioc_*) and // Require multiple IOC matches
not $legitimate // Exclude known good patterns
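To make the effect of such a condition concrete, here is a rough `grep`-based approximation of `filesize < 500KB and N of (...)`. It is an illustration of the logic, not a replacement for YARA: `ioc_gate` is a hypothetical helper, it handles fixed strings only, and it leaves out the `not $legitimate` exclusion.

```shell
# ioc_gate FILE MIN STRING... (hypothetical helper): succeed when FILE is
# smaller than 500 KB and contains at least MIN of the given fixed strings.
ioc_gate() {
  file=$1; min=$2; shift 2
  size=$(($(wc -c < "$file")))          # arithmetic strips wc's padding
  [ "$size" -lt 512000 ] || return 1    # YARA's KB suffix means 1024 bytes
  hits=0
  for s in "$@"; do
    # Each string counts once, no matter how often it occurs in the file.
    if grep -qF -- "$s" "$file"; then hits=$((hits + 1)); fi
  done
  [ "$hits" -ge "$min" ]
}
```

With the simulated loader from earlier, `ioc_gate test-npm-package/setup_bun.js 3 GITHUB_TOKEN NPM_TOKEN .npmrc 169.254.169.254` succeeds, while raising the threshold to 4 makes it fail. That is the same trade-off the `N of` count controls in a rule.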
Restrict rules to relevant file types:
condition:
// Only match ELF binaries
uint32(0) == 0x464C457F and $reverse_shell
// Alternatively, in a separate rule: only match files that start with "{" (JSON such as package.json)
uint8(0) == 0x7B and $preinstall_hook
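`uint32(0)` reads four bytes at offset 0 as a little-endian integer, so `0x464C457F` corresponds to the on-disk bytes 7F 45 4C 46 (`\x7fELF`). When a magic-byte condition does not fire as expected, it can help to look at what a file really starts with. A small sketch using POSIX `od`; `magic4` is a hypothetical helper name.

```shell
# magic4 FILE (hypothetical helper): print the first four bytes of FILE as
# lowercase hex, in on-disk order, with no separators.
magic4() {
  od -An -tx1 -N4 "$1" | tr -d ' \n'
}
```

An ELF binary yields `7f454c46`, and a file beginning with `{` yields a value starting with `7b`.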
Use filesize conditions to skip large files.
name: YARA Malware Scan
on:
  push:
    branches: [main]
  pull_request:
jobs:
  yara-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install YARA
        run: sudo apt-get update && sudo apt-get install -y yara
      - name: Scan source code for supply chain IOCs
        run: |
          # Skip rules/ and .git/: the rule file contains its own IOC strings
          # and would otherwise match itself. yara takes one target per call.
          find . -type f -not -path './rules/*' -not -path './.git/*' -print0 \
            | xargs -0 -n 1 yara rules/npm_supply_chain.yar \
            | tee yara-results.txt
          if [ -s yara-results.txt ]; then
            echo "::error::YARA detected suspicious patterns"
            cat yara-results.txt
            exit 1
          fi
      - name: Install dependencies
        run: npm ci
      - name: Scan node_modules
        run: |
          # Only stdout is captured, so warnings on stderr do not count as hits
          yara -r rules/npm_supply_chain.yar node_modules/ \
            | tee yara-deps-results.txt
          if [ -s yara-deps-results.txt ]; then
            echo "::error::YARA detected malicious dependencies"
            cat yara-deps-results.txt
            exit 1
          fi
      - name: Build Docker image
        run: docker build -t app:${{ github.sha }} .
      - name: Scan Docker image with YaraHunter
        run: |
          docker run -i --rm \
            -v /var/run/docker.sock:/var/run/docker.sock \
            -v /tmp:/home/deepfence/output \
            deepfenceio/deepfence_malware_scanner_ce:2.0.0 \
            --image-name app:${{ github.sha }} \
            --output=json > /tmp/yara-image-scan.json
          HITS=$(jq length /tmp/yara-image-scan.json)
          if [ "$HITS" -gt 0 ]; then
            echo "::error::YaraHunter found $HITS matches in Docker image"
            jq '.[].rule_name' /tmp/yara-image-scan.json
            exit 1
          fi
For a complete build security pipeline, combine YARA (malware detection) with Syft (SBOM generation) and Grype (CVE scanning). See the Vulnerability & SBOM Scanning with Syft and Grype tutorial for setting up the SBOM/CVE side.
A full pipeline looks like:
# 1. YARA — detect malware and supply chain attacks
yara -r rules/npm_supply_chain.yar node_modules/
# 2. Build the image
docker build -t myapp:latest .
# 3. Syft — generate SBOM
syft myapp:latest -o spdx-json > sbom.json
# 4. Grype — scan SBOM for CVEs
grype sbom:sbom.json --fail-on critical
# 5. YaraHunter — deep scan the built image
docker run -i --rm \
-v /var/run/docker.sock:/var/run/docker.sock \
deepfenceio/deepfence_malware_scanner_ce:2.0.0 \
--image-name myapp:latest
Each tool covers a different gap:
| Tool | Detects | Misses |
|---|---|---|
| YARA | Novel malware, backdoors, IOC patterns | Known CVEs, version-based vulnerabilities |
| Grype | Published CVEs with fix versions | Zero-days, trojanized packages without CVEs |
| Syft | Full dependency tree, license issues | Nothing about whether code is malicious |
| Trivy | CVEs + some misconfigs | Custom malware patterns |
# Remove test images
docker rmi eicar-test:0.1 node-app-test:0.1 2>/dev/null
# Remove exported layers
rm -rf /tmp/eicar-layers /tmp/node-layers /tmp/eicar-test.tar /tmp/node-app.tar
# Remove test files
rm -rf test-npm-package/
stratum+tcp://), data exfiltration patterns, and post-exploitation tools.
.deb packages