performing-malware-triage-with-yara
Performs rapid malware triage and classification using YARA rules to match file patterns, strings, byte sequences, and structural characteristics against known malware families and suspicious indicators. Covers rule writing, scanning, and integration with analysis pipelines. Activates for requests involving YARA rule creation, malware classification, pattern matching, sample triage, or signature-based detection.
Best use case
performing-malware-triage-with-yara is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using performing-malware-triage-with-yara should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/performing-malware-triage-with-yara/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How performing-malware-triage-with-yara Compares
| Feature / Agent | performing-malware-triage-with-yara | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
# Performing Malware Triage with YARA
## When to Use
- Rapidly classifying a large batch of malware samples against known family signatures
- Writing detection rules for a newly analyzed malware family based on unique byte patterns
- Scanning file shares, endpoints, or memory dumps for indicators of a specific threat
- Building automated triage pipelines that classify samples before manual analysis
- Hunting for variants of a known threat across an enterprise using YARA scans
**Do not use** as the sole analysis method; YARA triage identifies known patterns but does not reveal new or unknown malware behaviors.
## Prerequisites
- YARA 4.x installed (`apt install yara` for the command-line tools; `pip install yara-python` provides the Python bindings only)
- YARA rule repositories (YARA-Rules, awesome-yara, Malpedia rules, Florian Roth's signature-base)
- Python 3.8+ with `yara-python` for scripted scanning
- Sample collection organized in a directory structure for batch scanning
- Understanding of PE file format, hex patterns, and regular expressions for rule writing
## Workflow
### Step 1: Scan Samples with Existing Rule Sets
Apply community and commercial YARA rules to classify samples:
```bash
# Scan a single file
yara -s malware_rules.yar suspect.exe
# Scan a directory of samples
yara -r malware_rules.yar /path/to/samples/
# Scan with multiple rule files
yara -r rules/apt_rules.yar rules/ransomware_rules.yar rules/trojan_rules.yar suspect.exe
# Scan with timeout (prevent hanging on large files); -t filters by tag, so use --timeout (short form -a)
yara --timeout=30 malware_rules.yar suspect.exe
# Scan and show matching strings
yara -s -r malware_rules.yar suspect.exe
# Scan with compiled rules (faster for repeated scans); -C tells yara the rules file is precompiled
yarac malware_rules.yar compiled_rules.yarc
yara -C compiled_rules.yarc suspect.exe
```
```bash
# Download community rule sets
git clone https://github.com/Yara-Rules/rules.git yara-community-rules
git clone https://github.com/Neo23x0/signature-base.git signature-base
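# Scan with signature-base (assumption to verify against its README: some rule files
# expect external variables such as filename or filepath, which must be supplied
# with -d or the files excluded, otherwise compilation fails)
yara -r signature-base/yara/*.yar suspect.exe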
```
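To feed the command-line results into a pipeline, the text printed by `yara -s` can be parsed into records. The sketch below assumes the default output layout (a `RULE FILE` header line followed by `0xOFFSET:$identifier: data` lines) with no `-g` or `-m` flags; the function name `parse_yara_output` and the sample text are illustrative, not part of YARA.

```python
import re

# String-match line printed by `yara -s`: "0x<offset>:$<identifier>: <data>"
STRING_LINE = re.compile(r"^0x([0-9a-fA-F]+):(\$\w*): ?(.*)$")


def parse_yara_output(text):
    """Group `yara -s` output into one record per (rule, file) match."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        m = STRING_LINE.match(line)
        if m and records:
            # Attach the string hit to the most recent rule/file header
            records[-1]["strings"].append(
                {"offset": int(m.group(1), 16), "id": m.group(2), "data": m.group(3)}
            )
        else:
            rule, _, path = line.partition(" ")
            records.append({"rule": rule, "file": path, "strings": []})
    return records


# Hypothetical output text, modeled on the examples in this document
sample = """MalwareX_Strings suspect.exe
0x4a20:$url1: /gate.php?id=
0x5100:$mutex: Global\\CryptLocker_2025
"""

for record in parse_yara_output(sample):
    print(record["rule"], record["file"], len(record["strings"]))
```

For large batches, the `yara-python` bindings used in Step 5 avoid text parsing altogether; this approach is mainly useful when only the CLI is available.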
### Step 2: Write Rules for Unique String Patterns
Create YARA rules based on strings extracted during malware analysis:
```
rule MalwareX_Strings {
meta:
description = "Detects MalwareX based on unique strings"
author = "analyst"
date = "2025-09-15"
reference = "Internal Analysis Report #1547"
hash = "e3b0c44298fc1c149afbf4c8996fb924"
tlp = "WHITE"
strings:
// C2 URL pattern
$url1 = "/gate.php?id=" ascii
$url2 = "/panel/connect.php" ascii
// Unique mutex name
$mutex = "Global\\CryptLocker_2025" ascii wide
// User-Agent string
$ua = "Mozilla/5.0 (compatible; MSIE 10.0)" ascii
// Registry persistence path
$reg = "Software\\Microsoft\\Windows\\CurrentVersion\\Run\\WindowsUpdate" ascii
// Campaign identifier
$campaign = "campaign_2025_q3" ascii
condition:
uint16(0) == 0x5A4D and // PE file (MZ header)
filesize < 500KB and // Size constraint
($url1 or $url2) and // At least one C2 URL
($mutex or $campaign or $reg) and // Mutex, campaign ID, or persistence key ($reg must be referenced, or YARA rejects the rule)
$ua // Specific User-Agent
}
```
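When many candidate strings come out of analysis, a draft rule can be generated rather than typed by hand. This is a minimal sketch, assuming printable strings only; `yara_escape` and `build_rule` are hypothetical helper names, and the generated condition (`N of ($s*)` plus the MZ check) is a starting point that still needs manual tightening.

```python
def yara_escape(s):
    """Escape backslashes and double quotes for a YARA text string."""
    return s.replace("\\", "\\\\").replace('"', '\\"')


def build_rule(name, strings, min_matches=2):
    """Render a draft rule: MZ header check plus N of the given strings."""
    lines = [f"rule {name} {{", "    strings:"]
    for i, s in enumerate(strings, start=1):
        lines.append(f'        $s{i} = "{yara_escape(s)}" ascii wide')
    lines += [
        "    condition:",
        f"        uint16(0) == 0x5A4D and {min_matches} of ($s*)",
        "}",
    ]
    return "\n".join(lines)


# Example input taken from the strings used in the rule above
print(build_rule("MalwareX_Draft", ["/gate.php?id=", "Global\\CryptLocker_2025"]))
```

Because every generated string is covered by `$s*` in the condition, the draft avoids YARA's "unreferenced string" compile error.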
### Step 3: Write Rules for Byte Patterns
Create rules matching specific code sequences:
```
rule MalwareX_Decryptor {
meta:
description = "Detects MalwareX XOR decryption routine"
author = "analyst"
date = "2025-09-15"
strings:
// XOR decryption loop (x86 assembly)
// mov al, [esi+ecx]
// xor al, [edi+ecx]
// mov [esi+ecx], al
// inc ecx
// cmp ecx, edx
// jl loop
$xor_loop = { 8A 04 0E 32 04 0F 88 04 0E 41 3B CA 7C F2 } // 7C F2 = jl -14, back to the start of the 14-byte loop
// RC4 KSA initialization (256-byte loop)
$rc4_ksa = { 33 C0 88 04 ?8 40 3D 00 01 00 00 7? }
// Embedded RSA public key marker
$rsa_key = { 06 02 00 00 00 A4 00 00 52 53 41 31 } // PUBLICKEYBLOB
condition:
uint16(0) == 0x5A4D and
($xor_loop or $rc4_ksa) and
$rsa_key
}
```
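To see what a hex string with wildcards accepts before running it through YARA, the pattern can be emulated with a byte-level regular expression. The sketch below handles only plain bytes, `??`, and nibble wildcards such as `?8` or `7?`; jumps (`[N-M]`) and alternatives are not covered, and the helper names are illustrative.

```python
import re


def token_values(tok):
    """Return every byte value a two-character hex token can match."""
    hi = range(16) if tok[0] == "?" else [int(tok[0], 16)]
    lo = range(16) if tok[1] == "?" else [int(tok[1], 16)]
    return [h << 4 | l for h in hi for l in lo]


def compile_hex_pattern(pattern):
    """Translate a YARA-style hex string into a compiled bytes regex."""
    parts = []
    for tok in pattern.split():
        # One character class per token, listing each allowed byte as \xHH
        cls = "".join(f"\\x{v:02x}" for v in token_values(tok))
        parts.append(f"[{cls}]")
    return re.compile("".join(parts).encode("ascii"), re.DOTALL)


# The $rc4_ksa pattern from the rule above, tested against made-up bytes
rx = compile_hex_pattern("33 C0 88 04 ?8 40 3D 00 01 00 00 7?")
blob = bytes.fromhex("9033c0880408403d000100007cf5")
hit = rx.search(blob)
print(hex(hit.start()) if hit else "no match")
```

Listing the values a nibble wildcard admits (16 per token) is a quick way to judge whether a pattern is broader than intended.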
### Step 4: Write Rules with PE Module
Leverage YARA's PE module for structural detection:
```
import "pe"
import "hash"
import "math"
rule MalwareX_PE_Characteristics {
meta:
description = "Detects MalwareX by PE structure and imports"
author = "analyst"
condition:
pe.is_pe and
// Compiled within specific timeframe
pe.timestamp > 1693526400 and // After 2023-09-01
pe.timestamp < 1727740800 and // Before 2024-10-01
// At least one family indicator; the outer parentheses keep the PE and
// timestamp checks applied to every branch ("and" binds tighter than "or")
(
// Specific import hash
pe.imphash() == "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6" or
// Suspicious import combination
(
pe.imports("kernel32.dll", "VirtualAllocEx") and
pe.imports("kernel32.dll", "WriteProcessMemory") and
pe.imports("kernel32.dll", "CreateRemoteThread") and
pe.imports("wininet.dll", "InternetOpenA")
) or
// High entropy .text section (packed)
(
for any section in pe.sections : (
section.name == ".text" and
math.entropy(section.raw_data_offset, section.raw_data_size) > 7.0
)
)
)
}
rule MalwareX_Rich_Header {
meta:
description = "Detects MalwareX by Rich header hash"
condition:
pe.is_pe and
hash.md5(pe.rich_signature.clear_data) == "abc123def456abc123def456abc123de"
}
```
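Two values in the rule above are easy to get wrong by hand: the Unix timestamps and the entropy threshold. The following sketch, using only the standard library, derives the timestamps from calendar dates and computes Shannon entropy in bits per byte, which is the quantity `math.entropy` reports. The function names are illustrative.

```python
import math
from collections import Counter
from datetime import datetime, timezone


def shannon_entropy(data):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def epoch(year, month, day):
    """UTC midnight as a Unix timestamp, for pe.timestamp comparisons."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


print(epoch(2023, 9, 1), epoch(2024, 10, 1))
print(round(shannon_entropy(bytes(range(256))), 2))  # uniform bytes reach the maximum
print(round(shannon_entropy(b"\x00" * 1024), 2))     # constant data has no entropy
```

Compressed or encrypted sections typically sit close to 8.0, while plain compiled code is noticeably lower, which is why a threshold around 7.0 is used as a packing heuristic.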
### Step 5: Batch Triage with Python
Automate scanning of sample collections:
```python
import hashlib
import json
import os

import yara

# Compile all rule files; each key becomes the namespace reported for its matches
rule_files = {
    "apt": "rules/apt_rules.yar",
    "ransomware": "rules/ransomware_rules.yar",
    "trojan": "rules/trojan_rules.yar",
    "custom": "rules/custom_rules.yar",
}
rules = yara.compile(filepaths=rule_files)


def matched_strings(match):
    """Return [offset, identifier, data] entries across yara-python versions."""
    entries = []
    for s in match.strings:
        if hasattr(s, "instances"):  # yara-python 4.3+: StringMatch objects
            hits = [(i.offset, s.identifier, i.matched_data) for i in s.instances]
        else:  # older releases: (offset, identifier, data) tuples
            hits = [s]
        for offset, identifier, data in hits:
            text = data.decode("utf-8", errors="replace")[:100]
            entries.append([hex(offset), identifier, text])
    return entries


# Scan sample directory
results = []
sample_dir = "/path/to/samples"
for filename in sorted(os.listdir(sample_dir)):
    filepath = os.path.join(sample_dir, filename)
    if not os.path.isfile(filepath):
        continue
    with open(filepath, "rb") as f:
        data = f.read()
    matches = rules.match(data=data)  # scan the bytes already read for hashing
    result = {
        "filename": filename,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "matches": [
            {"rule": m.rule, "namespace": m.namespace, "tags": m.tags,
             "strings": matched_strings(m)}
            for m in matches
        ],
        "classification": "UNKNOWN",
    }
    if result["matches"]:
        # The namespace of the first match decides the classification
        result["classification"] = result["matches"][0]["namespace"].upper()
    results.append(result)

# Summary (guard against an empty sample directory)
classified = sum(1 for r in results if r["classification"] != "UNKNOWN")
pct = classified / len(results) * 100 if results else 0.0
print(f"Scanned: {len(results)} samples")
print(f"Classified: {classified} ({pct:.1f}%)")
print(f"Unknown: {len(results) - classified}")

# Export results
with open("triage_results.json", "w") as f:
    json.dump(results, f, indent=2)
```
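The exported results can then be rolled up into per-class and per-rule counts. This is a small sketch assuming the result dictionaries have the `classification` and `matches` keys written by the script above; `summarize` is a hypothetical helper and the demo records are invented.

```python
from collections import Counter


def summarize(results, top_n=5):
    """Build a plain-text summary from triage result dictionaries."""
    total = len(results)
    by_class = Counter(r["classification"] for r in results)
    by_rule = Counter(m["rule"] for r in results for m in r["matches"])
    lines = [f"Samples Scanned: {total}", "CLASSIFICATION SUMMARY"]
    for label, count in by_class.most_common():
        pct = count / total * 100 if total else 0.0
        lines.append(f"{label}: {count} samples ({pct:.1f}%)")
    lines.append("TOP MATCHING RULES")
    for rule, count in by_rule.most_common(top_n):
        lines.append(f"{rule} {count}")
    return "\n".join(lines)


# Invented records in the same shape as triage_results.json entries
demo = [
    {"classification": "RANSOMWARE", "matches": [{"rule": "LockBit3_Ransom_Note"}]},
    {"classification": "UNKNOWN", "matches": []},
    {"classification": "UNKNOWN", "matches": []},
]
print(summarize(demo))
```

In practice the input would come from `json.load` on `triage_results.json` rather than an inline list.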
### Step 6: Validate and Optimize Rules
Test rules for false positives and performance:
```bash
# Test rule syntax (compile only and discard the output; -C would instead load precompiled rules)
yarac custom_rules.yar /dev/null
# Scan known-clean directory to check false positives
yara -r custom_rules.yar /path/to/clean_files/ > false_positives.txt
wc -l false_positives.txt
# Benchmark rule performance
time yara -r custom_rules.yar /path/to/large_sample_collection/
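# Print rule-set statistics; per-rule timing requires a YARA build configured with --enable-profiling (-p only sets the thread count)
yara --print-stats custom_rules.yar suspect.exe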
```
## Key Concepts
| Term | Definition |
|------|------------|
| **YARA Rule** | Pattern matching rule defining strings, byte sequences, and conditions that identify a specific file or malware family |
| **Condition** | Boolean expression combining string matches, file properties, and module functions to determine if a rule matches |
| **Hex String** | Byte pattern with optional wildcards (??) and jumps ([N-M]) for matching machine code or binary data |
| **PE Module** | YARA module providing access to PE file properties (imports, sections, timestamps, resources) for structural matching |
| **Imphash** | MD5 hash computed over a PE file's normalized, ordered list of imported libraries and functions; samples built from the same code base often share an imphash |
| **Rich Header** | Undocumented PE structure containing compiler/linker metadata; consistent within malware build environments |
| **Compiled rules** | Binary rule file produced by `yarac` and loaded with `yara -C`; avoids recompiling the rule source on every scan |
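
To make the import-hash idea concrete, the sketch below follows the commonly described scheme: lowercase each library name, strip a `.dll`, `.ocx`, or `.sys` extension, join `library.function` pairs with commas in import-table order, and take the MD5. It takes the import list as input rather than parsing a PE file, and `imphash_from_imports` is a hypothetical name; real tools also resolve ordinal-only imports, which is omitted here.

```python
import hashlib


def imphash_from_imports(imports):
    """Hash an ordered list of (dll_name, function_name) pairs."""
    parts = []
    for dll, func in imports:
        lib = dll.lower()
        for ext in (".dll", ".ocx", ".sys"):
            if lib.endswith(ext):
                lib = lib[: -len(ext)]
                break
        parts.append(f"{lib}.{func.lower()}")
    return hashlib.md5(",".join(parts).encode("ascii")).hexdigest()


# Import names borrowed from the PE rule in Step 4
imports = [
    ("KERNEL32.dll", "VirtualAllocEx"),
    ("KERNEL32.dll", "WriteProcessMemory"),
    ("WININET.dll", "InternetOpenA"),
]
print(imphash_from_imports(imports))
```

Because the hash depends on import order, relinking or adding a single import changes the value, so it works best alongside string and byte-pattern conditions.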
## Tools & Systems
- **YARA**: Pattern matching engine for identifying and classifying malware based on text, hex, and structural patterns
- **yara-python**: Python bindings for YARA enabling scripted scanning, rule compilation, and integration with analysis pipelines
- **yarGen**: Automatic YARA rule generator that creates rules from malware samples by identifying unique strings and opcodes
- **YARA-Rules (GitHub)**: Community-maintained repository of YARA rules covering malware families, exploits, and suspicious indicators
- **Malpedia YARA**: Curated YARA rules from the Malpedia malware encyclopedia with high-quality family-specific rules
## Common Scenarios
### Scenario: Creating Detection Rules for a New Malware Family
**Context**: Reverse engineering of a new malware sample has identified unique strings, byte patterns, and PE characteristics. YARA rules are needed for enterprise-wide hunting and ongoing detection.
**Approach**:
1. Extract unique strings from the unpacked binary (C2 URLs, mutex names, registry paths)
2. Identify unique byte sequences from the encryption routine or C2 protocol (from Ghidra analysis)
3. Record PE characteristics (imphash, Rich header hash, section names, compilation timestamp range)
4. Write a YARA rule combining string, byte pattern, and PE module conditions
5. Test against the known malware samples to confirm true positive detection
6. Test against a clean file corpus (Windows system files, common applications) to verify zero false positives
7. Deploy to enterprise scanning infrastructure and threat intelligence platform
**Pitfalls**:
- Writing rules too specific to a single sample (will not detect variants with minor changes)
- Writing rules too generic (matching legitimate software, causing false positives)
- Using strings that appear in common libraries or frameworks (e.g., OpenSSL strings)
- Not testing on a sufficiently large clean corpus before deployment
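
Steps 5 and 6 of the approach can be turned into a simple pre-deployment check. The sketch below assumes two captures of default `yara -r` output (one `RULE FILE` line per match, no `-s`), one from the known-malware set and one from the clean corpus; `hits_by_rule`, `evaluate`, and the sample paths are illustrative.

```python
from collections import defaultdict


def hits_by_rule(yara_output):
    """Map rule name -> set of matched files from `RULE FILE` lines."""
    hits = defaultdict(set)
    for line in yara_output.splitlines():
        rule, sep, path = line.strip().partition(" ")
        if sep:
            hits[rule].add(path)
    return hits


def evaluate(malware_output, clean_output, malware_total):
    """Report detection rate and false positives for each rule."""
    detected = hits_by_rule(malware_output)
    false_pos = hits_by_rule(clean_output)
    report = {}
    for rule in set(detected) | set(false_pos):
        rate = len(detected.get(rule, ())) / malware_total if malware_total else 0.0
        report[rule] = {
            "detection_rate": rate,
            "false_positives": sorted(false_pos.get(rule, ())),
        }
    return report


# Invented scan output for two rules
malware_scan = "MalwareX_Strings /samples/a.exe\nMalwareX_Strings /samples/b.exe\n"
clean_scan = "Generic_Packer /clean/notepad.exe\n"
for rule, stats in sorted(evaluate(malware_scan, clean_scan, malware_total=2).items()):
    print(rule, stats)
```

A rule with a low detection rate on known samples is likely too specific, while any entry under `false_positives` points to a rule that is too generic for deployment.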
## Output Format
```
YARA TRIAGE RESULTS
=====================
Scan Date: 2025-09-15
Rule Sets: apt_rules (847 rules), ransomware_rules (312 rules),
trojan_rules (1,204 rules), custom_rules (45 rules)
Samples Scanned: 2,500
Processing Time: 47 seconds
CLASSIFICATION SUMMARY
APT: 12 samples (0.5%)
Ransomware: 187 samples (7.5%)
Trojan: 423 samples (16.9%)
Unknown: 1,878 samples (75.1%)
TOP MATCHING RULES
Rule Matches Family
MalwareX_C2_Beacon 45 MalwareX
LockBit3_Ransom_Note 38 LockBit 3.0
Emotet_Epoch5_Loader 32 Emotet
CobaltStrike_Beacon_Config 28 Cobalt Strike
QakBot_DLL_Loader 25 QakBot
SAMPLE DETAIL
File: suspect.exe
SHA-256: e3b0c44298fc1c149afbf4c8996fb924...
Matches:
[1] MalwareX_Strings (custom)
- $url1 at 0x4A20: "/gate.php?id="
- $mutex at 0x5100: "Global\\CryptLocker_2025"
[2] MalwareX_Decryptor (custom)
- $xor_loop at 0x401200: { 8A 04 0E 32 04 0F ... }
[3] MalwareX_PE_Characteristics (custom)
- PE import combination matched
Classification: MALWAREX (HIGH CONFIDENCE)
```