Bot Defense
Multi-layered bot detection: PoW challenges, heuristic scoring, and Cloudflare integration.
Overview
Every incoming request is scored from 0 (definitely human) to 100 (definitely bot). Views and events with bot_score ≥ 50 are excluded from daily rollups. Known bot User-Agents are dropped entirely before scoring.
Known Bot UA Filter
Requests matching any pattern in ECHELON_BOT_UA_PATTERNS are dropped before scoring. Default list includes:
- Googlebot, Bingbot, Yandex, Baidu, DuckDuckBot
- GPTBot, ClaudeBot, ChatGPT-User, Google-Extended
- curl, wget, python-requests, HeadlessChrome, Puppeteer, Playwright
- And many more crawlers and automation tools
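A minimal sketch of how such a filter might look, assuming the patterns are matched as case-insensitive substrings of the User-Agent (the actual format of ECHELON_BOT_UA_PATTERNS is not specified here). The list is a small subset of the defaults, and `is_known_bot` is a hypothetical helper name.

```python
import re
from typing import Optional

# Illustrative subset of the default list. Treating each entry as a
# case-insensitive substring is an assumption, not the documented format.
BOT_UA_PATTERNS = [
    "Googlebot", "Bingbot", "GPTBot", "ClaudeBot",
    "curl", "wget", "python-requests", "HeadlessChrome",
]

# Compile once into a single alternation so each request costs one search.
_BOT_UA_RE = re.compile("|".join(re.escape(p) for p in BOT_UA_PATTERNS), re.IGNORECASE)


def is_known_bot(user_agent: Optional[str]) -> bool:
    """Return True when the User-Agent matches any known-bot pattern."""
    if not user_agent:
        return False  # in this sketch, a missing UA is left to the scoring stage
    return _BOT_UA_RE.search(user_agent) is not None
```

A request for which `is_known_bot` returns True would be dropped before any scoring takes place.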
Bot Scoring Factors
| Signal | Condition | Points |
|---|---|---|
| CF verified bot | Cloudflare confirms known bot UA | +15 |
| CF bot score ≤ 2 | Very confident bot | +50 |
| CF bot score 3–29 | Likely automated | +30 |
| CF bot score 30–50 | Uncertain | +10 |
| Interaction < 850ms | Beacon fired too fast | +20 |
| Interaction 850–1000ms | Suspiciously fast | +8 |
| Suspect country | IP in ECHELON_SUSPECT_COUNTRIES | +30 (configurable) |
| Per-site suspect country | IP in ECHELON_SITE_SUSPECT_COUNTRIES | +30 (stacks with global) |
| Burst detection | > 15 requests in 5-minute window | +25 |
| Missing Accept-Language | No header present | +10 |
| Missing Sec-CH-UA + Sec-Fetch-Site | Both headers absent | +10 |
| Unrealistic screen | Width/height ≤ 0 or > 10000 | +10 |
| No referrer + deep path | Direct visit to path with ≥ 2 segments | +5 |
| PoW token missing | No proof-of-work token sent | +15 |
| PoW token invalid | Token fails verification | +25 |
Score is capped at 100.
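The additive scoring can be sketched as follows. This covers only a subset of the signals in the table, the `Signals` container and its field names are hypothetical, and it assumes a valid PoW token adds no points.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    """Hypothetical per-request signals; field names are illustrative."""
    cf_verified_bot: bool = False
    cf_bot_score: Optional[int] = None      # Cloudflare bot score, if available
    interaction_ms: Optional[float] = None  # time until the beacon fired
    burst: bool = False                     # more than 15 requests in 5 minutes
    has_accept_language: bool = True
    pow_status: str = "valid"               # "valid", "missing", or "invalid"


def bot_score(s: Signals) -> int:
    """Sum the penalty points from the table and cap the result at 100."""
    score = 0
    if s.cf_verified_bot:
        score += 15
    if s.cf_bot_score is not None:
        if s.cf_bot_score <= 2:
            score += 50
        elif s.cf_bot_score <= 29:
            score += 30
        elif s.cf_bot_score <= 50:
            score += 10
    if s.interaction_ms is not None:
        if s.interaction_ms < 850:
            score += 20
        elif s.interaction_ms <= 1000:
            score += 8
    if s.burst:
        score += 25
    if not s.has_accept_language:
        score += 10
    # Assumption: a valid token contributes no points.
    score += {"missing": 15, "invalid": 25}.get(s.pow_status, 0)
    return min(score, 100)
```

For example, a request with a CF bot score of 1, a 200 ms interaction, a burst flag, and an invalid token sums to 120 and is capped at 100.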
Proof-of-Work System
Every response to a request for /ea.js embeds a WebAssembly blob and a challenge string. The browser must solve the challenge before sending beacons.
How It Works
- A WASM module is generated from a 64-byte random seed every 6 hours
- Implements a SipHash-inspired algorithm with randomized constants
- Challenge string rotates every minute (HMAC-SHA256 of minute bucket)
- Client computes: token = wasm.solve(challenge + ":" + sessionId + ":" + siteId)
- Token is a 32-character hex string
- Cached in sessionStorage, re-solved on 10% of requests
- WASM solve is awaited up to 150ms before sending the beacon
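The per-minute challenge rotation could be derived roughly as below. The server secret, the decimal encoding of the minute bucket, and the hex output are all assumptions; only the use of HMAC-SHA256 over a minute bucket and the colon-joined solve input come from the description above.

```python
import hashlib
import hmac


def minute_bucket(now_seconds: float) -> int:
    """Map a Unix timestamp to its one-minute bucket."""
    return int(now_seconds // 60)


def challenge_for_bucket(secret: bytes, bucket: int) -> str:
    """HMAC-SHA256 of the minute bucket, hex-encoded (encoding is assumed)."""
    return hmac.new(secret, str(bucket).encode(), hashlib.sha256).hexdigest()


def solve_input(challenge: str, session_id: str, site_id: str) -> str:
    """Build the string the client passes to wasm.solve()."""
    return f"{challenge}:{session_id}:{site_id}"
```

Because the challenge depends only on the secret and the bucket, the server can recompute any recent challenge on demand instead of storing issued ones.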
Server Verification
The server tries all combinations of:
- Current + previous WASM slots (covers 6-hour rotation)
- Last N minute buckets (configurable via ECHELON_CHALLENGE_WINDOW_MINUTES, default 10)
Verification returns "valid", "missing", or "invalid". Per the scoring table, a missing token adds +15 and an invalid token adds +25.
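A sketch of the combination search, under stated assumptions: `stand_in_solve` is a keyed SHA-256 truncated to 32 hex characters and is NOT the real SipHash-inspired WASM algorithm, the challenge encoding is assumed, and "last N buckets" is taken to mean the current bucket plus the N-1 before it.

```python
import hashlib
import hmac
from typing import Optional, Sequence


def challenge_for_bucket(secret: bytes, bucket: int) -> str:
    # Assumed encoding: HMAC-SHA256 over the decimal minute bucket.
    return hmac.new(secret, str(bucket).encode(), hashlib.sha256).hexdigest()


def stand_in_solve(slot_seed: bytes, data: str) -> str:
    # Placeholder for the WASM solver: keyed hash, 32 hex characters.
    return hashlib.sha256(slot_seed + data.encode()).hexdigest()[:32]


def verify_token(token: Optional[str], session_id: str, site_id: str,
                 slot_seeds: Sequence[bytes], secret: bytes,
                 current_bucket: int, window_minutes: int = 10) -> str:
    """Return "valid", "missing", or "invalid" for a submitted token."""
    if not token:
        return "missing"
    for seed in slot_seeds:                # current + previous WASM slot
        for age in range(window_minutes):  # newest minute bucket first
            challenge = challenge_for_bucket(secret, current_bucket - age)
            expected = stand_in_solve(seed, f"{challenge}:{session_id}:{site_id}")
            # Constant-time comparison avoids leaking how many characters matched.
            if hmac.compare_digest(expected.encode(), token.encode()):
                return "valid"
    return "invalid"
```

With two slots and a 10-minute window, the worst case is 20 solve attempts per request, which is why widening the window has a direct CPU cost.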
Burst Detection
- Window: 5 minutes per IP
- Threshold: more than 15 requests in the window triggers the +25 penalty
- Map size: max 100,000 entries (pruned when exceeded)
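One way to implement these parameters is a per-IP sliding window, sketched below. The class name, the use of a deque of timestamps, and the pruning strategy (dropping IPs with no requests inside the window) are assumptions; the source only states that the map is pruned when it exceeds its size limit.

```python
from collections import deque
from typing import Deque, Dict

WINDOW_SECONDS = 5 * 60
THRESHOLD = 15          # more than this many requests counts as a burst
MAX_ENTRIES = 100_000


class BurstDetector:
    def __init__(self) -> None:
        self._hits: Dict[str, Deque[float]] = {}

    def record(self, ip: str, now: float) -> bool:
        """Record one request; return True when the IP exceeds the threshold."""
        hits = self._hits.setdefault(ip, deque())
        hits.append(now)
        cutoff = now - WINDOW_SECONDS
        # Discard timestamps that have slid out of the window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        if len(self._hits) > MAX_ENTRIES:
            self._prune(cutoff)
        return len(hits) > THRESHOLD

    def _prune(self, cutoff: float) -> None:
        # Assumed strategy: drop IPs whose latest request is outside the window.
        stale = [ip for ip, h in self._hits.items() if not h or h[-1] <= cutoff]
        for ip in stale:
            del self._hits[ip]
```

The sixteenth request from one IP inside five minutes is the first to return True, which the scorer would translate into the +25 penalty.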
Rate Limiting
Tracking endpoints (/b.gif, /e) are rate-limited per IP:
- Default: 100 requests per 60-second window
- Configurable via ECHELON_RATE_LIMIT_MAX and ECHELON_RATE_LIMIT_WINDOW_MS
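A fixed-window limiter is one simple way to enforce such a limit. Whether the real implementation uses a fixed or sliding window is not stated, so treat the class below as an illustration with hypothetical names.

```python
from typing import Dict, Tuple


class FixedWindowRateLimiter:
    def __init__(self, max_requests: int = 100, window_ms: int = 60_000) -> None:
        self.max_requests = max_requests
        self.window_ms = window_ms
        # ip -> (window start in ms, requests counted in that window)
        self._windows: Dict[str, Tuple[int, int]] = {}

    def allow(self, ip: str, now_ms: int) -> bool:
        """Return True if the request fits in the current window for this IP."""
        start, count = self._windows.get(ip, (now_ms, 0))
        if now_ms - start >= self.window_ms:
            start, count = now_ms, 0  # window expired: start a fresh one
        if count >= self.max_requests:
            return False
        self._windows[ip] = (start, count + 1)
        return True
```

The constructor defaults mirror the documented defaults of 100 requests per 60-second window.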
Cloudflare Integration
Set ECHELON_BEHIND_CLOUDFLARE=true to enable:
- Read cf-ipcountry for geo data (ignores XX and T1)
- Read the Cloudflare bot score from the cf-bot-score header
- Detect CF-verified bots
- Use cf-connecting-ip for the real client IP
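The header handling might look like the sketch below. The function and type names are hypothetical, and CF-verified bot detection is omitted because the header it relies on is not named here.

```python
from typing import Mapping, Optional, TypedDict


class CfInfo(TypedDict):
    client_ip: Optional[str]
    country: Optional[str]
    bot_score: Optional[int]


def read_cloudflare_headers(headers: Mapping[str, str], behind_cloudflare: bool) -> CfInfo:
    """Extract Cloudflare-provided fields, but only when the flag is enabled."""
    info: CfInfo = {"client_ip": None, "country": None, "bot_score": None}
    if not behind_cloudflare:
        return info  # cf-* headers are spoofable unless Cloudflare sets them
    h = {k.lower(): v for k, v in headers.items()}
    info["client_ip"] = h.get("cf-connecting-ip") or None
    country = h.get("cf-ipcountry", "").upper()
    if country and country not in ("XX", "T1"):  # XX and T1 are ignored
        info["country"] = country
    try:
        info["bot_score"] = int(h.get("cf-bot-score", ""))
    except ValueError:
        pass  # header absent or not numeric
    return info
```

Gating on the flag matters: without Cloudflare in front, any client could send these headers and choose its own IP, country, and bot score.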
Referrer Classification
Incoming referrers are classified into categories:
| Type | Domains |
|---|---|
| ai | perplexity.ai, chat.openai.com, chatgpt.com, claude.ai, you.com, phind.com, copilot.microsoft.com, gemini.google.com, poe.com |
| search | Google (22 TLDs), bing.com, yahoo.com, duckduckgo.com, yandex.com/ru, ecosia.org |
| social | facebook.com, twitter.com, x.com, reddit.com, linkedin.com, instagram.com, t.co |
| direct_or_unknown | Everything else |
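A classification along these lines can be sketched as a host lookup. Matching subdomains as well as exact hosts is an assumption, and the Google entries are an illustrative subset because the 22 TLDs are not listed here. The ai set is checked first so that gemini.google.com is not caught by the google.com search entry.

```python
from typing import Optional, Set
from urllib.parse import urlsplit

AI_DOMAINS = {"perplexity.ai", "chat.openai.com", "chatgpt.com", "claude.ai", "you.com",
              "phind.com", "copilot.microsoft.com", "gemini.google.com", "poe.com"}
SEARCH_DOMAINS = {"google.com", "google.co.uk", "google.de",  # subset of the Google TLDs
                  "bing.com", "yahoo.com", "duckduckgo.com",
                  "yandex.com", "yandex.ru", "ecosia.org"}
SOCIAL_DOMAINS = {"facebook.com", "twitter.com", "x.com", "reddit.com",
                  "linkedin.com", "instagram.com", "t.co"}


def _matches(host: str, domains: Set[str]) -> bool:
    # Exact host or any subdomain; the leading dot stops netflix.com matching x.com.
    return any(host == d or host.endswith("." + d) for d in domains)


def classify_referrer(referrer: Optional[str]) -> str:
    """Return "ai", "search", "social", or "direct_or_unknown"."""
    try:
        host = (urlsplit(referrer or "").hostname or "").lower()
    except ValueError:  # malformed URL, e.g. a broken IPv6 literal
        host = ""
    for label, domains in (("ai", AI_DOMAINS), ("search", SEARCH_DOMAINS),
                           ("social", SOCIAL_DOMAINS)):
        if host and _matches(host, domains):
            return label
    return "direct_or_unknown"
```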
Manual Exclusion
Via the admin dashboard or API, you can manually exclude specific visitor IDs from rollups:
# Exclude
POST /api/bots/exclude
{ "visitor_id": "abc123", "label": "Known bot" }
# Re-include
DELETE /api/bots/exclude/abc123
Discard Threshold
Set ECHELON_BOT_DISCARD_THRESHOLD to a score (e.g., 80) to discard high-score requests before they're even stored. Default is 0 (store all, filter at rollup time).
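The interaction between the discard threshold and the rollup cutoff of 50 can be summarized in a small decision function. The function name and return labels are hypothetical, and comparing with "greater than or equal" at the discard threshold is an assumption.

```python
ROLLUP_EXCLUDE_SCORE = 50  # scores at or above this are excluded from daily rollups


def disposition(score: int, discard_threshold: int = 0) -> str:
    """Decide what happens to a scored request."""
    # A threshold of 0 disables discarding: everything is stored.
    if discard_threshold > 0 and score >= discard_threshold:
        return "discard"         # never written to storage
    if score >= ROLLUP_EXCLUDE_SCORE:
        return "store_excluded"  # stored, but filtered out at rollup time
    return "store_counted"       # stored and included in rollups
```

With a threshold of 80, a request scoring 60 is still stored and can be inspected later, while one scoring 90 leaves no record.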