Bot Defense
Multi-layered bot detection: PoW challenges, heuristic scoring, and Cloudflare integration.
Overview
Every incoming request is scored from 0 (definitely human) to 100 (definitely bot). Views and events with bot_score ≥ 50 are excluded from daily rollups. Known bot User-Agents are dropped entirely before scoring.
The Core Problem
Bot traffic distorts analytics data. Echelon addresses this by embedding a runtime-generated WebAssembly module in every tracker script; the browser must run it to solve a proof-of-work challenge before its pageviews are accepted. Combined with heuristic scoring and optional Cloudflare integration, the result is clean analytics data without CAPTCHAs or third-party bot detection services.
Known Bot UA Filter
Requests matching any pattern in ECHELON_BOT_UA_PATTERNS are dropped before scoring. Default list includes:
- Googlebot, Bingbot, Yandex, Baidu, DuckDuckBot
- GPTBot, ClaudeBot, ChatGPT-User, Google-Extended
- curl, wget, python-requests, HeadlessChrome, Puppeteer, Playwright
- And many more crawlers and automation tools
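As a rough illustration of the filter, the sketch below assumes ECHELON_BOT_UA_PATTERNS holds a comma-separated list of case-insensitive substrings; the actual format is not specified here. The names `load_patterns`, `is_known_bot`, and the shortened `DEFAULT_PATTERNS` list are hypothetical.

```python
import os

# Illustrative subset of the default list above; the real default list is longer.
DEFAULT_PATTERNS = [
    "Googlebot", "Bingbot", "GPTBot", "ClaudeBot",
    "curl", "python-requests", "HeadlessChrome",
]


def load_patterns(env=os.environ):
    """Read patterns from ECHELON_BOT_UA_PATTERNS (assumed comma-separated)."""
    raw = env.get("ECHELON_BOT_UA_PATTERNS", "")
    patterns = [p.strip() for p in raw.split(",") if p.strip()]
    return patterns or DEFAULT_PATTERNS


def is_known_bot(user_agent, patterns):
    """Return True if any pattern occurs in the User-Agent (case-insensitive)."""
    ua = user_agent.lower()
    return any(p.lower() in ua for p in patterns)
```

A request for which `is_known_bot` returns True would be dropped before any scoring takes place.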
Bot Scoring Factors
| Signal | Condition | Points |
|---|---|---|
| CF verified bot | Cloudflare confirms known bot UA | +15 |
| CF bot score ≤ 2 | Very confident bot | +50 |
| CF bot score 3–29 | Likely automated | +30 |
| CF bot score 30–50 | Uncertain | +10 |
| Interaction < 850ms | Beacon fired too fast | +20 |
| Interaction 850–1000ms | Suspiciously fast | +8 |
| Suspect country | IP in ECHELON_SUSPECT_COUNTRIES | +30 (configurable) |
| Per-site suspect country | IP in ECHELON_SITE_SUSPECT_COUNTRIES | +30 (stacks with global) |
| Burst detection | > 15 requests in 5-minute window | +25 |
| Missing Accept-Language | No header present | +10 |
| Missing Sec-CH-UA + Sec-Fetch-Site | Both headers absent | +10 |
| Unrealistic screen | Width/height ≤ 0 or > 10000 | +10 |
| No referrer + deep path | Direct visit to path with ≥ 2 segments | +5 |
| PoW token missing | No proof-of-work token sent | +30 |
| PoW token invalid | Token fails verification or replayed | +40 |
Score is capped at 100.
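To show how the table adds up, here is a minimal sketch of the scoring as a pure function. The `Signals` fields and the `bot_score` name are illustrative, not Echelon's internal API, and treating the 850–1000ms range as inclusive at both ends is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    """Hypothetical per-request inputs; field names are illustrative."""
    cf_verified_bot: bool = False
    cf_bot_score: Optional[int] = None   # None when no Cloudflare score is available
    interaction_ms: Optional[int] = None
    suspect_country: bool = False
    site_suspect_country: bool = False
    burst: bool = False
    has_accept_language: bool = True
    has_client_hints: bool = True        # False only if Sec-CH-UA and Sec-Fetch-Site are both absent
    screen_w: int = 1920
    screen_h: int = 1080
    has_referrer: bool = True
    path: str = "/"
    pow_status: str = "valid"            # "valid", "missing", or "invalid"


def bot_score(s: Signals, suspect_points: int = 30) -> int:
    """Sum the penalties from the table and cap the result at 100."""
    score = 0
    if s.cf_verified_bot:
        score += 15
    if s.cf_bot_score is not None:
        if s.cf_bot_score <= 2:
            score += 50
        elif s.cf_bot_score <= 29:
            score += 30
        elif s.cf_bot_score <= 50:
            score += 10
    if s.interaction_ms is not None:
        if s.interaction_ms < 850:
            score += 20
        elif s.interaction_ms <= 1000:
            score += 8
    if s.suspect_country:
        score += suspect_points          # global penalty is configurable
    if s.site_suspect_country:
        score += 30                      # stacks with the global penalty
    if s.burst:
        score += 25
    if not s.has_accept_language:
        score += 10
    if not s.has_client_hints:
        score += 10
    if not (0 < s.screen_w <= 10000 and 0 < s.screen_h <= 10000):
        score += 10
    segments = [p for p in s.path.split("/") if p]
    if not s.has_referrer and len(segments) >= 2:
        score += 5
    score += {"valid": 0, "missing": 30, "invalid": 40}[s.pow_status]
    return min(score, 100)
```

For example, a request with a missing PoW token and no Accept-Language header scores 40 and still counts in rollups, while an invalid token combined with a burst and a Cloudflare score of 1 hits the cap of 100.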
Proof-of-Work System
Every response to a request for /ea.js embeds a WebAssembly blob and a challenge string. The browser must solve the challenge before sending beacons.
Why PoW Blocks Robots
Most analytics spam comes from scripts that fire HTTP requests directly — curl, python-requests, headless browsers running thousands of sessions. Proof-of-work stops them because:
- Requires a real WASM runtime. The challenge can only be solved by executing a WebAssembly module. Simple HTTP clients can’t do this — they’d need to embed a full WASM engine, which eliminates the vast majority of bot scripts.
- Costs CPU time per session. Even if a bot does run the WASM, every new session requires a fresh solve. A real browser solves one challenge and caches the token, but a bot farm generating thousands of fake sessions pays the CPU cost for each one, which makes spamming your analytics economically impractical at scale.
- Rotates unpredictably. The WASM module itself is regenerated from a random seed every 6 hours, and the challenge string changes every minute. Bots can’t precompute answers or hardcode solutions — they’d need to re-fetch and re-solve constantly.
- Binds to session context. The solve includes the session ID and site ID, so a valid token from one session can’t be replayed in another. This prevents token-harvesting attacks where a bot solves once and reuses the result.
- Invisible to real users. Unlike CAPTCHAs, the PoW runs in the background in under 150ms. Visitors never see it, never interact with it, and never get blocked by it. The cost falls entirely on automation.
How It Works
- A WASM module is generated from a 64-byte random seed every 6 hours
- Implements a SipHash-inspired algorithm with randomized constants
- Challenge string rotates every minute (HMAC-SHA256 of minute bucket)
- Client computes: token = wasm.solve(challenge + ":" + sessionId + ":" + siteId)
- Token is a 32-character hex string
- Cached in sessionStorage, re-solved on 10% of requests
- WASM solve is awaited up to 150ms before sending the beacon
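The rotation and binding described above can be sketched as follows. This is only an approximation under stated assumptions: the HMAC key and the encoding of the minute bucket are not documented here, and `solve_stand_in` uses keyed BLAKE2b in place of the generated SipHash-inspired WASM module, purely to produce a 32-character hex token of the same shape.

```python
import hashlib
import hmac


def minute_bucket(unix_seconds: float) -> int:
    """Map a Unix timestamp to its one-minute bucket."""
    return int(unix_seconds // 60)


def challenge_for(secret: bytes, bucket: int) -> str:
    """HMAC-SHA256 over the minute bucket (message encoding is an assumption)."""
    return hmac.new(secret, str(bucket).encode(), hashlib.sha256).hexdigest()


def solve_stand_in(seed: bytes, challenge: str, session_id: str, site_id: str) -> str:
    """Stand-in for wasm.solve(): keyed BLAKE2b, NOT the real generated module."""
    payload = f"{challenge}:{session_id}:{site_id}".encode()
    # A 16-byte digest renders as a 32-character hex string.
    return hashlib.blake2b(payload, key=seed, digest_size=16).hexdigest()
```

Because the session ID and site ID are part of the solved payload, the same challenge yields a different token for every session.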
Server Verification
The server tries all combinations of:
- Current + previous WASM slots (covers 6-hour rotation)
- Last N minute buckets (configurable via ECHELON_CHALLENGE_WINDOW_MINUTES, default 10)
Verification returns "valid", "missing", or "invalid". A missing token adds +30 and an invalid token adds +40; a valid token adds no penalty.
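A minimal sketch of the combination search, under several assumptions: the window includes the current minute, the challenge is an HMAC over the bucket number, and `solve_stand_in` (keyed BLAKE2b) substitutes for the real WASM solver. Replay tracking is left out. All function names are hypothetical.

```python
import hashlib
import hmac


def challenge_for(secret: bytes, bucket: int) -> str:
    """Challenge string for one minute bucket (encoding is an assumption)."""
    return hmac.new(secret, str(bucket).encode(), hashlib.sha256).hexdigest()


def solve_stand_in(seed: bytes, challenge: str, session_id: str, site_id: str) -> str:
    """Stand-in for the generated WASM solver, NOT the real algorithm."""
    payload = f"{challenge}:{session_id}:{site_id}".encode()
    return hashlib.blake2b(payload, key=seed, digest_size=16).hexdigest()


def verify(token, session_id, site_id, now, seeds, secret, window_minutes=10):
    """Try every (WASM slot, minute bucket) pair and report the outcome."""
    if not token:
        return "missing"
    current = int(now // 60)
    for seed in seeds:  # current slot first, then the previous one
        for bucket in range(current, current - window_minutes, -1):
            expected = solve_stand_in(
                seed, challenge_for(secret, bucket), session_id, site_id
            )
            # Constant-time comparison avoids leaking how many characters matched.
            if hmac.compare_digest(expected.encode(), token.encode()):
                return "valid"
    return "invalid"
```

With the default window of 10 minutes and two WASM slots, this checks at most 20 candidate tokens per request.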
Burst Detection
- Window: 5 minutes per IP
- Threshold: more than 15 requests in the window adds the +25 penalty
- Map size: max 100,000 entries (pruned when exceeded)
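One way to picture these parameters is a per-IP sliding window, sketched below. The `BurstDetector` class is hypothetical, and the pruning strategy (dropping IPs with no recent requests once the map grows too large) is an assumption, since the pruning rule is not described here.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60
THRESHOLD = 15
MAX_ENTRIES = 100_000


class BurstDetector:
    def __init__(self):
        self.hits = {}  # ip -> deque of request timestamps (seconds)

    def record(self, ip, now):
        """Record a request; return True if the IP is over the threshold."""
        q = self.hits.setdefault(ip, deque())
        q.append(now)
        # Drop timestamps that have fallen out of the 5-minute window.
        while q and q[0] <= now - WINDOW_SECONDS:
            q.popleft()
        if len(self.hits) > MAX_ENTRIES:
            self._prune(now)
        return len(q) > THRESHOLD

    def _prune(self, now):
        """Remove IPs whose most recent request is older than the window."""
        stale = [
            ip for ip, q in self.hits.items()
            if not q or q[-1] <= now - WINDOW_SECONDS
        ]
        for ip in stale:
            del self.hits[ip]
```

In this sketch, the sixteenth request from one IP inside five minutes is the first to be flagged.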
Rate Limiting
Tracking endpoints (/b.gif, /e) are rate-limited per IP:
- Default: 100 requests per 60-second window
- Configurable via ECHELON_RATE_LIMIT_MAX and ECHELON_RATE_LIMIT_WINDOW_MS
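For illustration, a fixed-window limiter with these defaults might look like the sketch below. Whether Echelon uses a fixed or sliding window is not stated, so the fixed window is an assumption, and `RateLimiter` is a hypothetical name; its two constructor arguments stand in for ECHELON_RATE_LIMIT_MAX and ECHELON_RATE_LIMIT_WINDOW_MS.

```python
class RateLimiter:
    def __init__(self, max_requests=100, window_ms=60_000):
        self.max_requests = max_requests
        self.window_ms = window_ms
        self.windows = {}  # ip -> (window_start_ms, request_count)

    def allow(self, ip, now_ms):
        """Count the request and return False once the IP exceeds its quota."""
        start, count = self.windows.get(ip, (now_ms, 0))
        if now_ms - start >= self.window_ms:
            # The previous window has expired; start a new one.
            start, count = now_ms, 0
        count += 1
        self.windows[ip] = (start, count)
        return count <= self.max_requests
```

Each IP gets its own counter, so one noisy client does not consume the quota of others.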
Cloudflare Integration
Set ECHELON_BEHIND_CLOUDFLARE=true to enable:
- Read cf-ipcountry for geo data (ignores XX and T1)
- Read Cloudflare bot score from cf-bot-score header
- Detect CF-verified bots
- Use cf-connecting-ip for real client IP
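The header handling above could be sketched roughly as follows. The `read_cloudflare` function and its return shape are hypothetical, header names are assumed to be lower-cased already, and verified-bot detection is omitted because the mechanism is not described here.

```python
def read_cloudflare(headers, behind_cloudflare):
    """Extract Cloudflare-derived fields from lower-cased request headers."""
    if not behind_cloudflare:
        # Without ECHELON_BEHIND_CLOUDFLARE=true these headers are not trusted.
        return {"ip": None, "country": None, "bot_score": None}

    country = headers.get("cf-ipcountry")
    if country in ("XX", "T1"):
        # These two codes are ignored, so no country is recorded.
        country = None

    raw = headers.get("cf-bot-score")
    bot_score = int(raw) if raw is not None and raw.isdigit() else None

    return {
        "ip": headers.get("cf-connecting-ip"),
        "country": country,
        "bot_score": bot_score,
    }
```

Ignoring the headers when the flag is off matters because a client talking to the server directly could otherwise forge them.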
Referrer Classification
Incoming referrers are classified into categories:
| Type | Domains |
|---|---|
| ai | perplexity.ai, chat.openai.com, chatgpt.com, claude.ai, you.com, phind.com, copilot.microsoft.com, gemini.google.com, poe.com |
| search | Google (17 TLDs), bing.com, yahoo.com, duckduckgo.com, yandex.com/ru, ecosia.org |
| social | facebook.com, twitter.com, x.com, reddit.com, linkedin.com, instagram.com, t.co |
| direct_or_unknown | Everything else |
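A small sketch of how such a classifier might be written. It assumes matching on the exact host or any subdomain, and checks the ai list first so that gemini.google.com is not classed as search. Only google.com is included for Google because the 17 TLDs are not listed here; `classify` is a hypothetical name.

```python
from urllib.parse import urlparse

AI = {
    "perplexity.ai", "chat.openai.com", "chatgpt.com", "claude.ai", "you.com",
    "phind.com", "copilot.microsoft.com", "gemini.google.com", "poe.com",
}
# google.com stands in for the 17 Google TLDs, which are not enumerated here.
SEARCH = {
    "google.com", "bing.com", "yahoo.com", "duckduckgo.com",
    "yandex.com", "yandex.ru", "ecosia.org",
}
SOCIAL = {
    "facebook.com", "twitter.com", "x.com", "reddit.com",
    "linkedin.com", "instagram.com", "t.co",
}


def _matches(host, domains):
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def classify(referrer):
    """Map a referrer URL to ai, search, social, or direct_or_unknown."""
    host = (urlparse(referrer).hostname or "").lower()
    if _matches(host, AI):
        return "ai"
    if _matches(host, SEARCH):
        return "search"
    if _matches(host, SOCIAL):
        return "social"
    return "direct_or_unknown"
```

An empty or unparseable referrer falls through to direct_or_unknown.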
Manual Exclusion
Via the admin dashboard or API, you can manually exclude specific visitor IDs from rollups:
# Exclude
POST /api/bots/exclude
{ "visitor_id": "abc123", "label": "Known bot" }
# Re-include
DELETE /api/bots/exclude/abc123
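The two calls above could be built from Python as in this sketch. The base URL is a placeholder, and authentication is left out because the requirements are not described here; `exclude_request` and `include_request` are hypothetical helpers that only construct the requests.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://analytics.example.com"  # hypothetical host


def exclude_request(visitor_id, label):
    """Build the POST request that excludes a visitor ID from rollups."""
    body = json.dumps({"visitor_id": visitor_id, "label": label}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/api/bots/exclude",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def include_request(visitor_id):
    """Build the DELETE request that re-includes a visitor ID."""
    safe_id = urllib.parse.quote(visitor_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/api/bots/exclude/{safe_id}",
        method="DELETE",
    )
```

Passing either object to `urllib.request.urlopen` would send it once the base URL and any required credentials are filled in.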
Discard Threshold
Set ECHELON_BOT_DISCARD_THRESHOLD to a score (e.g., 80) to discard high-score requests before they're even stored. Default is 0 (store all, filter at rollup time).
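The interaction between the discard threshold and the rollup cutoff of 50 can be sketched as a single decision. The `disposition` function and its return labels are hypothetical, and treating the threshold as inclusive (score at or above it) is an assumption.

```python
ROLLUP_EXCLUDE_SCORE = 50  # scores at or above this are excluded from daily rollups


def disposition(score, discard_threshold=0):
    """Decide what happens to a scored request."""
    # A threshold of 0 (the default) disables discarding entirely.
    if discard_threshold > 0 and score >= discard_threshold:
        return "discard"
    if score >= ROLLUP_EXCLUDE_SCORE:
        return "store_excluded"
    return "store_counted"
```

With a threshold of 80, a request scoring 60 is still stored but left out of rollups, while one scoring 85 never reaches the database.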