Bot Defense

Multi-layered bot detection: PoW challenges, heuristic scoring, and Cloudflare integration.

Overview

Every incoming request is scored from 0 (definitely human) to 100 (definitely bot). Views and events with bot_score ≥ 50 are excluded from daily rollups. Known bot User-Agents are dropped entirely before scoring.
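A minimal sketch of how these rules combine for a single request, assuming the default discard threshold of 0 (everything that passes the UA filter is stored). The function name `requestFate` and its inputs are hypothetical. Only the UA drop, the 0-100 range and the `bot_score ≥ 50` rollup cut-off come from this page.

```typescript
// Hypothetical helper: what happens to one request under the Overview rules.
type Fate = "dropped" | "stored_excluded" | "stored_counted";

const ROLLUP_EXCLUDE_SCORE = 50; // bot_score >= 50 is left out of daily rollups

function requestFate(isKnownBotUa: boolean, botScore: number): Fate {
  // Known bot User-Agents never reach scoring or storage.
  if (isKnownBotUa) return "dropped";
  // Keep the score inside the documented 0-100 range.
  const score = Math.min(100, Math.max(0, botScore));
  // Everything else is stored; only scores below 50 count toward rollups.
  return score >= ROLLUP_EXCLUDE_SCORE ? "stored_excluded" : "stored_counted";
}
```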

The Core Problem

Analytics tools are plagued by bot traffic that distorts your data. Echelon addresses this with a proof-of-work approach: every tracker script embeds a runtime-generated WebAssembly challenge that browsers solve before sending pageviews, and beacons without a valid solution are penalized in scoring. Combined with heuristic scoring and optional Cloudflare integration, the result is cleaner analytics data without CAPTCHAs or a mandatory third-party bot-detection service.

Known Bot UA Filter

Requests whose User-Agent matches any pattern in ECHELON_BOT_UA_PATTERNS are dropped before scoring. The default list includes:

Googlebot, Bingbot, Yandex, Baidu, DuckDuckBot
GPTBot, ClaudeBot, ChatGPT-User, Google-Extended
curl, wget, python-requests, HeadlessChrome, Puppeteer, Playwright
And many more crawlers and automation tools
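A sketch of the filter, assuming ECHELON_BOT_UA_PATTERNS is a comma-separated list of case-insensitive substrings that replaces the defaults when set. This page doesn't specify the format (the real patterns may be regexes, and may extend rather than replace the defaults), and the default array below lists only the names shown above, not the full built-in list.

```typescript
// ASSUMPTION: comma-separated, case-insensitive substring patterns.
// Partial defaults: only the names listed on this page.
const DEFAULT_PATTERNS: string[] = [
  "Googlebot", "Bingbot", "Yandex", "Baidu", "DuckDuckBot",
  "GPTBot", "ClaudeBot", "ChatGPT-User", "Google-Extended",
  "curl", "wget", "python-requests", "HeadlessChrome", "Puppeteer", "Playwright",
];

// ASSUMPTION: a non-empty variable replaces the defaults entirely.
function parseUaPatterns(raw: string | undefined): string[] {
  if (raw === undefined || raw.trim() === "") return DEFAULT_PATTERNS;
  return raw.split(",").map((p) => p.trim()).filter((p) => p.length > 0);
}

// True when any pattern occurs anywhere in the User-Agent string.
function isKnownBotUa(userAgent: string, patterns: string[]): boolean {
  const ua = userAgent.toLowerCase();
  return patterns.some((p) => ua.includes(p.toLowerCase()));
}
```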

Bot Scoring Factors

Signal	Condition	Points
CF verified bot	Cloudflare confirms known bot UA	+15
CF bot score ≤ 2	Very confident bot	+50
CF bot score 3–29	Likely automated	+30
CF bot score 30–50	Uncertain	+10
Interaction < 850ms	Beacon fired too fast	+20
Interaction 850–1000ms	Suspiciously fast	+8
Suspect country	IP in `ECHELON_SUSPECT_COUNTRIES`	+30 (configurable)
Per-site suspect country	IP in `ECHELON_SITE_SUSPECT_COUNTRIES`	+30 (stacks with global)
Burst detection	> 15 requests in 5-minute window	+25
Missing Accept-Language	No header present	+10
Missing Sec-CH-UA + Sec-Fetch-Site	Both headers absent	+10
Unrealistic screen	Width/height ≤ 0 or > 10000	+10
No referrer + deep path	Direct visit to path with ≥ 2 segments	+5
PoW token missing	No proof-of-work token sent	+30
PoW token invalid	Token fails verification or replayed	+40

Score is capped at 100.
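The table reads as one additive function. A sketch, assuming a hypothetical `ScoringInput` shape in which upstream code has already evaluated the headers, the PoW check and the burst counter. Two boundaries are guesses: 1000 ms is treated as inclusive in the 850-1000 ms band, and the per-site country penalty is fixed at 30 while only the global one is configurable.

```typescript
// Hypothetical input shape; field names are illustrative, not Echelon's.
interface ScoringInput {
  cfVerifiedBot: boolean;
  cfBotScore: number | null;    // null when not behind Cloudflare
  interactionMs: number | null; // time before the beacon fired
  suspectCountry: boolean;      // IP in ECHELON_SUSPECT_COUNTRIES
  siteSuspectCountry: boolean;  // IP in ECHELON_SITE_SUSPECT_COUNTRIES
  burst: boolean;               // > 15 requests from this IP in 5 minutes
  hasAcceptLanguage: boolean;
  hasSecChUa: boolean;
  hasSecFetchSite: boolean;
  screenWidth: number;
  screenHeight: number;
  hasReferrer: boolean;
  path: string;
  pow: "valid" | "missing" | "invalid";
  suspectCountryPoints?: number; // "+30 (configurable)"
}

function botScore(i: ScoringInput): number {
  let s = 0;
  if (i.cfVerifiedBot) s += 15;
  if (i.cfBotScore !== null) {
    if (i.cfBotScore <= 2) s += 50;       // very confident bot
    else if (i.cfBotScore <= 29) s += 30; // likely automated
    else if (i.cfBotScore <= 50) s += 10; // uncertain
  }
  if (i.interactionMs !== null) {
    if (i.interactionMs < 850) s += 20;
    else if (i.interactionMs <= 1000) s += 8; // ASSUMPTION: 1000 ms inclusive
  }
  if (i.suspectCountry) s += i.suspectCountryPoints ?? 30;
  if (i.siteSuspectCountry) s += 30; // stacks with the global list
  if (i.burst) s += 25;
  if (!i.hasAcceptLanguage) s += 10;
  if (!i.hasSecChUa && !i.hasSecFetchSite) s += 10; // both must be absent
  const badDim = (d: number): boolean => d <= 0 || d > 10_000;
  if (badDim(i.screenWidth) || badDim(i.screenHeight)) s += 10;
  const segments = i.path.split("/").filter((seg) => seg.length > 0).length;
  if (!i.hasReferrer && segments >= 2) s += 5; // e.g. a direct hit on /blog/post
  if (i.pow === "missing") s += 30;
  else if (i.pow === "invalid") s += 40;
  return Math.min(s, 100); // score is capped at 100
}
```

With these weights, a beacon with a missing PoW token and no other red flags scores 30 and still counts toward rollups; it takes a second signal to cross 50.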

Proof-of-Work System

Every response to a request for /ea.js embeds a WebAssembly blob and a challenge string, and the browser solves the challenge before sending beacons.

Why PoW Blocks Robots

Most analytics spam comes from scripts that fire HTTP requests directly (curl, python-requests) or from headless browsers running thousands of sessions. Proof-of-work raises the bar for both:

Requires a real WASM runtime. The challenge can only be solved by executing a WebAssembly module. Simple HTTP clients can't do this; they would need to embed a WASM engine, which rules out most bot scripts.
Costs CPU time per session. Even if a bot does run the WASM, every new session needs its own solve, because the token is bound to the session ID and the challenge keeps rotating. Real browsers solve once per session and cache the token, but a bot farm generating thousands of fake sessions pays the CPU cost on every one of them.
Rotates unpredictably. The WASM module itself is regenerated from a random seed every 6 hours, and the challenge string changes every minute. Bots can't precompute answers or hardcode solutions; they would need to re-fetch and re-solve continually.
Binds to session context. The solve includes the session ID and site ID, so a valid token from one session can't be replayed in another. This prevents token-harvesting attacks where a bot solves once and reuses the result.
Invisible to real users. Unlike CAPTCHAs, the PoW runs in the background, and the beacon waits at most 150ms for it. Visitors never see it, never interact with it, and never get blocked by it. The cost falls entirely on automation.

How It Works

A WASM module is generated from a 64-byte random seed every 6 hours
Implements a SipHash-inspired algorithm with randomized constants
Challenge string rotates every minute (HMAC-SHA256 of minute bucket)
Client computes: token = wasm.solve(challenge + ":" + sessionId + ":" + siteId)
Token is a 32-character hex string
Cached in sessionStorage, re-solved on 10% of requests
WASM solve is awaited up to 150ms before sending the beacon
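The client-side steps (cache, occasional re-solve, 150ms cap) can be sketched as follows. `Solve` stands in for the WASM export and is assumed to be async (for example, run in a Worker), because a synchronous WASM call on the main thread can't be interrupted by a timeout. The storage key and the fallback to a cached token on timeout are hypothetical choices, not documented behavior.

```typescript
// Hypothetical stand-in for the WASM export; assumed async (e.g. in a Worker).
type Solve = (input: string) => Promise<string>;

interface TokenStore {            // sessionStorage satisfies this shape
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const TOKEN_KEY = "ea_pow_token"; // hypothetical storage key
const RESOLVE_RATE = 0.1;         // re-solve on 10% of requests
const SOLVE_TIMEOUT_MS = 150;     // maximum wait before the beacon is sent

// Resolves to the promise's value, or null if it rejects or takes too long.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T | null> {
  return new Promise<T | null>((resolve) => {
    const timer = setTimeout(() => resolve(null), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      () => { clearTimeout(timer); resolve(null); },
    );
  });
}

async function getPowToken(
  solve: Solve,
  challenge: string,
  sessionId: string,
  siteId: string,
  store: TokenStore,
  random: () => number = Math.random,
): Promise<string | null> {
  const cached = store.getItem(TOKEN_KEY);
  // Reuse the cached token on roughly 90% of requests.
  if (cached !== null && random() >= RESOLVE_RATE) return cached;

  const input = `${challenge}:${sessionId}:${siteId}`;
  const token = await withTimeout(solve(input), SOLVE_TIMEOUT_MS);
  if (token !== null) store.setItem(TOKEN_KEY, token);
  // ASSUMPTION: on timeout, fall back to any cached token. null means the
  // beacon goes out without one and is scored as "missing".
  return token ?? cached;
}
```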

Server Verification

The server tries all combinations of:

Current + previous WASM slots (covers 6-hour rotation)
Last N minute buckets (configurable via ECHELON_CHALLENGE_WINDOW_MINUTES, default 10)

Verification returns "valid" (no penalty), "missing" (+30), or "invalid" (+40).
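The combination search can be sketched with Node's `node:crypto` module. Several details here are assumptions, not documented behavior: the `WasmSlot` interface (a server-side copy of each slot's solver), the HMAC secret, the decimal encoding of the minute bucket, and counting the current minute as one of the N buckets. Replay tracking is left out.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical: a server-side equivalent of one 6-hour WASM slot's solver.
interface WasmSlot { solve(input: string): string; }
type PowResult = "valid" | "missing" | "invalid";

const minuteBucket = (nowMs: number): number => Math.floor(nowMs / 60_000);

// Challenge for one minute: HMAC-SHA256 of the bucket number.
// ASSUMPTION: secret and String(bucket) encoding are illustrative.
function challengeFor(secret: string, bucket: number): string {
  return createHmac("sha256", secret).update(String(bucket)).digest("hex");
}

function verifyPowToken(
  token: string | undefined,
  sessionId: string,
  siteId: string,
  slots: WasmSlot[],   // current + previous slot
  secret: string,
  nowMs: number,
  windowMinutes = 10,  // ECHELON_CHALLENGE_WINDOW_MINUTES
): PowResult {
  if (token === undefined || token === "") return "missing";
  if (!/^[0-9a-f]{32}$/.test(token)) return "invalid"; // 32-char hex
  const now = minuteBucket(nowMs);
  for (const slot of slots) {
    for (let back = 0; back < windowMinutes; back++) {
      const challenge = challengeFor(secret, now - back);
      const expected = slot.solve(`${challenge}:${sessionId}:${siteId}`);
      // timingSafeEqual throws on unequal lengths, so check length first.
      if (expected.length === token.length &&
          timingSafeEqual(Buffer.from(expected), Buffer.from(token))) {
        return "valid";
      }
    }
  }
  return "invalid";
}
```

With two slots and the default window, a failing token costs 20 solver calls, which is why the window is kept small.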

Burst Detection

Window: 5 minutes per IP
Threshold: more than 15 requests in the window triggers the +25 penalty
Map size: max 100,000 entries (pruned when exceeded)
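A sketch of a per-IP burst counter with the documented window, threshold and map cap. The sliding window of timestamps and the pruning strategy are assumptions: expired IPs are dropped first, then the oldest-inserted IPs if the map is still over the cap.

```typescript
const BURST_WINDOW_MS = 5 * 60_000;
const BURST_THRESHOLD = 15; // more than 15 requests trips the penalty
const BURST_PENALTY = 25;
const MAX_TRACKED_IPS = 100_000;

// ASSUMPTION: sliding window of request timestamps per IP.
class BurstDetector {
  private readonly hits = new Map<string, number[]>();
  private readonly windowMs: number;
  private readonly threshold: number;
  private readonly maxEntries: number;

  constructor(windowMs = BURST_WINDOW_MS, threshold = BURST_THRESHOLD, maxEntries = MAX_TRACKED_IPS) {
    this.windowMs = windowMs;
    this.threshold = threshold;
    this.maxEntries = maxEntries;
  }

  get size(): number { return this.hits.size; }

  /** Records one request from `ip` and returns the burst penalty (0 or 25). */
  record(ip: string, nowMs: number): number {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.hits.get(ip) ?? []).filter((t) => t > cutoff);
    recent.push(nowMs);
    this.hits.set(ip, recent);
    if (this.hits.size > this.maxEntries) this.prune(cutoff);
    return recent.length > this.threshold ? BURST_PENALTY : 0;
  }

  private prune(cutoff: number): void {
    // Drop IPs whose most recent request is outside the window.
    for (const [ip, times] of this.hits) {
      const last = times[times.length - 1] ?? 0;
      if (last <= cutoff) this.hits.delete(ip);
    }
    // Still over the cap: evict in insertion order (oldest first).
    for (const ip of this.hits.keys()) {
      if (this.hits.size <= this.maxEntries) break;
      this.hits.delete(ip);
    }
  }
}
```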

Rate Limiting

Tracking endpoints (/b.gif, /e) are rate-limited per IP:

Default: 100 requests per 60-second window
Configurable via ECHELON_RATE_LIMIT_MAX and ECHELON_RATE_LIMIT_WINDOW_MS
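A sketch of reading the two variables and enforcing the limit per IP. The fixed-window algorithm is an assumption (the real limiter may use a sliding window), this sketch never evicts old IPs, and this page doesn't say what response a limited request receives.

```typescript
interface RateLimitConfig { max: number; windowMs: number; }

// Reads the documented variables, falling back to the documented defaults
// when a value is absent, non-numeric or not positive.
function loadRateLimitConfig(env: Record<string, string | undefined>): RateLimitConfig {
  const positive = (raw: string | undefined, fallback: number): number => {
    const n = Number(raw);
    return raw !== undefined && Number.isFinite(n) && n > 0 ? n : fallback;
  };
  return {
    max: positive(env["ECHELON_RATE_LIMIT_MAX"], 100),
    windowMs: positive(env["ECHELON_RATE_LIMIT_WINDOW_MS"], 60_000),
  };
}

// ASSUMPTION: fixed window per IP; no eviction of idle IPs in this sketch.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly config: RateLimitConfig;

  constructor(config: RateLimitConfig) { this.config = config; }

  /** Returns false once `ip` has used up its quota for the current window. */
  allow(ip: string, nowMs: number): boolean {
    const w = this.windows.get(ip);
    if (w === undefined || nowMs - w.start >= this.config.windowMs) {
      this.windows.set(ip, { start: nowMs, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= this.config.max;
  }
}
```

In a Node server, `new FixedWindowLimiter(loadRateLimitConfig(process.env))` would build the limiter from the environment.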

🗑️ "Did you know you can set ECHELON_BOT_DISCARD_THRESHOLD to immediately drop high-score requests before they're stored? This saves database space when you're confident in your scoring." -🦭

Cloudflare Integration

Set ECHELON_BEHIND_CLOUDFLARE=true to enable:

Read cf-ipcountry for geo data (ignores XX, meaning no country data, and T1, meaning Tor)
Read Cloudflare bot score from cf-bot-score header
Detect CF-verified bots
Use cf-connecting-ip for real client IP
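A sketch of reading these headers from a Node-style header object with lower-cased names. The `cf-verified-bot` header name is hypothetical, since this page doesn't say how the verified-bot flag reaches Echelon. The 1-99 range check reflects Cloudflare's bot-score scale. Ignoring cf-* headers unless the flag is set matters because, without Cloudflare in front, any client could send them.

```typescript
type HeaderMap = Record<string, string | undefined>; // lower-cased names

interface CloudflareSignals {
  country: string | null;  // ISO code from cf-ipcountry
  botScore: number | null; // 1-99, lower means more likely automated
  verifiedBot: boolean;
  clientIp: string | null;
}

function readCloudflareSignals(headers: HeaderMap, env: Record<string, string | undefined>): CloudflareSignals {
  // Without the flag, cf-* headers are ignored; a client could set them itself.
  if (env["ECHELON_BEHIND_CLOUDFLARE"] !== "true") {
    return { country: null, botScore: null, verifiedBot: false, clientIp: null };
  }
  const cc = headers["cf-ipcountry"]?.trim().toUpperCase();
  // XX = no country data, T1 = Tor; neither is a usable geo value.
  const country = cc && cc !== "XX" && cc !== "T1" ? cc : null;

  const raw = headers["cf-bot-score"];
  const n = raw === undefined ? NaN : Number(raw);
  const botScore = Number.isInteger(n) && n >= 1 && n <= 99 ? n : null;

  // HYPOTHETICAL header name for the verified-bot flag.
  const verifiedBot = headers["cf-verified-bot"] === "true";

  return { country, botScore, verifiedBot, clientIp: headers["cf-connecting-ip"] ?? null };
}
```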

Referrer Classification

Incoming referrers are classified into categories:

Type	Domains
`ai`	perplexity.ai, chat.openai.com, chatgpt.com, claude.ai, you.com, phind.com, copilot.microsoft.com, gemini.google.com, poe.com
`search`	Google (17 TLDs), bing.com, yahoo.com, duckduckgo.com, yandex.com, yandex.ru, ecosia.org
`social`	facebook.com, twitter.com, x.com, reddit.com, linkedin.com, instagram.com, t.co
`direct_or_unknown`	Everything else
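A sketch of the classification using the domains in the table, matching each domain and its subdomains. The 17 Google TLDs aren't listed here, so the regex is an approximation that accepts any `google.<tld>` or `google.<sld>.<cc>` host. AI domains are checked first so gemini.google.com isn't counted as search.

```typescript
type ReferrerType = "ai" | "search" | "social" | "direct_or_unknown";

const AI = ["perplexity.ai", "chat.openai.com", "chatgpt.com", "claude.ai", "you.com",
  "phind.com", "copilot.microsoft.com", "gemini.google.com", "poe.com"];
const SEARCH = ["bing.com", "yahoo.com", "duckduckgo.com", "yandex.com", "yandex.ru", "ecosia.org"];
const SOCIAL = ["facebook.com", "twitter.com", "x.com", "reddit.com", "linkedin.com", "instagram.com", "t.co"];
// APPROXIMATION of "Google (17 TLDs)": google.com, google.de, google.co.uk, ...
const GOOGLE = /(^|\.)google\.[a-z]{2,3}(\.[a-z]{2})?$/;

// Matches the domain itself or any subdomain (www.reddit.com, m.facebook.com).
const matches = (host: string, domains: string[]): boolean =>
  domains.some((d) => host === d || host.endsWith("." + d));

function classifyReferrer(referrer: string | undefined): ReferrerType {
  if (!referrer) return "direct_or_unknown";
  let host: string;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch {
    return "direct_or_unknown"; // unparseable referrer
  }
  if (matches(host, AI)) return "ai"; // before Google, for gemini.google.com
  if (GOOGLE.test(host) || matches(host, SEARCH)) return "search";
  if (matches(host, SOCIAL)) return "social";
  return "direct_or_unknown";
}
```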

Manual Exclusion

Via the admin dashboard or API, you can manually exclude specific visitor IDs from rollups:

```
# Exclude
POST /api/bots/exclude
{ "visitor_id": "abc123", "label": "Known bot" }

# Re-include
DELETE /api/bots/exclude/abc123
```
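The same two calls from a script might look like this. The base URL and the bearer-token auth scheme are assumptions, since this page specifies neither. `FetchLike` is a narrow fetch-compatible signature so the sketch can be exercised without a network.

```typescript
// Narrow fetch-compatible signature; the global fetch fits it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number }>;

// ASSUMPTIONS: `baseUrl` is the Echelon server and the API takes a bearer token.
async function excludeVisitor(
  doFetch: FetchLike, baseUrl: string, token: string, visitorId: string, label: string,
): Promise<void> {
  const res = await doFetch(`${baseUrl}/api/bots/exclude`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
    body: JSON.stringify({ visitor_id: visitorId, label }),
  });
  if (!res.ok) throw new Error(`exclude failed: HTTP ${res.status}`);
}

async function reincludeVisitor(
  doFetch: FetchLike, baseUrl: string, token: string, visitorId: string,
): Promise<void> {
  // Encode the ID so it is safe as a path segment.
  const res = await doFetch(`${baseUrl}/api/bots/exclude/${encodeURIComponent(visitorId)}`, {
    method: "DELETE",
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`re-include failed: HTTP ${res.status}`);
}
```

In Node 18+ or a browser, the global `fetch` should be usable as `doFetch` directly.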

Discard Threshold

Set ECHELON_BOT_DISCARD_THRESHOLD to a score (e.g., 80) to discard high-score requests before they're even stored. Default is 0 (store all, filter at rollup time).
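A sketch of the store-or-discard decision. Treating a score equal to the threshold as discarded (`>=`) is an assumption; this page doesn't say whether the comparison is inclusive. Invalid or non-positive values fall back to 0, which stores everything.

```typescript
// Parses ECHELON_BOT_DISCARD_THRESHOLD; 0 (the default) disables discarding.
function discardThreshold(env: Record<string, string | undefined>): number {
  const n = Number(env["ECHELON_BOT_DISCARD_THRESHOLD"] ?? "0");
  return Number.isFinite(n) && n > 0 ? n : 0;
}

// ASSUMPTION: scores at or above the threshold are discarded before storage.
function shouldStore(botScore: number, threshold: number): boolean {
  return threshold === 0 || botScore < threshold;
}
```

With a threshold of 80, a request scoring between 50 and 79 is still stored but stays out of rollups, so the threshold only controls what reaches the database.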
