Portable Data
One file. Your data. Any tool.
The Idea
Echelon Analytics stores everything in a single SQLite database file. There is no proprietary data format, no vendor API you must call, no export step. The file on disk is your data.
You can copy it to your laptop, open it in any SQLite client, run SQL queries, build reports in Python or R, pipe it into a spreadsheet, feed it to an LLM, or back it up with cp. If you stop using Echelon tomorrow, your data stays exactly where it is, in an open format readable by thousands of tools.
Getting the File
The database lives at the path set by ECHELON_DB_PATH (default: ./echelon.db). In Docker deployments, it's at /app/data/echelon.db inside the container, mounted from the host's data/ directory.
Download a Copy
# From the Docker host — copy the live database cp data/echelon.db ~/echelon-backup.db # From a remote server scp user@server:~/ea-private/data/echelon.db . # Safe copy while the app is running (SQLite WAL checkpoint) sqlite3 data/echelon.db "PRAGMA wal_checkpoint(TRUNCATE);" cp data/echelon.db ~/echelon-snapshot.db
SQLite in WAL mode is safe to copy while the application is running. For a fully consistent snapshot, run the checkpoint command first to flush the write-ahead log into the main database file.
Open It Anywhere
# Command-line
sqlite3 echelon.db
# Python
import sqlite3
conn = sqlite3.connect("echelon.db")
# R
library(RSQLite)
con <- dbConnect(SQLite(), "echelon.db")
# DuckDB (for analytics queries)
duckdb -c "ATTACH 'echelon.db' AS ea (TYPE sqlite)"
# Any GUI: DB Browser for SQLite, TablePlus, DBeaver, DataGrip...
"Did you know DuckDB can attach your SQLite database directly and run analytics queries significantly faster? Great for large datasets — no ETL step needed." -🦆❤️🦭
Database Design
The schema is intentionally simple. Nine tables, no joins required for basic queries, no surrogate keys you need to look up. Every timestamp is ISO 8601 UTC. Every ID is human-readable text.
Design Principles
- Raw data first. Individual pageviews and events are stored as-is. Nothing is aggregated or discarded on write (unless bot score exceeds the discard threshold).
- Rollups for speed. Daily aggregates are pre-computed for fast dashboard queries, but you can always go back to the raw data.
- Bot scores preserved. Every record carries its bot score (0–100) so you can filter or re-analyze with your own thresholds.
- UTC everywhere. All
created_attimestamps usestrftime('%Y-%m-%dT%H:%M:%fZ','now')— ISO 8601, UTC, millisecond precision. - Text IDs. Visitor IDs, session IDs, experiment IDs are all human-readable text. No opaque integer foreign keys to chase.
Table Reference
visitor_views
Every pageview recorded by the tracker. This is the primary analytics table — one row per page load.
| Column | Type | Description |
|---|---|---|
id | INTEGER | Auto-incrementing primary key |
visitor_id | TEXT | Visitor identifier (HMAC hash or cookie value) |
path | TEXT | Page path (e.g. /about, /blog/my-post) |
site_id | TEXT | Site identifier from data-site attribute |
session_id | TEXT | Browser session (from sessionStorage) |
interaction_ms | INTEGER | Milliseconds of active interaction (pointer, scroll, keyboard) |
screen_width | INTEGER | Screen width in pixels |
screen_height | INTEGER | Screen height in pixels |
device_type | TEXT | desktop, mobile, or tablet |
os_name | TEXT | Operating system (e.g. Windows, iOS, Android) |
country_code | TEXT | ISO 3166-1 alpha-2 country code (from geo headers) |
is_returning | INTEGER | 1 if visitor seen before, 0 if new |
referrer | TEXT | Full referrer URL (nullable) |
referrer_type | TEXT | Classified referrer: direct, search, social, other |
bot_score | INTEGER | Bot likelihood 0–100. Scores ≥50 are suspect. |
is_pwa | INTEGER | 1 if loaded as a Progressive Web App |
utm_source | TEXT | UTM source parameter |
utm_medium | TEXT | UTM medium parameter |
utm_campaign | TEXT | UTM campaign parameter |
utm_content | TEXT | UTM content parameter |
utm_term | TEXT | UTM term parameter |
created_at | TEXT | ISO 8601 UTC timestamp with milliseconds |
Indexes: visitor_id, (site_id, created_at), path, country_code, device_type, (created_at, bot_score), utm_campaign, (utm_campaign, site_id, created_at WHERE utm_campaign IS NOT NULL).
visitor_views_daily
Pre-computed daily aggregates. Excludes bot traffic (score ≥50) and manually excluded visitors. Used by the dashboard for fast date-range queries.
| Column | Type | Description |
|---|---|---|
site_id | TEXT | Site identifier |
date | TEXT | Date string YYYY-MM-DD |
device_type | TEXT | Device category |
country_code | TEXT | Country code |
is_returning | INTEGER | Returning visitor flag |
visits | INTEGER | Total pageview count |
unique_visitors | INTEGER | Distinct visitor count |
avg_interaction_ms | INTEGER | Average interaction time |
Primary key: (site_id, date, device_type, country_code, is_returning). This composite key makes rollups idempotent — re-running the aggregation for the same day safely overwrites.
semantic_events
Behavioral and custom events: clicks, scroll depth, bounces, form submissions, A/B experiment conversions, and any events sent via the JavaScript API.
| Column | Type | Description |
|---|---|---|
id | INTEGER | Auto-incrementing primary key |
event_type | TEXT | Event name (e.g. click, scroll_depth, purchase) |
site_id | TEXT | Site identifier |
session_id | TEXT | Browser session |
visitor_id | TEXT | Visitor identifier |
data | TEXT | JSON payload with event-specific data |
experiment_id | TEXT | A/B experiment ID (if applicable) |
variant_id | TEXT | Experiment variant (if applicable) |
utm_campaign | TEXT | UTM campaign parameter |
device_type | TEXT | Device category |
referrer | TEXT | Referrer URL |
hour | INTEGER | Hour of day (0–23 UTC) |
month | INTEGER | Month (1–12) |
day_of_week | INTEGER | Day of week (0=Sunday, 6=Saturday) |
is_returning | INTEGER | Returning visitor flag |
bot_score | INTEGER | Bot likelihood 0–100 |
created_at | TEXT | ISO 8601 UTC timestamp |
Indexes: (site_id, event_type, created_at), visitor_id, (created_at, bot_score), (experiment_id, variant_id), (utm_campaign, site_id WHERE utm_campaign IS NOT NULL).
excluded_visitors
Visitor IDs manually flagged for exclusion from rollups and dashboards.
| Column | Type | Description |
|---|---|---|
visitor_id | TEXT | Primary key — the visitor to exclude |
label | TEXT | Optional human-readable label |
created_at | TEXT | When the exclusion was added |
experiments
A/B experiment definitions. Each experiment targets a specific event type as its success metric.
| Column | Type | Description |
|---|---|---|
experiment_id | TEXT | Primary key — unique experiment identifier |
name | TEXT | Human-readable name |
description | TEXT | What the experiment tests |
status | TEXT | draft, active, paused, completed, or archived |
metric_event_type | TEXT | The event_type that counts as a conversion |
allocation_percent | INTEGER | % of traffic enrolled (1–100) |
created_at | TEXT | Creation timestamp |
started_at | TEXT | When the experiment went active |
ended_at | TEXT | When the experiment completed |
experiment_variants
Variants within an experiment. Each has a weight for traffic allocation and an optional JSON config payload.
| Column | Type | Description |
|---|---|---|
experiment_id | TEXT | References experiments.experiment_id |
variant_id | TEXT | Variant identifier |
name | TEXT | Variant name (e.g. "Control", "Blue Button") |
weight | INTEGER | Traffic weight (higher = more traffic) |
is_control | INTEGER | 1 for the control group |
config | TEXT | Optional JSON configuration for the variant |
Primary key: (experiment_id, variant_id). Foreign key cascades deletes from experiments.
utm_campaigns
Registered marketing campaigns for the campaign dashboard.
| Column | Type | Description |
|---|---|---|
id | TEXT | Primary key — campaign ID |
name | TEXT | Display name |
utm_campaign | TEXT | Matching utm_campaign value from URLs |
site_id | TEXT | Which site this campaign belongs to |
status | TEXT | active, paused, or archived |
created_at | TEXT | Creation timestamp |
Unique constraint: (utm_campaign, site_id) — one campaign per site per UTM value.
site_settings
Per-site configuration, currently used for custom consent banner CSS.
| Column | Type | Description |
|---|---|---|
site_id | TEXT | Primary key — site identifier |
consent_css | TEXT | Custom CSS for the cookie consent banner |
updated_at | TEXT | Last update timestamp |
perf_metrics
Performance benchmarks recorded by CI/CD or manual runs.
| Column | Type | Description |
|---|---|---|
id | INTEGER | Auto-incrementing primary key |
recorded_at | TEXT | When the metric was recorded |
commit_hash | TEXT | Git commit (for tracing regressions) |
branch | TEXT | Git branch |
category | TEXT | Metric category (e.g. latency, throughput) |
metric | TEXT | Metric name (e.g. p99_response_ms) |
value | REAL | Numeric value |
unit | TEXT | Unit of measurement (e.g. ms, rps) |
metadata | TEXT | Optional JSON metadata |
maintenance_log
Tracks daily rollup and purge operations.
| Column | Type | Description |
|---|---|---|
date | TEXT | Primary key — the date being maintained |
status | TEXT | started or completed |
rollup_rows | INTEGER | Rows inserted into daily rollup |
purge_views | INTEGER | Pageview rows deleted |
purge_events | INTEGER | Event rows deleted |
started_at | TEXT | When maintenance began |
completed_at | TEXT | When maintenance finished |
Example Queries
Copy-paste these into sqlite3, DBeaver, a Python notebook, or any SQL tool.
Pageviews per day (human traffic only)
SELECT date(created_at) AS day, COUNT(*) AS views FROM visitor_views WHERE bot_score < 50 GROUP BY day ORDER BY day DESC LIMIT 30;
Top pages this week
SELECT path, COUNT(*) AS views, COUNT(DISTINCT visitor_id) AS visitors
FROM visitor_views
WHERE bot_score < 50
AND created_at >= date('now', '-7 days')
GROUP BY path
ORDER BY views DESC
LIMIT 20;
Unique visitors by country
SELECT country_code, COUNT(DISTINCT visitor_id) AS visitors
FROM visitor_views
WHERE bot_score < 50
AND created_at >= date('now', '-30 days')
GROUP BY country_code
ORDER BY visitors DESC;
Device breakdown
SELECT device_type,
COUNT(*) AS views,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 1) AS pct
FROM visitor_views
WHERE bot_score < 50
AND created_at >= date('now', '-30 days')
GROUP BY device_type
ORDER BY views DESC;
Average interaction time by page
SELECT path,
COUNT(*) AS views,
ROUND(AVG(interaction_ms) / 1000.0, 1) AS avg_seconds
FROM visitor_views
WHERE bot_score < 50
AND interaction_ms > 0
AND created_at >= date('now', '-30 days')
GROUP BY path
ORDER BY avg_seconds DESC
LIMIT 20;
Event type distribution
SELECT event_type, COUNT(*) AS count
FROM semantic_events
WHERE bot_score < 50
AND created_at >= date('now', '-30 days')
GROUP BY event_type
ORDER BY count DESC;
Returning vs new visitors
SELECT
CASE is_returning WHEN 1 THEN 'returning' ELSE 'new' END AS type,
COUNT(DISTINCT visitor_id) AS visitors,
COUNT(*) AS pageviews
FROM visitor_views
WHERE bot_score < 50
AND created_at >= date('now', '-30 days')
GROUP BY is_returning;
UTM campaign performance
SELECT utm_campaign,
COUNT(*) AS pageviews,
COUNT(DISTINCT visitor_id) AS visitors,
ROUND(AVG(interaction_ms) / 1000.0, 1) AS avg_seconds
FROM visitor_views
WHERE utm_campaign IS NOT NULL
AND bot_score < 50
GROUP BY utm_campaign
ORDER BY pageviews DESC;
Bot traffic analysis
SELECT
CASE
WHEN bot_score >= 50 THEN 'likely bot'
WHEN bot_score >= 25 THEN 'suspect'
ELSE 'likely human'
END AS category,
COUNT(*) AS views,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 1) AS pct
FROM visitor_views
WHERE created_at >= date('now', '-7 days')
GROUP BY category
ORDER BY views DESC;
Experiment conversion rates
SELECT e.name AS experiment,
se.variant_id,
COUNT(*) AS conversions,
COUNT(DISTINCT se.visitor_id) AS unique_converters
FROM semantic_events se
JOIN experiments e ON e.experiment_id = se.experiment_id
WHERE se.event_type = e.metric_event_type
GROUP BY e.name, se.variant_id
ORDER BY e.name, se.variant_id;
Hourly traffic pattern
SELECT
CAST(strftime('%H', created_at) AS INTEGER) AS hour_utc,
COUNT(*) AS views
FROM visitor_views
WHERE bot_score < 50
AND created_at >= date('now', '-7 days')
GROUP BY hour_utc
ORDER BY hour_utc;
Data Lifecycle
| Stage | What happens | Retention |
|---|---|---|
| Ingest | Pageviews and events buffered in memory, flushed to SQLite every 10–15 seconds | — |
| Raw storage | Individual records in visitor_views and semantic_events | 90 days (configurable) |
| Rollup | Daily aggregation into visitor_views_daily, excluding bots and excluded visitors | 730 days |
| Purge | Raw data older than retention period is deleted; PRAGMA incremental_vacuum reclaims space | — |
SQLite Pragmas
Echelon configures these pragmas on startup:
| PRAGMA | Value | Effect |
|---|---|---|
journal_mode | WAL | Readers don't block writers. Safe to copy while running. |
synchronous | NORMAL | Fsync at checkpoints only. Balances speed and durability. |
busy_timeout | 5000 | Wait up to 5 seconds on lock contention before failing. |
foreign_keys | ON | Enforce referential integrity (experiment variants). |
Tips
- Always filter on
bot_score < 50for human analytics. Omit it only when you want to analyze bot traffic. - Use the rollup table (
visitor_views_daily) for dashboards and long-range trends — it's pre-filtered and indexed for fast date-range queries. - The
datacolumn insemantic_eventsis JSON. Use SQLite'sjson_extract()to query inside it:json_extract(data, '$.amount'). - Timestamps are strings in ISO 8601 format. SQLite's
date(),time(), andstrftime()functions work directly on them. - Back up regularly. It's just a file:
cp,rsync, or your favorite backup tool. - DuckDB can attach SQLite databases directly and run analytical queries significantly faster than SQLite for large datasets.
- AI agents can query your database directly using the MCP server — copy the file to any machine, point
ECHELON_DB_PATHat it, and ask questions in natural language.