Portable Data

One file. Your data. Any tool.


The Idea

Echelon Analytics stores everything in a single SQLite database file. There is no proprietary data format, no vendor API you must call, no export step. The file on disk is your data.

You can copy it to your laptop, open it in any SQLite client, run SQL queries, build reports in Python or R, pipe it into a spreadsheet, feed it to an LLM, or back it up with cp. If you stop using Echelon tomorrow, your data stays exactly where it is, in an open format readable by thousands of tools.

Getting the File

The database lives at the path set by ECHELON_DB_PATH (default: ./echelon.db). In Docker deployments, it's at /app/data/echelon.db inside the container, mounted from the host's data/ directory.

Download a Copy

# From the Docker host — copy the live database
cp data/echelon.db ~/echelon-backup.db

# From a remote server
scp user@server:~/ea-private/data/echelon.db .

# Safe copy while the app is running (SQLite WAL checkpoint)
sqlite3 data/echelon.db "PRAGMA wal_checkpoint(TRUNCATE);"
cp data/echelon.db ~/echelon-snapshot.db

SQLite in WAL mode is safe to copy while the application is running. For a fully consistent snapshot, run the checkpoint command first to flush the write-ahead log into the main database file.

Open It Anywhere

# Command-line
sqlite3 echelon.db

# Python
import sqlite3
conn = sqlite3.connect("echelon.db")

# R
library(RSQLite)
con <- dbConnect(SQLite(), "echelon.db")

# DuckDB (for analytics queries)
duckdb -c "ATTACH 'echelon.db' AS ea (TYPE sqlite)"

# Any GUI: DB Browser for SQLite, TablePlus, DBeaver, DataGrip...

"Did you know DuckDB can attach your SQLite database directly and run analytics queries significantly faster? Great for large datasets — no ETL step needed." -🦆❤️🦭

Database Design

The schema is intentionally simple. Nine tables, no joins required for basic queries, no surrogate keys you need to look up. Every timestamp is ISO 8601 UTC. Every ID is human-readable text.

Design Principles


Table Reference

visitor_views

Every pageview recorded by the tracker. This is the primary analytics table — one row per page load.

ColumnTypeDescription
idINTEGERAuto-incrementing primary key
visitor_idTEXTVisitor identifier (HMAC hash or cookie value)
pathTEXTPage path (e.g. /about, /blog/my-post)
site_idTEXTSite identifier from data-site attribute
session_idTEXTBrowser session (from sessionStorage)
interaction_msINTEGERMilliseconds of active interaction (pointer, scroll, keyboard)
screen_widthINTEGERScreen width in pixels
screen_heightINTEGERScreen height in pixels
device_typeTEXTdesktop, mobile, or tablet
os_nameTEXTOperating system (e.g. Windows, iOS, Android)
country_codeTEXTISO 3166-1 alpha-2 country code (from geo headers)
is_returningINTEGER1 if visitor seen before, 0 if new
referrerTEXTFull referrer URL (nullable)
referrer_typeTEXTClassified referrer: direct, search, social, other
bot_scoreINTEGERBot likelihood 0–100. Scores ≥50 are suspect.
is_pwaINTEGER1 if loaded as a Progressive Web App
utm_sourceTEXTUTM source parameter
utm_mediumTEXTUTM medium parameter
utm_campaignTEXTUTM campaign parameter
utm_contentTEXTUTM content parameter
utm_termTEXTUTM term parameter
created_atTEXTISO 8601 UTC timestamp with milliseconds

Indexes: visitor_id, (site_id, created_at), path, country_code, device_type, (created_at, bot_score), utm_campaign, (utm_campaign, site_id, created_at WHERE utm_campaign IS NOT NULL).

visitor_views_daily

Pre-computed daily aggregates. Excludes bot traffic (score ≥50) and manually excluded visitors. Used by the dashboard for fast date-range queries.

ColumnTypeDescription
site_idTEXTSite identifier
dateTEXTDate string YYYY-MM-DD
device_typeTEXTDevice category
country_codeTEXTCountry code
is_returningINTEGERReturning visitor flag
visitsINTEGERTotal pageview count
unique_visitorsINTEGERDistinct visitor count
avg_interaction_msINTEGERAverage interaction time

Primary key: (site_id, date, device_type, country_code, is_returning). This composite key makes rollups idempotent — re-running the aggregation for the same day safely overwrites.

semantic_events

Behavioral and custom events: clicks, scroll depth, bounces, form submissions, A/B experiment conversions, and any events sent via the JavaScript API.

ColumnTypeDescription
idINTEGERAuto-incrementing primary key
event_typeTEXTEvent name (e.g. click, scroll_depth, purchase)
site_idTEXTSite identifier
session_idTEXTBrowser session
visitor_idTEXTVisitor identifier
dataTEXTJSON payload with event-specific data
experiment_idTEXTA/B experiment ID (if applicable)
variant_idTEXTExperiment variant (if applicable)
utm_campaignTEXTUTM campaign parameter
device_typeTEXTDevice category
referrerTEXTReferrer URL
hourINTEGERHour of day (0–23 UTC)
monthINTEGERMonth (1–12)
day_of_weekINTEGERDay of week (0=Sunday, 6=Saturday)
is_returningINTEGERReturning visitor flag
bot_scoreINTEGERBot likelihood 0–100
created_atTEXTISO 8601 UTC timestamp

Indexes: (site_id, event_type, created_at), visitor_id, (created_at, bot_score), (experiment_id, variant_id), (utm_campaign, site_id WHERE utm_campaign IS NOT NULL).

excluded_visitors

Visitor IDs manually flagged for exclusion from rollups and dashboards.

ColumnTypeDescription
visitor_idTEXTPrimary key — the visitor to exclude
labelTEXTOptional human-readable label
created_atTEXTWhen the exclusion was added

experiments

A/B experiment definitions. Each experiment targets a specific event type as its success metric.

ColumnTypeDescription
experiment_idTEXTPrimary key — unique experiment identifier
nameTEXTHuman-readable name
descriptionTEXTWhat the experiment tests
statusTEXTdraft, active, paused, completed, or archived
metric_event_typeTEXTThe event_type that counts as a conversion
allocation_percentINTEGER% of traffic enrolled (1–100)
created_atTEXTCreation timestamp
started_atTEXTWhen the experiment went active
ended_atTEXTWhen the experiment completed

experiment_variants

Variants within an experiment. Each has a weight for traffic allocation and an optional JSON config payload.

ColumnTypeDescription
experiment_idTEXTReferences experiments.experiment_id
variant_idTEXTVariant identifier
nameTEXTVariant name (e.g. "Control", "Blue Button")
weightINTEGERTraffic weight (higher = more traffic)
is_controlINTEGER1 for the control group
configTEXTOptional JSON configuration for the variant

Primary key: (experiment_id, variant_id). Foreign key cascades deletes from experiments.

utm_campaigns

Registered marketing campaigns for the campaign dashboard.

ColumnTypeDescription
idTEXTPrimary key — campaign ID
nameTEXTDisplay name
utm_campaignTEXTMatching utm_campaign value from URLs
site_idTEXTWhich site this campaign belongs to
statusTEXTactive, paused, or archived
created_atTEXTCreation timestamp

Unique constraint: (utm_campaign, site_id) — one campaign per site per UTM value.

site_settings

Per-site configuration, currently used for custom consent banner CSS.

ColumnTypeDescription
site_idTEXTPrimary key — site identifier
consent_cssTEXTCustom CSS for the cookie consent banner
updated_atTEXTLast update timestamp

perf_metrics

Performance benchmarks recorded by CI/CD or manual runs.

ColumnTypeDescription
idINTEGERAuto-incrementing primary key
recorded_atTEXTWhen the metric was recorded
commit_hashTEXTGit commit (for tracing regressions)
branchTEXTGit branch
categoryTEXTMetric category (e.g. latency, throughput)
metricTEXTMetric name (e.g. p99_response_ms)
valueREALNumeric value
unitTEXTUnit of measurement (e.g. ms, rps)
metadataTEXTOptional JSON metadata

maintenance_log

Tracks daily rollup and purge operations.

ColumnTypeDescription
dateTEXTPrimary key — the date being maintained
statusTEXTstarted or completed
rollup_rowsINTEGERRows inserted into daily rollup
purge_viewsINTEGERPageview rows deleted
purge_eventsINTEGEREvent rows deleted
started_atTEXTWhen maintenance began
completed_atTEXTWhen maintenance finished

Example Queries

Copy-paste these into sqlite3, DBeaver, a Python notebook, or any SQL tool.

Pageviews per day (human traffic only)

SELECT date(created_at) AS day, COUNT(*) AS views
FROM visitor_views
WHERE bot_score < 50
GROUP BY day
ORDER BY day DESC
LIMIT 30;

Top pages this week

SELECT path, COUNT(*) AS views, COUNT(DISTINCT visitor_id) AS visitors
FROM visitor_views
WHERE bot_score < 50
  AND created_at >= date('now', '-7 days')
GROUP BY path
ORDER BY views DESC
LIMIT 20;

Unique visitors by country

SELECT country_code, COUNT(DISTINCT visitor_id) AS visitors
FROM visitor_views
WHERE bot_score < 50
  AND created_at >= date('now', '-30 days')
GROUP BY country_code
ORDER BY visitors DESC;

Device breakdown

SELECT device_type,
       COUNT(*) AS views,
       ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 1) AS pct
FROM visitor_views
WHERE bot_score < 50
  AND created_at >= date('now', '-30 days')
GROUP BY device_type
ORDER BY views DESC;

Average interaction time by page

SELECT path,
       COUNT(*) AS views,
       ROUND(AVG(interaction_ms) / 1000.0, 1) AS avg_seconds
FROM visitor_views
WHERE bot_score < 50
  AND interaction_ms > 0
  AND created_at >= date('now', '-30 days')
GROUP BY path
ORDER BY avg_seconds DESC
LIMIT 20;

Event type distribution

SELECT event_type, COUNT(*) AS count
FROM semantic_events
WHERE bot_score < 50
  AND created_at >= date('now', '-30 days')
GROUP BY event_type
ORDER BY count DESC;

Returning vs new visitors

SELECT
  CASE is_returning WHEN 1 THEN 'returning' ELSE 'new' END AS type,
  COUNT(DISTINCT visitor_id) AS visitors,
  COUNT(*) AS pageviews
FROM visitor_views
WHERE bot_score < 50
  AND created_at >= date('now', '-30 days')
GROUP BY is_returning;

UTM campaign performance

SELECT utm_campaign,
       COUNT(*) AS pageviews,
       COUNT(DISTINCT visitor_id) AS visitors,
       ROUND(AVG(interaction_ms) / 1000.0, 1) AS avg_seconds
FROM visitor_views
WHERE utm_campaign IS NOT NULL
  AND bot_score < 50
GROUP BY utm_campaign
ORDER BY pageviews DESC;

Bot traffic analysis

SELECT
  CASE
    WHEN bot_score >= 50 THEN 'likely bot'
    WHEN bot_score >= 25 THEN 'suspect'
    ELSE 'likely human'
  END AS category,
  COUNT(*) AS views,
  ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 1) AS pct
FROM visitor_views
WHERE created_at >= date('now', '-7 days')
GROUP BY category
ORDER BY views DESC;

Experiment conversion rates

SELECT e.name AS experiment,
       se.variant_id,
       COUNT(*) AS conversions,
       COUNT(DISTINCT se.visitor_id) AS unique_converters
FROM semantic_events se
JOIN experiments e ON e.experiment_id = se.experiment_id
WHERE se.event_type = e.metric_event_type
GROUP BY e.name, se.variant_id
ORDER BY e.name, se.variant_id;

Hourly traffic pattern

SELECT
  CAST(strftime('%H', created_at) AS INTEGER) AS hour_utc,
  COUNT(*) AS views
FROM visitor_views
WHERE bot_score < 50
  AND created_at >= date('now', '-7 days')
GROUP BY hour_utc
ORDER BY hour_utc;

Data Lifecycle

StageWhat happensRetention
IngestPageviews and events buffered in memory, flushed to SQLite every 10–15 seconds
Raw storageIndividual records in visitor_views and semantic_events90 days (configurable)
RollupDaily aggregation into visitor_views_daily, excluding bots and excluded visitors730 days
PurgeRaw data older than retention period is deleted; PRAGMA incremental_vacuum reclaims space

SQLite Pragmas

Echelon configures these pragmas on startup:

PRAGMAValueEffect
journal_modeWALReaders don't block writers. Safe to copy while running.
synchronousNORMALFsync at checkpoints only. Balances speed and durability.
busy_timeout5000Wait up to 5 seconds on lock contention before failing.
foreign_keysONEnforce referential integrity (experiment variants).

Tips


Installation Features API Reference Bot Defense Configuration Architecture MCP Server Telemetry Why ea.js?