# Data Explorer
The Data Explorer is a browser-based SQL REPL that lets you query the leaderboard database directly from the web interface. The database runs entirely in the browser via WebAssembly — no server required.
## Configuration

The Data Explorer is enabled by default. You can configure it in `config.yaml` under the `leaderboard` section:
```yaml
leaderboard:
  data_explorer:
    enabled: true # optional, default: true
    source: /data.db # optional, default: "/data.db"
```

### Disabling the Data Explorer
Set `enabled: false` to hide the Data Explorer from the navigation and return a 404 for `/data`:
```yaml
leaderboard:
  data_explorer:
    enabled: false
```

### Using an External Database Source
If your database file exceeds the static hosting provider's per-file size limit (e.g., Cloudflare Pages has a 25 MiB per-file limit), you can host the `.db` file separately and point to it with the `source` option:
```yaml
leaderboard:
  data_explorer:
    source: https://data.leaderboard.yourorg.com/data.db
```

When `source` is an external URL (`http://` or `https://`), the prebuild step skips copying the `.db` file into `public/`, keeping the deployment bundle small. The WebAssembly worker and WASM files are still shipped locally since they are small and always required.
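The copy-or-skip decision amounts to a simple URL check. A minimal sketch follows; the helper name `isExternalSource` is hypothetical and the actual prebuild code may implement this differently:

```typescript
// Hypothetical helper: decide whether the configured source is an external
// URL that the prebuild step should skip copying into public/.
function isExternalSource(source: string): boolean {
  return source.startsWith("http://") || source.startsWith("https://");
}

const local = isExternalSource("/data.db"); // false: copied into public/
const external = isExternalSource("https://data.leaderboard.yourorg.com/data.db"); // true: fetched at runtime
```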
Cross-origin hosting requires proper CORS headers. The external host must expose the `Accept-Ranges`, `Content-Range`, and `Content-Length` response headers for range requests to work. Cloudflare R2 is the recommended option. See the GitHub Pages CORS limitation for details on why some hosts don't work cross-origin.
## How It Works

During the build, the SQLite database is copied to `public/data.db` as a static asset. When a user visits `/data`, the `sql.js-httpvfs` library loads a WebAssembly SQLite engine in a Web Worker and makes HTTP range requests to fetch only the database pages needed for each query — the full database is never downloaded unless a query requires a full table scan.
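As a rough illustration of the mechanism (not the library's actual internals), fetching one chunk boils down to computing a `Range` header from the chunk index, assuming the 4096-byte chunk size the explorer configures:

```typescript
// Illustration of the range-request mechanism: each fetch covers one
// fixed-size chunk, addressed with an inclusive byte range. 4096 bytes
// matches the requestChunkSize the Data Explorer uses.
const CHUNK_SIZE = 4096;

function rangeHeaderForChunk(chunkIndex: number): string {
  const start = chunkIndex * CHUNK_SIZE;
  const end = start + CHUNK_SIZE - 1; // HTTP Range end offsets are inclusive
  return `bytes=${start}-${end}`;
}

const firstChunk = rangeHeaderForChunk(0); // "bytes=0-4095"
```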
## Accessing the Explorer

Navigate to `/data` in your leaderboard deployment, or click **Data** in the navigation bar.
## Features
### SQL Editor
Write any read-only SQL query against the leaderboard database. The editor supports:
- Multi-line queries
- Cmd+Enter (Mac) / Ctrl+Enter (Windows/Linux) to execute
- Auto-resizing textarea
### Example Queries
Click any of the preset query buttons to load a starter query:
| Query | Description |
|---|---|
| All contributors | Lists every contributor with username, name, role, and title |
| Top 10 by points | Ranks contributors by total activity points |
| Activity types | Shows all activity definitions and their point values |
| Recent activity | Displays the 20 most recent activities with details |
| Activity per contributor | Aggregates activity count and points per contributor |
| Badge holders | Lists recent badge achievements with variant info |
### Schema Browser
On larger screens, a sidebar displays the full database schema — every table and its columns with types. Click a table name to expand or collapse its column list.
### Query Statistics

After each query, the stats bar shows:
- **Query duration** — how long the SQL execution took
- **Bytes fetched** — total data transferred via HTTP range requests
- **Request count** — number of HTTP requests made to serve the query
## Database Schema
The following tables are available for querying:
### `contributor`
| Column | Type |
|---|---|
| username | VARCHAR (PK) |
| name | VARCHAR |
| role | VARCHAR |
| title | VARCHAR |
| avatar_url | VARCHAR |
| bio | TEXT |
| social_profiles | JSON |
| joining_date | DATE |
| meta | JSON |
### `activity`
| Column | Type |
|---|---|
| slug | VARCHAR (PK) |
| contributor | VARCHAR (FK → contributor) |
| activity_definition | VARCHAR (FK → activity_definition) |
| title | VARCHAR |
| occurred_at | TIMESTAMP |
| link | VARCHAR |
| text | TEXT |
| points | SMALLINT |
| meta | JSON |
### `activity_definition`
| Column | Type |
|---|---|
| slug | VARCHAR (PK) |
| name | VARCHAR |
| description | TEXT |
| points | SMALLINT |
| icon | VARCHAR |
### `badge_definition`
| Column | Type |
|---|---|
| slug | VARCHAR (PK) |
| name | VARCHAR |
| description | TEXT |
| variants | JSON |
### `contributor_badge`
| Column | Type |
|---|---|
| slug | VARCHAR (PK) |
| badge | VARCHAR (FK → badge_definition) |
| contributor | VARCHAR (FK → contributor) |
| variant | VARCHAR |
| achieved_on | DATE |
| meta | JSON |
### `global_aggregate`
| Column | Type |
|---|---|
| slug | VARCHAR (PK) |
| name | VARCHAR |
| description | TEXT |
| value | JSON |
| hidden | BOOLEAN |
| meta | JSON |
### `contributor_aggregate_definition`
| Column | Type |
|---|---|
| slug | VARCHAR (PK) |
| name | VARCHAR |
| description | TEXT |
| hidden | BOOLEAN |
### `contributor_aggregate`
| Column | Type |
|---|---|
| aggregate | VARCHAR (FK → contributor_aggregate_definition) |
| contributor | VARCHAR (FK → contributor) |
| value | JSON |
| meta | JSON |
## Useful Query Examples

### Contributors with the longest streaks
```sql
SELECT ca.contributor, c.name, ca.value AS streak_days
FROM contributor_aggregate ca
JOIN contributor c ON ca.contributor = c.username
WHERE ca.aggregate = 'longest_streak'
ORDER BY CAST(ca.value AS INTEGER) DESC
LIMIT 10;
```

### Activity breakdown by type
```sql
SELECT ad.name, COUNT(*) AS count, SUM(a.points) AS total_points
FROM activity a
JOIN activity_definition ad ON a.activity_definition = ad.slug
GROUP BY a.activity_definition
ORDER BY total_points DESC;
```

### Monthly activity trend
```sql
SELECT strftime('%Y-%m', occurred_at) AS month,
       COUNT(*) AS activities,
       SUM(points) AS points
FROM activity
GROUP BY month
ORDER BY month;
```

### Contributors who earned a specific badge
```sql
SELECT cb.contributor, c.name, cb.variant, cb.achieved_on
FROM contributor_badge cb
JOIN contributor c ON cb.contributor = c.username
WHERE cb.badge = 'prolific_contributor'
ORDER BY cb.achieved_on;
```

## Hosting the Database on GitHub Pages
If your deployment target has a per-file size limit (e.g., Cloudflare Pages at 25 MiB), or you simply want to decouple the database from the application build, you can serve the `.db` file from GitHub Pages using a GitHub Actions workflow — without ever committing the database file to git.
### Why GitHub Pages?

- **No per-file size limit** — GitHub Pages has a 1 GB total site size limit but no individual file size restriction
- **Supports HTTP range requests** — required by `sql.js-httpvfs` for lazy loading
- **Free for public repositories** — ideal for open-source leaderboards
- **No bloated git history** — the workflow uploads the database as a Pages deployment artifact, so the `.db` file never enters version control
### GitHub Actions Workflow

Create `.github/workflows/publish-db.yml` in your leaderboard repository (or data repository):
```yaml
name: Publish Leaderboard Database

on:
  schedule:
    - cron: "0 */6 * * *" # Every 6 hours
  workflow_dispatch: # Manual trigger

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: pages
  cancel-in-progress: false

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm
      - run: pnpm install
      - name: Build database
        run: pnpm build:data
        env:
          LEADERBOARD_DATA_DIR: ./data
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Prepare Pages artifact
        run: |
          mkdir -p _site
          cp data/.leaderboard.db _site/data.db
      - name: Upload Pages artifact
        uses: actions/upload-pages-artifact@v3
  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
```

This workflow:
- Runs the plugin runner to build a fresh database
- Copies the `.db` file into a `_site/` directory
- Uploads it as a GitHub Pages artifact (no git commit)
- Deploys the artifact to GitHub Pages
### Enable GitHub Pages

In your repository, go to **Settings → Pages** and set the source to **GitHub Actions** (not "Deploy from a branch").
### Configure `config.yaml`
Point the Data Explorer to the published URL:
```yaml
leaderboard:
  data_explorer:
    source: https://yourorg.github.io/your-repo/data.db
```

The prebuild step will skip copying `data.db` locally, and the browser will fetch it directly from GitHub Pages.
### CORS Limitation

**Important:** GitHub Pages does not set the `Access-Control-Expose-Headers` response header. When the Data Explorer is hosted on a different origin than the database (cross-origin), the browser strips the `Accept-Ranges` and `Content-Range` headers from responses. This prevents `sql.js-httpvfs` from performing range requests correctly, resulting in a "database disk image is malformed" error.

GitHub Pages works only when the leaderboard site and the database are served from the same origin. For cross-origin hosting, use Cloudflare R2 or another storage provider that supports configurable CORS headers.
## Hosting the Database on Cloudflare R2
Cloudflare R2 is an object storage service with a generous free tier. It supports configurable CORS policies — making it the recommended option for hosting the database externally, especially when deploying the leaderboard to Cloudflare Pages.
### Why Cloudflare R2?

- **Configurable CORS headers** — you can expose `Accept-Ranges`, `Content-Range`, and `Content-Length`, which are required for cross-origin range requests
- **No per-file size limit** — R2 supports objects up to 5 TB
- **Free tier** — 10 GB storage, 10 million reads/month, 1 million writes/month
- **Native Cloudflare integration** — pairs naturally with Cloudflare Pages deployments
- **Global edge network** — low-latency access worldwide
### Step 1: Create an R2 Bucket

1. Go to the Cloudflare dashboard → **R2 Object Storage** → **Create bucket**
2. Name it (e.g., `leaderboard-data`)
3. Choose a location hint closest to your users
### Step 2: Enable Public Access

1. Open your bucket → **Settings** → **Public access**
2. Under **Custom Domains**, add a domain (e.g., `data.leaderboard.yourorg.com`) — or use the default R2 public URL by enabling the **R2.dev subdomain**
### Step 3: Configure CORS Policy

In your bucket settings, go to **CORS policy** and add the following, replacing the origin with your leaderboard deployment URL:
```json
[
  {
    "AllowedOrigins": ["https://contributors.yourorg.com"],
    "AllowedMethods": ["GET", "HEAD"],
    "AllowedHeaders": ["Range"],
    "ExposeHeaders": ["Accept-Ranges", "Content-Range", "Content-Length"],
    "MaxAgeSeconds": 86400
  }
]
```

This ensures only your leaderboard site can make cross-origin requests to the database. Requests from any other origin will be blocked by the browser.
**CORS is not full access control.** CORS only prevents other websites from loading the database via JavaScript (`fetch`, `XMLHttpRequest`). It does not prevent direct downloads — anyone with the URL can still access the file via `curl`, a browser address bar, or any non-browser client. If you need to restrict direct access, see Securing Access with a Cloudflare Worker below.
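If you manage the bucket configuration in code, a small sanity check that a CORS rule exposes everything `sql.js-httpvfs` needs might look like this. This is an illustrative sketch only; `CorsRule` and `missingExposeHeaders` are not an official Cloudflare API:

```typescript
// Minimal shape of one R2 CORS rule, as shown in the policy above.
interface CorsRule {
  AllowedOrigins: string[];
  AllowedMethods: string[];
  ExposeHeaders?: string[];
}

// Headers that must be exposed for cross-origin range requests to work.
const REQUIRED_HEADERS = ["Accept-Ranges", "Content-Range", "Content-Length"];

// Return the required headers a rule fails to expose (empty array = OK).
function missingExposeHeaders(rule: CorsRule): string[] {
  const exposed = new Set((rule.ExposeHeaders ?? []).map((h) => h.toLowerCase()));
  return REQUIRED_HEADERS.filter((h) => !exposed.has(h.toLowerCase()));
}
```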
### Step 4: Upload the Database

You can upload manually via the dashboard, or use `wrangler` in CI:
```sh
npx wrangler r2 object put leaderboard-data/data.db \
  --file data/.leaderboard.db \
  --content-type application/octet-stream
```

### Step 5: Configure `config.yaml`
Point the Data Explorer to your R2 URL:
```yaml
leaderboard:
  data_explorer:
    source: https://data.leaderboard.yourorg.com/data.db
```

If using the R2.dev subdomain instead of a custom domain:
```yaml
leaderboard:
  data_explorer:
    source: https://leaderboard-data.<account-id>.r2.dev/data.db
```

### Automating with GitHub Actions
Create `.github/workflows/publish-db.yml` to rebuild and upload the database on a schedule:
```yaml
name: Publish Leaderboard Database to R2

on:
  schedule:
    - cron: "0 */6 * * *" # Every 6 hours
  workflow_dispatch: # Manual trigger

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm
      - run: pnpm install
      - name: Build database
        run: pnpm build:data
        env:
          LEADERBOARD_DATA_DIR: ./data
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Upload to R2
        run: npx wrangler r2 object put leaderboard-data/data.db --file data/.leaderboard.db --content-type application/octet-stream
        env:
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
```

Add these secrets to your repository:
- `CLOUDFLARE_API_TOKEN` — create an API token with R2 write permissions in the Cloudflare dashboard
- `CLOUDFLARE_ACCOUNT_ID` — found on the R2 overview page
## Technical Details
### Build-Time Setup

The prebuild script (`scripts/setup-db.ts`) copies three files into `public/`:

- `data.db` — the SQLite database from the data directory
- `sqlite.worker.js` — the sql.js-httpvfs Web Worker
- `sql-wasm.wasm` — the SQLite WebAssembly binary (~1.2 MB)

These are wired into `predev` and `prebuild` in `package.json` via the `setup:db` script.
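Conceptually, the copy step reduces to the sketch below. The `setupDb` helper and its paths are assumptions for illustration; the real `scripts/setup-db.ts` also resolves the worker and WASM files from the `sql.js-httpvfs` package and handles the external-URL case:

```typescript
import { copyFileSync, existsSync, mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical prebuild step: copy the database plus the worker/WASM
// assets into the public directory. Names match the assets listed above.
function setupDb(dbPath: string, assetDir: string, publicDir: string): void {
  mkdirSync(publicDir, { recursive: true });
  copyFileSync(dbPath, join(publicDir, "data.db"));
  for (const asset of ["sqlite.worker.js", "sql-wasm.wasm"]) {
    copyFileSync(join(assetDir, asset), join(publicDir, asset));
  }
}

// Demo against a throwaway directory with placeholder files.
const root = mkdtempSync(join(tmpdir(), "setup-db-"));
writeFileSync(join(root, ".leaderboard.db"), "placeholder");
writeFileSync(join(root, "sqlite.worker.js"), "placeholder");
writeFileSync(join(root, "sql-wasm.wasm"), "placeholder");
const publicDir = join(root, "public");
setupDb(join(root, ".leaderboard.db"), root, publicDir);
const copied = existsSync(join(publicDir, "data.db"));
```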
### Client-Side Architecture

The `useDatabase()` hook in `lib/sql-repl/use-database.ts`:

- Dynamically imports `sql.js-httpvfs` (avoids SSR issues)
- Calls `createDbWorker` with the static asset URLs
- Uses `serverMode: "full"` with a `requestChunkSize` of 4096 bytes
- Exposes `exec(sql)` to run queries and `getStats()` for network metrics
- Cleans up the Web Worker on component unmount
### Performance

- **Lazy loading**: Only database pages touched by a query are fetched
- **Worker thread**: Queries run in a Web Worker, keeping the UI responsive
- **Indexed queries**: Queries using indexed columns (`contributor`, `occurred_at`, `activity_definition`) are fast even on large databases
- **Full table scans**: Queries like `SELECT * FROM activity` will fetch the entire table — use `LIMIT` for large tables
### Limitations

- **Read-only**: You can run `INSERT` or `UPDATE` statements, but they only affect the in-memory copy and are lost on page reload
- **SQLite dialect**: Uses SQLite SQL syntax, not PostgreSQL or MySQL
- **Memory**: The fetched database pages are kept in memory — very large databases may cause high memory usage
- **No persistence**: Query history is not saved between sessions