Engineering
Tech stack, repo layout, local environment, deployment, particularities.
Tech stack
Selection principles
- Open source first — every component must be self-hostable or replaceable
- Python backend — matches the team's profile (data engineering)
- Near-zero cost during the beta — 10 users, no reason to pay
Backend
| Component | Choice | Target version |
|---|---|---|
| API framework | FastAPI | ≥ 0.115 |
| ASGI server | Uvicorn | ≥ 0.34 |
| ORM | SQLAlchemy | ≥ 2.0 |
| Geospatial extension | GeoAlchemy2 | ≥ 0.15 |
| Migrations | Alembic | ≥ 1.14 |
| Authentication | Cookie session + double-submit CSRF (JWT payload via PyJWT); bcrypt for passwords | — |
| Validation | Pydantic v2 | ≥ 2.0 |
| Rate limiting | slowapi | ≥ 0.1.9 |
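The double-submit CSRF scheme named in the Authentication row can be illustrated with a minimal, stdlib-only sketch. The function names are hypothetical and the real middleware may differ; the only idea shown is that a mutating request is accepted when the token echoed in a request header equals the token stored in the CSRF cookie.

```python
import hmac
import secrets
from typing import Optional


def issue_csrf_token() -> str:
    """Return a fresh random token to store in the CSRF cookie (hypothetical helper)."""
    return secrets.token_urlsafe(32)


def csrf_ok(cookie_token: Optional[str], header_token: Optional[str]) -> bool:
    """Double-submit check: the header value must echo the cookie value exactly."""
    if not cookie_token or not header_token:
        return False
    # Constant-time comparison, so timing does not reveal how many characters matched.
    return hmac.compare_digest(cookie_token.encode(), header_token.encode())
```

The scheme relies on the browser's same-origin rules: a cross-site page can make the browser send the cookie, but it cannot read the cookie to copy its value into a header.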
Database
| Component | Choice |
|---|---|
| RDBMS | PostgreSQL (16 in prod on Railway, 18 locally — see backups.md for the version-mismatch rationale) |
| Geospatial extension | PostGIS 3 |
PostGIS handles coordinates, bounding boxes, and geographic queries (radius, intersection…).
Media storage
| Component | Choice |
|---|---|
| Object storage | AWS S3 (private bucket, eu-west region) |
| CDN | AWS CloudFront (with Origin Access Control) |
| Python SDK | boto3 |
S3 + CloudFront from day one (not Supabase). AWS familiarity, evidence-preservation primitives (Object Lock, versioning, replication), no future migration tax. The backend talks to storage through a small Storage protocol (S3Storage for prod, LocalStorage for dev/CI). Shipped in v0.0.2 — see CHANGELOG.md.
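A small storage protocol of the kind described above might look like the following sketch. The method names (put, get, delete) and the path-escape guard are assumptions for illustration, not the project's actual interface; only the Storage / LocalStorage naming comes from the text.

```python
from pathlib import Path
from typing import Protocol


class Storage(Protocol):
    """Structural interface: any class with these methods satisfies it."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def delete(self, key: str) -> None: ...


class LocalStorage:
    """Filesystem-backed stand-in for dev/CI (illustrative only)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root).resolve()

    def _path(self, key: str) -> Path:
        path = (self.root / key).resolve()
        # Refuse keys that would escape the storage root, e.g. "../secrets".
        if not path.is_relative_to(self.root):
            raise ValueError(f"key escapes storage root: {key!r}")
        return path

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def delete(self, key: str) -> None:
        self._path(key).unlink(missing_ok=True)
```

Because Protocol is structural, an S3-backed class would only need the same three methods; callers type against Storage and never import boto3 directly.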
Frontend
| Component | Choice |
|---|---|
| Framework | Next.js 16 (App Router) |
| UI runtime | React 19 |
| Language | TypeScript |
| Interactive map | MapLibre GL JS (via react-map-gl/maplibre) + CARTO Dark Matter vector tiles |
| Rich editor (proof) | Tiptap |
| Styles | Tailwind CSS 4 (CSS-first config — @theme block in frontend/src/app/globals.css, no tailwind.config.ts) |
| Icons | lucide-react |
| Linting | ESLint 9 (flat config in frontend/eslint.config.mjs, bridged via FlatCompat to eslint-config-next's next/core-web-vitals preset). The next lint wrapper was deprecated in Next 15 and removed in Next 16 — npm run lint invokes eslint directly. |
| Tests | Vitest + Testing Library (jsdom, config in frontend/vitest.config.mts). Colocated *.test.ts(x) under src/; npm test runs once, npm run test:watch watches. NEXT_PUBLIC_API_URL is stubbed in the config so importing lib/api.ts doesn't trip its boot guard. |
MapLibre GL JS is open-source (BSD-3-Clause), uses vector tiles, and supports client-side clustering. CARTO Dark Matter tiles are free for non-commercial use and visually align with the dark theme.
Client pages load read-only API data through useApiResource<T>(path) (frontend/src/hooks/useApiResource.ts): GET on mount and on every path change, abort of the in-flight request on unmount / path change, skip while path is null (auth unresolved, route params not ready), refetch() for retry buttons and post-mutation refreshes. Errors surface as messages for the page to render — 401 handling stays in the proxy. Lists the page mutates after seeding (e.g. TagPicker appending a newly created tag) stay useState + apiFetch.
Hosting
| Service | Platform | Estimated cost |
|---|---|---|
| Backend (FastAPI) | Railway | ~0–5 €/month |
| Frontend (Next.js) | Vercel | Free |
| Database (PostgreSQL + PostGIS) | Railway | Included in the plan |
| Media storage | AWS S3 + CloudFront | ~1–3 $/month at beta scale |
Beta total (10 users): ~5 €/month.
Out of technical scope for the MVP
- Redis / external cache — not needed (an in-process TTL+LRU cache is used for the points endpoint, see backend/app/cache.py)
- Task queue (Celery, etc.) — no async processing in the MVP
- Multi-region S3 / cross-region replication — single-region for closed beta
- Monitoring / observability — UptimeRobot liveness checks on the API health endpoint + a Sentry SDK on both tiers (backend + frontend), opt-in via a DSN env var (shipped v0.1.0 — see Observability). No full APM / tracing pipeline yet.
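The in-process TTL+LRU cache mentioned in the first bullet can be sketched in a few lines of stdlib Python. This is an illustrative guess at the technique, not the contents of backend/app/cache.py; the class name, method names, and the injectable clock are assumptions.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLLRUCache:
    """Bounded cache: entries expire after ttl_seconds, and the least
    recently used entry is evicted once max_entries is exceeded."""

    def __init__(
        self,
        max_entries: int,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests need not sleep
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop the expired entry
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict least recently used
```

A cache like this lives in one process only, so it suits a single backend instance; with several workers each would hold its own copy.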
Repository layout (monorepo)
vidit/
├── AGENTS.md
├── CHANGELOG.md # release history (append-only)
├── CLAUDE.md # one-line `@AGENTS.md` pointer for Claude Code
├── CODE_OF_CONDUCT.md # Contributor Covenant 2.1
├── CONTRIBUTING.md # PR flow, doc-sync rule, commit conventions
├── LICENSE # AGPL-3.0
├── Makefile # init / dev / seed / mock-admin / test entry points
├── README.md
├── SECURITY.md # vulnerability reporting
├── docker-compose.yml # PostgreSQL + PostGIS for local dev
├── docker/ # custom PG 18 image (PostGIS + AGE + pg_cron) + backup cron
│
├── backend/ # FastAPI (Python)
│ ├── app/
│ │ ├── main.py # FastAPI entry point
│ │ ├── config.py # Settings (pydantic-settings)
│ │ ├── database.py # SQLAlchemy engine + session
│ │ ├── cache.py # In-process TTL + LRU cache
│ │ ├── dependencies.py # get_db, get_current_user
│ │ ├── middleware/ # HSTS, request-context, CSRF, gate
│ │ ├── models/ # SQLAlchemy — one table per file
│ │ │ ├── admin_event.py # Admin-action audit log
│ │ │ ├── auth_event.py # /auth/* audit log
│ │ │ ├── auth_token.py # Single-use password-reset tokens
│ │ │ ├── bounty.py # Bounty + BountyClaim (multi-claimer signal)
│ │ │ ├── follow.py # Analyst → analyst follow edges
│ │ │ ├── geolocation.py
│ │ │ ├── invite_code.py
│ │ │ ├── media.py
│ │ │ ├── pending_registration.py # Pre-creation registration staging
│ │ │ ├── proof_image.py # Inline images uploaded from the Tiptap proof editor
│ │ │ ├── tag.py
│ │ │ └── user.py
│ │ ├── schemas/ # Pydantic v2 — request/response
│ │ │ ├── admin.py
│ │ │ ├── auth.py
│ │ │ ├── bounty.py
│ │ │ ├── geolocation.py
│ │ │ ├── media.py
│ │ │ ├── recovery.py # Password-reset request/confirm bodies
│ │ │ ├── search.py
│ │ │ ├── tag.py
│ │ │ └── user.py
│ │ ├── routers/ # FastAPI endpoints
│ │ │ ├── admin.py
│ │ │ ├── auth.py
│ │ │ ├── bounties.py
│ │ │ ├── geolocations.py
│ │ │ ├── search.py
│ │ │ ├── social.py # Follow / unfollow / timeline
│ │ │ ├── tags.py
│ │ │ └── users.py
│ │ └── services/ # Business logic
│ │     ├── admin.py # Invite mint, trust toggle, soft/hard delete
│ │     ├── audit.py # auth_events + admin_events writes
│ │     ├── auth.py # JWT, hashing, invite-code consume (atomic UPDATE)
│ │     ├── auth_cookies.py # Session + CSRF cookie issuance / clearing
│ │     ├── auth_tokens.py # Single-use password-reset tokens
│ │     ├── bounties.py # create_with_evidence (bounty side)
│ │     ├── email.py # Resend / console-echo email transport
│ │     ├── evidence_intake.py # Shared media intake: file cap, upload loop, commit/sweep + typed errors
│ │     ├── evidence_processing.py # EXIF strip + sha256 hash on upload
│ │     ├── geolocations.py # create_with_evidence (geo side) + typed GeolocationError hierarchy
│ │     ├── maintenance.py # Reapers: auth tokens, proof orphans, pending regs
│ │     ├── registration.py # Pre-creation flow: pending row, claim, confirm
│ │     ├── sanitize.py # Server-side Tiptap (ProseMirror) sanitiser
│ │     ├── search.py # ts_headline-driven highlight pipeline
│ │     ├── seed.py # Admin demo-data seeder
│ │     ├── social.py # Follow edges, timeline assembly
│ │     └── storage.py # Storage protocol + S3Storage / LocalStorage + sweep_keys post-commit helper
│ ├── alembic/ # DB migrations
│ ├── scripts/ # Local-dev helpers (mock_admin, seed_demo, seed_timeline)
│ ├── tests/
│ ├── alembic.ini
│ ├── pyproject.toml # uv + dependencies
│ └── Dockerfile
│
├── frontend/ # Next.js 16 (TypeScript)
│ ├── src/
│ │ ├── app/ # App Router
│ │ │ ├── layout.tsx
│ │ │ ├── page.tsx # Public landing page (storefront)
│ │ │ ├── about/ # Public marketing / mission page
│ │ │ ├── admin/ # Admin console (invites, demo seed, reapers)
│ │ │ ├── bounties/ # Bounty index + detail + create
│ │ │ ├── geolocations/ # Detail + submit form
│ │ │ ├── map/ # Interactive map (the app home)
│ │ │ ├── profile/[username]/ # Analyst profile
│ │ │ ├── search/ # Global search
│ │ │ ├── settings/ # User settings
│ │ │ ├── timeline/ # Following-feed
│ │ │ ├── (auth)/ # Login, register, forgot, etc. (sidebar hidden)
│ │ │ ├── error.tsx # Route-level error boundary
│ │ │ └── global-error.tsx # Root error boundary
│ │ ├── components/
│ │ │ ├── admin/ # Admin console panels (SeedWipePanel, etc.)
│ │ │ ├── auth/ # LoginForm, RegisterForm, etc.
│ │ │ ├── editor/ # Tiptap components
│ │ │ ├── geolocation/ # GeolocationCard, GeolocationDetailBody, etc. (cross-page)
│ │ │ ├── geolocations/new/ # Submit-form sections (LocationPicker, etc.)
│ │ │ ├── map/ # MapLibre GL components + map overlays (FilterPanel, etc.)
│ │ │ ├── profile/ # TrustBadge, ProfileHeader, useProfileEdit, etc.
│ │ │ ├── ui/ # PageShell, styles.ts, WipBadge, etc.
│ │ │ ├── ClosedBetaBanner.tsx
│ │ │ ├── PathTracker.tsx
│ │ │ └── Sidebar.tsx
│ │ ├── contexts/AuthContext.tsx
│ │ ├── hooks/ # useAdmin, etc.
│ │ ├── lib/ # api.ts, auth.ts, mediaUrls.ts, format.ts, …
│ │ ├── types/index.ts # Shared types
│ │ └── proxy.ts # Host redirect + auth wall (Edge runtime)
│ ├── public/
│ ├── package.json
│ ├── tsconfig.json
│ └── next.config.mjs
│
├── docs/ # technical reference
│ ├── api.md
│ ├── backups.md # weekly pg_dump cron + restore drill
│ ├── data-model.md
│ ├── design.md
│ └── engineering.md # tech stack + repo layout + deployment + particularities
│
├── planning/ # project planning (not user docs)
│ ├── next.md # scheduled work + unscheduled candidates
│ └── roadmap.md # vision + openness commitment
│
├── video/ # "Promo as code" pipeline — see video/README.md
│ ├── src/ # Remotion composition (Demo.tsx) + components
│ ├── seed-bounties.js # Seeds bounty list from analyst tweets (idempotent)
│ ├── record-submit.js # Playwright + DOM cursor overlay → recording-submit.mp4
│ ├── package.json # remotion + playwright deps
│ └── README.md # Operator guide + brittleness notes
│
└── .github/
    └── workflows/
        ├── ci.yml # per-commit gate: backend + frontend + docs-pairing jobs
        ├── deploy.yml # manual workflow_dispatch (railway up / vercel deploy)
        └── pr-title.yml
DCO sign-off is enforced by the Probot DCO App, not an in-tree workflow file.
Backend — conventions
Layered structure
| Layer | Role | Rule |
|---|---|---|
| routers/ | HTTP endpoints, no business logic | Calls a service, returns a schema. Maps service-raised typed errors to HTTP status + {code, message} detail (see routers/auth.py _REGISTRATION_ERROR_STATUS / routers/admin.py _ADMIN_ERROR_STATUS). |
| services/ | Business logic | Accesses the DB through the session, never sees Request/Response, never raises HTTPException — raise a typed error subclass with a stable code and let the router translate. |
| models/ | SQLAlchemy tables | No logic — just structure |
| schemas/ | Pydantic validation | Input and output separated (Create, Read, Update, List) |
| dependencies.py | FastAPI injection | get_db, get_current_user |
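The router/service split in the table above can be illustrated without FastAPI. In this sketch the error classes, their codes, and the status mapping are hypothetical stand-ins (the real mapping lives in the _REGISTRATION_ERROR_STATUS / _ADMIN_ERROR_STATUS tables the text cites); it shows only the shape: services raise typed errors with a stable code, and a router-side helper translates them to a status plus a {code, message} detail.

```python
from typing import Dict, Tuple, Type


class RegistrationError(Exception):
    """Base class raised by the service layer; carries a stable code."""

    code = "registration_error"


class InviteCodeInvalid(RegistrationError):
    code = "invite_code_invalid"


class UsernameTaken(RegistrationError):
    code = "username_taken"


# Router-side table: typed error class -> HTTP status (illustrative values).
_ERROR_STATUS: Dict[Type[RegistrationError], int] = {
    InviteCodeInvalid: 400,
    UsernameTaken: 409,
}


def to_http(exc: RegistrationError) -> Tuple[int, Dict[str, str]]:
    """Translate a service error into (status, detail) for the HTTP layer."""
    status = _ERROR_STATUS.get(type(exc), 400)  # unknown subclasses fall back to 400
    return status, {"code": exc.code, "message": str(exc)}
```

In a FastAPI router the returned pair would typically feed an HTTPException; the service itself never imports anything HTTP-related, which keeps it testable without a request context.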
Schema naming
GeolocationCreate → POST input
GeolocationUpdate → PATCH input
GeolocationRead → output (API response)
GeolocationList → simplified output (map, lists)
Code comments
Default to none. A comment earns its place only when it states something the code cannot: a hidden constraint or invariant, a bug it prevents, a security or performance rationale, why a # type: ignore / @ts-expect-error exists, a non-obvious decision, or surprising external behaviour. Delete comments that restate the adjacent line, docstrings that echo the signature, and Usage: blocks for trivial symbols. FastAPI route-handler docstrings are the exception — they surface as the OpenAPI description, so keep their first-line summary.
Local environment
Docker Compose
docker-compose.yml spins up a custom PostgreSQL image (docker/Dockerfile) bundling PostGIS, Apache AGE, and pg_cron. The two preloaded extensions need shared_preload_libraries = 'age, pg_cron' baked into postgresql.conf at image-build time — appended to postgresql.conf.sample in docker/Dockerfile since the stock postgres image doesn't honour POSTGRES_SHARED_PRELOAD_LIBRARIES. Container vidit-db; data volume mounted at /var/lib/postgresql (not /data) so AGE catalog state persists across restarts.
The backend (FastAPI via uvicorn) and the frontend (Next.js dev server) run on the host for hot reload.
docker-compose up -d → PostgreSQL on :5432
uv run uvicorn ... → backend on :8000
npm run dev → frontend on :3000
Environment variables
Each service has its own .env (not committed):
- backend/.env — DATABASE_URL, JWT_SECRET, STORAGE_BACKEND (local or s3), S3_BUCKET, AWS_REGION, CLOUDFRONT_DOMAIN, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, CORS_ORIGINS. Full list in backend/.env.example.
- frontend/.env.local — NEXT_PUBLIC_API_URL. Full list in frontend/.env.local.example.
Running multiple frontends against one backend
The local CORS allowlist accepts every localhost:<port> (http or https) by default — see backend/app/config.py (cors_origin_regex). One backend on :8000 serves any number of concurrent frontends (main checkout, worktrees, alternate ports) without restart. For a frontend on a non-default port:
The override is only the localhost regex — explicit CORS_ORIGINS (production hosts) still apply. What keeps this safe in prod is the SameSite=lax attribute on the auth cookies (backend/app/config.py cookie_samesite), not cookie domain scoping — domain scoping governs which host receives cookies, not which origin may trigger the request. A cross-site fetch from a page at localhost:N doesn't carry lax cookies to api.vidit.app, so a hostile local page gets no credentialed response.
In prod, set CORS_ORIGIN_REGEX= (empty) in Railway env vars to drop the localhost allowance — the protection above holds only while the cookies stay SameSite=lax, and the public CORS surface shouldn't depend on a cookie attribute staying put.
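The interplay between the explicit origin list and the localhost regex can be sketched as below. The pattern and the origin_allowed helper are assumptions for illustration; the actual cors_origin_regex value in backend/app/config.py is not reproduced here.

```python
import re
from typing import Optional, Sequence

# Assumed shape of a localhost-any-port pattern (http or https).
LOCALHOST_ORIGIN = re.compile(r"https?://localhost(:\d+)?")


def origin_allowed(
    origin: str,
    explicit_origins: Sequence[str],
    origin_regex: Optional[re.Pattern],
) -> bool:
    """Allow an origin if it is listed explicitly or fully matches the regex."""
    if origin in explicit_origins:
        return True
    # fullmatch, not match: "http://localhost.evil.com" must not pass.
    return origin_regex is not None and origin_regex.fullmatch(origin) is not None
```

Passing None as the regex models the production setting with an empty CORS_ORIGIN_REGEX: only the explicit list remains in force.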
CI/CD
GitHub Actions
| Workflow | Trigger | Steps |
|---|---|---|
| ci.yml | Every push to main and every PR (no path filters — required checks must always report, or a docs-only PR hangs on a path-skipped required check once branch protection is on) | Four jobs. backend-lint: uv sync → ruff check → ruff format --check → mypy app. backend-test (needs: backend-lint — a ruff/mypy failure never starts the service container; a green run pays the lint duration before tests start): alembic upgrade head → pytest against a PostGIS service container. frontend (one job on purpose — the steps fail fast in sequence off a single npm ci; splitting checks from tests would pay a second install for nothing): npm ci → eslint → tsc --noEmit → vitest run → next build. docs-pairing (PR-only): fails the PR when it doesn't touch both docs/ (api / data-model / engineering / design / backups) AND planning/ (next.md or roadmap.md) — friction-first guardrail; if the change genuinely needs neither, override with a justification in the PR description. Dependabot PRs are exempt (gated on pull_request.user.login != 'dependabot[bot]') — routine version bumps don't carry doc impact; if one turns out to, the human handling the merge adds it via a follow-up commit. Consolidated from the previous backend.yml + frontend.yml + docs-pairing.yml trio. PR-branch force-pushes cancel the obsolete in-flight run; pushes to main always run to completion. |
| codeql.yml | Push to main, PR to main, weekly cron (Monday 06:00 UTC) | CodeQL dataflow analysis on Python + TypeScript/JavaScript with the security-extended query suite. Findings post to Security tab → Code scanning alerts. The analyze job is gated on !github.event.repository.private — code scanning is free on public repos but a paid GitHub Advanced Security add-on on private ones, so the job runs on the public repo and skips (rather than fails) anywhere the repository is private, e.g. a private fork. |
| pr-title.yml | PR opened / edited / synchronized | Validates the PR title against Conventional Commits. Stays outside ci.yml on purpose: it re-runs on title edits, and bundling it would re-run the full test suite on every edit. |
| deploy.yml | workflow_dispatch | See Deployment below. |
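The docs-pairing rule from the ci.yml row can be expressed as a small predicate. This is a simplified sketch under stated assumptions: it checks path prefixes only (the real job names specific files), and the function name and signature are hypothetical.

```python
from typing import Iterable


def docs_pairing_ok(changed_paths: Iterable[str], pr_author: str) -> bool:
    """Pass when a PR touches both docs/ and planning/, or is a Dependabot PR."""
    # Dependabot PRs are exempt, mirroring the gate described in the table.
    if pr_author == "dependabot[bot]":
        return True
    paths = list(changed_paths)
    touches_docs = any(p.startswith("docs/") for p in paths)
    touches_planning = any(p.startswith("planning/") for p in paths)
    return touches_docs and touches_planning
```

For example, a PR changing only backend/app/main.py and docs/api.md would fail this check until it also touches planning/next.md or planning/roadmap.md.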
Dependabot configuration lives at .github/dependabot.yml: weekly Monday-morning version-update PRs across pip (backend), npm (frontend), and github-actions ecosystems, with grouping (@sentry/*, @tiptap/*, @typescript-eslint/*, @types/*, next + @next/* + eslint-config-next, and a minor-and-patch catch-all) so a busy ecosystem doesn't open ten PRs in one morning. Major bumps stay individual on purpose — those are the ones worth reviewing one at a time. Security PRs are unaffected by the config: they ship as Dependabot's default, one PR per advisory, on the same flow as the #21 / #22 / #23 batch.
DCO sign-off is enforced by the Probot DCO App, not an in-repo workflow. The app is installed on the org and posts a status check named DCO on every PR — walks every commit, fails the first one missing a Signed-off-by: trailer, links remediation instructions. The same de-facto-standard installation Kubernetes / Helm / containerd / Linux-kernel mirror use; trades the "no third-party in CI" posture for zero maintenance + no Actions minutes per PR event. Implements DCO 1.1 — not a CLA, no relicensing, inbound = outbound = AGPL-3.0.
Hardening (forks make every workflow run attacker-reachable):
- Every third-party action is SHA-pinned, with the human-readable version in a trailing comment (the # vX.Y.Z form is the one Dependabot's github-actions ecosystem reads to know which pin to rewrite on a version-update PR).
- Every workflow declares a top-level permissions: block scoped to the minimum it needs (contents: read for the five CI workflows, pull-requests: read on pr-title.yml).
- No workflow uses pull_request_target — fork-PR escalation vector. Stick to pull_request.
Deployment
| Service | Platform | Identifier | Method |
|---|---|---|---|
| Source | GitHub | github.com/vidithq/vidit — public, AGPL-3.0. Cross-linked from the landing roadmap card, the /about AGPL paragraph, and the sidebar header (next to the X + Discord shortcuts). | Direct push to feature branches; main is branch-protected, every change lands via PR. |
| Backend | Railway | project vidit / service backend — public host https://api.vidit.app (Railway-internal backend.railway.internal) | Dockerfile build, deployed via the deploy workflow (workflow_dispatch). Auto-deploy on push to main is off. railway up from backend/ works as a manual fallback. |
| Frontend | Vercel | team vidithq / project vidit-frontend — primary domain https://vidit.app (apex), www.vidit.app 308-redirects at the Vercel domain layer; vidit-frontend.vercel.app and any other non-canonical host 308-redirects at the Next.js proxy layer (frontend/src/proxy.ts — the file convention next@16 renamed from middleware.ts) so the project alias doesn't accumulate duplicate-content surface in search. | Deployed via the deploy workflow (workflow_dispatch) using vercel pull + vercel build + vercel deploy --prebuilt --prod. vercel --prod from frontend/ works as a manual fallback. Per-deployment hash URLs are SSO-walled; only the project alias is public. |
| DNS | Cloudflare | vidit.app zone, DNS-only (gray cloud) | Apex + www A → Vercel 76.76.21.21; api CNAME → Railway. Proxy mode (orange cloud) breaks Let's Encrypt cert provisioning. |
| Database | Railway | managed Postgres + PostGIS, service postgres-db (image postgis/postgis:16-3.4) | DATABASE_URL (with internal *.railway.internal host) is auto-injected onto the backend service when the DB is attached. New consumers wire it as ${{backend.DATABASE_URL}}. Public networking is off — admin scripts run inside the backend container via railway ssh --service backend. |
| Migrations | Railway | — | Pre-deploy hook: uv run alembic upgrade head (in backend/railway.json). Runs before the new container takes traffic. |
| Media | AWS | bucket <media-bucket> (region eu-west-3), CloudFront d10w3bld05vsky.cloudfront.net (OAC, not OAI). Versioning ON; Object Lock ON with default rule GOVERNANCE / 365 days (bucket-wide — see CHANGELOG v0.3.0); CORS GET/HEAD from https://vidit.app. Every image upload lands three sibling objects: the original (post EXIF-strip), <key>_hero.jpg (max-dim 1280, JPEG q80), <key>_thumb.jpg (max-dim 400, JPEG q80). Frontend renderers derive the hero / thumbnail URL from Media.storage_url via frontend/src/lib/mediaUrls.ts — keep that helper and the backend derivative_key() in backend/app/services/storage.py in sync. | Backend uploads via boto3 as IAM user <runtime-iam-user> (object-level perms only); bucket-level admin uses a separate <s3-admin> IAM principal. CloudFront serves the bucket. |
| Backups | Railway + AWS | Cron service backend-backup (image docker/backup/, 0 0 * * MON — Monday 00:00 UTC) → bucket <backup-bucket> (region eu-west-3). Versioning ON, SSE-S3, all public access blocked. Lifecycle: current objects expire 365d, noncurrent versions 30d, aborted multipart uploads 7d. | Writes through IAM user <backup-iam-user> with write-only S3 permissions (PutObject/AbortMultipartUpload/ListMultipartUploadParts) on the backup bucket — no Get, no Delete. Restore reads use the <s3-admin> profile, never the runtime user. Full runbook + restore drill: backups.md. |
Naming: <product>-<env>-<region> for the bucket so a future vidit-staging-eu-west-3 slots in. Service is just backend because Railway already nests it under vidit/production. Vercel project is vidit-frontend because the team scope is vidithq.
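The three-sibling layout in the Media row can be sketched as follows. Two things here are assumptions, not facts from the codebase: that the suffix is appended to the full original key, and that images smaller than the cap are left at their original size. The function names are hypothetical and are not the real derivative_key().

```python
from typing import Tuple

# Longest-side caps per variant, as stated in the Media row.
VARIANT_MAX_DIM = {"hero": 1280, "thumb": 400}


def sketch_derivative_key(key: str, variant: str) -> str:
    """Build the sibling object key for a variant (assumed naming rule)."""
    if variant not in VARIANT_MAX_DIM:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"{key}_{variant}.jpg"


def scaled_size(width: int, height: int, max_dim: int) -> Tuple[int, int]:
    """Shrink so the longest side is at most max_dim, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_dim:
        return width, height  # assumption: never upscale
    scale = max_dim / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Whatever the real rule is, the frontend helper in mediaUrls.ts has to reproduce it exactly, which is why the table asks for the two to stay in sync.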
Operating the platform — CLIs
Railway:
brew install railway
railway login # browser auth, saved per machine
railway link # interactive: pick project → environment → service (writes .railway/)
railway status # what's currently linked
railway variables # list
railway variables --set "KEY=value" # add/update; triggers redeploy
railway up [--detach] # build + deploy from cwd
railway logs [--build] # tail running deployment / latest build
railway run -- <command> # run a one-off in the service env
Vercel:
brew install vercel-cli
vercel login # interactive — but see Keychain quirk below
vercel link --yes --scope vidithq --project vidit-frontend
vercel env ls
printf 'value' | vercel env add NAME production # pipe avoids leaking via ps/history
vercel --prod --yes # promote to production
Vercel Keychain quirk: CLI ≥ 32 stores tokens in macOS Keychain; the auth.json file only contains {}. A sandboxed shell without Keychain access can't see credentials saved by vercel login and triggers a fresh device-auth flow on every invocation. Workaround for headless use: generate at https://vercel.com/account/tokens, then export VERCEL_TOKEN=… and pass --token="$VERCEL_TOKEN" --scope vidithq on every command.
--scope is required in non-interactive shells (no default team).
NEXT_PUBLIC_* env vars are baked into the JS bundle at build time. NEXT_PUBLIC_API_URL and NEXT_PUBLIC_DEMO_VIDEO_URL (the landing demo video — a CloudFront .mp4 URL) are passed explicitly into the build from repo variables in deploy.yml, because vercel pull doesn't reliably surface NEXT_PUBLIC_* to next build.
Observability — what's wired and how to turn it on
| Piece | State | How to turn on |
|---|---|---|
| Backend Sentry | SDK wired in backend/app/main.py — sentry_sdk.init(...) runs only when SENTRY_DSN is non-empty. | Create a project at sentry.io (Python / FastAPI), copy the DSN, then on Railway backend service: railway variables --set "SENTRY_DSN=https://..." --set "SENTRY_ENVIRONMENT=production". Verify: hit a 5xx path or sentry_sdk.capture_message('hello') from railway ssh and confirm it lands. |
| Frontend Sentry | SDK wired in frontend/instrumentation-client.ts + sentry.server.config.ts + sentry.edge.config.ts; booted by frontend/instrumentation.ts which also re-exports onRequestError = Sentry.captureRequestError so errors thrown inside nested React Server Components reach Sentry. Sentry.init(...) runs only when NEXT_PUBLIC_SENTRY_DSN (client) or SENTRY_DSN (server / edge) is non-empty. app/error.tsx + app/global-error.tsx forward caught exceptions via Sentry.captureException (React error boundaries are not auto-captured). next.config.mjs is wrapped with withSentryConfig. | On Vercel set NEXT_PUBLIC_SENTRY_DSN (Production) + SENTRY_DSN (server runtime) + NEXT_PUBLIC_SENTRY_ENVIRONMENT=production + SENTRY_ENVIRONMENT=production. For build-time source-map upload also add repo variables SENTRY_ORG + SENTRY_PROJECT + repo secret SENTRY_AUTH_TOKEN (wired through deploy.yml) and set the same on Vercel. Trigger a deploy workflow run. Verification: see Frontend Sentry verification below. |
| Uptime monitor | External. Pings /health from outside Railway region to catch outages. | Pick a free tier (UptimeRobot, BetterStack, Hyperping). Add https://api.vidit.app/health as an HTTP monitor, 1–5 min cadence, alert routes to owner email + the Vidit Discord webhook. Health endpoint is unauthenticated and returns {"status":"ok"}. |
| CloudWatch budget alarm | External. $20/mo guardrail against a forgotten log-volume spike or a runaway CloudFront-cache-miss bill. | AWS console → Billing → Budgets → Create budget → Cost budget, monthly $20 fixed amount, threshold 80% actual + 100% forecasted → email alert to owner. |
| Branch protection on main | External — configured via the branch-protection API; free on public repos (unenforced on free-plan private ones). | Active rule: PRs only, six required status checks (the four ci.yml jobs — Backend lint & format, Backend tests, Frontend lint, type-check, test, build, PR touches docs/ and planning/ — plus pr-title.yml's Conventional commit title and DCO from the Probot DCO App, not a workflow file), enforced for admins, linear history required, force-push and branch deletion disallowed. No required-review count: a sole maintainer cannot approve their own PR, so a review floor would deadlock every merge — add one (or CODEOWNERS) when a second maintainer exists. strict (require branch up to date) is off so the weekly Dependabot wave merges without per-PR rebase round-trips. ci.yml runs un-path-filtered precisely so these required checks always report. |
| Secret scanning + push protection | External — Settings → Code security; free on public repos, no config file. | Both enabled. Scanning alerts on provider-pattern tokens/keys anywhere in history and new commits; push protection rejects a push containing one before it lands (bypassable per-push with a logged justification). Alerts surface in Security tab → Secret scanning. |
Frontend Sentry verification
Drilled 2026-05-18. In an incognito window (extensions disabled):
- (a) Browse a few pages and check sentry.io → your project → Sessions for ticks within ~1 min. Session tracking emits an envelope per page load — no console action needed.
- (b) For an explicit issue, run setTimeout(() => { throw new Error("manual test") }, 0) in DevTools. The setTimeout matters: a synchronous throw from the console is swallowed by the DevTools wrapper and never reaches window.onerror. The SDK doesn't expose Sentry on window in 10.x, so Sentry.captureMessage(...) from the console errors with Sentry is not defined.
Ad-blocker caveat. uBlock, Brave shields, AdGuard, and most browser tracking-protection lists block direct POSTs to *.ingest.sentry.io with ERR_BLOCKED_BY_CLIENT. The fix is tunnelRoute: "/monitoring" in withSentryConfig, which proxies envelopes through a same-origin route — not yet wired.
Maintenance runbooks
Mint an invite code from the host:
railway ssh --service backend -- python <<'EOF'
import secrets, string
from datetime import UTC, datetime, timedelta
from app.database import SessionLocal
from app.models.invite_code import InviteCode
# 12-character code drawn from A-Z and 0-9 with a CSPRNG (secrets, not random).
alphabet = string.ascii_uppercase + string.digits
code = "".join(secrets.choice(alphabet) for _ in range(12))
db = SessionLocal()
try:
    row = InviteCode(
        code=code,
        max_uses=1,
        expires_at=datetime.now(UTC) + timedelta(days=7),
        note="break-glass",
    )
    db.add(row)
    db.commit()
    print(f"invite code: {code} (expires in 7d, 1 use)")
finally:
    # Always release the session, even if the commit fails.
    db.close()
EOF
Generate curated demo geolocations from the admin panel: make seed covers the auto-generated 50-point dataset for onboarding. For curated demos (promo recordings, screenshots, manually-themed content), populate s3://<bucket>/demo-pool/geo-XX/{media,proof}/ (or .local-storage/demo-pool/geo-XX/{media,proof}/ when STORAGE_BACKEND=local) with photos per template, then go to /admin → Demo data panel → enter a count → Generate. Seeded geos carry a demo tag for filtering; the same panel wipes them.
Clean up an orphan Railway domain (e.g. an auto-generated *.up.railway.app host — leaks the project name to scanners):
Railway dashboard → project `vidit` → service `postgres-db` → Settings → Networking
→ remove any public domain that isn't actively in use
Public networking on postgres-db is off. Delete any public domain with no DATABASE_PUBLIC_URL consumer.
Particularities (non-obvious things that bit us)
- postgres:// → postgresql:// — Railway injects the legacy scheme; SQLAlchemy 2 only loads under postgresql://. We string-prefix swap in backend/app/config.py _normalize_postgres_scheme. Fix landed in PR #21.
- $PORT not expanded in railway.json's startCommand — Railway passes the literal string $PORT. Fix: drop startCommand and let the Dockerfile CMD ["sh", "-c", "… --port ${PORT:-8000}"] expand it. See PR #22.
- CORS_ORIGINS is a comma-separated string, not pydantic's default JSON list. Property cors_origins_list parses it. The deployed Vercel alias must be in the list or browser calls fail at preflight. See PR #23.
- COOKIE_DOMAIN must be .vidit.app in prod — the vidit_csrf cookie is set by api.vidit.app but read by JavaScript at vidit.app. Without the parent-domain scope (COOKIE_DOMAIN=.vidit.app on the Railway backend service) the double-submit CSRF check can't see the token and every mutating request fails with CSRF token missing or invalid.
- Two gh accounts on the same machine drift — symptom is Repository not found on git fetch for a repo you can normally access. Fix: gh auth status then gh auth switch --user <correct-account>. gh configures git's credential helper.
- The Vercel bundle stays up during a backend outage — static JS loads from Vercel CDN regardless of Railway state. When investigating "the site is broken", check /health on Railway first.
- uvicorn needs --proxy-headers behind Railway, AND nothing may read request.client.host for security purposes — without --proxy-headers --forwarded-allow-ips='*' (set in the Dockerfile's CMD), request.url.scheme defaults to http and absolute URLs in emails go out broken. With those flags, however, uvicorn populates request.client.host from the left-most entry of X-Forwarded-For (uvicorn's always_trust=True branch returns x_forwarded_for_hosts[0]). Railway appends to X-Forwarded-For rather than overwriting it, so the left-most entry is whatever the client sent — fully attacker-controlled. The two callers that need a trustworthy client IP — the slowapi rate limiter and the auth-events audit log — both route through services/audit.py::extract_client_ip, which parses XFF itself and picks the right-most entry (the one the trusted proxy actually wrote). The slowapi side specifically uses the rate_limit_key wrapper (same module) as its key_func. Without that, an attacker could rotate X-Forwarded-For: <random> to mint a fresh per-IP rate-limit bucket per request, or send X-Forwarded-For: <victim_ip> to pin a victim's bucket and lock them out — defeating every per-endpoint rate limit. Never read request.client.host directly for rate-limit, auth, or audit purposes; reach for extract_client_ip / rate_limit_key. If a second trusted proxy ever sits in front of Railway (Cloudflare, etc.), bump TRUSTED_PROXY_HOPS to match — extract_client_ip peels one extra hop per increment.
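Two of the items above, the scheme swap and the right-most X-Forwarded-For rule, are easy to misread in prose, so here is a minimal sketch of each. The function names are deliberately different from the project's own helpers (_normalize_postgres_scheme, extract_client_ip) because the bodies are assumptions about how such helpers could work, in particular the fallback to the socket peer when the header has fewer entries than trusted hops.

```python
from typing import Optional


def swap_legacy_scheme(url: str) -> str:
    """Rewrite a leading postgres:// to postgresql://; leave other URLs alone."""
    legacy = "postgres://"
    if url.startswith(legacy):
        return "postgresql://" + url[len(legacy):]
    return url


def pick_client_ip(
    xff_header: Optional[str], peer_ip: str, trusted_hops: int = 1
) -> str:
    """Return the X-Forwarded-For entry written by the outermost trusted proxy.

    Each trusted proxy appends one entry, so with N trusted hops the
    N-th entry from the right is the last one an attacker cannot forge.
    """
    if not xff_header or trusted_hops < 1:
        return peer_ip
    entries = [part.strip() for part in xff_header.split(",") if part.strip()]
    if len(entries) < trusted_hops:
        # Assumed fallback: header shorter than expected, trust the socket peer.
        return peer_ip
    return entries[-trusted_hops]
```

With one trusted hop, a request carrying "6.6.6.6, 203.0.113.7" resolves to 203.0.113.7: the spoofed left-most value is ignored because only the proxy-appended entry is read.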
Package management
| Service | Tool | File |
|---|---|---|
| Backend | uv | pyproject.toml + uv.lock |
| Frontend | npm | package.json + package-lock.json |
Dependency security updates
Dependabot watches both ecosystems (pip on backend/uv.lock, npm on frontend/package-lock.json) and opens a security alert per advisory at github.com/vidithq/vidit/security/dependabot. The alert carries the GHSA ID, the vulnerable range, and the first patched version — the inputs needed to decide whether the fix lands as a lockfile-only refresh, a direct-dep bump, or a targeted overrides entry.
Three flows in practice:
- Transitive — lockfile-only. When the vulnerable package is reached through another dep and the resolver can pull the patched version without lifting a top-level constraint, the fix is a uv lock --upgrade (backend) or npm update <pkg> / npm audit fix (frontend) and nothing else. pyproject.toml and package.json don't move. Bundles the rest of the resolver-drift bumps along with it; gated by the ci.yml jobs green on the lock-only diff.
- Direct — manifest + lock. When the patched version is outside the current top-level constraint (a SemVer-major bump on a direct dep is the common case), the fix lands the manifest bump in the same PR as the lock refresh. A breaking-change pass is part of the diff; tests and types are the floor, browser smoke for the frontend.
- Override-pinned — npm overrides. When a transitive npm dep ships a CVE and the direct parent can't be lifted in the same PR (e.g., eslint-config-next pinned to ^14.2 until the Next migration; maplibre-gl on its own release cadence), frontend/package.json overrides force-resolve the patched version with targeted-range syntax (pkg@<x.y.z to scope to the vulnerable range only, parent>pkg for a single nested path). Universal overrides would force-downgrade safe higher-major lines elsewhere in the tree (e.g., @sentry/bundler-plugin-core's glob@13) and trip npm ls peer-warning noise that breaks npm ci in CI; the targeted forms avoid both. Override values are written as ranges (^x.y.z), not exact pins — npm 10 (used by npm ci in CI via actions/setup-node@v4.4.0) rewrites the consumer's peer-dep range to match the override exactly, so a fixed "8.5.10" collapses an autoprefixer@10 peer postcss: "^8.1.0" into peer postcss: "8.5.10" and clashes the moment top-level postcss resolves to a higher patch.
Dependabot itself opens version-bump PRs when it can — those land via the same PR flow as any contribution (Conventional title, sign-off, docs/+planning/ touch). Batched lockfile refreshes (closing N advisories at once with one uv lock --upgrade) cite each GHSA in the CHANGELOG entry so the audit trail stays per-advisory even though the diff is one lockfile.