Data model¶
Overview¶
erDiagram
users {
UUID id PK
VARCHAR username
VARCHAR email
VARCHAR password_hash
BOOLEAN is_active
BOOLEAN is_admin
BOOLEAN is_trusted
TEXT trust_reason "nullable"
TIMESTAMPTZ email_verified_at "nullable"
TIMESTAMPTZ deleted_at "nullable, soft-delete"
BOOLEAN is_demo "synthetic demo author"
INTEGER token_version "session-invalidation counter"
TEXT bio "nullable, profile blurb"
TEXT avatar_url "nullable"
JSONB external_links "default {}, linktree-style"
TIMESTAMPTZ created_at
}
admin_events {
UUID id PK
UUID actor_id FK "nullable"
TEXT action
JSON target "nullable"
TIMESTAMPTZ created_at
}
auth_events {
UUID id PK
UUID user_id FK "nullable"
TEXT event
INET ip "nullable"
TEXT user_agent "nullable"
TIMESTAMPTZ created_at
}
auth_tokens {
UUID id PK
UUID user_id FK
TEXT token_hash
TEXT purpose "password_reset | email_verification (legacy)"
TIMESTAMPTZ expires_at
TIMESTAMPTZ consumed_at "nullable"
TIMESTAMPTZ created_at
}
invite_codes {
UUID id PK
VARCHAR code
UUID created_by FK
UUID used_by FK "audit-only"
TIMESTAMPTZ used_at "audit-only"
TIMESTAMPTZ expires_at "nullable"
INT max_uses
INT use_count
TIMESTAMPTZ revoked_at "nullable"
TIMESTAMPTZ created_at
}
pending_registrations {
UUID id PK
VARCHAR email UK
VARCHAR username UK
VARCHAR password_hash
UUID invite_code_id FK
TEXT token_hash UK
TIMESTAMPTZ expires_at
TIMESTAMPTZ created_at
}
geolocations {
UUID id PK
UUID author_id FK
VARCHAR title
GEOMETRY location
TEXT source_url
JSONB proof
DATE event_date
TIMESTAMPTZ deleted_at "nullable, soft-delete"
BOOLEAN is_demo "synthetic demo row"
UUID originated_from_bounty_id FK "nullable, trace to bounty"
TIMESTAMPTZ created_at
TIMESTAMPTZ updated_at
}
bounties {
UUID id PK
UUID author_id FK
VARCHAR title
TEXT source_url
JSONB description "nullable, Tiptap"
VARCHAR status "open|fulfilled|closed"
TIMESTAMPTZ closed_at "nullable"
TIMESTAMPTZ deleted_at "nullable, soft-delete"
BOOLEAN is_demo "synthetic demo row"
TIMESTAMPTZ created_at
TIMESTAMPTZ updated_at
}
bounty_claims {
UUID bounty_id FK
UUID user_id FK
TIMESTAMPTZ created_at
}
media {
UUID id PK
UUID geolocation_id FK "nullable, XOR with bounty_id"
UUID bounty_id FK "nullable, XOR with geolocation_id"
TEXT storage_url
VARCHAR media_type
VARCHAR sha256 "nullable, hex-encoded SHA-256 of uploaded bytes"
INET uploaded_ip "nullable, submitter IP at upload time"
TEXT uploaded_user_agent "nullable, submitter UA at upload time"
TEXT original_filename "nullable, client-supplied filename"
TIMESTAMPTZ created_at
}
proof_images {
UUID id PK
TEXT s3_key
UUID user_id FK
UUID geolocation_id FK "nullable"
VARCHAR sha256 "nullable, hex-encoded SHA-256 of uploaded bytes"
INET uploaded_ip "nullable, submitter IP at upload time"
TEXT uploaded_user_agent "nullable, submitter UA at upload time"
TEXT original_filename "nullable, client-supplied filename"
TIMESTAMPTZ created_at
}
tags {
UUID id PK
VARCHAR name
VARCHAR category
}
geolocation_tags {
UUID geolocation_id FK
UUID tag_id FK
}
bounty_tags {
UUID bounty_id FK
UUID tag_id FK
}
follows {
UUID follower_id FK
UUID followed_id FK
TIMESTAMPTZ created_at
}
users ||--o{ invite_codes : "created_by"
users ||--o{ invite_codes : "used_by"
invite_codes ||--o{ pending_registrations : "invite_code_id"
users ||--o{ auth_tokens : "user_id"
users ||--o{ admin_events : "actor_id"
users ||--o{ auth_events : "user_id"
users ||--o{ geolocations : "author_id"
users ||--o{ bounties : "author_id"
bounties ||--o{ geolocations : "originated_from_bounty_id"
geolocations ||--o{ media : "geolocation_id"
bounties ||--o{ media : "bounty_id"
geolocations ||--o{ proof_images : "geolocation_id"
users ||--o{ proof_images : "user_id"
geolocations ||--o{ geolocation_tags : "geolocation_id"
tags ||--o{ geolocation_tags : "tag_id"
bounties ||--o{ bounty_tags : "bounty_id"
tags ||--o{ bounty_tags : "tag_id"
bounties ||--o{ bounty_claims : "bounty_id"
users ||--o{ bounty_claims : "user_id"
users ||--o{ follows : "follower_id"
users ||--o{ follows : "followed_id"
Tables¶
users¶
| Column | Type | Constraints |
|---|---|---|
id |
UUID |
PK, default gen_random_uuid() |
username |
VARCHAR(50) |
UNIQUE, NOT NULL |
email |
VARCHAR(255) |
UNIQUE, NOT NULL |
password_hash |
VARCHAR(255) |
NOT NULL |
is_active |
BOOLEAN |
NOT NULL, default true |
is_admin |
BOOLEAN |
NOT NULL, default false — auto-flipped on login/register if email matches ADMIN_EMAILS |
is_trusted |
BOOLEAN |
NOT NULL, default false — substantiated trust mark (toggle UI lands later; column ships now) |
trust_reason |
TEXT |
nullable — required at the application layer when is_trusted=true |
email_verified_at |
TIMESTAMPTZ |
nullable — set to created_at by the pre-creation registration flow. Every row minted after the pending_registrations migration only exists because the analyst clicked the confirmation link, so this is non-NULL for new accounts. |
deleted_at |
TIMESTAMPTZ |
nullable — non-NULL = soft-deleted (login rejected, profile 404s, filtered from public reads). Soft-deleting a user cascade-soft-deletes every geolocation they authored. Hard-delete (the GDPR escape hatch) drops the row + cascade-drops their geolocations + sweeps S3. |
is_demo |
BOOLEAN |
NOT NULL, default false — TRUE iff created by the admin Demo data seeder (5 fixed demo-analyst-N accounts with unloggable hashes + @vidit.invalid emails). The wipe button drops every flagged user + their geolocations in one go. |
token_version |
INTEGER |
NOT NULL, default 0 — monotonic session-invalidation counter. The session JWT embeds this as a tv claim; get_current_user 401s on mismatch. Bumped on logout, password change, password reset, and soft-delete so every outstanding JWT for the user becomes invalid at once. Pre-migration cookies (no tv) 401 too — the migration's one-time forced logout is intended. |
bio |
TEXT |
nullable — short plain-text blurb shown on the public profile, edited via PATCH /users/me. Capped at 500 chars at the API layer (no DB constraint — cap changes don't need migrations). |
avatar_url |
TEXT |
nullable — public avatar URL. Validated as http(s) at the API layer to keep javascript: URLs out of the <img src> render path. No upload pipeline yet (free-form URL — analysts paste a Gravatar / CDN link). |
external_links |
JSONB |
NOT NULL, default '{}'::jsonb — Linktree-style object keyed by platform (x, discord, website, github). Default {} (never NULL) so the read path is always a dict. PATCH /users/me replaces the column wholesale; partial-merge conflicts with the whole-panel form submit. |
created_at |
TIMESTAMPTZ |
NOT NULL, default now() |
Indexes:
ix_users_liveon(created_at) WHERE deleted_at IS NULL— partial; admin search and the auth path filterdeleted_at IS NULLix_users_demoon(id) WHERE is_demo = true— partial; the wipe sweep runsWHERE is_demo = trueand would otherwise full-scan the tableix_users_search_ftsGIN onto_tsvector('simple', coalesce(username, '') || ' ' || coalesce(bio, ''))— backsGET /search(analyst branch); bio joins the indexed expression sots_headlinecan return a fragment highlight
auth_tokens¶
Single-use, expiring tokens for password reset.
| Column | Type | Constraints |
|---|---|---|
id |
UUID |
PK, default uuid4() |
user_id |
UUID |
FK → users.id ON DELETE CASCADE, NOT NULL |
token_hash |
TEXT |
UNIQUE, NOT NULL — sha256(secret); the plaintext is only ever in the email link |
purpose |
TEXT |
NOT NULL, CHECK in ('password_reset', 'email_verification') — only password_reset is minted today; email_verification is a legacy value retained as the DB-level cross-purpose-rejection anchor |
expires_at |
TIMESTAMPTZ |
NOT NULL |
consumed_at |
TIMESTAMPTZ |
nullable; non-null = single-use already redeemed |
created_at |
TIMESTAMPTZ |
NOT NULL, default now() |
Indexes:
ix_auth_tokens_user_idon(user_id)— cascade delete + per-user revocation lookupsix_auth_tokens_user_purposeon(user_id, purpose)— "is there a live reset for this user?" before mintix_auth_tokens_live_expires_aton(expires_at) WHERE consumed_at IS NULL— partial; reaper scan stays cheap as consumed rows accumulate
Lifecycle: created by mint, consumed by consume, force-revoked by revoke_all_live_for_user on fresh same-purpose mint. Cleanup runs on-demand via the admin Maintenance panel (services/maintenance.py::reap_auth_tokens): live-but-expired rows are deleted immediately, and consumed rows older than 30 days are dropped.
invite_codes¶
| Column | Type | Constraints |
|---|---|---|
id |
UUID |
PK, default gen_random_uuid() |
code |
VARCHAR(64) |
UNIQUE, NOT NULL |
created_by |
UUID |
FK → users.id ON DELETE SET NULL, nullable (admin-seeded codes; FK nulled on user hard-delete to preserve the audit row) |
used_by |
UUID |
FK → users.id ON DELETE SET NULL, nullable — audit-only (records the first user to consume the code; FK nulled on user hard-delete) |
used_at |
TIMESTAMPTZ |
nullable — audit-only, paired with used_by |
expires_at |
TIMESTAMPTZ |
nullable |
max_uses |
INTEGER |
NOT NULL, default 1 — quota; minted by the admin page (1 = single-use) |
use_count |
INTEGER |
NOT NULL, default 0 — incremented on each successful registration |
revoked_at |
TIMESTAMPTZ |
nullable — set by the admin page; non-null instantly invalidates the code |
created_at |
TIMESTAMPTZ |
NOT NULL, default now() |
A code is valid iff revoked_at IS NULL AND use_count < max_uses AND (expires_at IS NULL OR expires_at > now()). used_by / used_at are not part of the validity check.
pending_registrations¶
| Column | Type | Constraints |
|---|---|---|
id |
UUID |
PK, default gen_random_uuid() |
email |
VARCHAR(255) |
UNIQUE, NOT NULL — holds the address until the user confirms or the row expires |
username |
VARCHAR(50) |
UNIQUE, NOT NULL — same role as email for handle uniqueness |
password_hash |
VARCHAR(255) |
NOT NULL — bcrypt; transferred straight into users.password_hash at confirmation |
invite_code_id |
UUID |
FK → invite_codes.id ON DELETE CASCADE, NOT NULL — invite is referenced, not consumed, until confirmation |
token_hash |
TEXT |
UNIQUE, NOT NULL — sha256(secret); the plaintext is only ever in the email link |
expires_at |
TIMESTAMPTZ |
NOT NULL — 24h after mint |
created_at |
TIMESTAMPTZ |
NOT NULL, default now() |
Indexes:
ix_pending_registrations_expires_aton(expires_at)— reaper sweep
Why UNIQUE on email / username instead of a partial index? Partial-index predicates must be IMMUTABLE in Postgres, and expires_at > now() is STABLE. The plain UNIQUE keeps race-window protection without a predicate; the /auth/register path deletes expired rows inline before its INSERT and the admin Maintenance reaper sweeps the rest, so a recently-expired pending row does not permanently pin its address.
Lifecycle: POST /auth/register inserts via services/registration.py::create_pending_registration. POST /auth/confirm-registration consumes via confirm_pending_registration (creates the users row, bumps invite_codes.use_count, deletes the pending row). POST /auth/resend-confirmation re-mints the token on the same row, invalidating the previous link. Cleanup: services/maintenance.py::reap_pending_registrations, exposed in the admin Maintenance panel.
auth_events¶
Append-only audit log for auth-relevant events. Populated synchronously from the auth router on login / failed_login / logout / register_pending / register_resent / register_confirmed / password_reset_requested / password_reset_completed / password_changed. Writes happen inside a SAVEPOINT (db.begin_nested()) so an INSERT failure (e.g. INET coercion from a malformed X-Forwarded-For) rolls back only the audit row and the caller's transaction stays usable.
| Column | Type | Constraints |
|---|---|---|
id |
UUID |
PK, default uuid4() |
user_id |
UUID |
FK → users.id ON DELETE SET NULL, nullable — NULL when the email didn't match a live user (failed_login, password_reset_requested no-op branch, register_pending, register_resent on both branches, anonymous logout) so the row doesn't double as a probe-able email-to-existence oracle |
event |
TEXT |
NOT NULL — plain string, no DB enum so adding a new event kind doesn't require a migration |
ip |
INET |
nullable — IPv4 / IPv6, parsed via ipaddress.ip_address in services/audit.py::extract_client_ip. We take the right-most entry of X-Forwarded-For (the trusted-proxy observation), not the left-most: left-most is client-spoofable. Assumes one trusted hop (Railway); add a TRUSTED_PROXY_HOPS peel when a second trusted proxy lands in front. An invalid value lands as NULL rather than poisoning the savepoint. |
user_agent |
TEXT |
nullable, capped at 1024 chars |
created_at |
TIMESTAMPTZ |
NOT NULL, default now() |
Indexes:
ix_auth_events_user_id_created_aton(user_id, created_at)— "what did this user do, latest first" forensics queryix_auth_events_event_created_aton(event, created_at)— "did event X spike recently"
No retention policy today.
admin_events¶
Append-only audit row for admin actions taken via the /admin page. Sibling to the auth_events table above.
| Column | Type | Constraints |
|---|---|---|
id |
UUID |
PK, default uuid4() |
actor_id |
UUID |
FK → users.id ON DELETE SET NULL, nullable |
action |
TEXT |
NOT NULL — e.g. invite_created, invite_revoked |
target |
JSON |
nullable — free-form context (target IDs, parameters) |
created_at |
TIMESTAMPTZ |
NOT NULL, default now() |
Indexes:
ix_admin_events_actor_idon(actor_id)— "what did this admin do?"ix_admin_events_created_aton(created_at)— chronological scans
geolocations¶
| Column | Type | Constraints |
|---|---|---|
id |
UUID |
PK, default gen_random_uuid() |
author_id |
UUID |
FK → users.id, NOT NULL |
title |
VARCHAR(255) |
NOT NULL |
location |
GEOMETRY(Point, 4326) |
NOT NULL |
source_url |
TEXT |
NOT NULL |
proof |
JSONB |
nullable — Tiptap document (ProseMirror JSON) |
event_date |
DATE |
NOT NULL |
deleted_at |
TIMESTAMPTZ |
nullable — non-NULL = soft-deleted (filtered from public reads; admin-only thereafter) |
is_demo |
BOOLEAN |
NOT NULL, default false — TRUE iff seeded by the admin Demo data panel. Surfaced via the always-attached demo free tag (filterable in the map UI) and dropped en masse by the wipe button. |
originated_from_bounty_id |
UUID |
FK → bounties.id ON DELETE SET NULL, nullable — provenance trace populated when a geolocation is promoted from a bounty. SET NULL so a hard-deleted bounty doesn't take its descendant geolocation with it. |
created_at |
TIMESTAMPTZ |
NOT NULL, default now() |
updated_at |
TIMESTAMPTZ |
NOT NULL, default now() |
Indexes:
- GIST(location) — required for geospatial queries (bounding-box filtering, proximity sort)
- (author_id) — profile lookup
- (event_date), (created_at) — time-based queries
- (author_id, created_at DESC) — composite for profile listing
- ix_geolocations_live on (created_at) WHERE deleted_at IS NULL — partial; every public read filters deleted_at IS NULL; the partial keeps the index tight.
- ix_geolocations_demo on (id) WHERE is_demo = true — partial; the demo-wipe sweep runs WHERE is_demo = true and otherwise full-scans
- ix_geolocations_originated_from_bounty_id on (originated_from_bounty_id) — bounty detail reads "what geolocation came from me?" via this column
- uq_geolocations_originated_from_bounty_id on (originated_from_bounty_id) WHERE originated_from_bounty_id IS NOT NULL — unique partial; at most one geolocation per fulfilled bounty. Backs the SELECT ... FOR UPDATE taken by POST /geolocations bounty_id=…: even if a future code path forgets the row lock, the second insert fails on constraint violation. The WHERE clause keeps standalone geolocations out of the uniqueness check.
- ix_geolocations_search_fts GIN on to_tsvector('simple', coalesce(title, '')) — backs GET /search. simple config (not english) keeps matching predictable for the closed beta corpus of place names and analyst handles; soft-delete is filtered at query time. source_url is intentionally not in the indexed expression — see migration o1j3k5l7m9n1 for the rationale (Postgres' simple parser tokenizes URLs as host/path units).
locationis a PostGIS point in WGS84 (SRID 4326 = standard GPS coordinates). GeoAlchemy2 exposes.lat/.lngviaWKBElement, orST_X/ST_Yin raw SQL.
bounties¶
An unfinished geolocation: media + a source URL someone couldn't place. Lifecycle: open → fulfilled (a geolocation was promoted from this bounty via POST /geolocations bounty_id=…) or closed (author withdrew). "I'm working on this" is a parallel multi-claimer signal in bounty_claims, not a state.
| Column | Type | Constraints |
|---|---|---|
id |
UUID |
PK, default gen_random_uuid() |
author_id |
UUID |
FK → users.id, NOT NULL |
title |
VARCHAR(255) |
NOT NULL |
source_url |
TEXT |
NOT NULL — required because a bounty is evidence in search of a location |
description |
JSONB |
nullable — Tiptap document (ProseMirror JSON), sanitised server-side on insert |
status |
VARCHAR(20) |
NOT NULL, default 'open' — one of open, fulfilled, closed. Plain string (no DB enum) so adding a state doesn't require a migration |
closed_at |
TIMESTAMPTZ |
nullable — set when status transitions to fulfilled or closed |
deleted_at |
TIMESTAMPTZ |
nullable — soft-delete; filtered from every public read (admin path acts on the raw model) |
is_demo |
BOOLEAN |
NOT NULL, default false — TRUE iff seeded by the admin "Demo bounties" panel. The wipe button drops every flagged row in one bulk DELETE. |
created_at |
TIMESTAMPTZ |
NOT NULL, default now() |
updated_at |
TIMESTAMPTZ |
NOT NULL, default now() |
Indexes:
- ix_bounties_status_created_at on (status, created_at) — "open bounties, newest first" is the dominant index query
- ix_bounties_author_id on (author_id) — profile + admin lookups
- ix_bounties_deleted_at on (deleted_at) — cheap soft-delete filter on every public read
- ix_bounties_demo on (id) WHERE is_demo = true — partial; the demo-wipe sweep runs WHERE is_demo = true and otherwise full-scans
- ix_bounties_search_fts GIN on to_tsvector('simple', coalesce(title, '')) — parallel index to the geolocations one, backs the same GET /search endpoint
The pointer from a fulfilled bounty to the resulting geolocation lives on geolocations.originated_from_bounty_id (not on this table) — geolocation-side because read paths render the trace. DELETE /bounties/{id} refuses with 409 when a geolocation already points to the row, so the descendant isn't orphaned silently.
bounty_tags¶
Many-to-many junction between bounties and tags. Mirrors geolocation_tags.
| Column | Type | Constraints |
|---|---|---|
bounty_id |
UUID |
FK → bounties.id ON DELETE CASCADE |
tag_id |
UUID |
FK → tags.id ON DELETE CASCADE |
Composite PK: (bounty_id, tag_id)
bounty_claims¶
Multi-analyst "I'm working on this" signal. Multiple analysts can hold a claim on the same bounty simultaneously — the claim is a public hint to coordinate, not a single-claimer reservation. Composite PK makes re-claiming idempotent at the DB level; the POST /bounties/{id}/claim endpoint exploits that and returns 204 either way.
| Column | Type | Constraints |
|---|---|---|
bounty_id |
UUID |
FK → bounties.id ON DELETE CASCADE |
user_id |
UUID |
FK → users.id ON DELETE CASCADE |
created_at |
TIMESTAMPTZ |
NOT NULL, default now() |
Composite PK: (bounty_id, user_id)
Indexes:
- ix_bounty_claims_bounty_id_created_at on (bounty_id, created_at) — "who's working on bounty X right now?" detail-page query, newest-first ordering
- ix_bounty_claims_user_id on (user_id) — profile / dashboard "what is this user working on?"
Claims don't gate the lifecycle: a bounty can be fulfilled by an analyst who never claimed, and claims aren't cleared when the bounty terminates. Hard-delete on the bounty cascades-drops the rows.
follows¶
Directed follow edges between analysts. Drives the per-user GET /timeline feed and the followers_count / following_count / is_following fields on GET /users/{username}.
| Column | Type | Constraints |
|---|---|---|
follower_id |
UUID |
FK → users.id ON DELETE CASCADE, NOT NULL |
followed_id |
UUID |
FK → users.id ON DELETE CASCADE, NOT NULL |
created_at |
TIMESTAMPTZ |
NOT NULL, default CURRENT_TIMESTAMP |
Composite PK: (follower_id, followed_id) — the pair is the natural identity (no surrogate id) and the PK alone gives uniqueness, so no separate UNIQUE constraint is shipped.
Self-follow is rejected at two layers: the router returns 400 Cannot follow yourself before the row is staged, and CHECK (follower_id <> followed_id) (constraint ck_follows_no_self_follow) refuses the INSERT even if a future code path skips the router check. The 400 gives the UI a clean error, the CHECK is the durable invariant.
Indexes:
- ix_follows_followed_id on (followed_id) — the PK indexes the forward direction (who is X following?) on its leading column, but the reverse direction (who follows X?) — the query that powers followers_count on every profile load — would otherwise full-scan.
ON DELETE CASCADE on both FKs: hard-deleting an analyst drops every edge they're on either side of, so a deleted user can't keep ghost-followers or ghost-followings. Soft-deleted users (users.deleted_at IS NOT NULL) keep their edges — the public profile 404s anyway, and resurrecting an account should resurrect its graph.
media¶
Shared by geolocations and bounties. Exactly one of geolocation_id / bounty_id is non-NULL — the XOR is enforced both at the model layer and by a DB CHECK constraint (ck_media_exactly_one_owner). The polymorphism keeps the slice-2 bounty-fulfilment flow cheap: instead of re-uploading S3 objects, promoting a bounty rewrites the row pointers in place.
| Column | Type | Constraints |
|---|---|---|
id |
UUID |
PK, default gen_random_uuid() |
geolocation_id |
UUID |
FK → geolocations.id ON DELETE CASCADE, nullable — XOR with bounty_id |
bounty_id |
UUID |
FK → bounties.id ON DELETE CASCADE, nullable — XOR with geolocation_id |
storage_url |
TEXT |
NOT NULL — S3 / CloudFront URL |
media_type |
VARCHAR(10) |
NOT NULL — 'image' or 'video' |
sha256 |
VARCHAR(64) |
nullable — hex-encoded SHA-256 of the uploaded bytes, captured at upload time (services/storage.py::UploadResult). Stable content fingerprint that survives storage-class changes and copy operations, unlike the S3 ETag (MD5 for non-multipart uploads, not stable across copies). NULL on rows that pre-date this column. Demo-seeder rows now carry the hash too — services/seed.py::_prepare_pool_media runs a one-pass-per-seed hash + derivative production over each unique demo-pool/ media key, so demo content matches real upload integrity. The hash is computed on the bytes that land on S3 — for images that means after the EXIF strip — so an auditor downloading the public URL can independently verify the hash matches. |
uploaded_ip |
INET |
nullable — submitter IP at upload time. Extracted via services/audit.py::extract_client_ip (right-most XFF, defended against client-spoofed prepends). NULL when the value isn't parseable as IPv4 / IPv6, on demo-seeder rows, and on rows that pre-date the column. Admin-only — never surfaced on the public read API. |
uploaded_user_agent |
TEXT |
nullable — submitter User-Agent string at upload time, capped at 1 KB. Same admin-only visibility as uploaded_ip. |
original_filename |
TEXT |
nullable — client-supplied filename (e.g. IMG_1234.jpg). Surfaced on the public read API because investigators sometimes need to trace evidence back to a source post by filename. |
created_at |
TIMESTAMPTZ |
NOT NULL, default now() |
Check constraint:
- ck_media_exactly_one_owner: (geolocation_id IS NOT NULL AND bounty_id IS NULL) OR (geolocation_id IS NULL AND bounty_id IS NOT NULL)
Indexes:
- (sha256) WHERE sha256 IS NOT NULL — partial index for "find every row with this content hash" audit / dedup queries; only the populated cohort, so demo rows don't bloat it.
At least one media item is required per geolocation and per bounty (enforced at the application layer — both POST /geolocations and POST /bounties reject empty file lists). Multiple media items per parent are allowed.
The media table stores the top-level evidence files uploaded with the form (the gallery shown above the proof body). Inline images embedded inside the proof body — added via the rich-text editor's + Image button — live in proof_images instead.
S3 key prefixes: uploads/<geolocation_id>/... for geolocation media, bounty_uploads/<bounty_id>/... for bounty media. When slice 2 promotes a bounty, we rewrite the DB pointers but leave the S3 keys where they are.
proof_images¶
Inline images embedded inside the Tiptap proof body (one row per image referenced from the <image> nodes in geolocations.proof).
| Column | Type | Constraints |
|---|---|---|
id |
UUID |
PK, default gen_random_uuid() |
s3_key |
TEXT |
UNIQUE, NOT NULL — canonical key (host-stripped); inverse of Storage.public_url |
user_id |
UUID |
FK → users.id ON DELETE CASCADE, NOT NULL |
geolocation_id |
UUID |
FK → geolocations.id ON DELETE CASCADE, nullable until form submit |
sha256 |
VARCHAR(64) |
nullable — hex-encoded SHA-256 of the uploaded bytes (same rationale and NULL-cohort as media.sha256). |
uploaded_ip |
INET |
nullable — submitter IP at upload (same rationale and admin-only visibility as media.uploaded_ip). |
uploaded_user_agent |
TEXT |
nullable — submitter UA, capped at 1 KB. Admin-only. |
original_filename |
TEXT |
nullable — client-supplied filename. Public on the read API. |
created_at |
TIMESTAMPTZ |
NOT NULL, default now() |
Indexes:
- (user_id) — orphan reaper scopes by user; future per-user quota
- (geolocation_id) — collect keys for the cascade-delete S3 sweep
- (created_at) WHERE geolocation_id IS NULL — partial index for the orphan reaper
- (sha256) WHERE sha256 IS NOT NULL — partial content-hash index, mirrors ix_media_sha256
Lifecycle:
1. Editor uploads an image → POST /geolocations/proof-images writes to S3 and inserts a row with geolocation_id = NULL.
2. User submits the geolocation form → after sanitization, POST /geolocations extracts the <image> srcs from the proof JSON, derives s3_keys, and UPDATE proof_images SET geolocation_id = :geo WHERE s3_key IN (...) AND user_id = :u AND geolocation_id IS NULL. The user-id scope blocks one user from claiming another user's keys.
3. Geolocation deletion → the router collects matching s3_keys, deletes the S3 objects, then commits the cascade DB delete. (S3 is deleted post-commit so a failed transaction can't strand referenced files.)
4. Form abandoned → the row stays with geolocation_id = NULL and is reaped on-demand from the admin Maintenance panel (services/maintenance.py::reap_proof_image_orphans, 24h grace), which deletes both the S3 object and the row.
Why a separate table instead of embedding the URL in the proof JSON: gives O(1) orphan reaping, lets us sweep S3 on geolocation delete, and supports per-user quotas / takedown lookups without scanning the bucket or parsing JSONB.
tags¶
| Column | Type | Constraints |
|---|---|---|
id |
UUID |
PK, default gen_random_uuid() |
name |
VARCHAR(100) |
UNIQUE, NOT NULL |
category |
VARCHAR(20) |
NOT NULL — 'conflict', 'capture_source', or 'free' |
Tags with category conflict are the main conflicts (Ukraine, Gaza, Sudan…), plus an Other escape value.
Tags with category capture_source are the original "lens" that captured the media — Smartphone, Satellite, Drone, Static camera, Dashcam, Body / helmet cam, plus an Unknown escape value. Seeded in prod by migration s5n7p9r1t3v5 (the demo seeder is local-only, but the category is required on the submit form, so the options must exist on a fresh prod DB).
Tags with category free are user-created and free-form.
conflict and capture_source are curated (server-managed, not user-creatable) and required: POST /geolocations rejects a submission that doesn't carry at least one tag from each category (see api.md → POST /geolocations). The rule is enforced at the API layer, not by a DB constraint — pre-existing rows and the demo seeder are unaffected, and each category ships an escape value. name is globally UNIQUE across all three categories, so a capture_source tag can't share a name with a free or conflict tag.
geolocation_tags¶
Many-to-many junction table between geolocations and tags.
| Column | Type | Constraints |
|---|---|---|
geolocation_id |
UUID |
FK → geolocations.id ON DELETE CASCADE |
tag_id |
UUID |
FK → tags.id ON DELETE CASCADE |
Composite PK: (geolocation_id, tag_id)
Design decisions¶
Why JSONB for proof?¶
Tiptap (the rich editor) serializes content as ProseMirror JSON. Storing that JSON as-is in a JSONB column avoids any conversion. PostgreSQL can index and query JSONB natively.
Why geolocation_tags and not an array column on geolocations?¶
Many-to-many: a geolocation can carry several tags, and a tag can appear on many geolocations. The geolocation_tags junction table is the standard solution: it allows efficient filtering (WHERE tag_id = X) and indexing on both sides. The alternative (a tag_ids[] array on geolocations) would make filters more complex and less performant at scale.
-- Tags for a given geolocation
SELECT t.name, t.category
FROM tags t
JOIN geolocation_tags gt ON gt.tag_id = t.id
WHERE gt.geolocation_id = 'a3f8c2d1-...';
-- → [{ name: "Ukraine", category: "conflict" }, { name: "airstrike", category: "free" }]
Why a single tags table with a category?¶
Conflicts and free-form tags share the same mechanics (filtering, many-to-many association). A single table plus a category field avoids duplicating the logic. The distinction is still queryable: WHERE category = 'conflict'.
Why GEOMETRY instead of two lat / lng columns?¶
PostGIS unlocks native geospatial queries (bounding-box filtering, distance computation, clustering). GeoAlchemy2 exposes those types directly to SQLAlchemy.
Typical MVP queries¶
-- All points for the map (initial load)
SELECT id, title, ST_X(location) AS lng, ST_Y(location) AS lat, event_date
FROM geolocations;
-- Filter by conflict
SELECT g.id, g.title, ST_X(g.location) AS lng, ST_Y(g.location) AS lat
FROM geolocations g
JOIN geolocation_tags gt ON gt.geolocation_id = g.id
JOIN tags t ON t.id = gt.tag_id
WHERE t.name = 'Ukraine' AND t.category = 'conflict';
-- An analyst's geolocations (profile)
SELECT id, title, event_date, created_at
FROM geolocations
WHERE author_id = :user_id
ORDER BY event_date DESC;
Seed data¶
A third-party KMZ export can be mapped locally to generate test data. Large binaries; not version-controlled.
KML → Vidit mapping:
| KML field | → | Vidit column |
|---|---|---|
coordinates (lng, lat) |
→ | geolocations.location |
description (first line) |
→ | geolocations.title |
TimeStamp |
→ | geolocations.event_date |
"Source(s)" URLs in description |
→ | geolocations.source_url |
"Geolocation(s)" URLs in description |
→ | geolocations.proof |
styleUrl (icon color) |
→ | tags (side / event type) |
Local KMZ import is no longer scripted in this repo (the prior seed_external.py / enrich_media.py were deleted — the admin Demo data panel replaces them with a synthetic-data flow that references a curated demo-pool/ S3 prefix). Mapping kept for a future agreement-bound import.