# Underlay: AI Integration Guide

Underlay is a versioned, content-addressed registry for structured knowledge. Apps push snapshots of their data; Underlay preserves them and serves them over an HTTPS API.

Built by Knowledge Futures (501(c)(3)): https://www.knowledgefutures.org

Base URL: https://underlay.org/api

---

## Authentication

There are two auth methods:

1. API key (for programmatic access):
   - Header: Authorization: Bearer ul_
   - Keys have scopes: read, write, admin.
   - Create keys at https://underlay.org/settings/keys or via POST /api/accounts/keys.

2. Session cookie (for browser use):
   - POST /api/accounts/login with {"email": "...", "password": "..."} sets a session cookie.
   - POST /api/accounts/signup with {"email", "password", "username", "displayName"} creates an account.
   - POST /api/accounts/logout clears the session.
   - GET /api/accounts/me returns the current user (works with either auth method).

GET requests for public data need no authentication. All write requests (POST, PATCH, PUT, DELETE) require authentication. If a Bearer token is provided but invalid, the request is rejected immediately (401).

## Rate Limits

Requests are rate-limited per IP (unauthenticated) or per account (authenticated).

| Auth status     | Limit         |
|-----------------|---------------|
| Unauthenticated | 60 req/min    |
| Authenticated   | 5,000 req/min |

Rate limit headers are included on every response:

- X-RateLimit-Limit: max requests in the window
- X-RateLimit-Remaining: requests left
- X-RateLimit-Reset: seconds until the window resets

When the limit is exceeded, you'll get a 429 response with a Retry-After header. To get the higher limit, authenticate with an API key (recommended for any automated access).

---

## Core Concepts

- Collection: a named, versioned body of data owned by an account. Identified by :owner/:slug.
- Version: an immutable snapshot containing JSON Schemas (one per record type), records, and file references. Versions are numbered with sequential integers, and a semver string is derived automatically.
- Record: a flat JSON object with { id, type, data }. Records reference other records by id and files by hash.
- File: a binary blob stored by SHA-256 hash, referenced in record data as {"$file": "sha256:"}.
- Schema: a JSON Schema document for a single record type, stored as a global, immutable, content-addressed entity. Each type gets its own schema. Schema changes trigger a major version bump.
- Schema labeling: schemas can be labeled post-hoc with URIs or names (e.g. "schema.org/Person") for cross-collection discovery.

---

## Reading Data

### Browse collections

    GET /api/collections → list public collections (?q=search&limit=50&offset=0)
    GET /api/collections/:owner/:slug → collection metadata + latest version summary

### Read versions

    GET /api/collections/:owner/:slug/versions → list versions (newest first, ?limit=50&offset=0)
    GET /api/collections/:owner/:slug/versions/latest → latest version with full metadata
    GET /api/collections/:owner/:slug/versions/:n → specific version by number

### Read records and files

    GET /api/collections/:owner/:slug/versions/:n/records → records for a version (?type=TypeName&limit=100&after=recordId)
    GET /api/collections/:owner/:slug/versions/:n/manifest → lightweight manifest: record ids/types + file hashes
    GET /api/collections/:owner/:slug/files/:hash → download a file by hash
    HEAD /api/collections/:owner/:slug/files/:hash → check if a file exists (returns Content-Length, Content-Type)

### Diff

    GET /api/collections/:owner/:slug/versions/:n/diff?from=:m → diff between version m and n (added, updated, removed records)

### Export

    GET /api/collections/:owner/:slug/export → download .tar.gz archive (manifest.json + records/*.ndjson + files/*)
    GET /api/collections/:owner/:slug/export?version=3 → export a specific version

---

## Writing Data: The Push Flow

This is the canonical workflow for syncing an app's data to Underlay:

### Step 1: Get current state

    GET /api/collections/:owner/:slug/versions/latest

Response includes { number, semver,
hash, recordCount, fileCount }. If the request returns 404, no versions exist yet, and your first push should use base_version: null.

### Step 2: Fetch the manifest (optional, for diffing)

    GET /api/collections/:owner/:slug/versions/:n/manifest

Response:

    {
      "version": 3,
      "semver": "v1.2.0",
      "hash": "abc123...",
      "records": [{"id": "rec-1", "type": "Article"}, ...],
      "files": ["deadbeef...", ...]
    }

Compare this against your local data to determine what changed.

### Step 3: Upload new files (if any)

For each file your app has that Underlay doesn't:

    PUT /api/collections/:owner/:slug/files/sha256:
    Content-Type: application/octet-stream
    Body: raw file bytes

The server verifies that the SHA-256 hash matches the body. Re-uploading an existing hash is idempotent (200 OK). Check existence first with HEAD if you want to skip uploads.

### Step 4: Push the version

    POST /api/collections/:owner/:slug/versions
    Content-Type: application/json
    Authorization: Bearer ul_

    {
      "base_version": 3,
      "message": "Daily archive 2026-04-27",
      "app_id": "my-app",
      "actor_id": "my-app:cron-job",
      "schemas": {
        "Article": {
          "type": "object",
          "properties": {
            "title": {"type": "string"},
            "body": {"type": "string"},
            "publishedAt": {"type": "string", "format": "date-time"},
            "authorId": {"type": "string", "x-ref-type": "Author"}
          }
        },
        "Author": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "email": {"type": "string", "private": true}
          }
        }
      },
      "changes": {
        "added": [
          {"id": "article-42", "type": "Article", "data": {"title": "New Paper", "body": "...", "publishedAt": "2026-04-27T00:00:00Z", "authorId": "author-1"}}
        ],
        "updated": [
          {"id": "article-10", "type": "Article", "data": {"title": "Updated Title", "body": "...", "publishedAt": "2025-01-01T00:00:00Z", "authorId": "author-2"}}
        ],
        "removed": ["article-5"]
      }
    }

Field reference:

- base_version: the version number you diffed against. null for first push. Used for optimistic locking.
- schemas: per-type JSON Schema map. Required on first push.
  If omitted on a subsequent push, the previous version's schemas are carried forward automatically, as long as the records validate against them.
- schemas[TypeName]: JSON Schema for that record type. Use "x-ref-type" to annotate foreign key fields.
- message: human-readable commit message (optional).
- app_id: identifier for the pushing application (optional).
- actor_id: identifier for the user or process that triggered the push (optional).
- changes.added: new records to insert.
- changes.updated: existing records to replace (full record, not a patch).
- changes.removed: record ids to delete.

### Step 5: Handle the response

Success (201):

    { "version": 4, "semver": "v1.3.0", "hash": "def456...", "recordCount": 100, "fileCount": 5 }

Conflict (409: someone pushed while you were diffing):

    { "error": "Version conflict", "currentVersion": 4, "statusCode": 409 }

→ Re-fetch latest, re-diff, re-push with the new base_version.

Missing files (422: records reference files not yet uploaded):

    { "error": "Missing files", "filesNeeded": ["sha256:abc...", "sha256:def..."], "statusCode": 422 }

→ Upload the listed files, then retry the push.

### First push (no existing versions)

Set base_version to null. Put all records in changes.added. Include schemas for all types.

---

## Chunked Push: Large Uploads

For pushes exceeding 100 MB (or millions of records), use the chunked upload protocol. This avoids body size limits and memory pressure by streaming changes in batches.

### Step 1: Start an upload session

    POST /api/collections/:owner/:slug/versions/upload
    Authorization: Bearer ul_

    { "base_version": 3, "message": "Bulk import 2M records", "app_id": "my-app", "schemas": { ... } }

Response (201):

    { "sessionId": "uuid-of-session", "expiresAt": "2026-04-30T01:00:00.000Z" }

Sessions expire after 1 hour.

### Step 2: Append batches (repeat as needed)

    PUT /api/collections/:owner/:slug/versions/upload/:sessionId
    Authorization: Bearer ul_

    {
      "changes": {
        "added": [ ...up to 10,000 records... ],
        "updated": [ ...
        ],
        "removed": ["id-1", "id-2"]
      }
    }

Response:

    { "received": { "added": 5000, "updated": 0, "removed": 0 }, "totalStaged": 15000 }

Each batch can contain up to 10,000 records. Call this endpoint as many times as needed. If a record ID is sent more than once, the last write wins (upsert semantics).

### Step 3: Finalize the version

    POST /api/collections/:owner/:slug/versions/upload/:sessionId/finalize
    Authorization: Bearer ul_

Response (201), same as a regular push:

    { "version": 4, "semver": "v1.3.0", "hash": "...", "recordCount": 2000000, "fileCount": 5 }

Finalize applies all staged changes to the base version, validates against schemas, computes hashes, and creates the new immutable version.

### Errors during finalize

- 409 Version conflict: someone pushed since your base_version. Start a new session.
- 422 Schema validation failed: one or more records don't match. Fix and re-upload.
- 422 Missing files: records reference files not yet uploaded. Upload them and retry finalize.
- 410 Session expired: the 1-hour window elapsed. Start a new session.

### Cancel a session

    DELETE /api/collections/:owner/:slug/versions/upload/:sessionId

Response: 204 (staged records are discarded)

### Check session status

    GET /api/collections/:owner/:slug/versions/upload/:sessionId

Response: { sessionId, status, recordCount, baseVersion, expiresAt, createdAt }

Status values: open, finalizing, completed, failed, expired.

Note: On successful finalize, the session and all staged records are deleted from the database. There is no need to manually clean up completed sessions.

---

## Pagination (Records Endpoint)

The records endpoint uses cursor-based pagination for efficient traversal of large collections.

    GET /api/collections/:owner/:slug/versions/:n/records?limit=100&after=record-id-42

Response:

    {
      "records": [ ...up to `limit` records...
      ],
      "pagination": { "limit": 100, "hasMore": true, "nextCursor": "record-id-142", "total": 2000000 }
    }

Parameters:

- limit: max records per page (default 100, max 1000)
- after: cursor (record ID); returns records with IDs lexicographically after this value
- offset: legacy offset-based pagination (still supported, but cursor is preferred for large sets)
- type: filter by record type

To paginate through all records:

1. First request: GET .../records?limit=1000
2. If pagination.hasMore is true, use pagination.nextCursor for the next request: GET .../records?limit=1000&after=
3. Repeat until hasMore is false.

---

## Record Format

    {
      "id": "unique-stable-id",
      "type": "TypeName",
      "data": {
        "title": "Some value",
        "authorId": "author-123",
        "attachment": {"$file": "sha256:abc123def456..."}
      }
    }

- id: stable, unique within the collection. Use your app's primary key or generate a deterministic one.
- type: groups records by kind (e.g. "Article", "Author", "Grant").
- data: flat JSON object. Reference other records by id. Reference files with {"$file": "sha256:"}.

Records are flat, with no nested joins. Relationships are expressed by storing the id of the related record. The schema declares which fields are references, so tools can resolve them at read time.

---

## Schema Discovery

Schemas are globally deduplicated by content hash. If two collections use the same type shape, they share the same schema row. This enables cross-collection discovery.
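
To make the deduplication idea concrete, here is a minimal Python sketch of content-addressing a schema. The helper name `schema_content_hash` and the canonical form it uses (sorted keys, compact separators, UTF-8) are assumptions for illustration only. This guide does not specify how Underlay computes schema hashes, so values produced here should not be expected to match the server's.

```python
import hashlib
import json
from typing import Any, Dict


def schema_content_hash(schema: Dict[str, Any]) -> str:
    """Hash a JSON Schema so that key order and whitespace do not matter.

    Hypothetical helper: the canonical form (sorted keys, compact separators,
    UTF-8) is an assumption, not Underlay's documented algorithm.
    """
    canonical = json.dumps(
        schema, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two collections declaring the same type shape, with keys in a different order.
person_a = {"type": "object", "properties": {"name": {"type": "string"}}}
person_b = {"properties": {"name": {"type": "string"}}, "type": "object"}

# A different shape (one extra field) yields a different hash.
person_c = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
}

same = schema_content_hash(person_a) == schema_content_hash(person_b)
different = schema_content_hash(person_a) != schema_content_hash(person_c)
```

Under this assumption, `person_a` and `person_b` would resolve to a single shared schema entry, while `person_c` would get its own.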

    GET /api/schemas → search schemas (?q=text&slug=TypeName&label=uri&schema_hash=sha256:...&limit=50&offset=0)
    GET /api/schemas/:id → single schema with labels and usage info
    GET /api/collections/:owner/:slug/schemas → schemas for latest version (?version=N for specific, ?raw=true to skip label enrichment)
    POST /api/schemas/:id/labels → add label {"label": "schema.org/Person"} (requires write scope)
    DELETE /api/schemas/:id/labels/:label → remove label (requires admin scope)

When schemas are returned via the collection schemas endpoint, known labels are injected as "x-underlay-labels" on the schema body (opt-out with ?raw=true).

---

## Versioning

- Versions are sequential integers (1, 2, 3, ...).
- Semver is derived automatically: schema change → major, record change → minor, metadata-only → patch.
- Schema change = any type's schema_id changed, or types added/removed between versions.
- Each version has a content-addressed hash computed from sorted type→schema_hash map + sorted records + sorted file hashes.
- Versions are immutable once created.

---

## Collection Management

    POST /api/accounts/:owner/collections → create collection {"slug", "name", "description", "public"}
    PATCH /api/collections/:owner/:slug → update {"name", "description", "public"}
    DELETE /api/collections/:owner/:slug → delete collection (requires admin scope)
    GET /api/accounts/:owner/collections → list collections for an account

---

## Privacy & Visibility

Underlay supports fine-grained privacy at three levels: types, fields, and individual records. Private data is stored alongside public data in the same version but is only visible to the collection owner.

### Private Types

Mark an entire type as private in its schema. All records of that type are hidden from public readers.

    "schemas": {
      "Article": {
        "type": "object",
        "properties": { "title": {"type": "string"} }
      },
      "InternalNote": {
        "type": "object",
        "private": true,
        "properties": { "note": {"type": "string"}, "articleId": {"type": "string"} }
      }
    }

Public readers see only the Article type. InternalNote is completely hidden (including from the schema response).

### Private Fields

Mark individual fields as private within a type's schema. The type itself is visible, but those fields are stripped for public readers.

    "Author": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "email": {"type": "string", "private": true},
        "phone": {"type": "string", "private": true}
      }
    }

Public readers see Author records with only "name". The owner sees all fields.

### Private Records

Mark individual records as private when pushing. The type and schema are visible, but that specific record is hidden.

    "changes": {
      "added": [
        {"id": "article-1", "type": "Article", "data": {...}},
        {"id": "article-2", "type": "Article", "data": {...}, "private": true}
      ]
    }

article-2 is only visible to the collection owner. Public readers see article-1 only.

### How it works

- Public hash: computed from public content only (excludes private types, records, and fields)
- Private hash: computed from all content (used by the owner for integrity verification)
- Schema filtering: the schema returned to public readers omits private types and private fields
- Record filtering: queries by non-owners automatically exclude private records and strip private fields

---

## API Key Management

    POST /api/accounts/keys → create key {"label": "my-app", "scope": "write"}
        Response includes the key once: {"key": "ul_abc123...", "id": "..."}
    GET /api/accounts/keys → list keys (id, label, scope, createdAt, lastUsedAt; not the key itself)
    DELETE /api/accounts/keys/:id → revoke a key

---

## Error Codes

- 400: Validation error (bad request body)
- 401: Authentication required
- 403: Insufficient scope (e.g.
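  read key used for write)
- 404: Not found (or private collection you can't access)
- 409: Version conflict (re-fetch and retry with new base_version)
- 410: Upload session expired (start a new session)
- 422: Missing files (upload them first, then retry push) or schema validation failed (fix the records and re-upload)
- 429: Rate limit exceeded (wait for the interval given in Retry-After)

---

- Full documentation: https://underlay.org/docs
- Integration guide: https://underlay.org/docs/integration
- Source code: https://github.com/knowledgefutures/underlay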
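
As a closing illustration of the push flow and the 409 and 422 handling described in this guide, the retry logic might be sketched in Python as follows. The function `push_with_retry` and its four callable parameters are hypothetical hooks, not part of any Underlay client library; the transport is deliberately left to the caller so that the sketch shows only the control flow.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# (HTTP status code, parsed JSON body) as returned by a transport of your choice.
Response = Tuple[int, Dict[str, Any]]


def push_with_retry(
    get_latest_version: Callable[[], Optional[int]],
    build_changes: Callable[[Optional[int]], Dict[str, Any]],
    post_version: Callable[[Dict[str, Any]], Response],
    upload_file: Callable[[str], None],
    max_attempts: int = 5,
) -> Dict[str, Any]:
    """Push a version, recovering from 409 and 422 responses.

    All four callables are hypothetical hooks supplied by the caller:
    - get_latest_version: returns the latest version number, or None on 404
    - build_changes: diffs local data against that version
    - post_version: sends POST .../versions and returns (status, body)
    - upload_file: uploads one file given its "sha256:..." hash
    """
    base_version = get_latest_version()
    changes = build_changes(base_version)
    for _ in range(max_attempts):
        payload = {"base_version": base_version, "changes": changes}
        status, body = post_version(payload)
        if status == 201:
            return body
        if status == 409:
            # Someone pushed while we were diffing: re-fetch, re-diff, re-push.
            base_version = get_latest_version()
            changes = build_changes(base_version)
        elif status == 422 and "filesNeeded" in body:
            # Records reference files the server does not have yet.
            for file_hash in body["filesNeeded"]:
                upload_file(file_hash)
        else:
            raise RuntimeError(
                f"push failed with status {status}: {body.get('error')}"
            )
    raise RuntimeError(f"push did not succeed after {max_attempts} attempts")
```

Bounding the loop with `max_attempts` is a design choice of this sketch: on a busy collection, repeated 409 responses are possible, and an upper limit keeps a writer from retrying indefinitely.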