# Underlay - AI Integration Guide

Underlay is a versioned, content-addressed registry for structured knowledge. Apps push snapshots of their data; Underlay preserves them and serves them via an HTTPS API.

Built by Knowledge Futures (501c3): https://www.knowledgefutures.org

Base URL: https://underlay.org/api

---

## Authentication

There are two auth methods:

1. API Key (for programmatic access):
   - Header: Authorization: Bearer
   - Keys are prefixed with "ul_" (e.g. ul_abc123...) when created through the UI.
   - Keys have scopes: read, write, admin.
   - Keys can optionally be scoped to specific collections via metadata.
   - Create keys at https://underlay.org/settings/keys (personal) or /:owner/settings/keys (organization).
   - Keys are managed via better-auth's apiKey plugin at `/api/auth/api-key/*`.

2. Session cookie (for browser use):
   - Users sign in via KF Auth SSO (OAuth2/PKCE) at https://underlay.org/login.
   - Accounts are created automatically on first sign-in, along with a default organization.
   - GET /api/accounts/me returns the current user and their organization memberships.

All GET requests are public: no auth is required to read public data. All write requests (POST, PATCH, PUT, DELETE) require authentication. If a Bearer token is provided but invalid, the request is rejected immediately (401).

## Rate Limits

All API requests are rate-limited per IP (unauthenticated) or per account (authenticated).

| Auth status     | Limit         |
|-----------------|---------------|
| Unauthenticated | 60 req/min    |
| Authenticated   | 5,000 req/min |

Rate limit headers are included on every response:

- X-RateLimit-Limit: max requests in the window
- X-RateLimit-Remaining: requests left
- X-RateLimit-Reset: seconds until the window resets

When the limit is exceeded, you'll get a 429 response with a Retry-After header.

To get the higher limit, authenticate with an API key (recommended for any automated access).

---

## Core Concepts

- Organization: an entity that owns collections. Every user gets a default organization on signup.
  Identified by :slug. Managed via better-auth's organization plugin at `/api/auth/organization/*`.
- Collection: a named, versioned body of data owned by an organization. Identified by :owner/:slug.
- Version: an immutable snapshot containing a JSON Schema, records, and file references. Identified by semver (e.g. "v1.0.0"). Major = schema change, Minor = records/files change, Patch = metadata-only change.
- Record: a flat JSON object with { id, type, data }. Records are content-addressed: the SHA-256 hash of the canonical JSON `{"id":...,"type":...,"data":...}` is the record's identity. Records are stored globally and deduplicated: the same record in ten collections is stored once. Records reference other records by id and files by hash. Wire format is JSONL (one record per line).
- File: a binary blob stored by SHA-256 hash, referenced in record data as {"$file": "sha256:"}.
- Schema: a JSON Schema document for a single record type, stored as a global, immutable, content-addressed entity. Each type gets its own schema. Schema changes trigger a major version bump.
- Schema labeling: schemas can be labeled post-hoc with URIs or names (e.g. "schema.org/Person") for cross-collection discovery.

---

## Reading Data

### Browse collections

    GET /api/collections → list public collections (?q=search&limit=50&offset=0)
    GET /api/collections/:owner/:slug → collection metadata + latest version summary

### Read versions

    GET /api/collections/:owner/:slug/versions → list versions (newest first, ?limit=50&offset=0)
    GET /api/collections/:owner/:slug/versions/latest → latest version with full metadata
    GET /api/collections/:owner/:slug/versions/:semver → specific version by semver (e.g.
    /versions/v1.2.0)

### Read records and files

    GET /api/collections/:owner/:slug/versions/:semver/records → records for a version (?type=TypeName&limit=100&after=recordId)
    GET /api/collections/:owner/:slug/versions/:semver/manifest → manifest: record ids/types/hashes + file hashes + schema hashes (?since=v1.0.0 for delta)
    GET /api/collections/:owner/:slug/versions/:semver/files → list files for a version (hash, size, content type)
    GET /api/collections/:owner/:slug/files/:hash → download a file by hash
    HEAD /api/collections/:owner/:slug/files/:hash → check if a file exists (returns Content-Length, Content-Type)

### Records (global, content-addressed)

    GET /api/records/:hash/provenance → find all collections/versions containing this record hash
    POST /api/records/batch → fetch records by hash: {"hashes": ["abc..."]} → JSONL stream

### Diff

    GET /api/collections/:owner/:slug/versions/:semver/diff?from=:semver → diff between two versions (added, updated, removed records)

### Export

    GET /api/collections/:owner/:slug/export → download .tar.gz archive (manifest.json + records/*.ndjson + files/*)
    GET /api/collections/:owner/:slug/export?version=v2.0.0 → export a specific version

### Fork

    POST /api/collections/:owner/:slug/fork → fork collection into caller's org (requires write auth)
    Body: { "targetOrg": "my-org", "slug": "optional-new-slug" }

Creates a new collection under targetOrg with the source's latest version. Records, schemas, and files are referenced (not copied), so a fork uses zero additional storage. Response includes { id, owner, slug, forkedFrom: { owner, slug, version } }.

---

## Writing Data: The Push Flow

This is the canonical workflow for syncing an app's data to Underlay:

### Step 1: Get current state

    GET /api/collections/:owner/:slug/versions/latest

Response includes { semver, hash, recordCount, fileCount }. If 404, no versions exist yet, so your first push should use base_version: null.

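
Step 1 can be sketched as a small helper that turns the "latest version" response into the `base_version` value for the next push. This is a minimal sketch, not part of any official client: the function names and the injectable `fetchImpl` parameter are hypothetical, and it assumes a successful lookup returns status 200.

```javascript
const BASE_URL = 'https://underlay.org/api';

// Map the status and parsed body of GET .../versions/latest to a base_version.
// Assumption: a successful lookup returns 200 with a body containing `semver`.
function baseVersionFromLatest(status, body) {
  if (status === 404) return null; // no versions yet: first push uses null
  if (status !== 200) throw new Error(`Unexpected status ${status}`);
  return body.semver;
}

// `fetchImpl` is a hypothetical injection point so the helper can be tried
// without a network; it defaults to the global fetch.
async function getBaseVersion(owner, slug, fetchImpl = fetch) {
  const url = `${BASE_URL}/collections/${owner}/${slug}/versions/latest`;
  const res = await fetchImpl(url);
  const body = res.status === 404 ? null : await res.json();
  return baseVersionFromLatest(res.status, body);
}
```

Keeping the status-to-value mapping in a pure function makes the 404 case easy to check in isolation before any push is attempted.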
### Step 2: Fetch the manifest (optional, for diffing)

    GET /api/collections/:owner/:slug/versions/:semver/manifest

Response:

    {
      "semver": "v1.2.0",
      "hash": "abc123...",
      "schemas": {"Article": "schema-hash...", ...},
      "records": [{"id": "rec-1", "type": "Article", "hash": "record-hash..."}, ...],
      "files": ["deadbeef...", ...]
    }

For delta manifests, add ?since= to get only the changes between two versions:

    GET /api/collections/:owner/:slug/versions/:semver/manifest?since=:semver

Response includes "delta": { "added": [...], "updated": [...], "removed": [...] }

Compare this against your local data to determine what changed.

### Step 3: Upload new files (if any)

For each file your app has that Underlay doesn't:

    PUT /api/collections/:owner/:slug/files/sha256:
    Content-Type: application/octet-stream
    Body: raw file bytes

The server verifies that the SHA-256 hash matches the body. Uploading a hash that already exists is idempotent (200 OK). Check existence first with HEAD if you want to skip uploads.

### Step 4: Push the version (negotiate protocol)

All pushes use the negotiate protocol, a three-step flow similar to git's pack negotiation. The client sends a manifest of record hashes; the server says which it needs; the client sends only those records; then the client commits.

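
The three-step flow can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not an official client: `pushVersion`, its parameter names, and the injectable `fetchImpl` are hypothetical, while the endpoint paths and the `session_id` / `needed_records` fields are the ones documented in Steps 4a to 4c. It sends all needed records in one request and does not split them into batches.

```javascript
const BASE_URL = 'https://underlay.org/api';

// recordsByHash: Map from record hash to { id, type, data } (hypothetical shape).
async function pushVersion({ owner, slug, apiKey, negotiateBody, recordsByHash, fetchImpl = fetch }) {
  const root = `${BASE_URL}/collections/${owner}/${slug}/versions/negotiate`;
  const auth = { Authorization: `Bearer ${apiKey}` };

  // 1. Negotiate: send the manifest, learn which record hashes the server lacks.
  const negotiateRes = await fetchImpl(root, {
    method: 'POST',
    headers: { ...auth, 'Content-Type': 'application/json' },
    body: JSON.stringify(negotiateBody),
  });
  if (!negotiateRes.ok) throw new Error(`negotiate failed: ${negotiateRes.status}`);
  const { session_id: sessionId, needed_records: needed } = await negotiateRes.json();

  // 2. Send only the needed records as NDJSON; skip the call if none are needed.
  if (needed.length !== 0) {
    const ndjson = needed.map((hash) => JSON.stringify(recordsByHash.get(hash))).join('\n');
    const recordsRes = await fetchImpl(`${root}/${sessionId}/records`, {
      method: 'POST',
      headers: { ...auth, 'Content-Type': 'application/x-ndjson' },
      body: ndjson,
    });
    if (!recordsRes.ok) throw new Error(`records failed: ${recordsRes.status}`);
  }

  // 3. Commit: no request body; the server creates the new version.
  const commitRes = await fetchImpl(`${root}/${sessionId}/commit`, { method: 'POST', headers: auth });
  if (!commitRes.ok) throw new Error(`commit failed: ${commitRes.status}`);
  return commitRes.json();
}
```

A real client would also need to handle the error responses and session expiry described in Step 5.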
#### Step 4a: Negotiate

    POST /api/collections/:owner/:slug/versions/negotiate
    Content-Type: application/json
    Authorization: Bearer ul_

    {
      "base_version": "v1.2.0",
      "message": "Daily archive 2026-04-27",
      "app_id": "my-app",
      "actor_id": "my-app:cron-job",
      "schemas": {
        "Article": {
          "type": "object",
          "properties": {
            "title": {"type": "string"},
            "body": {"type": "string"},
            "publishedAt": {"type": "string", "format": "date-time"},
            "authorId": {"type": "string", "x-ref-type": "Author"}
          }
        },
        "Author": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "email": {"type": "string", "private": true}
          }
        }
      },
      "manifest": [
        {"id": "article-42", "type": "Article", "hash": "abc123..."},
        {"id": "article-10", "type": "Article", "hash": "def456..."}
      ],
      "files": ["7a8b9c..."],
      "metadata": {
        "description": "Daily archive of publications"
      }
    }

Each record hash is SHA-256 of the canonical JSON; see the "Record Hashing" section below for the exact algorithm. Clients MUST canonicalize data (sort object keys recursively) before hashing, or the server will reject the records.

Field reference:

- base_version: the semver string of the version you diffed against (e.g. "v1.2.0"). null for first push. Used for optimistic locking.
- schemas: per-type JSON Schema map. Required on every push.
- manifest: array of {id, type, hash} for every record in the new version.
- files: array of file hashes (SHA-256 hex strings) referenced by records.
- metadata: optional JSON object for version metadata (description, readme, license, etc.). Merged with previous version's metadata.
- message: human-readable commit message (optional).
- app_id: identifier for the pushing application (optional).
- actor_id: identifier for the user or process that triggered the push (optional).
- strip_unknown_fields: if true, records with fields not defined in the schema will have those fields silently stripped. Default: false (returns 422 with extra field list).


Response:

    {
      "session_id": "uuid",
      "needed_records": ["def456..."],
      "needed_files": [],
      "total_records": 2,
      "total_files": 1,
      "already_have_records": 1,
      "already_have_files": 1
    }

#### Step 4b: Send needed records

    POST /api/collections/:owner/:slug/versions/negotiate/:sessionId/records
    Content-Type: application/x-ndjson
    Authorization: Bearer ul_

    {"id":"article-10","type":"Article","data":{"title":"Updated Title","body":"..."}}

Each line is one JSON record. Only send records whose hashes appear in needed_records. Call this endpoint multiple times for large datasets (up to 10,000 records per batch). If needed_records was empty, skip this step entirely.

Response:

    { "received": 1, "remaining": 0 }

When remaining reaches 0, all needed records have been received.

#### Step 4c: Commit

    POST /api/collections/:owner/:slug/versions/negotiate/:sessionId/commit
    Authorization: Bearer ul_

No request body needed. The server validates all records against schemas, computes version hashes, and creates the new immutable version.

Response (201):

    { "semver": "v1.3.0", "hash": "def456...", "recordCount": 2, "fileCount": 1 }

### Step 5: Handle errors

Conflict (409: someone pushed while you were diffing):

    { "error": "Version conflict", "currentVersion": "v1.3.0", "statusCode": 409 }

→ Re-negotiate with the new base_version.

Missing records (400: commit called before all records submitted):

    { "error": "Missing records", "missing_hashes": ["def456..."], "statusCode": 400 }

→ Send the remaining records via the /records endpoint, then retry commit.

Missing files (422: records reference files not yet uploaded):

    { "error": "Missing files", "filesNeeded": ["sha256:abc..."], "statusCode": 422 }

→ Upload the listed files, then retry commit.

Extra fields (422: records contain fields not in the schema):

    { "error": "Records contain fields not defined in schema", "extraFields": [...], "statusCode": 422 }

→ Either fix the records, or re-negotiate with "strip_unknown_fields": true.

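
The recovery rules above can be expressed as a small dispatch function. This is a sketch only: the function name and the action labels are hypothetical, and it relies solely on the error body fields shown in the examples above (`statusCode`, `currentVersion`, `missing_hashes`, `filesNeeded`, `extraFields`).

```javascript
// Decide the next client step from a parsed error body returned during a push.
function nextPushAction(err) {
  switch (err.statusCode) {
    case 409:
      // Someone else pushed first: start over from the reported version.
      return { action: 'renegotiate', baseVersion: err.currentVersion };
    case 400:
      if (Array.isArray(err.missing_hashes)) {
        return { action: 'send_records', hashes: err.missing_hashes };
      }
      return { action: 'fail' };
    case 422:
      if (Array.isArray(err.filesNeeded)) {
        return { action: 'upload_files', files: err.filesNeeded };
      }
      if (Array.isArray(err.extraFields)) {
        // Caller chooses: fix the records, or re-negotiate with strip_unknown_fields.
        return { action: 'fix_or_strip', extraFields: err.extraFields };
      }
      return { action: 'fail' };
    default:
      return { action: 'fail' };
  }
}

const step = nextPushAction({ error: 'Version conflict', currentVersion: 'v1.3.0', statusCode: 409 });
console.log(step.action, step.baseVersion); // renegotiate v1.3.0
```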

Sessions expire after 10 minutes. If the session expires, re-negotiate.

### Session management

    GET /api/collections/:owner/:slug/versions/negotiate/:sessionId → check session status
    DELETE /api/collections/:owner/:slug/versions/negotiate/:sessionId → cancel session (204)

### First push (no existing versions)

Set base_version to null. Include all records in the manifest. Include schemas for all types. The first version will be v1.0.0.

---

## Pagination (Records Endpoint)

The records endpoint uses cursor-based pagination for efficient traversal of large collections.

    GET /api/collections/:owner/:slug/versions/:semver/records?limit=100&after=record-id-42

Response:

    {
      "records": [ ...up to `limit` records... ],
      "pagination": {
        "limit": 100,
        "hasMore": true,
        "nextCursor": "record-id-142",
        "total": 2000000
      }
    }

Parameters:

- limit: max records per page (default 100, max 1000)
- after: cursor (record ID); returns records with IDs lexicographically after this value
- offset: legacy offset-based pagination (still supported, but cursor is preferred for large sets)
- type: filter by record type

To paginate through all records:

1. First request: GET .../records?limit=1000
2. If pagination.hasMore is true, use pagination.nextCursor for the next request: GET .../records?limit=1000&after=
3. Repeat until hasMore is false.

---

## Record Format

    {
      "id": "unique-stable-id",
      "type": "TypeName",
      "data": {
        "title": "Some value",
        "authorId": "author-123",
        "attachment": {"$file": "sha256:abc123def456..."}
      }
    }

- id: stable, unique within the collection. Use your app's primary key or generate a deterministic one.
- type: groups records by kind (e.g. "Article", "Author", "Grant").
- data: flat JSON object. Reference other records by id. Reference files with {"$file": "sha256:"}.

Records are flat, with no nested joins. Relationships are expressed by storing the id of the related record. The schema declares which fields are references, so tools can resolve them at read time.

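
As an illustration of resolving references at read time, the sketch below replaces reference fields with the records they point to. It assumes that the `x-ref-type` annotation shown in the Step 4a schema example is how a field is declared to be a reference; the function name and the `recordsById` lookup map are hypothetical.

```javascript
// Return a copy of record.data with reference fields replaced by the target record.
// Fields whose target is missing or of the wrong type resolve to null.
function resolveReferences(record, schemas, recordsById) {
  const schema = schemas[record.type] || {};
  const props = schema.properties || {};
  const resolved = { ...record.data };
  for (const [field, spec] of Object.entries(props)) {
    const refType = spec['x-ref-type'];
    if (!refType || !(field in record.data)) continue;
    const target = recordsById.get(record.data[field]);
    resolved[field] = target && target.type === refType ? target : null;
  }
  return resolved;
}

const schemas = {
  Article: {
    type: 'object',
    properties: {
      title: { type: 'string' },
      authorId: { type: 'string', 'x-ref-type': 'Author' },
    },
  },
  Author: { type: 'object', properties: { name: { type: 'string' } } },
};
const author = { id: 'author-123', type: 'Author', data: { name: 'Ada' } };
const article = { id: 'article-1', type: 'Article', data: { title: 'Hello', authorId: 'author-123' } };
const view = resolveReferences(article, schemas, new Map([[author.id, author]]));
console.log(view.authorId.data.name); // Ada
```

The stored record stays flat; the join happens only in the reader's view, so the record hash is unaffected.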

---

## Record Hashing (required for push)

Records are content-addressed. The hash is the record's identity across the system: it determines deduplication, manifest membership, and version integrity. Any client that pushes data must compute hashes exactly as the server does, or the push will fail with "Unexpected record hash".

### Algorithm

1. Build the canonical object: `{"id": , "type": , "data": }`
   - The top-level keys MUST appear in this exact order: id, type, data.
   - The `data` value MUST be canonicalized (see below).
   - The `private` flag is NOT part of the hash. Two records with identical id/type/data but different privacy flags produce the same hash.
2. Serialize to JSON with no extra whitespace (standard JSON.stringify behavior).
3. Compute SHA-256 over the UTF-8 bytes of that JSON string.
4. Encode as lowercase hex (64 characters).

### Canonicalization

Canonicalization recursively sorts all object keys alphabetically. This ensures that `{"b":1,"a":2}` and `{"a":2,"b":1}` produce the same hash.

Rules:

- Objects: sort keys lexicographically (by Unicode code point), recurse into values.
- Arrays: preserve order, recurse into elements.
- Primitives (strings, numbers, booleans, null): unchanged.


Reference implementation (JavaScript):

```javascript
const { createHash } = require('node:crypto');

// Recursively sort object keys; arrays keep their order, primitives pass through.
function canonicalize(value) {
  if (value === null || typeof value !== 'object') return value;
  if (Array.isArray(value)) return value.map(canonicalize);
  const sorted = {};
  // Default sort() compares UTF-16 code units, which matches code-point order
  // for keys within the Basic Multilingual Plane.
  for (const key of Object.keys(value).sort()) {
    sorted[key] = canonicalize(value[key]);
  }
  return sorted;
}

function hashRecord(record) {
  // Top-level key order is fixed: id, type, data.
  const canonical = JSON.stringify({
    id: record.id,
    type: record.type,
    data: canonicalize(record.data),
  });
  // Node.js:
  const hash = createHash('sha256').update(canonical).digest('hex');
  // Browser (inside an async function, instead of the line above):
  // const buf = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(canonical));
  // const hash = Array.from(new Uint8Array(buf)).map(b => b.toString(16).padStart(2, '0')).join('');
  return { hash, canonical };
}
```

### Example

Input record:

    { "id": "article-1", "type": "Article", "data": { "title": "Hello", "body": "World" } }

Canonical JSON (data keys sorted):

    {"id":"article-1","type":"Article","data":{"body":"World","title":"Hello"}}

SHA-256 hash: e3b7a... (64 hex characters)

Note: the input's data keys arrive in the order title, body. Canonicalization sorts them to `{"body":"World","title":"Hello"}`, so any key order in the input produces the same hash.

### Schema hashing

Schemas are also content-addressed. The hash is SHA-256 of `JSON.stringify(canonicalize(schemaBody))`. The same canonicalization rules apply.

---

## Schema Discovery

Schemas are globally deduplicated by content hash. If two collections use the same type shape, they share the same schema row. This enables cross-collection discovery.

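
To illustrate why two collections with the same type shape end up sharing a schema, the sketch below hashes the same schema body written with two different key orders. It is a minimal Node.js sketch: `hashSchema` is a hypothetical helper name, and it returns bare lowercase hex (whether the server adds a `sha256:` prefix in some contexts is not assumed here).

```javascript
const { createHash } = require('node:crypto');

// Same canonicalization rules as for records: sort object keys recursively,
// keep array order, leave primitives unchanged.
function canonicalize(value) {
  if (value === null || typeof value !== 'object') return value;
  if (Array.isArray(value)) return value.map(canonicalize);
  const sorted = {};
  for (const key of Object.keys(value).sort()) {
    sorted[key] = canonicalize(value[key]);
  }
  return sorted;
}

// SHA-256 over JSON.stringify(canonicalize(schemaBody)), as lowercase hex.
function hashSchema(schemaBody) {
  const canonical = JSON.stringify(canonicalize(schemaBody));
  return createHash('sha256').update(canonical).digest('hex');
}

const fromAppA = { type: 'object', properties: { name: { type: 'string' } } };
const fromAppB = { properties: { name: { type: 'string' } }, type: 'object' };

console.log(hashSchema(fromAppA) === hashSchema(fromAppB)); // true
```

Any difference in the schema body itself, such as an extra property, changes the canonical JSON and therefore the hash.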

    GET /api/schemas → search schemas (?q=text&slug=TypeName&label=uri&schema_hash=sha256:...&limit=50&offset=0)
    GET /api/schemas/:id → single schema with labels and usage info
    GET /api/collections/:owner/:slug/schemas → schemas for latest version (?version=v1.2.0 for specific, ?raw=true to skip label enrichment)
    POST /api/schemas/:id/labels → add label {"label": "schema.org/Person"} (requires write scope)
    DELETE /api/schemas/:id/labels/:label → remove label (requires admin scope)

When schemas are returned via the collection schemas endpoint, known labels are injected as "x-underlay-labels" on the schema body (opt-out with ?raw=true).

---

## Versioning

- Versions are identified solely by semver (e.g. "v1.0.0", "v1.2.3").
- Semver semantics: major bump = schema change (types added/removed or schema_id changed), minor bump = records/files change, patch bump = metadata-only change.
- Each version has a content-addressed hash computed from sorted schema hashes + sorted record hashes + sorted file hashes + metadata.
- Each version has a `metadata` field: a JSON object that can contain `readme`, `license`, and other arbitrary metadata.
- Records are content-addressed: SHA-256 of canonical JSON (see "Record Hashing" section). Records are globally deduplicated.
- Versions are immutable once created.
- The provenance endpoint (GET /api/records/:hash/provenance) shows every collection and version that includes a given record.

---

## Organization Management

Organizations are managed via better-auth's organization plugin at `/api/auth/organization/*`. Every user gets a default organization on signup. Users can create additional organizations.

    POST /api/auth/organization/create → create org {"name", "slug"}
    GET /api/auth/organization/list → list user's organizations
    PATCH /api/auth/organization/update → update org
    DELETE /api/auth/organization/delete → delete org

Member management (invite, remove, update roles) is also under `/api/auth/organization/*`.

## Collection Management

    POST /api/accounts/:owner/collections → create collection {"slug", "name", "public"}
    PATCH /api/collections/:owner/:slug → update {"name", "slug", "public"}
    PATCH /api/collections/:owner/:slug/metadata → update version metadata {"description", "readme", "license", ...}; creates a patch version (requires write scope)
    DELETE /api/collections/:owner/:slug → delete collection (requires admin scope)
    GET /api/accounts/:owner/collections → list collections for an organization

---

## Privacy & Visibility

Underlay supports fine-grained privacy at three levels: types, fields, and individual records. Private data is stored alongside public data in the same version but is only visible to the collection owner.

### Private Types

Mark an entire type as private in its schema. All records of that type are hidden from public readers.

    "schemas": {
      "Article": {
        "type": "object",
        "properties": { "title": {"type": "string"} }
      },
      "InternalNote": {
        "type": "object",
        "private": true,
        "properties": {
          "note": {"type": "string"},
          "articleId": {"type": "string"}
        }
      }
    }

Public readers see only the Article type. InternalNote is completely hidden (including from the schema response).

### Private Fields

Mark individual fields as private within a type's schema. The type itself is visible, but those fields are stripped for public readers.

    "Author": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "email": {"type": "string", "private": true},
        "phone": {"type": "string", "private": true}
      }
    }

Public readers see Author records with only "name". The owner sees all fields.

### Private Records

Mark individual records as private in the negotiate manifest. The type and schema are visible, but that specific record is hidden.

    "manifest": [
      {"id": "article-1", "type": "Article", "hash": ""},
      {"id": "article-2", "type": "Article", "hash": "", "private": true}
    ]

article-2 is only visible to the collection owner. Public readers see article-1 only.

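
The three privacy levels can be illustrated with a client-side model of what a public reader would see. This is only a sketch of the filtering rules described above, not the server's implementation: `publicView` is a hypothetical name, and it assumes each record object carries the `private` flag from its manifest entry.

```javascript
// Model of the public projection: drop private types and private records,
// then strip private fields from the records that remain.
function publicView(records, schemas) {
  const visible = [];
  for (const record of records) {
    const schema = schemas[record.type];
    if (!schema || schema.private || record.private) continue;
    const props = schema.properties || {};
    const data = {};
    for (const [field, value] of Object.entries(record.data)) {
      if (props[field] && props[field].private) continue; // private field
      data[field] = value;
    }
    visible.push({ id: record.id, type: record.type, data });
  }
  return visible;
}

const schemas = {
  Author: {
    type: 'object',
    properties: {
      name: { type: 'string' },
      email: { type: 'string', private: true },
    },
  },
  InternalNote: { type: 'object', private: true, properties: { note: { type: 'string' } } },
};
const records = [
  { id: 'a1', type: 'Author', data: { name: 'Ada', email: 'ada@example.org' } },
  { id: 'a2', type: 'Author', data: { name: 'Bob' }, private: true },
  { id: 'n1', type: 'InternalNote', data: { note: 'draft' } },
];

console.log(JSON.stringify(publicView(records, schemas)));
// [{"id":"a1","type":"Author","data":{"name":"Ada"}}]
```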
### How it works

- Public hash: computed from public content only (excludes private types, records, fields, and metadata)
- Public record hash: a record of a type with private fields is listed in public manifests under the hash of its filtered projection ({"id", "type", "data"} with private fields stripped). Record endpoints resolve either address; hashing the document you receive always reproduces the address you requested.
- Private hash: computed from all content including metadata (used by the owner for integrity verification)
- Schema filtering: the schema returned to public readers omits private types and private fields
- Record filtering: queries by non-owners automatically exclude private records and strip private fields

---

## API Key Management

API keys are managed via better-auth's apiKey plugin. All endpoints are under `/api/auth/api-key/*`.

    POST /api/auth/api-key/create → create key {"name": "my-app", "metadata": {"scope": "write"}, "prefix": "ul"}
        The scope in metadata is translated to permissions server-side.
        Response includes the key once: {"key": "ul_abc123...", "id": "..."}
    GET /api/auth/api-key/list → list keys (id, name, start, permissions, metadata, createdAt, expiresAt)
    POST /api/auth/api-key/delete → revoke a key {"keyId": "..."}

---

## Error Codes

- 400: Validation error (bad request body)
- 401: Authentication required
- 403: Insufficient scope (e.g. read key used for write)
- 404: Not found (or private collection you can't access)
- 409: Version conflict (re-fetch and retry with new base_version)
- 413: Payload too large (file upload exceeds size limit)
- 422: Missing files, schema validation failed, or records contain extra fields not in the schema
- 429: Rate limited (wait and retry)

---

- Full documentation: https://underlay.org/docs
- Protocol specification: https://underlay.org/protocol
- Integration guide: https://underlay.org/docs/integration
- Source code: https://github.com/knowledgefutures/underlay