# Underlay - AI Integration Guide

Underlay is a versioned, content-addressed registry for structured knowledge.
Apps push snapshots of their data; Underlay preserves them and serves them via HTTPS API.
Built by Knowledge Futures, a 501(c)(3) organization: https://www.knowledgefutures.org

Base URL: https://underlay.org/api

---

## Authentication

There are two auth methods:

1. API Key (for programmatic access):
   Header: Authorization: Bearer <key>
   Keys are prefixed with "ul_" (e.g. ul_abc123...) when created through the UI.
   Keys have scopes: read, write, admin.
   Keys can optionally be scoped to specific collections via metadata.
   Create keys at https://underlay.org/settings/keys (personal) or /:owner/settings/keys (organization).
   Keys are managed via better-auth's apiKey plugin at /api/auth/api-key/*.

2. Session cookie (for browser use):
   Users sign in via KF Auth SSO (OAuth2/PKCE) at https://underlay.org/login.
   Accounts are created automatically on first sign-in, along with a default organization.
   GET /api/accounts/me returns the current user and their organization memberships.

GET requests for public data require no authentication. Private data is returned only to the authenticated collection owner.
All write requests (POST, PATCH, PUT, DELETE) require authentication.
If a Bearer token is provided but invalid, the request is rejected immediately (401).
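
As a minimal sketch of the API key method, the options for an authenticated request could be assembled as below. The helper name `authedRequest` is illustrative and not part of the Underlay API; only the `Authorization: Bearer <key>` header format comes from this guide.

```javascript
// Hypothetical helper: builds fetch() options for an authenticated request.
// A JSON body is attached only when one is supplied.
function authedRequest(apiKey, method, body) {
  if (typeof apiKey !== 'string' || apiKey.length === 0) {
    throw new Error('API key is required for write requests');
  }
  const headers = { Authorization: `Bearer ${apiKey}` };
  const options = { method, headers };
  if (body !== undefined) {
    headers['Content-Type'] = 'application/json';
    options.body = JSON.stringify(body);
  }
  return options;
}

// Usage (owner, slug, and key are placeholders):
// const res = await fetch(
//   'https://underlay.org/api/collections/some-org/some-data/fork',
//   authedRequest(apiKey, 'POST', { targetOrg: 'my-org' })
// );
```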

## Rate Limits

All API requests are rate-limited per IP (unauthenticated) or per account (authenticated).

| Auth status       | Limit         |
|-------------------|---------------|
| Unauthenticated   | 60 req/min    |
| Authenticated     | 5,000 req/min |

Rate limit headers are included on every response:
- X-RateLimit-Limit: max requests in the window
- X-RateLimit-Remaining: requests left
- X-RateLimit-Reset: seconds until the window resets

When the limit is exceeded, the API returns a 429 response with a Retry-After header.
To get the higher limit, authenticate with an API key (recommended for any automated access).
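
A client might honor these headers roughly as follows. This is a sketch under assumptions: it treats Retry-After as a number of seconds (HTTP also allows a date, which this code ignores), and the names `retryDelayMs` and `fetchWithRetry` plus the one-second default are illustrative choices, not documented behavior.

```javascript
// Returns how many milliseconds to wait before retrying, or 0 if no wait is needed.
// `headers` is any object with a get(name) method, such as a fetch Response's headers.
function retryDelayMs(status, headers) {
  if (status !== 429) return 0;
  // Assumption: Retry-After carries a number of seconds.
  const retryAfter = Number(headers.get('Retry-After'));
  if (Number.isFinite(retryAfter) && retryAfter > 0) return retryAfter * 1000;
  // Fall back to X-RateLimit-Reset (seconds until the window resets).
  const reset = Number(headers.get('X-RateLimit-Reset'));
  if (Number.isFinite(reset) && reset > 0) return reset * 1000;
  return 1000; // Arbitrary default when neither header is usable.
}

// Hypothetical wrapper: retries a rate-limited request up to maxAttempts times.
async function fetchWithRetry(url, options, maxAttempts = 3, fetchImpl = fetch) {
  for (let attempt = 1; ; attempt++) {
    const res = await fetchImpl(url, options);
    const delay = retryDelayMs(res.status, res.headers);
    if (delay === 0 || attempt >= maxAttempts) return res;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```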

---

## Core Concepts

- Organization: an entity that owns collections. Every user gets a default organization on signup. Identified by :slug. Managed via better-auth's organization plugin at /api/auth/organization/*.
- Collection: a named, versioned body of data owned by an organization. Identified by :owner/:slug.
- Version: an immutable snapshot containing a JSON Schema, records, and file references. Identified by semver (e.g. "v1.0.0"). Major = schema change, Minor = records/files change, Patch = metadata-only change.
- Record: a flat JSON object with { id, type, data }. Records are content-addressed: the SHA-256 hash of the canonical JSON `{"id":...,"type":...,"data":...}` is the record's identity. Records are stored globally and deduplicated — the same record in ten collections is stored once. Records reference other records by id and files by hash. Wire format is JSONL (one record per line).
- File: a binary blob stored by SHA-256 hash, referenced in record data as {"$file": "sha256:<hex>"}.
- Schema: a JSON Schema document for a single record type, stored as a global, immutable, content-addressed entity. Each type gets its own schema. Schema changes trigger a major version bump.
- Schema labeling: schemas can be labeled post-hoc with URIs or names (e.g. "schema.org/Person") for cross-collection discovery.

---

## Reading Data

### Browse collections
GET /api/collections                                    → list public collections (?q=search&limit=50&offset=0)
GET /api/collections/:owner/:slug                       → collection metadata + latest version summary

### Read versions
GET /api/collections/:owner/:slug/versions              → list versions (newest first, ?limit=50&offset=0)
GET /api/collections/:owner/:slug/versions/latest       → latest version with full metadata
GET /api/collections/:owner/:slug/versions/:semver      → specific version by semver (e.g. /versions/v1.2.0)

### Read records and files
GET /api/collections/:owner/:slug/versions/:semver/records   → records for a version (?type=TypeName&limit=100&after=recordId)
GET /api/collections/:owner/:slug/versions/:semver/manifest  → manifest: record ids/types/hashes + file hashes + schema hashes (?since=v1.0.0 for delta)
GET /api/collections/:owner/:slug/versions/:semver/files     → list files for a version (hash, size, content type)
GET /api/collections/:owner/:slug/files/:hash            → download a file by hash
HEAD /api/collections/:owner/:slug/files/:hash           → check if a file exists (returns Content-Length, Content-Type)

### Records (global, content-addressed)
GET /api/records/:hash/provenance                        → find all collections/versions containing this record hash
POST /api/records/batch                                  → fetch records by hash: {"hashes": ["abc..."]} → JSONL stream

### Diff
GET /api/collections/:owner/:slug/versions/:semver/diff?from=:semver → diff between two versions (added, updated, removed records)

### Export
GET /api/collections/:owner/:slug/export                → download .tar.gz archive (manifest.json + records/*.ndjson + files/*)
GET /api/collections/:owner/:slug/export?version=v2.0.0  → export a specific version

### Fork
POST /api/collections/:owner/:slug/fork                 → fork collection into caller's org (requires write auth)
  Body: { "targetOrg": "my-org", "slug": "optional-new-slug" }
  Creates a new collection under targetOrg with the source's latest version.
  Records, schemas, and files are referenced (not copied) — zero additional storage.
  Response includes { id, owner, slug, forkedFrom: { owner, slug, version } }.

---

## Writing Data: The Push Flow

This is the canonical workflow for syncing an app's data to Underlay:

### Step 1: Get current state
GET /api/collections/:owner/:slug/versions/latest

Response includes { semver, hash, recordCount, fileCount }.
If 404, no versions exist yet — your first push should use base_version: null.

### Step 2: Fetch the manifest (optional, for diffing)
GET /api/collections/:owner/:slug/versions/:semver/manifest

Response:
{
  "semver": "v1.2.0",
  "hash": "abc123...",
  "schemas": {"Article": "schema-hash...", ...},
  "records": [{"id": "rec-1", "type": "Article", "hash": "record-hash..."}, ...],
  "files": ["deadbeef...", ...]
}

For delta manifests, add ?since=<semver> to get only the changes between two versions:
GET /api/collections/:owner/:slug/versions/:semver/manifest?since=:semver
Response includes "delta": { "added": [...], "updated": [...], "removed": [...] }

Compare this against your local data to determine what changed.
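
The comparison could be sketched like this, assuming you have already hashed your local records (see "Record Hashing"). The function name `diffManifest` is illustrative; the `{ id, type, hash }` entry shape is the one shown in the manifest response above.

```javascript
// Compares the remote manifest's records with locally hashed records.
// Both inputs are arrays of { id, type, hash }; the result lists record ids.
function diffManifest(remoteRecords, localRecords) {
  const remoteById = new Map(remoteRecords.map((r) => [r.id, r]));
  const localIds = new Set(localRecords.map((r) => r.id));
  const added = [];
  const updated = [];
  for (const rec of localRecords) {
    const remote = remoteById.get(rec.id);
    if (!remote) added.push(rec.id); // not in the remote version yet
    else if (remote.hash !== rec.hash) updated.push(rec.id); // content changed
  }
  // Present remotely but no longer present locally.
  const removed = remoteRecords.filter((r) => !localIds.has(r.id)).map((r) => r.id);
  return { added, updated, removed };
}
```

If all three lists are empty, there is nothing to push.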

### Step 3: Upload new files (if any)
For each file your app has that Underlay doesn't:

PUT /api/collections/:owner/:slug/files/sha256:<hex>
Content-Type: application/octet-stream
Body: raw file bytes

The server verifies that the SHA-256 hash in the URL matches the body. Uploading a hash that already exists is idempotent and returns 200 OK.
Check existence first with HEAD if you want to skip uploads.
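
Computing the file address and preparing the upload might look like the sketch below (Node.js). The helper names are illustrative; the request is only described here, not sent, so you can pass the result to `fetch` yourself.

```javascript
const { createHash } = require('node:crypto');

// Computes the content address used in the upload path: "sha256:" + lowercase hex digest.
function fileAddress(bytes) {
  return 'sha256:' + createHash('sha256').update(bytes).digest('hex');
}

// Hypothetical helper: describes the PUT request for one file without sending it.
function buildUploadRequest(baseUrl, owner, slug, bytes, apiKey) {
  return {
    url: `${baseUrl}/collections/${owner}/${slug}/files/${fileAddress(bytes)}`,
    options: {
      method: 'PUT',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/octet-stream',
      },
      body: bytes,
    },
  };
}

// Usage:
// const { url, options } = buildUploadRequest('https://underlay.org/api', owner, slug, bytes, apiKey);
// const res = await fetch(url, options);
```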

### Step 4: Push the version (negotiate protocol)

All pushes use the negotiate protocol — a three-step flow similar to git's pack negotiation.
The client sends a manifest of record hashes; the server says which it needs; the client sends
only those records; then commits.

#### Step 4a: Negotiate
POST /api/collections/:owner/:slug/versions/negotiate
Content-Type: application/json
Authorization: Bearer ul_<key>

{
  "base_version": "v1.2.0",
  "message": "Daily archive 2026-04-27",
  "app_id": "my-app",
  "actor_id": "my-app:cron-job",
  "schemas": {
    "Article": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "body": {"type": "string"},
        "publishedAt": {"type": "string", "format": "date-time"},
        "authorId": {"type": "string", "x-ref-type": "Author"}
      }
    },
    "Author": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "email": {"type": "string", "private": true}
      }
    }
  },
  "manifest": [
    {"id": "article-42", "type": "Article", "hash": "abc123..."},
    {"id": "article-10", "type": "Article", "hash": "def456..."}
  ],
  "files": ["7a8b9c..."],
  "metadata": {
    "description": "Daily archive of publications"
  }
}

Each record hash is SHA-256 of the canonical JSON — see "Record Hashing" section below for
the exact algorithm. Clients MUST canonicalize data (sort object keys recursively) before
hashing, or the server will reject the records.

Field reference:
- base_version: the semver string of the version you diffed against (e.g. "v1.2.0"). null for first push. Used for optimistic locking.
- schemas: per-type JSON Schema map. Required on every push.
- manifest: array of {id, type, hash} for every record in the new version.
- files: array of file hashes (SHA-256 hex strings) referenced by records.
- metadata: optional JSON object for version metadata (description, readme, license, etc.). Merged with previous version's metadata.
- message: human-readable commit message (optional).
- app_id: identifier for the pushing application (optional).
- actor_id: identifier for the user or process that triggered the push (optional).
- strip_unknown_fields: if true, records with fields not defined in the schema will have those fields silently stripped. Default: false (returns 422 with extra field list).

Response:
{
  "session_id": "uuid",
  "needed_records": ["def456..."],
  "needed_files": [],
  "total_records": 2,
  "total_files": 1,
  "already_have_records": 1,
  "already_have_files": 1
}
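
Assembling the negotiate request body could be sketched as follows. The function name and argument shape are illustrative; the sketch assumes record hashes were computed beforehand as described in "Record Hashing", and it only builds the JSON object without sending it.

```javascript
// Builds the JSON body for POST .../versions/negotiate.
// `hashedRecords` is an array of { id, type, hash, private? };
// `fileHashes` is an array of SHA-256 hex strings.
function buildNegotiateBody({ baseVersion = null, schemas, hashedRecords, fileHashes = [], message }) {
  if (!schemas || Object.keys(schemas).length === 0) {
    throw new Error('schemas are required on every push');
  }
  for (const r of hashedRecords) {
    if (!Object.prototype.hasOwnProperty.call(schemas, r.type)) {
      throw new Error(`no schema supplied for type "${r.type}"`);
    }
  }
  const body = {
    base_version: baseVersion, // null on the first push
    schemas,
    manifest: hashedRecords.map((r) => {
      const entry = { id: r.id, type: r.type, hash: r.hash };
      if (r.private === true) entry.private = true; // per-record privacy flag
      return entry;
    }),
    files: fileHashes,
  };
  if (message !== undefined) body.message = message;
  return body;
}
```

After the server responds, only the records whose hashes appear in `needed_records` have to be sent.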

#### Step 4b: Send needed records
POST /api/collections/:owner/:slug/versions/negotiate/:sessionId/records
Content-Type: application/x-ndjson
Authorization: Bearer ul_<key>

{"id":"article-10","type":"Article","data":{"title":"Updated Title","body":"..."}}

Each line is one JSON record. Only send records whose hashes appear in needed_records.
Call this endpoint multiple times for large datasets (up to 10,000 records per batch).
If needed_records was empty, skip this step entirely.

Response:
{ "received": 1, "remaining": 0 }

When remaining reaches 0, all needed records have been received.
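
Selecting the needed records and splitting them into batches might look like this. It is a sketch: `ndjsonBatches` is an illustrative name, and the input is assumed to be an array of `{ hash, record }` pairs kept from the hashing step. Each returned string is one request body.

```javascript
// Selects the records the server asked for and serializes them as NDJSON batches.
// `hashed` is an array of { hash, record }, where record is { id, type, data }.
function ndjsonBatches(hashed, neededHashes, batchSize = 10000) {
  const needed = new Set(neededHashes);
  const lines = hashed
    .filter((h) => needed.has(h.hash))
    .map((h) => JSON.stringify({ id: h.record.id, type: h.record.type, data: h.record.data }));
  const batches = [];
  for (let i = 0; i < lines.length; i += batchSize) {
    // One record per line, at most batchSize lines per request body.
    batches.push(lines.slice(i, i + batchSize).join('\n'));
  }
  return batches;
}
```

An empty result means there is nothing to send and you can go straight to commit.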

#### Step 4c: Commit
POST /api/collections/:owner/:slug/versions/negotiate/:sessionId/commit
Authorization: Bearer ul_<key>

No request body needed. The server validates all records against schemas, computes
version hashes, and creates the new immutable version.

Response (201):
{ "semver": "v1.3.0", "hash": "def456...", "recordCount": 2, "fileCount": 1 }

### Step 5: Handle errors

Conflict (409 — someone pushed while you were diffing):
{ "error": "Version conflict", "currentVersion": "v1.3.0", "statusCode": 409 }
→ Re-negotiate with the new base_version.

Missing records (400 — commit called before all records submitted):
{ "error": "Missing records", "missing_hashes": ["def456..."], "statusCode": 400 }
→ Send the remaining records via the /records endpoint, then retry commit.

Missing files (422 — records reference files not yet uploaded):
{ "error": "Missing files", "filesNeeded": ["sha256:abc..."], "statusCode": 422 }
→ Upload the listed files, then retry commit.

Extra fields (422 — records contain fields not in the schema):
{ "error": "Records contain fields not defined in schema", "extraFields": [...], "statusCode": 422 }
→ Either fix the records, or re-negotiate with "strip_unknown_fields": true.

Sessions expire after 10 minutes. If the session expires, re-negotiate.
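
The recovery steps above can be expressed as a small dispatch function. This is a sketch: the action names are made up for illustration, while the status codes and body fields are the ones shown in the error examples.

```javascript
// Maps a failed push response to the recovery step described above.
// `status` is the HTTP status code; `body` is the parsed JSON error body.
function nextAction(status, body) {
  const b = body || {};
  if (status === 409) {
    return { action: 'renegotiate', baseVersion: b.currentVersion };
  }
  if (status === 400 && Array.isArray(b.missing_hashes)) {
    return { action: 'send_records', hashes: b.missing_hashes };
  }
  if (status === 422 && Array.isArray(b.filesNeeded)) {
    return { action: 'upload_files', files: b.filesNeeded };
  }
  if (status === 422 && Array.isArray(b.extraFields)) {
    return { action: 'fix_or_strip', extraFields: b.extraFields };
  }
  return { action: 'fail', status }; // anything else is not recoverable here
}
```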

### Session management
GET  /api/collections/:owner/:slug/versions/negotiate/:sessionId  → check session status
DELETE /api/collections/:owner/:slug/versions/negotiate/:sessionId → cancel session (204)

### First push (no existing versions)
Set base_version to null. Include all records in the manifest. Include schemas for all types.
The first version will be v1.0.0.

---

## Pagination (Records Endpoint)

The records endpoint uses cursor-based pagination for efficient traversal of large collections.

GET /api/collections/:owner/:slug/versions/:semver/records?limit=100&after=record-id-42

Response:
{
  "records": [ ...up to `limit` records... ],
  "pagination": {
    "limit": 100,
    "hasMore": true,
    "nextCursor": "record-id-142",
    "total": 2000000
  }
}

Parameters:
- limit: max records per page (default 100, max 1000)
- after: cursor (record ID) — return records with IDs lexicographically after this value
- offset: legacy offset-based pagination (still supported, but cursor is preferred for large sets)
- type: filter by record type

To paginate through all records:
1. First request: GET .../records?limit=1000
2. If pagination.hasMore is true, use pagination.nextCursor for the next request:
   GET .../records?limit=1000&after=<nextCursor>
3. Repeat until hasMore is false.
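
The loop above could be sketched as an async generator. The names `nextPageUrl` and `allRecords` are illustrative, and the code assumes the response shape shown earlier in this section (`records` plus `pagination.hasMore` and `pagination.nextCursor`).

```javascript
// Builds the URL for the next page, or returns null when there are no more pages.
// `recordsUrl` is the records endpoint without query parameters.
function nextPageUrl(recordsUrl, pagination, limit = 1000) {
  const params = new URLSearchParams({ limit: String(limit) });
  if (pagination) {
    // Stop when the server reports no more pages or gives no cursor to continue from.
    if (!pagination.hasMore || !pagination.nextCursor) return null;
    params.set('after', pagination.nextCursor);
  }
  return `${recordsUrl}?${params}`;
}

// Hypothetical async generator that yields every record in a version, page by page.
async function* allRecords(recordsUrl, fetchImpl = fetch) {
  let url = nextPageUrl(recordsUrl, null); // first request has no cursor
  while (url) {
    const res = await fetchImpl(url);
    if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
    const page = await res.json();
    yield* page.records;
    url = nextPageUrl(recordsUrl, page.pagination);
  }
}

// Usage:
// for await (const record of allRecords(recordsUrl)) { /* process record */ }
```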

---

## Record Format

{
  "id": "unique-stable-id",
  "type": "TypeName",
  "data": {
    "title": "Some value",
    "authorId": "author-123",
    "attachment": {"$file": "sha256:abc123def456..."}
  }
}

- id: stable, unique within the collection. Use your app's primary key or generate a deterministic one.
- type: groups records by kind (e.g. "Article", "Author", "Grant").
- data: flat JSON object. Reference other records by id. Reference files with {"$file": "sha256:<hex>"}.

Records are flat — no nested joins. Relationships are expressed by storing the id of the related record.
The schema declares which fields are references, so tools can resolve them at read time.
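
A read-time resolver might look like the sketch below. It assumes reference fields are the ones whose schema property carries `"x-ref-type"`, as in the negotiate example earlier in this guide; the function name and the one-level-deep behavior are illustrative choices.

```javascript
// Replaces reference fields with the records they point to, one level deep.
// `schema` is the JSON Schema for record.type; `recordsById` is a Map of id -> record.
function resolveRefs(record, schema, recordsById) {
  const resolved = { ...record.data };
  const properties = (schema && schema.properties) || {};
  for (const [field, def] of Object.entries(properties)) {
    const refType = def && def['x-ref-type'];
    const target = recordsById.get(resolved[field]);
    // Only substitute when the target exists and has the declared type.
    if (refType && target && target.type === refType) {
      resolved[field] = target;
    }
  }
  // The input record is left unmodified.
  return { ...record, data: resolved };
}
```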

---

## Record Hashing (required for push)

Records are content-addressed. The hash is the record's identity across the system — it determines
deduplication, manifest membership, and version integrity. Any client that pushes data must compute
hashes exactly as the server does, or the push will fail with "Unexpected record hash".

### Algorithm

1. Build the canonical object: `{"id": <id>, "type": <type>, "data": <canonicalized_data>}`
   - The top-level keys MUST appear in this exact order: id, type, data.
   - The `data` value MUST be canonicalized (see below).
   - The `private` flag is NOT part of the hash. Two records with identical id/type/data
     but different privacy flags produce the same hash.

2. Serialize to JSON with no extra whitespace (standard JSON.stringify behavior).

3. Compute SHA-256 over the UTF-8 bytes of that JSON string.

4. Encode as lowercase hex (64 characters).

### Canonicalization

Canonicalization recursively sorts all object keys alphabetically. This ensures that
`{"b":1,"a":2}` and `{"a":2,"b":1}` produce the same hash.

Rules:
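- Objects: sort keys lexicographically, recurse into values. The reference implementation uses JavaScript's default `sort()`, which compares UTF-16 code units; this matches Unicode code point order unless keys contain characters outside the Basic Multilingual Plane.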
- Arrays: preserve order, recurse into elements.
- Primitives (strings, numbers, booleans, null): unchanged.

Reference implementation (JavaScript):

```javascript
const { createHash } = require('node:crypto');

// Recursively sorts object keys; arrays keep their order, primitives pass through.
function canonicalize(value) {
  if (value === null || typeof value !== 'object') return value;
  if (Array.isArray(value)) return value.map(canonicalize);
  const sorted = {};
  for (const key of Object.keys(value).sort()) {
    sorted[key] = canonicalize(value[key]);
  }
  return sorted;
}

// Returns the record's SHA-256 hash (lowercase hex) and the canonical JSON it was computed from.
function hashRecord(record) {
  const canonical = JSON.stringify({
    id: record.id,
    type: record.type,
    data: canonicalize(record.data),
  });
  // Node.js (a string passed to update() is encoded as UTF-8):
  const hash = createHash('sha256').update(canonical).digest('hex');
  // Browser alternative (inside an async function, replacing the line above):
  // const buf = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(canonical));
  // const hash = Array.from(new Uint8Array(buf)).map(b => b.toString(16).padStart(2, '0')).join('');
  return { hash, canonical };
}
```

### Example

Input record:
  { "id": "article-1", "type": "Article", "data": { "title": "Hello", "body": "World" } }

Canonical JSON (data keys sorted by canonicalization):
  {"id":"article-1","type":"Article","data":{"body":"World","title":"Hello"}}

SHA-256 hash:
  e3b7a... (64 hex characters)

Note: the input lists `title` before `body`; canonicalization reorders the keys to
`{"body":"World","title":"Hello"}`, so any key order in the input produces the same hash.

### Schema hashing

Schemas are also content-addressed. The hash is SHA-256 of `JSON.stringify(canonicalize(schemaBody))`.
The same canonicalization rules apply.

---

## Schema Discovery

Schemas are globally deduplicated by content hash. If two collections use the same type shape, they share the same schema row. This enables cross-collection discovery.

GET /api/schemas                                        → search schemas (?q=text&slug=TypeName&label=uri&schema_hash=sha256:...&limit=50&offset=0)
GET /api/schemas/:id                                    → single schema with labels and usage info
GET /api/collections/:owner/:slug/schemas               → schemas for latest version (?version=v1.2.0 for specific, ?raw=true to skip label enrichment)
POST /api/schemas/:id/labels                            → add label {"label": "schema.org/Person"} (requires write scope)
DELETE /api/schemas/:id/labels/:label                   → remove label (requires admin scope)

When schemas are returned via the collection schemas endpoint, known labels are injected as "x-underlay-labels" on the schema body (opt-out with ?raw=true).

---

## Versioning

- Versions are identified solely by semver (e.g. "v1.0.0", "v1.2.3").
- Semver semantics: major bump = schema change (types added/removed or schema_id changed), minor bump = records/files change, patch bump = metadata-only change.
- Each version has a content-addressed hash computed from sorted schema hashes + sorted record hashes + sorted file hashes + metadata.
- Each version has a `metadata` field: a JSON object that can contain `readme`, `license`, and other arbitrary metadata.
- Records are content-addressed: SHA-256 of canonical JSON (see "Record Hashing" section). Records are globally deduplicated.
- Versions are immutable once created.
- The provenance endpoint (GET /api/records/:hash/provenance) shows every collection and version that includes a given record.
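
The bump rules can be mirrored client-side to predict the version a push will produce. This is a sketch only: the server assigns the actual version, and resetting the lower components to 0 on a major or minor bump is an assumption based on ordinary semver practice, not something this guide states.

```javascript
// Predicts the next version string from the kind of change being pushed.
// `current` is a string like "v1.2.0", or null when no versions exist yet.
function nextSemver(current, change = {}) {
  if (current === null) return 'v1.0.0'; // first push
  const match = /^v(\d+)\.(\d+)\.(\d+)$/.exec(current);
  if (!match) throw new Error(`Not a valid version string: ${current}`);
  const [major, minor, patch] = match.slice(1).map(Number);
  if (change.schemaChanged) return `v${major + 1}.0.0`;
  if (change.recordsChanged || change.filesChanged) return `v${major}.${minor + 1}.0`;
  if (change.metadataChanged) return `v${major}.${minor}.${patch + 1}`;
  return current; // nothing changed
}
```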

---

## Organization Management

Organizations are managed via better-auth's organization plugin at /api/auth/organization/*.
Every user gets a default organization on signup. Users can create additional organizations.

POST /api/auth/organization/create     → create org {"name", "slug"}
GET /api/auth/organization/list        → list user's organizations
PATCH /api/auth/organization/update    → update org
DELETE /api/auth/organization/delete   → delete org

Member management (invite, remove, update roles) is also under /api/auth/organization/*.

## Collection Management

POST /api/accounts/:owner/collections              → create collection {"slug", "name", "public"}
PATCH /api/collections/:owner/:slug                → update {"name", "slug", "public"}
PATCH /api/collections/:owner/:slug/metadata       → update version metadata {"description", "readme", "license", ...} — creates a patch version (requires write scope)
DELETE /api/collections/:owner/:slug               → delete collection (requires admin scope)
GET /api/accounts/:owner/collections               → list collections for an organization

---

## Privacy & Visibility

Underlay supports fine-grained privacy at three levels: types, fields, and individual records.
Private data is stored alongside public data in the same version but is only visible to the collection owner.

### Private Types
Mark an entire type as private in its schema. All records of that type are hidden from public readers.

"schemas": {
  "Article": {
    "type": "object",
    "properties": { "title": {"type": "string"} }
  },
  "InternalNote": {
    "type": "object",
    "private": true,
    "properties": { "note": {"type": "string"}, "articleId": {"type": "string"} }
  }
}

Public readers see only the Article type. InternalNote is completely hidden (including from the schema response).

### Private Fields
Mark individual fields as private within a type's schema. The type itself is visible, but those fields are stripped for public readers.

"Author": {
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "email": {"type": "string", "private": true},
    "phone": {"type": "string", "private": true}
  }
}

Public readers see Author records with only "name". The owner sees all fields.

### Private Records
Mark individual records as private in the negotiate manifest. The type and schema are visible, but that specific record is hidden.

"manifest": [
  {"id": "article-1", "type": "Article", "hash": "<sha256>"},
  {"id": "article-2", "type": "Article", "hash": "<sha256>", "private": true}
]

article-2 is only visible to the collection owner. Public readers see article-1 only.

### How it works
- Public hash: computed from public content only (excludes private types, records, fields, and metadata)
- Public record hash: a record of a type with private fields is listed in public manifests under
  the hash of its filtered projection ({"id", "type", "data"} with private fields stripped).
  Record endpoints resolve either address; hashing the document you receive always reproduces
  the address you requested.
- Private hash: computed from all content including metadata (used by the owner for integrity verification)
- Schema filtering: the schema returned to public readers omits private types and private fields
- Record filtering: queries by non-owners automatically exclude private records and strip private fields
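
To illustrate the filtering rules, the following sketch computes what a public reader would see for a single record. It is a client-side approximation for reasoning about the rules, not the server's implementation, and `publicProjection` is an illustrative name.

```javascript
// Returns the record as a public reader would see it, or null if it is hidden.
// `schema` is the JSON Schema for record.type; `manifestEntry` is the record's manifest entry.
function publicProjection(record, schema, manifestEntry = {}) {
  // Private types and individually private records are hidden entirely.
  if ((schema && schema.private === true) || manifestEntry.private === true) return null;
  const properties = (schema && schema.properties) || {};
  const data = {};
  for (const [field, value] of Object.entries(record.data)) {
    const def = properties[field];
    if (def && def.private === true) continue; // strip private fields
    data[field] = value;
  }
  return { id: record.id, type: record.type, data };
}
```

Hashing the returned object with the algorithm from "Record Hashing" would give the filtered projection's hash described above.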

---

## API Key Management

API keys are managed via better-auth's apiKey plugin. All endpoints are under /api/auth/api-key/*.

POST /api/auth/api-key/create  → create key {"name": "my-app", "metadata": {"scope": "write"}, "prefix": "ul"}
                                  The scope in metadata is translated to permissions server-side.
                                  Response includes the key once: {"key": "ul_abc123...", "id": "..."}
GET /api/auth/api-key/list     → list keys (id, name, start, permissions, metadata, createdAt, expiresAt)
POST /api/auth/api-key/delete  → revoke a key {"keyId": "..."}

---

## Error Codes

400 — Validation error (bad request body)
401 — Authentication required
403 — Insufficient scope (e.g. read key used for write)
404 — Not found (or private collection you can't access)
409 — Version conflict (re-fetch and retry with new base_version)
413 — Payload too large (file upload exceeds size limit)
422 — Missing files, schema validation failed, or records contain extra fields not in the schema
429 — Rate limited (wait and retry)

---

Full documentation: https://underlay.org/docs
Protocol specification: https://underlay.org/protocol
Integration guide: https://underlay.org/docs/integration
Source code: https://github.com/knowledgefutures/underlay