# Underlay — AI Integration Guide

Underlay is a versioned, content-addressed registry for structured knowledge.
Apps push snapshots of their data; Underlay preserves them and serves them via HTTPS API.
Built by Knowledge Futures (501c3): https://www.knowledgefutures.org

Base URL: https://underlay.org/api

---

## Authentication

There are two auth methods:

1. API Key (for programmatic access):
   Header: Authorization: Bearer ul_<key>
   Keys have scopes: read, write, admin.
   Create keys at https://underlay.org/settings/keys or via POST /api/accounts/keys.

2. Session cookie (for browser use):
   POST /api/accounts/login with {"email": "...", "password": "..."} sets a session cookie.
   POST /api/accounts/signup with {"email", "password", "username", "displayName"} creates an account.
   POST /api/accounts/logout clears the session.
   GET /api/accounts/me returns the current user (works with either auth method).

GET requests for public data require no authentication; account endpoints such as GET /api/accounts/me still need it.
All write requests (POST, PATCH, PUT, DELETE) require authentication.
If a Bearer token is provided but invalid, the request is rejected immediately (401).
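
As a rough illustration of API-key auth, the sketch below builds (but does not send) an authenticated request with Python's standard library. The helper name `underlay_request` and the placeholder key `ul_example` are hypothetical; only the base URL and the Bearer header format come from this guide.

```python
import json
import urllib.request

BASE_URL = "https://underlay.org/api"  # base URL from this guide


def underlay_request(method, path, api_key=None, body=None):
    """Build a request object; hypothetical helper, not an official client."""
    headers = {"Accept": "application/json"}
    data = None
    if api_key is not None:
        # API keys are sent as a Bearer token, per the Authentication section.
        headers["Authorization"] = f"Bearer {api_key}"
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# Placeholder key for illustration only.
req = underlay_request("GET", "/accounts/me", api_key="ul_example")
```

Passing the result to `urllib.request.urlopen(req)` would send it; an invalid key should then surface as an HTTP 401 error.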

## Rate Limits

Requests are rate-limited per IP (unauthenticated) or per account (authenticated).

| Auth status       | Limit         |
|-------------------|---------------|
| Unauthenticated   | 60 req/min    |
| Authenticated     | 5,000 req/min |

Rate limit headers are included on every response:
- X-RateLimit-Limit: max requests in the window
- X-RateLimit-Remaining: requests left
- X-RateLimit-Reset: seconds until the window resets

When exceeded, you'll get a 429 response with a Retry-After header.
To get the higher limit, authenticate with an API key (recommended for any automated access).
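
One way a client might respect these limits is sketched below. The function name `seconds_to_wait` is hypothetical, and the code assumes Retry-After carries a number of seconds (HTTP also allows a date there, which this sketch does not handle).

```python
def seconds_to_wait(status, headers):
    """Return how long to pause before the next request, from the documented headers."""
    # Normalize header names, since HTTP header names are case-insensitive.
    h = {k.lower(): v for k, v in headers.items()}
    if status == 429:
        # Assumption: Retry-After is expressed in seconds; fall back to 1 second.
        return float(h.get("retry-after", 1))
    if int(h.get("x-ratelimit-remaining", 1)) <= 0:
        # Budget exhausted: wait until the window resets.
        return float(h.get("x-ratelimit-reset", 0))
    return 0.0
```

A caller would typically `time.sleep()` for the returned duration before retrying or issuing the next request.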

---

## Core Concepts

- Collection: a named, versioned body of data owned by an account. Identified by :owner/:slug.
- Version: an immutable snapshot containing per-type JSON Schemas, records, and file references. Versions are numbered with sequential integers, and a semver is derived automatically.
- Record: a flat JSON object with { id, type, data }. Records reference other records by id and files by hash.
- File: a binary blob stored by SHA-256 hash, referenced in record data as {"$file": "sha256:<hex>"}.
- Schema: a JSON Schema document for a single record type, stored as a global, immutable, content-addressed entity. Each type gets its own schema. Schema changes trigger a major version bump.
- Schema labeling: schemas can be labeled post-hoc with URIs or names (e.g. "schema.org/Person") for cross-collection discovery.

---

## Reading Data

### Browse collections
GET /api/collections                                    → list public collections (?q=search&limit=50&offset=0)
GET /api/collections/:owner/:slug                       → collection metadata + latest version summary

### Read versions
GET /api/collections/:owner/:slug/versions              → list versions (newest first, ?limit=50&offset=0)
GET /api/collections/:owner/:slug/versions/latest       → latest version with full metadata
GET /api/collections/:owner/:slug/versions/:n            → specific version by number

### Read records and files
GET /api/collections/:owner/:slug/versions/:n/records   → records for a version (?type=TypeName&limit=100&after=recordId)
GET /api/collections/:owner/:slug/versions/:n/manifest  → lightweight manifest: record ids/types + file hashes
GET /api/collections/:owner/:slug/files/:hash            → download a file by hash
HEAD /api/collections/:owner/:slug/files/:hash           → check if a file exists (returns Content-Length, Content-Type)

### Diff
GET /api/collections/:owner/:slug/versions/:n/diff?from=:m → diff between version m and n (added, updated, removed records)

### Export
GET /api/collections/:owner/:slug/export                → download .tar.gz archive (manifest.json + records/*.ndjson + files/*)
GET /api/collections/:owner/:slug/export?version=3      → export a specific version

---

## Writing Data: The Push Flow

This is the canonical workflow for syncing an app's data to Underlay:

### Step 1: Get current state
GET /api/collections/:owner/:slug/versions/latest

Response includes { number, semver, hash, recordCount, fileCount }.
If 404, no versions exist yet — your first push should use base_version: null.

### Step 2: Fetch the manifest (optional, for diffing)
GET /api/collections/:owner/:slug/versions/:n/manifest

Response:
{
  "version": 3,
  "semver": "v1.2.0",
  "hash": "abc123...",
  "records": [{"id": "rec-1", "type": "Article"}, ...],
  "files": ["deadbeef...", ...]
}

Compare this against your local data to determine what changed.
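
A minimal sketch of that comparison follows. The function name is hypothetical. Because the manifest shown above lists only record ids and types, ids present on both sides can only be treated as candidates for `updated`; deciding whether they really changed needs your app's own change tracking.

```python
def diff_against_manifest(manifest, local_records):
    """Split local records into added / possibly-updated, and list removed ids."""
    remote_ids = {r["id"] for r in manifest["records"]}
    local_ids = {r["id"] for r in local_records}
    added = [r for r in local_records if r["id"] not in remote_ids]
    # The manifest carries no per-record content, so shared ids are only candidates.
    maybe_updated = [r for r in local_records if r["id"] in remote_ids]
    removed = sorted(remote_ids - local_ids)
    return added, maybe_updated, removed
```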

### Step 3: Upload new files (if any)
For each file your app has that Underlay doesn't:

PUT /api/collections/:owner/:slug/files/sha256:<hex>
Content-Type: application/octet-stream
Body: raw file bytes

The server verifies that the SHA-256 hash in the URL matches the body. Uploads are idempotent: re-uploading an existing hash returns 200 OK.
Check existence first with HEAD if you want to skip uploads.
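
The hash in the upload path can be derived from the file bytes with `hashlib`. The sketch below is illustrative: the helper name `file_upload_target` is hypothetical, and the returned path is relative to https://underlay.org.

```python
import hashlib


def file_upload_target(owner, slug, content):
    """Return (upload path, record-level file reference) for a blob of bytes."""
    digest = hashlib.sha256(content).hexdigest()
    path = f"/api/collections/{owner}/{slug}/files/sha256:{digest}"
    # The same hash is what record data uses to point at the file.
    return path, {"$file": f"sha256:{digest}"}
```

The same path serves for the HEAD existence check and the PUT upload, so it only needs to be computed once per file.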

### Step 4: Push the version
POST /api/collections/:owner/:slug/versions
Content-Type: application/json
Authorization: Bearer ul_<key>

{
  "base_version": 3,
  "message": "Daily archive 2026-04-27",
  "app_id": "my-app",
  "actor_id": "my-app:cron-job",
  "schemas": {
    "Article": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "body": {"type": "string"},
        "publishedAt": {"type": "string", "format": "date-time"},
        "authorId": {"type": "string", "x-ref-type": "Author"}
      }
    },
    "Author": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "email": {"type": "string", "private": true}
      }
    }
  },
  "changes": {
    "added": [
      {"id": "article-42", "type": "Article", "data": {"title": "New Paper", "body": "...", "publishedAt": "2026-04-27T00:00:00Z", "authorId": "author-1"}}
    ],
    "updated": [
      {"id": "article-10", "type": "Article", "data": {"title": "Updated Title", "body": "...", "publishedAt": "2025-01-01T00:00:00Z", "authorId": "author-2"}}
    ],
    "removed": ["article-5"]
  }
}

Field reference:
- base_version: the version number you diffed against. null for first push. Used for optimistic locking.
- schemas: per-type JSON Schema map. Required on first push. If omitted on a subsequent push, the previous version's schemas are carried forward automatically, provided the pushed records validate against them.
- schemas[TypeName]: JSON Schema for that record type. Use "x-ref-type" to annotate foreign key fields.
- message: human-readable commit message (optional).
- app_id: identifier for the pushing application (optional).
- actor_id: identifier for the user or process that triggered the push (optional).
- changes.added: new records to insert.
- changes.updated: existing records to replace (full record, not a patch).
- changes.removed: record ids to delete.
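
The field reference above can be turned into a small payload builder. This is a sketch with a hypothetical function name; it only assembles the JSON body and enforces the one rule the guide states outright (schemas are required on the first push).

```python
import json


def build_push_body(base_version, added=(), updated=(), removed=(),
                    schemas=None, message=None, app_id=None, actor_id=None):
    """Assemble the JSON body for POST .../versions."""
    if base_version is None and schemas is None:
        raise ValueError("schemas are required on the first push")
    body = {
        "base_version": base_version,  # None serializes to JSON null (first push)
        "changes": {
            "added": list(added),
            "updated": list(updated),  # full records, not patches
            "removed": list(removed),  # record ids only
        },
    }
    optional = {"schemas": schemas, "message": message,
                "app_id": app_id, "actor_id": actor_id}
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


payload = json.dumps(build_push_body(3, removed=["article-5"], message="Daily archive"))
```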

### Step 5: Handle the response

Success (201):
{ "version": 4, "semver": "v1.3.0", "hash": "def456...", "recordCount": 100, "fileCount": 5 }

Conflict (409 — someone pushed while you were diffing):
{ "error": "Version conflict", "currentVersion": 4, "statusCode": 409 }
→ Re-fetch latest, re-diff, re-push with the new base_version.

Missing files (422 — records reference files not yet uploaded):
{ "error": "Missing files", "filesNeeded": ["sha256:abc...", "sha256:def..."], "statusCode": 422 }
→ Upload the listed files, then retry the push.
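
The three outcomes above suggest a retry loop along these lines. It is a sketch, not a reference client: `send_push`, `rediff`, and `upload_files` are hypothetical callables you would supply, and `max_attempts` is an arbitrary safety limit.

```python
def push_until_accepted(send_push, rediff, upload_files, base_version, max_attempts=5):
    """Drive the documented 201 / 409 / 422 handling.

    send_push(base_version) -> (status, body)
    rediff(current_version)  -> refreshes the local change set
    upload_files(hashes)     -> uploads the blobs named in filesNeeded
    """
    for _ in range(max_attempts):
        status, body = send_push(base_version)
        if status == 201:
            return body
        if status == 409:
            # Someone else pushed: rebase onto the version the server reports.
            base_version = body["currentVersion"]
            rediff(base_version)
        elif status == 422 and "filesNeeded" in body:
            upload_files(body["filesNeeded"])
        else:
            raise RuntimeError(f"push failed with {status}: {body}")
    raise RuntimeError("gave up after repeated retries")
```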

### First push (no existing versions)
Set base_version to null. Put all records in changes.added. Include schemas for all types.

---

## Chunked Push: Large Uploads

For pushes exceeding 100 MB (or millions of records), use the chunked upload protocol. This avoids
body size limits and memory pressure by streaming changes in batches.

### Step 1: Start an upload session
POST /api/collections/:owner/:slug/versions/upload
Authorization: Bearer ul_<key>

{
  "base_version": 3,
  "message": "Bulk import 2M records",
  "app_id": "my-app",
  "schemas": { ... }
}

Response (201):
{
  "sessionId": "uuid-of-session",
  "expiresAt": "2026-04-30T01:00:00.000Z"
}

Sessions expire after 1 hour.

### Step 2: Append batches (repeat as needed)
PUT /api/collections/:owner/:slug/versions/upload/:sessionId
Authorization: Bearer ul_<key>

{
  "changes": {
    "added": [ ...up to 10,000 records... ],
    "updated": [ ... ],
    "removed": ["id-1", "id-2"]
  }
}

Response:
{
  "received": { "added": 5000, "updated": 0, "removed": 0 },
  "totalStaged": 15000
}

Each batch can contain up to 10,000 records. Call this endpoint as many times as needed.
If a record ID is sent more than once, the last write wins (upsert semantics).
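
Splitting a large change set into batches might look like the sketch below. The function name is hypothetical, and it assumes the 10,000 limit applies to added, updated, and removed entries combined, which is the most conservative reading of the text above.

```python
BATCH_LIMIT = 10_000  # per-batch record limit stated in this guide


def iter_batches(added=(), updated=(), removed=(), limit=BATCH_LIMIT):
    """Yield request bodies holding at most `limit` entries each.

    Assumption: the limit counts added + updated + removed together.
    """
    entries = ([("added", r) for r in added]
               + [("updated", r) for r in updated]
               + [("removed", r) for r in removed])
    for start in range(0, len(entries), limit):
        changes = {"added": [], "updated": [], "removed": []}
        for kind, item in entries[start:start + limit]:
            changes[kind].append(item)
        yield {"changes": changes}
```

Each yielded body would be sent with one PUT to the session URL. For truly huge imports, a streaming variant that never holds all entries in memory would be preferable; this version favors brevity.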

### Step 3: Finalize the version
POST /api/collections/:owner/:slug/versions/upload/:sessionId/finalize
Authorization: Bearer ul_<key>

Response (201): same as regular push
{
  "version": 4,
  "semver": "v1.3.0",
  "hash": "...",
  "recordCount": 2000000,
  "fileCount": 5
}

Finalize applies all staged changes to the base version, validates against schemas,
computes hashes, and creates the new immutable version.

### Errors during finalize
- 409 Version conflict: someone pushed since your base_version. Start a new session.
- 422 Schema validation failed: one or more records don't match. Fix and re-upload.
- 422 Missing files: records reference files not yet uploaded. Upload them and retry finalize.
- 410 Session expired: the 1-hour window elapsed. Start a new session.
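
The error list above maps naturally onto a small decision function. This sketch uses hypothetical names and return labels, and it assumes a missing-files 422 from finalize carries a `filesNeeded` field as the regular push does; the guide does not show the finalize error bodies.

```python
def finalize_next_step(status, body):
    """Translate a finalize response into the follow-up action the guide describes."""
    if status == 201:
        return "done"
    if status in (409, 410):
        # Version conflict or expired session: both require a fresh session.
        return "start_new_session"
    if status == 422:
        # Assumption: missing-files errors include filesNeeded, as in a regular push.
        if "filesNeeded" in body:
            return "upload_files_then_retry_finalize"
        return "fix_records_and_reupload"
    return "unexpected"
```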

### Cancel a session
DELETE /api/collections/:owner/:slug/versions/upload/:sessionId
Response: 204 (staged records are discarded)

### Check session status
GET /api/collections/:owner/:slug/versions/upload/:sessionId
Response: { sessionId, status, recordCount, baseVersion, expiresAt, createdAt }

Status values: open, finalizing, completed, failed, expired.

Note: On successful finalize, the session and all staged records are deleted from the database.
There is no need to manually clean up completed sessions.

---

## Pagination (Records Endpoint)

The records endpoint uses cursor-based pagination for efficient traversal of large collections.

GET /api/collections/:owner/:slug/versions/:n/records?limit=100&after=record-id-42

Response:
{
  "records": [ ...up to `limit` records... ],
  "pagination": {
    "limit": 100,
    "hasMore": true,
    "nextCursor": "record-id-142",
    "total": 2000000
  }
}

Parameters:
- limit: max records per page (default 100, max 1000)
- after: cursor (record ID) — return records with IDs lexicographically after this value
- offset: legacy offset-based pagination (still supported, but cursor is preferred for large sets)
- type: filter by record type

To paginate through all records:
1. First request: GET .../records?limit=1000
2. If pagination.hasMore is true, use pagination.nextCursor for the next request:
   GET .../records?limit=1000&after=<nextCursor>
3. Repeat until hasMore is false.
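
The loop above can be sketched as a generator. `fetch_page` is a hypothetical callable that takes the query parameters as a dict and returns the parsed JSON response; how it performs the HTTP request is left to the caller.

```python
def iter_records(fetch_page, limit=1000):
    """Yield every record of a version by following nextCursor until hasMore is false."""
    params = {"limit": limit}
    while True:
        page = fetch_page(params)
        yield from page["records"]
        pagination = page["pagination"]
        if not pagination["hasMore"]:
            return
        # Resume after the last record of the page just consumed.
        params = {"limit": limit, "after": pagination["nextCursor"]}
```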

---

## Record Format

{
  "id": "unique-stable-id",
  "type": "TypeName",
  "data": {
    "title": "Some value",
    "authorId": "author-123",
    "attachment": {"$file": "sha256:abc123def456..."}
  }
}

- id: stable, unique within the collection. Use your app's primary key or generate a deterministic one.
- type: groups records by kind (e.g. "Article", "Author", "Grant").
- data: flat JSON object. Reference other records by id. Reference files with {"$file": "sha256:<hex>"}.

Records are flat — no nested joins. Relationships are expressed by storing the id of the related record.
The schema declares which fields are references, so tools can resolve them at read time.
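
To make the reference conventions concrete, the sketch below pulls file hashes out of a record and lists the reference fields a schema declares. Both function names are hypothetical, and the code looks only at top-level fields, on the assumption that record data is flat as described above.

```python
def referenced_files(record):
    """Return the file hashes a record points at via {"$file": "sha256:<hex>"} values."""
    return [value["$file"] for value in record["data"].values()
            if isinstance(value, dict) and "$file" in value]


def reference_fields(schema):
    """Map field name -> target type for properties annotated with "x-ref-type"."""
    return {name: prop["x-ref-type"]
            for name, prop in schema.get("properties", {}).items()
            if "x-ref-type" in prop}
```

Running `referenced_files` over pending records before a push gives the list of hashes to check with HEAD, which may help avoid the 422 missing-files response.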

---

## Schema Discovery

Schemas are globally deduplicated by content hash. If two collections use the same type shape, they share the same schema row. This enables cross-collection discovery.

GET /api/schemas                                        → search schemas (?q=text&slug=TypeName&label=uri&schema_hash=sha256:...&limit=50&offset=0)
GET /api/schemas/:id                                    → single schema with labels and usage info
GET /api/collections/:owner/:slug/schemas               → schemas for latest version (?version=N for specific, ?raw=true to skip label enrichment)
POST /api/schemas/:id/labels                            → add label {"label": "schema.org/Person"} (requires write scope)
DELETE /api/schemas/:id/labels/:label                   → remove label (requires admin scope)

When schemas are returned via the collection schemas endpoint, known labels are injected as "x-underlay-labels" on the schema body (opt-out with ?raw=true).
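
A small helper for composing schema search URLs might look like this. The function name is hypothetical; the parameter names are the ones listed for GET /api/schemas above.

```python
from urllib.parse import urlencode


def schema_search_url(**filters):
    """Build a GET /api/schemas URL from the documented query parameters."""
    allowed = {"q", "slug", "label", "schema_hash", "limit", "offset"}
    unknown = set(filters) - allowed
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    # urlencode percent-escapes characters such as "/" and ":" in values.
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return "https://underlay.org/api/schemas" + (f"?{query}" if query else "")
```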

---

## Versioning

- Versions are sequential integers (1, 2, 3, ...).
- Semver is derived automatically: schema change → major, record change → minor, metadata-only → patch.
- Schema change = any type's schema_id changed, or types added/removed between versions.
- Each version has a content-addressed hash computed from sorted type→schema_hash map + sorted records + sorted file hashes.
- Versions are immutable once created.
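
The derivation rule can be illustrated with a short function. This is a sketch of the rule as stated, not the server's implementation: the function name and the `change` labels are hypothetical, and resetting lower components to zero follows ordinary semver practice rather than anything this guide specifies.

```python
def next_semver(previous, change):
    """Bump a "vMAJOR.MINOR.PATCH" string; change is "schema", "records", or "metadata"."""
    major, minor, patch = (int(part) for part in previous.lstrip("v").split("."))
    if change == "schema":
        return f"v{major + 1}.0.0"   # assumption: minor and patch reset to zero
    if change == "records":
        return f"v{major}.{minor + 1}.0"
    if change == "metadata":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change}")
```

Under these assumptions, a record-only push on top of v1.2.0 yields v1.3.0, which matches the push example earlier in this guide.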

---

## Collection Management

POST /api/accounts/:owner/collections              → create collection {"slug", "name", "description", "public"}
PATCH /api/collections/:owner/:slug                → update {"name", "description", "public"}
DELETE /api/collections/:owner/:slug               → delete collection (requires admin scope)
GET /api/accounts/:owner/collections               → list collections for an account

---

## Privacy & Visibility

Underlay supports fine-grained privacy at three levels: types, fields, and individual records.
Private data is stored alongside public data in the same version but is only visible to the collection owner.

### Private Types
Mark an entire type as private in its schema. All records of that type are hidden from public readers.

"schemas": {
  "Article": {
    "type": "object",
    "properties": { "title": {"type": "string"} }
  },
  "InternalNote": {
    "type": "object",
    "private": true,
    "properties": { "note": {"type": "string"}, "articleId": {"type": "string"} }
  }
}

Public readers see only the Article type. InternalNote is completely hidden (including from the schema response).

### Private Fields
Mark individual fields as private within a type's schema. The type itself is visible, but those fields are stripped for public readers.

"Author": {
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "email": {"type": "string", "private": true},
    "phone": {"type": "string", "private": true}
  }
}

Public readers see Author records with only "name". The owner sees all fields.

### Private Records
Mark individual records as private when pushing. The type and schema are visible, but that specific record is hidden.

"changes": {
  "added": [
    {"id": "article-1", "type": "Article", "data": {...}},
    {"id": "article-2", "type": "Article", "data": {...}, "private": true}
  ]
}

article-2 is only visible to the collection owner. Public readers see article-1 only.

### How it works
- Public hash: computed from public content only (excludes private types, records, and fields)
- Private hash: computed from all content (used by the owner for integrity verification)
- Schema filtering: the schema returned to public readers omits private types and private fields
- Record filtering: queries by non-owners automatically exclude private records and strip private fields
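
The filtering rules can be approximated client-side, for example to preview what public readers would see before pushing. The sketch below is illustrative only: the function name is hypothetical, and it mirrors the rules as described here rather than the server's actual code.

```python
def public_view(records, schemas):
    """Approximate the records a non-owner would see, given per-type schemas."""
    visible = []
    for rec in records:
        schema = schemas.get(rec["type"], {})
        if schema.get("private") or rec.get("private"):
            continue  # private type or private record: hidden entirely
        props = schema.get("properties", {})
        # Strip fields whose schema property is marked private.
        data = {key: value for key, value in rec["data"].items()
                if not props.get(key, {}).get("private")}
        visible.append({"id": rec["id"], "type": rec["type"], "data": data})
    return visible
```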

---

## API Key Management

POST /api/accounts/keys    → create key {"label": "my-app", "scope": "write"}
                              Response includes the key once: {"key": "ul_abc123...", "id": "..."}
GET /api/accounts/keys     → list keys (id, label, scope, createdAt, lastUsedAt — not the key itself)
DELETE /api/accounts/keys/:id → revoke a key

---

## Error Codes

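400: Validation error (bad request body)
401: Authentication required, or an invalid Bearer token was supplied
403: Insufficient scope (e.g. read key used for write)
404: Not found (or private collection you can't access)
409: Version conflict (re-fetch and retry with new base_version)
410: Upload session expired (start a new session)
422: Missing files (upload them first, then retry push) or schema validation failed (fix the records and re-upload)
429: Rate limit exceeded (wait for the Retry-After interval, then retry)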

---

Full documentation: https://underlay.org/docs
Integration guide: https://underlay.org/docs/integration
Source code: https://github.com/knowledgefutures/underlay