Integration Guide

Everything a developer or LLM needs to push data to the registry. No SDK is required: the API is plain HTTPS and JSON. For a machine-readable version, see llms.txt.

What is Underlay?

Underlay is a versioned registry for structured knowledge. Apps push snapshots of their data; Underlay preserves them, deduplicates records and files, and serves them via a stable API. Think npm for data, or Docker Hub for structured content.

Core Concepts

  • Collection: A named, versioned body of structured data. Identified by :owner/:slug.
  • Version: An immutable snapshot: JSON Schema + records + file references + metadata. Identified by semver (e.g. v1.0.0).
  • Record: A flat JSON object with an id, a type, and a data payload conforming to the schema. Content-addressed by SHA-256 hash.
  • File: A binary blob (PDF, image, etc.) stored by SHA-256 hash. Referenced in records via {"$file": "sha256:<hex>"}.

Authentication

Create an API key at /settings/keys or via the API. Pass it as:

Authorization: Bearer ul_your_key_here

Keys are scoped: read, write, or admin. Use write for pushing data.

The Push Flow

All pushes use the negotiate protocol, a flow similar to git's pack negotiation. Its core is three steps (negotiate, send records, commit); a complete push wraps them as follows:

  1. Get the current latest version (its semver string).
  2. Upload any new binary files by hash.
  3. Hash your records and negotiate: send a manifest of record hashes. The server responds with which records it needs.
  4. Send records: upload only the needed records as JSONL (up to 10,000 per batch). Skip if the server already has everything.
  5. Commit: finalize and create the version.
  6. On 409 Conflict, re-fetch the latest version and retry from step 3.

# 1. Get current state (returns the latest version's semver, e.g. "v1.2.0")
curl https://underlay.org/api/collections/:owner/:slug/versions/latest

# 2. Upload any new files
HASH=$(shasum -a 256 paper.pdf | cut -d' ' -f1)
curl -X PUT "https://underlay.org/api/collections/:owner/:slug/files/sha256:$HASH" \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/pdf" \
  --data-binary @paper.pdf

# 3. Hash your records and negotiate
curl -X POST https://underlay.org/api/collections/:owner/:slug/versions/negotiate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $KEY" \
  -d '{
    "base_version": "v1.2.0",
    "message": "Daily sync",
    "schemas": { ... },
    "manifest": [
      {"id": "record-1", "type": "Article", "hash": "abc123..."},
      {"id": "record-2", "type": "Article", "hash": "def456..."}
    ]
  }'
# → {"session_id":"...","needed_records":["def456..."],...}

# 4. Send only the records the server needs (as JSONL)
curl -X POST .../negotiate/SESSION_ID/records \
  -H "Content-Type: application/x-ndjson" \
  -H "Authorization: Bearer $KEY" \
  --data-binary '{"id":"record-2","type":"Article","data":{...}}'

# 5. Commit
curl -X POST .../negotiate/SESSION_ID/commit \
  -H "Authorization: Bearer $KEY"
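
Steps 3 and 4 can be sketched in JavaScript as follows. This is a minimal sketch rather than an official client: buildManifest and neededJsonl are hypothetical helper names, each record's hash is assumed to have been computed already (see Record Hashing), and needed_records is assumed to be a list of hashes, as in the sample response above.

```javascript
// Hypothetical helpers. Each entry pairs a record with its precomputed hash.
function buildManifest(entries) {
  return entries.map(({ record, hash }) => ({ id: record.id, type: record.type, hash }))
}

// Serialize only the records whose hashes the server asked for,
// one JSON object per line (JSONL).
function neededJsonl(entries, neededHashes) {
  const needed = new Set(neededHashes)
  return entries
    .filter(({ hash }) => needed.has(hash))
    .map(({ record }) => JSON.stringify(record) + '\n')
    .join('')
}

// Placeholder hashes, shortened for readability.
const entries = [
  { record: { id: 'record-1', type: 'Article', data: { title: 'A' } }, hash: 'abc123' },
  { record: { id: 'record-2', type: 'Article', data: { title: 'B' } }, hash: 'def456' },
]
const manifest = buildManifest(entries)
const body = neededJsonl(entries, ['def456'])
```

Here manifest goes into the negotiate request and body is what step 4 would upload. An empty body means the server already has everything, so step 4 can be skipped.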

Record Hashing

Before negotiating, you must hash each record client-side. The hash is the SHA-256 of the canonical JSON representation of { id, type, data } with all object keys sorted recursively. This ensures any implementation produces the same hash for the same content.

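import { createHash } from 'node:crypto'

// Recursively rebuild a value so that every object's keys are in sorted order.
function canonicalize(value) {
  // Primitives and null pass through unchanged.
  if (value === null || typeof value !== 'object') return value
  // Arrays keep their element order; only the elements are canonicalized.
  if (Array.isArray(value)) return value.map(canonicalize)
  const sorted = {}
  for (const key of Object.keys(value).sort()) {
    sorted[key] = canonicalize(value[key])
  }
  return sorted
}

// Hex-encoded SHA-256 of the record's JSON serialization.
function hashRecord(record) {
  // The wrapper is serialized in the order id, type, data; keys inside data are sorted.
  const obj = { id: record.id, type: record.type, data: canonicalize(record.data) }
  return createHash('sha256').update(JSON.stringify(obj)).digest('hex')
}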

See the Protocol spec for the full hashing specification with worked examples.
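
To illustrate why the key sort matters, the sketch below repeats the canonicalize function from above and applies it to two payloads that differ only in key order. The field names and the shortened sha256:ab12 value are placeholders for illustration, not real data.

```javascript
// Same canonicalize as in the hashing example: sort object keys recursively.
function canonicalize(value) {
  if (value === null || typeof value !== 'object') return value
  if (Array.isArray(value)) return value.map(canonicalize)
  const sorted = {}
  for (const key of Object.keys(value).sort()) {
    sorted[key] = canonicalize(value[key])
  }
  return sorted
}

// Two payloads with identical content but different key order.
const a = { title: 'Hello', authorId: 'author-1', pdf: { $file: 'sha256:ab12' } }
const b = { pdf: { $file: 'sha256:ab12' }, authorId: 'author-1', title: 'Hello' }

const sa = JSON.stringify(canonicalize(a))
const sb = JSON.stringify(canonicalize(b))

console.log(sa === sb) // true
console.log(sa) // {"authorId":"author-1","pdf":{"$file":"sha256:ab12"},"title":"Hello"}
```

Because both payloads serialize to the same string, they produce the same SHA-256 digest regardless of how the source system ordered its fields.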

Record Format

Every record has three fields: id (stable string), type (matches schema), and data (the payload).

  • Relationships are plain ID strings (e.g. "authorId": "author-1")
  • Files are referenced as {"$file": "sha256:<hex>"}
  • No joins, no nesting. Keep records flat
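
A record that follows these rules might be built as shown below. This is a sketch under assumptions: fileRef is a hypothetical helper, the pdf field is illustrative, the digest is a placeholder, and requiring a lowercase 64-character hex digest is an assumption based on the shasum output used in the push flow.

```javascript
// Hypothetical helper: wrap a SHA-256 hex digest in the $file reference shape.
function fileRef(hexDigest) {
  // Assumption: digests are 64 lowercase hex characters, as printed by shasum.
  if (!/^[0-9a-f]{64}$/.test(hexDigest)) {
    throw new Error('expected a 64-character lowercase hex SHA-256 digest')
  }
  return { $file: `sha256:${hexDigest}` }
}

const digest = 'a'.repeat(64) // placeholder digest, not a real file hash

const article = {
  id: 'article-1',          // stable string id
  type: 'Article',          // must match a type in the schema
  data: {
    title: 'Hello',
    authorId: 'author-1',   // relationship: plain ID string, no embedded author
    pdf: fileRef(digest),   // file: $file reference to an uploaded blob
  },
}
```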

Metadata

Each version carries a metadata object that can include description, readme, license, and any other key-value pairs. Metadata lives on the version, not the collection; it's versioned alongside your data. Set it on your first push and update it via subsequent pushes or the metadata endpoint.

To update metadata without changing records or schemas (e.g. editing the readme), PATCH /api/collections/:owner/:slug/metadata with the fields to change. This creates a patch version automatically.
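
As a rough sketch, a metadata-only update could be prepared like this. The helper name metadataPatchRequest is hypothetical, the host is taken from the curl examples above, and sending the changed fields as a flat JSON object is an assumption; check the API docs for the exact body shape.

```javascript
// Hypothetical helper: build the URL and fetch options for a metadata PATCH.
// Assumption: the request body is a JSON object holding only the changed fields.
function metadataPatchRequest(owner, slug, fields, apiKey) {
  return {
    url: `https://underlay.org/api/collections/${encodeURIComponent(owner)}/${encodeURIComponent(slug)}/metadata`,
    init: {
      method: 'PATCH',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(fields),
    },
  }
}

const { url, init } = metadataPatchRequest(
  'alice',
  'articles',
  { readme: '# My App Data\nNow with authors.' },
  'ul_your_key_here',
)
// To send it: await fetch(url, init)
```

The owner, slug, and key values here are placeholders.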

First Push Example

The negotiate request for a first push. Include schemas (a per-type JSON Schema map), metadata, and a manifest of record hashes:

{
  "base_version": null,
  "message": "Initial import",
  "app_id": "my-app",
  "metadata": {
    "description": "Articles and authors from my app",
    "readme": "# My App Data\nExported from the app database."
  },
  "schemas": {
    "Article": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "body": {"type": "string"},
        "authorId": {"type": "string"},
        "publishedAt": {"type": "string", "format": "date-time"}
      }
    },
    "Author": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"}
      }
    }
  },
  "manifest": [
    {"id": "author-1", "type": "Author", "hash": "a1b2c3..."},
    {"id": "article-1", "type": "Article", "hash": "d4e5f6..."}
  ]
}

After negotiating, send the needed records as JSONL, then commit. See the Quickstart for the complete curl walkthrough.

Mapping a SQL Database

Most apps store data in SQL. Here's how to map it to Underlay records:

-- For each table, generate a JSON Schema type:
-- table name → type name
-- column name → property name
-- column type → JSON Schema type (text→string, integer→integer, etc.)
-- foreign keys → note as ID references in the schema description

-- Example: a "publications" table with columns (id, title, doi, author_id)
-- becomes a "Publication" type with properties {title: string, doi: string, authorId: string}
-- The record id is the primary key value.

General rules:

  • Each table becomes a record type
  • Each row becomes a record (primary key → record id)
  • Foreign keys become string ID references
  • Binary columns (BLOBs) → upload as files, replace with $file references
  • Generate a JSON Schema from your column types
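
The rules above can be sketched in JavaScript as follows. All helper names (toCamel, tableToSchema, rowToRecord) and the column-type table are hypothetical and deliberately partial. The sketch assumes snake_case columns map to camelCase properties, as in the publications example, and that the primary key is left out of the data payload because it becomes the record id.

```javascript
// snake_case column name -> camelCase property name
function toCamel(name) {
  return name.replace(/_([a-z0-9])/g, (_, c) => c.toUpperCase())
}

// Partial, illustrative mapping from SQL column types to JSON Schema types.
const SQL_TO_JSON = { text: 'string', varchar: 'string', integer: 'integer', real: 'number', boolean: 'boolean' }

// columns: { columnName: sqlType }. Foreign keys become string ID references.
function tableToSchema(columns, primaryKey, foreignKeys = []) {
  const properties = {}
  for (const [name, sqlType] of Object.entries(columns)) {
    if (name === primaryKey) continue // the primary key becomes the record id
    properties[toCamel(name)] = foreignKeys.includes(name)
      ? { type: 'string', description: 'ID reference' }
      : { type: SQL_TO_JSON[sqlType] ?? 'string' }
  }
  return { type: 'object', properties }
}

// One row -> one record in { id, type, data } form.
function rowToRecord(type, row, primaryKey, foreignKeys = []) {
  const data = {}
  for (const [name, value] of Object.entries(row)) {
    if (name === primaryKey) continue
    const isRef = foreignKeys.includes(name) && value != null
    data[toCamel(name)] = isRef ? String(value) : value
  }
  return { id: String(row[primaryKey]), type, data }
}

const schema = tableToSchema(
  { id: 'integer', title: 'text', doi: 'text', author_id: 'integer' },
  'id',
  ['author_id'],
)
const record = rowToRecord(
  'Publication',
  { id: 42, title: 'T', doi: '10.1/x', author_id: 7 },
  'id',
  ['author_id'],
)
```

BLOB columns are not handled here; they would be uploaded as files first and replaced with $file references.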

Versioning

Versions are identified by semver (e.g. v1.0.0). The semver is derived automatically from what changed:

  • Schema changes → major bump
  • Record or file changes → minor bump
  • Metadata-only changes (readme, license, etc.) → patch bump

The first version of a collection is always v1.0.0. The base_version in a negotiate request is a semver string (or null for the first push).
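
The server assigns the version, but a client can predict it from the rules above, for example for logging. The sketch below is hypothetical: predictNextVersion and its change labels are illustrative names, and resetting the lower components to zero on a major or minor bump is an assumption taken from standard semver practice.

```javascript
// Hypothetical helper: predict the next semver from the kind of change.
// change is one of 'schema' (major), 'records' (minor), 'metadata' (patch).
function predictNextVersion(baseVersion, change) {
  // A first push has no base version and always yields v1.0.0.
  if (baseVersion === null) return 'v1.0.0'
  const match = /^v(\d+)\.(\d+)\.(\d+)$/.exec(baseVersion)
  if (!match) throw new Error(`not a semver string: ${baseVersion}`)
  const [major, minor, patch] = match.slice(1).map(Number)
  switch (change) {
    case 'schema':
      return `v${major + 1}.0.0`
    case 'records':
      return `v${major}.${minor + 1}.0`
    case 'metadata':
      return `v${major}.${minor}.${patch + 1}`
    default:
      throw new Error(`unknown change kind: ${change}`)
  }
}

console.log(predictNextVersion('v1.2.0', 'records')) // v1.3.0
```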

Privacy

You can control what's publicly visible at three levels:

  • Private types: Add "private": true to a type in the schema. All records of that type are hidden from public readers.
  • Private fields: Add "private": true to a field in the schema. That field is stripped from public responses.
  • Private records: Add "private": true to individual records in the manifest when negotiating. Those records are hidden from public queries.

Private content is stored in the same version; the owner always sees everything. Public readers see only the filtered view. The public content hash excludes private data, so verifiers can confirm integrity of the public subset.
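
The filtering a public reader sees can be approximated client-side as below. This sketch only illustrates the three levels and does not describe the server's implementation: publicView is a hypothetical name, the field-level flag is assumed to sit on the property definition inside properties, and the record-level flag is shown directly on the record for brevity.

```javascript
// Hypothetical sketch: derive the public view of a set of records.
function publicView(schemas, records) {
  return records
    // Drop private records and every record of a private type.
    .filter((r) => !r.private && !schemas[r.type]?.private)
    .map((r) => {
      const props = schemas[r.type]?.properties ?? {}
      const data = {}
      for (const [key, value] of Object.entries(r.data)) {
        // Strip fields whose schema property is marked private.
        if (!props[key]?.private) data[key] = value
      }
      return { id: r.id, type: r.type, data }
    })
}

const schemas = {
  Author: {
    type: 'object',
    properties: {
      name: { type: 'string' },
      email: { type: 'string', private: true }, // private field
    },
  },
  Draft: { type: 'object', private: true, properties: { text: { type: 'string' } } }, // private type
}

const records = [
  { id: 'author-1', type: 'Author', data: { name: 'Ada', email: 'ada@example.org' } },
  { id: 'author-2', type: 'Author', private: true, data: { name: 'Bob', email: 'bob@example.org' } }, // private record
  { id: 'draft-1', type: 'Draft', data: { text: 'wip' } },
]

const visible = publicView(schemas, records)
```

With this input, only author-1 remains visible, and its email field is removed.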

API Reference

Full API docs are at /docs. The key endpoints:

POST .../versions/negotiate          Start a push (hash negotiation)
POST .../negotiate/:id/records       Send needed records (JSONL, repeatable)
POST .../negotiate/:id/commit        Finalize and create the version
DELETE .../negotiate/:id             Cancel a negotiate session
GET .../versions/latest              Get latest version
GET .../versions/:semver/records     Get records (paginated)
GET .../versions/:semver/manifest    Get record hash manifest (supports delta via ?since=)
GET .../versions/:semver/diff?from=  Diff two versions
PUT .../files/:hash                  Upload a file
POST /api/records/batch              Fetch records by hash (NDJSON response)
GET /api/records/:hash/provenance    Find which collections contain a record
GET /api/collections                 Browse public collections

Unknown Fields

If records contain fields not defined in the schema, the commit returns 422 with a list of the extra fields per record. To have the server strip those fields before storage instead, set "strip_unknown_fields": true in the negotiate request.

When stripping is enabled, the server removes extra fields, recomputes record hashes, and stores only the schema-conformant data.
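
To catch extra fields before pushing, a client-side check along these lines may help. It is a sketch with hypothetical helper names (unknownFields, stripUnknown) that compares only the top-level keys of data against the properties of the record's type; the server's validation may be stricter.

```javascript
// Hypothetical helper: list data keys that the schema for this type does not define.
function unknownFields(record, schemas) {
  const known = Object.keys(schemas[record.type]?.properties ?? {})
  return Object.keys(record.data).filter((key) => !known.includes(key))
}

// Hypothetical helper: return a copy of the record without the unknown fields.
function stripUnknown(record, schemas) {
  const extra = new Set(unknownFields(record, schemas))
  const data = Object.fromEntries(
    Object.entries(record.data).filter(([key]) => !extra.has(key)),
  )
  return { ...record, data }
}

const schemas = {
  Author: {
    type: 'object',
    properties: { name: { type: 'string' }, email: { type: 'string' } },
  },
}
const record = {
  id: 'author-1',
  type: 'Author',
  data: { name: 'Ada', email: 'ada@example.org', nickname: 'A' },
}

const extras = unknownFields(record, schemas) // ['nickname']
const cleaned = stripUnknown(record, schemas)
```

If fields are stripped client-side, hash the cleaned record, not the original, since the hash covers the data payload.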

Error Handling

  • 409 Conflict: Another version was pushed since your base_version. Re-negotiate.
  • 422 Unprocessable: Records reference files that haven't been uploaded, schema validation failed, or records contain fields not in the schema.
  • 400 Bad Request: Malformed JSONL, hash mismatch, or missing records.
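
A script can turn these status codes into a next step. The mapping below is a hypothetical sketch: nextAction and its return labels are illustrative names, and treating every other non-2xx status as a reason to stop is an assumption.

```javascript
// Hypothetical helper: decide what a push script should do after a response.
function nextAction(status) {
  if (status >= 200 && status < 300) return 'continue'
  switch (status) {
    case 409:
      return 'renegotiate' // re-fetch latest, then negotiate again against the new base_version
    case 422:
      return 'fix-data' // upload missing files, or fix schema violations and extra fields
    case 400:
      return 'fix-request' // malformed JSONL, hash mismatch, or missing records
    default:
      return 'abort' // anything else: stop and report
  }
}

console.log(nextAction(409)) // renegotiate
```

Only 409 is retried automatically here; 400 and 422 point to a problem in the request or the data that a retry alone will not fix.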

Pushing from Scripts

The most common pattern for pushing data from a script, cron job, or CI pipeline:

  1. Query your source (database, API, filesystem) and build an array of records in {id, type, data} format.
  2. Hash each record: SHA-256 of the canonical JSON with keys sorted recursively. See the hashing section above or the Protocol spec.
  3. Negotiate: send the manifest of { id, type, hash } entries. The server tells you which records it still needs.
  4. Send missing records as JSONL. For large datasets, batch into groups of 5,000–10,000 records per request.
  5. Commit to create the version.
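
The batching in step 4 can be sketched as a small generator. The name batches is hypothetical, and the 10,000 cap is taken from the per-batch limit stated in the push flow.

```javascript
const MAX_BATCH = 10000 // per-request record limit stated in the push flow

// Hypothetical helper: yield consecutive slices of at most `size` records.
function* batches(records, size = MAX_BATCH) {
  if (!Number.isInteger(size) || size < 1 || size > MAX_BATCH) {
    throw new RangeError(`batch size must be between 1 and ${MAX_BATCH}`)
  }
  for (let i = 0; i < records.length; i += size) {
    yield records.slice(i, i + size)
  }
}

// 25,000 placeholder records split into batches of 10,000, 10,000, and 5,000.
const records = Array.from({ length: 25000 }, (_, i) => ({
  id: `record-${i}`,
  type: 'Article',
  data: { title: `Title ${i}` },
}))
const sizes = [...batches(records)].map((batch) => batch.length)
console.log(sizes) // [ 10000, 10000, 5000 ]
```

Each batch would then be serialized as JSONL and sent in its own request to the records endpoint before the commit.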

A minimal Node.js/Python script typically takes 30-50 lines: query your data, map rows to records, hash them, POST to negotiate. No SDK needed. See the Quickstart for a curl-based walkthrough.

Source Code

Underlay is open source: github.com/knowledgefutures/underlay

Built by Knowledge Futures, a 501(c)(3) public charity. Contact: [email protected]