Concepts

Underlay has four core primitives. Everything else is built from these.

Collection

A collection (plural: collections) is a named, versioned body of structured data. It belongs to an account (a user or an organization) and is identified by :owner/:slug — for example, knowledge-futures/pubpub-archive.

A collection can be public (browsable by anyone) or private (visible only to the owner and org members). Each collection has its own independent version history.

Version

A version is an immutable snapshot of a collection at a point in time. Each version contains:

  • A JSON Schema describing the structure of the records
  • A set of records — the actual data
  • References to files — binary assets
  • Metadata — who pushed it, when, from which app, with what message

Versions are numbered sequentially (1, 2, 3…) and also carry a semver label. The semver is derived automatically:

  • Schema changes → major bump
  • Record changes → minor bump
  • Metadata-only changes → patch bump
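The bump rules above can be sketched as a small function. This is an illustration only: the name `next_semver` and its boolean inputs are hypothetical, and resetting the lower components to zero follows the usual semver convention rather than anything stated here. It also assumes that the highest-ranking change wins when several kinds of change occur together.

```python
def next_semver(current: str, schema_changed: bool, records_changed: bool) -> str:
    """Return the next semver label for a push (hypothetical helper)."""
    major, minor, patch = (int(part) for part in current.split("."))
    if schema_changed:
        # Schema changes -> major bump (assumed: minor and patch reset to 0)
        return f"{major + 1}.0.0"
    if records_changed:
        # Record changes -> minor bump (assumed: patch resets to 0)
        return f"{major}.{minor + 1}.0"
    # Metadata-only changes -> patch bump
    return f"{major}.{minor}.{patch + 1}"
```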

Each version also has a hash — a SHA-256 digest of the canonical representation of the schema, records, and file references. Two versions with the same hash have identical content.
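To illustrate the idea of hashing a canonical representation, here is a minimal sketch. The canonical form used here (JSON with sorted keys, compact separators, records ordered by id, file references sorted) is an assumption made for the example; the registry's actual canonicalization is not specified in this section, so digests from this sketch should not be expected to match it.

```python
import hashlib
import json


def content_hash(schema: dict, records: list, file_refs: list) -> str:
    """Hash schema, records, and file references in an assumed canonical form."""
    canonical = json.dumps(
        {
            "schema": schema,
            # Assumption: record order and file-reference order do not matter.
            "records": sorted(records, key=lambda record: record["id"]),
            "files": sorted(file_refs),
        },
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the input is normalized before hashing, two pushes with the same content produce the same digest even if keys or records arrive in a different order.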

Record

A record is a JSON object with an id, a type, and a data object that holds its fields. Records are the rows of your data.

{
  "id": "pub-001",
  "type": "Publication",
  "data": {
    "title": "The Structure of Scientific Revolutions",
    "doi": "10.1234/example",
    "authors": ["author-001", "author-002"],
    "pdf": { "$file": "sha256:a1b2c3..." }
  }
}

Relationships between records are expressed as ID references — just strings. There are no joins, no foreign keys. An LLM or application can resolve references by reading the schema and records together.
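Since references are plain strings, resolving them is a dictionary lookup. The sketch below shows one way an application might do it. The helper names `index_by_id` and `resolve`, the Author type, and the author names are made up for illustration.

```python
def index_by_id(records: list) -> dict:
    """Build an id -> record lookup table."""
    return {record["id"]: record for record in records}


def resolve(ids: list, index: dict) -> list:
    """Return the records for the given ids, skipping ids that are not present."""
    return [index[record_id] for record_id in ids if record_id in index]


# Illustrative data: the Author records are hypothetical.
records = [
    {"id": "pub-001", "type": "Publication",
     "data": {"title": "The Structure of Scientific Revolutions",
              "authors": ["author-001", "author-002"]}},
    {"id": "author-001", "type": "Author", "data": {"name": "Author One"}},
    {"id": "author-002", "type": "Author", "data": {"name": "Author Two"}},
]

index = index_by_id(records)
authors = resolve(index["pub-001"]["data"]["authors"], index)
print([author["data"]["name"] for author in authors])  # ['Author One', 'Author Two']
```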

Binary data is referenced via {"$file": "sha256:..."} — a pointer to a content-addressed file in the registry.

File

A file is a binary blob (PDF, image, dataset, anything) stored by its SHA-256 hash. Files are content-addressed: the same bytes always produce the same hash, so identical files are stored only once regardless of how many records reference them.
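Content addressing can be illustrated with a small in-memory store. This is a sketch of the general technique, not the registry's storage layer; the `FileStore` class and its methods are hypothetical.

```python
import hashlib


class FileStore:
    """In-memory content-addressed store (illustrative only)."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store bytes under their SHA-256 digest and return the key."""
        key = "sha256:" + hashlib.sha256(data).hexdigest()
        # Identical bytes map to the same key, so a repeat upload stores nothing new.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __contains__(self, key: str) -> bool:
        return key in self._blobs

    def __len__(self) -> int:
        return len(self._blobs)
```

Calling `put` twice with the same bytes returns the same key and leaves a single stored blob, which is the deduplication property described above.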

Files are uploaded before pushing a version. When you push, the registry verifies that every $file reference in your records points to an existing file.
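A client can run the same kind of check locally before pushing. The sketch below walks each record, collects every {"$file": ...} pointer, and reports the ones that are not in a set of known hashes. The function names are hypothetical, and the registry's own verification may differ in detail.

```python
def iter_file_refs(value):
    """Yield every $file hash found anywhere inside a JSON-like value."""
    if isinstance(value, dict):
        if set(value) == {"$file"} and isinstance(value["$file"], str):
            yield value["$file"]
        else:
            for child in value.values():
                yield from iter_file_refs(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_file_refs(child)


def missing_files(records: list, known_hashes: set) -> list:
    """Return the sorted $file hashes that are referenced but not uploaded."""
    referenced = {ref for record in records for ref in iter_file_refs(record)}
    return sorted(referenced - known_hashes)
```

An empty result means every reference points to an existing file, so the push should pass this particular check.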

Accounts

Underlay has two account types:

  • Users — individual accounts with email/password login
  • Organizations — group accounts with members who have roles (owner, admin, member)

Both can own collections. API keys are scoped to an account and optionally to a specific collection, with permission levels: read, write, or admin.
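As a rough illustration of key scoping, the sketch below models a key and checks it against a requested operation. Everything here is an assumption made for the example: the `ApiKey` shape, the `allows` helper, and in particular the idea that the levels are ordered so that admin includes write and write includes read.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ordering: each level includes the ones below it.
LEVELS = {"read": 0, "write": 1, "admin": 2}


@dataclass(frozen=True)
class ApiKey:
    account: str                      # owning user or organization
    permission: str                   # "read", "write", or "admin"
    collection: Optional[str] = None  # None = not limited to one collection


def allows(key: ApiKey, owner: str, slug: str, needed: str) -> bool:
    """Check whether a key may perform `needed` on the collection owner/slug."""
    if key.account != owner:
        return False
    if key.collection is not None and key.collection != slug:
        return False
    return LEVELS[key.permission] >= LEVELS[needed]
```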

Privacy & Visibility

Underlay supports fine-grained privacy at four levels, allowing you to store sensitive data alongside public data in the same collection.

Collection-level

A collection can be public (listed in browse, readable by anyone) or private (visible only to the owner and org members).

Type-level

Mark an entire record type as private in the schema by adding "private": true to the type definition. All records of that type are hidden from public readers, and the type is stripped from the schema response.

Field-level

Mark individual fields within a type as private by adding "private": true to the field definition. The type remains visible, but those fields are stripped from records returned to public readers.

Record-level

Mark individual records as private by including "private": true in the record when pushing. The record is hidden from public queries but visible to the collection owner.

Private content is excluded from the public hash (used for verifying publicly visible content) but included in the private hash (used by owners for full integrity verification).
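The type-, field-, and record-level rules can be sketched as a filter that produces what a public reader would see. The schema shape used here (a mapping of type names to definitions with a "fields" mapping) is a simplification invented for the example and is not the registry's actual JSON Schema layout; `public_view` is a hypothetical name.

```python
def public_view(schema_types: dict, records: list) -> list:
    """Return the records a public reader would see (illustrative only)."""
    visible = []
    for record in records:
        # Record-level: the record itself is marked private.
        if record.get("private"):
            continue
        type_def = schema_types.get(record["type"], {})
        # Type-level: every record of a private type is hidden.
        if type_def.get("private"):
            continue
        # Field-level: strip private fields but keep the record.
        private_fields = {
            name
            for name, field_def in type_def.get("fields", {}).items()
            if field_def.get("private")
        }
        data = {k: v for k, v in record.get("data", {}).items() if k not in private_fields}
        visible.append({**record, "data": data})
    return visible
```

The owner's view would skip this filter entirely, which is consistent with private content counting toward the private hash but not the public one.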