Concepts
Underlay has four core primitives. Everything else is built from these.
Collection
A collection (plural: collections) is a named, versioned body of structured data. It belongs to an account (a user or an organization) and is identified by :owner/:slug, e.g. knowledge-futures/pubpub-archive.
A collection can be public (browsable by anyone) or private (visible only to the owner and org members). Each collection has its own independent version history.
Version
A version is an immutable snapshot of a collection at a point in time. Each version contains:
- A JSON Schema describing the structure of the records
- A set of records (the actual data)
- References to files (binary assets)
- A metadata object that can contain readme, license, and other fields
Versions are identified by semver (e.g. v1.0.0, v1.1.0, v2.0.0). The semver is derived automatically from what changed:
- Schema changes → major bump
- Record or file changes → minor bump
- Metadata-only changes (readme, license, etc.) → patch bump
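The bump rules above can be sketched as a small function. This is an illustration only, not the registry's implementation: the `Changes` class and `next_version` name are hypothetical, and the sketch assumes that when several kinds of change occur together the largest bump wins and lower components reset to zero, as in standard semver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changes:
    """Hypothetical summary of what differs between two versions."""
    schema: bool = False
    records_or_files: bool = False
    metadata: bool = False


def next_version(current: str, changes: Changes) -> str:
    """Derive the next semver string from the kind of change."""
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    if changes.schema:  # schema change -> major bump
        return f"v{major + 1}.0.0"
    if changes.records_or_files:  # record or file change -> minor bump
        return f"v{major}.{minor + 1}.0"
    if changes.metadata:  # metadata-only change -> patch bump
        return f"v{major}.{minor}.{patch + 1}"
    return current  # nothing changed: assumed to produce no new version


print(next_version("v1.0.0", Changes(records_or_files=True)))
print(next_version("v1.1.0", Changes(schema=True)))
```

Under these assumptions the two calls reproduce the sequence used in the examples above: v1.0.0, then v1.1.0, then v2.0.0.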
Each version also has a hash, a SHA-256 digest of the canonical representation of the schema, records, and file references. Two versions with the same hash have identical content.
Record
A record is a flat JSON object with an id, a type, and a data payload. Records are the rows of your data.
{
  "id": "pub-001",
  "type": "Publication",
  "data": {
    "title": "The Structure of Scientific Revolutions",
    "doi": "10.1234/example",
    "authors": ["author-001", "author-002"],
    "pdf": { "$file": "sha256:a1b2c3..." }
  }
}
Records are content-addressed: each record is identified by the SHA-256 hash of its canonical JSON ({"id":...,"type":...,"data":...}). This means:
- The same record appearing in multiple collections is stored only once.
- Pushing a new version only transfers records the server doesn't already have (via the negotiate protocol).
- Any record can be traced back to every collection and version that includes it (provenance).
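A minimal sketch of content addressing and the "only send what is missing" idea follows. The exact canonical form is not specified here, so the sketch assumes compact JSON with the top-level keys in the order id, type, data and with keys inside data sorted. The function names are hypothetical, and the real negotiate protocol may exchange hashes differently.

```python
import hashlib
import json


def canonical_json(record: dict) -> bytes:
    """Assumed canonical form: {"id":...,"type":...,"data":...}, compact, UTF-8."""
    def dump(value):
        # Sorting keys inside data is an assumption, made so that key order
        # in the input does not change the hash.
        return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

    text = '{"id":%s,"type":%s,"data":%s}' % (
        dump(record["id"]), dump(record["type"]), dump(record["data"]))
    return text.encode("utf-8")


def record_hash(record: dict) -> str:
    """SHA-256 hex digest of the canonical JSON."""
    return hashlib.sha256(canonical_json(record)).hexdigest()


def records_to_send(local_records: list, server_hashes: set) -> list:
    """Keep only the records whose hash the server does not already have."""
    return [r for r in local_records if record_hash(r) not in server_hashes]


a = {"id": "pub-001", "type": "Publication", "data": {"title": "T", "doi": "10.1234/example"}}
b = {"id": "pub-002", "type": "Publication", "data": {"title": "U"}}
already_on_server = {record_hash(a)}
print([r["id"] for r in records_to_send([a, b], already_on_server)])
```

Because the hash depends only on the record's content, the same record pushed to a second collection produces the same hash and does not need to be transferred again.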
Relationships between records are expressed as ID references (just strings). There are no joins, no foreign keys. An LLM or application can resolve references by reading the schema and records together.
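Resolving references on the client side can be as simple as building an index by id. The sketch below is illustrative: the helper names, the "Person" type, and the author names are made up for the example, and which fields hold references is something an application would learn from the schema.

```python
def index_by_id(records: list) -> dict:
    """Map each record id to its record."""
    return {record["id"]: record for record in records}


def resolve_refs(record: dict, field: str, index: dict) -> list:
    """Look up the records named by a list-of-ids field, skipping unknown ids."""
    return [index[ref] for ref in record["data"].get(field, []) if ref in index]


records = [
    {"id": "pub-001", "type": "Publication",
     "data": {"title": "The Structure of Scientific Revolutions",
              "authors": ["author-001", "author-002"]}},
    {"id": "author-001", "type": "Person", "data": {"name": "Author One"}},
    {"id": "author-002", "type": "Person", "data": {"name": "Author Two"}},
]

index = index_by_id(records)
authors = resolve_refs(index["pub-001"], "authors", index)
print([author["data"]["name"] for author in authors])
```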
Records are validated against the schema on push. If a record contains fields not defined in the schema, the push is rejected with a 422 listing the extra fields. Set strip_unknown_fields to have the extra fields stripped automatically instead of rejecting the push.
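The unknown-field behaviour can be pictured with a sketch like the one below. It is not the registry's validator: the `allowed_fields` mapping stands in for whatever the JSON Schema defines per type, and `UnknownFieldsError` and `check_record` are hypothetical names. Only the 422 status and the strip_unknown_fields option come from the description above.

```python
class UnknownFieldsError(Exception):
    """Stands in for the 422 response that lists the extra fields."""
    status = 422

    def __init__(self, extra_fields):
        super().__init__(f"unknown fields: {', '.join(extra_fields)}")
        self.extra_fields = extra_fields


def check_record(record: dict, allowed_fields: dict, strip_unknown_fields: bool = False) -> dict:
    """Reject or strip data fields that the record's type does not define."""
    allowed = allowed_fields[record["type"]]
    extra = sorted(set(record["data"]) - allowed)
    if not extra:
        return record
    if strip_unknown_fields:
        kept = {key: value for key, value in record["data"].items() if key in allowed}
        return {**record, "data": kept}
    raise UnknownFieldsError(extra)


allowed_fields = {"Publication": {"title", "doi", "authors", "pdf"}}
record = {"id": "pub-001", "type": "Publication",
          "data": {"title": "Example", "isbn": "not-in-schema"}}

try:
    check_record(record, allowed_fields)
except UnknownFieldsError as error:
    print(error.status, error.extra_fields)

print(check_record(record, allowed_fields, strip_unknown_fields=True)["data"])
```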
Binary data is referenced via {"$file": "sha256:..."}, a pointer to a content-addressed file in the registry. The wire format for records is JSONL, one record per line, independently hashable and streamable.
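As a rough illustration of the JSONL wire format, the helpers below write one record per line and read them back one at a time. The function names are hypothetical, and the compact separators are an assumption rather than a documented requirement.

```python
import json


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    return "".join(json.dumps(record, separators=(",", ":")) + "\n" for record in records)


def iter_jsonl(lines):
    """Yield records one at a time from an iterable of lines (a file, a stream)."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines
            yield json.loads(line)


payload = to_jsonl([
    {"id": "pub-001", "type": "Publication", "data": {"pdf": {"$file": "sha256:a1b2c3..."}}},
    {"id": "author-001", "type": "Person", "data": {"name": "Author One"}},
])
for record in iter_jsonl(payload.splitlines()):
    print(record["id"], record["type"])
```

Because each line is a complete record, a reader can hash or process records as they arrive without loading the whole version into memory.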
File
A file is a binary blob (PDF, image, dataset, anything) stored by its SHA-256 hash. Files are content-addressed: the same bytes always produce the same hash, so identical files are stored only once regardless of how many records reference them.
Files are uploaded before pushing a version. When you push, the registry verifies that every $file reference in your records points to an existing file.
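The pre-push check can be approximated on the client to catch missing uploads early. This sketch assumes the digest after the "sha256:" prefix is hex-encoded and that a file reference is an object whose only key is "$file"; the helper names are hypothetical.

```python
import hashlib


def file_hash(content: bytes) -> str:
    """Content address for a blob, in the sha256:... form used by $file references."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def iter_file_refs(value):
    """Walk a record's data and yield every $file reference found."""
    if isinstance(value, dict):
        if set(value) == {"$file"}:
            yield value["$file"]
        else:
            for child in value.values():
                yield from iter_file_refs(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_file_refs(child)


def missing_files(records: list, uploaded: set) -> set:
    """Return the referenced file hashes that have not been uploaded yet."""
    referenced = {ref for record in records for ref in iter_file_refs(record["data"])}
    return referenced - uploaded


pdf_bytes = b"%PDF-1.7 example bytes"
record = {"id": "pub-001", "type": "Publication",
          "data": {"title": "Example", "pdf": {"$file": file_hash(pdf_bytes)}}}

print(missing_files([record], uploaded=set()))                   # the PDF still needs uploading
print(missing_files([record], uploaded={file_hash(pdf_bytes)}))  # nothing missing
```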
Accounts
Underlay has two account types:
- Users: individual accounts with email/password login
- Organizations: group accounts with members who have roles (owner, admin, member)
Both can own collections. API keys are scoped to an account and optionally to a specific collection, with permission levels: read, write, or admin.
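One way to picture key scoping is the check below. It is a sketch under assumptions the text does not confirm: that the permission levels are ordered read < write < admin with higher levels implying lower ones, and that a key with no collection scope covers every collection its account owns. `ApiKey` and `allows` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ordering: each level implies the ones below it.
LEVELS = {"read": 0, "write": 1, "admin": 2}


@dataclass(frozen=True)
class ApiKey:
    account: str                      # owning user or organization
    permission: str                   # "read", "write", or "admin"
    collection: Optional[str] = None  # ":owner/:slug", or None for account-wide


def allows(key: ApiKey, collection: str, needed: str) -> bool:
    """Return True if the key may perform an action needing `needed` on `collection`."""
    owner = collection.split("/", 1)[0]
    if owner != key.account:
        return False
    if key.collection is not None and key.collection != collection:
        return False
    return LEVELS[key.permission] >= LEVELS[needed]


key = ApiKey(account="knowledge-futures", permission="write",
             collection="knowledge-futures/pubpub-archive")
print(allows(key, "knowledge-futures/pubpub-archive", "read"))
print(allows(key, "knowledge-futures/pubpub-archive", "admin"))
```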
Privacy & Visibility
Underlay supports privacy at four levels: collection, type, field, and record. The three finer-grained levels let you store sensitive data alongside public data in the same collection.
Collection-level
A collection can be public (listed in browse, readable by anyone) or private (visible only to the owner and org members).
Type-level
Mark an entire record type as private in the schema by adding "private": true to the type definition. All records of that type are hidden from public readers, and the type is stripped from the schema response.
Field-level
Mark individual fields within a type as private by adding "private": true to the field definition. The type remains visible, but those fields are stripped from records returned to public readers.
Record-level
Mark individual records as private by including "private": true in the record when pushing. The record is hidden from public queries but visible to the collection owner.
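The type-, field-, and record-level rules can be combined into a single filter for public readers. The sketch below uses a simplified, hypothetical schema shape (a "types" mapping with a "fields" mapping per type) rather than the registry's actual JSON Schema layout. Only the "private": true markers come from the description above.

```python
def public_schema(schema: dict) -> dict:
    """Drop private types from the schema shown to public readers."""
    return {"types": {name: definition
                      for name, definition in schema["types"].items()
                      if not definition.get("private")}}


def public_records(schema: dict, records: list) -> list:
    """Hide private types and records, and strip private fields."""
    visible = []
    for record in records:
        type_def = schema["types"].get(record["type"], {})
        if type_def.get("private") or record.get("private"):
            continue  # type-level or record-level privacy hides the whole record
        hidden = {name for name, field in type_def.get("fields", {}).items()
                  if field.get("private")}
        data = {key: value for key, value in record["data"].items() if key not in hidden}
        visible.append({**record, "data": data})
    return visible


schema = {"types": {
    "Publication": {"fields": {"title": {}, "reviewer_notes": {"private": True}}},
    "Payment": {"private": True, "fields": {"amount": {}}},
}}
records = [
    {"id": "pub-001", "type": "Publication", "data": {"title": "A", "reviewer_notes": "..."}},
    {"id": "pub-002", "type": "Publication", "private": True, "data": {"title": "B"}},
    {"id": "pay-001", "type": "Payment", "data": {"amount": 10}},
]

print(public_records(schema, records))
print(list(public_schema(schema)["types"]))
```

In this sketch a public reader sees only pub-001, without its reviewer_notes field, and the Payment type does not appear in the schema they receive.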
Private content is excluded from the public hash (used for verifying publicly-visible content) but included in the private hash (used by owners for full integrity verification).