Structured Document Text

Structured, normalized text for PDFs, EPUBs, and snapshots, with more formats planned (e.g., OCR images, transcribed audio).

Core principles

document-worker is the sole structured-text producer
Every block/text node maps back to the source file
Structured text alone must support all text-based annotations
Preserve internal reference links among inline and block nodes, but leave relationship types for consumers to infer
The outline is the primary index for document structure
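
As a rough illustration of these principles, the sketch below models blocks that carry a pointer back to the source file, inline nodes with untyped internal references, and an outline used as the index into the block list. All type and field names here (`SourceRef`, `locator`, `refBlockIndex`, `startBlock`, `sectionForBlock`) are hypothetical and chosen for illustration only; the actual shapes are defined in schema.d.ts and may differ.

```typescript
// Hypothetical shapes for illustration; the real schema may differ.
interface SourceRef {
  // Opaque pointer back into the source file (encoding is an assumption).
  locator: string;
}

interface TextNode {
  text: string;
  source: SourceRef;
  // Optional internal link to another block. No relationship type is stored;
  // consumers are expected to infer it.
  refBlockIndex?: number;
}

interface BlockNode {
  type: string;
  source: SourceRef;
  content: TextNode[];
}

interface OutlineEntry {
  title: string;
  // Index of the first block that belongs to this section.
  startBlock: number;
  children: OutlineEntry[];
}

// Flatten the outline depth-first, which yields entries in document order.
function flattenOutline(entries: OutlineEntry[]): OutlineEntry[] {
  const flat: OutlineEntry[] = [];
  for (const entry of entries) {
    flat.push(entry);
    for (const child of flattenOutline(entry.children)) {
      flat.push(child);
    }
  }
  return flat;
}

// Return the last outline entry (in document order) that starts at or before
// the given block, assuming startBlock never decreases in document order.
function sectionForBlock(
  outline: OutlineEntry[],
  blockIndex: number
): OutlineEntry | undefined {
  let found: OutlineEntry | undefined;
  for (const entry of flattenOutline(outline)) {
    if (entry.startBlock > blockIndex) break;
    found = entry;
  }
  return found;
}
```

With an outline like this, a consumer can resolve any block to its enclosing section without scanning the block contents.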

Use cases

Reading mode (Desktop/iOS/Android/Web) with text annotations
Text layer for the Zotero Reader PDF viewer
Structured data for agents
Section-level chunking for embeddings
Outline preview outside the Reader (e.g., in the item pane)
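
Section-level chunking for embeddings could look roughly like the sketch below, which groups block texts by the section boundaries taken from the outline. The names `SectionStart`, `Chunk`, and `chunkBySection` are assumptions made for this example, and blocks are simplified to plain strings.

```typescript
// Hypothetical, simplified inputs: block texts plus section start offsets
// derived from the outline.
interface SectionStart {
  title: string;
  startBlock: number;
}

interface Chunk {
  title: string;
  text: string;
}

// Build one chunk per section. Each section runs from its startBlock up to
// (but not including) the next section's startBlock. Blocks that precede the
// first section are not included in any chunk.
function chunkBySection(blocks: string[], sections: SectionStart[]): Chunk[] {
  const sorted = [...sections].sort((a, b) => a.startBlock - b.startBlock);
  const chunks: Chunk[] = [];
  sorted.forEach((section, i) => {
    const next = sorted[i + 1];
    const end = next ? next.startBlock : blocks.length;
    const text = blocks.slice(section.startBlock, end).join("\n");
    // Skip sections that contain no text.
    if (text.length > 0) {
      chunks.push({ title: section.title, text });
    }
  });
  return chunks;
}
```

Each resulting chunk can then be passed to whatever embedding model the consumer uses.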

Performance goals (WIP)

Requirements

Instant access to any structured-text region
Low memory use during long-lived in-memory access
Random access when stored on disk or in S3
Low storage overhead

Design decisions

A single large JSON file is slow to parse and memory-heavy, so the text is stored as gzipped, sharded JSON chunks behind a top-level index (effectively a binary format)
For PDFs, node-to-position maps are stored as strings, since they dominate the size
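
The sharding idea can be sketched as follows, assuming Node.js and its built-in `zlib` module. This is only a minimal illustration, not the project's actual storage format: `ShardIndexEntry`, `Packed`, `pack`, `readBlock`, and the fixed `shardSize` are all hypothetical.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Hypothetical index entry: which blocks a shard holds and where its
// gzipped bytes sit inside the packed buffer.
interface ShardIndexEntry {
  firstBlock: number;
  blockCount: number;
  offset: number;
  length: number;
}

interface Packed {
  index: ShardIndexEntry[];
  data: Buffer;
}

// Split blocks into fixed-size shards, gzip each shard's JSON separately,
// and record the byte range of every shard in a top-level index.
function pack(blocks: unknown[], shardSize: number): Packed {
  const index: ShardIndexEntry[] = [];
  const parts: Buffer[] = [];
  let offset = 0;
  for (let first = 0; first < blocks.length; first += shardSize) {
    const shard = blocks.slice(first, first + shardSize);
    const gz = gzipSync(Buffer.from(JSON.stringify(shard), "utf8"));
    index.push({ firstBlock: first, blockCount: shard.length, offset, length: gz.length });
    parts.push(gz);
    offset += gz.length;
  }
  return { index, data: Buffer.concat(parts) };
}

// Read one block by decompressing and parsing only the shard that holds it.
function readBlock(packed: Packed, blockIndex: number): unknown {
  const entry = packed.index.find(
    (e) => blockIndex >= e.firstBlock && blockIndex < e.firstBlock + e.blockCount
  );
  if (!entry) {
    throw new RangeError(`No shard contains block ${blockIndex}`);
  }
  const gz = packed.data.subarray(entry.offset, entry.offset + entry.length);
  const shard = JSON.parse(gunzipSync(gz).toString("utf8")) as unknown[];
  return shard[blockIndex - entry.firstBlock];
}
```

Because the index records a byte offset and length per shard, the same lookup could be served from disk with a positioned read, or from S3 with an HTTP range request, without loading the whole file.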
