The round-trip contract

Inkroom’s core promise: md → doc → md is the identity function. This page is the precise statement of that promise — what is guaranteed, what is normalized, and how it is enforced.

Canonical form

Markdown allows the same document to be written many ways (*em* vs _em_, setext vs ATX headings, + vs - bullets). A WYSIWYG editor cannot preserve which variant you typed — so Inkroom defines a canonical form C(md): parse, then stringify with fixed settings.

| Construct         | Canonical form                                                            |
| ----------------- | ------------------------------------------------------------------------- |
| Headings          | ATX (##), never setext                                                    |
| Bullets           | - (* only to disambiguate adjacent lists)                                 |
| Ordered markers   | 1. 2. 3. (incrementing; start preserved)                                  |
| Emphasis / strong | *em* / **strong**                                                         |
| Code blocks       | backtick fences, info string preserved verbatim                           |
| Thematic break    | *** (a dash rule is setext/front-matter ambiguous; fuzzer-proven)         |
| Hard break        | trailing \                                                                |
| Link titles       | "double quotes"                                                           |
| Tables            | pipe-padded GFM (or fixed-format HTML when merged)                        |
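
To make "parse, then stringify with fixed settings" concrete, here is a minimal sketch of a canonicalizer for a tiny subset of markdown: one-line paragraphs, bullets, and headings. It is not Inkroom's parser; the `Block` type and the functions `parseBlocks`, `stringifyBlocks`, and `canonicalize` are hypothetical names invented for illustration.

```typescript
// Toy block model (an assumption for this sketch): headings, bullets, one-line paragraphs.
type Block =
  | { kind: "heading"; level: number; text: string }
  | { kind: "bullet"; text: string }
  | { kind: "paragraph"; text: string };

// Parse: accept several spellings of the same construct.
function parseBlocks(md: string): Block[] {
  const lines = md.split("\n");
  const blocks: Block[] = [];
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (line.trim() === "") continue;
    const next = lines[i + 1] ?? "";
    const atx = /^(#{1,6})\s+(.*)$/.exec(line);
    const bullet = /^[-+*]\s+(.*)$/.exec(line);
    if (atx) {
      blocks.push({ kind: "heading", level: atx[1].length, text: atx[2] });
    } else if (bullet) {
      blocks.push({ kind: "bullet", text: bullet[1] });
    } else if (/^=+\s*$/.test(next)) {
      // Setext level 1: text underlined with '='.
      blocks.push({ kind: "heading", level: 1, text: line.trim() });
      i++;
    } else if (/^-+\s*$/.test(next)) {
      // Setext level 2: text underlined with '-'.
      blocks.push({ kind: "heading", level: 2, text: line.trim() });
      i++;
    } else {
      blocks.push({ kind: "paragraph", text: line.trim() });
    }
  }
  return blocks;
}

// Stringify: exactly one spelling per construct (ATX headings, '-' bullets).
function stringifyBlocks(blocks: Block[]): string {
  let out = "";
  blocks.forEach((b, i) => {
    if (i > 0) {
      const tight = b.kind === "bullet" && blocks[i - 1].kind === "bullet";
      out += tight ? "\n" : "\n\n";
    }
    if (b.kind === "heading") out += "#".repeat(b.level) + " " + b.text;
    else if (b.kind === "bullet") out += "- " + b.text;
    else out += b.text;
  });
  return out + "\n";
}

// C(md) for the toy subset.
function canonicalize(md: string): string {
  return stringifyBlocks(parseBlocks(md));
}
```

Because the stringifier never looks at how the input was spelled, two inputs that parse to the same blocks always produce the same bytes, which is the property the canonical form relies on.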

The guarantees (CI-enforced, byte-level)

For every document in the golden corpus — CommonMark constructs, all of GFM, front matter, math, footnotes, CJK/RTL text, deep nesting, merged-cell tables:

  1. Idempotence: C(C(x)) === C(x)
  2. Identity: serialize(parse(C(x))) === C(x), byte for byte
  3. Schema validity: every parsed document passes ProseMirror’s check()
  4. Doc stability: parse(serialize(doc)) equals doc structurally

A regression in any of these fails CI. There is no allowlist of “known diffs”.
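
The four guarantees can be read as a property check run over each corpus document. The sketch below is one way such a check might look, not the actual golden-corpus harness. The `Codec` interface, `checkContract`, and the paragraph-only `toyCodec` are hypothetical stand-ins, and `isValid` only plays the role that ProseMirror's check() plays in the real contract.

```typescript
// Hypothetical interface bundling the operations the contract mentions.
interface Codec<Doc> {
  canonicalize(md: string): string; // C(x)
  parse(md: string): Doc;
  serialize(doc: Doc): string;
  isValid(doc: Doc): boolean; // stand-in for a schema check
  equals(a: Doc, b: Doc): boolean; // structural equality
}

// Returns the names of the guarantees that x violates (empty means all hold).
function checkContract<Doc>(codec: Codec<Doc>, x: string): string[] {
  const failures: string[] = [];
  const cx = codec.canonicalize(x);
  const doc = codec.parse(cx);
  if (codec.canonicalize(cx) !== cx) failures.push("idempotence");
  if (codec.serialize(doc) !== cx) failures.push("identity");
  if (!codec.isValid(doc)) failures.push("schema validity");
  if (!codec.equals(codec.parse(codec.serialize(doc)), doc)) {
    failures.push("doc stability");
  }
  return failures;
}

// Toy codec (assumption): a document is just a list of trimmed paragraphs.
const splitParagraphs = (md: string): string[] =>
  md
    .split(/\n{2,}/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

const toyCodec: Codec<string[]> = {
  canonicalize: (md) => splitParagraphs(md).join("\n\n") + "\n",
  parse: splitParagraphs,
  serialize: (doc) => doc.join("\n\n") + "\n",
  isValid: (doc) => doc.every((p) => p.length > 0),
  equals: (a, b) => JSON.stringify(a) === JSON.stringify(b),
};

// Any failure throws, mirroring the "no allowlist" rule.
const toyCorpus = ["  a \n\n\n b\n", "single paragraph"];
for (const x of toyCorpus) {
  const failures = checkContract(toyCodec, x);
  if (failures.length > 0) {
    throw new Error(`contract broken: ${failures.join(", ")}`);
  }
}
```

Comparing strings with `!==` is what makes the check byte-level: a single extra trailing newline from `serialize` is enough to report an identity failure.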

Documented normalizations

Applied once, when a document first enters the editor:

  • Reference links/images are inlined ([x][ref] + [ref]: url → [x](url)). A WYSIWYG surface cannot keep the indirection intact through arbitrary edits, so we normalize it up front instead of corrupting it silently.
  • Soft line breaks join to spaces — paragraphs become one logical line (semantics unchanged per CommonMark; hard breaks are preserved as \).
  • Autolink forms: [url](url) and <url> normalize to the autolink form; GFM literal www. links normalize to explicit links.
  • Empty paragraphs are dropped on save (markdown has no representation for them).
  • Front matter is preserved verbatim, including formatting inside it.
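
As an illustration of the first normalization in the list above, the following sketch inlines full reference links with plain regular expressions. It is a simplified stand-in, not Inkroom's implementation: `inlineReferenceLinks` is a hypothetical name, and the sketch ignores link titles, collapsed and shortcut references ([x][] and [x]), and definitions inside code blocks.

```typescript
// Inline [text][label] references using their "[label]: url" definitions.
function inlineReferenceLinks(md: string): string {
  const defs = new Map<string, string>();
  const defPattern = /^\[([^\]]+)\]:\s*(\S+)\s*$/;
  const body: string[] = [];

  // Pass 1: collect definitions and drop their lines from the output.
  for (const line of md.split("\n")) {
    const m = defPattern.exec(line);
    if (m) defs.set(m[1].toLowerCase(), m[2]); // labels match case-insensitively
    else body.push(line);
  }

  // Pass 2: rewrite each full reference; unknown labels are left untouched.
  const inlined = body
    .join("\n")
    .replace(/\[([^\]]+)\]\[([^\]]+)\]/g, (whole: string, text: string, label: string) => {
      const url = defs.get(label.toLowerCase());
      return url === undefined ? whole : `[${text}](${url})`;
    });

  // Removing definitions can leave extra blank lines at the end; keep one newline.
  return inlined.replace(/\n+$/, "\n");
}
```

Image references (![x][ref]) go through the same rewrite, because the leading "!" sits outside the matched span and is left in place.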

Things that round-trip because we engineered them to

  • Merged-cell tables: GFM can’t express them, so they serialize as a fixed-format HTML <table> and are lifted back into a fully editable table on parse — byte-stable in both directions.
  • Fence info strings (```ts {1,3} title="x") — language and meta survive.
  • Tight vs loose lists, ordered-list start numbers, task states.
  • Raw HTML blocks/inlines — verbatim bytes, sanitized preview.
  • Footnotes — definitions stay where you wrote them.
  • Track-changes marks — default serialization is the accepted view; getMarkdown({ suggestions: "critic" }) emits open CriticMarkup.
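
The difference between the accepted view and CriticMarkup output for track-changes marks can be shown with a small model. This is a sketch under assumptions, not the behaviour of getMarkdown itself: the `Segment` type and `renderSuggestions` are hypothetical, and only insertions ({++…++}) and deletions ({--…--}) are modelled.

```typescript
// A run of text that is unchanged, a suggested insertion, or a suggested deletion.
type Segment = { kind: "text" | "insert" | "delete"; text: string };
type SuggestionMode = "accepted" | "critic";

function renderSuggestions(segments: Segment[], mode: SuggestionMode): string {
  return segments
    .map((s) => {
      if (s.kind === "text") return s.text;
      if (mode === "accepted") {
        // Accepted view: insertions are kept, deletions disappear.
        return s.kind === "insert" ? s.text : "";
      }
      // Critic view: both sides of the change stay visible as CriticMarkup.
      return s.kind === "insert" ? `{++${s.text}++}` : `{--${s.text}--}`;
    })
    .join("");
}

const tracked: Segment[] = [
  { kind: "text", text: "The " },
  { kind: "delete", text: "quick" },
  { kind: "insert", text: "slow" },
  { kind: "text", text: " fox" },
];

console.log(renderSuggestions(tracked, "accepted")); // The slow fox
console.log(renderSuggestions(tracked, "critic")); // The {--quick--}{++slow++} fox
```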

Known limits (v1, honest)

  • $…$ math requires @inkroom/plugin-math (deliberate: $5-$10 in prose must never become math).
  • Mode-switch cursor mapping inside heavily-escaped text is approximate (clamped to the containing node — never lost).
  • Two adjacent identical links merge into one (ProseMirror mark model).
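
The last limit follows from storing links as marks on text rather than as wrapper nodes: once two neighbouring runs carry the same link mark, nothing records where one link ended and the next began. The sketch below imitates that with a hypothetical `Run` type, where the mark is reduced to an optional href. It is a simplified model, not ProseMirror code.

```typescript
// Simplified inline model (assumption): text plus an optional link target.
type Run = { text: string; href?: string };

// Adjacent runs with the same href collapse into one run.
function mergeAdjacentRuns(runs: Run[]): Run[] {
  const merged: Run[] = [];
  for (const run of runs) {
    const last = merged[merged.length - 1];
    if (last !== undefined && last.href === run.href) last.text += run.text;
    else merged.push({ ...run }); // copy so the input array is not mutated
  }
  return merged;
}

function serializeRuns(runs: Run[]): string {
  return mergeAdjacentRuns(runs)
    .map((r) => (r.href === undefined ? r.text : `[${r.text}](${r.href})`))
    .join("");
}

// Two links to the same target, written back to back, come out as one link.
console.log(serializeRuns([
  { text: "a", href: "u" },
  { text: "b", href: "u" },
])); // [ab](u)
```

Links that differ in target, or that are separated by unlinked text, stay distinct in this model.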

Enforced by testing/golden-corpus — 69 cases and growing. Found markdown that breaks a guarantee? That’s a high-priority bug: please file it with the document attached.