The round-trip contract
Inkroom’s core promise: md → doc → md is the identity function. This
page is the precise statement of that promise — what is guaranteed, what is
normalized, and how it is enforced.
Canonical form
Markdown allows the same document to be written many ways (*em* vs _em_,
setext vs ATX headings, + vs - bullets). A WYSIWYG editor cannot preserve
which variant you typed — so Inkroom defines a canonical form C(md):
parse, then stringify with fixed settings.
| Construct | Canonical form |
|---|---|
| Headings | ATX (##), never setext |
| Bullets | - (* only to disambiguate adjacent lists) |
| Ordered markers | 1. 2. 3. (incrementing; start preserved) |
| Emphasis / strong | *em* / **strong** |
| Code blocks | backtick fences, info string preserved verbatim |
| Thematic break | *** (a dash rule is setext/front-matter ambiguous — fuzzer-proven) |
| Hard break | trailing \ |
| Link titles | "double quotes" |
| Tables | pipe-padded GFM (or fixed-format HTML when merged) |
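As an illustration of what "stringify with fixed settings" can look like, here is how most rows of the table would map onto the options of remark-stringify (mdast-util-to-markdown). This is only a sketch under a loud assumption: this page does not say which serializer Inkroom uses, and the constant name `canonicalStringifyOptions` is hypothetical.

```typescript
// ASSUMPTION: a remark-stringify / mdast-util-to-markdown serializer.
// The option names below are that library's; whether Inkroom uses it is not
// stated on this page. `canonicalStringifyOptions` is a hypothetical name.
const canonicalStringifyOptions = {
  setext: false,             // headings: ATX (##), never setext
  bullet: "-",               // bullets: -
  bulletOther: "*",          // fallback bullet when adjacent lists must be told apart
  incrementListMarker: true, // ordered markers: 1. 2. 3.
  emphasis: "*",             // *em*
  strong: "*",               // **strong**
  fences: true,              // code blocks are always fenced
  fence: "`",                // backtick fences
  rule: "*",                 // thematic break: ***
  quote: '"',                // link titles in "double quotes"
};
```

Pinning every option explicitly, rather than relying on library defaults, keeps the canonical form stable if a default changes in a later release.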
The guarantees (CI-enforced, byte-level)
For every document in the golden corpus — CommonMark constructs, all of GFM, front matter, math, footnotes, CJK/RTL text, deep nesting, merged-cell tables:
- Idempotence: C(C(x)) === C(x)
- Identity: serialize(parse(C(x))) === C(x), byte for byte
- Schema validity: every parsed document passes ProseMirror’s check()
- Doc stability: parse(serialize(doc)) equals doc structurally
A regression in any of these fails CI. There is no allowlist of “known diffs”.
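The four guarantees can be phrased as one per-document check. The sketch below is not Inkroom's test harness: the `RoundTripCodec` interface, `violatedGuarantees`, and the toy codec are hypothetical names introduced for illustration, and the real parse/serialize signatures are not shown on this page.

```typescript
// Hypothetical shape; Inkroom's actual API may differ.
interface RoundTripCodec<Doc> {
  canonicalize(md: string): string; // C(x)
  parse(md: string): Doc;
  serialize(doc: Doc): string;
  check(doc: Doc): void; // expected to throw on a schema-invalid document
  equals(a: Doc, b: Doc): boolean; // structural equality
}

type Guarantee = "idempotence" | "identity" | "schema-validity" | "doc-stability";

// Returns the guarantees that `md` violates; an empty array means all four hold.
function violatedGuarantees<Doc>(codec: RoundTripCodec<Doc>, md: string): Guarantee[] {
  const failures: Guarantee[] = [];
  const c = codec.canonicalize(md);

  // Idempotence: C(C(x)) === C(x)
  if (codec.canonicalize(c) !== c) failures.push("idempotence");

  // Identity: serialize(parse(C(x))) === C(x), compared as exact strings
  const doc = codec.parse(c);
  if (codec.serialize(doc) !== c) failures.push("identity");

  // Schema validity: check() must not throw
  try {
    codec.check(doc);
  } catch {
    failures.push("schema-validity");
  }

  // Doc stability: parse(serialize(doc)) equals doc structurally
  if (!codec.equals(codec.parse(codec.serialize(doc)), doc)) failures.push("doc-stability");

  return failures;
}

// Toy codec for demonstration only: a "document" is a list of lines with
// trailing whitespace stripped.
const toyCodec: RoundTripCodec<string[]> = {
  parse: (md) => md.split("\n").map((line) => line.trimEnd()),
  serialize: (doc) => doc.join("\n"),
  canonicalize: (md) => toyCodec.serialize(toyCodec.parse(md)),
  check: (doc) => {
    if (doc.length === 0) throw new Error("empty document");
  },
  equals: (a, b) => a.length === b.length && a.every((line, i) => line === b[i]),
};
```

A corpus runner would call `violatedGuarantees` for every golden document and fail the build on any non-empty result, which matches the "no allowlist" rule above.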
Documented normalizations
Applied once, when a document first enters the editor:
- Reference links/images are inlined ([x][ref] + [ref]: url → [x](url)). A WYSIWYG surface cannot keep the indirection intact through arbitrary edits, so we normalize it up front instead of corrupting it silently.
- Soft line breaks join to spaces, so paragraphs become one logical line (semantics unchanged per CommonMark; hard breaks are preserved as \).
- Autolink forms: [url](url) and <url> normalize to the autolink form; GFM literal www. links normalize to explicit links.
- Empty paragraphs are dropped on save (markdown has no representation for them).
- Front matter is preserved verbatim, including formatting inside it.
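To make the first normalization concrete, here is a toy, string-level sketch of reference inlining. It is an assumption-laden illustration, not Inkroom's code: `inlineReferences` is a hypothetical name, it handles only single-line definitions without titles, and it ignores code spans and fences, which a real implementation working on the parsed tree would not.

```typescript
// Toy illustration only: inlines [x][ref] / ![x][ref] using "[ref]: url" definitions.
function inlineReferences(md: string): string {
  const defs = new Map<string, string>();
  const defLine = /^\[([^\]]+)\]:\s*(\S+)\s*$/;
  const body: string[] = [];

  // Pass 1: collect definitions and drop their lines from the output.
  for (const line of md.split("\n")) {
    const m = defLine.exec(line);
    if (m) {
      defs.set(m[1].toLowerCase(), m[2]); // labels match case-insensitively
    } else {
      body.push(line);
    }
  }

  // Pass 2: rewrite full ([x][ref]) and collapsed ([x][]) references.
  // References with an unknown label are left untouched.
  return body.join("\n").replace(
    /\[([^\]]+)\]\[([^\]]*)\]/g,
    (whole: string, text: string, label: string) => {
      const url = defs.get((label || text).toLowerCase());
      return url === undefined ? whole : `[${text}](${url})`;
    },
  );
}
```

For example, `inlineReferences("See [x][ref].\n\n[ref]: https://example.com")` yields `See [x](https://example.com).` followed by the now-empty line that separated the definition.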
Things that round-trip because we engineered them to
- Merged-cell tables: GFM can’t express them, so they serialize as a fixed-format HTML <table> and are lifted back into a fully editable table on parse, byte-stable in both directions.
- Fence info strings (```ts {1,3} title="x"): language and meta survive.
- Tight vs loose lists, ordered-list start numbers, task states.
- Raw HTML blocks/inlines — verbatim bytes, sanitized preview.
- Footnotes — definitions stay where you wrote them.
- Track-changes marks: default serialization is the accepted view; getMarkdown({ suggestions: "critic" }) emits open CriticMarkup.
Known limits (v1, honest)
- $…$ math requires @inkroom/plugin-math (deliberate: $5-$10 in prose must never become math).
- Mode-switch cursor mapping inside heavily-escaped text is approximate (clamped to the containing node, never lost).
- Two adjacent identical links merge into one (ProseMirror mark model).
Enforced by testing/golden-corpus — 69 cases
and growing. Found markdown that breaks a guarantee? That’s a
high-priority bug: please file it with the document attached.