From fb63c1e341c9e8859c9f69f566a6fe94ca54dd6c Mon Sep 17 00:00:00 2001
From: alex
Date: Sun, 22 Feb 2026 20:05:12 +0100
Subject: Move document formats to blog website

---
 blog/content/notes/index.gmi                 |  1 +
 blog/content/notes/tech/document-formats.gmi | 97 ++++++++++++++++++++++++++++
 misc/document-formats.md                     | 83 ------------------------
 3 files changed, 98 insertions(+), 83 deletions(-)
 create mode 100644 blog/content/notes/tech/document-formats.gmi
 delete mode 100644 misc/document-formats.md

diff --git a/blog/content/notes/index.gmi b/blog/content/notes/index.gmi
index 3fb8729d..e28ebf70 100644
--- a/blog/content/notes/index.gmi
+++ b/blog/content/notes/index.gmi
@@ -13,6 +13,7 @@ Notes about some books I like:
 => tech/misc-linux-stuff Misc Linux stuff
 => tech/running-commands-in-linux Running commands in Linux
 => tech/ssh-for-beginners SSH for beginners
+=> tech/document-formats Document formats
 
 ### Gadgets
 
diff --git a/blog/content/notes/tech/document-formats.gmi b/blog/content/notes/tech/document-formats.gmi
new file mode 100644
index 00000000..385c0c0e
--- /dev/null
+++ b/blog/content/notes/tech/document-formats.gmi
@@ -0,0 +1,97 @@
+# Document formats
+
+Most of the time, when writing a document, I want a document format with the following properties:
+
+* Fast to write using a plain text editor
+* Easy to parse into an AST
+
+An AST is a programming-friendly representation of a document. ASTs reduce the effort required to write tools such as a program that validates links in a document. Ideally, ASTs contain information to track a document element to the position it occupies in the original document. With this information, if you write a tool such as a spell checker, then you can highlight misspelled words precisely in the original document.
+
+On top of that, some features that I don't always need:
+
+* Math support
+* Sophisticated code blocks. For example, being able to highlight arbitrary parts of code blocks (not syntax highlighting).
+* Diagram support
+
+## Existing formats
+
+### Markdown
+
+* Easy to write using a plain text editor
+* Has good AST parsers with position information
+* Has math support
+* Does not support sophisticated code blocks
+* There are many extensions with support for math, diagrams, and many other features
+* Is very popular and supported everywhere
+* However, there is a wide variety of variants and quirks
+* Specifically, because Markdown was not designed with parsing in mind, tools based on different parsers can behave differently
+
+### Djot
+
+=> https://djot.net
+
+It is very similar to Markdown, except:
+
+* It is designed for parsing, so independent parsing implementations are very compatible with each other
+* It is not so popular, so there is less extension and tool support
+
+### AsciiDoc
+
+=> https://asciidoc.org
+
+Compared to Markdown:
+
+* It's more complex to write, but mostly because it's different and more powerful
+* There are attempts to write better parsers, but good parsers with position information are not available yet
+* Supports sophisticated code blocks
+* It has a smaller ecosystem than Markdown, but it has many good-quality tools, such as Antora
+
+### Typst
+
+=> https://typst.app
+
+Checks all my boxes, except:
+
+* It is designed for parsing and it has an AST, but it is not easy to access
+* Currently Typst is very oriented towards generating paged documents (e.g. PDF)
+* It includes a full programming language, which is mostly good (very extensible), but this might increase complexity undesirably
+
+Typst is very new and is not yet very popular.
+
+=> https://codeberg.org/haydn/typesetter Typesetter is a desktop application that embeds Typst, so no additional setup is needed. However, Typesetter is only available as a Flatpak.
+
+### Verso
+
+=> https://github.com/leanprover/verso
+
+A Markdown-like format closely tied to the Lean programming language.
+
+* Eliminates ambiguous syntax for easier parsing and is stricter (not all text is valid Verso)
+* Has a (Lean) data model
+* Designed for extensibility
+
+### TODO: other formats
+
+=> https://github.com/nota-lang/nota Nota (a document language for the browser)
+=> https://github.com/christianvoigt/argdown Argdown (for argumentation)
+=> https://github.com/podlite/podlite Podlite
+=> https://orgmode.org Org Mode (an Emacs-based tool based on a lightweight markup language)
+=> https://github.com/nvim-neorg Neorg (similar to Org Mode for Neovim)
+=> https://github.com/sile-typesetter/sile Sile (typesetting system)
+
+## Creating your own formats
+
+=> https://github.com/spc476/MOPML Someone created their own lightweight format using Lua and PEGs.
+=> https://tratt.net/laurie/blog/2020/which_parsing_approach.html "Which parsing approach" has information about choosing parsing approaches.
+
+## About gemtext
+
+=> https://geminiprotocol.net/docs/gemtext-specification.gmi
+
+Gemtext is an extremely minimalistic markup language designed for use with the Gemini protocol (an extremely minimalistic protocol similar to HTTP).
+
+The Gemini protocol and gemtext are intentionally designed to limit their power, in my opinion as a comment on the web.
+
+This document is gemtext-native for use in my own minimalistic publishing system.
+
+I also use it as a statement, although the limitations of gemtext can be significant in technical writing. For example, gemtext has no inline links, no inline verbatim code, only three levels of headings, etc.
diff --git a/misc/document-formats.md b/misc/document-formats.md
deleted file mode 100644
index 9ee21acc..00000000
--- a/misc/document-formats.md
+++ /dev/null
@@ -1,83 +0,0 @@
-# Document formats
-
-Most of the time, when writing a document, I want a document format with the following properties:
-
-* Fast to write using a plain text editor
-* Easy to parse into an AST
-
-An AST is a programming-friendly representation of a document.
-ASTs reduce the effort required to write tools such as a program that validates links in a document.
-Ideally, ASTs contain information to track a document element to the position it occupies in the original document.
-With this information, if you write a tool such as a spell checker, then you can highlight misspelled works precisely in the original document.
-
-On top of that, some features that I don't always need:
-
-* Math support
-* Sophisticated code blocks.
-  For example, being able to highlight arbitrary parts of code blocks (not syntax highlighting).
-* Diagram support
-
-## Existing formats
-
-### Markdown
-
-* Easy to write using a plain text editor
-* Has good AST parsers with position information
-* Has math support
-* Does not support sophisticated code blocks
-* There are many extensions with support for math, diagrams, and many others
-* Is very popular and supported everywhere
-* However, there is a wide variety of variants and quirks
-* Especifically, because Markdown was not designed with parsing in mind, so tools based on different parsers can have differences in behavior
-
-### [Djot](https://djot.net/)
-
-It is very similar to Markdown, except:
-
-* It is designed for parsing, so independent parsing implementations are very compatible with each other
-* It is not so popular, so there are less extension and tool support
-
-### [AsciiDoc](https://asciidoc.org/)
-
-Compared to Markdown:
-
-* It's more complex to write, but mostly because it's different and more powerful
-* There are attempts to write better parsers, but good parsers with position information are not available yet
-* Supports sophisticated code blocks
-* It has a smaller ecosystem than Markdown, but many good quality tools such as Antora
-
-### [Typst](https://typst.app/)
-
-Checks all my boxes, except:
-
-* It is designed for parsing and it has an AST, but it is not easy to access
-* Currently Typst is very oriented towards generating paged documents (e.g. PDF)
-* It includes a full programming language, which is mostly good (very extensible), but this might increase complexity undesirably
-
-Typst is very new and is not yet very popular.
-
-[Typesetter](https://codeberg.org/haydn/typesetter) is a desktop application that embeds Typst, so no additional setup is needed.
-However, Typesetter is only available as a Flatpak.
-
-### [Verso](https://github.com/leanprover/verso)
-
-A Markdown-like closely tied to [the Lean programming language](https://lean-lang.org/):
-
-* Eliminates ambiguous syntax for easier parsing and is stricter (not all text is valid Verso)
-* Has a (Lean) data model
-* Designed for extensibility
-
-### TODO: other formats
-
-- https://github.com/nota-lang/nota
-- https://github.com/christianvoigt/argdown
-- https://github.com/nvim-neorg
-- https://github.com/podlite/podlite/
-- https://orgmode.org/
-- https://github.com/sile-typesetter/sile
-
-## Creating your own formats
-
-https://github.com/spc476/MOPML someone created its own lightweight format using Lua and PEGs.
-
-https://tratt.net/laurie/blog/2020/which_parsing_approach.html has information about choosing parsing approaches.
-- 
cgit v1.2.3