# Document formats

Most of the time, when writing a document, I want a document format with the following properties:

* Fast to write using a plain text editor
* Easy to parse into an AST

An AST (abstract syntax tree) is a programming-friendly representation of a document. ASTs reduce the effort required to write tools such as a program that validates links in a document. Ideally, an AST contains enough information to trace each document element back to the position it occupies in the original document. With this information, if you write a tool such as a spell checker, you can highlight misspelled words precisely in the original document.
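
As a rough illustration of position information, here is a minimal Python sketch. It parses a small line-based subset of gemtext (headings, links, and text) into nodes that remember their line and column, then reports unknown words at their exact position. The Node class, the parse and misspellings functions, and the tiny word list are all made up for this example; they are not part of any real tool.

```python
import re
from dataclasses import dataclass


@dataclass
class Node:
    kind: str  # "heading", "link" or "text"
    text: str  # human-readable content of the line
    line: int  # 1-based line number in the source
    col: int   # 1-based column where `text` starts


def parse(source: str) -> list[Node]:
    """Parse a tiny gemtext-like subset into a flat list of nodes."""
    nodes = []
    for number, raw in enumerate(source.splitlines(), start=1):
        if not raw.strip():
            continue  # blank lines carry no content
        if raw.startswith("=>"):
            # Skip the marker and the URL; keep only the label.
            kind, prefix = "link", re.match(r"=>\s*\S*\s*", raw).group()
        elif raw.startswith("#"):
            kind, prefix = "heading", re.match(r"#{1,3}\s*", raw).group()
        else:
            kind, prefix = "text", ""
        nodes.append(Node(kind, raw[len(prefix):], number, len(prefix) + 1))
    return nodes


def misspellings(nodes, known_words):
    """Yield (line, column, word) for every word not in `known_words`."""
    for node in nodes:
        for match in re.finditer(r"[A-Za-z']+", node.text):
            if match.group().lower() not in known_words:
                yield node.line, node.col + match.start(), match.group()


# Hypothetical input and word list, for illustration only.
source = "# Notes\n\nA speling mistake\n"
known = {"notes", "a", "mistake"}
for line, col, word in misspellings(parse(source), known):
    print(f"{line}:{col}: unknown word {word!r}")
# prints: 3:3: unknown word 'speling'
```

Because every node carries its own line and column, the checker never has to search the original text again to locate a word.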

On top of that, there are some features that I need only sometimes:

* Math support
* Sophisticated code blocks. For example, being able to highlight arbitrary parts of code blocks (not syntax highlighting).
* Diagram support

## Existing formats

### Markdown

* Easy to write using a plain text editor
* Has good AST parsers with position information
* Has math support
* Does not support sophisticated code blocks
* There are many extensions, with support for math, diagrams, and more
* Is very popular and supported everywhere
* However, there is a wide variety of variants and quirks
* Specifically, because Markdown was not designed with parsing in mind, tools based on different parsers can behave differently

### Djot

=> https://djot.net

It is very similar to Markdown, except:

* It is designed for parsing, so independent parsing implementations are very compatible with each other
* It is less popular, so there are fewer extensions and less tool support

### AsciiDoc

=> https://asciidoc.org

Compared to Markdown:

* It's more complex to write, but mostly because it's different and more powerful
* There are attempts to write better parsers, but good parsers with position information are not available yet
* Supports sophisticated code blocks
* It has a smaller ecosystem than Markdown, but it includes many good-quality tools such as Antora

### Typst

=> https://typst.app

Checks all my boxes, except:

* It is designed for parsing and it has an AST, but it is not easy to access
* Currently, Typst is heavily oriented towards generating paged documents (e.g. PDF)
* It includes a full programming language, which is mostly good (very extensible), but this might increase complexity undesirably

Typst is very new and is not yet very popular.

=> https://codeberg.org/haydn/typesetter Typesetter is a desktop application that embeds Typst, so no additional setup is needed. However, Typesetter is only available as a Flatpak.

### Verso

=> https://github.com/leanprover/verso

A Markdown-like format closely tied to the Lean programming language.

* Eliminates ambiguous syntax for easier parsing and is stricter (not all text is valid Verso)
* Has a (Lean) data model
* Designed for extensibility

### TODO: other formats

=> https://github.com/nota-lang/nota Nota (a document language for the browser)
=> https://github.com/christianvoigt/argdown Argdown (for argumentation)
=> https://github.com/podlite/podlite Podlite
=> https://orgmode.org Org Mode (an Emacs-based tool based on a lightweight markup language)
=> https://github.com/nvim-neorg Neorg (similar to Org Mode for Neovim)
=> https://github.com/sile-typesetter/sile Sile (typesetting system)

## Creating your own formats

=> https://github.com/spc476/MOPML Someone created their own lightweight format using Lua and PEGs.
=> https://tratt.net/laurie/blog/2020/which_parsing_approach.html "Which parsing approach" has information about choosing parsing approaches.
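
For a feel of what the PEG approach involves, here is a minimal hand-written sketch in Python (not Lua, and not taken from MOPML). It implements the PEG ordered choice by trying alternatives in order, for a made-up inline syntax where *word* marks emphasis. All function names and the toy syntax are hypothetical, and a real implementation would probably use a PEG library rather than hand-written functions.

```python
import re

# Each rule takes (text, pos) and returns (node, new_pos), or None on failure.
# A node is a tuple: (kind, content, start_offset).

EMPHASIS = re.compile(r"\*([^*]+)\*")
PLAIN = re.compile(r"[^*]+|\*")


def emphasis(text, pos):
    """emphasis <- '*' [^*]+ '*'"""
    match = EMPHASIS.match(text, pos)
    if match is None:
        return None
    return ("emph", match.group(1), pos), match.end()


def plain(text, pos):
    """plain <- [^*]+ / '*'   (a lone '*' is kept as literal text)"""
    match = PLAIN.match(text, pos)
    if match is None:
        return None
    return ("text", match.group(), pos), match.end()


def inline(text, pos):
    """inline <- emphasis / plain   (ordered choice: first match wins)"""
    for alternative in (emphasis, plain):
        result = alternative(text, pos)
        if result is not None:
            return result
    return None


def parse_inlines(text):
    """document <- inline*"""
    nodes, pos = [], 0
    while pos < len(text):
        result = inline(text, pos)
        if result is None:
            break  # defensive: `plain` accepts any single character
        node, pos = result
        nodes.append(node)
    return nodes


print(parse_inlines("a *b* c*"))
# prints: [('text', 'a ', 0), ('emph', 'b', 2), ('text', ' c', 5), ('text', '*', 7)]
```

Because the choice is ordered, an input can be parsed in only one way, which is the property that makes formats designed around this style easier to implement consistently.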

## About gemtext

=> https://geminiprotocol.net/docs/gemtext-specification.gmi

Gemtext is an extremely minimalistic markup language designed for use with the Gemini protocol (an extremely minimalistic protocol similar to HTTP).

The Gemini protocol and gemtext are intentionally limited in power. In my opinion, this is a comment on the web.

This document is gemtext-native for use in my own minimalistic publishing system.

I also use it as a statement, although the limitations of gemtext can be significant in technical writing. For example, gemtext has no inline links, no inline verbatim code, and only three levels of headings.