Subtext Graph Specification
Version: 0.1
Author: Sebastian
Abstract
This document describes a specification for storing a knowledge graph based on the Subtext markup language in a file system. Applications adhering to this specification are able to read and manipulate Subtext graphs in a conformant way. The specification includes a directory structure and a file format.
Introduction and motivation
Out of discontent with existing note-taking apps, the author started working on the note-taking application NENO (acronym for "network of notes") in early 2020. Please read the design principles of NENO, as these have significant influence on this specification. Since the author did not want to create just another app, but an open and interoperable knowledge management system independent of single apps, the desire for an open specification arose. This also contributes to the goal of having all the de-facto rules of NENO not only documented in, and spread throughout, the code, but collected in a normative ruleset.
Specification requirement levels
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Graph directory
A graph directory is a directory on a file system that contains a Subtext graph.
Subtext Graph
A Subtext graph consists of Subtext graph files and arbitrary graph files in a graph directory and its subdirectories.
To obtain a representation of the full graph, implementations MUST collect and parse all Subtext graph files in the graph directory and its subdirectories.
Subtext graph file
A Subtext graph file is a UTF-8-encoded text file with the filename extension .subtext.
The Subtext graph file's name consists of its slug, followed by the filename extension.
This section is non-normative.
Note: The name of a Subtext graph file looks like this: <slug>.subtext (replace <slug> with the actual slug).
This section is non-normative.
Note: The author has considered using a filename extension other than .subtext for Subtext graph files (e.g. .subg, .sugra, or .neno) to distinguish Subtext graph files from files that just contain Subtext markup. This would also highlight the fact that Subtext graph files can contain content of a type other than Subtext markup. However, that would introduce a breaking change in how .subtext files are currently being used by several active users.
A Subtext graph file can contain zero or more headers, followed by an optional content section.
If a Subtext graph file has at least one header and a content section, the header section and the content section MUST be separated by an empty line.
If a Subtext graph file has no headers but a content section, the Subtext graph file MUST start with the content section, unless the lines from the beginning of the content section until the first empty line or EOF could be parsed as valid headers. In that case, implementations MUST insert a header section with an empty header at the beginning of the Subtext graph file.
This section is non-normative.
Note: A content section whose first lines could be parsed as a spec-compliant header section should be a very rare case. Here is an example of how the file has to look in that case:
::

:this-is:the-content
:section:
If a Subtext graph file contains at least one header but no content section, the file ends with the line of the last header.
This section is non-normative.
Note: A Subtext graph file can have an empty content section, which is different from having no content section at all. If there is an empty content section, the file ends with two newline characters.
The default content type of a Subtext graph file is text/vnd.subtext.
Implementations MUST NOT add a newline character at the end of the file.
This section is non-normative.
Note: However, a file can still end with a newline character. That newline character is then considered part of the content section.
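The layout rules above can be sketched as a small serializer. This is a minimal sketch in TypeScript, not a normative implementation: the names `Header`, `looksLikeHeaderSection`, and `serializeGraphFile` are hypothetical, and a `null` content value is assumed to stand for "no content section".

```typescript
// Hypothetical in-memory representation of a header (not defined by the spec).
type Header = { key: string; value: string };

// A header line has the form ":<KEY>:<VALUE>" with a key of 0 to 200
// characters that contains neither a colon nor a newline.
const HEADER_LINE = /^:([^:\n]{0,200}):([^\n]*)$/u;

// True if the lines from the start of the content up to the first empty line
// (or EOF) could all be parsed as headers.
const looksLikeHeaderSection = (content: string): boolean => {
  const lines = content.split("\n");
  const firstEmptyLine = lines.indexOf("");
  const block = firstEmptyLine === -1 ? lines : lines.slice(0, firstEmptyLine);
  return block.length > 0 && block.every((line) => HEADER_LINE.test(line));
};

// Assumption: `content === null` means "no content section".
const serializeGraphFile = (
  headers: Header[],
  content: string | null,
): string => {
  const headerSection = headers
    .map(({ key, value }) => `:${key}:${value}`)
    .join("\n");

  // Headers only: the file ends with the line of the last header.
  if (content === null) {
    return headerSection;
  }
  // Headers and content: separated by exactly one empty line.
  if (headers.length > 0) {
    return `${headerSection}\n\n${content}`;
  }
  // Content only: guard it with an empty header if it could be misread.
  return looksLikeHeaderSection(content) ? `::\n\n${content}` : content;
};
```

Note that the sketch never appends a newline of its own, so a trailing newline only appears if it is already part of the content.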
Header
A header section can be at the beginning of a Subtext graph file and consists of one or more line-separated headers.
A header has the following format:
:<KEY>:<VALUE>
where <KEY> is the header key and <VALUE> is the header value.
A header key has a minimum length of 0 characters and a maximum length of 200 characters. All Unicode characters are allowed, except for the colon (:) and the newline character. Implementations SHOULD use only lower-case letters and hyphens (-) in a header key.
A header value has a minimum length of 0 characters. There is no maximum length. All Unicode characters are allowed, except for the newline character.
Empty header
An empty header key is a header key with zero characters. Empty header keys are reserved for special use.
An empty header has an empty header key and a value with zero characters (::).
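To illustrate how the header format and the file layout fit together, here is a minimal parsing sketch in TypeScript. It is one possible reading of the rules, not a reference implementation: `Header`, `ParsedGraphFile`, and `parseGraphFile` are hypothetical names, `content: null` is assumed to mean "no content section", and empty headers are simply returned as-is.

```typescript
// Hypothetical types; the spec does not prescribe an in-memory model.
type Header = { key: string; value: string };
type ParsedGraphFile = { headers: Header[]; content: string | null };

// ":<KEY>:<VALUE>", key: 0 to 200 characters without colon or newline,
// value: any characters except newline.
const HEADER_LINE = /^:([^:\n]{0,200}):([^\n]*)$/u;

const parseGraphFile = (raw: string): ParsedGraphFile => {
  // "\r" characters are ignored, so they are dropped before parsing.
  const text = raw.replace(/\r/g, "");
  const lines = text.split("\n");
  const headers: Header[] = [];

  let i = 0;
  for (; i < lines.length; i++) {
    const line = lines[i] ?? "";
    if (line === "") {
      break; // the first empty line ends the header section
    }
    const match = HEADER_LINE.exec(line);
    if (match === null) {
      // Not every leading line is a header: the whole file is content.
      return { headers: [], content: text };
    }
    const [, key = "", value = ""] = match;
    headers.push({ key, value });
  }

  if (headers.length === 0) {
    return { headers, content: text.length > 0 ? text : null };
  }
  // Everything after the separating empty line is the content section.
  const content = i < lines.length ? lines.slice(i + 1).join("\n") : null;
  return { headers, content };
};
```

With this reading, a file consisting of `::`, an empty line, and then `:this-is:the-content` yields one empty header and a content section that starts with `:this-is:the-content`.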
Canonical header
Implementations SHOULD include the following header when creating a Subtext graph file:
Key: created-at, value: an ISO 8601 timestamp of the time of the creation of this file
Implementations SHOULD include the following header when creating a Subtext graph file:
Key: updated-at, value: an ISO 8601 timestamp of the last time the file's name, any header, or the content section of the file has been changed.
If a Subtext graph file has an updated-at header, implementations SHOULD update the ISO 8601 timestamp in the header value each time the file's name, any header, or the content section of the file has been changed.
If the Subtext graph file has a content section with the default content type, implementations MAY include the following header when creating the file:
Key: content-type, value: the MIME type of the content
If the Subtext graph file has a content section with a content type other than the default, implementations MUST include the content-type header.
This section is non-normative.
Note: Example structure of a header section for a Subtext graph file with content:
:created-at:<ISO timestamp>
:updated-at:<ISO timestamp>
:content-type:<MIME type>
Structure of a header section for a Subtext graph file with content, with example values:
:created-at:2024-09-29T19:22:43+02:00
:updated-at:2024-09-29T19:22:43+02:00
:content-type:text/vnd.subtext
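The canonical headers could be produced along the following lines. This is a hedged sketch: `createCanonicalHeaders` and `touchUpdatedAt` are hypothetical helper names, timestamps are taken from `Date.prototype.toISOString()` (which emits UTC rather than a local offset), and the sketch chooses to write a content-type header only when the type differs from the default.

```typescript
// Hypothetical in-memory representation of a header.
type Header = { key: string; value: string };

const DEFAULT_CONTENT_TYPE = "text/vnd.subtext";

// Builds the headers for a newly created Subtext graph file.
const createCanonicalHeaders = (
  now: Date,
  contentType: string = DEFAULT_CONTENT_TYPE,
): Header[] => {
  const timestamp = now.toISOString(); // ISO 8601, always in UTC
  const headers: Header[] = [
    { key: "created-at", value: timestamp },
    { key: "updated-at", value: timestamp },
  ];
  // content-type is only required for non-default content types.
  if (contentType !== DEFAULT_CONTENT_TYPE) {
    headers.push({ key: "content-type", value: contentType });
  }
  return headers;
};

// Returns a copy of the headers with a refreshed updated-at value.
// Headers without an updated-at entry are returned unchanged.
const touchUpdatedAt = (headers: Header[], now: Date): Header[] =>
  headers.map((header) =>
    header.key === "updated-at"
      ? { ...header, value: now.toISOString() }
      : header,
  );
```

An implementation would call `touchUpdatedAt` whenever the file's name, a header, or the content section changes.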
Content section
The content section can include any Unicode code points and MUST contain content of the Subtext graph file's content-type.
Slug
A slug is a string that identifies a Subtext graph file.
An implementation that wants to create a Subtext graph file MUST assign it a slug that is unique in the Subtext graph.
An implementation SHOULD map a slug to a file path relative to the graph directory when storing or retrieving the Subtext graph file for a specific slug. Slashes (/) in the slug SHOULD be interpreted as directory separators, so that a Subtext graph file that has a slug with a slash in it will be placed in a subdirectory of the graph directory.
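The mapping between slugs and relative file paths can be sketched as follows. The function names `slugToRelativePath` and `relativePathToSlug` are hypothetical, and the sketch assumes that relative paths use forward slashes, converting backslashes as they would appear on Windows.

```typescript
const SUBTEXT_EXTENSION = ".subtext";

// Slashes in the slug double as directory separators, so appending the
// extension is enough to obtain the path relative to the graph directory.
const slugToRelativePath = (slug: string): string =>
  `${slug}${SUBTEXT_EXTENSION}`;

// Inverse mapping: returns null for files that are not Subtext graph files.
const relativePathToSlug = (relativePath: string): string | null => {
  // Assumption: backslashes only occur as Windows directory separators.
  const normalized = relativePath.replace(/\\/g, "/");
  if (!normalized.endsWith(SUBTEXT_EXTENSION)) {
    return null;
  }
  const slug = normalized.slice(0, -SUBTEXT_EXTENSION.length);
  return slug.length > 0 ? slug : null;
};
```

For example, the slug `foo/bar` maps to `foo/bar.subtext`, a file named `bar.subtext` inside the subdirectory `foo` of the graph directory.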
Slug syntax
A slug MUST have a length of at least 1 and at most 200 Unicode code points.
This section is non-normative.
Note: Windows systems can handle up to 255 chars in a filename, but we truncate at 200 to leave a bit of room for the filename extension and possible future prefixes and suffixes.
A slug MUST match this ECMAScript regular expression:
/^[\p{L}\p{M}\d_][\p{L}\p{M}\d\-._]*((?<!\.)\/[\p{L}\p{M}\d\-_][\p{L}\p{M}\d\-._]*)*$/u
A slug MUST NOT contain two dots in direct succession (..).
A slug MUST NOT start or end with a dot (.).
A slug MUST NOT start with a dash (-).
A slug contains one or more slug segments. The segments can be obtained by separating the slug at every slash (/).
A slug segment MUST NOT start or end with a dot (.).
A slug segment MUST NOT start with a dash (-).
This section is non-normative.
Note: Examples of valid slugs:
foo
foo/bar
f-o-o/b-a-r
f/o/o/b/a/r
foo/bar.png
Examples of invalid slugs:
/foo
foo/
.foo
foo.
foo./bar
foo/.bar
-foo
Implementations MUST only use lower-case letters in a slug.
Implementations MUST disallow the usage of dots (.) in slugs that do not point towards arbitrary graph files.
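The syntax rules of this section can be combined into a single check. The following is a minimal sketch with the hypothetical function name `isValidSlug`. It applies the regular expression from above together with the length, dot, dash, and lower-case rules. It does not cover the rule about dots in slugs that do not point towards arbitrary graph files, since that depends on what the slug refers to.

```typescript
// The regular expression given in this section.
const SLUG_PATTERN =
  /^[\p{L}\p{M}\d_][\p{L}\p{M}\d\-._]*((?<!\.)\/[\p{L}\p{M}\d\-_][\p{L}\p{M}\d\-._]*)*$/u;

const isValidSlug = (slug: string): boolean => {
  // Length is counted in Unicode code points, not UTF-16 code units.
  const length = Array.from(slug).length;
  if (length < 1 || length > 200) {
    return false;
  }
  if (!SLUG_PATTERN.test(slug)) {
    return false;
  }
  if (slug.includes("..")) {
    return false;
  }
  // Only lower-case letters are allowed.
  if (slug !== slug.toLowerCase()) {
    return false;
  }
  // Per-segment rules: no leading or trailing dot, no leading dash.
  return slug
    .split("/")
    .every(
      (segment) =>
        !segment.startsWith(".") &&
        !segment.endsWith(".") &&
        !segment.startsWith("-"),
    );
};
```

The per-segment check is needed in addition to the regular expression, which on its own accepts slugs such as `foo.` or `foo/-bar`.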
Alias
An alias is a slug that points to another slug.
To create an alias, an implementation MUST create a Subtext graph file whose slug is the alias. This Subtext graph file MUST have a header with the key alias-of, and the value of the target slug.
A Subtext graph file that contains a header with the key alias-of SHOULD have no content section.
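As an illustration, creating and resolving an alias could look like the sketch below. The names `createAliasFileText` and `resolveSlug` are hypothetical, the graph is assumed to be available as a map from slug to parsed headers, and the sketch follows only a single alias-of step because this specification does not define how chains of aliases are handled.

```typescript
// Hypothetical in-memory representation of a header.
type Header = { key: string; value: string };

// File text for an alias: a single alias-of header and no content section.
const createAliasFileText = (targetSlug: string): string =>
  `:alias-of:${targetSlug}`;

// Returns the slug a given slug ultimately refers to, or null if no
// Subtext graph file exists for it. Only one alias step is followed.
const resolveSlug = (
  slug: string,
  headersBySlug: Map<string, Header[]>,
): string | null => {
  const headers = headersBySlug.get(slug);
  if (headers === undefined) {
    return null;
  }
  const aliasHeader = headers.find((header) => header.key === "alias-of");
  return aliasHeader !== undefined ? aliasHeader.value : slug;
};
```

A caller would typically check afterwards that the returned target slug itself exists in the graph.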
Arbitrary graph file
An arbitrary graph file is a file with a different filename extension than the one of a Subtext graph file. An arbitrary graph file is part of the Subtext graph.
Implementations MUST store an arbitrary graph file inside the graph directory or one of its subdirectories.
A Subtext graph file MUST be created to accompany and point towards the arbitrary graph file. This Subtext graph file MUST be stored in the same directory as the arbitrary graph file.
This section is non-normative.
Note: Technically, the Subtext graph file acts in this case as a sidecar file.
The arbitrary graph file's name MAY be the same as the slug of its accompanying Subtext graph file.
The arbitrary graph file's name SHOULD be the same as the last slug segment of the slug of its accompanying Subtext graph file.
This section is non-normative.
Note: It might be easier for implementations to enforce that the arbitrary graph file's name is the same as the last segment of its slug, because the implementation can then derive the filename from the slug and does not need to keep track of possible filename collisions in addition to slug collisions.
This section is non-normative.
Note: Example filenames for Subtext graph files that point to arbitrary graph files are: song.mp3.subtext pointing towards song.mp3, good-movie.subtext pointing towards movie-1234.mp4
To identify a Subtext graph file as a file that accompanies an arbitrary graph file, the Subtext graph file MUST have a header with the key file and the arbitrary graph file's name as value.
A Subtext graph file with a header with the key file MUST NOT have a content section.
This section is non-normative.
Note: Example header in an accompanying graph file:
:file:song.mp3
A Subtext graph file that has a header with the key file MUST also have a header with the key size and as value the size of the arbitrary graph file in bytes.
If a Subtext graph file has a file header but no size header, an implementation MUST ignore this Subtext graph file and the arbitrary graph file it points towards.
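The file and size rules can be sketched as a small classifier over parsed headers. All names here (`Classification`, `createSidecarHeaders`, `classifyGraphFile`) are hypothetical. The sketch also assumes that the size value is written as a plain decimal integer and treats any other value like a missing size header, which goes beyond what this specification states.

```typescript
// Hypothetical in-memory representation of a header.
type Header = { key: string; value: string };

type Classification =
  | { kind: "note" } // no file header: not an accompanying file
  | { kind: "sidecar"; filename: string; sizeInBytes: number }
  | { kind: "ignored" }; // file header without a usable size header

// Headers for a Subtext graph file that accompanies an arbitrary graph file.
const createSidecarHeaders = (
  filename: string,
  sizeInBytes: number,
): Header[] => [
  { key: "file", value: filename },
  { key: "size", value: String(sizeInBytes) },
];

const classifyGraphFile = (headers: Header[]): Classification => {
  const file = headers.find((header) => header.key === "file");
  if (file === undefined) {
    return { kind: "note" };
  }
  const size = headers.find((header) => header.key === "size");
  // Assumption: a size that is not a decimal integer counts as missing.
  if (size === undefined || !/^\d+$/.test(size.value)) {
    return { kind: "ignored" };
  }
  return {
    kind: "sidecar",
    filename: file.value,
    sizeInBytes: Number(size.value),
  };
};
```

A graph loader could skip every file classified as `ignored`, together with the arbitrary graph file it names.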
Creating a slug and a normalized filename for an arbitrary graph file
If an implementation wants to include an arbitrary graph file in the graph, the implementation MAY use the following algorithm to derive a slug and a normalized filename from the arbitrary graph file's original name:
/*********************
  Helper functions
*********************/

// Slugs are assumed to be plain strings here.
type Slug = string;

const getExtensionFromFilename = (filename: string): string | null => {
  const posOfDot = filename.lastIndexOf(".");
  if (posOfDot === -1) {
    return null;
  }

  const extension = filename.substring(posOfDot + 1).toLowerCase();
  if (extension.length === 0) {
    return null;
  }

  return extension;
};

const removeExtensionFromFilename = (filename: string): string => {
  const posOfDot = filename.lastIndexOf(".");
  if (posOfDot === -1) {
    return filename;
  }

  return filename.substring(0, posOfDot);
};

const sluggifyFilename = (filename: string): string => {
  return filename
    // Trim leading/trailing whitespace
    .trim()
    // Remove apostrophes entirely instead of replacing them with dashes
    .replace(/['’]+/g, "")
    // Replace invalid chars with dashes.
    .replace(/[^\p{L}\p{M}\d\-._]+/gu, "-")
    // Replace runs of one or more dashes with a single dash
    .replace(/-+/g, "-")
    // Remove initial dot from dotfiles
    .replace(/^\./g, "")
    .toLowerCase()
    // Remove leading and trailing dashes
    .replace(/^-+/, "")
    .replace(/-+$/, "");
};

/*********************
  Main function
*********************/

const getSlugAndNameForNewArbitraryFile = (
  namespace: string, // e.g. "files" for "files/image.png"
  originalFilename: string,
  existingSlugs: Set<Slug>,
): { slug: Slug, filename: string } => {
  const extension = getExtensionFromFilename(originalFilename);
  const originalFilenameWithoutExtension = removeExtensionFromFilename(
    originalFilename,
  );
  const sluggifiedFileStem = sluggifyFilename(originalFilenameWithoutExtension);

  // Append an increasing integer suffix until the slug is unused.
  let n = 1;

  while (true) {
    const showIntegerSuffix = n > 1;
    const stemWithOptionalIntegerSuffix = showIntegerSuffix
      ? `${sluggifiedFileStem}-${n}`
      : sluggifiedFileStem;

    const filename = stemWithOptionalIntegerSuffix
      + (
        extension
          ? (stemWithOptionalIntegerSuffix ? "." : "")
            + extension.trim().toLowerCase()
          : ""
      );

    const slug: Slug = `${namespace}/${filename}`;

    if (!existingSlugs.has(slug)) {
      return { slug, filename };
    }
    n++;
  }
};
Subtext
Subtext markup, as defined in https://github.com/polyrainbow/subtext/, is the default content type for content sections.
Implementations MUST incorporate a Subtext parser to be able to evaluate the edges of the Subtext graph.
Interpretation of slashlinks
Implementations MUST interpret the value of a Subtext slashlink as a slug and, if the entity that the slug refers to exists, interpret this slashlink as an edge of the Subtext graph.
Interpretation of wikilinks
Implementations MUST resolve a Wikilink value to a slug and, if the entity that the slug refers to exists, interpret this Wikilink as an edge of the Subtext graph.
This section is non-normative.
Note: Wikilinks can only point to other notes or aliases, but not arbitrary graph files, because dots are replaced when resolving the slug from the Wikilink value. Dots are a common symbol used in Wikilink texts, so it is not desirable to leave them as-is when resolving a slug.
Wikilink slug resolver algorithm
Implementations MUST use the following algorithm to resolve a slug from a Wikilink value:
/*
  We will replace dots with dashes, as we do not allow these chars in note
  slugs (even though they are generally allowed in slugs).
  As a consequence, this means that uploaded files with dots in slugs
  (like `files/image.png`) cannot be referenced via a Wikilink.
  Also, it will replace series of multiple slashes (//, ///, ...)
  with single slashes (/).
  In order to link to nested note slugs, we have to use "//" as separator,
  e.g. [[Person//Alice A.]]
*/
const sluggifyWikilinkText = (text: string): string => {
  return text
    // Trim leading/trailing whitespace
    .trim()
    // Remove apostrophes entirely instead of replacing them with dashes
    .replace(/['’]+/g, "")
    // Replace invalid chars with dashes. Keep / for processing afterwards
    .replace(/[^\p{L}\p{M}\d\-_/]+/gu, "-")
    // Replace single slashes with dashes
    .replace(/(?<!\/)\/(?!\/)/g, "-")
    // Replace multiple slashes (//, ///, ...) with /
    .replace(/\/\/+/g, "/")
    // Replace runs of one or more dashes with a single dash
    .replace(/-+/g, "-")
    .toLowerCase()
    // Remove leading and trailing dashes
    .replace(/^-+/, "")
    .replace(/-+$/, "");
};
Additional implementation file
An additional implementation file is a file in the graph directory that is neither a Subtext graph file, nor an arbitrary graph file.
Implementations MAY store additional implementation files in the graph directory.
Such files MUST NOT have the filename extension .subtext and there MUST NOT be a Subtext graph file with the same slug as the additional implementation file's name.
This section is non-normative.
Note: As an example, an application might want to store the favorite notes of a user inside the graph directory. It could do that by creating a file named favorites.txt. The application then needs to take care that no arbitrary graph file with the same slug is created. The application might also want to use dotfiles (e.g. .favorites) for this use case. Since dots (.) are disallowed at the beginning of a slug, there is no danger of a collision with a slug.
Newline character
Newline characters are \n characters. \r characters are ignored.
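One way to honor this rule is to drop every \r before any further processing. The sketch below uses the hypothetical helper names `stripCarriageReturns` and `splitIntoLines`, and assumes that "ignored" may be implemented by removing the characters from the text.

```typescript
// Removes every carriage return so that only "\n" acts as a newline.
const stripCarriageReturns = (text: string): string =>
  text.replace(/\r/g, "");

// Splits a file's text into lines after carriage returns have been dropped.
const splitIntoLines = (text: string): string[] =>
  stripCarriageReturns(text).split("\n");
```

With this approach, a file saved with Windows line endings (\r\n) yields the same lines as one saved with \n only.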
License
CC-BY-SA 4.0