From d6a7009656be7cc8bd2b8f449835799e97fd7020 Mon Sep 17 00:00:00 2001 From: gleachkr Date: Sat, 18 Dec 2021 14:45:45 -0600 Subject: [PATCH 01/22] Create xxxx-resource-markup.md Initial commit before PR --- proposals/xxxx-resource-markup.md | 157 ++++++++++++++++++++++++++++++ 1 file changed, 157 insertions(+) create mode 100644 proposals/xxxx-resource-markup.md diff --git a/proposals/xxxx-resource-markup.md b/proposals/xxxx-resource-markup.md new file mode 100644 index 00000000000..ab6f48913ec --- /dev/null +++ b/proposals/xxxx-resource-markup.md @@ -0,0 +1,157 @@ +# Marking up resources + +This MSC proposes a way to annotate and discuss various resources (web pages, documents, videos, and other files) using Matrix. The general idea is to use [Spaces (MSC1772)](https://github.com/matrix-org/matrix-doc/pull/1772) to represent a general resource to be annotated, and then a combination of child rooms and [Threads (MSC3440)](https://github.com/matrix-org/matrix-doc/blob/gsouquet/threading-via-relations/proposals/3440-threading-via-relations.md) to represent annotations and discussion. This MSC specifies: + +* Additional data in the `m.room.create` event to mark a space as describing a resource to be annotated. +* Additional (optional) data in the `m.room.child` and `m.room.parent` events to mark sections of the resource (pages, timestamps, etc.) that are being discussed by the child room. The specific format of the location data is resource-specific, and will be described in further MSCs. +* An annotation event that is used within child rooms. The specific data describing the annotation location is once again resource-specific, and will be described in further MSCs. + +# Proposal + +## Additional data in `m.room.create` + +A space will be considered a *resource* if its creation event includes a key `m.markup.resource`. + +The `m.markup.resource` value MUST include either: + +1. 
an `m.file` key, populated according to the `m.file` schema as presented in [Extensible Events - Files (MSC3551)](https://github.com/matrix-org/matrix-doc/blob/travis/msc/extev/files/proposals/3551-extensible-events-files.md), or +2. a `url` and `mimetype` key. This format is preferred for potentially mutable resources (like web pages with dynamic content) or for resources that require multiple network requests to display properly. + +Clients should recognize that a `url` subordinate to an `m.markup.resource` (including within an `m.file` value) may contain URI schemes other than `mxc`. It may contain `http(s)`, and may ultimately contain other schemes in the future. Clients handling `m.markup.resource` should be prepared to fail gracefully upon encountering an unrecognized scheme. + +An optional `md5_hash` key may be included. If present, this key should be populated by an md5 hash of the resource, for file-integrity checking. + +### Examples + +#### A hypothetical web resource + +``` +{ + "type": "m.room.create", + "state_key": "", + "content": { + "creator": "@example:example.org", + "m.federate": true, + "room_version": "7", + "m.markup.resource": { + "url": "https://danilafe.com/blog/introducing_highlight/", + "mimetype": "text/html" + } + } + } +} +``` + + +## Additional data in `m.room.child` and `m.room.parent` + +Children of resources will be considered *conversations concerning* the resource. For purposes of discoverability, it may sometimes be helpful to attach additional data to the content of `m.space.child` and `m.space.parent` events, in order to indicate a specific part of the resource that the conversation is based upon. The location of the part of the resource that the conversation is based upon will be indicated by the value of an `m.markup.location` key within the contents of the `m.space.child` and/or `m.space.parent` event. + +Different mimetypes will require different notions of "location". 
A need for new notions of location may become evident over time. For example, PDF annotations may at first only need to specify highlighted regions and then, at a later date, pindrop locations. One location might also reasonably be presented in two or more different ways. For example, in a PDF, a location might be presented both as coordinates designating a region of a page, and as a tag or set of tags with offsets for use with a screen reader. In an audio file, a location might be presented both as a pair of bounding timestamps and as a pair of offsets within the text of embedded lyrics. + +Hence, the `m.markup.location` value MUST be an object, whose keys are different kinds of locations occupied by a single annotation, with the names of those locations either formalized in the Matrix spec or namespaced using Java package naming conventions. + +### Examples + +#### A hypothetical audio annotation: + +``` +{ + "type": "m.space.child", + "state_key": "!abcd:example.com", + "content": { + "via": ["example.com", "test.org"], + "m.markup.location": { + "m.markup.audio_timespan": { + "begin": 0, + "end": 31983 + }, + "com.genius.markup.lyrics": { + "begin": 0, + "end": 35 + } + } + } +} +``` + +## Annotation Message Events + +It may be desirable, within a conversation concerning a resource, to make reference to some part of the resource. Annotation message events make this possible. + +An annotation message event will treat `m.markup` as an extensible event schema following [Extensible events (MSC1767)](https://github.com/matrix-org/matrix-doc/pull/1767), but the message will ordinarily include an `m.text` value with text optionally describing the annotation as a fallback. The `m.markup` value will consist of an `m.markup.location`, and an `m.markup.parent` that indicates the room ID of the resource with which the annotation message is associated. (The latter is necessary when a room has more than one parent resource.) 
Until migration to extensible events is complete, annotations will be sent as messages of the type `m.room.message`, for compatibility with non-annotation-aware clients. + +### Examples + +#### An annotation prior to MSC1767 adoption + + +``` +{ + "type": "m.room.message", + "content": { + "msgtype": "m.emote", + "body": "created an annotation", + "org.matrix.msc1767.text": "created an annotation", + "m.markup": { + "m.markup.location": {..}, + "m.markup.parent": "!WKZqabcAWoDDNZzupv:matrix.org" + } + } +} +``` + +#### An annotation after MSC1767 adoption and migration + + +``` +{ + "type": "m.markup", + "content": { + "m.text": "created an annotation", + "m.emote": {}, + "m.markup": { + "m.markup.location": {..}, + "m.markup.parent": "!WKZqabcAWoDDNZzupv:matrix.org" + } + } +} +``` + +# Potential Issues + +There's no notion of "ownership" for state events---anyone who can send `m.space.parent` events can overwrite `m.space.parent` events sent by others. So anyone who can create a conversation concerning a certain resource can also remove conversations created by others. Clients can partly mitigate this by at least discouraging accidental deletions and encouraging courtesy. A more robust mitigation might be to introduce subspaces of resources, within which less-trusted users could still create conversations concerning a given resource. However, this seems undesirably complicated for an initial implementation. If it turns out to be necessary in practice, it could be added in a future MSC. + +# Alternatives + +## Greater generality + +The idea of attaching conversations to locations might be construed even more broadly, to incorporate spaces representing resources that aren't easily associated with mimetypes and URLs. For example, someone might want to create a space with rooms located at some sort of geospatial region, or located during some time-slice of an event. + +However, these more abstract cases can be subsumed under the design here. 
Geospatial data can be represented using something like [GeoJSON](https://en.wikipedia.org/wiki/GeoJSON) or some other standard, and time-slices of events can be represented as locations within a recording of the event (or locations within some other representation of the event, if no recording is available). + +## Resources as a space type or subtype + +Resources could be designated as such using an `m.purpose` event, as in [Room subtyping (MSC3088)](https://github.com/matrix-org/matrix-doc/blob/travis/msc/mutable-subtypes/proposals/3088-room-subtyping.md), or with an `m.room.type` event as in [Room Types (MSC1840)](https://github.com/matrix-org/matrix-doc/pull/1840). + +However, + +1. Indicating an associated resource in the room creation event makes it possible to inspect an invitation to a new space, allowing annotation-oriented clients to ignore irrelevant invitations. +2. If `m.purpose` or `m.room.type` are integrated into the spec and turn out to be useful for, e.g., filtering, then it would be straightforward to designate one or more `m.purpose` values or `m.room.type` values for resource rooms. + +## Standalone `m.annotation.location` state events + +Rather than being represented by `m.space.child` events, annotations that open a conversation concerning a part of a resource could be introduced as a new kind of state event. This has the disadvantage of not making relationships between a resource and conversations about its parts visible to clients which are space-aware but not annotation-aware. + +# Security Considerations + +None. 
+ +# Unstable Prefix + +| Proposed Final Identifier | Purpose | Development Identifier | +| ------------------------- | ---------------------------------------------------------- | ----------------------------------------- | +| `m.markup.location` | key in `m.space.child`, `m.space.parent` and `m.annotation`| `com.open-tower.mscXXX.markup.location` | +| `m.markup.resource` | key in `m.create` | `com.open-tower.mscXXX.markup.resource` | +| `m.markup` | extensible event schema | `com.open-tower.mscXXX.markup` | +| `m.markup.parent` | key in `m.annotation` | `com.open-tower.mscXXX.markup.parent` | From bf4c2ab1905d703b09e3a78aafa508702b633464 Mon Sep 17 00:00:00 2001 From: gleachkr Date: Sat, 18 Dec 2021 14:52:01 -0600 Subject: [PATCH 02/22] Fix msc numbering --- .../{xxxx-resource-markup.md => 3574-resource-markup.md} | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) rename proposals/{xxxx-resource-markup.md => 3574-resource-markup.md} (97%) diff --git a/proposals/xxxx-resource-markup.md b/proposals/3574-resource-markup.md similarity index 97% rename from proposals/xxxx-resource-markup.md rename to proposals/3574-resource-markup.md index ab6f48913ec..b376d758a8b 100644 --- a/proposals/xxxx-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -151,7 +151,7 @@ None. 
| Proposed Final Identifier | Purpose | Development Identifier | | ------------------------- | ---------------------------------------------------------- | ----------------------------------------- | -| `m.markup.location` | key in `m.space.child`, `m.space.parent` and `m.annotation`| `com.open-tower.mscXXX.markup.location` | -| `m.markup.resource` | key in `m.create` | `com.open-tower.mscXXX.markup.resource` | -| `m.markup` | extensible event schema | `com.open-tower.mscXXX.markup` | -| `m.markup.parent` | key in `m.annotation` | `com.open-tower.mscXXX.markup.parent` | +| `m.markup.location` | key in `m.space.child`, `m.space.parent` and `m.annotation`| `com.open-tower.msc3574.markup.location` | +| `m.markup.resource` | key in `m.create` | `com.open-tower.msc3574.markup.resource` | +| `m.markup` | extensible event schema | `com.open-tower.msc3574.markup` | +| `m.markup.parent` | key in `m.annotation` | `com.open-tower.msc3574.markup.parent` | From 187de9e6c290c967107040fdd38f65fcd292a9b3 Mon Sep 17 00:00:00 2001 From: gleachkr Date: Sat, 18 Dec 2021 15:01:02 -0600 Subject: [PATCH 03/22] Replace old reference to threads This approach fell away during drafting --- proposals/3574-resource-markup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md index b376d758a8b..bd479b80319 100644 --- a/proposals/3574-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -1,6 +1,6 @@ # Marking up resources -This MSC proposes a way to annotate and discuss various resources (web pages, documents, videos, and other files) using Matrix. 
The general idea is to use [Spaces (MSC1772)](https://github.com/matrix-org/matrix-doc/pull/1772) to represent a general resource to be annotated, and then a combination of child rooms and [Threads (MSC3440)](https://github.com/matrix-org/matrix-doc/blob/gsouquet/threading-via-relations/proposals/3440-threading-via-relations.md) to represent annotations and discussion. This MSC specifies: +This MSC proposes a way to annotate and discuss various resources (web pages, documents, videos, and other files) using Matrix. The general idea is to use [Spaces (MSC1772)](https://github.com/matrix-org/matrix-doc/pull/1772) to represent a general resource to be annotated, and then a combination of child rooms and [Extensible Events (MSC1767)](https://github.com/matrix-org/matrix-doc/blob/matthew/msc1767/proposals/1767-extensible-events.md) to represent annotations and discussion. This MSC specifies: * Additional data in the `m.room.create` event to mark a space as describing a resource to be annotated. * Additional (optional) data in the `m.room.child` and `m.room.parent` events to mark sections of the resource (pages, timestamps, etc.) that are being discussed by the child room. The specific format of the location data is resource-specific, and will be described in further MSCs. From 767928c1c624b9e4d06fb7f6c8c107ac562e78d8 Mon Sep 17 00:00:00 2001 From: gleachkr Date: Sat, 18 Dec 2021 15:13:11 -0600 Subject: [PATCH 04/22] Switch to sha256 for integrity checking --- proposals/3574-resource-markup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md index bd479b80319..1cf61ce7419 100644 --- a/proposals/3574-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -19,7 +19,7 @@ The `m.markup.resource` value MUST include either: Clients should recognize that a `url` subordinate to an `m.markup.resource` (including within an `m.file` value) may contain URI schemes other than `mxc`. 
It may contain `http(s)`, and may ultimately contain other schemes in the future. Clients handling `m.markup.resource` should be prepared to fail gracefully upon encountering an unrecognized scheme. -An optional `md5_hash` key may be included. If present, this key should be populated by an md5 hash of the resource, for file-integrity checking. +An optional `sha256_hash` key may be included. If present, this key should be populated by a sha256 hash of the resource, for file-integrity checking. ### Examples From 877c0107d6f4548b34bbc1f643a52dcc39cd1c57 Mon Sep 17 00:00:00 2001 From: gleachkr Date: Sun, 19 Dec 2021 11:07:13 -0600 Subject: [PATCH 05/22] Fix some event names --- proposals/3574-resource-markup.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md index 1cf61ce7419..5f8fd0860a3 100644 --- a/proposals/3574-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -3,7 +3,7 @@ This MSC proposes a way to annotate and discuss various resources (web pages, documents, videos, and other files) using Matrix. The general idea is to use [Spaces (MSC1772)](https://github.com/matrix-org/matrix-doc/pull/1772) to represent a general resource to be annotated, and then a combination of child rooms and [Extensible Events (MSC1767)](https://github.com/matrix-org/matrix-doc/blob/matthew/msc1767/proposals/1767-extensible-events.md) to represent annotations and discussion. This MSC specifies: * Additional data in the `m.room.create` event to mark a space as describing a resource to be annotated. -* Additional (optional) data in the `m.room.child` and `m.room.parent` events to mark sections of the resource (pages, timestamps, etc.) that are being discussed by the child room. The specific format of the location data is resource-specific, and will be described in further MSCs. 
+* Additional (optional) data in the `m.space.child` and `m.space.parent` events to mark sections of the resource (pages, timestamps, etc.) that are being discussed by the child room. The specific format of the location data is resource-specific, and will be described in further MSCs. * An annotation event that is used within child rooms. The specific data describing the annotation location is once again resource-specific, and will be described in further MSCs. # Proposal @@ -43,7 +43,7 @@ An optional `sha256_hash` key may be included. If present, this key should be po ``` -## Additional data in `m.room.child` and `m.room.parent` +## Additional data in `m.space.child` and `m.space.parent` Children of resources will be considered *conversations concerning* the resource. For purposes of discoverability, it may sometimes be helpful to attach additional data to the content of `m.space.child` and `m.space.parent` events, in order to indicate a specific part of the resource that the conversation is based upon. The location of the part of the resource that the conversation is based upon will be indicated by the value of an `m.markup.location` key within the contents of the `m.space.child` and/or `m.space.parent` event. 
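
The `m.markup.location` rule touched by the patch above (an object whose keys name kinds of location, each either spec-defined or namespaced Java-style) could be checked on the client side roughly as follows. This is only a sketch and not part of the MSC: the helper names `is_valid_location` and `space_child_content` are hypothetical, and the regular expression is an assumed approximation of "dotted, namespaced key", since the proposal does not define one.

```python
import re

# Assumed approximation of a namespaced key such as "m.markup.audio_timespan"
# or "com.genius.markup.lyrics": at least two dot-separated lowercase segments.
NAMESPACED_KEY = re.compile(r"[a-z][a-z0-9_-]*(\.[a-z0-9_-]+)+")


def is_valid_location(location):
    """Return True if `location` is an object whose keys all look namespaced."""
    if not isinstance(location, dict):
        return False
    return all(
        isinstance(key, str) and NAMESPACED_KEY.fullmatch(key) is not None
        for key in location
    )


def space_child_content(via, location=None):
    """Build m.space.child content, optionally carrying m.markup.location."""
    content = {"via": list(via)}
    if location is not None:
        if not is_valid_location(location):
            raise ValueError("m.markup.location must be an object with namespaced keys")
        content["m.markup.location"] = location
    return content


if __name__ == "__main__":
    # Mirrors the hypothetical audio annotation from the proposal text.
    print(space_child_content(
        ["example.com", "test.org"],
        {
            "m.markup.audio_timespan": {"begin": 0, "end": 31983},
            "com.genius.markup.lyrics": {"begin": 0, "end": 35},
        },
    ))
```

Because the location is optional, a plain space-aware client can ignore the extra key and still treat the event as an ordinary `m.space.child`.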
From c7778d698caad8144dfb4747d74606a10bfa3cf0 Mon Sep 17 00:00:00 2001 From: gleachkr Date: Sun, 19 Dec 2021 16:59:31 -0600 Subject: [PATCH 06/22] First revision * add discussions as threads alternative * note possibility of a widget interface --- proposals/3574-resource-markup.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md index 5f8fd0860a3..97d9879f8dc 100644 --- a/proposals/3574-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -6,6 +6,8 @@ This MSC proposes a way to annotate and discuss various resources (web pages, do * Additional (optional) data in the `m.space.child` and `m.space.parent` events to mark sections of the resource (pages, timestamps, etc.) that are being discussed by the child room. The specific format of the location data is resource-specific, and will be described in further MSCs. * An annotation event that is used within child rooms. The specific data describing the annotation location is once again resource-specific, and will be described in further MSCs. +Resources and markup can be displayed in specialized annotation-aware clients, and potentially in a widget or widget-like interface for widget-compatible clients, although the latter possibility remains speculative. The use of extensible events and standard space events should provide for a reasonable degree of compatibility with general-purpose Matrix clients. + # Proposal ## Additional data in `m.room.create` @@ -143,6 +145,21 @@ However, Rather than being represented by `m.space.child` events, annotations that open a conversation concerning a part of a resource could be introduced as a new kind of state event. This has the disadvantage of not making relationships between a resource and conversations about its parts visible to clients which are space-aware but not annotation-aware. 
+## Discussions as threads + +Discussions concerning a part of a resource could be modeled as threads rooted in `m.markup` message events, using [Threading via `m.thread` relation (MSC3440)](https://github.com/matrix-org/matrix-doc/pull/3440). The current proposal is intended to be compatible with clients that model discussions this way, since those clients are free to build threads rooted in `m.markup` events and display these however they like. However, an alternative approach to marking up resources would be to *only* introduce `m.markup` events, and expect all clients to follow a thread-based model. + +Instead, the current proposal also provides the option of modeling discussions concerning a resource as standalone rooms. There are a number of advantages to this choice. Discussions modeled as rooms inherit: + +* Unread counts and read-up-to markers +* Typing notifications +* Fine-grained access controls (e.g., for a private discussion of a part of a resource) +* Conversation threading (since MSC3440 doesn't allow nested threads) + +among other things. + +The main disadvantage to allowing both models seems to be the possibility of fragmentation and incompatibility between annotation-aware clients. In practice, this seems unlikely to be a major problem. + # Security Considerations None. 
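
Whichever discussion model a client adopts, it first has to decide whether a space is a resource at all. The `m.markup.resource` rule from the Proposal section (either an `m.file` key, or a `url` plus `mimetype`, with graceful failure on unrecognized URI schemes) might be handled along these lines. This is a minimal sketch under stated assumptions: `resource_url` and `KNOWN_SCHEMES` are hypothetical names, and returning `None` is just one possible reading of "fail gracefully".

```python
from urllib.parse import urlsplit

# Schemes this sketch can handle; the proposal mentions mxc and http(s)
# and leaves room for others in the future.
KNOWN_SCHEMES = {"mxc", "http", "https"}


def resource_url(resource):
    """Return the URL of an m.markup.resource value.

    Returns None (rather than raising) when the value is malformed or
    uses a URI scheme this client does not recognize.
    """
    if not isinstance(resource, dict):
        return None
    if "m.file" in resource:
        file_info = resource["m.file"]
        url = file_info.get("url") if isinstance(file_info, dict) else None
    elif "url" in resource and "mimetype" in resource:
        url = resource["url"]
    else:
        return None
    if not isinstance(url, str):
        return None
    try:
        scheme = urlsplit(url).scheme.lower()
    except ValueError:
        # urlsplit rejects some malformed inputs; treat them as unrecognized.
        return None
    return url if scheme in KNOWN_SCHEMES else None


if __name__ == "__main__":
    create_content = {
        "m.markup.resource": {
            "url": "https://danilafe.com/blog/introducing_highlight/",
            "mimetype": "text/html",
        }
    }
    print(resource_url(create_content["m.markup.resource"]))
```

A client that gets `None` back can still show the space as an ordinary space, which keeps the fallback behaviour for non-annotation-aware clients intact.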
From 49b6c2b62516baeeae008517a349b71745c7536e Mon Sep 17 00:00:00 2001 From: gleachkr Date: Sat, 18 Dec 2021 14:45:45 -0600 Subject: [PATCH 07/22] Create xxxx-resource-markup.md Initial commit before PR --- proposals/xxxx-resource-markup.md | 157 ++++++++++++++++++++++++++++++ 1 file changed, 157 insertions(+) create mode 100644 proposals/xxxx-resource-markup.md diff --git a/proposals/xxxx-resource-markup.md b/proposals/xxxx-resource-markup.md new file mode 100644 index 00000000000..ab6f48913ec --- /dev/null +++ b/proposals/xxxx-resource-markup.md @@ -0,0 +1,157 @@ +# Marking up resources + +This MSC proposes a way to annotate and discuss various resources (web pages, documents, videos, and other files) using Matrix. The general idea is to use [Spaces (MSC1772)](https://github.com/matrix-org/matrix-doc/pull/1772) to represent a general resource to be annotated, and then a combination of child rooms and [Threads (MSC3440)](https://github.com/matrix-org/matrix-doc/blob/gsouquet/threading-via-relations/proposals/3440-threading-via-relations.md) to represent annotations and discussion. This MSC specifies: + +* Additional data in the `m.room.create` event to mark a space as describing a resource to be annotated. +* Additional (optional) data in the `m.room.child` and `m.room.parent` events to mark sections of the resource (pages, timestamps, etc.) that are being discussed by the child room. The specific format of the location data is resource-specific, and will be described in further MSCs. +* An annotation event that is used within child rooms. The specific data describing the annotation location is once again resource-specific, and will be described in further MSCs. + +# Proposal + +## Additional data in `m.room.create` + +A space will be considered a *resource* if its creation event includes a key `m.markup.resource`. + +The `m.markup.resource` value MUST include either: + +1. 
an `m.file` key, populated according to the `m.file` schema as presented in [Extensible Events - Files (MSC3551)](https://github.com/matrix-org/matrix-doc/blob/travis/msc/extev/files/proposals/3551-extensible-events-files.md), or +2. a `url` and `mimetype` key. This format is preferred for potentially mutable resources (like web pages with dynamic content) or for resources that require multiple network requests to display properly. + +Clients should recognize that a `url` subordinate to an `m.markup.resource` (including within an `m.file` value) may contain URI schemes other than `mxc`. It may contain `http(s)`, and may ultimately contain other schemes in the future. Clients handling `m.markup.resource` should be prepared to fail gracefully upon encountering an unrecognized scheme. + +An optional `md5_hash` key may be included. If present, this key should be populated by an md5 hash of the resource, for file-integrity checking. + +### Examples + +#### A hypothetical web resource + +``` +{ + "type": "m.room.create", + "state_key": "", + "content": { + "creator": "@example:example.org", + "m.federate": true, + "room_version": "7", + "m.markup.resource": { + "url": "https://danilafe.com/blog/introducing_highlight/", + "mimetype": "text/html" + } + } + } +} +``` + + +## Additional data in `m.room.child` and `m.room.parent` + +Children of resources will be considered *conversations concerning* the resource. For purposes of discoverability, it may sometimes be helpful to attach additional data to the content of `m.space.child` and `m.space.parent` events, in order to indicate a specific part of the resource that the conversation is based upon. The location of the part of the resource that the conversation is based upon will be indicated by the value of an `m.markup.location` key within the contents of the `m.space.child` and/or `m.space.parent` event. + +Different mimetypes will require different notions of "location". 
A need for new notions of location may become evident over time. For example, PDF annotations may at first only need to specify highlighted regions and then, at a later date, pindrop locations. One location might also reasonably be presented in two or more different ways. For example, in a PDF, a location might be presented both as coordinates designating a region of a page, and as a tag or set of tags with offsets for use with a screen reader. In an audio file, a location might be presented both as a pair of bounding timestamps and as a pair of offsets within the text of embedded lyrics. + +Hence, the `m.markup.location` value MUST be an object, whose keys are different kinds of locations occupied by a single annotation, with the names of those locations either formalized in the Matrix spec or namespaced using Java package naming conventions. + +### Examples + +#### A hypothetical audio annotation: + +``` +{ + "type": "m.space.child", + "state_key": "!abcd:example.com", + "content": { + "via": ["example.com", "test.org"], + "m.markup.location": { + "m.markup.audio_timespan": { + "begin": 0, + "end": 31983 + }, + "com.genius.markup.lyrics": { + "begin": 0, + "end": 35 + } + } + } +} +``` + +## Annotation Message Events + +It may be desirable, within a conversation concerning a resource, to make reference to some part of the resource. Annotation message events make this possible. + +An annotation message event will treat `m.markup` as an extensible event schema following [Extensible events (MSC1767)](https://github.com/matrix-org/matrix-doc/pull/1767), but the message will ordinarily include an `m.text` value with text optionally describing the annotation as a fallback. The `m.markup` value will consist of an `m.markup.location`, and an `m.markup.parent` that indicates the room ID of the resource with which the annotation message is associated. (The latter is necessary when a room has more than one parent resource.) 
Until migration to extensible events is complete, annotations will be sent as messages of the type `m.room.message`, for compatibility with non-annotation-aware clients. + +### Examples + +#### An annotation prior to MSC1767 adoption + + +``` +{ + "type": "m.room.message", + "content": { + "msgtype": "m.emote", + "body": "created an annotation", + "org.matrix.msc1767.text": "created an annotation", + "m.markup": { + "m.markup.location": {..}, + "m.markup.parent": "!WKZqabcAWoDDNZzupv:matrix.org" + } + } +} +``` + +#### An annotation after MSC1767 adoption and migration + + +``` +{ + "type": "m.markup", + "content": { + "m.text": "created an annotation", + "m.emote": {}, + "m.markup": { + "m.markup.location": {..}, + "m.markup.parent": "!WKZqabcAWoDDNZzupv:matrix.org" + } + } +} +``` + +# Potential Issues + +There's no notion of "ownership" for state events---anyone who can send `m.space.parent` events can overwrite `m.space.parent` events sent by others. So anyone who can create a conversation concerning a certain resource can also remove conversations created by others. Clients can partly mitigate this by at least discouraging accidental deletions and encouraging courtesy. A more robust mitigation might be to introduce subspaces of resources, within which less-trusted users could still create conversations concerning a given resource. However, this seems undesirably complicated for an initial implementation. If it turns out to be necessary in practice, it could be added in a future MSC. + +# Alternatives + +## Greater generality + +The idea of attaching conversations to locations might be construed even more broadly, to incorporate spaces representing resources that aren't easily associated with mimetypes and URLs. For example, someone might want to create a space with rooms located at some sort of geospatial region, or located during some time-slice of an event. + +However, these more abstract cases can be subsumed under the design here. 
Geospatial data can be represented using something like [GeoJSON](https://en.wikipedia.org/wiki/GeoJSON) or some other standard, and time-slices of events can be represented as locations within a recording of the event (or locations within some other representation of the event, if no recording is available). + +## Resources as a space type or subtype + +Resources could be designated as such using an `m.purpose` event, as in [Room subtyping (MSC3088)](https://github.com/matrix-org/matrix-doc/blob/travis/msc/mutable-subtypes/proposals/3088-room-subtyping.md), or with an `m.room.type` event as in [Room Types (MSC1840)](https://github.com/matrix-org/matrix-doc/pull/1840). + +However, + +1. Indicating an associated resource in the room creation event makes it possible to inspect an invitation to a new space, allowing annotation-oriented clients to ignore irrelevant invitations. +2. If `m.purpose` or `m.room.type` are integrated into the spec and turn out to be useful for, e.g., filtering, then it would be straightforward to designate one or more `m.purpose` values or `m.room.type` values for resource rooms. + +## Standalone `m.annotation.location` state events + +Rather than being represented by `m.space.child` events, annotations that open a conversation concerning a part of a resource could be introduced as a new kind of state event. This has the disadvantage of not making relationships between a resource and conversations about its parts visible to clients which are space-aware but not annotation-aware. + +# Security Considerations + +None. 
+ +# Unstable Prefix + +| Proposed Final Identifier | Purpose | Development Identifier | +| ------------------------- | ---------------------------------------------------------- | ----------------------------------------- | +| `m.markup.location` | key in `m.space.child`, `m.space.parent` and `m.annotation`| `com.open-tower.mscXXX.markup.location` | +| `m.markup.resource` | key in `m.create` | `com.open-tower.mscXXX.markup.resource` | +| `m.markup` | extensible event schema | `com.open-tower.mscXXX.markup` | +| `m.markup.parent` | key in `m.annotation` | `com.open-tower.mscXXX.markup.parent` | From c95d52eb28487ff906687fa9c25eb9ea4665d8d8 Mon Sep 17 00:00:00 2001 From: gleachkr Date: Sat, 18 Dec 2021 14:52:01 -0600 Subject: [PATCH 08/22] Fix msc numbering --- .../{xxxx-resource-markup.md => 3574-resource-markup.md} | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) rename proposals/{xxxx-resource-markup.md => 3574-resource-markup.md} (97%) diff --git a/proposals/xxxx-resource-markup.md b/proposals/3574-resource-markup.md similarity index 97% rename from proposals/xxxx-resource-markup.md rename to proposals/3574-resource-markup.md index ab6f48913ec..b376d758a8b 100644 --- a/proposals/xxxx-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -151,7 +151,7 @@ None. 
| Proposed Final Identifier | Purpose | Development Identifier | | ------------------------- | ---------------------------------------------------------- | ----------------------------------------- | -| `m.markup.location` | key in `m.space.child`, `m.space.parent` and `m.annotation`| `com.open-tower.mscXXX.markup.location` | -| `m.markup.resource` | key in `m.create` | `com.open-tower.mscXXX.markup.resource` | -| `m.markup` | extensible event schema | `com.open-tower.mscXXX.markup` | -| `m.markup.parent` | key in `m.annotation` | `com.open-tower.mscXXX.markup.parent` | +| `m.markup.location` | key in `m.space.child`, `m.space.parent` and `m.annotation`| `com.open-tower.msc3574.markup.location` | +| `m.markup.resource` | key in `m.create` | `com.open-tower.msc3574.markup.resource` | +| `m.markup` | extensible event schema | `com.open-tower.msc3574.markup` | +| `m.markup.parent` | key in `m.annotation` | `com.open-tower.msc3574.markup.parent` | From 395c8a926f74ec8ceb2ccb1144bcef294f66ca10 Mon Sep 17 00:00:00 2001 From: gleachkr Date: Sat, 18 Dec 2021 15:01:02 -0600 Subject: [PATCH 09/22] Replace old reference to threads This approach fell away during drafting --- proposals/3574-resource-markup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md index b376d758a8b..bd479b80319 100644 --- a/proposals/3574-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -1,6 +1,6 @@ # Marking up resources -This MSC proposes a way to annotate and discuss various resources (web pages, documents, videos, and other files) using Matrix. 
The general idea is to use [Spaces (MSC1772)](https://github.com/matrix-org/matrix-doc/pull/1772) to represent a general resource to be annotated, and then a combination of child rooms and [Threads (MSC3440)](https://github.com/matrix-org/matrix-doc/blob/gsouquet/threading-via-relations/proposals/3440-threading-via-relations.md) to represent annotations and discussion. This MSC specifies:
+This MSC proposes a way to annotate and discuss various resources (web pages, documents, videos, and other files) using Matrix. The general idea is to use [Spaces (MSC1772)](https://github.com/matrix-org/matrix-doc/pull/1772) to represent a general resource to be annotated, and then a combination of child rooms and [Extensible Events (MSC1767)](https://github.com/matrix-org/matrix-doc/blob/matthew/msc1767/proposals/1767-extensible-events.md) to represent annotations and discussion. This MSC specifies:

 * Additional data in the `m.room.create` event to mark a space as describing a resource to be annotated.
 * Additional (optional) data in the `m.room.child` and `m.room.parent` events to mark sections of the resource (pages, timestamps, etc.) that are being discussed by the child room. The specific format of the location data is resource-specific, and will be described in further MSCs.
From 1bf60f0ed60e375e029a8cca03b71f6b738d5040 Mon Sep 17 00:00:00 2001
From: gleachkr
Date: Sat, 18 Dec 2021 15:13:11 -0600
Subject: [PATCH 10/22] Switch to sha256 for integrity checking

---
 proposals/3574-resource-markup.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md
index bd479b80319..1cf61ce7419 100644
--- a/proposals/3574-resource-markup.md
+++ b/proposals/3574-resource-markup.md
@@ -19,7 +19,7 @@ The `m.markup.resource` value MUST include either:

 Clients should recognize that a `url` subordinate to an `m.markup.resource` (including within an `m.file` value) may contain URI schemes other than `mxc`.
It may contain `http(s)`, and may ultimately contain other schemes in the future. Clients handling `m.markup.resource` should be prepared to fail gracefully upon encountering an unrecognized scheme.

-An optional `md5_hash` key may be included. If present, this key should be populated by an md5 hash of the resource, for file-integrity checking.
+An optional `sha256_hash` key may be included. If present, this key should be populated by a sha256 hash of the resource, for file-integrity checking.

 ### Examples

From 3ba0e0ad2ab825d345e2ad73fcab2f70db695554 Mon Sep 17 00:00:00 2001
From: gleachkr
Date: Sun, 19 Dec 2021 11:07:13 -0600
Subject: [PATCH 11/22] Fix some event names

---
 proposals/3574-resource-markup.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md
index 1cf61ce7419..5f8fd0860a3 100644
--- a/proposals/3574-resource-markup.md
+++ b/proposals/3574-resource-markup.md
@@ -3,7 +3,7 @@
 This MSC proposes a way to annotate and discuss various resources (web pages, documents, videos, and other files) using Matrix. The general idea is to use [Spaces (MSC1772)](https://github.com/matrix-org/matrix-doc/pull/1772) to represent a general resource to be annotated, and then a combination of child rooms and [Extensible Events (MSC1767)](https://github.com/matrix-org/matrix-doc/blob/matthew/msc1767/proposals/1767-extensible-events.md) to represent annotations and discussion. This MSC specifies:

 * Additional data in the `m.room.create` event to mark a space as describing a resource to be annotated.
-* Additional (optional) data in the `m.room.child` and `m.room.parent` events to mark sections of the resource (pages, timestamps, etc.) that are being discussed by the child room. The specific format of the location data is resource-specific, and will be described in further MSCs.
+* Additional (optional) data in the `m.space.child` and `m.space.parent` events to mark sections of the resource (pages, timestamps, etc.) that are being discussed by the child room. The specific format of the location data is resource-specific, and will be described in further MSCs.
 * An annotation event that is used within child rooms. The specific data describing the annotation location is once again resource-specific, and will be described in further MSCs.

 # Proposal
@@ -43,7 +43,7 @@ An optional `sha256_hash` key may be included. If present, this key should be po
 ```


-## Additional data in `m.room.child` and `m.room.parent`
+## Additional data in `m.space.child` and `m.space.parent`

 Children of resources will be considered *conversations concerning* the resource. For purposes of discoverability, may sometimes be helpful to attach additional data to the content of `m.space.child` and `m.space.parent` events, in order to indicate a specific part of the resource that the conversation is based upon. The location of the part of the resource that the conversation is based upon will be indicated by the value of an `m.markup.location` key within the contents of the `m.space.child` and/or `m.space.parent` event.
From 50a4e1026bd4ab0609cf791df5fa712230648ef6 Mon Sep 17 00:00:00 2001
From: gleachkr
Date: Sun, 19 Dec 2021 16:59:31 -0600
Subject: [PATCH 12/22] First revision

* add discussions as threads alternative
* note possibility of a widget interface
---
 proposals/3574-resource-markup.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md
index 5f8fd0860a3..97d9879f8dc 100644
--- a/proposals/3574-resource-markup.md
+++ b/proposals/3574-resource-markup.md
@@ -6,6 +6,8 @@ This MSC proposes a way to annotate and discuss various resources (web pages, do
 * Additional (optional) data in the `m.space.child` and `m.space.parent` events to mark sections of the resource (pages, timestamps, etc.) that are being discussed by the child room. The specific format of the location data is resource-specific, and will be described in further MSCs.
 * An annotation event that is used within child rooms. The specific data describing the annotation location is once again resource-specific, and will be described in further MSCs.

+Resources and markup can be displayed in specialized annotation-aware clients, and potentially in a widget or widget-like interface for widget-compatible clients, although the latter possibility remains speculative. The use of extensible events and standard space events should provide for a reasonable degree of compatibility with general-purpose matrix clients.
+
 # Proposal

 ## Additional data in `m.room.create`
@@ -143,6 +145,21 @@ However,

 Rather than being represented by `m.space.child` events, annotations that open a conversation concerning a part of a resource could be introduced as a new kind of state event. This has the disadvantage of not making relationships between a resource and conversations about its parts visible to clients which are space-aware but not annotation-aware.
+## Discussions as threads
+
+Discussions concerning a part of a resource could be modeled as threads rooted in `m.markup` message events, using [Threading via `m.thread` relation (MSC3440)](https://github.com/matrix-org/matrix-doc/pull/3440). The current proposal is intended to be compatible with clients that model discussions this way, since those clients are free to build threads rooted in `m.markup` events and display these however they like. However, an alternative approach to marking up resources would be to *only* introduce `m.markup` events, and expect all clients to follow a thread-based model.
+
+Instead, the current proposal also provides the option of modeling discussions concerning a resource as standalone rooms. There are a number of advantages to this choice. Discussions modeled as rooms inherit:
+
+* Unread counts and read-up-to-markers
+* Typing notifications
+* Fine-grained access controls (e.g. for a private discussion of a part of a resource)
+* Conversation threading (since MSC3440 doesn't allow nested threads)
+
+among other things.
+
+The main disadvantage to allowing both models seems to be the possibility of fragmentation and incompatibility between annotation-aware clients. In practice, this seems unlikely to be a major problem.
+
 # Security Considerations

 None.
From eeb188fdf8f0e064207a2e34f7385e5ce96b9e17 Mon Sep 17 00:00:00 2001
From: gleachkr
Date: Mon, 10 Jan 2022 13:22:35 -0600
Subject: [PATCH 13/22] Link MSC 3592, discuss Web Annotation Data Model

---
 proposals/3574-resource-markup.md | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md
index 97d9879f8dc..42f5b7c58e1 100644
--- a/proposals/3574-resource-markup.md
+++ b/proposals/3574-resource-markup.md
@@ -6,6 +6,8 @@ This MSC proposes a way to annotate and discuss various resources (web pages, do
 * Additional (optional) data in the `m.space.child` and `m.space.parent` events to mark sections of the resource (pages, timestamps, etc.) that are being discussed by the child room. The specific format of the location data is resource-specific, and will be described in further MSCs.
 * An annotation event that is used within child rooms. The specific data describing the annotation location is once again resource-specific, and will be described in further MSCs.

+Resources and markup can be displayed in specialized annotation-aware clients, and potentially in a widget or widget-like interface for widget-compatible clients, although the latter possibility remains speculative. The use of extensible events and standard space events should provide for a reasonable degree of compatibility with general-purpose matrix clients.
+
 # Proposal

 ## Additional data in `m.room.create`
@@ -51,7 +53,9 @@ Children of resources will be considered *conversations concerning* the resource

 Different mimetypes will require different notions of "location".
A need for new notions of location may become evident over time. For example PDFs begin with a need to specify highlighted regions and then at a later date, pindrop locations. One location might also reasonably be presented in two or more different ways. For example, in a PDF, a location might be presented both as coordinates designating a region of a page, and as a tag or set of tags with offsets for use with a screen reader. In an audio file, a location might be presented both as a pair of bounding timestamps and as a pair of offsets within the text of embedded lyrics. -Hence, the `m.markup.location` value MUST be an object, whose keys are different kinds of locations occupied by a single annotation, with the names of those locations either formalized in the matrix spec or namespaced using Java conventions. +Hence, the `m.markup.location` value MUST be an object, whose keys are different kinds of locations occupied by a single annotation, with the names of those locations either formalized in the matrix spec or namespaced using Java conventions. Some proposed location types are described in: + +- [MSC3592: Markup locations for PDF documents)[https://github.com/matrix-org/matrix-doc/pull/3592] ### Examples @@ -160,7 +162,28 @@ among other things. The main disadvantage to allowing both models seems to be the possibility of fragmentation and incompatibility between annotation-aware clients. In practice, this seems unlikely to be a major problem. -# Security Considerations +## The Web Annotation Data Model + +The [Web Annotation Working group](https://www.w3.org/annotation/) at the W3C has published a detailed set of recommendations for interoperable and shared web annotation. These include both [a data model](https://www.w3.org/TR/annotation-model/) and [a protocol](https://www.w3.org/TR/annotation-protocol/) for annotation servers. 
The fundamental idea of the annotation model is to view an annotation as a connection between zero or more "body" resources, annotating one or more "target" resources. + +intro_model + +Targets are identified using a combination of an internationalized URI, and a set of [selectors](https://www.w3.org/TR/annotation-model/#selectors) (e.g. XPaths or CSS selectors) to pick out a part of the resource designated by the URI. Bodies can be identified in the same way, or embedded as formatted or unformatted text within the annotation. + +Matrix could adopt a markup spec based on this set of recommendations, and focus on annotations that link targets, potentially specified via `mxc://` or `matrix://` uris, to bodies, potentially presented as `matrix://` or `mxc://` uris. Annotations could be sent in timelines as extensible events, and in resource spaces meant to collect sets of annotations - roughly corresponding to the w3c's concept of [annotation collections and pages](https://www.w3.org/TR/annotation-model/#collections). Downstream, it's conceivable that annotation servers speaking the w3c annotation protocol with a matrix backend could be implemented as app services. + +Some advantages of this proposal are: + +1. In principle, using a w3c spec should support greater interoperability with other annotation systems, as well as better exportability, archivability, machine-readability... +2. The w3c spec is detailed and includes features (a notion of annotation state, accessibilty attribution, rendering advice...) that go considerably beyond what this MSC proposes, and whose inclusion into the matrix spec proper might feel like bloat. +3. The w3c spec is broad enough to support the functionality provided by this MSC, but also supports some other functionality (i.e. annotations with embedded bodies, annotations linking to standalone content in or out of matrix). + +Some disadvantages are: + +1. In practice, near-term prospects for interoperability might be limited. 
There are not many [implementations](https://w3c.github.io/test-results/annotation-model/all.html) in the wild. Even, for example, Hypothes.is supports only part of the w3c data model, and [apparently only in an (undocumented?) read-only capacity](https://github.com/hypothesis/h/blob/28c2c5bdf5d85f12307ed56f90995ad1c1f214ac/h/routes.py#L122). If desired, bridging might be better accomplished by using APIs for individual annotation services directly, rather than by routing through an incompletely supported data model. +2. The w3c spec's selectors for PDF annotation are somewhat limited, and more generally the set of selectors built into the spec are not likely to cover all use cases. The w3c spec does incorporate [an extension mechanism](https://www.w3.org/TR/annotation-vocab/#extensions), via JSON-LD contexts. Perhaps the matrix spec for document markup would want to eventually incorporate a well-documented JSON-LD context for any extended selector types that become important. + +# Security Considerations - None. From 3ff7e2a6e4a729030e3f2760d48db469ee8c92de Mon Sep 17 00:00:00 2001 From: Graham Leach-Krouse Date: Mon, 10 Jan 2022 14:45:33 -0600 Subject: [PATCH 14/22] Clarify hypothes.is support level for w3c standard --- proposals/3574-resource-markup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md index 9678369d221..afe07378430 100644 --- a/proposals/3574-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -182,7 +182,7 @@ Some advantages of this proposal are: Some disadvantages are: -1. In practice, near-term prospects for interoperability might be limited. There are not many [implementations](https://w3c.github.io/test-results/annotation-model/all.html) in the wild. Even, for example, Hypothes.is supports only part of the w3c data model, and [apparently only in an (undocumented?) 
read-only capacity](https://github.com/hypothesis/h/blob/28c2c5bdf5d85f12307ed56f90995ad1c1f214ac/h/routes.py#L122). If desired, bridging might be better accomplished by using APIs for individual annotation services directly, rather than by routing through an incompletely supported data model. +1. In practice, near-term prospects for interoperability might be limited. There are not many [implementations](https://w3c.github.io/test-results/annotation-model/all.html) in the wild. Even, for example, Hypothes.is supports only part of the w3c data model, and [apparently only in an undocumented read-only capacity](https://github.com/hypothesis/h/blob/28c2c5bdf5d85f12307ed56f90995ad1c1f214ac/h/routes.py#L122). If desired, bridging might be better accomplished by using APIs for individual annotation services directly, rather than by routing through an incompletely supported data model. 2. The w3c spec's selectors for PDF annotation are somewhat limited, and more generally the set of selectors built into the spec are not likely to cover all use cases. The w3c spec does incorporate [an extension mechanism](https://www.w3.org/TR/annotation-vocab/#extensions), via JSON-LD contexts. Perhaps the matrix spec for document markup would want to eventually incorporate a well-documented JSON-LD context for any extended selector types that become important. # Security Considerations - From e240904c52a6c03ef260a7d38fee6a8b2dc68ae2 Mon Sep 17 00:00:00 2001 From: Graham Leach-Krouse Date: Mon, 10 Jan 2022 16:02:54 -0600 Subject: [PATCH 15/22] Fix merge artifact --- proposals/3574-resource-markup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md index afe07378430..454d9447b2f 100644 --- a/proposals/3574-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -163,7 +163,7 @@ among other things. 
The main disadvantage to allowing both models seems to be the possibility of fragmentation and incompatibility between annotation-aware clients. In practice, this seems unlikely to be a major problem. # Security Considerations -======= + ## The Web Annotation Data Model The [Web Annotation Working group](https://www.w3.org/annotation/) at the W3C has published a detailed set of recommendations for interoperable and shared web annotation. These include both [a data model](https://www.w3.org/TR/annotation-model/) and [a protocol](https://www.w3.org/TR/annotation-protocol/) for annotation servers. The fundamental idea of the annotation model is to view an annotation as a connection between zero or more "body" resources, annotating one or more "target" resources. From 2911bf3886416fb2b418383a2f5ff594230ba8b0 Mon Sep 17 00:00:00 2001 From: Graham Leach-Krouse Date: Tue, 11 Jan 2022 08:50:25 -0600 Subject: [PATCH 16/22] Add w3c disadvantage: overlap in functionality w3c spec duplicates some Matrix functionality --- proposals/3574-resource-markup.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md index 454d9447b2f..4d49745b9e8 100644 --- a/proposals/3574-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -184,8 +184,9 @@ Some disadvantages are: 1. In practice, near-term prospects for interoperability might be limited. There are not many [implementations](https://w3c.github.io/test-results/annotation-model/all.html) in the wild. Even, for example, Hypothes.is supports only part of the w3c data model, and [apparently only in an undocumented read-only capacity](https://github.com/hypothesis/h/blob/28c2c5bdf5d85f12307ed56f90995ad1c1f214ac/h/routes.py#L122). If desired, bridging might be better accomplished by using APIs for individual annotation services directly, rather than by routing through an incompletely supported data model. 2. 
The w3c spec's selectors for PDF annotation are somewhat limited, and more generally the set of selectors built into the spec are not likely to cover all use cases. The w3c spec does incorporate [an extension mechanism](https://www.w3.org/TR/annotation-vocab/#extensions), via JSON-LD contexts. Perhaps the matrix spec for document markup would want to eventually incorporate a well-documented JSON-LD context for any extended selector types that become important. +3. The w3c spec is sufficiently expansive that it overlaps to some extent with existing Matrix functionality. For example, you could have an annotation targeting a matrix event, whose "purpose" (a field allowed by the w3c spec) is "reply" (a value listed in the spec). Care would need to be taken not to create confusing duplications like this. -# Security Considerations - +# Security Considerations None. From caf39f458ee0dfde6dad5c7a561b57a19185f110 Mon Sep 17 00:00:00 2001 From: gleachkr Date: Thu, 13 Jan 2022 09:16:27 -0600 Subject: [PATCH 17/22] Fix Markup Typo --- proposals/3574-resource-markup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md index 4d49745b9e8..310d2c77dae 100644 --- a/proposals/3574-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -53,7 +53,7 @@ Different mimetypes will require different notions of "location". A need for new Hence, the `m.markup.location` value MUST be an object, whose keys are different kinds of locations occupied by a single annotation, with the names of those locations either formalized in the matrix spec or namespaced using Java conventions. 
Some proposed location types are described in: -- [MSC3592: Markup locations for PDF documents)[https://github.com/matrix-org/matrix-doc/pull/3592] +- [MSC3592: Markup locations for PDF documents](https://github.com/matrix-org/matrix-doc/pull/3592) ### Examples From 3d8aa19600061274920813d3967b40a20a3a7ed4 Mon Sep 17 00:00:00 2001 From: gleachkr Date: Sat, 15 Jan 2022 18:25:17 -0600 Subject: [PATCH 18/22] Fix another typo --- proposals/3574-resource-markup.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md index 310d2c77dae..adbb70e5946 100644 --- a/proposals/3574-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -162,8 +162,6 @@ among other things. The main disadvantage to allowing both models seems to be the possibility of fragmentation and incompatibility between annotation-aware clients. In practice, this seems unlikely to be a major problem. -# Security Considerations - ## The Web Annotation Data Model The [Web Annotation Working group](https://www.w3.org/annotation/) at the W3C has published a detailed set of recommendations for interoperable and shared web annotation. These include both [a data model](https://www.w3.org/TR/annotation-model/) and [a protocol](https://www.w3.org/TR/annotation-protocol/) for annotation servers. The fundamental idea of the annotation model is to view an annotation as a connection between zero or more "body" resources, annotating one or more "target" resources. 
From e7a2dbcb567f7aa479be9e7f469c93274c071f9f Mon Sep 17 00:00:00 2001 From: Graham Leach-Krouse Date: Wed, 23 Feb 2022 14:59:52 -0600 Subject: [PATCH 19/22] First pass at w3c WADM serialization --- proposals/3574-resource-markup.md | 65 +++++++++++++++++++++++++------ 1 file changed, 54 insertions(+), 11 deletions(-) diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md index adbb70e5946..18af94eefb1 100644 --- a/proposals/3574-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -5,6 +5,7 @@ This MSC proposes a way to annotate and discuss various resources (web pages, do * Additional data in the `m.room.create` event to mark a space as describing a resource to be annotated. * Additional (optional) data in the `m.space.child` and `m.space.parent` events to mark sections of the resource (pages, timestamps, etc.) that are being discussed by the child room. The specific format of the location data is resource-specific, and will be described in further MSCs. * An annotation event that is used within child rooms. The specific data describing the annotation location is once again resource-specific, and will be described in further MSCs. +* A general scheme for serializing annotations in a way compatible with the [w3c web annotation data model](https://www.w3.org/TR/annotation-model/) Resources and markup can be displayed in specialized annotation-aware clients, and potentially in a widget or widget-like interface for widget-compatible clients, although the latter possibility remains speculative. The use of extensible events and standard space events should provide for a reasonable degree of compatibility with general-purpose matrix clients. @@ -44,12 +45,11 @@ An optional `sha256_hash` key may be included. If present, this key should be po } ``` - ## Additional data in `m.space.child` and `m.space.parent` -Children of resources will be considered *conversations concerning* the resource. 
For purposes of discoverability, may sometimes be helpful to attach additional data to the content of `m.space.child` and `m.space.parent` events, in order to indicate a specific part of the resource that the conversation is based upon. The location of the part of the resource that the conversation is based upon will be indicated by the value of an `m.markup.location` key within the contents of the `m.space.child` and/or `m.space.parent` event. +Children of resources will be considered *conversations concerning* the resource. For purposes of discoverability, may sometimes be helpful to attach additional data to the content of `m.space.child` and `m.space.parent` events, in order to indicate a specific part of the resource that the conversation concerns. The location of the part of the resource that the conversation is based upon will be indicated by the value of an `m.markup.location` key within the contents of the `m.space.child` and/or `m.space.parent` event. -Different mimetypes will require different notions of "location". A need for new notions of location may become evident over time. For example PDFs begin with a need to specify highlighted regions and then at a later date, pindrop locations. One location might also reasonably be presented in two or more different ways. For example, in a PDF, a location might be presented both as coordinates designating a region of a page, and as a tag or set of tags with offsets for use with a screen reader. In an audio file, a location might be presented both as a pair of bounding timestamps and as a pair of offsets within the text of embedded lyrics. +Different Media Types will require different notions of "location". A need for new notions of location may become evident over time. For example PDFs begin with a need to specify highlighted regions and then at a later date, pindrop locations. One location might also reasonably be presented in two or more different ways. 
For example, in a PDF, a location might be presented both as coordinates designating a region of a page, and as a tag or set of tags with offsets for use with a screen reader. In an audio file, a location might be presented both as a pair of bounding timestamps and as a pair of offsets within the text of embedded lyrics. Hence, the `m.markup.location` value MUST be an object, whose keys are different kinds of locations occupied by a single annotation, with the names of those locations either formalized in the matrix spec or namespaced using Java conventions. Some proposed location types are described in: @@ -122,6 +122,55 @@ An annotation message event will treat `m.markup` as an extensible event schema } ``` +## Web Annotation Data Model serialization + +The [Web Annotation Working group](https://www.w3.org/annotation/) at the W3C has published a detailed set of recommendations for interoperable and shared web annotation. These include both [a data model](https://www.w3.org/TR/annotation-model/) and [a protocol](https://www.w3.org/TR/annotation-protocol/) for annotation servers. The fundamental idea of the annotation model is to view an annotation as a connection between zero or more "body" resources, annotating one or more "target" resources. + +intro_model + +Targets are identified using a combination of an internationalized URI, and a set of [selectors](https://www.w3.org/TR/annotation-model/#selectors) (e.g. XPaths or CSS selectors) to pick out a part of the resource designated by the URI. Bodies can be identified in the same way, or embedded as formatted or unformatted text within the annotation. + +The web annotation data model is an open standard for ensuring interoperability between web annotation systems. It is partly implemented by [Hypothes.is](https://web.hypothes.is), and has the backing of [a coalition of publishers and research institutions](https://hypothes.is/annotating-all-knowledge/). 
It's desirable, from the point of view of the Matrix foundation's guiding principles (e.g. interoperability rather than fragmentation, openness rather than proprietary lock-in) that we ensure compatibility with this standard. Adhering to the standard may also open the way for bridging with other compliant annotation systems in the future. + +`m.space.child` and `m.space.parent` annotations can be serialized to the web annotation data model according to the following minimal scheme: + +``` +{ + "@context": "http://www.w3.org/ns/anno.jsonld", + "id": "matrix:EVENT", + "type": "Annotation", + "body": "matrix:ANNOTATION", + "target": { + "type": "SpecificResource", + "source": "matrix:RESOURCE", + "selector": SELECTORS + } +} +``` + +Here, `matrix:EVENT` is a matrix URI indicating the `m.space.child` or `m.space.parent` event being serialized, `matrix:ANNOTATION` is a matrix uri for the child room serving as an annotation, `matrix:RESOURCE` is a uri for the parent room serving as a resource. This corresponds, on the w3c model, to a simple annotation with one target (the resource) and one body (the annotation room). + +SELECTORS stands for zero or more [selectors](https://www.w3.org/TR/annotation-model/#selectors) in the sense of the w3c model. These can be, for example, XPaths, CSS selectors, or offsets within a file. The specific selectors used should be determined by the locations present in the annotation event. MSCs introducing location types should specify how, or whether, these location types can be put into correspondence with w3c selectors. + +It may be desirable to also indicate an `http` url at which the file attached to the resource can be located (as derived from the `m.markup.resource` field of the resource's creation event). 
This can be done by replacing the `target` field above with + +``` + "target": [{ + "type": "SpecificResource", + "source": "matrix:RESOURCE", + "selector": SELECTORS + }, + { + "type": "SpecificResource", + "source": "https://ADDRESS", + "selector": SELECTORS + }] +``` + +It may be desirable, from the point of view of particular implementations of w3c model serialization, to include other fields in the w3c model serialization - for example, an [embedded textual body](https://www.w3.org/TR/annotation-model/#embedded-textual-body) based on the body of the first message in the annotation room might be necessary for bridging, a [state object](https://www.w3.org/TR/annotation-model/#states) indicating the time of creation might be important for archival purposes, and an indication of [the rendering software](https://www.w3.org/TR/annotation-model/#rendering-software) used to view the annotation might be helpful for interoperability. Attaching additional data like this is allowed and left to choice at the level of implementations. + +Annotation messages can be serialized to the w3c model much as above, but with the `body` field omitted - the w3c model permits annotations with empty bodies. + # Potential Issues There's no notion of "ownership" for state events---anyone who can send `m.space.parent` events can overwrite `m.space.parent` events sent by others. So anyone who can create a conversation concerning a certain resource can also remove conversations created by others. Clients can partly mitigate this by at least discouraging accidental deletions and encouraging courtesy. A more robust mitigation might be to introduce subspaces of resources, within which less-trusted users could still create conversations concerning a given resource. However, this seems undesirably complicated for an initial implementation. If it turns out to be necessary in practice, it could be added in a future MSC. 
@@ -164,13 +213,7 @@ The main disadvantage to allowing both models seems to be the possibility of fra ## The Web Annotation Data Model -The [Web Annotation Working group](https://www.w3.org/annotation/) at the W3C has published a detailed set of recommendations for interoperable and shared web annotation. These include both [a data model](https://www.w3.org/TR/annotation-model/) and [a protocol](https://www.w3.org/TR/annotation-protocol/) for annotation servers. The fundamental idea of the annotation model is to view an annotation as a connection between zero or more "body" resources, annotating one or more "target" resources. - -intro_model - -Targets are identified using a combination of an internationalized URI, and a set of [selectors](https://www.w3.org/TR/annotation-model/#selectors) (e.g. XPaths or CSS selectors) to pick out a part of the resource designated by the URI. Bodies can be identified in the same way, or embedded as formatted or unformatted text within the annotation. - -Matrix could adopt a markup spec based on this set of recommendations, and focus on annotations that link targets, potentially specified via `mxc://` or `matrix://` uris, to bodies, potentially presented as `matrix://` or `mxc://` uris. Annotations could be sent in timelines as extensible events, and in resource spaces meant to collect sets of annotations - roughly corresponding to the w3c's concept of [annotation collections and pages](https://www.w3.org/TR/annotation-model/#collections). Downstream, it's conceivable that annotation servers speaking the w3c annotation protocol with a matrix backend could be implemented as app services. +Matrix could adopt a markup spec that strictly follows the w3c data model, for example by requiring that annotation events include a valid serialized w3c annotation as their contents (or as an option in an extensible event). Some advantages of this proposal are: @@ -180,7 +223,7 @@ Some advantages of this proposal are: Some disadvantages are: -1. 
In practice, near-term prospects for interoperability might be limited. There are not many [implementations](https://w3c.github.io/test-results/annotation-model/all.html) in the wild. Even, for example, Hypothes.is supports only part of the w3c data model, and [apparently only in an undocumented read-only capacity](https://github.com/hypothesis/h/blob/28c2c5bdf5d85f12307ed56f90995ad1c1f214ac/h/routes.py#L122). If desired, bridging might be better accomplished by using APIs for individual annotation services directly, rather than by routing through an incompletely supported data model. +1. In practice, near-term prospects for interoperability might be limited. There are not many [implementations](https://w3c.github.io/test-results/annotation-model/all.html) in the wild. Even, for example, Hypothes.is supports only part of the w3c data model, and [apparently only in an undocumented read-only capacity](https://github.com/hypothesis/h/blob/28c2c5bdf5d85f12307ed56f90995ad1c1f214ac/h/routes.py#L122). If desired, bridging might be better accomplished by using APIs for individual annotation services directly, together with the minimal serialization proposed above, rather than by routing through an incompletely supported data model. 2. The w3c spec's selectors for PDF annotation are somewhat limited, and more generally the set of selectors built into the spec are not likely to cover all use cases. The w3c spec does incorporate [an extension mechanism](https://www.w3.org/TR/annotation-vocab/#extensions), via JSON-LD contexts. Perhaps the matrix spec for document markup would want to eventually incorporate a well-documented JSON-LD context for any extended selector types that become important. 3. The w3c spec is sufficiently expansive that it overlaps to some extent with existing Matrix functionality. For example, you could have an annotation targeting a matrix event, whose "purpose" (a field allowed by the w3c spec) is "reply" (a value listed in the spec). 
Care would need to be taken not to create confusing duplications like this. From 89f657108247c9bb65ecbb483f74d4fd161b4cb7 Mon Sep 17 00:00:00 2001 From: Graham Leach-Krouse Date: Sun, 13 Mar 2022 08:08:52 -0500 Subject: [PATCH 20/22] Add text markup, security considerations also update links --- proposals/3574-resource-markup.md | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md index 18af94eefb1..1fe053c842a 100644 --- a/proposals/3574-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -1,6 +1,7 @@ # Marking up resources -This MSC proposes a way to annotate and discuss various resources (web pages, documents, videos, and other files) using Matrix. The general idea is to use [Spaces (MSC1772)](https://github.com/matrix-org/matrix-doc/pull/1772) to represent a general resource to be annotated, and then a combination of child rooms and [Extensible Events (MSC1767)](https://github.com/matrix-org/matrix-doc/blob/matthew/msc1767/proposals/1767-extensible-events.md) to represent annotations and discussion. This MSC specifies: +This MSC proposes a way to annotate and discuss various resources (web pages, documents, videos, and other files) using Matrix. The general idea is to use [Spaces (MSC1772)](https://github.com/matrix-org/matrix-spec-proposals/pull/1772) to represent a general resource to be annotated, and then a combination of child rooms and [Extensible Events (MSC1767)]( +https://github.com/matrix-org/matrix-spec-proposals/pull/1767) to represent annotations and discussion. This MSC specifies: * Additional data in the `m.room.create` event to mark a space as describing a resource to be annotated. * Additional (optional) data in the `m.space.child` and `m.space.parent` events to mark sections of the resource (pages, timestamps, etc.) that are being discussed by the child room. 
The specific format of the location data is resource-specific, and will be described in further MSCs. @@ -17,7 +18,7 @@ A space will be considered a *resource* if its creation event includes a key `m. The `m.markup.resource` value MUST include either: -1. an `m.file` key, populated according to the `m.file` schema as presented in [Extensible Events - Files (MSC3551)](https://github.com/matrix-org/matrix-doc/blob/travis/msc/extev/files/proposals/3551-extensible-events-files.md), or +1. an `m.file` key, populated according to the `m.file` schema as presented in [Extensible Events - Files (MSC3551)](https://github.com/matrix-org/matrix-spec-proposals/pull/3551), or 2. a `url` and `mimetype` key. This format is preferred for potentially mutable resources (like web pages with dynamic content) or for resources that require multiple network requests to display properly. Clients should recognize that a `url` subordinate to an `m.markup.resource` (including within an `m.file` value) may contain URI schemes other than `mxc`. It may contain `http(s)`, and may ultimately contain other schemes in the future. Clients handling `m.markup.resource` should be prepared to fail gracefully upon encountering an unrecognized scheme. @@ -53,7 +54,8 @@ Different Media Types will require different notions of "location". A need for n Hence, the `m.markup.location` value MUST be an object, whose keys are different kinds of locations occupied by a single annotation, with the names of those locations either formalized in the matrix spec or namespaced using Java conventions. 
Some proposed location types are described in: -- [MSC3592: Markup locations for PDF documents](https://github.com/matrix-org/matrix-doc/pull/3592) +- [MSC3592: Markup locations for PDF documents](https://github.com/matrix-org/matrix-spec-proposals/pull/3592) +- [MSC3752: Markup locations for text](https://github.com/matrix-org/matrix-spec-proposals/pull/3752) ### Examples @@ -83,7 +85,7 @@ Hence, the `m.markup.location` value MUST be an object, whose keys are different It may be desirable, within a conversation concerning a resource, to make reference to some part of the resource. Annotation message events make this possible. -An annotation message event will treat `m.markup` as an extensible event schema following [Extensible events (MSC1767)](https://github.com/matrix-org/matrix-doc/pull/1767), but the message will ordinarily include an `m.text` value with text optionally describing the annotation as a fallback. The `m.markup` value will consist of an `m.markup.location`, and an `m.markup.parent` that indicates the room id of the resource with which the annotation message is associated. (The latter is necessary when a room has more than one parent resource.) Until migration to extensible events is complete, annotations will send messages of the type `m.room.message`, for compatibility with non-annotation-aware clients. +An annotation message event will treat `m.markup` as an extensible event schema following [Extensible events (MSC1767)](https://github.com/matrix-org/matrix-spec-proposals/pull/1767), but the message will ordinarily include an `m.text` value with text optionally describing the annotation as a fallback. The `m.markup` value will consist of an `m.markup.location`, and an `m.markup.parent` that indicates the room id of the resource with which the annotation message is associated. (The latter is necessary when a room has more than one parent resource.) 
Until migration to extensible events is complete, annotations will send messages of the type `m.room.message`, for compatibility with non-annotation-aware clients. ### Examples @@ -185,7 +187,7 @@ However, these more abstract cases can be subsumed under the design here. Geospa ## Resources as a space type or subtype -Resources could be designated as such using an `m.purpose` event, as in [Room subtyping (MSC3088)](https://github.com/matrix-org/matrix-doc/blob/travis/msc/mutable-subtypes/proposals/3088-room-subtyping.md), or with an `m.room.type` event as in [Room Types (MSC1840)](https://github.com/matrix-org/matrix-doc/pull/1840). +Resources could be designated as such using an `m.purpose` event, as in [Room subtyping (MSC3088)](https://github.com/matrix-org/matrix-spec-proposals/pull/3088), or with an `m.room.type` event as in [Room Types (MSC1840)](https://github.com/matrix-org/matrix-spec-proposals/pull/1840). However, @@ -198,7 +200,7 @@ Rather than being represented by `m.space.child` events, annotations that open a ## Discussions as threads -Discussions concerning a part of resource could be modeled as threads rooted in `m.markup` message events, using [Threading via `m.thread` relation (MSC3440)](https://github.com/matrix-org/matrix-doc/pull/3440). The current proposal is intended to be compatible with clients that model discussions this way, since those clients are free to build threads rooted in `m.markup` events and display these however they like. However, an alternative approach to marking up resources would be to *only* introduce `m.markup` events, and expect all clients to follow a thread-based model. +Discussions concerning a part of resource could be modeled as threads rooted in `m.markup` message events, using [Threading via `m.thread` relation (MSC3440)](https://github.com/matrix-org/matrix-spec-proposals/pull/3440). 
The current proposal is intended to be compatible with clients that model discussions this way, since those clients are free to build threads rooted in `m.markup` events and display these however they like. However, an alternative approach to marking up resources would be to *only* introduce `m.markup` events, and expect all clients to follow a thread-based model. Instead, the current proposal also provides the option of modeling discussions concerning a resource as standalone rooms. There are a number of advantages to this choice. Discussions modeled as rooms inherit: @@ -229,7 +231,7 @@ Some disadvantages are: # Security Considerations -None. +Because state events are not encrypted, `m.space.child` events with `m.markup.location` keys may leak information about encrypted resources. This is really a general problem with unencrypted state events, and should be solved by something like [MSC3414: Encrypted State Events](https://github.com/matrix-org/matrix-spec-proposals/pull/3414). Until encrypted state events are available, MSCs for individual location types with fields that might leak information should flag this as a security consideration, and clients should mitigate with appropriate warnings.
# Unstable Prefix From 6c238af83bd875316ceaf12a6ec4ee30a596640d Mon Sep 17 00:00:00 2001 From: Graham Leach-Krouse Date: Sat, 21 May 2022 14:42:17 -0500 Subject: [PATCH 21/22] Link Audiovisual media markup --- proposals/3574-resource-markup.md | 1 + 1 file changed, 1 insertion(+) diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md index 1fe053c842a..48f747255c4 100644 --- a/proposals/3574-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -56,6 +56,7 @@ Hence, the `m.markup.location` value MUST be an object, whose keys are different - [MSC3592: Markup locations for PDF documents](https://github.com/matrix-org/matrix-spec-proposals/pull/3592) - [MSC3752: Markup locations for text](https://github.com/matrix-org/matrix-spec-proposals/pull/3752) +- [MSC3775: Markup locations for Audiovisual Media](https://github.com/matrix-org/matrix-spec-proposals/pull/3775) ### Examples From 3160aafc14b9608bc5a6d465536bc17f13cb96c9 Mon Sep 17 00:00:00 2001 From: Graham Leach-Krouse Date: Tue, 7 Jun 2022 15:46:51 -0500 Subject: [PATCH 22/22] Tiny lints, mention MSC3761 --- proposals/3574-resource-markup.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/proposals/3574-resource-markup.md b/proposals/3574-resource-markup.md index 48f747255c4..fe0a37aa5ee 100644 --- a/proposals/3574-resource-markup.md +++ b/proposals/3574-resource-markup.md @@ -176,7 +176,8 @@ Annotation messages can be serialized to the w3c model much as above, but with t # Potential Issues -There's no notion of "ownership" for state events---anyone who can send `m.space.parent` events can overwrite `m.space.parent` events sent by others. So anyone who can create a conversation concerning a certain resource can also remove conversations created by others. Clients can partly mitigate this by at least discouraging accidental deletions and encouraging courtesy. 
A more robust mitigation might be to introduce subspaces of resources, within which less-trusted users could still create conversations concerning a given resource. However, this seems undesirably complicated for an initial implementation. If it turns out to be necessary in practice, it could be added in a future MSC. +Until something like +[MSC3761](https://github.com/matrix-org/matrix-spec-proposals/pull/3761) is added to the spec, there's no notion of "ownership" for state events. Anyone who can send `m.space.parent` events can overwrite `m.space.parent` events sent by others. So anyone who can create a conversation concerning a certain resource can also remove conversations created by others. Clients can partly mitigate this by at least discouraging accidental deletions and encouraging courtesy. A more robust mitigation might be to introduce subspaces of resources, within which less-trusted users could still create conversations concerning a given resource. However, this seems undesirably complicated for an initial implementation. If it turns out to be necessary in practice, it could be added in a future MSC. # Alternatives @@ -195,7 +196,7 @@ However, 1. Indicating an associated resource in the room creation event makes it possible to inspect an invitation to a new space, allowing annotation-oriented clients to ignore irrelevant invitations. 2. If `m.purpose` or `m.room.type` are integrated into the spec and turn out to be useful for, e.g. filtering, then it would be straightforward to designate one or more `m.purpose` values or `m.room.type` values for resource rooms. -## Standalone `m.annotation.location` state events +## Standalone `m.markup.location` state events Rather than being represented by `m.space.child` events, annotations that open a conversation concerning a part of a resource could be introduced as a new kind of state event. 
This has the disadvantage of not making relationships between a resource and conversations about its parts visible to clients which are space-aware but not annotation-aware. @@ -241,4 +242,4 @@ Because state events are not encrypted, `m.space.child` events with `m.markup.lo | `m.markup.location` | key in `m.space.child`, `m.space.parent` and `m.annotation`| `com.open-tower.msc3574.markup.location` | | `m.markup.resource` | key in `m.create` | `com.open-tower.msc3574.markup.resource` | | `m.markup` | extensible event schema | `com.open-tower.msc3574.markup` | -| `m.markup.parent` | key in `m.annotation` | `com.open-tower.msc3574.markup.parent` | +| `m.markup.parent` | key in `m.markup` | `com.open-tower.msc3574.markup.parent` |
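
As a rough illustration of how a client might apply the unstable prefixes from the table above while this MSC is unmerged, the sketch below renames stable keys to their unstable forms throughout an event content dictionary. Only the four identifier pairs come from the table. The helper name `to_unstable`, the sample room id, and the empty location object are hypothetical and are not part of the proposal.

```python
# Stable -> unstable identifiers, copied from the unstable prefix table.
UNSTABLE_PREFIXES = {
    "m.markup.location": "com.open-tower.msc3574.markup.location",
    "m.markup.resource": "com.open-tower.msc3574.markup.resource",
    "m.markup": "com.open-tower.msc3574.markup",
    "m.markup.parent": "com.open-tower.msc3574.markup.parent",
}


def to_unstable(value):
    """Recursively rename stable keys to unstable ones (hypothetical helper)."""
    if isinstance(value, dict):
        # Keys not listed in the table are passed through unchanged.
        return {UNSTABLE_PREFIXES.get(k, k): to_unstable(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_unstable(v) for v in value]
    return value


# Hypothetical annotation content: the room id and the empty location
# object are placeholders, since location formats are defined elsewhere.
content = {
    "m.markup": {
        "m.markup.parent": "!resource:example.org",
        "m.markup.location": {},
    },
}
unstable_content = to_unstable(content)
```

Keeping the mapping in a single table like this should make the later switch to stable identifiers a one-line change, assuming the final names match those listed above.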