-
Notifications
You must be signed in to change notification settings - Fork 124
Sanitizing html after markdown and tool/inject-html directive rendering #2833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
fb47020
Sanitizing html after markdown and tool/inject-html directive rendering
jonasfj 7e9b609
Making sanitizing hidden and off by deafult
jonasfj 5bf7549
dart run ginder build
jonasfj e014a45
Tests of HTML sanitizing
jonasfj 73ac1d4
Fix number of classes
jonasfj 2c29efa
Moved html sanitizer tests
jonasfj 2ff8702
Removed the script removing hack
jonasfj File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
dartdoc: | ||
linkToSource: | ||
root: '.' | ||
uriTemplate: 'https://github.com/dart-lang/dartdoc/blob/v4.0.0/%f%#L%l%' | ||
uriTemplate: 'https://github.com/dart-lang/dartdoc/blob/v4.1.0-dev/%f%#L%l%' |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,2 @@ | ||
// Generated code. Do not modify. | ||
const packageVersion = '4.0.0'; | ||
const packageVersion = '4.1.0-dev'; |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good place for this code. I had considered inserting this somewhere in the markdown rendering pipeline too to try to check embedded HTML, but perhaps users of the markdown output already have their own sanitization in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
package:markdown
explicitly says to do sanitization as a post-processing step:https://pub.dev/packages/markdown#html-sanitization
They also reference Markdown's XSS Vulnerability (and how to mitigate it).
TL;DR: XSS mitigation is best done post-markdown rendering.
Maybe it's possible to write a custom implementation of HtmlRenderer which does sanitizing on-the-fly. But I'm not sure if
Text
nodes in markdown can contain inline HTML, as isHtmlRenderer
does not escape<>
when renderingText
nodes.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a quick stab a it, and it doesn't look easy t make
HtmlRenderer
produce safe HTML... Even just usinghtmlSerializeEscape
when outputtingText
broke a lot of tests that aimed to ensure that inline HTML keeps working.My two cents is that maybe... but probably we have to sanitize after markdown rendering -- otherwise, we'll probably have to dramatically refactor our markdown rendering pipeline.
If we wanted to support
{@tool}
directives which can side-step HTML sanitizing, then I suggest we extend the markdown parser with custom block syntax, and have it translate{@tool foo --my-flag}...{@end-tool}
into<pre data-tool="foo --my-flag">...</pre>
, then we could replace these<pre>
tags after HTML sanitizing, run tools at this point and decide if we want to sanitize the output of a tool on a tool by tool basis.Played a bit around with
package:markdown
, and we can do things like:Then we just have find
<pre data-tool="COMMAND">STDIN</pre>
afterwards, decode the HTML text and attribute text before running the tool. Anyways, maybe that's a larger refactor :DThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this probably involves a quite a bit of refactoring to make it work -- as we would effectively need to move execution of tools and rendering of markdown into the renderer. I think it makes some sense to put rendering of markdown in the renderer, but I can also see the argument that execution of tools might not need to happen there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh. Moving tool execution to the back end rather than the front breaks all kinds of internal assumptions in dartdoc and runs into a sync vs. async problem as well IIRC from the last time I tried to change the pipeline.
When you say that execution of tools "might not need to happen here" do you mean that it literally doesn't need to run, or just that it doesn't have to be moved? I do wonder if we can get away with "not solving this" for pub for the moment -- if we just forbid tool execution when the post-processing flag is on and then leave me the cleanup to get this working in a more comprehensive manner. IIRC tool execution is forbidden anyhow on pub?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes,
I'm thinking that -- if you want to do tool execution you have two options:
It's not unreasonable that dartdoc simply stick with (1), and then if needed expand the allowlist for the sanitizer. In practice most tools probably just want to:
<img>
tag (allowed),And maybe allowing embedding of sandboxed iframes could offer an escape hatch for remaining use-cases. So perhaps option (1) is fine, and we just always sanitize output from tools and
{@inject-html}
directives.Note on HTML sanitizing:
<style>
or attributesstyle="..."
)<script>
and attributesonclick="..."
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed notes on all of this!
Thinking about this from dartdoc's perspective going with
#1
seems reasonable for now. Being as safe as possible seems a good default choice in any event. But we're going to have to do something about Flutter as script tags are part of their existing tool infrastructure. Perhaps we can make the sanitizer able to be disabled in dartdoc_options.yaml so it can be scoped? Pub already will filter things from dartdoc_options files so that seems like it could be reasonable to do (as long as pub knows that dartdoc_options.yaml files can be located anywhere in the tree?).Examples from Flutter docs:
Trusted domain iframe from a
youtube
directive:https://api.flutter.dev/flutter/material/material-library.html
Here is an example of a tool inserting script tags:
https://api.flutter.dev/flutter/material/AboutListTile-class.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be easy allow iframes from trusted domains. Or we could simply hardcode an exception for youtube, that seems reasonable given how many users want to embed youtube.
But in general, I suspect
<iframe sandbox="allow-same-origin allow-scripts"
is reasonably safe, so long as the embedded document is, either: (i) from a separate domain, or (ii) trusted (if it's from the same domain). It seems like a reasonable expectation that if the document is from the same domain, then it's trusted.Embedding
<script>
inline violates a lot of good practices, include security recommendations around content-security-policies (I think we strive to use something like strict CSP whenever possible).In many cases, it would be better if the
{@tool
directive generated something like:Then
dartdoc
options to inject a<script src="static/snippet-buttons.js">
file in the header. Then havesnippet-buttons.js
do something like:Hmm, looking at it Flutter docs already embed
https://api.flutter.dev/assets/snippets.js
, so instead of embeddingonclick
handlers in the HTML code, they just have usedata-...
properties (or special CSS classes) to indicate that an HTML element shouldonclick
handlers, and set it up in anDOMContentLoaded
handler insnippets.js
.It's similar to what we do with syntax highlighting, we just apply a special CSS class
language-<language>
on a<pre>
tag, and thenhighlight.pack.js?v1
will do the rest in anDOMContentLoaded
handler.So perhaps we should just allow custom
data-
attributes, or we could limit it to only allow data-attributes on the formdata-custom-*
(that way you can't mess withdata-base-href
anddata-using-base-href
which are already used in dartdoc generate output).