-
Notifications
You must be signed in to change notification settings - Fork 127
Sanitizing html after markdown and tool/inject-html directive rendering #2833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
fb47020
7e9b609
5bf7549
e014a45
73ac1d4
2c29efa
2ff8702
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
dartdoc: | ||
linkToSource: | ||
root: '.' | ||
uriTemplate: 'https://github.com/dart-lang/dartdoc/blob/v4.0.0/%f%#L%l%' | ||
uriTemplate: 'https://github.com/dart-lang/dartdoc/blob/v4.1.0-dev/%f%#L%l%' |
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -2,14 +2,17 @@ | |||||||
// for details. All rights reserved. Use of this source code is governed by a | ||||||||
// BSD-style license that can be found in the LICENSE file. | ||||||||
|
||||||||
import 'package:html/parser.dart' show parse; | ||||||||
import 'package:html/dom.dart' as dom; | ||||||||
import 'package:html/parser.dart' show parseFragment; | ||||||||
|
||||||||
import 'package:markdown/markdown.dart' as md; | ||||||||
import 'package:meta/meta.dart'; | ||||||||
|
||||||||
abstract class DocumentationRenderer { | ||||||||
DocumentationRenderResult render( | ||||||||
List<md.Node> nodes, { | ||||||||
@required bool processFullDocs, | ||||||||
@required bool sanitizeHtml, | ||||||||
}); | ||||||||
} | ||||||||
|
||||||||
|
@@ -20,16 +23,16 @@ class DocumentationRendererHtml implements DocumentationRenderer { | |||||||
DocumentationRenderResult render( | ||||||||
List<md.Node> nodes, { | ||||||||
@required bool processFullDocs, | ||||||||
@required bool sanitizeHtml, | ||||||||
}) { | ||||||||
if (nodes.isEmpty) { | ||||||||
return DocumentationRenderResult.empty; | ||||||||
} | ||||||||
|
||||||||
var rawHtml = md.HtmlRenderer().render(nodes); | ||||||||
var asHtmlDocument = parse(rawHtml); | ||||||||
for (var s in asHtmlDocument.querySelectorAll('script')) { | ||||||||
s.remove(); | ||||||||
} | ||||||||
for (var pre in asHtmlDocument.querySelectorAll('pre')) { | ||||||||
var asHtmlFragment = parseFragment(rawHtml); | ||||||||
|
||||||||
for (var pre in asHtmlFragment.querySelectorAll('pre')) { | ||||||||
if (pre.children.length > 1 && pre.children.first.localName != 'code') { | ||||||||
continue; | ||||||||
} | ||||||||
|
@@ -44,16 +47,26 @@ class DocumentationRendererHtml implements DocumentationRenderer { | |||||||
// Assume the user intended Dart if there are no other classes present. | ||||||||
if (!specifiesLanguage) pre.classes.add('language-dart'); | ||||||||
} | ||||||||
|
||||||||
if (sanitizeHtml) { | ||||||||
_sanitize(asHtmlFragment); | ||||||||
} else { | ||||||||
// Never allow scripts! | ||||||||
for (var s in asHtmlFragment.querySelectorAll('script')) { | ||||||||
s.remove(); | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sadly we can't do this, see other comments. I'm slightly surprised that the tests still work (probably because this condition is Flutter specific), but browsing to generated flutter documentation will definitely break if we unconditionally forbid this. The script tags we need are structured in a very, very specific way -- maybe we can pattern match here. Tagging @gspencergoog , but it looks like maybe we could just get away with dumping any script that doesn't match the pattern, here: https://github.com/flutter/flutter/blob/5d587f9528fbeb7f308d686b7e5751dd232640db/dev/snippets/config/skeletons/sample.html#L12 I mean, it's kind of hacky, but if it works? Definitely open to better ideas here too. Making this a configuration option seems like it might be wise too, in case we try it and go "oh no it is using the wrong thing". There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are some Maybe even some sort of dartdoc_options.yaml option that takes a set of regular expressions to match onClick or script tags. And yes, that scares me.... will continue scratching my head on this, and input very welcome. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did this because this is what happens on dartdoc/lib/src/render/documentation_renderer.dart Lines 29 to 31 in b4fa868
But yeah, looking at the code it seems that I'm 100% okay removing it, since this is with HTML sanitizing not enabled. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK. I'm alright with this as is then, we can iterate to improve matters. I do think some end-to-end tests by hand would be good; the pipeline is complicated enough for doc generation that I want to be sure we're not missing something. |
||||||||
} | ||||||||
} | ||||||||
|
||||||||
var asHtml = ''; | ||||||||
|
||||||||
if (processFullDocs) { | ||||||||
// `trim` fixes an issue with line ending differences between Mac and | ||||||||
// Windows. | ||||||||
asHtml = asHtmlDocument.body.innerHtml?.trim(); | ||||||||
asHtml = asHtmlFragment.outerHtml.trim(); | ||||||||
} | ||||||||
var asOneLiner = asHtmlDocument.body.children.isEmpty | ||||||||
var asOneLiner = asHtmlFragment.children.isEmpty | ||||||||
? '' | ||||||||
: asHtmlDocument.body.children.first.innerHtml; | ||||||||
: asHtmlFragment.children.first.innerHtml; | ||||||||
|
||||||||
return DocumentationRenderResult(asHtml: asHtml, asOneLiner: asOneLiner); | ||||||||
} | ||||||||
|
@@ -68,3 +81,253 @@ class DocumentationRenderResult { | |||||||
const DocumentationRenderResult( | ||||||||
{@required this.asHtml, @required this.asOneLiner}); | ||||||||
} | ||||||||
|
||||||||
bool _allowClassName(String className) => | ||||||||
className == 'deprecated' || className.startsWith('language-'); | ||||||||
|
||||||||
Iterable<String> _addLinkRel(String uri) { | ||||||||
final u = Uri.tryParse(uri); | ||||||||
if (u.host.isNotEmpty) { | ||||||||
// TODO(jonasfj): Consider allowing non-ugc links for trusted sites. | ||||||||
return ['ugc']; | ||||||||
} | ||||||||
return []; | ||||||||
} | ||||||||
|
||||||||
void _sanitize(dom.Node node) { | ||||||||
if (node is dom.Element) { | ||||||||
final tagName = node.localName.toUpperCase(); | ||||||||
if (!_allowedElements.contains(tagName)) { | ||||||||
node.remove(); | ||||||||
return; | ||||||||
} | ||||||||
node.attributes.removeWhere((k, v) { | ||||||||
final attrName = k.toString(); | ||||||||
if (attrName == 'class') { | ||||||||
node.classes.removeWhere((cn) => !_allowClassName(cn)); | ||||||||
return node.classes.isEmpty; | ||||||||
} | ||||||||
return !_isAttributeAllowed(tagName, attrName, v); | ||||||||
}); | ||||||||
if (tagName == 'A') { | ||||||||
final href = node.attributes['href']; | ||||||||
if (href != null) { | ||||||||
final rels = _addLinkRel(href); | ||||||||
if (rels != null && rels.isNotEmpty) { | ||||||||
node.attributes['rel'] = rels.join(' '); | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
if (node.hasChildNodes()) { | ||||||||
// doing it in reverse order, because we could otherwise skip one, when a | ||||||||
// node is removed... | ||||||||
for (var i = node.nodes.length - 1; i >= 0; i--) { | ||||||||
_sanitize(node.nodes[i]); | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
|
||||||||
bool _isAttributeAllowed(String tagName, String attrName, String value) { | ||||||||
if (_alwaysAllowedAttributes.contains(attrName)) return true; | ||||||||
|
||||||||
// Special validators for special attributes on special tags (href/src/cite) | ||||||||
final attributeValidators = _elementAttributeValidators[tagName]; | ||||||||
if (attributeValidators == null) { | ||||||||
return false; | ||||||||
} | ||||||||
|
||||||||
final validator = attributeValidators[attrName]; | ||||||||
if (validator == null) { | ||||||||
return false; | ||||||||
} | ||||||||
|
||||||||
return validator(value); | ||||||||
} | ||||||||
|
||||||||
// Inspired by the set of HTML tags allowed in GFM. | ||||||||
final _allowedElements = <String>{ | ||||||||
'H1', | ||||||||
'H2', | ||||||||
'H3', | ||||||||
'H4', | ||||||||
'H5', | ||||||||
'H6', | ||||||||
'H7', | ||||||||
'H8', | ||||||||
'BR', | ||||||||
'B', | ||||||||
'I', | ||||||||
'STRONG', | ||||||||
'EM', | ||||||||
'A', | ||||||||
'PRE', | ||||||||
'CODE', | ||||||||
'IMG', | ||||||||
'TT', | ||||||||
'DIV', | ||||||||
'INS', | ||||||||
'DEL', | ||||||||
'SUP', | ||||||||
'SUB', | ||||||||
'P', | ||||||||
'OL', | ||||||||
'UL', | ||||||||
'TABLE', | ||||||||
'THEAD', | ||||||||
'TBODY', | ||||||||
'TFOOT', | ||||||||
'BLOCKQUOTE', | ||||||||
'DL', | ||||||||
'DT', | ||||||||
'DD', | ||||||||
'KBD', | ||||||||
'Q', | ||||||||
'SAMP', | ||||||||
'VAR', | ||||||||
'HR', | ||||||||
'RUBY', | ||||||||
'RT', | ||||||||
'RP', | ||||||||
'LI', | ||||||||
'TR', | ||||||||
'TD', | ||||||||
'TH', | ||||||||
'S', | ||||||||
'STRIKE', | ||||||||
'SUMMARY', | ||||||||
'DETAILS', | ||||||||
'CAPTION', | ||||||||
'FIGURE', | ||||||||
'FIGCAPTION', | ||||||||
'ABBR', | ||||||||
'BDO', | ||||||||
'CITE', | ||||||||
'DFN', | ||||||||
'MARK', | ||||||||
'SMALL', | ||||||||
'SPAN', | ||||||||
'TIME', | ||||||||
'WBR', | ||||||||
}; | ||||||||
|
||||||||
// Inspired by the set of HTML attributes allowed in GFM. | ||||||||
final _alwaysAllowedAttributes = <String>{ | ||||||||
'abbr', | ||||||||
'accept', | ||||||||
'accept-charset', | ||||||||
'accesskey', | ||||||||
'action', | ||||||||
'align', | ||||||||
'alt', | ||||||||
'aria-describedby', | ||||||||
'aria-hidden', | ||||||||
'aria-label', | ||||||||
'aria-labelledby', | ||||||||
'axis', | ||||||||
'border', | ||||||||
'cellpadding', | ||||||||
'cellspacing', | ||||||||
'char', | ||||||||
'charoff', | ||||||||
'charset', | ||||||||
'checked', | ||||||||
'clear', | ||||||||
'cols', | ||||||||
'colspan', | ||||||||
'color', | ||||||||
'compact', | ||||||||
'coords', | ||||||||
'datetime', | ||||||||
'dir', | ||||||||
'disabled', | ||||||||
'enctype', | ||||||||
'for', | ||||||||
'frame', | ||||||||
'headers', | ||||||||
'height', | ||||||||
'hreflang', | ||||||||
'hspace', | ||||||||
'ismap', | ||||||||
'label', | ||||||||
'lang', | ||||||||
'maxlength', | ||||||||
'media', | ||||||||
'method', | ||||||||
'multiple', | ||||||||
'name', | ||||||||
'nohref', | ||||||||
'noshade', | ||||||||
'nowrap', | ||||||||
'open', | ||||||||
'prompt', | ||||||||
'readonly', | ||||||||
'rel', | ||||||||
'rev', | ||||||||
'rows', | ||||||||
'rowspan', | ||||||||
'rules', | ||||||||
'scope', | ||||||||
'selected', | ||||||||
'shape', | ||||||||
'size', | ||||||||
'span', | ||||||||
'start', | ||||||||
'summary', | ||||||||
'tabindex', | ||||||||
'target', | ||||||||
'title', | ||||||||
'type', | ||||||||
'usemap', | ||||||||
'valign', | ||||||||
'value', | ||||||||
'vspace', | ||||||||
'width', | ||||||||
'itemprop', | ||||||||
}; | ||||||||
|
||||||||
bool _alwaysAllowed(String _) => true; | ||||||||
|
||||||||
bool _validLink(String url) { | ||||||||
try { | ||||||||
final uri = Uri.parse(url); | ||||||||
return uri.isScheme('https') || | ||||||||
uri.isScheme('http') || | ||||||||
uri.isScheme('mailto') || | ||||||||
!uri.hasScheme; | ||||||||
} on FormatException { | ||||||||
return false; | ||||||||
} | ||||||||
} | ||||||||
|
||||||||
bool _validUrl(String url) { | ||||||||
try { | ||||||||
final uri = Uri.parse(url); | ||||||||
return uri.isScheme('https') || uri.isScheme('http') || !uri.hasScheme; | ||||||||
} on FormatException { | ||||||||
return false; | ||||||||
} | ||||||||
} | ||||||||
|
||||||||
final _citeAttributeValidator = <String, bool Function(String)>{ | ||||||||
'cite': _validUrl, | ||||||||
}; | ||||||||
|
||||||||
final _elementAttributeValidators = | ||||||||
<String, Map<String, bool Function(String)>>{ | ||||||||
'A': { | ||||||||
'href': _validLink, | ||||||||
}, | ||||||||
'IMG': { | ||||||||
'src': _validUrl, | ||||||||
'longdesc': _validUrl, | ||||||||
}, | ||||||||
'DIV': { | ||||||||
'itemscope': _alwaysAllowed, | ||||||||
'itemtype': _alwaysAllowed, | ||||||||
}, | ||||||||
'BLOCKQUOTE': _citeAttributeValidator, | ||||||||
'DEL': _citeAttributeValidator, | ||||||||
'INS': _citeAttributeValidator, | ||||||||
'Q': _citeAttributeValidator, | ||||||||
}; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,2 @@ | ||
// Generated code. Do not modify. | ||
const packageVersion = '4.0.0'; | ||
const packageVersion = '4.1.0-dev'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good place for this code. I had considered inserting this somewhere in the markdown rendering pipeline too to try to check embedded HTML, but perhaps users of the markdown output already have their own sanitization in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
package:markdown
explicitly says to do sanitization as a post-processing step:https://pub.dev/packages/markdown#html-sanitization
They also reference Markdown's XSS Vulnerability (and how to mitigate it).
TL;DR: XSS mitigation is best done post-markdown rendering.
Maybe it's possible to write a custom implementation of HtmlRenderer which does sanitizing on-the-fly. But I'm not sure if
Text
nodes in markdown can contain inline HTML, as isHtmlRenderer
does not escape<>
when renderingText
nodes.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a quick stab a it, and it doesn't look easy t make
HtmlRenderer
produce safe HTML... Even just usinghtmlSerializeEscape
when outputtingText
broke a lot of tests that aimed to ensure that inline HTML keeps working.My two cents is that maybe... but probably we have to sanitize after markdown rendering -- otherwise, we'll probably have to dramatically refactor our markdown rendering pipeline.
If we wanted to support
{@tool}
directives which can side-step HTML sanitizing, then I suggest we extend the markdown parser with custom block syntax, and have it translate{@tool foo --my-flag}...{@end-tool}
into<pre data-tool="foo --my-flag">...</pre>
, then we could replace these<pre>
tags after HTML sanitizing, run tools at this point and decide if we want to sanitize the output of a tool on a tool by tool basis.Played a bit around with
package:markdown
, and we can do things like:Then we just have find
<pre data-tool="COMMAND">STDIN</pre>
afterwards, decode the HTML text and attribute text before running the tool. Anyways, maybe that's a larger refactor :DThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this probably involves a quite a bit of refactoring to make it work -- as we would effectively need to move execution of tools and rendering of markdown into the renderer. I think it makes some sense to put rendering of markdown in the renderer, but I can also see the argument that execution of tools might not need to happen there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh. Moving tool execution to the back end rather than the front breaks all kinds of internal assumptions in dartdoc and runs into a sync vs. async problem as well IIRC from the last time I tried to change the pipeline.
When you say that execution of tools "might not need to happen here" do you mean that it literally doesn't need to run, or just that it doesn't have to be moved? I do wonder if we can get away with "not solving this" for pub for the moment -- if we just forbid tool execution when the post-processing flag is on and then leave me the cleanup to get this working in a more comprehensive manner. IIRC tool execution is forbidden anyhow on pub?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes,
I'm thinking that -- if you want to do tool execution you have two options:
It's not unreasonable that dartdoc simply stick with (1), and then if needed expand the allowlist for the sanitizer. In practice most tools probably just want to:
<img>
tag (allowed),And maybe allowing embedding of sandboxed iframes could offer an escape hatch for remaining use-cases. So perhaps option (1) is fine, and we just always sanitize output from tools and
{@inject-html}
directives.Note on HTML sanitizing:
<style>
or attributesstyle="..."
)<script>
and attributesonclick="..."
)Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed notes on all of this!
Thinking about this from dartdoc's perspective going with
#1
seems reasonable for now. Being as safe as possible seems a good default choice in any event. But we're going to have to do something about Flutter as script tags are part of their existing tool infrastructure. Perhaps we can make the sanitizer able to be disabled in dartdoc_options.yaml so it can be scoped? Pub already will filter things from dartdoc_options files so that seems like it could be reasonable to do (as long as pub knows that dartdoc_options.yaml files can be located anywhere in the tree?).Examples from Flutter docs:
Trusted domain iframe from a
youtube
directive:https://api.flutter.dev/flutter/material/material-library.html
Here is an example of a tool inserting script tags:
https://api.flutter.dev/flutter/material/AboutListTile-class.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be easy allow iframes from trusted domains. Or we could simply hardcode an exception for youtube, that seems reasonable given how many users want to embed youtube.
But in general, I suspect
<iframe sandbox="allow-same-origin allow-scripts"
is reasonably safe, so long as the embedded document is, either: (i) from a separate domain, or (ii) trusted (if it's from the same domain). It seems like a reasonable expectation that if the document is from the same domain, then it's trusted.Embedding
<script>
inline violates a lot of good practices, include security recommendations around content-security-policies (I think we strive to use something like strict CSP whenever possible).In many cases, it would be better if the
{@tool
directive generated something like:Then
dartdoc
options to inject a<script src="static/snippet-buttons.js">
file in the header. Then havesnippet-buttons.js
do something like:Hmm, looking at it Flutter docs already embed
https://api.flutter.dev/assets/snippets.js
, so instead of embeddingonclick
handlers in the HTML code, they just have usedata-...
properties (or special CSS classes) to indicate that an HTML element shouldonclick
handlers, and set it up in anDOMContentLoaded
handler insnippets.js
.It's similar to what we do with syntax highlighting, we just apply a special CSS class
language-<language>
on a<pre>
tag, and thenhighlight.pack.js?v1
will do the rest in anDOMContentLoaded
handler.So perhaps we should just allow custom
data-
attributes, or we could limit it to only allow data-attributes on the formdata-custom-*
(that way you can't mess withdata-base-href
anddata-using-base-href
which are already used in dartdoc generate output).