lolbinarycat (Contributor) commented Aug 17, 2025

Best reviewed one commit at a time.

I kept finding more edge cases, so I ended up making quite significant changes to the parser so that it preserves state across events and handles multiline attributes correctly.

fixes #145529

rustbot (Collaborator) commented Aug 17, 2025

r? @notriddle

rustbot has assigned @notriddle.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-rustdoc (Relevant to the rustdoc team, which will review and decide on the PR/issue.) labels Aug 17, 2025

@lolbinarycat lolbinarycat force-pushed the rustdoc-invalid_html_tags-svg-145529 branch from f2456a7 to adf1e7d Compare August 17, 2025 18:27

klensy (Contributor) commented Aug 17, 2025

> pulldown-cmark splits html blocks

The pulldown-cmark version in use is slightly old, for reasons; this behavior may have changed in more recent versions.

// for some reason, pulldown-cmark splits html blocks into separate events for each line.
// we undo this, in order to handle multi-line tags.
match (a, b) {
((Event::Html(_), ra), (Event::Html(_), rb)) if ra.end == rb.start => {
Contributor

This won't work if you wrap an html block in a markdown blockquote.

$ cargo run --release -- --events
    Finished `release` profile [optimized] target(s) in 0.06s
     Running `target/release/pulldown-cmark --events`
> <div
> class="foo">
0..22: Start(BlockQuote(None))
2..22: Start(HtmlBlock)
2..7: Html(Borrowed("<div\n"))
9..22: Html(Borrowed("class=\"foo\">\n"))
2..22: End(HtmlBlock)
0..22: End(BlockQuote(None))
EOF
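
The dump shows why the `ra.end == rb.start` check misses this case: the first Html event ends at byte 7, the second starts at byte 9, and the two bytes in between are the blockquote marker. A minimal sketch in plain Rust, where the helper names `is_contiguous` and `gap` are made up for illustration and are not part of rustdoc or pulldown-cmark:

```rust
use std::ops::Range;

/// The adjacency test from the snippet under review:
/// two events are merged only if their ranges touch.
fn is_contiguous(a: &Range<usize>, b: &Range<usize>) -> bool {
    a.end == b.start
}

/// Returns the source text lying between two event ranges.
fn gap<'a>(src: &'a str, a: &Range<usize>, b: &Range<usize>) -> &'a str {
    &src[a.end..b.start]
}
```

With the source `"> <div\n> class=\"foo\">\n"` and the ranges `2..7` and `9..22` from the dump, `is_contiguous` is false and `gap` returns the `> ` marker. Slicing the whole block range `2..22` would pull that marker into the text, which matters for any check that searches the range for `>`.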

Contributor Author

What do you think should be done instead, then? Unconditionally coalesce adjacent Html events, using Cow::Owned if needed? That could potentially also break the check that looks for > within the range.

The other option would be fully refactoring the parser to move its state into a struct so that it can be passed multiple strings and doesn't need to have the strings concatenated. If we're doing that we can also refactor the tag parsing a bit so unfinished tags (those missing >) are actually treated as a separate error case from unclosed tags.

The latter would be more robust, but also a fair bit more work.
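
A rough sketch of what that second option could look like, using only the standard library. `TagScanner`, `Found`, `feed`, and `finish` are hypothetical names invented here, not the parser that landed in the PR. The point is only that the state lives in a struct, so each Html event can be fed separately, and a tag that never reaches `>` is reported as its own case. Comments, doctype declarations, and the like are deliberately ignored.

```rust
/// Hypothetical classification of what the scanner saw.
#[derive(Debug, PartialEq)]
enum Found {
    /// A complete opening tag, e.g. `<div class="foo">`.
    Open(String),
    /// A complete closing tag, e.g. `</div>`.
    Close(String),
    /// Input ended between `<` and `>`, e.g. `<img`.
    Unfinished(String),
}

/// Hypothetical scanner whose state survives across `feed` calls.
#[derive(Default)]
struct TagScanner {
    /// Name of the tag being read, if we are between `<` and `>`.
    current: Option<String>,
    /// True if the current tag started with `</`.
    is_close: bool,
    /// True while characters still belong to the tag name.
    reading_name: bool,
    /// The quote character, if we are inside a quoted attribute value.
    quote: Option<char>,
    found: Vec<Found>,
}

impl TagScanner {
    /// Consume one chunk of text (for example, one `Html` event).
    fn feed(&mut self, chunk: &str) {
        for c in chunk.chars() {
            // Outside a tag: only `<` is interesting.
            let Some(mut name) = self.current.take() else {
                if c == '<' {
                    self.current = Some(String::new());
                    self.is_close = false;
                    self.reading_name = true;
                }
                continue;
            };
            if let Some(q) = self.quote {
                // Inside a quoted value, `>` does not end the tag.
                if c == q {
                    self.quote = None;
                }
            } else if c == '>' {
                let tag = if self.is_close { Found::Close(name) } else { Found::Open(name) };
                self.found.push(tag);
                continue; // `current` stays `None`: we are outside a tag again.
            } else if c == '"' || c == '\'' {
                self.quote = Some(c);
                self.reading_name = false;
            } else if self.reading_name {
                if c == '/' && name.is_empty() {
                    self.is_close = true;
                } else if c.is_ascii_alphanumeric() {
                    name.push(c);
                } else {
                    self.reading_name = false;
                }
            }
            self.current = Some(name);
        }
    }

    /// End of input: a tag still open here never saw its `>`.
    fn finish(mut self) -> Vec<Found> {
        if let Some(name) = self.current.take() {
            self.found.push(Found::Unfinished(name));
        }
        self.found
    }
}
```

Feeding the chunks `"<div\n"`, `"class=\"a>b\">\n"`, and `"</div>\n<img"` one at a time and then calling `finish` yields `Open("div")`, `Close("div")`, and `Unfinished("img")`: the tag split across two chunks is recognized, the quoted `>` does not end it early, and the trailing `<img` is reported separately from a complete tag.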

Contributor

> Unconditionally coalesce adjacent Html events, using Cow::Owned if needed?

I think that's the easiest way to handle it.

    while let Some((event, range)) = p.next() {
        match event {
            Event::Start(Tag::CodeBlock(_)) => in_code_block = true,
            Event::Html(text) | Event::InlineHtml(text) if !in_code_block => {
                extract_tags(&mut tags, &text, range, dox, &mut is_in_comment, &report_diag)
            }
            // for some reason, pulldown-cmark splits html blocks into separate events for each line.
            // we undo this, in order to handle multi-line tags.
            Event::Start(Tag::HtmlBlock) if !in_code_block => {
                let mut html_text = String::new();
                while let Some((event_inner, _)) = p.next() {
                    match event_inner {
                        Event::Html(text) => {
                            html_text.push_str(&text[..]);
                        }
                        Event::End(TagEnd::HtmlBlock) => break,
                        _ => unreachable!("html is supposed to be a leaf block"),
                    }
                }
                extract_tags(&mut tags, &html_text, range, dox, &mut is_in_comment, &report_diag);
            }
            Event::End(TagEnd::CodeBlock) => in_code_block = false,
            _ => {}
        }
    }

Contributor Author

That still doesn't solve the issue of > from blockquotes. I think refactoring the parser is probably the correct thing to do here.

@lolbinarycat lolbinarycat marked this pull request as draft August 18, 2025 21:59
rustbot added the S-waiting-on-author label (Status: This is awaiting some action, such as code changes or more information, from the author) and removed the S-waiting-on-review label Aug 18, 2025
@lolbinarycat lolbinarycat force-pushed the rustdoc-invalid_html_tags-svg-145529 branch from 50d97d1 to 6e6de07 Compare August 21, 2025 18:48

previously, this lint did not distinguish between `<img` and `<img>`,
and since the latter should be accepted under html5,
the former was also accepted.

the parser now also handles multi-line tags and multi-line attributes.
@lolbinarycat lolbinarycat force-pushed the rustdoc-invalid_html_tags-svg-145529 branch from 6e6de07 to d022089 Compare August 21, 2025 20:05
@lolbinarycat lolbinarycat marked this pull request as ready for review August 21, 2025 21:28
rustbot added the S-waiting-on-review label and removed the S-waiting-on-author label Aug 21, 2025
rustbot (Collaborator) commented Aug 21, 2025

⚠️ Warning ⚠️

  • There are issue links (such as #123) in the commit messages of the following commits.
    Please move them to the PR description, to avoid spamming the issues with references to the commit, and so this bot can automatically canonicalize them to avoid issues with subtree.

GuillaumeGomez (Member) commented

Seems like it's already a big improvement over the existing behavior, so let's merge it.

@notriddle: Based on your comments, you're not completely satisfied with the current fix so I suggest we do iterative improvements in follow-up(s).

Thanks to both of you in any case!

@bors r+ rollup

bors (Collaborator) commented Aug 25, 2025

📌 Commit d022089 has been approved by GuillaumeGomez

It is now in the queue for this repository.

bors added the S-waiting-on-bors label (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) and removed the S-waiting-on-review label Aug 25, 2025
jhpratt added a commit to jhpratt/rust that referenced this pull request Aug 25, 2025
…tags-svg-145529, r=GuillaumeGomez

make rustdoc::invalid_html_tags more robust

bors added a commit that referenced this pull request Aug 25, 2025
Rollup of 12 pull requests

Successful merges:

 - #143193 (Port `#[link]` to the new attribute parsing infrastructure )
 - #144373 (remove deprecated Error::description in impls)
 - #144885 (Implement some more checks in `ptr_guaranteed_cmp`. )
 - #145535 (make rustdoc::invalid_html_tags more robust)
 - #145766 (test(rustfmt): Verify frontmatter is preserved)
 - #145811 (Fix some minor issues in comments)
 - #145814 (Handle unwinding fatal errors in codegen workers)
 - #145815 (Wait for DPkg frontend lock when trying to remove packages)
 - #145821 (compiletest: if a compiler fails, show its output)
 - #145845 (Make `x test distcheck` self-contained)
 - #145847 (Don't show warnings from xcrun with -Zverbose-internals)
 - #145856 (Update books)

r? `@ghost`
`@rustbot` modify labels: rollup
Zalathar added a commit to Zalathar/rust that referenced this pull request Aug 26, 2025
…tags-svg-145529, r=GuillaumeGomez
bors added a commit that referenced this pull request Aug 26, 2025
Rollup of 13 pull requests

Successful merges:

 - #143193 (Port `#[link]` to the new attribute parsing infrastructure )
 - #143689 (Allow linking a prebuilt optimized compiler-rt builtins library)
 - #144885 (Implement some more checks in `ptr_guaranteed_cmp`. )
 - #145535 (make rustdoc::invalid_html_tags more robust)
 - #145766 (test(rustfmt): Verify frontmatter is preserved)
 - #145811 (Fix some minor issues in comments)
 - #145814 (Handle unwinding fatal errors in codegen workers)
 - #145815 (Wait for DPkg frontend lock when trying to remove packages)
 - #145821 (compiletest: if a compiler fails, show its output)
 - #145845 (Make `x test distcheck` self-contained)
 - #145847 (Don't show warnings from xcrun with -Zverbose-internals)
 - #145856 (Update books)
 - #145858 (Update wasm-component-ld dependency)

r? `@ghost`
`@rustbot` modify labels: rollup
bors added a commit that referenced this pull request Aug 26, 2025
Rollup of 12 pull requests

Successful merges:

 - #143689 (Allow linking a prebuilt optimized compiler-rt builtins library)
 - #144885 (Implement some more checks in `ptr_guaranteed_cmp`. )
 - #145535 (make rustdoc::invalid_html_tags more robust)
 - #145766 (test(rustfmt): Verify frontmatter is preserved)
 - #145811 (Fix some minor issues in comments)
 - #145814 (Handle unwinding fatal errors in codegen workers)
 - #145815 (Wait for DPkg frontend lock when trying to remove packages)
 - #145821 (compiletest: if a compiler fails, show its output)
 - #145845 (Make `x test distcheck` self-contained)
 - #145847 (Don't show warnings from xcrun with -Zverbose-internals)
 - #145856 (Update books)
 - #145858 (Update wasm-component-ld dependency)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit aecc028 into rust-lang:master Aug 26, 2025
10 checks passed
@rustbot rustbot added this to the 1.91.0 milestone Aug 26, 2025
rust-timer added a commit that referenced this pull request Aug 26, 2025
Rollup merge of #145535 - lolbinarycat:rustdoc-invalid_html_tags-svg-145529, r=GuillaumeGomez
github-actions bot pushed a commit to rust-lang/compiler-builtins that referenced this pull request Aug 28, 2025
Rollup of 12 pull requests

Successful merges:

 - rust-lang/rust#143689 (Allow linking a prebuilt optimized compiler-rt builtins library)
 - rust-lang/rust#144885 (Implement some more checks in `ptr_guaranteed_cmp`. )
 - rust-lang/rust#145535 (make rustdoc::invalid_html_tags more robust)
 - rust-lang/rust#145766 (test(rustfmt): Verify frontmatter is preserved)
 - rust-lang/rust#145811 (Fix some minor issues in comments)
 - rust-lang/rust#145814 (Handle unwinding fatal errors in codegen workers)
 - rust-lang/rust#145815 (Wait for DPkg frontend lock when trying to remove packages)
 - rust-lang/rust#145821 (compiletest: if a compiler fails, show its output)
 - rust-lang/rust#145845 (Make `x test distcheck` self-contained)
 - rust-lang/rust#145847 (Don't show warnings from xcrun with -Zverbose-internals)
 - rust-lang/rust#145856 (Update books)
 - rust-lang/rust#145858 (Update wasm-component-ld dependency)

r? `@ghost`
`@rustbot` modify labels: rollup
Labels
S-waiting-on-bors, T-rustdoc
Linked issue: Self-closing elements spanning multiple lines emit warnings
7 participants