-
Notifications
You must be signed in to change notification settings - Fork 3.5k
fix: resolve even-numbered backtick precedence issue (#3776) #3786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: resolve even-numbered backtick precedence issue (#3776) #3786
Conversation
- Update blockSkip regex to properly handle multiple backticks using inlineCode pattern - Change tokenization order to process codespans before emphasis (CommonMark precedence) - Add comprehensive unit tests covering main issue and edge cases - Fix ensures even-numbered backticks (2, 4, 6, etc.) create proper codespans Fixes markedjs#3776
|
@sanki92 is attempting to deploy a commit to the MarkedJS Team on Vercel. A member of the Team first needs to authorize it. |
Summary of ChangesHello @sanki92, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical parsing issue in the Markdown renderer where codespans created with even-numbered backticks were not correctly recognized, allowing unintended emphasis formatting. The changes ensure strict adherence to CommonMark specifications by reordering token processing to prioritize codespans and enhancing the regular expression used for block skipping, thereby preventing incorrect rendering of inline code. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly fixes a precedence issue where even-numbered backtick sequences were not being parsed as codespans correctly. The approach of reordering the tokenization logic and updating the blockSkip regex is sound, and the new unit tests are thorough. I've included a couple of suggestions to improve maintainability by reducing code duplication in both the regular expression definitions and the new tests.
src/rules.ts
Outdated
|
|
||
| // sequences em should skip over [title](link), `code`, <html> | ||
| const blockSkip = /\[[^\[\]]*?\]\((?:\\[\s\S]|[^\\\(\)]|\((?:\\[\s\S]|[^\\\(\)])*\))*\)|`[^`]*?`|<(?! )[^<>]*?>/g; | ||
| const blockSkip = /\[[^\[\]]*?\]\((?:\\[\s\S]|[^\\\(\)]|\((?:\\[\s\S]|[^\\\(\)])*\))*\)|(`+)([^`]|[^`][\s\S]*?[^`])\1(?!`)|<(?! )[^<>]*?>/g; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex pattern for matching code spans, (+)([^`]|[^`][\s\S]*?[^])\1(?!), is now duplicated from the inlineCode` rule (line 253). To improve maintainability and adhere to the DRY (Don't Repeat Yourself) principle, consider extracting this pattern into a shared constant. This would make future updates to the pattern easier and less error-prone.
For example, you could define a constant and reuse it:
const codeSpanPattern = '(`+)([^`]|[^`][\s\S]*?[^`])\1(?!`)';
const inlineCode = new RegExp(`^${codeSpanPattern}`);
// ...
const blockSkip = new RegExp(`\[[^\[\]]*?\]\((?:...)*\)|${codeSpanPattern}|<(?! )[^<>]*?>`, 'g');There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch! 👍 Extracted the shared pattern into _codeSpanPattern to avoid duplication. Much cleaner now.
| it('should prioritize codespans over emphasis for double backticks', () => { | ||
| const input = '**You might think this should be bold, but it should actually be regular text because codespans have higher priority: ``**``'; | ||
| const html = marked(input); | ||
|
|
||
| // Should create a codespan containing **, not emphasis | ||
| assert.ok(html.includes('<code>**</code>'), 'Should contain codespan with **'); | ||
| assert.ok(!html.includes('<strong>'), 'Should not contain strong tags'); | ||
| }); | ||
|
|
||
| it('should prioritize codespans over emphasis for quadruple backticks', () => { | ||
| const input = '**You might think this should be bold, but: ````**````'; | ||
| const html = marked(input); | ||
|
|
||
| // Should create a codespan containing **, not emphasis | ||
| assert.ok(html.includes('<code>**</code>'), 'Should contain codespan with **'); | ||
| assert.ok(!html.includes('<strong>'), 'Should not contain strong tags'); | ||
| }); | ||
|
|
||
| it('should continue working correctly for single backticks', () => { | ||
| const input = '**You might think this should be bold, but: `**`'; | ||
| const html = marked(input); | ||
|
|
||
| // Should create a codespan containing **, not emphasis | ||
| assert.ok(html.includes('<code>**</code>'), 'Should contain codespan with **'); | ||
| assert.ok(!html.includes('<strong>'), 'Should not contain strong tags'); | ||
| }); | ||
|
|
||
| it('should continue working correctly for triple backticks', () => { | ||
| const input = '**You might think this should be bold, but: ```**```'; | ||
| const html = marked(input); | ||
|
|
||
| // Should create a codespan containing **, not emphasis | ||
| assert.ok(html.includes('<code>**</code>'), 'Should contain codespan with **'); | ||
| assert.ok(!html.includes('<strong>'), 'Should not contain strong tags'); | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These four test cases are very similar, differing only in the number of backticks used. To make the test suite more concise and maintainable, you could refactor them into a single, data-driven test. This would reduce code duplication and make it easier to add more variations in the future.
| it('should prioritize codespans over emphasis for double backticks', () => { | |
| const input = '**You might think this should be bold, but it should actually be regular text because codespans have higher priority: ``**``'; | |
| const html = marked(input); | |
| // Should create a codespan containing **, not emphasis | |
| assert.ok(html.includes('<code>**</code>'), 'Should contain codespan with **'); | |
| assert.ok(!html.includes('<strong>'), 'Should not contain strong tags'); | |
| }); | |
| it('should prioritize codespans over emphasis for quadruple backticks', () => { | |
| const input = '**You might think this should be bold, but: ````**````'; | |
| const html = marked(input); | |
| // Should create a codespan containing **, not emphasis | |
| assert.ok(html.includes('<code>**</code>'), 'Should contain codespan with **'); | |
| assert.ok(!html.includes('<strong>'), 'Should not contain strong tags'); | |
| }); | |
| it('should continue working correctly for single backticks', () => { | |
| const input = '**You might think this should be bold, but: `**`'; | |
| const html = marked(input); | |
| // Should create a codespan containing **, not emphasis | |
| assert.ok(html.includes('<code>**</code>'), 'Should contain codespan with **'); | |
| assert.ok(!html.includes('<strong>'), 'Should not contain strong tags'); | |
| }); | |
| it('should continue working correctly for triple backticks', () => { | |
| const input = '**You might think this should be bold, but: ```**```'; | |
| const html = marked(input); | |
| // Should create a codespan containing **, not emphasis | |
| assert.ok(html.includes('<code>**</code>'), 'Should contain codespan with **'); | |
| assert.ok(!html.includes('<strong>'), 'Should not contain strong tags'); | |
| }); | |
| const backtickTests = [ | |
| { name: 'single', ticks: '`' }, | |
| { name: 'double', ticks: '``' }, | |
| { name: 'triple', ticks: '```' }, | |
| { name: 'quadruple', ticks: '````' }, | |
| ]; | |
| for (const { name, ticks } of backtickTests) { | |
| it(`should prioritize codespans over emphasis for ${name} backticks`, () => { | |
| const input = `**You might think this should be bold, but: ${ticks}**${ticks}`; | |
| const html = marked(input); | |
| // Should create a codespan containing **, not emphasis | |
| assert.ok(html.includes('<code>**</code>'), 'Should contain codespan with **'); | |
| assert.ok(!html.includes('<strong>'), 'Should not contain strong tags'); | |
| }); | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect suggestion! Refactored into a data-driven test with a loop. Way more maintainable - thanks!
- Extract shared codespan pattern to reduce duplication (DRY principle) - Refactor tests into data-driven format for maintainability - Improve code consistency between inlineCode and blockSkip regex
UziTech
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- use test/specs/new for markdown tests
- don't rearrange Lexer
- use the edit method instead of new Regexp
…ve tests to specs/new
|
Hi @UziTech, Thanks for the feedback! I've addressed all your points:
The fix still works correctly. All 1707 tests are passing. |
UziTech
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regexp has a redos vulnerability
You can check by pasting the regexp into https://makenowjust-labs.github.io/recheck/playground/
the full regexp:
/\[[^\[\]]*?\]\((?:\\[\s\S]|[^\\\(\)]|\((?:\\[\s\S]|[^\\\(\)])*\))*\)|(`+)([^`]|[^`][\s\S]*?[^`])\1(?!`)|<(?! )[^<>]*?>/
- Replace complex codespan pattern with simple ReDoS-safe version - Change from vulnerable (+)([^]|[^][\s\S]*?[^])\1(?!) - To safe pattern \+[\s\S]*?\+ without nested quantifiers
|
Hi @UziTech, Thanks for catching the ReDoS vulnerability! I've fixed it by replacing the complex codespan pattern with a much simpler, ReDoS-safe version. The new pattern:
|
|
Hi @UziTech, just following up on this PR. I’ve addressed the ReDoS vulnerability and all tests are passing. If everything looks good to you and there’s nothing else to change, could you add the |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
This actually breaks some markdown like |
- Change alternation order from shortest to longest to ensure proper matching - Use pattern: ``[^]*?``|`[^]*?`|`[^]*?`|[^]*? - Fixes case where **This should be bold `** was incorrectly blocked - Maintains fix for original issue: **text `**` correctly creates codespan - All 1707 tests pass, ReDoS-safe pattern verified
|
Fixed ✅ |
|
I think the codepattern regexp should be Looks like you need to rebase to resolve conflicts with the latest version Can you also add the following to the test: |
b57392c to
1c8089b
Compare
|
Done ✅ |
Co-authored-by: Tony Brix <[email protected]>
UziTech
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks 💯
Marked version: 16.3.0
Markdown flavor: CommonMark
Description
Even-numbered backtick strings (2, 4, 6, etc.) weren't creating proper codespans, allowing emphasis to be incorrectly processed.
Before:
**text ``**`` more**→<strong>text ``</strong>`` more**❌After:
**text ``**`` more**→**text <code>**</code> more**✅Root Cause
blockSkipregex only matched single backticks properlySolution
blockSkipregex to handle multiple backticks (same pattern asinlineCode)Changes Made
Contributor
test/unit/issue-3776-backtick-precedence.test.jswith 6 comprehensive test cases covering:Committer