Contributor

@sanki92 sanki92 commented Oct 5, 2025

Marked version: 16.3.0

Markdown flavor: CommonMark

Description

Even-numbered backtick strings (2, 4, 6, etc.) weren't creating proper codespans, allowing emphasis to be incorrectly processed.

Before: **text ``**`` more** → <strong>text ``</strong>`` more**
After: **text ``**`` more** → **text <code>**</code> more**

Root Cause

  1. blockSkip regex only matched single backticks properly
  2. Emphasis was processed before codespans (violates CommonMark precedence)

Solution

  • Updated blockSkip regex to handle multiple backticks (same pattern as inlineCode)
  • Changed tokenization order: codespans before emphasis
  • Added comprehensive unit tests

Changes Made

// src/rules.ts - Updated blockSkip regex
(`+)([^`]|[^`][\s\S]*?[^`])\1(?!`)

// src/Lexer.ts - Fixed tokenization order  
if (token = this.tokenizer.codespan(src)) { /* process first */ }
if (token = this.tokenizer.emStrong(src, maskedSrc, prevChar)) { /* process after */ }
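
To see why the old blockSkip alternative let emphasis through, the two backtick patterns can be compared in isolation. This is a minimal sketch, not marked's code: `oldCodeSkip` and `newCodeSkip` are only the backtick alternatives quoted in this PR, pulled out of the full blockSkip regex, and `skippedSpans` is a hypothetical helper introduced for illustration.

```javascript
// Only the backtick alternative of blockSkip, before and after this PR.
const oldCodeSkip = /`[^`]*?`/g;
const newCodeSkip = /(`+)([^`]|[^`][\s\S]*?[^`])\1(?!`)/g;

// Hypothetical helper: list the substrings a pattern would hide from the emphasis scan.
function skippedSpans(pattern, src) {
  return [...src.matchAll(pattern)].map((m) => m[0]);
}

const src = '**text ``**`` more**';
console.log(skippedSpans(oldCodeSkip, src)); // [ '``', '``' ]
console.log(skippedSpans(newCodeSkip, src)); // [ '``**``' ]
```

With the old pattern each pair of backticks is consumed as an empty span, so the ** between them stays visible to the emphasis scan. The new pattern consumes the whole double-backtick span, including its content.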

Contributor

  • Test(s) exist to ensure functionality and minimize regression - Added test/unit/issue-3776-backtick-precedence.test.js with 6 comprehensive test cases covering:
    • Double backticks (main issue case)
    • Quadruple backticks
    • Single & triple backticks (regression tests)
    • Mixed emphasis and codespan scenarios
    • Nested cases
  • Full test suite passes (1705/1705 tests)

Committer

- Update blockSkip regex to properly handle multiple backticks using inlineCode pattern
- Change tokenization order to process codespans before emphasis (CommonMark precedence)
- Add comprehensive unit tests covering main issue and edge cases
- Fix ensures even-numbered backticks (2, 4, 6, etc.) create proper codespans

Fixes markedjs#3776

@gemini-code-assist
Contributor

Summary of Changes

Hello @sanki92, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical parsing issue in the Markdown renderer where codespans created with even-numbered backticks were not correctly recognized, allowing unintended emphasis formatting. The changes ensure strict adherence to CommonMark specifications by reordering token processing to prioritize codespans and enhancing the regular expression used for block skipping, thereby preventing incorrect rendering of inline code.

Highlights

  • Fixes Backtick Precedence Issue: Resolves a bug where even-numbered backticks (e.g., ``text``) were not correctly creating codespans, leading to emphasis being incorrectly processed within them.
  • Root Cause Identified: The problem stemmed from the blockSkip regex not properly matching multiple backticks and emphasis being processed before codespans, violating CommonMark precedence rules.
  • Solution Implemented: The blockSkip regex has been updated to handle multiple backticks, and the tokenization order in Lexer.ts has been changed to prioritize codespans over emphasis.
  • Comprehensive Testing: A new unit test file (issue-3776-backtick-precedence.test.js) has been added with 6 test cases covering various scenarios, including double, quadruple, single, triple, mixed, and nested backtick/emphasis combinations.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly fixes a precedence issue where even-numbered backtick sequences were not being parsed as codespans correctly. The approach of reordering the tokenization logic and updating the blockSkip regex is sound, and the new unit tests are thorough. I've included a couple of suggestions to improve maintainability by reducing code duplication in both the regular expression definitions and the new tests.

src/rules.ts Outdated

// sequences em should skip over [title](link), `code`, <html>
- const blockSkip = /\[[^\[\]]*?\]\((?:\\[\s\S]|[^\\\(\)]|\((?:\\[\s\S]|[^\\\(\)])*\))*\)|`[^`]*?`|<(?! )[^<>]*?>/g;
+ const blockSkip = /\[[^\[\]]*?\]\((?:\\[\s\S]|[^\\\(\)]|\((?:\\[\s\S]|[^\\\(\)])*\))*\)|(`+)([^`]|[^`][\s\S]*?[^`])\1(?!`)|<(?! )[^<>]*?>/g;
Contributor

medium

The regex pattern for matching code spans, (`+)([^`]|[^`][\s\S]*?[^`])\1(?!`), is now duplicated from the inlineCode rule (line 253). To improve maintainability and adhere to the DRY (Don't Repeat Yourself) principle, consider extracting this pattern into a shared constant. This would make future updates to the pattern easier and less error-prone.

For example, you could define a constant and reuse it:

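// Backslashes are doubled because these are string literals, not regex literals;
// a single \s or \1 would be consumed by the string parser before RegExp sees it.
const codeSpanPattern = '(`+)([^`]|[^`][\\s\\S]*?[^`])\\1(?!`)';
const inlineCode = new RegExp(`^${codeSpanPattern}`);
// ...
const blockSkip = new RegExp(`\\[[^\\[\\]]*?\\]\\((?:...)*\\)|${codeSpanPattern}|<(?! )[^<>]*?>`, 'g');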

Contributor Author

Great catch! 👍 Extracted the shared pattern into _codeSpanPattern to avoid duplication. Much cleaner now.

Comment on lines 6 to 40
it('should prioritize codespans over emphasis for double backticks', () => {
const input = '**You might think this should be bold, but it should actually be regular text because codespans have higher priority: ``**``';
const html = marked(input);

// Should create a codespan containing **, not emphasis
assert.ok(html.includes('<code>**</code>'), 'Should contain codespan with **');
assert.ok(!html.includes('<strong>'), 'Should not contain strong tags');
});

it('should prioritize codespans over emphasis for quadruple backticks', () => {
const input = '**You might think this should be bold, but: ````**````';
const html = marked(input);

// Should create a codespan containing **, not emphasis
assert.ok(html.includes('<code>**</code>'), 'Should contain codespan with **');
assert.ok(!html.includes('<strong>'), 'Should not contain strong tags');
});

it('should continue working correctly for single backticks', () => {
const input = '**You might think this should be bold, but: `**`';
const html = marked(input);

// Should create a codespan containing **, not emphasis
assert.ok(html.includes('<code>**</code>'), 'Should contain codespan with **');
assert.ok(!html.includes('<strong>'), 'Should not contain strong tags');
});

it('should continue working correctly for triple backticks', () => {
const input = '**You might think this should be bold, but: ```**```';
const html = marked(input);

// Should create a codespan containing **, not emphasis
assert.ok(html.includes('<code>**</code>'), 'Should contain codespan with **');
assert.ok(!html.includes('<strong>'), 'Should not contain strong tags');
});
Contributor

medium

These four test cases are very similar, differing only in the number of backticks used. To make the test suite more concise and maintainable, you could refactor them into a single, data-driven test. This would reduce code duplication and make it easier to add more variations in the future.

Suggested change
const backtickTests = [
{ name: 'single', ticks: '`' },
{ name: 'double', ticks: '``' },
{ name: 'triple', ticks: '```' },
{ name: 'quadruple', ticks: '````' },
];
for (const { name, ticks } of backtickTests) {
it(`should prioritize codespans over emphasis for ${name} backticks`, () => {
const input = `**You might think this should be bold, but: ${ticks}**${ticks}`;
const html = marked(input);
// Should create a codespan containing **, not emphasis
assert.ok(html.includes('<code>**</code>'), 'Should contain codespan with **');
assert.ok(!html.includes('<strong>'), 'Should not contain strong tags');
});
}

Contributor Author

Perfect suggestion! Refactored into a data-driven test with a loop. Way more maintainable - thanks!

- Extract shared codespan pattern to reduce duplication (DRY principle)
- Refactor tests into data-driven format for maintainability
- Improve code consistency between inlineCode and blockSkip regex
Member

@UziTech UziTech left a comment

  • use test/specs/new for markdown tests
  • don't rearrange Lexer
  • use the edit method instead of new Regexp

Contributor Author

sanki92 commented Oct 5, 2025

Hi @UziTech,

Thanks for the feedback! I've addressed all your points:

  • Updated blockSkip to use edit().replace('codePattern', inlineCode).getRegex() instead of new RegExp
  • Reverted to original tokenization order (emStrong before codespan)
  • Created proper markdown spec tests in test/specs/new/issue_3776_backtick_precedence.{md,html} and removed the unit test

The fix still works correctly. All 1707 tests are passing.
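
The placeholder-substitution approach described above can be sketched as follows. This is a hedged illustration: the `edit` function here is a simplified, hypothetical stand-in written for this example, not marked's actual helper, and the codespan pattern is used without a leading `^` as an assumption so that it can match anywhere inside blockSkip.

```javascript
// Hypothetical stand-in for an edit()-style regex builder; not marked's actual code.
function edit(regex, flags = '') {
  let source = typeof regex === 'string' ? regex : regex.source;
  const builder = {
    // Swap a named placeholder in the source for another pattern's source.
    replace(name, value) {
      const valueSource = typeof value === 'string' ? value : value.source;
      // A replacer function keeps `$` sequences in valueSource from being interpreted.
      source = source.replace(name, () => valueSource);
      return builder;
    },
    getRegex() {
      return new RegExp(source, flags);
    },
  };
  return builder;
}

// Codespan pattern as it stood at this point in the thread (unanchored here by assumption).
const codePattern = /(`+)([^`]|[^`][\s\S]*?[^`])\1(?!`)/;

const blockSkip = edit(
  /\[[^\[\]]*?\]\((?:\\[\s\S]|[^\\\(\)]|\((?:\\[\s\S]|[^\\\(\)])*\))*\)|codePattern|<(?! )[^<>]*?>/,
  'g',
)
  .replace('codePattern', codePattern)
  .getRegex();

const spans = [...'[a](b) **x ``**`` y** <i>'.matchAll(blockSkip)].map((m) => m[0]);
console.log(spans); // [ '[a](b)', '``**``', '<i>' ]
```

Because the link and HTML alternatives use only non-capturing groups, the backreference `\1` in the substituted pattern still refers to the opening backtick run.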

Member

@UziTech UziTech left a comment

This regexp has a ReDoS vulnerability.

You can check by pasting the regexp into https://makenowjust-labs.github.io/recheck/playground/

the full regexp:

/\[[^\[\]]*?\]\((?:\\[\s\S]|[^\\\(\)]|\((?:\\[\s\S]|[^\\\(\)])*\))*\)|(`+)([^`]|[^`][\s\S]*?[^`])\1(?!`)|<(?! )[^<>]*?>/
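
For a rough local check alongside the playground, a small timing harness can be run against the codespan alternative. This is only a sketch under stated assumptions: the stress input (a long run of backticks with no closing run) is a guess at one shape that forces repeated backtracking, not the attack string reported by the tool, and `timeSearch` and `backtickRun` are hypothetical helpers. The second pattern is the lookbehind variant proposed later in this thread. No particular timings are claimed; the point is to compare how each pattern scales as the input grows.

```javascript
// Codespan alternative flagged in this review, and the lookbehind variant
// proposed later in the thread.
const flagged = /(`+)([^`]|[^`][\s\S]*?[^`])\1(?!`)/;
const guarded = /(?<!`)(`+)[^`]+\1(?!`)/;

// Hypothetical helper: run one search, report whether it matched and how long it took.
function timeSearch(pattern, input) {
  const start = performance.now();
  const matched = pattern.test(input);
  return { matched, ms: performance.now() - start };
}

// Assumed stress input: a long run of backticks with no closing run.
// Without the lookbehind, every backtick is tried as an opener, and each
// attempt backtracks over the remainder of the run before failing.
function backtickRun(n) {
  return '`'.repeat(n) + 'a';
}

for (const n of [1000, 2000, 4000]) {
  const input = backtickRun(n);
  console.log(n, timeSearch(flagged, input), timeSearch(guarded, input));
}
```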

- Replace complex codespan pattern with simple ReDoS-safe version
- Change from vulnerable (`+)([^`]|[^`][\s\S]*?[^`])\1(?!`)
- To safe pattern `+[\s\S]*?`+ without nested quantifiers
Contributor Author

sanki92 commented Oct 7, 2025

Hi @UziTech,

Thanks for catching the ReDoS vulnerability! I've fixed it by replacing the complex codespan pattern with a much simpler, ReDoS-safe version.

The new pattern:

  • Eliminates nested quantifiers and problematic alternation
  • Maintains all original functionality
  • All 1707 tests continue to pass

Contributor Author

sanki92 commented Oct 11, 2025

Hi @UziTech, just following up on this PR. I’ve addressed the ReDoS vulnerability and all tests are passing. If everything looks good to you and there’s nothing else to change, could you add the hacktoberfest-accepted label so it counts for the event? Thanks!


Member

UziTech commented Oct 12, 2025

This actually breaks some markdown like

**This should be bold ``**`

This PR

latest marked

commonmark

- Change alternation order from shortest to longest to ensure proper matching
- Use pattern: ``[^]*?``|`[^]*?`|`[^]*?`|[^]*?
- Fixes case where **This should be bold `** was incorrectly blocked
- Maintains fix for original issue: **text `**` correctly creates codespan
- All 1707 tests pass, ReDoS-safe pattern verified
Contributor Author

sanki92 commented Oct 12, 2025

Fixed ✅

Member

UziTech commented Oct 14, 2025

I think the codepattern regexp should be (?<!`)(`+)[^`]+\1(?!`). This is safe and ensures the same number of backticks at the front and back.

Looks like you need to rebase to resolve conflicts with the latest version

Can you also add the following to the test:

<p><strong>This should be bold ``</strong>`</p>
<p><strong>This should be bold `</strong>``</p>

**This should be bold ``**`

**This should be bold `**``
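
The behaviour of the suggested pattern on these cases can be checked in isolation. This is a minimal sketch: `skippedSpans` is a hypothetical helper, and the pattern is tested on its own rather than inside the full blockSkip regex.

```javascript
// Pattern suggested in this comment: an opening run of backticks that is not
// preceded by a backtick, non-backtick content, then a closing run of equal length.
const codePattern = /(?<!`)(`+)[^`]+\1(?!`)/g;

// Hypothetical helper: which spans would be hidden from the emphasis scan?
function skippedSpans(src) {
  return [...src.matchAll(codePattern)].map((m) => m[0]);
}

console.log(skippedSpans('**text ``**`` more**'));        // [ '``**``' ]
console.log(skippedSpans('**This should be bold ``**`')); // []
console.log(skippedSpans('**This should be bold `**``')); // []
```

The balanced case is skipped, so its ** stays inside a codespan. In the two unbalanced cases nothing is skipped, which leaves the ** available to close the bold span as the expected HTML above requires.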

@sanki92 sanki92 force-pushed the fix-issue-3776-backtick-precedence branch from b57392c to 1c8089b Compare October 14, 2025 14:33
Contributor Author

sanki92 commented Oct 15, 2025

Done ✅

Member

@UziTech UziTech left a comment

Thanks 💯

This was referenced Oct 31, 2025
Development

Successfully merging this pull request may close these issues.

Even-numbered backtick strings have incorrect precedence with at least emStrong and link delimiters

3 participants