Cannot include escaped dollar when using dollars delimiters #32

rogino · 2021-09-23T23:06:46Z

The regex for dollar delimiters breaks if the content includes an escaped dollar (\$), making it impossible to type dollar literals inside math blocks:

$$
\$10.00
$$

$\$10.00$

will all fail to be parsed as a latex block.

Workarounds include using a different delimiter style or use \text{\textdollar}, but getting it to work without these would be nice.

I've been to update the math_block regex from:

/\${2}([^$]+?)\${2}/gmy to

/\${2}((?:\\\$|[^$])+?)\${2}/gmy

but it doesn't cover all cases, and I haven't been able to figure out where to start with the math_inline regex.

The text was updated successfully, but these errors were encountered:

edemaine · 2021-10-26T15:32:00Z

Unfortunately, regexes won't cut it to fix this problem in full generality, e.g. to support

$$ x = \text{what is $x$?} $$

Would you be willing to review a PR that rewrites the parsing engine to count {/}s and check whether $s are escaped with \, when processing in dollars mode?

goessner · 2021-10-27T16:08:06Z

hmm ... interesting.

Usually I am very reluctant with extending the regexes used in texmath. They are performance critical, as they are evaluated once with each user key stroke.

In fact escaped dollar sign \$ is a valid element inside latex math and interestingly it is working flawlessly inside of other non-dollar delimiters.

Dollar delimiters are special beasts. While having a short look into the math-inline regex

/\$((?:\S)|(?:\S.*?\S))\$/

I reidentified the final \S (last character before closing dollar) as a shortcut for [^\r\n\t\f\v ] (character exept), which might be easily extended to [^\r\n\t\f\v \\] presumably without too much performance cost.

After testing it in https://regex101.com/ successfully for several relevant cases, I considered it worth for also use it as a guard in the math-block regexes.

Surprisingly a drastic simplification is helping here ... from

/\${2}([^$]+?)\${2}/

to

/\${2}(.+?)\${2}/

Expect it available in the next version.

thanks ...

goessner · 2021-10-27T16:28:25Z

Erik, thanks for your potential help.

If there are some problems with my bug fix attempt, I would like to come back to your PR offer.

edemaine · 2021-10-27T19:26:26Z

/\${2}(.+?)\${2}/

Changing from [^$] to . excludes newlines. I think you want to allow single newlines (though you could forbid double newlines, as TeX does) within math. Instead of . you probably want [^], as in:

/\${2}([^]+?)\${2}/

But none of this fixes use of \$ (escaped $) or \text{$...$} (nested $) within a math expression.

One potential fix would be to use the regex as above, and then check whether the closing $ is actually escaped (had an odd number of \s before it), or has more unescaped {s than }s in it, and in that case, doing more work (probably another regex match to get contents until the next \${2}, concatenating, and trying again). This would mostly just take extra time in the weird edge cases of escaped and nested $s, which currently don't work, so seems like a win? But it would involve counting the number of {s and }s (without preceding \s) to make sure it's matched. I imagine this is all way faster than the cost of calling KaTeX, though.

Alternatively (and what I originally had in mind), the regex could match the opening \${2}, and then do a secondary search for { or } or `${2}$, checking for escaping in each case, and repeat until finding the unnested closing notation. I could test to see which is faster in which cases.

goessner · 2021-10-28T11:44:44Z

In fact I had temporarily forgotten that '.' excludes newlines. So taking your proposed '[^]' works fine ...

inline: [
   {   name: 'math_inline_double',
       rex: /\${2}(.+?)\${2}/gy
   },
   {   name: 'math_inline',
       rex: /\$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$/gy
  }
],
block: [
  {   name: 'math_block_eqno',
      rex: /\${2}([^]+?[^\\])\${2}\s*?\(([^)\s]+?)\)/gmy
  },
  {   name: 'math_block',
     rex: /\${2}([^]+?[^\\])\${2}/gmy
  }
]

... as you can see with this example code (https://github.com/goessner/markdown-it-texmath/blob/master/test/bug-dollars.html)

    const str = `
  # Simple Dollar tests
  ## Inline

  here "$a+\\$ = b$" we "$\\$$" go "$\\text{\\$some...\\$}$"

  ## Inline block (single line only)
  
  $$a+\\$ = \\text{\\$more...\\$} \\$$$  or ...

  ## Block (multiline)

  $$
  a+\\$ = \\text{\\$text...\\$} \\$
  $$
`

resulting in ...

To also handle unescaped dollars inside of \text{$...$} I see effort to use relation as disproportionate. It seems to be reasonable to also escape dollars \text{\$...\$} in that edge case.
So I would prefer to live with that small insufficiency of markdown-it-texmath for performance and simplicity reasons.

I hope I have not overlooked anything ... thanks.

edemaine · 2021-10-28T14:42:23Z

Minor points:

math_inline_double still uses .; should probably switch that to [^] too.
I think you can use [^\s\\] instead of [^\r\n\t\f\v \\]. (The behavior is slightly different: the former treats all Unicode space identically. Probably better?)
[^]+?[^\\] (which occurs in both block rules) seems to require at least two characters. Should probably be [^]*?[^\\].
Shouldn't math_inline_double have the same addition to exclude a trailing \?
\$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$ can be simplified to \$([^\r\n\t\f\v \\]|\S.*?[^\r\n\t\f\v \\])\$ or \$([^\s\\]|\S.*?[^\s\\])\$. (I'm not quite sure why you're forbidding spaces next to the $s but I assume that's intentional, to avoid some stray matching.)

These new rules seem to deal with \$ properly. Nice!!

Nested $s are for re-entering math mode. \text{$x+y$} is different from \text{\$x+y\$}:

So it's not possible to escape these instances of $, as \$ means something in LaTeX.

Unmatched braces generate errors in KaTeX, though. (I see either Uncaught ParseError: KaTeX parse error: Expected '}', got 'EOF' at end of input or KaTeX parse error: Unexpected end of input in a macro argument, expected '}' at end of input: \text{.) So perhaps that could be detected, which triggers an "extension" regex? I believe the extension regex is exactly math_inline without the leading $, or applying math_inline but starting from the final $. On input $\text{$x+y$}$, after matching $\text{$ , you'd next grab x+y$, fail, and then grab }$, and then succeed. This is quadratic time in the number of $s, but that's probably small... Alternatively, when in extension mode, we could count {/}s, so call KaTeX at most twice.

goessner · 2021-11-12T11:42:34Z

Sorry for the delay. Thanks for your valuable input.

* `math_inline_double` still uses `.`; should probably switch that to `[^]` too.

math_inline should be written on a single line, but I added it to be more forgiving here.

* I think you can use `[^\s\\]` instead of `[^\r\n\t\f\v \\]`. (The behavior is slightly different: the former treats all Unicode space identically. Probably better?)

This is definitely better ... taken.

* `[^]+?[^\\]` (which occurs in both block rules) seems to require at least two characters.  Should probably be `[^]*?[^\\]`.

Yes ... thanks for catching.

* Shouldn't `math_inline_double` have the same addition to exclude a trailing `\`?

sure ... done.

* `\$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$` can be simplified to `\$([^\r\n\t\f\v \\]|\S.*?[^\r\n\t\f\v \\])\$` or `\$([^\s\\]|\S.*?[^\s\\])\$`. (I'm not quite sure why you're forbidding spaces next to the `$`s but I assume that's intentional, to avoid some stray matching.)

yes ... again a significant improvement.

Nested $s are for re-entering math mode. \text{$x+y$} is different from \text{\$x+y\$}:
So it's not possible to escape these instances of $, as \$ means something in LaTeX.

Now I understand that reentering math mode effect. Escaping makes no sense indeed. I still consider it a not so relevant edge case in practise. I don't want to invest that significant implementation effort at current. A PR is always welcome though.

thanks again ...

goessner closed this as completed Nov 16, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Cannot include escaped dollar when using dollars delimiters #32

Cannot include escaped dollar when using dollars delimiters #32

rogino commented Sep 23, 2021

edemaine commented Oct 26, 2021

Uh oh!

goessner commented Oct 27, 2021

Uh oh!

goessner commented Oct 27, 2021

Uh oh!

edemaine commented Oct 27, 2021

Uh oh!

goessner commented Oct 28, 2021

Uh oh!

edemaine commented Oct 28, 2021

Uh oh!

goessner commented Nov 12, 2021

Uh oh!

Cannot include escaped dollar when using dollars delimiters #32

Cannot include escaped dollar when using dollars delimiters #32

Comments

rogino commented Sep 23, 2021

edemaine commented Oct 26, 2021

Uh oh!

goessner commented Oct 27, 2021

Uh oh!

goessner commented Oct 27, 2021

Uh oh!

edemaine commented Oct 27, 2021

Uh oh!

goessner commented Oct 28, 2021

Uh oh!

edemaine commented Oct 28, 2021

Uh oh!

goessner commented Nov 12, 2021

Uh oh!