Skip to content

Cannot include escaped dollar when using dollars delimiters #32

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rogino opened this issue Sep 23, 2021 · 7 comments
Closed

Cannot include escaped dollar when using dollars delimiters #32

rogino opened this issue Sep 23, 2021 · 7 comments

Comments

@rogino
Copy link

rogino commented Sep 23, 2021

The regex for dollar delimiters breaks if the content includes an escaped dollar (\$), making it impossible to type dollar literals inside math blocks:

$$
\$10.00
$$

$\$10.00$

will all fail to be parsed as a latex block.

Workarounds include using a different delimiter style or use \text{\textdollar}, but getting it to work without these would be nice.

I've been to update the math_block regex from:

/\${2}([^$]+?)\${2}/gmy to

/\${2}((?:\\\$|[^$])+?)\${2}/gmy

but it doesn't cover all cases, and I haven't been able to figure out where to start with the math_inline regex.

@edemaine
Copy link
Contributor

Unfortunately, regexes won't cut it to fix this problem in full generality, e.g. to support

$$ x = \text{what is $x$?} $$

Would you be willing to review a PR that rewrites the parsing engine to count {/}s and check whether $s are escaped with \, when processing in dollars mode?

@goessner
Copy link
Owner

hmm ... interesting.

Usually I am very reluctant with extending the regexes used in texmath. They are performance critical, as they are evaluated once with each user key stroke.

In fact escaped dollar sign \$ is a valid element inside latex math and interestingly it is working flawlessly inside of other non-dollar delimiters.

Dollar delimiters are special beasts. While having a short look into the math-inline regex

/\$((?:\S)|(?:\S.*?\S))\$/

I reidentified the final \S (last character before closing dollar) as a shortcut for [^\r\n\t\f\v ] (character exept), which might be easily extended to [^\r\n\t\f\v \\] presumably without too much performance cost.

After testing it in https://regex101.com/ successfully for several relevant cases, I considered it worth for also use it as a guard in the math-block regexes.

Surprisingly a drastic simplification is helping here ... from

/\${2}([^$]+?)\${2}/

to

/\${2}(.+?)\${2}/

Expect it available in the next version.

thanks ...

@goessner
Copy link
Owner

Erik, thanks for your potential help.

If there are some problems with my bug fix attempt, I would like to come back to your PR offer.

@edemaine
Copy link
Contributor

/\${2}(.+?)\${2}/

Changing from [^$] to . excludes newlines. I think you want to allow single newlines (though you could forbid double newlines, as TeX does) within math. Instead of . you probably want [^], as in:

/\${2}([^]+?)\${2}/

But none of this fixes use of \$ (escaped $) or \text{$...$} (nested $) within a math expression.

One potential fix would be to use the regex as above, and then check whether the closing $ is actually escaped (had an odd number of \s before it), or has more unescaped {s than }s in it, and in that case, doing more work (probably another regex match to get contents until the next \${2}, concatenating, and trying again). This would mostly just take extra time in the weird edge cases of escaped and nested $s, which currently don't work, so seems like a win? But it would involve counting the number of {s and }s (without preceding \s) to make sure it's matched. I imagine this is all way faster than the cost of calling KaTeX, though.

Alternatively (and what I originally had in mind), the regex could match the opening \${2}, and then do a secondary search for { or } or `${2}$, checking for escaping in each case, and repeat until finding the unnested closing notation. I could test to see which is faster in which cases.

@goessner
Copy link
Owner

In fact I had temporarily forgotten that '.' excludes newlines. So taking your proposed '[^]' works fine ...

inline: [
   {   name: 'math_inline_double',
       rex: /\${2}(.+?)\${2}/gy
   },
   {   name: 'math_inline',
       rex: /\$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$/gy
  }
],
block: [
  {   name: 'math_block_eqno',
      rex: /\${2}([^]+?[^\\])\${2}\s*?\(([^)\s]+?)\)/gmy
  },
  {   name: 'math_block',
     rex: /\${2}([^]+?[^\\])\${2}/gmy
  }
]

... as you can see with this example code (https://github.com/goessner/markdown-it-texmath/blob/master/test/bug-dollars.html)

    const str = `
  # Simple Dollar tests
  ## Inline

  here "$a+\\$ = b$" we "$\\$$" go "$\\text{\\$some...\\$}$"

  ## Inline block (single line only)
  
  $$a+\\$ = \\text{\\$more...\\$} \\$$$  or ...

  ## Block (multiline)

  $$
  a+\\$ = \\text{\\$text...\\$} \\$
  $$
`

resulting in ...
grafik

To also handle unescaped dollars inside of \text{$...$} I see effort to use relation as disproportionate. It seems to be reasonable to also escape dollars \text{\$...\$} in that edge case.
So I would prefer to live with that small insufficiency of markdown-it-texmath for performance and simplicity reasons.

I hope I have not overlooked anything ... thanks.

@edemaine
Copy link
Contributor

Minor points:

  • math_inline_double still uses .; should probably switch that to [^] too.
  • I think you can use [^\s\\] instead of [^\r\n\t\f\v \\]. (The behavior is slightly different: the former treats all Unicode space identically. Probably better?)
  • [^]+?[^\\] (which occurs in both block rules) seems to require at least two characters. Should probably be [^]*?[^\\].
  • Shouldn't math_inline_double have the same addition to exclude a trailing \?
  • \$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$ can be simplified to \$([^\r\n\t\f\v \\]|\S.*?[^\r\n\t\f\v \\])\$ or \$([^\s\\]|\S.*?[^\s\\])\$. (I'm not quite sure why you're forbidding spaces next to the $s but I assume that's intentional, to avoid some stray matching.)

These new rules seem to deal with \$ properly. Nice!!

Nested $s are for re-entering math mode. \text{$x+y$} is different from \text{\$x+y\$}:

image

So it's not possible to escape these instances of $, as \$ means something in LaTeX.

Unmatched braces generate errors in KaTeX, though. (I see either Uncaught ParseError: KaTeX parse error: Expected '}', got 'EOF' at end of input or KaTeX parse error: Unexpected end of input in a macro argument, expected '}' at end of input: \text{.) So perhaps that could be detected, which triggers an "extension" regex? I believe the extension regex is exactly math_inline without the leading $, or applying math_inline but starting from the final $. On input $\text{$x+y$}$, after matching $\text{$, you'd next grab x+y$, fail, and then grab }$, and then succeed. This is quadratic time in the number of $s, but that's probably small... Alternatively, when in extension mode, we could count {/}s, so call KaTeX at most twice.

@goessner
Copy link
Owner

Sorry for the delay. Thanks for your valuable input.

* `math_inline_double` still uses `.`; should probably switch that to `[^]` too.

math_inline should be written on a single line, but I added it to be more forgiving here.

* I think you can use `[^\s\\]` instead of `[^\r\n\t\f\v \\]`. (The behavior is slightly different: the former treats all Unicode space identically. Probably better?)

This is definitely better ... taken.

* `[^]+?[^\\]` (which occurs in both block rules) seems to require at least two characters.  Should probably be `[^]*?[^\\]`.

Yes ... thanks for catching.

* Shouldn't `math_inline_double` have the same addition to exclude a trailing `\`?

sure ... done.

* `\$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$` can be simplified to `\$([^\r\n\t\f\v \\]|\S.*?[^\r\n\t\f\v \\])\$` or `\$([^\s\\]|\S.*?[^\s\\])\$`. (I'm not quite sure why you're forbidding spaces next to the `$`s but I assume that's intentional, to avoid some stray matching.)

yes ... again a significant improvement.

Nested $s are for re-entering math mode. \text{$x+y$} is different from \text{\$x+y\$}:
So it's not possible to escape these instances of $, as \$ means something in LaTeX.

Now I understand that reentering math mode effect. Escaping makes no sense indeed. I still consider it a not so relevant edge case in practise. I don't want to invest that significant implementation effort at current. A PR is always welcome though.

thanks again ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants