anchors don't work when contains punctuation marks just like or ( #26367

Closed
lazyky opened this issue Aug 7, 2023 · 14 comments · Fixed by #26388

lazyky commented Aug 7, 2023

Description

The IDs generated for Markdown headings that contain Unicode punctuation are inconsistent with GitHub.
For the heading #### test(1), Gitea generates the id "user-content-test-1", while GitHub generates "user-content-test1".

The anchor link in the Markdown below works on GitHub but not on Gitea.

#### test(1)
to [test(1)](#test1)

gitea

id = "user-content-test-1"

github

id = "user-content-test1"
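
The difference between the two ids can be reproduced with a small sketch. `github_style_slug` follows the rule a maintainer proposes later in this thread (keep letters, digits, `_` and `-`, turn spaces into `-`, drop everything else). `hyphenating_slug` is a hypothetical rule, written only to reproduce the id observed on Gitea for this one heading; it is not taken from Gitea's source. The `user-content-` prefix is copied from the ids shown above.

```python
import re

PREFIX = "user-content-"  # prefix seen in both rendered ids above


def github_style_slug(heading: str) -> str:
    """Keep letters, digits, '_' and '-'; map spaces to '-'; drop the rest."""
    kept = []
    for ch in heading.strip():
        if ch.isalpha() or ch.isdigit() or ch in "_-":
            kept.append(ch.lower())
        elif ch == " ":
            kept.append("-")
    return "".join(kept)


def hyphenating_slug(heading: str) -> str:
    """Hypothetical rule: replace each run of other characters with one '-'."""
    return re.sub(r"[^0-9a-z]+", "-", heading.strip().lower()).strip("-")


heading = "test(1)"
print(PREFIX + github_style_slug(heading))  # user-content-test1
print(PREFIX + hyphenating_slug(heading))   # user-content-test-1
```

A link written as `#test1` can only resolve against the first id, which is why the same Markdown jumps on GitHub but not on Gitea.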

Gitea Version

1.20.2

Can you reproduce the bug on the Gitea demo site?

Yes

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

I was able to reproduce it using try.gitea.io.

Database

None

@lazyky added the type/bug label on Aug 7, 2023
@CaiCandong
Member

What's the impact of this problem?

@bioinformatist

@CaiCandong

Sometimes we need to use section titles like this:

image

However, anchors pointing to such titles do not work, which makes the document harder to navigate.

@lazyky changed the title from "Markdown Heading ID contains Unicode “(” is inconsistent with Github" to "anchors don't work when contains unicode" on Aug 8, 2023
@CaiCandong
Member

Markdown Heading ID contains Unicode is inconsistent with Github.

Thanks for the report; I understand the problem. Does it also occur when using ( directly?

@lazyky changed the title from "anchors don't work when contains unicode" to "anchors don't work when contains punctuation marks just like or (" on Aug 8, 2023
@lazyky
Author

lazyky commented Aug 8, 2023

Markdown Heading ID contains Unicode is inconsistent with Github.

Thanks for the report, I understand the problem, besides does this problem also occur when using ( directly?

Yes. I also tested (, !, :, and *. They all behave the same way. @CaiCandong

#### test(0)

#### test!1

#### test:2

#### test*3

#### test!4

#### test:5

gitea

image

github

image

@CaiCandong
Member

CaiCandong commented Aug 8, 2023

I've located the code responsible for this problem. It has to do with the user-content-* generation rules, but I'm not sure exactly how GitHub handles this. Can you give me some more examples to help me refine the code?

#### test:ad # df
#### test:ad # df
#### test:ad #23 df 2*/*

@lazyky
Author

lazyky commented Aug 8, 2023

test:ad # df

test:ad # df

test:ad #23 df 2*/*

github

Here are the examples on GitHub:
image

@CaiCandong
Member

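def cheanValue(anchor_name):
    # Build a GitHub-style anchor: keep letters, digits, '_' and '-' (lowercased),
    # turn each space into '-', and drop every other character.
    anchor_name = anchor_name.strip()
    ret = []
    for c in anchor_name:
        if c.isalpha() or c.isdigit() or c == '_' or c == '-':
            ret.append(c.lower())
        elif c == ' ':
            ret.append('-')
    return ''.join(ret)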

def test():
    cases = [
        ["", ""],
        ["test(0)", "test0"],
        ["test!1", "test1"],
        ["test:2", "test2"],
        ["test*3", "test3"],
        ["test!4", "test4"],
        ["test:5", "test5"],
        ["test*6", "test6"],
        ["test:6 a", "test6-a"],
        ["test:6 !b", "test6-b"],
        ["test:ad # df", "testad--df"],
        ["test:ad #23 df 2*/*", "testad-23-df-2"],
        ["test:ad 23 df 2*/*", "testad-23-df-2"],
        ["test:ad # 23 df 2*/*", "testad--23-df-2"],
        ["Anchors in Markdown", "anchors-in-markdown"],
        ["a_b_c", "a_b_c"],
        ["a-b-c", "a-b-c"],
        ["a-b-c----", "a-b-c----"],
        ["test:6a", "test6a"],
        ["test:a6", "testa6"],
        ["tes a a   a  a", "tes-a-a---a--a"],
        ["  tes a a   a  a  ", "tes-a-a---a--a"]]
    for parm,expect in cases:
        if cheanValue(parm) != expect:
            print("error: parm: %s, expect: %s, actual: %s" % (parm, expect, cheanValue(parm)))
test()

Can you help me write some test cases from github to verify that the logic of the cheanValue function is consistent with github?
@lazyky @bioinformatist

@wxiaoguang
Contributor

wxiaoguang commented Aug 8, 2023

This one is also related: Different behaviors when generating Markdown links for headings containing punctuations and other symbols #19745

Quote the old comment from that issue:


I would say it's more of a feature than a bug, because Markdown is not a strict system and there seems to be no single standard.

Various characters are removed or replaced during URL generation. The single quote ' in your demo file is another example.

https://github.com/federico-ntr/gitea-double-quotes-test#placeholder-to-force-scrolling-on-links-click
https://try.gitea.io/federico-ntr/double-quotes-test#placeholder-to-force-scrolling-on-link-s-click

Since there is no standard, there is no right or wrong, as long as it works.

Maybe the answer to the question could be: if there is a definition in CommonMark, then make upstream goldmark use CommonMark standard.
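
The single-quote difference in the two URLs above can be illustrated with the same drop-punctuation rule. This is a minimal sketch: the heading text is an assumption reconstructed from the URL fragments, and `github_style_slug` is an illustrative name that mirrors the `cheanValue` function posted earlier in this thread, not GitHub's actual implementation.

```python
def github_style_slug(heading: str) -> str:
    """Keep letters, digits, '_' and '-'; map spaces to '-'; drop the rest."""
    kept = []
    for ch in heading.strip():
        if ch.isalpha() or ch.isdigit() or ch in "_-":
            kept.append(ch.lower())
        elif ch == " ":
            kept.append("-")
    return "".join(kept)


# Assumed heading text, inferred from the fragments in the two URLs.
heading = "Placeholder to force scrolling on link's click"
print(github_style_slug(heading))
# placeholder-to-force-scrolling-on-links-click
```

Dropping the apostrophe yields the "links-click" fragment seen in the GitHub URL, whereas the Gitea URL shows a hyphen in its place ("link-s-click").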

@CaiCandong
Member

CaiCandong commented Aug 8, 2023

I would say it's more like a feature but not a bug, because Markdown is not a strict system, and there seems no unique standard.

I understand what you're saying, and it's not a bug. But should we adjust it to be consistent with GitHub and VS Code?

@wxiaoguang
Contributor

Just sharing the information from the old issue. I am neutral on it.

@bioinformatist

Got it. Granted, it is not a bug, but GitHub's logic seems more straightforward and easier to use.

@lazyky
Author

lazyky commented Aug 8, 2023

Can you help me write some test cases from github to verify that the logic of the cheanValue function is consistent with github? @lazyky @bioinformatist

Yes, that's right, but the empty heading "" is not rendered at all.

github

image
image

@CaiCandong
Member

Yes. That's right, but "" will not be rendered

These test cases came from GitHub, so of course they are correct. What I meant was: can you help me add some more test cases?

@lazyky
Author

lazyky commented Aug 8, 2023

Yes. That's right, but "" will not be rendered

These test cases are the ones I got from github, of course they are correct. What I mean is can you help me to add some more test cases?

OK. Below are the examples I tested on GitHub:

[
    ["tes()", "tes"],
    ["tes…@a", "tesa"],
    ["tes¥& a", "tes-a"],
    ["tes= a", "tes-a"],
    ["tes|a", "tesa"],
    ["tes\\a", "tesa"],
    ["tes/a", "tesa"]
]
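
These additional cases can be checked against the `cheanValue` function posted earlier in the thread. The sketch below restates that function so it runs on its own. The backslash case is written as `"tes\\a"` because a bare `"\a"` inside a Python string literal is the bell character, not a backslash followed by `a`.

```python
def cheanValue(anchor_name):
    # Same rule as the function posted earlier in the thread.
    anchor_name = anchor_name.strip()
    ret = []
    for c in anchor_name:
        if c.isalpha() or c.isdigit() or c == '_' or c == '-':
            ret.append(c.lower())
        elif c == ' ':
            ret.append('-')
    return ''.join(ret)


extra_cases = [
    ["tes()", "tes"],
    ["tes…@a", "tesa"],
    ["tes¥& a", "tes-a"],
    ["tes= a", "tes-a"],
    ["tes|a", "tesa"],
    ["tes\\a", "tesa"],
    ["tes/a", "tesa"],
]

# Collect (input, expected, actual) for every case that does not match.
mismatches = [(text, expected, cheanValue(text))
              for text, expected in extra_cases
              if cheanValue(text) != expected]
print(mismatches)  # [] means every case matches
```

An empty list indicates that the proposed rule agrees with GitHub on all seven extra headings.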

silverwind pushed a commit that referenced this issue Aug 9, 2023
Fix #26367
Related #19745

Thanks @lazyky for providing test cases
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023