-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Add linkcheck_case_sensitive configuration option #14046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add linkcheck_case_sensitive configuration option #14046
Conversation
|
Hi @FazeelUsmani - thank you for developing and describing this pull request. I have a concern that enabling the option reduces the precision of other hyperlinks that are checked. Could you explain the use case where it would be easier for a documentation project to enable this option by editing the |
|
That’s a good point, @jayaddison. |
|
@FazeelUsmani got it, understood. As often happens, I had a misunderstanding to begin with - you are saying that this only affects whether case-adjusted response URLs are considered to be Let me think about this a little more; I do understand the value in this now, but am wary of (and trying to think of) any problem side-effects. |
|
(also, thank you for the explanation) |
|
Separately: I do think that we should probably isolate the |
|
Hmm.. makes sense. I can refactor this into two separate options: This would give users more granular control. Most users would likely want |
|
Two options seems overkill for this use-case. What do browsers do de facto on case mismatches on fragment IDs? A |
|
Fair point — browsers generally treat fragment IDs as case-sensitive, though behavior can vary depending on the HTML generator. My thought was mainly to avoid false negatives in edge cases (like auto-generated anchors that normalize casing differently). |
|
I can't think of drawbacks to the The anchor-checking I'm less certain about; given that we believe browsers seem to navigate to anchors case-sensitively -- something I too checked locally and that is certainly the case in Firefox 140.4 -- I'd be reluctant to offer that without a demonstrable use-case (again that can't be solved easily by fixing the source documentation). |
|
That makes sense — I’ll keep it as a single |
|
Sounds good to me! Thanks @FazeelUsmani. |
56d6a63 to
d115b1e
Compare
| # With case-sensitive checking, a URL that redirects to different case | ||
| # should be marked as redirected | ||
| lowercase_uri = f'http://{address}/path' | ||
| if lowercase_uri in rowsby: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run the tests locally, the code inside this if condition isn't being evaluated - so I think the test coverage is incomplete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... let me check.
Yes, the test document didn't contain a link to /path.
I've created a dedicated test root (test-linkcheck-case-check) with a link to the /path endpoint and changed the conditional check to an assertion so the test will fail if the URL isn't being checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
| def mock_request(self, method, url, **kwargs): | ||
| response = original_request(self, method, url, **kwargs) | ||
| # Change the URL to uppercase to simulate server behavior | ||
| if '/path' in str(response.url).lower(): | ||
| response.url = str(response.url).replace('/path', '/PATH') | ||
| return response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach of mocking/intercepting the HTTP response object bypasses some of the logic that we want to test.
Instead let's rewrite the CaseSensitiveHandler HTTP server so that when /path is requested, we redirect the client to /Path -- and then the test should be able to confirm (without mocking/patching) how the linkcheck builder handles that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! I've rewritten CaseSensitiveHandler to perform an actual HTTP 301 redirect from /path to /Path, and removed all the mocking/patching code. The test now correctly validates the redirect detection logic without bypassing any of the implementation.
| .. confval:: linkcheck_case_insensitive | ||
| :type: :code-py:`bool` | ||
| :default: :code-py:`False` | ||
|
|
||
| When :code-py:`True`, the *linkcheck* builder will compare URL paths | ||
| case-insensitively when checking for redirects. | ||
| This is useful for checking links on case-insensitive servers | ||
| (for example, GitHub, Windows-based servers, or certain hosting platforms) | ||
| that may return URLs with different case than the original link. | ||
|
|
||
| When enabled, URL paths like ``/Path`` and ``/path`` are considered | ||
| equivalent, preventing false-positive redirect warnings on | ||
| case-insensitive servers. | ||
|
|
||
| .. note:: | ||
|
|
||
| This option only affects URL path comparison for redirect detection. | ||
| HTML anchor checking remains case-sensitive to match browser behavior, | ||
| where fragment identifiers (``#anchor``) are case-sensitive per the | ||
| HTML specification. | ||
|
|
||
| Example: | ||
|
|
||
| .. code-block:: python | ||
| linkcheck_case_insensitive = True | ||
| .. versionadded:: 8.2 | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be my perception, but I would suggest that it may be easier to understand the config setting if we remove the inverse prefix (e.g. using sensitive instead of insensitive).
I also have a few ideas for the documentation text; please let me know what you think:
| .. confval:: linkcheck_case_insensitive | |
| :type: :code-py:`bool` | |
| :default: :code-py:`False` | |
| When :code-py:`True`, the *linkcheck* builder will compare URL paths | |
| case-insensitively when checking for redirects. | |
| This is useful for checking links on case-insensitive servers | |
| (for example, GitHub, Windows-based servers, or certain hosting platforms) | |
| that may return URLs with different case than the original link. | |
| When enabled, URL paths like ``/Path`` and ``/path`` are considered | |
| equivalent, preventing false-positive redirect warnings on | |
| case-insensitive servers. | |
| .. note:: | |
| This option only affects URL path comparison for redirect detection. | |
| HTML anchor checking remains case-sensitive to match browser behavior, | |
| where fragment identifiers (``#anchor``) are case-sensitive per the | |
| HTML specification. | |
| Example: | |
| .. code-block:: python | |
| linkcheck_case_insensitive = True | |
| .. versionadded:: 8.2 | |
| .. confval:: linkcheck_case_sensitive | |
| :type: :code-py:`bool` | |
| :default: :code-py:`True` | |
| This setting controls how the *linkcheck* builder decides | |
| whether a hyperlink's destination is the same as the URL | |
| written in the documentation. | |
| By default, *linkcheck* requires the destination URL to match the written URL case-sensitively. This means that a link to ``http://webserver.test/USERNAME`` in | |
| the documentation that the server redirects to ``http://webserver.test/username`` will be reported as ``redirected``. | |
| To allow a more lenient URL comparison, that will report the previous case as | |
| as ``working`` instead, configure this setting to ``False``. | |
| .. note:: | |
| HTML anchor checking is always case-sensitive, and is | |
| not affected by this setting. | |
| .. versionadded:: 8.2 | |
Edit: replace status code of successful with the actual value from the code, working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've applied your documentation changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again!
sphinx/builders/linkcheck.py
Outdated
| if self.case_insensitive: | ||
| normalised_url = normalised_url.casefold() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably a rare scenario, but strictly speaking I think we should avoid casefolding any fragment/anchor identifier part of the URL.
e.g.
- ❌
/USERNAME#Badges->/username#badges - ✅
/USERNAME#Badges->/username#Badges
(I only realised this after attempting to rewrite the config setting documentation in my own words -- in particular the part about anchor case-sensitivity)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I've fixed the _normalise_url() function to only casefold the URL before the fragment. Now /USERNAME#Badges correctly becomes /username#Badges (preserving the fragment case). The function now splits on # and only applies .casefold() to the URL part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK; the updated URL normalisation logic looks good to me; thank you.
sphinx/builders/linkcheck.py
Outdated
| app.add_event('linkcheck-process-uri') | ||
|
|
||
| # priority 900 to happen after ``check_confval_types()`` | ||
| app.connect('config-inited', handle_deprecated_linkcheck_case_config, priority=899) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to add any deprecation/migration-handling code for this config setting, because the linkcheck_case_insensitive config setting has not been included in a published release of Sphinx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the deprecation handler function and the old config value registration.
Co-authored-by: James Addison <[email protected]>
…sphinx into linkcheck_case_insensitive
| def do_HEAD(self): | ||
| if self.path == '/path': | ||
| # Redirect lowercase /path to uppercase /Path | ||
| self.send_response(301, 'Moved Permanently') | ||
| self.send_header('Location', '/Path') | ||
| self.send_header('Content-Length', '0') | ||
| self.end_headers() | ||
| elif self.path == '/Path': | ||
| self.send_response(200, 'OK') | ||
| self.send_header('Content-Length', '0') | ||
| self.end_headers() | ||
| else: | ||
| self.send_response(404, 'Not Found') | ||
| self.send_header('Content-Length', '0') | ||
| self.end_headers() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove the do_HEAD code block here, as the base HTTP server will invoke do_GET instead if it's missing.
NB: Let's also apply the Location response header simplification to do_GET when we do that, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I've removed the do_HEAD method entirely - the base HTTP server will invoke do_GET instead. I've also simplified the Location header in do_GET to just use /Path instead of the full URL with host.
| } in content | ||
|
|
||
|
|
||
| class CaseSensitiveHandler(BaseHTTPRequestHandler): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly nitpicky, but: the client and the test case are about case sensitivity -- but this handler is not really. What it does is to normalise/capitalise the path. So I'd suggest renaming it:
| class CaseSensitiveHandler(BaseHTTPRequestHandler): | |
| class CapitalisePathHandler(BaseHTTPRequestHandler): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed it from CaseSensitiveHandler to CapitalisePathHandler and updated the docstring to reflect what it actually does.
| lowercase_uri = f'http://{address}/path' | ||
| assert lowercase_uri in rowsby, f'Expected {lowercase_uri} to be checked' | ||
| assert rowsby[lowercase_uri]['status'] == 'redirected' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the containment check is good defensive coding, we don't perform similar checks in the other tests, so for consistency I would recommend:
| lowercase_uri = f'http://{address}/path' | |
| assert lowercase_uri in rowsby, f'Expected {lowercase_uri} to be checked' | |
| assert rowsby[lowercase_uri]['status'] == 'redirected' | |
| assert rowsby[f'http://{address}/path']['status'] == 'redirected' |
(I briefly considered suggesting renaming the variable to something like origin_url, outbound_url or documented_url -- but on balance I think we should simply omit it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've simplified both tests to directly assert on rowsby[f'http://{address}/path']['status'] without the intermediate variable or defensive containment check, matching the style used in other tests in the file
|
Ok, I think this is looking pretty good @FazeelUsmani - thanks for accommodating my feedback. The only significant remaining question I have is whether this should be controlled as a single boolean value, or whether it should be enabled for patterns of URLs -- similar to the way that Currently: I am leaning towards the idea that, ideally, documentation projects should only enable this for the domains/websites where they need the tolerance for case-normalising redirects. The reason I lean that way is that -- generally -- URLs are in fact case sensitive, and so I have some lingering doubts about making this a flag that is likely to affect cases where it's not worthwhile to casefold. PS: either way, if you could also update the title of this PR that would be great! |
|
Thanks for the feedback @jayaddison! That's a really good point about making this pattern-based rather than a global boolean. I can see how that would be safer and more precise - only applying case-insensitivity to domains that actually need it (like GitHub), similar to how So, the pattern-based approach would be something like: linkcheck_case_sensitive_patterns = {
r'https://github\.com/.*': False,
}Does this sounds good to you? |
|
That mostly sounds good - I would suggest some adjustments, though: either:
OR
I don't have a strong preference yet -- I'm continuing to think about it. The second option might make it slightly more apparent to readers that these Edit: fixup: the matching patterns are not necessarily domains. |
|
Thanks for clarifying, @jayaddison! I'll go with Option 2 - using a list with So the config would look like: linkcheck_case_insensitive = [
r'https://github\.com/.*',
]Does that sound good to you? I'll start refactoring to implement pattern matching once you confirm. |
|
@FazeelUsmani yep, that sounds good to me 👍 |
| freshenv=True, | ||
| confoverrides={'linkcheck_case_insensitive': [r'http://localhost:\d+/.*']}, | ||
| ) | ||
| def test_linkcheck_case_insensitive(app: SphinxTestApp) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: I think it would be possible to merge the two test cases into a single def test_linkcheck_case_sensitivity now that we can pattern-match different URL patterns.
(the code for the two of them is very similar currently, so perhaps the end result would be neater/smaller by combining them)
|
|
||
|
|
||
| @pytest.mark.sphinx('html', testroot='numfig') | ||
| @pytest.mark.test_params(shared_result='test_build_html_numfig') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make sure to restore this pytest marker (I think you did, but maybe it got removed again somehow).
|
Ah, and one more thing: please make sure to update I think it's looking good; we'll also need a core developer/maintainer to approve it, and I can't promise if/when that will be. |
This PR adds a new configuration option
linkcheck_ignore_caseto enable case-insensitive URL and anchor checking in the linkcheck builder.Problem
Some web servers (e.g., GitHub, certain hosting platforms) are case-insensitive and may return URLs with different casing than the original link. This causes the linkcheck builder to report false-positive redirects when the URLs differ only in case, even though they point to the same resource.
Solution
linkcheck_ignore_caseboolean configuration option (default:False)AnchorCheckParserto support case-insensitive matching when enableddoc/usage/configuration.rst