Skip to content

Conversation

@FazeelUsmani
Copy link

This PR adds a new configuration option linkcheck_ignore_case to enable case-insensitive URL and anchor checking in the linkcheck builder.

Problem

Some web servers (e.g., GitHub, certain hosting platforms) are case-insensitive and may return URLs with different casing than the original link. This causes the linkcheck builder to report false-positive redirects when the URLs differ only in case, even though they point to the same resource.

Solution

  • Added linkcheck_ignore_case boolean configuration option (default: False)
  • Modified URL comparison logic to support case-insensitive matching when enabled
  • Modified anchor comparison in AnchorCheckParser to support case-insensitive matching when enabled
  • Added comprehensive tests for both URL and anchor case-insensitive checking
  • Updated documentation in doc/usage/configuration.rst

@FazeelUsmani FazeelUsmani marked this pull request as draft November 7, 2025 15:08
@jayaddison
Copy link
Contributor

Hi @FazeelUsmani - thank you for developing and describing this pull request.

I have a concern that enabling the option reduces the precision of other hyperlinks that are checked.

Could you explain the use case where it would be easier for a documentation project to enable this option by editing the conf.py instead of fixing the URLs/anchors in their documentation sources to use the correct casing?

@FazeelUsmani FazeelUsmani marked this pull request as ready for review November 10, 2025 08:56
@FazeelUsmani
Copy link
Author

That’s a good point, @jayaddison.
This option is off by default and meant mainly only for large or older docs where many links hit servers that normalise URL casing (like GitHub) or are case-insensitive (like Windows). Enabling it just filters out harmless casing-related redirects so teams can focus on real link issues instead of noise.

@jayaddison
Copy link
Contributor

@FazeelUsmani got it, understood. As often happens, I had a misunderstanding to begin with - you are saying that this only affects whether case-adjusted response URLs are considered to be redirect instead of successful.

Let me think about this a little more; I do understand the value in this now, but am wary of (and trying to think of) any problem side-effects.

@jayaddison
Copy link
Contributor

(also, thank you for the explanation)

@jayaddison
Copy link
Contributor

Separately: I do think that we should probably isolate the redirect-case-sensitivity handling from the HTML anchor case-sensitivity; they seem fairly functionally different from each other to me.

@FazeelUsmani
Copy link
Author

Hmm.. makes sense. I can refactor this into two separate options:
linkcheck_ignore_case_urls: For comparing URL paths (the redirect scenario)
linkcheck_ignore_case_anchors: For comparing HTML anchors

This would give users more granular control. Most users would likely want linkcheck_ignore_case_urls = True (for case-insensitive servers) while keeping linkcheck_ignore_case_anchors = False (since HTML IDs are technically case-sensitive per spec). What do you say?

@AA-Turner
Copy link
Member

AA-Turner commented Nov 10, 2025

Two options seems overkill for this use-case. What do browsers do de facto on case mismatches on fragment IDs?

A

@FazeelUsmani
Copy link
Author

Fair point — browsers generally treat fragment IDs as case-sensitive, though behavior can vary depending on the HTML generator. My thought was mainly to avoid false negatives in edge cases (like auto-generated anchors that normalize casing differently).
That said, I’m fine keeping it as a single option if we note the anchor behavior clearly in the docs.

@jayaddison
Copy link
Contributor

I can't think of drawbacks to the redirect case-folding -- and although it's maybe slightly controversial, I wonder whether we should enable it by default.

The anchor-checking I'm less certain about; given that we believe browsers seem to navigate to anchors case-sensitively -- something I too checked locally and that is certainly the case in Firefox 140.4 -- I'd be reluctant to offer that without a demonstrable use-case (again that can't be solved easily by fixing the source documentation).

@FazeelUsmani
Copy link
Author

That makes sense — I’ll keep it as a single linkcheck_ignore_case option limited to the URL path. Anchor checks will remain case-sensitive to align with browser behavior, and I’ll clarify this distinction in the docs so users understand the expected behavior.

@jayaddison
Copy link
Contributor

Sounds good to me! Thanks @FazeelUsmani.

@FazeelUsmani FazeelUsmani marked this pull request as draft November 11, 2025 13:13
@FazeelUsmani FazeelUsmani force-pushed the linkcheck_case_insensitive branch 3 times, most recently from 56d6a63 to d115b1e Compare November 11, 2025 14:31
# With case-sensitive checking, a URL that redirects to different case
# should be marked as redirected
lowercase_uri = f'http://{address}/path'
if lowercase_uri in rowsby:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I run the tests locally, the code inside this if condition isn't being evaluated - so I think the test coverage is incomplete?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... let me check.

Yes, the test document didn't contain a link to /path.
I've created a dedicated test root (test-linkcheck-case-check) with a link to the /path endpoint and changed the conditional check to an assertion so the test will fail if the URL isn't being checked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines 1487 to 1492
def mock_request(self, method, url, **kwargs):
response = original_request(self, method, url, **kwargs)
# Change the URL to uppercase to simulate server behavior
if '/path' in str(response.url).lower():
response.url = str(response.url).replace('/path', '/PATH')
return response
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach of mocking/intercepting the HTTP response object bypasses some of the logic that we want to test.

Instead let's rewrite the CaseSensitiveHandler HTTP server so that when /path is requested, we redirect the client to /Path -- and then the test should be able to confirm (without mocking/patching) how the linkcheck builder handles that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I've rewritten CaseSensitiveHandler to perform an actual HTTP 301 redirect from /path to /Path, and removed all the mocking/patching code. The test now correctly validates the redirect detection logic without bypassing any of the implementation.

Comment on lines 3816 to 3844
.. confval:: linkcheck_case_insensitive
:type: :code-py:`bool`
:default: :code-py:`False`

When :code-py:`True`, the *linkcheck* builder will compare URL paths
case-insensitively when checking for redirects.
This is useful for checking links on case-insensitive servers
(for example, GitHub, Windows-based servers, or certain hosting platforms)
that may return URLs with different case than the original link.

When enabled, URL paths like ``/Path`` and ``/path`` are considered
equivalent, preventing false-positive redirect warnings on
case-insensitive servers.

.. note::

This option only affects URL path comparison for redirect detection.
HTML anchor checking remains case-sensitive to match browser behavior,
where fragment identifiers (``#anchor``) are case-sensitive per the
HTML specification.

Example:

.. code-block:: python
linkcheck_case_insensitive = True
.. versionadded:: 8.2

Copy link
Contributor

@jayaddison jayaddison Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be my perception, but I would suggest that it may be easier to understand the config setting if we remove the inverse prefix (e.g. using sensitive instead of insensitive).

I also have a few ideas for the documentation text; please let me know what you think:

Suggested change
.. confval:: linkcheck_case_insensitive
:type: :code-py:`bool`
:default: :code-py:`False`
When :code-py:`True`, the *linkcheck* builder will compare URL paths
case-insensitively when checking for redirects.
This is useful for checking links on case-insensitive servers
(for example, GitHub, Windows-based servers, or certain hosting platforms)
that may return URLs with different case than the original link.
When enabled, URL paths like ``/Path`` and ``/path`` are considered
equivalent, preventing false-positive redirect warnings on
case-insensitive servers.
.. note::
This option only affects URL path comparison for redirect detection.
HTML anchor checking remains case-sensitive to match browser behavior,
where fragment identifiers (``#anchor``) are case-sensitive per the
HTML specification.
Example:
.. code-block:: python
linkcheck_case_insensitive = True
.. versionadded:: 8.2
.. confval:: linkcheck_case_sensitive
:type: :code-py:`bool`
:default: :code-py:`True`
This setting controls how the *linkcheck* builder decides
whether a hyperlink's destination is the same as the URL
written in the documentation.
By default, *linkcheck* requires the destination URL to match the written URL case-sensitively. This means that a link to ``http://webserver.test/USERNAME`` in
the documentation that the server redirects to ``http://webserver.test/username`` will be reported as ``redirected``.
To allow a more lenient URL comparison, that will report the previous case as
as ``working`` instead, configure this setting to ``False``.
.. note::
HTML anchor checking is always case-sensitive, and is
not affected by this setting.
.. versionadded:: 8.2

Edit: replace status code of successful with the actual value from the code, working.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've applied your documentation changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again!

Comment on lines 637 to 638
if self.case_insensitive:
normalised_url = normalised_url.casefold()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a rare scenario, but strictly speaking I think we should avoid casefolding any fragment/anchor identifier part of the URL.

e.g.

  • /USERNAME#Badges -> /username#badges
  • /USERNAME#Badges -> /username#Badges

(I only realised this after attempting to rewrite the config setting documentation in my own words -- in particular the part about anchor case-sensitivity)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I've fixed the _normalise_url() function to only casefold the URL before the fragment. Now /USERNAME#Badges correctly becomes /username#Badges (preserving the fragment case). The function now splits on # and only applies .casefold() to the URL part.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK; the updated URL normalisation logic looks good to me; thank you.

app.add_event('linkcheck-process-uri')

# priority 900 to happen after ``check_confval_types()``
app.connect('config-inited', handle_deprecated_linkcheck_case_config, priority=899)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to add any deprecation/migration-handling code for this config setting, because the linkcheck_case_insensitive config setting has not been included in a published release of Sphinx.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the deprecation handler function and the old config value registration.

Comment on lines 1449 to 1463
def do_HEAD(self):
if self.path == '/path':
# Redirect lowercase /path to uppercase /Path
self.send_response(301, 'Moved Permanently')
self.send_header('Location', '/Path')
self.send_header('Content-Length', '0')
self.end_headers()
elif self.path == '/Path':
self.send_response(200, 'OK')
self.send_header('Content-Length', '0')
self.end_headers()
else:
self.send_response(404, 'Not Found')
self.send_header('Content-Length', '0')
self.end_headers()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove the do_HEAD code block here, as the base HTTP server will invoke do_GET instead if it's missing.

NB: Let's also apply the Location response header simplification to do_GET when we do that, though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I've removed the do_HEAD method entirely - the base HTTP server will invoke do_GET instead. I've also simplified the Location header in do_GET to just use /Path instead of the full URL with host.

} in content


class CaseSensitiveHandler(BaseHTTPRequestHandler):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly nitpicky, but: the client and the test case are about case sensitivity -- but this handler is not really. What it does is to normalise/capitalise the path. So I'd suggest renaming it:

Suggested change
class CaseSensitiveHandler(BaseHTTPRequestHandler):
class CapitalisePathHandler(BaseHTTPRequestHandler):

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed it from CaseSensitiveHandler to CapitalisePathHandler and updated the docstring to reflect what it actually does.

Comment on lines 1501 to 1503
lowercase_uri = f'http://{address}/path'
assert lowercase_uri in rowsby, f'Expected {lowercase_uri} to be checked'
assert rowsby[lowercase_uri]['status'] == 'redirected'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the containment check is good defensive coding, we don't perform similar checks in the other tests, so for consistency I would recommend:

Suggested change
lowercase_uri = f'http://{address}/path'
assert lowercase_uri in rowsby, f'Expected {lowercase_uri} to be checked'
assert rowsby[lowercase_uri]['status'] == 'redirected'
assert rowsby[f'http://{address}/path']['status'] == 'redirected'

(I briefly considered suggesting renaming the variable to something like origin_url, outbound_url or documented_url -- but on balance I think we should simply omit it)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've simplified both tests to directly assert on rowsby[f'http://{address}/path']['status'] without the intermediate variable or defensive containment check, matching the style used in other tests in the file

@jayaddison
Copy link
Contributor

Ok, I think this is looking pretty good @FazeelUsmani - thanks for accommodating my feedback.

The only significant remaining question I have is whether this should be controlled as a single boolean value, or whether it should be enabled for patterns of URLs -- similar to the way that linkcheck_auth and linkcheck_ignore can be configured for specific patterns.

Currently: I am leaning towards the idea that, ideally, documentation projects should only enable this for the domains/websites where they need the tolerance for case-normalising redirects.

The reason I lean that way is that -- generally -- URLs are in fact case sensitive, and so I have some lingering doubts about making this a flag that is likely to affect cases where it's not worthwhile to casefold.

PS: either way, if you could also update the title of this PR that would be great!

@FazeelUsmani FazeelUsmani changed the title Add linkcheck_ignore_case configuration option Add linkcheck_case_sensitive configuration option Nov 18, 2025
@FazeelUsmani
Copy link
Author

Thanks for the feedback @jayaddison!

That's a really good point about making this pattern-based rather than a global boolean. I can see how that would be safer and more precise - only applying case-insensitivity to domains that actually need it (like GitHub), similar to how linkcheck_auth and linkcheck_ignore work.

So, the pattern-based approach would be something like:

linkcheck_case_sensitive_patterns = {
    r'https://github\.com/.*': False,
}

Does this sounds good to you?

@jayaddison
Copy link
Contributor

jayaddison commented Nov 18, 2025

That mostly sounds good - I would suggest some adjustments, though: either:

  • Rename the config dictionary's name to linkcheck_case_sensitivity -- for consistency with the auth and ignore settings, that omit the word patterns.

OR

  • Use a list datastructure instead of a dict, and rename the variable to linkcheck_case_insensitive (my mistake - perhaps I should not have suggested removing the in... prefix earlier! if we choose this route, I think it'd make sense again because it'd be describing a collection of opt-in patterns that disable the default case-sensitive behaviour)

I don't have a strong preference yet -- I'm continuing to think about it.

The second option might make it slightly more apparent to readers that these domains patterns are exceptions and that case-sensitive URL comparison is the default.

Edit: fixup: the matching patterns are not necessarily domains.

@FazeelUsmani
Copy link
Author

Thanks for clarifying, @jayaddison! I'll go with Option 2 - using a list with linkcheck_case_insensitive. I agree it makes the intent clearer: these are opt-in patterns for URLs that should be treated case-insensitively, with case-sensitive being the default.

So the config would look like:

linkcheck_case_insensitive = [
    r'https://github\.com/.*',
]

Does that sound good to you? I'll start refactoring to implement pattern matching once you confirm.

@jayaddison
Copy link
Contributor

@FazeelUsmani yep, that sounds good to me 👍

freshenv=True,
confoverrides={'linkcheck_case_insensitive': [r'http://localhost:\d+/.*']},
)
def test_linkcheck_case_insensitive(app: SphinxTestApp) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: I think it would be possible to merge the two test cases into a single def test_linkcheck_case_sensitivity now that we can pattern-match different URL patterns.

(the code for the two of them is very similar currently, so perhaps the end result would be neater/smaller by combining them)



@pytest.mark.sphinx('html', testroot='numfig')
@pytest.mark.test_params(shared_result='test_build_html_numfig')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure to restore this pytest marker (I think you did, but maybe it got removed again somehow).

@jayaddison
Copy link
Contributor

Ah, and one more thing: please make sure to update CHANGES.rst to describe your change and add a credit entry for it!

I think it's looking good; we'll also need a core developer/maintainer to approve it, and I can't promise if/when that will be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants