ENH: enable linkchecker #174

Merged
merged 21 commits into from
Jul 2, 2023
Conversation

mmcky
Contributor

@mmcky mmcky commented Apr 23, 2023

This PR enables the link checker that will run:

  1. when a PR is opened or reopened

The link checker will only run once per Pull Request.

@netlify

netlify bot commented Apr 23, 2023

Deploy Preview for taupe-gaufre-c4e660 ready!

Name Link
🔨 Latest commit ab0977f
🔍 Latest deploy log https://app.netlify.com/sites/taupe-gaufre-c4e660/deploys/64a1293c88696000080d36e0
😎 Deploy Preview https://deploy-preview-174--taupe-gaufre-c4e660.netlify.app
@github-actions

github-actions bot commented Apr 23, 2023

@mmcky
Contributor Author

mmcky commented Apr 24, 2023

@HengchengZhang (cc: @HumphreyYang) can you please review the link checker results:

I suspect the 403 is a valid link so essentially this might be a false positive

(     zreferences: line   13) broken    https://doi.org/https://doi.org/10.2307/1235116 - 403 Client Error: Forbidden for url: https://onlinelibrary.wiley.com/doi/abs/10.2307/1235116
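For reviewing results like the one above, a small triage helper can group broken links by HTTP status so that 403 responses (likely bot blocking rather than dead links) stand apart from 404s. This is a minimal sketch, assuming the linkcheck builder's `output.json` holds one JSON record per line with `status`, `uri`, and `info` fields; the function name `group_broken_links` is hypothetical.

```python
import json
from collections import defaultdict


def group_broken_links(lines):
    """Group the URIs of broken links by the HTTP status code in their info text.

    `lines` is any iterable of JSON strings, one record per line
    (assumed layout of the linkcheck builder's output.json).
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("status") != "broken":
            continue  # only broken links are of interest here
        info = record.get("info", "")
        # Info text such as "403 Client Error: ..." starts with the status code
        code = info.split(" ", 1)[0] if info[:3].isdigit() else "other"
        groups[code].append(record["uri"])
    return dict(groups)
```

Passing the lines of `output.json` to `group_broken_links` returns a dictionary keyed by status code, so the entries under `"403"` can be checked by hand in a browser.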

It would be great if you could (as a project) do some research into making the link checker that we use (provided by sphinx) more robust.

https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-the-linkcheck-builder

I look forward to what you find. Thanks.

@HengchengZhang
Member

Sorry @mmcky, I didn't manage to come up with a nice solution for now.

The problem with sphinx.builders.linkcheck is that it uses a Python-based web scraping technique that the Wiley Online Library can detect.

The Wiley Online Library therefore returns a 403 error code and rejects the link checker's request.

I've tried changing the Sphinx configuration, such as the user-agent name and the request-header string, to make our requests look more like a real browser, but it still didn't work. The only time I succeeded was by using a headless scraping package.
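For reference, the kind of configuration change described here might look like the following `conf.py` sketch. `user_agent` and `linkcheck_request_headers` are Sphinx options; the specific user-agent string and header values are illustrative assumptions, not the values that were tried in this PR, and per the comment above this approach did not get past Wiley's blocking.

```python
# conf.py (sketch): all values below are illustrative assumptions

# User-Agent string sent with Sphinx's HTTP requests
user_agent = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

linkcheck_request_headers = {
    # Headers sent only to URLs under this base
    "https://onlinelibrary.wiley.com/": {
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.5",
    },
    # Fallback headers for every other host
    "*": {"Accept": "text/html,application/xhtml+xml"},
}
```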

I believe this problem only occurs when checking links that point to strongly anti-scraping websites like the Wiley Online Library, because I also tried other web scraping packages to read pages in the Wiley Online Library and they failed as well.

The solution that comes to mind for now is to check these URLs manually and ignore them in the link checker to avoid build errors. Of course we could try a different link checker, but that could be complicated and would require additional packages.
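A minimal sketch of the ignore approach, using Sphinx's `linkcheck_ignore` option (a list of regular expressions for URIs to skip). The pattern targets the doi.org URI from the report above, on the assumption that patterns are matched against the URI as written in the source rather than the Wiley page it redirects to. The `is_ignored` helper is hypothetical and only there to sanity-check the pattern.

```python
import re

# conf.py (sketch): skip the manually verified link that returns 403 to the checker
linkcheck_ignore = [
    r"https://doi\.org/(https://doi\.org/)?10\.2307/1235116",
]


def is_ignored(uri, patterns=linkcheck_ignore):
    """Hypothetical helper: True if any ignore pattern matches the start of `uri`."""
    return any(re.match(pattern, uri) for pattern in patterns)
```

Calling `is_ignored("https://doi.org/https://doi.org/10.2307/1235116")` is a quick way to confirm the pattern covers the reported link before relying on it in a build.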

@mmcky
Contributor Author

mmcky commented May 3, 2023

thanks @HengchengZhang nicely researched.

In that case I would be in favour of adding an ignore to these Wiley links for now.

Now that our source files are in md files perhaps we can look at other link checker tools to see if they work such as:

@HengchengZhang
Member

Thanks @mmcky, this actually works for the Wiley links!

But it fails for some Wikipedia links because it also checks all the anchors in the link (and Wikipedia has a lot of empty anchors).

I think that can be fixed by ignoring anchor checking during the link checking process. But then it is weird that Sphinx didn't report them as errors, as its anchor check is set to True by default.
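On the Sphinx side, anchor checking is controlled by the `linkcheck_anchors` and `linkcheck_anchors_ignore` options. A hedged `conf.py` sketch of the two ways to relax it follows; the `^cite_note-` pattern is a hypothetical example of an anchor prefix to skip, not something taken from this PR.

```python
# conf.py (sketch)

# Option A: turn anchor checking off entirely (Sphinx's default is True)
linkcheck_anchors = False

# Option B (alternative): keep anchor checks but skip anchors matching these
# regular expressions; "^!" is Sphinx's default entry, "^cite_note-" is a
# hypothetical addition for illustration
# linkcheck_anchors_ignore = ["^!", "^cite_note-"]
```

Option A is the blunt fix suggested above; Option B keeps the check for most links at the cost of maintaining a pattern list.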

@mmcky
Contributor Author

mmcky commented May 3, 2023

But it fails for some Wikipedia links because it also checks all the anchors in the link (and Wikipedia has a lot of empty anchors).

I am not sure I fully understand this. Do the anchors resolve in the browser but fail on the link checker?

@mmcky
Contributor Author

mmcky commented May 3, 2023

It looks like they are building special cases into Sphinx for GitHub

https://github.com/sphinx-doc/sphinx/pull/9260/files

So maybe Wikipedia has some special structures as well?
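One plausible reason an anchor can resolve in a browser yet fail in a checker: a static checker only sees the HTML the server returns, while a browser may also honour anchors that are added or rewritten by JavaScript. The following is a simplified sketch of a static anchor check, not Sphinx's actual implementation; the names `AnchorFinder` and `has_anchor` are hypothetical.

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Record whether any start tag carries id or name equal to the wanted anchor."""

    def __init__(self, anchor):
        super().__init__()
        self.anchor = anchor
        self.found = False

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key in ("id", "name") and value == self.anchor:
                self.found = True


def has_anchor(html, anchor):
    """Return True if the static HTML defines `anchor` via an id or name attribute."""
    finder = AnchorFinder(anchor)
    finder.feed(html)
    return finder.found
```

With `html = '<h2 id="History">History</h2>'`, `has_anchor(html, "History")` is true while `has_anchor(html, "See_also")` is false, even if a script on the live page would later create that element.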

@mmcky
Contributor Author

mmcky commented Jun 28, 2023

@HengchengZhang once the new build runs, can you check the linkchecker results for anything that should be fixed before next week?

@mmcky mmcky closed this Jun 28, 2023
@mmcky mmcky reopened this Jun 28, 2023
@mmcky
Contributor Author

mmcky commented Jun 28, 2023

@HengchengZhang the results are now available in the linkchecker task. 👍

@HengchengZhang
Member

Hi @mmcky, I've fixed the broken links, but the remaining error is that the link checker treats QuantEcon.py as a link and thus reports a failure.

Shall we also add this to the ignore list, or rephrase QuantEcon.py as something like the QuantEcon package?

@mmcky mmcky closed this Jul 2, 2023
@mmcky mmcky reopened this Jul 2, 2023
@mmcky mmcky closed this Jul 2, 2023
@mmcky mmcky reopened this Jul 2, 2023
@mmcky
Contributor Author

mmcky commented Jul 2, 2023

thanks @HengchengZhang it should pass now

@mmcky mmcky closed this Jul 2, 2023
@mmcky mmcky reopened this Jul 2, 2023
@mmcky
Contributor Author

mmcky commented Jul 2, 2023

thanks @HengchengZhang

@mmcky mmcky merged commit e49728b into main Jul 2, 2023
@mmcky mmcky deleted the enable-linkcheck branch July 2, 2023 09:05
5 participants