Extract links and optionally images from Markdown files. Output sorted list of unique links, one per line.
Extracted links can be further processed by other programs.
Tested on Windows and Linux.
git clone https://github.com/wpdevelopment11/linkextr
cd linkextr
python3 -m venv .venv
source .venv/bin/activate
# Install the package which is used to parse Markdown files
pip install mistletoe
Extract links from all Markdown files in the directory (recursively) and output them to stdout:
./linkextr.py /path/to/dir
Same, but output to the file:
./linkextr.py --output links.txt /path/to/dir
Extract links from the specified Markdown files only:
./linkextr.py first_file.md second_file.md
Extract images too, but only if their src starts with http://
or https://
:
./linkextr.py --images first_file.md second_file.md
Prefix the links that start with a forward slash to form the valid URL:
# For example:
# [see this post](/posts/hello-world) => https://example.com/posts/hello-world
./linkextr.py --prefix https://example.com file.md
linkextr.py [-h] [-o OUTPUT] [-p PREFIX] [-a] [-i] [dir | file(s) ...]
Extract links from Markdown files
Positional arguments:
dir | file(s) Directory or zero or more Markdown files to extract the links from (default: stdin)
Options:
-o, --output OUTPUT File to write extracted links (default: stdout)
-p, --prefix PREFIX Add a prefix to the links that start with a forward slash
-a, --alluri Extract all links, even if they don't start with http:// or https://
-i, --images Extract URLs of images in addition to links
-h, --help Show this help message and exit
python3 -m unittest discover test
- Only Markdown links are supported, the links from
<a>
HTML tags are not extracted.