115 changes: 115 additions & 0 deletions source/specifications/glob-patterns.rst
@@ -0,0 +1,115 @@
=================
``glob`` patterns
=================

Some PyPA specifications, e.g. :ref:`pyproject.toml's license-files
<pyproject-toml-license-files>`, accept certain types of *glob patterns*
to match a given string containing wildcards and character ranges against
files and directories. This specification defines which patterns are acceptable
and how they should be handled.


Valid glob patterns
===================

For PyPA purposes, a *valid glob pattern* MUST be a string matched against
filesystem entries as specified below:

- Alphanumeric characters, underscores (``_``), hyphens (``-``) and dots (``.``)
  MUST be matched verbatim.

- Special glob characters: ``*``, ``?``, ``**`` and character ranges: ``[]``
  containing only the verbatim matched characters MUST be supported.
  Within ``[...]``, the hyphen indicates a locale-agnostic range (e.g. ``a-z``,
  order based on Unicode code points).
  Hyphens at the start or end are matched literally.

- Path delimiters MUST be the forward slash character (``/``).

- Patterns always refer to *relative paths*,
  e.g., when used in :file:`pyproject.toml`, patterns should always be
  relative to the directory containing that file.
  Therefore the leading slash character MUST NOT be used.

- Parent directory indicators (``..``) MUST NOT be used.
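
The bracket rules above can be illustrated with a minimal sketch. It uses
:func:`fnmatch.fnmatchcase` from Python's standard library purely as a
stand-in for a conforming matcher; this specification does not mandate that
function, and the sample names are made up for illustration.

.. code-block:: python

    from fnmatch import fnmatchcase

    # (name, pattern, expected result) triples; all values are illustrative.
    samples = [
        ("LICENSE", "LICEN[CS]E", True),   # one of the listed characters
        ("LICENCE", "LICEN[CS]E", True),
        ("LICENXE", "LICEN[CS]E", False),  # "X" is not listed
        ("b", "[a-c]", True),              # range ordered by code point
        ("B", "[a-c]", False),             # uppercase is outside a-c
        ("-", "[-a]", True),               # leading hyphen is literal
        ("-", "[a-]", True),               # trailing hyphen is literal
        ("b", "[a-]", False),              # no range is formed here
    ]

    for name, pattern, expected in samples:
        assert fnmatchcase(name, pattern) is expected, (name, pattern)

Note that ``fnmatch`` lets ``*`` match ``/`` as well, so it is only used here
to show how single-segment bracket expressions behave.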

Any characters or character sequences not covered by this specification are
invalid. Projects MUST NOT use such values.
Tools consuming glob patterns SHOULD reject invalid values with an error.
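
As a rough illustration, the syntax rules can be checked without touching the
file system. The sketch below is not normative; the helper name
``check_glob_syntax`` is hypothetical. It assumes that "alphanumeric" may be
approximated by the regular expression class ``\w``, and that ``..`` is only
rejected when it forms a whole path segment, which is one possible reading of
the rule (the reference implementation later in this document rejects any
occurrence of ``..``). It does not check that brackets are balanced.

.. code-block:: python

    import re

    # Characters permitted anywhere in a pattern (assumption: \w stands in
    # for "alphanumeric characters and underscores").
    _ALLOWED = re.compile(r"[\w\-./*?\[\]]+")


    def check_glob_syntax(pattern: str) -> None:
        """Raise ValueError if ``pattern`` breaks one of the rules above."""
        if _ALLOWED.fullmatch(pattern) is None:
            raise ValueError(f"Pattern {pattern!r} contains invalid characters")
        if pattern.startswith("/"):
            raise ValueError(f"Pattern {pattern!r} must not start with '/'")
        # Assumption: only a complete ".." segment counts as a parent indicator.
        if ".." in pattern.split("/"):
            raise ValueError(f"Pattern {pattern!r} must not contain '..'")

For instance, ``check_glob_syntax("licenses/*")`` returns silently, while
``check_glob_syntax("LICEN{CSE*")`` raises :exc:`ValueError`.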

Literal paths (e.g. :file:`LICENSE`) are valid globs, which means they
can also be specified.

Tools consuming glob patterns:

- MUST treat each value as a glob pattern, and MUST raise an error if the
  pattern contains invalid glob syntax.
- MUST raise an error if any individual user-specified pattern does not match
  at least one file.
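
One way a tool might apply the "at least one file" rule to several patterns
is sketched below. The function name ``match_patterns`` and its parameters are
hypothetical. The sketch assumes Python 3.10 or later (for the ``root_dir``
argument of :func:`glob.glob`), assumes that only regular files count as
matches, and leaves syntax validation out for brevity.

.. code-block:: python

    from glob import glob
    from pathlib import Path


    def match_patterns(root: Path, patterns: list[str]) -> list[str]:
        """Return sorted, de-duplicated matches of ``patterns`` under ``root``."""
        matched: set[str] = set()
        for pattern in patterns:
            # root_dir makes the pattern relative to ``root`` instead of the
            # current working directory.
            found = glob(pattern, root_dir=root, recursive=True)
            # Assumption: directories do not count as matched files.
            files = [p for p in found if (root / p).is_file()]
            if not files:
                raise ValueError(f"Pattern {pattern!r} did not match any files")
            # Normalise to forward slashes on every platform.
            matched.update(Path(p).as_posix() for p in files)
        return sorted(matched)

Each pattern is checked on its own, so a single unmatched pattern raises an
error even when the other patterns match.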

Examples of valid glob patterns:

.. code-block:: python

    "LICEN[CS]E*"
    "AUTHORS*"
    "licenses/LICENSE.MIT"
    "licenses/LICENSE.CC0"
    "LICENSE.txt"
    "licenses/*"

Examples of invalid glob patterns:

.. code-block:: python

    "..\LICENSE.MIT"
    # .. must not be used.
    # \ is an invalid path delimiter, / must be used.

    "LICEN{CSE*"
    # the { character is not allowed


Reference implementation in Python
==================================

It is possible to defer the majority of the pattern matching against the file
system to the :mod:`glob` module in Python's standard library. It is necessary
however to perform additional validations.

The code below is a simple reference implementation:

.. code-block:: python

    import os
    import re
    from glob import glob


    def find_pattern(pattern: str) -> list[str]:
        """
        >>> find_pattern("/LICENSE.MIT")
        Traceback (most recent call last):
        ...
        ValueError: Pattern '/LICENSE.MIT' should be relative...
        >>> find_pattern("../LICENSE.MIT")
        Traceback (most recent call last):
        ...
        ValueError: Pattern '../LICENSE.MIT' cannot contain '..'...
        >>> find_pattern("LICEN{CSE*")
        Traceback (most recent call last):
        ...
        ValueError: Pattern 'LICEN{CSE*' contains invalid characters...
        """
        # Parent directory indicators are not allowed.
        if ".." in pattern:
            raise ValueError(f"Pattern {pattern!r} cannot contain '..'")
        # Reject absolute paths, including Windows drive-style paths.
        if pattern.startswith((os.sep, "/")) or ":\\" in pattern:
            raise ValueError(
                f"Pattern {pattern!r} should be relative and must not start with '/'"
            )
        # Only the characters covered by the specification are accepted.
        if re.match(r'^[\w\-\.\/\*\?\[\]]+$', pattern) is None:
            raise ValueError(f"Pattern '{pattern}' contains invalid characters.")
        # Defer the actual matching to the standard library.
        found = glob(pattern, recursive=True)
        if not found:
            raise ValueError(f"Pattern '{pattern}' did not match any files.")
        return found
31 changes: 5 additions & 26 deletions source/specifications/pyproject-toml.rst
@@ -247,6 +247,8 @@ Tools SHOULD validate and perform case normalization of the expression.
The table subkeys of the ``license`` key are deprecated.


.. _pyproject-toml-license-files:

``license-files``
-----------------

@@ -260,43 +262,20 @@ configuration files, e.g. :file:`setup.py`, :file:`setup.cfg`, etc.)
to file(s) containing licenses and other legal notices to be
distributed with the package.

The strings MUST contain valid glob patterns, as specified below:

- Alphanumeric characters, underscores (``_``), hyphens (``-``) and dots (``.``)
  MUST be matched verbatim.

- Special glob characters: ``*``, ``?``, ``**`` and character ranges: ``[]``
  containing only the verbatim matched characters MUST be supported.
  Within ``[...]``, the hyphen indicates a locale-agnostic range (e.g. ``a-z``,
  order based on Unicode code points).
  Hyphens at the start or end are matched literally.
The strings MUST contain valid glob patterns, as specified in
:doc:`/specifications/glob-patterns`.

- Path delimiters MUST be the forward slash character (``/``).
Patterns are relative to the directory containing :file:`pyproject.toml`,
therefore the leading slash character MUST NOT be used.

- Parent directory indicators (``..``) MUST NOT be used.

Any characters or character sequences not covered by this specification are
invalid. Projects MUST NOT use such values.
Tools consuming this field SHOULD reject invalid values with an error.
Patterns are relative to the directory containing :file:`pyproject.toml`,

Tools MUST assume that license file content is valid UTF-8 encoded text,
and SHOULD validate this and raise an error if it is not.

Literal paths (e.g. :file:`LICENSE`) are valid globs which means they
can also be defined.

Build tools:

- MUST treat each value as a glob pattern, and MUST raise an error if the
  pattern contains invalid glob syntax.
- MUST include all files matched by a listed pattern in all distribution
  archives.
- MUST list each matched file path under a License-File field in the
  Core Metadata.
- MUST raise an error if any individual user-specified pattern does not match
  at least one file.

If the ``license-files`` key is present and
is set to a value of an empty array, then tools MUST NOT include any
1 change: 1 addition & 0 deletions source/specifications/section-distribution-metadata.rst
@@ -14,3 +14,4 @@ Package Distribution Metadata
   inline-script-metadata
   platform-compatibility-tags
   well-known-project-urls
   glob-patterns