PEP 701: Add some clarifications to f-string debug expressions and tokens #2929

Merged
merged 2 commits into from
Dec 16, 2022
47 changes: 45 additions & 2 deletions pep-0701.rst
@@ -213,7 +213,47 @@ This PEP leaves up to the implementation the level of f-string nesting allowed.
This means that limiting nesting is **not part of the language specification**
but also the language specification **doesn't mandate arbitrary nesting**.

Three new tokens are introduced:
Handling of f-string debug expressions
--------------------------------------

Since Python 3.8, f-strings can be used to debug expressions by using the
``=`` operator. For example::

    >>> a = 1
    >>> f"{1+1=}"
    '1+1=2'

These semantics were never formally introduced in a PEP. They were implemented
in the current string parser as a special case in `bpo-36817
<https://bugs.python.org/issue?@action=redirect&bpo=36817>`_ and documented in
`the f-string lexical analysis section
<https://docs.python.org/3/reference/lexical_analysis.html#f-strings>`_.

This feature is not affected by the changes proposed in this PEP, but it is
important to specify that formally handling it requires the lexer to be able to
"untokenize" the expression part of the f-string. This is not a problem for the
current string parser, as it can operate directly on the string token contents.
However, incorporating this feature into a given parser implementation requires
the lexer to keep track of the raw string contents of the expression part of
the f-string and make them available to the parser when the parse tree is
constructed for f-string nodes. A pure "untokenization" is not enough because,
as currently specified, f-string debugging preserves whitespace, including
spaces after the ``{`` and the ``=`` characters. This means that the raw string
contents of the expression part of the f-string must be kept intact, and not
just the associated tokens.
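
The whitespace behaviour described above can be observed directly on any
Python 3.8+ interpreter. The following is only a minimal sketch: the variable
names are illustrative, and ``tokens_only`` is a hypothetical stand-in for what
a lexer would have left if it kept just the token texts, not the output of any
real tokenizer.

```python
# Debug specifier (Python 3.8+): the text before "=" is echoed verbatim,
# including any whitespace after "{" and around "=".
compact = f"{1+1=}"
spaced = f"{ 1 + 1 = }"

print(repr(compact))
print(repr(spaced))

# Hypothetical illustration: if only the token texts ("1", "+", "1") were
# kept, the original spacing could not be recovered from them.
raw_expression = " 1 + 1 "
tokens_only = "".join(raw_expression.split())  # stand-in for joined token texts
print(repr(raw_expression), "vs", repr(tokens_only))
```

The two f-strings evaluate the same expression but produce different strings,
which is why the raw source text of the expression, and not only its tokens,
has to reach the parser.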

How parser/lexer implementations deal with this problem is of course up to the
implementation.

New tokens
----------

Three new tokens are introduced: ``FSTRING_START``, ``FSTRING_MIDDLE`` and
``FSTRING_END``. This PEP does not mandate the precise definitions of these
tokens, as a given lexer may have a more efficient implementation than the ones
proposed here. However, the following definitions are provided as a reference
so that the reader can better understand the proposed grammar changes and how
the tokens are used:

* ``FSTRING_START``: This token includes the f-string prefix character (``f``/``F``) and the opening quote(s).
* ``FSTRING_MIDDLE``: This token includes the text between the opening quote
@@ -254,6 +294,9 @@ while ``f"""some words"""`` will be tokenized simply as::
    FSTRING_START - 'f"""'
    FSTRING_END - 'some words'

Consequences of the new grammar
-------------------------------

All restrictions mentioned in the PEP are lifted from f-literals, as explained below:

* Expression portions may now contain strings delimited with the same kind of
@@ -291,7 +334,7 @@ limited to be different from the quotes of the enclosing string, because this is
now allowed: as an arbitrary Python string can contain any possible choice of
quotes, so can any f-string expression. Additionally there is no need to clarify
that certain things are not allowed in the expression part because of
implementation restructions such as comments, new line characters or
implementation restrictions such as comments, new line characters or
backslashes.

The only "surprising" difference is that as f-strings allow specifying a