@@ -213,7 +213,47 @@ This PEP leaves up to the implementation the level of f-string nesting allowed.
This means that limiting nesting is **not part of the language specification**
but also the language specification **doesn't mandate arbitrary nesting**.

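For illustration, an implementation may choose to accept several levels of
nesting, as the CPython implementation does::

    >>> f"{f"{f"{1+1}"}"}"
    '2'
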
- Three new tokens are introduced:
+ Handling of f-string debug expressions
+ --------------------------------------
+
+ Since Python 3.8, f-strings can be used to debug expressions by using the
+ ``=`` operator. For example::
+
+     >>> a = 1
+     >>> f"{1+1=}"
+     '1+1=2'
+
+ These semantics were not introduced formally in a PEP and they were implemented
+ in the current string parser as a special case in `bpo-36817
+ <https://bugs.python.org/issue?@action=redirect&bpo=36817>`_ and documented in
+ `the f-string lexical analysis section
+ <https://docs.python.org/3/reference/lexical_analysis.html#f-strings>`_.
+
+ This feature is not affected by the changes proposed in this PEP, but it is
+ important to specify that the formal handling of this feature requires the lexer
+ to be able to "untokenize" the expression part of the f-string. This is not a
+ problem for the current string parser as it can operate directly on the string
+ token contents. However, incorporating this feature into a given parser
+ implementation requires the lexer to keep track of the raw string contents of
+ the expression part of the f-string and make them available to the parser when
+ the parse tree is constructed for f-string nodes. A pure "untokenization" is not
+ enough because, as currently specified, f-string debugging preserves whitespace,
+ including spaces after the ``{`` and the ``=`` characters. This means that the
+ raw string contents of the expression part of the f-string must be kept intact
+ and not just the associated tokens.
+
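+ For example, the whitespace inside the braces, including the spaces around
+ the ``=`` character, is reproduced verbatim in the result::
+
+     >>> f"{ 1 + 1 = }"
+     ' 1 + 1 = 2'
+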
+ How parser/lexer implementations deal with this problem is of course up to each
+ implementation.
+
+ New tokens
+ ----------
+
+ Three new tokens are introduced: ``FSTRING_START``, ``FSTRING_MIDDLE`` and
+ ``FSTRING_END``. This PEP does not mandate the precise definitions of these tokens,
+ as different lexers may have implementations that are more efficient than the
+ ones proposed here, given the context of the particular implementation. However,
+ the following definitions are provided as a reference so that the reader can have a
+ better understanding of the proposed grammar changes and how the tokens are used:

* ``FSTRING_START``: This token includes the f-string character (``f``/``F``) and the open quote(s).
* ``FSTRING_MIDDLE``: This token includes the text between the opening quote
@@ -254,6 +294,9 @@ while ``f"""some words"""`` will be tokenized simply as::
    FSTRING_START - 'f"""'
    FSTRING_MIDDLE - 'some words'
    FSTRING_END - '"""'

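+ As an illustrative sketch (not mandated by this PEP), on an implementation
+ that exposes the new tokens through the standard ``tokenize`` module, such as
+ CPython 3.12, this tokenization can be observed directly::
+
+     >>> import io, tokenize
+     >>> code = 'f"""some words"""'
+     >>> for tok in tokenize.generate_tokens(io.StringIO(code).readline):
+     ...     print(tokenize.tok_name[tok.type], repr(tok.string))
+     FSTRING_START 'f"""'
+     FSTRING_MIDDLE 'some words'
+     FSTRING_END '"""'
+     NEWLINE ''
+     ENDMARKER ''
+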
+ Consequences of the new grammar
+ -------------------------------
+
All restrictions mentioned in the PEP are lifted from f-string literals, as explained below:

* Expression portions may now contain strings delimited with the same kind of
@@ -291,7 +334,7 @@ limited to be different from the quotes of the enclosing string, because this is
now allowed: as an arbitrary Python string can contain any possible choice of
quotes, so can any f-string expression. Additionally there is no need to clarify
that certain things are not allowed in the expression part because of
- implementation restructions such as comments, new line characters or
+ implementation restrictions such as comments, new line characters or
backslashes.

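For instance, under the proposed grammar the expression part may reuse the
quote character of the enclosing f-string, which was previously a syntax
error::

    >>> f"{"hello"}"
    'hello'
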
The only "surprising" difference is that as f-strings allow specifying a