Skip to content

gh-65002: Make note that null bytes are used to pad bytes #98635

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 28, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion Doc/library/struct.rst
Original file line number Diff line number Diff line change
Expand Up @@ -194,7 +194,7 @@ platform-dependent.
+--------+--------------------------+--------------------+----------------+------------+
| Format | C Type | Python type | Standard size | Notes |
+========+==========================+====================+================+============+
| ``x`` | pad byte | no value | | |
| ``x`` | pad byte | no value | | \(7) |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be best if this footnote were the first. Is (0) allowed or must footnote numbering start with 1?
If the latter, I'm NOT asking for renumbering in this situation.

A deeper question: by 'null byte', I presume the footnote means \0. Why put this in a footnote? Why not

Suggested change
| ``x`` | pad byte | no value | | \(7) |
| ``x`` | \0 pad byte | no value | | |

Copy link
Member

@terryjreedy terryjreedy Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative

Suggested change
| ``x`` | pad byte | no value | | \(7) |
| ``x`` | pad byte (\0) | no value | | |

Copy link
Member

@CAM-Gerlach CAM-Gerlach Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned on Discord, this isn't a footnote at all, its just arbitrary text—so it could be made anything at all, including 0. They should be made actual footnotes ([1]_/.. [1]), or much better auto-numbered footnotes ([#note-x]_/.. [#note-x]) that would avoid this problem and also getting out of sync and the reader doesn't have to manually scroll multiple pages to find the note (which I've had to do quite a bit on this particular table). However, I considered that out of scope of this particular PR, and better done in a followup, which is why the numbering didn't bother much (as it would be quickly fixed anyway).

I presume the clarification in the footnote is for when this is used to output rather than input a binary struct. My concern here is that since the behavior is not 1:1 on input and output (pad bytes on input are ignored, while on output they area always \0), I'm concerned this might mislead users into thinking that input bytes must be \0, rather than simply ignored. Also, it should be a literal, so \0

Incorporating @zware 's suggestion to resolve this, as well as making it a literal:

Suggested change
| ``x`` | pad byte | no value | | \(7) |
| ``x`` | pad byte | no value | | |
| | (output as ``\0``) | | | |

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by input a binary struct? Like a = bytearray(b'\0Y')?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is if you call struct.unpack with x in the format on some struct-packed bytes as input, the bytes ignored by x don't have to be \0 (as the previous suggestion could imply), but rather could be anything. \0 is only what is output/packed, not nessesarily what is input/unpacked. I.e. on output/packing,

>>> struct.pack('!xxc', b'a')
b'\x00\x00a'

but those bytes can be anything on input/unpacking, not just \0, i.e.

>>> struct.unpack('!xxc', b'\0\0a')
(b'a',)
>>> struct.unpack('!xxc', b'42a')
(b'a',)
>>> struct.unpack('!xxc', b'\r\na')
(b'a',)

where as "\0 pad byte" may imply to the user that only the first might be accepted.

+--------+--------------------------+--------------------+----------------+------------+
| ``c`` | :c:expr:`char` | bytes of length 1 | 1 | |
+--------+--------------------------+--------------------+----------------+------------+
Expand Down Expand Up @@ -291,6 +291,9 @@ Notes:
operations. See the Wikipedia page on the `half-precision floating-point
format <half precision format_>`_ for more information.

(7)
For padding, ``x`` inserts null bytes.

Comment on lines +294 to +296
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If either change suggested above is accepted,

Suggested change
(7)
For padding, ``x`` inserts null bytes.


A format character may be preceded by an integral repeat count. For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.
Expand Down