--------------

- This module performs conversions between Python values and C structs represented
- as Python :class:`bytes` objects. This can be used in handling binary data
- stored in files or from network connections, among other sources. It uses
- :ref:`struct-format-strings` as compact descriptions of the layout of the C
- structs and the intended conversion to/from Python values.
+ This module converts between Python values and C structs represented
+ as Python :class:`bytes` objects. Compact :ref:`format strings <struct-format-strings>`
+ describe the intended conversions to/from Python values.
+ The module's functions and objects can be used for two largely
+ distinct applications: data exchange with external sources (files or
+ network connections), or data transfer between the Python application
+ and the C layer.
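As a minimal sketch of the first application, exchanging data with an external source, the following packs a made-up record (an unsigned 16-bit ID followed by a 32-bit float) with an explicit little-endian format and unpacks it again. ``RECORD_FORMAT``, ``encode_record`` and ``decode_record`` are hypothetical names chosen for illustration; they are not part of :mod:`struct`.

```python
import struct

# Hypothetical record layout: little-endian, unsigned short ID, then a float.
RECORD_FORMAT = "<Hf"


def encode_record(record_id: int, value: float) -> bytes:
    """Pack one record into exactly struct.calcsize(RECORD_FORMAT) bytes."""
    return struct.pack(RECORD_FORMAT, record_id, value)


def decode_record(data: bytes) -> tuple:
    """Unpack a record previously produced by encode_record()."""
    record_id, value = struct.unpack(RECORD_FORMAT, data)
    return record_id, value


payload = encode_record(7, 1.5)
print(len(payload), payload)   # 6 b'\x07\x00\x00\x00\xc0?'
print(decode_record(payload))  # (7, 1.5)
```

Using an explicit ``'<'`` prefix keeps the byte layout identical on every host, which is what a file or network format needs.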

.. note::

- By default, the result of packing a given C struct includes pad bytes in
- order to maintain proper alignment for the C types involved; similarly,
- alignment is taken into account when unpacking. This behavior is chosen so
- that the bytes of a packed struct correspond exactly to the layout in memory
- of the corresponding C struct. To handle platform-independent data formats
- or omit implicit pad bytes, use ``standard`` size and alignment instead of
- ``native`` size and alignment: see :ref:`struct-alignment` for details.
+ When no prefix character is given, native mode is the default. It
+ packs or unpacks data based on the platform and compiler on which
+ the Python interpreter was built.
+ The result of packing a given C struct includes pad bytes which
+ maintain proper alignment for the C types involved; similarly,
+ alignment is taken into account when unpacking. In contrast, when
+ exchanging data with external sources, the programmer is
+ responsible for defining byte ordering and padding between elements.
+ See :ref:`struct-alignment` for details.
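To see the implicit pad bytes described in this note, one can compare :func:`calcsize` for the same format characters in native mode (``'@'``) and with the ``'='`` prefix, which keeps native byte order but uses standard sizes and no alignment. The native result is platform-dependent (typically 8); the ``'='`` result is always 5.

```python
import struct

# 'c' (1 byte) followed by 'i' (4 bytes in standard mode).
native_size = struct.calcsize("@ci")    # native: the int is aligned, so pad bytes may follow 'c'
standard_size = struct.calcsize("=ci")  # '=': standard sizes, no alignment

print("native:", native_size)      # platform-dependent, typically 8
print("standard:", standard_size)  # 1 + 4 = 5 on every platform

# Extra bytes used by native mode (pad bytes, assuming a 4-byte native int).
print("extra native bytes:", native_size - standard_size)
```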

Several :mod:`struct` functions (and methods of :class:`Struct`) take a *buffer*
argument. This refers to objects that implement the :ref:`bufferobjects` and
@@ -102,10 +106,13 @@ The module defines the following exception and functions:
Format Strings
--------------

- Format strings are the mechanism used to specify the expected layout when
- packing and unpacking data. They are built up from :ref:`format-characters`,
- which specify the type of data being packed/unpacked. In addition, there are
- special characters for controlling the :ref:`struct-alignment`.
+ Format strings describe the data layout when
+ packing and unpacking data. They are built up from :ref:`format characters <format-characters>`,
+ which specify the type of data being packed/unpacked. In addition,
+ special characters control the :ref:`byte order, size and alignment <struct-alignment>`.
+ Each format string consists of an optional prefix character which
+ describes the overall properties of the data and one or more format
+ characters which describe the actual data values and padding.
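As an illustration of that structure, the sketch below uses a made-up format string, ``'<hxI'``: the prefix ``'<'`` selects little-endian standard mode, and the format characters describe a short, one pad byte and an unsigned int. A :class:`Struct` object exposes the format and its total size.

```python
import struct

# Prefix '<' = little-endian, standard sizes, no implicit alignment.
# Format characters: 'h' (2-byte short), 'x' (1 pad byte), 'I' (4-byte unsigned int).
layout = struct.Struct("<hxI")

print(layout.format)  # '<hxI'
print(layout.size)    # 2 + 1 + 4 = 7

packed = layout.pack(-2, 258)
print(packed)                 # b'\xfe\xff\x00\x02\x01\x00\x00' (the pad byte is b'\x00')
print(layout.unpack(packed))  # (-2, 258): pad bytes produce no value
```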


.. _struct-alignment:
@@ -116,6 +123,11 @@ Byte Order, Size, and Alignment
By default, C types are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).
+ This behavior is chosen so
+ that the bytes of a packed struct correspond exactly to the memory layout
+ of the corresponding C struct.
+ Whether to use native byte ordering
+ and padding or standard formats depends on the application.

.. index::
single: @ (at); in struct format strings
@@ -144,12 +156,10 @@ following table:

If the first character is not one of these, ``'@'`` is assumed.

- Native byte order is big-endian or little-endian, depending on the host
- system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
- IBM z and most legacy architectures are big-endian;
- and ARM, RISC-V and IBM Power feature switchable endianness
- (bi-endian, though the former two are nearly always little-endian in practice).
- Use ``sys.byteorder`` to check the endianness of your system.
+ Native byte order is big-endian or little-endian, depending on the
+ host system. For example, Intel x86, AMD64 (x86-64), and Apple M1 are
+ little-endian; IBM z and many legacy architectures are big-endian.
+ Use :data:`sys.byteorder` to check the endianness of your system.
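A quick way to relate :data:`sys.byteorder` to native byte order is to pack the same integer with ``'='`` (native order, standard size) and compare it with the explicit ``'<'`` and ``'>'`` encodings. The sketch makes no assumption about the host; it only checks that the native bytes match the order ``sys.byteorder`` reports.

```python
import struct
import sys

value = 0x01020304
native = struct.pack("=I", value)  # native byte order, standard 4-byte size
little = struct.pack("<I", value)  # b'\x04\x03\x02\x01'
big = struct.pack(">I", value)     # b'\x01\x02\x03\x04'

# Native order agrees with the explicit order named by sys.byteorder.
expected = little if sys.byteorder == "little" else big
print(sys.byteorder, native == expected)  # e.g. "little True"
```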

Native size and alignment are determined using the C compiler's
``sizeof`` expression. This is always combined with native byte order.
@@ -231,9 +241,9 @@ platform-dependent.
+--------+--------------------------+--------------------+----------------+------------+
| ``d``  | :c:expr:`double`         | float              | 8              | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
- | ``s``  | :c:expr:`char[]`         | bytes              |                |            |
+ | ``s``  | :c:expr:`char[]`         | bytes              |                | \(9)       |
+--------+--------------------------+--------------------+----------------+------------+
- | ``p``  | :c:expr:`char[]`         | bytes              |                |            |
+ | ``p``  | :c:expr:`char[]`         | bytes              |                | \(8)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``P``  | :c:expr:`void \*`        | integer            |                | \(5)       |
+--------+--------------------------+--------------------+----------------+------------+
@@ -292,24 +302,40 @@ Notes:
format <half precision format_>`_ for more information.

(7)
- For padding, ``x`` inserts null bytes.
-
+ When packing, ``'x'`` inserts one NUL byte.
+
+ (8)
+ The ``'p'`` format character encodes a "Pascal string", meaning a short
+ variable-length string stored in a *fixed number of bytes*, given by the count.
+ The first byte stored is the length of the string, or 255, whichever is
+ smaller. The bytes of the string follow. If the string passed in to
+ :func:`pack` is too long (longer than the count minus 1), only the leading
+ ``count-1`` bytes of the string are stored. If the string is shorter than
+ ``count-1``, it is padded with null bytes so that exactly count bytes in all
+ are used. Note that for :func:`unpack`, the ``'p'`` format character consumes
+ ``count`` bytes, but that the string returned can never contain more than 255
+ bytes.
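The sketch below walks through the ``'p'`` rules with a count of 5: a short string is length-prefixed and padded with NUL bytes, a long one keeps only ``count-1`` bytes, and unpacking returns just the stored data.

```python
import struct

# Short string: length byte 2, the data, then NUL padding up to 5 bytes in total.
short = struct.pack("5p", b"hi")
print(short)  # b'\x02hi\x00\x00'

# Long string: only count-1 = 4 bytes of data fit after the length byte.
truncated = struct.pack("5p", b"hello world")
print(truncated)  # b'\x04hell'

# Unpacking consumes all 5 bytes but returns only the stored data.
print(struct.unpack("5p", short))      # (b'hi',)
print(struct.unpack("5p", truncated))  # (b'hell',)
```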
+
+ (9)
+ For the ``'s'`` format character, the count is interpreted as the length of the
+ bytes, not a repeat count like for the other format characters; for example,
+ ``'10s'`` means a single 10-byte string mapping to or from a single
+ Python bytes object, while ``'10c'`` means 10
+ separate one-byte character elements (e.g., ``cccccccccc``) mapping
+ to or from ten different Python bytes objects. (See :ref:`struct-examples`
+ for a concrete demonstration of the difference.)
+ If a count is not given, it defaults to 1. For packing, the string is
+ truncated or padded with null bytes as appropriate to make it fit. For
+ unpacking, the resulting bytes object always has exactly the specified number
+ of bytes. As a special case, ``'0s'`` means a single, empty string (while
+ ``'0c'`` means 0 characters).
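The difference between a length (``'s'``) and a repeat count (``'c'``) shows up most clearly on the unpacking side, where the number of returned objects differs. The sketch also exercises the truncation, padding and zero-count rules from this note.

```python
import struct

# 's': the count is a length, so one bytes object is unpacked.
print(struct.unpack("3s", b"abc"))  # (b'abc',)
# 'c': the count is a repeat count, so three 1-byte objects come back.
print(struct.unpack("3c", b"abc"))  # (b'a', b'b', b'c')

# Packing with 's' truncates or NUL-pads to the given length.
print(struct.pack("3s", b"abcdef"))  # b'abc'
print(struct.pack("5s", b"ab"))      # b'ab\x00\x00\x00'

# '0s' is a single empty string; '0c' produces no values at all.
print(struct.unpack("0s", b""))  # (b'',)
print(struct.unpack("0c", b""))  # ()
```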

A format character may be preceded by an integral repeat count. For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; a count and its format must
not contain whitespace though.
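Both rules above can be checked directly: a repeat count is shorthand for repeating the format character, whitespace between format characters is skipped, and whitespace between a count and its format character is rejected with :exc:`struct.error`.

```python
import struct

# A repeat count is shorthand for repeating the format character.
print(struct.pack("<4h", 1, 2, 3, 4) == struct.pack("<hhhh", 1, 2, 3, 4))  # True

# Whitespace between formats is ignored...
print(struct.calcsize("<2h i"))  # 2*2 + 4 = 8

# ...but a count must be immediately followed by its format character.
try:
    struct.calcsize("<2 h")
except struct.error as exc:
    print("rejected:", exc)
```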

- For the ``'s'`` format character, the count is interpreted as the length of the
- bytes, not a repeat count like for the other format characters; for example,
- ``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
- If a count is not given, it defaults to 1. For packing, the string is
- truncated or padded with null bytes as appropriate to make it fit. For
- unpacking, the resulting bytes object always has exactly the specified number
- of bytes. As a special case, ``'0s'`` means a single, empty string (while
- ``'0c'`` means 0 characters).
-
When packing a value ``x`` using one of the integer formats (``'b'``,
``'B'``, ``'h'``, ``'H'``, ``'i'``, ``'I'``, ``'l'``, ``'L'``,
``'q'``, ``'Q'``), if ``x`` is outside the valid range for that format
@@ -319,17 +345,6 @@ then :exc:`struct.error` is raised.
Previously, some of the integer formats wrapped out-of-range values and
raised :exc:`DeprecationWarning` instead of :exc:`struct.error`.

- The ``'p'`` format character encodes a "Pascal string", meaning a short
- variable-length string stored in a *fixed number of bytes*, given by the count.
- The first byte stored is the length of the string, or 255, whichever is
- smaller. The bytes of the string follow. If the string passed in to
- :func:`pack` is too long (longer than the count minus 1), only the leading
- ``count-1`` bytes of the string are stored. If the string is shorter than
- ``count-1``, it is padded with null bytes so that exactly count bytes in all
- are used. Note that for :func:`unpack`, the ``'p'`` format character consumes
- ``count`` bytes, but that the string returned can never contain more than 255
- bytes.
-
.. index:: single: ? (question mark); in struct format strings

For the ``'?'`` format character, the return value is either :const:`True` or
@@ -345,18 +360,36 @@ Examples
^^^^^^^^

.. note::
- All examples assume a native byte order, size, and alignment with a
- big-endian machine.
+ Native byte order examples (designated by the ``'@'`` format prefix or
+ lack of any prefix character) may not match what the reader's
+ machine produces, as
+ that depends on the platform and compiler.
+
+ Pack and unpack integers of three different sizes, using big-endian
+ ordering::

- A basic example of packing/unpacking three integers::
+ >>> from struct import *
+ >>> pack(">bhl", 1, 2, 3)
+ b'\x01\x00\x02\x00\x00\x00\x03'
+ >>> unpack('>bhl', b'\x01\x00\x02\x00\x00\x00\x03')
+ (1, 2, 3)
+ >>> calcsize('>bhl')
+ 7

- >>> from struct import *
- >>> pack('hhl', 1, 2, 3)
- b'\x00\x01\x00\x02\x00\x00\x00\x03'
- >>> unpack('hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
- (1, 2, 3)
- >>> calcsize('hhl')
- 8
+ Attempt to pack an integer which is too large for the defined field::
+
+ >>> pack(">h", 99999)
+ Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ struct.error: 'h' format requires -32768 <= number <= 32767
+
+ Demonstrate the difference between ``'s'`` and ``'c'`` format
+ characters::
+
+ >>> pack("@ccc", b'1', b'2', b'3')
+ b'123'
+ >>> pack("@3s", b'123')
+ b'123'

Unpacked fields can be named by assigning them to variables or by wrapping
the result in a named tuple::
@@ -369,35 +402,132 @@ the result in a named tuple::
>>> Student._make(unpack('<10sHHb', record))
Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)

- The ordering of format characters may have an impact on size since the padding
- needed to satisfy alignment requirements is different::
-
- >>> pack('ci', b'*', 0x12131415)
- b'*\x00\x00\x00\x12\x13\x14\x15'
- >>> pack('ic', 0x12131415, b'*')
- b'\x12\x13\x14\x15*'
- >>> calcsize('ci')
+ The ordering of format characters may have an impact on size in native
+ mode since padding is implicit. In standard mode, the user is
+ responsible for inserting any desired padding.
+ Note in
+ the first ``pack`` call below that three NUL bytes were added after the
+ packed ``'#'`` to align the following integer on a four-byte boundary.
+ In this example, the output was produced on a little-endian machine::
+
+ >>> pack('@ci', b'#', 0x12131415)
+ b'#\x00\x00\x00\x15\x14\x13\x12'
+ >>> pack('@ic', 0x12131415, b'#')
+ b'\x15\x14\x13\x12#'
+ >>> calcsize('@ci')
8
- >>> calcsize('ic')
+ >>> calcsize('@ic')
5

- The following format ``'llh0l'`` specifies two pad bytes at the end, assuming
- longs are aligned on 4-byte boundaries::
+ The following format ``'llh0l'`` results in two pad bytes being added
+ at the end, assuming the platform's longs are 4 bytes and aligned on 4-byte
+ boundaries. In this example, the output was produced on a big-endian machine::

- >>> pack('llh0l', 1, 2, 3)
+ >>> pack('@llh0l', 1, 2, 3)
b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'

- This only works when native size and alignment are in effect; standard size and
- alignment does not enforce any alignment.
-

.. seealso::

Module :mod:`array`
Packed binary storage of homogeneous data.

- Module :mod:`xdrlib`
- Packing and unpacking of XDR data.
+ Module :mod:`json`
+ JSON encoder and decoder.
+
+ Module :mod:`pickle`
+ Python object serialization.
+
+
+ .. _applications:
+
+ Applications
+ ------------
+
+ Two main applications for the :mod:`struct` module exist: data
+ interchange between Python and C code within an application or another
+ application compiled using the same compiler (:ref:`native formats <struct-native-formats>`), and
+ data interchange between applications using an agreed-upon data layout
+ (:ref:`standard formats <struct-standard-formats>`). Generally speaking, the format strings
+ constructed for these two domains are distinct.
+
+
+ .. _struct-native-formats:
+
+ Native Formats
+ ^^^^^^^^^^^^^^
+
+ When constructing format strings which mimic native layouts, the
+ compiler and machine architecture determine byte ordering and padding.
+ In such cases, the ``@`` format character should be used to specify
+ native byte ordering and data sizes. Internal pad bytes are normally inserted
+ automatically. A zero-repeat format code may be
+ needed at the end of a format string to round up to the correct
+ byte boundary for proper alignment of consecutive chunks of data.
+
+ Consider these two simple examples (on a 64-bit, little-endian
+ machine where a C ``long`` is 8 bytes)::
+
+ >>> calcsize('@lhl')
+ 24
+ >>> calcsize('@llh')
+ 18
+
+ Data is not padded to an 8-byte boundary at the end of the second
+ format string without the use of extra padding. A zero-repeat format
+ code solves that problem::
+
+ >>> calcsize('@llh0l')
+ 24
+
+ The ``'x'`` format code can be used to add the padding explicitly, but for
+ native formats it is better to use a zero-repeat format like ``'0l'``,
+ since the amount of padding needed depends on the platform.
+
+ By default, native byte ordering and alignment are used, but it is
+ better to be explicit and use the ``'@'`` prefix character.
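Because the trailing padding depends on the platform, it can be measured rather than hard-coded. ``trailing_padding`` below is a hypothetical helper, not part of :mod:`struct`, that compares :func:`calcsize` with and without a trailing zero-repeat code. For ``'@lhl'`` it is always 0, since the format already ends with a ``long``; for ``'@llh'`` it is 6 on the 64-bit machine described above and 2 on a typical 32-bit machine.

```python
import struct


def trailing_padding(fmt: str, align_code: str = "l") -> int:
    """Hypothetical helper: bytes added by appending a zero-repeat code.

    Meant for native ('@') formats; in standard modes a zero-repeat
    code adds nothing because no alignment is enforced.
    """
    return struct.calcsize(fmt + "0" + align_code) - struct.calcsize(fmt)


for fmt in ("@lhl", "@llh"):
    print(fmt, struct.calcsize(fmt), "-> padding added by '0l':", trailing_padding(fmt))
```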
+
+
+ .. _struct-standard-formats:
+
+ Standard Formats
+ ^^^^^^^^^^^^^^^^
+
+ When exchanging data beyond your process, such as over a network or in storage,
+ be precise. Specify the exact byte order, size, and alignment. Do
+ not assume they match the native order of a particular machine.
+ For example, network byte order is big-endian, while many popular CPUs
+ are little-endian. By defining this explicitly, the user need not
+ care about the specifics of the platform their code is running on.
+ The first character should typically be ``<`` or ``>``
+ (or ``!``). Padding is the responsibility of the programmer. The
+ zero-repeat format character won't work, because standard mode does not
+ enforce any alignment. Instead, the user must
+ explicitly add ``'x'`` pad bytes where needed. Revisiting the
+ examples from the previous section, we have::
+
+ >>> calcsize('<qh6xq')
+ 24
+ >>> pack('<qh6xq', 1, 2, 3) == pack('@lhl', 1, 2, 3)
+ True
+ >>> calcsize('@llh')
+ 18
+ >>> pack('@llh', 1, 2, 3) == pack('<qqh', 1, 2, 3)
+ True
+ >>> calcsize('<qqh6x')
+ 24
+ >>> calcsize('@llh0l')
+ 24
+ >>> pack('@llh0l', 1, 2, 3) == pack('<qqh6x', 1, 2, 3)
+ True
+
+ The above results (executed on a 64-bit, little-endian machine) aren't guaranteed to
+ match when executed on different machines. For example, the examples
+ below were executed on a 32-bit machine::
+
+ >>> calcsize('<qqh6x')
+ 24
+ >>> calcsize('@llh0l')
+ 12
+ >>> pack('@llh0l', 1, 2, 3) == pack('<qqh6x', 1, 2, 3)
+ False
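Putting these rules together, a fixed on-the-wire layout can be written entirely in standard mode. The 12-byte header below (two unsigned shorts, four explicit pad bytes, then an unsigned int, all in network byte order) is a made-up example; its size and bytes do not depend on the machine that runs it.

```python
import struct

# Hypothetical wire format: network (big-endian) byte order, standard sizes,
# with padding spelled out explicitly using 'x'.
HEADER = struct.Struct("!HH4xI")

packed = HEADER.pack(1, 2, 3)
print(HEADER.size)  # 2 + 2 + 4 + 4 = 12 on every platform
print(packed)       # b'\x00\x01\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03'

# Pad bytes are skipped when unpacking, so only the three fields come back.
print(HEADER.unpack(packed))  # (1, 2, 3)
```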


.. _struct-objects:
@@ -411,9 +541,9 @@ The :mod:`struct` module also defines the following type:
.. class:: Struct(format)

Return a new Struct object which writes and reads binary data according to
- the format string *format*. Creating a Struct object once and calling its
- methods is more efficient than calling the :mod:`struct` functions with the
- same format since the format string only needs to be compiled once.
+ the format string *format*. Creating a ``Struct`` object once and calling its
+ methods is more efficient than calling module-level functions with the
+ same format since the format string is only compiled once.
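As a sketch of that reuse, one ``Struct`` instance can serve many records. The example uses the ``Struct`` methods ``pack_into``, ``unpack_from`` and ``iter_unpack`` with a made-up record of two little-endian signed shorts; the record layout and variable names are illustrative assumptions.

```python
import struct

# Compile the format once: two little-endian signed shorts, 4 bytes per record.
point = struct.Struct("<hh")

points = [(1, 2), (3, 4), (-5, 6)]
buffer = bytearray(point.size * len(points))

# pack_into writes each record directly into the buffer at a byte offset.
for index, (x, y) in enumerate(points):
    point.pack_into(buffer, index * point.size, x, y)

# unpack_from reads one record starting at an offset...
print(point.unpack_from(buffer, point.size))  # (3, 4)

# ...and iter_unpack walks over consecutive records of point.size bytes.
print(list(point.iter_unpack(buffer)))  # [(1, 2), (3, 4), (-5, 6)]
```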

.. note::