Commit dcafb36

WilliamRoyNelson, tomasr8, sodle, picnixz, and encukou authored
gh-121999: Change default tarfile filter to 'data' (GH-122002)
Co-authored-by: Tomas R <[email protected]>
Co-authored-by: Scott Odle <[email protected]>
Co-authored-by: Bénédikt Tran <[email protected]>
Co-authored-by: Petr Viktorin <[email protected]>
1 parent bc94cf7 commit dcafb36

6 files changed: +76 -76 lines changed

Doc/library/shutil.rst

Lines changed: 8 additions & 4 deletions
@@ -706,11 +706,9 @@ provided. They rely on the :mod:`zipfile` and :mod:`tarfile` modules.

 The keyword-only *filter* argument is passed to the underlying unpacking
 function. For zip files, *filter* is not accepted.
-For tar files, it is recommended to set it to ``'data'``,
-unless using features specific to tar and UNIX-like filesystems.
+For tar files, it is recommended to use ``'data'`` (default since Python
+3.14), unless using features specific to tar and UNIX-like filesystems.
 (See :ref:`tarfile-extraction-filter` for details.)
-The ``'data'`` filter will become the default for tar files
-in Python 3.14.

 .. audit-event:: shutil.unpack_archive filename,extract_dir,format shutil.unpack_archive

@@ -721,6 +719,12 @@ provided. They rely on the :mod:`zipfile` and :mod:`tarfile` modules.
 the *extract_dir* argument, e.g. members that have absolute filenames
 starting with "/" or filenames with two dots "..".

+Since Python 3.14, the defaults for both built-in formats (zip and tar
+files) will prevent the most dangerous of such security issues,
+but will not prevent *all* unintended behavior.
+Read the :ref:`tarfile-further-verification`
+section for tar-specific details.
+
 .. versionchanged:: 3.7
 Accepts a :term:`path-like object` for *filename* and *extract_dir*.

Doc/library/tarfile.rst

Lines changed: 47 additions & 28 deletions
@@ -40,9 +40,12 @@ Some facts and figures:
 Archives are extracted using a :ref:`filter <tarfile-extraction-filter>`,
 which makes it possible to either limit surprising/dangerous features,
 or to acknowledge that they are expected and the archive is fully trusted.
-By default, archives are fully trusted, but this default is deprecated
-and slated to change in Python 3.14.

+.. versionchanged:: 3.14
+Set the default extraction filter to :func:`data <data_filter>`,
+which disallows some dangerous features such as links to absolute paths
+or paths outside of the destination. Previously, the filter strategy
+was equivalent to :func:`fully_trusted <fully_trusted_filter>`.

 .. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

@@ -495,18 +498,18 @@ be finalized; only the internally used file object will be closed. See the
 The *filter* argument specifies how ``members`` are modified or rejected
 before extraction.
 See :ref:`tarfile-extraction-filter` for details.
-It is recommended to set this explicitly depending on which *tar* features
-you need to support.
+It is recommended to set this explicitly only if specific *tar* features
+are required, or as ``filter='data'`` to support Python versions with a less
+secure default (3.13 and lower).

 .. warning::

 Never extract archives from untrusted sources without prior inspection.
-It is possible that files are created outside of *path*, e.g. members
-that have absolute filenames starting with ``"/"`` or filenames with two
-dots ``".."``.

-Set ``filter='data'`` to prevent the most dangerous security issues,
-and read the :ref:`tarfile-extraction-filter` section for details.
+Since Python 3.14, the default (:func:`data <data_filter>`) will prevent
+the most dangerous security issues.
+However, it will not prevent *all* unintended or insecure behavior.
+Read the :ref:`tarfile-extraction-filter` section for details.

 .. versionchanged:: 3.5
 Added the *numeric_owner* parameter.
@@ -517,6 +520,9 @@ be finalized; only the internally used file object will be closed. See the
 .. versionchanged:: 3.12
 Added the *filter* parameter.

+.. versionchanged:: 3.14
+The *filter* parameter now defaults to ``'data'``.
+

 .. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False, filter=None)

@@ -536,10 +542,8 @@ be finalized; only the internally used file object will be closed. See the

 .. warning::

-See the warning for :meth:`extractall`.
-
-Set ``filter='data'`` to prevent the most dangerous security issues,
-and read the :ref:`tarfile-extraction-filter` section for details.
+Never extract archives from untrusted sources without prior inspection.
+See the warning for :meth:`extractall` for details.

 .. versionchanged:: 3.2
 Added the *set_attrs* parameter.
@@ -602,14 +606,8 @@ be finalized; only the internally used file object will be closed. See the
 String names are not allowed for this attribute, unlike the *filter*
 argument to :meth:`~TarFile.extract`.

-If ``extraction_filter`` is ``None`` (the default),
-calling an extraction method without a *filter* argument will raise a
-``DeprecationWarning``,
-and fall back to the :func:`fully_trusted <fully_trusted_filter>` filter,
-whose dangerous behavior matches previous versions of Python.
-
-In Python 3.14+, leaving ``extraction_filter=None`` will cause
-extraction methods to use the :func:`data <data_filter>` filter by default.
+If ``extraction_filter`` is ``None`` (the default), extraction methods
+will use the :func:`data <data_filter>` filter by default.

 The attribute may be set on instances or overridden in subclasses.
 It also is possible to set it on the ``TarFile`` class itself to set a
@@ -619,6 +617,14 @@ be finalized; only the internally used file object will be closed. See the
 To set a global default this way, a filter function needs to be wrapped in
 :func:`staticmethod()` to prevent injection of a ``self`` argument.

+.. versionchanged:: 3.14
+
+The default filter is set to :func:`data <data_filter>`,
+which disallows some dangerous features such as links to absolute paths
+or paths outside of the destination.
+Previously, the default was equivalent to
+:func:`fully_trusted <fully_trusted_filter>`.
+
 .. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)

 Add the file *name* to the archive. *name* may be any type of file
@@ -969,6 +975,12 @@ In most cases, the full functionality is not needed.
 Therefore, *tarfile* supports extraction filters: a mechanism to limit
 functionality, and thus mitigate some of the security issues.

+.. warning::
+
+None of the available filters blocks *all* dangerous archive features.
+Never extract archives from untrusted sources without prior inspection.
+See also :ref:`tarfile-further-verification`.
+
 .. seealso::

 :pep:`706`
@@ -992,12 +1004,13 @@ can be:

 * ``None`` (default): Use :attr:`TarFile.extraction_filter`.

-If that is also ``None`` (the default), raise a ``DeprecationWarning``,
-and fall back to the ``'fully_trusted'`` filter, whose dangerous behavior
-matches previous versions of Python.
+If that is also ``None`` (the default), the ``'data'`` filter will be used.
+
+.. versionchanged:: 3.14

-In Python 3.14, the ``'data'`` filter will become the default instead.
-It's possible to switch earlier; see :attr:`TarFile.extraction_filter`.
+The default filter is set to :func:`data <data_filter>`.
+Previously, the default was equivalent to
+:func:`fully_trusted <fully_trusted_filter>`.

 * A callable which will be called for each extracted member with a
 :ref:`TarInfo <tarinfo-objects>` describing the member and the destination
@@ -1080,6 +1093,9 @@ reused in custom filters:

 Return the modified ``TarInfo`` member.

+Note that this filter does not block *all* dangerous archive features.
+See :ref:`tarfile-further-verification` for details.
+

 .. _tarfile-extraction-refuse:

@@ -1093,6 +1109,8 @@ With ``errorlevel=0`` the error will be logged and the member will be skipped,
 but extraction will continue.


+.. _tarfile-further-verification:
+
 Hints for further verification
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -1110,9 +1128,10 @@ Here is an incomplete list of things to consider:
 disk, memory and CPU usage.
 * Check filenames against an allow-list of characters
 (to filter out control characters, confusables, foreign path separators,
-etc.).
+and so on).
 * Check that filenames have expected extensions (discouraging files that
-execute when you “click on them”, or extension-less files like Windows special device names).
+execute when you “click on them”, or extension-less files like Windows
+special device names).
 * Limit the number of extracted files, total size of extracted data,
 filename length (including symlink length), and size of individual files.
 * Check for files that would be shadowed on case-insensitive filesystems.
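A few of the checks in the list above could be sketched as a pre-extraction pass over the archive members. This is a rough, incomplete illustration: the function names, the limits, and the extension allow-list are all assumptions chosen for the example, and it covers only member count, name length, extensions, total size, and case-insensitive shadowing.

```python
import io
import tarfile


def archive_with(names, payload=b"x"):
    """Build an in-memory tar archive containing one small file per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def find_violation(fileobj, max_members=1000, max_total_size=50 * 1024 * 1024,
                   max_name_length=255, allowed_extensions=(".txt", ".csv")):
    """Return a description of the first violated limit, or None."""
    total_size = 0
    folded_names = set()
    with tarfile.open(fileobj=fileobj) as tar:
        for count, member in enumerate(tar, start=1):
            if count > max_members:
                return f"more than {max_members} members"
            if len(member.name) > max_name_length:
                return f"name too long: {member.name!r}"
            if member.isfile():
                if not member.name.lower().endswith(allowed_extensions):
                    return f"unexpected extension: {member.name!r}"
                total_size += member.size
                if total_size > max_total_size:
                    return f"total size exceeds {max_total_size} bytes"
            # Names that differ only by case collide on some filesystems.
            folded = member.name.casefold()
            if folded in folded_names:
                return f"shadowed on case-insensitive filesystems: {member.name!r}"
            folded_names.add(folded)
    return None


clean = find_violation(archive_with(["a.txt", "b.csv"]))
shadowed = find_violation(archive_with(["a.txt", "A.TXT"]))
```

A check like this complements the extraction filter rather than replacing it; the archive would still be extracted with a filter afterwards.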

Lib/tarfile.py

Lines changed: 1 addition & 7 deletions
@@ -2248,13 +2248,7 @@ def _get_filter_function(self, filter):
         if filter is None:
             filter = self.extraction_filter
             if filter is None:
-                import warnings
-                warnings.warn(
-                    'Python 3.14 will, by default, filter extracted tar '
-                    + 'archives and reject files or modify their metadata. '
-                    + 'Use the filter argument to control this behavior.',
-                    DeprecationWarning, stacklevel=3)
-                return fully_trusted_filter
+                return data_filter
             if isinstance(filter, str):
                 raise TypeError(
                     'String names are not supported for '

Lib/test/test_shutil.py

Lines changed: 0 additions & 3 deletions
@@ -2145,9 +2145,6 @@ def check_unpack_archive_with_converter(self, format, converter, **kwargs):
     def check_unpack_tarball(self, format):
         self.check_unpack_archive(format, filter='fully_trusted')
         self.check_unpack_archive(format, filter='data')
-        with warnings_helper.check_warnings(
-                ('Python 3.14', DeprecationWarning)):
-            self.check_unpack_archive(format)

     def test_unpack_archive_tar(self):
         self.check_unpack_tarball('tar')

Lib/test/test_tarfile.py

Lines changed: 18 additions & 34 deletions
@@ -722,6 +722,24 @@ def format_mtime(mtime):
             tar.close()
             os_helper.rmtree(DIR)

+    @staticmethod
+    def test_extractall_default_filter():
+        # Test that the default filter is now "data", and the other filter types are not used.
+        DIR = pathlib.Path(TEMPDIR) / "extractall_default_filter"
+        with (
+            os_helper.temp_dir(DIR),
+            tarfile.open(tarname, encoding="iso8859-1") as tar,
+            unittest.mock.patch("tarfile.data_filter", wraps=tarfile.data_filter) as mock_data_filter,
+            unittest.mock.patch("tarfile.tar_filter", wraps=tarfile.tar_filter) as mock_tar_filter,
+            unittest.mock.patch("tarfile.fully_trusted_filter", wraps=tarfile.fully_trusted_filter) as mock_ft_filter
+        ):
+            directories = [t for t in tar if t.isdir()]
+            tar.extractall(DIR, directories)
+
+        mock_data_filter.assert_called()
+        mock_ft_filter.assert_not_called()
+        mock_tar_filter.assert_not_called()
+
     @os_helper.skip_unless_working_chmod
     def test_extract_directory(self):
         dirtype = "ustar/dirtype"
@@ -738,31 +756,6 @@ def test_extract_directory(self):
         finally:
             os_helper.rmtree(DIR)

-    def test_deprecation_if_no_filter_passed_to_extractall(self):
-        DIR = pathlib.Path(TEMPDIR) / "extractall"
-        with (
-            os_helper.temp_dir(DIR),
-            tarfile.open(tarname, encoding="iso8859-1") as tar
-        ):
-            directories = [t for t in tar if t.isdir()]
-            with self.assertWarnsRegex(DeprecationWarning, "Use the filter argument") as cm:
-                tar.extractall(DIR, directories)
-            # check that the stacklevel of the deprecation warning is correct:
-            self.assertEqual(cm.filename, __file__)
-
-    def test_deprecation_if_no_filter_passed_to_extract(self):
-        dirtype = "ustar/dirtype"
-        DIR = pathlib.Path(TEMPDIR) / "extractall"
-        with (
-            os_helper.temp_dir(DIR),
-            tarfile.open(tarname, encoding="iso8859-1") as tar
-        ):
-            tarinfo = tar.getmember(dirtype)
-            with self.assertWarnsRegex(DeprecationWarning, "Use the filter argument") as cm:
-                tar.extract(tarinfo, path=DIR)
-            # check that the stacklevel of the deprecation warning is correct:
-            self.assertEqual(cm.filename, __file__)
-
     def test_extractall_pathlike_dir(self):
         DIR = os.path.join(TEMPDIR, "extractall")
         with os_helper.temp_dir(DIR), \
@@ -4011,15 +4004,6 @@ def test_data_filter(self):
                 self.assertIs(filtered.name, tarinfo.name)
                 self.assertIs(filtered.type, tarinfo.type)

-    def test_default_filter_warns(self):
-        """Ensure the default filter warns"""
-        with ArchiveMaker() as arc:
-            arc.add('foo')
-        with warnings_helper.check_warnings(
-                ('Python 3.14', DeprecationWarning)):
-            with self.check_context(arc.open(), None):
-                self.expect_file('foo')
-
     def test_change_default_filter_on_instance(self):
         tar = tarfile.TarFile(tarname, 'r')
         def strict_filter(tarinfo, path):
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+The default extraction filter for the :mod:`tarfile` module is now
+set to :func:`'data' <tarfile.data_filter>`.
