Skip to content

Support Python 3 #263

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Mar 13, 2021
Merged

Support Python 3 #263

merged 22 commits into from
Mar 13, 2021

Conversation

ligurio
Copy link
Member

@ligurio ligurio commented Jan 11, 2021

Uncommitted changes:

Testing status

vshard, Python 2

PR to bump test-run version in vshard - tarantool/vshard#268

Tarantool, Python 3

CI (enabled Python 3 in test-run.py, branch ligurio/python3-support in tarantool): everything is passed except job with FreeBSD, it requires manual update of golden image tarantool/tarantool#5751

Tarantool, Python 2

We want to keep compatibility with Python 2, so I have prepared a special branches in tarantool (ligurio/python3-support-py2) and test-run (ligurio/gh-15-support-python-3-wip-py2) repositories to check that tests passed with Python 2. See CI status in branch ligurio/python3-support-py2 in tarantool).

  • tarantool ligurio/python3-support-py2 contains all commits from ligurio/python3-support except those that force using python3:
    • "Bump test-run (Python 3 support)"
    • "readme: update package names in installation steps"
    • "rpm: use Python 3 in testing on RPM packaging"
    • "travis: switch to Python 3 version"
    • "test: use Python 3 in httpd.py shebang"
  • test-run ligurio/gh-15-support-python-3-wip-py2 contains all commits from ligurio/gh-15-support-python-3-wip-py2 with removed commit that changes shebang to Python 3 in test-run.py.

@Totktonada
Copy link
Member

My thoughts regarding the shebang: packpack/packpack-docker-images#68 (comment).

@ligurio ligurio force-pushed the ligurio/gh-15-support-python-3-wip branch 4 times, most recently from d7adb6e to 8cfb910 Compare January 27, 2021 14:35
@Totktonada Totktonada linked an issue Jan 27, 2021 that may be closed by this pull request
Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see that the Python 3 support is closer to the reality!

I'll look through commits one by one and will place my comments for each commit in it's own GitHub's review. I guess it'll help us to navigate over the discussion (and also will help to deliver first comments earlier).

Don't get me wrong. I will not insist on fixing every line and every comment according to my preferences. No. However I want to share preferences and thoughts during review to learn something new from each other.

01/23 Explicit imports of needed objects

So called 'star import' (the first link) is irrelevant here. import foo is not a star import. 'Import everything from a module' is not the same as 'import a module'.

The second link refers the question, where the accepted answer does not state that this approach 'allows to keep code clean for readers'. The answer author prefers the opposite (at least in the reduced context of their discussion -- regarding sys) and explicitly said that it is the code style question. I would treat it as that there is no general rule of thumb (and brief looking around relevant SO discussions confirms this).

I guess you personally prefer this style and it is okay. Don't hesitate to declare your vision as the reason of the change.

What regarding me... I don't remember why exactly I did this change in the draft. I don't have a personal preference, but it seems that from foo import bar is the preferred way within the test-run's code. Improving consistency looks as enough motivation for me in the scope of this project.

One more note: I tends to mark all commits with Part of #xxx even when they're just side effect of looking around code to fix / implement #xxx (and it is easier to keep it within the patchset). There are two reasons:

  • Git (w/o GitHub) have no any mechanism to mark a consequent set of commits as part of one patch series (I don't count merge commits, because of my preference to keep a history linear).
  • I like to read stories and don't like changes, which appears from nothing. When there is a linked issue, we can guess that an author found a bad code during looking into the issue and decided to fix it. That's the nice storyline.

Too much text, sorry.

Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

02/23 Remove unused variable

I tested it. flake8 complains regarding this line when called from Python 3, but does not complain when called from Python 2. So it is certainly part of #20.

My flake8 versions:

$ flake8 --version  # Python 2, flake8 from pip
3.8.4 (mccabe: 0.6.1, pycodestyle: 2.6.0, pyflakes: 2.2.0) CPython 2.7.18 on
Linux
$ /usr/bin/flake8 --version  # Python 3, flake8 from portage
3.8.4 (mccabe: 0.6.1, pycodestyle: 2.6.0, pyflakes: 2.2.0) CPython 3.9.1 on Linux

LGTM otherwise.

03/23 python3: make print calls compatible with Python 3.x

LGTM.

Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

04/23 python3: fix path import lib/box_connection.py

This change is unclear for me: see comments below for details.

I see that I'm co-author here, so maybe the reason is the content of my experimental patch. But this fact alone does not advocate the change anyhow. Sorry if I unintentionally directed you to a wrong way.

Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

05/23 python3: fix import of StringIO module

I don't get why do you moved one code block (see below).

I left some details regarding all those buffer implementations, but primarily for myself -- to better understand what would be an ideal solution in a common case like our one.

LGTM.

Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

06/23 python3: fix mode format in chmod() call

The change itself is okay, but the explanation is misleading. See below.

Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

07/24 python3: fix import of SimpleQueue

This change looks useless. See details below.

Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

08/24 python3: fix configparser module import

There are several comments regarding this change, see details below.

Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

09/24 python3: replace xrange() with xrange()

The change is okay (the size is small: it is always 3), but there is a typo in the commit title: the last xrange() should be range().

If you want to keep xrange() usage on Python 2, that's easy. I would bother in code that tends to be universal, but here it is optional.

Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10/24 python3: using six to iteritems()

Some code that work on Python 2 and Python 3 was changed. I would not change it if we'll not going to add six-like iteritems() and iterkeys() functions to lib/utils.py (see below).

New six usages were added. I would avoid this: please, consider this as my little wish. Since we can easily eliminate the dependency I hope we'll have less problems with RPM / Deb dependencies in tarantool and a few connectors and modules that also use test-run.

There are few six usages now and everything that we need is just the following code in lib/utils.py:

if sys.version_info[0] == 3:
    string_types = str,
    integer_types = int,
    from shlex import quote as shlex_quote
else:
    string_types = basestring,
    integer_types = (int, long)
    from pipes import quote as shlex_quote

Aside of this we need to change .iteritems() and .iterkeys() to just .items() and .keys(). Or we can copy the corresponding functions from six and use them (if someone really bother about invisible performance aberrations when running test-run on Python 2). In this case the lib/utils.py file will gain 16 SLOC instead of 8 SLOC, that's negligible.

I commented four cases with more details. The rest is the same.

@ligurio ligurio force-pushed the ligurio/gh-15-support-python-3-wip branch 2 times, most recently from 6f00b29 to e20eb77 Compare February 1, 2021 15:36
@ligurio ligurio force-pushed the ligurio/gh-15-support-python-3-wip branch 3 times, most recently from 4dce9be to fdf2764 Compare February 4, 2021 15:01
ligurio and others added 4 commits March 10, 2021 13:14
Unlike Python 2, Python 3 does not treat `import bar` from a module in a
package `foo` as importing of `foo.bar`.

Absolute import that is relative to a test-run root (like `import
lib.utils`) still work, because `sys.path` contains a script directory
(after symlinks resolution) on both Python versions.

https://www.python.org/dev/peps/pep-0328/

Part of #20

Co-authored-by: Alexander Turenko <[email protected]>
What's New In Python 3.0 [1]:

The function attributes named func_X have been renamed to use the __X__ form,
freeing up these names in the function attribute namespace for user-defined
attributes. To wit, func_closure, func_code, func_defaults, func_dict,
func_doc, func_globals, func_name were renamed to __closure__, __code__,
__defaults__, __dict__, __doc__, __globals__, __name__, respectively.

1. https://docs.python.org/3/whatsnew/3.0.html#operators-and-special-methods

Part of #20
Fix Tarantool connection liveness and box-py/call.test.py test.

Part of #20

Co-authored-by: Alexander Turenko <[email protected]>
gevent 1.2 and below does not support Python 3.7 and above, see [1].

1. gevent/gevent#1297

Part of #20
@ligurio ligurio force-pushed the ligurio/gh-15-support-python-3-wip branch 3 times, most recently from eecbd70 to bcba5f7 Compare March 12, 2021 11:50
@Totktonada
Copy link
Member

  1. open result file in binary mode and convert line using byte_to_str()

I see, several result files contains characters that are not in ASCII, but in UTF-8:

  • test/engine/select.result
  • test/sql/collation.result
  • test/sql/icu-upper-lower.result

Don't know how I missed it. Thanks!

@Totktonada
Copy link
Member

Hey, not I know how I missed this :) The fail is due to non UTF-8 locale in our images.

The commit message (PR #39) states the following:

  • at CentOS 8 the localization environment changed to "C";
    (needed by backport repository);

So I have several questions here:

  1. What the exact problem was?
  2. Can it be solved by a temporary changing of the environment during the image build?

@avtikhon Can you clarify?

(Anyway, I don't mind the change in test-run.)

In Python 3 start method 'spawn' in multiprocessing module becomes
default on Mac OS [1].

The 'spawn' method causes re-execution of some code, which is already
executed in the main process. At least it is seen on the lib/__init__.py
code, which removes the 'var' directory. Some other code may have side
effects too, it requires investigation.

The method also requires object serialization that doesn't work when
objects use lambdas, whose for example used in class TestSuite
(lib/test_suite.py).

The latter problem is easy to fix, but the former looks more
fundamental. So we stick to the 'fork' method now.

The new start method is available on Python 3 only:

Traceback (most recent call last):
  File "../../test/test-run.py", line 227, in <module>
    multiprocessing.set_start_method('fork')
AttributeError: 'module' object has no attribute 'set_start_method'

1. https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

Fixes #265
Part of #20

Co-authored-by: Alexander Turenko <[email protected]>
@Totktonada
Copy link
Member

Hey, not I know how I missed this :) The fail is due to non UTF-8 locale in our images.

So I should change the commit message of the 'python3: decouple bytes and strings' commit, because it states the following now:

We can use unicode or binary streams for files (unicode by default[^1]).

[^1]: Technically it depends on a system locale, but, hey, does anyone
      see a non UTF-8 locale after the millennium?

Since we don't lean on this property anymore, we should reflect it here.

Totktonada and others added 5 commits March 12, 2021 22:58
In Python 2 default string type (`<str>`) is a binary string,
non-unicode. We receive data from a socket, from a Popen stream, from a
file as a string and operate on those strings without any conversions.

Python 3 draws a line here. We usually operate on unicode strings in the
code (because this is the default string type, `<str>`), but receive
bytes from a socket and a Popen stream. We can use unicode or binary
streams for files (unicode by default[^1]).

This commit decouples bytes and strings. In most cases it means that we
convert data from bytes to a string after receiving from a socket /
Popen stream and convert it back from a string to bytes before writting
to a socket. Those operations are no-op on Python 2.

So, the general rule for our APIs is to accept and return `<str>`
disregarding Python version. Not `<bytes>`, not `<unicode>`.

The only non-trivial change is around `FilteredStream` and writes into
`sys.stdout`. The `FilteredStream` instance replaces `sys.stdout` during
execution of a test, so it should follow the usual convention and accept
`<str>` in the `write()` method. This is both intuitive and necessary,
because `*.py` tests rely on `print('bla bla')` to write into a result
file.

However the stream should also accept `<bytes>`, because we have a unit
test (`unit/json.test`), which produces a binary output, which does not
conform UTF-8 encoding. The separate `write_bytes()` method was
introduced for this sake. UnittestServer and AppServer write tests
output as bytes directly, TarantoolServer rely on the usual string
output.

We also use bytes directly, when write from one stream to another one:
in `app_server.py` for stderr (writting to a log file), in
`tarantool_server.py` for log destination property (because it is the
destination for Popen).

[^1]: Technically it depends on a Python version and a system locale.
      There are situations, when default text file streams encoding is
      not 'utf-8'. They will be handled in the next commit.

Part of #20

Co-authored-by: Sergey Bronnikov <[email protected]>
The problem: test files, result files contain UTF-8 symbols, which are
out of ASCII range. Just `open(file, 'r').read()` without the
encoding='utf-8' argument fails to decode them, when the default
encoding for file text streams is not 'utf-8'. We meet this situation on
Python 3.6.8 (provided by CentOS 7 and CentOS 8), when the POSIX locale
is set (`LC_ALL=C`).

The solution is described in the code comment: replace `open()` built-in
function and always set `encoding='utf-8'`. That's hacky way, but it
looks better than change every `open()` call across the code and don't
forget to do that in all future code (and keep Python 2 compatibility in
the mind). But maybe we'll revisit the approach later.

There is another way to hack the `open()` behaviour that works for me on
Python 3.6.8:

 | import _bootlocale
 | _bootlocale.getpreferredencoding = (lambda *args: 'utf8')

However it leans on Python internals and looks less reliable than the
implemented one.

Part of #20
When Tarantool unit tests runs using test-run error with output like below
may happen:

[027] small/small_class.test
[027] Test.run() received the following error:
[027] Traceback (most recent call last):
[027]   File "/__w/tarantool/tarantool/test-run/lib/test.py", line 192, in run
[027]     self.execute(server)
[027]   File "/__w/tarantool/tarantool/test-run/lib/unittest_server.py", line 20, in execute
[027]     proc = Popen(execs, cwd=server.vardir, stdout=PIPE, stderr=STDOUT)
[027]   File "/usr/lib/python3.5/subprocess.py", line 676, in __init__
[027]     restore_signals, start_new_session)
[027]   File "/usr/lib/python3.5/subprocess.py", line 1282, in _execute_child
[027]     raise child_exception_type(errno_num, err_msg)
[027] FileNotFoundError: [Errno 2] No such file or directory: '../test/small/small_class.test'
[027] [ fail ]

The root cause of error is changed behaviour of Popen in Python 3 in comparison
to Python 2. One should explicitly set path to a current work dir:

Python 2 [1]: "If cwd is not None, the child’s current directory will be
changed to cwd before it is executed. Note that this directory is not
considered when searching the executable, so you can’t specify the
program’s path relative to cwd."

Python 3 [2]: "If cwd is not None, the function changes the working
directory to cwd before executing the child. <...> In particular, the
function looks for executable (or for the first item in args) relative
to cwd if the executable path is a relative path."

1. https://docs.python.org/2/library/subprocess.html#subprocess.Popen
2. https://docs.python.org/3/library/subprocess.html#subprocess.Popen

Part of #20
To make test box-py/call.test.py compatible with Python 2 and Python 3
it must have the same output when running with both versions of
interpeter. With running on Python 2.7 output of called commands
concluded to round brackets while on Python 3 it is not.

[001] @@ -20,7 +20,7 @@
[001]  - true
[001]  - null
[001]  ...
[001] -call  f1 ()
[001] +('call ', 'f1', ((),))
[001]  - 'testing'
[001]  - 1
[001]  - False

To fix it print should use formatted string in print() call.

Part of #20
Related issues and changes:

- Support Python 3 in Tarantool test suite
  tarantool/tarantool#5538
- Python 3 in tarantool-python
  tarantool/tarantool-python#181
  tarantool/tarantool-python#186
- Use Python 3 in a test infrastructure
  tarantool/tarantool#5652

Closes #20
@Totktonada
Copy link
Member

Hey, not I know how I missed this :) The fail is due to non UTF-8 locale in our images.

So I should change the commit message of the 'python3: decouple bytes and strings' commit, because it states the following now:

We can use unicode or binary streams for files (unicode by default1).

[^1]: Technically it depends on a system locale, but, hey, does anyone
      see a non UTF-8 locale after the millennium?

Since we don't lean on this property anymore, we should reflect it here.

It appears that the problem is not only here. We read test files in the same way. We read result files when a hung worker is detected, which is not quite usual situation, so it'll not be covered by our usual CI run in tarantool.

I decided to work the problem around for all text streams uniformly. So, I reverted the change from the 'python3: decouple bytes and strings' commit and slightly changed its wording:

I added one more commit to handle the problem. It is named 'python3: set text file streams encoding to utf-8', I'll push it here soon.

Footnotes

  1. Technically it depends on a Python version and a system locale.
    There are situations, when default text file streams encoding is
    not 'utf-8'. They will be handled in the next commit.

@Totktonada Totktonada force-pushed the ligurio/gh-15-support-python-3-wip branch from bcba5f7 to e93d067 Compare March 12, 2021 22:36
Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks okay for me now.

I did some testing on Python 2 and Python 3, but not on full set of our platforms. I see that you tested the patchset thoroughly, so I don't worry.

I'll think about better order of pushing relevant test-run and tarantool patches and will proceed.

@Totktonada Totktonada merged commit 5baa4f0 into master Mar 13, 2021
@Totktonada Totktonada deleted the ligurio/gh-15-support-python-3-wip branch March 13, 2021 16:18
@Totktonada
Copy link
Member

Updated the test-run submodule in tarantool in 2.8.0-127-gf203bf33a (part of PR #5751).

Backports to 2.7, 2.6 and 1.10 are in progress.

@Totktonada
Copy link
Member

2.7 in 2.7.1-116-ga71248988.
2.6 in 2.6.2-113-g7421d0f43.

@Totktonada
Copy link
Member

1.10 in 1.10.9-75-g949e86b7e.

Totktonada added a commit that referenced this pull request Apr 2, 2021
Less dependencies we have, less effort we need to keep RPM / Deb package
dependencies and building intructions for various OSes up to date.

It was mostly removed in [1]: this commit removed the last occurence and
drops the dependency.

[1]: #263
Totktonada added a commit that referenced this pull request Apr 27, 2021
Less dependencies we have, less effort we need to keep RPM / Deb package
dependencies and building intructions for various OSes up to date.

It was mostly removed in [1]: this commit removed the last occurence and
drops the dependency.

[1]: #263
Totktonada added a commit that referenced this pull request Apr 27, 2021
Less dependencies we have, less effort we need to keep RPM / Deb package
dependencies and building intructions for various OSes up to date.

It was mostly removed in [1]: this commit removed the last occurence and
drops the dependency.

[1]: #263
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[10sp] Make test (and other python code) Python3 compatible
2 participants