-
Notifications
You must be signed in to change notification settings - Fork 66
Python 3 compatibility #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This fixes most of the import errors that came from using the implicit relative imports that Python 2 supports. This also fixes the use of `xrange`, which has replaced `range` in Python 3. The same has happened for `izip`, which is also being aliased. The octal number syntax changed in Python 3, so we are now converting from strings using the `int` built-in function, which will produce the same output across both versions of Python.
This will be switched back when the pull request for Python 3 support has been merged into the central async repository.
If it works in Python 3.3, it should also work in Python 3.4. Considering it is the latest stable release, gitdb should be tested against it.
In Python 3, the default string type is now the Python 2 unicode strings. The unicode strings cannot be converted to a byte stream, so we have to convert it before writing to the streams.
This uses memoryview by default, which is supported in Python 3 and Python 2.7, but not Python 2.6, and falls back to the old `buffer` type in Python 2.6 and when the memoryview does not support the type, such as when mmap instaces are passed in.
This adds a `byte_ord` version of `ord` which will let `bytes` safely pass through in Python 3. `cmp` was also swapped out as it has been dropped in Python 3.
This changes the internals to use BytesIO over StringIO, which fixed a few of the failing tests in Python 3. We are only importing from `io` now, instead of the entire chain, as this is available in Python 2.6+.
There were a few things which were being reused consistently for compatibility purposes, such as the `buffer`/`memoryview` functions as well as the `izip` method which needed to be aliased for Python 3. The `buffer` function was taken from `smmap` [1] and reworked slightly to handle the optional third parameter. This also adds a compatibility file dedicated entirely to encoding issues, which seem to be the biggest problem. The main functions were taken in part from the Django project [2] and rewritten slightly because our needs are a bit more narrow. A constants file has been added to consistently handle the constants which are required for the gitdb project in the core and the tests. This is part of a greater plan to reorganize the `util.py` file included in this project. This points the async extension back at the original repository and points it to the latest commit. [1]: https://github.com/Byron/smmap/blob/1af4b42a2354acbb53c7956d647655922658fd80/smmap/util.py#L20-L26 [2]: https://github.com/django/django/blob/b8d255071ead897cf68120cd2fae7c91326ca2cc/django/utils/encoding.py
Bytes should always be returned from the streams, so the tests should be checking against byte strings instead of text strings. This also fixes the `sha_iter` as it relied on the Python 2 `iterkeys` which has been renamed to `keys` in Python 3.
For the most part, the most difficult part of getting this to work with Python 3 has been encoding issues. Previously, gitdb passed data around internally switching between text and bytes at random and most of the tests use
Most of the remaining failing tests are encoding issues which have been difficult to trace for some reason. A general code review (from anyone) would be appreciated so we can iron out issues early. |
Hi Kevin, Thanks for all the hard work - I will try to answer your questions based on my best judgement, which clearly won't be the ultimate truth that should be discussed if you see any problems.
From my experience, when a library has been made 2/3 compatible, python 2 clients will always have trouble of some kind. For that reason, I would prefer clear assertions to indicate the problem right where it happens, instead of either waiting until things explode or trying to auto-encode/decode and fail to guess the right encoding. |
This makes it easier to deal with things internally as now everything is passed as bytes.
All your changes in this pull request have been merged into the py2n3 branch. |
The good news is that at least on OSX, there is just one test failing consistently across interpreters. On linux, as run by travis ci, there are two additional issues that I should be able to debug with a virtual machine locally. |
This is a WIP pull request to get gitdb working under Python 3. This depends on my pull request in async in order for the dependencies to be compatible.