Skip to content

Change exception wrapping to use repr instead of str #419

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

ionelmc
Copy link

@ionelmc ionelmc commented Jun 3, 2019

I have an app that makes heavy use of pymongo. However once in a while I get a "InvalidBSON('',)" exception. I haven't been able to track the original exception because pymongo discards useful information when wrapping exceptions.

@ionelmc
Copy link
Author

ionelmc commented Jun 3, 2019

Huh ... so "codec can't decode byte 0xff" will turn into the uglier "UnicodeDecodeError('utf8', '\\xff\\x00', 0, 1, 'invalid start byte')" - maybe just special-case UnicodeDecodeError for the old way?

@ShaneHarvey
Copy link
Member

ShaneHarvey commented Jun 4, 2019

Hi @ionelmc, It would be interesting to add the repr information in python 2's exception. Perhaps we could add both the str() and repr()? In Python 3 I think we should look into relying on the built-in exception __context__ instead of changing the traceback: https://docs.python.org/3/reference/simple_stmts.html?highlight=__context__#the-raise-statement

Are you able to diagnose the cryptic InvalidBSON error with your change? You'll probably also want to change the pure Python version of the code too: https://github.com/mongodb/mongo-python-driver/blob/3.8.0/bson/__init__.py#L443-L446
(see the str(value) in https://github.com/mongodb/mongo-python-driver/blob/3.8.0/bson/py3compat.py).

@ionelmc
Copy link
Author

ionelmc commented Jun 5, 2019

Sounds like good compromise, repr on 2, str+context on 3.

@ionelmc
Copy link
Author

ionelmc commented Jun 5, 2019

So I looked a bit at the code - it's a bit annoying to change, I don't get why it was written like that. How exactly does calling _error clears the error state?

@ShaneHarvey
Copy link
Member

ShaneHarvey commented Jun 5, 2019

@ionelmc Exception handling code is not pretty in C land :). I suggest holding off on making the C changes until we've found a suitable API in pure python.

I talked a bit about this problem with my co-worker @prashantmital and we think that it may make more sense to add the original exception as a property of the re-raised InvalidBSON exception, something like this:

class InvalidBSON(BSONError):
    """Raised when trying to create a BSON object from invalid data.
    """
    def __init__(self, msg='', cause=None):
        super(InvalidBSON, self).__init__(msg)
        self._msg = msg
        self._cause = cause

    @property
    def cause(self):
        """The exception which caused this InvalidBSON error or None.
        """
        return self._cause

We would provide the cause exception whenever we catch and re-raise another, eg in _bson_to_dict and decode_all. Applications can then print out or log the error however they like:

import bson
from bson.errors import InvalidBSON

try:
    bson.BSON('\x0e\x00\x00\x00\x10_\xFFd\x00\x01\x00\x00\x00\x00').decode()
except InvalidBSON as exc:
    # InvalidBSON("'utf8' codec can't decode byte 0xff in position 1: invalid start byte",)
    print(repr(exc))
    # UnicodeDecodeError('utf8', '_\xffd', 1, 2, 'invalid start byte')
    print(repr(exc.cause))

What do you think about this approach?

I'd still like to get to the bottom the cryptic "InvalidBSON('',)" errors you're experiencing.

@behackett
Copy link
Member

behackett commented Jun 5, 2019

This doesn't seem to be a problem in pure Python 3 since we have exception chaining:

$ python
Python 3.6.8 (default, Mar  5 2019, 08:24:34) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bson
>>> bson.BSON('\x0e\x00\x00\x00\x10_\xFFd\x00\x01\x00\x00\x00\x00').decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding
>>> bson.BSON(b'\x0e\x00\x00\x00\x10_\xFFd\x00\x01\x00\x00\x00\x00').decode()
Traceback (most recent call last):
  File "/home/sigstop/work/mongo-python-driver/bson/__init__.py", line 440, in _bson_to_dict
    return _elements_to_dict(data, 4, end, opts)
  File "/home/sigstop/work/mongo-python-driver/bson/__init__.py", line 427, in _elements_to_dict
    key, value, position = _element_to_dict(data, position, obj_end, opts)
  File "/home/sigstop/work/mongo-python-driver/bson/__init__.py", line 403, in _element_to_dict
    element_name, position = _get_c_string(data, position, opts)
  File "/home/sigstop/work/mongo-python-driver/bson/__init__.py", line 164, in _get_c_string
    opts.unicode_decode_error_handler, True)[0], end + 1
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sigstop/work/mongo-python-driver/bson/__init__.py", line 1164, in decode
    return _bson_to_dict(self, codec_options)
  File "/home/sigstop/work/mongo-python-driver/bson/__init__.py", line 446, in _bson_to_dict
    reraise(InvalidBSON, exc_value, exc_tb)
  File "/home/sigstop/work/mongo-python-driver/bson/py3compat.py", line 53, in reraise
    raise exctype(str(value)).with_traceback(trace)
  File "/home/sigstop/work/mongo-python-driver/bson/__init__.py", line 440, in _bson_to_dict
    return _elements_to_dict(data, 4, end, opts)
  File "/home/sigstop/work/mongo-python-driver/bson/__init__.py", line 427, in _elements_to_dict
    key, value, position = _element_to_dict(data, position, obj_end, opts)
  File "/home/sigstop/work/mongo-python-driver/bson/__init__.py", line 403, in _element_to_dict
    element_name, position = _get_c_string(data, position, opts)
  File "/home/sigstop/work/mongo-python-driver/bson/__init__.py", line 164, in _get_c_string
    opts.unicode_decode_error_handler, True)[0], end + 1
bson.errors.InvalidBSON: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte
>>> 

It might be interesting to fix up our C extensions to also do exception chaining.
https://www.python.org/dev/peps/pep-3134/#c-api
https://docs.python.org/3/c-api/exceptions.html#c.PyException_SetCause

@ionelmc would that solve your problem?

@ionelmc
Copy link
Author

ionelmc commented Jun 12, 2019

This project doesn't use python3 yet. I have uncovered the real cause now, it's a MemoryError (which unsurprisingly will return "" for __str__).

Considering that I think the best way would be avoiding rewrapping when MemoryError is seen (in addition to using repr instead of str on py2).

Can you explain why calling _error clears the error state?

@ionelmc
Copy link
Author

ionelmc commented Jun 12, 2019

I suspect that _error doesn't actually clear the error state, but might clear it (because it does an import and that can trigger lots of things).

Anyway ... it turns out this get error object and import on demand thing was added in b738a6f - I don't consider this a good change.

@behackett
Copy link
Member

Importing objects on demand like that is necessary to deal with Python subinterpreter issues that come up with packages like mod_wsgi. For details (and related code) see 2483775.

@behackett
Copy link
Member

behackett commented Jun 12, 2019

A simple change to resolve this might be to check if the original exception provides a non-empty error string and fall back to the original exception name as the error string in that case.

@behackett
Copy link
Member

@ionelmc what do you think of my last proposal? If it sounds reasonable would you be willing to update this PR?

@ionelmc
Copy link
Author

ionelmc commented Oct 5, 2019

I don't think looking inside the exception is a good idea. It would open another can of worms.

@ShaneHarvey
Copy link
Member

(Actual) tracking ticket: PYTHON-2047

@pooooodles pooooodles added the tracked-in-jira Ticket filed in Mongo's Jira system label Nov 20, 2020
@ShaneHarvey
Copy link
Member

ShaneHarvey commented Jan 15, 2021

Thanks for bringing this issue to our attention! We're going to go with a different approach to fixing this problem as described in: https://jira.mongodb.org/browse/PYTHON-2047

Instead of always calling repr(), we're going to fallback to repr() only when str() results in an empty string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tracked-in-jira Ticket filed in Mongo's Jira system
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants