Avoid MemoryError on large queries #132
Conversation
What Python interpreter and version are you using?
Python 2.6.2 and Python 2.6.7 with the CPython interpreter.
Can you open a ticket at jira.mongodb.org under the "python driver" project with reproduction steps and possibly an example document? I want to do a little more research on this issue before merging your pull request.
Sure, I'll do it once I'm back in the office tomorrow.
Thanks a lot. I'm going to do some more research in the meantime.
Hi Chris, are you using a large batch_size setting? By default MongoDB will only return 4 MB or 101 documents (whichever comes first) in a single batch. I'm not sure I understand how PyMongo could use TBs of memory in a query just by doing "".join(chunks) for each batch.
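For context, a minimal sketch of how a batch size would be set with the PyMongo 2.x API; the host, database, and collection names here are placeholders, not details from this thread:

```python
from pymongo import Connection  # pymongo 2.x API used in this thread

# Placeholder host/port and names for illustration only.
connection = Connection("localhost", 27017)
collection = connection.test_db.test_collection

# Ask the server for at most 500 documents per batch; with the default
# settings a single reply is capped at 4 MB or 101 documents instead.
cursor = collection.find().batch_size(500)
for doc in cursor:
    pass  # process one document at a time, batch by batch
```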
@tompko I can't reproduce this behavior in my own tests. What OS are you on? Is your application multi-threaded, and if so, roughly how many concurrent threads are running? At the bottom of Connection.__receive_data_on_socket, before the return statement, could you add ... and let me know what range the lengths fall in, in the scenario that caused the out-of-memory error?
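A rough sketch of that kind of instrumentation, assuming a simplified chunk-list read loop like pymongo 2.2's; the function name and message text are illustrative, not the exact snippet being requested:

```python
import sys

def receive_with_length_report(sock, length):
    # Simplified stand-in for Connection.__receive_data_on_socket (pymongo 2.x).
    chunks = []
    received = 0
    while received < length:
        chunk = sock.recv(length - received)
        if not chunk:
            raise IOError("connection closed")
        chunks.append(chunk)
        received += len(chunk)
    # The instrumentation asked for: report how large each server reply is.
    sys.stderr.write("received %d bytes in %d chunks\n" % (received, len(chunks)))
    return b"".join(chunks)
```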
We haven't been able to reproduce this behavior. If you have a test case, please open a ticket under the python project at jira.mongodb.org.
Using pymongo 2.3, I had the same issue, and this patch seems to fix it. I added reporting of the length after the exception is caught; the following fixes the issue:
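A hedged guess at what such a diagnostic could look like (hypothetical helper, not the author's actual patch):

```python
import sys

def join_chunks_reporting_length(chunks):
    # Join the received chunks; if the allocation fails, report how many
    # chunks were buffered and how many bytes they hold before re-raising.
    try:
        return b"".join(chunks)
    except MemoryError:
        total = sum(len(c) for c in chunks)
        sys.stderr.write("MemoryError joining %d chunks, %d bytes total\n"
                         % (len(chunks), total))
        raise
```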
I still haven't been able to reproduce this but it's been reported a few times. This change only reverts back to our pre-2.2 behavior so I'm going to merge it. Thanks for the patch and your patience with this issue.
When receiving data from large queries we were running into a MemoryError. From investigating, sock_info.sock.recv returns a buffer of length size, which is then appended to the chunks list. Unfortunately we were only receiving a small number of bytes per iteration, so chunks was filling up with items of size (approximately) length and quickly running out of memory. In total our query looked like it would try to allocate about 4 TB worth of memory.
I've rewritten the function to behave more like a previous version, which fixes the memory issue since the chunk memory is freed once the data has been concatenated to the message.
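A minimal sketch of the approach described, assuming a plain blocking socket; this is illustrative rather than the exact patch:

```python
def receive_data_on_socket(sock, length):
    # Accumulate directly into the message, as the pre-2.2 code did, so each
    # chunk becomes garbage as soon as it has been concatenated instead of
    # staying alive in a list until the very end of the read.
    message = b""
    while len(message) < length:
        chunk = sock.recv(length - len(message))
        if not chunk:
            raise IOError("connection closed")
        message += chunk
    return message
```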