MemoryError while retrieving large cursors #142

Closed
wants to merge 1 commit into from

@TomasB TomasB commented Sep 22, 2012

This is the issue we are seeing on Windows machines:
Python version v2.7.2
Mongo v2.0.7 (running on Ubuntu)

>>> ================================ RESTART ================================
>>> from pymongo import Connection
>>> c = Connection("mongodb://user:pswd@mongo/admin")
>>> c_users  = c["data"]["user"]
>>> users = c_users.find({}, ["_id"])
>>> users.count()
193845
>>> for u in users:
    s = u["_id"]



Traceback (most recent call last):
  File "<pyshell#24>", line 1, in <module>
    for u in users:
  File "build\bdist.win32\egg\pymongo\cursor.py", line 778, in next
    if len(self.__data) or self._refresh():
  File "build\bdist.win32\egg\pymongo\cursor.py", line 742, in _refresh
    limit, self.__id))
  File "build\bdist.win32\egg\pymongo\cursor.py", line 666, in __send_message
    **kwargs)
  File "build\bdist.win32\egg\pymongo\connection.py", line 907, in _send_message_with_response
    return self.__send_and_receive(message, sock_info)
  File "build\bdist.win32\egg\pymongo\connection.py", line 885, in __send_and_receive
    return self.__receive_message_on_socket(1, request_id, sock_info)
  File "build\bdist.win32\egg\pymongo\connection.py", line 877, in __receive_message_on_socket
    return self.__receive_data_on_socket(length - 16, sock_info)
  File "build\bdist.win32\egg\pymongo\connection.py", line 858, in __receive_data_on_socket
    chunk = sock_info.sock.recv(length)
MemoryError
>>> 

At this point I am not even convinced that this is pymongo's issue rather than Python's, but the change seems to fix it without any bad side effects.
Please do not hesitate to ask for more info; I'm quite eager to have this resolved in the driver's mainline code.

Stack:
  for user in all_users:
File "C:\Python27\lib\site-packages\pymongo\cursor.py", line 778, in next
  if len(self.__data) or self._refresh():
File "C:\Python27\lib\site-packages\pymongo\cursor.py", line 742, in _refresh
  limit, self.__id))
File "C:\Python27\lib\site-packages\pymongo\cursor.py", line 666, in __send_message
  **kwargs)
File "C:\Python27\lib\site-packages\pymongo\connection.py", line 907, in _send_message_with_response
  return self.__send_and_receive(message, sock_info)
File "C:\Python27\lib\site-packages\pymongo\connection.py", line 885, in __send_and_receive
  return self.__receive_message_on_socket(1, request_id, sock_info)
File "C:\Python27\lib\site-packages\pymongo\connection.py", line 877, in __receive_message_on_socket
  return self.__receive_data_on_socket(length - 16, sock_info)
File "C:\Python27\lib\site-packages\pymongo\connection.py", line 858, in __receive_data_on_socket
  chunk = sock_info.sock.recv(length)
MemoryError
@TomasB
Author

TomasB commented Sep 22, 2012

Just tried running the same code on Ubuntu with Python 2.6.5; the exception doesn't happen.
Updated Python on Windows to v2.7.3; it still happens.

@rozza
Member

rozza commented Sep 24, 2012

Related: #132

@rozza
Member

rozza commented Sep 24, 2012

@TomasB just to clarify: you get this memory error on Windows Python 2.7.2 and Ubuntu Python 2.7.3?
Also, with the same dataset and test, Ubuntu Python 2.6.5 does not cause a MemoryError?

@TomasB
Author

TomasB commented Sep 24, 2012

@rozza, the tests I ran:
windows, python v2.7.2 - fail
windows, python v2.7.3 - fail
ubuntu, python v2.6.5 - ok

In all these cases I was connecting to the same db, which is running on ubuntu mongo v2.0.7.

@TomasB
Author

TomasB commented Sep 24, 2012

When I loop through the whole collection, it fails on the attempt to retrieve the 101st document.
When I limit the cursor to 1000 documents, it retrieves beyond the 101st document.

>>> from pymongo import Connection
>>> c = Connection("mongodb://user:pswd@mongo/admin")
>>> c_users  = c["data"]["user"]
>>> users = c_users.find({}, ["_id"]).limit(1000)
>>> users.count()
193845
>>> cnt = 0
>>> for u in users:
    cnt += 1
    try:
        s = u["_id"]
    except:
        print "Done so far: %s" % cnt
        raise


>>> print cnt
1000
>>> ================================ RESTART ================================
>>> from pymongo import Connection
>>> c = Connection("mongodb://user:pswd@mongo/admin")
>>> c_users  = c["data"]["user"]
>>> users = c_users.find({}, ["_id"])
>>> cnt = 0
>>> for u in users:
    cnt += 1
    try:
        s = u["_id"]
    except:
        print "Done so far: %s" % cnt
        raise



Traceback (most recent call last):
  File "<pyshell#29>", line 1, in <module>
    for u in users:
  File "build\bdist.win32\egg\pymongo\cursor.py", line 778, in next
    if len(self.__data) or self._refresh():
  File "build\bdist.win32\egg\pymongo\cursor.py", line 742, in _refresh
    limit, self.__id))
  File "build\bdist.win32\egg\pymongo\cursor.py", line 666, in __send_message
    **kwargs)
  File "build\bdist.win32\egg\pymongo\connection.py", line 907, in _send_message_with_response
    return self.__send_and_receive(message, sock_info)
  File "build\bdist.win32\egg\pymongo\connection.py", line 885, in __send_and_receive
    return self.__receive_message_on_socket(1, request_id, sock_info)
  File "build\bdist.win32\egg\pymongo\connection.py", line 877, in __receive_message_on_socket
    return self.__receive_data_on_socket(length - 16, sock_info)
  File "build\bdist.win32\egg\pymongo\connection.py", line 858, in __receive_data_on_socket
    chunk = sock_info.sock.recv(length)
MemoryError
>>> print cnt
101

@behackett
Member

To get the results you are expecting for count you should use Cursor.count and pass with_limit_and_skip=True. See the docs here:

http://api.mongodb.org/python/current/api/pymongo/cursor.html#pymongo.cursor.Cursor.count

We're looking at the memory issues.
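
For reference, here is a minimal sketch of the difference described above. It assumes a PyMongo version that still provides `Cursor.count()` (it was removed in PyMongo 4); the helper name `count_after_limit` is hypothetical and only for illustration.

```python
def count_after_limit(collection, limit):
    """Return (total_matching, returned_by_cursor) for a limited query.

    Assumes a PyMongo release that still has Cursor.count().
    """
    cursor = collection.find({}, ["_id"]).limit(limit)
    # Default behaviour: limit() and skip() are ignored, so this is the
    # number of documents matching the query in the whole collection.
    total = cursor.count()
    # With with_limit_and_skip=True the limit is taken into account.
    limited = cursor.count(with_limit_and_skip=True)
    return total, limited
```

Against the session above, `count_after_limit(c["data"]["user"], 1000)` would be expected to report the full collection size first and the limited count second.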

@TomasB
Author

TomasB commented Sep 24, 2012

@behackett thanks for the tip

@ajdavis
Member

ajdavis commented Sep 24, 2012

@TomasB can you open a bug report at jira.mongodb.org in the "Python Driver" project, please? We need to work with you a little more to understand what the underlying problem is.

@TomasB
Author

TomasB commented Sep 25, 2012

Sure, I will, but the earliest I can do it is this coming Wednesday.

@ajdavis
Member

ajdavis commented Sep 25, 2012

@TomasB what version of Windows are you using? Windows 7, XP, 2000? How much RAM is installed on the Windows machine? Is it a 32- or 64-bit machine?

@TomasB
Author

TomasB commented Sep 25, 2012

OS Name: Microsoft Windows 7 Professional 32 bit
Version: 6.1.7601 Service Pack 1 Build 7601
Other OS Description: Not Available
OS Manufacturer: Microsoft Corporation
System Manufacturer: Hewlett-Packard
System Model: HP Compaq 6005 Pro MT PC
System Type: X86-based PC
Processor: AMD Athlon(tm) II X2 B24 Processor, 3000 Mhz, 2 Core(s), 2 Logical Processor(s)
BIOS Version/Date: Hewlett-Packard 786G6 v01.11, 2010-08-04
SMBIOS Version: 2.6
Windows Directory: C:\Windows
System Directory: C:\Windows\system32
Boot Device: \Device\HarddiskVolume2
Locale: United States
Hardware Abstraction Layer: Version = "6.1.7601.17514"
User Name: Not Available
Time Zone: Pacific Daylight Time
Installed Physical Memory (RAM): 4.00 GB
Total Physical Memory: 3.00 GB
Available Physical Memory: 868 MB
Total Virtual Memory: 6.00 GB
Available Virtual Memory: 2.24 GB
Page File Space: 3.00 GB
Page File: C:\pagefile.sys

@TomasB
Author

TomasB commented Oct 1, 2012

@TomasB
Author

TomasB commented Oct 3, 2012

Hmm, it looks like io.BytesIO has some room for improvement: http://mail.python.org/pipermail/python-dev/2012-July/120983.html
Since you were trying to improve performance by using a list and EMPTY.join() instead of string concatenation, you may NOT want to use this implementation (though it should still perform better than the initial string-concatenation implementation).
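
To make the list-and-join approach concrete, here is a minimal sketch of a receive loop that never asks `recv()` for more than a bounded number of bytes at once and joins the collected chunks at the end. It is not PyMongo's code: the names `receive_exactly` and `max_chunk` are hypothetical, the 64 KiB cap is an arbitrary choice, and the idea that capping the request size avoids the `MemoryError` in the traceback is an assumption that this thread does not confirm. It is written for Python 3, whereas the sessions above use Python 2.

```python
import socket


def receive_exactly(sock, length, max_chunk=64 * 1024):
    """Read exactly `length` bytes from `sock`.

    Each recv() call requests at most `max_chunk` bytes, so no single
    call has to allocate a buffer for the whole message.
    """
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = sock.recv(min(remaining, max_chunk))
        if not chunk:
            # An empty result means the peer closed the connection.
            raise ConnectionError(
                "connection closed with %d bytes left to read" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    # One join at the end instead of repeated string concatenation.
    return b"".join(chunks)
```

Collecting chunks in a list keeps the work per iteration small, while the single join at the end copies the data once.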

@behackett
Copy link
Member

io.BytesIO wouldn't have worked either way since we support back to CPython 2.4 (and Jython just recently added an io module). I think we are going to use PyMongo's implementation from before 2.2, as I mentioned in PYTHON-413.

@behackett behackett closed this Oct 3, 2012