Skip to content

memory leak in Tree.join() #289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
hvnsweeting opened this issue May 29, 2015 · 4 comments
Closed

memory leak in Tree.join() #289

hvnsweeting opened this issue May 29, 2015 · 4 comments

Comments

@hvnsweeting
Copy link
Contributor

Hello,
I'm not a direct user of gitpython but I used it as a backend of saltstack gitfs fileserver:

Due to the mem leak problem in this issue:
saltstack/salt#17006 (comment)

I traced down the memleak happen in multiple places:

  1. https://github.com/saltstack/salt/blob/develop/salt/fileserver/gitfs.py#L1453
  2. https://github.com/saltstack/salt/blob/develop/salt/fileserver/gitfs.py#L1151

for 1, I used memory_profiler to help showing memleak, here are some results:
https://github.com/gitpython-developers/GitPython/blob/master/git/objects/tree.py#L180

170  70.3359 MiB   0.0000 MiB           if '/' in file:
   171                                         tree = self
   172                                         item = self
   173                                         tokens = file.split('/')
   174                                         for i, token in enumerate(tokens):
   175                                             item = tree[token]
   176                                             if item.type == 'tree':
   177                                                 tree = item
   178                                             else:
   179                                                 # safety assertion - blobs are at the end of the path
   180                                                 if i != len(tokens) - 1:
   181                                                     raise KeyError(msg % file)
   182
   183                                                 print item, dir(item) # added by me
   184                                                 return item
   185                                             # END handle item type
   186                                         # END for each token of split path
   187                                         if item == self:
   188                                             raise KeyError(msg % file)
   189                                         return item
   190                                     else:
   191  70.3438 MiB   0.0078 MiB               for info in self._cache:
   192  70.3438 MiB   0.0000 MiB                   if info[2] == file:     # [2] == name

And:

201929    174  61.0586 MiB   0.0000 MiB               for i, token in enumerate(tokens):
201930    175                                             item = tree[token]
201931    176  61.0625 MiB   0.0039 MiB                   if item.type == 'tree':
201932    177  61.0586 MiB  -0.0039 MiB                       tree = item
201933    178                                             else:
201934    179                                                 # safety assertion - blobs are at the end of the path
201935    180                                                 # if i != len(tokens) - 1: I commented them out
201936    181                                                 #     raise KeyError(msg % file)
201937    182  61.0625 MiB   0.0039 MiB                       return item

I'm using:

GitPython==1.0.1

it's not clear to me why it leaked, could you please help me to figure it out / give a fix for it.
Thanks

@Byron Byron added this to the v1.0.2 - Fixes milestone May 29, 2015
@Byron
Copy link
Member

Byron commented May 29, 2015

Thanks for posting. Would you update the issue with the filenames containing the code shown in the memory_profiler output ?

Even though 1) seems to be within the tree.py implementation (unrelated to the backend), it might be worth trying to change the backend of GitPython to use the cgit implementation underneath.

You can do that by instantiating the Repo as follows:

git.Repo("path", odbt=git.GitCmdObjectDB)

Maybe this already improves things.

My two cents on memory leaks in python are that they should only occour if there are cyclic strong pointers, or if there is a cache that is never cleared. The latter could exit in GitPython depending on usage patterns, the first mentioned shouldn't exist as I at least paid attention to it. However, there is a chance some cycle is in there nonetheless. I also admit that right now, I don't understand how to interpret the output presented here, which makes me poke around blindly to help fixing the issue :). That might improve though.

@hvnsweeting
Copy link
Contributor Author

thank you very much for helping me "fixing" this.
I changed the gitpython backend in the salt code (saltstack/salt#24291)

and now the leak mem "seem" gone. I said "seem" because salt-master has multiple problems cause memleak than just git relate. See the changes here: saltstack/salt#17006 (comment) (I changed it in 29/5) - that is the memory of daemon that use gitpython.

Anw, should I create a new issue for the gitdb project?

@Byron
Copy link
Member

Byron commented Jun 1, 2015

I think you could consider creating a new gitdb issue if you have clear evidence on where the memory leak is created - usually this is hard to do, and more like chasing phantoms, especially in a GC language like python. Additionally GitDB/GitPython is in maintenance mode, and I am not investing any time in it unless it is possible to make fixes in 30m or less, which seems unlikely to be possible in this case.

A common fix for issues related with resources (like memory, filehandles) is to use the multiprocessing package to offload gitpython related work into its own process. That way, resources will definitely be freed. It's probably not worth doing that on a per-call basis, but one could keep these workers around for some time-period, and restart them in a certain interval.

Another solution could be to use the python bindings of libgit2 directly - it's somewhat lower-level, but depending on what you do it might still be feasible to make that change.

@Byron
Copy link
Member

Byron commented Jul 23, 2016

Closed due to inactivity. Please comment to get it reopened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants