Skip to content

Handle pathologically connected dependency graphs #13416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

iamsrp-deshaw
Copy link

The pip install process looks to install packages in the order of distance from the root of the graph; i.e. packages furthest down the dependency tree will be installed first. The idea being, AFAICT, that one wants to have a package's dependencies in place before it is installed, so that an issue with the installation does not leave the system in a bad state.

This works well when you have a graph with no cycles in it but less well when you have cycles. Once the graph has been pruned of leaves the algorithm looks to determine the node distance with the recursive visit() method. The visit() method attempts to do an exhaustive walk of the dependency graph which, for a densely connected graph results in exponential time, for example (right now for me):

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND 
3720492 user      20   0  805276 127112   8896 R  99.7   0.2  21696:31 pip     

In order to address this we change the walk to be a heuristic which looks to break the graph cycles optimistically, whilst still keeping with the spirit of the install semantics. The leaf-pruning step of the previous algorithm is maintained (but rewritten) so semantics are preserved, but it is combined with the cycle-breaking, since it's cycles which break the leaf-pruning for working.

The proposed implementation runs the ordering step in just under 7ms for the pathological case above (which is still running at the time of writing).

It is noted that this patch changes the behavior of the cycle breaking and so, if there are package trees which have a strong dependency on the order in which the packages are installed then there is a possibility of breakage when there is a cycle in the dependency tree. However, this is weighed against the argument that the new algorithm is a distinct improvement in speed and that it is reasonable to believe that few packages will fall into the set whereby they have cycles in their dependency trees and also have a strict requirement about installation order, and which might have issues during the installation process.

The semantics of installation order in graphs without cycles are preserved in this change (per the unit tests).

iamsrp-deshaw and others added 5 commits June 5, 2025 13:57
When trying to install a set of packages which have a pathological
dependency graph, potentially with lots of cycles, walking the graph
exhaustively results in exponential runtime. The aim of that walk was to
determine which packages are the most "leaf-like" ones, being furthest
away from the root package. Instead we use a heuristic to determine a
good way to break the cycle in the graph.
@notatallshaw
Copy link
Member

Thanks for the PR, be aware, it may take some time to fully review your PR as pip maintainers have very limited capacity.

But from skimming it, a couple of comments/questions:

  • Do you have any reproducible benchmarks, real world or synthetic, that shows this speed up improvement?
  • Can the topological ordering be kept in a separate function or method and keep or modify the tests on it? I'm uncomfortable with losing so many tests for a critical part of the install process

@iamsrp-deshaw
Copy link
Author

Thanks for the quick response!

Both very good questions/comments. Let me come back in a bit with some enhanced tests to address your concerns about losing them. (I had removed them since the code which they were associated with was also removed; however, as you note, this also removes the ordering checks which were, effectively, transitively being tested.

No hurry on this either. You are right to be defensive about touching something at the very core of the pip installer (I appreciate the well judged push-back) and taking the time, where available, to do the review is good. Better safe than sorry.

@iamsrp-deshaw
Copy link
Author

Actually, I came up with a much less invasive change. Let's drop this and I'll submit that one instead.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 22, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants