Handle pathologically connected dependency graphs #13416
Closed
The pip install process installs packages in order of distance from the root of the dependency graph, i.e. packages furthest down the dependency tree are installed first. The idea, AFAICT, is that a package's dependencies should be in place before it is installed, so that an issue with the installation does not leave the system in a bad state.
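To make the ordering concrete, here is a minimal sketch of "furthest from the root first" for an acyclic graph. It is not pip's code: the function name `install_order`, the dict-of-lists graph shape, and the package names are all assumptions made for illustration, and it assumes every package appears as a key in the graph.

```python
from collections import deque


def install_order(graph, root):
    """Return packages ordered deepest-first (hypothetical helper, DAG only).

    `graph` maps each package name to the list of packages it depends on.
    """
    # Count how many packages depend on each node.
    indegree = {node: 0 for node in graph}
    for deps in graph.values():
        for dep in deps:
            indegree[dep] += 1

    # Walk in topological order, recording the longest distance from the root.
    dist = {node: 0 for node in graph}
    queue = deque(node for node, count in indegree.items() if count == 0)
    while queue:
        node = queue.popleft()
        for dep in graph[node]:
            dist[dep] = max(dist[dep], dist[node] + 1)
            indegree[dep] -= 1
            if indegree[dep] == 0:
                queue.append(dep)

    # Deepest packages first; ties are broken by name for determinism.
    return sorted((n for n in graph if n != root), key=lambda n: (-dist[n], n))


example = {"root": ["app"], "app": ["lib", "util"], "lib": ["util"], "util": []}
print(install_order(example, "root"))
```

In this example `util` sits furthest from the root, so it is installed before `lib`, which in turn precedes `app`.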
This works well when the graph has no cycles, but less well when it does. Once the graph has been pruned of leaves, the algorithm determines the node distance with the recursive `visit()` method. The `visit()` method attempts an exhaustive walk of the dependency graph which, for a densely connected graph, takes exponential time; for example (right now for me):
In order to address this we change the walk to a heuristic which looks to break the graph cycles optimistically, whilst still keeping with the spirit of the install semantics. The leaf-pruning step of the previous algorithm is maintained (but rewritten) so semantics are preserved, but it is combined with the cycle-breaking, since it is cycles that stop the leaf-pruning from working.
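As a rough illustration of combining leaf-pruning with cycle-breaking, the sketch below repeatedly removes leaves and, when only cycles remain, picks one node to treat as a leaf. The tie-breaking rule used here (fewest remaining dependencies, then name) is an assumption chosen for the example and is not necessarily the heuristic in this patch; `order_with_cycle_breaking` and the package names are likewise hypothetical.

```python
def order_with_cycle_breaking(graph):
    """Return an install order, dependencies first where possible.

    `graph` maps each package to the packages it depends on; every package
    is assumed to appear as a key. This is an illustrative sketch only.
    """
    remaining = {node: set(deps) for node, deps in graph.items()}
    order = []
    while remaining:
        # Leaves are packages whose dependencies have all been ordered.
        leaves = sorted(n for n, deps in remaining.items() if not deps)
        if not leaves:
            # Only cycles are left: break one by picking the node with the
            # fewest unresolved dependencies (assumed heuristic).
            leaves = [min(remaining, key=lambda n: (len(remaining[n]), n))]
        for leaf in leaves:
            del remaining[leaf]
            order.append(leaf)
        # Pruned packages no longer block anything that depends on them.
        for deps in remaining.values():
            deps.difference_update(leaves)
    return order


cyclic = {"app": ["a", "b"], "a": ["b"], "b": ["a", "c"], "c": []}
print(order_with_cycle_breaking(cyclic))
```

Each pass removes at least one node, so the loop runs in polynomial time regardless of how densely the graph is connected. For graphs without cycles the cycle-breaking branch never fires and the result is a plain dependencies-first order.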
The proposed implementation runs the ordering step in just under 7 ms for the pathological case above (on which the existing implementation is still running at the time of writing).
Note that this patch changes the cycle-breaking behavior. If a package tree depends strongly on the order in which its packages are installed, breakage is possible when that tree contains a cycle. This is weighed against the distinct improvement in speed, and it is reasonable to believe that few packages both have cycles in their dependency trees and have a strict requirement about installation order that could cause issues during installation.
The semantics of installation order in graphs without cycles are preserved in this change (per the unit tests).