Skip to content

Directly fetching refs returns the wrong symbolic references #1109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
muggenhor opened this issue Jan 14, 2021 · 3 comments
Open

Directly fetching refs returns the wrong symbolic references #1109

muggenhor opened this issue Jan 14, 2021 · 3 comments

Comments

@muggenhor
Copy link
Contributor

muggenhor commented Jan 14, 2021

When directly fetching a ref without specifying a destination ref. E.g. repo.remotes.origin.fetch(["master"]) instead of repo.remotes.origin.fetch(["refs/heads/master:refs/remotes/origin/master"]) git will produce a different amount of ref-update lines on stderr than in .git/FETCH_HEAD. For this example it would produce both these lines:

 * branch              master     -> FETCH_HEAD
 * [new branch]        master     -> origin/master

.git/FETCH_HEAD however will (kind of obviously) not contain anything to match the -> FETCH_HEAD lines (there can be multiple, more on that later).

037d62a9966743cf7130193fa08d5182df251b27                branch 'master' of https://github.com/gitpython-developers/GitPython

This mismatch causes the truncation code from 1537aab to kick in. Unfortunately, because the mismatched output (extra -> FETCH_HEAD lines that don't appear in the FETCH_HEAD file itself) appear in the front this truncation code leaves the stderr lines and .git/FETCH_HEAD lines mismatched. In this particular use case that's easily remedied by truncating the front of the list instead of the back:

diff --git a/git/remote.py b/git/remote.py
index 65916614..72c15ddc 100644
--- a/git/remote.py
+++ b/git/remote.py
@@ -698,9 +698,9 @@ class Remote(LazyMixin, Iterable):
             log.debug("info lines: " + str(fetch_info_lines))
             log.debug("head info : " + str(fetch_head_info))
             if l_fil < l_fhi:
-                fetch_head_info = fetch_head_info[:l_fil]
+                fetch_head_info = fetch_head_info[l_fhi - l_fil :]
             else:
-                fetch_info_lines = fetch_info_lines[:l_fhi]
+                fetch_info_lines = fetch_info_lines[l_fil - l_fhi :]
             # end truncate correct list
         # end sanity check + sanitization
 

But I'm unsure about submitting this patch as a PR because it might end up causing problems in case that I'm unaware of, where truncating from the end is the correct thing to do (or maybe fetch_head_info should be truncated at the back and only fetch_info_lines at the front...).

Additionally the -> FETCH_HEAD lines seem to appear on stderr once for every refspec without a dst segment (given the refspec syntax [+]<src>[:<dst>] from git help fetch). But, when fetching the same refspec without dst twice, the corresponding dd3cdfc9..037d62a9 master -> origin/master line that signals the update of the local remote ref (.git/refs/remotes/*) appears only once on stderr while it does generate two lines in .git/FETCH_HEAD.

As a result of this I'm convinced that the attempt to try to match the stderr output to the produced .git/FETCH_HEAD is unlikely to be made to work in every scenario...

FYI: for me it's more important that the return value of remote.fetch has each of its .ref members correspond one-to-one with FETCH_HEAD. The above patch would accomplish that iff len(fetch_head_info) <= len(fetch_info_lines). So it's at least an improvement for my needs as I would never get the wrong info/mismatched info. But I might still miss info if there are less lines discovered on stderr than in .git/FETCH_HEAD. So my preference would be to still keep processing .git/FETCH_HEAD lines, even when we've ran out of stderr lines.

So in short: my naive expectation of remote.fetch is that it returns one FetchInfo entry per ref entry in .git/FETCH_HEAD. All the extra metadata about those ref entries I see as nice to have and it would be acceptable for me to not have access to that under some circumstances.

@Byron
Copy link
Member

Byron commented Jan 17, 2021

Thanks so much for taking the time to investigate the implementation!
I couldn't agree more that there isn't always a perfect solution to obtain information on which refs got updated, so the only thing one could hope here is to make it a little less imperfect.

Even though I would definitely love to see this improvement in mainline, I also agree with the sentiment that it might break in unpredictable ways.

Maybe there is a middle ground - would it be possible to factor the ref handling into its own implementation and allow others to inject their own? Or to make it a little less complex, provide another method to fetch/pull but with a different algorithm to allow testing in the wild?

That way the existing algorithm could remain stable for now without being in the way of progress.

What do you think?

@muggenhor
Copy link
Contributor Author

Sorry for the late reply, I got side tracked.

I like your idea for making it possible to replace the ref-handling implementation. That should allow some experimentation with this to incrementally get a better implementation instead of having to get it right immediately.

I think that it's better to parametrize the ref-handling implementation than to add an extra method.

@Byron
Copy link
Member

Byron commented Jan 23, 2021

No sorry ever necessary :).

Great to hear! Yes, going ahead with parameterizing the method should do just fine. Once the new way of doing the work is proven to work, it could even become the default one day.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants