Commit f459478

Merge pull request git#319 from git/rn-45/jn-generation-number-v2
rn-45: Add "[RFC] Generation Number v2" article
2 parents ef549d0 + fe17315 commit f459478

1 file changed

+122
-0
lines changed

rev_news/drafts/edition-45.md

Lines changed: 122 additions & 0 deletions
@@ -101,6 +101,128 @@ This edition covers what happened during the month of October 2018.
should be in the upcoming v2.20.0 Git release
[scheduled for the beginning of December](https://tinyurl.com/gitCal).

### General

* [[RFC] Generation Number v2](https://public-inbox.org/git/[email protected]/t/#u)

The [commit-graph](https://github.com/git/git/blob/master/Documentation/technical/commit-graph.txt)
file mechanism (see the description above) accelerates commit graph
walks in the following two ways:

1. Getting basic commit data (satisfying the requirements of
`parse_commit_gently()`) without decompressing and parsing commit objects.
2. Reducing the time it takes to walk commits and determine
topological relationships by providing a "generation number" (a
negative-cut reachability index), which makes it possible to skip
walking whole parts of the commit graph.
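
To illustrate the second point, here is a minimal Python sketch of how a generation number can act as a negative-cut index. It is not Git's implementation: the toy history `PARENTS` and the helper names `generation_numbers` and `can_reach` are hypothetical. It assumes the usual definition, where a root commit has generation 1 and any other commit has one more than the largest generation among its parents.

```python
from collections import deque

# Hypothetical toy history: commit name -> list of parent names.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
    "E": ["D"],
}


def generation_numbers(parents):
    """gen(c) = 1 for a root commit, else 1 + max(gen(p)) over parents p."""
    gen = {}
    for start in parents:
        stack = [start]
        while stack:
            c = stack[-1]
            pending = [p for p in parents[c] if p not in gen]
            if pending:
                # Parents must be numbered before the commit itself.
                stack.extend(pending)
            else:
                gen[c] = 1 + max((gen[p] for p in parents[c]), default=0)
                stack.pop()
    return gen


def can_reach(parents, gen, source, target):
    """Return (reachable, commits_walked) for 'is target an ancestor of source?'."""
    seen = {source}
    queue = deque([source])
    walked = 0
    while queue:
        c = queue.popleft()
        walked += 1
        if c == target:
            return True, walked
        # Negative cut: every ancestor of c has a strictly smaller generation,
        # so if gen(c) <= gen(target), target cannot be found below c.
        if gen[c] <= gen[target]:
            continue
        for p in parents[c]:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return False, walked


gen = generation_numbers(PARENTS)
print(gen)                                # {'A': 1, 'B': 2, 'C': 2, 'D': 3, 'E': 4}
print(can_reach(PARENTS, gen, "E", "B"))  # (True, 3)
print(can_reach(PARENTS, gen, "B", "C"))  # (False, 1)
```

In the last call the walk stops at `B` without visiting `A`, because `B` and `C` have the same generation. The index can only prove that a commit is *not* reachable, which is why it is called a negative-cut index.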

The current version of the generation number has one advantage over
a heuristic based on the commit date: it is always correct.
It turned out, however, that in some cases it can give worse
performance than the date heuristic; that is why its use was
limited in [[PATCH 1/1] commit: don't use generation numbers if not
needed](https://public-inbox.org/git/efa3720fb40638e5d61c6130b55e3348d8e4339e.1535633886.git.gitgitgadget@gmail.com/).

For the same reason, [[PATCH 0/6] Use generation numbers for
--topo-order](https://public-inbox.org/git/[email protected]/)
and its subsequent revisions also limited their use:

> One notable case that is not included in this series is the case
> of a history comparison such as `git rev-list --topo-order A..B`.

Removing this limitation yields correct results, but the performance
is worse.

That is why Derrick Stolee sent [this RFC](https://public-inbox.org/git/[email protected]/t/#u):

> We've discussed in several places how to improve upon generation
> numbers. This RFC is a report based on my investigation into a
> few new options, and how they compare for Git's purposes on
> several existing open-source repos.
>
> You can find this report and the associated test scripts at
> <https://github.com/derrickstolee/gen-test>.

> Please also let me know about any additional tests that I could
> run. Now that I've got a lot of test scripts built up, I can re-run
> the test suite pretty quickly.

He then explains why Generation Number v2 is needed:

> Specifically, some algorithms in Git already use commit date as a
> heuristic reachability index. This has some problems, though, since
> commit date can be incorrect for several reasons (clock skew between
> machines, purposefully setting `GIT_COMMIT_DATE` to the author date, etc.).
> However, the speed boost by using commit date as a cutoff was so
> important in these cases, that the potential for incorrect answers was
> considered acceptable.
>
> When these algorithms were converted to use generation numbers, we
> _added_ the extra constraint that the algorithms are _never incorrect_.
> Unfortunately, this led to some cases where performance was worse than
> before. There are known cases where `git merge-base A B` or
> `git log --topo-order A..B` are worse when using generation numbers
> than when using commit dates.
>
> This report investigates four replacements for generation numbers, and
> compares the number of walked commits to the existing algorithms (both
> using generation numbers and not using them at all). We can use this
> data to make decisions for the future of the feature.
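
The correctness problem with a date cutoff can be shown with a small Python sketch. This is an illustration under stated assumptions, not how Git's revision walk is written: the history `COMMITS`, its timestamps, and both helper functions are hypothetical, and the heuristic is simplified to "stop at any commit older than the target".

```python
# Hypothetical history with clock skew: commit -> (parents, commit date).
COMMITS = {
    "base": ([], 100),
    "skewed": (["base"], 50),  # child of "base", but dated earlier than it
    "tip": (["skewed"], 200),
}


def reaches_with_date_cutoff(commits, source, target):
    """Walk parents from source, pruning commits dated before target."""
    cutoff = commits[target][1]
    stack, seen = [source], {source}
    while stack:
        c = stack.pop()
        if c == target:
            return True
        parents, date = commits[c]
        if date < cutoff:
            # Heuristic: a commit older than target is assumed not to
            # descend from it, so its ancestry is not explored.
            continue
        for p in parents:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return False


def reaches_exhaustive(commits, source, target):
    """Same walk without any cutoff; always correct, but walks more."""
    stack, seen = [source], {source}
    while stack:
        c = stack.pop()
        if c == target:
            return True
        for p in commits[c][0]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return False


print(reaches_with_date_cutoff(COMMITS, "tip", "base"))  # False (wrong)
print(reaches_exhaustive(COMMITS, "tip", "base"))        # True
```

The skewed commit is pruned before its parent `base` is ever seen, so the heuristic walk gives the wrong answer. A generation number cannot be misled this way, because it is derived from the graph structure rather than from timestamps.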

A very rough implementation of those four proposed generation
numbers can be found in [the `reach-perf` branch in
https://github.com/derrickstolee/git](https://github.com/derrickstolee/git/tree/reach-perf).

Based on benchmarks comparing the number of commits walked (measured
with the help of the [trace2](https://public-inbox.org/git/[email protected]/t/#u)
facility), Stolee proposed pursuing one of the following options,
though he was undecided about which one to choose:

1. Maximum Generation Number.
2. Corrected Commit Date.

Maximum Generation Number has the advantage that it is
backwards-compatible, that is, it can be used (but not updated) with
the current code; however, it is neither locally computable nor
immutable. Corrected Commit Date would require changes to the
commit-graph format, but it can be updated incrementally.
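
To make "can be updated incrementally" concrete, here is a minimal Python sketch of a corrected commit date. The definition used is an assumption for illustration (the commit's own date, raised if necessary to be strictly greater than the corrected date of every parent); the exact definition in the RFC may differ. The helper names and the sample timestamps are hypothetical.

```python
def corrected_commit_date(commit_date, parent_corrected_dates):
    """Assumed definition: at least the commit's own date, and strictly
    greater than the corrected date of every parent."""
    return max([commit_date] + [d + 1 for d in parent_corrected_dates])


def add_commit(corrected, name, commit_date, parents):
    """Incremental update: only the parents' stored values are consulted."""
    corrected[name] = corrected_commit_date(
        commit_date, [corrected[p] for p in parents]
    )
    return corrected[name]


corrected = {}
add_commit(corrected, "base", 100, [])
add_commit(corrected, "skewed", 50, ["base"])  # clock skew: dated before its parent
add_commit(corrected, "tip", 200, ["skewed"])
print(corrected)  # {'base': 100, 'skewed': 101, 'tip': 200}
```

Under this assumed definition, adding a commit never changes the values already stored for existing commits, which is what makes the index both locally computable and immutable. A value defined in terms of a commit's children would not have that property, since new children can appear at any time.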

Junio C Hamano [replied](https://public-inbox.org/git/[email protected]/t/#m83011e1c6f4dedf35a2e167870cdcb1bfda46e30)
that

> [...] I personally do not think being compatible with
> currently deployed clients is important at all (primarily because I
> still consider the whole thing experimental), and there is a clear
> way forward once we correct the mistake of not having a version
> number in the file format that tells the updated clients to ignore
> the generation numbers. For longer term viability, we should pick
> something that is immutable, reproducible, computable with minimum
> input---all of which would lead to being incrementally computable, I
> would think.

It looks like Corrected Commit Date is the way
forward, unless the variant of Maximum Generation
Number [proposed by Jakub Narębski](https://public-inbox.org/git/[email protected]/t/#e6a1ca42ad3dd0e9f3a63912af404973be4cbe181),
which looks like it could be updated almost incrementally,
turns out to be better. The change to use Corrected Commit Date
would require a new revision of the commit-graph format (which includes
a version number, fortunately). Derrick Stolee [writes](https://public-inbox.org/git/[email protected]/):

> Here is my list for what needs to be in the next
> version of the commit-graph file format:
>
> 1. A four-byte hash version.
>
> 2. File incrementality (split commit-graph).
>
> 3. Reachability Index versioning
>
> Most of these changes will happen in the file header. The chunks
> themselves don't need to change, but some chunks may be added that
> only make sense in v2 commit-graphs.


## Developer Spotlight: Elijah Newren

* Who are you and what do you do?
