@@ -101,6 +101,128 @@ This edition covers what happened during the month of October 2018.
101
101
should be in the upcoming v2.20.0 Git release
102
102
[scheduled for the beginning of December](https://tinyurl.com/gitCal).
103
103
104
+
105
+ ### General
106
+
107
+ * [[RFC] Generation Number v2](https://public-inbox.org/git/[email protected]/t/#u)
108
+
109
+ The [commit-graph](https://github.com/git/git/blob/master/Documentation/technical/commit-graph.txt)
110
+ file mechanism (see the description above) accelerates commit graph
111
+ walks in the two following ways:
112
+
113
+ 1. Getting basic commit data (satisfying the requirements of
114
+ `parse_commit_gently()`) without decompressing and parsing.
115
+ 2. Reducing the time it takes to walk commits and determine
116
+ topological relationships by providing a "generation number" (as a
117
+ negative-cut reachability index), which makes it possible to skip
118
+ walking whole parts of the commit graph.
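
The second point can be illustrated with a minimal Python sketch. It assumes the definition from Git's commit-graph technical documentation (a root commit has generation number 1, any other commit has one more than the largest generation number among its parents). The toy history, the commit names, and the helpers `generation_numbers()` and `can_reach()` are hypothetical and are not Git's actual code.

```python
from collections import deque

# Toy history (hypothetical): each commit maps to the list of its parents.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
    "E": ["D"],
    "X": [],       # root of an unrelated line of history
    "Y": ["X"],
}


def generation_numbers(parents):
    """Root commits get 1; others get 1 + the largest value among parents."""
    gen = {}
    for start in parents:
        stack = [start]
        while stack:
            commit = stack[-1]
            missing = [p for p in parents[commit] if p not in gen]
            if missing:
                # Compute the parents first, then come back to this commit.
                stack.extend(missing)
            else:
                gen[commit] = 1 + max(
                    (gen[p] for p in parents[commit]), default=0
                )
                stack.pop()
    return gen


def can_reach(parents, gen, start, target):
    """Return (reachable, walked) for the question "can start reach target?"."""
    seen = {start}
    queue = deque([start])
    walked = 0
    while queue:
        commit = queue.popleft()
        walked += 1
        if commit == target:
            return True, walked
        for parent in parents[commit]:
            # Negative cut: a commit with a smaller generation number than
            # the target cannot reach it, so that part is never walked.
            if gen[parent] < gen[target]:
                continue
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False, walked


gen = generation_numbers(PARENTS)
print(can_reach(PARENTS, gen, "E", "A"))  # positive answer: full walk needed
print(can_reach(PARENTS, gen, "Y", "D"))  # negative answer: cut off early
```

The index can only prove that a commit is *not* reachable; a positive answer still requires walking the history, which is why it is called a negative-cut index.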
119
+
120
+ The current version of the generation number has the advantage over
121
+ using a heuristic based on the commit date that it is always correct.
122
+ It turned out, however, that in some cases it can give worse
123
+ performance than using the date heuristics; that is why its use got
124
+ limited in [[PATCH 1/1] commit: don't use generation numbers if not
125
+ needed](https://public-inbox.org/git/efa3720fb40638e5d61c6130b55e3348d8e4339e.1535633886.git.gitgitgadget@gmail.com/).
126
+
127
+ For the same reason, [[PATCH 0/6] Use generation numbers for
128
+ --topo-order](https://public-inbox.org/git/[email protected]/),
129
+ and its subsequent revisions, also limited its use:
130
+
131
+ > One notable case that is not included in this series is the case
132
+ > of a history comparison such as `git rev-list --topo-order A..B`.
133
+
134
+ Removing this limitation yields correct results, but the performance
135
+ is worse.
136
+
137
+ That is why Derrick Stolee sent [this RFC](https://public-inbox.org/git/[email protected]/t/#u):
138
+
139
+ > We've discussed in several places how to improve upon generation
140
+ > numbers. This RFC is a report based on my investigation into a
141
+ > few new options, and how they compare for Git's purposes on
142
+ > several existing open-source repos.
143
+ >
144
+ > You can find this report and the associated test scripts at
145
+ > <https://github.com/derrickstolee/gen-test>.
146
+
147
+ > Please also let me know about any additional tests that I could
148
+ > run. Now that I've got a lot of test scripts built up, I can re-run
149
+ > the test suite pretty quickly.
150
+
151
+ He then explains why Generation Number v2 is needed:
152
+
153
+ > Specifically, some algorithms in Git already use commit date as a
154
+ > heuristic reachability index. This has some problems, though, since
155
+ > commit date can be incorrect for several reasons (clock skew between
156
+ > machines, purposefully setting `GIT_COMMIT_DATE` to the author date, etc.).
157
+ > However, the speed boost by using commit date as a cutoff was so
158
+ > important in these cases, that the potential for incorrect answers was
159
+ > considered acceptable.
160
+ >
161
+ > When these algorithms were converted to use generation numbers, we
162
+ > _added_ the extra constraint that the algorithms are _never incorrect_.
163
+ > Unfortunately, this led to some cases where performance was worse than
164
+ > before. There are known cases where `git merge-base A B` or
165
+ > `git log --topo-order A..B` are worse when using generation numbers
166
+ > than when using commit dates.
167
+ >
168
+ > This report investigates four replacements for generation numbers, and
169
+ > compares the number of walked commits to the existing algorithms (both
170
+ > using generation numbers and not using them at all). We can use this
171
+ > data to make decisions for the future of the feature.
172
+
173
+ The very rough implementation of those four proposed generation
174
+ numbers can be found in [the `reach-perf` branch in
175
+ https://github.com/derrickstolee/git](https://github.com/derrickstolee/git/tree/reach-perf).
176
+
177
+ Based on the benchmarks he performed (comparing the number of commits
178
+ walked, with the help of the [trace2](https://public-inbox.org/git/[email protected]/t/#u)
179
+ facility), Stolee proposed pursuing one of the following options,
180
+ though he was undecided about which one to choose:
181
+
182
+ 1. Maximum Generation Number.
183
+ 2. Corrected Commit Date.
184
+
185
+ Maximum generation number has the advantage that it is
186
+ backwards-compatible, that is, it can be used (but not updated) with
187
+ the current code; however, it is not locally computable or
188
+ immutable. Corrected commit date would require changes to the
189
+ commit-graph format, but it can be updated incrementally.
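
To show why Corrected Commit Date can be updated incrementally, here is a minimal Python sketch. It assumes the definition used in Git's commit-graph technical documentation: a root commit keeps its committer date, and any other commit gets the maximum of its own committer date and one more than the largest corrected commit date among its parents. The RFC's exact wording may differ slightly. The toy history, the timestamps, and the helper `corrected_commit_dates()` are hypothetical.

```python
# Toy history (hypothetical): commit -> (committer timestamp, parents).
COMMITS = {
    "A": (100, []),
    "B": (200, ["A"]),
    "C": (150, ["B"]),  # clock skew: dated earlier than its parent B
    "D": (300, ["C"]),
}


def corrected_commit_dates(commits):
    """Compute a corrected date for each commit from its parents' values."""
    corrected = {}
    for start in commits:
        stack = [start]
        while stack:
            commit = stack[-1]
            date, parents = commits[commit]
            missing = [p for p in parents if p not in corrected]
            if missing:
                # Compute the parents first, then come back to this commit.
                stack.extend(missing)
            else:
                # A root commit has no parents, so it keeps its own date.
                floor = max((corrected[p] + 1 for p in parents), default=date)
                corrected[commit] = max(date, floor)
                stack.pop()
    return corrected


corrected = corrected_commit_dates(COMMITS)
print(corrected)
```

In this sketch the skewed commit `C` is lifted just above its parent `B`, so the value is strictly larger than that of every ancestor. Each value depends only on the commit itself and its parents, so adding a new commit never changes values that were already computed.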
190
+
191
+ Junio C Hamano [replied](https://public-inbox.org/git/[email protected]/t/#m83011e1c6f4dedf35a2e167870cdcb1bfda46e30)
192
+ that
193
+
194
+ > [...] I personally do not think being compatible with
195
+ > currently deployed clients is important at all (primarily because I
196
+ > still consider the whole thing experimental), and there is a clear
197
+ > way forward once we correct the mistake of not having a version
198
+ > number in the file format that tells the updated clients to ignore
199
+ > the generation numbers. For longer term viability, we should pick
200
+ > something that is immutable, reproducible, computable with minimum
201
+ > input---all of which would lead to being incrementally computable, I
202
+ > would think.
203
+
204
+ It looks like the Corrected Commit Date is the way
205
+ forward, unless the variant of Maximum Generation
206
+ Number [proposed by Jakub Narębski](https://public-inbox.org/git/[email protected]/t/#e6a1ca42ad3dd0e9f3a63912af404973be4cbe181),
207
+ which looks like it could be updated almost incrementally, would
208
+ turn out to be better. The change to use Corrected Commit Date
209
+ would require a new revision of the commit-graph format (which includes
210
+ a version number, fortunately). Derrick Stolee [writes](https://public-inbox.org/git/[email protected]/):
211
+
212
+ > Here is my list for what needs to be in the next
213
+ > version of the commit-graph file format:
214
+ >
215
+ > 1. A four-byte hash version.
216
+ >
217
+ > 2. File incrementality (split commit-graph).
218
+ >
219
+ > 3. Reachability Index versioning
220
+ >
221
+ > Most of these changes will happen in the file header. The chunks
222
+ > themselves don't need to change, but some chunks may be added that
223
+ > only make sense in v2 commit-graphs.
224
+
225
+
104
226
## Developer Spotlight: Elijah Newren
105
227
106
228
* Who are you and what do you do?