Commit 5b65b8f
committed
list: improve performance with many topics
On a test repo with 100 topic branches:
$ git init test
$ cd test
$ git commit --allow-empty -m root
$ git topics setup
$ for i in $(seq 1 100); do git topics start $i; done
Before:
$ time git topics > /dev/null
real 0m2.344s
user 0m0.547s
sys 0m0.457s
After:
$ time git topics > /dev/null
real 0m0.285s
user 0m0.084s
sys 0m0.071s
There are many subtle things to talk about here...
This has long been a pain point for me on certain problematic repos, in
particular because `git topics list --all --porcelain` is used by tab
completion. While I can easily imagine more "specialized" ways of
improving performance for tab completion, the first thing to attack was
the `list` command itself.
The core problem was that the previous algorithm would loop over every
topic branch and invoke `git merge-base` (multiple times!) to determine
where that branch was merged. Essentially, this amounted to O(n) calls
in the number of topics, and each call is a relatively expensive
operation.
The fix introduced by this commit instead makes cleverer use of `git
for-each-ref` so that the heavy lifting is done by its --merged &
--no-merged flags using O(1) calls in the number of topics. While there
are certainly fewer overall calls to git subcommands, I'm not 100% clear
on why for-each-ref is more efficient than merge-base. Looking at the
implementations as of this writing (circa git v2.20.0-rc1), merge-base
uses commit-reach.c whereas for-each-ref uses ref-filter.c. The
algorithms are hairier than I can be bothered to pick apart right now,
but I suspect they're probably quite different. At any rate, the proof
is in the pudding, as seen in the above benchmark.
Changing this implementation meant reevaluating certain details of the
original approach. In particular, the logic formerly contained in the
`not_a_topic` function changes in two significant ways:
1. Now that we aren't calling merge-base on each individual branch, we
won't be able to recognize the interesting edge case of orphan
branches. Using `git checkout --orphan`, it's possible to create an
entirely separate root commit such that a branch will have no
ancestors in common with 'master' - so it's not *really* a topic
branch. This is generally uncommon, though of course git.git
exercises such strange cases, as in its 'todo' branch.
All this change means is that branches like 'todo' will show up as
unmerged topics. I figure that's not too terrible, given the
(perceived?) infrequency of this situation. Plus, I'm in good
company: even `git branch --no-merged` would have the same output,
since it uses for-each-ref & ref-filter.c underneath.
One possible workaround is to use the --contains flag to
for-each-ref. If you could identify the root commit of the 'master'
branch, you could make sure to only list refs that contained that
commit. However, the ways I could think to find a suitable commit all
seem hackish:
* `git rev-list --max-parents=0` gives you *all* the root commits, so
you'd still have to figure out which one belongs to 'master'.
* `git rev-list --reverse --max-count=1` just gives you the HEAD
commit, since the max count is applied *before* reversing the list.
I guess you could just pipe it to `tail -1`, but that sort of makes
me wrinkle my nose (strong argument, I know).
* Might be able to just call --contains with the latest version tag
from 'master', thus listing topics forked since the last release.
However, this still breaks on the base case, when you haven't done
a `git topics release` on the repo yet. Come to think of it, it
even breaks if you just started a topic then released without
finishing that topic yet. Never mind the fragility introduced by
manual tagging and such.
So I'll hold off on filtering orphans on the YAGNI assumption.
2. The old implementation had a special `case` clause to filter out
refs/*/HEAD. I basically did this because I didn't realize where
refs/origin/HEAD was coming from before in my own repos. GitHub sets
this when you switch its "default branch", and several of my projects
had it set to 'develop'. The real *underlying* thing I think we want
to avoid is just any symref in general. We can accomplish this easily
using one of the builtin atoms in for-each-ref --format. So really,
this is an algorithmic improvement: instead of hard-coding HEAD, we
avoid listing any symbolic refs (on the assumption that the concrete
ref will be listed regardless).
Finally, there has been an interesting performance *regression* when I
tried this change out idly on a clone of the git.git repo.
Before:
$ time git topics list -s
- pu
real 0m1.059s
user 0m0.748s
sys 0m0.110s
After:
$ time git topics list -s
- pu
- todo
real 0m2.822s
user 0m2.430s
sys 0m0.205s
I have yet to be able to reproduce what exactly is causing this. There
are few refs to loop through, so it seems to be an interesting case. My
first thought was that the commit history is very long on 'master', so
checking --merged was maybe slower in that case. However, I have not
been able to duplicate this in a vacuum.
The moral of the story, as it ever is with performance issues, is that
I'll have to keep my eyes peeled for cases that are palpably slow. In
the meantime, this commit seems to give a substantial improvement to my
current real-world examples.1 parent 414898f commit 5b65b8f
1 file changed
+66
-60
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
30 | | - | |
31 | | - | |
| 30 | + | |
| 31 | + | |
32 | 32 | | |
33 | 33 | | |
34 | | - | |
35 | | - | |
36 | | - | |
37 | | - | |
38 | | - | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
39 | 43 | | |
40 | | - | |
41 | | - | |
42 | | - | |
43 | | - | |
44 | | - | |
45 | | - | |
46 | | - | |
| 44 | + | |
| 45 | + | |
47 | 46 | | |
48 | 47 | | |
49 | | - | |
50 | | - | |
51 | | - | |
52 | | - | |
53 | | - | |
54 | | - | |
55 | | - | |
56 | | - | |
57 | | - | |
58 | | - | |
59 | | - | |
60 | | - | |
61 | | - | |
62 | | - | |
63 | | - | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
64 | 64 | | |
65 | 65 | | |
66 | 66 | | |
| |||
79 | 79 | | |
80 | 80 | | |
81 | 81 | | |
82 | | - | |
83 | | - | |
84 | | - | |
85 | | - | |
86 | | - | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
87 | 87 | | |
88 | 88 | | |
89 | | - | |
| 89 | + | |
90 | 90 | | |
91 | 91 | | |
92 | | - | |
93 | | - | |
| 92 | + | |
| 93 | + | |
94 | 94 | | |
95 | | - | |
96 | | - | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
97 | 98 | | |
98 | 99 | | |
99 | 100 | | |
100 | | - | |
101 | | - | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
102 | 104 | | |
103 | 105 | | |
104 | 106 | | |
105 | 107 | | |
106 | | - | |
| 108 | + | |
107 | 109 | | |
108 | 110 | | |
109 | | - | |
110 | | - | |
| 111 | + | |
| 112 | + | |
111 | 113 | | |
112 | | - | |
113 | | - | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
114 | 117 | | |
115 | 118 | | |
116 | 119 | | |
117 | | - | |
118 | | - | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
119 | 123 | | |
120 | 124 | | |
121 | 125 | | |
122 | 126 | | |
123 | | - | |
| 127 | + | |
124 | 128 | | |
125 | 129 | | |
126 | | - | |
127 | | - | |
| 130 | + | |
| 131 | + | |
128 | 132 | | |
129 | | - | |
130 | | - | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
131 | 136 | | |
132 | 137 | | |
133 | 138 | | |
134 | | - | |
135 | | - | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
136 | 142 | | |
137 | 143 | | |
138 | 144 | | |
139 | 145 | | |
140 | | - | |
141 | | - | |
142 | | - | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
143 | 149 | | |
144 | 150 | | |
145 | | - | |
146 | | - | |
| 151 | + | |
| 152 | + | |
147 | 153 | | |
148 | 154 | | |
149 | 155 | | |
| |||
0 commit comments