
Commit 5b65b8f

list: improve performance with many topics
On a test repo with 100 topic branches:

    $ git init test
    $ cd test
    $ git commit --allow-empty -m root
    $ git topics setup
    $ for i in $(seq 1 100); do git topics start $i; done

Before:

    $ time git topics > /dev/null
    real 0m2.344s
    user 0m0.547s
    sys 0m0.457s

After:

    $ time git topics > /dev/null
    real 0m0.285s
    user 0m0.084s
    sys 0m0.071s

There are many subtle things to talk about here...

This has long been a pain point for me on certain problematic repos, in
particular because `git topics list --all --porcelain` is used by tab
completion. While I can easily imagine more "specialized" ways of
improving performance for tab completion, the first thing to attack was
the `list` command itself.

The core problem was that the previous algorithm would loop over every
topic branch and invoke `git merge-base` (multiple times!) to determine
where that branch was merged. Essentially, this amounted to O(n) calls
in the number of topics, and each call is a relatively expensive
operation.

The fix introduced by this commit instead makes cleverer use of
`git for-each-ref` so that the heavy lifting is done by its --merged &
--no-merged flags using O(1) calls in the number of topics.

While there are certainly fewer overall calls to git subcommands, I'm
not 100% clear on why for-each-ref is more efficient than merge-base.
Looking at the implementations as of this writing (circa git
v2.20.0-rc1), merge-base uses commit-reach.c whereas for-each-ref uses
ref-filter.c. The algorithms are hairier than I can be bothered to pick
apart right now, but I suspect they're probably quite different. At any
rate, the proof is in the pudding, as seen in the above benchmark.

Changing this implementation meant reevaluating certain details of the
original approach. In particular, the logic formerly contained in the
`not_a_topic` function changes in two significant ways:

1. Now that we aren't calling merge-base on each individual branch, we
   won't be able to recognize the interesting edge case of orphan
   branches.
   Using `git checkout --orphan`, it's possible to create an entirely
   separate root commit such that a branch will have no ancestors in
   common with 'master' - so it's not *really* a topic branch. This is
   generally uncommon, though of course git.git exercises such strange
   cases, as in its 'todo' branch.

   All this change means is that branches like 'todo' will show up as
   unmerged topics. I figure that's not too terrible, given the
   (perceived?) infrequency of this situation. Plus, I'm in good
   company: even `git branch --no-merged` would have the same output,
   since it uses for-each-ref & ref-filter.c underneath.

   One possible workaround is to use the --contains flag to
   for-each-ref. If you could identify the root commit of the 'master'
   branch, you could make sure to only list refs that contained that
   commit. However, the ways I could think to find a suitable commit
   all seem hackish:

   * `git rev-list --max-parents=0` gives you *all* the root commits,
     so you'd still have to figure out which one belongs to 'master'.

   * `git rev-list --reverse --max-count=1` just gives you the HEAD
     commit, since the max count is applied *before* reversing the
     list. I guess you could just pipe it to `tail -1`, but that sort
     of makes me wrinkle my nose (strong argument, I know).

   * Might be able to just call --contains with the latest version tag
     from 'master', thus listing topics forked since the last release.
     However, this still breaks on the base case, when you haven't done
     a `git topics release` on the repo yet. Come to think of it, it
     even breaks if you just started a topic then released without
     finishing that topic yet. Never mind the fragility introduced by
     manual tagging and such.

   So I'll hold off on filtering orphans on the YAGNI assumption.

2. The old implementation had a special `case` clause to filter out
   refs/*/HEAD. I basically did this because I didn't realize where
   refs/remotes/origin/HEAD was coming from before in my own repos.
   GitHub sets this when you switch its "default branch", and several
   of my projects had it set to 'develop'. The real *underlying* thing
   I think we want to avoid is just any symref in general. We can
   accomplish this easily using the builtin %(symref) atom in
   for-each-ref --format. So really, this is an algorithmic
   improvement: instead of hard-coding HEAD, we avoid listing any
   symbolic refs (on the assumption that the concrete ref will be
   listed regardless).

Finally, there was an interesting performance *regression* when I tried
this change out idly on a clone of the git.git repo.

Before:

    $ time git topics list -s
    - pu
    real 0m1.059s
    user 0m0.748s
    sys 0m0.110s

After:

    $ time git topics list -s
    - pu
    - todo
    real 0m2.822s
    user 0m2.430s
    sys 0m0.205s

I have yet to pin down what exactly is causing this. There are few refs
to loop through, so it seems to be an interesting case. My first thought
was that the commit history is very long on 'master', so checking
--merged was maybe slower in that case. However, I have not been able
to duplicate this in a vacuum.

The moral of the story, as it ever is with performance issues, is that
I'll have to keep my eyes peeled for cases that are palpably slow. In
the meantime, this commit seems to give a substantial improvement to my
current real-world examples.
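For reference, the --contains workaround discussed under point 1 might look roughly like the sketch below. It is only an illustration: the function names `mainline_root` and `branches_sharing_root` are made up here and are not part of git-topics, and it leans on the same `tail` hack mentioned above, so it picks an arbitrary root when the mainline has more than one.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of git-topics): list local branches that
# share history with a mainline branch, which leaves out orphan branches.

# Print one root commit reachable from the given branch. If the branch has
# several roots (e.g. after merging unrelated histories), this simply takes
# the last one rev-list prints, which is an arbitrary choice.
mainline_root() {
    git rev-list --max-parents=0 "$1" | tail -n 1
}

# Print the short names of local branches whose history contains that root.
branches_sharing_root() {
    local root
    root="$(mainline_root "$1")" || return 1
    test -n "$root" || return 1
    git for-each-ref --format='%(refname:short)' --contains "$root" refs/heads/
}
```

Calling `branches_sharing_root master` in a repo with an orphan 'todo' branch would list 'master' and the ordinary topics but not 'todo'. It costs one extra rev-list call per invocation, which may or may not be worth it given how rare orphan branches are.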
1 parent 414898f commit 5b65b8f

1 file changed: +66 -60 lines changed

libexec/lib/list

Lines changed: 66 additions & 60 deletions
@@ -27,40 +27,40 @@ done
 
 require_setup
 
-refname() {
-    git rev-parse --symbolic-full-name "$1" 2>/dev/null
+push() {
+    git rev-parse --symbolic-full-name "$1@{push}" 2>/dev/null
 }
 
-master_ref="$(refname "$MASTER")"
-master_pushref="$(refname "$MASTER@{push}")"
-
-develop_ref="$(refname "$DEVELOP")"
-develop_pushref="$(refname "$DEVELOP@{push}")"
+topics="
+    if test %(refname:short) != '$MASTER' &&
+       test %(refname:short) != '$DEVELOP' &&
+       test %(refname) != '$(push "$MASTER")' &&
+       test %(refname) != '$(push "$DEVELOP")' &&
+       test -z %(symref); then
+        echo %(refname:short)
+    fi
+"
 
-not_a_topic() {
-    case "$1" in
-        refs/*/HEAD) return 0 ;;
-        "$master_ref"|"$master_pushref") return 0 ;;
-        "$develop_ref"|"$develop_pushref") return 0 ;;
-        *) test -z "$(git merge-base "$MASTER" "$1")" ;;
-    esac
+topics() {
+    eval "$(xargs git for-each-ref --shell --format="$topics")"
 }
 
-on_master=()
-on_develop=()
-on_topic=()
-
-while read branch; do
-    if not_a_topic "$branch"; then
-        continue
-    elif git merge-base --is-ancestor "$branch" "$MASTER"; then
-        on_master+=("$branch")
-    elif git merge-base --is-ancestor "$branch" "$DEVELOP"; then
-        on_develop+=("$branch")
-    else
-        on_topic+=("$branch")
-    fi
-done < <(echo "${patterns[@]}" | xargs git for-each-ref --format="%(refname)")
+finished=($(
+    git for-each-ref --format="%(refname)" --merged "$MASTER" $patterns |
+    topics
+))
+
+integrated=($(
+    git for-each-ref --format="%(refname)" --merged "$DEVELOP" $patterns |
+    xargs git for-each-ref --format="%(refname)" --no-merged "$MASTER" |
+    topics
+))
+
+started=($(
+    git for-each-ref --format="%(refname)" --no-merged "$DEVELOP" $patterns |
+    xargs git for-each-ref --format="%(refname)" --no-merged "$MASTER" |
+    topics
+))
 
 case "$format" in
     long|short)
@@ -79,71 +79,77 @@ case "$format" in
 esac
 
 if test "$colorize" = "true"; then
-    header="$(git config --get-color "color.topics.header" "normal")"
-    finished="$(git config --get-color "color.topics.finished" "green")"
-    integrated="$(git config --get-color "color.topics.integrated" "yellow")"
-    started="$(git config --get-color "color.topics.started" "red")"
-    reset="$(git config --get-color "" "reset")"
+    header_color="$(git config --get-color color.topics.header normal)"
+    finished_color="$(git config --get-color color.topics.finished green)"
+    integrated_color="$(git config --get-color color.topics.integrated yellow)"
+    started_color="$(git config --get-color color.topics.started red)"
+    reset_color="$(git config --get-color "" reset)"
 fi
 
-if test "${#on_master[@]}" -ne 0; then
+if test "${#finished[@]}" -ne 0; then
     case "$format" in
         long)
-            echo "${header}Topics merged to $MASTER:"
-            echo " (use 'git topics release' to tag a new version)$reset"
+            echo "${header_color}Topics merged to $MASTER:"
+            echo " (use 'git topics release' to tag a new version)$reset_color"
             echo
-            echo "${on_master[@]}" |
-            xargs git for-each-ref --format=" ${finished}%(refname:short)$reset"
+            for topic in "${finished[@]}"; do
+                echo " $finished_color$topic$reset_color"
+            done
             echo
             ;;
         short|porcelain)
-            echo "${on_master[@]}" |
-            xargs git for-each-ref --format="* ${finished}%(refname:short)$reset"
+            for topic in "${finished[@]}"; do
+                echo "* $finished_color$topic$reset_color"
+            done
             ;;
     esac
 fi
 
-if test "${#on_develop[@]}" -ne 0; then
+if test "${#integrated[@]}" -ne 0; then
     case "$format" in
         long)
-            echo "${header}Topics merged to $DEVELOP:"
-            echo " (use 'git topics finish' to promote to $MASTER)$reset"
+            echo "${header_color}Topics merged to $DEVELOP:"
+            echo " (use 'git topics finish' to promote to $MASTER)$reset_color"
             echo
-            echo "${on_develop[@]}" |
-            xargs git for-each-ref --format=" ${integrated}%(refname:short)$reset"
+            for topic in "${integrated[@]}"; do
+                echo " $integrated_color$topic$reset_color"
+            done
             echo
             ;;
         short|porcelain)
-            echo "${on_develop[@]}" |
-            xargs git for-each-ref --format="+ ${integrated}%(refname:short)$reset"
+            for topic in "${integrated[@]}"; do
+                echo "+ $integrated_color$topic$reset_color"
+            done
             ;;
     esac
 fi
 
-if test "${#on_topic[@]}" -ne 0; then
+if test "${#started[@]}" -ne 0; then
     case "$format" in
         long)
-            echo "${header}Topics not yet merged:"
-            echo " (use 'git topics integrate' to promote to $DEVELOP)$reset"
+            echo "${header_color}Topics not yet merged:"
+            echo " (use 'git topics integrate' to promote to $DEVELOP)$reset_color"
             echo
-            echo "${on_topic[@]}" |
-            xargs git for-each-ref --format=" ${started}%(refname:short)$reset"
+            for topic in "${started[@]}"; do
+                echo " $started_color$topic$reset_color"
+            done
             echo
             ;;
         short|porcelain)
-            echo "${on_topic[@]}" |
-            xargs git for-each-ref --format="- ${started}%(refname:short)$reset"
+            for topic in "${started[@]}"; do
+                echo "- $started_color$topic$reset_color"
+            done
             ;;
     esac
 fi
 
-if test "${#on_topic[@]}" -eq 0 &&
-   test "${#on_master[@]}" -eq 0 &&
-   test "${#on_develop[@]}" -eq 0; then
+if test "${#finished[@]}" -eq 0 &&
+   test "${#integrated[@]}" -eq 0 &&
+   test "${#started[@]}" -eq 0; then
     case "$format" in
         long)
-            echo "${header}No topics found."
-            echo "Use 'git topics start' to create a new branch.$reset"
+            echo "${header_color}No topics found."
+            echo "Use 'git topics start' to create a new branch.$reset_color"
             ;;
         short|porcelain)
             ;;
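
The `--shell` plus `eval` filter introduced by this diff is easy to miss when read as a patch, so here is a minimal standalone sketch of the same idea. It is not the git-topics code: the function name `list_non_symrefs` is hypothetical, it checks a single mainline name instead of both branches and their push refs, and it assumes the mainline name contains no single quotes.

```shell
#!/usr/bin/env bash
# Sketch of the "format string as shell script" trick, with a hypothetical
# function name and one mainline branch; not the actual git-topics code.

# With --shell, for-each-ref quotes every %(...) value for the shell, so the
# formatted output for each ref is a small script that can be passed to eval.
# Usage: list_non_symrefs <mainline> [<pattern>...]
list_non_symrefs() {
    local mainline="$1"
    shift
    # Assumption: $mainline has no single quotes, since it is spliced into
    # the script text between literal quotes.
    local script="
        if test %(refname:short) != '$mainline' &&
           test -z %(symref); then
            echo %(refname:short)
        fi
    "
    # One git call emits one 'if' block per ref; eval then runs them all.
    eval "$(git for-each-ref --shell --format="$script" "$@")"
}
```

For example, `list_non_symrefs master refs/heads/` prints every local branch except 'master' and any branch that is really a symbolic ref. The filtering runs in the current shell, so the number of git processes stays constant no matter how many refs match.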
