Skip to content

Teach the builtin rebase about the builtin interactive rebase #23

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

dscho
Copy link
Member

@dscho dscho commented Aug 22, 2018

The builtin rebase and the builtin interactive rebase have been developed independently, on purpose: Google Summer of Code rules specifically state that students have to work on independent projects, they cannot collaborate on the same project.

The reason is probably the very fine tradition in academia to prohibit teamwork, which makes grading easier (at the expense of not exactly preparing the students for the real world, unless they want to stay in academia).

One fallout is that the rebase-in-c and rebase-i-in-c patches cause no merge conflicts but a royal number of tests in the test suite to fail.

It is easy to explain why: rebase-in-c was developed under the assumption that all rebase backends are implemented in Unix shell script and can be sourced via . git-rebase--<backend>, which is no longer true with rebase-i-in-c, where git-rebase--interactive is a hard-linked builtin.

This patch fixes that.

Note: while this patch targets pk/rebase-in-c-6-final, it will not work correctly without ag/rebase-i-in-c. So my suggestion is to rewrite the pk/rebas-in-c-6-final branch by first merging ag/rebase-i-in-c, then applying this here patch, and only then cherry-pick "rebase: default to using the builtin rebase".

Changes since v2:

  • Prepare for the break command, by skipping the call to finish_rebase() for interactive rebases altogether (the built-in interactive rebase already takes care of that).
  • Remove a no-longer-used case arm (skipped by the newly-introduced code).

Changes since v1:

  • replaced the too-terse commit message by a copy-edited version of this cover letter (leaving out only the rant about disallowing teamwork).

@dscho
Copy link
Member Author

dscho commented Aug 22, 2018

For lurkers: the test failures are to be expected because of this:

Note: while this patch targets pk/rebase-in-c-6-final, it will not work correctly without ag/rebase-i-in-c. So my suggestion is to rewrite the pk/rebas-in-c-6-final branch by first merging ag/rebase-i-in-c, then applying this here patch, and only then cherry-pick "rebase: default to using the builtin rebase".

@dscho
Copy link
Member Author

dscho commented Aug 22, 2018

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 22, 2018

Submitted as [email protected]

@dscho dscho force-pushed the pk/rebase-in-c-6-final branch from a5bb234 to 73b2570 Compare August 22, 2018 22:23
@dscho dscho force-pushed the rebase-in-c-6-final branch from 29d4981 to 5403014 Compare August 29, 2018 14:28
@dscho
Copy link
Member Author

dscho commented Aug 29, 2018

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Aug 29, 2018

Submitted as [email protected]

agrn added 4 commits August 29, 2018 13:38
This rewrites the part of interactive rebase which initializes the
basic state, make the script and complete the action, as a buitin, named
git-rebase--interactive2 for now.  Others modes (`--continue`,
`--edit-todo`, etc.) will be rewritten in the next commit.

git-rebase--interactive.sh is modified to call git-rebase--interactive2
instead of git-rebase--helper.

Signed-off-by: Alban Gruin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
This rewrites the submodes of interactive rebase (`--continue`,
`--skip`, `--edit-todo`, and `--show-current-patch`) in C.

git-rebase.sh is then modified to call directly git-rebase--interactive2
instead of git-rebase--interactive.sh.

Signed-off-by: Alban Gruin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
This removes git-rebase--interactive.sh, as its functionnality has been
replaced by git-rebase--interactive2.

git-rebase--interactive2.c is then renamed to git-rebase--interactive.c.

Signed-off-by: Alban Gruin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
This moves the rebase--helper modes still used by
git-rebase--preserve-merges.sh (`--shorten-ids`, `--expand-ids`,
`--check-todo-list`, `--rearrange-squash` and `--add-exec-commands`) to
rebase--interactive.c.

git-rebase--preserve-merges.sh is modified accordingly, and
rebase--helper.c is removed as it is useless now.

Signed-off-by: Alban Gruin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
@dscho dscho force-pushed the pk/rebase-in-c-6-final branch from 73b2570 to 6582824 Compare August 29, 2018 22:36
prertik and others added 17 commits September 6, 2018 11:56
This commit introduces support for `--gpg-sign` option which is used
to GPG-sign commits.

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
This commit converts more code from the shell script version to the
builtin rebase. In this instance, we just have to be careful to
keep support for passing multiple `--whitespace` options, as the
shell script version does so, too.

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
To support `--autostash` we introduce a function `apply_autostash()`
just like in `git-legacy-rebase.sh`.

Rather than refactoring and using the same function that exists in
`sequencer.c`, we go a different route here, to avoid clashes with
the sister GSoC project that turns the interactive rebase into a
builtin.

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
This commit adds support for the `--exec` option which takes a shell
command-line as argument. This argument will be appended as an `exec
<cmd>` command after each line in the todo list that creates a commit in
the final history.  commands.

Note: while the shell script version of `git rebase` assigned the empty
string to `cmd` by default, we *unset* it here because the code looks
nicer and it does not change the behavior.

The `--exec` option requires `--interactive` machinery.

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
This commit introduces the `--allow-empty-message` option to
`builtin/rebase.c`. The motivation behind this option is: if there are
empty messages (which is not allowed in Git by default, but can be
imported from different version control systems), the rebase will fail.

Using `--allow-empty-message` overrides that behaviour which will allow
the commits having empty messages to continue in rebase operation.

Note: a very recent change made this the default in the shell scripted
`git rebase`, therefore the builtin rebase does the same.

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
The mode to rebase non-linear branches is now supported by the builtin
rebase, too.

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
We need this functionality in the builtin rebase.

Note: to make this function truly reusable, we have to switch the call
get_merges_many_dirty() to get_merges_many() because we want the commit
flags to be reset (otherwise, subsequent get_merge_bases() calls would
obtain incorrect results). This did not matter when the function was
called in `git rev-parse --fork-point` because in that command, the
process definitely did not traverse any commits before exiting.

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
This commit adds support for `--fork-point` and `--no-fork-point`.
This is converted as-is from `git-legacy-rebase.sh`.

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
When running a rebase in non-am mode, it uses the recursive merge to
cherry-pick the commits, and the rebase command allows to configure
the merge strategy to be used in this operation.

This commit adds that support to the builtin rebase.

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
This option allows to rebase entire histories up to, and including, the
root commit.

The conversion from the shell script is straight-forward, apart from
the fact that we do not have to write an empty tree in C.

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
The `git rebase` command, when called without the `<upstream>`
command-line argument, automatically looks for the upstream
branch configured for the current branch.

With this commit, the builtin rebase learned that trick, too.

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
In the next patch, we will make use of that in the code that
fast-forwards to `onto` whenever possible.

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
When trying to rebase onto a direct descendant of HEAD, we can
take a shortcut and fast-forward instead. This commit makes it so.

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
In this commit, we pass `--progress` to the `format-patch` command if
stderr is connected to an interactive terminal, unless we're in quiet
mode.

This `--progress` option will be used in `format-patch` to show progress
reports on stderr as patches are generated.

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Some options are only handled by the git-rebase--interactive backend,
even if run non-interactively. For this awkward situation (run
non-interactively, but use the interactive backend), the shell scripted
version of `git rebase` introduced the concept of an "implied
interactive rebase". All it does is to replace the editor by a dummy one
(`:` is the Unix command that takes arbitrary command-line parameters,
ignores them and simply exits with success).

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
While working on the GSoC project to convert the rebase command to a
builtin, the rebase command learned to error out on certain command-line
option combinations that cannot work, such as --whitespace=fix with
--interactive.

This commit converts that code.

Signed-off-by: Pratik Karki <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
…ase-i-in-c

* ag/rebase-i-in-c:
  rebase -i: move rebase--helper modes to rebase--interactive
  rebase -i: remove git-rebase--interactive.sh
  rebase--interactive2: rewrite the submodes of interactive rebase in C
  rebase -i: implement the main part of interactive rebase as a builtin
  rebase -i: rewrite init_basic_state() in C
  rebase -i: rewrite write_basic_state() in C
  rebase -i: rewrite the rest of init_revisions_and_shortrevisions() in C
  rebase -i: implement the logic to initialize $revisions in C
  rebase -i: remove unused modes and functions
  rebase -i: rewrite complete_action() in C
  t3404: todo list with commented-out commands only aborts
  sequencer: change the way skip_unnecessary_picks() returns its result
  sequencer: refactor append_todo_help() to write its message to a buffer
  rebase -i: rewrite checkout_onto() in C
  rebase -i: rewrite setup_reflog_action() in C
  sequencer: add a new function to silence a command, except if it fails
  rebase -i: rewrite the edit-todo functionality in C
  editor: add a function to launch the sequence editor
  rebase -i: rewrite append_todo_help() in C
  sequencer: make three functions and an enum from sequencer.c public
@dscho dscho force-pushed the pk/rebase-in-c-6-final branch from 6582824 to 53f745a Compare September 6, 2018 21:48
The builtin rebase and the builtin interactive rebase have been
developed independently, on purpose: Google Summer of Code rules
specifically state that students have to work on independent projects,
they cannot collaborate on the same project.

One fallout is that the rebase-in-c and rebase-i-in-c patches cause no
merge conflicts but a royal number of tests in the test suite to fail.

It is easy to explain why: rebase-in-c was developed under the
assumption that all rebase backends are implemented in Unix shell script
and can be sourced via `. git-rebase--<backend>`, which is no longer
true with rebase-i-in-c, where git-rebase--interactive is a hard-linked
builtin.

This patch fixes that.

Please note that we also skip the finish_rebase() call for interactive
rebases because the built-in interactive rebase already takes care of
that. This is needed to support the upcoming `break` command that wants
to interrupt the rebase with exit code 0 (and naturally wants to keep
the state directory intact when doing so).

While at it, remove the `case` arm for the interactive rebase that is
now skipped in favor of the short-cut to the built-in rebase.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho dscho force-pushed the rebase-in-c-6-final branch from 5403014 to db1652e Compare October 5, 2018 15:47
@dscho
Copy link
Member Author

dscho commented Oct 5, 2018

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 5, 2018

Submitted as [email protected]

@dscho dscho force-pushed the pk/rebase-in-c-6-final branch 2 times, most recently from c8b4c4b to 5541bd5 Compare October 11, 2018 08:37
Copy link
Member Author

dscho commented Nov 2, 2018

This branch is now known as js/rebase-in-c-5.5-work-with-rebase-i-in-c.

Copy link
Member Author

dscho commented Nov 2, 2018

This patch series was integrated into pu via git@c67e5b4.

Copy link
Member Author

dscho commented Nov 2, 2018

This patch series was integrated into next via git@b734d81.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link
Member Author

dscho commented Nov 5, 2018

This patch series was integrated into master via git@4b3517e.

@dscho dscho added the master label Nov 5, 2018 — with GitGitGadget
@dscho dscho closed this Nov 5, 2018
Copy link
Member Author

dscho commented Nov 5, 2018

Closed via 4b3517e.

@dscho dscho deleted the rebase-in-c-6-final branch November 5, 2018 21:10
dscho pushed a commit that referenced this pull request Jan 17, 2023
It is possible to trigger an integer overflow when parsing attribute
names when there are more than 2^31 of them for a single pattern. This
can either lead to us dying due to trying to request too many bytes:

     blob=$(perl -e 'print "f" . " a=" x 2147483649' | git hash-object -w --stdin)
     git update-index --add --cacheinfo 100644,$blob,.gitattributes
     git attr-check --all file

    =================================================================
    ==1022==ERROR: AddressSanitizer: requested allocation size 0xfffffff800000032 (0xfffffff800001038 after adjustments for alignment, red zones etc.) exceeds maximum supported size of 0x10000000000 (thread T0)
        #0 0x7fd3efabf411 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77
        #1 0x5563a0a1e3d3 in xcalloc wrapper.c:150
        #2 0x5563a058d005 in parse_attr_line attr.c:384
        #3 0x5563a058e661 in handle_attr_line attr.c:660
        #4 0x5563a058eddb in read_attr_from_index attr.c:769
        #5 0x5563a058ef12 in read_attr attr.c:797
        #6 0x5563a058f24c in bootstrap_attr_stack attr.c:867
        #7 0x5563a058f4a3 in prepare_attr_stack attr.c:902
        #8 0x5563a05905da in collect_some_attrs attr.c:1097
        #9 0x5563a059093d in git_all_attrs attr.c:1128
        #10 0x5563a02f636e in check_attr builtin/check-attr.c:67
        #11 0x5563a02f6c12 in cmd_check_attr builtin/check-attr.c:183
        #12 0x5563a02aa993 in run_builtin git.c:466
        #13 0x5563a02ab397 in handle_builtin git.c:721
        #14 0x5563a02abb2b in run_argv git.c:788
        #15 0x5563a02ac991 in cmd_main git.c:926
        #16 0x5563a05432bd in main common-main.c:57
        #17 0x7fd3ef82228f  (/usr/lib/libc.so.6+0x2328f)

    ==1022==HINT: if you don't care about these errors you may set allocator_may_return_null=1
    SUMMARY: AddressSanitizer: allocation-size-too-big /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77 in __interceptor_calloc
    ==1022==ABORTING

Or, much worse, it can lead to an out-of-bounds write because we
underallocate and then memcpy(3P) into an array:

    perl -e '
        print "A " . "\rh="x2000000000;
        print "\rh="x2000000000;
        print "\rh="x294967294 . "\n"
    ' >.gitattributes
    git add .gitattributes
    git commit -am "evil attributes"

    $ git clone --quiet /path/to/repo
    =================================================================
    ==15062==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000002550 at pc 0x5555559884d5 bp 0x7fffffffbc60 sp 0x7fffffffbc58
    WRITE of size 8 at 0x602000002550 thread T0
        #0 0x5555559884d4 in parse_attr_line attr.c:393
        #1 0x5555559884d4 in handle_attr_line attr.c:660
        #2 0x555555988902 in read_attr_from_index attr.c:784
        #3 0x555555988902 in read_attr_from_index attr.c:747
        #4 0x555555988a1d in read_attr attr.c:800
        #5 0x555555989b0c in bootstrap_attr_stack attr.c:882
        #6 0x555555989b0c in prepare_attr_stack attr.c:917
        #7 0x555555989b0c in collect_some_attrs attr.c:1112
        #8 0x55555598b141 in git_check_attr attr.c:1126
        #9 0x555555a13004 in convert_attrs convert.c:1311
        #10 0x555555a95e04 in checkout_entry_ca entry.c:553
        #11 0x555555d58bf6 in checkout_entry entry.h:42
        #12 0x555555d58bf6 in check_updates unpack-trees.c:480
        #13 0x555555d5eb55 in unpack_trees unpack-trees.c:2040
        #14 0x555555785ab7 in checkout builtin/clone.c:724
        #15 0x555555785ab7 in cmd_clone builtin/clone.c:1384
        #16 0x55555572443c in run_builtin git.c:466
        #17 0x55555572443c in handle_builtin git.c:721
        #18 0x555555727872 in run_argv git.c:788
        #19 0x555555727872 in cmd_main git.c:926
        #20 0x555555721fa0 in main common-main.c:57
        #21 0x7ffff73f1d09 in __libc_start_main ../csu/libc-start.c:308
        #22 0x555555723f39 in _start (git+0x1cff39)

    0x602000002552 is located 0 bytes to the right of 2-byte region [0x602000002550,0x602000002552) allocated by thread T0 here:
        #0 0x7ffff768c037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
        #1 0x555555d7fff7 in xcalloc wrapper.c:150
        #2 0x55555598815f in parse_attr_line attr.c:384
        #3 0x55555598815f in handle_attr_line attr.c:660
        #4 0x555555988902 in read_attr_from_index attr.c:784
        #5 0x555555988902 in read_attr_from_index attr.c:747
        #6 0x555555988a1d in read_attr attr.c:800
        #7 0x555555989b0c in bootstrap_attr_stack attr.c:882
        #8 0x555555989b0c in prepare_attr_stack attr.c:917
        #9 0x555555989b0c in collect_some_attrs attr.c:1112
        #10 0x55555598b141 in git_check_attr attr.c:1126
        #11 0x555555a13004 in convert_attrs convert.c:1311
        #12 0x555555a95e04 in checkout_entry_ca entry.c:553
        #13 0x555555d58bf6 in checkout_entry entry.h:42
        #14 0x555555d58bf6 in check_updates unpack-trees.c:480
        #15 0x555555d5eb55 in unpack_trees unpack-trees.c:2040
        #16 0x555555785ab7 in checkout builtin/clone.c:724
        #17 0x555555785ab7 in cmd_clone builtin/clone.c:1384
        #18 0x55555572443c in run_builtin git.c:466
        #19 0x55555572443c in handle_builtin git.c:721
        #20 0x555555727872 in run_argv git.c:788
        #21 0x555555727872 in cmd_main git.c:926
        #22 0x555555721fa0 in main common-main.c:57
        #23 0x7ffff73f1d09 in __libc_start_main ../csu/libc-start.c:308

    SUMMARY: AddressSanitizer: heap-buffer-overflow attr.c:393 in parse_attr_line
    Shadow bytes around the buggy address:
      0x0c047fff8450: fa fa 00 02 fa fa 00 07 fa fa fd fd fa fa 00 00
      0x0c047fff8460: fa fa 02 fa fa fa fd fd fa fa 00 06 fa fa 05 fa
      0x0c047fff8470: fa fa fd fd fa fa 00 02 fa fa 06 fa fa fa 05 fa
      0x0c047fff8480: fa fa 07 fa fa fa fd fd fa fa 00 01 fa fa 00 02
      0x0c047fff8490: fa fa 00 03 fa fa 00 fa fa fa 00 01 fa fa 00 03
    =>0x0c047fff84a0: fa fa 00 01 fa fa 00 02 fa fa[02]fa fa fa fa fa
      0x0c047fff84b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff84c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff84d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff84e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff84f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
      Shadow gap:              cc
    ==15062==ABORTING

Fix this bug by using `size_t` instead to count the number of attributes
so that this value cannot reasonably overflow without running out of
memory before already.

Reported-by: Markus Vervier <[email protected]>
Signed-off-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
gitgitgadget bot pushed a commit that referenced this pull request Aug 19, 2024
It was recently reported that concurrent reads and writes may cause the
reftable backend to segfault. The root cause of this is that we do not
properly keep track of reftable readers across reloads.

Suppose that you have a reftable iterator and then decide to reload the
stack while iterating through the iterator. When the stack has been
rewritten since we have created the iterator, then we would end up
discarding a subset of readers that may still be in use by the iterator.
The consequence is that we now try to reference deallocated memory,
which of course segfaults.

One way to trigger this is in t5616, where some background maintenance
jobs have been leaking from one test into another. This leads to stack
traces like the following one:

  + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
  AddressSanitizer:DEADLYSIGNAL
  =================================================================
  ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp
0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
  ==657994==The signal is caused by a READ memory access.
      #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
      #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
      #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
      #3 0x55f23e54e72e in block_iter_next reftable/block.c:398
      #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
      #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
      #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
      #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
      #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
      #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
      #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
      #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
      #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
      #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
      #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
      #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
      #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
      #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
      #18 0x55f23dba7764 in run_builtin git.c:484
      #19 0x55f23dba7764 in handle_builtin git.c:741
      #20 0x55f23dbab61e in run_argv git.c:805
      #21 0x55f23dbab61e in cmd_main git.c:1000
      #22 0x55f23dba4781 in main common-main.c:64
      #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
      #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
      #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)

While it is somewhat awkward that the maintenance processes survive
tests in the first place, it is totally expected that reftables should
work alright with concurrent writers. Seemingly they don't.

The only underlying resource that we need to care about in this context
is the reftable reader, which is responsible for reading a single table
from disk. These readers get discarded immediately (unless reused) when
calling `reftable_stack_reload()`, which is wrong. We can only close
them once we know that there are no iterators using them anymore.

Prepare for a fix by converting the reftable readers to be refcounted.

Reported-by: Jeff King <[email protected]>
Signed-off-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
gitgitgadget bot pushed a commit that referenced this pull request Aug 22, 2024
It was recently reported that concurrent reads and writes may cause the
reftable backend to segfault. The root cause of this is that we do not
properly keep track of reftable readers across reloads.

Suppose that you have a reftable iterator and then decide to reload the
stack while iterating through the iterator. When the stack has been
rewritten since we have created the iterator, then we would end up
discarding a subset of readers that may still be in use by the iterator.
The consequence is that we now try to reference deallocated memory,
which of course segfaults.

One way to trigger this is in t5616, where some background maintenance
jobs have been leaking from one test into another. This leads to stack
traces like the following one:

  + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
  AddressSanitizer:DEADLYSIGNAL
  =================================================================
  ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp
0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
  ==657994==The signal is caused by a READ memory access.
      #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
      #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
      #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
      #3 0x55f23e54e72e in block_iter_next reftable/block.c:398
      #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
      #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
      #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
      #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
      #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
      #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
      #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
      #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
      #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
      #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
      #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
      #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
      #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
      #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
      #18 0x55f23dba7764 in run_builtin git.c:484
      #19 0x55f23dba7764 in handle_builtin git.c:741
      #20 0x55f23dbab61e in run_argv git.c:805
      #21 0x55f23dbab61e in cmd_main git.c:1000
      #22 0x55f23dba4781 in main common-main.c:64
      #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
      #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
      #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)

While it is somewhat awkward that the maintenance processes survive
tests in the first place, it is totally expected that reftables should
work alright with concurrent writers. Seemingly they don't.

The only underlying resource that we need to care about in this context
is the reftable reader, which is responsible for reading a single table
from disk. These readers get discarded immediately (unless reused) when
calling `reftable_stack_reload()`, which is wrong. We can only close
them once we know that there are no iterators using them anymore.

Prepare for a fix by converting the reftable readers to be refcounted.

Reported-by: Jeff King <[email protected]>
Signed-off-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
gitgitgadget bot pushed a commit that referenced this pull request Aug 23, 2024
It was recently reported that concurrent reads and writes may cause the
reftable backend to segfault. The root cause of this is that we do not
properly keep track of reftable readers across reloads.

Suppose that you have a reftable iterator and then decide to reload the
stack while iterating through the iterator. When the stack has been
rewritten since we have created the iterator, then we would end up
discarding a subset of readers that may still be in use by the iterator.
The consequence is that we now try to reference deallocated memory,
which of course segfaults.

One way to trigger this is in t5616, where some background maintenance
jobs have been leaking from one test into another. This leads to stack
traces like the following one:

  + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
  AddressSanitizer:DEADLYSIGNAL
  =================================================================
  ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp
0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
  ==657994==The signal is caused by a READ memory access.
      #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
      #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
      #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
      #3 0x55f23e54e72e in block_iter_next reftable/block.c:398
      #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
      #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
      #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
      #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
      #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
      #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
      #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
      #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
      #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
      #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
      #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
      #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
      #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
      #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
      #18 0x55f23dba7764 in run_builtin git.c:484
      #19 0x55f23dba7764 in handle_builtin git.c:741
      #20 0x55f23dbab61e in run_argv git.c:805
      #21 0x55f23dbab61e in cmd_main git.c:1000
      #22 0x55f23dba4781 in main common-main.c:64
      #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
      #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
      #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)

While it is somewhat awkward that the maintenance processes survive
tests in the first place, it is totally expected that reftables should
work alright with concurrent writers. Seemingly they don't.

The only underlying resource that we need to care about in this context
is the reftable reader, which is responsible for reading a single table
from disk. These readers get discarded immediately (unless reused) when
calling `reftable_stack_reload()`, which is wrong. We can only close
them once we know that there are no iterators using them anymore.

Prepare for a fix by converting the reftable readers to be refcounted.

Reported-by: Jeff King <[email protected]>
Signed-off-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants