Description
Original bug ID: 7431
Reporter: alexmarkley
Assigned to: @mshinwell
Status: closed (set by @mshinwell on 2016-12-22T07:16:05Z)
Resolution: not a bug
Priority: normal
Severity: crash
Platform: x86_64
OS: Linux
OS Version: Fedora 25
Version: 4.04.0
Category: runtime system and C interface
Bug description
When running the Unison File Synchronizer (a project written in OCaml: https://www.cis.upenn.edu/~bcpierce/unison/index.html ) against a large replica (1TB), I encounter a showstopping segfault every single time.

I have tried multiple versions of Unison, including stable versions that worked fine for me in the past and newer beta versions.

I initially tried the official Fedora builds of OCaml (4.02.3-3). When I had no success with those, I removed them from my system and built and installed OCaml 4.04.0 myself.

I finally got a good backtrace (included in this report) running OCaml 4.04.0 and Unison git master. As the backtrace shows, the segmentation fault occurred in the heap compaction phase of the OCaml garbage collector (invert_pointer_at in compact.c).

It is worth noting that I never had this problem on earlier releases of Fedora, even with the same or earlier versions of OCaml and Unison. (I'm not sure what this implies, except that the bug may actually be triggered by a lower-level system component, such as GCC or a system library.)
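
Since the crash is inside heap compaction, one way to narrow it down is to exercise compaction in isolation and, separately, to turn automatic compaction off. The following is a minimal sketch under stated assumptions, not a fix and not part of Unison: `disable_auto_compaction` and `stress_compaction` are hypothetical helper names, and the buffer size is only chosen to resemble the `caml_alloc_string (len=65497)` call in the backtrace. It relies on the documented behaviour of the standard `Gc` module that compaction is never triggered automatically when `max_overhead` is at least 1000000.

```ocaml
(* Hypothetical helper: switch off automatic heap compaction.
   The Gc documentation states that with max_overhead >= 1000000
   compaction is never triggered. *)
let disable_auto_compaction () =
  Gc.set { (Gc.get ()) with Gc.max_overhead = 1_000_000 }

(* Hypothetical helper: allocate [rounds] buffers of [len] bytes (len >= 1),
   keep only every other one so the heap ends up with holes, force a
   compaction, and check that the surviving buffers are still intact. *)
let stress_compaction ~rounds ~len =
  let keep = ref [] in
  for i = 1 to rounds do
    let s = Bytes.make len (Char.chr (i land 0xff)) in
    (* Dropping the odd-numbered buffers leaves gaps for the compactor. *)
    if i land 1 = 0 then keep := (i, s) :: !keep
  done;
  Gc.compact ();
  List.for_all
    (fun (i, s) ->
      let c = Char.chr (i land 0xff) in
      Bytes.length s = len
      && Bytes.get s 0 = c
      && Bytes.get s (len - 1) = c)
    !keep
```

If a run with automatic compaction disabled no longer crashes, that points at the compaction path rather than at the transfer code. For an already built binary such as Unison, the same setting is documented as the `O` parameter of `OCAMLRUNPARAM` (for example `OCAMLRUNPARAM='O=1000000'`), which avoids changing any source.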
Related bug reports:
https://bugzilla.redhat.com/show_bug.cgi?id=1401759
Steps to reproduce
NOTE: These steps may only reproduce the issue if the client is running Fedora 25 on x86_64 and both OCaml and Unison were built on that machine.

1. Create a large, complicated dataset on the server for Unison to synchronize. Ideally it is over 1TB in size and takes over 2 hours to transfer.
2. Perform a synchronization between the client and the server in which the majority of the data must be transferred from the server to the client. (This mimics the initial synchronization of a new hub/spoke node.)
3. Observe that the client fails to synchronize the entire dataset and is terminated with SIGSEGV.
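
The dataset creation step could be scripted roughly as follows. This is only a sketch: `make_dataset`, its parameters, and the `dirNNNN/fileNNNN.bin` layout are hypothetical and do not describe the data that actually triggered the crash. It also assumes OCaml 4.12 or later for `Sys.mkdir`, so it would not build with the 4.04.0 compiler used in this report.

```ocaml
(* Hypothetical helper: populate [root] with [dirs] directories, each holding
   [files_per_dir] files of [file_size] pseudo-random bytes.
   Returns the total number of bytes written.
   Assumes OCaml >= 4.12 (Sys.mkdir). *)
let make_dataset ~root ~dirs ~files_per_dir ~file_size =
  if not (Sys.file_exists root) then Sys.mkdir root 0o755;
  let total = ref 0 in
  for d = 0 to dirs - 1 do
    let dir = Filename.concat root (Printf.sprintf "dir%04d" d) in
    if not (Sys.file_exists dir) then Sys.mkdir dir 0o755;
    for f = 0 to files_per_dir - 1 do
      let path = Filename.concat dir (Printf.sprintf "file%04d.bin" f) in
      (* Random contents, so the files are not trivially compressible. *)
      let data = Bytes.init file_size (fun _ -> Char.chr (Random.int 256)) in
      let oc = open_out_bin path in
      output_bytes oc data;
      close_out oc;
      total := !total + file_size
    done
  done;
  !total
```

Scaling `dirs`, `files_per_dir`, and `file_size` up gives a replica of whatever size is needed; whether a synthetic tree like this reproduces the segfault is untested.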
Additional information
===SNIP===
/home/alex/Temp/galculator-2.1.3/intltool-extract.in has already been transferred
/home/alex/Temp/galculator-2.1.3/intltool-merge.in has already been transferred
/home/alex/Temp/galculator-2.1.3/intltool-update.in has already been transferred
33% 100:25 ETA
Program received signal SIGSEGV, Segmentation fault.
0x00000000004ec76b in invert_pointer_at (p=p@entry=0x7fffd38c7b28) at compact.c:90
90 compact.c: No such file or directory.
(gdb) thread apply all bt full
Thread 1 (process 19298):
#0 0x00000000004ec76b in invert_pointer_at (p=p@entry=0x7fffd38c7b28) at compact.c:90
val = 140736742586384
hp = 0x7461705f77617264
q = 140736742586416
#1 0x00000000004ec90c in do_compaction () at compact.c:228
q = <optimized out>
i = <optimized out>
sz = 6
t = <optimized out>
infixes = <optimized out>
p = 0x7fffd38c7b10
ch = 0x7fffbf2fc000 "\363\273M"
chend = 0x7ffff09f1000 ""
#2 0x00000000004ecdea in caml_compact_heap () at compact.c:426
target_wsz = <optimized out>
live = <optimized out>
#3 0x00000000004ed24a in caml_compact_heap_maybe () at compact.c:547
fw = <optimized out>
fp = 170.748871
#4 0x00000000004daf4a in caml_major_collection_slice (howmuch=howmuch@entry=-1) at major_gc.c:785
p = 0.0043600637275738388
dp = <optimized out>
filt_p = 0.0043600637275738388
spend = <optimized out>
computed_work = 1522479
i = <optimized out>
#5 0x00000000004dbedf in caml_gc_dispatch () at minor_gc.c:463
trigger = <optimized out>
#6 0x00000000004dbf77 in caml_check_urgent_gc (extra_root=<optimized out>) at minor_gc.c:482
caml__frame = 0x0
caml__roots_extra_root = {next = 0x0, ntables = 1, nitems = 1, tables = {0x7fffffffd758, 0x7fffffffd870, 0x4dc96a <caml_alloc_shr+170>, 0x22, 0x7fff9d02f6b0}}
#7 0x00000000004dcfe5 in caml_alloc_string (len=65497) at alloc.c:103
result = <optimized out>
offset_index = <optimized out>
wosize = 8188
#8 0x000000000047205c in camlBytearray__sub_1422 () at /root/unison-git/src/bytearray.ml:63
No locals.
#9 0x0000000000447812 in camlTransfer__receiveRec_1568 () at /root/unison-git/src/transfer.ml:295
No locals.
#10 0x0000000000427cef in camlCopy__decompr_2936 () at /root/unison-git/src/transfer.ml:304
No locals.
#11 0x0000000000426bca in camlCopy__fun_3367 () at /root/unison-git/src/copy.ml:401
No locals.
#12 0x000000000046cc11 in camlUtil__convertUnixErrorsToExn_1955 () at /root/unison-git/src/ubase/util.ml:170
No locals.
#13 0x000000000043f46a in camlRemote__processStream_2291 () at /root/unison-git/src/remote.ml:664
No locals.
#14 0x000000000043fe26 in camlRemote__fun_4468 () at /root/unison-git/src/remote.ml:732
No locals.
#15 0x0000000000464e4d in camlLwt__apply_1225 () at /root/unison-git/src/lwt/lwt.ml:75
No locals.
#16 0x000000000046510e in camlLwt__fun_1451 () at /root/unison-git/src/lwt/lwt.ml:94
No locals.
#17 0x000000000048d101 in camlList__iter_1252 () at list.ml:77
No locals.
#18 0x0000000000464b2e in camlLwt__restart_1211 () at /root/unison-git/src/lwt/lwt.ml:31
No locals.
#19 0x000000000046182e in camlLwt_unix_impl__fun_2430 () at /root/unison-git/src/lwt/generic/lwt_unix_impl.ml:153
No locals.
#20 0x000000000048d101 in camlList__iter_1252 () at list.ml:77
No locals.
#21 0x0000000000461671 in camlLwt_unix_impl__run_1579 () at /root/unison-git/src/lwt/generic/lwt_unix_impl.ml:148
No locals.
#22 0x000000000040e80a in camlUitext__doTransport_1863 () at /root/unison-git/src/uitext.ml:490
No locals.
#23 0x000000000040f84e in camlUitext__doit_1922 () at /root/unison-git/src/uitext.ml:556
No locals.
#24 0x0000000000410034 in camlUitext__synchronizeOnce_1968 () at /root/unison-git/src/uitext.ml:718
No locals.
#25 0x000000000041094a in camlUitext__loop_2237 () at /root/unison-git/src/uitext.ml:788
No locals.
#26 0x0000000000410b4d in camlUitext__synchronizeUntilDone_2242 () at /root/unison-git/src/uitext.ml:810
No locals.
#27 0x0000000000410df7 in camlUitext__start_2249 () at /root/unison-git/src/uitext.ml:870
No locals.
#28 0x00000000004085fa in camlMain__Body_1550 () at /root/unison-git/src/main.ml:241
No locals.
#29 0x0000000000407a93 in camlLinktext__entry () at /root/unison-git/src/linktext.ml:19
No locals.
#30 0x0000000000404369 in caml_program ()
No symbol table info available.
#31 0x00000000004ef12e in caml_start_program ()
No symbol table info available.
#32 0x00000000004ef475 in caml_main (argv=0x7fffffffdca8) at startup.c:145
exe_name = <optimized out>
proc_self_exe = "/usr/local/bin/unison", '\000' <repeats 234 times>
res = <optimized out>
tos = 0 '\000'
#33 0x0000000000403c5c in main (argc=<optimized out>, argv=<optimized out>) at main.c:37
No locals.
(gdb)
===SNIP===