
Enable external storage shim (extstore). #38


Merged: 2 commits merged into docker-library:master on May 30, 2018

Conversation

@pataquets (Contributor) commented May 15, 2018

Extstore info at: https://github.com/memcached/memcached/wiki/Extstore

Looks like it is maturing very quickly, and it would be worthwhile to have access to it in the Docker images.

@tianon (Member) commented May 15, 2018

Looks like we're running into a combination of memcached/memcached#319 and later memcached/memcached#374.

@tianon (Member) commented May 15, 2018

For reference, we support all of amd64, arm32v5, arm32v6, arm32v7, arm64v8, i386, ppc64le, and s390x, so I'd rather wait to merge this until the functionality is working on all supported platforms.

@dormando commented

Any chance you could test the 'next' branch? Or does it have to wait for a released version?

Got fixes for all of my buildbots so far, but I think you have a few extra platforms.

@tianon (Member) commented May 23, 2018 via email

@dormando commented May 23, 2018

Would be really useful, since I'd have to make another release if it didn't work :P
(and yes, the official builds should only track the official releases)

@tianon (Member) commented May 23, 2018

Found some time tonight, so I'm running lots of make test now -- will report back results across all the arches listed above as soon as I've got them. 👍

@tianon (Member) commented May 23, 2018

key:

  • ✔️ success
  • ❌ failure

arch    | alpine | debian
--------|--------|-------
amd64   | ✔️     | ✔️
arm32v5 | N/A    |
arm32v6 |        | N/A
arm32v7 | N/A    | ✔️
arm64v8 | ❌??   | ❌??
i386    | ✔️     | ✔️
ppc64le | ✔️     | ✔️
s390x   |        |

https://gist.github.com/tianon/d3aa50500e22cf8686a08ae4fa2cbae1
(full build/test logs for ~every arch; see below -- git clone https://gist.github.com/d3aa50500e22cf8686a08ae4fa2cbae1.git xxx is probably the easiest way to consume/review given the length)

I'm confused about the arm64v8 failures, since memcached/memcached@f939a84 should've covered that 😕 (but my assembly-foo is admittedly weak -- only ever really did z80 in any serious capacity 😅).

also:

remote: error: File s390x-debian.log is 129.47 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: File s390x-alpine.log is 129.44 MB; this exceeds GitHub's file size limit of 100.00 MB

There's one extstore-specific test that, when it fails, appears to spew an absolutely massive amount of test data -- I'm not sure how to share this massive log at the moment, or what to trim to get it down to a useful size without cutting useful data; I think the offending test is t/chunked-extstore.t. The arm32v6-alpine.log file looks pretty similar to the s390x files (same spewing output, but it happened to be just barely under GitHub's 100MB limit, so it's included in the gist).

Now, if your response to this is that we need to trim the list of architectures we support, that's fair too (we're not opposed to doing that). However, the current image (which includes make test) does build successfully on all these combinations, so IMO it'd be a shame to drop any (but again, we're totally willing to if that's your preference). ❤️

(Also, in the future, feel free to ping me directly on anything you'd like either multiarch or Docker tested before release -- always happy to help out where we can!! ❤️ ❤️)

@dormando commented

whaaat, which arm64v8 is that that doesn't have the crc32c instruction? Is there a -march or cross-compile issue?

the v5 failure doesn't seem to have anything to do with extstore but the v6 one looks crazy weird.

I get it though; the patch from qualcomm wasn't good enough for this. If the CPU instruction isn't known to the compiler, it won't build, which is usually dealt with by replacing the symbol with the opcode. I can also add configure-time tests, which is a pain... but that arm64 really should have that instruction; my rpi3 does. Running tests on rpi2 now.

I bet chunked on the arm32s is something else... but repro'ing this might be a pain if my rpi2 won't do it (it's running tests now... was too lazy to plug it in).

Is it possible to mask extstore for arm/390 but leave other options for the primary builds? Extstore doesn't make a ton of sense on small platforms, but it would on proper aarch64 server boards with lots of disk... those should build just fine with the crc instruction.

On primary platforms extstore can benefit people now; it'd suck to hold back its progress (but also to lose any small platforms for non-extstore purposes).
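
For illustration only (this is not memcached's actual source), the usual compile-time shape of that gating looks like the sketch below: the hardware CRC32C path sits behind the compiler's __ARM_FEATURE_CRC32 macro (defined when building with -march=armv8-a+crc), with a plain-C fallback otherwise, so a build that doesn't enable the extension takes the software path instead of failing at assembly time.

```c
/* Sketch, not memcached's code. Callers typically start with crc = 0xFFFFFFFF
 * and invert the final result. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>   /* __crc32cd / __crc32cb intrinsics */

static uint32_t crc32c_update(uint32_t crc, const unsigned char *buf, size_t len) {
    while (len >= 8) {              /* 8 bytes at a time via crc32cx */
        uint64_t v;
        memcpy(&v, buf, sizeof v);
        crc = __crc32cd(crc, v);
        buf += 8;
        len -= 8;
    }
    while (len--)                   /* byte-at-a-time tail */
        crc = __crc32cb(crc, *buf++);
    return crc;
}
#else
/* Portable bitwise fallback, reflected Castagnoli polynomial 0x82F63B78. */
static uint32_t crc32c_update(uint32_t crc, const unsigned char *buf, size_t len) {
    while (len--) {
        crc ^= *buf++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0x82F63B78u : 0u);
    }
    return crc;
}
#endif

int main(void) {
    const unsigned char msg[] = "123456789";
    uint32_t crc = ~crc32c_update(0xFFFFFFFFu, msg, 9);
    printf("crc32c(\"123456789\") = 0x%08x\n", crc);  /* expected: 0xe3069283 */
    return 0;
}
```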

@tianon (Member) commented May 23, 2018

The arm64 box I was testing on is a HiSilicon Hi1616 chip (64 cores), graciously provided to us by the WorksOnArm project, so it's definitely a server chip. We're compiling directly on that chip natively (no cross-compile, emulation, or anything like that), which makes it even more odd that it didn't work.

I've also got access to an 8-core AppliedMicro XGene1, but it fails with a similar error:

/tmp/ccJlGhka.s: Assembler messages:
/tmp/ccJlGhka.s:156: Error: selected processor does not support `crc32cx w0,w0,x3'
/tmp/ccJlGhka.s:207: Error: selected processor does not support `crc32cb w0,w0,w2'
/tmp/ccJlGhka.s:217: Error: selected processor does not support `crc32ch w0,w0,w2'
/tmp/ccJlGhka.s:227: Error: selected processor does not support `crc32cw w0,w0,w1'
/tmp/ccJlGhka.s:241: Error: selected processor does not support `crc32cb w0,w0,w2'
/tmp/ccJlGhka.s:251: Error: selected processor does not support `crc32ch w0,w0,w1'
/tmp/ccJlGhka.s:264: Error: selected processor does not support `crc32cw w0,w0,w2'
/tmp/ccJlGhka.s:274: Error: selected processor does not support `crc32cb w0,w0,w1'

So maybe this instruction isn't well-supported across the breadth of arm64 chips?

I'm happy to re-test next on arm32v5 with extstore disabled again (the latest release works fine), and for the real builds we can definitely be choosy about which arches use this flag and which don't.

Regarding the IBM s390x mainframe stuff, I'll give a poke to some contacts I've got on that team and see if they've got any ideas (but for now, just planning to exclude extstore on s390x and arm32 is totally sane).

@dormando commented May 23, 2018

what platform is the arm32v5?

I haven't looked at the s390x failures, but I have no way of developing on one and am not sure of any users, certainly not extstore. I'd prefer to put my efforts to making sure arm works better. If it runs fine without extstore, please stick with that :)

All I have is a couple of RPis though... one of the tests is flaky under 32bit mode, but they all do pass. I have an rpi2 in pure 32bit, and an rpi3 with a (painfully built) 64bit kernel running aarch64... also, the patch I got which does this came from qualcomm, but I'm unsure what exactly they tested on.

I'll do some blind googling and see if I can figure this out.

(also, thanks so much for testing! It's really hard to get access to these exotic platforms...)

@dormando commented

How much RAM do the armv5 and v6 platforms have?

Looks like some of my tests are flaky under 32bit mode, as they try to fill the disk and expect the page layout to end up in specific forms. Staring hard at the failures, that might just be what's happening for some of these.

armv5 failing the normal chunked-items.t (not even extstore) might also be similar. Unfortunately, the test platform makes it hard to tell whether the daemon just died during the test or gave invalid output.

re: v8, grep crc /proc/cpuinfo should say if that extension exists. I've been reading online that -march=armv8-a+crc is necessary to ensure the CRC instructions get built, but I don't see that being passed on my rpi3 and it's compiling fine. What version of GCC is on your v8 platforms?

I'll have to think about how to determine via configure if armv8-a+crc should be forced :/ might be as simple as "if I force this march and it works", but I have a deep fear of that accidentally compiling something that won't run if the target arch is actually v7 or something.
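
For illustration only (not memcached's configure machinery), one way such a check could avoid that trap is to compile a tiny probe with -march=armv8-a+crc and then run it (e.g. via something like autoconf's AC_RUN_IFELSE): if the toolchain accepts the flag but the build CPU lacks the extension, the probe dies with SIGILL and the check fails, rather than silently enabling code that won't run. A run-test like this only protects native builds, though; cross-compiles would still need an explicit switch.

```c
/* conftest.c: hypothetical configure probe, compiled with
 * "-march=armv8-a+crc" and then executed. Compiling proves the toolchain
 * knows the CRC instructions; running proves the build CPU actually has them
 * (otherwise the process dies with SIGILL and the check fails). */
#include <arm_acle.h>
#include <stdint.h>

int main(void) {
    volatile uint32_t crc = __crc32cw(0xFFFFFFFFu, 0x12345678u);
    (void)crc;
    return 0;
}
```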

@dormando commented

Alright, looks like I goofed my cross compile on the rpi3: it wasn't actually building with aarch64. You probably do need the -march to get the silly thing to compile.

For now, I've added a "--enable-arm-crc32" which gates the instructions for those who're familiar enough to build it properly. This is pushed to 'next' and you should try again with --enable-extstore but without --enable-arm-crc32.

The other platforms I can look at once you let me know how much RAM they have. I've been testing builds with 512mb. On the other hand, armv5 might be too old for me to want to really pursue.

@tianon (Member) commented May 24, 2018

> what platform is the arm32v5?

We build them on that same HiSilicon chip (which has ~125GB of RAM 😅), although I've got a real one here at home (the "Pogoplug") that has 256MB of RAM. Definitely 100% understand if you want to avoid wasting more cycles on this arch (it's very much on the way out -- it generally has no hardware floating-point unit, so it's about as speedy as you'd imagine).

> I haven't looked at the s390x failures, but I have no way of developing on one and am not sure of any users, certainly not extstore. I'd prefer to put my efforts to making sure arm works better. If it runs fine without extstore, please stick with that :)

Yeah, I put out a feeler to IBM, but haven't seen a response so for now we'll just go extstore-less there. 👍

> ... rpi3 with a (painfully built) 64bit kernel running aarch64 ...

Ouch. I use https://wiki.debian.org/RaspberryPi3 on mine, which has worked pretty well (arm64 Debian).

> (also, thanks so much for testing! It's really hard to get access to these exotic platforms...)

Amen -- so, so true. 😞

> How much RAM do the armv5 and v6 platforms have?

Ours is a special case (we build on the HiSilicon), but AFAIK, in the real world v6 is really only the RPi 1 and RPi Zero -- even the RPi 2 is a v7 chip (and the 3 is a 64bit v8, as you know).

> grep crc /proc/cpuinfo

HiSilicon machine:

$ grep crc /proc/cpuinfo | sort -u
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
$ 

XGene1:

$ grep crc /proc/cpuinfo | sort -u
$ grep '^Features' /proc/cpuinfo | sort -u
Features	: fp asimd evtstrm
$ 

😕 😞

Is there any way to do runtime detection on arm64 instead? (I mean, you could scrape /proc/cpuinfo, but that seems a little hacky, and probably would eat some of the speed benefits of using the native instruction?)
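
(For reference, and not memcached's code: on Linux/AArch64 the usual way to do runtime detection without scraping /proc/cpuinfo is the ELF auxiliary vector, checked once and cached, so the per-call overhead is negligible. A minimal sketch, assuming glibc or musl plus the kernel's hwcap header:)

```c
/* Minimal sketch (not memcached's code): one-time runtime detection of the
 * AArch64 CRC32 extension via the ELF auxiliary vector -- no /proc/cpuinfo
 * parsing, and after the first call it's just a cached flag check. */
#include <stdio.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_CRC32 */

static int cpu_has_crc32(void) {
    static int cached = -1;   /* -1 = not probed yet */
    if (cached < 0)
        cached = (getauxval(AT_HWCAP) & HWCAP_CRC32) ? 1 : 0;
    return cached;
}
#else
static int cpu_has_crc32(void) { return 0; }
#endif

int main(void) {
    printf("crc32 extension: %s\n", cpu_has_crc32() ? "yes" : "no");
    return 0;
}
```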

> What version of GCC is on your v8 platforms?

That'd be https://pkgs.alpinelinux.org/package/v3.7/main/aarch64/gcc and https://packages.debian.org/stretch/gcc, so 6.4.0 on Alpine and 6.3.0 on Debian.

> This is pushed to 'next' and you should try again with --enable-extstore but without --enable-arm-crc32.

Sure thing, firing off some fresh builds!

@tianon (Member) commented May 24, 2018

Ok, arm64v8 is now green on both Debian and Alpine (specifying --enable-extstore and not --enable-arm-crc32). arm32v5, arm32v6, and s390x are still no bueno, but that's OK (we can skip them for extstore for now). 👍

Is it still useful for me to give you the arm32 failing logs, or are you good?

The arm32v5 failure seems really strange given that it passes on the current release, but not on next (and I've verified that it still fails on next when I disable extstore). 😕 (Maybe we'll have to drop that arch completely at this point)

@dormando commented

RE: Runtime detection, that's exactly what the code is doing, but if you have a compiler error you can't fix that at runtime :P I'll need to revisit by either stubbing the opcodes like I mentioned earlier, or some other trick. Most likely there'll just be a configure option to force the march build. :/

Is arm32v5 not failing on 1.5.7 presently? I'm confused, since nothing in next since 1.5.7 actually touches non-extstore code unless you're using seccomp. If you're using seccomp, does disabling it fix anything?

If 1.5.7 passes, does fac34333294ae6a05e822498e3295ec45ce16de4 fail? (or just bisect it, I have no idea which commit would've caused chunked items to fail in that way)

@tianon (Member) commented May 24, 2018

Ah, I've been building on a different host than the one the real builds use -- those currently all run in a QEMU VM on my local machine instead. Trying again now on the proper hardware... (We've got a number of arm32 builds that fail on our arm64 hardware, even though it supports almost all arm32 instructions.)

@tianon (Member) commented May 24, 2018

Ok, building on the proper hardware, memcached/memcached@cf3b853 works fine with --enable-extstore on arm32v5. Re-testing arm32v6 now. 👍

@dormando commented

That's nuts! Was the QEMU VM set to low RAM or something? I guess it could also be a timing issue, since I still have some of those in the test suite.

@tianon (Member) commented May 24, 2018

No, I think it must be a timing issue -- the failures were from the beefy HiSilicon box (the one with over 100GB of RAM and 64 cores). 😅

The QEMU emulated machine is a more modest 1GB of RAM with only 4 cores.

@dormando commented

Ah, confused; I thought you said the failures were from QEMU.

@tianon (Member) commented May 24, 2018

Sorry for the confusion! To hopefully clear things up: most of our arm32vX builds use the beefy HiSilicon box I've been testing this on, but memcached is one exception where we instead build on my QEMU VM for the official builds.

arm32v6 with --enable-extstore still fails on my QEMU VM, so we might need to exclude that one as well as s390x (which fails in the same way).

@dormando commented

Got it... well, let me know which ones still fail on the real hardware. I can work through the timing issues in the tests over a longer period of time, which should eventually help with the QEMU stuff.

@tianon (Member) commented May 24, 2018

The ones that fail on the real hardware are down to just arm32v6 and s390x. 👍

@dormando commented

Alright, let's focus on arm32v6. Is it still that huge spew of data? If so, you can probably cut like... the middle 50% of it and give me the start and the end bits.

@tianon (Member) commented May 24, 2018

@tianon (Member) commented May 24, 2018

It fails in a slightly different way when I run it on a RPi2 in arm32v6 mode:

+ make test
./sizes
Slab Stats	64
Thread stats	-6464
Global stats	184
Settings	528
Item (no cas)	32
Item (cas)	40
extstore header	12
Libevent thread	104
Connection	352
----------------------------------------
libevent thread cumulative	6440
Thread stats cumulative		6336
./testapp
1..52
ok 1 - cache_create
ok 2 - cache_constructor
ok 3 - cache_constructor_fail
ok 4 - cache_destructor
ok 5 - cache_reuse
ok 6 - cache_redzone
ok 7 - issue_161
ok 8 - strtol
ok 9 - strtoll
ok 10 - strtoul
ok 11 - strtoull
ok 12 - issue_44
ok 13 - vperror
ok 14 - issue_101
Signal handled: Terminated.
ok 15 - start_server
ok 16 - issue_92
ok 17 - issue_102
ok 18 - binary_noop
ok 19 - binary_quit
ok 20 - binary_quitq
ok 21 - binary_set
ok 22 - binary_setq
ok 23 - binary_add
ok 24 - binary_addq
ok 25 - binary_replace
ok 26 - binary_replaceq
ok 27 - binary_delete
ok 28 - binary_deleteq
ok 29 - binary_get
ok 30 - binary_getq
ok 31 - binary_getk
ok 32 - binary_getkq
ok 33 - binary_gat
ok 34 - binary_gatq
ok 35 - binary_gatk
ok 36 - binary_gatkq
ok 37 - binary_incr
ok 38 - binary_incrq
ok 39 - binary_decr
ok 40 - binary_decrq
ok 41 - binary_version
ok 42 - binary_flush
ok 43 - binary_flushq
ok 44 - binary_append
ok 45 - binary_appendq
ok 46 - binary_prepend
ok 47 - binary_prependq
ok 48 - binary_stat
ok 49 - binary_illegal
ok 50 - binary_pipeline_hickup
Signal handled: Interrupt.
ok 51 - shutdown
ok 52 - stop_server
getaddrinfo(): Name does not resolve
failed to listen on TCP port 37893: Invalid argument
slab class   1: chunk size        80 perslab   13107
slab class   2: chunk size       104 perslab   10082
slab class   3: chunk size       136 perslab    7710
slab class   4: chunk size       176 perslab    5957
slab class   5: chunk size       224 perslab    4681
slab class   6: chunk size       280 perslab    3744
slab class   7: chunk size       352 perslab    2978
slab class   8: chunk size       440 perslab    2383
slab class   9: chunk size       552 perslab    1899
slab class  10: chunk size       696 perslab    1506
slab class  11: chunk size       872 perslab    1202
slab class  12: chunk size      1096 perslab     956
slab class  13: chunk size      1376 perslab     762
slab class  14: chunk size      1720 perslab     609
slab class  15: chunk size      2152 perslab     487
slab class  16: chunk size      2696 perslab     388
slab class  17: chunk size      3376 perslab     310
slab class  18: chunk size      4224 perslab     248
slab class  19: chunk size      5280 perslab     198
slab class  20: chunk size      6600 perslab     158
slab class  21: chunk size      8256 perslab     127
slab class  22: chunk size     10320 perslab     101
slab class  23: chunk size     12904 perslab      81
slab class  24: chunk size     16136 perslab      64
slab class  25: chunk size     20176 perslab      51
slab class  26: chunk size     25224 perslab      41
slab class  27: chunk size     31536 perslab      33
slab class  28: chunk size     39424 perslab      26
slab class  29: chunk size     49280 perslab      21
slab class  30: chunk size     61600 perslab      17
slab class  31: chunk size     77000 perslab      13
slab class  32: chunk size     96256 perslab      10
slab class  33: chunk size    120320 perslab       8
slab class  34: chunk size    150400 perslab       6
slab class  35: chunk size    188000 perslab       5
slab class  36: chunk size    235000 perslab       4
slab class  37: chunk size    293752 perslab       3
slab class  38: chunk size    367192 perslab       2
slab class  39: chunk size    524288 perslab       2
<26 server listening (auto-negotiate)
<27 new auto-negotiating client connection
<27 connection closed.
slab class   1: chunk size        80 perslab   13107
slab class   2: chunk size       104 perslab   10082
slab class   3: chunk size       136 perslab    7710
slab class   4: chunk size       176 perslab    5957
slab class   5: chunk size       224 perslab    4681
slab class   6: chunk size       280 perslab    3744
slab class   7: chunk size       352 perslab    2978
slab class   8: chunk size       440 perslab    2383
slab class   9: chunk size       552 perslab    1899
slab class  10: chunk size       696 perslab    1506
slab class  11: chunk size       872 perslab    1202
slab class  12: chunk size      1096 perslab     956
slab class  13: chunk size      1376 perslab     762
slab class  14: chunk size      1720 perslab     609
slab class  15: chunk size      2152 perslab     487
slab class  16: chunk size      2696 perslab     388
slab class  17: chunk size      3376 perslab     310
slab class  18: chunk size      4224 perslab     248
slab class  19: chunk size      5280 perslab     198
slab class  20: chunk size      6600 perslab     158
slab class  21: chunk size      8256 perslab     127
slab class  22: chunk size     10320 perslab     101
slab class  23: chunk size     12904 perslab      81
slab class  24: chunk size     16136 perslab      64
slab class  25: chunk size     20176 perslab      51
slab class  26: chunk size     25224 perslab      41
slab class  27: chunk size     31536 perslab      33
slab class  28: chunk size     39424 perslab      26
slab class  29: chunk size     49280 perslab      21
slab class  30: chunk size     61600 perslab      17
slab class  31: chunk size     77000 perslab      13
slab class  32: chunk size     96256 perslab      10
slab class  33: chunk size    120320 perslab       8
slab class  34: chunk size    150400 perslab       6
slab class  35: chunk size    188000 perslab       5
slab class  36: chunk size    235000 perslab       4
slab class  37: chunk size    293752 perslab       3
slab class  38: chunk size    367192 perslab       2
slab class  39: chunk size    524288 perslab       2
<26 server listening (ascii)
<27 new ascii client connection.
<27 connection closed.
slab class   1: chunk size        80 perslab   13107
slab class   2: chunk size       104 perslab   10082
slab class   3: chunk size       136 perslab    7710
slab class   4: chunk size       176 perslab    5957
slab class   5: chunk size       224 perslab    4681
slab class   6: chunk size       280 perslab    3744
slab class   7: chunk size       352 perslab    2978
slab class   8: chunk size       440 perslab    2383
slab class   9: chunk size       552 perslab    1899
slab class  10: chunk size       696 perslab    1506
slab class  11: chunk size       872 perslab    1202
slab class  12: chunk size      1096 perslab     956
slab class  13: chunk size      1376 perslab     762
slab class  14: chunk size      1720 perslab     609
slab class  15: chunk size      2152 perslab     487
slab class  16: chunk size      2696 perslab     388
slab class  17: chunk size      3376 perslab     310
slab class  18: chunk size      4224 perslab     248
slab class  19: chunk size      5280 perslab     198
slab class  20: chunk size      6600 perslab     158
slab class  21: chunk size      8256 perslab     127
slab class  22: chunk size     10320 perslab     101
slab class  23: chunk size     12904 perslab      81
slab class  24: chunk size     16136 perslab      64
slab class  25: chunk size     20176 perslab      51
slab class  26: chunk size     25224 perslab      41
slab class  27: chunk size     31536 perslab      33
slab class  28: chunk size     39424 perslab      26
slab class  29: chunk size     49280 perslab      21
slab class  30: chunk size     61600 perslab      17
slab class  31: chunk size     77000 perslab      13
slab class  32: chunk size     96256 perslab      10
slab class  33: chunk size    120320 perslab       8
slab class  34: chunk size    150400 perslab       6
slab class  35: chunk size    188000 perslab       5
slab class  36: chunk size    235000 perslab       4
slab class  37: chunk size    293752 perslab       3
slab class  38: chunk size    367192 perslab       2
slab class  39: chunk size    524288 perslab       2
<26 server listening (binary)
<27 new binary client connection.
<27 connection closed.
Invalid value for binding protocol: http
 -- should be one of auto, binary, or ascii
Maximum connections must be greater than 0
Maximum connections must be greater than 0
Number of threads must be greater than 0
t/00-startup.t .............. ok
t/64bit.t ................... skipped: Skipping 64-bit tests on 32-bit build
t/binary-extstore.t ......... ok
t/binary-get.t .............. ok
t/binary-sasl.t ............. skipped: Skipping SASL tests
t/binary.t .................. ok
t/bogus-commands.t .......... ok
t/cas.t ..................... ok
t/chunked-extstore.t ........ ok
t/chunked-items.t ........... ok
t/daemonize.t ............... ok
t/dash-M.t .................. ok
t/dyn-maxbytes.t ............ ok
t/evictions.t ............... ok
t/expirations.t ............. ok
t/extstore-buckets.t ........ ok

#   Failed test '0 pages are free'
#   at t/extstore.t line 100.
#          got: '1'
#     expected: '0'
# Looks like you failed 1 test of 27.
t/extstore.t ................ 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/27 subtests 
t/flags.t ................... ok
t/flush-all.t ............... ok
t/getandtouch.t ............. ok
t/getset.t .................. ok
t/idle-timeout.t ............ ok
t/incrdecr.t ................ ok
t/inline_asciihdr.t ......... ok
t/issue_104.t ............... ok
t/issue_108.t ............... ok
t/issue_14.t ................ ok
t/issue_140.t ............... skipped: Fix for Issue 140 was only an illusion
t/issue_152.t ............... ok
t/issue_163.t ............... ok
t/issue_183.t ............... ok
t/issue_192.t ............... ok
t/issue_22.t ................ ok
t/issue_260.t ............... skipped: Only possible to test #260 under artificial conditions
t/issue_29.t ................ ok
t/issue_3.t ................. ok
t/issue_41.t ................ ok
t/issue_42.t ................ ok
t/issue_50.t ................ ok
t/issue_61.t ................ ok
t/issue_67.t ................ ok
t/issue_68.t ................ ok
t/issue_70.t ................ ok
Item max size cannot be less than 1024 bytes.
Cannot set item size limit higher than 1/2 of memory max.
t/item_size_max.t ........... ok
t/line-lengths.t ............ ok
t/lru-crawler.t ............. ok
t/lru-maintainer.t .......... ok
t/lru.t ..................... ok
t/malicious-commands.t ...... ok
t/maxconns.t ................ ok
t/misbehave.t ............... skipped: Privilege drop not supported
t/multiversioning.t ......... ok
t/noreply.t ................. ok
t/quit.t .................... ok
t/refhang.t ................. skipped: Test is flaky. Needs special hooks.
t/slabhang.t ................ skipped: Test is flaky. Needs special hooks.
t/slabs-reassign-chunked.t .. ok
t/slabs-reassign2.t ......... ok
t/slabs_reassign.t .......... ok
PORT: 46715
t/stats-conns.t ............. ok
t/stats-detail.t ............ ok
t/stats.t ................... ok
t/touch.t ................... ok
t/udp.t ..................... ok
t/unixsocket.t .............. ok
t/watcher.t ................. ok
t/whitespace.t .............. skipped: Skipping tests probably because you don't have git.

Test Summary Report
-------------------
t/extstore.t              (Wstat: 256 Tests: 27 Failed: 1)
  Failed test:  20
  Non-zero exit status: 1
Files=67, Tests=64765, 532 wallclock secs (122.04 usr  5.04 sys + 213.23 cusr 32.06 csys = 372.37 CPU)
Result: FAIL
make: *** [Makefile:1873: test] Error 1
The command '/bin/sh -c set -x 		&& apk add --no-cache --virtual .build-deps 		autoconf automake 		ca-certificates 		coreutils 		cyrus-sasl-dev 		dpkg-dev dpkg 		gcc 	libc-dev 		libevent-dev 		libressl 		linux-headers 		make 		perl 		perl-utils 		tar 	&& wget -O memcached.tar.gz "https://github.com/memcached/memcached/archive/$MEMCACHED_COMMIT.tar.gz" 	&& mkdir -p /usr/src/memcached 	&& tar -xzf memcached.tar.gz -C /usr/src/memcached --strip-components=1 	&& rm memcached.tar.gz && cd /usr/src/memcached 		&& ./autogen.sh 	&& ./configure --build="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)" 		--enable-extstore 		--enable-sasl 	&& make -j "$(nproc)" 		&& make test 	&& make install 		&& cd / && rm -rf /usr/src/memcached 	&& runDeps="$( 		scanelf --needed --nobanner --format '%n#p' --recursive /usr/local 			| tr ',' '\n' 			| sort -u 	| awk 'system("[ -e /usr/local/lib/" $1 " ]") == 0 { next } { print "so:" $1 }' )" 	&& apk add --virtual .memcached-rundeps $runDeps 	&& apk del .build-deps 		&& memcached -V' returned a non-zero code: 2

@dormando commented

That failure I'm familiar with. My rpi2 did that once, but on repeated runs it works okay. The test is flaky but passable. I'll boot my pi2 back up and try to improve it now, I guess.

@tianon (Member) commented May 24, 2018

Anything more I can provide to help debug the failure in the longer log? That's from the machine that'll actually be building this once it's released, so it's more concerning than just a flaky test (since we automatically re-attempt builds several times to account for flaky failures when we build the real thing).

@dormando commented

Try 'next' branch again, please?

Can no longer repro the second failure on my rpi2, and at least the start of the log spew is likely the same issue as in the other test. I changed the pacing of the inserts and ramped up the count a bit to make up for the extra space available on 32bit systems.

@dormando commented

broke it in a different way, hold on...

@dormando commented May 24, 2018
Okay, pushed 'next' with the chunked item changes again. Re-tuned a lot of things, and shrunk the number of tests by a lot. If it spews again, the file should at least be smaller.

hmm... running them in a loop, they do occasionally fail, but less often :/ need a better mechanism for doing a trailing fill on such slow systems...

edit: alright, 'next' is rebased again... now passes every time on my desktop and rpi2, at least so far, which is much more often than all previous tests. Sorry about that... it took a long time to figure out the compactor was helpfully rescuing the damn canary values :) So now it's told to back off until it's required for the tests. That among a few other changes.

This might fix all of the tests everywhere.

edit2: rebased with one more tiny change. it passed in a loop for an hour on the rpi2.

@tianon (Member) commented May 25, 2018

@dormando commented

Think I need to task off of this for a while; does the arm32v6 always fail in that same spot, or does it pass sometimes? I think that's still a pacing issue.

The s390x... just mask it off for now. Would you mind opening an issue on the memcached project and linking back to both logs? There're some other user-facing bugs and some bench work I'd like to prioritize for the next week.

Thanks a ton for all your help so far... sorry my tests are so flaky :| Feel dumb it took so long to figure out the compactor race. No idea what's wrong with the s390x though.

@tianon (Member) commented May 25, 2018

No problem -- definitely not complaining, just trying to be helpful in one of the only ways I can! 😄

Will file issues to track these further and we'll just mask out on those two arches for now. 👍

@tianon (Member) commented May 25, 2018

(Yes, I think both test failures are pretty consistent -- with s390x, we've got both Debian and Alpine failing in exactly the same way.)

@tianon (Member) commented May 25, 2018

Ah crap, spoke too soon. On arm32v6, I replaced make test with make test || make test || make test || make test and it eventually passed. 😅

@tianon (Member) commented May 25, 2018

Do you still want another issue just for arm32v6, or just chalk it up to general flakiness?

(Filed s390x at memcached/memcached#381)

@dormando commented

Yeah please do. The tests shouldn't be flaky, and I beat most of that out of them already. Dunno howtf your platform is so sensitive to it though. The compaction algorithms changed a bunch since I first wrote the tests, so they were due up for some work.

@tianon (Member) commented May 25, 2018

Done deal! memcached/memcached#382

(I've just confirmed on s390x, and that one is definitely not a flakiness issue -- it's gotta be something deeper. 😞)

@dormando commented May 25, 2018

oh shoot... actually in t/chunked-extstore.t, can you try changing:
print $sock "extstore drop_under 3\r\n";
to... 1? that was a typo.

with drop_under being 3, it could race and throw out odd or even numbered objects :|

that won't fix s390x though. I have no idea what's going on there.

@tianon (Member) commented May 25, 2018

Still flaky with that patched from 3 to 1 😞

@dormando commented

nuts. punting! fuck it. I tried :)

@yosifkit (Member) left a comment

LGTM

@tianon merged commit a027e9e into docker-library:master on May 30, 2018
tianon added a commit to infosiftr/stackbrew that referenced this pull request Jun 5, 2018
- `bash`: 4.4.23
- `ghost`: 1.23.1
- `julia`: 0.6.3
- `matomo`: GPG race conditions (matomo-org/docker#105)
- `memcached`: `extstore` (docker-library/memcached#38)
- `mongo`: 4.0.0~rc2
- `openjdk`: remove 9 (docker-library/openjdk#199), add 10 for Windows (docker-library/openjdk#200), 11-ea+16
- `owncloud`: update PECL exts (docker-library/owncloud#102)
- `percona`: 5.7.22, 5.6.40
- `php`: fix `wget: error getting response: Connection reset by peer`
- `piwik`: GPG race conditions (matomo-org/docker#105)
- `python`: add `nis` nonsense to 2.7 (docker-library/python#281), 3.7.0b5
- `rocket.chat`: 0.65.1
- `ruby`: 2.6.0-preview2
- `wordpress`: update GPG for wp-cli