
master: "make check" unpack_hetero datatype test segv #3522

Closed
jsquyres opened this issue May 12, 2017 · 5 comments

@jsquyres
Member

When configured statically on a Linux x86_64 machine, make check fails the unpack_hetero test with a segv. Here's the backtrace:

(gdb) bt
#0  0x000000000040fd7c in opal_datatype_compute_ptypes (datatype=0x5810c0 <opal_datatype_int4>) at opal_datatype_get_count.c:163
#1  0x000000000040b714 in opal_datatype_compute_remote_size (pData=0x5810c0 <opal_datatype_int4>, sizes=0x7f48c8) at opal_convertor.c:456
#2  0x000000000040b82f in opal_convertor_compute_remote_size (pConvertor=0x7f4760) at opal_convertor.c:483
#3  0x000000000040ba3e in opal_convertor_prepare_for_recv (convertor=0x7f4760, datatype=0x5810c0 <opal_datatype_int4>, count=1, pUserBuf=0x7fffffffca40) at opal_convertor.c:569
#4  0x0000000000405d3d in main (argc=1, argv=0x7fffffffcb68) at unpack_hetero.c:61

The code in question is:

163 =>  datatype->ptypes = (size_t*)calloc(OPAL_DATATYPE_MAX_SUPPORTED, sizeof(size_t));

gdb shows:

(gdb) p datatype
$2 = (opal_datatype_t *) 0x5810c0 <opal_datatype_int4>
(gdb) p datatype->ptypes
$3 = (size_t *) 0x0

I'm not quite sure why this is a segv. Is this a read-only member, perchance?

@jsquyres
Member Author

Forgot to mention -- here's the MTT showing where it is happening (and I can reproduce manually): https://mtt.open-mpi.org/index.php?do_redir=2442

@ggouaillardet
Contributor

@jsquyres Yes, this is likely the root cause:

OPAL_DECLSPEC const opal_datatype_t opal_datatype_int4 = OPAL_DATATYPE_INITIALIZER_INT4(0);

@jsquyres
Member Author

@ggouaillardet What should it be?

@ggouaillardet
Contributor

@jsquyres

With shared libs:

61	    if( OPAL_SUCCESS != opal_convertor_prepare_for_recv( pConv, &opal_datatype_int4, 1, rbuf ) ) {
(gdb) p &opal_datatype_int4
$1 = (const opal_datatype_t *) 0x6020a0 <opal_datatype_int4>

0000000000602000      4K rw--- unpack_hetero

But with static libs:

61	    if( OPAL_SUCCESS != opal_convertor_prepare_for_recv( pConv, &opal_datatype_int4, 1, rbuf ) ) {
(gdb) p &opal_datatype_int4
$6 = (const opal_datatype_t *) 0x57fc80 <opal_datatype_int4>

0000000000400000   1952K r-x-- unpack_hetero

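So with static libs the const object ends up in the read-only r-x-- segment, and the store at opal_datatype_get_count.c:163 faults.
Removing the const qualifier is an option, but I suspect it is overkill.
I'll try to see if I can improve that by not computing ptypes for a predefined datatype (since that should be pretty trivial).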

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue May 15, 2017
@jsquyres
Member Author

@bosilca says he's looking at this.

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue May 24, 2017
karasevb pushed a commit to karasevb/ompi that referenced this issue Jun 8, 2017
3 participants