Skip to content

ikrit spml cleanup, mkey cache and assorted bug fixes #2354

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Nov 14, 2016

Conversation

alex-mikheev
Copy link
Contributor

@yosefe @igor-ivanov please review

@jsquyres
Copy link
Member

jsquyres commented Nov 3, 2016

bot:mellanox:retest

shmem_quit() shall complete all outstanding get_nbi() requests

Signed-off-by: Alex Mikheev <[email protected]>
In this case there is no point to add another progress callback

Signed-off-by: Alex Mikheev <[email protected]>
use single array instead of array of pointers

Signed-off-by: Alex Mikheev <[email protected]>
check every possible transport

Signed-off-by: Alex Mikheev <[email protected]>
It improves cpu cache hit ratio.

Signed-off-by: Alex Mikheev <[email protected]>
@yosefe
Copy link
Contributor

yosefe commented Nov 6, 2016

@alex-mikheev test failure:

06:40:33   CC       sshmem_verbs_module.lo
06:40:33 In file included from sshmem_verbs_module.c:36:0:
06:40:33 sshmem_verbs_module.c: In function ‘segment_create’:
06:40:33 sshmem_verbs_module.c:336:34: error: ‘map_segment_t’ has no member named ‘seg_base_addr’
06:40:33             ds_buf->seg_id, ds_buf->seg_base_addr, (unsigned long)ds_buf->seg_size, ds_buf->seg_name)
06:40:33                                   ^
06:40:33 ../../../../opal/util/output.h:523:52: note: in definition of macro ‘OPAL_OUTPUT_VERBOSE’

@alex-mikheev alex-mikheev force-pushed the topic/oshmem_mkey_cache branch from 10601d9 to 8c92506 Compare November 7, 2016 12:45
@@ -158,41 +162,97 @@ extern int mca_memheap_seg_cmp(const void *k, const void *v);

extern mca_memheap_map_t* memheap_map;

static inline int map_segment_is_va_in(map_base_segment_t *s, const void *va)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove const from "va" in all functions

@@ -158,41 +162,97 @@ extern int mca_memheap_seg_cmp(const void *k, const void *v);

extern mca_memheap_map_t* memheap_map;

static inline int map_segment_is_va_in(map_base_segment_t *s, const void *va)
{
return ((uintptr_t)va >= (uintptr_t)s->va_base &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no needto cast to uintptr_t

(uintptr_t)va < (uintptr_t)s->va_end);
}

static inline map_segment_t *memheap_find_seg(const int segno)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no point making int parameter const


static inline int memheap_is_va_in_segment(const void *va, const int segno)
{

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for space line here


static inline void* memheap_va2rva(const void* va, const void* local_base, const void* remote_base)
{
return (void*) (remote_base > local_base ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to cast to uintptr_t

else {
tr_id = i;
}
tr_id = i;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could remove tr_id variable


OPAL_THREAD_LOCK(&oshmem_request_lock);
assert(false == put_req->req_put.req_base.req_free_called);
put_req->req_put.req_base.req_free_called = true;
opal_free_list_return (&mca_spml_base_put_requests,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to take oshmem_request_lock now?

@@ -690,9 +491,10 @@ sshmem_mkey_t *mca_spml_ikrit_register(void* addr,
}
SPML_VERBOSE(5,
"rank %d ptl %d addr %p size %llu %s",
oshmem_proc_pe(oshmem_proc_local()), i, addr, (unsigned long long)size,
my_rank, i, addr, (unsigned long long)size,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation uses tabs, need spaces

if (handle)
*handle = put_req;

/* fill out request */
put_req->mxm_req.base.mq = mca_spml_ikrit.mxm_mq;
/* request immediate responce if we are getting low on send buffers. We only get responce from remote on ack timeout.
* Also request explicit ack once in a while */
#if MXM_API < MXM_VERSION(2,0)
#if 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove the code

opal_progress();
}

while (0 < mca_spml_ikrit.n_active_gets) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can combine the two while loops with &&

@jjhursey
Copy link
Member

jjhursey commented Nov 7, 2016

bot:ibm:retest

@jladd-mlnx
Copy link
Member

@jjhursey Are these false alarms? Please advise.

@jsquyres
Copy link
Member

jsquyres commented Nov 7, 2016

bot:ibm:retest

@jjhursey
Copy link
Member

jjhursey commented Nov 7, 2016

Disregard the IBM-CI. The relay machine failed unexpectedly this afternoon. I'll take down the Jenkins until it is back up.

@hppritcha
Copy link
Member

@jsquyres i think this can be merged

* Call opal_progress() so that ompi will no deadlock
* (for example may need to respond to rkey requests)
*/
opal_progress();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alex-mikheev I don't think this is a right place for opal_progress. The function is not always invoked from a loop. The opal_progress has to be moved to the while loop that invokes shmem_lock_cswap.

@hppritcha
Copy link
Member

removing rm approved from this so the ucx/mlnx folks can sort this out.

Current standard says that behaviour in the case of error is undefined

Signed-off-by: Alex Mikheev <[email protected]>
@alex-mikheev
Copy link
Contributor Author

@shamisp please take a look at my last two commits

@shamisp
Copy link
Contributor

shamisp commented Nov 10, 2016

@hppritcha Looks good to me.

@igor-ivanov
Copy link
Member

👍

@jladd-mlnx jladd-mlnx merged commit 9a79da7 into open-mpi:master Nov 14, 2016
@yosefe yosefe deleted the topic/oshmem_mkey_cache branch November 20, 2016 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants