Description
I'm running into a problem when creating a CUDA graph. A reproduction of the problem is here; in short, the code does something like:
cuda_code = """
extern "C" __global__ void simple(char *str) {
    printf("this is a test\\n");
    printf("ptr: %p\\n", str);
    printf("passed argument was: %s\\n", str);
}
"""
def mkgraph():
    # initialize device
    # load PTX, get function
    # allocate memory
    # create memcpy node, copies UTF-8 encoded bytes to GPU
    # create kernel node
    # add dependency from kernel node to memcpy node
    # *** run graph first time ***
    # return graph
g = mkgraph()
# *** run graph second time ***
The first graph execution works. I can also instantiate and execute the graph multiple times before the function returns and it works fine and prints out the correct string.
The second graph execution, after the function has returned, reports the exact same memory address as the first. The "this is a test" message prints fine, the pointer address prints fine, and then the final line is "passed argument was:" followed by garbage.
*** LAUNCHING GRAPH IN FUNCTION ***
this is a test
ptr: 0x7f4b19800000
passed argument was: hello from host
*** LAUNCHING GRAPH OUTSIDE FUNCTION ***
this is a test
ptr: 0x7f4b19800000
passed argument was: ?t?TK�
My best guess is that a refcount gets decremented when the function returns and the graph isn't holding on to a copy of the host memory, so the buffer is freed before the second launch. Is this a bug in the CUDA Python code or is there something I'm missing?
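
To make that guess concrete, here is a minimal pure-Python sketch with no CUDA involved. It is only an illustration of the suspected lifetime issue, not a confirmed diagnosis. `HostBuffer` and the dict standing in for the graph are hypothetical names made up for this example; the assumption is that the memcpy node records only the raw address of the host buffer and does not keep the Python object alive.

```python
import ctypes
import weakref


class HostBuffer:
    """Hypothetical stand-in for the host-side source of the memcpy node."""

    def __init__(self, data: bytes):
        # NUL-terminated copy of the bytes in memory owned by this object
        self.buf = ctypes.create_string_buffer(data)

    @property
    def address(self) -> int:
        return ctypes.addressof(self.buf)


def make_graph_dropping_buffer():
    host = HostBuffer(b"hello from host")
    # Assumption: the "graph" stores only the integer address, like a raw pointer.
    graph = {"src_ptr": host.address}
    return graph, weakref.ref(host)


def make_graph_keeping_buffer():
    host = HostBuffer(b"hello from host")
    # Same raw address, plus an explicit reference that keeps the buffer alive.
    graph = {"src_ptr": host.address, "keepalive": host}
    return graph, weakref.ref(host)


graph_a, ref_a = make_graph_dropping_buffer()
graph_b, ref_b = make_graph_keeping_buffer()

# Under CPython reference counting, the first buffer is destroyed as soon as
# the function returns, so graph_a["src_ptr"] is now a dangling address.
print("buffer A freed:", ref_a() is None)  # True under CPython
print("buffer B freed:", ref_b() is None)  # False

# Reading through the raw address is only safe while the buffer is alive.
print(ctypes.string_at(graph_b["src_ptr"]))  # b'hello from host'
```

If this is what is happening, returning the host buffer together with the graph (or storing it somewhere that outlives the function) should make the second launch print the correct string.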