Trying to run some basic examples on a system with 4 GH200 modules, using a container image based on nvcr.io/nvidia/pytorch:25.01-py3 with viztracer 1.0.1 installed on top, fails for me as follows.

For moving a tensor to a CUDA device with test_cuda.py:
```python
import torch
from viztracer import VizTracer

with VizTracer(log_torch=True) as tracer:
    initial_value = torch.tensor([3.0]).cuda(0)
    print("done!")
```
I'm getting:

```
/workspace$ python test_cuda.py
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py", line 330, in _lazy_init
queued_call()
File "/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py", line 1567, in _register_triton_kernels
torch._TritonLibrary.registerOp(
File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2585, in registerOp
cls.lib.define(full_schema)
File "/usr/local/lib/python3.12/dist-packages/torch/library.py", line 153, in define
result = self.m.define(schema, alias_analysis, tuple(tags))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: VizTracer: Unexpected type. Might be an event mismatch.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/workspace/test_cuda.py", line 5, in <module>
initial_value = torch.tensor([3.0]).cuda(0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py", line 336, in _lazy_init
raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: VizTracer: Unexpected type. Might be an event mismatch.
CUDA call was originally invoked at:
File "/workspace/test_cuda.py", line 1, in <module>
import torch
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 995, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2007, in <module>
_C._initExtension(_manager_path())
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 995, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py", line 1585, in <module>
_lazy_call(_register_triton_kernels)
File "/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py", line 261, in _lazy_call
_queued_calls.append((callable, traceback.format_stack()))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py", line 330, in _lazy_init
queued_call()
File "/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py", line 1567, in _register_triton_kernels
torch._TritonLibrary.registerOp(
File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2585, in registerOp
cls.lib.define(full_schema)
File "/usr/local/lib/python3.12/dist-packages/torch/library.py", line 153, in define
result = self.m.define(schema, alias_analysis, tuple(tags))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Tried to register an operator (triton::_triton_bsr_dense_mm_out(Tensor bsr, Tensor dense, *, Tensor(a!) out) -> Tensor(a!)) with the same name and overload name multiple times. Each overload's schema should only be registered with a single call to def(). Duplicate registration: registered at /dev/null:2578. Original registration: registered at /dev/null:2578
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/workspace/test_cuda.py", line 4, in <module>
with VizTracer(log_torch=True) as tracer:
File "/usr/local/lib/python3.12/dist-packages/viztracer/viztracer.py", line 170, in __exit__
self.stop()
File "/usr/local/lib/python3.12/dist-packages/viztracer/viztracer.py", line 241, in stop
self.torch_profile.__exit__(None, None, None)
File "/usr/local/lib/python3.12/dist-packages/torch/profiler/profiler.py", line 777, in __exit__
self.stop()
File "/usr/local/lib/python3.12/dist-packages/torch/profiler/profiler.py", line 793, in stop
self._transit_action(self.current_action, None)
File "/usr/local/lib/python3.12/dist-packages/torch/profiler/profiler.py", line 836, in _transit_action
action()
File "/usr/local/lib/python3.12/dist-packages/torch/profiler/profiler.py", line 239, in stop_trace
self.profiler.__exit__(None, None, None)
File "/usr/local/lib/python3.12/dist-packages/torch/autograd/profiler.py", line 369, in __exit__
device_module.synchronize()
File "/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py", line 965, in synchronize
_lazy_init()
File "/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py", line 336, in _lazy_init
raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: Tried to register an operator (triton::_triton_bsr_dense_mm_out(Tensor bsr, Tensor dense, *, Tensor(a!) out) -> Tensor(a!)) with the same name and overload name multiple times. Each overload's schema should only be registered with a single call to def(). Duplicate registration: registered at /dev/null:2578. Original registration: registered at /dev/null:2578
CUDA call was originally invoked at:
File "/workspace/test_cuda.py", line 1, in <module>
import torch
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 995, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2007, in <module>
_C._initExtension(_manager_path())
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 995, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py", line 1585, in <module>
_lazy_call(_register_triton_kernels)
File "/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py", line 261, in _lazy_call
_queued_calls.append((callable, traceback.format_stack()))
[nid006679:53988:0:53988] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0xf86a280)
==== backtrace (tid: 53988) ====
0 /opt/hpcx/ucx/lib/libucs.so.0(ucs_handle_error+0x2cc) [0x4000c1cd14dc]
1 /opt/hpcx/ucx/lib/libucs.so.0(+0x3168c) [0x4000c1cd168c]
2 /opt/hpcx/ucx/lib/libucs.so.0(+0x319b8) [0x4000c1cd19b8]
3 linux-vdso.so.1(__kernel_rt_sigreturn+0) [0x4000239507dc]
4 [0xf86a280]
=================================
Segmentation fault (core dumped)
```
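Reading the first traceback, the Triton operator registration is one of torch's queued lazy CUDA-initialization calls, and it runs at the first CUDA call, which here lands inside the tracer's torch-profiler region. As a sketch of a workaround (an untested assumption on my part that this interaction is the trigger), forcing CUDA initialization before starting the tracer should move that registration out of the profiled section:

```python
import torch
from viztracer import VizTracer

# Run torch's queued lazy CUDA initialization (including the Triton
# operator registration from the traceback) before any tracing starts.
torch.cuda.init()

with VizTracer(log_torch=True) as tracer:
    initial_value = torch.tensor([3.0]).cuda(0)
    print("done!")
```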
And for DDP, with test_ddp.py:
```python
import torch
import torch.distributed as dist
from viztracer import VizTracer

with VizTracer(log_torch=True) as tracer:
    dist.init_process_group(backend='nccl', init_method='env://')  # having set DDP env vars
    print("done!")
```
it fails with:

```
/workspace$ MASTER_ADDR=$(hostname) MASTER_PORT=29500 RANK=0 WORLD_SIZE=1 LOCAL_RANK=1 LOCAL_WORLD_SIZE=1 python test_ddp.py
Loading finish
Total Entries: 73
Use the following command to open the report:
vizviewer /workspace/viztracer.json
Traceback (most recent call last):
File "/workspace/test_ddp.py", line 6, in <module>
dist.init_process_group(backend='nccl', init_method='env://') # having set DDP env vars
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/distributed/c10d_logger.py", line 94, in wrapper
with _WaitCounter(f"pytorch.wait_counter.c10d.{func.__name__}").guard():
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: VizTracer: Unexpected type. Might be an event mismatch.
```
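In the same spirit, a sketch that moves the process-group setup out of the traced region, again assuming (untested) that it is the initialization path that trips the tracer:

```python
import torch
import torch.distributed as dist
from viztracer import VizTracer

# Initialize DDP before tracing, so only the workload itself is profiled.
dist.init_process_group(backend='nccl', init_method='env://')

with VizTracer(log_torch=True) as tracer:
    print("done!")  # the actual DDP workload would go here

dist.destroy_process_group()
```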
Using only the CPU and no DDP, a simple test runs fine (see the sketch below). Does viztracer support CUDA and DDP workloads with PyTorch?
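For reference, a CPU-only variant along these lines (a sketch, not the verbatim script) completes without errors:

```python
import torch
from viztracer import VizTracer

with VizTracer(log_torch=True) as tracer:
    initial_value = torch.tensor([3.0]) * 2  # stays on the CPU

print("done!")
```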