Description
Describe the bug
I was able to run Auto3DSeg on the Task04_Hippocampus example. However, I encountered the following bug on a custom dataset for which I want to use multiple phase images as the network input. Any suggestions on how to use multi-phase data configured in the datalist JSON file?
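For context, here is a minimal sketch of the kind of datalist that leads to this situation. The file names, the phase names, and the `fold`/`image`/`label` layout are assumptions for illustration (modelled on the Hippocampus example), not the actual data; the relevant point is only that `"image"` holds a list of paths, one per phase, instead of a single string.

```python
import json

# Hypothetical multi-phase datalist: "image" is a list of per-phase files
# rather than a single path string. All paths below are placeholders.
datalist = {
    "training": [
        {
            "fold": 0,
            "image": [
                "case_001/arterial.nii.gz",
                "case_001/venous.nii.gz",
            ],
            "label": "case_001/label.nii.gz",
        }
    ],
    "testing": [],
}

# Serialize the structure the way it would be stored in the datalist JSON file.
datalist_json = json.dumps(datalist, indent=2)
print(datalist_json)
```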
Environment
================================
Printing MONAI config...
MONAI version: 1.0.1
Numpy version: 1.23.5
Pytorch version: 1.13.0+cu117
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: 8271a19
MONAI file: /home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/__init__.py
Optional dependencies:
Pytorch Ignite version: 0.4.10
Nibabel version: 3.2.0
scikit-image version: 0.19.3
Pillow version: 9.3.0
Tensorboard version: 2.11.0
gdown version: 4.5.3
TorchVision version: 0.14.0+cu117
tqdm version: 4.64.1
lmdb version: 1.3.0
psutil version: 5.9.4
pandas version: 1.5.1
einops version: 0.6.0
transformers version: 4.21.3
mlflow version: 2.0.1
pynrrd version: 1.0.0
For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
================================
Printing system config...
System: Linux
Linux version: Ubuntu 18.04.6 LTS
Platform: Linux-5.4.0-1069-aws-x86_64-with-glibc2.27
Processor: x86_64
Machine: x86_64
Python version: 3.8.2
Process name: python
Command: ['/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/bin/python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: []
Num physical CPUs: 24
Num logical CPUs: 48
Num usable CPUs: 48
CPU usage (%): [3.3, 3.0, 3.0, 2.6, 3.3, 3.0, 3.3, 3.0, 3.3, 2.6, 3.3, 3.0, 3.3, 3.0, 3.3, 3.3, 3.0, 3.0, 3.0, 3.0, 3.3, 3.0, 3.0, 3.0, 3.0, 3.3, 3.3, 3.0, 3.3, 3.0, 3.0, 3.0, 3.6, 3.0, 3.0, 3.3, 3.0, 3.0, 3.3, 3.0, 3.3, 3.0, 3.0, 3.3, 3.3, 3.3, 3.3, 97.7]
CPU freq. (MHz): 1959
Load avg. in last 1, 5, 15 mins (%): [0.4, 4.9, 8.0]
Disk usage (%): 38.7
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 186.7
Available memory (GB): 184.2
Used memory (GB): 1.0
================================
Printing GPU config...
Num GPUs: 4
Has CUDA: True
CUDA version: 11.7
cuDNN enabled: True
cuDNN version: 8500
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86']
GPU 0 Name: NVIDIA A10G
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 80
GPU 0 Total memory (GB): 22.2
GPU 0 CUDA capability (maj.min): 8.6
GPU 1 Name: NVIDIA A10G
GPU 1 Is integrated: False
GPU 1 Is multi GPU board: False
GPU 1 Multi processor count: 80
GPU 1 Total memory (GB): 22.2
GPU 1 CUDA capability (maj.min): 8.6
GPU 2 Name: NVIDIA A10G
GPU 2 Is integrated: False
GPU 2 Is multi GPU board: False
GPU 2 Multi processor count: 80
GPU 2 Total memory (GB): 22.2
GPU 2 CUDA capability (maj.min): 8.6
GPU 3 Name: NVIDIA A10G
GPU 3 Is integrated: False
GPU 3 Is multi GPU board: False
GPU 3 Multi processor count: 80
GPU 3 Total memory (GB): 22.2
GPU 3 CUDA capability (maj.min): 8.6
Additional context
Full trace
poetry run python -m monai.apps.auto3dseg AutoRunner run --input='./input.yaml' --work_dir='./LiverCrop'
2022-11-28 02:19:56,661 - INFO - ./LiverCrop does not exists. Creating...
2022-11-28 02:19:56,661 - INFO - ./LiverCrop created to save all results
2022-11-28 02:19:56,661 - INFO - Loading ./input.yaml for AutoRunner and making a copy in /home/ubuntu/Code/autoseg3d/LiverCrop/input.yaml
2022-11-28 02:19:56,663 - INFO - The output_dir is not specified. /home/ubuntu/Code/autoseg3d/LiverCrop/ensemble_output will be used to save ensemble predictions
2022-11-28 02:19:56,663 - INFO - Directory /home/ubuntu/Code/autoseg3d/LiverCrop/ensemble_output is created to save ensemble predictions
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 320/320 [06:45<00:00, 1.27s/it]
algo_templates.tar.gz: 296kB [00:00, 712kB/s]
2022-11-28 02:26:47,886 - INFO - Downloaded: /tmp/tmpndcigyw9/algo_templates.tar.gz
2022-11-28 02:26:47,886 - INFO - Expected md5 is None, skip md5 check for file /tmp/tmpndcigyw9/algo_templates.tar.gz.
2022-11-28 02:26:47,886 - INFO - Writing into directory: /home/ubuntu/Code/autoseg3d/LiverCrop.
2022-11-28 02:26:55,159 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/segresnet2d_0
2022-11-28 02:27:02,219 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/segresnet2d_1
2022-11-28 02:27:09,540 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/dints_0
2022-11-28 02:27:16,466 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/dints_1
2022-11-28 02:27:23,697 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0
2022-11-28 02:27:30,749 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_1
2022-11-28 02:27:38,115 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/segresnet_0
2022-11-28 02:27:45,072 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/segresnet_1
2022-11-28 02:27:45,075 - INFO - Launching: torchrun --nnodes=1 --nproc_per_node=4 /home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py run --config_file='/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/network.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_infer.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_train.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_validate.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/hyper_parameters.yaml'
['torchrun', '--nnodes=1', '--nproc_per_node=4', '/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py', 'run', "--config_file='/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/network.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_infer.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_train.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_validate.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/hyper_parameters.yaml'"]
Traceback (most recent call last):
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/bundle_gen.py", line 186, in _run_cmd
normal_out = subprocess.run(cmd.split(), env=ps_environ, check=True, capture_output=True)
File "/usr/local/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['torchrun', '--nnodes=1', '--nproc_per_node=4', '/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py', 'run', "--config_file='/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/network.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_infer.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_train.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_validate.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/hyper_parameters.yaml'"]' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/runpy.py", line 193, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.8/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/__main__.py", line 22, in <module>
fire.Fire(
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/auto_runner.py", line 586, in run
self._train_algo_in_sequence(history)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/auto_runner.py", line 488, in _train_algo_in_sequence
algo.train(self.train_params)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/bundle_gen.py", line 203, in train
return self._run_cmd(cmd, devices_info)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/bundle_gen.py", line 191, in _run_cmd
raise RuntimeError(f"subprocess call error {e.returncode}: {errors}, {output}") from e
RuntimeError: subprocess call error 1: b'WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Traceback (most recent call last):
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 409, in <module>
fire.Fire()
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 95, in run
str_img = os.path.join(data_file_base_dir, list_train[_i]["image"])
File "/usr/local/lib/python3.8/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/local/lib/python3.8/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'list'
Traceback (most recent call last):
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 409, in <module>
fire.Fire()
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 95, in run
str_img = os.path.join(data_file_base_dir, list_train[_i]["image"])
File "/usr/local/lib/python3.8/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/local/lib/python3.8/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'list'
Traceback (most recent call last):
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 409, in <module>
fire.Fire()
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 95, in run
str_img = os.path.join(data_file_base_dir, list_train[_i]["image"])
File "/usr/local/lib/python3.8/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/local/lib/python3.8/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'list'
Traceback (most recent call last):
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 409, in <module>
fire.Fire()
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 95, in run
str_img = os.path.join(data_file_base_dir, list_train[_i]["image"])
File "/usr/local/lib/python3.8/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/local/lib/python3.8/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'list'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 34440) of binary: /home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/bin/python
Traceback (most recent call last):
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py FAILED
Failures:
[1]:
time : 2022-11-28_02:27:51
host : ip-172-20-253-208.us-west-2.compute.internal
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 34441)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2022-11-28_02:27:51
host : ip-172-20-253-208.us-west-2.compute.internal
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 34442)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2022-11-28_02:27:51
host : ip-172-20-253-208.us-west-2.compute.internal
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 34443)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Root Cause (first observed failure):
[0]:
time : 2022-11-28_02:27:51
host : ip-172-20-253-208.us-west-2.compute.internal
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 34440)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html', b'[info] number of GPUs: 4
[info] number of GPUs: 4
[info] number of GPUs: 4
2022-11-28 02:27:50,522 - Added key: store_based_barrier_key:1 to store for rank: 2
[info] number of GPUs: 4
2022-11-28 02:27:50,564 - Added key: store_based_barrier_key:1 to store for rank: 1
2022-11-28 02:27:50,570 - Added key: store_based_barrier_key:1 to store for rank: 0
2022-11-28 02:27:50,572 - Added key: store_based_barrier_key:1 to store for rank: 3
2022-11-28 02:27:50,572 - Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
[info] world_size: 4
2022-11-28 02:27:50,574 - Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
[info] world_size: 4
2022-11-28 02:27:50,574 - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
[info] world_size: 4
2022-11-28 02:27:50,581 - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
[info] world_size: 4
'
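The failing call in the trace is `os.path.join(data_file_base_dir, list_train[_i]["image"])` at line 95 of the generated `train.py`, which only works when `"image"` is a single string. Below is a minimal sketch that reproduces the `TypeError` outside of Auto3DSeg and shows one possible local workaround: resolving each phase path separately. The helper name `join_image_paths`, the base directory, and the file names are hypothetical, and this is not necessarily how the algorithm templates should or will handle multi-phase input.

```python
import os
from typing import List, Sequence, Union


def join_image_paths(base_dir: str, image: Union[str, Sequence[str]]) -> Union[str, List[str]]:
    """Join base_dir with one path, or with each path in a list of phase images.

    Hypothetical helper: not part of MONAI or the generated train.py.
    """
    if isinstance(image, (list, tuple)):
        return [os.path.join(base_dir, p) for p in image]
    return os.path.join(base_dir, image)


# Placeholder values standing in for data_file_base_dir and one datalist entry.
base_dir = "/data/liver"
entry = {
    "image": ["case_001/arterial.nii.gz", "case_001/venous.nii.gz"],
    "label": "case_001/label.nii.gz",
}

# Reproduce the failure from the trace: os.path.join rejects a list argument.
error_message = None
try:
    os.path.join(base_dir, entry["image"])
except TypeError as exc:
    error_message = str(exc)
print("original call fails with:", error_message)

# Workaround sketch: resolve every phase path individually.
resolved_images = join_image_paths(base_dir, entry["image"])
resolved_label = join_image_paths(base_dir, entry["label"])
print(resolved_images)
print(resolved_label)
```

Whether the rest of the pipeline then accepts a list of paths under `"image"` is a separate question. As far as I know, MONAI's `LoadImage` can read a list of files and stack them along a new first dimension, but the data analyzer and the other template scripts may need similar changes. Another option, which avoids touching the templates, would be to merge the phases into one multi-channel image file per case before building the datalist.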