Auto3DSeg: How to Set up Multiple Phase Data in Datalist JSON File? #5592

@moonforsun

Describe the bug
I was able to run Auto3DSeg on the Task04_Hippocampus example. However, I encountered the following error on a custom dataset for which I want to use multiple phase images as the network input. Any suggestions on how to configure multiple-phase data in the datalist JSON file?
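For illustration, the failure can be reproduced outside Auto3DSeg. The sketch below assumes a datalist entry whose `"image"` field is a list with one path per phase; the file names and the base directory are hypothetical, not taken from the actual dataset. It mirrors the `os.path.join` call at line 95 of the generated `train.py` shown in the trace.

```python
import os

# Hypothetical datalist entry: "image" holds one path per phase.
# The file names here are made up for illustration.
entry = {
    "image": ["case_001_arterial.nii.gz", "case_001_venous.nii.gz"],
    "label": "case_001_label.nii.gz",
}
data_file_base_dir = "/data/LiverCrop"  # assumed dataset root

try:
    # Same pattern as the failing line in the generated train.py:
    # os.path.join expects a single path, not a list of paths.
    str_img = os.path.join(data_file_base_dir, entry["image"])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

A single string in `"image"` passes through this call without error, which would explain why the Task04_Hippocampus example runs.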

Environment

================================
Printing MONAI config...

MONAI version: 1.0.1
Numpy version: 1.23.5
Pytorch version: 1.13.0+cu117
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: 8271a19
MONAI file: /home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: 0.4.10
Nibabel version: 3.2.0
scikit-image version: 0.19.3
Pillow version: 9.3.0
Tensorboard version: 2.11.0
gdown version: 4.5.3
TorchVision version: 0.14.0+cu117
tqdm version: 4.64.1
lmdb version: 1.3.0
psutil version: 5.9.4
pandas version: 1.5.1
einops version: 0.6.0
transformers version: 4.21.3
mlflow version: 2.0.1
pynrrd version: 1.0.0

For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies

================================
Printing system config...

System: Linux
Linux version: Ubuntu 18.04.6 LTS
Platform: Linux-5.4.0-1069-aws-x86_64-with-glibc2.27
Processor: x86_64
Machine: x86_64
Python version: 3.8.2
Process name: python
Command: ['/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/bin/python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: []
Num physical CPUs: 24
Num logical CPUs: 48
Num usable CPUs: 48
CPU usage (%): [3.3, 3.0, 3.0, 2.6, 3.3, 3.0, 3.3, 3.0, 3.3, 2.6, 3.3, 3.0, 3.3, 3.0, 3.3, 3.3, 3.0, 3.0, 3.0, 3.0, 3.3, 3.0, 3.0, 3.0, 3.0, 3.3, 3.3, 3.0, 3.3, 3.0, 3.0, 3.0, 3.6, 3.0, 3.0, 3.3, 3.0, 3.0, 3.3, 3.0, 3.3, 3.0, 3.0, 3.3, 3.3, 3.3, 3.3, 97.7]
CPU freq. (MHz): 1959
Load avg. in last 1, 5, 15 mins (%): [0.4, 4.9, 8.0]
Disk usage (%): 38.7
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 186.7
Available memory (GB): 184.2
Used memory (GB): 1.0

================================
Printing GPU config...

Num GPUs: 4
Has CUDA: True
CUDA version: 11.7
cuDNN enabled: True
cuDNN version: 8500
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86']
GPU 0 Name: NVIDIA A10G
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 80
GPU 0 Total memory (GB): 22.2
GPU 0 CUDA capability (maj.min): 8.6
GPU 1 Name: NVIDIA A10G
GPU 1 Is integrated: False
GPU 1 Is multi GPU board: False
GPU 1 Multi processor count: 80
GPU 1 Total memory (GB): 22.2
GPU 1 CUDA capability (maj.min): 8.6
GPU 2 Name: NVIDIA A10G
GPU 2 Is integrated: False
GPU 2 Is multi GPU board: False
GPU 2 Multi processor count: 80
GPU 2 Total memory (GB): 22.2
GPU 2 CUDA capability (maj.min): 8.6
GPU 3 Name: NVIDIA A10G
GPU 3 Is integrated: False
GPU 3 Is multi GPU board: False
GPU 3 Multi processor count: 80
GPU 3 Total memory (GB): 22.2
GPU 3 CUDA capability (maj.min): 8.6

Additional context
Full trace

poetry run python -m monai.apps.auto3dseg AutoRunner run --input='./input.yaml' --work_dir='./LiverCrop'
2022-11-28 02:19:56,661 - INFO - ./LiverCrop does not exists. Creating...
2022-11-28 02:19:56,661 - INFO - ./LiverCrop created to save all results
2022-11-28 02:19:56,661 - INFO - Loading ./input.yaml for AutoRunner and making a copy in /home/ubuntu/Code/autoseg3d/LiverCrop/input.yaml
2022-11-28 02:19:56,663 - INFO - The output_dir is not specified. /home/ubuntu/Code/autoseg3d/LiverCrop/ensemble_output will be used to save ensemble predictions
2022-11-28 02:19:56,663 - INFO - Directory /home/ubuntu/Code/autoseg3d/LiverCrop/ensemble_output is created to save ensemble predictions
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 320/320 [06:45<00:00, 1.27s/it]
algo_templates.tar.gz: 296kB [00:00, 712kB/s]
2022-11-28 02:26:47,886 - INFO - Downloaded: /tmp/tmpndcigyw9/algo_templates.tar.gz
2022-11-28 02:26:47,886 - INFO - Expected md5 is None, skip md5 check for file /tmp/tmpndcigyw9/algo_templates.tar.gz.
2022-11-28 02:26:47,886 - INFO - Writing into directory: /home/ubuntu/Code/autoseg3d/LiverCrop.
2022-11-28 02:26:55,159 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/segresnet2d_0
2022-11-28 02:27:02,219 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/segresnet2d_1
2022-11-28 02:27:09,540 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/dints_0
2022-11-28 02:27:16,466 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/dints_1
2022-11-28 02:27:23,697 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0
2022-11-28 02:27:30,749 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_1
2022-11-28 02:27:38,115 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/segresnet_0
2022-11-28 02:27:45,072 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/segresnet_1
2022-11-28 02:27:45,075 - INFO - Launching: torchrun --nnodes=1 --nproc_per_node=4 /home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py run --config_file='/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/network.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_infer.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_train.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_validate.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/hyper_parameters.yaml'
['torchrun', '--nnodes=1', '--nproc_per_node=4', '/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py', 'run', "--config_file='/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/network.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_infer.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_train.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_validate.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/hyper_parameters.yaml'"]
Traceback (most recent call last):
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/bundle_gen.py", line 186, in _run_cmd
normal_out = subprocess.run(cmd.split(), env=ps_environ, check=True, capture_output=True)
File "/usr/local/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['torchrun', '--nnodes=1', '--nproc_per_node=4', '/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py', 'run', "--config_file='/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/network.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_infer.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_train.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_validate.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/hyper_parameters.yaml'"]' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/runpy.py", line 193, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.8/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/__main__.py", line 22, in <module>
fire.Fire(
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/auto_runner.py", line 586, in run
self._train_algo_in_sequence(history)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/auto_runner.py", line 488, in _train_algo_in_sequence
algo.train(self.train_params)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/bundle_gen.py", line 203, in train
return self._run_cmd(cmd, devices_info)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/bundle_gen.py", line 191, in _run_cmd
raise RuntimeError(f"subprocess call error {e.returncode}: {errors}, {output}") from e
RuntimeError: subprocess call error 1: b'WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


Traceback (most recent call last):
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 409, in <module>
fire.Fire()
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 95, in run
str_img = os.path.join(data_file_base_dir, list_train[_i]["image"])
File "/usr/local/lib/python3.8/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/local/lib/python3.8/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'list'
Traceback (most recent call last):
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 409, in
fire.Fire()
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
Traceback (most recent call last):
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 409, in
component, remaining_args = _CallAndUpdateTrace(
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
fire.Fire()
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component = fn(*varargs, **kwargs)
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 95, in run
component_trace = _Fire(component, args, parsed_flag_args, context, name)
str_img = os.path.join(data_file_base_dir, list_train[_i]["image"]) File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire

File "/usr/local/lib/python3.8/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/local/lib/python3.8/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'list'
component, remaining_args = _CallAndUpdateTrace(
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 95, in run
str_img = os.path.join(data_file_base_dir, list_train[_i]["image"])
File "/usr/local/lib/python3.8/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/local/lib/python3.8/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'list'
Traceback (most recent call last):
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 409, in
fire.Fire()
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 95, in run
str_img = os.path.join(data_file_base_dir, list_train[_i]["image"])
File "/usr/local/lib/python3.8/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/usr/local/lib/python3.8/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'list'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 34440) of binary: /home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/bin/python
Traceback (most recent call last):
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py FAILED

Failures:
[1]:
time : 2022-11-28_02:27:51
host : ip-172-20-253-208.us-west-2.compute.internal
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 34441)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2022-11-28_02:27:51
host : ip-172-20-253-208.us-west-2.compute.internal
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 34442)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2022-11-28_02:27:51
host : ip-172-20-253-208.us-west-2.compute.internal
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 34443)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2022-11-28_02:27:51
host : ip-172-20-253-208.us-west-2.compute.internal
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 34440)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

', b'[info] number of GPUs: 4
[info] number of GPUs: 4
[info] number of GPUs: 4
2022-11-28 02:27:50,522 - Added key: store_based_barrier_key:1 to store for rank: 2
[info] number of GPUs: 4
2022-11-28 02:27:50,564 - Added key: store_based_barrier_key:1 to store for rank: 1
2022-11-28 02:27:50,570 - Added key: store_based_barrier_key:1 to store for rank: 0
2022-11-28 02:27:50,572 - Added key: store_based_barrier_key:1 to store for rank: 3
2022-11-28 02:27:50,572 - Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
[info] world_size: 4
2022-11-28 02:27:50,574 - Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
[info] world_size: 4
2022-11-28 02:27:50,574 - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
[info] world_size: 4
2022-11-28 02:27:50,581 - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.
[info] world_size: 4
'
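One possible local workaround, sketched under assumptions, is to make the path-joining step accept either a single path or a list of per-phase paths. The helper name `join_image_paths` is hypothetical and is not part of MONAI or of the generated `train.py`; it only shows the pattern that could replace the `os.path.join(data_file_base_dir, list_train[_i]["image"])` call. Whether the rest of the generated pipeline handles a list of image paths has not been verified here.

```python
import os
from typing import List, Sequence, Union


def join_image_paths(
    base_dir: str, image: Union[str, Sequence[str]]
) -> Union[str, List[str]]:
    """Prefix base_dir to one path, or to every path in a list of phases.

    Hypothetical helper for illustration; not a MONAI API.
    """
    if isinstance(image, (list, tuple)):
        # Multi-phase case: join each phase path separately.
        return [os.path.join(base_dir, path) for path in image]
    # Single-image case: same behavior as the original call.
    return os.path.join(base_dir, image)


# Usage with made-up file names:
single = join_image_paths("/data", "case_001.nii.gz")
multi = join_image_paths("/data", ["case_001_art.nii.gz", "case_001_ven.nii.gz"])
print(single)
print(multi)
```

An alternative that avoids editing the generated scripts would be to merge the phases of each case into one multi-channel image file beforehand, so that `"image"` stays a single string.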
