Error when building current TensorFlow 2 with the CPU instruction flags avx512f, avx512vnni, avx512bf16, fma

Good afternoon,

I am trying to build the current TensorFlow binary with Python 3.12, Clang 18, CUDA 12.5, and NVIDIA driver 555, as suggested on:
https://www.tensorflow.org/install/source
https://www.tensorflow.org/install/pip

I checked out the master branch of https://github.com/tensorflow/tensorflow.git

The zip attachment with the files requested by the error message for root-cause analysis is at the end.
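
The two build invocations in the steps below differ only in four `--copt` flags, so one way to narrow this down is to add them one at a time. Here is a minimal sketch that prints such commands. I have not run these builds. The helper names (`build_command`, `one_at_a_time`) are made up for illustration, and pointing bazel at `//tensorflow/core/kernels:matmul_op` (the target named in the error) instead of the full wheel is my assumption to shorten each attempt. The duplicated `-mavx` from my original command is listed only once.

```python
import shlex

# Target named in the error message; building only this instead of the
# full wheel is an assumption made to keep each attempt short.
FAILING_TARGET = "//tensorflow/core/kernels:matmul_op"

# Flags from the invocation that built without error (-mavx deduplicated).
BASELINE_COPTS = ["-mavx", "-msse4.1", "-msse4.2", "-mavx2"]

# The four flags whose addition coincides with the backend crash.
SUSPECT_COPTS = ["-mavx512f", "-mavx512vnni", "-mavx512bf16", "-mfma"]

CONFIGS = ["--config=cuda", "--config=cuda_wheel"]


def build_command(extra_copts, target=FAILING_TARGET):
    """Return the bazel argv for the baseline flags plus extra_copts."""
    copts = BASELINE_COPTS + list(extra_copts)
    return (
        [
            "bazel", "build", target,
            "--repo_env=USE_PYWRAP_RULES=1",
            "--repo_env=WHEEL_NAME=tensorflow",
            "-c", "opt",
        ]
        + [f"--copt={flag}" for flag in copts]
        + CONFIGS
    )


def one_at_a_time():
    """One command per suspect flag, each added alone to the baseline."""
    return [build_command([flag]) for flag in SUSPECT_COPTS]


if __name__ == "__main__":
    for argv in one_at_a_time():
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(argv))
```

If a single flag already reproduces the `Cannot select` error, that flag plus the preprocessed source should be enough for a smaller report to LLVM.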

Steps:
```
1.
$ python3 configure.py
You have bazel 7.4.1 installed.
Please specify the location of python. [Default is /home/simon/.pyenv/versions/3.12.10/bin/python3]: 


Found possible Python library paths:
  /home/simon/.pyenv/versions/3.12.10/lib/python3.12/site-packages
Please input the desired Python library path to use.  Default is [/home/simon/.pyenv/versions/3.12.10/lib/python3.12/site-packages]

Do you wish to build TensorFlow with ROCm support? [y/N]: 
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the hermetic CUDA version you want to use or leave empty to use the default version. 

Please specify the hermetic cuDNN version you want to use or leave empty to use the default version. 

Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 8.9

Please specify the local CUDA path you want to use or leave empty to use the default version.

Please specify the local CUDNN path you want to use or leave empty to use the default version.

Please specify the local NCCL path you want to use or leave empty to use the default version.

Do you want to use clang as CUDA compiler? [Y/n]: 
Clang will be used as CUDA compiler.

Please specify clang path that to be used as host compiler. [Default is /usr/lib/llvm-18/bin/clang]: 

You have Clang 18.1.3 installed.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
	--config=mkl         	# Build with MKL support.
	--config=mkl_aarch64 	# Build with oneDNN and Compute Library for the Arm Architecture (ACL).
	--config=monolithic  	# Config for mostly static monolithic build.
	--config=numa        	# Build with NUMA support.
	--config=dynamic_kernels	# (Experimental) Build kernels into separate shared objects.
	--config=v1          	# Build with TensorFlow 1 API instead of TF 2 API.
Preconfigured Bazel build configs to DISABLE default on features:
	--config=nogcp       	# Disable GCP support.
	--config=nonccl      	# Disable NVIDIA NCCL support.



2.
$ bazel build //tensorflow/tools/pip_package:wheel --repo_env=USE_PYWRAP_RULES=1 --repo_env=WHEEL_NAME=tensorflow -c opt --copt=-mavx --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --config=cuda --config=cuda_wheel
--> OK, no error


with additional params: --copt=-mavx512f --copt=-mavx512vnni --copt=-mavx512bf16 --copt=-mfma:

$ bazel build //tensorflow/tools/pip_package:wheel --repo_env=USE_PYWRAP_RULES=1 --repo_env=WHEEL_NAME=tensorflow -c opt --copt=-mavx --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mavx512f --copt=-mavx512vnni --copt=-mavx512bf16 --copt=-mfma --config=cuda --config=cuda_wheel
WARNING: The following configs were expanded more than once: [cuda_clang, cuda, cuda_version]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
INFO: Reading 'startup' options from /home/simon/tensorflow/.bazelrc: --windows_enable_symlinks
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=162
INFO: Reading rc options for 'build' from /home/simon/tensorflow/.bazelrc:
  Inherited 'common' options: --announce_rc --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility --noenable_bzlmod --noincompatible_enable_cc_toolchain_resolution --noincompatible_enable_android_toolchain_resolution --experimental_repo_remote_exec --java_runtime_version=remotejdk_21
INFO: Reading rc options for 'build' from /home/simon/tensorflow/.bazelrc:
  'build' options: --repo_env=ML_WHEEL_TYPE=snapshot --repo_env=ML_WHEEL_BUILD_DATE= --repo_env=ML_WHEEL_VERSION_SUFFIX= --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --features=-force_no_whole_archive --host_features=-force_no_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --@rules_python//python/config_settings:precompile=force_disabled
INFO: Reading rc options for 'build' from /home/simon/tensorflow/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/home/simon/.pyenv/versions/3.12.10/bin/python3 --action_env PYTHON_LIB_PATH=/home/simon/.pyenv/versions/3.12.10/lib/python3.12/site-packages --python_path=/home/simon/.pyenv/versions/3.12.10/bin/python3 --action_env LD_LIBRARY_PATH=/usr/local/cuda-12.5/lib64: --config=cuda_clang --action_env CLANG_CUDA_COMPILER_PATH=/usr/lib/llvm-18/bin/clang --config=cuda_clang
...
WARNING: The following configs were expanded more than once: [cuda_clang, cuda, cuda_version]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
WARNING: Build option --copt has changed, discarding analysis cache (this can be expensive, see https://bazel.build/advanced/performance/iteration-speed).
INFO: Analyzed target //tensorflow/tools/pip_package:wheel (0 packages loaded, 58564 targets configured).
ERROR: /home/simon/tensorflow/tensorflow/core/kernels/BUILD:3587:18: Compiling tensorflow/core/kernels/matmul_op_fused.cc failed: (Exit 1): clang failed: error executing CppCompile command (from target //tensorflow/core/kernels:matmul_op) /usr/lib/llvm-18/bin/clang -MD -MF bazel-out/k8-opt/bin/tensorflow/core/kernels/_objs/matmul_op/matmul_op_fused.pic.d ... (remaining 572 arguments skipped)
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:57:9: warning: 'CHECK' macro redefined [-Wmacro-redefined]
   57 | #define CHECK(condition) ABSL_LOG_INTERNAL_CHECK_IMPL((condition), #condition)
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:304:9: note: previous definition is here
  304 | #define CHECK(condition)              \
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:65:9: warning: 'QCHECK' macro redefined [-Wmacro-redefined]
   65 | #define QCHECK(condition) ABSL_LOG_INTERNAL_QCHECK_IMPL((condition), #condition)
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:538:9: note: previous definition is here
  538 | #define QCHECK(condition) CHECK(condition)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:88:9: warning: 'DCHECK' macro redefined [-Wmacro-redefined]
   88 | #define DCHECK(condition) ABSL_LOG_INTERNAL_DCHECK_IMPL((condition), #condition)
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:517:9: note: previous definition is here
  517 | #define DCHECK(condition) \
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:116:9: warning: 'CHECK_EQ' macro redefined [-Wmacro-redefined]
  116 | #define CHECK_EQ(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:495:9: note: previous definition is here
  495 | #define CHECK_EQ(val1, val2) CHECK_OP(Check_EQ, ==, val1, val2)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:118:9: warning: 'CHECK_NE' macro redefined [-Wmacro-redefined]
  118 | #define CHECK_NE(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:496:9: note: previous definition is here
  496 | #define CHECK_NE(val1, val2) CHECK_OP(Check_NE, !=, val1, val2)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:120:9: warning: 'CHECK_LE' macro redefined [-Wmacro-redefined]
  120 | #define CHECK_LE(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:497:9: note: previous definition is here
  497 | #define CHECK_LE(val1, val2) CHECK_OP(Check_LE, <=, val1, val2)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:122:9: warning: 'CHECK_LT' macro redefined [-Wmacro-redefined]
  122 | #define CHECK_LT(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:498:9: note: previous definition is here
  498 | #define CHECK_LT(val1, val2) CHECK_OP(Check_LT, <, val1, val2)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:124:9: warning: 'CHECK_GE' macro redefined [-Wmacro-redefined]
  124 | #define CHECK_GE(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:499:9: note: previous definition is here
  499 | #define CHECK_GE(val1, val2) CHECK_OP(Check_GE, >=, val1, val2)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:126:9: warning: 'CHECK_GT' macro redefined [-Wmacro-redefined]
  126 | #define CHECK_GT(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:500:9: note: previous definition is here
  500 | #define CHECK_GT(val1, val2) CHECK_OP(Check_GT, >, val1, val2)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:128:9: warning: 'QCHECK_EQ' macro redefined [-Wmacro-redefined]
  128 | #define QCHECK_EQ(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:539:9: note: previous definition is here
  539 | #define QCHECK_EQ(x, y) CHECK_EQ(x, y)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:130:9: warning: 'QCHECK_NE' macro redefined [-Wmacro-redefined]
  130 | #define QCHECK_NE(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:540:9: note: previous definition is here
  540 | #define QCHECK_NE(x, y) CHECK_NE(x, y)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:132:9: warning: 'QCHECK_LE' macro redefined [-Wmacro-redefined]
  132 | #define QCHECK_LE(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:541:9: note: previous definition is here
  541 | #define QCHECK_LE(x, y) CHECK_LE(x, y)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:134:9: warning: 'QCHECK_LT' macro redefined [-Wmacro-redefined]
  134 | #define QCHECK_LT(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:542:9: note: previous definition is here
  542 | #define QCHECK_LT(x, y) CHECK_LT(x, y)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:136:9: warning: 'QCHECK_GE' macro redefined [-Wmacro-redefined]
  136 | #define QCHECK_GE(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:543:9: note: previous definition is here
  543 | #define QCHECK_GE(x, y) CHECK_GE(x, y)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:138:9: warning: 'QCHECK_GT' macro redefined [-Wmacro-redefined]
  138 | #define QCHECK_GT(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:544:9: note: previous definition is here
  544 | #define QCHECK_GT(x, y) CHECK_GT(x, y)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:140:9: warning: 'DCHECK_EQ' macro redefined [-Wmacro-redefined]
  140 | #define DCHECK_EQ(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:527:9: note: previous definition is here
  527 | #define DCHECK_EQ(x, y) _TF_DCHECK_NOP(x, y)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:142:9: warning: 'DCHECK_NE' macro redefined [-Wmacro-redefined]
  142 | #define DCHECK_NE(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:528:9: note: previous definition is here
  528 | #define DCHECK_NE(x, y) _TF_DCHECK_NOP(x, y)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:144:9: warning: 'DCHECK_LE' macro redefined [-Wmacro-redefined]
  144 | #define DCHECK_LE(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:529:9: note: previous definition is here
  529 | #define DCHECK_LE(x, y) _TF_DCHECK_NOP(x, y)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:146:9: warning: 'DCHECK_LT' macro redefined [-Wmacro-redefined]
  146 | #define DCHECK_LT(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:530:9: note: previous definition is here
  530 | #define DCHECK_LT(x, y) _TF_DCHECK_NOP(x, y)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:148:9: warning: 'DCHECK_GE' macro redefined [-Wmacro-redefined]
  148 | #define DCHECK_GE(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:531:9: note: previous definition is here
  531 | #define DCHECK_GE(x, y) _TF_DCHECK_NOP(x, y)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:34:
external/com_google_absl/absl/log/check.h:150:9: warning: 'DCHECK_GT' macro redefined [-Wmacro-redefined]
  150 | #define DCHECK_GT(val1, val2) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:532:9: note: previous definition is here
  532 | #define DCHECK_GT(x, y) _TF_DCHECK_NOP(x, y)
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:37:
In file included from external/local_xla/xla/primitive_util.h:31:
external/com_google_absl/absl/log/log.h:199:9: warning: 'LOG' macro redefined [-Wmacro-redefined]
  199 | #define LOG(severity) ABSL_LOG_INTERNAL_LOG_IMPL(_##severity)
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:160:9: note: previous definition is here
  160 | #define LOG(severity) _TF_LOG_##severity
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:37:
In file included from external/local_xla/xla/primitive_util.h:31:
external/com_google_absl/absl/log/log.h:237:9: warning: 'LOG_EVERY_N' macro redefined [-Wmacro-redefined]
  237 | #define LOG_EVERY_N(severity, n) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:274:9: note: previous definition is here
  274 | #define LOG_EVERY_N(severity, n)                       \
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:37:
In file included from external/local_xla/xla/primitive_util.h:31:
external/com_google_absl/absl/log/log.h:245:9: warning: 'LOG_FIRST_N' macro redefined [-Wmacro-redefined]
  245 | #define LOG_FIRST_N(severity, n) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:280:9: note: previous definition is here
  280 | #define LOG_FIRST_N(severity, n)                       \
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:37:
In file included from external/local_xla/xla/primitive_util.h:31:
external/com_google_absl/absl/log/log.h:253:9: warning: 'LOG_EVERY_POW_2' macro redefined [-Wmacro-redefined]
  253 | #define LOG_EVERY_POW_2(severity) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:286:9: note: previous definition is here
  286 | #define LOG_EVERY_POW_2(severity)                         \
      |         ^
In file included from tensorflow/core/kernels/matmul_op_fused.cc:60:
In file included from external/local_xla/xla/stream_executor/gpu/redzone_allocator.h:26:
In file included from external/local_xla/xla/shape.h:37:
In file included from external/local_xla/xla/primitive_util.h:31:
external/com_google_absl/absl/log/log.h:265:9: warning: 'LOG_EVERY_N_SEC' macro redefined [-Wmacro-redefined]
  265 | #define LOG_EVERY_N_SEC(severity, n_seconds) \
      |         ^
external/local_xla/xla/tsl/platform/default/logging.h:296:9: note: previous definition is here
  296 | #define LOG_EVERY_N_SEC(severity, n_seconds)                      \
      |         ^
fatal error: error in backend: Cannot select: 0x5c17b34467b0: v8f16,ch = masked_load<(load unknown-size from %ir.477, align 2, !alias.scope !4349)> 0x5c17ba7dd760, 0x5c17af9d7b10, undef:i64, 0x5c17a3780e90, undef:v8f16
  0x5c17af9d7b10: i64 = add 0x5c17b9dc0130, 0x5c17a2b8c390
    0x5c17b9dc0130: i64,ch = CopyFromReg 0x5c17ba7dd760, Register:i64 %33
      0x5c17b9dc0910: i64 = Register %33
    0x5c17a2b8c390: i64 = shl 0x5c17b3446dd0, Constant:i8<1>
      0x5c17b3446dd0: i64,ch = CopyFromReg 0x5c17ba7dd760, Register:i64 %37
        0x5c17af9d7c60: i64 = Register %37
      0x5c17a12e56a0: i8 = Constant<1>
  0x5c17ac4e4850: i64 = undef
  0x5c17a3780e90: v8i1 = extract_subvector 0x5c17ac6d0c10, Constant:i64<0>
    0x5c17ac6d0c10: v16i1 = insert_subvector 0x5c17a3781360, 0x5c17b3eff7b0, Constant:i64<0>
      0x5c17a3781360: v16i1 = BUILD_VECTOR Constant:i8<0>, Constant:i8<0>, Constant:i8<0>, Constant:i8<0>, Constant:i8<0>, Constant:i8<0>, Constant:i8<0>, Constant:i8<0>, Constant:i8<0>, Constant:i8<0>, Constant:i8<0>, Constant:i8<0>, Constant:i8<0>, Constant:i8<0>, Constant:i8<0>, Constant:i8<0>
        0x5c17b3eff9e0: i8 = Constant<0>
        0x5c17b3eff9e0: i8 = Constant<0>
        0x5c17b3eff9e0: i8 = Constant<0>
        0x5c17b3eff9e0: i8 = Constant<0>
        0x5c17b3eff9e0: i8 = Constant<0>
        0x5c17b3eff9e0: i8 = Constant<0>
        0x5c17b3eff9e0: i8 = Constant<0>
        0x5c17b3eff9e0: i8 = Constant<0>
        0x5c17b3eff9e0: i8 = Constant<0>
        0x5c17b3eff9e0: i8 = Constant<0>
        0x5c17b3eff9e0: i8 = Constant<0>
        0x5c17b3eff9e0: i8 = Constant<0>
        0x5c17b3eff9e0: i8 = Constant<0>
        0x5c17b3eff9e0: i8 = Constant<0>
        0x5c17b3eff9e0: i8 = Constant<0>
        0x5c17b3eff9e0: i8 = Constant<0>
      0x5c17b3eff7b0: v4i1 = X86ISD::CMPM 0x5c17acaf76e0, 0x5c17a4e96110, TargetConstant:i8<5>
        0x5c17acaf76e0: v4f32 = fadd 0x5c17adcafaa0, 0x5c17a12e6040
          0x5c17adcafaa0: v4f32,ch = load<(load (s128) from %ir.464, align 4, !tbaa !41, !alias.scope !4342)> 0x5c17ba7dd760, 0x5c17bbafe5f0, undef:i64
            0x5c17bbafe5f0: i64 = add 0x5c17adcaf720, 0x5c17a46d51b0
              0x5c17adcaf720: i64,ch = CopyFromReg 0x5c17ba7dd760, Register:i64 %30
                0x5c17b3f8fb50: i64 = Register %30
              0x5c17a46d51b0: i64 = shl 0x5c17b3446dd0, Constant:i8<2>
                0x5c17b3446dd0: i64,ch = CopyFromReg 0x5c17ba7dd760, Register:i64 %37
                  0x5c17af9d7c60: i64 = Register %37
                0x5c17b32f6e90: i8 = Constant<2>
            0x5c17ac4e4850: i64 = undef
          0x5c17a12e6040: v4f32 = X86ISD::CVTPH2PS 0x5c17aee48760
            0x5c17aee48760: v8i16 = bitcast 0x5c17a37812f0
              0x5c17a37812f0: v2i64 = scalar_to_vector 0x5c17ba10ec80
                0x5c17ba10ec80: i64,ch = load<(load (s64) from %ir.467, align 2)> 0x5c17ba7dd760, 0x5c17a3a13410, undef:i64
                  0x5c17a3a13410: i64 = add 0x5c17b9da2770, 0x5c17a2b8c390


                  0x5c17ac4e4850: i64 = undef
        0x5c17a4e96110: v4f32,ch = CopyFromReg 0x5c17ba7dd760, Register:v4f32 %27
          0x5c17ba18d300: v4f32 = Register %27
        0x5c17b4351de0: i8 = TargetConstant<5>
      0x5c17adcaf4f0: i64 = Constant<0>
    0x5c17adcaf4f0: i64 = Constant<0>
  0x5c17b33e5380: v8f16 = undef

In function: _ZN5Eigen8internal14TensorExecutorIKNS_14TensorAssignOpINS_9TensorMapINS_6TensorIfLi1ELi1ElEELi0ENS_11MakePointerEEEKNS_14TensorSelectOpIKNS_19TensorCwiseBinaryOpINS0_13scalar_cmp_opIffLNS0_14ComparisonNameE1ELb0EEEKNS9_INS0_13scalar_sum_opIffEEKS7_KNS_18TensorConversionOpIfKNS3_INS4_IKNS_4halfELi1ELi1ElEELi0ES6_EEEEEEKNS_20TensorCwiseNullaryOpINS0_18scalar_constant_opIfEESP_EEEEKNS9_INS0_17scalar_product_opIffEESP_SU_EESP_EEEENS_13DefaultDeviceELb1ELNS0_15TiledEvaluationE0EE3runERS14_RKS15_

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /usr/lib/llvm-18/bin/clang -MD -MF bazel-out/k8-opt/bin/tensorflow/core/kernels/_objs/matmul_op/matmul_op_fused.pic.d -frandom-seed=bazel-out/k8-opt/bin/tensorflow/core/kernels/_objs/matmul_op/matmul_op_fused.pic.o -DEIGEN_MAX_ALIGN_BYTES=64 -DEIGEN_ALLOW_UNALIGNED_SCALARS -DEIGEN_USE_AVX512_GEMM_KERNELS=0 -DHAVE_SYS_UIO_H -DTF_USE_SNAPPY -DTENSORFLOW_USE_NUMA -DTF_ENABLE_ACTIVITY_WATCHER -DTF_MAJOR_VERSION=2 -DTF_MINOR_VERSION=20 -DTF_PATCH_VERSION=0 -DTF_VERSION_SUFFIX=\"-dev0+selfbuilt\" -DLLVM_ON_UNIX=1 -DHAVE_BACKTRACE=1 -DBACKTRACE_HEADER=<execinfo.h> -DLTDL_SHLIB_EXT=\".so\" -DLLVM_PLUGIN_EXT=\".so\" -DLLVM_ENABLE_THREADS=1 -DHAVE_DEREGISTER_FRAME=1 -DHAVE_LIBPTHREAD=1 -DHAVE_PTHREAD_GETNAME_NP=1 -DHAVE_PTHREAD_H=1 -DHAVE_PTHREAD_SETNAME_NP=1 -DHAVE_REGISTER_FRAME=1 -DHAVE_SETENV_R=1 -DHAVE_STRERROR_R=1 -DHAVE_SYSEXITS_H=1 -DHAVE_UNISTD_H=1 -D_GNU_SOURCE -DHAVE_GETAUXVAL=1 -DHAVE_MALLINFO=1 -DHAVE_SBRK=1 -DHAVE_STRUCT_STAT_ST_MTIM_TV_NSEC=1 -DHAVE_BUILTIN_THREAD_POINTER -DLLVM_NATIVE_ARCH=\"X86\" -DLLVM_NATIVE_ASMPARSER=LLVMInitializeX86AsmParser -DLLVM_NATIVE_ASMPRINTER=LLVMInitializeX86AsmPrinter -DLLVM_NATIVE_DISASSEMBLER=LLVMInitializeX86Disassembler -DLLVM_NATIVE_TARGET=LLVMInitializeX86Target -DLLVM_NATIVE_TARGETINFO=LLVMInitializeX86TargetInfo -DLLVM_NATIVE_TARGETMC=LLVMInitializeX86TargetMC -DLLVM_NATIVE_TARGETMCA=LLVMInitializeX86TargetMCA -DLLVM_HOST_TRIPLE=\"x86_64-unknown-linux-gnu\" -DLLVM_DEFAULT_TARGET_TRIPLE=\"x86_64-unknown-linux-gnu\" -DLLVM_VERSION_MAJOR=21 -DLLVM_VERSION_MINOR=0 -DLLVM_VERSION_PATCH=0 -DLLVM_VERSION_STRING=\"21.0.0git\" -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DLLVM_HAS_AArch64_TARGET=1 -DLLVM_HAS_AMDGPU_TARGET=1 -DLLVM_HAS_ARM_TARGET=1 -DLLVM_HAS_NVPTX_TARGET=1 -DLLVM_HAS_PowerPC_TARGET=1 -DLLVM_HAS_RISCV_TARGET=1 -DLLVM_HAS_SystemZ_TARGET=1 -DLLVM_HAS_X86_TARGET=1 -DBLAKE3_USE_NEON=0 -DBLAKE3_NO_AVX2 -DBLAKE3_NO_AVX512 -DBLAKE3_NO_SSE2 -DBLAKE3_NO_SSE41 
-DNO_LLVM_SUPPORT=0 -DCURL_STATICLIB -DTENSORFLOW_USE_CUSTOM_CONTRACTION_KERNEL -DTENSORFLOW_USE_MKLDNN_CONTRACTION_KERNEL -DEIGEN_ALTIVEC_USE_CUSTOM_PACK=0 -DEIGEN_NEON_GEBP_NR=4 -iquote . -iquote bazel-out/k8-opt/bin -iquote external/com_google_absl -iquote bazel-out/k8-opt/bin/external/com_google_absl -iquote external/com_google_protobuf -iquote bazel-out/k8-opt/bin/external/com_google_protobuf -iquote external/zlib -iquote bazel-out/k8-opt/bin/external/zlib -iquote external/local_xla -iquote bazel-out/k8-opt/bin/external/local_xla -iquote external/local_tsl -iquote bazel-out/k8-opt/bin/external/local_tsl -iquote external/com_googlesource_code_re2 -iquote bazel-out/k8-opt/bin/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/k8-opt/bin/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/k8-opt/bin/external/fft2d -iquote external/highwayhash -iquote bazel-out/k8-opt/bin/external/highwayhash -iquote external/eigen_archive -iquote bazel-out/k8-opt/bin/external/eigen_archive -iquote external/ml_dtypes_py -iquote bazel-out/k8-opt/bin/external/ml_dtypes_py -iquote external/snappy -iquote bazel-out/k8-opt/bin/external/snappy -iquote external/hwloc -iquote bazel-out/k8-opt/bin/external/hwloc -iquote external/local_config_cuda -iquote bazel-out/k8-opt/bin/external/local_config_cuda -iquote external/cuda_cudart -iquote bazel-out/k8-opt/bin/external/cuda_cudart -iquote external/cuda_cublas -iquote bazel-out/k8-opt/bin/external/cuda_cublas -iquote external/cuda_cccl -iquote bazel-out/k8-opt/bin/external/cuda_cccl -iquote external/cuda_nvtx -iquote bazel-out/k8-opt/bin/external/cuda_nvtx -iquote external/cuda_nvcc -iquote bazel-out/k8-opt/bin/external/cuda_nvcc -iquote external/cuda_cusolver -iquote bazel-out/k8-opt/bin/external/cuda_cusolver -iquote external/cuda_cufft -iquote bazel-out/k8-opt/bin/external/cuda_cufft -iquote external/cuda_cusparse -iquote bazel-out/k8-opt/bin/external/cuda_cusparse -iquote 
external/cuda_curand -iquote bazel-out/k8-opt/bin/external/cuda_curand -iquote external/cuda_cupti -iquote bazel-out/k8-opt/bin/external/cuda_cupti -iquote external/cuda_nvml -iquote bazel-out/k8-opt/bin/external/cuda_nvml -iquote external/cuda_nvjitlink -iquote bazel-out/k8-opt/bin/external/cuda_nvjitlink -iquote external/local_config_tensorrt -iquote bazel-out/k8-opt/bin/external/local_config_tensorrt -iquote external/nvshmem -iquote bazel-out/k8-opt/bin/external/nvshmem -iquote external/nccl_archive -iquote bazel-out/k8-opt/bin/external/nccl_archive -iquote external/nvtx_archive -iquote bazel-out/k8-opt/bin/external/nvtx_archive -iquote external/cuda_cudnn -iquote bazel-out/k8-opt/bin/external/cuda_cudnn -iquote external/llvm-project -iquote bazel-out/k8-opt/bin/external/llvm-project -iquote external/cudnn_frontend_archive -iquote bazel-out/k8-opt/bin/external/cudnn_frontend_archive -iquote external/gif -iquote bazel-out/k8-opt/bin/external/gif -iquote external/curl -iquote bazel-out/k8-opt/bin/external/curl -iquote external/boringssl -iquote bazel-out/k8-opt/bin/external/boringssl -iquote external/jsoncpp_git -iquote bazel-out/k8-opt/bin/external/jsoncpp_git -iquote external/onednn -iquote bazel-out/k8-opt/bin/external/onednn -iquote external/stablehlo -iquote bazel-out/k8-opt/bin/external/stablehlo -iquote external/gemmlowp -iquote bazel-out/k8-opt/bin/external/gemmlowp -Ibazel-out/k8-opt/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers -Ibazel-out/k8-opt/bin/external/cuda_cudart/_virtual_includes/headers -Ibazel-out/k8-opt/bin/external/cuda_cublas/_virtual_includes/headers -Ibazel-out/k8-opt/bin/external/cuda_cccl/_virtual_includes/headers -Ibazel-out/k8-opt/bin/external/cuda_nvtx/_virtual_includes/headers -Ibazel-out/k8-opt/bin/external/cuda_nvcc/_virtual_includes/headers -Ibazel-out/k8-opt/bin/external/cuda_cusolver/_virtual_includes/headers -Ibazel-out/k8-opt/bin/external/cuda_cufft/_virtual_includes/headers 
-Ibazel-out/k8-opt/bin/external/cuda_cusparse/_virtual_includes/headers -Ibazel-out/k8-opt/bin/external/cuda_curand/_virtual_includes/headers -Ibazel-out/k8-opt/bin/external/cuda_cupti/_virtual_includes/headers -Ibazel-out/k8-opt/bin/external/cuda_nvml/_virtual_includes/headers -Ibazel-out/k8-opt/bin/external/cuda_nvjitlink/_virtual_includes/headers -Ibazel-out/k8-opt/bin/external/local_config_tensorrt/_virtual_includes/tensorrt_headers -Ibazel-out/k8-opt/bin/external/nvshmem/_virtual_includes/nvshmem_config -Ibazel-out/k8-opt/bin/external/nccl_archive/_virtual_includes/nccl_config -Ibazel-out/k8-opt/bin/external/nvtx_archive/_virtual_includes/headers -Ibazel-out/k8-opt/bin/external/cuda_cudnn/_virtual_includes/headers -Ibazel-out/k8-opt/bin/external/llvm-project/mlir/_virtual_includes/ArithCanonicalizationIncGen -Ibazel-out/k8-opt/bin/external/llvm-project/mlir/_virtual_includes/AsmParserTokenKinds -Ibazel-out/k8-opt/bin/external/cudnn_frontend_archive/_virtual_includes/cudnn_frontend -Ibazel-out/k8-opt/bin/external/llvm-project/mlir/_virtual_includes/MLIRShapeCanonicalizationIncGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/NVPTXCodeGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/NVPTXCommonTableGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/NVPTXInfo -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/NVPTXUtilsAndDesc -Ibazel-out/k8-opt/bin/external/llvm-project/mlir/_virtual_includes/SPIRVCanonicalizationIncGen -Ibazel-out/k8-opt/bin/external/llvm-project/mlir/_virtual_includes/ShapeToStandardGen -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/mlir_hlo -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/canonicalize_inc_gen -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/convert_op_folder -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/hlo_ops_attrs_inc_gen 
-Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/hlo_ops_common -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/hlo_ops_enums_inc_gen -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/hlo_ops_inc_gen -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/hlo_ops_pattern_inc_gen -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/hlo_ops_typedefs_inc_gen -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/mhlo_passes -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/chlo_legalize_to_hlo_inc_gen -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/hlo_legalize_to_stablehlo -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/map_stablehlo_to_hlo_op -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/legalize_to_linalg_utils -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/map_mhlo_to_scalar_op -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/transformation_helpers -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/map_chlo_to_hlo_op -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/mhlo_pass_inc_gen -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/mhlo_rng_utils -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/mhlo_scatter_gather_utils -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/shape_component_analysis -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/stablehlo_legalize_to_hlo -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/type_conversion -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/unfuse_batch_norm -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/transforms_passes 
-Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/deallocation_passes -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/deallocation_passes_inc_gen -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/deallocation_utils -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/transforms_passes_inc_gen -Ibazel-out/k8-opt/bin/external/llvm-project/mlir/_virtual_includes/GPUToNVVMGen -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/all_passes -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/transforms_gpu_passes -Ibazel-out/k8-opt/bin/external/local_xla/xla/mlir_hlo/_virtual_includes/gpu_transforms_passes_inc_gen -Ibazel-out/k8-opt/bin/external/llvm-project/mlir/_virtual_includes/GPUToROCDLTGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/AArch64CodeGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/AArch64CommonTableGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/AArch64Info -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/AArch64UtilsAndDesc -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/ARMCodeGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/ARMCommonTableGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/ARMInfo -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/ARMUtilsAndDesc -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/AMDGPUCodeGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/AMDGPUCommonTableGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/AMDGPUInfo -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/AMDGPUUtilsAndDesc -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/InstCombineTableGen 
-Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/amdgpu_isel_target_gen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/r600_target_gen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/PowerPCCodeGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/PowerPCCommonTableGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/PowerPCInfo -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/PowerPCUtilsAndDesc -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/RISCVCodeGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/RISCVCommonTableGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/RISCVInfo -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/RISCVUtilsAndDesc -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/riscv_isel_target_gen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/SystemZCodeGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/SystemZCommonTableGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/SystemZInfo -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/SystemZUtilsAndDesc -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/X86CodeGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/X86CommonTableGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/X86Info -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/X86UtilsAndDesc -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/JITLinkTableGen -Ibazel-out/k8-opt/bin/external/llvm-project/llvm/_virtual_includes/X86DisassemblerInternalHeaders -isystem external/com_google_protobuf/src -isystem bazel-out/k8-opt/bin/external/com_google_protobuf/src -isystem external/zlib -isystem bazel-out/k8-opt/bin/external/zlib -isystem external/farmhash_archive/src 
-isystem bazel-out/k8-opt/bin/external/farmhash_archive/src -isystem external/eigen_archive -isystem bazel-out/k8-opt/bin/external/eigen_archive -isystem external/eigen_archive/mkl_include -isystem bazel-out/k8-opt/bin/external/eigen_archive/mkl_include -isystem external/hwloc/hwloc -isystem bazel-out/k8-opt/bin/external/hwloc/hwloc -isystem external/hwloc/include -isystem bazel-out/k8-opt/bin/external/hwloc/include -isystem external/local_config_cuda/cuda -isystem bazel-out/k8-opt/bin/external/local_config_cuda/cuda -isystem external/cuda_cudart/include -isystem bazel-out/k8-opt/bin/external/cuda_cudart/include -isystem external/cuda_cublas/include -isystem bazel-out/k8-opt/bin/external/cuda_cublas/include -isystem external/cuda_cccl/include -isystem bazel-out/k8-opt/bin/external/cuda_cccl/include -isystem external/cuda_nvtx/include -isystem bazel-out/k8-opt/bin/external/cuda_nvtx/include -isystem external/cuda_nvcc/include -isystem bazel-out/k8-opt/bin/external/cuda_nvcc/include -isystem external/cuda_cusolver/include -isystem bazel-out/k8-opt/bin/external/cuda_cusolver/include -isystem external/cuda_cufft/include -isystem bazel-out/k8-opt/bin/external/cuda_cufft/include -isystem external/cuda_cusparse/include -isystem bazel-out/k8-opt/bin/external/cuda_cusparse/include -isystem external/cuda_curand/include -isystem bazel-out/k8-opt/bin/external/cuda_curand/include -isystem external/cuda_cupti/include -isystem bazel-out/k8-opt/bin/external/cuda_cupti/include -isystem external/cuda_nvml/include -isystem bazel-out/k8-opt/bin/external/cuda_nvml/include -isystem external/cuda_nvjitlink/include -isystem bazel-out/k8-opt/bin/external/cuda_nvjitlink/include -isystem external/cuda_cudnn/include -isystem bazel-out/k8-opt/bin/external/cuda_cudnn/include -isystem external/llvm-project/llvm/include -isystem bazel-out/k8-opt/bin/external/llvm-project/llvm/include -isystem external/llvm-project/mlir/include -isystem bazel-out/k8-opt/bin/external/llvm-project/mlir/include 
-isystem external/gif -isystem bazel-out/k8-opt/bin/external/gif -isystem external/curl/include -isystem bazel-out/k8-opt/bin/external/curl/include -isystem external/boringssl/src/include -isystem bazel-out/k8-opt/bin/external/boringssl/src/include -isystem external/jsoncpp_git/include -isystem bazel-out/k8-opt/bin/external/jsoncpp_git/include -isystem external/onednn/include -isystem bazel-out/k8-opt/bin/external/onednn/include -isystem external/onednn/src -isystem bazel-out/k8-opt/bin/external/onednn/src -isystem external/onednn/src/common -isystem bazel-out/k8-opt/bin/external/onednn/src/common -isystem external/onednn/src/common/ittnotify -isystem bazel-out/k8-opt/bin/external/onednn/src/common/ittnotify -isystem external/onednn/src/cpu -isystem bazel-out/k8-opt/bin/external/onednn/src/cpu -isystem external/onednn/src/cpu/gemm -isystem bazel-out/k8-opt/bin/external/onednn/src/cpu/gemm -isystem external/onednn/src/cpu/x64/xbyak -isystem bazel-out/k8-opt/bin/external/onednn/src/cpu/x64/xbyak -isystem external/onednn/src/graph -isystem bazel-out/k8-opt/bin/external/onednn/src/graph -isystem tensorflow/compiler/mlir/tensorflow/include -isystem bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tensorflow/include -isystem external/llvm-project/llvm/lib/Target/NVPTX -isystem bazel-out/k8-opt/bin/external/llvm-project/llvm/lib/Target/NVPTX -isystem external/llvm-project/llvm/lib/Target/AArch64 -isystem bazel-out/k8-opt/bin/external/llvm-project/llvm/lib/Target/AArch64 -isystem external/llvm-project/llvm/lib/Target/ARM -isystem bazel-out/k8-opt/bin/external/llvm-project/llvm/lib/Target/ARM -isystem external/llvm-project/llvm/lib/Target/AMDGPU -isystem bazel-out/k8-opt/bin/external/llvm-project/llvm/lib/Target/AMDGPU -isystem external/llvm-project/llvm/lib/Target/PowerPC -isystem bazel-out/k8-opt/bin/external/llvm-project/llvm/lib/Target/PowerPC -isystem external/llvm-project/llvm/lib/Target/RISCV -isystem bazel-out/k8-opt/bin/external/llvm-project/llvm/lib/Target/RISCV 
-isystem external/llvm-project/llvm/lib/Target/SystemZ -isystem bazel-out/k8-opt/bin/external/llvm-project/llvm/lib/Target/SystemZ -isystem external/llvm-project/llvm/lib/Target/X86 -isystem bazel-out/k8-opt/bin/external/llvm-project/llvm/lib/Target/X86 -fmerge-all-constants -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fPIC -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=1 -fstack-protector -Wall -Wno-invalid-partial-specialization -fno-omit-frame-pointer -no-canonical-prefixes -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections --cuda-path=external/cuda_nvcc -Wno-all -Wno-extra -Wno-deprecated -Wno-deprecated-declarations -Wno-ignored-attributes -Wno-array-bounds -Wunused-result -Werror=unused-result -Wswitch -Werror=switch -DAUTOLOAD_DYNAMIC_KERNELS -Qunused-arguments -Wno-unknown-cuda-version -Qunused-arguments -Wno-unknown-cuda-version -mavx -msse4.1 -msse4.2 -mavx -mavx2 -mavx512f -mavx512vnni -mavx512bf16 -mfma -std=c++17 -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare -ftemplate-depth=900 -fno-exceptions -DGOOGLE_CUDA=1 -DTENSORFLOW_USE_XLA=1 -DINTEL_MKL -DENABLE_ONEDNN_V3 -DAMD_ZENDNN -DTF_LLVM_X86_AVAILABLE=1 -msse3 -pthread -DNV_CUDNN_DISABLE_EXCEPTION -DGOOGLE_CUDA=1 -DNV_CUDNN_DISABLE_EXCEPTION -DTENSORFLOW_USE_XLA=1 -DINTEL_MKL=1 -c tensorflow/core/kernels/matmul_op_fused.cc -o bazel-out/k8-opt/bin/tensorflow/core/kernels/_objs/matmul_op/matmul_op_fused.pic.o
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module 'tensorflow/core/kernels/matmul_op_fused.cc'.
4.	Running pass 'X86 DAG->DAG Instruction Selection' on function '@_ZN5Eigen8internal14TensorExecutorIKNS_14TensorAssignOpINS_9TensorMapINS_6TensorIfLi1ELi1ElEELi0ENS_11MakePointerEEEKNS_14TensorSelectOpIKNS_19TensorCwiseBinaryOpINS0_13scalar_cmp_opIffLNS0_14ComparisonNameE1ELb0EEEKNS9_INS0_13scalar_sum_opIffEEKS7_KNS_18TensorConversionOpIfKNS3_INS4_IKNS_4halfELi1ELi1ElEELi0ES6_EEEEEEKNS_20TensorCwiseNullaryOpINS0_18scalar_constant_opIfEESP_EEEEKNS9_INS0_17scalar_product_opIffEESP_SU_EESP_EEEENS_13DefaultDeviceELb1ELNS0_15TiledEvaluationE0EE3runERS14_RKS15_'
 #0 0x00007ec5bc1a63bf llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xda63bf)
 #1 0x00007ec5bc1a44f9 llvm::sys::RunSignalHandlers() (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xda44f9)
 #2 0x00007ec5bc0efff3 (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xcefff3)
 #3 0x00007ec5bc0effa2 (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xceffa2)
 #4 0x00007ec5bc1a0c70 llvm::sys::Process::Exit(int, bool) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xda0c70)
 #5 0x00005c1771e0691b (/usr/lib/llvm-18/bin/clang+0x1491b)
 #6 0x00007ec5bc0fe01c llvm::report_fatal_error(llvm::Twine const&, bool) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xcfe01c)
 #7 0x00007ec5bca2f8e3 (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x162f8e3)
 #8 0x00007ec5bca2ed76 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x162ed76)
 #9 0x00007ec5bef3a2ee (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x3b3a2ee)
#10 0x00007ec5bca2611f llvm::SelectionDAGISel::DoInstructionSelection() (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x162611f)
#11 0x00007ec5bca25790 llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x1625790)
#12 0x00007ec5bca248de llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x16248de)
#13 0x00007ec5bca22934 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x1622934)
#14 0x00007ec5bef30086 (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x3b30086)
#15 0x00007ec5bc5826b9 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0x11826b9)
#16 0x00007ec5bc2f7772 llvm::FPPassManager::runOnFunction(llvm::Function&) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xef7772)
#17 0x00007ec5bc2fd2f4 llvm::FPPassManager::runOnModule(llvm::Module&) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xefd2f4)
#18 0x00007ec5bc2f7e9f llvm::legacy::PassManagerImpl::run(llvm::Module&) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xef7e9f)
#19 0x00007ec5c4817310 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/usr/lib/llvm-18/bin/../lib/libclang-cpp.so.18.1+0x1c17310)
#20 0x00007ec5c4b9fa07 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/usr/lib/llvm-18/bin/../lib/libclang-cpp.so.18.1+0x1f9fa07)
#21 0x00007ec5c37973d6 clang::ParseAST(clang::Sema&, bool, bool) (/usr/lib/llvm-18/bin/../lib/libclang-cpp.so.18.1+0xb973d6)
#22 0x00007ec5c560662c clang::FrontendAction::Execute() (/usr/lib/llvm-18/bin/../lib/libclang-cpp.so.18.1+0x2a0662c)
#23 0x00007ec5c55830b4 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/usr/lib/llvm-18/bin/../lib/libclang-cpp.so.18.1+0x29830b4)
#24 0x00007ec5c568263d clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/usr/lib/llvm-18/bin/../lib/libclang-cpp.so.18.1+0x2a8263d)
#25 0x00005c1771e0642e cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/lib/llvm-18/bin/clang+0x1442e)
#26 0x00005c1771e03894 (/usr/lib/llvm-18/bin/clang+0x11894)
#27 0x00007ec5c5233972 (/usr/lib/llvm-18/bin/../lib/libclang-cpp.so.18.1+0x2633972)
#28 0x00007ec5bc0eff77 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/usr/lib/llvm-18/bin/../lib/libLLVM.so.18.1+0xceff77)
#29 0x00007ec5c5233237 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (/usr/lib/llvm-18/bin/../lib/libclang-cpp.so.18.1+0x2633237)
#30 0x00007ec5c51fb518 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/usr/lib/llvm-18/bin/../lib/libclang-cpp.so.18.1+0x25fb518)
#31 0x00007ec5c51fb77f clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/usr/lib/llvm-18/bin/../lib/libclang-cpp.so.18.1+0x25fb77f)
#32 0x00007ec5c5217c20 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/usr/lib/llvm-18/bin/../lib/libclang-cpp.so.18.1+0x2617c20)
#33 0x00005c1771e031ec clang_main(int, char**, llvm::ToolContext const&) (/usr/lib/llvm-18/bin/clang+0x111ec)
#34 0x00005c1771e10383 main (/usr/lib/llvm-18/bin/clang+0x1e383)
#35 0x00007ec5bac2a1ca __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#36 0x00007ec5bac2a28b call_init ./csu/../csu/libc-start.c:128:20
#37 0x00007ec5bac2a28b __libc_start_main ./csu/../csu/libc-start.c:347:5
#38 0x00005c1771e00255 _start (/usr/lib/llvm-18/bin/clang+0xe255)
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
Ubuntu clang version 18.1.3 (1ubuntu1)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm-18/bin
clang: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/matmul_op_fused-129f06.cpp
clang: note: diagnostic msg: /tmp/matmul_op_fused-129f06.sh
clang: note: diagnostic msg: 

********************
Target //tensorflow/tools/pip_package:wheel failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1779.080s, Critical Path: 141.76s
INFO: 14132 processes: 35 internal, 14097 local.
ERROR: Build did NOT complete successfully
```

[matmul_op_fused-129f06.zip](https://github.com/user-attachments/files/20653600/matmul_op_fused-129f06.zip)
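
For anyone trying to narrow down which of the instruction-set flags triggers the crash in `X86 DAG->DAG Instruction Selection`, here is a minimal sketch that pulls the ISA switches out of a failing compile command so they can be re-tried one at a time. The helper name `isa_flags` and the prefix list in the regular expression are my own assumptions, chosen only to cover the flags visible in the log above; they are not part of TensorFlow or Bazel.

```python
import re
import shlex

# Matches x86 SIMD/FMA feature switches such as -mavx2 or -msse4.1.
# Assumption: the prefix list only covers the flags seen in this log.
ISA_FLAG = re.compile(r"^-m(?:no-)?(?:sse|avx|fma)[\w.]*$")


def isa_flags(command_line: str) -> list[str]:
    """Return the ISA feature flags in first-seen order, without duplicates."""
    tokens = shlex.split(command_line)
    return list(dict.fromkeys(t for t in tokens if ISA_FLAG.match(t)))


if __name__ == "__main__":
    # Tail of the failing clang invocation, shortened from the log above.
    tail = (
        "-Wno-unknown-cuda-version -mavx -msse4.1 -msse4.2 -mavx -mavx2 "
        "-mavx512f -mavx512vnni -mavx512bf16 -mfma -std=c++17 -msse3 -pthread"
    )
    print(" ".join(isa_flags(tail)))
```

The `/tmp/matmul_op_fused-129f06.sh` run script that clang generated could then be re-run with a subset of these flags to see which combination still reproduces the failure.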
