Description
Describe the bug
When building the dumb_http_server sample with CONFIG_DEBUG enabled for the stm32h747i_disco_m7 board, we very quickly get a semaphore timeout while waiting for the transmission-complete callback from the ST HAL layer. The log shows:
[00:00:02.719,000] <err> eth_stm32_hal: HAL_ETH_TransmitIT tx_int_sem take timeout
[00:00:02.719,000] <err> eth_stm32_hal: eth packet timeout
98 fa 9b 39 67 d7 02 80 e1 67 51 79 08 00 45 00 |...9g... .gQy..E.
00 28 00 00 00 00 40 06 64 a8 a9 fe 61 15 a9 fe |.(....@. d...a...
61 16 1f 90 ed 02 04 81 fc b9 99 ef e4 5e 50 10 |a....... .....^P.
05 00 08 90 00 00 |......
Often it hangs and doesn't recover. Built without CONFIG_DEBUG, it works flawlessly. Increasing the semaphore timeout doesn't make any difference.
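To see which packet is timing out, the first hex dump above can be decoded by hand or with a small host-side helper. The following is a minimal sketch, not part of the Zephyr driver: `frame_summary`, `summarize_frame` and `issue_frame` are hypothetical names introduced here, and the parser assumes an untagged Ethernet II frame carrying IPv4/TCP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the fields we care about (not a Zephyr type). */
struct frame_summary {
    uint16_t ethertype;
    uint8_t  ip_proto;
    uint16_t ip_total_len;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  tcp_flags;
};

/* Read a big-endian (network order) 16-bit value. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the frame is too short or not IPv4/TCP. */
int summarize_frame(const uint8_t *buf, size_t len, struct frame_summary *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 14 + 20) {
        return -1;                      /* Ethernet + minimal IPv4 header */
    }
    out->ethertype = read_be16(buf + 12);
    if (out->ethertype != 0x0800) {
        return -1;                      /* not IPv4 */
    }
    const uint8_t *ip = buf + 14;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;
    if ((ip[0] >> 4) != 4 || ihl < 20) {
        return -1;                      /* malformed IPv4 header */
    }
    out->ip_total_len = read_be16(ip + 2);
    out->ip_proto = ip[9];
    if (out->ip_proto != 6 || len < 14 + ihl + 20) {
        return -1;                      /* not TCP, or TCP header truncated */
    }
    const uint8_t *tcp = ip + ihl;
    out->src_port = read_be16(tcp);
    out->dst_port = read_be16(tcp + 2);
    out->tcp_flags = tcp[13];
    return 0;
}

/* The 54 bytes from the first hex dump in this report. */
const uint8_t issue_frame[54] = {
    0x98, 0xfa, 0x9b, 0x39, 0x67, 0xd7, 0x02, 0x80,
    0xe1, 0x67, 0x51, 0x79, 0x08, 0x00, 0x45, 0x00,
    0x00, 0x28, 0x00, 0x00, 0x00, 0x00, 0x40, 0x06,
    0x64, 0xa8, 0xa9, 0xfe, 0x61, 0x15, 0xa9, 0xfe,
    0x61, 0x16, 0x1f, 0x90, 0xed, 0x02, 0x04, 0x81,
    0xfc, 0xb9, 0x99, 0xef, 0xe4, 0x5e, 0x50, 0x10,
    0x05, 0x00, 0x08, 0x90, 0x00, 0x00,
};
```

Calling `summarize_frame(issue_frame, sizeof issue_frame, &s)` on this dump gives source port 8080, an IP total length of 40 and TCP flags 0x10, so the packet that timed out looks like a bare TCP ACK sent by the HTTP server with no payload.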
What have you tried to diagnose or workaround this issue?
With the instruction cache disabled, it works flawlessly with CONFIG_DEBUG enabled. I managed to track it down to
modules/hal/stm32/stm32cube/stm32h7xx/drivers/src/stm32h7xx_hal_eth.c: 2979. If we enable the instruction cache before this line, we get the timeout. If we enable it after this line, it works. If we insert a data barrier after this line, it also works:
/* Mark it as LAST descriptor */
SET_BIT(dmatxdesc->DESC3, ETH_DMATXNDESCRF_LD);
__DSB();
I guess this is a fix, but it's really not that nice to be messing around in ST's HAL, and I'm also wondering whether this fix only hides a problem that we are causing in the driver. Might we be doing something wrong in our stm32h7 driver? Compared with the ST samples for STM32H743, it looks correct. I didn't find anybody else having this issue with ST's HAL on the stm32h747i MCU.
I've seen some people from ST contributing here, maybe somebody can take a look at this?
Changing the buffer alignment doesn't have any effect either; I tried 256-byte alignment and confirmed that the buffers were actually aligned.
Please note that this issue was seen when making this driver as well (#27188 (comment))
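The workaround pattern described above (write the descriptor word, then issue a data synchronization barrier) can be sketched outside the HAL as follows. This is an illustration under assumptions, not the HAL code: `struct tx_desc`, `TX_DESC3_LD`, `data_sync_barrier` and `mark_last_descriptor` are hypothetical stand-ins, the bit position is assumed to match `ETH_DMATXNDESCRF_LD`, and on a non-ARM host the barrier falls back to a compiler builtin fence so the sketch still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a four-word DMA TX descriptor. */
struct tx_desc {
    volatile uint32_t desc0;
    volatile uint32_t desc1;
    volatile uint32_t desc2;
    volatile uint32_t desc3;
};

/* Assumed to correspond to ETH_DMATXNDESCRF_LD (last-descriptor flag). */
#define TX_DESC3_LD (1UL << 28)

/* On 32-bit ARM this emits DSB, which is what CMSIS __DSB() does.
 * Elsewhere it is only a full fence so the sketch builds on a host. */
static inline void data_sync_barrier(void)
{
#if defined(__arm__)
    __asm__ volatile("dsb sy" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Mark the descriptor as the last one of the frame and make sure the
 * write has completed before any later step hands it to the DMA. */
void mark_last_descriptor(struct tx_desc *d)
{
    assert(d != NULL);
    d->desc3 |= TX_DESC3_LD;
    data_sync_barrier();
}
```

Whether the barrier belongs at this exact point in the HAL or somewhere in the Zephyr driver is exactly the open question of this report; the sketch only shows the ordering that made the timeout disappear in testing.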
To Reproduce
Steps to reproduce the behavior:
- add to
/boards/arm/stm32h747i_disco/stm32h747i_disco.dtsi
&mac {
status="okay";
pinctrl-0 = <&eth_tx_en_pg11 &eth_txd1_pg12
&eth_txd0_pg13 &eth_mdc_pc1
&eth_mdio_pa2 &eth_ref_clk_pa1
&eth_crs_dv_pa7 &eth_rxd0_pc4
&eth_rxd1_pc5>;
};
- add to
samples/net/sockets/dumb_http_server/prj.conf
CONFIG_NET_L2_ETHERNET=y
CONFIG_ETH_STM32_HAL=y
CONFIG_DEBUG=y
- west build -b stm32h747i_disco_m7 zephyr/samples/net/sockets/dumb_http_server/
- west flash
- Open COM port and do repeated HTTP requests to the devkit over Ethernet, for example
ab -n 100 -c 1 http://192.0.2.1:8080/
- The error will be printed in the terminal instantly, and it will hang subsequent transmissions
Expected behavior
The Ethernet TX complete semaphore should not time out. The temporary fix proves this is possible.
Impact
Showstopper if it hangs, which it appears to do. We realize it only happens when building for debug, but not being able to debug properly is not sustainable. Something is wrong.
Logs and console output
Please note that I am using different IP addresses than the example.
*** Booting Zephyr OS build zephyr-v2.4.0-1314-gc3ac3027a17a ***
[00:00:00.006,000] <inf> net_config: Initializing network
[00:00:00.006,000] <inf> net_config: Waiting interface 1 (0x24004ae4) to be up...
Single-threaded dumb HTTP server waits for a connection on port 8080...
[00:00:02.000,000] <inf> net_config: Interface 1 (0x24004ae4) coming up
[00:00:02.000,000] <inf> net_config: IPv4 address: 169.254.97.21
[00:00:02.991,000] <err> eth_stm32_hal: HAL_ETH_TransmitIT tx_int_sem take timeout
[00:00:02.991,000] <err> eth_stm32_hal: eth packet timeout
98 fa 9b 39 67 d7 02 80 e1 97 e2 22 08 06 00 01 |...9g... ..."....
08 00 06 04 00 02 02 80 e1 97 e2 22 a9 fe 61 15 |........ ..."..a.
98 fa 9b 39 67 d7 a9 fe 61 16 |...9g... a.
Connection #0 from 169.254.97.22
Connection from 169.254.97.22 closed
Environment (please complete the following information):
- OS: Ubuntu 20.04
- Toolchain: Zephyr SDK
- Commit SHA or Version used: c3ac302 (v2.4.99)
Additional context
Ethernet cable connected straight to computer