eth: stm32h747i_disco: sem timeout and hang on debug build #29915

@emillindq

Describe the bug
When building the dumb_http_server sample with CONFIG_DEBUG enabled for the stm32h747i_disco_m7 board, we very quickly get a semaphore timeout while waiting for the transmission-complete callback from the ST HAL layer. The log shows:

[00:00:02.719,000] <err> eth_stm32_hal: HAL_ETH_TransmitIT tx_int_sem take timeout
[00:00:02.719,000] <err> eth_stm32_hal: eth packet timeout
98 fa 9b 39 67 d7 02 80  e1 67 51 79 08 00 45 00 |...9g... .gQy..E.
00 28 00 00 00 00 40 06  64 a8 a9 fe 61 15 a9 fe |.(....@. d...a...
61 16 1f 90 ed 02 04 81  fc b9 99 ef e4 5e 50 10 |a....... .....^P.
05 00 08 90 00 00                                |......           
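
For reference, the dumped frame can be decoded by hand. Below is a minimal host-side sketch, assuming a plain Ethernet II + IPv4 + TCP layout; the names `frame_summary_t` and `summarize_frame` are made up for illustration and are not part of Zephyr or the ST HAL. It suggests the frame that timed out is a bare TCP ACK from the board (169.254.97.21, port 8080) to the host (169.254.97.22).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 54 bytes from the hexdump above, copied verbatim. */
static const uint8_t frame[54] = {
    0x98, 0xfa, 0x9b, 0x39, 0x67, 0xd7, 0x02, 0x80,
    0xe1, 0x67, 0x51, 0x79, 0x08, 0x00, 0x45, 0x00,
    0x00, 0x28, 0x00, 0x00, 0x00, 0x00, 0x40, 0x06,
    0x64, 0xa8, 0xa9, 0xfe, 0x61, 0x15, 0xa9, 0xfe,
    0x61, 0x16, 0x1f, 0x90, 0xed, 0x02, 0x04, 0x81,
    0xfc, 0xb9, 0x99, 0xef, 0xe4, 0x5e, 0x50, 0x10,
    0x05, 0x00, 0x08, 0x90, 0x00, 0x00
};

/* Hypothetical summary type, for illustration only. */
typedef struct {
    uint16_t ethertype;
    uint16_t ip_total_len;
    uint8_t  src_ip[4];
    uint8_t  dst_ip[4];
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  tcp_flags;
} frame_summary_t;

/* Read a big-endian (network order) 16-bit value. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the frame is not IPv4/TCP or is too short. */
static int summarize_frame(const uint8_t *f, size_t len, frame_summary_t *out)
{
    if (len < 14 + 20 + 20) {
        return -1;
    }
    out->ethertype = be16(f + 12);          /* bytes 12..13: EtherType */
    if (out->ethertype != 0x0800) {
        return -1;                          /* not IPv4 */
    }

    const uint8_t *ip = f + 14;             /* IPv4 header follows the MACs */
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
    if ((ip[0] >> 4) != 4 || ihl < 20 || len < 14 + ihl + 20 || ip[9] != 6) {
        return -1;                          /* not IPv4, bad IHL, or not TCP */
    }
    out->ip_total_len = be16(ip + 2);
    memcpy(out->src_ip, ip + 12, 4);
    memcpy(out->dst_ip, ip + 16, 4);

    const uint8_t *tcp = ip + ihl;          /* TCP header follows IPv4 */
    out->src_port = be16(tcp);
    out->dst_port = be16(tcp + 2);
    out->tcp_flags = tcp[13];               /* 0x10 means ACK only */
    return 0;
}
```

Calling `summarize_frame(frame, sizeof(frame), &s)` fills `s` with the fields of interest; a flags value of 0x10 with an IP total length of 40 means there is no TCP payload in the frame.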

Often it hangs and doesn't recover. Built without CONFIG_DEBUG, it works flawlessly. Increasing the semaphore timeout makes no difference.
What have you tried to diagnose or workaround this issue?
With the instruction cache disabled, it works flawlessly even with CONFIG_DEBUG enabled. I managed to track it down to
modules/hal/stm32/stm32cube/stm32h7xx/drivers/src/stm32h7xx_hal_eth.c, line 2979. If we enable the instruction cache before this line, we get the timeout. If we enable it after this line, it works. If we insert a data synchronization barrier after this line, it also works:

/* Mark it as LAST descriptor */
SET_BIT(dmatxdesc->DESC3, ETH_DMATXNDESCRF_LD);
__DSB();
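
To make the ordering question concrete, here is a minimal host-side sketch of the pattern, not the HAL's actual code. Everything in it is an assumption for illustration: the `tx_desc_t` layout, the `TX_DESC3_LD` value, the `DESC_BARRIER()` macro (a C11 fence standing in for CMSIS `__DSB()` so the sketch compiles off-target), and the tail-pointer write that hands the descriptor to the DMA.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor layout: four 32-bit words, as in a typical
 * Ethernet DMA descriptor. Not copied from the ST HAL. */
typedef struct {
    volatile uint32_t desc0;
    volatile uint32_t desc1;
    volatile uint32_t desc2;
    volatile uint32_t desc3;
} tx_desc_t;

/* Illustrative bit for "last descriptor of the frame" (assumed value). */
#define TX_DESC3_LD (1u << 28)

/* On the Cortex-M7 target this would be __DSB() from CMSIS. A C11
 * sequentially consistent fence is used here only so the sketch builds
 * on a host; it is not a drop-in replacement for DSB on real hardware. */
#define DESC_BARRIER() atomic_thread_fence(memory_order_seq_cst)

/* Finish the descriptor, then notify the DMA.
 * tail_reg stands in for the DMA's tail-pointer register. */
static void tx_desc_finalize(tx_desc_t *d, volatile uint32_t *tail_reg,
                             uint32_t tail)
{
    /* 1. Last CPU write to the descriptor. */
    d->desc3 |= TX_DESC3_LD;

    /* 2. Make sure the descriptor write has completed before anything
     *    that lets the DMA engine read the descriptor. */
    DESC_BARRIER();

    /* 3. Only now tell the DMA there is new work. */
    *tail_reg = tail;
}
```

The point of the sketch is only the order of steps 1 to 3: without the barrier in step 2, nothing forces the descriptor write to be complete before the DMA is told to fetch it.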

I guess this is a fix, but it's not nice to be patching ST's HAL, and I'm also wondering whether it only masks a problem that our own driver is causing. Are we doing something wrong in our stm32h7 driver? Compared with ST's samples for the STM32H743, it looks correct. I didn't find anybody else having this issue with ST's HAL on the stm32h747i MCU.
I've seen some people from ST contributing here, maybe somebody can take a look at this?
Changing buffer alignment has no effect either; I tried 256-byte alignment and confirmed that the buffers were actually aligned.
Please note that this issue was also seen when this driver was first written (#27188 (comment)).

To Reproduce
Steps to reproduce the behavior:

  1. Add to /boards/arm/stm32h747i_disco/stm32h747i_disco.dtsi
&mac {
	status="okay";
	pinctrl-0 = <&eth_tx_en_pg11 &eth_txd1_pg12
				 &eth_txd0_pg13 &eth_mdc_pc1
				 &eth_mdio_pa2 &eth_ref_clk_pa1
				 &eth_crs_dv_pa7 &eth_rxd0_pc4
				 &eth_rxd1_pc5>;
};
  2. Add to samples/net/sockets/dumb_http_server/prj.conf
CONFIG_NET_L2_ETHERNET=y
CONFIG_ETH_STM32_HAL=y
CONFIG_DEBUG=y
  3. west build -b stm32h747i_disco_m7 zephyr/samples/net/sockets/dumb_http_server/
  4. west flash
  5. Open the COM port and send repeated HTTP requests to the devkit over Ethernet, for example ab -n 100 -c 1 http://192.0.2.1:8080/
  6. The error is printed in the terminal almost immediately, and subsequent transmissions hang

Expected behavior
The Ethernet TX-complete semaphore should not time out. The temporary fix shows that this is achievable.

Impact
Showstopper if it hangs, which it appears to do. We realize it only happens with debug builds, but not being able to debug properly is not sustainable. Something is wrong.

Logs and console output
Please note that I am using different IP addresses than the example.

*** Booting Zephyr OS build zephyr-v2.4.0-1314-gc3ac3027a17a  ***
[00:00:00.006,000] <inf> net_config: Initializing network
[00:00:00.006,000] <inf> net_config: Waiting interface 1 (0x24004ae4) to be up...
Single-threaded dumb HTTP server waits for a connection on port 8080...
[00:00:02.000,000] <inf> net_config: Interface 1 (0x24004ae4) coming up
[00:00:02.000,000] <inf> net_config: IPv4 address: 169.254.97.21
[00:00:02.991,000] <err> eth_stm32_hal: HAL_ETH_TransmitIT tx_int_sem take timeout
[00:00:02.991,000] <err> eth_stm32_hal: eth packet timeout
98 fa 9b 39 67 d7 02 80  e1 97 e2 22 08 06 00 01 |...9g... ..."....
08 00 06 04 00 02 02 80  e1 97 e2 22 a9 fe 61 15 |........ ..."..a.
98 fa 9b 39 67 d7 a9 fe  61 16                   |...9g... a.      
Connection #0 from 169.254.97.22
Connection from 169.254.97.22 closed

Environment (please complete the following information):

  • OS: Ubuntu 20.04
  • Toolchain: Zephyr SDK
  • Commit SHA or Version used: c3ac302 (v2.4.99)

Additional context
Ethernet cable connected straight to computer
