Description
Describe the bug
dns_resolve_close() fails to clean up DNS server slots when the context state is DNS_RESOLVE_CONTEXT_INACTIVE. This leaves stale server configuration data in memory, causing subsequent dns_resolve_init() calls to incorrectly skip socket creation, permanently breaking DNS resolution after network reconnection events.
Expected behavior:
DNS resolution should work normally after step 8 of the steps to reproduce.
Actual behavior:
DNS resolution permanently hangs/times out. All getaddrinfo() calls fail silently.
Impact
This bug affects any application that needs to reconfigure DNS after network state changes:
- Cellular modem power cycling scenarios
- VPN connection/disconnection cycles
- Any system with dynamic network topology requiring DNS reconfiguration
- Applications calling dns_resolve_close()/dns_resolve_init() multiple times
Root Cause Analysis
In subsys/net/lib/dns/resolve.c, the function dns_resolve_close_locked() contains this code:
static int dns_resolve_close_locked(struct dns_resolve_context *ctx)
{
    int i, ret;

    if (ctx->state != DNS_RESOLVE_CONTEXT_ACTIVE) {
        return -ENOENT; // Bug: returns immediately without cleanup!
    }
    // ... cleanup code that never executes when state is INACTIVE ...
}
Problem flow:
- System boots → net_init() auto-creates DNS context in INACTIVE state
- Application calls dns_resolve_init() → state becomes ACTIVE, server slot populated
- DNS works
- Network disconnects
- Application calls dns_resolve_close() → context transitions to INACTIVE
- Server slot still has: sa_family=1, IP address="141.1.1.1" (NOT cleared because close did nothing!)
- Network reconnects
- Application calls dns_resolve_init() with same DNS server
- is_server_name_found() checks server[0] with sa_family=1 → finds "server already exists"
- Code skips socket creation via continue statement
- dns_dispatcher_register() never called → socket service restart never triggered
- Socket service keeps polling old (closed) socket FD
- DNS responses arrive but are never delivered to application
- DNS permanently broken
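The problem flow above can be replayed on a host with a simplified model. This is only a sketch: every `model_*` name is hypothetical and does not exist in Zephyr, the slot layout is reduced to the three fields printed in the logs, and the starting state (context INACTIVE, slot 0 still holding `sa_family=1`, `sock=-1`, address 141.1.1.1) is assumed from the broken-scenario log rather than derived from the real resolver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical host-side model; none of these names exist in Zephyr. */
#define MODEL_SERVER_COUNT 2

enum model_state { MODEL_INACTIVE, MODEL_ACTIVE };

struct model_server {
    int sa_family;  /* 0 = slot free, 1 = AF_INET as printed in the log */
    int sock;       /* -1 = no socket open */
    char addr[16];
};

struct model_ctx {
    enum model_state state;
    struct model_server servers[MODEL_SERVER_COUNT];
    int next_fd;    /* fake descriptor counter */
};

static void model_reset(struct model_ctx *ctx)
{
    memset(ctx, 0, sizeof(*ctx));
    ctx->state = MODEL_INACTIVE;
    ctx->next_fd = 3;
    for (int i = 0; i < MODEL_SERVER_COUNT; i++) {
        ctx->servers[i].sock = -1;
    }
}

/* Mirrors the early return quoted in the issue: no cleanup unless ACTIVE. */
static int model_close_early_return(struct model_ctx *ctx)
{
    if (ctx->state != MODEL_ACTIVE) {
        return -ENOENT;
    }
    for (int i = 0; i < MODEL_SERVER_COUNT; i++) {
        ctx->servers[i].sa_family = 0;
        ctx->servers[i].sock = -1;
        ctx->servers[i].addr[0] = '\0';
    }
    ctx->state = MODEL_INACTIVE;
    return 0;
}

/* Stand-in for the "server already exists" check described above. */
static bool model_server_found(const struct model_ctx *ctx, const char *addr)
{
    for (int i = 0; i < MODEL_SERVER_COUNT; i++) {
        if (ctx->servers[i].sa_family != 0 &&
            strcmp(ctx->servers[i].addr, addr) == 0) {
            return true;
        }
    }
    return false;
}

/* Returns the number of sockets opened by this call. */
static int model_init(struct model_ctx *ctx, const char *addr)
{
    int opened = 0;

    assert(strlen(addr) < sizeof(ctx->servers[0].addr));
    if (!model_server_found(ctx, addr)) {
        for (int i = 0; i < MODEL_SERVER_COUNT; i++) {
            struct model_server *s = &ctx->servers[i];

            if (s->sa_family == 0) {
                s->sa_family = 1;
                snprintf(s->addr, sizeof(s->addr), "%s", addr);
                s->sock = ctx->next_fd++;
                opened++;
                break;
            }
        }
    }
    /* A known server is skipped, like the `continue` in the flow above. */
    ctx->state = MODEL_ACTIVE;
    return opened;
}

/* Replays close followed by init, starting from the assumed stale state. */
static int model_replay_broken(void)
{
    struct model_ctx ctx;

    model_reset(&ctx);
    ctx.servers[0].sa_family = 1;
    snprintf(ctx.servers[0].addr, sizeof(ctx.servers[0].addr), "%s", "141.1.1.1");

    (void)model_close_early_return(&ctx); /* -ENOENT, slot untouched */
    return model_init(&ctx, "141.1.1.1");
}
```

In this model `model_replay_broken()` opens no socket, while the same init on a freshly reset context opens one, which matches the difference between the two logged scenarios.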
What have you tried to diagnose or workaround this issue?
- Added extensive logging to DNS resolver to trace server slot states through close/init cycles
- Confirmed that dns_resolve_close_locked() returns immediately when state is INACTIVE
- Verified that server slots retain sa_family and IP address data after close
- Traced socket service behavior and confirmed it never receives restart notification on second init
- Tested workaround: Modified dns_resolve_close_locked() to always clean up server slots regardless of state → DNS resolution works correctly
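The effect of the workaround in the last item can be illustrated with a small stand-alone model. This is a sketch under assumptions, not the Zephyr code: all `sketch_*` names are hypothetical, a single slot stands in for the server array, and the stale starting values are copied from the broken-scenario log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical host-side model; none of these names exist in Zephyr. */
#define SKETCH_SLOTS 1

enum sketch_state { SKETCH_INACTIVE, SKETCH_ACTIVE };

struct sketch_slot {
    int sa_family;  /* 0 = slot free */
    int sock;       /* -1 = no socket open */
    char addr[16];
};

struct sketch_ctx {
    enum sketch_state state;
    struct sketch_slot slots[SKETCH_SLOTS];
    int sockets_opened;
};

/* Close that clears every slot regardless of the context state. */
static int sketch_close_always(struct sketch_ctx *ctx)
{
    for (int i = 0; i < SKETCH_SLOTS; i++) {
        ctx->slots[i].sa_family = 0;
        ctx->slots[i].sock = -1;
        ctx->slots[i].addr[0] = '\0';
    }
    ctx->state = SKETCH_INACTIVE;
    return 0;
}

/* Returns 1 if a socket was opened, 0 if the server was already known,
 * -1 if no free slot was available.
 */
static int sketch_init(struct sketch_ctx *ctx, const char *addr)
{
    assert(strlen(addr) < sizeof(ctx->slots[0].addr));

    for (int i = 0; i < SKETCH_SLOTS; i++) {
        const struct sketch_slot *s = &ctx->slots[i];

        if (s->sa_family != 0 && strcmp(s->addr, addr) == 0) {
            ctx->state = SKETCH_ACTIVE;
            return 0; /* known server: socket creation skipped */
        }
    }
    for (int i = 0; i < SKETCH_SLOTS; i++) {
        struct sketch_slot *s = &ctx->slots[i];

        if (s->sa_family == 0) {
            s->sa_family = 1;
            snprintf(s->addr, sizeof(s->addr), "%s", addr);
            s->sock = 3 + ctx->sockets_opened; /* fake descriptor */
            ctx->sockets_opened++;
            ctx->state = SKETCH_ACTIVE;
            return 1;
        }
    }
    return -1;
}

/* Close then init, starting from an INACTIVE context with a stale slot. */
static int sketch_cycle_after_stale_slot(void)
{
    struct sketch_ctx ctx = {
        .state = SKETCH_INACTIVE,
        .slots = { { .sa_family = 1, .sock = -1, .addr = "141.1.1.1" } },
    };

    sketch_close_always(&ctx);
    return sketch_init(&ctx, "141.1.1.1");
}
```

Because the close step no longer depends on the state, the stale entry is gone before the next init, so `sketch_cycle_after_stale_slot()` takes the socket-creation path instead of the "already exists" path.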
Proposed Fix
Modify dns_resolve_close_locked() to clean up server slots even when context is not ACTIVE:
static int dns_resolve_close_locked(struct dns_resolve_context *ctx)
{
    int i, ret;

    if (ctx->state != DNS_RESOLVE_CONTEXT_ACTIVE) {
        /* Even if context is not ACTIVE, we should still close any open sockets
         * to ensure proper cleanup. This handles cases where context is INACTIVE
         * but server slots still have valid data from previous initialization.
         */
        for (i = 0; i < SERVER_COUNT; i++) {
            if (ctx->servers[i].sock >= 0 ||
                ctx->servers[i].dns_server.sa_family != 0) {
                ret = dns_server_close(ctx, i);
                if (ret < 0 && ret != -ENOENT) {
                    NET_DBG("Cannot close DNS server %d (%d)", i, ret);
                }
            }
        }
        ctx->state = DNS_RESOLVE_CONTEXT_INACTIVE;
        return 0; // Success, not error
    }
    // ... existing ACTIVE state cleanup code unchanged ...
}
Regression
- This is a regression.
Steps to reproduce
Steps to reproduce the behavior:
- Boot system with cellular modem or any network interface
- Call dns_resolve_init() with a DNS server (e.g., "141.1.1.1")
- Verify DNS resolution works correctly
- Simulate network disconnect/reconnect (e.g., modem power cycle)
- Call dns_resolve_close() to clean up DNS context
- Network reconnects with same or different IP address
- Call dns_resolve_init() again with the same DNS server
- Attempt DNS resolution
Relevant log output
## Logs and console output
**Working scenario (First DNS init):**
[00:01:06.105] <dbg> net_dns_resolve: is_server_name_found: Checking server[0]: sa_family=0 sock=-1
[00:01:06.105] <dbg> net_dns_resolve: is_server_name_found: Server 141.1.1.1 NOT found
[00:01:06.105] <inf> net_dns_resolve: get_free_slot: Found free slot at index 0
[socket creation proceeds normally]
[00:01:07.181] <dbg> modem_hl7812: offload_bind: offload_bind
[00:01:07.181] <dbg> modem_hl7812: offload_recv: offload_recv
[00:01:07.181] <dbg> net_sock_svc: Socket service: triggering restart via eventfd for service 0x8094e34 with 1 fds
[00:01:07.181] <inf> modem_hl7812: DNS resolver initialized successfully
**Broken scenario (Second DNS init after reconnect):**
[00:02:23.560] <inf> modem_hl7812: Closing DNS context (state=3) to unregister from socket service
[00:02:23.560] <dbg> net_dns_resolve: dns_resolve_close_locked: state=3
[00:02:23.560] <inf> modem_hl7812: DNS context state after close: 3 (expecting 3=INACTIVE)
[00:02:23.560] <inf> modem_hl7812: Initializing DNS resolver with server 141.1.1.1
[00:02:23.560] <dbg> net_dns_resolve: is_server_name_found: Checking server[0]: sa_family=1 sock=-1 ← BUG!
[00:02:23.560] <dbg> net_dns_resolve: Server[0] addr=141.1.1.1, comparing with 141.1.1.1
[00:02:23.560] <inf> net_dns_resolve: Server 141.1.1.1 FOUND at index 0 (sock=-1, sa_family=1)
[00:02:23.560] <dbg> net_dns_resolve: Server 141.1.1.1 already exists
[socket creation SKIPPED - offload_bind/recv never called]
[socket service restart NEVER triggered]
[00:02:24.633] <inf> modem_hl7812: DNS resolver initialized successfully, new socket registered with service
**Note:** `sa_family=1` (AF_INET) in the broken scenario indicates the server slot was never cleared, even though the socket was closed (`sock=-1`).
Impact
Annoyance – Minor irritation; no significant impact on usability or functionality.
Environment
Target Platform:
- Board: Custom board with STM32U5 + Sierra Wireless HL7812 cellular modem
- Zephyr version: v4.3.0-rc2 (likely affects all versions)
- OS: Windows 11
- Toolchain: zephyr-sdk-0.17.2
- Last commit from zephyr:
  Commit: 80be486c8779725605cca3a85472b0861e6f6b04
  Parents: 9a455998a4ffae941f1023e32e1f29f7cb43449e
  Author: BUDKE Gerson Fernando
  Author Date: Tue Nov 04 2025 10:10:42 GMT+0000 (Greenwich Mean Time)
Additional Context
No response